Age | Commit message | Author | |
---|---|---|---|
2022-09-10 | Always save checkpoints. Always load from a checkpoint if one exists. (branch: con_checkpoint) | John Bauer | |
| | Build the constituency optimizer using knowledge of how far along the training process is; for example, multistage part 1 gets Adadelta. Test that a multistage training process builds the correct optimizers, including when reloading. When continuing training from a checkpoint, use the existing epochs_trained. Restart the epoch count when doing a finetune. | |
2022-09-10 | Technically it works the old way, but the filenames look silly | John Bauer | |
2022-09-10 | Verify that hooks behave as expected when loading & saving | John Bauer | |
2022-09-08 | Add a method to get the constituents known by a conparser, as requested in #1066 | John Bauer | |
2022-09-08 | NER get_known_tags possibly applies to multiple models | John Bauer | |
2022-09-08 | relearn_structure should reuse the foundation_cache if possible | John Bauer | |
2022-09-08 | Use the same foundation cache as the retag_pipeline to avoid reloading the same pretrains multiple times in the constituency | John Bauer | |
2022-09-08 | A script which converts the Sindhi tokenization dataset from Isra University | John Bauer | |
| | Can also be applied to other similar datasets. Read the sentences and use the tokenization module to align the tokens with the original text. Randomly split the sentences. Write out the sentences and prepare their labels. | |
2022-09-08 | Add a function which adds fake dependencies (if regular dependencies are missing) to a list of conllu lines. Needed for processing conllu files with eval.py if a dataset doesn't have deps | John Bauer | |
2022-09-08 | Rearrange a bunch of functions from prepare_tokenizer_treebank to a common file | John Bauer | |
| | Move the read/write conllu functions to a common folder so they can be used elsewhere. Move the MWT_RE etc. as well. Move prepare_treebank_labels to common (and rename it). Move convert_conllu_to_txt as well. Refactor a tokenizer_conllu_name function. | |
2022-09-08 | Reformat epoch logging in the conparser | John Bauer | |
2022-09-08 | Eliminate a redundant function call | John Bauer | |
2022-09-08 | Update a comment on a sentence being eliminated in constituency VIT | John Bauer | |
2022-09-07 | Add a test of small cache size in the multilingual pipeline | John Bauer | |
2022-09-07 | remove -> pop for dict. Addresses #1115 | John Bauer | |
2022-09-07 | Add pytest marks to the langid tests | John Bauer | |
2022-09-07 | Split the langid test into two separate test scripts | John Bauer | |
2022-09-07 | Update for the latest version of the constituency treebank | John Bauer | |
| | Some sentences fixed in UD, some updates to the constituency treebank. | |
2022-09-07 | If a ValueError happens while tokenizing, try to make it a bit more descriptive. Apparently does not impact tokenization time | John Bauer | |
2022-09-06 | Temporarily extract a .tar.gz file if it's not extracted on the file system | John Bauer | |
2022-09-06 | Sort subfolders so that results are reproducible | John Bauer | |
2022-09-06 | Import either lxml or ElementTree. ElementTree is slower, but doesn't require an additional module dependency | John Bauer | |
2022-09-06 | NAMESPACES -> NAMESPACE, replace all xpath with findall | John Bauer | |
2022-09-06 | Mostly pl_ner conversion test - tests the conversion of the XML so far | John Bauer | |
2022-09-06 | Separate out a smaller piece of the extraction function in convert_nkjp | John Bauer | |
2022-09-04 | Replace click with argparse in the Polish NER, rather than adding a new library dependency | John Bauer | |
2022-09-04 | Remove global variable usage by passing it around everywhere instead | John Bauer | |
2022-09-04 | NER Polish (#1110) | Karol Saputa | |
| | Add NER dataset for Polish. This PR adds a Polish NER dataset (#1070). Co-authored-by: ryszardtuora <ryszardtuora@gmail.com>; Co-authored-by: Karol Saputa <ksaputa@gputrain.dariah.ipipan.waw.pl> | |
2022-09-02 | Remove some unnecessary list creation. Rather than using shutil, read then write sentences so that we can later manipulate the sentences as needed in write_sentences | John Bauer | |
2022-09-02 | Add some more notes on bilstm size experiments in the classifier | John Bauer | |
2022-09-01 | Update compose_ete_results.py to allow multiple input files | John Bauer | |
2022-09-01 | no enhanced dependencies | John Bauer | |
2022-09-01 | Integrate the newer eval.py from udtools in place of the previously existing conll18 version | John Bauer | |
2022-09-01 | More informative errors if the data can't be found | John Bauer | |
2022-09-01 | Update Hebrew default to a combined model | John Bauer | |
2022-09-01 | Add the capacity to build he_combined models from UD_Hebrew-IAHLTwiki and a fork of HTB. Addresses #1109 (branch: hebrew_combined) | John Bauer | |
2022-09-01 | Allow for a list of NER models in the processors argument, similar to the list of NER models in the package argument when creating a pipeline. From @mpenalver in #928 | John Bauer | |
2022-09-01 | Output the class chosen if choosing an xpos factory from scratch | John Bauer | |
2022-09-01 | A couple more experiment notes | John Bauer | |
2022-08-31 | Don't save optimizers for the non-checkpoints (and fix a save bug for the end-of-epoch save) | John Bauer | |
2022-08-31 | Make saved models smaller in the classifier test. This will hopefully save disk space and time | John Bauer | |
2022-08-31 | notes on the madgrad LR experiments | John Bauer | |
2022-08-31 | Update a couple defaults based on recent experiments | John Bauer | |
2022-08-31 | Save the best score when training a model so that future training from a checkpoint knows when to save a better model | John Bauer | |
2022-08-31 | Update to 0.0005 - less likely to go completely bad | John Bauer | |
2022-08-31 | Oops, correct a few uses of model in the classifier main program | John Bauer | |
2022-08-31 | Save checkpoints with epochs_trained+1 at the end of an epoch (otherwise the epoch will not be incremented properly when reloading) | John Bauer | |
2022-08-31 | Add a checkpoint mechanism to sentiment | John Bauer | |
| | Pass checkpoint_file to train_model in the unittest. TODO: add tests for checkpointing. | |
2022-08-31 | Simplify the load mechanism in the classifier Trainer so that the load() call loads the pretrain, charlm, etc. | John Bauer | |
2022-08-31 | Refactor a Trainer object out of the classifier.py main program. In addition to the model, this saves and loads the optimizer and the number of epochs trained. Purpose: to make it easy to checkpoint model training the same way the charlm is checkpointed (branch: sentiment_trainer) | John Bauer | |
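
The 2022-09-06 commits that import either lxml or ElementTree and replace xpath with findall rely on both libraries exposing `fromstring()` and `findall()` with a namespace mapping. A minimal sketch of that fallback pattern follows; the TEI namespace URI, the `segment_texts` helper, and the sample XML are assumptions for illustration, not code from the repository.

```python
# Prefer lxml if it is installed; otherwise fall back to the standard library.
# Both provide fromstring() and Element.findall(path, namespaces).
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

# Hypothetical single namespace mapping, for illustration only
NAMESPACE = {"tei": "http://www.tei-c.org/ns/1.0"}

EXAMPLE = b"""<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <s><seg>Ala</seg><seg>ma</seg><seg>kota</seg></s>
</TEI>"""


def segment_texts(xml_bytes):
    """Return the text of every tei:seg element, using findall instead of xpath."""
    root = etree.fromstring(xml_bytes)
    # ".//tei:seg" uses the ElementPath subset that both backends understand
    return [seg.text for seg in root.findall(".//tei:seg", NAMESPACE)]


print(segment_texts(EXAMPLE))
```

Because `xpath()` exists only in lxml, limiting the code to `findall()` and simple ElementPath expressions is what lets the standard-library fallback work unchanged.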
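
The 2022-09-08 commit that adds fake dependencies to a list of conllu lines notes that eval.py needs them when a dataset has no dependencies. The sketch below shows one way to fill in the HEAD and DEPREL columns; the function name and the attachment scheme (word 1 is the root, every other word attaches to word 1) are hypothetical, chosen only because they always yield a valid single-rooted tree.

```python
def add_fake_dependencies(lines):
    """Fill in HEAD and DEPREL on CoNLL-U word lines whose HEAD is "_".

    Hypothetical scheme: word 1 becomes the root and every other word is
    attached to word 1, which always forms a valid single-rooted tree.
    """
    result = []
    for line in lines:
        # Comments and blank lines (sentence breaks) pass through unchanged
        if not line.strip() or line.startswith("#"):
            result.append(line)
            continue
        fields = line.split("\t")
        token_id = fields[0]
        # Multi-word token ranges (1-2) and empty nodes (1.1) carry no HEAD
        if "-" in token_id or "." in token_id:
            result.append(line)
            continue
        if fields[6] == "_":  # column 7 is HEAD, column 8 is DEPREL
            is_first = token_id == "1"
            fields[6] = "0" if is_first else "1"
            fields[7] = "root" if is_first else "dep"
        result.append("\t".join(fields))
    return result


example = [
    "# text = Hello world",
    "1\tHello\t_\t_\t_\t_\t_\t_\t_\t_",
    "2\tworld\t_\t_\t_\t_\t_\t_\t_\t_",
]
for line in add_fake_dependencies(example):
    print(line)
```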
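
Two 2022-08-31 commits interact: checkpoints store epochs_trained+1 at the end of an epoch, and the best score is saved so that training resumed from a checkpoint knows when a new model is actually better. A framework-free sketch of both ideas follows; the JSON checkpoint, `train`, `run_epoch`, and `save_best` are hypothetical, and the real classifier Trainer also saves the model and optimizer state.

```python
import json
import os


def train(checkpoint_path, total_epochs, run_epoch, save_best, stop_after=None):
    """Resumable training loop; all names here are hypothetical.

    run_epoch(epoch) trains one epoch and returns a dev score.
    save_best(epoch) is called only when the score beats every earlier epoch,
    including epochs from a previous, interrupted run.
    """
    epochs_trained, best_score = 0, None
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as fin:
            state = json.load(fin)
        epochs_trained = state["epochs_trained"]
        best_score = state["best_score"]

    for epoch in range(epochs_trained, total_epochs):
        score = run_epoch(epoch)
        if best_score is None or score > best_score:
            best_score = score
            save_best(epoch)
        # The epoch is finished, so record epoch + 1: a reload resumes with
        # the next epoch instead of repeating this one.
        with open(checkpoint_path, "w") as fout:
            json.dump({"epochs_trained": epoch + 1, "best_score": best_score}, fout)
        if stop_after is not None and epoch + 1 >= stop_after:
            return  # simulate the run being interrupted here


if __name__ == "__main__":
    import tempfile

    scores = [0.5, 0.7, 0.6, 0.8]
    ran, saved = [], []

    def run_epoch(epoch):
        ran.append(epoch)
        return scores[epoch]

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "checkpoint.json")
        train(path, 4, run_epoch, saved.append, stop_after=2)  # interrupted
        train(path, 4, run_epoch, saved.append)                # resumed
    print(ran, saved)  # [0, 1, 2, 3] [0, 1, 3]
```

The interrupted and resumed runs together execute each epoch exactly once, and the resumed run does not save a new best model at epoch 2 because the restored best score (0.7) beats 0.6.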
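
The 2022-09-06 commit that temporarily extracts a .tar.gz file when the data is not already extracted on the file system can be sketched as a context manager. The `extracted_dir` name and the assumption that `corpus.tar.gz` contains a top-level `corpus/` folder are hypothetical; the temporary copy is deleted when the `with` block exits.

```python
import contextlib
import os
import tarfile
import tempfile


@contextlib.contextmanager
def extracted_dir(directory):
    """Yield a path to `directory`, extracting `directory + ".tar.gz"` if needed.

    Hypothetical helper: assumes the archive holds a top-level folder with the
    same name as `directory`. A temporary extraction is removed on exit.
    """
    directory = os.path.normpath(directory)
    if os.path.isdir(directory):
        yield directory  # already extracted: use it in place
        return
    archive = directory + ".tar.gz"
    with tempfile.TemporaryDirectory() as tmp:
        with tarfile.open(archive, "r:gz") as tar:
            if hasattr(tarfile, "data_filter"):
                tar.extractall(tmp, filter="data")  # safer extraction where supported
            else:
                tar.extractall(tmp)
        yield os.path.join(tmp, os.path.basename(directory))
```

Inside the `with` block, iterating over `sorted(os.listdir(path))` rather than the raw listing keeps the order stable, since `os.listdir` returns entries in arbitrary order; that is the concern behind the "Sort subfolders" commit.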