
github.com/stanfordnlp/stanza.git - commit log
Age  Commit message  Author
2022-09-10  Always save checkpoints. Always load from a checkpoint if one exists. (branch: con_checkpoint)  John Bauer
    Build the constituency optimizer using knowledge of how far you are in the training process - multistage part 1 gets Adadelta, for example
    Test that a multistage training process builds the correct optimizers, including when reloading
    When continuing training from a checkpoint, use the existing epochs_trained
    Restart epochs count when doing a finetune
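The multistage selection described in this commit could look roughly like the sketch below; the function name, the stage boundary, and the later-stage optimizer name (madgrad, which appears in experiment notes further down this log) are illustrative assumptions, not stanza's actual code:

```python
def choose_optimizer(epochs_trained, stage1_epochs=10, finetune=False):
    """Pick an optimizer name based on how far training has progressed.

    Hypothetical sketch: the stage boundary and the later-stage choice
    are assumptions; the commit only says multistage part 1 gets Adadelta.
    """
    if finetune:
        # a finetune restarts the epoch count
        epochs_trained = 0
    if epochs_trained < stage1_epochs:
        # multistage part 1 gets Adadelta
        return "adadelta"
    # later stages switch optimizers (madgrad here is just an example)
    return "madgrad"
```

When continuing from a checkpoint, the stored `epochs_trained` is passed in unchanged, so the resumed run rebuilds the optimizer appropriate to its stage.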
2022-09-10  Technically it works the old way, but the filenames look silly  John Bauer
2022-09-10  Verify that hooks behave as expected when loading & saving  John Bauer
2022-09-08  Add a method to get the constituents known by a conparser, as requested in #1066  John Bauer
2022-09-08  NER get_known_tags possibly applies to multiple models  John Bauer
2022-09-08  relearn_structure should reuse the foundation_cache if possible  John Bauer
2022-09-08  Use the same foundation cache as the retag_pipeline to avoid reloading the same pretrains multiple times in the constituency  John Bauer
2022-09-08  A script which converts Sindhi tokenization from Isra university  John Bauer
    Can also be applied to other similar datasets
    Read sentences & use the tokenization module to align the tokens with the original text
    Randomly split the sentences
    Write out the sentences and prepare their labels
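The "randomly split the sentences" step might look something like this sketch; the ratios, seed, and function name are assumptions for illustration:

```python
import random

def split_sentences(sentences, train_frac=0.8, dev_frac=0.1, seed=1234):
    """Shuffle deterministically, then cut into train/dev/test lists.

    Hypothetical helper: fractions and seed are illustrative defaults.
    """
    sentences = list(sentences)
    random.Random(seed).shuffle(sentences)
    n_train = int(len(sentences) * train_frac)
    n_dev = int(len(sentences) * dev_frac)
    return (sentences[:n_train],
            sentences[n_train:n_train + n_dev],
            sentences[n_train + n_dev:])
```

Using a fixed seed keeps the split reproducible across runs of the conversion script.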
2022-09-08  Add a function which adds fake dependencies (if regular dependencies are missing) to a list of conllu lines. Needed for processing conllu files with eval.py if a dataset doesn't have deps  John Bauer
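A minimal version of such a function could fill the HEAD and DEPREL columns with a simple fake scheme; the scheme below (first token heads to root, each later token heads to its predecessor) is an assumption for illustration, not necessarily what stanza does:

```python
def add_fake_dependencies(conllu_lines):
    """Give each token a fake head/deprel so eval.py can process
    a dataset with no dependency annotation (hypothetical sketch)."""
    out = []
    for line in conllu_lines:
        if not line or line.startswith("#"):
            out.append(line)
            continue
        fields = line.split("\t")
        # leave multiword-token (1-2) and empty-node (1.1) lines alone
        if "-" in fields[0] or "." in fields[0]:
            out.append(line)
            continue
        if fields[6] == "_":  # HEAD column missing
            token_id = int(fields[0])
            fields[6] = "0" if token_id == 1 else str(token_id - 1)
            fields[7] = "root" if token_id == 1 else "dep"
        out.append("\t".join(fields))
    return out
```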
2022-09-08  Rearrange a bunch of functions from prepare_tokenizer_treebank to a common file  John Bauer
    Move the read/write conllu functions to a common folder so they can be used elsewhere
    Move the MWT_RE etc as well
    Move prepare_treebank_labels to common (and rename it)
    Move convert_conllu_to_txt as well
    Refactor a tokenizer_conllu_name function
2022-09-08  Reformat epoch logging in the conparser  John Bauer
2022-09-08  Eliminate a redundant function call  John Bauer
2022-09-08  Update a comment on a sentence being eliminated in constituency VIT  John Bauer
2022-09-07  Add a test of small cache size in the multilingual pipeline  John Bauer
2022-09-07  remove -> pop for dict. Addresses #1115  John Bauer
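For reference, `dict` has no `.remove()` method; `.pop(key, default)` deletes an entry and, with a default supplied, does not raise `KeyError` when the key is absent (the cache contents here are made up):

```python
cache = {"en": "english_model"}

# .pop(key, default) deletes the entry if present and returns the
# default instead of raising KeyError when the key is missing
model = cache.pop("en", None)      # returns "english_model", removes "en"
missing = cache.pop("fr", None)    # returns None, no exception
```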
2022-09-07  Add pytest marks to the langid tests  John Bauer
2022-09-07  Separate the langid test into two separate test scripts  John Bauer
2022-09-07  Update for the latest version of the constituency treebank  John Bauer
    some sentences fixed in UD, some updates to the constituency treebank
2022-09-07  If a ValueError happens while tokenizing, try to make it a bit more descriptive. Apparently does not impact tokenization time  John Bauer
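Making the error more descriptive usually means catching and re-raising with added context; a generic sketch (the tokenizer internals here are stand-ins, not stanza's code):

```python
def tokenize_batch(text):
    # stand-in for the real low-level tokenizer (hypothetical)
    if not isinstance(text, str):
        raise ValueError("tokenizer expected str")
    return text.split()

def tokenize(text):
    """Wrap a low-level ValueError in a more descriptive one,
    keeping the original exception as the cause."""
    try:
        return tokenize_batch(text)
    except ValueError as e:
        raise ValueError(
            f"Could not tokenize input of type {type(text).__name__}: {e}"
        ) from e
```

Because the `try/except` only adds cost on the failure path, it does not slow down normal tokenization.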
2022-09-06  Temporarily extract a .tar.gz file if it's not extracted on the file system  John Bauer
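One way to do a temporary extraction is a context manager over `tarfile` and `tempfile`; the interface below is an assumption for illustration, not stanza's API:

```python
import contextlib
import os
import tarfile
import tempfile

@contextlib.contextmanager
def maybe_extracted(path):
    """If `path` is a directory, yield it as-is; if it is a .tar.gz,
    extract it to a temporary directory for the duration of the
    with-block, then clean up automatically."""
    if os.path.isdir(path):
        yield path
        return
    with tempfile.TemporaryDirectory() as tmp:
        with tarfile.open(path, "r:gz") as tar:
            tar.extractall(tmp)
        yield tmp
```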
2022-09-06  Sort subfolders so that results are reproducible  John Bauer
2022-09-06  Import either lxml or ElementTree. ElementTree is slower, but doesn't require an additional module dependency  John Bauer
2022-09-06  NAMESPACES -> NAMESPACE, replace all xpath with findall  John Bauer
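The import fallback and namespace-aware `findall` pattern from these two commits can be sketched as follows; the `NAMESPACE` URI (TEI, which NKJP-style corpora use) and the element name are illustrative assumptions:

```python
# Prefer lxml when available; fall back to the stdlib ElementTree,
# which is slower but needs no extra dependency
try:
    from lxml import etree as ET
except ImportError:
    import xml.etree.ElementTree as ET

# a single prefix->URI mapping; findall accepts it in both libraries,
# whereas xpath() is lxml-only
NAMESPACE = {"tei": "http://www.tei-c.org/ns/1.0"}

def sentence_elements(xml_text):
    """Find all tei:s elements anywhere under the root."""
    root = ET.fromstring(xml_text)
    return root.findall(".//tei:s", NAMESPACE)
```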
2022-09-06  Mostly pl_ner conversion test - tests the conversion of the XML so far  John Bauer
2022-09-06  Separate out a smaller piece of the extraction function in convert_nkjp  John Bauer
2022-09-04  Replace click with argparse in the Polish NER, rather than adding a new library dependency  John Bauer
2022-09-04  Remove global variable usage by passing it around everywhere instead  John Bauer
2022-09-04  NER Polish (#1110)  Karol Saputa
    * Add NER dataset for Polish
    Co-authored-by: ryszardtuora <ryszardtuora@gmail.com>
    Co-authored-by: Karol Saputa <ksaputa@gputrain.dariah.ipipan.waw.pl>
    This PR adds Polish NER dataset #1070
2022-09-02  Remove some unnecessary list creation. Rather than using shutil, read then write sentences so that we can later manipulate the sentences as needed in write_sentences  John Bauer
2022-09-02  Add some more notes on bilstm size experiments in the classifier  John Bauer
2022-09-01  Update compose_ete_results.py to allow multiple input files  John Bauer
2022-09-01  no enhanced dependencies  John Bauer
2022-09-01  Integrate the newer eval.py from udtools in place of the previously existing conll18 version  John Bauer
2022-09-01  More informative errors if the data can't be found  John Bauer
2022-09-01  Update Hebrew default to a combined model  John Bauer
2022-09-01  Add the capacity to build he_combined models from UD_Hebrew-IAHLTwiki and a fork of HTB. Addresses #1109 (branch: hebrew_combined)  John Bauer
2022-09-01  Allow for a list of NER models in the processors argument, similar to the list of NER models in the package argument when creating a pipeline. From @mpenalver in #928  John Bauer
2022-09-01  Output the class chosen if choosing an xpos factory from scratch  John Bauer
2022-09-01  A couple more experiment notes  John Bauer
2022-08-31  Don't save optimizers for the non-checkpoints (and fix a save bug for the end of epoch save)  John Bauer
2022-08-31  Make saved models smaller in the classifier test. Will hopefully save disk space and time  John Bauer
2022-08-31  notes on the madgrad LR experiments  John Bauer
2022-08-31  Update a couple defaults based on recent experiments  John Bauer
2022-08-31  Save the best score when training a model so that future training from a checkpoint knows when to save a better model  John Bauer
2022-08-31  Update to 0.0005 - less likely to go completely bad  John Bauer
2022-08-31  Oops, correct a few uses of model in the classifier main program  John Bauer
2022-08-31  Save checkpoints with epochs_trained+1 at the end of an epoch (otherwise the epoch will not be incremented properly when reloading)  John Bauer
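The two checkpoint fixes above (storing the best score, and storing epochs_trained+1) can be sketched as a plain payload; the key names and function names are assumptions, not stanza's actual format:

```python
def make_checkpoint(model_state, optimizer_state, epochs_trained, best_score):
    """Build a checkpoint payload at the end of an epoch (hypothetical keys)."""
    return {
        "model": model_state,
        "optimizer": optimizer_state,
        # +1 so a reload resumes at the *next* epoch
        "epochs_trained": epochs_trained + 1,
        "best_score": best_score,
    }

def is_new_best(checkpoint, score):
    """Saving best_score lets a resumed run know whether a new epoch
    actually beat the previous best before overwriting the saved model."""
    return checkpoint["best_score"] is None or score > checkpoint["best_score"]
```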
2022-08-31  Add a checkpoint mechanism to sentiment  John Bauer
    pass checkpoint_file to train_model in the unittest, but TODO: need to add tests for checkpointing
2022-08-31  Simplify the load mechanism in classifier Trainer so that the load() call loads the pretrain, charlm, etc  John Bauer
2022-08-31  Refactor a Trainer object out of the classifier.py main program. In addition to the model, this saves and loads the optimizer and the number of epochs trained. Purpose: to make it so that it is easy to checkpoint model training the same way the charlm is checkpointed (branch: sentiment_trainer)  John Bauer
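A minimal shape for such a Trainer, under the assumption that it bundles model, optimizer, and epochs trained into one checkpoint payload (all names here are illustrative, not stanza's classes):

```python
class Trainer:
    """Sketch of a trainer that checkpoints model, optimizer, and
    epoch count together so training can resume where it left off."""

    def __init__(self, model, optimizer, epochs_trained=0):
        self.model = model
        self.optimizer = optimizer
        self.epochs_trained = epochs_trained

    def to_checkpoint(self):
        return {"model": self.model,
                "optimizer": self.optimizer,
                "epochs_trained": self.epochs_trained}

    @classmethod
    def from_checkpoint(cls, ckpt):
        # a fuller load() would also restore the pretrain, charlm, etc.
        return cls(ckpt["model"], ckpt["optimizer"], ckpt["epochs_trained"])
```

Round-tripping through `to_checkpoint`/`from_checkpoint` is what lets a killed training run pick up at the correct epoch with the optimizer state intact.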