github.com/stanfordnlp/stanza.git

Age	Commit message	Author
2022-09-10	Always save checkpoints. Always load from a checkpoint if one exists. [con_checkpoint]	John Bauer
	Build the constituency optimizer using knowledge of how far you are in the training process - multistage part 1 gets Adadelta, for example. Test that a multistage training process builds the correct optimizers, including when reloading. When continuing training from a checkpoint, use the existing epochs_trained. Restart epochs count when doing a finetune.
2022-09-10	Technically it works the old way, but the filenames look silly	John Bauer

2022-09-10	Verify that hooks behave as expected when loading & saving	John Bauer

2022-09-08	Add a method to get the constituents known by a conparser, as requested in #1066	John Bauer

2022-09-08	NER get_known_tags possibly applies to multiple models	John Bauer

2022-09-08	relearn_structure should reuse the foundation_cache if possible	John Bauer

2022-09-08	Use the same foundation cache as the retag_pipeline to avoid reloading the same pretrains multiple times in the constituency	John Bauer

2022-09-08	A script which converts Sindhi tokenization from Isra University	John Bauer
	Can also be applied to other similar datasets. Read sentences & use the tokenization module to align the tokens with the original text. Randomly split the sentences. Write out the sentences and prepare their labels.
2022-09-08	Add a function which adds fake dependencies (if regular dependencies are missing) to a list of conllu lines.	John Bauer
	Needed for processing conllu files with eval.py if a dataset doesn't have deps
2022-09-08	Rearrange a bunch of functions from prepare_tokenizer_treebank to a common file	John Bauer
	Move the read/write conllu functions to a common folder so they can be used elsewhere. Move the MWT_RE etc as well. Move prepare_treebank_labels to common (and rename it). Move convert_conllu_to_txt as well. Refactor a tokenizer_conllu_name function.
2022-09-08	Reformat epoch logging in the conparser	John Bauer

2022-09-08	Eliminate a redundant function call	John Bauer

2022-09-08	Update a comment on a sentence being eliminated in constituency VIT	John Bauer

2022-09-07	Add a test of small cache size in the multilingual pipeline	John Bauer

2022-09-07	remove -> pop for dict. Addresses #1115	John Bauer

2022-09-07	Add pytest marks to the langid tests	John Bauer

2022-09-07	Separate the langid test into two separate test scripts	John Bauer

2022-09-07	Update for the latest version of the constituency treebank	John Bauer
	some sentences fixed in UD, some updates to the constituency treebank
2022-09-07	If a ValueError happens while tokenizing, try to make it a bit more descriptive.	John Bauer
	Apparently does not impact tokenization time
2022-09-06	Temporarily extract a .tar.gz file if it's not extracted on the file system	John Bauer

2022-09-06	Sort subfolders so that results are reproducible	John Bauer

2022-09-06	Import either lxml or ElementTree. ElementTree is slower, but doesn't require an additional module dependency	John Bauer

2022-09-06	NAMESPACES -> NAMESPACE, replace all xpath with findall	John Bauer

2022-09-06	Mostly pl_ner conversion test - tests the conversion of the XML so far	John Bauer

2022-09-06	Separate out a smaller piece of the extraction function in convert_nkjp	John Bauer

2022-09-04	Replace click with argparse in the Polish NER, rather than adding a new library dependency	John Bauer

2022-09-04	Remove global variable usage by passing it around everywhere instead	John Bauer

2022-09-04	NER Polish (#1110)	Karol Saputa
	* Add NER dataset for Polish Co-authored-by: ryszardtuora <ryszardtuora@gmail.com> Co-authored-by: Karol Saputa <ksaputa@gputrain.dariah.ipipan.waw.pl> This PR adds Polish NER dataset #1070
2022-09-02	Remove some unnecessary list creation. Rather than using shutil, read then write sentences so that we can later manipulate the sentences as needed in write_sentences	John Bauer

2022-09-02	Add some more notes on bilstm size experiments in the classifier	John Bauer

2022-09-01	Update compose_ete_results.py to allow multiple input files	John Bauer

2022-09-01	no enhanced dependencies	John Bauer

2022-09-01	Integrate the newer eval.py from udtools in place of the previously existing conll18 version	John Bauer

2022-09-01	More informative errors if the data can't be found	John Bauer

2022-09-01	Update Hebrew default to a combined model	John Bauer

2022-09-01	Add the capacity to build he_combined models from UD_Hebrew-IAHLTwiki and a fork of HTB. [hebrew_combined]	John Bauer
	Addresses #1109
2022-09-01	Allow for a list of NER models in the processors argument, similar to the list of NER models in the package argument when creating a pipeline.	John Bauer
	From @mpenalver in #928
2022-09-01	Output the class chosen if choosing an xpos factory from scratch	John Bauer

2022-09-01	A couple more experiment notes	John Bauer

2022-08-31	Don't save optimizers for the non-checkpoints (and fix a save bug for the end of epoch save)	John Bauer

2022-08-31	Make saved models smaller in the classifier test. Will hopefully save disk space and time	John Bauer

2022-08-31	notes on the madgrad LR experiments	John Bauer

2022-08-31	Update a couple defaults based on recent experiments	John Bauer

2022-08-31	Save the best score when training a model so that future training from a checkpoint knows when to save a better model	John Bauer

2022-08-31	Update to 0.0005 - less likely to go completely bad	John Bauer

2022-08-31	Oops, correct a few uses of model in the classifier main program	John Bauer

2022-08-31	Save checkpoints with epochs_trained+1 at the end of an epoch (otherwise the epoch will not be incremented properly when reloading)	John Bauer

2022-08-31	Add a checkpoint mechanism to sentiment	John Bauer
	pass checkpoint_file to train_model in the unittest, but TODO: need to add tests for checkpointing
2022-08-31	Simplify the load mechanism in classifier Trainer so that the load() call loads the pretrain, charlm, etc	John Bauer

2022-08-31	Refactor a Trainer object out of the classifier.py main program. In addition to the model, this saves and loads the optimizer and the number of epochs trained. [sentiment_trainer]	John Bauer
	Purpose: to make it so that it is easy to checkpoint model training the same way the charlm is checkpointed