
Commit log for github.com/stanfordnlp/stanza.git
2022-11-04  Adjust learning rate; don't print out infinite ppl  [trans_lm]  (John Bauer)
2022-11-04  Sum losses when training by length: use sum loss instead of mean so that unnecessarily long items get dinged  (John Bauer)
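The commit above switches the per-sentence loss reduction from mean to sum. A minimal plain-Python sketch of why that matters (the `sentence_loss` helper is hypothetical; the actual code presumably uses a PyTorch criterion with `reduction="sum"`):

```python
def sentence_loss(token_losses, reduction="sum"):
    """Aggregate per-token losses for one sentence.

    With reduction="mean", a long sentence and a short one with the same
    per-token loss score identically; with "sum", every extra token adds
    to the total, so unnecessarily long items get dinged.
    """
    total = sum(token_losses)
    if reduction == "mean":
        return total / len(token_losses)
    return total

short = [0.5, 0.5]        # 2 tokens, 0.5 loss each
long = [0.5] * 10         # 10 tokens, same per-token loss

assert sentence_loss(short, "mean") == sentence_loss(long, "mean")  # mean hides length
assert sentence_loss(long) > sentence_loss(short)                   # sum penalizes length
```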
2022-11-04  Have defaults for both IT and VI  (John Bauer)
2022-11-04  Log training args in the trans_lm  (John Bauer)
2022-11-04  Attempt to use a transformer LM reranker  (John Bauer)
2022-11-04  Maybe use a TQDM when scoring stuff  (John Bauer)
2022-11-04  Small end-to-end unit test  (John Bauer)
            Uses very small parameters to make it time- and memory-efficient.
            Checks that loaded model params are the same and that results are the same.
2022-11-04  Put the rest of the model config into args and save the config  (John Bauer)
            Split up large batches of train data to hopefully avoid OOM errors.
            Move criterion to the model, which will allow using it for scoring.
            Add a score method using the criterion.
            Add a load method.
            Score a single sentence as if it were a list of sentences.
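The batch-splitting idea above can be sketched as a simple chunking helper (the name `split_batch` and the item-count limit are assumptions; the repo's OOM-avoidance logic may differ):

```python
def split_batch(batch, max_items):
    """Yield sub-batches of at most max_items elements each, so one
    oversized training batch never has to fit on the GPU at once."""
    for start in range(0, len(batch), max_items):
        yield batch[start:start + max_items]

# ten items split into chunks of at most four
chunks = list(split_batch(list(range(10)), 4))
assert chunks == [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```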
2022-11-04  Load in a parse tree LM dataset  (John Bauer)
            Sort by length.
            Batchify the sentences without flattening them (the ppl is no longer useful with the new data).
            Pass in padding masks.
            Add an argparser for the configuration.
            Refactor a bunch of methods.
            Predict the test loss on the parsed test set... not happy.
            Process a dev pred file too.
            Save model (no config yet).
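Batchifying variable-length sentences without flattening them means padding each one to a common length and building a mask so the model can ignore pad positions. A sketch using the PyTorch convention where True marks a padded slot (the helper name `batchify` here is hypothetical):

```python
def batchify(sentences, pad_id=0):
    """Pad a list of token-id lists to equal length and build a
    key-padding mask (True = padding position, matching PyTorch's
    src_key_padding_mask convention)."""
    max_len = max(len(s) for s in sentences)
    padded = [s + [pad_id] * (max_len - len(s)) for s in sentences]
    mask = [[i >= len(s) for i in range(max_len)] for s in sentences]
    return padded, mask

padded, mask = batchify([[5, 6, 7], [8]])
assert padded == [[5, 6, 7], [8, 0, 0]]
assert mask == [[False, False, False], [False, True, True]]
```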
2022-11-04  Code changes to make the demo work; can refactor things to make loading and saving easier  (John Bauer)
2022-11-04  Copy chunks of tutorial from pytorch.org as a module. Does not run yet, because that was originally a Jupyter notebook: https://pytorch.org/tutorials/beginner/transformer_tutorial.html  (John Bauer)
2022-11-03  Add min_len and max_len args to tokenize_wiki.py. Skip one-line wiki docs, since those are likely to be useless  (John Bauer)
2022-11-02  Fix format error in log line  (John Bauer)
2022-11-01  Slice in a more generic manner when copying a model; makes it easier to make future changes  (John Bauer)
2022-11-01  Set this option in the partitioned test so that it still tests this code path if the lattn_partitioned default changes  (John Bauer)
2022-11-01  lattn_partitioned == False should affect the input proj dimension as well  (John Bauer)
2022-11-01  Add an argument for partitioning / not partitioning lattn  (John Bauer)
2022-11-01  Oops, this was incorrect  (John Bauer)
2022-11-01  Log some stats after all models are created for training (move the log line)  (John Bauer)
2022-11-01  Use some words from the silver dataset (currently |gold| words are added, even if that means some overlaps)  (John Bauer)
2022-10-31  Add a suffix argument to the renormalize script  (John Bauer)
2022-10-31  Script to renormalize Vietnamese diacritics  (John Bauer)
2022-10-30  Add a separate argument for --silver_epoch_size, just in case people want that  (John Bauer)
2022-10-30  Add notes on silver words for the delta embedding  (John Bauer)
2022-10-30  Since we just ran into a bug where checkpoints were not correctly loaded, add a test of exactly that functionality  (John Bauer)
2022-10-30  Update comment  (John Bauer)
2022-10-30  Track how many batches a model gets trained for. Backdoor test for the silver trees, since adding a silver treebank makes an epoch take twice as long  (John Bauer)
2022-10-30  Rough draft of using silver trees  (John Bauer)
            Mostly untested. Includes an unfinished test of the silver data.
2022-10-29  Move uses_xpos() to the model itself and add it to Ensemble. Will make it easier to generalize selftrain.py to use Ensemble as well  (John Bauer)
2022-10-29  Try smaller chunks for parse_text; one giant chunk ran out of GPU memory  (John Bauer)
2022-10-29  Add a couple of hopefully helpful log lines to the parse_text operation  (John Bauer)
2022-10-29  Connect model ensembles to the predict_text functionality  (John Bauer)
2022-10-29  Oops, model was supposed to be set to eval() when run in predict_text mode  (John Bauer)
2022-10-29  Refactor predict dir, file, and format args so they can be used elsewhere if needed  (John Bauer)
2022-10-29  Refactor an unnecessary duplication of arguments  (John Bauer)
2022-10-29  Add functionality to turn a tokenized text file into a file of parse trees  (John Bauer)
2022-10-29  Ignore em dashes in Wikipedia, as those seem to be lists  (John Bauer)
2022-10-29  Add a useful doc on how to build batches from tagged words  (John Bauer)
2022-10-29  Use reasonable defaults for EN and VI ensembles. Can add other languages as relevant (especially IT)  (John Bauer)
2022-10-29  Oops, logger was missing in the retagging.py module  (John Bauer)
2022-10-29  A script for tokenizing a Wikipedia file and writing it out  (John Bauer)
2022-10-28  Accept a single file for wiki processing in selftrain_wiki.py  (John Bauer)
2022-10-28  Fix bug in the lt/gt finding (it can start a line); use FoundationCache to save on memory  (John Bauer)
2022-10-28  Add a rotation to make N non-overlapping dev sets, with the remainder being train, for vlsp22  (John Bauer)
2022-10-28  Simplify reading & writing loop. Will make it easier to 'rotate' the dev set  (John Bauer)
2022-10-28  Add a prototype of model ensembling  (John Bauer)
            Better would be to integrate it with run_constituency and/or refactor some of the methods in Ensemble.
            Error-check the mixture of models.
            Defaults are currently set to English.
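One simple way to ensemble models, sketched here as elementwise averaging of per-candidate scores across models (an assumption for illustration; the repo's Ensemble class may combine models differently):

```python
def ensemble_scores(per_model_scores):
    """Average a list of score vectors, one per model, elementwise.
    All models must score the same candidates in the same order."""
    n = len(per_model_scores)
    return [sum(vals) / n for vals in zip(*per_model_scores)]

# two models scoring the same three candidates
model_a = [0.5, 1.0, 2.0]
model_b = [1.5, 3.0, 2.0]
assert ensemble_scores([model_a, model_b]) == [1.0, 2.0, 2.0]
```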
2022-10-28  Refactor the retagging args & pipeline creation into a separate module  (John Bauer)
2022-10-28  Keep scores when parsing a block of sentences  (John Bauer)
            Unfortunately, the lengths cause a problem when just adding scores for the purposes of a reranker.
            However, this seems to work for ensembling multiple models together.
            Add a keep_scores flag to parse_sentences etc.
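The entry above notes that sentence lengths cause problems when raw scores are simply added for reranking: a summed log-probability always drops as a sentence gets longer, biasing the reranker toward short inputs. One common workaround (an assumption here, not necessarily what the repo adopts) is per-token normalization:

```python
def length_normalized(total_logprob, num_tokens):
    """Average log-probability per token, so long and short sentences
    can be compared on an equal footing."""
    return total_logprob / max(num_tokens, 1)

# same per-token quality, very different raw sums
assert length_normalized(-10.0, 10) == length_normalized(-2.0, 2)
# raw summed scores would have strongly preferred the shorter sentence
assert -2.0 > -10.0
```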
2022-10-28  Fix a comment  (John Bauer)
2022-10-27  Order the pretrains so that resource files are made with a consistent md5sum  (John Bauer)
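Hashing the same set of pretrains in a different iteration order produces a different digest; sorting before serializing makes the resulting file's md5sum reproducible. A sketch with a hypothetical `resources_digest` helper:

```python
import hashlib
import json

def resources_digest(pretrain_names):
    """Serialize the pretrain list in sorted order so the digest does
    not depend on the order the pretrains were discovered in."""
    blob = json.dumps(sorted(pretrain_names)).encode("utf-8")
    return hashlib.md5(blob).hexdigest()

a = resources_digest(["vi_pretrain", "en_pretrain", "it_pretrain"])
b = resources_digest(["it_pretrain", "vi_pretrain", "en_pretrain"])
assert a == b  # discovery order no longer changes the md5sum
```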