
Commit log for github.com/stanfordnlp/stanza.git
2022-11-04  Adjust learning rate; don't print out infinite ppl  [trans_lm]  (John Bauer)
2022-11-04  Sum losses when training by length: use sum loss instead of mean so that unnecessarily long items get dinged  (John Bauer)
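The commit above switches the per-sentence loss reduction from mean to sum. A minimal plain-Python sketch of why that matters (the `sentence_loss` helper is hypothetical; the actual code presumably uses a PyTorch criterion with `reduction="sum"`):

```python
def sentence_loss(token_losses, reduction="sum"):
    """Aggregate per-token losses for one sentence.

    With reduction="mean", a long sentence and a short one with the same
    per-token loss score identically; with "sum", every extra token adds
    to the total, so unnecessarily long items get dinged.
    """
    total = sum(token_losses)
    if reduction == "mean":
        return total / len(token_losses)
    return total

short = [0.5, 0.5]        # 2 tokens, 0.5 loss each
long = [0.5] * 10         # 10 tokens, same per-token loss

assert sentence_loss(short, "mean") == sentence_loss(long, "mean")  # mean hides length
assert sentence_loss(long) > sentence_loss(short)                   # sum penalizes length
```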
2022-11-04  Have defaults for both IT and VI  (John Bauer)
2022-11-04  Log training args in the trans_lm  (John Bauer)
2022-11-04  Attempt to use a transformer LM reranker  (John Bauer)
2022-11-04  Maybe use a TQDM when scoring stuff  (John Bauer)
2022-11-04  Small end-to-end unit test  (John Bauer)
            Uses very small parameters to make it time- and memory-efficient.
            Checks that loaded model params are the same and that results are the same.
2022-11-04  Put the rest of the model config into args and save the config  (John Bauer)
            Split up large batches of train data to hopefully avoid OOM errors.
            Move criterion to the model, which will allow using it for scoring.
            Add a score method using the criterion.
            Add a load method.
            Score a single sentence as if it were a list of sentences.
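The batch-splitting idea above can be sketched as a simple chunking helper (the name `split_batch` and the item-count limit are assumptions; the repo's OOM-avoidance logic may differ):

```python
def split_batch(batch, max_items):
    """Yield sub-batches of at most max_items elements each, so one
    oversized training batch never has to fit on the GPU at once."""
    for start in range(0, len(batch), max_items):
        yield batch[start:start + max_items]

# ten items split into chunks of at most four
chunks = list(split_batch(list(range(10)), 4))
assert chunks == [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```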
2022-11-04  Load in a parse tree LM dataset  (John Bauer)
            Sort by length.
            Batchify the sentences without flattening them (the ppl is no longer useful with the new data).
            Pass in padding masks.
            Add an argparser for the configuration.
            Refactor a bunch of methods.
            Predict the test loss on the parsed test set... not happy.
            Process a dev pred file too.
            Save model (no config yet).
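Batchifying variable-length sentences without flattening them means padding each one to a common length and building a mask so the model can ignore pad positions. A sketch using the PyTorch convention where True marks a padded slot (the helper name `batchify` here is hypothetical):

```python
def batchify(sentences, pad_id=0):
    """Pad a list of token-id lists to equal length and build a
    key-padding mask (True = padding position, matching PyTorch's
    src_key_padding_mask convention)."""
    max_len = max(len(s) for s in sentences)
    padded = [s + [pad_id] * (max_len - len(s)) for s in sentences]
    mask = [[i >= len(s) for i in range(max_len)] for s in sentences]
    return padded, mask

padded, mask = batchify([[5, 6, 7], [8]])
assert padded == [[5, 6, 7], [8, 0, 0]]
assert mask == [[False, False, False], [False, True, True]]
```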
2022-11-04  Code changes to make the demo work; can refactor things to make loading and saving easier  (John Bauer)
2022-11-04  Copy chunks of tutorial from pytorch.org as a module. Does not run yet, because that was originally a Jupyter notebook: https://pytorch.org/tutorials/beginner/transformer_tutorial.html  (John Bauer)
2022-11-03  Add min_len and max_len args to tokenize_wiki.py. Skip one-line wiki docs, since those are likely to be useless  (John Bauer)
2022-11-02  Fix format error in log line  (John Bauer)
2022-11-01  Slice in a more generic manner when copying a model; makes it easier to make future changes  (John Bauer)
2022-11-01  Set this option in the partitioned test so that it still tests this code path if the lattn_partitioned default changes  (John Bauer)
2022-11-01  lattn_partitioned == False should affect the input proj dimension as well  (John Bauer)
2022-11-01  Add an argument for partitioning / not partitioning lattn  (John Bauer)
2022-11-01  Oops, this was incorrect  (John Bauer)
2022-11-01  Log some stats after all models are created for training (move the log line)  (John Bauer)
2022-11-01  Use some words from the silver dataset (currently |gold| words are added, even if that means some overlaps)  (John Bauer)
2022-10-31  Add a suffix argument to the renormalize script  (John Bauer)
2022-10-31  Script to renormalize Vietnamese diacritics  (John Bauer)
2022-10-30  Add a separate argument for --silver_epoch_size, just in case people want that  (John Bauer)
2022-10-30  Add notes on silver words for the delta embedding  (John Bauer)
2022-10-30  Since we just ran into a bug where checkpoints were not correctly loaded, add a test of exactly that functionality  (John Bauer)
2022-10-30  Update comment  (John Bauer)
2022-10-30  Track how many batches a model gets trained for. Backdoor test for the silver trees, since adding a silver treebank makes an epoch take twice as long  (John Bauer)
2022-10-30  Rough draft of using silver trees  (John Bauer)
            Mostly untested. Includes an unfinished test of the silver data.
2022-10-29  Move uses_xpos() to the model itself and add it to Ensemble. Will make it easier to generalize selftrain.py to use Ensemble as well  (John Bauer)
2022-10-29  Try smaller chunks for parse_text; one giant chunk ran out of GPU memory  (John Bauer)
2022-10-29  Add a couple of hopefully helpful log lines to the parse_text operation  (John Bauer)
2022-10-29  Connect model ensembles to the predict_text functionality  (John Bauer)
2022-10-29  Oops, model was supposed to be set to eval() when run in predict_text mode  (John Bauer)
2022-10-29  Refactor predict dir, file, and format args so they can be used elsewhere if needed  (John Bauer)
2022-10-29  Refactor an unnecessary duplication of arguments  (John Bauer)
2022-10-29  Add functionality to turn a tokenized text file into a file of parse trees  (John Bauer)
2022-10-29  Ignore em dashes in Wikipedia, as those seem to be lists  (John Bauer)
2022-10-29  Add a useful doc on how to build batches from tagged words  (John Bauer)
2022-10-29  Use reasonable defaults for EN and VI ensembles. Can add other languages as relevant (especially IT)  (John Bauer)
2022-10-29  Oops, logger was missing in the retagging.py module  (John Bauer)
2022-10-29  A script for tokenizing a Wikipedia file and writing it out  (John Bauer)
2022-10-28  Accept a single file for wiki processing in selftrain_wiki.py  (John Bauer)
2022-10-28  Fix bug in the lt/gt finding (it can start a line); use FoundationCache to save on memory  (John Bauer)
2022-10-28  Add a rotation to make N non-overlapping dev sets, with the remainder being train, for vlsp22  (John Bauer)
2022-10-28  Simplify reading & writing loop. Will make it easier to 'rotate' the dev set  (John Bauer)
2022-10-28  Add a prototype of model ensembling  (John Bauer)
            Better would be to integrate it with run_constituency and/or refactor some of the methods in Ensemble.
            Error-check the mixture of models.
            Defaults are currently set to English.
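One simple way to ensemble models, sketched here as elementwise averaging of per-candidate scores across models (an assumption for illustration; the repo's Ensemble class may combine models differently):

```python
def ensemble_scores(per_model_scores):
    """Average a list of score vectors, one per model, elementwise.
    All models must score the same candidates in the same order."""
    n = len(per_model_scores)
    return [sum(vals) / n for vals in zip(*per_model_scores)]

# two models scoring the same three candidates
model_a = [0.5, 1.0, 2.0]
model_b = [1.5, 3.0, 2.0]
assert ensemble_scores([model_a, model_b]) == [1.0, 2.0, 2.0]
```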
2022-10-28  Refactor the retagging args & pipeline creation into a separate module  (John Bauer)
2022-10-28  Keep scores when parsing a block of sentences  (John Bauer)
            Unfortunately, the lengths cause a problem when just adding scores for the purposes of a reranker.
            However, this seems to work for ensembling multiple models together.
            Add a keep_scores flag to parse_sentences etc.
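The entry above notes that sentence lengths cause problems when raw scores are simply added for reranking: a summed log-probability always drops as a sentence gets longer, biasing the reranker toward short inputs. One common workaround (an assumption here, not necessarily what the repo adopts) is per-token normalization:

```python
def length_normalized(total_logprob, num_tokens):
    """Average log-probability per token, so long and short sentences
    can be compared on an equal footing."""
    return total_logprob / max(num_tokens, 1)

# same per-token quality, very different raw sums
assert length_normalized(-10.0, 10) == length_normalized(-2.0, 2)
# raw summed scores would have strongly preferred the shorter sentence
assert -2.0 > -10.0
```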
2022-10-28  Fix a comment  (John Bauer)
2022-10-27  Order the pretrains so that resource files are made with a consistent md5sum  (John Bauer)
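Hashing the same set of pretrains in a different iteration order produces a different digest; sorting before serializing makes the resulting file's md5sum reproducible. A sketch with a hypothetical `resources_digest` helper:

```python
import hashlib
import json

def resources_digest(pretrain_names):
    """Serialize the pretrain list in sorted order so the digest does
    not depend on the order the pretrains were discovered in."""
    blob = json.dumps(sorted(pretrain_names)).encode("utf-8")
    return hashlib.md5(blob).hexdigest()

a = resources_digest(["vi_pretrain", "en_pretrain", "it_pretrain"])
b = resources_digest(["it_pretrain", "vi_pretrain", "en_pretrain"])
assert a == b  # discovery order no longer changes the md5sum
```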