
github.com/stanfordnlp/stanza.git
Age | Commit message | Author
2022-09-15 | Bump version number to release a few small changes [HEAD, v1.4.2, main] | John Bauer
2022-09-15 | Normalize and sort dependencies, add transformers extra (#1124) | Nicholas Bollweg
    Normalize and sort the dependency syntax, add a transformers extra
2022-09-15 | Hide the imports of SiLU and Mish from older versions of torch (#1120) | John Bauer
2022-09-15 | Python 3.9 is now a supported version | John Bauer
2022-09-15 | Stop requiring pytest for all installations. Instead, we can hopefully have the test suite install the extra_requires | John Bauer
    https://github.com/stanfordnlp/stanza/issues/1120
    Install 'test' from setup.py as part of running the unit tests
2022-09-15 | Switch dict + list to OrderedDict | John Bauer
2022-09-14 | Update a couple of versions in README.md to better reflect reality: we support 3.8 & 3.9 on conda, and CoreNLP is past 4.1.0 | John Bauer
2022-09-14 | Squeeze a little bit more: only use depparse in the depparse pipeline [v1.4.1] | John Bauer
2022-09-14 | Turn some multilingual pipeline tests into fixtures. Again, this should save memory | John Bauer
2022-09-14 | Turn some pipelines that get built over and over into fixtures. This will make them take up less GPU memory, even if the cleanup isn't reliable | John Bauer
2022-09-14 | Now there should be POS models which match the PL charlms as well | John Bauer
2022-09-14 | Simpler way to have a PL-specific charlm for NER | John Bauer
2022-09-14 | Try to reduce the scope of various pipelines to make the test suite less likely to run out of GPU memory. Not sure this is the correct approach | John Bauer
2022-09-14 | Lower the log level on some messages we don't want written to the pipeline | John Bauer
2022-09-14 | Oops, bugfix. Otherwise you get the whole dictionary for a language/model pair where the model is not in the dictionary | John Bauer
2022-09-13 | Temporarily don't include charlm in the POS models for PL; those haven't been retrained yet | John Bauer
2022-09-13 | Add charlm to the sentiment dependencies when building resources.json | John Bauer
2022-09-13 | PL now has an NER model | John Bauer
2022-09-13 | Add a couple of sentiment models for v1.4.1 | John Bauer
2022-09-13 | Add a tool to evaluate treebanks that are written out by a parser, such as when the constituency_parser has --predict_file turned on. This allows easy checking of what happens when multiple models are mixed together | John Bauer
2022-09-13 | Refactor a little bit. Make the scoring interface handle either scored trees or trees with no score (another option would be to attach the score directly to a tree) | John Bauer
2022-09-13 | Write trees with format _O by default | John Bauer
2022-09-12 | Don't double save_dir if the user gives save_dir as part of the model filename | John Bauer
2022-09-12 | Fix remove_optimizer mode | John Bauer
2022-09-12 | Throw out batches which have gone to NaN. Log the number of times this happens | John Bauer
2022-09-10 | Not sure why, but AnCora xpos tags are half-finished | John Bauer
2022-09-10 | Add a debug log line to reloading optimizers in conparse | John Bauer
2022-09-10 | Fix typo | John Bauer
2022-09-10 | Add some doc and rename dev_sentence -> dep_sentence to reflect where the variable comes from | John Bauer
2022-09-10 | Fix a few tag errors when reading VIT constituents | John Bauer
2022-09-10 | Always save checkpoints. Always load from a checkpoint if one exists [con_checkpoint] | John Bauer
    Build the constituency optimizer using knowledge of how far along the training process is; multistage part 1 gets Adadelta, for example
    Test that a multistage training process builds the correct optimizers, including when reloading
    When continuing training from a checkpoint, use the existing epochs_trained
    Restart the epochs count when doing a finetune
2022-09-10 | Technically it works the old way, but the filenames look silly | John Bauer
2022-09-10 | Verify that hooks behave as expected when loading & saving | John Bauer
2022-09-08 | Add a method to get the constituents known by a conparser, as requested in #1066 | John Bauer
2022-09-08 | NER get_known_tags possibly applies to multiple models | John Bauer
2022-09-08 | relearn_structure should reuse the foundation_cache if possible | John Bauer
2022-09-08 | Use the same foundation cache as the retag_pipeline to avoid reloading the same pretrains multiple times in the constituency parser | John Bauer
2022-09-08 | A script which converts Sindhi tokenization from Isra University | John Bauer
    Can also be applied to other similar datasets
    Read sentences & use the tokenization module to align the tokens with the original text
    Randomly split the sentences
    Write out the sentences and prepare their labels
2022-09-08 | Add a function which adds fake dependencies (if regular dependencies are missing) to a list of conllu lines. Needed for processing conllu files with eval.py if a dataset doesn't have deps | John Bauer
2022-09-08 | Rearrange a bunch of functions from prepare_tokenizer_treebank to a common file | John Bauer
    Move the read/write conllu functions to a common folder so they can be used elsewhere
    Move the MWT_RE etc. as well
    Move prepare_treebank_labels to common (and rename it)
    Move convert_conllu_to_txt as well
    Refactor a tokenizer_conllu_name function
2022-09-08 | Reformat epoch logging in the conparser | John Bauer
2022-09-08 | Eliminate a redundant function call | John Bauer
2022-09-08 | Update a comment on a sentence being eliminated in constituency VIT | John Bauer
2022-09-07 | Add a test of small cache size in the multilingual pipeline | John Bauer
2022-09-07 | remove -> pop for dict. Addresses #1115 | John Bauer
2022-09-07 | Add pytest marks to the langid tests | John Bauer
2022-09-07 | Separate the langid test into two separate test scripts | John Bauer
2022-09-07 | Update for the latest version of the constituency treebank | John Bauer
    Some sentences fixed in UD, some updates to the constituency treebank
2022-09-07 | If a ValueError happens while tokenizing, try to make it a bit more descriptive. Apparently does not impact tokenization time | John Bauer
2022-09-06 | Temporarily extract a .tar.gz file if it's not extracted on the file system | John Bauer