github.com/stanfordnlp/stanza.git
Age | Commit message | Author
2021-08-23 | Update trainer.py [wordinput-sentsegmenter] | Gordon
2021-08-23 | Update utils.py | Gordon
2021-08-23 | Update vocab.py | Gordon
2021-08-23 | Update model.py | Gordon
2021-08-23 | Update data.py | Gordon
2021-08-12 | Fix some whitespace | John Bauer
2021-08-10 | includes TEST_100K as test set for BEST eval | Gordon
2021-08-10 | includes TEST_100K as test set for BEST eval | Gordon
2021-08-10 | Updated BEST to include TEST_100K | Gordon
2021-08-10 | Open/close files in a context to guarantee handles are closed | John Bauer
2021-08-10 | Dictionary redo (#776) | vythaihn
2021-08-04 | Create thai_syllable_dict_generator.py | Gordon
2021-08-03 | Also check if the test set has tags not present in the tagger or if the train... | John Bauer
2021-08-03 | Add a test to see if any tags are in the dev set but not the train set | John Bauer
2021-08-03 | Change file not found to an error | John Bauer
2021-08-02 | Add two new NER models to the resources | John Bauer
2021-07-31 | This test was backwards, causing a bunch of stray java processes when a conte... | John Bauer
2021-07-31 | skip langid tests until resources set up | J38
2021-07-29 | Update word embedding to the dimension in the file when creating a new model | John Bauer
2021-07-28 | Double check that the length of the processors list is 2 when adding just tok... | John Bauer
2021-07-28 | Add mwt if tokenize is passed without MWT (#777) | David Riff
2021-07-27 | Add vlsp pos dataset option for VLSP WS task (#772) | vythaihn
2021-07-25 | Add some explanation to the logging output for the NER scores | John Bauer
2021-07-25 | Add processing for it_fbk. Uses the .tsv file they sent us and their recomme... | John Bauer
2021-07-25 | Add the ability for the ner model to upscale basic (no B- or I-) tagging -> B... | John Bauer
2021-07-25 | Add a processing step for NHCLT datasets. Currently Afrikaans is the most us... | John Bauer
2021-07-25 | Make the matrix more readable when there are a ton of categories | John Bauer
2021-07-25 | Format ints differently from floats in the confusion matrix | John Bauer
2021-07-25 | Add a confusion matrix over tokens to the output of the ner_tagger | John Bauer
2021-07-25 | Add a flag for finetuning from a different load name from the save_name | John Bauer
2021-07-25 | If given an empty list, simply return an empty list when sort is called. Fix... | John Bauer
2021-07-24 | Merge pull request #766 from stanfordnlp/thai_lst20_redo | John Bauer
2021-07-23 | Add a test of empty text for the pipeline | John Bauer
2021-07-23 | Add indentation to the json rather than saving it in one large dump | John Bauer
2021-07-23 | Fix command line for hindi datasets | John Bauer
2021-07-23 | Process gz files as well as .txt and .txt.xz in the charlm | John Bauer
2021-07-22 | Adjust orchid preparation script to always include spaces after sentences | John Bauer
2021-07-22 | Add a test which checks that the orchid results are consistent | John Bauer
2021-07-22 | Add a longer test for a couple different variations on processing text | John Bauer
2021-07-22 | Add an option to split clauses into sentences if a space is between clauses | John Bauer
2021-07-22 | Add more notes on how the tokenization boundaries are determined | John Bauer
2021-07-22 | Add an option to add spaces after the sentence ends (which is actually more c... | John Bauer
2021-07-22 | Add a lot of notes on how the characters are expected to line up in the test | John Bauer
2021-07-21 | Attempt to add a helpful error explaining where it looked for LST20 | John Bauer
2021-07-21 | Add a tiny test for part of the LST20 preparation | John Bauer
2021-07-21 | Make the retokenization an option for the lst20 dataset | John Bauer
2021-07-21 | Use pythainlp to resplit lst20 sentences as well | John Bauer
2021-07-21 | Refactor some of the processing code which uses pythainlp | John Bauer
2021-07-21 | Revert "Adjust the newpar title" | John Bauer
2021-07-20 | Standardize the final short_name of the hindi ner dataset regardless of which... | John Bauer