Commit messages
them take up less GPU memory, even if the cleanup isn't reliable
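A minimal sketch of the kind of GPU cleanup this appears to describe, assuming PyTorch; the helper name and the move-to-CPU-then-empty-cache approach are assumptions, not the actual change, and as the message says the cleanup is not guaranteed to free everything:

    import gc
    import torch

    def move_off_gpu(model):
        # Move parameters to CPU so the model takes up less GPU memory,
        # then release whatever the CUDA caching allocator will give back.
        # Neither step reliably frees all of the memory.
        model = model.cpu()
        gc.collect()
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
        return model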
likely to run out of GPU memory. Not sure this is the correct approach
pair where the model is not in the dictionary
retrained yet
when the constituency_parser has --predict_file turned on. Allows for easy checking of what happens when multiple models are mixed together.
scored trees or trees with no score (another option would be to attach the score directly to a tree)
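A sketch of how a caller might accept both scored trees and plain trees, as the message discusses; the ScoredTree container and the normalize_trees name are illustrations, not the structure actually used in the code:

    from collections import namedtuple

    # Hypothetical container pairing a tree with its score
    ScoredTree = namedtuple("ScoredTree", ["tree", "score"])

    def normalize_trees(items):
        # Accept either bare trees or (tree, score) pairs and return
        # ScoredTree objects, with score=None when no score was provided.
        normalized = []
        for item in items:
            if isinstance(item, ScoredTree):
                normalized.append(item)
            elif isinstance(item, tuple) and len(item) == 2:
                normalized.append(ScoredTree(*item))
            else:
                normalized.append(ScoredTree(item, None))
        return normalized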
variable comes from
Build the constituency optimizer using knowledge of how far you are in the training process - multistage part 1 gets Adadelta, for example
Test that a multistage training process builds the correct optimizers, including when reloading
When continuing training from a checkpoint, use the existing epochs_trained
Restart epochs count when doing a finetune
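A rough sketch of the idea, not the actual Stanza code: the stage boundary, the non-first-stage optimizer, and the checkpoint keys are assumptions. It shows the optimizer being chosen from how far training has progressed, and epochs_trained being reused from a checkpoint or reset for a finetune:

    import torch

    def build_optimizer(model, epochs_trained, stage_one_epochs=20):
        # Multistage part 1 gets Adadelta, as in the commit message;
        # the later-stage choice of AdamW is only an assumption here.
        if epochs_trained < stage_one_epochs:
            return torch.optim.Adadelta(model.parameters())
        return torch.optim.AdamW(model.parameters(), lr=2e-4)

    def resume_training(model, checkpoint_path, finetune=False):
        checkpoint = torch.load(checkpoint_path)
        model.load_state_dict(checkpoint["model"])
        # Continuing from a checkpoint keeps the saved epochs_trained;
        # a finetune restarts the epoch count at zero.
        epochs_trained = 0 if finetune else checkpoint.get("epochs_trained", 0)
        optimizer = build_optimizer(model, epochs_trained)
        return model, optimizer, epochs_trained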
same pretrains multiple times in the constituency
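A hypothetical cache showing one way to avoid reading the same pretrain more than once; the names and the caching approach are illustrative, not the actual implementation:

    _PRETRAIN_CACHE = {}

    def load_pretrain_cached(filename, load_fn):
        # Read a pretrained embedding file from disk only the first time
        # it is requested; later requests share the same object.
        if filename not in _PRETRAIN_CACHE:
            _PRETRAIN_CACHE[filename] = load_fn(filename)
        return _PRETRAIN_CACHE[filename]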
Can also be applied to other similar datasets
Read sentences & use the tokenization module to align the tokens with the original text
Randomly split the sentences
Write out the sentences and prepare their labels
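A rough sketch of the random-split step only, assuming an 80/10/10 split and a fixed seed; reading the sentences, aligning them via the tokenization module, and preparing the labels are not shown:

    import random

    def split_sentences(sentences, train_frac=0.8, dev_frac=0.1, seed=1234):
        # Shuffle a copy so the original order is left alone, then cut
        # the shuffled list into train / dev / test portions.
        shuffled = list(sentences)
        random.Random(seed).shuffle(shuffled)
        n_train = int(len(shuffled) * train_frac)
        n_dev = int(len(shuffled) * dev_frac)
        return (shuffled[:n_train],
                shuffled[n_train:n_train + n_dev],
                shuffled[n_train + n_dev:])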
missing) to a list of conllu lines. Needed for processing conllu files with eval.py if a dataset doesn't have deps
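A sketch of filling in a missing DEPS column so eval.py can read the file; copying HEAD:DEPREL into DEPS and the function name are assumptions about what the change does:

    def add_fake_deps(conllu_lines):
        # CoNLL-U columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
        fixed = []
        for line in conllu_lines:
            pieces = line.split("\t")
            if len(pieces) == 10 and pieces[8] == "_" and pieces[6] != "_":
                # Build DEPS from the basic dependency: HEAD:DEPREL
                pieces[8] = "%s:%s" % (pieces[6], pieces[7])
            fixed.append("\t".join(pieces))
        return fixed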
Move the read/write conllu functions to a common folder so they can be used elsewhere
Move the MWT_RE etc as well
Move prepare_treebank_labels to common (and rename it)
Move convert_conllu_to_txt as well
Refactor a tokenizer_conllu_name function
some sentences fixed in UD, some updates to the constituency treebank
descriptive. Apparently does not impact tokenization time
require an additional module dependency