Num weight bits = 18
learning rate = 10
initial_t = 1
power_t = 0.5
decay_learning_rate = 1
creating cache_file = train-sets/wsj_small.dat.gz.cache
Reading from train-sets/wsj_small.dat.gz
num sources = 1
average    since      sequence  example     current label          current predicted      current  cur  cur  predic.  examples
loss       last       counter   weight      sequence prefix        sequence prefix        features pass pol  made     gener.
22.333333  22.333333        3     3.000000  [14 10 13 9 1 2 1 4..] [11 1 2 1 2 3 1 2 3..]     2659    0    0       93        64
20.833333  19.333333        6     6.000000  [19 2 22 4 3 9 1 1 ..] [19 2 3 9 1 2 1 1 1..]     3324    0    0      196       160
17.909091  14.400000       11    11.000000  [29 4 3 9 1 1 23 8 ..] [1 2 3 9 1 1 1 10 7..]     1424    0    0      328       312
14.000000  10.090909       22    22.000000  [11 11 21 3 10 13 3..] [11 11 21 3 1 2 3 1..]     3419    0    0      613       576
10.909091   7.818182       44    44.000000  [3 26 9 1 4 3 1 2 5..] [3 1 1 1 2 3 1 2 2 ..]     1139    0    0     1120      1107
 8.494253   6.023256       87    87.000000  [11 11 12 9 1 2 11 ..] [11 11 12 9 1 2 11 ..]     2564    1    0     2220      2192
 4.344828   0.195402      174   174.000000  [11 1 10 13 2 17 30..] [11 1 10 13 2 17 30..]     1044    2    0     4370      4358
 2.839080   1.333333      348   348.000000  [2 11 2 11 12 3 11 ..] [2 11 9 11 12 3 11 ..]     2279    4    1    49096      8673
 4.918103   6.997126      696   696.000000  [19 22 4 5 3 1 2 1 ..] [28 5 12 7 41 32 27..]     2754    8    2   478276     17212

finished run
number of examples = 936
weighted example sum = 936
weighted label sum = 0
average loss = 9.39
best constant = -0.00107
total feature number = 87398973