Num weight bits = 18
learning rate = 10
initial_t = 1
power_t = 0.5
decay_learning_rate = 1
creating cache_file = train-sets/wsj_small.dat.gz.cache
Reading from train-sets/wsj_small.dat.gz
num sources = 1
average    since      sequence  example     current label          current predicted      current  cur  cur  predic.  examples
loss       last       counter   weight      sequence prefix        sequence prefix        features pass pol  made     gener.
22.333333  22.333333        3     3.000000  [14 10 13 9 1 2 1 4..] [11 1 2 1 2 3 1 2 3..]     2659    0    0       93        64
20.833333  19.333333        6     6.000000  [19 2 22 4 3 9 1 1 ..] [19 2 3 9 1 2 1 1 1..]     3324    0    0      196       160
17.909091  14.400000       11    11.000000  [29 4 3 9 1 1 23 8 ..] [1 2 3 9 1 1 1 10 7..]     1424    0    0      328       312
14.000000  10.090909       22    22.000000  [11 11 21 3 10 13 3..] [11 11 21 3 1 2 3 1..]     3419    0    0      613       576
10.909091   7.818182       44    44.000000  [3 26 9 1 4 3 1 2 5..] [3 1 1 1 2 3 1 2 2 ..]     1139    0    0     1120      1107
 8.494253   6.023256       87    87.000000  [11 11 12 9 1 2 11 ..] [11 11 12 9 1 2 11 ..]     2564    1    0     2220      2192
 4.344828   0.195402      174   174.000000  [11 1 10 13 2 17 30..] [11 1 10 13 2 17 30..]     1044    2    0     4370      4358
 2.839080   1.333333      348   348.000000  [2 11 2 11 12 3 11 ..] [2 11 9 11 12 3 11 ..]     2279    4    1    49096      8673
 4.918103   6.997126      696   696.000000  [19 22 4 5 3 1 2 1 ..] [28 5 12 7 41 32 27..]     2754    8    2   478276     17212

finished run
number of examples = 936
weighted example sum = 936
weighted label sum = 0
average loss = 9.39
best constant = -0.00107
total feature number = 87398973