your learning rate is too high, setting it to 1
Num weight bits = 13
learning rate = 1
initial_t = 1
power_t = 0.5
using no cache
Reading datafile = train-sets/wiki1K.dat
num sources = 1
average     since       example  example  current  current  current
loss        last        counter   weight    label  predict features
10.380830   10.380830         3      3.0  unknown   0.0000       37
10.456700   10.532571         6      6.0  unknown   0.0000       13
10.398070   10.327713        11     11.0  unknown   0.0000       31
10.499986   10.601903        22     22.0  unknown   0.0000        1
10.451025   10.402064        44     44.0  unknown   0.0000      165
10.446713   10.442300        87     87.0  unknown   0.0000       28
10.076803   9.706893        174    174.0  unknown   0.0000       16
9.506702    8.936600        348    348.0  unknown   0.0000        1
8.930677    8.354652        696    696.0  unknown   0.0000      142

finished run
number of examples = 1000
weighted example sum = 1000
weighted label sum = 0
average loss = 8.716
best constant = 1
total feature number = 86919