====== Optimization Package ======
{{anchor:optim.dok}}
This package provides a set of optimization algorithms, which all follow
a unified, closure-based API.
This package is fully compatible with the 'nn' package, but can also be
used to optimize arbitrary objective functions.
For now, the following algorithms are provided:
* Stochastic Gradient Descent (SGD): [[#optim.sgd|optim.sgd]]
* Averaged Stochastic Gradient Descent (ASGD): [[#optim.asgd|optim.asgd]]
* L-BFGS: [[#optim.lbfgs|optim.lbfgs]]
* Conjugate Gradients (CG): [[#optim.cg|optim.cg]]
All these algorithms are designed to support batch optimization as
well as stochastic optimization. It's up to the user to construct an
objective function that represents the batch, mini-batch, or single sample
on which to evaluate the objective.
Some of these algorithms support a line search, which can be passed as
a function (L-BFGS), whereas others only support a learning rate (SGD).
====== Overview of the Optimization Package ======
{{anchor:optim.overview.dok}}
Rather than long descriptions, let's simply start with a little example. The following is an illustrative sketch (the quadratic objective, the learningRate option, and the loop count are assumptions, not prescribed by the package): it minimizes f(w) = ||w - target||^2 with optim.sgd.
<file lua>
require 'optim'

-- illustrative objective: f(w) = ||w - target||^2
local target = torch.Tensor{1, 2, 3}

local func = function(w)
   local diff = w - target
   local f = diff:dot(diff)   -- objective value at w
   local df_dw = diff * 2     -- gradient of f with respect to w
   return f, df_dw
end

local w = torch.zeros(3)             -- initial parameter vector
local state = {learningRate = 0.1}   -- assumed option name for SGD's step size

local fs
for i = 1, 100 do
   w, fs = optim.sgd(func, w, state)
end
print(fs[#fs])   -- final objective value, close to 0
</file>
===== Simple Objective =====
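Any Lua closure that maps a parameter vector to an objective value and its gradient can serve as an objective. As an illustrative sketch (the Rosenbrock function and the names below are assumptions, not part of the package), here is a classical test objective:
<file lua>
-- illustrative sketch: the 2D Rosenbrock function as an objective closure
-- f(w) = (1 - w[1])^2 + 100 * (w[2] - w[1]^2)^2
local func = function(w)
   local w1, w2 = w[1], w[2]
   local f = (1 - w1)^2 + 100 * (w2 - w1^2)^2
   local df_dw = torch.Tensor(2)
   df_dw[1] = -2 * (1 - w1) - 400 * w1 * (w2 - w1^2)
   df_dw[2] = 200 * (w2 - w1^2)
   return f, df_dw
end
</file>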
===== Neural Network Objective =====
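To optimize an 'nn' model, flatten its parameters and gradients into single vectors and wrap the forward/backward passes in a closure. A minimal sketch, assuming a small nn.Sequential model, an MSE criterion, and a single random (input, target) pair (all of these are illustrative choices):
<file lua>
require 'nn'

-- illustrative model and criterion
local model = nn.Sequential()
model:add(nn.Linear(10, 5))
model:add(nn.Tanh())
model:add(nn.Linear(5, 1))
local criterion = nn.MSECriterion()

-- flatten all trainable parameters and their gradients into single vectors
local w, dl_dw = model:getParameters()

-- a single random sample; in practice this would be a sample or
-- mini-batch drawn from your dataset
local input, target = torch.randn(10), torch.randn(1)

local func = function(w_new)
   if w ~= w_new then w:copy(w_new) end
   dl_dw:zero()   -- gradients accumulate across calls, so reset them first
   local output = model:forward(input)
   local f = criterion:forward(output, target)
   model:backward(input, criterion:backward(output, target))
   return f, dl_dw
end
</file>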
====== Algorithms ======
{{anchor:optim.API}}
All the algorithms provided rely on a unified interface:
<file lua>
w_new,fs = optim.method(func,w,state)
</file>
where:
* w is the trainable/adjustable parameter vector,
* state contains both the options for the algorithm and the state of the algorithm,
* func is a closure with the following interface:
<file lua>
f,df_dw = func(w)
</file>
* w_new is the new parameter vector (after optimization),
* fs is a table containing all the values of the objective, as evaluated during the optimization procedure: fs[1] is the value before optimization, and fs[#fs] is the last one (the lowest).
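For example (a sketch; learningRate is an assumed option name for SGD, see the algorithm notes below):
<file lua>
local state = {learningRate = 1e-2}   -- assumed option name
local w_new, fs = optim.sgd(func, w, state)
print(fs[1], fs[#fs])   -- first and last objective values recorded during the call
</file>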
===== [x] sgd(func, w, state) =====
{{anchor:optim.sgd}}
An implementation of Stochastic Gradient Descent.
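A usage sketch follows; the option names (learningRate, learningRateDecay, weightDecay, momentum) are assumptions based on common SGD parametrizations, not confirmed by this page:
<file lua>
-- usage sketch; all option names are assumptions
local state = {
   learningRate = 1e-2,        -- step size
   learningRateDecay = 1e-4,   -- annealing of the step size over evaluations
   weightDecay = 0,            -- L2 regularization coefficient
   momentum = 0                -- classical momentum term
}
w, fs = optim.sgd(func, w, state)
</file>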
===== [x] asgd(func, w, state) =====
{{anchor:optim.asgd}}
An implementation of Averaged Stochastic Gradient Descent.
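A usage sketch follows; the option names (eta0, lambda, alpha, t0) are assumptions based on common ASGD parametrizations, not confirmed by this page:
<file lua>
-- usage sketch; all option names are assumptions:
-- eta0 = initial step size, lambda = decay term,
-- alpha = decay power, t0 = point at which averaging starts
local state = {eta0 = 1e-2, lambda = 1e-4, alpha = 0.75, t0 = 1e6}
w, fs = optim.asgd(func, w, state)
</file>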
===== [x] lbfgs(func, w, state) =====
{{anchor:optim.lbfgs}}
An implementation of L-BFGS.
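As noted above, L-BFGS can take a line search passed as a function. A usage sketch (the option names are assumptions, and myLineSearch is a hypothetical placeholder for a user-supplied line-search function):
<file lua>
-- usage sketch; option names are assumptions, myLineSearch is hypothetical
local state = {
   maxIter = 100,              -- maximum number of iterations
   lineSearch = myLineSearch   -- optional line-search function
}
w, fs = optim.lbfgs(func, w, state)
</file>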
===== [x] cg(func, w, state) =====
{{anchor:optim.cg}}
An implementation of the Conjugate Gradient method.
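A usage sketch (the option names are assumptions based on common CG parametrizations, not confirmed by this page):
<file lua>
-- usage sketch; option names are assumptions
local state = {
   maxIter = 25,   -- maximum number of CG iterations
   maxEval = 30    -- maximum number of function evaluations
}
w, fs = optim.cg(func, w, state)
</file>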