## Functional Programming

### Sequences

@lookup pl.seq

A Lua iterator (in its simplest form) is a function which can be repeatedly
called to return a set of one or more values. The `for in` statement understands
these iterators, and loops until the function returns `nil`. There are standard
sequence adapters for tables in Lua (`ipairs` and `pairs`), and `io.lines`
returns an iterator over all the lines in a file. In the Penlight libraries, such
iterators are also called _sequences_.  A sequence of single values (say from
`io.lines`) is called _single-valued_, whereas the sequence defined by `pairs` is
_double-valued_.
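
To make the idea concrete, here is a minimal sketch of a single-valued sequence written as a plain closure. The helper name `upto` is made up for this illustration and is not part of Penlight; the point is only that any function which returns the next value on each call, and `nil` when finished, can drive a `for in` loop.

```lua
-- `upto` is a hypothetical helper for illustration, not a Penlight function.
-- It returns a closure; each call yields the next integer, then nil at the end.
local function upto(first, last)
    local i = first - 1
    return function()
        i = i + 1
        if i <= last then return i end
    end
end

-- The generic `for` keeps calling the closure until it returns nil.
local collected = {}
for i in upto(1, 4) do
    collected[#collected + 1] = i
end
print(table.concat(collected, ' '))  --> 1 2 3 4
```

The state (`i`) lives in the closure, which is why the same function can simply be called again and again.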

`pl.seq` provides a number of useful iterators, and some functions which operate
on sequences. At first sight this example looks like an attempt to write Python
in Lua (except that the range is inclusive):

    > for i in seq.range(1,4) do print(i) end
    1
    2
    3
    4

But `range` is actually equivalent to Python's `xrange`, since it generates a
sequence, not a list. To get a list, use `seq.copy(seq.range(1,10))`, which
takes any single-valued sequence and makes a table from the result. `seq.list` is
like `ipairs` except that it does not give you the index, just the value.

    > for x in seq.list {1,2,3} do print(x) end
    1
    2
    3

`enum` takes a sequence and turns it into a double-valued sequence consisting of
a sequence number and the value, so `enum(list(ls))` is actually equivalent to
`ipairs(ls)`. A more interesting example prints out a file with line numbers:

    for i,v in seq.enum(io.lines(fname)) do print(i..' '..v) end

Sequences can be _combined_, either by 'zipping' them or by concatenating them.

    > l1, l2 = {10,20,30}, {1,2,3}
    > for x,y in seq.zip(l1,l2) do print(x,y) end
    10      1
    20      2
    30      3
    > for x in seq.splice(l1,l2) do print(x) end
    10
    20
    30
    1
    2
    3

`seq.printall` is useful for printing out single-valued sequences, and provides
some finer control over formatting, such as a delimiter, the number of fields per
line, and a format string to use (@see string.format).

    > seq.printall(seq.random(10))
    0.0012512588885159 0.56358531449324 0.19330423902097 ....
    > seq.printall(seq.random(10), ',', 4, '%4.2f')
    0.17,0.86,0.71,0.51
    0.30,0.01,0.09,0.36
    0.15,0.17,

`map` will apply a function to a sequence.

    > seq.printall(seq.map(string.upper, {'one','two'}))
    ONE TWO
    > seq.printall(seq.map('+', {10,20,30}, 1))
    11 21 31

`filter` will filter a sequence using a boolean function (often called a
_predicate_). For instance, this code only prints lines in a file which are
composed of digits:

    for l in seq.filter(io.lines(file), stringx.isdigit) do print(l) end

The following returns a table consisting of all the positive values in the
original table (equivalent to `tablex.filter(ls, '>', 0)`):

    ls = seq.copy(seq.filter(ls, '>', 0))

We've already encountered `seq.sum` when discussing `input.numbers`. This can also
be expressed with `seq.reduce`:

    > = seq.reduce(function(x,y) return x + y end, seq.list{1,2,3,4})
    10

`seq.reduce` applies a binary function in a recursive fashion, so that:

    reduce(op,{1,2,3}) => op(1,reduce(op,{2,3})) => op(1,op(2,3))
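
As a rough sketch of what a reduce over a sequence does, here is a plain-Lua version. The names `list` and `reduce` below are stand-ins written for this illustration and are not Penlight's source. This sketch folds from the left, seeding the accumulator with the first value; for associative operations such as `+` and `*` the grouping makes no difference to the result.

```lua
-- Hypothetical stand-ins for illustration; Penlight's own versions may differ.
local function list(t)
    local i = 0
    return function()
        i = i + 1
        return t[i]
    end
end

-- Left fold: the first value seeds the accumulator, then op(acc, v) is applied
-- to each remaining value in turn.
local function reduce(op, iter)
    local acc = iter()
    if acc == nil then return nil end
    for v in iter do
        acc = op(acc, v)
    end
    return acc
end

local function add(x, y) return x + y end
local function mul(x, y) return x * y end

print(reduce(add, list {1, 2, 3, 4}))  --> 10
print(reduce(mul, list {1, 2, 3, 4}))  --> 24
```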

It's now possible to easily generate other cumulative operations; the standard
operations declared in `pl.operator` are useful here:

    > ops = require 'pl.operator'
    > -- can also say '*' instead of ops.mul
    > = seq.reduce(ops.mul,input.numbers '1 2 3 4')
    24

There are functions to extract statistics from a sequence of numbers:

    > l1 = List {10,20,30}
    > l2 = List {1,2,3}
    > = seq.minmax(l1)
    10      30
    > = seq.sum(l1)
    60      3

It is common to get sequences where values are repeated, say the words in a file.
`count_map` will take such a sequence and count the values, returning a table
where the _keys_ are the unique values, and the value associated with each key is
the number of times that key occurred:

    > t = seq.count_map {'one','fred','two','one','two','two'}
    > = t
    {one=2,fred=1,two=3}

This will also work on numerical sequences, but you cannot expect the result to
be a proper list, i.e. having no 'holes'. Instead, you always need to use `pairs`
to iterate over the result - note that there is a hole at index 5:

    > t = seq.count_map {1,2,4,2,2,3,4,2,6}
    > for k,v in pairs(t) do print(k,v) end
    1       1
    2       4
    3       1
    4       2
    6       1

`unique` uses `count_map` to return a list of the unique values, that is, just
the keys of the resulting table.
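
The counting idea is simple enough to sketch in a few lines of plain Lua. The two functions below are re-implementations written for illustration (they share names with the Penlight functions but are not their source, and they only accept list-like tables).

```lua
-- Hypothetical re-implementations for illustration, not Penlight's source.
local function count_map(values)
    local counts = {}
    for _, v in ipairs(values) do
        counts[v] = (counts[v] or 0) + 1
    end
    return counts
end

-- The unique values are simply the keys of the count table.
local function unique(values)
    local keys = {}
    for k in pairs(count_map(values)) do
        keys[#keys + 1] = k
    end
    table.sort(keys)  -- pairs() order is unspecified, so sort for stable output
    return keys
end

local words = {'one', 'fred', 'two', 'one', 'two', 'two'}
local t = count_map(words)
print(t.one, t.fred, t.two)                 -- prints 2, 1 and 3 (tab-separated)
print(table.concat(unique(words), ','))     --> fred,one,two
```

The sort is only there to make the output repeatable; it is an addition of this sketch.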

`last` turns a single-valued sequence into a double-valued sequence with the
current value and the last value:

    > for current,last in seq.last {10,20,30,40} do print (current,last) end
    20      10
    30      20
    40      30

This makes it easy to do things like identify repeated lines in a file, or
construct differences between values. `filter` can handle double-valued sequences
as well, so one could filter such a sequence to only return cases where the
current value is less than the last value by using `operator.lt` or just '<'.
This code then copies the resulting sequence into a table.

    > s = {10,9,10,3}
    > = seq.copy(seq.filter(seq.last(s),'<'))
    {9,3}
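
For comparison, the same computation can be spelled out by hand. This is only a sketch: the local `last` below is a hypothetical stand-in that works on list-like tables, and the filtering is done with an explicit `if` rather than a sequence function.

```lua
-- Hypothetical sketch of a `last`-style iterator over a list-like table.
-- It starts at the second element and yields (current, previous) pairs.
local function last(t)
    local i = 1
    return function()
        i = i + 1
        if t[i] ~= nil then
            return t[i], t[i - 1]
        end
    end
end

-- Keep only the values that are smaller than their predecessor.
local drops = {}
for current, previous in last {10, 9, 10, 3} do
    if current < previous then
        drops[#drops + 1] = current
    end
end
print(table.concat(drops, ','))  --> 9,3
```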


### Sequence Wrappers

The functions in `pl.seq` cover the common patterns when dealing with sequences,
but chaining these functions together can lead to ugly code. Consider the last
example of the previous section; `seq` is repeated three times and the resulting
expression has to be read right-to-left. The first issue can be helped by local
aliases, so that the expression becomes `copy(filter(last(s),'<'))` but the
second issue refers to the somewhat unnatural order of functional application.
We tend to prefer reading operations from left to right, which is one reason why
object-oriented notation has become popular. Sequence adapters allow this
expression to be written like so:

    seq(s):last():filter('<'):copy()

With this notation, the operation becomes a chain of method calls running from
left to right.

'Sequence' is not a basic Lua type; sequences are generally functions or callable
objects. The expression `seq(s)` wraps a sequence in a _sequence wrapper_, which
is an object which understands all the functions in `pl.seq` as methods. This
object then explicitly represents sequences.
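
A toy version of such a wrapper shows how little machinery is needed. Everything here (`Wrapper`, `wrap`, `from_list`) is invented for this sketch and says nothing about how `pl.seq` is actually written; it only demonstrates that a table holding an iterator, plus a metatable of methods, gives you left-to-right chaining.

```lua
-- Hypothetical toy wrapper; names and structure are assumptions of this sketch.
local Wrapper = {}
Wrapper.__index = Wrapper

local function wrap(iter)
    return setmetatable({iter = iter}, Wrapper)
end

local function from_list(t)
    local i = 0
    return wrap(function() i = i + 1; return t[i] end)
end

-- Each method returns a new wrapper, so calls can be chained.
function Wrapper:map(fn)
    local iter = self.iter
    return wrap(function()
        local v = iter()
        if v ~= nil then return fn(v) end
    end)
end

function Wrapper:filter(pred)
    local iter = self.iter
    return wrap(function()
        local v = iter()
        while v ~= nil and not pred(v) do v = iter() end
        return v
    end)
end

-- copy() ends the chain by draining the iterator into a table.
function Wrapper:copy()
    local res = {}
    for v in self.iter do res[#res + 1] = v end
    return res
end

local result = from_list({1, -2, 10, -1, 2})
    :filter(function(x) return x > 0 end)
    :map(function(x) return x * 2 end)
    :copy()
print(table.concat(result, ','))  --> 2,20,4
```

Nothing is computed until `copy` pulls values through the chain, which is the same lazy behaviour the underlying iterators have.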

As a special case, the constructor (that is, calling the table `seq`) will
make a wrapper for a plain list-like table. Here we apply the length operator to
a sequence of strings, and print out the results.

    > seq{'one','tw','t'} :map '#' :printall()
    3 2 1

As a convenience, there is a function `seq.lines` which behaves just like
`io.lines` except it wraps the result as an explicit sequence type. This takes
the first 10 lines from standard input, makes them uppercase, turns them into a
sequence with a count and the value, glues these together with the concatenation
operator, and finally prints out the sequence delimited by a newline.

    seq.lines():take(10):upper():enum():map('..'):printall '\n'

Note the method `upper`, which is not a `seq` function. If an unknown method is
called, sequence wrappers apply that method to all the values in the sequence
(this is an implicit use of `mapmethod`).

It is straightforward to create custom sequences that can be used in this way. On
Unix, `/dev/random` gives you an _endless_ sequence of random bytes, so we use
`take` to limit the sequence, and then `map` to scale the result into the desired
range. The key step is to use `seq` to wrap the iterator function:

    -- random.lua
    local seq = require 'pl.seq'

    function dev_random()
        local f = io.open('/dev/random')
        local byte = string.byte
        return seq(function()
            -- read two bytes into a string and convert into a 16-bit number
            local s = f:read(2)
            return byte(s,1) + 256*byte(s,2)
        end)
    end

    -- print 10 random numbers from 0 to 1 !
    dev_random():take(10):map('%',100):map('/',100):printall ','


Another Linux one-liner depends on the `/proc` filesystem and makes a list of all
the currently running processes:

    pids = seq(lfs.dir '/proc'):filter(stringx.isdigit):map(tonumber):copy()

This version of Penlight has an experimental feature which relies on the fact
that _all_ Lua types can have metatables, including functions. This makes
_implicit sequence wrapping_ possible:

    > seq.import()
    > seq.random(5):printall(',',5,'%4.1f')
     0.0, 0.1, 0.4, 0.1, 0.2

This avoids the awkward `seq(seq.random(5))` construction. Or the iterator can
come from somewhere else completely:

    > ('one two three'):gmatch('%a+'):printall(',')
    one,two,three,

After `seq.import`, it is no longer necessary to explicitly wrap sequence
functions.

But there is a price to pay for this convenience. _Every_ function is affected,
so that any function can be used, appropriate or not:

    > math.sin:printall()
    ..seq.lua:287: bad argument #1 to '(for generator)' (number expected, got nil)
    > a = tostring
    > = a:find(' ')
    function: 0042C920

What function is returned? It's almost certain to be something that makes no
sense in the current context. So implicit sequences may make certain kinds of
programming mistakes harder to catch - they are best used for interactive
exploration and small scripts.
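
The underlying Lua mechanism can be tried in isolation. The sketch below is not what `seq.import` does internally (that is an assumption I have not checked against the source); it just installs a made-up `collect` method on the single metatable that all functions share, shows the convenience, shows the cost, and then removes the metatable again.

```lua
-- Illustration only: a hypothetical `collect` method shared by ALL functions.
local methods = {}

function methods.collect(iter)
    local res = {}
    for v in iter do
        res[#res + 1] = v
    end
    return res
end

-- Functions share one metatable per Lua state; plain setmetatable only accepts
-- tables, so the debug library is needed to install it.
debug.setmetatable(print, {__index = methods})

-- Any iterator function now has the method, wherever it came from...
local words = ('one two three'):gmatch('%a+'):collect()
print(table.concat(words, ','))  --> one,two,three

-- ...but so does every other function, whether or not it is an iterator.
local leaked = (math.sin.collect == methods.collect)
print(leaked)  --> true

-- Remove the shared metatable again so the rest of the program is unaffected.
debug.setmetatable(print, nil)
```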

<a id="comprehensions"/>

### List Comprehensions

List comprehensions are a compact way to create tables by specifying their
elements. In Python, you can say this:

    ls = [x for x in range(5)]  # == [0,1,2,3,4]

In Lua, using `pl.comprehension`:

    > C = require('pl.comprehension').new()
    > = C ('x for x=1,10') ()
    {1,2,3,4,5,6,7,8,9,10}

`C` is a function which compiles a list comprehension _string_ into a _function_.
In this case, the function has no arguments. The parentheses are redundant for a
function taking a string argument, so this works as well:

    > = C 'x^2 for x=1,4' ()
    {1,4,9,16}
    > = C '{x,x^2} for x=1,4' ()
    {{1,1},{2,4},{3,9},{4,16}}

Note that the expression can be _any_ function of the variable `x`!

The basic syntax so far is `<expr> for <set>`, where `<set>` can be anything that
the Lua `for` statement understands. `<set>` can also just be the variable, in
which case the values will come from the _argument_ of the comprehension. Here
I'm emphasizing that a comprehension is a function which can take a list argument:

    > = C '2*x for x' {1,2,3}
    {2,4,6}
    > dbl = C '2*x for x'
    > = dbl {10,20,30}
    {20,40,60}

Here is a somewhat more explicit way of saying the same thing; `_1` is a
_placeholder_ referring to the _first_ argument passed to the comprehension.

    > = C '2*x for _,x in pairs(_1)' {10,20,30}
    {20,40,60}
    > = C '_1(x) for x'(tostring,{1,2,3,4})
    {'1','2','3','4'}

This extended syntax is useful when you wish to collect the result of some
iterator, such as `io.lines`. This comprehension creates a function which creates
a table of all the lines in a file:

    > f = io.open('array.lua')
    > lines = C 'line for line in _1:lines()' (f)
    > = #lines
    118

There are a number of functions that may be applied to the result of a
comprehension:

    > = C 'min(x for x)' {1,44,0}
    0
    > = C 'max(x for x)' {1,44,0}
    44
    > = C 'sum(x for x)' {1,44,0}
    45

(These are equivalent to a reduce operation on a list.)

After the `for` part, there may be a condition, which filters the output. This
comprehension collects the even numbers from a list:

    > = C 'x for x if x % 2 == 0' {1,2,3,4,5}
    {2,4}

There may be a number of `for` parts:

    > = C '{x,y} for x = 1,2 for y = 1,2' ()
    {{1,1},{1,2},{2,1},{2,2}}
    > = C '{x,y} for x for y' ({1,2},{10,20})
    {{1,10},{1,20},{2,10},{2,20}}

These comprehensions are useful when dealing with functions of more than one
variable, and are not so easily achieved with the other Penlight functional forms.
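
For comparison, this is roughly what the two-list comprehension `'{x,y} for x for y'` saves you from writing by hand. The function name `pairs_of` is made up for the sketch, and it assumes both arguments are list-like tables.

```lua
-- Hand-written equivalent of C '{x,y} for x for y' ({1,2},{10,20}).
-- `pairs_of` is a hypothetical name used only for this illustration.
local function pairs_of(xs, ys)
    local res = {}
    for _, x in ipairs(xs) do
        for _, y in ipairs(ys) do
            res[#res + 1] = {x, y}
        end
    end
    return res
end

local res = pairs_of({1, 2}, {10, 20})
for _, p in ipairs(res) do
    print(p[1], p[2])
end
-- prints the pairs (1,10), (1,20), (2,10), (2,20), one per line
```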

<a id="func"/>

### Creating Functions from Functions

@lookup pl.func

Lua functions may be treated like any other value, although of course you cannot
multiply or add them. One operation that makes sense is _function composition_,
which chains function calls (so `(f * g)(x)` is `f(g(x))`.)

    > func = require 'pl.func'
    > printf = func.compose(io.write,string.format)
    > printf("hello %s\n",'world')
    hello world
    true
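
Composition itself is a one-liner in plain Lua. The local `compose` below is a minimal sketch for two functions, written for illustration rather than copied from `pl.func`.

```lua
-- Hypothetical minimal `compose`; not Penlight's implementation.
local function compose(f, g)
    return function(...)
        return f(g(...))
    end
end

-- string.format runs first, then string.upper is applied to its result.
local shout = compose(string.upper, string.format)
print(shout('hello %s', 'world'))  --> HELLO WORLD
```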

Many functions require you to pass a function as an argument, say to apply to all
values of a sequence or as a callback. Often useful functions have the wrong
number of arguments. So there is a need to construct a function of one argument
from one of two arguments, _binding_ the extra argument to a given value.

_Partial application_ takes a function of n arguments and returns a function of n-1
arguments where the first argument is bound to some value:

    > p2 = func.bind1(print,'start>')
    > p2('hello',2)
    start>  hello   2
    > ops = require 'pl.operator'
    > = tablex.filter({1,-2,10,-1,2},func.bind1(ops.gt,0))
    {-2,-1}
    > = tablex.filter({1,-2,10,-1,2},func.bind1(ops.le,0))
    {1,10,2}

The last example unfortunately reads backwards, because `bind1` always binds the
first argument!  Also unfortunately, in my youth I confused 'currying' with
'partial application', so the old name for `bind1` is `curry` - this alias still exists.
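
Writing the binding out by hand makes the 'backwards' reading obvious. This `bind1` and `gt` are local stand-ins for the sketch, not the library functions.

```lua
-- Hypothetical minimal `bind1`: fix the first argument, pass the rest through.
local function bind1(fn, first)
    return function(...)
        return fn(first, ...)
    end
end

local function gt(a, b) return a > b end

-- bind1(gt, 0) means "0 > x", so it selects the NEGATIVE values.
local zero_gt = bind1(gt, 0)
print(zero_gt(-2), zero_gt(10))  -- prints true and false (tab-separated)
```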

This is a specialized form of function argument binding. Here is another way
to say the `print` example:

    > p2 = func.bind(print,'start>',func._1,func._2)
    > p2('hello',2)
    start>  hello   2

where `_1` and `_2` are _placeholder variables_, corresponding to the first and
second argument respectively.

Having `func` all over the place is distracting, so it's useful to pull all of
`pl.func` into the local context. Here is the filter example, this time the right
way around:

    > utils.import 'pl.func'
    > = tablex.filter({1,-2,10,-1,2},bind(ops.gt, _1, 0))
    {1,10,2}

`tablex.merge` does a general merge of two tables. This example shows the
usefulness of binding the last argument of a function.

    > S1 = {john=27, jane=31, mary=24}
    > S2 = {jane=31, jones=50}
    > intersection = bind(tablex.merge, _1, _2, false)
    > union = bind(tablex.merge, _1, _2, true)
    > = intersection(S1,S2)
    {jane=31}
    > = union(S1,S2)
    {mary=24,jane=31,john=27,jones=50}

When using `bind` with `print`, we got a function of precisely two arguments,
whereas we really want our function to use varargs like `print`. This is the role
of `_0`:

    > _DEBUG = true
    > p = bind(print,'start>', _0)
    return function (fn,_v1)
        return function(...) return fn(_v1,...) end
    end

    > p(1,2,3,4,5)
    start>  1       2       3       4       5

I've turned on the global `_DEBUG` flag, so that the function generated is
printed out. It is actually a function which _generates_ the required function;
the first call _binds the value_ of `_v1` to 'start>'.

### Placeholder Expressions

A common pattern in Penlight is a function which applies another function to all
elements in a table or a sequence, such as `tablex.map` or `seq.filter`. Lua does
anonymous functions well, although they can be a bit tedious to type:

    > = tablex.map(function(x) return x*x end, {1,2,3,4})
    {1,4,9,16}

`pl.func` allows you to define _placeholder expressions_, which can cut down on
the typing required, and also make your intent clearer. First, we bring the
contents of `pl.func` into our context, and then supply an expression using
placeholder variables, such as `_1`, `_2`, etc. (C++ programmers will recognize
this from the Boost libraries.)

    > utils.import 'pl.func'
    > = tablex.map(_1*_1, {1,2,3,4})
    {1,4,9,16}

Functions of up to 5 arguments can be generated.

    > = tablex.map2(_1+_2,{1,2,3}, {10,20,30})
    {11,22,33}

These expressions can use arbitrary functions, although they must first be
registered with the functional library. `func.register` brings in a single
function, and `func.import` brings in a whole table of functions, such as `math`.

    > sin = register(math.sin)
    > = tablex.map(sin(_1), {1,2,3,4})
    {0.8414709848079,0.90929742682568,0.14112000805987,-0.75680249530793}
    > import 'math'
    > = tablex.map(cos(2*_1),{1,2,3,4})
    {-0.41614683654714,-0.65364362086361,0.96017028665037,-0.14550003380861}

A common operation is calling a method of a set of objects:

    > = tablex.map(_1:sub(1,1), {'one','four','x'})
    {'o','f','x'}

There are some restrictions on what operators can be used in PEs. For instance,
because the `__len` metamethod cannot be overridden by plain Lua tables, we need
to define a special function to express `#_1`:

    > = tablex.map(Len(_1), {'one','four','x'})
    {3,4,1}

Likewise for comparison operators, which cannot be overloaded for _different_
types, and thus also have to be expressed as a special function:

    > = tablex.filter(Gt(_1,0), {1,-1,2,4,-3})
    {1,2,4}

It is useful to express the fact that a function returns multiple values. For
instance, `tablex.pairmap` expects a function that will be called with the key
and the value, and returns the new value and the key, in that order.

    > = tablex.pairmap(Args(_2,_1:upper()),{fred=1,alice=2})
    {ALICE=2,FRED=1}

PEs cannot contain `nil` values, since PE function arguments are represented as
an array. Instead, a special value called `Nil` is provided.  So say
`_1:f(Nil,1)` instead of `_1:f(nil,1)`.

A placeholder expression cannot be automatically used as a Lua function. The
technical reason is that the call operator must be overloaded to construct
function calls like `_1(1)`.  If you want to force a PE to return a function, use
`func.I`.

    > = tablex.map(_1(10),{I(2*_1),I(_1*_1),I(_1+2)})
    {20,100,12}

Here we make a table of functions taking a single argument, and then call them
all with a value of 10.

The essential idea with PEs is to 'quote' an expression so that it is not
immediately evaluated, but instead turned into a function that can be applied
later to some arguments. The basic mechanism is to wrap values and placeholders
so that the usual Lua operators have the effect of building up an _expression
tree_. (It turns out that you can do _symbolic algebra_ using PEs, see
`symbols.lua` in the examples directory, and its test runner `testsym.lua`, which
demonstrates symbolic differentiation.)
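
Here is a deliberately tiny sketch of the quoting idea, supporting only `+` and `*`. All of the names are local to the sketch, and unlike the real library (which, as noted later, constructs and compiles a function) this version simply walks the tree each time it is called.

```lua
-- Toy expression trees via operator overloading; an illustration, not pl.func.
local Expr = {}
Expr.__index = Expr

local function node(op, a, b) return setmetatable({op = op, a, b}, Expr) end
local function placeholder(i) return setmetatable({op = 'arg', index = i}, Expr) end

-- Operators on a quoted operand build a new tree node instead of computing.
Expr.__add = function(a, b) return node('+', a, b) end
Expr.__mul = function(a, b) return node('*', a, b) end

local function eval(e, args)
    if getmetatable(e) ~= Expr then return e end  -- plain constant
    if e.op == 'arg' then return args[e.index] end
    local x, y = eval(e[1], args), eval(e[2], args)
    if e.op == '+' then return x + y end
    return x * y
end

-- Turn a quoted expression into an ordinary function (named after func.I).
local function I(e)
    return function(...) return eval(e, {...}) end
end

local _1, _2 = placeholder(1), placeholder(2)
local square_plus = I(_1 * _1 + _2)
print(square_plus(3, 4))  --> 13
local double = I(2 * _1)
print(double(10))         --> 20

-- Indexing a plain table is not an overloaded operator, so nothing is quoted:
local set = {a = true}
print(set[_1])            --> nil
```

Note that `2 * _1` works even though the left operand is a plain number, because Lua falls back to the metamethod of the second operand.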

The rule is that if any operator has a PE operand, the result will be quoted.
Sometimes we need to quote things explicitly. For instance, say we want to pass a
function to a filter that must return true if the element value is in a set.
`set[_1]` is the obvious expression, but it does not give the desired result,
since it evaluates directly, giving `nil`. Indexing works differently than a
binary operation like addition (`set+_1` _is_ properly quoted) so there is a need
for an explicit quoting or wrapping operation. This is the job of the `_`
function; the PE in this case should be `_(set)[_1]`.  This works for functions
as well, as a convenient alternative to registering functions: `_(math.sin)(_1)`.
Quoting an object also lets a method call be turned into an iterator. For an open
file `f`, the following is equivalent to using the `lines` method:

    for line in I(_(f):read()) do print(line) end

Now this will work for _any_ 'file-like' object which has a `read` method
returning the next line. If you had a LuaSocket client which was being 'pushed'
by lines sent from a server, then `_(s):receive '*l'` would create an iterator
for accepting input. These forms can be convenient for adapting your data flow so
that it can be passed to the sequence functions in `pl.seq`.

Placeholder expressions can be mixed with sequence wrapper expressions.
`lexer.lua` will give us a double-valued sequence of tokens, where the first
value is a type, and the second is a value. We filter out only the values where
the type is 'iden', extract the actual value using `map`, get the unique values
and finally copy to a list.

    > str = 'for i=1,10 do for j = 1,10 do print(i,j) end end'
    > = seq(lexer.lua(str)):filter('==','iden'):map(_2):unique():copy()
    {i,print,j}

This is a particularly intense line (and I don't always suggest making everything
a one-liner!); the key is the behaviour of `map`, which will take both values of
the sequence, so `_2` returns the value part. (Since `filter` here takes extra
arguments, it only operates on the type values.)

There are some performance considerations to using placeholder expressions.
Instantiating a PE requires constructing and compiling a function, which is not
such a fast operation. So to get the best performance, factor PEs out of loops,
like this:

    local fn = I(_1:f() + _2:g())
    for i = 1,n do
        res[i] = tablex.map2(fn,first[i],second[i])
    end