Welcome to mirror list, hosted at ThFree Co, Russian Federation.

github.com/stevedonovan/Penlight.git - Unnamed repository; edit this file 'description' to name the repository.
summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorsteve donovan <steve.j.donovan@gmail.com>2013-03-08 16:21:53 +0400
committersteve donovan <steve.j.donovan@gmail.com>2013-03-08 16:21:53 +0400
commit5c6078254c0f2ec81572d726c2027354d943dbfa (patch)
tree3d7f16930eb5e9fed9f84c8e4d434ce2afd86610
parent2c0de51c7f77deb8c396daf8448a6aa9dac6a96f (diff)
updated manual and XML test1.1.0
-rw-r--r--docs/config.ld2
-rw-r--r--docs/manual/01-introduction.md21
-rw-r--r--docs/manual/03-strings.md3
-rw-r--r--docs/manual/06-data.md100
-rw-r--r--tests/test-xml.lua29
5 files changed, 139 insertions, 16 deletions
diff --git a/docs/config.ld b/docs/config.ld
index 1130a28..13d4c94 100644
--- a/docs/config.ld
+++ b/docs/config.ld
@@ -1,5 +1,5 @@
project = 'Penlight'
-description = 'Penlight Lua Libraries 1.0.3'
+description = 'Penlight Lua Libraries 1.1.0'
full_description = 'The documentation is available @{01-introduction.md|here}.'
title = 'Penlight Documentation'
dir = 'api'
diff --git a/docs/manual/01-introduction.md b/docs/manual/01-introduction.md
index 2006028..5b5deb1 100644
--- a/docs/manual/01-introduction.md
+++ b/docs/manual/01-introduction.md
@@ -93,7 +93,6 @@ formal need to keep the global table uncluttered and the informal need for
convenience. `require'pl.import_into'` returns a function, which accepts a table
for injecting Penlight into, or if no table is given, it passes back a new one.
-
local pl = require'pl.import_into'()
The table `pl` is a 'lazy table' which loads modules as needed, so we can then
@@ -194,6 +193,14 @@ For example,
If you were to accidently type `mymod.Answer()`, then you would get a runtime
error: "variable 'Answer' is not declared in 'mymod'".
+This can be applied to existing modules. You may desire to have the same level
+of checking for the Lua standard libraries:
+
+ strict.make_all_strict(_G)
+
+Thereafter a typo such as `math.cosine` will give you an explicit error, rather
+than merely returning a `nil` that will cause problems later.
+
### What are function arguments in Penlight?
Many functions in Penlight themselves take function arguments, like `map` which
@@ -273,6 +280,8 @@ The function `printf` discussed earlier is included in `pl.utils` because it
makes properly formatted output easier. (There is an equivalent `fprintf` which
also takes a file object parameter, just like the C function.)
+Splitting a string using a delimiter is a fairly common operation, hence `split`.
+
Utility functions like `is_callable` and `is_type` help with identifying what
kind of animal you are dealing with. Obviously, a function is callable, but an
object can be callable as well if it has overriden the `__call` metamethod. The
@@ -319,7 +328,7 @@ upfront, since in general you won't know what values are needed.
Penlight is fully compatible with Lua 5.1, 5.2 and LuaJIT 2. To ensure this,
`utils` also defines the global Lua 5.2
-[load](http://www.lua.org/work/doc/manual.html#pdf-load) function when needed.
+[load](http://www.lua.org/work/doc/manual.html#pdf-load) function as `utils.load`
* the input (either a string or a function)
* the source name used in debug information
@@ -327,9 +336,15 @@ Penlight is fully compatible with Lua 5.1, 5.2 and LuaJIT 2. To ensure this,
whether the source is a binary chunk or text code (default is 'bt')
* the environment for the compiled chunk
-Using `load` should reduce the need to call the deprecated function `setfenv`,
+Using `utils.load` should reduce the need to call the deprecated function `setfenv`,
and make your Lua 5.1 code 5.2-friendly.
+Currently, the `utils` module does define a global `getfenv` and `setfenv` for
+Lua 5.2, based on code by Sergey Rozhenko. Note that these functions can fail
+for functions which don't access any globals. (whether it's wise to directly
+inject these functions into global or not, I'll leave for a later version to
+decide)
+
### Application Support
`app.parse_args` is a simple command-line argument parser. If called without any
diff --git a/docs/manual/03-strings.md b/docs/manual/03-strings.md
index 5312dc3..a408fc9 100644
--- a/docs/manual/03-strings.md
+++ b/docs/manual/03-strings.md
@@ -39,7 +39,8 @@ easily at hand. Note that can be injected into the `string` table if you use
`stringx.import`, but a simple alias like `local stringx = require 'pl.stringx'`
is preferrable. This is the recommended practice when writing modules for
consumption by other people, since it is bad manners to change the global state
-of the rest of the system.
+of the rest of the system. Magic may be used for convenience, but there is always
+a cost.
### String Templates
diff --git a/docs/manual/06-data.md b/docs/manual/06-data.md
index ea69330..e64ee33 100644
--- a/docs/manual/06-data.md
+++ b/docs/manual/06-data.md
@@ -224,9 +224,15 @@ have to use `==` (this warning comes from experience.)
For this to work, _field names must be Lua identifiers_. So `read` will massage
fieldnames so that all non-alphanumeric chars are replaced with underscores.
+However, the `original_fieldnames` field always contains the original un-massaged
+fieldnames.
`read` can handle standard CSV files fine, although doesn't try to be a
-full-blown CSV parser. Spreadsheet programs are not always the best tool to
+full-blown CSV parser. With the `csv=true` option, it's possible to have
+double-quoted fields, which may contain commas; then trailing commas become
+significant as well.
+
+Spreadsheet programs are not always the best tool to
process such data, strange as this might seem to some people. This is a toy CSV
file; to appreciate the problem, imagine thousands of rows and dozens of columns
like this:
@@ -263,6 +269,34 @@ condition (such as belonging to a specified set) then it is not generally
possible to express such a condition as a query string, without resorting to
hackery such as global variables.
+With 1.0.3, you can specify explicit conversion functions for selected columns.
+For instance, this is a log file with a Unix date stamp:
+
+ Time Message
+ 1266840760 +# EE7C0600006F0D00C00F06010302054000000308010A00002B00407B00
+ 1266840760 closure data 0.000000 1972 1972 0
+ 1266840760 ++ 1266840760 EE 1
+ 1266840760 +# EE7C0600006F0D00C00F06010302054000000408020A00002B00407B00
+ 1266840764 closure data 0.000000 1972 1972 0
+
+We would like the first column as an actual date object, so the `convert`
+field sets an explicit conversion for column 1. (Note that we have to explicitly
+convert the string to a number first.)
+
+ Date = require 'pl.Date'
+
+ function date_convert (ds)
+ return Date(tonumber(ds))
+ end
+
+ d = data.read(f,{convert={[1]=date_convert},last_field_collect=true})
+
+This gives us a two-column dataset, where the first column contains `Date` objects
+and the second column contains the rest of the line. Queries can then easily
+pick out events on a day of the week:
+
+ q = d:select "Time,Message where Time:weekday_name()=='Sun'"
+
Data does not have to come from files, nor does it necessarily come from the lab
or the accounts department. On Linux, `ps aux` gives you a full listing of all
processes running on your machine. It is straightforward to feed the output of
@@ -303,7 +337,7 @@ And it can be used generally as a filter command to extract columns from data.
(As with AWK, please note the single-quotes used in this command; this prevents
the shell trying to expand the column indexes. If you are on Windows, then you
-are fine, but it is still necessary to quote the expression in double-quotes so
+must quote the expression in double-quotes so
it is passed as one argument to your batch file.)
As a tutorial resource, have a look at `test-data.lua` in the PL tests directory
@@ -477,7 +511,8 @@ the following fields:
list_delim = ',',
trim_quotes = true,
ignore_assign = false,
- keysep = '='
+ keysep = '=',
+ smart = false,
}
`variablilize` is the option that converted `write.timeout` in the first example
@@ -554,7 +589,27 @@ That result is a string, since `tonumber` doesn't like it, but defining the
`convert_numbers` option as `function(s) return tonumber((s:gsub(' kB$','')))
end` will get the memory figures as actual numbers in the result. (The extra
parentheses are necessary so that `tonumber` only gets the first result from
-`gsub`)
+`gsub`). From `tests/test-config.lua':
+
+ testconfig([[
+ MemTotal: 1024748 kB
+ MemFree: 220292 kB
+ ]],
+ { MemTotal = 1024748, MemFree = 220292 },
+ {
+ keysep = ':',
+ convert_numbers = function(s)
+ s = s:gsub(' kB$','')
+ return tonumber(s)
+ end
+ }
+ )
+
+
+The `smart` option lets `config.read` make a reasonable guess for you; there
+are examples in `tests/test-config.lua`, but basically these common file
+formats (and those following the same pattern) can be processed directly in
+smart mode: 'etc/fstab', '/proc/XXXX/status', 'ssh_config' and 'pdatedb.conf'.
Please note that `config.read` can be passed a _file-like object_; if it's not a
string and supports the `read` method, then that will be used. For instance, to
@@ -675,7 +730,7 @@ snippet from 'text-lexer.lua':
test.asserteq(ls,List{'for','in','do','if','then','else','end','end'})
Here is a useful little utility that identifies all common global variables found
-in a lua module:
+in a lua module (ignoring those declared locally for the moment):
-- testglobal.lua
require 'pl'
@@ -721,7 +776,8 @@ specialized library.
#### Parsing and Pretty-Printing
-The semi-standard XML parser in the Lua universe is [lua-expat](). In particular,
+The semi-standard XML parser in the Lua universe is [lua-expat](http://matthewwild.co.uk/projects/luaexpat/).
+In particular,
it has a function called `lxp.lom.parse` which will parse XML into the Lua Object
Model (LOM) format. However, it does not provide a way to convert this data back
into XML text. `xml.parse` will use this function, _if_ `lua-expat` is
@@ -758,7 +814,7 @@ also as an array. It is always present.
the first child of `d`, etc.
It could be argued that having attributes also as the array part of `attr` is not
-essential (you generally cannot depend on attribute order in XML) but that's how
+essential (you cannot depend on attribute order in XML) but that's how
it goes with this standard.
`lua-expat` is another _soft dependency_ of Penlight; generally, the fallback
@@ -826,6 +882,7 @@ on Debian/Ubuntu Linux systems.
d = xml.parse [[
<serviceproviders format="2.0">
+ ...
<country code="za">
<provider>
<name>Cell-c</name>
@@ -873,7 +930,7 @@ on Debian/Ubuntu Linux systems.
</gsm>
</provider>
</country>
-
+ ....
</serviceproviders>
]]
@@ -882,7 +939,7 @@ Getting the names of the providers per-country is straightforward:
local t = {}
for country in d:childtags() do
local providers = {}
- t[country.tag] = providers
+ t[country.attr.code] = providers
for provider in country:childtags() do
table.insert(providers,provider:child_with_name('name'):get_text())
end
@@ -891,12 +948,13 @@ Getting the names of the providers per-country is straightforward:
pretty.dump(t)
-->
{
- country = {
+ za = {
"Cell-c",
"MTN",
"Vodacom",
"Virgin Mobile"
}
+ ....
}
#### Generating XML with 'xmlification'
@@ -996,4 +1054,26 @@ The `match` method can be passed a LOM document or some text, which will be
parsed first. Note that `$NUMBER` is treated specially as a numerical index, so
that `$1` is the first element of the resulting array, etc.
+#### HTML Parsing
+
+HTML is an ususally slack dialect of XML, and Dennis Schridde has contributed
+a feature which makes parsing it easier. For instance, from the tests:
+
+ xml.parsehtml = true
+
+ doc = xml.parse [[
+ <BODY>
+ Hello dolly<br>
+ HTML is <b>slack</b><br>
+ </BODY>
+ ]]
+
+ asserteq(xml.tostring(doc),[[
+ <body>
+ Hello dolly<br/>
+ HTML is <b>slack</b><br/></body>]])
+
+That is, all tags are converted to lowercase, and some elements like `br`
+are properly closed.
+
diff --git a/tests/test-xml.lua b/tests/test-xml.lua
index dde8ce8..14fe3c1 100644
--- a/tests/test-xml.lua
+++ b/tests/test-xml.lua
@@ -391,4 +391,31 @@ t = SP{country{code="$country",provider{
name '$name', gsm{apn {value="$apn",dns '196.43.46.190'}}
}}}
-print(xml.tostring(t,' ',' '))
+out = xml.tostring(t,' ',' ')
+asserteq(out,[[
+
+ <serviceprovider>
+ <country code='$country'>
+ <provider>
+ <name>$name</name>
+ <gsm>
+ <apn value='$apn'>
+ <dns>196.43.46.190</dns>
+ </apn>
+ </gsm>
+ </provider>
+ </country>
+ </serviceprovider>]])
+
+xml.parsehtml = true
+doc = parse [[
+<BODY>
+Hello dolly<br>
+HTML is <b>slack</b><br>
+</BODY>
+]]
+
+asserteq(xml.tostring(doc),[[
+<body>
+Hello dolly<br/>
+HTML is <b>slack</b><br/></body>]])