field | value | date |
---|---|---|
author | Matthieu Aubry <matt@piwik.org> | 2015-02-10 05:16:53 +0300 |
committer | Matthieu Aubry <matt@piwik.org> | 2015-02-10 05:16:53 +0300 |
commit | 175421ec6477f8df9f336841c054a11dde166c0f | |
tree | 42199df748504bb9bdec0d99be13ef71fc5dc24b /misc | |
parent | 36d87593bc0dfbad4126ef1ed4845be3f30e54dc | |
Improved readme structure + added call to contributors! linked to #7163
Diffstat (limited to 'misc')
-rw-r--r-- | misc/log-analytics/README.md | 216 |
1 files changed, 115 insertions, 101 deletions
diff --git a/misc/log-analytics/README.md b/misc/log-analytics/README.md
index d7eb5aff3a..5684d5f112 100644
--- a/misc/log-analytics/README.md
+++ b/misc/log-analytics/README.md
@@ -4,7 +4,12 @@
 
 * Python 2.6 or 2.7. Python 3.x is not supported.
 * Update to Piwik 1.11
-* OrderedDict is optional (see https://pypi.python.org/pypi/ordereddict for more details). .
+
+## Contributors
+
+We're looking for contributors! Feel free to submit Pull requests on Github.
+
+For example this documentation page could be improved and maybe you would like to help? Or **maybe you know Python**, check out the [list of issues for import_logs.py](https://github.com/piwik/piwik/labels/c%3A%20Log%20Analytics%20%28import_logs.py%29) which lists many interesting ideas and projects that need help. FYI [we plan to move](https://github.com/piwik/piwik/issues/7163) the project to its own repository on Github and split the big file into smaller files.
 
 ## How to use this script?
 
@@ -65,14 +70,116 @@ To improve performance, you can disable server access logging for these requests.
 
 Each Piwik webserver (Apache, Nginx, IIS) can also be tweaked a bit to handle more req/sec.
 
-## Setup Apache CustomLog that directly imports in Piwik
+## Advanced uses
+
+### Example Nginx Virtual Host Log Format
+
+This log format can be specified for nginx access logs to capture multiple virtual hosts:
+
+* log_format vhosts '$host $remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"';
+* access_log /PATH/TO/access.log vhosts;
+
+When executing import_logs.py specify the "common_complete" format.
+
+### How do I import Page Speed Metric from logs?
+
+In Piwik> Actions> Page URLs and Page Title reports, Piwik reports the Avg. generation time, as an indicator of your website speed.
+This metric works by default when using the Javascript tracker, but you can use it with log file as well.
+
+Apache can log the generation time in microseconds using %D in the LogFormat.
+This metric can be imported using a custom log format in this script.
+In the command line, add the --log-format-regex parameter that contains the group generation_time_micro.
+
+Here's an example:
+Apache LogFormat "%h %l %u %t \"%r\" %>s %b %D"
+--log-format-regex="(?P<ip>\S+) \S+ \S+ \[(?P<date>.*?) (?P<timezone>.*?)\] \"\S+ (?P<path>.*?) \S+\" (?P<status>\S+) (?P<length>\S+) (?P<generation_time_micro>\S+)"
+
+Note: the group <generation_time_milli> is also available if your server logs generation time in milliseconds rather than microseconds.
+
+### How do I setup Nginx to directly imports in Piwik via syslog?
+
+With the syslog patch from http://wiki.nginx.org/3rdPartyModules which is compiled in dotdeb's release, you can log to syslog and imports them live to Piwik.
+Path: Nginx -> syslog -> (syslog central server) -> this script -> piwik
+
+You can use any log format that this script can handle, like Apache Combined, and Json format which needs less processing.
+
+##### Setup Nginx logs
+
+```
+http {
+...
+log_format piwik '{"ip": "$remote_addr",'
+                 '"host": "$host",'
+                 '"path": "$request_uri",'
+                 '"status": "$status",'
+                 '"referrer": "$http_referer",'
+                 '"user_agent": "$http_user_agent",'
+                 '"length": $bytes_sent,'
+                 '"generation_time_milli": $request_time,'
+                 '"date": "$time_iso8601"}';
+...
+    server {
+        ...
+        access_log syslog:info piwik;
+        ...
+    }
+}
+```
+
+##### Setup syslog-ng
+
+This is the config for the central server if any. If not, you can also use this config on the same server as Nginx.
+
+```
+options {
+    stats_freq(600); stats_level(1);
+    log_fifo_size(1280000);
+    log_msg_size(8192);
+};
+source s_nginx { udp(); };
+destination d_piwik {
+    program("/usr/local/piwik/piwik.sh" template("$MSG\n"));
+};
+log { source(s_nginx); filter(f_info); destination(d_piwik); };
+```
+
+##### piwik.sh
+
+Just needed to configure the best params for import_logs.py :
+```
+#!/bin/sh
+
+exec python /path/to/misc/log-analytics/import_logs.py \
+    --url=http://localhost/ --token-auth=<your_auth_token> \
+    --idsite=1 --recorders=4 --enable-http-errors --enable-http-redirects --enable-static --enable-bots \
+    --log-format-name=nginx_json -
+```
+
+##### Example of regex for syslog format (centralized logs)
+
+###### log format exemple
+
+```
+Aug 31 23:59:59 tt-srv-name www.tt.com: 1.1.1.1 - - [31/Aug/2014:23:59:59 +0200] "GET /index.php HTTP/1.0" 200 3838 "http://www.tt.com/index.php" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0" 365020 www.tt.com
+```
+
+###### Corresponding regex
+
+```
+--log-format-regex='.* ((?P<ip>\S+) \S+ \S+ \[(?P<date>.*?) (?P<timezone>.*?)\] "\S+ (?P<path>.*?) \S+" (?P<status>\S+) (?P<length>\S+) "(?P<referrer>.*?)" "(?P<user_agent>.*?)").*'
+```
+
+
+### Setup Apache CustomLog that directly imports in Piwik
 
 Since apache CustomLog directives can send log data to a script, it is possible to import hits into piwik server-side in real-time rather than processing a logfile each day.
 This approach has many advantages, including real-time data being available on your piwik site, using real logs files instead of relying on client-side Javacsript, and not having a surge of CPU/RAM usage during log processing.
 The disadvantage is that if Piwik is unavailable, logging data will be lost. Therefore we recommend to also log into a standard log file.
 Bear in mind also that apache processes will wait until a request is logged before processing a new request, so if piwik runs slow so does your site: it's therefore important to tune --recorders to the right level.
 
-In the most basic setup, you might have in your main config section:
+##### Basic setup
+
+You might have in your main config section:
 
 ```
 # Set up your log format as a normal extended format, with hostname at the start
@@ -95,7 +202,7 @@ Useful options here are:
 
 You can have as many CustomLog statements as you like. However, if you define any CustomLog directives within a <VirtualHost> block, all CustomLogs in the main config will be overridden. Therefore if you require custom logging for particular VirtualHosts, it is recommended to use mod_macro to make configuration more maintainable.
 
-## Advanced Log Analytics use case: Apache vhost, custom logs, automatic website creation
+##### Advanced setup: Apache vhost, custom logs, automatic website creation
 
 As a rather extreme example of what you can do, here is an apache config with:
 
@@ -106,7 +213,7 @@ As a rather extreme example of what you can do, here is an apache config with:
 
 NB use of mod_macro to ensure consistency and maintainability
 
-## Apache configuration source code:
+Apache configuration source code:
 
 ```
 # Set up macro with the options
@@ -172,102 +279,9 @@ Use piwiklog %v vhost_common main " "
 </VirtualHost>
 ```
 
-## Nginx Virtual Host Log Format
-
-This log format can be specified for nginx access logs to capture multiple virtual hosts:
-
-* log_format vhosts '$host $remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"';
-* access_log /PATH/TO/access.log vhosts;
-
-When executing import_logs.py specify the "common_complete" format.
-
-## Import Page Speed Metric from logs
-
-In Piwik> Actions> Page URLs and Page Title reports, Piwik reports the Avg. generation time, as an indicator of your website speed.
-This metric works by default when using the Javascript tracker, but you can use it with log file as well.
-
-Apache can log the generation time in microseconds using %D in the LogFormat.
-This metric can be imported using a custom log format in this script.
-In the command line, add the --log-format-regex parameter that contains the group generation_time_micro.
-
-Here's an example:
-Apache LogFormat "%h %l %u %t \"%r\" %>s %b %D"
---log-format-regex="(?P<ip>\S+) \S+ \S+ \[(?P<date>.*?) (?P<timezone>.*?)\] \"\S+ (?P<path>.*?) \S+\" (?P<status>\S+) (?P<length>\S+) (?P<generation_time_micro>\S+)"
-
-Note: the group <generation_time_milli> is also available if your server logs generation time in milliseconds rather than microseconds.
-
-## Setup Nginx to directly imports in Piwik via syslog
-
-With the syslog patch from http://wiki.nginx.org/3rdPartyModules which is compiled in dotdeb's release, you can log to syslog and imports them live to Piwik.
-Path: Nginx -> syslog -> (syslog central server) -> this script -> piwik
-
-You can use any log format that this script can handle, like Apache Combined, and Json format which needs less processing.
-
-### Setup Nginx logs
-
-```
-http {
-...
-log_format piwik '{"ip": "$remote_addr",'
-                 '"host": "$host",'
-                 '"path": "$request_uri",'
-                 '"status": "$status",'
-                 '"referrer": "$http_referer",'
-                 '"user_agent": "$http_user_agent",'
-                 '"length": $bytes_sent,'
-                 '"generation_time_milli": $request_time,'
-                 '"date": "$time_iso8601"}';
-...
-    server {
-        ...
-        access_log syslog:info piwik;
-        ...
-    }
-}
-```
-
-# Setup syslog-ng
+### And that's all !
 
-This is the config for the central server if any. If not, you can also use this config on the same server as Nginx.
- -``` -options { - stats_freq(600); stats_level(1); - log_fifo_size(1280000); - log_msg_size(8192); -}; -source s_nginx { udp(); }; -destination d_piwik { - program("/usr/local/piwik/piwik.sh" template("$MSG\n")); -}; -log { source(s_nginx); filter(f_info); destination(d_piwik); }; -``` - -# piwik.sh - -Just needed to configure the best params for import_logs.py : -``` -#!/bin/sh - -exec python /path/to/misc/log-analytics/import_logs.py \ - --url=http://localhost/ --token-auth=<your_auth_token> \ - --idsite=1 --recorders=4 --enable-http-errors --enable-http-redirects --enable-static --enable-bots \ - --log-format-name=nginx_json - -``` - -# regex example for syslog format (centralized logs) - -## log format exemple - -``` -Aug 31 23:59:59 tt-srv-name www.tt.com: 1.1.1.1 - - [31/Aug/2014:23:59:59 +0200] "GET /index.php HTTP/1.0" 200 3838 "http://www.tt.com/index.php" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0" 365020 www.tt.com -``` - -## Corresponding regex - -``` ---log-format-regex='.* ((?P<ip>\S+) \S+ \S+ \[(?P<date>.*?) (?P<timezone>.*?)\] "\S+ (?P<path>.*?) \S+" (?P<status>\S+) (?P<length>\S+) "(?P<referrer>.*?)" "(?P<user_agent>.*?)").*' -``` -And that's all ! +***This documentation is a community effort, feel free to suggest any change via Github Pull request.*** + |