github.com/matomo-org/matomo.git
author Matthieu Aubry <matt@piwik.org> 2015-02-10 05:16:53 +0300
committer Matthieu Aubry <matt@piwik.org> 2015-02-10 05:16:53 +0300
commit 175421ec6477f8df9f336841c054a11dde166c0f
tree 42199df748504bb9bdec0d99be13ef71fc5dc24b /misc
parent 36d87593bc0dfbad4126ef1ed4845be3f30e54dc
Improved readme structure + added call to contributors! linked to #7163
Diffstat (limited to 'misc')
-rw-r--r-- misc/log-analytics/README.md 216
1 files changed, 115 insertions, 101 deletions
diff --git a/misc/log-analytics/README.md b/misc/log-analytics/README.md
index d7eb5aff3a..5684d5f112 100644
--- a/misc/log-analytics/README.md
+++ b/misc/log-analytics/README.md
@@ -4,7 +4,12 @@
* Python 2.6 or 2.7. Python 3.x is not supported.
* Update to Piwik 1.11
-* OrderedDict is optional (see https://pypi.python.org/pypi/ordereddict for more details). .
+
+## Contributors
+
+We're looking for contributors! Feel free to submit pull requests on GitHub.
+
+For example, this documentation page could be improved, and maybe you would like to help? Or **maybe you know Python**: check out the [list of issues for import_logs.py](https://github.com/piwik/piwik/labels/c%3A%20Log%20Analytics%20%28import_logs.py%29), which lists many interesting ideas and projects that need help. FYI, [we plan to move](https://github.com/piwik/piwik/issues/7163) the project to its own repository on GitHub and split the big file into smaller files.
## How to use this script?
@@ -65,14 +70,116 @@ To improve performance,
you can disable server access logging for these requests.
Each Piwik webserver (Apache, Nginx, IIS) can also be tweaked a bit to handle more req/sec.
-## Setup Apache CustomLog that directly imports in Piwik
+## Advanced uses
+
+### Example Nginx Virtual Host Log Format
+
+This log format can be specified for nginx access logs to capture multiple virtual hosts:
+
+* log_format vhosts '$host $remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"';
+* access_log /PATH/TO/access.log vhosts;
+
+When executing import_logs.py specify the "common_complete" format.
+
+### How do I import the Page Speed metric from logs?
+
+In the Piwik > Actions > Page URLs and Page Titles reports, Piwik shows the Avg. generation time as an indicator of your website speed.
+This metric works by default when using the JavaScript tracker, but you can also use it with log files.
+
+Apache can log the generation time in microseconds using %D in the LogFormat.
+This metric can be imported using a custom log format in this script.
+On the command line, add a --log-format-regex parameter that contains the group generation_time_micro.
+
+Here's an example:
+Apache LogFormat "%h %l %u %t \"%r\" %>s %b %D"
+--log-format-regex="(?P<ip>\S+) \S+ \S+ \[(?P<date>.*?) (?P<timezone>.*?)\] \"\S+ (?P<path>.*?) \S+\" (?P<status>\S+) (?P<length>\S+) (?P<generation_time_micro>\S+)"
+
+Note: the group <generation_time_milli> is also available if your server logs generation time in milliseconds rather than microseconds.
+
+### How do I set up Nginx to import directly into Piwik via syslog?
+
+With the syslog patch from http://wiki.nginx.org/3rdPartyModules, which is compiled into dotdeb's release, you can log to syslog and import the logs live into Piwik.
+Path: Nginx -> syslog -> (syslog central server) -> this script -> Piwik
+
+You can use any log format that this script can handle, such as Apache Combined, or the JSON format, which needs less processing.
+
+##### Setup Nginx logs
+
+```
+http {
+...
+log_format piwik '{"ip": "$remote_addr",'
+ '"host": "$host",'
+ '"path": "$request_uri",'
+ '"status": "$status",'
+ '"referrer": "$http_referer",'
+ '"user_agent": "$http_user_agent",'
+ '"length": $bytes_sent,'
+ '"generation_time_milli": $request_time,'
+ '"date": "$time_iso8601"}';
+...
+ server {
+ ...
+ access_log syslog:info piwik;
+ ...
+ }
+}
+```
+
+##### Setup syslog-ng
+
+This is the config for the central server if any. If not, you can also use this config on the same server as Nginx.
+
+```
+options {
+ stats_freq(600); stats_level(1);
+ log_fifo_size(1280000);
+ log_msg_size(8192);
+};
+source s_nginx { udp(); };
+destination d_piwik {
+ program("/usr/local/piwik/piwik.sh" template("$MSG\n"));
+};
+log { source(s_nginx); filter(f_info); destination(d_piwik); };
+```
+
+##### piwik.sh
+
+This wrapper script only needs to set the best parameters for import_logs.py:
+```
+#!/bin/sh
+
+exec python /path/to/misc/log-analytics/import_logs.py \
+ --url=http://localhost/ --token-auth=<your_auth_token> \
+ --idsite=1 --recorders=4 --enable-http-errors --enable-http-redirects --enable-static --enable-bots \
+ --log-format-name=nginx_json -
+```
+
+##### Example of regex for syslog format (centralized logs)
+
+###### Log format example
+
+```
+Aug 31 23:59:59 tt-srv-name www.tt.com: 1.1.1.1 - - [31/Aug/2014:23:59:59 +0200] "GET /index.php HTTP/1.0" 200 3838 "http://www.tt.com/index.php" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0" 365020 www.tt.com
+```
+
+###### Corresponding regex
+
+```
+--log-format-regex='.* ((?P<ip>\S+) \S+ \S+ \[(?P<date>.*?) (?P<timezone>.*?)\] "\S+ (?P<path>.*?) \S+" (?P<status>\S+) (?P<length>\S+) "(?P<referrer>.*?)" "(?P<user_agent>.*?)").*'
+```
+
+
+### Setup Apache CustomLog that directly imports in Piwik
Since Apache CustomLog directives can send log data to a script, it is possible to import hits into Piwik server-side in real time rather than processing a log file each day.
This approach has many advantages, including real-time data being available on your Piwik site, using real log files instead of relying on client-side JavaScript, and not having a surge of CPU/RAM usage during log processing.
The disadvantage is that if Piwik is unavailable, logging data will be lost. Therefore we recommend also logging to a standard log file. Bear in mind also that Apache processes will wait until a request is logged before processing a new request, so if Piwik runs slowly, so does your site: it's therefore important to tune --recorders to the right level.
-In the most basic setup, you might have in your main config section:
+##### Basic setup
+
+You might have in your main config section:
```
# Set up your log format as a normal extended format, with hostname at the start
@@ -95,7 +202,7 @@ Useful options here are:
You can have as many CustomLog statements as you like. However, if you define any CustomLog directives within a <VirtualHost> block, all CustomLogs in the main config will be overridden. Therefore if you require custom logging for particular VirtualHosts, it is recommended to use mod_macro to make configuration more maintainable.
-## Advanced Log Analytics use case: Apache vhost, custom logs, automatic website creation
+##### Advanced setup: Apache vhost, custom logs, automatic website creation
As a rather extreme example of what you can do, here is an apache config with:
@@ -106,7 +213,7 @@ As a rather extreme example of what you can do, here is an apache config with:
NB use of mod_macro to ensure consistency and maintainability
-## Apache configuration source code:
+Apache configuration source code:
```
# Set up macro with the options
@@ -172,102 +279,9 @@ Use piwiklog %v vhost_common main " "
</VirtualHost>
```
-## Nginx Virtual Host Log Format
-
-This log format can be specified for nginx access logs to capture multiple virtual hosts:
-
-* log_format vhosts '$host $remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"';
-* access_log /PATH/TO/access.log vhosts;
-
-When executing import_logs.py specify the "common_complete" format.
-
-## Import Page Speed Metric from logs
-
-In Piwik> Actions> Page URLs and Page Title reports, Piwik reports the Avg. generation time, as an indicator of your website speed.
-This metric works by default when using the Javascript tracker, but you can use it with log file as well.
-
-Apache can log the generation time in microseconds using %D in the LogFormat.
-This metric can be imported using a custom log format in this script.
-In the command line, add the --log-format-regex parameter that contains the group generation_time_micro.
-
-Here's an example:
-Apache LogFormat "%h %l %u %t \"%r\" %>s %b %D"
---log-format-regex="(?P<ip>\S+) \S+ \S+ \[(?P<date>.*?) (?P<timezone>.*?)\] \"\S+ (?P<path>.*?) \S+\" (?P<status>\S+) (?P<length>\S+) (?P<generation_time_micro>\S+)"
-
-Note: the group <generation_time_milli> is also available if your server logs generation time in milliseconds rather than microseconds.
-
-## Setup Nginx to directly imports in Piwik via syslog
-
-With the syslog patch from http://wiki.nginx.org/3rdPartyModules which is compiled in dotdeb's release, you can log to syslog and imports them live to Piwik.
-Path: Nginx -> syslog -> (syslog central server) -> this script -> piwik
-
-You can use any log format that this script can handle, like Apache Combined, and Json format which needs less processing.
-
-### Setup Nginx logs
-
-```
-http {
-...
-log_format piwik '{"ip": "$remote_addr",'
- '"host": "$host",'
- '"path": "$request_uri",'
- '"status": "$status",'
- '"referrer": "$http_referer",'
- '"user_agent": "$http_user_agent",'
- '"length": $bytes_sent,'
- '"generation_time_milli": $request_time,'
- '"date": "$time_iso8601"}';
-...
- server {
- ...
- access_log syslog:info piwik;
- ...
- }
-}
-```
-
-# Setup syslog-ng
+### And that's all!
-This is the config for the central server if any. If not, you can also use this config on the same server as Nginx.
-
-```
-options {
- stats_freq(600); stats_level(1);
- log_fifo_size(1280000);
- log_msg_size(8192);
-};
-source s_nginx { udp(); };
-destination d_piwik {
- program("/usr/local/piwik/piwik.sh" template("$MSG\n"));
-};
-log { source(s_nginx); filter(f_info); destination(d_piwik); };
-```
-
-# piwik.sh
-
-Just needed to configure the best params for import_logs.py :
-```
-#!/bin/sh
-
-exec python /path/to/misc/log-analytics/import_logs.py \
- --url=http://localhost/ --token-auth=<your_auth_token> \
- --idsite=1 --recorders=4 --enable-http-errors --enable-http-redirects --enable-static --enable-bots \
- --log-format-name=nginx_json -
-```
-
-# regex example for syslog format (centralized logs)
-
-## log format exemple
-
-```
-Aug 31 23:59:59 tt-srv-name www.tt.com: 1.1.1.1 - - [31/Aug/2014:23:59:59 +0200] "GET /index.php HTTP/1.0" 200 3838 "http://www.tt.com/index.php" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0" 365020 www.tt.com
-```
-
-## Corresponding regex
-
-```
---log-format-regex='.* ((?P<ip>\S+) \S+ \S+ \[(?P<date>.*?) (?P<timezone>.*?)\] "\S+ (?P<path>.*?) \S+" (?P<status>\S+) (?P<length>\S+) "(?P<referrer>.*?)" "(?P<user_agent>.*?)").*'
-```
-And that's all !
+***This documentation is a community effort. Feel free to suggest any change via a GitHub pull request.***
+
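
A note for readers trying the `--log-format-regex` examples from the README text in this diff: it can help to check a pattern against a sample log line before running an import. The sketch below is not part of this commit or of import_logs.py. It is a minimal standalone check, assuming the pattern is applied from the start of each line. The helper name `parse_line` and the Apache sample line are hypothetical; the two patterns and the syslog sample line are copied from the README, with shell escaping removed.

```python
import re

# Pattern from the README's Apache "%D" example (shell escaping removed).
APACHE_REGEX = (
    r'(?P<ip>\S+) \S+ \S+ \[(?P<date>.*?) (?P<timezone>.*?)\] '
    r'"\S+ (?P<path>.*?) \S+" (?P<status>\S+) (?P<length>\S+) '
    r'(?P<generation_time_micro>\S+)'
)

# Hypothetical line, as LogFormat "%h %l %u %t \"%r\" %>s %b %D" might write it.
APACHE_LINE = (
    '1.1.1.1 - - [31/Aug/2014:23:59:59 +0200] '
    '"GET /index.php HTTP/1.0" 200 3838 365020'
)

# Pattern and sample line from the README's centralized syslog example.
SYSLOG_REGEX = (
    r'.* ((?P<ip>\S+) \S+ \S+ \[(?P<date>.*?) (?P<timezone>.*?)\] '
    r'"\S+ (?P<path>.*?) \S+" (?P<status>\S+) (?P<length>\S+) '
    r'"(?P<referrer>.*?)" "(?P<user_agent>.*?)").*'
)
SYSLOG_LINE = (
    'Aug 31 23:59:59 tt-srv-name www.tt.com: 1.1.1.1 - - '
    '[31/Aug/2014:23:59:59 +0200] "GET /index.php HTTP/1.0" 200 3838 '
    '"http://www.tt.com/index.php" '
    '"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0" '
    '365020 www.tt.com'
)


def parse_line(regex, line):
    """Return the named groups captured by regex, or None if the line does not match."""
    match = re.match(regex, line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    for label, regex, line in (
        ("apache", APACHE_REGEX, APACHE_LINE),
        ("syslog", SYSLOG_REGEX, SYSLOG_LINE),
    ):
        fields = parse_line(regex, line)
        if fields is None:
            print("%s: line did not match" % label)
            continue
        # Print each captured group so a wrong or empty field is easy to spot.
        for name in sorted(fields):
            print("%s: %s = %r" % (label, name, fields[name]))
```

If a line comes back as `None`, or a group captures the wrong field, adjust the pattern here first. This only checks the regular expression itself, not how the import script interprets the captured values.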