github.com/mapsme/omim.git
author     Yury Melnichek <melnichek@gmail.com>  2012-09-17 14:13:25 +0400
committer  Alex Zolotarev <alex@maps.me>  2015-09-23 01:43:34 +0300
commit     f8b8a13a870024f5a20e0efdfa9c12239fa1cad9
tree       26c8bd4dbc380e5aadb40a9e5069297c42498e54 /crawler
parent     f8d90e92ce791650dc89944fca009fc36d9e3a90
[crawler] Download full wikitravel images, not thumbnails.
Diffstat (limited to 'crawler')
-rwxr-xr-x  crawler/normalize-image-urls.sh  4
-rwxr-xr-x  crawler/wikitravel-crawler.sh    4
2 files changed, 7 insertions, 1 deletion
diff --git a/crawler/normalize-image-urls.sh b/crawler/normalize-image-urls.sh
new file mode 100755
index 0000000000..ee045b1df6
--- /dev/null
+++ b/crawler/normalize-image-urls.sh
@@ -0,0 +1,4 @@
+#!/bin/bash
+set -e -u -x
+
+sed 's:/thumb\(/.*\)/[0-9][0-9]*px-.*$:\1:' "$1" | sort -u > "$2"
diff --git a/crawler/wikitravel-crawler.sh b/crawler/wikitravel-crawler.sh
index 58fd1a2f3f..dee0e843a1 100755
--- a/crawler/wikitravel-crawler.sh
+++ b/crawler/wikitravel-crawler.sh
@@ -28,6 +28,8 @@ cat wikitravel-pages.json | python $MY_PATH/wikitravel-optimize-articles.py
$MY_PATH/extract-image-urls.sh wikitravel-images.urls
-wget --wait=1 --no-clobber -i wikitravel-images.urls
+$MY_PATH/normalize-image-urls.sh wikitravel-images.urls wikitravel-images-normalized.urls
+
+wget --wait=1 --random-wait --no-clobber -i wikitravel-images-normalized.urls
# TODO: Run publisher.
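
To see what the sed expression in normalize-image-urls.sh does, here is a minimal sketch that applies the same substitution to a few sample URLs. The function name `normalize_image_urls` and the example.org URLs are illustrative assumptions, not part of the commit. The URL layout (a `/thumb/` segment followed by the image path and a `<width>px-` file name) is inferred from the regular expression itself.

```shell
#!/bin/bash
set -e -u

# Same substitution as in normalize-image-urls.sh, reading stdin and
# writing stdout. The function name is hypothetical.
normalize_image_urls() {
  # Drop the "/thumb" segment and the trailing "/<width>px-<name>" part,
  # keeping only the path of the full-size image; then de-duplicate.
  sed 's:/thumb\(/.*\)/[0-9][0-9]*px-.*$:\1:' | sort -u
}

# Hypothetical input: two thumbnail sizes of one image, plus a URL that
# already points at a full-size image and should pass through unchanged.
printf '%s\n' \
  'http://example.org/upload/shared/thumb/a/ab/Foo.jpg/120px-Foo.jpg' \
  'http://example.org/upload/shared/thumb/a/ab/Foo.jpg/300px-Foo.jpg' \
  'http://example.org/upload/shared/b/bc/Bar.png' \
  | normalize_image_urls
```

Both thumbnail URLs collapse to the same full-size path, so `sort -u` leaves one entry per image. This is why the script sorts after rewriting rather than before.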