github.com/mapsme/omim.git
author    Yury Melnichek <melnichek@gmail.com> 2012-05-28 16:31:07 +0400
committer Alex Zolotarev <alex@maps.me> 2015-09-23 01:40:14 +0300
commit    157cbfe78c67b6188b21cfd28cf414292c10d37c
tree      9541a9d66f7401f19a3c0f0fdf723297e6e0198e /crawler
parent    37c71bed8dd1a961aaea2ae8a3b57483aaeac76d
[guide] Extract images from html files.
Diffstat (limited to 'crawler')
-rwxr-xr-x crawler/extract-image-urls.sh | 8
-rwxr-xr-x crawler/wikitravel-crawler.sh | 2
2 files changed, 10 insertions, 0 deletions
diff --git a/crawler/extract-image-urls.sh b/crawler/extract-image-urls.sh
new file mode 100755
index 0000000000..e63d10085d
--- /dev/null
+++ b/crawler/extract-image-urls.sh
@@ -0,0 +1,8 @@
+#!/bin/bash
+set -e -u -x
+
+grep --ignore-case --only-matching --no-filename '<img[^/]*src=\"[^">]*"' *.opt \
+ | sed 's/<img.*src="//g' \
+ | sed 's/"$//g' \
+ | sort -u \
+ > $1
diff --git a/crawler/wikitravel-crawler.sh b/crawler/wikitravel-crawler.sh
index d6e8406bd3..7f3eb802fe 100755
--- a/crawler/wikitravel-crawler.sh
+++ b/crawler/wikitravel-crawler.sh
@@ -26,4 +26,6 @@ cat wikitravel-pages.json | python $MY_PATH/wikitravel-process-articles.py
cat wikitravel-pages.json | python $MY_PATH/wikitravel-optimize-articles.py
+$MY_PATH/extract-image-urls.sh wikitravel-images.urls
+
# TODO: Run publisher.
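
The pipeline added in extract-image-urls.sh (grep for img tags, strip everything up to the src value, deduplicate with sort -u) can be sketched as a standalone function. This is only an illustration, not the committed script: the function name extract_image_urls and its directory parameter are hypothetical, and the pattern uses [^>]* where the commit uses [^/]*, on the assumption that an attribute before src may itself contain a slash.

```shell
#!/bin/bash
set -e -u

# Hypothetical helper, not part of the commit.
#   $1: directory containing the *.opt HTML files (assumed layout)
#   $2: output file that receives one unique image URL per line
extract_image_urls() {
  local dir="$1" out="$2"
  # Print only the matching part of each line: from "<img" through the
  # closing quote of the src attribute, ignoring case and file names.
  grep --ignore-case --only-matching --no-filename \
      '<img[^>]*src="[^">]*"' "$dir"/*.opt \
    | sed -e 's/^.*[sS][rR][cC]="//' -e 's/"$//' \
    | sort -u \
    > "$out"
}
```

A call such as `extract_image_urls . wikitravel-images.urls` would mirror the invocation added to wikitravel-crawler.sh. Quoting "$out" keeps the redirect safe if the output path contains spaces.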