github.com/hxseven/htmlSQL.git
author: Jonas John <jonas@jonasjohn.de> 2012-02-09 14:49:06 +0400
committer: Jonas John <jonas@jonasjohn.de> 2012-02-09 14:49:06 +0400
commit: ce9c480292ec48b7bae27ac4e82057aebfbdceee
tree: 68d2ca39b84618fc8fc9781c7306c056e237766d /README.md
parent: 86ea60d5dabfa6bba23ad29d6daaf8c8d7c3a10a
Minor code cleanup
Diffstat (limited to 'README.md')
-rwxr-xr-x README.md 33
1 file changed, 18 insertions, 15 deletions
diff --git a/README.md b/README.md
index 1b471bb..db2399c 100755
--- a/README.md
+++ b/README.md
@@ -1,7 +1,7 @@
htmlSQL - Version 0.5
=====================
-htmlSQL is a experimental PHP class which allows you to access HTML
+htmlSQL is a experimental PHP library which allows you to access HTML
values by an SQL like syntax. This means that you don't have to write
complex functions or regular expressions to extract specific values.
@@ -20,28 +20,30 @@ The project has been abandoned
------------------------------
htmlSQL was a experiment I made in 2006. I'm **not** supporting or extending the library anymore, this repository is only for historical purposes.
-But feel free to fork, modify and study the source code. If you need a reliable library for data scraping I recommend using **other modules** (see below).
+But feel free to fork, modify and study the source code. If you need a reliable library for data scraping I recommend using **other modules**.
Related projects:
-* PHP: [SimpleXML](http://www.php.net/dom), [DOM](http://www.php.net/dom)
-* Perl: [pQuery](http://search.cpan.org/~ingy/pQuery-0.07/lib/pQuery.pm)
-* Python: [Scrapy](http://scrapy.org/)
+* PHP: [phpQuery](http://code.google.com/p/phpquery/), [SimpleXML](http://www.php.net/simplexml), [DOM](http://www.php.net/dom)
+* Perl: [WWW::Mechanize](http://search.cpan.org/dist/WWW-Mechanize/), [pQuery](http://search.cpan.org/~ingy/pQuery-0.07/lib/pQuery.pm)
+* Python: [Scrapy](http://scrapy.org/), [Beautiful Soup](http://www.crummy.com/software/BeautifulSoup/)
* JavaScript: [node.js](http://blog.nodejitsu.com/jsdom-jquery-in-5-lines-on-nodejs)
+* .NET: [Html Agility Pack](http://htmlagilitypack.codeplex.com/)
+Related links:
-Related Hacker News threads:
-
-* [PHP class to query the web by an SQL like language](http://news.ycombinator.com/item?id=2097008)
-* [Ask YC: What do you scrape? How do you scrape?](http://news.ycombinator.com/item?id=159025)
+* [Stack Overflow: Options for HTML scraping?](http://stackoverflow.com/questions/2861/options-for-html-scraping)
+* [Stack Overflow: HTML Scraping in PHP](http://stackoverflow.com/questions/34120/html-scraping-in-php)
+* [Hacker News: PHP class to query the web by an SQL like language](http://news.ycombinator.com/item?id=2097008)
+* [Hacker News: Ask YC: What do you scrape? How do you scrape?](http://news.ycombinator.com/item?id=159025)
Requirements
------------
- Any flavor of PHP4+ should do
-- [Snoopy PHP class - Version 1.2.3](http://sourceforge.net/projects/snoopy/) (optional - required for web transfers)
+- [Snoopy PHP class - Version 1.2.3](http://sourceforge.net/projects/snoopy/) (optional - required for web transfers)
You find all Snoopy related documents (copyright, readme, etc) in the snoopy_data/ subdirectory.
@@ -50,7 +52,7 @@ Usage
Just include the "snoopy.class.php" and the "htmlsql.class.php" files
into your PHP scripts and look at the examples to get an idea of how
-to use the htmlSQL class. It should be very simple :-)
+to use the htmlSQL library. It should be very simple :-)
Background / idea
@@ -59,9 +61,9 @@ Background / idea
I had this idea while extracting some data from a website. As I realized
that the algorithms and functions to extract links and other tags are
often the same - I had the idea to combine all functions to an universal
-usable class. While drinking a coffee and thinking on that problem, I
+usable library. While drinking a coffee and thinking about that, I
thought it would be cool to access HTML elements by using SQL. So I
-started creating this class...
+started creating this library...
Warning
@@ -78,8 +80,9 @@ Todo
* Enhance the HTML parser
* Test htmlSQL with invalid and bad HTML files
* Replace the ugly `eval()` method for the WHERE statement with an own method
-* More error checks
-* Include the LIMIT function/method like in SQL
+* Add more error checks
+* Add unit tests
+* Add a LIMIT function like in SQL
Author