Software Details:
Version: 0.83
Upload Date: 1 Mar 15
Distribution Type: Freeware
Downloads: 80
PHPCrawl can be used to write search crawlers (spiders) that mine Web pages for information. The library fetches the data it is configured to retrieve and passes it on to other applications for further processing; a minimal usage sketch follows the feature list below.
Features:
- Filters for URL and Content-Type data
- Define ways to handle cookies
- Define ways to handle robots.txt files
- Limit the crawler's activity in various ways (e.g. by received traffic or number of requests)
- Multi-processing modes
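A minimal usage sketch of these features, based on the documented 0.8x API; the target URL, the filter expressions and the traffic limit are placeholder examples:

<?php
// Minimal PHPCrawl 0.8x sketch; assumes libs/PHPCrawler.class.php from
// the release archive is in the include path. URL/filters are examples.
include("libs/PHPCrawler.class.php");

class MyCrawler extends PHPCrawler
{
    // Called for every received document; hand the data over to
    // whatever application does the actual processing here.
    function handleDocumentInfo($DocInfo)
    {
        echo "Requested: ".$DocInfo->url." (".$DocInfo->http_status_code.")\n";
    }
}

$crawler = new MyCrawler();
$crawler->setURL("www.example.com");

// Content-type filter: only receive HTML documents
$crawler->addContentTypeReceiveRule("#text/html#");

// URL filter: don't follow links to common image files
$crawler->addURLFilterRule("#\.(jpg|jpeg|gif|png)$# i");

// Cookie and robots.txt handling
$crawler->enableCookieHandling(true);
$crawler->obeyRobotsTxt(true);

// Limit activity: stop after roughly 1 MB of received traffic
$crawler->setTrafficLimit(1000 * 1024);

$crawler->go();

// Short summary of the crawling process
$report = $crawler->getProcessReport();
echo "Links followed: ".$report->links_followed."\n";
echo "Documents received: ".$report->files_received."\n";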
What is new in this release:
- Fixed bugs:
- Links that are partially URL-encoded and partially not are now rebuilt/encoded correctly.
- Removed an unnecessary debug var_dump() from PHPCrawlerRobotsTxtParser.class.php
- Server Name Indication (SNI) in TLS/SSL now works correctly.
- "base href" tags in websites are interpreted correctly again.
What is new in version 0.80 beta:
- The code was completely refactored and ported to PHP5 OO code; large parts were rewritten.
- Added the ability to use multiple processes to spider a website; method "goMultiProcessed()" added (see the sketch after this list).
- New overridable method "initChildProcess()" added for initializing child processes when using the crawler in multi-process mode.
- Implemented an alternative, internal SQLite caching mechanism for URLs, making it possible to spider very large websites.
- Method "setUrlCacheType()" added.
- New method "setWorkingDirectory()" added for manually defining the location of the crawler's temporary working directory. The method "setTmpFile()" is therefore marked as deprecated (it no longer has any function).
- New method "addContentTypeReceiveRule()" replaces the old method "addReceiveContentType()".
- The method "addReceiveContentType()" is still present, but has been marked as deprecated.
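A sketch of the new multi-process and caching features; the process count, working directory and the child-process initialization are illustrative assumptions (multi-process mode is documented to work from the PHP CLI on POSIX systems only):

<?php
include("libs/PHPCrawler.class.php");

class MyCrawler extends PHPCrawler
{
    function handleDocumentInfo($DocInfo)
    {
        echo $DocInfo->url."\n";
    }

    // New in 0.80: called once in every child process. A typical use is
    // opening a per-process resource, e.g. a separate database link.
    function initChildProcess()
    {
        // $this->db = new PDO(...); // assumption, not part of PHPCrawl
    }
}

$crawler = new MyCrawler();
$crawler->setURL("www.example.com");

// New in 0.80: SQLite-based URL cache for very large websites
$crawler->setUrlCacheType(PHPCrawlerUrlCacheTypes::URLCACHE_SQLITE);

// New in 0.80: replaces the deprecated setTmpFile()
$crawler->setWorkingDirectory("/tmp/phpcrawl/");

// Spider the site with 5 parallel processes instead of calling go()
$crawler->goMultiProcessed(5);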
Requirements:
- PHP 5 or higher
- PHP with OpenSSL support