PHPCrawl

Software Details:
Version: 0.83
Upload Date: 1 Mar 15
Developer: Uwe Hunfeld
Distribution Type: Freeware
Downloads: 26

PHPCrawl is a PHP library for writing search crawlers (spiders) that mine web pages for various kinds of information.

PHPCrawl fetches the information it was configured to retrieve and passes it on to other applications for further processing.

Features:

  • Filter by URL and Content-Type
  • Configure how cookies are handled
  • Configure how robots.txt files are handled
  • Limit crawler activity in various ways
  • Run in multi-process modes

What is new in this release:

  • Fixed bugs:
      • Links that are partially urlencoded and partially not are now rebuilt and encoded correctly.
      • Removed an unnecessary debug var_dump() from PHPCrawlerRobotsTxtParser.class.php.
      • Server Name Indication (SNI) in TLS/SSL now works correctly.
      • "base-href" tags in websites are interpreted correctly again.
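
To illustrate the first fix above: a link such as `/my docs/a%20b` mixes raw and already-encoded characters, so encoding it naively would turn `%20` into `%2520`. The following is a minimal sketch of one way to handle such links, written in Python for brevity. It is not PHPCrawl's actual code, and the function name `encode_path_once` is hypothetical.

```python
import re
from urllib.parse import quote

# Matches an existing, valid percent-escape such as %20 or %C3.
_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")

def encode_path_once(path: str) -> str:
    """Percent-encode a URL path without double-encoding existing escapes.

    Hypothetical helper for illustration only: valid %XX sequences are
    copied through unchanged, everything between them is encoded.
    """
    parts = []
    pos = 0
    for match in _ESCAPE.finditer(path):
        # Encode the raw text that precedes this escape.
        parts.append(quote(path[pos:match.start()], safe="/"))
        # Keep the already-encoded sequence as it is.
        parts.append(match.group(0))
        pos = match.end()
    # Encode whatever follows the last escape (or the whole path if none).
    parts.append(quote(path[pos:], safe="/"))
    return "".join(parts)

if __name__ == "__main__":
    print(encode_path_once("/my docs/a%20b/page.html"))
```

A stray `%` that is not followed by two hex digits is treated as raw text and encoded as `%25`, so running the function a second time on its own output changes nothing.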

What is new in version 0.80 beta:

  • The code was completely refactored and ported to object-oriented PHP 5; much of it was rewritten.
  • Added the ability to use multiple processes to spider a website. Method "goMultiProcessed()" added.
  • New overridable method "initChildProcess()" added for initializing child processes when using the crawler in multi-process mode.
  • Implemented an alternative, internal SQLite caching mechanism for URLs, making it possible to spider very large websites.
  • Method "setUrlCacheType()" added.
  • New method "setWorkingDirectory()" added for manually defining the location of the crawler's temporary working directory. The method "setTmpFile()" is therefore marked as deprecated (it no longer has any effect).
  • New method "addContentTypeReceiveRule()" replaces the old method "addReceiveContentType()".
  • The method "addReceiveContentType()" is still present but is marked as deprecated.

Requirements:

  • PHP 5 or higher
  • PHP with OpenSSL support

Similar Software

  • Lupyne (13 Apr 15)
  • jQuery Facets (13 May 15)
  • Zoom Search Engine (10 Feb 16)
  • Lunr.js (10 Apr 16)
