arachnode.net
An Open Source C# web crawler with Lucene.NET search using SQL Server 2008/2012/2014/2016/CE
An Open Source C# web crawler with Lucene.NET search using MongoDB/RavenDB/Hadoop

Completely Open Source @ GitHub


Frequently Asked Questions

Licensing: 

  1. Where can I obtain the latest version?
  2. Can the license be upgraded?
  3. Can arachnode.net be installed on more than one machine?
  4. Anything else to know about licensing?

Installation Errors: 

  1. Does arachnode.net support installation to multiple machines?
  2. Are there any installation videos and/or additional documentation?

SQL Connections: 

  1. I am having trouble connecting to SQL Server, or my Console window doesn't appear to be doing anything.

Usage: 

  1. How do I submit a CrawlRequest to be crawled?
  2. How are the 'Depth' and 'Priority' settings defined?
  3. In the 'CrawlRequests' database table, what is the difference between AbsoluteUri1 and AbsoluteUri2?
  4. What do the values in UriClassificationType mean? How do they relate to IsCrawlRestricted and IsDiscoveryRestricted in DiscoveryManager.cs? How can I limit crawling to a specific domain?
  5. Why does arachnode.net seem to crawl more pages than I configured?
  6. How can I crawl Google SERPs and then start my crawls from there?
  7. How do I ensure that I crawl politely?
  8. Can I submit CrawlRequests while crawling? If so, can I do it from a Plugin?
  9. How should I go about parsing the HTML I download?
  10. How do I know when a Discovery (File/Image/WebPage) has changed?
  11. How can I configure arachnode.net to crawl RSS/Atom files?
  12. How do I disable the robots.txt rule?
  13. How do I integrate arachnode.net with other applications?
  14. How do I submit different credentials to different sites?

Features: 

  1. How do I convert downloaded WebPages into XHTML and store them in the database (1)?
  2. How do I convert downloaded WebPages into XHTML and store them in the database (2)?
  3. How do I implement custom paging for the Lucene.NET index results when searching?
  4. Can arachnode.net crawl Facebook and Twitter?
  5. What does the BayesianClassifier.cs file do?
  6. What does the Templater.cs file do (1)?
  7. What does the Templater.cs file do (2)?
  8. How do I administer arachnode.net from a GUI or WebPage?
  9. Does arachnode.net crawl JavaScript links?
  10. How do I crawl WebPages and filter them based on keywords in the WebPage source?

Common Errors: 

  1. What's the deal with the 'Analysis Services' project (1)?
  2. What's the deal with the 'Analysis Services' project (2)?
  3. Why can't I open the 'Functions' project?
  4. Why can't I create the assembly for 'Arachnode.Functions'?
  5. Why are the hundred or so reporting tables empty?
  6. Why does Visual Studio keep breaking execution in WebClient.cs?
  7. I seem to be retrieving fewer results than I expect. Why?

Site Usage: 

  1. Why was my post moderated?

Copyright 2004-2017, arachnode.net LLC