arachnode.net
An Open Source C# web crawler with Lucene.NET search using SQL Server 2008/2012/2014/2016/CE or MongoDB/RavenDB/Hadoop

Completely Open Source @ GitHub

How to crawl a site with a specific path starting from http://xyz.com/en/?

Top 10 Contributor
83 Posts
InvestisDev posted on Sun, Mar 18 2012 10:09 PM

Hello,

We have to crawl a multilingual site. In our case the URLs look something like this:

The English version of the site starts with -> http://xyz.com/en/homepage.aspx

The German version of the site starts with -> http://xyz.com/de-DE//homepage.aspx

While crawling the English site, the crawler also picks up the German-language URLs.

Is there any way to restrict a crawl to URLs that start with a given path, such as http://xyz.com/en/?

Thanks

All Replies

Top 10 Contributor
1,905 Posts

Do you want to crawl the de-DE directory, or not?

For best service when you require assistance:

  1. Check the DisallowedAbsoluteUris and Exceptions tables first.
  2. Cut and paste actual exceptions from the Exceptions table.
  3. Include screenshots.

Skype: arachnodedotnet
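
The restriction asked about in the question (follow only URLs under http://xyz.com/en/) comes down to a host-and-path-prefix check on each discovered URL. Below is a minimal, generic sketch of that check. It is not arachnode.net's API or configuration; the function name `in_scope` and the `discovered` list are hypothetical, and Python is used only to keep the example short. The same logic could be written in C# against System.Uri.

```python
from urllib.parse import urlsplit


def in_scope(url: str, root: str) -> bool:
    """Return True if url is on root's host and under root's path.

    Hypothetical helper for illustration; not part of arachnode.net.
    """
    u, r = urlsplit(url), urlsplit(root)

    # Only consider web URLs (skips mailto:, javascript:, and so on).
    if u.scheme not in ("http", "https"):
        return False

    # Host names are case-insensitive; urlsplit lowercases .hostname.
    if (u.hostname or "") != (r.hostname or ""):
        return False

    # Make sure the prefix ends with "/" so that "/en/" does not
    # accidentally match "/english/". Paths are compared case-sensitively.
    prefix = r.path if r.path.endswith("/") else r.path + "/"

    # Accept anything under the prefix, plus the directory itself
    # written without its trailing slash ("/en").
    return u.path.startswith(prefix) or u.path + "/" == prefix


# Example: keep only the English-language URLs from a set of links.
discovered = [
    "http://xyz.com/en/homepage.aspx",
    "http://xyz.com/de-DE//homepage.aspx",
    "http://xyz.com/en/products/list.aspx",
]
to_crawl = [u for u in discovered if in_scope(u, "http://xyz.com/en/")]
```

Applying the check before a URL is queued, rather than after it has been downloaded, keeps the German pages from being requested at all.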


copyright 2004-2017, arachnode.net LLC