arachnode.net
An Open Source C# web crawler with Lucene.NET search using SQL Server 2008/2012/2014/2016/CE or MongoDB/RavenDB/Hadoop

Deploy error

Answered (Verified) This post has 1 verified answer | 44 Replies | 5 Followers

Top 50 Contributor
11 Posts
dawood posted on Thu, Apr 9 2009 3:27 AM

I get the following error while deploying the 'Analysis' project:

The project could not be deployed to the 'localhost' server because of the following connectivity problems: A connection cannot be made. Ensure that the server is running. To verify or update the name of the target server, right-click on the project in Solution Explorer, select Project Properties, click on the Deployment tab, and then enter the name of the server.

Also, I see that my instance does not have an 'Analysis' database. When I try to deploy using the Analysis Services Deployment Wizard, I get an error saying that a connection could not be made with 'localhost'.

Please let me know if you can help. I am using SQL Server 2005 Express Edition.

 

Verified Answer

Top 10 Contributor
1,905 Posts

OK. 

Analysis Services isn't part of SQL Server 2005 Express Edition.

You can remove the Analysis project from the solution.  It isn't required for crawling, indexing or searching.

http://www.microsoft.com/Sqlserver/2005/en/us/compare-features.aspx

Thanks!
Mike

For best service when you require assistance:

  1. Check the DisallowedAbsoluteUris and Exceptions tables first.
  2. Cut and paste actual exceptions from the Exceptions table.
  3. Include screenshots.

Skype: arachnodedotnet

All Replies


Top 50 Contributor
11 Posts
dawood replied on Tue, Apr 14 2009 2:11 AM

Thanks for your reply.

I have a new problem now. SQL Server Express Edition was reinstalled on my PC for technical reasons, and now I cannot load the Analysis, Integration, and Test projects. I get an error saying the '.dwproj' and '.dtproj' project types are not installed, although I can see the files in the directories.

Kindly assist!

Top 10 Contributor
1,905 Posts

If you are using the Express version, these three projects won't be available. You can remove them from the solution; they aren't necessary for core crawling/indexing functionality.


Top 50 Contributor
11 Posts
dawood replied on Wed, Apr 15 2009 4:07 AM

Thanks for your reply :)

I can build the solution and run it now. I let the crawl console window run for nearly four hours and, as you can see below, the Lucene.NET index folder was created with files in it...

Oops... I could not paste the screenshot. :(

When I run Search.aspx, every search fails (i.e., '0' results found), and I am not sure why. As I said, I do have the LuceneDotNetIndex folder with files under the 'CurrentCrawl' directory (0_fdx, 0_fdt, 1_tis, 1_tii, 1_prx, 1_nrm, 1_frq, 1_fnm, 0_tis, 0_tii, ...).

I am currently working on a proof of concept to incorporate this search as a plugin in our web site.

Awaiting an update...

Top 50 Contributor
11 Posts
dawood replied on Wed, Apr 15 2009 5:50 AM

Also, I see the line below at line 94 of Page_Load in Search.aspx.cs:

hits = Global.IndexSearcher.Search(query);

If I enter 'Test' in the textbox while running Search.aspx, I see the value of 'query' as "absoluteuri:test host:test text:test title:test", and the length of 'hits' is 0 after the line above executes.
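
For reference, the equivalent standalone Lucene.NET code also returns 0 hits. A minimal sketch, assuming the Lucene.NET 2.x API of that era and a placeholder index path (this is not arachnode.net's actual search code):

using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;

// Parse "test" across the same fields seen in the query string above.
IndexSearcher searcher = new IndexSearcher(@"C:\LuceneDotNetIndex"); // placeholder path
Query query = new MultiFieldQueryParser(
    new[] { "absoluteuri", "host", "text", "title" },
    new StandardAnalyzer()).Parse("test");
Hits hits = searcher.Search(query);
Console.WriteLine(hits.Length()); // 0 means no indexed document matched any field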

As I said before, I need to add this search utility to my website. Can you kindly advise me on the procedure to do the same? :)

Thanks for your time in advance...

Top 10 Contributor
1,905 Posts

OK. If you have files in the CurrentCrawl directory but not in the parent folder, that means the console window was likely closed using Ctrl-C instead of the close box. Start the crawl again, wait until the crawler is downloading content, and then click the close box.

At the beginning and end of a crawl, the ManageLuceneDotNetIndexes.cs plug-in merges the CurrentCrawl index with the main index. If the console is cancelled rather than closed, the process cannot clean up and save its state to the database, and the Lucene.NET indexes won't be updated.
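
Under the hood this is a standard Lucene.NET index merge. Roughly, the general shape looks like the sketch below (not the actual plug-in code; paths are placeholders and Lucene.NET 2.x is assumed):

using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Lucene.Net.Store;

Directory mainIndex = FSDirectory.GetDirectory(@"C:\LuceneDotNetIndex", false);                 // placeholder
Directory currentCrawl = FSDirectory.GetDirectory(@"C:\LuceneDotNetIndex\CurrentCrawl", false); // placeholder

// Open the existing main index (create: false) and fold the CurrentCrawl segments into it.
IndexWriter writer = new IndexWriter(mainIndex, new StandardAnalyzer(), false);
writer.AddIndexes(new[] { currentCrawl });
writer.Optimize();
writer.Close();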

Does 'LuceneDotNetIndexDirectory' in the database table point to your Lucene.NET index location?

Let me know if you still have problems.

Thanks!
Mike


Top 50 Contributor
11 Posts
dawood replied on Thu, Apr 16 2009 5:35 AM

Thanks, Mike. It worked! :) (Not sure why Ctrl-C didn't work...)

I can now see results in search. I am in the process of customizing the crawler for our intranet SharePoint sites and internet websites, which require user credentials (username and password). I get the following in the Exceptions table when trying to access those sites:

The remote server returned an error: (401) Unauthorized.

1. How do we configure the crawler to access websites (internet and intranet) which require user credentials?

2. In the CrawlRequests table, I see the columns AbsoluteUri, Depth, RestrictToUriHost, and Priority. Can you describe each of these columns and its significance?

3. Can you give us an overview of the configuration details in the database tables for fine-tuning my search?

Sincerely appreciate your support... Great work! :)

Top 10 Contributor
1,905 Posts

Ctrl-C is a bug that I need to fix. :)

1.) I need to add this functionality to the core. For now, modify the WebClient in WebClient.cs in the SiteCrawler project.

2.) AbsoluteUri is the Uri you want to crawl, e.g. http://arachnode.net. Depth = 0 means crawl the page and record everything found on the page, but don't follow any links. Depth = 1 means crawl the page, record everything found on the page, and do the same for every HyperLink found on the page. RestrictToUriHost restricts the Crawl to the Host: if you wanted to crawl http://mikesblog.blogspot.com, you would set this to '1' and the crawl would not leave that Host. (The Domain for that AbsoluteUri is blogspot.com.) Priority is the priority in which CrawlRequests are crawled: the higher the number, the higher the priority. (A sketch of queuing a request follows this list.)

3.) I can. Once you have some indexed results, run the stored procedure sp_UpdateReporting and ping me again. :D (Take a look at all the database tables in the 'rpt' schema.)
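
To make those columns concrete, here is a rough sketch of queuing a request by inserting into the table directly. The table and column names are the ones discussed above; the real schema (and the preferred API route) may differ, so check the actual table first:

using System.Data.SqlClient;

string connectionString = "Server=localhost;Database=arachnode.net;Integrated Security=true"; // placeholder

using (SqlConnection connection = new SqlConnection(connectionString))
using (SqlCommand command = new SqlCommand(
    "INSERT INTO CrawlRequests (AbsoluteUri, Depth, RestrictToUriHost, Priority) " +
    "VALUES (@absoluteUri, @depth, @restrictToUriHost, @priority)", connection))
{
    command.Parameters.AddWithValue("@absoluteUri", "http://arachnode.net");
    command.Parameters.AddWithValue("@depth", 1);                 // crawl the page and each link on it
    command.Parameters.AddWithValue("@restrictToUriHost", true);  // stay on the arachnode.net host
    command.Parameters.AddWithValue("@priority", 10);             // higher numbers are crawled first
    connection.Open();
    command.ExecuteNonQuery();
}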

Thanks!  A labor of love for sure.

-Mike


Top 50 Contributor
11 Posts
dawood replied on Fri, Apr 17 2009 6:29 AM

I have been trying to update WebClient.cs, without success. :(

I need to achieve the following:

1. Crawl and index content from our SharePoint intranet website, which uses Windows Authentication.

2. Crawl and index an internet website which uses Forms Authentication.

I have the necessary credentials for both, but my crawl doesn't index these sites and ends with 'The remote server returned an error: (401) Unauthorized'.

Can you kindly suggest the implementation required to achieve the above requirements?

It would be of great help if you could brief me on this.

Thanks in advance...

Have a good weekend  with your bands :)

Top 10 Contributor
1,905 Posts

This looks like an excellent resource for modifying WebClient.cs: http://msdn.microsoft.com/en-us/library/system.net.credentialcache.aspx

Try the code below?

 

 

public WebClient(string userAgent, ArachnodeDAO arachnodeDAO)
{
    CachePolicy = new RequestCachePolicy(RequestCacheLevel.CacheIfAvailable);

    // UserName, SecurelyStoredPassword and Domain are placeholders from the MSDN example.
    CredentialCache myCache = new CredentialCache();
    myCache.Add(new Uri("http://www.contoso.com/"), "Basic",
        new NetworkCredential(UserName, SecurelyStoredPassword));
    myCache.Add(new Uri("http://www.contoso.com/"), "Digest",
        new NetworkCredential(UserName, SecurelyStoredPassword, Domain));

    Credentials = myCache;
}

 


Top 10 Contributor
1,905 Posts

Actually, I just added an overload to Crawler.cs so you can pass in a CredentialCache.  Try that?
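
Usage would look something like this (hypothetical sketch; the exact overload signature isn't shown here, so the other constructor arguments are elided):

// Hypothetical: only the CredentialCache parameter is confirmed above.
CredentialCache credentialCache = new CredentialCache();
credentialCache.Add(new Uri("http://intranet.example.com/"), "NTLM",
    new NetworkCredential("user", "password", "DOMAIN"));

Crawler crawler = new Crawler(/* existing constructor arguments, */ credentialCache);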

Thanks!
Mike


Top 50 Contributor
11 Posts
dawood replied on Mon, Apr 20 2009 5:40 AM

Many Thanks for the update.

Along with the above changes (Crawler.cs & WebClient.cs), I added the following to the GetWebResponse function in WebClient.cs:

protected override WebResponse GetWebResponse(WebRequest request)
{
    ((HttpWebRequest)request).Accept = _accept;
    ((HttpWebRequest)request).UserAgent = _userAgent;
    request.Timeout = ApplicationSettings.CrawlRequestTimeoutInMinutes * 60000;

    // dawood added
    NetworkCredential myCredential = Credentials.GetCredential(
        new Uri("http://psi-epm/sites/LRO3.0_ORM/default.aspx"), "Basic");
    request.Credentials = myCredential;
    // end

    try
    {
        _webException = null;
        _webResponse = base.GetWebResponse(request);

        if (_webResponse.ResponseUri != request.RequestUri)
        {
            int i = 0; // ...
        }
        // ...
}

I achieved the following:

1. The Unauthorized error I reported earlier no longer occurs.

However, the results are a bit erratic. For a search, I am not getting the link as output, but I do get results, as below:

------------------------------------------------------------------------

arachnode.net
1 / 51
Results 1 - 10 of about 5 for alfred. (0 seconds)
P and L 21/12/2005 15:35 Alfred 05 Statement of Work 21/12/2005 15:35 Alfred 06 Statement of Work - RF - Real Time Feedback 18/04/2006 10:54 Alfred 07 ...
http://psi-epm/sites/LRO3.0_ORM/01%20%20Pre%20%20Sales/Forms/AllItems.aspx - Cached (1)
Page: 1

------------------------------------------------------------------------

As you can see above, no link is available for the search result... Also, the count shows 51, but all I can see is one result...

Also, if I search for a phrase like 'improvement program', I am not getting results for pages that contain both keywords; e.g., a specific page that has both keywords is not displayed in the results.

 Your suggestions will help me to proceed further...

Thanks in advance !

 

Top 50 Contributor
11 Posts
dawood replied on Mon, Apr 20 2009 6:02 AM

One more quick clarification:

I have changed the value for '.doc' to 'false' in the 'DisallowedFileExtensions' table, but my search still does not fetch results from the .doc files available on my web sites...

Thanks

daw

Top 10 Contributor
1,905 Posts

Check the table 'AllowedDataTypes'.  See DiscoveryTypes for the value for DiscoveryTypeID.

Perhaps I should clear the rows from 'DisallowedFileExtensions'... ???
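
A quick way to inspect them is the sketch below; it assumes both tables live in the arachnode.net database and that AllowedDataTypes carries a DiscoveryTypeID column, as described above:

using System.Data.SqlClient;

string connectionString = "Server=localhost;Database=arachnode.net;Integrated Security=true"; // placeholder

using (SqlConnection connection = new SqlConnection(connectionString))
using (SqlCommand command = new SqlCommand(
    "SELECT adt.*, dt.* " +
    "FROM AllowedDataTypes adt " +
    "JOIN DiscoveryTypes dt ON dt.DiscoveryTypeID = adt.DiscoveryTypeID", connection))
{
    connection.Open();
    using (SqlDataReader reader = command.ExecuteReader())
    {
        while (reader.Read())
        {
            // Inspect which data types (e.g. application/msword for .doc) are allowed.
        }
    }
}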

