An Open Source C# web crawler with Lucene.NET search using SQL Server 2008/2012/2014/2016/CE An Open Source C# web crawler with Lucene.NET search using MongoDB/RavenDB/Hadoop

Completely Open Source @ GitHub

Does scale? | Download the latest release
MySQL/Xamarin Port in development now


Initial thoughts: MySQL isn't nearly as polished as SQL Server.  The Migration Wizard has plenty of quirks/bugs.  The maximum key length is 767 bytes and doesn't seem to play well with a varchar(767) column in the converted MySQL database.  Also, none of my views or stored procedures were converted, despite the Migration Wizard seeming to indicate they would be.

Main challenge? - ~77 warnings.  Most of the warnings for my particular conversion center around key/index name length.


There is a VS add-in/Connector for MySQL, which is nice - now I can interact in native mode rather than through ODBC.

What I need is a mature cross-platform compare tool.  There are over 200 tables in and porting them by hand will take months.


I was able to have solid success with this tool:

I wanted to use the TableAdapters for my DAO but it doesn't look like the integration into Visual Studio is completely finished - DBDirect methods don't seem to work for INSERT/UPDATE/DELETE.

I found this, and it looks like I can create a complete CRUD model for MySQL in Visual Studio:

I should be able to make light work of the conversion - more tomorrow...


OK, I have encountered a problem with DBConvert - while it reads the Foreign Keys OK, it doesn't respect their order and so data migration fails.

So, back to the MySQL Migration wizard - I had moved away from the MySQL Migration Wizard as it was clunky and seemed to raise a significant number of errors.

I need to remove the ASP.Net Membership Provider tables for this release - not sure if I can make them compatible as this is a MS-only thing - they are not essential for AN usage and are there as a helper, pre-wired as a WebService should you need them.

After removing the ASP.Net Membership Provider tables I only have one warning, which should be OK - we'll see...

Success!  All tables migrated with all data copied!

Now, I load up the MySQL conversion into CodeFluent.  Be prepared for ".dll hell" if you have any other version than 6.9.5 installed ANYWHERE: 

Next, add a producer...

Select 'Surfaces\Default' (from the CodeFluent project) and Build.

Then, you should have a set of BOM classes to interact with MySQL!

OK, the big snag I have hit is that I can't seem to figure out how to get the entities to INSERT/UPDATE/DELETE into MySQL - adding a MySQL producer drops all table from the intended DB and errors out on trying to create the first table.  I will have to call their support line directly tomorrow...

[EDIT]: OK, be sure that you have your delimiter set.  DELIMITER // :: CodeFluent will blow away your existing DB, so have a SQL backup handy to restore after it generates your new scripts.

I should be able to port each of the IArachnodeDAO methods to MySQL in the upcoming week - I would be surprised if it took me more than a day or two to do.


I have most of the configuration GETS and the DB INSERTS complete - testing, testing, testing.  It looks like I will need to add methods for each Discovery type whereby the Discovery may be selected by ID or by AbsouteUri: WebPages, Images, Files, HyperLinks - CodeFluent should be able to generate the SQL stored procedures for me.  I have about two hours into the code, another six or so and the basics for CRUD should be solid.

Initial performance testing shows that SQL Server and MySQL are neck and neck in resource consumption.  I am unsure why MySQL is consuming more RAM... ???


I have most of the ArachnodeDAOMySQL.cs class completed and most everything seems to be working as expected.  A few things were learned along the way.

One - characters need to be escaped when inserting/updating into MySQL.  MySQL treats character strings just like they are treated in C/C# - \ must be escaped to \\.

Also, there are a ridiculous amount of configuration settings for InnoDB:

We are very likely into territory analgous to the "Inside SQL Server" book series.

What else?  CodeFluent is quite helpful but I have had to design my own workflow as either several things are missing or I just don't know how to operate properly within CodeFluent Entities.

However, their platform independent code is quite nice.  I have had to create methods to select by AbsoluteUri, [Discovery]ID and [Discovery]ID:

I did notice that CodeFluent isn't reading the LONGBLOB column types properly and I must Find/Replace them on each insert.

The other thing to sort out is PERFORMANCE.

My Query TableAdapter can insert 10,000 records in 30 seconds in a particular, non-joined test case, yet it takes MySQL 2 minutes and 5 seconds to perform the same insert operations.


I am still adding MySQL code to port over additional tables, most notably in the cfg. schema - I finished up a major project this last week so I should have additional bandwidth during the week.

I discovered you may have to correct the increment/ID of any MySQL to DataTable populations you may use when populating data from MySQL as there isn't a native bridge from MySQL to the DataTable, as used by AN.


A Beta is checked into SVN.

Posted Wed, Feb 18 2015 3:28 PM by


jusrob wrote re: MySQL/Xamarin Port in development now
on Thu, Mar 5 2015 5:43 PM


I'm a C# developer that's worked with spiders/crawlers in the past.  I tried to download your source but was unable.  I also saw that you were porting to Linux and mySQL.  If you'd like any assistance on the coding, please let me know.  I've been looking for a really cool side project!


Justin / wrote re: MySQL/Xamarin Port in development now
on Mon, Mar 9 2015 7:53 PM


I hope to have some time in the evenings this week - I will let you know, or, perhaps kick what I have to you for testing and input?



An Open Source C# web crawler with Lucene.NET search using SQL 2008/2012/CE

copyright 2004-2017, LLC