Recently, I started playing with sitemaps. I even got a nice cron to help out. Today, I thought I would take a look and see if anyone had crawled one of the domains, say, earl.holaservers.com, and some of the results (in a friendlier format) are below:
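+----------+----------------------+----------------------------------------+
| count(*) | domain               | user_agent                             |
+----------+----------------------+----------------------------------------+
|        3 | earl.holaservers.com | . . . Baiduspider . . . spider_jp.html |
|        3 | earl.holaservers.com | ia_archiver                            |
|        4 | earl.holaservers.com | . . . Yahoo! Slurp                     |
|       14 | earl.holaservers.com | . . . Googlebot                        |
+----------+----------------------+----------------------------------------+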
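For anyone curious how a table like this comes about, here is a rough sketch of the kind of grouped count that would produce it. The `hits` table, its columns, and the sample rows are all made up for illustration, and it uses Python's built-in sqlite3 instead of mysql so it runs on its own; the real schema may look quite different.

```python
import sqlite3


def crawler_counts(conn, domain):
    """Return (count, domain, user_agent) rows for one domain, fewest hits first."""
    return conn.execute(
        "SELECT COUNT(*), domain, user_agent FROM hits "
        "WHERE domain = ? "
        "GROUP BY domain, user_agent "
        "ORDER BY COUNT(*), user_agent",
        (domain,),
    ).fetchall()


def demo_connection():
    """Build an in-memory database with a hypothetical hits table and fake rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hits (domain TEXT, user_agent TEXT, path TEXT)")
    conn.executemany(
        "INSERT INTO hits VALUES (?, ?, ?)",
        [
            # Invented sample traffic, not the real log.
            ("earl.holaservers.com", "Googlebot", "/robots.txt"),
            ("earl.holaservers.com", "Googlebot", "/sitemap.xml"),
            ("earl.holaservers.com", "ia_archiver", "/robots.txt"),
            ("other.holaservers.com", "Googlebot", "/robots.txt"),
        ],
    )
    return conn


if __name__ == "__main__":
    for row in crawler_counts(demo_connection(), "earl.holaservers.com"):
        print(row)
```

Adding `path` to the GROUP BY would show which bots stopped at robots.txt and which went on to real pages.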
So it looks like I got crawled by:
- the good old Baidu spider, who only hit my robots.txt file. Guess the Japanese folks aren’t really interested in my test and rather random ftp’d files.
- ia_archiver, who looks to be Alexa, though I am not sure where they came from
- our dear friend Googlebot (welcome!) – who actually got a few pages
- Yahoo! – just got robots.txt and sitemap.xml
- a couple of other folks (not listed) just getting sitemap.xml
Looks like the bots came 1-3 days after the pings got sent. Also looks like no pings were sent after the first day. Hmm, very strange, since I have been uploading files here and there. At least one, I think. Well, this helped me track down a bug: if you re-put a file, the timestamp wouldn’t change, and so the pinger wouldn’t find your latest changes. Think I fixed that, though I will be able to tell you better in a couple of days.
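Here is a toy version of the re-put bug described above, just to make the idea concrete. The `FileStore` class and `changed_since` function are hypothetical names invented for this sketch, not the actual code; the point is only that a put has to refresh the timestamp every time, or the pinger sees nothing new.

```python
import time


class FileStore:
    """Toy in-memory stand-in for the upload store (hypothetical)."""

    def __init__(self):
        self.modified = {}  # path -> last-modified timestamp

    def put(self, path, now=None):
        # Always refresh the timestamp, even when the path already exists.
        # The bug amounts to skipping this step on a re-put.
        self.modified[path] = time.time() if now is None else now


def changed_since(store, last_ping):
    """Paths modified after the last ping, i.e. what the pinger should report."""
    return sorted(p for p, ts in store.modified.items() if ts > last_ping)
```

With this in place, re-putting a file after the last ping makes it show up in `changed_since` again, which is what should trigger a fresh ping.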
I think the above results are rather telling. I pinged several search engines about some random site that they had never heard of, and got rather different results from each, including crickets. Makes me wonder where the other guys are. Like ask.com? Seems like they were trying to make a run at Google? Guess more like a run at Baidu!
And Yahoo!? Wonder why you can’t catch up? Well, you might want to start by following the whole sitemap.
Moreover.com? Who are you and why don’t you come and visit?
Did I mention that it is pretty cool to have traffic show up in mysql? I am kind of reliant on mysql, but hey, it stays up, and when my one server can’t handle the traffic, I think that will be a good problem to have.
Enjoy!
Earl