Archive for May, 2008

linker_site

Tuesday, May 6th, 2008

Well, tonight I spent a few minutes so that in the logged-in area, you can designate a linker_site, which is a site that will eventually have a link to a spack site.  I got the linker_sites into mysql, and added a linker_crawl_history table, though I am not actually crawling.  Of course, it is my crappy looking html.  Yet another thing to be fixed later 🙂

For some reason, the linker_sites seem much less flaky than me trying to get someone to email their friends.  Guess we’ll see what folks do more.

Earl

a little sitemap payoff

Tuesday, May 6th, 2008

Click here to see good old earl.holaservers.com in google, and here for Yahoo!

Now I just need to add some nice link backs for like a million legit sites and I bet I would be doing pretty well 🙂

Earl

(some) bots welcome

Saturday, May 3rd, 2008

Recently, I started playing with sitemaps. I even got a nice cron to help out. Today, I thought I would take a look and see if anyone had crawled one of the domains, say, earl.holaservers.com, and some of the results (in a friendlier format) are below:

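+----------+----------------------+----------------------------------------+
| count(*) | domain               | user_agent                             |
+----------+----------------------+----------------------------------------+
|        3 | earl.holaservers.com | . . . Baiduspider . . . spider_jp.html |
|        3 | earl.holaservers.com | ia_archiver                            |
|        4 | earl.holaservers.com | . . . Yahoo! Slurp                     |
|       14 | earl.holaservers.com | . . . Googlebot                        |
+----------+----------------------+----------------------------------------+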
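For the curious, a count like the one above can come from a simple GROUP BY. Here is a minimal sketch, not my actual code: the `hits` table, its columns, and the sample rows are all made up for illustration, and it uses Python's built-in sqlite3 instead of mysql so it runs anywhere.

```python
import sqlite3

def crawler_counts(conn, domain):
    """Return (count, domain, user_agent) rows for one domain, fewest hits first."""
    return conn.execute(
        "SELECT count(*), domain, user_agent FROM hits "
        "WHERE domain = ? "
        "GROUP BY domain, user_agent "
        "ORDER BY count(*), user_agent",
        (domain,),
    ).fetchall()

# Hypothetical traffic table; the real schema is not shown in this post.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hits (domain TEXT, user_agent TEXT, path TEXT)")

# Invented sample rows, only here to exercise the query.
sample = [
    ("earl.holaservers.com", "Googlebot", "/"),
    ("earl.holaservers.com", "Googlebot", "/robots.txt"),
    ("earl.holaservers.com", "ia_archiver", "/robots.txt"),
    ("other.holaservers.com", "Googlebot", "/"),
]
conn.executemany("INSERT INTO hits VALUES (?, ?, ?)", sample)

rows = crawler_counts(conn, "earl.holaservers.com")
for count, dom, agent in rows:
    print(count, dom, agent)
```

The WHERE clause keeps it to one domain, and grouping by user_agent is what turns a pile of raw hits into one line per bot.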

So it looks like I got crawled by

  1. the good old Baidu spider, which only hit my robots.txt file. Guess the Japanese folks aren’t real interested in my test and rather random ftp’d files.
  2. ia_archiver, who looks to be alexa, though I am not sure where they came from
  3. our dear friend googlebot (welcome!) – who actually got a few pages
  4. yahoo! – just got robots.txt and sitemap.xml
  5. a couple other folks (not listed) just getting sitemap.xml

Looks like the bots came 1-3 days after the pings got sent. Also looks like no pings were sent after the first day. Hmm, very strange, since I have been uploading files here and there. At least one, I think. Well, this helped me track down a bug: if you re-put a file, the timestamp wouldn’t change, and so the pinger wouldn’t find your latest. Think I fixed that, though I will be able to tell you better in a couple of days.
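The re-put bug boils down to something like the sketch below. This is a guess at the shape of the fix, not the real code: `put_file`, `needs_ping`, and the dict-based store are hypothetical names, and the timestamps are plain numbers to keep it simple.

```python
def put_file(store, name, content, now):
    """Store a file and always refresh its modified time, even on a re-put."""
    # The buggy behavior would be to keep the old timestamp when `name`
    # already exists; overwriting the whole record avoids that.
    store[name] = {"content": content, "modified": now}

def needs_ping(store, last_ping):
    """True if any file changed after the last sitemap ping went out."""
    return any(f["modified"] > last_ping for f in store.values())

store = {}
put_file(store, "index.html", "v1", now=100)
first = needs_ping(store, last_ping=50)    # new file since the last ping
quiet = needs_ping(store, last_ping=100)   # nothing new since the ping at 100
put_file(store, "index.html", "v2", now=200)
again = needs_ping(store, last_ping=100)   # the re-put is now picked up
```

The point is just that the pinger only looks at timestamps, so anything that changes a file has to bump the timestamp too.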

I think the above results are rather telling. I pinged several search engines all about some random site that they had never heard of, and got rather different results from each, including crickets. Makes me wonder where the other guys are. Like ask.com? Seems like they were trying to make a run at google? Guess more like a run at baidu!

And yahoo!? Wonder why you can’t catch up? Well, might want to start by following the whole sitemap.

Moreover.com? Who are you and why don’t you come and visit?

Did I mention that it is pretty cool to have traffic show up in mysql? Kind of reliant on mysql a bit, but hey, it stays up, and when my one server can’t handle traffic, I think that will be a good problem to have.

Enjoy!

Earl