Archive for June, 2009

look out google!

Tuesday, June 30th, 2009

For ages I have been meaning to add some sort of search to mycomparer.  Well, we’re live!  I spent maybe four hours total on it over two nights, and implemented the following features (a rough sketch follows the list):

  1. walk through each word in the query and search categories, accumulating (+=) matching categories
  2. walk through each word in the query and check against UPCs
  3. walk through each word in the query and check against affiliate ids, like searching by ASIN
  4. good old full-text search via mysql
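
For the first three, the idea is just splitting the query on whitespace and checking each word against the relevant tables.  Here is a minimal sketch of how that could look; sh_product is real (it gets referenced in the schema below), but sh_category and the column names are hypothetical stand-ins, not my actual schema:

# Rough sketch only; sh_category and the columns are guesses.
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:mysql:database=shop', 'user', 'pass',
                       { RaiseError => 1 });

sub search_first_three {
    my ($query) = @_;
    my %ids;    # matches found so far, += style
    for my $word (split /\s+/, $query) {
        # 1. categories whose name matches the word
        my $cat_ids = $dbh->selectcol_arrayref(
            'SELECT id FROM sh_category WHERE name LIKE ?',
            undef, "%$word%");
        $ids{"category:$_"}++ for @$cat_ids;

        # 2 and 3. exact matches on upc and affiliate id (like an asin)
        for my $col (qw(upc affiliate_id)) {
            my $prod_ids = $dbh->selectcol_arrayref(
                "SELECT id FROM sh_product WHERE $col = ?",
                undef, $word);
            $ids{"product:$_"}++ for @$prod_ids;
        }
    }
    return \%ids;
}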

I did the first three the first night, and then started on the full-text search.  Here’s what I had to do.

  • Create a table
    • CREATE TABLE sh_product_my_text (
      product_id INT NOT NULL,
      FOREIGN KEY (product_id) REFERENCES sh_product(id) ON DELETE CASCADE,
      my_text TEXT NOT NULL,
      FULLTEXT(my_text),
      timestamp TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
      ) ENGINE=MyISAM;
  • then, in my shopping db population process, I populated the table with text for each product
  • I change a query like “computers netbook -wireless” to “+computers +netbook -wireless” (see the sketch after this list)
  • that transformed string gets bound to the ? placeholder in
    • SELECT product_id FROM sh_product_my_text WHERE MATCH(my_text) AGAINST(? IN BOOLEAN MODE) limit 20;
  • also added --key_buffer_size=1024M to my mysql config.  Full-text search was pretty terrible before this change, pretty good after
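
Putting the last few bullets together, here is a hedged sketch of the boolean-mode rewrite plus the query, via DBI ($dbh and $raw_query are assumed to exist; the SQL is from above, the Perl around it is my reconstruction):

# Prefix bare words with + so "computers netbook -wireless"
# becomes "+computers +netbook -wireless".
my $boolean = join ' ',
    map { /^[+-]/ ? $_ : "+$_" } split /\s+/, $raw_query;

# Bind the rewritten query to the ? placeholder.
my $product_ids = $dbh->selectcol_arrayref(
    'SELECT product_id FROM sh_product_my_text
      WHERE MATCH(my_text) AGAINST(? IN BOOLEAN MODE) LIMIT 20',
    undef, $boolean,
);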

Of course, I implemented it as a service and tie into the service via my Template::Plugin::WebService with code that looks like

[% USE web_service = WebService %]
[% search_ref = web_service.webservice_call('/api/shop/search', form) %]

Can’t tell you how cool I think that is.  If I decide to serve straight from flex or something, there is pretty well no code change.

And that’s about it.  Give it a go.

Enjoy!

Earl

a little pig helping make me famous

Thursday, June 18th, 2009

A while ago I wrote some (I think) cool stuff for pig that allowed for parsing apache logs.  Unfortunately, I wrote my stuff on an old branch.  I didn’t realize it was an old branch and that everything I wrote would need to get ported, but there you go.  Recently, someone ported my stuff (which was awesome!), and folks at cloudera are blogging about it.

Years ago, I wrote this (I thought) cool stuff, Data::Fallback, which would let you pull data from various sources.  I don’t think anyone in the world ever used it.  Like, ever.  In fact, once I discovered memcached, I quit using it myself.  Kind of cool that folks might actually use some stuff I wrote.

Earl

fixing bugs and satisfying users

Thursday, June 18th, 2009

One of my “many” hosting customers mentioned that when he logs into his site, he sometimes sees an error.  Well, it turns out I would sometimes see that error too, and his mentioning it inspired me to look into it.  It comes down to speed.  I have my admin stuff hosted on google app engine, and it talks to my backend via web service.  Turns out that google doesn’t want to host slow-serving pages, meaning pages that take more than about five seconds to load.  And it turns out that my web service would sometimes take more than five seconds.  There were a couple of issues.

  1. Memcached helps me not hit the database.  I used to have servers at 10.1.1.1 and 10.1.1.2.  A little while ago I quit running the 10.1.1.2 server, but was still checking it in the code.  I think I would hit some timeout which wasn’t too long, but it slowed me down enough to annoy google.
  2. Memcached is all about what the memkey is.  You look up values based on a memkey.  Well, I call it a memkey anyway.  For me, if the memkey doesn’t return something, I hit the database and then set the memkey.  Well, in my code for getting a user’s configuration, I had hard coded $memkey = time, which meant that each time the code ran, I would fail to get the conf and hit the database.  I guess that someday in the past I wanted to generate the conf each time, and then just happened to commit.  Oops.  (See the sketch after this list.)
  3. Added an index or two to mysql, but I don’t think that helped too much.
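
Here is a minimal sketch of the fixed conf lookup, assuming Cache::Memcached and a hypothetical load_conf_from_db(); the point is that the memkey has to be stable across requests:

use Cache::Memcached;

# Only list the server that is actually running; checking the dead
# 10.1.1.2 box was issue 1 above.
my $memd = Cache::Memcached->new({ servers => ['10.1.1.1:11211'] });

sub get_user_conf {
    my ($user_id) = @_;
    my $memkey = "user_conf:$user_id";    # stable key, NOT time()
    my $conf   = $memd->get($memkey);
    return $conf if defined $conf;
    $conf = load_conf_from_db($user_id);  # hypothetical database fallback
    $memd->set($memkey, $conf, 3600);     # cache for an hour
    return $conf;
}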

I am afraid that folks would try to log in, get an error, and give up.  They sure wouldn’t be likely to tell their friends to come sign up for the site.

Enjoy!

Earl

bulk load and +=

Saturday, June 13th, 2009

Let’s suppose that you have log files of some sort pouring in and you want to put aggregate data representing the logs into an RDBMS.  To begin, let’s start with a blank slate, i.e., just dumping the data in.  And let’s have a simple table, which in mysql is created via

CREATE TABLE `history` (
`id` int(11) NOT NULL auto_increment,
`hits` int(11) NOT NULL,
PRIMARY KEY  (`id`)
);

I did a pass each for both MyISAM and InnoDB, inserting a million rows total and varying how many values went into each multi-value INSERT.

engine   values per insert   seconds (lower is better)
MyISAM   10000               7.046952963
MyISAM   1000                7.342753172
MyISAM   100                 8.521313906
MyISAM   10                  31.44731498
MyISAM   1                   135.3045712
MyISAM   load data infile    4.927606106
InnoDB   10000               19.76374817
InnoDB   1000                30.58060002
InnoDB   100                 89.54839206
InnoDB   10                  723.135994
InnoDB   load data infile    17.25715899

A multi-value insert for three values looks like this

INSERT INTO today (hits) VALUES (?), (?), (?)

Then I execute it with the three values bound to the placeholders.
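
Here is a sketch of building that kind of statement for an arbitrary batch with DBI ($dbh assumed; the @hits data is made up):

# Build "INSERT INTO today (hits) VALUES (?), (?), ..." for N rows
# and execute it with all the values bound at once.
my @hits = (3, 1, 4);    # example data
my $sql  = 'INSERT INTO today (hits) VALUES '
         . join(', ', ('(?)') x @hits);
$dbh->do($sql, undef, @hits);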

The fact that inserts with 1000 values start to approach the load data infile numbers is a little compelling.  But let’s suppose that we want to do every insert via bulk load, while still having a table (like history above) that has aggregate data, += style.  Is it possible?  Sure.

Here is one approach for mysql (sketched in code after the list):

  1. Create a temp table, which I will call today
  2. Bulk load the data into today
  3. Run the query INSERT INTO history (SELECT * FROM today) ON DUPLICATE KEY UPDATE history.hits = history.hits + today.hits;
  4. Drop today
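
And here is the whole four-step flow as a sketch via DBI; the file path and field format are assumptions:

# 1. temp table with the same structure as history
$dbh->do('CREATE TEMPORARY TABLE today LIKE history');

# 2. bulk load the incoming data (path and format are made up)
$dbh->do(q{LOAD DATA INFILE '/tmp/today.csv'
           INTO TABLE today
           FIELDS TERMINATED BY ','});

# 3. += the aggregates into history
$dbh->do(q{INSERT INTO history (SELECT * FROM today)
           ON DUPLICATE KEY UPDATE history.hits = history.hits + today.hits});

# 4. clean up
$dbh->do('DROP TEMPORARY TABLE today');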

I would like to apply this strategy and contribute some pig code that allows for bulk insert.  This would (I think) allow for some pretty large scale aggregating, all from within a “simple” pig script.  I would also like to start using chukwa, but it looks a little tough.  I think the architecture would then look something like

web servers -> chukwa -> pig -> mysql

Then I think I would be pretty well at yahoo! or facebook scale.

Guess we’ll see how it all goes 🙂

Enjoy!

Earl