Archive for the ‘Uncategorized’ Category

look out google!

Tuesday, June 30th, 2009

For ages I have been meaning to add some sort of search to mycomparer.  Well, we’re live!  I spent maybe four hours total on it over two nights, and implemented the following features:

  1. walk through each word in the query, search categories, and += matching categories (see the sketch after this list)
  2. walk through each word in the query and check against upcs
  3. walk through each word in the query and check against affiliate ids, like searching by asin
  4. good old full text search via mysql
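In case it’s useful, here is roughly how those first three passes hang together.  This is a sketch, not my actual code: the table and column names (sh_product_category, sh_category, upc, affiliate_id) are guesses for illustration.

use strict;
use warnings;
use DBI;

# $dbh is a connected DBI handle
sub quick_search {
    my ($dbh, $query) = @_;
    my %ids;    # product_id => hit count, += style
    for my $word (grep { length } split /\s+/, $query) {
        # 1. match categories by name, then pull in their products
        my $by_category = $dbh->selectcol_arrayref(
            'SELECT pc.product_id FROM sh_product_category pc
               JOIN sh_category c ON c.id = pc.category_id
              WHERE c.name LIKE ?', undef, "%$word%");
        # 2. exact match against upcs
        my $by_upc = $dbh->selectcol_arrayref(
            'SELECT id FROM sh_product WHERE upc = ?', undef, $word);
        # 3. exact match against affiliate ids, like searching by asin
        my $by_affiliate = $dbh->selectcol_arrayref(
            'SELECT id FROM sh_product WHERE affiliate_id = ?', undef, $word);
        $ids{$_}++ for @$by_category, @$by_upc, @$by_affiliate;
    }
    return \%ids;
}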

I did the first three the first night, and started the full text search.  Here’s what I had to do.

  • Create a table
    • CREATE TABLE sh_product_my_text (
      product_id INT NOT NULL,
      FOREIGN KEY (product_id) REFERENCES sh_product(id) ON DELETE CASCADE,
      my_text TEXT NOT NULL,
      FULLTEXT(my_text),
      timestamp TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
      ) ENGINE=MyISAM;
  • then in my shopping db population process, I populated the table with some product stuff
  • I change a query like “computers netbook -wireless” to “+computers +netbook -wireless”
  • the transformed query then gets bound to the ? placeholder in (sketched after this list)
    • SELECT product_id FROM sh_product_my_text WHERE MATCH(my_text) AGAINST(? IN BOOLEAN MODE) LIMIT 20;
  • also added key_buffer_size=1024M to my mysql config.  performance was pretty terrible before this change, pretty good after
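The query transformation plus the lookup is only a few lines.  A sketch, assuming a $dbh handle and the table above:

# turn "computers netbook -wireless" into "+computers +netbook -wireless"
sub booleanize {
    my ($query) = @_;
    return join ' ', map { /^[-+]/ ? $_ : "+$_" }
                     grep { length } split /\s+/, $query;
}

sub fulltext_search {
    my ($dbh, $query) = @_;
    return $dbh->selectcol_arrayref(
        'SELECT product_id FROM sh_product_my_text
          WHERE MATCH(my_text) AGAINST(? IN BOOLEAN MODE) LIMIT 20',
        undef, booleanize($query));
}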

Course, I implemented it as a service, and I tie into the service via my Template::Plugin::WebService with code that looks like

[% USE web_service = WebService %]
[% search_ref = web_service.webservice_call('/api/shop/search', form) %]

Can’t tell you how cool I think that is.  If I decide to serve straight from flex or something, then it is pretty well no code change.

And that’s about it.  Give it a go.

Enjoy!

Earl

a little pig helping make me famous

Thursday, June 18th, 2009

A while ago I wrote some (I think) cool stuff for pig that allowed for parsing apache logs.  Unfortunately, I wrote it on an old branch.  I didn’t really know it was an old branch, or that everything I wrote would need to get ported, but there you go.  Recently, someone ported my stuff (which was awesome!), and folks at cloudera are blogging about it.

Years ago, I wrote this (I thought) cool stuff, Data::Fallback, which would allow you to pull data from various sources.  I don’t think anyone in the world ever used it.  Like ever.  In fact, once I discovered memcached, even I quit using it.  Kind of cool that folks might actually use some stuff I wrote.

Earl

fixing bugs and satisfying users

Thursday, June 18th, 2009

One of my “many” hosting customers mentioned that when he logs into his site, he sometimes sees an error.  Well, it turns out I would sometimes see that error too, and his mentioning it inspired me to look into it.  It comes down to speed.  I have my admin stuff hosted on google app engine, and it talks to my backend via web service.  Turns out that google doesn’t want to host slow serving pages, meaning pages that take more than about five seconds to load.  And it turns out that my web service would sometimes take more than five seconds.  There were a couple issues.

  1. Memcached helps me not hit the database.  I used to have servers at 10.1.1.1 and 10.1.1.2.  A little while ago I quit running the 10.1.1.2 server, but the code was still checking it.  I think I would hit some connect timeout which wasn’t too long, but it slowed me down enough to annoy google.
  2. Memcached is all about what the memkey is.  You look up values based on a memkey.  Well, I call it a memkey anyway.  For me, if the memkey doesn’t return something, I hit the database and then add the memkey.  Well, in my code for getting a user’s configuration, I had hard coded $memkey = time, which means the key changed every second, the lookup always missed, and I regenerated the conf from the database on every request.  I guess that someday in the past I wanted to generate the conf each time, and then just happened to commit.  Oops.  (The pattern is sketched after this list.)
  3. Added an index or two to mysql, but don’t think that helped too much.
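For the curious, the memkey pattern is plain get-or-compute.  A minimal sketch with Cache::Memcached; load_conf_from_database and the key format are made up for illustration:

use Cache::Memcached;

# just the live server now; the dead 10.1.1.2 entry was the timeout culprit
my $memd = Cache::Memcached->new({ servers => ['10.1.1.1:11211'] });

sub get_user_conf {
    my ($user_id) = @_;
    my $memkey = "user_conf:$user_id";    # the bug: this was $memkey = time
    my $conf   = $memd->get($memkey);
    return $conf if $conf;
    $conf = load_conf_from_database($user_id);    # hypothetical db helper
    $memd->set($memkey, $conf, 3600);             # cache for an hour
    return $conf;
}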

I am afraid that folks would try to log in, get an error, and give up.  They sure wouldn’t be likely to tell their friends to come sign up for a site.

Enjoy!

Earl

bulk load and +=

Saturday, June 13th, 2009

Let’s suppose that you have log files of some sort pouring in and you want to put aggregate data representing the logs into an rdbms.  To begin, let’s start with a blank slate, i.e., just dumping the data in.  And let’s have a simple table, which in mysql is created via

CREATE TABLE `history` (
`id` int(11) NOT NULL auto_increment,
`hits` int(11) NOT NULL,
PRIMARY KEY  (`id`)
);

I did a pass each for both MyISAM and Innodb, inserting a million rows per pass.

engine    rows per INSERT       seconds (lower is better)
MyISAM    10000                 7.046952963
MyISAM    1000                  7.342753172
MyISAM    100                   8.521313906
MyISAM    10                    31.44731498
MyISAM    1                     135.3045712
MyISAM    load data infile      4.927606106
Innodb    10000                 19.76374817
Innodb    1000                  30.58060002
Innodb    100                   89.54839206
Innodb    10                    723.135994
Innodb    load data infile      17.25715899

A multi-value insert for three values looks like this

INSERT INTO today (hits) VALUES (?), (?), (?)

Then I execute with the three values.
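Building that statement dynamically for N rows is just a join on placeholders.  A sketch:

# insert @hits into today in one multi-value statement
sub bulk_insert {
    my ($dbh, @hits) = @_;
    return unless @hits;
    my $placeholders = join ', ', ('(?)') x @hits;
    $dbh->do("INSERT INTO today (hits) VALUES $placeholders", undef, @hits);
}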

The fact that inserts with 1000 rows per statement start to approach the load data infile numbers is a little compelling.  But let’s suppose that we want to do every insert via bulk load, and we also want a table (like history above) that holds aggregate data, += style.  Is it possible?  Sure.

Here is one approach for mysql:

  1. Create a temp table, which I will call today
  2. Bulk load the data into today
  3. Run the query INSERT INTO history (SELECT * FROM today) ON DUPLICATE KEY UPDATE history.hits = history.hits + today.hits;
  4. Drop today
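In DBI, those four steps are only a few lines.  A sketch, assuming the incoming file is a tab-delimited dump matching history’s columns:

# aggregate a batch of rows from $infile into history, += style
sub bulk_aggregate {
    my ($dbh, $infile) = @_;
    $dbh->do('CREATE TEMPORARY TABLE today LIKE history');
    $dbh->do('LOAD DATA LOCAL INFILE ' . $dbh->quote($infile)
           . ' INTO TABLE today');
    $dbh->do('INSERT INTO history (SELECT * FROM today)
              ON DUPLICATE KEY UPDATE history.hits = history.hits + today.hits');
    $dbh->do('DROP TEMPORARY TABLE today');
}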

I would like to apply this strategy and contribute some pig code that allows for bulk insert.  This would (I think) allow for some pretty large scale aggregating, all from within a “simple” pig script.  Would also like to start using chukwa, but it looks a little tough.  I think the architecture would then look something like

web servers -> chukwa -> pig -> mysql

I think then I would be pretty well at yahoo! or facebook scale.

Guess we’ll see how it all goes 🙂

Enjoy!

Earl

eating someone else’s dog food

Wednesday, April 15th, 2009

I was pretty excited about getting sitemaps working, so much so that I recently wrote about it.  Turns out I had a couple bugs in my implementation.  When I was on diamondcomparer.com, I would do something like show all the categories / products that diamondcomparer didn’t actually offer.  Also turns out that pretty well each shopping site had more than 50,000 urls, the per-file limit in the sitemap protocol, which meant I had to break things up a bit.  Plus, I wasn’t gzipping things, and I wasn’t real confident I was doing everything right.  So, I decided to use google’s open source code for generating sitemaps, which I figured handled everything I was looking for.

In the past I had used the google code for crawling directories, but now I needed to pull my list of urls from a database.  Well, turns out the google code can handle that as well.  You just dump the urls to a file, make a config file explaining a few things (sketched below), and then away you go.  Was really not too bad.  Stayed up till three am last night getting this to work

[image: sitemaps]
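If memory serves, the config for google’s sitemap_gen.py is a little xml file along these lines; the paths here are placeholders, not my real layout:

<?xml version="1.0" encoding="UTF-8"?>
<site base_url="http://mycomparer.com/"
      store_into="/path/to/htdocs/sitemap.xml.gz"
      verbose="1">
  <!-- the urls dumped from the database, one per line -->
  <urllist path="/path/to/urls.txt" encoding="UTF-8"/>
</site>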

While I am here, have you seen chrome’s xml viewer?  Yeah, me neither, it just dumps to the screen.

I have been tracking google (and others) crawling my stuff, and it looks like the product pages haven’t been getting crawled.  I am hoping this helps that out.  Guess we shall see.  I am now generating these files and pinging the search engines nightly.  Really would like to get traffic based on product pages being indexed well.

Enjoy!

Earl

1 or more

Tuesday, April 14th, 2009

There are a couple main parts to my shopping vision:

  1. Help users filter down to just the product(s) that they are looking for
  2. Find the best prices on said products

Well, I’ve been hoping for some progress on the second front for a good few years to no avail.  For whatever reason, this last week I integrated products (not a ton, but a few) from newegg, bestbuy and buy.  This allows for actual price comparison, like so, from http://mycomparer.com/ap/B00005ATMK/yo

[image: prices]

Pretty dang cool, I think.

The first one took a fair amount of work, but the next couple were a bit easier.  Looks like overstock.com has a data feed.  I would like to integrate with them and pretty well anyone that pays for conversions and offers a csv or the like.

I bit the bullet and switched everything that was going to trackings.com, shopthar.com, shop.spack.net, yohomes.com (and maybe a few others), and pointed it to http://mycomparer.com/.  Guess we’ll see how it goes.

Enjoy!

Earl

say hi to your NaN!

Tuesday, March 17th, 2009

Recently I cleaned up some shopping stuff so that if I got a single slider (that’s what I call the number picker guys) that was kind of empty, I would kill it.  I knew that fix wasn’t really general, and figured that I would someday have to clean it up.  Well, tonight was the night.  Stuff that looked like this

[image: slider_nan]

now doesn’t have that last chunk.  Not real sure where it comes from, but at least now I can clean it up 🙂

Also cleaned up something a little more subtle.  My goal is to have a pretty generic shopping engine, one which doesn’t know the difference between a hard drive form factor and the clarity of a diamond.  That’s fine, except it also means the engine doesn’t know that IsLabCreated may not be the most meaningful attribute to show, or that weird contract warranty terms aren’t either, or whatever.  Tonight I added the ability to ignore a category of stuff.  I even made it smart so that I walk the lineage of a category and look for ignore lists (sketched below).  The cool part there is that I didn’t need to ignore IsLabCreated for each type of diamond ring.  So now, this

[image: ignore]

has the weird stuff stripped out.  Granted I need to do a database insert each time I find something else to ignore, but that’s ok.  I don’t have that many top level categories to manage (for now).
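The lineage walk itself is short.  A sketch; the sh_category_ignore table and the column names are illustrative, not my actual schema:

# collect ignored attribute names for a category and all of its ancestors,
# so ignoring IsLabCreated once near the top covers every kind of diamond ring
sub ignored_attributes {
    my ($dbh, $category_id) = @_;
    my %ignore;
    while (defined $category_id) {
        my $names = $dbh->selectcol_arrayref(
            'SELECT attribute_name FROM sh_category_ignore WHERE category_id = ?',
            undef, $category_id);
        $ignore{$_} = 1 for @$names;
        ($category_id) = $dbh->selectrow_array(
            'SELECT parent_id FROM sh_category WHERE id = ?',
            undef, $category_id);
    }
    return \%ignore;
}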

Next up, I would like to specify the order for distinctables to appear.  Like color and clarity before number of stones, or the like.

Tonight it hit me that once I get a little further along, I can go put up some “looking for a friend in the diamond comparing business?” flyers at byu, since I think a few folks down there are looking for rings 🙂

Anyone have any thoughts on mycomparer.com?  I registered it today.  Gotta be better than shopthar, right?  Still liking my diamondcomparer.com.

Enjoy!

Earl

me look pretty one day

Sunday, March 15th, 2009

So, trying to clean some things up a bit on the shopping stuff.  First thing I did was get rid of the check boxes and replace them with links.  The next was to add a couple currency formatters.  A little subtle, but, you know, I think a year or two of such changes could help a lot 🙂

Before:

[image: prices_before_left]

After:

[image: prices_after_left]

Before:

[image: prices_before_right]

After:

[image: prices_after_right]

Pretty nice, right?  Especially if anyone buys pretty well anything that has a comma in its price 🙂  Sorry, I don’t have the old check box stuff.  I updated too quickly.

Enjoy!

Earl

a little documentation

Saturday, March 14th, 2009

A while ago, I wrote about LogHelper.  Well, sort of.  Turns out I just mentioned that I hadn’t really mentioned it before, then went on my way.  Nothing too remarkable I guess, I would just like to parse folks’ rather arbitrary logs and let them do a pretty set list of analytics.  The theory is that folks would send me their logs, I would ingest them and output some pretty graphs.  As a proof of concept, I tried to create and eat my own dog food with HolaServers and LogHelper.  I scp’d logs from HolaServers to another box, parsed the logs there with good old pig (which I have written about), and display graphs using the Google Visualization API.  The cool graphs look like this

[image: hs_traffic_graph1]

A little side note.  Sometime ago, I made it so you could email a photo to your HolaServers site.  I would check once a minute for messages and act accordingly.  Well, the code I was using logged quite a bit and I didn’t realize it was logging anything.  Long story short, I filled up /tmp and my logging stuff started emailing me every few minutes.  So, I commented out a cron or two and quit doing my LogHelper stuff.  A week or two later I discovered that the mail checker was the real culprit, but couldn’t figure out how to get the logging to work again.  If I had just had the following few lines written down, I would have saved several nights’ worth of hours.

  • the magic conf file is conf/loghelper.com/mylogadm.conf, which needs to be copied to /etc/mylogadm.conf
  • on client boxes, this is the magic cron
  • 5,10,15,20,25,30,35,40,45,50,55 * * * * /usr/sbin/logadm -p now /opt/coolstack/apache2/logs/mylog -v -f /etc/mylogadm.conf
  • on the server box, this is the magic cron
  • 2,7,12,17,22,27,32,37,42,47,52,57 * * * * perl -I/export/home/earl/svn/lib /export/home/earl/svn/crons/loghelper/go

If you want to see the stats in action, just sign up for a HolaServers site, get some traffic, and then wait a few minutes to see things show up.

Enjoy!

Earl

now flex!

Sunday, March 8th, 2009

So, I dove in and learned some flex.  I watched many of these videos, referred to this gallery, installed flex builder 3 and cut some code for ShopThar.  I did a few things in not a ton of time, none of which anyone has likely noticed, but here’s a little list.

  1. ShopThar pages, like this one, have a nice image pulled from amazon for ranking stars.  There was a bug where flex would translate the 100-based numbers into a 5.5 rating on a 5 scale.  Got a nice broken image.  So, I finally tracked down the problem (which was hard because what I thought was a built in flex function was actually a shopthar function) and fixed it.  No more broken images on reviews!
  2. Used to have ARRAY or HASH stuff (stringified perl references) show up on product pages.  Again, it was hard to track down, but once I found a page with the problem, did some firebug magic, worked around memcache hanging onto bad data, and thought I fixed it.  Turns out the jury may be out a bit.  Just ran a couple queries.
    • delete from sh_product where id in (select product_id from sh_product_distinctable where raw_value like 'ARRAY%');
    • Query OK, 1 row affected (7.17 sec)
    • delete from sh_product where id in (select product_id from sh_product_distinctable where raw_value like 'HASH%');
    • Query OK, 9665 rows affected (5 min 1.95 sec)
  3. Sometimes I would get NaN (not a number) for slider stuff.  Turns out I was sending out garbage data.  I now look specifically for that garbage data and filter it out.  Not the best fix, but at least I fixed that one instance.
  4. Didn’t send out sets of distinctables (the checkboxes) when there was only one choice.  Not much of a choice if there’s only one, huh!

Flex!  I did all of that in flex.  Wow, better go update the resume, huh!  Did I ever mention that I paid someone to make my resume prettier?  Look how pretty!

Coming up next?

  1. Change the checkboxes and filtering to act more like Yahoo! mail filtering.  I would like folks to be able to pick more than one thing from a set, and don’t quite have that worked out in my head.
  2. Put the categories which are currently on the top of the page in html into flex.
  3. See what I think about changing sliders to use the NumericStepper component.
  4. Do some google analytics stuff for tighter tracking.

I am starting to get spidered more and more, but real traffic hasn’t started to hit.

Oh, and I bought and put up diamondcomparer.com.  Took an hour or two to get it all up and running.  A whole new vertical in an hour or two.  Someday this stuff will just have to start making money.  I want to get the engine working really smoothly on diamondcomparer.

Enjoy!

Earl