swift (saio) on rackspace

August 5th, 2010

Thought I should get a little acquainted with rackspace, so I tried my scripts and process from this post over there.  The steps are similar.

  1. Start an Ubuntu 10.04 LTS (Lucid Lynx) server, which I named saio; 256 megs of RAM worked for me
  2. Shell to your instance as the root user.
  3. wget http://blog.spack.net/saio.sh
  4. bash saio.sh

Make sure you source /etc/profile (. /etc/profile), then you can try st stat.
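
For reference, the whole session on a fresh rackspace box is just the steps above strung together; a quick sketch, run as root:

$ wget http://blog.spack.net/saio.sh
$ bash saio.sh
$ . /etc/profile
$ st stat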

The ubuntu setups between amazon and rackspace are a little different, so I had to change my scripts a bit, but they should now work on both platforms.  I read a while ago about rackspace claiming they were much faster than amazon, so I thought I would put them to the test.  On amazon, my script (which pulls down quite a few packages and is a bit expensive) took

real    3m48.428s
user    1m9.970s
sys     0m1.800s

on rackspace

real    2m9.948s
user    1m17.100s
sys     0m7.920s
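
Those real/user/sys lines are what you get from wrapping the run in time, so reproducing the comparison on either provider is just

$ time bash saio.sh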

So, yeah, rackspace took about 57% of the time that amazon did (129.9 seconds of wall clock versus 228.4).  That’s a pretty real difference.  And the rackspace server cost 1.5 cents per hour.  That’s pretty much free.

Since this was my first time on rackspace, here are some general thoughts.

  • Got a verification call after signup which I actually liked.  Guy was friendly and the call was brief.
  • Haven’t tried the cloud tools stuff, but the aws console feels a little more mature.
  • Didn’t have to set up keys or security groups with rackspace.  Back in the day, those things (especially the key) gave me pause on aws.  Of course now I can do it in my sleep (nearly literally sometimes), but skipping that might be nice for a first-time cloud user.  It might also be nice to have a key-based option, and perhaps one exists.  Seems like you would want security groups available, but I haven’t found them in my quick browsing.
  • I got to / had to give the server a name.  This is actually a nice feature.  Pretty easy to get lost at aws in their instance ids.
  • A couple of the rackspace dialogs are a little slow to respond, so I ended up trying to start an instance twice; the second attempt failed because the name was already taken.

If nothing else, rackspace’s openstack effort has now earned them nearly a nickel from my testing efforts!

Enjoy!

Earl

swift / saio amazon install

July 22nd, 2010

So, I am pretty interested in cloud computing of late and thought I would give swift from openstack a shot.  After going through the several steps described here, I decided to try and script the process.  Here are the (hopefully) much simpler steps.

  1. Create an instance at Amazon using ami-bd37ded4, which is Ubuntu 10.04.
  2. Shell to your instance as the ubuntu user.
  3. wget http://blog.spack.net/saio.sh
  4. sudo bash saio.sh

Once the script finishes, source your /etc/profile (. /etc/profile) to pick up a couple of environment variables, and you should be able to see something by running

$ st stat

st is a command line tool for interacting with swift.  Type st<ENTER> to see some options.

A few notes.  The saio.sh script needs to be run as root, and it will grab a couple more scripts (saio-guest-1.sh and saio-guest-2.sh) that get run as the user you create.  You can set a few environment variables before running the scripts if you like, but the defaults will likely work just fine for you.

DISK_SIZE – size in bytes for how big of a drive to make; defaults to 10240.  The scripts just create a loopback device.

MY_USER – user to run swift as, defaults to swift

MY_GROUP – group to run swift as, defaults to $MY_USER

MY_SHELL – shell for $MY_USER, defaults to /bin/bash
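
If you do want to override the defaults, a sketch like the following should do it.  The values and user name here are made up for illustration, and the root shell (sudo -i) is just so sudo’s environment scrubbing doesn’t drop the variables:

$ sudo -i
export DISK_SIZE=102400     # ten times the documented default
export MY_USER=swiftuser    # hypothetical; the default is swift
export MY_GROUP=swiftuser
export MY_SHELL=/bin/bash
wget http://blog.spack.net/saio.sh
bash saio.sh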

We now can poke around a bit using st.  Let’s make some dummy content

$ mkdir my_cool_directory

$ for i in {1..5};do echo "howdy file $i on `date`" >> my_cool_directory/$i; done

$ st upload my_cool_container my_cool_directory

Then you can try a couple things like

$ st stat

Account: aea05981-0136-4024-a2b1-d015df8e0c96
Containers: 1
Objects: 0
Bytes: 0

$ st list

my_cool_container

$ st list my_cool_container

my_cool_directory/1
my_cool_directory/2
my_cool_directory/3
my_cool_directory/4
my_cool_directory/5
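
If memory serves, st can also pull everything back down; something along these lines (hedged, since I only exercised upload here):

$ st download my_cool_container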

I think the beauty of cloud computing is that for 8.5 cents (an hour) you can test out a likely pretty cool new offering.  Eventually, I would like to provide a lasic script that might be a touch cleaner, but I think this is rather clean 🙂

Enjoy!

Earl

other than that, how was the play?

August 23rd, 2009

Oh that we could all be a little more forgiving like Luis below.

[image: npr_does_not_work_at_all]

About a year ago I posted about my hopes for android and it helping me write CrumbTracker.  It’s been about a month since I got my MyTouch, a google phone running android.  I have kind of caught the vision of app stores and the like.  Played a fair amount of a pretty killer tower defense game, Robo Defense, which I even bought after getting hooked on the trial.

This past week, in my copious spare time, I dove in and worked on my android app CrumbTracker, which ties into my first real java appengine work on the web site.  The vision is that the phone phones home to an appengine web site.  If you have an android powered phone, give it a go.  Last night I got it so the uploaded crumbs show up on a google map.  Think I will write a series of blog posts about my experience getting it all up and running.  Non-trivial, but hey, I have only worked on it a few nights, so that’s pretty cool.

Here are some posts I am thinking about writing

  • switching a domain from google apps to appengine
  • SQLite on android
  • the datastore on java-powered app engine
  • the emulator
  • doing an http post from java / android
  • recognizing you’re in the android emulator from java
  • sharing code (or not) across android / appengine projects
  • ui in android
  • android location management
  • google maps initiation
  • svn hiccups

Wow, that sounds VERY interesting!  And I hope it is.

Enjoy!

Earl

onion days

August 7th, 2009

Maybe I should wait until I have three of them, but if I worked for the onion, I think I would submit the following for consideration

  • Study finds global warming caused by computers studying global warming.
  • Two NFL athletes move into training camp facility to fight for “first one to practice, last one to leave” title.

What do you think?  Should I send them my resume?  Maybe I’ll wait till I get a third one.

Enjoy!

Earl

some help sending

July 21st, 2009

So, I have a couple famous friends and one of them asked if I could write a little app that could help him send emails to folks on his totally legit email list.  Well, since it is a totally legit email list, folks that really do want to hear from him, I felt pretty comfortable writing the app on google appengine.  About a day after he asked for a little help, google released an offline taskqueue that could be used nicely for sending emails.  Rather serendipitous, I thought.

Lesson in non-engineer customers: sometimes they want you to do things that you think you can’t do, but with a little effort you can.  My friend said he would like to just maintain spreadsheets in google docs for who to send to, then have my program look at the spreadsheets and send accordingly.  I was like, “yeah, I don’t think we can do that.”  We started to discuss how to maintain the email list just with my app, how to keep things in sync, how to query for users in a certain country or state, etc.  Messy.  Well, turns out using the gdata api, I can authenticate a user, talk to google docs, allow the user to pick a spreadsheet / worksheet, then pull stuff like email addresses.  Awesome!

Took a few nights, but I have something up and running.  My friend says he will give me some nice “powered by” links when he sends, and I am hoping for a blog post here or there.

If you happen to have a totally legit email list (very serious about that part) and would like to trade some sending help for some publicity / marketing, please drop a line to cahille AT yahoo DOT com.

Enjoy!

Earl

look out google!

June 30th, 2009

For ages I have been meaning to add some sort of search to mycomparer.  Well, we’re live!  I spent maybe four hours total on it over two nights, and implemented the following features

  1. walk through each word in the query and search categories and += matching categories
  2. walk through each word in the query and check against upcs
  3. walk through each word in the query and check against affiliate ids, like searching by asin
  4. good old full text search via mysql

I did the first three the first night, and started the full text search.  Here’s what I had to do.

  • Create a table
    • CREATE TABLE sh_product_my_text (
      product_id INT NOT NULL,
      FOREIGN KEY (product_id) REFERENCES sh_product(id) ON DELETE CASCADE,
      my_text TEXT NOT NULL,
      FULLTEXT(my_text),
      timestamp TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
      ) ENGINE=MyISAM;
  • then in my shopping db population process, I populated the table with some product stuff
  • I change a query like “computers netbook -wireless” to “+computers +netbook -wireless”
  • the transformed query then gets bound to the ? placeholder in the following (sketched end to end after this list)
    • SELECT product_id FROM sh_product_my_text WHERE MATCH(my_text) AGAINST(? IN BOOLEAN MODE) limit 20;
  • also added key_buffer_size=1024M to my mysql config.  performance was pretty terrible before this change, pretty good after
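
To try the whole thing end to end, here is a quick sketch straight from the mysql client; the database name is made up, and the table and query are the ones above:

$ mysql -e "
    SELECT product_id
      FROM sh_product_my_text
     WHERE MATCH(my_text) AGAINST('+computers +netbook -wireless' IN BOOLEAN MODE)
     LIMIT 20;" my_shopping_db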

Course, I implemented it as a service and tie into the service via my Template::Plugin::WebService with code that looks like

[% USE web_service = WebService %]
[% search_ref = web_service.webservice_call('/api/shop/search', form) %]

Can’t tell you how cool I think that is.  If I decide to serve straight from flex or something, then it is pretty well no code change.

And that’s about it.  Give it a go.

Enjoy!

Earl

a little pig helping make me famous

June 18th, 2009

A while ago I wrote some (I think) cool stuff for pig that allowed for parsing apache logs.  Unfortunately I wrote my stuff on an old branch.  Didn’t really know it was an old branch and that everything I wrote would need to get ported, but there you go.  Recently, someone ported my stuff (which was awesome!), and folks at cloudera are blogging about it.

Years ago, I wrote this (I thought) cool stuff, Data::Fallback, which would allow you to pull data from various sources.  I don’t think anyone in the world ever used it.  Like ever.  In fact, once I discovered memcached, even I quit using it.  Kind of cool that folks might actually use some stuff I wrote.

Earl

fixing bugs and satisfying users

June 18th, 2009

One of my “many” hosting customers mentioned that when he logs into his site, he sometimes sees an error.  Well, it turns out I would sometimes see that error too, and his mentioning it inspired me to look into it.  It comes down to speed.  I have my admin stuff hosted on google app engine, and it talks to my backend via web service.  Turns out that google doesn’t want to host slow-serving pages, meaning pages that take more than about five seconds to load.  And it turns out that my web service would sometimes take more than five seconds.  There were a couple issues.

  1. Memcached helps me not hit the database.  I used to have servers at 10.1.1.1 and 10.1.1.2.  A little while ago I quit running the 10.1.1.2 server, but was still checking it in the code.  I think I would hit some timeout that wasn’t too long, but it slowed things down enough to annoy google.
  2. Memcached is all about what the memkey is.  You look up values based on a memkey.  Well, I call it memkey anyway.  For me, if the lookup on the memkey doesn’t return something, I hit the database and then cache the result under that memkey.  Well, in my code for getting a user’s configuration, I had hard coded $memkey = time, which means the key changed every second and the code basically never found the cached conf.  I guess at some point in the past I wanted to generate the conf each time, and then just happened to commit that.  Oops.
  3. Added an index or two to mysql, but don’t think that helped too much.

I am afraid that folks would try to log in, get an error, and give up.  For sure they wouldn’t likely tell their friends to come sign up for a site.

Enjoy!

Earl

bulk load and +=

June 13th, 2009

Let’s suppose that you have log files of some sort pouring in and you want to put aggregate data representing the logs into an rdbms.  To begin, let’s start with a blank slate, i.e., just dumping the data in.  And let’s have a simple table, which in mysql is created via

CREATE TABLE `history` (
`id` int(11) NOT NULL auto_increment,
`hits` int(11) NOT NULL,
PRIMARY KEY  (`id`)
);

I did a pass each for both MyISAM and Innodb with a million inserts.

engine                   values per INSERT    seconds (lower is better)
MyISAM                   10000                7.046952963
MyISAM                   1000                 7.342753172
MyISAM                   100                  8.521313906
MyISAM                   10                   31.44731498
MyISAM                   1                    135.3045712
MyISAM load data infile  n/a                  4.927606106
Innodb                   10000                19.76374817
Innodb                   1000                 30.58060002
Innodb                   100                  89.54839206
Innodb                   10                   723.135994
Innodb load data infile  n/a                  17.25715899

A multi-value insert for three values looks like this

INSERT INTO today (hits) VALUES (?), (?), (?)

Then I execute with the three values.

The fact that inserts with 1000 values start to approach the load data infile numbers is a little compelling.  But let’s suppose that we want to do every insert via a bulk load and still have a table (like history above) that holds aggregate data, += style.  Is it possible?  Sure.

Here is one approach for mysql (sketched in code right after the list):

  1. Create a temp table, which I will call today
  2. Bulk load the data into today
  3. Run the query INSERT INTO history (SELECT * FROM today) ON DUPLICATE KEY UPDATE history.hits = history.hits + today.hits;
  4. Drop today
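
A minimal sketch of those four steps through the mysql client might look like this.  The database name and data file are made up, and today is built to match history; using a TEMPORARY table also means step 4 happens on its own if the session dies:

$ mysql --local-infile=1 my_db <<'SQL'
CREATE TEMPORARY TABLE today LIKE history;
LOAD DATA LOCAL INFILE '/tmp/today_hits.tsv' INTO TABLE today (id, hits);
INSERT INTO history SELECT * FROM today
  ON DUPLICATE KEY UPDATE history.hits = history.hits + today.hits;
DROP TEMPORARY TABLE today;
SQL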

I would like to apply this strategy and contribute some pig code that allows for bulk insert.  This would (I think) allow for some pretty large scale aggregating, all from within a “simple” pig script.  Would also like to start using chukwa, but it looks a little tough.  I think the architecture would then look something like

web servers -> chukwa -> pig -> mysql

Think then I would be pretty well at yahoo! or facebook scale.

Guess we’ll see how it all goes 🙂

Enjoy!

Earl

eating someone else’s dog food

April 15th, 2009

I was pretty excited about getting sitemaps working, so much so that I recently wrote about it.  Turns out I had a couple bugs in my implementation.  When I was on diamondcomparer.com, I would do something like show all the categories / products that diamondcomparer didn’t actually offer.  Also turns out that pretty much every shopping site had more than 50,000 urls (the per-file limit), which meant I had to break things up a bit.  Plus, I wasn’t zipping things, and I wasn’t really confident I was doing everything right.  So, I decided to use google’s open source code for generating sitemaps, which I figured handled everything I was looking for.

In the past I had used the google code for crawling directories, but now I needed to pull my list of urls from a database.  Well, turns out the google code can handle that as well.  You just dump the urls to a file, make a config file explaining a few things, and then away you go.  Was really not too bad.  Stayed up till three am last night getting this to work

[image: sitemaps]
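
For the record, the moving parts boil down to a urls file, a small config, and one command.  A rough sketch, assuming Google’s sitemap_gen script and made-up file names:

$ python sitemap_gen.py --config=sitemap_config.xml
# and nightly from cron, something like
0 3 * * * cd /path/to/site && python sitemap_gen.py --config=sitemap_config.xml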

While I am here, have you seen chrome’s xml viewer?  Yeah, me neither, it just dumps to the screen.

I have been tracking google (and others) crawling my stuff, and it looks like the product pages haven’t been getting crawled.  I am hoping this helps that out.  Guess we shall see.  I am now generating these files and pinging the search engines nightly.  Really would like to get traffic based on product pages being indexed well.

Enjoy!

Earl