Archive for October, 2008

thoughts?

Wednesday, October 29th, 2008

"So, I've been thinking of trying a little lipstick."

I smell bacon!

Friday, October 24th, 2008

For years I have been interested in parsing logs.  Did a fair amount of it in a past life.  I think pig changes pretty well everything.  Back in the day, I wrote some parsing in perl, tried to figure out how to group things, collate, += these logs into a database, etc.  Well, for me, pig makes the parsing pretty slick.

My pig scripts generally look like

REGISTER coolJarOne.jar

REGISTER secondJar.jar

Those jars can contain user defined functions (UDFs), for doing custom log parsing.

Then, let’s slurp in a log

raw = LOAD 'myLog.txt' USING org.apache.pig.piggybank.storage.apachelog.CombinedLogLoader() AS (remoteAddr, remoteLogname, user, dayTime, method, uri, proto, status, bytes, referer, userAgent);

So that one line will parse one of apache’s combinedLogs.  And yeah, I wrote it.  It isn’t committed in yet, but I hope to submit a patch tonight-ish.
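For anyone who wants to see what that loader has to pull apart, here is a rough Python sketch of splitting one combined-format line into the same eleven field names. It is not the piggybank loader, just a regex written against Apache's documented combined format; the function name and the sample line are made up for illustration, and I am assuming dayTime keeps the timezone offset.

```python
import re

# Apache "combined" format:
#   %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
# Group names mirror the AS (...) schema above; the regex itself is an
# illustrative assumption, not the loader's actual pattern.
COMBINED_RE = re.compile(
    r'(?P<remoteAddr>\S+) (?P<remoteLogname>\S+) (?P<user>\S+) '
    r'\[(?P<dayTime>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<uri>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<userAgent>[^"]*)"'
)


def parse_combined(line):
    """Return a dict of the eleven fields, or None if the line does not match."""
    m = COMBINED_RE.match(line)
    return m.groupdict() if m else None


# Hypothetical sample line in the combined format.
sample = (
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
    '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
    '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"'
)
fields = parse_combined(sample)
```

Every value comes back as a string, and a malformed request line (say, one missing the protocol) simply returns None, so a real parser would want to decide what to do with those.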

Then, this little bit of pig latin pulls out all the referers from my log

refererRaw1 = FOREACH raw GENERATE MyDateExtractor(dayTime), host, remoteAddr, referer;
refererRaw2 = FILTER refererRaw1 BY com.loghelper.MyLength($3) > 1;
refererRaw3 = GROUP refererRaw2 BY ($0, $1, $2, $3);
refererRaw4 = FOREACH refererRaw3 GENERATE FLATTEN($0), COUNT($1);
STORE refererRaw4 INTO 'log-referers.txt' USING PigStorage();

So log-referers.txt is a tab-delimited file, which is rather easy to parse and += into mysql.  The beauty is that you can supposedly do it all in parallel.  Haven’t done that yet, but hopefully soon.
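Here is a minimal sketch of the "parse and +=" step, assuming each row is the four grouped fields followed by the count, separated by tabs. To keep it self-contained it uses Python's built-in sqlite3 as a stand-in for mysql, and the table name, column names, and sample rows are all made up for illustration.

```python
import io
import sqlite3


def add_counts(conn, lines):
    """Add each row's count to the running total for its grouping key."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS referers ("
        " day TEXT, host TEXT, remote_addr TEXT, referer TEXT, hits INTEGER,"
        " PRIMARY KEY (day, host, remote_addr, referer))"
    )
    for line in lines:
        day, host, addr, referer, hits = line.rstrip("\n").split("\t")
        key = (day, host, addr, referer)
        # The "+=" part: bump the stored total if the key already exists.
        cur = conn.execute(
            "UPDATE referers SET hits = hits + ?"
            " WHERE day = ? AND host = ? AND remote_addr = ? AND referer = ?",
            (int(hits), *key),
        )
        if cur.rowcount == 0:  # nothing updated, so this key is new
            conn.execute(
                "INSERT INTO referers VALUES (?, ?, ?, ?, ?)",
                (*key, int(hits)),
            )
    conn.commit()


# Two hypothetical batches standing in for two runs of the pig script.
conn = sqlite3.connect(":memory:")
batch1 = io.StringIO("2008-10-24\texample.com\t10.0.0.1\thttp://a.example/\t3\n")
batch2 = io.StringIO(
    "2008-10-24\texample.com\t10.0.0.1\thttp://a.example/\t2\n"
    "2008-10-24\texample.com\t10.0.0.2\thttp://b.example/\t1\n"
)
add_counts(conn, batch1)
add_counts(conn, batch2)
rows = conn.execute(
    "SELECT remote_addr, hits FROM referers ORDER BY remote_addr"
).fetchall()
```

The update-then-insert pattern is just the most portable way to write it here; with mysql the same idea is usually done in one statement with INSERT ... ON DUPLICATE KEY UPDATE.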

A few exciting things have been happening for me.

  1. My pig patches (which I call bacon) have been committed in.  Check it out.
  2. I tied in my pig stuff to parse logs from good old holaservers.
  3. I tied into the Google Visualization API and am now generating some pretty killer graphs.
  4. Tonight I took a friend’s combined log, ran it through my system and folded it on in to the above graphs.

Course, all the pig and hadoop (the parallel processing stuff) work will hopefully be the basis of good old loghelper.

where is my van?

Friday, October 3rd, 2008

(originally posted September 16th, 2008)

So, after my awesome marketing, I actually got a few signups.  And sadly, though perhaps not too surprisingly, almost all the signups are spammers.  Looking like rather dumb linker sites, and some auto-redirect sites.  I nearly linked to them, but there you go.  Think I will block with rather extreme prejudice.  By email, by ip, each site for the offending user, etc.

I used to fight against such things at freeservers.  Sounds like I need to write a script or two that can detect problems and block offending sites.  Not really what I want to be working on, but I sure don’t want to pay for folks to peddle their spammy sites.

Finding offensive javascript is one thing, but I am thinking that for offensive images maybe I will use amazon’s mechanical turk.  In order to get my feet wet, I have the following HIT (Human Intelligence Task) currently live

In English, please type in your favorite clean joke. Jokes that aren’t clean will not be accepted. You are welcome to provide an explanation for why you like your joke so much.
For a nickel a joke, it has been fun to watch.  Think my favorite weird response has been
There was one litle man whit his van and wherever he goes he say i am a litlle man this is my van and he robs a Bank.After 20 states finally he come’s in Bosnia and say i am a litlle man where is my van. I am from Bosnia and Hercegovina and this joke is one of many about bosnian people I had to change the joke a litlle bit to write in english but it didn’t lose his charm. I like this joke because most people Bosnian people survived everything and they are clever and most of them thives.
And the joke I will most likely keep repeating?
Q: What is brown and sticky?
A: a stick
Enjoy!
Earl

head in the clouds

Friday, October 3rd, 2008

(originally posted September 16th, 2008)

Recently I went to an amazon web service thing up in salt lake.  Turns out the cloud (they say) is all the rage.  Folks doing pretty well everything in the cloud.  I recently moved the start of holaservers to google app engine.  Here is a before pic

Pretty well every non-S3 thing was a single point of failure.

And now

So, for a bit of work, I have a highly available front page.  And for some reason, it is free until I start getting like five million hits a month.  Yeah, not really bumping up to that.

At the aws thing, I learned that I could do all my holaservers stuff in their cloud.  Didn’t really think I could do ftp or kind of hard web serving via my own custom apache, but yeah, can totally do both.  Sounds like EC2 can pretty well just give me a virtual opensolaris box.  I am kind of tempted to start moving stuff on over.  Besides a pretty big aws learning curve, and quite a bit of work, I would have to start putting out $72 a month, which won’t really be happening till I make some money or become a startup finalist.  Think I find out on October 3rd.

Earl

a little marketing music

Friday, October 3rd, 2008

(originally posted September 8th, 2008)

Well, am finally going to give it a little bit of a go.  I just posted HolaServers to the Google App Engine Gallery, and you can hopefully see me here.  Wonder if I will get my first real signup from it.  I have analytics going (actually just added a couple more pages), so I should hopefully be able to measure the deluge pretty well.  Also have the whole every-hit-goes-into-mysql thing going, so we shall see.  I am afraid of spammers or inappropriate contenters, but there you go.

I also entered the Amazon Startup Challenge for HolaServers and good old LogHelper, which it turns out, I haven’t blogged about.  Well, won’t that be an awesome entry.  I figure I have a decent shot at making the finals, which would mean at least $5000 in amazon aws credits.  That would turn out to be real money, since I will be paying out of pocket for S3 as I am getting started.  Also, I would quite like to have my mysql hosted on say EC2, which it looks like is getting more doable daily.  Not sure how they ensure that your mysql daemon stays up.  To do such a thing directly with amazon is like $72 / month (ten cents / hour * 24 hours / day * 30 days / month).  Not too bad once I am getting upgrades, but to start, I think it rather prohibitive.  My co-lo is only like $100 per month and that’s for 2U of servers.  Winning the contest and getting 50k in cash and 50k in credits would also be cool.

So lately I have been trying to get the styles to line up a bit better and though I am not all done, I would feel pretty hopeless without good old firebug.  With firebug you can mouse through the dom and the browser highlights what you’re over.  Also, coming from a guy who used to make a living reading some not so good perl, css is a bit hard to read / follow.  Guess I would get better at such things, but I am not there yet.

Are the next generations of designers going to be great at design, html / css, javascript, templates and the like?  Can someone point me in the direction of some such person?  I would really like to be able to set someone up with eclipse, python and getting holaservers working on their laptop, have the mythical designer edit the templates and commit them when they’re done.  Is that so wrong?

Enjoy,

Earl