“So, I’ve been thinking of trying a little lipstick.”
Archive for October, 2008
thoughts?
Wednesday, October 29th, 2008
I smell bacon!
Friday, October 24th, 2008
For years I have been interested in parsing logs. I did a fair amount of it in a past life. I think pig changes pretty well everything. Back in the day, I wrote some parsing in perl, tried to figure out how to group things, collate, += these logs into a database, etc. Well, for me, pig makes the parsing pretty slick.
My pig scripts generally look like
REGISTER coolJarOne.jar
…
REGISTER secondJar.jar
Those jars can contain user defined functions (UDFs), for doing custom log parsing.
Then, let’s slurp in a log
raw = LOAD 'myLog.txt' USING org.apache.pig.piggybank.storage.apachelog.CombinedLogLoader() AS (remoteAddr, remoteLogname, user, dayTime, method, uri, proto, status, bytes, referer, userAgent);
So that one line will parse one of apache’s combined logs. And yeah, I wrote it. It isn’t committed yet, but I hope to submit a patch tonight-ish.
Then, this little bit of pig latin pulls out all the referers from my log
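For anyone curious what a combined-log loader boils down to, here is a rough Python sketch of the same idea. It is not the piggybank code, just a regex written for illustration; the `parse_combined` helper and the `COMBINED` pattern are made-up names, and the field names are copied from the AS clause of the LOAD statement. The sample line is the style of entry used in Apache's own documentation.

```python
import re

# Field names mirror the AS (...) clause of the LOAD statement.
FIELDS = ("remoteAddr", "remoteLogname", "user", "dayTime", "method",
          "uri", "proto", "status", "bytes", "referer", "userAgent")

# Hypothetical pattern for one combined-log line; real logs can contain
# malformed requests that this simple version will reject.
COMBINED = re.compile(
    r'^(\S+) (\S+) (\S+) \[([^\]]+)\] '
    r'"(\S+) (\S+) (\S+)" (\d{3}) (\S+) '
    r'"([^"]*)" "([^"]*)"$'
)

def parse_combined(line):
    """Return a dict of the eleven fields, or None if the line does not match."""
    match = COMBINED.match(line.strip())
    if match is None:
        return None
    return dict(zip(FIELDS, match.groups()))

sample = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
          '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
          '"http://www.example.com/start.html" '
          '"Mozilla/4.08 [en] (Win98; I ;Nav)"')
record = parse_combined(sample)
```

Every value comes back as a string, so anything numeric (status, bytes) still needs converting before you do math on it.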
refererRaw1 = FOREACH raw GENERATE MyDateExtractor(dayTime), host, remoteAddr, referer;
refererRaw2 = FILTER refererRaw1 BY com.loghelper.MyLength($3) > 1;
refererRaw3 = GROUP refererRaw2 BY ($0, $1, $2, $3);
refererRaw4 = FOREACH refererRaw3 GENERATE FLATTEN($0), COUNT($1);
STORE refererRaw4 INTO 'log-referers.txt' USING PigStorage();
So log-referers.txt is a tab-delimited file, which is rather easy to parse and += into mysql. The beauty is that you can supposedly do it all in parallel. Haven’t done that yet, but hopefully soon.
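The "parse and += into a database" step could look something like the Python sketch below. Loud assumptions: it uses the standard library's sqlite3 as a stand-in for mysql so it runs anywhere, the `referers` table and its column names are invented for the example, and each row is assumed to hold the four grouped fields followed by the count, which is what the FLATTEN($0), COUNT($1) line should produce.

```python
import csv
import io
import sqlite3

def add_counts(conn, tsv_file):
    """Add each row's count to the stored total for its key (the "+=" step)."""
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for day, host, remote_addr, referer, count in reader:
        key = (day, host, remote_addr, referer)
        cur = conn.execute(
            "UPDATE referers SET hits = hits + ? "
            "WHERE day = ? AND host = ? AND remote_addr = ? AND referer = ?",
            (int(count),) + key,
        )
        if cur.rowcount == 0:
            # First time this key shows up, so start it at the batch count.
            conn.execute(
                "INSERT INTO referers (day, host, remote_addr, referer, hits) "
                "VALUES (?, ?, ?, ?, ?)",
                key + (int(count),),
            )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE referers (day TEXT, host TEXT, remote_addr TEXT, "
    "referer TEXT, hits INTEGER, "
    "PRIMARY KEY (day, host, remote_addr, referer))"
)

# Made-up row in the shape the STORE line is expected to write.
batch = "2008-10-24\texample.com\t10.0.0.1\thttp://a.example/\t3\n"
add_counts(conn, io.StringIO(batch))
add_counts(conn, io.StringIO(batch))  # a second batch adds to the same row
total = conn.execute("SELECT hits FROM referers").fetchall()
```

Against a real file you would pass `open("log-referers.txt", newline="")` instead of the StringIO. With mysql itself, an `INSERT ... ON DUPLICATE KEY UPDATE` statement would do the same job in one query.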
A few exciting things have been happening for me.
- My pig patches (which I call bacon) have been committed in. Check it out.
- I tied in my pig stuff to parse logs from good old holaservers.
- I tied into the Google Visualization API and am now generating some pretty killer graphs.
- Tonight I took a friend’s combined log, ran it through my system and folded it on in to the above graphs.
Course, all the pig and hadoop (the parallel processing stuff) work will hopefully be the basis of good old loghelper.
where is my van?
Friday, October 3rd, 2008 (originally posted September 16th, 2008)
So, after my awesome marketing, I actually got a few signups. And sadly, though perhaps not too surprisingly, almost all the signups are spammers. Looking like rather dumb linker sites, and some auto-redirect sites. I nearly linked to them, but there you go. Think I will block with rather extreme prejudice. By email, by ip, each site for the offending user, etc.
I used to fight against such things at freeservers. Sounds like I need to write a script or two that can detect problems and block offending sites. Not really what I want to be working on, but I sure don’t want to pay for folks to peddle their spammy sites.
Finding offensive javascript is one thing, but I am thinking that for offensive images maybe I will use amazon’s mechanical turk. In order to get my feet wet, I have the following HIT (Human Intelligence Task) currently live
head in the clouds
Friday, October 3rd, 2008 (originally posted September 16th, 2008)
Recently I went to an amazon web service thing up in salt lake. Turns out the cloud (they say) is all the rage. Folks doing pretty well everything in the cloud. I recently moved the start of holaservers to google app engine. Here is a before pic
Pretty well every non-S3 thing was a single point of failure.
And now
So, for a bit of work, I have a highly available front page. And for some reason, it is free until I start getting like five million hits a month. Yeah, not really bumping up to that.
At the aws thing, I learned that I could do all my holaservers stuff in their cloud. Didn’t really think I could do ftp or kind of hard web serving via my own custom apache, but yeah, can totally do both. Sounds like EC2 can pretty well just give me a virtual opensolaris box. I am kind of tempted to start moving stuff on over. Besides a pretty big aws learning curve, and quite a bit of work, I would have to start putting out $72 a month, which won’t really be happening till I make some money or become a startup finalist. Think I find out on October 3rd.
Earl
a little marketing music
Friday, October 3rd, 2008 (originally posted September 8th, 2008)
Well, am finally going to give it a little bit of a go. I just posted HolaServers to the Google App Engine Gallery, and you can hopefully see me here. Wonder if I will get my first real signup based from it. I have analytics going (actually just added a couple more pages), so I should hopefully be able to measure the deluge pretty well. Also have the whole every hit goes into mysql going, so we shall see. I am afraid of spammers or inappropriate contenters, but there you go.
I also entered the Amazon Startup Challenge for HolaServers and good old LogHelper, which it turns out, I haven’t blogged about. Well, won’t that be an awesome entry. I figure I have a decent shot at making the finals, which would mean at least $5000 in amazon aws credits. That would turn out to be real money, since I will be paying out of pocket for S3 as I am getting started. Also, I would quite like to have my mysql hosted on say EC2, which it looks like is getting more doable daily. Not sure how they ensure that your mysql daemon stays up. To do such a thing directly with amazon is like $72 / month (ten cents / hour * 24 hours / day * 30 days / month). Not too bad once I am getting upgrades, but to start, I think it rather prohibitive. My co-lo is only like $100 per month and that’s for 2U of servers. Winning the contest and getting 50k in cash and 50k in credits would also be cool.
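The $72 figure is just the multiplication spelled out in the parentheses. Here it is as a few lines of Python, working in cents to dodge floating point fuzz. The variable names are made up, and the last line is a back-of-the-envelope extra that assumes the whole $5000 credit went to a single always-on instance, which ignores S3 and anything else the credits would cover.

```python
RATE_CENTS_PER_HOUR = 10   # ten cents per instance-hour, as quoted in the post
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30        # the post's rounding

ec2_monthly_cents = RATE_CENTS_PER_HOUR * HOURS_PER_DAY * DAYS_PER_MONTH
ec2_monthly_dollars = ec2_monthly_cents / 100

# Assumption for illustration only: all of the finalist credit is spent
# on one always-on instance.
CREDIT_CENTS = 5000 * 100
months_covered = CREDIT_CENTS / ec2_monthly_cents
```

So one instance running around the clock is 7200 cents, or $72, a month, and the finalist credits alone would cover that single box for several years.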
So lately I have been trying to get the styles to line up a bit better and though I am not all done, I would feel pretty hopeless without good old firebug. With firebug you can mouse through the dom and the browser highlights what you’re over. Also, coming from a guy who used to make a living reading some not so good perl, css is a bit hard to read / follow. Guess I would get better at such things, but I am not there yet.
Are the next generations of designers going to be great at design, html / css, javascript, templates and the like? Can someone point me in the direction of some such person? I would really like to be able to set someone up with eclipse, python and getting holaservers working on their laptop, have the mythical designer edit the templates and commit them when they’re done. Is that so wrong?
Enjoy,
Earl