Archive for the ‘Code’ Category

Search engines

Tuesday, October 16th, 2007

Search engines are difficult creatures. (In case you didn’t know, Steven tells me I’m “working” on the search engine for the site.) He’d like the engine and database, with minimal work, to intuit what we mean, even when it isn’t what we asked. The engine, however, lacks the critical ability to disambiguate queries. Consider the queries “batman” and “val kilmer.” What should a search engine for movies return? Should “batman” return the Val Kilmer actor entry? Should “val kilmer” return the Batman movie entry? Maybe it should just return the phrase “worst batman ever.”

What do you expect? I would expect a search for “batman” to return a list of the Batman movies. Likewise, I would expect a search for “val kilmer” to return his actor page as one of a very small set of results. These results are comparable to what you’d get by going to your movie-buff friend and asking about Batman and Val Kilmer. I like using the query “batman” to test our search engine. One day, it surprised me and pulled up an actor. No, it wasn’t Val Kilmer, or even Michael Keaton. It was an actor named Batman. I was ready to scream! It was probably around 1 a.m. when this happened, and I assumed it meant that my initial database dump had some sort of irregularity that was showing up in the parsing. So I looked through the raw files. The files showed that Batman really was there. So who is this guy?

IMDb comes to my rescue: http://www.imdb.com/name/nm2533507/

Batman indeed appears in a movie. I have never heard of this movie, but Batman stars in it. What happened here? The engine lacks cultural knowledge! We know of Batman because of the movie (or, more likely, the comic), not because of the actor. So should the engine bias the results towards the movie? Such a bias raises troubling questions. As someone who works with data, I think the actor Batman is the more interesting result, but that might not agree with what people expect. Therefore, the engine will end up biased towards what most people expect. It’d be better if the search engine really were a baby that we could train just like a person; that way, it’d have the cultural knowledge to answer queries appropriately. (Of course, treating it like a baby raises ethical issues with telling a baby about certain unmentionable movies.)
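To make the "bias towards what most people expect" idea concrete, here is a minimal sketch of one way it could work: blend a crude text-match score with a popularity prior. This is not our actual engine. The `Entry` type, the `rank` function, the `bias` weight, and every popularity number below are made up purely for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    title: str
    kind: str          # "movie" or "actor"
    popularity: float  # hypothetical prior in [0, 1]; invented, not real data


def text_match(query: str, title: str) -> float:
    """Crude text score: 1.0 for an exact match, 0.5 for a whole-word match, else 0."""
    q, t = query.lower().strip(), title.lower()
    if q == t:
        return 1.0
    if q in t.split():
        return 0.5
    return 0.0


def rank(query: str, entries: list[Entry], bias: float = 0.7) -> list[Entry]:
    """Order matching entries by a blend of text match and popularity.

    bias=0 ranks on text alone; bias=1 ranks on popularity alone.
    """
    scored = []
    for entry in entries:
        match = text_match(query, entry.title)
        if match > 0:  # drop entries that do not match the query at all
            scored.append(((1 - bias) * match + bias * entry.popularity, entry))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in scored]


# Toy catalog; the popularity values are assumptions for the example.
CATALOG = [
    Entry("Batman", "actor", popularity=0.05),
    Entry("Batman Forever", "movie", popularity=0.80),
    Entry("Batman", "movie", popularity=0.90),
    Entry("Val Kilmer", "actor", popularity=0.70),
]

if __name__ == "__main__":
    for entry in rank("batman", CATALOG):
        print(entry.kind, entry.title)
```

With the default `bias`, the movies come out ahead of the actor named Batman. Set `bias=0.0` and the actor ties the movie on text alone, which is roughly the surprise described above. The troubling part is that someone has to pick those popularity numbers.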

I get distracted by code, and we have a new idea

Sunday, October 14th, 2007

I was actually going to write this yesterday, but I somehow ended up coding instead of blogging. This happens a lot, and not just with this project. Most of the time, I procrastinate by coding. It’s awesome because I get a lot accomplished this way, but also not so great because (as you might expect if you’ve read Josh’s previous post) I am falling behind in full-time recruiting and classes. Not that I should worry about recruiting, though: this would be an awesome company to work for. Classes, however, I’m not so sure about. I only have 11 units right now, and that’s including two tennis classes.

I think this past week has been somewhat of a revitalization of the project. Last Sunday (a week ago), Chris, Steven, and I came up with an interesting twist on what we’re doing. Over the several meetings we’ve had this week, the idea has been polished and refined, down to details of implementation and launch strategy. It’s defined pretty well now, and I think it really is a much better reflection of our vision than our original design. I can’t talk too much about it, but I can tell you that it requires a pretty significant revamp of the site’s design and a compressed timeline for some of the modules. Nonetheless, the whole team is (I think) really excited about this new development, and we’re working hard to bring it to the public.