Sunday, March 29, 2009

The Amazing Computer Science Diet

Ingredients:
  • A weight tracking site like FitDay. People enter some information about themselves, what they eat every day, and how much they weigh. The site lets them track the number of calories they eat along with their weight. Currently, these sites don't make recommendations about what to eat.

  • Data mining algorithms.

Preparation Instructions:

Once you have over 5 million users on your weight tracking site, run statistical analyses on their data to find out what actually makes people lose weight – e.g., “by eating one more cup of yogurt every day, you can lose 1 lb per month.” As opposed to other diets that are pulled out of somebody’s behind, this one will be based on millions of data points.

Start recommending what to eat.
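To make the "perform statistics" step a bit more concrete, here is a minimal sketch of the simplest possible version: fit a least-squares line relating how much yogurt each user eats to how their weight changes per month. The data and the names (`yogurt_cups_per_day`, `weight_change_lb_per_month`, `least_squares_slope`) are made up for illustration; a real site would have many more variables and would need to worry about correlation versus causation.

```python
from statistics import mean


def least_squares_slope(xs, ys):
    """Slope b of the ordinary least-squares line y = a + b * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    return covariance / variance


# Hypothetical data, one entry per user (not real measurements).
yogurt_cups_per_day = [0, 1, 2, 3, 4]
weight_change_lb_per_month = [0.5, -0.5, -1.5, -2.5, -3.5]

slope = least_squares_slope(yogurt_cups_per_day, weight_change_lb_per_month)
print(f"Each extra daily cup is associated with {slope:+.2f} lb per month")
```

With this toy data the slope comes out to -1, which is where a sentence like "one more cup of yogurt a day, one pound less per month" would come from.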

Alternative (Advanced) Preparation Instructions:

Use collaborative filtering to determine the best diet for each individual. By looking at people who have similar profiles to each other (they weigh the same, like to eat the same things, etc.), it may be possible to design a diet that works for you personally: “that person who is very similar to you lost weight by doing X.”
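Here is a rough sketch of the "find the person most similar to you" idea, using a nearest-neighbor lookup with cosine similarity, which is one common way to do user-based collaborative filtering. Everything in it is an assumption for illustration: the three profile features, the user names, and the `what_worked` table are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar_user(target, profiles):
    """Name of the user whose profile is closest to the target profile."""
    return max(profiles, key=lambda name: cosine_similarity(target, profiles[name]))


# Hypothetical profiles: [normalized weight, likes yogurt, likes pizza].
profiles = {
    "alice": [1.0, 0.9, 0.1],
    "bob": [0.2, 0.1, 0.9],
}
# Hypothetical record of what helped each user lose weight.
what_worked = {
    "alice": "eating more yogurt",
    "bob": "cutting out soda",
}

you = [0.9, 1.0, 0.2]
neighbor = most_similar_user(you, profiles)
print(f"Someone very similar to you lost weight by {what_worked[neighbor]}")
```

A real system would average over many similar users rather than trusting a single neighbor, but the shape of the computation is the same.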

Sunday, March 22, 2009

Should You Go to Grad School?

The Chronicle of Higher Education has a provocative article that strongly recommends not going to graduate school in the humanities. The last paragraph is particularly striking:
It's hard to tell young people that universities recognize that their idealism and energy — and lack of information — are an exploitable resource. For universities, the impact of graduate programs on the lives of those students is an acceptable externality, like dumping toxins into a river. If you cannot find a tenure-track position, your university will no longer court you; it will pretend you do not exist and will act as if your unemployability is entirely your fault. It will make you feel ashamed, and you will probably just disappear, convinced it's right rather than that the game was rigged from the beginning.

While the article is specifically about graduate school in the humanities, I feel that some of its points are also somewhat valid for computer science. The gist of the author's argument is: incoming students are not aware that the chances of getting a faculty job are tiny; further, even when you do get that prized faculty job, the job is not that good. I personally think being a professor is a great job, but I can see how some could argue against that, especially considering how hard it is to get the job.

Let’s start with numbers. The number of people who graduate from “top 10” computer science programs every year is approximately 250. Conversely, the number of faculty positions that get filled at “top 50” research universities is about 25. That’s a ratio of about 10%, which doesn’t sound so bad (certainly not as bad as in the humanities), but there are two things that make the situation actually bad: (1) Notice that I took graduates from “top 10” programs and placed them in “top 50” programs, so this is not quite a fair comparison. The ratio becomes more like 4% if you count all graduates from “top 50” programs. At CMU, when we advertise a single faculty opening, we get approximately 500 applicants. That’s a success ratio of 0.2%. (2) The people who enroll in “top 10” computer science programs have already beaten the odds more than once. To be accepted into one of these highly ranked programs, you have to have excelled in an excellent college; to be accepted to an excellent college, you have to have excelled in high school, and so on. These are truly amazing individuals. To a large extent, they have 4.0 GPAs from college, perfect scores in the GREs, have managed to impress their professors to the point where their recommendation letters say things like “best student we’ve had in the last five years,” and by the time they graduate from college they have already published a few academic papers. All of this just to be placed in a situation where their chances of success are much less than 10%!
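For anyone who wants to check the arithmetic, here is a quick sketch. The first three numbers are the ones quoted above. The size of the "top 50" graduate pool is not stated anywhere in the post; the 625 figure below is only what the 4% ratio would imply, so treat it as a derived assumption.

```python
# Figures quoted in the post.
top10_graduates_per_year = 250
top50_openings_per_year = 25
applicants_per_opening = 500

# Openings per "top 10" graduate.
ratio_top10 = top50_openings_per_year / top10_graduates_per_year

# Chance of landing one specific opening, if every applicant were equally likely.
ratio_per_opening = 1 / applicants_per_opening

# Assumption: the "top 50" graduate pool size that a 4% ratio would imply.
implied_top50_graduates = top50_openings_per_year / 0.04

print(f"Top-10 graduates: {ratio_top10:.0%}")
print(f"Single opening:   {ratio_per_opening:.1%}")
print(f"Implied top-50 graduates per year: {implied_top50_graduates:.0f}")
```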

At this point you start wondering if being an NFL player is easier than getting a faculty job. I don’t actually know whether this is the case, but I can say one thing: a starting professor salary is about $120,000/year, and by the time you have become insanely famous or won the Turing Award, you’re making maybe twice or thrice that amount. The minimum salary for NFL players is about $300,000/year (and that's for like the rookie backup backup kicker), and if you become insanely successful, you can be making $30 million per year or more.

Ok, enough with the grim numbers. After all, things have worked out pretty well for me. Let me now give some reasons why the situation is not as bad as in the humanities and argue why going to graduate school in computer science is not that bad of a decision.

First, with a PhD in computer science, you can get a job at one of many great research labs or “researchy” companies like Google, and in many ways these jobs are better than being a professor -- they certainly pay more with time. This means that the chances of getting a “good job” after getting a PhD in computer science are much higher than 10%. Second, the job of a researcher or a professor is pretty awesome: for all practical purposes, you have no boss! Also, according to many surveys, being a scientist is one of the most “prestigious” occupations. Third, I think graduate school is extremely enjoyable: you have about 5 years to work on WHATEVER you want, with very few responsibilities whatsoever. You don’t have a set 9-5 schedule (i.e., you can stay at home for days or even entire weeks), and you get to travel throughout the world -- as a graduate student, I went for free to Mexico, Hawaii, Austria, the Netherlands, Poland, Panama, Switzerland, more than 30 places inside the continental United States, and about 5 cities in Canada.

In the end, I think getting a PhD in computer science can be a good idea provided you actually enjoy doing research. But, (a) you should not do it for the money, and (b) you should be aware of how hard it is to get a faculty job afterwards.

Thoughts?

Friday, March 20, 2009

30 Pies Thrown at Me (Literally)



This is what happens at CMU when you post on your blog that you want to fail more students.

Tuesday, March 17, 2009

Failing Students

I sometimes want to fail more people in my classes. This is not because I am evil (although some people here seem to think so), but because I want the people who graduate from our computer science program to be truly the best in the world.

When I came to Carnegie Mellon, I was surprised at the insanely high quality of our undergraduates in Computer Science. I knew the PhD program was ranked #1, but I had no idea how awesome the undergrads were. Still, I think CMU and other top universities in the US need to fail a few more students in their classes.

The philosophy in US universities seems to be mostly one of making it really hard to get into the programs, but once you're in, the chances of graduating are really high. In fact, most rankings of American universities such as the one from US News place quite a bit of weight on four- or five-year graduation rates -- the fewer students that fail, the higher the university will be ranked. I find this counter-intuitive. While I understand that prospective students want to know that if they come here they will not be flunked, I think we all need to accept that mistakes are sometimes made in the admissions process.

In some other countries, like Guatemala where I went to high school, the philosophy is exactly the opposite. Pretty much anybody can be accepted to any university. However, a large fraction of the people who enter end up failing out. The reason this appeals to me is that rather than making a decision based on a single test score (the SAT) and a couple of recommendation letters, universities get to test students for the span of several years before giving them a seal of approval.

Should I be stricter with my grades?

Saturday, March 14, 2009

Irresponsible Push

Inspired by a popular internet company whose name I won't reveal here (you know who you are), the GWAP Web site team has started following a software engineering technique that I'm calling "irresponsible pushing." It works wonders.

Developer: I've implemented a first draft of the new feature. I just need to test it. We'll be able to release it in 2-3 weeks.
Luis: Push now. Release it.
Developer: What? Live?
Luis: Yes, push push push.

Then the untested feature is released (with bugs of course).

Luis: There are bugs! It's live. People are seeing the bugs! We're gonna lose users. FIX IT. Fix it now!

Then the developer goes nuts for the next 30 minutes fixing the issue, and voila: what was going to take 2-3 weeks took less than an hour.

I should be writing a book about this stuff.

Saturday, March 7, 2009

Optimal Number of PhD Students?

I want to conquer the world. Should I take on more PhD students? I am too busy. Should I drop some of them? These are fundamental questions in the life of a professor.

The students, of course, have similar questions but much less control over this: "My advisor has no time for me, and they are taking ANOTHER student?" or "Man, I wish I wasn't the only student this crazy guy has; that way he'd leave me alone for longer."

So, I decided to start a poll here to determine what people think is the optimal number of PhD students that an advisor should have. (I have six.)

Most of you read this blog via some sort of blog reader like Google Reader. Unfortunately, you'll have to visit the blog directly to vote.

Addendum: After a week of voting, ~300 people responded as follows:
0 students: 3%
1 student: 4%
2-4 students: 63%
5-8 students: 20%
9 or more: 7%

Apparently I have too many students :)

Thursday, March 5, 2009

Porn or Not Dot Com

Here’s an idea I’ve had for many years but have not been brave enough to launch. (a) Computers cannot perfectly tell whether an image is pornographic or not; (b) sites such as image search engines need to block pornography; (c) many people like looking at porn. Everything aligns perfectly: Why not let people who like looking at porn tell us which images are pornographic? As a reward, the more accurate they are, the better pornography they see.

The site would be simple. The user sees an image and they have to say if it is pornographic. If it happens to be pornography they rate how “good” it is. By giving them some images for which we know the correct answer (porn or not), we can measure how accurate they are. The more accurate they are, the more high quality porn we give them.
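The accuracy measurement is the only technically interesting part, so here is a minimal sketch of it: score each user only on the images whose correct label we already know, and ignore the rest. The function name, the image ids, and the labels are all hypothetical.

```python
def accuracy_on_known_images(answers, known_labels):
    """Fraction of known-answer images the user labeled correctly.

    answers: image id -> the user's label (True or False)
    known_labels: image id -> the correct label, for the known subset only
    Returns None if the user has not yet seen any known-answer image.
    """
    scored = [image for image in answers if image in known_labels]
    if not scored:
        return None
    correct = sum(answers[image] == known_labels[image] for image in scored)
    return correct / len(scored)


# Hypothetical known answers and one user's responses.
known_labels = {"img_a": True, "img_b": False, "img_c": True, "img_d": False}
user_answers = {
    "img_a": True,
    "img_b": False,
    "img_c": False,  # wrong
    "img_d": False,
    "img_x": True,   # no known answer, so it is not scored
}

accuracy = accuracy_on_known_images(user_answers, known_labels)
print(f"Accuracy on known images: {accuracy:.0%}")
```

The user should not be able to tell which images are the known ones; otherwise they could answer carefully on those and randomly on everything else.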

Sunday, March 1, 2009

Reviews Should be Published

Here's another in my series of rants about how we should change the academic world -- paper reviews. Although some people claim they like reviewing papers, I seem to be a receptacle for evaluating crappy ones. I therefore do not enjoy it!

Currently, papers are reviewed mostly as follows: after a paper is submitted, a "program committee" that can see the authors' names decides who the best N people to review it are (taking into account area of expertise, conflicts of interest, etc.); the reviewers write their reviews and remain anonymous forever; a decision on whether to accept the paper is then made based on these reviews.

The problem I see is that there is very little incentive to write high-quality reviews. Heck, there is very little incentive to even review a paper at all because to a large extent reviewers get zero credit. Unless you are a member of the program committee, your name is usually not even posted anywhere. This, combined with the fact that most submitted papers are not very good, makes me not want to review at all.

So here's what I propose: High-quality reviews should be published. If the review is positive and explains why the paper is of importance, it should be published along with the paper (some journals like Science and Nature are already doing something similar). If the review is negative and gives a non-trivial reason why the paper should not be published (e.g., a clever break of a cryptosystem, a little-known fact that makes a study useless, etc.), the review should be published instead of the paper. (This should only be done with papers that seem like good ideas at first, for which the reviewer found a subtle but critical flaw.)

Oh, and stop sending me lame papers, please.