Nine Pomona College students on two teams, Team Chirp and Team Stingrays, earned a "Best Insight" award at this year's DataFest, a 48-hour competition held at UCLA on April 19–21. Thirty-two teams from UCLA, USC, UC Riverside and California State University, Long Beach competed in the data analysis competition, which used data from one million users of the online dating site eHarmony. Teams were whittled down over the course of the event and allowed to pair up for the final round. The combined Pomona team went on to win one of two "Best Insight" awards, the competition's best-in-show prize.
More than 100 students analyzed matches between men and women made by eHarmony's matchmaking algorithm, which uses 29 "dimensions," or criteria, promising to take hopeful users "from single to soul mate." Teams were asked to explain why one party might make contact, why the other party might not, or why both parties emailed each other. The goal was to glean as much information from the data as possible and communicate it visually to a panel of judges. (Because of an agreement signed with eHarmony, the insights the Pomona team provided cannot be disclosed.)
The students examined compatibility across approximately 50 to 60 variables, each with multiple levels, for a dataset of about 240 columns. The variables included whether the two eHarmony users had communicated within the last week (with separate variables for whether the man, the woman, or both had sent a communication); age; the distance in miles between the two people; and preferences such as ethnicity, religion, education level, weekly alcohol consumption and more. The teams also had to define a statistic for the average compatibility score.
For a non-statistician, the sheer volume of numbers to scrutinize could be more nerve-racking than a first date, but not for the Pomona team.
"It was a lot of fun to be given a huge dataset and have no guidance at all about what we should be doing or where we should end up. The creativity that was necessary to come up with an interesting question to pursue ended up making the whole experience very positive and definitely worthwhile," says Brian Williamson '14.
As the 48 hours ticked by, the urgency ramped up. In the final hours, Williamson felt the crunch and downed five cups of coffee—"unheard of for me"—to fuel the last push, which included creating a graphic that was key for their presentation. "We were able to crank out everything we needed with literally five minutes to spare," he says.
Each team had 60 seconds to present one slide of their findings in the lightning round. Then, to advance to the final round, they had 45 minutes to find a partner team. These superteams then made a five-minute, three-slide presentation to the judges.
Bill DeRose '15, a computer science major, found the DataFest experience exhilarating. "I found working with a team liberating after being graded as an individual for so long in school," he says.
Advised by Pomona College Associate Professor of Mathematics Johanna Hardin, the students who participated in the competition were Jake Coleman '13, Maricela Cruz '14, Bill DeRose '15, Ciaran Evans '16, Rob Knickerbocker '15, Kevin Lu '15, Derek Owens-Oas '13, Ben Shand '14 and Brian Williamson '14.
This was the third annual DataFest competition. Last year, one Pomona team won "Best Use of External Data" and a second team won an honorable mention for "Best Visualization."