Pomona College Students Win at "DataFest" Competition
Five Pomona College students on two teams participated in DataFest, the second annual 48-hour data analysis competition at UCLA, winning one of the three awards and an honorable mention.
On May 4-6, 69 Pomona and UCLA students on 15 teams worked with a large dataset from non-profit microloan broker Kiva.org, with the goal of helping Kiva find a metric to better target projects to customers. The dataset included 15,000 lenders, 97,000 loans and several million transactions.
One Pomona team (Team PC: Joseph Replogle ’13, Erika Parks ’13, and Karl Kumbier ’13) won Best Use of External Data; the other team (Team Not So PC: Tim Stutz ’12 and Drew DiPalma ’13) won an honorable mention for Best Visualization. The Pomona students were an interdisciplinary group, with sociology, molecular biology, computer science and math majors in attendance. One of Pomona’s teams sought to characterize lenders, and the other sought to understand what factors contribute to a fertile micro-lending environment in a country.
Using graphs, models, and likely supercharged laptops, the students examined variation in default rates to identify trends over time and across the globe, to see how patterns of giving varied with geography, and to understand whether different regions of the world seek different types of loans.
Team PC found country-level data on variables such as GDP, population, political instability index and Gini coefficient. Their results showed that, somewhat counter-intuitively, none of the typical explanations for success of microfinance in a given country held up when applied to the data.
Using text-mining techniques, Team Not So PC analyzed why people had decided to donate to particular borrowers, finding groups of words that were particular to different geographical regions: words more likely to show up in a given region than across all regions. The team also used words to classify lenders into progressive and faith-based groups. Over time, the trends indicate that the percentage of faith-based lenders has increased substantially.
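For readers curious what "words more likely to show up in a given region than across all regions" can look like in practice, here is a minimal Python sketch. It is not the team's actual code or method, which the article does not describe. The function name `distinctive_words`, the scoring ratio, and the toy text are all assumptions made up for illustration.

```python
from collections import Counter


def distinctive_words(texts_by_region, top_n=3, min_count=2):
    """Rank words by how over-represented they are in each region.

    Score = (word's share of the region's words) / (word's share of all words).
    A score above 1 means the word shows up more often in that region
    than across all regions combined.
    """
    # Count words per region using simple lowercase whitespace splitting.
    region_counts = {
        region: Counter(word for text in texts for word in text.lower().split())
        for region, texts in texts_by_region.items()
    }

    # Pool every region's counts to get the overall word frequencies.
    overall = Counter()
    for counts in region_counts.values():
        overall.update(counts)
    overall_total = sum(overall.values())

    result = {}
    for region, counts in region_counts.items():
        region_total = sum(counts.values())
        scores = {
            word: (n / region_total) / (overall[word] / overall_total)
            for word, n in counts.items()
            if n >= min_count  # skip words too rare to say anything about
        }
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        result[region] = ranked[:top_n]
    return result


# Made-up placeholder text, NOT Kiva data.
toy = {
    "region_a": ["help farmers buy seeds", "farmers need seeds and tools"],
    "region_b": ["help women open shops", "women run small shops"],
}
top_words = distinctive_words(toy)
```

In this toy example, "help" appears in both regions but falls below the `min_count` cutoff, so only the words that repeat within a region are scored. A real analysis would likely need proper tokenization, stop-word removal, and far more text than this.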
Because Kiva’s microfinance data could be dissected at so many different levels, it was crucial to determine which relationships could be interesting and useful for the organization, said Replogle.
It was an opportunity to apply smarts to a real-life application, something taught in Associate Professor of Mathematics Johanna Hardin’s classes.
“Personally, the great lesson—or rather, refresher of a maxim taught in Prof. Hardin's stats classes—of Datafest was that even in a large data-mining project, statistics must be constantly accompanied by theory,” said Replogle. “Throughout the competition, we continuously generated hypotheses from our statistical observations and subsequently pushed the statistics further to better test our hypotheses.”
Two days of caffeine-and-sugar-fueled stat crunching later, the students’ hopes hadn’t flagged. The goal remained the same: results that will help Kiva better achieve its mission to connect people through lending to alleviate poverty.
DataFest’s 48-hour length notwithstanding, the students had made a pact: “Driving to UCLA, we agreed that we wouldn’t sacrifice sleep for the competition; after all, we had finals to study for on Monday!” said Replogle.
The timing was troublesome but surmountable, and DataFest was worth it. “I think DataFest is a great opportunity for our students because it gives them a chance to work with an actual data set on an interesting problem in real time,” said Hardin. “Not only does the experience help the students realize what types of career prospects might exist, but it also communicates to grad schools and future employers that our students are working at the cutting edge of statistical and computational analysis and that they are engaged in solving interesting problems.”