Summer 2013 Research Project: Eleanor Cawthon '15

The Summer Undergraduate Research Program (SURP) enables students to conduct extended, focused research in close cooperation with a Pomona faculty member. Below are recent summer research projects in the Computer Science Department.

2013

Parallelization of Gene Sequence Alignment Tools Using SeqDB

Eleanor Swent Cawthon (2015); Additional Collaborator(s): Kirsten Fagnan (Lawrence Berkeley National Laboratory); Brian Bushnell (Joint Genome Institute); Mentor(s): Katerina Antypas (Lawrence Berkeley National Laboratory)

Abstract: Genome researchers around the world use the FASTQ file format to represent genome sequence data. Because the data in a single FASTQ file must be accessed sequentially, this standard has created a bottleneck in the performance of sequence alignment tools. This study examines the extent to which using SeqDB, an alternative to FASTQ based on the Hierarchical Data Format, can improve the performance of the BBMap sequence alignment tool. BBMap was modified to support input in SeqDB format natively. Throughput of BBMap's format conversion tool was measured when the same read data were given in uncompressed FASTQ, Gzip-compressed FASTQ, and SeqDB formats. With SeqDB input, the modified version of BBMap processed reads at a rate that increased by approximately 22.3 reads per millisecond with each doubling of the number of threads, up to a maximum of 126 reads per millisecond with eight threads. Average throughput was 300 reads per millisecond for uncompressed FASTQ and 227 reads per millisecond for Gzip-compressed FASTQ. These rates did not vary substantially with the number of threads used. Input file size was not found to be related to SeqDB’s throughput. The results of this investigation suggest that SeqDB has the potential to be a scalable solution to one significant input and output bottleneck, but that additional changes in BBMap will be required in order for SeqDB support to match or exceed the performance of older formats.
Funding Provided by: U.S. Department of Energy Office of Science
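
The sequential-access constraint mentioned in the abstract can be illustrated with a small reader. In the common FASTQ layout each read occupies four lines: a header starting with `@`, the sequence, a separator starting with `+`, and a quality string. Because reads vary in length, the byte offset of a given read is not known without scanning everything before it. The Python sketch below is only an illustration and is not code from BBMap or SeqDB; the function names `read_fastq` and `nth_read` are hypothetical, and it assumes the unwrapped four-line-per-record layout.

```python
import io
from itertools import islice


def read_fastq(handle):
    """Yield (name, sequence, quality) tuples from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline()
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        yield header[1:].rstrip("\n"), sequence, quality


def nth_read(handle, n):
    """Return the n-th read (0-based). Every earlier record is scanned first,
    because variable-length records give no way to compute its offset."""
    return next(islice(read_fastq(handle), n, None))


# Toy input, made up for illustration only.
example = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nGGCATT\n+\nIIIIII\n")
print(nth_read(example, 1))
```

A format that stores reads in fixed-size or indexed blocks lets several threads jump to different regions independently, which is the kind of access pattern the abstract associates with SeqDB.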

Integrating the Grace Programming Language into DrRacket

Richard Yannow (2014); Student Collaborator(s): Nicholas Cho (2015); Mentor(s): Kim Bruce

Abstract: The Grace programming language project was started with the intention of developing an object-oriented programming language that would make it easy to teach programming to novices. To this end, we have to provide not only a simple and flexible language amenable to different teaching styles and programming paradigms, but also a robust environment in which novices can learn to program. We decided to take DrRacket, an integrated development environment (IDE) for the Racket programming language, and extend it to support Grace, allowing us to take advantage of its numerous built-in beginner-friendly features through Racket’s language-building capabilities. In order for DrRacket to understand Grace code, we wrote three components: a parser that takes Grace code (the surface representation) and builds an Abstract Syntax Tree (AST, the underlying syntax); a typechecker that builds a type environment and supports the typechecking of any combination of statically typed and dynamically typed code; and an interpreter that translates the semantics of the AST into Racket code, so that DrRacket can evaluate it as it would any other program.
Funding Provided by: Pomona College SURP
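
One way to picture a typechecker that accepts a mix of statically and dynamically typed code is a "consistency" check: a declared type and an actual type are compatible if they match or if either one is dynamic. The Python sketch below is an assumption-laden illustration, not the project's typechecker. The names `DYNAMIC`, `consistent`, and `check_declarations` are hypothetical, and declarations are modeled as simple (name, declared type, value type) triples rather than Grace syntax.

```python
DYNAMIC = "Dynamic"  # hypothetical marker for "no static type given"


def consistent(expected, actual):
    """Types are consistent if they are equal or if either side is dynamic."""
    return DYNAMIC in (expected, actual) or expected == actual


def check_declarations(declarations):
    """Build a type environment from (name, declared_type, value_type) triples.

    A declared_type of None means the programmer left the type off, so the
    name is treated as dynamically typed. Returns (environment, errors).
    """
    environment = {}
    errors = []
    for name, declared, value_type in declarations:
        declared = declared or DYNAMIC
        environment[name] = declared
        if not consistent(declared, value_type):
            errors.append(f"{name}: expected {declared}, got {value_type}")
    return environment, errors


env, errs = check_declarations([
    ("x", "Number", "Number"),  # statically typed, matches
    ("y", None, "String"),      # dynamically typed, always accepted
    ("z", "Number", "String"),  # statically typed, mismatch
])
print(env)
print(errs)
```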

Impro-Visor: Audio Input and Style Recognition

Anna Turner (2015); Student Collaborator(s): Hayden Blauzvern (2016 HMC); Nate Tarrh (2014 Tufts University); Kelly Lee (2016 HMC); Mentor(s): Robert Keller (HMC)

Abstract: We present research on Impro-Visor, intelligent music software dedicated to helping both beginner and expert jazz musicians improve their playing. We first introduce the creation of audio input capabilities. We accomplish this through SuperCollider, a programming language used for audio synthesis. We use pitch and onset detection to detect notes and rests, which we then send as MIDI input to Impro-Visor. We integrate existing external software to make this feature as portable as possible. We also describe an automated method for style recognition of jazz melodies through the use of supervised training. We train a neural network to recognize defining stylistic elements of specific musicians. We then present melodies to a critic for judgment on a grading scale, and for prediction of the musicians to whom the melodies sound most similar.
Funding Provided by: National Science Foundation (HMC)
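
The step from pitch detection to MIDI input can be sketched with the standard equal-temperament conversion, in which A4 (440 Hz) corresponds to MIDI note 69 and each semitone is a factor of 2^(1/12) in frequency. The Python below is only an illustration of that conversion; it is not taken from Impro-Visor or SuperCollider, and the function names and the (onset, frequency) event layout are hypothetical.

```python
import math

A4_FREQUENCY_HZ = 440.0
A4_MIDI_NOTE = 69


def frequency_to_midi(frequency_hz):
    """Round a detected frequency to the nearest equal-tempered MIDI note number."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return round(A4_MIDI_NOTE + 12 * math.log2(frequency_hz / A4_FREQUENCY_HZ))


def events_to_notes(detections):
    """Map (onset_seconds, frequency_hz) pairs to (onset_seconds, midi_note).

    A frequency of None stands for a detected rest and is passed through as None.
    """
    return [
        (onset, None if frequency is None else frequency_to_midi(frequency))
        for onset, frequency in detections
    ]


# Toy detections: A4, roughly middle C, then a rest.
print(events_to_notes([(0.0, 440.0), (0.5, 261.63), (1.0, None)]))
```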

Walking in place using the Microsoft Kinect to explore a large VE

Kevin Nguyen (2016); Student Collaborator(s): Preston Tunnell Wilson (2016 Rhodes College); Additional Collaborator(s): Kyle Dempsey (Mississippi University for Women); Mentor(s): Betsy Williams-Sanders (Rhodes College)

Abstract: One way to permit free exploration of a virtual environment (VE) of any size and provide some of the inertial cues of walking is to have users “walk in place” (WIP) [Williams et al. 2011]. With WIP, each step is treated as a virtual translation even though the participant remains in the same location. In our prior work [Williams et al. 2011], we had success in implementing a WIP method using an inexpensive Nintendo Wii Balance Board. We showed that participants’ spatial orientation was the same as with normal walking and superior to joystick navigation. There were two major drawbacks to this WIP algorithm. First, our step detection algorithm had a half-step lag. Second, participants found it slightly annoying to walk in place on the small board. Thus, the current work seeks to overcome these limitations by implementing a WIP algorithm using two Microsoft Kinect sensors (150 USD each). Specifically, we are interested in seeing how well users can explore a large VE by WIP with the Kinect (WIP–K). Due to the large size of the VE, comparing these results to normal physical walking is not possible. Therefore, we directly compare WIP–K to joystick navigation. Also, we examine scaling the translational gain of WIP–K so that one “step” carries the user forward two steps (referred to as WIP–K x 2). Thus, this within-subject experiment compares subjects’ spatial orientation as they navigate a VE in three conditions: Joystick, WIP–K, WIP–K x 2.
Funding Provided by: Pomona College SURP
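
The idea of treating each in-place step as a virtual translation, with an optional translational gain, can be sketched as follows. The abstract does not describe the step detection algorithm, so everything here is an assumption made for illustration: the threshold-crossing detector, the foot-height input, the 0.05 m threshold, and the 0.7 m step length are all hypothetical and are not the authors' method.

```python
def count_steps(foot_heights, lift_threshold=0.05):
    """Count steps as upward crossings of a height threshold (metres above the floor)."""
    steps = 0
    foot_down = True
    for height in foot_heights:
        if foot_down and height > lift_threshold:
            steps += 1          # foot has just been lifted: one new step
            foot_down = False
        elif not foot_down and height <= lift_threshold:
            foot_down = True    # foot returned to the floor
    return steps


def virtual_distance(foot_heights, step_length=0.7, gain=1.0):
    """Translate detected steps into forward travel in the VE.

    gain=1.0 corresponds to the WIP-K condition and gain=2.0 to WIP-K x 2,
    where one physical step carries the user forward two steps.
    """
    return count_steps(foot_heights) * step_length * gain


samples = [0.0, 0.10, 0.12, 0.0, 0.02, 0.09, 0.0]  # made-up sensor readings
print(count_steps(samples), virtual_distance(samples, gain=2.0))
```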

2012

Examining graph clustering stability

Evan Fields (2013); Mentor(s): Tzu-Yi Chen

Abstract: A graph is a collection of objects (nodes) and connections between pairs of objects (edges). Graphs have been used to study phenomena such as social networks, where the nodes represent people and edges represent friendships. Partitioning nodes into “clusters” such that two nodes within a cluster are more likely to be connected than two nodes from separate clusters can reveal structures such as communities in social networks. Many clustering algorithms have been proposed. However, popular measures of clustering strength are computationally infeasible to maximize, and algorithms can, at best, compute reasonably good clusterings. More importantly, clustering algorithms will return a partition of the nodes even for graphs lacking community structure. We investigate how to determine whether the returned clusters represent a meaningful structure in the graph. We present an algorithm-agnostic method of examining stability (and thus the meaningfulness of clusters): after running a clustering algorithm on the initial graph, we add a single edge not present in the original graph and re-run the clustering algorithm. The distance between the clustering on the original and on the modified graphs is recorded, and this experiment is repeated for a large number of edges not present in the original graph. We hypothesize that the distribution of recorded distances carries information about stability and present experimental data collected on a variety of synthetic and real-world graphs.
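
The perturbation experiment described above can be sketched in Python. Two pieces are assumptions chosen only to keep the example self-contained: connected components stand in for the clustering algorithm (the method is algorithm-agnostic, so any function with the same signature could be substituted), and the distance between clusterings is a pair-counting (Rand) distance, since the abstract does not say which distance was used. All function names are hypothetical.

```python
from itertools import combinations


def connected_components(nodes, edges):
    """Stand-in clustering algorithm: label each node with its connected component."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in edges:
        parent[find(u)] = find(v)
    return {v: find(v) for v in nodes}


def rand_distance(labels_a, labels_b):
    """Fraction of node pairs grouped together by one clustering but not the other."""
    pairs = list(combinations(sorted(labels_a), 2))
    disagreements = sum(
        (labels_a[u] == labels_a[v]) != (labels_b[u] == labels_b[v])
        for u, v in pairs
    )
    return disagreements / len(pairs)


def stability_distances(nodes, edges, cluster=connected_components):
    """Add each absent edge in turn, re-cluster, and record the distance to the baseline."""
    present = {frozenset(edge) for edge in edges}
    baseline = cluster(nodes, edges)
    distances = {}
    for u, v in combinations(sorted(nodes), 2):
        if frozenset((u, v)) not in present:
            perturbed = cluster(nodes, list(edges) + [(u, v)])
            distances[(u, v)] = rand_distance(baseline, perturbed)
    return distances


# Toy graph: two disjoint triangles.
nodes = [1, 2, 3, 4, 5, 6]
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6)]
print(stability_distances(nodes, edges))
```

This sketch enumerates every absent edge, which is only practical for small graphs; on larger graphs one would sample absent edges instead, in line with the "large number of edges" mentioned in the abstract.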

Anonymity in Online Communities

Eli Omernick (2013); Mentor(s): Sara Sood

Abstract: The ever-expanding scope of computer-mediated communication, especially in this age of social media and instant communication, raises meaningful questions about how we communicate and what follows from it. Specifically, we investigated the influence of anonymity on the behavior of Internet users. TechCrunch is a technology news site that posts articles and allows readers to post comments in response. On March 1, 2011, TechCrunch switched from the Disqus commenting platform to the Facebook commenting platform, marking the end of condoned anonymity in its online community. They did this in the name of “Troll Slaying,” an attempt to reduce intentionally negative or destructive user contributions. We looked at trends between the two corpora as wholes as well as between the user groups (which we characterize as having varying degrees of anonymity, from totally anonymous, to pseudonymous, to using real names) within the individual corpora. We evaluated comments in terms of several qualitative (e.g. Readability, Relevance, Word Usage) and quantitative (e.g. Comments/Article, Comments/User, Comment Length) metrics.
Funding Provided by: Pomona College SURP
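
The quantitative metrics named in the abstract (Comments/Article, Comments/User, Comment Length) are simple aggregates over a corpus. The Python sketch below shows one way they might be computed; the (article, user, text) record layout, the choice to measure length in words, and the function name `comment_metrics` are assumptions for illustration, and the sample comments are made up rather than drawn from the study's data.

```python
from collections import Counter
from statistics import mean


def comment_metrics(comments):
    """Compute corpus-level averages from (article_id, user_id, text) tuples."""
    per_article = Counter(article for article, _, _ in comments)
    per_user = Counter(user for _, user, _ in comments)
    return {
        "comments_per_article": mean(list(per_article.values())),
        "comments_per_user": mean(list(per_user.values())),
        # Length is measured in whitespace-separated words (an assumption).
        "mean_length_words": mean([len(text.split()) for _, _, text in comments]),
    }


# Toy corpus, invented for this example.
sample = [
    ("article-1", "user-a", "nice post"),
    ("article-1", "user-b", "I disagree with this"),
    ("article-2", "user-a", "ok"),
]
print(comment_metrics(sample))
```

Running the same function on each corpus, or on each user group within a corpus, gives directly comparable numbers.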

Programming in Grace

Amy Ruskin (2014); Student Collaborator(s): Richard Yannow (2014); Mentor(s): Kim Bruce

Abstract: Grace is a programming language that is currently in development with the eventual goal of being used to teach introductory computer science courses. I programmed extensively in Grace to find some remaining bugs and provide feedback on the experience of actually using the language. In the end, I produced Grace code for various data structures, using Java structures as guides, and translated some of the projects and assignments from Pomona's Data Structures and Advanced Programming (CS62) into Grace. Most of the problems encountered were due to features of the language that were not yet fully implemented and the lack of extensive and current documentation, but once those issues are resolved, Grace should be easy to learn and straightforward to write.
Funding Provided by: Pomona College SURP

Adapting Object-Oriented Languages for Instructional IDEs

Richard Yannow (2014); Mentor(s): Kim Bruce

Abstract: The Grace programming language project was started with the intention of making a new object-oriented language for teaching the practice of programming. In order to be successful, Grace must be easily usable by novices, and a significant factor towards that goal is having a beginner-friendly integrated development environment (or IDE). We decided to use DrRacket as an IDE for Grace, allowing us to take advantage of its numerous novice-friendly features and Racket's powerful language-building capabilities. We developed a new backend language, Racket-Grace, with Grace's semantics but Racket-style syntax. We also wrote a pretty-printer that takes processed abstract syntax trees of Grace code and returns an equivalent Racket-Grace program. This will allow us to input a Grace program into DrRacket and run it there, translating it into Racket-Grace as an under-the-hood intermediate step, allowing us to maintain compatibility with DrRacket's many useful tools.
Funding Provided by: Pomona College SURP
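
A pretty-printer of the kind described walks an abstract syntax tree and emits text in the target syntax. The Python sketch below shows the general technique for a tiny expression tree printed as parenthesized prefix expressions. It is not the project's pretty-printer: the node classes `Num`, `Var`, and `Call` are hypothetical, and the output is generic s-expression syntax rather than actual Racket-Grace.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Num:
    value: int


@dataclass
class Var:
    name: str


@dataclass
class Call:
    receiver: "Node"   # object the method is requested on
    method: str        # method or operator name
    args: List["Node"]


Node = Union[Num, Var, Call]


def pretty_print(node):
    """Render an AST node as a parenthesized prefix expression."""
    if isinstance(node, Num):
        return str(node.value)
    if isinstance(node, Var):
        return node.name
    if isinstance(node, Call):
        parts = [node.method, pretty_print(node.receiver)]
        parts += [pretty_print(arg) for arg in node.args]
        return "(" + " ".join(parts) + ")"
    raise TypeError(f"unknown node type: {type(node).__name__}")


# AST for the infix expression (a * 2) + b.
tree = Call(Call(Var("a"), "*", [Num(2)]), "+", [Var("b")])
print(pretty_print(tree))
```

Keeping the printer as a pure function from tree to string makes it easy to run as a hidden intermediate step between parsing and evaluation.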