The Summer Undergraduate Research Program (SURP) enables students to conduct extended, focused research in close cooperation with a Pomona faculty member. Below are recent summer research projects in the Computer Science Department.
Fluorescence Bioimaging of Organellar Network Evolution
Chinasa Okolo ’18; Mentor: Shannon Quinn (University of Georgia); Collaborator: Pramod Giri (University of Georgia)
Currently there are no bioimaging application models that have the capability to tag organelles as dynamic social networks. Being able to model these patterns as evolving networks with respect to time will yield additional insights not possible with existing bioimaging methods. The analytical framework designed in this project was implemented using open source tools such as ImageJ, MATLAB, and Mahotas, which will help increase the accessibility of these tools and the transparency of future analyses. From experimentation with various Python packages and scientific image processors, numerous images were produced that quantified the number of non-zero pixels in actin channels, the centroids of the cells, and the segmented parts of the cells. Given the limited time frame of ten weeks, there were no evident morphological changes observed between the uninfected and infected slides of the 6 hours and 24 hours time points. Due to the lack of facilities and lower costs, Listeria monocytogenes was used as a model, but to characterize the spatio-temporal evolution of organellar networks, the next step of this project will use cells infected with Mycobacterium tuberculosis (Mtb) and Legionella pneumophila (Lp), respectively. The completed framework will quantify changes in organellar shape, quantity, and spatial distribution over large sequences of Z-stack microscope images and digital videos, which will improve our understanding of how cellular mechanisms respond to their environments.
Funding Provided By: National Science Foundation (University of Georgia)
Improving the User Interface for the Online Grace Web Editor
Kelvin Kim ’17; Mentor: Kim Bruce; Collaborator: Reid Mitchell ‘16
Funding Provided By: Richter
Static Typing and Documentation Generation for the Grace Programming Language
Reid Mitchell ’16; Mentor: Kim Bruce
This summer, I assisted Prof. Kim Bruce in his work on the Grace programming language. Grace is a new language designed to facilitate the teaching of programming concepts to students. It eliminates some of the unnecessary complexities of other languages to reduce the learning curve for those new to computer science. It has been taught in introductory CS courses at Pomona and will be taught again in the fall. One of my projects was to build a documentation generator for Grace. It takes in well-commented Grace code and outputs HTML pages containing documentation of that code, similar to the way Javadoc works for Java. Initially, the compiler ignored comments entirely, so I had to modify the parser to preserve comments in the abstract syntax tree. Then I wrote a program to translate that abstract syntax tree into a set of HTML pages. My other project was to fix the statically-typed dialect of Grace. A language is statically-typed if it checks the types of all objects at compile time for correctness and consistency. Standard Grace does not do this, but static typing is a good teaching tool; thus, this dialect was created to enforce these rules for students. At the start of summer, the dialect did not work for all cases. It's still not perfect, but I have made significant progress. One major improvement was to make the compiler write information about types declared in a module to a file visible to other modules. This allows for type-checking against types from imported modules.
Funding Provided By: Richter
Syntactic tree parsing in lexical simplification tasks
Eli Fessler ’17; Mentor: David Kauchak
Our work examines the task of lexical simplification, which reduces the reading complexity of sentences by replacing certain words with simpler, more accessible variants. Previous work on the project built an initial corpus from corresponding sentence pairs extracted from English and Simple English Wikipedia, which we utilize in this project to learn word simplifications. We first train a feature-based ranking function (using SVMrank—a linear support vector machine) on a set of labeled simplifications collected using Amazon Mechanical Turk. From the collected aligned sentences, we extract simplification rules which include candidate word replacements and word replacement frequency. In applying these rules to simplify different sentences, we modified the system to automatically identify words to be changed in testing data, rather than requiring manual identification of the words to be simplified. We then created and refined simplification rules and predictive language features, e.g. a feature to compare the original and modified sentences' parse (syntax) tree. We found that in the original system, 86.25% of the sentences were changed, with a precision of 0.7615 and an accuracy of 0.6629. With the inclusion of the new feature, we saw the precision and accuracy increase to 0.7702 and 0.6668, respectively, with 85.85% of sentences modified; however, these numbers proved to be non-significant, although slight improvement did demonstrate progress in the right direction.
Funding Provided By: Fletcher Jones
The Impact of Quality Metrics on Communities Detected in Complex Networks
Jennifer Nguyen ’17; Mentor: Tzu-Yi Chen; Collaborator: Israel Gebremariam ‘19
Complex networks can be used to represent interactions between users on Facebook, members of a sports league, or neurons within the human brain. A community within a network is a set of members that are more connected to each other than to other members; communities might represent groups on Facebook or college athletic conferences in the NCAA's Division III. In many applications we are interested in identifying communities within these networks based solely on the interactions observed. The majority of commonly-used algorithms for detecting communities optimizes a metric known as modularity, which compares the density of connections of members within a community to members of other communities. However, other quality metrics such as conductance, coverage, performance, and silhouette index could also be used within those same algorithms. For this study we examined the impact of replacing modularity with coverage in existing implementations of the Louvain and CNM community detection algorithms. Preliminary results suggest that the metric used does not dramatically affect the communities identified.
Funding Provided By: Howard Hughes Medical Institute High Achievement Program (Gebremariam), Paul K. Richter and Evalyn E. Cook Richter Memorial Fund (Nguyen)
Analysis of Performance Portable Graph Algorithm Performance
Joshua Landgraf ’16; Mentor: Jeff Amelang (HMC); Collaborator: Reyna Hulett ’16 (HMC)
With the rise of multicore CPUs, powerful GPUs, and massive supercomputers, parallel computing has become increasingly relevant to both regular consumers and members of the high performance computing community. Unfortunately, many parallel programming languages and libraries result in programs being tied to only one particular type of parallel computing architecture by lacking support for other architectures or by forcing programmers to write their programs in such a way that prevents them from performing well on other architectures. However, Kokkos solves these issues by supporting multiple parallel computing architectures and handling many important details itself. This gives Kokkos the freedom to tailor program implementations to the architecture they will be running on, resulting in good performance across architectures. We tested Kokkos by using it to implement a variety of approaches to solving graph problems on CPUs and GPUs. We then compared the performance of Kokkos against OpenMP, a popular parallel programming language for CPUs, and Cuda, a popular parallel programming language for Nvidia GPUs. We also compared different approaches with each other to see which ones worked best. We found that Kokkos generally performed as well as both OpenMP and Cuda when run on sufficiently large graph problems. However, there were some instances where Kokkos was significantly better and some instances where Kokkos was significantly worse than OpenMP or Cuda.
Parallelization of Gene Sequence Alignment Tools Using SeqDB
Eleanor Swent Cawthon (2015); Additional Collaborator(s): Kirsten Fagnan (Lawrence Berkeley National Laboratory); Brian (Bushnell Joint Genome Institute); Mentor(s): Katerina Antypas (Lawrence Berkeley National Lab)
Abstract: Genome researchers around the world use the FASTQ file format to represent genome sequence data. Because the data in a single FASTQ file must be accessed sequentially, this standard has created a bottleneck in the performance of sequence alignment tools. This study examines the extent to which using SeqDB, an alternative to FASTQ based on the Hierarchical Data Format, can improve the performance of the BBMap sequence alignment tool. BBMap, was modified to support input in SeqDB format natively. Throughput of BBMap's format conversion tool was measured when the same read data were given in uncompressed FASTQ, Gzip-compressed FASTQ, and SeqDB formats. The modified version of BBMap processed reads at a rate that increased at a rate of approximately 22.3 reads per millisecond per doubling of the number of threads, up to a maximum of 126 reads per millisecond with eight threads. Average throughput was 300 reads per millisecond for uncompressed FASTQ and 227 reads per millisecond for Gzip-compressed FASTQ. These rates did not vary substantially with the number of threads used. Input file size was not found to be related to SeqDB’s throughput. The results of this investigation suggest that SeqDB has the potential to be a scalable solution to one significant input and output bottleneck, but that additional changes in BBMap will be required in order for SeqDB support to match or exceed the performance of older formats.
Funding Provided by: U.S. Department of Energy Office of Science
Integrating the Grace Programming Language into DrRacket
Richard Yannow (2014); Student Collaborator(s): Nicholas Cho (2015); Mentor(s): Kim Bruce
Abstract: The Grace programming language project was started with the intention of developing an object-oriented programming language that would make it easy to teach programming to novices. To this end, we have to provide not only a simple and flexible language amenable to different teaching styles and programming paradigms, but also a robust environment in which novices can learn to program. We decided to take DrRacket, an integrated development environment (IDE) for the Racket programming language, and extend it to support Grace, allowing us to take advantage of its numerous built-in beginner-friendly features through Racket’s language-binding capabilities. In order for DrRacket to understand Grace code, we wrote a parser that takes Grace code, or the surface representation, and interprets it to build an Abstract Syntax Tree (AST), or the underlying syntax; a typechecker that builds a type environment and supports the typechecking of any combination of statically-typed and dynamically-typed code; and an interpreter that translates the semantics of the AST into Racket code, so that DrRacket can evaluate it as it would any other program.
Funding provided by Pomona College SURP
Impro-Visor: Audio Input and Style Recognition
Anna Turner (2015); Student Collaborator(s): Hayden Blauzvern (2016 HMC); Nate Tarrh (2014 Tufts University); Kelly Lee (2016 HMC); Mentor(s): Robert Keller (HMC)
Abstract: We present research on Impro-Visor, intelligent music software dedicated to helping both beginner and expert jazz musicians improve their playing. We first introduce the creation of audio input capabilities. We accomplish this through SuperCollider, a programming language used for audio synthesis. We use pitch and onset detection to detect notes and rests, which we then send as MIDI input to Impro-Visor. We integrate existing external software to make this feature as portable as possible. We also describe an automated method for style recognition of jazz melodies through the use of supervised training. We train a neural network to recognize defining stylistic elements of specific musicians. We then present melodies to a critic for judgment on a grading scale, and for prediction of the musicians to whom the melodies sound most similar.
Funding Provided by: National Science Foundation (HMC)
Walking in place using the Microsoft Kinect to explore a large VE
Kevin Nguyen (2016); Student Collaborator(s): Preston Tunnell Wilson (2016 Rhodes College); Additional Collaborator(s): Kyle Dempsey (Mississippi University for Women); Mentor(s): Betsy Williams-Sanders (Rhodes College)
Abstract: One way to permit free exploration of any sized virtual environment (VE) and provide some of the inertial cues of walking is to have users “walk in place” (WIP) [Williams et al. 2011]. With WIP, each step is treated as a virtual translation even though the participant remains in the same location. In our prior work [Williams et al. 2011], we had success in implementing a WIP method using an inexpensive Nintendo Wii Balance Board. We showed that participants’ spatial orientation was the same as normal walking and superior to joystick navigation. There were two major drawbacks to this WIP algorithm. First, our step detection algorithm had a half–step lag. Second, participants found it slightly annoying to walk in place on the small board. Thus, the current work seeks to overcome these limitations by implementing an algorithm to WIP using two Microsoft Kinect sensors (150 USD each). Specifically, we are interested in seeing how well users can explore a large VE by WIP with the Kinect (WIP–K). Due to the large size of the VE, comparing these results to normal physical walking is not possible. Therefore, we directly compare WIP–K to joystick navigation. Also, we examine scaling the translational gain of WIP–K so that one “step” carries the user forward two steps (referred to as WIP–K x 2). Thus, this within–subject experiment compares subjects’ spatial orientation as they navigate a VE in three conditions: Joystick, WIP–K, WIP–K x 2.
Funding provided by Pomona College SURP
Examining graph clustering stability
Evan Fields (2013); Mentor(s): Tzu-Yi Chen
Abstract: A graph is a collection of objects (nodes) and connections between pairs of objects (edges). Graphs have been used to study phenomena such as social networks, where the nodes represent people and edges represent friendships. Partitioning nodes into “clusters” such that two nodes within a cluster are more likely to be connected than two nodes from separate clusters can reveal structures such as communities in social networks. Many clustering algorithms have been proposed. However, popular measures of clustering strength are computationally unfeasible to maximize and algorithms can, at best, compute reasonably good clusterings. More importantly, clustering algorithms will return a partition of the nodes even for graphs lacking community structure. We investigate how to determine whether the returned clusters represent a meaningful structure in the graph. We present an algorithm-agnostic method of examining stability (and thus meaningfulness of clusters): after running a clustering algorithm on the initial graph, we add a single edge not present in the original graph and re-run the clustering algorithm. The distance between the clustering on the original and on the modified graphs is recorded, and this experiment is repeated for a large number of edges not present in the original graph. We hypothesize that the distribution of recorded distances carries information about stability and present experimental data collected on a variety of synthetic and real-world graphs.
Anonymity in Online Communities
Eli Omernick (2013); Mentor(s): Sara Sood
Abstract: With the ever-expanding scope of computer-mediated communication, especially in this age of social media and instant communication, there are some interesting and meaningful questions being raised on how we communicate and the subsequent implications; specifically we have investigated the influence of anonymity on the behavior of Internet users. TechCrunch is a technology news site, which posts articles and allows readers to post comments in response. On March 1st, 2011, TechCrunch switched from the Disqus commenting platform to the Facebook commenting platform, marking the end of condoned anonymity in their online community. They did this in the name of “Troll Slaying,” or the attempt of reducing intentionally negative or destructive user contributions. We looked at trends between the two corpora as wholes as well as between the user groups (which we characterize as having varying degrees of anonymity, from totally anonymous, to a pseudonym, to using users’ real names) within the individual corpora. We evaluated comments in terms of several qualitative (e.g. Readability, Relevance, Word Usage) as well as quantitative (e.g. Comments/Article, Comments/User, Comment Length) metrics.
Programming in Grace
Amy Ruskin (2014); Student Collaborator(s): Richard Yannow (2014); Mentor(s): Kim Bruce
Abstract: Grace is a programming language that is currently in development with the eventual goal of being used to teach introductory computer science courses. I programmed extensively in Grace to find some remaining bugs and provide feedback on the experience of actually using the language. In the end, I produced Grace code for various data structures, using Java structures as guides, and translated some of the projects and assignments from Pomona's Data Structures and Advanced Programming (CS62) into Grace. Most of the problems encountered were due to features of the language that were not yet fully implemented and the lack of extensive and current documentation, but once those issues are resolved, Grace should be easy to learn and straightforward to write.
Funding Provided by: Pomona College SURP
Adapting Object-Oriented Languages for Instructional IDEs
Richard Yannow (2014); Mentor(s): Kim Bruce
Abstract: The Grace programming language project was started with the intention of making a new object-oriented language for teaching the practice of programming. In order to be successful, Grace must be easily usable by novices, and a significant factor towards that goal is having a beginner-friendly integrated development environment (or IDE). We decided to use DrRacket as an IDE for Grace, allowing us to take advantage of its numerous novice-friendly features and Racket's powerful language-building capabilities. We developed a new backend language, Racket-Grace, with Grace's semantics, but with Racket-style syntax. We also wrote a pretty-printer that will take processed abstract syntax trees of Grace code, and return an equivalent Racket-Grace program. This will allow us to input a Grace program into DrRacket and run it there, translating it into Racket-Grace as an under-the-hood intermediate step, allowing us to maintain compatibility with DrRacket's many useful tools.
Funding provided by Pomona College SURP