One way Pomona College provides opportunities for students to excel is through undergraduate research. Below is a list of recent summer research projects conducted by students in the Mathematics Department.
3 Mathematicians, 400 Years, 1 Tradition
LeCuk Peral ’18; Mentor: Shahriar Shahriari; Collaborator: Malyq McElroy ’18
While the word "algebra" may now evoke memories of early schooling, hundreds of years ago advances in the subject were vital to the development of global trade and of mathematics as a whole. We studied the works of three mathematicians in this field, spanning six centuries from the 700s to the late 1200s. The three mathematicians, the Persian Al-Khwarizmi, the Egyptian Abu Kamil, and the Italian Leonardo of Pisa (better known as Fibonacci), all worked on similar problems and championed the use of the Hindu-Arabic numeral system. Delving into their works, as part of a larger project on the dissemination of knowledge, we sought to figure out why many of their worked-out problems were nearly identical, considering the amount of time that separates them, and what this meant for the development of algebra in the medieval period.
Funding Provided By: Pomona Unrestricted (McElroy), Richter (Peral)
A Walk Through the Woods: Growing Random Forests for Big Data
John Bryan ’16; Mentor: Johanna Hardin; Collaborator: Yenny Zhang ’17
The Bag of Little Bootstraps (BLB) is a relatively new methodology designed to implement bootstrapping on large datasets. It relies on the bootstrap method, a way of estimating sampling distributions that are not well characterized by statistical theory. BLB utilizes parallel computing architectures and a multinomial method to drastically reduce computational costs while preserving the appropriate level of error and statistical correctness. The multinomial method is used to resample to a given original sample size N while only using data vectors of size B << N. We combined the ideas of BLB with Random Forests (RF), a widely used model for classification and prediction. Applying BLB to RF required that we study and modify the original randomForest R package. The original package lacked the capability to employ a multinomial method like BLB's, so we implemented the multinomial method in the package. Through these modifications, we constructed a novel randomForest package that performs the new algorithm. Looking to the future, we plan to run simulations using this new algorithm to assess the reduction in computational cost as well as the algorithm’s reliability in producing confidence estimates of the predicted values. Additionally, we hope to investigate the infinitesimal jackknife estimator to measure the variability of the estimates. Ultimately, we hope to produce a BLBRF algorithm that is optimized for employing RF models in big data situations.
Funding Provided By: Richter (Bryan), Pomona College Mathematics Department (Zhang)
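As an illustrative sketch (not the project's modified randomForest code), the multinomial resampling step at the heart of BLB can be written in a few lines of Python; the dataset, sizes, and statistic (the mean) below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: a large sample of size N and a little-bootstrap
# subsample of size B << N.
N, B = 100_000, 500

data = rng.normal(loc=5.0, scale=2.0, size=N)      # stand-in dataset
subsample = rng.choice(data, size=B, replace=False)

# BLB's key step: instead of resampling N points, draw multinomial counts
# saying how many times each of the B points appears in a simulated
# size-N resample.  Only B weights are ever stored.
estimates = []
for _ in range(50):                 # 50 "little bootstrap" replicates
    counts = rng.multinomial(N, np.full(B, 1.0 / B))
    # Weighted mean = mean of the implicit size-N resample
    estimates.append(np.dot(counts, subsample) / N)

# The spread of the replicates approximates the sampling error of the mean.
se_hat = np.std(estimates, ddof=1)
```

The point of the trick is memory: each replicate behaves like an n = N bootstrap resample, but only the B-vector of counts is ever materialized.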
Alternating Means: The Impact of Noise in the Logistic Family
Austin Wei ’18; Mentors: Ami Radunskaya and Johanna Hardin
The logistic map is a one-parameter model for population growth with limited resources. We can model the effects of environmental noise by replacing the fixed parameter value with a random value chosen according to a distribution centered at that value; we call this the stochastic logistic map. A natural question is whether the introduction of noise tends to reduce or increase the long-term average of successive iterations of the logistic map. We show that the introduction of randomness decreases the mean value for parameter values where the deterministic map has an attracting fixed point. Surprisingly, however, the mean value actually increases when the deterministic map has an attracting period-2 cycle. We further conjecture that this alternation of optimal means continues through the period-doubling bifurcations of the deterministic logistic map.
Funding Provided By: Fletcher Jones
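A minimal Python simulation (not the project's own code; the parameter value and noise distribution are illustrative) exhibits the decrease of the mean in the fixed-point regime:

```python
import numpy as np

rng = np.random.default_rng(1)

def long_run_mean(lam_center, noise=0.0, n_iter=100_000, burn=5_000):
    """Average of logistic-map iterates x_{t+1} = lam_t * x_t * (1 - x_t),
    with lam_t drawn uniformly from [lam_center - noise, lam_center + noise]."""
    x = 0.5
    total = 0.0
    for t in range(n_iter):
        lam = lam_center + noise * rng.uniform(-1.0, 1.0)
        x = lam * x * (1.0 - x)
        if t >= burn:
            total += x
    return total / (n_iter - burn)

lam = 2.5                                   # attracting fixed point at 1 - 1/lam = 0.6
det_mean = long_run_mean(lam)               # deterministic long-run average
stoch_mean = long_run_mean(lam, noise=0.2)  # noisy parameter
```

At stationarity the stochastic mean m satisfies m = 1 - 1/λ̄ - v/m, where v > 0 is the variance of the iterates, so m falls strictly below the deterministic fixed point 1 - 1/λ̄; the simulation makes the gap visible.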
False discovery rate control in a two-step dependent filtering procedure for differential gene expression analysis
Ciaran Evans ’16; Mentors: Johanna Hardin and Daniel Stoebel (HMC)
Analysis of genetic data to investigate differential gene expression often involves performing thousands of hypothesis tests. When multiple tests are performed, it is necessary to control the quantity of false positives that are produced, so that the test results are meaningful. The standard error rate used in differential expression analysis is the false discovery rate (FDR), which is the expected ratio of the number of false positives to the total number of tests declared significant. A common goal of research in this field is to increase the power of multiple-testing procedures to detect true differences while maintaining FDR control. A collection of proposed methods to increase power involve data-based approaches to filtering out true null hypotheses. The method we consider in this research uses a two-step testing procedure for differential gene expression, in which a global test is performed to test for any differences in gene expression between all experimental conditions, and then pairwise tests to determine the nature of differential expression are performed on those genes for which the global null hypothesis is rejected. To attain FDR control, an estimate of the distribution of p-values in the second step of the procedure is required. We examine a mixture-model approach to estimate this distribution, using a mixture of beta distributions which is estimated through the Expectation-Maximization (EM) algorithm, and investigate its ability to aid in FDR control.
Funding Provided By: Howard Hughes Medical Institute
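For context, FDR is most often controlled with the Benjamini-Hochberg step-up rule; the Python sketch below illustrates that baseline procedure only (it is not the two-step mixture-model method studied above, and the p-values are made up):

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Step-up BH procedure: returns a boolean mask of rejected hypotheses.

    Controls the expected proportion of false positives among all
    rejections (the FDR) at level alpha for independent tests."""
    p = np.asarray(pvals)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * (np.arange(1, m + 1) / m)
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])   # largest i with p_(i) <= alpha*i/m
        reject[order[: k + 1]] = True
    return reject

# Tiny hypothetical example: eight tests, two clearly significant
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.3, 0.6, 0.9]
mask = benjamini_hochberg(pvals, alpha=0.05)
```

Note how the borderline p-values near 0.04 survive a per-test threshold of 0.05 but not the step-up thresholds, which is exactly the multiple-testing correction at work.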
Parity Biquandle Invariants of Virtual Knots
Leo Selker ’17; Mentor: Sam Nelson (CMC); Collaborator: Aaron Kaestner (North Park University)
Virtual knots are a generalization of classical knots. We define counting and cocycle enhancement invariants of virtual knots using parity biquandles. These invariants are determined by pairs consisting of a biquandle $2$-cocycle $\phi^0$ and a map $\phi^1$ with certain compatibility conditions leading to one-variable or two-variable polynomial invariants of virtual knots. We provide examples to show that the parity cocycle invariants can distinguish virtual knots which are not distinguished by the corresponding non-parity invariants.
Funding Provided By: Pomona College Mathematics Department
What’s in a Name? Defining Quantitative Literacy
Edwin Villafane Hernandez ’18; Mentor: Gizem Karaali; Collaborator: Jeremy Taylor ’18
This project aims to bring together different threads in the eclectic literature that make up the scholarship around the theme of Quantitative Literacy. In investigating the meanings of terms like "quantitative literacy", "quantitative reasoning" and "numeracy", we seek common ground, common goals and aspirations of a community of practitioners, and common themes. A decade ago, these terms were relatively new; today accrediting agencies are using them and inserting them in general education conversations. Having good, representative, and perhaps even compact and easily digestible definitions of these terms might come in handy in public relations contexts as well as in other situations where practitioners need to communicate their goals and practice to others (journalists, policy makers, funding agencies) and even assess their own success (how do you measure something if you cannot even define it?).
Funding Provided By: Pomona Unrestricted (Villafane Hernandez)
Solving Two-Point Boundary Value Problems in MATLAB
Clive Bender (2017); Mentor(s): Adolfo Rumbos
Abstract: The goal of this project is to develop MATLAB code for solving two-point boundary value problems (BVPs). We used the MATLAB solver bvp4c to solve two-point BVPs. We then used a finite-difference code to estimate eigenvalues of linear two-point BVPs. Once the eigenvalues were estimated, we used the ode45 solver to estimate the corresponding eigenfunctions. The question still remains as to whether these numerical methods can be used to estimate solutions of problems at resonance with respect to the eigenvalue.
Funding Provided by: Pomona College SURP
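An analogous computation can be sketched in Python with SciPy's solve_bvp, a collocation solver comparable to bvp4c (the test problem below is illustrative, not one from the project):

```python
import numpy as np
from scipy.integrate import solve_bvp

# Two-point BVP:  y'' = -y,  y(0) = 0,  y(pi/2) = 1   (exact solution: sin x)
def rhs(x, y):
    # First-order system: y[0] = y, y[1] = y'
    return np.vstack([y[1], -y[0]])

def bc(ya, yb):
    # Residuals of the boundary conditions at x = 0 and x = pi/2
    return np.array([ya[0] - 0.0, yb[0] - 1.0])

x = np.linspace(0.0, np.pi / 2, 11)
y0 = np.zeros((2, x.size))        # initial guess for the solution
sol = solve_bvp(rhs, bc, x, y0)
```

As with bvp4c, the solver needs the ODE as a first-order system, residual-form boundary conditions, and an initial mesh with a guess; `sol.sol` is then a smooth interpolant of the computed solution.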
Applications of Degree Theory to Two-Point Boundary Value Problems
Mary Kamitaki (2015); Student Collaborator(s): Connor Sutton (2015); Mentor(s): Adolfo Rumbos
Abstract: The goal of our project is to investigate degree-theoretical results as they pertain to problems in ordinary differential equations, and particularly problems at resonance with respect to the Dancer-Fučík spectrum. We begin with duality results from the work of Mawhin and theorems relating the Leray-Schauder and Brouwer degrees. Our focus is to verify and expound these results in order to provide a foundation for their use in boundary value problems. Finally, we apply our understanding of these results to the problems presented in Fabry and Fonda’s “Nonlinear Resonance in Asymmetric Oscillators” (1998). In particular, this paper is concerned with proving the existence of solutions to periodic boundary-value problems at resonance with respect to the Dancer-Fučík spectrum. We hope to provide justification and understanding of the degree equality presented at the end of Chapter 3 as well as the paper’s subsequent results.
Funding Provided by: Kenneth T. and Eileen L. Norris Foundation (MK); The Constance Abbott Spears and Philip Lacey Spears Mathematics Fund (CS)
Periodic Solutions to Nonlinear Second Order Ordinary Differential Equations via Variational Techniques
Adam Waterbury (2016); Mentor(s): Adolfo Rumbos
Abstract: Our project focused on proving the existence of solutions to two-point boundary value problems for nonlinear second order ordinary differential equations. The problem we sought to understand involved periodic boundary conditions for a piecewise-linear equation with a bounded perturbation. This problem is related to the Dancer-Fučík spectrum, and we considered the case in which the nonlinearity is at resonance with respect to that spectrum. In order to prove existence of solutions, we used a variational approach and followed the techniques presented in a paper by Morris and Robinson. The existence result is obtained through the use of a saddle point theorem proved by Willem.
Funding Provided by: Paul K. Richter and Evalyn E. Cook Richter Memorial Fund
Mathematical Modeling on DNA Segregation
Shiyue Li (2017); Student Collaborator(s): Emily Meyer (2014), Edwin Villafane Hernandez (2018); Mentor(s): Blerta Shtylla
Abstract: Mammalian cells employ a dynamic network of polymers called the mitotic spindle to reorganize and equipartition their genetic material. The mechanisms involved in transporting and localizing DNA in bacterial cells, on the other hand, are not as well understood, primarily due to the small size of these cells. In this project, we focus on the specific spatiotemporal localization of two main partitioning proteins, ParA and ParB, in Caulobacter crescentus cells. We simulate a detailed 3D stochastic mathematical model of this reaction-diffusion process to investigate the functions and interactions of these two proteins in realistic cellular simulations. By calibrating the system with in vitro data and simulating the dynamics of the two proteins’ movements, we investigate the primary mechanisms of the bacterial DNA segregation process and the role of cell geometry in conjunction with Par protein reactions.
Funding Provided by: National Science Foundation #DMS-1358932 (Shtylla); Howard Hughes Medical Institute (SL); Howard Hughes Medical Institute HAP (EV)
Intrinsic Linking for Complete Graphs
Yubai Di (2016); Student Collaborator(s): Daniel Thompson (2015); Mentor(s): Erica Flapan
Abstract: The motivating question for our project is to determine the smallest n such that every embedding of the complete graph on n vertices in R^3 contains a link of two disjoint cycles with linking number greater than 1. Previous work has narrowed the answer to either 9 or 10, so the main goal of our project is to prove or disprove that K9 has this property. We began by using computer programs to compute the linking numbers of links in different embeddings of K9, and every embedding of K9 we computed contained a link with linking number greater than 1. This leads us to believe that every embedding of K9 has such a link. We built up our knowledge of linking in complete graphs starting from complete graphs on fewer vertices, finding linking patterns among cycles in K4. We then moved up to complete graphs on 7 and 8 vertices, constructing linking graphs to keep track of the triangle-triangle links in the embeddings. With this method we discovered the linking patterns of triangles in embeddings of K7 and K8 that have a minimal number of triangle-triangle links. We were also able to determine the minimal number of triangle-triangle links for K7, K8, and K9, as well as the minimal number of triangle-square links for K7. Additionally, we found and proved a condition under which an embedding of two K4s, as a subgraph of K9, must contain a link with linking number greater than 1.
Funding Provided by: Paul K. Richter and Evalyn E. Cook Richter Memorial Fund (YD)
Purpose and humanism in mathematics education research
Luke Fischinger (2016); Student Collaborator(s): Alejandra Castillo (2017), Prisca Diala (2018); Mentor(s): Gizem Karaali
Abstract: One of the most influential journals in mathematics education research opened with an editorial titled “Why to teach mathematics so as to be useful” [Hans Freudenthal, Educational Studies in Mathematics, Volume 1 (1968), Numbers 1-2, 3-8]. Thus began an extended discussion of the purpose(s) of mathematics education that continued across many years and volumes, though mainly appearing as one undercurrent or hidden assumption among many. In our daily lives as mathematics students and researchers, we often confront this same question directly: Why should I learn mathematics? Though this version frequently comes coated in subtle hostility toward the subject and may sometimes be cast aside as such, the underlying question is still worthy of our scrutiny and understanding: Why teach mathematics? This research presentation will focus on this question and attempt to document how attitudes toward purpose evolved among mathematics education researchers. In particular, we will note in our research the emergence and development of the humanist and social constructivist paradigms on the one hand, and the interlocked themes of discovery, inquiry, and active learning in the classroom on the other, and analyze how their proponents engaged with the question of purpose. Analyzing the different purposes for which mathematics should be taught can give us a better understanding of why it is taught the way it is.
Funding Provided by: Fletcher Jones Foundation (LF) Howard Hughes Medical Institute HAP (PD)
Expanding DESeq to differential expression analysis of three or more conditions for high-throughput data
Ciaran Evans (2016); Student Collaborator(s): Garrett Wong (2014 HMC); Additional Collaborator(s): Daniel Stoebel (HMC); Mentor(s): Johanna Hardin
Abstract: The software package DESeq is an important tool in differential gene expression analysis across two conditions. However, so far no one has extended the software to allow simultaneous analysis across three or more conditions. We begin with a direct generalization of the mathematical basis of DESeq to simultaneous analysis of any number of conditions. Unfortunately, a direct generalization of the DESeq algorithm for exact probability calculations is computationally prohibitive. In this project, we compare two sampling-based techniques for estimating these probabilities: a naive simple random sample, and a random sampling technique employing local regression for interpolation. We then present an analytic approach to estimation with the goal of improved computing time and more accurate estimates than the sampling techniques provide.
Funding Provided by: Howard Hughes Medical Institute (CE)
Forbidden Configurations in the Linear Lattices
Song Yu (2017); Mentor(s): Shahriar Shahriari
Abstract: Lines and planes through the origin are subspaces of the three-dimensional Euclidean space R^3. We can easily draw one million lines and one million planes in such a way that none of the lines is contained in any of the planes. If we replace the real numbers with just the two numbers 0 and 1, where addition and multiplication are mod 2, then we still have a three-dimensional space with lines and planes through the origin, but the largest collection of lines and planes in which none of the lines lies in any of the planes has size at most 7. In our project in combinatorial mathematics, we prove new results about forbidden configurations among subspaces of an n-dimensional vector space whose scalars come from a finite field. Our approach is to use the analogy between subspaces of a vector space and subsets of a set, since finding the largest size of a family of subsets of a finite set that excludes a certain configuration has been a fruitful area of research since Sperner's Theorem of 1928. We explore forbidden configurations including the butterfly, the N, the Y, and the r-fork, and extend known results about subsets to the important but less well-studied context of subspaces.
Funding Provided by: Kenneth T. and Eileen L. Norris Foundation
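The "size at most 7" claim for the mod-2 space can be checked by brute force; the Python sketch below (illustrative, not part of the project) scans every family of lines and planes in F_2^3:

```python
from itertools import product

# Lines through the origin in F_2^3 are spanned by the 7 nonzero vectors;
# planes through the origin are the kernels of the 7 nonzero linear forms.
vecs = [v for v in product((0, 1), repeat=3) if any(v)]

def incident(line, plane):
    # Line spanned by v lies in the plane ker(w)  iff  v . w = 0 (mod 2)
    return sum(a * b for a, b in zip(line, plane)) % 2 == 0

# Brute force: largest family of lines and planes with no line inside
# any chosen plane.  Bits 0-6 of the mask pick lines, bits 7-13 pick
# planes; 2^14 = 16384 subsets is small enough to scan directly.
best = 0
for mask in range(1 << 14):
    lines = [vecs[i] for i in range(7) if mask >> i & 1]
    planes = [vecs[i] for i in range(7) if mask >> (i + 7) & 1]
    if any(incident(l, p) for l in lines for p in planes):
        continue
    best = max(best, len(lines) + len(planes))
```

The incidences here form the Fano plane, and the maximum of 7 is attained, for instance, by taking all seven lines and no planes.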
Sparse Canonical Correlation Analysis (SCCA)
Nicolas Alvarez (2016); Mentor(s): Johanna Hardin
Abstract: Canonical Correlation Analysis (CCA) is a statistical method used to generate linear combinations of two sets of variables with maximal correlation; it finds variables in one set that are highly related to variables in the other set. When the number of variables exceeds the number of observations, however, CCA may lack interpretability in an applied setting or be infeasible altogether. Sparse Canonical Correlation Analysis (SCCA) solves this problem by finding linear combinations containing only some (sparse sets) of the variables. During summer 2014, a previously created program that executes SCCA was further developed. The family of functions and scripts produces sample data, analyzes the data with SCCA, and assesses how well the SCCA method worked. The program was thoroughly edited to increase efficiency and improve readability.
Funding Provided by: The Constance Abbott Spears and Philip Lacey Spears Mathematics Fund
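A minimal sketch of classical CCA (not the project's SCCA program, which adds a sparsity penalty) computes the first canonical correlation from whitened cross-covariances; the simulated data below are hypothetical:

```python
import numpy as np

def first_canonical_correlation(X, Y, reg=1e-8):
    """First canonical correlation between column sets X (n x p) and Y (n x q).

    Whitens each block with its own covariance, then takes the largest
    singular value of the whitened cross-covariance.  `reg` is a small
    ridge term for numerical stability."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)

    def inv_sqrt(S):
        # Inverse square root via eigendecomposition of a symmetric matrix
        w, V = np.linalg.eigh(S)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    M = inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy)
    return np.linalg.svd(M, compute_uv=False)[0]

# Hypothetical data: both blocks driven by one shared latent signal z
rng = np.random.default_rng(2)
z = rng.normal(size=200)
X = np.column_stack([z + 0.1 * rng.normal(size=200) for _ in range(3)])
Y = np.column_stack([2 * z + 0.1 * rng.normal(size=200) for _ in range(4)])
rho = first_canonical_correlation(X, Y)
```

Because both blocks share the latent z, the first canonical correlation lands near 1; the whitening step is exactly what becomes ill-conditioned when variables outnumber observations, which is the problem SCCA's sparsity addresses.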
Strong Solution to Smale's 17th Problem for Strongly Sparse Systems
Paula Burkhardt (2016); Additional Collaborator(s): Kaitlyn Phillipson (Texas A&M University); Mentor(s): J. Maurice Rojas (Texas A&M University), Shahriar Shahriari
Abstract: Smale’s 17th problem asks whether one can deterministically approximate a single root of a system of polynomials, in polynomial-time on average. The best recent results are probabilistic polynomial-time algorithms, so Smale's 17th Problem has not yet been fully solved. We give a much faster deterministic algorithm for the special case of binomial systems, and certain systems of binomials and trinomials. Our approach is also a stepping stone to harder variants of Smale's 17th Problem, such as approximating roots near a query point or approximating a single real root. This research was conducted as part of the 2014 REU at Texas A&M University.
Funding Provided by: National Science Foundation (Texas A & M)
Intrinsic 2-component linking in complete graphs
Spencer Johnson (2014); Mentor(s): Erica Flapan
Abstract: We present an approach to finding the smallest n such that every embedding of the complete graph on n vertices in R^3 contains a link of at least two disjoint cycles with linking number greater than one.
Funding Provided by: Paul K. Richter and Evalyn E. Cook Richter Memorial Fund
Expanding DESeq to differential expression analysis of three or more conditions for high-throughput data
Ciaran Evans (2016); Student Collaborator(s): Garrett Wong (2014 HMC); Additional Collaborator(s): Dan Stoebel (HMC); Mentor(s): Johanna Hardin
Abstract: The software package DESeq is an important tool in differential gene expression analysis across two conditions. However, so far no one has extended the software to allow simultaneous analysis across three or more conditions. The goal of this project is to use statistical methods to analyze differential expression and overcome the issues that arise when dealing with more than two conditions.
Funding Provided by: Howard Hughes Medical Institute
A robust extension of Sparse Canonical Correlation Analysis for the analysis of genomic data
Joseph Replogle (2013); Student Collaborator(s): Jake Coleman (2013); Mentor(s): Johanna Hardin
Abstract: Medical genomics seeks to explain complex phenotypes based on variations in genetic, epigenetic, and environmental elements. To this end, high-throughput genomic and molecular biology technologies generate vast biological datasets that provide for examination of many variables simultaneously. In order to illuminate the mechanisms and pathways underlying human traits and unveil novel therapeutic avenues, creative statistical techniques must help integrate these diverse genetic datasets. Canonical Correlation Analysis (CCA), a statistical method that maximizes the correlation between linear combinations of sets of variables, and particularly Sparse Canonical Correlation Analysis (SCCA), which performs CCA on a small subset of variables extracted using a penalty function, are fruitful techniques for analysis of the complex relationships found in genomic data. Here we extend SCCA using Spearman Rank Correlation to make the method more robust to outliers. We use a combination of simulated and real data to show that our method outperforms previously proposed SCCA methods in the presence of the noisy data commonly found in biology. Additionally, we propose a permutation test for assessing the significance of multiple canonical variates. We hope that our robust SCCA will allow biologists to better characterize the associations between genetic datasets in order to improve understanding of human disease.
Funding Provided by: Howard Hughes Medical Institute
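The robustness motivation can be illustrated with a toy comparison (not the project's code or data): a single extreme outlier wrecks the Pearson correlation, while the rank-based Spearman correlation largely survives:

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(3)

# Fifty strongly associated measurements...
x = rng.normal(size=50)
y = x + 0.1 * rng.normal(size=50)

# ...plus one extreme outlier of the kind common in genomic data
x_out = np.append(x, 40.0)
y_out = np.append(y, -40.0)

pear = pearsonr(x_out, y_out)[0]    # dragged negative by the outlier
spear = spearmanr(x_out, y_out)[0]  # ranks blunt the outlier's influence
```

Replacing raw values with ranks, as in the Spearman-based SCCA above, caps any single point's leverage at one rank position, which is why the rank correlation stays strongly positive here.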
Averages in the Period 2 Region of the Logistic Map
Maricela Cruz (2014); Mentor(s): Johanna Hardin; Ami Radunskaya
Abstract: The logistic map is a nonlinear difference equation, well studied in the literature, used to model reproduction and starvation in certain populations. Here we study the distributional characteristics of the stochastic logistic map, giving evidence that the map has a stable distribution over the period-2 region. We use simulations in R to support the claim that regardless of the initial distribution of x (for example, x = population size), the logistic map iterates to a unique stable distribution; that is, after 10,000 iterations of the logistic map we arrive at a unique stable distribution of x. We also examine the relationship between the mean of the stochastic logistic equation and the mean of the deterministic logistic equation for period 2. Our initial results show that the relationship between the two averages in the period-2 region is opposite to that in the period-1 region: in the period-2 case, the mean of the stochastic logistic equation is greater than the mean of the deterministic logistic equation. We investigate this relationship as the parameter λ changes.
Funding Provided by: Linares Family SURP
Quantum Algorithm for Markov Chain Monte Carlo Methods
Gillian Grindstaff (2014); Student Collaborator(s): Kevin Wilson (2015 University of Oregon); Mentor(s): Yevgeniy Kovchegov (Oregon State University)
Abstract: With quantum computers under active development, the field of quantum algorithms is rapidly expanding. In my research I investigated a method of using quantum computation to drastically improve existing algorithms for sampling from a desired probability distribution. We exhibit a transformation that produces a unitary matrix from a stochastic matrix, in particular the matrices implemented in Markov chain Monte Carlo methods, as a means of defining a quantum dynamical system that parallels the Metropolis-Hastings algorithm. For the uniform cyclic walk on $n$ states, we give an explicit formula for the quantum operator which, using a quantum Fourier transform to compute averages, converges to the desired distribution.
Funding Provided by: National Science Foundation
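For context, the classical chain being "quantized" above is ordinary Metropolis-Hastings; a minimal Python sketch on a cyclic state space (with hypothetical target weights) looks like:

```python
import numpy as np

rng = np.random.default_rng(4)

# Classical Metropolis-Hastings on a cycle of n states, targeting an
# unnormalized distribution pi (the weights below are hypothetical).
pi = np.array([1.0, 2.0, 3.0, 2.0, 1.0, 1.0])
n, n_steps = pi.size, 200_000

steps = rng.choice([-1, 1], size=n_steps)   # uniform cyclic proposal
unifs = rng.random(n_steps)                 # acceptance coin flips

state = 0
counts = np.zeros(n)
for step, u in zip(steps, unifs):
    proposal = (state + step) % n
    if u < min(1.0, pi[proposal] / pi[state]):   # Metropolis acceptance rule
        state = proposal
    counts[state] += 1

empirical = counts / counts.sum()
target = pi / pi.sum()
```

The quantum construction described in the abstract replaces this stochastic update with a unitary operator whose long-run behavior reproduces the same target distribution.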
Modeling the Effects of Angiogenesis and Macrophage Phenotype on Glioma Growth
Stephen Ragain (2014); Additional Collaborator(s): Lisette DePillis (HMC); Mentor(s): Ami Radunskaya
Abstract: Glioma is a cancer of the central nervous system arising from glial cells. We develop a novel compartment model featuring tumor cells, oxygen concentration, and two phenotypes of macrophages. The model is designed to focus on two major dynamics that affect tumor growth: the effects of angiogenesis, represented by oxygen influx in the model, and a conversion from the phagocytic to the reparative macrophage phenotype. Without angiogenesis, the tumor quickly depletes the available nutrient and cells can no longer proliferate. Without changes to macrophage phenotype, the body's immune response effectively removes the tumor. Among the mechanisms that may potentially be affected by treatment, a parameter sensitivity analysis of the model shows that α, which measures how strongly reparative macrophages promote oxygen influx, significantly impacts tumor growth.
Funding Provided by: Howard Hughes Medical Institute
Solvability of non-linear two-point boundary value problems of second order
Daria Drozdova (2014); Student Collaborator(s): Mary Kamitaki (2015); Mentor(s): Adolfo Rumbos
Abstract: The goal of this project was to prove existence results for a general class of nonlinear two-point boundary value problems of second order. These problems are interesting from the point of view of the theory of differential equations. They also arise in many situations in the physical sciences, and their study is fundamental to understanding the underlying physical problems. In addition to learning the theory of two-point boundary value problems, we went over the article “Boundary value problems for weakly nonlinear ordinary differential equations” by E. N. Dancer (Bulletin of the Australian Mathematical Society, Volume 15, 1976, pp. 321-328), which uses “shooting method” arguments to prove existence of solutions. Concentrating on the problem with Dirichlet boundary conditions, we considered a result in Dancer’s paper in which the nonlinearity is asymptotically linear at infinity and at resonance with respect to the Fučík spectrum. The key result of our project was proving that, under certain conditions, the nonlinear two-point boundary value problem has at least one solution. The observation that the gaps between zeros of a piecewise-linear initial value problem and of the nonlinear initial value problem are small simplifies the task of finding the form of the solution to our problem; this implies that the shooting method can be implemented. The next step in the project is to look at the case in which the nonlinearity grows more than linearly.
Funding Provided by: Pomona College SURP (DD); National Science Foundation # DMS-1016136 (MK)
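The shooting method mentioned above can be sketched in Python (on an illustrative linear test problem, not one from Dancer's paper): integrate the initial value problem and root-find on the initial slope:

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import brentq

# Shooting method for the Dirichlet problem  y'' = -y, y(0) = 0, y(1) = 2:
# integrate the initial value problem with slope s = y'(0), then solve
# for the s whose trajectory hits the right-hand boundary value.
def endpoint_miss(s):
    sol = solve_ivp(lambda t, y: [y[1], -y[0]], (0.0, 1.0), [0.0, s],
                    rtol=1e-8, atol=1e-10)
    return sol.y[0, -1] - 2.0          # y(1) minus the target value

s_star = brentq(endpoint_miss, 0.0, 10.0)   # root-find on the slope
```

For this linear problem the trajectory with slope s is s·sin(t), so the recovered slope is 2/sin(1); for the nonlinear resonant problems in the project, existence arguments are what guarantee such a root can be bracketed.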
Luis Garcia (2014); Student Collaborator(s): Jacob Brumbaugh-Smith (2013); Andrew Turner (2014 HMC); Madeleine Bulkow (2014 SCR); Additional Collaborator(s): Matthew Michal*; Mentor(s): Stephan Garcia
*Claremont Graduate University
Abstract: The theory of supercharacters, which generalizes classical character theory, was recently introduced by P. Diaconis and I.M. Isaacs. We study supercharacter theories on (Z/nZ)^d induced by the actions of certain matrix groups, demonstrating that a variety of exponential sums of interest in number theory (e.g., Gauss, Ramanujan, and Kloosterman sums) arise in this manner. We also develop the super-Fourier transform, a generalization of the discrete Fourier transform, in which supercharacters play the role of the Fourier exponential basis. For this transform, we provide a corresponding uncertainty principle and compute the associated constants in several cases.
Funding Provided by: Pomona College SURP (LG); Pomona College Mathematics Department (JBS)
Canonical Correlation applied to genes from RNASeq data
Isabelle Ambler (2013); Student Collaborator(s): Jacob Coleman (2013); Mentor(s): Johanna Hardin
Abstract: There has been recent interest in studying natural variation in human gene expression with respect to phenotypic characteristics to find groups of genes with similar function. Previous studies have proposed applying canonical correlation analysis (CCA) to high-throughput data to find maximum correlations between linear combinations of two multidimensional data sets. However, high-throughput data, like RNA-Seq data, is characteristically noisy and, without modification, CCA is susceptible to distortions from these outliers. We demonstrate that robust CCA is a promising approach to computing reliable canonical correlations.
Funding Provided by: Fletcher Jones Foundation (IA); Paul K. Richter and Evalyn E. Cook Richter Memorial Funds (JC)
Karl Kumbier (2013); Student Collaborator(s): Cody Moore (2013); Mentor(s): Johanna Hardin
Abstract: Recent advances in gene sequencing technology have spurred an increase of Genome Wide Association Studies (GWAS), which attempt to find relationships between expressed phenotypes and different loci in the genome. One of the parameters of interest in these studies is heritability (h^2), the proportion of phenotypic variance for a given trait accounted for by additive genetic effects. Finding h^2 requires an estimate of the relatedness among a sample of individuals; the pairwise estimates make up the genetic relationship matrix (GRM). Our study aims to show that the measurement and sampling errors introduced by estimating the GRM bias estimates of heritability downward. Understanding both how much noise is introduced by using a sample GRM and how much this noise biases estimates of h^2 allows future GWAS to correct for sampling and measurement errors.
Funding Provided by: Paul K. Richter and Evalyn E. Cook Richter Memorial Funds (KK); Pomona College Mathematics Department (CM)