# 13 Mathematics, Computer Science, and Statistics students present at the 2011 Festival of Science

**The abstracts for Mathematics, Computer Science, and Statistics students who presented at the Festival of Science:**

**Katelynn Benzing**

Faculty Sponsor: Michael Schuckers, Statistics

**Noise and DNA Profiling Data Analysis**

DNA profiling has become a widely used method for the process of human identification in forensic science (Weir, 2007). There has been a widely accepted belief that DNA profiles are rare, meaning that there is a small chance that two individuals in a database will have the same profile (Weir, 2007). However, it must be noted that as the number of individuals used in forensic profiling grows, the chance that two profiles match also grows. It has been customary to calculate the probability of a match by adding together the products of the probability of each possible genotype and the match probability of that genotype (Weir, 2004). Instead of using this approach, we have constructed a metric that allows analysis of the likelihood of a match between two DNA profiles from distinct individuals. In particular, we are interested in understanding how the distribution of match scores changes when we add different levels of noise to our sample. For this research, we are focusing on stutter in the electropherogram as the source of noise.
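The remark that a match becomes more likely as a database grows can be illustrated with a rough calculation. The sketch below is not the metric described in the abstract; it assumes every pair of profiles matches independently with the same probability, and the value of `pair_match_prob` is hypothetical.

```python
from math import comb


def prob_any_match(n_profiles, pair_match_prob):
    """P(at least one matching pair), assuming independent pairs."""
    n_pairs = comb(n_profiles, 2)  # number of distinct pairs in the database
    return 1.0 - (1.0 - pair_match_prob) ** n_pairs


# Hypothetical per-pair match probability, chosen only for illustration.
p = 1e-9
for n in (1_000, 10_000, 100_000):
    print(n, prob_any_match(n, p))
```

Because the number of pairs grows roughly with the square of the database size, even a very small per-pair probability can produce a sizeable chance of at least one match.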

**Lauren Brozowski**

Faculty Sponsor: Michael Schuckers, Statistics

**An Analysis of Penalty Biases called in the NHL in the Regular Season 2009-2010**

Penalties in ice hockey are an important aspect of the game, as a penalty being drawn can lead to a goal and ultimately influence which team wins the game. In this paper, we analyze all the penalties taken during the National Hockey League’s 2009-2010 regular season. As part of our analysis, we look at the rate at which penalties were called by each of the league’s referees and linesmen. Among the factors we include are the experience a referee has in calling specific penalties and the tendencies among the types of penalties called by each official. The results of our analysis would be useful to NHL teams in guiding their style of play when they know that certain officials will be on the ice for a given game.

**James Curro and Matthew Dodge**

Faculty Sponsor: Jessica Chapman, Mathematics

**Rock you like a... Statistician**

Guitar Hero is a popular video game in which rock enthusiasts can act as Slash, Hendrix or Clapton playing their favorite songs with a guitar-shaped controller. Players attempt to hit sequences of notes at specific times as dictated by the game. If the player hits a wrong note, plays at the incorrect time, or misses the note altogether, the note doesn’t count and the song doesn’t play. As more notes are missed, the in-game spectators respond unfavorably, and the player risks getting booed off the stage before the end of the song. We wondered if missed notes occurred randomly or were grouped together in difficult parts of the song. Thus, we developed estimators to determine how ordered the grouping was on three artificial songs: one with seemingly random misses, one with obvious grouping, and one with some randomness and some grouping. We then obtained data by allowing undergraduate students and professors to try their hand at becoming a rock legend. We will apply our estimators to our datasets and perform simulations to compare how well our different methods perform under a variety of situations. Our estimators’ effectiveness can then be evaluated, and we will be able to determine if missed notes follow a pattern or are random in nature.

**Anne Lawless**

Faculty Sponsor: Ivan Ramler, Statistics

**Analyzing Exotic Amazonian Bird Foraging**

“God gives every bird its food, but he doesn’t throw it in their nest,” – J. G. Holland. We have obtained information on different species of ant-following Amazonian birds and their competitive eating habits. As the data contain small counts, typical methods such as ANOVA are not appropriate. The eating habits of these birds can be modeled using Poisson regression. Further, a new multiple comparison technique extending the concept of Tukey’s HSD to Poisson regression has been developed to discover significant differences in their mean foraging success rates.

**Nicole Martin**

Faculty Sponsor: Ivan Ramler, Statistics

**Words of Today Compared to Terminology of Yesterday**

Words that were common in the past have often been replaced with new words. For the past couple of years Google has uploaded the full text of books into their database to measure individual word usage. In this project we will be looking at the usage of the words “lunch,” “dinner,” and “supper.” These three words, which refer to meals taken at different times of the day, have experienced usage fluctuation over time. We will filter the three words, including their capitalized, lowercase, and plural forms, from the 470 million lines of Google Labs data into a smaller data set. We plan to then apply smoothers such as spline and loess to investigate patterns between these words. After applying these time series techniques, we will also determine the year range in which “lunch” became a more popular word than “supper.”

**Waled Murshed**

Faculty Sponsor: Ivan Ramler, Statistics

**Introduction to Survival Analysis**

Estimating the survival function and making predictions have been of major interest in many statistical fields, including medical research and statistics. A very popular method used to estimate the survival function, and to support statistical tests for comparing survival distributions, is the product-limit method, also known as the Kaplan-Meier method. Furthermore, a proportional hazards model, more specifically the Cox model, is used for more in-depth analysis. This poster will introduce these and several other aspects of survival analysis, as well as apply these methods to several data sets such as “Time to First Recurrence of a Tumor in Bladder Cancer Patients.”
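For readers unfamiliar with the product-limit method, a minimal sketch follows. At each distinct event time, the running survival estimate is multiplied by one minus the fraction of at-risk subjects who had the event. The function name `kaplan_meier` and the follow-up times are made up for illustration; they are not the bladder cancer data.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  -- observed follow-up times
    events -- 1 if the event was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)  # each subject leaves the risk set at its own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the share that survives time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Made-up follow-up times in months; 0 in the second list marks censoring.
example = kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1])
for t, s in example:
    print(t, round(s, 4))
```

Censored subjects never trigger a drop in the curve, but they still shrink the risk set, which is why later events produce larger steps.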

**Hau Nguyen**

Faculty Sponsor: Jessica Chapman, Mathematics

**Bayesian vs. Frequentist Approaches to Modeling Seal Populations**

A classical approach in statistics, the Frequentist method, is based on repeated random sampling with fixed parameters to test hypotheses and form confidence intervals. The Bayesian approach to statistics differs in that the parameters are treated as random variables that can be modeled according to some distribution. Although these methods may seem contradictory, their applications should be complementary; their usefulness depends on how we want to approach the data and the models. In my research, I illustrated these differences by comparing the results that I obtained from performing a Poisson regression analysis of harbor seal haul-outs in Ireland using both the Frequentist and Bayesian approaches.

**Tansy Peplau and George Konidaris (Univ. of MA)**

Faculty Sponsors: Richard Sharp, Andrew Barto (Univ. of MA), Computer Science

**Adaptive Gaming with Pacman**

Our aim is to adapt a simple game so that it adjusts the difficulty based on the skill level of the current player. By changing only the difficulty of a specific aspect of a game that the player already finds intrinsically interesting, one can motivate people to play the game longer. We took Java code for Pacman from online and added adaptive functionality. We added a global intelligence variable that controls how well the ghosts track Pacman and choose "intelligent" directions to turn in order to catch Pacman. We added code for the ghost tracking as well. We also added methods to control the difficulty, so the game now gets easier if the player does badly and harder if the player does well. Aspects of the game that are influenced by the difficulty include Pacman speed, ghost speed, ghost intelligence, and power pellets.
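One way such an adaptive scheme could work is sketched below. This is not the students' Java code: the class `AdaptiveDifficulty`, the function `choose_ghost_move`, the step size, and the speed mapping are all hypothetical, and Python is used only for brevity. The single difficulty value plays the role of the global intelligence variable, here interpreted as the probability that a ghost picks the move that brings it closest to Pacman.

```python
import random


class AdaptiveDifficulty:
    """Hypothetical controller; the difficulty level lives in [0, 1]."""

    def __init__(self, level=0.5, step=0.1):
        self.level = level
        self.step = step

    def update(self, player_did_well):
        # Harder after a good round, easier after a bad one, clamped to [0, 1].
        change = self.step if player_did_well else -self.step
        self.level = min(1.0, max(0.0, self.level + change))

    def ghost_speed(self, base=1.0):
        # Assumed mapping: ghosts move at 75% to 125% of the base speed.
        return base * (0.75 + 0.5 * self.level)


def choose_ghost_move(ghost, pacman, moves, difficulty, rng=random):
    """Chase Pacman with probability difficulty.level, else move at random."""

    def distance_after(move):
        # Manhattan distance to Pacman after taking the move.
        return abs(ghost[0] + move[0] - pacman[0]) + abs(ghost[1] + move[1] - pacman[1])

    if rng.random() < difficulty.level:
        return min(moves, key=distance_after)
    return rng.choice(moves)


difficulty = AdaptiveDifficulty()
difficulty.update(player_did_well=False)  # the player lost a life: ease off
moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]
print(difficulty.level, choose_ghost_move((0, 0), (3, 0), moves, difficulty))
```

Keeping every tunable aspect a function of one level value makes it straightforward to add further mappings, for example for Pacman speed or power pellet duration.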

**Matt Raley**

Faculty Sponsor: Ivan Ramler, Mathematics

**Modeling the Dow Jones Industrial Average Using Time Series Analysis**

Cyclical by nature, the economy of the United States is constantly changing. Stock market indices signify both expansionary and recessionary trends in the economy. I used multiple linear regression and time series analyses, incorporating the statistical bootstrap method, to model monthly movements in the Dow Jones Industrial Average (DJIA) based on multiple economic indicators: West Texas Intermediate (WTI) Crude Oil Spot Prices, Gold Spot Prices, Unemployment Rates, Federal Funds Rates, and Housing Starts.

**Somphone Sonenarong**

Faculty Sponsor: Dante Giarrusso, Mathematics

**Hamilton and the Discovery of Quaternion**

In 1843 Sir William Rowan Hamilton inscribed $i^{2} = j^{2} = k^{2} = ijk = -1$ onto the Brougham Bridge in Dublin prior to attending a council meeting at the Royal Irish Academy. The above inscription represents Hamilton's discovery of the quaternions, a number system that extends complex numbers into four dimensions. In this presentation, I will discuss the algebraic properties of the quaternions and prove the norm property, namely that $N(qq') = N(q)N(q')$. I will also discuss the Sum of Four Squares Theorem, which states that any positive integer can be represented as the sum of at most four squares. By using the norm property of quaternionic products and the Fundamental Theorem of Arithmetic we can reduce the problem to determining which prime numbers may be represented as a sum of four squares.
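The norm property is easy to check numerically. The sketch below is an illustration rather than the proof from the presentation: it stores a quaternion $a + bi + cj + dk$ as a tuple `(a, b, c, d)`, and the helper names `qmul` and `norm` are chosen here for convenience.

```python
def qmul(p, q):
    """Hamilton product of two quaternions given as (a, b, c, d)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def norm(q):
    """N(q) = a^2 + b^2 + c^2 + d^2."""
    return sum(x * x for x in q)


i, j, k = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
minus_one = (-1, 0, 0, 0)

# Hamilton's inscription: i^2 = j^2 = k^2 = ijk = -1.
assert qmul(i, i) == qmul(j, j) == qmul(k, k) == minus_one
assert qmul(qmul(i, j), k) == minus_one

# Norm property on integer quaternions: N(pq) = N(p) N(q).
p, q = (1, 2, 3, 4), (5, 6, 7, 8)
assert norm(qmul(p, q)) == norm(p) * norm(q)
```

With integer components, the norm of the product is itself a sum of four integer squares, so a product of two sums of four squares is again a sum of four squares. This is what allows the theorem to be reduced to the case of primes.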

**Lisa VanderVoort**

Faculty Sponsor: Michael Schuckers, Statistics

**Evaluation of Estimators of Generalized Pareto Distribution**

Biometrics is the study of identifying individuals based on their physical traits. Biometric systems are designed to detect whether a person attempting to gain access to information is the genuine person or an imposter. Of particular importance to statisticians working on improving the False Match Rates (FMR) and False Non-Match Rates (FNMR) are extreme value statistics; that is, the lowermost or uppermost portions of the genuine and imposter distributions. The present study set out to investigate the accuracy of estimators of the Generalized Pareto Distribution (GPD). The GPD is an effective way to analyze points above a threshold. Four estimators from the R POT package were chosen: Maximum Likelihood Estimator (MLE), Unbiased Probability Weighted Moments (PWMU), Biased Probability Weighted Moments (PWMB), and Method of Moments (MOM). Under small and medium sample sizes, the MLE gave the most accurate estimates of scale and shape. However, the PWMU and PWMB estimators performed most consistently across small, medium and large sample sizes.

**Danielle Winters**

Faculty Sponsor: Ivan Ramler

**Detecting Struggling Students Early Using Linear Regression**

We develop a strategy to predict final grades from early materials in introductory statistics and calculus courses. Using individual quiz and exam scores from historical data for sections with a similar structure, we apply multiple linear regression and logistic regression to predict final grades on a week-by-week basis during the current semester. These models can predict final grades for students based on a few early quiz scores and thus can be used to identify struggling students early in the semester. We believe that these models provide a strategy that teachers can use within their classrooms on a weekly basis to predict student performance.