
# 13 Mathematics, Computer Science, and Statistics students present at the 2011 Festival of Science

The abstracts for Mathematics, Computer Science, and Statistics students who presented at the Festival of Science:

Katelynn Benzing
Noise and DNA Profiling Data Analysis
DNA profiling has become a widely used method for the process of human identification in forensic science (Weir, 2007).  There has been a widely accepted belief that DNA profiles are rare, meaning that there is a small chance that two individuals in a database will have the same profile (Weir, 2007).  However, it must be noted that as the number of individuals used in forensic profiling grows, the chance that two profiles match also grows.  It has been customary to calculate the probability of a match by adding together the products of the probability of each possible genotype and the match probability of that genotype (Weir, 2004).  Instead of using this approach, we have constructed a metric that allows analysis of the likelihood of a match between two DNA profiles from distinct individuals.  In particular, we are interested in understanding how the distribution of match scores changes when we add different levels of noise to our sample.  For this research, we are focusing on stutter in the electropherogram as the source of noise.
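The customary calculation described above, and the way the chance of a coincidental match grows with database size, can be sketched as follows. This is only an illustration and not the metric developed in the project: the genotype frequencies are made up, the match probability of a genotype is assumed to equal its frequency (unrelated individuals), and the pairs in a database are treated as independent.

```python
from math import comb

def match_probability(genotype_probs, match_given_genotype=None):
    """Sum over genotypes of P(genotype) * P(match | genotype)."""
    if match_given_genotype is None:
        # Assumption: for unrelated individuals, a second person carries
        # genotype g with probability P(g).
        match_given_genotype = genotype_probs
    return sum(genotype_probs[g] * match_given_genotype[g] for g in genotype_probs)

def prob_any_pair_matches(pair_match_prob, n_profiles):
    """Approximate chance that at least one pair among n profiles matches,
    treating the comb(n, 2) pairs as independent (a simplification)."""
    return 1.0 - (1.0 - pair_match_prob) ** comb(n_profiles, 2)

# Hypothetical single-locus genotype frequencies, for illustration only.
genotype_probs = {"AA": 0.25, "AB": 0.50, "BB": 0.25}
single_locus_match = match_probability(genotype_probs)

# The chance of some matching pair rises as the database grows.
small_db = prob_any_pair_matches(1e-6, 10)
large_db = prob_any_pair_matches(1e-6, 1000)
```

Even with a very small per-pair match probability, the number of pairs grows roughly with the square of the database size, which is why matches become more likely in large databases.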

Lauren Brozowski
An Analysis of Penalty Biases called in the NHL in the Regular Season 2009-2010
Penalties in ice hockey are an important aspect of the game, as a penalty being drawn can lead to a goal and ultimately influence which team wins the game. In this paper, we analyze all the penalties taken during the National Hockey League’s 2009-2010 regular season. As part of our analysis we look at the rate at which penalties were called by each of the league’s referees and linesmen. A few factors we include are the experience a referee has in calling specific penalties as well as the tendencies among the types of penalties called by each official. The results of our analysis could help NHL teams adjust their style of play when they know which officials will be on the ice for a given game.

James Curro and Matthew Dodge
Rock you like a... Statistician
Guitar Hero is a popular video game in which rock enthusiasts can act as Slash, Hendrix or Clapton playing their favorite songs with a guitar-shaped controller. Players attempt to hit sequences of notes at specific times as dictated by the game. If the player hits a wrong note, plays at the incorrect time, or misses the note altogether, the note doesn’t count and the song doesn’t play. As more notes are missed, the in-game spectators respond unfavorably, and the player risks getting booed off the stage before the end of the song. We wondered if missed notes occurred randomly or were grouped together in difficult parts of the song. Thus, we developed estimators to determine how ordered the grouping was on three artificial songs: one with seemingly random misses, one with obvious grouping, and one with some randomness and some grouping. We then obtained data by allowing undergraduate students and professors to try their hand at becoming a rock legend. We will apply our estimators to our datasets and perform simulations to compare how well our different methods perform under a variety of situations. Our estimators’ effectiveness can then be evaluated and we will be able to determine if missed notes follow a pattern or are random in nature.
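The abstract does not describe the estimators themselves. As a stand-in, one standard way to ask whether misses cluster is the Wald-Wolfowitz runs test, sketched below on two made-up note sequences (1 = hit, 0 = miss). A strongly negative z-score means fewer runs than chance would produce, which suggests grouping.

```python
from math import sqrt

def runs_test_z(notes):
    """Return (number of runs, z-score) for a sequence of hits (1) and misses (0)."""
    n_hit = sum(1 for note in notes if note)
    n_miss = len(notes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("need at least one hit and one miss")
    # A new run starts wherever two neighbouring notes differ.
    runs = 1 + sum(1 for a, b in zip(notes, notes[1:]) if a != b)
    n = n_hit + n_miss
    expected = 2 * n_hit * n_miss / n + 1
    variance = (2 * n_hit * n_miss * (2 * n_hit * n_miss - n)) / (n ** 2 * (n - 1))
    return runs, (runs - expected) / sqrt(variance)

# Hypothetical songs: all misses bunched together vs. perfectly alternating.
grouped_song = [1] * 10 + [0] * 10
alternating_song = [1, 0] * 10

grouped_runs, grouped_z = runs_test_z(grouped_song)
alternating_runs, alternating_z = runs_test_z(alternating_song)
```

The normal approximation behind the z-score is rough for short songs, so a simulation-based comparison like the one the authors mention would be a natural check.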

Anne Lawless
Analyzing Exotic Amazonian Bird Foraging
“God gives every bird its food, but he doesn’t throw it in their nest.” – J. G. Holland. We have obtained information on different species of ant-following Amazonian birds and their competitive eating habits. Because the data contain small counts, typical methods such as ANOVA are not appropriate. The eating habits of these birds can instead be modeled using Poisson regression. Further, a new multiple comparison technique extending the concept of Tukey’s HSD to Poisson regression has been developed to discover significant differences in their mean foraging success rates.
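For readers unfamiliar with Poisson regression, here is a minimal sketch of fitting a log-linear model with one predictor by Fisher scoring. The counts and the two-group coding are invented for illustration; the project's actual data, model terms, and multiple comparison procedure are not shown in the abstract.

```python
from math import exp, log

def fit_poisson(x, y, iterations=25):
    """Fit log(mean count) = b0 + b1 * x by Fisher scoring (Newton's method)."""
    b0, b1 = log(sum(y) / len(y)), 0.0
    for _ in range(iterations):
        mu = [exp(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the Poisson log-likelihood).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Fisher information matrix entries.
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 system information * step = score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical foraging successes for two species (x = 0 or 1).
species = [0, 0, 0, 0, 1, 1, 1, 1]
successes = [1, 0, 2, 1, 3, 4, 2, 3]

b0, b1 = fit_poisson(species, successes)
rate_species_0 = exp(b0)
rate_species_1 = exp(b0 + b1)
```

With a single two-level predictor the fitted rates equal the group means, which makes the sketch easy to check by hand; `exp(b1)` is the estimated rate ratio between the two groups.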

Nicole Martin
Words of Today Compared to Terminology of Yesterday
Words that were common in the past have often been replaced with new words. For the past couple of years Google has uploaded the full text of books into their database to measure individual word usage. In this project we will be looking at the usage of the words “lunch,” “dinner,” and “supper.” These three words referring to meals taken at different times of the day have experienced usage fluctuation over time. We will filter the three words, including their capitalized, lowercase, and plural forms, from the Google Labs data of 470 million lines into a smaller data set. We plan to then apply smoothers such as spline and loess to investigate patterns between these words. After applying these time series techniques, we will also determine the year range in which “lunch” became a more popular word than “supper.”
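The smooth-then-compare step can be sketched as below. A centered moving average is used here as a simple stand-in for the spline and loess smoothers named in the abstract, and the yearly frequencies are invented placeholders rather than Google data.

```python
def moving_average(values, window=3):
    """Centered moving average; the window shrinks at the ends of the series."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed

def first_crossover(years, series_a, series_b):
    """First year in which series_a exceeds series_b, or None if it never does."""
    for year, a, b in zip(years, series_a, series_b):
        if a > b:
            return year
    return None

# Hypothetical relative frequencies per decade (illustration only).
years = [1900, 1910, 1920, 1930, 1940, 1950, 1960]
lunch = [1, 2, 3, 4, 5, 6, 7]
supper = [7, 6, 5, 4, 3, 2, 1]

crossover_year = first_crossover(
    years, moving_average(lunch), moving_average(supper)
)
```

Smoothing before comparing keeps a single noisy year from being reported as the crossover.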

Waled Murshed
Introduction to Survival Analysis
Estimating the survival function and making predictions has been of major interest in many statistical fields, including medical research and medical statistics. A very popular method for estimating the survival function, and a starting point for comparing survival distributions, is the product-limit method, also known as the Kaplan-Meier method. Furthermore, a proportional hazards model, more specifically the Cox model, is used for more in-depth analysis. This poster will introduce these and several other aspects of survival analysis, as well as apply these methods to several data sets such as “Time to First Recurrence of a Tumor in Bladder Cancer Patients.”
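A minimal sketch of the product-limit (Kaplan-Meier) estimate follows. The follow-up times and censoring indicators are made up for illustration and are not taken from the bladder cancer data set mentioned above.

```python
def kaplan_meier(times, events):
    """Return [(time, estimated survival)] at each distinct event time.

    events[i] is 1 if the event was observed at times[i], 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0   # events observed at time t
        leaving = 0    # subjects leaving the risk set at time t
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            leaving += 1
            i += 1
        if observed > 0:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Hypothetical follow-up times (months) and event indicators.
times = [1, 2, 2, 3, 4]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
```

Censored subjects never reduce the survival estimate directly; they only shrink the risk set used at later event times.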

Hau Nguyen
Bayesian vs. Frequentist Approaches to Modeling Seal Populations
A classical approach in statistics, the Frequentist method, is based on repeated random sampling with fixed parameters to test hypotheses and form confidence intervals. The Bayesian approach to statistics differs in that the parameters are treated as random variables that can be modeled according to some distribution. Although these methods may seem contradictory, their applications should be complementary; their usefulness depends on how we want to approach the data and the models. In my research, I illustrate these differences by comparing the results that I obtain from performing a Poisson regression analysis of harbor seal haul-outs in Ireland using both the Frequentist and Bayesian approaches.

Tansy Peplau and George Konidaris (Univ. of MA)
Faculty Sponsor: Richard Sharp, Andrew Barto (Univ of MA) Computer Science
Our aim is to adapt a simple game so that it adjusts the difficulty based on the skill level of the current player. By changing only the difficulty of a specific aspect of a game that the player already finds intrinsically interesting, one can motivate people to play the game longer. We took Java code for Pacman from online and added adaptive functionality. We added a global intelligence variable that controls how well the ghosts track Pacman and choose "intelligent" directions to turn towards catching Pacman. We added code for the ghost tracking as well. We also added methods to control the difficulty, so the game now gets easier if the player does badly and harder if the player does well. Aspects of the game that are influenced by the difficulty include Pacman speed, ghost speed, ghost intelligence, and power pellets.
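The adjustment loop described here might look roughly like the following. It is written in Python for brevity rather than in the project's Java, and every name, step size, and mapping from difficulty to game settings is a hypothetical choice made for illustration.

```python
from dataclasses import dataclass

@dataclass
class DifficultyController:
    level: float = 0.5   # 0.0 = easiest, 1.0 = hardest (assumed scale)
    step: float = 0.1    # how far one good or bad round moves the level

    def update(self, player_did_well: bool) -> float:
        """Raise the difficulty after a good round, lower it after a bad one."""
        change = self.step if player_did_well else -self.step
        self.level = min(1.0, max(0.0, self.level + change))
        return self.level

    def settings(self) -> dict:
        """Map the single difficulty level onto the aspects named in the abstract."""
        return {
            # Probability that a ghost picks the turn leading toward Pacman.
            "ghost_intelligence": self.level,
            "ghost_speed": 1.0 + 0.5 * self.level,
            "pacman_speed": 1.5 - 0.5 * self.level,
            "power_pellets": round(4 - 2 * self.level),
        }

controller = DifficultyController()
for _ in range(10):          # a struggling player loses ten rounds in a row
    controller.update(player_did_well=False)
easiest = controller.settings()
```

Clamping the level keeps a long losing or winning streak from pushing the settings outside their intended range.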

Matt Raley
Modeling the Dow Jones Industrial Average Using Time Series Analysis
Cyclical by nature, the economy of the United States is constantly changing. Stock market indices signify both expansionary and recessionary trends in the economy. I used multiple linear regression and time series analyses, incorporating the statistical bootstrap method, to model monthly movements in the Dow Jones Industrial Average (DJIA) based on multiple economic indicators: West Texas Intermediate (WTI) Crude Oil Spot Prices, Gold Spot Prices, Unemployment Rates, Federal Funds Rates, and Housing Starts.
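The abstract does not say how the bootstrap was combined with the regression. One common use, sketched below under that caveat, is a percentile bootstrap interval for a regression slope. The example uses a single predictor and synthetic data; it also resamples observations independently, which ignores the serial correlation that real monthly index data would have.

```python
import random

def ols_slope(xs, ys):
    """Least-squares slope of y on x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def bootstrap_slope_interval(xs, ys, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the slope, resampling (x, y) pairs."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    while len(slopes) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        boot_x = [xs[i] for i in idx]
        if len(set(boot_x)) < 2:
            continue  # slope is undefined when every resampled x is identical
        boot_y = [ys[i] for i in idx]
        slopes.append(ols_slope(boot_x, boot_y))
    slopes.sort()
    lower = slopes[int((alpha / 2) * n_boot)]
    upper = slopes[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Synthetic data: true slope 2 plus a small alternating disturbance.
xs = list(range(20))
ys = [2 * x + 1 + (0.5 if x % 2 == 0 else -0.5) for x in xs]
lower, upper = bootstrap_slope_interval(xs, ys)
```

For time series, a block bootstrap or resampling of model residuals is usually preferred, since both preserve more of the dependence between neighbouring months.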

Somphone Sonenarong