# 13 Mathematics, Computer Science, and Statistics students present at the 2011 Festival of Science

**The abstracts for Mathematics, Computer Science, and Statistics students who presented at the Festival of Science:**

Faculty Sponsor: Michael Schuckers, Statistics

**Noise and DNA Profiling Data Analysis**

DNA profiling has become a widely used method for human identification in forensic science (Weir, 2007). It is widely believed that DNA profiles are rare, meaning that there is a small chance that two individuals in a database will have the same profile (Weir, 2007). However, as the number of individuals used in forensic profiling grows, the chance that two profiles match also grows. It has been customary to calculate the probability of a match by summing the products of the probability of each possible genotype and the match probability of that genotype (Weir, 2004). Instead of using this approach, we have constructed a metric that allows analysis of the likelihood of a match between two DNA profiles from distinct individuals. In particular, we are interested in understanding how the distribution of match scores changes when we add different levels of noise to our sample. For this research, we focus on stutter in the electropherogram as the source of noise.
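The customary calculation the abstract describes, and the reason match chances grow with database size, can be sketched in a few lines. The genotype frequencies below are invented for illustration, not real forensic data:

```python
# Sketch of the sum-of-products match probability described above, using
# made-up genotype frequencies for a single locus (not real data).

def random_match_probability(genotype_freqs):
    """Sum over genotypes of P(genotype) * P(a second profile matches it).

    Under independence, the chance a random individual matches a given
    genotype equals that genotype's frequency, so the sum of products
    reduces to sum(p_i^2)."""
    return sum(p * p for p in genotype_freqs)

def prob_any_shared_profile(match_prob, n):
    """Birthday-problem style approximation: the probability that at least
    one pair among n independent profiles matches grows quickly with n."""
    pairs = n * (n - 1) // 2
    return 1.0 - (1.0 - match_prob) ** pairs

freqs = [0.4, 0.3, 0.2, 0.1]           # hypothetical genotype frequencies
rmp = random_match_probability(freqs)  # 0.16 + 0.09 + 0.04 + 0.01 = 0.30
print(rmp)
print(prob_any_shared_profile(rmp, 10))
```

Even with a small per-pair match probability, the number of pairs grows quadratically in the database size, which is the effect the abstract notes.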

**Lauren Brozowski**

Faculty Sponsor: Michael Schuckers, Statistics

**An Analysis of Penalty Biases Called in the NHL in the 2009-2010 Regular Season**

Penalties in ice hockey are an important aspect of the game, as a penalty being drawn can lead to a goal and ultimately influence which team wins the game. In this paper, we analyze all the penalties taken during the National Hockey League's 2009-2010 regular season. As part of our analysis, we look at the rate at which penalties were called by each of the league's referees and linesmen. Among the factors we include are the experience a referee has in calling specific penalties as well as the tendencies among the types of penalties called by each official. The results of our analysis would be useful to NHL teams in guiding their style of play, knowing which officials will be on the ice for a given game.
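The per-official rate the abstract mentions is a simple normalization of calls by games worked. A toy version, with invented officials and numbers rather than real NHL data:

```python
# Toy version of the rate computation described above: penalties called per
# game officiated, for a few hypothetical referees (not real NHL officials).

calls = {  # referee -> (penalties called, games worked)
    "Referee A": (210, 60),
    "Referee B": (180, 64),
    "Referee C": (240, 58),
}

rates = {ref: p / g for ref, (p, g) in calls.items()}

# Rank officials from most to least penalty-prone.
for ref, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
    print(ref, round(rate, 2))
```

Normalizing by games worked matters because raw penalty counts would mostly reflect how many games an official happened to officiate.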

**James Curro and Matthew Dodge**

Faculty Sponsor: Jessica Chapman, Mathematics

**Rock you like a... Statistician**

Guitar Hero is a popular video game in which rock enthusiasts can act as Slash, Hendrix, or Clapton, playing their favorite songs with a guitar-shaped controller. Players attempt to hit sequences of notes at specific times as dictated by the game. If the player hits a wrong note, plays at the incorrect time, or misses the note altogether, the note doesn't count and the song doesn't play. As more notes are missed, the in-game spectators respond unfavorably, and the player risks getting booed off the stage before the end of the song. We wondered whether missed notes occurred randomly or were grouped together in difficult parts of the song. Thus, we developed estimators to determine how ordered the grouping was on three artificial songs: one with seemingly random misses, one with obvious grouping, and one with some randomness and some grouping. We then obtained data by allowing undergraduate students and professors to try their hand at becoming a rock legend. We will apply our estimators to our datasets and perform simulations to compare how well our different methods perform under a variety of situations. Our estimators' effectiveness can then be evaluated, and we will be able to determine whether missed notes follow a pattern or are random in nature.
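The abstract does not name the students' estimators, so as a stand-in, here is a classic way to test whether misses in a hit/miss sequence are clustered or random: the Wald-Wolfowitz runs test, which compares the observed number of runs to its expectation under randomness.

```python
import math

# Wald-Wolfowitz runs test on a binary hit(0)/miss(1) sequence. This is a
# standard technique, not necessarily the estimator the students developed.

def runs_test_z(seq):
    """z-score for the number of runs in a binary sequence.

    Strongly negative z => misses are clustered (fewer runs than expected);
    near 0 => consistent with randomness; strongly positive z => misses
    alternate more regularly than chance would suggest."""
    n1 = sum(seq)
    n2 = len(seq) - n1
    n = n1 + n2
    runs = 1 + sum(1 for a, b in zip(seq, seq[1:]) if a != b)
    expected = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n * n * (n - 1))
    return (runs - expected) / math.sqrt(variance)

clustered = [1] * 10 + [0] * 10   # all misses grouped in one section
alternating = [1, 0] * 10         # misses spread perfectly evenly
print(round(runs_test_z(clustered), 2))    # strongly negative
print(round(runs_test_z(alternating), 2))  # strongly positive
```

A real song would produce a z-score somewhere between these extremes, which is exactly the random-versus-grouped question the project asks.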

**Anne Lawless**

Faculty Sponsor: Ivan Ramler, Statistics

**Analyzing Exotic Amazonian Bird Foraging**

“God gives every bird its food, but he doesn’t throw it in their nest” – J. G. Holland. We have obtained information on different species of ant-following Amazonian birds and their competitive eating habits. As the data contain small counts, typical methods such as ANOVA are not appropriate. The eating habits of these birds can instead be modeled using Poisson regression. Further, a new multiple comparison technique extending the concept of Tukey’s HSD to Poisson regression has been developed to discover significant differences in their mean foraging success rates.
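The students' Tukey-style extension is new and unpublished here, so as a simple illustration of the underlying idea, pairwise comparisons of Poisson rates can be made on the log scale with Wald z-statistics. The species and counts below are invented:

```python
import math

# Hypothetical foraging counts (successful captures per observation period)
# for three invented species. This pairwise Wald comparison is a simple
# stand-in for the students' Tukey's-HSD-style procedure, not their method.

counts = {
    "species_A": [2, 0, 1, 3, 1, 2],
    "species_B": [5, 4, 6, 3, 5, 4],
    "species_C": [1, 2, 0, 1, 1, 2],
}

def poisson_rate(xs):
    """MLE of a Poisson rate is the sample mean."""
    return sum(xs) / len(xs)

def pairwise_log_rate_z(a, b):
    """Wald z-statistic comparing two Poisson rates on the log scale.

    For a Poisson sample, Var(log lambda_hat) is approximately
    1 / (total count)."""
    la, lb = poisson_rate(a), poisson_rate(b)
    se = math.sqrt(1 / sum(a) + 1 / sum(b))
    return (math.log(la) - math.log(lb)) / se

species = sorted(counts)
for i in range(len(species)):
    for j in range(i + 1, len(species)):
        z = pairwise_log_rate_z(counts[species[i]], counts[species[j]])
        print(species[i], "vs", species[j], round(z, 2))
```

A Tukey-style procedure would additionally adjust these comparisons for the number of pairs tested, which is the gap the students' technique addresses.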

**Nicole Martin**

Faculty Sponsor: Ivan Ramler, Statistics

**Words of Today Compared to Terminology of Yesterday**

Words that were common in the past have often been replaced with new words. For the past couple of years, Google has uploaded the full text of books into their database to measure individual word usage. In this project, we will be looking at the usage of the words “lunch,” “dinner,” and “supper.” These three words, referring to meals taken at different times of the day, have experienced usage fluctuations over time. We will filter the three words, with capitalized beginnings, lowercase beginnings, and plurals, from the 470 million lines of Google Labs data into a smaller data set. We then plan to apply smoothers such as splines and loess to investigate patterns among these words. After applying these time series techniques, we will also determine the year range in which “lunch” became a more popular word than “supper.”
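The crossover question reduces to smoothing two frequency series and finding where one first exceeds the other. In this sketch the frequencies are invented (not Google Ngram data), and a centered moving average stands in for the spline and loess smoothers the project plans to use:

```python
# Smooth two synthetic word-frequency series and find the first year one
# overtakes the other. The numbers are invented, and a moving average is a
# stand-in for spline/loess smoothing.

def moving_average(ys, window=3):
    """Centered moving average that shrinks the window at the edges."""
    half = window // 2
    out = []
    for i in range(len(ys)):
        lo, hi = max(0, i - half), min(len(ys), i + half + 1)
        out.append(sum(ys[lo:hi]) / (hi - lo))
    return out

def first_crossover(years, a, b):
    """First year at which smoothed series a strictly exceeds smoothed b."""
    sa, sb = moving_average(a), moving_average(b)
    for year, x, y in zip(years, sa, sb):
        if x > y:
            return year
    return None

years = list(range(1900, 1910))
lunch  = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]  # rising usage
supper = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]  # falling usage

print(first_crossover(years, lunch, supper))  # 1905
```

Smoothing first matters because year-to-year noise in raw frequencies can produce spurious early crossings.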

**Waled Murshed**

Faculty Sponsor: Ivan Ramler, Statistics

**Introduction to Survival Analysis**

Estimating the survival function and making predictions have been of major interest in many statistical fields, including medical research. A very popular method for estimating the survival function, together with associated statistical tests for comparing survival distributions, is the product-limit method, also known as the Kaplan-Meier method. Furthermore, a proportional hazards model, more specifically the Cox model, is used for more in-depth analysis. This poster will introduce these and several other aspects of survival analysis, as well as apply these methods to several data sets such as “Time to First Recurrence of a Tumor in Bladder Cancer Patients.”
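The product-limit estimator itself is compact enough to sketch directly. The data below are invented (times in months; `True` marks an observed event, `False` a censored observation); real analyses would use a survival package:

```python
# Minimal Kaplan-Meier (product-limit) estimator on a tiny invented data set.
# S(t) is the product over event times t_i <= t of (1 - d_i / n_i), where
# d_i events occur at t_i and n_i subjects remain at risk just before t_i.

def kaplan_meier(times, observed):
    """Return [(event_time, survival_estimate)] sorted by time."""
    data = sorted(zip(times, observed))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d = sum(1 for tt, e in data if tt == t and e)   # events at t
        removed = sum(1 for tt, _ in data if tt == t)   # events + censored
        if d > 0:
            surv *= 1 - d / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
        while i < len(data) and data[i][0] == t:
            i += 1
    return curve

times    = [2, 3, 3, 5, 7, 8, 8, 9]
observed = [True, True, False, True, False, True, True, False]
for t, s in kaplan_meier(times, observed):
    print(t, round(s, 3))
```

Note how the censored subjects at times 3, 7, and 9 leave the risk set without forcing a drop in the curve; handling censoring this way is the method's key feature.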

**Hau Nguyen**

Faculty Sponsor: Jessica Chapman, Mathematics

**Bayesian vs. Frequentist Approaches to Modeling Seal Populations**

A classical approach to statistics, the Frequentist method, is based on repeated random sampling with fixed parameters to test hypotheses and form confidence intervals. The Bayesian approach differs in that the parameters are treated as random variables that can be modeled according to some distribution. Although these methods may seem contradictory, their applications should be complementary; their usefulness depends on how we want to approach the data and the models. In my research, I illustrate these differences by comparing the results obtained from performing a Poisson regression analysis of harbor seal haul-outs in Ireland using both the Frequentist and Bayesian approaches.
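The contrast can be made concrete for a single Poisson mean. The counts below are invented (not the Irish seal data), and the Gamma prior is an assumption chosen only to be weakly informative:

```python
# Frequentist vs. Bayesian estimation of a Poisson rate, e.g. animal counts
# per survey. Counts and the Gamma(2, 0.2) prior are illustrative choices.

counts = [12, 9, 15, 11, 10, 14, 13, 8]  # hypothetical haul-out counts
n, total = len(counts), sum(counts)

# Frequentist: the MLE of the rate with an approximate 95% Wald interval.
mle = total / n
se = (mle / n) ** 0.5
ci = (mle - 1.96 * se, mle + 1.96 * se)

# Bayesian: a Gamma(a, b) prior is conjugate for the Poisson mean, giving a
# Gamma(a + sum(x), b + n) posterior in closed form.
a, b = 2.0, 0.2
post_a, post_b = a + total, b + n
post_mean = post_a / post_b

print(round(mle, 2), tuple(round(c, 2) for c in ci))
print(round(post_mean, 2))  # pulled slightly toward the prior mean a/b = 10
```

The posterior mean is a weighted compromise between the prior mean and the MLE, which is the "parameters as random variables" viewpoint the abstract describes; with lots of data the two answers nearly coincide.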

**Tansy Peplau and George Konidaris (Univ. of MA)**

Faculty Sponsors: Richard Sharp and Andrew Barto (Univ. of MA), Computer Science

**Adaptive Gaming with Pacman**

Our aim is to adapt a simple game so that it adjusts the difficulty based on the skill level of the current player. By changing only the difficulty of a specific aspect of a game that the player already finds intrinsically interesting, one can motivate people to play the game longer. We took Java code for Pacman from an online source and added adaptive functionality. We added a global intelligence variable that controls how well the ghosts track Pacman and choose "intelligent" directions to turn toward catching Pacman, along with the code for the ghost tracking itself. We also added methods to control the difficulty, so that the game now gets easier if the player does badly and harder if the player does well. Aspects of the game influenced by the difficulty include Pacman speed, ghost speed, ghost intelligence, and power pellets.

**Matt Raley**

Faculty Sponsor: Ivan Ramler, Mathematics

**Modeling the Dow Jones Industrial Average Using Time Series Analysis**

Cyclical by nature, the economy of the United States is constantly changing. Stock market indices signify both expansionary and recessionary trends in the economy. I used multiple linear regression and time series analyses, incorporating the statistical bootstrap method, to model monthly movements in the Dow Jones Industrial Average (DJIA) based on multiple economic indicators: West Texas Intermediate (WTI) Crude Oil Spot Prices, Gold Spot Prices, Unemployment Rates, Federal Funds Rates, and Housing Starts.
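The bootstrap-plus-regression combination can be illustrated with one predictor: fit a slope by least squares, then resample (x, y) pairs to get a percentile interval for it. The series below is synthetic, not DJIA or oil-price data:

```python
import random

# Ordinary least squares for one predictor plus a bootstrap percentile
# interval for the slope. Purely illustrative synthetic data.

def ols_slope(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

def bootstrap_slope_ci(xs, ys, reps=2000, alpha=0.05, seed=1):
    """Resample (x, y) pairs with replacement; take percentile endpoints."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    while len(slopes) < reps:
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        if len(set(bx)) < 2:   # degenerate resample: slope undefined
            continue
        slopes.append(ols_slope(bx, [ys[i] for i in idx]))
    slopes.sort()
    return slopes[int(alpha / 2 * reps)], slopes[int((1 - alpha / 2) * reps) - 1]

xs = list(range(20))
ys = [2.0 * x + 0.5 * (-1) ** x for x in xs]  # true slope 2 plus small wiggle
print(round(ols_slope(xs, ys), 2))
print(tuple(round(v, 2) for v in bootstrap_slope_ci(xs, ys)))
```

The appeal of the pairs bootstrap for financial series is that it avoids assuming normal errors, though for strongly autocorrelated data a block bootstrap would be more appropriate.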

**Somphone Sonenarong**

Faculty Sponsor: Dante Giarrusso, Mathematics

**Hamilton and the Discovery of the Quaternions**

In 1843 Sir William Rowan Hamilton inscribed "i^2 = j^2 = k^2 = ijk = -1" onto the Brougham Bridge in Dublin prior to attending a council meeting at the Royal Irish Academy. The inscription records Hamilton's discovery of the quaternions, a number system that extends the complex numbers into four dimensions. In this presentation, I will discuss the algebraic properties of the quaternions and prove the norm property, namely that N(qq') = N(q)N(q'). I will also discuss the Sum of Four Squares Theorem, which states that any positive integer can be represented as the sum of at most four squares. By using the norm property of quaternionic products and the Fundamental Theorem of Arithmetic, we can reduce the problem to determining which prime numbers may be represented as a sum of four squares.
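The norm property is easy to check numerically. Representing q = a + bi + cj + dk as a 4-tuple (a, b, c, d), the Hamilton product follows directly from the bridge inscription:

```python
# Numerical check of the norm property N(qq') = N(q)N(q') for quaternions
# q = a + bi + cj + dk, stored as 4-tuples (a, b, c, d).

def qmul(q, r):
    """Hamilton product, derived from i^2 = j^2 = k^2 = ijk = -1."""
    a, b, c, d = q
    e, f, g, h = r
    return (a*e - b*f - c*g - d*h,   # real part
            a*f + b*e + c*h - d*g,   # i part
            a*g - b*h + c*e + d*f,   # j part
            a*h + b*g - c*f + d*e)   # k part

def norm(q):
    """N(q) = a^2 + b^2 + c^2 + d^2."""
    return sum(x * x for x in q)

q, r = (1, 2, 3, 4), (2, -1, 0, 5)
print(norm(qmul(q, r)), norm(q) * norm(r))  # both 900: the norm is multiplicative
```

Since N(q) is itself a sum of four squares, this multiplicativity is exactly what reduces the Four Squares Theorem to the case of primes: a product of two sums of four squares is again a sum of four squares.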

**Lisa VanderVoort**

Faculty Sponsor: Michael Schuckers, Statistics

**Evaluation of Estimators of the Generalized Pareto Distribution**

Biometrics is the study of identifying individuals based on their physical traits. Biometric systems are designed to detect whether a person attempting to gain access to information is the genuine person or an imposter. Of particular importance to statisticians working on improving False Match Rates (FMR) and False Non-Match Rates (FNMR) are extreme value statistics; that is, the lowermost or uppermost portions of the genuine and imposter distributions. The present study set out to investigate the accuracy of estimators of the Generalized Pareto Distribution (GPD), an effective way to analyze points above a threshold. Four estimators from the R POT package were chosen: Maximum Likelihood Estimation (MLE), Unbiased Probability Weighted Moments (PWMU), Biased Probability Weighted Moments (PWMB), and Method of Moments (MOM). Under small and medium sample sizes, the MLE gave the most accurate estimates of scale and shape. However, the PWMU and PWMB estimators performed most consistently across small, medium, and large sample sizes.
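Of the four estimators, MOM is the simplest to sketch: it matches the sample mean and variance to the GPD moment formulas. This pure-Python illustration (the study itself used the R POT package) generates samples by inverting the GPD CDF and then recovers the parameters:

```python
import random

# Method-of-moments (MOM) estimation for the Generalized Pareto
# Distribution, one of the four estimators the study compared.

def gpd_sample(n, shape, scale, seed=0):
    """Draw GPD variates by inverting the CDF:
    x = scale/shape * ((1-u)^(-shape) - 1), for shape != 0."""
    rng = random.Random(seed)
    return [scale / shape * ((1 - rng.random()) ** (-shape) - 1)
            for _ in range(n)]

def gpd_mom(xs):
    """MOM estimates (shape, scale), valid when shape < 1/2 so that
    mean = scale/(1-shape) and
    var  = scale^2 / ((1-shape)^2 * (1-2*shape)) both exist."""
    n = len(xs)
    m = sum(xs) / n
    v = sum((x - m) ** 2 for x in xs) / (n - 1)
    shape = 0.5 * (1 - m * m / v)
    scale = 0.5 * m * (m * m / v + 1)
    return shape, scale

xs = gpd_sample(20000, shape=0.1, scale=2.0, seed=42)
shape_hat, scale_hat = gpd_mom(xs)
print(round(shape_hat, 2), round(scale_hat, 2))  # near the true 0.1 and 2.0
```

The shape < 1/2 restriction hints at why the study compared several estimators: MOM breaks down for the heavy-tailed cases that matter most in extreme value work, where MLE and PWM remain usable.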

**Danielle Winters**

Faculty Sponsor: Ivan Ramler

**Detecting Struggling Students Early Using Linear Regression**

We develop a strategy to predict final grades from early materials in introductory statistics and calculus courses. Using individual quiz and exam scores from historical data from sections with a similar structure, we use multiple linear regression and logistic regression to predict final grades on a week-by-week basis during the current semester. These models can predict final grades for students based on a few early quiz scores, and thus can identify struggling students early in the semester. We believe that these models provide a strategy that teachers can use within their classrooms on a weekly basis to predict student performance.
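The logistic-regression half of the idea can be sketched end to end on toy data: fit a model on historical early-quiz scores labeled by whether the final grade was low, then flag current students whose predicted risk is high. The scores below are synthetic, and plain gradient descent stands in for a statistics package's fitting routine:

```python
import math

# Toy week-by-week early-warning model: logistic regression on two early
# quiz scores (0-10 scale, synthetic data) predicting a low final grade.

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def fit_logistic(X, y, lr=0.02, epochs=5000):
    """Fit weights (bias first) by batch gradient descent on log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(a * b for a, b in zip(w[1:], xi)))
            err = p - yi
            grad[0] += err
            for j, a in enumerate(xi):
                grad[j + 1] += err * a
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

def at_risk(w, quizzes, cutoff=0.5):
    """Flag a student whose predicted probability of a low grade is high."""
    p = sigmoid(w[0] + sum(a * b for a, b in zip(w[1:], quizzes)))
    return p > cutoff

# Synthetic history: [quiz1, quiz2] and whether the final grade was low (1).
X = [[9, 8], [8, 9], [7, 8], [9, 9], [3, 4], [2, 5], [4, 3], [5, 4]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
w = fit_logistic(X, y)
print(at_risk(w, [3, 3]))  # low early quizzes: flagged
print(at_risk(w, [9, 9]))  # strong early quizzes: not flagged
```

In the week-by-week scheme the abstract describes, a model like this would be refit each week as another quiz score becomes available, so the flag sharpens as the semester progresses.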