News

13 Mathematics, Computer Science, and Statistics students present at the 2011 Festival of Science

The abstracts for Mathematics, Computer Science, and Statistics students who presented at the Festival of Science:

Katelynn Benzing

Faculty Sponsor: Michael Schuckers, Statistics

Noise and DNA Profiling Data Analysis

DNA profiling has become a widely used method for human identification in forensic science (Weir, 2007). It is widely believed that DNA profiles are rare, meaning that there is a small chance that two individuals in a database will have the same profile (Weir, 2007). However, as the number of individuals used in forensic profiling grows, the chance that two profiles match also grows. It has been customary to calculate the probability of a match by adding together the products of the probability of each possible genotype and the match probability of that genotype (Weir, 2004). Instead of using this approach, we have constructed a metric that allows analysis of the likelihood of a match between two DNA profiles from distinct individuals. In particular, we are interested in understanding how the distribution of match scores changes when we add different levels of noise to our sample. For this research, we focus on stutter in the electropherogram as the source of noise.
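
As a rough illustration of the customary calculation described above (not part of the study), the match probability at a single locus can be computed from genotype frequencies if the match probability of a genotype is taken to equal its population frequency; the numbers below are made up.

    # Illustrative sketch in R; genotype frequencies are hypothetical
    p_g <- c(0.40, 0.35, 0.15, 0.10)   # frequencies of the possible genotypes at one locus
    match_prob <- sum(p_g * p_g)       # sum of (genotype probability) x (match probability)
    match_prob                         # 0.315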

Lauren Brozowski

Faculty Sponsor: Michael Schuckers, Statistics

An Analysis of Penalty Biases Called in the NHL in the 2009-2010 Regular Season

Penalties in ice hockey are an important aspect of the game, as a penalty being drawn can lead to a goal and ultimately influence which team wins the game. In this paper, we analyze all the penalties taken during the National Hockey League's 2009-2010 regular season. As part of our analysis we look at the rate at which penalties were called by each of the league's referees and linesmen. Among the factors we include are the experience a referee has in calling specific penalties and the tendencies among the types of penalties called by each official. The results of our analysis would be useful to NHL teams in guiding their style of play when they know which officials will be on the ice for a given game.
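
A minimal sketch in R of the kind of rate comparison described above; the data frames and column names ('officials', 'game_calls', and their columns) are hypothetical.

    # Calls per game worked for each official (hypothetical summary data)
    officials$rate <- officials$penalties_called / officials$games_worked
    # Model per-game call counts by official and penalty type (hypothetical per-game data)
    fit <- glm(calls ~ official + penalty_type, family = poisson, data = game_calls)
    summary(fit)   # compare calling tendencies across officials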

James Curro and Matthew Dodge

Faculty Sponsor: Jessica Chapman, Mathematics

Rock You Like a... Statistician

Guitar Hero is a popular video game in which rock enthusiasts can act as Slash, Hendrix, or Clapton, playing their favorite songs with a guitar-shaped controller. Players attempt to hit sequences of notes at specific times as dictated by the game. If the player hits a wrong note, plays at the incorrect time, or misses the note altogether, the note doesn't count and the song doesn't play. As more notes are missed, the in-game spectators respond unfavorably, and the player risks getting booed off the stage before the end of the song. We wondered whether missed notes occur randomly or are grouped together in difficult parts of the song. Thus, we developed estimators to determine how ordered the grouping was on three artificial songs: one with seemingly random misses, one with obvious grouping, and one with some randomness and some grouping. We then obtained data by allowing undergraduate students and professors to try their hand at becoming a rock legend. We will apply our estimators to our datasets and perform simulations to compare how well the different methods perform under a variety of situations. Our estimators' effectiveness can then be evaluated, and we will be able to determine whether missed notes follow a pattern or are random in nature.
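
The abstract does not specify the estimators, but one standard way to test whether misses cluster rather than occur at random is a Wald-Wolfowitz runs test on the hit/miss sequence; the sketch below is an illustration of that idea in R, not the authors' method.

    # x is a vector of 0s (miss) and 1s (hit) in song order
    runs_test <- function(x) {
      n1 <- sum(x == 1); n0 <- sum(x == 0)
      runs <- 1 + sum(diff(x) != 0)          # observed number of runs
      mu <- 1 + 2 * n1 * n0 / (n1 + n0)      # expected runs under randomness
      v  <- 2 * n1 * n0 * (2 * n1 * n0 - n1 - n0) /
            ((n1 + n0)^2 * (n1 + n0 - 1))    # variance of the run count
      z  <- (runs - mu) / sqrt(v)
      2 * pnorm(-abs(z))                     # small p-value suggests non-random patterning
    }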

Anne Lawless

Faculty Sponsor: Ivan Ramler, Statistics

Analyzing Exotic Amazonian Bird Foraging

“God gives every bird its food, but he doesn’t throw it in their nest.” – J. G. Holland. We have obtained information on different species of ant-following Amazonian birds and their competitive eating habits. Because the data contain small counts, typical methods such as ANOVA are not appropriate. The eating habits of these birds can instead be modeled using Poisson regression. Further, a new multiple comparison technique extending the concept of Tukey's HSD to Poisson regression has been developed to discover significant differences in their mean foraging success rates.
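
A minimal sketch in R of a Poisson regression with Tukey-style all-pairwise comparisons (via the multcomp package, one common way to carry the Tukey HSD idea over to generalized linear models); the data frame 'birds' and its columns are hypothetical.

    library(multcomp)
    # successes = count of successful foraging attempts, species = bird species (a factor)
    fit <- glm(successes ~ species, family = poisson, data = birds)
    # All pairwise comparisons of species effects on the model scale
    summary(glht(fit, linfct = mcp(species = "Tukey")))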

Nicole Martin

Faculty Sponsor: Ivan Ramler, Statistics

Words of Today Compared to Terminology of Yesterday

Words that were common in the past have often been replaced with new words. For the past couple of years Google has uploaded the full text of books into its database to measure individual word usage. In this project we will be looking at the usage of the words “lunch,” “dinner,” and “supper.” These three words, which refer to meals taken at different times of the day, have experienced usage fluctuations over time. We will filter the three words, including capitalized, lowercase, and plural forms, from the 470 million lines of Google Labs data into a smaller data set. We then plan to apply smoothers such as splines and loess to investigate patterns between these words. After applying these time series techniques, we will also determine the year range in which “lunch” became a more popular word than “supper.”

Waled Murshed

Faculty Sponsor: Ivan Ramler, Statistics

Introduction to Survival Analysis

Estimating the survival function and making predictions have been of major interest in many statistical fields, including medical research and statistics. A very popular method used to estimate the survival function, together with a statistical test for comparing survival distributions, is the product-limit method, also known as the Kaplan-Meier method. Furthermore, a proportional hazards model, more specifically the Cox model, is used for more in-depth analysis. This poster will introduce these and several other aspects of survival analysis, and apply the methods to several data sets such as “Time to First Recurrence of a Tumor in Bladder Cancer Patients.”
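
A minimal sketch in R of the methods named above, using the bladder cancer recurrence data shipped with the survival package (variable names follow that data set, not necessarily the poster's analysis).

    library(survival)
    km <- survfit(Surv(stop, event) ~ rx, data = bladder)    # Kaplan-Meier (product-limit) curves by treatment
    plot(km)
    survdiff(Surv(stop, event) ~ rx, data = bladder)          # log-rank test comparing survival distributions
    coxph(Surv(stop, event) ~ rx + number + size, data = bladder)   # Cox proportional hazards model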

Hau Nguyen           

Faculty Sponsor: Jessica Chapman, Mathematics

Bayesian vs. Frequentist Approaches to Modeling Seal Populations

A classical approach in statistics, the Frequentist method, is based on repeated random sampling with fixed parameters to test hypotheses and form confidence intervals. The Bayesian approach to statistics differs in that the parameters are treated as random variables that can be modeled according to some distribution. Although these methods may seem contradictory, their applications should be complementary; their usefulness depends on how we want to approach the data and the models. In my research, I illustrate these differences by comparing the results obtained from performing a Poisson regression analysis of harbor seal haul-outs in Ireland using both the Frequentist and Bayesian approaches.
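
A minimal sketch in R contrasting the two fits: a Frequentist Poisson regression via glm and a Bayesian Poisson regression via MCMC (here using MCMCpack, one of several options); the data frame 'seals' and its columns are hypothetical.

    # count = number of seals hauled out, month = a covariate (hypothetical names)
    freq_fit <- glm(count ~ month, family = poisson, data = seals)   # Frequentist fit
    library(MCMCpack)
    bayes_fit <- MCMCpoisson(count ~ month, data = seals)            # Bayesian fit (posterior draws)
    summary(freq_fit); summary(bayes_fit)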

Tansy Peplau and George Konidaris (Univ. of MA)

Faculty Sponsors: Richard Sharp and Andrew Barto (Univ. of MA), Computer Science

Adaptive Gaming with Pacman

Our aim is to adapt a simple game so that it adjusts the difficulty based on the skill level of the current player. By changing only the difficulty of a specific aspect of a game that the player already finds intrinsically interesting, one can motivate people to play the game longer. We took Java code for Pacman found online and added adaptive functionality. We added a global intelligence variable that controls how well the ghosts track Pacman and choose "intelligent" directions to turn in pursuit of Pacman, along with the code for the ghost tracking itself. We also added methods to control the difficulty, so the game now gets easier if the player does badly and harder if the player does well. Aspects of the game that are influenced by the difficulty include Pacman speed, ghost speed, ghost intelligence, and power pellets.
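
A small sketch (in R rather than the authors' Java) of the kind of adaptive rule described above: a single difficulty value drifts up when the player does well and down when the player does badly, and settings such as ghost speed and ghost intelligence can be scaled from it.

    update_difficulty <- function(difficulty, cleared_level, lives_lost,
                                  step = 0.1, lo = 0, hi = 1) {
      if (cleared_level) difficulty <- difficulty + step   # player doing well: make it harder
      difficulty <- difficulty - step * lives_lost         # player doing badly: make it easier
      min(max(difficulty, lo), hi)                         # keep the value in [lo, hi]
    }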

Matt Raley

Faculty Sponsor: Ivan Ramler, Mathematics

Modeling the Dow Jones Industrial Average Using Time Series Analysis

Cyclical by nature, the economy of the United States is constantly changing. Stock market indices signify both expansionary and recessionary trends in the economy. I used multiple linear regression and time series analyses, incorporating the statistical bootstrap method, to model monthly movements in the Dow Jones Industrial Average (DJIA) based on multiple economic indicators: West Texas Intermediate (WTI) Crude Oil Spot Prices, Gold Spot Prices, Unemployment Rates, Federal Funds Rates, and Housing Starts.
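
A minimal sketch in R of a multiple regression of monthly DJIA movements on the indicators listed above, with coefficients bootstrapped by resampling rows (a simple bootstrap that ignores autocorrelation; a block bootstrap would respect the time series structure); the data frame 'econ' and its columns are hypothetical.

    library(boot)
    fit <- lm(djia_return ~ wti_price + gold_price + unemployment +
              fed_funds + housing_starts, data = econ)
    coef_fun <- function(d, i) coef(update(fit, data = d[i, ]))   # refit on a resampled data set
    boot(econ, coef_fun, R = 1000)                                # bootstrap distribution of the coefficients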

Somphone Sonenarong

Faculty Sponsor: Dante Giarrusso, Mathematics

Hamilton and the Discovery of the Quaternions

In 1843 Sir William Rowan Hamilton inscribed "i² = j² = k² = ijk = -1" onto the Brougham Bridge in Dublin prior to attending a council meeting at the Royal Irish Academy. This inscription records Hamilton's discovery of the quaternions, a number system that extends the complex numbers into four dimensions. In this presentation, I will discuss the algebraic properties of the quaternions and prove the norm property, namely that N(qq') = N(q)N(q'). I will also discuss the Sum of Four Squares Theorem, which states that any positive integer can be represented as the sum of at most four squares. By using the norm property of quaternionic products and the Fundamental Theorem of Arithmetic, we can reduce the problem to determining which prime numbers may be represented as a sum of four squares.
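
A small numerical check of the norm property in R, with a quaternion a + bi + cj + dk stored as the vector c(a, b, c, d); the example values are arbitrary.

    qmult <- function(p, q) c(                      # Hamilton product of two quaternions
      p[1]*q[1] - p[2]*q[2] - p[3]*q[3] - p[4]*q[4],
      p[1]*q[2] + p[2]*q[1] + p[3]*q[4] - p[4]*q[3],
      p[1]*q[3] - p[2]*q[4] + p[3]*q[1] + p[4]*q[2],
      p[1]*q[4] + p[2]*q[3] - p[3]*q[2] + p[4]*q[1])
    qnorm2 <- function(q) sum(q^2)                  # N(q) = a^2 + b^2 + c^2 + d^2
    p <- c(1, 2, 3, 4); q <- c(2, -1, 0, 5)
    qnorm2(qmult(p, q)) == qnorm2(p) * qnorm2(q)    # TRUE: N(qq') = N(q)N(q')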

Lisa VanderVoort

Faculty Sponsor: Michael Schuckers, Statistics

Evaluation of Estimators of the Generalized Pareto Distribution

Biometrics is the study of identifying individuals based on their physical traits. Biometric systems are designed to detect whether a person attempting to gain access to information is the genuine person or an impostor. Of particular importance to statisticians working on improving False Match Rates (FMR) and False Non-Match Rates (FNMR) are extreme value statistics, that is, the lowermost or uppermost portions of the genuine and impostor distributions. The present study set out to investigate the accuracy of estimators of the Generalized Pareto Distribution (GPD). The GPD is an effective way to analyze points above a threshold. Four estimators from the R POT package were chosen: Maximum Likelihood Estimator (MLE), Unbiased Probability Weighted Moments (PWMU), Biased Probability Weighted Moments (PWMB), and Method of Moments (MOM). Under small and medium sample sizes, the MLE gave the most accurate estimates of scale and shape. However, the PWMU and PWMB estimators performed most consistently across small, medium, and large sample sizes.
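
A minimal sketch in R comparing the four estimators via the POT package on simulated exceedances; the simulation settings are illustrative and not those of the study.

    library(POT)
    set.seed(1)
    x <- rgpd(100, loc = 0, scale = 1, shape = 0.2)   # simulated exceedances above a threshold of 0
    fitgpd(x, 0, est = "mle")        # maximum likelihood
    fitgpd(x, 0, est = "pwmu")       # unbiased probability weighted moments
    fitgpd(x, 0, est = "pwmb")       # biased probability weighted moments
    fitgpd(x, 0, est = "moments")    # method of moments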

Danielle Winters                              

Faculty Sponsor: Ivan Ramler, Statistics

Detecting Struggling Students Early Using Linear Regression

We develop a strategy to predict final grades from early course materials in introductory statistics and calculus courses. Using individual quiz and exam scores from historical data on sections with a similar structure, we use multiple linear regression and logistic regression to predict final grades on a week-by-week basis during the current semester. These models can predict final grades for students based on a few early quiz scores and thus can be used to identify struggling students early in the semester. We believe that these models provide a strategy that teachers can use within their classrooms on a weekly basis to predict student performance.
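
A minimal sketch in R of the two models described above, fit to past sections and then used to score students in the current semester; the data frames and column names ('grades', 'current_students', the quiz and exam variables, and a cutoff of 70) are hypothetical.

    # Linear model for the final grade from early assessments
    lin_fit <- lm(final ~ quiz1 + quiz2 + exam1, data = grades)
    # Logistic model for the probability a student ends up below the cutoff
    log_fit <- glm(I(final < 70) ~ quiz1 + quiz2 + exam1,
                   family = binomial, data = grades)
    predict(lin_fit, newdata = current_students)                     # predicted final grades
    predict(log_fit, newdata = current_students, type = "response")  # probability of struggling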