Festival of Science 2010 - Student Abstracts

The annual St. Lawrence University Festival of Science was held on Friday, April 23, 2010. In total, 124 students presented their research at the festival, 30 of whom were from the Department of Mathematics, Computer Science, and Statistics. More information and a complete program of the event are available at the Festival of Science website.

Spring 2010 FOS Mathematics, Computer Science and Statistics Abstracts

Bryan Ben-Haim and Steve Pogozelski - SLUlist: St. Lawrence’s Online Marketplace
Computer Science - Rich Sharp

In this project, we seek to provide a web forum where St. Lawrence students and faculty can trade and barter. Our web application will function much like Craigslist: items bought, sold, and traded will range from textbooks to electronics to off-campus housing. The interface is designed to be friendly to newcomers to online trading. For example, it will let users post, edit, and remove listings, and visitors will be able to browse by subject, date, or poster, or search by keyword. The goal of this project is to provide a simple and easy way to exchange goods on the St. Lawrence campus by offering online tools to effortlessly post, view, and search for items.
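The posting, browsing, and searching features described above map naturally onto a small database table. The sketch below illustrates that idea in Python with SQLite; the table layout, column names, and helper functions are assumptions made for illustration, not the actual SLUlist design, which the abstract does not describe.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE listings (
    id INTEGER PRIMARY KEY,
    title TEXT, category TEXT, poster TEXT,
    posted_on TEXT DEFAULT CURRENT_TIMESTAMP,
    description TEXT)""")

def post(title, category, poster, description):
    # post a new listing (editing and removal would be similar UPDATE/DELETE statements)
    conn.execute("INSERT INTO listings (title, category, poster, description) VALUES (?, ?, ?, ?)",
                 (title, category, poster, description))

def search(keyword):
    # browse/search by keyword across title and description, newest first
    cur = conn.execute(
        "SELECT title, category, poster FROM listings "
        "WHERE title LIKE ? OR description LIKE ? ORDER BY posted_on DESC",
        (f"%{keyword}%", f"%{keyword}%"))
    return cur.fetchall()

post("Calc II textbook", "books", "student1", "Lightly used Stewart Calculus")
print(search("calculus"))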

Doug Denu - SLU Web Crawler
Computer Science - Ed Harcourt

A web crawler is a program that scours the Internet, moving from website to website. Web crawlers serve many different purposes, such as harvesting addresses for junk mail, finding dead links within a domain, and indexing websites and databases for search engines like Google. This project focuses on using a web crawler to map the hierarchy of links within a particular domain. Starting at St. Lawrence University's home page, the web crawler gathers all the links it finds while crawling the St. Lawrence domain. Once the links have been gathered, they are graphed to represent the structure of how they are connected. The web crawler is written in the Python programming language, which already provides many of the features a crawler needs, such as data structures, regular expressions, and search algorithms. The crawler stores every link it comes across using a combination of a queue and a dictionary. Breadth-first search (BFS) is used in tandem with the queue to work through the links, making sure that every appropriate link is visited. The crawler is broken up into three modules: the first initializes the crawler, the second does the actual crawling, and the third organizes the link interconnection data so that another program can map it out in the visualization tool Graphviz.
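A minimal sketch of the crawl loop described above (a queue of pages to visit, a dictionary of links already seen, and a regular expression to pull links out of each page) follows. It compresses the project's three-module design into a single function and keeps only links inside the St. Lawrence domain; the real crawler's module boundaries and storage details are not reproduced here.

import re
import urllib.request
from collections import deque

START = "https://www.stlawu.edu/"          # starting point named in the abstract
DOMAIN = "stlawu.edu"                      # stay inside the St. Lawrence domain
HREF = re.compile(r'href="(https?://[^"]+)"')

def crawl(start, limit=50):
    queue = deque([start])                 # pages still to visit (BFS order)
    seen = {start: []}                     # page -> links found on it
    while queue and len(seen) < limit:
        page = queue.popleft()
        try:
            html = urllib.request.urlopen(page, timeout=5).read().decode("utf-8", "ignore")
        except Exception:
            continue                       # dead or unreachable link
        for link in HREF.findall(html):
            if DOMAIN in link:
                seen[page].append(link)
                if link not in seen:
                    seen[link] = []
                    queue.append(link)
    return seen

# The page -> links structure in seen can then be written out in DOT format for Graphviz.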

Christopher Fuerte, Thomas Whelan and James Johnston - Online Texas Hold'em
Computer Science - Ed Harcourt

Our research this semester examined the architecture of online turn-based gaming (TBG). We found that the architecture breaks down into client-side and server-side applications, each of which is further broken down into smaller subcategories to make the program more efficient. The client side breaks down into the computation-heavy code, the GUI-intensive code, and the part that tracks the progress of the game. The server side breaks down into the initial connection, the active play and game tracker, and lastly the referee component that checks whether the game rules are being followed. We did much of our research in a practical manner, analyzing a few existing games and then creating our own, a Texas hold'em game. We chose this game because it let us apply, in one application, every part of the games we had studied. On the client side we needed a class to find the best possible hand, an extremely computation-heavy task. A subclass focused only on maintaining the client's GUI so that every player sees the same thing. The last part of the client application was in charge of tracking the client's assets, namely cards and money, and communicating with the server. The server application was very thread-heavy: one thread accepted clients and assigned them their sockets, and a separate, synchronized thread for each client made sure that its client received the correct data, was kept up to speed, and abided by the rules of play.
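A bare-bones sketch of the thread-per-client server structure described above, written in Python (the abstract does not name the project's actual language): one accept loop hands each connection to its own thread, and shared table state is guarded by a lock. The hand evaluation, GUI, and referee logic are omitted.

import socket
import threading

state_lock = threading.Lock()
players = []                                  # shared game-tracking state

def handle_client(conn, addr):
    with conn:
        with state_lock:
            players.append(addr)
        conn.sendall(b"Welcome to the table\n")
        while True:
            data = conn.recv(1024)            # a player's move, bet, etc.
            if not data:
                break
            # a real referee would validate the move against the rules here,
            # then update the shared state and broadcast it to the other players
            with state_lock:
                pass

def serve(port=5555):
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("", port))
        server.listen()
        while True:                           # accept loop: one thread per client
            conn, addr = server.accept()
            threading.Thread(target=handle_client, args=(conn, addr), daemon=True).start()

# serve()  # start listening (blocks until interrupted)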

Susan Garnett - The World Wide Web
Computer Science - Ed Harcourt

In the 1960s the Department of Defense started a large-scale computer network that allowed researchers working on defense-related contracts to share documents, communicate, and access remote computers. We now refer to this network as the Internet. This semester I designed a database-driven Internet site to support the Hunger Project for the Center for Civic Engagement and Learning (CCEL). The site is developed using HTML, CSS, and JavaScript on the client side (to generate what the web user sees) and uses PHP on the web server to access the hunger database; these data in turn get rendered by the web browser for the user. The site is designed to be maintainable by CCEL staff who have no programming experience.

John LaShomb - Web Programming for Mobile Devices
Computer Science - Ed Harcourt

This semester I recreated several SLU web pages so they are suitable for access from both desktop computers and mobile devices. These sites include a version of the Dana dining menu as well as the faculty/staff/student phone directory. Mobile devices have small screens and usually suffer from slow connection speeds, but their portability makes them convenient for users. Web designers must structure their sites to serve both desktops and mobile devices and provide the best experience possible on each. This means offering mobile-friendly versions of sites that are lightweight, load fast, and are easy to navigate. Providing a site that is compatible with many different devices can be challenging. Since most mobile devices have a restricted web browser, it usually makes sense to create pages dynamically on the server for a specific device. Mobile devices also have features not present in a desktop browser. For example, a telephone number rendered in a web page on a mobile device should be active in the sense that the user can tap the number to dial the phone; this makes less sense on a desktop.
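One common way to realize the device-specific page generation described above is to inspect the browser's User-Agent header on the server. The Python sketch below is illustrative only; the keyword list and template names are assumptions, not the ones used for the SLU pages.

MOBILE_KEYWORDS = ("iphone", "ipod", "android", "blackberry", "windows ce")

def choose_template(user_agent):
    # pick a lightweight page for phones, the full layout for desktops
    ua = user_agent.lower()
    if any(keyword in ua for keyword in MOBILE_KEYWORDS):
        return "directory_mobile.html"   # small page, active tel: links for dialing
    return "directory_desktop.html"      # full layout for large screens

print(choose_template("Mozilla/5.0 (iPhone; CPU iPhone OS 3_0 like Mac OS X)"))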

Simon Lynch - A Survey of OpenGL Shading Language
Computer Science - Rich Sharp

The purpose of this project was to gain a deep understanding of the OpenGL Shading Language. The goal was to learn to manipulate both the vertex and fragment shaders in order to create more realistic and customizable graphical programs.
The fixed functionality of OpenGL is not capable of per-fragment lighting, applying textures based on vertices, and many other low-level operations. In this project, I created shaders, applied textures, implemented per-fragment lighting, and distorted objects inside the vertex shader as they were being drawn.

James T. Perconti - A Pipelining Package for SystemC
Computer Science - Ed Harcourt

Pipelining is a common pattern in hardware design for increasing performance. It is a form of parallelism in which each stage of a sequential process can simultaneously operate on a different transaction. We have written a package for SystemC, a powerful hardware modeling language built on C++, that automates some aspects of modeling a hardware system that uses a pipeline, making it easier to efficiently create and test models for a variety of pipeline configurations. We used the package to build a simulated pipeline that performs floating-point addition, based on a five-stage algorithm.
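The package itself lives in SystemC, but the pattern it automates can be illustrated with a plain Python analogue (this is not the SystemC package, only a conceptual sketch): each stage runs concurrently, pulls transactions from an input queue, and pushes results downstream, so different stages work on different transactions at the same time.

import queue
import threading

def stage(work, inbox, outbox):
    # one pipeline stage: repeatedly take a transaction, process it, pass it on
    while True:
        item = inbox.get()
        if item is None:                # shutdown signal propagates down the pipe
            outbox.put(None)
            break
        outbox.put(work(item))

q0, q1, q2 = queue.Queue(), queue.Queue(), queue.Queue()
threading.Thread(target=stage, args=(lambda x: x + 1, q0, q1), daemon=True).start()
threading.Thread(target=stage, args=(lambda x: x * 2, q1, q2), daemon=True).start()

for x in [1, 2, 3, None]:
    q0.put(x)
while (result := q2.get()) is not None:
    print(result)                       # 4, 6, 8: the two stages overlap on different items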

Martin Sarov - Real-Time Simulation of Light Scattering by Dust Particles in a Room
Computer Science - Rich Sharp

This paper presents a method for efficiently simulating the scattering of light within participating media such as dust particles. Our goal is to create a sensible and relatively accurate simulation of this phenomenon that both allows for user interaction and is rendered in real time. For this purpose, we utilize the global illumination technique of photon mapping and integrate the resulting sampling data into our application using the OpenGL Shading Language. We also adopt a volumetric rendering technique and a trilinear filtering method in order to effectively produce the scene. Furthermore, we integrate an intuitive user interface so that our interactivity objective is fulfilled. Our end product is a computer simulation that can be run on any Windows machine. In it, the user is placed in the simulated environment of a dark room with a single window from which light is scattered. The user can both change the view freely within the confines of the room and control the levels of inelastic and elastic light scattering intensities. Our novel approach yields visually appealing results rendered in real time.

Adrienne Woodworth - Kung Fu Master for Teenvity
Computer Science - Ed Harcourt

In Summer 2009 I participated in an NSF-funded SURF-IT REU (Research Experience for Undergraduates) at the University of California, Santa Cruz. This REU involved creating an iPhone/iPod Touch application called Teenvity that is designed to motivate teenagers to exercise by playing games that require movement. The application is molded to the user’s personality; the system selects an agent, motivational phrases, and games to suggest to the user based on a short personality test. In order to create a quick prototype to get feedback from actual teens, we used games from the iTunes App Store. This was less than ideal, as the iPhone platform does not allow running applications simultaneously. For example, we could not automatically log the user’s play time or other data while they were outside of Teenvity playing the suggested games.

To overcome this problem, for my senior project, I created a companion game for Teenvity. The game, Kung Fu Master, runs within the application, eliminating the need for the user to log their own data. The game requires the user to pretend that there is a stack of bricks, a plank, etc. in front of them, and then swing the device in a chopping motion as though they were going to chop through the material. Each material requires an increasing level of force to break through. This application requires extensive use of the iPhone’s built-in accelerometer to detect motion. Kung Fu Master promotes healthy living by combining physical activity with a fun and rewarding game that reinforces to the user that exercise does not have to be arduous or boring. 

Cindy Chin - The Golden Ratio: the Most Irrational Among Irrational Numbers
Mathematics - Sam Vandervelde

The golden ratio is an irrational number that can be defined by the equation x^2 = x + 1, which gives the solution phi = (1 + sqrt(5)) / 2, approximately 1.6180339887. Among irrational numbers such as sqrt(2), sqrt(3), and pi, phi stands out as the most irrational of the irrationals. To make this claim precise, we use continued fractions to verify that phi is the real number for which approximation by rational numbers (fractions) is least successful. From these continued fractions, the famous Fibonacci sequence and the Lucas numbers arise.
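A quick numerical illustration of this claim uses the standard fact that the continued fraction of phi is [1; 1, 1, 1, ...], so its convergents are ratios of consecutive Fibonacci numbers; the short Python check below prints those convergents and their slowly shrinking errors.

from math import sqrt

phi = (1 + sqrt(5)) / 2
a, b = 1, 1                                  # consecutive Fibonacci numbers
for _ in range(10):
    a, b = b, a + b                          # next convergent is F(n+1)/F(n)
    print(f"{b}/{a} = {b / a:.10f}   error = {abs(b / a - phi):.2e}")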

Alexander Fisher - Magnetic Resonance Imaging: The Heat Equation, Proton Spins, and Signal Processing
Mathematics - Jim DeFranza

Magnetic resonance imaging (MRI) is a non-invasive imaging technique used for digitally photographing the internal structure of the human body. MRI was first introduced in 1980 and has since begun to replace the traditional, invasive methods of x-ray photography, which have been used in medicine since their discovery in 1895. MRI relies heavily on properties of elementary particles, such as proton spin, and on the mathematical techniques of Fourier analysis to create and process a digitally converted signal. Since Fourier analysis was originally introduced in 1807 and proton spin was first discovered in 1924, the natural question arises: why did MRI take so long to reach the medical setting? I will attempt to answer this question by considering the evolution and convergence of Fourier series and analog medical photography into digital signal processing and digital medical photography.

Ben von Reyn - Probability Bucketing of Independent Events: An alternative to the Monte Carlo approach
Mathematics - Duncan Melville

This study explores a new approach, presented by Hull and White, to approximating the distribution of a series of independent events. The approach segments the continuum of outcomes into buckets and assigns a value and a corresponding probability to each bucket. These values are calculated iteratively: we introduce a new event at each step and adjust the bucket values and probabilities accordingly. One main advantage of this approach is its computational speed relative to Monte Carlo simulation, since the final result is reached after only one pass through the events. It has applications to financial derivatives, where it allows the probabilities of various outcomes from these instruments to be modeled more quickly.
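The sketch below shows the bucketing idea in miniature, with made-up events and bucket boundaries rather than anything taken from Hull and White's paper: the range of total outcomes is cut into buckets, and each independent event is folded in exactly once, moving probability mass and updating each bucket's mean value.

import bisect

# (loss if the event occurs, probability it occurs) -- illustrative values only
events = [(10.0, 0.10), (20.0, 0.05), (15.0, 0.20)]
edges = [0, 5, 15, 25, 35, 50]                      # bucket boundaries for the total loss

# each bucket keeps [probability of landing here, mean total loss given that]
buckets = [[0.0, 0.0] for _ in range(len(edges) - 1)]
buckets[0] = [1.0, 0.0]                             # start: zero loss with certainty

def which_bucket(x):
    return min(bisect.bisect_right(edges, x) - 1, len(buckets) - 1)

for loss, p in events:                              # fold each event in exactly once
    new = [[0.0, 0.0] for _ in buckets]
    for prob, mean in buckets:
        if prob == 0.0:
            continue
        for added, q in ((0.0, 1 - p), (loss, p)):  # event does not / does occur
            k = which_bucket(mean + added)
            total = new[k][0] + prob * q
            new[k][1] = (new[k][0] * new[k][1] + prob * q * (mean + added)) / total
            new[k][0] = total
    buckets = new

for (low, high), (prob, mean) in zip(zip(edges, edges[1:]), buckets):
    print(f"total loss in [{low}, {high}): probability {prob:.4f}, mean {mean:.2f}")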

Tara Akstull - Poisson Models to Predict Scoring Rates in Hockey
Statistics - Robin Lock

Trying to predict the score of a hockey game can seem a complicated, even impossible task due to the fast-paced nature of the game. We propose to model scoring rates by investigating factors such as a team's offensive ability, the defensive ability of its opponent, and home-ice advantage. Because hockey scores are not normally distributed, we assume that scores follow a Poisson distribution and use these factors to build a Poisson regression model for the scoring rates. We apply this model to data from the 2008-2009 ECAC Division I Women's Ice Hockey season. Using the Poisson model we examine each team's scoring to generate a Poisson scoring rate and use the fit to produce an offensive rating, defensive rating, and predicted winning percentage for each team.
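A sketch of this kind of Poisson regression, written in Python with statsmodels (the abstract does not say what software the project used); the file name and column names (goals, off_team, def_team, home) are placeholders for however the ECAC game data are actually organized.

import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

games = pd.read_csv("ecac_womens_2008_09.csv")   # hypothetical file of game results

# log(expected goals) = attacking team effect + opposing defense effect + home-ice indicator
model = smf.glm("goals ~ C(off_team) + C(def_team) + home",
                data=games, family=sm.families.Poisson()).fit()
print(model.summary())

# The fitted team effects can then be turned into offensive/defensive ratings and,
# through the Poisson scoring rates, predicted winning percentages.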

Lauren Brozowski, Courtney Sawchuk and Jennifer Porter - Guitar Hero: Estimate If Your Song Has Groupies
Statistics - Ivan Ramler

Guitar Hero™ is a game that attracts people of all ages. It is a game of speed and skill, played with a guitar-shaped controller, in which players hit five notes either separately or sometimes in chords in order to simulate the guitar portion of a song. Some songs include a section called the "bridge," which contains a difficult solo the player must get through; it is during this section that players most often "flunk out" of a song. Missed notes occur when a player hits the wrong key, forgets to strum the guitar, or misses a note altogether. We created estimators to assess whether the notes missed by a player occur at random or in groups; if they occur at random, then the bridge of a song has no particular effect on the player's ability. We first tested our estimators on three artificial songs: the first had notes missed at random, the second had some grouping, and the third had obvious grouping. We then experimented with actual songs from Guitar Hero™ and found that missed notes do occur more often in groups, implying that in harder songs the bridge will certainly be a tough section to pass. Finally, we evaluated the effectiveness of our estimators through a series of simulation experiments designed to represent varying scenarios of non-random misses.
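The abstract does not spell out the estimators, so the sketch below shows just one simple way to test for grouped misses: compare the longest run of consecutive misses in a hit/miss sequence against sequences in which the same number of misses is scattered at random.

import random

def longest_miss_run(seq):                       # seq: 1 = hit, 0 = miss
    best = run = 0
    for note in seq:
        run = run + 1 if note == 0 else 0
        best = max(best, run)
    return best

song = [1] * 60 + [0] * 8 + [1] * 12             # toy song with a "bridge" of grouped misses
observed = longest_miss_run(song)

shuffled_runs = []
for _ in range(5000):                            # same misses, random positions
    random.shuffle(song)
    shuffled_runs.append(longest_miss_run(song))

p_value = sum(r >= observed for r in shuffled_runs) / len(shuffled_runs)
print("longest observed miss run:", observed, "  p-value:", p_value)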

Jeremy Hadler - Evaluating the Robustness of Competing Clustering Algorithms
Statistics - Ivan Ramler

Grouping individual data points that have no visible traits other than their proximity to other points on a graph (a proximity which is itself not always obvious) is useful but often difficult. One tool to address this problem is cluster analysis, an unsupervised process that allows us to construct groups of similar observations. As there are numerous competing clustering algorithms, the question arises, "Which algorithm is the most robust at identifying groups in data?" There are various ways to approach this question, but we focus on comparing the performance of five of the more popular clustering algorithms across various scenarios. In particular, we use a simulation study to assess the performance of each algorithm when the correct number of clusters is given, as well as its robustness under various incorrect numbers of clusters. The results are then used to suggest which algorithms are best in each situation. We then apply the best algorithm to grouping 2008-2009 ECAC men's and women's hockey player data, including information on games played, goals, assists, penalties, and spread, in the hope of identifying interesting groups of players.
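A small Python sketch of this kind of comparison is shown below, using scikit-learn; since the abstract does not name the five algorithms the project compared, the choices here are purely illustrative, and agreement with the true grouping is measured by the adjusted Rand index.

from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans, AgglomerativeClustering, SpectralClustering, Birch
from sklearn.mixture import GaussianMixture
from sklearn.metrics import adjusted_rand_score

# one simulated data set with a known grouping
X, truth = make_blobs(n_samples=300, centers=4, cluster_std=1.5, random_state=0)

candidates = {
    "k-means": KMeans(n_clusters=4, n_init=10, random_state=0),
    "hierarchical": AgglomerativeClustering(n_clusters=4),
    "spectral": SpectralClustering(n_clusters=4, random_state=0),
    "BIRCH": Birch(n_clusters=4),
    "Gaussian mixture": GaussianMixture(n_components=4, random_state=0),
}

for name, algorithm in candidates.items():
    labels = algorithm.fit_predict(X)
    # repeating this over many simulated data sets (and deliberately wrong
    # cluster counts) gives the robustness comparison described in the abstract
    print(name, round(adjusted_rand_score(truth, labels), 3))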

Meg Howard - Classification Trees and Predicting Breast Cancer
Statistics - Michael Schuckers

Classification trees are used with a categorical response variable. The goal of a classification tree is to derive a model that predicts the category to which a particular subject or individual belongs, based on one or more explanatory factors. For example, we could use a classification tree to predict a patient's diagnosis (benign or malignant) based on information obtained by doctors from scanned images. A classification tree is displayed as a decision tree with a starting node that branches into other nodes. Using classification and regression trees (CART), we fit a tree to data; once we have formulated a CART model through impurity measures and pruning, we evaluate its predictive ability. We apply this methodology to the Breast Cancer Wisconsin (Diagnostic) Data Set from the Machine Learning Repository. Upon finding the best CART, we compare it against a logistic regression model to check its accuracy.
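A compact sketch of this comparison uses scikit-learn's classification tree and logistic regression on the same Wisconsin diagnostic data (which ships with scikit-learn as well as the UCI repository); the cost-complexity parameter stands in for the pruning step, and none of this is meant to reproduce the project's actual CART model.

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)          # response: malignant vs. benign
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# larger ccp_alpha prunes more aggressively, giving a smaller, simpler tree
tree = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0).fit(X_train, y_train)
logit = LogisticRegression(max_iter=5000).fit(X_train, y_train)

print("tree accuracy:    ", tree.score(X_test, y_test))
print("logistic accuracy:", logit.score(X_test, y_test))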

Catherine Lane - Does Iron-Fortified Fish Sauce Reduce the Presence of Anemia?: Data Analysis and Simulations
Statistics - Jessica Chapman

Anemia is a blood disorder that, if left untreated, can cause serious complications, especially in women of childbearing age. A study was done in 2004 to investigate the effectiveness of iron-fortified fish sauce for controlling anemia rates in women of childbearing age from Vietnam. In this study, entire villages were assigned to either a treatment or control group. Studies of this nature are often referred to as cluster randomization trials. Across disciplines, several different methods have been used to analyze this type of data. We investigated whether the conclusions made from these data depend on the choice of method. Further, we performed a series of simulations to investigate how different aspects of the experimental design, such as cluster imbalance, intracluster correlation, and the number of clusters, impact the power of these statistical methods.
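As a rough illustration of the simulation side of this work (with made-up numbers, not values from the 2004 trial), the Python sketch below generates village-level anemia rates whose intracluster correlation comes from a village random effect, analyzes each simulated trial with one simple method (a t-test on cluster-level rates), and estimates power.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def simulate_trial(n_clusters=10, cluster_size=50, p_control=0.30,
                   effect=-0.08, sd_cluster=0.05):
    def cluster_rates(p):
        # village random effect shifts each cluster's underlying anemia probability
        probs = np.clip(p + rng.normal(0, sd_cluster, n_clusters), 0, 1)
        return rng.binomial(cluster_size, probs) / cluster_size
    control = cluster_rates(p_control)
    treated = cluster_rates(p_control + effect)
    return stats.ttest_ind(treated, control).pvalue

power = np.mean([simulate_trial() < 0.05 for _ in range(1000)])
print("estimated power for this design and method:", power)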

Paul Mercurio - Explaining the Spread of Raccoon Rabies using Spatial Data
Statistics - Michael Schuckers

My poster will present an example of statistical spatial analysis. In 1993, raccoon rabies appeared on the western end of Connecticut, and within 4 years made its way to the easternmost coast. Spatial data provided by Waller and Gotway (2004) includes the coordinates of 169 Connecticut townships and the relative times of the first appearance of rabies at each. Trend surface modeling will be used to investigate the spread of raccoon rabies through the state of Connecticut.
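Trend surface modeling amounts to regressing the time of first appearance on a low-order polynomial in the township coordinates. The Python sketch below shows the idea; the file and column names are placeholders for the Waller and Gotway data, and the original analysis was not necessarily carried out this way.

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

towns = pd.read_csv("ct_rabies.csv")          # hypothetical file with x, y, month columns

# quadratic trend surface: time ~ 1 + x + y + x^2 + xy + y^2
surface = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
surface.fit(towns[["x", "y"]], towns["month"])

print("R^2 of the fitted surface:", surface.score(towns[["x", "y"]], towns["month"]))
# Evaluating the surface on a grid of coordinates maps the predicted west-to-east spread.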

Charles Mildrum - Spatial Statistics and Point Patterns: An Analysis of Shot Patterns in the NHL
Statistics - Michael Schuckers

The application of spatial statistics to shot patterns in the National Hockey League yields valuable information that players, coaches, and organizations alike can use to improve the performance of their teams. Using a dataset consisting of the 74,521 shots taken in the 2008-2009 NHL regular season, this study attempts to use spatial statistical analysis to discern relationships for specific players, teams, and the league as a whole. For example, from which locations on the ice do the Montreal Canadiens give up the majority of their goals? Or, if your team is preparing for a game against the Washington Capitals, a coach may want to know where on the ice Alexander Ovechkin scores most of his goals. Understanding such relationships allows a team to adjust its defensive approach and, hopefully, lower the number of goals it gives up. Various spatial analysis techniques are used to determine these and many other relationships, and the relationships are expressed not only numerically but also visually: the X and Y coordinates of each shot, along with the intensity of shots at each location on the ice surface, are used to produce computer-generated contour and probability maps that illustrate the relationship in question.
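One way to produce the kind of contour map mentioned above is a kernel density (intensity) estimate of shot or goal locations. The Python sketch below is illustrative only; the data file, column names, and rink coordinate ranges are assumptions, not the study's actual setup.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

shots = pd.read_csv("nhl_shots_2008_09.csv")          # hypothetical file with x, y, goal columns
goals = shots[shots["goal"] == 1]

# smooth 2-D intensity estimate of where goals were scored
kde = gaussian_kde(np.vstack([goals["x"], goals["y"]]))
gx, gy = np.mgrid[-100:100:200j, -42.5:42.5:100j]     # grid over assumed rink coordinates
density = kde(np.vstack([gx.ravel(), gy.ravel()])).reshape(gx.shape)

plt.contourf(gx, gy, density, levels=15)
plt.title("Estimated goal intensity by rink location")
plt.show()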

Katherine Miller - Predicting Wins in Baseball
Statistics - Robin Lock

Baseball is the great American pastime. In this study we examine different aspects of baseball games to determine what factors play a role in predicting the winning team for a specific game or an entire season. To predict who is likely to win individual games, we consider factors such as each team’s offensive or defensive ability, home field advantage, past game scores, and previous winning percentage. We will also model winning percentage of an overall season, based on offensive production, pitching, and defense. We use these models and computer simulations to examine how often the best team actually wins.
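A toy version of the simulation idea in the last sentence: give each team a notional strength, simulate many seasons with win probabilities driven by the strength gap plus a home-field bump, and count how often the strongest team finishes with the best record. All numbers below are illustrative, not fitted values.

import numpy as np

rng = np.random.default_rng(0)
strength = np.array([0.60, 0.55, 0.52, 0.50, 0.48, 0.45])   # notional team quality
home_edge = 0.04
best_team_wins = 0
n_seasons = 1000

for _ in range(n_seasons):
    wins = np.zeros(len(strength))
    for home in range(len(strength)):
        for away in range(len(strength)):
            if home == away:
                continue
            for _ in range(9):                              # games per home/away pairing
                p_home = 0.5 + (strength[home] - strength[away]) + home_edge
                if rng.random() < np.clip(p_home, 0.05, 0.95):
                    wins[home] += 1
                else:
                    wins[away] += 1
    best_team_wins += wins.argmax() == 0                    # team 0 is the strongest

print("share of seasons won by the best team:", best_team_wins / n_seasons)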

Jennifer Porter - Detecting Hotspots with Spatial Analysis
Statistics - Michael Schuckers

The assessment and detection of areas with an abnormally high incidence of rare cancers --- sometimes called hotspots --- is an important epidemiological task. Spatial analysis is useful for determining whether there are clusters of high rates of rare diseases in a particular location. This type of analysis aims to determine whether the rates of cancer are randomly dispersed or are clustered in uncommon ways. Using a data set of leukemia rates in Upstate New York, I use spatial analysis to assess whether 'hotspots' are present.

Karma Sonam - Investigating the convergence rate of sampling distributions from skewed populations
Statistics - Ivan Ramler

The Central Limit Theorem (CLT) states that the distribution of the sample mean of independent and identically distributed random variables converges to the normal distribution as the sample size increases. A common rule of thumb is to consider sample sizes greater than 30 "large enough" to use the CLT as an approximation. However, the suitability of this rule of thumb depends on how far from normal the individual observations are. Using skewness as a measure of non-normality, this study investigates the normality of the distribution of sample means for Gamma-distributed and Poisson-distributed random variables with different levels of skewness. Various simulation techniques in R are used to test whether the CLT gives a good approximation to the distribution of the sample mean. As skewness increases, a larger sample is needed for the CLT to approximate the distribution of the sample mean well. The study uses polynomial regression to estimate, from the skewness and sample size, the convergence toward nominal coverage for confidence intervals of the mean. The results will help us decide, for a given skewness level and sample size, when it is appropriate to use the CLT and when alternative methods (such as the bootstrap) are needed.
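The project's simulations are written in R; the Python sketch below shows the same kind of check: draw repeated Gamma samples of a given size, form the usual normal-theory confidence interval for the mean, and see how far the actual coverage falls from the nominal 95% as skewness grows.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def coverage(shape, n, reps=5000, scale=1.0):
    # Gamma(shape): skewness = 2/sqrt(shape), true mean = shape * scale
    true_mean = shape * scale
    hits = 0
    for _ in range(reps):
        x = rng.gamma(shape, scale, size=n)
        half = stats.t.ppf(0.975, n - 1) * x.std(ddof=1) / np.sqrt(n)
        hits += abs(x.mean() - true_mean) <= half
    return hits / reps

for shape in (4.0, 1.0, 0.25):                       # increasing skewness
    print(f"skewness {2 / np.sqrt(shape):.1f}:",
          {n: round(coverage(shape, n), 3) for n in (10, 30, 100)})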

Marcus Tuttle - Exploring Markov Chain Monte Carlo Techniques
Statistics - Robin Lock 

Markov Chain Monte Carlo (MCMC) methods are powerful algorithms that enable statisticians to explore information about probability distributions through computer simulations when exact theoretical methods are not feasible. The Gibbs Sampler, for example, allows us to gather information about marginal and joint distributions of multivariate densities assuming that we know information about the conditional distributions. Of particular interest is the use of MCMC methods in Bayesian statistics to help estimate posterior distributions. In this project we illustrate several uses of MCMC methods through computer simulation and applications to real data.
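As one concrete illustration of the Gibbs sampler mentioned above, the standard bivariate-normal example alternates draws from the two normal full conditionals; the short Python sketch below recovers the joint correlation by simulation (a textbook example, not one of the project's own applications).

import numpy as np

rng = np.random.default_rng(0)
rho = 0.8
n_draws = 20000
samples = np.empty((n_draws, 2))
x, y = 0.0, 0.0

for i in range(n_draws):
    x = rng.normal(rho * y, np.sqrt(1 - rho**2))   # x | y ~ N(rho*y, 1 - rho^2)
    y = rng.normal(rho * x, np.sqrt(1 - rho**2))   # y | x ~ N(rho*x, 1 - rho^2)
    samples[i] = x, y

burned = samples[1000:]                            # discard burn-in draws
print("estimated correlation:", np.corrcoef(burned.T)[0, 1])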

Cara Valentine - Spatial Analysis of Road Kill Data
Statistics - Michael Schuckers

This project uses road kill data collected in 2007-2008 from three different routes in St. Lawrence County. Each road kill observation corresponds to a geographic location, so the data are considered spatial. The analysis of spatial data studies the distribution of points, detecting clustering and regularity; in other words, it treats occurrences as points in some type of space and looks for what are known as "point patterns." The road kill data were analyzed using a function implementing Ripley's K, written in the statistical software R. Ripley's K is unique in that it considers not only nearest-neighbor points but also accounts for edge and overlap effects in the data. Moreover, Ripley's K combines distance measurements with modular arithmetic, which provides a more responsive analysis for the road kill data. Finally, the function was applied to the data to test the hypothesis that the road kill locations occurred randomly along each of the routes.
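The project's analysis was done in R with edge and overlap corrections; the bare-bones Python sketch below shows only the core of the Ripley's K estimate for locations treated as positions along a single route of length L, with toy data standing in for the actual observations.

import numpy as np

def ripley_k(positions, L, distances):
    positions = np.asarray(positions, dtype=float)
    n = len(positions)
    gaps = np.abs(positions[:, None] - positions[None, :])  # pairwise distances along the route
    k_values = []
    for t in distances:
        pairs = (gaps <= t).sum() - n                       # count pairs within t, excluding self-pairs
        k_values.append(L * pairs / (n * (n - 1)))
    # under complete spatial randomness on a line (ignoring edges), K(t) is roughly 2*t
    return k_values

roadkill = [0.3, 0.4, 0.45, 2.1, 2.15, 5.0, 5.6, 9.8]       # toy positions in km along a 10 km route
print(ripley_k(roadkill, L=10.0, distances=[0.1, 0.5, 1.0, 2.0]))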

Cara Valentine, Lisa VanderVoort and Zach Graham - Strumming Like a Rockstar: A Statistical Approach to Modeling Missed Notes in Guitar Hero™
Statistics - Michael Schuckers

Guitar Hero™ is a popular video game, played by many college students today, in which gamers "play" a guitar along with various songs. People with experience playing the game know that it keeps track of a player's accuracy, recording hits and misses of notes to give the user an overall score. Songs vary in difficulty, but even within a song some parts seem much harder than the rest. The students in Dr. Ramler's Math 326 class investigated this by developing estimators to test whether missed notes in a song occur at random. Students collected their own data by playing songs such as "Hungry Like the Wolf" by Duran Duran and "Ring of Fire" by Johnny Cash on Guitar Hero™ in the QRC, and used these data as an application of their estimators. Hits and misses for the songs were recorded as vectors of zeroes and ones. The statistical software package R was used to test, via bootstrapping, the hypothesis that notes were missed at random. Finally, the estimators were evaluated through a series of simulated scenarios representing different types of missing sequences.