# Current student research projects

To be updated mid-Spring 2014...

# Past student research projects

## Julian Applebaum: *A model of outbound client traffic on the Tor anonymity network* (Honors thesis, 2013; advisor: Norman Danner)

Tor is a popular low latency anonymity system. It works by routing web traffic through a series of relays operated by volunteers across the globe. Academic research on Tor frequently involves proposing new attacks on the network, creating defenses against these attacks, and designing more efficient methods for routing traffic. For ethical and practical reasons, it is often necessary to perform this research on a simulated version of the live Tor network. ExperimenTor and Shadow are two simulation environments that realistically model many aspects of the extant network. Both of these environments, however, only offer crude models of the outgoing web traffic generated by actual Tor users. In this thesis, we seek to improve on these models by collecting traffic data from the live network, clustering it into groups with similar behavior, then training a Hidden Markov Model on each cluster.

## Diego Calderon: *Systemic testing of improved bacteria speciation algorithms* (Honors thesis, 2013; advisors: Fred Cohan [Biology], Danny Krizanc)

For scientists to understand microbial ecosystems, identifying the fundamental unit of bacterial species is imperative. Previous attempts involved characterizing metabolic pathways, or percentage of genome sequence similarity, however these approaches have proven ineffective. The Cohan lab developed a program called Ecotype Simulation (ES) that attempts to demarcate bacterial species based on the Stable Ecotype model of speciation. Previously, our lab has demonstrated the superior accuracy of ES, compared to other demarcators (AdaptML, BAPS, GMYC), with field experiments and sequence simulations. However, ES lacks the efficiency of the other programs. Recently we have made several improvements to the algorithm behind ES that increase its efficiency by a factor of ten to twenty. With these improvements I aim to demonstrate the high accuracy of our second version of Ecotype Simulation (ES2) compared to ES for smaller inputs and then show ES2’s superiority to other demarcators for larger size inputs.

## Lingyan Ke: Parallelization for ecotype simulation (Honors thesis, 2013; advisors: Fred Cohan [Biology], Danny Krizanc)

The main goal of this project is to design and implement a new high performance version of the Ecotype Simulation (ES) algorithm for demarcating bacterial ecotypes from DNA data that is suitable for use on multi-core and distributed computing platforms such as a cluster. In this thesis we parallelize ES by OpenMP and test the resulting program in the Wesleyan Cluster. The test results (on simulated data) show the potential for a significant increase in the size of the data sets ES can process in comparison to the current sequential implementation.

## Stefan Sundseth: Identity authentication and secrecy in the Pi-Calculus and Prolog (Honors thesis, 2013; advisor: Jim Lipton)

Robin Milner developed the Pi-Calculus to formalize the communication between computational systems. Due to it's rigorous nature, an especially significant use of the Pi-Calculus is the evaluation of security protocols, where matters of identity authentication and secrecy are of utmost importance. To automate studies of this sort, Martin Abadi and Bruno Blanchet developed a translation of the Pi-Calculus into the logic programming language Prolog. Using this translation, a complex protocol can be executed to discover if it contains any security flaws. In this thesis I describe a modified version of the Needham-Schroeder protocol in the Pi-Calculus to formally show how its weakness to a man-in-the-middle attack leads to an identity authentication failure, and how that failure can be transformed into the loss of a secret to an attacker. I also show that the same flaw can be found by the program created from Abadi and Blanchet's translation. Finally, I prove that if executing the translation of a Pi-Calculus protocol into Prolog cannot derive the goal *attacker(s)*, meaning that the attacker knows s, then that protocol preserves the secrecy of s and can be called secure.

## Jennifer Paykin: *A formalism for certified cost bounds* (Honors thesis, 2012; advisor: Norman Danner)

We develop a static cost analysis system for a functional target lnaguage with structural list recursion. The cost of evaluating a target expression is defined to be the size of its evaluation derivation in the operational semantics. The complexity of a target expression is a pair consisting of a cost and a potential, which is a measure of size. As an example, the potential of a list is its length and the potential of a lambda-abstraction is a function from potentials to complexities. Complexities are compositional in that the complexity of a term depends only on the complexity of its subterms. A translation function ||.|| maps target expressions to complexities. Our main result is the following soundness theorem: If *t* is a term in the target language, then the cost component of ||*t*|| is an upper bound on the cost of evaluating *t*. We formalize the mathematical development of the system in the proof assistant Coq. We implement target language typing and evaluation in Coq, as well as an independent complexity language to reason about complexities. By formalizing the proof of the soundess theorem in Coq, we obtain a system which automatically generates certified upper bounds on the complexity of target-language terms.

## Jeff Ruberg: *Static type-checking for Python* (Honors thesis, 2012; advisor: Norman Danner)

Pyty is a static type-checker for the Python programming language. Pyty imposes a static type system on users, for which they provide information through type declarations contained in comments. This clashes with many views of how to use Python, but is ideal for many sane scenarios and for developers that prefer guaranteed stability over code elegance. The type system developed is modeled on the Hindley-Milner type inference algorithm for the lambda calculus and its application to ML. Pyty also employs type inference for a restricted subset of the Python language in order to facilitate the proof search algorithm behind type-checking. This approach—verifying if a type is assignable—contrasts with similar works that perform type inference for compiler optimization by assigning types to data and modeling the flow of that data through the program.

## David White: *Traversal of infinite graphs with random local orientations* (M.A. thesis, 2012; advisor: Danny Krizanc)

In 1921 George Poyla famously resolved the question of recurrence versus transience of the simple random walk on integer lattices. I studied this question for the so-called basic walk which arises in the field of robotics as a method for instructing a robot with limited memory on how to explore a room. In my thesis I resolve the question of recurrence versus transience of the basic walk on lattices and then extended this result to all locally finite graphs of bounded degree. I also provide a number of examples to show that the hypotheses in my theorems are necessary and to demonstrate configurations which make the robot do interesting things. Finally, I consider finite graphs and the robotics questions which originally motivated the definition of the basic walk--in particular how quickly the basic walk explores a graph and how much of the graph we can expect a given basic walk to explore. I end with several results on finite graphs, with some conjectures based on experiments, and with some directions for future work.

## Will Boyd: *A simulation of circuit creation in Tor* (Honors thesis, 2011; advisors: Norman Danner, Danny Krizanc)

We simulate the circuit creation behavior of Tor. We first observe the behavior of the network by probing individual routers; we then cluster those observations and then train a hidden Markov model for each cluster. We use these hidden Markov models to produce simulations of the network, and show by a probabilistic evaluation that this simulation produces a more accurate simulation than others proposed and used previously.

## Sam DeFabbia-Kane: *Evaluating the effectiveness of correlation attacks on Tor* (Honors thesis, 2011; advisors: Norman Danner, Danny Krizanc)

Tor is a widely used low-latency anonymity system. It allows users of web browsers, chat clients, and other common low-latency applications to communicate anonymously online by routing their connection through a circuit of Tor routers. However, Tor is commonly assumed to be vulnerable to a wide variety of attacks; one such is an end-to-end correlation attack, whereby an attacker controlling the first and last router in a circuit can use timing and other data to correlate streams observed at those routers and therefore break Tor's anonymity. Most tests of correlation algorithms have been either in simulation or against certain kinds of traffic; in this thesis we test three such algorithms on the deployed Tor network. We also describe a new correlation algorithm. We show that while previously-described attacks do not fare well on the deployed network, our algorithm does work reliably.

## Eli Fox-Epstein: *Forbidden pairs make problems hard* (Honors thesis, 2011; advisor: Danny Krizanc)

Many problems in computer science can be formulated in terms of graphs. Thus, classifying large swaths of graph problems as NP-complete can be useful when reasoning about complexity in any domain. A subgraph of a graph respects a set of forbidden edge pairs if no pair of its edges are in the set of forbidden edge pairs. Given a graph property, the forbidden pairs decision problem is whether or not an arbitrary graph has a subgraph that satisfies the property while respecting a set of forbidden edge pairs. I seek to prove that the forbidden pairs decision problem is NP-complete for interesting classes of graph properties.

## Juan Mendoza: *A Dynamical Systems-based Model for Indoor Robot Navigation* (M.A. thesis, 2011; advisor: Eric Aaron)

Common indoor environments are particularly challenging scenarios for autonomous robot navigation: robots need to navigate in relatively tight and complicated paths, and at the same time be responsive to unpredictable changes in the environment, such as those created by humans walking around. The dynamical systems approach to navigation enables very responsive navigation in changing environments, but previous models that use this approach have not focused on the complexities of indoor environments and the challenges they can present to reactive methods such as dynamical systems-based ones. This thesis presents a new dynamical systems-based model that seeks to improve performance in complex obstacle configurations while maintaining the responsiveness of traditional dynamical systems-based models. This is achieved through three contributions: new competitive dynamics between target seeking and obstacle avoidance more accurately model the situations where either of those tasks should be turned off; adaptive obstacle representations provide a single effective framework for mapping obstacles of different shapes to repellers; observed obstacles are manipulated by joining ones that are too close to each other into larger obstacles and approximating sharp concavities with smooth arcs to minimize problems during navigation. Experiments using real and simulated robots show that this model enables effective navigation in several particularly challenging indoor scenarios, and simulations suggest that the model could enable safe and effective navigation in a natural indoor office environment.

## Foster Nichols: *Dynamical BDI-Based Intention Recognition for Virtual Agents* (M.A. thesis, 2011; advisor: Eric Aaron)

Virtual agents might need to recognize the intentions of other agents in order to adapt to a dynamic environment. This thesis presents a way to incorporate intention recognition into the cognitive system of a hybrid dynamical cognitive agent (HDCA). HDCAs have cognitive systems that consist of beliefs, desires, and intentions that evolve dynamically over time. While intention recognition methods for Belief-Desire-Intention (BDI) architectures have been implemented in the literature, none have been based on dynamical BDI systems. This work was motivated by the intention recognition tasks that might face an informal college tour guide; adding intention recognition beliefs to a guide's cognitive system allows the guide to respond dynamically to the intentions of a tour taker.
## Sam DeFabbia-Kane: Tor (Research project, 2009-10; advisors: Norman Danner, Danny Krizanc)

Tor (The Onion Router) is an anonymity network. It protects the anonymity of its users by routing their network connections through a circuit of three proxies located around the world. There have been many attacks proposed against Tor aimed at compromising the anonymity it provides. One of the most common types of attack is an end-to-end correlation attack. If an attacker controls the first and last nodes in a circuit, they can compare the network streams at each and identify the sender and the recipient of the stream, compromising its anonymity. One way for an attacker to increase the chance of that happening is for them to kill any circuits they have nodes present in but can't control. The user is then required to create a new circuit, giving the attacker another chance to compromise it. We're working on a method of detecting attackers doing that that's feasible to implement and run on the Tor network.## Carlo Francisco: Ecotype simulation (Research project, 2009-10; advisor: Danny Krizanc)

A central question in microbiology is finding a consistently reliable method to demarcate the groups playing ecologically distinct roles (ecotypes) in a set of bacterial cells within a natural community. Multiple programs have been developed for this purpose. We want to test the accuracy of four of these programs, Ecotype Simulation, AdaptML, GMYC, and BAPS, on real and simulated data. The goal is to see which one yields closer results to the actual known demarcations of the clades that will be tested. Since bacterial cells can be isolated from the habitat they're specialized to, we make use of a variable called "specialization," or the probability that a particular kind of cell (differentiated by its DNA sequence) is isolated in this way. We also intend to include the relative running time of each of the algorithms in the study.

## Juan Pablo Mendoza: *Dynamical-systems based navigation: Modeling and verification* (Honors thesis, 2010; advisor: Eric Aaron)

Autonomous navigation modeling and verification are valuable parts of the design of situated autonomous agents. Navigation modeling is necessary to create mobile agents, such as robots, while verification makes it possible to analyze such models to identify flaws or prove correctness. This paper explores modeling and verification in the context of the dynamical systems approach to navigation (dynamical navigation, for short), in which autonomous navigation arises from the evolution of differential equations governing agent behavior. While dynamical navigation has considerable desirable qualities, like robustness in unpredictable environments and obstacle avoidance at constant speed, its underlying mathematics has the restricting limitation that obstacles must be represented by circles. Representing elongated obstacles, such as walls, is therefore a challenging problem that this paper addresses by presenting a dynamic tangent (DT) model for wall representation: an agent represents each wall by a circular obstacle, tangent to the wall, that resizes and moves with the agent to support effective dynamical navigation. Simulations show that the DT model supports more successful navigation than previously employed wall representations, and it is comparable in efficiency to the fastest previous representations. This paper also describes a heuristic reachability model for analyzing dynamical navigation. This model, based on reachability analysis, examines the overall behavior of a system by analyzing approximations of its behavior locally in subregions of the world. Heuristic reachability is applied to estimate relative difficulty of different navigation environments.

## Foster Nichols, *Constructing polygonal maps for navigating agents using extracted line segments* (Honors thesis, 2010; advisor: Eric Aaron)

This thesis describes a way to use line segments extracted from images to construct precise polygonal maps of obstacles in an environment. Agents such as robots or animated characters could use naively extracted line segments to avoid obstacles as part of intelligent navigation, except that extracted segments do not usually perfectly represent obstacles: for example, squares found by a popular technique called the Hough Transform (HT) often have gaps in one side or have two sides that do not meet at a corner. This research augments the HT to extend and join extracted line segments so that obstacles are more precisely represented. In a series of tests, simple shapes were represented precisely, but more complicated shapes were sometimes mistakenly joined to other shapes or confused with noise in the image. Future work could improve the line-joining algorithm and generalize the augmented HT to a broader range of environments.

## Henny Admoni, *Demonstrations of Dynamical Intention for Hybrid Agents* (M.A. thesis, 2009; advisor: Eric Aaron)

Representations of intention shared by reactive and deliberative systems of hybrid agents enable seamless integration of high-level logical reasoning and low-level behavioral response. This thesis presents an architecture for hybrid dynamical cognitive agents (HDCAs), hybrid reactive/deliberative agents with cognitive systems of continuously evolving beliefs, desires, and intentions based on BDI and spreading activation network models. Dynamical intentions support goal-directed behavior in both reactive and deliberative systems of HDCAs: on the reactive level, dynamical intentions allow for continuous cognitive evolution and real-time task re-sequencing; on the deliberative level, dynamical intentions enable logical reasoning and plan generation. Because intention representations are shared between both systems, reactive behavior and goal-directed deliberation are straightforwardly integrated in HDCAs. Additionally, Hebbian learning on connections in the spreading activation network of beliefs, desires, and intentions trains HDCAsâ€™ reactive systems to respond to typically deliberative-level information. To establish comparability between HDCAs and traditional BDI-based architectures, dynamical intentions are shown to be consistent with the philosophical definition of intention from other BDI models. Simulations of autonomous, embodied HDCAs navigating to complete tasks in a grid city environment illustrate dynamical, intention-based behavior that derives from clean integration of reactive and deliberative systems.
## Adam Robbins-Pianka, *Modeling mRNA Secondary Structure Effects on Translation Initiation in Saccharomyces cerevisiae* (M.A. thesis, 2009; advisor: Mike Rice)

This M.A. thesis studies the influence of RNA secondary structure on translational events in Saccharoymces cerevisiae.
## Jon Gillick, *A Clustering Algorithm for Recombinant Jazz Improvisations* (Honors thesis, 2009; advisor: Danny Krizanc)

Music, one of the most structurally analyzed forms of human creativity, provides an opportune platform for computer simulation of human artistic choice. This thesis addresses the question of how well a computer model can capture and imitate the improvisational style of a jazz soloist. How closely can improvisational style be approximated by a set of rules? Can a computer program write music that, even to the trained ear, is indistinguishable from a piece improvised by a well-known player?

We discuss computer models for jazz improvisation and introduce a new system, Recombinant Improvisations for Jazz Riffs (Riff Jr.), based on Hidden Markov Models, the global structure of jazz solos, and a clustering algorithm. Our method represents improvements largely because of attention paid to the full structure of improvisations.

To verify the effectiveness of our program, we tested whether listeners could tell the difference between human solos and computer improvisations. In a survey asking subjects to identify which of four solos were by Charlie Parker and which were by Riff Jr., only 45 percent of answers among 120 people were correct, and less than 5 percent identified all four correctly.

## Henny Admoni, *Decision Making and Learning for Hybrid Dynamical Agents* (Honors Thesis, General Scholarship, 2008; advisor: Eric Aaron)

Models of human decision making and learning may benefit from being implemented in cognitively-inspired, dynamic frameworks. This paper describes one such framework built on a hybrid dynamical system, which models agent cognition as the continuous evolution of cognitive elements with occasional discrete changes in patterns of behavior. Cognitive elements are beliefs, desires, and intentions, from the Belief-Desire-Intention theory of human rationality, along with ground concepts and sequencing intentions. Continuous dynamics of the hybrid system result from changes in element activation levels; activation can spread to related elements over links in a neurologically-inspired spreading activation network. This hybrid system performs decision making by selecting new patterns of behavior when activation levels of elements reach certain configurations, and performs learning by strengthening spreading activation links between elements. This system is capable of representing long-term sequences of actions, or plans, as well as dynamically replanning by selecting new actions when environmental changes make old action sequences undesirable. In simulations of the model, agents navigated grid world environments while dynamically selecting goals and actions to fulfill those goals, and autonomously learned associations between people and places in their environments.
## Bach Dao, *A Simulation Study of Protein Family Degree Distribution* (Honors thesis, 2008; advisor: Danny Krizanc)

The focus of this thesis is the degree distribution of protein family network graphs. We propose three models of evolution that generate a protein family. The first one uses Sequence Alignment to quantify protein relationship while the second uses the number of mutations accumulated on proteins. The last model incorporates explicitly preferential attachment. Following many other studies, we consider three operations: duplication, gene death and mutation. The main result that we report from the three models is that although the exponential distribution is the best fit of the data, the power law distribution fits the data well with the rates of evolution found in many studies.

## Jesse Farnham, *Performance Enhancement and Equivalence Criteria for Cellular Automaton-Based Tumor Simulations* (Honors thesis, 2008; advisor: Eric Aaron)

This paper describes a performance enhancement to a cellular automaton-based simulation of tumor growth. A cellular automaton is a grid-based data structure whose elements are updated according to specific rules. Elements represent small areas of simulated tissue and contain local cell population and nutrient concentration data for those areas. The simulation's computational efficiency is enhanced by suppressing updates to \emph{steady-state} tissue locations, areas with little change in nutrient concentration or cell population. This modification produces different tissue development from the original, canonical method in which all tissue areas are updated on every timestep. This paper defines several criteria by which a modified, efficient simulation can be considered equivalent to an original, canonical method; these criteria are applied to determine whether or not the performance enhancement can be considered equivalent to the original simulation.
## Daniel Hore, *Implementation and Type-checking for ***ATR** (Honors thesis, 2007; advisor: Norman Danner)

**ATR**

In this thesis we implement an interpreter and describe a principal-type algorithm for the language ATR (for *Affine Tiered Recursion*) defined by Danner and Royer. The interpreter and the principal type algorithm (when implemented) will allow programmers to program in ATR . This becomes interesting upon understanding the constraints guaranteed for any program that successfully compiles in ATR, namely that any program that compiles in ATR runs in polynomial time. However, the notion of polynomial time must be extended since ATR contains higher order types.

## Brendan Dolan-Gavitt, *Timing Attacks in Anonymity-Providing Systems* (Honors thesis, 2006; advisors: Norman Danner, Danny Krizanc)

In general, when a client accesses any server over the Internet, a great deal of information is trivially obtainable. In particular, the sender's IP address is included in every packet of data they transmit, and this information can be used to determine their Internet Service Provider and in many cases their approximate geographic location. In addition, each packet contains the IP address of its destination. Anonymous networks provide a way of preventing outside observers from determining that an initiator and a responder are, in fact, communicating.

The most common way of achieving this anonymity is by routing the connection through a series of intermediate proxy servers, along the way mixing or delaying packets in order to frustrate attempts at uncovering the sender through traffic analysis. However, for most interactive applications, such as web browsing, SSH sessions, or live chat, too much mixing or delaying will result in a poor or unusable experience for the user. A web browsing session, would quickly become frustrating if it took five minutes for each page to load. It seems, then, that there is a trade-off between anonymity and speed. This thesis examines what threats are actually posed by traffic analysis against low-latency mix-nets and how practical it is to implement them on a real network, using Tor as our case study.