2017

Emily Black: Towards a Formal Verification of Courcelle's Theorem (Honors thesis 2017; advisor: Dan Licata)

The goal of this project is to complete a formal verification of Courcelle's Theorem in Agda. At a high level, Courcelle's theorem states that if you ask certain kinds of questions about certain kinds of graphs, you can get a certain kind of linear-time answer. This theorem is important because it applies to many NP-hard problems, such as vertex cover and dominating set, and thus gives a quasi-linear algorithm for deciding certain questions (monadic second-order logic questions) on a certain class of graphs (bounded-treewidth graphs). Verification via proof assistants is a method of ensuring that a mathematical proof or a program is correct, in a way that removes much of the possibility for human error. At this stage, we have completed our translation into Agda of the definitions given in Courcelle's Theorem: A Game Theoretic Approach by Kneis et al., the proof that we have chosen to verify. This involved reformulating several complex mathematical concepts into machine-checkable definitions. Specifically, we define symbols, signatures, structures, expansions of structures, restrictions of structures, tree decompositions, MSO formulas, isomorphisms of structures, model checking games, extended model checking games, equivalence between game positions and subgames, and finally a notion of a reduced game. The bulk of this thesis document is a detailed argument that our encodings correspond to the mathematical definitions they formalize. This is in an effort to convince the reader that, once we finish the complete proof, it will be correct: the only way a proof assistant can certify a false result is for the author of the program to unintentionally encode the wrong statement of the theorem, so correct definitions are what matter. Additionally, we have defined the types of the functions corresponding to the lemmas in the paper that prove Courcelle's Theorem; however, we have yet to implement them. We also have yet to prove that the algorithm is fixed-parameter linear; this is next on our agenda after completing the proof of the algorithm itself. After this work is complete, we hope to extend our verification of Courcelle's Theorem to some of its applications.
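
As a concrete illustration (not drawn from the thesis itself), vertex cover is one of the NP-hard problems the theorem covers: the property "X is a vertex cover" is expressible as an MSO sentence, with the bound on the size of X handled by the optimization version of the theorem.

```latex
% "Some set X of vertices touches every edge" -- vertex cover in MSO.
\exists X\, \forall u\, \forall v\, \big( E(u,v) \rightarrow u \in X \lor v \in X \big)
```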

Joomy Korkut: Thinking Outside the Box: Verified Compilation of ML5 to JavaScript (Honors thesis 2017; advisor: Dan Licata)

The Curry-Howard correspondence describes a language that corresponds to propositional logic. Since modal logic is an extension of propositional logic, what language corresponds to modal logic? And if there is one, what is it good for? Murphy's dissertation argues that a programming language designed around modal type systems can provide elegant abstractions to organize local resources on different computers. In this thesis, I limit his argument to simple web programming and claim that a modal-logic-based language provides a way to write readable code and correct web applications. To do this, I defined a minimal language called ML5 in the Agda proof assistant, implemented a compiler to JavaScript for it, and proved its static correctness. The compiler is a series of type-directed translations through fully formalized languages, the last of which is a very limited subset of JavaScript. As opposed to Murphy's compiler, this one compiles to JavaScript on both the front end and the back end, through Node.js. Currently the last step of conversion to JavaScript is not entirely complete: we have not specified the conversion rules for the modal types, and network communication has only a partially working proof of concept.

Miriam Lester: Mining Design Rationale from Software Documentation Using Nature Inspired Metaheuristics (M.A. thesis 2017; advisor: Janet Burge)

Design rationale refers to the explicit documentation of the reasons design decisions are made, the alternative design options considered, and the justification supporting these decisions. If properly recorded during the software development life cycle, design rationale documentation can be extremely useful to software engineers. However, because of the costs associated with maintaining extra documentation, it is often omitted. Fortunately, design rationale is naturally embedded in other documents that are created in the general course of software development - such as bug reports, project specifications, emails, or meeting transcripts. I will introduce the text mining problem of automatic extraction of design rationale from such software documentation, and will describe the machine learning based approach to capturing the rationale. I will discuss the feature selection step of the machine learning method, and give background on the genetic algorithm and ant colony optimization metaheuristic chosen in this research for the feature selection procedure. I will show that by applying these text mining techniques to classification of design rationale in software documentation, we are able to achieve higher F-measure than our estimate of what would be achievable manually.
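
As a rough illustration of metaheuristic feature selection (the classifier, operators, and parameters below are stand-ins, not the thesis's configuration), a genetic algorithm can search over bit-vectors that select feature subsets, scoring each candidate by a classifier's cross-validated F-measure:

```python
# Illustrative sketch: GA over feature masks, fitness = cross-validated F1.
import random
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB

def fitness(mask, X, y):
    if not mask.any():
        return 0.0
    clf = MultinomialNB()              # stand-in text classifier
    return cross_val_score(clf, X[:, mask], y, cv=5, scoring="f1").mean()

def ga_select(X, y, pop_size=30, generations=40, p_mut=0.02):
    n = X.shape[1]
    pop = [np.random.rand(n) < 0.5 for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=lambda m: fitness(m, X, y), reverse=True)
        survivors = scored[: pop_size // 2]          # truncation selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, n)             # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            flip = np.random.rand(n) < p_mut         # bit-flip mutation
            children.append(child ^ flip)
        pop = survivors + children
    return max(pop, key=lambda m: fitness(m, X, y))
```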

Sam Stern:  The Tree of Trees: on methods for finding all non-isomorphic tree-realizations of degree sequences (Honors thesis 2017; advisor:  Danny Krizanc)

A degree sequence for a graph is a list of positive integers, one for every vertex, where each integer corresponds to the number of neighbors of that vertex.  It is possible to create sequences that have no corresponding graphs, as well as sequences that correspond to multiple distinct (i.e., non-isomorphic) graphs. A graph that corresponds to a given degree sequence is called a realization of that sequence, and those degree sequences corresponding to trees are referred to as tree-realizable, or tree-able, sequences. In this thesis, we present a program that determines the number of tree-realizations for any tree-able sequence, along with the sequences with the most tree-realizations (for fixed sequence lengths).
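
Two useful facts here: a positive-integer sequence of length n is tree-able exactly when its entries sum to 2(n-1), and labeled trees on n vertices correspond one-to-one with Prüfer sequences, in which a vertex of degree d appears d-1 times. That yields a brute-force baseline for small sequences (a sketch using networkx, not the thesis's program):

```python
# Enumerate labeled trees via Prüfer sequences, keep those matching the
# degree sequence, and deduplicate up to isomorphism.
from itertools import product
import networkx as nx

def tree_realizations(degrees):
    n = len(degrees)
    if n < 2 or min(degrees) < 1 or sum(degrees) != 2 * (n - 1):
        return []                       # not tree-able
    reps = []
    for prufer in product(range(n), repeat=n - 2):
        # Vertex v appears exactly deg(v) - 1 times in a Prüfer sequence.
        if sorted(prufer.count(v) + 1 for v in range(n)) != sorted(degrees):
            continue
        T = nx.from_prufer_sequence(list(prufer))
        if not any(nx.is_isomorphic(T, S) for S in reps):
            reps.append(T)
    return reps

print(len(tree_realizations([1, 1, 1, 2, 2, 3])))
```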

Maksim Trifunovski: An Interactive Proof Assistant for Linear Logic (Honors thesis 2017; advisor: Dan Licata)

Building formal proofs is made easier by tools called proof assistants. In this thesis we present the process of building a proof assistant for propositional intuitionistic linear logic. Linear logic is a refinement of classical and intuitionistic logic which puts its main focus on the role of its formulas as resources. While the consumption of formulas as resources is a big advantage of linear logic that other logics do not offer, it is also a burden when trying to build proofs. Resource allocation in rules like Tensor Right is a difficult task, since we need to predict where resources are going to be used by the rest of the proof before building it. Previous work by Hodas and Miller presents a way of using input-output contexts to work around this issue. But because we are building a proof assistant and not an automated theorem prover, we need to allow the user to switch between goals at any time and to construct the proof in any order they want, so we cannot solve the problem with such contexts. We tackle this problem by allowing unbounded context growth when moving up the proof derivation tree, and instead allowing terms to use only variables from a given resource multiset which is a subset of the whole context. These subsets then have to satisfy a given set of equations that make them suitable for simulating context splittings and changes. To allow for incremental proof construction we use metavariables to stand for incomplete terms and modal contexts to store those variables, building on previous work by Nanevski et al. We define two sequent calculi: a base calculus, and one that represents the implementation. We present a cut admissibility proof showing that our base sequent calculus is consistent, as well as a theorem showing that every proof in the implementation calculus has a corresponding proof in the base calculus.
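
The difficulty with Tensor Right is visible in the rule itself: read bottom-up, the resources of the conclusion must be split between the two premises before either subproof has been built.

```latex
% Tensor Right in intuitionistic linear logic: the context must split.
\frac{\Gamma \vdash A \qquad \Delta \vdash B}
     {\Gamma, \Delta \vdash A \otimes B}\ (\otimes R)
```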

2016

Celeste Barnaby:  Tools for extracting cost information from functional programs (Research project, Summer 2016; advisor:  Norman Danner)

Emily Black:  Formal verification of red-black tree deletion (Research project, 2015-16; advisor:  Daniel Licata)

Ben Hudson:  Computer-checked Recurrence Extraction for Functional Programs (M.A. thesis, 2016; advisor:  Daniel Licata)

This thesis continues work towards a computer-checked system for automatic complexity analysis of functional programs. Following [Danner, N., Licata, D. R., and Ramyaa, R., Denotational cost semantics for functional languages with inductive types, Proceedings of the International Conference on Functional Programming, 2015], we extract cost information from a source language to a monadic metalanguage which we call the complexity language, and give a denotational semantics to the complexity language which characterizes recurrences as monotone maps between preorder structures. Building upon the work presented in [Hudson, Bowornmet, Certified Cost Bounds in Agda: A Step Towards Automated Complexity Analysis, Honors Thesis, Wesleyan University, 2015], we extend the source language with more types, and prove a soundness result for the denotational semantics we give to the complexity language. In addition, we observe that the recurrences we obtain using our system resemble recurrences obtained manually. We implement all of our work in Agda, a dependently-typed programming language and proof assistant.

Theodore Kim:  Cost Semantics for Plotkin's PCF (M.A. thesis, 2016; advisor:  Norman Danner)

We develop a formalism for extracting recurrences from programs written in a variant of Plotkin's PCF, a typed higher-order functional language that integrates functions and natural numbers through general recursion. The complexity of an expression is defined as a pair consisting of a cost and a potential. The former is the explicit cost of evaluating the expression to a value while the latter is the cost of future use of the value. The extraction function is a language-to-language translation that makes evaluation costs explicit by mapping source language expressions to complexity language expressions. We prove by a logical relations argument that recurrences extracted by the extraction function are upper bounds for evaluation cost. The proof makes use of bounded recursion, a key property of which is the principle of fixed point induction, which allows us to reason about recursive computations.

Joomy Korkut: Verified regular expression matching, verified compilers (Research project, 2015-16; advisor:  Daniel Licata)

Justin Raymond:  Extracting Cost Recurrences from Sequential and Parallel Functional Programs (M.A. thesis, 2016; advisor:  Norman Danner)

Complexity analysis aims to predict the resources, most often time and space, which a program requires.  We build on previous work by [Danner, N., Paykin, J., and Royer, J., A static cost analysis of a higher-order language, Proceedings of the 7th Workshop on Programming Languages Meets Program Verification, 2013] and [Danner, N., Licata, D. R., and Ramyaa, R., Denotational cost semantics for functional languages with inductive types, Proceedings of the International Conference on Functional Programming, 2015], which formalize the extraction of recurrences for evaluation cost from higher-order functional programs. Source language programs are translated into a complexity language. The translation of a program is a pair of a cost, a bound on the cost of evaluating the program to a value, and a potential, the cost of future use of the value. We use the formalization to analyze the time complexity of higher-order functional programs. We also demonstrate the flexibility of the method by extending it to a parallel cost semantics, in which costs are cost graphs that express dependencies between subcomputations in the program. We prove by logical relations that the extracted recurrences are an upper bound on the evaluation cost of the original program, and give examples of the analysis of higher-order functional programs under the parallel evaluation semantics. We also prove that the recurrence for the potential of a program does not depend on the cost of the program.
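
Cost graphs are often summarized by two numbers: work, the total number of operations, and span, the length of the longest chain of dependencies. A toy rendering of that idea (not the thesis's complexity language):

```python
# Sequential composition adds work and span; parallel composition adds
# work but takes the max of spans.
from dataclasses import dataclass

@dataclass(frozen=True)
class Cost:
    work: int
    span: int

def seq(c1: Cost, c2: Cost) -> Cost:
    return Cost(c1.work + c2.work, c1.span + c2.span)

def par(c1: Cost, c2: Cost) -> Cost:
    return Cost(c1.work + c2.work, max(c1.span, c2.span))

STEP = Cost(1, 1)  # one unit of computation

def fold_cost(n: int) -> Cost:
    # Cost of a balanced parallel fold over n leaves, e.g. summing a tree.
    if n == 1:
        return STEP
    half = fold_cost(n // 2)
    return seq(par(half, half), STEP)  # run halves in parallel, then merge

print(fold_cost(8))   # work grows linearly, span logarithmically
```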

Pi Songkuntham:  Tools for extracting cost information from functional programs (Research project, Summer 2016; advisor:  Norman Danner)

Max Trifunovski:  Verified regular expression matching, implementation of directed type theory (Research project, 2015-16; advisor: Daniel Licata)

Sam Zhu:  Tools for extracting cost information from functional programs (Research project, Summer 2016; advisor:  Norman Danner)

2015

Ben Hudson:  Certified Cost Bounds in Agda:  A Step Towards Automated Complexity Analysis (Honors thesis, 2015; advisor:  Daniel Licata)

Automatically computing the time complexity of a recursive program relies on not only extracting recurrence relations from source code, but also proving that the costs obtained by the recurrences are upper bounds on the running time of the code. We do this by proving a bounding theorem, which states that the costs produced by the translation of source programs represent upper bounds on the costs specified by the source language operational semantics. This approach is inspired by the work of Danner, Paykin, and Royer. We formalize our work in Agda, a dependently-typed functional programming language and proof assistant, to ensure that we develop a reliable and mathematically correct system with which to extract recurrences.

Shalisa Pattarawuttiwong:  The Effects of Clustering Technique on Simulation of the Tor Network (Honors thesis, 2015; advisor:  Norman Danner)

Tor is a low-latency network built for anonymity online. Due to ethical concerns, attacks and protocol changes cannot be attempted on the actual network, and as a result considerable work has been done in an attempt to precisely model the network itself. While many simulation platforms attempt to model the Tor network for research purposes, most platforms contain simple, unrealistic models of the behavior of actual Tor users. Without an appropriate model of network traffic, the simulated network cannot fully mimic the real Tor network, potentially weakening any previous conclusions. This thesis presents an examination of the effects of the choice of clustering algorithm and distance measure on the quality of a user model. In particular, the goal is to improve a previously established generative model of outbound data, with the end goal of realistically simulating user traffic on the Tor network.

2014

Elliot Meyerson:  The Dynamic Map Visitation Problem:  Foremost Waypoint Coverage of Time-varying Graphs (Honors thesis, 2014; advisors:  Eric Aaron, Danny Krizanc)

This thesis introduces the Dynamic Map Visitation Problem (DMVP), in which a team of agents must visit a collection of critical locations as quickly as possible, in an environment that may change rapidly and unpredictably during the agents' navigation. We apply recent formulations of time-varying graphs (TVGs) to DMVP, shedding new light on the computational hierarchy $\mathcal{R} \supset \mathcal{B} \supset \mathcal{P}$ of TVG classes by analyzing them in the context of graph navigation. We provide hardness results for all three classes, and for several restricted topologies, we show a separation between the classes by showing severe inapproximability in $\mathcal{R}$, limited approximability in $\mathcal{B}$, and tractability in $\mathcal{P}$. We also give topologies in which DMVP in $\mathcal{R}$ is fixed parameter tractable, which may serve as a first step toward fully characterizing the features that make DMVP difficult.

2013

Julian Applebaum:  A model of outbound client traffic on the Tor anonymity network (Honors thesis, 2013; advisor:  Norman Danner)

Tor is a popular low latency anonymity system. It works by routing web traffic through a series of relays operated by volunteers across the globe. Academic research on Tor frequently involves proposing new attacks on the network, creating defenses against these attacks, and designing more efficient methods for routing traffic. For ethical and practical reasons, it is often necessary to perform this research on a simulated version of the live Tor network. ExperimenTor and Shadow are two simulation environments that realistically model many aspects of the extant network. Both of these environments, however, only offer crude models of the outgoing web traffic generated by actual Tor users. In this thesis, we seek to improve on these models by collecting traffic data from the live network, clustering it into groups with similar behavior, then training a Hidden Markov Model on each cluster.
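
As a schematic of the cluster-then-model pipeline (the libraries, features, and parameters here are illustrative stand-ins, not the thesis's implementation):

```python
# Cluster per-connection traffic summaries, then fit one hidden Markov
# model per cluster of connections.
import numpy as np
from sklearn.cluster import KMeans
from hmmlearn import hmm

def build_models(features, sequences, n_clusters=4, n_states=3):
    # features: one summary vector per connection (e.g. byte counts, rates)
    # sequences: list of per-connection observation sequences, shape (T_i, d)
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(features)
    models = {}
    for c in range(n_clusters):
        seqs = [s for s, l in zip(sequences, labels) if l == c]
        if not seqs:
            continue
        X = np.concatenate(seqs)
        lengths = [len(s) for s in seqs]
        m = hmm.GaussianHMM(n_components=n_states, n_iter=50)
        m.fit(X, lengths)          # one generative model per cluster
        models[c] = m
    return models

# Simulated traffic for cluster c is then drawn with models[c].sample(T).
```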

Diego Calderon:  Systemic testing of improved bacteria speciation algorithms (Honors thesis, 2013; advisors:  Fred Cohan [Biology], Danny Krizanc)

For scientists to understand microbial ecosystems, identifying the fundamental unit of bacterial species is imperative. Previous attempts involved characterizing metabolic pathways or percentage of genome sequence similarity; however, these approaches have proven ineffective. The Cohan lab developed a program called Ecotype Simulation (ES) that attempts to demarcate bacterial species based on the Stable Ecotype model of speciation. Previously, our lab has demonstrated the superior accuracy of ES, compared to other demarcators (AdaptML, BAPS, GMYC), with field experiments and sequence simulations. However, ES lacks the efficiency of the other programs. Recently we have made several improvements to the algorithm behind ES that increase its efficiency by a factor of ten to twenty. With these improvements I aim to demonstrate the high accuracy of our second version of Ecotype Simulation (ES2) compared to ES for smaller inputs, and then show ES2's superiority to other demarcators for larger inputs.

Lingyan Ke:  Parallelization for ecotype simulation (Honors thesis, 2013; advisors:  Fred Cohan [Biology], Danny Krizanc)

The main goal of this project is to design and implement a new high-performance version of the Ecotype Simulation (ES) algorithm for demarcating bacterial ecotypes from DNA data that is suitable for use on multi-core and distributed computing platforms such as a cluster. In this thesis we parallelize ES using OpenMP and test the resulting program on the Wesleyan Cluster. The test results (on simulated data) show the potential for a significant increase in the size of the data sets ES can process in comparison to the current sequential implementation.

Stefan Sundseth: Identity authentication and secrecy in the Pi-Calculus and Prolog (Honors thesis, 2013; advisor:  Jim Lipton)

Robin Milner developed the Pi-Calculus to formalize the communication between computational systems. Due to its rigorous nature, an especially significant use of the Pi-Calculus is the evaluation of security protocols, where matters of identity authentication and secrecy are of utmost importance. To automate studies of this sort, Martin Abadi and Bruno Blanchet developed a translation of the Pi-Calculus into the logic programming language Prolog. Using this translation, a complex protocol can be executed to discover whether it contains any security flaws. In this thesis I describe a modified version of the Needham-Schroeder protocol in the Pi-Calculus to formally show how its weakness to a man-in-the-middle attack leads to an identity authentication failure, and how that failure can be transformed into the loss of a secret to an attacker. I also show that the same flaw can be found by the program created from Abadi and Blanchet's translation. Finally, I prove that if executing the translation of a Pi-Calculus protocol into Prolog cannot derive the goal *attacker(s)*, meaning that the attacker knows s, then that protocol preserves the secrecy of s and can be called secure.

2012

Jennifer Paykin:  A formalism for certified cost bounds (Honors thesis, 2012; advisor:  Norman Danner)

We develop a static cost analysis system for a functional target language with structural list recursion.  The cost of evaluating a target expression is defined to be the size of its evaluation derivation in the operational semantics.  The complexity of a target expression is a pair consisting of a cost and a potential, which is a measure of size.  As an example, the potential of a list is its length and the potential of a lambda-abstraction is a function from potentials to complexities.  Complexities are compositional in that the complexity of a term depends only on the complexity of its subterms.  A translation function ||.|| maps target expressions to complexities.  Our main result is the following soundness theorem:  if t is a term in the target language, then the cost component of ||t|| is an upper bound on the cost of evaluating t.  We formalize the mathematical development of the system in the proof assistant Coq.  We implement target language typing and evaluation in Coq, as well as an independent complexity language to reason about complexities.  By formalizing the proof of the soundness theorem in Coq, we obtain a system which automatically generates certified upper bounds on the complexity of target-language terms.

Jeff Ruberg:  Static type-checking for Python (Honors thesis, 2012; advisor:  Norman Danner)

Pyty is a static type-checker for the Python programming language. Pyty imposes a static type system on users, who provide information through type declarations contained in comments. This clashes with many views of how to use Python, but is ideal for many sane scenarios and for developers who prefer guaranteed stability over code elegance. The type system developed is modeled on the Hindley-Milner type inference algorithm for the lambda calculus and its application to ML.  Pyty also employs type inference for a restricted subset of the Python language in order to facilitate the proof search algorithm behind type-checking. This approach, verifying whether a type is assignable, contrasts with similar works that perform type inference for compiler optimization by assigning types to data and modeling the flow of that data through the program.
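
For illustration, a hypothetical session (the declaration syntax shown is a stand-in; Pyty's actual comment syntax may differ):

```python
# Hypothetical comment-based type declarations in the style described above.

#: inc : int -> int
def inc(n):
    return n + 1

#: total : int
total = inc(41)        # checks: inc applied to an int yields an int

#: bad : str
# bad = inc(1)         # would be rejected: int is not assignable to str
```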

David White: Traversal of infinite graphs with random local orientations (M.A. thesis, 2012; advisor:  Danny Krizanc)

In 1921 George Pólya famously resolved the question of recurrence versus transience of the simple random walk on integer lattices. I studied this question for the so-called basic walk, which arises in the field of robotics as a method for instructing a robot with limited memory on how to explore a room. In my thesis I resolve the question of recurrence versus transience of the basic walk on lattices and then extend this result to all locally finite graphs of bounded degree. I also provide a number of examples to show that the hypotheses in my theorems are necessary and to demonstrate configurations which make the robot do interesting things. Finally, I consider finite graphs and the robotics questions which originally motivated the definition of the basic walk, in particular how quickly the basic walk explores a graph and how much of the graph we can expect a given basic walk to explore. I end with several results on finite graphs, with some conjectures based on experiments, and with some directions for future work.
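
A minimal simulation of one natural reading of the basic walk on the Z^2 lattice (the exit rule, leaving by the successor of the entering edge in the vertex's random cyclic order, is an assumption for illustration, not necessarily the thesis's exact definition):

```python
# Each vertex fixes a uniformly random cyclic ordering of the four compass
# directions on first visit; arriving along an edge, the walker leaves
# along that edge's successor in the vertex's cyclic order.
import random

DIRS = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}
OPPOSITE = {"N": "S", "S": "N", "E": "W", "W": "E"}

def basic_walk(steps=10_000, seed=0):
    rng = random.Random(seed)
    order = {}                      # vertex -> its random cyclic order
    pos, heading = (0, 0), "N"
    visited = {pos}
    for _ in range(steps):
        dx, dy = DIRS[heading]
        pos = (pos[0] + dx, pos[1] + dy)
        visited.add(pos)
        cyc = order.setdefault(pos, rng.sample(list(DIRS), 4))
        came_in = OPPOSITE[heading]  # the arriving edge, seen from pos
        heading = cyc[(cyc.index(came_in) + 1) % 4]
    return len(visited)

print(basic_walk())                 # distinct vertices explored
```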

2011

Will Boyd:  A simulation of circuit creation in Tor (Honors thesis, 2011; advisors:  Norman Danner, Danny Krizanc)

We simulate the circuit creation behavior of Tor.  We first observe the behavior of the network by probing individual routers; we then cluster those observations and train a hidden Markov model for each cluster.  We use these hidden Markov models to produce simulations of the network, and show by a probabilistic evaluation that they are more accurate than simulations proposed and used previously.

Sam DeFabbia-Kane:  Evaluating the effectiveness of correlation attacks on Tor (Honors thesis, 2011; advisors:  Norman Danner, Danny Krizanc)

Tor is a widely used low-latency anonymity system.  It allows users of web browsers, chat clients, and other common low-latency applications to communicate anonymously online by routing their connection through a circuit of Tor routers.  However, Tor is commonly assumed to be vulnerable to a wide variety of attacks; one such is an end-to-end correlation attack, whereby an attacker controlling the first and last router in a circuit can use timing and other data to correlate streams observed at those routers and therefore break Tor's anonymity.  Most tests of correlation algorithms have been either in simulation or against certain kinds of traffic; in this thesis we test three such algorithms on the deployed Tor network.  We also describe a new correlation algorithm.  We show that while previously-described attacks do not fare well on the deployed network, our algorithm does work reliably.
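
A generic baseline for stream correlation (not one of the algorithms tested in the thesis): bin each stream's packet timestamps into fixed windows and compare per-window counts.

```python
# Pearson correlation of windowed packet counts; a high score suggests the
# two observed streams belong to the same circuit.
import numpy as np

def correlate(times_entry, times_exit, window=1.0):
    # timestamps in seconds, assumed sorted and non-empty
    end = max(times_entry[-1], times_exit[-1])
    bins = np.arange(0.0, end + window, window)
    a, _ = np.histogram(times_entry, bins)
    b, _ = np.histogram(times_exit, bins)
    return np.corrcoef(a, b)[0, 1]
```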

Eli Fox-Epstein: Forbidden pairs make problems hard (Honors thesis, 2011; advisor: Danny Krizanc)

Many problems in computer science can be formulated in terms of graphs. Thus, classifying large swaths of graph problems as NP-complete can be useful when reasoning about complexity in any domain. A subgraph of a graph respects a set of forbidden edge pairs if no pair of its edges are in the set of forbidden edge pairs. Given a graph property, the forbidden pairs decision problem is whether or not an arbitrary graph has a subgraph that satisfies the property while respecting a set of forbidden edge pairs. I seek to prove that the forbidden pairs decision problem is NP-complete for interesting classes of graph properties.

Juan Mendoza:  A Dynamical Systems-based Model for Indoor Robot Navigation (M.A. thesis, 2011; advisor: Eric Aaron)

Common indoor environments are particularly challenging scenarios for autonomous robot navigation: robots need to navigate in relatively tight and complicated paths, and at the same time be responsive to unpredictable changes in the environment, such as those created by humans walking around. The dynamical systems approach to navigation enables very responsive navigation in changing environments, but previous models that use this approach have not focused on the complexities of indoor environments and the challenges they can present to reactive methods such as dynamical systems-based ones. This thesis presents a new dynamical systems-based model that seeks to improve performance in complex obstacle configurations while maintaining the responsiveness of traditional dynamical systems-based models. This is achieved through three contributions: new competitive dynamics between target seeking and obstacle avoidance more accurately model the situations where either of those tasks should be turned off; adaptive obstacle representations provide a single effective framework for mapping obstacles of different shapes to repellers; observed obstacles are manipulated by joining ones that are too close to each other into larger obstacles and approximating sharp concavities with smooth arcs to minimize problems during navigation. Experiments using real and simulated robots show that this model enables effective navigation in several particularly challenging indoor scenarios, and simulations suggest that the model could enable safe and effective navigation in a natural indoor office environment.
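
The core of the approach can be stated compactly (a schematic sketch with illustrative gains and terms, not the thesis's exact model): the heading angle evolves under an attractor toward the target direction and repellers away from each obstacle direction.

```python
# Heading dynamics: d(phi)/dt = attractor toward target + repellers away
# from obstacles, with repeller strength fading with obstacle distance.
import math

def heading_rate(phi, psi_target, obstacles, a=1.0, b=2.0, sigma=0.5):
    # obstacles: list of (psi_obs, distance) pairs
    dphi = -a * math.sin(phi - psi_target)           # attractor at target
    for psi_obs, dist in obstacles:
        delta = phi - psi_obs
        dphi += (b * delta * math.exp(-delta**2 / (2 * sigma**2))
                 * math.exp(-dist))                  # repeller term
    return dphi

# Euler integration: steer toward psi_target with one nearby obstacle.
phi, dt = 0.0, 0.05
for _ in range(200):
    phi += dt * heading_rate(phi, math.pi / 4, [(math.pi / 5, 0.5)])
print(phi)
```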

Foster Nichols: Dynamical BDI-Based Intention Recognition for Virtual Agents (M.A. thesis, 2011; advisor: Eric Aaron)

Virtual agents might need to recognize the intentions of other agents in order to adapt to a dynamic environment. This thesis presents a way to incorporate intention recognition into the cognitive system of a hybrid dynamical cognitive agent (HDCA). HDCAs have cognitive systems that consist of beliefs, desires, and intentions that evolve dynamically over time. While intention recognition methods for Belief-Desire-Intention (BDI) architectures have been implemented in the literature, none have been based on dynamical BDI systems. This work was motivated by the intention recognition tasks that might face an informal college tour guide; adding intention recognition beliefs to a guide's cognitive system allows the guide to respond dynamically to the intentions of a tour taker.

2010

Sam DeFabbia-Kane: Tor (Research project, 2009-10; advisors: Norman Danner, Danny Krizanc)

Tor (The Onion Router) is an anonymity network. It protects the anonymity of its users by routing their network connections through a circuit of three proxies located around the world. There have been many attacks proposed against Tor aimed at compromising the anonymity it provides. One of the most common types of attack is an end-to-end correlation attack. If an attacker controls the first and last nodes in a circuit, they can compare the network streams at each and identify the sender and the recipient of the stream, compromising its anonymity. One way for an attacker to increase the chance of that happening is to kill any circuit they have nodes in but cannot control. The user is then required to create a new circuit, giving the attacker another chance to compromise it. We are working on a method for detecting attackers who do this that is feasible to implement and run on the Tor network.

Carlo Francisco: Ecotype simulation (Research project, 2009-10; advisor: Danny Krizanc)

A central question in microbiology is finding a consistently reliable method to demarcate the groups playing ecologically distinct roles (ecotypes) in a set of bacterial cells within a natural community. Multiple programs have been developed for this purpose. We want to test the accuracy of four of these programs, Ecotype Simulation, AdaptML, GMYC, and BAPS, on real and simulated data. The goal is to see which one yields results closest to the actual known demarcations of the clades that will be tested. Since bacterial cells can be isolated from the habitat they're specialized to, we make use of a variable called "specialization," or the probability that a particular kind of cell (differentiated by its DNA sequence) is isolated in this way. We also intend to include the relative running time of each of the algorithms in the study.

Juan Pablo Mendoza:  Dynamical-systems based navigation:  Modeling and verification (Honors thesis, 2010; advisor: Eric Aaron)

Autonomous navigation modeling and verification are valuable parts of the design of situated autonomous agents. Navigation modeling is necessary to create mobile agents, such as robots, while verification makes it possible to analyze such models to identify flaws or prove correctness. This paper explores modeling and verification in the context of the dynamical systems approach to navigation (dynamical navigation, for short), in which autonomous navigation arises from the evolution of differential equations governing agent behavior.  While dynamical navigation has considerable desirable qualities, like robustness in unpredictable environments and obstacle avoidance at constant speed, its underlying mathematics has the restricting limitation that obstacles must be represented by circles. Representing elongated obstacles, such as walls, is therefore a challenging problem that this paper addresses by presenting a dynamic tangent (DT) model for wall representation: an agent represents each wall by a circular obstacle, tangent to the wall, that resizes and moves with the agent to support effective dynamical navigation. Simulations show that the DT model supports more successful navigation than previously employed wall representations, and it is comparable in efficiency to the fastest previous representations.  This paper also describes a heuristic reachability model for analyzing dynamical navigation. This model, based on reachability analysis, examines the overall behavior of a system by analyzing approximations of its behavior locally in subregions of the world. Heuristic reachability is applied to estimate relative difficulty of different navigation environments.

Foster Nichols: Constructing polygonal maps for navigating agents using extracted line segments (Honors thesis, 2010; advisor: Eric Aaron)

This thesis describes a way to use line segments extracted from images to construct precise polygonal maps of obstacles in an environment. Agents such as robots or animated characters could use naively extracted line segments to avoid obstacles as part of intelligent navigation, except that extracted segments do not usually perfectly represent obstacles: for example, squares found by a popular technique called the Hough Transform (HT) often have gaps in one side or have two sides that do not meet at a corner. This research augments the HT to extend and join extracted line segments so that obstacles are more precisely represented. In a series of tests, simple shapes were represented precisely, but more complicated shapes were sometimes mistakenly joined to other shapes or confused with noise in the image. Future work could improve the line-joining algorithm and generalize the augmented HT to a broader range of environments.
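
A sketch of such a pipeline using OpenCV (parameters are illustrative, and the joining rule is simplified relative to the thesis, e.g. it ignores angle wrap-around):

```python
# Detect segments with the probabilistic Hough Transform, then join pairs
# that are nearly collinear with close endpoints.
import math
import cv2
import numpy as np

def extract_segments(image_path):
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    edges = cv2.Canny(img, 50, 150)
    lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=40,
                            minLineLength=20, maxLineGap=5)
    return [tuple(l[0]) for l in lines] if lines is not None else []

def try_join(s1, s2, max_gap=10, max_angle=math.radians(5)):
    (x1, y1, x2, y2), (x3, y3, x4, y4) = s1, s2
    a1 = math.atan2(y2 - y1, x2 - x1)
    a2 = math.atan2(y4 - y3, x4 - x3)
    if abs(a1 - a2) > max_angle:
        return None                  # not close enough to collinear
    if math.hypot(x3 - x2, y3 - y2) > max_gap:
        return None                  # endpoints too far apart
    return (x1, y1, x4, y4)          # bridge the gap with one segment
```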

2009

Henny Admoni: Demonstrations of Dynamical Intention for Hybrid Agents (M.A. thesis, 2009; advisor: Eric Aaron)

Representations of intention shared by reactive and deliberative systems of hybrid agents enable seamless integration of high-level logical reasoning and low-level behavioral response. This thesis presents an architecture for hybrid dynamical cognitive agents (HDCAs), hybrid reactive/deliberative agents with cognitive systems of continuously evolving beliefs, desires, and intentions based on BDI and spreading activation network models. Dynamical intentions support goal-directed behavior in both reactive and deliberative systems of HDCAs: on the reactive level, dynamical intentions allow for continuous cognitive evolution and real-time task re-sequencing; on the deliberative level, dynamical intentions enable logical reasoning and plan generation. Because intention representations are shared between both systems, reactive behavior and goal-directed deliberation are straightforwardly integrated in HDCAs. Additionally, Hebbian learning on connections in the spreading activation network of beliefs, desires, and intentions trains HDCAs’ reactive systems to respond to typically deliberative-level information. To establish comparability between HDCAs and traditional BDI-based architectures, dynamical intentions are shown to be consistent with the philosophical definition of intention from other BDI models. Simulations of autonomous, embodied HDCAs navigating to complete tasks in a grid city environment illustrate dynamical, intention-based behavior that derives from clean integration of reactive and deliberative systems.
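
A toy version of the continuous dynamics described here (the decay rate, weights, and learning rate are illustrative assumptions, not the thesis's equations):

```python
# Spreading activation: each element keeps part of its activation and
# receives activation from neighbors over weighted links.
import numpy as np

def spread(activation, weights, decay=0.9, steps=1):
    # activation: vector over beliefs/desires/intentions
    # weights[i, j]: strength of the link from element j to element i
    for _ in range(steps):
        activation = decay * activation + weights @ activation
        activation = np.clip(activation, 0.0, 1.0)
    return activation

def hebbian(weights, activation, lr=0.05):
    # Hebbian-style strengthening of links between co-active elements.
    return weights + lr * np.outer(activation, activation)
```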

Adam Robbins-Pianka: Modeling mRNA Secondary Structure Effects on Translation Initiation in Saccharomyces cerevisiae (M.A. thesis, 2009; advisor: Mike Rice)

This M.A. thesis studies the influence of RNA secondary structure on translational events in Saccharomyces cerevisiae.

Jon Gillick: A Clustering Algorithm for Recombinant Jazz Improvisations (Honors thesis, 2009; advisor: Danny Krizanc)

Music, one of the most structurally analyzed forms of human creativity, provides an opportune platform for computer simulation of human artistic choice. This thesis addresses the question of how well a computer model can capture and imitate the improvisational style of a jazz soloist. How closely can improvisational style be approximated by a set of rules? Can a computer program write music that, even to the trained ear, is indistinguishable from a piece improvised by a well-known player?

We discuss computer models for jazz improvisation and introduce a new system, Recombinant Improvisations for Jazz Riffs (Riff Jr.), based on Hidden Markov Models, the global structure of jazz solos, and a clustering algorithm. Our method represents improvements largely because of attention paid to the full structure of improvisations.

To verify the effectiveness of our program, we tested whether listeners could tell the difference between human solos and computer improvisations. In a survey asking subjects to identify which of four solos were by Charlie Parker and which were by Riff Jr., only 45 percent of answers among 120 people were correct, and less than 5 percent identified all four correctly.

2008

Henny Admoni: Decision Making and Learning for Hybrid Dynamical Agents (Honors Thesis, General Scholarship, 2008; advisor: Eric Aaron)

Models of human decision making and learning may benefit from being implemented in cognitively-inspired, dynamic frameworks. This paper describes one such framework built on a hybrid dynamical system, which models agent cognition as the continuous evolution of cognitive elements with occasional discrete changes in patterns of behavior. Cognitive elements are beliefs, desires, and intentions, from the Belief-Desire-Intention theory of human rationality, along with ground concepts and sequencing intentions. Continuous dynamics of the hybrid system result from changes in element activation levels; activation can spread to related elements over links in a neurologically-inspired spreading activation network. This hybrid system performs decision making by selecting new patterns of behavior when activation levels of elements reach certain configurations, and performs learning by strengthening spreading activation links between elements. This system is capable of representing long-term sequences of actions, or plans, as well as dynamically replanning by selecting new actions when environmental changes make old action sequences undesirable. In simulations of the model, agents navigated grid world environments while dynamically selecting goals and actions to fulfill those goals, and autonomously learned associations between people and places in their environments.

Bach Dao: A Simulation Study of Protein Family Degree Distribution (Honors thesis, 2008; advisor: Danny Krizanc)

The focus of this thesis is the degree distribution of protein family network graphs. We propose three models of evolution that generate a protein family. The first uses sequence alignment to quantify protein relatedness, the second uses the number of mutations accumulated on proteins, and the last explicitly incorporates preferential attachment. Following many other studies, we consider three operations: duplication, gene death, and mutation. The main result we report from the three models is that although the exponential distribution is the best fit to the data, the power-law distribution also fits the data well under the rates of evolution found in many studies.
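
A much-simplified sketch of a duplication/death/mutation model in this spirit (the rates, the divergence proxy, and the similarity threshold are illustrative assumptions, not the thesis's models):

```python
# Proteins carry a mutation count; duplication copies a protein, death
# removes one, and mutation increments a protein's divergence.  Edges of
# the family graph connect proteins whose divergence difference stays
# under a threshold.
import random

def evolve(steps=2000, p_dup=0.3, p_death=0.2, seed=0):
    rng = random.Random(seed)
    proteins = [0]                      # mutations accumulated per protein
    for _ in range(steps):
        r = rng.random()
        if r < p_dup:
            proteins.append(rng.choice(proteins))       # duplication
        elif r < p_dup + p_death and len(proteins) > 1:
            proteins.pop(rng.randrange(len(proteins)))  # gene death
        else:
            proteins[rng.randrange(len(proteins))] += 1 # point mutation
    return proteins

def degree(proteins, i, threshold=5):
    # neighbors of protein i: proteins within the similarity threshold
    return sum(1 for j, m in enumerate(proteins)
               if j != i and abs(m - proteins[i]) < threshold)
```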

Jesse Farnham: Performance Enhancement and Equivalence Criteria for Cellular Automaton-Based Tumor Simulations (Honors thesis, 2008; advisor: Eric Aaron)

This paper describes a performance enhancement to a cellular automaton-based simulation of tumor growth. A cellular automaton is a grid-based data structure whose elements are updated according to specific rules. Elements represent small areas of simulated tissue and contain local cell population and nutrient concentration data for those areas. The simulation's computational efficiency is enhanced by suppressing updates to "steady-state" tissue locations, areas with little change in nutrient concentration or cell population. This modification produces different tissue development from the original, canonical method in which all tissue areas are updated on every timestep. This paper defines several criteria by which a modified, efficient simulation can be considered equivalent to an original, canonical method; these criteria are applied to determine whether or not the performance enhancement can be considered equivalent to the original simulation.
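
The suppression idea in miniature (the stencil, update rule, and threshold are illustrative, not the simulation's actual model):

```python
# Diffuse nutrient on a grid, but only recompute cells flagged active;
# cells whose last change stayed below epsilon go dormant, and any change
# re-activates the neighborhood.
import numpy as np

def step(nutrient, active, eps=1e-4):
    padded = np.pad(nutrient, 1, mode="edge")
    avg = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
           padded[1:-1, :-2] + padded[1:-1, 2:]) / 4.0
    new = np.where(active, 0.5 * nutrient + 0.5 * avg, nutrient)
    changed = np.abs(new - nutrient) > eps
    grown = changed.copy()
    grown[:-1, :] |= changed[1:, :]; grown[1:, :] |= changed[:-1, :]
    grown[:, :-1] |= changed[:, 1:]; grown[:, 1:] |= changed[:, :-1]
    return new, grown
```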

2007

Daniel Hore: Implementation and Type-checking for ATR (Honors thesis, 2007; advisor: Norman Danner)

In this thesis we implement an interpreter and describe a principal-type algorithm for the language ATR (for Affine Tiered Recursion) defined by Danner and Royer. The interpreter and the principal-type algorithm (when implemented) will allow programmers to program in ATR. This becomes interesting upon understanding the guarantee ATR provides: any program that successfully compiles in ATR runs in polynomial time. However, the notion of polynomial time must be extended, since ATR contains higher-order types.

2006

Brendan Dolan-Gavitt: Timing Attacks in Anonymity-Providing Systems (Honors thesis, 2006; advisors: Norman Danner, Danny Krizanc)

In general, when a client accesses any server over the Internet, a great deal of information is trivially obtainable. In particular, the sender's IP address is included in every packet of data they transmit, and this information can be used to determine their Internet Service Provider and in many cases their approximate geographic location. In addition, each packet contains the IP address of its destination. Anonymous networks provide a way of preventing outside observers from determining that an initiator and a responder are, in fact, communicating.

The most common way of achieving this anonymity is by routing the connection through a series of intermediate proxy servers, along the way mixing or delaying packets in order to frustrate attempts at uncovering the sender through traffic analysis. However, for most interactive applications, such as web browsing, SSH sessions, or live chat, too much mixing or delaying will result in a poor or unusable experience for the user. A web browsing session would quickly become frustrating if it took five minutes for each page to load. It seems, then, that there is a trade-off between anonymity and speed. This thesis examines what threats are actually posed by traffic analysis against low-latency mix-nets and how practical it is to implement such attacks on a real network, using Tor as our case study.

Adam Robbins-Pianka and Paul Cao: Analysis of upAUG Conservation and Possible Misannotation of S. cerevisiae ORFs (Hughes Summer Program 2006; advisors: Mike Rice, Michael Weir)

In order to study misannotation, we first identified candidate genes by comparing alignments of orthologous genes of four Saccharomyces species and finding well-conserved sequences which contained possible start sites upstream of the annotated open reading frame (annORF). Using information theoretic analysis, we compared and contrasted patterns in the sequence data surrounding the upAUGs and their corresponding annotated AUGs (annAUGs) of the candidates with those of high-confidence, verified genes to further refine our assessment of misannotation. Furthermore, we studied the sequential pattern of conservation of upAUGs in order to discover possible introns. To further identify and confirm intron candidates, we aligned upstream protein sequences of orthologous genes to identify regions of extended conservation and, conversely, gaps of non-conservation, which may identify regions of introns. It is our aim in this project to improve upon current gene and intron annotation by identifying and studying the characteristics of upAUGs in the yeast genomes.
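
The first step, locating candidate upstream start codons, is simple to illustrate (coordinates and frame handling are simplified relative to the study):

```python
# Scan the region upstream of an annotated start codon for candidate
# upstream AUGs, recording each hit's frame relative to the annotation.
def upstream_augs(sequence, ann_start, window=300):
    start = max(0, ann_start - window)
    hits = []
    for i in range(start, ann_start - 2):
        if sequence[i:i + 3].upper() in ("ATG", "AUG"):
            hits.append((i, (ann_start - i) % 3))   # frame 0 = in frame
    return hits

seq = "GGATGAAACCCATGGGTATGCGC"
print(upstream_augs(seq, ann_start=18))  # AUGs upstream of position 18
```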

Other student projects

The Multitouch Project

The Wesleyan Multitouch Developers group is exploring the future of human-computer interaction through natural user interfaces. As the hardware for creating touch-sensitive surfaces becomes cheaper and more popular, more possibilities for intuitive interactions with computers emerge. We are currently working on a framework for well-integrated, efficient, and usable software.

The Humanitarian FOSS project

We are part of a growing community involved in The Humanitarian FOSS Project, dedicated to building and using Free and Open Source Software (FOSS) to benefit humanity. We are a founding chapter of the H-FOSS Project, along with Trinity College and Connecticut College.