The Student Honors Papers collection represents exemplary work in computer science at Illinois Wesleyan University. The Ames Library is proud to archive these and other honors projects in Digital Commons @ IWU, the University's online archive of student, faculty and staff scholarship and creative activity.

*by Tec Yan Yap*

Generative adversarial networks (GANs) are now one of the key techniques for detecting anomalies in images, yielding remarkable results. Applying similar methods to discrete structures, such as text sequences, is still largely an unknown. In this work, we introduce a new GAN-based text anomaly detection method, called ARAE-AnoGAN, that trains an adversarially regularized autoencoder (ARAE) to reconstruct normal sentences and detects anomalies via a combined anomaly score based on the building blocks of ARAE. Finally, we present experimental results demonstrating the effectiveness of ARAE-AnoGAN and other deep learning methods in text anomaly detection.

*by Wenting Zhao*

Symmetry occurs in many constraint satisfaction problems, and it is important to deal with it efficiently and effectively, as it often leads to an exponential number of isomorphic assignments. Symmetric rows and columns in matrices are an important class of symmetries in constraint programming. In this work, we develop a new SAT encoding for partial lexicographical ordering constraints to break symmetries in such places. We also survey all the previous complete lex-leader encodings in literature and translate them into SAT encodings. We perform experimental analysis on how these lex-leader constraints impact the solving of Balanced Incomplete Block Design (BIBD) instances. Each encoding is able to outperform the other encodings on some instances, and they all perform close to each other; no clear winner can be drawn. Finally, the result shows that though using any lex-leader constraints is detrimental to finding a single BIBD, they are necessary in enumerating all BIBDs and proving non-existing designs.

*by Ammar Malik*

Constraint satisfaction problems (CSPs) involve finding assignments to a set of variables that satisfy some mathematical constraints. Unsatisfiable constraint problems are CSPs with no solution. However, useful characteristic subsets of these problems may be extracted with algorithms such as the MARCO algorithm, which outperforms the best known algorithms in the literature. A heuristic choice in the algorithm affects how it traverses the search space to output these subsets. This work analyzes the effect of this choice and introduces three improvements to the algorithm. The first of these improvements sacrifices completeness in terms of one type of subset in order to improve the output rate of another; the second and third are variations of a local search in between iterations of the algorithm which result in improved guidance in the search space. The performance of these improvements is analyzed both individually and in combinations across a variety of benchmarks and they are shown to improve the output rate of MARCO.

*by Jordyn C. Maglalang*

Boolean cardinality constraints are commonly translated (encoded) into Boolean CNF, a standard form for Boolean satisï¬ability problems, which can be solved using a standard SAT solving program. However, cardinality constraints are a simple generalization of clauses, and the complexity entailed by encoding them into CNF can be avoided by reasoning about cardinality constraints natively within a SAT solver. In this work, we compare the performance of two forms of native cardinality constraints against some of the best performing encodings from the literature. We designed a number of experiments, modeling the general use of cardinality constraints including crafted, random and application problems, to be run in parallel on a cluster of computers. Results show that native implementations substantially outperform CNF encodings on instances composed entirely of cardinality constraints, and instances that are mostly clauses with few cardinality constraints exhibit mixed results warranting further study.

*by Aditya Rajgarhia '07*

Face detection is the task of determining the locations and sizes of human faces in arbitrary digital images, while ignoring any other objects to the greatest possible extent. A fundamental problem in computer vision, it has important applications in fields ranging from surveillance-based security to autonomous vehicle navigation. Although face detection has been studied for almost a decade, the results are not satisfactory for a variety of practical applications, and the topic continues to receive attention.

A commonly used approach for detecting faces is based on the techniques of "boosting" and "cascading", which allow for real-time face detection. However, systems based on boosted cascades have been shown to suffer from low detection rates in the later stages of the cascade. Yet, such face detectors are preferable to other methods due to their extreme computational efficiency.

In this thesis we introduce a novel variation of the boosting process that uses features extracted by Independent Component Analysis (ICA), which is a statistical technique that reveals the hidden factors that underlie sets of random variables or signals. The information describing a face may be contained in both linear as well as high-order dependencies among the image pixels. These high-order dependencies can be captured effectively by representation in ICA space. Moreover, it has been argued that the metric induced by lCA is superior to other methods in the sense that it may provide a representation that is more robust to the effect of noise such as variations in lightening. We propose that features extracted from such a representation may be boosted better in the later stages of the cascade, thus leading to improved detection rates while maintaining comparable speed. We present the results of our face detector, as well as comparisons with existing systems.

*by Daniel H. Garrette '06*

An intrusion detection system (IDS) is used to determine when a computer or computer network is under attack. Most contemporary IDSs operate by defining what an intrusion looks like and checking traffic for matching patterns in network traffic. This approach has unavoidable limitations including the inability to detect novel attacks and the maintenance of a rule bank that must grow with every new intrusion discovered. An anomaly detection scheme attempts to define what is normal so that abnormal traffic can be distinguished from it. This thesis explores the ways that an unsupervised technique called "clustering" can be used to distinguish normal traffic from anomalous traffic. This thesis will also explore an attempt to improve upon existing clustering algorithms to improve anomaly detection by adding in limited amounts of a posteriori knowledge.

*by Kyle Barkmeier '06*

Determining the computational complexity of problems is a large area of study. It seeks to separate these problems into ones with "efficient" solutions, and those with "inefficient" solutions. Of course, the strata is much more fine-grain than this. Of special interest are two classes of problems: P and NP. These have been of much interest to complexity theorists for quite some time, because both contain many instances of important real-world problems, and finding efficient solutions for those in NP would be beneficial for computing applications. Yet with all this attention, there are still important unanswered questions about the two classes. It is known that P â NP, however it is still unknown whether P = NP or if P â NP. Before we discuss why this problem is so crucial to complexity theory, an overview of P, NP, and coNP is necessary.

The class P is a model of the notion of "efficiently solvable", and thus contains all languages (problems) that are decidable in deterministic polynomial time. This means that any language in P has a deterministic Turing Machine (algorithm) that will either accept or reject any input in n^k steps, where n is the length of the input string, and k is a constant. The class NP contains all languages that are decidable in nondeterministic polynomial time. A nondeterministic Turing Machine is one that is allowed to "guess" the correct path of computation, and seems to be able to reach an accept or reject state faster than if it was forced to run deterministically. It is unknown whether NP is closed under complementation because of this nondeterminism. It is quite easy to show a class of deterministically-solvable languages (such as P) is closed under complementation: we simply reverse the accept and reject states. This method is not viable for a nondeterministic machine, since switching the accept and reject states will result in machine that computes a completely different language. Thus the class coNP is defined as containing the complement of every language in NP.

In the rest of this paper we will present structural definitions of P and NP as well as present example languages from each. These structural definitions will give insight into the arrangement of the polynomial hierarchy, which is discussed in section 3. A diagonalization proof is presented in section 4, and an explanation of the general usage of diagonalization follows. In section 5, universal languages are defined and an important result from Kozen is given. In the final section, the limits of diagonalization as they pertain to P and NP are outlined, as well as the same limits for relativized classes.

*by Mark Portolese '05*

Ray-tracing algorithms have the potential to create extremely realistic three-dimensional computer graphics. The basic idea is to trace light rays from the user through the computer screen into the hypothetical three-dimensional world. This is done to determine what objects should be displayed on the screen. Furthermore, these rays are traced back to the light sources themselves to determine shading and other photorealistic effects. However, without optimization these algorithms are slow and impractical. This paper explores the use of the classic binary space subdivision algorithm in order to speed up the process. Binary space subdivision is the use of binary trees to recursively partition the screen into rectangular areas which are then rendered separately. The algorithms were implemented using C++. The use of binary space subdivision dramatically improved the speed of the implementation in most cases, resulting in a doubled or tripled frame rate under favorable circumstances.

*by Dmitry Mogilevsky '03*

*by Rajen Subba '03*

P-FASTUS is an Information Extraction system developed in SICStus Prolog based on the implementation of FASTUS. It is program that extracts prespecified information such as the name of the companny, location and the position being advertised from" Job PostingIs'' in text files. The system is composed of different levels of processing phases that are implemented using finite-state transducers.