Kenyon Project Pages

Kenyon: A Common Software Stratigraphy System

Jennifer Bevan, Sung Kim, Lijie Zou

Software evolution research has developed to the point where common data extraction, representation, storage, and access issues can be identified across different systems. Most systems compare two or more revisions of a system, most have a computationally expensive step that can be done asynchronously with user interaction, and most need to store these preprocessed results for later interactive access. While work has been done in creating standardized exchange formats (SEFs) such as GXL for graph representations, no common platform has yet been created to support the full sequence of data extraction, preprocessing, and storage.

We have created Kenyon, a common extraction, preprocessing, and storage platform, to support future work in software evolution. Kenyon separates extraction concerns from analysis concerns, which in turn lets data analysis begin much sooner than would otherwise be possible. We expect software evolution research to benefit from Kenyon in two primary ways. First, the start-up time to create a repository for analysis will be greatly reduced, and will be limited only by the setup, selection, and specification of the fact extraction command sequence. Second, pre-existing Kenyon databases, which can be accessed either through Hibernate class access or through direct SQL queries, will allow the entire historical data set, or any single revision within it, to be analyzed by many different research techniques. In essence, Kenyon will extend the standardized exchange format concept across an entire revision history.
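As a rough illustration of the direct SQL access path, the sketch below stores extracted facts per revision and then queries either a single revision or the whole history. The table names (revision, fact), their columns, and the helper functions are hypothetical and are not Kenyon's actual schema; SQLite is used only so the example is self-contained.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database using a hypothetical per-revision fact schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE revision (id INTEGER PRIMARY KEY, label TEXT NOT NULL);
        CREATE TABLE fact (
            revision_id INTEGER NOT NULL REFERENCES revision(id),
            source      TEXT NOT NULL,
            relation    TEXT NOT NULL,
            target      TEXT NOT NULL
        );
        """
    )
    # Two revisions; the second adds one call relationship.
    conn.executemany("INSERT INTO revision VALUES (?, ?)", [(1, "r1"), (2, "r2")])
    conn.executemany(
        "INSERT INTO fact VALUES (?, ?, ?, ?)",
        [
            (1, "main", "calls", "parse"),
            (2, "main", "calls", "parse"),
            (2, "parse", "calls", "lex"),
        ],
    )
    conn.commit()
    return conn


def facts_at(conn, revision_id):
    """Return the facts recorded for one revision."""
    cursor = conn.execute(
        "SELECT source, relation, target FROM fact "
        "WHERE revision_id = ? ORDER BY source, target",
        (revision_id,),
    )
    return cursor.fetchall()


def fact_counts_by_revision(conn):
    """Return (revision label, number of facts) across the whole history."""
    cursor = conn.execute(
        "SELECT r.label, COUNT(f.revision_id) FROM revision r "
        "LEFT JOIN fact f ON f.revision_id = r.id "
        "GROUP BY r.id ORDER BY r.id"
    )
    return cursor.fetchall()
```

Because every fact row carries its revision identifier, the same store can answer both single-revision and whole-history questions, which is the property the paragraph above describes.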

We were inspired to create Kenyon because of a noticeable functional overlap within our own research projects: Beagle and IVA. Beagle currently requires a manual revision extraction and preprocessing phase but provides FactLoader, a means for loading TA-format extracted facts into a relational database. IVA currently has an automated SCM extraction tool, but stores its preprocessed data in a WebDAV repository. IVA provides a generic script-based approach for allowing any pipeline of command-line tools to be used for fact extraction; the current implementation is Unix-specific but a more general interface has been planned. IVA also contains a general graph implementation that can support TA, GXL, and many other transfer formats.
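The script-based approach described above, in which any pipeline of command-line tools can serve as a fact extractor, might be sketched as follows. This is a minimal illustration and not IVA's implementation: run_pipeline and the two stand-in stages are hypothetical names, and each stage's output is buffered in full rather than streamed between processes.

```python
import subprocess
import sys


def run_pipeline(stages, input_text=""):
    """Pass input_text through each command in turn, feeding stdout to the next stdin."""
    data = input_text
    for argv in stages:
        # check=True raises CalledProcessError if a stage exits with a non-zero status.
        result = subprocess.run(
            argv, input=data, capture_output=True, text=True, check=True
        )
        data = result.stdout
    return data


# Hypothetical stand-in stages; real fact extractors would be listed here instead.
UPPERCASE = [
    sys.executable, "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]
COUNT_LINES = [
    sys.executable, "-c",
    "import sys; print(len(sys.stdin.read().splitlines()))",
]
```

Describing each stage as an argument list rather than a shell string keeps the sketch independent of any particular Unix shell, which is one possible route toward the more general interface mentioned above.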

Our design of Kenyon is based on reusing, restructuring, and generalizing these existing solutions in our current projects. We augmented the IVA graph implementation with Hibernate, a system that stores Java classes in relational databases and provides class-based data access as well as standard SQL and Hibernate SQL queries. Project-specific data sets are definable as well, to facilitate storing and accessing a set of related data such as bug tracking system data and SCM commit messages. We will open source Kenyon in order to leverage ongoing and future efforts towards supporting multi-language systems, more third-party fact extractors, and more SCM systems. We also plan to make Kenyon available as an Eclipse plugin, which would allow users to integrate incremental data preparation activities with their development environment; for example, fact extraction and preprocessing could be automatically performed as an Ant task within the Eclipse build or commit tasks.

Project Status

Kenyon 1.0 is complete; both the release and the user manual will soon be available on the DForge site.

Project Library

Kenyon is hosted by DForge. All published documentation for the system will be found there.