CodeKōan - A Source Code Analysis Platform

Source Code is Written by Humans

Producing faultless, high quality source code is a challenging task, that many programmers strive to excel at performing. Computer science has made great strides in the last decades in developing new techniques for producing better source code. The introduction of programming paradigms -- object oriented, functional and aspect oriented programming -- and technology, such as exceptions and the actor model, has enabled programmers to write the software, which powers our everyday life.

Previous research has contributed algorithms and technology to enable writing better software. The goal of project CodeKōan is to put the focus on the most important part of coding: The Programmers.

Project CodeKōan develops novel methods to explore, how programmers reuse coding patterns from documentation to develop software.

To fulfill this goal, project CodeKōan contributes a search engine, that finds similarities between pieces of source code and code examples in online resources like Stackoverflow. The CodeKōan search engine enables developers to review their own code, and compare their solutions to those of other developers. Thereby, knowledge provided by others, is made more accessible.

The CodeKōan search engine empowers programmers to write better software for the applications of tomorrow.

Current Status

A public beta version of the CodeKōan Search Engine will be available shortly. The master thesis describing the search algorithm will also be made available to the public.

The CodeKōan Search Engine

The CodeKōan Pipeline

For each search query in the form of a source code document, CodeKōan search engine finds all reuses of code patterns from a database. Currently, this set of code patterns is all source code in answers on Stackoverflow.com.

The CodeKōan Code Search Algorithm

CodeKōan search engine uses a novel algorithm to find similarities between pieces of source code that yields better results than full-text search methods.

This algorithm combines previously published techniques in source code plagiarism detection, automata theory and natural language processing as well as newly proposed methods. The CodeKōan code search algorithm uses the programming language-specific structure of source code to find matching regions. Thus, it leads to a better result than conventional methods.

The CodeKōan search algorithm is a pipeline, that first finds all regions in a query document, that are vaguely similar to regions of indexed code fragments. These results are further processed, successively refined and grouped into sensible search results.

Supported Language Plugins

The backend for CodeKōan search engine is designed using an extensible architecture, that groups language specific features into plugins. So far, CodeKōan search engine includes plugins for Java, Python and Haskell.

Leveraging Search Results with Human Feedback

Code similarity is a subjective notion. Each developer may perceive similarity of source code differently. Thus, project CodeKōan will investigate the possibility of using Human Computation techniques to provide good search results in practice. User feedback on the quality of search results will play an important role in improving the search engine.

Frequently Asked Questions

Is the data sent to CodeKōan safe?

The code in a query to CodeKōan is sent to the CodeKōan server using https, i.e. it is encrypted and protects against man-in-the-middle attacks.

How is the data sent to CodeKōan used?

Code sent to CodeKōan is used for computing answers, for improving CodeKōan and for statistical analysis. It will neither be given to third parties nor commercially exploited.

Aggregated, anonymized analytics such as statistics are used for research purposes.

IP addresses from which queries are sent to CodeKōan can be used for improving the search engine by use in e.g. spam protection measures.

What does the name CodeKōan mean?

A kōan (公案) is a short anecdote or riddle used in Zen Buddhism to demonstrate the inadequacy of logical reasoning and to provoke enlightenment. CodeKōan aims to give programmers new insight and perspective based on short source code fragments, hence the name.

Who is running CodeKōan?

CodeKōan is a research project of Prof. Bry's research group of the Institute of Informatics of Ludwig-Maximilian University of Munich, Germany. Ludwig-Maximilian University of Munich is a public not-for-profit German agency.

How to get more information on CodeKōan?

The CodeKōan website provides information on CodeKōan and points to current research activities and publications related to CodeKōan;.

For further questions please contact us at kontakt [at] codekoan * org

Are CodeKōan search results correct?

CodeKōan's results are correct in the sense that they are computed with a principled algorithm. It is, however, possible that some users have a different perception of what code pattern detection should be. In such cases, CodeKōan's answers would not necessarily be correct with respect to such a perception.