The Codekoan search engine

Overview

Installation instructions

The Codekoan search engine is easy to set up. Please see the installation instructions.

Overview

This project is the complete implementation of my code pattern recognition engine, including a web application to access it.

It is structured as a multi-package stack project.

Packages:

codekoan-search-backend

This package contains a programming language-agnostic library that provides:

Access to Stack Overflow data through PostgreSQL
Access to the Stack Overflow REST API
Data structures for Stack Overflow data
Levenshtein search
Token Bloom filters
Syntactic filtering steps
- Alignment
- Aggregation
Static code analysis
- Blocks analysis
- Word similarity analysis
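
To illustrate the idea behind the Levenshtein search listed above, here is a minimal Python sketch that computes the edit distance between two token sequences. It is not taken from the library (the project itself is built with stack); the function name `levenshtein` and the whitespace tokenization are assumptions made only for this example.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute; cost 1 each)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to the empty prefix of b
        for j, y in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,              # delete x
                cur[j - 1] + 1,           # insert y
                prev[j - 1] + (x != y),   # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


# Hypothetical query and snippet, tokenized on whitespace for simplicity.
query = "for ( int i = 0 ; i < n ; i ++ )".split()
snippet = "for ( int j = 0 ; j < n ; j ++ )".split()
print(levenshtein(query, snippet))  # 3: the loop variable differs in three places
```

Working on tokens rather than characters means that renaming an identifier costs one edit per occurrence, regardless of the identifier's length.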

codekoan-messaging

This package contains types that are used for communicating between microservices.

Language Implementations

codekoan-language-java

codekoan-language-python

codekoan-postgres-indexer

This package provides an executable that writes data from a Stack Overflow posts dump into a PostgreSQL database.

codekoan-rmq-injector

This package provides a web service that takes JSON queries and puts them into RabbitMQ.

codekoan-search-service (Search Worker)

This package contains an executable that takes data from RabbitMQ, performs searches, and returns the results to RabbitMQ.

RabbitMQ Configuration for every single Worker Instance

The following configuration is YAML; the space indentation is significant.

search-language: "java" # one of "java", "python", or "haskell"; the value is validated in the source code
search-exchange: "queries-java-3" # unique queue name in RabbitMQ; this worker polls its task messages from this queue
search-question-tag: "java" # Stack Overflow tag filter that keeps the search from going through irrelevant answers
search-answer-digits: [0,3,4,5] # Stack Overflow answer IDs are split into 10 chunks by modulo; the numbers listed here are the chunk indices this worker handles. We assume that post topics are uniformly distributed among the chunks under this partitioning, which could be verified systematically in future work.
search-cluster-size: 4 # workaround for the failing configuration service. This number is crucial: it must be identical for all workers of one language, so that the reply cache service knows how many responses are pending. It reflects the number of workers dedicated to that language.
search-rabbitmq-settings: # credentials for RabbitMQ (the message queueing service)
 rabbitmq-user: "user"
 rabbitmq-pwd: "password"
 rabbitmq-host: "localhost"
 rabbitmq-virtual-host: "/"
search-postgres-database: # credentials for the PostgreSQL database
 db-user: "user"
 db-pwd: "password"
 db-name: "testdb"
 db-port: 5432
 db-host: "10.155.208.4"
search-semantic-url: "http://localhost:3666/submit" # endpoint of the semantic service, from which the worker fetches identifier similarity scores
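
The partitioning behind search-answer-digits can be sketched in a few lines of Python. This is an illustration only, not the worker's actual code: the function names and the sample answer IDs are hypothetical, and the coverage check assumes that the workers of one language should together cover all ten chunks, which the configuration itself does not enforce.

```python
NUM_CHUNKS = 10  # answer IDs are split into 10 chunks by modulo


def chunk_of(answer_id):
    """Chunk index of an answer: its ID modulo 10, i.e. its last digit."""
    return answer_id % NUM_CHUNKS


def answers_for_worker(answer_ids, digits):
    """Keep only the answers whose chunk index is assigned to this worker."""
    wanted = set(digits)
    return [a for a in answer_ids if chunk_of(a) in wanted]


def uncovered_chunks(worker_digits):
    """Chunk indices that no worker handles.

    worker_digits holds one search-answer-digits list per worker of a language
    (assumption: together the workers are meant to cover chunks 0..9).
    """
    covered = set().union(*worker_digits)
    return sorted(set(range(NUM_CHUNKS)) - covered)


# Hypothetical answer IDs, filtered for a worker configured with [0,3,4,5].
sample_ids = [11, 23, 40, 57, 64, 95]
print(answers_for_worker(sample_ids, [0, 3, 4, 5]))   # [23, 40, 64, 95]

# Hypothetical three-worker cluster that forgot chunks 8 and 9.
print(uncovered_chunks([[0, 3, 4, 5], [1, 2], [6, 7]]))  # [8, 9]
```

A check like `uncovered_chunks` is one way to catch a misconfigured cluster before starting the workers; the number of digit lists passed in would correspond to search-cluster-size.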

Installation instructions

First get a working stack installation. Then git clone the project into a directory, change into the codekoan directory, and use stack build && stack exec ma-site to run the web application. The default port to reach the web application is 3000.

You will probably also have to adapt the config in ma-project/ma-site/config and get the necessary data from a publicly available Stack Overflow data dump.

Starting the system

Start RabbitMQ Injector Service

YAML config file

# Settings pertaining to the RabbitMQ connection
rabbitmq-settings:
 rabbitmq-user: "user"
 rabbitmq-pwd : "password"
 rabbitmq-host: "localhost"
 rabbitmq-virtual-host: "/"

# Settings pertaining to logging
log-settings:
 log-level: debug

# The port that the application is run on
application-port: 6368

Start the service with stack exec codekoan-rmq-injector

Start Reply Cache Service

YAML config file

# Settings pertaining to the RabbitMQ connection
rabbitmq-settings:
 rabbitmq-user: "user"
 rabbitmq-pwd : "password"
 rabbitmq-host: "localhost"
 rabbitmq-virtual-host: "/"

# The RabbitMQ queue that we observe for replies
reply-queue: "replies-1"

# Settings pertaining to logging
log-settings:
 log-level: debug

# The port that the application is run on
application-port: 6367

Start the service with stack exec codekoan-reply-cache

Start Semantic Service

YAML config file

# Directory that is recursively searched for code files
# (the file extension is hardcoded per language type)
corpus-directory: /home/analytics/temp/javacorpus/elasticsearch

# The language to use (one of ["python", "java", "haskell"])
corpus-language: java

# Settings pertaining to logging
log-settings:
 log-level: debug

# The port that the application is run on
application-port: 6366

Start the service with stack exec codekoan-semantic-service

Configure RabbitMQ

Prerequisites

Configure a topic exchange named "queries".
Configure 3 fanout exchanges, one for each of the 3 supported languages: "java", "python", and "haskell".
Configure the bindings for routing by routing key: fanout exchange "queries-java" -> java, fanout exchange "queries-python" -> python, fanout exchange "queries-haskell" -> haskell.
Use the RabbitMQ Management Interface to create a new queue for each worker, and bind the appropriate fanout exchange to each of these worker queues. (Messages passing through a fanout exchange are copied to every queue bound to that exchange.)
The search-exchange variable in each worker's YAML config file must be identical to the name of the queue bound to the appropriate fanout exchange.
Configure a reply exchange for receiving response messages and bind it to a queue.
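
The resulting message flow can be pictured with a small in-memory model in Python. This is a toy simulation, not a RabbitMQ client and not part of the project: the `Broker` class and its methods are hypothetical, it assumes the "queries" topic exchange forwards to the per-language fanout exchanges by routing key, and all queue names except "queries-java-3" are made up for the example.

```python
from collections import defaultdict


class Broker:
    """Toy model of the exchange layout: topic exchange -> fanout -> worker queues."""

    def __init__(self):
        self.topic_bindings = {}                  # routing key -> fanout exchange
        self.fanout_bindings = defaultdict(list)  # fanout exchange -> bound queues
        self.queues = defaultdict(list)           # queue name -> delivered messages

    def bind_fanout(self, routing_key, fanout):
        """Bind a fanout exchange to the "queries" topic exchange."""
        self.topic_bindings[routing_key] = fanout

    def bind_queue(self, fanout, queue):
        """Bind a worker queue to a fanout exchange."""
        self.fanout_bindings[fanout].append(queue)

    def publish(self, routing_key, message):
        """Publish to the "queries" topic exchange with the given routing key."""
        fanout = self.topic_bindings.get(routing_key)
        if fanout is None:
            return  # no matching binding: this model simply drops the message
        for queue in self.fanout_bindings[fanout]:
            self.queues[queue].append(message)  # every bound queue gets a copy


broker = Broker()
for lang in ("java", "python", "haskell"):
    broker.bind_fanout(lang, f"queries-{lang}")

# One queue per worker; each worker's search-exchange names its own queue.
broker.bind_queue("queries-java", "queries-java-1")
broker.bind_queue("queries-java", "queries-java-3")
broker.bind_queue("queries-python", "queries-python-1")

broker.publish("java", "query-42")
print(dict(broker.queues))
# {'queries-java-1': ['query-42'], 'queries-java-3': ['query-42']}
```

Because every Java worker queue receives its own copy of the query, each worker can search its share of the answers independently.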

Starting a Worker Instance

Copy the codekoan-project folder to all the workers.
cd codekoan-project
stack build, to build the worker binary on each worker node
stack exec codekoan-search-service <path-to-yaml>, where <path-to-yaml> is the path to the worker's YAML config file (after the worker starts, the suffix tree index is built automatically)

Starting Web Application

stack exec codekoan-site-light, run with codekoan-project as the working directory
