Data visualization and processing platform utilizing a microservices architecture.
The platform integrates with existing data sources via a range of methods. The external data sources can be, for example:
- Public APIs of an organization
- Private APIs of an organization
- Open Data
- Smart Devices
- Sensors
- Automation systems
- Databases
- ...
The key is to have access to all the data sources that matter. All of the relevant data should be available for
- visualization
- analyses and data processing
Moreover, the visualizations and analyses should often be available in near real time.
The method we are implementing in ultrahack for data source integration is polling: the platform enables the user to provide API endpoints that are polled at user-specified intervals. The polled data is submitted to Kafka topics for further analysis and/or persistent storage.
The key here is that configuring this polling is made super easy: if you want to use or store the data behind a certain API, it's a no-brainer. A minimal sketch of such a poller is shown below.
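As a rough illustration only (the endpoint URL, topic name and interval are made-up assumptions, not the platform's actual configuration), a poller built on Play WS and the Kafka 0.8.2 producer client could look something like this:

```scala
import java.util.Properties
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global
import akka.actor.ActorSystem
import play.api.libs.ws.ning.NingWSClient
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object PollerSketch extends App {
  // Kafka 0.8.2 "new" producer pointed at a local broker
  val props = new Properties()
  props.put("bootstrap.servers", "localhost:9092")
  props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  val producer = new KafkaProducer[String, String](props)

  val ws = NingWSClient()
  val system = ActorSystem("poller")

  // Poll a user-specified endpoint at a user-specified interval and
  // forward each response body to a Kafka topic for further processing.
  system.scheduler.schedule(0.seconds, 30.seconds) {
    ws.url("https://example.org/api/readings").get().foreach { response =>
      producer.send(new ProducerRecord[String, String]("raw-data", response.body))
    }
  }
}
```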
This API polling approach could be useful in numerous different domains. For example:
- Home automation: There are numerous home automation platforms, gadgets and other data sources that can affect a single building. Yet most of them offer access through a specified API for status checks, sensor readings etc. Now we could have it all in one place.
- Industrial machinery: Industrial equipment nowadays carries a wide array of sensors, logs and usage statistics. It also tends to be more and more connected to the outside world.
- Open data fusion: numerous open data APIs could provide useful contextual data to enrich the data analyses. Through frequent polling this data can be stored and utilized easily.
Oftentimes the data provided by machinery, sensors or open data sources is not in a usable format. Platform V's refining and preprocessing services are meant to be simple, scalable services that refine input data into one or several output formats that can be utilized, for example, in statistical analyses. The data refining services could include (a minimal sketch follows the list):
- Normalization (value or frequency)
- Contextual data enrichment
- Formatting
- Value conversions
- Parsing
- Error handling
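As a sketch of how small such a refining step could be, here is a hypothetical min-max normalization function in Scala (the function and example values are illustrative, not part of the platform's actual services):

```scala
// Min-max normalization: scale a batch of readings into [0, 1].
def normalize(values: Seq[Double]): Seq[Double] = {
  val (min, max) = (values.min, values.max)
  if (max == min) values.map(_ => 0.0)          // avoid division by zero
  else values.map(v => (v - min) / (max - min))
}

// Example: normalize(Seq(10.0, 20.0, 30.0)) == Seq(0.0, 0.5, 1.0)
```

A real service would wrap a step like this between a Kafka consumer and producer, reading from an input topic and writing the refined values to an output topic.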
To handle vast amounts of data with sufficient error handling and data persistence properties, a distributed Apache Kafka cluster is utilized. It functions as a central gateway to all data, and makes the system fault tolerant by acting as a buffer. It also enables the data to be used both in real time and in batches.
The data is streamed from Kafka to different services (a minimal consumer sketch follows the list):
- Scheduled processing services use batch processing to run, for example, machine learning, forecasting and dynamic modelling algorithms.
- A connection to RStudio enables the end users of the platform to connect directly to the raw data for advanced extra analytics. It can also be used to develop the platform and its services by prototyping different calculations, and by getting to know the raw data in more detail.
Together, these services can, for example:
- update existing models with new data
- create predictions based on the last x data points
- show the real-time status of a large system by using data from a wide range of inputs
- stream important filtered data straight to the end user's dashboard
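A downstream service is typically a small consumer loop. The following hedged sketch uses the Kafka 0.8.2 high-level consumer API (the topic name, group id and processing step are made up for illustration):

```scala
import java.util.Properties
import kafka.consumer.{Consumer, ConsumerConfig}

object RefinerSketch extends App {
  // High-level consumer (Kafka 0.8.2) coordinated through ZooKeeper
  val props = new Properties()
  props.put("zookeeper.connect", "localhost:2181")
  props.put("group.id", "refiner")
  val connector = Consumer.create(new ConsumerConfig(props))

  // One stream for the "raw-data" topic; iterate over incoming messages
  val streams = connector.createMessageStreams(Map("raw-data" -> 1))
  for (msg <- streams("raw-data").head) {
    val payload = new String(msg.message(), "UTF-8")
    println(s"processing: $payload") // a real service would refine and re-publish
  }
}
```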
The frontend web app utilizes WebSockets for duplex communication between the platform and the user. The streamed real-time data is pushed to the screen of the end user, where it is visualized in charts and KPIs in an intuitive way: no extra clutter and no useless information.
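For illustration, a Play 2.4 WebSocket endpoint feeding such a dashboard might look roughly like the following (the controller and actor names are hypothetical):

```scala
import akka.actor._
import play.api.mvc._
import play.api.Play.current

// Every connecting client gets its own actor; anything sent to `out`
// is pushed over the WebSocket to the browser.
class DashboardActor(out: ActorRef) extends Actor {
  def receive = {
    case msg: String => out ! msg // forward refined data points to the client
  }
}

object DashboardActor {
  def props(out: ActorRef) = Props(new DashboardActor(out))
}

object Dashboard extends Controller {
  def stream = WebSocket.acceptWithActor[String, String] { request => out =>
    DashboardActor.props(out)
  }
}
```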
This platform also allows us to use the R language's vast library of analytic functions. We create predictive models from the raw data and push the results to a database, from which they can be fetched easily by the web service.
=================================
The big idea behind microservices is to architect large, complex and long-lived applications as a set of cohesive services that evolve over time. The term microservices strongly suggests that the services should be small.
In short, the microservice architectural style is an approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms, often an HTTP resource API.
Project uses:
- Play Framework 2.4.x
- Apache Kafka 0.8.2
- Install Apache Kafka with ZooKeeper from kafka.apache.org
- Clone the https://github.com/yahoo/kafka-manager.git repo for Kafka Manager (see the powershell folder for a start script)
- Verify that Kafka works by using the provided command-line consumer and producer
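For Kafka 0.8.2 the bundled console scripts can be used for this check (the topic name is just an example):
$ bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
$ bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning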
To check code quality of all the modules
$ ./activator clean compile scalastyle
To check code coverage of test cases for all modules
$ ./activator clean coverage test
By default, scoverage will generate reports for each project separately. You can merge them into an aggregated report by invoking
$ ./activator coverageAggregate
$ ./activator "project <service-name>" "run <PORT>"