Hi, this is my bachelor thesis on extending a data warehouse. The thesis covers the following points:
- Description of DAFOS in the context of data warehousing
- Automation of ETL using Apache Airflow
- Incremental data updates to optimize ETL
- Implementation of staging area with query capabilities using MongoDB
- Centralized log storage
- Lineage of data
- Integration of new data source: European Patent Office
- Development of a basic testing strategy for code that uses the newly added technologies

To build the PDF, compile the TeX file twice and then open the result:

    pdfcsplain main.tex; pdfcsplain main.tex; open main.pdf

Two passes are needed because the first run only creates the references to images; the second run places them into the document.