- Jordan Pappas, University of Rochester
Data-intensive applications (DIAs) underpin many of the valuable services we rely on in our day-to-day lives. Most of these applications are built by performing data engineering and data science at scale. Scale here means data volume and compute capacity far beyond what a single machine and its narrow connection to the internet can provide.
To this end, we developed a scalable machine learning product that predicts flight delays and ranks airline traffic in real time from flight record data and hourly weather report data. Working with data engineers and data scientists, our team built an end-to-end ML pipeline covering ETL, exploratory data analysis, MLOps, and model monitoring.