
An analysis of US flights and cascading failures using PySpark

09/09/2020  ·  7 minutes
Introduction

In this blog post, we study a dataset of US-only flights from 2007. The dataset was released by the American Statistical Association (ASA) as part of its biennial Data Exposition. Participants in the competition were asked to focus on a single question and answer it by investigating the dataset. The question we will try to answer is: "Can you detect cascading failures as delays in one airport create delays in others?"
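To make the question concrete, here is a minimal plain-Python sketch of one possible signal. It is not necessarily the method used in the rest of this post. The idea is to follow each aircraft (TailNum) through consecutive legs and record a link from airport A to airport B whenever a flight from A arrives at B late and the same aircraft then also departs B late. The sample rows are hypothetical, and the field names mirror columns in the ASA airline data. The 15-minute threshold and the helper name `cascade_links` are assumptions made for illustration.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows; field names mirror columns of the ASA airline data.
flights = [
    {"TailNum": "N101", "Month": 1, "DayofMonth": 5, "CRSDepTime": 700,
     "Origin": "ORD", "Dest": "ATL", "DepDelay": 45, "ArrDelay": 40},
    {"TailNum": "N101", "Month": 1, "DayofMonth": 5, "CRSDepTime": 1100,
     "Origin": "ATL", "Dest": "DFW", "DepDelay": 30, "ArrDelay": 25},
    {"TailNum": "N202", "Month": 1, "DayofMonth": 5, "CRSDepTime": 800,
     "Origin": "JFK", "Dest": "BOS", "DepDelay": 0, "ArrDelay": -5},
    {"TailNum": "N202", "Month": 1, "DayofMonth": 5, "CRSDepTime": 1000,
     "Origin": "BOS", "Dest": "JFK", "DepDelay": 20, "ArrDelay": 15},
]

DELAY_THRESHOLD = 15  # minutes; an assumed cutoff for calling a flight "delayed"


def cascade_links(rows, threshold=DELAY_THRESHOLD):
    """Hypothetical helper: return (A, B) pairs where an aircraft flew A -> B,
    arrived at B more than `threshold` minutes late, and then departed B
    more than `threshold` minutes late on its next leg."""
    by_tail = itemgetter("TailNum")
    links = []
    # groupby only merges adjacent rows, so sort by tail number first.
    for _, legs in groupby(sorted(rows, key=by_tail), key=by_tail):
        # Order each aircraft's legs by date and scheduled departure (hhmm).
        legs = sorted(legs, key=itemgetter("Month", "DayofMonth", "CRSDepTime"))
        for prev, nxt in zip(legs, legs[1:]):
            # Skip pairs that do not connect at the same airport.
            if prev["Dest"] != nxt["Origin"]:
                continue
            if prev["ArrDelay"] > threshold and nxt["DepDelay"] > threshold:
                links.append((prev["Origin"], nxt["Origin"]))
    return links


print(cascade_links(flights))  # [('ORD', 'ATL')]
```

Because legs are ordered by month, day, and scheduled departure time, this simplified version ignores legs that cross midnight. On the full dataset in PySpark, the same pairing of consecutive legs could be expressed with a `pyspark.sql.Window` partitioned by `TailNum` and the `pyspark.sql.functions.lag` function.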