Detroit Blight Ticket Compliance Predictor

From the Kaggle competition website:

"Every year, Detroit issues millions of dollars in fines to residents that fail to maintain their property (called blight violations), and every year, many of these fines remain unpaid. Enforcing unpaid blight fines is a costly and tedious process, so the city wants to know: how can we increase blight ticket compliance? The first step in answering this question is understanding when and why a resident might fail to comply with a blight ticket. This is where predictive modeling comes in. In the prediction competition, your task is simple - predict whether a given blight ticket will be paid on time."

For this project, I used pandas for data exploration and scikit-learn to build a pipeline for data preprocessing and classification. Preprocessing involved handling missing values, transforming categorical variables, and incorporating latitude and longitude data. I used a gradient-boosted decision tree classifier, with grid search to find the best parameter settings. Compared with my baseline, a Gaussian Naive Bayes classifier trained on non-engineered features, this classifier scored approximately 20 points higher in area under the curve (AUC).
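As a rough illustration of the approach described above, here is a minimal sketch of a scikit-learn pipeline that imputes missing values, one-hot encodes a categorical column, passes latitude and longitude through as numeric features, and tunes a gradient-boosted classifier with grid search. It is not the project code: the data is synthetic, the column names (`fine_amount`, `lat`, `lon`, `agency_name`) are hypothetical, and the imputation strategies, parameter grid, and `roc_auc` scoring are assumptions chosen for the example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in data; column names are hypothetical, not the real schema.
rng = np.random.default_rng(0)
n = 400
agency = rng.choice(["agency_a", "agency_b"], size=n).astype(object)
agency[rng.random(n) < 0.1] = np.nan  # inject missing categorical values
df = pd.DataFrame({
    "fine_amount": rng.choice([50.0, 100.0, 250.0, np.nan], size=n),
    "lat": rng.normal(42.36, 0.05, size=n),
    "lon": rng.normal(-83.10, 0.05, size=n),
    "agency_name": agency,
})
# Random binary target standing in for "ticket paid on time".
y = (rng.random(n) < 0.3).astype(int)

numeric = ["fine_amount", "lat", "lon"]
categorical = ["agency_name"]

# Impute numeric gaps with the median; impute and one-hot encode categoricals.
preprocess = ColumnTransformer(
    [
        ("num", SimpleImputer(strategy="median"), numeric),
        ("cat", Pipeline([
            ("impute", SimpleImputer(strategy="most_frequent")),
            ("onehot", OneHotEncoder(handle_unknown="ignore")),
        ]), categorical),
    ],
    sparse_threshold=0,  # always return a dense array
)

model = Pipeline([
    ("prep", preprocess),
    ("clf", GradientBoostingClassifier(random_state=0)),
])

# Illustrative grid only; the values searched in the project are not stated.
param_grid = {
    "clf__n_estimators": [50, 100],
    "clf__learning_rate": [0.05, 0.1],
    "clf__max_depth": [2, 3],
}

search = GridSearchCV(model, param_grid, scoring="roc_auc", cv=3)
search.fit(df, y)

# Probability of the positive class for each ticket.
proba = search.predict_proba(df)[:, 1]
```

After fitting, `search.best_params_` holds the best combination from the grid and `search.best_score_` holds its mean cross-validated AUC. Keeping the preprocessing inside the pipeline means the imputers and encoder are refit on each training fold, so no information from the validation fold leaks into them.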

Project Code