Loan Default Prediction
Predicting loan outcomes to maximize returns on peer-to-peer lending investments.
Description
In this project, loan data from a peer-to-peer lending platform is obtained and analyzed with the goal of maximizing investor returns. Variables in the dataset are evaluated to determine whether they can be used to predict the target variable, loan status. Each variable is then analyzed to describe its distribution and its association with whether a loan was paid in full. Following exploratory data analysis, preprocessing and predictive modeling are used to build a loan selection process that maximizes return on investment. This process is compared with selecting loans based on FICO score and loan grade. Finally, ideas for further analysis and key learnings are presented.
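A model-driven loan selection process, and its comparison with grade- and FICO-based selection, can be sketched roughly as follows. This is a minimal, stdlib-only illustration under stated assumptions, not the code from the notebooks. It assumes each loan carries the amount invested, the total payments received, its grade, the borrower's FICO score, and a model-predicted probability of full payment (`p_paid`, hypothetical). Realized return is taken as net payments received divided by the amount invested, and each strategy is simply a rule for which loans to fund. The cutoffs (0.85, grades A/B, FICO 720) and the six toy loans are made up for illustration and are not project data or results.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class Loan:
    amount: float          # principal invested in the loan
    total_received: float  # total payments received over the loan's life
    grade: str             # platform-assigned loan grade, "A" (best) to "G"
    fico: int              # borrower FICO score at origination
    p_paid: float          # hypothetical model output: P(loan is paid in full)


Rule = Callable[[Loan], bool]


def portfolio_roi(loans: Iterable[Loan]) -> float:
    """Net return relative to the total amount invested in the given loans."""
    loans = list(loans)
    invested = sum(loan.amount for loan in loans)
    if invested == 0:
        return 0.0  # nothing selected, so nothing gained or lost
    received = sum(loan.total_received for loan in loans)
    return (received - invested) / invested


def select(loans: Iterable[Loan], rule: Rule) -> List[Loan]:
    """Keep only the loans an investor would fund under the given rule."""
    return [loan for loan in loans if rule(loan)]


# Hypothetical selection rules; the cutoffs are illustrative, not tuned values.
STRATEGIES: Dict[str, Rule] = {
    "model": lambda loan: loan.p_paid >= 0.85,    # fund loans the model trusts
    "grade": lambda loan: loan.grade in {"A", "B"},  # baseline: top grades only
    "fico": lambda loan: loan.fico >= 720,          # baseline: high FICO only
    "all": lambda loan: True,                       # fund everything
}

# Toy loans made up for illustration only; they are not project data.
LOANS = [
    Loan(1000.0, 1120.0, "A", 760, 0.95),  # paid in full
    Loan(1000.0, 1180.0, "C", 690, 0.90),  # paid in full
    Loan(1000.0, 450.0, "B", 730, 0.60),   # charged off early
    Loan(1000.0, 1250.0, "D", 670, 0.88),  # paid in full
    Loan(1000.0, 300.0, "E", 650, 0.40),   # charged off early
    Loan(1000.0, 1150.0, "B", 705, 0.92),  # paid in full
]

if __name__ == "__main__":
    for name, rule in STRATEGIES.items():
        chosen = select(LOANS, rule)
        print(f"{name:<6} loans={len(chosen)} roi={portfolio_roi(chosen):+.3f}")
```

With real data, the `p_paid` cutoff would be tuned on held-out loans rather than fixed by hand, and the grade and FICO rules serve as the baselines the model-driven rule is measured against. The amount and payment fields would presumably map to columns described in LCDataDictionary.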
Data
Contents
Data
- Dataset web address.txt - Web address of the dataset (too large to store on GitHub)
- LCDataDictionary - Variable descriptions
Papers
- Proposal.pdf
- Expected Questions.pdf
- Report.pdf
Notebooks and Visualizations (main directory)
- Exploratory Data Analysis.ipynb - First step in the project; exploring and visualizing data
- Data Profile Report.html - Variable statistics, distributions, and relationships
- Predictive Modeling.ipynb - Last step in the project; fitting various predictive models and applying them to the business problem
- Excel Charts.xlsx - Visualizations used in the Report
- Presentation Deck.pdf - Presentation slides; a recording of the presentation can be found here
Tools
- Python
- PyCaret
- scikit-learn
- Matplotlib
- Seaborn
Author
Samuel Sears @ssears219