Predictive Modeling for New York City Taxi Fare
Project Overview
This project predicts fare amounts for taxi cab rides in New York City using data collected by the New York City Taxi and Limousine Commission. The project is hosted on GitHub Pages, and the source code is also available for anyone interested in exploring it.
Notebooks Overview
- Inspect and Analyze Data
- Loads the provided taxi cab dataset into a pandas DataFrame, performs a cursory inspection, and compiles summary information about the data.
- Exploratory Data Analysis (EDA)
- Conducts exploratory data analysis on the dataset, including cleaning the data, building visualizations, and evaluating and sharing results.
- Hypothesis Testing
- Tests whether there is a relationship between payment type and fare amount, using descriptive statistics and hypothesis testing in Python.
- Multiple Linear Regression Model
- Builds a multiple linear regression model to predict fare amount, evaluating model performance and interpreting results.
- Random Forest and XGBoost Models
- Implements Random Forest and XGBoost models to predict whether a customer is a generous tipper, considering ethical implications and recommending adjustments to the model objective if necessary.
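As an illustration of the hypothesis-testing step listed above, the comparison of fare amounts across payment types could be sketched as follows. This is a minimal sketch, not the notebook's actual code: the fares are synthetic values invented for the example, the group names `card_fares` and `cash_fares` and the helper `compare_mean_fares` are hypothetical, and the choice of Welch's two-sample t-test is an assumption about which test suits the question.

```python
import numpy as np
from scipy import stats

# Synthetic fares for illustration only; these are NOT values from the TLC dataset.
rng = np.random.default_rng(42)
card_fares = rng.normal(loc=13.5, scale=4.0, size=500)  # hypothetical card-paid rides
cash_fares = rng.normal(loc=12.0, scale=4.0, size=500)  # hypothetical cash-paid rides


def compare_mean_fares(group_a, group_b, alpha=0.05):
    """Run Welch's two-sample t-test (unequal variances) on two fare samples.

    Null hypothesis: both groups have the same mean fare.
    Returns the t statistic, the p-value, and whether the null is rejected at alpha.
    """
    result = stats.ttest_ind(group_a, group_b, equal_var=False)
    return result.statistic, result.pvalue, bool(result.pvalue < alpha)


# Descriptive statistics first, then the test itself.
print(f"mean card fare: {card_fares.mean():.2f}")
print(f"mean cash fare: {cash_fares.mean():.2f}")

t_stat, p_value, reject_null = compare_mean_fares(card_fares, cash_fares)
print(f"t = {t_stat:.2f}, p = {p_value:.4g}, reject null: {reject_null}")
```

With real data, the two samples would come from filtering the DataFrame by payment type. Rejecting the null hypothesis would indicate a difference in mean fare between the groups, not that payment type causes it.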
Importance of Each Notebook
The notebooks build on one another: inspecting the data, analyzing it, and then building models. Together they provide insights into fare prediction and customer tipping behavior for New York City taxi cab rides.
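To make the model-building stage more concrete, a multiple linear regression for fare amount could look roughly like the sketch below. It rests on several assumptions: the predictors `distance` and `duration` are hypothetical feature names, the data is synthetic with coefficients chosen purely for the example, and the fit uses NumPy's least-squares solver rather than whichever modeling library the notebook uses.

```python
import numpy as np

# Synthetic rides for illustration only; the coefficients below are invented.
rng = np.random.default_rng(0)
n_rides = 200
distance = rng.uniform(0.5, 10.0, size=n_rides)   # hypothetical trip distance
duration = rng.uniform(3.0, 40.0, size=n_rides)   # hypothetical trip duration
noise = rng.normal(0.0, 0.5, size=n_rides)
fare = 2.5 + 3.0 * distance + 0.5 * duration + noise

# Design matrix with an intercept column followed by the two predictors.
X = np.column_stack([np.ones(n_rides), distance, duration])

# Ordinary least squares: find the coefficients minimizing squared error.
coef, *_ = np.linalg.lstsq(X, fare, rcond=None)
intercept, distance_coef, duration_coef = coef

# Evaluate the fit with R^2 (share of fare variance explained by the model).
predictions = X @ coef
ss_res = np.sum((fare - predictions) ** 2)
ss_tot = np.sum((fare - fare.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

print(f"intercept: {intercept:.2f}")
print(f"fare change per unit distance: {distance_coef:.2f}")
print(f"fare change per unit duration: {duration_coef:.2f}")
print(f"R^2: {r_squared:.3f}")
```

Each coefficient is read as the expected change in fare for a one-unit increase in that predictor with the other held fixed. On real data, the fit should be evaluated on held-out rides instead of the training sample used here.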
