TikTok Claim Classification
Project Overview
This project classifies TikTok videos according to whether they present a claim or an opinion. The project is hosted on GitHub Pages; to view it, click here. If you’re interested in exploring the source code, you can find it here.
Notebooks Overview
- Data Understanding
  - Begins exploring the provided TikTok dataset by getting acquainted with the data, compiling summary information, and preparing it for further analysis.
- Exploratory Data Analysis (EDA)
  - Conducts exploratory data analysis on the TikTok dataset, focusing on the factors that distinguish claim videos from opinion videos.
- Hypothesis Testing
  - Applies descriptive and inferential statistics, probability distributions, and hypothesis testing in Python.
- Logistic Regression Model
  - Builds a logistic regression model to predict user churn, informed by exploratory data analysis, and evaluates the model's performance.
- Random Forest and XGBoost
  - Implements Random Forest and XGBoost models to automate the initial stages of TikTok's claims process by predicting whether a video presents a “claim” or an “opinion.”
Importance of Each Notebook
Together, the notebooks cover the full workflow for claim classification on TikTok: understanding the data, analyzing it, testing hypotheses, building and evaluating models, and drawing insights from the results.
