TikTok-ML

Project: TikTok Claim Classification

This project uses machine learning to classify TikTok videos according to whether they present a claim or an opinion. It comprises five notebooks, each covering a different stage of the analysis.

Together, the notebooks cover data inspection, exploratory data analysis, hypothesis testing, and model building for claim classification on TikTok.

Notebook 1: Data Inspection and Analysis

1. Percentage of Claims vs. Opinions

2. Factors Correlated with Claim Status

Figure: claim status and engagement

3. Factors Correlated with Engagement Level

Notebook 2: Exploratory Data Analysis (EDA)

1. Distribution Analysis of Variables

Figure: Distribution

Observations:

2. Analysis of Claim Status and Verification Status

Figure: Claim Status and Verification Status

Observations:

3. Median View Counts Analysis

Figure: Median View Counts Analysis

Observations:

4. Overall View Count Analysis

Figure: Overall View Count Analysis

Observations:

5. Outlier Threshold Modification

Observations:

Notebook 3: Hypothesis Testing

1. Hypothesis Formulation

Null Hypothesis ($H_0$): There is no difference in the number of views between TikTok videos posted by verified accounts and TikTok videos posted by unverified accounts.

Alternative Hypothesis ($H_A$): There is a difference in the number of views between TikTok videos posted by verified accounts and TikTok videos posted by unverified accounts.
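A two-sided hypothesis pair like this is commonly evaluated with a two-sample test on the view counts of the two groups. The sketch below is illustrative only: it assumes Welch's t-test (which does not require equal variances), uses made-up view counts rather than the project's data, and approximates the p-value with a normal distribution, which is reasonable only for large samples. The function name `welch_t_test` and the sample values are hypothetical; the test actually used in the notebook may differ.

```python
import math
from statistics import NormalDist, mean, variance


def welch_t_test(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite df, approximate two-sided p-value)."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each group mean (sample variance uses n - 1).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    # Assumption: normal approximation to the t distribution (large samples only).
    p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    return t_stat, df, p_value


# Made-up view counts, purely illustrative (not taken from the TikTok dataset).
verified_views = [120, 340, 560, 410, 290, 380]
unverified_views = [900, 1500, 1100, 1300, 800, 1250]

t_stat, df, p_value = welch_t_test(verified_views, unverified_views)
print(f"t = {t_stat:.3f}, df = {df:.1f}, approx. p = {p_value:.4g}")
```

A p-value below the chosen significance level (for example 0.05) would lead to rejecting $H_0$ in favor of $H_A$. For small samples, the exact t distribution with the computed degrees of freedom should be used instead of the normal approximation.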

2. Hypothesis Test Result

3. Implications

4. Next Steps

Notebook 4: Logistic Regression Model

1. Multicollinearity Issues

Figure: Multicollinearity Issues

2. Impact of Video Duration on Verified Status

Figure: Impact of Video Duration on Verified Status

3. Model Performance

Figures: Model Performance 1, Model Performance 2

4. Consideration of Non-linear Relationships

Figure: Consideration of Non-linear Relationships

Notebook 5: Random Forest and XGBoost Models

1. Evaluation Metric Selection

2. Model Comparison

Figure: Model Comparison

3. Model Recommendation

4. Predictive Features

Figure: Predictive Features

5. Feature Engineering