Mayura Vartak | Portfolio

Data Extraction for MotoGP Analysis

End-to-end web-scraping + pandas workflow turning MotoGP rider pages into a single, clean CSV for 2024 analysis and viz.

Also See On:

Kaggle

GitHub

TECH STACK

Python

Pandas

Web Scraping

Rider career data was scraped from Wikipedia articles with Python and then organized into a clean dataset for analysis.
The dataset gives a career overview of the 22 riders who competed in 2024, summarized by season and by the class they competed in.

The dataset can be used for data visualization and insightful analysis of your favorite riders' performance and career trends.

METHODOLOGY

The code to extract and clean this data is written in Python, using the pandas and ssl libraries.

Extraction: Web scraping.

Data is extracted by web scraping the Wikipedia page of each individual rider. The final file format is CSV.
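The extraction step could look roughly like the sketch below. It is an illustration, not the project's actual code: the helper names (rider_url, fetch_html, career_table, build_raw_csv), the User-Agent string, the output path, and the assumption that each rider's career table contains the word "Season" are all hypothetical. It also assumes an HTML parser such as lxml is installed, which pandas.read_html needs.

```python
import ssl
import urllib.request
from io import StringIO
from urllib.parse import quote

import pandas as pd

WIKI_BASE = "https://en.wikipedia.org/wiki/"  # assumption: English Wikipedia


def rider_url(page_title: str) -> str:
    """Build an article URL from a page title such as 'Marc Márquez'."""
    return WIKI_BASE + quote(page_title.replace(" ", "_"))


def fetch_html(url: str) -> str:
    """Download a page over HTTPS using a certificate-verifying SSL context."""
    context = ssl.create_default_context()
    request = urllib.request.Request(
        url, headers={"User-Agent": "motogp-career-scraper/0.1"}  # hypothetical
    )
    with urllib.request.urlopen(request, context=context) as response:
        return response.read().decode("utf-8")


def career_table(html: str, rider: str) -> pd.DataFrame:
    """Return the first table containing the text 'Season', tagged by rider."""
    # Assumption: the career summary is the first table mentioning "Season".
    tables = pd.read_html(StringIO(html), match="Season")
    table = tables[0].copy()
    table["Rider"] = rider
    return table


def build_raw_csv(riders: list[str], path: str) -> pd.DataFrame:
    """Scrape every rider, stack the tables, and write one raw CSV."""
    frames = [career_table(fetch_html(rider_url(r)), r) for r in riders]
    combined = pd.concat(frames, ignore_index=True)
    combined.to_csv(path, index=False)
    return combined
```

Calling build_raw_csv(["Francesco Bagnaia", "Jorge Martín"], "motogp_riders_raw.csv") would fetch two pages and write a combined file. Real Wikipedia tables often have multi-row headers, so the selected table may need its header flattened before the cleaning step.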

Data Cleaning:

Handled null values, standardized column names, assigned data types, and sorted and rearranged columns for consistency. The data was initially sorted by bike number, then cleaned and arranged by season.
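The cleaning steps above can be sketched in pandas as follows. This is a minimal illustration under stated assumptions: the column names (Season, Class, Bike No., Rider, Races, Wins, Points), the sample rows, and the choice to fill missing counts with 0 are hypothetical and may differ from the real dataset.

```python
import pandas as pd

# Hypothetical column names, used only for illustration.
COUNT_COLUMNS = ["races", "wins"]
COLUMN_ORDER = ["season", "class", "bike_no", "rider", "races", "wins", "points"]


def clean_career_data(raw: pd.DataFrame) -> pd.DataFrame:
    df = raw.copy()

    # Standardize column names: trim, lowercase, snake_case.
    df.columns = (
        df.columns.str.strip()
        .str.lower()
        .str.replace(r"[^a-z0-9]+", "_", regex=True)
        .str.strip("_")
    )

    # Handle nulls and assign types: placeholders such as a dash or None
    # become NaN under errors="coerce", then are filled with 0 (an assumption).
    for col in COUNT_COLUMNS:
        df[col] = pd.to_numeric(df[col], errors="coerce").fillna(0).astype(int)
    df["points"] = pd.to_numeric(df["points"], errors="coerce").fillna(0.0)
    df["season"] = pd.to_numeric(df["season"], errors="coerce")
    df = df.dropna(subset=["season"])
    df["season"] = df["season"].astype(int)
    df["bike_no"] = pd.to_numeric(df["bike_no"], errors="coerce").astype("Int64")

    # Arrange by season, keeping bike-number order within each season,
    # then put the columns in a consistent order.
    df = df.sort_values(["season", "bike_no"]).reset_index(drop=True)
    return df[COLUMN_ORDER]


# Made-up sample rows, not real results.
sample = pd.DataFrame(
    {
        " Season ": ["2024", "2023", "2024"],
        "Class": ["MotoGP", "MotoGP", "MotoGP"],
        "Bike No.": [93, 1, 1],
        "Rider": ["Rider A", "Rider B", "Rider B"],
        "Races": ["20", "–", "20"],
        "Wins": ["3", None, "11"],
        "Points": ["392", "467", "508"],
    }
)
cleaned = clean_career_data(sample)
print(cleaned)
```

Sorting on both keys at once gives the same row order as sorting by bike number first and then applying a stable sort by season.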