Data was extracted from Wikipedia articles using Python web scraping techniques and then organized into a clean dataset for analysis.
The dataset provides a career overview of 22 riders who competed in 2024, summarized by season and by the class they competed in.
The dataset can be used for data visualization and insightful analysis of your favorite riders' performance and career trends.
METHODOLOGY
The code to extract and clean this data is written in Python using the pandas and ssl libraries.
Extraction: Web scraping.
Data is extracted by web scraping the Wikipedia page of each individual rider. The final file format is CSV.
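
The extraction step could look roughly like the sketch below. It is an illustration only, not the original script: the function names `wikipedia_url` and `fetch_tables`, the `match="Season"` filter, and the User-Agent string are all assumptions, and `pandas.read_html` needs an HTML parser such as lxml to be installed.

```python
import io
import ssl
import urllib.parse
import urllib.request

import pandas as pd


def wikipedia_url(title: str) -> str:
    """Build the English Wikipedia URL for an article title (hypothetical helper)."""
    return "https://en.wikipedia.org/wiki/" + urllib.parse.quote(title.replace(" ", "_"))


def fetch_tables(title: str, match: str = "Season") -> list:
    """Download one article and return every HTML table whose text contains `match`.

    The `match` default is an assumption about how the career tables are labelled.
    """
    # Default SSL context: certificates are verified.
    context = ssl.create_default_context()
    request = urllib.request.Request(
        wikipedia_url(title),
        headers={"User-Agent": "rider-dataset-sketch/0.1"},  # placeholder value
    )
    with urllib.request.urlopen(request, context=context) as response:
        html = response.read().decode("utf-8")
    # read_html parses every matching <table> element into a DataFrame.
    return pd.read_html(io.StringIO(html), match=match)


# Example (requires network access), using a placeholder article title:
# tables = fetch_tables("Example Rider")
# tables[0].to_csv("example_rider.csv", index=False)
```

Calling `fetch_tables` once per rider and concatenating the results with `pd.concat` would give a single table that can be written out with `to_csv`.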
Data Cleaning:
Null values were handled, column names standardized, data types assigned, and columns sorted and rearranged for consistency. The cleaned dataset was first sorted by bike number and then arranged by season.
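
A minimal sketch of these cleaning steps in pandas follows. The column names (`Rider`, `Bike Number`, `Season`, `Class`, `Points`), the sample rows, and the choice to fill missing points with 0 are assumptions made for illustration. They are not taken from the actual dataset.

```python
import pandas as pd


def clean_career_table(raw: pd.DataFrame) -> pd.DataFrame:
    """Apply the cleaning steps described above to one scraped table."""
    df = raw.copy()

    # Standardize column names to lower snake_case.
    df.columns = (
        df.columns.str.strip()
        .str.lower()
        .str.replace(r"[^a-z0-9]+", "_", regex=True)
        .str.strip("_")
    )

    # Handle nulls: drop rows without a season, treat missing points as 0 (assumption).
    df = df.dropna(subset=["season"]).copy()
    df["points"] = pd.to_numeric(df["points"], errors="coerce").fillna(0)

    # Assign explicit data types.
    df = df.astype({"season": "int64", "bike_number": "int64", "points": "int64"})

    # Rearrange columns into a consistent order.
    df = df[["rider", "bike_number", "season", "class", "points"]]

    # Arrange by season, keeping bike-number order within each season.
    df = df.sort_values(["season", "bike_number"], kind="stable")
    return df.reset_index(drop=True)


# Placeholder rows, not real rider data.
raw = pd.DataFrame(
    {
        "Rider": ["Rider B", "Rider A", "Rider A", "Rider B"],
        "Bike Number": [2, 1, 1, 2],
        " Season ": [2024, 2023, None, 2023],
        "Class": ["Class X", "Class Y", "Class Y", "Class Y"],
        "Points": [10, None, 5, 7],
    }
)
cleaned = clean_career_table(raw)
```

Sorting on `["season", "bike_number"]` gives the same order as sorting by bike number first and then applying a stable sort by season.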


