Wikipedia Tabular Data Scraper

Project Overview

Developed a data extraction module that converts the HTML tables embedded in Wikipedia articles into structured CSV files ready for analysis. The tool removes the manual copy-and-clean step from data collection for researchers and analysts.

Key Features

Automated Extraction: Scrapes tabular data directly from user-provided Wikipedia URLs.
Polite Scraping: Sends a custom, descriptive User-Agent header with every request, in line with Wikimedia's User-Agent policy for automated clients.
Instant Conversion: Automatically transforms HTML tables into cleaned pandas DataFrames for immediate CSV export.
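
The three features above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names `fetch_html` and `parse_tables`, the `USER_AGENT` string, the restriction to tables with the `wikitable` class, and the footnote-stripping rule are all assumptions made for the example.

```python
import re

import pandas as pd
import requests
from bs4 import BeautifulSoup

# Hypothetical identifier: replace with your own tool name and contact details.
USER_AGENT = "WikiTableScraper/0.1 (https://example.org/contact; contact@example.org)"

# Matches bracketed footnote markers such as "[1]" or "[a]" (assumed cleaning rule).
FOOTNOTE = re.compile(r"\[[^\]]*\]")


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page, identifying the client with a custom User-Agent."""
    response = requests.get(url, headers={"User-Agent": USER_AGENT}, timeout=timeout)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.text


def parse_tables(html: str, css_class: str = "wikitable") -> list[pd.DataFrame]:
    """Convert every table carrying `css_class` into a DataFrame.

    The first row is treated as the header. Rows whose cell count differs
    from the header (typically caused by merged cells) are skipped rather
    than repaired, which keeps the sketch short.
    """
    soup = BeautifulSoup(html, "html.parser")
    frames = []
    for table in soup.find_all("table", class_=css_class):
        rows = []
        for tr in table.find_all("tr"):
            cells = [
                FOOTNOTE.sub("", cell.get_text(" ", strip=True)).strip()
                for cell in tr.find_all(["th", "td"])
            ]
            if cells:
                rows.append(cells)
        if len(rows) < 2:
            continue  # need a header row plus at least one data row
        header, *body = rows
        body = [row for row in body if len(row) == len(header)]
        frames.append(pd.DataFrame(body, columns=header))
    return frames
```

With these helpers, `parse_tables(fetch_html(url))[0].to_csv("table.csv", index=False)` would write the first matching table on a page to disk. Keeping the download and the parsing in separate functions means the parser can be exercised on saved HTML without touching the network.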

Impact & Results

Live Deployment: Deployed as a publicly accessible web application built with Streamlit.
Versatility: Serves as a foundational module for larger data analysis pipelines and research projects.

Tech Stack

Python | BeautifulSoup4 | Pandas | Streamlit | Requests

Mekhma Tamang
