Wikipedia Tabular Data Scraper
Project Overview
Developed a lightweight data extraction module that converts the HTML tables on Wikipedia pages into CSV files ready for analysis. The tool removes the manual copy-and-paste step from data collection for researchers and analysts.
Key Features
- Automated Extraction: Scrapes tabular data directly from user-provided Wikipedia URLs.
- Polite Scraping: Sends a descriptive custom User-Agent header, as Wikimedia's User-Agent policy requests, so that every request is identifiable and data retrieval stays respectful.
- Instant Conversion: Automatically transforms HTML tables into cleaned pandas DataFrames for immediate CSV export.
Impact & Results
- Live Deployment: Deployed as a publicly accessible Streamlit web application.
- Versatility: Designed as a reusable module that can feed larger data analysis pipelines and research projects.
Tech Stack
Python | BeautifulSoup4 | Pandas | Streamlit | Requests