EduPop is a data pipeline project that combines education data for Texas schools with population data for cities across Texas. Using an API integration and web scraping, it produces a dataset relating city population to school density, a combination not readily available for free online. The goal is to help urban planners, educational policymakers, and researchers understand how well school systems in different cities serve their populations.
- School Data (API):
  - We used the Texas Schools dataset available via ArcGIS, which contains school names, district names, enrollment, and more.
  - The API link: ArcGIS Schools Data
- Population Data (Web Scraping):
  - Population data for Texas cities was scraped from City-Data.com, a website that does not offer a public API but publishes detailed population figures for cities in Texas.
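
For the school data source, a paginated query against an ArcGIS feature layer might look like the minimal sketch below. It is not taken from `main.py`: the layer URL is a hypothetical placeholder (the real link is not reproduced here), and the helper names are illustrative. The query parameters (`where`, `outFields`, `resultOffset`, `resultRecordCount`, `f`) and the `exceededTransferLimit` response flag are standard parts of the ArcGIS REST query interface.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical placeholder: substitute the real Texas schools feature-layer URL.
LAYER_URL = "https://example.com/arcgis/rest/services/Schools/FeatureServer/0"


def build_query_url(layer_url, offset=0, page_size=1000):
    """Build a query URL that requests one page of records as JSON."""
    params = {
        "where": "1=1",              # match every record
        "outFields": "*",            # return all attribute fields
        "returnGeometry": "false",   # attributes only, no shapes
        "resultOffset": offset,
        "resultRecordCount": page_size,
        "f": "json",
    }
    return f"{layer_url}/query?{urlencode(params)}"


def extract_attributes(payload):
    """Pull the attribute dict out of each feature in a query response."""
    return [feature["attributes"] for feature in payload.get("features", [])]


def fetch_all(layer_url, page_size=1000):
    """Page through the layer until the server reports no further records."""
    records, offset = [], 0
    while True:
        with urlopen(build_query_url(layer_url, offset, page_size)) as resp:
            payload = json.load(resp)
        page = extract_attributes(payload)
        records.extend(page)
        # Stop on an empty page or when the server says nothing was cut off.
        if not page or not payload.get("exceededTransferLimit"):
            break
        offset += len(page)
    return records
```

Calling `fetch_all(LAYER_URL)` with a real layer URL would return a list of dicts, one per school, keyed by whatever field names the layer defines.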
- Schools per Capita: The number of schools per resident in each city, which helps identify areas with potential school shortages.
- Average Population per School: The average number of residents that each school in a city serves, which can highlight overcrowded school systems or underpopulated areas with excess school capacity.
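
The two metrics above can be sketched as follows. This is an illustrative reconstruction, not the project's actual code: the `city` key, the output field names, and the assumption that scraped population figures arrive as strings with thousands separators are all hypothetical.

```python
import re


def normalize_city(name):
    """Lowercase and collapse whitespace so names from both sources match."""
    return re.sub(r"\s+", " ", name).strip().lower()


def parse_population(text):
    """Convert a scraped figure such as '12,345' to an int."""
    return int(text.replace(",", "").strip())


def city_metrics(schools, populations):
    """Join school records to city populations and derive both metrics.

    schools:     list of dicts with a (hypothetical) 'city' key
    populations: dict mapping city name -> population string
    """
    # Count schools per normalized city name.
    counts = {}
    for school in schools:
        key = normalize_city(school["city"])
        counts[key] = counts.get(key, 0) + 1

    rows = []
    for city, raw in populations.items():
        key = normalize_city(city)
        school_count = counts.get(key, 0)
        population = parse_population(raw)
        # Skip cities with no matched schools or no usable population.
        if school_count == 0 or population <= 0:
            continue
        rows.append({
            "city": key,
            "population": population,
            "school_count": school_count,
            "schools_per_capita": school_count / population,
            "avg_population_per_school": population / school_count,
        })
    return rows
```

With made-up inputs, a city of 1,000 residents and 2 schools yields 0.002 schools per capita and an average of 500 residents per school. The two metrics are reciprocals of each other, so either can be derived from the other.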
The EduPop dataset provides a unique combination of data that isn't available online for free. It gives stakeholders in education and urban planning the ability to:
- Identify school accessibility gaps in cities where populations are underserved by existing educational infrastructure.
- Assist in planning future educational infrastructure by helping decision-makers understand which cities require additional school investments.
- Optimize resource allocation by providing a clear view of how educational resources are distributed in relation to population size.
Beyond educational policy, this dataset can inform urban planning in rapidly growing cities and help identify rural areas that might need additional resources.
To run this project locally, follow the steps below:
- Clone the repository:
      git clone https://github.com/ahamedfoisal/EduPop.git
- Navigate into the project directory:
      cd EduPop
- Create a virtual environment and activate it:
      python3 -m venv venv
      source venv/bin/activate  # On Windows use `venv\Scripts\activate`
- Install the required dependencies:
      pip install -r requirements.txt
- Run the main Python script to generate the dataset:
      python main.py
- The final dataset will be saved as `final_school_population_dataset.csv` in the project directory.
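
Once the CSV exists, a quick way to inspect it is to rank cities by how many residents each school serves. The sketch below uses only the standard library; the column names `city` and `avg_population_per_school` are assumptions, so adjust them to match the actual header row of the generated file.

```python
import csv


def rank_by_population_per_school(lines, top_n=5):
    """Return the top_n (city, value) pairs with the most residents per school.

    lines: an open text file or any iterable of CSV lines.
    Column names are assumed, not confirmed against the real dataset.
    """
    rows = list(csv.DictReader(lines))
    rows.sort(key=lambda r: float(r["avg_population_per_school"]), reverse=True)
    return [(r["city"], float(r["avg_population_per_school"])) for r in rows[:top_n]]


# Example usage against the generated file:
# with open("final_school_population_dataset.csv", newline="", encoding="utf-8") as f:
#     for city, value in rank_by_population_per_school(f):
#         print(f"{city}: {value:.0f} residents per school")
```

Cities at the top of this ranking are the ones where each school serves the most residents, which is the signal the dataset is meant to surface.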
EduPop/
│
├── main.py # The main Python script for the project
├── requirements.txt # Python dependencies
├── README.md # Project overview
├── ETHICS.md # Ethical considerations
├── final_school_population_dataset.csv # Example of the cleaned dataset output
└── .gitignore # To exclude unnecessary files like virtual env