News Summarizer: Unlocking Valuable Data from BBC News

Overview

This project scrapes news headlines, summaries, and links from the BBC News website. The goal is to extract useful information from a website that doesn't provide a publicly accessible API, demonstrating how web scraping can be used to gather data for educational and research purposes.

Purpose of Data Collection

The primary purpose of this project is to extract and analyze news data from BBC News to:

- Provide users with quick access to the latest news headlines and summaries.
- Enable keyword-based filtering to help users find news articles relevant to their interests.
- Demonstrate the practical application of web scraping techniques in a real-world scenario.

Data Source Selection

Website Used: BBC News
Why Chosen: BBC News is a reputable source of global news that doesn't offer a publicly accessible API for extracting news data. Scraping this site allows us to obtain timely news information that can be valuable for analysis and research.
Robots.txt Compliance: We have reviewed the BBC's robots.txt file and ensured that our scraping activities comply with their guidelines.
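
The robots.txt review described above can also be checked programmatically. The following is a minimal sketch using Python's standard `urllib.robotparser`; it is not taken from main.py. The sample rules and the `news-summarizer-demo` user agent are made up for illustration and are not the BBC's actual robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; NOT the BBC's actual robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, url: str,
               user_agent: str = "news-summarizer-demo") -> bool:
    """Return True if the parsed rules permit user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(SAMPLE_ROBOTS_TXT)
print(is_allowed(parser, "https://www.bbc.com/news"))
print(is_allowed(parser, "https://www.bbc.com/private/page"))
```

To check against the live file instead of a sample string, `RobotFileParser` also provides `set_url()` and `read()`, which download and parse the file in one step.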

Project Structure

News-Summarizer/
├── main.py
├── requirements.txt
├── README.md
├── ETHICS.md
└── .gitignore

main.py: The main Python script for scraping the news data.
requirements.txt: A list of required Python packages.
README.md: Project documentation.
ETHICS.md: Discussion of ethical considerations.
.gitignore: Excludes virtual environment and other unnecessary files from the repository.

Installation

Clone the Repository:

git clone https://github.com/ahamedfoisal/News-Summarizer.git

Navigate to the Project Directory:
```
cd News-Summarizer
```

Set Up a Virtual Environment:

python -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`

Install Required Libraries:
```
pip install -r requirements.txt
```

Usage

Run the Script:
```
python main.py
```
Enter a Keyword (Optional):
- When prompted, enter a keyword to filter news articles.
- Press Enter without typing anything to display all available news articles.
View the Results:
- The script will display the news articles along with their summaries and links.
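
The prompt-and-filter flow described in these steps might look roughly like the sketch below. The function names, the dictionary keys (`title`, `summary`, `link`), the sample articles, and the case-insensitive matching are all assumptions made for illustration; main.py may be organized differently.

```python
# Made-up sample data; real articles would come from the scraper.
SAMPLE_ARTICLES = [
    {"title": "Markets rally", "summary": "Stocks rose on Monday.",
     "link": "https://www.bbc.com/news/articles/a1"},
    {"title": "Storm warning", "summary": "Heavy rain is expected.",
     "link": "https://www.bbc.com/news/articles/b2"},
]


def filter_articles(articles, keyword):
    """Keep articles whose title or summary contains the keyword.

    Matching is case-insensitive (an assumption). An empty keyword
    returns every article, mirroring the "press Enter" behaviour.
    """
    keyword = keyword.strip().lower()
    if not keyword:
        return list(articles)
    return [
        a for a in articles
        if keyword in a["title"].lower() or keyword in a["summary"].lower()
    ]


def format_article(article):
    """Render one article as headline, summary, and link."""
    return f"{article['title']}\n  {article['summary']}\n  {article['link']}"


def main():
    """Interactive entry point: prompt for a keyword and print matches."""
    keyword = input("Enter a keyword (or press Enter for all): ")
    for article in filter_articles(SAMPLE_ARTICLES, keyword):
        print(format_article(article))
```

Calling `main()` runs the interactive prompt; `filter_articles` can also be used on its own, which makes the filtering logic easy to test without user input.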

Features

Scrape Latest News: Fetches the most recent news stories from the BBC News homepage.
Extract Detailed Information: Retrieves the headline, summary, and direct link for each news article.
Keyword Filtering: Allows users to filter news articles by keywords in the title or summary.
Duplicate Prevention: Ensures that duplicate news articles are not displayed.
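
As a rough illustration of the extraction and duplicate-prevention features, the sketch below parses a small HTML snippet with the standard-library `html.parser` and drops repeated links. The markup is hypothetical (the real BBC page structure differs and changes over time), and the class and function names are invented here; main.py may rely on other libraries listed in requirements.txt.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://www.bbc.com"

# Hypothetical markup for illustration; not the real BBC page structure.
SAMPLE_HTML = """
<div>
  <a href="/news/articles/a1"><h2>Markets rally</h2><p>Stocks rose on Monday.</p></a>
  <a href="/news/articles/b2"><h2>Storm warning</h2><p>Heavy rain is expected.</p></a>
  <a href="/news/articles/a1"><h2>Markets rally</h2><p>Stocks rose on Monday.</p></a>
</div>
"""


class ArticleParser(HTMLParser):
    """Collect title, summary, and link from <a> blocks holding <h2> and <p>."""

    def __init__(self):
        super().__init__()
        self.articles = []
        self._current = None  # article being built while inside an <a>
        self._field = None    # "title" or "summary" while inside <h2>/<p>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._current = {"title": "", "summary": "",
                                 "link": urljoin(BASE_URL, href)}
        elif self._current is not None and tag == "h2":
            self._field = "title"
        elif self._current is not None and tag == "p":
            self._field = "summary"

    def handle_data(self, data):
        if self._current is not None and self._field:
            self._current[self._field] += data.strip()

    def handle_endtag(self, tag):
        if tag in ("h2", "p"):
            self._field = None
        elif tag == "a" and self._current is not None:
            if self._current["title"]:  # skip links without a headline
                self.articles.append(self._current)
            self._current = None


def deduplicate(articles):
    """Keep the first article seen for each link, preserving order."""
    seen = set()
    unique = []
    for article in articles:
        if article["link"] not in seen:
            seen.add(article["link"])
            unique.append(article)
    return unique


parser = ArticleParser()
parser.feed(SAMPLE_HTML)
articles = deduplicate(parser.articles)
print(len(parser.articles), "parsed,", len(articles), "unique")
```

Using the link as the deduplication key is a design choice for this sketch: the same story can appear in several sections of a homepage with slightly different surrounding text, while its URL stays the same.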

Collection Practices

Respect Robots.txt: The scraper only accesses parts of the website that are not disallowed by the robots.txt file.
Rate Limiting: The script keeps the number of requests to a minimum to avoid overloading the server.
No Bypassing Restrictions: The scraper does not attempt to access password-protected areas or bypass any security measures.
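
One common way to keep request volume low is to enforce a minimum pause between consecutive requests. The sketch below shows that idea; it is not necessarily how main.py limits its requests. The `PoliteThrottle` class, the `fetch_all` helper, and the 2-second default are hypothetical choices for illustration.

```python
import time


class PoliteThrottle:
    """Enforce a minimum delay between consecutive requests."""

    def __init__(self, min_interval=2.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock   # injectable so the logic can be tested without waiting
        self._sleep = sleep
        self._last = None     # time of the previous request, if any

    def wait(self):
        """Pause until min_interval has passed since the last call.

        Returns the number of seconds slept (0.0 if no pause was needed).
        """
        delay = 0.0
        if self._last is not None:
            elapsed = self._clock() - self._last
            delay = max(0.0, self.min_interval - elapsed)
            if delay > 0:
                self._sleep(delay)
        self._last = self._clock()
        return delay


def fetch_all(urls, fetch, throttle):
    """Call fetch(url) for each URL, pausing between calls.

    `fetch` is any caller-supplied function that downloads one page.
    """
    results = []
    for url in urls:
        throttle.wait()
        results.append(fetch(url))
    return results
```

A script that only requests the homepage once per run needs no throttle at all; this pattern matters mainly if the scraper is extended to follow article links.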

Data Handling and Privacy

No Personal Data Collection: The scraper does not collect any Personally Identifiable Information (PII) or user-specific data.
Secure Data Storage: Any data collected is stored securely, and the data files are listed in the .gitignore file to prevent accidental uploads to public repositories.

Data Usage

Educational and Research Purposes Only: The data collected is intended solely for educational demonstrations and research.
No Commercial Use: The data will not be used for commercial purposes or redistributed.

Ethical Considerations

For a detailed discussion of the ethical considerations of this project and how they are addressed, please refer to the ETHICS.md file.

Contributing

Contributions are welcome! If you have suggestions for improvements or encounter any issues, please open an issue or submit a pull request.
