Survey Insights Engine: Advanced Topic Modeling and Sentiment Analysis

Overview

The Survey Insights Engine analyzes unstructured survey responses using natural language processing (NLP) techniques. This version adds improved topic modeling, enhanced preprocessing, and more comprehensive sentiment analysis to extract meaningful insights from text data.

Key Features

Improved Text Preprocessing
- Enhanced tokenization and lemmatization
- Customizable stop words removal, including domain-specific terms
- Retention of meaningful short words for better context preservation

Advanced Topic Modeling
- Latent Dirichlet Allocation (LDA) with optimized hyperparameters
- Non-Negative Matrix Factorization (NMF) for alternative topic extraction
- Improved coherence scores for more meaningful topic identification

Comprehensive Sentiment Analysis
- VADER sentiment analyzer for nuanced sentiment scoring
- Visualization of sentiment distribution across responses

Enhanced Visualizations
- Topic distribution charts for both LDA and NMF models
- Interactive heatmaps for topic distribution across responses
- Word clouds for easy identification of key terms in each topic

Model Validation and Comparison
- Coherence score calculation for both LDA and NMF models
- Comparative analysis between LDA and NMF results

Flexible Data Handling
- Support for various input formats, including CSV files and lists
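
The preprocessing features listed above can be illustrated with a minimal, standard-library sketch. It is not the project's actual pipeline: the stop-word sets, the `KEEP_SHORT` set, and the `preprocess` function are hypothetical names chosen for illustration, and lemmatization is left out because it needs an NLP library such as NLTK or spaCy.

```python
import re

# Hypothetical stop-word lists; the project's real lists are not shown in this README.
BASE_STOP_WORDS = {"the", "a", "an", "and", "or", "is", "was", "to", "of", "it", "i", "but"}
DOMAIN_STOP_WORDS = {"survey", "response"}  # domain-specific terms to drop

# Short words assumed to carry meaning, kept despite the length filter.
KEEP_SHORT = {"no", "ui", "ok"}


def preprocess(text, extra_stop_words=(), min_len=3):
    """Lowercase, tokenize, remove stop words, and drop short tokens
    unless they are listed in KEEP_SHORT."""
    stop_words = BASE_STOP_WORDS | DOMAIN_STOP_WORDS | set(extra_stop_words)
    kept = []
    for tok in re.findall(r"[a-z']+", text.lower()):
        tok = tok.strip("'")
        if not tok or tok in stop_words:
            continue
        if len(tok) < min_len and tok not in KEEP_SHORT:
            continue
        kept.append(tok)
    return kept


tokens = preprocess("The UI is slow, and I got no reply to my survey response.")
print(tokens)
```

Passing `extra_stop_words` is one way to add domain-specific terms per dataset without editing the base lists.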

Prerequisites

- Python 3.7+
- Jupyter Notebook or JupyterLab

Installation

Clone the repository:
```
git clone https://github.com/your-username/survey-insights-engine.git
```
Navigate to the project directory:
```
cd survey-insights-engine
```
Install required packages:
```
pip install -r requirements.txt
```

Usage

- Open survey_insights_engine.ipynb in Jupyter Notebook or JupyterLab.
- Update the survey_responses variable with your data or specify the path to your CSV file.
- Run all cells in the notebook to perform the analysis.
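
As a rough illustration of the second step, the sketch below builds a `survey_responses` list either from an in-memory list or from a CSV file. The helper names and the `response` column name are assumptions made for this example; the notebook's own loading code may differ.

```python
import csv
import io


def responses_from_csv(file_obj, column="response"):
    """Read one text column from an open CSV file, skipping blank cells.
    The column name 'response' is an assumption; use your file's header."""
    reader = csv.DictReader(file_obj)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise ValueError(f"CSV has no column named {column!r}")
    return [row[column].strip() for row in reader
            if row[column] and row[column].strip()]


def responses_from_list(items):
    """Clean a plain Python list: drop None and empty entries, trim whitespace."""
    return [str(item).strip() for item in items
            if item is not None and str(item).strip()]


# Demo with an in-memory CSV. For a real file, something like:
#   with open("responses.csv", newline="", encoding="utf-8") as f:
#       survey_responses = responses_from_csv(f)
sample_csv = io.StringIO("id,response\n1,Great service\n2,\n3,  Too slow  \n")
survey_responses = responses_from_csv(sample_csv)
print(survey_responses)

from_list = responses_from_list(["Good", "", None, " ok "])
print(from_list)
```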

Output

The engine generates several output files:

- lda_topic_distribution.png and nmf_topic_distribution.png: Bar charts of the average topic distribution for the LDA and NMF models.
- lda_topic_heatmap.png and nmf_topic_heatmap.png: Heatmaps of the topic distribution across responses.
- vader_sentiment_distribution.png: Histogram of sentiment scores.
- topic_X_wordcloud.png: Word cloud for each topic, where X is the topic number, or a single overall word cloud.
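
To read the sentiment histogram, it can help to bucket scores into labels. The sketch below assumes the plotted values are VADER compound scores (the README does not say which score is used) and applies the cutoffs of plus or minus 0.05 that are conventionally used with VADER. The function names and the example scores are made up for illustration.

```python
from collections import Counter


def label_compound(score, threshold=0.05):
    """Map a compound score in [-1, 1] to a coarse sentiment label."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


def summarize(scores):
    """Count how many scores fall under each label."""
    counts = Counter(label_compound(s) for s in scores)
    return {label: counts.get(label, 0)
            for label in ("negative", "neutral", "positive")}


example_scores = [0.82, -0.40, 0.0, 0.03, -0.05, 0.51]  # made-up values
summary = summarize(example_scores)
print(summary)
```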

Improvements

This version introduces several key improvements:

- Enhanced Coherence: Coherence scores are now calculated for both the LDA and NMF models, so you can compare them directly and select the model whose topics are more coherent and interpretable for your data.
- Improved Preprocessing: The new preprocessing pipeline includes lemmatization and more nuanced stop word removal, resulting in more meaningful topic extraction.
- Dual Topic Modeling: The addition of NMF alongside LDA provides a complementary perspective on topic extraction, which can yield more interpretable topics for some datasets.
- Extended Visualizations: New visualizations for NMF results and comparative views between LDA and NMF offer deeper insight into the topic structure of your data.
- Flexible Hyperparameter Tuning: Adjustable hyperparameters for both LDA and NMF models allow fine-tuning for your specific dataset.
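
The README does not state which coherence measure the engine uses. As one possible illustration of comparing topics by coherence, the sketch below computes a UMass-style score from document co-occurrence counts, averaged over word pairs. The function name and the toy data are hypothetical, and the project's own metric may be a different one.

```python
import math


def umass_coherence(top_words, tokenized_docs):
    """UMass-style coherence: mean of log((D(w_i, w_j) + 1) / D(w_j)) over
    ordered pairs of a topic's top words, where D counts documents containing
    the given words. Values closer to zero indicate more coherent topics."""
    doc_sets = [set(doc) for doc in tokenized_docs]

    def doc_freq(*words):
        return sum(1 for d in doc_sets if all(w in d for w in words))

    total, pairs = 0.0, 0
    for i in range(1, len(top_words)):
        for j in range(i):
            d_j = doc_freq(top_words[j])
            if d_j == 0:
                continue  # skip words that never occur in the corpus
            total += math.log((doc_freq(top_words[i], top_words[j]) + 1) / d_j)
            pairs += 1
    return total / pairs if pairs else float("nan")


# Toy tokenized corpus, made up for illustration.
docs = [
    ["slow", "app", "crash"],
    ["app", "crash", "login"],
    ["price", "expensive"],
    ["price", "plan", "expensive"],
]
coherence_a = umass_coherence(["app", "crash"], docs)  # words that co-occur
coherence_b = umass_coherence(["app", "price"], docs)  # words that never co-occur
print(coherence_a, coherence_b)
```

Applying the same function to the top words of each LDA topic and each NMF topic, then averaging per model, gives one number per model to compare.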

Customization

You can customize the analysis by modifying:

- num_topics: Number of topics for LDA and NMF (default is 5)
- doc_topic_prior and topic_word_prior: Prior hyperparameters for LDA
- max_df and min_df: Document-frequency thresholds in TfidfVectorizer that control vocabulary building
- ngram_range: Range of n-gram sizes, used to include multi-word phrases in the analysis

Troubleshooting

If you encounter issues:

- Ensure all required packages are installed correctly.
- Verify that your input data is in the correct format.
- Check the Jupyter notebook for any error messages and refer to the documentation of the relevant libraries.
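
The first two checks can be partly automated. The sketch below is a hypothetical helper, not part of the project: the package names in `assumed_packages` are guesses, so consult requirements.txt for the real list.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def invalid_responses(responses):
    """Return the indices of entries that are not non-empty strings."""
    return [i for i, r in enumerate(responses)
            if not isinstance(r, str) or not r.strip()]


# Assumed import names; check requirements.txt for the actual dependencies.
assumed_packages = ["sklearn", "nltk", "matplotlib", "wordcloud"]
print("Missing packages:", missing_packages(assumed_packages))

bad_rows = invalid_responses(["ok", "", None, "  ", "fine"])
print("Invalid response indices:", bad_rows)
```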

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Repository: rose-deasha/Topic-Modeling-and-Sentiment-Analysis
