A machine learning project that classifies labeled disaster data from Figure Eight. It is part of Udacity's Data Science Nanodegree and has three main components:
- ETL scripts to process the data
- Machine learning pipeline to extract features, then train and optimize a classifier model using grid search
- Web application to evaluate the model and get statistics about the training set data
Run the following commands in the project's root directory to set up your database and model.
- To run the ETL pipeline that cleans the data and stores it in a database:
python data/process_data.py data/disaster_messages.csv data/disaster_categories.csv data/DisasterResponse.db
- To run the ML pipeline that trains the classifier and saves it:
python models/train_classifier.py data/DisasterResponse.db models/classifier.pkl
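As a rough illustration of what the ETL step above does (clean the data and store it in a database), here is a minimal, dependency-free sketch. The CSV layouts, the `categories` encoding (`name-flag` pairs separated by `;`), the table name `messages`, and the helper names are all assumptions made for illustration; they are not taken from `process_data.py`.

```python
import csv
import io
import sqlite3

# Hypothetical sample data; the real CSV layouts are assumed, not confirmed.
MESSAGES_CSV = """id,message
1,We need water and food
2,Is the storm over
"""
CATEGORIES_CSV = """id,categories
1,water-1;food-1;weather-0
2,water-0;food-0;weather-1
"""


def load_rows(text):
    """Parse CSV text into a dict keyed by integer id."""
    return {int(row["id"]): row for row in csv.DictReader(io.StringIO(text))}


def expand_categories(encoded):
    """Turn 'water-1;food-0' into {'water': 1, 'food': 0}."""
    flags = {}
    for item in encoded.split(";"):
        name, _, value = item.rpartition("-")
        flags[name] = int(value)
    return flags


def run_etl(messages_text, categories_text, conn):
    """Join messages with their category flags and store them in one table."""
    messages = load_rows(messages_text)
    categories = load_rows(categories_text)
    records = []
    for msg_id, msg in messages.items():
        if msg_id not in categories:
            continue  # skip messages that have no labels
        flags = expand_categories(categories[msg_id]["categories"])
        records.append({"id": msg_id, "message": msg["message"], **flags})
    if not records:
        return 0
    # Assumes every record carries the same set of category columns.
    columns = list(records[0])
    col_defs = ", ".join(f'"{c}"' for c in columns)
    conn.execute(f"CREATE TABLE messages ({col_defs})")
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f"INSERT INTO messages VALUES ({placeholders})",
        [tuple(r[c] for c in columns) for r in records],
    )
    conn.commit()
    return len(records)


conn = sqlite3.connect(":memory:")  # in-memory database for the example
stored = run_etl(MESSAGES_CSV, CATEGORIES_CSV, conn)
rows = conn.execute(
    "SELECT id, message, water, food, weather FROM messages ORDER BY id"
).fetchall()
```

Each message ends up as one row with one 0/1 column per category, which is a convenient shape for training a multi-label classifier.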
Run the following command in the app's directory to run your web app.
python run.py
Go to http://0.0.0.0:3001
- data/: ETL scripts to process data and save it into a relational database
- models/: Machine learning pipeline code for extracting features, training, and optimizing the model using grid search
- logs/: Output of best model performance (precision, recall and f1-score) using test data
- plot_data/: Scripts to wrangle data and prepare web app plots
- app/: Flask app code
- test/: Unit tests to validate customized tokenizers and sklearn estimators
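To give a sense of what a custom tokenizer and its unit test might look like, here is a small sketch using only the standard library. The function name `tokenize`, the `urlplaceholder` token, and the exact normalization rules are assumptions for illustration; the tokenizers in this repository may behave differently.

```python
import re

URL_PATTERN = re.compile(r"https?://\S+")
TOKEN_PATTERN = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase the text, replace URLs with a placeholder, and split into tokens."""
    text = URL_PATTERN.sub(" urlplaceholder ", text.lower())
    return TOKEN_PATTERN.findall(text)


def test_tokenize_replaces_urls():
    """A hypothetical unit test in the style that test/ could contain."""
    tokens = tokenize("Help! See http://example.com/map NOW")
    assert tokens == ["help", "see", "urlplaceholder", "now"]


test_tokenize_replaces_urls()
```

A function with this shape (string in, list of tokens out) could be passed as the `tokenizer` argument of scikit-learn text vectorizers such as `CountVectorizer`.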
For a given message, the app runs the model and outputs the predicted categories.
The app also shows an overview of the training set distribution and the input message content.
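The step from a model prediction to the displayed categories can be sketched as below, assuming the classifier returns one 0/1 flag per category. The category names and the helper `predicted_categories` are hypothetical and only illustrate the idea.

```python
def predicted_categories(category_names, prediction):
    """Return the names of the categories whose predicted flag is 1."""
    if len(category_names) != len(prediction):
        raise ValueError("prediction must have one flag per category")
    return [name for name, flag in zip(category_names, prediction) if flag == 1]


# Hypothetical category names and a hypothetical prediction for one message.
names = ["related", "water", "food", "shelter"]
prediction = [1, 1, 0, 0]
labels = predicted_categories(names, prediction)
print(labels)  # ['related', 'water']
```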
Credit goes to Figure Eight and Udacity for providing the dataset.
This project is licensed under the MIT License.