Over a period of nine years in deep space, the NASA Kepler space telescope has been out on a planet-hunting mission to discover hidden planets outside of our solar system.
To help process this data, you will create machine learning models capable of classifying candidate exoplanets from the raw dataset.
In this homework assignment, we are to:
- Preprocess the dataset prior to fitting the model.
- Perform feature selection and remove unnecessary features.
- Use
MinMaxScaler
to scale the numerical data. - Separate the data into training and testing data.
- Use
GridSearch
to tune model parameters. - Tune and compare at least two different classifiers.
- Python
- SVM (Machine Learning Model)
- Random forest (Machine Learning Model)
Random Forest is better than SVM, even without GridSearchCV.
Before CV | After CV | |
---|---|---|
Training Score | 0.85026 | 0.88777 |
Testing Score | 0.83898 | 0.88106 |
precision | recall | f1-score | support | |
---|---|---|---|---|
CANDIDATE | 0.85 | 0.65 | 0.73 | 523 |
CONFIRMED | 0.75 | 0.87 | 0.81 | 594 |
FALSE POS | 0.98 | 1.00 | 0.99 | 1069 |
micro avg | 0.88 | 0.88 | 0.88 | 2186 |
macro avg | 0.86 | 0.84 | 0.84 | 2186 |
weighted avg | 0.88 | 0.88 | 0.88 | 2186 |
Before CV | After CV | |
---|---|---|
Training Score | 0.99497 | 1.0 |
Testing Score | 0.87740 | 0.89616 |
precision | recall | f1-score | support | |
---|---|---|---|---|
CANDIDATE | 0.84 | 0.73 | 0.78 | 523 |
CONFIRMED | 0.81 | 0.86 | 0.83 | 594 |
FALSE POS | 0.97 | 1.00 | 0.98 | 1069 |
micro avg | 0.90 | 0.90 | 0.90 | 2186 |
macro avg | 0.87 | 0.86 | 0.87 | 2186 |
weighted avg | 0.89 | 0.90 | 0.89 | 2186 |