The goal of this project is to analyze my country's Airbnb data set. It also includes a simple recommendation system based on the description of each Airbnb, using Term Frequency–Inverse Document Frequency (TF-IDF) and the cosine similarity metric.
I downloaded the data sets for my country from here. You can do the same for your own country, or for any other country you want to analyze.
There are three data sets:
- listings.csv
- calendar.csv
- reviews.csv
The overall procedure across the notebooks consists of:
- Loading the data sets
- Dropping any rows that have a NaN value
- word_cloud.ipynb
  - Merging data sets
  - Text preprocessing
  - Generating Word Clouds
- recommendation.ipynb
  - Concatenating name and description columns
  - Text preprocessing
  - TF-IDF vectorization
  - Calculating the similarity of each Airbnb with the others
  - Storing 100 most similar Airbnbs for each one (Linear time)
- listings.ipynb
  - Cleaning price column
  - Data analysis
- calendar.ipynb
  - Cleaning price column and separating date to year, month and day columns
  - Data analysis
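
To make the recommendation steps more concrete, here is a minimal, dependency-free sketch of TF-IDF vectorization followed by cosine similarity and a top-k lookup. It is not the code from recommendation.ipynb: the function names (`tokenize`, `tfidf_vectors`, `cosine`, `most_similar`), the smoothed IDF formula, and the sample listings are all assumptions made for illustration. In practice a library such as scikit-learn's `TfidfVectorizer` would typically handle the vectorization step.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only (very basic preprocessing)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(documents):
    """Return one L2-normalised sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Smoothed IDF (an assumed variant), so terms found in every document keep a positive weight.
        vec = {
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        }
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else {})
    return vectors


def cosine(a, b):
    """For L2-normalised vectors, cosine similarity reduces to the dot product."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def most_similar(vectors, top_k=100):
    """For each listing index, return the indices of its top_k most similar other listings."""
    result = {}
    for i, vi in enumerate(vectors):
        scores = [(cosine(vi, vj), j) for j, vj in enumerate(vectors) if j != i]
        # Highest similarity first; ties broken by index for a stable order.
        scores.sort(key=lambda s: (-s[0], s[1]))
        result[i] = [j for _, j in scores[:top_k]]
    return result


# Hypothetical listings: name and description already concatenated into one string.
listings = [
    "Sunny loft Bright apartment near the beach with sea view",
    "Beach house Cozy house by the beach with a sea view terrace",
    "City studio Small studio in the business district close to the metro",
]
vectors = tfidf_vectors(listings)
similar = most_similar(vectors, top_k=2)
print(similar[0])  # prints [1, 2]
```

Once the `similar` dictionary is stored, recommending Airbnbs for a given listing is a single lookup by its index. Note that this sketch compares every pair of listings, so building the dictionary is quadratic in the number of listings.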