Finding your dream vacation destination (with NLP)
Making the prediction algorithm was the easy part.
Garrett Mayock posted 2019-05-06 03:33:33 UTC
My Vacation Planner
Know what kind of vacation you want to experience, but not where to find it? Write a review of your dream vacation and use My Vacation Planner to find a match!
The planner takes in a written description (in a "review" format) of your dream vacation and returns a recommended destination. There are a few sample reviews at the bottom of the page for you to play around with to get the hang of it.
IMPORTANT NOTES AND LINKS:
- Data courtesy of Datafiniti's Hotel Reviews on Kaggle: https://www.kaggle.com/datafiniti/hotel-reviews
- Idea courtesy of a free webinar from Thinkful.com: https://www.thinkful.com/workshops/city/data-science-vacation/
- See my notebook on Kaggle for a walk-through of creating the classifiers: https://www.kaggle.com/gmayock/where-to-find-your-dream-vacation/
- See my GitHub for how I implemented the code on my website: https://github.com/gmayock/vacation_planner/
- To use it yourself: http://gmayock.com/py/vacation/
What it took to get done
Normally my blog posts are about the model I create. For that, I'm going to direct you to my Kaggle notebook; here I'll give a quick rundown of the effort it took to get this going.
I'll start by giving credit where credit is due. The data comes from Datafiniti, and the idea was seeded by a free webinar from Thinkful.com. The webinar took attendees as far as the part of my Kaggle notebook titled "Model Review"; everything after that is my own work. I also decided on my own to build a separate model for each season.
However, to actually make this part of my portfolio, I had to go a bit above and beyond "just creating a model".
Here's a quick summary of what I had to do:
- Use TF-IDF Vectorizer to create a bag of words
- Find most common terms to create a vocabulary
- Revectorize using vocabulary
- Train one classifier per season, measuring accuracy using train_test_split
- "Save" the classifiers and the vectorizer using pickle so I don't have to retrain them every time a new review is submitted
- Create a web page to host it (http://gmayock.com/py/vacation/)
- Get Python installed on my web server. My hosting company installed Python 2.7 on root
- Create an SSH access key
- SSH into my server using PuTTY
- Create a virtual environment for Python 3.7.3 while SSH'd in
- Install required packages using SSH
- Create Python file for server to run
- Create communication between Python file on server and My Vacation Planner page
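The modelling steps at the top of that list can be sketched roughly as follows. This is a minimal illustration, not the code from my notebook: the toy reviews, the `VOCAB_SIZE` cutoff, ranking terms by summed TF-IDF weight, the English stop-word filter, the choice of `MultinomialNB`, and the `vacation_models.pkl` file name are all assumptions made for the example.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Toy stand-in data (hypothetical); the real project uses Datafiniti's hotel reviews.
reviews = [
    "sunny beach with warm sand and surf",
    "beach bars and ocean swimming all day",
    "hiking trails with mountain views and wildflowers",
    "cool mountain air and long trail hikes",
    "powder snow and great skiing on the slopes",
    "cozy lodge after skiing in the snow",
    "warm winter sun and quiet beach walks",
    "escaped the cold for beach sunshine",
]
cities = ["Miami", "Miami", "Denver", "Denver", "Denver", "Denver", "Miami", "Miami"]
seasons = ["summer"] * 4 + ["winter"] * 4

VOCAB_SIZE = 20  # assumed cutoff for the vocabulary

# Pass 1: vectorize everything, then rank terms by their summed TF-IDF weight.
first_pass = TfidfVectorizer(stop_words="english")
weights = first_pass.fit_transform(reviews)
totals = np.asarray(weights.sum(axis=0)).ravel()
top_terms = sorted(
    first_pass.vocabulary_,
    key=lambda term: totals[first_pass.vocabulary_[term]],
    reverse=True,
)[:VOCAB_SIZE]

# Pass 2: revectorize using only the chosen vocabulary.
vectorizer = TfidfVectorizer(vocabulary=top_terms)
X = vectorizer.fit_transform(reviews)

# One classifier per season, each scored on a held-out split.
classifiers, accuracy = {}, {}
for season in sorted(set(seasons)):
    rows = [i for i, s in enumerate(seasons) if s == season]
    X_train, X_test, y_train, y_test = train_test_split(
        X[rows], [cities[i] for i in rows], test_size=0.25, random_state=0
    )
    classifiers[season] = MultinomialNB().fit(X_train, y_train)
    accuracy[season] = classifiers[season].score(X_test, y_test)

# "Save" the vectorizer and classifiers together so nothing is retrained per request.
model_path = os.path.join(tempfile.gettempdir(), "vacation_models.pkl")
with open(model_path, "wb") as f:
    pickle.dump({"vectorizer": vectorizer, "classifiers": classifiers}, f)

# Round trip: load the bundle back and classify a new review.
with open(model_path, "rb") as f:
    restored = pickle.load(f)
sample = restored["vectorizer"].transform(["snow and skiing"])
prediction = restored["classifiers"]["winter"].predict(sample)[0]
```

Pickling the vectorizer alongside the classifiers matters: a new review has to be transformed with the same vocabulary and IDF weights the classifiers were trained on.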
Here's what happens every time a user submits a review for analysis:
- Python file on server loads previously trained classifiers and the vectorizer using pickle
- Python file on server receives input from My Vacation Planner page
- Python file on server vectorizes text review input
- Python file on server submits the text review input to the appropriate classifier based on selected season
- Python file on server returns the classifier output to My Vacation Planner page
- Python file on server redirects user back to My Vacation Planner page
- My Vacation Planner page displays results
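The core of that request flow (load, vectorize, pick the season's classifier, predict) might look something like the sketch below. The function names `load_models` and `recommend` are hypothetical, and it assumes the pickled objects expose scikit-learn style `transform` and `predict` methods.

```python
import pickle


def load_models(path):
    """Load the pickled bundle of trained objects from disk."""
    # Only unpickle files you created yourself; pickle can execute arbitrary code.
    with open(path, "rb") as f:
        return pickle.load(f)


def recommend(review_text, season, vectorizer, classifiers):
    """Vectorize one review and ask the chosen season's classifier for a destination."""
    if season not in classifiers:
        raise ValueError(f"Unknown season: {season!r}")
    # transform() expects a list of documents, so wrap the single review.
    features = vectorizer.transform([review_text])
    # predict() returns one label per input row; there is only one row here.
    return classifiers[season].predict(features)[0]
```

If the bundle were saved as a dict with "vectorizer" and "classifiers" keys (an assumed layout), the call would be `recommend(text, "winter", bundle["vectorizer"], bundle["classifiers"])`. The web-facing part, receiving the form input and redirecting back to the page, is left out here; see the GitHub repository for how that is wired up.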
I'd never done any of that stuff before, so it was really exciting to get a chance to do so many new things! Especially for something as fun as this.
Okay, now go play around on it!