Home
Welcome to the hotel-dash wiki!
Hotel Dash is a data visualization web app powered by Plotly Dash. The app visualizes hotel business performance and currently supports the Bangkok city area. Performance is determined from a hotel's price range, review score, and review count. The main idea behind projecting future business performance is basic supply and demand. In general, hotels compete with each other mostly within a certain price range and local area. There is no fixed price range, but the tiers usually include Luxury & Upper Upscale, Upscale & Upper Midscale, and Midscale & Economy. However, the price range for each tier is highly dependent on the region. Therefore, we integrate an unsupervised clustering algorithm to identify hotel hubs in the city. Our hypothesis is that hotels behave differently in each hub, so the demand for the luxury, middle, and economy tiers is expected to differ from hub to hub. Thus, our main goal is to generate a hotel competitiveness score that reflects a hotel's business performance relative to its neighboring competitors.
Our hotel data is collected from the Expedia website. To retrieve the information faster and more easily, I decided to use Apify to perform the web scraping tasks automatically.
├── Dockerfile # Dockerfile
├── README.md
├── assets
│ └── style.css # CSS file that styles the Dash app
├── csv_files # stores CSV files
│ ├── cleaned_hotel.csv
│ ├── clustered_hotel.csv
│ ├── clustered_hotel_regressor_filled.csv
│ └── hotel_data_expedia.csv
├── main.py # MAIN python file to run the web app
├── models # provide KMeans, DBSCAN, AgglomerativeClustering, DBSCAN algorithms
│ ├── clustering_models.py # Class trains/predicts and stores results for model evaluate
│ └── dbscan.py
├── preprocessing
│ ├── clean_data.py # clean data from web scraping
│ ├── feature_engineering.py # perform feature engineering tasks
│ └── graphs.py # contain functions that plot graphs in DASH
├── requirements.txt # package install requirements
├── train_clustering.py # use to train clustering algorithms
└── train_regressor.py # fill missing value with XGBoost regressor
Hotel Dash consists of three main steps:
- Machine Learning Algorithms
- Developing Dash
- Deployment
The app currently supports four unsupervised algorithms, KMeans, DBSCAN, AgglomerativeClustering, and DBSCAN, to perform geospatial clustering that identifies competitive hotel hubs. Following an object-oriented design, the models are implemented as classes in clustering_models.py and dbscan.py. Their functions use the class inputs as configuration, and train_clustering.py calls them to train and evaluate the models. All models are evaluated and scored using silhouette plots, and for KMeans the optimal number of clusters can also be identified with the elbow method.
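To make the silhouette evaluation concrete, here is a minimal from-scratch sketch of the mean silhouette coefficient, written with the standard library only. It is not the code in clustering_models.py, which presumably relies on a library implementation. The coordinates, the labels, and the function name are hypothetical, and plain Euclidean distance on latitude/longitude is used as a rough approximation.

```python
from math import dist
from statistics import mean


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (ranges from -1 to 1)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point's zero distance to itself does not change the sum)
        a = sum(dist(point, other) for other in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(
            mean(dist(point, other) for other in members)
            for other_label, members in clusters.items()
            if other_label != label
        )
        denominator = max(a, b)
        scores.append((b - a) / denominator if denominator > 0 else 0.0)
    return mean(scores)


# Hypothetical (latitude, longitude) pairs forming two hotel hubs
hotels = [
    (13.74, 100.53), (13.75, 100.54), (13.74, 100.54),
    (13.72, 100.60), (13.73, 100.61), (13.72, 100.61),
]
good_labels = [0, 0, 0, 1, 1, 1]  # each hub gets its own cluster
bad_labels = [0, 1, 0, 1, 0, 1]   # hubs are mixed together

print(silhouette_score(hotels, good_labels))
print(silhouette_score(hotels, bad_labels))
```

A labeling that matches the hubs scores close to 1, while a labeling that mixes them scores much lower, which is the signal used when comparing clustering configurations.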
We use the Dash framework to simplify web development, which allows us to quickly build an interactive data visualization web app. Dash wraps HTML, CSS, and JavaScript in pure Python. By using the Dash HTML Components module, you can create your layout without writing HTML or using an HTML templating engine. The files associated with this part are main.py and style.css.
There are three main pillars in developing Dash: the layout, the graphs, and the callbacks.
One can use either HTML components or Dash Bootstrap Components to construct the app layout. Dash allows HTML to be written in Python. In this project, the HTML components are styled with style.css. Furthermore, the Dash Core Components module provides components such as dropdowns, checklists, and sliders, which add interactivity to the app. Personally, I referred to a Medium tutorial when developing with Dash for the first time. I recommend following the tutorial, because it gave me a huge head start.
With the page structure created, we must now add content, which here means a LOT of graphs; they are listed in graphs.py. Plotly provides many kinds of graphs that make a Dash app look astonishing. I use Plotly Mapbox to visualize hotel locations and clustered hubs interactively.
@app.callback allows the app to accept inputs from Dash Core Components and generate responses that change the way we present data. This is how we create interactivity, and we can design exactly how it works inside the callback function. For example, a dropdown can pick which data to display; once the user clicks an option, the graphs change to show that data. In this app, I added two sets of callbacks. The first is a pair of sliders that filter the hotels to display by price and review score. The second takes the algorithm configuration as input: a dropdown lets users pick which clustering algorithm is used to cluster the hotels, while a switch controls whether missing price values are filled with the regressor. However, the output does not update as soon as the choices are made; the values are instead stored as STATE. They affect the visuals only once the button is clicked, which triggers the OUTPUT that applies the stored STATE.
In this case, we decided to use Microsoft Azure's Container Instances service to host our custom Docker container and let the app run on it. Keep in mind that we must set the host address to 0.0.0.0 and set the port to os.environ['APP_PORT']. The host address allows other computers to access the app, while we will set the port to the default (80) in Azure.
To begin with, we must create a Dockerfile by simply naming an empty file 'Dockerfile'; PyCharm will automatically detect the file type. Then, I use pipreqs in the terminal to generate requirements.txt. My Dockerfile is included in the repository, but one can customize it any way they want. A Dockerfile is a text document that contains all the commands a user could call on the command line to assemble an image. Using docker build, users can create an automated build that executes several command-line instructions in succession.
Now, we must build an image from this Dockerfile and put it on Azure so that we can create a container to host our app. First, build the image. On a Mac with an Apple silicon chip, the command is:
docker buildx build --platform=linux/amd64 -t account-name/image-name:latest .
Do not forget the '.' at the end, separated from the image tag by a space. Next, we have to push this image to a registry, for which I chose Docker Hub. I push the image using this command:
docker push account-name/image-name:latest
Now, we are ready to create a container instance in Azure. Simply search for it in the search bar, and you will find Container Instances. On the page, fill in the information as follows.
If you are using Docker Hub like me, enter registry.hub.docker.com/account-name/image-name:latest in the image field. Click Next when everything is set. On the next page, write your DNS name label and leave the next option as Tenant. I also leave the port as 80, then click Next. On the next page, we have to set an environment variable, because we set port=os.environ['APP_PORT'] in main.py. Thus, we add a variable named APP_PORT with the value 80.
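On the application side, reading this setting at startup might look like the sketch below. The function name is hypothetical, and the fallback to port 80 when APP_PORT is missing is an assumption added for the example; the original os.environ['APP_PORT'] lookup raises a KeyError instead. Environment variables are always strings, so the value is converted to an integer.

```python
import os


def server_settings(environ=None):
    """Return the (host, port) pair the app should listen on."""
    env = os.environ if environ is None else environ
    # 0.0.0.0 makes the app reachable from outside the container
    host = "0.0.0.0"
    # Assumption: fall back to port 80 when APP_PORT is not set
    port = int(env.get("APP_PORT", "80"))
    return host, port


host, port = server_settings()
# With Dash, these values would then be passed to app.run(host=host, port=port).
```

Passing a dictionary instead of the real environment, for example server_settings({"APP_PORT": "8080"}), makes the behavior easy to check locally before deploying.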
After everything is all set, you are ready to review + create the instance.
Once the container instance is started and running, the FQDN is the address that leads to the app website. You must append the port number if you are not using port 80, e.g. :8080. I use port 80, which is the default port, so I do not have to add :80 after the FQDN. You have made it to the end! If you have more questions, please feel free to ask.