HTX Technical Assessment
Before you start, make sure you have Python 3.10 installed on your machine.
First, create a virtual environment.
python3 -m venv venv
This creates a new directory named venv that holds the isolated Python environment.
To activate the virtual environment, run:
# On Linux or macOS:
source venv/bin/activate
# On Windows:
.\venv\Scripts\activate
Your shell prompt will change to show the name of the activated environment.
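If you want to confirm from Python itself that the virtual environment is active and that the interpreter is 3.10, the minimal sketch below may help. It uses only the standard library and relies on the fact that inside a venv, sys.prefix points at the venv while sys.base_prefix points at the base installation. The helper names are hypothetical.

```python
import sys


def in_virtualenv() -> bool:
    """True when running inside a venv: sys.prefix then differs from sys.base_prefix."""
    return sys.prefix != sys.base_prefix


def python_version_ok(required: tuple[int, int] = (3, 10)) -> bool:
    """True when the interpreter's major.minor version equals the required one."""
    return sys.version_info[:2] == required


if __name__ == "__main__":
    print(f"Python {sys.version.split()[0]}")
    print("virtual environment active:", in_virtualenv())
    print("Python 3.10:", python_version_ok())
```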
Next, install the Python dependencies:
pip install -r requirements.txt
To run the server, use the uvicorn command:
uvicorn asr.asr-api:app --reload
NOTE: To avoid relative import errors, run this command from the root directory of the project.
The setup.sh script downloads the data needed for Task 2d, including the sample audio files for cv-valid-dev.
To run the script, use the following command from the root directory of the project:
./setup.sh
This will download a data.zip file, unzip it to create a data directory, and then delete the data.zip file. If the data directory already exists, the script will not download anything.
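The download-unzip-delete behaviour described above can be expressed in Python as the sketch below. This is an illustrative equivalent, not the contents of setup.sh: the archive URL is not given in this README, so DATA_URL is a hypothetical placeholder, and the sketch assumes the archive contains a top-level data/ folder.

```python
import os
import urllib.request
import zipfile
from pathlib import Path


def fetch_data(url: str, root: Path = Path(".")) -> bool:
    """Download data.zip into root, extract it, then delete the archive.

    Returns False without downloading if root/"data" already exists.
    """
    if (root / "data").exists():
        return False
    archive = root / "data.zip"
    urllib.request.urlretrieve(url, str(archive))  # handles http(s):// and file:// URLs
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(root)  # assumption: the archive has a top-level data/ folder
    os.remove(archive)
    return True


if __name__ == "__main__":
    DATA_URL = "https://example.com/data.zip"  # hypothetical placeholder URL
    fetch_data(DATA_URL)
```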
For Task 2b
Once the server is up and running, you can check that it is working by pinging the /ping endpoint with the following curl command:
curl http://127.0.0.1:8000/ping
This should return "pong", indicating that the server is running correctly.
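The same health check can be scripted with the standard library, for example as a smoke test. This is a sketch, not part of the project: it assumes the endpoint answers with "pong", either as a JSON string (which is what FastAPI's default JSONResponse sends if the handler returns a Python str) or as plain text, and accepts both forms.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"


def is_pong(raw: bytes) -> bool:
    """True if the body is "pong", either JSON-encoded ("\"pong\"") or plain text."""
    text = raw.decode("utf-8").strip()
    try:
        text = json.loads(text)
    except json.JSONDecodeError:
        pass  # not JSON, so compare the plain text as-is
    return text == "pong"


def ping(base_url: str = BASE_URL, timeout: float = 5.0) -> bool:
    """Send GET /ping and report whether the server answered "pong"."""
    with urllib.request.urlopen(f"{base_url}/ping", timeout=timeout) as resp:
        return is_pong(resp.read())


if __name__ == "__main__":
    print("server up:", ping())
```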
For Task 2c
You can test the ASR endpoint by sending a POST request with an audio file. A sample audio file, sample.mp3, is provided. Use the following curl command:
curl -X POST "http://127.0.0.1:8000/asr" -H "accept: application/json" -H "Content-Type: multipart/form-data" -F "file=@./sample.mp3"
A JSON response similar to the following is returned:
{
"transcription": "I THINK JOHN TEYLOR IS E YOUNG CAPABLE A I ENGINEER HE WOULD MAKE A GREAT FIT FOR ST S",
"duration": "6.635125"
}
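For scripted use, the same request can be made from Python. The standard library has no multipart encoder, so this sketch builds the multipart/form-data body by hand. The form field name file and the transcription and duration keys come from the curl example and response above; the helper names are hypothetical. Because duration is returned as a string, the sketch converts it to a float.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path


def encode_multipart(field: str, filename: str, data: bytes) -> tuple[bytes, str]:
    """Encode one file as a multipart/form-data body; return (body, content_type)."""
    boundary = uuid.uuid4().hex
    ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {ctype}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def transcribe(path: str, url: str = "http://127.0.0.1:8000/asr") -> tuple[str, float]:
    """POST an audio file to /asr and return (transcription, duration in seconds)."""
    body, content_type = encode_multipart("file", Path(path).name, Path(path).read_bytes())
    req = urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": content_type, "Accept": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        payload = json.load(resp)
    return payload["transcription"], float(payload["duration"])


if __name__ == "__main__":
    text, seconds = transcribe("sample.mp3")
    print(f"{seconds:.2f}s: {text}")
```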
With the server running, you can run the cv-decode.py script to send each sample file in the data directory to the local server. The script generates a transcription for each file and writes a new cv-valid-dev-updated.csv file to the asr directory.
For Task 2d, the generated cv-valid-dev-updated.csv file is already included in the asr directory.
If you would like to regenerate it, run the following command from the root directory of the project:
python asr/cv-decode.py
NOTE: To avoid relative import errors, run the script from the root directory of the project.
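Conceptually, the decoding step reads the CSV, transcribes each referenced audio file, and writes the rows back out with an extra column. The sketch below shows that loop under stated assumptions: the column names filename and generated_text are assumptions rather than details confirmed by this README, and the transcribe callable is injected so that any client, such as one that posts the file to http://127.0.0.1:8000/asr and returns its transcription field, can be plugged in.

```python
import csv
from pathlib import Path
from typing import Callable


def decode_csv(
    in_csv: Path,
    out_csv: Path,
    audio_dir: Path,
    transcribe: Callable[[Path], str],
    column: str = "generated_text",  # assumed name of the new column
) -> int:
    """Copy in_csv to out_csv with an added transcription column; return rows written."""
    with open(in_csv, newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src)
        fieldnames = list(reader.fieldnames or []) + [column]
        rows = list(reader)
    for row in rows:
        # Assumption: a "filename" column holds paths relative to audio_dir.
        row[column] = transcribe(audio_dir / row["filename"])
    with open(out_csv, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Injecting transcribe keeps the CSV handling testable without a running server.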
For Task 2e, we containerize the FastAPI app using a Dockerfile. To build, run, and test the image, follow the steps below:
- Build the Docker image:
docker build -t asr-api .
- Run the Docker container:
docker run -p 80:80 asr-api
- Test the API:
curl http://localhost:80/ping
curl -X POST "http://localhost:80/asr" -H "accept: application/json" -H "Content-Type: multipart/form-data" -F "file=@./sample.mp3"
Note: Because the API uses FastAPI's UploadFile class, uploaded files are stored as temporary files and discarded once each request has finished.
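Many audio loaders expect a file path rather than a stream. One common pattern, shown here as a hypothetical sketch rather than the project's actual handler, is to copy the upload's file-like object (UploadFile.file in FastAPI) into a named temporary file, pass its path on, and delete it when the work is done. An io.BytesIO stands in for the upload in the example.

```python
import io
import os
import shutil
import tempfile
from typing import BinaryIO, Callable, TypeVar

T = TypeVar("T")


def with_temp_copy(stream: BinaryIO, suffix: str, use: Callable[[str], T]) -> T:
    """Copy a file-like upload to a temp file on disk, call use(path), then delete it."""
    tmp = tempfile.NamedTemporaryFile(suffix=suffix, delete=False)
    try:
        with tmp:  # close before use() so the file can be reopened on any OS
            shutil.copyfileobj(stream, tmp)
        return use(tmp.name)
    finally:
        os.remove(tmp.name)  # nothing from the request is left on disk


if __name__ == "__main__":
    print(with_temp_copy(io.BytesIO(b"abc"), ".mp3", os.path.getsize))
```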
The PDF report can be found in the deployment-design/design.pdf file.
A Docker Compose file spins up an Elasticsearch cluster with two nodes. The cv-index.py script indexes the asr/cv-valid-dev-updated.csv file into the Elasticsearch cluster.
cd elastic-backend
docker-compose up -d
cd ..
python elastic-backend/cv-index.py
NOTE: To avoid relative import errors, run the script from the root directory of the project.
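Indexing a CSV into Elasticsearch is commonly done through the _bulk API, which takes newline-delimited JSON: an action line followed by a document line for each row, ending with a newline. The sketch below builds and sends such a body using only the standard library. It is not the actual cv-index.py (which may well use the official Elasticsearch client); the index name cv-transcriptions, the URL http://localhost:9200, and the absence of authentication are assumptions.

```python
import csv
import json
import urllib.request
from typing import Iterable, Mapping


def bulk_body(rows: Iterable[Mapping[str, str]], index: str) -> bytes:
    """Build an NDJSON _bulk payload: one action line plus one source line per row."""
    lines = []
    for row in rows:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(dict(row)))
    return ("\n".join(lines) + "\n").encode("utf-8")  # _bulk requires a final newline


def index_csv(csv_path: str, index: str = "cv-transcriptions",
              es_url: str = "http://localhost:9200") -> dict:
    """Send every CSV row in a single _bulk request and return the parsed response."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        body = bulk_body(csv.DictReader(f), index)
    req = urllib.request.Request(
        f"{es_url}/_bulk",
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-ndjson"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    result = index_csv("asr/cv-valid-dev-updated.csv")
    print("errors:", result["errors"])
```

A single request is fine for a dataset of this size; larger files would normally be sent in chunks.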
The Search UI is a React app based on a sample template from the search-ui docs, originally built for searching a movie database. The app config has been modified for our use case of cv-transcriptions. A docker-compose file has been added to the search-ui directory to spin up the Search UI and the Elasticsearch cluster together.
Navigate to the search-ui directory of the project, where the docker-compose.yaml file is located, and run the following command to start the services:
docker-compose up
Once the Elasticsearch service is running, the dataset must be indexed into Elasticsearch for search to work. In a separate terminal, navigate to the project root directory and run the following command:
python elastic-backend/cv-index.py
This script indexes the CSV file data into the Elasticsearch service, ensuring that the search functionality is operational.
NOTE: To avoid relative import errors, run the script from the root directory of the project.
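Indexing fails if it starts before the cluster is ready. Before running cv-index.py, a polling loop against the _cluster/health endpoint can confirm that Elasticsearch is accepting requests. This is a sketch assuming Elasticsearch listens on http://localhost:9200 without authentication; treating yellow as "ready" is a judgment call.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_health(es_url: str) -> dict:
    """GET /_cluster/health and return the decoded JSON."""
    with urllib.request.urlopen(f"{es_url}/_cluster/health", timeout=5) as resp:
        return json.load(resp)


def wait_for_cluster(
    es_url: str = "http://localhost:9200",
    timeout: float = 120.0,
    interval: float = 2.0,
    fetch: Callable[[str], dict] = fetch_health,
) -> str:
    """Poll until the cluster reports yellow or green; return that status."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            status = fetch(es_url).get("status")
            if status in ("yellow", "green"):
                return status
        except (OSError, ValueError):
            pass  # node not accepting connections yet, or not returning JSON
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Elasticsearch at {es_url} not ready after {timeout}s")
        time.sleep(interval)


if __name__ == "__main__":
    print("cluster status:", wait_for_cluster())
```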
After the services are up and the data has been indexed, the frontend application is available at http://localhost:3000. This URL serves the Search UI React app, through which users can search the indexed data.
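To check the indexed data without the frontend, a full-text query can be sent straight to the Elasticsearch _search endpoint. The sketch below uses a standard match query; the index name cv-transcriptions, the field name generated_text, and the URL http://localhost:9200 are assumptions and should be adjusted to match the Search UI configuration.

```python
import json
import urllib.request


def match_query(text: str, field: str = "generated_text", size: int = 10) -> dict:
    """Build a full-text match query body for the _search endpoint."""
    return {"size": size, "query": {"match": {field: text}}}


def search(text: str, index: str = "cv-transcriptions",
           es_url: str = "http://localhost:9200") -> list[dict]:
    """Run the match query and return the _source of each hit."""
    req = urllib.request.Request(
        f"{es_url}/{index}/_search",
        data=json.dumps(match_query(text)).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        hits = json.load(resp)["hits"]["hits"]
    return [hit["_source"] for hit in hits]


if __name__ == "__main__":
    for doc in search("engineer"):
        print(doc)
```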
This section gives an overview of the cloud deployment process for a scalable and highly available application. A more detailed and complex deployment strategy is outlined in the deployment-design/design.pdf document; this guide instead focuses on a simplified approach suited to the limited cloud resources available for the technical assessment.
Due to resource constraints, we've opted for a straightforward deployment utilizing Docker Compose on a single Amazon EC2 instance. This method allows us to demonstrate the application's functionality in a cloud environment without the complexity and cost associated with a full-scale deployment.
The core components of our application are divided as follows:
- Frontend: A React application serving as the user interface, configured to communicate with the Elasticsearch backend.
- Backend: A 2-node Elasticsearch cluster responsible for indexing and searching data efficiently.
The docker-compose.yaml file, located in the root directory of the project, orchestrates the setup and interconnection of these components.
Ensure that Docker and Docker Compose are installed on your EC2 instance. Clone the project repository to your instance to get started.
- Set the REACT_APP_ELASTICSEARCH_URL environment variable in the docker-compose.yaml file to the public IP of the EC2 instance.
- Change the endpoint in the search-ui/src/config/engine.json file to the public IP of the EC2 instance.
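The engine.json change can be scripted so it is repeatable across instances. This sketch assumes engine.json is a JSON object with a top-level endpoint key holding a URL, which is an assumption to verify against the actual file; it swaps only the host and keeps the existing scheme, port, and path. The docker-compose.yaml change is left manual here, since the standard library has no YAML parser.

```python
import json
from pathlib import Path
from urllib.parse import urlsplit, urlunsplit


def set_endpoint_host(config_path: Path, public_ip: str, key: str = "endpoint") -> str:
    """Replace the host in config[key] with public_ip, keeping scheme, port and path."""
    config = json.loads(config_path.read_text(encoding="utf-8"))
    parts = urlsplit(config[key])  # assumption: the value is a full URL
    netloc = public_ip if parts.port is None else f"{public_ip}:{parts.port}"
    config[key] = urlunsplit(parts._replace(netloc=netloc))
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config[key]


if __name__ == "__main__":
    # 203.0.113.10 is a documentation-only address; substitute the instance's public IP.
    print(set_endpoint_host(Path("search-ui/src/config/engine.json"), "203.0.113.10"))
```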
Navigate to the root directory of the project, where the docker-compose.yaml file is located, and run the following command to start the services:
docker-compose up
Once the Elasticsearch service is running, the dataset must be indexed into Elasticsearch for search to work. In a separate terminal, run the following command from the project root directory:
python elastic-backend/cv-index.py
This script indexes the CSV file data into the Elasticsearch service, ensuring that the search functionality is operational.
NOTE: To avoid relative import errors, run the script from the root directory of the project.
After the services are up and the data has been indexed, the frontend application is available at http://<ec2-public-ip>:3000. This URL serves the Search UI React app, through which users can search the indexed data.
The final project has been deployed on an AWS EC2 instance. You can access the application using the following link: here
The essay can be found in essay.pdf in the root directory of the project.