- Contents
- Summary
- Dependencies
- Setup
  - Download Code and Token From GitHub
  - Set Up AWS Before Creating App
  - Create App in AWS Elastic Beanstalk
  - Deploy App in AWS EC2
  - Notes
- API
  - Endpoints
  - Additional Details
  - Examples
- Data Model
- Future Features
- Metadata
This application retrieves, stores, processes, and accesses data about daily weather and yearly crop yields.
As of 2024-07-16, this application is running live on AWS Elastic Beanstalk. For API usage information, see the APIDocs at the URL below.
http://gconan-corteva-challenge.us-west-2.elasticbeanstalk.com/apidocs
See `requirements.txt` for the full list of dependencies.
- Python v3.10+
- Python-Poetry
- NumPy v2.0.0+
- Pandas v2.2.2+
- Flask-SQLAlchemy v3.1.1+
- SQLAlchemy v2.0.31+
- PsycoPG2-Binary v2.9.9+
- Dask[dataframe] v2024.7.0+
- Flasgger v0.9.7.1+
- Clone this (`Corteva-Challenge`) repository, then select all of its contents and export them as a .ZIP file. Do not just download the repository as a .ZIP file through the GitHub website, because that will put all of the repo contents in a top-level subdirectory within the .ZIP file and cause errors later.
- Get a valid GitHub authorization token to access the GitHub API.
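The packaging step above can also be scripted. The following is a minimal sketch, not part of the repository: the function name `zip_repo_contents` is hypothetical, and skipping the `.git` directory is an assumption about what the bundle should leave out. It writes every file with a path relative to the repository root, so the archive has no wrapping top-level folder.

```python
import zipfile
from pathlib import Path


def zip_repo_contents(repo_dir: str, zip_path: str) -> list[str]:
    """Zip the *contents* of repo_dir so they sit at the archive root.

    Returns the archive member names that were written.
    """
    root = Path(repo_dir).resolve()
    target = Path(zip_path).resolve()
    written = []
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(root.rglob("*")):
            # Skip directories, and the archive itself if it is being
            # written inside repo_dir.
            if not path.is_file() or path == target:
                continue
            relative = path.relative_to(root)
            # Assumption: Git metadata does not belong in the bundle.
            if ".git" in relative.parts:
                continue
            # Store paths relative to the repo root (no top-level folder).
            archive.write(path, arcname=relative.as_posix())
            written.append(relative.as_posix())
    return written
```

For example, `zip_repo_contents("Corteva-Challenge", "corteva-challenge.zip")` would produce an archive whose root contains the repository files directly, which is the layout the upload step expects.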
- Open the AWS Management Console in your browser.
- Create, or log in to, an AWS account.
- Create and download an AWS EC2 key pair. I named mine `Corteva-Challenge`.
- In the IAM page of the AWS Management Console, go to `Roles` and click `Create role`. Under `Use case`, select `EC2`. Click `Next`. On the `Add permissions` page, add the following permissions policies, then click `Next`.
  - `AmazonS3FullAccess`
  - `AWSElasticBeanstalkRoleWorkerTier`
  - `AWSElasticBeanstalkWebTier`
  - `AWSElasticBeanstalkWorkerTier`
- Name the role and click `Create role`.
- Create an EC2 instance profile using this role.
- Open the Elastic Beanstalk page in the AWS Management Console.
- Click `Create application`.
- Fill in the `Application name`, `Environment name`, and `Domain` fields. For this example, I named my app `Corteva-Challenge` with an environment called `Corteva-Challenge-env` at the subdomain `gconan-corteva-challenge`.
- In the `Platform` field, select `Python`, and for `Platform branch` select `Python 3.11`.
- Check `Upload your code` and `Local file`, then click `Choose file` and upload your `.zip` file copy of the `Corteva-Challenge` code repo. Click `Next`.
- Click `Use an existing service role` and select the default `aws-elasticbeanstalk-service-role`. Under `EC2 key pair`, select the key pair you downloaded earlier. Under `EC2 instance role`, select the role you created earlier.
- Under `Public IP address`, click the `Activated` box. Also click the `Enable database` switch. Under `Username` and `Password`, type `postgres` (see Notes, item 1).
- Under `Environment properties`, click `Add environment property`. Name it `GITHUB_TOKEN` and enter the entire token string you generated.
- Click `Next`, and on the `Review` page click `Submit` to create the app.
- Open Amazon RDS in the AWS Management Console. Click `Databases` in the sidebar, then select the database you generated when you created your Elastic Beanstalk application. Under `Connected compute resources`, click `Actions` and click `Set up EC2 connection`. In the `EC2 instance` dropdown, select the EC2 instance running your application, then click `Continue`.
- From the EC2 page of the AWS Management Console, click `Instances`, and then the string under the `Instance ID` of the instance running your application. Click `Connect`, ensure that `Connect using EC2 Instance Connect` is checked, and then click the `Connect` button at the bottom-right.
- In the EC2 Instance Connect command-line terminal, run `source /var/app/venv/staging-*/bin/activate` (see Notes, item 2).
- From the Elastic Beanstalk page of the AWS Management Console, click `Environments` and then the env you created (e.g. `Corteva-Challenge-env`). Copy the URL path under `Domain`.
- From the RDS page of the AWS Management Console, go to `Databases` and then click the database you started for this app. Copy the URI path listed under `Endpoint & Port`.
- In the EC2 Instance Connect command-line terminal, activate the environment and define its variables (see Notes, item 2). In the terminal,
  - Run `ls -d /var/app/venv/staging-*/bin` to get the `bin` directory path.
  - Run `export PYTHONPATH=` followed by the `bin` directory path.
  - Run `source ${PYTHONPATH}/activate`.
  - Run `export GITHUB_TOKEN=` followed by the entire GitHub access token string you generated.
  - Run `export SQLALCHEMY_DATABASE_URI=postgresql+psycopg2://postgres:postgres@` followed by the database URI path and `:5432/postgres` at the end.
- In the `Elastic Beanstalk > Environments > Corteva-Challenge-env > Configuration` section, define those environment variables again (see Notes, item 3). In the `Environment properties` section of the `Configuration` page,
  - If there is no environment variable named `GITHUB_TOKEN`, then add one and set its value to the entire GitHub token string.
  - If there is no environment variable named `PYTHONPATH`, then add one and set its value to the `bin` directory path.
  - If there is no environment variable named `SQLALCHEMY_DATABASE_URI`, then add one and set its value to the same string `postgresql+psycopg2://postgres:postgres@{database-URI}:5432/postgres`.
- To start the application, connect to its host via EC2 Instance Connect and then do the following:
  - `cd` to the directory containing `app.py`. That file should be in a subdirectory of `/var/app/current/`.
  - Run `flask setup-db` (see Notes, item 2).
  - Load all data into the database by running `flask load-data` (see Notes, item 2).
- The application should now be fully usable. In your browser, navigate to the domain URL you copied earlier. You should be able to access any of the API endpoints defined below as paths under that URL.
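The `SQLALCHEMY_DATABASE_URI` string assembled by hand in the steps above can be built and checked programmatically. This is a minimal sketch under stated assumptions: `build_database_uri` and `missing_settings` are hypothetical helpers that do not exist in the repository, and the defaults simply mirror the `postgres` username, password, database name, and port `5432` used in this walkthrough.

```python
from urllib.parse import quote_plus

# The three variables this walkthrough defines in both places.
REQUIRED_SETTINGS = ("GITHUB_TOKEN", "PYTHONPATH", "SQLALCHEMY_DATABASE_URI")


def build_database_uri(host: str, user: str = "postgres",
                       password: str = "postgres", port: int = 5432,
                       database: str = "postgres") -> str:
    """Assemble a postgresql+psycopg2 URI from the RDS endpoint host."""
    # quote_plus keeps special characters in credentials from breaking the URI.
    return (f"postgresql+psycopg2://{quote_plus(user)}:{quote_plus(password)}"
            f"@{host}:{port}/{database}")


def missing_settings(env: dict) -> list[str]:
    """List the required variables that are unset or empty in env."""
    return [name for name in REQUIRED_SETTINGS if not env.get(name)]
```

Calling `missing_settings(dict(os.environ))` (after `import os`) in the EC2 terminal would show which of the three variables still need to be exported there.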
In a full production deployment used by actual clients, I would write the app to:
- use a secure username and password, and require user authentication to access the application.
- run its setup steps automatically. For this test deployment, I do them manually.
- ensure that environment variables are passed between Elastic Beanstalk and the EC2 instance terminal. Currently, environment variables must be defined in both places.
- `/` returns a simple message stating whether the application is running.
- `/apidocs` uses Flasgger to provide additional information on this application's API endpoints and what data they allow you to access.
- `/api/crop` returns crop yield data: the number of corn bushels per year.
- `/api/weather` returns daily weather report data: the daily maximum/minimum temperature and precipitation at each weather station. This endpoint accepts several parameters to filter the data:
  - `station_id=N` will only include reports from the weather station with the ID number N.
  - `max_date=YYYY-MM-DD` will exclude any reports after the specified date in ISO 8601 format.
  - `min_date=YYYY-MM-DD` will exclude any reports before the specified date in ISO 8601 format.
- `/api/weather/stats` returns overall weather report data: the average minimum/maximum temperature and total precipitation at a given station during a given year. This endpoint accepts two parameters to filter the data:
  - `station_id=N` will only include reports from the weather station with the ID number N.
  - `year=YYYY` will only include stations' reports for the year YYYY.
- `/api/weather/stations` returns the name and ID number of every weather station.
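To illustrate the kind of summary that `/api/weather/stats` describes, here is a minimal standard-library sketch. It is not the application's implementation, which may compute these values differently (for example in SQL or with Pandas). The function name `yearly_stats` and the output key names are assumptions; the input field names follow the data model in this document.

```python
from collections import defaultdict
from datetime import date
from statistics import fmean


def yearly_stats(reports):
    """Summarize daily reports per (station_id, year).

    Each report is a dict with the keys station_id, date, max_temp,
    min_temp, and precipitation.
    """
    groups = defaultdict(list)
    for report in reports:
        # Group by station and by the calendar year of the report date.
        groups[(report["station_id"], report["date"].year)].append(report)
    return {
        key: {
            "avg_max_temp": fmean(r["max_temp"] for r in rows),
            "avg_min_temp": fmean(r["min_temp"] for r in rows),
            "total_precipitation": sum(r["precipitation"] for r in rows),
        }
        for key, rows in groups.items()
    }
```

Filtering the resulting dictionary by key then corresponds to the `station_id` and `year` parameters.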
- The `/api/weather`, `/api/weather/stations`, and `/api/crop` endpoints return paginated results. They accept two parameters to select a page of results:
  - `per_page=N` organizes results into pages of N results each. By default, each page holds 50 results.
  - `page=N` returns the Nth page. By default, the first page is returned.
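The pagination arithmetic can be sketched as follows. This assumes pages are numbered from 1 and that the defaults are page 1 with 50 results per page, as described above; the helper names `page_bounds` and `paginate` are hypothetical, and the application's own pagination code may differ.

```python
def page_bounds(page: int = 1, per_page: int = 50) -> tuple[int, int]:
    """Return the (first, last) 1-based result positions on a page."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive integers")
    first = (page - 1) * per_page + 1
    last = page * per_page
    return first, last


def paginate(items: list, page: int = 1, per_page: int = 50) -> list:
    """Return the slice of items that falls on the requested page."""
    first, last = page_bounds(page, per_page)
    # Convert the 1-based inclusive bounds to a 0-based Python slice.
    return items[first - 1:last]
```

Under these assumptions, `page=2&per_page=20` selects results 21 through 40.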
Navigate to this API endpoint to access the 21st through 40th daily weather reports from 1997 at station 5:
/api/weather?page=2&per_page=20&min_date=1997-01-01&max_date=1997-12-31&station_id=5
Navigate to this API endpoint to access the average yearly maximum/minimum temperature and total precipitation at weather station 3 in 1998:
/api/weather/stats?station_id=3&year=1998
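The example URLs above can also be assembled in code. This is a minimal sketch: `build_url` is a hypothetical helper, and `BASE_URL` is the Elastic Beanstalk domain quoted earlier in this document, which may no longer be live.

```python
from urllib.parse import urlencode, urljoin

# Domain taken from the summary above; replace it with your own deployment.
BASE_URL = "http://gconan-corteva-challenge.us-west-2.elasticbeanstalk.com"


def build_url(endpoint: str, base_url: str = BASE_URL, **params) -> str:
    """Join an endpoint path and optional query parameters onto base_url."""
    # Drop parameters left as None so only the requested filters are sent.
    query = urlencode({k: v for k, v in params.items() if v is not None})
    url = urljoin(base_url + "/", endpoint.lstrip("/"))
    return f"{url}?{query}" if query else url
```

For instance, `build_url("/api/weather/stats", station_id=3, year=1998)` yields the second example URL with the base domain prepended.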
classDiagram
    WeatherStation "1" --> "many" WeatherReport : generates
    class WeatherStation {
        +id: int
        +created: datetime
        +name: string
        +updated: datetime
    }
    class WeatherReport {
        +id: int
        +date: date
        +max_temp: float
        +min_temp: float
        +precipitation: int
        +station_id: int
    }
    class CropYield {
        +id: int
        +corn_bushels: int
        +created: datetime
        +year: int
    }
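As a rough illustration, the diagram above maps onto plain Python dataclasses as shown below. This is only a sketch of the fields and the one-to-many relationship; the real classes live in `models.py` and use Flask-SQLAlchemy, which is not reproduced here. The helper `reports_for` is hypothetical.

```python
import datetime as dt
from dataclasses import dataclass


@dataclass
class WeatherStation:
    id: int
    created: dt.datetime
    name: str
    updated: dt.datetime


@dataclass
class WeatherReport:
    id: int
    date: dt.date
    max_temp: float
    min_temp: float
    precipitation: int
    station_id: int  # refers to WeatherStation.id


@dataclass
class CropYield:
    id: int
    corn_bushels: int
    created: dt.datetime
    year: int


def reports_for(station: WeatherStation,
                reports: list[WeatherReport]) -> list[WeatherReport]:
    """Follow the one-to-many link from a station to its reports."""
    return [r for r in reports if r.station_id == station.id]
```

`CropYield` has no link to the other two classes, matching the diagram, where it stands alone.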
The following are not currently features of this application, but I would add them if implementing it for production-level use by actual clients.
- Add Yearly Statistics Class/Model. Explicitly define a SQL database table, and corresponding Python class in `models.py`, to store the yearly statistics returned from the `/api/weather/stats` endpoint.
- User Authentication. Instead of allowing data access to anyone who can access the page, the application could require user authentication.
- Scheduled Data Ingestion. The application could query the source data files and update its database at specified intervals, such as on a `cron` job.
- Statistical Predictive Modeling. The application could use daily weather reports to predict yearly crop yield. In its most basic form, the application would correlate the data columns of the `weather_report` table in a given year with the `corn_bushels` yield for that year. Further models would identify which stations and periods of time best predict the yield.
- Filtering By Station Name. Instead of accepting the arbitrary `station_id` parameter, the `/api/weather` endpoint could accept a `station_name` parameter and determine the ID number of that station by `SELECT`ing that `station_name` in the `weather_station` table.
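The station-name lookup proposed in the last item could look roughly like this. It is a sketch under stated assumptions: it uses an in-memory SQLite database from the standard library as a stand-in for the application's PostgreSQL database, the column names `id` and `name` follow the data model above, and `station_id_for_name`, `demo_connection`, and the station names are made up for illustration.

```python
import sqlite3


def station_id_for_name(conn: sqlite3.Connection, station_name: str):
    """Return the ID of the station with this name, or None if absent."""
    row = conn.execute(
        # A parameterized query avoids SQL injection via the name.
        "SELECT id FROM weather_station WHERE name = ?",
        (station_name,),
    ).fetchone()
    return None if row is None else row[0]


def demo_connection() -> sqlite3.Connection:
    """Build a throwaway in-memory table with two made-up stations."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE weather_station (id INTEGER PRIMARY KEY, name TEXT)"
    )
    conn.executemany(
        "INSERT INTO weather_station (id, name) VALUES (?, ?)",
        [(1, "station-a"), (2, "station-b")],
    )
    return conn
```

The endpoint would then filter weather reports by the returned ID exactly as it does for `station_id` today, and could return an empty result or an error when the lookup yields `None`.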
- Written 2024-07-15 by @GregConan ([email protected])
- Updated 2024-07-16 by @GregConan ([email protected])