A set of Python tools that make it easier to extract weather station data (e.g., temperature, precipitation) from the Global Historical Climatology Network - Daily (GHCND).
"The Global Historical Climatology Network daily (GHCNd) is an integrated database of daily climate summaries from land surface stations across the globe. GHCNd is made up of daily climate records from numerous sources that have been integrated and subjected to a common suite of quality assurance reviews. GHCNd contains records from more than 100,000 stations in 180 countries and territories. NCEI provides numerous daily variables, including maximum and minimum temperature, total daily precipitation, snowfall, and snow depth. About half the stations only report precipitation. Both record length and period of record vary by station and cover intervals ranging from less than a year to more than 175 years." source
More information on the data can be found here
- Install from the source code:
- Clone the repository source code:
git clone https://github.com/scotthosking/get-station-data.git
- Install along with its dependencies:
cd /path/to/my/get-station-data
pip install -v -e .
from get_station_data import ghcnd
from get_station_data.util import nearest_stn
%matplotlib inline
stn_md = ghcnd.get_stn_metadata()
london_lon_lat = -0.1278, 51.5074
my_stns = nearest_stn(stn_md,
                      london_lon_lat[0], london_lon_lat[1],
                      n_neighbours=5)
my_stns
| | station | lat | lon | elev | name |
|---|---|---|---|---|---|
| 52113 | UKE00105915 | 51.5608 | 0.1789 | 137.0 | HAMPSTEAD |
| 52165 | UKM00003772 | 51.4780 | -0.4610 | 25.3 | HEATHROW |
| 52098 | UKE00105900 | 51.8067 | 0.3581 | 128.0 | ROTHAMSTED |
| 52191 | UKW00035054 | 51.2833 | 0.4000 | 91.1 | WEST MALLING |
| 52131 | UKE00107650 | 51.4789 | 0.4489 | 25.0 | HEATHROW |
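To illustrate the idea behind a nearest-station lookup, here is a minimal sketch in plain Python. It is not the implementation of `nearest_stn`: the helper names `haversine_km` and `nearest_by_haversine` are hypothetical, and the choice of great-circle (haversine) distance is an assumption, so the ordering it produces may differ from the table above. The coordinates are copied from that table.

```python
from math import asin, cos, radians, sin, sqrt

# Station coordinates copied from the table above: (station ID, lat, lon).
STATIONS = [
    ("UKE00105915", 51.5608, 0.1789),
    ("UKM00003772", 51.4780, -0.4610),
    ("UKE00105900", 51.8067, 0.3581),
    ("UKW00035054", 51.2833, 0.4000),
    ("UKE00107650", 51.4789, 0.4489),
]


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two lon/lat points given in degrees."""
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(a))  # 6371 km: mean Earth radius


def nearest_by_haversine(stations, lon, lat, n=5):
    """Hypothetical helper: the n station tuples closest to (lon, lat)."""
    return sorted(stations, key=lambda s: haversine_km(lon, lat, s[2], s[1]))[:n]


# Same query point as london_lon_lat in the example above.
closest = nearest_by_haversine(STATIONS, -0.1278, 51.5074, n=2)
print([station_id for station_id, _, _ in closest])
```

For a handful of stations a full sort is fine; for the complete station list a spatial index would likely be a better fit.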
df = ghcnd.get_data(my_stns)
df.head()
| | station | year | month | day | element | value | mflag | qflag | sflag | date | lon | lat | elev | name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | UKE00105915 | 1959 | 12 | 1 | TMAX | NaN | | | | 1959-12-01 | 0.1789 | 51.5608 | 137.0 | HAMPSTEAD |
| 1 | UKE00105915 | 1959 | 12 | 2 | TMAX | NaN | | | | 1959-12-02 | 0.1789 | 51.5608 | 137.0 | HAMPSTEAD |
| 2 | UKE00105915 | 1959 | 12 | 3 | TMAX | NaN | | | | 1959-12-03 | 0.1789 | 51.5608 | 137.0 | HAMPSTEAD |
| 3 | UKE00105915 | 1959 | 12 | 4 | TMAX | NaN | | | | 1959-12-04 | 0.1789 | 51.5608 | 137.0 | HAMPSTEAD |
| 4 | UKE00105915 | 1959 | 12 | 5 | TMAX | NaN | | | | 1959-12-05 | 0.1789 | 51.5608 | 137.0 | HAMPSTEAD |
var = 'PRCP' # precipitation
df = df[ df['element'] == var ]
### Tidy up columns
df = df.rename(index=str, columns={"value": var})
df = df.drop(['element'], axis=1)
df.head()
| | station | year | month | day | PRCP | mflag | qflag | sflag | date | lon | lat | elev | name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 93 | UKE00105915 | 1960 | 1 | 1 | 2.5 | | | E | 1960-01-01 | 0.1789 | 51.5608 | 137.0 | HAMPSTEAD |
| 94 | UKE00105915 | 1960 | 1 | 2 | 1.5 | | | E | 1960-01-02 | 0.1789 | 51.5608 | 137.0 | HAMPSTEAD |
| 95 | UKE00105915 | 1960 | 1 | 3 | 1.0 | | | E | 1960-01-03 | 0.1789 | 51.5608 | 137.0 | HAMPSTEAD |
| 96 | UKE00105915 | 1960 | 1 | 4 | 0.8 | | | E | 1960-01-04 | 0.1789 | 51.5608 | 137.0 | HAMPSTEAD |
| 97 | UKE00105915 | 1960 | 1 | 5 | 0.0 | | | E | 1960-01-05 | 0.1789 | 51.5608 | 137.0 | HAMPSTEAD |
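Before discarding the flag columns it can be worth checking the quality flag. The sketch below assumes the GHCN-Daily convention that a blank `qflag` means the value did not fail any quality assurance check. It uses plain dictionaries rather than a DataFrame, the helper `passed_quality_checks` is hypothetical, and the last record is invented purely to show a flagged observation.

```python
# Illustrative records. The first three copy date/PRCP values from the table
# above; the last one is invented purely to show a flagged observation.
records = [
    {"date": "1960-01-01", "PRCP": 2.5, "qflag": ""},
    {"date": "1960-01-02", "PRCP": 1.5, "qflag": ""},
    {"date": "1960-01-03", "PRCP": 1.0, "qflag": ""},
    {"date": "1960-01-06", "PRCP": 250.0, "qflag": "X"},  # hypothetical record
]


def passed_quality_checks(record):
    """True when the quality flag is blank (None, empty, or whitespace only)."""
    flag = record.get("qflag")
    return flag is None or str(flag).strip() == ""


# Keep only observations whose quality flag is blank.
clean = [r for r in records if passed_quality_checks(r)]
print(len(records), "records ->", len(clean), "after quality filtering")
```

In a DataFrame, blank flags may be stored as NaN or as a space rather than an empty string, so the check would need adapting to however the flags are represented.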
df.drop(columns=['mflag','qflag','sflag']).tail(n=10)
| | station | year | month | day | PRCP | date | lon | lat | elev | name |
|---|---|---|---|---|---|---|---|---|---|---|
| 83938 | UKE00107650 | 2016 | 12 | 22 | 0.0 | 2016-12-22 | 0.4489 | 51.4789 | 25.0 | HEATHROW |
| 83939 | UKE00107650 | 2016 | 12 | 23 | 1.4 | 2016-12-23 | 0.4489 | 51.4789 | 25.0 | HEATHROW |
| 83940 | UKE00107650 | 2016 | 12 | 24 | 0.0 | 2016-12-24 | 0.4489 | 51.4789 | 25.0 | HEATHROW |
| 83941 | UKE00107650 | 2016 | 12 | 25 | 1.0 | 2016-12-25 | 0.4489 | 51.4789 | 25.0 | HEATHROW |
| 83942 | UKE00107650 | 2016 | 12 | 26 | 0.0 | 2016-12-26 | 0.4489 | 51.4789 | 25.0 | HEATHROW |
| 83943 | UKE00107650 | 2016 | 12 | 27 | 0.0 | 2016-12-27 | 0.4489 | 51.4789 | 25.0 | HEATHROW |
| 83944 | UKE00107650 | 2016 | 12 | 28 | 0.2 | 2016-12-28 | 0.4489 | 51.4789 | 25.0 | HEATHROW |
| 83945 | UKE00107650 | 2016 | 12 | 29 | 0.4 | 2016-12-29 | 0.4489 | 51.4789 | 25.0 | HEATHROW |
| 83946 | UKE00107650 | 2016 | 12 | 30 | 0.0 | 2016-12-30 | 0.4489 | 51.4789 | 25.0 | HEATHROW |
| 83947 | UKE00107650 | 2016 | 12 | 31 | 0.4 | 2016-12-31 | 0.4489 | 51.4789 | 25.0 | HEATHROW |
df.to_csv('London_5stns_GHCN-D.csv', index=False)
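Once exported, the CSV can be processed without pandas. As a rough sketch, the snippet below totals `PRCP` per station and month using only the standard library. To stay self-contained it reads from an in-memory string that stands in for `London_5stns_GHCN-D.csv`; the rows are copied from the output above and only a subset of the columns is included, which is an assumption about what a reader might need.

```python
import csv
import io
from collections import defaultdict

# Stand-in for the exported file: a subset of columns, rows copied from the
# output shown above. With the real file, use open('London_5stns_GHCN-D.csv').
csv_text = """station,year,month,day,PRCP,date
UKE00107650,2016,12,22,0.0,2016-12-22
UKE00107650,2016,12,23,1.4,2016-12-23
UKE00107650,2016,12,24,0.0,2016-12-24
UKE00107650,2016,12,25,1.0,2016-12-25
UKE00107650,2016,12,26,0.0,2016-12-26
UKE00107650,2016,12,27,0.0,2016-12-27
UKE00107650,2016,12,28,0.2,2016-12-28
UKE00107650,2016,12,29,0.4,2016-12-29
UKE00107650,2016,12,30,0.0,2016-12-30
UKE00107650,2016,12,31,0.4,2016-12-31
"""

# Accumulate PRCP per (station, year, month).
totals = defaultdict(float)
for row in csv.DictReader(io.StringIO(csv_text)):
    key = (row["station"], int(row["year"]), int(row["month"]))
    totals[key] += float(row["PRCP"])

for (station, year, month), total in sorted(totals.items()):
    print(f"{station} {year}-{month:02d}: {total:.1f}")
```

Note that the total here covers only the ten days included in the sample, not the whole month.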
df['PRCP'].plot.hist(bins=40)
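# Note: two of the five stations are named HEATHROW (UKM00003772 and
# UKE00107650), so selecting by name returns rows from both.
heathrow = df[ df['name'] == 'HEATHROW' ]
heathrow['PRCP'].plot()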