Hydroclimate Data Retriever#
HyRiver (formerly named hydrodata) is a suite of Python packages that provides a unified API for retrieving geospatial/temporal data from various web services. HyRiver includes two categories of packages:
Low-level APIs for accessing any of the supported web services, i.e., ArcGIS RESTful, WMS, and WFS.
High-level APIs for accessing some of the most commonly used datasets in hydrology and climatology studies. Currently, this project only includes hydrology and climatology data within the US.

Citation#
If you use any of HyRiver packages in your research, we appreciate citations:
@article{Chegini_2021,
author = {Chegini, Taher and Li, Hong-Yi and Leung, L. Ruby},
doi = {10.21105/joss.03175},
journal = {Journal of Open Source Software},
month = {10},
number = {66},
pages = {1--3},
title = {{HyRiver: Hydroclimate Data Retriever}},
volume = {6},
year = {2021}
}
Navigate and subset mid- and high-res NHD, NHDPlus, and NHDPlus VAA using WaterData, NLDI, ScienceBase, and The National Map web services.
Access Daymet for daily, monthly, and annual summaries of climate data at 1-km scale, for both single pixels and gridded data, over North America, Hawaii, and Puerto Rico.
Access GridMET for daily climate data at 4-km scale, for both single pixels and gridded data, over the conterminous United States.
Getting Started#
Why HyRiver?#
Some major capabilities of HyRiver are as follows:
Easy access to many web services for subsetting data on the server side and returning the requests as masked Datasets or GeoDataFrames.
Splitting large requests into smaller chunks, under the hood, since web services often limit the number of features per request. As a result, the only bottleneck for subsetting the data is your local machine's memory.
Navigating and subsetting NHDPlus database (both medium- and high-resolution) using web services.
Cleaning up the vector NHDPlus data, fixing some common issues, and computing vector-based accumulation through a river network.
A URL inventory for some popular (and tested) web services.
Some utilities for manipulating the obtained data and their visualization.
Installation#
You can install all the packages using pip:
$ pip install py3dep pynhd pygeohydro pydaymet pygridmet pynldas2 hydrosignatures pygeoogc pygeoutils async-retriever
Please note that installation with pip fails if libgdal is not installed on your system. You should install this package manually beforehand; for example, on Ubuntu-based distros the required package is libgdal-dev. If this package is installed on your system, you should be able to run gdal-config --version successfully.
Alternatively, you can install them using conda:
$ conda install -c conda-forge py3dep pynhd pygeohydro pydaymet pygridmet pynldas2 hydrosignatures pygeoogc pygeoutils async-retriever
or mambaforge (recommended):
$ mamba install py3dep pynhd pygeohydro pydaymet pygridmet pynldas2 hydrosignatures pygeoogc pygeoutils async-retriever
Additionally, you can create a new environment named hyriver, with all the packages and optional dependencies installed, using mambaforge and the provided environment.yml file:
$ mamba env create -f ./environment.yml
Dependencies#
aiohttp[speedups]>=3.8.3
aiohttp-client-cache>=0.8.1
aiosqlite
cytoolz
ujson
async-retriever<0.17,>=0.16
cytoolz
defusedxml
joblib
multidict
owslib>=0.27.2
pyproj>=3.0.1
requests
requests-cache>=0.9.6
shapely>=2
typing_extensions
ujson
url-normalize>=1.4
urllib3
yarl
cytoolz
geopandas>=0.10
netcdf4
numpy>=1.21
pyproj>=3.0.1
rasterio>=1.2
rioxarray>=0.11
scipy
shapely>=2
ujson
xarray>=2023.01
async-retriever<0.17,>=0.16
cytoolz
geopandas>=0.10
networkx
numpy>=1.21
pandas>=1
pyarrow>=1.0.1
pygeoogc<0.17,>=0.16
pygeoutils<0.17,>=0.16
shapely>=2
async-retriever<0.17,>=0.16
click>=0.7
cytoolz
geopandas>=0.10
numpy>=1.17
pygeoogc<0.17,>=0.16.1
pygeoutils<0.17,>=0.16.1
rasterio>=1.2
rioxarray>=0.11
scipy
shapely>=2
xarray>=2023.01
async-retriever<0.17,>=0.16
cytoolz
defusedxml
folium
geopandas>=0.10
h5netcdf
hydrosignatures<0.17,>=0.16
matplotlib>=3.5
numpy>=1.21
pandas>=1
pygeoogc<0.17,>=0.16
pygeoutils<0.17,>=0.16
pynhd<0.17,>=0.16
pyproj>=3.0.1
rioxarray>=0.11
scipy
shapely>=2
ujson
xarray>=2023.01
async-retriever<0.17,>=0.16
click>=0.7
geopandas>=0.10
numpy>=1.21
pandas>=1
py3dep<0.17,>=0.16.1
pygeoogc<0.17,>=0.16.1
pygeoutils<0.17,>=0.16.1
pyproj>=3.0.1
scipy
shapely>=2
xarray>=2023.01
async-retriever<0.17,>=0.16
click>=0.7
geopandas>=0.10
numpy>=1.21
pandas>=1
pygeoogc<0.17,>=0.16
pygeoutils<0.17,>=0.16
pyproj>=3.0.1
shapely>=2
xarray>=2023.01
async-retriever<0.17,>=0.16
h5netcdf
numpy>=1.21
pandas>=1
pygeoutils<0.17,>=0.16
pyproj>=3.0.1
rioxarray>=0.11
xarray>=2023.01
numpy>=1.21
pandas>=1
scipy
xarray>=2023.01
Additionally, you can install bottleneck and numba to improve the performance of some computations. Installing pyogrio is highly recommended for improving the performance of working with vector data. For NHDPlus, py7zr and pyogrio are required dependencies. For retrieving soil data, you should install planetary-computer and pystac-client.
Software Stack#
A detailed description of each component of the HyRiver software stack.
PyNHD: Navigate and subset NHDPlus database#
Features#
PyNHD is a part of HyRiver software stack that is designed to aid in hydroclimate analysis through web services.
This package provides access to several hydro-linked datasets:
These web services can be used to navigate and extract vector data from the NHDPlus V2 (both mid- and high-resolution) database, such as catchments, HUC8, HUC12, GagesII, flowlines, and water bodies.
Moreover, the PyGeoAPI service provides four functionalities:
flow_trace: Trace flow from a starting point in the up/downstream direction.
split_catchment: Split the local catchment of a point of interest at the point's location.
elevation_profile: Extract the elevation profile along a flow path between two points.
cross_section: Extract a cross-section at a point of interest along a flowline.
PyNHD also provides access to the entire NHDPlus dataset for CONUS (L48) via the nhdplus_l48 function. You can get any of the 31 layers that are available in the NHDPlus dataset. You can also get the NHDPlus Value Added Attributes from Hydroshare and ENHD. These datasets do not have geometries; they include slope and roughness, among other attributes, for all NHD flowlines. You can use the nhdplus_vaa and enhd_attrs functions to get these datasets.
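As a quick, hedged sketch of these two functions (calls shown without arguments, assuming both can download and cache the parquet files on their own; the Quick start below also passes an explicit path to nhdplus_vaa):
import pynhd

# NHDPlus Value Added Attributes (slope, roughness, etc.) for all flowlines
vaa = pynhd.nhdplus_vaa()
# ENHD attributes for NHD flowlines (e.g., updated network connectivity)
enhd = pynhd.enhd_attrs()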
Additionally, you can get many more derived attributes at the NHD catchment level through two sources:
Select Attributes for NHDPlus Version 2.1 Reach Catchments, an item on ScienceBase
EPA's StreamCat dataset
Both include hundreds of attributes such as hydroclimate properties, water quality, urbanization, and population. In addition to NHD catchment summaries, they also provide network-accumulated values (both upstream and divergence-routed). You can use the nhdplus_attrs, epa_nhd_catchments, and streamcat functions to get these datasets.
Additionally, PyNHD offers some extra utilities for processing the NHD flowlines:
flowline_xsection and network_xsection: Get cross-section lines along a flowline, or along a network of flowlines, at a given spacing.
flowline_resample and network_resample: Resample a flowline or a network of flowlines based on a given spacing. This is useful for smoothing jagged flowlines similar to those in the NHDPlus database.
prepare_nhdplus: For cleaning up the data frame by, for example, removing tiny networks, adding a to_comid column, and finding terminal flowlines if they don't exist.
topoogical_sort: For sorting the river network topologically, which is useful for routing and flow accumulation.
vector_accumulation: For computing flow accumulation in a river network (see the toy sketch below). This function is generic, and any routing method can be plugged in.
These utilities are developed based on an R package called nhdplusTools and a Python package called nldi-xstool.
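To make vector_accumulation concrete, here is a minimal toy sketch on a three-segment network, mirroring the call pattern used in the Quick start below (the comid/tocomid column names are assumed from that example):
import pandas as pd
import pynhd

# Toy network: segments 2 and 3 drain into segment 1; tocomid=0 marks the outlet.
flw = pd.DataFrame(
    {
        "comid": [2, 3, 1],
        "tocomid": [1, 1, 0],
        "areasqkm": [1.0, 2.0, 3.0],
    }
)

def area_acc(ain, a):
    # Accumulated upstream area plus the local catchment area.
    return ain + a

acc = pynhd.vector_accumulation(flw, area_acc, "areasqkm", ["areasqkm"])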
All functions and classes that request data from web services use async-retriever, which offers response caching. By default, the expiration time is set to never expire. All these functions and classes have two optional parameters for controlling the cache: expire_after and disable_caching. You can use expire_after to set the expiration time in seconds; if expire_after is set to -1, the cache never expires (the default). You can use disable_caching if you don't want to use the cached responses. The cached responses are stored in the ./cache/aiohttp_cache.sqlite file.
You can find some example notebooks here.
Moreover, under the hood, PyNHD uses PyGeoOGC and AsyncRetriever packages for making requests in parallel and storing responses in chunks. This improves the reliability and speed of data retrieval significantly.
You can control the request/response caching behavior and verbosity of the package by setting the following environment variables:
HYRIVER_CACHE_NAME: Path to the caching SQLite database for asynchronous HTTP requests. It defaults to ./cache/aiohttp_cache.sqlite.
HYRIVER_CACHE_NAME_HTTP: Path to the caching SQLite database for HTTP requests. It defaults to ./cache/http_cache.sqlite.
HYRIVER_CACHE_EXPIRE: Expiration time for cached requests in seconds. It defaults to one week.
HYRIVER_CACHE_DISABLE: Disable reading/writing from/to the cache. The default is false.
HYRIVER_SSL_CERT: Path to an SSL certificate file.
For example, in your code before making any requests you can do:
import os
os.environ["HYRIVER_CACHE_NAME"] = "path/to/aiohttp_cache.sqlite"
os.environ["HYRIVER_CACHE_NAME_HTTP"] = "path/to/http_cache.sqlite"
os.environ["HYRIVER_CACHE_EXPIRE"] = "3600"
os.environ["HYRIVER_CACHE_DISABLE"] = "true"
os.environ["HYRIVER_SSL_CERT"] = "path/to/cert.pem"
You can also try using PyNHD without installing it on your system by clicking on the binder badge. A Jupyter Lab instance with the HyRiver stack pre-installed will be launched in your web browser, and you can start coding!
Moreover, requests for additional functionalities can be submitted via issue tracker.
Citation#
If you use any of HyRiver packages in your research, we appreciate citations:
@article{Chegini_2021,
author = {Chegini, Taher and Li, Hong-Yi and Leung, L. Ruby},
doi = {10.21105/joss.03175},
journal = {Journal of Open Source Software},
month = {10},
number = {66},
pages = {1--3},
title = {{HyRiver: Hydroclimate Data Retriever}},
volume = {6},
year = {2021}
}
Installation#
You can install PyNHD using pip after installing libgdal on your system (for example, on Ubuntu run sudo apt install libgdal-dev):
$ pip install pynhd
Alternatively, PyNHD can be installed from the conda-forge repository using Conda or Mamba:
$ conda install -c conda-forge pynhd
Quick start#
Let's explore the capabilities of NLDI. We need to instantiate the class first:
from pynhd import NLDI, WaterData, NHDPlusHR
import pynhd
import pynhd as nhd
First, let's get the watershed geometry of the contributing basin of a USGS station using NLDI:
nldi = NLDI()
station_id = "01031500"
basin = nldi.get_basins(station_id)
The navigate_byid class method can be used to navigate NHDPlus both upstream and downstream of any point in the database. Let's get the ComIDs and flowlines of the tributaries and the main river channel upstream of the station.
flw_main = nldi.navigate_byid(
fsource="nwissite",
fid=f"USGS-{station_id}",
navigation="upstreamMain",
source="flowlines",
distance=1000,
)
flw_trib = nldi.navigate_byid(
fsource="nwissite",
fid=f"USGS-{station_id}",
navigation="upstreamTributaries",
source="flowlines",
distance=1000,
)
We can get other USGS stations upstream (or downstream) of the station and even set a distance limit (in km):
st_all = nldi.navigate_byid(
fsource="nwissite",
fid=f"USGS-{station_id}",
navigation="upstreamTributaries",
source="nwissite",
distance=1000,
)
st_d20 = nldi.navigate_byid(
fsource="nwissite",
fid=f"USGS-{station_id}",
navigation="upstreamTributaries",
source="nwissite",
distance=20,
)
We can get more information about these stations using GeoConnex:
from pynhd import GeoConnex
import geopandas as gpd
import pandas as pd

gcx = GeoConnex("gauges")
stations = st_all.identifier.str.split("-").str[1].unique()
gauges = gpd.GeoDataFrame(
pd.concat(gcx.query({"provider_id": sid}) for sid in stations),
crs=4326,
)
Instead, we can carry out a spatial query within the basin of interest:
gauges = pynhd.geoconnex(
item="gauges",
query={"geometry": basin.geometry.iloc[0]},
)
Now, let’s get the HUC12 pour points:
pp = nldi.navigate_byid(
fsource="nwissite",
fid=f"USGS-{station_id}",
navigation="upstreamTributaries",
source="huc12pp",
distance=1000,
)

Also, we can get the slope data for each river segment from the NHDPlus VAA database:
import numpy as np

vaa = nhd.nhdplus_vaa("input_data/nhdplus_vaa.parquet")
flw_trib["comid"] = pd.to_numeric(flw_trib.nhdplus_comid)
slope = gpd.GeoDataFrame(
pd.merge(flw_trib, vaa[["comid", "slope"]], left_on="comid", right_on="comid"),
crs=flw_trib.crs,
)
slope[slope.slope < 0] = np.nan
Additionally, we can obtain cross-section lines along the main river channel with 4-km spacing and 2-km width using network_xsection as follows:
from pynhd import NHD
distance = 4000 # in meters
width = 2000 # in meters
nhd = NHD("flowline_mr")
main_nhd = nhd.byids("COMID", flw_main.index)
main_nhd = pynhd.prepare_nhdplus(main_nhd, 0, 0, 0, purge_non_dendritic=True)
main_nhd = main_nhd.to_crs("ESRI:102003")
cs = pynhd.network_xsection(main_nhd, distance, width)
Then, we can use Py3DEP to obtain the elevation profile along the cross-section lines.
Now, let's explore the PyGeoAPI capabilities. There are two ways to access PyGeoAPI: the PyGeoAPI class and the pygeoapi function. The PyGeoAPI class queries the database for a single location using tuples and lists, while the pygeoapi function queries the database for multiple locations at once and accepts a geopandas.GeoDataFrame as input. The pygeoapi function is more efficient than the PyGeoAPI class and has a simpler interface. In future versions, the PyGeoAPI class will be deprecated, and the pygeoapi function will be the only way to access the database. Let's compare the two, starting with PyGeoAPI:
from pynhd import PyGeoAPI

pygeoapi = PyGeoAPI()
trace = pygeoapi.flow_trace((1774209.63, 856381.68), crs="ESRI:102003", direction="none")
split = pygeoapi.split_catchment((-73.82705, 43.29139), crs=4326, upstream=False)
profile = pygeoapi.elevation_profile(
[(-103.801086, 40.26772), (-103.80097, 40.270568)],
numpts=101,
dem_res=1,
crs=4326,
)
section = pygeoapi.cross_section((-103.80119, 40.2684), width=1000.0, numpts=101, crs=4326)
Now, let's do the same operations using pygeoapi:
import geopandas as gpd
import shapely.geometry as sgeom
import pynhd as nhd
coords = gpd.GeoDataFrame(
{
"direction": ["up", "down"],
"upstream": [True, False],
"width": [1000.0, 500.0],
"numpts": [101, 55],
},
geometry=[
sgeom.Point(-73.82705, 43.29139),
sgeom.Point(-103.801086, 40.26772),
],
crs=4326,
)
trace = nhd.pygeoapi(coords, "flow_trace")
split = nhd.pygeoapi(coords, "split_catchment")
section = nhd.pygeoapi(coords, "cross_section")
coords = gpd.GeoDataFrame(
{
"direction": ["up", "down"],
"upstream": [True, False],
"width": [1000.0, 500.0],
"numpts": [101, 55],
"dem_res": [1, 10],
},
geometry=[
sgeom.MultiPoint([(-103.801086, 40.26772), (-103.80097, 40.270568)]),
sgeom.MultiPoint([(-102.801086, 39.26772), (-102.80097, 39.270568)]),
],
crs=4326,
)
profile = nhd.pygeoapi(coords, "elevation_profile")

Next, we retrieve mid- and high-resolution flowlines within the bounding box of our watershed and compare them, using WaterData for the mid-resolution and NHDPlusHR for the high-resolution flowlines.
mr = WaterData("nhdflowline_network")
nhdp_mr = mr.bybox(basin.geometry[0].bounds)
hr = NHDPlusHR("flowline")
nhdp_hr = hr.bygeom(basin.geometry[0].bounds)

An alternative to WaterData and NHDPlusHR is the NHD class, which supports both the mid- and high-resolution NHDPlus V2 data:
mr = NHD("flowline_mr")
nhdp_mr = mr.bygeom(basin.geometry[0].bounds)
hr = NHD("flowline_hr")
nhdp_hr = hr.bygeom(basin.geometry[0].bounds)
Moreover, WaterData can find features within a given radius (in meters) of a point:
eck4 = "+proj=eck4 +lon_0=0 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs"
coords = (-5727797.427596455, 5584066.49330473)
rad = 5e3
flw_rad = mr.bydistance(coords, rad, loc_crs=eck4)
flw_rad = flw_rad.to_crs(eck4)
Instead of getting all features within a radius of the coordinate, we can snap to the closest feature ID using NLDI:
comid_closest = nldi.comid_byloc(coords, eck4)
flw_closest = nhdp_mr.byid("comid", comid_closest.comid.values[0])

Since NHDPlus HR is still at the pre-release stage, let's use the MR flowlines to demonstrate vector-based accumulation. Based on a topologically sorted river network, pynhd.vector_accumulation computes flow accumulation in the network. It returns a data frame, sorted from upstream to downstream, that shows the accumulated flow in each node. PyNHD has a utility called prepare_nhdplus that identifies such relationships, among other things such as fixing some common issues with NHDPlus flowlines. But first, we need to get all the NHDPlus attributes for each ComID, since NLDI only provides the flowlines' geometries and ComIDs, which are useful for navigating the vector river network data. For getting the NHDPlus database we use WaterData. Let's use the nhdflowline_network layer to get the required info.
wd = WaterData("nhdflowline_network")
comids = flw_trib.nhdplus_comid.to_list()
nhdp_trib = wd.byid("comid", comids)
flw = nhd.prepare_nhdplus(nhdp_trib, 0, 0, purge_non_dendritic=False)
To demonstrate the use of routing, let's use the nhdplus_attrs function to get a list of available NHDPlus attributes:
char = "CAT_RECHG"
area = "areasqkm"
local = nldi.getcharacteristic_byid(comids, "local", char_ids=char)
flw = flw.merge(local[char], left_on="comid", right_index=True)
def runoff_acc(qin, q, a):
return qin + q * a
flw_r = flw[["comid", "tocomid", char, area]]
runoff = nhd.vector_accumulation(flw_r, runoff_acc, char, [char, area])
def area_acc(ain, a):
return ain + a
flw_a = flw[["comid", "tocomid", area]]
areasqkm = nhd.vector_accumulation(flw_a, area_acc, area, [area])
runoff /= areasqkm
Since these are catchment-scale characteristics, let’s get the catchments then add the accumulated characteristic as a new column and plot the results.
wd = WaterData("catchmentsp")
catchments = wd.byid("featureid", comids)
c_local = catchments.merge(local, left_on="featureid", right_index=True)
c_acc = catchments.merge(runoff, left_on="featureid", right_index=True)

More examples can be found here.
PyGeoHydro: Retrieve Geospatial Hydrology Data#
Features#
PyGeoHydro (formerly named hydrodata) is a part of the HyRiver software stack that is designed to aid in hydroclimate analysis through web services. This package provides access to some public web services that offer geospatial hydrology data. It has three main modules: pygeohydro, plot, and helpers.
PyGeoHydro supports the following datasets:
gNATSGO for US soil properties.
Derived Soil Properties for soil porosity, available water capacity, and field capacity across the US.
NWIS for daily mean streamflow observations (returned as a pandas.DataFrame or xarray.Dataset with station attributes).
SensorThings API for accessing real-time data of USGS sensors.
CAMELS for accessing streamflow observations (1980-2014) and basin-level attributes of 671 stations within CONUS.
Water Quality Portal for accessing current and historical water quality data from more than 1.5 million sites across the US.
NID for accessing the National Inventory of Dams web service.
HCDN 2009 for identifying sites where human activity affects the natural flow of the watercourse.
NLCD 2021 for land cover/land use, imperviousness descriptor, and canopy data. You can get data using both geometries and coordinates.
WBD for accessing Hydrologic Unit (HU) polygon boundaries within the US (all HUC levels).
SSEBop for daily actual evapotranspiration, for both single pixel and gridded data.
Irrigation Withdrawals for estimated monthly water use for irrigation by 12-digit hydrologic unit in the CONUS for 2015.
STN for accessing the USGS Short-Term Network (STN).
eHydro for accessing USACE Hydrographic Surveys that include topobathymetry data.
NFHL for accessing FEMA's National Flood Hazard Layer (NFHL) data.
Also, it includes several other functions:
interactive_map: An interactive map for exploring NWIS stations within a bounding box.
cover_statistics: Categorical statistics of land use/land cover data.
overland_roughness: Estimate overland roughness from land use/land cover data.
streamflow_fillna: Fill missing daily streamflow values with day-of-year averages (see the short sketch below). Streamflow observations must be at least 10 years long.
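As a hedged sketch of streamflow_fillna (assuming it accepts the daily streamflow DataFrame returned by NWIS.get_streamflow; check the package reference for additional options):
import pygeohydro as gh
from pygeohydro import NWIS

nwis = NWIS()
# At least ten years of daily observations, per the note above.
qobs = nwis.get_streamflow("01031500", ("2010-01-01", "2020-12-31"))
qobs_filled = gh.streamflow_fillna(qobs)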
The plot module includes these main functions:
signatures: Hydrologic signature graphs.
cover_legends: Official NLCD land cover legends for plotting a land cover dataset.
descriptor_legends: Color map and legends for plotting an imperviousness descriptor dataset.
The helpers module includes:
nlcd_helper: A roughness coefficients lookup table for each land cover and imperviousness descriptor type, which is useful for overland flow routing among other applications (see the sketch after this list).
nwis_error: A dataframe for finding information about NWIS request errors.
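A short sketch of the helpers module (assuming nlcd_helper takes no arguments and returns a dictionary of NLCD lookup tables):
from pygeohydro import helpers

nlcd_meta = helpers.nlcd_helper()
print(list(nlcd_meta))  # inspect the available lookup tables, e.g., legends and roughness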
You can find some example notebooks here.
Moreover, under the hood, PyGeoHydro uses PyGeoOGC and AsyncRetriever packages for making requests in parallel and storing responses in chunks. This improves the reliability and speed of data retrieval significantly.
You can control the request/response caching behavior and verbosity of the package by setting the following environment variables:
HYRIVER_CACHE_NAME: Path to the caching SQLite database for asynchronous HTTP requests. It defaults to ./cache/aiohttp_cache.sqlite.
HYRIVER_CACHE_NAME_HTTP: Path to the caching SQLite database for HTTP requests. It defaults to ./cache/http_cache.sqlite.
HYRIVER_CACHE_EXPIRE: Expiration time for cached requests in seconds. It defaults to one week.
HYRIVER_CACHE_DISABLE: Disable reading/writing from/to the cache. The default is false.
HYRIVER_SSL_CERT: Path to an SSL certificate file.
For example, in your code before making any requests you can do:
import os
os.environ["HYRIVER_CACHE_NAME"] = "path/to/aiohttp_cache.sqlite"
os.environ["HYRIVER_CACHE_NAME_HTTP"] = "path/to/http_cache.sqlite"
os.environ["HYRIVER_CACHE_EXPIRE"] = "3600"
os.environ["HYRIVER_CACHE_DISABLE"] = "true"
os.environ["HYRIVER_SSL_CERT"] = "path/to/cert.pem"
You can also try using PyGeoHydro without installing it on your system by clicking on the binder badge. A Jupyter Lab instance with the HyRiver stack pre-installed will be launched in your web browser, and you can start coding!
Moreover, requests for additional functionalities can be submitted via issue tracker.
Citation#
If you use any of HyRiver packages in your research, we appreciate citations:
@article{Chegini_2021,
author = {Chegini, Taher and Li, Hong-Yi and Leung, L. Ruby},
doi = {10.21105/joss.03175},
journal = {Journal of Open Source Software},
month = {10},
number = {66},
pages = {1--3},
title = {{HyRiver: Hydroclimate Data Retriever}},
volume = {6},
year = {2021}
}
Installation#
You can install PyGeoHydro using pip after installing libgdal on your system (for example, on Ubuntu run sudo apt install libgdal-dev). Moreover, PyGeoHydro has an optional dependency for persistent caching, requests-cache. We highly recommend installing this package as it can significantly speed up send/receive queries. You don't have to change anything in your code, since PyGeoHydro looks for requests-cache under the hood and, if available, automatically uses persistent caching:
$ pip install pygeohydro
Alternatively, PyGeoHydro can be installed from the conda-forge repository using Conda:
$ conda install -c conda-forge pygeohydro
Quick start#
We can obtain river topobathymetry data using the EHydro class. We can subset the dataset using a geometry or a bounding box, by feature IDs, or with an SQL query:
from pygeohydro import EHydro
ehydro = EHydro("points")
topobathy = ehydro.bygeom((-122.53, 45.57, -122.52, 45.59))
We can explore the available NWIS stations within a bounding box using the interactive_map function. It returns an interactive map, and clicking on a station shows some of its most important properties.
import pygeohydro as gh
bbox = (-69.5, 45, -69, 45.5)
gh.interactive_map(bbox)

We can select all the stations within this bounding box that have daily mean streamflow data from 2000-01-01 to 2010-12-31:
from pygeohydro import NWIS
nwis = NWIS()
query = {
"bBox": ",".join(f"{b:.06f}" for b in bbox),
"hasDataTypeCd": "dv",
"outputDataTypeCd": "dv",
}
info_box = nwis.get_info(query)
dates = ("2000-01-01", "2010-12-31")
stations = info_box[
(info_box.begin_date <= dates[0]) & (info_box.end_date >= dates[1])
].site_no.tolist()
Then, we can get the daily streamflow data in mm/day (by default the values are in cms) and plot them:
from pygeohydro import plot
qobs = nwis.get_streamflow(stations, dates, mmd=True)
plot.signatures(qobs)
By default, get_streamflow returns a pandas.DataFrame with an attrs attribute containing metadata for all the stations; you can access it via qobs.attrs. Moreover, we can get the same data as an xarray.Dataset as follows:
qobs_ds = nwis.get_streamflow(stations, dates, to_xarray=True)
This xarray.Dataset has two dimensions, time and station_id, and 10 variables. The discharge variable has both dimensions, while the other variables, which are station attributes, are one-dimensional.
We can also get instantaneous streamflow data using get_streamflow. This method assumes that the input dates are in the UTC time zone and returns the data in UTC as well.
date = ("2005-01-01 12:00", "2005-01-12 15:00")
qobs = nwis.get_streamflow("01646500", date, freq="iv")
We can query USGS stations of type “stream” in Arizona using SensorThings API as follows:
from pygeohydro import SensorThings

sensor = SensorThings()
odata = {
"filter": "properties/monitoringLocationType eq 'Stream' and properties/stateFIPS eq 'US:04'",
}
df = sensor.query_byodata(odata)
Irrigation withdrawals data can be obtained as follows:
irr = gh.irrigation_withdrawals()
We can get the CAMELS dataset as a geopandas.GeoDataFrame that includes geometry and basin-level attributes of 671 natural watersheds within CONUS, and their streamflow observations for 1980-2014 as an xarray.Dataset, like so:
attrs, qobs = gh.get_camels()
The WaterQuality class has a number of convenience methods to retrieve data from the web service. Since there are many parameter combinations that can be used to retrieve data, a general method is also provided to retrieve data from any of the valid endpoints. You can use get_json to retrieve station info as a geopandas.GeoDataFrame or get_csv to retrieve station data as a pandas.DataFrame. You can construct a dictionary of the parameters and pass it to one of these functions. For more information on the parameters, please consult the Water Quality Data documentation.
For example, let’s find all the stations within a bounding box that have Caffeine data:
from pygeohydro import WaterQuality
bbox = (-92.8, 44.2, -88.9, 46.0)
kwds = {"characteristicName": "Caffeine"}
wq = WaterQuality()
stations = wq.station_bybbox(bbox, kwds)
Or the same criterion but within a 30-mile radius of a point:
stations = wq.station_bydistance(-92.8, 44.2, 30, kwds)
Then we can get the data for all these stations like this:
sids = stations.MonitoringLocationIdentifier.tolist()
caff = wq.data_bystation(sids, kwds)

Moreover, we can get land use/land cover data using the nlcd_bygeom or nlcd_bycoords functions, percentages of land cover types using cover_statistics, and overland roughness using overland_roughness. The nlcd_bycoords function returns a geopandas.GeoDataFrame with the NLCD layers as columns and the input coordinates as the geometry column. Moreover, the nlcd_bygeom function accepts both a single geometry and a geopandas.GeoDataFrame as input.
from pynhd import NLDI
basins = NLDI().get_basins(["01031450", "01318500", "01031510"])
lulc = gh.nlcd_bygeom(basins, 100, years={"cover": [2016, 2019]})
stats = gh.cover_statistics(lulc["01318500"].cover_2016)
roughness = gh.overland_roughness(lulc["01318500"].cover_2019)

Next, let's use ssebopeta_bygeom to get actual ET data for a basin. Note that there's also a ssebopeta_bycoords function that returns an ET time series for a single coordinate.
geometry = NLDI().get_basins("01315500").geometry[0]
eta = gh.ssebopeta_bygeom(geometry, dates=("2005-10-01", "2005-10-05"))

Additionally, we can pull all the US dams data using NID. Let's get dams that are within this bounding box and have a maximum storage larger than 200 acre-feet.
from pygeohydro import NID

nid = NID()
dams = nid.get_bygeom((-65.77, 43.07, -69.31, 45.45), 4326)
dams = nid.inventory_byid(dams.id.to_list())
dams = dams[dams.maxStorage > 200]
We can get also all dams within CONUS with maximum storage larger than 2500 acre-feet:
conus_geom = gh.get_us_states("contiguous")
dam_list = nid.get_byfilter([{"maxStorage": ["[2500 +inf]"]}])
dams = nid.inventory_byid(dam_list[0].id.to_list(), stage_nid=True)
conus_dams = dams[dams.stateKey.isin(conus_geom.STUSPS)].reset_index(drop=True)

The WBD class allows us to get Hydrologic Unit (HU) polygon boundaries. Let's get the two Hudson HUC4s:
from pygeohydro import WBD
wbd = WBD("huc4")
hudson = wbd.byids("huc4", ["0202", "0203"])
The NFHL class allows us to retrieve FEMA's National Flood Hazard Layer (NFHL) data. Let's get the cross-section data for a small region in Vermont:
from pygeohydro import NFHL
nfhl = NFHL("NFHL", "cross-sections")
gdf_xs = nfhl.bygeom((-73.42, 43.28, -72.9, 43.52), geo_crs=4269)
Py3DEP: Topographic data through 3DEP#
Features#
Py3DEP is a part of the HyRiver software stack that is designed to aid in hydroclimate analysis through web services. This package provides access to the 3DEP database, which is a part of the National Map services. The 3DEP service has multi-resolution sources, and depending on the user-provided resolution, the data are resampled on the server side based on all the available data sources. Py3DEP returns the requests as an xarray dataset.
The following functionalities are currently available:
get_map: Get topographic data from the dynamic 3DEP service, which supports the following layers:
DEM
Hillshade Gray
Aspect Degrees
Aspect Map
GreyHillshade Elevation Fill
Hillshade Multidirectional
Slope Degrees
Slope Map
Hillshade Elevation Tinted
Height Ellipsoidal
Contour 25
Contour Smoothed 25
static_3dep_dem: Get DEM data at 10 m, 30 m, or 60 m resolution from the staged 3DEP data. Since this function only returns DEM, for computing other terrain attributes you can use xarray-spatial. Just note that you should reproject the output DataArray to a projected CRS like 5070 before passing it to xarray-spatial, like so: dem = dem.rio.reproject(5070).
get_dem: Get DEM data from either the dynamic or static 3DEP service. Considering that the static service is much faster, if the target DEM resolution is 10 m, 30 m, or 60 m, then the static service is used (static_3dep_dem). Otherwise, the dynamic service is used (get_map using the DEM layer).
get_map_vrt: Get DEM data from the dynamic 3DEP service and store it as a GDAL VRT file. This function is mainly provided for large requests due to its low memory footprint. Moreover, due to lazy loading of the data, this function can be much faster than get_map or get_dem, even for small requests, at the cost of higher disk usage.
elevation_bygrid: For retrieving elevations of all the grid points in a 2D grid.
add_elevation: For adding elevation data as a new variable to an input xarray.DataArray or xarray.Dataset.
elevation_bycoords: For retrieving the elevation of a list of x and y coordinates.
elevation_profile: For retrieving the elevation profile along a line at a given spacing. This function converts the line to a B-spline and then calculates the elevation along the spline at a given uniform spacing.
deg2mpm: For converting a slope dataset from degrees to meters per meter.
query_3dep_sources: For querying the bounds of 3DEP's data sources within a bounding box.
check_3dep_availability: For querying 3DEP's resolution availability within a bounding box (see the short sketch after this list).
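For instance, a minimal sketch of checking which 3DEP resolutions cover an area of interest (assuming check_3dep_availability accepts a bounding box in EPSG:4326):
import py3dep

bbox = (-69.5, 45.0, -69.0, 45.5)  # (west, south, east, north)
# Returns a mapping of resolution names to their availability within the bbox.
print(py3dep.check_3dep_availability(bbox))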
You can find some example notebooks here.
Moreover, under the hood, Py3DEP uses PyGeoOGC and AsyncRetriever packages for making requests in parallel and storing responses in chunks. This improves the reliability and speed of data retrieval significantly.
You can control the request/response caching behavior and verbosity of the package by setting the following environment variables:
HYRIVER_CACHE_NAME: Path to the caching SQLite database for asynchronous HTTP requests. It defaults to ./cache/aiohttp_cache.sqlite.
HYRIVER_CACHE_NAME_HTTP: Path to the caching SQLite database for HTTP requests. It defaults to ./cache/http_cache.sqlite.
HYRIVER_CACHE_EXPIRE: Expiration time for cached requests in seconds. It defaults to one week.
HYRIVER_CACHE_DISABLE: Disable reading/writing from/to the cache. The default is false.
HYRIVER_SSL_CERT: Path to an SSL certificate file.
For example, in your code before making any requests you can do:
import os
os.environ["HYRIVER_CACHE_NAME"] = "path/to/aiohttp_cache.sqlite"
os.environ["HYRIVER_CACHE_NAME_HTTP"] = "path/to/http_cache.sqlite"
os.environ["HYRIVER_CACHE_EXPIRE"] = "3600"
os.environ["HYRIVER_CACHE_DISABLE"] = "true"
os.environ["HYRIVER_SSL_CERT"] = "path/to/cert.pem"
You can also try using Py3DEP without installing it on your system by clicking on the binder badge. A Jupyter Lab instance with the HyRiver stack pre-installed will be launched in your web browser, and you can start coding!
Moreover, requests for additional functionalities can be submitted via issue tracker.
Citation#
If you use any of HyRiver packages in your research, we appreciate citations:
@article{Chegini_2021,
author = {Chegini, Taher and Li, Hong-Yi and Leung, L. Ruby},
doi = {10.21105/joss.03175},
journal = {Journal of Open Source Software},
month = {10},
number = {66},
pages = {1--3},
title = {{HyRiver: Hydroclimate Data Retriever}},
volume = {6},
year = {2021}
}
Installation#
You can install Py3DEP using pip after installing libgdal on your system (for example, on Ubuntu run sudo apt install libgdal-dev). Moreover, Py3DEP has an optional dependency for persistent caching, requests-cache. We highly recommend installing this package as it can significantly speed up send/receive queries. You don't have to change anything in your code, since Py3DEP looks for requests-cache under the hood and, if available, automatically uses persistent caching:
$ pip install py3dep
Alternatively, Py3DEP can be installed from the conda-forge repository using Conda:
$ conda install -c conda-forge py3dep
Quick start#
You can use Py3DEP from the command line or as a Python library. The command-line interface provides two functionalities:
Getting topographic data: You must create a geopandas.GeoDataFrame that contains the geometries of the target locations. This dataframe must have at least three columns: id, res, and geometry. The id column is used as the filename for saving the obtained topographic data to a NetCDF (.nc) file. The res column must be the target resolution in meters. Then, you must save the dataframe to a file with an extension such as .shp or .gpkg (whatever geopandas.read_file can read).
Getting elevation: You must create a pandas.DataFrame that contains the coordinates of the target locations. This dataframe must have at least two columns: x and y. The elevations are obtained using the airmap service in meters. The data are saved as a csv file with the same filename as the input file with _elevation appended, e.g., coords_elevation.csv.
$ py3dep --help
Usage: py3dep [OPTIONS] COMMAND [ARGS]...
Command-line interface for Py3DEP.
Options:
-h, --help Show this message and exit.
Commands:
coords Retrieve topographic data for a list of coordinates.
geometry Retrieve topographic data within geometries.
The coords sub-command is as follows:
$ py3dep coords -h
Usage: py3dep coords [OPTIONS] FPATH
Retrieve topographic data for a list of coordinates.
FPATH: Path to a csv file with two columns named ``lon`` and ``lat``.
Examples:
$ cat coords.csv
lon,lat
-122.2493328,37.8122894
$ py3dep coords coords.csv -q airmap -s topo_dir
Options:
-q, --query_source [airmap|tnm|tep]
Source of the elevation data.
-s, --save_dir PATH Path to a directory to save the requested
files. Extension for the outputs is either
`.nc` for geometry or `.csv` for coords.
-h, --help Show this message and exit.
And the geometry sub-command is as follows:
$ py3dep geometry -h
Usage: py3dep geometry [OPTIONS] FPATH
Retrieve topographic data within geometries.
FPATH: Path to a shapefile (.shp) or geopackage (.gpkg) file.
This file must have three columns and contain a ``crs`` attribute:
- ``id``: Feature identifiers that py3dep uses as the output netcdf/csv filenames.
- ``res``: Target resolution in meters.
- ``geometry``: A Polygon or MultiPloygon.
Examples:
$ py3dep geometry ny_geom.gpkg -l "Slope Map" -l DEM -s topo_dir
Options:
-l, --layers [DEM|Hillshade Gray|Aspect Degrees|Aspect Map|GreyHillshade_elevationFill|Hillshade Multidirectional|Slope Map|Slope Degrees|Hillshade Elevation Tinted|Height Ellipsoidal|Contour 25|Contour Smoothed 25]
Target topographic data layers
-s, --save_dir PATH Path to a directory to save the requested
files.Extension for the outputs is either
`.nc` for geometry or `.csv` for coords.
-h, --help Show this message and exit.
Now, let's see how we can use Py3DEP as a library. Py3DEP accepts Shapely's Polygon or a bounding box (a tuple of length four) as an input geometry. We can use PyNHD to get a watershed's geometry, then use it to get the DEM and slope in meters/meters from Py3DEP using the get_map function. get_map has a resolution argument that sets the target resolution in meters. Note that the highest available resolution throughout CONUS is about 10 m, though higher resolutions are available in limited parts of the US. The input geometry can be in any valid spatial reference (geo_crs argument). The crs argument, however, is limited to CRS:84, EPSG:4326, and EPSG:3857, since 3DEP only supports these spatial references.
import py3dep
from pynhd import NLDI
geom = NLDI().get_basins("01031500").geometry[0]
dem = py3dep.get_map("DEM", geom, resolution=30, geo_crs=4326, crs=3857)
slope = py3dep.get_map("Slope Degrees", geom, resolution=30)
slope = py3dep.deg2mpm(slope)
We can also use the get_dem function to get the same DEM and then compute slope with xarray-spatial:
import xrspatial
dem = py3dep.get_dem(geom, 30)
slope = xrspatial.slope(dem.rio.reproject(5070))
slope = py3dep.deg2mpm(slope)

We can use rioxarray package to save the obtained dataset as a raster file:
import rioxarray
dem.rio.to_raster("dem_01031500.tif")
Moreover, we can get the elevations of a set of x- and y-coordinates on a grid. For example, let's get the minimum temperature data within this watershed from Daymet using PyDaymet, then add the elevation as a new variable to the dataset:
import pydaymet as daymet
import xarray as xr
import numpy as np
clm = daymet.get_bygeom(geom, ("2005-01-01", "2005-01-31"), variables="tmin")
elev = py3dep.elevation_bygrid(clm.x.values, clm.y.values, clm.crs, clm.res[0] * 1000)
attrs = clm.attrs
clm = xr.merge([clm, elev])
clm["elevation"] = clm.elevation.where(~np.isnan(clm.isel(time=0).tmin), drop=True)
clm.attrs.update(attrs)
Now, let’s get street network data using osmnx package
and add elevation data for its nodes using elevation_bycoords
function.
import networkx as nx
import osmnx as ox

G = ox.graph_from_place("Piedmont, California, USA", network_type="drive")
x, y = nx.get_node_attributes(G, "x").values(), nx.get_node_attributes(G, "y").values()
elevation = py3dep.elevation_bycoords(zip(x, y), crs=4326)
nx.set_node_attributes(G, dict(zip(G.nodes(), elevation)), "elevation")

We can get the elevation profile along a line at a given spacing using the elevation_profile function. For example, let's get the elevation profile at 10-m spacing along the main flowline of the upstream drainage area of the USGS station with ID 01031500:
import py3dep
from pynhd import NLDI
flw_main = NLDI().navigate_byid(
fsource="nwissite",
fid="USGS-01031500",
navigation="upstreamMain",
source="flowlines",
distance=1000,
)
line = flw_main.geometry.unary_union
elevation = py3dep.elevation_profile(line, 10)
PyDaymet: Daily climate data through Daymet#
Warning
Since the release of Daymet v4 R1 in November 2022, the URL of Daymet's server has changed. Therefore, only PyDaymet v0.13.7+ works; previous versions no longer work.
Features#
PyDaymet is a part of the HyRiver software stack that is designed to aid in hydroclimate analysis through web services. This package provides access to climate data from the Daymet V4 R1 database using the NetCDF Subset Service (NCSS). Both single-pixel (using the get_bycoords function) and gridded data (using get_bygeom) are supported, returned as pandas.DataFrame and xarray.Dataset, respectively. Climate data are available for North America and Hawaii from 1980, and for Puerto Rico from 1950, at three time scales: daily, monthly, and annual. Additionally, PyDaymet can compute Potential EvapoTranspiration (PET) using three methods, penman_monteith, priestley_taylor, and hargreaves_samani, for both single-pixel and gridded data.
For PET computations, PyDaymet accepts four additional user-defined parameters:
penman_monteith: soil_heat_flux, albedo, alpha, and arid_correction.
priestley_taylor: soil_heat_flux, albedo, and arid_correction.
hargreaves_samani: None.
Default values for the parameters are: soil_heat_flux = 0, albedo = 0.23, alpha = 1.26, and arid_correction = False.
An important parameter for the priestley_taylor and penman_monteith methods is arid_correction, which is used to correct the actual vapor pressure for arid regions. Since relative humidity is not provided by Daymet, the actual vapor pressure is computed assuming that the dew point temperature equals the minimum temperature. However, for arid regions, FAO 56 suggests subtracting 2-3 °C from the minimum temperature to account for the fact that in arid regions the air might not be saturated when its temperature is at its minimum. For such areas, you can pass {"arid_correction": True, ...} to subtract 2 °C from the minimum temperature when computing the actual vapor pressure.
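As a hedged sketch of the arid correction described above, assuming the gridded function exposes the PET parameters through a pet_params keyword (the exact keyword may differ between versions; the potential_et function shown later uses params):
import pydaymet as daymet

bbox = (-112.5, 33.0, -111.5, 34.0)  # an arid region in Arizona
clm = daymet.get_bygeom(
    bbox,
    ("2010-06-01", "2010-06-30"),
    pet="priestley_taylor",
    pet_params={"arid_correction": True},  # assumed keyword, per the text above
)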
You can find some example notebooks here.
Moreover, under the hood, PyDaymet uses PyGeoOGC and AsyncRetriever packages for making requests in parallel and storing responses in chunks. This improves the reliability and speed of data retrieval significantly.
You can control the request/response caching behavior and verbosity of the package by setting the following environment variables:
HYRIVER_CACHE_NAME: Path to the caching SQLite database for asynchronous HTTP requests. It defaults to ./cache/aiohttp_cache.sqlite.
HYRIVER_CACHE_NAME_HTTP: Path to the caching SQLite database for HTTP requests. It defaults to ./cache/http_cache.sqlite.
HYRIVER_CACHE_EXPIRE: Expiration time for cached requests in seconds. It defaults to one week.
HYRIVER_CACHE_DISABLE: Disable reading/writing from/to the cache. The default is false.
HYRIVER_SSL_CERT: Path to an SSL certificate file.
For example, in your code before making any requests you can do:
import os
os.environ["HYRIVER_CACHE_NAME"] = "path/to/aiohttp_cache.sqlite"
os.environ["HYRIVER_CACHE_NAME_HTTP"] = "path/to/http_cache.sqlite"
os.environ["HYRIVER_CACHE_EXPIRE"] = "3600"
os.environ["HYRIVER_CACHE_DISABLE"] = "true"
os.environ["HYRIVER_SSL_CERT"] = "path/to/cert.pem"
You can also try using PyDaymet without installing it on your system by clicking on the binder badge. A Jupyter Lab instance with the HyRiver stack pre-installed will be launched in your web browser, and you can start coding!
Moreover, requests for additional functionalities can be submitted via issue tracker.
Citation#
If you use any of HyRiver packages in your research, we appreciate citations:
@article{Chegini_2021,
author = {Chegini, Taher and Li, Hong-Yi and Leung, L. Ruby},
doi = {10.21105/joss.03175},
journal = {Journal of Open Source Software},
month = {10},
number = {66},
pages = {1--3},
title = {{HyRiver: Hydroclimate Data Retriever}},
volume = {6},
year = {2021}
}
Installation#
You can install PyDaymet using pip after installing libgdal on your system (for example, on Ubuntu run sudo apt install libgdal-dev):
$ pip install pydaymet
Alternatively, PyDaymet can be installed from the conda-forge repository using Conda:
$ conda install -c conda-forge pydaymet
Quick start#
You can use PyDaymet from the command line or as a Python library. The command-line interface provides two functionalities:
Getting gridded climate data: You must create a geopandas.GeoDataFrame that contains the geometries of the target locations. This dataframe must have four columns: id, start, end, and geometry. The id column is used as the filename for saving the obtained climate data to a NetCDF (.nc) file. The start and end columns are the starting and ending dates of the target period. Then, you must save the dataframe as a shapefile (.shp) or geopackage (.gpkg) with a CRS attribute.
Getting single-pixel climate data: You must create a CSV file that contains the coordinates of the target locations. This file must have these columns: id, start, end, lon, and lat. The id column is used as the filename for saving the obtained climate data to a CSV (.csv) file. The start and end columns are the same as in the geometry command. The lon and lat columns are the longitude and latitude coordinates of the target locations.
$ pydaymet -h
Usage: pydaymet [OPTIONS] COMMAND [ARGS]...
Command-line interface for PyDaymet.
Options:
-h, --help Show this message and exit.
Commands:
coords Retrieve climate data for a list of coordinates.
geometry Retrieve climate data for a dataframe of geometries.
The coords sub-command is as follows:
$ pydaymet coords -h
Usage: pydaymet coords [OPTIONS] FPATH
Retrieve climate data for a list of coordinates.
FPATH: Path to a csv file with four columns:
- ``id``: Feature identifiers that daymet uses as the output netcdf filenames.
- ``start``: Start time.
- ``end``: End time.
- ``lon``: Longitude of the points of interest.
- ``lat``: Latitude of the points of interest.
- ``time_scale``: (optional) Time scale, either ``daily`` (default), ``monthly`` or ``annual``.
- ``pet``: (optional) Method to compute PET. Supported methods are:
``penman_monteith``, ``hargreaves_samani``, ``priestley_taylor``, and ``none`` (default).
- ``snow``: (optional) Separate snowfall from precipitation, default is ``False``.
Examples:
$ cat coords.csv
id,lon,lat,start,end,pet
california,-122.2493328,37.8122894,2012-01-01,2014-12-31,hargreaves_samani
$ pydaymet coords coords.csv -v prcp -v tmin
Options:
-v, --variables TEXT Target variables. You can pass this flag multiple
times for multiple variables.
-s, --save_dir PATH Path to a directory to save the requested files.
Extension for the outputs is .nc for geometry and .csv
for coords.
--disable_ssl Pass to disable SSL certification verification.
-h, --help Show this message and exit.
And the geometry sub-command is as follows:
$ pydaymet geometry -h
Usage: pydaymet geometry [OPTIONS] FPATH
Retrieve climate data for a dataframe of geometries.
FPATH: Path to a shapefile (.shp) or geopackage (.gpkg) file.
This file must have four columns and contain a ``crs`` attribute:
- ``id``: Feature identifiers that daymet uses as the output netcdf filenames.
- ``start``: Start time.
- ``end``: End time.
- ``geometry``: Target geometries.
- ``time_scale``: (optional) Time scale, either ``daily`` (default), ``monthly`` or ``annual``.
- ``pet``: (optional) Method to compute PET. Supported methods are:
``penman_monteith``, ``hargreaves_samani``, ``priestley_taylor``, and ``none`` (default).
- ``snow``: (optional) Separate snowfall from precipitation, default is ``False``.
Examples:
$ pydaymet geometry geo.gpkg -v prcp -v tmin
Options:
-v, --variables TEXT Target variables. You can pass this flag multiple
times for multiple variables.
-s, --save_dir PATH Path to a directory to save the requested files.
Extension for the outputs is .nc for geometry and .csv
for coords.
--disable_ssl Pass to disable SSL certification verification.
-h, --help Show this message and exit.
Now, let's see how we can use PyDaymet as a library. PyDaymet offers two functions for getting climate data: get_bycoords and get_bygeom. The arguments of these functions are identical except for the first argument, which should be a polygon for the latter and a coordinate (a tuple of length two, (x, y)) for the former. The input geometry or coordinate can be in any valid CRS (defaults to EPSG:4326). The dates argument can be either a tuple of length two, like (start_str, end_str), or a list of years, like [2000, 2005]. Note that both functions have a pet flag for computing PET and a snow flag for separating snow from precipitation using the Martinez and Gupta (2010) method. Additionally, we can pass time_scale to get daily, monthly, or annual summaries; this flag defaults to daily.
from pynhd import NLDI
import pydaymet as daymet
geometry = NLDI().get_basins("01031500").geometry[0]
var = ["prcp", "tmin"]
dates = ("2000-01-01", "2000-06-30")
daily = daymet.get_bygeom(geometry, dates, variables=var, pet="priestley_taylor", snow=True)
monthly = daymet.get_bygeom(geometry, dates, variables=var, time_scale="monthly")

If the input geometry (or coordinate) is in a CRS other than EPSG:4326, we should pass it to the functions.
coords = (-1431147.7928, 318483.4618)
crs = 3542
dates = ("2000-01-01", "2006-12-31")
annual = daymet.get_bycoords(coords, dates, variables=var, loc_crs=crs, time_scale="annual")

Additionally, the get_bycoords function accepts a list of coordinates, and by setting the to_xarray flag to True it can return the results as an xarray.Dataset instead of a pandas.DataFrame:
coords = [(-94.986, 29.973), (-95.478, 30.134)]
idx = ["P1", "P2"]
clm_ds = daymet.get_bycoords(coords, range(2000, 2021), coords_id=idx, to_xarray=True)
Also, we can use the potential_et function to compute PET by passing the daily climate data. We can pass either a pandas.DataFrame or an xarray.Dataset. Note that the penman_monteith and priestley_taylor methods have parameters that can be passed via the params argument if values other than the defaults are needed. For example, the default value of alpha for the priestley_taylor method is 1.26 (humid regions); we can set it to 1.74 (arid regions) as follows:
pet_hs = daymet.potential_et(daily, methods="priestley_taylor", params={"alpha": 1.74})
Next, let’s get annual total precipitation for Hawaii and Puerto Rico for 2010.
hi_ext = (-160.3055, 17.9539, -154.7715, 23.5186)
pr_ext = (-67.9927, 16.8443, -64.1195, 19.9381)
hi = daymet.get_bygeom(hi_ext, 2010, variables="prcp", region="hi", time_scale="annual")
pr = daymet.get_bygeom(pr_ext, 2010, variables="prcp", region="pr", time_scale="annual")
Some example plots are shown below:


PyGridMET: Daily climate data through GridMET#
Features#
PyGridMET is a part of the HyRiver software stack that is designed to aid in hydroclimate analysis through web services. This package provides access to daily climate data over the conterminous US (CONUS) from the GridMET database using the NetCDF Subset Service (NCSS). Both single-pixel (using the get_bycoords function) and gridded data (using get_bygeom) are supported, returned as pandas.DataFrame and xarray.Dataset, respectively.
You can find some example notebooks here.
Moreover, under the hood, PyGridMET uses PyGeoOGC and AsyncRetriever packages for making requests in parallel and storing responses in chunks. This improves the reliability and speed of data retrieval significantly.
You can control the request/response caching behavior and verbosity of the package by setting the following environment variables:
HYRIVER_CACHE_NAME: Path to the caching SQLite database for asynchronous HTTP requests. It defaults to ./cache/aiohttp_cache.sqlite.
HYRIVER_CACHE_NAME_HTTP: Path to the caching SQLite database for HTTP requests. It defaults to ./cache/http_cache.sqlite.
HYRIVER_CACHE_EXPIRE: Expiration time for cached requests in seconds. It defaults to one week.
HYRIVER_CACHE_DISABLE: Disable reading/writing from/to the cache. The default is false.
HYRIVER_SSL_CERT: Path to an SSL certificate file.
For example, in your code before making any requests you can do:
import os
os.environ["HYRIVER_CACHE_NAME"] = "path/to/aiohttp_cache.sqlite"
os.environ["HYRIVER_CACHE_NAME_HTTP"] = "path/to/http_cache.sqlite"
os.environ["HYRIVER_CACHE_EXPIRE"] = "3600"
os.environ["HYRIVER_CACHE_DISABLE"] = "true"
os.environ["HYRIVER_SSL_CERT"] = "path/to/cert.pem"
You can also try using PyGridMET without installing it on your system by clicking on the binder badge. A Jupyter Lab instance with the HyRiver stack pre-installed will be launched in your web browser, and you can start coding!
Moreover, requests for additional functionalities can be submitted via issue tracker.
Citation#
If you use any of HyRiver packages in your research, we appreciate citations:
@article{Chegini_2021,
author = {Chegini, Taher and Li, Hong-Yi and Leung, L. Ruby},
doi = {10.21105/joss.03175},
journal = {Journal of Open Source Software},
month = {10},
number = {66},
pages = {1--3},
title = {{HyRiver: Hydroclimate Data Retriever}},
volume = {6},
year = {2021}
}
Installation#
You can install PyGridMET using pip as follows:
$ pip install pygridmet
Alternatively, PyGridMET can be installed from the conda-forge repository using Conda:
$ conda install -c conda-forge pygridmet
Quick start#
You can use PyGridMET from the command line or as a Python library. The command-line interface provides two functionalities:
Getting gridded climate data: You must create a geopandas.GeoDataFrame that contains the geometries of the target locations. This dataframe must have four columns: id, start, end, and geometry. The id column is used as the filename for saving the obtained climate data to a NetCDF (.nc) file. The start and end columns are the starting and ending dates of the target period. Then, you must save the dataframe as a shapefile (.shp) or geopackage (.gpkg) with a CRS attribute.
Getting single-pixel climate data: You must create a CSV file that contains the coordinates of the target locations. This file must have these columns: id, start, end, lon, and lat. The id column is used as the filename for saving the obtained climate data to a CSV (.csv) file. The start and end columns are the same as in the geometry command. The lon and lat columns are the longitude and latitude coordinates of the target locations.
$ pygridmet -h
Usage: pygridmet [OPTIONS] COMMAND [ARGS]...
Command-line interface for PyGridMET.
Options:
-h, --help Show this message and exit.
Commands:
coords Retrieve climate data for a list of coordinates.
geometry Retrieve climate data for a dataframe of geometries.
The coords sub-command is as follows:
$ pygridmet coords -h
Usage: pygridmet coords [OPTIONS] FPATH
Retrieve climate data for a list of coordinates.
FPATH: Path to a csv file with four columns:
- ``id``: Feature identifiers that gridmet uses as the output netcdf filenames.
- ``start``: Start time.
- ``end``: End time.
- ``lon``: Longitude of the points of interest.
- ``lat``: Latitude of the points of interest.
- ``snow``: (optional) Separate snowfall from precipitation, default is ``False``.
Examples:
$ cat coords.csv
id,lon,lat,start,end
california,-122.2493328,37.8122894,2012-01-01,2014-12-31
$ pygridmet coords coords.csv -v pr -v tmmn
Options:
-v, --variables TEXT Target variables. You can pass this flag multiple
times for multiple variables.
-s, --save_dir PATH Path to a directory to save the requested files.
Extension for the outputs is .nc for geometry and .csv
for coords.
--disable_ssl Pass to disable SSL certification verification.
-h, --help Show this message and exit.
And the geometry sub-command is as follows:
$ pygridmet geometry -h
Usage: pygridmet geometry [OPTIONS] FPATH
Retrieve climate data for a dataframe of geometries.
FPATH: Path to a shapefile (.shp) or geopackage (.gpkg) file.
This file must have four columns and contain a ``crs`` attribute:
- ``id``: Feature identifiers that gridmet uses as the output netcdf filenames.
- ``start``: Start time.
- ``end``: End time.
- ``geometry``: Target geometries.
- ``snow``: (optional) Separate snowfall from precipitation, default is ``False``.
Examples:
$ pygridmet geometry geo.gpkg -v pr -v tmmn
Options:
-v, --variables TEXT Target variables. You can pass this flag multiple
times for multiple variables.
-s, --save_dir PATH Path to a directory to save the requested files.
Extension for the outputs is .nc for geometry and .csv
for coords.
--disable_ssl Pass to disable SSL certification verification.
-h, --help Show this message and exit.
Now, let's see how we can use PyGridMET as a library. PyGridMET offers two functions for getting climate data: get_bycoords and get_bygeom. The arguments of these functions are identical except for the first argument, which should be a polygon for the latter and a coordinate (a tuple of length two, (x, y)) for the former. The input geometry or coordinate can be in any valid CRS (defaults to EPSG:4326). The dates argument can be either a tuple of length two, like (start_str, end_str), or a list of years, like [2000, 2005]. Note that both functions have a snow flag for separating snow from precipitation using the Martinez and Gupta (2010) method. We can get a dataframe of available variables and their info by calling GridMET().gridmet_table:
| Variable | Abbr | Unit |
|---|---|---|
| Precipitation | pr | mm |
| Maximum Relative Humidity | rmax | % |
| Minimum Relative Humidity | rmin | % |
| Specific Humidity | sph | kg/kg |
| Surface Radiation | srad | W/m2 |
| Wind Direction | th | Degrees clockwise from north |
| Minimum Air Temperature | tmmn | K |
| Maximum Air Temperature | tmmx | K |
| Wind Speed | vs | m/s |
| Burning Index | bi | Dimensionless |
| Fuel Moisture (100-hr) | fm100 | % |
| Fuel Moisture (1000-hr) | fm1000 | % |
| Energy Release Component | erc | Dimensionless |
| Reference Evapotranspiration (Alfalfa) | etr | mm |
| Reference Evapotranspiration (Grass) | pet | mm |
| Vapor Pressure Deficit | vpd | kPa |
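To reproduce this table in a session (a short snippet, assuming the GridMET class is importable from the top-level package):
from pygridmet import GridMET

# dataframe of the available variables, their abbreviations, and units
GridMET().gridmet_table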
from pynhd import NLDI
import pygridmet as gridmet
geometry = NLDI().get_basins("01031500").geometry[0]
var = ["pr", "tmmn"]
dates = ("2000-01-01", "2000-06-30")
daily = gridmet.get_bygeom(geometry, dates, variables=var, snow=True)

If the input geometry (or coordinate) is in a CRS other than EPSG:4326, we should pass it to the functions.
coords = (-1431147.7928, 318483.4618)
crs = 3542
dates = ("2000-01-01", "2006-12-31")
data = gridmet.get_bycoords(coords, dates, variables=var, loc_crs=crs)

Additionally, the get_bycoords function accepts a list of coordinates and, by setting the to_xarray flag to True, it can return the results as an xarray.Dataset instead of a pandas.DataFrame:
coords = [(-94.986, 29.973), (-95.478, 30.134)]
idx = ["P1", "P2"]
clm_ds = gridmet.get_bycoords(coords, range(2000, 2021), coords_id=idx, to_xarray=True)
PyNLDAS2: Hourly NLDAS-2 Forcing Data#
Features#
PyNLDAS2 is a part of the HyRiver software stack that is designed to aid in hydroclimate analysis through web services. This package provides access to the NLDAS-2 forcing dataset via the Hydrology Data Rods service. Currently, only hourly data are supported. There are three main functions:
get_bycoords: Forcing data for a list of coordinates as a pandas.DataFrame or xarray.Dataset.
get_bygeom: Forcing data within a geometry as an xarray.Dataset.
get_grid_mask: The NLDAS-2 land/water grid mask as an xarray.Dataset.
PyNLDAS2 only provides access to the hourly NLDAS2 dataset, so if you need to access other NASA climate datasets you can check out tsgettoolbox developed by Tim Cera.
Moreover, under the hood, PyNLDAS2 uses PyGeoOGC and AsyncRetriever packages for making requests in parallel and storing responses in chunks. This improves the reliability and speed of data retrieval significantly.
You can control the request/response caching behavior and verbosity of the package by setting the following environment variables:
HYRIVER_CACHE_NAME: Path to the caching SQLite database for asynchronous HTTP requests. It defaults to ./cache/aiohttp_cache.sqlite.
HYRIVER_CACHE_NAME_HTTP: Path to the caching SQLite database for HTTP requests. It defaults to ./cache/http_cache.sqlite.
HYRIVER_CACHE_EXPIRE: Expiration time for cached requests in seconds. It defaults to one week.
HYRIVER_CACHE_DISABLE: Disable reading/writing from/to the cache. The default is false.
HYRIVER_SSL_CERT: Path to a SSL certificate file.
For example, in your code before making any requests you can do:
import os
os.environ["HYRIVER_CACHE_NAME"] = "path/to/aiohttp_cache.sqlite"
os.environ["HYRIVER_CACHE_NAME_HTTP"] = "path/to/http_cache.sqlite"
os.environ["HYRIVER_CACHE_EXPIRE"] = "3600"
os.environ["HYRIVER_CACHE_DISABLE"] = "true"
os.environ["HYRIVER_SSL_CERT"] = "path/to/cert.pem"
You can find some example notebooks here.
You can also try using PyNLDAS2 without installing it on your system by clicking on the binder badge. A Jupyter Lab instance with the HyRiver stack pre-installed will be launched in your web browser, and you can start coding!
Moreover, requests for additional functionalities can be submitted via issue tracker.
Citation#
If you use any of HyRiver packages in your research, we appreciate citations:
@article{Chegini_2021,
author = {Chegini, Taher and Li, Hong-Yi and Leung, L. Ruby},
doi = {10.21105/joss.03175},
journal = {Journal of Open Source Software},
month = {10},
number = {66},
pages = {1--3},
title = {{HyRiver: Hydroclimate Data Retriever}},
volume = {6},
year = {2021}
}
Installation#
You can install pynldas2
using pip
:
$ pip install pynldas2
Alternatively, pynldas2
can be installed from the conda-forge
repository
using Conda:
$ conda install -c conda-forge pynldas2
Quick start#
The NLDAS-2 database provides forcing data at 1/8th-degree grid spacing, spanning from 1 January 1979 to present. Let’s take a look at the NLDAS-2 grid mask, which includes land, water, soil, and vegetation masks:
import pynldas2 as nldas
grid = nldas.get_grid_mask()

Next, we use PyGeoHydro to get the geometry of the HUC8 with ID 13060003, then we get the forcing data within the obtained geometry.
from pygeohydro import WBD
huc8 = WBD("huc8")
geometry = huc8.byids("huc8", "13060003").geometry[0]
clm = nldas.get_bygeom(geometry, "2010-01-01", "2010-01-31", 4326)
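We can also request data for a list of points. A minimal, hedged sketch, assuming get_bycoords accepts (lon, lat) tuples, mirrors get_bygeom's date arguments, and takes an optional coords_id for labeling the outputs:
# two illustrative points within the NLDAS-2 domain
coords = [(-105.1, 39.5), (-105.3, 39.6)]
clm_pts = nldas.get_bycoords(coords, "2010-01-01", "2010-01-31", coords_id=["p1", "p2"])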

Road Map#
[ ] Add PET calculation functions similar to PyDaymet but at hourly timescale.
[ ] Add a command-line interface.
HydroSignatures: Tools for computing hydrological signatures#
Features#
HydroSignatures is a suite of tools for computing hydrological signatures and is a part of the HyRiver software stack. This package includes the following functions:
exceedance: Exceedance probability, which can be used for plotting flow duration curves;
flow_duration_curve_slope: Slope of the flow duration curve;
flashiness_index: Flashiness index;
mean_monthly: Mean monthly summary of a time series, which can be used for plotting regime curves;
rolling_mean_monthly: Rolling mean monthly summary of a time series, which can be used for plotting smoothed regime curves;
baseflow: Extracting baseflow from a streamflow time series using the Lyne and Hollick digital filter (Ladson et al., 2013);
baseflow_index: Baseflow index;
aridity_index: Aridity index;
seasonality_index_walsh: Seasonality index (Walsh and Lawler, 1981);
seasonality_index_markham: Seasonality index (Markham, 1970);
extract_extrema: Determining the location of local maxima and minima in a time series.
Moreover, the package has a class called HydroSignatures that can be used to compute all of these signatures by passing a streamflow and a precipitation time series, both in millimeters per day (or any other consistent unit). This class supports the subtraction and inequality operators, which can be used to compare two HydroSignatures objects. You can serialize the class to a JSON object using the to_json method or convert it to a dictionary using the to_dict method.
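A minimal, hedged sketch of this workflow with synthetic daily series (the data are random and only illustrate the API):
import numpy as np
import pandas as pd
from hydrosignatures import HydroSignatures

# synthetic daily streamflow and precipitation in mm/day
idx = pd.date_range("2010-01-01", "2019-12-31", freq="D")
rng = np.random.default_rng(42)
q = pd.Series(rng.gamma(2.0, 1.5, len(idx)), index=idx)
p = pd.Series(rng.gamma(2.0, 3.0, len(idx)), index=idx)

sig_a = HydroSignatures(q, p)
sig_b = HydroSignatures(q * 0.8, p)

sig_a.to_dict()   # all signatures as a dictionary
sig_a != sig_b    # inequality comparison of the two objects
sig_a - sig_b     # signature-wise differences via the subtraction operator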
Moreover, numba is an optional dependency for the baseflow function. Installing numba will speed up the computation of baseflow significantly. For more efficient handling of NaN values, you can also install numbagg.
You can also try using HydroSignatures without installing it on your system by clicking on the binder badge. A Jupyter Lab instance with the HyRiver stack pre-installed will be launched in your web browser, and you can start coding!
Moreover, requests for additional functionalities can be submitted via issue tracker.
Citation#
If you use any of HyRiver packages in your research, we appreciate citations:
@article{Chegini_2021,
author = {Chegini, Taher and Li, Hong-Yi and Leung, L. Ruby},
doi = {10.21105/joss.03175},
journal = {Journal of Open Source Software},
month = {10},
number = {66},
pages = {1--3},
title = {{HyRiver: Hydroclimate Data Retriever}},
volume = {6},
year = {2021}
}
Installation#
You can install HydroSignatures using pip
:
$ pip install hydrosignatures
or from the conda-forge
repository using Conda
or Mamba:
$ conda install -c conda-forge hydrosignatures
Quick start#
Let’s explore the capabilities of HydroSignatures
by getting streamflow
using PyGeoHydro, basin geometry using PyNHD and precipitation using PyDaymet.
In this example, we select West Branch Herring Run At Idlewylde, MD, as the
watershed of interest and compute the hydrological signatures for the period
from 2010 to 2020.
import pandas as pd
import pydaymet as daymet
import hydrosignatures as hs
import pygeohydro as gh
from hydrosignatures import HydroSignatures
from pygeohydro import NWIS
from pynhd import WaterData
site = "01585200"
start = "2010-01-01"
end = "2020-12-31"
First, we get the basin geometry of the watershed using the gagesii_basins layer of the USGS’s WaterData web service.
wd = WaterData("gagesii_basins")
geometry = wd.byid("gage_id", site).geometry[0]
Then, we obtain the station’s info and streamflow data using NWIS. Note that we should convert the streamflow from cms to mm/day.
nwis = NWIS()
info = nwis.get_info({"site": site})
area_sqm = info.drain_sqkm.values[0] * 1e6
q_cms = nwis.get_streamflow(site, (start, end))
q_mmpd = q_cms * (24.0 * 60.0 * 60.0) / area_sqm * 1e3
q_mmpd.index = pd.to_datetime(q_mmpd.index.date)
Next, we retrieve the precipitation data over the whole basin using PyDaymet and the basin geometry, and take its mean as the basin’s precipitation.
prcp = daymet.get_bygeom(geometry, (start, end), variables="prcp")
p_mmpd = prcp.prcp.mean(dim=["x", "y"]).to_pandas()
p_mmpd.index = pd.to_datetime(p_mmpd.index.date)
q_mmpd = q_mmpd.loc[p_mmpd.index]
Now, we can pass these two to the HydroSignatures
class:
sig = HydroSignatures(q_mmpd, p_mmpd)
The values
property of this class contains the computed signatures. For example,
let’s plot the regime curves:
sig.values.mean_monthly.plot()

Note that you can also use the functions directly. For example, let’s get streamflow observations for another station, separate the baseflow using various filter parameters, and compare them:
import numpy as np
import pandas as pd
q = nwis.get_streamflow("12304500", ("2019-01-01", "2019-12-31"))
alpha = np.arange(0.9, 1, 0.01)
qb = pd.DataFrame({a: hs.baseflow(q.squeeze(), alpha=a) for a in alpha})

Lastly, let’s compute Markham’s seasonality index for all streamflow time series of the stations in the CAMELS dataset. We retrieve the CAMELS dataset using PyGeoHydro:
import xarray as xr
_, camels_qobs = gh.get_camels()
discharge = camels_qobs.discharge.dropna("station_id")
discharge = xr.where(discharge < 0, 0, discharge)
si = hs.seasonality_index_markham(discharge.to_pandas())
More examples can be found here.
AsyncRetriever: Asynchronous requests with persistent caching#
Features#
AsyncRetriever is a part of the HyRiver software stack that is designed to aid in hydroclimate analysis through web services. This package serves as HyRiver’s engine for asynchronously sending requests and retrieving responses as text, binary, or json objects. It uses persistent caching via aiohttp-client-cache to speed up retrieval even further. Moreover, thanks to nest_asyncio, you can use this package in Jupyter notebooks. Although this package is part of the HyRiver software stack, it can be used for any web calls. There are four functions that you can use to make web calls:
retrieve_text: Get responses as text objects.
retrieve_binary: Get responses as binary objects.
retrieve_json: Get responses as json objects.
stream_write: Stream responses and write them to disk in chunks.
You can also use the general-purpose retrieve function to get responses as any of the three types. All responses are returned as a list in the same order as the input list of requests. Moreover, there is another function called delete_url_cache for removing all requests that contain a given URL from a cache file.
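For instance, a hedged sketch of retrieve_text against the USGS site web service (the URL and query parameters are only illustrative):
import async_retriever as ar

# site metadata for one USGS gauge as tab-delimited (rdb) text
urls = ["https://waterservices.usgs.gov/nwis/site/"]
kwds = [{"params": {"sites": "01031500", "format": "rdb"}}]
resp = ar.retrieve_text(urls, kwds)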
You can control the request/response caching behavior and verbosity of the package by setting the following environment variables:
HYRIVER_CACHE_NAME: Path to the caching SQLite database. It defaults to ./cache/aiohttp_cache.sqlite.
HYRIVER_CACHE_EXPIRE: Expiration time for cached requests in seconds. It defaults to one week.
HYRIVER_CACHE_DISABLE: Disable reading/writing from/to the cache. The default is false.
HYRIVER_SSL_CERT: Path to a SSL certificate file.
For example, in your code before making any requests you can do:
import os
os.environ["HYRIVER_CACHE_NAME"] = "path/to/file.sqlite"
os.environ["HYRIVER_CACHE_EXPIRE"] = "3600"
os.environ["HYRIVER_CACHE_DISABLE"] = "true"
os.environ["HYRIVER_SSL_CERT"] = "path/to/cert.pem"
You can find some example notebooks here.
You can also try using AsyncRetriever without installing it on your system by clicking on the binder badge. A Jupyter Lab instance with the HyRiver stack pre-installed will be launched in your web browser, and you can start coding!
Moreover, requests for additional functionalities can be submitted via issue tracker.
Citation#
If you use any of HyRiver packages in your research, we appreciate citations:
@article{Chegini_2021,
author = {Chegini, Taher and Li, Hong-Yi and Leung, L. Ruby},
doi = {10.21105/joss.03175},
journal = {Journal of Open Source Software},
month = {10},
number = {66},
pages = {1--3},
title = {{HyRiver: Hydroclimate Data Retriever}},
volume = {6},
year = {2021}
}
Installation#
You can install async-retriever
using pip
:
$ pip install async-retriever
Alternatively, async-retriever
can be installed from the conda-forge
repository
using Conda:
$ conda install -c conda-forge async-retriever
Quick start#
AsyncRetriever, by default, creates and/or uses ./cache/aiohttp_cache.sqlite as the cache, which you can customize via the cache_name argument. Also, by default, the cache doesn’t have an expiration date; the delete_url_cache function should be used if you know that a database on a server has been updated and you want to retrieve the latest data. Alternatively, you can use the expire_after argument to set an expiration date for the cache.
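For example, a hedged sketch of overriding these defaults per call (assuming cache_name and expire_after are accepted as keyword arguments by the retrieval functions):
import async_retriever as ar

# one-hour expiration and a custom cache file for this request only
url = "https://api.tidesandcurrents.noaa.gov/mdapi/prod/webapi/stations/8410140/harcon.json?units=metric"
resp = ar.retrieve_json([url], cache_name="path/to/cache.sqlite", expire_after=3600)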
As an example of retrieving a binary response, let’s use the DAAC server to get NDVI. The responses can be passed directly to xarray.open_mfdataset to get the data as an xarray Dataset. We can also disable SSL certificate verification by setting ssl=False.
import io
import xarray as xr
import async_retriever as ar
from datetime import datetime
west, south, east, north = (-69.77, 45.07, -69.31, 45.45)
base_url = "https://thredds.daac.ornl.gov/thredds/ncss/ornldaac/1299"
dates_itr = ((datetime(y, 1, 1), datetime(y, 1, 31)) for y in range(2000, 2005))
urls, kwds = zip(
*[
(
f"{base_url}/MCD13.A{s.year}.unaccum.nc4",
{
"params": {
"var": "NDVI",
"north": f"{north}",
"west": f"{west}",
"east": f"{east}",
"south": f"{south}",
"disableProjSubset": "on",
"horizStride": "1",
"time_start": s.strftime("%Y-%m-%dT%H:%M:%SZ"),
"time_end": e.strftime("%Y-%m-%dT%H:%M:%SZ"),
"timeStride": "1",
"addLatLon": "true",
"accept": "netcdf",
}
},
)
for s, e in dates_itr
]
)
resp = ar.retrieve_binary(urls, kwds, max_workers=8, ssl=False)
data = xr.open_mfdataset(io.BytesIO(r) for r in resp)
We can remove these requests and their responses from the cache like so:
ar.delete_url_cache(base_url)

For a json response example, let’s get the water level recordings of NOAA’s water level station 8534720 (Atlantic City, NJ) during 2012, using the CO-OPS API. Note that this CO-OPS product has a 31-day limit for a single request, so we have to break the request down accordingly.
import pandas as pd
station_id = "8534720"
start = pd.to_datetime("2012-01-01")
end = pd.to_datetime("2012-12-31")
s = start
dates = []
for e in pd.date_range(start, end, freq="M"):
dates.append((s.date(), e.date()))
s = e + pd.offsets.MonthBegin()
url = "https://api.tidesandcurrents.noaa.gov/api/prod/datagetter"
urls, kwds = zip(
*[
(
url,
{
"params": {
"product": "water_level",
"application": "web_services",
"begin_date": f'{s.strftime("%Y%m%d")}',
"end_date": f'{e.strftime("%Y%m%d")}',
"datum": "MSL",
"station": f"{station_id}",
"time_zone": "GMT",
"units": "metric",
"format": "json",
}
},
)
for s, e in dates
]
)
resp = ar.retrieve_json(urls, kwds)
wl_list = []
for rjson in resp:
wl = pd.DataFrame.from_dict(rjson["data"])
wl["t"] = pd.to_datetime(wl.t)
wl = wl.set_index(wl.t).drop(columns="t")
wl["v"] = pd.to_numeric(wl.v, errors="coerce")
wl_list.append(wl)
water_level = pd.concat(wl_list).sort_index()
water_level.attrs = rjson["metadata"]

Now, let’s see an example without any payload or headers. Here’s how we can retrieve harmonic constituents of several NOAA stations from CO-OPS:
stations = [
"8410140",
"8411060",
"8413320",
"8418150",
"8419317",
"8419870",
"8443970",
"8447386",
]
base_url = "https://api.tidesandcurrents.noaa.gov/mdapi/prod/webapi/stations"
urls = [f"{base_url}/{i}/harcon.json?units=metric" for i in stations]
resp = ar.retrieve_json(urls)
amp_list = []
phs_list = []
for rjson in resp:
sid = rjson["self"].rsplit("/", 2)[1]
const = pd.DataFrame.from_dict(rjson["HarmonicConstituents"]).set_index("name")
amp = const.rename(columns={"amplitude": sid})[sid]
phase = const.rename(columns={"phase_GMT": sid})[sid]
amp_list.append(amp)
phs_list.append(phase)
amp = pd.concat(amp_list, axis=1)
phs = pd.concat(phs_list, axis=1)

PyGeoOGC: Retrieve Data from RESTful, WMS, and WFS Services#
Features#
PyGeoOGC is a part of the HyRiver software stack that is designed to aid in hydroclimate analysis through web services. This package provides general interfaces to web services that are based on ArcGIS RESTful, WMS, and WFS. Although all these web services have limits on the number of features per request (e.g., 1000 object IDs for a RESTful request or 8 million pixels for a WMS request), PyGeoOGC first divides large requests into smaller chunks and then returns the merged results.
Moreover, under the hood, PyGeoOGC uses AsyncRetriever for making requests asynchronously with persistent caching. This improves the reliability and speed of data retrieval significantly. AsyncRetriever caches all request/response pairs and upon making an already cached request, it will retrieve the responses from the cache if the server’s response is unchanged.
You can control the request/response caching behavior and verbosity of the package by setting the following environment variables:
HYRIVER_CACHE_NAME: Path to the caching SQLite database for asynchronous HTTP requests. It defaults to ./cache/aiohttp_cache.sqlite.
HYRIVER_CACHE_NAME_HTTP: Path to the caching SQLite database for HTTP requests. It defaults to ./cache/http_cache.sqlite.
HYRIVER_CACHE_EXPIRE: Expiration time for cached requests in seconds. It defaults to one week.
HYRIVER_CACHE_DISABLE: Disable reading/writing from/to the cache. The default is false.
HYRIVER_SSL_CERT: Path to a SSL certificate file.
For example, in your code before making any requests you can do:
import os
os.environ["HYRIVER_CACHE_NAME"] = "path/to/aiohttp_cache.sqlite"
os.environ["HYRIVER_CACHE_NAME_HTTP"] = "path/to/http_cache.sqlite"
os.environ["HYRIVER_CACHE_EXPIRE"] = "3600"
os.environ["HYRIVER_CACHE_DISABLE"] = "true"
os.environ["HYRIVER_SSL_CERT"] = "path/to/cert.pem"
There is also an inventory of URLs for some of these web services in the form of a class called ServiceURL. These URLs are in four categories: ServiceURL().restful, ServiceURL().wms, ServiceURL().wfs, and ServiceURL().http. These URLs provide you with some examples of the services that PyGeoOGC supports. If you have success using PyGeoOGC with a web service, please consider submitting a request to have it added to this URL inventory. You can get all the URLs in the ServiceURL class by printing it: print(ServiceURL()).
PyGeoOGC has three main classes:
ArcGISRESTful: This class can be instantiated by providing the target layer URL. For example, for getting Watershed Boundary Data we can use ServiceURL().restful.wbd. By looking at the web service’s website we see that there are nine layers; for example, 1 for 2-digit HU (Region), 6 for 12-digit HU (Subwatershed), and so on. We can pass the URL to the target layer directly, like this: f"{ServiceURL().restful.wbd}/6", or as a separate argument via layer. Afterward, we request the data in two steps. First, we get the target object IDs using the oids_bygeom (within a geometry), oids_byfield (specific field IDs), or oids_bysql (any valid SQL 92 WHERE clause) class methods. Then, we get the target features using the get_features class method. The returned response can be converted into a geopandas.GeoDataFrame using the json2geodf function from PyGeoUtils.
WMS: Instantiation of this class requires at least three arguments: the service URL, layer name(s), and output format. Additionally, the target CRS and the web service version can be provided. Upon instantiation, we can use the getmap_bybox class method to get the target raster data within a bounding box. The box can be in any valid CRS and, if it is different from the default CRS (EPSG:4326), it should be passed using the box_crs argument. The service response can be converted into an xarray.Dataset using the gtiff2xarray function from PyGeoUtils.
WFS: Instantiation of this class is similar to WMS. The only difference is that only one layer name can be passed. Upon instantiation, there are three ways to get the data:
getfeature_bybox: Get all the target features within a bounding box in any valid CRS.
getfeature_byid: Get all the target features based on their IDs. Note that two arguments should be provided: featurename and featureids. You can get a list of valid feature names using the get_validnames class method.
getfeature_byfilter: Get the data based on any valid CQL filter.
You can convert the returned response of this function to a GeoDataFrame using the json2geodf function from the PyGeoUtils package.
PyGeoOGC also includes several utilities:
streaming_download: For downloading large files in parallel and in chunks, efficiently.
traverse_json: For traversing a nested JSON object.
match_crs: For reprojecting a geometry or bounding box to any valid CRS.
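For instance, a hedged sketch of match_crs (assuming it is exposed via pygeoogc.utils and takes the geometry followed by the input and output CRS):
from pygeoogc import utils

# reproject a (west, south, east, north) bounding box from EPSG:4326 to EPSG:5070
bbox = (-69.77, 45.07, -69.31, 45.45)
bbox_5070 = utils.match_crs(bbox, 4326, 5070)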
You can find some example notebooks here.
Furthermore, you can also try using PyGeoOGC without installing it on your system by clicking on the binder badge. A Jupyter Lab instance with the HyRiver stack pre-installed will be launched in your web browser, and you can start coding!
Moreover, requests for additional functionalities can be submitted via issue tracker.
Citation#
If you use any of HyRiver packages in your research, we appreciate citations:
@article{Chegini_2021,
author = {Chegini, Taher and Li, Hong-Yi and Leung, L. Ruby},
doi = {10.21105/joss.03175},
journal = {Journal of Open Source Software},
month = {10},
number = {66},
pages = {1--3},
title = {{HyRiver: Hydroclimate Data Retriever}},
volume = {6},
year = {2021}
}
Installation#
You can install PyGeoOGC using pip
:
$ pip install pygeoogc
Alternatively, PyGeoOGC can be installed from the conda-forge
repository
using Conda
or Mamba:
$ conda install -c conda-forge pygeoogc
Quick start#
We can access NHDPlus HR via the RESTful service, the National Wetlands Inventory via WMS, and the FEMA National Flood Hazard Layer via WFS. The outputs of these functions are of type requests.Response, which can be converted to GeoDataFrame or xarray.Dataset using PyGeoUtils.
Let’s start with the National Map’s NHDPlus HR web service. We can query the flowlines that are within a geometry as follows:
from pygeoogc import ArcGISRESTful, WFS, WMS, ServiceURL
import pygeoutils as geoutils
from pynhd import NLDI
basin_geom = NLDI().get_basins("01031500").geometry[0]
hr = ArcGISRESTful(ServiceURL().restful.nhdplushr, 2, outformat="json")
resp = hr.get_features(hr.oids_bygeom(basin_geom, 4326))
flowlines = geoutils.json2geodf(resp)
Note that oids_bygeom has three additional arguments: sql_clause, spatial_relation, and distance. We can use sql_clause for passing any valid SQL WHERE clause and spatial_relation for specifying the target predicate, such as intersect, contain, cross, etc. The default predicate is intersect (esriSpatialRelIntersects). Additionally, we can use distance to specify a buffer distance from the input geometry for getting features.
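As a hedged sketch of the distance argument (the 100-m buffer here is only illustrative):
# flowlines within a 100-m buffer of the basin boundary
oids = hr.oids_bygeom(basin_geom, geo_crs=4326, distance=100)
resp = hr.get_features(oids)
flowlines_buff = geoutils.json2geodf(resp)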
We can also submit a query based on IDs of any valid field in the database. If the measure property is desired, you can pass return_m as True to the get_features class method:
oids = hr.oids_byfield("PERMANENT_IDENTIFIER", ["103455178", "103454362", "103453218"])
resp = hr.get_features(oids, return_m=True)
flowlines = geoutils.json2geodf(resp)
Additionally, any valid SQL 92 WHERE clause can be used. For more details look here. For example, let’s limit our first request to only include catchments with areas larger than 0.5 sqkm.
oids = hr.oids_bygeom(basin_geom, geo_crs=4326, sql_clause="AREASQKM > 0.5")
resp = hr.get_features(oids)
catchments = geoutils.json2geodf(resp)
A WMS-based example is shown below:
wms = WMS(
ServiceURL().wms.fws,
layers="0",
outformat="image/tiff",
crs=3857,
)
r_dict = wms.getmap_bybox(
basin_geom.bounds,
1e3,
box_crs=4326,
)
wetlands = geoutils.gtiff2xarray(r_dict, basin_geom, 4326)
Queries to a WFS-based web service can be made either within a bounding box or using any valid CQL filter.
wfs = WFS(
ServiceURL().wfs.fema,
layer="public_NFHL:Base_Flood_Elevations",
outformat="esrigeojson",
crs=4269,
)
r = wfs.getfeature_bybox(basin_geom.bounds, box_crs=4326)
flood = geoutils.json2geodf(r.json(), 4269, 4326)
layer = "wmadata:huc08"
wfs = WFS(
ServiceURL().wfs.waterdata,
layer=layer,
outformat="application/json",
version="2.0.0",
crs=4269,
)
r = wfs.getfeature_byfilter("huc8 LIKE '13030%'")
huc8 = geoutils.json2geodf(r.json(), 4269, 4326)

PyGeoUtils: Utilities for (Geo)JSON and (Geo)TIFF Conversion#
Features#
PyGeoUtils is a part of the HyRiver software stack that is designed to aid in hydroclimate analysis through web services. This package provides utilities for manipulating (Geo)JSON and (Geo)TIFF responses from web services. These utilities are:
Coordinates: Generate validated and normalized coordinates in WGS84.
GeoBSpline: Create a B-spline from a geopandas.GeoDataFrame of points.
smooth_linestring: Smooth a shapely.geometry.LineString using a B-spline.
bspline_curvature: Compute tangent angles, curvature, and radius of curvature of a B-spline at any point along the curve.
arcgis2geojson: Convert the ESRI GeoJSON format to GeoJSON.
break_lines: Break lines at specified points in a given direction.
gtiff2xarray: Convert (Geo)TIFF byte responses to xarray.Dataset.
json2geodf: Create a geopandas.GeoDataFrame from (Geo)JSON responses.
snap2nearest: Find the nearest points on a line to a set of points.
xarray2geodf: Vectorize an xarray.DataArray to a geopandas.GeoDataFrame.
geodf2xarray: Rasterize a geopandas.GeoDataFrame to an xarray.DataArray.
xarray_geomask: Mask an xarray.Dataset based on a geometry.
query_indices: A wrapper around geopandas.sindex.query_bulk. However, instead of returning an array of positional indices, it returns a dictionary of indices where keys are the indices of the input geometries and values are lists of indices of the tree geometries that intersect each input geometry.
nested_polygons: Determine nested (multi)polygons in a geopandas.GeoDataFrame.
multi2poly: Convert MultiPolygon geometries to Polygon in a geopandas.GeoDataFrame.
geometry_reproject: Reproject a geometry (bounding box, list of coordinates, or any shapely.geometry) to a new CRS.
gtiff2vrt: Convert a list of GeoTIFF files to a VRT file.
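For instance, a hedged sketch of Coordinates (assuming it accepts lists of longitudes and latitudes and exposes the validated points through a points property):
import pygeoutils as pgu

# validated and normalized WGS84 coordinates (points is an assumed property)
coords = pgu.Coordinates([-118.72, 270.0], [34.118, 91.0])
coords.points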
You can find some example notebooks here.
You can also try using PyGeoUtils without installing it on your system by clicking on the binder badge. A Jupyter Lab instance with the HyRiver stack pre-installed will be launched in your web browser, and you can start coding!
Moreover, requests for additional functionalities can be submitted via issue tracker.
Citation#
If you use any of HyRiver packages in your research, we appreciate citations:
@article{Chegini_2021,
author = {Chegini, Taher and Li, Hong-Yi and Leung, L. Ruby},
doi = {10.21105/joss.03175},
journal = {Journal of Open Source Software},
month = {10},
number = {66},
pages = {1--3},
title = {{HyRiver: Hydroclimate Data Retriever}},
volume = {6},
year = {2021}
}
Installation#
You can install PyGeoUtils using pip after installing libgdal on your system (for example, on Ubuntu run sudo apt install libgdal-dev):
$ pip install pygeoutils
Alternatively, PyGeoUtils can be installed from the conda-forge
repository
using Conda:
$ conda install -c conda-forge pygeoutils
Quick start#
We start by smoothing a shapely.geometry.LineString using a B-spline:
import pygeoutils as pgu
from shapely import LineString
line = LineString(
[
(-97.06138, 32.837),
(-97.06133, 32.836),
(-97.06124, 32.834),
(-97.06127, 32.832),
]
)
line = pgu.geometry_reproject(line, 4326, 5070)
sp = pgu.smooth_linestring(line, 5070, 5)
line_sp = pgu.geometry_reproject(sp.line, 5070, 4326)
Next, we use PyGeoOGC to access the NLCD 2011 tree canopy layer from the MRLC WMS service and the FEMA National Flood Hazard Layer via WFS, then convert the outputs to xarray.Dataset and GeoDataFrame, respectively.
from pygeoogc import WFS, WMS, ServiceURL
from shapely.geometry import Polygon
geometry = Polygon(
[
[-118.72, 34.118],
[-118.31, 34.118],
[-118.31, 34.518],
[-118.72, 34.518],
[-118.72, 34.118],
]
)
crs = 4326
wms = WMS(
ServiceURL().wms.mrlc,
layers="NLCD_2011_Tree_Canopy_L48",
outformat="image/geotiff",
crs=crs,
)
r_dict = wms.getmap_bybox(
geometry.bounds,
1e3,
box_crs=crs,
)
canopy = pgu.gtiff2xarray(r_dict, geometry, crs)
mask = canopy > 60
canopy_gdf = pgu.xarray2geodf(canopy, "float32", mask)
url_wfs = "https://hazards.fema.gov/gis/nfhl/services/public/NFHL/MapServer/WFSServer"
wfs = WFS(
url_wfs,
layer="public_NFHL:Base_Flood_Elevations",
outformat="esrigeojson",
crs=4269,
)
r = wfs.getfeature_bybox(geometry.bounds, box_crs=crs)
flood = pgu.json2geodf(r.json(), 4269, crs)
Example Gallery#
The following notebooks demonstrate capabilities of the HyRiver software stack.
This page was generated from 3dep.ipynb.
Topographic Data#
[1]:
from pathlib import Path
import matplotlib.pyplot as plt
import networkx as nx
import numpy as np
import osmnx as ox
import osmnx.plot as ox_plot
import xarray as xr
import xrspatial
import py3dep
import pydaymet as daymet
from pynhd import NLDI
Py3DEP provides access to the 3DEP database, which is a part of the National Map services. The 3DEP service has multi-resolution sources and, depending on the user-provided resolution, the data are resampled on the server side from all the available data sources.
The following functionalities are currently available:
get_map: Get topographic data from the dynamic 3DEP service, which supports the following layers:
  DEM
  Hillshade Gray
  Aspect Degrees
  Aspect Map
  GreyHillshade Elevation Fill
  Hillshade Multidirectional
  Slope Degrees
  Slope Map
  Hillshade Elevation Tinted
  Height Ellipsoidal
  Contour 25
  Contour Smoothed 25
get_dem: Get DEM data from either the dynamic or the static 3DEP service. Since the static service is much faster, it is used when the target DEM resolution is 10 m, 30 m, or 60 m; otherwise, the dynamic service is used.
static_3dep_dem: Get DEM data at 10 m, 30 m, or 60 m resolution from the staged 3DEP data. Since this function only returns DEM, for computing other terrain attributes you can use xarray-spatial. Just note that you should reproject the output DataArray to a projected CRS, such as 5070, before passing it to xarray-spatial, like so: dem = dem.rio.reproject(5070).
elevation_bygrid: For retrieving elevations of all the grid points in a 2D grid.
elevation_bycoords: For retrieving elevations of a list of x and y coordinates.
elevation_profile: For retrieving an elevation profile along a line at a given spacing. This function converts the line to a B-spline and then calculates the elevation along the spline at a given uniform spacing.
deg2mpm: For converting a slope dataset from degrees to meters per meter.
query_3dep_sources: For querying the bounds of 3DEP's data sources within a bounding box.
check_3dep_availability: For querying 3DEP's resolution availability within a bounding box (see the sketch after this list).
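A hedged sketch of the two availability helpers (the bounding box is illustrative, and the res argument of query_3dep_sources is an assumption):
import py3dep

bbox = (-69.77, 45.07, -69.31, 45.45)
# availability of the various 3DEP resolutions within the bounding box
print(py3dep.check_3dep_availability(bbox))
# footprints of the 1-m source data as a GeoDataFrame (res is assumed)
sources = py3dep.query_3dep_sources(bbox, res="1m")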
Let’s get a watershed geometry using NLDI and then get DEM and slope at 90 m resolution.
[2]:
geometry = NLDI().get_basins("11092450").geometry[0]
We can either directly use get_map
to get DEM and slope or use get_dem
to get DEM and then use xarray-spatial
to compute slope.
[3]:
topo = py3dep.get_map(["DEM", "Slope Degrees"], geometry, 90, geo_crs=4326, crs=5070)
[4]:
dem = py3dep.get_dem(geometry, 30)
dem = dem.rio.reproject(5070)
slope = py3dep.deg2mpm(xrspatial.slope(dem))
topo = xr.merge([dem, slope])
We can save the DEM DataArray
as a raster file using rioxarray
:
[5]:
dem.rio.to_raster(Path("input_data", "dem_11092450.tif"))
[6]:
fig, axs = plt.subplots(ncols=2, figsize=(13, 4))
dem.plot(ax=axs[0])
slope.plot(ax=axs[1])
for ax in axs:
ax.set_title("")
ax.set_axis_off()
fig.savefig("_static/dem_slope.png", bbox_inches="tight", facecolor="w")

We can also get the elevations of a list of coordinates using the py3dep.elevation_bycoords function. This function is particularly useful for getting the elevations of the nodes of a network, for example, a river or a street network. Let’s use the osmnx package to get a street network:
[7]:
G = ox.graph_from_place("Piedmont, California, USA", network_type="drive")
Now, we can get the elevations for each node based on their coordinates and then plot the results.
[8]:
x, y = nx.get_node_attributes(G, "x").values(), nx.get_node_attributes(G, "y").values()
elevation = py3dep.elevation_bycoords(list(zip(x, y)), crs="epsg:4326", source="airmap")
nx.set_node_attributes(G, dict(zip(G.nodes(), elevation)), "elevation")
[9]:
nc = ox_plot.get_node_colors_by_attr(G, "elevation", cmap="terrain")
fig, ax = ox.plot_graph(
G,
node_color=nc,
node_size=10,
save=True,
bgcolor="w",
filepath="_static/street_elev.png",
dpi=100,
)

Note that this function gets the elevation data from the elevation map of the bounding box of all the coordinates. So, the larger the extent of this bounding box, the longer it takes the function to get the data.
Additionally, we can get the elevations of a set of x and y coordinates of a grid. For example, let’s get the minimum temperature data within the watershed from Daymet using PyDaymet, then add the elevation as a new variable to the dataset:
[10]:
clm = daymet.get_bygeom(geometry, ("2005-01-01", "2005-01-31"), variables="tmin")
elev = py3dep.elevation_bygrid(clm.x.values, clm.y.values, clm.rio.crs, 1000)
attrs = clm.attrs
clm = xr.merge([clm, elev])
clm["elevation"] = clm.elevation.where(~np.isnan(clm.isel(time=0).tmin))
clm.attrs.update(attrs)
[11]:
fig, (ax1, ax2) = plt.subplots(nrows=1, ncols=2, figsize=(15, 5))
clm.tmin.isel(time=10).plot(ax=ax1)
clm.elevation.plot(ax=ax2)
[11]:
<matplotlib.collections.QuadMesh at 0x1b2e299d0>

This page was generated from 3dhp.ipynb.
USGS 3DHP#
[1]:
import shapely
from pynhd import HP3D, NLDI
The USGS 3DHP web service has five layers: hydrolocation, flowline, waterbody, drainage_area, and catchment. Let’s start by getting the closest flowline within a 10-m radius of a point.
[2]:
nhd3d = HP3D("flowline")
point = shapely.Point(-89.441, 43.487)
flw = nhd3d.bygeom(point, distance=10)
flw.explore()
[2]:
Next, we use NLDI to get the basin boundary of an NHD flowline and use it to query different layers of the 3DHP web service.
[3]:
comid = "937070225"
basin = NLDI().get_basins(comid, "comid")
basin.explore()
[3]:
[4]:
nhd3d = HP3D("flowline")
network = nhd3d.bygeom(basin.geometry[0])
network.explore()
[4]:
[5]:
nhd3d = HP3D("waterbody")
water = nhd3d.bygeom(basin.geometry[0])
water.explore()
[5]: