Hydroclimate Data Retriever#
HyRiver (formerly named hydrodata) is a suite of Python packages that provides a unified API for retrieving geospatial/temporal data from various web services. HyRiver includes two categories of packages:
Low-level APIs for accessing any of the supported web services, i.e., ArcGIS RESTful, WMS, and WFS.
High-level APIs for accessing some of the most commonly used datasets in hydrology and climatology studies. Currently, this project only includes hydrology and climatology data within the US.
You can watch these videos for a quick overview of HyRiver:
Getting Started#
Why HyRiver?#
Some of the major capabilities of HyRiver are as follows:
Easy access to many web services for subsetting data on the server side and returning the results as masked Datasets or GeoDataFrames.
Splitting large requests into smaller chunks, under the hood, since web services often limit the number of features per request. So the only bottleneck for subsetting the data is your local machine’s memory.
Navigating and subsetting NHDPlus database (both medium- and high-resolution) using web services.
Cleaning up the vector NHDPlus data, fixing some common issues, and computing vector-based accumulation through a river network.
A URL inventory for some of the popular (and tested) web services.
Some utilities for manipulating the obtained data and their visualization.
Installation#
You can install all the packages using pip:
$ pip install py3dep pynhd pygeohydro pydaymet pygeoogc pygeoutils async_retriever
Please note that installation with pip fails if libgdal is not installed on your system. You should install this package manually beforehand. For example, on Ubuntu-based distros the required package is libgdal-dev. If this package is installed on your system, you should be able to run gdal-config --version successfully.
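For example, on an Ubuntu-based distro, installing the GDAL development package and verifying it looks like this (package names differ on other distros):
$ sudo apt install libgdal-dev
$ gdal-config --version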
Alternatively, you can use conda or mamba (recommended) to install these packages from the conda-forge repository:
$ conda install -c conda-forge py3dep pynhd pygeohydro pydaymet pygeoogc pygeoutils async_retriever
or:
$ mamba install -c conda-forge --strict-channel-priority py3dep pynhd pygeohydro pydaymet pygeoogc pygeoutils async_retriever
Dependencies#
Dependencies of each package in the stack are as follows:
PyNHD: async_retriever, cytoolz, geopandas, networkx, numpy, pandas, pyarrow, pygeoogc, pygeoutils, requests, shapely, simplejson
PyGeoHydro: async_retriever, defusedxml, folium, geopandas, lxml, matplotlib, numpy, openpyxl, pandas, pygeoogc, pygeoutils, pynhd, rasterio, shapely
Py3DEP: async_retriever, click, cytoolz, numpy, pydantic, pygeoogc, pygeoutils, rasterio, scipy, shapely, xarray
PyDaymet: async_retriever, click, dask, lxml, numpy, pandas, py3dep, pygeoogc, pygeoutils, rasterio, scipy, shapely, xarray
PyGeoOGC: async_retriever, cytoolz, defusedxml, owslib, pydantic, pyproj, pyyaml, requests, shapely, simplejson, urllib3
PyGeoUtils: affine, dask, geopandas, netcdf4, numpy, orjson, pygeoogc, pyproj, rasterio, shapely, xarray
AsyncRetriever: aiohttp-client-cache, aiohttp[speedups], aiosqlite, cytoolz, nest-asyncio, orjson
Additionally, you can install bottleneck, pygeos, and rtree to improve the performance of xarray and geopandas. For handling vector and raster data projections, cartopy and rioxarray are useful.
Software Stack#
A detailed description of each component of the HyRiver software stack.
PyNHD: Navigate and subset NHDPlus database#
Features#
PyNHD is a part of HyRiver software stack that is designed to aid in hydroclimate analysis through web services.
This package provides access to several hydro-linked datasets, including WaterData, the National Map’s NHDPlus (MR and HR), NLDI, PyGeoAPI, and GeoConnex.
These web services can be used to navigate and extract vector data from the NHDPlus database (both medium- and high-resolution) such as catchments, HUC8, HUC12, GagesII, flowlines, and water bodies. Moreover, PyNHD gives access to an item on ScienceBase called Select Attributes for NHDPlus Version 2.1 Reach Catchments and Modified Network Routed Upstream Watersheds for the Conterminous United States. This item provides over 30 attributes at catchment scale based on NHDPlus ComIDs. These attributes are available in three categories:
Local (local): For individual reach catchments,
Total (upstream_acc): For network-accumulated values using total cumulative drainage area,
Divergence (div_routing): For network-accumulated values using divergence routing.
A list of these attributes for each characteristic type can be accessed using the nhdplus_attrs function.
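For example, a minimal sketch of listing the available attributes (assuming that calling nhdplus_attrs with no arguments returns the metadata table; check the API reference for the exact return value) is:
import pynhd

# Metadata table of the available catchment-scale NHDPlus attributes
char_meta = pynhd.nhdplus_attrs()
print(char_meta.head())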
Moreover, the PyGeoAPI service provides four functionalities:
flow_trace: Trace flow from a starting point in the upstream or downstream direction.
split_catchment: Split the local catchment of a point of interest at the point’s location.
elevation_profile: Extract the elevation profile along a flow path between two points.
cross_section: Extract a cross-section at a point of interest along a flow line.
Similarly, PyNHD provides access to ComID-linked NHDPlus Value Added Attributes on Hydroshare. This dataset includes slope and roughness, among other attributes, for all the flowlines. You can use the nhdplus_vaa function to get this dataset.
Additionally, PyNHD offers some extra utilities for processing the flowlines:
flowline_xsection and network_xsection: Get cross-section lines along a single flowline or a network of flowlines at a given spacing.
flowline_resample and network_resample: Resample a flowline or a network of flowlines based on a given spacing. This is useful for smoothing jagged flowlines similar to those in the NHDPlus database.
prepare_nhdplus: Clean up the data frame by, for example, removing tiny networks, adding a to_comid column, and finding terminal flowlines if they don’t exist.
topological_sort: Sort the river network topologically, which is useful for routing and flow accumulation.
vector_accumulation: Compute flow accumulation in a river network. This function is generic, and any routing method can be plugged in.
These utilities are developed based on an R package called nhdplusTools and a Python package called nldi-xstool.
All functions and classes that request data from web services use async_retriever, which offers response caching. By default, the expiration time is set to never expire. All these functions and classes have two optional parameters for controlling the cache: expire_after and disable_caching. You can use expire_after to set the expiration time in seconds; if expire_after is set to -1, the cache never expires (default). You can use disable_caching if you don’t want to use the cached responses. The cached responses are stored in the ./cache/aiohttp_cache.sqlite file.
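As a minimal sketch, and assuming the class you are working with forwards these keywords (check each function’s signature in the API reference), per-request cache control could look like this:
from pynhd import WaterData

# Assumption: WaterData accepts the cache-control keywords described above
wd = WaterData("nhdflowline_network", expire_after=3600)  # cache responses for one hour
wd_nocache = WaterData("nhdflowline_network", disable_caching=True)  # bypass the cache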
You can find some example notebooks here.
Moreover, under the hood, PyNHD uses AsyncRetriever for making requests asynchronously with persistent caching. This improves the reliability and speed of data retrieval significantly. AsyncRetriever caches all request/response pairs and upon making an already cached request, it will retrieve the responses from the cache if the server’s response is unchanged.
You can control the request/response caching behavior by setting the following environment variables:
HYRIVER_CACHE_NAME: Path to the caching SQLite database. It defaults to ./cache/aiohttp_cache.sqlite.
HYRIVER_CACHE_EXPIRE: Expiration time for cached requests in seconds. It defaults to -1 (never expire).
HYRIVER_CACHE_DISABLE: Disable reading/writing from/to the cache. The default is false.
For example, in your code before making any requests you can do:
import os
os.environ["HYRIVER_CACHE_NAME"] = "path/to/file.sqlite"
os.environ["HYRIVER_CACHE_EXPIRE"] = "3600"
os.environ["HYRIVER_CACHE_DISABLE"] = "true"
You can also try using PyNHD without installing it on your system by clicking on the binder badge. A Jupyter Lab instance with the HyRiver stack pre-installed will be launched in your web browser, and you can start coding!
Please note that since this project is in the early development stages, while the provided functionalities should be stable, changes in APIs are possible in new releases. But we appreciate it if you give this project a try and provide feedback. Contributions are most welcome.
Moreover, requests for additional functionalities can be submitted via issue tracker.
Installation#
You can install PyNHD using pip after installing libgdal on your system (for example, in Ubuntu run sudo apt install libgdal-dev):
$ pip install pynhd
Alternatively, PyNHD can be installed from the conda-forge repository using Conda or Mamba:
$ conda install -c conda-forge pynhd
Quick start#
Let’s explore the capabilities of NLDI. We need to instantiate the class first:
from pynhd import NLDI, WaterData, NHDPlusHR
import pynhd as nhd
First, let’s get the watershed geometry of the contributing basin of a USGS station using NLDI:
nldi = NLDI()
station_id = "01031500"
basin = nldi.get_basins(station_id)
The navigate_byid class method can be used to navigate NHDPlus both upstream and downstream of any point in the database. Let’s get the ComIDs and flowlines of the tributaries and the main river channel upstream of the station.
flw_main = nldi.navigate_byid(
fsource="nwissite",
fid=f"USGS-{station_id}",
navigation="upstreamMain",
source="flowlines",
distance=1000,
)
flw_trib = nldi.navigate_byid(
fsource="nwissite",
fid=f"USGS-{station_id}",
navigation="upstreamTributaries",
source="flowlines",
distance=1000,
)
We can get other USGS stations upstream (or downstream) of the station and even set a distance limit (in km):
st_all = nldi.navigate_byid(
fsource="nwissite",
fid=f"USGS-{station_id}",
navigation="upstreamTributaries",
source="nwissite",
distance=1000,
)
st_d20 = nldi.navigate_byid(
fsource="nwissite",
fid=f"USGS-{station_id}",
navigation="upstreamTributaries",
source="nwissite",
distance=20,
)
We can get more information about these stations using GeoConnex:
import geopandas as gpd
import pandas as pd
from pynhd import GeoConnex

gcx = GeoConnex("gages")
stations = st_all.identifier.str.split("-").str[1].unique()
gages = gpd.GeoDataFrame(
    pd.concat(gcx.query({"provider_id": sid}) for sid in stations),
    crs="epsg:4326",
)
Instead, we can carry out a spatial query within the basin of interest:
import pynhd

gages = pynhd.geoconnex(
    item="gages",
    query={"geometry": basin.geometry.iloc[0]},
)
Now, let’s get the HUC12 pour points:
pp = nldi.navigate_byid(
fsource="nwissite",
fid=f"USGS-{station_id}",
navigation="upstreamTributaries",
source="huc12pp",
distance=1000,
)

Also, we can get the slope data for each river segment from the NHDPlus VAA database:
import numpy as np

vaa = nhd.nhdplus_vaa("input_data/nhdplus_vaa.parquet")
flw_trib["comid"] = pd.to_numeric(flw_trib.nhdplus_comid)
slope = gpd.GeoDataFrame(
    pd.merge(flw_trib, vaa[["comid", "slope"]], left_on="comid", right_on="comid"),
    crs=flw_trib.crs,
)
slope[slope.slope < 0] = np.nan
Additionally, we can obtain cross-section lines along the main river channel with 4 km spacing and a width of 2 km using network_xsection as follows:
import pynhd
from pynhd import NHD

distance = 4000  # in meters
width = 2000  # in meters
nhd_mr = NHD("flowline_mr")  # NHD web service instance for mid-resolution flowlines
main_nhd = nhd_mr.byids("COMID", flw_main.index)
main_nhd = pynhd.prepare_nhdplus(main_nhd, 0, 0, 0, purge_non_dendritic=True)
main_nhd = main_nhd.to_crs("ESRI:102003")
cs = pynhd.network_xsection(main_nhd, distance, width)
Then, we can use Py3DEP to obtain the elevation profile along the cross-section lines.
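A rough sketch of that step, assuming the cross-section lines are converted back to EPSG:4326 and using the elevation_profile function described in the Py3DEP section below, could be:
import py3dep

# Elevation profile at 10-m spacing along each cross-section line (hypothetical follow-up)
cs_4326 = cs.to_crs("epsg:4326")
xs_profiles = {i: py3dep.elevation_profile(line, 10) for i, line in cs_4326.geometry.items()}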
Now, let’s explore the PyGeoAPI capabilities. There are two ways to access PyGeoAPI: the PyGeoAPI class and the pygeoapi function. The PyGeoAPI class is for querying the database for a single location using tuples and lists, while the pygeoapi function is for querying the database for multiple locations at once and accepts a geopandas.GeoDataFrame as input. The pygeoapi function is more efficient than the PyGeoAPI class and has a simpler interface. In future versions, the PyGeoAPI class will be deprecated and the pygeoapi function will be the only way to access the database. Let’s compare the two, starting with PyGeoAPI:
from pynhd import PyGeoAPI

pygeoapi = PyGeoAPI()
trace = pygeoapi.flow_trace(
    (1774209.63, 856381.68), crs="ESRI:102003", direction="none"
)
split = pygeoapi.split_catchment((-73.82705, 43.29139), crs="epsg:4326", upstream=False)
profile = pygeoapi.elevation_profile(
    [(-103.801086, 40.26772), (-103.80097, 40.270568)],
    numpts=101,
    dem_res=1,
    crs="epsg:4326",
)
section = pygeoapi.cross_section(
    (-103.80119, 40.2684), width=1000.0, numpts=101, crs="epsg:4326"
)
Now, let’s do the same operations using the pygeoapi function:
import geopandas as gpd
import shapely.geometry as sgeom
import pynhd as nhd
coords = gpd.GeoDataFrame(
{
"direction": ["up", "down"],
"upstream": [True, False],
"width": [1000.0, 500.0],
"numpts": [101, 55],
},
geometry=[
sgeom.Point(-73.82705, 43.29139),
sgeom.Point(-103.801086, 40.26772),
],
crs="epsg:4326",
)
trace = nhd.pygeoapi(coords, "flow_trace")
split = nhd.pygeoapi(coords, "split_catchment")
section = nhd.pygeoapi(coords, "cross_section")
coords = gpd.GeoDataFrame(
{
"direction": ["up", "down"],
"upstream": [True, False],
"width": [1000.0, 500.0],
"numpts": [101, 55],
"dem_res": [1, 10],
},
geometry=[
sgeom.MultiPoint([(-103.801086, 40.26772), (-103.80097, 40.270568)]),
sgeom.MultiPoint([(-102.801086, 39.26772), (-102.80097, 39.270568)]),
],
crs="epsg:4326",
)
profile = nhd.pygeoapi(coords, "elevation_profile")

Next, we retrieve mid- and high-resolution flowlines within the bounding box of our watershed and compare them, using WaterData for the mid-resolution and NHDPlusHR for the high-resolution flowlines.
mr = WaterData("nhdflowline_network")
nhdp_mr = mr.bybox(basin.geometry[0].bounds)
hr = NHDPlusHR("flowline")
nhdp_hr = hr.bygeom(basin.geometry[0].bounds)

An alternative to WaterData and NHDPlusHR is the NHD class that supports both the mid- and high-resolution NHDPlus V2 data:
mr = NHD("flowline_mr")
nhdp_mr = mr.bygeom(basin.geometry[0].bounds)
hr = NHD("flowline_hr")
nhdp_hr = hr.bygeom(basin.geometry[0].bounds)
Moreover, WaterData can find features within a given radius (in meters) of a point:
eck4 = "+proj=eck4 +lon_0=0 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs"
coords = (-5727797.427596455, 5584066.49330473)
rad = 5e3
flw_rad = mr.bydistance(coords, rad, loc_crs=eck4)
flw_rad = flw_rad.to_crs(eck4)
Instead of getting all features within a radius of the coordinate, we can snap to the closest feature ID using NLDI:
comid_closest = nldi.comid_byloc(coords, eck4)
flw_closest = nhdp_mr.byid("comid", comid_closest.comid.values[0])

Since NHDPlus HR is still at the pre-release stage, let’s use the MR flowlines to demonstrate the vector-based accumulation. Based on a topologically sorted river network, pynhd.vector_accumulation computes flow accumulation in the network. It returns a data frame, sorted from upstream to downstream, that shows the accumulated flow in each node.
PyNHD has a utility called prepare_nhdplus that identifies such relationships, among other things such as fixing some common issues with NHDPlus flowlines. But first, we need to get all the NHDPlus attributes for each ComID, since NLDI only provides the flowlines’ geometries and ComIDs, which are useful for navigating the vector river network data. For getting the NHDPlus database we use WaterData. Let’s use the nhdflowline_network layer to get the required info.
wd = WaterData("nhdflowline_network")
comids = flw_trib.nhdplus_comid.to_list()
nhdp_trib = wd.byid("comid", comids)
flw = nhd.prepare_nhdplus(nhdp_trib, 0, 0, purge_non_dendritic=False)
To demonstrate the use of routing, let’s use the nhdplus_attrs function to get a list of the available NHDPlus attributes:
char = "CAT_RECHG"
area = "areasqkm"
local = nldi.getcharacteristic_byid(comids, "local", char_ids=char)
flw = flw.merge(local[char], left_on="comid", right_index=True)
def runoff_acc(qin, q, a):
    return qin + q * a

flw_r = flw[["comid", "tocomid", char, area]]
runoff = nhd.vector_accumulation(flw_r, runoff_acc, char, [char, area])

def area_acc(ain, a):
    return ain + a

flw_a = flw[["comid", "tocomid", area]]
areasqkm = nhd.vector_accumulation(flw_a, area_acc, area, [area])
runoff /= areasqkm
Since these are catchment-scale characteristics, let’s get the catchments then add the accumulated characteristic as a new column and plot the results.
wd = WaterData("catchmentsp")
catchments = wd.byid("featureid", comids)
c_local = catchments.merge(local, left_on="featureid", right_index=True)
c_acc = catchments.merge(runoff, left_on="featureid", right_index=True)

More examples can be found here.
PyGeoHydro: Retrieve Geospatial Hydrology Data#
Features#
PyGeoHydro (formerly named hydrodata) is a part of the HyRiver software stack that is designed to aid in hydroclimate analysis through web services. This package provides access to some public web services that offer geospatial hydrology data. It has three main modules: pygeohydro, plot, and helpers.
PyGeoHydro supports the following datasets:
NWIS for daily mean streamflow observations (returned as a pandas.DataFrame or xarray.Dataset with station attributes),
CAMELS for accessing streamflow observations (1980-2014) and basin-level attributes of 671 stations within CONUS,
Water Quality Portal for accessing current and historical water quality data from more than 1.5 million sites across the US,
NID for accessing the National Inventory of Dams web service,
HCDN 2009 for identifying sites where human activity affects the natural flow of the watercourse,
NLCD 2019 for land cover/land use, imperviousness, imperviousness descriptor, and canopy data. You can get data using both geometries and coordinates.
WBD for accessing Hydrologic Unit (HU) polygon boundaries within the US (all HUC levels).
SSEBop for daily actual evapotranspiration, for both single pixel and gridded data.
Also, it has a few other functions:
interactive_map: Interactive map for exploring NWIS stations within a bounding box.
cover_statistics: Categorical statistics of land use/land cover data.
overland_roughness: Estimate overland roughness from land use/land cover data.
The plot module includes the following functions:
signatures: Hydrologic signature graphs.
cover_legends: Official NLCD land cover legends for plotting a land cover dataset.
descriptor_legends: Color map and legends for plotting an imperviousness descriptor dataset.
The helpers module includes:
nlcd_helper: A roughness coefficients lookup table for each land cover and imperviousness descriptor type, which is useful for overland flow routing among other applications.
nwis_error: A dataframe for finding information about NWIS requests’ errors.
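As a small sketch (assuming the helpers module is importable as shown; check the API reference for the exact return types), loading these lookup tables could look like:
from pygeohydro import helpers

# NLCD legend and roughness lookup table
nlcd_meta = helpers.nlcd_helper()
# Table describing NWIS error codes
nwis_errors = helpers.nwis_error()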
You can find some example notebooks here.
Moreover, under the hood, PyGeoHydro uses AsyncRetriever for making requests asynchronously with persistent caching. This improves the reliability and speed of data retrieval significantly. AsyncRetriever caches all request/response pairs and upon making an already cached request, it will retrieve the responses from the cache if the server’s response is unchanged.
You can control the request/response caching behavior by setting the following environment variables:
HYRIVER_CACHE_NAME: Path to the caching SQLite database. It defaults to ./cache/aiohttp_cache.sqlite.
HYRIVER_CACHE_EXPIRE: Expiration time for cached requests in seconds. It defaults to -1 (never expire).
HYRIVER_CACHE_DISABLE: Disable reading/writing from/to the cache. The default is false.
For example, in your code before making any requests you can do:
import os
os.environ["HYRIVER_CACHE_NAME"] = "path/to/file.sqlite"
os.environ["HYRIVER_CACHE_EXPIRE"] = "3600"
os.environ["HYRIVER_CACHE_DISABLE"] = "true"
You can also try using PyGeoHydro without installing it on your system by clicking on the binder badge. A Jupyter Lab instance with the HyRiver stack pre-installed will be launched in your web browser, and you can start coding!
Please note that since this project is in early development stages, while the provided functionalities should be stable, changes in APIs are possible in new releases. But we appreciate it if you give this project a try and provide feedback. Contributions are most welcome.
Moreover, requests for additional functionalities can be submitted via issue tracker.
Installation#
You can install PyGeoHydro using pip after installing libgdal on your system (for example, in Ubuntu run sudo apt install libgdal-dev). Moreover, PyGeoHydro has an optional dependency for persistent caching, requests-cache. We highly recommend installing this package as it can significantly speed up send/receive queries. You don’t have to change anything in your code, since PyGeoHydro, under the hood, looks for requests-cache and, if available, automatically uses persistent caching:
$ pip install pygeohydro
Alternatively, PyGeoHydro can be installed from the conda-forge repository using Conda:
$ conda install -c conda-forge pygeohydro
Quick start#
We can explore the available NWIS stations within a bounding box using the interactive_map function. It returns an interactive map, and clicking on a station shows some of its most important properties.
import pygeohydro as gh
bbox = (-69.5, 45, -69, 45.5)
gh.interactive_map(bbox)

We can select all the stations within this bounding box that have daily mean streamflow data from 2000-01-01 to 2010-12-31:
from pygeohydro import NWIS
nwis = NWIS()
query = {
**nwis.query_bybox(bbox),
"hasDataTypeCd": "dv",
"outputDataTypeCd": "dv",
}
info_box = nwis.get_info(query)
dates = ("2000-01-01", "2010-12-31")
stations = info_box[
(info_box.begin_date <= dates[0]) & (info_box.end_date >= dates[1])
].site_no.tolist()
Then, we can get the daily streamflow data in mm/day (by default the values are in cms) and plot them:
from pygeohydro import plot
qobs = nwis.get_streamflow(stations, dates, mmd=True)
plot.signatures(qobs)
By default, get_streamflow returns a pandas.DataFrame that has an attrs attribute containing metadata for all the stations. You can access it like so: qobs.attrs. Moreover, we can get the same data as an xarray.Dataset as follows:
qobs_ds = nwis.get_streamflow(stations, dates, to_xarray=True)
This xarray.Dataset has two dimensions: time and station_id. It has 10 variables, including discharge, which has both dimensions, while the other variables, which are station attributes, are one-dimensional.
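For example, a minimal sketch of selecting the discharge time series of one station from this dataset (assuming the station_id coordinate uses the same IDs as the stations list above) is:
# Discharge time series of the first requested station (hypothetical selection)
q_single = qobs_ds.discharge.sel(station_id=stations[0])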
We can also get instantaneous streamflow data using get_streamflow. This method assumes that the input dates are in the UTC time zone and returns the data in the UTC time zone as well.
date = ("2005-01-01 12:00", "2005-01-12 15:00")
qobs = nwis.get_streamflow("01646500", date, freq="iv")
We can get the CAMELS dataset as a geopandas.GeoDataFrame that includes the geometry and basin-level attributes of 671 natural watersheds within CONUS, and their streamflow observations between 1980 and 2014 as an xarray.Dataset, like so:
attrs, qobs = gh.get_camels()

The WaterQuality class has a number of convenience methods to retrieve data from the Water Quality Portal web service. Since there are many parameter combinations that can be used to retrieve data, a general method is also provided to retrieve data from any of the valid endpoints. You can use get_json to retrieve station info as a geopandas.GeoDataFrame or get_csv to retrieve station data as a pandas.DataFrame. You can construct a dictionary of the parameters and pass it to one of these functions. For more information on the parameters, please consult the Water Quality Data documentation.
For example, let’s find all the stations within a bounding box that have Caffeine data:
from pygeohydro import WaterQuality
bbox = (-92.8, 44.2, -88.9, 46.0)
kwds = {"characteristicName": "Caffeine"}
wq = WaterQuality()
stations = wq.station_bybbox(bbox, kwds)
Or the same criterion but within a 30-mile radius of a point:
stations = wq.station_bydistance(-92.8, 44.2, 30, kwds)
Then, we can get the data for all these stations like this:
sids = stations.MonitoringLocationIdentifier.tolist()
caff = wq.data_bystation(sids, kwds)

Moreover, we can get land use/land cover data using the nlcd_bygeom and nlcd_bycoords functions, percentages of land cover types using cover_statistics, and overland roughness using overland_roughness. The nlcd_bycoords function returns a geopandas.GeoDataFrame with the NLCD layers as columns and the input coordinates as the geometry column. Moreover, the nlcd_bygeom function accepts either a single geometry or a geopandas.GeoDataFrame as the input.
from pynhd import NLDI
basins = NLDI().get_basins(["01031450", "01318500", "01031510"])
lulc = gh.nlcd_bygeom(basins, 100, years={"cover": [2016, 2019]})
stats = gh.cover_statistics(lulc["01318500"].cover_2016)
roughness = gh.overland_roughness(lulc["01318500"].cover_2019)

Next, let’s use ssebopeta_bygeom to get actual ET data for a basin. Note that there’s a ssebopeta_bycoords function that returns an ETA time series for a single coordinate.
geometry = NLDI().get_basins("01315500").geometry[0]
eta = gh.ssebopeta_bygeom(geometry, dates=("2005-10-01", "2005-10-05"))

Additionally, we can pull all the US dams data using NID. Let’s get dams that are within this bounding box and have a maximum storage larger than 200 acre-feet.
nid = NID()
dams = nid.get_bygeom((-65.77, 43.07, -69.31, 45.45), "epsg:4326")
dams = nid.inventory_byid(dams.id.to_list())
dams = dams[dams.maxStorage > 200]
We can also get all dams within CONUS from NID with a maximum storage larger than 200 acre-feet:
import geopandas as gpd
world = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
conus = world[world.name == "United States of America"].geometry.iloc[0].geoms[0]
dam_list = nid.get_byfilter([{"maxStorage": ["[200 5000]"]}])
dams = dam_list[0][dam_list[0].is_valid]
dams = dams[dams.within(conus)]

The WBD class allows us to get Hydrologic Unit (HU) polygon boundaries. Let’s get the two Hudson HUC4s:
from pygeohydro import WBD
wbd = WBD("huc4")
hudson = wbd.byids("huc4", ["0202", "0203"])
Py3DEP: Topographic data through 3DEP#
Features#
Py3DEP is a part of the HyRiver software stack that is designed to aid in hydroclimate analysis through web services. This package provides access to the 3DEP database, which is a part of the National Map services. The 3DEP service has multi-resolution sources, and depending on the user-provided resolution, the data is resampled on the server side based on all the available data sources. Py3DEP returns the requests as an xarray dataset. The main function is get_map, which supports the following layers:
DEM
Hillshade Gray
Aspect Degrees
Aspect Map
GreyHillshade Elevation Fill
Hillshade Multidirectional
Slope Degrees
Slope Map
Hillshade Elevation Tinted
Height Ellipsoidal
Contour 25
Contour Smoothed 25
Moreover, Py3DEP offers some additional utilities:
elevation_bygrid: For retrieving elevations of all the grid points in a 2D grid.
elevation_bycoords: For retrieving the elevation of a list of x and y coordinates.
elevation_profile: For retrieving the elevation profile along a line at a given spacing. This function converts the line to a B-spline and then calculates the elevation along the spline at a given uniform spacing.
deg2mpm: For converting a slope dataset from degrees to meters per meter.
query_3dep_sources: For querying the bounds of 3DEP’s data sources within a bounding box.
check_3dep_availability: For querying 3DEP’s resolution availability within a bounding box.
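For example, a quick sketch of checking the available 3DEP resolutions within a bounding box (the exact structure of the returned object may differ between versions) is:
import py3dep

# Bounding box as (west, south, east, north) in EPSG:4326
bbox = (-69.77, 45.07, -69.31, 45.45)
print(py3dep.check_3dep_availability(bbox))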
You can find some example notebooks here.
Moreover, under the hood, Py3DEP uses AsyncRetriever for making requests asynchronously with persistent caching. This improves the reliability and speed of data retrieval significantly. AsyncRetriever caches all request/response pairs and upon making an already cached request, it will retrieve the responses from the cache if the server’s response is unchanged.
You can control the request/response caching behavior by setting the following environment variables:
HYRIVER_CACHE_NAME: Path to the caching SQLite database. It defaults to ./cache/aiohttp_cache.sqlite.
HYRIVER_CACHE_EXPIRE: Expiration time for cached requests in seconds. It defaults to -1 (never expire).
HYRIVER_CACHE_DISABLE: Disable reading/writing from/to the cache. The default is false.
For example, in your code before making any requests you can do:
import os
os.environ["HYRIVER_CACHE_NAME"] = "path/to/file.sqlite"
os.environ["HYRIVER_CACHE_EXPIRE"] = "3600"
os.environ["HYRIVER_CACHE_DISABLE"] = "true"
You can also try using Py3DEP without installing it on your system by clicking on the binder badge. A Jupyter Lab instance with the HyRiver stack pre-installed will be launched in your web browser, and you can start coding!
Please note that since this project is in the early development stages, while the provided functionalities should be stable, changes in APIs are possible in new releases. But we appreciate it if you give this project a try and provide feedback. Contributions are most welcome.
Moreover, requests for additional functionalities can be submitted via issue tracker.
Installation#
You can install Py3DEP using pip after installing libgdal on your system (for example, in Ubuntu run sudo apt install libgdal-dev). Moreover, Py3DEP has an optional dependency for persistent caching, requests-cache. We highly recommend installing this package as it can significantly speed up send/receive queries. You don’t have to change anything in your code, since Py3DEP, under the hood, looks for requests-cache and, if available, automatically uses persistent caching:
$ pip install py3dep
Alternatively, Py3DEP can be installed from the conda-forge repository using Conda:
$ conda install -c conda-forge py3dep
Quick start#
You can use Py3DEP from the command line or as a Python library. The command-line interface provides access to two functionalities:
Getting topographic data: You must create a geopandas.GeoDataFrame that contains the geometries of the target locations. This dataframe must have at least three columns: id, res, and geometry. The id column is used as filenames for saving the obtained topographic data to a NetCDF (.nc) file. The res column must be the target resolution in meters. Then, you must save the dataframe to a file with an extension such as .shp or .gpkg (whatever geopandas.read_file can read).
Getting elevation: You must create a pandas.DataFrame that contains the coordinates of the target locations. This dataframe must have at least two columns: x and y. The elevations are obtained using the airmap service in meters. The data are saved as a csv file with the same filename as the input file with _elevation appended, e.g., coords_elevation.csv.
$ py3dep --help
Usage: py3dep [OPTIONS] COMMAND [ARGS]...
Command-line interface for Py3DEP.
Options:
-h, --help Show this message and exit.
Commands:
coords Retrieve topographic data for a list of coordinates.
geometry Retrieve topographic data within geometries.
The coords sub-command is as follows:
$ py3dep coords -h
Usage: py3dep coords [OPTIONS] FPATH
Retrieve topographic data for a list of coordinates.
FPATH: Path to a csv file with two columns named ``lon`` and ``lat``.
Examples:
$ cat coords.csv
lon,lat
-122.2493328,37.8122894
$ py3dep coords coords.csv -q airmap -s topo_dir
Options:
-q, --query_source [airmap|tnm|tep]
Source of the elevation data.
-s, --save_dir PATH Path to a directory to save the requested
files. Extension for the outputs is either
`.nc` for geometry or `.csv` for coords.
-h, --help Show this message and exit.
And, the geometry sub-command is as follows:
$ py3dep geometry -h
Usage: py3dep geometry [OPTIONS] FPATH
Retrieve topographic data within geometries.
FPATH: Path to a shapefile (.shp) or geopackage (.gpkg) file.
This file must have three columns and contain a ``crs`` attribute:
- ``id``: Feature identifiers that py3dep uses as the output netcdf/csv filenames.
- ``res``: Target resolution in meters.
- ``geometry``: A Polygon or MultiPolygon.
Examples:
$ py3dep geometry ny_geom.gpkg -l "Slope Map" -l DEM -s topo_dir
Options:
-l, --layers [DEM|Hillshade Gray|Aspect Degrees|Aspect Map|GreyHillshade_elevationFill|Hillshade Multidirectional|Slope Map|Slope Degrees|Hillshade Elevation Tinted|Height Ellipsoidal|Contour 25|Contour Smoothed 25]
Target topographic data layers
-s, --save_dir PATH Path to a directory to save the requested
files.Extension for the outputs is either
`.nc` for geometry or `.csv` for coords.
-h, --help Show this message and exit.
Now, let’s see how we can use Py3DEP as a library. Py3DEP accepts Shapely’s Polygon or a bounding box (a tuple of length four) as an input geometry. We can use PyNHD to get a watershed’s geometry, then use it to get the DEM and the slope in meters/meters from Py3DEP using the get_map function.
The get_map function has a resolution argument that sets the target resolution in meters. Note that the highest available resolution throughout CONUS is about 10 m, though higher resolutions are available in limited parts of the US. Note that the input geometry can be in any valid spatial reference (the geo_crs argument). The crs argument, however, is limited to CRS:84, EPSG:4326, and EPSG:3857, since 3DEP only supports these spatial references.
import py3dep
from pynhd import NLDI
geom = NLDI().get_basins("01031500").geometry[0]
dem = py3dep.get_map("DEM", geom, resolution=30, geo_crs="epsg:4326", crs="epsg:3857")
slope = py3dep.get_map("Slope Degrees", geom, resolution=30)
slope = py3dep.deg2mpm(slope)

We can use the rioxarray package to save the obtained dataset as a raster file:
import rioxarray
dem.rio.to_raster("dem_01031500.tif")
Moreover, we can get the elevations of a set of x- and y- coordinates on a grid. For example, let’s get the minimum temperature data within this watershed from Daymet using PyDaymet then add the elevation as a new variable to the dataset:
import pydaymet as daymet
import xarray as xr
import numpy as np
clm = daymet.get_bygeom(geom, ("2005-01-01", "2005-01-31"), variables="tmin")
elev = py3dep.elevation_bygrid(clm.x.values, clm.y.values, clm.crs, clm.res[0] * 1000)
attrs = clm.attrs
clm = xr.merge([clm, elev])
clm["elevation"] = clm.elevation.where(~np.isnan(clm.isel(time=0).tmin), drop=True)
clm.attrs.update(attrs)
Now, let’s get street network data using the osmnx package and add elevation data for its nodes using the elevation_bycoords function.
import networkx as nx
import osmnx as ox

G = ox.graph_from_place("Piedmont, California, USA", network_type="drive")
x, y = nx.get_node_attributes(G, "x").values(), nx.get_node_attributes(G, "y").values()
elevation = py3dep.elevation_bycoords(list(zip(x, y)), crs="epsg:4326")
nx.set_node_attributes(G, dict(zip(G.nodes(), elevation)), "elevation")

We can get the elevation profile along a line at a given spacing using the elevation_profile function. For example, let’s get the elevation profile at 10-m spacing along the main flowline of the upstream drainage area of a USGS station with ID 01031500:
import py3dep
from pynhd import NLDI
flw_main = NLDI().navigate_byid(
fsource="nwissite",
fid="USGS-01031500",
navigation="upstreamMain",
source="flowlines",
distance=1000,
)
line = flw_main.geometry.unary_union
elevation = py3dep.elevation_profile(line, 10)
PyDaymet: Daily climate data through Daymet#
Features#
PyDaymet is a part of the HyRiver software stack that is designed to aid in hydroclimate analysis through web services. This package provides access to climate data from the Daymet V4 database using the NetCDF Subset Service (NCSS). Both single pixel (using the get_bycoords function) and gridded data (using get_bygeom) are supported, and they are returned as pandas.DataFrame and xarray.Dataset, respectively. Climate data is available for North America and Hawaii from 1980, and for Puerto Rico from 1950, at three time scales: daily, monthly, and annual. Additionally, PyDaymet can compute Potential EvapoTranspiration (PET) using three methods: penman_monteith, priestley_taylor, and hargreaves_samani, for both single pixel and gridded data.
You can find some example notebooks here.
Moreover, under the hood, PyDaymet uses AsyncRetriever for making requests asynchronously with persistent caching. This improves the reliability and speed of data retrieval significantly. AsyncRetriever caches all request/response pairs and upon making an already cached request, it will retrieve the responses from the cache if the server’s response is unchanged.
You can control the request/response caching behavior by setting the following environment variables:
HYRIVER_CACHE_NAME: Path to the caching SQLite database. It defaults to ./cache/aiohttp_cache.sqlite.
HYRIVER_CACHE_EXPIRE: Expiration time for cached requests in seconds. It defaults to -1 (never expire).
HYRIVER_CACHE_DISABLE: Disable reading/writing from/to the cache. The default is false.
For example, in your code before making any requests you can do:
import os
os.environ["HYRIVER_CACHE_NAME"] = "path/to/file.sqlite"
os.environ["HYRIVER_CACHE_EXPIRE"] = "3600"
os.environ["HYRIVER_CACHE_DISABLE"] = "true"
You can also try using PyDaymet without installing it on your system by clicking on the binder badge. A Jupyter Lab instance with the HyRiver stack pre-installed will be launched in your web browser, and you can start coding!
Please note that since this project is in the early development stages, while the provided functionalities should be stable, changes in APIs are possible in new releases. But we appreciate it if you give this project a try and provide feedback. Contributions are most welcome.
Moreover, requests for additional functionalities can be submitted via issue tracker.
Installation#
You can install PyDaymet using pip after installing libgdal on your system (for example, in Ubuntu run sudo apt install libgdal-dev):
$ pip install pydaymet
Alternatively, PyDaymet can be installed from the conda-forge repository using Conda:
$ conda install -c conda-forge pydaymet
Quick start#
You can use PyDaymet from the command line or as a Python library. The command-line interface provides access to two functionalities:
Getting gridded climate data: You must create a geopandas.GeoDataFrame that contains the geometries of the target locations. This dataframe must have four columns: id, start, end, and geometry. The id column is used as filenames for saving the obtained climate data to a NetCDF (.nc) file. The start and end columns are the starting and ending dates of the target period. Then, you must save the dataframe as a shapefile (.shp) or geopackage (.gpkg) with a CRS attribute.
Getting single pixel climate data: You must create a CSV file that contains the coordinates of the target locations. This file must have at least the following columns: id, start, end, lon, and lat. The id column is used as filenames for saving the obtained climate data to a CSV (.csv) file. The start and end columns are the same as in the geometry command. The lon and lat columns are the longitude and latitude coordinates of the target locations.
$ pydaymet -h
Usage: pydaymet [OPTIONS] COMMAND [ARGS]...
Command-line interface for PyDaymet.
Options:
-h, --help Show this message and exit.
Commands:
coords Retrieve climate data for a list of coordinates.
geometry Retrieve climate data for a dataframe of geometries.
The coords sub-command is as follows:
$ pydaymet coords -h
Usage: pydaymet coords [OPTIONS] FPATH
Retrieve climate data for a list of coordinates.
FPATH: Path to a csv file with four columns:
- ``id``: Feature identifiers that daymet uses as the output netcdf filenames.
- ``start``: Start time.
- ``end``: End time.
- ``lon``: Longitude of the points of interest.
- ``lat``: Latitude of the points of interest.
- ``time_scale``: (optional) Time scale, either ``daily`` (default), ``monthly`` or ``annual``.
- ``pet``: (optional) Method to compute PET. Supported methods are:
``penman_monteith``, ``hargreaves_samani``, ``priestley_taylor``, and ``none`` (default).
- ``snow``: (optional) Separate snowfall from precipitation, default is ``False``.
Examples:
$ cat coords.csv
id,lon,lat,start,end,pet
california,-122.2493328,37.8122894,2012-01-01,2014-12-31,hargreaves_samani
$ pydaymet coords coords.csv -v prcp -v tmin
Options:
-v, --variables TEXT Target variables. You can pass this flag multiple
times for multiple variables.
-s, --save_dir PATH Path to a directory to save the requested files.
Extension for the outputs is .nc for geometry and .csv
for coords.
--disable_ssl Pass to disable SSL certification verification.
-h, --help Show this message and exit.
And, the geometry sub-command is as follows:
$ pydaymet geometry -h
Usage: pydaymet geometry [OPTIONS] FPATH
Retrieve climate data for a dataframe of geometries.
FPATH: Path to a shapefile (.shp) or geopackage (.gpkg) file.
This file must have four columns and contain a ``crs`` attribute:
- ``id``: Feature identifiers that daymet uses as the output netcdf filenames.
- ``start``: Start time.
- ``end``: End time.
- ``geometry``: Target geometries.
- ``time_scale``: (optional) Time scale, either ``daily`` (default), ``monthly`` or ``annual``.
- ``pet``: (optional) Method to compute PET. Supported methods are:
``penman_monteith``, ``hargreaves_samani``, ``priestley_taylor``, and ``none`` (default).
- ``snow``: (optional) Separate snowfall from precipitation, default is ``False``.
Examples:
$ pydaymet geometry geo.gpkg -v prcp -v tmin
Options:
-v, --variables TEXT Target variables. You can pass this flag multiple
times for multiple variables.
-s, --save_dir PATH Path to a directory to save the requested files.
Extension for the outputs is .nc for geometry and .csv
for coords.
--disable_ssl Pass to disable SSL certification verification.
-h, --help Show this message and exit.
Now, let’s see how we can use PyDaymet as a library. PyDaymet offers two functions for getting climate data: get_bycoords and get_bygeom. The arguments of these functions are identical except for the first argument, which should be a polygon for the latter and a coordinate (a tuple of length two, as in (x, y)) for the former. The input geometry or coordinate can be in any valid CRS (defaults to EPSG:4326). The dates argument can be either a tuple of length two like (start_str, end_str) or a list of years like [2000, 2005]. Note that both functions have a pet flag for computing PET and a snow flag for separating snow from precipitation using the Martinez and Gupta (2010) method. Additionally, we can pass time_scale to get daily, monthly, or annual summaries. This flag is set to daily by default.
from pynhd import NLDI
import pydaymet as daymet
geometry = NLDI().get_basins("01031500").geometry[0]
var = ["prcp", "tmin"]
dates = ("2000-01-01", "2000-06-30")
daily = daymet.get_bygeom(
geometry, dates, variables=var, pet="priestley_taylor", snow=True
)
monthly = daymet.get_bygeom(geometry, dates, variables=var, time_scale="monthly")

If the input geometry (or coordinate) is in a CRS other than EPSG:4326, we should pass it to the functions.
coords = (-1431147.7928, 318483.4618)
crs = "epsg:3542"
dates = ("2000-01-01", "2006-12-31")
annual = daymet.get_bycoords(
coords, dates, variables=var, loc_crs=crs, time_scale="annual"
)

Also, we can use the potential_et function to compute PET by passing the daily climate data. We can pass either a pandas.DataFrame or an xarray.Dataset. Note that the penman_monteith and priestley_taylor methods have parameters that can be passed via the params argument if values other than the defaults are needed. For example, the default value of alpha for the priestley_taylor method is 1.26 (humid regions); we can set it to 1.74 (arid regions) as follows:
pet_hs = daymet.potential_et(daily, methods="priestley_taylor", params={"alpha": 1.74})
Next, let’s get annual total precipitation for Hawaii and Puerto Rico for 2010.
hi_ext = (-160.3055, 17.9539, -154.7715, 23.5186)
pr_ext = (-67.9927, 16.8443, -64.1195, 19.9381)
hi = daymet.get_bygeom(hi_ext, 2010, variables="prcp", region="hi", time_scale="annual")
pr = daymet.get_bygeom(pr_ext, 2010, variables="prcp", region="pr", time_scale="annual")
Some example plots are shown below:


AsyncRetriever: Asynchronous requests with persistent caching#
Features#
AsyncRetriever is a part of the HyRiver software stack that is designed to aid in hydroclimate analysis through web services. This package serves as HyRiver’s engine for asynchronously sending requests and retrieving responses as text, binary, or json objects. It uses persistent caching, via aiohttp-client-cache, to speed up the retrieval even further. Moreover, thanks to nest_asyncio, you can use this package in Jupyter notebooks. Although this package is part of the HyRiver software stack, it can be used for any web calls. There are three functions that you can use to make web calls:
retrieve_text: Get responses as text objects.
retrieve_binary: Get responses as binary objects.
retrieve_json: Get responses as json objects.
You can also use the general-purpose retrieve function to get responses as any of the three types. All responses are returned as a list that has the same order as the input list of requests. Moreover, there is another function called delete_url_cache for removing all requests that contain a given URL from a cache file.
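As a minimal sketch of the call pattern (reusing a CO-OPS metadata URL similar to the harmonic-constituents example below), retrieving a single JSON response looks like this:
import async_retriever as ar

# One URL and no payload; the result is a list with one decoded JSON object
url = "https://api.tidesandcurrents.noaa.gov/mdapi/prod/webapi/stations/8534720/harcon.json?units=metric"
resp = ar.retrieve_json([url])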
You can control the request/response caching behavior by setting the following environment variables:
HYRIVER_CACHE_NAME: Path to the caching SQLite database. It defaults to ./cache/aiohttp_cache.sqlite.
HYRIVER_CACHE_EXPIRE: Expiration time for cached requests in seconds. It defaults to -1 (never expire).
HYRIVER_CACHE_DISABLE: Disable reading/writing from/to the cache. The default is false.
For example, in your code before making any requests you can do:
import os
os.environ["HYRIVER_CACHE_NAME"] = "path/to/file.sqlite"
os.environ["HYRIVER_CACHE_EXPIRE"] = "3600"
os.environ["HYRIVER_CACHE_DISABLE"] = "true"
You can find some example notebooks here.
You can also try using AsyncRetriever without installing it on your system by clicking on the binder badge. A Jupyter Lab instance with the HyRiver stack pre-installed will be launched in your web browser, and you can start coding!
Please note that since this project is in the early development stages, while the provided functionalities should be stable, changes in APIs are possible in new releases. But we appreciate it if you give this project a try and provide feedback. Contributions are most welcome.
Moreover, requests for additional functionalities can be submitted via issue tracker.
Installation#
You can install async_retriever using pip:
$ pip install async_retriever
Alternatively, async_retriever can be installed from the conda-forge repository using Conda:
$ conda install -c conda-forge async_retriever
Quick start#
AsyncRetriever by default creates and/or uses ./cache/aiohttp_cache.sqlite as the cache, which you can customize via the cache_name argument. Also, by default, the cache doesn’t have any expiration date, and the delete_url_cache function should be used if you know that a database on a server was updated and you want to retrieve the latest data. Alternatively, you can use expire_after to set the expiration date for the cache.
As an example of retrieving a binary response, let’s use the DAAC server to get NDVI. The responses can be directly passed to xarray.open_mfdataset to get the data as an xarray Dataset. We can also disable SSL certificate verification by setting ssl=False.
import io
import xarray as xr
import async_retriever as ar
from datetime import datetime
west, south, east, north = (-69.77, 45.07, -69.31, 45.45)
base_url = "https://thredds.daac.ornl.gov/thredds/ncss/ornldaac/1299"
dates_itr = ((datetime(y, 1, 1), datetime(y, 1, 31)) for y in range(2000, 2005))
urls, kwds = zip(
*[
(
f"{base_url}/MCD13.A{s.year}.unaccum.nc4",
{
"params": {
"var": "NDVI",
"north": f"{north}",
"west": f"{west}",
"east": f"{east}",
"south": f"{south}",
"disableProjSubset": "on",
"horizStride": "1",
"time_start": s.strftime("%Y-%m-%dT%H:%M:%SZ"),
"time_end": e.strftime("%Y-%m-%dT%H:%M:%SZ"),
"timeStride": "1",
"addLatLon": "true",
"accept": "netcdf",
}
},
)
for s, e in dates_itr
]
)
resp = ar.retrieve_binary(urls, kwds, max_workers=8, ssl=False)
data = xr.open_mfdataset(io.BytesIO(r) for r in resp)
We can remove these requests and their responses from the cache like so:
ar.delete_url_cache(base_url)

For a json response example, let’s get water level recordings of NOAA’s water level station 8534720 (Atlantic City, NJ) during 2012, using the CO-OPS API. Note that this CO-OPS product has a 31-day limit for a single request, so we have to break the request down accordingly.
import pandas as pd
station_id = "8534720"
start = pd.to_datetime("2012-01-01")
end = pd.to_datetime("2012-12-31")
s = start
dates = []
for e in pd.date_range(start, end, freq="m"):
dates.append((s.date(), e.date()))
s = e + pd.offsets.MonthBegin()
url = "https://api.tidesandcurrents.noaa.gov/api/prod/datagetter"
urls, kwds = zip(
*[
(
url,
{
"params": {
"product": "water_level",
"application": "web_services",
"begin_date": f'{s.strftime("%Y%m%d")}',
"end_date": f'{e.strftime("%Y%m%d")}',
"datum": "MSL",
"station": f"{station_id}",
"time_zone": "GMT",
"units": "metric",
"format": "json",
}
},
)
for s, e in dates
]
)
resp = ar.retrieve_json(urls, kwds)
wl_list = []
for rjson in resp:
wl = pd.DataFrame.from_dict(rjson["data"])
wl["t"] = pd.to_datetime(wl.t)
wl = wl.set_index(wl.t).drop(columns="t")
wl["v"] = pd.to_numeric(wl.v, errors="coerce")
wl_list.append(wl)
water_level = pd.concat(wl_list).sort_index()
water_level.attrs = rjson["metadata"]

Now, let’s see an example without any payload or headers. Here’s how we can retrieve harmonic constituents of several NOAA stations from CO-OPS:
stations = [
"8410140",
"8411060",
"8413320",
"8418150",
"8419317",
"8419870",
"8443970",
"8447386",
]
base_url = "https://api.tidesandcurrents.noaa.gov/mdapi/prod/webapi/stations"
urls = [f"{base_url}/{i}/harcon.json?units=metric" for i in stations]
resp = ar.retrieve_json(urls)
amp_list = []
phs_list = []
for rjson in resp:
sid = rjson["self"].rsplit("/", 2)[1]
const = pd.DataFrame.from_dict(rjson["HarmonicConstituents"]).set_index("name")
amp = const.rename(columns={"amplitude": sid})[sid]
phase = const.rename(columns={"phase_GMT": sid})[sid]
amp_list.append(amp)
phs_list.append(phase)
amp = pd.concat(amp_list, axis=1)
phs = pd.concat(phs_list, axis=1)

PyGeoOGC: Retrieve Data from RESTful, WMS, and WFS Services#
Features#
PyGeoOGC is a part of HyRiver software stack that is designed to aid in hydroclimate analysis through web services. This package provides general interfaces to web services that are based on ArcGIS RESTful, WMS, and WFS. Although all these web services have limits on the number of features per request (e.g., 1000 object IDs for a RESTful request or 8 million pixels for a WMS request), PyGeoOGC, first, divides the large requests into smaller chunks, and then returns the merged results.
Moreover, under the hood, PyGeoOGC uses AsyncRetriever for making requests asynchronously with persistent caching. This improves the reliability and speed of data retrieval significantly. AsyncRetriever caches all request/response pairs and upon making an already cached request, it will retrieve the responses from the cache if the server’s response is unchanged.
You can control the request/response caching behavior by setting the following environment variables:
HYRIVER_CACHE_NAME: Path to the caching SQLite database. It defaults to ./cache/aiohttp_cache.sqlite.
HYRIVER_CACHE_EXPIRE: Expiration time for cached requests in seconds. It defaults to -1 (never expire).
HYRIVER_CACHE_DISABLE: Disable reading/writing from/to the cache. The default is false.
For example, in your code before making any requests you can do:
import os
os.environ["HYRIVER_CACHE_NAME"] = "path/to/file.sqlite"
os.environ["HYRIVER_CACHE_EXPIRE"] = "3600"
os.environ["HYRIVER_CACHE_DISABLE"] = "true"
There is also an inventory of URLs for some of these web services in the form of a class called ServiceURL. These URLs are in four categories: ServiceURL().restful, ServiceURL().wms, ServiceURL().wfs, and ServiceURL().http. These URLs provide you with some examples of the services that PyGeoOGC supports. If you have success using PyGeoOGC with a web service, please consider submitting a request to have it added to this URL inventory. You can get all the URLs in the ServiceURL class by just printing it: print(ServiceURL()).
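For instance, pulling one endpoint out of this inventory (the same nhdplushr entry used in the quick start below) looks like this:
from pygeoogc import ServiceURL

# RESTful endpoint of the National Map's NHDPlus HR service
print(ServiceURL().restful.nhdplushr)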
PyGeoOGC has three main classes:
ArcGISRESTful: This class can be instantiated by providing the target layer URL. For example, for getting Watershed Boundary Data we can use ServiceURL().restful.wbd. By looking at the web service’s website we see that there are nine layers. For example, 1 for 2-digit HU (Region), 6 for 12-digit HU (Subwatershed), and so on. We can pass the URL to the target layer directly, like this f"{ServiceURL().restful.wbd}/6", or as a separate argument via layer. Afterward, we request the data in two steps. First, we need to get the target object IDs using the oids_bygeom (within a geometry), oids_byfield (specific field IDs), or oids_bysql (any valid SQL 92 WHERE clause) class methods. Then, we can get the target features using the get_features class method. The returned response can be converted into a GeoDataFrame using the json2geodf function from PyGeoUtils.
WMS: Instantiation of this class requires at least three arguments: the service URL, the layer name(s), and the output format. Additionally, the target CRS and the web service version can be provided. Upon instantiation, we can use the getmap_bybox class method to get the target raster data within a bounding box. The box can be in any valid CRS, and if it is different from the default CRS, EPSG:4326, it should be passed using the box_crs argument. The service response can be converted into an xarray.Dataset using the gtiff2xarray function from PyGeoUtils.
WFS: Instantiation of this class is similar to WMS. The only difference is that only one layer name can be passed. Upon instantiation, there are three ways to get the data:
getfeature_bybox: Get all the target features within a bounding box in any valid CRS.
getfeature_byid: Get all the target features based on the IDs. Note that two arguments should be provided: featurename and featureids. You can get a list of valid feature names using the get_validnames class method.
getfeature_byfilter: Get the data based on any valid CQL filter.
You can convert the returned responses of these functions to a GeoDataFrame using the json2geodf function from the PyGeoUtils package.
You can find some example notebooks here.
Furthermore, you can also try using PyGeoOGC without installing it on your system by clicking on the binder badge. A Jupyter Lab instance with the HyRiver stack pre-installed will be launched in your web browser, and you can start coding!
Please note that since this project is in early development stages, while the provided functionalities should be stable, changes in APIs are possible in new releases. But we appreciate it if you give this project a try and provide feedback. Contributions are most welcome.
Moreover, requests for additional functionalities can be submitted via issue tracker.
Installation#
You can install PyGeoOGC using pip:
$ pip install pygeoogc
Alternatively, PyGeoOGC can be installed from the conda-forge repository using Conda or Mamba:
$ conda install -c conda-forge pygeoogc
Quick start#
We can access NHDPlus HR via RESTful service, the National Wetlands Inventory from WMS, and FEMA National Flood Hazard via WFS. The output of these functions is of type requests.Response, which can be converted to GeoDataFrame or xarray.Dataset using PyGeoUtils.
Let’s start with the National Map’s NHDPlus HR web service. We can query the flowlines that are within a geometry as follows:
from pygeoogc import ArcGISRESTful, WFS, WMS, ServiceURL
import pygeoutils as geoutils
from pynhd import NLDI
basin_geom = NLDI().get_basins("01031500").geometry[0]
hr = ArcGISRESTful(ServiceURL().restful.nhdplushr, 2, outformat="json")
resp = hr.get_features(hr.oids_bygeom(basin_geom, "epsg:4326"))
flowlines = geoutils.json2geodf(resp)
Note that oids_bygeom has three additional arguments: sql_clause, spatial_relation, and distance. We can use sql_clause for passing any valid SQL WHERE clause and spatial_relation for specifying the target predicate, such as intersect, contain, cross, etc. The default predicate is intersect (esriSpatialRelIntersects). Additionally, we can use distance for specifying the buffer distance from the input geometry for getting features.
We can also submit a query based on IDs of any valid field in the database. If the measure property is desired, you can pass return_m as True to the get_features class method:
oids = hr.oids_byfield("PERMANENT_IDENTIFIER", ["103455178", "103454362", "103453218"])
resp = hr.get_features(oids, return_m=True)
flowlines = geoutils.json2geodf(resp)
Additionally, any valid SQL 92 WHERE clause can be used. For more details look here. For example, let’s limit our first request to only include catchments with areas larger than 0.5 sqkm.
oids = hr.oids_bygeom(basin_geom, geo_crs="epsg:4326", sql_clause="AREASQKM > 0.5")
resp = hr.get_features(oids)
catchments = geoutils.json2geodf(resp)
A WMS-based example is shown below:
wms = WMS(
ServiceURL().wms.fws,
layers="0",
outformat="image/tiff",
crs="epsg:3857",
)
r_dict = wms.getmap_bybox(
basin_geom.bounds,
1e3,
box_crs="epsg:4326",
)
wetlands = geoutils.gtiff2xarray(r_dict, basin_geom, "epsg:4326")
Queries to a WFS-based web service can be done either within a bounding box or using any valid CQL filter.
wfs = WFS(
ServiceURL().wfs.fema,
layer="public_NFHL:Base_Flood_Elevations",
outformat="esrigeojson",
crs="epsg:4269",
)
r = wfs.getfeature_bybox(basin_geom.bounds, box_crs="epsg:4326")
flood = geoutils.json2geodf(r.json(), "epsg:4269", "epsg:4326")
layer = "wmadata:huc08"
wfs = WFS(
ServiceURL().wfs.waterdata,
layer=layer,
outformat="application/json",
version="2.0.0",
crs="epsg:4269",
)
r = wfs.getfeature_byfilter("huc8 LIKE '13030%'")
huc8 = geoutils.json2geodf(r.json(), "epsg:4269", "epsg:4326")

PyGeoUtils: Utilities for (Geo)JSON and (Geo)TIFF Conversion#
Features#
PyGeoUtils is a part of HyRiver software stack that is designed to aid in hydroclimate analysis through web services. This package provides utilities for manipulating (Geo)JSON and (Geo)TIFF responses from web services. These utilities are:
json2geodf: For converting (Geo)JSON objects to a GeoPandas dataframe.
arcgis2geojson: For converting ESRI GeoJSON to the standard GeoJSON format.
gtiff2xarray: For converting (Geo)TIFF objects to xarray datasets.
xarray2geodf: For converting an xarray.DataArray to a geopandas.GeoDataFrame, i.e., vectorization.
xarray_geomask: For masking an xarray.Dataset or xarray.DataArray using a polygon.
All these functions handle all necessary CRS transformations.
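For instance, a minimal sketch of xarray_geomask, reusing the wetlands dataset and basin geometry from the PyGeoOGC quick start above (the positional argument order mirrors gtiff2xarray and should be verified against the API reference):
import pygeoutils as geoutils

# Mask the previously retrieved dataset with the basin polygon (EPSG:4326).
wetlands_masked = geoutils.xarray_geomask(wetlands, basin_geom, "epsg:4326")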
You can find some example notebooks here.
You can also try using PyGeoUtils without installing it on your system by clicking on the binder badge. A Jupyter Lab instance with the HyRiver stack pre-installed will be launched in your web browser, and you can start coding!
Please note that since this project is in the early development stages, while the provided functionalities should be stable, changes in APIs are possible in new releases. But we appreciate it if you give this project a try and provide feedback. Contributions are most welcome.
Moreover, requests for additional functionalities can be submitted via issue tracker.
Installation#
You can install PyGeoUtils using pip after installing libgdal on your system (for example, on Ubuntu run sudo apt install libgdal-dev):
$ pip install pygeoutils
Alternatively, PyGeoUtils can be installed from the conda-forge repository using Conda:
$ conda install -c conda-forge pygeoutils
Quick start#
To demonstrate the capabilities of PyGeoUtils, let's use PyGeoOGC to access the NLCD 2011 tree canopy layer from the MRLC WMS service and FEMA National Flood Hazard data via WFS, then convert the outputs to an xarray.Dataset and a GeoDataFrame, respectively.
import pygeoutils as geoutils
from pygeoogc import WFS, WMS, ServiceURL
from shapely.geometry import Polygon
geometry = Polygon(
[
[-118.72, 34.118],
[-118.31, 34.118],
[-118.31, 34.518],
[-118.72, 34.518],
[-118.72, 34.118],
]
)
crs = "epsg:4326"
wms = WMS(
ServiceURL().wms.mrlc,
layers="NLCD_2011_Tree_Canopy_L48",
outformat="image/geotiff",
crs=crs,
)
r_dict = wms.getmap_bybox(
geometry.bounds,
1e3,
box_crs=crs,
)
canopy = geoutils.gtiff2xarray(r_dict, geometry, crs)
mask = canopy > 60
canopy_gdf = geoutils.xarray2geodf(canopy, "float32", mask)
url_wfs = "https://hazards.fema.gov/gis/nfhl/services/public/NFHL/MapServer/WFSServer"
wfs = WFS(
url_wfs,
layer="public_NFHL:Base_Flood_Elevations",
outformat="esrigeojson",
crs="epsg:4269",
)
r = wfs.getfeature_bybox(geometry.bounds, box_crs=crs)
flood = geoutils.json2geodf(r.json(), "epsg:4269", crs)
Example Gallery#
The following notebooks demonstrate the capabilities of the HyRiver software stack.
This page was generated from notebooks/3dep.ipynb.
Topographic Data#
[1]:
from pathlib import Path
import matplotlib.pyplot as plt
import networkx as nx
import numpy as np
import osmnx as ox
import py3dep
import pydaymet as daymet
import rioxarray # noqa: F401
import xarray as xr
from pynhd import NLDI
Py3DEP provides access to the 3DEP database, which is a part of the National Map services. The 3DEP service has multi-resolution sources, and depending on the user-provided resolution, the data is resampled on the server side based on all the available data sources. Py3DEP returns the requests as an xarray dataset. The 3DEP service includes the following layers:
DEM
Hillshade Gray
Aspect Degrees
Aspect Map
GreyHillshade Elevation Fill
Hillshade Multidirectional
Slope Map
Slope Degrees
Hillshade Elevation Tinted
Height Ellipsoidal
Contour 25
Contour Smoothed 25
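As a hedged one-liner, any of these layers can be requested for a bounding box as well as a polygon; the box below (west, south, east, north in EPSG:4326) is purely illustrative:
import py3dep

# Illustrative bounding box in Maine; layer name taken from the list above.
hillshade = py3dep.get_map("Hillshade Gray", (-69.5, 45.0, -69.3, 45.2), resolution=90)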
Moreover, Py3DEP offers some additional utilities:
elevation_bygrid: For getting elevations of all the grid points in a 2D grid.
elevation_byloc: For getting the elevation of a single point, based on the National Map's Elevation Point Query Service.
deg2mpm: For converting a slope dataset from degrees to meters per meter.
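For example, a minimal sketch of elevation_byloc, assuming (per its description above) that it accepts a single point; the coordinate and the (lon, lat) ordering in EPSG:4326 are assumptions to verify against the Py3DEP docs:
import py3dep

# Illustrative point near Dover-Foxcroft, Maine; assumed (lon, lat) ordering.
elev_m = py3dep.elevation_byloc((-69.32, 45.17))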
Let’s get a watershed geometry using NLDI and then get DEM and slope.
[2]:
geometry = NLDI().get_basins("11092450").geometry[0]
[3]:
dem = py3dep.get_map("DEM", geometry, resolution=90, geo_crs="epsg:4326", crs="epsg:3857")
dem.name = "dem"
dem.attrs["units"] = "meters"
slope = py3dep.get_map("Slope Degrees", geometry, resolution=30)
slope = py3dep.deg2mpm(slope)
We can save the DEM DataArray as a raster file using rioxarray:
[4]:
dem.rio.to_raster(Path("input_data", "dem_11092450.tif"))
[5]:
fig, (ax1, ax2) = plt.subplots(nrows=1, ncols=2, figsize=(15, 5))
dem.plot(ax=ax1)
slope.plot(ax=ax2)
fig.savefig("_static/dem_slope.png", bbox_inches="tight", facecolor="w")

We can also get the elevations of a list of coordinates using the py3dep.elevation_bycoords function. This function is particularly useful for getting elevations of the nodes of a network, for example, a river or a street network. Let's use the osmnx package to get a street network:
[6]:
G = ox.graph_from_place("Piedmont, California, USA", network_type="drive")
/Users/tchegini/.local/apps/mambaforge/envs/hyriver-dev/lib/python3.10/site-packages/osmnx/geocoder.py:110: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
gdf = gdf.append(_geocode_query_to_gdf(q, wr, by_osmid))
Now, we can get the elevations for each node based on their coordinates and then plot the results.
[7]:
x, y = nx.get_node_attributes(G, "x").values(), nx.get_node_attributes(G, "y").values()
elevation = py3dep.elevation_bycoords(list(zip(x, y)), crs="epsg:4326", source="airmap")
nx.set_node_attributes(G, dict(zip(G.nodes(), elevation)), "elevation")
[8]:
nc = ox.plot.get_node_colors_by_attr(G, "elevation", cmap="terrain")
fig, ax = ox.plot_graph(
G,
node_color=nc,
node_size=10,
save=True,
bgcolor="w",
filepath="_static/street_elev.png",
dpi=100,
)

Note that this function gets the elevation data from an elevation map of the bounding box of all the coordinates, so the larger the extent of this bounding box, the longer it takes the function to get the data.
Additionally, we can get the elevations for a set of x- and y-coordinates of a grid. For example, let's get the minimum temperature data within the watershed from Daymet using PyDaymet, then add the elevation as a new variable to the dataset:
[9]:
clm = daymet.get_bygeom(geometry, ("2005-01-01", "2005-01-31"), variables="tmin")
elev = py3dep.elevation_bygrid(clm.x.values, clm.y.values, clm.crs, 1000)
attrs = clm.attrs
clm = xr.merge([clm, elev])
clm["elevation"] = clm.elevation.where(~np.isnan(clm.isel(time=0).tmin), drop=True)
clm.attrs.update(attrs)
[10]:
fig, (ax1, ax2) = plt.subplots(nrows=1, ncols=2, figsize=(15, 5))
clm.tmin.isel(time=10).plot(ax=ax1)
clm.elevation.plot(ax=ax2)
[10]:
<matplotlib.collections.QuadMesh at 0x1afc96380>

This page was generated from notebooks/aridity.ipynb.
Aridity#
PyDaymet computes Potential Evapotranspiration (PET) based on the FAO Penman-Monteith equation using the tmin (deg C), tmax (deg C), vp (Pa), srad (W/m^2), and dayl (s) variables from Daymet:
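A minimal hedged sketch of such a request for a single point (the coordinate is illustrative, and it assumes get_bycoords accepts the same pet argument as get_bygeom):
import pydaymet as daymet

# Illustrative (lon, lat) point in EPSG:4326; one year of daily PET via FAO Penman-Monteith.
pet_point = daymet.get_bycoords((-69.31, 45.17), ("2006-01-01", "2006-12-31"), pet="penman_monteith")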
Basin#
Let’s start by getting the basin geometry of USGS-01031500 station (Piscataquis River near Dover-Foxcroft, Maine)
[1]:
import os
import warnings
from pathlib import Path
import geopandas as gpd
import pynhd as nhd
warnings.filterwarnings("ignore", message=".*initial implementation of Parquet.*")
root = Path("input_data")
os.makedirs(root, exist_ok=True)
BASE_PLOT = {"facecolor": "k", "edgecolor": "b", "alpha": 0.2, "figsize": (18, 9)}
CRS = "esri:102008"
station_id = "01031500"
nldi = nhd.NLDI()
cfile = Path(root, f"basin_{station_id}.feather")
if cfile.exists():
basin = gpd.read_feather(cfile)
else:
basin = nldi.get_basins(station_id)
basin.to_feather(cfile)
[2]:
ax = basin.to_crs(CRS).plot(**BASE_PLOT)
ax.axis("off")
ax.margins(0)

Precipitation and Potential Evapotranspiration#
[3]:
import pydaymet as daymet
from tqdm.notebook import tqdm
years = list(range(2006, 2016))
geometry = basin.iloc[0].geometry
for yr in tqdm(years, desc="Getting Climate"):
cfile = Path(root, f"clm_{yr}.nc")
if cfile.exists():
continue
daymet.get_bygeom(geometry, yr, variables="prcp", pet="hargreaves_samani").to_netcdf(cfile)
[4]:
import xarray as xr
clm = xr.open_mfdataset(Path(root).glob("clm_*.nc"), coords="minimal")
clm["elevation"] = clm["elevation"].isel(time=0, drop=True)
clm
[4]:
<xarray.Dataset>
Dimensions:    (time: 3650, y: 37, x: 40)
Coordinates:
  * time       (time) datetime64[ns] 2006-01-01T12:00:00 ... 2015-12-31T12:00:00
  * y          (y) float32 719.0 718.0 717.0 716.0 ... 686.0 685.0 684.0 683.0
  * x          (x) float32 2.21e+03 2.211e+03 2.212e+03 ... 2.248e+03 2.249e+03
Data variables:
    dayl       (time, y, x) float32 dask.array<chunksize=(365, 37, 40), meta=np.ndarray>
    lat        (time, y, x) float64 dask.array<chunksize=(365, 37, 40), meta=np.ndarray>
    lon        (time, y, x) float64 dask.array<chunksize=(365, 37, 40), meta=np.ndarray>
    prcp       (time, y, x) float32 dask.array<chunksize=(365, 37, 40), meta=np.ndarray>
    srad       (time, y, x) float32 dask.array<chunksize=(365, 37, 40), meta=np.ndarray>
    tmax       (time, y, x) float32 dask.array<chunksize=(365, 37, 40), meta=np.ndarray>
    tmin       (time, y, x) float32 dask.array<chunksize=(365, 37, 40), meta=np.ndarray>
    vp         (time, y, x) float32 dask.array<chunksize=(365, 37, 40), meta=np.ndarray>
    elevation  (y, x) float64 dask.array<chunksize=(37, 40), meta=np.ndarray>
    pet        (time, y, x) float64 dask.array<chunksize=(365, 37, 40), meta=np.ndarray>
Attributes: (12/17)
    start_year:          2006
    source:              Daymet Software Version 4.0
    Version_software:    Daymet Software Version 4.0
    Version_data:        Daymet Data Version 4.0
    Conventions:         CF-1.6
    citation:            Please see http://daymet.ornl.gov/ for current Dayme...
    ...                  ...
    geospatial_lon_max:  -69.13309404831536
    crs:                 +proj=lcc +lat_1=25 +lat_2=60 +lat_0=42.5 +lon_0=-10...
    nodatavals:          0.0
    transform:           [ 1.00000e+00 0.00000e+00 2.20625e+03 0.00000e+00...
    res:                 [1. 1.]
    bounds:              [2209.24651856 682.07851575 2250.2123966 719.9719...
[5]:
clm.elevation.plot()
[5]:
<matplotlib.collections.QuadMesh at 0x1af159840>

[6]:
import hvplot.xarray # noqa
clm.hvplot.violin(y="pet", by="time.month")
[6]:
Aridity#
[7]:
variables = {"pet": "Mean Annual PET", "prcp": "Mean Annual P"}
data = {}
for v, n in variables.items():
data[v] = clm[v].groupby("time.year").sum().mean(dim="year")
data[v] = data[v].where(data[v] > 0)
data[v] = data[v].rename(n)
data[v] = data[v].assign_attrs(unit="mm/year")
[8]:
aridity = data["pet"] / data["prcp"]
aridity = aridity.rename("Aridity Index")
aridity.plot()
[8]:
<matplotlib.collections.QuadMesh at 0x1b2454340>

This page was generated from notebooks/columbia.ipynb.
[1]:
import warnings
from pathlib import Path
warnings.filterwarnings("ignore", message=".*initial implementation of Parquet.*")
root = Path("input_data")
root.mkdir(parents=True, exist_ok=True)
BASE_PLOT = {"facecolor": "k", "edgecolor": "b", "alpha": 0.2, "figsize": (18, 9)}
CRS = "esri:102008"
Dams in Columbia River Network#
Basin#
[2]:
import geopandas as gpd
import pynhd as nhd
nldi = nhd.NLDI()
station_id = "14246900"
cfile = Path(root, f"basin_{station_id}.feather")
if cfile.exists():
basin = gpd.read_feather(cfile)
else:
basin = nldi.get_basins(station_id)
basin.to_feather(cfile)
Main#
[3]:
cfile = Path(root, f"flowline_main_{station_id}.feather")
if cfile.exists():
flw_main = gpd.read_feather(cfile)
else:
flw_main = nldi.navigate_byid(
fsource="nwissite",
fid=f"USGS-{station_id}",
navigation="upstreamMain",
source="flowlines",
distance=2000,
)
flw_main.to_feather(cfile)
Tributaries#
[4]:
cfile = Path(root, f"flowline_trib_{station_id}.feather")
if cfile.exists():
flw_trib = gpd.read_feather(cfile)
else:
flw_trib = nldi.navigate_byid(
fsource="nwissite",
fid=f"USGS-{station_id}",
navigation="upstreamTributaries",
source="flowlines",
distance=2000,
)
flw_trib.to_feather(cfile)
flw_trib["nhdplus_comid"] = flw_trib["nhdplus_comid"].astype("float").astype("Int64")
[5]:
ax = basin.plot(**BASE_PLOT)
flw_trib.plot(ax=ax)
flw_main.plot(ax=ax, lw=3, color="r")
ax.legend(["Tributaries", "Main"])
ax.axis("off")
ax.margins(0)

Accumulated Dams#
[6]:
import pandas as pd
cfile = Path(root, "nid_flw.pkl")
if cfile.exists():
nid_flw = pd.read_pickle(cfile)
else:
meta = nhd.nhdplus_attrs()
nid_years = (
meta[meta.description.str.contains("dam", case=False)].sort_values("name").name.tolist()
)
nid_flw = {n.split("_")[-1]: nhd.nhdplus_attrs(n) for n in nid_years}
pd.to_pickle(nid_flw, cfile)
Now, let’s see what catchment-level characteristics are available that are related to dams.
[7]:
div_chars = nldi.get_validchars("div")
div_chars[div_chars.characteristic_description.str.contains("dam")]
[7]:
 | characteristic_description | units | dataset_label | dataset_url | theme_label | theme_url | characteristic_type |
---|---|---|---|---|---|---|---|
ACC_NDAMS2013 | Number of dams built on or before YYYY , | count | Number of major dams built on or before YYYY i... | https://www.sciencebase.gov/catalog/item/58c30... | Hydrologic Modifications | unknown | divRoute_name |
ACC_NID_STORAGE2013 | The maximum dam storage (in acre-feet) defined... | acre-feet | Number of major dams built on or before YYYY i... | https://www.sciencebase.gov/catalog/item/58c30... | Hydrologic Modifications | unknown | divRoute_name |
ACC_NORM_STORAGE2013 | The normal dam storage (in acre-feet) defined ... | acre-feet | Number of major dams built on or before YYYY i... | https://www.sciencebase.gov/catalog/item/58c30... | Hydrologic Modifications | unknown | divRoute_name |
ACC_MAJOR2013 | Number of major dams built on or before YYYY i... | count | Number of major dams built on or before YYYY i... | https://www.sciencebase.gov/catalog/item/58c30... | Hydrologic Modifications | unknown | divRoute_name |
Let’s get ACC_NID_STORAGE2013
:
We can achieve the same results using another function that uses ScienceBase
instead of NLDI
:
nid_vals = nldi.getcharacteristic_byid(flw_trib.nhdplus_comid.tolist(), "div", "ACC_NID_STORAGE2013")
comids = [int(c) for c in flw_trib.nhdplus_comid.tolist()]
nid_vals = {
yr: df.loc[df.COMID.isin(comids), ["COMID", f"ACC_NID_STORAGE{yr}", f"ACC_NDAMS{yr}"]].rename(
columns={
"COMID": "comid",
f"ACC_NID_STORAGE{yr}": "smax",
f"ACC_NDAMS{yr}": "ndams",
}
)
for yr, df in nid_flw.items()
}
nid_vals = pd.concat(nid_vals).reset_index().drop(columns="level_1")
nid_vals = nid_vals.rename(columns={"level_0": "year"}).astype({"year": int})
[8]:
comids = [int(c) for c in flw_trib.nhdplus_comid.tolist()]
nid_vals = {
yr: df.loc[df.COMID.isin(comids), ["COMID", f"ACC_NID_STORAGE{yr}", f"ACC_NDAMS{yr}"]].rename(
columns={
"COMID": "comid",
f"ACC_NID_STORAGE{yr}": "smax",
f"ACC_NDAMS{yr}": "ndams",
}
)
for yr, df in nid_flw.items()
}
nid_vals = pd.concat(nid_vals).reset_index().drop(columns="level_1")
nid_vals = nid_vals.rename(columns={"level_0": "year"}).astype({"year": int})
Accumulated Max Storage#
[9]:
nid_vals = (
nid_vals.set_index("comid")
.merge(
flw_trib.astype({"nhdplus_comid": int}).set_index("nhdplus_comid"),
left_index=True,
right_index=True,
suffixes=(None, None),
)
.reset_index()
.rename(columns={"index": "comid"})
)
smax = nid_vals.groupby(["year", "comid"]).sum()["smax"].unstack()
smax = gpd.GeoDataFrame(
smax.T.merge(
flw_trib.astype({"nhdplus_comid": int}).set_index("nhdplus_comid"),
left_index=True,
right_index=True,
suffixes=(None, None),
)
)
[10]:
yr = 2013
ax = basin.plot(**BASE_PLOT)
smax.plot(ax=ax, scheme="Quantiles", k=2, column=yr, cmap="coolwarm", lw=0.5, legend=False)
ax.set_title(f"Accumulated Maximum Storage Capacity of Dams up to {yr}")
ax.axis("off")
ax.margins(0)

This page was generated from notebooks/cross_section.ipynb.
River Elevation and Cross-Section#
[1]:
import geopandas as gpd
import numpy as np
import pandas as pd
import py3dep
import pygeoogc.utils as ogc_utils
import pynhd
from pynhd import NHD, NLDI
from shapely.geometry import Point
[2]:
import warnings
warnings.filterwarnings("ignore", message=".*Index.*")
We can retrieve the elevation profiles of the tributaries of a given USGS station ID using PyNHD and Py3DEP. For this purpose, we get the elevation data for points along the tributaries' flowlines every one kilometer. Note that since the distance is in meters, we reproject the geospatial data into ESRI:102003.
[3]:
CRS = "ESRI:102003"
station_id = "01031500"
distance = 1000 # in meters
First, let’s get the basin geometry and tributaries for the station.
[4]:
nldi = NLDI()
basin = nldi.get_basins(station_id)
flw = nldi.navigate_byid(
fsource="nwissite",
fid=f"USGS-{station_id}",
navigation="upstreamTributaries",
source="flowlines",
distance=1000,
)
flw = flw.set_index("nhdplus_comid").to_crs(CRS)
Now, we can compute the number of points along each river segment based on the target 1-km distance. Note that since we want to sample the data from a DEM in EPSG:4326, we need to reproject the sample points from ESRI:102003 to EPSG:4326. We use pygeoogc.utils.match_crs to do so.
[5]:
flw["n_points"] = np.ceil(flw.length / distance).astype("int")
flw_points = [
(
n,
ogc_utils.match_crs(
[
(p[0][0], p[1][0])
for p in (
l.interpolate(x, normalized=True).xy
for x in np.linspace(0, 1, pts, endpoint=False)
)
],
CRS,
"EPSG:4326",
),
)
for n, l, pts in flw.itertuples(name=None)
]
We use Py3DEP
to get elevation data for these points. We can either use py3dep.elevation_bycoords
to get elevations directly from the Elevation Point Query Service (EPQS) at 10-m resolution or use py3dep.get_map
to get the elevation data for the entire basin and then sample it for the target locations.
Using py3dep.elevation_bycoords
is simpler but the EPQS service is a bit unstable and slow. Here’s the code to get the elevations using EPQS:
flw_elevation = [(n, py3dep.elevation_bycoords(pts, CRS)) for n, pts in flw_points]
Let’s use py3dep.get_map
instead since we can control the resolution and reuse it later on.
[6]:
dem = py3dep.get_map("DEM", basin.geometry[0].bounds, resolution=10)
Additionally, if RichDEM is installed, we can use the py3dep.fill_depressions function to fill the DEM's depressions.
[7]:
dem = py3dep.fill_depressions(dem)
dem.plot()
A Priority-Flood (Zhou2016 version)
C Zhou, G., Sun, Z., Fu, S., 2016. An efficient variant of the Priority-Flood algorithm for filling depressions in raster digital elevation models. Computers & Geosciences 90, Part A, 87 – 96. doi:http://dx.doi.org/10.1016/j.cageo.2016.02.021
t Zhou2016 wall-time = 1.46397 s
A Barnes (2014) Flat Resolution Flat Mask Generation
C Barnes, R., Lehman, C., Mulla, D., 2014a. An efficient assignment of drainage direction over flat surfaces in raster digital elevation models. Computers & Geosciences 62, 128–135. doi:10.1016/j.cageo.2013.01.009
t Succeeded in = 0.082256 s
p Setting up labels matrix...
p Setting up flat resolution mask...
p Searching for flats...
t Succeeded in = 0.141309 s
m Cells with no flow direction = 765370
m Low edge cells = 73178
m High edge cells = 289378
p Labeling flats...
m Unique flats = 71561
p Removing flats without outlets from the queue...
The flat height vector will require approximately 0MB of RAM.
p Creating flat height vector...
p Performing Barnes flat resolution's away gradient...
t Succeeded in = 0.0502031 s
p Barnes flat resolution: toward and combined gradients...
t Succeeded in = 0.117537 s
t Wall-time = 0.504376 s
A Barnes (2014) Flat Resolution (DEM modification)...
C Barnes, R., Lehman, C., Mulla, D., 2014a. An efficient assignment of drainage direction over flat surfaces in raster digital elevation models. Computers & Geosciences 62, 128–135. doi:10.1016/j.cageo.2013.01.009
W Cells inappropriately raised above surrounding terrain = 9328
t Succeeded in = 0.404791 s
[7]:
<matplotlib.collections.QuadMesh at 0x1a9270c10>

[8]:
flw_elevation = [
(n, [dem.interp(x=[x], y=[y]).values[0][0] for x, y in pts]) for n, pts in flw_points
]
Now that we have both the coordinates and their elevation data for the points, we can create a new dataframe and plot the results.
[9]:
points = pd.DataFrame(flw_points).set_index(0)[1].explode()
points = pd.DataFrame([(n, Point(p)) for n, p in points.items()])
points = points.rename(columns={0: "comid", 1: "geometry"}).set_index("comid")
points = gpd.GeoDataFrame(points, crs="EPSG:4326").to_crs(CRS)
elevations = pd.DataFrame(flw_elevation).rename(columns={0: "comid", 1: "elevation"})
elevations = elevations.set_index("comid")["elevation"].explode().astype("float")
points["elevation"] = elevations
[10]:
ax = basin.to_crs(CRS).plot(figsize=(10, 10), facecolor="none", edgecolor="black")
points.plot(ax=ax, column="elevation", markersize=20, legend=True, scheme="quantiles")
flw.plot(ax=ax, color="b", linewidth=0.8, alpha=0.8)
ax.set_axis_off()

Let’s extract corss-section profiles along the main river of the basin. We get cross-section profiles at every 3 km along the main river and within a buffer distance of 2 km.
[11]:
distance = 2000 # in meters
width = 2000 # in meters
half_width = width * 0.5
First, we need to get the main river’s flowlines.
[12]:
main = nldi.navigate_byid(
fsource="nwissite",
fid=f"USGS-{station_id}",
navigation="upstreamMain",
source="flowlines",
distance=1000,
)
main = main.set_index("nhdplus_comid").to_crs(CRS)
[13]:
nhd = NHD("flowline_mr")
main_nhd = nhd.byids("COMID", main.index.tolist())
main_nhd = pynhd.prepare_nhdplus(main_nhd, 0, 0, 0, purge_non_dendritic=True).to_crs(CRS)
cs = pynhd.network_xsection(main_nhd, distance, width)
Let’s plot the obtained cross-section lines.
[14]:
ax = cs.plot(color="r", figsize=(10, 10))
main.plot(ax=ax, color="b", linewidth=1, alpha=0.8)
flw.plot(ax=ax, color="b", linewidth=0.5, alpha=0.8)
ax.set_axis_off()

Using the obtained cross-section lines, we can compute the cross-section profiles every 100 meters along the cross-section lines following the same procedure as before.
[17]:
distance = 100 # in meters
cs["n_points"] = np.ceil(cs.length / distance).astype("int")
cs_points = [
(
n,
ogc_utils.match_crs(
[
(p[0][0], p[1][0])
for p in (
l.interpolate(x, normalized=True).xy
for x in np.linspace(0, 1, pts, endpoint=False)
)
],
CRS,
"EPSG:4326",
),
)
for n, l, pts in cs.itertuples(name=None)
]
cs_elevation = [
(n, [dem.interp(x=[p[0]], y=[p[1]]).values[0][0] for p in pts]) for n, pts in cs_points
]
[18]:
points = pd.DataFrame(cs_points).set_index(0)[1].explode()
points = pd.DataFrame([(n, Point(p)) for n, p in points.items()])
points = points.rename(columns={0: "comid", 1: "geometry"}).set_index("comid")
points = gpd.GeoDataFrame(points, crs="EPSG:4326").to_crs(CRS)
elevations = pd.DataFrame(cs_elevation).rename(columns={0: "comid", 1: "elevation"})
elevations = elevations.set_index("comid")["elevation"].explode().astype("float")
points["elevation"] = elevations
[19]:
ax = basin.to_crs(CRS).plot(figsize=(10, 10), facecolor="none", edgecolor="black")
main.plot(ax=ax, color="b", linewidth=0.8, alpha=0.8)
points.plot(ax=ax, column="elevation", markersize=20, legend=True, scheme="quantiles")
cs.plot(ax=ax, color="b", linewidth=0.8, alpha=0.8)
ax.set_axis_off()

This page was generated from notebooks/dam_impact.ipynb.
Dam Impact on streamflow#
We’re going to take a look at the impact of building dams on their downstream streamflow. Here are some example peer-reviewed papers on this topic.
We set the area of interest (AOI) to Texas and study dams that have been built in the 1995-2005 period.
Workflow: - Get gauge stations within Texas that have enough streamflow data - Get dams within Texas that have been built have been built between 1995-2005 - Filter out stations that don’t have any dams in their upstream up to 10 km - Get streamflow data for the remaining stations - Visually inspect the streamflow data for dam impacts - Check land use/land cover type around the dams
Streamflow Gauges#
First, we use the National Water Information System (NWIS) service to check streamflow data availability in our AOI over a window that extends 10 years before and after the period of our study, i.e., the 1985-2015 period.
Let’s start by importing the required libraries.
[1]:
import geopandas as gpd
import numpy as np
import pandas as pd
import proplot as pplt
import pygeohydro as gh
from pygeohydro import NID, NWIS
from pynhd import NLDI, WaterData, ZeroMatched
pplt.rc["figure.facecolor"] = "w"
[2]:
start = "1985-01-01"
end = "2015-01-01"
texas = gh.helpers.get_us_states("contiguous")
texas = texas[texas.STUSPS == "TX"]
[3]:
_, ax = pplt.subplots(refwidth=7)
texas.plot(ax=ax, facecolor="b", edgecolor="r", linewidth=2, alpha=0.8, figsize=(10, 10))
ax.set_axis_off()

[4]:
nwis = NWIS()
query = {
"stateCd": "TX",
"startDt": start,
"endDt": end,
"outputDataTypeCd": "dv", # daily values
"hasDataTypeCd": "dv", # daily values
"parameterCd": "00060", # discharge
}
sites = nwis.get_info(query)
sites.shape[0]
[4]:
3856
We can see that there are 3856 streamflow gauge stations in Texas that fit our criteria.
[5]:
_, ax = pplt.subplots(refwidth=7)
texas.plot(ax=ax, facecolor="b", edgecolor="r", linewidth=2, alpha=0.8, figsize=(10, 10))
sites.plot(ax=ax, marker="*", markersize=90, color="w", edgecolor="k", label="NWIS sites")
ax.legend(loc="upper left")
ax.set_axis_off()

Now, let’s filter these stations a step further to include only those stations that have around 30 years of daily streamflow data with drainage area of larger than 10 km\(^2\) that have been impacted by human activities.
[6]:
sites = sites[
(sites.parm_cd == "00060") # discharge
& (sites.stat_cd == "00003") # mean
& (sites.count_nu >= 30 * 365) # at least 30 years of data
& ~sites.hcdn_2009 # not pristine
& (sites.drain_sqkm > 10)
].copy()
sites.shape[0]
[6]:
280
Upon applying this filter, we’re left with 280 stations.
[7]:
_, ax = pplt.subplots(refwidth=7)
texas.plot(ax=ax, facecolor="b", edgecolor="r", linewidth=2, alpha=0.8, figsize=(10, 10))
sites.plot(ax=ax, marker="*", markersize=90, color="w", edgecolor="k", label="NWIS sites")
ax.legend(loc="upper left")
ax.set_axis_off()

Dams#
Next, we need to retrieve the dams in Texas that have been built between 1995 and 2005. We use National Inventory of Dams (NID) web service for this purpose. First, let’s check out the fields that NID exposes through their web service.
[8]:
nid = NID()
nid.valid_fields[nid.valid_fields.str.contains("year", case=False)]
[8]:
70 yearCompleted
71 yearCompletedId
83 yearsModified
Name: name, dtype: object
[9]:
nid.valid_fields[nid.valid_fields.str.contains("state", case=False)]
[9]:
13 stateFedId
17 stateRegulatedId
19 stateRegulatoryAgency
30 state
31 cityState
38 countyState
191 statesId
193 stateKey
Name: name, dtype: object
Based on these, we should use the stateKey
and yearCompleted
fields.
[10]:
dam_list = nid.get_byfilter(
[
{
"stateKey": ["TX"],
"yearCompleted": ["[1995 2005]"],
},
],
)
dams = dam_list[0]
[11]:
_, ax = pplt.subplots(refwidth=7)
texas.plot(ax=ax, facecolor="b", edgecolor="r", linewidth=2, alpha=0.8, figsize=(10, 10))
sites.plot(ax=ax, marker="*", markersize=110, color="w", edgecolor="k", label="NWIS sites")
dams.plot(ax=ax, marker="o", markersize=20, color="g", edgecolor="k", label="NID dams")
ax.legend(loc="upper left")
ax.set_axis_off()

As is evident from the plot above, there are many stations that don't have any dams in their vicinity. One way to eliminate these stations is to use a spatial query based on a search radius. We can estimate a suitable search radius based on the upstream drainage area distribution of the streamflow gauges.
[12]:
ax = sites.hist("drain_sqkm", figsize=(4, 2.5), bins="auto")
_ = ax[0][0].set_title("Drainage Area (km$^2$) Histogram")

We can see that most stations have a drainage area of less than 15000 km². Since they're not very large, a search radius of 10 km should be sufficient. Now, we define a function that carries out an efficient spatial query to find the stations that have at least one dam within a 10-km radius.
[13]:
def distance_filter(gdf1, gdf2, dist_km):
"""Filter gdf1 to only include points within distance of gdf2."""
buff = gdf1.to_crs(5070).buffer(dist_km * 1e3)
idx2, idx1 = buff.sindex.query_bulk(gdf2.to_crs(5070).geometry)
return gdf1.iloc[pd.unique(idx1)].reset_index(drop=True), gdf2.iloc[
pd.unique(idx2)
].reset_index(drop=True)
[14]:
sites, dams = distance_filter(sites, dams, 10)
ax = sites.hist("drain_sqkm", figsize=(4, 2.5), bins="auto")
_ = ax[0][0].set_title("Drainage Area (km$^2$) Histogram")

[15]:
_, ax = pplt.subplots(refwidth=7)
texas.plot(ax=ax, facecolor="b", edgecolor="r", linewidth=2, alpha=0.8, figsize=(10, 10))
sites.plot(ax=ax, marker="*", markersize=110, color="w", edgecolor="k", label="NWIS sites")
dams.plot(ax=ax, marker="o", markersize=20, color="g", edgecolor="k", label="NID dams")
ax.legend(loc="upper left")
ax.set_axis_off()

Flowlines and Hydrolinking#
So far, we obtained only the stations that have at least one dam within a 10-km radius, but we haven't determined whether those dams are upstream or downstream of them. We use the Hydro Network-Linked Data Index (NLDI) web service to obtain the upstream flowlines of the streamflow gauges up to 10 km. Note that this 10 km is the distance along the flowlines.
[16]:
nldi = NLDI()
[17]:
flw_up = {}
noflw = []
for agency, fid in sites[["agency_cd", "site_no"]].itertuples(index=False, name=None):
try:
flw_up[fid] = nldi.navigate_byid(
fsource="nwissite",
fid=f"{agency}-{fid}",
navigation="upstreamTributaries",
source="flowlines",
distance=10,
)
except ZeroMatched:
noflw.append(fid)
noflw
[17]:
['08058900', '08059400']
Note that two of our stations are not indexed by the NLDI service. There could be several reasons for this, but in this tutorial we simply ignore them.
[18]:
sites = sites[sites.site_no.isin(flw_up)].copy()
sites, dams = distance_filter(sites, dams, 10)
We now have flowlines and dams that are closer than 10 km, but we still don't have a way of linking these features. For this purpose, we use NHDPlus's Common Identifier (comid). One way to obtain the first downstream flowline of a coordinate is to use NLDI's hydrolocation endpoint.
[19]:
coords = list(dams[["longitude", "latitude"]].astype(float).itertuples(index=False, name=None))
nid_comids = nldi.comid_byloc(coords)
missing = []
if isinstance(nid_comids, tuple):
nid_comids, missing = nid_comids
if missing:
coords_str = [str(c) for c in coords]
dams = dams.drop(index=[coords_str.index(i) for i in missing])
sites, dams = distance_filter(sites, dams, 10)
dams["nhdplus_comid"] = nid_comids.comid
1 of 37 inputs didn't return any features. They are returned as a list.
[20]:
coords = list(sites[["dec_long_va", "dec_lat_va"]].astype(float).itertuples(index=False, name=None))
site_comids = nldi.comid_byloc(coords)
missing = []
if isinstance(site_comids, tuple):
site_comids, missing = site_comids
if missing:
coords_str = [str(c) for c in coords]
sites = sites.drop(index=[coords_str.index(i) for i in missing])
sites, dams = distance_filter(sites, dams, 10)
sites["nhdplus_comid"] = site_comids.comid
We can then use these obtained comids to hydrolink the dams, stations, and flowlines.
[21]:
flw_all = pd.concat(flw_up.values(), keys=flw_up.keys()).reset_index().drop(columns="level_1")
flw_all = gpd.GeoDataFrame(
flw_all.rename(columns={"level_0": "usgs_id"}), crs=flw_up["07312200"].crs
)
flw_all.head()
[21]:
 | usgs_id | geometry | nhdplus_comid |
---|---|---|---|
0 | 07312200 | LINESTRING (-98.91447 33.90670, -98.91436 33.9... | 13730379 |
1 | 07312200 | LINESTRING (-98.98054 33.88001, -98.98019 33.8... | 13730385 |
2 | 07312200 | LINESTRING (-98.90898 33.91041, -98.90892 33.9... | 13730365 |
3 | 07312200 | LINESTRING (-98.92151 33.91425, -98.92038 33.9... | 13730363 |
4 | 07312200 | LINESTRING (-98.90453 33.92514, -98.90453 33.9... | 13730361 |
Finally, upon hydrolinking these features we can determine the stations that have at least one dam in their upstream up to 10 km.
[22]:
flw_dam = pd.merge(
flw_all[["usgs_id", "nhdplus_comid"]],
dams[["id", "nhdplus_comid"]].rename(columns={"id": "dam_id"}),
on="nhdplus_comid",
how="left",
)
flw_dam = flw_dam[~flw_dam.dam_id.isna()].reset_index(drop=True)
flw_dam["dam_id"] = flw_dam.dam_id.astype(int)
flw_dam
[22]:
 | usgs_id | nhdplus_comid | dam_id |
---|---|---|---|
0 | 07312200 | 13730353 | 543785 |
1 | 07312200 | 13730353 | 543855 |
2 | 08095000 | 5531532 | 544204 |
3 | 08095000 | 5531532 | 544213 |
4 | 08158920 | 5781345 | 542221 |
5 | 08158920 | 5781345 | 543202 |
6 | 08057445 | 1260363 | 543218 |
7 | 08177300 | 1638809 | 544499 |
[23]:
site_dam = flw_dam.usgs_id.unique()
site_dam
[23]:
array(['07312200', '08095000', '08158920', '08057445', '08177300'],
dtype=object)
Streamflow Observations#
Upon finalizing the stations that satisfy our criteria, we use NWIS to obtain the daily streamflow data. But first, we need to obtain the year in which each dam's construction was completed, so we can specify the required streamflow data period exactly.
[24]:
dams_info = nid.inventory_byid(flw_dam.dam_id.to_list())
dams_info.head()
[24]:
 | id | associatedStructuresCount | conditionAssessDate | conditionAssessDetail | conditionAssessId | congRepresentative | damHeight | damLength | distance | drainageArea | ... | ownerTypeIds | primaryDamTypeId | primaryOwnerTypeId | primaryPurposeId | purposeIds | dsacId | femaRegion | lastEapExcerDate | dataUpdated | geometry |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 543785 | 0 | None | NOT RATED | 6 | Mac Thornberry (R) | 29.0 | 983 | 20 | 0.00 | ... | 1 | 8 | 1 | 2 | 2 | None | 6 | None | 2021-05-24T12:00:00.000Z | POINT (-98.91833 33.92000) |
1 | 543855 | 0 | None | NOT RATED | 5 | Mac Thornberry (R) | 27.0 | 0 | 0 | 0.20 | ... | 1 | 8 | 1 | 2 | 2 | None | 6 | None | 2018-07-23T12:00:00.000Z | POINT (-98.91696 33.92299) |
2 | 544204 | 0 | None | NOT RATED | 5 | Roger Williams (R) | 18.6 | 130 | 0 | 150.00 | ... | 5 | 11 | 5 | 2 | 2;8 | None | 6 | None | 2018-07-23T12:00:00.000Z | POINT (-97.57644 31.79505) |
3 | 544213 | 0 | 2015-12-14T12:00:00.000Z | SATISFACTORY | 1 | Roger Williams (R) | 47.0 | 1300 | 0 | 0.36 | ... | 5 | 8 | 5 | 8 | 8 | None | 6 | None | 2018-07-23T12:00:00.000Z | POINT (-97.60174 31.78104) |
4 | 542221 | 0 | 2016-10-13T12:00:00.000Z | POOR | 3 | Chip Roy (R) | 26.0 | 730 | 0 | 1.34 | ... | 5 | 4 | 5 | 9 | 9 | None | 6 | None | 2018-07-23T12:00:00.000Z | POINT (-97.89934 30.24614) |
5 rows × 90 columns
We use the yearCompleted
field to determine the required study period by looking at 15 years before and after the year completed for each dam. The reason that we opted for 15 years is the fact that during the construction the streamflow is temporarily diverted from its original path and also upon completion it usually takes some time for dams to be filled and reach their operational capacity. So we consider a 5-year buffer around the year completed.
[25]:
flw_dam = pd.merge(
flw_dam,
dams_info[["id", "nidStorage", "nidHeight", "yearCompleted"]].rename(columns={"id": "dam_id"}),
on="dam_id",
how="left",
)
flw_dam["start"] = pd.to_datetime(flw_dam.yearCompleted - 15, format="%Y")
flw_dam["end"] = pd.to_datetime(flw_dam.yearCompleted + 15, format="%Y") + pd.DateOffset(
years=1, days=-1
)
flw_dam.head()
[25]:
 | usgs_id | nhdplus_comid | dam_id | nidStorage | nidHeight | yearCompleted | start | end |
---|---|---|---|---|---|---|---|---|
0 | 07312200 | 13730353 | 543785 | 22 | 30.0 | 1997 | 1982-01-01 | 2012-12-31 |
1 | 07312200 | 13730353 | 543855 | 135 | 30.0 | 1997 | 1982-01-01 | 2012-12-31 |
2 | 08095000 | 5531532 | 544204 | 410 | 22.6 | 1998 | 1983-01-01 | 2013-12-31 |
3 | 08095000 | 5531532 | 544213 | 731 | 73.0 | 2000 | 1985-01-01 | 2015-12-31 |
4 | 08158920 | 5781345 | 542221 | 156 | 26.0 | 1999 | 1984-01-01 | 2014-12-31 |
We then use these dates to obtain the streamflow.
[26]:
dates = (flw_dam.start.min(), flw_dam.end.max())
qobs = nwis.get_streamflow(site_dam, dates)
Now, let’s take a look at the data and assess the station by visual inspection. Note the there are several methods for quantifying the impact of dams on streamflow such as Reservoir Index, but in this tutorial we just rely on visual inspection for brevity.
[27]:
ma = qobs.groupby(pd.Grouper(freq="Y")).mean()
yc = flw_dam.set_index("usgs_id").yearCompleted
fig, axs = pplt.subplots(nrows=ma.shape[1], refwidth=5, refaspect=3, sharex=True, sharey=False)
hs = [0, 1]
for i, s in enumerate(ma):
hs[0] = axs[i].plot(ma[s], label="Mean annual flow (cms)")
axs[i].format(ylim=(0, ma[s].quantile(0.95) * 1.7), ylabel=s)
yrs = yc.loc[s.replace("USGS-", "")]
yrs = [yrs] if isinstance(yrs, np.int64) else yrs
for yr in yrs:
hs[1] = axs[i].axvline(
pd.to_datetime(yr, format="%Y"),
color="r",
linestyle="--",
linewidth=2,
label="Dam Year Completed",
)
_ = fig.legend(hs, loc="t")

We can see that based on the available data and visual inspection, the first station shows a noticeable difference before and after the dam construction. Next we take a closer look at this station, USGS-07312200
.
Land Use/Land Cover (LULC)#
To gain a better understanding of the land features around and near the USGS-07312200
station and its dams, we use the LULC data. For this purpose, first, we need to obtain the local catchments up to 10 km upstream of the station. The WaterData web service has several layers one of which is local catchments. Let’s use the comids to obtain their corresponding catchments.
[28]:
station_id = ma.columns[0].replace("USGS-", "")
wd = WaterData("catchmentsp")
catchment = wd.byid("featureid", flw_up[station_id].nhdplus_comid.to_list())
catchment.head()
[28]:
 | geometry | gridcode | featureid | sourcefc | areasqkm | shape_length | shape_area |
---|---|---|---|---|---|---|---|
0 | MULTIPOLYGON (((-98.92139 33.91434, -98.92185 ... | 1347963 | 13730353 | NHDFlowline | 0.973724 | 0.059060 | 0.000095 |
1 | MULTIPOLYGON (((-98.96395 33.90708, -98.96382 ... | 1347704 | 13730355 | NHDFlowline | 5.234858 | 0.119837 | 0.000510 |
2 | MULTIPOLYGON (((-98.90715 33.90959, -98.90847 ... | 1347573 | 13730361 | NHDFlowline | 2.926106 | 0.087883 | 0.000285 |
3 | MULTIPOLYGON (((-98.92131 33.91219, -98.92150 ... | 1347533 | 13730363 | NHDFlowline | 0.600177 | 0.033255 | 0.000059 |
4 | MULTIPOLYGON (((-98.91456 33.90672, -98.91476 ... | 1347708 | 13730365 | NHDFlowline | 0.277969 | 0.036201 | 0.000027 |
[29]:
_, ax = pplt.subplots(refwidth=7)
catchment.plot(ax=ax, facecolor="none", edgecolor="k", lw=0.5)
flw_up[station_id].plot(ax=ax, color="b", linewidth=1.5, zorder=1)
sites[sites.site_no.isin([station_id])].plot(
ax=ax, edgecolor="k", c="r", marker="*", ms=150, label="Station"
)
dams[dams.nhdplus_comid.isin(flw_up[station_id].nhdplus_comid)].plot(
ax=ax, edgecolor="k", c="m", marker="^", ms=50, label="Dam"
)
ax.legend(loc="upper left")
ax.set_axis_off()

Now, we can retrieve the LULC data at 30-m resolution within the union of these catchments using the Multi-Resolution Land Characteristics (MRLC) Consortium web service.
[30]:
geom = gpd.GeoSeries([catchment.unary_union], crs=catchment.crs, index=[station_id])
nlcd = gh.nlcd_bygeom(geom, 30)[station_id]
nlcd
[30]:
<xarray.Dataset>
Dimensions:          (y: 268, x: 383)
Coordinates:
  * y                (y) float64 33.94 33.93 33.93 33.93 ... 33.86 33.86 33.86
  * x                (x) float64 -99.0 -99.0 -99.0 ... -98.87 -98.87 -98.87
    spatial_ref      int64 0
Data variables:
    canopy_2016      (y, x) float32 dask.array<chunksize=(268, 383), meta=np.ndarray>
    impervious_2019  (y, x) float32 dask.array<chunksize=(268, 383), meta=np.ndarray>
    descriptor_2019  (y, x) uint8 dask.array<chunksize=(268, 383), meta=np.ndarray>
    cover_2019       (y, x) uint8 dask.array<chunksize=(268, 383), meta=np.ndarray>
[31]:
meta = gh.helpers.nlcd_helper()
pd.Series(meta["classes"])
[31]:
11 Open Water - All areas of open water, generall...
12 Perennial Ice/Snow - All areas characterized b...
21 Developed, Open Space - Includes areas with a ...
22 Developed, Low Intensity -Includes areas with ...
23 Developed, Medium Intensity - Includes areas w...
24 Developed, High Intensity - Includes highly de...
31 Barren Land (Rock/Sand/Clay) - Barren areas of...
41 Deciduous Forest - Areas dominated by trees ge...
42 Evergreen Forest - Areas dominated by trees ge...
43 Mixed Forest - Areas dominated by trees genera...
45 Shrub-Forest - Areas identified as currently s...
46 Herbaceous-Forest - Areas identified as curren...
51 Dwarf Scrub - Alaska only areas dominated by s...
52 Shrub/Scrub - Areas dominated by shrubs; less ...
71 Grassland/Herbaceous - Areas dominated by gram...
72 Sedge/Herbaceous - Alaska only areas dominated...
73 Lichens - Alaska only areas dominated by fruti...
74 Moss - Alaska only areas dominated by mosses, ...
81 Pasture/Hay - Areas of grasses, legumes, or gr...
82 Cultivated Crops - Areas used for the producti...
90 Woody Wetlands - Areas where forest or shrub l...
95 Emergent Herbaceous Wetlands - Areas where per...
127 Background value
dtype: object
[32]:
cmap, norm, levels = gh.plot.cover_legends()
_, ax = pplt.subplots(refwidth=6, refaspect=2)
nlcd.cover_2019.plot(ax=ax, cmap=cmap, levels=levels, cbar_kwargs={"ticks": levels[:-1]})
ax.set_xlabel("")
ax.set_ylabel("")
ax.set_title("Land Use/Land Cover 2019")
flw_up[station_id].plot(ax=ax, color="b", linewidth=1.5, zorder=1)
sites[sites.site_no.isin([station_id])].plot(
ax=ax, edgecolor="k", c="r", marker="*", ms=150, label="Station"
)
dams[dams.nhdplus_comid.isin(flw_up[station_id].nhdplus_comid)].plot(
ax=ax, edgecolor="k", c="m", marker="^", ms=50, label="Dam"
)
ax.legend(loc="upper left")
ax.set_axis_off()

As is evident from the plot, there are mostly cultivated crops (82) and grasslands (71) close to the dams.
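To put rough numbers on this rather than relying on the map alone, pygeohydro's cover_statistics can summarize the class percentages of the cover layer (a hedged sketch; check the function's exact return type in your installed version):
# Summarize NLCD land-cover class percentages for the catchment union.
stats = gh.cover_statistics(nlcd.cover_2019)
stats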
Final Remarks#
This tutorial demonstrated the application of various web services for obtaining hydrolinked geospatial data. These web services are powerful tools that make big geospatial data more equitable, and the rich ecosystem of tools available in R and Python facilitates accessing them.
This page was generated from notebooks/daymet.ipynb.
Climate Data#
[1]:
import matplotlib.pyplot as plt
import pydaymet as daymet
from pynhd import NLDI
[2]:
import warnings
warnings.filterwarnings("ignore", message=".*Index.*")
The Daymet database provides climatology data at 1-km resolution. First, we use PyNHD to get the contributing watershed geometry of a NWIS station with the ID of USGS-01031500
:
[3]:
geometry = NLDI().get_basins("01031500").geometry[0]
PyDaymet allows us to get the data for a single pixel or for a region as gridded data. The function for getting a single pixel is called pydaymet.get_bycoords and the one for gridded data is called pydaymet.get_bygeom. The arguments of these functions are identical except for the first argument, where the latter should be a polygon and the former should be a coordinate (a tuple of length two, as in (x, y)).
The input geometry or coordinate can be in any valid CRS (defaults to EPSG:4326). The date argument can be either a tuple of length two like (start_str, end_str) or a list of years like [2000, 2005].
Additionally, we can pass time_scale to get daily, monthly, or annual summaries. This flag is set to daily by default. We can pass time_scale as daily, monthly, or annual to the get_bygeom or get_bycoords functions to download the respective summaries.
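For instance, a small hedged sketch combining the list-of-years form of the date argument with monthly summaries for a single, purely illustrative coordinate:
import pydaymet as daymet

# Illustrative (lon, lat) point; two years of monthly minimum temperature summaries.
tmin_monthly = daymet.get_bycoords((-69.31, 45.17), [2000, 2001], variables="tmin", time_scale="monthly")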
PyDaymet can also compute Potential EvapoTranspiration (PET) at the daily time scale using three methods: penman_monteith, priestley_taylor, and hargreaves_samani. Let's get the data for six days and plot PET.
[4]:
dates = ("2000-01-01", "2000-01-06")
daily = daymet.get_bygeom(geometry, dates, variables="prcp", pet="hargreaves_samani")
[5]:
ax = daily.where(daily.pet > 0).pet.plot(x="x", y="y", row="time", col_wrap=3)
ax.fig.savefig("_static/daymet_grid.png", facecolor="w", bbox_inches="tight")

Now, let’s get the monthly summaries for six months.
[6]:
var = ["prcp", "tmin"]
dates = ("2000-01-01", "2000-06-30")
monthly = daymet.get_bygeom(geometry, dates, variables=var, time_scale="monthly")
[7]:
ax = monthly.where(monthly.prcp > 0).prcp.plot(x="x", y="y", row="time", col_wrap=3)

Note that the default CRS is EPSG:4326. If the input geometry (or coordinate) is in a different CRS, we can pass it to the function. The gridded data are automatically masked to the input geometry. Now, let's get the data for a coordinate in the EPSG:3542 CRS.
[8]:
coords = (-1431147.7928, 318483.4618)
crs = "epsg:3542"
dates = ("2000-01-01", "2006-12-31")
annual = daymet.get_bycoords(
coords, dates, variables=["prcp", "tmin"], crs=crs, time_scale="annual"
)
[9]:
fig = plt.figure(figsize=(6, 4), facecolor="w")
gs = fig.add_gridspec(1, 2)
axes = gs[:].subgridspec(2, 1, hspace=0).subplots(sharex=True)
annual["tmin (degrees C)"].plot(ax=axes[0], color="r")
axes[0].set_ylabel(r"$T_{min}$ ($^\circ$C)")
axes[0].xaxis.set_ticks_position("none")
annual["prcp (mm/day)"].plot(ax=axes[1])
axes[1].set_ylabel("$P$ (mm/day)")
plt.tight_layout()
fig.savefig("_static/daymet_loc.png", facecolor="w", bbox_inches="tight")

Next, let’s get annual total precipitation for Hawaii and Puerto Rico for 2010.
[10]:
hi_ext = (-160.3055, 17.9539, -154.7715, 23.5186)
pr_ext = (-67.9927, 16.8443, -64.1195, 19.9381)
hi = daymet.get_bygeom(hi_ext, 2010, variables="prcp", region="hi", time_scale="annual")
pr = daymet.get_bygeom(pr_ext, 2010, variables="prcp", region="pr", time_scale="annual")
[11]:
ax = hi.where(hi.prcp > 0).prcp.plot(size=4)
plt.title("Hawaii, 2010")
ax.figure.savefig("_static/hi.png", facecolor="w", bbox_inches="tight")

[12]:
ax = pr.where(pr.prcp > 0).prcp.plot(size=4)
plt.title("Puerto Rico, 2010")
ax.figure.savefig("_static/pr.png", facecolor="w", bbox_inches="tight")

This page was generated from notebooks/geoconnex.ipynb.
GeoConnex#
[11]:
import geopandas as gpd
import pandas as pd
import pynhd
from pynhd import GeoConnex
GeoConnex is a web service developed by the Internet of Water that provides access to many hydro-linked datasets. PyNHD provides access to all GeoConnex endpoints.
For single queries, it's recommended to use the pynhd.geoconnex function; for multiple queries, it's more efficient to directly use the underlying pynhd.GeoConnex class that the pynhd.geoconnex function calls under the hood.
Let's start with the geoconnex function. If you run the function without any arguments, it will print out a list of available endpoints. Moreover, if you run the function with item but no query, it will print out the description, queryable fields, and extent of the selected endpoint (item).
[12]:
pynhd.geoconnex()
Available Endpoints:
'hu02': Two-digit Hydrologic Regions
'hu04': Four-digit Hydrologic Subregion
'hu06': Six-digit Hydrologic Basins
'hu08': Eight-digit Hydrologic Subbasins
'hu10': Ten-digit Watersheds
'nat_aq': National Aquifers of the United States from USGS National Water Information System National Aquifer code list.
'principal_aq': Principal Aquifers of the United States from 2003 USGS data release
'sec_hydrg_reg': Secondary Hydrogeologic Regions of the Conterminous United States from 2018 USGS data release
'gages': US Reference Stream Gage Monitoring Locations
'mainstems': US Reference Mainstem Rivers
'states': U.S. States
'counties': U.S. Counties
'aiannh': Native American Lands
'cbsa': U.S. Metropolitan and Micropolitan Statistical Areas
'ua10': Urbanized Areas and Urban Clusters (2010 Census)
'places': U.S. legally incororated and Census designated places
'pws': U.S. Public Water Systems
For example, let’s select the hu02
item and find all its queryable fields.
[13]:
pynhd.geoconnex("hu02")
Item: 'hu02'
Description: Two-digit Hydrologic Regions
Queryable Fields: geometry, fid, uri, NAME, gnis_url, GNIS_ID, HUC2, LOADDATE
Extent: (-170, 15, -51, 72)
Now that we know all the queryable fields of hu02, we can decide on the best way to pull our desired data. For example, let's get the data for "HUC02" using the HUC2 field.
[14]:
%load_ext pyinstrument
The pyinstrument extension is already loaded. To reload it, use:
%reload_ext pyinstrument
[15]:
huc2 = pynhd.geoconnex(item="hu02", query={"HUC2": "02"})
ax = huc2.plot(figsize=(6, 6))
ax.set_axis_off()

GeoConnex also supports spatial queries. At the moment, it only accepts requests for features within a bounding box. For example, let's get all the stations within the HUC02 region that we previously retrieved.
[16]:
gages = pynhd.geoconnex(item="gages", query={"geometry": huc2.geometry.iloc[0]})
[17]:
ax = huc2.plot(facecolor="none", edgecolor="black", figsize=(6, 6))
gages.plot(ax=ax, color="red", marker="*")
ax.set_axis_off()

Next, we want to retrieve the available data for a couple of USGS stations using the GeoConnex
class.
[18]:
gcx = GeoConnex()
gcx.item = "gages" # instead we can instantiate the class like so GeoConnex("gages")
station_ids = ["01031450", "01031500", "01031510"]
gages = gpd.GeoDataFrame(
pd.concat((gcx.query({"provider_id": sid}) for sid in station_ids), ignore_index=True),
crs="epsg:4326",
)
gages
[18]:
 | geometry | fid | uri | name | description | subjectOf | provider | provider_id | nhdpv2_REACHCODE | nhdpv2_REACH_measure | nhdpv2_COMID | mainstem_uri |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | POINT (-69.45222 45.18444) | 58690 | https://geoconnex.us/ref/gages/1077762 | Kingsbury Stream at Abbot Village, Maine | USGS NWIS Stream/River/Lake Site 01031450: Kin... | https://waterdata.usgs.gov/monitoring-location... | https://waterdata.usgs.gov | 01031450 | 01020004001387 | 54.241330 | 1722913 | https://geoconnex.us/ref/mainstems/2245317 |
1 | POINT (-69.31472 45.17500) | 67663 | https://geoconnex.us/ref/gages/1042227 | Piscataquis River near Dover-Foxcroft, Maine | USGS NWIS Stream/River/Lake Site 01031500: Pis... | https://waterdata.usgs.gov/monitoring-location... | https://waterdata.usgs.gov | 01031500 | 01020004000247 | 0.942998 | 1722317 | https://geoconnex.us/ref/mainstems/2244663 |
2 | POINT (-69.23639 45.15028) | 43877 | https://geoconnex.us/ref/gages/1039176 | Black Stream near Dover-Foxcroft, Maine | USGS NWIS Stream/River/Lake Site 01031510: Bla... | https://waterdata.usgs.gov/monitoring-location... | https://waterdata.usgs.gov | 01031510 | 01020004000353 | 63.771800 | 1722933 | https://geoconnex.us/ref/mainstems/2246031 |
Let’s make a bit more advanced spatial query. Assume that we want to get all the stations within a 5-km radius of each the stations that we obtained above. First, we need to buffer the points by 5 km to create a multipolygon. Then we pass it as a query to geoconnex
.
[19]:
buff = gages.to_crs(5070).buffer(5e3).to_crs(4326)
gages = pynhd.geoconnex(item="gages", query={"geometry": buff.unary_union})
[20]:
ax = buff.plot(facecolor="none", edgecolor="black", figsize=(6, 6))
gages.plot(ax=ax, color="red", marker="*")
ax.set_axis_off()

This page was generated from notebooks/hymod_calibration.ipynb.
Hydrological Modeling#
[1]:
from collections import OrderedDict
import contextily as ctx
import numpy as np
import pandas as pd
import psutil
import pydaymet as daymet
import pygmo as pg
from hymod import HYMOD, compute_kge
from pygeohydro import NWIS
from pynhd import NLDI
We use HyRiver packages to get the required input data for hydrological modeling of a watershed. We use PyDaymet to get precipitation and potential evapotranspiration and PyGeoHydro to get streamflow observations. Then, we use an implementation of the HYMOD model for our watershed analysis. The source for the model can be found in hymod.py file.
For this tutorial, we select the Brays Bayou watershed in Houston, TX. There is a NWIS station in this watershed (USGS-08075000
) that has streamflow observation data for more than 10 years (2003-2013).
We start by getting the watershed drainage area geometry using PyNHD and station information using PyGeoHydro. Since the station is not located at the outlet of the watershed, we set the split_catchment flag to True to split the catchment at the station's location.
[2]:
station = "08075000"
dates = ("2003-01-01", "2012-12-31")
nwis = NWIS()
basin = NLDI().get_basins(station, split_catchment=True)
info = nwis.get_info({"site": station})
info
[2]:
 | agency_cd | site_no | station_nm | site_tp_cd | dec_lat_va | dec_long_va | coord_acy_cd | dec_coord_datum_cd | alt_va | alt_acy_va | alt_datum_cd | huc_cd | drain_sqkm | hcdn_2009 | geometry |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | USGS | 08075000 | Brays Bayou at Houston, TX | ST | 29.697175 | -95.412162 | F | NAD83 | 0.0 | 0.01 | NAVD88 | 12040104 | 261.5814 | False | POINT (-95.41216 29.69717) |
Let’s check the area of the watershed and plot it.
[3]:
area_sqm = basin.to_crs(basin.estimate_utm_crs()).area.values[0]
ax = basin.plot(figsize=(15, 10), alpha=0.5, edgecolor="black")
info.plot(ax=ax, markersize=50, color="red")
ax.set_title(f"USGS-{station} ({area_sqm * 1e-6:.1f} sqkm)")
ctx.add_basemap(ax, crs=basin.crs)
ax.margins(0)
ax.set_axis_off()

Now, we get some information about this station and then retrieve streamflow observations from PyGeoHydro within the period of our study for calibrating the model. Since HYMOD's output is in mm/day, we can either set the mmd flag of NWIS.get_streamflow to True to return the data in mm/day instead of the default cms, or use the area_sqm that we obtained previously to do the conversion. Note that these two methods might not return identical results: NWIS.get_streamflow uses GagesII and NWIS to get the drainage area, while here we used NLDI to obtain the area, and these two are not always the same.
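For reference, the mmd-based alternative would look roughly like this (a hedged sketch; the cell below uses the explicit area conversion instead):
# Assumes nwis, station, and dates from the cells above.
qobs_mmd = nwis.get_streamflow(station, dates, mmd=True)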
[4]:
qobs = nwis.get_streamflow(station, dates)
qobs = qobs * (1000.0 * 24.0 * 3600.0) / area_sqm
We can check the difference in areas produced by these two methods as follows:
[5]:
area_nwis = nwis._drainage_area_sqm(info, "dv")
f"NWIS: {area_nwis.iloc[0] * 1e-6:.1f} sqkm, NLDI: {area_sqm * 1e-6:.1f} sqkm"
[5]:
'NWIS: 261.6 sqkm, NLDI: 273.0 sqkm'
PyDaymet can compute Potential EvapoTranspiration (PET) at the daily timescale using three methods: penman_monteith, priestley_taylor, and hargreaves_samani. Let's use hargreaves_samani and get the data for our study period.
[6]:
clm = daymet.get_bygeom(basin.geometry[0], dates, variables="prcp", pet="hargreaves_samani")
[7]:
_ = (
clm.where(clm.pet > 0)
.pet.isel(time=range(100, 700, 100))
.plot(x="x", y="y", row="time", col_wrap=3)
)

Since our HYMOD implementation is a lumped model, we need to take the areal average of the input precipitation and potential evapotranspiration data. Moreover, since Daymet’s calendar doesn’t have leap years (365-day year), we need to make sure that the time index of the input streamflow observation matches that of the climate data.
[8]:
clm_df = clm.mean(dim=["x", "y"]).to_dataframe()[["prcp", "pet"]]
clm_df.index = pd.to_datetime(clm_df.index.date)
qobs.index = pd.to_datetime(qobs.index.to_series().dt.tz_localize(None).dt.date)
idx = qobs.index.intersection(clm_df.index)
qobs, clm_df = qobs.loc[idx], clm_df.loc[idx]
Now, let’s use pygmo to calibrate the model parameters using 70% of the observation data and Kling-Gupta Efficiency as objective function.
[9]:
param_bounds = OrderedDict(
[
("c_max", (1, 100)), # Maximum storage capacity
("b_exp", (0.0, 2)), # Degree of spatial variability of the soil moisture capacity
("alpha", (0.2, 0.99)), # Factor for dividing flow to slow and quick releases
("k_s", (0.01, 0.5)), # Residence time of the slow release reservoir
("k_q", (0.5, 1.2)), # Residence time of the quick release reservoir
]
)
warm_up = 1 # year
cal_idx = int(0.7 * clm_df.shape[0]) # 70% of data for calibration
bounds = list(zip(*param_bounds.values()))
udp = HYMOD(clm_df.iloc[:cal_idx], qobs.iloc[:cal_idx], bounds, 1)
prob = pg.problem(udp)
n_procs = psutil.cpu_count(logical=False)
pop_size = 1000
algo = pg.algorithm(pg.sade(gen=int(pop_size / n_procs)))
archi = pg.archipelago(
n=n_procs,
algo=algo,
prob=prob,
pop_size=pop_size,
)
archi.evolve()
archi.wait_check()
_, vectors, fits = zip(*archi.get_migrants_db())
params = vectors[np.argmin(fits)].squeeze()
dict(zip(param_bounds, params))
[9]:
{'c_max': 19.022301450548646,
'b_exp': 1.9996427290149237,
'alpha': 0.9867340700807876,
'k_s': 0.010048422855588962,
'k_q': 0.7490491923818765}
With the obtained calibrated parameters, we can validate the model over the validation period (30%) of the streamflow observation.
[10]:
sim = udp.simulate(*clm_df.iloc[cal_idx:].to_numpy("f8").T, *tuple(params))
obs = qobs.iloc[cal_idx:][f"USGS-{station}"].to_numpy()
kge = compute_kge(obs, sim)
name = info.station_nm.values[0]
index = qobs.iloc[cal_idx:].index
discharge = pd.DataFrame({"Simulated": sim, "Observed": obs}, index=index)
_ = discharge.plot(
figsize=(15, 5),
xlabel="Validation Period",
ylabel="Discharge (mm/day)",
title=f"{name}\nKGE = {kge:.2f}",
)

This page was generated from notebooks/ndvi.ipynb.
NDVI#
[1]:
import io
import async_retriever as ar
import pandas as pd
import xarray as xr
Let's use the DAAC server to get NDVI. We can use AsyncRetriever and pass the retrieved data directly to xarray.open_mfdataset.
[2]:
west, south, east, north = (-69.77, 45.07, -69.31, 45.45)
base_url = "https://thredds.daac.ornl.gov/thredds/ncss/ornldaac/1299"
dates_itr = ((pd.to_datetime(f"{y}0101"), pd.to_datetime(f"{y}0131")) for y in range(2000, 2005))
urls, kwds = zip(
*(
(
f"{base_url}/MCD13.A{s.year}.unaccum.nc4",
{
"params": {
"var": "NDVI",
"north": f"{north}",
"west": f"{west}",
"east": f"{east}",
"south": f"{south}",
"disableProjSubset": "on",
"horizStride": "1",
"time_start": s.strftime("%Y-%m-%dT%H:%M:%SZ"),
"time_end": e.strftime("%Y-%m-%dT%H:%M:%SZ"),
"timeStride": "1",
"addLatLon": "true",
"accept": "netcdf",
}
},
)
for s, e in dates_itr
)
)
resp = ar.retrieve(urls, "binary", request_kwds=kwds, max_workers=8)
data = xr.open_mfdataset(io.BytesIO(r) for r in resp)
[3]:
ax = data.isel(time=slice(10, 16)).NDVI.plot(x="x", y="y", row="time", col_wrap=3)
ax.fig.savefig("_static/ndvi.png", bbox_inches="tight", facecolor="w")

This page was generated from notebooks/nhdplus.ipynb.
River Network#
[1]:
import geopandas as gpd
import matplotlib.patches as mpatches
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pynhd as nhd
from pynhd import NLDI, NHDPlusHR, WaterData
[2]:
import warnings
warnings.filterwarnings("ignore")
PyNHD provides access to the Hydro Network-Linked Data Index (NLDI) and the WaterData web services for navigating and subsetting NHDPlus V2 database. Additionally, you can download NHDPlus High Resolution data as well.
First, let's get the watershed geometry of the contributing basin of a USGS station using NLDI:
[3]:
nldi = NLDI()
station_id = "01031500"
basin = nldi.get_basins(station_id)
The navigate_byid class method can be used to navigate NHDPlus both upstream and downstream of any point in the database. The available feature sources are comid, huc12pp, nwissite, wade, and wqp. Let's get the ComIDs and flowlines of the tributaries and the main river channel upstream of the station.
[4]:
flw_main = nldi.navigate_byid(
fsource="nwissite",
fid=f"USGS-{station_id}",
navigation="upstreamMain",
source="flowlines",
distance=1000,
)
flw_trib = nldi.navigate_byid(
fsource="nwissite",
fid=f"USGS-{station_id}",
navigation="upstreamTributaries",
source="flowlines",
distance=1000,
)
We can get other USGS stations upstream (or downstream) of the station and even set a distance limit (in km):
[5]:
st_all = nldi.navigate_byid(
fsource="nwissite",
fid=f"USGS-{station_id}",
navigation="upstreamTributaries",
source="nwissite",
distance=1000,
)
st_d20 = nldi.navigate_byid(
fsource="nwissite",
fid=f"USGS-{station_id}",
navigation="upstreamTributaries",
source="nwissite",
distance=20,
)
Now, let’s get the HUC12 pour points:
[6]:
pp = nldi.navigate_byid(
fsource="nwissite",
fid=f"USGS-{station_id}",
navigation="upstreamTributaries",
source="huc12pp",
distance=1000,
)
Let’s plot the vector data:
[7]:
ax = basin.plot(facecolor="none", edgecolor="k", figsize=(8, 8))
st_all.plot(ax=ax, label="USGS stations", marker="*", markersize=300, zorder=4, color="b")
st_d20.plot(
ax=ax,
label="USGS stations up to 20 km",
marker="v",
markersize=100,
zorder=5,
color="darkorange",
)
pp.plot(ax=ax, label="HUC12 pour points", marker="o", markersize=50, color="k", zorder=3)
flw_main.plot(ax=ax, lw=3, color="r", zorder=2, label="Main")
flw_trib.plot(ax=ax, lw=1, zorder=1, label="Tributaries")
ax.legend(loc="best")
ax.set_aspect("auto")
ax.figure.set_dpi(100)
ax.figure.savefig("_static/nhdplus_navigation.png", bbox_inches="tight", facecolor="w")

Next, we get the slope data for each river segment from the NHDPlus VAA database:
[8]:
vaa = nhd.nhdplus_vaa("input_data/nhdplus_vaa.parquet")
flw_trib["comid"] = pd.to_numeric(flw_trib.nhdplus_comid)
slope = gpd.GeoDataFrame(
pd.merge(flw_trib, vaa[["comid", "slope"]], left_on="comid", right_on="comid"),
crs=flw_trib.crs,
)
slope.loc[slope.slope < 0, "slope"] = np.nan
[9]:
slope.plot(
figsize=(8, 8),
column="slope",
cmap="plasma",
legend=True,
legend_kwds={"label": "Slope (m/m)"},
)
[9]:
<AxesSubplot:>

Now, let's use the WaterData service to get the catchments that overlap this basin:
[10]:
wd_cat = WaterData("catchmentsp")
cat = wd_cat.bygeom(basin.geometry[0], predicate="OVERLAPS")
[11]:
ax = cat.plot(figsize=(8, 8))
basin.plot(ax=ax, facecolor="none", edgecolor="k")
ax.set_aspect("auto")
ax.figure.set_dpi(100)

We might want to get all the features within a given radius of a station. We can achieve this using the bydistance method of the WaterData class. For example, let's take station 01031300 and find all the flowlines within a 5-km radius.
[12]:
eck4 = "+proj=eck4 +lon_0=0 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs"
st300 = st_all[st_all.identifier == "USGS-01031300"].to_crs(eck4)
coords = st300.geometry.values[0].coords.xy
x, y = coords[0][0], coords[1][0]
rad = 5e3
[13]:
nhdp_mr = WaterData("nhdflowline_network")
flw_rad = nhdp_mr.bydistance((x, y), rad, loc_crs=eck4)
flw_rad = flw_rad.to_crs(eck4)
Instead of getting all features within a radius of the coordinate, we can snap to the closest flowline using NLDI:
[14]:
comid_closest = nldi.comid_byloc((x, y), eck4)
flw_closest = nhdp_mr.byid("comid", comid_closest.comid.values[0])
[15]:
ax = basin.to_crs(eck4).plot(figsize=(8, 8), facecolor="none", edgecolor="k")
flw_rad.plot(ax=ax)
flw_closest.to_crs(eck4).plot(ax=ax, color="orange", lw=3)
ax.legend(["Flowlines", "Closest Flowline"])
circle = mpatches.Circle((x, y), rad, ec="r", fill=False)
arrow = mpatches.Arrow(x, y, 0.6 * rad, -0.8 * rad, 1500, color="g")
ax.text(x + 0.6 * rad, y + -rad, f"{int(rad)} m", ha="center")
ax.add_artist(circle)
ax.set_aspect("equal")
ax.add_artist(arrow)
ax.figure.set_dpi(100)
ax.axis("off")
ax.figure.savefig("_static/nhdplus_radius.png", bbox_inches="tight", facecolor="w")

WaterData gives us access to the medium-resolution NHDPlus database. We can use NHDPlusHR to retrieve high-resolution NHDPlus data. Let's get both medium- and high-resolution flowlines within the bounding box of our watershed and compare them. Moreover, since several web services offer access to the NHDPlus database, NHDPlusHR has an argument for selecting a service and another argument for automatically switching between services.
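As a rough illustration only, selecting the service explicitly might look like the sketch below; the keyword names (service and auto_switch) are assumptions made here for illustration and should be verified against the PyNHD API reference.
# Hypothetical keyword names; check the PyNHD documentation before using.
nhdp_hr_alt = NHDPlusHR("flowline", service="hydro", auto_switch=True)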
[17]:
flw_mr = nhdp_mr.bybox(basin.geometry[0].bounds)
nhdp_hr = NHDPlusHR("flowline")
flw_hr = nhdp_hr.bygeom(basin.geometry[0].bounds)
[18]:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 8), facecolor="w")
flw_mr.plot(ax=ax1)
ax1.set_title("NHDPlus Medium Resolution")
flw_hr.plot(ax=ax2)
ax2.set_title("NHDPlus High Resolution")
fig.savefig("_static/hr_mr.png", bbox_inches="tight", facecolor="w")

Since NHDPlus HR is still at the pre-release stage, let's use the MR flowlines to demonstrate vector-based accumulation.
Based on a topologically sorted river network, pynhd.vector_accumulation computes flow accumulation in the network. It returns a dataframe, sorted from upstream to downstream, that shows the accumulated flow at each node.
PyNHD has a utility called prepare_nhdplus that identifies such relationships, among other things, such as fixing some common issues with NHDPlus flowlines. But first we need to get all the NHDPlus attributes for each ComID, since NLDI only provides the flowlines' geometries and ComIDs, which are useful for navigating the vector river network data. For getting the NHDPlus database we use WaterData. The WaterData web service layers are nhdflowline_network, nhdarea, nhdwaterbody, catchmentsp, gagesii, huc08, huc12, huc12agg, and huc12all. Let's use the nhdflowline_network layer to get the required info.
[19]:
comids = [int(c) for c in flw_trib.nhdplus_comid.to_list()]
nhdp_trib = nhdp_mr.byid("comid", comids)
flw = nhd.prepare_nhdplus(nhdp_trib, 0, 0, purge_non_dendritic=False)
To demonstrate the use of routing, let's use the nhdplus_attrs function to get the list of available NHDPlus attributes from the Select Attributes for NHDPlus Version 2.1 Reach Catchments and Modified Network Routed Upstream Watersheds for the Conterminous United States item on the ScienceBase service. These attributes are at the catchment scale and are available in three categories:
- Local (local): For individual reach catchments.
- Total (upstream_acc): For network-accumulated values using total cumulative drainage area.
- Divergence (div_routing): For network-accumulated values using divergence routing.
[20]:
char_ids = nldi.get_validchars("local")
char_ids.head(5)
[20]:
|  | characteristic_description | units | dataset_label | dataset_url | theme_label | theme_url | characteristic_type |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CAT_WILDFIRE_2011 | percent of wild fires per catchments for 2011 | percent | Estimated percent of catchment that experience... | https://www.sciencebase.gov/catalog/item/57f66... | Land Cover | unknown | localCatch_name |
| CAT_CONTACT | Subsurface flow contact time index. The subsur... | days | Contact time, the length of time it takes for ... | https://www.sciencebase.gov/catalog/item/56f96... | Hydrologic | https://www.sciencebase.gov/catalog/item/5669a... | localCatch_name |
| CAT_EWT | Average depth to water table relatice to the l... | meters | Average depth to water table relative to land ... | https://www.sciencebase.gov/catalog/item/56f97... | Hydrologic | https://www.sciencebase.gov/catalog/item/5669a... | localCatch_name |
| CAT_FUNGICIDE | Fungicide use on agricultural land in kilogram... | kilogram per square kilometer | Fungicide use on agricultural land in kilogram... | https://www.sciencebase.gov/catalog/item/56fd6... | Chemical | https://www.sciencebase.gov/catalog/item/56fd6... | localCatch_name |
| CAT_IEOF | Percentage of Horton overland flow as a percent | percent | Mean infiltration-excess overland flow as a pe... | https://www.sciencebase.gov/catalog/item/56f97... | Hydrologic | https://www.sciencebase.gov/catalog/item/5669a... | localCatch_name |
Let's get the Mean Annual Groundwater Recharge, RECHG, using the getcharacteristic_byid class method and carry out the flow accumulation.
[21]:
char = "CAT_RECHG"
area = "areasqkm"
local = nldi.getcharacteristic_byid(comids, "local", char_ids=char)
flw = flw.merge(local[char], left_on="comid", right_index=True)
def runoff_acc(qin, q, a):
return qin + q * a
flw_r = flw[["comid", "tocomid", char, area]]
runoff = nhd.vector_accumulation(flw_r, runoff_acc, char, [char, area])
def area_acc(ain, a):
return ain + a
flw_a = flw[["comid", "tocomid", area]]
areasqkm = nhd.vector_accumulation(flw_a, area_acc, area, [area])
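# The accumulated quantity is a depth times area (volume-like); dividing by the accumulated
# drainage area converts it back to an area-averaged depth (mm/yr).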
runoff /= areasqkm
Since these are catchment-scale characteristics, let's get the catchments, then add the accumulated characteristic as a new column and plot the results.
[22]:
catchments = wd_cat.byid("featureid", comids)
c_local = catchments.merge(local, left_on="featureid", right_index=True)
c_acc = catchments.merge(runoff, left_on="featureid", right_index=True)
The following requested feature IDs were not found in WaterData:
1723559, 1721835, 1723355, 1721829, 1721857, 1721855, 1723523, 1723331
[23]:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 8), facecolor="w")
cmap = "viridis"
norm = plt.Normalize(vmin=c_local.CAT_RECHG.min(), vmax=c_acc.acc_CAT_RECHG.max())
c_local.plot(ax=ax1, column=char, cmap=cmap, norm=norm)
flw.plot(ax=ax1, column="streamorde", cmap="Blues", scheme="fisher_jenks")
ax1.set_title("Groundwater Recharge (mm/yr)")
c_acc.plot(ax=ax2, column=f"acc_{char}", cmap=cmap, norm=norm)
flw.plot(ax=ax2, column="streamorde", cmap="Blues", scheme="fisher_jenks")
ax2.set_title("Accumulated Groundwater Recharge (mm/yr)")
cax = fig.add_axes(
[
ax2.get_position().x1 + 0.01,
ax2.get_position().y0,
0.02,
ax2.get_position().height,
]
)
sm = plt.cm.ScalarMappable(cmap=cmap, norm=norm)
fig.colorbar(sm, cax=cax)
fig.savefig("_static/flow_accumulation.png", bbox_inches="tight", facecolor="w")

This page was generated from notebooks/nid.ipynb.
National Inventory of Dams#
[1]:
import geopandas as gpd
from pygeohydro import NID
[2]:
SAVE_KWDS = {"bbox_inches": "tight", "dpi": 300, "facecolor": "w"}
CRS = "esri:102008"
First, we need to instantiate the NID class.
[3]:
nid = NID()
Some dam coordinates are either missing or incorrect. Let's get dams within the contiguous US with a maximum storage of 200 to 5,000 acre-feet.
[4]:
world = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
conus_geom = world[world.iso_a3 == "USA"].geometry.iloc[0].geoms[0]
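# The bracketed filter value appears to denote a range: max storage between 200 and 5,000 acre-feet.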
dam_list = nid.get_byfilter([{"maxStorage": ["[200 5000]"]}])
conus_dams = dam_list[0][dam_list[0].is_valid]
conus_dams = conus_dams[conus_dams.within(conus_geom)]
conus_dams = nid.inventory_byid(conus_dams.id.to_list())
Next, we can get a count of dams for the top 10 dam types.
[5]:
dam_types = conus_dams.primaryDamTypeId.fillna(-1).astype(int)
conus_dams["primaryDamTypeName"] = dam_types.apply(nid.dam_type.get)
[6]:
types_count = conus_dams["primaryDamTypeName"].value_counts()
ax = types_count.sort_values()[-10:].plot.bar(figsize=(10, 4))
for p in ax.patches:
ax.annotate(
p.get_height(),
(p.get_x() + p.get_width() / 2, p.get_height() + 500),
ha="center",
va="center",
)

Let’s compare the spatial distribution of the top five categories, excluding Earth and Other categories.
[7]:
conus = gpd.GeoSeries([conus_geom], crs=world.crs).to_crs(CRS)
ax = conus.plot(figsize=(10, 6), facecolor="none", edgecolor="k")
top_5types = types_count.index[:5]
marker = dict(zip(top_5types, ["o", "^", "*", "X", "d"]))
color = dict(zip(top_5types, ["r", "b", "g", "k", "c"]))
conus_dams = conus_dams.to_crs(CRS)
for c in top_5types:
conus_dams[conus_dams.primaryDamTypeName == c].plot(
ax=ax,
alpha=0.6,
markersize=10,
marker=marker[c],
facecolor="none",
color=color[c],
label=c,
)
ax.legend()
ax.axis(False)
ax.set_title("Dams within CONUS with max storage > 200 acre-feet")
ax.figure.set_dpi(100)
ax.figure.savefig("_static/dams.png", **SAVE_KWDS)

This page was generated from notebooks/nid_dib.ipynb.
Spatiotemporal Distribution of Dams#
[1]:
import os
import shutil
import geopandas as gpd
import matplotlib
import matplotlib.pyplot as plt
import pygeohydro as gh
from IPython.display import YouTubeVideo
from matplotlib.font_manager import fontManager
from tqdm.auto import tqdm
Get dam data within the contiguous US from the National Inventory of Dams using the PyGeoHydro package.
[2]:
CRS = "esri:102008"
nid = gh.NID()
world = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
conus_geom = world[world.iso_a3 == "USA"].geometry.iloc[0].geoms[0]
dam_list = nid.get_byfilter([{"nation": ["USA"]}])
conus_dams = dam_list[0][dam_list[0].is_valid]
conus_dams = conus_dams[conus_dams.within(conus_geom)]
conus_dams = nid.inventory_byid(conus_dams.id)
Since we're going to use dam coordinates and completion year to filter the data, let's check the number of dams with missing data to find the total number of dams that have both.
[3]:
missing = (~conus_dams.is_valid).sum() + conus_dams.yearCompleted.isna().sum()
conus_dams.shape[0] - missing
[3]:
165439
Let's plot the number of dams with missing data by state.
[4]:
ax = (
conus_dams[~conus_dams.is_valid | conus_dams.yearCompleted.isna()]
.groupby("state")
.size()
.plot.bar(figsize=(20, 5))
)
ax.set_title("Number of dams with missing coordinate or completion year data")
for p in ax.patches:
ax.annotate(
p.get_height(),
(p.get_x() + p.get_width() / 2, p.get_height() + 15),
ha="center",
va="center",
)
ax.figure.set_dpi(300)

Plot the frames.
[5]:
column = "damHeight"
cmap = "viridis"
min_q, max_q = 0.1, 0.9
label = nid.fields_meta[nid.fields_meta["name"] == column].label.iloc[0]
label = "\n".join([label, f"{min_q} - {max_q} Quantile"])
norm = plt.Normalize(
vmin=conus_dams[column].quantile(min_q), vmax=conus_dams[column].quantile(max_q)
)
dpi = 300.0
figsize = (1920.0 / dpi, 1080.0 / dpi)
font = "Lato"
indent = "\n "
if font in {f.name for f in fontManager.ttflist}:
matplotlib.rcParams["font.sans-serif"] = font
matplotlib.rcParams["font.family"] = "sans-serif"
plt.ioff()
os.makedirs("tmp", exist_ok=True)
conus = gpd.GeoSeries([conus_geom], crs=world.crs).to_crs(CRS)
conus_dams = conus_dams.to_crs(CRS)
def get_ax():
ax = conus.plot(figsize=figsize, facecolor="none", edgecolor="k")
ax.axis(False)
fig = ax.figure
fig.set_dpi(dpi)
cax = fig.add_axes(
[
ax.get_position().x1 + 0.01,
ax.get_position().y0,
0.02,
ax.get_position().height,
]
)
sm = plt.cm.ScalarMappable(cmap=cmap, norm=norm)
fig.colorbar(sm, cax=cax, label=label)
return ax
yr_min = conus_dams.yearCompleted.astype("Int32").min()
yr_max = conus_dams.yearCompleted.astype("Int32").max()
years = range(yr_min + 1, yr_max + 1)
with tqdm(total=len(years), desc="Plotting") as pbar:
for year in years:
pbar.set_postfix_str(f"Year: {year}")
dams = conus_dams[conus_dams.yearCompleted <= year]
ax = get_ax()
dams.plot(
ax=ax,
column=column,
cmap=cmap,
norm=norm,
alpha=0.3,
markersize=3,
)
ax.set_title(f"Dams Completed Up to {year}\nTotal = {len(dams):,}")
h_max = dams[column].max()
name_max = dams.iloc[dams[column].argmax()]["name"].title()
ax.annotate(
f"Largest Dam:{indent}Height: {h_max:.1f} ft{indent}Name: {name_max}",
xy=(0, 0),
xycoords=ax.transAxes,
)
ax.figure.savefig(f"tmp/{year}.png", bbox_inches="tight", dpi=dpi, facecolor="w")
plt.close("all")
pbar.update(1)
Plotting: 100%|██████████| 2021/2021 [10:38<00:00, 3.17it/s, Year: 2021]
Repeat the last frame 100 times.
[6]:
for i in range(1, 100):
shutil.copy(f"tmp/{years[-1]}.png", f"tmp/{years[-1] + i}.png")
Convert the frames to a video file.
[7]:
!ffmpeg -hide_banner -loglevel panic -start_number 1641 -i tmp/%04d.png -pix_fmt yuv420p -vf scale=1920:-2 -y input_data/NID_2019.mp4
You can check out the video here:
[8]:
YouTubeVideo("AM2f9MaBjiQ")
[8]:
This page was generated from notebooks/nlcd.ipynb.
Land Use/Land Cover#
[1]:
import matplotlib.pyplot as plt
import networkx as nx
import osmnx as ox
import pygeohydro as gh
from pynhd import NLDI
Land cover, imperviousness, and canopy data can be retrieved from the NLCD database. First, we use PyNHD to get the contributing watershed geometries of four NWIS stations with the IDs of USGS-01031450, USGS-01318500, USGS-01031510, and USGS-03460000:
[2]:
geometry = NLDI().get_basins(["01031450", "01318500", "01031510", "03460000"])
We can now use the nlcd_bygeom and nlcd_bycoords functions to get the NLCD data.
Let's start with nlcd_bygeom. This function has two positional arguments for passing the target geometries or points of interest and the target resolution in meters. Note that if a single geometry is passed and the geometry is not in the EPSG:4326 CRS, the geo_crs argument should be given as well. The second argument is the target resolution of the data in meters. The NLCD database is multi-resolution, and based on the target resolution, the source data are resampled on the server side.
You should be mindful of the resolution, since higher resolutions require more memory; if your request needs more memory than is available on your system, the code is likely to crash. You can either use a coarser resolution or divide your region of interest into smaller regions.
Moreover, the MRLC GeoServer has a limit of about 8 million pixels per request, but PyGeoHydro takes care of the domain decomposition under the hood, divides the request into smaller requests, and then merges them. So the only bottleneck for requests is the amount of available memory on your system.
Both nlcd_bygeom and nlcd_bycoords can request four layers from the MRLC web service: imperviousness, land use/land cover, impervious descriptors, and tree canopy. Since NLCD is released every couple of years, you can specify the target years via the years argument. Layers that are not included in this argument are ignored. By default, years is {'impervious': [2019], 'cover': [2019], 'canopy': [2019], 'descriptor': [2019]}.
Furthermore, we can specify the region of interest via the region argument. Valid values are L48 (for CONUS), HI (for Hawaii), AK (for Alaska), and PR (for Puerto Rico). By default, region is set to L48.
Let's get the impervious descriptor data at a 100 m resolution for all the stations.
[3]:
desc = gh.nlcd_bygeom(geometry, 100, years={"descriptor": 2019}, ssl=False)
This function returns a dict whose keys are the indices of the input GeoDataFrame.
[4]:
cmap, norm, levels = gh.plot.descriptor_legends()
ax = desc["01318500"].descriptor_2019.plot(
size=5, cmap=cmap, levels=levels, cbar_kwargs={"ticks": levels[:-1]}
)
ax.axes.set_title("Urban Descriptor 2019")
ax.figure.savefig("_static/descriptor.png", bbox_inches="tight", facecolor="w")

Now let’s get the land cover data:
[5]:
lulc = gh.nlcd_bygeom(geometry, 100, years={"cover": [2016, 2019]}, ssl=False)
Moreover, we can get the statistics of the cover data for each class or category:
[6]:
stats = gh.cover_statistics(lulc["01318500"].cover_2019)
stats
[6]:
Stats(classes={'Open Water': 3.21488376390156, 'Developed, Open Space': 1.6140245522051944, 'Developed, Low Intensity': 0.5561778937764895, 'Developed, Medium Intensity': 0.2702359367601714, 'Developed, High Intensity': 0.039726992412606396, 'Barren Land (Rock/Sand/Clay)': 0.09238835444792183, 'Deciduous Forest': 41.800879999076116, 'Evergreen Forest': 29.35201117899089, 'Mixed Forest': 14.085066577357924, 'Shrub-Forest': 0.693836541903893, 'Herbaceous-Forest': 0.33444584310147707, 'Shrub/Scrub': 0.05127553671859662, 'Grassland/Herbaceous': 0.18777933041540115, 'Pasture/Hay': 0.4517790532503378, 'Cultivated Crops': 0.008776893672552575, 'Woody Wetlands': 6.656349967086649, 'Emergent Herbaceous Wetlands': 0.5903615849222206}, categories={'Background': 0.0, 'Unclassified': 0.0, 'Water': 3.21488376390156, 'Developed': 2.4801653751544617, 'Barren': 0.09238835444792183, 'Forest': 86.2662401404303, 'Shrubland': 0.05127553671859662, 'Herbaceous': 0.18777933041540115, 'Planted/Cultivated': 0.46055594692289037, 'Wetlands': 7.246711552008869})
and an estimate of the overland roughness:
[7]:
roughness = gh.overland_roughness(lulc["01318500"].cover_2019)
Additionally, PyGeoHydro provides a function for getting the official legends of the cover data. Let's plot the data using these legends.
[8]:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(13, 5))
cmap, norm, levels = gh.plot.cover_legends()
lulc["01318500"].cover_2019.plot(
ax=ax1, cmap=cmap, levels=levels, cbar_kwargs={"ticks": levels[:-1]}
)
ax1.set_title("Land Use/Land Cover 2019")
ax1.set_axis_off()
roughness.plot(ax=ax2)
ax2.set_title("Overland Roughness")
ax2.set_axis_off()
fig.savefig("_static/lulc.png", bbox_inches="tight", facecolor="w")

Now, let's see nlcd_bycoords in action. The coordinates must be a list of (longitude, latitude) pairs. Let's use the osmnx package to get a street network:
[9]:
G = ox.graph_from_place("Piedmont, California, USA", network_type="drive")
/Users/tchegini/.local/apps/mambaforge/envs/hyriver-dev/lib/python3.10/site-packages/osmnx/geocoder.py:110: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
gdf = gdf.append(_geocode_query_to_gdf(q, wr, by_osmid))
Now, we can get land cover and tree canopy for each node based on their coordinates and then plot the results.
[10]:
x, y = nx.get_node_attributes(G, "x").values(), nx.get_node_attributes(G, "y").values()
lulc = gh.nlcd_bycoords(list(zip(x, y)), years={"cover": [2019], "canopy": [2016]})
nx.set_node_attributes(G, dict(zip(G.nodes(), lulc.cover_2019)), "cover_2019")
nx.set_node_attributes(G, dict(zip(G.nodes(), lulc.canopy_2016)), "canopy_2016")
[11]:
lulc.head()
[11]:
|  | geometry | canopy_2016 | cover_2019 |
| --- | --- | --- | --- |
| 0 | POINT (-122.24760 37.82625) | 11.0 | 22 |
| 1 | POINT (-122.24719 37.82422) | 12.0 | 22 |
| 2 | POINT (-122.24608 37.82492) | 0.0 | 22 |
| 3 | POINT (-122.24532 37.82541) | 0.0 | 22 |
| 4 | POINT (-122.24447 37.82595) | 5.0 | 23 |
[12]:
nc = ox.plot.get_node_colors_by_attr(G, "canopy_2016", cmap="viridis_r")
fig, ax = ox.plot_graph(
G,
node_color=nc,
node_size=20,
save=True,
bgcolor="w",
)

Next, let's get the data for a larger area at a coarser resolution. For example, we get the boundaries of several HUC4 subregions, including the Upper Hudson, using pygeohydro.WBD, then get the cover data within them.
[16]:
wbd = gh.WBD("huc4")
hud = wbd.byids("huc4", ["0202", "0203", "0204", "0205", "0206", "0207", "0208"])
hud = hud.set_index("huc4")
lulc = gh.nlcd_bygeom(hud, 1000, years={"cover": [2019]})
[17]:
cmap, norm, levels = gh.plot.cover_legends()
fig, ax = plt.subplots(1, 1, figsize=(8, 8))
xmin, ymin, xmax, ymax = hud.unary_union.bounds
lulc["0202"].cover_2019.plot(ax=ax, cmap=cmap, levels=levels, cbar_kwargs={"ticks": levels[:-1]})
_ = lulc.pop("0202")
for da in lulc.values():
da.cover_2019.plot(ax=ax, cmap=cmap, levels=levels, add_colorbar=False)
hud.plot(ax=ax, facecolor="none", edgecolor="k", linewidth=0.8)
ax.set_xlim(xmin, xmax)
ax.set_ylim(ymin, ymax)
ax.set_axis_off()
ax.set_title("Land Use/Land Cover 2019")
ax.margins(0)

This page was generated from notebooks/noaa.ipynb.
Water Levels and Tides#
[1]:
import async_retriever as ar
import matplotlib.pyplot as plt
import pandas as pd
We can get water level recordings of a NOAA water level station through the CO-OPS API. Let's get the data for station 8534720 (Atlantic City, NJ) during 2012. Note that this CO-OPS product has a 31-day limit per request, so we have to break the request into smaller chunks accordingly.
[2]:
station_id = "8534720"
start = pd.to_datetime("2012-01-01")
end = pd.to_datetime("2012-12-31")
s = start
dates = []
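# Build month-long (start, end) pairs so each request stays within the 31-day limit.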
for e in pd.date_range(start, end, freq="m"):
dates.append((s.date(), e.date()))
s = e + pd.offsets.MonthBegin()
url = "https://api.tidesandcurrents.noaa.gov/api/prod/datagetter"
urls, kwds = zip(
*(
(
url,
{
"params": {
"product": "water_level",
"application": "web_services",
"begin_date": f'{s.strftime("%Y%m%d")}',
"end_date": f'{e.strftime("%Y%m%d")}',
"datum": "MSL",
"station": f"{station_id}",
"time_zone": "GMT",
"units": "metric",
"format": "json",
}
},
)
for s, e in dates
)
)
resp = ar.retrieve(urls, read="json", request_kwds=kwds)
wl_list = []
for rjson in resp:
wl = pd.DataFrame.from_dict(rjson["data"])
wl["t"] = pd.to_datetime(wl.t)
wl = wl.set_index(wl.t).drop(columns="t")
wl["v"] = pd.to_numeric(wl.v, errors="coerce")
wl_list.append(wl)
water_level = pd.concat(wl_list).sort_index()
water_level.attrs = rjson["metadata"]
[3]:
ax = water_level.v.plot(figsize=(15, 5))
ax.set_xlabel("")
ax.figure.savefig("_static/water_level.png", bbox_inches="tight", facecolor="w")

Now, let’s see an example without any payload or headers. Here’s how we can retrieve harmonic constituents from CO-OPS:
[4]:
stations = [
"8410140",
"8411060",
"8413320",
"8418150",
"8419317",
"8419870",
"8443970",
"8447386",
]
base_url = "https://api.tidesandcurrents.noaa.gov/mdapi/prod/webapi/stations"
urls = [f"{base_url}/{i}/harcon.json?units=metric" for i in stations]
resp = ar.retrieve(urls, "json")
amp_list = []
phs_list = []
for rjson in resp:
sid = rjson["self"].rsplit("/", 2)[1]
const = pd.DataFrame.from_dict(rjson["HarmonicConstituents"]).set_index("name")
amp = const.rename(columns={"amplitude": sid})[sid]
phase = const.rename(columns={"phase_GMT": sid})[sid]
amp_list.append(amp)
phs_list.append(phase)
amp = pd.concat(amp_list, axis=1)
phs = pd.concat(phs_list, axis=1)
[5]:
const = [
"M2",
"S2",
"N2",
"K1",
"O1",
"S1",
"M1",
"J1",
"Q1",
"T2",
"R2",
"P1",
"L2",
"K2",
]
ax = plt.subplot(111, projection="polar")
ax.figure.set_size_inches(5, 5)
for i in const:
phs_ord = phs.loc[i].sort_values()
amp_ord = amp.loc[i][phs_ord.index]
ax.scatter(phs_ord, amp_ord, s=amp_ord * 200, cmap="hsv", alpha=0.75, label=i)
ax.legend(loc="upper left", bbox_to_anchor=(1.05, 1))
ax.figure.savefig("_static/tides.png", bbox_inches="tight", facecolor="w")

This page was generated from notebooks/nwis.ipynb.
River Discharge#
[1]:
import pandas as pd
import pydaymet as daymet
import pygeohydro as gh
import xarray as xr
from pygeohydro import NWIS, plot
[2]:
_ = xr.set_options(display_expand_attrs=False)
We can explore the available NWIS stations within a bounding box using the interactive_map function. It returns an interactive map, and by clicking on a station, some of its most important properties are shown so you can decide which station(s) to choose.
[3]:
bbox = (-69.5, 45, -69, 45.5)
[4]:
nwis_kwds = {"hasDataTypeCd": "dv", "outputDataTypeCd": "dv", "parameterCd": "00060"}
gh.interactive_map(bbox, nwis_kwds=nwis_kwds)
[4]:
We can get more information about these stations using the get_info function from the NWIS class.
[5]:
nwis = NWIS()
query = {
"bBox": ",".join(f"{b:.06f}" for b in bbox),
"hasDataTypeCd": "dv",
"outputDataTypeCd": "dv",
}
info_box = nwis.get_info(query)
Now, let's select all the stations whose daily mean streamflow data are available between 2000-01-01 and 2010-12-31.
[6]:
dates = ("2000-01-01", "2010-12-31")
stations = info_box[
(info_box.begin_date <= dates[0]) & (info_box.end_date >= dates[1])
].site_no.tolist()
One useful piece of information in the database is the Hydro-Climatic Data Network 2009 (HCDN-2009) flag. This flag shows whether a station is natural (True) or affected by human activities (False). If a station is not available in the HCDN dataset, None is returned.
[7]:
query = {
"site": ",".join(stations),
"hasDataTypeCd": "dv",
"outputDataTypeCd": "dv",
}
info = nwis.get_info(query, expanded=True)
info.set_index("site_no").hcdn_2009
[7]:
site_no
01031450 False
01031500 True
01031500 True
01031500 True
01031500 True
01033000 False
451031069185301 False
451105069270801 False
Name: hcdn_2009, dtype: bool
We can get the daily mean streamflow for these stations using the get_streamflow function. This function has a flag to return the data in mm/day rather than the default cms, which is useful for hydrology models and plotting hydrologic signatures.
[8]:
qobs = nwis.get_streamflow(stations, dates, mmd=False)
Dropped 1 stations since they don't have discharge data from 2000-01-01 to 2010-12-31.
By default, get_streamflow returns a pandas.DataFrame that has an attrs attribute containing metadata for all the stations. Moreover, we can get the same data as an xarray.Dataset as follows:
[9]:
qobs_ds = nwis.get_streamflow(stations, dates, mmd=False, to_xarray=True)
qobs_ds
Dropped 1 stations since they don't have discharge data from 2000-01-01 to 2010-12-31.
[9]:
<xarray.Dataset> Dimensions: (time: 4018, station_id: 2) Coordinates: * time (time) datetime64[ns] 2000-01-01T05:00:00 ... 2010-12-31T05... * station_id (station_id) object 'USGS-01031450' 'USGS-01031500' Data variables: discharge (time, station_id) float64 2.039 5.975 1.841 ... 4.389 12.6 station_nm (station_id) <U44 'Kingsbury Stream At Abbot Village, Maine... dec_lat_va (station_id) float64 45.18 45.17 dec_long_va (station_id) float64 -69.45 -69.31 alt_va (station_id) float64 423.0 358.5 alt_acy_va (station_id) float64 0.01 0.01 alt_datum_cd (station_id) <U6 'NGVD29' 'NGVD29' huc_cd (station_id) <U8 '01020004' '01020004' begin_date (station_id) <U10 '1997-07-26' '1902-10-01' end_date (station_id) <U10 '2022-05-11' '2022-05-11' Attributes: (1)
Then we can use the signatures function to plot hydrologic signatures of the streamflow. Note that the input time series should be in mm/day. This function has an argument, output, for saving the plot as a png file.
[10]:
plot.signatures(qobs, output="_static/example_plots_signatures.png")

This function can also show precipitation data as a bar plot. Let's use PyDaymet to get the precipitation at the NWIS station's location.
[11]:
sid = "01031500"
coords = tuple(info[info.site_no == sid][["dec_long_va", "dec_lat_va"]].to_numpy()[0])
prcp = daymet.get_bycoords(coords, dates, variables="prcp", ssl=False)
qobs.index = pd.to_datetime(qobs.index.strftime("%Y-%m-%d"))
[12]:
plot.signatures(qobs[f"USGS-{sid}"], prcp, title=f"Streamflow Observations for USGS-{sid}")

We can also get instantaneous streamflow data using get_streamflow. This method assumes that the input dates are in the UTC time zone and returns the data in UTC as well.
[13]:
qobs = nwis.get_streamflow("01646500", ("2005-01-01 12:00", "2005-01-12 15:00"), freq="iv")
_ = qobs.plot(figsize=(7, 3.5))

This page was generated from notebooks/pet_validation.ipynb.
Computing PET#
[1]:
import calendar
from pathlib import Path
import geopandas as gpd
import pandas as pd
import pydaymet as daymet
import xarray as xr
from hymod import compute_kge
[2]:
import warnings
warnings.filterwarnings("ignore", message=".*Index.*")
PyDaymet offers three methods for computing PET: penman_monteith, priestley_taylor, and hargreaves_samani. Let's compute PET using PyDaymet with these three methods and compare the results with the CAMELS dataset. We choose station 01013500 and the period 2000-01-01 to 2009-01-12.
[3]:
station = "01013500"
coords = (-68.58264, 47.23739)
dates = ("2000-01-01", "2009-01-12")
root = Path("input_data")
[4]:
clm_par = Path(root, f"{station}_clm_single_pixel.parquet")
clm_nc = Path(root, f"{station}_clm_basin.nc")
basin = gpd.read_feather(Path(root, f"{station}_basin.feather"))
if clm_par.exists() and clm_nc.exists():
clm_c = pd.read_parquet(clm_par)
clm_b = xr.open_dataset(clm_nc)
else:
clm_c = daymet.get_bycoords(coords, dates)
clm_b = daymet.get_bygeom(basin.geometry[0], dates)
clm_c.to_parquet(clm_par)
clm_b.to_netcdf(clm_nc)
[5]:
single, gridded, style = {}, {}, {}
methods = {"HS": "hargreaves_samani", "PM": "penman_monteith", "PT": "priestley_taylor"}
for n, m in methods.items():
pet_c = daymet.potential_et(clm_c, coords, method=m)
single[f"{n} (Single Pixel)"] = pet_c["pet (mm/day)"]
style[f"{n} (Single Pixel)"] = "-"
pet_b = daymet.potential_et(clm_b, method=m)
gridded[f"{n} (Areal Average)"] = pet_b.pet.mean(dim=["x", "y"]).values
style[f"{n} (Areal Average)"] = "--"
Now, let’s get the CAMELS dataset and extract PET for our target station and time period.
[6]:
camels = pd.read_parquet(Path(root, f"{station}_05_model_output.parquet"))
camels = camels.loc[clm_c.index, "PET"]
style["CAMELS"] = "-"
[7]:
pet = pd.DataFrame({**single, **gridded, "CAMELS": camels})
pet[pet.lt(0)] = 0.0
Let’s compute Kling-Gupta Efficiency (KGE) for each method in comparison to the CAMELS’ results.
[8]:
kge = {
m: compute_kge(pet["CAMELS"].to_numpy("f8"), pet[m].to_numpy("f8"))
for m in pet
if m != "CAMELS"
}
pd.Series(kge, name="Kling-Gupta Efficiency")
[8]:
HS (Single Pixel) 0.924996
PM (Single Pixel) 0.871693
PT (Single Pixel) 0.909878
HS (Areal Average) 0.910407
PM (Areal Average) 0.876787
PT (Areal Average) 0.929418
Name: Kling-Gupta Efficiency, dtype: float64
It seems that PT (Areal Average) performs better than the rest. We can plot the data for daily and mean-monthly PET.
[9]:
pet.plot(figsize=(12, 6), ylabel="PET (mm/day)", style=style)
[9]:
<AxesSubplot:xlabel='time', ylabel='PET (mm/day)'>

[10]:
month_abbr = dict(enumerate(calendar.month_abbr))
mean_month = pet.groupby(pd.Grouper(freq="M")).sum()
mean_month = mean_month.groupby(mean_month.index.month).mean()
mean_month.index = mean_month.index.map(month_abbr)
[11]:
_ = mean_month.plot(figsize=(12, 6), ylabel="PET (mm/month)", style=style)

This page was generated from notebooks/pygeoapi.ipynb.
Split Catchment with PyGeoAPI#
[1]:
import geopandas as gpd
import pynhd
import shapely.geometry as sgeom
The PyGeoAPI service provides four functionalities:
- flow_trace: Trace flow from a starting point in the upstream or downstream direction.
- split_catchment: Split the local catchment of a point of interest at the point's location.
- elevation_profile: Extract the elevation profile along a flow path between two points.
- cross_section: Extract a cross-section at a point of interest along a flowline.
The pygeoapi function in PyNHD requires two inputs: a geopandas.GeoDataFrame that contains all the required inputs for the target service, and the name of the service. Let's take a look at them in an example:
[2]:
gdf = gpd.GeoDataFrame(
{
"direction": [
"none",
]
},
geometry=[sgeom.Point((1774209.63, 856381.68))],
crs="ESRI:102003",
)
trace = pynhd.pygeoapi(gdf, "flow_trace")
In the split_catchment service, we can set upstream to True to get the entire upstream catchment, not just the split local catchment.
[3]:
gdf = gpd.GeoDataFrame(
{
"upstream": [
False,
]
},
geometry=[sgeom.Point((-73.82705, 43.29139))],
crs="epsg:4326",
)
split = pynhd.pygeoapi(gdf, "split_catchment")
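For comparison, here is a minimal sketch (not run here) of the same request with upstream set to True, which would return the entire upstream catchment instead:
# Sketch only: request the full upstream catchment rather than the split local catchment.
gdf_up = gpd.GeoDataFrame(
    {"upstream": [True]},
    geometry=[sgeom.Point((-73.82705, 43.29139))],
    crs="epsg:4326",
)
split_up = pynhd.pygeoapi(gdf_up, "split_catchment")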
[4]:
ax = split.plot(figsize=(8, 8), facecolor="lightgrey", edgecolor="black", alpha=0.8)
trace.plot(ax=ax, color="b", linewidth=3.0)
ax.axis("off")
ax.set_title("Split Catchment")
ax.margins(x=0)
ax.figure.set_dpi(100)
ax.figure.savefig("_static/split_catchment.png", bbox_inches="tight", facecolor="w", dpi=100)

The elevation_profile service returns the elevation profile along a flow path between two points, at a given number of points and a specified DEM resolution.
[5]:
coords = [(-103.801086, 40.26772), (-103.80097, 40.270568)]
gdf = gpd.GeoDataFrame(
{
"numpts": [
101,
],
"dem_res": [
1,
],
},
geometry=[sgeom.MultiPoint(coords)],
crs="epsg:4326",
)
profile = pynhd.pygeoapi(gdf, "elevation_profile")
[6]:
ax = profile[["distance", "elevation"]].plot(
x="distance", y="elevation", color="b", linewidth=3.0, figsize=(12, 4)
)
ax.set_ylim(1280, 1320)
ax.legend(["Elevation Profile"])
ax.margins(x=0)

The cross_section service extracts a cross-section within a buffer distance from a point of interest along a flowline, at a given number of points.
[7]:
gdf = gpd.GeoDataFrame(
{
"numpts": [
101,
],
"width": [
1000,
],
},
geometry=[sgeom.Point((-103.80119, 40.2684))],
crs="epsg:4326",
)
section = pynhd.pygeoapi(gdf, "cross_section")
[8]:
ax = section[["distance", "elevation"]].plot(
x="distance", y="elevation", color="b", linewidth=3.0, figsize=(10, 5)
)
ax.legend(["Cross Section"])
ax.margins(x=0)

This page was generated from notebooks/rem.ipynb.
Relative Elevation Model#
This tutorial is inspired by this blog post, this excellent story map, and this notebook.
[1]:
from pathlib import Path
import numpy as np
import proplot as pplt
import py3dep
import pynhd
import shapely.ops as ops
import xarray as xr
import xrspatial as xs
from datashader import transfer_functions as tf
from datashader import utils as ds_utils
from datashader.colors import Greys9, inferno
from scipy.spatial import KDTree
[2]:
import warnings
warnings.filterwarnings("ignore", ".*invalid value encountered in.*")
A Relative Elevation Model (REM) detrends a DEM based on the water surface of a stream. It's especially useful for visualizing floodplains. We're going to compute the REM for a segment of the Carson River and visualize the results using xarray-spatial and datashader.
First, let's check out the available DEM resolutions in our area of interest (AOI).
[3]:
bbox = (-119.59, 39.24, -119.47, 39.30)
dem_res = py3dep.check_3dep_availability(bbox)
dem_res
[3]:
{'1m': True,
'3m': False,
'5m': False,
'10m': True,
'30m': True,
'60m': False,
'topobathy': False}
We can see that Lidar (1-m), 10-m, and 30-m data are available. Obviously, the Lidar data gives us the best results, but it can be computationally expensive. So, we're going to set the resolution to 5 m since both 1-m and 10-m data are available.
[4]:
res = 5
dem = py3dep.get_map("DEM", bbox, res)
fig, ax = pplt.subplots(refwidth=4, refaspect=2)
dem.plot(ax=ax, robust=True)
[4]:
<matplotlib.collections.QuadMesh at 0x1aa7df6a0>

Next, we need to get the river's centerline. For this purpose, first we get the flowlines within our AOI. Then, we remove all the isolated flowlines using the remove_isolated flag of pynhd.prepare_nhdplus and find the main flowline based on the minimum value of the levelpathi attribute.
[5]:
wd = pynhd.NHD("flowline_mr")
flw = wd.bygeom(bbox)
flw = pynhd.prepare_nhdplus(flw, 0, 0, 0, remove_isolated=True)
flw = flw[flw.levelpathi == flw.levelpathi.min()].copy()
fig, ax = pplt.subplots(refwidth=4, refaspect=2)
flw.plot(ax=ax, color="r")
dem.plot(ax=ax, robust=True)
[5]:
<matplotlib.collections.QuadMesh at 0x1b92bd7e0>

Now, we can get the elevation profile along the obtained main flowline with a spacing of 10 meters, using the 1-m resolution DEM.
[6]:
lines = ops.linemerge(flw.geometry.tolist())
riv_dem = py3dep.elevation_profile(lines, 10, 1, crs=flw.crs)
riv_dem.plot(x="distance", figsize=(7, 3.2))
[6]:
[<matplotlib.lines.Line2D at 0x1aa3a4310>]

There are several methods for detrending a DEM based on the river's elevation profile. You can check these methods in this poster. Here, we're going to use the Inverse Distance Weighting (IDW) method, using scipy's KDTree and setting the number of neighbors to 200.
[7]:
def idw(riv_dem: xr.DataArray, dem: xr.DataArray, n_nb: int) -> xr.DataArray:
"""Interpolate grid DEM from river DEM using Inverse Distance Weighting."""
riv_coords = np.column_stack((riv_dem.x, riv_dem.y))
kdt = KDTree(riv_coords)
dem_grid = np.dstack(np.meshgrid(dem.x, dem.y)).reshape(-1, 2)
distances, idx = kdt.query(dem_grid, k=n_nb)
weights = np.reciprocal(distances)
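    # Normalize the inverse-distance weights so they sum to one for each DEM cell.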
weights = weights / weights.sum(axis=1, keepdims=True)
interp = weights * riv_dem.to_numpy()[idx]
interp = interp.sum(axis=1).reshape((dem.sizes["y"], dem.sizes["x"]))
return xr.DataArray(interp, dims=("y", "x"), coords={"x": dem.x, "y": dem.y})
elevation = idw(riv_dem, dem, 200)
rem = dem - elevation
fig, ax = pplt.subplots(refwidth=4, refaspect=2)
elevation.plot(ax=ax)
flw.plot(ax=ax, color="red")
[7]:
SubplotGrid(nrows=1, ncols=1, length=1)

Let's use datashader to stack the DEM, hillshade, and REM for a nice visualization. There are a couple of parameters in this tutorial that can be changed when extending it to other regions: the number of neighbors in IDW, the DEM resolution, and the span argument of REM's shading operation.
[8]:
illuminated = xs.hillshade(dem, angle_altitude=10, azimuth=90)
tf.Image.border = 0
img = tf.stack(
tf.shade(dem, cmap=Greys9, how="linear"),
tf.shade(illuminated, cmap=["black", "white"], how="linear", alpha=180),
tf.shade(rem, cmap=inferno[::-1], span=[0, 7], how="log", alpha=200),
)
ds_utils.export_image(img, Path("_static", "rem").as_posix())
[8]:
This page was generated from notebooks/ssebop.ipynb.
Actual Evapotranspiration#
[1]:
import networkx as nx
import osmnx as ox
import pandas as pd
import pygeohydro as gh
from pynhd import NLDI
The daily actual evapotranspiration can be retrieved from the SSEBop database. Note that since this database does not offer a web service and its data are available only as raster files on the server, this function is not as fast as the other functions and download speed might be the bottleneck. You can get the actual ET for coordinate-based requests using ssebopeta_bycoords and for geometry-based requests using ssebopeta_bygeom.
Now, let's see ssebopeta_bycoords in action. The coordinates must be a dataframe with three columns: id, x, and y. Let's use the osmnx package to get a street network:
[2]:
G = ox.graph_from_place("Piedmont, California, USA", network_type="drive")
/Users/tchegini/.local/apps/mambaforge/envs/hyriver-dev/lib/python3.10/site-packages/osmnx/geocoder.py:110: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
gdf = gdf.append(_geocode_query_to_gdf(q, wr, by_osmid))
Now, we can get the actual ET for each node based on its coordinates and then plot the results.
[3]:
dates = ("2005-10-01", "2005-10-05")
x, y = nx.get_node_attributes(G, "x"), nx.get_node_attributes(G, "y")
coords = pd.DataFrame({"id": x.keys(), "x": x.values(), "y": y.values()})
ds = gh.ssebopeta_bycoords(coords, dates=dates)
The function returns an xarray.Dataset:
[4]:
ds
[4]:
<xarray.Dataset> Dimensions: (time: 5, location_id: 346) Coordinates: * time (time) datetime64[ns] 2005-10-01 2005-10-02 ... 2005-10-05 * location_id (location_id) int64 53017091 53018397 ... 9124766429 9231102999 Data variables: eta (time, location_id) float64 0.551 0.551 0.551 ... 0.551 0.551 x (location_id) float64 -122.2 -122.2 -122.2 ... -122.2 -122.2 y (location_id) float64 37.83 37.82 37.82 ... 37.83 37.83 37.82
Let's add the time-averaged ET values to the street nodes:
[5]:
eta_m = ds.mean(dim="time").eta
eta_dict = {sid: eta_m.sel(location_id=sid).item() for sid in eta_m.location_id.values}
nx.set_node_attributes(G, eta_dict, "eta")
[6]:
nc = ox.plot.get_node_colors_by_attr(G, "eta", cmap="viridis")
fig, ax = ox.plot_graph(
G,
node_color=nc,
node_size=20,
save=True,
bgcolor="w",
)

Now, we get a watershed geometry using NLDI and then get the actual ET within its geometry.
[7]:
geometry = NLDI().get_basins("01315500").geometry[0]
[8]:
eta = gh.ssebopeta_bygeom(geometry, dates=dates)
[9]:
ax = eta.isel(time=4).plot(size=5)
date = eta.isel(time=4).time.dt.strftime("%Y-%m-%d").values
ax.axes.set_title(f"Actual Evapotranspiration ({date})")
ax.figure.savefig("_static/eta.png", bbox_inches="tight", facecolor="w")

This page was generated from notebooks/us_coasts.ipynb.
CONUS Coasts#
US States and Coastlines#
Census TIGER (Topologically Integrated Geographic Encoding and Referencing database)
[1]:
import os
import warnings
from pathlib import Path
import geopandas as gpd
warnings.filterwarnings("ignore", message=".*initial implementation of Parquet.*")
root = Path("input_data")
os.makedirs(root, exist_ok=True)
BASE_PLOT = {"facecolor": "k", "edgecolor": "b", "alpha": 0.2, "figsize": (18, 9)}
CRS = "esri:102008"
def tiger_shp_2020(data):
url = f"https://www2.census.gov/geo/tiger/TIGER2020/{data.upper()}/tl_2020_us_{data}.zip"
return gpd.read_file(url)
cfile = Path(root, "us_state.feather")
if cfile.exists():
state = gpd.read_feather(cfile)
else:
state = tiger_shp_2020("state")
state.to_feather(cfile)
[2]:
ax = state.to_crs(CRS).plot(**BASE_PLOT)
ax.axis("off")
ax.margins(0)

[3]:
cfile = Path(root, "us_coastline.feather")
if cfile.exists():
coastline = gpd.read_feather(cfile)
else:
coastline = tiger_shp_2020("coastline")
coastline.to_feather(cfile)
[4]:
ax = coastline.to_crs(CRS).plot(figsize=(18, 9))
ax.axis("off")
ax.margins(0)

Clip to CONUS#
[5]:
from shapely.geometry import box
conus_bounds = box(-125, 24, -65, 50)
cfile = Path(root, "conus_states.feather")
if cfile.exists():
conus_states = gpd.read_feather(cfile)
else:
conus_states = state[state.within(conus_bounds)]
conus_states.to_feather(cfile)
[6]:
ax = conus_states.to_crs(CRS).plot(**BASE_PLOT)
ax.axis("off")
ax.margins(0)

[7]:
conus_coastline = coastline[coastline.within(conus_bounds)]
cfile = Path(root, "us_coast_states.feather")
if cfile.exists():
coast_states = gpd.read_feather(cfile)
else:
coast_states = state[state.intersects(conus_coastline.unary_union)]
coast_states.to_feather(cfile)
[8]:
ax = coast_states.to_crs(CRS).plot(**BASE_PLOT)
conus_coastline.to_crs(CRS).plot(ax=ax, edgecolor="b", lw=2, zorder=1)
ax.axis("off")
ax.margins(0)

Tidal and Estuary USGS stations#
We need to look at the Water Services API.
[9]:
import pandas as pd
from pygeohydro import NWIS, ZeroMatched
cfile = Path(root, "coast_stations.feather")
if cfile.exists():
coast_stations = gpd.read_feather(cfile)
else:
queries = [
{
"stateCd": s.lower(),
"siteType": "ST-TS,ES",
"hasDataTypeCd": "dv",
"outputDataTypeCd": "dv",
}
for s in coast_states.STUSPS
]
nwis = NWIS()
def get_info(q):
try:
return nwis.get_info(q, True)
except ZeroMatched:
return None
sites = pd.concat(info for info in (get_info(q) for q in queries) if info is not None)
coast_stations = gpd.GeoDataFrame(
sites,
geometry=gpd.points_from_xy(sites.dec_long_va, sites.dec_lat_va),
crs="epsg:4269",
)
coast_stations.to_feather(cfile)
st = coast_stations[["site_no", "site_tp_cd", "geometry"]].to_crs(CRS)
ts = st[st.site_tp_cd == "ST-TS"].drop_duplicates()
es = st[st.site_tp_cd == "ES"].drop_duplicates()
[10]:
ax = conus_states.to_crs(CRS).plot(**BASE_PLOT)
coastline[coastline.within(conus_bounds)].to_crs(CRS).plot(ax=ax, edgecolor="g", lw=2, zorder=1)
ts.plot(ax=ax, lw=3, c="r")
es.plot(ax=ax, lw=3, c="b")
ax.legend(["Coastline", f"ST-TS ({ts.shape[0]})", f"ES ({es.shape[0]})"], loc="best")
ax.axis("off")
ax.margins(0)

Mean daily discharge for all stations#
[11]:
import numpy as np
import pandas as pd
cfile = Path(root, "discharge.parquet")
dates = ("2000-01-01", "2015-12-31")
if cfile.exists():
discharge = pd.read_parquet(cfile)
else:
nwis = NWIS()
discharge = nwis.get_streamflow(
coast_stations.site_no,
dates,
)
discharge[discharge < 0] = np.nan
discharge.to_parquet(cfile)
[12]:
ax = discharge.plot(legend=False, lw=0.8, figsize=(15, 6))
ax.set_ylabel("Q (cms)")
ax.set_xlabel("")
ax.margins(x=0)

Let’s find the station that has the largest peak value and see it on a map.
[13]:
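# Column labels look like 'USGS-XXXXXXXX'; keep only the station number part.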
station_id = discharge.max().idxmax().split("-")[1]
coast_stations[coast_stations.site_no == station_id].iloc[0]["station_nm"]
[13]:
'COLUMBIA RIVER AT PORT WESTWARD, NEAR QUINCY, OR'
[14]:
import pygeohydro as gh
lat, lon = coast_stations[coast_stations.site_no == station_id].iloc[0][
["dec_lat_va", "dec_long_va"]
]
bbox = (lon - 0.2, lat - 0.2, lon + 0.2, lat + 0.2)
nwis_kwds = {"hasDataTypeCd": "dv", "outputDataTypeCd": "dv", "parameterCd": "00060"}
station_map = gh.interactive_map(bbox, nwis_kwds=nwis_kwds)
/Users/tchegini/.local/apps/mambaforge/envs/hyriver-dev/lib/python3.10/site-packages/pandas/core/groupby/generic.py:303: FutureWarning: Dropping invalid columns in SeriesGroupBy.agg is deprecated. In a future version, a TypeError will be raised. Before calling .agg, select only columns which should be valid for the aggregating function.
results[key] = self.aggregate(func)
[15]:
station_map
[15]:
River network data#
Basin#
[16]:
import pynhd as nhd
nldi = nhd.NLDI()
cfile = Path(root, "basin.feather")
if cfile.exists():
basin = gpd.read_feather(cfile)
else:
basin = nldi.get_basins(station_id)
basin.to_feather(cfile)
[17]:
ax = basin.to_crs(CRS).plot(**BASE_PLOT)
ax.axis("off")
ax.margins(0)

[18]:
import folium
folium.GeoJson(basin.geometry).add_to(station_map)
station_map
[18]:
Main river network#
[19]:
cfile = Path(root, "flowline_main.feather")
if cfile.exists():
flw_main = gpd.read_feather(cfile)
else:
flw_main = nldi.navigate_byid(
fsource="nwissite",
fid=f"USGS-{station_id}",
navigation="upstreamMain",
source="flowlines",
distance=4000,
)
flw_main.to_feather(cfile)
Tributaries#
[20]:
cfile = Path(root, "flowline_trib.feather")
if cfile.exists():
flw_trib = gpd.read_feather(cfile)
else:
flw_trib = nldi.navigate_byid(
fsource="nwissite",
fid=f"USGS-{station_id}",
navigation="upstreamTributaries",
source="flowlines",
distance=4000,
)
flw_trib.to_feather(cfile)
flw_trib["nhdplus_comid"] = flw_trib["nhdplus_comid"].astype("float").astype("Int64")
[21]:
ax = basin.plot(**BASE_PLOT)
flw_trib.plot(ax=ax)
flw_main.plot(ax=ax, lw=3, color="r")
ax.legend(["Tributaries", "Main"])
ax.axis("off")
ax.margins(0)

NHDPlus Value Added Attributes (VAA)#
[22]:
cfile = Path(root, "nhdplus_vaa.parquet")
nhdplus_vaa = nhd.nhdplus_vaa(cfile)
flw_trib = flw_trib.merge(nhdplus_vaa, left_on="nhdplus_comid", right_on="comid", how="left")
flw_trib.loc[flw_trib.slope < 0, "slope"] = np.nan
[23]:
ax = basin.plot(**BASE_PLOT)
flw_trib.plot(
ax=ax,
column="slope",
scheme="natural_breaks",
k=6,
cmap="plasma",
legend=True,
legend_kwds={"title": "Slope [-]"},
)
ax.axis("off")
ax.margins(0)

This page was generated from notebooks/water_quality.ipynb.
Water Quality Portal#
[1]:
import contextily as ctx
import geopandas as gpd
from pygeohydro import WaterQuality
from shapely.geometry import Point
[2]:
SAVE_KWDS = {"bbox_inches": "tight", "dpi": 300, "facecolor": "w"}
The WaterQuality class has a number of convenience methods to retrieve data from the web service. Since there are many parameter combinations that can be used to retrieve data, a general method is also provided to retrieve data from any of the valid endpoints. You can use get_json to retrieve station info as a geopandas.GeoDataFrame or get_csv to retrieve station data as a pandas.DataFrame. You can construct a dictionary of the parameters and pass it to one of these functions. For more information on the parameters, please consult the Water Quality Data documentation. For example, let's find all the stations within a bounding box that have Caffeine data:
[3]:
wq = WaterQuality()
stations = wq.station_bybbox((-92.8, 44.2, -88.9, 46.0), {"characteristicName": "Caffeine"})
Or the same criterion but within a 30-mile radius of a point:
[4]:
stations = wq.station_bydistance(-92.8, 44.2, 30, {"characteristicName": "Caffeine"})
[5]:
point = gpd.GeoSeries([Point(-92.8, 44.2)], crs="epsg:4326")
point = point.to_crs(point.estimate_utm_crs())
buff = point.buffer(30 * 1609.344)
ax = buff.plot(color="b", edgecolor="black", alpha=0.1, figsize=(6, 6))
point.plot(ax=ax, color="g", marker="x")
stations.to_crs(buff.crs).plot(ax=ax, color="r", marker="o", facecolor="none")
ctx.add_basemap(ax, crs=buff.crs)
ax.set_axis_off()
ax.margins(0)
ax.figure.set_dpi(100)
ax.figure.savefig("_static/water_quality.png", **SAVE_KWDS)

Then we can get the data for all these stations like this:
[6]:
sids = stations.MonitoringLocationIdentifier.tolist()
caff = wq.data_bystation(sids, {"characteristicName": "Caffeine"})
caff.head()
[6]:
OrganizationIdentifier | OrganizationFormalName | ActivityIdentifier | ActivityTypeCode | ActivityMediaName | ActivityMediaSubdivisionName | ActivityStartDate | ActivityStartTime/Time | ActivityStartTime/TimeZoneCode | ActivityEndDate | ... | ResultAnalyticalMethod/MethodName | MethodDescriptionText | LaboratoryName | AnalysisStartDate | ResultLaboratoryCommentText | DetectionQuantitationLimitTypeName | DetectionQuantitationLimitMeasure/MeasureValue | DetectionQuantitationLimitMeasure/MeasureUnitCode | PreparationStartDate | ProviderName | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | USGS-MN | USGS Minnesota Water Science Center | nwismn.01.00400395 | Sample-Routine | Water | Surface Water | 2004-09-03 | 11:30:00 | CDT | NaN | ... | NWQL Schedule 1433 | USGS WRI 01-4186 | NaN | 2004-11-13 | NaN | Estimated Detection Level | 0.50 | ug/l | 2004-09-16 | NWIS |
1 | USGS-IA | USGS Iowa Water Science Center | nwisia.01.00700543 | Sample-Routine | Water | Groundwater | 2007-08-27 | 15:00:00 | CDT | NaN | ... | Pest, polar, wf, spe, HPLC-MS | USGS WRI 01-4134 | USGS-National Water Quality Lab, Denver, CO | 2007-09-12 | NaN | Laboratory Reporting Level | 0.04 | ug/l | 2007-09-04 | NWIS |
2 | USGS-MN | USGS Minnesota Water Science Center | nwismn.01.00800189 | Sample-Routine | Water | Groundwater | 2008-04-23 | 12:30:00 | CDT | NaN | ... | NWQL Schedule 4433 | USGS TMR 5-B4 | USGS-National Water Quality Lab, Denver, CO | 2008-05-23 | NaN | Estimated Detection Level | 0.20 | ug/l | 2008-05-08 | NWIS |
3 | USGS-MN | USGS Minnesota Water Science Center | nwismn.01.00800166 | Sample-Routine | Water | Groundwater | 2008-04-23 | 13:53:00 | CDT | NaN | ... | NWQL Schedule 4433 | USGS TMR 5-B4 | USGS-National Water Quality Lab, Denver, CO | 2008-05-26 | NaN | Estimated Detection Level | 0.20 | ug/l | 2008-05-08 | NWIS |
4 | USGS-MN | USGS Minnesota Water Science Center | nwismn.01.00800206 | Sample-Routine | Water | Groundwater | 2008-04-23 | 13:30:00 | CDT | NaN | ... | NWQL Schedule 4433 | USGS TMR 5-B4 | USGS-National Water Quality Lab, Denver, CO | 2008-05-23 | NaN | Estimated Detection Level | 0.20 | ug/l | 2008-05-08 | NWIS |
5 rows × 63 columns
This page was generated from notebooks/webservices.ipynb.
Web Services Queries#
[1]:
import pygeoutils as geoutils
from pygeoogc import WFS, WMS, ArcGISRESTful, ServiceURL
from pynhd import NLDI
[2]:
import warnings
warnings.filterwarnings("ignore", message=".*Content metadata for layer.*")
PyGeoOGC and PyGeoUtils can be used to access any web service that is based on ArcGIS RESTful, WMS, or WFS. Note that although all these web services have limits on the number of objects (e.g., 1000 object IDs for RESTful) or pixels (e.g., 8 million pixels) per request, PyGeoOGC takes care of dividing the requests into smaller chunks under the hood and then merges the results.
Let’s get started by retrieving a watershed geometry using NLDI and use it for subsetting the data.
[3]:
basin = NLDI().get_basins("11092450")
basin_geom = basin.geometry[0]
PyGeoOGC has a NamedTuple called ServiceURL that contains URLs of some of the popular web services. Let’s use it to access the NHDPlus HR Dataset RESTful service, get all the catchments that our basin contains, and use pygeoutils.json2geodf
to convert the response into a GeoDataFrame.
[4]:
hr = ArcGISRESTful(ServiceURL().restful.nhdplushr, 10, outformat="json")
oids = hr.oids_bygeom(basin_geom, "epsg:4326", spatial_relation="esriSpatialRelContains")
resp = hr.get_features(oids)
catchments = geoutils.json2geodf(resp)
[5]:
catchments.plot(figsize=(8, 8))
[5]:
<AxesSubplot:>

Note that oids_bygeom has an additional argument for passing any valid SQL WHERE clause to further filter the data on the server side. For example, let’s only keep the catchments with an area larger than 0.5 sqkm.
[6]:
oids = hr.oids_bygeom(basin_geom, geo_crs="epsg:4326", sql_clause="AREASQKM > 0.5")
catchments = geoutils.json2geodf(hr.get_features(oids))
[7]:
ax = catchments.plot(figsize=(8, 8))
ax.figure.savefig("_static/sql_clause.png", bbox_inches="tight", facecolor="w")

We can also submit a query based on IDs of any valid field in the database. If the measure property is desired you can pass return_m
as True
to the get_features
class method:
[8]:
oids = hr.oids_byfield("NHDPLUSID", [5000500013223, 5000400039708, 5000500004825])
resp = hr.get_features(oids, return_m=True)
catchments = geoutils.json2geodf(resp)
[9]:
catchments.plot(figsize=(8, 8))
[9]:
<AxesSubplot:>

Next, let’s get bathymetry data from the GEBCO WMS service. First, we need to connect to the service using the WMS
class.
[10]:
wms = WMS(
"https://www.gebco.net/data_and_products/gebco_web_services/web_map_service/mapserv",
layers="GEBCO_LATEST",
outformat="image/tiff",
crs="epsg:3857",
)
Then we can get the data using the getmap_bybox
function. Note that this function only accepts a bounding box, so we pass a bounding box and can mask the returned data later using the pygeoutils.gtiff2xarray
function.
[11]:
bbox = (-95, 29, -94, 30)
r_dict = wms.getmap_bybox(
bbox,
100,
box_crs="epsg:4326",
)
bathymetry = geoutils.gtiff2xarray(r_dict, bbox, "epsg:4326")
[12]:
bathymetry.isel(band=0).plot()
[12]:
<matplotlib.collections.QuadMesh at 0x1a63850f0>

Next, let’s use the WaterData service to get HUC8 boundaries using WFS
and a CQL filter.
[13]:
layer = "wmadata:huc08"
wfs = WFS(
ServiceURL().wfs.waterdata,
layer=layer,
outformat="application/json",
version="2.0.0",
crs="epsg:4269",
)
resp = wfs.getfeature_byfilter("huc8 LIKE '13030%'")
huc8 = geoutils.json2geodf(resp, "epsg:4269", "epsg:4326")
[14]:
huc8.plot(figsize=(8, 8))
[14]:
<AxesSubplot:>

API References#
pynhd
#
Top-level package for PyNHD.
Submodules#
pynhd.core
#
Base classes for PyNHD functions.
Module Contents#
- class pynhd.core.AGRBase(base_url, layer=None, outfields='*', crs=DEF_CRS, outformat='json')#
Base class for getting geospatial data from an ArcGISRESTful service.
- Parameters
base_url (
str
, optional) – The ArcGIS RESTful service url. The URL must either include a layer number after the last/
in the url or the target layer must be passed as an argument.layer (
str
, optional) – A valid service layer. To see a list of available layers instantiate the class without passing any argument.outfields (
str
orlist
, optional) – Target field name(s), default to “*” i.e., all the fields.crs (
str
, optional) – Target spatial reference, default to EPSG:4326outformat (
str
, optional) – One of the output formats offered by the selected layer. If not correct a list of available formats is shown, defaults tojson
.
- bygeom(self, geom, geo_crs=DEF_CRS, sql_clause='', distance=None, return_m=False, return_geom=True)#
Get feature within a geometry that can be combined with a SQL where clause.
- Parameters
geom (
Polygon
ortuple
) – A geometry (Polygon) or bounding box (tuple of length 4).geo_crs (
str
) – The spatial reference of the input geometry.sql_clause (
str
, optional) – A valid SQL 92 WHERE clause, defaults to an empty string.distance (
int
, optional) – The buffer distance for the input geometries in meters, default to None.return_m (
bool
, optional) – Whether to activate the Return M (measure) in the request, defaults to False.return_geom (
bool
, optional) – Whether to return the geometry of the feature, defaults toTrue
.
- Returns
geopandas.GeoDataFrame
– The requested features as a GeoDataFrame.
- byids(self, field, fids, return_m=False, return_geom=True)#
Get features based on a list of field IDs.
- Parameters
- Returns
geopandas.GeoDataFrame
– The requested features as a GeoDataFrame.
- bysql(self, sql_clause, return_m=False, return_geom=True)#
Get feature IDs using a valid SQL 92 WHERE clause.
Notes
Not all web services support this type of query. For more details look here
- Parameters
- Returns
geopandas.GeoDataFrame
– The requested features as a GeoDataFrame.
- get_validlayers(self, url)#
Get a list of valid layers.
- property service_info(self)#
Get the service information.
- class pynhd.core.GeoConnex(item=None)#
Access to the GeoConnex API.
Notes
The
geometry
field of the query can be a Polygon, MultiPolygon, or tuple/list of length 4 (bbox) inEPSG:4326
CRS. They should be within the extent of the GeoConnex endpoint.- Parameters
item (
str
, optional) – The target endpoint to query, defaults toNone
.
- property item(self)#
Return the name of the endpoint.
- query(self, kwds, skip_geometry=False)#
Query the GeoConnex endpoint.
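As a minimal sketch of how this class could be used (the "gages" endpoint name and the query keywords are assumptions for illustration and are not documented on this page):
from pynhd.core import GeoConnex
gcx = GeoConnex(item="gages")  # "gages" is an assumed endpoint name
# Per the notes above, the geometry field of the query can be a bbox tuple in EPSG:4326
gages = gcx.query({"geometry": (-69.77, 45.07, -69.31, 45.45)})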
- class pynhd.core.ScienceBase#
Access and explore files on ScienceBase.
- static get_children(item)#
Get children items of an item.
- static get_file_urls(item)#
Get download and meta URLs of all the available files for an item.
- pynhd.core.stage_nhdplus_attrs(parquet_path=None)#
Stage the NHDPlus Attributes database and save to nhdplus_attrs.parquet.
More info can be found here.
- Parameters
parquet_path (
str
orPath
) – Path to a file with.parquet
extension for saving the processed data to disk for later use.- Returns
pandas.DataFrame
– The staged data as a DataFrame.
pynhd.network_tools
#
Access NLDI and WaterData databases.
Module Contents#
- pynhd.network_tools.flowline_resample(flw, spacing)#
Resample a flowline based on a given spacing.
- Parameters
flw (
geopandas.GeoDataFrame
) – A dataframe withgeometry
andcomid
columns and a CRS attribute. The flowlines should be able to be merged into a singleLineString
. Otherwise, you should use thenetwork_resample()
function.spacing (
float
) – Spacing between the sample points in meters.
- Returns
geopandas.GeoDataFrame
– Resampled flowline.
- pynhd.network_tools.flowline_xsection(flw, distance, width)#
Get cross-section of a river network at a given spacing.
- Parameters
flw (
geopandas.GeoDataFrame
) – A dataframe withgeometry
andcomid
columns and CRS attribute.distance (
float
) – The distance between two consecutive cross-sections.width (
float
) – The width of the cross-section.
- Returns
geopandas.GeoDataFrame
– A dataframe with two columns:geometry
andcomid
. Thegeometry
column contains the cross-section of the river network and thecomid
column contains the correspondingcomid
from the input dataframe. Note that eachcomid
can have multiple cross-sections depending on the given spacing distance.
- pynhd.network_tools.network_resample(flw, spacing)#
Get cross-section of a river network at a given spacing.
- Parameters
flw (
geopandas.GeoDataFrame
) – A dataframe withgeometry
andcomid
columns and CRS attribute.spacing (
float
) – The spacing between the points.
- Returns
geopandas.GeoDataFrame
– Resampled flowlines.
- pynhd.network_tools.network_xsection(flw, distance, width)#
Get cross-section of a river network at a given spacing.
- Parameters
flw (
geopandas.GeoDataFrame
) – A dataframe withgeometry
andcomid
columns and CRS attribute.distance (
float
) – The distance between two consecutive cross-sections.width (
float
) – The width of the cross-section.
- Returns
geopandas.GeoDataFrame
– A dataframe with two columns:geometry
andcomid
. Thegeometry
column contains the cross-section of the river network and thecomid
column contains the correspondingcomid
from the input dataframe. Note that eachcomid
can have multiple cross-sections depending on the given spacing distance.
- pynhd.network_tools.nhdflw2nx(flowlines, id_col='comid', toid_col='tocomid', edge_attr=None)#
Convert NHDPlus flowline database to networkx graph.
- Parameters
flowlines (
geopandas.GeoDataFrame
) – NHDPlus flowlines.id_col (
str
, optional) – Name of the column containing the node ID, defaults to “comid”.toid_col (
str
, optional) – Name of the column containing the downstream node ID, defaults to “tocomid”.edge_attr (
str
, optional) – Name of the column containing the edge attributes, defaults toNone
. IfTrue
, all remaining columns will be used as edge attributes.
- Returns
nx.DiGraph
– Networkx directed graph of the NHDPlus flowlines.
- pynhd.network_tools.prepare_nhdplus(flowlines, min_network_size, min_path_length, min_path_size=0, purge_non_dendritic=False, remove_isolated=False, use_enhd_attrs=False, terminal2nan=True)#
Clean up and fix common issues of NHDPlus flowline database.
Ported from nhdplusTools.
- Parameters
flowlines (
geopandas.GeoDataFrame
) – NHDPlus flowlines with at least the following columns:comid
,lengthkm
,ftype
,terminalfl
,fromnode
,tonode
,totdasqkm
,startflag
,streamorde
,streamcalc
,terminalpa
,pathlength
,divergence
,hydroseq
,levelpathi
.min_network_size (
float
) – Minimum size of drainage network in sqkmmin_path_length (
float
) – Minimum length of terminal level path of a network in km.min_path_size (
float
, optional) – Minimum size of outlet level path of a drainage basin in km. Drainage basins with an outlet drainage area smaller than this value will be removed. Defaults to 0.purge_non_dendritic (
bool
, optional) – Whether to remove non dendritic paths, defaults to False.remove_isolated (
bool
, optional) – Whether to remove isolated flowlines, defaults to False. If True,terminal2nan
will be set to False.use_enhd_attrs (
bool
, optional) – Whether to replace the attributes with the ENHD attributes, defaults to False. For more information, see this.terminal2nan (
bool
, optional) – Whether to replace the COMID of the terminal flowline of the network with NaN, defaults to True. If False, the terminal COMID will be set from the ENHD attributes i.e. use_enhd_attrs will be set to True.
- Returns
geopandas.GeoDataFrame
– Cleaned up flowlines. Note that all column names are converted to lower case.
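A minimal sketch of chaining these network tools: fetch flowlines from the WaterData service (documented later on this page), clean them with prepare_nhdplus, and build a directed graph with nhdflw2nx. The bounding box is arbitrary and the attributes returned by the service are assumed to satisfy the column requirements listed above.
from pynhd import WaterData
from pynhd.network_tools import nhdflw2nx, prepare_nhdplus
# Arbitrary bounding box in EPSG:4326 (west, south, east, north)
flw = WaterData("nhdflowline_network").bybox((-69.77, 45.07, -69.31, 45.45))
# Clean up the flowlines; all column names are converted to lower case
flw = prepare_nhdplus(flw, min_network_size=0, min_path_length=0, purge_non_dendritic=False)
# Build a directed graph using comid/tocomid as the node IDs
graph = nhdflw2nx(flw, id_col="comid", toid_col="tocomid")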
- pynhd.network_tools.topoogical_sort(flowlines, edge_attr=None)#
Topological sorting of a river network.
- Parameters
flowlines (
pandas.DataFrame
) – A dataframe with columns ID and toIDedge_attr (
str
orlist
, optional) – Names of the columns in the dataframe to be used as edge attributes, defaults to None.
- Returns
(list
, dict ,networkx.DiGraph)
– A list of topologically sorted IDs, a dictionary with keys as IDs and values as its upstream nodes, and the generated networkx object. Note that the terminal node ID is set to pd.NA.
- pynhd.network_tools.vector_accumulation(flowlines, func, attr_col, arg_cols, id_col='comid', toid_col='tocomid')#
Flow accumulation using vector river network data.
- Parameters
flowlines (
pandas.DataFrame
) – A dataframe containing comid, tocomid, attr_col, and all the columns that are required for passing tofunc
.func (
function
) – The function that routes the flow in a single river segment. Positions of the arguments in the function should be as follows:func(qin, *arg_cols)
qin
is computed in this function and the rest are in the order of thearg_cols
. For example, ifarg_cols = ["slope", "roughness"]
then the functions is called this way:func(qin, slope, roughness)
where slope and roughness are elemental values read from the flowlines.attr_col (
str
) – The column name of the attribute being accumulated in the network. The column should contain the initial condition for the attribute for each river segment. It can be a scalar or an array (e.g., time series).arg_cols (
list
ofstrs
) – List of the flowlines columns that contain all the required data for routing a single river segment such as slope, length, lateral flow, etc.id_col (
str
, optional) – Name of the flowlines column containing IDs, defaults tocomid
toid_col (
str
, optional) – Name of the flowlines column containingtoIDs
, defaults totocomid
- Returns
pandas.Series
– Accumulated flow for all the nodes. The dataframe is sorted from upstream to downstream (topological sorting). Depending on the given initial condition in theattr_col
, the outflow for each river segment can be a scalar or an array.
pynhd.nhdplus_derived
#
Access NLDI and WaterData databases.
Module Contents#
- pynhd.nhdplus_derived.enhd_attrs(parquet_path=None)#
Get updated NHDPlus attributes from ENHD.
Notes
This downloads a 160 MB
parquet
file from here . Although this dataframe does not include geometry, it can be linked to other geospatial NHDPlus dataframes through ComIDs.- Parameters
parquet_path (
str
orPath
, optional) – Path to a file with.parquet
extension for storing the file, defaults to./cache/enhd_attrs.parquet
.- Returns
pandas.DataFrame
– A dataframe that includes ComID-level attributes for 2.7 million NHDPlus flowlines.
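A minimal sketch of caching and linking this dataframe (the merge column name in the comment is an assumption for illustration):
from pynhd.nhdplus_derived import enhd_attrs
# Downloads the ~160 MB parquet file on the first call and reuses the cached copy afterwards
attrs = enhd_attrs(parquet_path="./cache/enhd_attrs.parquet")
# The result is a plain pandas.DataFrame, so it can be joined to other NHDPlus
# dataframes through their ComID column, e.g., flw.merge(attrs, on="comid")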
- pynhd.nhdplus_derived.nhd_fcode()#
Get all the NHDPlus FCodes.
- pynhd.nhdplus_derived.nhdplus_attrs(name=None, parquet_path=None)#
Access NHDPlus V2.1 Attributes from ScienceBase over CONUS.
More info can be found here.
- Parameters
name (
str
, optional) – Name of the NHDPlus attribute, defaults to None which returns a dataframe containing metadata of all the available attributes in the database.parquet_path (
str
orPath
, optional) – Path to a file with.parquet
extension for saving the processed data to disk for later use. Defaults to./cache/nhdplus_attrs.parquet
.
- Returns
pandas.DataFrame
– Either a dataframe containing the database metadata or the requested attribute over CONUS.
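A minimal sketch of the two-step workflow described above; "CAT_RECHG" is used only as an example attribute name and should be checked against the metadata:
from pynhd.nhdplus_derived import nhdplus_attrs
meta = nhdplus_attrs()  # metadata of all available attributes
recharge = nhdplus_attrs("CAT_RECHG")  # one attribute over CONUS (example name)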
- pynhd.nhdplus_derived.nhdplus_vaa(parquet_path=None)#
Get NHDPlus Value Added Attributes with ComID-level roughness and slope values.
Notes
This function downloads a 200 MB
parquet
file from here . Although this dataframe does not include geometry, it can be linked to other geospatial NHDPlus dataframes through ComIDs.- Parameters
parquet_path (
str
orPath
, optional) – Path to a file with.parquet
extension for storing the file, defaults to./cache/nldplus_vaa.parquet
.- Returns
pandas.DataFrame
– A dataframe that includes ComID-level attributes for 2.7 million NHDPlus flowlines.
Examples
>>> vaa = nhdplus_vaa() >>> print(vaa.slope.max()) 4.6
pynhd.pynhd
#
Access NLDI and WaterData databases.
Module Contents#
- class pynhd.pynhd.NHD(layer, outfields='*', crs=DEF_CRS)#
Access the National Hydrography Dataset (NHD), both medium and high resolution.
Notes
For more info visit: https://hydro.nationalmap.gov/arcgis/rest/services/nhd/MapServer
- Parameters
layer (
str
, optional) – A valid service layer. Layer names with_hr
are high resolution and_mr
are medium resolution. Also, layer names with_nonconus
are for non-conus areas, i.e., Alaska, Hawaii, Puerto Rico, the Virgin Islands , and the Pacific Islands. Valid layers are:point
point_event
line_hr
flow_direction
flowline_mr
flowline_hr_nonconus
flowline_hr
area_mr
area_hr_nonconus
area_hr
waterbody_mr
waterbody_hr_nonconus
waterbody_hr
outfields (
str
orlist
, optional) – Target field name(s), default to “*” i.e., all the fields.crs (
str
, optional) – Target spatial reference, default toEPSG:4326
- class pynhd.pynhd.NHDPlusHR(layer, outfields='*', crs=DEF_CRS)#
Access National Hydrography Dataset (NHD) high resolution.
Notes
For more info visit: https://edits.nationalmap.gov/arcgis/rest/services/nhd/MapServer
- Parameters
layer (
str
, optional) – A valid service layer. Valid layers are:point
sink
flowline
non_network_flowline
flow_direction
line
wall
burn_line
burn_waterbody
area
waterbody
huc12
catchment
outfields (
str
orlist
, optional) – Target field name(s), default to “*” i.e., all the fields.crs (
str
, optional) – Target spatial reference, default toEPSG:4326
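Since NHD and NHDPlusHR build on the AGRBase query methods shown earlier (bygeom, byids, bysql), a minimal hedged sketch of using them could look like this (the bounding box is arbitrary):
from pynhd import NHDPlusHR
hr = NHDPlusHR("flowline")
# bygeom accepts a Polygon or a bounding box of length 4 in the given CRS
flw_hr = hr.bygeom((-69.77, 45.07, -69.31, 45.45), geo_crs="epsg:4326")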
- class pynhd.pynhd.NLDI#
Access the Hydro Network-Linked Data Index (NLDI) service.
- comid_byloc(self, coords, loc_crs=DEF_CRS)#
Get the closest ComID based on coordinates.
Notes
This function tries to find the closest ComID based on flowline grid cells. If such a cell is not found, it will return the closest ComID using the flowtrace endpoint of the PyGeoAPI service to find the closest downstream ComID. The returned dataframe has a
measure
column that indicates the location of the input coordinate on the flowline as a percentage of the total flowline length.- Parameters
- Returns
geopandas.GeoDataFrame
or(geopandas.GeoDataFrame
,list)
– NLDI indexed ComID(s) and points in EPSG:4326. If some coords don’t return any ComID, a list of missing coords is returned as well.
- feature_byloc(self, coords, loc_crs=DEF_CRS)#
Get the closest feature ID(s) based on coordinates.
- Parameters
- Returns
geopandas.GeoDataFrame
or(geopandas.GeoDataFrame
,list)
– NLDI indexed feature ID(s) and flowlines in EPSG:4326. If some coords don’t return any IDs, a list of missing coords is returned as well.
- get_basins(self, feature_ids, fsource='nwissite', split_catchment=False, simplified=True)#
Get basins for a list of station IDs.
- Parameters
fsource (
str
) – The name of feature(s) source, defaults tonwissite
. The valid sources are:‘comid’ for NHDPlus comid.
‘ca_gages’ for Streamgage catalog for CA SB19
‘gfv11_pois’ for USGS Geospatial Fabric V1.1 Points of Interest
‘huc12pp’ for HUC12 Pour Points
‘nmwdi-st’ for New Mexico Water Data Initiative Sites
‘nwisgw’ for NWIS Groundwater Sites
‘nwissite’ for NWIS Surface Water Sites
‘ref_gage’ for geoconnex.us reference gages
‘vigil’ for Vigil Network Data
‘wade’ for Water Data Exchange 2.0 Sites
‘WQP’ for Water Quality Portal
split_catchment (
bool
, optional) – IfTrue
, split basins at their outlet locations. Default toFalse
.simplified (
bool
, optional) – IfTrue
, return a simplified version of basin geometries. Default toTrue
.
- Returns
geopandas.GeoDataFrame
or(geopandas.GeoDataFrame
,list)
– NLDI indexed basins in EPSG:4326. If some IDs don’t return any features, a list of missing ID(s) is returned as well.
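A minimal sketch using the station ID from the notebook example earlier on this page:
from pynhd import NLDI
# fsource defaults to "nwissite"; a single ID or a list of IDs can be passed
basins = NLDI().get_basins(["11092450"], split_catchment=False, simplified=True)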
- get_validchars(self, char_type)#
Get all the available characteristics IDs for a given characteristics type.
- getcharacteristic_byid(self, comids, char_type, char_ids='all', values_only=True)#
Get characteristics using a list ComIDs.
- Parameters
char_type (
str
) – Type of the characteristic. Valid values arelocal
for individual reach catchments,tot
for network-accumulated values using total cumulative drainage area anddiv
for network-accumulated values using divergence-routed.char_ids (
str
orlist
, optional) – Name(s) of the target characteristics, default to all.values_only (
bool
, optional) – Whether to return onlycharacteristic_value
as a series, default to True. If is set to False,percent_nodata
is returned as well.
- Returns
pandas.DataFrame
ortuple
ofpandas.DataFrame
– Either onlycharacteristic_value
as a dataframe or or ifvalues_only
is Fale returnpercent_nodata
as well.
- getfeature_byid(self, fsource, fid)#
Get feature(s) based on ID(s).
- Parameters
fsource (
str
) – The name of feature(s) source. The valid sources are:‘comid’ for NHDPlus comid.
‘ca_gages’ for Streamgage catalog for CA SB19
‘gfv11_pois’ for USGS Geospatial Fabric V1.1 Points of Interest
‘huc12pp’ for HUC12 Pour Points
‘nmwdi-st’ for New Mexico Water Data Initiative Sites
‘nwisgw’ for NWIS Groundwater Sites
‘nwissite’ for NWIS Surface Water Sites
‘ref_gage’ for geoconnex.us reference gages
‘vigil’ for Vigil Network Data
‘wade’ for Water Data Exchange 2.0 Sites
‘WQP’ for Water Quality Portal
- Returns
geopandas.GeoDataFrame
or(geopandas.GeoDataFrame
,list)
– NLDI indexed features in EPSG:4326. If some IDs don’t return any features, a list of missing ID(s) is returned as well.
Navigate the NHDPlus database from a single feature id up to a distance.
- Parameters
fsource (
str
) – The name of feature(s) source. The valid sources are:‘comid’ for NHDPlus comid.
‘ca_gages’ for Streamgage catalog for CA SB19
‘gfv11_pois’ for USGS Geospatial Fabric V1.1 Points of Interest
‘huc12pp’ for HUC12 Pour Points
‘nmwdi-st’ for New Mexico Water Data Initiative Sites
‘nwisgw’ for NWIS Groundwater Sites
‘nwissite’ for NWIS Surface Water Sites
‘ref_gage’ for geoconnex.us reference gages
‘vigil’ for Vigil Network Data
‘wade’ for Water Data Exchange 2.0 Sites
‘WQP’ for Water Quality Portal
fid (
str
) – The ID of the feature.navigation (
str
) – The navigation method.source (
str
, optional) – Return the data from another source after navigating the features using fsource, defaults to None.distance (
int
, optional) – Limit the search for navigation up to a distance in km, defaults to 500 km. Note that this is an expensive request, so you have to be mindful of the value that you provide. The value must be between 1 and 9999 km.trim_start (
bool
, optional) – IfTrue
, trim the starting flowline at the source feature, defaults toFalse
.
- Returns
geopandas.GeoDataFrame
– NLDI indexed features in EPSG:4326.
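The method heading is missing from this page; in recent PyNHD releases this navigation is exposed as NLDI.navigate_byid, so the sketch below is an assumption based on that API:
from pynhd import NLDI
nldi = NLDI()
# Trace 20 km of upstream tributaries from a NWIS station and return the flowlines
flw_up = nldi.navigate_byid(
    fsource="nwissite",
    fid="USGS-01031500",
    navigation="upstreamTributaries",
    source="flowlines",
    distance=20,
)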
Navigate the NHDPlus database from a coordinate.
- Parameters
coords (
tuple
) – A tuple of length two (x, y).navigation (
str
, optional) – The navigation method, defaults to None which throws an exception if comid_only is False.source (
str
, optional) – Return the data from another source after navigating the features using fsource, defaults to None which throws an exception if comid_only is False.loc_crs (
str
, optional) – The spatial reference of the input coordinate, defaults to EPSG:4326.distance (
int
, optional) – Limit the search for navigation up to a distance in km, defaults to 500 km. Note that this is an expensive request, so you have to be mindful of the value that you provide. If you want to get all the available features, you can pass a large distance like 9999999.trim_start (
bool
, optional) – IfTrue
, trim the starting flowline at the source feature, defaults toFalse
.
- Returns
geopandas.GeoDataFrame
– NLDI indexed features in EPSG:4326.
- class pynhd.pynhd.PyGeoAPI#
Access PyGeoAPI service.
- cross_section(self, coord, width, numpts, crs=DEF_CRS)#
Return a GeoDataFrame from the xsatpoint service.
- Parameters
coord (
tuple
) – The coordinate of the point to extract the cross-section as a tuple, e.g., (lon, lat).width (
float
) – The width of the cross-section in meters.numpts (
int
) – The number of points to extract the cross-section from the DEM.crs (
str
, optional) – The coordinate reference system of the coordinates, defaults to EPSG:4326.
- Returns
geopandas.GeoDataFrame
– A GeoDataFrame containing the cross-section at the requested point.
Examples
>>> from pynhd import PyGeoAPI >>> pygeoapi = PyGeoAPI() >>> gdf = pygeoapi.cross_section((-103.80119, 40.2684), width=1000.0, numpts=101, crs=DEF_CRS) >>> print(gdf.iloc[-1, 1]) 1000.0
- elevation_profile(self, coords, numpts, dem_res, crs=DEF_CRS)#
Return a GeoDataFrame from the xsatendpts service.
- Parameters
coords (
list
) – A list of two coordinates to trace as a list of tuples, e.g., [(lon1, lat1), (lon2, lat2)].numpts (
int
) – The number of points to extract the elevation profile from the DEM.dem_res (
int
) – The target resolution for requesting the DEM from 3DEP service.crs (
str
, optional) – The coordinate reference system of the coordinates, defaults to EPSG:4326.
- Returns
geopandas.GeoDataFrame
– A GeoDataFrame containing the elevation profile along the requested endpoints.
Examples
>>> from pynhd import PyGeoAPI >>> pygeoapi = PyGeoAPI() >>> gdf = pygeoapi.elevation_profile( ... [(-103.801086, 40.26772), (-103.80097, 40.270568)], numpts=101, dem_res=1, crs=DEF_CRS ... ) >>> print(gdf.iloc[-1, 1]) 411.5906
- flow_trace(self, coord, crs=DEF_CRS, direction='none')#
Return a GeoDataFrame from the flowtrace service.
- Parameters
- Returns
geopandas.GeoDataFrame
– A GeoDataFrame containing the traced flowline.
Examples
>>> from pynhd import PyGeoAPI >>> pygeoapi = PyGeoAPI() >>> gdf = pygeoapi.flow_trace( ... (1774209.63, 856381.68), crs="ESRI:102003", direction="none" ... ) >>> print(gdf.comid.iloc[0]) 22294818
- split_catchment(self, coord, crs=DEF_CRS, upstream=False)#
Return a GeoDataFrame from the splitcatchment service.
- Parameters
coord (
tuple
) – The coordinate of the point to trace as a tuple,e.g., (lon, lat).crs (
str
, optional) – The coordinate reference system of the coordinates, defaults to EPSG:4326.upstream (
bool
, optional) – If True, return all upstream catchments rather than just the local catchment, defaults to False.
- Returns
geopandas.GeoDataFrame
– A GeoDataFrame containing the local catchment or the entire upstream catchments.
Examples
>>> from pynhd import PyGeoAPI >>> pygeoapi = PyGeoAPI() >>> gdf = pygeoapi.split_catchment((-73.82705, 43.29139), crs=DEF_CRS, upstream=False) >>> print(gdf.catchmentID.iloc[0]) 22294818
- class pynhd.pynhd.WaterData(layer, crs=DEF_CRS, validation=True)#
Access to Water Data service.
- Parameters
layer (
str
) – A valid layer from the WaterData service. Valid layers are:nhdarea
,nhdwaterbody
,catchmentsp
,nhdflowline_network
gagesii
,huc08
,huc12
,huc12agg
, andhuc12all
. Note that the layers’ workspace for the Water Data service iswmadata
which will be added to the givenlayer
argument if it is not provided.crs (
str
, optional) – The target spatial reference system, defaults toepsg:4326
.validation (
bool
, optional) – Whether to validate the input data, defaults toTrue
.
- bybox(self, bbox, box_crs=DEF_CRS)#
Get features within a bounding box.
- bydistance(self, coords, distance, loc_crs=DEF_CRS)#
Get features within a radius (in meters) of a point.
- byfilter(self, cql_filter, method='GET')#
Get features based on a CQL filter.
- bygeom(self, geometry, geo_crs=DEF_CRS, xy=True, predicate='INTERSECTS')#
Get features within a geometry.
- Parameters
geometry (
shapely.geometry
) – The input geometrygeo_crs (
str
, optional) – The CRS of the input geometry, default to epsg:4326.xy (
bool
, optional) – Whether axis order of the input geometry is xy or yx.predicate (
str
, optional) – The geometric predicate to use for requesting the data, defaults to INTERSECTS. Valid predicates are:EQUALS
,DISJOINT
,INTERSECTS
,TOUCHES
,CROSSES
,WITHIN
CONTAINS
,OVERLAPS
,RELATE
,BEYOND
- Returns
geopandas.GeoDataFrame
– The requested features in the given geometry.
- byid(self, featurename, featureids)#
Get features based on IDs.
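A minimal sketch combining the constructor with two of the query methods above (the HUC12 ID and the bounding box are arbitrary, and "huc12" as the ID field name is an assumption):
from pynhd import WaterData
wd = WaterData("huc12")  # the "wmadata" workspace prefix is added automatically
huc12 = wd.byid("huc12", "150601040606")  # arbitrary HUC12 ID
catchments = WaterData("catchmentsp").bybox((-69.77, 45.07, -69.31, 45.45))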
- pynhd.pynhd.geoconnex(item=None, query=None, skip_geometry=False)#
Query the GeoConnex API.
Notes
If you run the function without any arguments, it will print out a list of available endpoints. If you run the function with
item
but noquery
, it will print out the description, queryable fields, and extent of the selected endpoint (item
).- Parameters
- Returns
geopandas.GeoDataFrame
– The data.
- pynhd.pynhd.pygeoapi(coords, service)#
Return a GeoDataFrame from the flowtrace service.
- Parameters
coords (
geopandas.GeoDataFrame
) – A GeoDataFrame containing the coordinates to query. The required columns services are:flow_trace
:direction
that indicates the direction of the flow trace. It can beup
,down
, ornone
.split_catchment
:upstream
that indicates whether to return all upstream catchments or just the local catchment.elevation_profile
:numpts
that indicates the number of points to extract along the flowpath and3dep_res
that indicates the target resolution for requesting the DEM from 3DEP service.cross_section
:numpts
that indicates the number of points to extract along the flowpath andwidth
that indicates the width of the cross-section in meters.
service (
str
) – The service to query, can beflow_trace
,split_catchment
,elevation_profile
, orcross_section
.
- Returns
geopandas.GeoDataFrame
– A GeoDataFrame containing the results of requested service.
Examples
>>> from shapely.geometry import Point >>> gdf = gpd.GeoDataFrame( ... { ... "direction": [ ... "none", ... ] ... }, ... geometry=[sgeom.Point((1774209.63, 856381.68))], ... crs="ESRI:102003", ... ) >>> trace = nhd.pygeoapi(gdf, "flow_trace") >>> print(trace.comid.iloc[0]) 22294818
Package Contents#
pygeohydro
#
Submodules#
pygeohydro.helpers
#
Some helper function for PyGeoHydro.
Module Contents#
- pygeohydro.helpers.nlcd_helper()#
Get legends and properties of the NLCD cover dataset.
Notes
- The following references have been used:
- Returns
dict
– Years where data is available and cover classes and categories, and roughness estimations.
- pygeohydro.helpers.nwis_errors()#
Get error code lookup table for USGS sites that have daily values.
pygeohydro.plot
#
Plot hydrological signatures.
Plots include daily, monthly, and annual hydrographs as well as the regime curve (monthly mean) and the flow duration curve.
Module Contents#
- pygeohydro.plot.exceedance(daily, threshold=0.001)#
Compute exceedance probability from daily data.
- Parameters
daily (
pandas.Series
orpandas.DataFrame
) – The data to be processedthreshold (
float
, optional) – The threshold to compute exceedance probability, defaults to 1e-3.
- Returns
pandas.Series
orpandas.DataFrame
– Exceedance probability.
- pygeohydro.plot.mean_monthly(daily, index_abbr=False)#
Compute mean monthly summary from daily data.
- Parameters
daily (
pandas.Series
orpandas.DataFrame
) – The data to be processed- Returns
pandas.Series
orpandas.DataFrame
– Mean monthly summary.
- pygeohydro.plot.prepare_plot_data(daily)#
Generate structured data for plotting hydrologic signatures.
- Parameters
daily (
pandas.Series
orpandas.DataFrame
) – The data to be processed- Returns
PlotDataType
– Containing daily, mean_monthly, ranked, titles, and units fields.
- pygeohydro.plot.signatures(discharge, precipitation=None, title=None, threshold=0.001, output=None, close=False)#
Plot hydrological signatures with or without precipitation.
Plots include the daily hydrograph, regime curve (mean monthly), and flow duration curve. The input discharges are converted from cms to mm/day based on the watershed area, if provided.
- Parameters
discharge (
pd.DataFrame
orpd.Series
) – The streamflows in mm/day. The column names are used as labels on the plot and the column values should be daily streamflow.precipitation (
pd.Series
, optional) – Daily precipitation time series in mm/day. If given, the data is plotted on the second x-axis at the top.title (
str
, optional) – The plot supertitle.threshold (
float
, optional) – The threshold for cutting off the discharge for the flow duration curve to deal with log 0 issue, defaults to \(1^{-3}\) mm/day.output (
str
, optional) – Path to save the plot as png, defaults toNone
which means the plot is not saved to a file.close (
bool
, optional) – Whether to close the figure.
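A hedged sketch of plotting signatures for streamflow obtained from NWIS (documented later on this page); the station ID and dates are arbitrary:
from pygeohydro import NWIS, plot
nwis = NWIS()
qobs = nwis.get_streamflow("01031500", ("2000-01-01", "2005-12-31"))
# Daily hydrograph, regime curve, and flow duration curve saved to a PNG
plot.signatures(qobs, title="USGS-01031500", output="signatures.png")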
pygeohydro.pygeohydro
#
Accessing data from the supported databases through their APIs.
Module Contents#
- class pygeohydro.pygeohydro.NID#
Retrieve data from the National Inventory of Dams web service.
- get_byfilter(self, query_list)#
Query dams by filters from the National Inventory of Dams web service.
- Parameters
query_list (
list
ofdict
) – List of dictionaries of query parameters. For an exhaustive list of the parameters, use the advanced fields dataframe that can be accessed viaNID().fields_meta
. Some filters require min/max values such asdamHeight
anddrainageArea
. For such filters, the min/max values should be passed like so:{filter_key: ["[min1 max1]", "[min2 max2]"]}
.- Returns
geopandas.GeoDataFrame
– Query results.
Examples
>>> from pygeohydro import NID >>> nid = NID() >>> query_list = [ ... {"drainageArea": ["[200 500]"]}, ... {"nidId": ["CA01222"]}, ... ] >>> dam_dfs = nid.get_byfilter(query_list) >>> print(dam_dfs[0].name[0]) Prairie Portage
- get_bygeom(self, geometry, geo_crs)#
Retrieve NID data within a geometry.
- Parameters
- Returns
geopandas.GeoDataFrame
– GeoDataFrame of NID data
Examples
>>> from pygeohydro import NID >>> nid = NID() >>> dams = nid.get_bygeom((-69.77, 45.07, -69.31, 45.45), "epsg:4326") >>> print(dams.name.iloc[0]) Little Moose
- get_suggestions(self, text, context_key='')#
Get suggestions from the National Inventory of Dams web service.
Notes
This function is useful for exploring and/or narrowing down the filter fields that are needed to query the dams using
get_byfilter
.- Parameters
- Returns
tuple
ofpandas.DataFrame
– The suggestions for the requested text as two DataFrames: First, is suggestions found in the dams properties and second, those found in the query fields such as states, huc6, etc.
Examples
>>> from pygeohydro import NID >>> nid = NID() >>> dams, contexts = nid.get_suggestions("texas", "city") >>> print(contexts.loc["CITY", "value"]) Texas City
- inventory_byid(self, dam_ids)#
Get extra attributes for dams based on their dam ID.
Notes
This function is meant to be used for getting extra attributes for dams. For example, first you need to use either
get_bygeom
orget_byfilter
to get basic attributes of the target dams. Then you can use this function to get extra attributes using theid
column of theGeoDataFrame
thatget_bygeom
orget_byfilter
returns.- Parameters
dam_ids (
list
ofint
orstr
) – List of the target dam IDs (digits only). Note that the dam IDs are not the same as the NID IDs.- Returns
pandas.DataFrame
– Dams with extra attributes in addition to the standard NID fields that otherNID
methods return.
Examples
>>> from pygeohydro import NID >>> nid = NID() >>> dams = nid.inventory_byid([514871, 459170, 514868, 463501, 463498]) >>> print(dams.damHeight.max()) 120.0
- class pygeohydro.pygeohydro.WBD(layer, outfields='*', crs=DEF_CRS)#
Access Watershed Boundary Dataset (WBD).
Notes
This file contains Hydrologic Unit (HU) polygon boundaries for the United States, Puerto Rico, and the U.S. Virgin Islands. For more info visit: https://hydro.nationalmap.gov/arcgis/rest/services/wbd/MapServer
- pygeohydro.pygeohydro.cover_statistics(cover_da)#
Percentages of the categorical NLCD cover data.
- Parameters
cover_da (
xarray.DataArray
) – Land cover DataArray from a LULC Dataset from thenlcd_bygeom
function.- Returns
Stats
– A named tuple with the percentages of the cover classes and categories.
- pygeohydro.pygeohydro.get_camels()#
Get streamflow and basin attributes of all 671 stations in the CAMELS dataset.
Notes
For more info on CAMELS visit: https://ral.ucar.edu/solutions/products/camels
- Returns
tuple
ofgeopandas.GeoDataFrame
andxarray.Dataset
– The first is basin attributes as ageopandas.GeoDataFrame
and the second is streamflow data and basin attributes as anxarray.Dataset
.
- pygeohydro.pygeohydro.nlcd_bycoords(coords, years=None, region='L48', ssl=None)#
Get data from NLCD database (2019).
- Parameters
coords (
list
oftuple
) – List of coordinates in the form of (longitude, latitude).years (
dict
, optional) – The years for NLCD layers as a dictionary, defaults to{'impervious': [2019], 'cover': [2019], 'canopy': [2019], "descriptor": [2019]}
. Layers that are not in years are ignored, e.g.,{'cover': [2016, 2019]}
returns land cover data for 2016 and 2019.region (
str
, optional) – Region in the US, defaults toL48
. Valid values areL48
(for CONUS),HI
(for Hawaii),AK
(for Alaska), andPR
(for Puerto Rico). Both lower and upper cases are acceptable.ssl (
bool
orSSLContext
, optional) – SSLContext to use for the connection, defaults to None. Set toFalse
to disable SSL certification verification.
- Returns
geopandas.GeoDataFrame
– A GeoDataFrame with the NLCD data and the coordinates.
- pygeohydro.pygeohydro.nlcd_bygeom(geometry, resolution, years=None, region='L48', crs=DEF_CRS, ssl=None)#
Get data from NLCD database (2019).
- Parameters
geometry (
geopandas.GeoDataFrame
orgeopandas.GeoSeries
) – A GeoDataFrame or GeoSeries with the geometry to query. The indices are used as keys in the output dictionary.resolution (
float
) – The data resolution in meters. The width and height of the output are computed in pixel based on the geometry bounds and the given resolution.years (
dict
, optional) – The years for NLCD layers as a dictionary, defaults to{'impervious': [2019], 'cover': [2019], 'canopy': [2019], "descriptor": [2019]}
. Layers that are not in years are ignored, e.g.,{'cover': [2016, 2019]}
returns land cover data for 2016 and 2019.region (
str
, optional) – Region in the US, defaults toL48
. Valid values areL48
(for CONUS),HI
(for Hawaii),AK
(for Alaska), andPR
(for Puerto Rico). Both lower and upper cases are acceptable.crs (
str
, optional) – The spatial reference system to be used for requesting the data, defaults to epsg:4326.ssl (
bool
orSSLContext
, optional) – SSLContext to use for the connection, defaults to None. Set toFalse
to disable SSL certification verification.
- Returns
dict
ofxarray.Dataset
orxarray.Dataset
– A single or adict
of NLCD datasets. If dict, the keys are indices of the inputGeoDataFrame
.
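A hedged sketch of requesting land cover for a basin geometry obtained with PyNHD's NLDI (as in the examples earlier on this page); the resolution and years are arbitrary:
import pygeohydro as gh
from pynhd import NLDI
basin = NLDI().get_basins("11092450")
# One xarray.Dataset per input geometry; the dict keys are the GeoDataFrame indices
lulc = gh.nlcd_bygeom(basin, resolution=100, years={"cover": [2016, 2019]})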
- pygeohydro.pygeohydro.overland_roughness(cover_da)#
Estimate overland roughness from land cover data.
- Parameters
cover_da (
xarray.DataArray
) – Land cover DataArray from a LULC Dataset from thenlcd_bygeom
function.- Returns
xarray.DataArray
– Overland roughness
- pygeohydro.pygeohydro.ssebopeta_bycoords(coords, dates, crs=DEF_CRS)#
Daily actual ET for a dataframe of coords from SSEBop database in mm/day.
- Parameters
coords (
pandas.DataFrame
) – A dataframe withid
,x
,y
columns.dates (
tuple
orlist
, optional) – Start and end dates as a tuple (start, end) or a list of years [2001, 2010, …].crs (
str
, optional) – The CRS of the input coordinates, defaults to epsg:4326.
- Returns
xarray.Dataset
– Daily actual ET in mm/day as a dataset withtime
andlocation_id
dimensions. Thelocation_id
dimension is the same as theid
column in the input dataframe.
- pygeohydro.pygeohydro.ssebopeta_bygeom(geometry, dates, geo_crs=DEF_CRS)#
Get daily actual ET for a region from SSEBop database.
Notes
Since there’s still no web service available for subsetting SSEBop, the data first needs to be downloaded for the requested period then it is masked by the region of interest locally. Therefore, it’s not as fast as other functions and the bottleneck could be the download speed.
- Parameters
geometry (
shapely.geometry.Polygon
ortuple
) – The geometry for downloading and clipping the data. For a tuple bbox, the order should be (west, south, east, north).dates (
tuple
orlist
, optional) – Start and end dates as a tuple (start, end) or a list of years [2001, 2010, …].geo_crs (
str
, optional) – The CRS of the input geometry, defaults to epsg:4326.
- Returns
xarray.DataArray
– Daily actual ET within a geometry in mm/day at 1 km resolution
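A minimal sketch using a bounding box and a short, arbitrary period:
import pygeohydro as gh
bbox = (-69.77, 45.07, -69.31, 45.45)
# Data are downloaded for the period and then masked by the geometry locally
eta = gh.ssebopeta_bygeom(bbox, dates=("2010-06-01", "2010-06-10"))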
- pygeohydro.pygeohydro.ssebopeta_byloc(coords, dates)#
Daily actual ET for a location from SSEBop database in mm/day.
Deprecated since version 0.11.5: Use
ssebopeta_bycoords()
instead. For now, this function callsssebopeta_bycoords()
but retains the same functionality, i.e., returns a dataframe and accepts only a single coordinate. Whereas the new function returns axarray.Dataset
and accepts a dataframe containing coordinates.- Parameters
- Returns
pandas.Series
– Daily actual ET for a location
pygeohydro.us_abbrs
#
US states and territories Abbreviations from us
package.
pygeohydro.waterdata
#
Accessing data from the supported databases through their APIs.
Module Contents#
- class pygeohydro.waterdata.NWIS#
Access NWIS web service.
- get_info(self, queries, expanded=False, fix_names=True)#
Send multiple queries to USGS Site Web Service.
- Parameters
- Returns
geopandas.GeoDataFrame
– A correctly typedGeoDataFrame
containing site(s) information.
- get_parameter_codes(self, keyword)#
Search for parameter codes by name or number.
Notes
NWIS guideline for keywords is as follows:
By default an exact search is made. To make a partial search the term should be prefixed and suffixed with a % sign. The % sign matches zero or more characters at the location. For example, to find all with “discharge” enter %discharge% in the field. % will match any number of characters (including zero characters) at the location.
- Parameters
keyword (
str
) – Keyword to search for parameters by name or number.- Returns
pandas.DataFrame
– Matched parameter codes as a dataframe with their description.
Examples
>>> from pygeohydro import NWIS >>> nwis = NWIS() >>> codes = nwis.get_parameter_codes("%discharge%") >>> codes.loc[codes.parameter_cd == "00060", "parm_nm"][0] 'Discharge, cubic feet per second'
- get_streamflow(self, station_ids, dates, freq='dv', mmd=False, to_xarray=False)#
Get mean daily streamflow observations from USGS.
- Parameters
station_ids (
str
,list
) – The gage ID(s) of the USGS station.dates (
tuple
) – Start and end dates as a tuple (start, end).freq (
str
, optional) – The frequency of the streamflow data, defaults todv
(daily values). Valid frequencies aredv
(daily values),iv
(instantaneous values). Note that foriv
the time zone for the input dates is assumed to be UTC.mmd (
bool
, optional) – Convert cms to mm/day based on the contributing drainage area of the stations. Defaults to False.to_xarray (
bool
, optional) – Whether to return a xarray.Dataset. Defaults to False.
- Returns
pandas.DataFrame
orxarray.Dataset
– Streamflow data observations in cubic meter per second (cms). The stations that don’t provide the requested discharge data in the target period will be dropped. Note that when frequency is set toiv
the time zone is converted to UTC.
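A minimal sketch of fetching daily streamflow for two stations (the IDs and dates are arbitrary):
from pygeohydro import NWIS
nwis = NWIS()
# Daily values in cms; set mmd=True to convert to mm/day using the drainage area
qobs = nwis.get_streamflow(["01031500", "01031450"], ("2010-01-01", "2010-12-31"), freq="dv")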
- retrieve_rdb(self, url, payloads)#
Retrieve and process requests with RDB format.
- Parameters
- Returns
pandas.DataFrame
– Requested features as a pandas’s DataFrame.
- class pygeohydro.waterdata.WaterQuality#
Water Quality Web Service https://www.waterqualitydata.us.
Notes
This class has a number of convenience methods to retrieve data from the Water Quality Data. Since there are many parameter combinations that can be used to retrieve data, a general method is also provided to retrieve data from any of the valid endpoints. You can use
get_json
to retrieve stations info as ageopandas.GeoDataFrame
orget_csv
to retrieve stations data as apandas.DataFrame
. You can construct a dictionary of the parameters and pass it to one of these functions. For more information on the parameters, please consult the Water Quality Data documentation.- data_bystation(self, station_ids, wq_kwds)#
Retrieve data for a single station.
- Parameters
- Returns
pandas.DataFrame
– DataFrame of data for the stations.
- get_csv(self, endpoint, kwds, request_method='GET')#
Get the CSV response from the Water Quality Web Service.
- Parameters
- Returns
pandas.DataFrame
– The web service response as a DataFrame.
- get_json(self, endpoint, kwds, request_method='GET')#
Get the JSON response from the Water Quality Web Service.
- Parameters
- Returns
geopandas.GeoDataFrame
– The web service response as a GeoDataFrame.
- get_param_table(self)#
Get the parameter table from the USGS Water Quality Web Service.
- lookup_domain_values(self, endpoint)#
Get the domain values for the target endpoint.
- station_bybbox(self, bbox, wq_kwds)#
Retrieve station info within bounding box.
- Parameters
- Returns
geopandas.GeoDataFrame
– GeoDataFrame of station info within the bounding box.
- station_bydistance(self, lon, lat, radius, wq_kwds)#
Retrieve station within a radius (decimal miles) of a point.
- Parameters
- Returns
geopandas.GeoDataFrame
– GeoDataFrame of station info within the radius of the point.
- pygeohydro.waterdata.interactive_map(bbox, crs=DEF_CRS, nwis_kwds=None)#
Generate an interactive map including all USGS stations within a bounding box.
- Parameters
bbox (
tuple
) – List of corners in this order (west, south, east, north)crs (
str
, optional) – CRS of the input bounding box, defaults to EPSG:4326.nwis_kwds (
dict
, optional) – Optional keywords to include in the NWIS request as a dictionary like so:{"hasDataTypeCd": "dv,iv", "outputDataTypeCd": "dv,iv", "parameterCd": "06000"}
. Default to None.
- Returns
folium.Map
– Interactive map within a bounding box.
Examples
>>> import pygeohydro as gh >>> nwis_kwds = {"hasDataTypeCd": "dv,iv", "outputDataTypeCd": "dv,iv"} >>> m = gh.interactive_map((-69.77, 45.07, -69.31, 45.45), nwis_kwds=nwis_kwds) >>> n_stations = len(m.to_dict()["children"]) - 1 >>> n_stations 10
Package Contents#
py3dep
#
Top-level package for Py3DEP.
Submodules#
py3dep.py3dep
#
Get data from 3DEP database.
Module Contents#
- py3dep.py3dep.check_3dep_availability(bbox, crs=DEF_CRS)#
Query 3DEP’s resolution availability within a bounding box.
This function checks the availability of 3DEP data at the following resolutions: 1 m, 3 m, 5 m, 10 m, 30 m, 60 m, and topobathy (integrated topobathymetry).
- Parameters
- Returns
dict
– True if bbox intersects 3DEP elevation for each available resolution. Keys are the supported resolutions and values are their availability.
Examples
>>> import py3dep >>> bbox = (-69.77, 45.07, -69.31, 45.45) >>> py3dep.check_3dep_availability(bbox) {'1m': True, '3m': False, '5m': False, '10m': True, '30m': True, '60m': False, 'topobathy': False}
- py3dep.py3dep.elevation_bycoords(coords, crs=DEF_CRS, source='tep')#
Get elevation for a list of coordinates.
- Parameters
coords (
list
oftuple
) – Coordinates of target location as list of tuples[(x, y), ...]
.crs (
str
orpyproj.CRS
, optional) – Spatial reference (CRS) of coords, defaults toEPSG:4326
.source (
str
, optional) – Data source to be used, defaults totep
. Supported sources areairmap
(30 m resolution),tnm
(using The National Map’s Bulk Point Query Service with 10 m resolution) andtep
(using 3DEP’s WMS service at 10 m resolution). Thetnm
andtep
sources are more accurate since they use the 1/3 arc-second DEM layer from 3DEP service but it is limited to the US. They both tend to be slower than the Airmap service. Note thattnm
is bit unstable. It’s recommended to usetep
unless 10-m resolution accuracy is not necessary, in which caseairmap
is more appropriate.
- Returns
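A minimal sketch with two arbitrary coordinates:
import py3dep
coords = [(-69.77, 45.07), (-69.31, 45.45)]
# 10 m resolution elevations from the 3DEP WMS service (US only)
elev = py3dep.elevation_bycoords(coords, crs="epsg:4326", source="tep")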
- py3dep.py3dep.elevation_bygrid(xcoords, ycoords, crs, resolution, depression_filling=False)#
Get elevation from DEM data for a grid.
This function is intended for getting elevations for a gridded dataset.
- Parameters
xcoords (
list
) – List of x-coordinates of a grid.ycoords (
list
) – List of y-coordinates of a grid.crs (
str
orpyproj.CRS
) – The spatial reference system of the input grid, defaults toEPSG:4326
.resolution (
float
) – The accuracy of the output, defaults to 10 m which is the highest available resolution that covers CONUS. Note that higher resolution increases computation time, so choose this value with caution.depression_filling (
bool
, optional) – Fill depressions before sampling using RichDEM package, defaults to False.
- Returns
xarray.DataArray
– Elevations of the input coordinates as axarray.DataArray
.
- py3dep.py3dep.elevation_profile(lines, spacing, dem_res=10, crs=DEF_CRS)#
Get the elevation profile along a line at a given uniform spacing.
This function converts the line to a B-spline and then calculates the elevation along the spline at a given uniform spacing.
- Parameters
lines (
LineString
orMultiLineString
) – Line segment(s) to be profiled. If its type isMultiLineString
, it will be converted to a singleLineString
and if this operation fails, an InvalidInputType
will be raised.spacing (
float
) – Spacing between the sample points along the line in meters.dem_res (
float
, optional) – Resolution of the DEM source to use in meter, defaults to 10.crs (
str
orpyproj.CRS
, optional) – Spatial reference (CRS) oflines
, defaults toEPSG:4326
.
- Returns
xarray.DataArray
– Elevation profile with dimensionz
and three coordinates:x
,y
, anddistance
. Thedistance
coordinate is the distance from the start of the line in meters.
- py3dep.py3dep.get_map(layers, geometry, resolution, geo_crs=DEF_CRS, crs=DEF_CRS)#
Access to 3DEP service.
The 3DEP service has multi-resolution sources, so depending on the user provided resolution the data is resampled on server-side based on all the available data sources. The following layers are available:
DEM
Hillshade Gray
Aspect Degrees
Aspect Map
GreyHillshade_elevationFill
Hillshade Multidirectional
Slope Map
Slope Degrees
Hillshade Elevation Tinted
Height Ellipsoidal
Contour 25
Contour Smoothed 25
- Parameters
layers (
str
orlist
ofstr
) – A valid 3DEP layer or a list of them.geometry (
Polygon
,MultiPolygon
, ortuple
) – A shapely Polygon or a bounding box of the form(west, south, east, north)
.resolution (
float
) – The target resolution in meters. The width and height of the output are computed in pixels based on the geometry bounds and the given resolution.geo_crs (
str
, optional) – The spatial reference system of the input geometry, defaults toEPSG:4326
.crs (
str
, optional) – The spatial reference system to be used for requesting the data, defaults toEPSG:4326
. Valid values areEPSG:4326
,EPSG:3576
,EPSG:3571
,EPSG:3575
,EPSG:3857
,EPSG:3572
,CRS:84
,EPSG:3573
, andEPSG:3574
.
- Returns
xarray.DataArray
orxarray.Dataset
– The requested topographic data as anxarray.DataArray
orxarray.Dataset
.
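A hedged sketch of requesting topographic layers for a basin geometry obtained with PyNHD's NLDI; the resolution and layer choices are arbitrary:
import py3dep
from pynhd import NLDI
geom = NLDI().get_basins("11092450").geometry[0]
# A single layer returns a DataArray, a list of layers returns a Dataset
dem = py3dep.get_map("DEM", geom, resolution=30, geo_crs="epsg:4326", crs="epsg:3857")
topo = py3dep.get_map(["DEM", "Slope Degrees"], geom, resolution=30)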
- py3dep.py3dep.query_3dep_sources(bbox, crs=DEF_CRS, res=None)#
Query 3DEP’s data sources within a bounding box.
This function queries the availability of the underlying data that 3DEP uses at the following resolutions: 1 m, 3 m, 5 m, 10 m, 30 m, 60 m, and topobathy (integrated topobathymetry).
- Parameters
- Returns
geopandas.GeoDataFrame
– Polygon(s) representing the 3DEP data sources at each resolution. Resolutions are given in thedem_res
column.
Examples
>>> import py3dep >>> bbox = (-69.77, 45.07, -69.31, 45.45) >>> src = py3dep.query_3dep_sources(bbox) >>> src.groupby("dem_res")["OBJECTID"].count().to_dict() {'10m': 8, '1m': 4, '30m': 8} >>> src = py3dep.query_3dep_sources(bbox, res="1m") >>> src.groupby("dem_res")["OBJECTID"].count().to_dict() {'1m': 4}
py3dep.utils
#
Utilities for Py3DEP.
Module Contents#
- py3dep.utils.deg2mpm(slope)#
Convert slope from degrees to meter/meter.
- Parameters
slope (
xarray.DataArray
) – Slope in degrees.- Returns
xarray.DataArray
– Slope in meter/meter. The name is set toslope
and theunits
attribute is set tom/m
.
- py3dep.utils.fill_depressions(dem_da)#
Fill depressions and adjust flat areas in a DEM using RichDEM.
- Parameters
dem (
xarray.DataArray
ornumpy.ndarray
) – Digital Elevation Model.- Returns
xarray.DataArray
– Conditioned DEM after applying depression filling and flat area resolution operations.
Package Contents#
pydaymet
#
Top-level package for PyDaymet.
Submodules#
pydaymet.core
#
Core class for the Daymet functions.
Module Contents#
- class pydaymet.core.Daymet(variables=None, pet=None, snow=False, time_scale='daily', region='na')#
Base class for Daymet requests.
- Parameters
variables (
str
orlist
ortuple
, optional) – List of variables to be downloaded. The acceptable variables are:tmin
,tmax
,prcp
,srad
,vp
,swe
,dayl
Descriptions can be found here. Defaults to None i.e., all the variables are downloaded.pet (
str
, optional) – Method for computing PET. Supported methods arepenman_monteith
,priestley_taylor
,hargreaves_samani
, and None (don’t compute PET). Thepenman_monteith
method is based on Allen et al.1 assuming that soil heat flux density is zero. Thepriestley_taylor
method is based on Priestley and TAYLOR2 assuming that soil heat flux density is zero. Thehargreaves_samani
method is based on Hargreaves and Samani3. Defaults toNone
.snow (
bool
, optional) – Compute snowfall from precipitation and minimum temperature. Defaults toFalse
.time_scale (
str
, optional) – Data time scale which can be daily, monthly (monthly summaries), or annual (annual summaries). Defaults to daily.region (
str
, optional) – Region in the US, defaults to na. Acceptable values are:na: Continental North America
hi: Hawaii
pr: Puerto Rico
References
- 1
Richard G Allen, Luis S Pereira, Dirk Raes, Martin Smith, and others. Crop evapotranspiration-guidelines for computing crop water requirements-fao irrigation and drainage paper 56. Fao, Rome, 300(9):D05109, 1998.
- 2
Charles Henry Brian Priestley and Robert Joseph TAYLOR. On the assessment of surface heat flux and evaporation using large-scale parameters. Monthly weather review, 100(2):81–92, 1972.
- 3
George H. Hargreaves and Zohrab A. Samani. Estimating potential evapotranspiration. Journal of the Irrigation and Drainage Division, 108(3):225–230, sep 1982. URL: https://doi.org/10.1061%2Fjrcea4.0001390, doi:10.1061/jrcea4.0001390.
- static check_dates(dates)#
Check if input dates are in correct format and valid.
- dates_todict(self, dates)#
Set dates by start and end dates as a tuple, (start, end).
- dates_tolist(self, dates)#
Correct dates for Daymet accounting for leap years.
Daymet doesn’t account for leap years and removes Dec 31 in leap years.
- separate_snow(self, clm, t_rain=T_RAIN, t_snow=T_SNOW)#
Separate snow based on Martinez and Gupta4.
- Parameters
clm (
pandas.DataFrame
orxarray.Dataset
) – Climate data that should includeprcp
andtmin
.t_rain (
float
, optional) – Threshold for temperature for considering rain, defaults to 2.5 degrees C.t_snow (
float
, optional) – Threshold for temperature for considering snow, defaults to 0.6 degrees C.
- Returns
pandas.DataFrame
orxarray.Dataset
– Input data withsnow (mm/day)
column if input is apandas.DataFrame
, orsnow
variable if input is anxarray.Dataset
.
References
- 4
Guillermo F. Martinez and Hoshin V. Gupta. Toward improved identification of hydrological models: a diagnostic evaluation of the “abcd” monthly water balance model for the conterminous united states. Water Resources Research, 2010. URL: https://agupubs.onlinelibrary.wiley.com/doi/abs/10.1029/2009WR008294, arXiv:https://agupubs.onlinelibrary.wiley.com/doi/pdf/10.1029/2009WR008294, doi:https://doi.org/10.1029/2009WR008294.
- years_todict(self, years)#
Set date by list of year(s).
pydaymet.pet
#
Core class for the Daymet functions.
Module Contents#
- pydaymet.pet.potential_et(clm, coords=None, crs=DEF_CRS, method='hargreaves_samani', params=None)#
Compute Potential EvapoTranspiration for both gridded and a single location.
- Parameters
clm (
pandas.DataFrame
orxarray.Dataset
) – The dataset must include at least the following variables:Minimum temperature in degree celsius
Maximum temperature in degree celsius
Solar radiation in W/m2
Daylight duration in seconds
Optionally, relative humidity and wind speed at 2-m level will be used if available.
The table below shows the variable names that the function looks for in the input data.
DataFrame | Dataset |
---|---|
tmin (degrees C) | tmin |
tmax (degrees C) | tmax |
srad (W/m2) | srad |
dayl (s) | dayl |
rh (-) | rh |
u2 (m/s) | u2 |
If relative humidity and wind speed at 2-m level are not available, actual vapour pressure is assumed to be saturation vapour pressure at daily minimum temperature and 2-m wind speed is considered to be 2 m/s.
coords (
tuple
offloats
, optional) – Coordinates of the daymet data location as a tuple, (x, y). This is required whenclm
is aDataFrame
.crs (
str
, optional) – The spatial reference of the input coordinate, defaults toepsg:4326
. This is only used whenclm
is aDataFrame
.method (
str
, optional) – Method for computing PET. Supported methods arepenman_monteith
,priestley_taylor
,hargreaves_samani
, and None (don’t compute PET). Thepenman_monteith
method is based on Allen et al.1 assuming that soil heat flux density is zero. Thepriestley_taylor
method is based on Priestley and TAYLOR2 assuming that soil heat flux density is zero. Thehargreaves_samani
method is based on Hargreaves and Samani3. Defaults tohargreaves_samani
.params (
dict
, optional) – Model-specific parameters as a dictionary, defaults toNone
.
- Returns
pandas.DataFrame
orxarray.Dataset
– The input DataFrame/Dataset with an additional variable namedpet (mm/day)
for DataFrame andpet
for Dataset.
References
- 1
Richard G Allen, Luis S Pereira, Dirk Raes, Martin Smith, and others. Crop evapotranspiration-guidelines for computing crop water requirements-fao irrigation and drainage paper 56. Fao, Rome, 300(9):D05109, 1998.
- 2
Charles Henry Brian Priestley and Robert Joseph TAYLOR. On the assessment of surface heat flux and evaporation using large-scale parameters. Monthly weather review, 100(2):81–92, 1972.
- 3
George H. Hargreaves and Zohrab A. Samani. Estimating potential evapotranspiration. Journal of the Irrigation and Drainage Division, 108(3):225–230, sep 1982. URL: https://doi.org/10.1061%2Fjrcea4.0001390, doi:10.1061/jrcea4.0001390.
pydaymet.pydaymet
#
Access the Daymet database for both single-pixel and gridded queries.
Module Contents#
- pydaymet.pydaymet.get_bycoords(coords, dates, crs=DEF_CRS, variables=None, region='na', time_scale='daily', pet=None, pet_params=None, snow=False, snow_params=None, ssl=None)#
Get point-data from the Daymet database at 1-km resolution.
This function uses the THREDDS data service to get data at the given coordinates and supports getting monthly and annual summaries of the climate data directly from the server.
- Parameters
coords (tuple) – Coordinates of the location of interest as a tuple (lon, lat).
dates (tuple or list, optional) – Start and end dates as a tuple (start, end) or a list of years [2001, 2010, ...].
crs (str, optional) – The CRS of the input geometry, defaults to "epsg:4326".
variables (str or list) – List of variables to be downloaded. The acceptable variables are: tmin, tmax, prcp, srad, vp, swe, dayl. Descriptions can be found here.
region (str, optional) – Target region in the US, defaults to na. Acceptable values are:
na: Continental North America
hi: Hawaii
pr: Puerto Rico
time_scale (str, optional) – Data time scale which can be daily, monthly (monthly summaries), or annual (annual summaries). Defaults to daily.
pet (str, optional) – Method for computing PET. Supported methods are penman_monteith, priestley_taylor, hargreaves_samani, and None (don't compute PET). The penman_monteith method is based on Allen et al.1 assuming that soil heat flux density is zero. The priestley_taylor method is based on Priestley and Taylor2 assuming that soil heat flux density is zero. The hargreaves_samani method is based on Hargreaves and Samani3. Defaults to None.
pet_params (dict, optional) – Model-specific parameters as a dictionary that is passed to the PET function. Defaults to None.
snow (bool, optional) – Compute snowfall from precipitation and minimum temperature. Defaults to False.
snow_params (dict, optional) – Model-specific parameters as a dictionary that is passed to the snowfall function. These parameters are only used if snow is True. Two parameters are required: t_rain (deg C), which is the threshold temperature for considering rain, and t_snow (deg C), which is the threshold temperature for considering snow. The default values are {'t_rain': 2.5, 't_snow': 0.6}, adopted from https://doi.org/10.5194/gmd-11-1077-2018.
ssl (bool or SSLContext, optional) – SSLContext to use for the connection, defaults to None. Set to False to disable SSL certification verification.
- Returns
pandas.DataFrame
– Daily climate data for a location.
Examples
>>> import pydaymet as daymet
>>> coords = (-1431147.7928, 318483.4618)
>>> dates = ("2000-01-01", "2000-12-31")
>>> clm = daymet.get_bycoords(
...     coords,
...     dates,
...     crs="epsg:3542",
...     pet="hargreaves_samani",
...     ssl=False
... )
>>> clm["pet (mm/day)"].mean()
3.713
References
- 1(1,2)
Richard G. Allen, Luis S. Pereira, Dirk Raes, Martin Smith, and others. Crop evapotranspiration: guidelines for computing crop water requirements. FAO Irrigation and Drainage Paper 56. FAO, Rome, 300(9):D05109, 1998.
- 2(1,2)
Charles Henry Brian Priestley and Robert Joseph Taylor. On the assessment of surface heat flux and evaporation using large-scale parameters. Monthly Weather Review, 100(2):81–92, 1972.
- 3(1,2)
George H. Hargreaves and Zohrab A. Samani. Estimating potential evapotranspiration. Journal of the Irrigation and Drainage Division, 108(3):225–230, September 1982. doi:10.1061/jrcea4.0001390.
- pydaymet.pydaymet.get_bygeom(geometry, dates, crs=DEF_CRS, variables=None, region='na', time_scale='daily', pet=None, pet_params=None, snow=False, snow_params=None, ssl=None)#
Get gridded data from the Daymet database at 1-km resolution.
- Parameters
geometry (Polygon, MultiPolygon, or bbox) – The geometry of the region of interest.
dates (tuple or list, optional) – Start and end dates as a tuple (start, end) or a list of years [2001, 2010, ...].
crs (str, optional) – The CRS of the input geometry, defaults to epsg:4326.
variables (str or list) – List of variables to be downloaded. The acceptable variables are: tmin, tmax, prcp, srad, vp, swe, dayl. Descriptions can be found here.
region (str, optional) – Region in the US, defaults to na. Acceptable values are:
na: Continental North America
hi: Hawaii
pr: Puerto Rico
time_scale (str, optional) – Data time scale which can be daily, monthly (monthly average), or annual (annual average). Defaults to daily.
pet (str, optional) – Method for computing PET. Supported methods are penman_monteith, priestley_taylor, hargreaves_samani, and None (don't compute PET). The penman_monteith method is based on Allen et al.1 assuming that soil heat flux density is zero. The priestley_taylor method is based on Priestley and Taylor2 assuming that soil heat flux density is zero. The hargreaves_samani method is based on Hargreaves and Samani3. Defaults to None.
pet_params (dict, optional) – Model-specific parameters as a dictionary that is passed to the PET function. Defaults to None.
snow (bool, optional) – Compute snowfall from precipitation and minimum temperature. Defaults to False.
snow_params (dict, optional) – Model-specific parameters as a dictionary that is passed to the snowfall function. These parameters are only used if snow is True. Two parameters are required: t_rain (deg C), which is the threshold temperature for considering rain, and t_snow (deg C), which is the threshold temperature for considering snow. The default values are {'t_rain': 2.5, 't_snow': 0.6}, adopted from https://doi.org/10.5194/gmd-11-1077-2018.
ssl (bool or SSLContext, optional) – SSLContext to use for the connection, defaults to None. Set to False to disable SSL certification verification.
- Returns
xarray.Dataset
– Daily climate data within the target geometry.
Examples
>>> from shapely.geometry import Polygon
>>> import pydaymet as daymet
>>> geometry = Polygon(
...     [[-69.77, 45.07], [-69.31, 45.07], [-69.31, 45.45], [-69.77, 45.45], [-69.77, 45.07]]
... )
>>> clm = daymet.get_bygeom(geometry, 2010, variables="tmin", time_scale="annual")
>>> clm["tmin"].mean().compute().item()
1.361
Package Contents#
async_retriever#
Top-level package.
Submodules#
async_retriever.async_retriever#
Core async functions.
Module Contents#
- async_retriever.async_retriever.delete_url_cache(url, request_method='GET', cache_name=None, **kwargs)#
Delete cached response associated with url, along with its history (if applicable).
- Parameters
url (str) – URL to be deleted from the cache.
request_method (str, optional) – HTTP request method to be deleted from the cache, defaults to GET.
cache_name (str, optional) – Path to a file for caching the session, defaults to ./cache/aiohttp_cache.sqlite.
kwargs (dict, optional) – Keywords to pass to cache.delete_url().
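A minimal usage sketch (assumed cache layout): remove one cached GET response so the next retrieval hits the server again; the URL is a placeholder taken from the examples on this page.
import async_retriever as ar

url = "https://waterservices.usgs.gov/nwis/site"
# Delete any cached GET response for this URL from the default cache file.
ar.delete_url_cache(url, request_method="GET")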
- async_retriever.async_retriever.retrieve(urls, read, request_kwds=None, request_method='GET', max_workers=8, cache_name=None, timeout=5.0, expire_after=EXPIRE, ssl=None, disable=False)#
Send async requests.
- Parameters
read (str) – Method for returning the request; binary, json, and text.
request_kwds (list of dict, optional) – List of requests keywords corresponding to input URLs (1 on 1 mapping), defaults to None. For example, [{"params": {...}, "headers": {...}}, ...].
request_method (str, optional) – Request type; GET (get) or POST (post). Defaults to GET.
max_workers (int, optional) – Maximum number of async processes, defaults to 8.
cache_name (str, optional) – Path to a file for caching the session, defaults to ./cache/aiohttp_cache.sqlite.
timeout (float, optional) – Timeout for the request, defaults to 5.0.
expire_after (int, optional) – Expiration time for response caching in seconds, defaults to -1 (never expire).
ssl (bool or SSLContext, optional) – SSLContext to use for the connection, defaults to None. Set to False to disable SSL certification verification.
disable (bool, optional) – If True, temporarily disable caching requests and get new responses from the server, defaults to False.
- Returns
list
– List of responses in the order of input URLs.
Examples
>>> import async_retriever as ar
>>> stations = ["01646500", "08072300", "11073495"]
>>> url = "https://waterservices.usgs.gov/nwis/site"
>>> urls, kwds = zip(
...     *[
...         (url, {"params": {"format": "rdb", "sites": s, "siteStatus": "all"}})
...         for s in stations
...     ]
... )
>>> resp = ar.retrieve(urls, "text", request_kwds=kwds)
>>> resp[0].split('\n')[-2].split('\t')[1]
'01646500'
- async_retriever.async_retriever.retrieve_binary(urls, request_kwds=None, request_method='GET', max_workers=8, cache_name=None, timeout=5.0, expire_after=EXPIRE, ssl=None, disable=False)#
Send async requests and get the response as bytes.
- Parameters
request_kwds (list of dict, optional) – List of requests keywords corresponding to input URLs (1 on 1 mapping), defaults to None. For example, [{"params": {...}, "headers": {...}}, ...].
request_method (str, optional) – Request type; GET (get) or POST (post). Defaults to GET.
max_workers (int, optional) – Maximum number of async processes, defaults to 8.
cache_name (str, optional) – Path to a file for caching the session, defaults to ./cache/aiohttp_cache.sqlite.
timeout (float, optional) – Timeout for the request, defaults to 5.0.
expire_after (int, optional) – Expiration time for response caching in seconds, defaults to -1 (never expire).
ssl (bool or SSLContext, optional) – SSLContext to use for the connection, defaults to None. Set to False to disable SSL certification verification.
disable (bool, optional) – If True, temporarily disable caching requests and get new responses from the server, defaults to False.
- Returns
list of bytes – List of responses in the order of input URLs.
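A minimal sketch (placeholder URLs): fetch two binary resources concurrently and write each response to disk in the same order as the inputs.
import async_retriever as ar

# Hypothetical endpoints returning binary content (e.g., GeoTIFF tiles).
urls = ["https://example.com/tile_1.tif", "https://example.com/tile_2.tif"]
responses = ar.retrieve_binary(urls, max_workers=2)
for fname, content in zip(("tile_1.tif", "tile_2.tif"), responses):
    with open(fname, "wb") as f:
        f.write(content)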
- async_retriever.async_retriever.retrieve_json(urls, request_kwds=None, request_method='GET', max_workers=8, cache_name=None, timeout=5.0, expire_after=EXPIRE, ssl=None, disable=False)#
Send async requests and get the response as json.
- Parameters
request_kwds (list of dict, optional) – List of requests keywords corresponding to input URLs (1 on 1 mapping), defaults to None. For example, [{"params": {...}, "headers": {...}}, ...].
request_method (str, optional) – Request type; GET (get) or POST (post). Defaults to GET.
max_workers (int, optional) – Maximum number of async processes, defaults to 8.
cache_name (str, optional) – Path to a file for caching the session, defaults to ./cache/aiohttp_cache.sqlite.
timeout (float, optional) – Timeout for the request, defaults to 5.0.
expire_after (int, optional) – Expiration time for response caching in seconds, defaults to -1 (never expire).
ssl (bool or SSLContext, optional) – SSLContext to use for the connection, defaults to None. Set to False to disable SSL certification verification.
disable (bool, optional) – If True, temporarily disable caching requests and get new responses from the server, defaults to False.
- Returns
list of dict – List of responses in the order of input URLs.
Examples
>>> import async_retriever as ar
>>> urls = ["https://labs.waterdata.usgs.gov/api/nldi/linked-data/comid/position"]
>>> kwds = [
...     {
...         "params": {
...             "f": "json",
...             "coords": "POINT(-68.325 45.0369)",
...         },
...     },
... ]
>>> r = ar.retrieve_json(urls, kwds)
>>> print(r[0]["features"][0]["properties"]["identifier"])
2675320
- async_retriever.async_retriever.retrieve_text(urls, request_kwds=None, request_method='GET', max_workers=8, cache_name=None, timeout=5.0, expire_after=EXPIRE, ssl=None, disable=False)#
Send async requests and get the response as text.
- Parameters
request_kwds (list of dict, optional) – List of requests keywords corresponding to input URLs (1 on 1 mapping), defaults to None. For example, [{"params": {...}, "headers": {...}}, ...].
request_method (str, optional) – Request type; GET (get) or POST (post). Defaults to GET.
max_workers (int, optional) – Maximum number of async processes, defaults to 8.
cache_name (str, optional) – Path to a file for caching the session, defaults to ./cache/aiohttp_cache.sqlite.
timeout (float, optional) – Timeout for the request in seconds, defaults to 5.0.
expire_after (int, optional) – Expiration time for response caching in seconds, defaults to -1 (never expire).
ssl (bool or SSLContext, optional) – SSLContext to use for the connection, defaults to None. Set to False to disable SSL certification verification.
disable (bool, optional) – If True, temporarily disable caching requests and get new responses from the server, defaults to False.
- Returns
list
– List of responses in the order of input URLs.
Examples
>>> import async_retriever as ar
>>> stations = ["01646500", "08072300", "11073495"]
>>> url = "https://waterservices.usgs.gov/nwis/site"
>>> urls, kwds = zip(
...     *[
...         (url, {"params": {"format": "rdb", "sites": s, "siteStatus": "all"}})
...         for s in stations
...     ]
... )
>>> resp = ar.retrieve_text(urls, kwds)
>>> resp[0].split('\n')[-2].split('\t')[1]
'01646500'
async_retriever.utils#
Core async functions.
Module Contents#
- class async_retriever.utils.BaseRetriever(urls, read, request_kwds=None, request_method='GET', cache_name=None)#
Base class for async retriever.
- static generate_requests(urls, request_kwds)#
Generate urls and keywords.
- async_retriever.utils.create_cachefile(db_name=None)#
Create a cache folder in the current working directory.
- async async_retriever.utils.delete_url(url, method='GET', cache_name=None, **kwargs)#
Delete cached response associated with url.
- async_retriever.utils.get_event_loop()#
Create an event loop.
- async async_retriever.utils.retriever(uid, url, s_kwds, session, read_type, r_kwds)#
Create an async request and return the response as binary.
- Parameters
uid (int) – ID of the URL for sorting after returning the results.
url (str) – URL to be retrieved.
s_kwds (dict) – Arguments to be passed to requests.
session (ClientSession) – A ClientSession for sending the request.
read_type (str) – Return response as text, bytes, or json.
r_kwds (dict) – Keywords to pass to the response read function. It is {"content_type": None} if read is json, else an empty dict.
- Returns
bytes
– The retrieved response as binary.
Package Contents#
pygeoogc#
Top-level package for PyGeoOGC.
Submodules#
pygeoogc.core#
Base classes and functions for REST, WMS, and WFS services.
Module Contents#
- class pygeoogc.core.ArcGISRESTfulBase(base_url, layer=None, outformat='geojson', outfields='*', crs=DEF_CRS, max_workers=1, verbose=False, disable_retry=False)#
Access to an ArcGIS REST service.
- Parameters
base_url (str, optional) – The ArcGIS RESTful service URL. The URL must either include a layer number after the last / in the URL or the target layer must be passed as an argument.
layer (int, optional) – Target layer number, defaults to None. If None, the layer number must be included after the last / in base_url.
outformat (str, optional) – One of the output formats offered by the selected layer. If not correct, a list of available formats is shown. Defaults to geojson.
outfields (str or list) – The output fields to be requested. Setting * as outfields requests all the available fields, which is the default setting.
crs (str, optional) – The spatial reference of the output data, defaults to EPSG:4326.
max_workers (int, optional) – Max number of simultaneous requests, defaults to 1. Note that some services might face issues when several requests are sent simultaneously and will return the requests partially. It's recommended to avoid using too many workers unless you are certain the web service can handle it.
verbose (bool, optional) – If True, prints information about the requests and responses, defaults to False.
disable_retry (bool, optional) – If True, in case there are any failed queries, no retry attempts are made and the object IDs of the failed requests are saved to a text file whose path can be accessed via self.failed_path.
- esri_query(self, geom, geo_crs=DEF_CRS)#
Generate geometry queries based on ESRI template.
- get_features(self, featureids, return_m=False, return_geom=True)#
Get features based on the feature IDs.
- Parameters
- Returns
dict
– (Geo)json response from the web service.
- get_response(self, url, payloads, method='GET')#
Send payload and get the response.
- initialize_service(self)#
Initialize the RESTFul service.
- partition_oids(self, oids)#
Partition feature IDs based on
self.max_nrecords
.
- retry_failed_requests(self)#
Retry failed requests.
- class pygeoogc.core.RESTValidator#
Validate ArcGISRESTful inputs.
- Parameters
base_url (str, optional) – The ArcGIS RESTful service URL. The URL must either include a layer number after the last / in the URL or the target layer must be passed as an argument.
layer (int, optional) – Target layer number, defaults to None. If None, the layer number must be included after the last / in base_url.
outformat (str, optional) – One of the output formats offered by the selected layer. If not correct, a list of available formats is shown. Defaults to geojson.
outfields (str or list) – The output fields to be requested. Setting * as outfields requests all the available fields, which is the default setting.
crs (str, optional) – The spatial reference of the output data, defaults to EPSG:4326.
max_workers (int, optional) – Max number of simultaneous requests, defaults to 1. Note that some services might face issues when several requests are sent simultaneously and will return the requests partially. It's recommended to avoid using too many workers unless you are certain the web service can handle it.
verbose (bool, optional) – If True, prints information about the requests and responses, defaults to False.
disable_retry (bool, optional) – If True, in case there are any failed queries, no retry attempts are made and the object IDs of the failed requests are saved to a text file whose path can be accessed via self.failed_path.
- class pygeoogc.core.WFSBase#
Base class for WFS service.
- Parameters
url (str) – The base URL for the WFS service, for example: https://hazards.fema.gov/nfhl/services/public/NFHL/MapServer/WFSServer
layer (str) – The layer from the service to be downloaded, defaults to None which throws an error and includes all the available layers offered by the service.
outformat (str) – The data format to request for data from the service, defaults to None which throws an error and includes all the available formats offered by the service.
version (str, optional) – The WFS service version which should be either 1.0.0, 1.1.0, or 2.0.0. Defaults to 2.0.0.
crs (str, optional) – The spatial reference system to be used for requesting the data, defaults to epsg:4326.
read_method (str, optional) – Method for reading the retrieved data, defaults to json. Valid options are json, binary, and text.
max_nrecords (int, optional) – The maximum number of records in a single request to be retrieved from the service, defaults to 1000. If the number of requested records is greater than this value, the query will be split into multiple requests.
- get_validnames(self)#
Get valid column names for a layer.
- validate_wfs(self)#
Validate input arguments with the WFS service.
- class pygeoogc.core.WMSBase#
Base class for accessing a WMS service.
- Parameters
url (str) – The base URL for the WMS service, e.g., https://www.mrlc.gov/geoserver/mrlc_download/wms
layers (str or list) – A layer or a list of layers from the service to be downloaded. You can pass an empty string to get a list of available layers.
outformat (str) – The data format to request for data from the service. You can pass an empty string to get a list of available output formats.
version (str, optional) – The WMS service version which should be either 1.1.1 or 1.3.0, defaults to 1.3.0.
crs (str, optional) – The spatial reference system to be used for requesting the data, defaults to epsg:4326.
- get_validlayers(self)#
Get the layers supported by the WMS service.
- validate_wms(self)#
Validate input arguments with the WMS service.
pygeoogc.pygeoogc#
Base classes and functions for REST, WMS, and WFS services.
Module Contents#
- class pygeoogc.pygeoogc.ArcGISRESTful(base_url, layer=None, outformat='geojson', outfields='*', crs=DEF_CRS, max_workers=1, verbose=False, disable_retry=False)#
Access to an ArcGIS REST service.
Notes
By default, all retrieval methods retry to get the missing feature IDs, if there are any. You can disable this behavior by setting disable_retry to True. If there are any missing feature IDs after the retry, they are saved to a text file, the path of which can be accessed by self.client.failed_path.
- Parameters
base_url (str, optional) – The ArcGIS RESTful service URL. The URL must either include a layer number after the last / in the URL or the target layer must be passed as an argument.
layer (int, optional) – Target layer number, defaults to None. If None, the layer number must be included after the last / in base_url.
outformat (str, optional) – One of the output formats offered by the selected layer. If not correct, a list of available formats is shown. Defaults to geojson.
outfields (str or list) – The output fields to be requested. Setting * as outfields requests all the available fields, which is the default behaviour.
crs (str, optional) – The spatial reference of the output data, defaults to epsg:4326.
max_workers (int, optional) – Number of simultaneous downloads, defaults to 1, i.e., no threading. Note that some services might face issues when several requests are sent simultaneously and will return the requests partially. It's recommended to avoid using too many workers unless you are certain the web service can handle it.
verbose (bool, optional) – If True, prints information about the requests and responses, defaults to False.
disable_retry (bool, optional) – If True, in case there are any failed queries, no retry attempts are made and the object IDs of the failed requests are saved to a text file whose path can be accessed via self.client.failed_path.
- get_features(self, featureids, return_m=False, return_geom=True)#
Get features based on the feature IDs.
- Parameters
- Returns
dict
– (Geo)json response from the web service.
- oids_byfield(self, field, ids)#
Get Object IDs based on a list of field IDs.
- oids_bygeom(self, geom, geo_crs=DEF_CRS, spatial_relation='esriSpatialRelIntersects', sql_clause=None, distance=None)#
Get feature IDs within a geometry that can be combined with a SQL where clause.
- Parameters
geom (LineString, Polygon, Point, MultiPoint, tuple, or list of tuples) – A geometry (LineString, Polygon, Point, MultiPoint), a tuple of length two ((x, y)), a list of tuples of length 2 ([(x, y), ...]), or a bounding box (tuple of length 4, (xmin, ymin, xmax, ymax)).
geo_crs (str or pyproj.CRS) – The spatial reference of the input geometry.
spatial_relation (str, optional) – The spatial relationship to be applied on the input geometry while performing the query. If not correct, a list of available options is shown. It defaults to esriSpatialRelIntersects. Valid predicates are:
esriSpatialRelIntersects
esriSpatialRelContains
esriSpatialRelCrosses
esriSpatialRelEnvelopeIntersects
esriSpatialRelIndexIntersects
esriSpatialRelOverlaps
esriSpatialRelTouches
esriSpatialRelWithin
esriSpatialRelRelation
sql_clause (str, optional) – Valid SQL 92 WHERE clause, defaults to None.
distance (int, optional) – Buffer distance in meters for the input geometries, defaults to None.
- Returns
list of tuples – A list of feature IDs partitioned by self.max_nrecords.
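A minimal sketch (the service URL and layer number are assumptions, not from this page) of the typical two-step workflow: collect object IDs for a bounding box, then request the corresponding features and convert them to a GeoDataFrame.
from pygeoogc import ArcGISRESTful
import pygeoutils as geoutils

# Hypothetical ArcGIS RESTful endpoint and layer number.
service = ArcGISRESTful(
    "https://hydro.nationalmap.gov/arcgis/rest/services/wbd/MapServer",
    layer=1,
    outformat="json",
)
# Feature IDs intersecting a WGS84 bounding box, partitioned by max_nrecords.
oids = service.oids_bygeom((-69.77, 45.07, -69.31, 45.45), geo_crs="epsg:4326")
resp = service.get_features(oids)
features = geoutils.json2geodf(resp)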
- class pygeoogc.pygeoogc.HttpURLs#
URLs of the supported HTTP services.
- class pygeoogc.pygeoogc.RESTfulURLs#
URLs of the supported RESTful services.
- class pygeoogc.pygeoogc.ServiceURL#
URLs of the supported services.
- class pygeoogc.pygeoogc.WFS(url, layer=None, outformat=None, version='2.0.0', crs=DEF_CRS, read_method='json', max_nrecords=1000, validation=True)#
Data from any WFS service within a geometry or by featureid.
- Parameters
url (str) – The base URL for the WFS service, for example: https://hazards.fema.gov/nfhl/services/public/NFHL/MapServer/WFSServer
layer (str) – The layer from the service to be downloaded, defaults to None which throws an error and includes all the available layers offered by the service.
outformat (str) – The data format to request for data from the service, defaults to None which throws an error and includes all the available formats offered by the service.
version (str, optional) – The WFS service version which should be either 1.0.0, 1.1.0, or 2.0.0. Defaults to 2.0.0.
crs (str, optional) – The spatial reference system to be used for requesting the data, defaults to epsg:4326.
read_method (str, optional) – Method for reading the retrieved data, defaults to json. Valid options are json, binary, and text.
max_nrecords (int, optional) – The maximum number of records in a single request to be retrieved from the service, defaults to 1000. If the number of records requested is greater than this value, it will be split into multiple requests.
validation (bool, optional) – Validate the input arguments from the WFS service, defaults to True. Set this to False if you are sure all the WFS settings such as layer and crs are correct to avoid sending extra requests.
- getfeature_bybox(self, bbox, box_crs=DEF_CRS, always_xy=False)#
Get data from a WFS service within a bounding box.
- Parameters
bbox (tuple) – A bounding box for getting the data: [west, south, east, north].
box_crs (str or pyproj.CRS, optional) – The spatial reference system of the input bbox, defaults to epsg:4326.
always_xy (bool, optional) – Whether to always use xy axis order, defaults to False. Some services change the axis order from xy to yx, following the latest WFS version specifications but some don't. If the returned value does not have any geometry, it indicates that most probably the axis order does not match. You can set this to True in that case.
- Returns
str or bytes or dict – WFS query response within a bounding box.
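A minimal sketch (the layer name and output format are assumptions, not from this page): query a WFS layer within a WGS84 bounding box and convert the GeoJSON response to a GeoDataFrame.
from pygeoogc import WFS
import pygeoutils as geoutils

wfs = WFS(
    "https://hazards.fema.gov/nfhl/services/public/NFHL/MapServer/WFSServer",
    layer="public_NFHL:Base_Flood_Elevations",  # hypothetical layer name
    outformat="esrigeojson",
    crs="epsg:4269",
)
# Request features within the bounding box and reproject the result to WGS84.
resp = wfs.getfeature_bybox((-69.77, 45.07, -69.31, 45.45), box_crs="epsg:4326")
flood = geoutils.json2geodf(resp, "epsg:4269", "epsg:4326")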
- getfeature_byfilter(self, cql_filter, method='GET')#
Get features based on a valid CQL filter.
Notes
The validity of the input CQL expression is user’s responsibility since the function does not perform any checks and just sends a request using the input filter.
- getfeature_bygeom(self, geometry, geo_crs=DEF_CRS, always_xy=False, predicate='INTERSECTS')#
Get features based on a geometry.
- Parameters
geometry (shapely.geometry) – The input geometry.
geo_crs (str or pyproj.CRS, optional) – The CRS of the input geometry, defaults to epsg:4326.
always_xy (bool, optional) – Whether to always use xy axis order, defaults to False. Some services change the axis order from xy to yx, following the latest WFS version specifications but some don't. If the returned value does not have any geometry, it indicates that most probably the axis order does not match. You can set this to True in that case.
predicate (str, optional) – The geometric predicate to use for requesting the data, defaults to INTERSECTS. Valid predicates are:
EQUALS
DISJOINT
INTERSECTS
TOUCHES
CROSSES
WITHIN
CONTAINS
OVERLAPS
RELATE
BEYOND
- Returns
str or bytes or dict – WFS query response based on the given geometry.
- class pygeoogc.pygeoogc.WFSURLs#
URLs of the supported WFS services.
- class pygeoogc.pygeoogc.WMS(url, layers, outformat, version='1.3.0', crs=DEF_CRS, validation=True, ssl=None)#
Get data from a WMS service within a geometry or bounding box.
- Parameters
url (str) – The base URL for the WMS service, e.g., https://www.mrlc.gov/geoserver/mrlc_download/wms
layers (str or list) – A layer or a list of layers from the service to be downloaded. You can pass an empty string to get a list of available layers.
outformat (str) – The data format to request for data from the service. You can pass an empty string to get a list of available output formats.
crs (str, optional) – The spatial reference system to be used for requesting the data, defaults to epsg:4326.
version (str, optional) – The WMS service version which should be either 1.1.1 or 1.3.0, defaults to 1.3.0.
validation (bool, optional) – Validate the input arguments from the WMS service, defaults to True. Set this to False if you are sure all the WMS settings such as layer and crs are correct to avoid sending extra requests.
ssl (bool or SSLContext, optional) – SSLContext to use for the connection, defaults to None. Set to False to disable SSL certification verification.
- get_validlayers(self)#
Get the layers supported by the WMS service.
- getmap_bybox(self, bbox, resolution, box_crs=DEF_CRS, always_xy=False, max_px=8000000, kwargs=None)#
Get data from a WMS service within a geometry or bounding box.
- Parameters
bbox (tuple) – A bounding box for getting the data.
resolution (float) – The output resolution in meters. The width and height of the output are computed in pixels based on the geometry bounds and the given resolution.
box_crs (str or pyproj.CRS, optional) – The spatial reference system of the input bbox, defaults to epsg:4326.
always_xy (bool, optional) – Whether to always use xy axis order, defaults to False. Some services change the axis order from xy to yx, following the latest WFS version specifications but some don't. If the returned value does not have any geometry, it indicates that most probably the axis order does not match. You can set this to True in that case.
max_px (int, optional) – The maximum allowable number of pixels (width x height) for a WMS request, defaults to 8 million based on some trial-and-error.
kwargs (dict, optional) – Optional additional keywords passed as payload, defaults to None. For example, {"styles": "default"}.
- Returns
dict – A dict where the keys are the layer names and the values are the returned responses from the WMS service as bytes.
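A minimal sketch (the layer name and output format are assumptions) that ties getmap_bybox to pygeoutils.gtiff2xarray: request GeoTIFF tiles within a bounding box and load them as an xarray object.
from pygeoogc import WMS
import pygeoutils as geoutils

wms = WMS(
    "https://www.mrlc.gov/geoserver/mrlc_download/wms",
    layers="NLCD_2019_Land_Cover_L48",  # hypothetical layer name
    outformat="image/geotiff",
    crs="epsg:3857",
)
bbox = (-7766049.665, 5691929.739, -7763049.665, 5696929.739)
# The response is a dict of layer name to GeoTIFF bytes, possibly split into chunks.
r_dict = wms.getmap_bybox(bbox, resolution=30, box_crs="epsg:3857")
nlcd = geoutils.gtiff2xarray(r_dict, bbox, "epsg:3857")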
- class pygeoogc.pygeoogc.WMSURLs#
URLs of the supported WMS services.
pygeoogc.utils#
Some utilities for PyGeoOGC.
Module Contents#
- class pygeoogc.utils.ESRIGeomQuery#
Generate input geometry query for ArcGIS RESTful services.
- Parameters
geometry (tuple or sgeom.Polygon or sgeom.Point or sgeom.LineString) – The input geometry which can be a point (x, y), a list of points [(x, y), ...], a bbox (xmin, ymin, xmax, ymax), or a Shapely sgeom.Polygon.
wkid (int) – The Well-known ID (WKID) of the geometry's spatial reference, e.g., for EPSG:4326, 4326 should be passed. Check ArcGIS for reference.
- bbox(self)#
Query for a bbox.
- multipoint(self)#
Query for a multi-point.
- point(self)#
Query for a point.
- polygon(self)#
Query for a polygon.
- polyline(self)#
Query for a polyline.
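A minimal sketch: build the ESRI geometry payload for a WGS84 bounding box; the coordinates are placeholders.
from pygeoogc.utils import ESRIGeomQuery

# Bounding box (xmin, ymin, xmax, ymax) with WKID 4326 (EPSG:4326).
payload = ESRIGeomQuery((-69.77, 45.07, -69.31, 45.45), wkid=4326).bbox()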
- class pygeoogc.utils.RetrySession(retries=3, backoff_factor=0.3, status_to_retry=(500, 502, 504), prefixes=('https://',), cache_name=None, expire_after=EXPIRE)#
Configures the passed-in session to retry on failed requests.
The failures can be due to connection errors, specific HTTP response codes, and 30X redirections. The code was originally based on https://github.com/bustawin/retry-requests.
- Parameters
retries (int, optional) – The maximum number of retries before raising an exception, defaults to 3.
backoff_factor (float, optional) – A factor used to compute the waiting time between retries, defaults to 0.3.
status_to_retry (tuple, optional) – A tuple of status codes that trigger the retry behaviour, defaults to (500, 502, 504).
prefixes (tuple, optional) – The URL prefixes to consider, defaults to ("https://",).
cache_name (str, optional) – Path to a folder for caching the session, defaults to None which uses the system's temp directory.
expire_after (int, optional) – Expiration time for the cache in seconds, defaults to -1 (never expire).
- get(self, url, payload=None, headers=None)#
Retrieve data from a url by GET and return the Response.
- post(self, url, payload=None, headers=None)#
Retrieve data from a url by POST and return the Response.
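A minimal sketch: issue a GET request with automatic retries on transient server errors; the query parameters mirror the NWIS example used elsewhere on this page.
from pygeoogc.utils import RetrySession

session = RetrySession(retries=3, backoff_factor=0.3)
resp = session.get(
    "https://waterservices.usgs.gov/nwis/site",
    payload={"format": "rdb", "sites": "01646500", "siteStatus": "all"},
)
print(resp.status_code)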
- pygeoogc.utils.bbox_decompose(bbox, resolution, box_crs=DEF_CRS, max_px=8000000)#
Split the bounding box vertically for WMS requests.
- Parameters
bbox (tuple) – A bounding box; (west, south, east, north).
resolution (float) – The target resolution for a WMS request in meters.
box_crs (str, optional) – The spatial reference of the input bbox, defaults to EPSG:4326.
max_px (int, optional) – The maximum allowable number of pixels (width x height) for a WMS request, defaults to 8 million based on some trial-and-error.
- Returns
list of tuples – Each tuple includes the following elements:
Tuple of length 4 that represents a bounding box (west, south, east, north) of a cell,
A label that represents the cell ID starting from bottom-left to top-right; for example, a 2x2 decomposition has the following labels:
|---------|---------| | | | | 0_1 | 1_1 | | | | |---------|---------| | | | | 0_0 | 1_0 | | | | |---------|---------|
Raster width of a cell,
Raster height of a cell.
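A minimal sketch: split a WGS84 bounding box into WMS-sized chunks and inspect the cell labels and raster sizes; the bbox and pixel budget are placeholders.
from pygeoogc.utils import bbox_decompose

bbox = (-69.77, 45.07, -69.31, 45.45)
# A small max_px forces the bbox to be decomposed into several cells.
for sub_bbox, label, width, height in bbox_decompose(bbox, resolution=10, max_px=1_000_000):
    print(label, width, height)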
- pygeoogc.utils.bbox_resolution(bbox, resolution, bbox_crs=DEF_CRS)#
Image size of a bounding box WGS84 for a given resolution in meters.
- pygeoogc.utils.check_bbox(bbox)#
Check if an input bbox is a tuple of length 4.
- pygeoogc.utils.check_response(resp)#
Extract error message from a response, if any.
- pygeoogc.utils.match_crs(geom, in_crs, out_crs)#
Reproject a geometry to another CRS.
- Parameters
geom (list or tuple or geometry) – Input geometry which could be a list of coordinates such as [(x1, y1), ...], a bounding box like (xmin, ymin, xmax, ymax), or any valid shapely geometry such as Polygon, MultiPolygon, etc.
in_crs (str) – Spatial reference of the input geometry.
out_crs (str) – Target spatial reference.
- Returns
same type as the input geometry
– Transformed geometry in the target CRS.
Examples
>>> from pygeoogc.utils import match_crs
>>> from shapely.geometry import Point
>>> point = Point(-7766049.665, 5691929.739)
>>> match_crs(point, "epsg:3857", "epsg:4326").xy
(array('d', [-69.7636111130079]), array('d', [45.44549114818127]))
>>> bbox = (-7766049.665, 5691929.739, -7763049.665, 5696929.739)
>>> match_crs(bbox, "epsg:3857", "epsg:4326")
(-69.7636111130079, 45.44549114818127, -69.73666165448431, 45.47699468552394)
>>> coords = [(-7766049.665, 5691929.739)]
>>> match_crs(coords, "epsg:3857", "epsg:4326")
[(-69.7636111130079, 45.44549114818127)]
- pygeoogc.utils.traverse_json(obj, path)#
Extract an element from a JSON file along a specified path.
This function is based on bcmullins.
- Parameters
- Returns
list – The items found in the JSON.
Examples
>>> from pygeoogc.utils import traverse_json
>>> data = [{
...     "employees": [
...         {"name": "Alice", "role": "dev", "nbr": 1},
...         {"name": "Bob", "role": "dev", "nbr": 2}],
...     "firm": {"name": "Charlie's Waffle Emporium", "location": "CA"},
... },]
>>> traverse_json(data, ["employees", "name"])
[['Alice', 'Bob']]
- pygeoogc.utils.valid_wms_crs(url)#
Get valid CRSs from a WMS service version 1.3.0.
Package Contents#
pygeoutils#
Top-level package for PyGeoUtils.
Submodules#
pygeoutils.pygeoutils#
Some utilities for manipulating GeoSpatial data.
Module Contents#
- class pygeoutils.pygeoutils.Coordinates#
Generate validated and normalized coordinates in WGS84.
- Parameters
Examples
>>> from pygeoutils import Coordinates
>>> c = Coordinates([460, 20, -30], [80, 200, 10])
>>> c.points.x.tolist()
[100.0, -30.0]
- property points(self)#
Get the validated coordinates as a geopandas.GeoSeries.
- class pygeoutils.pygeoutils.GeoBSpline(points, npts_sp, degree=3)#
Create B-spline from a geo-dataframe of points.
- Parameters
points (geopandas.GeoDataFrame or geopandas.GeoSeries) – Input points as a GeoDataFrame or GeoSeries in a projected CRS.
npts_sp (int) – Number of points in the output spline curve.
degree (int, optional) – Degree of the spline. Should be less than the number of points and greater than 1. Default is 3.
Examples
>>> from pygeoutils import GeoBSpline
>>> import geopandas as gpd
>>> xl, yl = zip(
...     *[
...         (-97.06138, 32.837),
...         (-97.06133, 32.836),
...         (-97.06124, 32.834),
...         (-97.06127, 32.832),
...     ]
... )
>>> pts = gpd.GeoSeries(gpd.points_from_xy(xl, yl, crs="epsg:4326"))
>>> sp = GeoBSpline(pts.to_crs("epsg:3857"), 5).spline
>>> pts_sp = gpd.GeoSeries(gpd.points_from_xy(sp.x, sp.y, crs="epsg:3857"))
>>> pts_sp = pts_sp.to_crs("epsg:4326")
>>> list(zip(pts_sp.x, pts_sp.y))
[(-97.06138, 32.837), (-97.06135, 32.83629), (-97.06131, 32.83538), (-97.06128, 32.83434), (-97.06127, 32.83319)]
- property spline(self)#
Get the spline as a
Spline
object.
- pygeoutils.pygeoutils.arcgis2geojson(arcgis, id_attr=None)#
Convert ESRIGeoJSON format to GeoJSON.
Notes
Based on arcgis2geojson.
- pygeoutils.pygeoutils.break_lines(lines, points, tol=0.0)#
Break lines at specified points at given direction.
- Parameters
lines (geopandas.GeoDataFrame) – Lines to break at intersection points.
points (geopandas.GeoDataFrame) – Points to break lines at. It must contain a column named direction with values up or down. This column is used to determine which part of the lines to keep, i.e., upstream or downstream of points.
tol (float, optional) – Tolerance for snapping points to the nearest lines in meters. The default is 0.0.
- Returns
geopandas.GeoDataFrame
– Original lines except for the parts that have been broken at the specified points.
- pygeoutils.pygeoutils.geo2polygon(geometry, geo_crs, crs)#
Convert a geometry to a Shapely’s Polygon and transform to any CRS.
- pygeoutils.pygeoutils.geometry_list(geometry)#
Get a list of polygons, points, and lines from a geometry.
- pygeoutils.pygeoutils.get_transform(ds, ds_dims=('y', 'x'))#
Get the transform of an xarray.Dataset or xarray.DataArray.
- Parameters
ds (xarray.Dataset or xarray.DataArray) – The dataset (or array) to be masked.
ds_dims (tuple, optional) – Names of the coordinates in the dataset, defaults to ("y", "x"). The order of the dimension names must be (vertical, horizontal).
- Returns
rasterio.Affine, int, int – The affine transform, width, and height.
- pygeoutils.pygeoutils.gtiff2xarray(r_dict, geometry=None, geo_crs=None, ds_dims=None, driver=None, all_touched=False, nodata=None, drop=True)#
Convert (Geo)Tiff byte responses to xarray.Dataset.
- Parameters
r_dict (dict) – Dictionary of (Geo)Tiff byte responses where keys are some names that are used for naming each response, and values are bytes.
geometry (Polygon, MultiPolygon, or tuple, optional) – The geometry to mask the data that should be in the same CRS as the r_dict. Defaults to None.
geo_crs (str or pyproj.CRS, optional) – The spatial reference of the input geometry, defaults to None. This argument should be given when geometry is given.
ds_dims (tuple of str, optional) – The names of the vertical and horizontal dimensions (in that order) of the target dataset, defaults to None. If None, dimension names are determined from a list of common names.
driver (str, optional) – A GDAL driver for reading the content, defaults to automatic detection. A list of the drivers can be found here: https://gdal.org/drivers/raster/index.html
all_touched (bool, optional) – Include a pixel in the mask if it touches any of the shapes. If False (default), include a pixel only if its center is within one of the shapes, or if it is selected by Bresenham's line algorithm.
nodata (float or int, optional) – The nodata value of the raster, defaults to None, i.e., it is determined from the raster.
drop (bool, optional) – If True, drop the data outside of the extent of the mask geometries. Otherwise, it will return the same raster with the data masked. Default is True.
- Returns
xarray.Dataset or xarray.DataArray – Parallel (with dask) dataset or dataarray.
- pygeoutils.pygeoutils.json2geodf(content, in_crs=DEF_CRS, crs=DEF_CRS)#
Create GeoDataFrame from (Geo)JSON.
- Parameters
- Returns
geopandas.GeoDataFrame
– Generated geo-data frame from a GeoJSON
- pygeoutils.pygeoutils.snap2nearest(lines, points, tol)#
Find the nearest points on a line to a set of points.
- Parameters
lines (geopandas.GeoDataFrame or geopandas.GeoSeries) – Lines.
points (geopandas.GeoDataFrame or geopandas.GeoSeries) – Points to snap to lines.
tol (float, optional) – Tolerance for snapping points to the nearest lines in meters. It must be greater than 0.0.
- Returns
geopandas.GeoDataFrame or geopandas.GeoSeries – Points snapped to lines.
- pygeoutils.pygeoutils.xarray2geodf(da, dtype, mask_da=None, connectivity=8)#
Vectorize an xarray.DataArray to a geopandas.GeoDataFrame.
- Parameters
da (xarray.DataArray) – The dataarray to vectorize.
dtype (type) – The data type of the dataarray. Valid types are int16, int32, uint8, uint16, and float32.
mask_da (xarray.DataArray, optional) – The dataarray to use as a mask, defaults to None.
connectivity (int, optional) – Use 4 or 8 pixel connectivity for grouping pixels into features, defaults to 8.
- Returns
geopandas.GeoDataFrame
– The vectorized dataarray.
- pygeoutils.pygeoutils.xarray_geomask(ds, geometry, crs, all_touched=False, drop=True, from_disk=False)#
Mask an xarray.Dataset based on a geometry.
- Parameters
ds (xarray.Dataset or xarray.DataArray) – The dataset (or array) to be masked.
geometry (Polygon, MultiPolygon) – The geometry to mask the data.
crs (str or pyproj.CRS) – The spatial reference of the input geometry.
all_touched (bool, optional) – Include a pixel in the mask if it touches any of the shapes. If False (default), include a pixel only if its center is within one of the shapes, or if it is selected by Bresenham's line algorithm.
drop (bool, optional) – If True, drop the data outside of the extent of the mask geometries. Otherwise, it will return the same raster with the data masked. Default is True.
from_disk (bool, optional) – If True, it will clip from disk using rasterio.mask.mask if possible. This is beneficial when the size of the data is larger than memory. Default is False.
- Returns
xarray.Dataset or xarray.DataArray – The input dataset with a mask applied (np.nan).
Package Contents#
Changelogs#
History#
0.13.1 (2022-06-11)#
New Features#
Add support for all the GeoConnex web service endpoints. There are two ways to use it. For a single query, you can use the geoconnex function, and for multiple queries, it's more efficient to use the GeoConnex class.
Add support for passing any of the supported NLDI feature sources to the get_basins method of the NLDI class. The default is nwissite to retain backward compatibility.
Bug Fixes#
Set the type of the "ReachCode" column to str instead of int in the pygeoapi and nhdplus_vaa functions.
0.13.0 (2022-04-03)#
New Features#
Add two new functions called flowline_resample and network_resample for resampling a flowline or network of flowlines based on a given spacing. This is useful for smoothing jagged flowlines similar to those in the NHDPlus database.
Add support for the new NLDI endpoint called "hydrolocation". The NLDI class now has two methods for getting features by coordinates: feature_byloc and comid_byloc. The feature_byloc method returns the flowline that is associated with the closest NHDPlus feature to the given coordinates. The comid_byloc method returns a point on the closest downstream flowline to the given coordinates.
Add a new function called pygeoapi for calling the API in batch mode. This function accepts the input coordinates as a geopandas.GeoDataFrame. It is more performant than calling its counterpart PyGeoAPI multiple times. It's recommended to switch to using this new batch function instead of the PyGeoAPI class. Users just need to prepare an input data frame that has all the required service parameters as columns.
Add a new step to prepare_nhdplus to convert MultiLineString to LineString.
Add support for the simplified flag of NLDI's get_basins function. The default value is True to retain the old behavior.
Breaking Changes#
Remove caching-related arguments from all functions since now they can be set globally via three environmental variables:
HYRIVER_CACHE_NAME: Path to the caching SQLite database.
HYRIVER_CACHE_EXPIRE: Expiration time for cached requests in seconds.
HYRIVER_CACHE_DISABLE: Disable reading/writing from/to the cache file.
You can do this like so:
import os
os.environ["HYRIVER_CACHE_NAME"] = "path/to/file.sqlite"
os.environ["HYRIVER_CACHE_EXPIRE"] = "3600"
os.environ["HYRIVER_CACHE_DISABLE"] = "true"
0.12.2 (2022-02-04)#
New Features#
Add a new class called NHD for accessing the latest National Hydrography Dataset. More info regarding this data can be found here.
Add two new functions for getting cross-sections along a single flowline via flowline_xsection or throughout a network of flowlines via network_xsection. You can specify spacing and width parameters to control their location. For more information and examples please consult the documentation.
Add a new property to AGRBase called service_info to include some useful info about the service including feature_types which can be handy for converting numeric values of types to their string equivalent.
Internal Changes#
Use the new PyGeoAPI API.
Refactor prepare_nhdplus for improving the performance and robustness of determining tocomid within a network of NHD flowlines.
Add empty geometries that NLDI.getbasins returns to the list of not found IDs. This is because the NLDI service does not include non-network flowlines and instead returns an empty geometry for these flowlines. (GH#48)
0.12.1 (2021-12-31)#
Internal Changes#
Use the three new ar.retrieve_* functions instead of the old ar.retrieve function to improve type hinting and to make the API more consistent.
Revert to the original PyGeoAPI base URL.
0.12.0 (2021-12-27)#
Breaking Changes#
Rewrite ScienceBase to make it applicable for working with other ScienceBase items. A new function has been added for staging the Additional NHDPlus attributes items, called stage_nhdplus_attrs.
Refactor AGRBase to remove unnecessary functions and make them more general.
Update the PyGeoAPI class to conform to the new pygeoapi API. This web service is undergoing some changes at the time of this release and the API is not stable and might not work as expected. As soon as the web service is stable, a new version will be released.
New Features#
In WaterData.byid, show a warning if there are any missing feature IDs that are requested but are not available in the dataset.
For all by* methods of WaterData, throw a ZeroMatched exception if no features are found.
Add expire_after and disable_caching arguments to all functions that use async_retriever. Set the default request caching expiration time to never expire. You can use disable_caching if you don't want to use the cached responses. Please refer to the documentation of the functions for more details.
Internal Changes#
Refactor prepare_nhdplus to reduce code complexity by grouping all the NHDPlus tools as a private class.
Modify AGRBase to reflect the latest API changes in the pygeoogc.ArcGISRESTful class.
Refactor prepare_nhdplus by creating a private class that includes all the previously used private functions. This will make the code more readable and easier to maintain.
Add all the missing types so mypy --strict passes.
0.11.4 (2021-11-12)#
New Features#
Add a new argument to NLDI.get_basins called split_catchment that, if set to True, will split the basin geometry at the watershed outlet.
Internal Changes#
Catch service errors in PyGeoAPI and show useful error messages.
Use importlib-metadata for getting the version instead of pkg_resources to decrease import time as discussed in this issue.
0.11.3 (2021-09-10)#
Internal Changes#
More robust handling of inputs and outputs of NLDI's methods.
Use an alternative download link for the NHDPlus VAA file on Hydroshare.
Restructure the codebase to reduce the complexity of the pynhd.py file by dividing it into three files: pynhd with all classes that provide access to the supported web services, core that includes base classes, and nhdplus_derived that has functions for getting databases that provide additional attributes for the NHDPlus database.
0.11.2 (2021-08-26)#
New Features#
Add support for PyGeoAPI. It offers four functionalities: flow_trace, split_catchment, elevation_profile, and cross_section.
0.11.1 (2021-07-31)#
New Features#
Add a function for getting all NHD FCodes as a data frame, called nhd_fcode.
Improve the prepare_nhdplus function by removing all coastlines and better detection of the terminal point in a network.
Internal Changes#
Migrate to using AsyncRetriever for handling communications with web services.
Catch the ConnectionError separately in NLDI and raise a ServiceError instead, so the user knows that data cannot be returned due to the server being out of service, not ZeroMatched.
0.11.0 (2021-06-19)#
New Features#
Add nhdplus_vaa to access NHDPlus Value Added Attributes for all its flowlines.
To see a list of available layers in NHDPlus HR, you can instantiate its class without passing any argument, like so: NHDPlusHR().
Breaking Changes#
Drop support for Python 3.6 since many of the dependencies such as xarray and pandas have done so.
Internal Changes#
Use persistent caching for all requests which can help speed up network responses significantly.
Improve documentation and testing.
0.10.1 (2021-03-27)#
Add an announcement regarding the new name for the software stack, HyRiver.
Improve pip installation and release workflow.
0.10.0 (2021-03-06)#
The first release after renaming hydrodata to PyGeoHydro.
Make mypy checks more strict and fix all the errors and prevent possible bugs.
Speed up CI testing by using mamba and caching.
0.9.0 (2021-02-14)#
Bump version to the same version as PyGeoHydro.
Breaking Changes#
Add a new function for getting basin geometries for a list of USGS station IDs. The function is a method of the NLDI class called get_basins. So, now the NLDI.getfeature_byid function does not have a basin flag. This change makes getting geometries easier and faster.
Remove the characteristics_dataframe method from NLDI and make a standalone function called nhdplus_attrs for accessing NHDPlus attributes directly from ScienceBase.
Add support for using the hydro or edits web services for getting NHDPlus High-Resolution using the NHDPlusHR function. The new arguments are service, which accepts hydro or edits, and the autos_switch flag for automatically switching to the other service if the one passed by service fails.
New Features#
Add a new argument to topoogical_sort called edge_attr that allows adding attribute(s) to the returned NetworkX graph. By default, it is None.
A new base class, AGRBase, for connecting to ArcGISRESTful-based services such as National Map and EPA's WaterGEOS.
Add support for setting the buffer distance for the input geometries to AGRBase.bygeom.
Add comid_byloc to the NLDI class for getting ComIDs of the closest flowlines from a list of lon/lat coordinates.
Add bydistance to WaterData for getting features within a given radius of a point.
0.2.0 (2020-12-06)#
Breaking Changes#
Re-wrote the NLDI function to use API v3 of the NLDI service.
The crs argument of WaterData is now the target CRS of the output dataframe. The service CRS is now EPSG:4269 for all the layers.
Remove the url_only argument of NLDI since it's not applicable anymore.
New Features#
Added support for NHDPlus High Resolution for getting features by geometry, IDs, or SQL where clause.
The following functions are added to NLDI:
getcharacteristic_byid: Getting characteristics of NHDPlus catchments.
navigate_byloc: Getting the nearest ComID to a coordinate and performing navigation.
characteristics_dataframe: Getting all the available catchment-scale characteristics as a data frame.
get_validchars: Getting a list of available characteristic IDs for a specified characteristic type.
The following functions are added to WaterData:
byfilter: Getting data based on any valid CQL filter.
bygeom: Getting data within a geometry (polygon and multipolygon).
Add support for Python 3.9 and tests for Windows.
Bug Fixes#
Refactored
WaterData
to fix the CRS inconsistencies (#1).
0.1.3 (2020-08-18)#
Replaced simplejson with orjson to speed up JSON operations.
0.1.2 (2020-08-11)#
Add a show_versions function for showing versions of the installed deps.
Improve documentation.
0.1.1 (2020-08-03)#
Improved documentation
Refactored
WaterData
to improve readability.
0.1.0 (2020-07-23)#
First release on PyPI.
History#
0.13.1 (2022-06-11)#
New Features#
Add a new function called get_us_states to the helpers module for obtaining a GeoDataFrame of the US states. It has an optional argument for returning the contiguous states, continental states, commonwealths states, or US territories. The data are retrieved from the Census' Tiger 2021 database.
In the NID class, keep the valid_fields property as a pandas.Series instead of a list, so it can be searched more easily via its str accessor.
Internal Changes#
Refactor the plot.signatures function to use proplot instead of matplotlib.
Improve performance of NWIS.get_streamflow by not validating the layer name when instantiating the WaterData class. Also, make the function more robust by checking if streamflow data is available for each station and throwing a warning if not.
Bug Fixes#
Fix an issue in NWIS.get_streamflow where -9999 values were not being filtered out. According to NWIS, these values are reserved for ice-affected data. This fix sets these values to numpy.nan.
0.13.0 (2022-04-03)#
New Features#
Add a new flag to nlcd_* functions called ssl for disabling SSL verification.
Add a new function called get_camels for getting the CAMELS dataset. The function returns a geopandas.GeoDataFrame that includes basin-level attributes for all 671 stations in the dataset and an xarray.Dataset that contains streamflow data for all 671 stations and their basin-level attributes.
Add a new function named overland_roughness for getting the overland roughness values from land cover data.
Add a new class called WBD for getting watershed boundary (HUC) data.
from pygeohydro import WBD
wbd = WBD("huc4")
hudson = wbd.byids("huc4", ["0202", "0203"])
Breaking Changes#
Remove caching-related arguments from all functions since now they can be set globally via three environmental variables:
HYRIVER_CACHE_NAME: Path to the caching SQLite database.
HYRIVER_CACHE_EXPIRE: Expiration time for cached requests in seconds.
HYRIVER_CACHE_DISABLE: Disable reading/writing from/to the cache file.
You can do this like so:
import os
os.environ["HYRIVER_CACHE_NAME"] = "path/to/file.sqlite"
os.environ["HYRIVER_CACHE_EXPIRE"] = "3600"
os.environ["HYRIVER_CACHE_DISABLE"] = "true"
Internal Changes#
Write the nodata attribute using rioxarray in nlcd_bygeom since the clipping operation of rioxarray uses this value as the fill value.
0.12.4 (2022-02-04)#
Internal Changes#
Return a named tuple instead of a dict of percentages in the cover_statistics function. It makes accessing the values easier.
Add pycln as a new pre-commit hook for removing unused imports.
Remove time zone info from the inputs to plot.signatures to avoid issues with the matplotlib backend.
Bug Fixes#
Fix an issue in plot.signatures where the new matplotlib version requires a numpy array instead of a pandas.DataFrame.
0.12.3 (2022-01-15)#
Bug Fixes#
Replace no-data values in ssebopeta_bygeom with np.nan before converting it to mm/day.
Fix an inconsistency issue with CRS projection when using UTM in nlcd_*. Use EPSG:3857 for all reprojections and get the data from NLCD in the same projection. (GH85)
Improve performance of nlcd_* functions by reducing the number of service calls.
Internal Changes#
Add type checking with typeguard and fix type hinting issues raised by typeguard.
Refactor show_versions to ensure getting correct versions of all dependencies.
0.12.2 (2021-12-31)#
New Features#
The NWIS.get_info now returns a geopandas.GeoDataFrame instead of a pandas.DataFrame.
Bug Fixes#
Fix a bug in
NWIS.get_streamflow
where the drainage area might not be computed correctly if target stations are not located at the outlet of their watersheds.
0.12.1 (2021-12-31)#
Internal Changes#
Use the three new ar.retrieve_* functions instead of the old ar.retrieve function to improve type hinting and to make the API more consistent.
Bug Fixes#
Fix an issue with NWIS.get_streamflow where the time zone of the data was not being correctly determined when it was a US-specific abbreviation such as CST.
0.12.0 (2021-12-27)#
New Features#
Add support for getting instantaneous streamflow from NWIS, in addition to daily streamflow, by adding a freq argument to NWIS.get_streamflow that can be either iv or dv. The default is dv to retain the previous behavior of the function. Convert the time zone of the streamflow data to UTC.
Add attributes of the requested stations via the attrs parameter of the returned pandas.DataFrame. (GH75)
Add a new flag to NWIS.get_streamflow for returning the streamflow as an xarray.Dataset. This dataset has two dimensions, time and station_id, and ten variables, which include discharge and nine other station attributes. (GH75)
Add drain_sqkm from GagesII to NWIS.get_info.
Show drain_sqkm in the interactive map generated by interactive_map.
Add two new functions for getting NLCD data: nlcd_bygeom and nlcd_bycoords. The new nlcd_bycoords function returns a geopandas.GeoDataFrame with the NLCD layers as columns and the input coordinates, which should be a list of (lon, lat) tuples, as the geometry column. Moreover, the new nlcd_bygeom function now accepts a geopandas.GeoDataFrame as input, in which case it returns a dict with keys matching the indices of the input geopandas.GeoDataFrame. (GH80)
The previous nlcd function is being deprecated. For now, it calls nlcd_bygeom internally and retains the old behavior. This function will be removed in future versions. A minimal usage sketch of the new functions follows this list.
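The snippet below is a rough, minimal sketch of the entries above (not taken from the official docs); the station number, dates, coordinates, and the to_xarray flag name are assumptions based on the descriptions in this list:
from pygeohydro import NWIS, nlcd_bycoords
nwis = NWIS()
dates = ("2019-01-01", "2019-12-31")
# Daily values by default; freq="iv" would request instantaneous values instead.
qobs = nwis.get_streamflow(["01031500"], dates, freq="dv")
# Same data as an xarray.Dataset; the flag name to_xarray is an assumption here.
qobs_ds = nwis.get_streamflow(["01031500"], dates, to_xarray=True)
# NLCD layers for a list of (lon, lat) points, returned as a geopandas.GeoDataFrame.
lulc = nlcd_bycoords([(-69.77, 45.07), (-69.31, 45.45)])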
Breaking Changes#
The ssebop_byloc function is being deprecated and replaced by ssebop_bycoords. The new function accepts a pandas.DataFrame as input that should include three columns: id, x, and y. It returns an xarray.Dataset with two dimensions, time and location_id; the id column of the input is used as the location_id dimension. The ssebop_byloc function still retains the old behavior and will be removed in future versions. A minimal sketch follows this list.
Set the request caching's expiration time to never expire. Add two flags to all functions to control the caching: expire_after and disable_caching.
Replace the NID class with the new RESTful-based web service of the National Inventory of Dams. The new NID service is very different from the old one, so this is considered a breaking change.
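A minimal sketch of ssebop_bycoords, assuming the (start, end) tuple argument is named dates as elsewhere in the stack; the site IDs and coordinates are illustrative:
import pandas as pd
from pygeohydro import ssebop_bycoords
coords = pd.DataFrame(
    {"id": ["site_a", "site_b"], "x": [-69.77, -69.31], "y": [45.07, 45.45]}
)
# Returns an xarray.Dataset with time and location_id dimensions.
eta = ssebop_bycoords(coords, dates=("2005-10-01", "2005-10-05"))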
Internal Changes#
Improve exception handling in NWIS.get_info when NWIS returns an error message rather than a 500s web service error.
The NWIS.get_streamflow function now checks if the site info dataset contains any duplicates, so all the remaining station numbers are unique. This prevents an issue with setting attrs where duplicate indexes caused an exception when being converted to a dict. (GH75)
Add all the missing types so mypy --strict passes.
0.11.4 (2021-11-24)#
New Features#
Add support for the Water Quality Portal Web Services. (GH72)
Add support for two versions of the NID web service. The original NID web service is considered version 2 and the new NID is considered version 3. You can pass the version number to NID like so: NID(2). The default version is 2.
Bug Fixes#
Fix an issue with the background percentage calculation in cover_statistics.
0.11.3 (2021-11-12)#
New Features#
Add a new map service for National Inventory of Dams (NID).
Internal Changes#
Use importlib-metadata for getting the version instead of pkg_resources to decrease import time as discussed in this issue.
0.11.2 (2021-07-31)#
Bug Fixes#
Refactor cover_statistics to address an issue with wrong category names and also improve performance for large datasets by using numpy functions.
Fix an issue with detecting the wrong number of stations in NWIS.get_streamflow. Also, improve filtering of stations whose start/end dates don't match the user-requested interval.
0.11.1 (2021-07-31)#
The highlight of this release is adding support for NLCD 2019 and significant improvements in NWIS support.
New Features#
Add support for the recently released version of NLCD (2019), including the impervious descriptor layer. Highlights of the new database are:
NLCD 2019 now offers land cover for years 2001, 2004, 2006, 2008, 2011, 2013, 2016, 2019, and impervious surface and impervious descriptor products now updated to match each date of land cover. These products update all previously released versions of land cover and impervious products for CONUS (NLCD 2001, NLCD 2006, NLCD 2011, NLCD 2016) and are not directly comparable to previous products. NLCD 2019 land cover and impervious surface product versions of previous dates must be downloaded for proper comparison. NLCD 2019 also offers an impervious surface descriptor product that identifies the type of each impervious surface pixel. This product identifies types of roads, wind tower sites, building locations, and energy production sites to allow deeper analysis of developed features.
—MRLC
Add support for all the supported regions of NLCD database (CONUS, AK, HI, and PR).
Add support for passing multiple years to the NLCD function, like so: {"cover": [2016, 2019]}.
Add the plot.descriptor_legends function to plot the legend for the impervious descriptor layer.
New features in the NWIS class are:
Remove the query_* methods since it's not convenient to pass them directly as a dictionary.
Add a new function called get_parameter_codes to query parameters and get information about them.
To decrease the complexity of the get_streamflow method, add a new private function to handle some tasks.
Make retrieve_rdb more general for handling more of NWIS's services.
Add a new argument called nwis_kwds to interactive_map so any NWIS-specific keywords can be passed for filtering stations.
Improve exception handling in the get_info method and simplify and improve its performance for getting HCDN.
Internal Changes#
Migrate to using AsyncRetriever for handling communications with web services.
0.11.0 (2021-06-19)#
Breaking Changes#
Drop support for Python 3.6 since many of the dependencies, such as xarray and pandas, have done so.
Remove the get_nid and get_nid_codes functions since NID now has an ArcGISRESTful service.
New Features#
Add a new class called NID for accessing the recently released National Inventory of Dams web service. This service is based on ArcGIS's RESTful service, so the user just needs to instantiate the class like so, NID(), and can then retrieve the data with three methods of the AGRBase class: bygeom, byids, and bysql. Moreover, it has an attrs property that includes descriptions of the database fields with their units. A minimal sketch follows this list.
Refactor NWIS.get_info to be more generic by accepting any valid queries that are documented at the USGS Site Web Service.
Allow passing a list of queries to NWIS.get_info and use async_retriever, which significantly improves the network response time.
Add two new flags to interactive_map for limiting the stations to those with daily values (dv=True) and/or instantaneous values (iv=True). This function now includes a link to each station's webpage on the USGS website.
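A minimal sketch of the new NID class; the field name in the SQL clause is hypothetical and only illustrates the bysql method:
from pygeohydro import NID
nid = NID()
print(nid.attrs)  # descriptions and units of the database fields
# Any valid SQL-92 WHERE clause works with bysql; DAM_HEIGHT is a made-up field name.
tall_dams = nid.bysql("DAM_HEIGHT > 50")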
Internal Changes#
Use persistent caching for all send/receive requests that can significantly improve the network response time.
Explicitly include all the hard dependencies in setup.cfg.
Refactor interactive_map and NWIS.get_info to make them more efficient and reduce their code complexity.
0.10.2 (2021-03-27)#
Internal Changes#
Add announcement regarding the new name for the software stack, HyRiver.
Improve pip installation and release workflow.
0.10.1 (2021-03-06)#
Internal Changes#
Add lxml to deps.
0.10.0 (2021-03-06)#
Internal Changes#
The official first release of PyGeoHydro with a new name and logo.
Replace cElementTree with ElementTree since it's been deprecated by defusedxml.
Make mypy checks more strict and fix all the errors and prevent possible bugs.
Speed up CI testing by using mamba and caching.
0.9.2 (2021-03-02)#
Internal Changes#
Rename the hydrodata package to PyGeoHydro for publication on JOSS.
In NWIS.get_info, drop rows that don't have mean daily discharge data instead of slicing.
Speed up GitHub Actions by using mamba and caching.
Improve pip installation by adding pyproject.toml.
New Features#
Add support for the National Inventory of Dams (NID) via the get_nid function.
0.9.1 (2021-02-22)#
Internal Changes#
Fix an issue with the NWIS.get_info method where stations with False values as their hcdn_2009 value were returned as None instead.
0.9.0 (2021-02-14)#
Internal Changes#
Bump versions of packages across the stack to the same version.
Use the new PyNHD function for getting basins, NLDI.get_basins.
Made mypy checks more strict and added all the missing type annotations.
0.8.0 (2020-12-06)#
Fixed the issue with WaterData due to the recent changes on the server side.
Updated the examples based on the latest changes across the stack.
Add support for multipolygon.
Remove the fill_hole argument.
Fix a warning in nlcd regarding performing division on nan values.
0.7.2 (2020-8-18)#
Enhancements#
Replaced simplejson with orjson to speed up JSON operations.
Explicitly sort the time dimension of the ssebopeta_bygeom function.
Bug Fixes#
Fix an issue with the nlcd function where high-resolution requests fail.
0.7.1 (2020-8-13)#
New Features#
Added a new argument to plot.signatures for controlling the vertical position of the plot title, called title_ypos. This could be useful for multi-line titles.
Bug Fixes#
Fixed an issue with the nlcd function where None layers were not dropped, causing the function to fail.
0.7.0 (2020-8-12)#
This version divides PyGeoHydro into six standalone Python libraries, so many of the changes listed below belong to modules and functions that are now part of a separate package. This decision was made to reduce the complexity of the code base and to allow users to install only the packages they need without having to install all the PyGeoHydro dependencies.
Breaking changes#
The services module is now a separate package called PyGeoOGC and is set as a requirement for PyGeoHydro. PyGeoOGC is a leaner package with far fewer dependencies and is suitable for people who might only need an interface to web services.
Unified function names for getting features by ID and by box.
Combined the start and end arguments into a tuple argument called dates across the code base.
Rewrote the NLDI function and moved most of its classmethods to Station, so now the Station class has more cohesion.
Removed the exploratory functionality of ArcGISREST, since it's more convenient to do so from a browser. Now, base_url is a required argument.
Renamed in_crs in datasets and services functions to geo_crs for geometry and box_crs for bounding box inputs.
Re-wrote the signatures function from scratch using NamedTuple to improve readability and efficiency. Now, the daily argument should be just a pandas.DataFrame or pandas.Series, and the column names are used for legends.
Removed the utils.geom_mask function and replaced it with rasterio.mask.mask.
Removed width as an input in functions with raster output since resolution is almost always the preferred way to request data. This change made the code more readable.
Renamed two functions: ArcGISRESTful and wms_bybox. These functions now return requests.Response type output.
onlyipv4 is now a class method in RetrySession.
The plot.signatures function now assumes that the input time series are in mm/day.
Added a flag to the get_streamflow function in the NWIS class to convert from cms to mm/day, which is useful for plotting hydrologic signatures using the signatures functions.
Enhancements#
Remove soft requirements from the env files.
Refactored requests functions into a single class and a separate file.
Made all the classes available directly from PyGeoHydro.
Added CodeFactor to the GitHub pipeline and addressed some issues that CodeFactor found.
Added Bandit to check the code for security issues.
Improved docstrings and documentation.
Added customized exceptions for better exception handling.
Added pytest fixtures to improve test speed.
Refactored the daymet and nwis_siteinfo functions to reduce code complexity and improve readability.
Major refactoring of the code base while adding type hinting.
The input geometry (or bounding box) can be provided in any projection, and the necessary re-projections are done under the hood.
Refactored the method for getting object IDs in the ArcGISREST class to improve robustness and efficiency.
Refactored the Daymet class to improve readability.
Added DeepSource for further code quality checking.
Automatic handling of large WMS requests (more than 8 million pixels, i.e., width x height).
The json_togeodf function now accepts both a single (Geo)JSON and a list of them.
Refactored plot.signatures using add_gridspec for much cleaner code.
New Features#
Added access to WaterData’s GeoServer databases.
Added access to the remaining NLDI database (Water Quality Portal and Water Data Exchange).
Created a Binder for launching a computing environment on the cloud and testing PyGeoHydro.
Added a URL repository for the supported services called ServiceURL.
Added support for FEMA web services for flood maps and FWS for wetlands.
Added a new function called wms_toxarray for converting WMS request responses to xarray.DataArray or xarray.Dataset.
Bug Fixes#
Fixed re-projection issues for functions with input geometry.
Fixed start and end variables not being initialized when coords was used in Station.
Fixed the geometry mask for xarray.DataArray.
Fixed WMS output re-projections.
0.6.0 (2020-06-23)#
Refactor requests session
Improve overall code quality based on CodeFactor suggestions
Migrate to Github Actions from TravisCI
0.5.5 (2020-06-03)#
Add to conda-forge
Remove pqdm and arcgis2geojson dependencies
0.5.3 (2020-06-07)#
Added threading capability to the flow accumulation function
Generalized WFS to include both by bbox and by featureID
Migrate RTD to
pip
fromconda
.Changed HCDN database source to GagesII database
Increased robustness of functions that need network connections
Made the flow accumulation output a pandas Series for better handling of time series input
Combined DEM, slope, and aspect in a class called NationalMap.
Installation from pip installs all the dependencies
0.5.0 (2020-04-25)#
An almost complete re-writing of the code base and not backward-compatible
New website design
Added vector accumulation
Added base classes and functions for accessing any ArcGIS RESTful, WMS, and WFS service
Standalone functions for creating datasets from responses and masking the data
Added threading using pqdm to speed up the downloads
Interactive map for exploring USGS stations
Replaced OpenTopography with 3DEP
Added HCDN database for identifying natural watersheds
0.4.4 (2020-03-12)#
Added new databases: NLDI, NHDPlus V2, OpenTopography, gridded Daymet, and SSEBop
The gridded data are returned as xarray DataArrays
Removed dependency on StreamStats and replaced it by NLDI
Improved overall robustness and efficiency of the code
Not backward compatible
Added code style enforcement with isort, black, flake8, and pre-commit
Added a new shiny logo!
New installation method
Changed OpenTopography base url to their new server
Fixed NLCD legend and statistics bug
0.3.0 (2020-02-10)#
Clipped the obtained NLCD data using the watershed geometry
Added support for specifying the year for getting NLCD
Removed direct NHDPlus data download dependency by using StreamStats and USGS APIs
Renamed
get_lulc
function toget_nlcd
0.2.0 (2020-02-09)#
Simplified import method
Changed usage from
rst
format toipynb
Auto-formatting with the black python package
Change docstring format based on Sphinx
Fixed pytest warnings and changed its working directory
Added an example notebook with data files
Added docstring for all the functions
Added Module section to the documentation
Fixed py7zr issue
Changed 7z extractor from pyunpack to py7zr
Fixed some linting issues.
0.1.0 (2020-01-31)#
First release on PyPI.
History#
0.13.1 (2022-06-11)#
New Features#
In the deg2mpm function, look for _FillValue and nodatavals in the attributes and, if not found, fall back to numpy.nan.
Internal Changes#
Ensure that the deg2mpm function uses dask if the input is dask-enabled.
In the elevation_profile function, use a bounding box to get DEM and a linear interpolation to get the elevation along the profile.
0.13.0 (2022-04-03)#
New Features#
Add a new function called query_3dep_sources for querying the bounds of 3DEP's data sources within a bounding box. It returns a geo-dataframe that contains the bounding box of each data source and a column, dem_res, identifying the resolution of the raw topographic data within each geometry.
Add a new function called elevation_profile for getting an elevation profile along a line at a given spacing. This function converts the line to a B-spline and then calculates the elevation along the spline at a given uniform spacing. A minimal sketch of both functions follows this list.
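A minimal sketch of the two new functions, assuming both accept geometries in EPSG:4326 and that the spacing argument is named spacing; argument names and units are not stated above and should be checked against the function signatures:
from shapely.geometry import LineString
import py3dep
bbox = (-69.77, 45.07, -69.31, 45.45)  # (west, south, east, north)
sources = py3dep.query_3dep_sources(bbox)  # geo-dataframe with a dem_res column
line = LineString([(-69.77, 45.07), (-69.31, 45.45)])
profile = py3dep.elevation_profile(line, spacing=1000)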
Breaking Changes#
Remove caching-related arguments from all functions since now they can be set globally via three environment variables:
HYRIVER_CACHE_NAME: Path to the caching SQLite database.
HYRIVER_CACHE_EXPIRE: Expiration time for cached requests in seconds.
HYRIVER_CACHE_DISABLE: Disable reading/writing from/to the cache file.
You can do this like so:
import os
os.environ["HYRIVER_CACHE_NAME"] = "path/to/file.sqlite"
os.environ["HYRIVER_CACHE_EXPIRE"] = "3600"
os.environ["HYRIVER_CACHE_DISABLE"] = "true"
0.12.2 (2022-01-15)#
New Features#
Add a new DEM source to elevation_bycoords to get elevation from The National Map's 3DEP WMS service. This can replace the tnm source since tnm is not stable.
Add a new function called check_3dep_availability to check the availability of 3DEP's native resolutions within an area of interest. It returns a dict whose keys correspond to the available resolutions and whose values are booleans indicating whether each resolution is available. A minimal sketch follows this list.
Replace no-data values of slope in deg2mpm with np.nan so they do not get converted to another value. The output of this function has np.float64 type.
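A minimal sketch of checking 3DEP availability and querying point elevations; the crs keyword and the use of the default elevation source are assumptions based on the descriptions above:
import py3dep
bbox = (-69.77, 45.07, -69.31, 45.45)
# Mapping of native 3DEP resolutions to their availability within the box.
res_availability = py3dep.check_3dep_availability(bbox)
# Elevations for a list of (lon, lat) points using the default source.
elevations = py3dep.elevation_bycoords([(-69.5, 45.2)], crs=4326)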
Internal Changes#
Refactor ElevationByCoords by using __post_init__ for validating the input parameters rather than pydantic's validators.
Refactor elevation_bygrid by using get_map to get DEM and rioxarray for re-projection.
Add type checking with typeguard and fix typing issues raised by typeguard.
Refactor show_versions to ensure getting correct versions of all dependencies.
0.12.1 (2021-12-31)#
Internal Changes#
Use the three new ar.retrieve_* functions instead of the old ar.retrieve function to improve type hinting and make the API more consistent.
0.12.0 (2021-12-27)#
Breaking Changes#
Set the request caching's expiration time to never expire. Add two flags to all functions to control the caching: expire_after and disable_caching.
Internal Changes#
Add all the missing types so mypy --strict passes.
Improve performance of elevation_bygrid by ignoring unnecessary validation.
0.11.4 (2021-11-12)#
Internal Changes#
0.11.3 (2021-10-03)#
Breaking Changes#
Rewrite the command-line interface using click.group to improve UX. The command is now py3dep [command] [args] [options]. The two supported commands are coords, for getting elevations of a dataframe of coordinates in EPSG:4326 CRS, and geometry, for getting the elevation of a geo-dataframe of geometries. Each sub-command now has a separate help message. The format of the input file for the coords command is now csv, and for the geometry command it is .shp or .gpkg and must have a crs attribute. Also, the geometry command now accepts multiple layers via the --layers (-l) option. More information and examples can be found in the README.rst file.
New Features#
Make the fill_depressions function public. This function conditions an input DEM by applying depression filling and flat area resolution operations.
Internal Changes#
The get_map function now checks the validity of the input layers argument before sending the actual request and gives a more helpful message.
Improve docstrings.
Move the deg2mpm, fill_depressions, and reproject_gtiff functions to a new file called utils. Both deg2mpm and fill_depressions are still accessible from py3dep directly.
Increase the test coverage.
Use one of click's internal functions, click.testing.CliRunner, to run the CLI tests.
0.11.2 (2021-09-17)#
Bug Fixes#
Fix a bug related to elevation_bycoords where CRS validation fails if its type is pyproj.CRS, by converting inputs with CRS types to string.
Internal Changes#
Fix a couple of typing issues and update the get_transform API based on the recent changes in pygeoutils v0.11.5.
0.11.1 (2021-07-31)#
The first highlight of this release is a major refactor of elevation_bycoords by adding support for the Bulk Point Query Service and improving the overall performance of the function. Another highlight is support for performing depression filling in elevation_bygrid before sampling the underlying DEM.
New Features#
Refactor the elevation_bycoords function to add support for getting elevations of a list of coordinates via The National Map's Point Query Service. This service is more accurate than Airmap, but it's limited to the US only. You can select the source via a new argument called source, e.g., source="tnm" to use the TNM service. The default is tnm.
Refactor the elevation_bygrid function to add a new capability, via the fill_depressions argument, for filling depressions in the obtained DEM before extracting elevation data for the input grid points. This is achieved via RichDEM, which needs to be installed if this functionality is desired. You can install it via pip or conda (mamba).
Internal Changes#
Migrate to using AsyncRetriever for handling communications with web services.
Handle the interpolation step in the elevation_bygrid function more efficiently using xarray.
0.11.0 (2021-06-19)#
New Features#
Added command-line interface (GH10).
All feature query functions use persistent caching that can significantly improve the performance.
Breaking Changes#
Drop support for Python 3.6 since many of the dependencies such as xarray and pandas have done so.
The returned xarray objects are in parallel mode, i.e., in some cases the compute method should be used to get the results.
Save the output as netcdf instead of raster since conversion from nc to tiff can be easily done with rioxarray.
0.10.1 (2021-03-27)#
Add announcement regarding the new name for the software stack, HyRiver.
Improve
pip
installation and release workflow.
0.10.0 (2021-03-06)#
The first release after renaming hydrodata to PyGeoHydro.
Make
mypy
checks more strict and fix all the errors and prevent possible bugs.Speed up CI testing by using
mamba
and caching.
0.9.0 (2021-02-14)#
Bump version to the same version as PyGeoHydro.
Add support for saving maps as
geotiff
file(s).Replace
Elevation Point Query Service
service withAirMap
for getting elevations for a list of coordinates in bulk sinceAirMap
is much faster. The resolution ofAirMap
is 30 m.Use
cytoolz
for some operations for improving performance.
0.2.0 (2020-12-06)#
Add support for multipolygon.
Remove the
fill_hole
argument.Add a new function to get elevations for a list of coordinates called
elevation_bycoords
.Refactor
elevation_bygrid
function for increasing readability and performance.
0.1.7 (2020-08-18)#
Added a rename operation to
get_map
to automatically rename the variables to a more sensible one.Replaced
simplejson
withorjson
to speed-up JSON operations.
0.1.6 (2020-08-11)#
Add a new function,
show_versions
, for getting versions of the installed dependencies which is useful for debugging and reporting.Fix typos in the docs and improved the README.
Improve testing and coverage.
0.1.5 (2020-08-03)#
Fixed the geometry CRS issue
Improved the documentation
0.1.4 (2020-07-23)#
Refactor
get_map
to usepygeoutils
package.Change the versioning method to
setuptools_scm
.Polish README and add installation from
conda-forge
.
0.1.0 (2020-07-19)#
First release on PyPI.
History#
0.13.1 (2022-06-11)#
New Features#
Adopt the default snow parameters’ values from a new source https://doi.org/10.5194/gmd-11-1077-2018 and add the citation.
Bug Fixes#
0.13.0 (2022-03-03)#
Breaking Changes#
Remove caching-related arguments from all functions since now they can be set globally via three environment variables:
HYRIVER_CACHE_NAME: Path to the caching SQLite database.
HYRIVER_CACHE_EXPIRE: Expiration time for cached requests in seconds.
HYRIVER_CACHE_DISABLE: Disable reading/writing from/to the cache file.
You can do this like so:
import os
os.environ["HYRIVER_CACHE_NAME"] = "path/to/file.sqlite"
os.environ["HYRIVER_CACHE_EXPIRE"] = "3600"
os.environ["HYRIVER_CACHE_DISABLE"] = "true"
0.12.3 (2022-02-04)#
New Features#
Add a new flag called snow to both the get_bycoords and get_bygeom functions, which separates snow from the precipitation using the Martinez and Gupta (2010) method. A minimal sketch follows.
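A minimal sketch of the snow flag, assuming the point-data workflow of get_bycoords and Daymet's prcp and tmin variable names; the coordinates and dates are illustrative:
from pydaymet import get_bycoords
coords = (-89.63, 45.93)
dates = ("2000-01-01", "2000-06-30")
# snow=True partitions precipitation into rain and snow (Martinez and Gupta, 2010).
clm = get_bycoords(coords, dates, variables=["prcp", "tmin"], snow=True)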
Internal Changes#
Add elevation data when computing PET regardless of the pet method.
Match the chunk size of elevation with that of the climate data.
Drop the time dimension from the elevation, lon, and lat variables.
Bug Fixes#
Fix a bug in setting dates for monthly timescales. For monthly timescale Daymet calendar is at 15th or 16th of the month, so input dates need to be adjusted accordingly.
0.12.2 (2022-01-15)#
Internal Changes#
Clean up the PET computation functions’ output by removing temporary variables that are created during the computation.
Add more attributes for the elevation and pet variables.
Add type checking with typeguard and fix typing issues raised by typeguard.
Refactor show_versions to ensure getting correct versions of all dependencies.
0.12.1 (2021-12-31)#
Internal Changes#
Use the three new ar.retrieve_* functions instead of the old ar.retrieve function to improve type hinting and make the API more consistent.
0.12.0 (2021-12-27)#
New Features#
Expose the ssl argument for disabling the SSL certification verification (GH41). Now, you can pass ssl=False to disable the SSL verification in both the get_bygeom and get_bycoords functions. Moreover, you can pass --disable_ssl to PyDaymet's command-line interface to disable the SSL verification.
Breaking Changes#
Set the request caching's expiration time to never expire. Add two flags to all functions to control the caching: expire_after and disable_caching.
Internal Changes#
Add all the missing types so mypy --strict passes.
0.11.4 (2021-11-12)#
Internal Changes#
Use importlib-metadata for getting the version instead of pkg_resources to decrease import time as discussed in this issue.
0.11.3 (2021-10-07)#
Bug Fixes#
There was an issue in the PET computation due to dayofyear being added as a new dimension. This version fixes it and even further simplifies the code by using xarray's dt accessor to gain access to the dayofyear method.
0.11.2 (2021-10-07)#
New Features#
Add hargreaves_samani and priestley_taylor methods for computing PET.
Breaking Changes#
Rewrite the command-line interface using click.group to improve UX. The command is now pydaymet [command] [args] [options]. The two supported commands are coords, for getting climate data for a dataframe of coordinates, and geometry, for getting gridded climate data for a geo-dataframe. Moreover, each sub-command now has a separate help message and example.
Deprecate get_byloc in favor of get_bycoords.
The pet argument in both get_bycoords and get_bygeom functions now accepts hargreaves_samani, penman_monteith, priestley_taylor, and None.
Internal Changes#
Refactor the pet module to reduce duplicate code and improve readability and maintainability. The code is smaller now, and the functions for computing physical properties include references to equations from the respective original papers.
0.11.1 (2021-07-31)#
The highlight of this release is a major refactor of Daymet to allow for extending the PET computation function to methods other than FAO-56.
New Features#
Refactor the Daymet class by removing the pet_bycoords and pet_bygrid methods and creating a new public function called potential_et. This function computes potential evapotranspiration (PET) and supports both gridded (xarray.Dataset) and single-pixel (pandas.DataFrame) climate data. The long-term plan is to add support for methods other than FAO 56 for computing PET.
0.11.0 (2021-06-19)#
New Features#
Add command-line interface (GH7).
Use AsyncRetriever for sending requests asynchronously with persistent caching. A cache folder in the current directory is created.
Check for validity of start/end dates based on Daymet V4, since Puerto Rico data starts from 1950 while North America and Hawaii start from 1980.
Check for validity of input coordinates/geometry based on the Daymet V4 bounding boxes.
Improve accuracy of computing the psychrometric constant in PET calculations by using an equation in Allen et al. 1998.
Breaking Changes#
Drop support for Python 3.6 since many of the dependencies such as xarray and pandas have done so.
Change the loc_crs and geo_crs arguments to crs in get_bycoords and get_bygeom.
Documentation#
Add examples to docstrings and improve writing.
Add more notes regarding the underlying assumptions for
pet_bycoords
andpet_bygrid
.
Internal Changes#
Refactor
Daymet
class to usepydantic
for validating the inputs.Increase test coverage.
0.10.2 (2021-03-27)#
Add announcement regarding the new name for the software stack, HyRiver.
Improve
pip
installation and release workflow.
0.10.0 (2021-03-06)#
The first release after renaming hydrodata to PyGeoHydro.
Make
mypy
checks more strict and fix all the errors and prevent possible bugs.Speed up CI testing by using
mamba
and caching.
0.9.0 (2021-02-14)#
Bump version to the same version as PyGeoHydro.
Update to version 4 of Daymet database. You can check the release information here
Add a new function called get_bycoords that provides an alternative to get_byloc for getting climate data at a single pixel. This new function uses the THREDDS data server with the NetCDF Subset Service (NCSS) and supports getting monthly and annual averages directly from the server. Note that this function will replace get_byloc in the future, so consider migrating your code by replacing get_byloc with get_bycoords. The input arguments of get_bycoords are very similar to get_bygeom. Another difference between get_byloc and get_bycoords is the column names, where get_bycoords uses the units that are returned by the NCSS server.
Add support for downloading monthly and annual summaries in addition to the daily timescale. You can pass time_scale as daily, monthly, or annual to the get_bygeom or get_bycoords functions to download the respective summaries.
Add support for getting climate data for Hawaii and Puerto Rico by passing region to the get_bygeom and get_bycoords functions. The acceptable values are na for CONUS, hi for Hawaii, and pr for Puerto Rico. A minimal sketch follows this list.
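A minimal sketch combining the time_scale and region options described above; the point location, dates, and variable names are illustrative:
from pydaymet import get_bycoords
coords = (-155.3, 19.6)  # a point in Hawaii
dates = ("2000-01-01", "2003-12-31")
# Monthly summaries for the Hawaii region, served directly by the NCSS server.
clm = get_bycoords(coords, dates, variables=["prcp", "tmax"], time_scale="monthly", region="hi")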
0.2.0 (2020-12-06)#
Add support for multipolygon.
Remove the
fill_hole
argument.Improve masking by geometry.
Use the newly added
async_requests
function frompygeoogc
for getting Daymet data to increase the performance (almost 2x faster)
0.1.3 (2020-08-18)#
Replaced
simplejson
withorjson
to speed-up JSON operations.
0.1.2 (2020-08-11)#
Add
show_versions
for showing versions of the installed deps.
0.1.1 (2020-08-03)#
Retained the compatibility with
xarray
0.15 by removing theattrs
flag.Replaced
open_dataset
withload_dataset
for automatic handling of closing the input after reading the content.Removed
years
argument from bothbyloc
andbygeom
functions. Thedates
argument now accepts both a tuple of start and end dates and a list of years.
0.1.0 (2020-07-27)#
Initial release on PyPI.
History#
0.3.2 (2022-04-03)#
New Features#
Add support for setting caching-related arguments using three environment variables:
HYRIVER_CACHE_NAME: Path to the caching SQLite database.
HYRIVER_CACHE_EXPIRE: Expiration time for cached requests in seconds.
HYRIVER_CACHE_DISABLE: Disable reading/writing from/to the cache file.
You can do this like so:
import os
os.environ["HYRIVER_CACHE_NAME"] = "path/to/file.sqlite"
os.environ["HYRIVER_CACHE_EXPIRE"] = "3600"
os.environ["HYRIVER_CACHE_DISABLE"] = "true"
Internal Changes#
Include the URL of a failed request in its exception error message.
0.3.1 (2021-12-31)#
New Features#
Add three new functions called retrieve_text, retrieve_json, and retrieve_binary. These functions are derived from the retrieve function and are used to retrieve the text, JSON, or binary content of a response. They are meant to help with type hinting since they have only one return type instead of the three different return types that the retrieve function has. A minimal sketch follows.
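A minimal sketch of retrieve_json; the URL and query parameters target the public NLDI service and are only illustrative:
import async_retriever as ar
urls = ["https://labs.waterdata.usgs.gov/api/nldi/linked-data/comid/position"]
kwds = [{"params": {"f": "json", "coords": "POINT(-68.325 45.17)"}}]
# retrieve_json always returns parsed JSON, so type checkers see a single return type.
resp = ar.retrieve_json(urls, kwds)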
Internal Changes#
Move all private functions to a new module called utils. This makes the code base more readable and easier to maintain.
0.3.0 (2021-12-27)#
Breaking Changes#
Set the expiration time to never expire by default.
New Features#
Add two new arguments to retrieve for controlling caching. First, delete_url_cache for deleting caches of specific requests. Second, expire_after for setting a custom expiration time.
Expose the ssl argument for disabling the SSL certification verification (GH41).
Add a new option called disable that temporarily disables caching requests/responses if set to True. It defaults to False.
0.2.5 (2021-11-09)#
New Features#
Add two new arguments,
timeout
andexpire_after
, toretrieve
. These two arguments give the user more control in dealing with issues related to caching.
Internal Changes#
Revert to
pytest
as the testing framework.Use
importlib-metadata
for getting the version instead ofpkg_resources
to decrease import time as discussed in this issue.
0.2.4 (2021-09-10)#
Internal Changes#
Use ujson for converting responses to JSON.
Bug Fixes#
Fix an issue with catching service error messages.
0.2.3 (2021-08-26)#
Internal Changes#
Use
ujson
for JSON parsing instead oforjson
sinceorjson
only serializes tobytes
which is not compatible withaiohttp
.
0.2.2 (2021-08-19)#
New Features#
Add a new function,
clean_cache
, for manually removing the expired responses from the cache database.
Internal Changes#
Handle all cache file-related operations in the
create_cachefile
function.
0.2.1 (2021-07-31)#
New Features#
The responses now are returned to the same order as the input URLs.
Add support for passing connection type, i.e., IPv4 only, IPv6 only, or both, via the family argument. Defaults to both.
Set trust_env=True so the session can read the system's netrc files. This can be useful for working with services such as the EarthData service that read the user authentication info from a netrc file.
Internal Changes#
Replace the AsyncRequest class with the _retrieve function to increase readability and reduce overhead.
More robust handling of validating user inputs via a new class called ValidateInputs.
Move all if-blocks in async_session to other functions to improve performance.
0.2.0 (2021-06-17)#
Breaking Changes#
Make persistent caching dependencies required.
Rename
request
argument torequest_method
inretrieve
which now accepts both lower and upper cases ofget
andpost
.
Bug Fixes#
Pass a new loop explicitly to
nest_asyncio
(GH1).
Internal Changes#
Refactor the entire code-base for more efficient handling of different request methods.
Check the validity of inputs before sending requests.
Improve documentation.
Improve cache handling by removing the expired responses before returning the results.
Increase testing coverage to 100%.
0.1.0 (2021-05-01)#
Initial release.
History#
0.13.1 (2022-06-11)#
New Features#
More robust handling of errors in ArcGISRESTful by catching None responses. Also, use the POST method for ArcGISRESTful.bysql since the SQL clause could be a long string.
0.13.0 (2022-04-03)#
Breaking Changes#
Remove caching-related arguments from all functions since now they can be set globally via three environment variables:
HYRIVER_CACHE_NAME: Path to the caching SQLite database.
HYRIVER_CACHE_EXPIRE: Expiration time for cached requests in seconds.
HYRIVER_CACHE_DISABLE: Disable reading/writing from/to the cache file.
You can do this like so:
import os
os.environ["HYRIVER_CACHE_NAME"] = "path/to/file.sqlite"
os.environ["HYRIVER_CACHE_EXPIRE"] = "3600"
os.environ["HYRIVER_CACHE_DISABLE"] = "true"
Bug Fixes#
In ArcGISRESTful.oids_byfield, convert the input ids to a list if a user passes a single id.
Internal Changes#
Refactor ServiceURL to hard-code the supported links instead of reading them from a file. Also, the class is now based on NamedTuple, which has a nicer __repr__.
0.12.2 (2022-01-15)#
New Features#
Make validate_crs public; it can be accessed from the utils module. This is useful for checking the validity of user-input CRS values and getting their string representation.
Add the pygeoogc.utils.valid_wms_crs function for getting a list of valid CRS values from a WMS service.
Add 3DEP's index WFS service for querying availability of 3DEP data within a bounding box.
Internal Changes#
Add type checking with typeguard and fix typing issues raised by typeguard.
Refactor show_versions to ensure getting correct versions of all dependencies.
0.12.1 (2021-12-31)#
Internal Changes#
Use the three new
ar.retrieve_*
functions instead of the oldar.retrieve
function to improve type hinting and to make the API more consistent.
0.12.0 (2021-12-27)#
New Features#
Add a new argument to ArcGISRESTful called verbose to turn on/off all info-level logs.
Add an option to ArcGISRESTful.get_features called get_geometry to turn on/off requesting the data with or without geometry.
Now, ArcGISRESTful saves the object IDs of the features that the user requested but that are not available in the database to ./cache/failed_request_ids.txt.
Add a new parameter to ArcGISRESTful called disable_retry. If True, in case there are any failed queries, no retrying attempts are made, and the object IDs of the failed requests are saved to a text file whose path can be accessed via ArcGISRESTful.client.failed_path.
Set the response caching expiration time to never expire for all base classes. A new argument has been added to all three base classes, called expire_after, that can be used to set the expiration time.
Add a new method to all three base classes called clear_cache that clears all cached responses for that specific client.
Breaking Changes#
All oids_by* methods of the ArcGISRESTful class now return a list of object IDs rather than setting self.featureids. This makes it possible to pass the outputs of the oids_by* methods directly to the get_features method, as in the sketch below.
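A minimal sketch of the new workflow; the service URL, layer number, and bounding box are illustrative, and the geo_crs keyword name is an assumption based on the naming convention used elsewhere in the stack:
from pygeoogc import ArcGISRESTful
service = ArcGISRESTful(
    "https://hazards.fema.gov/gis/nfhl/rest/services/public/NFHL/MapServer",
    layer=28,
)
bbox = (-69.77, 45.07, -69.31, 45.45)
oids = service.oids_bygeom(bbox, geo_crs=4326)  # object IDs within the bounding box
features = service.get_features(oids)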
Internal Changes#
Make ArcGISRESTful less cluttered by instantiating ArcGISRESTfulBase in the init method of ArcGISRESTful rather than inheriting from its base class.
Explicitly set a minimum value of 1 for the maximum number of feature IDs per request in ArcGISRESTful, i.e., self.max_nrecords.
Add all the missing types so mypy --strict passes.
0.11.7 (2021-11-09)#
Breaking Changes#
Remove the
onlyipv4
method fromRetrySession
since it can be easily be achieved usingwith unittest.mock.patch("socket.has_ipv6", False):
.
Internal Changes#
Use the geoms method for iterating over geometries to address the deprecation warning of shapely.
Use importlib-metadata for getting the version instead of pkg_resources to decrease import time as discussed in this issue.
Remove the unnecessary dependency on simplejson and use ujson instead.
0.11.5 (2021-09-09)#
Bug Fixes#
Update the code to use the latest requests-cache API.
0.11.4 (2021-08-26)#
New Features#
Add URL for PyGeoAPI service.
0.11.3 (2021-08-21)#
Internal Changes#
Fix a bug in WFS.getfeature_byid when the number of IDs exceeds the service's limit by splitting large requests into multiple smaller requests.
Add two new arguments, max_nrecords and read_method, to WFS to control the maximum number of records per request (defaults to 1000) and specify the response read method (defaults to json), respectively.
0.11.2 (2021-08-19)#
Internal Changes#
Simplify the retry logic of ArcGISRESTful by making it run four times and making sure that the last retry is one object ID per request.
0.11.1 (2021-07-31)#
The highlight of this release is migrating to use AsyncRetriever, which can improve the network response time significantly. Another highlight is a major refactoring of ArcGISRESTful that improves performance and reduces code complexity.
New Features#
Add a new method to the ArcGISRESTful class for automatically retrying failed requests. This private method plucks out individual features that were in a failed request with several features. This happens when some object IDs that are not available on the server are included in the request. In these situations the request fails even though there are valid object IDs in it. This method plucks out the valid object IDs.
Add support for passing additional parameters to WMS requests, such as styles.
Add support for WFS version 1.0.0.
Internal Changes#
Migrate to AsyncRetriever from requests-cache for all the web services.
Rename ServiceError to ServiceUnavailable and ServerError to ServiceError since it's more representative of the intended exception.
Raise for response status in RetrySession before the try-except block so RequestsException can be raised and its error message parsed.
Deprecate utils.threading since all threading operations are now handled by AsyncRetriever.
Increase test coverage.
0.11.0 (2021-06-18)#
New Features#
Add support for requesting
LineString
polygon forArcGISRESTful
.Add a new argument called
distance
toArcGISRESTful.oids_bygeom
for specifying the buffer distance from the input geometry for getting features.
Breaking Changes#
Drop support for Python 3.6 since many of the dependencies, such as xarray and pandas, have done so.
Remove the async_requests function, since it has been packaged as a new Python library called AsyncRetriever.
Refactor MatchCRS. Now, it should be instantiated by providing the in and out CRSs like so: MatchCRS(in_crs, out_crs). Then its methods, namely geometry, bounds, and coords, can be called. These methods now have only one input, geometry. A minimal sketch follows this list.
Change the input and output types of MatchCRS.coords from a tuple of lists of coordinates to a list of (x, y) coordinates.
ArcGISRESTful now has a new argument, layer, for specifying the layer number (int). Now, the target layer should either be a part of base_url or be passed with the layer argument.
Move the spatial_relation argument from the ArcGISRESTful class to the oids_bygeom method, since that's where it's applicable.
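A minimal sketch of the refactored MatchCRS, with illustrative CRSs and coordinates:
from pygeoogc import MatchCRS
match = MatchCRS("epsg:4326", "epsg:3857")
# Reproject a list of (x, y) coordinates; geometry and bounds work the same way.
coords_web = match.coords([(-69.77, 45.07), (-69.31, 45.45)])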
Internal Changes#
Refactor the ArcGISRESTfulBase class to reduce its code complexity and make the service initialization logic much simpler. The class is faster since it makes fewer requests during the initialization process.
Add pydantic as a new dependency that takes care of ArcGISRESTfulBase validation.
Use persistent caching for all send/receive requests, which can significantly improve the network response time.
Explicitly include all the hard dependencies in setup.cfg.
Set a default value of 1000 for max_nrecords in ArcGISRESTfulBase.
Use dataclass for WMSBase and WFSBase since support for Python 3.6 is dropped.
0.10.1 (2021-03-27)#
Add announcement regarding the new name for the software stack, HyRiver.
Improve
pip
installation and release workflow.
0.10.0 (2021-03-06)#
The first release after renaming
hydrodata
toPyGeoHydro
.Fix
extent
property ofArcGISRESTful
being set toNone
incorrectly.Add
feature types
property toArcGISRESTFul
for getting names and IDs of types of features in the database.Replace
cElementTree
withElementTree
since it’s been deprecated bydefusedxml
.Remove dependency on
dataclasses
since its benefits and usage in the code was minimal.Speed up CI testing by using
mamba
and caching.ArcGISRESTFull
now prints number of found features before attempting to retrieve them.User
logging
module for printing information.
0.9.0 (2021-02-14)#
Bump version to the same version as PyGeoHydro.
Add support for query by point and multi-points to
ArcGISRESTful.bygeom
.Add support for buffer distance to
ArcGISRESTful.bygeom
.Add support for generating ESRI-based queries for points and multi-points to
ESRIGeomQuery
.Add all the missing type annotations.
Update the Daymet URL to version 4. You can check the release information here
Use
cytoolz
library for improving performance of some operations.Add
extent
property toArcGISRESTful
class that get the spatial extent of the service.Add URL to
airmap
service for getting elevation data at 30 m resolution.
0.2.3 (2020-12-19)#
Fix the urllib3 deprecation warning about using method_whitelist.
0.2.2 (2020-12-05)#
Remove unused variables in async_requests and use max_workers.
Fix the async_requests issue on Windows systems.
0.2.0 (2020-12-06)#
Added/Renamed three class methods in
ArcGISRESTful
:oids_bygeom
,oids_byfield
, andoids_bysql
. So you can query feature within a geometry, using specific field ID(s), or more generally using any valid SQL 92 WHERE clause.Added support for query with SQL WHERE clause to
ArcGISRESTful
.Changed the NLDI’s URL for migrating to its new API v3.
Added support for CQL filter to
WFS
, credits to Emilio.Moved all the web services URLs to a YAML file that
ServiceURL
class reads. It makes managing the new URLs easier. The file is located atpygeoogc/static/urls.yml
.Turned off threading by default for all the services since not all web services supports it.
Added support for setting the request method,
GET
orPOST
, forWFS.byfilter
, which could be useful when the filter string is long.Added support for asynchronous download via the function
async_requests
.
0.1.10 (2020-08-18)#
Improved
bbox_decompose
to fix theWMS
issue with high resolution requests.Replaces
simplejson
withorjson
to speed up JSON operations.
0.1.8 (2020-08-12)#
Removed threading for WMS due to inconsistent behavior.
Addressed an issue with domain decomposition for WMS where width/height becomes 0.
0.1.7 (2020-08-11)#
Renamed vsplit_bbox to bbox_decompose. The function now decomposes the domain in both directions and returns squares and rectangles.
0.1.5 (2020-07-23)#
Re-wrote the wms_bybox function as a class called WMS with a similar interface to the WFS class.
Added support for WMS 1.3.0 and WFS 2.0.0.
Added a custom Exception for the threading function called ThreadingException.
Added an always_xy flag to WMS and WFS, which is False by default. It is useful for cases where a web service doesn't change the axis order from the traditional xy to yx for versions higher than 1.3.0.
0.1.3 (2020-07-21)#
Remove unnecessary transformation of the input bbox in WFS.
Use
setuptools_scm
for versioning.
0.1.2 (2020-07-16)#
Add the missing
max_pixel
argument to thewms_bybox
function.Change the
onlyIPv4
method ofRetrySession
class toonlyipv4
to conform to thesnake_case
convention.Improve docstrings.
0.1.1 (2020-07-15)#
Initial release.
History#
0.13.1 (2022-06-11)#
New Features#
Add support for passing a custom bounding box in the Coordinates class. The default is the bounds of EPSG:4326 to retain backward compatibility. This new class parameter allows a user to check whether a list of coordinates is within a custom bounding box. The bounds should be in the EPSG:4326 coordinate system.
Add a new function called geometry_list for converting a list of multi-geometries to a list of geometries.
0.13.0 (2022-03-03)#
Internal Changes#
Write the nodata attribute using rioxarray in geotiff2xarray since the clipping operation of rioxarray uses this value as the fill value.
Bug Fixes#
In the break_lines function, convert MultiLineString into LineString since shapely.ops.substring only accepts LineString.
0.12.3 (2022-02-04)#
New Features#
Add a function called break_lines for breaking lines at given points.
Add a function called snap2nearest for snapping points to the nearest point on a line with a given tolerance. It accepts a geopandas.GeoSeries of points and a geopandas.GeoSeries or geopandas.GeoDataFrame of lines. It automatically snaps to the closest lines in the input data.
0.12.2 (2022-01-15)#
New Features#
Add a new class called GeoBSpline that generates B-splines from a set of coordinates. The spline attribute of this class has five attributes: x and y coordinates, phi and radius, which are the curvature and radius of curvature, respectively, and distance, which is the total distance of each point along the B-spline from the starting point.
Add a new class called Coordinates that validates a set of lon/lat coordinates. It normalizes longitudes to the range [-180, 180) and has a points property that is a geopandas.GeoSeries with the validated coordinates. It uses spatial indexing to speed up the validation and should be able to handle large datasets efficiently. A minimal sketch follows this list.
Make transform2tuple a public function.
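A minimal sketch of the Coordinates class, assuming it is constructed from separate longitude and latitude sequences; the values are illustrative, and the point with latitude 95 is expected to be dropped as invalid:
from pygeoutils import Coordinates
coords = Coordinates([-179.5, 270.0, -20.0], [35.0, 45.0, 95.0])
valid = coords.points  # geopandas.GeoSeries of validated points with normalized longitudes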
Internal Changes#
The geometry and geo_crs arguments of gtiff2xarray are now optional. This is useful for cases when the input GeoTiff response is the result of a bounding-box query and there is no need for a geometry mask.
Replace the missing values after adding the geometry mask via xarray_geomask with the nodatavals attribute of the input xarray.DataArray or xarray.Dataset. Therefore, the data type of the input xarray.DataArray or xarray.Dataset is conserved.
Expose the connectivity argument of the rasterio.features.shapes function in the xarray2geodf function.
Move all private functions to a new module to make the main module less cluttered.
0.12.1 (2021-12-31)#
Internal Changes#
Refactor
arcgis2geojson
for better readability and maintainability.In
arcgis2geojson
set the geometry to null if its type is not supported, such as curved polylines.
0.12.0 (2021-12-27)#
Internal Changes#
Add all the missing types so
mypy --strict
passes.Bump version to 0.12.0 to match the release of
pygeoogc
.
0.11.7 (2021-11-09)#
Internal Changes#
0.11.6 (2021-10-06)#
New Features#
Add a new function,
xarray2geodf
, to convert axarray.DataArray
to ageopandas.GeoDataFrame
.
0.11.5 (2021-06-16)#
Bug Fixes#
Fix an issue with
gtiff2xarray
where thescales
andoffsets
attributes of the outputDataArray
were floats rather than tuples (GH30).
Internal Changes#
Add a new function,
transform2tuple
, for convertingAffine
transforms to a tuple. Previously, theAffine
transform was converted to a tuple usingto_gdal()
method ofrasterio.Affine
which was not compatible withrioxarray
.
0.11.4 (2021-08-26)#
Internal Changes#
Use ujson for JSON parsing instead of orjson since orjson only serializes to bytes, which is not compatible with aiohttp.
Convert the transform attribute data type from Affine to tuple since saving a data array to netcdf cannot handle the Affine type.
0.11.3 (2021-08-19)#
Fix an issue in geotiff2xarray related to saving a xarray object to NetCDF when its transform attribute has Affine type rather than a tuple.
0.11.2 (2021-07-31)#
The highlight of this release is performance improvement in gtiff2xarray
for
handling large responses.
New Features#
Automatic detection of the driver by default in
gtiff2xarray
as opposed to it beingGTiff
.
Internal Changes#
Make
geo2polygon
,get_transform
, andget_nodata_crs
public functions since other packages use it.Make
xarray_mask
a public function and simplifygtiff2xarray
.Remove
MatchCRS
since it’s already available inpygeoogc
.Validate input geometry in
geo2polygon
.Refactor
gtiff2xarray
to check for theds_dims
outside the main loops to improve the performance. Also, the function tries to detect the dimension names automatically ifds_dims
is not provided by the user, explicitly.Improve performance of
json2geodf
by using list comprehension and performing checks outside the main loop.
Bug Fixes#
Add the missing arguments for masking the data in
gtiff2xarray
.
0.11.1 (2021-06-19)#
Bug Fixes#
In some edge cases the y-coordinates of a response might not be monotonically sorted, so dask fails. This release sorts them to address this issue.
0.11.0 (2021-06-19)#
New Features#
The gtiff2xarray function returns a parallelized xarray.Dataset or xarray.DataArray that can handle large responses much more efficiently. This is achieved using dask.
Breaking Changes#
Drop support for Python 3.6 since many of the dependencies such as xarray and pandas have done so.
Refactor MatchCRS. Now, it should be instantiated by providing the in and out CRSs like so: MatchCRS(in_crs, out_crs). Then its methods, namely geometry, bounds, and coords, can be called. These methods now have only one input, geometry.
Change the input and output types of MatchCRS.coords from a tuple of lists of coordinates to a list of (x, y) coordinates.
Remove xarray_mask and gtiff2file since rioxarray is more general and suitable.
Internal Changes#
Remove unnecessary type checks for private functions.
Refactor json2geodf to improve robustness. Use the get method of dict for checking key availability.
0.10.1 (2021-03-27)#
Setting transform of the merged dataset explicitly (GH3).
Add announcement regarding the new name for the software stack, HyRiver.
Improve
pip
installation and release workflow.
0.10.0 (2021-03-06)#
The first release after renaming
hydrodata
toPyGeoHydro
.Address GH1 by sorting y coordinate after merge.
Make
mypy
checks more strict and fix all the errors and prevent possible bugs.Speed up CI testing by using
mamba
and caching.
0.9.0 (2021-02-14)#
Bump version to the same version as PyGeoHydro.
Add
gtiff2file
for saving raster responses asgeotiff
file(s).Fix an error in
_get_nodata_crs
for handling no data value when its value in the source is None.Fix the warning during the
GeoDataFrame
generation injson2geodf
when there is no geometry column in the input JSON.
0.2.0 (2020-12-06)#
Added checking the validity of input arguments in
gtiff2xarray
function and provide useful messages for debugging.Add support for multipolygon.
Remove the
fill_hole
argument.Fixed a bug in
xarray_geomask
for getting the transform.
0.1.10 (2020-08-18)#
Fixed the
gtiff2xarray
issue with high resolution requests and improved robustness of the function.Replaced
simplejson
withorjson
to speed up JSON operations.
0.1.9 (2020-08-11)#
Modified gtiff2xarray to reflect the latest changes in pygeoogc 0.1.7.
0.1.8 (2020-08-03)#
Retained compatibility with xarray 0.15 by removing the attrs flag.
Added the xarray_geomask function and made it a public function.
More efficient handling of large GeoTiff responses by cropping the response before converting it into a dataset.
Added a new function called geo2polygon for converting and transforming a polygon or bounding box into a Shapely Polygon in the target CRS.
0.1.6 (2020-07-23)#
Fixed the issue with flipped mask in
WMS
.Removed
drop_duplicates
since it may cause issues in some instances.
0.1.4 (2020-07-22)#
Refactor
griff2xarray
and added support for WMS 1.3.0 and WFS 2.0.0.Add
MatchCRS
class.Remove dependency on PyGeoOGC.
Increase test coverage.
0.1.3 (2020-07-21)#
Remove duplicate rows before returning the dataframe in the
json2geodf
function.Add the missing dependency
0.1.0 (2020-07-21)#
First release on PyPI.
Contributing#
Contributions are welcome, and they are greatly appreciated! Every little bit helps, and credit will always be given.
You can contribute in many ways to any of the packages that are included in HyRiver project. The workflow is the same for all packages. In this page, a contribution workflow for PyGeoHydro is explained.
Types of Contributions#
Report Bugs#
Report bugs at https://github.com/cheginit/pygeohydro/issues.
If you are reporting a bug, please include:
Your operating system name and version.
Any details about your local setup that might be helpful in troubleshooting.
Detailed steps to reproduce the bug.
Fix Bugs#
Look through the GitHub issues for bugs. Anything tagged with “bug” and “help wanted” is open to whoever wants to implement it.
Implement Features#
Other than new features that you might have in mind, you can look through the GitHub issues for features. Anything tagged with “enhancement” and “help wanted” is open to whoever wants to implement it.
Write Documentation#
PyGeoHydro could always use more documentation, whether as part of the official PyGeoHydro docs, in docstrings, or even on the web in blog posts, articles, and such.
Submit Feedback#
The best way to send feedback is to file an issue at https://github.com/cheginit/pygeohydro/issues.
If you are proposing a feature:
Explain in detail how it would work.
Keep the scope as narrow as possible, to make it easier to implement.
Remember that this is a volunteer-driven project, and that contributions are welcome :)
Get Started!#
Ready to contribute? Here’s how to set up PyGeoHydro for local development.
Fork the PyGeoHydro repo through the GitHub website.
Clone your fork locally and add the main PyGeoHydro as the upstream remote:
$ git clone git@github.com:your_name_here/pygeohydro.git
$ git remote add upstream git@github.com:cheginit/pygeohydro.git
Install your local copy into a virtualenv. Assuming you have Conda installed, this is how you can set up your fork for local development:
$ cd pygeohydro/
$ conda env create -f ci/requirements/environment.yml
$ conda activate pygeohydro-dev
$ python -m pip install . --no-deps
Create a branch for local development:
$ git checkout -b bugfix-or-feature/name-of-your-bugfix-or-feature
$ git push
Before you first commit, pre-commit hooks needs to be setup:
$ pre-commit install
$ pre-commit run --all-files
Now you can make your changes locally. Make sure to add a description of the changes to the HISTORY.rst file and add extra tests, if applicable, to the tests folder. Also, make sure to give yourself credit by adding your name at the end of the item(s) that you add in the history, like this: By `Taher Chegini <https://github.com/cheginit>`_. Then, fetch the latest updates from the remote and resolve any merge conflicts:
$ git fetch upstream
$ git merge upstream/name-of-your-branch
Then lint and test the code:
$ make lint
If you are making breaking changes, make sure to reflect them in the documentation, README.rst, and tests if necessary.
Commit your changes and push your branch to GitHub:
$ git add .
$ git commit -m "Your detailed description of your changes."
$ git push origin name-of-your-bugfix-or-feature
Submit a pull request through the GitHub website.
Tips#
To run a subset of tests:
$ pytest -k "test_name1 or test_name2"
Deploying#
A reminder for the maintainers on how to deploy. Make sure all your changes are committed (including an entry in HISTORY.rst). Then run:
$ git tag -a vX.X.X -m "vX.X.X"
$ git push --follow-tags
where X.X.X is the version number following the semantic versioning spec, i.e., MAJOR.MINOR.PATCH.
Then release the tag from GitHub, and GitHub Actions will deploy it to PyPI.
Credits#
Development Lead#
Taher Chegini <cheginit@gmail.com>
Contributors#
None yet. Why not be the first?
License#
MIT License
Copyright (c) 2020, Taher Chegini
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.