Hydroclimate Data Retriever#


HyRiver (formerly named hydrodata) is a suite of Python packages that provides a unified API for retrieving geospatial/temporal data from various web services. HyRiver includes two categories of packages:

  • Low-level APIs for accessing any of the supported web services, i.e., ArcGIS RESTful, WMS, and WFS.

  • High-level APIs for accessing some of the most commonly used datasets in hydrology and climatology studies. Currently, this project only includes hydrology and climatology data within the US.

Low- and high-level APIs

Citation#

If you use any of the HyRiver packages in your research, we appreciate citations:

@article{Chegini_2021,
    author = {Chegini, Taher and Li, Hong-Yi and Leung, L. Ruby},
    doi = {10.21105/joss.03175},
    journal = {Journal of Open Source Software},
    month = {10},
    number = {66},
    pages = {1--3},
    title = {{HyRiver: Hydroclimate Data Retriever}},
    volume = {6},
    year = {2021}
}

High-level APIs for accessing some pre-configured web services#

  • PyNHD: Navigate and subset mid- and high-res NHD, NHDPlus, and NHDPlus VAA using WaterData, NLDI, ScienceBase, and The National Map web services.

  • PyGeoHydro: Access NWIS, NID, HCDN 2009, NLCD, and SSEBop databases.

  • Py3DEP: Access topographic data through The National Map’s 3DEP web service.

  • PyDaymet: Access Daymet for daily, monthly, and annual summaries of climate data at 1-km scale, for both single pixels and gridded data, over North America, Hawaii, and Puerto Rico.

  • PyGridMET: Access GridMET for daily climate data at 4-km scale, for both single pixels and gridded data, over the conterminous United States.

  • PyNLDAS2: Access hourly NLDAS-2 forcing data.

  • HydroSignatures: A collection of tools for computing hydrological signatures.

Low-level APIs for connecting to supported web service protocols#

  • PyGeoOGC: Send queries to and receive responses from any ArcGIS RESTful-, WMS-, and WFS-based service.

  • PyGeoUtils: Convert responses from PyGeoOGC’s supported web service protocols into geospatial and raster datasets.

  • AsyncRetriever: Asynchronous send/receive requests with persistent caching.

Getting Started#

Why HyRiver?#

Some major capabilities of HyRiver are as follows:

  • Easy access to many web services for subsetting data on the server side and returning the requests as masked Datasets or GeoDataFrames.

  • Splitting large requests into smaller chunks, under the hood, since web services often limit the number of features per request. This means the only bottleneck for subsetting the data is your local machine’s memory.

  • Navigating and subsetting the NHDPlus database (both medium- and high-resolution) using web services.

  • Cleaning up the vector NHDPlus data, fixing some common issues, and computing vector-based accumulation through a river network.

  • A URL inventory for some popular (and tested) web services.

  • Some utilities for manipulating the obtained data and their visualization.

Installation#

You can install all the packages using pip:

$ pip install py3dep pynhd pygeohydro pydaymet pygridmet pynldas2 hydrosignatures pygeoogc pygeoutils async-retriever

Please note that installation with pip fails if libgdal is not installed on your system. You should install libgdal manually beforehand; for example, on Ubuntu-based distros the required package is libgdal-dev. If it is installed, you should be able to run gdal-config --version successfully.
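
For instance, on an Ubuntu-based system you can install the package and then verify that GDAL is visible like so:

$ sudo apt install libgdal-dev
$ gdal-config --version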

Alternatively, you can install them using conda:

$ conda install -c conda-forge py3dep pynhd pygeohydro pydaymet pygridmet pynldas2 hydrosignatures pygeoogc pygeoutils async-retriever

or mambaforge (recommended):

$ mamba install py3dep pynhd pygeohydro pydaymet pygridmet pynldas2 hydrosignatures pygeoogc pygeoutils async-retriever

Additionally, you can create a new environment, named hyriver with all the packages and optional dependencies installed with mambaforge using the provided environment.yml file:

$ mamba env create -f ./environment.yml

Dependencies#

Per-package dependencies are as follows:

async-retriever:

  • aiohttp[speedups]>=3.8.3
  • aiohttp-client-cache>=0.8.1
  • aiosqlite
  • cytoolz
  • ujson

pygeoogc:

  • async-retriever<0.17,>=0.16
  • cytoolz
  • defusedxml
  • joblib
  • multidict
  • owslib>=0.27.2
  • pyproj>=3.0.1
  • requests
  • requests-cache>=0.9.6
  • shapely>=2
  • typing_extensions
  • ujson
  • url-normalize>=1.4
  • urllib3
  • yarl

pygeoutils:

  • cytoolz
  • geopandas>=0.10
  • netcdf4
  • numpy>=1.21
  • pyproj>=3.0.1
  • rasterio>=1.2
  • rioxarray>=0.11
  • scipy
  • shapely>=2
  • ujson
  • xarray>=2023.01

pynhd:

  • async-retriever<0.17,>=0.16
  • cytoolz
  • geopandas>=0.10
  • networkx
  • numpy>=1.21
  • pandas>=1
  • pyarrow>=1.0.1
  • pygeoogc<0.17,>=0.16
  • pygeoutils<0.17,>=0.16
  • shapely>=2

py3dep:

  • async-retriever<0.17,>=0.16
  • click>=0.7
  • cytoolz
  • geopandas>=0.10
  • numpy>=1.17
  • pygeoogc<0.17,>=0.16.1
  • pygeoutils<0.17,>=0.16.1
  • rasterio>=1.2
  • rioxarray>=0.11
  • scipy
  • shapely>=2
  • xarray>=2023.01

pygeohydro:

  • async-retriever<0.17,>=0.16
  • cytoolz
  • defusedxml
  • folium
  • geopandas>=0.10
  • h5netcdf
  • hydrosignatures<0.17,>=0.16
  • matplotlib>=3.5
  • numpy>=1.21
  • pandas>=1
  • pygeoogc<0.17,>=0.16
  • pygeoutils<0.17,>=0.16
  • pynhd<0.17,>=0.16
  • pyproj>=3.0.1
  • rioxarray>=0.11
  • scipy
  • shapely>=2
  • ujson
  • xarray>=2023.01

pydaymet:

  • async-retriever<0.17,>=0.16
  • click>=0.7
  • geopandas>=0.10
  • numpy>=1.21
  • pandas>=1
  • py3dep<0.17,>=0.16.1
  • pygeoogc<0.17,>=0.16.1
  • pygeoutils<0.17,>=0.16.1
  • pyproj>=3.0.1
  • scipy
  • shapely>=2
  • xarray>=2023.01

pygridmet:

  • async-retriever<0.17,>=0.16
  • click>=0.7
  • geopandas>=0.10
  • numpy>=1.21
  • pandas>=1
  • pygeoogc<0.17,>=0.16
  • pygeoutils<0.17,>=0.16
  • pyproj>=3.0.1
  • shapely>=2
  • xarray>=2023.01

pynldas2:

  • async-retriever<0.17,>=0.16
  • h5netcdf
  • numpy>=1.21
  • pandas>=1
  • pygeoutils<0.17,>=0.16
  • pyproj>=3.0.1
  • rioxarray>=0.11
  • xarray>=2023.01

hydrosignatures:

  • numpy>=1.21
  • pandas>=1
  • scipy
  • xarray>=2023.01

Additionally, you can install bottleneck and numba to improve the performance of some computations. Installing pyogrio is highly recommended for improving the performance of working with vector data. For NHDPlus, py7zr and pyogrio are required dependencies. For retrieving soil data, you should install planetary-computer and pystac-client.
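
For instance, all of these optional dependencies can be pulled in with a single pip call:

$ pip install bottleneck numba pyogrio py7zr planetary-computer pystac-client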

Software Stack#

A detailed description of each component of the HyRiver software stack.

PyNHD: Navigate and subset NHDPlus database#


Features#

PyNHD is part of the HyRiver software stack that is designed to aid in hydroclimate analysis through web services.

This package provides access to several hydro-linked web services, including NLDI, WaterData, NHDPlusHR, NHD, GeoConnex, and PyGeoAPI. These services can be used to navigate and extract vector data from the NHDPlus database (both mid- and high-resolution), such as catchments, HUC8, HUC12, GagesII, flowlines, and water bodies.

Moreover, the PyGeoAPI service provides four functionalities:

  1. flow_trace: Trace flow from a starting point to up/downstream direction.

  2. split_catchment: Split the local catchment of a point of interest at the point’s location.

  3. elevation_profile: Extract elevation profile along a flow path between two points.

  4. cross_section: Extract cross-section at a point of interest along a flow line.

PyNHD also provides access to the entire NHDPlus dataset for CONUS (L48) via the nhdplus_l48 function. You can get any of the 31 layers that are available in the NHDPlus dataset. You can also get the NHDPlus Value Added Attributes (VAA) from Hydroshare, as well as ENHD. These datasets do not have geometries; they include slope and roughness, among other attributes, for all NHD flowlines. You can use the nhdplus_vaa and enhd_attrs functions to get them.
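
For instance, a minimal sketch of fetching these attribute tables (the local parquet paths are illustrative, and we assume enhd_attrs accepts a cache path the way nhdplus_vaa does):

import pynhd

# the files are downloaded once and cached at the given (illustrative) paths
vaa = pynhd.nhdplus_vaa("input_data/nhdplus_vaa.parquet")
enhd = pynhd.enhd_attrs("input_data/enhd_attrs.parquet")
# e.g., slope of every NHD flowline, indexed by ComID
slope = vaa.set_index("comid").slope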

Additionally, you can get many more derived attributes at NHD catchment-level through two sources:

  • Select Attributes for NHDPlus Version 2.1 Reach Catchments from an item on ScienceBase

  • EPA’s StreamCat dataset.

They both include hundreds of attributes, such as hydroclimate properties, water quality, urbanization, and population. In addition to NHD catchment summaries, they also have network-accumulated values (both upstream and divergence-routed). You can use the nhdplus_attrs, epa_nhd_catchments, and streamcat functions to get these datasets.

Additionally, PyNHD offers some extra utilities for processing the NHD flowlines:

  • flowline_xsection and network_xsection: Get cross-section lines at a given spacing along a flowline or a network of flowlines.

  • flowline_resample and network_resample: Resample a flowline or a network of flowlines to a given spacing. This is useful for smoothing jagged flowlines, such as those in the NHDPlus database.

  • prepare_nhdplus: Clean up the dataframe by, for example, removing tiny networks, adding a to_comid column, and finding terminal flowlines if they don’t exist.

  • topoogical_sort: Sort the river network topologically, which is useful for routing and flow accumulation.

  • vector_accumulation: Compute flow accumulation in a river network. This function is generic, and any routing method can be plugged in.

These utilities are developed based on an R package called nhdplusTools and a Python package called nldi-xstool.

All functions and classes that request data from web services use async-retriever, which offers response caching. All of them have two optional parameters for controlling the cache: expire_after and disable_caching. You can use expire_after to set the expiration time in seconds; if it is set to -1 (the default), the cache never expires. You can use disable_caching if you don’t want to use cached responses. The cached responses are stored in the ./cache/aiohttp_cache.sqlite file.
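
For instance, a sketch based on the parameters described above, applied to the WaterData class (if your installed version relies on the HYRIVER_* environment variables shown below instead of per-call parameters, use those):

from pynhd import WaterData

# cache responses for one hour instead of the default
wd = WaterData("nhdflowline_network", expire_after=3600)
# or bypass the cache entirely
wd_nocache = WaterData("nhdflowline_network", disable_caching=True)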

You can find some example notebooks here.

Moreover, under the hood, PyNHD uses PyGeoOGC and AsyncRetriever packages for making requests in parallel and storing responses in chunks. This improves the reliability and speed of data retrieval significantly.

You can control the request/response caching behavior and verbosity of the package by setting the following environment variables:

  • HYRIVER_CACHE_NAME: Path to the caching SQLite database for asynchronous HTTP requests. It defaults to ./cache/aiohttp_cache.sqlite

  • HYRIVER_CACHE_NAME_HTTP: Path to the caching SQLite database for HTTP requests. It defaults to ./cache/http_cache.sqlite

  • HYRIVER_CACHE_EXPIRE: Expiration time for cached requests in seconds. It defaults to one week.

  • HYRIVER_CACHE_DISABLE: Disable reading/writing from/to the cache. The default is false.

  • HYRIVER_SSL_CERT: Path to an SSL certificate file.

For example, in your code before making any requests you can do:

import os

os.environ["HYRIVER_CACHE_NAME"] = "path/to/aiohttp_cache.sqlite"
os.environ["HYRIVER_CACHE_NAME_HTTP"] = "path/to/http_cache.sqlite"
os.environ["HYRIVER_CACHE_EXPIRE"] = "3600"
os.environ["HYRIVER_CACHE_DISABLE"] = "true"
os.environ["HYRIVER_SSL_CERT"] = "path/to/cert.pem"

You can also try using PyNHD without installing it on your system by clicking on the binder badge. A Jupyter Lab instance with the HyRiver stack pre-installed will be launched in your web browser, and you can start coding!

Moreover, requests for additional functionalities can be submitted via issue tracker.


Installation#

You can install PyNHD using pip after installing libgdal on your system (for example, in Ubuntu run sudo apt install libgdal-dev):

$ pip install pynhd

Alternatively, PyNHD can be installed from the conda-forge repository using Conda or Mamba:

$ conda install -c conda-forge pynhd

Quick start#

Let’s explore the capabilities of NLDI. We need to instantiate the class first:

from pynhd import NLDI, WaterData, NHDPlusHR
import pynhd as nhd

First, let’s get the watershed geometry of the contributing basin of a USGS station using NLDI:

nldi = NLDI()
station_id = "01031500"

basin = nldi.get_basins(station_id)

The navigate_byid class method can be used to navigate NHDPlus both upstream and downstream of any point in the database. Let’s get the ComIDs and flowlines of the tributaries and the main river channel upstream of the station.

flw_main = nldi.navigate_byid(
    fsource="nwissite",
    fid=f"USGS-{station_id}",
    navigation="upstreamMain",
    source="flowlines",
    distance=1000,
)

flw_trib = nldi.navigate_byid(
    fsource="nwissite",
    fid=f"USGS-{station_id}",
    navigation="upstreamTributaries",
    source="flowlines",
    distance=1000,
)

We can get other USGS stations upstream (or downstream) of the station and even set a distance limit (in km):

st_all = nldi.navigate_byid(
    fsource="nwissite",
    fid=f"USGS-{station_id}",
    navigation="upstreamTributaries",
    source="nwissite",
    distance=1000,
)

st_d20 = nldi.navigate_byid(
    fsource="nwissite",
    fid=f"USGS-{station_id}",
    navigation="upstreamTributaries",
    source="nwissite",
    distance=20,
)

We can get more information about these stations using GeoConnex:

import geopandas as gpd
import pandas as pd
from pynhd import GeoConnex

gcx = GeoConnex("gauges")
stations = st_all.identifier.str.split("-").str[1].unique()
gauges = gpd.GeoDataFrame(
    pd.concat(gcx.query({"provider_id": sid}) for sid in stations),
    crs=4326,
)

Instead, we can carry out a spatial query within the basin of interest:

gauges = nhd.geoconnex(
    item="gauges",
    query={"geometry": basin.geometry.iloc[0]},
)

Now, let’s get the HUC12 pour points:

pp = nldi.navigate_byid(
    fsource="nwissite",
    fid=f"USGS-{station_id}",
    navigation="upstreamTributaries",
    source="huc12pp",
    distance=1000,
)
https://raw.githubusercontent.com/hyriver/HyRiver-examples/main/notebooks/_static/nhdplus_navigation.png

Also, we can get the slope data for each river segment from the NHDPlus VAA database:

import numpy as np

vaa = nhd.nhdplus_vaa("input_data/nhdplus_vaa.parquet")

flw_trib["comid"] = pd.to_numeric(flw_trib.nhdplus_comid)
slope = gpd.GeoDataFrame(
    pd.merge(flw_trib, vaa[["comid", "slope"]], left_on="comid", right_on="comid"),
    crs=flw_trib.crs,
)
slope[slope.slope < 0] = np.nan

Additionally, we can obtain cross-section lines along the main river channel at 4-km spacing with a width of 2 km, using network_xsection as follows:

from pynhd import NHD

distance = 4000  # in meters
width = 2000  # in meters
nhd_mr = NHD("flowline_mr")  # renamed so it does not shadow the pynhd alias
main_nhd = nhd_mr.byids("COMID", flw_main.index)
main_nhd = nhd.prepare_nhdplus(main_nhd, 0, 0, 0, purge_non_dendritic=True)
main_nhd = main_nhd.to_crs("ESRI:102003")
cs = nhd.network_xsection(main_nhd, distance, width)

Then, we can use Py3DEP to obtain the elevation profile along the cross-section lines.

Now, let’s explore the PyGeoAPI capabilities. There are two ways to access PyGeoAPI: the PyGeoAPI class and the pygeoapi function. The PyGeoAPI class queries the service for a single location using tuples and lists, while the pygeoapi function queries it for multiple locations at once and accepts a geopandas.GeoDataFrame as input. The pygeoapi function is more efficient than the PyGeoAPI class and has a simpler interface. In future versions, the PyGeoAPI class will be deprecated, and the pygeoapi function will become the only way to access the service. Let’s compare the two, starting with PyGeoAPI:

from pynhd import PyGeoAPI

pygeoapi = PyGeoAPI()

trace = pygeoapi.flow_trace((1774209.63, 856381.68), crs="ESRI:102003", direction="none")

split = pygeoapi.split_catchment((-73.82705, 43.29139), crs=4326, upstream=False)

profile = pygeoapi.elevation_profile(
    [(-103.801086, 40.26772), (-103.80097, 40.270568)],
    numpts=101,
    dem_res=1,
    crs=4326,
)

section = pygeoapi.cross_section((-103.80119, 40.2684), width=1000.0, numpts=101, crs=4326)

Now, let’s do the same operations using pygeoapi:

import geopandas as gpd
import shapely.geometry as sgeom
import pynhd as nhd

coords = gpd.GeoDataFrame(
    {
        "direction": ["up", "down"],
        "upstream": [True, False],
        "width": [1000.0, 500.0],
        "numpts": [101, 55],
    },
    geometry=[
        sgeom.Point(-73.82705, 43.29139),
        sgeom.Point(-103.801086, 40.26772),
    ],
    crs=4326,
)
trace = nhd.pygeoapi(coords, "flow_trace")
split = nhd.pygeoapi(coords, "split_catchment")
section = nhd.pygeoapi(coords, "cross_section")

coords = gpd.GeoDataFrame(
    {
        "direction": ["up", "down"],
        "upstream": [True, False],
        "width": [1000.0, 500.0],
        "numpts": [101, 55],
        "dem_res": [1, 10],
    },
    geometry=[
        sgeom.MultiPoint([(-103.801086, 40.26772), (-103.80097, 40.270568)]),
        sgeom.MultiPoint([(-102.801086, 39.26772), (-102.80097, 39.270568)]),
    ],
    crs=4326,
)
profile = nhd.pygeoapi(coords, "elevation_profile")
https://raw.githubusercontent.com/hyriver/HyRiver-examples/main/notebooks/_static/split_catchment.png

Next, we retrieve mid- and high-resolution flowlines within the bounding box of our watershed and compare them, using WaterData for the mid-resolution and NHDPlusHR for the high-resolution flowlines.

mr = WaterData("nhdflowline_network")
nhdp_mr = mr.bybox(basin.geometry[0].bounds)

hr = NHDPlusHR("flowline")
nhdp_hr = hr.bygeom(basin.geometry[0].bounds)
https://raw.githubusercontent.com/hyriver/HyRiver-examples/main/notebooks/_static/hr_mr.png

An alternative to WaterData and NHDPlusHR is the NHD class, which supports both the mid- and high-resolution NHD data:

mr = NHD("flowline_mr")
nhdp_mr = mr.bygeom(basin.geometry[0].bounds)

hr = NHD("flowline_hr")
nhdp_hr = hr.bygeom(basin.geometry[0].bounds)

Moreover, WaterData can find features within a given radius (in meters) of a point:

eck4 = "+proj=eck4 +lon_0=0 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs"
coords = (-5727797.427596455, 5584066.49330473)
rad = 5e3
flw_rad = mr.bydistance(coords, rad, loc_crs=eck4)
flw_rad = flw_rad.to_crs(eck4)

Instead of getting all features within a radius of the coordinate, we can snap to the closest feature ID using NLDI:

comid_closest = nldi.comid_byloc(coords, eck4)
flw_closest = nhdp_mr.byid("comid", comid_closest.comid.values[0])
https://raw.githubusercontent.com/hyriver/HyRiver-examples/main/notebooks/_static/nhdplus_radius.png

Since NHDPlus HR is still at the pre-release stage, let’s use the MR flowlines to demonstrate vector-based accumulation. Given a topologically sorted river network, pynhd.vector_accumulation computes flow accumulation in the network. It returns a dataframe, sorted from upstream to downstream, that shows the accumulated flow in each node.

PyNHD has a utility called prepare_nhdplus that identifies such relationships, among other things, such as fixing some common issues with NHDPlus flowlines. But first, we need to get all the NHDPlus attributes for each ComID, since NLDI only provides the flowlines’ geometries and ComIDs, which are useful for navigating the vector river network data. For getting the NHDPlus database we use WaterData; let’s use its nhdflowline_network layer to get the required info.

wd = WaterData("nhdflowline_network")

comids = flw_trib.nhdplus_comid.to_list()
nhdp_trib = wd.byid("comid", comids)
flw = nhd.prepare_nhdplus(nhdp_trib, 0, 0, purge_non_dendritic=False)

To demonstrate the use of routing, let’s use the nhdplus_attrs function to get a list of available NHDPlus attributes:

char = "CAT_RECHG"
area = "areasqkm"

local = nldi.getcharacteristic_byid(comids, "local", char_ids=char)
flw = flw.merge(local[char], left_on="comid", right_index=True)


def runoff_acc(qin, q, a):
    return qin + q * a


flw_r = flw[["comid", "tocomid", char, area]]
runoff = nhd.vector_accumulation(flw_r, runoff_acc, char, [char, area])


def area_acc(ain, a):
    return ain + a


flw_a = flw[["comid", "tocomid", area]]
areasqkm = nhd.vector_accumulation(flw_a, area_acc, area, [area])

runoff /= areasqkm

Since these are catchment-scale characteristics, let’s get the catchments then add the accumulated characteristic as a new column and plot the results.

wd = WaterData("catchmentsp")
catchments = wd.byid("featureid", comids)

c_local = catchments.merge(local, left_on="featureid", right_index=True)
c_acc = catchments.merge(runoff, left_on="featureid", right_index=True)
https://raw.githubusercontent.com/hyriver/HyRiver-examples/main/notebooks/_static/flow_accumulation.png

More examples can be found here.

PyGeoHydro: Retrieve Geospatial Hydrology Data#


Features#

PyGeoHydro (formerly named hydrodata) is part of the HyRiver software stack that is designed to aid in hydroclimate analysis through web services. This package provides access to some public web services that offer geospatial hydrology data. It has three main modules: pygeohydro, plot, and helpers.

PyGeoHydro supports the following datasets:

  • gNATSGO for US soil properties.

  • Derived Soil Properties for soil porosity, available water capacity, and field capacity across the US.

  • NWIS for daily mean streamflow observations (returned as a pandas.DataFrame or xarray.Dataset with station attributes).

  • SensorThings API for accessing real-time data of USGS sensors.

  • CAMELS for accessing streamflow observations (1980-2014) and basin-level attributes of 671 stations within CONUS.

  • Water Quality Portal for accessing current and historical water quality data from more than 1.5 million sites across the US.

  • NID for accessing the National Inventory of Dams web service.

  • HCDN 2009 for identifying sites where human activity affects the natural flow of the watercourse.

  • NLCD 2021 for land cover/land use, imperviousness descriptor, and canopy data. You can get data using both geometries and coordinates.

  • WBD for accessing Hydrologic Unit (HU) polygon boundaries within the US (all HUC levels).

  • SSEBop for daily actual evapotranspiration, for both single pixel and gridded data.

  • Irrigation Withdrawals for estimated monthly water use for irrigation by 12-digit hydrologic unit in the CONUS for 2015.

  • STN for accessing the USGS Short-Term Network (STN).

  • eHydro for accessing USACE Hydrographic Surveys, which include topobathymetry data.

  • NFHL for accessing FEMA’s National Flood Hazard Layer (NFHL) data.

Also, it includes several other functions:

  • interactive_map: Interactive map for exploring NWIS stations within a bounding box.

  • cover_statistics: Categorical statistics of land use/land cover data.

  • overland_roughness: Estimate overland roughness from land use/land cover data.

  • streamflow_fillna: Fill missing daily streamflow values with day-of-year averages; streamflow observations must be at least 10 years long (see the sketch after this list).
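
A minimal sketch of filling gaps in a daily streamflow record follows (the station ID and period are illustrative, and we assume streamflow_fillna accepts the dataframe returned by NWIS.get_streamflow):

import pygeohydro as gh
from pygeohydro import NWIS

nwis = NWIS()
qobs = nwis.get_streamflow("01031500", ("2000-01-01", "2015-12-31"))
# fill missing days with day-of-year averages from the record itself
qobs_filled = gh.streamflow_fillna(qobs)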

The plot module includes the following functions:

  • signatures: Hydrologic signature graphs.

  • cover_legends: Official NLCD land cover legends for plotting a land cover dataset.

  • descriptor_legends: Color map and legends for plotting an imperviousness descriptor dataset.

The helpers module includes:

  • nlcd_helper: A roughness coefficients lookup table for each land cover and imperviousness descriptor type which is useful for overland flow routing among other applications.

  • nwis_error: A dataframe for finding information about NWIS requests’ errors.
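
For instance, a quick sketch of pulling these lookup tables:

from pygeohydro import helpers

# NLCD legends, classes, and roughness coefficients
nlcd_meta = helpers.nlcd_helper()
# table of NWIS error codes and their explanations
nwis_errors = helpers.nwis_error()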

You can find some example notebooks here.

Moreover, under the hood, PyGeoHydro uses PyGeoOGC and AsyncRetriever packages for making requests in parallel and storing responses in chunks. This improves the reliability and speed of data retrieval significantly.

You can control the request/response caching behavior and verbosity of the package via the same HYRIVER_* environment variables described in the PyNHD section above.

You can also try using PyGeoHydro without installing it on your system by clicking on the binder badge. A Jupyter Lab instance with the HyRiver stack pre-installed will be launched in your web browser, and you can start coding!

Moreover, requests for additional functionalities can be submitted via issue tracker.


Installation#

You can install PyGeoHydro using pip after installing libgdal on your system (for example, on Ubuntu run sudo apt install libgdal-dev). Moreover, PyGeoHydro has an optional dependency for persistent caching, requests-cache. We highly recommend installing this package, as it can significantly speed up send/receive queries. You don’t have to change anything in your code, since PyGeoHydro looks for requests-cache under the hood and, if available, automatically uses persistent caching:

$ pip install pygeohydro

Alternatively, PyGeoHydro can be installed from the conda-forge repository using Conda:

$ conda install -c conda-forge pygeohydro

Quick start#

We can obtain river topobathymetry data using the EHydro class. We can subset the dataset using a geometry, a bounding box, feature IDs, or an SQL query:

from pygeohydro import EHydro

ehydro = EHydro("points")
topobathy = ehydro.bygeom((-122.53, 45.57, -122.52, 45.59))

We can explore the available NWIS stations within a bounding box using the interactive_map function. It returns an interactive map, and clicking on a station shows some of its most important properties.

import pygeohydro as gh

bbox = (-69.5, 45, -69, 45.5)
gh.interactive_map(bbox)
Interactive Map

We can select all the stations within this bounding box that have daily mean streamflow data from 2000-01-01 to 2010-12-31:

from pygeohydro import NWIS

nwis = NWIS()
query = {
    "bBox": ",".join(f"{b:.06f}" for b in bbox),
    "hasDataTypeCd": "dv",
    "outputDataTypeCd": "dv",
}
info_box = nwis.get_info(query)
dates = ("2000-01-01", "2010-12-31")
stations = info_box[
    (info_box.begin_date <= dates[0]) & (info_box.end_date >= dates[1])
].site_no.tolist()

Then, we can get the daily streamflow data in mm/day (by default the values are in cms) and plot them:

from pygeohydro import plot

qobs = nwis.get_streamflow(stations, dates, mmd=True)
plot.signatures(qobs)

By default, get_streamflow returns a pandas.DataFrame that has an attrs property containing metadata for all the stations, which you can access via qobs.attrs. Moreover, we can get the same data as an xarray.Dataset as follows:

qobs_ds = nwis.get_streamflow(stations, dates, to_xarray=True)

This xarray.Dataset has two dimensions: time and station_id. It has 10 variables, including discharge with both dimensions, while the other variables, which are station attributes, are one-dimensional.
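
For instance, assuming the structure described above, we can pull one station’s daily series out of the dataset (the attribute variable name station_nm is an assumption for illustration):

# discharge carries both dimensions (time, station_id)
q = qobs_ds.discharge.sel(station_id=stations[0])
# station attributes are one-dimensional, e.g., the station name (name assumed)
name = qobs_ds.station_nm.sel(station_id=stations[0])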

We can also get instantaneous streamflow data using get_streamflow. This method assumes that the input dates are in UTC time zone and returns the data in UTC time zone as well.

date = ("2005-01-01 12:00", "2005-01-12 15:00")
qobs = nwis.get_streamflow("01646500", date, freq="iv")

We can query USGS stations of type “stream” in Arizona using the SensorThings API as follows:

from pygeohydro import SensorThings

sensor = SensorThings()
odata = {
    "filter": "properties/monitoringLocationType eq 'Stream' and properties/stateFIPS eq 'US:04'",
}
df = sensor.query_byodata(odata)

Irrigation withdrawals data can be obtained as follows:

irr = gh.irrigation_withdrawals()

We can get the CAMELS dataset as a geopandas.GeoDataFrame that includes geometry and basin-level attributes of 671 natural watersheds within CONUS, and their streamflow observations between 1980-2014 as an xarray.Dataset, like so:

attrs, qobs = gh.get_camels()

The WaterQuality class has a number of convenience methods to retrieve data from the web service. Since there are many parameter combinations that can be used to retrieve data, a general method is also provided to retrieve data from any of the valid endpoints. You can use get_json to retrieve station info as a geopandas.GeoDataFrame or get_csv to retrieve station data as a pandas.DataFrame. You can construct a dictionary of the parameters and pass it to one of these functions. For more information on the parameters, please consult the Water Quality Data documentation. For example, let’s find all the stations within a bounding box that have Caffeine data:

from pygeohydro import WaterQuality

bbox = (-92.8, 44.2, -88.9, 46.0)
kwds = {"characteristicName": "Caffeine"}
wq = WaterQuality()
stations = wq.station_bybbox(bbox, kwds)

Or the same criterion but within a 30-mile radius of a point:

stations = wq.station_bydistance(-92.8, 44.2, 30, kwds)

Then, we can get the data for all these stations like this:

sids = stations.MonitoringLocationIdentifier.tolist()
caff = wq.data_bystation(sids, kwds)
Water Quality

Moreover, we can get land use/land cover data using the nlcd_bygeom or nlcd_bycoords functions, percentages of land cover types using cover_statistics, and overland roughness using overland_roughness. The nlcd_bycoords function returns a geopandas.GeoDataFrame with the NLCD layers as columns and the input coordinates as the geometry column. Moreover, the nlcd_bygeom function accepts both a single geometry and a geopandas.GeoDataFrame as input.

from pynhd import NLDI

basins = NLDI().get_basins(["01031450", "01318500", "01031510"])
lulc = gh.nlcd_bygeom(basins, 100, years={"cover": [2016, 2019]})
stats = gh.cover_statistics(lulc["01318500"].cover_2016)
roughness = gh.overland_roughness(lulc["01318500"].cover_2019)
Land Use/Land Cover

Next, let’s use ssebopeta_bygeom to get actual ET data for a basin. Note that there’s a ssebopeta_bycoords function that returns an ETA time series for a single coordinate.

geometry = NLDI().get_basins("01315500").geometry[0]
eta = gh.ssebopeta_bygeom(geometry, dates=("2005-10-01", "2005-10-05"))
Actual ET

Additionally, we can pull all the US dams data using NID. Let’s get dams that are within this bounding box and have a maximum storage larger than 200 acre-feet.

from pygeohydro import NID

nid = NID()
dams = nid.get_bygeom((-65.77, 43.07, -69.31, 45.45), 4326)
dams = nid.inventory_byid(dams.id.to_list())
dams = dams[dams.maxStorage > 200]

We can also get all dams within CONUS that have maximum storage larger than 2500 acre-feet:

conus_geom = gh.get_us_states("contiguous")

dam_list = nid.get_byfilter([{"maxStorage": ["[2500 +inf]"]}])
dams = nid.inventory_byid(dam_list[0].id.to_list(), stage_nid=True)

conus_dams = dams[dams.stateKey.isin(conus_geom.STUSPS)].reset_index(drop=True)
Dams

The WBD class allows us to get Hydrologic Unit (HU) polygon boundaries. Let’s get the two Hudson HUC4s:

from pygeohydro import WBD

wbd = WBD("huc4")
hudson = wbd.byids("huc4", ["0202", "0203"])

The NFHL class allows us to retrieve FEMA’s National Flood Hazard Layer (NFHL) data. Let’s get the cross-section data for a small region in Vermont:

from pygeohydro import NFHL

nfhl = NFHL("NFHL", "cross-sections")
gdf_xs = nfhl.bygeom((-73.42, 43.28, -72.9, 43.52), geo_crs=4269)

Py3DEP: Topographic data through 3DEP#


Features#

Py3DEP is part of the HyRiver software stack that is designed to aid in hydroclimate analysis through web services. This package provides access to the 3DEP database, which is a part of the National Map services. The 3DEP service has multi-resolution sources, and depending on the user-provided resolution, the data are resampled on the server side based on all the available data sources. Py3DEP returns the requests as an xarray.Dataset.

The following functionalities are currently available:

  • get_map: Get topographic data from the dynamic 3DEP service that supports the following layers:

    • DEM

    • Hillshade Gray

    • Aspect Degrees

    • Aspect Map

    • GreyHillshade Elevation Fill

    • Hillshade Multidirectional

    • Slope Degrees

    • Slope Map

    • Hillshade Elevation Tinted

    • Height Ellipsoidal

    • Contour 25

    • Contour Smoothed 25

  • static_3dep_dem: Get DEM data at 10 m, 30 m, or 60 m resolution from the staged 3DEP data. Since this function only returns DEM, for computing other terrain attributes you can use xarray-spatial. Just note that you should reproject the output DataArray to a projected CRS like 5070 before passing it to xarray-spatial like so: dem = dem.rio.reproject(5070).

  • get_dem: Get DEM data from either the dynamic or static 3DEP service. Considering that the static service is much faster, if the target DEM resolution is 10 m, 30 m, or 60 m, then the static service is used (static_3dep_dem). Otherwise, the dynamic service is used (get_map using DEM layer).

  • get_map_vrt: Get DEM data and store it as a GDAL VRT file from the dynamic 3DEP service. This function is mainly provided for large requests due to its low memory footprint. Moreover, due to lazy loading of the data this function can be much faster than get_map or get_dem, even for small requests at the cost of higher disk usage.

  • elevation_bygrid: For retrieving elevations of all the grid points in a 2D grid.

  • add_elevation: For adding elevation data as a new variable to an input xarray.DataArray or xarray.Dataset.

  • elevation_bycoords: For retrieving elevation of a list of x and y coordinates.

  • elevation_profile: For retrieving elevation profile along a line at a given spacing. This function converts the line to a B-spline and then calculates the elevation along the spline at a given uniform spacing.

  • deg2mpm: For converting slope dataset from degree to meter per meter.

  • query_3dep_sources: For querying bounds of 3DEP’s data sources within a bounding box.

  • check_3dep_availability: For querying 3DEP’s resolution availability within a bounding box.
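
For instance, a short sketch of checking data availability before requesting a DEM (the bounding box is illustrative):

import py3dep

bbox = (-69.5, 45.0, -69.0, 45.5)  # (west, south, east, north)
# which resolutions the dynamic service offers within this box
availability = py3dep.check_3dep_availability(bbox)
# bounds of the underlying data sources as a geo-dataframe
sources = py3dep.query_3dep_sources(bbox)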

You can find some example notebooks here.

Moreover, under the hood, Py3DEP uses PyGeoOGC and AsyncRetriever packages for making requests in parallel and storing responses in chunks. This improves the reliability and speed of data retrieval significantly.

You can control the request/response caching behavior and verbosity of the package via the same HYRIVER_* environment variables described in the PyNHD section above.

You can also try using Py3DEP without installing it on your system by clicking on the binder badge. A Jupyter Lab instance with the HyRiver stack pre-installed will be launched in your web browser, and you can start coding!

Moreover, requests for additional functionalities can be submitted via issue tracker.


Installation#

You can install Py3DEP using pip after installing libgdal on your system (for example, on Ubuntu run sudo apt install libgdal-dev). Moreover, Py3DEP has an optional dependency for persistent caching, requests-cache. We highly recommend installing this package, as it can significantly speed up send/receive queries. You don’t have to change anything in your code, since Py3DEP looks for requests-cache under the hood and, if available, automatically uses persistent caching:

$ pip install py3dep

Alternatively, Py3DEP can be installed from the conda-forge repository using Conda:

$ conda install -c conda-forge py3dep

Quick start#

You can use Py3DEP from the command line or as a Python library. The command-line interface provides access to two functionalities:

  • Getting topographic data: You must create a geopandas.GeoDataFrame that contains the geometries of the target locations. This dataframe must have at least three columns: id, res, and geometry. The id column is used as the filename for saving the obtained topographic data to a NetCDF (.nc) file. The res column must be the target resolution in meters. Then, you must save the dataframe to a file with an extension such as .shp or .gpkg (anything that geopandas.read_file can read).

  • Getting elevation: You must create a pandas.DataFrame that contains the coordinates of the target locations. This dataframe must have at least two columns: x and y. The elevations are obtained in meters using the airmap service. The data are saved as a CSV file with the same filename as the input file, with _elevation appended, e.g., coords_elevation.csv.

$ py3dep --help
Usage: py3dep [OPTIONS] COMMAND [ARGS]...

Command-line interface for Py3DEP.

Options:
-h, --help  Show this message and exit.

Commands:
coords    Retrieve topographic data for a list of coordinates.
geometry  Retrieve topographic data within geometries.

The coords sub-command is as follows:

$ py3dep coords -h
Usage: py3dep coords [OPTIONS] FPATH

Retrieve topographic data for a list of coordinates.

FPATH: Path to a csv file with two columns named ``lon`` and ``lat``.

Examples:
    $ cat coords.csv
    lon,lat
    -122.2493328,37.8122894
    $ py3dep coords coords.csv -q airmap -s topo_dir

Options:
-q, --query_source [airmap|tnm|tep]
                                Source of the elevation data.
-s, --save_dir PATH             Path to a directory to save the requested
                                files. Extension for the outputs is either
                                `.nc` for geometry or `.csv` for coords.

-h, --help                      Show this message and exit.

And, the geometry sub-command is as follows:

$ py3dep geometry -h
Usage: py3dep geometry [OPTIONS] FPATH

Retrieve topographic data within geometries.

FPATH: Path to a shapefile (.shp) or geopackage (.gpkg) file.
This file must have three columns and contain a ``crs`` attribute:
    - ``id``: Feature identifiers that py3dep uses as the output netcdf/csv filenames.
    - ``res``: Target resolution in meters.
    - ``geometry``: A Polygon or MultiPolygon.

Examples:
    $ py3dep geometry ny_geom.gpkg -l "Slope Map" -l DEM -s topo_dir

Options:
-l, --layers [DEM|Hillshade Gray|Aspect Degrees|Aspect Map|GreyHillshade_elevationFill|Hillshade Multidirectional|Slope Map|Slope Degrees|Hillshade Elevation Tinted|Height Ellipsoidal|Contour 25|Contour Smoothed 25]
                                Target topographic data layers
-s, --save_dir PATH             Path to a directory to save the requested
                                files. Extension for the outputs is either
                                `.nc` for geometry or `.csv` for coords.

-h, --help                      Show this message and exit.

Now, let’s see how we can use Py3DEP as a library.

Py3DEP accepts Shapely’s Polygon or a bounding box (a tuple of length four) as an input geometry. We can use PyNHD to get a watershed’s geometry, then use it to get the DEM and slope in meters/meters from Py3DEP using get_map function.

The get_map function has a resolution argument that sets the target resolution in meters. Note that the highest available resolution throughout the CONUS is about 10 m, though higher resolutions are available in limited parts of the US. Note that the input geometry can be in any valid spatial reference (the geo_crs argument). The crs argument, however, is limited to CRS:84, EPSG:4326, and EPSG:3857, since 3DEP only supports these spatial references.

import py3dep
from pynhd import NLDI

geom = NLDI().get_basins("01031500").geometry[0]
dem = py3dep.get_map("DEM", geom, resolution=30, geo_crs=4326, crs=3857)
slope = py3dep.get_map("Slope Degrees", geom, resolution=30)
slope = py3dep.deg2mpm(slope)

We can also use the get_dem function to get the same DEM:

import xrspatial

dem = py3dep.get_dem(geom, 30)
slope = xrspatial.slope(dem.rio.reproject(5070))
slope = py3dep.deg2mpm(slope)
https://raw.githubusercontent.com/hyriver/HyRiver-examples/main/notebooks/_static/dem_slope.png

We can use rioxarray package to save the obtained dataset as a raster file:

import rioxarray

dem.rio.to_raster("dem_01031500.tif")

Moreover, we can get the elevations of a set of x- and y-coordinates on a grid. For example, let’s get the minimum temperature data within this watershed from Daymet using PyDaymet, then add the elevation as a new variable to the dataset:

import pydaymet as daymet
import xarray as xr
import numpy as np

clm = daymet.get_bygeom(geom, ("2005-01-01", "2005-01-31"), variables="tmin")
elev = py3dep.elevation_bygrid(clm.x.values, clm.y.values, clm.crs, clm.res[0] * 1000)
attrs = clm.attrs
clm = xr.merge([clm, elev])
clm["elevation"] = clm.elevation.where(~np.isnan(clm.isel(time=0).tmin), drop=True)
clm.attrs.update(attrs)

Now, let’s get street network data using the osmnx package and add elevation data for its nodes using the elevation_bycoords function.

import networkx as nx
import osmnx as ox

G = ox.graph_from_place("Piedmont, California, USA", network_type="drive")
x, y = nx.get_node_attributes(G, "x").values(), nx.get_node_attributes(G, "y").values()
elevation = py3dep.elevation_bycoords(list(zip(x, y)), crs=4326)
nx.set_node_attributes(G, dict(zip(G.nodes(), elevation)), "elevation")
https://raw.githubusercontent.com/hyriver/HyRiver-examples/main/notebooks/_static/street_elev.png

We can get the elevation profile along a line at a given spacing using elevation_profile function. For example, let’s get the elevation profile at 10-m spacing along the main flowline of the upstream drainage area of a USGS station with ID 01031500:

import py3dep
from pynhd import NLDI

flw_main = NLDI().navigate_byid(
    fsource="nwissite",
    fid="USGS-01031500",
    navigation="upstreamMain",
    source="flowlines",
    distance=1000,
)
line = flw_main.geometry.unary_union
elevation = py3dep.elevation_profile(line, 10)

PyDaymet: Daily climate data through Daymet#


Warning

Since the release of Daymet v4 R1 in November 2022, the URL of Daymet’s server has changed. Therefore, only PyDaymet v0.13.7+ works; previous versions no longer work.

Features#

PyDaymet is part of the HyRiver software stack that is designed to aid in hydroclimate analysis through web services. This package provides access to climate data from the Daymet V4 R1 database using the NetCDF Subset Service (NCSS). Both single pixel (using the get_bycoords function) and gridded data (using get_bygeom) are supported, returned as pandas.DataFrame and xarray.Dataset, respectively. Climate data are available for North America and Hawaii from 1980 and for Puerto Rico from 1950, at three time scales: daily, monthly, and annual. Additionally, PyDaymet can compute Potential EvapoTranspiration (PET) using three methods: penman_monteith, priestley_taylor, and hargreaves_samani, for both single pixel and gridded data.

For PET computations, PyDaymet accepts four additional user-defined parameters:

  • penman_monteith: soil_heat_flux, albedo, alpha, and arid_correction.

  • priestley_taylor: soil_heat_flux, albedo, and arid_correction.

  • hargreaves_samani: None.

Default values for the parameters are: soil_heat_flux = 0, albedo = 0.23, alpha = 1.26, and arid_correction = False. An important parameter for the priestley_taylor and penman_monteith methods is arid_correction, which is used to correct the actual vapor pressure for arid regions. Since relative humidity is not provided by Daymet, the actual vapor pressure is computed assuming that the dew point temperature is equal to the minimum temperature. However, for arid regions, FAO 56 suggests subtracting 2-3 °C from the minimum temperature to account for the fact that in such regions the air might not be saturated when its temperature is at its minimum. For these areas, you can pass {"arid_correction": True, ...} to subtract 2 °C from the minimum temperature when computing the actual vapor pressure.
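
For instance, a hedged sketch of enabling this correction for a single (illustrative) arid-region location; we assume here that PET parameters can be passed to get_bycoords via a pet_params argument:

import pydaymet as daymet

coords = (-110.97, 32.22)  # illustrative location in an arid region
dates = ("2005-01-01", "2005-12-31")
clm = daymet.get_bycoords(
    coords,
    dates,
    pet="priestley_taylor",
    pet_params={"arid_correction": True},  # assumed keyword; subtracts 2 °C from tmin
)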

You can find some example notebooks here.

Moreover, under the hood, PyDaymet uses PyGeoOGC and AsyncRetriever packages for making requests in parallel and storing responses in chunks. This improves the reliability and speed of data retrieval significantly.

You can control the request/response caching behavior and verbosity of the package via the same HYRIVER_* environment variables described in the PyNHD section above.

You can also try using PyDaymet without installing it on your system by clicking on the binder badge. A Jupyter Lab instance with the HyRiver stack pre-installed will be launched in your web browser, and you can start coding!

Moreover, requests for additional functionalities can be submitted via issue tracker.


Installation#

You can install PyDaymet using pip after installing libgdal on your system (for example, in Ubuntu run sudo apt install libgdal-dev):

$ pip install pydaymet

Alternatively, PyDaymet can be installed from the conda-forge repository using Conda:

$ conda install -c conda-forge pydaymet

Quick start#

You can use PyDaymet from the command line or as a Python library. The command-line interface provides access to two functionalities:

  • Getting gridded climate data: You must create a geopandas.GeoDataFrame that contains the geometries of the target locations. This dataframe must have four columns: id, start, end, and geometry. The id column is used as the filename for saving the obtained climate data to a NetCDF (.nc) file. The start and end columns are the starting and ending dates of the target period. Then, you must save the dataframe as a shapefile (.shp) or geopackage (.gpkg) with a CRS attribute.

  • Getting single pixel climate data: You must create a CSV file that contains the coordinates of the target locations. This file must have the columns id, start, end, lon, and lat. The id column is used as the filename for saving the obtained climate data to a CSV (.csv) file. The start and end columns are the same as in the geometry command. The lon and lat columns are the longitude and latitude coordinates of the target locations.

$ pydaymet -h
Usage: pydaymet [OPTIONS] COMMAND [ARGS]...

Command-line interface for PyDaymet.

Options:
-h, --help  Show this message and exit.

Commands:
coords    Retrieve climate data for a list of coordinates.
geometry  Retrieve climate data for a dataframe of geometries.

The coords sub-command is as follows:

$ pydaymet coords -h
Usage: pydaymet coords [OPTIONS] FPATH

Retrieve climate data for a list of coordinates.

FPATH: Path to a csv file with four columns:
    - ``id``: Feature identifiers that daymet uses as the output netcdf filenames.
    - ``start``: Start time.
    - ``end``: End time.
    - ``lon``: Longitude of the points of interest.
    - ``lat``: Latitude of the points of interest.
    - ``time_scale``: (optional) Time scale, either ``daily`` (default), ``monthly`` or ``annual``.
    - ``pet``: (optional) Method to compute PET. Supported methods are:
                ``penman_monteith``, ``hargreaves_samani``, ``priestley_taylor``, and ``none`` (default).
    - ``snow``: (optional) Separate snowfall from precipitation, default is ``False``.

Examples:
    $ cat coords.csv
    id,lon,lat,start,end,pet
    california,-122.2493328,37.8122894,2012-01-01,2014-12-31,hargreaves_samani
    $ pydaymet coords coords.csv -v prcp -v tmin

Options:
-v, --variables TEXT  Target variables. You can pass this flag multiple
                        times for multiple variables.
-s, --save_dir PATH   Path to a directory to save the requested files.
                        Extension for the outputs is .nc for geometry and .csv
                        for coords.
--disable_ssl         Pass to disable SSL certification verification.
-h, --help            Show this message and exit.

And, the geometry sub-command is as follows:

$ pydaymet geometry -h
Usage: pydaymet geometry [OPTIONS] FPATH

Retrieve climate data for a dataframe of geometries.

FPATH: Path to a shapefile (.shp) or geopackage (.gpkg) file.
This file must have four columns and contain a ``crs`` attribute:
    - ``id``: Feature identifiers that daymet uses as the output netcdf filenames.
    - ``start``: Start time.
    - ``end``: End time.
    - ``geometry``: Target geometries.
    - ``time_scale``: (optional) Time scale, either ``daily`` (default), ``monthly`` or ``annual``.
    - ``pet``: (optional) Method to compute PET. Supported methods are:
                ``penman_monteith``, ``hargreaves_samani``, ``priestley_taylor``, and ``none`` (default).
    - ``snow``: (optional) Separate snowfall from precipitation, default is ``False``.

Examples:
    $ pydaymet geometry geo.gpkg -v prcp -v tmin

Options:
-v, --variables TEXT  Target variables. You can pass this flag multiple
                        times for multiple variables.
-s, --save_dir PATH   Path to a directory to save the requested files.
                        Extension for the outputs is .nc for geometry and .csv
                        for coords.
--disable_ssl         Pass to disable SSL certification verification.
-h, --help            Show this message and exit.

Now, let’s see how we can use PyDaymet as a library.

PyDaymet offers two functions for getting climate data: get_bycoords and get_bygeom. Their arguments are identical except for the first one: get_bygeom takes a polygon, while get_bycoords takes a coordinate (a tuple of length two, as in (x, y)). The input geometry or coordinate can be in any valid CRS (defaults to EPSG:4326). The dates argument can be either a tuple of length two, like (start_str, end_str), or a list of years, like [2000, 2005]. Note that both functions have a pet flag for computing PET and a snow flag for separating snow from precipitation using the Martinez and Gupta (2010) method. Additionally, we can pass time_scale to get daily, monthly, or annual summaries. This flag defaults to daily.

from pynhd import NLDI
import pydaymet as daymet

geometry = NLDI().get_basins("01031500").geometry[0]

var = ["prcp", "tmin"]
dates = ("2000-01-01", "2000-06-30")

daily = daymet.get_bygeom(geometry, dates, variables=var, pet="priestley_taylor", snow=True)
monthly = daymet.get_bygeom(geometry, dates, variables=var, time_scale="monthly")
https://raw.githubusercontent.com/hyriver/HyRiver-examples/main/notebooks/_static/daymet_grid.png

If the input geometry (or coordinate) is in a CRS other than EPSG:4326, we should pass its CRS to the functions.

coords = (-1431147.7928, 318483.4618)
crs = 3542
dates = ("2000-01-01", "2006-12-31")
annual = daymet.get_bycoords(coords, dates, variables=var, loc_crs=crs, time_scale="annual")
https://raw.githubusercontent.com/hyriver/HyRiver-examples/main/notebooks/_static/daymet_loc.png

Additionally, the get_bycoords function accepts a list of coordinates and by setting the to_xarray flag to True it can return the results as a xarray.Dataset instead of a pandas.DataFrame:

coords = [(-94.986, 29.973), (-95.478, 30.134)]
idx = ["P1", "P2"]
clm_ds = daymet.get_bycoords(coords, range(2000, 2021), coords_id=idx, to_xarray=True)

Also, we can use the potential_et function to compute PET by passing the daily climate data, either as a pandas.DataFrame or an xarray.Dataset. Note that the penman_monteith and priestley_taylor methods have parameters that can be passed via the params argument if values other than the defaults are needed. For example, the default value of alpha for the priestley_taylor method is 1.26 (humid regions); we can set it to 1.74 (arid regions) as follows:

pet_pt = daymet.potential_et(daily, methods="priestley_taylor", params={"alpha": 1.74})

Next, let’s get annual total precipitation for Hawaii and Puerto Rico for 2010.

hi_ext = (-160.3055, 17.9539, -154.7715, 23.5186)
pr_ext = (-67.9927, 16.8443, -64.1195, 19.9381)
hi = daymet.get_bygeom(hi_ext, 2010, variables="prcp", region="hi", time_scale="annual")
pr = daymet.get_bygeom(pr_ext, 2010, variables="prcp", region="pr", time_scale="annual")

Some example plots are shown below:

https://raw.githubusercontent.com/hyriver/HyRiver-examples/main/notebooks/_static/hi.png https://raw.githubusercontent.com/hyriver/HyRiver-examples/main/notebooks/_static/pr.png

PyGridMET: Daily climate data through GridMET#


Features#

PyGridMET is part of the HyRiver software stack that is designed to aid in hydroclimate analysis through web services. This package provides access to daily climate data over the conterminous US (CONUS) from the GridMET database using the NetCDF Subset Service (NCSS). Both single pixel (using the get_bycoords function) and gridded data (using get_bygeom) are supported, returned as pandas.DataFrame and xarray.Dataset, respectively.
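
A minimal sketch of both access modes follows, assuming GridMET’s short variable names (e.g., pr for precipitation and tmmn for minimum temperature) and an interface mirroring PyDaymet’s:

import pygridmet as gridmet

coords = (-105.0, 40.0)  # an illustrative CONUS location
dates = ("2010-01-01", "2010-12-31")
# single pixel: returns a pandas.DataFrame
clm = gridmet.get_bycoords(coords, dates, variables=["pr", "tmmn"])

# gridded: returns an xarray.Dataset over a bounding box
bbox = (-105.5, 39.5, -104.5, 40.5)
clm_ds = gridmet.get_bygeom(bbox, dates, variables="pr")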

You can find some example notebooks here.

Moreover, under the hood, PyGridMET uses PyGeoOGC and AsyncRetriever packages for making requests in parallel and storing responses in chunks. This improves the reliability and speed of data retrieval significantly.

You can control the request/response caching behavior and verbosity of the package via the same HYRIVER_* environment variables described in the PyNHD section above.

You can also try using PyGridMET without installing it on your system by clicking on the binder badge. A Jupyter Lab instance with the HyRiver stack pre-installed will be launched in your web browser, and you can start coding!

Moreover, requests for additional functionalities can be submitted via issue tracker.

Citation#

If you use any of HyRiver packages in your research, we appreciate citations:

@article{Chegini_2021,
    author = {Chegini, Taher and Li, Hong-Yi and Leung, L. Ruby},
    doi = {10.21105/joss.03175},
    journal = {Journal of Open Source Software},
    month = {10},
    number = {66},
    pages = {1--3},
    title = {{HyRiver: Hydroclimate Data Retriever}},
    volume = {6},
    year = {2021}
}

Installation#

You can install PyGridMET using pip as follows:

$ pip install pygridmet

Alternatively, PyGridMET can be installed from the conda-forge repository using Conda:

$ conda install -c conda-forge pygridmet

Quick start#

You can use PyGridMET from the command line or as a Python library. The command-line interface provides access to two functionalities:

  • Getting gridded climate data: You must create a geopandas.GeoDataFrame that contains the geometries of the target locations. This dataframe must have four columns: id, start, end, and geometry. The id column is used as the filename for saving the obtained climate data to a NetCDF (.nc) file. The start and end columns are the starting and ending dates of the target period. Then, you must save the dataframe as a shapefile (.shp) or geopackage (.gpkg) with a CRS attribute.

  • Getting single-pixel climate data: You must create a CSV file that contains the coordinates of the target locations. This file must have five columns: id, start, end, lon, and lat. The id column is used as the filename for saving the obtained climate data to a CSV (.csv) file. The start and end columns are the same as in the geometry command. The lon and lat columns are the longitude and latitude coordinates of the target locations. A short sketch of preparing both input files follows.
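For illustration, here is a minimal sketch of preparing both input files with pandas and geopandas; the IDs, dates, location, and bounding box are made up, and only the column names follow the CLI requirements above.

import geopandas as gpd
import pandas as pd
from shapely.geometry import box

# Single-pixel input: a CSV file with id, start, end, lon, and lat columns.
pd.DataFrame(
    {
        "id": ["california"],
        "start": ["2012-01-01"],
        "end": ["2014-12-31"],
        "lon": [-122.2493328],
        "lat": [37.8122894],
    }
).to_csv("coords.csv", index=False)

# Gridded input: a GeoDataFrame with id, start, end, and geometry columns,
# saved as a geopackage so the CRS is stored alongside the geometries.
gpd.GeoDataFrame(
    {"id": ["bay_area"], "start": ["2012-01-01"], "end": ["2014-12-31"]},
    geometry=[box(-122.5, 37.5, -122.0, 38.0)],
    crs=4326,
).to_file("geo.gpkg")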

$ pygridmet -h
Usage: pygridmet [OPTIONS] COMMAND [ARGS]...

Command-line interface for PyGridMET.

Options:
-h, --help  Show this message and exit.

Commands:
coords    Retrieve climate data for a list of coordinates.
geometry  Retrieve climate data for a dataframe of geometries.

The coords sub-command is as follows:

$ pygridmet coords -h
Usage: pygridmet coords [OPTIONS] FPATH

Retrieve climate data for a list of coordinates.

FPATH: Path to a csv file with the following columns:
    - ``id``: Feature identifiers that gridmet uses as the output netcdf filenames.
    - ``start``: Start time.
    - ``end``: End time.
    - ``lon``: Longitude of the points of interest.
    - ``lat``: Latitude of the points of interest.
    - ``snow``: (optional) Separate snowfall from precipitation, default is ``False``.

Examples:
    $ cat coords.csv
    id,lon,lat,start,end
    california,-122.2493328,37.8122894,2012-01-01,2014-12-31
    $ pygridmet coords coords.csv -v pr -v tmmn

Options:
-v, --variables TEXT  Target variables. You can pass this flag multiple
                        times for multiple variables.
-s, --save_dir PATH   Path to a directory to save the requested files.
                        Extension for the outputs is .nc for geometry and .csv
                        for coords.
--disable_ssl         Pass to disable SSL certification verification.
-h, --help            Show this message and exit.

And, the geometry sub-command is as follows:

$ pygridmet geometry -h
Usage: pygridmet geometry [OPTIONS] FPATH

Retrieve climate data for a dataframe of geometries.

FPATH: Path to a shapefile (.shp) or geopackage (.gpkg) file.
This file must have four columns and contain a ``crs`` attribute:
    - ``id``: Feature identifiers that gridmet uses as the output netcdf filenames.
    - ``start``: Start time.
    - ``end``: End time.
    - ``geometry``: Target geometries.
    - ``snow``: (optional) Separate snowfall from precipitation, default is ``False``.

Examples:
    $ pygridmet geometry geo.gpkg -v pr -v tmmn

Options:
-v, --variables TEXT  Target variables. You can pass this flag multiple
                        times for multiple variables.
-s, --save_dir PATH   Path to a directory to save the requested files.
                        Extension for the outputs is .nc for geometry and .csv
                        for coords.
--disable_ssl         Pass to disable SSL certification verification.
-h, --help            Show this message and exit.

Now, let’s see how we can use PyGridMET as a library.

PyGridMET offers two functions for getting climate data: get_bycoords and get_bygeom. Their arguments are identical except for the first one: get_bygeom accepts a polygon, while get_bycoords accepts a coordinate (a tuple of length two, as in (x, y)). The input geometry or coordinate can be in any valid CRS (defaults to EPSG:4326). The dates argument can be either a tuple of length two, like (start_str, end_str), or a list of years, like [2000, 2005]. Note that both functions have a snow flag for separating snow from precipitation using the Martinez and Gupta (2010) method.

We can get a dataframe of available variables and their info by calling GridMET().gridmet_table:

| Variable | Abbr | Unit |
| --- | --- | --- |
| Precipitation | pr | mm |
| Maximum Relative Humidity | rmax | % |
| Minimum Relative Humidity | rmin | % |
| Specific Humidity | sph | kg/kg |
| Surface Radiation | srad | W/m2 |
| Wind Direction | th | Degrees clockwise from north |
| Minimum Air Temperature | tmmn | K |
| Maximum Air Temperature | tmmx | K |
| Wind Speed | vs | m/s |
| Burning Index | bi | Dimensionless |
| Fuel Moisture (100-hr) | fm100 | % |
| Fuel Moisture (1000-hr) | fm1000 | % |
| Energy Release Component | erc | Dimensionless |
| Reference Evapotranspiration (Alfalfa) | etr | mm |
| Reference Evapotranspiration (Grass) | pet | mm |
| Vapor Pressure Deficit | vpd | kPa |

from pynhd import NLDI
import pygridmet as gridmet

geometry = NLDI().get_basins("01031500").geometry[0]

var = ["pr", "tmmn"]
dates = ("2000-01-01", "2000-06-30")

daily = gridmet.get_bygeom(geometry, dates, variables=var, snow=True)
https://raw.githubusercontent.com/hyriver/HyRiver-examples/main/notebooks/_static/gridmet_grid.png

If the input geometry (or coordinate) is in a CRS other than EPSG:4326, we should pass its CRS to the functions as well.

coords = (-1431147.7928, 318483.4618)
crs = 3542
dates = ("2000-01-01", "2006-12-31")
data = gridmet.get_bycoords(coords, dates, variables=var, loc_crs=crs)
https://raw.githubusercontent.com/hyriver/HyRiver-examples/main/notebooks/_static/gridmet_loc.png

Additionally, the get_bycoords function accepts a list of coordinates, and by setting the to_xarray flag to True it can return the results as an xarray.Dataset instead of a pandas.DataFrame:

coords = [(-94.986, 29.973), (-95.478, 30.134)]
idx = ["P1", "P2"]
clm_ds = gridmet.get_bycoords(coords, range(2000, 2021), coords_id=idx, to_xarray=True)

PyNLDAS2: Hourly NLDAS-2 Forcing Data#

PyPi Conda Version CodeCov Python Versions Downloads

CodeFactor black pre-commit Binder

Features#

PyNLDAS2 is a part of HyRiver software stack that is designed to aid in hydroclimate analysis through web services. This package provides access to the NLDAS-2 forcing dataset via Hydrology Data Rods. Currently, only hourly data are supported. There are three main functions:

  • get_bycoords: Forcing data for a list of coordinates as a pandas.DataFrame or xarray.Dataset,

  • get_bygeom: Forcing data within a geometry as a xarray.Dataset,

  • get_grid_mask: NLDAS2 land/water grid mask as a xarray.Dataset.

PyNLDAS2 only provides access to the hourly NLDAS-2 dataset, so if you need to access other NASA climate datasets you can check out tsgettoolbox, developed by Tim Cera.

Moreover, under the hood, PyNLDAS2 uses PyGeoOGC and AsyncRetriever packages for making requests in parallel and storing responses in chunks. This improves the reliability and speed of data retrieval significantly.

You can control the request/response caching behavior and verbosity of the package by setting the following environment variables:

  • HYRIVER_CACHE_NAME: Path to the caching SQLite database for asynchronous HTTP requests. It defaults to ./cache/aiohttp_cache.sqlite

  • HYRIVER_CACHE_NAME_HTTP: Path to the caching SQLite database for HTTP requests. It defaults to ./cache/http_cache.sqlite

  • HYRIVER_CACHE_EXPIRE: Expiration time for cached requests in seconds. It defaults to one week.

  • HYRIVER_CACHE_DISABLE: Disable reading/writing from/to the cache. The default is false.

  • HYRIVER_SSL_CERT: Path to a SSL certificate file.

For example, in your code before making any requests you can do:

import os

os.environ["HYRIVER_CACHE_NAME"] = "path/to/aiohttp_cache.sqlite"
os.environ["HYRIVER_CACHE_NAME_HTTP"] = "path/to/http_cache.sqlite"
os.environ["HYRIVER_CACHE_EXPIRE"] = "3600"
os.environ["HYRIVER_CACHE_DISABLE"] = "true"
os.environ["HYRIVER_SSL_CERT"] = "path/to/cert.pem"

You can find some example notebooks here.

You can also try using PyNLDAS2 without installing it on your system by clicking on the binder badge. A Jupyter Lab instance with the HyRiver stack pre-installed will be launched in your web browser, and you can start coding!

Moreover, requests for additional functionalities can be submitted via issue tracker.

Citation#

If you use any of HyRiver packages in your research, we appreciate citations:

@article{Chegini_2021,
    author = {Chegini, Taher and Li, Hong-Yi and Leung, L. Ruby},
    doi = {10.21105/joss.03175},
    journal = {Journal of Open Source Software},
    month = {10},
    number = {66},
    pages = {1--3},
    title = {{HyRiver: Hydroclimate Data Retriever}},
    volume = {6},
    year = {2021}
}

Installation#

You can install pynldas2 using pip:

$ pip install pynldas2

Alternatively, pynldas2 can be installed from the conda-forge repository using Conda:

$ conda install -c conda-forge pynldas2

Quick start#

The NLDAS-2 database provides forcing data at 1/8th-degree grid spacing, covering 01 Jan 1979 to present. Let’s take a look at the NLDAS-2 grid mask, which includes land, water, soil, and vegetation masks:

import pynldas2 as nldas

grid = nldas.get_grid_mask()
https://raw.githubusercontent.com/hyriver/HyRiver-examples/main/notebooks/_static/nldas_grid.png

Next, we use PyGeoHydro to get the geometry of the HUC8 with ID 13060003, then we get the forcing data within the obtained geometry.

from pygeohydro import WBD

huc8 = WBD("huc8")
geometry = huc8.byids("huc8", "13060003").geometry[0]
clm = nldas.get_bygeom(geometry, "2010-01-01", "2010-01-31", 4326)
https://raw.githubusercontent.com/hyriver/HyRiver-examples/main/notebooks/_static/nldas_humidity.png
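The get_bycoords function works similarly for point data. Here is a minimal sketch, assuming prcp is a valid variable name (the coordinates are illustrative):

import pynldas2 as nldas

# Hourly forcing for two points; coordinates and variable name are illustrative.
coords = [(-95.36, 29.76), (-95.48, 30.13)]
clm = nldas.get_bycoords(coords, "2010-01-01", "2010-01-31", variables="prcp")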

Road Map#

  • [ ] Add PET calculation functions similar to PyDaymet but at hourly timescale.

  • [ ] Add a command-line interface.

HydroSignatures: Tools for computing hydrological signatures#

PyPi Conda Version CodeCov Python Versions Downloads

CodeFactor black pre-commit Binder

Features#

HydroSignatures is a suite of tools for computing hydrological signatures and a part of HyRiver software stack. This package includes the following functions:

  • exceedance: Exceedance probability that can be used for plotting flow duration curves;

  • flow_duration_curve_slope: Slope of flow duration curve;

  • flashiness_index: Flashiness index;

  • mean_monthly: Mean monthly summary of a time series that can be used for plotting regime curves;

  • rolling_mean_monthly: Rolling mean monthly summary of a time series that can be used for plotting smoothed regime curves;

  • baseflow: Extracting baseflow from a streamflow time series using the Lyne and Hollick digital filter (Ladson et al., 2013);

  • baseflow_index: Baseflow index;

  • aridity_index: Aridity index;

  • seasonality_index_walsh: Seasonality index (Walsh and Lawler, 1981);

  • seasonality_index_markham: Seasonality index (Markham, 1970);

  • extract_extrema: Determining the location of local maxima and minima in a time series;

Moreover, the package has a class called HydroSignatures that can be used to compute all these signatures by passing a streamflow and a precipitation time series, both in millimeters per day (or any other consistent depth-per-time unit). This class supports the subtraction and inequality operators, which can be used to compare two HydroSignatures objects. You can serialize the class to a JSON object using the to_json method or convert it to a dictionary using the to_dict method.
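As a minimal sketch of this API with synthetic data (the series below are random placeholders, not real observations):

import numpy as np
import pandas as pd
from hydrosignatures import HydroSignatures

# Toy daily streamflow and precipitation series in mm/day.
idx = pd.date_range("2000-01-01", "2004-12-31", freq="D")
rng = np.random.default_rng(42)
q = pd.Series(rng.exponential(1.0, len(idx)), index=idx)
p = pd.Series(rng.exponential(3.0, len(idx)), index=idx)

sig_a = HydroSignatures(q, p)
sig_b = HydroSignatures(q * 0.9, p)

diff = sig_a - sig_b        # signature-wise differences
changed = sig_a != sig_b    # inequality comparison
sig_dict = sig_a.to_dict()  # or sig_a.to_json() for a JSON string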

Additionally, numba is an optional dependency for the baseflow function; installing it speeds up the baseflow computation significantly. For more efficient handling of NaN values, you can also install numbagg.

You can also try using HydroSignatures without installing it on your system by clicking on the binder badge. A Jupyter Lab instance with the HyRiver stack pre-installed will be launched in your web browser, and you can start coding!

Moreover, requests for additional functionalities can be submitted via issue tracker.

Citation#

If you use any of HyRiver packages in your research, we appreciate citations:

@article{Chegini_2021,
    author = {Chegini, Taher and Li, Hong-Yi and Leung, L. Ruby},
    doi = {10.21105/joss.03175},
    journal = {Journal of Open Source Software},
    month = {10},
    number = {66},
    pages = {1--3},
    title = {{HyRiver: Hydroclimate Data Retriever}},
    volume = {6},
    year = {2021}
}

Installation#

You can install HydroSignatures using pip:

$ pip install hydrosignatures

or from the conda-forge repository using Conda or Mamba:

$ conda install -c conda-forge hydrosignatures

Quick start#

Let’s explore the capabilities of HydroSignatures by getting streamflow using PyGeoHydro, basin geometry using PyNHD and precipitation using PyDaymet. In this example, we select West Branch Herring Run At Idlewylde, MD, as the watershed of interest and compute the hydrological signatures for the period from 2010 to 2020.

import pandas as pd
import pydaymet as daymet
import hydrosignatures as hs
import pygeohydro as gh
from hydrosignatures import HydroSignatures
from pygeohydro import NWIS
from pynhd import WaterData

site = "01585200"
start = "2010-01-01"
end = "2020-12-31"

First, we get the basin geometry of the watershed using the gagesii_basins layer of the USGS’s WaterData web service.

wd = WaterData("gagesii_basins")
geometry = wd.byid("gage_id", site).geometry[0]

Then, we obtain the station’s info and streamflow data using NWIS. Note that we should convert the streamflow from cms to mm/day.

nwis = NWIS()
info = nwis.get_info({"site": site})
area_sqm = info.drain_sqkm.values[0] * 1e6
q_cms = nwis.get_streamflow(site, (start, end))
q_mmpd = q_cms * (24.0 * 60.0 * 60.0) / area_sqm * 1e3
q_mmpd.index = pd.to_datetime(q_mmpd.index.date)

Next, we retrieve precipitation data over the whole basin using PyDaymet and the basin geometry, and take the areal mean as the basin’s precipitation.

prcp = daymet.get_bygeom(geometry, (start, end), variables="prcp")
p_mmpd = prcp.prcp.mean(dim=["x", "y"]).to_pandas()
p_mmpd.index = pd.to_datetime(p_mmpd.index.date)
q_mmpd = q_mmpd.loc[p_mmpd.index]

Now, we can pass these two to the HydroSignatures class:

sig = HydroSignatures(q_mmpd, p_mmpd)

The values property of this class contains the computed signatures. For example, let’s plot the regime curves:

sig.values.mean_monthly.plot()
https://raw.githubusercontent.com/hyriver/HyRiver-examples/main/notebooks/_static/signatures_rc.png

Note that you can also use the functions directly. For example, let’s get streamflow observations for another station, separate the baseflow using various filter parameters, and compare them:

import numpy as np
import pandas as pd

q = nwis.get_streamflow("12304500", ("2019-01-01", "2019-12-31"))
alpha = np.arange(0.9, 1, 0.01)
qb = pd.DataFrame({a: hs.baseflow(q.squeeze(), alpha=a) for a in alpha})
https://raw.githubusercontent.com/hyriver/HyRiver-examples/main/notebooks/_static/signatures_bf.png

Lastly, let’s compute Markham’s seasonality index for all streamflow time series of the stations in the CAMELS dataset. We retrieve the CAMELS dataset using PyGeoHydro:

import xarray as xr

_, camels_qobs = gh.get_camels()
discharge = camels_qobs.discharge.dropna("station_id")
discharge = xr.where(discharge < 0, 0, discharge)
si = hs.seasonality_index_markham(discharge.to_pandas())

More examples can be found here.

AsyncRetriever: Asynchronous requests with persistent caching#

PyPi Conda Version CodeCov Python Versions Downloads

Security Status CodeFactor black pre-commit

Features#

AsyncRetriever is a part of HyRiver software stack that is designed to aid in hydroclimate analysis through web services. This package serves as HyRiver’s engine for asynchronously sending requests and retrieving responses as text, binary, or JSON objects. It uses persistent caching, via aiohttp-client-cache, to speed up the retrieval even further. Moreover, thanks to nest_asyncio, you can use this package in Jupyter notebooks. Although this package is part of the HyRiver software stack, it can be used for any web calls. There are four functions that you can use to make web calls:

  • retrieve_text: Get responses as text objects.

  • retrieve_binary: Get responses as binary objects.

  • retrieve_json: Get responses as json objects.

  • stream_write: Stream responses and write them to disk in chunks.

You can also use the general-purpose retrieve function to get responses as any of the three types. All responses are returned as a list that has the same order as the input list of requests. Moreover, there is another function called delete_url_cache for removing all requests from a cache file that contains a given URL.
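For instance, here is a minimal sketch of the general-purpose retrieve function; the target URL is illustrative:

import async_retriever as ar

# Responses come back as a list in the same order as the input URLs.
urls = ["https://api.tidesandcurrents.noaa.gov/mdapi/prod/webapi/stations/8534720.json"]
resp = ar.retrieve(urls, "text")  # the read type can be "text", "binary", or "json"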

You can control the request/response caching behavior and verbosity of the package by setting the following environment variables:

  • HYRIVER_CACHE_NAME: Path to the caching SQLite database. It defaults to ./cache/aiohttp_cache.sqlite

  • HYRIVER_CACHE_EXPIRE: Expiration time for cached requests in seconds. It defaults to one week.

  • HYRIVER_CACHE_DISABLE: Disable reading/writing from/to the cache. The default is false.

  • HYRIVER_SSL_CERT: Path to a SSL certificate file.

For example, in your code before making any requests you can do:

import os

os.environ["HYRIVER_CACHE_NAME"] = "path/to/file.sqlite"
os.environ["HYRIVER_CACHE_EXPIRE"] = "3600"
os.environ["HYRIVER_CACHE_DISABLE"] = "true"
os.environ["HYRIVER_SSL_CERT"] = "path/to/cert.pem"

You can find some example notebooks here.

You can also try using AsyncRetriever without installing it on your system by clicking on the binder badge. A Jupyter Lab instance with the HyRiver stack pre-installed will be launched in your web browser, and you can start coding!

Moreover, requests for additional functionalities can be submitted via issue tracker.

Citation#

If you use any of HyRiver packages in your research, we appreciate citations:

@article{Chegini_2021,
    author = {Chegini, Taher and Li, Hong-Yi and Leung, L. Ruby},
    doi = {10.21105/joss.03175},
    journal = {Journal of Open Source Software},
    month = {10},
    number = {66},
    pages = {1--3},
    title = {{HyRiver: Hydroclimate Data Retriever}},
    volume = {6},
    year = {2021}
}

Installation#

You can install async-retriever using pip:

$ pip install async-retriever

Alternatively, async-retriever can be installed from the conda-forge repository using Conda:

$ conda install -c conda-forge async-retriever

Quick start#

AsyncRetriever by default creates and/or uses ./cache/aiohttp_cache.sqlite as the cache, which you can customize via the cache_name argument. Also, by default, the cache doesn’t have an expiration date, and the delete_url_cache function should be used if you know that a database on a server has been updated and you want to retrieve the latest data. Alternatively, you can use the expire_after argument to set an expiration date for the cache.
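For instance, a hedged sketch of overriding the cache location and expiration per call; the URL is illustrative:

import async_retriever as ar

urls = ["https://api.tidesandcurrents.noaa.gov/mdapi/prod/webapi/stations.json"]
# Store responses in a custom cache file and expire them after one hour.
resp = ar.retrieve_json(urls, cache_name="cache/noaa_cache.sqlite", expire_after=3600)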

As an example of retrieving a binary response, let’s use the DAAC server to get NDVI. The responses can be directly passed to xarray.open_mfdataset to get the data as an xarray.Dataset. We can also disable SSL certificate verification by setting ssl=False.

import io
import xarray as xr
import async_retriever as ar
from datetime import datetime

west, south, east, north = (-69.77, 45.07, -69.31, 45.45)
base_url = "https://thredds.daac.ornl.gov/thredds/ncss/ornldaac/1299"
dates_itr = ((datetime(y, 1, 1), datetime(y, 1, 31)) for y in range(2000, 2005))
urls, kwds = zip(
    *[
        (
            f"{base_url}/MCD13.A{s.year}.unaccum.nc4",
            {
                "params": {
                    "var": "NDVI",
                    "north": f"{north}",
                    "west": f"{west}",
                    "east": f"{east}",
                    "south": f"{south}",
                    "disableProjSubset": "on",
                    "horizStride": "1",
                    "time_start": s.strftime("%Y-%m-%dT%H:%M:%SZ"),
                    "time_end": e.strftime("%Y-%m-%dT%H:%M:%SZ"),
                    "timeStride": "1",
                    "addLatLon": "true",
                    "accept": "netcdf",
                }
            },
        )
        for s, e in dates_itr
    ]
)
resp = ar.retrieve_binary(urls, kwds, max_workers=8, ssl=False)
data = xr.open_mfdataset(io.BytesIO(r) for r in resp)

We can remove these requests and their responses from the cache like so:

ar.delete_url_cache(base_url)
https://raw.githubusercontent.com/hyriver/HyRiver-examples/main/notebooks/_static/ndvi.png

For a JSON response example, let’s get the water level recordings of a NOAA water level station, 8534720 (Atlantic City, NJ), during 2012 using the CO-OPS API. Note that this CO-OPS product has a 31-day limit for a single request, so we have to break the request down accordingly.

import pandas as pd

station_id = "8534720"
start = pd.to_datetime("2012-01-01")
end = pd.to_datetime("2012-12-31")

s = start
dates = []
for e in pd.date_range(start, end, freq="m"):
    dates.append((s.date(), e.date()))
    s = e + pd.offsets.MonthBegin()

url = "https://api.tidesandcurrents.noaa.gov/api/prod/datagetter"

urls, kwds = zip(
    *[
        (
            url,
            {
                "params": {
                    "product": "water_level",
                    "application": "web_services",
                    "begin_date": f'{s.strftime("%Y%m%d")}',
                    "end_date": f'{e.strftime("%Y%m%d")}',
                    "datum": "MSL",
                    "station": f"{station_id}",
                    "time_zone": "GMT",
                    "units": "metric",
                    "format": "json",
                }
            },
        )
        for s, e in dates
    ]
)

resp = ar.retrieve_json(urls, kwds)
wl_list = []
for rjson in resp:
    wl = pd.DataFrame.from_dict(rjson["data"])
    wl["t"] = pd.to_datetime(wl.t)
    wl = wl.set_index(wl.t).drop(columns="t")
    wl["v"] = pd.to_numeric(wl.v, errors="coerce")
    wl_list.append(wl)
water_level = pd.concat(wl_list).sort_index()
water_level.attrs = rjson["metadata"]
https://raw.githubusercontent.com/hyriver/HyRiver-examples/main/notebooks/_static/water_level.png

Now, let’s see an example without any payload or headers. Here’s how we can retrieve harmonic constituents of several NOAA stations from CO-OPS:

stations = [
    "8410140",
    "8411060",
    "8413320",
    "8418150",
    "8419317",
    "8419870",
    "8443970",
    "8447386",
]

base_url = "https://api.tidesandcurrents.noaa.gov/mdapi/prod/webapi/stations"
urls = [f"{base_url}/{i}/harcon.json?units=metric" for i in stations]
resp = ar.retrieve_json(urls)

amp_list = []
phs_list = []
for rjson in resp:
    sid = rjson["self"].rsplit("/", 2)[1]
    const = pd.DataFrame.from_dict(rjson["HarmonicConstituents"]).set_index("name")
    amp = const.rename(columns={"amplitude": sid})[sid]
    phase = const.rename(columns={"phase_GMT": sid})[sid]
    amp_list.append(amp)
    phs_list.append(phase)

amp = pd.concat(amp_list, axis=1)
phs = pd.concat(phs_list, axis=1)
https://raw.githubusercontent.com/hyriver/HyRiver-examples/main/notebooks/_static/tides.png

PyGeoOGC: Retrieve Data from RESTful, WMS, and WFS Services#

PyPi Conda Version CodeCov Python Versions Downloads

Security Status CodeFactor black pre-commit Binder

Features#

PyGeoOGC is a part of HyRiver software stack that is designed to aid in hydroclimate analysis through web services. This package provides general interfaces to web services that are based on ArcGIS RESTful, WMS, and WFS. Although all these web services limit the number of features per request (e.g., 1000 object IDs for a RESTful request or 8 million pixels for a WMS request), PyGeoOGC first divides large requests into smaller chunks and then returns the merged results.

Moreover, under the hood, PyGeoOGC uses AsyncRetriever for making requests asynchronously with persistent caching. This improves the reliability and speed of data retrieval significantly. AsyncRetriever caches all request/response pairs and, when a request has already been cached, retrieves the response from the cache if the server’s response is unchanged.

You can control the request/response caching behavior and verbosity of the package by setting the following environment variables:

  • HYRIVER_CACHE_NAME: Path to the caching SQLite database for asynchronous HTTP requests. It defaults to ./cache/aiohttp_cache.sqlite

  • HYRIVER_CACHE_NAME_HTTP: Path to the caching SQLite database for HTTP requests. It defaults to ./cache/http_cache.sqlite

  • HYRIVER_CACHE_EXPIRE: Expiration time for cached requests in seconds. It defaults to one week.

  • HYRIVER_CACHE_DISABLE: Disable reading/writing from/to the cache. The default is false.

  • HYRIVER_SSL_CERT: Path to a SSL certificate file.

For example, in your code before making any requests you can do:

import os

os.environ["HYRIVER_CACHE_NAME"] = "path/to/aiohttp_cache.sqlite"
os.environ["HYRIVER_CACHE_NAME_HTTP"] = "path/to/http_cache.sqlite"
os.environ["HYRIVER_CACHE_EXPIRE"] = "3600"
os.environ["HYRIVER_CACHE_DISABLE"] = "true"
os.environ["HYRIVER_SSL_CERT"] = "path/to/cert.pem"

There is also an inventory of URLs for some of these web services in the form of a class called ServiceURL. These URLs are in four categories: ServiceURL().restful, ServiceURL().wms, ServiceURL().wfs, and ServiceURL().http. They provide you with some examples of the services that PyGeoOGC supports. If you have success using PyGeoOGC with a web service, please consider submitting a request to have it added to this URL inventory. You can get all the URLs in the ServiceURL class by printing it: print(ServiceURL()).
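For example, to look up a single entry from the inventory:

from pygeoogc import ServiceURL

# The Watershed Boundary Dataset RESTful endpoint, used in the example below.
print(ServiceURL().restful.wbd)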

PyGeoOGC has three main classes:

  • ArcGISRESTful: This class can be instantiated by providing the target layer URL. For example, for getting Watershed Boundary Data we can use ServiceURL().restful.wbd. By looking at the web service’s website, we see that there are nine layers, e.g., 1 for 2-digit HU (Region), 6 for 12-digit HU (Subwatershed), and so on. We can pass the URL of the target layer directly, like f"{ServiceURL().restful.wbd}/6", or provide the layer number as a separate argument via layer.

    Afterward, we request the data in two steps. First, we get the target object IDs using the oids_bygeom (within a geometry), oids_byfield (specific field IDs), or oids_bysql (any valid SQL 92 WHERE clause) class methods. Then, we get the target features using the get_features class method. The returned response can be converted into a geopandas.GeoDataFrame using the json2geodf function from PyGeoUtils.

  • WMS: Instantiating this class requires at least three arguments: the service URL, layer name(s), and output format. Additionally, the target CRS and the web service version can be provided. Upon instantiation, we can use the getmap_bybox method to get the target raster data within a bounding box. The box can be in any valid CRS and, if it is different from the default CRS, EPSG:4326, it should be passed using the box_crs argument. The service response can be converted into an xarray.Dataset using the gtiff2xarray function from PyGeoUtils.

  • WFS: Instantiating this class is similar to WMS; the only difference is that only one layer name can be passed. Upon instantiation, there are three ways to get the data:

    • getfeature_bybox: Get all the target features within a bounding box in any valid CRS.

    • getfeature_byid: Get all the target features based on their IDs. Note that two arguments should be provided: featurename and featureids. You can get a list of valid feature names using the get_validnames class method.

    • getfeature_byfilter: Get the data based on any valid CQL filter.

    You can convert the returned response of these methods to a GeoDataFrame using the json2geodf function from the PyGeoUtils package. A hedged sketch of getfeature_byid follows this list.
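As a hedged sketch of getfeature_byid, reusing the WaterData WFS setup from the Quick start below (the feature name huc8 and the ID are illustrative):

from pygeoogc import WFS, ServiceURL

wfs = WFS(
    ServiceURL().wfs.waterdata,
    layer="wmadata:huc08",
    outformat="application/json",
    version="2.0.0",
    crs=4269,
)
print(wfs.get_validnames())  # list of valid feature names for this layer
r = wfs.getfeature_byid("huc8", "13030002")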

PyGeoOGC also includes several utilities:

  • streaming_download for downloading large files in parallel and in chunks, efficiently.

  • traverse_json for traversing a nested JSON object.

  • match_crs for reprojecting a geometry or bounding box to any valid CRS.
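For instance, a minimal sketch of match_crs, assuming it is importable from pygeoogc.utils (the bounding box is illustrative):

from pygeoogc import utils

bbox = (-69.77, 45.07, -69.31, 45.45)
# Reproject a WGS84 bounding box to EPSG:5070 (CONUS Albers).
bbox_proj = utils.match_crs(bbox, 4326, 5070)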

You can find some example notebooks here.

Furthermore, you can also try using PyGeoOGC without installing it on your system by clicking on the binder badge. A Jupyter Lab instance with the HyRiver stack pre-installed will be launched in your web browser, and you can start coding!

Moreover, requests for additional functionalities can be submitted via issue tracker.

Citation#

If you use any of HyRiver packages in your research, we appreciate citations:

@article{Chegini_2021,
    author = {Chegini, Taher and Li, Hong-Yi and Leung, L. Ruby},
    doi = {10.21105/joss.03175},
    journal = {Journal of Open Source Software},
    month = {10},
    number = {66},
    pages = {1--3},
    title = {{HyRiver: Hydroclimate Data Retriever}},
    volume = {6},
    year = {2021}
}

Installation#

You can install PyGeoOGC using pip:

$ pip install pygeoogc

Alternatively, PyGeoOGC can be installed from the conda-forge repository using Conda or Mamba:

$ conda install -c conda-forge pygeoogc

Quick start#

We can access NHDPlus HR via a RESTful service, the National Wetlands Inventory from WMS, and the FEMA National Flood Hazard Layer via WFS. The outputs of these functions are of type requests.Response, which can be converted to a GeoDataFrame or xarray.Dataset using PyGeoUtils.

Let’s start with the National Map’s NHDPlus HR web service. We can query the flowlines that are within a geometry as follows:

from pygeoogc import ArcGISRESTful, WFS, WMS, ServiceURL
import pygeoutils as geoutils
from pynhd import NLDI

basin_geom = NLDI().get_basins("01031500").geometry[0]

hr = ArcGISRESTful(ServiceURL().restful.nhdplushr, 2, outformat="json")

resp = hr.get_features(hr.oids_bygeom(basin_geom, 4326))
flowlines = geoutils.json2geodf(resp)

Note that oids_bygeom has three additional arguments: sql_clause, spatial_relation, and distance. We can use sql_clause to pass any valid SQL WHERE clause and spatial_relation to specify the target predicate, such as intersect, contain, cross, etc. The default predicate is intersect (esriSpatialRelIntersects). Additionally, we can use distance to specify a buffer distance from the input geometry for getting features, as sketched below.
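For example, a hedged sketch of a buffered query, reusing hr and basin_geom from above (the 500 m distance is illustrative):

oids = hr.oids_bygeom(
    basin_geom,
    geo_crs=4326,
    spatial_relation="esriSpatialRelIntersects",  # the default predicate
    distance=500,  # buffer the input geometry by 500 m
)
resp = hr.get_features(oids)
flowlines_buffered = geoutils.json2geodf(resp)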

We can also submit a query based on the IDs of any valid field in the database. If the measure property is desired, you can pass return_m as True to the get_features class method:

oids = hr.oids_byfield("PERMANENT_IDENTIFIER", ["103455178", "103454362", "103453218"])
resp = hr.get_features(oids, return_m=True)
flowlines = geoutils.json2geodf(resp)

Additionally, any valid SQL 92 WHERE clause can be used. For more details look here. For example, let’s limit our first request to only include catchments with areas larger than 0.5 sqkm.

oids = hr.oids_bygeom(basin_geom, geo_crs=4326, sql_clause="AREASQKM > 0.5")
resp = hr.get_features(oids)
catchments = geoutils.json2geodf(resp)

A WMS-based example is shown below:

wms = WMS(
    ServiceURL().wms.fws,
    layers="0",
    outformat="image/tiff",
    crs=3857,
)
r_dict = wms.getmap_bybox(
    basin_geom.bounds,
    1e3,
    box_crs=4326,
)
wetlands = geoutils.gtiff2xarray(r_dict, basin_geom, 4326)

Query from a WFS-based web service can be done either within a bounding box or using any valid CQL filter.

wfs = WFS(
    ServiceURL().wfs.fema,
    layer="public_NFHL:Base_Flood_Elevations",
    outformat="esrigeojson",
    crs=4269,
)
r = wfs.getfeature_bybox(basin_geom.bounds, box_crs=4326)
flood = geoutils.json2geodf(r.json(), 4269, 4326)

layer = "wmadata:huc08"
wfs = WFS(
    ServiceURL().wfs.waterdata,
    layer=layer,
    outformat="application/json",
    version="2.0.0",
    crs=4269,
)
r = wfs.getfeature_byfilter("huc8 LIKE '13030%'")
huc8 = geoutils.json2geodf(r.json(), 4269, 4326)
https://raw.githubusercontent.com/hyriver/HyRiver-examples/main/notebooks/_static/sql_clause.png

PyGeoUtils: Utilities for (Geo)JSON and (Geo)TIFF Conversion#

PyPi Conda Version CodeCov Python Versions Downloads

CodeFactor black pre-commit Binder

Features#

PyGeoUtils is a part of HyRiver software stack that is designed to aid in hydroclimate analysis through web services. This package provides utilities for manipulating (Geo)JSON and (Geo)TIFF responses from web services. These utilities are:

  • Coordinates: Generate validated and normalized coordinates in WGS84.

  • GeoBSpline: Create B-spline from a geopandas.GeoDataFrame of points.

  • smooth_linestring: Smooth a shapely.geometry.LineString using B-spline.

  • bspline_curvature: Compute tangent angles, curvature, and radius of curvature of a B-Spline at any points along the curve.

  • arcgis2geojson: Convert ESRIGeoJSON format to GeoJSON.

  • break_lines: Break lines at specified points in a given direction.

  • gtiff2xarray: Convert (Geo)Tiff byte responses to xarray.Dataset.

  • json2geodf: Create a geopandas.GeoDataFrame from (Geo)JSON responses.

  • snap2nearest: Find the nearest points on a line to a set of points.

  • xarray2geodf: Vectorize a xarray.DataArray to a geopandas.GeoDataFrame.

  • geodf2xarray: Rasterize a geopandas.GeoDataFrame to a xarray.DataArray.

  • xarray_geomask: Mask a xarray.Dataset based on a geometry.

  • query_indices: A wrapper around geopandas.sindex.query_bulk. However, instead of returning an array of positional indices, it returns a dictionary of indices where keys are the indices of the input geometry and values are a list of indices of the tree geometries that intersect with the input geometry.

  • nested_polygons: Determine nested (multi)polygons in a geopandas.GeoDataFrame.

  • multi2poly: Convert MultiPolygons to Polygons in a geopandas.GeoDataFrame.

  • geometry_reproject: Reproject a geometry (bounding box, list of coordinates, or any shapely.geometry) to a new CRS.

  • gtiff2vrt: Convert a list of GeoTIFF files to a VRT file.

You can find some example notebooks here.

You can also try using PyGeoUtils without installing it on your system by clicking on the binder badge. A Jupyter Lab instance with the HyRiver stack pre-installed will be launched in your web browser, and you can start coding!

Moreover, requests for additional functionalities can be submitted via issue tracker.

Citation#

If you use any of HyRiver packages in your research, we appreciate citations:

@article{Chegini_2021,
    author = {Chegini, Taher and Li, Hong-Yi and Leung, L. Ruby},
    doi = {10.21105/joss.03175},
    journal = {Journal of Open Source Software},
    month = {10},
    number = {66},
    pages = {1--3},
    title = {{HyRiver: Hydroclimate Data Retriever}},
    volume = {6},
    year = {2021}
}

Installation#

You can install PyGeoUtils using pip after installing libgdal on your system (for example, in Ubuntu run sudo apt install libgdal-dev).

$ pip install pygeoutils

Alternatively, PyGeoUtils can be installed from the conda-forge repository using Conda:

$ conda install -c conda-forge pygeoutils

Quick start#

We start by smoothing a shapely.geometry.LineString using B-spline:

import pygeoutils as pgu
from shapely import LineString

line = LineString(
    [
        (-97.06138, 32.837),
        (-97.06133, 32.836),
        (-97.06124, 32.834),
        (-97.06127, 32.832),
    ]
)
line = pgu.geometry_reproject(line, 4326, 5070)
sp = pgu.smooth_linestring(line, 5070, 5)
line_sp = pgu.geometry_reproject(sp.line, 5070, 4326)

Next, we use PyGeoOGC to access National Wetlands Inventory from WMS, and FEMA National Flood Hazard via WFS, then convert the output to xarray.Dataset and GeoDataFrame, respectively.

from pygeoogc import WFS, WMS, ServiceURL
from shapely.geometry import Polygon


geometry = Polygon(
    [
        [-118.72, 34.118],
        [-118.31, 34.118],
        [-118.31, 34.518],
        [-118.72, 34.518],
        [-118.72, 34.118],
    ]
)
crs = 4326

wms = WMS(
    ServiceURL().wms.mrlc,
    layers="NLCD_2011_Tree_Canopy_L48",
    outformat="image/geotiff",
    crs=crs,
)
r_dict = wms.getmap_bybox(
    geometry.bounds,
    1e3,
    box_crs=crs,
)
canopy = pgu.gtiff2xarray(r_dict, geometry, crs)

mask = canopy > 60
canopy_gdf = pgu.xarray2geodf(canopy, "float32", mask)

url_wfs = "https://hazards.fema.gov/gis/nfhl/services/public/NFHL/MapServer/WFSServer"
wfs = WFS(
    url_wfs,
    layer="public_NFHL:Base_Flood_Elevations",
    outformat="esrigeojson",
    crs=4269,
)
r = wfs.getfeature_bybox(geometry.bounds, box_crs=crs)
flood = pgu.json2geodf(r.json(), 4269, crs)

API References#

  • AsyncRetriever v0.16.0

  • PyGeoOGC v0.16.1

  • PyGeoUtils v0.16.1

  • PyNHD v0.16.2

  • Py3DEP v0.16.2

  • PyGeoHydro v0.16.0

  • PyDaymet v0.16.1

  • PyGridMET v0.16.0

  • PyNLDAS2 v0.16.0

  • HydroSignatures v0.16.0

pynhd#

Top-level package for PyNHD.

Submodules#

pynhd.core#

Base classes for PyNHD functions.

Module Contents#
class pynhd.core.AGRBase(base_url, layer=None, outfields='*', crs=4326, outformat='json')#

Base class for getting geospatial data from a ArcGISRESTful service.

Parameters:
  • base_url (str, optional) – The ArcGIS RESTful service URL. The URL must either include a layer number after the last / or the target layer must be passed as an argument.

  • layer (str, optional) – A valid service layer. To see a list of available layers instantiate the class without passing any argument.

  • outfields (str or list, optional) – Target field name(s), defaults to “*”, i.e., all the fields.

  • crs (str, int, or pyproj.CRS, optional) – Target spatial reference, defaults to EPSG:4326.

  • outformat (str, optional) – One of the output formats offered by the selected layer. If not correct, a list of available formats is shown. Defaults to json.

property service_info: ServiceInfo#

Get the service information.

bygeom(geom, geo_crs=4326, sql_clause='', distance=None, return_m=False, return_geom=True)#

Get features within a geometry, optionally filtered by a valid SQL 92 WHERE clause.

Parameters:
  • geom (Polygon or tuple) – A geometry (Polygon) or bounding box (tuple of length 4).

  • geo_crs (str) – The spatial reference of the input geometry.

  • sql_clause (str, optional) – A valid SQL 92 WHERE clause, defaults to an empty string.

  • distance (int, optional) – The buffer distance for the input geometries in meters, defaults to None.

  • return_m (bool, optional) – Whether to activate the Return M (measure) in the request, defaults to False.

  • return_geom (bool, optional) – Whether to return the geometry of the feature, defaults to True.

Returns:

geopandas.GeoDataFrame – The requested features as a GeoDataFrame.

Return type:

geopandas.GeoDataFrame

byids(field, fids, return_m=False, return_geom=True)#

Get features based on a list of field IDs.

Parameters:
  • field (str) – Name of the target field that IDs belong to.

  • fids (str or list) – A list of target field ID(s).

  • return_m (bool) – Whether to activate the Return M (measure) in the request, defaults to False.

  • return_geom (bool, optional) – Whether to return the geometry of the feature, defaults to True.

Returns:

geopandas.GeoDataFrame – The requested features as a GeoDataFrame.

Return type:

geopandas.GeoDataFrame

bysql(sql_clause, return_m=False, return_geom=True)#

Get feature IDs using a valid SQL 92 WHERE clause.

Notes

Not all web services support this type of query. For more details look here

Parameters:
  • sql_clause (str) – A valid SQL 92 WHERE clause.

  • return_m (bool) – Whether to activate the measure in the request, defaults to False.

  • return_geom (bool, optional) – Whether to return the geometry of the feature, defaults to True.

Returns:

geopandas.GeoDataFrame – The requested features as a GeoDataFrame.

Return type:

geopandas.GeoDataFrame

static get_validlayers(url)#

Get a list of valid layers.

Parameters:

url (str) – The URL of the ArcGIS REST service.

Returns:

dict – A dictionary of valid layers.

Return type:

dict[str, int]

class pynhd.core.GeoConnex(item=None, dev=False, max_nfeatures=10000)#

Access to the GeoConnex API.

Notes

The geometry field of the query can be a Polygon, MultiPolygon, or tuple/list of length 4 (bbox) in EPSG:4326 CRS. They should be within the extent of the GeoConnex endpoint.

Parameters:
  • item (str, optional) – The item (service endpoint) to query. Valid endpoints are:

    • hu02 for Two-digit Hydrologic Regions

    • hu04 for Four-digit Hydrologic Subregions

    • hu06 for Six-digit Hydrologic Basins

    • hu08 for Eight-digit Hydrologic Subbasins

    • hu10 for Ten-digit Watersheds

    • nat_aq for National Aquifers of the United States from the USGS National Water Information System National Aquifer code list

    • principal_aq for Principal Aquifers of the United States from the 2003 USGS data release

    • sec_hydrg_reg for Secondary Hydrogeologic Regions of the Conterminous United States from the 2018 USGS data release

    • gages for US Reference Stream Gauge Monitoring Locations

    • mainstems for US Reference Mainstem Rivers

    • states for U.S. States

    • counties for U.S. Counties

    • aiannh for Native American Lands

    • cbsa for U.S. Metropolitan and Micropolitan Statistical Areas

    • ua10 for Urbanized Areas and Urban Clusters (2010 Census)

    • places for U.S. legally incorporated and Census-designated places

    • pws for U.S. Public Water Systems

    • dams for US Reference Dams

  • dev (bool, optional) – Whether to use the development endpoint, defaults to False.

  • max_nfeatures (int, optional) – The maximum number of features to request from the service, defaults to 10000.

property dev: bool#

Return whether the development endpoint is used.

property item: str | None#

Return the name of the endpoint.

bycql(cql_dict: dict[str, Any], skip_geometry: Literal[False] = False) geopandas.GeoDataFrame#
bycql(cql_dict: dict[str, Any], skip_geometry: Literal[True]) pandas.DataFrame

Query the GeoConnex endpoint.

Notes

GeoConnex only supports simple CQL queries. For more information and examples visit https://portal.ogc.org/files/96288#simple-cql-JSON. Use this for non-spatial queries, since there’s a dedicated method for spatial queries, bygeometry().

Parameters:
  • cql_dict (dict) – A valid CQL dictionary (non-spatial queries).

  • skip_geometry (bool, optional) – If True, the geometry will not be returned, by default False.

Returns:

geopandas.GeoDataFrame – The query result as a geopandas.GeoDataFrame.
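For example, a hedged sketch of a non-spatial query, assuming the states endpoint exposes a NAME property (both the property name and value are illustrative):

from pynhd import GeoConnex

gcx = GeoConnex("states")
# A simple CQL-JSON equality filter, following the spec linked above.
tx = gcx.bycql({"eq": [{"property": "NAME"}, "Texas"]})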

bygeometry(geometry1: GTYPE, geometry2: GTYPE | None = ..., predicate: str = ..., crs: CRSTYPE = ..., skip_geometry: Literal[False] = False) geopandas.GeoDataFrame#
bygeometry(geometry1: GTYPE, geometry2: GTYPE | None = ..., predicate: str = ..., crs: CRSTYPE = ..., skip_geometry: Literal[True] = True) pandas.DataFrame

Query the GeoConnex endpoint by geometry.

Parameters:
  • geometry1 (Polygon or tuple of float) – The first geometry or bounding box to query. A bounding box is a tuple of length 4 in the form of (xmin, ymin, xmax, ymax). For example, a spatial query for a single geometry would be INTERSECTS(geom, geometry1).

  • geometry2 (Polygon or tuple of float, optional) – The second geometry or bounding box to query. A bounding box is a tuple of length 4 in the form of (xmin, ymin, xmax, ymax). Defaults to None. For example, a spatial query for two geometries would be CROSSES(geometry1, geometry2).

  • predicate (str, optional) – The predicate to use, by default intersects. Supported predicates are intersects, within, contains, overlaps, crosses, disjoint, touches, and equals.

  • crs (int or str or pyproj.CRS, optional) – The CRS of the polygon, by default EPSG:4326. If the input is a geopandas.GeoDataFrame or geopandas.GeoSeries, this argument will be ignored.

  • skip_geometry (bool, optional) – If True, the geometry will not be returned.

Returns:

geopandas.GeoDataFrame – The query result as a geopandas.GeoDataFrame.

byid(feature_name: str, feature_ids: list[str] | str, skip_geometry: Literal[False] = False) geopandas.GeoDataFrame#
byid(feature_name: str, feature_ids: list[str] | str, skip_geometry: Literal[True]) pandas.DataFrame

Query the GeoConnex endpoint.

class pynhd.core.ScienceBase#

Access and explore items on USGS’s ScienceBase.

static get_children(item)#

Get children items of an item.

static get_file_urls(item)#

Get download and meta URLs of all the available files for an item.

pynhd.network_tools#

Access NLDI and WaterData databases.

Module Contents#
class pynhd.network_tools.NHDTools(flowlines)#

Prepare NHDPlus data for downstream analysis.

Notes

Some of these tools are ported from nhdplusTools.

Parameters:

flowlines (geopandas.GeoDataFrame) – NHDPlus flowlines with at least the following columns: comid, lengthkm, ftype, terminalfl, fromnode, tonode, totdasqkm, startflag, streamorde, streamcalc, terminalpa, pathlength, divergence, hydroseq, and levelpathi.

add_tocomid()#

Find the downstream comid(s) of each comid in NHDPlus flowline database.

Notes

This function requires the following columns:

comid, terminalpa, fromnode, tonode

static check_requirements(reqs, cols)#

Check for all the required data.

Parameters:
  • reqs (iterable) – A list of required data names (str)

  • cols (list) – A list of variable names (str)

clean_flowlines(use_enhd_attrs, terminal2nan)#

Clean up flowlines.

Parameters:
  • use_enhd_attrs (bool) – Use attributes from the ENHD database.

  • terminal2nan (bool) – Convert terminal flowlines to NaN.

remove_isolated()#

Remove isolated flowlines.

remove_tinynetworks(min_path_size, min_path_length, min_network_size)#

Remove small paths in NHDPlus flowline database.

Notes

This function requires the following columns: levelpathi, hydroseq, totdasqkm, terminalfl, startflag, pathlength, and terminalpa.

Parameters:
  • min_network_size (float) – Minimum size of drainage network in sqkm.

  • min_path_length (float) – Minimum length of terminal level path of a network in km.

  • min_path_size (float) – Minimum size of outlet level path of a drainage basin in km. Drainage basins with an outlet drainage area smaller than this value will be removed.

to_linestring()#

Convert flowlines to shapely LineString objects.

pynhd.network_tools.enhd_flowlines_nx()#

Get a networkx.DiGraph of the entire NHD flowlines.

Changed in version 0.16.2: The function now replaces all 0 values in the tocomid column of ENHD with the negative of their corresponding comid values. This ensures all sinks are unique and treated accordingly for topological sorting and other network analyses. The differences are in the returned label2comid dictionary and onnetwork_sorted, which will contain the negative values for the sinks.

Notes

The graph is directed and has all the attributes of the flowlines in ENHD. Note that COMIDs are based on the 2020 snapshot of NHDPlusV2.1.

Returns:

  • graph (networkx.DiGraph) – The generated directed graph

  • label2comid (dict) – A mapping of COMIDs to the node IDs in the graph

  • onnetwork_sorted (list) – A topologically sorted list of the COMIDs.

Return type:

tuple[networkx.DiGraph, dict[int, int], list[int]]

pynhd.network_tools.flowline_resample(flw, spacing, id_col='comid', smoothing=None)#

Resample a flowline based on a given spacing.

Parameters:
  • flw (geopandas.GeoDataFrame) – A dataframe with geometry and id_col columns and a CRS attribute. The flowlines should be mergeable into a single LineString; otherwise, you should use the network_resample() function.

  • spacing (float) – Spacing between the sample points in meters.

  • id_col (str, optional) – Name of the flowlines column containing IDs, defaults to comid.

  • smoothing (float or None, optional) – Smoothing factor used for determining the number of knots. This argument controls the tradeoff between closeness and smoothness of fit: larger values mean more smoothing. If None (default), smoothing is done with all points.

Returns:

geopandas.GeoDataFrame – Resampled flowline.

Return type:

geopandas.GeoDataFrame

pynhd.network_tools.flowline_xsection(flw, distance, width, id_col='comid', smoothing=None)#

Get cross-section of a river network at a given spacing.

Parameters:
  • flw (geopandas.GeoDataFrame) – A dataframe with geometry, id_col, and levelpathi columns and a projected CRS attribute.

  • distance (float) – The distance between two consecutive cross-sections.

  • width (float) – The width of the cross-section.

  • id_col (str, optional) – Name of the flowlines column containing IDs, defaults to comid.

  • smoothing (float or None, optional) – Smoothing factor used for determining the number of knots. This argument controls the tradeoff between closeness and smoothness of fit: larger values mean more smoothing. If None (default), smoothing is done with all points.

Returns:

geopandas.GeoDataFrame – A dataframe with two columns: geometry and comid. The geometry column contains the cross-section of the river network and the comid column contains the corresponding comid from the input dataframe. Note that each comid can have multiple cross-sections depending on the given spacing distance.

Return type:

geopandas.GeoDataFrame

pynhd.network_tools.mainstem_huc12_nx()#

Get a networkx.DiGraph of the entire mainstem HUC12s.

Notes

The directed graph is generated from the nhdplusv2wbd.csv file with all attributes that can be found in Mainstem. Note that HUC12s are based on the 2020 snapshot of the NHDPlusV2.1.

Returns:

  • networkx.DiGraph – The mainstem as a networkx.DiGraph with all the attributes of the mainstems.

  • dict – A mapping of the HUC12s to the node IDs in the graph.

  • list – A topologically sorted list of the HUC12s, which are strings of length 12.

Return type:

tuple[networkx.DiGraph, dict[int, str], list[str]]

pynhd.network_tools.network_resample(flw, spacing, id_col='comid', smoothing=None)#

Resample a network flowline based on a given spacing.

Parameters:
  • flw (geopandas.GeoDataFrame) – A dataframe with geometry, id_col, and levelpathi columns and a projected CRS attribute.

  • spacing (float) – Target spacing between the sample points in the length unit of the flw’s CRS.

  • id_col (str, optional) – Name of the flowlines column containing IDs, defaults to comid.

  • smoothing (float or None, optional) – Smoothing factor used for determining the number of knots. This argument controls the tradeoff between closeness and smoothness of fit: larger values mean more smoothing. If None (default), smoothing is done with all points.

Returns:

geopandas.GeoDataFrame – Resampled flowlines.

Return type:

geopandas.GeoDataFrame

pynhd.network_tools.network_xsection(flw, distance, width, id_col='comid', smoothing=None)#

Get cross-section of a river network at a given spacing.

Parameters:
  • flw (geopandas.GeoDataFrame) – A dataframe with geometry, id_col, and levelpathi columns and a projected CRS attribute.

  • distance (float) – The distance between two consecutive cross-sections.

  • width (float) – The width of the cross-section.

  • id_col (str, optional) – Name of the flowlines column containing IDs, defaults to comid.

  • smoothing (float or None, optional) – Smoothing factor used for determining the number of knots. This argument controls the tradeoff between closeness and smoothness of fit: larger values mean more smoothing. If None (default), smoothing is done with all points.

Returns:

geopandas.GeoDataFrame – A dataframe with two columns: geometry and comid. The geometry column contains the cross-section of the river network and the comid column contains the corresponding comid from the input dataframe. Note that each comid can have multiple cross-sections depending on the given spacing distance.

Return type:

geopandas.GeoDataFrame

pynhd.network_tools.nhdflw2nx(flowlines, id_col='comid', toid_col='tocomid', edge_attr=None)#

Convert NHDPlus flowline database to networkx graph.

Parameters:
  • flowlines (geopandas.GeoDataFrame) – NHDPlus flowlines.

  • id_col (str, optional) – Name of the column containing the node ID, defaults to “comid”.

  • toid_col (str, optional) – Name of the column containing the downstream node ID, defaults to “tocomid”.

  • edge_attr (str, optional) – Name of the column containing the edge attributes, defaults to None. If True, all remaining columns will be used as edge attributes.

Returns:

nx.DiGraph – Networkx directed graph of the NHDPlus flowlines. Note that all elements of toid_col are replaced with negative values of their corresponding id_col values if they are NaN or 0. This ensures that the generated nodes in the graph are unique.

Return type:

networkx.DiGraph

pynhd.network_tools.nhdplus_l48(layer=None, data_dir='cache', **kwargs)#

Get the entire NHDPlus dataset.

Notes

The entire NHDPlus dataset for CONUS (Lower 48) is downloaded from here. This 7.3 GB file will take a while to download, depending on your internet connection. The first time you run this function, the file will be downloaded and stored in the ./cache directory. Subsequent calls will use the cached file. Moreover, there are two additional dependencies required to read the file: pyogrio and py7zr. These dependencies can be installed using pip install pyogrio py7zr or conda install -c conda-forge pyogrio py7zr.

Parameters:
  • layer (str, optional) – The layer name to be returned. Either layer should be provided or sql. Defaults to None. The available layers are:

    • Gage

    • BurnAddLine

    • BurnAddWaterbody

    • LandSea

    • Sink

    • Wall

    • Catchment

    • CatchmentSP

    • NHDArea

    • NHDWaterbody

    • HUC12

    • NHDPlusComponentVersions

    • PlusARPointEvent

    • PlusFlowAR

    • NHDFCode

    • DivFracMP

    • BurnLineEvent

    • NHDFlowline_Network

    • NHDFlowline_NonNetwork

    • GeoNetwork_Junctions

    • PlusFlow

    • N_1_Desc

    • N_1_EDesc

    • N_1_EStatus

    • N_1_ETopo

    • N_1_FloDir

    • N_1_JDesc

    • N_1_JStatus

    • N_1_JTopo

    • N_1_JTopo2

    • N_1_Props

  • data_dir (str or pathlib.Path) – Directory to store the downloaded file and use in subsequent calls, defaults to ./cache.

  • **kwargs – Keyword arguments are passed to pyogrio.read_dataframe. For more information, visit pyogrio.

Returns:

geopandas.GeoDataFrame – A dataframe with all the NHDPlus data.

Return type:

geopandas.GeoDataFrame

pynhd.network_tools.prepare_nhdplus(flowlines, min_network_size, min_path_length, min_path_size=0, purge_non_dendritic=False, remove_isolated=False, use_enhd_attrs=False, terminal2nan=True)#

Clean up and fix common issues of NHDPlus MR and HR flowlines.

Ported from nhdplusTools.

Parameters:
  • flowlines (geopandas.GeoDataFrame) – NHDPlus flowlines with at least the following columns: comid, lengthkm, ftype, terminalfl, fromnode, tonode, totdasqkm, startflag, streamorde, streamcalc, terminalpa, pathlength, divergence, hydroseq, levelpathi.

  • min_network_size (float) – Minimum size of drainage network in sqkm

  • min_path_length (float) – Minimum length of terminal level path of a network in km.

  • min_path_size (float, optional) – Minimum size of outlet level path of a drainage basin in km. Drainage basins with an outlet drainage area smaller than this value will be removed. Defaults to 0.

  • purge_non_dendritic (bool, optional) – Whether to remove non dendritic paths, defaults to False.

  • remove_isolated (bool, optional) – Whether to remove isolated flowlines, i.e., keep only the largest connected component of the flowlines. Defaults to False.

  • use_enhd_attrs (bool, optional) – Whether to replace the attributes with the ENHD attributes, defaults to False. Note that this only works for NHDPlus mid-resolution (MR) and does not work for NHDPlus high-resolution (HR). For more information, see this.

  • terminal2nan (bool, optional) – Whether to replace the COMID of the terminal flowline of the network with NaN, defaults to True. If False, the terminal COMID will be set from the ENHD attributes i.e. use_enhd_attrs will be set to True which is only applicable to NHDPlus mid-resolution (MR).

Returns:

geopandas.GeoDataFrame – Cleaned up flowlines. Note that all column names are converted to lower case.

Return type:

geopandas.GeoDataFrame
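
Examples

A minimal sketch, assuming the WaterData service returns flowlines with the required columns for the given bounding box:

>>> import pynhd
>>> wd = pynhd.WaterData("nhdflowline_network")
>>> flw = wd.bybox((-69.77, 45.07, -69.31, 45.45))
>>> flw = pynhd.network_tools.prepare_nhdplus(flw, 0, 0, purge_non_dendritic=False)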

pynhd.network_tools.topoogical_sort(flowlines, edge_attr=None, largest_only=False, id_col='ID', toid_col='toID')#

Topological sorting of a river network.

Parameters:
  • flowlines (pandas.DataFrame) – A dataframe with columns ID and toID

  • edge_attr (str or list, optional) – Names of the columns in the dataframe to be used as edge attributes, defaults to None.

  • largest_only (bool, optional) – Whether to return only the largest network, defaults to False.

  • id_col (str, optional) – Name of the column containing the node ID, defaults to ID.

  • toid_col (str, optional) – Name of the column containing the downstream node ID, defaults to toID.

Returns:

(list, dict, networkx.DiGraph) – A list of topologically sorted IDs, a dictionary mapping each ID to a list of its upstream nodes, and the generated networkx.DiGraph object. Note that node IDs are associated with the input flowline IDs, but there might be some negative IDs in the output graph that are not present in the input flowline IDs. These “artificial” nodes represent the graph outlets (the most downstream nodes).

Return type:

tuple[list[numpy.int64 | pandas._libs.missing.NAType], dict[int, list[int]], networkx.DiGraph]

pynhd.network_tools.vector_accumulation(flowlines, func, attr_col, arg_cols, id_col='comid', toid_col='tocomid')#

Flow accumulation using vector river network data.

Parameters:
  • flowlines (pandas.DataFrame) – A dataframe containing comid, tocomid, attr_col, and all the columns that are required for passing to func.

  • func (function) – The function that routes the flow in a single river segment. The function’s positional arguments should be laid out as func(qin, *arg_cols), where qin is computed internally by vector_accumulation and the rest are in the order of arg_cols. For example, if arg_cols = ["slope", "roughness"], then the function is called as func(qin, slope, roughness), where slope and roughness are elemental values read from the flowlines (see the sketch after this entry).

  • attr_col (str) – The column name of the attribute being accumulated in the network. The column should contain the initial condition for the attribute for each river segment. It can be a scalar or an array (e.g., time series).

  • arg_cols (list of str) – List of the flowlines columns that contain all the required data for routing a single river segment, such as slope, length, lateral flow, etc.

  • id_col (str, optional) – Name of the flowlines column containing IDs, defaults to comid

  • toid_col (str, optional) – Name of the flowlines column containing toIDs, defaults to tocomid

Returns:

pandas.Series – Accumulated flow for all the nodes. The output is sorted from upstream to downstream (topological sorting). Depending on the given initial condition in attr_col, the outflow for each river segment can be a scalar or an array.

Return type:

pandas.Series
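
Examples

A minimal sketch of the routing-function contract described above, using a toy three-segment network; the qlat column name and its values are hypothetical:

>>> import pandas as pd
>>> import pynhd
>>> flw = pd.DataFrame(
...     {
...         "comid": [1, 2, 3],
...         "tocomid": [3, 3, 0],  # 0 marks the network outlet
...         "qlat": [1.0, 2.0, 3.0],  # lateral inflow of each segment
...     }
... )
>>> def routing(qin, qlat):
...     # qin is the accumulated inflow computed by vector_accumulation
...     return qin + qlat
>>> qsim = pynhd.network_tools.vector_accumulation(flw, routing, "qlat", ["qlat"])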

pynhd.nhdplus_derived#

Access NHDPlus-derived datasets such as ENHD attributes and EPA’s StreamCat.

Module Contents#
class pynhd.nhdplus_derived.StreamCat#

Get StreamCat API’s properties.

base_url#

The base URL of the API.

Type:

str

valid_names#

The valid names of the metrics.

Type:

list of str

alt_names#

The alternative names of some metrics.

Type:

dict of str

valid_regions#

The valid hydro regions.

Type:

list of str

valid_states#

The valid two-letter state abbreviations.

Type:

pandas.DataFrame

valid_counties#

The valid counties’ FIPS codes.

Type:

pandas.DataFrame

valid_aois#

The valid types of areas of interest.

Type:

list of str

metrics_df#

The metrics’ metadata such as description and units.

Type:

pandas.DataFrame

valid_years#

A dictionary of the valid years for annual metrics.

Type:

dict
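
Examples

A minimal sketch for inspecting the service metadata before requesting metrics:

>>> import pynhd
>>> sc = pynhd.nhdplus_derived.StreamCat()
>>> meta = sc.metrics_df  # descriptions and units of all metrics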

pynhd.nhdplus_derived.enhd_attrs(parquet_path=None)#

Get updated NHDPlus attributes from ENHD V2.0.

Notes

This function downloads a 160 MB parquet file from here. Although this dataframe does not include geometry, it can be linked to other geospatial NHDPlus dataframes through ComIDs.

Parameters:

parquet_path (str or pathlib.Path, optional) – Path to a file with .parquet extension for storing the file, defaults to ./cache/enhd_attrs.parquet.

Returns:

pandas.DataFrame – A dataframe that includes ComID-level attributes for 2.7 million NHDPlus flowlines.

Return type:

pandas.DataFrame

pynhd.nhdplus_derived.epa_nhd_catchments(comids, feature)#

Get NHDPlus catchment-scale data from EPA’s HMS REST API.

Notes

For more information about curve number please refer to the project’s webpage on the EPA’s website.

Parameters:
  • comids (int or list of int) – ComID(s) of NHDPlus catchments.

  • feature (str) – The feature of interest. Available options are:

    • catchment_metrics: 414 catchment-scale metrics.

    • curve_number: 16-day average Curve Number.

    • comid_info: ComID information.

Returns:

dict of pandas.DataFrame or geopandas.GeoDataFrame – A dict of the requested dataframes. A comid_info dataframe is always returned.

Return type:

dict[str, pandas.DataFrame]

Examples

>>> import pynhd
>>> data = pynhd.epa_nhd_catchments(1440291, "catchment_metrics")
>>> data["catchment_metrics"].loc[1440291, "AvgWetIndxCat"]
579.532
pynhd.nhdplus_derived.nhd_fcode()#

Get all the NHDPlus FCodes.

pynhd.nhdplus_derived.nhdplus_attrs(attr_name=None)#

Stage the NHDPlus Attributes database and save to nhdplus_attrs.parquet.

Notes

More info can be found here.

Parameters:

attr_name (str, optional) – Name of the NHDPlus attribute to return, defaults to None, i.e., only return a metadata dataframe that includes the attribute names and their descriptions and units.

Returns:

pandas.DataFrame – The staged data as a DataFrame.

Return type:

pandas.DataFrame

pynhd.nhdplus_derived.nhdplus_attrs_s3(attr_names=None, nodata=False)#

Access NHDPlus V2.1 derived attributes over CONUS.

Notes

More info can be found here.

Parameters:
  • attr_names (str or list of str, optional) – Names of NHDPlus attribute(s) to return, defaults to None, i.e., only return a metadata dataframe that includes the attribute names and their description and units.

  • nodata (bool) – Whether to include NODATA percentages, default is False.

Returns:

pandas.DataFrame – A dataframe of requested NHDPlus attributes.

Return type:

pandas.DataFrame

pynhd.nhdplus_derived.nhdplus_h12pp(gpkg_path=None)#

Access HUC12 Pour Points for NHDPlus V2.1 L48 (CONUS).

Notes

More info can be found here.

Parameters:

gpkg_path (str or pathlib.Path, optional) – Path to the geopackage file, defaults to None, i.e., download the file to the cache directory as 102020wbd_outlets.gpkg.

Returns:

geopandas.GeoDataFrame – A geodataframe of HUC12 pour points.

Return type:

geopandas.GeoDataFrame

pynhd.nhdplus_derived.nhdplus_vaa(parquet_path=None)#

Get NHDPlus Value Added Attributes including roughness.

Notes

This function downloads a 245 MB parquet file from here. Although this dataframe does not include geometry, it can be linked to other geospatial NHDPlus dataframes through ComIDs.

Parameters:

parquet_path (str or pathlib.Path, optional) – Path to a file with .parquet extension for storing the file, defaults to ./cache/nhdplus_vaa.parquet.

Returns:

pandas.DataFrame – A dataframe that includes ComID-level attributes for 2.7 million NHDPlus flowlines.

Return type:

pandas.DataFrame
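
Examples

A minimal sketch; the first call downloads and caches the 245 MB parquet file:

>>> import pynhd
>>> vaa = pynhd.nhdplus_derived.nhdplus_vaa()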

pynhd.nhdplus_derived.streamcat(metric_names, metric_areas=None, comids=None, regions=None, states=None, counties=None, conus=False, percent_full=False, area_sqkm=False)#

Get various metrics for NHDPlusV2 catchments from EPA’s StreamCat.

Notes

For more information about the service check its webpage at https://www.epa.gov/national-aquatic-resource-surveys/streamcat-dataset.

Parameters:
  • metric_names (str or list of str) – Metric name(s) to retrieve. There are 567 metrics available. To get a full list, check out StreamCat.valid_names(). To get a description of each metric, check out StreamCat.metrics_df(). Some metrics require year and/or slope to be specified, which have [Year] and/or [Slope] in their names. For convenience, all these variables and their years/slopes are converted to a dict that can be accessed via StreamCat.valid_years() and StreamCat.valid_slopes().

  • metric_areas (str or list of str, optional) – Areas to return the metrics for, defaults to None, i.e. all areas. Valid options are: catchment, watershed, riparian_catchment, riparian_watershed, other.

  • comids (int or list of int, optional) – NHDPlus COMID(s), defaults to None. Either comids, regions, states, counties, or conus must be passed. They are mutually exclusive.

  • regions (str or list of str, optional) – Hydro region(s) to retrieve metrics for, defaults to None. For a full list of valid regions check out StreamCat.valid_regions() Either comids, regions, states, counties, or conus must be passed. They are mutually exclusive.

  • states (str or list of str, optional) – Two letter state abbreviation(s) to retrieve metrics for, defaults to None. For a full list of valid states check out StreamCat.valid_states() Either comids, regions, states, counties, or conus must be passed. They are mutually exclusive.

  • counties (str or list of str, optional) – County FIPS codes(s) to retrieve metrics for, defaults to None. For a full list of valid county codes check out StreamCat.valid_counties() Either comids, regions, states, counties, or conus must be passed. They are mutually exclusive.

  • conus (bool, optional) – If True, metric_names of all NHDPlus COMIDs are retrieved, defaults to False. Either comids, regions, states, counties, or conus must be passed. They are mutually exclusive.

  • percent_full (bool, optional) – If True, return the percent of each area of interest covered by the metric.

  • area_sqkm (bool, optional) – If True, return the area in square kilometers.

Returns:

pandas.DataFrame – A dataframe with the requested metrics.

Return type:

pandas.DataFrame
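
Examples

A minimal sketch, assuming fert is a valid metric name (check StreamCat.valid_names()); the ComID is the one used in the epa_nhd_catchments example above:

>>> import pynhd
>>> fert = pynhd.nhdplus_derived.streamcat("fert", comids=1440291)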

pynhd.pynhd#

Access NLDI and WaterData databases.

Module Contents#
class pynhd.pynhd.HP3D(layer, outfields='*', crs=4326)#

Access USGS 3D Hydrography Program (3DHP) service.

Notes

For more info visit: https://hydro.nationalmap.gov/arcgis/rest/services/3DHP_all/MapServer

Parameters:
  • layer (str, optional) – A valid service layer. Valid layers are:

    • hydrolocation

    • flowline

    • waterbody

    • drainage_area

    • catchment

  • outfields (str or list, optional) – Target field name(s), default to “*” i.e., all the fields.

  • crs (str, int, or pyproj.CRS, optional) – Target spatial reference, default to EPSG:4326.

bygeom(geom, geo_crs=4326, sql_clause='', distance=None, return_m=False, return_geom=True)#

Get features within a geometry that can be combined with a SQL where clause.

byids(field, fids, return_m=False, return_geom=True)#

Get features by object IDs.

bysql(sql_clause, return_m=False, return_geom=True)#

Get features using a valid SQL 92 WHERE clause.
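
Examples

A minimal sketch, assuming 3DHP coverage exists within the bounding box:

>>> from pynhd import HP3D
>>> hp3d = HP3D("flowline")
>>> flw = hp3d.bygeom((-73.42, 43.28, -73.40, 43.30))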

class pynhd.pynhd.NHD(layer, outfields='*', crs=4326)#

Access National Hydrography Dataset (NHD), both medium and high resolution.

Notes

For more info visit: https://hydro.nationalmap.gov/arcgis/rest/services/nhd/MapServer

Parameters:
  • layer (str, optional) – A valid service layer. Layer names with _hr are high resolution and _mr are medium resolution. Also, layer names with _nonconus are for non-CONUS areas, i.e., Alaska, Hawaii, Puerto Rico, the Virgin Islands, and the Pacific Islands. Valid layers are:

    • point

    • point_event

    • line_hr

    • flow_direction

    • flowline_mr

    • flowline_hr_nonconus

    • flowline_hr

    • area_mr

    • area_hr_nonconus

    • area_hr

    • waterbody_mr

    • waterbody_hr_nonconus

    • waterbody_hr

  • outfields (str or list, optional) – Target field name(s), default to “*” i.e., all the fields.

  • crs (str, int, or pyproj.CRS, optional) – Target spatial reference, default to EPSG:4326.

bygeom(geom, geo_crs=4326, sql_clause='', distance=None, return_m=False, return_geom=True)#

Get features within a geometry that can be combined with a SQL where clause.

byids(field, fids, return_m=False, return_geom=True)#

Get features by object IDs.

bysql(sql_clause, return_m=False, return_geom=True)#

Get features using a valid SQL 92 WHERE clause.
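
Examples

A minimal sketch; the outfields names are assumed field names of the medium-resolution flowline layer:

>>> from pynhd import NHD
>>> nhd = NHD("flowline_mr", outfields=["COMID", "GNIS_NAME"])
>>> flw = nhd.bygeom((-69.77, 45.07, -69.31, 45.45))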

class pynhd.pynhd.NHDPlusHR(layer, outfields='*', crs=4326)#

Access National Hydrography Dataset (NHD) Plus high resolution.

Notes

For more info visit: https://hydro.nationalmap.gov/arcgis/rest/services/NHDPlus_HR/MapServer

Parameters:
  • layer (str, optional) – A valid service layer. Valid layers are:

    • gage for NHDPlusGage layer

    • sink for NHDPlusSink layer

    • point for NHDPoint layer

    • flowline for NetworkNHDFlowline layer

    • non_network_flowline for NonNetworkNHDFlowline layer

    • flow_direction for FlowDirection layer

    • wall for NHDPlusWall layer

    • line for NHDLine layer

    • area for NHDArea layer

    • waterbody for NHDWaterbody layer

    • catchment for NHDPlusCatchment layer

    • boundary_unit for NHDPlusBoundaryUnit layer

    • huc12 for WBDHU12 layer

  • outfields (str or list, optional) – Target field name(s), default to “*” i.e., all the fields.

  • crs (str, int, or pyproj.CRS, optional) – Target spatial reference, default to EPSG:4326.

bygeom(geom, geo_crs=4326, sql_clause='', distance=None, return_m=False, return_geom=True)#

Get features within a geometry that can be combined with a SQL where clause.

byids(field, fids, return_m=False, return_geom=True)#

Get features by object IDs.

bysql(sql_clause, return_m=False, return_geom=True)#

Get features using a valid SQL 92 WHERE clause.

class pynhd.pynhd.NLDI#

Access the Hydro Network-Linked Data Index (NLDI) service.

comid_byloc(coords, loc_crs=4326)#

Get the closest ComID based on coordinates using hydrolocation endpoint.

Notes

This function tries to find the closest ComID based on flowline grid cells. If such a cell is not found, it will return the closest ComID using the flowtrace endpoint of the PyGeoAPI service to find the closest downstream ComID. The returned dataframe has a measure column that indicates the location of the input coordinate on the flowline as a percentage of the total flowline length.

Parameters:
  • coords (tuple or list of tuples) – A tuple of length two (x, y) or a list of them.

  • loc_crs (str, int, or pyproj.CRS, optional) – The spatial reference of the input coordinate, defaults to EPSG:4326.

Returns:

geopandas.GeoDataFrame or (geopandas.GeoDataFrame, list) – NLDI indexed ComID(s) and points in EPSG:4326. If some coords don’t return any ComID, a list of the missing coords is returned as well.

Return type:

geopandas.GeoDataFrame

feature_byloc(coords, loc_crs=4326)#

Get the closest feature ID(s) based on coordinates using position endpoint.

Parameters:
  • coords (tuple or list) – A tuple of length two (x, y) or a list of them.

  • loc_crs (str, int, or pyproj.CRS, optional) – The spatial reference of the input coordinate, defaults to EPSG:4326.

Returns:

geopandas.GeoDataFrame or (geopandas.GeoDataFrame, list) – NLDI indexed feature ID(s) and flowlines in EPSG:4326. If some coords don’t return any IDs, a list of the missing coords is returned as well.

Return type:

geopandas.GeoDataFrame

get_basins(feature_ids, fsource='nwissite', split_catchment=False, simplified=True)#

Get basins for a list of station IDs.

Parameters:
  • feature_ids (str or list) – Target feature ID(s).

  • fsource (str) – The name of feature(s) source, defaults to nwissite. The valid sources are:

    • ‘comid’ for NHDPlus comid.

    • ‘ca_gages’ for Streamgage catalog for CA SB19

    • ‘gfv11_pois’ for USGS Geospatial Fabric V1.1 Points of Interest

    • ‘huc12pp’ for HUC12 Pour Points

    • ‘nmwdi-st’ for New Mexico Water Data Initiative Sites

    • ‘nwisgw’ for NWIS Groundwater Sites

    • ‘nwissite’ for NWIS Surface Water Sites

    • ‘ref_gage’ for geoconnex.us reference gages

    • ‘vigil’ for Vigil Network Data

    • ‘wade’ for Water Data Exchange 2.0 Sites

    • ‘WQP’ for Water Quality Portal

  • split_catchment (bool, optional) – If True, split basins at their outlet locations. Default to False.

  • simplified (bool, optional) – If True, return a simplified version of basin geometries. Default to True.

Returns:

geopandas.GeoDataFrame or (geopandas.GeoDataFrame, list) – NLDI indexed basins in EPSG:4326. If some IDs don’t return any features, a list of the missing ID(s) is returned as well.

Return type:

geopandas.GeoDataFrame
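
Examples

A minimal sketch, using a USGS streamgage ID with the default nwissite source:

>>> from pynhd import NLDI
>>> basin = NLDI().get_basins("01031500")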

getcharacteristic_byid(feature_ids: str | int | Sequence[str | int], char_type: str, fsource: str = ..., char_ids: str | list[str] = ..., values_only: Literal[True] = ...) pandas.DataFrame#
getcharacteristic_byid(feature_ids: str | int | Sequence[str | int], char_type: str, fsource: str = ..., char_ids: str | list[str] = ..., values_only: Literal[False] = ...) tuple[pandas.DataFrame, pandas.DataFrame]

Get characteristics using a list of ComIDs.

Parameters:
  • feature_ids (str or list) – Target feature ID(s).

  • char_type (str) – Type of the characteristic. Valid values are local for individual reach catchments, tot for network-accumulated values using total cumulative drainage area, and div for network-accumulated values using divergence-routed drainage area.

  • fsource (str, optional) – The name of feature(s) source, defaults to comid. The valid sources are:

    • ‘comid’ for NHDPlus comid.

    • ‘ca_gages’ for Streamgage catalog for CA SB19

    • ‘gfv11_pois’ for USGS Geospatial Fabric V1.1 Points of Interest

    • ‘huc12pp’ for HUC12 Pour Points

    • ‘nmwdi-st’ for New Mexico Water Data Initiative Sites

    • ‘nwisgw’ for NWIS Groundwater Sites

    • ‘nwissite’ for NWIS Surface Water Sites

    • ‘ref_gage’ for geoconnex.us reference gages

    • ‘vigil’ for Vigil Network Data

    • ‘wade’ for Water Data Exchange 2.0 Sites

    • ‘WQP’ for Water Quality Portal

  • char_ids (str or list, optional) – Name(s) of the target characteristics, default to all.

  • values_only (bool, optional) – Whether to return only characteristic_value as a series, defaults to True. If set to False, percent_nodata is returned as well.

Returns:

pandas.DataFrame or tuple of pandas.DataFrame – Either only characteristic_value as a dataframe or, if values_only is False, percent_nodata as well.

getfeature_byid(fsource, fids)#

Get feature(s) based ID(s).

Parameters:
  • fsource (str) – The name of feature(s) source. The valid sources are:

    • ‘comid’ for NHDPlus comid.

    • ‘ca_gages’ for Streamgage catalog for CA SB19

    • ‘gfv11_pois’ for USGS Geospatial Fabric V1.1 Points of Interest

    • ‘huc12pp’ for HUC12 Pour Points

    • ‘nmwdi-st’ for New Mexico Water Data Initiative Sites

    • ‘nwisgw’ for NWIS Groundwater Sites

    • ‘nwissite’ for NWIS Surface Water Sites

    • ‘ref_gage’ for geoconnex.us reference gages

    • ‘vigil’ for Vigil Network Data

    • ‘wade’ for Water Data Exchange 2.0 Sites

    • ‘WQP’ for Water Quality Portal

  • fids (str or list of str) – Feature ID(s).

Returns:

geopandas.GeoDataFrame or (geopandas.GeoDataFrame, list) – NLDI indexed features in EPSG:4326. If some IDs don’t return any features, a list of the missing ID(s) is returned as well.

Return type:

geopandas.GeoDataFrame

navigate_byid(fsource, fid, navigation, source, distance=500, trim_start=False, stop_comid=None)#

Navigate the NHDPlus database from a single feature id up to a distance.

Parameters:
  • fsource (str) – The name of feature(s) source. The valid sources are:

    • ‘comid’ for NHDPlus comid.

    • ‘ca_gages’ for Streamgage catalog for CA SB19

    • ‘gfv11_pois’ for USGS Geospatial Fabric V1.1 Points of Interest

    • ‘huc12pp’ for HUC12 Pour Points

    • ‘nmwdi-st’ for New Mexico Water Data Initiative Sites

    • ‘nwisgw’ for NWIS Groundwater Sites

    • ‘nwissite’ for NWIS Surface Water Sites

    • ‘ref_gage’ for geoconnex.us reference gages

    • ‘vigil’ for Vigil Network Data

    • ‘wade’ for Water Data Exchange 2.0 Sites

    • ‘WQP’ for Water Quality Portal

  • fid (str or int) – The ID of the feature.

  • navigation (str) – The navigation method.

  • source (str) – Return the data from another source after navigating features from fsource.

  • distance (int, optional) – Limit the search for navigation up to a distance in km, defaults to 500 km. Note that this is an expensive request, so be mindful of the value that you provide. The value must be between 1 and 9999 km.

  • trim_start (bool, optional) – If True, trim the starting flowline at the source feature, defaults to False.

  • stop_comid (str or int, optional) – The ComID to stop the navigation, defaults to None.

Returns:

geopandas.GeoDataFrame – NLDI indexed features in EPSG:4326.

Return type:

geopandas.GeoDataFrame
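
Examples

A minimal sketch, assuming upstreamTributaries is the navigation method and flowlines is the target source:

>>> from pynhd import NLDI
>>> flw = NLDI().navigate_byid(
...     "nwissite", "USGS-01031500", "upstreamTributaries", "flowlines", distance=100
... )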

navigate_byloc(coords, navigation=None, source=None, loc_crs=4326, distance=500, trim_start=False, stop_comid=None)#

Navigate the NHDPlus database from a coordinate.

Notes

This function first calls the feature_byloc function to get the comid of the nearest flowline and then calls the navigate_byid function to get the features from the obtained comid.

Parameters:
  • coords (tuple) – A tuple of length two (x, y).

  • navigation (str, optional) – The navigation method, defaults to None. A navigation method must be provided; otherwise an exception is raised.

  • source (str, optional) – Return the data from another source after navigating the features based on comid, defaults to None. A source must be provided; otherwise an exception is raised.

  • loc_crs (str, int, or pyproj.CRS, optional) – The spatial reference of the input coordinate, defaults to EPSG:4326.

  • distance (int, optional) – Limit the search for navigation up to a distance in km, defaults to 500 km. Note that this is an expensive request, so be mindful of the value that you provide.

  • trim_start (bool, optional) – If True, trim the starting flowline at the source feature, defaults to False.

  • stop_comid (str or int, optional) – The ComID to stop the navigation, defaults to None.

Returns:

geopandas.GeoDataFrame – NLDI indexed features in EPSG:4326.

Return type:

geopandas.GeoDataFrame

class pynhd.pynhd.PyGeoAPI#

Access PyGeoAPI service.

cross_section(coord, width, numpts, crs=4326)#

Return a GeoDataFrame from the xsatpoint service.

Parameters:
  • coord (tuple) – The coordinate of the point to extract the cross-section as a tuple, e.g., (lon, lat).

  • width (float) – The width of the cross-section in meters.

  • numpts (int) – The number of points to extract the cross-section from the DEM.

  • crs (str, int, or pyproj.CRS, optional) – The coordinate reference system of the coordinates, defaults to EPSG:4326.

Returns:

geopandas.GeoDataFrame – A GeoDataFrame containing the cross-section at the requested point.

Return type:

geopandas.GeoDataFrame

Examples

>>> from pynhd import PyGeoAPI
>>> pga = PyGeoAPI()
>>> gdf = pga.cross_section((-103.80119, 40.2684), width=1000.0, numpts=101, crs=4326)  
>>> print(gdf.iloc[-1, 1])  
1000.0
elevation_profile(line, numpts, dem_res, crs=4326)#

Return a GeoDataFrame from the xsatpathpts service.

Parameters:
  • line (shapely.LineString or shapely.MultiLineString) – The line to extract the elevation profile for.

  • numpts (int) – The number of points to extract the elevation profile from the DEM.

  • dem_res (int) – The target resolution for requesting the DEM from 3DEP service.

  • crs (str, int, or pyproj.CRS, optional) – The coordinate reference system of the coordinates, defaults to EPSG:4326.

Returns:

geopandas.GeoDataFrame – A GeoDataFrame containing the elevation profile along the given line.

Return type:

geopandas.GeoDataFrame

Examples

>>> from pynhd import PyGeoAPI
>>> from shapely import LineString
>>> pga = PyGeoAPI()
>>> line = LineString([(-103.801086, 40.26772), (-103.80097, 40.270568)])
>>> gdf = pga.elevation_profile(line, 101, 1, 4326)  
>>> print(gdf.iloc[-1, 2])  
1299.8727
endpoints_profile(coords, numpts, dem_res, crs=4326)#

Return a GeoDataFrame from the xsatendpts service.

Parameters:
  • coords (list) – A list of two coordinates to trace as a list of tuples, e.g., [(x1, y1), (x2, y2)].

  • numpts (int) – The number of points to extract the elevation profile from the DEM.

  • dem_res (int) – The target resolution for requesting the DEM from 3DEP service.

  • crs (str, int, or pyproj.CRS, optional) – The coordinate reference system of the coordinates, defaults to EPSG:4326.

Returns:

geopandas.GeoDataFrame – A GeoDataFrame containing the elevation profile along the requested endpoints.

Return type:

geopandas.GeoDataFrame

Examples

>>> from pynhd import PyGeoAPI
>>> pga = PyGeoAPI()
>>> gdf = pga.endpoints_profile(
...     [(-103.801086, 40.26772), (-103.80097, 40.270568)], numpts=101, dem_res=1, crs=4326
... )  
>>> print(gdf.iloc[-1, 1])  
411.5906
flow_trace(coord, crs=4326, direction='none')#

Return a GeoDataFrame from the flowtrace service.

Parameters:
  • coord (tuple) – The coordinate of the point to trace as a tuple, e.g., (lon, lat).

  • crs (str, int, or pyproj.CRS, optional) – The coordinate reference system of the coordinates, defaults to EPSG:4326.

  • direction (str, optional) – The direction of flowpaths, either down, up, or none. Defaults to none.

Returns:

geopandas.GeoDataFrame – A GeoDataFrame containing the traced flowline.

Return type:

geopandas.GeoDataFrame

Examples

>>> from pynhd import PyGeoAPI
>>> pga = PyGeoAPI()
>>> gdf = pga.flow_trace(
...     (1774209.63, 856381.68), crs="ESRI:102003", direction="none"
... )  
>>> print(gdf.comid.iloc[0])  
22294818
split_catchment(coord, crs=4326, upstream=False)#

Return a GeoDataFrame from the splitcatchment service.

Parameters:
  • coord (tuple) – The coordinate of the point to trace as a tuple, e.g., (lon, lat).

  • crs (str, int, or pyproj.CRS, optional) – The coordinate reference system of the coordinates, defaults to EPSG:4326.

  • upstream (bool, optional) – If True, return all upstream catchments rather than just the local catchment, defaults to False.

Returns:

geopandas.GeoDataFrame – A GeoDataFrame containing the local catchment or the entire upstream catchments.

Return type:

geopandas.GeoDataFrame

Examples

>>> from pynhd import PyGeoAPI
>>> pga = PyGeoAPI()
>>> gdf = pga.split_catchment((-73.82705, 43.29139), crs=4326, upstream=False)  
>>> print(gdf.catchmentID.iloc[0])  
22294818
class pynhd.pynhd.WaterData(layer, crs=4326, validation=True)#

Access to WaterData service.

Parameters:
  • layer (str) – A valid layer from the WaterData service. Valid layers are:

    • catchmentsp

    • gagesii

    • gagesii_basins

    • nhdarea

    • nhdflowline_network

    • nhdflowline_nonnetwork

    • nhdwaterbody

    • wbd02

    • wbd04

    • wbd06

    • wbd08

    • wbd10

    • wbd12

    Note that the layers’ namespace for the WaterData service is wmadata; this namespace will be prepended to the given layer argument if it is not already provided.

  • crs (str, int, or pyproj.CRS, optional) – The target spatial reference system, defaults to epsg:4326.

  • validation (bool, optional) – Whether to validate the input data, defaults to True.

bybox(bbox, box_crs=4326, sort_attr=None)#

Get features within a bounding box.

Parameters:
  • bbox (tuple of floats) – A bounding box in the form of (minx, miny, maxx, maxy).

  • box_crs (str, int, or pyproj.CRS, optional) – The spatial reference system of the bounding box, defaults to epsg:4326.

  • sort_attr (str, optional) – The column name in the database to sort the request by, defaults to the first attribute in the schema that contains id in its name.

Returns:

geopandas.GeoDataFrame – The requested features as a GeoDataFrame.

Return type:

geopandas.GeoDataFrame
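
Examples

A minimal sketch for subsetting the catchmentsp layer by a bounding box:

>>> from pynhd import WaterData
>>> wd = WaterData("catchmentsp")
>>> cat = wd.bybox((-69.77, 45.07, -69.31, 45.45))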

bydistance(coords, distance, loc_crs=4326, sort_attr=None)#

Get features within a radius (in meters) of a point.

Parameters:
  • coords (tuple of float) – The x, y coordinates of the point.

  • distance (int) – The radius (in meters) to search within.

  • loc_crs (str, int, or pyproj.CRS, optional) – The CRS of the input coordinates, default to epsg:4326.

  • sort_attr (str, optional) – The column name in the database to sort the request by, defaults to the first attribute in the schema that contains id in its name.

Returns:

geopandas.GeoDataFrame – Requested features as a GeoDataFrame.

Return type:

geopandas.GeoDataFrame

byfilter(cql_filter, method='GET', sort_attr=None)#

Get features based on a CQL filter.

Parameters:
  • cql_filter (str) – The CQL filter to use for requesting the data.

  • method (str, optional) – The HTTP method to use for requesting the data, defaults to GET. Allowed methods are GET and POST.

  • sort_attr (str, optional) – The column name in the database to sort the request by, defaults to the first attribute in the schema that contains id in its name.

Returns:

geopandas.GeoDataFrame – The requested features as a GeoDataFrame.

Return type:

geopandas.GeoDataFrame
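
Examples

A minimal sketch, assuming the wbd12 layer exposes a huc12 attribute to filter on:

>>> from pynhd import WaterData
>>> wd = WaterData("wbd12")
>>> huc12s = wd.byfilter("huc12 LIKE '1703000110%'")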

bygeom(geometry, geo_crs=4326, xy=True, predicate='intersects', sort_attr=None)#

Get features within a geometry.

Parameters:
  • geometry (shapely.Polygon or shapely.MultiPolygon) – The input (multi)polygon to request the data.

  • geo_crs (str, int, or pyproj.CRS, optional) – The CRS of the input geometry, default to epsg:4326.

  • xy (bool, optional) – Whether axis order of the input geometry is xy or yx.

  • predicate (str, optional) – The geometric predicate to use for requesting the data, defaults to intersects. Valid predicates are:

    • equals

    • disjoint

    • intersects

    • touches

    • crosses

    • within

    • contains

    • overlaps

    • relate

    • beyond

  • sort_attr (str, optional) – The column name in the database to sort the request by, defaults to the first attribute in the schema that contains id in its name.

Returns:

geopandas.GeoDataFrame – The requested features in the given geometry.

Return type:

geopandas.GeoDataFrame

byid(featurename, featureids)#

Get features based on IDs.

pynhd.pynhd.pygeoapi(geodf, service)#

Return a GeoDataFrame from one of the supported PyGeoAPI services.

Parameters:
  • geodf (geopandas.GeoDataFrame) – A GeoDataFrame containing geometries to query. The required columns for each service are:

    • flow_trace: direction that indicates the direction of the flow trace. It can be up, down, or none (both directions).

    • split_catchment: upstream that indicates whether to return all upstream catchments or just the local catchment.

    • elevation_profile: numpts that indicates the number of points to extract along the flowpath and 3dep_res that indicates the target resolution for requesting the DEM from 3DEP service.

    • endpoints_profile: numpts that indicates the number of points to extract along the flowpath and 3dep_res that indicates the target resolution for requesting the DEM from 3DEP service.

    • cross_section: numpts that indicates the number of points to extract along the flowpath and width that indicates the width of the cross-section in meters.

  • service (str) – The service to query, can be flow_trace, split_catchment, elevation_profile, endpoints_profile, or cross_section.

Returns:

geopandas.GeoDataFrame – A GeoDataFrame containing the results of requested service.

Return type:

geopandas.GeoDataFrame

Examples

>>> from shapely import Point
>>> import geopandas as gpd
>>> import pynhd
>>> gdf = gpd.GeoDataFrame(
...     {
...         "direction": [
...             "none",
...         ]
...     },
...     geometry=[Point((1774209.63, 856381.68))],
...     crs="ESRI:102003",
... )
>>> trace = pynhd.pygeoapi(gdf, "flow_trace")
>>> print(trace.comid.iloc[0])
22294818

Package Contents#

pygeohydro#

Top-level package for PyGeoHydro.

Submodules#

pygeohydro.helpers#

Some helper functions for PyGeoHydro.

Module Contents#
pygeohydro.helpers.get_us_states(subset_key=None)#

Get US states as a GeoDataFrame from the Census’ TIGER/Line 2023 database.

Parameters:

subset_key (str or list of str, optional) – Key to subset the geometries instead of returning all states, by default all states are returned. Valid keys are:

  • contiguous or conus

  • continental

  • commonwealths

  • territories

  • Two letter state codes, e.g., ["TX", "CA", "FL", ...]

Returns:

geopandas.GeoDataFrame – GeoDataFrame of requested US states.

Return type:

geopandas.GeoDataFrame
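
Examples

A minimal sketch for getting the contiguous states and a two-state subset:

>>> from pygeohydro import helpers
>>> conus = helpers.get_us_states("conus")
>>> tx_ca = helpers.get_us_states(["TX", "CA"])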

pygeohydro.helpers.nlcd_helper()#

Get legends and properties of the NLCD cover dataset.

Returns:

dict – Years where data is available and cover classes and categories, and roughness estimations.

Return type:

dict[str, Any]

pygeohydro.helpers.nwis_errors()#

Get error code lookup table for USGS sites that have daily values.

pygeohydro.helpers.states_lookup_table()#

Get codes and names of US states and their counties.

Notes

This function is based on a file prepared by developers of an R package called dataRetrieval.

Returns:

dict of str to StateCounties – State and county codes and names, keyed by two-letter state code.

Return type:

dict[str, StateCounties]

pygeohydro.nfhl#

Accessing National Flood Hazard Layers (NFHL) through web services.

Module Contents#
class pygeohydro.nfhl.NFHL(service, layer, outfields='*', crs=4326)#

Access National Flood Hazard Layers (NFHL).

Parameters:
  • service (str) – The service type. Valid services are:

    • NFHL: Effective National Flood Hazard Layers

    • Prelim_CSLF: Preliminary Changes Since Last Firm (CSLF)

    • Draft_CSLF: Draft Changes Since Last Firm (CSLF)

    • Prelim_NFHL: Preliminary National Flood Hazard Layers

    • Pending_NFHL: Pending National Flood Hazard Layers

    • Draft_NFHL: Draft National Flood Hazard Layers

  • layer (str) – A valid service layer. Valid layers are service specific:

    • NFHL: nfhl availability, firm panels, lomrs, lomas, political jurisdictions, profile baselines, water lines, cross-sections, base flood elevations, levees, seclusion boundaries, coastal transects, transect baselines, general structures, river mile markers, water areas, plss, limit of moderate wave action, flood hazard boundaries, flood hazard zones, primary frontal dunes, base index, topographic low confidence areas, datum conversion points, coastal gages, gages, nodes, high water marks, station start points, hydrologic reaches, alluvial fans, and subbasins

    • Prelim_CSLF: preliminary, coastal high hazard area change, floodway change, special flood hazard area change, and non-special flood hazard area change

    • Draft_CSLF: draft, coastal high hazard area change, floodway change, special flood hazard area change, and non-special flood hazard area change

    • Prelim_NFHL: preliminary data availability, preliminary firm panel index, preliminary plss, preliminary topographic low confidence areas, preliminary river mile markers, preliminary datum conversion points, preliminary coastal gages, preliminary gages, preliminary nodes, preliminary high water marks, preliminary station start points, preliminary cross-sections, preliminary coastal transects, preliminary base flood elevations, preliminary profile baselines, preliminary transect baselines, preliminary limit of moderate wave action, preliminary water lines, preliminary political jurisdictions, preliminary levees, preliminary general structures, preliminary primary frontal dunes, preliminary hydrologic reaches, preliminary flood hazard boundaries, preliminary flood hazard zones, preliminary submittal information, preliminary alluvial fans, preliminary subbasins, and preliminary water areas

    • Pending_NFHL: pending submittal information, pending water areas, pending firm panel index, pending data availability, pending firm panels, pending political jurisdictions, pending profile baselines, pending water lines, pending cross-sections, pending base flood elevations, pending levees, pending seclusion boundaries, pending coastal transects, pending transect baselines, pending general structures, pending river mile markers, pending plss, pending limit of moderate wave action, pending flood hazard boundaries, pending flood hazard zones, pending primary frontal dunes, pending topographic low confidence areas, pending datum conversion points, pending coastal gages, pending gages, pending nodes, pending high water marks, pending station start points, pending hydrologic reaches, pending alluvial fans, and pending subbasins

    • Draft_NFHL: draft data availability, draft firm panels, draft political jurisdictions, draft profile baselines, draft water lines, draft cross-sections, draft base flood elevations, draft levees, draft submittal info, draft coastal transects, draft transect baselines, draft general structures, draft limit of moderate wave action, draft flood hazard boundaries, and draft flood hazard zones

  • outfields (str or list, optional) – Target field name(s), default to “*” i.e., all the fields.

  • crs (str, int, or pyproj.CRS, optional) – Target spatial reference of output, default to EPSG:4326.

Examples

>>> from pygeohydro import NFHL
>>> nfhl = NFHL("NFHL", "cross-sections")
>>> gdf_xs = nfhl.bygeom((-73.42, 43.28, -72.9, 43.52), geo_crs=4269)

bygeom(geom, geo_crs=4326, sql_clause='', distance=None, return_m=False, return_geom=True)#

Get features within a geometry that can be combined with a SQL where clause.

byids(field, fids, return_m=False, return_geom=True)#

Get features by object IDs.

bysql(sql_clause, return_m=False, return_geom=True)#

Get features using a valid SQL 92 WHERE clause.

property valid_services: dict[str, str]#

A dictionary of valid services and their URLs.

pygeohydro.nlcd#

Accessing data from the supported databases through their APIs.

Module Contents#
pygeohydro.nlcd.cover_statistics(cover_da)#

Percentages of the categorical NLCD cover data.

Parameters:

cover_da (xarray.DataArray) – Land cover DataArray from a LULC Dataset from the nlcd_bygeom function.

Returns:

Stats – A named tuple with the percentages of the cover classes and categories.

Return type:

pygeohydro.helpers.Stats

pygeohydro.nlcd.nlcd_area_percent(geo_df, year=2019, region='L48')#

Compute the area percentages of the natural, developed, and impervious areas.

Notes

This function uses imperviousness and land use/land cover data from NLCD to compute the area percentages of the natural, developed, and impervious areas. It considers land cover classes 21 to 24 as urban and the rest as natural. It then uses the imperviousness percentage to partition the urban area into developed and impervious areas, so urban = developed + impervious and natural + urban = natural + developed + impervious = 100.

Parameters:
  • geo_df (geopandas.GeoDataFrame or geopandas.GeoSeries) – A GeoDataFrame or GeoSeries with the geometry to query. The indices are preserved in the output dataframe.

  • year (int, optional) – Year of the NLCD data, defaults to 2019. Available years are 2021, 2019, 2016, 2013, 2011, 2008, 2006, 2004, and 2001.

  • region (str, optional) – Region in the US that the input geometries are located, defaults to L48. Valid values are L48 (for CONUS), HI (for Hawaii), AK (for Alaska), and PR (for Puerto Rico). Both lower and upper cases are acceptable.

Returns:

pandas.DataFrame – A dataframe with the same index as the input geo_df whose columns are the area percentages of the natural, developed, impervious, and urban (sum of developed and impervious) areas. The sum of the urban and natural percentages is always 100, as is the sum of the natural, developed, and impervious percentages.

Return type:

pandas.DataFrame

pygeohydro.nlcd.nlcd_bycoords(coords, years=None, region='L48', ssl=True)#

Get data from NLCD database (2019).

Parameters:
  • coords (list of tuple) – List of coordinates in the form of (longitude, latitude).

  • years (dict, optional) – The years for NLCD layers as a dictionary, defaults to {'impervious': [2019], 'cover': [2019], 'canopy': [2019], "descriptor": [2019]}. Layers that are not in years are ignored, e.g., {'cover': [2016, 2019]} returns land cover data for 2016 and 2019.

  • region (str, optional) – Region in the US that the input geometries are located, defaults to L48. Valid values are L48 (for CONUS), HI (for Hawaii), AK (for Alaska), and PR (for Puerto Rico). Both lower and upper cases are acceptable.

  • ssl (bool, optional) – Whether to use SSL for the connection, defaults to True.

Returns:

geopandas.GeoDataFrame – A GeoDataFrame with the NLCD data and the coordinates.

Return type:

geopandas.GeoDataFrame

pygeohydro.nlcd.nlcd_bygeom(geometry, resolution=30, years=None, region='L48', crs=4326, ssl=True)#

Get data from NLCD database (2019).

Parameters:
  • geometry (geopandas.GeoDataFrame or geopandas.GeoSeries) – A GeoDataFrame or GeoSeries with the geometry to query. The indices are used as keys in the output dictionary.

  • resolution (float, optional) – The data resolution in meters. The width and height of the output are computed in pixel based on the geometry bounds and the given resolution. The default is 30 m which is the native resolution of NLCD data.

  • years (dict, optional) – The years for NLCD layers as a dictionary, defaults to {'impervious': [2019], 'cover': [2019], 'canopy': [2019], "descriptor": [2019]}. Layers that are not in years are ignored, e.g., {'cover': [2016, 2019]} returns land cover data for 2016 and 2019.

  • region (str, optional) – Region in the US that the input geometries are located, defaults to L48. Valid values are L48 (for CONUS), HI (for Hawaii), AK (for Alaska), and PR (for Puerto Rico). Both lower and upper cases are acceptable.

  • crs (str, int, or pyproj.CRS, optional) – The spatial reference system to be used for requesting the data, defaults to epsg:4326.

  • ssl (bool, optional) – Whether to use SSL for the connection, defaults to True.

Returns:

dict of xarray.Dataset or xarray.Dataset – A single or a dict of NLCD datasets. If dict, the keys are indices of the input GeoDataFrame.

Return type:

dict[int | str, xarray.Dataset]
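
Examples

A minimal sketch that chains nlcd_bygeom with cover_statistics and overland_roughness; the cover_2019 variable name is an assumption about how the returned dataset labels its layers:

>>> import geopandas as gpd
>>> from pygeohydro import nlcd
>>> from shapely import Point
>>> geom = gpd.GeoSeries([Point(-69.77, 45.07).buffer(0.05)], crs=4326)
>>> lulc = nlcd.nlcd_bygeom(geom, resolution=30, years={"cover": [2019]})
>>> stats = nlcd.cover_statistics(lulc[0].cover_2019)  # assumed variable name
>>> roughness = nlcd.overland_roughness(lulc[0].cover_2019)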

pygeohydro.nlcd.overland_roughness(cover_da)#

Estimate overland roughness from land cover data.

Parameters:

cover_da (xarray.DataArray) – Land cover DataArray from a LULC Dataset from the nlcd_bygeom function.

Returns:

xarray.DataArray – Overland roughness

Return type:

xarray.DataArray

pygeohydro.nwis#

Accessing NWIS.

Module Contents#
class pygeohydro.nwis.NWIS#

Access NWIS web service.

Notes

More information about query parameters and codes that NWIS accepts can be found at its help webpage.

classmethod get_info(queries, expanded=False, fix_names=True, nhd_info=False)#

Send multiple queries to USGS Site Web Service.

Parameters:
  • queries (dict or list of dict) – A single or a list of valid queries.

  • expanded (bool, optional) – Whether to get expanded site information, for example drainage area, defaults to False.

  • fix_names (bool, optional) – If True, reformat station names and some small annoyances, defaults to True.

  • nhd_info (bool, optional) – If True, get NHD information for each site, defaults to False. This will add four new columns: nhd_comid, nhd_areasqkm, nhd_reachcode, and nhd_measure, where nhd_comid is the NHD COMID of the flowline that the site is located on, nhd_reachcode is the NHD Reach Code that the site is located in, and nhd_measure is the measure along the flowline at which the site is located.

Returns:

geopandas.GeoDataFrame – A correctly typed GeoDataFrame containing site(s) information.

Return type:

geopandas.GeoDataFrame

classmethod get_parameter_codes(keyword)#

Search for parameter codes by name or number.

Notes

NWIS guideline for keywords is as follows:

By default an exact search is made. To make a partial search, prefix and suffix the term with a % sign, which matches zero or more characters at that location. For example, to find all parameters containing “discharge”, enter %discharge% in the field.

Parameters:

keyword (str) – Keyword to search for parameters by name or number.

Returns:

pandas.DataFrame – Matched parameter codes as a dataframe with their description.

Return type:

pandas.DataFrame

Examples

>>> from pygeohydro import NWIS
>>> nwis = NWIS()
>>> codes = nwis.get_parameter_codes("%discharge%")
>>> codes.loc[codes.parameter_cd == "00060", "parm_nm"].iloc[0]
'Discharge, cubic feet per second'
classmethod get_streamflow(station_ids: Sequence[str] | str, dates: tuple[str, str], freq: str = 'dv', mmd: bool = False, to_xarray: Literal[False] = ...) pandas.DataFrame#
classmethod get_streamflow(station_ids: Sequence[str] | str, dates: tuple[str, str], freq: str = 'dv', mmd: bool = False, to_xarray: Literal[True] = ...) xarray.Dataset

Get mean daily streamflow observations from USGS.

Parameters:
  • station_ids (str, list) – The gage ID(s) of the USGS station.

  • dates (tuple) – Start and end dates as a tuple (start, end).

  • freq (str, optional) – The frequency of the streamflow data, defaults to dv (daily values). Valid frequencies are dv (daily values) and iv (instantaneous values). Note that for iv the time zone for the input dates is assumed to be UTC.

  • mmd (bool, optional) – Convert cms to mm/day based on the contributing drainage area of the stations. Defaults to False.

  • to_xarray (bool, optional) – Whether to return a xarray.Dataset. Defaults to False.

Returns:

pandas.DataFrame or xarray.Dataset – Streamflow data observations in cubic meter per second (cms). The stations that don’t provide the requested discharge data in the target period will be dropped. Note that when frequency is set to iv the time zone is converted to UTC.
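
Examples

A minimal sketch for one year of daily values at a single gage:

>>> from pygeohydro import NWIS
>>> qobs = NWIS().get_streamflow("01031500", ("2000-01-01", "2000-12-31"))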

static retrieve_rdb(url, payloads)#

Retrieve and process requests with RDB format.

Parameters:
  • url (str) – Name of USGS REST service, valid values are site, dv, iv, gwlevels, and stat. Please consult USGS documentation here for more information.

  • payloads (list of dict) – List of target payloads.

Returns:

pandas.DataFrame – Requested features as a pandas’s DataFrame.

Return type:

pandas.DataFrame

pygeohydro.nwis.streamflow_fillna(streamflow, missing_max=5)#

Fill missing data (NAN) in daily streamflow observations.

It drops stations with more than missing_max days of missing data per year. Missing data in the remaining stations are filled with the day-of-year average over the entire dataset.

Parameters:
  • streamflow (xarray.DataArray or pandas.DataFrame or pandas.Series) – Daily streamflow observations with at least 10 years of daily data.

  • missing_max (int) – Maximum allowed number of missing daily data per year for filling, defaults to 5.

Returns:

xarray.DataArray or pandas.DataFrame or pandas.Series – Streamflow observations with missing data filled for stations with less than missing_max days of missing data.

Return type:

ArrayLike

pygeohydro.plot#

Plot hydrological signatures.

Plots include daily, monthly and annual hydrograph as well as regime curve (monthly mean) and flow duration curve.

Module Contents#
pygeohydro.plot.prepare_plot_data(daily)#

Generate structured data for plotting hydrologic signatures.

Parameters:

daily (pandas.Series or pandas.DataFrame) – The data to be processed

Returns:

PlotDataType – A named tuple containing daily, mean_monthly, ranked, titles, and units fields.

Return type:

PlotDataType

pygeohydro.plot.signatures(discharge, precipitation=None, title=None, figsize=None, output=None, close=False)#

Plot hydrological signatures w/ or w/o precipitation.

The plots include the daily hydrograph, regime curve (mean monthly), and flow duration curve. The input discharges are converted from cms to mm/day based on the watershed area, if provided.

Parameters:
  • discharge (pd.DataFrame or pd.Series) – The streamflows in mm/day. The column names are used as labels on the plot and the column values should be daily streamflow.

  • precipitation (pd.Series, optional) – Daily precipitation time series in mm/day. If given, the data is plotted on the second x-axis at the top.

  • title (str, optional) – The plot supertitle.

  • figsize (tuple, optional) – The figure size in inches, defaults to (9, 5).

  • output (str, optional) – Path to save the plot as png, defaults to None which means the plot is not saved to a file.

  • close (bool, optional) – Whether to close the figure.
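
Examples

A minimal sketch that pairs the plot with the NWIS streamflow example above:

>>> from pygeohydro import NWIS, plot
>>> qobs = NWIS().get_streamflow("01031500", ("2000-01-01", "2000-12-31"))
>>> plot.signatures(qobs, title="USGS-01031500", output="signatures.png")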

pygeohydro.pygeohydro#

Accessing data from the supported databases through their APIs.

Module Contents#
class pygeohydro.pygeohydro.EHydro(data_type='points')#

Access USACE Hydrographic Surveys (eHydro).

Notes

For more info visit: https://navigation.usace.army.mil/Survey/Hydro

Parameters:

data_type (str, optional) – Type of the survey data to retrieve, defaults to points. Note that the points data type gets the best available point cloud data, i.e., if SurveyPointHD is available, it will be returned, otherwise SurveyPoint will be returned. Available types are:

  • points: Point clouds

  • outlines: Polygons of survey outlines

  • contours: Depth contours

  • bathymetry: Bathymetry data

Note that point clouds are not available for all surveys.

bygeom(geom, geo_crs=4326, sql_clause='', distance=None, return_m=False, return_geom=True)#

Get features within a geometry that can be combined with a SQL where clause.

byids(field, fids, return_m=False, return_geom=True)#

Get features by object IDs.

bysql(sql_clause, return_m=False, return_geom=True)#

Get features using a valid SQL 92 WHERE clause.

property survey_grid: geopandas.GeoDataFrame#

Full survey availability on hexagonal grid cells of 35 km resolution.

class pygeohydro.pygeohydro.NID#

Retrieve data from the National Inventory of Dams web service.

property df#

Entire NID inventory (csv version) as a pandas.DataFrame.

property gdf#

Entire NID inventory (gpkg version) as a geopandas.GeoDataFrame.

property nid_inventory_path: pathlib.Path#

Path to the NID inventory feather file.

get_byfilter(query_list)#

Query dams by filters from the National Inventory of Dams web service.

Parameters:

query_list (list of dict) – List of dictionaries of query parameters. For an exhaustive list of the parameters, use the advanced fields dataframe that can be accessed via NID().fields_meta. Some filters require min/max values such as damHeight and drainageArea. For such filters, the min/max values should be passed like so: {filter_key: ["[min1 max1]", "[min2 max2]"]}.

Returns:

list of geopandas.GeoDataFrame – Query results in the same order as the input query list.

Return type:

list[geopandas.GeoDataFrame]

Examples

>>> from pygeohydro import NID
>>> nid = NID()
>>> query_list = [
...    {"drainageArea": ["[200 500]"]},
...    {"nidId": ["CA01222"]},
... ]
>>> dam_dfs = nid.get_byfilter(query_list)
get_bygeom(geometry, geo_crs)#

Retrieve NID data within a geometry.

Parameters:
  • geometry (Polygon, MultiPolygon, or tuple of length 4) – Geometry or bounding box (west, south, east, north) for extracting the data.

  • geo_crs (str, int, or pyproj.CRS) – The CRS of the input geometry.

Returns:

geopandas.GeoDataFrame – GeoDataFrame of NID data

Return type:

geopandas.GeoDataFrame

Examples

>>> from pygeohydro import NID
>>> nid = NID()
>>> dams = nid.get_bygeom((-69.77, 45.07, -69.31, 45.45), 4326)
get_suggestions(text, context_key=None)#

Get suggestions from the National Inventory of Dams web service.

Notes

This function is useful for exploring and/or narrowing down the filter fields that are needed to query the dams using get_byfilter.

Parameters:
  • text (str) – Text to query for suggestions.

  • context_key (str, optional) – Suggestion context, defaults to None, i.e., all context keys. For a list of valid context keys, see NID().fields_meta.

Returns:

tuple of pandas.DataFrame – The suggestions for the requested text as two DataFrames: the first contains suggestions found in the dams’ properties and the second contains those found in the query fields such as states, huc6, etc.

Return type:

tuple[pandas.DataFrame, pandas.DataFrame]

Examples

>>> from pygeohydro import NID
>>> nid = NID()
>>> dams, contexts = nid.get_suggestions("houston", "city")
inventory_byid(federal_ids)#

Get extra attributes for dams based on their dam ID.

Notes

This function is meant for getting extra attributes for dams: first, use either get_bygeom or get_byfilter to get the basic attributes of the target dams, then use this function with the id column of the returned GeoDataFrame to get their extra attributes.

Parameters:

federal_ids (list of str) – List of the target dam Federal IDs.

Returns:

geopandas.GeoDataFrame – Dams with extra attributes in addition to the standard NID fields that other NID methods return.

Return type:

geopandas.GeoDataFrame

Examples

>>> from pygeohydro import NID
>>> nid = NID()
>>> dams = nid.inventory_byid(['KY01232', 'GA02400', 'NE04081', 'IL55070', 'TN05345'])
stage_nid_inventory(fname=None)#

Download the entire NID inventory data and save to a feather file.

Parameters:

fname (str, pathlib.Path, optional) – The path to the file to save the data to, defaults to ./cache/nid_inventory.feather.

pygeohydro.pygeohydro.get_camels()#

Get streamflow and basin attributes of all 671 stations in the CAMELS dataset.

Notes

For more info on CAMELS visit: https://ral.ucar.edu/solutions/products/camels

Returns:

tuple of geopandas.GeoDataFrame and xarray.Dataset – The first is basin attributes as a geopandas.GeoDataFrame and the second is streamflow data and basin attributes as an xarray.Dataset.

Return type:

tuple[geopandas.GeoDataFrame, xarray.Dataset]

pygeohydro.pygeohydro.soil_gnatsgo(layers, geometry, crs=4326)#

Get US soil data from the gNATSGO dataset.

Notes

This function uses Microsoft’s Planetary Computer service to get the data. The dataset’s description and its supported soil properties can be found at: https://planetarycomputer.microsoft.com/dataset/gnatsgo-rasters

Parameters:
  • layers (list of str or str) – Target layer(s). Available layers can be found at the dataset’s website here.

  • geometry (Polygon, MultiPolygon, or tuple of length 4) – Geometry or bounding box of the region of interest.

  • crs (int, str, or pyproj.CRS, optional) – The input geometry CRS, defaults to epsg:4326.

Returns:

xarray.Dataset – Requested soil properties.

Return type:

xarray.Dataset
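
Examples

A minimal sketch, assuming mukey is one of the available gNATSGO layers and that the function is importable from the package’s top level:

>>> from pygeohydro import soil_gnatsgo
>>> soil = soil_gnatsgo("mukey", (-69.77, 45.07, -69.31, 45.45))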

pygeohydro.pygeohydro.soil_properties(properties='*', soil_dir='cache')#

Get soil properties dataset in the United States from ScienceBase.

Notes

This function downloads the source zip files from ScienceBase, extracts the included .tif files, and returns them as an xarray.Dataset.

Parameters:
  • properties (list of str or str, optional) – Soil properties to extract, default to “*”, i.e., all the properties. Available properties are awc for available water capacity, fc for field capacity, and por for porosity.

  • soil_dir (str or pathlib.Path) – Directory to store the zip files or read them from if they exist, defaults to ./cache.

pygeohydro.pygeohydro.ssebopeta_bycoords(coords, dates, crs=4326)#

Get daily actual ET for a dataframe of coordinates from the SSEBop database, in mm/day.

Parameters:
  • coords (pandas.DataFrame) – A dataframe with id, x, y columns.

  • dates (tuple or list, optional) – Start and end dates as a tuple (start, end) or a list of years [2001, 2010, …].

  • crs (str, int, or pyproj.CRS, optional) – The CRS of the input coordinates, defaults to epsg:4326.

Returns:

xarray.Dataset – Daily actual ET in mm/day as a dataset with time and location_id dimensions. The location_id dimension is the same as the id column in the input dataframe.

Return type:

xarray.Dataset
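
A brief usage sketch (the site ID, coordinates, and dates are illustrative):

>>> import pandas as pd
>>> import pygeohydro as gh
>>> coords = pd.DataFrame({"id": ["site1"], "x": [-69.77], "y": [45.07]})
>>> ds = gh.ssebopeta_bycoords(coords, dates=("2005-10-01", "2005-10-05"))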

pygeohydro.pygeohydro.ssebopeta_bygeom(geometry, dates, geo_crs=4326)#

Get daily actual ET for a region from SSEBop database.

Notes

Since there is still no web service available for subsetting SSEBop, the data first needs to be downloaded for the requested period and is then masked by the region of interest locally. Therefore, it is not as fast as other functions, and the bottleneck could be the download speed.

Parameters:
  • geometry (shapely.Polygon or tuple) – The geometry for downloading and clipping the data. For a tuple bbox, the order should be (west, south, east, north).

  • dates (tuple or list, optional) – Start and end dates as a tuple (start, end) or a list of years [2001, 2010, …].

  • geo_crs (str, int, or pyproj.CRS, optional) – The CRS of the input geometry, defaults to epsg:4326.

Returns:

xarray.DataArray – Daily actual ET within a geometry in mm/day at 1 km resolution

Return type:

xarray.DataArray

pygeohydro.stnfloodevents#

Access the USGS Short-Term Network (STN) via its RESTful API.

Module Contents#
class pygeohydro.stnfloodevents.STNFloodEventData#

Client for the STN Flood Event Data RESTful Service API.

Advantages of using this client are:

  • The user does not need to know the details of RESTful APIs in general and of this API specifically.

  • Parses the data and returns Python objects (e.g., pandas.DataFrame, geopandas.GeoDataFrame) instead of JSON.

  • Convenience functions are offered for data dictionaries.

  • Geo-references the data where applicable.

service_url#

The service URL of the STN Flood Event Data RESTful Service API.

Type:

str

data_dictionary_url#

The data dictionary URL of the STN Flood Event Data RESTful Service API.

Type:

str

service_crs#

The CRS of the data from the service, which is EPSG:4326.

Type:

int

instruments_query_params#

The accepted query parameters for the instruments data type. Accepted values are SensorType, CurrentStatus, States, Event, County, DeploymentType, EventType, EventStatus, and CollectionCondition.

Type:

set

peaks_query_params#

The accepted query parameters for the peaks data type. Accepted values are EndDate, States, Event, StartDate, County, EventType, and EventStatus.

Type:

set

hwms_query_params#

The accepted query parameters for the hwms data type. Accepted values are EndDate, States, Event, StartDate, County, EventType, and EventStatus.

Type:

set

sites_query_params#

The accepted query parameters for the sites data type. Accepted values are OPDefined, HousingTypeOne, NetworkName, HousingTypeSeven, RDGOnly, HWMOnly, Event, SensorOnly, State, SensorType, and HWMSurveyed.

Type:

set

Notes

Point data from the service is assumed to be in the WGS84 coordinate reference system (EPSG:4326).

classmethod data_dictionary(data_type: str, as_dict: Literal[False] = False, async_retriever_kwargs: dict[str, Any] | None = ...) pandas.DataFrame#
classmethod data_dictionary(data_type: str, as_dict: Literal[True] = True, async_retriever_kwargs: dict[str, Any] | None = ...) dict[str, Any]

Retrieve data dictionaries from the STN Flood Event Data API.

Parameters:
  • data_type (str) – The data source from STN Flood Event Data API. It can be instruments, peaks, hwms, or sites.

  • as_dict (bool, default = False) – If True, return the data dictionary as a dictionary. Otherwise, it is returned as a pandas.DataFrame.

  • async_retriever_kwargs (dict, optional) – Additional keyword arguments to pass to async_retriever.retrieve_json(). The url and request_kwds options are already set.

Returns:

pandas.DataFrame or dict – The retrieved data dictionary as pandas.DataFrame or dict.

See also

get_all_data()

Retrieves all data for a given data type.

get_filtered_data()

Retrieves filtered data for a given data type.

Examples

>>> from pygeohydro.stnfloodevents import STNFloodEventData
>>> data = STNFloodEventData.data_dictionary(data_type="instruments", as_dict=False)
>>> data.shape[1]
2
>>> data.columns
Index(['Field', 'Definition'], dtype='object')
classmethod get_all_data(data_type: str, as_list: Literal[False] = False, crs: CRSTYPE = ..., async_retriever_kwargs: dict[str, Any] | None = ...) geopandas.GeoDataFrame | pandas.DataFrame#
classmethod get_all_data(data_type: str, as_list: Literal[True] = True, crs: CRSTYPE = ..., async_retriever_kwargs: dict[str, Any] | None = ...) list[dict[str, Any]]

Retrieve all data from the STN Flood Event Data API.

Parameters:
  • data_type (str) – The data source from STN Flood Event Data API. It can be instruments, peaks, hwms, or sites.

  • as_list (bool, optional) – If True, return the data as a list, defaults to False.

  • crs (int, str, or pyproj.CRS, optional) – Desired Coordinate reference system (CRS) of output. Only used for GeoDataFrames with hwms and sites data types.

  • async_retriever_kwargs (dict, optional) – Additional keyword arguments to pass to async_retriever.retrieve_json(). The url and request_kwds options are already set.

Returns:

geopandas.GeoDataFrame or pandas.DataFrame or list of dict – The retrieved data as a GeoDataFrame, DataFrame, or a list of dictionaries.

Raises:

InputValueError – If the input data_type is not one of instruments, peaks, hwms, or sites

See also

get_filtered_data()

Retrieves filtered data for a given data type.

data_dictionary()

Retrieves the data dictionary for a given data type.

Notes

Notice schema differences between the data dictionaries, filtered data queries, and all data queries. This is a known issue and is being addressed by USGS.

Examples

>>> from pygeohydro.stnfloodevents import STNFloodEventData
>>> data = STNFloodEventData.get_all_data(data_type="instruments")
>>> data.shape[1]
18
>>> data.columns
Index(['instrument_id', 'sensor_type_id', 'deployment_type_id',
       'location_description', 'serial_number', 'interval', 'site_id',
       'event_id', 'inst_collection_id', 'housing_type_id', 'sensor_brand_id',
       'vented', 'instrument_status', 'data_files', 'files', 'last_updated',
       'last_updated_by', 'housing_serial_number'],
       dtype='object')
classmethod get_filtered_data(data_type: str, query_params: dict[str, Any] | None = ..., as_list: Literal[False] = False, crs: CRSTYPE = ..., async_retriever_kwargs: dict[str, Any] | None = ...) geopandas.GeoDataFrame | pandas.DataFrame#
classmethod get_filtered_data(data_type: str, query_params: dict[str, Any] | None = ..., as_list: Literal[True] = True, crs: CRSTYPE = ..., async_retriever_kwargs: dict[str, Any] | None = ...) list[dict[str, Any]]

Retrieve filtered data from the STN Flood Event Data API.

Parameters:
  • data_type (str) – The data source from STN Flood Event Data API. It can be instruments, peaks, hwms, or sites.

  • query_params (dict, optional) – RESTful API query parameters. For accepted values, see the STNFloodEventData class attributes instruments_query_params, peaks_query_params, hwms_query_params, and sites_query_params.

    Also, see the API documentation for each data type for more information.
  • as_list (bool, optional) – If True, return the data as a list, defaults to False.

  • crs (int, str, or pyproj.CRS, optional) – Desired Coordinate reference system (CRS) of output. Only used for GeoDataFrames outputs.

  • async_retriever_kwargs (dict, optional) – Additional keyword arguments to pass to async_retriever.retrieve_json(). The url and request_kwds options are already set.

Returns:

geopandas.GeoDataFrame or pandas.DataFrame or list of dict – The retrieved data as a GeoDataFrame, DataFrame, or a list of dictionaries.

Raises:
  • InputValueError – If the input data_type is not one of instruments, peaks, hwms, or sites.

  • InputValueError – If any of the input query_params are not in the accepted parameters.

See also

get_all_data()

Retrieves all data for a given data type.

data_dictionary()

Retrieves the data dictionary for a given data type.

Notes

Notice schema differences between the data dictionaries, filtered data queries, and all data queries. This is a known issue and is being addressed by USGS.

Examples

>>> from pygeohydro.stnfloodevents import STNFloodEventData
>>> query_params = {"States": "SC, CA"}
>>> data = STNFloodEventData.get_filtered_data(data_type="instruments", query_params=query_params)
>>> data.shape[1]
34
>>> data.columns
Index(['sensorType', 'deploymentType', 'eventName', 'collectionCondition',
    'housingType', 'sensorBrand', 'statusId', 'timeStamp', 'site_no',
    'latitude', 'longitude', 'siteDescription', 'networkNames', 'stateName',
    'countyName', 'siteWaterbody', 'siteHDatum', 'sitePriorityName',
    'siteZone', 'siteHCollectMethod', 'sitePermHousing', 'instrument_id',
    'sensor_type_id', 'deployment_type_id', 'location_description',
    'serial_number', 'housing_serial_number', 'interval', 'site_id',
    'vented', 'instrument_status', 'data_files', 'files', 'geometry'],
    dtype='object')
pygeohydro.stnfloodevents.stn_flood_event(data_type, query_params=None)#

Retrieve data from the STN Flood Event Data API.

Parameters:
  • data_type (str) – The data source from STN Flood Event Data API. It can be instruments, peaks, hwms, or sites.

  • query_params (dict, optional) – RESTful API query parameters, defaults to None, which returns a pandas.DataFrame of information about the given data_type. For accepted values, see the STNFloodEventData class attributes instruments_query_params, peaks_query_params, hwms_query_params, and sites_query_params.

    Also, see the API documentation for each data type for more information.

Returns:

geopandas.GeoDataFrame or pandas.DataFrame – The retrieved data as a GeoDataFrame or DataFrame (if query_params is not passed).

Raises:
  • InputValueError – If the input data_type is not one of instruments, peaks, hwms, or sites

  • InputValueError – If any of the input query_params are not in accepted parameters.

Return type:

geopandas.GeoDataFrame | pandas.DataFrame

Notes

Notice schema differences between the data dictionaries, filtered data queries, and all data queries. This is a known issue and is being addressed by USGS.

Examples

>>> query_params = {"States": "SC, CA"}
>>> data = stn_flood_event("instruments", query_params=query_params)
>>> data.shape[1]
34
>>> data.columns
Index(['sensorType', 'deploymentType', 'eventName', 'collectionCondition',
    'housingType', 'sensorBrand', 'statusId', 'timeStamp', 'site_no',
    'latitude', 'longitude', 'siteDescription', 'networkNames', 'stateName',
    'countyName', 'siteWaterbody', 'siteHDatum', 'sitePriorityName',
    'siteZone', 'siteHCollectMethod', 'sitePermHousing', 'instrument_id',
    'sensor_type_id', 'deployment_type_id', 'location_description',
    'serial_number', 'housing_serial_number', 'interval', 'site_id',
    'vented', 'instrument_status', 'data_files', 'files', 'geometry'],
    dtype='object')
pygeohydro.us_abbrs#

US state and territory abbreviations from the us package.

pygeohydro.waterdata#

Accessing WaterData-related APIs.

Module Contents#
class pygeohydro.waterdata.SensorThings#

Class for interacting with SensorThings API.

static odata_helper(columns=None, conditionals=None, expand=None, max_count=None, extra_params=None)#

Generate Odata filters for SensorThings API.

Parameters:
  • columns (list of str, optional) – Columns to be selected from the database, defaults to None.

  • conditionals (str, optional) – Conditionals to be applied to the database, defaults to None. Note that the conditionals should have the form of cond1 operator 'value' and/or cond2 operator 'value'. For example: properties/monitoringLocationType eq 'Stream' and ...

  • expand (dict of dict, optional) – Expand the properties of the selected columns, defaults to None. Note that the expand should have the form of {Property: {func: value, ...}}. For example: {"Locations": {"select": "location", "filter": "ObservedProperty/@iot.id eq '00060'"}}

  • max_count (int, optional) – Maximum number of items to be returned, defaults to None.

  • extra_params (dict, optional) – Extra parameters to be added to the Odata filter, defaults to None.

Returns:

odata (dict) – Odata filter for the SensorThings API.

Return type:

dict[str, str]
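
For instance, a minimal sketch of building a filter from the documented parameters (the selected columns are illustrative); the resulting dictionary can then be passed to query_byodata:

>>> from pygeohydro.waterdata import SensorThings
>>> odata = SensorThings.odata_helper(
...     columns=["name", "properties"],
...     conditionals="properties/monitoringLocationType eq 'Stream'",
... )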

query_byodata(odata, outformat='json')#

Query the SensorThings API by Odata filter.

Parameters:
  • odata (str) – Odata filter for the SensorThings API.

  • outformat (str, optional) – Format of the response, defaults to json. Valid values are json and geojson.

Returns:

pandas.DataFrame or geopandas.GeoDataFrame – Requested data.

Return type:

geopandas.GeoDataFrame | pandas.DataFrame

sensor_info(sensor_ids)#

Query the SensorThings API by a sensor ID.

Parameters:

sensor_ids (str or list of str) – A single or list of sensor IDs, e.g., USGS-09380000.

Returns:

pandas.DataFrame – Requested sensor data.

Return type:

pandas.DataFrame

sensor_property(sensor_property, sensor_ids)#

Query a sensor property.

Parameters:
  • sensor_property (str or list of str) – A sensor property. Valid properties are Datastreams, MultiDatastreams, Locations, HistoricalLocations, and TaskingCapabilities.

  • sensor_ids (str or list of str) – A single or list of sensor IDs, e.g., USGS-09380000.

Returns:

pandas.DataFrame – A dataframe containing the requested property.

Return type:

pandas.DataFrame

class pygeohydro.waterdata.WaterQuality#

Water Quality Web Service https://www.waterqualitydata.us.

Notes

This class has a number of convenience methods to retrieve data from the Water Quality Web Service. Since there are many parameter combinations that can be used to retrieve data, a general method is also provided to retrieve data from any of the valid endpoints. You can use get_json to retrieve station info as a geopandas.GeoDataFrame or get_csv to retrieve station data as a pandas.DataFrame. You can construct a dictionary of the parameters and pass it to one of these functions. For more information on the parameters, please consult the Water Quality Data documentation.
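
For instance, a minimal sketch of querying stations within a bounding box (characteristicName is a standard Water Quality Web Service parameter; the bbox is illustrative):

>>> from pygeohydro import WaterQuality
>>> wq = WaterQuality()
>>> stations = wq.station_bybbox((-92.8, 44.2, -88.9, 46.0), {"characteristicName": "pH"})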

data_bystation(station_ids, wq_kwds)#

Retrieve data for a single station.

Parameters:
  • station_ids (str or list of str) – Station ID(s). The IDs should have the format “Agency code-Station ID”.

  • wq_kwds (dict, optional) – Water Quality Web Service keyword arguments. Defaults to None.

Returns:

pandas.DataFrame – DataFrame of data for the stations.

Return type:

pandas.DataFrame

get_csv(endpoint, kwds, request_method='GET')#

Get the CSV response from the Water Quality Web Service.

Parameters:
  • endpoint (str) – Endpoint of the Water Quality Web Service.

  • kwds (dict) – Water Quality Web Service keyword arguments.

  • request_method (str, optional) – HTTP request method. Defaults to GET.

Returns:

pandas.DataFrame – The web service response as a DataFrame.

Return type:

pandas.DataFrame

get_json(endpoint, kwds, request_method='GET')#

Get the JSON response from the Water Quality Web Service.

Parameters:
  • endpoint (str) – Endpoint of the Water Quality Web Service.

  • kwds (dict) – Water Quality Web Service keyword arguments.

  • request_method (str, optional) – HTTP request method. Defaults to GET.

Returns:

geopandas.GeoDataFrame – The web service response as a GeoDataFrame.

Return type:

geopandas.GeoDataFrame

get_param_table()#

Get the parameter table from the USGS Water Quality Web Service.

lookup_domain_values(endpoint)#

Get the domain values for the target endpoint.

station_bybbox(bbox, wq_kwds)#

Retrieve station info within bounding box.

Parameters:
  • bbox (tuple of float) – Bounding box coordinates (west, south, east, north) in epsg:4326.

  • wq_kwds (dict, optional) – Water Quality Web Service keyword arguments. Defaults to None.

Returns:

geopandas.GeoDataFrame – GeoDataFrame of station info within the bounding box.

Return type:

geopandas.GeoDataFrame

station_bydistance(lon, lat, radius, wq_kwds)#

Retrieve station info within a radius (decimal miles) of a point.

Parameters:
  • lon (float) – Longitude of point.

  • lat (float) – Latitude of point.

  • radius (float) – Radius (decimal miles) of search.

  • wq_kwds (dict, optional) – Water Quality Web Service keyword arguments. Defaults to None.

Returns:

geopandas.GeoDataFrame – GeoDataFrame of station info within the radius of the point.

Return type:

geopandas.GeoDataFrame

pygeohydro.watershed#

Accessing watershed boundary-level data through web services.

Module Contents#
class pygeohydro.watershed.WBD(layer, outfields='*', crs=4326)#

Access Watershed Boundary Dataset (WBD).

Notes

This web service offers Hydrologic Unit (HU) polygon boundaries for the United States, Puerto Rico, and the U.S. Virgin Islands. For more info visit: https://hydro.nationalmap.gov/arcgis/rest/services/wbd/MapServer

Parameters:
  • layer (str) – A valid service layer. Valid layers are:

    • wbdline

    • huc2

    • huc4

    • huc6

    • huc8

    • huc10

    • huc12

    • huc14

    • huc16

  • outfields (str or list, optional) – Target field name(s), defaults to “*”, i.e., all the fields.

  • crs (str, int, or pyproj.CRS, optional) – Target spatial reference, defaults to EPSG:4326.

pygeohydro.watershed.huc_wb_full(huc_lvl)#

Get the full watershed boundary for a given HUC level.

Notes

This function is designed for cases where the full watershed boundary is needed for a given HUC level. If only a subset of the HUCs is needed, then use the pygeohydro.WBD class. The full dataset is downloaded from The National Map's WBD staged products.

Parameters:

huc_lvl (int) – HUC level, must be an even number between 2 and 16.

Returns:

geopandas.GeoDataFrame – The full watershed boundary for the given HUC level.

Return type:

geopandas.GeoDataFrame

pygeohydro.watershed.irrigation_withdrawals()#

Get monthly water use for irrigation at HUC12-level for CONUS.

Notes

Dataset is retrieved from https://doi.org/10.5066/P9FDLY8P.

Package Contents#

py3dep#

Top-level package for Py3DEP.

Submodules#

py3dep.py3dep#

Get data from 3DEP database.

Module Contents#
py3dep.py3dep.add_elevation(ds, resolution=None, x_dim='x', y_dim='y', mask=None)#

Add elevation data to a dataset as a new variable.

Parameters:
  • ds (xarray.DataArray or xarray.Dataset) – The dataset to add elevation data to. It must contain CRS information.

  • resolution (float, optional) – Target DEM source resolution in meters, defaults to None, i.e., the resolution of the input ds will be used.

  • x_dim (str, optional) – Name of the x-coordinate dimension in ds, defaults to x.

  • y_dim (str, optional) – Name of the y-coordinate dimension in ds, defaults to y.

  • mask (xarray.DataArray, optional) – A mask to apply to the elevation data, defaults to None.

Returns:

xarray.Dataset – The dataset with elevation variable added.

Return type:

xarray.Dataset

py3dep.py3dep.check_3dep_availability(bbox, crs=4326)#

Query 3DEP’s resolution availability within a bounding box.

This function checks the availability of 3DEP's DEM at the following resolutions: 1 m, 3 m, 5 m, 10 m, 30 m, 60 m, and topobathy (integrated topobathymetry).

Parameters:
  • bbox (tuple) – Bounding box as tuple of (min_x, min_y, max_x, max_y).

  • crs (str, int, or pyproj.CRS, optional) – Spatial reference (CRS) of bbox, defaults to EPSG:4326.

Returns:

dict – True if bbox intersects 3DEP elevation for each available resolution. Keys are the supported resolutions and values are their availability. If the query fails for any reason, the value will be Failed. If necessary, you can try again later until there is no Failed value.

Return type:

dict[str, bool | str]

Examples

>>> import py3dep
>>> bbox = (-69.77, 45.07, -69.31, 45.45)
>>> py3dep.check_3dep_availability(bbox)
{'1m': True, '3m': False, '5m': False, '10m': True, '30m': True, '60m': False, 'topobathy': False}
py3dep.py3dep.elevation_bycoords(coords: tuple[float, float], crs: CRSTYPE = ..., source: Literal[tep, tnm] = ...) float#
py3dep.py3dep.elevation_bycoords(coords: list[tuple[float, float]], crs: CRSTYPE = ..., source: Literal[tep, tnm] = ...) list[float]

Get elevation for a list of coordinates.

Parameters:
  • coords (tuple or list of tuple) – Coordinates of target location(s), e.g., [(x, y), ...].

  • crs (str, int, or pyproj.CRS, optional) – Spatial reference (CRS) of coords, defaults to EPSG:4326.

  • source (str, optional) – Data source to be used, defaults to tep. Supported sources are tnm (using The National Map's Bulk Point Query Service with 10 m resolution) and tep (using 3DEP's static DEM VRTs at 10 m resolution). Both sources are accurate since they use the 1/3 arc-second DEM layer from the 3DEP service, but they are limited to the US. Note that tnm is a bit unstable, so it's recommended to use tep unless 10-m accuracy is not needed.

Returns:

float or list of float – Elevation in meters.
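
A brief usage sketch (the coordinate is illustrative):

>>> import py3dep
>>> elev = py3dep.elevation_bycoords((-69.77, 45.07), crs=4326, source="tep")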

py3dep.py3dep.elevation_bygrid(xcoords, ycoords, crs, resolution, depression_filling=False)#

Get elevation from DEM data for a grid.

This function is intended for getting elevations for a gridded dataset.

Parameters:
  • xcoords (list) – List of x-coordinates of a grid.

  • ycoords (list) – List of y-coordinates of a grid.

  • crs (str, int, or pyproj.CRS) – The spatial reference system of the input grid.

  • resolution (int) – The accuracy of the output, defaults to 10 m which is the highest available resolution that covers CONUS. Note that higher resolution increases computation time, so choose this value with caution.

  • depression_filling (bool, optional) – Fill depressions before sampling using pyflwdir package, defaults to False.

Returns:

xarray.DataArray – Elevations of the input coordinates as a xarray.DataArray.

Return type:

xarray.DataArray

py3dep.py3dep.elevation_profile(lines, spacing, crs=4326)#

Get the elevation profile along a line at a given uniform spacing.

Note

This function converts the line to a spline and then calculates the elevation along the spline at a given uniform spacing using 10-m resolution DEM from 3DEP.

Parameters:
  • lines (LineString or MultiLineString) – Line segment(s) to be profiled. If its type is MultiLineString, it will be converted to a single LineString and if this operation fails, an InputTypeError will be raised.

  • spacing (float) – Spacing between the sample points along the line in meters.

  • crs (str, int, or pyproj.CRS, optional) – Spatial reference System (CRS) of lines, defaults to EPSG:4326.

Returns:

xarray.DataArray – Elevation profile with dimension z and three coordinates: x, y, and distance. The distance coordinate is the distance from the start of the line in meters.

Return type:

xarray.DataArray

py3dep.py3dep.get_dem(geometry, resolution, crs=4326)#

Get DEM data at any resolution from 3DEP.

Notes

This function is a wrapper of static_3dep_dem and get_map functions. Since static_3dep_dem is much faster, if the requested resolution is 10 m, 30 m, or 60 m, static_3dep_dem will be used. Otherwise, get_map will be used.

Parameters:
  • geometry (Polygon, MultiPolygon, or tuple of length 4) – Geometry to get DEM within. It can be a polygon or a bounding box of the form (xmin, ymin, xmax, ymax).

  • resolution (int) – Target DEM source resolution in meters.

  • crs (str, int, or pyproj.CRS, optional) – The spatial reference system of the input geometry, defaults to EPSG:4326.

Returns:

xarray.DataArray – DEM at the specified resolution in meters and in EPSG:4326 CRS.

Return type:

xarray.DataArray
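
A short sketch (the bbox is illustrative); since the requested resolution is 30 m, static_3dep_dem is used under the hood:

>>> import py3dep
>>> dem = py3dep.get_dem((-69.77, 45.07, -69.31, 45.45), 30)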

py3dep.py3dep.get_dem_vrt(bbox, resolution, vrt_path, tiff_dir='cache', crs=4326)#

Get DEM data at any resolution from 3DEP and save it as a VRT file.

Parameters:
  • bbox (tuple of length 4) – The bounding box of the form (xmin, ymin, xmax, ymax).

  • resolution (int) – Target DEM source resolution in meters.

  • vrt_path (str or pathlib.Path) – Path to the output VRT file.

  • tiff_dir (str or pathlib.Path, optional) – Path to the directory to save the downloaded TIFF file, defaults to ./cache.

  • crs (str, int, or pyproj.CRS, optional) – The spatial reference system of bbox, defaults to EPSG:4326.

py3dep.py3dep.get_map(layers: str, geometry: shapely.Polygon | shapely.MultiPolygon | tuple[float, float, float, float], resolution: int, geo_crs: CRSTYPE = ..., crs: CRSTYPE = ...) xarray.DataArray#
py3dep.py3dep.get_map(layers: list[str], geometry: shapely.Polygon | shapely.MultiPolygon | tuple[float, float, float, float], resolution: int, geo_crs: CRSTYPE = ..., crs: CRSTYPE = ...) xarray.Dataset

Access dynamic layer of 3DEP.

The 3DEP service has multi-resolution sources, so depending on the user provided resolution the data is resampled on server-side based on all the available data sources. The following layers are available:

  • DEM

  • Hillshade Gray

  • Aspect Degrees

  • Aspect Map

  • GreyHillshade_elevationFill

  • Hillshade Multidirectional

  • Slope Map

  • Slope Degrees

  • Hillshade Elevation Tinted

  • Height Ellipsoidal

  • Contour 25

  • Contour Smoothed 25

Parameters:
  • layers (str or list of str) – A valid 3DEP layer or a list of them.

  • geometry (Polygon, MultiPolygon, or tuple) – A shapely Polygon or a bounding box of the form (west, south, east, north).

  • resolution (int) – The target resolution in meters. The width and height of the output are computed in pixels based on the geometry bounds and the given resolution.

  • geo_crs (str, int, or pyproj.CRS, optional) – The spatial reference system of the input geometry, defaults to EPSG:4326.

  • crs (str, int, or pyproj.CRS, optional) – The spatial reference system to be used for requesting the data, defaults to EPSG:4326. Valid values are EPSG:4326, EPSG:3576, EPSG:3571, EPSG:3575, EPSG:3857, EPSG:3572, CRS:84, EPSG:3573, and EPSG:3574.

Returns:

xarray.DataArray or xarray.Dataset – The requested topographic data as an xarray.DataArray or xarray.Dataset.
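
A brief sketch requesting one of the listed layers over an illustrative bbox:

>>> import py3dep
>>> slope = py3dep.get_map("Slope Degrees", (-69.77, 45.07, -69.31, 45.45), 30)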

py3dep.py3dep.query_3dep_sources(bbox, crs=4326, res=None)#

Query 3DEP’s data sources within a bounding box.

This function queries the availability of the underlying data that 3DEP uses at the following resolutions: 1 m, 3 m, 5 m, 10 m, 30 m, 60 m, and topobathy (integrated topobathymetry).

Parameters:
  • bbox (tuple) – Bounding box as tuple of (min_x, min_y, max_x, max_y).

  • crs (str, int, or pyproj.CRS, optional) – Spatial reference (CRS) of bbox, defaults to EPSG:4326.

  • res (str or list of str, optional) – Resolution to query, defaults to None, i.e., all resolutions. Available resolutions are: 1m, 3m, 5m, 10m, 30m, 60m, and topobathy.

Returns:

geopandas.GeoDataFrame – Polygon(s) representing the 3DEP data sources at each resolution. Resolutions are given in the dem_res column.

Return type:

geopandas.GeoDataFrame

Examples

>>> import py3dep
>>> bbox = (-69.77, 45.07, -69.31, 45.45)
>>> src = py3dep.query_3dep_sources(bbox)
>>> src.groupby("dem_res")["OBJECTID"].count().to_dict()
{'10m': 8, '1m': 3, '30m': 8}
>>> src = py3dep.query_3dep_sources(bbox, res="1m")
>>> src.groupby("dem_res")["OBJECTID"].count().to_dict()
{'1m': 3}
py3dep.py3dep.static_3dep_dem(geometry, crs, resolution=10)#

Get DEM data at specific resolution from 3DEP.

Notes

In contrast to the get_map function, this function only gets DEM data at specific resolutions, namely 10 m, 30 m, and 60 m. However, it is faster. It is intended for cases where only a DEM at one of these specific resolutions is required; for all other requests, get_map should be used.

Parameters:
  • geometry (Polygon, MultiPolygon, or tuple of length 4) – Geometry to get DEM within. It can be a polygon or a bounding box of the form (xmin, ymin, xmax, ymax).

  • crs (int, str, or pyproj.CRS) – CRS of the input geometry.

  • resolution (int, optional) – Target DEM source resolution in meters, defaults to 10 m which is the highest resolution available over the US. Available options are 10, 30, and 60.

Returns:

xarray.DataArray – The requested DEM at the specified resolution.

Return type:

xarray.DataArray

py3dep.utils#

Utilities for Py3DEP.

Module Contents#
py3dep.utils.deg2mpm(slope)#

Convert slope from degrees to meter/meter.

Parameters:

slope (xarray.DataArray) – Slope in degrees.

Returns:

xarray.DataArray – Slope in meter/meter. The name is set to slope and the units attribute is set to m/m.

Return type:

xarray.DataArray
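
Since slope in m/m is the tangent of the slope angle, a natural pairing is converting the Slope Degrees layer from get_map (the bbox is illustrative):

>>> import py3dep
>>> slope = py3dep.get_map("Slope Degrees", (-69.77, 45.07, -69.31, 45.45), 30)
>>> slope_mpm = py3dep.utils.deg2mpm(slope)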

py3dep.utils.fill_depressions(elevtn, outlets='min', idxs_pit=None, nodata=np.nan, max_depth=-1.0, elv_max=None, connectivity=8)#

Fill local depressions in elevation data based on Wang and Liu (2006).

Note

This function is based on the fill_depressions function from the pyflwdir package. This function improves the performance of the original function by a factor of up to 2 and adds more input checks. Additionally, it works with xarray.DataArray objects.

Outlets are assumed to occur at the edge of valid elevation cells outlets='edge'; at the lowest valid edge cell to create one single outlet outlets='min'; or at user provided outlet cells idxs_pit.

Depressions elsewhere are filled based on its lowest pour point elevation. If the pour point depth is larger than the maximum pour point depth max_depth a pit is set at the depression local minimum elevation.

Wang, L., & Liu, H. (2006). https://doi.org/10.1080/13658810500433453

Parameters:
  • elevtn (numpy.ndarray or xarray.DataArray) – elevation raster as a 2D numpy.ndarray or xarray.DataArray.

  • outlets ({"edge", "min"}, optional) – Initial basin outlet(s) at the edge of all cells ("edge") or only at the minimum elevation edge cell ("min"; default).

  • idxs_pit (1D array of int, optional) – Linear indices of outlet cells, if any; defaults to None.

  • nodata (float, optional) – nodata value, defaults to numpy.nan.

  • max_depth (float, optional) – Maximum pour point depth. Depressions with a larger pour point depth are set as pits. A negative value (the default) equals an infinitely large pour point depth, causing all depressions to be filled. Defaults to -1.0.

  • elv_max (float, optional) – Maximum elevation for outlets, only used in combination with outlets='edge'. Defaults to None.

  • connectivity ({4, 8}, optional) – Number of neighboring cells to consider, defaults to 8.

Returns:

elevtn_out (numpy.ndarray or xarray.DataArray) – Depression-filled elevation with type float32.

Return type:

numpy.ndarray | xarray.DataArray
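
A minimal sketch on a synthetic elevation raster (random data, purely illustrative; pyflwdir must be installed since this function builds on it):

>>> import numpy as np
>>> from py3dep.utils import fill_depressions
>>> rng = np.random.default_rng(42)
>>> elevtn = rng.random((10, 10)).astype("float32")
>>> filled = fill_depressions(elevtn)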

Package Contents#

pydaymet#

Top-level package for PyDaymet.

Submodules#

pydaymet.core#

Core class for the Daymet functions.

Module Contents#
class pydaymet.core.Daymet(variables=None, pet=None, snow=False, time_scale='daily', region='na')#

Base class for Daymet requests.

Parameters:
  • variables (str or list or tuple, optional) – List of variables to be downloaded. The acceptable variables are tmin, tmax, prcp, srad, vp, swe, and dayl. Descriptions can be found in the Daymet documentation. Defaults to None, i.e., all the variables are downloaded.

  • pet (str, optional) – Method for computing PET. Supported methods are penman_monteith, priestley_taylor, hargreaves_samani, and None (don't compute PET). The penman_monteith method is based on Allen et al.[1] assuming that soil heat flux density is zero. The priestley_taylor method is based on Priestley and Taylor[2] assuming that soil heat flux density is zero. The hargreaves_samani method is based on Hargreaves and Samani[3]. Defaults to None.

  • snow (bool, optional) – Compute snowfall from precipitation and minimum temperature. Defaults to False.

  • time_scale (str, optional) – Data time scale which can be daily, monthly (monthly summaries), or annual (annual summaries). Defaults to daily.

  • region (str, optional) – Region in the US, defaults to na. Acceptable values are:

    • na: Continental North America

    • hi: Hawaii

    • pr: Puerto Rico

static check_dates(dates)#

Check if input dates are in correct format and valid.

dates_todict(dates)#

Set dates by start and end dates as a tuple, (start, end).

dates_tolist(dates)#

Correct dates for Daymet accounting for leap years.

Daymet doesn't account for leap years and removes Dec 31 when it's a leap year.

Parameters:

dates (tuple) – Target start and end dates.

Returns:

list – All the dates in the Daymet database within the provided date range.

Return type:

list[tuple[pandas.Timestamp, pandas.Timestamp]]

years_todict(years)#

Set date by list of year(s).

years_tolist(years)#

Correct dates for Daymet accounting for leap years.

Daymet doesn't account for leap years and removes Dec 31 when it's a leap year.

Parameters:

years (list) – A list of target years.

Returns:

list – All the dates in the Daymet database within the provided date range.

Return type:

list[tuple[pandas.Timestamp, pandas.Timestamp]]

pydaymet.core.separate_snow(clm, t_rain=T_RAIN, t_snow=T_SNOW)#

Separate snow based on Martinez and Gupta[4].

Parameters:
  • clm (pandas.DataFrame or xarray.Dataset) – Climate data that should include prcp and tmin.

  • t_rain (float, optional) – Threshold for temperature for considering rain, defaults to 2.5 degrees C.

  • t_snow (float, optional) – Threshold for temperature for considering snow, defaults to 0.6 degrees C.

Returns:

pandas.DataFrame or xarray.Dataset – Input data with snow (mm/day) column if input is a pandas.DataFrame, or snow variable if input is an xarray.Dataset.

Return type:

pandas.DataFrame | xarray.Dataset

pydaymet.pet#

Core class for the Daymet functions.

Module Contents#
pydaymet.pet.potential_et(clm: pandas.DataFrame, coords: tuple[float, float], crs: CRSTYPE, method: Literal[penman_monteith, priestley_taylor, hargreaves_samani] = ..., params: dict[str, float] | None = ...) pandas.DataFrame#
pydaymet.pet.potential_et(clm: xarray.Dataset, coords: None = None, crs: None = None, method: Literal[penman_monteith, priestley_taylor, hargreaves_samani] = ..., params: dict[str, float] | None = ...) xarray.Dataset

Compute Potential EvapoTranspiration for both gridded and a single location.

Parameters:
  • clm (pandas.DataFrame or xarray.Dataset) – The dataset must include at least the following variables:

    • Minimum temperature in degree celsius

    • Maximum temperature in degree celsius

    • Solar radiation in W/m2

    • Daylight duration in seconds

    Optionally, for penman_monteith, wind speed at the 2-m level will be used if available; otherwise, a default value of 2 m/s will be assumed. The table below shows the variable names that the function looks for in the input data.

    pandas.DataFrame    xarray.Dataset
    tmin (degrees C)    tmin
    tmax (degrees C)    tmax
    srad (W/m2)         srad
    dayl (s)            dayl
    u2m (m/s)           u2m

  • coords (tuple of floats, optional) – Coordinates of the daymet data location as a tuple, (x, y). This is required when clm is a DataFrame.

  • crs (str, int, or pyproj.CRS, optional) – The spatial reference of the input coordinate, defaults to EPSG:4326. This is only used when clm is a DataFrame.

  • method (str, optional) – Method for computing PET. Supported methods are penman_monteith, priestley_taylor, and hargreaves_samani. The penman_monteith method is based on Allen et al.[1] assuming that soil heat flux density is zero. The priestley_taylor method is based on Priestley and Taylor[2] assuming that soil heat flux density is zero. The hargreaves_samani method is based on Hargreaves and Samani[3]. Defaults to hargreaves_samani.

  • params (dict, optional) – Model-specific parameters as a dictionary, defaults to None. Valid parameters are:

    • penman_monteith: soil_heat_flux, albedo, alpha, and arid_correction.

    • priestley_taylor: soil_heat_flux, albedo, and arid_correction.

    • hargreaves_samani: None.

    Default values for the parameters are: soil_heat_flux = 0, albedo = 0.23, alpha = 1.26, and arid_correction = False. An important parameter for the priestley_taylor and penman_monteith methods is arid_correction, which is used to correct the actual vapor pressure for arid regions. Since relative humidity is not provided by Daymet, the actual vapor pressure is computed assuming that the dewpoint temperature is equal to the minimum temperature. However, for arid regions, FAO 56 suggests subtracting 2-3 °C from the minimum temperature to account for the fact that in arid regions, the air might not be saturated when its temperature is at its minimum. For such areas, you can pass {"arid_correction": True, ...} to subtract 2 °C from the minimum temperature for computing the actual vapor pressure.

Returns:

pandas.DataFrame or xarray.Dataset – The input DataFrame/Dataset with an additional variable named pet (mm/day) for pandas.DataFrame and pet for xarray.Dataset.
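
A brief end-to-end sketch, pairing this function with pydaymet.get_bycoords (the coordinates are illustrative):

>>> import pydaymet as daymet
>>> from pydaymet.pet import potential_et
>>> coords = (-69.77, 45.07)
>>> clm = daymet.get_bycoords(coords, ("2000-01-01", "2000-12-31"))
>>> clm = potential_et(clm, coords, crs=4326, method="hargreaves_samani")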

pydaymet.pydaymet#

Access the Daymet database for both single-pixel and gridded queries.

Module Contents#
pydaymet.pydaymet.get_bycoords(coords, dates, coords_id=None, crs=4326, variables=None, region='na', time_scale='daily', pet=None, pet_params=None, snow=False, snow_params=None, ssl=True, to_xarray=False)#

Get point-data from the Daymet database at 1-km resolution.

This function uses the THREDDS data service to get the data at the requested coordinates and supports getting monthly and annual summaries of the climate data directly from the server.

Parameters:
  • coords (tuple or list of tuples) – Coordinates of the location(s) of interest as a tuple (x, y).

  • dates (tuple or list) – Start and end dates as a tuple (start, end) or a list of years [2001, 2010, ...].

  • coords_id (list of int or str, optional) – A list of identifiers for the coordinates. This option only applies when to_xarray is set to True. If not provided, the coordinates will be enumerated.

  • crs (str, int, or pyproj.CRS, optional) – The CRS of the input coordinates, defaults to EPSG:4326.

  • variables (str or list) – List of variables to be downloaded. The acceptable variables are tmin, tmax, prcp, srad, vp, swe, and dayl. Descriptions can be found in the Daymet documentation.

  • region (str, optional) – Target region in the US, defaults to na. Acceptable values are:

    • na: Continental North America

    • hi: Hawaii

    • pr: Puerto Rico

  • time_scale (str, optional) – Data time scale which can be daily, monthly (monthly summaries), or annual (annual summaries). Defaults to daily.

  • pet (str, optional) – Method for computing PET. Supported methods are penman_monteith, priestley_taylor, hargreaves_samani, and None (don't compute PET). The penman_monteith method is based on Allen et al.[1] assuming that soil heat flux density is zero. The priestley_taylor method is based on Priestley and Taylor[2] assuming that soil heat flux density is zero. The hargreaves_samani method is based on Hargreaves and Samani[3]. Defaults to None.

  • pet_params (dict, optional) – Model-specific parameters as a dictionary, defaults to None. An important parameter for the priestley_taylor and penman_monteith methods is arid_correction, which is used to correct the actual vapor pressure for arid regions. Since relative humidity is not provided by Daymet, the actual vapor pressure is computed assuming that the dewpoint temperature is equal to the minimum temperature. However, for arid regions, FAO 56 suggests subtracting 2-3 °C from the minimum temperature to account for aridity, since in arid regions the air might not be saturated when its temperature is at its minimum. For such areas, you can pass {"arid_correction": True, ...} to subtract 2 °C from the minimum temperature before computing the actual vapor pressure.

  • snow (bool, optional) – Compute snowfall from precipitation and minimum temperature. Defaults to False.

  • snow_params (dict, optional) – Model-specific parameters as a dictionary that is passed to the snowfall function. These parameters are only used if snow is True. Two parameters are required: t_rain (deg C) which is the threshold for temperature for considering rain and t_snow (deg C) which is the threshold for temperature for considering snow. The default values are {'t_rain': 2.5, 't_snow': 0.6} that are adopted from https://doi.org/10.5194/gmd-11-1077-2018.

  • ssl (bool, optional) – Whether to verify SSL certification, defaults to True.

  • to_xarray (bool, optional) – Return the data as an xarray.Dataset. Defaults to False.

Returns:

pandas.DataFrame or xarray.Dataset – Daily climate data for a single or list of locations.

Return type:

pandas.DataFrame | xarray.Dataset

Examples

>>> import pydaymet as daymet
>>> coords = (-1431147.7928, 318483.4618)
>>> dates = ("2000-01-01", "2000-12-31")
>>> clm = daymet.get_bycoords(
...     coords,
...     dates,
...     crs=3542,
...     pet="hargreaves_samani",
... )
>>> clm["pet (mm/day)"].mean()
3.713

pydaymet.pydaymet.get_bygeom(geometry, dates, crs=4326, variables=None, region='na', time_scale='daily', pet=None, pet_params=None, snow=False, snow_params=None, ssl=True)#

Get gridded data from the Daymet database at 1-km resolution.

Parameters:
  • geometry (Polygon, MultiPolygon, or bbox) – The geometry of the region of interest.

  • dates (tuple or list) – Start and end dates as a tuple (start, end) or a list of years [2001, 2010, …].

  • crs (str, int, or pyproj.CRS, optional) – The CRS of the input geometry, defaults to epsg:4326.

  • variables (str or list) – List of variables to be downloaded. The acceptable variables are tmin, tmax, prcp, srad, vp, swe, and dayl. Descriptions can be found in the Daymet documentation.

  • region (str, optional) – Region in the US, defaults to na. Acceptable values are:

    • na: Continental North America

    • hi: Hawaii

    • pr: Puerto Rico

  • time_scale (str, optional) – Data time scale which can be daily, monthly (monthly average), or annual (annual average). Defaults to daily.

  • pet (str, optional) – Method for computing PET. Supported methods are penman_monteith, priestley_taylor, hargreaves_samani, and None (don't compute PET). The penman_monteith method is based on Allen et al.[1] assuming that soil heat flux density is zero. The priestley_taylor method is based on Priestley and Taylor[2] assuming that soil heat flux density is zero. The hargreaves_samani method is based on Hargreaves and Samani[3]. Defaults to None.

  • pet_params (dict, optional) – Model-specific parameters as a dictionary, defaults to None. Valid parameters are:

    • penman_monteith: soil_heat_flux, albedo, alpha, and arid_correction.

    • priestley_taylor: soil_heat_flux, albedo, and arid_correction.

    • hargreaves_samani: None.

    Default values for the parameters are: soil_heat_flux = 0, albedo = 0.23, alpha = 1.26, and arid_correction = False. An important parameter for the priestley_taylor and penman_monteith methods is arid_correction, which is used to correct the actual vapor pressure for arid regions. Since relative humidity is not provided by Daymet, the actual vapor pressure is computed assuming that the dewpoint temperature is equal to the minimum temperature. However, for arid regions, FAO 56 suggests subtracting 2-3 °C from the minimum temperature to account for aridity, since in arid regions the air might not be saturated when its temperature is at its minimum. For such areas, you can pass {"arid_correction": True, ...} to subtract 2 °C from the minimum temperature before computing the actual vapor pressure.

  • snow (bool, optional) – Compute snowfall from precipitation and minimum temperature. Defaults to False.

  • snow_params (dict, optional) – Model-specific parameters as a dictionary that is passed to the snowfall function. These parameters are only used if snow is True. Two parameters are required: t_rain (deg C) which is the threshold for temperature for considering rain and t_snow (deg C) which is the threshold for temperature for considering snow. The default values are {'t_rain': 2.5, 't_snow': 0.6} that are adopted from https://doi.org/10.5194/gmd-11-1077-2018.

  • ssl (bool, optional) – Whether to verify SSL certification, defaults to True.

Returns:

xarray.Dataset – Daily climate data within the target geometry.

Return type:

xarray.Dataset

Examples

>>> from shapely import Polygon
>>> import pydaymet as daymet
>>> geometry = Polygon(
...     [[-69.77, 45.07], [-69.31, 45.07], [-69.31, 45.45], [-69.77, 45.45], [-69.77, 45.07]]
... )
>>> clm = daymet.get_bygeom(geometry, 2010, variables="tmin", time_scale="annual")
>>> clm["tmin"].mean().item()
1.361

pydaymet.pydaymet.get_bystac(geometry, dates, crs=4326, variables=None, region='na', time_scale='daily', res_km=1, pet=None, pet_params=None, snow=False, snow_params=None)#

Get gridded Daymet from STAC.

New in version 0.16.1.

Note

This function provides access to the Daymet data from Microsoft's Planetary Computer: https://planetarycomputer.microsoft.com/dataset/group/daymet. Although this function can be much faster than get_bygeom(), it currently gives access to Daymet v4.2 from 1980 to 2020. For accessing the latest version of Daymet (v4.5), you need to use get_bygeom().

Also, this function requires fsspec, dask, zarr, and pystac-client packages. They can be installed using pip install fsspec dask zarr pystac-client or conda install fsspec dask-core zarr pystac-client.

Parameters:
  • geometry (Polygon, MultiPolygon, or bbox) – The geometry of the region of interest.

  • dates (tuple) – Start and end dates as a tuple (start, end) or a list of years [2001, 2010, …].

  • crs (str, int, or pyproj.CRS, optional) – The CRS of the input geometry, defaults to epsg:4326.

  • variables (str or list) – List of variables to be downloaded. The acceptable variables are tmin, tmax, prcp, srad, vp, swe, and dayl. Descriptions can be found in the Daymet documentation.

  • region (str, optional) – Region in the US, defaults to na. Acceptable values are:

    • na: Continental North America

    • hi: Hawaii

    • pr: Puerto Rico

  • time_scale (str, optional) – Data time scale which can be daily, monthly (monthly average), or annual (annual average). Defaults to daily.

  • res_km (int, optional) – Spatial resolution in kilometers, defaults to 1. For values greater than 1, the data will be aggregated (coarsened) using the mean.

  • pet (str, optional) – Method for computing PET. Supported methods are penman_monteith, priestley_taylor, hargreaves_samani, and None (don't compute PET). The penman_monteith method is based on Allen et al.[1] assuming that soil heat flux density is zero. The priestley_taylor method is based on Priestley and Taylor[2] assuming that soil heat flux density is zero. The hargreaves_samani method is based on Hargreaves and Samani[3]. Defaults to None.

  • pet_params (dict, optional) – Model-specific parameters as a dictionary, defaults to None. Valid parameters are:

    • penman_monteith: soil_heat_flux, albedo, alpha, and arid_correction.

    • priestley_taylor: soil_heat_flux, albedo, and arid_correction.

    • hargreaves_samani: None.

    Default values for the parameters are: soil_heat_flux = 0, albedo = 0.23, alpha = 1.26, and arid_correction = False. An important parameter for the priestley_taylor and penman_monteith methods is arid_correction, which is used to correct the actual vapor pressure for arid regions. Since relative humidity is not provided by Daymet, the actual vapor pressure is computed assuming that the dewpoint temperature is equal to the minimum temperature. However, for arid regions, FAO 56 suggests subtracting 2-3 °C from the minimum temperature to account for aridity, since in arid regions the air might not be saturated when its temperature is at its minimum. For such areas, you can pass {"arid_correction": True, ...} to subtract 2 °C from the minimum temperature before computing the actual vapor pressure.

  • snow (bool, optional) – Compute snowfall from precipitation and minimum temperature. Defaults to False.

  • snow_params (dict, optional) – Model-specific parameters as a dictionary that is passed to the snowfall function. These parameters are only used if snow is True. Two parameters are required: t_rain (deg C) which is the threshold for temperature for considering rain and t_snow (deg C) which is the threshold for temperature for considering snow. The default values are {'t_rain': 2.5, 't_snow': 0.6} that are adopted from https://doi.org/10.5194/gmd-11-1077-2018.

Returns:

xarray.Dataset – Daily climate data within the target geometry.

Return type:

xarray.Dataset

Examples

>>> from shapely import Polygon
>>> import pydaymet as daymet
>>> geometry = Polygon(
...     [[-69.77, 45.07], [-69.70, 45.07], [-69.70, 45.15], [-69.77, 45.15], [-69.77, 45.07]]
... )
>>> clm = daymet.get_bystac(
...     geometry,
...     ("2010-01-01", "2010-01-02"),
...     variables="tmin",
...     res_km=4,
...     snow=True,
...     pet="hargreaves_samani",
... )
>>> clm["pet"].mean().item()
0.3

Package Contents#

pygridmet#

Top-level package for PyGridMET.

Submodules#

pygridmet.core#

Core class for the GridMET functions.

Module Contents#
class pygridmet.core.GridMET(dates=2000, variables=None, snow=False)#

Base class for GridMET requests.

Parameters:
  • dates (tuple or int or list, optional) – Start and end dates as a tuple, (start, end), or a list of years. Defaults to 2000 so the class can be initialized without any arguments.

  • variables (str or list or tuple, optional) – List of variables to be downloaded. The acceptable variables are: pr, rmax, rmin, sph, srad, th, tmmn, tmmx, vs, bi, fm100, fm1000, erc, etr, pet, and vpd. Descriptions can be found in the GridMET documentation. Defaults to None, i.e., all the variables are downloaded.

  • snow (bool, optional) – Compute snowfall from precipitation and minimum temperature. Defaults to False.

static check_dates(dates)#

Check if input dates are in correct format and valid.

dates_todict(dates)#

Set dates by start and end dates as a tuple, (start, end).

dates_tolist(dates)#

Correct dates for GridMET accounting for leap years.

GridMET doesn't account for leap years and removes Dec 31 when it's a leap year.

Parameters:

dates (tuple) – Target start and end dates.

Returns:

list – All the dates in the GridMET database within the provided date range.

Return type:

list[tuple[pandas.Timestamp, pandas.Timestamp]]

separate_snow(clm, t_rain=T_RAIN, t_snow=T_SNOW)#

Separate snow based on Martinez and Gupta[1].

Parameters:
  • clm (pandas.DataFrame or xarray.Dataset) – Climate data that should include pr and tmmn.

  • t_rain (float, optional) – Threshold for temperature for considering rain, defaults to 2.5 K.

  • t_snow (float, optional) – Threshold for temperature for considering snow, defaults to 0.6 K.

Returns:

pandas.DataFrame or xarray.Dataset – Input data with snow (mm) column if input is a pandas.DataFrame, or snow variable if input is an xarray.Dataset.

Return type:

pandas.DataFrame | xarray.Dataset

years_todict(years)#

Set date by list of year(s).

years_tolist(years)#

Correct dates for GridMET accounting for leap years.

GridMET doesn't account for leap years and removes Dec 31 when it's a leap year.

Parameters:

years (list) – A list of target years.

Returns:

list – All the dates in the GridMET database within the provided date range.

Return type:

list[tuple[pandas.Timestamp, pandas.Timestamp]]

pygridmet.pygridmet#

Access the GridMET database for both single-pixel and gridded queries.

Module Contents#
pygridmet.pygridmet.get_bycoords(coords, dates, coords_id=None, crs=4326, variables=None, snow=False, snow_params=None, ssl=True, to_xarray=False)#

Get point-data from the GridMET database at 4-km resolution.

Parameters:
  • coords (tuple or list of tuples) – Coordinates of the location(s) of interest as a tuple (x, y).

  • dates (tuple or list, optional) – Start and end dates as a tuple (start, end) or a list of years [2001, 2010, ...].

  • coords_id (list of int or str, optional) – A list of identifiers for the coordinates. This option only applies when to_xarray is set to True. If not provided, the coordinates will be enumerated.

  • crs (str, int, or pyproj.CRS, optional) – The CRS of the input coordinates, defaults to EPSG:4326.

  • variables (str or list) – List of variables to be downloaded. The acceptable variables are: pr, rmax, rmin, sph, srad, th, tmmn, tmmx, vs, bi, fm100, fm1000, erc, etr, pet, and vpd. Descriptions can be found in the GridMET documentation. Defaults to None, i.e., all the variables are downloaded.

  • snow (bool, optional) – Compute snowfall from precipitation and minimum temperature. Defaults to False.

  • snow_params (dict, optional) – Model-specific parameters as a dictionary that is passed to the snowfall function. These parameters are only used if snow is True. Two parameters are required: t_rain (deg C) which is the threshold for temperature for considering rain and t_snow (deg C) which is the threshold for temperature for considering snow. The default values are {'t_rain': 2.5, 't_snow': 0.6} that are adopted from https://doi.org/10.5194/gmd-11-1077-2018.

  • ssl (bool, optional) – Whether to verify SSL certification, defaults to True.

  • to_xarray (bool, optional) – Return the data as an xarray.Dataset. Defaults to False.

Returns:

pandas.DataFrame or xarray.Dataset – Daily climate data for a single or list of locations.

Return type:

pandas.DataFrame | xarray.Dataset

Examples

>>> import pygridmet as gridmet
>>> coords = (-1431147.7928, 318483.4618)
>>> dates = ("2000-01-01", "2000-01-31")
>>> clm = gridmet.get_bycoords(
...     coords,
...     dates,
...     crs=3542,
... )
>>> clm["pr (mm)"].mean()
9.677
pygridmet.pygridmet.get_bygeom(geometry, dates, crs=4326, variables=None, snow=False, snow_params=None, ssl=True)#

Get gridded data from the GridMET database at 4-km resolution.

Parameters:
  • geometry (Polygon, MultiPolygon, or bbox) – The geometry of the region of interest.

  • dates (tuple or list, optional) – Start and end dates as a tuple (start, end) or a list of years [2001, 2010, …].

  • crs (str, int, or pyproj.CRS, optional) – The CRS of the input geometry, defaults to epsg:4326.

  • variables (str or list) – List of variables to be downloaded. The acceptable variables are: pr, rmax, rmin, sph, srad, th, tmmn, tmmx, vs, bi, fm100, fm1000, erc, etr, pet, and vpd. Descriptions can be found in the GridMET documentation. Defaults to None, i.e., all the variables are downloaded.

  • snow (bool, optional) – Compute snowfall from precipitation and minimum temperature. Defaults to False.

  • snow_params (dict, optional) – Model-specific parameters as a dictionary that is passed to the snowfall function. These parameters are only used if snow is True. Two parameters are required: t_rain (deg C) which is the threshold for temperature for considering rain and t_snow (deg C) which is the threshold for temperature for considering snow. The default values are {'t_rain': 2.5, 't_snow': 0.6} that are adopted from https://doi.org/10.5194/gmd-11-1077-2018.

  • ssl (bool, optional) – Whether to verify SSL certification, defaults to True.

Returns:

xarray.Dataset – Daily climate data within the target geometry.

Return type:

xarray.Dataset

Examples

>>> from shapely import Polygon
>>> import pygridmet as gridmet
>>> geometry = Polygon(
...     [[-69.77, 45.07], [-69.31, 45.07], [-69.31, 45.45], [-69.77, 45.45], [-69.77, 45.07]]
... )
>>> clm = gridmet.get_bygeom(geometry, 2010, variables="tmmn")
>>> clm["tmmn"].mean().item()
274.167

Package Contents#

pynldas2#

Top-level package for PyNLDAS2.

Submodules#

pynldas2.pynldas2#

Get hourly NLDAS2 forcing data.

Module Contents#
pynldas2.pynldas2.get_bycoords(coords, start_date, end_date, coords_id=None, crs=4326, variables=None, to_xarray=False, n_conn=4, snow=False, snow_params=None, source='grib')#

Get NLDAS-2 climate forcing data for a list of coordinates.

Parameters:
  • coords (list of tuples) – List of (lon, lat) coordinates.

  • start_date (str) – Start date of the data.

  • end_date (str) – End date of the data.

  • crs (str, int, or pyproj.CRS, optional) – The CRS of the input coordinates, defaults to EPSG:4326.

  • variables (str or list of str, optional) – Variables to download. If None, all variables are downloaded. Valid variables are: prcp, pet, temp, wind_u, wind_v, rlds, rsds, and humidity (and psurf if source=netcdf)

  • to_xarray (bool, optional) – If True, the data is returned as an xarray dataset.

  • n_conn (int, optional) – Number of parallel connections to use for retrieving data, defaults to 4. The maximum number of connections is 4; if more than 4 are requested, 4 will be used.

  • snow (bool, optional) – Compute snowfall from precipitation and temperature. Defaults to False.

  • snow_params (dict, optional) – Model-specific parameters as a dictionary that is passed to the snowfall function. These parameters are only used if snow is True. Two parameters are required: t_rain (deg C) which is the threshold for temperature for considering rain and t_snow (deg C) which is the threshold for temperature for considering snow. The default values are {'t_rain': 2.5, 't_snow': 0.6} that are adopted from https://doi.org/10.5194/gmd-11-1077-2018.

  • source ({"grib", "netcdf"}, optional) – Source to pull data rods from. Valid sources are: grib and netcdf.

Returns:

pandas.DataFrame or xarray.Dataset – The requested data as a dataframe or, if to_xarray is True, as an xarray dataset.

Return type:

pandas.DataFrame | xarray.Dataset
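
A minimal usage sketch (illustrative, not from the upstream docstring), assuming the NLDAS-2 data rods service is reachable; the coordinates and date range below are arbitrary:

>>> import pynldas2 as nldas
>>> coords = [(-69.77, 45.07), (-69.31, 45.45)]
>>> clm = nldas.get_bycoords(
...     coords, "2010-01-01", "2010-01-31", variables=["prcp", "temp"]
... )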

pynldas2.pynldas2.get_bygeom(geometry, start_date, end_date, geo_crs, variables=None, n_conn=4, snow=False, snow_params=None, source='grib')#

Get hourly NLDAS-2 climate forcing within a geometry at 0.125° resolution.

Parameters:
  • geometry (shapely.Polygon, shapely.MultiPolygon, or tuple of length 4) – Input polygon or a bounding box like so (xmin, ymin, xmax, ymax).

  • start_date (str) – Start date of the data.

  • end_date (str) – End date of the data.

  • geo_crs (int, str, or pyproj.CRS) – CRS of the input geometry.

  • variables (str or list of str, optional) – Variables to download. If None, all variables are downloaded. Valid variables are: prcp, pet, temp, wind_u, wind_v, rlds, rsds, and humidity (and psurf if source=netcdf).

  • n_conn (int, optional) – Number of parallel connections to use for retrieving data, defaults to 4. It should not exceed 4.

  • snow (bool, optional) – Compute snowfall from precipitation and temperature. Defaults to False.

  • snow_params (dict, optional) – Model-specific parameters as a dictionary that is passed to the snowfall function. These parameters are only used if snow is True. Two parameters are required: t_rain (deg C), the temperature threshold above which precipitation is considered rain, and t_snow (deg C), the temperature threshold below which precipitation is considered snow. The default values are {'t_rain': 2.5, 't_snow': 0.6}, adopted from https://doi.org/10.5194/gmd-11-1077-2018.

  • source ({"grib", "netcdf"}, optional) – Source to pull data rods from. Valid sources are: grib and netcdf.

Returns:

xarray.Dataset – The requested forcing data.

Return type:

xarray.Dataset
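
A minimal sketch for the gridded variant (illustrative, not from the upstream docstring); the bounding box and dates are arbitrary:

>>> import pynldas2 as nldas
>>> bbox = (-69.77, 45.07, -69.31, 45.45)
>>> clm = nldas.get_bygeom(bbox, "2010-01-01", "2010-01-31", 4326, variables="prcp")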

pynldas2.pynldas2.get_grid_mask()#

Get the NLDAS-2 grid that contains the land/water/soil/vegetation mask.

Returns:

xarray.Dataset – The grid mask.

Package Contents#

hydrosignatures#

Top-level package for HydroSignatures.

Submodules#

hydrosignatures.hydrosignatures#

Functions for computing hydrologic signatures.

Module Contents#
class hydrosignatures.hydrosignatures.HydroSignatures#

Hydrological signatures.

Parameters:
  • q_mmpt (pandas.Series) – Discharge in mm per unit time (the same timescale as precipitation).

  • p_mmpt (pandas.Series) – Precipitation in mm per unit time (the same timescale as discharge).

  • si_method (str, optional) – Seasonality index method. Either walsh or markham. Default is walsh.

  • fdc_slope_bins (tuple of int, optional) – The percentile bins between 1 and 100 within which to compute the slope of the FDC, defaults to (33, 67).

  • bfi_alpha (float, optional) – Alpha parameter for baseflow separation filter using Lyne and Hollick method. Default is 0.925.

property signature_names: dict[str, str]#

Return a dictionary mapping signature names to their descriptions.

property values: SignaturesFloat#

Return the computed signature values as a SignaturesFloat object.

bfi()#

Compute Baseflow Index.

diff(other)#

Compute absolute difference between two hydrological signatures.

fdc()#

Compute exceedance probability (for flow duration curve).

fdc_slope()#

Compute FDC slopes between a list of lower and upper percentiles.

isclose(other)#

Check if two sets of signatures are close, within a tolerance of 1e-3.

mean_annual_flood()#

Compute mean annual flood.

mean_monthly()#

Compute mean monthly flow (for regime curve).

runoff_ratio()#

Compute total runoff ratio.

seasonality_index()#

Compute seasonality index.

streamflow_elasticity()#

Compute streamflow elasticity.

to_dict()#

Return a dictionary with the hydrological signatures.

to_json()#

Return a JSON string with the hydrological signatures.
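
A minimal usage sketch with synthetic discharge and precipitation series (the data below are randomly generated for illustration only; HydroSignatures is assumed to be re-exported at the package level):

>>> import numpy as np
>>> import pandas as pd
>>> from hydrosignatures import HydroSignatures
>>> rng = np.random.default_rng(42)
>>> idx = pd.date_range("2010-01-01", "2014-12-31", freq="D")
>>> q = pd.Series(rng.lognormal(0.5, 0.8, len(idx)), index=idx)  # synthetic discharge (mm/day)
>>> p = pd.Series(rng.gamma(1.5, 2.0, len(idx)), index=idx)  # synthetic precipitation (mm/day)
>>> sig = HydroSignatures(q, p)
>>> values = sig.values  # the computed signatures as a SignaturesFloat object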

hydrosignatures.hydrosignatures.aridity_index(pet: pandas.Series, prcp: pandas.Series) numpy.float64#
hydrosignatures.hydrosignatures.aridity_index(pet: pandas.DataFrame, prcp: pandas.DataFrame) pandas.Series
hydrosignatures.hydrosignatures.aridity_index(pet: xarray.DataArray, prcp: xarray.DataArray) xarray.DataArray

Compute (Budyko) aridity index (PET/Prcp).

Parameters:
  • pet (pandas.Series, pandas.DataFrame, or xarray.DataArray) – Potential evapotranspiration time series.

  • prcp (pandas.Series, pandas.DataFrame, or xarray.DataArray) – Precipitation time series.

Returns:

float or pandas.Series or xarray.DataArray – The aridity index.
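
For instance, with synthetic daily PET and precipitation series (randomly generated, illustration only; aridity_index is assumed to be re-exported at the package level):

>>> import numpy as np
>>> import pandas as pd
>>> from hydrosignatures import aridity_index
>>> rng = np.random.default_rng(1)
>>> idx = pd.date_range("2010-01-01", "2010-12-31", freq="D")
>>> pet = pd.Series(rng.uniform(1, 5, len(idx)), index=idx)
>>> prcp = pd.Series(rng.uniform(0, 10, len(idx)), index=idx)
>>> ai = aridity_index(pet, prcp)  # a single numpy.float64 per the overloads above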

hydrosignatures.hydrosignatures.baseflow(discharge, alpha=0.925, n_passes=3, pad_width=10)#

Extract baseflow using the Lyne and Hollick filter (Ladson et al., 2013).

Parameters:
  • discharge (numpy.ndarray or pandas.DataFrame or pandas.Series or xarray.DataArray) – Discharge time series that must not have any missing values. It can also be a 2D array where each row is a time series.

  • n_passes (int, optional) – Number of filter passes, defaults to 3. It must be an odd number of at least 3.

  • alpha (float, optional) – Filter parameter that must be between 0 and 1, defaults to 0.925.

  • pad_width (int, optional) – Padding width for extending the data from both ends to address the warm-up issue, defaults to 10.

Returns:

numpy.ndarray or pandas.DataFrame or pandas.Series or xarray.DataArray – Same discharge input array-like but values replaced with computed baseflow values.

Return type:

ArrayVar
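
A short sketch applying the filter to a synthetic, gap-free daily series (random data for illustration; baseflow and baseflow_index, documented next, are assumed to be re-exported at the package level):

>>> import numpy as np
>>> import pandas as pd
>>> from hydrosignatures import baseflow, baseflow_index
>>> q = pd.Series(np.random.default_rng(7).lognormal(1.0, 0.6, 730))  # synthetic discharge, no gaps
>>> qb = baseflow(q, alpha=0.925, n_passes=3)
>>> bfi = baseflow_index(q)  # ratio of total baseflow to total discharge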

hydrosignatures.hydrosignatures.baseflow_index(discharge, alpha=0.925, n_passes=3, pad_width=10)#

Compute the baseflow index using the Lyne and Hollick filter (Ladson et al., 2013).

Parameters:
  • discharge (numpy.ndarray or pandas.DataFrame or pandas.Series or xarray.DataArray) – Discharge time series that must not have any missing values. It can also be a 2D array where each row is a time series.

  • n_passes (int, optional) – Number of filter passes, defaults to 3. It must be an odd number of at least 3.

  • alpha (float, optional) – Filter parameter that must be between 0 and 1, defaults to 0.925.

  • pad_width (int, optional) – Padding width for extending the data from both ends to address the warm-up issue, defaults to 10.

Returns:

numpy.float64 – The baseflow index.

Return type:

numpy.float64

hydrosignatures.hydrosignatures.exceedance(daily, threshold=0.001)#

Compute exceedance probability from daily data.

Parameters:
  • daily (pandas.Series or pandas.DataFrame) – The data to be processed

  • threshold (float, optional) – The threshold to compute exceedance probability, defaults to 1e-3.

Returns:

pandas.Series or pandas.DataFrame – Exceedance probability.

Return type:

pandas.DataFrame
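
For example, with a synthetic daily series (random data, illustration only; exceedance is assumed to be re-exported at the package level):

>>> import numpy as np
>>> import pandas as pd
>>> from hydrosignatures import exceedance
>>> q = pd.Series(np.random.default_rng(0).lognormal(1.0, 0.7, 365))
>>> fdc = exceedance(q)  # flow values paired with their exceedance probabilities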

hydrosignatures.hydrosignatures.extract_extrema(ts, var_name, n_pts)#

Get local extrema in a time series.

Parameters:
  • ts (pandas.Series) – Variable time series.

  • var_name (str) – Variable name.

  • n_pts (int) – Number of points to consider for detecting local extrema on both sides of each point.

Returns:

pandas.DataFrame – A dataframe with three columns: var_name, peak (bool) and trough (bool).

Return type:

pandas.DataFrame

hydrosignatures.hydrosignatures.flashiness_index(daily)#

Compute flashiness index from daily data following Baker et al. (2004).

Parameters:

daily (pandas.Series or pandas.DataFrame or numpy.ndarray or xarray.DataArray) – The data to be processed

Returns:

numpy.ndarray – Flashiness index.

Return type:

FloatArray

References

Baker, D.B., Richards, R.P., Loftus, T.T. and Kramer, J.W., 2004. A new flashiness index: Characteristics and applications to midwestern rivers and streams. JAWRA Journal of the American Water Resources Association, 40(2), pp.503-522.

hydrosignatures.hydrosignatures.flood_moments(streamflow)#

Compute flood moments (MAF, CV, CS) from streamflow.

Parameters:

streamflow (pandas.DataFrame) – The streamflow data to be processed

Returns:

pandas.DataFrame – Flood moments; mean annual flood (MAF), coefficient of variation (CV), and coefficient of skewness (CS).

Return type:

pandas.DataFrame

hydrosignatures.hydrosignatures.flow_duration_curve_slope(discharge, bins, log)#

Compute FDC slopes between the given lower and upper percentiles.

Parameters:
  • discharge (pandas.Series or pandas.DataFrame or numpy.ndarray or xarray.DataArray) – The discharge data.

  • bins (tuple of int) – The percentile bins between 1 and 100 within which to compute the slope of the FDC, e.g., (33, 67).

  • log (bool) – Whether to use log-transformed flow values when computing the slopes.

Returns:

numpy.ndarray – The slopes between the given percentiles.

Return type:

FloatArray

hydrosignatures.hydrosignatures.mean_monthly(daily, index_abbr=False, cms=False)#

Compute mean monthly summary from daily data.

Parameters:
  • daily (pandas.Series or pandas.DataFrame) – The data to be processed

  • index_abbr (bool, optional) – Whether to use abbreviated month names as index instead of numbers, defaults to False.

  • cms (bool, optional) – Whether the input data is in cubic meters per second (cms), defaults to False. If True, the mean monthly summary will be computed by taking the mean of the daily data, otherwise the sum of the daily data will be used.

Returns:

pandas.Series or pandas.DataFrame – Mean monthly summary.

Return type:

DF
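
For example, aggregating a synthetic daily series to a regime curve (random data, illustration only; mean_monthly is assumed to be re-exported at the package level):

>>> import numpy as np
>>> import pandas as pd
>>> from hydrosignatures import mean_monthly
>>> idx = pd.date_range("2001-01-01", "2010-12-31", freq="D")
>>> q = pd.Series(np.random.default_rng(3).lognormal(1.0, 0.5, len(idx)), index=idx)
>>> mm = mean_monthly(q, index_abbr=True, cms=True)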

hydrosignatures.hydrosignatures.rolling_mean_monthly(daily)#

Compute rolling mean monthly.

hydrosignatures.hydrosignatures.seasonality_index_markham(data)#

Compute seasonality index based on Markham, 1970.

hydrosignatures.hydrosignatures.seasonality_index_walsh(data)#

Compute seasonality index based on the Walsh and Lawler (1981) method.

Package Contents#

async_retriever#

Top-level package.

Submodules#

async_retriever.async_retriever#

Core async functions.

Module Contents#
async_retriever.async_retriever.delete_url_cache(url, request_method='GET', cache_name=None, **kwargs)#

Delete cached response associated with url, along with its history (if applicable).

Parameters:
  • url (str) – URL to be deleted from the cache

  • request_method (str, optional) – HTTP request method to be deleted from the cache, defaults to GET.

  • cache_name (str, optional) – Path to a file for caching the session, defaults to ./cache/aiohttp_cache.sqlite.

  • kwargs (dict, optional) – Keywords to pass to cache.delete_url().

async_retriever.async_retriever.retrieve(urls: Sequence[aiohttp.typedefs.StrOrURL], read_method: Literal[text], request_kwds: Sequence[dict[str, Any]] | None = ..., request_method: Literal[get, GET, post, POST] = ..., max_workers: int = ..., cache_name: pathlib.Path | str | None = ..., timeout: int = ..., expire_after: int = ..., ssl: ssl.SSLContext | bool | None = ..., disable: bool = ..., raise_status: bool = ...) list[str]#
async_retriever.async_retriever.retrieve(urls: Sequence[aiohttp.typedefs.StrOrURL], read_method: Literal[ujson], request_kwds: Sequence[dict[str, Any]] | None = ..., request_method: Literal[get, GET, post, POST] = ..., max_workers: int = ..., cache_name: pathlib.Path | str | None = ..., timeout: int = ..., expire_after: int = ..., ssl: ssl.SSLContext | bool | None = ..., disable: bool = ..., raise_status: bool = ...) list[dict[str, Any]] | list[list[dict[str, Any]]]
async_retriever.async_retriever.retrieve(urls: Sequence[aiohttp.typedefs.StrOrURL], read_method: Literal[binary], request_kwds: Sequence[dict[str, Any]] | None = ..., request_method: Literal[get, GET, post, POST] = ..., max_workers: int = ..., cache_name: pathlib.Path | str | None = ..., timeout: int = ..., expire_after: int = ..., ssl: ssl.SSLContext | bool | None = ..., disable: bool = ..., raise_status: bool = ...) list[bytes]

Send async requests.

Parameters:
  • urls (list of str) – List of URLs.

  • read_method (str) – Method for reading the responses; valid options are binary, json, and text.

  • request_kwds (list of dict, optional) – List of requests keywords corresponding to input URLs (1 on 1 mapping), defaults to None. For example, [{"params": {...}, "headers": {...}}, ...].

  • request_method (str, optional) – Request type; GET (get) or POST (post). Defaults to GET.

  • max_workers (int, optional) – Maximum number of async processes, defaults to 8.

  • cache_name (str, optional) – Path to a file for caching the session, defaults to ./cache/aiohttp_cache.sqlite.

  • timeout (int, optional) – Requests timeout in seconds, defaults to 5.

  • expire_after (int, optional) – Expiration time for response caching in seconds, defaults to 2592000 (30 days).

  • ssl (bool or SSLContext, optional) – SSLContext to use for the connection, defaults to None. Set to False to disable SSL certification verification.

  • disable (bool, optional) – If True, temporarily disable caching requests and get new responses from the server, defaults to False.

  • raise_status (bool, optional) – Raise an exception if the response status is not 200. If False, return None. Defaults to True.

Returns:

list – List of responses in the order of input URLs.

Examples

>>> import async_retriever as ar
>>> stations = ["01646500", "08072300", "11073495"]
>>> url = "https://waterservices.usgs.gov/nwis/site"
>>> urls, kwds = zip(
...     *[
...         (url, {"params": {"format": "rdb", "sites": s, "siteStatus": "all"}})
...         for s in stations
...     ]
... )
>>> resp = ar.retrieve(urls, "text", request_kwds=kwds)
>>> resp[0].split("\n")[-2].split("\t")[1]
'01646500'
async_retriever.async_retriever.retrieve_binary(urls, request_kwds=None, request_method='GET', max_workers=8, cache_name=None, timeout=5, expire_after=EXPIRE_AFTER, ssl=None, disable=False, raise_status=True)#

Send async requests and get the response as bytes.

Parameters:
  • urls (list of str) – List of URLs.

  • request_kwds (list of dict, optional) – List of requests keywords corresponding to input URLs (1 on 1 mapping), defaults to None. For example, [{"params": {...}, "headers": {...}}, ...].

  • request_method (str, optional) – Request type; GET (get) or POST (post). Defaults to GET.

  • max_workers (int, optional) – Maximum number of async processes, defaults to 8.

  • cache_name (str, optional) – Path to a file for caching the session, defaults to ./cache/aiohttp_cache.sqlite.

  • timeout (int, optional) – Requests timeout in seconds, defaults to 5.

  • expire_after (int, optional) – Expiration time for response caching in seconds, defaults to 2592000 (30 days).

  • ssl (bool or SSLContext, optional) – SSLContext to use for the connection, defaults to None. Set to False to disable SSL certification verification.

  • disable (bool, optional) – If True, temporarily disable caching requests and get new responses from the server, defaults to False.

  • raise_status (bool, optional) – Raise an exception if the response status is not 200. If False, return None. Defaults to True.

Returns:

list of bytes – List of responses in the order of input URLs.

Return type:

list[bytes]
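
A short sketch, reusing the test-file URL from the stream_write example below; the isinstance check simply reflects the documented list[bytes] return type:

>>> import async_retriever as ar
>>> url = "https://freetestdata.com/wp-content/uploads/2021/09/Free_Test_Data_500KB_CSV-1.csv"
>>> resp = ar.retrieve_binary([url])
>>> isinstance(resp[0], bytes)
True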

async_retriever.async_retriever.retrieve_json(urls, request_kwds=None, request_method='GET', max_workers=8, cache_name=None, timeout=5, expire_after=EXPIRE_AFTER, ssl=None, disable=False, raise_status=True)#

Send async requests and get the response as json.

Parameters:
  • urls (list of str) – List of URLs.

  • request_kwds (list of dict, optional) – List of requests keywords corresponding to input URLs (1 on 1 mapping), defaults to None. For example, [{"params": {...}, "headers": {...}}, ...].

  • request_method (str, optional) – Request type; GET (get) or POST (post). Defaults to GET.

  • max_workers (int, optional) – Maximum number of async processes, defaults to 8.

  • cache_name (str, optional) – Path to a file for caching the session, defaults to ./cache/aiohttp_cache.sqlite.

  • timeout (int, optional) – Requests timeout in seconds, defaults to 5.

  • expire_after (int, optional) – Expiration time for response caching in seconds, defaults to 2592000 (30 days).

  • ssl (bool or SSLContext, optional) – SSLContext to use for the connection, defaults to None. Set to False to disable SSL certification verification.

  • disable (bool, optional) – If True, temporarily disable caching requests and get new responses from the server, defaults to False.

  • raise_status (bool, optional) – Raise an exception if the response status is not 200. If False, return None. Defaults to True.

Returns:

list of dict – List of responses in the order of input URLs.

Return type:

list[dict[str, Any]] | list[list[dict[str, Any]]]

Examples

>>> import async_retriever as ar
>>> urls = ["https://labs.waterdata.usgs.gov/api/nldi/linked-data/comid/position"]
>>> kwds = [
...     {
...         "params": {
...             "f": "json",
...             "coords": "POINT(-68.325 45.0369)",
...         },
...     },
... ]
>>> r = ar.retrieve_json(urls, kwds)
>>> print(r[0]["features"][0]["properties"]["identifier"])
2675320
async_retriever.async_retriever.retrieve_text(urls, request_kwds=None, request_method='GET', max_workers=8, cache_name=None, timeout=5, expire_after=EXPIRE_AFTER, ssl=None, disable=False, raise_status=True)#

Send async requests and get the response as text.

Parameters:
  • urls (list of str) – List of URLs.

  • request_kwds (list of dict, optional) – List of requests keywords corresponding to input URLs (1 on 1 mapping), defaults to None. For example, [{"params": {...}, "headers": {...}}, ...].

  • request_method (str, optional) – Request type; GET (get) or POST (post). Defaults to GET.

  • max_workers (int, optional) – Maximum number of async processes, defaults to 8.

  • cache_name (str, optional) – Path to a file for caching the session, defaults to ./cache/aiohttp_cache.sqlite.

  • timeout (int, optional) – Requests timeout in seconds, defaults to 5.

  • expire_after (int, optional) – Expiration time for response caching in seconds, defaults to 2592000 (30 days).

  • ssl (bool or SSLContext, optional) – SSLContext to use for the connection, defaults to None. Set to False to disable SSL certification verification.

  • disable (bool, optional) – If True, temporarily disable caching requests and get new responses from the server, defaults to False.

  • raise_status (bool, optional) – Raise an exception if the response status is not 200. If False, return None. Defaults to True.

Returns:

list – List of responses in the order of input URLs.

Return type:

list[str]

Examples

>>> import async_retriever as ar
>>> stations = ["01646500", "08072300", "11073495"]
>>> url = "https://waterservices.usgs.gov/nwis/site"
>>> urls, kwds = zip(
...     *[
...         (url, {"params": {"format": "rdb", "sites": s, "siteStatus": "all"}})
...         for s in stations
...     ]
... )
>>> resp = ar.retrieve_text(urls, kwds)
>>> resp[0].split("\n")[-2].split("\t")[1]
'01646500'
async_retriever.async_retriever.stream_write(urls, file_paths, request_kwds=None, request_method='GET', max_workers=8, ssl=None, chunk_size=None)#

Send async requests and stream the responses to files.

Parameters:
  • urls (list of str) – List of URLs.

  • file_paths (list of str or pathlib.Path) – List of file paths to write the response to.

  • request_kwds (list of dict, optional) – List of requests keywords corresponding to input URLs (1 on 1 mapping), defaults to None. For example, [{"params": {...}, "headers": {...}}, ...].

  • request_method (str, optional) – Request type; GET (get) or POST (post). Defaults to GET.

  • max_workers (int, optional) – Maximum number of async processes, defaults to 8.

  • ssl (bool or SSLContext, optional) – SSLContext to use for the connection, defaults to None. Set to False to disable SSL certification verification.

  • chunk_size (int, optional) – The size of the chunks in bytes to be written to the file, defaults to None, which iterates over data chunks and writes them as received from the server.

Examples

>>> import async_retriever as ar
>>> import tempfile
>>> url = "https://freetestdata.com/wp-content/uploads/2021/09/Free_Test_Data_500KB_CSV-1.csv"
>>> with tempfile.NamedTemporaryFile() as temp:
...     ar.stream_write([url], [temp.name])

Package Contents#

pygeoogc#

Top-level package for PyGeoOGC.

Submodules#

pygeoogc.cache_keys#

Functions for creating unique keys based on web request parameters.

This module is based on the aiohttp-client-cache package, which is licensed under the MIT license. See the LICENSE file for more details.

Module Contents#
pygeoogc.cache_keys.create_request_key(method, url, params=None, data=None, json=None)#

Create a unique cache key based on request details.
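
A minimal sketch; the key's exact format is an implementation detail of the caching layer, so only the call pattern is shown:

>>> from pygeoogc.cache_keys import create_request_key
>>> key = create_request_key(
...     "GET",
...     "https://waterservices.usgs.gov/nwis/site",
...     params={"format": "rdb", "sites": "01646500"},
... )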

pygeoogc.core#

Base classes and functions for REST, WMS, and WFS services.

Module Contents#
class pygeoogc.core.ArcGISRESTfulBase(base_url, layer=None, outformat='geojson', outfields='*', crs=4326, max_workers=1, verbose=False, disable_retry=False)#

Access to an ArcGIS REST service.

Parameters:
  • base_url (str) – The ArcGIS RESTful service URL. The URL must either include a layer number after the last / or the target layer must be passed as an argument.

  • layer (int, optional) – Target layer number, defaults to None. If None, the layer number must be included after the last / in base_url.

  • outformat (str, optional) – One of the output formats offered by the selected layer, defaults to geojson. If not correct, a list of available formats is shown.

  • outfields (str or list) – The output fields to be requested. Setting * as outfields requests all the available fields, which is the default setting.

  • crs (str, int, or pyproj.CRS, optional) – The spatial reference of the output data, defaults to epsg:4326.

  • max_workers (int, optional) – Max number of simultaneous requests, defaults to 1. Note that some services might face issues when several requests are sent simultaneously and will return the requests partially. It’s recommended to avoid using too many workers unless you are certain the web service can handle it.

  • verbose (bool, optional) – If True, prints information about the requests and responses, defaults to False.

  • disable_retry (bool, optional) – If True, no retry attempts are made for failed queries, and the object IDs of the failed requests are saved to a text file whose path can be accessed via self.failed_path.

get_features(featureids, return_m=False, return_geom=True)#

Get features based on the feature IDs.

Parameters:
  • featureids (list) – List of feature IDs.

  • return_m (bool, optional) – Whether to activate the Return M (measure) in the request, defaults to False.

  • return_geom (bool, optional) – Whether to return the geometry of the feature, defaults to True.

Returns:

list of dict – (Geo)json responses from the web service.

Return type:

list[dict[str, Any]]

get_response(url, payloads, method='GET')#

Send payload and get the response.

initialize_service()#

Initialize the RESTFul service.

partition_oids(oids)#

Partition feature IDs based on self.max_nrecords.

retry_failed_requests()#

Retry failed requests.

class pygeoogc.core.WFSBase(url, layer=None, outformat=None, version='2.0.0', crs=4326, read_method='json', max_nrecords=1000, validation=True)#

Base class for WFS service.

Parameters:
  • url (str) – The base url for the WFS service, for examples: https://hazards.fema.gov/nfhl/services/public/NFHL/MapServer/WFSServer

  • layer (str) – The layer from the service to be downloaded, defaults to None, which raises an error that lists all the available layers offered by the service.

  • outformat (str) – The data format to request from the service, defaults to None, which raises an error that lists all the available formats offered by the service.

  • version (str, optional) – The WFS service version which should be either 1.0.0, 1.1.0, or 2.0.0. Defaults to 2.0.0.

  • crs (str, int, or pyproj.CRS, optional) – The spatial reference system to be used for requesting the data, defaults to epsg:4326.

  • read_method (str, optional) – Method for reading the retrieved data, defaults to json. Valid options are json, binary, and text.

  • max_nrecords (int, optional) – The maximum number of records in a single request to be retrieved from the service, defaults to 1000. If the number of requested records is greater than this value, the query will be split into multiple requests.

  • validation (bool, optional) – Validate the input arguments from the WFS service, defaults to True. Set this to False if you are sure all the WFS settings such as layer and crs are correct to avoid sending extra requests.

get_service_options()#

Validate input arguments with the WFS service.

sort_params(sort_attr, nfeatures, start_index)#

Get sort parameters for a WFS request.

validate_wfs()#

Validate input arguments with the WFS service.

class pygeoogc.core.WMSBase(url, layers='', outformat='', version='1.3.0', crs=4326, validation=True)#

Base class for accessing a WMS service.

Parameters:
  • url (str) – The base url for the WMS service e.g., https://www.mrlc.gov/geoserver/mrlc_download/wms

  • layers (str or list, optional) – A layer or a list of layers from the service to be downloaded. You can pass an empty string to get a list of available layers.

  • outformat (str, optional) – The data format to request for data from the service. You can pass an empty string to get a list of available output formats.

  • version (str, optional) – The WMS service version which should be either 1.1.1 or 1.3.0, defaults to 1.3.0.

  • crs (str, int, or pyproj.CRS, optional) – The spatial reference system to be used for requesting the data, defaults to epsg:4326.

  • validation (bool, optional) – Validate the input arguments from the WMS service, defaults to True. Set this to False if you are sure all the WMS settings such as layer and crs are correct to avoid sending extra requests.

get_service_options()#

Validate input arguments with the WMS service.

get_validlayers()#

Get the layers supported by the WMS service.

validate_wms()#

Validate input arguments with the WMS service.

pygeoogc.pygeoogc#

Base classes and functions for REST, WMS, and WFS services.

Module Contents#
class pygeoogc.pygeoogc.ArcGISRESTful(base_url, layer=None, outformat='geojson', outfields='*', crs=4326, max_workers=1, verbose=False, disable_retry=False)#

Access to an ArcGIS REST service.

Notes

By default, all retrieval methods retry to get the missing feature IDs, if there are any. You can disable this behavior by setting disable_retry to True. If there are any missing feature IDs after the retry, they are saved to a text file, the path of which can be accessed via self.client.failed_path.

Parameters:
  • base_url (str) – The ArcGIS RESTful service URL. The URL must either include a layer number after the last / or the target layer must be passed as an argument.

  • layer (int, optional) – Target layer number, defaults to None. If None, the layer number must be included after the last / in base_url.

  • outformat (str, optional) – One of the output formats offered by the selected layer, defaults to geojson. If not correct, a list of available formats is shown.

  • outfields (str or list) – The output fields to be requested. Setting * as outfields requests all the available fields, which is the default behaviour.

  • crs (str, int, or pyproj.CRS, optional) – The spatial reference of the output data, defaults to epsg:4326.

  • max_workers (int, optional) – Number of simultaneous downloads, defaults to 1, i.e., no threading. Note that some services might face issues when several requests are sent simultaneously and will return the requests partially. It’s recommended to avoid using too many workers unless you are certain the web service can handle it.

  • verbose (bool, optional) – If True, prints information about the requests and responses, defaults to False.

  • disable_retry (bool, optional) – If True, no retry attempts are made for failed queries, and the object IDs of the failed requests are saved to a text file whose path can be accessed via self.client.failed_path.

get_features(featureids, return_m=False, return_geom=True)#

Get features based on the feature IDs.

Parameters:
  • featureids (list) – List of feature IDs.

  • return_m (bool, optional) – Whether to activate the Return M (measure) in the request, defaults to False.

  • return_geom (bool, optional) – Whether to return the geometry of the feature, defaults to True.

Returns:

list of dict – (Geo)json responses from the web service.

Return type:

list[dict[str, Any]]

oids_byfield(field, ids)#

Get Object IDs based on a list of field IDs.

Parameters:
  • field (str) – Name of the target field that IDs belong to.

  • ids (str or list) – A list of target ID(s).

Returns:

list of tuples – A list of feature IDs partitioned by self.max_nrecords.

Return type:

Iterator[tuple[str, Ellipsis]]

oids_bygeom(geom, geo_crs=4326, spatial_relation='esriSpatialRelIntersects', sql_clause=None, distance=None)#

Get feature IDs within a geometry that can be combined with a SQL where clause.

Parameters:
  • geom (LineString, Polygon, Point, MultiPoint, tuple, or list of tuples) – A geometry (LineString, Polygon, Point, MultiPoint), tuple of length two ((x, y)), a list of tuples of length 2 ([(x, y), ...]), or bounding box (tuple of length 4 ((xmin, ymin, xmax, ymax))).

  • geo_crs (str, int, or pyproj.CRS, optional) – The spatial reference of the input geometry, defaults to epsg:4326.

  • spatial_relation (str, optional) – The spatial relationship to be applied on the input geometry while performing the query. If not correct a list of available options is shown. It defaults to esriSpatialRelIntersects. Valid predicates are:

    • esriSpatialRelIntersects

    • esriSpatialRelContains

    • esriSpatialRelCrosses

    • esriSpatialRelEnvelopeIntersects

    • esriSpatialRelIndexIntersects

    • esriSpatialRelOverlaps

    • esriSpatialRelTouches

    • esriSpatialRelWithin

    • esriSpatialRelRelation

  • sql_clause (str, optional) – Valid SQL 92 WHERE clause, defaults to None.

  • distance (int, optional) – Buffer distance in meters for the input geometries, defaults to None.

Returns:

list of tuples – A list of feature IDs partitioned by self.max_nrecords.

Return type:

Iterator[tuple[str, Ellipsis]]

oids_bysql(sql_clause)#

Get feature IDs using a valid SQL 92 WHERE clause.

Notes

Not all web services support this type of query. For more details, consult the ArcGIS REST API documentation.

Parameters:

sql_clause (str) – A valid SQL 92 WHERE clause.

Returns:

list of tuples – A list of feature IDs partitioned by self.max_nrecords.

Return type:

Iterator[tuple[str, Ellipsis]]

partition_oids(oids)#

Partition feature IDs based on self.max_nrecords.

Parameters:

oids (list of int or int) – A list of feature ID(s).

Returns:

list of tuples – A list of feature IDs partitioned by self.max_nrecords.

Return type:

Iterator[tuple[str, Ellipsis]]
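
A usage sketch tying the query methods together, following the pattern implied by the partitioned return values above. The service URL and layer number are placeholders, not a real endpoint:

>>> from pygeoogc import ArcGISRESTful
>>> url = "https://example.com/arcgis/rest/services/SomeService/MapServer"  # placeholder URL
>>> service = ArcGISRESTful(url, layer=0, outformat="json")
>>> oids = service.oids_bygeom((-69.77, 45.07, -69.31, 45.45), geo_crs=4326)
>>> features = service.get_features(oids)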

class pygeoogc.pygeoogc.HttpURLs#

URLs of the supported HTTP services.

class pygeoogc.pygeoogc.RESTfulURLs#

URLs of the supported RESTful services.

class pygeoogc.pygeoogc.ServiceURL#

URLs of the supported services.

class pygeoogc.pygeoogc.WFS(url, layer=None, outformat=None, version='2.0.0', crs=4326, read_method='json', max_nrecords=1000, validation=True)#

Data from any WFS service within a geometry or by featureid.

Parameters:
  • url (str) – The base url for the WFS service, for examples: https://hazards.fema.gov/nfhl/services/public/NFHL/MapServer/WFSServer

  • layer (str) – The layer from the service to be downloaded, defaults to None, which raises an error that lists all the available layers offered by the service.

  • outformat (str) – The data format to request from the service, defaults to None, which raises an error that lists all the available formats offered by the service.

  • version (str, optional) – The WFS service version which should be either 1.0.0, 1.1.0, or 2.0.0. Defaults to 2.0.0.

  • crs (str, int, or pyproj.CRS, optional) – The spatial reference system to be used for requesting the data, defaults to epsg:4326.

  • read_method (str, optional) – Method for reading the retrieved data, defaults to json. Valid options are json, binary, and text.

  • max_nrecords (int, optional) – The maximum number of records in a single request to be retrieved from the service, defaults to 1000. If the number of records requested is greater than this value, it will be split into multiple requests.

  • validation (bool, optional) – Validate the input arguments from the WFS service, defaults to True. Set this to False if you are sure all the WFS settings such as layer and crs are correct to avoid sending extra requests.

getfeature_bybox(bbox, box_crs=4326, always_xy=False, sort_attr=None)#

Get data from a WFS service within a bounding box.

Parameters:
  • bbox (tuple) – A bounding box for getting the data: [west, south, east, north]

  • box_crs (str, int, or pyproj.CRS, optional) – The spatial reference system of the input bbox, defaults to epsg:4326.

  • always_xy (bool, optional) – Whether to always use xy axis order, defaults to False. Some services change the axis order from xy to yx, following the latest WFS version specifications but some don’t. If the returned value does not have any geometry, it indicates that most probably the axis order does not match. You can set this to True in that case.

  • sort_attr (str, optional) – The column name in the database to sort request by, defaults to the first attribute in the schema that contains id in its name.

Returns:

list of str or bytes or dict – WFS query response within a bounding box.

Return type:

RESPONSE

getfeature_byfilter(cql_filter, method='GET', sort_attr=None)#

Get features based on a valid CQL filter.

Notes

The validity of the input CQL expression is the user’s responsibility since the function does not perform any checks and just sends a request using the input filter.

Parameters:
  • cql_filter (str) – A valid CQL filter expression.

  • method (str) – The request method, could be GET or POST (for long filters).

  • sort_attr (str, optional) – The column name in the database to sort request by, defaults to the first attribute in the schema that contains id in its name.

Returns:

str or bytes or dict – WFS query response

Return type:

RESPONSE

getfeature_bygeom(geometry, geo_crs=4326, always_xy=False, predicate='INTERSECTS', sort_attr=None)#

Get features based on a geometry.

Parameters:
  • geometry (shapely.Polygon or shapely.MultiPolygon) – The input geometry

  • geo_crs (str, int, or pyproj.CRS, optional) – The CRS of the input geometry, defaults to epsg:4326.

  • always_xy (bool, optional) – Whether to always use xy axis order, defaults to False. Some services change the axis order from xy to yx, following the latest WFS version specifications but some don’t. If the returned value does not have any geometry, it indicates that most probably the axis order does not match. You can set this to True in that case.

  • predicate (str, optional) – The geometric predicate to use for requesting the data, defaults to INTERSECTS. Valid predicates are:

    • EQUALS

    • DISJOINT

    • INTERSECTS

    • TOUCHES

    • CROSSES

    • WITHIN

    • CONTAINS

    • OVERLAPS

    • RELATE

    • BEYOND

  • sort_attr (str, optional) – The column name in the database to sort request by, defaults to the first attribute in the schema that contains id in its name.

Returns:

str or bytes or dict – WFS query response based on the given geometry.

Return type:

RESPONSE

getfeature_byid(featurename, featureids)#

Get features based on feature IDs.

Parameters:
  • featurename (str) – The name of the column for searching for feature IDs.

  • featureids (int, str, or list of them) – The feature ID(s).

Returns:

str or bytes or dict – WFS query response.

Return type:

RESPONSE
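
A usage sketch with the FEMA NFHL endpoint mentioned above; the layer and output format names are illustrative and should be checked against the service (with validation enabled, incorrect values raise an error listing the valid options):

>>> from pygeoogc import WFS
>>> wfs = WFS(
...     "https://hazards.fema.gov/nfhl/services/public/NFHL/MapServer/WFSServer",
...     layer="public_NFHL:Base_Flood_Elevations",  # illustrative layer name
...     outformat="esrigeojson",  # illustrative output format
...     crs=4269,
... )
>>> resp = wfs.getfeature_bybox((-69.77, 45.07, -69.31, 45.45), box_crs=4326)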

class pygeoogc.pygeoogc.WFSURLs#

URLs of the supported WFS services.

class pygeoogc.pygeoogc.WMS(url, layers, outformat, version='1.3.0', crs=4326, validation=True, ssl=True)#

Get data from a WMS service within a geometry or bounding box.

Parameters:
  • url (str) – The base url for the WMS service e.g., https://www.mrlc.gov/geoserver/mrlc_download/wms

  • layers (str or list) – A layer or a list of layers from the service to be downloaded. You can pass an empty string to get a list of available layers.

  • outformat (str) – The data format to request for data from the service. You can pass an empty string to get a list of available output formats.

  • crs (str, int, or pyproj.CRS, optional) – The spatial reference system to be used for requesting the data, defaults to epsg:4326.

  • version (str, optional) – The WMS service version which should be either 1.1.1 or 1.3.0, defaults to 1.3.0.

  • validation (bool, optional) – Validate the input arguments from the WMS service, defaults to True. Set this to False if you are sure all the WMS settings such as layer and crs are correct to avoid sending extra requests.

  • ssl (bool, optional) – Whether to use SSL for the connection, defaults to True.

get_validlayers()#

Get the layers supported by the WMS service.

getmap_bybox(bbox: tuple[float, float, float, float], resolution: float, box_crs: CRSTYPE = ..., always_xy: bool = ..., max_px: int = ..., kwargs: dict[str, Any] | None = ..., tiff_dir: Literal[None] = None) dict[str, bytes]#
getmap_bybox(bbox: tuple[float, float, float, float], resolution: float, box_crs: CRSTYPE = ..., always_xy: bool = ..., max_px: int = ..., kwargs: dict[str, Any] | None = ..., tiff_dir: str | pathlib.Path = ...) list[pathlib.Path]

Get data from a WMS service within a geometry or bounding box.

Parameters:
  • bbox (tuple) – A bounding box for getting the data.

  • resolution (float) – The output resolution in meters. The width and height of the output are computed in pixels based on the geometry bounds and the given resolution.

  • box_crs (str, int, or pyproj.CRS, optional) – The spatial reference system of the input bbox, defaults to epsg:4326.

  • always_xy (bool, optional) – Whether to always use xy axis order, defaults to False. Some services change the axis order from xy to yx, following the latest WFS version specifications but some don’t. If the returned value does not have any geometry, it indicates that most probably the axis order does not match. You can set this to True in that case.

  • max_px (int, optional) – The maximum allowable number of pixels (width x height) for a WMS request, defaults to 8 million based on some trial-and-error.

  • kwargs (dict, optional) – Optional additional keywords passed as payload, defaults to None. For example, {"styles": "default"}.

  • tiff_dir (str or pathlib.Path, optional) – If given, the retrieved data will be stored on disk instead of returning it, defaults to None, i.e., saving to memory and returning the data.

Returns:

dict of bytes or list of pathlib.Path – If tiff_dir is None, a dict where the keys are the layer names and the values are the responses from the WMS service as bytes. Otherwise, a list of pathlib.Path objects pointing to the saved files.
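
A usage sketch with the MRLC endpoint mentioned above; the layer name is illustrative (pass an empty string to list the valid layers):

>>> from pygeoogc import WMS
>>> wms = WMS(
...     "https://www.mrlc.gov/geoserver/mrlc_download/wms",
...     layers="NLCD_2021_Land_Cover_L48",  # illustrative layer name
...     outformat="image/geotiff",
...     crs=4326,
... )
>>> r_dict = wms.getmap_bybox((-69.77, 45.07, -69.31, 45.45), resolution=30)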

class pygeoogc.pygeoogc.WMSURLs#

URLs of the supported WMS services.

pygeoogc.utils#

Some utilities for PyGeoOGC.

Module Contents#
class pygeoogc.utils.RetrySession(retries=3, backoff_factor=0.3, status_to_retry=(500, 502, 504), prefixes=('https://',), cache_name=None, expire_after=EXPIRE_AFTER, disable=False, ssl=True)#

Configures the passed-in session to retry on failed requests.

Notes

The failures can be due to connection errors, specific HTTP response codes and 30X redirections. The code was originally based on: bustawin/retry-requests

Parameters:
  • retries (int, optional) – The number of maximum retries before raising an exception, defaults to 3.

  • backoff_factor (float, optional) – A factor used to compute the waiting time between retries, defaults to 0.3.

  • status_to_retry (tuple, optional) – A tuple of status codes that trigger the retry behaviour, defaults to (500, 502, 504).

  • prefixes (tuple, optional) – The URL prefixes to consider, defaults to ("https://",).

  • cache_name (str, optional) – Path to a folder for caching the session, default to None which uses system’s temp directory.

  • expire_after (int, optional) – Expiration time for the cache in seconds, defaults to -1 (never expire).

  • disable (bool, optional) – If True temporarily disable caching request/responses, defaults to False.

  • ssl (bool, optional) – If True verify SSL certificates, defaults to True.

property disable: bool#

Disable caching request/responses.

close()#

Close the session.

get(url, payload=None, params=None, headers=None, stream=None)#

Retrieve data from a url by GET and return the Response.

head(url, params=None, data=None, json=None, headers=None)#

Retrieve data from a url by HEAD and return the Response.

post(url, payload=None, data=None, json=None, headers=None, stream=None)#

Retrieve data from a url by POST and return the Response.
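
A minimal sketch, reusing the NWIS site service from the examples above:

>>> from pygeoogc.utils import RetrySession
>>> session = RetrySession(retries=3, backoff_factor=0.3)
>>> resp = session.get(
...     "https://waterservices.usgs.gov/nwis/site",
...     params={"format": "rdb", "sites": "01646500", "siteStatus": "all"},
... )
>>> session.close()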

pygeoogc.utils.match_crs(geom, in_crs, out_crs)#

Reproject a geometry to another CRS.

Parameters:
  • geom (list or tuple or geometry) – Input geometry which could be a list of coordinates such as [(x1, y1), ...], a bounding box like so (xmin, ymin, xmax, ymax), or any valid shapely geometry such as Polygon, MultiPolygon, etc.

  • in_crs (str, int, or pyproj.CRS) – Spatial reference of the input geometry

  • out_crs (str, int, or pyproj.CRS) – Target spatial reference

Returns:

same type as the input geometry – Transformed geometry in the target CRS.

Return type:

GEOM

Examples

>>> from shapely import Point
>>> point = Point(-7766049.665, 5691929.739)
>>> match_crs(point, 3857, 4326).xy
(array('d', [-69.7636111130079]), array('d', [45.44549114818127]))
>>> bbox = (-7766049.665, 5691929.739, -7763049.665, 5696929.739)
>>> match_crs(bbox, 3857, 4326)
(-69.7636111130079, 45.44549114818127, -69.73666165448431, 45.47699468552394)
>>> coords = [(-7766049.665, 5691929.739)]
>>> match_crs(coords, 3857, 4326)
[(-69.7636111130079, 45.44549114818127)]
pygeoogc.utils.streaming_download(urls: str, kwds: dict[str, dict[Any, Any]] | None = None, fnames: str | pathlib.Path | None = None, root_dir: str | pathlib.Path | None = None, file_prefix: str = '', file_extention: str = '', method: str = 'GET', ssl: bool = True, chunk_size: int = CHUNK_SIZE, n_jobs: int = MAX_CONN) pathlib.Path#
pygeoogc.utils.streaming_download(urls: list[str], kwds: list[dict[str, dict[Any, Any]]] | None = None, fnames: Sequence[str | pathlib.Path] | None = None, root_dir: str | pathlib.Path | None = None, file_prefix: str = '', file_extention: str = '', method: str = 'GET', ssl: bool = True, chunk_size: int = CHUNK_SIZE, n_jobs: int = MAX_CONN) list[pathlib.Path]

Download and store files in parallel from a list of URLs/Keywords.

Notes

This function runs asynchronously in parallel using n_jobs threads.

Parameters:
  • urls (tuple or list) – A list of URLs to download.

  • kwds (tuple or list, optional) – A list of keywords associated with each URL, e.g., ({“params”: …, “headers”: …}, …). Defaults to None.

  • fnames (tuple or list, optional) – A list of filenames associated with each URL, e.g., (“file1.zip”, …). Defaults to None. If not provided, random unique filenames will be generated based on URL and keyword pairs.

  • root_dir (str or Path, optional) – Root directory to store the files, defaults to None which uses HyRiver’s cache directory. Note that you should either provide root_dir or fnames. If both are provided, root_dir will be ignored.

  • file_prefix (str, optional) – Prefix to add to filenames when storing the files, defaults to None, i.e., no prefix. This argument will only be used if fnames is not passed.

  • file_extention (str, optional) – Extension to use for storing the files, defaults to None, i.e., no extension. This argument will only be used if fnames is not passed.

  • method (str, optional) – HTTP method to use, i.e, GET or POST, by default “GET”.

  • ssl (bool, optional) – Whether to use SSL verification, defaults to True.

  • chunk_size (int, optional) – Chunk size to use when downloading, defaults to 100 * 1024 * 1024 i.e., 100 MB.

  • n_jobs (int, optional) – The maximum number of concurrent downloads, defaults to 10.

Returns:

list – A list of pathlib.Path objects associated with URLs in the same order.
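
A minimal sketch, reusing the test-file URL from the stream_write example above; the output filename is arbitrary:

>>> from pygeoogc.utils import streaming_download
>>> url = "https://freetestdata.com/wp-content/uploads/2021/09/Free_Test_Data_500KB_CSV-1.csv"
>>> fpath = streaming_download(url, fnames="test_data.csv")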

pygeoogc.utils.traverse_json(json_data, ipath)#

Extract an element from a JSON-like object along a specified ipath.

This function is based on bcmullins.

Parameters:
  • json_data (dict or list of dicts) – The input json dictionary.

  • ipath (list) – The ipath to the requested element.

Returns:

list – The sub-items found in the JSON.

Return type:

list[Any]

Examples

>>> data = [
...     {"employees": [
...         {"name": "Alice", "role": "dev", "nbr": 1},
...         {"name": "Bob", "role": "dev", "nbr": 2},
...         ],},
...     {"firm": {"name": "Charlie's Waffle Emporium", "location": "CA"}},
... ]
>>> traverse_json(data, ["employees", "name"])
[['Alice', 'Bob'], [None]]
pygeoogc.utils.validate_crs(crs)#

Validate a CRS.

Parameters:

crs (str, int, or pyproj.CRS) – Input CRS.

Returns:

str – Validated CRS as a string.

Return type:

str

Package Contents#

pygeoutils#

Top-level package for PyGeoUtils.

Submodules#

pygeoutils.geotools#

Some utilities for manipulating GeoSpatial data.

Module Contents#
class pygeoutils.geotools.Coordinates#

Generate validated and normalized coordinates in WGS84.

Parameters:
  • lon (float or list of floats) – Longitude(s) in decimal degrees.

  • lat (float or list of floats) – Latitude(s) in decimal degrees.

  • bounds (tuple of length 4, optional) – The bounding box to check if the input coordinates fall within. Defaults to WGS84 bounds.

Examples

>>> c = Coordinates([460, 20, -30], [80, 200, 10])
>>> c.points.x.tolist()
[100.0, -30.0]
property points: geopandas.GeoSeries#

Get the validated coordinates as a geopandas.GeoSeries.

class pygeoutils.geotools.GeoSpline(points, n_pts, degree=3, smoothing=None)#

Create a parametric spline from a GeoDataFrame of points.

Parameters:
  • points (geopandas.GeoDataFrame or geopandas.GeoSeries) – Input points as a GeoDataFrame or GeoSeries. The results will be more accurate if the CRS is projected.

  • n_pts (int) – Number of points in the output spline curve.

  • degree (int, optional) – Degree of the smoothing spline. Must be 1 <= degree <= 5. Default to 3 which is a cubic spline.

  • smoothing (float or None, optional) – Smoothing factor is used for determining the number of knots. This arg controls the tradeoff between closeness and smoothness of fit. Larger smoothing means more smoothing while smaller values of smoothing indicate less smoothing. If None (default), smoothing is done with all points.

Examples

>>> import geopandas as gpd
>>> xl, yl = zip(
...     *[
...         (-97.06138, 32.837),
...         (-97.06133, 32.836),
...         (-97.06124, 32.834),
...         (-97.06127, 32.832),
...     ]
... )
>>> pts = gpd.GeoSeries(gpd.points_from_xy(xl, yl, crs=4326))
>>> sp = GeoSpline(pts.to_crs(3857), 5).spline
>>> pts_sp = gpd.GeoSeries(gpd.points_from_xy(sp.x, sp.y, crs=3857))
>>> pts_sp = pts_sp.to_crs(4326)
>>> list(zip(pts_sp.x, pts_sp.y))
[(-97.06138, 32.837),
(-97.06132, 32.83575),
(-97.06126, 32.83450),
(-97.06123, 32.83325),
(-97.06127, 32.83200)]
property spline: Spline#

Get the spline as a Spline object.

pygeoutils.geotools.break_lines(lines, points, tol=0.0)#

Break lines at specified points at given direction.

Parameters:
  • lines (geopandas.GeoDataFrame) – Lines to break at intersection points.

  • points (geopandas.GeoDataFrame) – Points to break lines at. It must contain a column named direction with values up or down. This column is used to determine which part of the lines to keep, i.e., upstream or downstream of points.

  • tol (float, optional) – Tolerance for snapping points to the nearest lines in meters. The default is 0.0.

Returns:

geopandas.GeoDataFrame – Original lines except for the parts that have been broken at the specified points.

Return type:

GDFTYPE

pygeoutils.geotools.coords_list(coords)#

Convert a single coordinate or list of coordinates to a list of coordinates.

Parameters:

coords (tuple of list of tuple) – Input coordinates

Returns:

list of tuple – List of coordinates as [(x1, y1), ...].

Return type:

list[tuple[float, float]]

pygeoutils.geotools.geo2polygon(geometry, geo_crs=None, crs=None)#

Convert a geometry to a shapely Polygon and transform to any CRS.

Parameters:
  • geometry (Polygon or tuple of length 4) – Polygon or bounding box (west, south, east, north).

  • geo_crs (int, str, or pyproj.CRS, optional) – Spatial reference of the input geometry, defaults to None.

  • crs (int, str, or pyproj.CRS) – Target spatial reference, defaults to None.

Returns:

shapely.Polygon or shapely.MultiPolygon – A (Multi)Polygon in the target CRS, if different from the input CRS.

Return type:

shapely.Polygon | shapely.MultiPolygon

pygeoutils.geotools.geometry_list(geometry)#

Convert input geometry to a list of Polygons, Points, or LineStrings.

Parameters:

geometry (Polygon or MultiPolygon or tuple of length 4 or list of tuples of length 2 or 3) – Input geometry could be a (Multi)Polygon, (Multi)LineString, (Multi)Point, a tuple/list of length 4 (west, south, east, north), or a list of tuples of length 2 or 3.

Returns:

list – A list of Polygons, Points, or LineStrings.

Return type:

list[shapely.Polygon] | list[shapely.Point] | list[shapely.LineString]

pygeoutils.geotools.geometry_reproject(geom, in_crs, out_crs)#

Reproject a geometry to another CRS.

Parameters:
  • geom (list or tuple or any shapely.GeometryType) – Input geometry could be a list of coordinates such as [(x1, y1), ...], a bounding box like so (xmin, ymin, xmax, ymax), or any valid shapely geometry such as Polygon, MultiPolygon, etc.

  • in_crs (str, int, or pyproj.CRS) – Spatial reference of the input geometry

  • out_crs (str, int, or pyproj.CRS) – Target spatial reference

Returns:

same type as the input geometry – Transformed geometry in the target CRS.

Return type:

GEOM

Examples

>>> from shapely import Point
>>> point = Point(-7766049.665, 5691929.739)
>>> geometry_reproject(point, 3857, 4326).xy
(array('d', [-69.7636111130079]), array('d', [45.44549114818127]))
>>> bbox = (-7766049.665, 5691929.739, -7763049.665, 5696929.739)
>>> geometry_reproject(bbox, 3857, 4326)
(-69.7636111130079, 45.44549114818127, -69.73666165448431, 45.47699468552394)
>>> coords = [(-7766049.665, 5691929.739)]
>>> geometry_reproject(coords, 3857, 4326)
[(-69.7636111130079, 45.44549114818127)]
pygeoutils.geotools.line_curvature(line)#

Compute the curvature of a Spline curve.

Notes

The formula for the curvature of a Spline curve is:

\[\kappa = \frac{\dot{x}\ddot{y} - \ddot{x}\dot{y}}{(\dot{x}^2 + \dot{y}^2)^{3/2}}\]

where \(\dot{x}\) and \(\dot{y}\) are the first derivatives of the Spline curve and \(\ddot{x}\) and \(\ddot{y}\) are the second derivatives of the Spline curve. Also, the radius of curvature is:

\[\rho = \frac{1}{|\kappa|}\]
Parameters:

line (shapely.LineString) – Line to compute the curvature at.

Returns:

  • phi (numpy.ndarray) – Angle of the tangent of the Spline curve.

  • curvature (numpy.ndarray) – Curvature of the Spline curve.

  • radius (numpy.ndarray) – Radius of curvature of the Spline curve.

Return type:

tuple[FloatArray, FloatArray, FloatArray]

pygeoutils.geotools.make_spline(x, y, n_pts, k=3, s=None)#

Create a parametric spline from a set of points.

Parameters:
  • x (numpy.ndarray) – x-coordinates of the points.

  • y (numpy.ndarray) – y-coordinates of the points.

  • n_pts (int) – Number of points in the output spline curve.

  • k (int, optional) – Degree of the smoothing spline. Must be 1 <= k <= 5. Default to 3 which is a cubic spline.

  • s (float or None, optional) – Smoothing factor is used for determining the number of knots. This arg controls the tradeoff between closeness and smoothness of fit. Larger s means more smoothing while smaller values of s indicate less smoothing. If None (default), smoothing is done with all data points.

Returns:

Spline – A Spline object with x, y, phi, radius, distance, and line attributes. The line attribute returns the Spline as a shapely.LineString.

Return type:

Spline

pygeoutils.geotools.multi2poly(gdf)#

Convert multipolygons to polygon and fill holes, if any.

Notes

This function tries to convert multipolygons to polygons by first checking whether multipolygons can be directly converted using their exterior boundaries. If not, it will try to remove very small sub-polygons whose area is less than 1% of the total area of the multipolygon. If this fails, the original multipolygon will be returned.

Parameters:

gdf (geopandas.GeoDataFrame or geopandas.GeoSeries) – A GeoDataFrame or GeoSeries with (multi)polygons. This will be more accurate if the CRS is projected.

Returns:

geopandas.GeoDataFrame or geopandas.GeoSeries – A GeoDataFrame or GeoSeries with polygons (and multipolygons).

Return type:

GDFTYPE

pygeoutils.geotools.nested_polygons(gdf)#

Get nested polygons in a GeoDataFrame.

Parameters:

gdf (geopandas.GeoDataFrame or geopandas.GeoSeries) – A GeoDataFrame or GeoSeries with (multi)polygons.

Returns:

dict – A dictionary where keys are indices of larger polygons and values are a list of indices of smaller polygons that are contained within the larger polygons.

Return type:

dict[int | str, list[int | str]]
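
For example, with two manually constructed boxes where one contains the other (per the description above, the outer polygon’s index should map to the inner one’s):

>>> import geopandas as gpd
>>> from shapely import box
>>> gdf = gpd.GeoDataFrame(geometry=[box(0, 0, 10, 10), box(2, 2, 4, 4)], crs=5070)
>>> nested = nested_polygons(gdf)  # expected: {0: [1]}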

pygeoutils.geotools.query_indices(tree_gdf, input_gdf, predicate='intersects')#

Find the indices of the input_gdf geometries that intersect with the tree_gdf geometries.

Parameters:
  • tree_gdf (geopandas.GeoDataFrame or geopandas.GeoSeries) – The geometries to build the spatial index from.

  • input_gdf (geopandas.GeoDataFrame or geopandas.GeoSeries) – The input geometries to query against the tree.

  • predicate (str, optional) – The binary predicate to use for the query, defaults to intersects.

Returns:

dict – A dictionary of the indices of the input_gdf that intersect with the tree_gdf. Keys are the index of input_gdf and values are a list of indices of the intersecting tree_gdf.

Return type:

dict[Any, list[Any]]

pygeoutils.geotools.smooth_linestring(line, smoothing=None, npts=None)#

Smooth a LineString using UnivariateSpline from scipy.

Parameters:
  • line (shapely.LineString) – Centerline to be smoothed.

  • smoothing (float or None, optional) – Smoothing factor is used for determining the number of knots. This arg controls the tradeoff between closeness and smoothness of fit. Larger smoothing means more smoothing while smaller values of smoothing indicate less smoothing. If None (default), smoothing is done with all points.

  • npts (int, optional) – Number of points in the output smoothed line. Defaults to 5 times the number of points in the input line.

Returns:

shapely.LineString – Smoothed line with uniform spacing.

Return type:

shapely.LineString

Examples

>>> import geopandas as gpd
>>> import shapely
>>> line = shapely.LineString(
...     [
...         (-97.06138, 32.837),
...         (-97.06133, 32.836),
...         (-97.06124, 32.834),
...         (-97.06127, 32.832),
...     ]
... )
>>> line_smooth = smooth_linestring(line, npts=5)
>>> list(zip(*line_smooth.xy))
[(-97.06138, 32.837),
(-97.06132, 32.83575),
(-97.06126, 32.83450),
(-97.06123, 32.83325),
(-97.06127, 32.83200)]
pygeoutils.geotools.snap2nearest(lines, points, tol)#

Find the nearest points on a line to a set of points.

Parameters:
  • lines (geopandas.GeoDataFrame or geopandas.GeoSeries) – Lines to snap the points to.

  • points (geopandas.GeoDataFrame or geopandas.GeoSeries) – Points to snap to the nearest lines.

  • tol (float) – Tolerance for snapping points to the nearest lines in meters.

Returns:

geopandas.GeoDataFrame or geopandas.GeoSeries – Points snapped to lines.

Return type:

GDFTYPE

pygeoutils.geotools.spline_curvature(spline_x, spline_y, konts)#

Compute the curvature of a Spline curve.

Notes

The formula for the curvature of a Spline curve is:

\[\kappa = \frac{\dot{x}\ddot{y} - \ddot{x}\dot{y}}{(\dot{x}^2 + \dot{y}^2)^{3/2}}\]

where \(\dot{x}\) and \(\dot{y}\) are the first derivatives of the Spline curve and \(\ddot{x}\) and \(\ddot{y}\) are the second derivatives of the Spline curve. Also, the radius of curvature is:

\[\rho = \frac{1}{|\kappa|}\]
Parameters:
Returns:

  • phi (numpy.ndarray) – Angle of the tangent of the Spline curve.

  • curvature (numpy.ndarray) – Curvature of the Spline curve.

  • radius (numpy.ndarray) – Radius of curvature of the Spline curve.

Return type:

tuple[FloatArray, FloatArray, FloatArray]

pygeoutils.geotools.spline_linestring(line, crs, n_pts, degree=3, smoothing=None)#

Generate a parametric spline from a LineString.

Parameters:
  • line (shapely.LineString, shapely.MultiLineString) – Line to smooth. Note that if line is MultiLineString it will be merged into a single LineString. If the merge fails, an exception will be raised.

  • crs (int, str, or pyproj.CRS) – CRS of the input line. It must be a projected CRS.

  • n_pts (int) – Number of points in the output spline curve.

  • degree (int, optional) – Degree of the smoothing spline. Must be 1 <= degree <= 5. Default to 3 which is a cubic spline.

  • smoothing (float or None, optional) – Smoothing factor is used for determining the number of knots. This arg controls the tradeoff between closeness and smoothness of fit. Larger smoothing means more smoothing while smaller values of smoothing indicate less smoothing. If None (default), smoothing is done with all points.

Returns:

Spline – A Spline object with x, y, phi, radius, distance, and line attributes. The line attribute returns the Spline as a shapely.LineString.

Return type:

Spline

Examples

>>> import geopandas as gpd
>>> import shapely
>>> line = shapely.LineString(
...     [
...         (-97.06138, 32.837),
...         (-97.06133, 32.836),
...         (-97.06124, 32.834),
...         (-97.06127, 32.832),
...     ]
... )
>>> sp = spline_linestring(line, 4326, 5)
>>> list(zip(*sp.line.xy))
[(-97.06138, 32.837),
(-97.06132, 32.83575),
(-97.06126, 32.83450),
(-97.06123, 32.83325),
(-97.06127, 32.83200)]
pygeoutils.pygeoutils#

Some utilities for manipulating geospatial data.

Module Contents#
pygeoutils.pygeoutils.arcgis2geojson(arcgis, id_attr=None)#

Convert ESRI JSON format to GeoJSON.

Notes

Based on arcgis2geojson.

Parameters:
  • arcgis (str or binary) – The ESRI JSON content as a string (or bytes).

  • id_attr (str, optional) – ID of the attribute of interest, defaults to None.

Returns:

dict – A GeoJSON dictionary readable by GeoPandas.

Return type:

dict[str, Any]
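
A minimal sketch with a made-up ESRI JSON feature set (real inputs typically come from ArcGIS RESTful responses), assuming the usual top-level import:

import json

import pygeoutils as pgu

# A made-up ESRI JSON feature set containing a single point feature.
esri = json.dumps(
    {"features": [{"attributes": {"name": "site-1"}, "geometry": {"x": -97.06, "y": 32.83}}]}
)
geojson = pgu.arcgis2geojson(esri)  # a GeoJSON dictionary readable by GeoPandas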

pygeoutils.pygeoutils.geodf2xarray(geodf, resolution, attr_col=None, fill=0, projected_crs=5070)#

Rasterize a geopandas.GeoDataFrame to xarray.DataArray.

Parameters:
  • geodf (geopandas.GeoDataFrame or geopandas.GeoSeries) – GeoDataFrame or GeoSeries to rasterize.

  • resolution (float) – Target resolution of the output raster in the projected_crs unit. Since the default projected_crs is EPSG:5070, the default unit for the resolution is meters.

  • attr_col (str, optional) – Column name of the attribute to use as the variable, defaults to None, i.e., the variable will be a boolean mask where 1 indicates the presence of a geometry. Also, note that the attribute must be numeric and have one of the following numpy types: int16, int32, uint8, uint16, uint32, float32, and float64.

  • fill (int or float, optional) – Value to use for filling the missing values (mask) of the output raster, defaults to 0.

  • projected_crs (int, str, or pyproj.CRS, optional) – A projected CRS to use for the output raster, defaults to EPSG:5070.

Returns:

xarray.Dataset – The xarray Dataset with a single variable.

Return type:

xarray.Dataset
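
A minimal sketch, assuming the usual top-level import: rasterize one made-up polygon to a boolean mask with 1-km cells (meters, since the default projected_crs is EPSG:5070):

import geopandas as gpd
import shapely

import pygeoutils as pgu

# A single made-up polygon in EPSG:4326; with attr_col=None the output is a
# 0/1 mask marking the presence of the geometry.
poly = shapely.Polygon([(-97.2, 32.7), (-96.9, 32.7), (-96.9, 33.0), (-97.2, 33.0)])
geodf = gpd.GeoDataFrame(geometry=[poly], crs=4326)
mask = pgu.geodf2xarray(geodf, resolution=1000)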

pygeoutils.pygeoutils.gtiff2vrt(file_list, vrt_path)#

Create a VRT file from a list of (Geo)Tiff files.

Note

This function requires gdal to be installed.

Parameters:
  • file_list (list) – List of paths to the GeoTiff files.

  • vrt_path (str or Path) – Path to the output VRT file.

pygeoutils.pygeoutils.gtiff2xarray(r_dict, geometry=None, geo_crs=None, ds_dims=None, driver=None, all_touched=False, nodata=None, drop=True)#

Convert (Geo)Tiff byte responses to xarray.Dataset.

Parameters:
  • r_dict (dict) – Dictionary of (Geo)Tiff byte responses where keys are names used for naming each response, and values are bytes.

  • geometry (Polygon, MultiPolygon, or tuple, optional) – The geometry to mask the data, which should be in the same CRS as the data in r_dict. Defaults to None.

  • geo_crs (int, str, or pyproj.CRS, optional) – The spatial reference of the input geometry, defaults to None. This argument should be given when geometry is given.

  • ds_dims (tuple of str, optional) – The names of the vertical and horizontal dimensions (in that order) of the target dataset, defaults to None. If None, dimension names are determined from a list of common names.

  • driver (str, optional) – A GDAL driver for reading the content, defaults to automatic detection. A list of the drivers can be found here.

  • all_touched (bool, optional) – Include a pixel in the mask if it touches any of the shapes. If False (default), include a pixel only if its center is within one of the shapes, or if it is selected by Bresenham’s line algorithm.

  • nodata (float or int, optional) – The nodata value of the raster, defaults to None, i.e., it is determined from the raster.

  • drop (bool, optional) – If True, drop the data outside of the extent of the mask geometries. Otherwise, it will return the same raster with the data masked. Default is True.

Returns:

xarray.Dataset or xarray.DataArray – Requested dataset or dataarray.

Return type:

xarray.DataArray | xarray.Dataset
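
A hedged sketch of typical usage, assuming the usual top-level import; dem.tif is a placeholder file standing in for raw bytes that would normally come from a WMS client such as pygeoogc:

from pathlib import Path

import pygeoutils as pgu

# Keys name the output variables; values are the raw (Geo)Tiff bytes.
r_dict = {"dem": Path("dem.tif").read_bytes()}  # "dem.tif" is a placeholder
ds = pgu.gtiff2xarray(r_dict)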

pygeoutils.pygeoutils.json2geodf(content, in_crs=4326, crs=4326)#

Create GeoDataFrame from (Geo)JSON.

Parameters:
  • content (dict or list of dict) – A (Geo)JSON dictionary, e.g., the output of response.json(), or a list of them.

  • in_crs (int, str, or pyproj.CRS, optional) – CRS of the content, defaults to epsg:4326.

  • crs (int, str, or pyproj.CRS, optional) – The target CRS of the output GeoDataFrame, defaults to epsg:4326.

Returns:

geopandas.GeoDataFrame – GeoDataFrame generated from the (Geo)JSON content.

Return type:

geopandas.GeoDataFrame
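
A minimal sketch, assuming the usual top-level import: build a GeoDataFrame from a hand-written GeoJSON dictionary of the kind response.json() would return:

import pygeoutils as pgu

content = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [-97.06, 32.83]},
            "properties": {"name": "site-1"},
        }
    ],
}
# Read the content as EPSG:4326 and reproject the output to EPSG:5070.
gdf = pgu.json2geodf(content, in_crs=4326, crs=5070)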

pygeoutils.pygeoutils.xarray2geodf(da, dtype, mask_da=None, connectivity=8)#

Vectorize a xarray.DataArray to a geopandas.GeoDataFrame.

Parameters:
  • da (xarray.DataArray) – The dataarray to vectorize.

  • dtype (type) – The data type of the dataarray. Valid types are int16, int32, uint8, uint16, and float32.

  • mask_da (xarray.DataArray, optional) – The dataarray to use as a mask, defaults to None.

  • connectivity (int, optional) – Use 4 or 8 pixel connectivity for grouping pixels into features, defaults to 8.

Returns:

geopandas.GeoDataFrame – The vectorized dataarray.

Return type:

geopandas.GeoDataFrame
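
A minimal sketch, assuming the usual top-level import and that the DataArray carries a CRS (set here via rioxarray):

import numpy as np
import rioxarray  # noqa: F401, enables the .rio accessor
import xarray as xr

import pygeoutils as pgu

# A tiny 2x2 binary raster; the ones are grouped into polygon features.
da = xr.DataArray(
    np.array([[0, 1], [1, 1]], dtype="int32"),
    coords={"y": [1.0, 0.0], "x": [0.0, 1.0]},
    dims=("y", "x"),
    name="mask",
).rio.write_crs(4326)
gdf = pgu.xarray2geodf(da, "int32")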

pygeoutils.pygeoutils.xarray_geomask(ds, geometry, crs, all_touched=False, drop=True, from_disk=False)#

Mask a xarray.Dataset based on a geometry.

Parameters:
  • ds (xarray.Dataset or xarray.DataArray) – The dataset(array) to be masked

  • geometry (Polygon, MultiPolygon, or tuple of length 4) – The geometry to mask the data

  • crs (int, str, or pyproj.CRS) – The spatial reference of the input geometry

  • all_touched (bool, optional) – Include a pixel in the mask if it touches any of the shapes. If False (default), include a pixel only if its center is within one of the shapes, or if it is selected by Bresenham’s line algorithm.

  • drop (bool, optional) – If True, drop the data outside of the extent of the mask geometries. Otherwise, it will return the same raster with the data masked. Default is True.

  • from_disk (bool, optional) – If True, it will clip from disk using rasterio.mask.mask if possible. This is beneficial when the size of the data is larger than memory. Default is False.

Returns:

xarray.Dataset or xarray.DataArray – The input dataset with a mask applied (np.nan)

Return type:

XD
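
A self-contained sketch, assuming the usual top-level import: mask a small made-up DataArray outside a bounding-box polygon so the excluded cells become NaN:

import numpy as np
import rioxarray  # noqa: F401, enables the .rio accessor
import shapely
import xarray as xr

import pygeoutils as pgu

# A made-up 4x4 grid in EPSG:4326 covering (-97.2..-96.9, 32.7..33.0).
da = xr.DataArray(
    np.arange(16, dtype="float64").reshape(4, 4),
    coords={"y": np.linspace(33.0, 32.7, 4), "x": np.linspace(-97.2, -96.9, 4)},
    dims=("y", "x"),
    name="data",
).rio.write_crs(4326)
# Keep only cells inside the box; the rest become NaN.
geometry = shapely.box(-97.2, 32.7, -97.05, 32.85)
masked = pgu.xarray_geomask(da, geometry, crs=4326)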

Package Contents#

Release Notes#

History#

0.16.2 (2024-02-12)#

Bug Fixes#
  • In NLDI.get_basins, the indices used to be station IDs, but in the previous release they were reset by mistake. This release restores the correct indices.

New Features#
  • In nhdplus_l48 function, when the layer is NHDFlowline_Network or NHDFlowline_NonNetwork, merge all MultiLineString geometries to LineString.

0.16.1 (2024-01-03)#

Bug Fixes#
  • Fix an issue in network_xsection and flowline_xsection related to the changes in shapely 2 API. Now, these functions should return the correct cross-sections.

0.16.0 (2024-01-03)#

New Features#
  • Add access to USGS 3D Hydrography Program (3DHP) service. The new class is called HP3D. It can be queried by IDs, geometry, or SQL where clause.

  • Add support for the new PyGeoAPI endpoint called xsatpathpts. This endpoint is useful for getting an elevation profile along a shapely.LineString. You can use the pygeoapi function with service="elevation_profile" (or the PyGeoAPI class) to access it. Previously, the elevation_profile endpoint was used for getting an elevation profile along a path between two endpoints, and the input GeoDataFrame had to contain MultiPoint geometries with two coordinates. Now, the input must contain LineString geometries.

  • Switch to using the new smoothing algorithm from pygeoutils for resampling the flowlines and getting their cross-sections. This new algorithm is more robust, accurate, and faster. It has a new argument called smoothing for controlling the number of knots of the spline. Higher values result in smoother curves. The default value is None, which uses all the points from the input flowline.

0.15.2 (2023-09-22)#

Bug Fixes#
  • Update GeoConnex based on the latest changes in the web service.

0.15.1 (2023-09-02)#

Bug Fixes#
  • Fix HyRiver libraries requirements by specifying a range instead of exact version so conda-forge can resolve the dependencies.

0.15.0 (2023-05-07)#

From release 0.15 onward, all minor versions of HyRiver packages will be pinned. This ensures that previous minor versions of HyRiver packages cannot be installed with later minor releases. For example, if you have py3dep==0.14.x installed, you cannot install pydaymet==0.15.x. This is to ensure that the API is consistent across all minor versions.

New Features#
  • Add a new function, called nhdplus_h12pp, for retrieving HUC12 pour points across CONUS.

  • Add use_arrow=True to pynhd.nhdplus_l48 when reading the NHDPlus dataset. This speeds up the process, provided that pyarrow is installed.

  • In nhdplus_l48, make layer optional so the sql parameter of pyogrio.read_dataframe can also be used. This is necessary since pyogrio.read_dataframe does not support passing both layer and sql parameters.

  • Update the mainstems dataset link to version 2.0 in mainstem_huc12_nx.

  • Expose NHDTools class to the public API.

  • For now, retain compatibility with shapely<2 while supporting shapely>=2.

Bug Fixes#
  • Remove unnecessary conversion of id_col and toid_col to Int64 in nhdflw2nx and vector_accumulation. This ensures that the input data types are preserved.

  • Fix an issue in nhdplus_l48 where py7zr fails to extract the file if the input data_dir is not absolute.

0.14.0 (2023-03-05)#

New Features#
  • Rewrite the GeoConnex class to provide access to new capabilities of the web service. Support for spatial queries has been added via CQL queries. For more information, check out the updated GeoConnex example notebook.

  • Add a new property to StreamCat, called metrics_df, that gets a dataframe of metric names and their descriptions.

  • Create a new private StreamCatValidator class to avoid polluting the public StreamCat class with private attributes and methods. Moreover, add a new alternative-metric-names attribute to StreamCat, called alt_names, for handling those metric names that do not follow the METRIC+YYYY convention. This attribute is a dictionary that maps the alternative names to the actual metric names, so users can use the METRIC_NAME column of metrics_df and add a year suffix from the valid_years attribute of StreamCat to get the actual metric name.

  • In navigate_by* functions of NLDI add stop_comid, which is another criterion for stopping the navigation in addition to distance.

  • Improve UserWarning messages of NLDI and WaterData.

Breaking Changes#
  • Remove the pynhd.geoconnex function since so much functionality has been added to the GeoConnex service that this function no longer makes sense. All queries should be done via the pynhd.GeoConnex class.

  • Rewrite NLDI to improve code readability and significantly improve performance. Its methods no longer return tuples when there are failed requests; instead, failures are reported as a UserWarning.

  • Bump the minimum required version of shapely to 2.0, and use its new API.

Internal Changes#
  • Sync all minor versions of HyRiver packages to 0.14.0.

0.13.12 (2023-02-10)#

New Features#
  • Update the link to version 2.0 of the ENHD dataset in enhd_attrs.

Internal Changes#
  • Improve columns data types in enhd_attrs and nhdplus_vaa by using int32 instead of Int64, where applicable.

  • Sync all patch versions of HyRiver packages to x.x.12.

0.13.11 (2023-01-24)#

New Features#
  • prepare_nhdplus now supports NHDPlus HR in addition to NHDPlus MR. It automatically detects the NHDPlus version based on the ID column name: nhdplusid for HR and comid for MR.

Internal Changes#
  • Fully migrate setup.cfg and setup.py to pyproject.toml.

  • Convert relative imports to absolute with absolufy-imports.

  • Improve performance of prepare_nhdplus by using pandas.merge instead of applying a function to each row of the dataframe.

0.13.10 (2023-01-08)#

New Features#
  • Add support for EPA's new StreamCat RESTful API with around 600 NHDPlus catchment-level metrics. A new class, called StreamCat, is added for getting the service properties such as valid metrics. You can use the streamcat function to get the metrics as a pandas.DataFrame.

  • Refactor the show_versions function to improve performance and print the output in a nicer table-like format.

Internal Changes#
  • Skip 0.13.9 version so the minor version of all HyRiver packages become the same.

  • Modify the codebase based on the latest changes in geopandas related to empty dataframes.

0.13.8 (2022-12-09)#

New Features#
  • Add a new function, called nhdplus_attrs_s3, for accessing the recently released NHDPlus derived attributes on a USGS S3 bucket. The attributes are provided in parquet files, so getting them is faster than with nhdplus_attrs. Also, you can request multiple attributes at once, whereas with nhdplus_attrs you had to request each attribute one at a time. This function will replace nhdplus_attrs in a future release, as soon as all data that are available on the ScienceBase version are also accessible from the S3 bucket.

  • Add two new functions called mainstem_huc12_nx and enhd_flowlines_nx. These functions generate a networkx directed graph object of NHD HUC12 water boundaries and flowlines, respectively. They also return a dictionary mapping of COMID and HUC12 to the corresponding networkx node. Additionally, a topologically sorted list of COMIDs/HUC12s is returned. The generated data are useful for doing US-scale network analysis and flow accumulation on the NHD network. The NHD graph has about 2.7 million edges and the mainstem HUC12 graph has about 80K edges.

  • Add a new function for getting the entire NHDPlus dataset for CONUS (Lower 48), called nhdplus_l48. The entire NHDPlus dataset is downloaded from here. This 7.3 GB file will take a while to download, depending on your internet connection. The first time you run this function, the file will be downloaded and stored in the ./cache directory. Subsequent calls will use the cached file. Moreover, there are two additional dependencies for using this function: pyogrio and py7zr. These dependencies can be installed using pip install pyogrio py7zr or conda install -c conda-forge pyogrio py7zr.

Internal Changes#
  • Refactor vector_accumulation for significant performance improvements.

  • Modify the codebase based on Refurb suggestions.

0.13.7 (2022-11-04)#

New Features#
  • Add a new function called epa_nhd_catchments to access one of EPA's HMS endpoints called WSCatchment. You can use this function to access 414 catchment-scale characteristics for all the NHDPlus catchments, including the 16-day average curve number. More information on the curve number dataset can be found at its project page here.

Bug Fixes#
  • Fix a bug in NHDTools where, due to recent changes in pandas exception handling, NHDTools failed to convert columns with NaN values to integer type. pandas now throws IntCastingNaNError instead of TypeError when using the astype method on a column.

Internal Changes#
  • Use pyupgrade package to update the type hinting annotations to Python 3.10 style.

0.13.6 (2022-08-30)#

Internal Changes#
  • Add the missing PyPi classifiers for the supported Python versions.

0.13.5 (2022-08-29)#

Breaking Changes#
  • Append “Error” to all exception classes for conforming to PEP-8 naming conventions.

Internal Changes#
  • Bump the minimum versions of pygeoogc and pygeoutils to 0.13.5 and that of async-retriever to 0.3.5.

Bug Fixes#
  • Fix an issue in the nhdplus_vaa and enhd_attrs functions where the cache folder was not created if it did not exist, thus resulting in an error.

0.13.3 (2022-07-31)#

Internal Changes#
  • Use the new async_retriever.stream_write function to download files in nhdplus_vaa and enhd_attrs functions. This is more memory efficient.

  • Convert the type of the list of not-found items in NLDI.comid_byloc and NLDI.feature_byloc from a list of strings to a list of coordinate tuples. This matches the type of the returned not-found coordinates to that of the inputs.

  • Fix an issue with NLDI that was caused by the recent changes in the NLDI web service’s error handling. The NLDI web service now returns more descriptive error messages in a json format instead of returning the usual status errors.

  • Slice the ENHD dataframe in NHDTools.clean_flowlines before updating the flowline dataframe to reduce the required memory for the update operation.

0.13.2 (2022-06-14)#

Breaking Changes#
  • Set the minimum supported version of Python to 3.8 since many of the dependencies such as xarray, pandas, rioxarray have dropped support for Python 3.7.

Internal Changes#
  • Use micromamba for running tests and use nox for linting in CI.

0.13.1 (2022-06-11)#

New Features#
  • Add support for all the GeoConnex web service endpoints. There are two ways to use it. For a single query, you can use the geoconnex function and for multiple queries, it’s more efficient to use the GeoConnex class.

  • Add support for passing any of the supported NLDI feature sources to the get_basins method of the NLDI class. The default is nwissite to retain backward compatibility.

Bug Fixes#
  • Set the type of “ReachCode” column to str instead of int in pygeoapi and nhdplus_vaa functions.

0.13.0 (2022-04-03)#

New Features#
  • Add two new functions called flowline_resample and network_resample for resampling a flowline or network of flowlines based on a given spacing. This is useful for smoothing jagged flowlines similar to those in the NHDPlus database.

  • Add support for the new NLDI endpoint called “hydrolocation”. The NLDI class now has two methods for getting features by coordinates: feature_byloc and comid_byloc. The feature_byloc method returns the flowline that is associated with the closest NHDPlus feature to the given coordinates. The comid_byloc method returns a point on the closest downstream flowline to the given coordinates.

  • Add a new function called pygeoapi for calling the API in batch mode. This function accepts the input coordinates as a geopandas.GeoDataFrame. It is more performant than calling its counterpart, the PyGeoAPI class, multiple times. It's recommended to switch to this new batch function instead of the PyGeoAPI class. Users just need to prepare an input data frame that has all the required service parameters as columns.

  • Add a new step to prepare_nhdplus to convert MultiLineString to LineString.

  • Add support for the simplified flag of NLDI’s get_basins function. The default value is True to retain the old behavior.

Breaking Changes#
  • Remove caching-related arguments from all functions since now they can be set globally via three environmental variables:

    • HYRIVER_CACHE_NAME: Path to the caching SQLite database.

    • HYRIVER_CACHE_EXPIRE: Expiration time for cached requests in seconds.

    • HYRIVER_CACHE_DISABLE: Disable reading/writing from/to the cache file.

    You can do this like so:

import os

os.environ["HYRIVER_CACHE_NAME"] = "path/to/file.sqlite"
os.environ["HYRIVER_CACHE_EXPIRE"] = "3600"
os.environ["HYRIVER_CACHE_DISABLE"] = "true"

0.12.2 (2022-02-04)#

New Features#
  • Add a new class called NHD for accessing the latest National Hydrography Dataset. More info regarding this data can be found here.

  • Add two new functions for getting cross-sections along a single flowline via flowline_xsection or throughout a network of flowlines via network_xsection. You can specify spacing and width parameters to control their location. For more information and examples please consult the documentation.

  • Add a new property to AGRBase called service_info to include some useful info about the service including feature_types which can be handy for converting numeric values of types to their string equivalent.

Internal Changes#
  • Use the new PyGeoAPI API.

  • Refactor prepare_nhdplus for improving the performance and robustness of determining tocomid within a network of NHD flowlines.

  • Add empty geometries that NLDI.get_basins returns to the list of not found IDs. This is because the NLDI service does not include non-network flowlines and instead returns an empty geometry for these flowlines. (GH #48)

0.12.1 (2021-12-31)#

Internal Changes#
  • Use the three new ar.retrieve_* functions instead of the old ar.retrieve function to improve type hinting and to make the API more consistent.

  • Revert to the original PyGeoAPI base URL.

0.12.0 (2021-12-27)#

Breaking Changes#
  • Rewrite ScienceBase to make it applicable for working with other ScienceBase items. A new function has been added for staging the Additional NHDPlus attributes items called stage_nhdplus_attrs.

  • Refactor AGRBase to remove unnecessary functions and make them more general.

  • Update the PyGeoAPI class to conform to the new pygeoapi API. This web service is undergoing some changes at the time of this release and its API is not stable, so it might not work as expected. As soon as the web service is stable, a new version will be released.

New Features#
  • In WaterData.byid show a warning if there are any missing feature IDs that are requested but are not available in the dataset.

  • For all by* methods of WaterData throw a ZeroMatched exception if no features are found.

  • Add expire_after and disable_caching arguments to all functions that use async_retriever. Set the default request caching expiration time to never expire. You can use disable_caching if you don’t want to use the cached responses. Please refer to documentation of the functions for more details.

Internal Changes#
  • Refactor prepare_nhdplus to reduce code complexity by grouping all the NHDPlus tools as a private class.

  • Modify AGRBase to reflect the latest API changes in pygeoogc.ArcGISRESTfull class.

  • Refactor prepare_nhdplus by creating a private class that includes all the previously used private functions. This will make the code more readable and easier to maintain.

  • Add all the missing types so mypy --strict passes.

0.11.4 (2021-11-12)#

New Features#
  • Add a new argument to NLDI.get_basins called split_catchment that, if set to True, will split the basin geometry at the watershed outlet.

Internal Changes#
  • Catch service errors in PyGeoAPI and show useful error messages.

  • Use importlib-metadata for getting the version instead of pkg_resources to decrease import time as discussed in this issue.

0.11.3 (2021-09-10)#

Internal Changes#
  • More robust handling of inputs and outputs of NLDI’s methods.

  • Use an alternative download link for NHDPlus VAA file on Hydroshare.

  • Restructure the codebase to reduce the complexity of the pynhd.py file by dividing it into three files: pynhd, with all classes that provide access to the supported web services; core, which includes base classes; and nhdplus_derived, which has functions for getting databases that provide additional attributes for the NHDPlus database.

0.11.2 (2021-08-26)#

New Features#
  • Add support for PyGeoAPI. It offers four functionalities: flow_trace, split_catchment, elevation_profile, and cross_section.

0.11.1 (2021-07-31)#

New Features#
  • Add a function for getting all NHD FCodes as a data frame, called nhd_fcode.

  • Improve prepare_nhdplus function by removing all coastlines and better detection of the terminal point in a network.

Internal Changes#
  • Migrate to using AsyncRetriever for handling communications with web services.

  • Catch ConnectionError separately in NLDI and raise a ServiceError instead, so the user knows that data cannot be returned because the server is out of service, not because of ZeroMatched.

0.11.0 (2021-06-19)#

New Features#
  • Add nhdplus_vaa to access NHDPlus Value Added Attributes for all its flowlines.

  • To see a list of available layers in NHDPlus HR, you can instantiate its class without passing any argument like so NHDPlusHR().

Breaking Changes#
  • Drop support for Python 3.6 since many of the dependencies such as xarray and pandas have done so.

Internal Changes#
  • Use persistent caching for all requests which can help speed up network responses significantly.

  • Improve documentation and testing.

0.10.1 (2021-03-27)#

  • Add an announcement regarding the new name for the software stack, HyRiver.

  • Improve pip installation and release workflow.

0.10.0 (2021-03-06)#

  • The first release after renaming hydrodata to PyGeoHydro.

  • Make mypy checks more strict and fix all the errors and prevent possible bugs.

  • Speed up CI testing by using mamba and caching.

0.9.0 (2021-02-14)#

  • Bump version to the same version as PyGeoHydro.

Breaking Changes#
  • Add a new function for getting basin geometries for a list of USGS station IDs. The function is a method of the NLDI class called get_basins, so the NLDI.getfeature_byid function no longer has a basin flag. This change makes getting geometries easier and faster.

  • Remove characteristics_dataframe method from NLDI and make a standalone function called nhdplus_attrs for accessing NHDPlus attributes directly from ScienceBase.

  • Add support for using the hydro or edits web services for getting NHDPlus High-Resolution via the NHDPlusHR function. The new arguments are service, which accepts hydro or edits, and auto_switch, a flag for automatically switching to the other service if the one passed by service fails.

New Features#
  • Add a new argument to topoogical_sort called edge_attr that allows adding attribute(s) to the returned Networkx Graph. By default, it is None.

  • A new base class, AGRBase, for connecting to ArcGISRESTful-based services such as National Map and EPA's WaterGEOS.

  • Add support for setting the buffer distance for the input geometries to AGRBase.bygeom.

  • Add comid_byloc to NLDI class for getting ComIDs of the closest flowlines from a list of lon/lat coordinates.

  • Add bydistance to WaterData for getting features within a given radius of a point.

0.2.0 (2020-12-06)#

Breaking Changes#
  • Re-wrote the NLDI function to use API v3 of the NLDI service.

  • The crs argument of WaterData now is the target CRS of the output dataframe. The service CRS is now EPSG:4269 for all the layers.

  • Remove the url_only argument of NLDI since it’s not applicable anymore.

New Features#
  • Added support for NHDPlus High Resolution for getting features by geometry, IDs, or SQL where clause.

  • The following functions are added to NLDI:

  • getcharacteristic_byid: Getting characteristics of NHDPlus catchments.

  • navigate_byloc: Getting the nearest ComID to a coordinate and performing navigation.

  • characteristics_dataframe: Getting all the available catchment-scale characteristics as a data frame.

  • get_validchars: Getting a list of available characteristic IDs for a specified characteristic type.

  • The following functions are added to WaterData:

  • byfilter: Getting data based on any valid CQL filter.

  • bygeom: Getting data within a geometry (polygon and multipolygon).

  • Add support for Python 3.9 and tests for Windows.

Bug Fixes#
  • Refactored WaterData to fix the CRS inconsistencies (#1).

0.1.3 (2020-08-18)#

  • Replaced simplejson with orjson to speed up JSON operations.

0.1.2 (2020-08-11)#

  • Add show_versions function for showing versions of the installed deps.

  • Improve documentation

0.1.1 (2020-08-03)#

  • Improved documentation

  • Refactored WaterData to improve readability.

0.1.0 (2020-07-23)#

  • First release on PyPI.

History#

0.16.2 (2024-XX-XX)#

Bug Fixes#
  • In the nlcd_helper function, the roughness value for class 82 was incorrectly set to 0.16; it is now fixed to the correct value of 0.037.

New Features#
  • Converted all methods of NWIS class to classmethod so the class can be used without instantiating it. This change makes the class more flexible and easier to use.

  • In NID class, the stage_nid_inventory method now checks if the remote NID database has been modified since the last download and only downloads the new data if it has been modified. This change makes the method more efficient and reduces the network traffic while ensuring that the local database is always up-to-date.

0.16.0 (2024-01-03)#

Breaking Changes#
  • Bump the minimum supported version of shapely to 2.

Internal Changes#
  • Update the link to NWIS error codes tables in the nwis_errors function.

  • Update NWIS class based on the latest changes to the NWIS web service.

  • Use the default tiles for the interactive_map function.

0.15.2 (2023-09-22)#

New Features#
  • Add a new attribute to the EHydro class called survey_grid. It's a geopandas.GeoDataFrame that includes the survey grid of the eHydro dataset, which is a 35-km hexagonal grid.

  • Add support for getting point cloud and survey outline data from eHydro. You can set data_type in EHydro to bathymetry, points, outlines, or contours to get the corresponding data. The default is points since this is the recommended data type by USACE.

  • Add the NFHL class within the nfhl module to access FEMA's National Flood Hazard Layer (NFHL) using six different ArcGISRESTful services. Contributed by Fernando Aristizabal. (PR 108)

Internal Changes#
  • Remove dependency on dask.

  • Move all NLCD-related functions to a separate module called nlcd. This doesn't affect the API since the functions are still available under the pygeohydro namespace.

0.15.1 (2023-08-02)#

This release provides access to three new datasets:

  • USACE Hydrographic Surveys (eHydro),

  • USGS Short-Term Network (STN) Flood Event Data, contributed by Fernando Aristizabal (PR 108), and

  • NLCD 2021.

New Features#
  • Add support for getting topobathymetry data from USACE Hydrographic Surveys (eHydro). The new class is called EHydro and gives users the ability to subset the eHydro dataset by geometry, ID, or SQL queries.

  • Add a new stnfloodevents module with the STNFloodEventData class for retrieving flood event data from the USGS Short-Term Network (STN) RESTful Service. This Python API abstracts away RESTful principles and produces analysis-ready data in geo-referenced GeoDataFrames, DataFrames, lists, or dictionaries as desired. The core class methods available are data_dictionary, get_all_data, and get_filtered_data; they retrieve the data dictionaries by type, get all the available data by type, and make filtered requests for data by type, respectively. The four types of data are instruments, peaks, hwms, and sites. Contributed by Fernando Aristizabal. A minimal usage sketch follows this list.

  • Add a wrapper function for the STNFloodEventData class called stn_flood_event.

  • Add support for the new NLCD data (2021) for the three supported layers.
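
For illustration, a hedged sketch of the class methods named above; the data-type strings follow the list in the entry, though exact signatures may differ:

from pygeohydro import STNFloodEventData

# The four data types per the entry above: instruments, peaks, hwms, and sites.
hwm_dictionary = STNFloodEventData.data_dictionary("hwms")  # field descriptions
hwms = STNFloodEventData.get_all_data("hwms")  # all high-water mark records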

0.15.0 (2023-05-07)#

From release 0.15 onward, all minor versions of HyRiver packages will be pinned. This ensures that previous minor versions of HyRiver packages cannot be installed with later minor releases. For example, if you have py3dep==0.14.x installed, you cannot install pydaymet==0.15.x. This is to ensure that the API is consistent across all minor versions.

New Features#
  • Add a new option to NWIS.get_info, called nhd_info, for retrieving NHDPlus-related info on the sites. This requires two new service calls that might slow down the function, so it's disabled by default.

  • Update links in NID to the latest CSV and GPKG versions of the NID dataset.

  • Add two new properties to NID to access the entire NID dataset. You can use NID.df to access the CSV version as a pandas.DataFrame and NID.gdf to access the GPKG version as a geopandas.GeoDataFrame. Installing pyogrio is highly recommended for much faster reading of the GPKG version. A short usage sketch follows this list.

  • Refactor NID.bygeom to use the new NID.gdf property for spatial querying of the dataset. This change should make the query much faster.

  • For now, retain compatibility with shapely<2 while supporting shapely>=2.
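
A short sketch of the two new properties, assuming default instantiation:

from pygeohydro import NID

nid = NID()
dams_csv = nid.df    # entire NID (CSV version) as a pandas.DataFrame
dams_gpkg = nid.gdf  # entire NID (GPKG version) as a geopandas.GeoDataFrame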

0.14.0 (2023-03-05)#

New Features#
  • Add a new function, called nlcd_area_percent, for computing the percentages of natural, developed, and impervious areas within geometries of a given GeoDataFrame. This function uses imperviousness and land use/land cover data from NLCD to compute the area percentages of the natural, developed, and impervious areas. For more information, please refer to the function's documentation.

  • Add a new column to the dataframe returned by NWIS.get_info, called nhd_comid, and rename drain_sqkm to nhd_areasqkm. The new drainage area is the best available estimate of each station's drainage area, extracted from the NHDPlus. The new nhd_comid column makes it easier to link stations to NHDPlus.

  • In get_camels, return qobs with negative values set to NaN. Also, add a new variable called Newman_2017 to both datasets for identifying the 531 stations that were used in Newman et al. (2017).

  • Add a new function, called streamflow_fillna, for filling missing streamflow values (NaN) with day-of-year average values.

Breaking Changes#
  • Bump the minimum required version of shapely to 2.0, and use its new API.

Internal Changes#
  • Sync all minor versions of HyRiver packages to 0.14.0.

  • Improve performance of all NLCD functions by merging two methods of the NLCD and also reducing the memory footprint of the functions.

0.13.12 (2023-02-10)#

New Features#
  • Add initial support for the SensorThings API. Currently, the SensorThings class only supports the Things endpoint. Users need to provide a valid Odata filter. The class has an odata_helper function that can be used to generate and validate Odata filters. Additionally, using the sensor_info and sensor_property functions, users can request information about sensors themselves or their properties.

Internal Changes#
  • Simplify geometry validation by using pygeoutils.geo2polygon function in ssebopeta_bygeom.

  • Fully migrate setup.cfg and setup.py to pyproject.toml.

  • Convert relative imports to absolute with absolufy-imports.

  • Sync all patch versions of HyRiver packages to x.x.12.

0.13.10 (2023-01-09)#

Breaking Changes#
  • The NID service has changed some of its endpoints to use Federal ID instead of Dam ID. This change affects the NID.inventory_byid function. This function now accepts Federal IDs instead of dam IDs.

New Features#
  • Refactor the show_versions function to improve performance and print the output in a nicer table-like format.

Internal Changes#
  • Use the new pygeoogc.streaming_download function in huc_wb_full to improve performance and reduce code complexity.

  • Skip 0.13.9 version so the minor version of all HyRiver packages become the same.

  • Modify the codebase based on the latest changes in geopandas related to empty dataframes.

  • Use pyright for static type checking instead of mypy and address all typing issues that it raised.

0.13.8 (2022-12-09)#

New Features#
  • Add a function called huc_wb_full that returns the full watershed boundary GeoDataFrame of a given HUC level. If only a subset of HUCs is needed, the pygeohydro.WBD class should be used. The full dataset is downloaded from The National Map's WBD staged products.

  • Add a new function called irrigation_withdrawals for retrieving estimated monthly water use for irrigation by 12-digit hydrologic unit in the CONUS for 2015 from ScienceBase.

  • Add a new property to NID, called data_units for indicating the units of NID dataset variables.

  • The get_us_states function now accepts conus as a subset_key, which is equivalent to contiguous.

Internal Changes#
  • Add get_us_states to __init__ file, so it can be loaded directly, e.g., gh.get_us_states("TX").

  • Modify the codebase based on Refurb suggestions.

  • Significant performance improvements in NWIS.get_streamflow especially for large requests by refactoring the timezone handling.

Bug Fixes#
  • Fix the dam types and purposes mapping dictionaries in NID class.

0.13.7 (2022-11-04)#

New Features#
  • Add two new functions for retrieving soil properties across the US:

    • soil_properties: Porosity, available water capacity, and field capacity,

    • soil_gnatsgo: Soil properties from the gNATSGO database.

  • Add a new helper function called state_lookup_table for getting a lookup table of US states and their counties. This can be particularly useful for mapping the numeric state_cd and county_cd values that NWIS returns to state names/codes.

  • Add support for getting individual state geometries using the get_us_states function by passing two-letter state codes. Also, use TIGER 2022 data for the US states and counties instead of TIGER 2021.

Internal Changes#
  • Remove proplot as a dependency and use matplotlib instead.

0.13.6 (2022-08-30)#

Internal Changes#
  • Add the missing PyPi classifiers for the supported Python versions.

0.13.5 (2022-08-29)#

Breaking Changes#
  • Append “Error” to all exception classes for conforming to PEP-8 naming conventions.

  • Deprecate ssebopeta_byloc since it’s been replaced with ssebopeta_bycoords since version 0.13.0.

Internal Changes#
  • Bump the minimum versions of pygeoogc and pygeoutils to 0.13.5 and that of async-retriever to 0.3.5.

0.13.3 (2022-07-31)#

New Features#
  • Add a new argument to the NID.inventory_byid method for staging the entire NID dataset prior to inventory queries. There is a new public method called NID.stage_nid_inventory that can be used to download the entire NID dataset and save it as a feather file. This is useful for inventory queries with a large number of IDs and is much more efficient than querying the NID web service.

Bug Fixes#
  • The background value in the cover_statistics function should have been 127, not 0. Also, the background value was dropped from the returned statistics.

0.13.2 (2022-06-14)#

Breaking Changes#
  • Set the minimum supported version of Python to 3.8 since many of the dependencies such as xarray, pandas, rioxarray have dropped support for Python 3.7.

Internal Changes#
  • Remove USGS prefixes from the input station IDs in NWIS.get_streamflow function. Also, check if the remaining parts of the IDs are all digits and throw an exception if otherwise. Additionally, make sure that IDs have at least 8 chars by adding leading zeros (GH 99).

  • Use micromamba for running tests and use nox for linting in CI.

0.13.1 (2022-06-11)#

New Features#
  • Add a new function called get_us_states to the helpers module for obtaining a GeoDataFrame of the US states. It has an optional argument for returning the contiguous states, continental states, commonwealth states, or US territories. The data are retrieved from the Census' TIGER 2021 database.

  • In the NID class, keep the valid_fields property as a pandas.Series instead of a list, so it can be searched more easily via its str accessor.

Internal Changes#
  • Refactor the plot.signatures function to use proplot instead of matplotlib.

  • Improve performance of NWIS.get_streamflow by not validating the layer name when instantiating the WaterData class. Also, make the function more robust by checking if streamflow data is available for each station and throwing a warning if not.

Bug Fixes#
  • Fix an issue in NWIS.get_streamflow where -9999 values were not being filtered out. According to NWIS, these values are reserved for ice-affected data. This fix sets these values to numpy.nan.

0.13.0 (2022-04-03)#

New Features#
  • Add a new flag to nlcd_* functions called ssl for disabling SSL verification.

  • Add a new function called get_camels for getting the CAMELS dataset. The function returns a geopandas.GeoDataFrame that includes basin-level attributes for all 671 stations in the dataset and a xarray.Dataset that contains streamflow data for all 671 stations and their basin-level attributes.

  • Add a new function named overland_roughness for getting the overland roughness values from land cover data.

  • Add a new class called WBD for getting watershed boundary (HUC) data.

from pygeohydro import WBD

wbd = WBD("huc4")
hudson = wbd.byids("huc4", ["0202", "0203"])
Breaking Changes#
  • Remove caching-related arguments from all functions since now they can be set globally via three environmental variables:

    • HYRIVER_CACHE_NAME: Path to the caching SQLite database.

    • HYRIVER_CACHE_EXPIRE: Expiration time for cached requests in seconds.

    • HYRIVER_CACHE_DISABLE: Disable reading/writing from/to the cache file.

    You can do this like so:

import os

os.environ["HYRIVER_CACHE_NAME"] = "path/to/file.sqlite"
os.environ["HYRIVER_CACHE_EXPIRE"] = "3600"
os.environ["HYRIVER_CACHE_DISABLE"] = "true"
Internal Changes#
  • Write nodata attribute using rioxarray in nlcd_bygeom since the clipping operation of rioxarray uses this value as the fill value.

0.12.4 (2022-02-04)#

Internal Changes#
  • Return a named tuple instead of a dict of percentages in the cover_statistics function. It makes accessing the values easier.

  • Add pycln as a new pre-commit hooks for removing unused imports.

  • Remove time zone info from the inputs to plot.signatures to avoid issues with the matplotlib backend.

Bug Fixes#
  • Fix an issue in plot.signatures where the new matplotlib version requires a numpy array instead of a pandas.DataFrame.

0.12.3 (2022-01-15)#

Bug Fixes#
  • Replace no data values of data in ssebopeta_bygeom with np.nan before converting it to mm/day.

  • Fix an inconsistency issue with CRS projection when using UTM in nlcd_*. Use EPSG:3857 for all reprojections and get the data from NLCD in the same projection. (GH 85)

  • Improve performance of nlcd_* functions by reducing number of service calls.

Internal Changes#
  • Add type checking with typeguard and fix type hinting issues raised by typeguard.

  • Refactor show_versions to ensure getting correct versions of all dependencies.

0.12.2 (2021-12-31)#

New Features#
  • The NWIS.get_info now returns a geopandas.GeoDataFrame instead of a pandas.DataFrame.

Bug Fixes#
  • Fix a bug in NWIS.get_streamflow where the drainage area might not be computed correctly if target stations are not located at the outlet of their watersheds.

0.12.1 (2021-12-31)#

Internal Changes#
  • Use the three new ar.retrieve_* functions instead of the old ar.retrieve function to improve type hinting and to make the API more consistent.

Bug Fixes#
  • Fix an issue with NWIS.get_streamflow where the time zone of the data was not correctly determined when it used US-specific abbreviations such as CST.

0.12.0 (2021-12-27)#

New Features#
  • Add support for getting instantaneous streamflow from NWIS in addition to the daily streamflow by adding freq argument to NWIS.get_streamflow that can be either iv or dv. The default is dv to retain the previous behavior of the function.

  • Convert the time zone of the streamflow data to UTC.

  • Add attributes of the requested stations as attrs parameter to the returned pandas.DataFrame. (GH 75)

  • Add a new flag to NWIS.get_streamflow for returning the streamflow as xarray.Dataset. This dataset has two dimensions: time and station_id. It has ten variables, including discharge and nine other station attributes. (GH 75)

  • Add drain_sqkm from GagesII to NWIS.get_info.

  • Show drain_sqkm in the interactive map generated by interactive_map.

  • Add two new functions for getting NLCD data: nlcd_bygeom and nlcd_bycoords. The new nlcd_bycoords function returns a geopandas.GeoDataFrame with the NLCD layers as columns and the input coordinates, which should be a list of (lon, lat) tuples, as the geometry column. Moreover, the new nlcd_bygeom function now accepts a geopandas.GeoDataFrame as the input. In this case, it returns a dict with keys as indices of the input geopandas.GeoDataFrame. (GH 80)

  • The previous nlcd function is being deprecated. For now, it calls nlcd_bygeom internally and retains the old behavior. This function will be removed in future versions.

Breaking Changes#
  • The ssebop_byloc is being deprecated and replaced by ssebop_bycoords. The new function accepts a pandas.DataFrame as input that should include three columns: id, x, and y. It returns a xarray.Dataset with two dimensions: time and location_id. The id column from the input is used as the location_id dimension. The ssebop_byloc function still retains the old behavior and will be removed in future versions.

  • Set the request caching’s expiration time to never expire. Add two flags to all functions to control the caching: expire_after and disable_caching.

  • Replace NID class with the new RESTful-based web service of National Inventory of Dams. The new NID service is very different from the old one, so this is considered a breaking change.

Internal Changes#
  • Improve exception handling in NWIS.get_info when NWIS returns an error message rather than a 500 web service error.

  • The NWIS.get_streamflow function now checks if the site info dataset contains any duplicates. Therefore, all the remaining station numbers will be unique. This prevents an issue with setting attrs where duplicate indexes cause an exception when being converted to a dict. (GH 75)

  • Add all the missing types so mypy --strict passes.

0.11.4 (2021-11-24)#

New Features#
  • Add support for the Water Quality Portal Web Services. (GH 72)

  • Add support for two versions of NID web service. The original NID web service is considered version 2 and the new NID is considered version 3. You can pass the version number to the NID like so NID(2). The default version is 2.

Bug Fixes#
  • Fix an issue with background percentage calculation in cover_statistics.

0.11.3 (2021-11-12)#

New Features#
  • Add a new map service for National Inventory of Dams (NID).

Internal Changes#
  • Use importlib-metadata for getting the version instead of pkg_resources to decrease import time as discussed in this issue.

0.11.2 (2021-07-31)#

Bug Fixes#
  • Refactor cover_statistics to address an issue with wrong category names and also improve performance for large datasets by using numpy’s functions.

  • Fix an issue with detecting the wrong number of stations in NWIS.get_streamflow. Also, improve filtering of stations whose start/end dates don't match the user-requested interval.

0.11.1 (2021-07-31)#

The highlight of this release is adding support for NLCD 2019 and significant improvements in NWIS support.

New Features#
  • Add support for the recently released version of NLCD (2019), including the impervious descriptor layer. Highlights of the new database are:

    NLCD 2019 now offers land cover for years 2001, 2004, 2006, 2008, 2011, 2013, 2016, 2019, and impervious surface and impervious descriptor products now updated to match each date of land cover. These products update all previously released versions of land cover and impervious products for CONUS (NLCD 2001, NLCD 2006, NLCD 2011, NLCD 2016) and are not directly comparable to previous products. NLCD 2019 land cover and impervious surface product versions of previous dates must be downloaded for proper comparison. NLCD 2019 also offers an impervious surface descriptor product that identifies the type of each impervious surface pixel. This product identifies types of roads, wind tower sites, building locations, and energy production sites to allow deeper analysis of developed features.

    MRLC

  • Add support for all the supported regions of NLCD database (CONUS, AK, HI, and PR).

  • Add support for passing multiple years to the NLCD function, like so {"cover": [2016, 2019]}.

  • Add plot.descriptor_legends function to plot the legend for the impervious descriptor layer.

  • New features in NWIS class are:

    • Remove query_* methods since it’s not convenient to pass them directly as a dictionary.

    • Add a new function called get_parameter_codes to query parameters and get information about them.

    • To decrease the complexity of the get_streamflow method, add a new private function to handle some tasks.

    • To handle more of NWIS's services, make retrieve_rdb more general.

  • Add a new argument called nwis_kwds to interactive_map so any NWIS specific keywords can be passed for filtering stations.

  • Improve exception handling in the get_info method, and simplify and improve its performance for getting HCDN.

Internal Changes#
  • Migrate to using AsyncRetriever for handling communications with web services.

0.11.0 (2021-06-19)#

Breaking Changes#
  • Drop support for Python 3.6 since many of the dependencies such as xarray and pandas have done so.

  • Remove the get_nid and get_nid_codes functions since NID now has an ArcGISRESTful service.

New Features#
  • Add a new class called NID for accessing the recently released National Inventory of Dams web service. This service is based on ArcGIS's RESTful service, so the user just needs to instantiate the class, NID(), and retrieve the data with three methods of the AGRBase class: bygeom, byids, and bysql. Moreover, it has an attrs property that includes descriptions of the database fields with their units.

  • Refactor NWIS.get_info to be more generic by accepting any valid queries that are documented at USGS Site Web Service.

  • Allow for passing a list of queries to NWIS.get_info and use async_retriever that significantly improves the network response time.

  • Add two new flags to interactive_map for limiting the stations to those with daily values (dv=True) and/or instantaneous values (iv=True). This function now includes a link to each station's webpage on the USGS website.

Internal Changes#
  • Use persistent caching for all send/receive requests that can significantly improve the network response time.

  • Explicitly include all the hard dependencies in setup.cfg.

  • Refactor interactive_map and NWIS.get_info to make them more efficient and reduce their code complexity.

0.10.2 (2021-03-27)#

Internal Changes#
  • Add announcement regarding the new name for the software stack, HyRiver.

  • Improve pip installation and release workflow.

0.10.1 (2021-03-06)#

Internal Changes#
  • Add lxml to deps.

0.10.0 (2021-03-06)#

Internal Changes#
  • The official first release of PyGeoHydro with a new name and logo.

  • Replace cElementTree with ElementTree since it’s been deprecated by defusedxml.

  • Make mypy checks more strict and fix all the errors and prevent possible bugs.

  • Speed up CI testing by using mamba and caching.

0.9.2 (2021-03-02)#

Internal Changes#
  • Rename hydrodata package to PyGeoHydro for publication on JOSS.

  • In NWIS.get_info, drop rows that don’t have mean daily discharge data instead of slicing.

  • Speed up Github Actions by using mamba and caching.

  • Improve pip installation by adding pyproject.toml.

New Features#
  • Add support for the National Inventory of Dams (NID) via get_nid function.

0.9.1 (2021-02-22)#

Internal Changes#
  • Fix an issue with NWIS.get_info method where stations with False values as their hcdn_2009 value were returned as None instead.

0.9.0 (2021-02-14)#

Internal Changes#
  • Bump versions of packages across the stack to the same version.

  • Use the new PyNHD function for getting basins, NLDI.get_basins.

  • Made mypy checks more strict and added all the missing type annotations.

0.8.0 (2020-12-06)#

  • Fixed the issue with WaterData due to the recent changes on the server side.

  • Updated the examples based on the latest changes across the stack.

  • Add support for multipolygon.

  • Remove the fill_hole argument.

  • Fix a warning in nlcd regarding performing division on nan values.

0.7.2 (2020-08-18)#

Enhancements#
  • Replaced simplejson with orjson to speed up JSON operations.

  • Explicitly sort the time dimension of the ssebopeta_bygeom function.

Bug Fixes#
  • Fix an issue with the nlcd function where high resolution requests fail.

0.7.1 (2020-08-13)#

New Features#
  • Added a new argument to plot.signatures for controlling the vertical position of the plot title, called title_ypos. This could be useful for multi-line titles.

Bug Fixes#
  • Fixed an issue with the nlcd function where None layers were not dropped, causing the function to fail.

0.7.0 (2020-08-12)#

This version divides PyGeoHydro into six standalone Python libraries, so many of the changes listed below belong to modules and functions that are now separate packages. This decision was made to reduce the complexity of the code base and to allow users to install only the packages they need without having to install all the PyGeoHydro dependencies.

Breaking changes#
  • The services module is now a separate package called PyGeoOGC and is set as a requirement for PyGeoHydro. PyGeoOGC is a leaner package with far fewer dependencies and is suitable for people who might only need an interface to web services.

  • Unified function names for getting feature by ID and by box.

  • Combined start and end arguments into a tuple argument called dates across the code base.

  • Rewrote the NLDI function and moved most of its classmethods to Station, so now the Station class has more cohesion.

  • Removed exploratory functionality of ArcGISREST, since it’s more convenient to do so from a browser. Now, base_url is a required argument.

  • Renamed in_crs in datasets and services functions to geo_crs for geometry and box_crs for bounding box inputs.

  • Re-wrote the signatures function from scratch using NamedTuple to improve readability and efficiency. Now, the daily argument should be just a pandas.DataFrame or pandas.Series and the column names are used for legends.

  • Removed utils.geom_mask function and replaced it with rasterio.mask.mask.

  • Removed width as an input in functions with raster output since resolution is almost always the preferred way to request data. This change made the code more readable.

  • Renamed two functions: ArcGISRESTful and wms_bybox. These functions now return requests.Response-type output.

  • onlyipv4 is now a class method in RetrySession.

  • The plot.signatures function now assumes that the input time series are in mm/day.

  • Added a flag to get_streamflow function in the NWIS class to convert from cms to mm/day which is useful for plotting hydrologic signatures using the signatures functions.

Enhancements#
  • Remove soft requirements from the env files.

  • Refactored requests functions into a single class and a separate file.

  • Made all the classes available directly from PyGeoHydro.

  • Added CodeFactor to the Github pipeline and addressed some issues that CodeFactor found.

  • Added Bandit to check the code for security issues.

  • Improved docstrings and documentations.

  • Added customized exceptions for better exception handling.

  • Added pytest fixtures to improve the tests speed.

  • Refactored daymet and nwis_siteinfo functions to reduce code complexity and improve readability.

  • Major refactoring of the code base while adding type hinting.

  • The input geometry (or bounding box) can be provided in any projection and the necessary re-projections are done under the hood.

  • Refactored the method for getting object IDs in ArcGISREST class to improve robustness and efficiency.

  • Refactored Daymet class to improve readability.

  • Add Deepsource for further code quality checking.

  • Automatic handling of large WMS requests (more than 8 million pixels, i.e., width x height).

  • The json_togeodf function now accepts either a single (Geo)JSON or a list of them.

  • Refactored plot.signatures using add_gridspec for a much cleaner code.

New Features#
  • Added access to WaterData’s GeoServer databases.

  • Added access to the remaining NLDI database (Water Quality Portal and Water Data Exchange).

  • Created a Binder for launching a computing environment on the cloud and testing PyGeoHydro.

  • Added a URL repository for the supported services called ServiceURL

  • Added support for FEMA web services for flood maps and FWS for wetlands.

  • Added a new function called wms_toxarray for converting WMS request responses to xarray.DataArray or xarray.Dataset.

Bug Fixes#
  • Re-projection issues for functions with input geometry.

  • Start and end variables not being initialized when coords was used in Station.

  • Geometry mask for xarray.DataArray

  • WMS output re-projections

0.6.0 (2020-06-23)#

  • Refactor requests session

  • Improve overall code quality based on CodeFactor suggestions

  • Migrate to Github Actions from TravisCI

0.5.5 (2020-06-03)#

  • Add to conda-forge

  • Remove pqdm and arcgis2geojson dependencies

0.5.3 (2020-06-07)#

  • Added threading capability to the flow accumulation function

  • Generalized WFS to include both by bbox and by featureID

  • Migrate RTD to pip from conda.

  • Changed HCDN database source to GagesII database

  • Increased robustness of functions that need network connections

  • Made the flow accumulation output a pandas Series for better handling of time series input

  • Combined DEM, slope, and aspect in a class called NationalMap.

  • Installation from pip installs all the dependencies

0.5.0 (2020-04-25)#

  • An almost complete rewrite of the code base; not backward-compatible.

  • New website design

  • Added vector accumulation

  • Added base classes and functions for accessing any ArcGIS RESTful, WMS, and WFS service.

  • Standalone functions for creating datasets from responses and masking the data

  • Added threading using pqdm to speed up the downloads

  • Interactive map for exploring USGS stations

  • Replaced OpenTopography with 3DEP

  • Added HCDN database for identifying natural watersheds

0.4.4 (2020-03-12)#

  • Added new databases: NLDI, NHDPlus V2, OpenTopography, gridded Daymet, and SSEBop.

  • The gridded data are returned as xarray DataArrays

  • Removed dependency on StreamStats and replaced it with NLDI.

  • Improved overall robustness and efficiency of the code

  • Not backward-compatible.

  • Added code style enforcement with isort, black, flake8 and pre-commit

  • Added a new shiny logo!

  • New installation method

  • Changed the OpenTopography base URL to their new server.

  • Fixed NLCD legend and statistics bug

0.3.0 (2020-02-10)#

  • Clipped the obtained NLCD data using the watershed geometry

  • Added support for specifying the year for getting NLCD

  • Removed direct NHDPlus data download dependency by using StreamStats and USGS APIs

  • Renamed get_lulc function to get_nlcd

0.2.0 (2020-02-09)#

  • Simplified import method

  • Changed usage from rst format to ipynb

  • Auto-formatting with the black python package

  • Change docstring format based on Sphinx

  • Fixed pytest warnings and changed its working directory

  • Added an example notebook with data files

  • Added docstring for all the functions

  • Added Module section to the documentation

  • Fixed py7zr issue

  • Changed 7z extractor from pyunpack to py7zr

  • Fixed some linting issues.

0.1.0 (2020-01-31)#

  • First release on PyPI.

History#

0.16.2 (2024-02-12)#

Bug Fixes#
  • In the add_elevation function, fix a bug where the function failed to add elevation to an xarray.Dataset whose spatial dimensions are not named x and y.

Internal Changes#
  • Refactor the fill_depressions function by porting the code from pyflwdir, which improves its performance; it now directly supports xarray.DataArray. As a result, pyflwdir is no longer needed as an optional dependency. You can install numba to improve the performance of the function.

Breaking Changes#
  • The AirMap service has been deprecated and removed from the package. The elevation_bycoords function now only supports the National Map and 3DEP services.

0.16.1 (2024-01-15)#

Bug Fixes#
  • In the check_3dep_availability function, when the web service was down, a TypeError was raised instead of setting the value of the failed resolution to Failed. This is fixed now (GH 66).

Internal Changes#
  • Simplify the logic of adding elevation to a Dataset in the add_elevation function to avoid modifying CRS of the input Dataset.

0.16.0 (2024-01-03)#

New Features#
  • Add a new function called get_map_vrt for getting DEM within a bounding box and saving it as a VRT file. This function has low memory usage and is useful for cases where the DEM is needed for a large area. Moreover, even for typical use cases it can be much faster than get_dem since it loads the data lazily, at the cost of higher disk usage (see the sketch after this list).

  • In the get_map function, check if the input geometry is within the bounds of the 3DEP’s WMS service and if not, raise an exception.

  • In the fill_depressions function add a new argument called outlets for specifying outlet detection method: At the edge of all cells (edge) or only the minimum elevation edge cell (min; default).

  • Significantly improve the performance of the check_3dep_availability function by minimizing the number of requests to the service and sending all requests asynchronously. Also, the returned dict now uses Failed for those resolutions where the service fails to return a valid response. Failed responses are removed from the cache, so the next time the function is called, it will retry only the failed resolutions.

  • Add four new options to add_elevation: mask for passing a mask, resolution for specifying the resolution of the source DEM, and x_dim and y_dim for passing the names of the spatial dimensions of the input dataset. The mask option is useful when the input xarray.DataArray or xarray.Dataset has a mask that should be applied to the elevation data as well. The resolution option is useful for getting the elevation data at a higher resolution, which is then downsampled by bilinear interpolation to the resolution of the input dataset; the default, resolution=None, uses the resolution of the input dataset. The x_dim and y_dim options are useful when the spatial dimensions of the input dataset are not named x and y; the defaults are x_dim="x" and y_dim="y".
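
A minimal sketch of the new get_map_vrt function and the new add_elevation options described above; the bounding box, file names, and dimension names are hypothetical, and the exact get_map_vrt signature is an assumption:

import py3dep
import xarray as xr

bbox = (-69.77, 45.07, -69.31, 45.45)  # hypothetical area of interest
# Save the DEM for the bounding box as a VRT file (low memory, higher disk usage);
# the (bbox, vrt_path) calling convention is an assumption.
py3dep.get_map_vrt(bbox, "dem.vrt")

# Add elevation from a 30-m source DEM (downsampled by bilinear interpolation)
# to a hypothetical dataset whose spatial dims are named "lon" and "lat".
ds = xr.open_dataset("climate.nc")
ds = py3dep.add_elevation(ds, resolution=30, x_dim="lon", y_dim="lat")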

Breaking Changes#
  • In the elevation_profile function remove the res argument and use 10-m resolution DEM from 3DEP. Also, add two new attributes to the output xarray.Dataset: source for the dataset to state the data source used and units for the distance variable to state the units of the distance, which is meters.

Internal Changes#
  • Improve initial load time by moving import pyflwdir to the fill_depressions function.

Bug Fixes#
  • Decrease the number of pixels per request from 10e6 to 8e6 to reduce the request load (GH 65).

0.15.2 (2023-09-22)#

Internal Changes#
  • Remove dependency on dask.

0.15.1 (2023-09-02)#

Bug Fixes#
  • Fix HyRiver libraries' requirements by specifying a range instead of an exact version so conda-forge can resolve the dependencies.

0.15.0 (2023-05-07)#

From release 0.15 onward, all minor versions of HyRiver packages will be pinned. This ensures that previous minor versions of HyRiver packages cannot be installed with later minor releases. For example, if you have py3dep==0.14.x installed, you cannot install pydaymet==0.15.x. This is to ensure that the API is consistent across all minor versions.

New Features#
  • In static_3dep_dem use rioxarray directly instead of rasterio since it can handle VRT files.

  • Improve performance and accuracy of add_elevation by using the dynamic 3DEP service and setting the resolution based on the input xarray.DataArray or xarray.Dataset.

  • Improve the performance of elevation_profile by using the static 3DEP service when the input resolution is 10 m (which is the default for this function).

  • For now, retain compatibility with shapely<2 while supporting shapely>=2.

Bug Fixes#
  • In add_elevation, ensure that the resolution is in meters by reprojecting the input dataset to 5070 before extracting resolution and bound attributes.

0.14.0 (2023-03-05)#

New Features#
  • Add a new function called add_elevation for adding elevation data as a new variable to an input xarray.DataArray or xarray.Dataset.

  • The elevation_bycoords function now accepts a single coordinate and returns a float, in addition to accepting a list of coordinates and returning a list of elevations.

  • Modify the elevation_bycoords function to use the new elevation point query service (EPQS) web service. This only affects the source="tnm" option.

Breaking Changes#
  • Bump the minimum required version of shapely to 2.0, and use its new API.

Internal Changes#
  • Sync all minor versions of HyRiver packages to 0.14.0.

0.13.12 (2023-02-01)#

New Features#
  • Use the pyflwdir package for the depression filling operation instead of richdem, since richdem appears to be unmaintained. Note that pyflwdir is an optional dependency. Also, pyflwdir depends on numba, which is not available for Python 3.11 yet. You can follow the progress of numba's support for Python 3.11 here.

  • Add a new function called get_dem for obtaining DEM, which is a wrapper around the static_3dep_dem and get_map functions. Since static_3dep_dem is faster, if the requested resolution is 10 m, 30 m, or 60 m, static_3dep_dem will be used. Otherwise, get_map will be used.
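
A minimal sketch, assuming get_dem accepts a bounding box and a resolution in meters (the area of interest is hypothetical):

import py3dep

bbox = (-69.77, 45.07, -69.31, 45.45)  # hypothetical area of interest
# At 30 m, the faster static_3dep_dem backend is used under the hood
dem = py3dep.get_dem(bbox, 30)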

Internal Changes#
  • Significantly improve the performance of elevation_bycoords when tep is used as the source by using the static DEM data instead of the dynamic DEM.

  • Fully migrate setup.cfg and setup.py to pyproject.toml.

  • Convert relative imports to absolute with absolufy-imports.

  • Sync all patch versions of HyRiver packages to x.x.12.

0.13.10 (2023-01-08)#

New Features#
  • Refactor the show_versions function to improve performance and print the output in a nicer table-like format.

Bug Fixes#
  • Fix a compatibility issue with the new scipy version in elevation_profile which led to failure of the interpolation.

0.13.9 (2022-12-15)#

Bug Fixes#
  • Add the missing annotation import to the cache_keys to ensure Python 3.8 and 3.9 work with Python 3.10 style type hinting.

0.13.8 (2022-12-09)#

New Features#
  • Add a new function called static_3dep_dem for getting only DEM data at 10 m, 30 m, or 60 m resolution. This is useful for cases where only DEM data (i.e., not slope, aspect, or other terrain attributes that the Dynamic 3DEP service provides) is needed. This function is faster than get_map but is less flexible.

Internal Changes#
  • Modify the codebase based on Refurb suggestions.

0.13.7 (2022-11-04)#

Internal Changes#
  • Use pyupgrade package to update the type hinting annotations to Python 3.10 style.

  • Bump the minimum required version of HyRiver dependencies to the latest versions.

0.13.6 (2022-08-30)#

Internal Changes#
  • Add the missing PyPi classifiers for the supported Python versions.

0.13.5 (2022-08-29)#

Breaking Changes#
  • Append “Error” to all exception classes for conforming to PEP-8 naming conventions.

Internal Changes#
  • Increase the pixel limit for 3DEP's WMS from 8M to 10M to reduce the number of service calls and improve performance.

  • Bump the minimum versions of pygeoogc and pygeoutils to 0.13.5 and that of async-retriever to 0.3.5.

0.13.3 (2022-06-25)#

Bug Fixes#
  • Fix a bug in check_3dep_availability where, due to changes in pygeoogc, a ZeroMatched exception was raised instead of TypeError; as a result, check_3dep_availability was not working as expected.

0.13.2 (2022-06-14)#

Breaking Changes#
  • Set the minimum supported version of Python to 3.8 since many of the dependencies such as xarray, pandas, rioxarray have dropped support for Python 3.7.

Internal Changes#
  • Use micromamba for running tests and use nox for linting in CI.

0.13.1 (2022-06-11)#

New Features#
  • In the deg2mpm function, look for _FillValue and nodatavals in the attributes and, if not found, fall back to numpy.nan.

Internal Changes#
  • Ensure that the deg2mpm function uses dask if the input is dask-enabled.

  • In the elevation_profile function use a bounding box to get DEM and a linear interpolation to get the elevation along the profile.

0.13.0 (2022-04-03)#

New Features#
  • Add a new function called query_3dep_sources for querying bounds of 3DEP’s data sources within a bounding box. It returns a geo-dataframe that contains the bounding box of each data source and a column dem_res identifying the resolution of the raw topographic data within each geometry.

  • Add a new function called elevation_profile for getting elevation profile along a line at a given spacing. This function converts the line to a B-spline and then calculates the elevation along the spline at a given uniform spacing.

Breaking Changes#
  • Remove caching-related arguments from all functions since now they can be set globally via three environmental variables:

    • HYRIVER_CACHE_NAME: Path to the caching SQLite database.

    • HYRIVER_CACHE_EXPIRE: Expiration time for cached requests in seconds.

    • HYRIVER_CACHE_DISABLE: Disable reading/writing from/to the cache file.

    You can do this like so:

import os

os.environ["HYRIVER_CACHE_NAME"] = "path/to/file.sqlite"
os.environ["HYRIVER_CACHE_EXPIRE"] = "3600"
os.environ["HYRIVER_CACHE_DISABLE"] = "true"

0.12.2 (2022-01-15)#

New Features#
  • Add a new DEM source to elevation_bycoords to get elevation from the National Map’s 3DEP WMS service. This can replace the tnm source since tnm is not stable.

  • Add a new function called check_3dep_availability for checking the availability of 3DEP's native resolutions within an area of interest. It returns a dict whose keys are the native resolutions and whose boolean values indicate whether each resolution is available (see the sketch after this list).

  • Replace no-data values of slope in deg2mpm with np.nan, so they do not get converted to another value. The output of this function has np.float64 type.
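
For example, a minimal sketch of the availability check; the bounding box and the resolution keys shown are illustrative:

import py3dep

bbox = (-69.77, 45.07, -69.31, 45.45)  # hypothetical area of interest
res_avail = py3dep.check_3dep_availability(bbox)
# e.g., {"1m": True, "3m": True, "10m": True, ...} (illustrative keys/values)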

Internal Changes#
  • Refactor ElevationByCoords by using __post_init__ for validating the input parameters rather than pydantic’s validators.

  • Refactor elevation_bygrid by using get_map to get DEM and rioxarray for re-projection.

  • Add type checking with typeguard and fixed typing issues raised by typeguard.

  • Refactor show_versions to ensure getting correct versions of all dependencies.

0.12.1 (2021-12-31)#

Internal Changes#
  • Use the three new ar.retrieve_* functions instead of the old ar.retrieve function to improve type hinting and to make the API more consistent.

0.12.0 (2021-12-27)#

Breaking Changes#
  • Set the request caching’s expiration time to never expire. Add two flags to all functions to control the caching: expire_after and disable_caching.

Internal Changes#
  • Add all the missing types so mypy --strict passes.

  • Improve performance of elevation_bygrid by ignoring unnecessary validation.

0.11.4 (2021-11-12)#

Internal Changes#
  • Use rioxarray for dealing with GeoTIFF binaries since xarray deprecated the xarray.open_rasterio function, as it’s discussed in this PR.

  • Use importlib-metadata for getting the version instead of pkg_resources to decrease import time as discussed in this issue.

0.11.3 (2021-10-03)#

Breaking Changes#
  • Rewrite the command-line interface using click.group to improve UX. The command is now py3dep [command] [args] [options]. The two supported commands are coords, for getting elevations of a dataframe of coordinates in EPSG:4326 CRS, and geometry, for getting the elevation of a geo-dataframe of geometries. Each sub-command now has a separate help message. The input file format for the coords command is now csv, and for the geometry command it is .shp or .gpkg, which must have a crs attribute. Also, the geometry command now accepts multiple layers via the --layers (-l) option. More information and examples can be found in the README.rst file.

Internal Changes#
  • The get_map function now checks for validation of the input layers argument before sending the actual request with a more helpful message.

  • Improve docstrings.

  • Move deg2mpm, fill_depressions, and reproject_gtiff functions to a new file called utils. Both deg2mpm and fill_depressions functions are still accessible from py3dep directly.

  • Increase the test coverage.

  • Use one of click's internal utilities, click.testing.CliRunner, to run the CLI tests.

0.11.2 (2021-09-17)#

Bug Fixes#
  • Fix a bug related to elevation_bycoords where CRS validation fails if its type is pyproj.CRS, by converting inputs with CRS types to string.

Internal Changes#
  • Fix a couple of typing issues and update the get_transform API based on the recent changes in pygeoutils v0.11.5.

0.11.1 (2021-07-31)#

The first highlight of this release is a major refactor of elevation_bycoords by adding support for the Bulk Point Query Service and improving the overall performance of the function. Another highlight is support for performing depression filling in elevation_bygrid before sampling the underlying DEM.

New Features#
  • Refactor the elevation_bycoords function to add support for getting elevations of a list of coordinates via The National Map's Point Query Service. This service is more accurate than Airmap, but it's limited to the US only. You can select the source via a new argument called source; set it to source=tnm to use the TNM service, which is the default.

  • Refactor elevation_bygrid function to add a new capability via fill_depressions argument for filling depressions in the obtained DEM before extracting elevation data for the input grid points. This is achieved via RichDEM that needs to be installed if this functionality is desired. You can install it via pip or conda (mamba).

Internal Changes#
  • Migrate to using AsyncRetriever for handling communications with web services.

  • Handle the interpolation step in elevation_bygrid function more efficiently using xarray.

0.11.0 (2021-06-19)#

New Features#
  • Added command-line interface (GH 10).

  • All feature query functions use persistent caching that can significantly improve the performance.

Breaking Changes#
  • Drop support for Python 3.6 since many of the dependencies such as xarray and pandas have done so.

  • The returned xarray objects are in parallel mode, i.e., in some cases the compute method should be used to get the results.

  • Save the output as a netcdf instead of raster since conversion from nc to tiff can be easily done with rioxarray.

0.10.1 (2021-03-27)#

  • Add announcement regarding the new name for the software stack, HyRiver.

  • Improve pip installation and release workflow.

0.10.0 (2021-03-06)#

  • The first release after renaming hydrodata to PyGeoHydro.

  • Make mypy checks more strict and fix all the errors and prevent possible bugs.

  • Speed up CI testing by using mamba and caching.

0.9.0 (2021-02-14)#

  • Bump version to the same version as PyGeoHydro.

  • Add support for saving maps as geotiff file(s).

  • Replace the Elevation Point Query Service with AirMap for getting elevations for a list of coordinates in bulk, since AirMap is much faster. The resolution of AirMap is 30 m.

  • Use cytoolz for some operations for improving performance.

0.2.0 (2020-12-06)#

  • Add support for multipolygon.

  • Remove the fill_hole argument.

  • Add a new function to get elevations for a list of coordinates called elevation_bycoords.

  • Refactor elevation_bygrid function for increasing readability and performance.

0.1.7 (2020-08-18)#

  • Added a rename operation to get_map to automatically rename the variables to a more sensible one.

  • Replaced simplejson with orjson to speed-up JSON operations.

0.1.6 (2020-08-11)#

  • Add a new function, show_versions, for getting versions of the installed dependencies which is useful for debugging and reporting.

  • Fix typos in the docs and improved the README.

  • Improve testing and coverage.

0.1.5 (2020-08-03)#

  • Fixed the geometry CRS issue

  • Improved the documentation

0.1.4 (2020-07-23)#

  • Refactor get_map to use pygeoutils package.

  • Change the versioning method to setuptools_scm.

  • Polish README and add installation from conda-forge.

0.1.0 (2020-07-19)#

  • First release on PyPI.

History#

0.16.1 (2024-01-15)#

New Features#
  • Add a new function for getting Daymet data from Microsoft's Planetary Computer called get_bystac. Although this function can be much faster than get_bygeom, it currently only gives access to Daymet v4.2 from 1980 to 2020. As discussed here, Daymet v4.5 will be added to the Planetary Computer in the future. Until then, for accessing the latest version of Daymet (v4.5), you need to use get_bygeom. Additionally, this function requires the fsspec, dask, zarr, and pystac-client packages.

  • Make separate_snow a standalone, pure, and public function; it can now be called directly as pydaymet.separate_snow (see the sketch after this list).

  • Change the length unit from km to m for get_bygeom.
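
A minimal sketch of calling the now-public separate_snow on data from get_bycoords; the coordinates, dates, and variable names are hypothetical:

import pydaymet as daymet

coords = (-89.63, 48.0)  # hypothetical coordinates
clm = daymet.get_bycoords(coords, ("2000-01-01", "2000-12-31"), variables=["prcp", "tmin"])
clm = daymet.separate_snow(clm)  # separates snow from prcp using air temperature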

Internal Changes#
  • The potential_et function uses the py3dep.add_elevation function, but the CRS info was getting lost in the process for the new elevation variable. This version fixes the issue by adding the CRS info to the elevation variable.

  • Change PetParams class from NamedTuple to dataclass for better performance and consistency. Now, it has a new classmethod called fields that returns a list of the four fields of the class.

0.16.0 (2024-01-03)#

Breaking Changes#
  • Bump min version of shapely to 2.

Internal Changes#
  • Use the new py3dep.add_elevation API.

0.15.2 (2023-09-22)#

Internal Changes#
  • Remove dependency on dask.

0.15.1 (2023-09-02)#

Bug Fixes#
  • Fix HyRiver libraries' requirements by specifying a range instead of an exact version so conda-forge can resolve the dependencies.

0.15.0 (2023-05-07)#

From release 0.15 onward, all minor versions of HyRiver packages will be pinned. This ensures that previous minor versions of HyRiver packages cannot be installed with later minor releases. For example, if you have py3dep==0.14.x installed, you cannot install pydaymet==0.15.x. This is to ensure that the API is consistent across all minor versions.

New Features#
  • For now, retain compatibility with shapely<2 while supporting shapely>=2.

0.14.0 (2023-03-05)#

New Features#
  • Change missing value of both single-pixel and gridded versions to numpy.nan from -9999.

  • Add a new model parameter for computing PET using the priestley_taylor and penman_monteith models, called arid_correction. For arid regions, FAO 56 suggests subtracting 2 degrees from the minimum temperature. This parameter can be passed via pet_params in daymet_by* functions, or params in the potential_et function.

  • Refactor get_bycoords to reduce memory usage by using a combination of itertools and Generator objects.

  • Refactor the pet module to improve performance and readability, and reduce code duplication.

Documentation#
  • Add more information about parameters that pet functions accept.

Breaking Changes#
  • Bump the minimum required version of shapely to 2.0, and use its new API.

Internal Changes#
  • Sync all minor versions of HyRiver packages to 0.14.0.

0.13.12 (2023-02-10)#

Internal Changes#
  • Fully migrate setup.cfg and setup.py to pyproject.toml.

  • Convert relative imports to absolute with absolufy-imports.

  • Sync all patch versions of HyRiver packages to x.x.12.

0.13.10 (2023-01-08)#

New Features#
  • Refactor the show_versions function to improve performance and print the output in a nicer table-like format.

Bug Fixes#
  • Fix a bug in get_bygeom where for small requests that lead to a single download URL, the function failed.

Internal Changes#
  • Skip 0.13.9 version so the minor version of all HyRiver packages become the same.

0.13.8 (2022-12-09)#

Internal Changes#
  • More robust handling of getting large gridded data. Instead of caching the requests/responses, directly store the responses as NetCDF files to a cache folder using pygeoogc.streaming_download and ultimately read them using xarray.open_mfdataset. This should make the bygeom function even faster than before and also make it possible to make large requests without having to worry about running out of memory (GH 59).

  • Modify the codebase based on Refurb suggestions.

0.13.7 (2022-11-04)#

Since the release of Daymet v4 R1 in November 2022, the URL of Daymet's server has changed. Therefore, only PyDaymet v0.13.7+ will work; previous versions will no longer work.

New Features#
  • Add support for passing a list of coordinates to the get_bycoords function. Also, optionally, you can pass a list of IDs for the input coordinates that will be used as keys for the returned pandas.DataFrame or a dimension called id in the returned xarray.Dataset if to_xarray is enabled.

  • Add a new argument called to_xarray to the get_bycoords function for returning the results as a xarray.Dataset instead of a pandas.DataFrame. When set to True, the returned xarray.Dataset will have three attributes called units, description, and long_name.

  • The date argument of both get_bycoords and by_geom functions now accepts range-type objects for passing years, e.g., range(2000, 2006).

import pydaymet as daymet

coords = [(-94.986, 29.973), (-95.478, 30.134)]
idx = ["P1", "P2"]
clm = daymet.get_bycoords(coords, range(2000, 2021), coords_id=idx, to_xarray=True)

Internal Changes#
  • Use pyupgrade package to update the type hinting annotations to Python 3.10 style.

  • Fix the Daymet server URL.

0.13.6 (2022-08-30)#

Internal Changes#
  • Add the missing PyPi classifiers for the supported Python versions.

0.13.5 (2022-08-29)#

Breaking Changes#
  • Append “Error” to all exception classes for conforming to PEP-8 naming conventions.

Internal Changes#
  • Bump the minimum versions of pygeoogc, pygeoutils, py3dep to 0.13.5 and that of async-retriever to 0.3.5.

0.13.3 (2022-07-31)#

Bug Fixes#
  • Fix a bug in PETGridded where the wrong data type was being set for pet and elevation variables.

  • When initializing PETGridded, only chunk the elevation if the input climate data is chunked.

0.13.2 (2022-06-14)#

Breaking Changes#
  • Set the minimum supported version of Python to 3.8 since many of the dependencies such as xarray, pandas, rioxarray have dropped support for Python 3.7.

Internal Changes#
  • Use micromamba for running tests and use nox for linting in CI.

0.13.1 (2022-06-11)#

Bug Fixes#
  • Set the end year based on the current year since Daymet data get updated every year (PR 55) by Tim Cera.

  • Set the months for the annual timescale to correct values (PR 55) by Tim Cera.

0.13.0 (2022-03-03)#

Breaking Changes#
  • Remove caching-related arguments from all functions since now they can be set globally via three environmental variables:

    • HYRIVER_CACHE_NAME: Path to the caching SQLite database.

    • HYRIVER_CACHE_EXPIRE: Expiration time for cached requests in seconds.

    • HYRIVER_CACHE_DISABLE: Disable reading/writing from/to the cache file.

    You can do this like so:

import os

os.environ["HYRIVER_CACHE_NAME"] = "path/to/file.sqlite"
os.environ["HYRIVER_CACHE_EXPIRE"] = "3600"
os.environ["HYRIVER_CACHE_DISABLE"] = "true"

0.12.3 (2022-02-04)#

New Features#
  • Add a new flag to both get_bycoords and get_bygeom functions called snow which separates snow from the precipitation using the Martinez and Gupta (2010) method.
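
For example, a minimal sketch (the coordinates and dates are hypothetical):

import pydaymet as daymet

coords = (-89.63, 48.0)  # hypothetical coordinates
# snow=True separates snow from precipitation via the Martinez and Gupta (2010) method
clm = daymet.get_bycoords(coords, ("2000-01-01", "2000-12-31"), variables=["prcp", "tmin"], snow=True)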

Internal Changes#
  • Add elevation data when computing PET regardless of the pet method.

  • Match the chunk size of elevation with that of the climate data.

  • Drop time dimension from elevation, lon, and lat variables.

Bug Fixes#
  • Fix a bug in setting dates for monthly timescales. For monthly timescale Daymet calendar is at 15th or 16th of the month, so input dates need to be adjusted accordingly.

0.12.2 (2022-01-15)#

Internal Changes#
  • Clean up the PET computation functions’ output by removing temporary variables that are created during the computation.

  • Add more attributes for elevation and pet variables.

  • Add type checking with typeguard and fixed typing issues raised by typeguard.

  • Refactor show_versions to ensure getting correct versions of all dependencies.

0.12.1 (2021-12-31)#

Internal Changes#
  • Use the three new ar.retrieve_* functions instead of the old ar.retrieve function to improve type hinting and to make the API more consistent.

0.12.0 (2021-12-27)#

New Features#
  • Expose the ssl argument for disabling the SSL certification verification (GH 41). Now, you can pass ssl=False to disable the SSL verification in both get_bygeom and get_bycoord functions. Moreover, you can pass --disable_ssl to PyDaymet’s command line interface to disable the SSL verification.

Breaking Changes#
  • Set the request caching’s expiration time to never expire. Add two flags to all functions to control the caching: expire_after and disable_caching.

Internal Changes#
  • Add all the missing types so mypy --strict passes.

0.11.4 (2021-11-12)#

Internal Changes#
  • Use importlib-metadata for getting the version instead of pkg_resources to decrease import time as discussed in this issue.

0.11.3 (2021-10-07)#

Bug Fixes#
  • There was an issue in the PET computation due to dayofyear being added as a new dimension. This version fixes it and even further simplifies the code by using xarray’s dt accessor to gain access to the dayofyear method.

0.11.2 (2021-10-07)#

New Features#
  • Add hargreaves_samani and priestley_taylor methods for computing PET.

Breaking Changes#
  • Rewrite the command-line interface using click.group to improve UX. The command is now pydaymet [command] [args] [options]. The two supported commands are coords for getting climate data for a dataframe of coordinates and geometry for getting gridded climate data for a geo-dataframe. Moreover, each sub-command now has a separate help message and example.

  • Deprecate get_byloc in favor of get_bycoords.

  • The pet argument in both get_bycoords and get_bygeom functions now accepts hargreaves_samani, penman_monteith, priestley_taylor, and None.

Internal Changes#
  • Refactor the pet module for reducing duplicate code and improving readability and maintainability. The code is smaller now and the functions for computing physical properties include references to equations from the respective original paper.

0.11.1 (2021-07-31)#

The highlight of this release is a major refactor of Daymet to allow for extending PET computation function for using methods other than FAO-56.

New Features#
  • Refactor Daymet class by removing pet_bycoords and pet_bygrid methods and creating a new public function called potential_et. This function computes potential evapotranspiration (PET) and supports both gridded (xarray.Dataset) and single pixel (pandas.DataFrame) climate data. The long-term plan is to add support for methods other than FAO 56 for computing PET.

0.11.0 (2021-06-19)#

New Features#
  • Add command-line interface (GH 7).

  • Use AsyncRetriever for sending requests asynchronously with persistent caching. A cache folder in the current directory is created.

  • Check for validity of start/end dates based on Daymet V4 since Puerto Rico data starts from 1950 while North America and Hawaii start from 1980.

  • Check for validity of input coordinate/geometry based on the Daymet V4 bounding boxes.

  • Improve accuracy of computing the psychrometric constant in PET calculations by using an equation in Allen et al. 1998.

Breaking Changes#
  • Drop support for Python 3.6 since many of the dependencies such as xarray and pandas have done so.

  • Change loc_crs and geo_crs arguments to crs in get_bycoords and get_bygeom.

Documentation#
  • Add examples to docstrings and improve writing.

  • Add more notes regarding the underlying assumptions for pet_bycoords and pet_bygrid.

Internal Changes#
  • Refactor Daymet class to use pydantic for validating the inputs.

  • Increase test coverage.

0.10.2 (2021-03-27)#

  • Add announcement regarding the new name for the software stack, HyRiver.

  • Improve pip installation and release workflow.

0.10.0 (2021-03-06)#

  • The first release after renaming hydrodata to PyGeoHydro.

  • Make mypy checks more strict and fix all the errors and prevent possible bugs.

  • Speed up CI testing by using mamba and caching.

0.9.0 (2021-02-14)#

  • Bump version to the same version as PyGeoHydro.

  • Update to version 4 of Daymet database. You can check the release information here

  • Add a new function called get_bycoords that provides an alternative to get_byloc for getting climate data at a single pixel. This new function uses the THREDDS data server with the NetCDF Subset Service (NCSS) and supports getting monthly and annual averages directly from the server. Note that this function will replace get_byloc in the future, so consider migrating your code by replacing get_byloc with get_bycoords. The input arguments of get_bycoords are very similar to those of get_bygeom. Another difference between get_byloc and get_bycoords is the column names, where get_bycoords uses the units returned by the NCSS server.

  • Add support for downloading monthly and annual summaries in addition to the daily timescale. You can pass time_scale as daily, monthly, or annual to get_bygeom or get_bycoords functions to download the respective summaries.

  • Add support for getting climate data for Hawaii and Puerto Rico by passing region to get_bygeom and get_bycoords functions. The acceptable values are na for CONUS, hi for Hawaii, and pr for Puerto Rico.
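
A minimal sketch combining the new time_scale and region arguments (the point and dates are hypothetical):

import pydaymet as daymet

clm = daymet.get_bycoords(
    (-66.12, 18.32),  # a hypothetical point in Puerto Rico
    ("1990-01-01", "2000-12-31"),
    time_scale="monthly",
    region="pr",
)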

0.2.0 (2020-12-06)#

  • Add support for multipolygon.

  • Remove the fill_hole argument.

  • Improve masking by geometry.

  • Use the newly added async_requests function from pygeoogc for getting Daymet data to increase the performance (almost 2x faster).

0.1.3 (2020-08-18)#

  • Replaced simplejson with orjson to speed-up JSON operations.

0.1.2 (2020-08-11)#

  • Add show_versions for showing versions of the installed deps.

0.1.1 (2020-08-03)#

  • Retained the compatibility with xarray 0.15 by removing the attrs flag.

  • Replaced open_dataset with load_dataset for automatic handling of closing the input after reading the content.

  • Removed years argument from both byloc and bygeom functions. The dates argument now accepts both a tuple of start and end dates and a list of years.

0.1.0 (2020-07-27)#

  • Initial release on PyPI.

History#

0.16.0 (2024-01-03)#

  • Initial release on PyPI.

History#

0.16.0 (2024-01-03)#

Internal Changes#
  • Drop support for shapely<2.

0.15.2 (2023-09-22)#

Internal Changes#
  • Remove dependency on dask.

  • Reduce complexity of the code by breaking down the _check_inputs function into _get_variables and _get_dates functions.

0.15.1 (2023-07-10)#

Bug Fixes#
  • Fix a bug in computing snow where the t_snow argument was not being converted to Kelvin.

New Features#
  • If snow=True is passed to both get_bygeom and get_bycoords functions, the variables argument will be checked to see if it contains prcp and temp; if not, they will be added to the list of variables to be retrieved. This ensures that the snow argument works as expected.

0.15.0 (2023-05-07)#

From release 0.15 onward, all minor versions of HyRiver packages will be pinned. This ensures that previous minor versions of HyRiver packages cannot be installed with later minor releases. For example, if you have py3dep==0.14.x installed, you cannot install pydaymet==0.15.x. This is to ensure that the API is consistent across all minor versions.

New Features#
  • Add source argument to both get_bygeom and get_bycoords functions. Valid values for source are grib (default) and netcdf. Both return the same values, but the latter also offers an additional variable, psurf, for surface pressure. Valid variable names for the netcdf source are prcp, pet, wind_u, wind_v, humidity, temp, rsds, rlds, and psurf; valid variable names for the grib source are unchanged so as not to introduce breaking changes. By Luc Rébillout. (See the sketch after this list.)

  • For now, retain compatibility with shapely<2 while supporting shapely>=2.
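
A minimal sketch of the source argument; the coordinates and dates are hypothetical, and the exact argument order is an assumption:

import pynldas2 as nldas

clm = nldas.get_bycoords(
    [(-94.98, 29.97)],  # hypothetical coordinates
    "2020-01-01",
    "2020-01-31",
    variables=["prcp", "psurf"],
    source="netcdf",  # psurf is only offered by the netcdf source
)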

0.14.0 (2023-03-05)#

New Features#
  • Add snow and snow_params arguments to both get_bygeom and get_bycoords functions for computing snow from prcp and temp.

  • Rewrite the by_coords functions to improve performance and reduce memory usage. Also, its to_xarray option now returns a much better-structured xarray.Dataset. Moreover, the function has a new argument called coords_id that allows the user to specify IDs for the input coordinates. This is useful for cases where the coordinates belong to specific features, such as station locations, that have their own IDs. These IDs are used whether the data is returned as a pandas.DataFrame or an xarray.Dataset.

Internal Changes#
  • Sync all minor versions of HyRiver packages to 0.14.0.

0.1.12 (2023-02-10)#

Internal Changes#
  • Fully migrate setup.cfg and setup.py to pyproject.toml.

  • Convert relative imports to absolute with absolufy-imports.

  • Sync all patch versions of HyRiver packages to x.x.12.

0.1.2 (2023-01-08)#

New Features#
  • Refactor the show_versions function to improve performance and print the output in a nicer table-like format.

0.1.1 (2022-12-16)#

Bug Fixes#
  • Fix an issue where a single variable, i.e., not a list, could not be detected correctly.

  • Fix an issue in converting the response from the service to a dataframe or dataset when the service fails and throws an error.

0.1.0 (2022-12-15)#

  • Initial release.

History#

0.16.0 (2024-01-03)#

New Features#
  • Add a new function called flashiness_index for computing the flashiness index of a daily streamflow time series following Baker et al. (2004).
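
A minimal sketch (the streamflow series is hypothetical):

import pandas as pd
import hydrosignatures as hsg

q = pd.Series([1.2, 3.4, 2.1, 5.0, 2.8], name="streamflow")  # hypothetical daily streamflow
rb = hsg.flashiness_index(q)  # flashiness index following Baker et al. (2004)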

Breaking Changes#
  • Improve function naming convention by removing the compute_ prefix from all functions and spelling out the full name of the function. For example, compute_fdc_slope and compute_ai are now flow_duration_curve_slope and aridity_index, respectively.

0.15.2 (2023-09-22)#

New Features#
  • Add an option to compute_mean_monthly for specifying whether the input data unit is in mm/day or m3/s. If m3/s, then the monthly values are computed by taking the mean of the daily values for each month. If mm/day, then the monthly values are computed by taking the sum of the daily values for each month.

0.15.0 (2023-05-07)#

From release 0.15 onward, all minor versions of HyRiver packages will be pinned. This ensures that previous minor versions of HyRiver packages cannot be installed with later minor releases. For example, if you have py3dep==0.14.x installed, you cannot install pydaymet==0.15.x. This is to ensure that the API is consistent across all minor versions.

Internal Changes#
  • Explicitly use nopython mode in numba-decorated functions to avoid deprecation warnings.

0.14.0 (2023-03-05)#

Bug Fixes#
  • Address an issue in compute_fdc_slope where, if the input included NaNs, it returned NaN. Now, the function correctly handles NaN values. Also, this function now works with any array-like input, i.e., pandas.Series, pandas.DataFrame, numpy.ndarray, and xarray.DataArray. Also, the denominator should have been divided by 100, since the input bins are percentiles.

  • Fix a bug in compute_ai where daily values were being used instead of mean annual average values. Also, this function now accepts xarray.DataArray too.

Internal Changes#
  • Sync all minor versions of HyRiver packages to 0.14.0.

0.1.12 (2023-02-10)#

Internal Changes#
  • Fully migrate setup.cfg and setup.py to pyproject.toml.

  • Convert relative imports to absolute with absolufy-imports.

  • Sync all patch versions of HyRiver packages to x.x.12.

0.1.2 (2023-01-08)#

New Features#
  • Refactor the show_versions function to improve performance and print the output in a nicer table-like format.

Internal Changes#
  • Use pyright for type checking and fix all typing issues that it raised.

  • Add xarray as a dependency.

0.1.1 (2022-11-04)#

New Features#
  • Add a new function called compute_ai for computing the aridity index.

  • Add a new function called compute_flood_moments for computing flood moments: Mean annual flood, coefficient of variation, and coefficient of skewness.

  • Add a stand-alone function for computing the FDC slope, called compute_fdc_slope.

Breaking Changes#
  • Remove the runoff_ratio_annual function.

0.1.0 (2022-10-03)#

  • First release on PyPI.

History#

0.16.0 (2024-01-03)#

New Features#
  • Add a new environmental variable called "HYRIVER_SSL_CERT" for setting the path to an SSL certificate file other than the default one. You can do this like so:

import os

os.environ["HYRIVER_SSL_CERT"] = "path/to/file.pem"

0.15.2 (2023-09-22)#

Bug Fixes#
  • Fix an issue with getting all valid keywords that aiohttp accepts by using aiohttp.ClientSession()._request directly.

0.15.0 (2023-05-07)#

From release 0.15 onward, all minor versions of HyRiver packages will be pinned. This ensures that previous minor versions of HyRiver packages cannot be installed with later minor releases. For example, if you have py3dep==0.14.x installed, you cannot install pydaymet==0.15.x. This is to ensure that the API is consistent across all minor versions.

Bug Fixes#
  • When raise_status is False, responses for failed requests used to be returned as None, but their request IDs were not returned, so sorting would fail. Now, request IDs are returned for all requests regardless of whether they were successful.

  • Give precedence to non-default arguments for caching related arguments instead of directly getting them from env variables. This is to avoid the case where the user sets the env variables but then passes different arguments to the function. In this case, the function should use the passed arguments instead of the env variables.

0.14.0 (2023-03-05)#

New Features#
  • Add a new option to all functions called raise_status. If False, no exceptions are raised; instead, None is returned for those requests that led to exceptions. This allows returning all successful responses while ignoring the failed ones. This option defaults to True to retain backward compatibility.

  • Set the cache expiration time to one week instead of never expiring. To ensure all users have a smooth transition, cache files created before this release will be deleted and a new cache will be created.

Internal Changes#
  • Sync all minor versions of HyRiver packages to 0.14.0.

0.3.12 (2023-02-10)#

Internal Changes#
  • Rewrite the private async_session function as two separate functions called async_session_without_cache and async_session_with_cache. This makes the code more readable and easier to maintain.

  • Fully migrate setup.cfg and setup.py to pyproject.toml.

  • Convert relative imports to absolute with absolufy-imports.

  • Make utils module private.

  • Sync all patch versions of HyRiver packages to x.x.12.

0.3.10 (2023-01-08)#

New Features#
  • Refactor the show_versions function to improve performance and print the output in a nicer table-like format.

Bug Fixes#
  • Fix a bug in reading the HYRIVER_CACHE_EXPIRE environmental variable.

  • Bump the minimum version of aiohttp-client-cache to 0.8.1 to fix a bug in reading cache files that were created with previous versions. (GH 41)

Internal Changes#
  • Enable fast_save in aiohttp-client-cache to speed up saving responses to the cache file.

  • Use pyright for type checking instead of mypy and fix all type errors.

  • Skip 0.13.8/9 versions so the minor version of all HyRiver packages become the same.

0.3.7 (2022-12-09)#

New Features#
  • Add support for specifying the chunk size in stream_write. Defaults to None, which preserves the previous behavior of iterating over and writing the responses as they are received from the server.

Internal Changes#
  • Use pyupgrade package to update the type hinting annotations to Python 3.10 style.

  • Modify the codebase based on Refurb suggestions.

0.3.6 (2022-08-30)#

Internal Changes#
  • Add the missing PyPi classifiers for the supported Python versions.

  • Release the package as both async_retriever and async-retriever on PyPi and Conda-forge.

0.3.5 (2022-08-29)#

Breaking Changes#
  • Append “Error” to all exception classes for conforming to PEP-8 naming conventions.

Internal Changes#
  • Bump minimum version of aiohttp-client-cache to 0.7.3 since the attrs version issue has been addressed.

0.3.4 (2022-07-31)#

New Features#
  • Add a new function, stream_write, for writing a response to a file as it’s being retrieved. This could be very useful for downloading large files. This function does not use persistent caching.

0.3.3 (2022-06-14)#

Breaking Changes#
  • Set the minimum supported version of Python to 3.8 since many of the dependencies such as xarray, pandas, rioxarray have dropped support for Python 3.7.

Internal Changes#
  • Use micromamba for running tests and use nox for linting in CI.

0.3.2 (2022-04-03)#

New Features#
  • Add support for setting caching-related arguments using three environmental variables:

    • HYRIVER_CACHE_NAME: Path to the caching SQLite database.

    • HYRIVER_CACHE_EXPIRE: Expiration time for cached requests in seconds.

    • HYRIVER_CACHE_DISABLE: Disable reading/writing from/to the cache file.

    You can do this like so:

import os

os.environ["HYRIVER_CACHE_NAME"] = "path/to/file.sqlite"
os.environ["HYRIVER_CACHE_EXPIRE"] = "3600"
os.environ["HYRIVER_CACHE_DISABLE"] = "true"

Internal Changes#
  • Include the URL of a failed request in its exception error message.

0.3.1 (2021-12-31)#

New Features#
  • Add three new functions called retrieve_text, retrieve_json, and retrieve_binary. These functions are derived from the retrieve function and are used to retrieve the text, JSON, or binary content of a response. They are meant to help with type hinting since they have only one return type instead of the three different return types that the retrieve function has.
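
For example, a minimal sketch of the JSON variant:

import async_retriever as ar

urls = ["https://httpbin.org/get"]
resp = ar.retrieve_json(urls)  # a list of parsed JSON payloads, one per URL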

Internal Changes#
  • Move all private functions to a new module called utils. This makes the code-base more readable and easier to maintain.

0.3.0 (2021-12-27)#

Breaking Changes#
  • Set the expiration time to never expire by default.

New Features#
  • Add two new arguments to retrieve for controlling caching. First, delete_url_cache for deleting caches for specific requests. Second, expire_after for setting a custom expiration time.

  • Expose the ssl argument for disabling the SSL certification verification (GH 41).

  • Add a new option called disable that temporarily disables caching requests/responses if set to True. It defaults to False.

0.2.5 (2021-11-09)#

New Features#
  • Add two new arguments, timeout and expire_after, to retrieve. These two arguments give the user more control in dealing with issues related to caching.

Internal Changes#
  • Revert to pytest as the testing framework.

  • Use importlib-metadata for getting the version instead of pkg_resources to decrease import time as discussed in this issue.

0.2.4 (2021-09-10)#

Internal Changes#
  • Use ujson for converting responses to JSON.

Bug Fixes#
  • Fix an issue with catching service error messages.

0.2.3 (2021-08-26)#

Internal Changes#
  • Use ujson for JSON parsing instead of orjson since orjson only serializes to bytes which is not compatible with aiohttp.

0.2.2 (2021-08-19)#

New Features#
  • Add a new function, clean_cache, for manually removing the expired responses from the cache database.

Internal Changes#
  • Handle all cache file-related operations in the create_cachefile function.

0.2.1 (2021-07-31)#

New Features#
  • The responses now are returned in the same order as the input URLs.

  • Add support for passing connection type, i.e., IPv4 only, IPv6 only, or both via the family argument. Defaults to both.

  • Set trust_env=True, so the session can read the system's netrc files. This can be useful for working with services such as the EarthData service, which reads user authentication info from a netrc file.

Internal Changes#
  • Replace the AsyncRequest class with the _retrieve function to increase readability and reduce overhead.

  • More robust handling of validating user inputs via a new class called ValidateInputs.

  • Move all if-blocks in async_session to other functions to improve performance.

0.2.0 (2021-06-17)#

Breaking Changes#
  • Make persistent caching dependencies required.

  • Rename request argument to request_method in retrieve which now accepts both lower and upper cases of get and post.

Bug Fixes#
  • Pass a new loop explicitly to nest_asyncio (GH 1).

Internal Changes#
  • Refactor the entire code-base for more efficient handling of different request methods.

  • Check the validity of inputs before sending requests.

  • Improve documentation.

  • Improve cache handling by removing the expired responses before returning the results.

  • Increase testing coverage to 100%.

0.1.0 (2021-05-01)#

  • Initial release.

History#

0.16.2 (2024-XX-XX)#

Internal Changes#
  • Remove the deprecated AirMap URL.

0.16.1 (2024-01-15)#

Bug Fixes#
  • pyproj uses its own env variables for SSL certification. This release fixes an issue with pyproj not being able to download its grid database when using a DOI SSL certification file, by using pyproj.network.set_ca_bundle_path to set the SSL certification file given by the user via the HYRIVER_SSL_CERT env variable.

  • Fix an issue in WFS.getfeature_byid where the max_nrecords argument was not being used correctly, thus causing large requests to fail.

Internal Changes#
  • For the ServiceURL class, use dataclass instead of NamedTuple for better performance and consistency.

0.16.0 (2024-01-03)#

New Features#
  • Add a new arg to WMS.getmap_bybox called tiff_dir for storing the responses from a WMS request as GeoTIFF files on disk instead of keeping all responses in memory. When this arg is given, the function returns a list of paths to these files. This is useful for large requests where the response is too large to be kept in memory. You can create a VRT file from these files using the pygeoutils.gtiff2vrt function.
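
A minimal sketch of the tiff_dir argument; the layer name, bounding box, and resolution are illustrative, and resolution is assumed to be in meters:

from pygeoogc import WMS

url = "https://elevation.nationalmap.gov/arcgis/services/3DEPElevation/ImageServer/WMSServer"
wms = WMS(url, layers="3DEPElevation:None", outformat="image/tiff", crs=4326)
# With tiff_dir set, responses are written to disk and a list of file paths is returned
files = wms.getmap_bybox((-69.77, 45.07, -69.31, 45.45), 30, tiff_dir="cache")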

0.15.2 (2023-09-22)#

New Features#
  • Added RESTfulURLs for FEMA’s National Flood Hazard Layer (NFHL) service. Contributed by Fernando Aristizabal. (PR 62)

  • Now, RetrySession can be used as a context manager. This is useful for closing the session after using it. For example:

from pygeoogc import RetrySession

with RetrySession() as session:
    r = session.get("https://httpbin.org/get").json()

Internal Changes#
  • Improve the example in the docstring of traverse_json function.

  • Improve exception handling in the ArcGISRESTful class and return a more informative error message.

0.15.1 (2023-08-02)#

From release 0.15 onward, all minor versions of HyRiver packages will be pinned. This ensures that previous minor versions of HyRiver packages cannot be installed with later minor releases. For example, if you have pygeoogc==0.14.x installed, you cannot install pygeoogc==0.15.x series. This is to ensure that the API is consistent across all minor versions.

New Features#
  • Add the STN Flood Event Data URL to the list of RESTfuls. Contributed by Fernando Aristizabal. (PR 59)

  • Add the link for eHydro's web service.

0.15.0 (2023-05-07)#

From release 0.15 onward, all minor versions of HyRiver packages will be pinned. This ensures that previous minor versions of HyRiver packages cannot be installed with later minor releases. For example, if you have pygeoogc==0.14.x installed, you cannot install pygeoogc==0.15.x series. This is to ensure that the API is consistent across all minor versions.

New Features#
  • For now, retain compatibility with shapely<2 while supporting shapely>=2.

Bug Fixes#
  • Fix an issue in WFS where the getfeature_bygeom method fails if the requested web service does not have the geometry_column attribute in its schema. This release addresses the issue by trying to find the name from other attributes in the schema; if it fails to find one, it raises a ValueError.

  • Catch an edge case in match_crs function where the input is a list of coordinates of length 4.

  • Give precedence to non-default arguments for caching related arguments instead of directly getting them from env variables. This is to avoid the case where the user sets the env variables but then passes different arguments to the function. In this case, the function should use the passed arguments instead of the env variables.

Internal Changes#
  • Remove pyyaml as a dependency since it is not used anymore.

0.14.0 (2023-03-05)#

Breaking Changes#
  • Bump the minimum required version of shapely to 2.0, and use its new API.

Internal Changes#
  • Sync all minor versions of HyRiver packages to 0.14.0.

0.13.12 (2023-02-10)#

New Features#
  • Make match_crs less strict in terms of the input geometry type being tuple or list by relying on shapely and contextlib.suppress. So, now users can pass any combination of list or tuple as coordinates or bounding box.

  • More robust handling of inputs and outputs in streaming_download. Now, the function returns a single Path object only if the input is a str. Previously, if there was only one URL, whether a list of length one or a str, the output was a single Path, which could have had unintended consequences.

Bug Fixes#
  • In WFS, when some layers had missing schema info, the class failed to initialize. This release fixes the issue by ignoring layers with missing schema info and asking the user to pass a sort parameter instead of trying to find one automatically. This fix also improves the performance of this function by making fewer web requests.

Internal Changes#
  • Fully migrate setup.cfg and setup.py to pyproject.toml.

  • Convert relative imports to absolute with absolufy-imports.

  • Sync all patch versions of HyRiver packages to x.x.12.

0.13.10 (2023-01-08)#

Bug Fixes#
  • Remove all Python 3.9 type-annotation-style in the codebase except for function signatures to ensure compatibility with Python 3.8. (GH 57, PR 58). Thanks to Tim Cera for reporting and fixing the issue.

Internal Changes#
  • Use pyright for type checking instead of mypy since it is faster and more accurate. Also, fix all the type errors reported by pyright.

  • Improve code quality by addressing issues raised by DeepSource.

0.13.9 (2022-12-15)#

Bug Fixes#
  • Add the missing annotation import to the cache_keys to ensure Python 3.8 and 3.9 work with Python 3.10 style type hinting.

0.13.8 (2022-12-09)#

New Features#
  • Add a new property to the WFS class called schema that contains information about column names and their types for all layers. It also includes the geometry type and its name for each layer.

  • Automatically determine the geometry keyword that should be passed to WFS.getfeature_bygeom using the new schema property of WFS.

  • Add support for disabling SSL verification to RetrySession via ssl parameter.

  • Add support for streaming responses to RetrySession via stream parameter to get and post methods.

  • Add support for closing the session to RetrySession via close method.

  • Add support for passing params, data, and json to RetrySession via get and post methods. Previously, keyword payload was used for params in get and data in post. Now, params and data can also be passed as keyword arguments to these methods.

  • Add a new function called streaming_download for downloading large files in parallel and in chunks.
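
A minimal sketch of the downloader; the URLs are hypothetical:

from pygeoogc import streaming_download

urls = ["https://example.com/a.nc", "https://example.com/b.nc"]  # hypothetical URLs
# Download the files in parallel and in chunks; returns the path(s) of the saved files
files = streaming_download(urls)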

Bug Fixes#
  • Fix an issue in the WFS class where, when the number of requested features exceeded the maximum number of features allowed by the server, only a portion of the features was returned. This release addresses the issue by first getting only the number of features and then requesting the features in chunks of feature IDs based on the maximum number of features allowed by the server.

  • Fix the warning message in ArcGISRESTful where the wrong number of missing feature IDs was being reported.

Internal Changes#
  • Drop support for WFS version 1.0.0 since it does not support paging.

  • Modify the codebase based on Refurb suggestions.

0.13.7 (2022-11-04)#

New Features#
  • Add a new method to RetrySession for getting the request head called RetrySession.head. This is useful for getting the headers of a request without making a full request, e.g., for getting the Content-Length header (i.e., download size).

Bug Fixes#
  • Fix an issue in the decompose function, utils.bbox_decompose, where the generated bounding boxes might overlap in some cases. A new approach has been implemented that finds the number of required bounding boxes from the maximum allowable number of pixels and the total requested pixels, without changing the input bounding box projection. This ensures that the decomposed bounding boxes do not overlap, so xarray.open_mfdataset can be used without any issues.

Internal Changes#
  • In the utils.match_crs function, don't perform any projection if the source and target CRS are the same.

  • Improve type hints for CRS-related arguments of all functions by including string, integer, and pyproj.CRS types.

  • Use pyupgrade package to update the type hinting annotations to Python 3.10 style.

  • Add a new class method to WMSBase and WFSBase classes called get_service_options for retrieving the available layers, output formats, and CRSs for a given service. Here’s an example:

from pygeoogc.core import WMSBase

url = "https://elevation.nationalmap.gov/arcgis/services/3DEPElevation/ImageServer/WMSServer"
wms = WMSBase(url, validation=False)
wms.get_service_options()
print(wms.available_layer)

0.13.6 (2022-08-30)#

Internal Changes#
  • Add the missing PyPi classifiers for the supported Python versions.

0.13.5 (2022-08-29)#

Breaking Changes#
  • Append “Error” to all exception classes for conforming to PEP-8 naming conventions.

Internal Changes#
  • Bump minimum version of owslib to 0.27.2 since the pyproj incompatibility issue has been addressed in this issue.

  • Bump minimum version of requests-cache to 0.9.6 since the attrs version issue has been addressed.

0.13.3 (2022-07-31)#

New Features#
  • Add support for disabling persistent caching in RetrySession via an argument and also the HYRIVER_CACHE_DISABLE environment variable.

0.13.2 (2022-06-14)#

Breaking Changes#
  • Set the minimum supported version of Python to 3.8 since many of the dependencies such as xarray, pandas, rioxarray have dropped support for Python 3.7.

  • Pin owslib to version <0.26 since version 0.26 has pinned pyproj to version <3.3 which is not compatible with rasterio on macOS.

Internal Changes#
  • Use micromamba for running tests and use nox for linting in CI.

0.13.1 (2022-06-11)#

New Features#
  • More robust handling of errors in ArcGISRESTful by catching None responses. Also, use the POST method for ArcGISRESTful.bysql since the SQL clause can be a long string.

0.13.0 (2022-04-03)#

Breaking Changes#
  • Remove caching-related arguments from all functions since they can now be set globally via three environment variables:

    • HYRIVER_CACHE_NAME: Path to the caching SQLite database.

    • HYRIVER_CACHE_EXPIRE: Expiration time for cached requests in seconds.

    • HYRIVER_CACHE_DISABLE: Disable reading/writing from/to the cache file.

    You can do this like so:

import os

# Path to the caching SQLite database
os.environ["HYRIVER_CACHE_NAME"] = "path/to/file.sqlite"
# Expiration time for cached requests in seconds
os.environ["HYRIVER_CACHE_EXPIRE"] = "3600"
# Disable reading/writing from/to the cache file
os.environ["HYRIVER_CACHE_DISABLE"] = "true"

Bug Fixes#
  • In ArcGISRESTful.oids_byfield convert the input ids to a list if a user passes a single id.

Internal Changes#
  • Refactor ServiceURL to hard-code the supported links instead of reading them from a file. Also, the class is now based on NamedTuple, which has a nicer __repr__.

0.12.2 (2022-01-15)#

New Features#
  • Make validate_crs public; it can be accessed from the utils module. This is useful for checking the validity of user-input CRS values and getting their string representations.

  • Add pygeoogc.utils.valid_wms_crs function for getting a list of valid CRS values from a WMS service.

  • Add 3DEP’s index WFS service for querying availability of 3DEP data within a bounding box.

Internal Changes#
  • Add type checking with typeguard and fix typing issues raised by typeguard.

  • Refactor show_versions to ensure getting correct versions of all dependencies.

0.12.1 (2021-12-31)#

Internal Changes#
  • Use the three new ar.retrieve_* functions instead of the old ar.retrieve function to improve type hinting and to make the API more consistent.

0.12.0 (2021-12-27)#

New Features#
  • Add a new argument to ArcGISRESTful called verbose to turn on/off all info level logs.

  • Add an option to ArcGISRESTful.get_features called get_geometry for requesting the data with or without geometry.

  • Now, ArcGISRESTful saves the object IDs of the features that the user requested but are not available in the database to ./cache/failed_request_ids.txt.

  • Add a new parameter to ArcGISRESTful called disable_retry. If True and there are any failed queries, no retry attempts are made, and the object IDs of the failed requests are saved to a text file whose path can be accessed via ArcGISRESTful.client.failed_path.

  • Set the response cache to never expire for all base classes. A new argument, expire_after, has been added to all three base classes for setting the expiration time.

  • Add a new method to all three base classes called clear_cache that clears all cached responses for that specific client.

Breaking Changes#
  • All oids_by* methods of ArcGISRESTful class now return a list of object IDs rather than setting self.featureids. This makes it possible to pass the outputs of the oids_by* functions directly to the get_features method.
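
As a sketch of the new workflow, where the service URL, layer number, and bounding box are placeholders for illustration:

from pygeoogc import ArcGISRESTful

service = ArcGISRESTful(
    "https://hydro.nationalmap.gov/arcgis/rest/services/wbd/MapServer", layer=1
)

# oids_bygeom now returns the object IDs, which can be fed straight to get_features
oids = service.oids_bygeom((-69.77, 45.07, -69.31, 45.45))
features = service.get_features(oids)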

Internal Changes#
  • Make ArcGISRESTful less cluttered by instantiating ArcGISRESTfulBase in the init method of ArcGISRESTful rather than inheriting from its base class.

  • Explicitly set a minimum value of 1 for the maximum number of feature IDs per request in ArcGISRESTful, i.e., self.max_nrecords.

  • Add all the missing types so mypy --strict passes.

0.11.7 (2021-11-09)#

Breaking Changes#
  • Remove the onlyipv4 method from RetrySession since it can easily be achieved using with unittest.mock.patch("socket.has_ipv6", False):.

Internal Changes#
  • Use the geoms method for iterating over geometries to address the deprecation warning of shapely.

  • Use importlib-metadata for getting the version instead of pkg_resources to decrease import time as discussed in this issue.

  • Remove unnecessary dependency on simplejson and use ujson instead.

0.11.5 (2021-09-09)#

Bug Fixes#
  • Update the code to use the latest requests-cache API.

0.11.4 (2021-08-26)#

0.11.3 (2021-08-21)#

Internal Changes#
  • Fix a bug in WFS.getfeature_byid when the number of IDs exceeds the service’s limit by splitting large requests into multiple smaller requests.

  • Add two new arguments, max_nrecords and read_method, to WFS to control the maximum number of records per request (defaults to 1000) and specify the response read method (defaults to json), respectively.

0.11.2 (2021-08-19)#

Internal Changes#
  • Simplify the retry logic of ArcGISRESTful by making it run four times and ensuring that the last retry requests one object ID per request.

0.11.1 (2021-07-31)#

The highlight of this release is migrating to AsyncRetriever, which can improve the network response time significantly. Another highlight is a major refactoring of ArcGISRESTful that improves performance and reduces code complexity.

New Features#
  • Add a new method to the ArcGISRESTful class for automatically retrying failed requests. When a multi-feature request includes object IDs that are not available on the server, the whole request fails even though it also contains valid object IDs. This private method plucks out the valid object IDs from such failed requests and retries them.

  • Add support for passing additional parameters to WMS requests such as styles.

  • Add support for WFS version 1.0.0.

Internal Changes#
  • Migrate to AsyncRetriever from requests-cache for all the web services.

  • Rename ServiceError to ServiceUnavailable and ServerError to ServiceError since the new names are more representative of the intended exceptions.

  • Raise for the response status in RetrySession before the try-except block so RequestsException can be raised and its error message parsed.

  • Deprecate utils.threading since all threading operations are now handled by AsyncRetriever.

  • Increase test coverage.

0.11.0 (2021-06-18)#

New Features#
  • Add support for requesting LineString geometries in ArcGISRESTful.

  • Add a new argument called distance to ArcGISRESTful.oids_bygeom for specifying the buffer distance from the input geometry for getting features.

Breaking Changes#
  • Drop support for Python 3.6 since many of the dependencies such as xarray and pandas have done so.

  • Remove async_requests function, since it has been packaged as a new Python library called AsyncRetriever.

  • Refactor MatchCRS. Now, it should be instantiated by providing the input and output CRSs like so: MatchCRS(in_crs, out_crs). Then its methods, namely geometry, bounds, and coords, can be called; these methods now have only one input, geometry (see the sketch after this list).

  • Change input and output types of MatchCRS.coords from tuple of lists of coordinates to list of (x, y) coordinates.

  • ArcGISRESTful now has a new argument, layer, for specifying the layer number (int). Now, the target layer should either be a part of base_url or be passed with layer argument.

  • Move the spatial_relation argument from ArcGISRESTful class to oids_bygeom method, since that’s where it’s applicable.
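
A minimal sketch of the refactored MatchCRS interface, assuming the class is importable from the package's top level:

from shapely.geometry import Point

from pygeoogc import MatchCRS

mc = MatchCRS("epsg:4326", "epsg:3857")

# Each method now takes a single geometry-like input
geom = mc.geometry(Point(-69.7, 45.07))
bbox = mc.bounds((-69.77, 45.07, -69.31, 45.45))
coords = mc.coords([(-69.77, 45.07), (-69.31, 45.45)])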

Internal Changes#
  • Refactor ArcGISRESTfulBase class to reduce its code complexity and make the service initialization logic much simpler. The class is faster since it makes fewer requests during the initialization process.

  • Add pydantic as a new dependency that takes care of ArcGISRESTfulBase validation.

  • Use persistent caching for all send/receive requests that can significantly improve the network response time.

  • Explicitly include all the hard dependencies in setup.cfg.

  • Set a default value of 1000 for max_nrecords in ArcGISRESTfulBase.

  • Use dataclass for WMSBase and WFSBase since support for Python 3.6 is dropped.

0.10.1 (2021-03-27)#

  • Add announcement regarding the new name for the software stack, HyRiver.

  • Improve pip installation and release workflow.

0.10.0 (2021-03-06)#

  • The first release after renaming hydrodata to PyGeoHydro.

  • Fix extent property of ArcGISRESTful being set to None incorrectly.

  • Add a feature types property to ArcGISRESTful for getting the names and IDs of the types of features in the database.

  • Replace cElementTree with ElementTree since it’s been deprecated by defusedxml.

  • Remove dependency on dataclasses since its benefits and usage in the code were minimal.

  • Speed up CI testing by using mamba and caching.

  • ArcGISRESTful now prints the number of found features before attempting to retrieve them.

  • Use logging module for printing information.

0.9.0 (2021-02-14)#

  • Bump version to the same version as PyGeoHydro.

  • Add support for query by point and multi-points to ArcGISRESTful.bygeom.

  • Add support for buffer distance to ArcGISRESTful.bygeom.

  • Add support for generating ESRI-based queries for points and multi-points to ESRIGeomQuery.

  • Add all the missing type annotations.

  • Update the Daymet URL to version 4. You can check the release information here.

  • Use cytoolz library for improving performance of some operations.

  • Add an extent property to the ArcGISRESTful class that gets the spatial extent of the service.

  • Add URL to airmap service for getting elevation data at 30 m resolution.

0.2.3 (2020-12-19)#

  • Fix the urllib3 deprecation warning about using method_whitelist.

0.2.2 (2020-12-05)#

  • Remove unused variables in async_requests and use max_workers.

  • Fix the async_requests issue on Windows systems.

0.2.0 (2020-12-06)#

  • Added/Renamed three class methods in ArcGISRESTful: oids_bygeom, oids_byfield, and oids_bysql. So you can query features within a geometry, using specific field ID(s), or, more generally, using any valid SQL-92 WHERE clause.

  • Added support for query with SQL WHERE clause to ArcGISRESTful.

  • Changed the NLDI’s URL for migrating to its new API v3.

  • Added support for CQL filter to WFS, credits to Emilio.

  • Moved all the web services URLs to a YAML file that ServiceURL class reads. It makes managing the new URLs easier. The file is located at pygeoogc/static/urls.yml.

  • Turned off threading by default for all the services since not all web services support it.

  • Added support for setting the request method, GET or POST, for WFS.byfilter, which could be useful when the filter string is long.

  • Added support for asynchronous download via the function async_requests.

0.1.10 (2020-08-18)#

  • Improved bbox_decompose to fix the WMS issue with high resolution requests.

  • Replaced simplejson with orjson to speed up JSON operations.

0.1.8 (2020-08-12)#

  • Removed threading for WMS due to inconsistent behavior.

  • Addressed an issue with domain decomposition for WMS where width/height becomes 0.

0.1.7 (2020-08-11)#

  • Renamed vsplit_bbox to bbox_decompose. The function now decomposes the domain in both directions and returns squares and rectangles.

0.1.5 (2020-07-23)#

  • Re-wrote wms_bybox function as a class called WMS with a similar interface to the WFS class.

  • Added support for WMS 1.3.0 and WFS 2.0.0.

  • Added a custom Exception for the threading function called ThreadingException.

  • Added an always_xy flag to WMS and WFS, which is False by default. It is useful for cases where a web service doesn't change the axis order from the traditional xy to yx for versions higher than 1.3.0.

0.1.3 (2020-07-21)#

  • Remove unnecessary transformation of the input bbox in WFS.

  • Use setuptools_scm for versioning.

0.1.2 (2020-07-16)#

  • Add the missing max_pixel argument to the wms_bybox function.

  • Change the onlyIPv4 method of RetrySession class to onlyipv4 to conform to the snake_case convention.

  • Improve docstrings.

0.1.1 (2020-07-15)#

  • Initial release.

History#

0.16.1 (2024-01-15)#

Bug Fixes#
  • pyproj uses its own environment variables for SSL certification. This release fixes an issue where pyproj could not download its grid database when a DOI SSL certification file was used. The fix uses pyproj.network.set_ca_bundle_path to set the SSL certification file given by the user via the HYRIVER_SSL_CERT environment variable (see the sketch after this list).

  • Ignore FutureWarning of pandas 2.1.0 for all-NaN columns in json2geodf.
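
A minimal sketch of using this environment variable; the certificate path is a placeholder, and setting it before importing any HyRiver package is an assumption to be safe:

import os

# Hypothetical path to a custom SSL certificate bundle
os.environ["HYRIVER_SSL_CERT"] = "path/to/doi_cert.pem"

import pygeoutils  # noqa: E402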

Internal Changes#
  • For the Attrs class, use dataclass instead for better performance and consistency.

0.16.0 (2024-01-03)#

Breaking Changes#
  • Refactor the spline generation functions to make them more efficient, more accurate, and more robust. Switch to using UnivariateSpline from scipy instead of BSpline, which allows for more control over the smoothness of the spline via the smooth parameter. References to BSpline have been removed from the functions and new functionality has been added. The new spline generation functions are GeoSpline, make_spline, spline_linestring, smooth_linestring, spline_curvature, and line_curvature. The smooth_linestring function now returns a LineString instead of a Spline object; it is intended for smoothing a LineString when curvature, radius of curvature, and tangent angles are not needed. The spline_linestring function now returns a Spline object that contains the smoothed LineString along with its curvature, radius of curvature, and tangent angles. Also, the line_curvature function can be used to compute the curvature, radius of curvature, and tangent angles of a LineString at all points along it.
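
A minimal sketch of the new functions; the default arguments and the exact return structure of line_curvature are assumptions:

from shapely import LineString

import pygeoutils as geoutils

line = LineString([(0, 0), (1, 2), (3, 1), (4, 3)])

# Smooth a LineString directly; no Spline object is involved
smoothed = geoutils.smooth_linestring(line)

# Curvature, radius of curvature, and tangent angles along the line
curvature_info = geoutils.line_curvature(line)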

New Features#
  • Add a new function called gtiff2vrt for creating a VRT file from a list of GeoTiff files. Note that this new function requires gdal to be installed (see the sketch after this list).

  • The xd_write_crs function now keeps spatial_ref attribute of the input xarray.DataArray or xarray.Dataset to retain CF compliance.
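
A minimal sketch of the new function; the file names and the exact call signature are assumptions:

import pygeoutils as geoutils

# Combine hypothetical GeoTiff tiles into a single VRT mosaic (requires gdal)
geoutils.gtiff2vrt(["dem_tile_1.tif", "dem_tile_2.tif"], "dem_mosaic.vrt")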

0.15.2 (2023-09-22)#

New Features#
  • Add a geometry_reproject function for reprojecting a geometry (bounding box, list of coordinates, or any shapely.geometry) to a new CRS (see the sketch after this list).

  • Add smooth_linestring function for smoothing a LineString using B-splines.

  • Make the make_bspline and bspline_curvature functions public. The make_bspline function uses scipy to generate a BSpline object, and the bspline_curvature function calculates the tangent angles, curvature, and radius of curvature of a B-spline at any point along it.

  • Improve the accuracy and performance of B-spline generation functions.
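
A minimal sketch of geometry_reproject and smooth_linestring; the CRS values are illustrative and the default smoothing arguments are assumed:

from shapely import LineString

import pygeoutils as geoutils

line = LineString([(-69.77, 45.07), (-69.31, 45.45)])

# Reproject from EPSG:4326 to a projected CRS, then smooth with B-splines
line_proj = geoutils.geometry_reproject(line, 4326, 5070)
line_smooth = geoutils.smooth_linestring(line_proj)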

Internal Changes#
  • Remove dependency on dask.

0.15.0 (2023-05-07)#

From release 0.15 onward, all minor versions of HyRiver packages will be pinned. This ensures that previous minor versions of HyRiver packages cannot be installed with later minor releases. For example, if you have py3dep==0.14.x installed, you cannot install pydaymet==0.15.x. This is to ensure that the API is consistent across all minor versions.

New Features#
  • For now, retain compatibility with shapely<2 while supporting shapely>=2.

0.14.0 (2023-03-05)#

New Features#
  • Ignore the index when concatenating multiple responses in json2geodf to ensure that indices are unique.

  • Add a new function, called coords_list, for converting/validating input coordinates of any type to a list of tuples, i.e., [(x1, y1), (x2, y2), ...] (see the sketch after this list).

  • Make the xd_write_crs function public.

  • In xarray_geomask, if the input geometry is very small, return at least one pixel.

  • Add a new function, called multi2poly, for converting a MultiPolygon to a Polygon in a GeoDataFrame. The function first checks whether the MultiPolygon can be converted directly using its exterior boundaries. If not, it tries to remove small sub-Polygons whose area is less than 1% of the total area of the MultiPolygon. If this fails, the original MultiPolygon is returned.
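
A minimal sketch of coords_list and multi2poly; the input file is hypothetical:

import geopandas as gpd

import pygeoutils as geoutils

# coords_list normalizes a single (x, y) pair or a list of pairs to [(x1, y1), ...]
pts = geoutils.coords_list((-69.77, 45.07))

# multi2poly attempts the MultiPolygon-to-Polygon conversion described above
gdf = gpd.read_file("basins.gpkg")
gdf_poly = geoutils.multi2poly(gdf)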

Breaking Changes#
  • Bump the minimum required version of shapely to 2.0, and use its new API.

Internal Changes#
  • Sync all minor versions of HyRiver packages to 0.14.0.

0.13.12 (2023-02-10)#

Breaking Changes#
  • The input GeoDataFrame to break_lines now should be in a projected CRS.

New Features#
  • Significant improvements in the accuracy and performance of nested_polygons by changing the logic. Now, the function first identifies nested polygons by checking whether the centroids of the geometries fall within other geometries and then picks the largest geometry from each group of nested geometries.

  • Add a new function called query_indices, which is a wrapper around geopandas.sindex.query_bulk. However, instead of returning an array of positional indices, it returns a dictionary of indices where the keys are the indices of the input geometries and the values are lists of indices of the tree geometries that intersect with each input geometry.

Internal Changes#
  • Simplify geo2polygon by making the two CRS arguments optional and only reproject if CRS values are given and different.

  • Apply the geometry mask in gtiff2xarray even if the input geometry is a bounding box since the mask might not be the same geometry as the one that was used during data query.

  • Fully migrate setup.cfg and setup.py to pyproject.toml.

  • Convert relative imports to absolute with absolufy-imports.

  • Sync all patch versions of HyRiver packages to x.x.12.

0.13.11 (2023-01-08)#

Bug Fixes#
  • Fix an issue in xarray_geomask where, for geometries smaller than a single pixel, the bbox clipping operation fails. This is fixed by using the auto_expand option of rioxarray's clip_box.

0.13.10 (2022-12-09)#

New Features#
  • Add a new function called nested_polygons for determining nested (Multi)Polygons in a geopandas.GeoDataFrame or geopandas.GeoSeries.

  • Add a new function called geodf2xarray for rasterizing a geopandas.GeoDataFrame to a xarray.DataArray.

Internal Changes#
  • Modify the codebase based on Refurb suggestions.

  • In xarray_geomask, if drop=True, recalculate the transform to ensure the correct geo references are set if the shape of the dataset changes.

0.13.8 (2022-12-09)#

Internal Changes#
  • Improve the performance of xarray_geomask significantly by first clipping the data to the geometry’s bounding box, then if the geometry is a polygon, masking the data with the polygon. This is much faster than directly masking the data with the polygon. Also, support passing a bounding box to xarray_geomask in addition to polygon and MultiPolygon.

  • Fix a deprecation warning of pandas when changing the geometry column of a GeoDataFrame in the break_lines function.

0.13.7 (2022-11-04)#

Internal Changes#
  • When combining the responses, now dask handles data chunking more efficiently. This is especially important for handling large responses from WMS services.

  • Improve type hints for CRS-related arguments of all functions by including string, integer, and pyproj.CRS types.

  • In gtiff2xarray use rasterio engine to make sure all rioxarray attrs are read.

0.13.6 (2022-08-30)#

Internal Changes#
  • Add the missing PyPi classifiers for the supported Python versions.

0.13.5 (2022-08-29)#

Breaking Changes#
  • Append “Error” to all exception classes for conforming to PEP-8 naming conventions.

0.13.2 (2022-06-14)#

Breaking Changes#
  • Set the minimum supported version of Python to 3.8 since many of the dependencies such as xarray, pandas, rioxarray have dropped support for Python 3.7.

  • Bump the minimum version of rioxarray to 0.10 since it adds support for reading/writing GCPs.

Internal Changes#
  • Use micromamba for running tests and use nox for linting in CI.

0.13.1 (2022-06-11)#

New Features#
  • Add support for passing a custom bounding box to the Coordinates class. The default is the bounds of EPSG:4326, to retain backward compatibility. This new class parameter allows a user to check whether a list of coordinates falls within a custom bounding box. The bounds should be in the EPSG:4326 coordinate system (see the sketch after this list).

  • Add a new function called geometry_list for converting a list of multi-geometries to a list of geometries.
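
A minimal sketch of the new bounds parameter; the constructor form is an assumption based on the description above:

from pygeoutils import Coordinates

# Keep only points that fall within a custom (CONUS-like) bounding box
c = Coordinates([-100.0, -200.0, 10.0], [40.0, 45.0, 42.0], bounds=(-125.0, 24.5, -66.9, 49.4))
valid_points = c.points  # geopandas.GeoSeries of the valid coordinates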

0.13.0 (2022-03-03)#

Internal Changes#
  • Write the nodata attribute using rioxarray in gtiff2xarray since the clipping operation of rioxarray uses this value as the fill value.

Bug Fixes#
  • In the break_lines function, convert MultiLineString into LineString since shapely.ops.substring only accepts LineString.

0.12.3 (2022-02-04)#

New Features#
  • Add a function called break_lines for breaking lines at given points.

  • Add a function called snap2nearest for snapping points to the nearest point on a line with a given tolerance. It accepts a geopandas.GeoSeries of points and a geopandas.GeoSeries or geopandas.GeoDataFrame of lines. It automatically snaps to the closest lines in the input data.
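
A minimal sketch of the two functions; the input files are hypothetical and the argument order is an assumption:

import geopandas as gpd

import pygeoutils as geoutils

# Both layers are assumed to share a projected CRS
flowlines = gpd.read_file("flowlines.gpkg")
stations = gpd.read_file("stations.gpkg").geometry

snapped = geoutils.snap2nearest(flowlines, stations)
segments = geoutils.break_lines(flowlines, snapped)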

0.12.2 (2022-01-15)#

New Features#
  • Add a new class called GeoBSpline that generates B-splines from a set of coordinates. The spline attribute of this class has five attributes: x and y coordinates; phi and radius, which are the curvature and radius of curvature, respectively; and distance, which is the total distance of each point along the B-spline from the starting point (see the sketch after this list).

  • Add a new class called Coordinates that validates a set of lon/lat coordinates. It normalizes longitudes to the range [-180, 180) and has a points property that is a geopandas.GeoSeries with the validated coordinates. It uses spatial indexing to speed up the validation and should be able to handle large datasets efficiently.

  • Make transform2tuple a public function.
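
A minimal sketch of GeoBSpline; the positional arguments (points and number of output points) are assumptions:

import geopandas as gpd
from shapely.geometry import Point

from pygeoutils import GeoBSpline

# A few points in a projected CRS (EPSG:5070 here, for illustration)
pts = gpd.GeoSeries([Point(xy) for xy in [(0, 0), (1, 2), (3, 1), (4, 3)]], crs=5070)

sp = GeoBSpline(pts, 50).spline  # 50 points along the B-spline
print(sp.x[:3], sp.y[:3])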

Internal Changes#
  • The geometry and geo_crs arguments of gtiff2xarray are now optional. This is useful for cases when the input GeoTiff response is the result of a bounding-box query and there is no need for a geometry mask.

  • Replace the missing values after adding the geometry mask (via xarray_geomask) with the nodatavals attribute of the input xarray.DataArray or xarray.Dataset. Therefore, the data type of the input xarray.DataArray or xarray.Dataset is preserved.

  • Expose connectivity argument of rasterio.features.shapes function in xarray2geodf function.

  • Move all private functions to a new module to make the main module less cluttered.

0.12.1 (2021-12-31)#

Internal Changes#
  • Refactor arcgis2geojson for better readability and maintainability.

  • In arcgis2geojson set the geometry to null if its type is not supported, such as curved polylines.

0.12.0 (2021-12-27)#

Internal Changes#
  • Add all the missing types so mypy --strict passes.

  • Bump version to 0.12.0 to match the release of pygeoogc.

0.11.7 (2021-11-09)#

Internal Changes#
  • Use rioxarray for dealing with GeoTIFF binaries since xarray deprecated the xarray.open_rasterio function, as it’s discussed in this PR.

  • Use importlib-metadata for getting the version instead of pkg_resources to decrease import time as discussed in this issue.

0.11.6 (2021-10-06)#

New Features#
  • Add a new function, xarray2geodf, to convert a xarray.DataArray to a geopandas.GeoDataFrame.
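
A minimal sketch of xarray2geodf; the raster values, CRS handling, and the dtype argument are illustrative assumptions:

import numpy as np
import rioxarray  # noqa: F401, activates the .rio accessor
import xarray as xr

import pygeoutils as geoutils

# A tiny labeled raster with a CRS written via rioxarray
da = xr.DataArray(
    np.array([[1, 1, 0], [0, 1, 0], [0, 0, 2]], dtype="int32"),
    dims=("y", "x"),
    coords={"y": [2.5, 1.5, 0.5], "x": [0.5, 1.5, 2.5]},
    name="labels",
).rio.write_crs(5070)

gdf = geoutils.xarray2geodf(da, "int32")  # one polygon per contiguous label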

0.11.5 (2021-09-09)#

Bug Fixes#
  • Fix an issue with gtiff2xarray where the scales and offsets attributes of the output DataArray were floats rather than tuples (GH 30).

Internal Changes#
  • Add a new function, transform2tuple, for converting Affine transforms to a tuple. Previously, the Affine transform was converted to a tuple using to_gdal() method of rasterio.Affine which was not compatible with rioxarray.

0.11.4 (2021-08-26)#

Internal Changes#
  • Use ujson for JSON parsing instead of orjson since orjson only serializes to bytes which is not compatible with aiohttp.

  • Convert the transform attribute data type from Affine to tuple since saving a data array to netcdf cannot handle the Affine type.

0.11.3 (2021-08-19)#

  • Fix an issue in gtiff2xarray related to saving an xarray object to NetCDF when its transform attribute has Affine type rather than a tuple.

0.11.2 (2021-07-31)#

The highlight of this release is performance improvement in gtiff2xarray for handling large responses.

New Features#
  • Automatic detection of the driver by default in gtiff2xarray as opposed to it being GTiff.

Internal Changes#
  • Make geo2polygon, get_transform, and get_nodata_crs public functions since other packages use them.

  • Make xarray_mask a public function and simplify gtiff2xarray.

  • Remove MatchCRS since it’s already available in pygeoogc.

  • Validate input geometry in geo2polygon.

  • Refactor gtiff2xarray to check for the ds_dims outside the main loops to improve performance. Also, the function tries to detect the dimension names automatically if ds_dims is not explicitly provided by the user.

  • Improve performance of json2geodf by using list comprehension and performing checks outside the main loop.

Bug Fixes#
  • Add the missing arguments for masking the data in gtiff2xarray.

0.11.1 (2021-06-19)#

Bug Fixes#
  • In some edge cases the y-coordinates of a response might not be monotonically sorted so dask fails. This release sorts them to address this issue.

0.11.0 (2021-06-19)#

New Features#
  • Function gtiff2xarray returns a parallelized xarray.Dataset or xarray.DataArray that can handle large responses much more efficiently. This is achieved using dask.

Breaking Changes#
  • Drop support for Python 3.6 since many of the dependencies such as xarray and pandas have done so.

  • Refactor MatchCRS. Now, it should be instantiated by providing the in and out CRSs like so: MatchCRS(in_crs, out_crs). Then its methods, namely, geometry, bounds and coords, can be called. These methods now have only one input, geometry.

  • Change input and output types of MatchCRS.coords from tuple of lists of coordinates to list of (x, y) coordinates.

  • Remove xarray_mask and gtiff2file since rioxarray is more general and suitable.

Internal Changes#
  • Remove unnecessary type checks for private functions.

  • Refactor json2geodf to improve robustness. Use get method of dict for checking key availability.

0.10.1 (2021-03-27)#

  • Set the transform of the merged dataset explicitly (GH 3).

  • Add announcement regarding the new name for the software stack, HyRiver.

  • Improve pip installation and release workflow.

0.10.0 (2021-03-06)#

  • The first release after renaming hydrodata to PyGeoHydro.

  • Address GH 1 by sorting y coordinate after merge.

  • Make mypy checks stricter and fix all the errors to prevent possible bugs.

  • Speed up CI testing by using mamba and caching.

0.9.0 (2021-02-14)#

  • Bump version to the same version as PyGeoHydro.

  • Add gtiff2file for saving raster responses as geotiff file(s).

  • Fix an error in _get_nodata_crs for handling no data value when its value in the source is None.

  • Fix the warning during the GeoDataFrame generation in json2geodf when there is no geometry column in the input JSON.

0.2.0 (2020-12-06)#

  • Added validity checks for the input arguments of the gtiff2xarray function and provided useful messages for debugging.

  • Added support for MultiPolygon.

  • Removed the fill_hole argument.

  • Fixed a bug in xarray_geomask for getting the transform.

0.1.10 (2020-08-18)#

  • Fixed the gtiff2xarray issue with high resolution requests and improved robustness of the function.

  • Replaced simplejson with orjson to speed up JSON operations.

0.1.9 (2020-08-11)#

  • Modified gtiff2xarray to reflect the latest changes in pygeoogc 0.1.7.

0.1.8 (2020-08-03)#

  • Retained the compatibility with xarray 0.15 by removing the attrs flag.

  • Added the xarray_geomask function and made it public.

  • More efficient handling of large GeoTiff responses by cropping the response before converting it into a dataset.

  • Added a new function called geo2polygon for converting and transforming a polygon or bounding box into a Shapely Polygon in the target CRS.

0.1.6 (2020-07-23)#

  • Fixed the issue with flipped mask in WMS.

  • Removed drop_duplicates since it may cause issues in some instances.

0.1.4 (2020-07-22)#

  • Refactored gtiff2xarray and added support for WMS 1.3.0 and WFS 2.0.0.

  • Added the MatchCRS class.

  • Removed the dependency on PyGeoOGC.

  • Increased test coverage.

0.1.3 (2020-07-21)#

  • Remove duplicate rows before returning the dataframe in the json2geodf function.

  • Add the missing dependency.

0.1.0 (2020-07-21)#

  • First release on PyPI.

Contributing#

Contributions are welcome, and they are greatly appreciated! Every little bit helps, and credit will always be given.

You can contribute in many ways to any of the packages that are included in the HyRiver project. The workflow is the same for all packages. On this page, a contribution workflow for PyGridMET is explained; you can easily adapt it to other packages by replacing pygridmet with the name of the package that you want to contribute to.

Types of Contributions#

Report Bugs#

Report bugs at hyriver/pygridmet#issues.

Fix Bugs#

Look through the GitHub issues for bugs. Anything tagged with “bug” and “help wanted” is open to whoever wants to implement it.

Implement Features#

Other than new features that you might have in mind, you can look through the GitHub issues for features. Anything tagged with “enhancement” and “help wanted” is open to whoever wants to implement it.

Write Documentation#

PyGridMET could always use more documentation, whether as part of the official PyGridMET docs, in docstrings, or even on the web in blog posts, articles, and such.

Submit Feedback#

The best way to send feedback is to file an issue at hyriver/pygridmet#issues.

If you are proposing a feature:

  • Explain in detail how it would work.

  • Keep the scope as narrow as possible, to make it easier to implement.

  • Remember that this is a volunteer-driven project, and that contributions are welcome :)

Get Started!#

Ready to contribute? Here’s how to set up pygridmet for local development.

  1. Fork the PyGridMET repo through the GitHub website.

  2. Clone your fork locally and add the main pygridmet as the upstream remote:

$ git clone git@github.com:your_name_here/pygridmet.git
$ git remote add upstream git@github.com:hyriver/pygridmet.git
  3. Install your local copy into a virtualenv. Assuming you have mamba installed, this is how you can set up your fork for local development:

$ cd pygridmet/
$ mamba env create -f ci/requirements/environment-dev.yml
$ mamba activate pygridmet-dev
$ python -m pip install . --no-deps
  4. Create a branch for local development:

$ git checkout -b bugfix-or-feature/name-of-your-bugfix-or-feature
$ git push -u origin bugfix-or-feature/name-of-your-bugfix-or-feature
  5. Now you can make your changes locally. Make sure to add a description of the changes to the HISTORY.rst file and add extra tests, if applicable, to the tests folder. Also, make sure to give yourself credit by adding your name at the end of the item(s) that you add in the history, like this: By `Taher Chegini <https://github.com/hyriver>`_. Then, fetch the latest updates from the remote and resolve any merge conflicts:

$ git fetch upstream
$ git merge upstream/main
  6. Then create a new environment for linting and another for testing:

 $ mamba create -n py11 python=3.11 nox tomli pre-commit codespell gdal
 $ mamba activate py11
 $ nox -s pre-commit
 $ nox -s type-check

 $ mamba create -n py38 python=3.8 nox tomli pre-commit codespell gdal
 $ mamba activate py38
 $ nox -s tests

Note that if Python 3.11 is already installed on your system, you can skip creating the py11 environment and just use your system's Python 3.11 to run the linting and type-checking sessions. In that case, you only need the py38 environment for running the tests:

$ mamba create -n py38 python=3.8 nox tomli pre-commit codespell gdal
$ mamba activate py38
$ nox -s tests
  7. If you are making breaking changes, make sure to reflect them in the documentation, README.rst, and tests if necessary.

  8. Commit your changes and push your branch to GitHub. Start the commit message with ENH:, BUG:, or DOC: to indicate whether the commit is a new feature, a bug fix, or documentation-related. For example:

$ git add .
$ git commit -m "ENH: A detailed description of your changes."
$ git push origin name-of-your-branch
  9. Submit a pull request through the GitHub website.

Tips#

To run a subset of tests:

$ nox -s tests -- -n=1 -k "test_name1 or test_name2"

Deploying#

A reminder for the maintainers on how to deploy. Make sure all your changes are committed (including an entry in HISTORY.rst). Then run:

$ git tag -a vX.X.X -m "vX.X.X"
$ git push --follow-tags

where X.X.X is the version number following the semantic versioning spec, i.e., MAJOR.MINOR.PATCH. Then release the tag from GitHub, and GitHub Actions will deploy it to PyPI.

Development Team#

Lead#

Contributors#

License#

MIT License

Copyright (c) 2020, Taher Chegini

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.