Hydroclimate Data Retriever#


HyRiver (formerly named hydrodata) is a suite of Python packages that provides a unified API for retrieving geospatial/temporal data from various web services. HyRiver includes two categories of packages:

  • Low-level APIs for accessing any of the supported web services, i.e., ArcGIS RESTful, WMS, and WFS.

  • High-level APIs for accessing some of the most commonly used datasets in hydrology and climatology studies. Currently, this project only includes hydrology and climatology data within the US.


Getting Started#

Why HyRiver?#

Some of the major capabilities of HyRiver are as follows:

  • Easy access to many web services for subsetting data on the server side and returning the requests as masked Datasets or GeoDataFrames.

  • Splitting large requests into smaller chunks under the hood, since web services often limit the number of features per request. The only bottleneck for subsetting the data is thus your local machine's memory.

  • Navigating and subsetting the NHDPlus database (both medium- and high-resolution) using web services.

  • Cleaning up the vector NHDPlus data, fixing some common issues, and computing vector-based accumulation through a river network.

  • A URL inventory for some of the popular (and tested) web services.

  • Some utilities for manipulating and visualizing the obtained data.

Installation#

You can install all the packages using pip:

$ pip install py3dep pynhd pygeohydro pydaymet pygeoogc pygeoutils async_retriever

Please note that installation with pip fails if libgdal is not installed on your system; you should install it manually beforehand. For example, on Ubuntu-based distros the required package is libgdal-dev. If it is installed, running gdal-config --version should succeed.

Alternatively, you can use conda or mamba (recommended) to install these packages from the conda-forge repository:

$ conda install -c conda-forge py3dep pynhd pygeohydro pydaymet pygeoogc pygeoutils async_retriever

or:

$ mamba install -c conda-forge --strict-channel-priority py3dep pynhd pygeohydro pydaymet pygeoogc pygeoutils async_retriever

Dependencies#

Each package has its own set of dependencies:

PyNHD: async_retriever, cytoolz, geopandas, networkx, numpy, pandas, pyarrow, pygeoogc, pygeoutils, requests, shapely, simplejson

PyGeoHydro: async_retriever, defusedxml, folium, geopandas, lxml, matplotlib, numpy, openpyxl, pandas, pygeoogc, pygeoutils, pynhd, rasterio, shapely

Py3DEP: async_retriever, click, cytoolz, numpy, pydantic, pygeoogc, pygeoutils, rasterio, scipy, shapely, xarray

PyDaymet: async_retriever, click, dask, lxml, numpy, pandas, py3dep, pygeoogc, pygeoutils, rasterio, scipy, shapely, xarray

PyGeoOGC: async_retriever, cytoolz, defusedxml, owslib, pydantic, pyproj, pyyaml, requests, shapely, simplejson, urllib3

PyGeoUtils: affine, dask, geopandas, netcdf4, numpy, orjson, pygeoogc, pyproj, rasterio, shapely, xarray

AsyncRetriever: aiohttp-client-cache, aiohttp[speedups], aiosqlite, cytoolz, nest-asyncio, orjson

Additionally, you can install bottleneck, pygeos, and rtree to improve the performance of xarray and geopandas. For handling vector and raster data projections, cartopy and rioxarray are useful.

Software Stack#

A detailed description of each component of the HyRiver software stack.

PyNHD: Navigate and subset NHDPlus database#


Features#

PyNHD is part of the HyRiver software stack that is designed to aid in hydroclimate analysis through web services.

This package provides access to several hydro-linked datasets, including WaterData, The National Map's NHDPlus MR and HR, NLDI, PyGeoAPI, and GeoConnex.

These web services can be used to navigate and extract vector data from the NHDPlus V2 (both mid- and high-resolution) database, such as catchments, HUC8, HUC12, GagesII, flowlines, and water bodies. Moreover, PyNHD gives access to a ScienceBase item called Select Attributes for NHDPlus Version 2.1 Reach Catchments and Modified Network Routed Upstream Watersheds for the Conterminous United States. This item provides over 30 attributes at the catchment scale based on NHDPlus ComIDs. These attributes are available in three categories:

  1. Local (local): For individual reach catchments,

  2. Total (upstream_acc): For network-accumulated values using total cumulative drainage area,

  3. Divergence (div_routing): For network-accumulated values using divergence routing.

A list of these attributes for each characteristic type can be accessed using the nhdplus_attrs function.
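
For instance, here is a minimal sketch of looking up these characteristics (the description column name in the returned table is an assumption for illustration):

import pynhd

# metadata table of the available catchment-scale characteristics
meta = pynhd.nhdplus_attrs()
# e.g., find recharge-related characteristics such as CAT_RECHG
recharge = meta[meta.description.str.contains("recharge", case=False)]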

Moreover, the PyGeoAPI service provides four functionalities:

  1. flow_trace: Trace flow from a starting point to up/downstream direction.

  2. split_catchment: Split the local catchment of a point of interest at the point’s location.

  3. elevation_profile: Extract elevation profile along a flow path between two points.

  4. cross_section: Extract cross-section at a point of interest along a flow line.

Similarly, PyNHD provides access to the ComID-linked NHDPlus Value Added Attributes on Hydroshare. This dataset includes slope and roughness, among other attributes, for all flowlines. You can use the nhdplus_vaa function to get this dataset.

Additionally, PyNHD offers some extra utilities for processing the flowlines:

  • flowline_xsection and network_xsection: Get cross-section lines at a given spacing along a flowline or a network of flowlines.

  • flowline_resample and network_resample: Resample a flowline or a network of flowlines at a given spacing. This is useful for smoothing jagged flowlines like those in the NHDPlus database.

  • prepare_nhdplus: Clean up the data frame by, for example, removing tiny networks, adding a to_comid column, and finding terminal flowlines if they don't exist.

  • topological_sort: Sort the river network topologically, which is useful for routing and flow accumulation.

  • vector_accumulation: Compute flow accumulation in a river network. This function is generic, and any routing method can be plugged in.

These utilities are developed based on an R package called nhdplusTools and a Python package called nldi-xstool.

All functions and classes that request data from web services use async_retriever, which offers response caching. These functions and classes have two optional parameters for controlling the cache: expire_after and disable_caching. You can use expire_after to set the expiration time in seconds; if it is set to -1 (the default), the cache never expires. You can use disable_caching if you don't want to use cached responses. The cached responses are stored in the ./cache/aiohttp_cache.sqlite file.

You can find some example notebooks here.

Moreover, under the hood, PyNHD uses AsyncRetriever for making requests asynchronously with persistent caching. This improves the reliability and speed of data retrieval significantly. AsyncRetriever caches all request/response pairs and upon making an already cached request, it will retrieve the responses from the cache if the server’s response is unchanged.

You can control the request/response caching behavior by setting the following environment variables:

  • HYRIVER_CACHE_NAME: Path to the caching SQLite database. It defaults to ./cache/aiohttp_cache.sqlite

  • HYRIVER_CACHE_EXPIRE: Expiration time for cached requests in seconds. It defaults to -1 (never expire).

  • HYRIVER_CACHE_DISABLE: Disable reading/writing from/to the cache. The default is false.

For example, in your code before making any requests you can do:

import os

os.environ["HYRIVER_CACHE_NAME"] = "path/to/file.sqlite"
os.environ["HYRIVER_CACHE_EXPIRE"] = "3600"
os.environ["HYRIVER_CACHE_DISABLE"] = "true"

You can also try using PyNHD without installing it on your system by clicking on the binder badge. A Jupyter Lab instance with the HyRiver stack pre-installed will be launched in your web browser, and you can start coding!

Please note that since this project is in the early development stages, while the provided functionalities should be stable, changes in APIs are possible in new releases. But we appreciate it if you give this project a try and provide feedback. Contributions are most welcome.

Moreover, requests for additional functionalities can be submitted via issue tracker.

Installation#

You can install PyNHD using pip after installing libgdal on your system (for example, in Ubuntu run sudo apt install libgdal-dev):

$ pip install pynhd

Alternatively, PyNHD can be installed from the conda-forge repository using Conda or Mamba:

$ conda install -c conda-forge pynhd

Quick start#

Let’s explore the capabilities of NLDI. We need to instantiate the class first:

import geopandas as gpd
import numpy as np
import pandas as pd

from pynhd import NLDI, WaterData, NHDPlusHR, GeoConnex, PyGeoAPI
import pynhd as nhd

First, let’s get the watershed geometry of the contributing basin of a USGS station using NLDI:

nldi = NLDI()
station_id = "01031500"

basin = nldi.get_basins(station_id)

The navigate_byid class method can be used to navigate NHDPlus both upstream and downstream of any point in the database. Let's get the ComIDs and flowlines of the tributaries and the main river channel upstream of the station.

flw_main = nldi.navigate_byid(
    fsource="nwissite",
    fid=f"USGS-{station_id}",
    navigation="upstreamMain",
    source="flowlines",
    distance=1000,
)

flw_trib = nldi.navigate_byid(
    fsource="nwissite",
    fid=f"USGS-{station_id}",
    navigation="upstreamTributaries",
    source="flowlines",
    distance=1000,
)

We can get other USGS stations upstream (or downstream) of the station and even set a distance limit (in km):

st_all = nldi.navigate_byid(
    fsource="nwissite",
    fid=f"USGS-{station_id}",
    navigation="upstreamTributaries",
    source="nwissite",
    distance=1000,
)

st_d20 = nldi.navigate_byid(
    fsource="nwissite",
    fid=f"USGS-{station_id}",
    navigation="upstreamTributaries",
    source="nwissite",
    distance=20,
)

We can get more information about these stations using GeoConnex:

gcx = GeoConnex("gages")
stations = st_all.identifier.str.split("-").str[1].unique()
gages = gpd.GeoDataFrame(
    pd.concat(gcx.query({"provider_id": sid}) for sid in stations),
    crs="epsg:4326",
)

Instead, we can carry out a spatial query within the basin of interest:

gages = nhd.geoconnex(
    item="gages",
    query={"geometry": basin.geometry.iloc[0]},
)

Now, let’s get the HUC12 pour points:

pp = nldi.navigate_byid(
    fsource="nwissite",
    fid=f"USGS-{station_id}",
    navigation="upstreamTributaries",
    source="huc12pp",
    distance=1000,
)
Figure: https://raw.githubusercontent.com/cheginit/HyRiver-examples/main/notebooks/_static/nhdplus_navigation.png

Also, we can get the slope data for each river segment from the NHDPlus VAA database:

vaa = nhd.nhdplus_vaa("input_data/nhdplus_vaa.parquet")

flw_trib["comid"] = pd.to_numeric(flw_trib.nhdplus_comid)
slope = gpd.GeoDataFrame(
    pd.merge(flw_trib, vaa[["comid", "slope"]], left_on="comid", right_on="comid"),
    crs=flw_trib.crs,
)
slope.loc[slope.slope < 0, "slope"] = np.nan  # mask invalid (negative) slopes

Additionally, we can obtain cross-section lines along the main river channel at 4-km spacing with a width of 2 km, using network_xsection as follows:

from pynhd import NHD

distance = 4000  # in meters
width = 2000  # in meters
nhd_mr = NHD("flowline_mr")  # renamed so it doesn't shadow the pynhd alias
main_nhd = nhd_mr.byids("COMID", flw_main.index)
main_nhd = nhd.prepare_nhdplus(main_nhd, 0, 0, 0, purge_non_dendritic=True)
main_nhd = main_nhd.to_crs("ESRI:102003")
cs = nhd.network_xsection(main_nhd, distance, width)

Then, we can use Py3DEP to obtain the elevation profile along the cross-section lines.

Now, let's explore the PyGeoAPI capabilities. There are two ways to access PyGeoAPI: the PyGeoAPI class and the pygeoapi function. The PyGeoAPI class queries the database for a single location using tuples and lists, while the pygeoapi function queries multiple locations at once and accepts a geopandas.GeoDataFrame as input. The pygeoapi function is more efficient than the PyGeoAPI class and has a simpler interface. In future versions, the PyGeoAPI class will be deprecated, and the pygeoapi function will be the only way to access the service. Let's compare the two, starting with PyGeoAPI:

pygeoapi = PyGeoAPI()

trace = pygeoapi.flow_trace(
    (1774209.63, 856381.68), crs="ESRI:102003", direction="none"
)

split = pygeoapi.split_catchment((-73.82705, 43.29139), crs="epsg:4326", upstream=False)

profile = pygeoapi.elevation_profile(
    [(-103.801086, 40.26772), (-103.80097, 40.270568)],
    numpts=101,
    dem_res=1,
    crs="epsg:4326",
)

section = pygeoapi.cross_section(
    (-103.80119, 40.2684), width=1000.0, numpts=101, crs="epsg:4326"
)

Now, let’s do the same operations using pygeoapi:

import geopandas as gpd
import shapely.geometry as sgeom
import pynhd as nhd

coords = gpd.GeoDataFrame(
    {
        "direction": ["up", "down"],
        "upstream": [True, False],
        "width": [1000.0, 500.0],
        "numpts": [101, 55],
    },
    geometry=[
        sgeom.Point(-73.82705, 43.29139),
        sgeom.Point(-103.801086, 40.26772),
    ],
    crs="epsg:4326",
)
trace = nhd.pygeoapi(coords, "flow_trace")
split = nhd.pygeoapi(coords, "split_catchment")
section = nhd.pygeoapi(coords, "cross_section")

coords = gpd.GeoDataFrame(
    {
        "direction": ["up", "down"],
        "upstream": [True, False],
        "width": [1000.0, 500.0],
        "numpts": [101, 55],
        "dem_res": [1, 10],
    },
    geometry=[
        sgeom.MultiPoint([(-103.801086, 40.26772), (-103.80097, 40.270568)]),
        sgeom.MultiPoint([(-102.801086, 39.26772), (-102.80097, 39.270568)]),
    ],
    crs="epsg:4326",
)
profile = nhd.pygeoapi(coords, "elevation_profile")
Figure: https://raw.githubusercontent.com/cheginit/HyRiver-examples/main/notebooks/_static/split_catchment.png

Next, we retrieve mid- and high-resolution flowlines within the bounding box of our watershed and compare them, using WaterData for the mid-resolution and NHDPlusHR for the high-resolution flowlines.

mr = WaterData("nhdflowline_network")
nhdp_mr = mr.bybox(basin.geometry[0].bounds)

hr = NHDPlusHR("flowline")
nhdp_hr = hr.bygeom(basin.geometry[0].bounds)
Figure: https://raw.githubusercontent.com/cheginit/HyRiver-examples/main/notebooks/_static/hr_mr.png

An alternative to WaterData and NHDPlusHR is the NHD class that supports both the mid- and high-resolution NHDPlus V2 data:

mr = NHD("flowline_mr")
nhdp_mr = mr.bygeom(basin.geometry[0].bounds)

hr = NHD("flowline_hr")
nhdp_hr = hr.bygeom(basin.geometry[0].bounds)

Moreover, WaterData can find features within a given radius (in meters) of a point:

eck4 = "+proj=eck4 +lon_0=0 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs"
coords = (-5727797.427596455, 5584066.49330473)
rad = 5e3
flw_rad = mr.bydistance(coords, rad, loc_crs=eck4)
flw_rad = flw_rad.to_crs(eck4)

Instead of getting all features within a radius of a coordinate, we can snap to the closest feature using NLDI:

comid_closest = nldi.comid_byloc(coords, eck4)
flw_closest = nhdp_mr.byid("comid", comid_closest.comid.values[0])
Figure: https://raw.githubusercontent.com/cheginit/HyRiver-examples/main/notebooks/_static/nhdplus_radius.png

Since NHDPlus HR is still at the pre-release stage, let's use the MR flowlines to demonstrate vector-based accumulation. Given a topologically sorted river network, pynhd.vector_accumulation computes flow accumulation in the network. It returns a data frame, sorted from upstream to downstream, that shows the accumulated flow at each node.

PyNHD has a utility called prepare_nhdplus that identifies such relationships, among other things, such as fixing some common issues with NHDPlus flowlines. But first, we need to get all the NHDPlus attributes for each ComID, since NLDI only provides the flowlines' geometries and ComIDs, which are useful for navigating the vector river network data. To get the NHDPlus database we use WaterData, with its nhdflowline_network layer:

wd = WaterData("nhdflowline_network")

comids = flw_trib.nhdplus_comid.to_list()
nhdp_trib = wd.byid("comid", comids)
flw = nhd.prepare_nhdplus(nhdp_trib, 0, 0, purge_non_dendritic=False)

To demonstrate the use of routing, let's use the nhdplus_attrs function to get a list of available NHDPlus attributes:

char = "CAT_RECHG"
area = "areasqkm"

local = nldi.getcharacteristic_byid(comids, "local", char_ids=char)
flw = flw.merge(local[char], left_on="comid", right_index=True)


def runoff_acc(qin, q, a):
    return qin + q * a


flw_r = flw[["comid", "tocomid", char, area]]
runoff = nhd.vector_accumulation(flw_r, runoff_acc, char, [char, area])


def area_acc(ain, a):
    return ain + a


flw_a = flw[["comid", "tocomid", area]]
areasqkm = nhd.vector_accumulation(flw_a, area_acc, area, [area])

runoff /= areasqkm

Since these are catchment-scale characteristics, let’s get the catchments then add the accumulated characteristic as a new column and plot the results.

wd = WaterData("catchmentsp")
catchments = wd.byid("featureid", comids)

c_local = catchments.merge(local, left_on="featureid", right_index=True)
c_acc = catchments.merge(runoff, left_on="featureid", right_index=True)
Figure: https://raw.githubusercontent.com/cheginit/HyRiver-examples/main/notebooks/_static/flow_accumulation.png

More examples can be found here.

PyGeoHydro: Retrieve Geospatial Hydrology Data#


Features#

PyGeoHydro (formerly named hydrodata) is part of the HyRiver software stack that is designed to aid in hydroclimate analysis through web services. This package provides access to some public web services that offer geospatial hydrology data. It has three main modules: pygeohydro, plot, and helpers.

PyGeoHydro supports the following datasets:

  • NWIS for daily mean streamflow observations (returned as a pandas.DataFrame or xarray.Dataset with station attributes).

  • CAMELS for accessing streamflow observations (1980-2014) and basin-level attributes of 671 stations within CONUS.

  • Water Quality Portal for accessing current and historical water quality data from more than 1.5 million sites across the US.

  • NID for accessing the National Inventory of Dams web service.

  • HCDN 2009 for identifying sites where human activity affects the natural flow of the watercourse.

  • NLCD 2019 for land cover/land use, imperviousness, imperviousness descriptor, and canopy data. You can get data using both geometries and coordinates.

  • WBD for accessing Hydrologic Unit (HU) polygon boundaries within the US (all HUC levels).

  • SSEBop for daily actual evapotranspiration, for both single-pixel and gridded data.

Also, it has three other functions:

  • interactive_map: Interactive map for exploring NWIS stations within a bounding box.

  • cover_statistics: Categorical statistics of land use/land cover data.

  • overland_roughness: Estimate overland roughness from land use/land cover data.

The plot module includes three main functions:

  • signatures: Hydrologic signature graphs.

  • cover_legends: Official NLCD land cover legends for plotting a land cover dataset.

  • descriptor_legends: Color map and legends for plotting an imperviousness descriptor dataset.

The helpers module includes:

  • nlcd_helper: A roughness coefficients lookup table for each land cover and imperviousness descriptor type which is useful for overland flow routing among other applications.

  • nwis_error: A dataframe for finding information about NWIS requests’ errors.
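
As a minimal sketch of using these helpers (assuming the returned lookup is a dictionary keyed by NLCD metadata fields):

import pygeohydro as gh

# lookup table of NLCD land cover classes and roughness coefficients
nlcd_meta = gh.helpers.nlcd_helper()
print(nlcd_meta["classes"])  # NLCD code -> class description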

You can find some example notebooks here.

Moreover, under the hood, PyGeoHydro uses AsyncRetriever for making requests asynchronously with persistent caching. This improves the reliability and speed of data retrieval significantly. AsyncRetriever caches all request/response pairs and upon making an already cached request, it will retrieve the responses from the cache if the server’s response is unchanged.

You can control the request/response caching behavior by setting the following environment variables:

  • HYRIVER_CACHE_NAME: Path to the caching SQLite database. It defaults to ./cache/aiohttp_cache.sqlite

  • HYRIVER_CACHE_EXPIRE: Expiration time for cached requests in seconds. It defaults to -1 (never expire).

  • HYRIVER_CACHE_DISABLE: Disable reading/writing from/to the cache. The default is false.

For example, in your code before making any requests you can do:

import os

os.environ["HYRIVER_CACHE_NAME"] = "path/to/file.sqlite"
os.environ["HYRIVER_CACHE_EXPIRE"] = "3600"
os.environ["HYRIVER_CACHE_DISABLE"] = "true"

You can also try using PyGeoHydro without installing it on your system by clicking on the binder badge. A Jupyter Lab instance with the HyRiver stack pre-installed will be launched in your web browser, and you can start coding!

Please note that since this project is in early development stages, while the provided functionalities should be stable, changes in APIs are possible in new releases. But we appreciate it if you give this project a try and provide feedback. Contributions are most welcome.

Moreover, requests for additional functionalities can be submitted via issue tracker.

Installation#

You can install PyGeoHydro using pip after installing libgdal on your system (for example, on Ubuntu run sudo apt install libgdal-dev). Moreover, PyGeoHydro has an optional dependency for persistent caching, requests-cache. We highly recommend installing it, as it can significantly speed up send/receive queries. You don't have to change anything in your code: PyGeoHydro looks for requests-cache under the hood and, if it is available, automatically uses persistent caching:

$ pip install pygeohydro

Alternatively, PyGeoHydro can be installed from the conda-forge repository using Conda:

$ conda install -c conda-forge pygeohydro

Quick start#

We can explore the available NWIS stations within a bounding box using the interactive_map function. It returns an interactive map, and clicking on a station shows some of its most important properties.

import pygeohydro as gh

bbox = (-69.5, 45, -69, 45.5)
gh.interactive_map(bbox)
Figure: Interactive Map

We can select all the stations within this bounding box that have daily mean streamflow data from 2000-01-01 to 2010-12-31:

from pygeohydro import NWIS

nwis = NWIS()
query = {
    **nwis.query_bybox(bbox),
    "hasDataTypeCd": "dv",
    "outputDataTypeCd": "dv",
}
info_box = nwis.get_info(query)
dates = ("2000-01-01", "2010-12-31")
stations = info_box[
    (info_box.begin_date <= dates[0]) & (info_box.end_date >= dates[1])
].site_no.tolist()

Then, we can get the daily streamflow data in mm/day (by default the values are in cms) and plot them:

from pygeohydro import plot

qobs = nwis.get_streamflow(stations, dates, mmd=True)
plot.signatures(qobs)

By default, get_streamflow returns a pandas.DataFrame that has an attrs attribute containing metadata for all the stations; you can access it via qobs.attrs. Moreover, we can get the same data as an xarray.Dataset as follows:

qobs_ds = nwis.get_streamflow(stations, dates, to_xarray=True)

This xarray.Dataset has two dimensions, time and station_id, and 10 variables. The discharge variable has both dimensions, while the other variables, which are station attributes, are one-dimensional.

We can also get instantaneous streamflow data using get_streamflow. This method assumes that the input dates are in the UTC time zone and returns the data in UTC as well.

date = ("2005-01-01 12:00", "2005-01-12 15:00")
qobs = nwis.get_streamflow("01646500", date, freq="iv")

We can get the CAMELS dataset as a geopandas.GeoDataFrame that includes the geometry and basin-level attributes of 671 natural watersheds within CONUS, and their streamflow observations between 1980 and 2014 as an xarray.Dataset, like so:

attrs, qobs = gh.get_camels()
Figure: CAMELS

The WaterQuality class has a number of convenience methods to retrieve data from the web service. Since there are many parameter combinations that can be used to retrieve data, a general method is also provided for any of the valid endpoints. You can use get_json to retrieve station info as a geopandas.GeoDataFrame or get_csv to retrieve station data as a pandas.DataFrame. You can construct a dictionary of the parameters and pass it to one of these functions. For more information on the parameters, please consult the Water Quality Data documentation. For example, let's find all the stations within a bounding box that have caffeine data:

from pygeohydro import WaterQuality

bbox = (-92.8, 44.2, -88.9, 46.0)
kwds = {"characteristicName": "Caffeine"}
wq = WaterQuality()
stations = wq.station_bybbox(bbox, kwds)

Or the same criterion but within a 30-mile radius of a point:

stations = wq.station_bydistance(-92.8, 44.2, 30, kwds)

Then, we can get the data for all these stations like this:

sids = stations.MonitoringLocationIdentifier.tolist()
caff = wq.data_bystation(sids, kwds)
Figure: Water Quality

Moreover, we can get land use/land cover data using the nlcd_bygeom or nlcd_bycoords functions, percentages of land cover types using cover_statistics, and overland roughness using overland_roughness. The nlcd_bycoords function returns a geopandas.GeoDataFrame with the NLCD layers as columns and the input coordinates as the geometry column. The nlcd_bygeom function accepts either a single geometry or a geopandas.GeoDataFrame as input.

from pynhd import NLDI

basins = NLDI().get_basins(["01031450", "01318500", "01031510"])
lulc = gh.nlcd_bygeom(basins, 100, years={"cover": [2016, 2019]})
stats = gh.cover_statistics(lulc["01318500"].cover_2016)
roughness = gh.overland_roughness(lulc["01318500"].cover_2019)
Figure: Land Use/Land Cover

Next, let's use ssebopeta_bygeom to get actual ET data for a basin. Note that there's also a ssebopeta_bycoords function that returns an actual ET time series for a single coordinate.

geometry = NLDI().get_basins("01315500").geometry[0]
eta = gh.ssebopeta_bygeom(geometry, dates=("2005-10-01", "2005-10-05"))
Figure: Actual ET

Additionally, we can pull data for all US dams using NID. Let's get the dams that are within this bounding box and have a maximum storage larger than 200 acre-feet.

nid = NID()
dams = nid.get_bygeom((-69.31, 43.07, -65.77, 45.45), "epsg:4326")
dams = nid.inventory_byid(dams.id.to_list())
dams = dams[dams.maxStorage > 200]

We can also get all dams within CONUS from NID with a maximum storage larger than 200 acre-feet:

import geopandas as gpd

world = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
conus = world[world.name == "United States of America"].geometry.iloc[0].geoms[0]

dam_list = nid.get_byfilter([{"maxStorage": ["[200 5000]"]}])
dams = dam_list[0][dam_list[0].is_valid]
dams = dams[dams.within(conus)]
Figure: Dams

The WBD class allows us to get Hydrologic Unit (HU) polygon boundaries. Let’s get the two Hudson HUC4s:

from pygeohydro import WBD

wbd = WBD("huc4")
hudson = wbd.byids("huc4", ["0202", "0203"])

Py3DEP: Topographic data through 3DEP#


Features#

Py3DEP is part of the HyRiver software stack that is designed to aid in hydroclimate analysis through web services. This package provides access to the 3DEP database, which is part of the National Map services. The 3DEP service has multi-resolution sources, and depending on the user-provided resolution, the data are resampled on the server side using all the available data sources. Py3DEP returns the requests as an xarray.Dataset. The main function is get_map, which supports the following layers:

  • DEM

  • Hillshade Gray

  • Aspect Degrees

  • Aspect Map

  • GreyHillshade Elevation Fill

  • Hillshade Multidirectional

  • Slope Degrees

  • Slope Map

  • Hillshade Elevation Tinted

  • Height Ellipsoidal

  • Contour 25

  • Contour Smoothed 25

Moreover, Py3DEP offers some additional utilities:

  • elevation_bygrid: For retrieving elevations of all the grid points in a 2D grid.

  • elevation_bycoords: For retrieving elevation of a list of x and y coordinates.

  • elevation_profile: For retrieving elevation profile along a line at a given spacing. This function converts the line to a B-spline and then calculates the elevation along the spline at a given uniform spacing.

  • deg2mpm: For converting a slope dataset from degrees to meters per meter.

  • query_3dep_sources: For querying the bounds of 3DEP's data sources within a bounding box.

  • check_3dep_availability: For querying 3DEP's resolution availability within a bounding box (see the sketch after this list).
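
For example, a minimal sketch of checking data availability before requesting a DEM (the exact return structures are assumptions for illustration):

import py3dep

bbox = (-69.5, 45.0, -69.0, 45.5)
# which 3DEP resolutions are available within this bounding box
print(py3dep.check_3dep_availability(bbox))
# bounds of the underlying 3DEP data sources as a GeoDataFrame
sources = py3dep.query_3dep_sources(bbox)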

You can find some example notebooks here.

Moreover, under the hood, Py3DEP uses AsyncRetriever for making requests asynchronously with persistent caching. This improves the reliability and speed of data retrieval significantly. AsyncRetriever caches all request/response pairs and upon making an already cached request, it will retrieve the responses from the cache if the server’s response is unchanged.

You can control the request/response caching behavior by setting the following environment variables:

  • HYRIVER_CACHE_NAME: Path to the caching SQLite database. It defaults to ./cache/aiohttp_cache.sqlite

  • HYRIVER_CACHE_EXPIRE: Expiration time for cached requests in seconds. It defaults to -1 (never expire).

  • HYRIVER_CACHE_DISABLE: Disable reading/writing from/to the cache. The default is false.

For example, in your code before making any requests you can do:

import os

os.environ["HYRIVER_CACHE_NAME"] = "path/to/file.sqlite"
os.environ["HYRIVER_CACHE_EXPIRE"] = "3600"
os.environ["HYRIVER_CACHE_DISABLE"] = "true"

You can also try using Py3DEP without installing it on your system by clicking on the binder badge. A Jupyter Lab instance with the HyRiver stack pre-installed will be launched in your web browser, and you can start coding!

Please note that since this project is in the early development stages, while the provided functionalities should be stable, changes in APIs are possible in new releases. But we appreciate it if you give this project a try and provide feedback. Contributions are most welcome.

Moreover, requests for additional functionalities can be submitted via issue tracker.

Installation#

You can install Py3DEP using pip after installing libgdal on your system (for example, on Ubuntu run sudo apt install libgdal-dev). Moreover, Py3DEP has an optional dependency for persistent caching, requests-cache. We highly recommend installing it, as it can significantly speed up send/receive queries. You don't have to change anything in your code: Py3DEP looks for requests-cache under the hood and, if it is available, automatically uses persistent caching:

$ pip install py3dep

Alternatively, Py3DEP can be installed from the conda-forge repository using Conda:

$ conda install -c conda-forge py3dep

Quick start#

You can use Py3DEP from the command line or as a Python library. The command-line interface provides access to two functionalities:

  • Getting topographic data: You must create a geopandas.GeoDataFrame that contains the geometries of the target locations. This dataframe must have at least three columns: id, res, and geometry. The id column is used as the filename for saving the obtained topographic data to a NetCDF (.nc) file. The res column must be the target resolution in meters. Then, you must save the dataframe to a file with an extension such as .shp or .gpkg (anything that geopandas.read_file can read).

  • Getting elevation: You must create a pandas.DataFrame that contains the coordinates of the target locations. This dataframe must have at least two columns: x and y. The elevations are obtained in meters using the airmap service. The data are saved as a CSV file with the same filename as the input file, with _elevation appended, e.g., coords_elevation.csv. A sketch of building these inputs follows this list.
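
For reference, here is a minimal sketch of building these CLI inputs (filenames and values are illustrative):

import geopandas as gpd
import pandas as pd
import shapely.geometry as sgeom

# input for the geometry sub-command: id, res (in meters), and geometry columns
gpd.GeoDataFrame(
    {"id": ["dem_clip"], "res": [30]},
    geometry=[sgeom.box(-69.5, 45.0, -69.0, 45.5)],
    crs="epsg:4326",
).to_file("input_geom.gpkg")

# input for the coords sub-command: lon and lat columns (per the help text below)
pd.DataFrame({"lon": [-122.2493328], "lat": [37.8122894]}).to_csv(
    "coords.csv", index=False
)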

$ py3dep --help
Usage: py3dep [OPTIONS] COMMAND [ARGS]...

Command-line interface for Py3DEP.

Options:
-h, --help  Show this message and exit.

Commands:
coords    Retrieve topographic data for a list of coordinates.
geometry  Retrieve topographic data within geometries.

The coords sub-command is as follows:

$ py3dep coords -h
Usage: py3dep coords [OPTIONS] FPATH

Retrieve topographic data for a list of coordinates.

FPATH: Path to a csv file with two columns named ``lon`` and ``lat``.

Examples:
    $ cat coords.csv
    lon,lat
    -122.2493328,37.8122894
    $ py3dep coords coords.csv -q airmap -s topo_dir

Options:
-q, --query_source [airmap|tnm|tep]
                                Source of the elevation data.
-s, --save_dir PATH             Path to a directory to save the requested
                                files. Extension for the outputs is either
                                `.nc` for geometry or `.csv` for coords.

-h, --help                      Show this message and exit.

And, the geometry sub-command is as follows:

$ py3dep geometry -h
Usage: py3dep geometry [OPTIONS] FPATH

Retrieve topographic data within geometries.

FPATH: Path to a shapefile (.shp) or geopackage (.gpkg) file.
This file must have three columns and contain a ``crs`` attribute:
    - ``id``: Feature identifiers that py3dep uses as the output netcdf/csv filenames.
    - ``res``: Target resolution in meters.
    - ``geometry``: A Polygon or MultiPolygon.

Examples:
    $ py3dep geometry ny_geom.gpkg -l "Slope Map" -l DEM -s topo_dir

Options:
-l, --layers [DEM|Hillshade Gray|Aspect Degrees|Aspect Map|GreyHillshade_elevationFill|Hillshade Multidirectional|Slope Map|Slope Degrees|Hillshade Elevation Tinted|Height Ellipsoidal|Contour 25|Contour Smoothed 25]
                                Target topographic data layers
-s, --save_dir PATH             Path to a directory to save the requested
                                files. Extension for the outputs is either
                                `.nc` for geometry or `.csv` for coords.

-h, --help                      Show this message and exit.

Now, let’s see how we can use Py3DEP as a library.

Py3DEP accepts Shapely's Polygon or a bounding box (a tuple of length four) as an input geometry. We can use PyNHD to get a watershed's geometry, then use it to get the DEM and the slope in meters per meter from Py3DEP using the get_map function.

get_map has a resolution argument that sets the target resolution in meters. Note that the highest resolution available throughout CONUS is about 10 m, though higher resolutions are available in limited parts of the US. The input geometry can be in any valid spatial reference (the geo_crs argument). The crs argument, however, is limited to CRS:84, EPSG:4326, and EPSG:3857, since 3DEP only supports these spatial references.

import py3dep
from pynhd import NLDI

geom = NLDI().get_basins("01031500").geometry[0]
dem = py3dep.get_map("DEM", geom, resolution=30, geo_crs="epsg:4326", crs="epsg:3857")
slope = py3dep.get_map("Slope Degrees", geom, resolution=30)
slope = py3dep.deg2mpm(slope)
Figure: https://raw.githubusercontent.com/cheginit/HyRiver-examples/main/notebooks/_static/dem_slope.png

We can use the rioxarray package to save the obtained dataset as a raster file:

import rioxarray

dem.rio.to_raster("dem_01031500.tif")

Moreover, we can get the elevations of a set of x and y coordinates on a grid. For example, let's get the minimum temperature data within this watershed from Daymet using PyDaymet, then add the elevation as a new variable to the dataset:

import pydaymet as daymet
import xarray as xr
import numpy as np

clm = daymet.get_bygeom(geom, ("2005-01-01", "2005-01-31"), variables="tmin")
elev = py3dep.elevation_bygrid(clm.x.values, clm.y.values, clm.crs, clm.res[0] * 1000)
attrs = clm.attrs
clm = xr.merge([clm, elev])
clm["elevation"] = clm.elevation.where(~np.isnan(clm.isel(time=0).tmin), drop=True)
clm.attrs.update(attrs)

Now, let's get street network data using the osmnx package and add elevation data to its nodes using the elevation_bycoords function.

import networkx as nx
import osmnx as ox

G = ox.graph_from_place("Piedmont, California, USA", network_type="drive")
x, y = nx.get_node_attributes(G, "x").values(), nx.get_node_attributes(G, "y").values()
elevation = py3dep.elevation_bycoords(zip(x, y), crs="epsg:4326")
nx.set_node_attributes(G, dict(zip(G.nodes(), elevation)), "elevation")
Figure: https://raw.githubusercontent.com/cheginit/HyRiver-examples/main/notebooks/_static/street_elev.png

We can get the elevation profile along a line at a given spacing using the elevation_profile function. For example, let's get the elevation profile at 10-m spacing along the main flowline of the upstream drainage area of the USGS station 01031500:

import py3dep
from pynhd import NLDI

flw_main = NLDI().navigate_byid(
    fsource="nwissite",
    fid="USGS-01031500",
    navigation="upstreamMain",
    source="flowlines",
    distance=1000,
)
line = flw_main.geometry.unary_union
elevation = py3dep.elevation_profile(line, 10)

PyDaymet: Daily climate data through Daymet#


Features#

PyDaymet is part of the HyRiver software stack that is designed to aid in hydroclimate analysis through web services. This package provides access to climate data from the Daymet V4 database using the NetCDF Subset Service (NCSS). Both single-pixel (using the get_bycoords function) and gridded (using get_bygeom) data are supported, returned as a pandas.DataFrame and an xarray.Dataset, respectively. Climate data are available for North America and Hawaii from 1980, and for Puerto Rico from 1950, at three time scales: daily, monthly, and annual. Additionally, PyDaymet can compute Potential EvapoTranspiration (PET) using three methods, penman_monteith, priestley_taylor, and hargreaves_samani, for both single-pixel and gridded data.

You can find some example notebooks here.

Moreover, under the hood, PyDaymet uses AsyncRetriever for making requests asynchronously with persistent caching. This improves the reliability and speed of data retrieval significantly. AsyncRetriever caches all request/response pairs and upon making an already cached request, it will retrieve the responses from the cache if the server’s response is unchanged.

You can control the request/response caching behavior by setting the following environment variables:

  • HYRIVER_CACHE_NAME: Path to the caching SQLite database. It defaults to ./cache/aiohttp_cache.sqlite

  • HYRIVER_CACHE_EXPIRE: Expiration time for cached requests in seconds. It defaults to -1 (never expire).

  • HYRIVER_CACHE_DISABLE: Disable reading/writing from/to the cache. The default is false.

For example, in your code before making any requests you can do:

import os

os.environ["HYRIVER_CACHE_NAME"] = "path/to/file.sqlite"
os.environ["HYRIVER_CACHE_EXPIRE"] = "3600"
os.environ["HYRIVER_CACHE_DISABLE"] = "true"

You can also try using PyDaymet without installing it on your system by clicking on the binder badge. A Jupyter Lab instance with the HyRiver stack pre-installed will be launched in your web browser, and you can start coding!

Please note that since this project is in the early development stages, while the provided functionalities should be stable, changes in APIs are possible in new releases. But we appreciate it if you give this project a try and provide feedback. Contributions are most welcome.

Moreover, requests for additional functionalities can be submitted via issue tracker.

Installation#

You can install PyDaymet using pip after installing libgdal on your system (for example, in Ubuntu run sudo apt install libgdal-dev):

$ pip install pydaymet

Alternatively, PyDaymet can be installed from the conda-forge repository using Conda:

$ conda install -c conda-forge pydaymet

Quick start#

You can use PyDaymet from the command line or as a Python library. The command-line interface provides access to two functionalities:

  • Getting gridded climate data: You must create a geopandas.GeoDataFrame that contains the geometries of the target locations. This dataframe must have four columns: id, start, end, and geometry. The id column is used as the filename for saving the obtained climate data to a NetCDF (.nc) file. The start and end columns are the starting and ending dates of the target period. Then, you must save the dataframe as a shapefile (.shp) or geopackage (.gpkg) with a CRS attribute.

  • Getting single-pixel climate data: You must create a CSV file that contains the coordinates of the target locations. This file must have at least five columns: id, start, end, lon, and lat. The id column is used as the filename for saving the obtained climate data to a CSV (.csv) file. The start and end columns are the same as in the geometry command, and the lon and lat columns are the longitude and latitude of the target locations. A sketch of building these inputs follows this list.
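
For reference, here is a minimal sketch of building these CLI inputs (filenames and values are illustrative):

import geopandas as gpd
import pandas as pd
import shapely.geometry as sgeom

# input for the geometry sub-command: id, start, end, and geometry columns
gpd.GeoDataFrame(
    {"id": ["geom_test"], "start": ["2000-01-01"], "end": ["2000-12-31"]},
    geometry=[sgeom.box(-69.5, 45.0, -69.0, 45.5)],
    crs="epsg:4326",
).to_file("geo.gpkg")

# input for the coords sub-command: id, start, end, lon, and lat columns
pd.DataFrame(
    {
        "id": ["coords_test"],
        "start": ["2000-01-01"],
        "end": ["2000-12-31"],
        "lon": [-69.2],
        "lat": [45.2],
    }
).to_csv("coords.csv", index=False)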

$ pydaymet -h
Usage: pydaymet [OPTIONS] COMMAND [ARGS]...

Command-line interface for PyDaymet.

Options:
-h, --help  Show this message and exit.

Commands:
coords    Retrieve climate data for a list of coordinates.
geometry  Retrieve climate data for a dataframe of geometries.

The coords sub-command is as follows:

$ pydaymet coords -h
Usage: pydaymet coords [OPTIONS] FPATH

Retrieve climate data for a list of coordinates.

FPATH: Path to a csv file with four columns:
    - ``id``: Feature identifiers that daymet uses as the output netcdf filenames.
    - ``start``: Start time.
    - ``end``: End time.
    - ``lon``: Longitude of the points of interest.
    - ``lat``: Latitude of the points of interest.
    - ``time_scale``: (optional) Time scale, either ``daily`` (default), ``monthly`` or ``annual``.
    - ``pet``: (optional) Method to compute PET. Supported methods are:
                ``penman_monteith``, ``hargreaves_samani``, ``priestley_taylor``, and ``none`` (default).
    - ``snow``: (optional) Separate snowfall from precipitation, default is ``False``.

Examples:
    $ cat coords.csv
    id,lon,lat,start,end,pet
    california,-122.2493328,37.8122894,2012-01-01,2014-12-31,hargreaves_samani
    $ pydaymet coords coords.csv -v prcp -v tmin

Options:
-v, --variables TEXT  Target variables. You can pass this flag multiple
                        times for multiple variables.
-s, --save_dir PATH   Path to a directory to save the requested files.
                        Extension for the outputs is .nc for geometry and .csv
                        for coords.
--disable_ssl         Pass to disable SSL certification verification.
-h, --help            Show this message and exit.

And, the geometry sub-command is as follows:

$ pydaymet geometry -h
Usage: pydaymet geometry [OPTIONS] FPATH

Retrieve climate data for a dataframe of geometries.

FPATH: Path to a shapefile (.shp) or geopackage (.gpkg) file.
This file must have four columns and contain a ``crs`` attribute:
    - ``id``: Feature identifiers that daymet uses as the output netcdf filenames.
    - ``start``: Start time.
    - ``end``: End time.
    - ``geometry``: Target geometries.
    - ``time_scale``: (optional) Time scale, either ``daily`` (default), ``monthly`` or ``annual``.
    - ``pet``: (optional) Method to compute PET. Supported methods are:
                ``penman_monteith``, ``hargreaves_samani``, ``priestley_taylor``, and ``none`` (default).
    - ``snow``: (optional) Separate snowfall from precipitation, default is ``False``.

Examples:
    $ pydaymet geometry geo.gpkg -v prcp -v tmin

Options:
-v, --variables TEXT  Target variables. You can pass this flag multiple
                        times for multiple variables.
-s, --save_dir PATH   Path to a directory to save the requested files.
                        Extension for the outputs is .nc for geometry and .csv
                        for coords.
--disable_ssl         Pass to disable SSL certification verification.
-h, --help            Show this message and exit.

Now, let’s see how we can use PyDaymet as a library.

PyDaymet offers two functions for getting climate data: get_bycoords and get_bygeom. Their arguments are identical, except that the former takes a coordinate (a tuple of length two, as in (x, y)) and the latter takes a polygon. The input geometry or coordinate can be in any valid CRS (defaults to EPSG:4326). The dates argument can be either a tuple of length two, like (start_str, end_str), or a list of years, like [2000, 2005]. Note that both functions have a pet flag for computing PET and a snow flag for separating snow from precipitation using the Martinez and Gupta (2010) method. Additionally, we can pass time_scale to get daily, monthly, or annual summaries; it defaults to daily.

from pynhd import NLDI
import pydaymet as daymet

geometry = NLDI().get_basins("01031500").geometry[0]

var = ["prcp", "tmin"]
dates = ("2000-01-01", "2000-06-30")

daily = daymet.get_bygeom(
    geometry, dates, variables=var, pet="priestley_taylor", snow=True
)
monthly = daymet.get_bygeom(geometry, dates, variables=var, time_scale="monthly")
Figure: https://raw.githubusercontent.com/cheginit/HyRiver-examples/main/notebooks/_static/daymet_grid.png

If the input geometry (or coordinate) is in a CRS other than EPSG:4326, we should pass it to the functions.

coords = (-1431147.7928, 318483.4618)
crs = "epsg:3542"
dates = ("2000-01-01", "2006-12-31")
annual = daymet.get_bycoords(
    coords, dates, variables=var, loc_crs=crs, time_scale="annual"
)
Figure: https://raw.githubusercontent.com/cheginit/HyRiver-examples/main/notebooks/_static/daymet_loc.png

Also, we can use the potential_et function to compute PET by passing it the daily climate data, as either a pandas.DataFrame or an xarray.Dataset. Note that the penman_monteith and priestley_taylor methods have parameters that can be passed via the params argument if values other than the defaults are needed. For example, the default value of alpha for the priestley_taylor method is 1.26 (humid regions); we can set it to 1.74 (arid regions) as follows:

pet_pt = daymet.potential_et(daily, method="priestley_taylor", params={"alpha": 1.74})

Next, let’s get annual total precipitation for Hawaii and Puerto Rico for 2010.

hi_ext = (-160.3055, 17.9539, -154.7715, 23.5186)
pr_ext = (-67.9927, 16.8443, -64.1195, 19.9381)
hi = daymet.get_bygeom(hi_ext, 2010, variables="prcp", region="hi", time_scale="annual")
pr = daymet.get_bygeom(pr_ext, 2010, variables="prcp", region="pr", time_scale="annual")

Some example plots are shown below:

Figures: https://raw.githubusercontent.com/cheginit/HyRiver-examples/main/notebooks/_static/hi.png and https://raw.githubusercontent.com/cheginit/HyRiver-examples/main/notebooks/_static/pr.png

AsyncRetriever: Asynchronous requests with persistent caching#


Features#

AsyncRetriever is part of the HyRiver software stack that is designed to aid in hydroclimate analysis through web services. This package serves as HyRiver's engine for asynchronously sending requests and retrieving responses as text, binary, or JSON objects. It uses persistent caching, via aiohttp-client-cache, to speed up the retrieval even further. Moreover, thanks to nest_asyncio, you can use this package in Jupyter notebooks. Although this package is part of the HyRiver software stack, it can be used for any web calls. There are three functions that you can use to make web calls:

  • retrieve_text: Get responses as text objects.

  • retrieve_binary: Get responses as binary objects.

  • retrieve_json: Get responses as json objects.

You can also use the general-purpose retrieve function to get responses as any of these three types. All responses are returned as a list in the same order as the input list of requests. Moreover, there is another function, delete_url_cache, for removing from a cache file all requests that contain a given URL.

You can control the request/response caching behavior by setting the following environment variables:

  • HYRIVER_CACHE_NAME: Path to the caching SQLite database. It defaults to ./cache/aiohttp_cache.sqlite

  • HYRIVER_CACHE_EXPIRE: Expiration time for cached requests in seconds. It defaults to -1 (never expire).

  • HYRIVER_CACHE_DISABLE: Disable reading/writing from/to the cache. The default is false.

For example, in your code before making any requests you can do:

import os

os.environ["HYRIVER_CACHE_NAME"] = "path/to/file.sqlite"
os.environ["HYRIVER_CACHE_EXPIRE"] = "3600"
os.environ["HYRIVER_CACHE_DISABLE"] = "true"

You can find some example notebooks here.

You can also try using AsyncRetriever without installing it on your system by clicking on the binder badge. A Jupyter Lab instance with the HyRiver stack pre-installed will be launched in your web browser, and you can start coding!

Please note that since this project is in the early development stages, while the provided functionalities should be stable, changes in APIs are possible in new releases. But we appreciate it if you give this project a try and provide feedback. Contributions are most welcome.

Moreover, requests for additional functionalities can be submitted via issue tracker.

Installation#

You can install async_retriever using pip:

$ pip install async_retriever

Alternatively, async_retriever can be installed from the conda-forge repository using Conda:

$ conda install -c conda-forge async_retriever

Quick start#

By default, AsyncRetriever creates and/or uses ./cache/aiohttp_cache.sqlite as the cache, which you can customize via the cache_name argument. Also, by default, the cache doesn't have an expiration date, so the delete_url_cache function should be used if you know that a database on a server has been updated and you want to retrieve the latest data. Alternatively, you can use expire_after to set an expiration time for the cache.
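
For instance, a minimal sketch of passing these arguments directly (the URL is one of the CO-OPS endpoints used later in this section):

import async_retriever as ar

url = "https://api.tidesandcurrents.noaa.gov/mdapi/prod/webapi/stations/8410140/harcon.json?units=metric"
# a custom cache file plus a one-hour expiration for this request
resp = ar.retrieve_json([url], cache_name="cache/noaa.sqlite", expire_after=3600)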

As an example of retrieving a binary response, let's use the DAAC server to get NDVI. The responses can be passed directly to xarray.open_mfdataset to get the data as an xarray.Dataset. We can also disable SSL certificate verification by setting ssl=False.

import io
import xarray as xr
import async_retriever as ar
from datetime import datetime

west, south, east, north = (-69.77, 45.07, -69.31, 45.45)
base_url = "https://thredds.daac.ornl.gov/thredds/ncss/ornldaac/1299"
dates_itr = ((datetime(y, 1, 1), datetime(y, 1, 31)) for y in range(2000, 2005))
urls, kwds = zip(
    *[
        (
            f"{base_url}/MCD13.A{s.year}.unaccum.nc4",
            {
                "params": {
                    "var": "NDVI",
                    "north": f"{north}",
                    "west": f"{west}",
                    "east": f"{east}",
                    "south": f"{south}",
                    "disableProjSubset": "on",
                    "horizStride": "1",
                    "time_start": s.strftime("%Y-%m-%dT%H:%M:%SZ"),
                    "time_end": e.strftime("%Y-%m-%dT%H:%M:%SZ"),
                    "timeStride": "1",
                    "addLatLon": "true",
                    "accept": "netcdf",
                }
            },
        )
        for s, e in dates_itr
    ]
)
resp = ar.retrieve_binary(urls, kwds, max_workers=8, ssl=False)
data = xr.open_mfdataset(io.BytesIO(r) for r in resp)

We can remove these requests and their responses from the cache like so:

ar.delete_url_cache(base_url)
Figure: https://raw.githubusercontent.com/cheginit/HyRiver-examples/main/notebooks/_static/ndvi.png

For a JSON response example, let's get the water level recordings of a NOAA water level station, 8534720 (Atlantic City, NJ), during 2012 using the CO-OPS API. Note that this CO-OPS product has a 31-day limit per request, so we have to break the request down accordingly.

import pandas as pd

station_id = "8534720"
start = pd.to_datetime("2012-01-01")
end = pd.to_datetime("2012-12-31")

s = start
dates = []
for e in pd.date_range(start, end, freq="M"):
    dates.append((s.date(), e.date()))
    s = e + pd.offsets.MonthBegin()

url = "https://api.tidesandcurrents.noaa.gov/api/prod/datagetter"

urls, kwds = zip(
    *[
        (
            url,
            {
                "params": {
                    "product": "water_level",
                    "application": "web_services",
                    "begin_date": f'{s.strftime("%Y%m%d")}',
                    "end_date": f'{e.strftime("%Y%m%d")}',
                    "datum": "MSL",
                    "station": f"{station_id}",
                    "time_zone": "GMT",
                    "units": "metric",
                    "format": "json",
                }
            },
        )
        for s, e in dates
    ]
)

resp = ar.retrieve_json(urls, kwds)
wl_list = []
for rjson in resp:
    wl = pd.DataFrame.from_dict(rjson["data"])
    wl["t"] = pd.to_datetime(wl.t)
    wl = wl.set_index(wl.t).drop(columns="t")
    wl["v"] = pd.to_numeric(wl.v, errors="coerce")
    wl_list.append(wl)
water_level = pd.concat(wl_list).sort_index()
water_level.attrs = rjson["metadata"]
Figure: https://raw.githubusercontent.com/cheginit/HyRiver-examples/main/notebooks/_static/water_level.png

Now, let’s see an example without any payload or headers. Here’s how we can retrieve harmonic constituents of several NOAA stations from CO-OPS:

stations = [
    "8410140",
    "8411060",
    "8413320",
    "8418150",
    "8419317",
    "8419870",
    "8443970",
    "8447386",
]

base_url = "https://api.tidesandcurrents.noaa.gov/mdapi/prod/webapi/stations"
urls = [f"{base_url}/{i}/harcon.json?units=metric" for i in stations]
resp = ar.retrieve_json(urls)

amp_list = []
phs_list = []
for rjson in resp:
    sid = rjson["self"].rsplit("/", 2)[1]
    const = pd.DataFrame.from_dict(rjson["HarmonicConstituents"]).set_index("name")
    amp = const.rename(columns={"amplitude": sid})[sid]
    phase = const.rename(columns={"phase_GMT": sid})[sid]
    amp_list.append(amp)
    phs_list.append(phase)

amp = pd.concat(amp_list, axis=1)
phs = pd.concat(phs_list, axis=1)
Figure: https://raw.githubusercontent.com/cheginit/HyRiver-examples/main/notebooks/_static/tides.png

PyGeoOGC: Retrieve Data from RESTful, WMS, and WFS Services#


Features#

PyGeoOGC is part of the HyRiver software stack that is designed to aid in hydroclimate analysis through web services. This package provides general interfaces to web services that are based on ArcGIS RESTful, WMS, and WFS. Although these web services limit the number of features per request (e.g., 1000 object IDs for a RESTful request or 8 million pixels for a WMS request), PyGeoOGC first divides large requests into smaller chunks and then returns the merged results.

Moreover, under the hood, PyGeoOGC uses AsyncRetriever for making requests asynchronously with persistent caching. This improves the reliability and speed of data retrieval significantly. AsyncRetriever caches all request/response pairs and upon making an already cached request, it will retrieve the responses from the cache if the server’s response is unchanged.

You can control the request/response caching behavior by setting the following environment variables:

  • HYRIVER_CACHE_NAME: Path to the caching SQLite database. It defaults to ./cache/aiohttp_cache.sqlite

  • HYRIVER_CACHE_EXPIRE: Expiration time for cached requests in seconds. It defaults to -1 (never expire).

  • HYRIVER_CACHE_DISABLE: Disable reading/writing from/to the cache. The default is false.

For example, in your code before making any requests you can do:

import os

os.environ["HYRIVER_CACHE_NAME"] = "path/to/file.sqlite"
os.environ["HYRIVER_CACHE_EXPIRE"] = "3600"
os.environ["HYRIVER_CACHE_DISABLE"] = "true"

There is also an inventory of URLs for some of these web services, in the form of a class called ServiceURL. These URLs fall into four categories: ServiceURL().restful, ServiceURL().wms, ServiceURL().wfs, and ServiceURL().http. They provide some examples of the services that PyGeoOGC supports. If you successfully use PyGeoOGC with a web service, please consider submitting a request to have it added to this URL inventory. You can list all the URLs in the ServiceURL class by printing it: print(ServiceURL()).
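
For example, a quick way to inspect a couple of entries of this inventory:

from pygeoogc import ServiceURL

print(ServiceURL().restful.wbd)  # Watershed Boundary Dataset RESTful endpoint
print(ServiceURL().restful.nhdplushr)  # NHDPlus HR RESTful endpoint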

PyGeoOGC has three main classes:

  • ArcGISRESTful: This class can be instantiated by providing the target layer URL. For example, for getting Watershed Boundary Data we can use ServiceURL().restful.wbd. By looking at the web service’s website we see that there are nine layers. For example, 1 for 2-digit HU (Region), 6 for 12-digit HU (Subregion), and so on. We can pass the URL to the target layer directly, like this f"{ServiceURL().restful.wbd}/6" or as a separate argument via layer.

    Afterward, we request the data in two steps. First, we get the target object IDs using the oids_bygeom (within a geometry), oids_byfield (specific field IDs), or oids_bysql (any valid SQL 92 WHERE clause) class methods. Then, we retrieve the target features using the get_features class method. The returned response can be converted into a GeoDataFrame using the json2geodf function from PyGeoUtils.

  • WMS: Instantiation of this class requires at least three arguments: the service URL, layer name(s), and output format. Additionally, the target CRS and the web service version can be provided. Upon instantiation, we can use the getmap_bybox class method to get the target raster data within a bounding box. The box can be in any valid CRS; if it differs from the default CRS, EPSG:4326, it should be passed using the box_crs argument. The service response can be converted into an xarray.Dataset using the gtiff2xarray function from PyGeoUtils.

  • WFS: Instantiation of this class is similar to WMS. The only difference is that only one layer name can be passed. Upon instantiation, there are three ways to get the data:

    • getfeature_bybox: Get all the target features within a bounding box in any valid CRS.

    • getfeature_byid: Get all the target features based on the IDs. Note that two arguments should be provided: featurename and featureids. You can get a list of valid feature names using the get_validnames class method.

    • getfeature_byfilter: Get the data based on any valid CQL filter.

    You can convert the returned response of these methods to a GeoDataFrame using the json2geodf function from the PyGeoUtils package; see the sketch below.
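
For instance, here is a minimal sketch of the getfeature_byid workflow against the WaterData WFS. Note that the staid feature name is an assumption for the gagesii layer; inspect the output of get_validnames first:

from pygeoogc import WFS, ServiceURL

wfs = WFS(
    ServiceURL().wfs.waterdata,
    layer="wmadata:gagesii",
    outformat="application/json",
    version="2.0.0",
)
print(wfs.get_validnames())  # list the valid feature names for this layer
r = wfs.getfeature_byid("staid", "01031500")  # "staid" is assumed, check get_validnames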

You can find some example notebooks here.

Furthermore, you can also try using PyGeoOGC without installing it on your system by clicking on the binder badge. A Jupyter Lab instance with the HyRiver stack pre-installed will be launched in your web browser, and you can start coding!

Please note that since this project is in its early development stages, while the provided functionalities should be stable, API changes are possible in new releases. We appreciate it if you give this project a try and provide feedback. Contributions are most welcome.

Moreover, requests for additional functionalities can be submitted via the issue tracker.

Installation#

You can install PyGeoOGC using pip:

$ pip install pygeoogc

Alternatively, PyGeoOGC can be installed from the conda-forge repository using Conda or Mamba:

$ conda install -c conda-forge pygeoogc

Quick start#

We can access NHDPlus HR via a RESTful service, the National Wetlands Inventory via WMS, and the FEMA National Flood Hazard Layer via WFS. The outputs of these functions are of type requests.Response and can be converted to a GeoDataFrame or xarray.Dataset using PyGeoUtils.

Let’s start with the National Map’s NHDPlus HR web service. We can query the flowlines that are within a geometry as follows:

from pygeoogc import ArcGISRESTful, WFS, WMS, ServiceURL
import pygeoutils as geoutils
from pynhd import NLDI

basin_geom = NLDI().get_basins("01031500").geometry[0]

hr = ArcGISRESTful(ServiceURL().restful.nhdplushr, 2, outformat="json")

resp = hr.get_features(hr.oids_bygeom(basin_geom, "epsg:4326"))
flowlines = geoutils.json2geodf(resp)

Note that oids_bygeom has three additional arguments: sql_clause, spatial_relation, and distance. We can use sql_clause to pass any valid SQL WHERE clause and spatial_relation to specify the target predicate, such as intersect, contain, or cross. The default predicate is intersect (esriSpatialRelIntersects). Additionally, we can use distance to specify a buffer distance around the input geometry for getting features; a short sketch follows.
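
For instance, a minimal sketch of a buffered query, reusing the hr client from above (the 100 m buffer is an arbitrary illustration):

oids = hr.oids_bygeom(basin_geom, geo_crs="epsg:4326", distance=100)
resp = hr.get_features(oids)
flowlines_buffered = geoutils.json2geodf(resp)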

We can also submit a query based on IDs of any valid field in the database. If the measure property is desired, you can pass return_m=True to the get_features class method:

oids = hr.oids_byfield("PERMANENT_IDENTIFIER", ["103455178", "103454362", "103453218"])
resp = hr.get_features(oids, return_m=True)
flowlines = geoutils.json2geodf(resp)

Additionally, any valid SQL 92 WHERE clause can be used. For more details look here. For example, let’s limit our first request to only include catchments with areas larger than 0.5 sqkm.

oids = hr.oids_bygeom(basin_geom, geo_crs="epsg:4326", sql_clause="AREASQKM > 0.5")
resp = hr.get_features(oids)
catchments = geoutils.json2geodf(resp)

A WMS-based example is shown below:

wms = WMS(
    ServiceURL().wms.fws,
    layers="0",
    outformat="image/tiff",
    crs="epsg:3857",
)
r_dict = wms.getmap_bybox(
    basin_geom.bounds,
    1e3,  # target resolution in meters
    box_crs="epsg:4326",
)
wetlands = geoutils.gtiff2xarray(r_dict, basin_geom, "epsg:4326")

Querying a WFS-based web service can be done either within a bounding box or using any valid CQL filter, as the two examples below show.

wfs = WFS(
    ServiceURL().wfs.fema,
    layer="public_NFHL:Base_Flood_Elevations",
    outformat="esrigeojson",
    crs="epsg:4269",
)
r = wfs.getfeature_bybox(basin_geom.bounds, box_crs="epsg:4326")
flood = geoutils.json2geodf(r.json(), "epsg:4269", "epsg:4326")

layer = "wmadata:huc08"
wfs = WFS(
    ServiceURL().wfs.waterdata,
    layer=layer,
    outformat="application/json",
    version="2.0.0",
    crs="epsg:4269",
)
r = wfs.getfeature_byfilter("huc8 LIKE '13030%'")
huc8 = geoutils.json2geodf(r.json(), "epsg:4269", "epsg:4326")
(Figure: results of the above queries: https://raw.githubusercontent.com/cheginit/HyRiver-examples/main/notebooks/_static/sql_clause.png)

PyGeoUtils: Utilities for (Geo)JSON and (Geo)TIFF Conversion#

PyPi Conda Version CodeCov Python Versions Downloads

CodeFactor black pre-commit Binder

Features#

PyGeoUtils is a part of the HyRiver software stack that is designed to aid in hydroclimate analysis through web services. This package provides utilities for manipulating (Geo)JSON and (Geo)TIFF responses from web services. These utilities are:

  • json2geodf: For converting (Geo)JSON objects to a geopandas.GeoDataFrame.

  • arcgis2geojson: For converting ESRI GeoJSON to the standard GeoJSON format.

  • gtiff2xarray: For converting (Geo)TIFF objects to xarray datasets.

  • xarray2geodf: For converting a xarray.DataArray to a geopandas.GeoDataFrame, i.e., vectorization.

  • xarray_geomask: For masking an xarray.Dataset or xarray.DataArray using a polygon.

All these functions handle all necessary CRS transformations.
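
As a quick illustration of arcgis2geojson, here is a minimal sketch with a hand-built ESRI JSON feature (the input dictionary is made up for demonstration):

from pygeoutils import arcgis2geojson

esri = {
    "attributes": {"name": "example"},
    "geometry": {"x": -118.72, "y": 34.118, "spatialReference": {"wkid": 4326}},
}
feature = arcgis2geojson(esri)
print(feature["geometry"]["type"])  # a GeoJSON Point feature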

You can find some example notebooks here.

You can also try using PyGeoUtils without installing it on your system by clicking on the binder badge. A Jupyter Lab instance with the HyRiver stack pre-installed will be launched in your web browser, and you can start coding!

Please note that since this project is in its early development stages, while the provided functionalities should be stable, API changes are possible in new releases. We appreciate it if you give this project a try and provide feedback. Contributions are most welcome.

Moreover, requests for additional functionalities can be submitted via the issue tracker.

Installation#

You can install PyGeoUtils using pip after installing libgdal on your system (for example, on Ubuntu-based distros the required package is libgdal-dev):

$ pip install pygeoutils

Alternatively, PyGeoUtils can be installed from the conda-forge repository using Conda:

$ conda install -c conda-forge pygeoutils

Quick start#

To demonstrate the capabilities of PyGeoUtils, let’s use PyGeoOGC to access the NLCD tree canopy layer from MRLC via WMS and the FEMA National Flood Hazard Layer via WFS, then convert the outputs to xarray.Dataset and GeoDataFrame, respectively.

import pygeoutils as geoutils
from pygeoogc import WFS, WMS, ServiceURL
from shapely.geometry import Polygon


geometry = Polygon(
    [
        [-118.72, 34.118],
        [-118.31, 34.118],
        [-118.31, 34.518],
        [-118.72, 34.518],
        [-118.72, 34.118],
    ]
)
crs = "epsg:4326"

wms = WMS(
    ServiceURL().wms.mrlc,
    layers="NLCD_2011_Tree_Canopy_L48",
    outformat="image/geotiff",
    crs=crs,
)
r_dict = wms.getmap_bybox(
    geometry.bounds,
    1e3,  # target resolution in meters
    box_crs=crs,
)
canopy = geoutils.gtiff2xarray(r_dict, geometry, crs)

mask = canopy > 60  # keep pixels with more than 60 percent canopy cover
canopy_gdf = geoutils.xarray2geodf(canopy, "float32", mask)

url_wfs = "https://hazards.fema.gov/gis/nfhl/services/public/NFHL/MapServer/WFSServer"
wfs = WFS(
    url_wfs,
    layer="public_NFHL:Base_Flood_Elevations",
    outformat="esrigeojson",
    crs="epsg:4269",
)
r = wfs.getfeature_bybox(geometry.bounds, box_crs=crs)
flood = geoutils.json2geodf(r.json(), "epsg:4269", crs)

API References#

pynhd#

Top-level package for PyNHD.

Submodules#

pynhd.core#

Base classes for PyNHD functions.

Module Contents#
class pynhd.core.AGRBase(base_url, layer=None, outfields='*', crs=DEF_CRS, outformat='json')#

Base class for getting geospatial data from an ArcGISRESTful service.

Parameters
  • base_url (str, optional) – The ArcGIS RESTful service URL. The URL must either include a layer number after the last / in the URL, or the target layer must be passed as an argument.

  • layer (str, optional) – A valid service layer. To see a list of available layers instantiate the class without passing any argument.

  • outfields (str or list, optional) – Target field name(s), default to “*” i.e., all the fields.

  • crs (str, optional) – Target spatial reference, default to EPSG:4326

  • outformat (str, optional) – One of the output formats offered by the selected layer. If not correct, a list of available formats is shown; defaults to json.

bygeom(self, geom, geo_crs=DEF_CRS, sql_clause='', distance=None, return_m=False, return_geom=True)#

Get features within a geometry, optionally filtered by a SQL WHERE clause.

Parameters
  • geom (Polygon or tuple) – A geometry (Polygon) or bounding box (tuple of length 4).

  • geo_crs (str) – The spatial reference of the input geometry.

  • sql_clause (str, optional) – A valid SQL 92 WHERE clause, defaults to an empty string.

  • distance (int, optional) – The buffer distance for the input geometries in meters, default to None.

  • return_m (bool, optional) – Whether to activate the Return M (measure) in the request, defaults to False.

  • return_geom (bool, optional) – Whether to return the geometry of the feature, defaults to True.

Returns

geopandas.GeoDataFrame – The requested features as a GeoDataFrame.

byids(self, field, fids, return_m=False, return_geom=True)#

Get features based on a list of field IDs.

Parameters
  • field (str) – Name of the target field that IDs belong to.

  • fids (str or list) – A list of target field ID(s).

  • return_m (bool) – Whether to activate the Return M (measure) in the request, defaults to False.

  • return_geom (bool, optional) – Whether to return the geometry of the feature, defaults to True.

Returns

geopandas.GeoDataFrame – The requested features as a GeoDataFrame.

bysql(self, sql_clause, return_m=False, return_geom=True)#

Get features using a valid SQL 92 WHERE clause.

Notes

Not all web services support this type of query. For more details look here

Parameters
  • sql_clause (str) – A valid SQL 92 WHERE clause.

  • return_m (bool) – Whether to activate the measure in the request, defaults to False.

  • return_geom (bool, optional) – Whether to return the geometry of the feature, defaults to True.

Returns

geopandas.GeoDataFrame – The requested features as a GeoDataFrame.

get_validlayers(self, url)#

Get a list of valid layers.

Parameters

url (str) – The URL of the ArcGIS REST service.

Returns

dict – A dictionary of valid layers.

property service_info(self)#

Get the service information.

class pynhd.core.GeoConnex(item=None)#

Access to the GeoConnex API.

Notes

The geometry field of the query can be a Polygon, MultiPolygon, or tuple/list of length 4 (bbox) in EPSG:4326 CRS. They should be within the extent of the GeoConnex endpoint.

Parameters

item (str, optional) – The target endpoint to query, defaults to None.

property item(self)#

Return the name of the endpoint.

query(self, kwds, skip_geometry=False)#

Query the GeoConnex endpoint.

class pynhd.core.ScienceBase#

Access and explore files on ScienceBase.

static get_children(item)#

Get children items of an item.

static get_file_urls(item)#

Get download and meta URLs of all the available files for an item.

pynhd.core.stage_nhdplus_attrs(parquet_path=None)#

Stage the NHDPlus Attributes database and save to nhdplus_attrs.parquet.

More info can be found here.

Parameters

parquet_path (str or Path) – Path to a file with .parquet extension for saving the processed to disk for later use.

Returns

pandas.DataFrame – The staged data as a DataFrame.

pynhd.network_tools#

Tools for working with NHDPlus river networks.

Module Contents#
pynhd.network_tools.flowline_resample(flw, spacing)#

Resample a flowline based on a given spacing.

Parameters
  • flw (geopandas.GeoDataFrame) – A dataframe with geometry and comid columns and a CRS attribute. The flowlines should be mergeable into a single LineString; otherwise, use the network_resample() function.

  • spacing (float) – Spacing between the sample points in meters.

Returns

geopandas.GeoDataFrame – Resampled flowline.

pynhd.network_tools.flowline_xsection(flw, distance, width)#

Get cross-section of a river network at a given spacing.

Parameters
  • flw (geopandas.GeoDataFrame) – A dataframe with geometry and comid columns and CRS attribute.

  • distance (float) – The distance between two consecutive cross-sections.

  • width (float) – The width of the cross-section.

Returns

geopandas.GeoDataFrame – A dataframe with two columns: geometry and comid. The geometry column contains the cross-section of the river network and the comid column contains the corresponding comid from the input dataframe. Note that each comid can have multiple cross-sections depending on the given spacing distance.

pynhd.network_tools.network_resample(flw, spacing)#

Resample a river network at a given spacing.

Parameters
  • flw (geopandas.GeoDataFrame) – A dataframe with geometry and comid columns and CRS attribute.

  • spacing (float) – The spacing between the points.

Returns

geopandas.GeoDataFrame – Resampled flowlines.

pynhd.network_tools.network_xsection(flw, distance, width)#

Get cross-section of a river network at a given spacing.

Parameters
  • flw (geopandas.GeoDataFrame) – A dataframe with geometry and comid columns and CRS attribute.

  • distance (float) – The distance between two consecutive cross-sections.

  • width (float) – The width of the cross-section.

Returns

geopandas.GeoDataFrame – A dataframe with two columns: geometry and comid. The geometry column contains the cross-section of the river network and the comid column contains the corresponding comid from the input dataframe. Note that each comid can have multiple cross-sections depending on the given spacing distance.

pynhd.network_tools.nhdflw2nx(flowlines, id_col='comid', toid_col='tocomid', edge_attr=None)#

Convert NHDPlus flowline database to networkx graph.

Parameters
  • flowlines (geopandas.GeoDataFrame) – NHDPlus flowlines.

  • id_col (str, optional) – Name of the column containing the node ID, defaults to “comid”.

  • toid_col (str, optional) – Name of the column containing the downstream node ID, defaults to “tocomid”.

  • edge_attr (str, optional) – Name of the column containing the edge attributes, defaults to None. If True, all remaining columns will be used as edge attributes.

Returns

nx.DiGraph – Networkx directed graph of the NHDPlus flowlines.

pynhd.network_tools.prepare_nhdplus(flowlines, min_network_size, min_path_length, min_path_size=0, purge_non_dendritic=False, remove_isolated=False, use_enhd_attrs=False, terminal2nan=True)#

Clean up and fix common issues of NHDPlus flowline database.

Ported from nhdplusTools.

Parameters
  • flowlines (geopandas.GeoDataFrame) – NHDPlus flowlines with at least the following columns: comid, lengthkm, ftype, terminalfl, fromnode, tonode, totdasqkm, startflag, streamorde, streamcalc, terminalpa, pathlength, divergence, hydroseq, levelpathi.

  • min_network_size (float) – Minimum size of drainage network in sqkm

  • min_path_length (float) – Minimum length of terminal level path of a network in km.

  • min_path_size (float, optional) – Minimum size of outlet level path of a drainage basin in km. Drainage basins with an outlet drainage area smaller than this value will be removed. Defaults to 0.

  • purge_non_dendritic (bool, optional) – Whether to remove non dendritic paths, defaults to False.

  • remove_isolated (bool, optional) – Whether to remove isolated flowlines, defaults to False. If True, terminal2nan will be set to False.

  • use_enhd_attrs (bool, optional) – Whether to replace the attributes with the ENHD attributes, defaults to False. For more information, see this.

  • terminal2nan (bool, optional) – Whether to replace the COMID of the terminal flowline of the network with NaN, defaults to True. If False, the terminal COMID will be set from the ENHD attributes i.e. use_enhd_attrs will be set to True.

Returns

geopandas.GeoDataFrame – Cleaned up flowlines. Note that all column names are converted to lower case.

pynhd.network_tools.topoogical_sort(flowlines, edge_attr=None)#

Topological sorting of a river network.

Parameters
  • flowlines (pandas.DataFrame) – A dataframe with columns ID and toID

  • edge_attr (str or list, optional) – Names of the columns in the dataframe to be used as edge attributes, defaults to None.

Returns

(list, dict, networkx.DiGraph) – A list of topologically sorted IDs, a dictionary with keys as IDs and values as their upstream nodes, and the generated networkx object. Note that the terminal node ID is set to pd.NA.
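
Examples

A minimal sketch on a made-up three-segment network, following the NaN-terminal convention noted above:

>>> import pandas as pd
>>> from pynhd.network_tools import topoogical_sort
>>> flw = pd.DataFrame({"ID": [1, 2, 3], "toID": [3, 3, pd.NA]})
>>> sorted_ids, upstream, graph = topoogical_sort(flw)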

pynhd.network_tools.vector_accumulation(flowlines, func, attr_col, arg_cols, id_col='comid', toid_col='tocomid')#

Flow accumulation using vector river network data.

Parameters
  • flowlines (pandas.DataFrame) – A dataframe containing comid, tocomid, attr_col, and all the columns that are required for passing to func.

  • func (function) – The function that routes the flow in a single river segment. The positional arguments should be as follows: func(qin, *arg_cols), where qin is computed in this function and the rest are in the order of arg_cols. For example, if arg_cols = ["slope", "roughness"], then the function is called as func(qin, slope, roughness), where slope and roughness are elemental values read from the flowlines.

  • attr_col (str) – The column name of the attribute being accumulated in the network. The column should contain the initial condition for the attribute for each river segment. It can be a scalar or an array (e.g., time series).

  • arg_cols (list of strs) – List of the flowlines columns that contain all the required data for routing a single river segment, such as slope, length, lateral flow, etc.

  • id_col (str, optional) – Name of the flowlines column containing IDs, defaults to comid

  • toid_col (str, optional) – Name of the flowlines column containing toIDs, defaults to tocomid

Returns

pandas.Series – Accumulated flow for all the nodes. The dataframe is sorted from upstream to downstream (topological sorting). Depending on the given initial condition in the attr_col, the outflow for each river segment can be a scalar or an array.
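
Examples

A minimal sketch on a made-up three-segment network where the routing function simply adds the local inflow to the upstream inflow (column names follow the defaults above; the terminal segment’s tocomid is left as NaN per the convention noted for prepare_nhdplus):

>>> import pandas as pd
>>> from pynhd.network_tools import vector_accumulation
>>> flw = pd.DataFrame(
...     {
...         "comid": [1, 2, 3],
...         "tocomid": [3, 3, pd.NA],
...         "qloc": [1.0, 2.0, 3.0],
...     }
... )
>>> def routing(qin, qloc):
...     return qin + qloc
>>> acc = vector_accumulation(flw, routing, "qloc", ["qloc"])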

pynhd.nhdplus_derived#

Access derived NHDPlus attribute databases.

Module Contents#
pynhd.nhdplus_derived.enhd_attrs(parquet_path=None)#

Get updated NHDPlus attributes from ENHD.

Notes

This downloads a 160 MB parquet file from here. Although this dataframe does not include geometry, it can be linked to other geospatial NHDPlus dataframes through ComIDs.

Parameters

parquet_path (str or Path, optional) – Path to a file with .parquet extension for storing the file, defaults to ./cache/enhd_attrs.parquet.

Returns

pandas.DataFrame – A dataframe that includes ComID-level attributes for 2.7 million NHDPlus flowlines.

pynhd.nhdplus_derived.nhd_fcode()#

Get all the NHDPlus FCodes.

pynhd.nhdplus_derived.nhdplus_attrs(name=None, parquet_path=None)#

Access NHDPlus V2.1 Attributes from ScienceBase over CONUS.

More info can be found here.

Parameters
  • name (str, optional) – Name of the NHDPlus attribute, defaults to None which returns a dataframe containing metadata of all the available attributes in the database.

  • parquet_path (str or Path, optional) – Path to a file with .parquet extension for saving the processed to disk for later use. Defaults to ./cache/nhdplus_attrs.parquet.

Returns

pandas.DataFrame – Either a dataframe containing the database metadata or the requested attribute over CONUS.

pynhd.nhdplus_derived.nhdplus_vaa(parquet_path=None)#

Get NHDPlus Value Added Attributes with ComID-level roughness and slope values.

Notes

This function downloads a 200 MB parquet file from here. Although this dataframe does not include geometry, it can be linked to other geospatial NHDPlus dataframes through ComIDs.

Parameters

parquet_path (str or Path, optional) – Path to a file with .parquet extension for storing the file, defaults to ./cache/nldplus_vaa.parquet.

Returns

pandas.DataFrame – A dataframe that includes ComID-level attributes for 2.7 million NHDPlus flowlines.

Examples

>>> vaa = nhdplus_vaa() 
>>> print(vaa.slope.max()) 
4.6
pynhd.pynhd#

Access NLDI and WaterData databases.

Module Contents#
class pynhd.pynhd.NHD(layer, outfields='*', crs=DEF_CRS)#

Access the National Hydrography Dataset (NHD), both medium and high resolution.

Notes

For more info visit: https://hydro.nationalmap.gov/arcgis/rest/services/nhd/MapServer

Parameters
  • layer (str, optional) – A valid service layer. Layer names with _hr are high resolution and _mr are medium resolution. Also, layer names with _nonconus are for non-CONUS areas, i.e., Alaska, Hawaii, Puerto Rico, the Virgin Islands, and the Pacific Islands. Valid layers are:

    • point

    • point_event

    • line_hr

    • flow_direction

    • flowline_mr

    • flowline_hr_nonconus

    • flowline_hr

    • area_mr

    • area_hr_nonconus

    • area_hr

    • waterbody_mr

    • waterbody_hr_nonconus

    • waterbody_hr

  • outfields (str or list, optional) – Target field name(s), default to “*” i.e., all the fields.

  • crs (str, optional) – Target spatial reference, default to EPSG:4326
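
Examples

A minimal usage sketch (the bounding box is an arbitrary area; bygeom is inherited from AGRBase):

>>> from pynhd import NHD
>>> nhd_mr = NHD("flowline_mr")
>>> flowlines = nhd_mr.bygeom((-69.77, 45.07, -69.31, 45.45))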

class pynhd.pynhd.NHDPlusHR(layer, outfields='*', crs=DEF_CRS)#

Access National Hydrography Dataset (NHD) high resolution.

Notes

For more info visit: https://edits.nationalmap.gov/arcgis/rest/services/nhd/MapServer

Parameters
  • layer (str, optional) – A valid service layer. Valid layers are:

    • point

    • sink

    • flowline

    • non_network_flowline

    • flow_direction

    • line

    • wall

    • burn_line

    • burn_waterbody

    • area

    • waterbody

    • huc12

    • catchment

  • outfields (str or list, optional) – Target field name(s), default to “*” i.e., all the fields.

  • crs (str, optional) – Target spatial reference, default to EPSG:4326

class pynhd.pynhd.NLDI#

Access the Hydro Network-Linked Data Index (NLDI) service.

comid_byloc(self, coords, loc_crs=DEF_CRS)#

Get the closest ComID based on coordinates.

Notes

This function tries to find the closest ComID based on flowline grid cells. If such a cell is not found, it will return the closest ComID using the flowtrace endpoint of the PyGeoAPI service to find the closest downstream ComID. The returned dataframe has a measure column that indicates the location of the input coordinate on the flowline as a percentage of the total flowline length.

Parameters
  • coords (tuple or list of tuples) – A tuple of length two (x, y) or a list of them.

  • loc_crs (str, optional) – The spatial reference of the input coordinate, defaults to EPSG:4326.

Returns

geopandas.GeoDataFrame or (geopandas.GeoDataFrame, list) – NLDI indexed ComID(s) and points in EPSG:4326. If some coords don’t return any ComID a list of missing coords are returned as well.

feature_byloc(self, coords, loc_crs=DEF_CRS)#

Get the closest feature ID(s) based on coordinates.

Parameters
  • coords (tuple or list) – A tuple of length two (x, y) or a list of them.

  • loc_crs (str, optional) – The spatial reference of the input coordinate, defaults to EPSG:4326.

Returns

geopandas.GeoDataFrame or (geopandas.GeoDataFrame, list) – NLDI indexed feature ID(s) and flowlines in EPSG:4326. If some coords don’t return any IDs a list of missing coords are returned as well.

get_basins(self, feature_ids, fsource='nwissite', split_catchment=False, simplified=True)#

Get basins for a list of station IDs.

Parameters
  • feature_ids (str or list) – Target feature ID(s).

  • fsource (str) – The name of feature(s) source, defaults to nwissite. The valid sources are:

    • ‘comid’ for NHDPlus comid.

    • ‘ca_gages’ for Streamgage catalog for CA SB19

    • ‘gfv11_pois’ for USGS Geospatial Fabric V1.1 Points of Interest

    • ‘huc12pp’ for HUC12 Pour Points

    • ‘nmwdi-st’ for New Mexico Water Data Initiative Sites

    • ‘nwisgw’ for NWIS Groundwater Sites

    • ‘nwissite’ for NWIS Surface Water Sites

    • ‘ref_gage’ for geoconnex.us reference gages

    • ‘vigil’ for Vigil Network Data

    • ‘wade’ for Water Data Exchange 2.0 Sites

    • ‘WQP’ for Water Quality Portal

  • split_catchment (bool, optional) – If True, split basins at their outlet locations. Default to False.

  • simplified (bool, optional) – If True, return a simplified version of basin geometries. Default to True.

Returns

geopandas.GeoDataFrame or (geopandas.GeoDataFrame, list) – NLDI indexed basins in EPSG:4326. If some IDs don’t return any features a list of missing ID(s) are returned as well.

get_validchars(self, char_type)#

Get all the available characteristics IDs for a given characteristics type.

getcharacteristic_byid(self, comids, char_type, char_ids='all', values_only=True)#

Get characteristics using a list ComIDs.

Parameters
  • comids (str or list) – The NHDPlus Common Identifier(s).

  • char_type (str) – Type of the characteristic. Valid values are local for individual reach catchments, tot for network-accumulated values using total cumulative drainage area, and div for network-accumulated values using divergence-routed drainage area.

  • char_ids (str or list, optional) – Name(s) of the target characteristics, default to all.

  • values_only (bool, optional) – Whether to return only characteristic_value as a series, default to True. If set to False, percent_nodata is returned as well.

Returns

pandas.DataFrame or tuple of pandas.DataFrame – Either only characteristic_value as a dataframe or, if values_only is False, both characteristic_value and percent_nodata.

getfeature_byid(self, fsource, fid)#

Get feature(s) based ID(s).

Parameters
  • fsource (str) – The name of feature(s) source. The valid sources are:

    • ‘comid’ for NHDPlus comid.

    • ‘ca_gages’ for Streamgage catalog for CA SB19

    • ‘gfv11_pois’ for USGS Geospatial Fabric V1.1 Points of Interest

    • ‘huc12pp’ for HUC12 Pour Points

    • ‘nmwdi-st’ for New Mexico Water Data Initiative Sites

    • ‘nwisgw’ for NWIS Groundwater Sites

    • ‘nwissite’ for NWIS Surface Water Sites

    • ‘ref_gage’ for geoconnex.us reference gages

    • ‘vigil’ for Vigil Network Data

    • ‘wade’ for Water Data Exchange 2.0 Sites

    • ‘WQP’ for Water Quality Portal

  • fid (str or list of str) – Feature ID(s).

Returns

geopandas.GeoDataFrame or (geopandas.GeoDataFrame, list) – NLDI indexed features in EPSG:4326. If some IDs don’t return any features a list of missing ID(s) are returned as well.

navigate_byid(self, fsource, fid, navigation, source, distance=500, trim_start=False)#

Navigate the NHDPlus database from a single feature id up to a distance.

Parameters
  • fsource (str) – The name of feature(s) source. The valid sources are:

    • ‘comid’ for NHDPlus comid.

    • ‘ca_gages’ for Streamgage catalog for CA SB19

    • ‘gfv11_pois’ for USGS Geospatial Fabric V1.1 Points of Interest

    • ‘huc12pp’ for HUC12 Pour Points

    • ‘nmwdi-st’ for New Mexico Water Data Initiative Sites

    • ‘nwisgw’ for NWIS Groundwater Sites

    • ‘nwissite’ for NWIS Surface Water Sites

    • ‘ref_gage’ for geoconnex.us reference gages

    • ‘vigil’ for Vigil Network Data

    • ‘wade’ for Water Data Exchange 2.0 Sites

    • ‘WQP’ for Water Quality Portal

  • fid (str) – The ID of the feature.

  • navigation (str) – The navigation method.

  • source (str, optional) – Return the data from another source after navigating the features using fsource, defaults to None.

  • distance (int, optional) – Limit the search for navigation up to a distance in km, defaults to 500 km. Note that this is an expensive request, so be mindful of the value you provide. The value must be between 1 and 9999 km.

  • trim_start (bool, optional) – If True, trim the starting flowline at the source feature, defaults to False.

Returns

geopandas.GeoDataFrame – NLDI indexed features in EPSG:4326.

navigate_byloc(self, coords, navigation=None, source=None, loc_crs=DEF_CRS, distance=500, trim_start=False)#

Navigate the NHDPlus database from a coordinate.

Parameters
  • coords (tuple) – A tuple of length two (x, y).

  • navigation (str, optional) – The navigation method, defaults to None which throws an exception if comid_only is False.

  • source (str, optional) – Return the data from another source after navigating the features using fsource, defaults to None which throws an exception if comid_only is False.

  • loc_crs (str, optional) – The spatial reference of the input coordinate, defaults to EPSG:4326.

  • distance (int, optional) – Limit the search for navigation up to a distance in km, defaults to 500 km. Note that this is an expensive request, so be mindful of the value you provide. If you want to get all the available features, you can pass a large distance like 9999999.

  • trim_start (bool, optional) – If True, trim the starting flowline at the source feature, defaults to False.

Returns

geopandas.GeoDataFrame – NLDI indexed features in EPSG:4326.

class pynhd.pynhd.PyGeoAPI#

Access PyGeoAPI service.

cross_section(self, coord, width, numpts, crs=DEF_CRS)#

Return a GeoDataFrame from the xsatpoint service.

Parameters
  • coord (tuple) – The coordinate of the point to extract the cross-section as a tuple, e.g., (lon, lat).

  • width (float) – The width of the cross-section in meters.

  • numpts (int) – The number of points to extract the cross-section from the DEM.

  • crs (str, optional) – The coordinate reference system of the coordinates, defaults to EPSG:4326.

Returns

geopandas.GeoDataFrame – A GeoDataFrame containing the cross-section at the requested point.

Examples

>>> from pynhd import PyGeoAPI
>>> pygeoapi = PyGeoAPI()
>>> gdf = pygeoapi.cross_section((-103.80119, 40.2684), width=1000.0, numpts=101, crs=DEF_CRS)  
>>> print(gdf.iloc[-1, 1])  
1000.0
elevation_profile(self, coords, numpts, dem_res, crs=DEF_CRS)#

Return a GeoDataFrame from the xsatendpts service.

Parameters
  • coords (list) – A list of two coordinates to trace as a list of tuples, e.g., [(lon1, lat1), (lon2, lat2)].

  • numpts (int) – The number of points to extract the elevation profile from the DEM.

  • dem_res (int) – The target resolution for requesting the DEM from 3DEP service.

  • crs (str, optional) – The coordinate reference system of the coordinates, defaults to EPSG:4326.

Returns

geopandas.GeoDataFrame – A GeoDataFrame containing the elevation profile along the requested endpoints.

Examples

>>> from pynhd import PyGeoAPI
>>> pygeoapi = PyGeoAPI()
>>> gdf = pygeoapi.elevation_profile(
...     [(-103.801086, 40.26772), (-103.80097, 40.270568)], numpts=101, dem_res=1, crs=DEF_CRS
... )  
>>> print(gdf.iloc[-1, 1])  
411.5906
flow_trace(self, coord, crs=DEF_CRS, direction='none')#

Return a GeoDataFrame from the flowtrace service.

Parameters
  • coord (tuple) – The coordinate of the point to trace as a tuple, e.g., (lon, lat).

  • crs (str) – The coordinate reference system of the coordinates, defaults to EPSG:4326.

  • direction (str, optional) – The direction of flowpaths, either down, up, or none. Defaults to none.

Returns

geopandas.GeoDataFrame – A GeoDataFrame containing the traced flowline.

Examples

>>> from pynhd import PyGeoAPI
>>> pygeoapi = PyGeoAPI()
>>> gdf = pygeoapi.flow_trace(
...     (1774209.63, 856381.68), crs="ESRI:102003", direction="none"
... )  
>>> print(gdf.comid.iloc[0])  
22294818
split_catchment(self, coord, crs=DEF_CRS, upstream=False)#

Return a GeoDataFrame from the splitcatchment service.

Parameters
  • coord (tuple) – The coordinate of the point to trace as a tuple, e.g., (lon, lat).

  • crs (str, optional) – The coordinate reference system of the coordinates, defaults to EPSG:4326.

  • upstream (bool, optional) – If True, return all upstream catchments rather than just the local catchment, defaults to False.

Returns

geopandas.GeoDataFrame – A GeoDataFrame containing the local catchment or the entire upstream catchments.

Examples

>>> from pynhd import PyGeoAPI
>>> pygeoapi = PyGeoAPI()
>>> gdf = pygeoapi.split_catchment((-73.82705, 43.29139), crs=DEF_CRS, upstream=False)  
>>> print(gdf.catchmentID.iloc[0])  
22294818
class pynhd.pynhd.WaterData(layer, crs=DEF_CRS, validation=True)#

Access to Water Data service.

Parameters
  • layer (str) – A valid layer from the WaterData service. Valid layers are: nhdarea, nhdwaterbody, catchmentsp, nhdflowline_network, gagesii, huc08, huc12, huc12agg, and huc12all. Note that the layers’ workspace for the WaterData service is wmadata, which will be prepended to the given layer argument if it is not provided.

  • crs (str, optional) – The target spatial reference system, defaults to epsg:4326.

  • validation (bool, optional) – Whether to validate the input data, defaults to True.

bybox(self, bbox, box_crs=DEF_CRS)#

Get features within a bounding box.

bydistance(self, coords, distance, loc_crs=DEF_CRS)#

Get features within a radius (in meters) of a point.

byfilter(self, cql_filter, method='GET')#

Get features based on a CQL filter.

bygeom(self, geometry, geo_crs=DEF_CRS, xy=True, predicate='INTERSECTS')#

Get features within a geometry.

Parameters
  • geometry (shapely.geometry) – The input geometry

  • geo_crs (str, optional) – The CRS of the input geometry, default to epsg:4326.

  • xy (bool, optional) – Whether axis order of the input geometry is xy or yx.

  • predicate (str, optional) – The geometric predicate to use for requesting the data, defaults to INTERSECTS. Valid predicates are: EQUALS, DISJOINT, INTERSECTS, TOUCHES, CROSSES, WITHIN, CONTAINS, OVERLAPS, RELATE, and BEYOND.

Returns

geopandas.GeoDataFrame – The requested features in the given geometry.

byid(self, featurename, featureids)#

Get features based on IDs.

pynhd.pynhd.geoconnex(item=None, query=None, skip_geometry=False)#

Query the GeoConnex API.

Notes

If you run the function without any arguments, it will print out a list of available endpoints. If you run the function with item but no query, it will print out the description, queryable fields, and extent of the selected endpoint (item).

Parameters
  • item (str, optional) – The item to query.

  • query (dict, optional) – Query parameters. The geometry field can be a Polygon, MultiPolygon, or tuple/list of length 4 (bbox) in EPSG:4326 CRS.

  • skip_geometry (bool, optional) – If True, the geometry will not be returned.

Returns

geopandas.GeoDataFrame – The data.
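
Examples

A minimal sketch (the gages endpoint and bounding box are illustrative; run geoconnex() with no arguments to list the available endpoints):

>>> from pynhd import geoconnex
>>> gages = geoconnex(
...     item="gages",
...     query={"geometry": (-69.77, 45.07, -69.31, 45.45)},
... )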

pynhd.pynhd.pygeoapi(coords, service)#

Return a GeoDataFrame from the flowtrace service.

Parameters
  • coords (geopandas.GeoDataFrame) – A GeoDataFrame containing the coordinates to query. The required columns for each service are:

    • flow_trace: direction that indicates the direction of the flow trace. It can be up, down, or none.

    • split_catchment: upstream that indicates whether to return all upstream catchments or just the local catchment.

    • elevation_profile: numpts that indicates the number of points to extract along the flowpath and 3dep_res that indicates the target resolution for requesting the DEM from 3DEP service.

    • cross_section: numpts that indicates the number of points to extract along the flowpath and width that indicates the width of the cross-section in meters.

  • service (str) – The service to query, can be flow_trace, split_catchment, elevation_profile, or cross_section.

Returns

geopandas.GeoDataFrame – A GeoDataFrame containing the results of requested service.

Examples

>>> import geopandas as gpd
>>> import pynhd
>>> from shapely.geometry import Point
>>> gdf = gpd.GeoDataFrame(
...     {"direction": ["none"]},
...     geometry=[Point(1774209.63, 856381.68)],
...     crs="ESRI:102003",
... )
>>> trace = pynhd.pygeoapi(gdf, "flow_trace")
>>> print(trace.comid.iloc[0])
22294818

Package Contents#

pygeohydro#

Submodules#

pygeohydro.helpers#

Some helper functions for PyGeoHydro.

Module Contents#
pygeohydro.helpers.nlcd_helper()#

Get legends and properties of the NLCD cover dataset.

Returns

dict – Years where data is available and cover classes and categories, and roughness estimations.

pygeohydro.helpers.nwis_errors()#

Get error code lookup table for USGS sites that have daily values.

pygeohydro.plot#

Plot hydrological signatures.

Plots include daily, monthly, and annual hydrographs, as well as the regime curve (monthly mean) and the flow duration curve.

Module Contents#
pygeohydro.plot.exceedance(daily, threshold=0.001)#

Compute exceedance probability from daily data.

Parameters
  • daily (pandas.Series or pandas.DataFrame) – The data to be processed

  • threshold (float, optional) – The threshold to compute exceedance probability, defaults to 1e-3.

Returns

pandas.Series or pandas.DataFrame – Exceedance probability.

pygeohydro.plot.mean_monthly(daily, index_abbr=False)#

Compute mean monthly summary from daily data.

Parameters

daily (pandas.Series or pandas.DataFrame) – The data to be processed

Returns

pandas.Series or pandas.DataFrame – Mean monthly summary.

pygeohydro.plot.prepare_plot_data(daily)#

Generate structured data for plotting hydrologic signatures.

Parameters

daily (pandas.Series or pandas.DataFrame) – The data to be processed

Returns

PlotDataType – A named tuple containing daily, mean_monthly, ranked, titles, and units fields.

pygeohydro.plot.signatures(discharge, precipitation=None, title=None, threshold=0.001, output=None, close=False)#

Plot hydrological signatures with or without precipitation.

Plots include the daily hydrograph, regime curve (mean monthly), and flow duration curve. The input discharge is converted from cms to mm/day based on the watershed area, if provided.

Parameters
  • discharge (pd.DataFrame or pd.Series) – The streamflows in mm/day. The column names are used as labels on the plot and the column values should be daily streamflow.

  • precipitation (pd.Series, optional) – Daily precipitation time series in mm/day. If given, the data is plotted on the second x-axis at the top.

  • title (str, optional) – The plot supertitle.

  • threshold (float, optional) – The threshold for cutting off the discharge for the flow duration curve to deal with the log(0) issue, defaults to \(10^{-3}\) mm/day.

  • output (str, optional) – Path to save the plot as png, defaults to None which means the plot is not saved to a file.

  • close (bool, optional) – Whether to close the figure.
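
Examples

A minimal sketch that pairs this function with NWIS.get_streamflow, documented later in this reference (the gage ID and dates are illustrative):

>>> from pygeohydro import NWIS, plot
>>> qobs = NWIS().get_streamflow("01031500", ("2000-01-01", "2000-12-31"))
>>> plot.signatures(qobs, title="Streamflow signatures of USGS-01031500")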

pygeohydro.pygeohydro#

Accessing data from the supported databases through their APIs.

Module Contents#
class pygeohydro.pygeohydro.NID#

Retrieve data from the National Inventory of Dams web service.

get_byfilter(self, query_list)#

Query dams by filters from the National Inventory of Dams web service.

Parameters

query_list (list of dict) – List of dictionaries of query parameters. For an exhaustive list of the parameters, use the advanced fields dataframe that can be accessed via NID().fields_meta. Some filters require min/max values, such as damHeight and drainageArea. For such filters, the min/max values should be passed like so: {filter_key: ["[min1 max1]", "[min2 max2]"]}.

Returns

geopandas.GeoDataFrame – Query results.

Examples

>>> from pygeohydro import NID
>>> nid = NID()
>>> query_list = [
...    {"drainageArea": ["[200 500]"]},
...    {"nidId": ["CA01222"]},
... ]
>>> dam_dfs = nid.get_byfilter(query_list)
>>> print(dam_dfs[0].name[0])
Prairie Portage
get_bygeom(self, geometry, geo_crs)#

Retrieve NID data within a geometry.

Parameters
  • geometry (Polygon, MultiPolygon, or tuple of length 4) – Geometry or bounding box (west, south, east, north) for extracting the data.

  • geo_crs (str) – The CRS of the input geometry, defaults to epsg:4326.

Returns

geopandas.GeoDataFrame – GeoDataFrame of NID data

Examples

>>> from pygeohydro import NID
>>> nid = NID()
>>> dams = nid.get_bygeom((-69.77, 45.07, -69.31, 45.45), "epsg:4326")
>>> print(dams.name.iloc[0])
Little Moose
get_suggestions(self, text, context_key='')#

Get suggestions from the National Inventory of Dams web service.

Notes

This function is useful for exploring and/or narrowing down the filter fields that are needed to query the dams using get_byfilter.

Parameters
  • text (str) – Text to query for suggestions.

  • context_key (str, optional) – Suggestion context, defaults to empty string, i.e., all context keys. For a list of valid context keys, see NID().fields_meta.

Returns

tuple of pandas.DataFrame – The suggestions for the requested text as two DataFrames: First, is suggestions found in the dams properties and second, those found in the query fields such as states, huc6, etc.

Examples

>>> from pygeohydro import NID
>>> nid = NID()
>>> dams, contexts = nid.get_suggestions("texas", "city")
>>> print(contexts.loc["CITY", "value"])
Texas City
inventory_byid(self, dam_ids)#

Get extra attributes for dams based on their dam ID.

Notes

This function is meant to be used for getting extra attributes for dams. For example, first use either get_bygeom or get_byfilter to get the basic attributes of the target dams, then use this function to get extra attributes via the id column of the GeoDataFrame that get_bygeom or get_byfilter returns.

Parameters

dam_ids (list of int or str) – List of the target dam IDs (digits only). Note that the dam IDs are not the same as the NID IDs.

Returns

pandas.DataFrame – Dams with extra attributes in addition to the standard NID fields that other NID methods return.

Examples

>>> from pygeohydro import NID
>>> nid = NID()
>>> dams = nid.inventory_byid([514871, 459170, 514868, 463501, 463498])
>>> print(dams.damHeight.max())
120.0
class pygeohydro.pygeohydro.WBD(layer, outfields='*', crs=DEF_CRS)#

Access Watershed Boundary Dataset (WBD).

Notes

This file contains Hydrologic Unit (HU) polygon boundaries for the United States, Puerto Rico, and the U.S. Virgin Islands. For more info visit: https://hydro.nationalmap.gov/arcgis/rest/services/wbd/MapServer

Parameters
  • layer (str, optional) – A valid service layer. Valid layers are:

    • wbdline

    • huc2

    • huc4

    • huc6

    • huc8

    • huc10

    • huc12

    • huc14

    • huc16

  • outfields (str or list, optional) – Target field name(s), default to “*” i.e., all the fields.

  • crs (str, optional) – Target spatial reference, default to EPSG:4326.

pygeohydro.pygeohydro.cover_statistics(cover_da)#

Percentages of the categorical NLCD cover data.

Parameters

cover_da (xarray.DataArray) – Land cover DataArray from a LULC Dataset from the nlcd_bygeom function.

Returns

Stats – A named tuple with the percentages of the cover classes and categories.

pygeohydro.pygeohydro.get_camels()#

Get streamflow and basin attributes of all 671 stations in the CAMELS dataset.

Notes

For more info on CAMELS visit: https://ral.ucar.edu/solutions/products/camels

Returns

tuple of geopandas.GeoDataFrame and xarray.Dataset – The first is basin attributes as a geopandas.GeoDataFrame and the second is streamflow data and basin attributes as an xarray.Dataset.

pygeohydro.pygeohydro.nlcd_bycoords(coords, years=None, region='L48', ssl=None)#

Get data from NLCD database (2019).

Parameters
  • coords (list of tuple) – List of coordinates in the form of (longitude, latitude).

  • years (dict, optional) – The years for NLCD layers as a dictionary, defaults to {'impervious': [2019], 'cover': [2019], 'canopy': [2019], "descriptor": [2019]}. Layers that are not in years are ignored, e.g., {'cover': [2016, 2019]} returns land cover data for 2016 and 2019.

  • region (str, optional) – Region in the US, defaults to L48. Valid values are L48 (for CONUS), HI (for Hawaii), AK (for Alaska), and PR (for Puerto Rico). Both lower and upper cases are acceptable.

  • ssl (bool or SSLContext, optional) – SSLContext to use for the connection, defaults to None. Set to False to disable SSL certification verification.

Returns

geopandas.GeoDataFrame – A GeoDataFrame with the NLCD data and the coordinates.

pygeohydro.pygeohydro.nlcd_bygeom(geometry, resolution, years=None, region='L48', crs=DEF_CRS, ssl=None)#

Get data from NLCD database (2019).

Parameters
  • geometry (geopandas.GeoDataFrame or geopandas.GeoSeries) – A GeoDataFrame or GeoSeries with the geometry to query. The indices are used as keys in the output dictionary.

  • resolution (float) – The data resolution in meters. The width and height of the output are computed in pixel based on the geometry bounds and the given resolution.

  • years (dict, optional) – The years for NLCD layers as a dictionary, defaults to {'impervious': [2019], 'cover': [2019], 'canopy': [2019], "descriptor": [2019]}. Layers that are not in years are ignored, e.g., {'cover': [2016, 2019]} returns land cover data for 2016 and 2019.

  • region (str, optional) – Region in the US, defaults to L48. Valid values are L48 (for CONUS), HI (for Hawaii), AK (for Alaska), and PR (for Puerto Rico). Both lower and upper cases are acceptable.

  • crs (str, optional) – The spatial reference system to be used for requesting the data, defaults to epsg:4326.

  • ssl (bool or SSLContext, optional) – SSLContext to use for the connection, defaults to None. Set to False to disable SSL certification verification.

Returns

dict of xarray.Dataset or xarray.Dataset – A single or a dict of NLCD datasets. If dict, the keys are indices of the input GeoDataFrame.
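
Examples

A minimal sketch using a basin from pynhd.NLDI (the 100 m resolution and the year are illustrative):

>>> import pygeohydro as gh
>>> from pynhd import NLDI
>>> basin = NLDI().get_basins("01031500")
>>> lulc = gh.nlcd_bygeom(basin, 100, years={"cover": [2019]})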

pygeohydro.pygeohydro.overland_roughness(cover_da)#

Estimate overland roughness from land cover data.

Parameters

cover_da (xarray.DataArray) – Land cover DataArray from a LULC Dataset from the nlcd_bygeom function.

Returns

xarray.DataArray – Overland roughness

pygeohydro.pygeohydro.ssebopeta_bycoords(coords, dates, crs=DEF_CRS)#

Daily actual ET for a dataframe of coords from SSEBop database in mm/day.

Parameters
  • coords (pandas.DataFrame) – A dataframe with id, x, y columns.

  • dates (tuple or list, optional) – Start and end dates as a tuple (start, end) or a list of years [2001, 2010, …].

  • crs (str, optional) – The CRS of the input coordinates, defaults to epsg:4326.

Returns

xarray.Dataset – Daily actual ET in mm/day as a dataset with time and location_id dimensions. The location_id dimension is the same as the id column in the input dataframe.

pygeohydro.pygeohydro.ssebopeta_bygeom(geometry, dates, geo_crs=DEF_CRS)#

Get daily actual ET for a region from SSEBop database.

Notes

Since there’s still no web service available for subsetting SSEBop, the data first needs to be downloaded for the requested period and is then masked by the region of interest locally. Therefore, it’s not as fast as other functions, and the bottleneck could be the download speed.

Parameters
  • geometry (shapely.geometry.Polygon or tuple) – The geometry for downloading and clipping the data. For a tuple bbox, the order should be (west, south, east, north).

  • dates (tuple or list, optional) – Start and end dates as a tuple (start, end) or a list of years [2001, 2010, …].

  • geo_crs (str, optional) – The CRS of the input geometry, defaults to epsg:4326.

Returns

xarray.DataArray – Daily actual ET within a geometry in mm/day at 1 km resolution

pygeohydro.pygeohydro.ssebopeta_byloc(coords, dates)#

Daily actual ET for a location from SSEBop database in mm/day.

Deprecated since version 0.11.5: Use ssebopeta_bycoords() instead. For now, this function calls ssebopeta_bycoords() but retains the same behavior, i.e., it returns a dataframe and accepts only a single coordinate, whereas the new function returns an xarray.Dataset and accepts a dataframe containing coordinates.

Parameters
  • coords (tuple) – Longitude and latitude of a single location as a tuple (lon, lat)

  • dates (tuple or list, optional) – Start and end dates as a tuple (start, end) or a list of years [2001, 2010, …].

Returns

pandas.Series – Daily actual ET for a location

pygeohydro.us_abbrs#

Abbreviations of US states and territories from the us package.

pygeohydro.waterdata#

Accessing data from the supported databases through their APIs.

Module Contents#
class pygeohydro.waterdata.NWIS#

Access NWIS web service.

get_info(self, queries, expanded=False, fix_names=True)#

Send multiple queries to USGS Site Web Service.

Parameters
  • queries (dict or list of dict) – A single or a list of valid queries.

  • expanded (bool, optional) – Whether to get expanded site information, e.g., drainage area, defaults to False.

  • fix_names (bool, optional) – If True, reformat station names and fix some small annoyances, defaults to True.

Returns

geopandas.GeoDataFrame – A correctly typed GeoDataFrame containing site(s) information.

get_parameter_codes(self, keyword)#

Search for parameter codes by name or number.

Notes

NWIS guideline for keywords is as follows:

By default an exact search is made. To make a partial search the term should be prefixed and suffixed with a % sign. The % sign matches zero or more characters at the location. For example, to find all with “discharge” enter %discharge% in the field. % will match any number of characters (including zero characters) at the location.

Parameters

keyword (str) – Keyword to search for parameters by name or number.

Returns

pandas.DataFrame – Matched parameter codes as a dataframe with their description.

Examples

>>> from pygeohydro import NWIS
>>> nwis = NWIS()
>>> codes = nwis.get_parameter_codes("%discharge%")
>>> codes.loc[codes.parameter_cd == "00060", "parm_nm"][0]
'Discharge, cubic feet per second'
get_streamflow(self, station_ids, dates, freq='dv', mmd=False, to_xarray=False)#

Get mean daily streamflow observations from USGS.

Parameters
  • station_ids (str, list) – The gage ID(s) of the USGS station.

  • dates (tuple) – Start and end dates as a tuple (start, end).

  • freq (str, optional) – The frequency of the streamflow data, defaults to dv (daily values). Valid frequencies are dv (daily values), iv (instantaneous values). Note that for iv the time zone for the input dates is assumed to be UTC.

  • mmd (bool, optional) – Convert cms to mm/day based on the contributing drainage area of the stations. Defaults to False.

  • to_xarray (bool, optional) – Whether to return a xarray.Dataset. Defaults to False.

Returns

pandas.DataFrame or xarray.Dataset – Streamflow data observations in cubic meter per second (cms). The stations that don’t provide the requested discharge data in the target period will be dropped. Note that when frequency is set to iv the time zone is converted to UTC.
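
Examples

A minimal usage sketch (the gage ID and dates are illustrative):

>>> from pygeohydro import NWIS
>>> nwis = NWIS()
>>> qobs = nwis.get_streamflow("01031500", ("2000-01-01", "2000-12-31"))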

retrieve_rdb(self, url, payloads)#

Retrieve and process requests with RDB format.

Parameters
  • url (str) – Name of USGS REST service, valid values are site, dv, iv, gwlevels, and stat. Please consult USGS documentation here for more information.

  • payloads (list of dict) – List of target payloads.

Returns

pandas.DataFrame – Requested features as a pandas DataFrame.

class pygeohydro.waterdata.WaterQuality#

Water Quality Web Service https://www.waterqualitydata.us.

Notes

This class has a number of convenience methods for retrieving data from the Water Quality Portal. Since there are many parameter combinations that can be used to retrieve data, a general method is also provided to retrieve data from any of the valid endpoints. You can use get_json to retrieve station info as a geopandas.GeoDataFrame or get_csv to retrieve station data as a pandas.DataFrame. You can construct a dictionary of the parameters and pass it to one of these functions. For more information on the parameters, please consult the Water Quality Data documentation.
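
Examples

A minimal sketch of querying stations within a bounding box (the characteristicName keyword and the bounding box are illustrative):

>>> from pygeohydro import WaterQuality
>>> wq = WaterQuality()
>>> kwds = {"characteristicName": "Caffeine"}
>>> stations = wq.station_bybbox((-92.8, 44.2, -88.9, 46.0), kwds)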

data_bystation(self, station_ids, wq_kwds)#

Retrieve data for a single station.

Parameters
  • station_ids (str or list of str) – Station ID(s). The IDs should have the format “Agency code-Station ID”.

  • wq_kwds (dict, optional) – Water Quality Web Service keyword arguments. Default to None.

Returns

pandas.DataFrame – DataFrame of data for the stations.

get_csv(self, endpoint, kwds, request_method='GET')#

Get the CSV response from the Water Quality Web Service.

Parameters
  • endpoint (str) – Endpoint of the Water Quality Web Service.

  • kwds (dict) – Water Quality Web Service keyword arguments.

  • request_method (str, optional) – HTTP request method. Default to GET.

Returns

pandas.DataFrame – The web service response as a DataFrame.

get_json(self, endpoint, kwds, request_method='GET')#

Get the JSON response from the Water Quality Web Service.

Parameters
  • endpoint (str) – Endpoint of the Water Quality Web Service.

  • kwds (dict) – Water Quality Web Service keyword arguments.

  • request_method (str, optional) – HTTP request method. Default to GET.

Returns

geopandas.GeoDataFrame – The web service response as a GeoDataFrame.

get_param_table(self)#

Get the parameter table from the USGS Water Quality Web Service.

lookup_domain_values(self, endpoint)#

Get the domain values for the target endpoint.

station_bybbox(self, bbox, wq_kwds)#

Retrieve station info within bounding box.

Parameters
  • bbox (tuple of float) – Bounding box coordinates (west, south, east, north) in epsg:4326.

  • wq_kwds (dict, optional) – Water Quality Web Service keyword arguments. Default to None.

Returns

geopandas.GeoDataFrame – GeoDataFrame of station info within the bounding box.

station_bydistance(self, lon, lat, radius, wq_kwds)#

Retrieve station within a radius (decimal miles) of a point.

Parameters
  • lon (float) – Longitude of point.

  • lat (float) – Latitude of point.

  • radius (float) – Radius (decimal miles) of search.

  • wq_kwds (dict, optional) – Water Quality Web Service keyword arguments. Default to None.

Returns

geopandas.GeoDataFrame – GeoDataFrame of station info within the radius of the point.

pygeohydro.waterdata.interactive_map(bbox, crs=DEF_CRS, nwis_kwds=None)#

Generate an interactive map including all USGS stations within a bounding box.

Parameters
  • bbox (tuple) – List of corners in this order (west, south, east, north)

  • crs (str, optional) – CRS of the input bounding box, defaults to EPSG:4326.

  • nwis_kwds (dict, optional) – Optional keywords to include in the NWIS request as a dictionary like so: {"hasDataTypeCd": "dv,iv", "outputDataTypeCd": "dv,iv", "parameterCd": "06000"}. Default to None.

Returns

folium.Map – Interactive map within a bounding box.

Examples

>>> import pygeohydro as gh
>>> nwis_kwds = {"hasDataTypeCd": "dv,iv", "outputDataTypeCd": "dv,iv"}
>>> m = gh.interactive_map((-69.77, 45.07, -69.31, 45.45), nwis_kwds=nwis_kwds)
>>> n_stations = len(m.to_dict()["children"]) - 1
>>> n_stations
10

Package Contents#

py3dep#

Top-level package for Py3DEP.

Submodules#

py3dep.py3dep#

Get data from 3DEP database.

Module Contents#
py3dep.py3dep.check_3dep_availability(bbox, crs=DEF_CRS)#

Query 3DEP’s resolution availability within a bounding box.

This function checks the availability of 3DEP’s DEM at the following resolutions: 1 m, 3 m, 5 m, 10 m, 30 m, 60 m, and topobathy (integrated topobathymetry).

Parameters
  • bbox (tuple) – Bounding box as tuple of (min_x, min_y, max_x, max_y).

  • crs (str or pyproj.CRS, optional) – Spatial reference (CRS) of bbox, defaults to EPSG:4326.

Returns

dict – True if bbox intersects 3DEP elevation for each available resolution. Keys are the supported resolutions and values are their availability.

Examples

>>> import py3dep
>>> bbox = (-69.77, 45.07, -69.31, 45.45)
>>> py3dep.check_3dep_availability(bbox)
{'1m': True, '3m': False, '5m': False, '10m': True, '30m': True, '60m': False, 'topobathy': False}
py3dep.py3dep.elevation_bycoords(coords, crs=DEF_CRS, source='tep')#

Get elevation for a list of coordinates.

Parameters
  • coords (list of tuple) – Coordinates of target location as list of tuples [(x, y), ...].

  • crs (str or pyproj.CRS, optional) – Spatial reference (CRS) of coords, defaults to EPSG:4326.

  • source (str, optional) – Data source to be used, defaults to tep. Supported sources are airmap (30 m resolution), tnm (The National Map’s Bulk Point Query Service with 10 m resolution), and tep (3DEP’s WMS service at 10 m resolution). The tnm and tep sources are more accurate since they use the 1/3 arc-second DEM layer from the 3DEP service, but they are limited to the US and both tend to be slower than the Airmap service. Note that tnm is a bit unstable, so it’s recommended to use tep unless 10 m accuracy is unnecessary, in which case airmap is more appropriate.

Returns

list of float – Elevation in meter.
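
Examples

A minimal sketch querying the 10-m 3DEP layer for two points; the returned values depend on the service:

>>> import py3dep
>>> coords = [(-69.77, 45.07), (-69.31, 45.45)]
>>> elev = py3dep.elevation_bycoords(coords, crs="epsg:4326", source="tep")
>>> len(elev)
2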

py3dep.py3dep.elevation_bygrid(xcoords, ycoords, crs, resolution, depression_filling=False)#

Get elevation from DEM data for a grid.

This function is intended for getting elevations for a gridded dataset.

Parameters
  • xcoords (list) – List of x-coordinates of a grid.

  • ycoords (list) – List of y-coordinates of a grid.

  • crs (str or pyproj.CRS) – The spatial reference system of the input grid.

  • resolution (float) – The target resolution of the output in meters. The 10-m resolution is the highest available resolution that covers CONUS. Note that higher resolutions increase the computation time, so choose this value with caution.

  • depression_filling (bool, optional) – Fill depressions before sampling using RichDEM package, defaults to False.

Returns

xarray.DataArray – Elevations of the input coordinates as a xarray.DataArray.
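
Examples

A minimal sketch for a small grid in a projected CRS, so the requested 10-m resolution applies directly; the coordinates here are illustrative:

>>> import py3dep
>>> xcoords = [-7766049.7, -7763049.7, -7760049.7]
>>> ycoords = [5691929.7, 5694929.7]
>>> dem = py3dep.elevation_bygrid(xcoords, ycoords, "epsg:3857", resolution=10)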

py3dep.py3dep.elevation_profile(lines, spacing, dem_res=10, crs=DEF_CRS)#

Get the elevation profile along a line at a given uniform spacing.

This function converts the line to a B-spline and then calculates the elevation along the spline at a given uniform spacing.

Parameters
  • lines (LineString or MultiLineString) – Line segment(s) to be profiled. If its type is MultiLineString, it will be converted to a single LineString, and if this operation fails, an InvalidInputType will be raised.

  • spacing (float) – Spacing between the sample points along the line in meters.

  • dem_res (float, optional) – Resolution of the DEM source to use in meter, defaults to 10.

  • crs (str or pyproj.CRS, optional) – Spatial reference (CRS) of lines, defaults to EPSG:4326.

Returns

xarray.DataArray – Elevation profile with dimension z and three coordinates: x, y, and distance. The distance coordinate is the distance from the start of the line in meters.
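
Examples

A minimal sketch profiling a straight line at 1-km spacing; the line is illustrative:

>>> from shapely.geometry import LineString
>>> import py3dep
>>> line = LineString([(-69.77, 45.07), (-69.31, 45.45)])
>>> profile = py3dep.elevation_profile(line, spacing=1000, dem_res=10, crs="epsg:4326")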

py3dep.py3dep.get_map(layers, geometry, resolution, geo_crs=DEF_CRS, crs=DEF_CRS)#

Access to 3DEP service.

The 3DEP service has multi-resolution sources, so depending on the user provided resolution the data is resampled on server-side based on all the available data sources. The following layers are available:

  • DEM

  • Hillshade Gray

  • Aspect Degrees

  • Aspect Map

  • GreyHillshade_elevationFill

  • Hillshade Multidirectional

  • Slope Map

  • Slope Degrees

  • Hillshade Elevation Tinted

  • Height Ellipsoidal

  • Contour 25

  • Contour Smoothed 25

Parameters
  • layers (str or list of str) – A valid 3DEP layer or a list of them.

  • geometry (Polygon, MultiPolygon, or tuple) – A shapely Polygon or a bounding box of the form (west, south, east, north).

  • resolution (float) – The target resolution in meters. The width and height of the output are computed in pixels based on the geometry bounds and the given resolution.

  • geo_crs (str, optional) – The spatial reference system of the input geometry, defaults to EPSG:4326.

  • crs (str, optional) – The spatial reference system to be used for requesting the data, defaults to EPSG:4326. Valid values are EPSG:4326, EPSG:3576, EPSG:3571, EPSG:3575, EPSG:3857, EPSG:3572, CRS:84, EPSG:3573, and EPSG:3574.

Returns

xarray.DataArray or xarray.Dataset – The requested topographic data as an xarray.DataArray or xarray.Dataset.
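
Examples

A minimal sketch requesting the DEM layer over a bounding box at 30-m resolution and reprojecting the output to Web Mercator:

>>> import py3dep
>>> bbox = (-69.77, 45.07, -69.31, 45.45)
>>> dem = py3dep.get_map("DEM", bbox, resolution=30, geo_crs="epsg:4326", crs="epsg:3857")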

py3dep.py3dep.query_3dep_sources(bbox, crs=DEF_CRS, res=None)#

Query 3DEP’s data sources within a bounding box.

This function queries the availability of the underlying data that 3DEP uses at the following resolutions: 1 m, 3 m, 5 m, 10 m, 30 m, 60 m, and topobathy (integrated topobathymetry).

Parameters
  • bbox (tuple) – Bounding box as tuple of (min_x, min_y, max_x, max_y).

  • crs (str or pyproj.CRS, optional) – Spatial reference (CRS) of bbox, defaults to EPSG:4326.

  • res (str, optional) – Resolution to query, defaults to None, i.e., all resolutions.

Returns

geopandas.GeoDataFrame – Polygon(s) representing the 3DEP data sources at each resolution. Resolutions are given in the dem_res column.

Examples

>>> import py3dep
>>> bbox = (-69.77, 45.07, -69.31, 45.45)
>>> src = py3dep.query_3dep_sources(bbox)
>>> src.groupby("dem_res")["OBJECTID"].count().to_dict()
{'10m': 8, '1m': 4, '30m': 8}
>>> src = py3dep.query_3dep_sources(bbox, res="1m")
>>> src.groupby("dem_res")["OBJECTID"].count().to_dict()
{'1m': 4}
py3dep.utils#

Utilities for Py3DEP.

Module Contents#
py3dep.utils.deg2mpm(slope)#

Convert slope from degrees to meter/meter.

Parameters

slope (xarray.DataArray) – Slope in degrees.

Returns

xarray.DataArray – Slope in meter/meter. The name is set to slope and the units attribute is set to m/m.
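
Examples

A minimal sketch pairing this with get_map's Slope Degrees layer:

>>> import py3dep
>>> from py3dep.utils import deg2mpm
>>> bbox = (-69.77, 45.07, -69.31, 45.45)
>>> slope_deg = py3dep.get_map("Slope Degrees", bbox, resolution=30)
>>> slope_mpm = deg2mpm(slope_deg)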

py3dep.utils.fill_depressions(dem_da)#

Fill depressions and adjust flat areas in a DEM using RichDEM.

Parameters

dem_da (xarray.DataArray or numpy.ndarray) – Digital Elevation Model.

Returns

xarray.DataArray – Conditioned DEM after applying depression filling and flat area resolution operations.

Package Contents#

pydaymet#

Top-level package for PyDaymet.

Submodules#

pydaymet.core#

Core class for the Daymet functions.

Module Contents#
class pydaymet.core.Daymet(variables=None, pet=None, snow=False, time_scale='daily', region='na')#

Base class for Daymet requests.

Parameters
  • variables (str or list or tuple, optional) – List of variables to be downloaded. The acceptable variables are tmin, tmax, prcp, srad, vp, swe, and dayl. Descriptions can be found here. Defaults to None, i.e., all the variables are downloaded.

  • pet (str, optional) – Method for computing PET. Supported methods are penman_monteith, priestley_taylor, hargreaves_samani, and None (don’t compute PET). The penman_monteith method is based on Allen et al.1 assuming that soil heat flux density is zero. The priestley_taylor method is based on Priestley and Taylor2 assuming that soil heat flux density is zero. The hargreaves_samani method is based on Hargreaves and Samani3. Defaults to None.

  • snow (bool, optional) – Compute snowfall from precipitation and minimum temperature. Defaults to False.

  • time_scale (str, optional) – Data time scale which can be daily, monthly (monthly summaries), or annual (annual summaries). Defaults to daily.

  • region (str, optional) – Region in the US, defaults to na. Acceptable values are:

    • na: Continental North America

    • hi: Hawaii

    • pr: Puerto Rico

References

1

Richard G Allen, Luis S Pereira, Dirk Raes, Martin Smith, and others. Crop evapotranspiration: guidelines for computing crop water requirements. FAO Irrigation and Drainage Paper 56, FAO, Rome, 300(9):D05109, 1998.

2

Charles Henry Brian Priestley and Robert Joseph Taylor. On the assessment of surface heat flux and evaporation using large-scale parameters. Monthly Weather Review, 100(2):81–92, 1972.

3

George H. Hargreaves and Zohrab A. Samani. Estimating potential evapotranspiration. Journal of the Irrigation and Drainage Division, 108(3):225–230, sep 1982. URL: https://doi.org/10.1061%2Fjrcea4.0001390, doi:10.1061/jrcea4.0001390.
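
Examples

A minimal sketch of constructing the base class directly; most workflows go through pydaymet.get_bycoords or pydaymet.get_bygeom instead:

>>> from pydaymet.core import Daymet
>>> daymet = Daymet(variables=["prcp", "tmin"], pet="hargreaves_samani")
>>> dates = daymet.dates_tolist(("2000-01-01", "2000-01-10"))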

static check_dates(dates)#

Check if input dates are in correct format and valid.

dates_todict(self, dates)#

Set dates by start and end dates as a tuple, (start, end).

dates_tolist(self, dates)#

Correct dates for Daymet accounting for leap years.

Daymet doesn’t account for leap years and removes Dec 31 in leap years.

Parameters

dates (tuple) – Target start and end dates.

Returns

list – All the dates in the Daymet database within the provided date range.

separate_snow(self, clm, t_rain=T_RAIN, t_snow=T_SNOW)#

Separate snow based on Martinez and Gupta4.

Parameters
  • clm (pandas.DataFrame or xarray.Dataset) – Climate data that should include prcp and tmin.

  • t_rain (float, optional) – Temperature threshold for considering precipitation as rain, defaults to 2.5 degrees C.

  • t_snow (float, optional) – Temperature threshold for considering precipitation as snow, defaults to 0.6 degrees C.

Returns

pandas.DataFrame or xarray.Dataset – Input data with snow (mm/day) column if input is a pandas.DataFrame, or snow variable if input is an xarray.Dataset.

References

4

Guillermo F. Martinez and Hoshin V. Gupta. Toward improved identification of hydrological models: a diagnostic evaluation of the “abcd” monthly water balance model for the conterminous united states. Water Resources Research, 2010. URL: https://agupubs.onlinelibrary.wiley.com/doi/abs/10.1029/2009WR008294, arXiv:https://agupubs.onlinelibrary.wiley.com/doi/pdf/10.1029/2009WR008294, doi:https://doi.org/10.1029/2009WR008294.

years_todict(self, years)#

Set date by list of year(s).

years_tolist(self, years)#

Correct dates for Daymet accounting for leap years.

Daymet doesn’t account for leap years and removes Dec 31 when it’s leap year.

Parameters

years (list) – A list of target years.

Returns

list – All the dates in the Daymet database within the provided date range.

pydaymet.pet#

Core class for the Daymet functions.

Module Contents#
pydaymet.pet.potential_et(clm, coords=None, crs=DEF_CRS, method='hargreaves_samani', params=None)#

Compute Potential EvapoTranspiration (PET) for both gridded data and a single location.

Parameters
  • clm (pandas.DataFrame or xarray.Dataset) – The dataset must include at least the following variables:

    • Minimum temperature in degree celsius

    • Maximum temperature in degree celsius

    • Solar radiation in W/m2

    • Daylight duration in seconds

    Optionally, relative humidity and wind speed at 2-m level will be used if available.

    Table below shows the variable names that the function looks for in the input data.

    ================  =======
    DataFrame         Dataset
    ================  =======
    tmin (degrees C)  tmin
    tmax (degrees C)  tmax
    srad (W/m2)       srad
    dayl (s)          dayl
    rh (-)            rh
    u2 (m/s)          u2
    ================  =======

    If relative humidity and wind speed at 2-m level are not available, actual vapour pressure is assumed to be saturation vapour pressure at daily minimum temperature and 2-m wind speed is considered to be 2 m/s.

  • coords (tuple of floats, optional) – Coordinates of the daymet data location as a tuple, (x, y). This is required when clm is a DataFrame.

  • crs (str, optional) – The spatial reference of the input coordinate, defaults to epsg:4326. This is only used when clm is a DataFrame.

  • method (str, optional) – Method for computing PET. Supported methods are penman_monteith, priestley_taylor, hargreaves_samani, and None (don’t compute PET). The penman_monteith method is based on Allen et al.1 assuming that soil heat flux density is zero. The priestley_taylor method is based on Priestley and Taylor2 assuming that soil heat flux density is zero. The hargreaves_samani method is based on Hargreaves and Samani3. Defaults to hargreaves_samani.

  • params (dict, optional) – Model-specific parameters as a dictionary, defaults to None.

Returns

pandas.DataFrame or xarray.Dataset – The input DataFrame/Dataset with an additional variable named pet (mm/day) for DataFrame and pet for Dataset.
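
Examples

A minimal sketch for a single location, fetching the required variables with get_bycoords first; this assumes potential_et is re-exported at the package level (otherwise import it from pydaymet.pet):

>>> import pydaymet as daymet
>>> coords = (-69.77, 45.07)
>>> clm = daymet.get_bycoords(coords, ("2000-01-01", "2000-12-31"))
>>> clm_pet = daymet.potential_et(clm, coords, method="hargreaves_samani")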

References

1

Richard G Allen, Luis S Pereira, Dirk Raes, Martin Smith, and others. Crop evapotranspiration: guidelines for computing crop water requirements. FAO Irrigation and Drainage Paper 56, FAO, Rome, 300(9):D05109, 1998.

2

Charles Henry Brian Priestley and Robert Joseph Taylor. On the assessment of surface heat flux and evaporation using large-scale parameters. Monthly Weather Review, 100(2):81–92, 1972.

3

George H. Hargreaves and Zohrab A. Samani. Estimating potential evapotranspiration. Journal of the Irrigation and Drainage Division, 108(3):225–230, sep 1982. URL: https://doi.org/10.1061%2Fjrcea4.0001390, doi:10.1061/jrcea4.0001390.

pydaymet.pydaymet#

Access the Daymet database for both single-pixel and gridded queries.

Module Contents#
pydaymet.pydaymet.get_bycoords(coords, dates, crs=DEF_CRS, variables=None, region='na', time_scale='daily', pet=None, pet_params=None, snow=False, snow_params=None, ssl=None)#

Get point-data from the Daymet database at 1-km resolution.

This function uses the THREDDS data service to get point data and supports getting monthly and annual summaries of the climate data directly from the server.

Parameters
  • coords (tuple) – Coordinates of the location of interest as a tuple (lon, lat)

  • dates (tuple or list, optional) – Start and end dates as a tuple (start, end) or a list of years [2001, 2010, ...].

  • crs (str, optional) – The CRS of the input geometry, defaults to "epsg:4326".

  • variables (str or list) – List of variables to be downloaded. The acceptable variables are tmin, tmax, prcp, srad, vp, swe, and dayl. Descriptions can be found here.

  • region (str, optional) – Target region in the US, defaults to na. Acceptable values are:

    • na: Continental North America

    • hi: Hawaii

    • pr: Puerto Rico

  • time_scale (str, optional) – Data time scale which can be daily, monthly (monthly summaries), or annual (annual summaries). Defaults to daily.

  • pet (str, optional) – Method for computing PET. Supported methods are penman_monteith, priestley_taylor, hargreaves_samani, and None (don’t compute PET). The penman_monteith method is based on Allen et al.1 assuming that soil heat flux density is zero. The priestley_taylor method is based on Priestley and Taylor2 assuming that soil heat flux density is zero. The hargreaves_samani method is based on Hargreaves and Samani3. Defaults to None.

  • pet_params (dict, optional) – Model-specific parameters as a dictionary that is passed to the PET function. Defaults to None.

  • snow (bool, optional) – Compute snowfall from precipitation and minimum temperature. Defaults to False.

  • snow_params (dict, optional) – Model-specific parameters as a dictionary that is passed to the snowfall function. These parameters are only used if snow is True. Two parameters are required: t_rain (deg C), the temperature threshold for considering precipitation as rain, and t_snow (deg C), the temperature threshold for considering precipitation as snow. The default values are {'t_rain': 2.5, 't_snow': 0.6}, adopted from https://doi.org/10.5194/gmd-11-1077-2018.

  • ssl (bool or SSLContext, optional) – SSLContext to use for the connection, defaults to None. Set to False to disable SSL certification verification.

Returns

pandas.DataFrame – Daily climate data for a location.

Examples

>>> import pydaymet as daymet
>>> coords = (-1431147.7928, 318483.4618)
>>> dates = ("2000-01-01", "2000-12-31")
>>> clm = daymet.get_bycoords(
...     coords,
...     dates,
...     crs="epsg:3542",
...     pet="hargreaves_samani",
...     ssl=False
... )
>>> clm["pet (mm/day)"].mean()
3.713

References

1(1,2)

Richard G Allen, Luis S Pereira, Dirk Raes, Martin Smith, and others. Crop evapotranspiration: guidelines for computing crop water requirements. FAO Irrigation and Drainage Paper 56, FAO, Rome, 300(9):D05109, 1998.

2(1,2)

Charles Henry Brian Priestley and Robert Joseph Taylor. On the assessment of surface heat flux and evaporation using large-scale parameters. Monthly Weather Review, 100(2):81–92, 1972.

3(1,2)

George H. Hargreaves and Zohrab A. Samani. Estimating potential evapotranspiration. Journal of the Irrigation and Drainage Division, 108(3):225–230, sep 1982. URL: https://doi.org/10.1061%2Fjrcea4.0001390, doi:10.1061/jrcea4.0001390.

pydaymet.pydaymet.get_bygeom(geometry, dates, crs=DEF_CRS, variables=None, region='na', time_scale='daily', pet=None, pet_params=None, snow=False, snow_params=None, ssl=None)#

Get gridded data from the Daymet database at 1-km resolution.

Parameters
  • geometry (Polygon, MultiPolygon, or bbox) – The geometry of the region of interest.

  • dates (tuple or list, optional) – Start and end dates as a tuple (start, end) or a list of years [2001, 2010, …].

  • crs (str, optional) – The CRS of the input geometry, defaults to epsg:4326.

  • variables (str or list) – List of variables to be downloaded. The acceptable variables are tmin, tmax, prcp, srad, vp, swe, and dayl. Descriptions can be found here.

  • region (str, optional) – Region in the US, defaults to na. Acceptable values are:

    • na: Continental North America

    • hi: Hawaii

    • pr: Puerto Rico

  • time_scale (str, optional) – Data time scale which can be daily, monthly (monthly average), or annual (annual average). Defaults to daily.

  • pet (str, optional) – Method for computing PET. Supported methods are penman_monteith, priestley_taylor, hargreaves_samani, and None (don’t compute PET). The penman_monteith method is based on Allen et al.1 assuming that soil heat flux density is zero. The priestley_taylor method is based on Priestley and Taylor2 assuming that soil heat flux density is zero. The hargreaves_samani method is based on Hargreaves and Samani3. Defaults to None.

  • pet_params (dict, optional) – Model-specific parameters as a dictionary that is passed to the PET function. Defaults to None.

  • snow (bool, optional) – Compute snowfall from precipitation and minimum temperature. Defaults to False.

  • snow_params (dict, optional) – Model-specific parameters as a dictionary that is passed to the snowfall function. These parameters are only used if snow is True. Two parameters are required: t_rain (deg C), the temperature threshold for considering precipitation as rain, and t_snow (deg C), the temperature threshold for considering precipitation as snow. The default values are {'t_rain': 2.5, 't_snow': 0.6}, adopted from https://doi.org/10.5194/gmd-11-1077-2018.

  • ssl (bool or SSLContext, optional) – SSLContext to use for the connection, defaults to None. Set to False to disable SSL certification verification.

Returns

xarray.Dataset – Daily climate data within the target geometry.

Examples

>>> from shapely.geometry import Polygon
>>> import pydaymet as daymet
>>> geometry = Polygon(
...     [[-69.77, 45.07], [-69.31, 45.07], [-69.31, 45.45], [-69.77, 45.45], [-69.77, 45.07]]
... )
>>> clm = daymet.get_bygeom(geometry, 2010, variables="tmin", time_scale="annual")
>>> clm["tmin"].mean().compute().item()
1.361

References

1

Richard G Allen, Luis S Pereira, Dirk Raes, Martin Smith, and others. Crop evapotranspiration: guidelines for computing crop water requirements. FAO Irrigation and Drainage Paper 56, FAO, Rome, 300(9):D05109, 1998.

2

Charles Henry Brian Priestley and Robert Joseph Taylor. On the assessment of surface heat flux and evaporation using large-scale parameters. Monthly Weather Review, 100(2):81–92, 1972.

3

George H. Hargreaves and Zohrab A. Samani. Estimating potential evapotranspiration. Journal of the Irrigation and Drainage Division, 108(3):225–230, sep 1982. URL: https://doi.org/10.1061%2Fjrcea4.0001390, doi:10.1061/jrcea4.0001390.

Package Contents#

async_retriever#

Top-level package.

Submodules#

async_retriever.async_retriever#

Core async functions.

Module Contents#
async_retriever.async_retriever.delete_url_cache(url, request_method='GET', cache_name=None, **kwargs)#

Delete cached response associated with url, along with its history (if applicable).

Parameters
  • url (str) – URL to be deleted from the cache

  • request_method (str, optional) – HTTP request method to be deleted from the cache, defaults to GET.

  • cache_name (str, optional) – Path to a file for caching the session, defaults to ./cache/aiohttp_cache.sqlite.

  • kwargs (dict, optional) – Keywords to pass to the cache.delete_url().
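
Examples

A minimal sketch removing all cached GET responses for a single URL:

>>> import async_retriever as ar
>>> ar.delete_url_cache("https://waterservices.usgs.gov/nwis/site")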

async_retriever.async_retriever.retrieve(urls, read, request_kwds=None, request_method='GET', max_workers=8, cache_name=None, timeout=5.0, expire_after=EXPIRE, ssl=None, disable=False)#

Send async requests.

Parameters
  • urls (list of str) – List of URLs.

  • read (str) – Method for reading the response; one of binary, json, or text.

  • request_kwds (list of dict, optional) – List of requests keywords corresponding to input URLs (1 on 1 mapping), defaults to None. For example, [{"params": {...}, "headers": {...}}, ...].

  • request_method (str, optional) – Request type; GET (get) or POST (post). Defaults to GET.

  • max_workers (int, optional) – Maximum number of async processes, defaults to 8.

  • cache_name (str, optional) – Path to a file for caching the session, defaults to ./cache/aiohttp_cache.sqlite.

  • timeout (float, optional) – Timeout for the request, defaults to 5.0.

  • expire_after (int, optional) – Expiration time for response caching in seconds, defaults to -1 (never expire).

  • ssl (bool or SSLContext, optional) – SSLContext to use for the connection, defaults to None. Set to False to disable SSL certification verification.

  • disable (bool, optional) – If True temporarily disable caching requests and get new responses from the server, defaults to False.

Returns

list – List of responses in the order of input URLs.

Examples

>>> import async_retriever as ar
>>> stations = ["01646500", "08072300", "11073495"]
>>> url = "https://waterservices.usgs.gov/nwis/site"
>>> urls, kwds = zip(
...     *[
...         (url, {"params": {"format": "rdb", "sites": s, "siteStatus": "all"}})
...         for s in stations
...     ]
... )
>>> resp = ar.retrieve(urls, "text", request_kwds=kwds)
>>> resp[0].split('\n')[-2].split('\t')[1]
'01646500'
async_retriever.async_retriever.retrieve_binary(urls, request_kwds=None, request_method='GET', max_workers=8, cache_name=None, timeout=5.0, expire_after=EXPIRE, ssl=None, disable=False)#

Send async requests and get the response as bytes.

Parameters
  • urls (list of str) – List of URLs.

  • request_kwds (list of dict, optional) – List of requests keywords corresponding to input URLs (1 on 1 mapping), defaults to None. For example, [{"params": {...}, "headers": {...}}, ...].

  • request_method (str, optional) – Request type; GET (get) or POST (post). Defaults to GET.

  • max_workers (int, optional) – Maximum number of async processes, defaults to 8.

  • cache_name (str, optional) – Path to a file for caching the session, defaults to ./cache/aiohttp_cache.sqlite.

  • timeout (float, optional) – Timeout for the request, defaults to 5.0.

  • expire_after (int, optional) – Expiration time for response caching in seconds, defaults to -1 (never expire).

  • ssl (bool or SSLContext, optional) – SSLContext to use for the connection, defaults to None. Set to False to disable SSL certification verification.

  • disable (bool, optional) – If True temporarily disable caching requests and get new responses from the server, defaults to False.

Returns

list of bytes – List of responses as bytes in the order of input URLs.
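
Examples

A minimal sketch following the same NWIS pattern as the retrieve example above:

>>> import async_retriever as ar
>>> urls = ["https://waterservices.usgs.gov/nwis/site"]
>>> kwds = [{"params": {"format": "rdb", "sites": "01646500", "siteStatus": "all"}}]
>>> resp = ar.retrieve_binary(urls, request_kwds=kwds)
>>> isinstance(resp[0], bytes)
True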

async_retriever.async_retriever.retrieve_json(urls, request_kwds=None, request_method='GET', max_workers=8, cache_name=None, timeout=5.0, expire_after=EXPIRE, ssl=None, disable=False)#

Send async requests and get the response as json.

Parameters
  • urls (list of str) – List of URLs.

  • request_kwds (list of dict, optional) – List of requests keywords corresponding to input URLs (1 on 1 mapping), defaults to None. For example, [{"params": {...}, "headers": {...}}, ...].

  • request_method (str, optional) – Request type; GET (get) or POST (post). Defaults to GET.

  • max_workers (int, optional) – Maximum number of async processes, defaults to 8.

  • cache_name (str, optional) – Path to a file for caching the session, defaults to ./cache/aiohttp_cache.sqlite.

  • timeout (float, optional) – Timeout for the request, defaults to 5.0.

  • expire_after (int, optional) – Expiration time for response caching in seconds, defaults to -1 (never expire).

  • ssl (bool or SSLContext, optional) – SSLContext to use for the connection, defaults to None. Set to False to disable SSL certification verification.

  • disable (bool, optional) – If True temporarily disable caching requests and get new responses from the server, defaults to False.

Returns

list of dict – List of responses as JSON dictionaries in the order of input URLs.

Examples

>>> import async_retriever as ar
>>> urls = ["https://labs.waterdata.usgs.gov/api/nldi/linked-data/comid/position"]
>>> kwds = [
...     {
...         "params": {
...             "f": "json",
...             "coords": "POINT(-68.325 45.0369)",
...         },
...     },
... ]
>>> r = ar.retrieve_json(urls, kwds)
>>> print(r[0]["features"][0]["properties"]["identifier"])
2675320
async_retriever.async_retriever.retrieve_text(urls, request_kwds=None, request_method='GET', max_workers=8, cache_name=None, timeout=5.0, expire_after=EXPIRE, ssl=None, disable=False)#

Send async requests and get the response as text.

Parameters
  • urls (list of str) – List of URLs.

  • request_kwds (list of dict, optional) – List of requests keywords corresponding to input URLs (1 on 1 mapping), defaults to None. For example, [{"params": {...}, "headers": {...}}, ...].

  • request_method (str, optional) – Request type; GET (get) or POST (post). Defaults to GET.

  • max_workers (int, optional) – Maximum number of async processes, defaults to 8.

  • cache_name (str, optional) – Path to a file for caching the session, defaults to ./cache/aiohttp_cache.sqlite.

  • timeout (float, optional) – Timeout for the request in seconds, defaults to 5.0.

  • expire_after (int, optional) – Expiration time for response caching in seconds, defaults to -1 (never expire).

  • ssl (bool or SSLContext, optional) – SSLContext to use for the connection, defaults to None. Set to False to disable SSL certification verification.

  • disable (bool, optional) – If True temporarily disable caching requests and get new responses from the server, defaults to False.

Returns

list – List of responses in the order of input URLs.

Examples

>>> import async_retriever as ar
>>> stations = ["01646500", "08072300", "11073495"]
>>> url = "https://waterservices.usgs.gov/nwis/site"
>>> urls, kwds = zip(
...     *[
...         (url, {"params": {"format": "rdb", "sites": s, "siteStatus": "all"}})
...         for s in stations
...     ]
... )
>>> resp = ar.retrieve_text(urls, kwds)
>>> resp[0].split('\n')[-2].split('\t')[1]
'01646500'
async_retriever.utils#

Core async functions.

Module Contents#
class async_retriever.utils.BaseRetriever(urls, read, request_kwds=None, request_method='GET', cache_name=None)#

Base class for async retriever.

static generate_requests(urls, request_kwds)#

Generate urls and keywords.

async_retriever.utils.create_cachefile(db_name=None)#

Create a cache folder in the current working directory.

async async_retriever.utils.delete_url(url, method='GET', cache_name=None, **kwargs)#

Delete cached response associated with url.

async_retriever.utils.get_event_loop()#

Create an event loop.

async async_retriever.utils.retriever(uid, url, s_kwds, session, read_type, r_kwds)#

Create an async request and return the response as binary.

Parameters
  • uid (int) – ID of the URL for sorting after returning the results

  • url (str) – URL to be retrieved

  • s_kwds (dict) – Arguments to be passed to requests

  • session (ClientSession) – A ClientSession for sending the request

  • read_type (str) – Return response as text, bytes, or json.

  • r_kwds (dict) – Keywords to pass to the response read function. It is {"content_type": None} if read is json else an empty dict.

Returns

bytes – The retrieved response as binary.

Package Contents#

pygeoogc#

Top-level package for PyGeoOGC.

Submodules#

pygeoogc.core#

Base classes and functions for REST, WMS, and WFS services.

Module Contents#
class pygeoogc.core.ArcGISRESTfulBase(base_url, layer=None, outformat='geojson', outfields='*', crs=DEF_CRS, max_workers=1, verbose=False, disable_retry=False)#

Access to an ArcGIS REST service.

Parameters
  • base_url (str, optional) – The ArcGIS RESTful service url. The URL must either include a layer number after the last / in the url or the target layer must be passed as an argument.

  • layer (int, optional) – Target layer number, defaults to None. If None, the layer number must be included after the last / in base_url.

  • outformat (str, optional) – One of the output formats offered by the selected layer, defaults to geojson. If the given format is not correct, a list of available formats is shown.

  • outfields (str or list) – The output fields to be requested. Setting * as outfields requests all the available fields which is the default setting.

  • crs (str, optional) – The spatial reference of the output data, defaults to EPSG:4326

  • max_workers (int, optional) – Max number of simultaneous requests, defaults to 1, i.e., no threading. Note that some services might face issues when several requests are sent simultaneously and will return the requests partially. It’s recommended to avoid using too many workers unless you are certain the web service can handle it.

  • verbose (bool, optional) – If True, prints information about the requests and responses, defaults to False.

  • disable_retry (bool, optional) – If True, no retry attempt is made for failed queries, and the object IDs of the failed requests are saved to a text file whose path can be accessed via self.failed_path.

esri_query(self, geom, geo_crs=DEF_CRS)#

Generate geometry queries based on ESRI template.

get_features(self, featureids, return_m=False, return_geom=True)#

Get features based on the feature IDs.

Parameters
  • featureids (list) – List of feature IDs.

  • return_m (bool, optional) – Whether to activate the Return M (measure) in the request, defaults to False.

  • return_geom (bool, optional) – Whether to return the geometry of the feature, defaults to True.

Returns

dict – (Geo)json response from the web service.

get_response(self, url, payloads, method='GET')#

Send payload and get the response.

initialize_service(self)#

Initialize the RESTFul service.

partition_oids(self, oids)#

Partition feature IDs based on self.max_nrecords.

retry_failed_requests(self)#

Retry failed requests.

class pygeoogc.core.RESTValidator#

Validate ArcGISRESTful inputs.

Parameters
  • base_url (str, optional) – The ArcGIS RESTful service url. The URL must either include a layer number after the last / in the url or the target layer must be passed as an argument.

  • layer (int, optional) – Target layer number, defaults to None. If None, the layer number must be included after the last / in base_url.

  • outformat (str, optional) – One of the output formats offered by the selected layer, defaults to geojson. If the given format is not correct, a list of available formats is shown.

  • outfields (str or list) – The output fields to be requested. Setting * as outfields requests all the available fields which is the default setting.

  • crs (str, optional) – The spatial reference of the output data, defaults to EPSG:4326

  • max_workers (int, optional) – Max number of simultaneous requests, defaults to 1, i.e., no threading. Note that some services might face issues when several requests are sent simultaneously and will return the requests partially. It’s recommended to avoid using too many workers unless you are certain the web service can handle it.

  • verbose (bool, optional) – If True, prints information about the requests and responses, defaults to False.

  • disable_retry (bool, optional) – If True, no retry attempt is made for failed queries, and the object IDs of the failed requests are saved to a text file whose path can be accessed via self.failed_path.

class pygeoogc.core.WFSBase#

Base class for WFS service.

Parameters
  • url (str) – The base url for the WFS service, for example: https://hazards.fema.gov/nfhl/services/public/NFHL/MapServer/WFSServer

  • layer (str) – The layer from the service to be downloaded, defaults to None, which raises an error listing all the layers offered by the service.

  • outformat (str) – The data format to request from the service, defaults to None, which raises an error listing all the formats offered by the service.

  • version (str, optional) – The WFS service version which should be either 1.0.0, 1.1.0, or 2.0.0. Defaults to 2.0.0.

  • crs (str, optional) – The spatial reference system to be used for requesting the data, defaults to epsg:4326.

  • read_method (str, optional) – Method for reading the retrieved data, defaults to json. Valid options are json, binary, and text.

  • max_nrecords (int, optional) – The maximum number of records in a single request to be retrieved from the service, defaults to 1000. If the number of requested records is greater than this value, the query will be split into multiple requests.

get_validnames(self)#

Get valid column names for a layer.

validate_wfs(self)#

Validate input arguments with the WFS service.

class pygeoogc.core.WMSBase#

Base class for accessing a WMS service.

Parameters
  • url (str) – The base url for the WMS service e.g., https://www.mrlc.gov/geoserver/mrlc_download/wms

  • layers (str or list) – A layer or a list of layers from the service to be downloaded. You can pass an empty string to get a list of available layers.

  • outformat (str) – The data format to request for data from the service. You can pass an empty string to get a list of available output formats.

  • version (str, optional) – The WMS service version which should be either 1.1.1 or 1.3.0, defaults to 1.3.0.

  • crs (str, optional) – The spatial reference system to be used for requesting the data, defaults to epsg:4326.

get_validlayers(self)#

Get the layers supported by the WMS service.

validate_wms(self)#

Validate input arguments with the WMS service.

pygeoogc.core.validate_version(val, valid_versions)#

Validate version from a list of valid versions.

Parameters
  • val (str) – Input version value.

  • valid_versions (list of str) – List of valid versions.

Returns

str – Validated version value.

pygeoogc.pygeoogc#

Base classes and functions for REST, WMS, and WFS services.

Module Contents#
class pygeoogc.pygeoogc.ArcGISRESTful(base_url, layer=None, outformat='geojson', outfields='*', crs=DEF_CRS, max_workers=1, verbose=False, disable_retry=False)#

Access to an ArcGIS REST service.

Notes

By default, all retrieval methods retry to get the missing feature IDs, if there are any. You can disable this behavior by setting disable_retry to True. If there are any missing feature IDs after the retry, they are saved to a text file whose path can be accessed via self.client.failed_path.

Parameters
  • base_url (str, optional) – The ArcGIS RESTful service url. The URL must either include a layer number after the last / in the url or the target layer must be passed as an argument.

  • layer (int, optional) – Target layer number, defaults to None. If None, the layer number must be included after the last / in base_url.

  • outformat (str, optional) – One of the output formats offered by the selected layer, defaults to geojson. If the given format is not correct, a list of available formats is shown.

  • outfields (str or list) – The output fields to be requested. Setting * as outfields requests all the available fields which is the default behaviour.

  • crs (str, optional) – The spatial reference of the output data, defaults to epsg:4326.

  • max_workers (int, optional) – Number of simultaneous downloads, defaults to 1, i.e., no threading. Note that some services might face issues when several requests are sent simultaneously and will return the requests partially. It’s recommended to avoid using too many workers unless you are certain the web service can handle it.

  • verbose (bool, optional) – If True, prints information about the requests and responses, defaults to False.

  • disable_retry (bool, optional) – If True, no retry attempt is made for failed queries, and the object IDs of the failed requests are saved to a text file whose path can be accessed via self.client.failed_path.
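
Examples

A typical flow is to get feature IDs with one of the oids_* methods shown below and then fetch the features. A sketch, assuming the WBD endpoint is registered in pygeoogc's ServiceURL inventory; the layer number and field names here are illustrative:

>>> from pygeoogc import ArcGISRESTful, ServiceURL
>>> import pygeoutils as geoutils
>>> wbd = ArcGISRESTful(ServiceURL().restful.wbd, 4, outfields=["huc8", "name"])
>>> oids = wbd.oids_bygeom((-69.77, 45.07, -69.31, 45.45))
>>> resp = wbd.get_features(oids)
>>> huc8 = geoutils.json2geodf(resp)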

get_features(self, featureids, return_m=False, return_geom=True)#

Get features based on the feature IDs.

Parameters
  • featureids (list) – List of feature IDs.

  • return_m (bool, optional) – Whether to activate the Return M (measure) in the request, defaults to False.

  • return_geom (bool, optional) – Whether to return the geometry of the feature, defaults to True.

Returns

dict – (Geo)json response from the web service.

oids_byfield(self, field, ids)#

Get Object IDs based on a list of field IDs.

Parameters
  • field (str) – Name of the target field that IDs belong to.

  • ids (str or list) – A list of target ID(s).

Returns

list of tuples – A list of feature IDs partitioned by self.max_nrecords.

oids_bygeom(self, geom, geo_crs=DEF_CRS, spatial_relation='esriSpatialRelIntersects', sql_clause=None, distance=None)#

Get feature IDs within a geometry that can be combined with a SQL where clause.

Parameters
  • geom (LineString, Polygon, Point, MultiPoint, tuple, or list of tuples) – A geometry (LineString, Polygon, Point, MultiPoint), tuple of length two ((x, y)), a list of tuples of length 2 ([(x, y), ...]), or bounding box (tuple of length 4 ((xmin, ymin, xmax, ymax))).

  • geo_crs (str or pyproj.CRS) – The spatial reference of the input geometry.

  • spatial_relation (str, optional) – The spatial relationship to be applied on the input geometry while performing the query. If not correct a list of available options is shown. It defaults to esriSpatialRelIntersects. Valid predicates are:

    • esriSpatialRelIntersects

    • esriSpatialRelContains

    • esriSpatialRelCrosses

    • esriSpatialRelEnvelopeIntersects

    • esriSpatialRelIndexIntersects

    • esriSpatialRelOverlaps

    • esriSpatialRelTouches

    • esriSpatialRelWithin

    • esriSpatialRelRelation

  • sql_clause (str, optional) – Valid SQL 92 WHERE clause, default to None.

  • distance (int, optional) – Buffer distance in meters for the input geometries, default to None.

Returns

list of tuples – A list of feature IDs partitioned by self.max_nrecords.

oids_bysql(self, sql_clause)#

Get feature IDs using a valid SQL 92 WHERE clause.

Notes

Not all web services support this type of query. For more details look here.

Parameters

sql_clause (str) – A valid SQL 92 WHERE clause.

Returns

list of tuples – A list of feature IDs partitioned by self.max_nrecords.

partition_oids(self, oids)#

Partition feature IDs based on self.max_nrecords.

Parameters

oids (list of int or int) – A list of feature ID(s).

Returns

list of tuples – A list of feature IDs partitioned by self.max_nrecords.

class pygeoogc.pygeoogc.HttpURLs#

URLs of the supported HTTP services.

class pygeoogc.pygeoogc.RESTfulURLs#

URLs of the supported RESTful services.

class pygeoogc.pygeoogc.ServiceURL#

URLs of the supported services.

class pygeoogc.pygeoogc.WFS(url, layer=None, outformat=None, version='2.0.0', crs=DEF_CRS, read_method='json', max_nrecords=1000, validation=True)#

Data from any WFS service within a geometry or by featureid.

Parameters
  • url (str) – The base url for the WFS service, for example: https://hazards.fema.gov/nfhl/services/public/NFHL/MapServer/WFSServer

  • layer (str) – The layer from the service to be downloaded, defaults to None, which raises an error listing all the layers offered by the service.

  • outformat (str) – The data format to request from the service, defaults to None, which raises an error listing all the formats offered by the service.

  • version (str, optional) – The WFS service version which should be either 1.0.0, 1.1.0, or 2.0.0. Defaults to 2.0.0.

  • crs (str, optional) – The spatial reference system to be used for requesting the data, defaults to epsg:4326.

  • read_method (str, optional) – Method for reading the retrieved data, defaults to json. Valid options are json, binary, and text.

  • max_nrecords (int, optional) – The maximum number of records in a single request to be retrieved from the service, defaults to 1000. If the number of records requested is greater than this value, it will be split into multiple requests.

  • validation (bool, optional) – Validate the input arguments from the WFS service, defaults to True. Set this to False if you are sure all the WFS settings such as layer and crs are correct to avoid sending extra requests.
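
Examples

A minimal sketch using the FEMA NFHL endpoint from the parameters above; the layer and output format names are illustrative (omit them to get an error listing the valid options):

>>> from pygeoogc import WFS
>>> url = "https://hazards.fema.gov/nfhl/services/public/NFHL/MapServer/WFSServer"
>>> wfs = WFS(url, layer="public_NFHL:Base_Flood_Elevations", outformat="esrigeojson")
>>> resp = wfs.getfeature_bybox((-69.77, 45.07, -69.31, 45.45), box_crs="epsg:4326")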

getfeature_bybox(self, bbox, box_crs=DEF_CRS, always_xy=False)#

Get data from a WFS service within a bounding box.

Parameters
  • bbox (tuple) – A bounding box for getting the data: (west, south, east, north).

  • box_crs (str, or pyproj.CRS, optional) – The spatial reference system of the input bbox, defaults to epsg:4326.

  • always_xy (bool, optional) – Whether to always use xy axis order, defaults to False. Some services change the axis order from xy to yx, following the latest WFS version specifications but some don’t. If the returned value does not have any geometry, it indicates that most probably the axis order does not match. You can set this to True in that case.

Returns

str or bytes or dict – WFS query response within a bounding box.

getfeature_byfilter(self, cql_filter, method='GET')#

Get features based on a valid CQL filter.

Notes

The validity of the input CQL expression is the user’s responsibility since the function does not perform any checks and just sends a request using the input filter.

Parameters
  • cql_filter (str) – A valid CQL filter expression.

  • method (str) – The request method, could be GET or POST (for long filters).

Returns

str or bytes or dict – WFS query response

getfeature_bygeom(self, geometry, geo_crs=DEF_CRS, always_xy=False, predicate='INTERSECTS')#

Get features based on a geometry.

Parameters
  • geometry (shapely.geometry) – The input geometry

  • geo_crs (str, or pyproj.CRS, optional) – The CRS of the input geometry, default to epsg:4326.

  • always_xy (bool, optional) – Whether to always use xy axis order, defaults to False. Some services change the axis order from xy to yx, following the latest WFS version specifications but some don’t. If the returned value does not have any geometry, it indicates that most probably the axis order does not match. You can set this to True in that case.

  • predicate (str, optional) – The geometric predicate to use for requesting the data, defaults to INTERSECTS. Valid predicates are:

    • EQUALS

    • DISJOINT

    • INTERSECTS

    • TOUCHES

    • CROSSES

    • WITHIN

    • CONTAINS

    • OVERLAPS

    • RELATE

    • BEYOND

Returns

str or bytes or dict – WFS query response based on the given geometry.

getfeature_byid(self, featurename, featureids)#

Get features based on feature IDs.

Parameters
  • featurename (str) – The name of the column for searching for feature IDs.

  • featureids (str or list) – The feature ID(s).

Returns

str or bytes or dict – WFS query response.

class pygeoogc.pygeoogc.WFSURLs#

URLs of the supported WFS services.

class pygeoogc.pygeoogc.WMS(url, layers, outformat, version='1.3.0', crs=DEF_CRS, validation=True, ssl=None)#

Get data from a WMS service within a geometry or bounding box.

Parameters
  • url (str) – The base url for the WMS service e.g., https://www.mrlc.gov/geoserver/mrlc_download/wms

  • layers (str or list) – A layer or a list of layers from the service to be downloaded. You can pass an empty string to get a list of available layers.

  • outformat (str) – The data format to request for data from the service. You can pass an empty string to get a list of available output formats.

  • crs (str, optional) – The spatial reference system to be used for requesting the data, defaults to epsg:4326.

  • version (str, optional) – The WMS service version which should be either 1.1.1 or 1.3.0, defaults to 1.3.0.

  • validation (bool, optional) – Validate the input arguments from the WMS service, defaults to True. Set this to False if you are sure all the WMS settings such as layer and crs are correct to avoid sending extra requests.

  • ssl (bool or SSLContext, optional) – SSLContext to use for the connection, defaults to None. Set to False to disable SSL certification verification.
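
Examples

A minimal sketch using the MRLC endpoint from the parameters above and the getmap_bybox method described below; the layer name is illustrative (pass an empty string to list the valid layers), and the returned bytes can be handed to pygeoutils.gtiff2xarray:

>>> from pygeoogc import WMS
>>> import pygeoutils as geoutils
>>> url = "https://www.mrlc.gov/geoserver/mrlc_download/wms"
>>> wms = WMS(url, layers="NLCD_2019_Land_Cover_L48", outformat="image/geotiff", crs="epsg:4326")
>>> bbox = (-69.77, 45.07, -69.31, 45.45)
>>> r_dict = wms.getmap_bybox(bbox, resolution=30)
>>> nlcd = geoutils.gtiff2xarray(r_dict, geometry=bbox, geo_crs="epsg:4326")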

get_validlayers(self)#

Get the layers supported by the WMS service.

getmap_bybox(self, bbox, resolution, box_crs=DEF_CRS, always_xy=False, max_px=8000000, kwargs=None)#

Get data from a WMS service within a geometry or bounding box.

Parameters
  • bbox (tuple) – A bounding box for getting the data.

  • resolution (float) – The output resolution in meters. The width and height of the output are computed in pixels based on the geometry bounds and the given resolution.

  • box_crs (str, or pyproj.CRS, optional) – The spatial reference system of the input bbox, defaults to epsg:4326.

  • always_xy (bool, optional) – Whether to always use xy axis order, defaults to False. Some services change the axis order from xy to yx, following the latest WFS version specifications but some don’t. If the returned value does not have any geometry, it indicates that most probably the axis order does not match. You can set this to True in that case.

  • max_px (int, optional) – The maximum allowable number of pixels (width x height) for a WMS request, defaults to 8 million based on some trial-and-error.

  • kwargs (dict, optional) – Optional additional keywords passed as payload, defaults to None. For example, {"styles": "default"}.

Returns

dict – A dict where the keys are the layer name and values are the returned response from the WMS service as bytes.

class pygeoogc.pygeoogc.WMSURLs#

URLs of the supported WMS services.

pygeoogc.utils#

Some utilities for PyGeoOGC.

Module Contents#
class pygeoogc.utils.ESRIGeomQuery#

Generate input geometry query for ArcGIS RESTful services.

Parameters
  • geometry (tuple or sgeom.Polygon or sgeom.Point or sgeom.LineString) – The input geometry, which can be a point (x, y), a list of points [(x, y), ...], a bbox (xmin, ymin, xmax, ymax), or a shapely sgeom.Polygon.

  • wkid (int) – The Well-known ID (WKID) of the geometry’s spatial reference e.g., for EPSG:4326, 4326 should be passed. Check ArcGIS for reference.
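
Examples

A minimal sketch building a bbox payload with the bbox method listed below:

>>> from pygeoogc.utils import ESRIGeomQuery
>>> payload = ESRIGeomQuery((-69.77, 45.07, -69.31, 45.45), wkid=4326).bbox()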

bbox(self)#

Query for a bbox.

multipoint(self)#

Query for a multi-point.

point(self)#

Query for a point.

polygon(self)#

Query for a polygon.

polyline(self)#

Query for a polyline.

class pygeoogc.utils.RetrySession(retries=3, backoff_factor=0.3, status_to_retry=(500, 502, 504), prefixes=('https://',), cache_name=None, expire_after=EXPIRE)#

Configures the passed-in session to retry on failed requests.

The failures can be due to connection errors, specific HTTP response codes, and 30X redirections. The code was originally based on: https://github.com/bustawin/retry-requests

Parameters
  • retries (int, optional) – The number of maximum retries before raising an exception, defaults to 3.

  • backoff_factor (float, optional) – A factor used to compute the waiting time between retries, defaults to 0.3.

  • status_to_retry (tuple, optional) – A tuple of status codes that trigger the retry behaviour, defaults to (500, 502, 504).

  • prefixes (tuple, optional) – The URL prefixes to consider, defaults to ('https://',).

  • cache_name (str, optional) – Path to a folder for caching the session, defaults to None, which uses the system’s temp directory.

  • expire_after (int, optional) – Expiration time for the cache in seconds, defaults to -1 (never expire).
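
Examples

A minimal sketch using the get method described below against the NWIS site service:

>>> from pygeoogc.utils import RetrySession
>>> session = RetrySession()
>>> payload = {"format": "rdb", "sites": "01646500", "siteStatus": "all"}
>>> resp = session.get("https://waterservices.usgs.gov/nwis/site", payload=payload)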

get(self, url, payload=None, headers=None)#

Retrieve data from a url by GET and return the Response.

post(self, url, payload=None, headers=None)#

Retrieve data from a url by POST and return the Response.

pygeoogc.utils.bbox_decompose(bbox, resolution, box_crs=DEF_CRS, max_px=8000000)#

Split the bounding box vertically for WMS requests.

Parameters
  • bbox (tuple) – A bounding box; (west, south, east, north)

  • resolution (float) – The target resolution for a WMS request in meters.

  • box_crs (str, optional) – The spatial reference of the input bbox, defaults to EPSG:4326.

  • max_px (int, optional) – The maximum allowable number of pixels (width x height) for a WMS request, defaults to 8 million based on some trial-and-error.

Returns

list of tuples – Each tuple includes the following elements:

  • Tuple of length 4 that represents a bounding box (west, south, east, north) of a cell,

  • A label that represents cell ID starting from bottom-left to top-right, for example a 2x2 decomposition has the following labels:

    |---------|---------|
    |         |         |
    |   0_1   |   1_1   |
    |         |         |
    |---------|---------|
    |         |         |
    |   0_0   |   1_0   |
    |         |         |
    |---------|---------|
    
  • Raster width of a cell,

  • Raster height of a cell.
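
Examples

A minimal sketch; each element of the returned list is a (bbox, label, width, height) tuple as described above, and max_px here is deliberately small to force a decomposition:

>>> from pygeoogc.utils import bbox_decompose
>>> bbox = (-69.77, 45.07, -69.31, 45.45)
>>> cells = bbox_decompose(bbox, resolution=30, max_px=100000)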

pygeoogc.utils.bbox_resolution(bbox, resolution, bbox_crs=DEF_CRS)#

Compute the image size of a WGS84 bounding box for a given resolution in meters.

Parameters
  • bbox (tuple) – A bounding box in WGS84 (west, south, east, north)

  • resolution (float) – The resolution in meters

  • bbox_crs (str, optional) – The spatial reference of the input bbox, defaults to EPSG:4326.

Returns

tuple – The width and height of the image

pygeoogc.utils.check_bbox(bbox)#

Check if an input bbox is a tuple of length 4.

pygeoogc.utils.check_response(resp)#

Extract error message from a response, if any.

pygeoogc.utils.match_crs(geom, in_crs, out_crs)#

Reproject a geometry to another CRS.

Parameters
  • geom (list or tuple or geometry) – Input geometry, which could be a list of coordinates such as [(x1, y1), ...], a bounding box like (xmin, ymin, xmax, ymax), or any valid shapely geometry such as Polygon, MultiPolygon, etc.

  • in_crs (str) – Spatial reference of the input geometry

  • out_crs (str) – Target spatial reference

Returns

same type as the input geometry – Transformed geometry in the target CRS.

Examples

>>> from pygeoogc.utils import match_crs
>>> from shapely.geometry import Point
>>> point = Point(-7766049.665, 5691929.739)
>>> match_crs(point, "epsg:3857", "epsg:4326").xy
(array('d', [-69.7636111130079]), array('d', [45.44549114818127]))
>>> bbox = (-7766049.665, 5691929.739, -7763049.665, 5696929.739)
>>> match_crs(bbox, "epsg:3857", "epsg:4326")
(-69.7636111130079, 45.44549114818127, -69.73666165448431, 45.47699468552394)
>>> coords = [(-7766049.665, 5691929.739)]
>>> match_crs(coords, "epsg:3857", "epsg:4326")
[(-69.7636111130079, 45.44549114818127)]
pygeoogc.utils.traverse_json(obj, path)#

Extract an element from a JSON file along a specified path.

This function is based on bcmullins.

Parameters
  • obj (dict) – The input json dictionary

  • path (list) – The path to the requested element

Returns

list – The items found in the JSON

Examples

>>> from pygeoogc.utils import traverse_json
>>> data = [{
...     "employees": [
...         {"name": "Alice", "role": "dev", "nbr": 1},
...         {"name": "Bob", "role": "dev", "nbr": 2}],
...     "firm": {"name": "Charlie's Waffle Emporium", "location": "CA"},
... },]
>>> traverse_json(data, ["employees", "name"])
[['Alice', 'Bob']]
pygeoogc.utils.valid_wms_crs(url)#

Get valid CRSs from a WMS service version 1.3.0.

pygeoogc.utils.validate_crs(val)#

Validate a CRS.

Parameters

val (str or int) – Input CRS.

Returns

str – Validated CRS as a string.

Package Contents#

pygeoutils#

Top-level package for PyGeoUtils.

Submodules#

pygeoutils.pygeoutils#

Some utilities for manipulating GeoSpatial data.

Module Contents#
class pygeoutils.pygeoutils.Coordinates#

Generate validated and normalized coordinates in WGS84.

Parameters
  • lon (float or list of floats) – Longitude(s) in decimal degrees.

  • lat (float or list of floats) – Latitude(s) in decimal degrees.

Examples

>>> from pygeoutils import Coordinates
>>> c = Coordinates([460, 20, -30], [80, 200, 10])
>>> c.points.x.tolist()
[100.0, -30.0]
property points(self)#

Get validated coordinates as a geopandas.GeoSeries.

class pygeoutils.pygeoutils.GeoBSpline(points, npts_sp, degree=3)#

Create B-spline from a geo-dataframe of points.

Parameters
  • points (geopandas.GeoDataFrame or geopandas.GeoSeries) – Input points as a GeoDataFrame or GeoSeries in a projected CRS.

  • npts_sp (int) – Number of points in the output spline curve.

  • degree (int, optional) – Degree of the spline. Should be less than the number of points and greater than 1. Default is 3.

Examples

>>> from pygeoutils import GeoBSpline
>>> import geopandas as gpd
>>> xl, yl = zip(
...     *[
...         (-97.06138, 32.837),
...         (-97.06133, 32.836),
...         (-97.06124, 32.834),
...         (-97.06127, 32.832),
...     ]
... )
>>> pts = gpd.GeoSeries(gpd.points_from_xy(xl, yl, crs="epsg:4326"))
>>> sp = GeoBSpline(pts.to_crs("epsg:3857"), 5).spline
>>> pts_sp = gpd.GeoSeries(gpd.points_from_xy(sp.x, sp.y, crs="epsg:3857"))
>>> pts_sp = pts_sp.to_crs("epsg:4326")
>>> list(zip(pts_sp.x, pts_sp.y))
[(-97.06138, 32.837),
(-97.06135, 32.83629),
(-97.06131, 32.83538),
(-97.06128, 32.83434),
(-97.06127, 32.83319)]
property spline(self)#

Get the spline as a Spline object.

pygeoutils.pygeoutils.arcgis2geojson(arcgis, id_attr=None)#

Convert ESRIGeoJSON format to GeoJSON.

Notes

Based on arcgis2geojson.

Parameters
  • arcgis (str or binary) – The ESRIGeoJSON format str (or binary)

  • id_attr (str, optional) – ID of the attribute of interest, defaults to None.

Returns

dict – A GeoJSON file readable by GeoPandas.

pygeoutils.pygeoutils.break_lines(lines, points, tol=0.0)#

Break lines at specified points at given direction.

Parameters
  • lines (geopandas.GeoDataFrame) – Lines to break at intersection points.

  • points (geopandas.GeoDataFrame) – Points to break lines at. It must contain a column named direction with values up or down. This column is used to determine which part of the lines to keep, i.e., upstream or downstream of points.

  • tol (float, optional) – Tolerance for snapping points to the nearest lines in meters. The default is 0.0.

Returns

geopandas.GeoDataFrame – Original lines except for the parts that have been broken at the specified points.

pygeoutils.pygeoutils.geo2polygon(geometry, geo_crs, crs)#

Convert a geometry to a Shapely’s Polygon and transform to any CRS.

Parameters
  • geometry (Polygon or tuple of length 4) – Polygon or bounding box (west, south, east, north).

  • geo_crs (str or pyproj.CRS) – Spatial reference of the input geometry

  • crs (str or pyproj.CRS) – Target spatial reference.

Returns

Polygon – A Polygon in the target CRS.

pygeoutils.pygeoutils.geometry_list(geometry)#

Get a list of polygons, points, and lines from a geometry.

pygeoutils.pygeoutils.get_transform(ds, ds_dims=('y', 'x'))#

Get transform of a xarray.Dataset or xarray.DataArray.

Parameters
  • ds (xarray.Dataset or xarray.DataArray) – The dataset(array) to be masked

  • ds_dims (tuple, optional) – Names of the coordinates in the dataset, defaults to ("y", "x"). The order of the dimension names must be (vertical, horizontal).

Returns

rasterio.Affine, int, int – The affine transform, width, and height

pygeoutils.pygeoutils.gtiff2xarray(r_dict, geometry=None, geo_crs=None, ds_dims=None, driver=None, all_touched=False, nodata=None, drop=True)#

Convert (Geo)Tiff byte responses to xarray.Dataset.

Parameters
  • r_dict (dict) – Dictionary of (Geo)Tiff byte responses where keys are names used for naming each response, and values are bytes.

  • geometry (Polygon, MultiPolygon, or tuple, optional) – The geometry to mask the data that should be in the same CRS as the r_dict. Defaults to None.

  • geo_crs (str or pyproj.CRS, optional) – The spatial reference of the input geometry, defaults to None. This argument should be given when geometry is given.

  • ds_dims (tuple of str, optional) – The names of the vertical and horizontal dimensions (in that order) of the target dataset, default to None. If None, dimension names are determined from a list of common names.

  • driver (str, optional) – A GDAL driver for reading the content, defaults to automatic detection. A list of the drivers can be found here: https://gdal.org/drivers/raster/index.html

  • all_touched (bool, optional) – Include a pixel in the mask if it touches any of the shapes. If False (default), include a pixel only if its center is within one of the shapes, or if it is selected by Bresenham’s line algorithm.

  • nodata (float or int, optional) – The nodata value of the raster, defaults to None, i.e., is determined from the raster.

  • drop (bool, optional) – If True, drop the data outside of the extent of the mask geometries. Otherwise, it will return the same raster with the data masked. Default is True.

Returns

xarray.Dataset or xarray.DataArray – Parallel (with dask) dataset or data array.

pygeoutils.pygeoutils.json2geodf(content, in_crs=DEF_CRS, crs=DEF_CRS)#

Create GeoDataFrame from (Geo)JSON.

Parameters
  • content (dict or list of dict) – A (Geo)JSON dictionary e.g., response.json() or a list of them.

  • in_crs (str or pyproj.CRS) – CRS of the content, defaults to epsg:4326.

  • crs (str or pyproj.CRS, optional) – The target CRS of the output GeoDataFrame, defaults to epsg:4326.

Returns

geopandas.GeoDataFrame – Generated geo-data frame from a GeoJSON
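
Examples

A minimal sketch with a hand-built GeoJSON FeatureCollection:

>>> from pygeoutils import json2geodf
>>> content = {
...     "type": "FeatureCollection",
...     "features": [
...         {
...             "type": "Feature",
...             "geometry": {"type": "Point", "coordinates": [-69.77, 45.07]},
...             "properties": {"name": "station_a"},
...         }
...     ],
... }
>>> gdf = json2geodf(content)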

pygeoutils.pygeoutils.snap2nearest(lines, points, tol)#

Find the nearest points on a line to a set of points.

Parameters
  • lines (geopandas.GeoDataFrame or geopandas.GeoSeries) – Lines to which the points are snapped.

  • points (geopandas.GeoDataFrame or geopandas.GeoSeries) – Points to snap to the lines.

  • tol (float) – Tolerance for snapping points to the nearest lines in meters.

Returns

geopandas.GeoDataFrame or geopandas.GeoSeries – Points snapped to lines.

pygeoutils.pygeoutils.xarray2geodf(da, dtype, mask_da=None, connectivity=8)#

Vectorize a xarray.DataArray to a geopandas.GeoDataFrame.

Parameters
  • da (xarray.DataArray) – The dataarray to vectorize.

  • dtype (type) – The data type of the dataarray. Valid types are int16, int32, uint8, uint16, and float32.

  • mask_da (xarray.DataArray, optional) – The dataarray to use as a mask, defaults to None.

  • connectivity (int, optional) – Use 4 or 8 pixel connectivity for grouping pixels into features, defaults to 8.

Returns

geopandas.GeoDataFrame – The vectorized dataarray.
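
For example, a sketch that vectorizes a synthetic binary mask. It assumes the dataarray carries CRS metadata, written here via rioxarray; the grid and values are illustrative:

import numpy as np
import rioxarray  # noqa: F401, registers the .rio accessor
import xarray as xr

import pygeoutils as geoutils

# A synthetic 10x10 binary mask on a small lon/lat grid.
da = xr.DataArray(
    np.random.default_rng(42).integers(0, 2, (10, 10)).astype("int32"),
    dims=("y", "x"),
    coords={"y": np.linspace(45.45, 45.07, 10), "x": np.linspace(-69.77, -69.31, 10)},
    name="mask",
)
da = da.rio.write_crs("epsg:4326")
gdf = geoutils.xarray2geodf(da, "int32", connectivity=8)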

pygeoutils.pygeoutils.xarray_geomask(ds, geometry, crs, all_touched=False, drop=True, from_disk=False)#

Mask a xarray.Dataset based on a geometry.

Parameters
  • ds (xarray.Dataset or xarray.DataArray) – The dataset(array) to be masked

  • geometry (Polygon, MultiPolygon) – The geometry to mask the data

  • crs (str or pyproj.CRS) – The spatial reference of the input geometry

  • all_touched (bool, optional) – Include a pixel in the mask if it touches any of the shapes. If False (default), include a pixel only if its center is within one of the shapes, or if it is selected by Bresenham’s line algorithm.

  • drop (bool, optional) – If True, drop the data outside of the extent of the mask geometries. Otherwise, it will return the same raster with the data masked. Default is True.

  • from_disk (bool, optional) – If True, it will clip from disk using rasterio.mask.mask if possible. This is beneficial when the size of the data is larger than memory. Default is False.

Returns

xarray.Dataset or xarray.DataArray – The input dataset with a mask applied (np.nan)
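
For example, a short sketch that reuses the dem and geometry objects from the gtiff2xarray sketch above:

import pygeoutils as geoutils

# `dem` and `geometry` are the objects from the gtiff2xarray sketch above.
masked = geoutils.xarray_geomask(dem, geometry, "epsg:4326", drop=True)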

Package Contents#

Changelogs#

History#

0.13.1 (2022-06-11)#

New Features#
  • Add support for all the GeoConnex web service endpoints. There are two ways to use it: for a single query, you can use the geoconnex function, and for multiple queries, it's more efficient to use the GeoConnex class.

  • Add support for passing any of the supported NLDI feature sources to the get_basins method of the NLDI class. The default is nwissite to retain backward compatibility.

Bug Fixes#
  • Set the type of “ReachCode” column to str instead of int in pygeoapi and nhdplus_vaa functions.

0.13.0 (2022-04-03)#

New Features#
  • Add two new functions called flowline_resample and network_resample for resampling a flowline or network of flowlines based on a given spacing. This is useful for smoothing jagged flowlines similar to those in the NHDPlus database.

  • Add support for the new NLDI endpoint called “hydrolocation”. The NLDI class now has two methods for getting features by coordinates: feature_byloc and comid_byloc. The feature_byloc method returns the flowline that is associated with the closest NHDPlus feature to the given coordinates. The comid_byloc method returns a point on the closest downstream flowline to the given coordinates.

  • Add a new function called pygeoapi for calling the API in batch mode. This function accepts the input coordinates as a geopandas.GeoDataFrame and is more performant than calling its counterpart, the PyGeoAPI class, multiple times. It's recommended to switch to this new batch function; users just need to prepare an input data frame that has all the required service parameters as columns (see the sketch after this list).

  • Add a new step to prepare_nhdplus to convert MultiLineString to LineString.

  • Add support for the simplified flag of NLDI’s get_basins function. The default value is True to retain the old behavior.
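
A hedged sketch of the batch mode for the flow_trace service; the coordinates are illustrative, and the required columns depend on the service (here, only direction):

import geopandas as gpd
from shapely.geometry import Point

from pynhd import pygeoapi

# One query per row; the service parameters are regular columns.
coords = gpd.GeoDataFrame(
    {"direction": ["none"]},
    geometry=[Point(-73.82705, 43.29139)],
    crs="epsg:4326",
)
trace = pygeoapi(coords, "flow_trace")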

Breaking Changes#
  • Remove caching-related arguments from all functions since now they can be set globally via three environmental variables:

    • HYRIVER_CACHE_NAME: Path to the caching SQLite database.

    • HYRIVER_CACHE_EXPIRE: Expiration time for cached requests in seconds.

    • HYRIVER_CACHE_DISABLE: Disable reading/writing from/to the cache file.

    You can do this like so:

import os

os.environ["HYRIVER_CACHE_NAME"] = "path/to/file.sqlite"
os.environ["HYRIVER_CACHE_EXPIRE"] = "3600"
os.environ["HYRIVER_CACHE_DISABLE"] = "true"

0.12.2 (2022-02-04)#

New Features#
  • Add a new class called NHD for accessing the latest National Hydrography Dataset. More info regarding this data can be found here.

  • Add two new functions for getting cross-sections along a single flowline via flowline_xsection or throughout a network of flowlines via network_xsection. You can specify spacing and width parameters to control their location. For more information and examples please consult the documentation.

  • Add a new property to AGRBase called service_info to include some useful info about the service, including feature_types, which can be handy for converting numeric type values to their string equivalents.

Internal Changes#
  • Use the new PyGeoAPI API.

  • Refactor prepare_nhdplus for improving the performance and robustness of determining tocomid within a network of NHD flowlines.

  • Add empty geometries that NLDI.get_basins returns to the list of not found IDs. This is because the NLDI service does not include non-network flowlines and instead returns an empty geometry for these flowlines. (GH#48)

0.12.1 (2021-12-31)#

Internal Changes#
  • Use the three new ar.retrieve_* functions instead of the old ar.retrieve function to improve type hinting and to make the API more consistent.

  • Revert to the original PyGeoAPI base URL.

0.12.0 (2021-12-27)#

Breaking Changes#
  • Rewrite ScienceBase to make it applicable for working with other ScienceBase items. A new function called stage_nhdplus_attrs has been added for staging the Additional NHDPlus Attributes items.

  • Refactor AGRBase to remove unnecessary functions and make them more general.

  • Update the PyGeoAPI class to conform to the new pygeoapi API. This web service is undergoing some changes at the time of this release and its API is not stable, so it might not work as expected. A new version will be released as soon as the web service is stable.

New Features#
  • In WaterData.byid show a warning if there are any missing feature IDs that are requested but are not available in the dataset.

  • For all by* methods of WaterData throw a ZeroMatched exception if no features are found.

  • Add expire_after and disable_caching arguments to all functions that use async_retriever. Set the default request caching expiration time to never expire. You can use disable_caching if you don’t want to use the cached responses. Please refer to documentation of the functions for more details.

Internal Changes#
  • Refactor prepare_nhdplus to reduce code complexity by grouping all the NHDPlus tools as a private class.

  • Modify AGRBase to reflect the latest API changes in the pygeoogc.ArcGISRESTful class.

  • Refactor prepare_nhdplus by creating a private class that includes all the previously used private functions. This will make the code more readable and easier to maintain.

  • Add all the missing types so mypy --strict passes.

0.11.4 (2021-11-12)#

New Features#
  • Add a new argument to NLDI.get_basins called split_catchment that, if set to True, splits the basin geometry at the watershed outlet.
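
A hedged sketch; the station ID is illustrative:

from pynhd import NLDI

# Get the basin of a USGS station and split it at the watershed outlet.
basin = NLDI().get_basins(["01031500"], split_catchment=True)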

Internal Changes#
  • Catch service errors in PyGeoAPI and show useful error messages.

  • Use importlib-metadata for getting the version instead of pkg_resources to decrease import time as discussed in this issue.

0.11.3 (2021-09-10)#

Internal Changes#
  • More robust handling of inputs and outputs of NLDI’s methods.

  • Use an alternative download link for NHDPlus VAA file on Hydroshare.

  • Restructure the codebase to reduce the complexity of the pynhd.py file by dividing it into three files: pynhd (all classes that provide access to the supported web services), core (base classes), and nhdplus_derived (functions for getting databases that provide additional attributes for the NHDPlus database).

0.11.2 (2021-08-26)#

New Features#
  • Add support for PyGeoAPI. It offers four functionalities: flow_trace, split_catchment, elevation_profile, and cross_section.

0.11.1 (2021-07-31)#

New Features#
  • Add a function for getting all NHD FCodes as a data frame, called nhd_fcode.

  • Improve prepare_nhdplus function by removing all coastlines and better detection of the terminal point in a network.

Internal Changes#
  • Migrate to using AsyncRetriever for handling communications with web services.

  • Catch the ConnectionError separately in NLDI and raise a ServiceError instead, so the user knows that data cannot be returned because the server is out of service, not because of ZeroMatched.

0.11.0 (2021-06-19)#

New Features#
  • Add nhdplus_vaa to access NHDPlus Value Added Attributes for all its flowlines.

  • To see a list of available layers in NHDPlus HR, you can instantiate its class without passing any argument, like so: NHDPlusHR().

Breaking Changes#
  • Drop support for Python 3.6 since many of the dependencies such as xarray and pandas have done so.

Internal Changes#
  • Use persistent caching for all requests which can help speed up network responses significantly.

  • Improve documentation and testing.

0.10.1 (2021-03-27)#

  • Add an announcement regarding the new name for the software stack, HyRiver.

  • Improve pip installation and release workflow.

0.10.0 (2021-03-06)#

  • The first release after renaming hydrodata to PyGeoHydro.

  • Make mypy checks more strict and fix all the errors and prevent possible bugs.

  • Speed up CI testing by using mamba and caching.

0.9.0 (2021-02-14)#

  • Bump version to the same version as PyGeoHydro.

Breaking Changes#
  • Add a new function for getting basin geometries for a list of USGS station IDs. The function is a method of the NLDI class called get_basins, so the NLDI.getfeature_byid function no longer has a basin flag. This change makes getting geometries easier and faster.

  • Remove characteristics_dataframe method from NLDI and make a standalone function called nhdplus_attrs for accessing NHDPlus attributes directly from ScienceBase.

  • Add support for using the hydro or edits web services for getting NHDPlus High-Resolution data via the NHDPlusHR function. The new arguments are service, which accepts hydro or edits, and auto_switch, a flag for automatically switching to the other service if the one passed via service fails.

New Features#
  • Add a new argument to topoogical_sort called edge_attr that allows adding attribute(s) to the returned NetworkX graph. By default, it is None.

  • A new base class, AGRBase, for connecting to ArcGISRESTful-based services such as the National Map and EPA's WaterGEOS.

  • Add support for setting the buffer distance for the input geometries to AGRBase.bygeom.

  • Add comid_byloc to NLDI class for getting ComIDs of the closest flowlines from a list of lon/lat coordinates.

  • Add bydistance to WaterData for getting features within a given radius of a point.

0.2.0 (2020-12-06)#

Breaking Changes#
  • Re-wrote the NLDI function to use API v3 of the NLDI service.

  • The crs argument of WaterData is now the target CRS of the output dataframe. The service CRS is now EPSG:4269 for all the layers.

  • Remove the url_only argument of NLDI since it’s not applicable anymore.

New Features#
  • Added support for NHDPlus High Resolution for getting features by geometry, IDs, or SQL where clause.

  • The following functions are added to NLDI:

    • getcharacteristic_byid: Getting characteristics of NHDPlus catchments.

    • navigate_byloc: Getting the nearest ComID to a coordinate and performing navigation.

    • characteristics_dataframe: Getting all the available catchment-scale characteristics as a data frame.

    • get_validchars: Getting a list of available characteristic IDs for a specified characteristic type.

  • The following functions are added to WaterData:

    • byfilter: Getting data based on any valid CQL filter.

    • bygeom: Getting data within a geometry (polygon and multipolygon).

  • Add support for Python 3.9 and tests for Windows.

Bug Fixes#
  • Refactored WaterData to fix the CRS inconsistencies (#1).

0.1.3 (2020-08-18)#

  • Replaced simplejson with orjson to speed-up JSON operations.

0.1.2 (2020-08-11)#

  • Add show_versions function for showing versions of the installed deps.

  • Improve documentation

0.1.1 (2020-08-03)#

  • Improved documentation

  • Refactored WaterData to improve readability.

0.1.0 (2020-07-23)#

  • First release on PyPI.

History#

0.13.1 (2022-06-11)#

New Features#
  • Add a new function called get_us_states to the helpers module for obtaining a GeoDataFrame of the US states. It has an optional argument for returning the contiguous states, continental states, commonwealth states, or US territories. The data are retrieved from the Census Bureau's TIGER 2021 database.

  • In the NID class keep the valid_fields property as a pandas.Series instead of a list, so it can be searched more easily via its str accessor.

Internal Changes#
  • Refactor the plot.signatures function to use proplot instead of matplotlib.

  • Improve performance of NWIS.get_streamflow by not validating the layer name when instantiating the WaterData class. Also, make the function more robust by checking if streamflow data is available for each station and throwing a warning if not.

Bug Fixes#
  • Fix an issue in NWIS.get_streamflow where -9999 values were not being filtered out. According to NWIS, these values are reserved for ice-affected data. This fix sets these values to numpy.nan.

0.13.0 (2022-04-03)#

New Features#
  • Add a new flag to nlcd_* functions called ssl for disabling SSL verification.

  • Add a new function called get_camels for getting the CAMELS dataset. The function returns a geopandas.GeoDataFrame that includes basin-level attributes for all 671 stations in the dataset and a xarray.Dataset that contains streamflow data for all 671 stations and their basin-level attributes.

  • Add a new function named overland_roughness for getting the overland roughness values from land cover data.

  • Add a new class called WBD for getting watershed boundary (HUC) data, like so:

from pygeohydro import WBD

wbd = WBD("huc4")
hudson = wbd.byids("huc4", ["0202", "0203"])

Breaking Changes#
  • Remove caching-related arguments from all functions since now they can be set globally via three environmental variables:

    • HYRIVER_CACHE_NAME: Path to the caching SQLite database.

    • HYRIVER_CACHE_EXPIRE: Expiration time for cached requests in seconds.

    • HYRIVER_CACHE_DISABLE: Disable reading/writing from/to the cache file.

    You can do this like so:

import os

os.environ["HYRIVER_CACHE_NAME"] = "path/to/file.sqlite"
os.environ["HYRIVER_CACHE_EXPIRE"] = "3600"
os.environ["HYRIVER_CACHE_DISABLE"] = "true"

Internal Changes#
  • Write nodata attribute using rioxarray in nlcd_bygeom since the clipping operation of rioxarray uses this value as the fill value.

0.12.4 (2022-02-04)#

Internal Changes#
  • Return a named tuple instead of a dict of percentages in the cover_statistics function. It makes accessing the values easier.

  • Add pycln as a new pre-commit hook for removing unused imports.

  • Remove time zone info from the inputs to plot.signatures to avoid issues with the matplotlib backend.

Bug Fixes#
  • Fix an issue in plot.signatures where the new matplotlib version requires a numpy array instead of a pandas.DataFrame.

0.12.3 (2022-01-15)#

Bug Fixes#
  • Replace no data values of data in ssebopeta_bygeom with np.nan before converting it to mm/day.

  • Fix an inconsistency issue with CRS projection when using UTM in nlcd_*. Use EPSG:3857 for all reprojections and get the data from NLCD in the same projection. (GH85)

  • Improve performance of nlcd_* functions by reducing the number of service calls.

Internal Changes#
  • Add type checking with typeguard and fix type hinting issues raised by typeguard.

  • Refactor show_versions to ensure getting correct versions of all dependencies.

0.12.2 (2021-12-31)#

New Features#
  • NWIS.get_info now returns a geopandas.GeoDataFrame instead of a pandas.DataFrame.

Bug Fixes#
  • Fix a bug in NWIS.get_streamflow where the drainage area might not be computed correctly if target stations are not located at the outlet of their watersheds.

0.12.1 (2021-12-31)#

Internal Changes#
  • Use the three new ar.retrieve_* functions instead of the old ar.retrieve function to improve type hinting and to make the API more consistent.

Bug Fixes#
  • Fix an issue in NWIS.get_streamflow where the time zone of the data was not determined correctly when it used US-specific abbreviations such as CST.

0.12.0 (2021-12-27)#

New Features#
  • Add support for getting instantaneous streamflow from NWIS in addition to the daily streamflow by adding a freq argument to NWIS.get_streamflow that can be either iv or dv. The default is dv, to retain the previous behavior of the function (see the sketch after this list).

  • Convert the time zone of the streamflow data to UTC.

  • Attach attributes of the requested stations to the returned pandas.DataFrame via its attrs property. (GH75)

  • Add a new flag to NWIS.get_streamflow for returning the streamflow as a xarray.Dataset. This dataset has two dimensions, time and station_id, and ten variables, which include discharge and nine other station attributes. (GH75)

  • Add drain_sqkm from GagesII to NWIS.get_info.

  • Show drain_sqkm in the interactive map generated by interactive_map.

  • Add two new functions for getting NLCD data: nlcd_bygeom and nlcd_bycoords. The new nlcd_bycoords function returns a geopandas.GeoDataFrame with the NLCD layers as columns and the input coordinates, which should be a list of (lon, lat) tuples, as the geometry column. Moreover, the new nlcd_bygeom function now accepts a geopandas.GeoDataFrame as the input. In this case, it returns a dict with keys as indices of the input geopandas.GeoDataFrame. (GH80)

  • The previous nlcd function is being deprecated. For now, it calls nlcd_bygeom internally and retains the old behavior. This function will be removed in future versions.
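
A hedged sketch of the new freq and xarray options; the station ID and dates are illustrative:

from pygeohydro import NWIS

nwis = NWIS()
dates = ("2000-01-01", "2000-12-31")
# Daily values as a pandas.DataFrame (the default freq).
qobs = nwis.get_streamflow(["01031500"], dates, freq="dv")
# The same request returned as a xarray.Dataset with station attributes.
qds = nwis.get_streamflow(["01031500"], dates, to_xarray=True)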

Breaking Changes#
  • The ssebop_byloc is being deprecated and replaced by ssebop_bycoords. The new function accepts a pandas.DataFrame as input that should include three columns: id, x, and y. It returns a xarray.Dataset with two dimensions: time and location_id. The id column from the input is used as the location_id dimension. The ssebop_byloc function still retains the old behavior and will be removed in future versions.

  • Set the request caching’s expiration time to never expire. Add two flags to all functions to control the caching: expire_after and disable_caching.

  • Replace NID class with the new RESTful-based web service of National Inventory of Dams. The new NID service is very different from the old one, so this is considered a breaking change.

Internal Changes#
  • Improve exception handling in NWIS.get_info when NWIS returns an error message rather than a 500 web service error.

  • The NWIS.get_streamflow function now checks if the site info dataset contains any duplicates. Therefore, all the remaining station numbers will be unique. This prevents an issue with setting attrs where duplicate indexes cause an exception when being converted to a dict. (GH75)

  • Add all the missing types so mypy --strict passes.

0.11.4 (2021-11-24)#

New Features#
  • Add support for the Water Quality Portal Web Services. (GH72)

  • Add support for two versions of NID web service. The original NID web service is considered version 2 and the new NID is considered version 3. You can pass the version number to the NID like so NID(2). The default version is 2.

Bug Fixes#
  • Fix an issue with background percentage calculation in cover_statistics.

0.11.3 (2021-11-12)#

New Features#
  • Add a new map service for National Inventory of Dams (NID).

Internal Changes#
  • Use importlib-metadata for getting the version instead of pkg_resources to decrease import time as discussed in this issue.

0.11.2 (2021-07-31)#

Bug Fixes#
  • Refactor cover_statistics to address an issue with wrong category names and also improve performance for large datasets by using numpy’s functions.

  • Fix an issue with detecting the wrong number of stations in NWIS.get_streamflow. Also, improve filtering of stations whose start/end dates don't match the user-requested interval.

0.11.1 (2021-07-31)#

The highlight of this release is adding support for NLCD 2019 and significant improvements in NWIS support.

New Features#
  • Add support for the recently released version of NLCD (2019), including the impervious descriptor layer. Highlights of the new database are:

    NLCD 2019 now offers land cover for years 2001, 2004, 2006, 2008, 2011, 2013, 2016, 2019, and impervious surface and impervious descriptor products now updated to match each date of land cover. These products update all previously released versions of land cover and impervious products for CONUS (NLCD 2001, NLCD 2006, NLCD 2011, NLCD 2016) and are not directly comparable to previous products. NLCD 2019 land cover and impervious surface product versions of previous dates must be downloaded for proper comparison. NLCD 2019 also offers an impervious surface descriptor product that identifies the type of each impervious surface pixel. This product identifies types of roads, wind tower sites, building locations, and energy production sites to allow deeper analysis of developed features.

    MRLC

  • Add support for all the supported regions of NLCD database (CONUS, AK, HI, and PR).

  • Add support for passing multiple years to the NLCD function, like so {"cover": [2016, 2019]}.

  • Add plot.descriptor_legends function to plot the legend for the impervious descriptor layer.

  • New features in NWIS class are:

    • Remove the query_* methods, since passing queries directly as a dictionary is more convenient.

    • Add a new function called get_parameter_codes to query parameters and get information about them.

    • To decrease the complexity of the get_streamflow method, add a new private function to handle some of its tasks.

    • Make retrieve_rdb more general for handling more of NWIS's services.

  • Add a new argument called nwis_kwds to interactive_map so any NWIS-specific keywords can be passed for filtering stations.

  • Improve exception handling in the get_info method, and simplify it and improve its performance for getting HCDN data.

Internal Changes#
  • Migrate to using AsyncRetriever for handling communications with web services.

0.11.0 (2021-06-19)#

Breaking Changes#
  • Drop support for Python 3.6 since many of the dependencies such as xarray and pandas have done so.

  • Remove get_nid and get_nid_codes functions since NID now has an ArcGISRESTful service.

New Features#
  • Add a new class called NID for accessing the recently released National Inventory of Dams web service. This service is based on ArcGIS's RESTful service, so now the user just needs to instantiate the class, like so: NID(), and retrieve the data with three methods of the AGRBase class: bygeom, byids, and bysql. Moreover, it has an attrs property that includes descriptions of the database fields with their units.

  • Refactor NWIS.get_info to be more generic by accepting any valid queries that are documented at USGS Site Web Service.

  • Allow for passing a list of queries to NWIS.get_info and use async_retriever, which significantly improves the network response time.

  • Add two new flags to interactive_map for limiting the stations to those with daily values (dv=True) and/or instantaneous values (iv=True). This function now includes a link to each station's webpage on the USGS website.

Internal Changes#
  • Use persistent caching for all send/receive requests that can significantly improve the network response time.

  • Explicitly include all the hard dependencies in setup.cfg.

  • Refactor interactive_map and NWIS.get_info to make them more efficient and reduce their code complexity.

0.10.2 (2021-03-27)#

Internal Changes#
  • Add announcement regarding the new name for the software stack, HyRiver.

  • Improve pip installation and release workflow.

0.10.1 (2021-03-06)#

Internal Changes#
  • Add lxml to deps.

0.10.0 (2021-03-06)#

Internal Changes#
  • The official first release of PyGeoHydro with a new name and logo.

  • Replace cElementTree with ElementTree since it’s been deprecated by defusedxml.

  • Make mypy checks more strict and fix all the errors and prevent possible bugs.

  • Speed up CI testing by using mamba and caching.

0.9.2 (2021-03-02)#

Internal Changes#
  • Rename hydrodata package to PyGeoHydro for publication on JOSS.

  • In NWIS.get_info, drop rows that don’t have mean daily discharge data instead of slicing.

  • Speed up Github Actions by using mamba and caching.

  • Improve pip installation by adding pyproject.toml.

New Features#
  • Add support for the National Inventory of Dams (NID) via get_nid function.

0.9.1 (2021-02-22)#

Internal Changes#
  • Fix an issue with NWIS.get_info method where stations with False values as their hcdn_2009 value were returned as None instead.

0.9.0 (2021-02-14)#

Internal Changes#
  • Bump versions of packages across the stack to the same version.

  • Use the new PyNHD function for getting basins, NLDI.get_basins.

  • Made mypy checks more strict and added all the missing type annotations.

0.8.0 (2020-12-06)#

  • Fixed the issue with WaterData due to the recent changes on the server side.

  • Updated the examples based on the latest changes across the stack.

  • Add support for multipolygon.

  • Remove the fill_hole argument.

  • Fix a warning in nlcd regarding performing division on nan values.

0.7.2 (2020-08-18)#

Enhancements#
  • Replaced simplejson with orjson to speed-up JSON operations.

  • Explicitly sort the time dimension of the ssebopeta_bygeom function.

Bug Fixes#
  • Fix an issue with the nlcd function where high resolution requests fail.

0.7.1 (2020-08-13)#

New Features#
  • Added a new argument to plot.signatures for controlling the vertical position of the plot title, called title_ypos. This could be useful for multi-line titles.

Bug Fixes#
  • Fixed an issue with the nlcd function where None layers were not dropped, causing the function to fail.

0.7.0 (2020-08-12)#

This version divides PyGeoHydro into six standalone Python libraries, so many of the changes listed below belong to modules and functions that are now separate packages. This decision was made to reduce the complexity of the code base and to allow users to install only the packages they need without having to install all the PyGeoHydro dependencies.

Breaking changes#
  • The services module is now a separate package called PyGeoOGC and is set as a requirement for PyGeoHydro. PyGeoOGC is a leaner package with much fewer dependencies and is suitable for people who might only need an interface to web services.

  • Unified function names for getting feature by ID and by box.

  • Combined start and end arguments into a tuple argument called dates across the code base.

  • Rewrote the NLDI function and moved most of its classmethods to Station, so now the Station class has more cohesion.

  • Removed exploratory functionality of ArcGISREST, since it’s more convenient to do so from a browser. Now, base_url is a required argument.

  • Renamed in_crs in datasets and services functions to geo_crs for geometry and box_crs for bounding box inputs.

  • Re-wrote the signatures function from scratch using NamedTuple to improve readability and efficiency. Now, the daily argument should be just a pandas.DataFrame or pandas.Series and the column names are used for legends.

  • Removed utils.geom_mask function and replaced it with rasterio.mask.mask.

  • Removed width as an input in functions with raster output since resolution is almost always the preferred way to request data. This change made the code more readable.

  • Renamed two functions: ArcGISRESTful and wms_bybox. These functions now return requests.Response type outputs.

  • onlyipv4 is now a class method in RetrySession.

  • The plot.signatures function now assumes that the input time series are in mm/day.

  • Added a flag to the get_streamflow function in the NWIS class to convert from cms to mm/day, which is useful for plotting hydrologic signatures using the signatures function.

Enhancements#
  • Remove soft requirements from the env files.

  • Refactored requests functions into a single class and a separate file.

  • Made all the classes available directly from PyGeoHydro.

  • Added CodeFactor to the Github pipeline and addressed some issues that CodeFactor found.

  • Added Bandit to check the code for security issues.

  • Improved docstrings and documentations.

  • Added customized exceptions for better exception handling.

  • Added pytest fixtures to improve the speed of the tests.

  • Refactored daymet and nwis_siteinfo functions to reduce code complexity and improve readability.

  • Major refactoring of the code base while adding type hinting.

  • The input geometry (or bounding box) can be provided in any projection and the necessary re-projections are done under the hood.

  • Refactored the method for getting object IDs in ArcGISREST class to improve robustness and efficiency.

  • Refactored Daymet class to improve readability.

  • Add Deepsource for further code quality checking.

  • Automatic handling of large WMS requests (more than 8 million pixels, i.e., width × height).

  • The json_togeodf function now accepts either a single (Geo)JSON or a list of them.

  • Refactored plot.signatures using add_gridspec for a much cleaner code.

New Features#
  • Added access to WaterData’s GeoServer databases.

  • Added access to the remaining NLDI database (Water Quality Portal and Water Data Exchange).

  • Created a Binder for launching a computing environment on the cloud and testing PyGeoHydro.

  • Added a URL repository for the supported services called ServiceURL.

  • Added support for FEMA web services for flood maps and FWS for wetlands.

  • Added a new function called wms_toxarray for converting WMS request responses to xarray.DataArray or xarray.Dataset.

Bug Fixes#
  • Re-projection issues for functions with input geometry.

  • Start and end variables not being initialized when coords was used in Station.

  • Geometry mask for xarray.DataArray.

  • WMS output re-projections.

0.6.0 (2020-06-23)#

  • Refactor requests session

  • Improve overall code quality based on CodeFactor suggestions

  • Migrate to Github Actions from TravisCI

0.5.5 (2020-06-03)#

  • Add to conda-forge

  • Remove pqdm and arcgis2geojson dependencies

0.5.3 (2020-06-07)#

  • Added threading capability to the flow accumulation function

  • Generalized WFS to include both by-bbox and by-featureID queries.

  • Migrate RTD to pip from conda.

  • Changed HCDN database source to GagesII database

  • Increased robustness of functions that need network connections

  • Made the flow accumulation output a pandas Series for better handling of time series input

  • Combined DEM, slope, and aspect in a class called NationalMap.

  • Installation from pip installs all the dependencies

0.5.0 (2020-04-25)#

  • An almost complete re-writing of the code base and not backward-compatible

  • New website design

  • Added vector accumulation

  • Added base classes and functions for accessing any ArcGIS REST, WMS, or WFS service.

  • Standalone functions for creating datasets from responses and masking the data

  • Added threading using pqdm to speed up the downloads

  • Interactive map for exploring USGS stations

  • Replaced OpenTopography with 3DEP

  • Added HCDN database for identifying natural watersheds

0.4.4 (2020-03-12)#

  • Added new databases: NLDI, NHDPlus V2, OpenTopography, gridded Daymet, and SSEBop.

  • The gridded data are returned as xarray DataArrays

  • Removed dependency on StreamStats and replaced it by NLDI

  • Improved overall robustness and efficiency of the code

  • Not backward-compatible.

  • Added code style enforcement with isort, black, flake8 and pre-commit

  • Added a new shiny logo!

  • New installation method

  • Changed the OpenTopography base URL to their new server.

  • Fixed NLCD legend and statistics bug

0.3.0 (2020-02-10)#

  • Clipped the obtained NLCD data using the watershed geometry

  • Added support for specifying the year for getting NLCD

  • Removed direct NHDPlus data download dependency by using StreamStats and USGS APIs

  • Renamed get_lulc function to get_nlcd

0.2.0 (2020-02-09)#

  • Simplified import method

  • Changed usage from rst format to ipynb

  • Auto-formatting with the black python package

  • Change docstring format based on Sphinx

  • Fixed pytest warnings and changed its working directory

  • Added an example notebook with data files

  • Added docstring for all the functions

  • Added Module section to the documentation

  • Fixed py7zr issue

  • Changed 7z extractor from pyunpack to py7zr

  • Fixed some linting issues.

0.1.0 (2020-01-31)#

  • First release on PyPI.

History#

0.13.1 (2022-06-11)#

New Features#
  • In the deg2mpm function, look for _FillValue and nodatavals in the attributes and, if not found, fall back to numpy.nan.

Internal Changes#
  • Ensure that the deg2mpm function uses dask if the input is dask-enabled.

  • In the elevation_profile function use a bounding box to get DEM and a linear interpolation to get the elevation along the profile.

0.13.0 (2022-04-03)#

New Features#
  • Add a new function called query_3dep_sources for querying bounds of 3DEP’s data sources within a bounding box. It returns a geo-dataframe that contains the bounding box of each data source and a column dem_res identifying the resolution of the raw topographic data within each geometry.

  • Add a new function called elevation_profile for getting elevation profile along a line at a given spacing. This function converts the line to a B-spline and then calculates the elevation along the spline at a given uniform spacing.

Breaking Changes#
  • Remove caching-related arguments from all functions since now they can be set globally via three environmental variables:

    • HYRIVER_CACHE_NAME: Path to the caching SQLite database.

    • HYRIVER_CACHE_EXPIRE: Expiration time for cached requests in seconds.

    • HYRIVER_CACHE_DISABLE: Disable reading/writing from/to the cache file.

    You can do this like so:

import os

os.environ["HYRIVER_CACHE_NAME"] = "path/to/file.sqlite"
os.environ["HYRIVER_CACHE_EXPIRE"] = "3600"
os.environ["HYRIVER_CACHE_DISABLE"] = "true"

0.12.2 (2022-01-15)#

New Features#
  • Add a new DEM source to elevation_bycoords to get elevation from the National Map’s 3DEP WMS service. This can replace the tnm source since tnm is not stable.

  • Add a new function called check_3dep_availability to check the availability of 3DEP's native resolutions within an area of interest. It returns a dict whose keys are the native resolutions and whose values are booleans indicating whether each resolution is available (see the sketch after this list).

  • Replace no data values of slope in deg2mpm with np.nan, so they do not get converted to another value. The output of this function has the np.float64 type.
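
A hedged sketch; the bounding box (west, south, east, north, in EPSG:4326) is illustrative:

import py3dep

# Returns a dict mapping 3DEP's native resolutions to availability booleans.
avail = py3dep.check_3dep_availability((-69.77, 45.07, -69.31, 45.45))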

Internal Changes#
  • Refactor ElevationByCoords by using __post_init__ for validating the input parameters rather than pydantic’s validators.

  • Refactor elevation_bygrid by using get_map to get DEM and rioxarray for re-projection.

  • Add type checking with typeguard and fix typing issues raised by typeguard.

  • Refactor show_versions to ensure getting correct versions of all dependencies.

0.12.1 (2021-12-31)#

Internal Changes#
  • Use the three new ar.retrieve_* functions instead of the old ar.retrieve function to improve type hinting and to make the API more consistent.

0.12.0 (2021-12-27)#

Breaking Changes#
  • Set the request caching’s expiration time to never expire. Add two flags to all functions to control the caching: expire_after and disable_caching.

Internal Changes#
  • Add all the missing types so mypy --strict passes.

  • Improve performance of elevation_bygrid by ignoring unnecessary validation.

0.11.4 (2021-11-12)#

Internal Changes#
  • Use rioxarray for dealing with GeoTIFF binaries since xarray deprecated the xarray.open_rasterio function, as it’s discussed in this PR.

  • Use importlib-metadata for getting the version instead of pkg_resources to decrease import time as discussed in this issue.

0.11.3 (2021-10-03)#

Breaking Changes#
  • Rewrite the command-line interface using click.group to improve UX. The command is now py3dep [command] [args] [options]. The two supported commands are coords, for getting elevations of a dataframe of coordinates in the EPSG:4326 CRS, and geometry, for getting the elevation of a geo-dataframe of geometries. Each sub-command now has a separate help message. The format of the input file for the coords command is now csv, and for the geometry command it is .shp or .gpkg and must have a crs attribute. Also, the geometry command now accepts multiple layers via the --layers (-l) option. More information and examples can be found in the README.rst file.

Internal Changes#
  • The get_map function now validates the input layers argument before sending the actual request and shows a more helpful message.

  • Improve docstrings.

  • Move deg2mpm, fill_depressions, and reproject_gtiff functions to a new file called utils. Both deg2mpm and fill_depressions functions are still accessible from py3dep directly.

  • Increase the test coverage.

  • Use one of click's internal functions, click.testing.CliRunner, to run the CLI tests.

0.11.2 (2021-09-17)#

Bug Fixes#
  • Fix a bug related to elevation_bycoords where CRS validation fails if its type is pyproj.CRS, by converting inputs with CRS types to string.

Internal Changes#
  • Fix a couple of typing issues and update the get_transform API based on the recent changes in pygeoutils v0.11.5.

0.11.1 (2021-07-31)#

The first highlight of this release is a major refactor of elevation_bycoords by adding support for the Bulk Point Query Service and improving the overall performance of the function. Another highlight is support for performing depression filling in elevation_bygrid before sampling the underlying DEM.

New Features#
  • Refactor the elevation_bycoords function to add support for getting elevations of a list of coordinates via The National Map's Point Query Service. This service is more accurate than AirMap but is limited to the US only. You can select the source via a new argument called source, e.g., source=tnm to use the TNM service. The default is tnm (see the sketch after this list).

  • Refactor the elevation_bygrid function to add a new capability via the fill_depressions argument for filling depressions in the obtained DEM before extracting elevation data for the input grid points. This is achieved via RichDEM, which needs to be installed if this functionality is desired. You can install it via pip or conda (mamba).
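
A hedged sketch of the refactored function; the coordinates are illustrative:

import py3dep

# Elevations (in meters) for a list of (lon, lat) coordinates from the TNM service.
coords = [(-69.77, 45.07), (-69.31, 45.45)]
elev = py3dep.elevation_bycoords(coords, crs="epsg:4326", source="tnm")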

Internal Changes#
  • Migrate to using AsyncRetriever for handling communications with web services.

  • Handle the interpolation step in elevation_bygrid function more efficiently using xarray.

0.11.0 (2021-06-19)#

New Features#
  • Added command-line interface (GH10).

  • All feature query functions use persistent caching that can significantly improve the performance.

Breaking Changes#
  • Drop support for Python 3.6 since many of the dependencies such as xarray and pandas have done so.

  • The returned xarray objects are in parallel mode, i.e., in some cases compute method should be used to get the results.

  • Save the output as a netcdf instead of raster since conversion from nc to tiff can be easily done with rioxarray.

0.10.1 (2021-03-27)#

  • Add announcement regarding the new name for the software stack, HyRiver.

  • Improve pip installation and release workflow.

0.10.0 (2021-03-06)#

  • The first release after renaming hydrodata to PyGeoHydro.

  • Make mypy checks more strict and fix all the errors and prevent possible bugs.

  • Speed up CI testing by using mamba and caching.

0.9.0 (2021-02-14)#

  • Bump version to the same version as PyGeoHydro.

  • Add support for saving maps as geotiff file(s).

  • Replace the Elevation Point Query Service with AirMap for getting elevations for a list of coordinates in bulk, since AirMap is much faster. The resolution of AirMap is 30 m.

  • Use cytoolz for some operations for improving performance.

0.2.0 (2020-12-06)#

  • Add support for multipolygon.

  • Remove the fill_hole argument.

  • Add a new function to get elevations for a list of coordinates called elevation_bycoords.

  • Refactor elevation_bygrid function for increasing readability and performance.

0.1.7 (2020-08-18)#

  • Added a rename operation to get_map to automatically rename the variables to more sensible names.

  • Replaced simplejson with orjson to speed-up JSON operations.

0.1.6 (2020-08-11)#

  • Add a new function, show_versions, for getting versions of the installed dependencies which is useful for debugging and reporting.

  • Fix typos in the docs and improve the README.

  • Improve testing and coverage.

0.1.5 (2020-08-03)#

  • Fixed the geometry CRS issue

  • Improved the documentation

0.1.4 (2020-07-23)#

  • Refactor get_map to use pygeoutils package.

  • Change the versioning method to setuptools_scm.

  • Polish README and add installation from conda-forge.

0.1.0 (2020-07-19)#

  • First release on PyPI.

History#

0.13.1 (2022-06-11)#

Bug Fixes#
  • Set the end year based on the current year since Daymet data get updated every year (PR55) by Tim Cera.

  • Set the months for the annual timescale to correct values (PR55) by Tim Cera.

0.13.0 (2022-03-03)#

Breaking Changes#
  • Remove caching-related arguments from all functions since now they can be set globally via three environmental variables:

    • HYRIVER_CACHE_NAME: Path to the caching SQLite database.

    • HYRIVER_CACHE_EXPIRE: Expiration time for cached requests in seconds.

    • HYRIVER_CACHE_DISABLE: Disable reading/writing from/to the cache file.

    You can do this like so:

import os

os.environ["HYRIVER_CACHE_NAME"] = "path/to/file.sqlite"
os.environ["HYRIVER_CACHE_EXPIRE"] = "3600"
os.environ["HYRIVER_CACHE_DISABLE"] = "true"

0.12.3 (2022-02-04)#

New Features#
  • Add a new flag to both get_bycoords and get_bygeom functions called snow, which separates snow from precipitation using the Martinez and Gupta (2010) method (see the sketch below).
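
A hedged sketch; the coordinates and dates are illustrative:

import pydaymet as daymet

# Daily climate data at one pixel with snow partitioned from precipitation.
clm = daymet.get_bycoords((-69.77, 45.07), ("2000-01-01", "2000-06-30"), snow=True)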

Internal Changes#
  • Add elevation data when computing PET regardless of the pet method.

  • Match the chunk size of elevation with that of the climate data.

  • Drop time dimension from elevation, lon, and lat variables.

Bug Fixes#
  • Fix a bug in setting dates for monthly timescales. For the monthly timescale, the Daymet calendar is set to the 15th or 16th of each month, so input dates need to be adjusted accordingly.

0.12.2 (2022-01-15)#

Internal Changes#
  • Clean up the PET computation functions’ output by removing temporary variables that are created during the computation.

  • Add more attributes for elevation and pet variables.

  • Add type checking with typeguard and fix typing issues raised by typeguard.

  • Refactor show_versions to ensure getting correct versions of all dependencies.

0.12.1 (2021-12-31)#

Internal Changes#
  • Use the three new ar.retrieve_* functions instead of the old ar.retrieve function to improve type hinting and to make the API more consistent.

0.12.0 (2021-12-27)#

New Features#
  • Expose the ssl argument for disabling the SSL certification verification (GH41). Now, you can pass ssl=False to disable the SSL verification in both get_bygeom and get_bycoords functions. Moreover, you can pass --disable_ssl to PyDaymet's command line interface to disable the SSL verification.

Breaking Changes#
  • Set the request caching’s expiration time to never expire. Add two flags to all functions to control the caching: expire_after and disable_caching.

Internal Changes#
  • Add all the missing types so mypy --strict passes.

0.11.4 (2021-11-12)#

Internal Changes#
  • Use importlib-metadata for getting the version instead of pkg_resources to decrease import time as discussed in this issue.

0.11.3 (2021-10-07)#

Bug Fixes#
  • There was an issue in the PET computation due to dayofyear being added as a new dimension. This version fixes it and even further simplifies the code by using xarray’s dt accessor to gain access to the dayofyear method.

0.11.2 (2021-10-07)#

New Features#
  • Add hargreaves_samani and priestley_taylor methods for computing PET.

Breaking Changes#
  • Rewrite the command-line interface using click.group to improve UX. The command is now pydaymet [command] [args] [options]. The two supported commands are coords, for getting climate data for a dataframe of coordinates, and geometry, for getting gridded climate data for a geo-dataframe. Moreover, each sub-command now has a separate help message and example.

  • Deprecate get_byloc in favor of get_bycoords.

  • The pet argument in both get_bycoords and get_bygeom functions now accepts hargreaves_samani, penman_monteith, priestley_taylor, and None.

Internal Changes#
  • Refactor the pet module for reducing duplicate code and improving readability and maintainability. The code is smaller now and the functions for computing physical properties include references to equations from the respective original paper.

0.11.1 (2021-07-31)#

The highlight of this release is a major refactor of Daymet to allow extending the PET computation to methods other than FAO-56.

New Features#
  • Refactor Daymet class by removing pet_bycoords and pet_bygrid methods and creating a new public function called potential_et. This function computes potential evapotranspiration (PET) and supports both gridded (xarray.Dataset) and single pixel (pandas.DataFrame) climate data. The long-term plan is to add support for methods other than FAO 56 for computing PET.

0.11.0 (2021-06-19)#

New Features#
  • Add command-line interface (GH7).

  • Use AsyncRetriever for sending requests asynchronously with persistent caching. A cache folder in the current directory is created.

  • Check for validity of start/end dates based on Daymet V4 since Puerto Rico data starts from 1950 while North America and Hawaii start from 1980.

  • Check for validity of input coordinate/geometry based on the Daymet V4 bounding boxes.

  • Improve accuracy of computing the psychrometric constant in PET calculations by using an equation in Allen et al. 1998.

Breaking Changes#
  • Drop support for Python 3.6 since many of the dependencies such as xarray and pandas have done so.

  • Change loc_crs and geo_crs arguments to crs in get_bycoords and get_bygeom.

Documentation#
  • Add examples to docstrings and improve writing.

  • Add more notes regarding the underlying assumptions for pet_bycoords and pet_bygrid.

Internal Changes#
  • Refactor Daymet class to use pydantic for validating the inputs.

  • Increase test coverage.

0.10.2 (2021-03-27)#

  • Add announcement regarding the new name for the software stack, HyRiver.

  • Improve pip installation and release workflow.

0.10.0 (2021-03-06)#

  • The first release after renaming hydrodata to PyGeoHydro.

  • Make mypy checks more strict and fix all the errors and prevent possible bugs.

  • Speed up CI testing by using mamba and caching.

0.9.0 (2021-02-14)#

  • Bump version to the same version as PyGeoHydro.

  • Update to version 4 of Daymet database. You can check the release information here

  • Add a new function called get_bycoords that provides an alternative to get_byloc for getting climate data at a single pixel. This new function uses the THREDDS data server with the NetCDF Subset Service (NCSS) and supports getting monthly and annual averages directly from the server. Note that this function will replace get_byloc in the future, so consider migrating your code by replacing get_byloc with get_bycoords. The input arguments of get_bycoords are very similar to get_bygeom. Another difference between get_byloc and get_bycoords is column names, where get_bycoords uses the units that are returned by the NCSS server.

  • Add support for downloading monthly and annual summaries in addition to the daily timescale. You can pass time_scale as daily, monthly, or annual to the get_bygeom or get_bycoords functions to download the respective summaries (see the sketch after this list).

  • Add support for getting climate data for Hawaii and Puerto Rico by passing region to get_bygeom and get_bycoords functions. The acceptable values are na for CONUS, hi for Hawaii, and pr for Puerto Rico.
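
A hedged sketch combining the time_scale and region options; the geometry, dates, and variable are illustrative:

import pydaymet as daymet
from shapely.geometry import Polygon

geometry = Polygon(
    [(-69.77, 45.07), (-69.31, 45.07), (-69.31, 45.45), (-69.77, 45.45)]
)
# Monthly precipitation summaries over CONUS ("na") for one year.
clm = daymet.get_bygeom(
    geometry,
    ("2000-01-01", "2000-12-31"),
    variables="prcp",
    region="na",
    time_scale="monthly",
)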

0.2.0 (2020-12-06)#

  • Add support for multipolygon.

  • Remove the fill_hole argument.

  • Improve masking by geometry.

  • Use the newly added async_requests function from pygeoogc for getting Daymet data to increase the performance (almost 2x faster).

0.1.3 (2020-08-18)#

  • Replaced simplejson with orjson to speed-up JSON operations.

0.1.2 (2020-08-11)#

  • Add show_versions for showing versions of the installed deps.

0.1.1 (2020-08-03)#

  • Retained the compatibility with xarray 0.15 by removing the attrs flag.

  • Replaced open_dataset with load_dataset for automatic handling of closing the input after reading the content.

  • Removed years argument from both byloc and bygeom functions. The dates argument now accepts both a tuple of start and end dates and a list of years.

0.1.0 (2020-07-27)#

  • Initial release on PyPI.

History#

0.3.2 (2022-04-03)#

New Features#
  • Add support for setting caching-related arguments using three environmental variables:

    • HYRIVER_CACHE_NAME: Path to the caching SQLite database.

    • HYRIVER_CACHE_EXPIRE: Expiration time for cached requests in seconds.

    • HYRIVER_CACHE_DISABLE: Disable reading/writing from/to the cache file.

    You can do this like so:

import os

os.environ["HYRIVER_CACHE_NAME"] = "path/to/file.sqlite"
os.environ["HYRIVER_CACHE_EXPIRE"] = "3600"
os.environ["HYRIVER_CACHE_DISABLE"] = "true"

Internal Changes#
  • Include the URL of a failed request in its exception error message.

0.3.1 (2021-12-31)#

New Features#
  • Add three new functions called retrieve_text, retrieve_json, and retrieve_binary. These functions are derived from the retrieve function and are used to retrieve the text, JSON, or binary content of a response. They are meant to help with type hinting, since each has a single return type instead of the three different return types that the retrieve function has (see the sketch below).
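
A hedged sketch using retrieve_text; the service URL and parameters are illustrative:

import async_retriever as ar

# One URL and one matching set of request keywords.
url = "https://waterservices.usgs.gov/nwis/site"
kwds = [{"params": {"format": "rdb", "sites": "01646500", "siteStatus": "all"}}]
resp = ar.retrieve_text([url], kwds)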

Internal Changes#
  • Move all private functions to a new module called utils. This makes the code-base more readable and easier to maintain.

0.3.0 (2021-12-27)#

Breaking Changes#
  • Set the expiration time to never expire by default.

New Features#
  • Add two new arguments to retrieve for controlling caching. First, delete_url_cache for deleting caches for specific requests. Second, expire_after for setting a custom expiration time.

  • Expose the ssl argument for disabling the SSL certification verification (GH41).

  • Add a new option called disable that temporarily disables caching requests/responses if set to True. It defaults to False.

0.2.5 (2021-11-09)#

New Features#
  • Add two new arguments, timeout and expire_after, to retrieve. These two arguments give the user more control in dealing with issues related to caching.

Internal Changes#
  • Revert to pytest as the testing framework.

  • Use importlib-metadata for getting the version instead of pkg_resources to decrease import time as discussed in this issue.

0.2.4 (2021-09-10)#

Internal Changes#
  • Use ujson for converting responses to JSON.

Bug Fixes#
  • Fix an issue with catching service error messages.

0.2.3 (2021-08-26)#

Internal Changes#
  • Use ujson for JSON parsing instead of orjson since orjson only serializes to bytes which is not compatible with aiohttp.

0.2.2 (2021-08-19)#

New Features#
  • Add a new function, clean_cache, for manually removing the expired responses from the cache database.

Internal Changes#
  • Handle all cache file-related operations in the create_cachefile function.

0.2.1 (2021-07-31)#

New Features#
  • The responses are now returned in the same order as the input URLs.

  • Add support for passing connection type, i.e., IPv4 only, IPv6 only, or both via the family argument. Defaults to both.

  • Set trust_env=True, so the session can read the system's netrc files. This can be useful for working with services such as the EarthData service, which reads user authentication info from a netrc file.

Internal Changes#
  • Replace the AsyncRequest class with the _retrieve function to increase readability and reduce overhead.

  • More robust handling of validating user inputs via a new class called ValidateInputs.

  • Move all if-blocks in async_session to other functions to improve performance.

0.2.0 (2021-06-17)#

Breaking Changes#
  • Make persistent caching dependencies required.

  • Rename request argument to request_method in retrieve which now accepts both lower and upper cases of get and post.

Bug Fixes#
  • Pass a new loop explicitly to nest_asyncio (GH1).

Internal Changes#
  • Refactor the entire code-base for more efficient handling of different request methods.

  • Check the validity of inputs before sending requests.

  • Improve documentation.

  • Improve cache handling by removing the expired responses before returning the results.

  • Increase testing coverage to 100%.

0.1.0 (2021-05-01)#

  • Initial release.

History#

0.13.1 (2022-06-11)#

New Features#
  • More robust handling of errors in ArcGISRESTful by catching None responses. Also, use the POST method for ArcGISRESTful.bysql, since the SQL clause could be a long string.

0.13.0 (2022-04-03)#

Breaking Changes#
  • Remove caching-related arguments from all functions since now they can be set globally via three environmental variables:

    • HYRIVER_CACHE_NAME: Path to the caching SQLite database.

    • HYRIVER_CACHE_EXPIRE: Expiration time for cached requests in seconds.

    • HYRIVER_CACHE_DISABLE: Disable reading/writing from/to the cache file.

    You can do this like so:

import os

os.environ["HYRIVER_CACHE_NAME"] = "path/to/file.sqlite"
os.environ["HYRIVER_CACHE_EXPIRE"] = "3600"
os.environ["HYRIVER_CACHE_DISABLE"] = "true"

Bug Fixes#
  • In ArcGISRESTful.oids_byfield convert the input ids to a list if a user passes a single id.

Internal Changes#
  • Refactor ServiceURL to hard-code the supported links instead of reading them from a file. Also, the class is now based on NamedTuple, which has a nicer __repr__.

0.12.2 (2022-01-15)#

New Features#
  • Make validate_crs public so it can be accessed from the utils module. This is useful for checking the validity of user-input CRS values and getting their string representation.

  • Add pygeoogc.utils.valid_wms_crs function for getting a list of valid CRS values from a WMS service.

  • Add 3DEP’s index WFS service for querying availability of 3DEP data within a bounding box.

Internal Changes#
  • Add type checking with typeguard and fix typing issues raised by typeguard.

  • Refactor show_versions to ensure getting correct versions of all dependencies.

0.12.1 (2021-12-31)#

Internal Changes#
  • Use the three new ar.retrieve_* functions instead of the old ar.retrieve function to improve type hinting and to make the API more consistent.

0.12.0 (2021-12-27)#

New Features#
  • Add a new argument to ArcGISRESTful called verbose to turn on/off all info level logs.

  • Add an option to ArcGISRESTful.get_features called get_geometry to turn on/off requesting the data with or without geometry.

  • Now, ArcGISRESTful saves the object IDs of the features that the user requested but that are not available in the database to ./cache/failed_request_ids.txt.

  • Add a new parameter to ArcGISRESTful called disable_retry. If True, when there are any failed queries, no retry attempts are made, and the object IDs of the failed requests are saved to a text file whose path can be accessed via ArcGISRESTful.client.failed_path.

  • Set the response caching expiration time to never expire for all base classes. A new argument has been added to all three base classes called expire_after that can be used to set the expiration time.

  • Add a new method to all three base classes called clear_cache that clears all cached responses for that specific client.

Breaking Changes#
  • All oids_by* methods of the ArcGISRESTful class now return a list of object IDs rather than setting self.featureids. This makes it possible to pass the outputs of the oids_by* functions directly to the get_features method (see the sketch below).
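
A hedged sketch of the new pattern; the service URL and layer number are illustrative:

from pygeoogc import ArcGISRESTful

service = ArcGISRESTful(
    "https://hydro.nationalmap.gov/arcgis/rest/services/wbd/MapServer", 4
)
# The object IDs are returned as a list and passed directly to get_features.
oids = service.oids_bygeom((-69.77, 45.07, -69.31, 45.45), geo_crs="epsg:4326")
resp = service.get_features(oids)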

Internal Changes#
  • Make ArcGISRESTful less cluttered by instantiating ArcGISRESTfulBase in the init method of ArcGISRESTful rather than inheriting from its base class.

  • Explicitly set a minimum value of 1 for the maximum number of feature IDs per request in ArcGISRESTful, i.e., self.max_nrecords.

  • Add all the missing types so mypy --strict passes.

0.11.7 (2021-11-09)#

Breaking Changes#
  • Remove the onlyipv4 method from RetrySession since it can easily be achieved using with unittest.mock.patch("socket.has_ipv6", False):.

Internal Changes#
  • Use the geoms method for iterating over geometries to address the deprecation warning of shapely.

  • Use importlib-metadata for getting the version instead of pkg_resources to decrease import time as discussed in this issue.

  • Remove unnecessary dependency on simplejson and use ujson instead.

0.11.5 (2021-09-09)#

Bug Fixes#
  • Update the code to use the latest requests-cache API.

0.11.4 (2021-08-26)#

New Features#

0.11.3 (2021-08-21)#

Internal Changes#
  • Fix a bug in WFS.getfeature_byid when the number of IDs exceeds the service’s limit by splitting large requests into multiple smaller requests.

  • Add two new arguments, max_nrecords and read_method, to WFS to control the maximum number of records per request (defaults to 1000) and specify the response read method (defaults to json), respectively.
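
For example, a hedged sketch of passing these two arguments (the URL and layer name are illustrative placeholders, and the other keyword names, such as outformat, are assumed to match the WFS constructor):

from pygeoogc import WFS

# Placeholder GeoServer endpoint, not a tested service.
wfs = WFS(
    "https://example.com/geoserver/ows",
    layer="namespace:layer_name",
    outformat="application/json",
    max_nrecords=500,    # split large requests into chunks of 500 records
    read_method="json",  # parse responses as JSON (the default)
)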

0.11.2 (2021-08-19)#

Internal Changes#
  • Simplify the retry logic of ArcGISRESTful by making it run four times and ensuring that the last retry requests one object ID at a time.

0.11.1 (2021-07-31)#

The highlight of this release is migrating to AsyncRetriever, which can improve the network response time significantly. Another highlight is a major refactoring of ArcGISRESTful that improves performance and reduces code complexity.

New Features#
  • Add a new private method to the ArcGISRESTful class for automatically retrying failed requests. When a request containing several features includes object IDs that are not available on the server, the whole request fails even though it also contains valid object IDs. This method plucks out the valid object IDs and retries them individually.

  • Add support for passing additional parameters to WMS requests such as styles.

  • Add support for WFS version 1.0.0.

Internal Changes#
  • Migrate to AsyncRetriever from requests-cache for all the web services.

  • Rename ServiceError to ServiceUnavailable and ServerError to ServiceError, since the new names are more representative of the intended exceptions.

  • Raise for response status in RetrySession before the try-except block so RequestsException can be raised and its error message parsed.

  • Deprecate utils.threading since all threading operations are now handled by AsyncRetriever.

  • Increase test coverage.

0.11.0 (2021-06-18)#

New Features#
  • Add support for requesting LineString geometries in ArcGISRESTful.

  • Add a new argument called distance to ArcGISRESTful.oids_bygeom for specifying the buffer distance from the input geometry for getting features.
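
An illustrative sketch of the distance argument (the endpoint is a placeholder, and the buffer distance is in the units of the service's CRS):

from pygeoogc import ArcGISRESTful
from shapely.geometry import LineString

# Placeholder endpoint; query features within a buffer distance of a line.
service = ArcGISRESTful("https://example.com/arcgis/rest/services/Water/MapServer", 0)
line = LineString([(-69.77, 45.07), (-69.31, 45.45)])
oids = service.oids_bygeom(line, distance=100)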

Breaking Changes#
  • Drop support for Python 3.6 since many of the dependencies such as xarray and pandas have done so.

  • Remove async_requests function, since it has been packaged as a new Python library called AsyncRetriever.

  • Refactor MatchCRS. Now, it should be instantiated by providing the in and out CRSs, like so: MatchCRS(in_crs, out_crs). Then its methods, namely geometry, bounds, and coords, can be called. These methods now take a single input, the geometry (see the sketch after this list).

  • Change input and output types of MatchCRS.coords from tuple of lists of coordinates to list of (x, y) coordinates.

  • ArcGISRESTful now has a new argument, layer, for specifying the layer number (int). The target layer should either be part of base_url or be passed with the layer argument.

  • Move the spatial_relation argument from ArcGISRESTful class to oids_bygeom method, since that’s where it’s applicable.
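
A sketch of the refactored MatchCRS usage, based on the description above (the CRS strings and coordinates are illustrative):

from pygeoogc import MatchCRS
from shapely.geometry import Point

# Instantiate with the input and output CRSs, then call the methods.
match = MatchCRS("epsg:4326", "epsg:3857")
geom_proj = match.geometry(Point(-69.7, 45.2))
bounds_proj = match.bounds((-69.77, 45.07, -69.31, 45.45))
coords_proj = match.coords([(-69.7, 45.2), (-69.31, 45.45)])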

Internal Changes#
  • Refactor ArcGISRESTfulBase class to reduce its code complexity and make the service initialization logic much simpler. The class is faster since it makes fewer requests during the initialization process.

  • Add pydantic as a new dependency that takes care of ArcGISRESTfulBase validation.

  • Use persistent caching for all send/receive requests that can significantly improve the network response time.

  • Explicitly include all the hard dependencies in setup.cfg.

  • Set a default value of 1000 for max_nrecords in ArcGISRESTfulBase.

  • Use dataclass for WMSBase and WFSBase since support for Python 3.6 is dropped.

0.10.1 (2021-03-27)#

  • Add announcement regarding the new name for the software stack, HyRiver.

  • Improve pip installation and release workflow.

0.10.0 (2021-03-06)#

  • The first release after renaming hydrodata to PyGeoHydro.

  • Fix extent property of ArcGISRESTful being set to None incorrectly.

  • Add a feature types property to ArcGISRESTful for getting the names and IDs of the feature types in the database.

  • Replace cElementTree with ElementTree since it’s been deprecated by defusedxml.

  • Remove dependency on dataclasses since its benefits and usage in the code were minimal.

  • Speed up CI testing by using mamba and caching.

  • ArcGISRESTful now prints the number of features found before attempting to retrieve them.

  • Use the logging module for printing information.

0.9.0 (2021-02-14)#

  • Bump version to the same version as PyGeoHydro.

  • Add support for query by point and multi-points to ArcGISRESTful.bygeom.

  • Add support for buffer distance to ArcGISRESTful.bygeom.

  • Add support for generating ESRI-based queries for points and multi-points to ESRIGeomQuery.

  • Add all the missing type annotations.

  • Update the Daymet URL to version 4. You can check the release information here.

  • Use the cytoolz library to improve the performance of some operations.

  • Add an extent property to the ArcGISRESTful class that gets the spatial extent of the service.

  • Add the URL of the airmap service for getting elevation data at 30 m resolution.

0.2.3 (2020-12-19)#

  • Fix the urllib3 deprecation warning about using method_whitelist.

0.2.2 (2020-12-05)#

  • Remove unused variables in async_requests and use max_workers.

  • Fix the async_requests issue on Windows systems.

0.2.0 (2020-12-06)#

  • Added/Renamed three class methods in ArcGISRESTful: oids_bygeom, oids_byfield, and oids_bysql, so you can query features within a geometry, by specific field ID(s), or, more generally, with any valid SQL-92 WHERE clause.

  • Added support for query with SQL WHERE clause to ArcGISRESTful.

  • Changed the NLDI’s URL for migrating to its new API v3.

  • Added support for CQL filter to WFS, credits to Emilio.

  • Moved all the web services URLs to a YAML file that ServiceURL class reads. It makes managing the new URLs easier. The file is located at pygeoogc/static/urls.yml.

  • Turned off threading by default for all the services since not all web services support it.

  • Added support for setting the request method, GET or POST, for WFS.byfilter, which could be useful when the filter string is long.

  • Added support for asynchronous download via the function async_requests.

0.1.10 (2020-08-18)#

  • Improved bbox_decompose to fix the WMS issue with high resolution requests.

  • Replaced simplejson with orjson to speed up JSON operations.

0.1.8 (2020-08-12)#

  • Removed threading for WMS due to inconsistent behavior.

  • Addressed an issue with domain decomposition for WMS where width/height becomes 0.

0.1.7 (2020-08-11)#

  • Renamed vsplit_bbox to bbox_decompose. The function now decomposes the domain in both directions and returns squares and rectangles.

0.1.5 (2020-07-23)#

  • Re-wrote wms_bybox function as a class called WMS with a similar interface to the WFS class.

  • Added support for WMS 1.3.0 and WFS 2.0.0.

  • Added a custom Exception for the threading function called ThreadingException.

  • Added an always_xy flag to WMS and WFS, which is False by default. It is useful for cases where a web service doesn’t change the axis order from the traditional xy to yx for versions higher than 1.3.0.

0.1.3 (2020-07-21)#

  • Remove unnecessary transformation of the input bbox in WFS.

  • Use setuptools_scm for versioning.

0.1.2 (2020-07-16)#

  • Add the missing max_pixel argument to the wms_bybox function.

  • Change the onlyIPv4 method of RetrySession class to onlyipv4 to conform to the snake_case convention.

  • Improve docstrings.

0.1.1 (2020-07-15)#

  • Initial release.

History#

0.13.1 (2022-06-11)#

New Features#
  • Add support for passing a custom bounding box to the Coordinates class. The default is the bounds of EPSG:4326, to retain backward compatibility. This new class parameter allows a user to check whether a list of coordinates is within a custom bounding box. The bounds must be given in the EPSG:4326 coordinate system (see the sketch after this list).

  • Add a new function called geometry_list for converting a list of multi-geometries to a list of geometries.
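
A hedged sketch of both additions (the Coordinates constructor signature and the bounds tuple order are assumptions based on the descriptions above):

from shapely.geometry import MultiPoint
from pygeoutils import Coordinates, geometry_list

# Assumed constructor: lon/lat sequences plus an optional EPSG:4326 bounds
# tuple of (west, south, east, north); the values here are illustrative.
coords = Coordinates([-100.0, 80.0], [30.0, 45.0], bounds=(-105.0, 25.0, -95.0, 50.0))
valid = coords.points  # GeoSeries of the coordinates that fall within bounds

# geometry_list breaks a multi-geometry into a list of its parts.
parts = geometry_list(MultiPoint([(0, 0), (1, 1)]))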

0.13.0 (2022-03-03)#

Internal Changes#
  • Write the nodata attribute using rioxarray in gtiff2xarray since the clipping operation of rioxarray uses this value as the fill value.

Bug Fixes#
  • In the break_lines function, convert MultiLineString into LineString since shapely.ops.substring only accepts LineString.

0.12.3 (2022-02-04)#

New Features#
  • Add a function called break_lines for breaking lines at given points.

  • Add a function called snap2nearest for snapping points to the nearest point on a line with a given tolerance. It accepts a geopandas.GeoSeries of points and a geopandas.GeoSeries or geopandas.GeoDataFrame of lines. It automatically snaps to the closest lines in the input data.
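
A sketch of snapping points to the nearest line (the argument order and the positional tolerance are assumptions based on the description above):

import geopandas as gpd
from shapely.geometry import LineString, Point
from pygeoutils import snap2nearest

# Geometries should share a projected CRS; tolerance is in CRS units.
lines = gpd.GeoSeries([LineString([(0, 0), (10, 0)])], crs="epsg:3857")
points = gpd.GeoSeries([Point(3, 1), Point(7, -2)], crs="epsg:3857")
snapped = snap2nearest(lines, points, 5.0)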

0.12.2 (2022-01-15)#

New Features#
  • Add a new class called GeoBSpline that generates B-splines from a set of coordinates. The spline attribute of this class has five attributes: x and y (the coordinates), phi and radius (the curvature and radius of curvature, respectively), and distance (the total distance of each point along the B-spline from the starting point). See the sketch after this list.

  • Add a new class called Coordinates that validates a set of lon/lat coordinates. It normalizes longitudes to the range [-180, 180) and has a points property that is a geopandas.GeoSeries of the validated coordinates. It uses spatial indexing to speed up the validation and should be able to handle large datasets efficiently.

  • Make transform2tuple a public function.
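
A hedged sketch of GeoBSpline (that it takes a GeoSeries of points in a projected CRS and, as its second argument, the number of points to generate along the spline is an assumption):

import geopandas as gpd
from shapely.geometry import Point
from pygeoutils import GeoBSpline

pts = gpd.GeoSeries([Point(x, x**2) for x in range(5)], crs="epsg:3857")
sp = GeoBSpline(pts, 50).spline
xs, ys = sp.x, sp.y              # spline coordinates
phi, radius = sp.phi, sp.radius  # curvature and radius of curvature
dist = sp.distance               # distance along the spline from the start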

Internal Changes#
  • The geometry and geo_crs arguments of gtiff2xarray are now optional. This is useful for cases when the input GeoTiff response is the result of a bounding box query and there is no need for a geometry mask.

  • Replace the missing values after adding the geometry mask via xarray_geomask with the nodatavals attribute of the input xarray.DataArray or xarray.Dataset, so the data type of the input is preserved.

  • Expose connectivity argument of rasterio.features.shapes function in xarray2geodf function.

  • Move all private functions to a new module to make the main module less cluttered.

0.12.1 (2021-12-31)#

Internal Changes#
  • Refactor arcgis2geojson for better readability and maintainability.

  • In arcgis2geojson set the geometry to null if its type is not supported, such as curved polylines.

0.12.0 (2021-12-27)#

Internal Changes#
  • Add all the missing types so mypy --strict passes.

  • Bump version to 0.12.0 to match the release of pygeoogc.

0.11.7 (2021-11-09)#

Internal Changes#
  • Use rioxarray for dealing with GeoTIFF binaries since xarray deprecated the xarray.open_rasterio function, as it’s discussed in this PR.

  • Use importlib-metadata for getting the version instead of pkg_resources to decrease import time as discussed in this issue.

0.11.6 (2021-10-06)#

New Features#
  • Add a new function, xarray2geodf, to convert an xarray.DataArray to a geopandas.GeoDataFrame.
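
A heavily hedged sketch: vectorize a small integer raster to polygons. It assumes xarray2geodf(da, dtype) derives the transform from the array's coordinates and reads the CRS from the array itself, so the crs attribute below may need to be supplied differently in practice.

import numpy as np
import xarray as xr
from pygeoutils import xarray2geodf

da = xr.DataArray(
    np.array([[0, 1], [1, 1]], dtype="int32"),
    dims=("y", "x"),
    coords={"y": [1.5, 0.5], "x": [0.5, 1.5]},
    name="mask",
    attrs={"crs": "epsg:4326"},  # assumed location of the spatial reference
)
gdf = xarray2geodf(da, "int32")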

0.11.5 (2021-06-16)#

Bug Fixes#
  • Fix an issue with gtiff2xarray where the scales and offsets attributes of the output DataArray were floats rather than tuples (GH30).

Internal Changes#
  • Add a new function, transform2tuple, for converting Affine transforms to a tuple. Previously, the Affine transform was converted to a tuple using the to_gdal() method of rasterio.Affine, which was not compatible with rioxarray.

0.11.4 (2021-08-26)#

Internal Changes#
  • Use ujson for JSON parsing instead of orjson since orjson only serializes to bytes which is not compatible with aiohttp.

  • Convert the transform attribute data type from Affine to tuple since saving a data array to netcdf cannot handle the Affine type.

0.11.3 (2021-08-19)#

  • Fix an issue in gtiff2xarray related to saving an xarray object to NetCDF when its transform attribute is an Affine rather than a tuple.

0.11.2 (2021-07-31)#

The highlight of this release is performance improvement in gtiff2xarray for handling large responses.

New Features#
  • Automatically detect the driver by default in gtiff2xarray, as opposed to assuming GTiff.

Internal Changes#
  • Make geo2polygon, get_transform, and get_nodata_crs public functions since other packages use them.

  • Make xarray_mask a public function and simplify gtiff2xarray.

  • Remove MatchCRS since it’s already available in pygeoogc.

  • Validate input geometry in geo2polygon.

  • Refactor gtiff2xarray to check for ds_dims outside the main loops to improve performance. Also, the function now tries to detect the dimension names automatically if ds_dims is not explicitly provided by the user.

  • Improve performance of json2geodf by using list comprehension and performing checks outside the main loop.

Bug Fixes#
  • Add the missing arguments for masking the data in gtiff2xarray.

0.11.1 (2021-06-19)#

Bug Fixes#
  • In some edge cases the y-coordinates of a response might not be monotonically sorted so dask fails. This release sorts them to address this issue.

0.11.0 (2021-06-19)#

New Features#
  • Function gtiff2xarray returns a parallelized xarray.Dataset or xarray.DataArray that can handle large responses much more efficiently. This is achieved using dask.

Breaking Changes#
  • Drop support for Python 3.6 since many of the dependencies such as xarray and pandas have done so.

  • Refactor MatchCRS. Now, it should be instantiated by providing the in and out CRSs, like so: MatchCRS(in_crs, out_crs). Then its methods, namely geometry, bounds, and coords, can be called. These methods now take a single input, the geometry.

  • Change input and output types of MatchCRS.coords from tuple of lists of coordinates to list of (x, y) coordinates.

  • Remove xarray_mask and gtiff2file since rioxarray is more general and suitable.

Internal Changes#
  • Remove unnecessary type checks for private functions.

  • Refactor json2geodf to improve robustness. Use the get method of dict for checking key availability.

0.10.1 (2021-03-27)#

  • Set the transform of the merged dataset explicitly (GH3).

  • Add announcement regarding the new name for the software stack, HyRiver.

  • Improve pip installation and release workflow.

0.10.0 (2021-03-06)#

  • The first release after renaming hydrodata to PyGeoHydro.

  • Address GH1 by sorting y coordinate after merge.

  • Make mypy checks stricter, fix all the errors, and prevent possible bugs.

  • Speed up CI testing by using mamba and caching.

0.9.0 (2021-02-14)#

  • Bump version to the same version as PyGeoHydro.

  • Add gtiff2file for saving raster responses as geotiff file(s).

  • Fix an error in _get_nodata_crs for handling the nodata value when its value in the source is None.

  • Fix the warning during the GeoDataFrame generation in json2geodf when there is no geometry column in the input JSON.

0.2.0 (2020-12-06)#

  • Added validation of the input arguments in the gtiff2xarray function and provided useful messages for debugging.

  • Added support for MultiPolygon.

  • Removed the fill_hole argument.

  • Fixed a bug in xarray_geomask for getting the transform.

0.1.10 (2020-08-18)#

  • Fixed the gtiff2xarray issue with high resolution requests and improved robustness of the function.

  • Replaced simplejson with orjson to speed up JSON operations.

0.1.9 (2020-08-11)#

  • Modified gtiff2xarray to reflect the latest changes in pygeoogc 0.1.7.

0.1.8 (2020-08-03)#

  • Retained the compatibility with xarray 0.15 by removing the attrs flag.

  • Added the xarray_geomask function and made it public.

  • More efficient handling of large GeoTiff responses by cropping the response before converting it into a dataset.

  • Added a new function called geo2polygon for converting and transforming a polygon or bounding box into a Shapely Polygon in the target CRS.

0.1.6 (2020-07-23)#

  • Fixed the issue with flipped mask in WMS.

  • Removed drop_duplicates since it may cause issues in some instances.

0.1.4 (2020-07-22)#

  • Refactored gtiff2xarray and added support for WMS 1.3.0 and WFS 2.0.0.

  • Added the MatchCRS class.

  • Removed the dependency on PyGeoOGC.

  • Increased test coverage.

0.1.3 (2020-07-21)#

  • Remove duplicate rows before returning the dataframe in the json2geodf function.

  • Add the missing dependency.

0.1.0 (2020-07-21)#

  • First release on PyPI.

Contributing#

Contributions are welcome, and they are greatly appreciated! Every little bit helps, and credit will always be given.

You can contribute in many ways to any of the packages that are included in the HyRiver project. The workflow is the same for all packages. This page explains the contribution workflow for PyGeoHydro.

Types of Contributions#

Report Bugs#

Report bugs at https://github.com/cheginit/pygeohydro/issues.

If you are reporting a bug, please include:

  • Your operating system name and version.

  • Any details about your local setup that might be helpful in troubleshooting.

  • Detailed steps to reproduce the bug.

Fix Bugs#

Look through the GitHub issues for bugs. Anything tagged with “bug” and “help wanted” is open to whoever wants to implement it.

Implement Features#

Other than new features that you might have in mind, you can look through the GitHub issues for features. Anything tagged with “enhancement” and “help wanted” is open to whoever wants to implement it.

Write Documentation#

PyGeoHydro could always use more documentation, whether as part of the official PyGeoHydro docs, in docstrings, or even on the web in blog posts, articles, and such.

Submit Feedback#

The best way to send feedback is to file an issue at https://github.com/cheginit/pygeohydro/issues.

If you are proposing a feature:

  • Explain in detail how it would work.

  • Keep the scope as narrow as possible, to make it easier to implement.

  • Remember that this is a volunteer-driven project, and that contributions are welcome :)

Get Started!#

Ready to contribute? Here’s how to set up PyGeoHydro for local development.

  1. Fork the PyGeoHydro repo through the GitHub website.

  2. Clone your fork locally and add the main PyGeoHydro as the upstream remote:

$ git clone git@github.com:your_name_here/pygeohydro.git
$ git remote add upstream git@github.com:cheginit/pygeohydro.git
  3. Install your local copy into a virtualenv. Assuming you have Conda installed, this is how you can set up your fork for local development:

$ cd pygeohydro/
$ conda env create -f ci/requirements/environment.yml
$ conda activate pygeohydro-dev
$ python -m pip install . --no-deps
  4. Create a branch for local development:

$ git checkout -b bugfix-or-feature/name-of-your-bugfix-or-feature
$ git push -u origin bugfix-or-feature/name-of-your-bugfix-or-feature
  5. Before your first commit, the pre-commit hooks need to be set up:

$ pre-commit install
$ pre-commit run --all-files
  6. Now you can make your changes locally. Make sure to add a description of the changes to the HISTORY.rst file and add extra tests, if applicable, to the tests folder. Also, make sure to give yourself credit by adding your name at the end of the item(s) that you add to the history, like this: By `Taher Chegini <https://github.com/cheginit>`_. Then, fetch the latest updates from the remote and resolve any merge conflicts:

$ git fetch upstream
$ git merge upstream/main
  7. Then lint and test the code:

$ make lint
  8. If you are making breaking changes, make sure to reflect them in the documentation, README.rst, and tests, if necessary.

  9. Commit your changes and push your branch to GitHub:

$ git add .
$ git commit -m "Your detailed description of your changes."
$ git push origin name-of-your-bugfix-or-feature
  10. Submit a pull request through the GitHub website.

Tips#

To run a subset of tests:

$ pytest -k "test_name1 or test_name2"

Deploying#

A reminder for the maintainers on how to deploy. Make sure all your changes are committed (including an entry in HISTORY.rst). Then run:

$ git tag -a vX.X.X -m "vX.X.X"
$ git push --follow-tags

where X.X.X is the version number following the semantic versioning spec, i.e., MAJOR.MINOR.PATCH. Then release the tag from GitHub, and GitHub Actions will deploy it to PyPI.

Credits#

Development Lead#

Contributors#

None yet. Why not be the first?

License#

MIT License

Copyright (c) 2020, Taher Chegini

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

High-level APIs for accessing some pre-configured web services

Navigate and subset mid- and high-res NHD, NHDPlus, and NHDPlus VAA using WaterData, NLDI, ScienceBase, and The National Map web services.

Access NWIS, NID, HCDN 2009, NLCD, and SSEBop databases.

Access topographic data through The National Map’s 3DEP web service.

Access Daymet for daily, monthly, and annual summaries of climate data at 1-km scale, for both single pixels and gridded data.

Low-level APIs for connecting to supported web service protocols

Send queries to and receive responses from any ArcGIS RESTful-, WMS-, and WFS-based services.

Convert responses from PyGeoOGC’s supported web service protocols into geospatial and raster datasets.

Asynchronous send/receive requests with persistent caching.
