pygeohydro.pygeohydro

Accessing data from the supported databases through their APIs.

Module Contents

class pygeohydro.pygeohydro.NID

Retrieve data from the National Inventory of Dams web service.

get_byfilter(query_list)

Query dams by filters from the National Inventory of Dams web service.

Parameters

query_list (list of dict) – List of dictionaries of query parameters. For an exhaustive list of the parameters, use the advanced fields dataframe, which can be accessed via NID().fields_meta. Some filters, such as damHeight and drainageArea, require min/max values. For such filters, pass the min/max values as bracketed strings: {filter_key: ["[min1 max1]", "[min2 max2]"]}.

Returns

list of geopandas.GeoDataFrame – Query results, one GeoDataFrame per dictionary in query_list.

Examples

>>> from pygeohydro import NID
>>> nid = NID()
>>> query_list = [
...    {"drainageArea": ["[200 500]"]},
...    {"nidId": ["CA01222"]},
... ]
>>> dam_dfs = nid.get_byfilter(query_list)
>>> print(dam_dfs[0].loc[dam_dfs[0].name == "Prairie Portage"].id.item())
496613
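
The bracketed min/max strings are easy to mistype, so a small helper can build them. The sketch below is illustrative only: range_filter is a hypothetical helper that is not part of pygeohydro, and it does nothing more than format the dictionaries described in the Parameters section above.

```python
def range_filter(key, *ranges):
    """Build a min/max filter such as {"damHeight": ["[10 50]"]}.

    Hypothetical helper (not part of pygeohydro): each (low, high) pair
    becomes one "[low high]" string, the format get_byfilter expects.
    """
    values = []
    for low, high in ranges:
        if low > high:
            raise ValueError(f"min {low} exceeds max {high} for {key!r}")
        values.append(f"[{low} {high}]")
    return {key: values}


query_list = [
    range_filter("drainageArea", (200, 500)),
    range_filter("damHeight", (10, 50), (100, 200)),
    {"nidId": ["CA01222"]},  # non-range filters are passed as plain lists
]
print(query_list[0])
```

The resulting query_list can then be passed to NID().get_byfilter as in the example above.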
get_bygeom(geometry, geo_crs)

Retrieve NID data within a geometry.

Parameters
  • geometry (Polygon, MultiPolygon, or tuple of length 4) – Geometry or bounding box (west, south, east, north) for extracting the data.

  • geo_crs (str) – The CRS of the input geometry, e.g., epsg:4326.

Returns

geopandas.GeoDataFrame – GeoDataFrame of NID data

Examples

>>> from pygeohydro import NID
>>> nid = NID()
>>> dams = nid.get_bygeom((-69.77, 45.07, -69.31, 45.45), "epsg:4326")
>>> print(dams.name.iloc[0])
Little Moose
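
Because the bounding box is a plain tuple, swapping its elements goes unnoticed until the query returns unexpected results. The sketch below shows one way to check the (west, south, east, north) order before calling get_bygeom. make_bbox is a hypothetical helper, and it assumes longitude/latitude in degrees (epsg:4326) and a box that does not cross the antimeridian.

```python
def make_bbox(west, south, east, north):
    """Return a (west, south, east, north) tuple after basic sanity checks.

    Hypothetical helper (not part of pygeohydro); assumes lon/lat degrees.
    """
    if not (-180 <= west < east <= 180):
        raise ValueError("expected -180 <= west < east <= 180")
    if not (-90 <= south < north <= 90):
        raise ValueError("expected -90 <= south < north <= 90")
    return (west, south, east, north)


# Same bounding box as in the example above.
bbox = make_bbox(-69.77, 45.07, -69.31, 45.45)
print(bbox)
```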
get_suggestions(text, context_key='')

Get suggestions from the National Inventory of Dams web service.

Notes

This function is useful for exploring and/or narrowing down the filter fields that are needed to query the dams using get_byfilter.

Parameters
  • text (str) – Text to query for suggestions.

  • context_key (str, optional) – Suggestion context, defaults to empty string, i.e., all context keys. For a list of valid context keys, see NID().fields_meta.

Returns

tuple of pandas.DataFrame – The suggestions for the requested text as two DataFrames: the first contains suggestions found in the dams' properties, and the second contains those found in the query fields such as states, huc6, etc.

Examples

>>> from pygeohydro import NID
>>> nid = NID()
>>> dams, contexts = nid.get_suggestions("texas", "city")
>>> print(contexts.loc["CITY", "value"])
Texas City
inventory_byid(dam_ids, stage_nid=False)

Get extra attributes for dams based on their dam ID.

Notes

This function retrieves extra attributes for dams. First, use either get_bygeom or get_byfilter to get the basic attributes of the target dams. Then pass the id column of the returned GeoDataFrame to this function to get the extra attributes.

Parameters
  • dam_ids (list of int or str) – List of the target dam IDs (digits only). Note that the dam IDs are not the same as the NID IDs.

  • stage_nid (bool, optional) – Whether to download the entire NID and then query it locally, or to query the NID web service, which tends to be very slow for a large number of requests. Defaults to False. The staged NID database is saved as a feather file in ./cache/nid_inventory.feather.

Returns

pandas.DataFrame – Dams with extra attributes in addition to the standard NID fields that other NID methods return.

Examples

>>> from pygeohydro import NID
>>> nid = NID()
>>> dams = nid.inventory_byid([514871, 459170, 514868, 463501, 463498])
>>> print(dams.damHeight.max())
120.0
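
The two-step workflow described in the Notes above can be wrapped in a single function. This is a minimal sketch rather than part of pygeohydro: dams_with_extra_attributes is a hypothetical name, and nid is assumed to be an NID instance (or any object with the same get_bygeom and inventory_byid methods).

```python
def dams_with_extra_attributes(nid, bbox, geo_crs="epsg:4326"):
    """Get basic attributes by geometry, then extra attributes by dam ID.

    Hypothetical wrapper: `nid` is assumed to provide `get_bygeom` and
    `inventory_byid` with the signatures documented above.
    """
    # Step 1: basic attributes of the dams inside the bounding box.
    dams = nid.get_bygeom(bbox, geo_crs)
    # Step 2: the `id` column holds the dam IDs (not the NID IDs).
    dam_ids = [int(dam_id) for dam_id in dams["id"]]
    return nid.inventory_byid(dam_ids)
```

With a real service connection, this would be called as dams_with_extra_attributes(NID(), (-69.77, 45.07, -69.31, 45.45)).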
stage_nid_inventory(fname=None)

Download the entire NID inventory data and save to a feather file.

Parameters

fname (str, pathlib.Path, optional) – The path to the file to save the data to, defaults to ./cache/nid_inventory.feather.

class pygeohydro.pygeohydro.WBD(layer, outfields='*', crs=4326)

Access Watershed Boundary Dataset (WBD).

Notes

This web service offers Hydrologic Unit (HU) polygon boundaries for the United States, Puerto Rico, and the U.S. Virgin Islands. For more info visit: https://hydro.nationalmap.gov/arcgis/rest/services/wbd/MapServer

Parameters
  • layer (str) – A valid service layer. Valid layers are:

    • wbdline

    • huc2

    • huc4

    • huc6

    • huc8

    • huc10

    • huc12

    • huc14

    • huc16

  • outfields (str or list, optional) – Target field name(s), defaults to “*”, i.e., all the fields.

  • crs (str, int, or pyproj.CRS, optional) – Target spatial reference, defaults to EPSG:4326.

pygeohydro.pygeohydro.cover_statistics(cover_da)

Percentages of the categorical NLCD cover data.

Parameters

cover_da (xarray.DataArray) – Land cover DataArray from a LULC Dataset from the nlcd_bygeom function.

Returns

Stats – A named tuple with the percentages of the cover classes and categories.
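
To make "percentages of the cover classes and categories" concrete, the sketch below computes both from a flat list of NLCD class codes. It is not the library's implementation: CoverPercentages and cover_percentages are hypothetical names, the input is a plain list rather than an xarray.DataArray, and grouping classes into categories by the tens digit of the code (e.g., 41 and 42 both fall under forest) is an assumption based on the NLCD legend.

```python
from collections import Counter
from typing import NamedTuple


class CoverPercentages(NamedTuple):
    """Hypothetical result type; not the `Stats` tuple returned by the library."""

    classes: dict
    categories: dict


def cover_percentages(codes):
    """Percentage of cells per NLCD class code and per category."""
    codes = list(codes)
    total = len(codes)
    if total == 0:
        raise ValueError("no land cover cells given")
    class_counts = Counter(codes)
    # Assumption: the tens digit of an NLCD code identifies its category.
    category_counts = Counter(code // 10 for code in codes)
    classes = {code: 100 * n / total for code, n in class_counts.items()}
    categories = {cat: 100 * n / total for cat, n in category_counts.items()}
    return CoverPercentages(classes, categories)


# 11 = open water, 41 = deciduous forest, 42 = evergreen forest
stats = cover_percentages([11, 41, 41, 42])
print(stats.classes)     # {11: 25.0, 41: 50.0, 42: 25.0}
print(stats.categories)  # {1: 25.0, 4: 75.0}
```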

pygeohydro.pygeohydro.get_camels()

Get streamflow and basin attributes of all 671 stations in the CAMELS dataset.

Notes

For more info on CAMELS visit: https://ral.ucar.edu/solutions/products/camels

Returns

tuple of geopandas.GeoDataFrame and xarray.Dataset – The first is basin attributes as a geopandas.GeoDataFrame and the second is streamflow data and basin attributes as an xarray.Dataset.

pygeohydro.pygeohydro.nlcd_bycoords(coords, years=None, region='L48', ssl=None)

Get data from the NLCD database (2019).

Parameters
  • coords (list of tuple) – List of coordinates in the form of (longitude, latitude).

  • years (dict, optional) – The years for NLCD layers as a dictionary, defaults to {'impervious': [2019], 'cover': [2019], 'canopy': [2019], "descriptor": [2019]}. Layers that are not in years are ignored, e.g., {'cover': [2016, 2019]} returns land cover data for 2016 and 2019.

  • region (str, optional) – Region in the US, defaults to L48. Valid values are L48 (for CONUS), HI (for Hawaii), AK (for Alaska), and PR (for Puerto Rico). Both lower and upper cases are acceptable.

  • ssl (bool or SSLContext, optional) – SSLContext to use for the connection, defaults to None. Set to False to disable SSL certificate verification.

Returns

geopandas.GeoDataFrame – A GeoDataFrame with the NLCD data and the coordinates.
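
The years and region arguments follow the rules listed above: a missing years falls back to 2019 for all four layers, and region is case-insensitive. The sketch below restates those rules as a small check that can be run before a request. check_nlcd_args is a hypothetical helper, not a pygeohydro function, and the library's own validation may behave differently.

```python
VALID_LAYERS = ("impervious", "cover", "canopy", "descriptor")
VALID_REGIONS = ("L48", "HI", "AK", "PR")


def check_nlcd_args(years=None, region="L48"):
    """Return (years, region) normalized per the documented defaults.

    Hypothetical helper; it only mirrors the parameter descriptions above.
    """
    if years is None:
        # Documented default: 2019 for every layer.
        years = {layer: [2019] for layer in VALID_LAYERS}
    unknown = sorted(set(years) - set(VALID_LAYERS))
    if unknown:
        raise ValueError(f"unknown NLCD layers: {unknown}")
    region = region.upper()  # both lower and upper cases are acceptable
    if region not in VALID_REGIONS:
        raise ValueError(f"region must be one of {VALID_REGIONS}")
    return years, region


years, region = check_nlcd_args({"cover": [2016, 2019]}, "hi")
print(years, region)
```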

pygeohydro.pygeohydro.nlcd_bygeom(geometry, resolution, years=None, region='L48', crs=4326, ssl=None)

Get data from the NLCD database (2019).

Parameters
  • geometry (geopandas.GeoDataFrame or geopandas.GeoSeries) – A GeoDataFrame or GeoSeries with the geometry to query. The indices are used as keys in the output dictionary.

  • resolution (float) – The data resolution in meters. The width and height of the output are computed in pixels based on the geometry bounds and the given resolution.

  • years (dict, optional) – The years for NLCD layers as a dictionary, defaults to {'impervious': [2019], 'cover': [2019], 'canopy': [2019], "descriptor": [2019]}. Layers that are not in years are ignored, e.g., {'cover': [2016, 2019]} returns land cover data for 2016 and 2019.

  • region (str, optional) – Region in the US, defaults to L48. Valid values are L48 (for CONUS), HI (for Hawaii), AK (for Alaska), and PR (for Puerto Rico). Both lower and upper cases are acceptable.

  • crs (str, int, or pyproj.CRS, optional) – The spatial reference system to be used for requesting the data, defaults to epsg:4326.

  • ssl (bool or SSLContext, optional) – SSLContext to use for the connection, defaults to None. Set to False to disable SSL certificate verification.

Returns

dict of xarray.Dataset or xarray.Dataset – A single or a dict of NLCD datasets. If dict, the keys are indices of the input GeoDataFrame.
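
The relation between resolution and output size can be illustrated with a short calculation. This is a sketch under stated assumptions, not the library's code: output_shape is a hypothetical helper, the bounds are assumed to be in meters in a projected CRS, and rounding up to whole pixels is an assumption.

```python
import math


def output_shape(bounds, resolution):
    """Width and height in pixels for (west, south, east, north) bounds.

    Hypothetical helper; assumes `bounds` and `resolution` are in meters.
    """
    west, south, east, north = bounds
    if resolution <= 0:
        raise ValueError("resolution must be positive")
    width = math.ceil((east - west) / resolution)
    height = math.ceil((north - south) / resolution)
    return width, height


# A 3000 m by 1500 m extent at 30 m resolution.
print(output_shape((0, 0, 3000, 1500), 30))  # (100, 50)
```

Halving the resolution value doubles both dimensions, so the number of pixels requested grows fourfold.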

pygeohydro.pygeohydro.overland_roughness(cover_da)

Estimate overland roughness from land cover data.

Parameters

cover_da (xarray.DataArray) – Land cover DataArray from a LULC Dataset from the nlcd_bygeom function.

Returns

xarray.DataArray – Overland roughness

pygeohydro.pygeohydro.soil_gnatsgo(layers, geometry, crs=4326)

Get US soil data from the gNATSGO dataset.

Notes

This function uses Microsoft’s Planetary Computer service to get the data. The dataset’s description and its supported soil properties can be found at: https://planetarycomputer.microsoft.com/dataset/gnatsgo-rasters

Parameters
  • layers (list of str or str) – Target layer(s). Available layers are listed on the dataset’s website.

  • geometry (Polygon, MultiPolygon, or tuple of length 4) – Geometry or bounding box of the region of interest.

  • crs (int, str, or pyproj.CRS, optional) – The input geometry CRS, defaults to epsg:4326.

Returns

xarray.Dataset – Requested soil properties.

pygeohydro.pygeohydro.soil_properties(properties='*', soil_dir='cache')

Get soil properties dataset in the United States from ScienceBase.

Notes

This function downloads the source zip files from ScienceBase, extracts the included .tif files, and returns them as an xarray.Dataset.

Parameters
  • properties (list of str or str, optional) – Soil properties to extract, defaults to “*”, i.e., all the properties. Available properties are awc for available water capacity, fc for field capacity, and por for porosity.

  • soil_dir (str or pathlib.Path) – Directory in which to store the zip files. If the files already exist there, they are read instead of being downloaded again. Defaults to ./cache.
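
The download-or-reuse behavior of soil_dir follows a common caching pattern, sketched below. This is not the library's implementation: cached_file, fake_download, and the file name awc.zip are all hypothetical and serve only to show that a second request reads from disk.

```python
import tempfile
from pathlib import Path


def cached_file(soil_dir, name, download):
    """Return the path of `name` in `soil_dir`, downloading only if missing.

    Hypothetical helper: `download` is any callable that takes the file
    name and returns its content as bytes.
    """
    path = Path(soil_dir) / name
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(download(name))
    return path


calls = []


def fake_download(name):
    # Stand-in for a real download; records each call.
    calls.append(name)
    return b"zip-bytes"


with tempfile.TemporaryDirectory() as tmp:
    soil_dir = Path(tmp) / "cache"
    first = cached_file(soil_dir, "awc.zip", fake_download)
    second = cached_file(soil_dir, "awc.zip", fake_download)  # read from disk
    reused = first == second and first.read_bytes() == b"zip-bytes"

print(len(calls))  # the download ran only once
```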

pygeohydro.pygeohydro.ssebopeta_bycoords(coords, dates, crs=4326)

Get daily actual ET (in mm/day) from the SSEBop database for a dataframe of coordinates.

Parameters
  • coords (pandas.DataFrame) – A dataframe with id, x, y columns.

  • dates (tuple or list) – Start and end dates as a tuple (start, end) or a list of years [2001, 2010, …].

  • crs (str, int, or pyproj.CRS, optional) – The CRS of the input coordinates, defaults to epsg:4326.

Returns

xarray.Dataset – Daily actual ET in mm/day as a dataset with time and location_id dimensions. The location_id dimension is the same as the id column in the input dataframe.
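
The two accepted forms of dates cover different sets of days: a tuple is one continuous period, while a list selects whole calendar years. The sketch below expands both forms into daily dates to make the difference visible. expand_dates is a hypothetical helper, not the library's parser, and it assumes that start and end are ISO-formatted strings (YYYY-MM-DD), which the parameter description does not specify.

```python
from datetime import date, timedelta


def expand_dates(dates):
    """Expand (start, end) or [year, ...] into a list of daily dates.

    Hypothetical helper; assumes ISO date strings for the tuple form.
    """
    if isinstance(dates, tuple):
        if len(dates) != 2:
            raise ValueError("a tuple must be (start, end)")
        start, end = (date.fromisoformat(d) for d in dates)
        periods = [(start, end)]
    elif isinstance(dates, list):
        # Each year contributes January 1 through December 31.
        periods = [(date(year, 1, 1), date(year, 12, 31)) for year in dates]
    else:
        raise TypeError("dates must be a tuple or a list")

    days = []
    for start, end in periods:
        if start > end:
            raise ValueError("start date is after end date")
        n_days = (end - start).days + 1  # inclusive of both ends
        days.extend(start + timedelta(days=i) for i in range(n_days))
    return days


print(len(expand_dates(("2005-10-01", "2005-10-05"))))  # 5
print(len(expand_dates([2001, 2004])))  # 731 (365 + 366, 2004 is a leap year)
```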

pygeohydro.pygeohydro.ssebopeta_bygeom(geometry, dates, geo_crs=4326)

Get daily actual ET for a region from the SSEBop database.

Notes

Since there’s still no web service available for subsetting SSEBop, the data first needs to be downloaded for the requested period and is then masked locally by the region of interest. Therefore, it’s not as fast as other functions, and the bottleneck could be the download speed.

Parameters
  • geometry (shapely.geometry.Polygon or tuple) – The geometry for downloading and clipping the data. For a tuple bbox, the order should be (west, south, east, north).

  • dates (tuple or list) – Start and end dates as a tuple (start, end) or a list of years [2001, 2010, …].

  • geo_crs (str, int, or pyproj.CRS, optional) – The CRS of the input geometry, defaults to epsg:4326.

Returns

xarray.DataArray – Daily actual ET within a geometry in mm/day at 1 km resolution