pygeohydro.pygeohydro
Accessing data from the supported databases through their APIs.
Module Contents
- class pygeohydro.pygeohydro.EHydro
Access USACE Hydrographic Surveys (eHydro).
Notes
For more info visit: https://navigation.usace.army.mil/Survey/Hydro
- class pygeohydro.pygeohydro.NID
Retrieve data from the National Inventory of Dams web service.
- property df
Entire NID inventory (csv version) as a pandas.DataFrame.
- property gdf
Entire NID inventory (gpkg version) as a geopandas.GeoDataFrame.
- property nid_inventory_path: pathlib.Path
Path to the NID inventory feather file.
- get_byfilter(query_list)
Query dams by filters from the National Inventory of Dams web service.
- Parameters:
query_list (list of dict) – List of dictionaries of query parameters. For an exhaustive list of the parameters, use the advanced fields dataframe that can be accessed via NID().fields_meta. Some filters require min/max values, such as damHeight and drainageArea. For such filters, the min/max values should be passed like so: {filter_key: ["[min1 max1]", "[min2 max2]"]}.
- Returns:
list of geopandas.GeoDataFrame – Query results in the same order as the input query list.
Examples
>>> from pygeohydro import NID
>>> nid = NID()
>>> query_list = [
...     {"drainageArea": ["[200 500]"]},
...     {"nidId": ["CA01222"]},
... ]
>>> dam_dfs = nid.get_byfilter(query_list)
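The bracketed min/max strings are easy to mistype by hand, so a small helper can build them. The following is a minimal sketch: range_filter is a hypothetical name that is not part of pygeohydro, and only the "[min max]" string format is taken from the parameter description above.

```python
def range_filter(*bounds):
    """Format (min, max) pairs as the "[min max]" strings used by range filters."""
    formatted = []
    for low, high in bounds:
        if low > high:
            raise ValueError(f"min ({low}) must not exceed max ({high})")
        formatted.append(f"[{low} {high}]")
    return formatted


# Hypothetical query list: one range filter and one exact-match filter.
query_list = [
    {"drainageArea": range_filter((200, 500))},
    {"nidId": ["CA01222"]},
]
```

The resulting query_list has the same shape as the one in the example above and could be passed to nid.get_byfilter(query_list) in the same way.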
- get_bygeom(geometry, geo_crs)
Retrieve NID data within a geometry.
- Parameters:
- Returns:
geopandas.GeoDataFrame – GeoDataFrame of NID data.
Examples
>>> from pygeohydro import NID
>>> nid = NID()
>>> dams = nid.get_bygeom((-69.77, 45.07, -69.31, 45.45), 4326)
- get_suggestions(text, context_key=None)
Get suggestions from the National Inventory of Dams web service.
Notes
This function is useful for exploring and/or narrowing down the filter fields that are needed to query the dams using get_byfilter.
- Parameters:
- Returns:
tuple of pandas.DataFrame – The suggestions for the requested text as two DataFrames: the first contains suggestions found in the dam properties, and the second contains those found in the query fields such as states, huc6, etc.
Examples
>>> from pygeohydro import NID
>>> nid = NID()
>>> dams, contexts = nid.get_suggestions("houston", "city")
- inventory_byid(federal_ids)
Get extra attributes for dams based on their dam ID.
Notes
This function is meant to be used for getting extra attributes for dams. For example, first use either get_bygeom or get_byfilter to get the basic attributes of the target dams. Then use this function to get extra attributes using the id column of the GeoDataFrame that get_bygeom or get_byfilter returns.
- Parameters:
federal_ids (list of str) – List of the target dam Federal IDs.
- Returns:
pandas.DataFrame – Dams with extra attributes in addition to the standard NID fields that other NID methods return.
Examples
>>> from pygeohydro import NID
>>> nid = NID()
>>> dams = nid.inventory_byid(['KY01232', 'GA02400', 'NE04081', 'IL55070', 'TN05345'])
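The two-step workflow described in the Notes can be sketched as a small wrapper. This is an illustration only: extra_attributes_in_bbox and its id_column parameter are hypothetical, and the default column name follows the Notes; adjust it if the Federal IDs are stored in a different column of the returned GeoDataFrame.

```python
def extra_attributes_in_bbox(nid, bbox, crs=4326, id_column="id"):
    """Fetch basic dam attributes for a bounding box, then the extra ones.

    ``nid`` is expected to behave like ``pygeohydro.NID()``.
    """
    # Step 1: basic attributes of the dams inside the bounding box.
    dams = nid.get_bygeom(bbox, crs)
    # Step 2: extra attributes, looked up with the IDs from ``id_column``
    # (the column name is an assumption taken from the Notes).
    federal_ids = list(dams[id_column])
    return nid.inventory_byid(federal_ids)
```

With a real NID instance, a call might look like extra_attributes_in_bbox(NID(), (-69.77, 45.07, -69.31, 45.45)), which performs two web-service requests.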
- stage_nid_inventory(fname=None)
Download the entire NID inventory data and save to a feather file.
- Parameters:
fname (str, pathlib.Path, optional) – The path to the file to save the data to, defaults to ./cache/nid_inventory.feather.
- pygeohydro.pygeohydro.cover_statistics(cover_da)
Percentages of the categorical NLCD cover data.
- Parameters:
cover_da (xarray.DataArray) – Land cover DataArray from a LULC Dataset from the nlcd_bygeom function.
- Returns:
Stats – A named tuple with the percentages of the cover classes and categories.
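As a rough illustration of what "percentages of the cover classes and categories" means, the sketch below counts class codes in a flat list of cell values. It is not the library's implementation: cover_percentages is a hypothetical helper, and grouping classes into categories by the tens digit of the NLCD code (for example, 41, 42, and 43 under category 4) is an assumption made for this example.

```python
from collections import Counter


def cover_percentages(cells):
    """Return (class_pct, category_pct) dicts for a flat list of NLCD codes."""
    total = len(cells)
    if total == 0:
        raise ValueError("no cells to summarize")
    class_counts = Counter(cells)
    # Assumption: a category is identified by the tens digit of the class code.
    category_counts = Counter(code // 10 for code in cells)
    class_pct = {c: 100 * n / total for c, n in class_counts.items()}
    category_pct = {c: 100 * n / total for c, n in category_counts.items()}
    return class_pct, category_pct


# Hypothetical raster of eight cells.
classes, categories = cover_percentages([41, 41, 42, 21, 22, 81, 11, 11])
```

Here the two cells of class 41 make up 25% of the raster, and the three forest cells (41, 41, 42) make up 37.5% under category 4.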
- pygeohydro.pygeohydro.get_camels()
Get streamflow and basin attributes of all 671 stations in the CAMELS dataset.
Notes
For more info on CAMELS visit: https://ral.ucar.edu/solutions/products/camels
- Returns:
tuple of geopandas.GeoDataFrame and xarray.Dataset – The first is basin attributes as a geopandas.GeoDataFrame and the second is streamflow data and basin attributes as an xarray.Dataset.
- pygeohydro.pygeohydro.nlcd_area_percent(geo_df, year=2019, region='L48')
Compute the area percentages of the natural, developed, and impervious areas.
Notes
This function uses imperviousness and land use/land cover data from NLCD to compute the area percentages of the natural, developed, and impervious areas. It considers land cover classes 21 to 24 as urban and the rest as natural. It then uses the imperviousness percentage to partition the urban area into developed and impervious areas. So, urban = developed + impervious and always natural + urban = natural + developed + impervious = 100.
- Parameters:
geo_df (geopandas.GeoDataFrame or geopandas.GeoSeries) – A GeoDataFrame or GeoSeries with the geometry to query. The indices are used as the index of the output dataframe.
year (int, optional) – Year of the NLCD data, defaults to 2019. Available years are 2021, 2019, 2016, 2013, 2011, 2008, 2006, 2004, and 2001.
region (str, optional) – Region in the US where the input geometries are located, defaults to L48. Valid values are L48 (for CONUS), HI (for Hawaii), AK (for Alaska), and PR (for Puerto Rico). Both lower and upper cases are acceptable.
- Returns:
pandas.DataFrame – A dataframe with the same index as the input geo_df, whose columns are the area percentages of the natural, developed, impervious, and urban (sum of developed and impervious) areas. The sum of the urban and natural percentages is always 100, as is the sum of the natural, developed, and impervious percentages.
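The identities above can be checked with a small worked example. The sketch below is not pygeohydro's implementation: area_percentages is a hypothetical helper that assumes equal-area cells, each with an NLCD class code and an imperviousness percentage, and it treats the impervious share of each urban cell as impervious area and the remainder as developed area.

```python
def area_percentages(cover, impervious):
    """Partition equal-area cells into natural, developed, and impervious percentages.

    cover: NLCD class code per cell; impervious: imperviousness (0-100) per cell.
    """
    if len(cover) != len(impervious) or not cover:
        raise ValueError("inputs must be non-empty and of equal length")
    total = len(cover)
    # Classes 21 to 24 are treated as urban, everything else as natural.
    urban_cells = [imp for code, imp in zip(cover, impervious) if 21 <= code <= 24]
    urban = 100 * len(urban_cells) / total
    # Assumption: the impervious fraction of each urban cell counts as impervious.
    impervious_pct = sum(urban_cells) / total
    developed = urban - impervious_pct
    return {
        "natural": 100 - urban,
        "developed": developed,
        "impervious": impervious_pct,
        "urban": urban,
    }


# Four hypothetical cells: two urban (50% and 100% impervious), two natural.
result = area_percentages([21, 24, 41, 81], [50, 100, 0, 0])
```

For these four cells, urban is 50, impervious is 37.5, developed is 12.5, and natural is 50, so both sums come to 100.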
- pygeohydro.pygeohydro.nlcd_bycoords(coords, years=None, region='L48', ssl=None)
Get data from NLCD database (2019).
- Parameters:
coords (list of tuple) – List of coordinates in the form of (longitude, latitude).
years (dict, optional) – The years for NLCD layers as a dictionary, defaults to {'impervious': [2019], 'cover': [2019], 'canopy': [2019], "descriptor": [2019]}. Layers that are not in years are ignored, e.g., {'cover': [2016, 2019]} returns land cover data for 2016 and 2019.
region (str, optional) – Region in the US where the input geometries are located, defaults to L48. Valid values are L48 (for CONUS), HI (for Hawaii), AK (for Alaska), and PR (for Puerto Rico). Both lower and upper cases are acceptable.
ssl (bool or SSLContext, optional) – SSLContext to use for the connection, defaults to None. Set to False to disable SSL certificate verification.
- Returns:
geopandas.GeoDataFrame – A GeoDataFrame with the NLCD data and the coordinates.
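How the years argument selects layers can be sketched as follows. This illustrates the behavior described in the parameter list, not the library's code: resolve_years is a hypothetical helper, and rejecting unknown layer names with a ValueError is an assumption.

```python
DEFAULT_YEARS = {
    "impervious": [2019],
    "cover": [2019],
    "canopy": [2019],
    "descriptor": [2019],
}


def resolve_years(years=None):
    """Return the layers and years that would be requested."""
    if years is None:
        # No argument: fall back to the documented defaults.
        return {layer: list(yrs) for layer, yrs in DEFAULT_YEARS.items()}
    unknown = set(years) - set(DEFAULT_YEARS)
    if unknown:
        # Assumption: unknown layer names are treated as an error.
        raise ValueError(f"unknown layers: {sorted(unknown)}")
    # Only the layers that were passed are kept; the rest are ignored.
    return {layer: list(yrs) for layer, yrs in years.items()}
```

For example, resolve_years({"cover": [2016, 2019]}) keeps only the cover layer, mirroring a call such as nlcd_bycoords(coords, years={"cover": [2016, 2019]}).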
- pygeohydro.pygeohydro.nlcd_bygeom(geometry, resolution, years=None, region='L48', crs=4326, ssl=None)
Get data from NLCD database (2019).
- Parameters:
geometry (geopandas.GeoDataFrame or geopandas.GeoSeries) – A GeoDataFrame or GeoSeries with the geometry to query. The indices are used as keys in the output dictionary.
resolution (float) – The data resolution in meters. The width and height of the output are computed in pixels based on the geometry bounds and the given resolution.
years (dict, optional) – The years for NLCD layers as a dictionary, defaults to {'impervious': [2019], 'cover': [2019], 'canopy': [2019], "descriptor": [2019]}. Layers that are not in years are ignored, e.g., {'cover': [2016, 2019]} returns land cover data for 2016 and 2019.
region (str, optional) – Region in the US where the input geometries are located, defaults to L48. Valid values are L48 (for CONUS), HI (for Hawaii), AK (for Alaska), and PR (for Puerto Rico). Both lower and upper cases are acceptable.
crs (str, int, or pyproj.CRS, optional) – The spatial reference system to be used for requesting the data, defaults to epsg:4326.
ssl (bool or SSLContext, optional) – SSLContext to use for the connection, defaults to None. Set to False to disable SSL certificate verification.
- Returns:
dict of xarray.Dataset or xarray.Dataset – A single dataset or a dict of NLCD datasets. If a dict, the keys are the indices of the input GeoDataFrame.
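To make the relationship between resolution and output size concrete, the sketch below computes a pixel grid from bounds given in meters. output_shape is a hypothetical helper; the assumptions that the bounds are in a projected, meter-based CRS and that partial pixels are rounded up are made for this example and may differ from what the library does.

```python
import math


def output_shape(bounds, resolution):
    """Return (width, height) in pixels for (xmin, ymin, xmax, ymax) in meters."""
    xmin, ymin, xmax, ymax = bounds
    if resolution <= 0:
        raise ValueError("resolution must be positive")
    if xmax <= xmin or ymax <= ymin:
        raise ValueError("bounds must be (xmin, ymin, xmax, ymax)")
    # Assumption: partial pixels at the edges are rounded up.
    width = math.ceil((xmax - xmin) / resolution)
    height = math.ceil((ymax - ymin) / resolution)
    return width, height


# A hypothetical 3 km by 1.5 km box at 100 m resolution.
shape = output_shape((0, 0, 3000, 1500), 100)
```

Halving the resolution value doubles both dimensions, so the number of requested pixels grows fourfold.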
- pygeohydro.pygeohydro.overland_roughness(cover_da)
Estimate overland roughness from land cover data.
- Parameters:
cover_da (xarray.DataArray) – Land cover DataArray from a LULC Dataset from the nlcd_bygeom function.
- Returns:
xarray.DataArray – Overland roughness.
- pygeohydro.pygeohydro.soil_gnatsgo(layers, geometry, crs=4326)
Get US soil data from the gNATSGO dataset.
Notes
This function uses Microsoft’s Planetary Computer service to get the data. The dataset’s description and its supported soil properties can be found at: https://planetarycomputer.microsoft.com/dataset/gnatsgo-rasters
- Parameters:
layers (list of str or str) – Target layer(s). Available layers can be found on the dataset’s website (see the link in Notes).
geometry (Polygon, MultiPolygon, or tuple of length 4) – Geometry or bounding box of the region of interest.
crs (int, str, or pyproj.CRS, optional) – The input geometry CRS, defaults to epsg:4326.
- Returns:
xarray.Dataset – Requested soil properties.
- pygeohydro.pygeohydro.soil_properties(properties='*', soil_dir='cache')
Get soil properties dataset in the United States from ScienceBase.
Notes
This function downloads the source zip files from ScienceBase, extracts the included .tif files, and returns them as an xarray.Dataset.
- Parameters:
properties (list of str or str, optional) – Soil properties to extract, defaults to “*”, i.e., all the properties. Available properties are awc for available water capacity, fc for field capacity, and por for porosity.
soil_dir (str or pathlib.Path) – Directory to store the zip files or, if they already exist, to read them from, defaults to ./cache.
- pygeohydro.pygeohydro.ssebopeta_bycoords(coords, dates, crs=4326)
Daily actual ET for a dataframe of coords from SSEBop database in mm/day.
- Parameters:
coords (pandas.DataFrame) – A dataframe with id, x, y columns.
dates (tuple or list, optional) – Start and end dates as a tuple (start, end) or a list of years [2001, 2010, …].
crs (str, int, or pyproj.CRS, optional) – The CRS of the input coordinates, defaults to epsg:4326.
- Returns:
xarray.Dataset – Daily actual ET in mm/day as a dataset with time and location_id dimensions. The location_id dimension is the same as the id column in the input dataframe.
- pygeohydro.pygeohydro.ssebopeta_bygeom(geometry, dates, geo_crs=4326)
Get daily actual ET for a region from SSEBop database.
Notes
Since there’s still no web service available for subsetting SSEBop, the data first needs to be downloaded for the requested period and is then masked by the region of interest locally. Therefore, it’s not as fast as other functions, and the bottleneck could be the download speed.
- Parameters:
geometry (shapely.geometry.Polygon or tuple) – The geometry for downloading and clipping the data. For a tuple bbox, the order should be (west, south, east, north).
dates (tuple or list, optional) – Start and end dates as a tuple (start, end) or a list of years [2001, 2010, …].
geo_crs (str, int, or pyproj.CRS, optional) – The CRS of the input geometry, defaults to epsg:4326.
- Returns:
xarray.DataArray – Daily actual ET within a geometry in mm/day at 1 km resolution.
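Both SSEBop functions accept dates either as a (start, end) tuple or as a list of years. One way to think about the two forms is to normalize them into explicit date ranges, as in the sketch below. date_ranges is a hypothetical helper that is not part of pygeohydro; the use of ISO-formatted date strings and the treatment of each listed year as a full calendar year are assumptions, not documented behavior.

```python
from datetime import date


def date_ranges(dates):
    """Normalize dates into a list of (start, end) datetime.date pairs."""
    if isinstance(dates, tuple):
        if len(dates) != 2:
            raise ValueError("a tuple must be (start, end)")
        # ISO strings such as "2005-10-01" are assumed for this sketch.
        start, end = (date.fromisoformat(d) for d in dates)
        if start > end:
            raise ValueError("start must not be after end")
        return [(start, end)]
    if isinstance(dates, list):
        # Assumption: each year stands for January 1 through December 31.
        return [(date(y, 1, 1), date(y, 12, 31)) for y in dates]
    raise TypeError("dates must be a tuple or a list")
```

Because the data is downloaded for the whole requested period before it is clipped, keeping the ranges short reduces the amount of data transferred.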