voxcity.downloader
==================

.. py:module:: voxcity.downloader


Submodules
----------

.. toctree::
   :maxdepth: 1

   /autoapi/voxcity/downloader/citygml/index
   /autoapi/voxcity/downloader/eubucco/index
   /autoapi/voxcity/downloader/gba/index
   /autoapi/voxcity/downloader/gee/index
   /autoapi/voxcity/downloader/mbfp/index
   /autoapi/voxcity/downloader/ocean/index
   /autoapi/voxcity/downloader/oemj/index
   /autoapi/voxcity/downloader/osm/index
   /autoapi/voxcity/downloader/overture/index
   /autoapi/voxcity/downloader/utils/index


Attributes
----------

.. autoapisummary::

   voxcity.downloader.OVERPASS_ENDPOINTS
   voxcity.downloader.tag_osm_key_value_mapping
   voxcity.downloader.classification_mapping


Functions
---------

.. autoapisummary::

   voxcity.downloader.get_mbfp_gdf
   voxcity.downloader.download_file
   voxcity.downloader.initialize_earth_engine
   voxcity.downloader.get_roi
   voxcity.downloader.get_center_point
   voxcity.downloader.get_ee_image_collection
   voxcity.downloader.get_ee_image
   voxcity.downloader.save_geotiff
   voxcity.downloader.get_dem_image
   voxcity.downloader.save_geotiff_esa_land_cover
   voxcity.downloader.save_geotiff_dynamic_world_v1
   voxcity.downloader.save_geotiff_esri_landcover
   voxcity.downloader.save_geotiff_open_buildings_temporal
   voxcity.downloader.save_geotiff_dsm_minus_dtm
   voxcity.downloader.load_gdf_from_openstreetmap
   voxcity.downloader.load_land_cover_gdf_from_osm
   voxcity.downloader.load_tree_gdf_from_osm
   voxcity.downloader.save_oemj_as_geotiff
   voxcity.downloader.load_gdf_from_eubucco
   voxcity.downloader.get_gdf_from_eubucco
   voxcity.downloader.load_gdf_from_overture
   voxcity.downloader.load_gdf_from_gba


Package Contents
----------------

.. py:function:: get_mbfp_gdf(output_dir, rectangle_vertices)

   Download and process building footprint data for a rectangular region.

   This function takes a list of coordinates defining a rectangular region and:

   1. Downloads the necessary building footprint data files covering the region
   2.
      Loads and combines the GeoJSON data from all relevant files
   3. Processes the data to ensure consistent coordinate ordering
   4. Assigns unique sequential IDs to each building

   :param output_dir: Directory path where downloaded files will be saved
   :type output_dir: str
   :param rectangle_vertices: List of (lon, lat) tuples defining the rectangle corners.
                              The coordinates should define a bounding box of the area of interest.
   :type rectangle_vertices: list

   :returns: GeoDataFrame containing building footprints with columns:

             - geometry: Building polygon geometries
             - id: Sequential unique identifier for each building
   :rtype: geopandas.GeoDataFrame

   .. note::

      - Files are downloaded only if not already present in the output directory
      - Coordinates in the input vertices should be in (longitude, latitude) order
      - The function handles cases where some vertices might not have available data


.. py:function:: download_file(url, filename, *, timeout=60, max_retries=3, initial_delay=2.0, backoff_factor=2.0, chunk_size=8192)

   Download a file from a URL and save it locally with retry and streaming.

   Uses streaming to avoid loading large files entirely into memory and
   retries on transient network failures with exponential backoff.

   :param url: URL of the file to download.
   :type url: str
   :param filename: Local path where the downloaded file will be saved.
   :type filename: str
   :param timeout: Request timeout in seconds (default 60).
   :type timeout: int
   :param max_retries: Number of retry attempts on failure (default 3).
   :type max_retries: int
   :param initial_delay: Seconds to wait before the first retry (default 2.0).
   :type initial_delay: float
   :param backoff_factor: Multiplier for delay between retries (default 2.0).
   :type backoff_factor: float
   :param chunk_size: Bytes per chunk when streaming (default 8192).
   :type chunk_size: int

   :raises requests.HTTPError: If download fails after all retries.

   .. rubric:: Example

   >>> download_file('https://example.com/file.pdf', 'local_file.pdf')


..
py:function:: initialize_earth_engine(**initialize_kwargs)

   Initialize the Earth Engine API if not already initialized.

   Uses a public-behavior check to determine whether Earth Engine is already
   initialized by attempting to access asset roots. If that call fails, it
   will initialize Earth Engine using the provided keyword arguments.

   Arguments are passed through to ``ee.Initialize`` to support contexts such as
   specifying a ``project`` or service account credentials.


.. py:function:: get_roi(input_coords)

   Create an Earth Engine region of interest polygon from coordinates.

   :param input_coords: List of coordinate pairs defining the polygon vertices in (lon, lat) format.
                        The coordinates should form a valid polygon (non-self-intersecting).

   :returns: Earth Engine polygon geometry representing the ROI
   :rtype: ee.Geometry.Polygon

   .. note::

      The function automatically closes the polygon by connecting the last vertex
      to the first vertex if they are not the same.


.. py:function:: get_center_point(roi)

   Get the centroid coordinates of a region of interest.

   :param roi: Earth Engine geometry object representing the region of interest

   :returns: (longitude, latitude) coordinates of the centroid
   :rtype: tuple

   .. note::

      The centroid is calculated using Earth Engine's geometric centroid algorithm,
      which may not always fall within the geometry for complex shapes.


.. py:function:: get_ee_image_collection(collection_name, roi)

   Get the first image from an Earth Engine ImageCollection filtered by region.

   :param collection_name: Name of the Earth Engine ImageCollection (e.g., 'LANDSAT/LC08/C02/T1_TOA')
   :param roi: Earth Engine geometry to filter by

   :returns: First image from collection clipped to ROI, with any masked pixels unmasked
   :rtype: ee.Image

   .. note::

      The function sorts images by time (earliest first) and unmasks any masked pixels
      in the final image. This is useful for ensuring complete coverage of the ROI.


..
py:function:: get_ee_image(collection_name, roi)

   Get an Earth Engine Image clipped to a region.

   :param collection_name: Name of the Earth Engine Image asset
   :param roi: Earth Engine geometry to clip to

   :returns: Image clipped to ROI with masked pixels unmasked to 0
   :rtype: ee.Image

   .. note::

      Unlike get_ee_image_collection(), this function works with single image assets
      rather than image collections. It's useful for static datasets like DEMs.
      Masked pixels are replaced with 0 via unmask() so that downstream consumers
      receive numeric values instead of dataset-specific nodata sentinels (e.g. 255).


.. py:function:: save_geotiff(image, filename, resolution=1, scale=None, region=None, crs=None)

   Save an Earth Engine image as a GeoTIFF file.

   This function provides flexible options for exporting Earth Engine images to GeoTIFF format.
   It handles different export scenarios based on the provided parameters.

   :param image: Earth Engine image to save
   :param filename: Output filename for the GeoTIFF
   :param resolution: Output resolution in degrees (default: 1), used when scale is not provided
   :param scale: Output scale in meters (overrides resolution if provided)
   :param region: Region to export (required if scale is provided)
   :param crs: Coordinate reference system (e.g., 'EPSG:4326')

   .. note::

      - If scale and region are provided, uses ee_export_image()
      - Otherwise, uses ee_to_geotiff() with resolution parameter
      - The function automatically converts output to Cloud Optimized GeoTIFF (COG) format
        when using ee_to_geotiff()


.. py:function:: get_dem_image(roi_buffered, source)

   Get a digital elevation model (DEM) image for a region.

   This function provides access to various global and regional Digital Elevation Model (DEM)
   datasets through Earth Engine.
   Each source has different coverage areas and resolutions.
   :param roi_buffered: Earth Engine geometry with buffer - should be larger than the actual
                        area of interest to ensure smooth interpolation at edges
   :param source: DEM source, one of:

                  - 'NASA': SRTM 30m global DEM
                  - 'COPERNICUS': Copernicus 30m global DEM
                  - 'DeltaDTM': Deltares global DTM
                  - 'FABDEM': Forest And Buildings removed Copernicus DEM
                  - 'England 1m DTM': UK Environment Agency 1m terrain model
                  - 'DEM France 5m': IGN RGE ALTI 5m France coverage
                  - 'DEM France 1m': IGN RGE ALTI 1m France coverage
                  - 'AUSTRALIA 5M DEM': Geoscience Australia 5m DEM
                  - 'USGS 3DEP 1m': USGS 3D Elevation Program 1m DEM

   :returns: DEM image clipped to region
   :rtype: ee.Image

   .. note::

      Some sources may have limited coverage or require special access permissions.
      The function will raise an error if the selected source is not available for
      the specified region.


.. py:function:: save_geotiff_esa_land_cover(roi, geotiff_path)

   Save ESA WorldCover land cover data as a colored GeoTIFF.

   Downloads and exports the ESA WorldCover 10m resolution global land cover map.
   The output is a colored GeoTIFF where each land cover class is represented by
   a unique color as defined in the color_map.

   :param roi: Earth Engine geometry defining region of interest
   :param geotiff_path: Output path for GeoTIFF file

   Land cover classes and their corresponding colors:

   - Trees (10): Dark green
   - Shrubland (20): Orange
   - Grassland (30): Yellow
   - Cropland (40): Purple
   - Built-up (50): Red
   - Barren/sparse vegetation (60): Gray
   - Snow and ice (70): White
   - Open water (80): Blue
   - Herbaceous wetland (90): Teal
   - Mangroves (95): Light green
   - Moss and lichen (100): Beige

   .. note::

      The output GeoTIFF is exported at 10m resolution, which is the native resolution
      of the ESA WorldCover dataset.


.. py:function:: save_geotiff_dynamic_world_v1(roi, geotiff_path, date=None)

   Save Dynamic World land cover data as a colored GeoTIFF.

   Downloads and exports Google's Dynamic World near real-time land cover classification.
   The data is available globally at 10m resolution from 2015 onwards, updated every 2-5 days.

   :param roi: Earth Engine geometry defining region of interest
   :param geotiff_path: Output path for GeoTIFF file
   :param date: Optional date string (YYYY-MM-DD) to get data for specific time.
                If None, uses the most recent available image.

   Land cover classes and their colors:

   - water: Blue (#419bdf)
   - trees: Dark green (#397d49)
   - grass: Light green (#88b053)
   - flooded_vegetation: Purple (#7a87c6)
   - crops: Orange (#e49635)
   - shrub_and_scrub: Tan (#dfc35a)
   - built: Red (#c4281b)
   - bare: Gray (#a59b8f)
   - snow_and_ice: Light purple (#b39fe1)

   .. note::

      If a specific date is provided but no image is available, the function will use
      the closest available date and print a message indicating the actual date used.


.. py:function:: save_geotiff_esri_landcover(roi, geotiff_path, year=None)

   Save ESRI Land Cover data as a colored GeoTIFF.

   Downloads and exports ESRI's 10m resolution global land cover classification.
   This dataset is updated annually and provides consistent global coverage.

   :param roi: Earth Engine geometry defining region of interest
   :param geotiff_path: Output path for GeoTIFF file
   :param year: Optional year (YYYY) to get data for specific time.
                If None, uses the most recent available year.

   Land cover classes and colors:

   - Water (#1A5BAB): Water bodies
   - Trees (#358221): Tree cover
   - Flooded Vegetation (#87D19E): Vegetation in water-logged areas
   - Crops (#FFDB5C): Agricultural areas
   - Built Area (#ED022A): Urban and built-up areas
   - Bare Ground (#EDE9E4): Exposed soil and rock
   - Snow/Ice (#F2FAFF): Permanent snow and ice
   - Clouds (#C8C8C8): Cloud cover
   - Rangeland (#C6AD8D): Natural vegetation

   .. note::

      The function will print the year of the data actually used, which may differ
      from the requested year if data is not available for that time.


.. py:function:: save_geotiff_open_buildings_temporal(aoi, geotiff_path)

   Save Open Buildings temporal data as a GeoTIFF.
   Downloads and exports building height data from Google's Open Buildings dataset.
   This dataset provides building footprints and heights derived from satellite imagery.

   :param aoi: Earth Engine geometry defining area of interest
   :param geotiff_path: Output path for GeoTIFF file

   .. note::

      - The output GeoTIFF contains building heights in meters
      - The dataset is updated periodically and may not cover all regions
      - Resolution is fixed at 4 meters per pixel
      - Areas without buildings will have no-data values


.. py:function:: save_geotiff_dsm_minus_dtm(roi, geotiff_path, meshsize, source)

   Get the height difference between DSM and DTM from terrain data.

   Calculates the difference between Digital Surface Model (DSM) and Digital Terrain
   Model (DTM) to estimate heights of buildings, vegetation, and other above-ground features.

   :param roi: Earth Engine geometry defining area of interest
   :param geotiff_path: Output path for GeoTIFF file
   :param meshsize: Size of each grid cell in meters - determines output resolution
   :param source: Source of terrain data, one of:

                  - 'England 1m DSM - DTM': UK Environment Agency 1m resolution
                  - 'Netherlands 0.5m DSM - DTM': AHN4 0.5m resolution

   .. note::

      - A 100m buffer is automatically added around the ROI to ensure smooth interpolation at edges
      - The output represents height above ground level in meters
      - Negative values may indicate data artifacts or actual below-ground features
      - The function requires both DSM and DTM data to be available for the region


.. py:function:: load_gdf_from_openstreetmap(rectangle_vertices, floor_height=3.0)

   Download and process building footprint data from OpenStreetMap.

   This function:

   1. Downloads building data using the Overpass API
   2. Processes complex relations and their members
   3. Extracts height information and other properties
   4.
      Converts features to a GeoDataFrame with standardized properties

   :param rectangle_vertices: List of (lon, lat) coordinates defining the bounding box
   :type rectangle_vertices: list

   :returns: GeoDataFrame containing building footprints with properties:

             - geometry: Polygon or MultiPolygon
             - height: Building height in meters
             - levels: Number of building levels
             - min_height: Minimum height (for elevated structures)
             - building_type: Type of building
             - And other OSM tags as properties
   :rtype: geopandas.GeoDataFrame


.. py:function:: load_land_cover_gdf_from_osm(rectangle_vertices_ori)

   Load and classify land cover data from OpenStreetMap.

   This function:

   1. Downloads land cover features using the Overpass API
   2. Classifies features based on OSM tags
   3. Handles special cases like roads with width information
   4. Projects geometries for accurate buffering
   5. Creates a standardized GeoDataFrame with classifications

   :param rectangle_vertices_ori: List of (lon, lat) coordinates defining the area
   :type rectangle_vertices_ori: list

   :returns: GeoDataFrame with:

             - geometry: Polygon or MultiPolygon features
             - class: Land cover classification name
             - Additional properties from OSM tags
   :rtype: geopandas.GeoDataFrame


.. py:function:: load_tree_gdf_from_osm(rectangle_vertices, default_top_height=10.0, default_trunk_height=4.0, default_crown_diameter=None, default_crown_ratio=0.6, include_polygons=True)

   Download and process individual tree data and tree land cover polygons from OpenStreetMap.

   This function downloads tree point data (natural=tree) and optionally tree land cover
   polygons (natural=wood, landuse=forest, natural=tree_row) from OpenStreetMap and creates
   a GeoDataFrame compatible with VoxCity's tree canopy processing.

   For individual trees, it extracts height and crown diameter information from OSM tags
   when available, or uses default values when not specified.
   For tree polygons (forests, woods), it assigns default height values and the polygon
   geometry is preserved for rasterization during canopy grid creation.

   OSM tags used for tree properties:

   - height, est_height: Tree height in meters
   - diameter_crown: Crown diameter in meters
   - circumference: Trunk circumference (used to estimate crown if diameter_crown missing)
   - genus, species: Tree species information (stored as properties)
   - leaf_type: broadleaved/needleleaved (stored as property)
   - leaf_cycle: deciduous/evergreen (stored as property)

   Crown diameter estimation priority (for point trees only):

   1. Use diameter_crown tag if available
   2. Estimate from circumference tag (trunk circumference × 15 / π)
   3. Use default_crown_diameter if specified
   4. Estimate from tree height (height × default_crown_ratio)

   :param rectangle_vertices: List of (lon, lat) coordinates defining the bounding box.
                              Should be 4 vertices forming a rectangle.
   :type rectangle_vertices: list
   :param default_top_height: Default tree top height in meters when not specified in OSM.
                              Defaults to 10.0 meters.
   :type default_top_height: float
   :param default_trunk_height: Default trunk height (height to bottom of canopy) in meters.
                                This is the height where the canopy starts. Defaults to 4.0 meters.
   :type default_trunk_height: float
   :param default_crown_diameter: Default crown diameter in meters.
                                  If None, crown diameter is estimated from tree height using
                                  default_crown_ratio. Only used for Point geometries.
   :type default_crown_diameter: float, optional
   :param default_crown_ratio: Ratio of crown diameter to tree height, used when crown diameter
                               cannot be determined from OSM tags and default_crown_diameter is None.
                               Defaults to 0.6 (e.g., a 10m tall tree would have a 6m crown diameter).
   :type default_crown_ratio: float
   :param include_polygons: If True, also download tree land cover polygons (forests, woods,
                            tree_rows). Defaults to True. Set to False to only get individual trees.
   :type include_polygons: bool

   :returns: GeoDataFrame containing tree features with columns:

             - geometry: Point or Polygon geometry (lon, lat)
             - geometry_type: 'point' for individual trees, 'polygon' for forest/wood areas
             - tree_id: Unique identifier for each feature
             - top_height: Height to the top of the tree canopy in meters
             - bottom_height: Height to the bottom of the canopy (trunk height) in meters
             - crown_diameter: Diameter of the tree crown in meters (0 for polygons)
             - genus: Tree genus if available
             - species: Tree species if available
             - leaf_type: Leaf type (broadleaved/needleleaved) if available
             - leaf_cycle: Leaf cycle (deciduous/evergreen) if available
             - osm_id: Original OSM element ID
             - osm_type: OSM element type ('node', 'way', 'relation')
   :rtype: geopandas.GeoDataFrame

   .. rubric:: Example

   >>> vertices = [(-73.99, 40.75), (-73.98, 40.75), (-73.98, 40.76), (-73.99, 40.76)]
   >>> # Get both individual trees and forest polygons
   >>> tree_gdf = load_tree_gdf_from_osm(vertices)
   >>> # Get only individual trees
   >>> tree_gdf = load_tree_gdf_from_osm(vertices, include_polygons=False)
   >>> # Customize defaults: 15m tall trees, 5m trunk
   >>> tree_gdf = load_tree_gdf_from_osm(vertices, default_top_height=15.0,
   ...                                   default_trunk_height=5.0)

   .. note::

      - Individual trees have Point geometry with crown_diameter for ellipsoid rendering
      - Tree polygons have Polygon geometry and are rasterized as flat canopy areas
      - Tree polygon types: natural=wood, landuse=forest, natural=tree_row
      - Crown diameter estimation from trunk circumference uses an empirical ratio
      - The function uses multiple Overpass API endpoints for reliability


.. py:data:: OVERPASS_ENDPOINTS
   :value: ['https://overpass-api.de/api/interpreter', 'https://overpass.kumi.systems/api/interpreter',...


.. py:data:: tag_osm_key_value_mapping

.. py:data:: classification_mapping

..
py:function:: save_oemj_as_geotiff(polygon, filepath, zoom=16, *, ssl_verify=True, allow_insecure_ssl=False, allow_http_fallback=False, timeout_s=30)

   Download and save OpenEarthMap Japan imagery as a georeferenced GeoTIFF file.

   This is the main function that orchestrates the entire process of downloading,
   processing, and saving satellite imagery for a specified region.

   :param polygon: List of (lon, lat) coordinates defining the region to download.
                   Must be in clockwise or counterclockwise order.
   :type polygon: list
   :param filepath: Output path for the GeoTIFF file
   :type filepath: str
   :param zoom: Zoom level for detail. Defaults to 16.

                - 14: ~9.5m/pixel
                - 15: ~4.8m/pixel
                - 16: ~2.4m/pixel
                - 17: ~1.2m/pixel
                - 18: ~0.6m/pixel
   :type zoom: int, optional

   .. rubric:: Example

   >>> polygon = [
   ...     (139.7, 35.6),  # Bottom-left
   ...     (139.8, 35.6),  # Bottom-right
   ...     (139.8, 35.7),  # Top-right
   ...     (139.7, 35.7)   # Top-left
   ... ]
   >>> save_oemj_as_geotiff(polygon, "tokyo_area.tiff", zoom=16)

   .. note::

      - Higher zoom levels provide better resolution but require more storage
      - The polygon should be relatively small to avoid memory issues
      - The output GeoTIFF will be in Web Mercator projection (EPSG:3857)


.. py:function:: load_gdf_from_eubucco(rectangle_vertices, output_dir)

   Download EUBUCCO data and load it as a GeoDataFrame.

   This function serves as the main interface for loading EUBUCCO building data.
   It handles the complete workflow from downloading to processing the data.
   :param rectangle_vertices: List of (longitude, latitude) tuples defining the area.
                              The first vertex is used to determine which country's data to download.
   :type rectangle_vertices: list of tuples
   :param output_dir: Directory to save intermediate and output files.
                      A subdirectory 'EUBUCCO_raw' is created for raw downloaded data.
   :type output_dir: str

   :returns: GeoDataFrame containing the columns below, or None if the target area has no EUBUCCO data:

             - geometry: Building footprint polygons
             - height: Building heights in meters
             - id: Unique identifier for each building
   :rtype: geopandas.GeoDataFrame or None

   .. note::

      - Output is always in WGS84 (EPSG:4326) coordinate system
      - Building heights are in meters
      - Buildings without height data are assigned a height of -1.0
      - The function automatically determines the appropriate country dataset


.. py:function:: get_gdf_from_eubucco(rectangle_vertices, country_links, output_dir, file_name)

   Download, extract, filter, and convert GeoPackage data to GeoJSON based on the rectangle vertices.

   This function:

   1. Determines the target country based on input coordinates
   2. Downloads and extracts EUBUCCO data for that country
   3. Reads the GeoPackage into a GeoDataFrame
   4. Ensures correct coordinate reference system
   5. Assigns unique IDs to buildings

   :param rectangle_vertices: List of (longitude, latitude) tuples defining the area of interest
   :type rectangle_vertices: list of tuples
   :param country_links: Dictionary mapping country names to their respective GeoPackage URLs
   :type country_links: dict
   :param output_dir: Directory to save downloaded and processed files
   :type output_dir: str
   :param file_name: Name for the output GeoJSON file
   :type file_name: str

   :returns: GeoDataFrame containing building geometries and properties,
             or None if the target area has no EUBUCCO data
   :rtype: geopandas.GeoDataFrame or None

   .. note::

      - Automatically transforms coordinates to WGS84 (EPSG:4326) if needed
      - Assigns sequential IDs to buildings starting from 0
      - Logs errors if target area is not covered by EUBUCCO


..
py:function:: load_gdf_from_overture(rectangle_vertices, floor_height=3.0)

   Download and process building footprint data from Overture Maps.

   This function serves as the main entry point for downloading building data.
   It handles the complete workflow of downloading both building and building part
   data, combining them, and preparing them for further processing.

   :param rectangle_vertices: List of (lon, lat) coordinates defining the bounding box
                              for data download
   :type rectangle_vertices: list

   :returns: Combined dataset containing:

             - Building and building part geometries
             - Standardized properties
             - Sequential numeric IDs
   :rtype: GeoDataFrame

   .. note::

      - Downloads both building and building_part data from Overture Maps
      - Combines the datasets while preserving all properties
      - Assigns sequential IDs based on the final dataset index


.. py:function:: load_gdf_from_gba(rectangle_vertices: Sequence[Tuple[float, float]], base_url: str = 'https://data.source.coop/tge-labs/globalbuildingatlas-lod1', download_dir: Optional[str] = None, clip_to_rectangle: bool = False) -> Optional[geopandas.GeoDataFrame]

   Download GBA tiles intersecting a rectangle and return combined GeoDataFrame.

   :param rectangle_vertices: Sequence of (lon, lat) defining the area of interest.
   :param base_url: Base URL hosting GBA parquet tiles.
   :param download_dir: Optional directory to store downloaded tiles. If None,
                        a temporary directory is used and cleaned up by the OS later.
   :param clip_to_rectangle: If True, geometries are clipped to rectangle extent.

   :returns: GeoDataFrame with EPSG:4326 geometry and an 'id' column, or None if no data.
   :rtype: geopandas.GeoDataFrame or None
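
Several loaders on this page, including ``load_gdf_from_gba``, take ``rectangle_vertices`` as a sequence of (lon, lat) pairs. The sketch below shows one way to build such a list from a bounding box. ``bbox_to_rectangle_vertices`` is a hypothetical helper written for this illustration and is not part of ``voxcity``. The corner order (south-west, south-east, north-east, north-west) is an assumption that mirrors the ``load_tree_gdf_from_osm`` example above.

```python
from typing import List, Tuple


def bbox_to_rectangle_vertices(
    min_lon: float, min_lat: float, max_lon: float, max_lat: float
) -> List[Tuple[float, float]]:
    """Hypothetical helper: return the four rectangle corners as (lon, lat) tuples.

    Corner order is SW, SE, NE, NW (an assumption for illustration only).
    """
    if min_lon >= max_lon or min_lat >= max_lat:
        raise ValueError("min values must be strictly smaller than max values")
    return [
        (min_lon, min_lat),  # south-west
        (max_lon, min_lat),  # south-east
        (max_lon, max_lat),  # north-east
        (min_lon, max_lat),  # north-west
    ]


vertices = bbox_to_rectangle_vertices(-73.99, 40.75, -73.98, 40.76)

# The call below needs voxcity installed and network access, so it is left
# commented out. The None check follows the documented return type.
# from voxcity.downloader import load_gdf_from_gba
# gdf = load_gdf_from_gba(vertices, clip_to_rectangle=True)
# if gdf is None:
#     print("No GBA data for this rectangle")
```

Keeping longitude first matters here: every function on this page that documents its coordinate order states (lon, lat).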