Data Lineage
Full source-to-destination field mapping for every ingestion pipeline. Shows original source field names, spatial aggregation methods, unit conversions, and derived calculations.
USDM Drought Monitor
Weekly county-level drought conditions ingested from the USDM REST API.
drought_observations
| Destination field | Type | Source field | Transformation |
|---|---|---|---|
| fips | string(5) | fips | Zero-padded to 5 digits. Rows with non-numeric FIPS discarded. |
| valid_date | date | valid_date | Passed from ingestion call (not from API response). |
| county_name | string | county | Passed as-is. Null if absent. |
| state_abbr | string(2) | state | Passed as-is. Null if absent. |
| d0_pct β d4_pct | numeric(6,3) | d0 β d4 | Percentage of county area in each drought category. Rounded to 3 decimal places. |
| drought_severity | smallint | derived | Highest active category: if d4>0β4, d3>0β3, d2>0β2, d1>0β1, d0>0β0, else β1. |
| drought_category | string | derived | String label of highest active category ("D4"β¦"D0"). Null when drought-free (severity = β1). |
drought_metrics (computed from drought_observations history)
| Destination field | Type | Inputs | Computation |
|---|---|---|---|
| weeks_in_drought | smallint | drought_severity | Consecutive weeks with severity β₯ 0 (D0 or worse) ending on this date. |
| severity_change_4w | smallint | drought_severity | Current severity minus severity 4 weeks prior. |
| avg_severity_4w | numeric(4,2) | drought_severity | Rolling mean over 4 weeks. Rounded to 2 decimals. |
| avg_severity_12w | numeric(4,2) | drought_severity | Rolling mean over 12 weeks. Rounded to 2 decimals. |
| burden_52w | smallint | drought_severity | Sum of max(0, severity) over 52 weeks. D0 weeks = 0, D1 = 1, D4 = 4. |
| severity_percentile_20y | numeric(5,1) | drought_severity | Percentile rank of current severity within 20-year history for that county and calendar week. |
| severity_anomaly | numeric(4,2) | drought_severity | Current severity minus 20-year mean for this county and week-of-year. |
| drought_regime | string | severity, change |
Classification combining current severity, 4-week change, and persistence.
drought_free Β· entering Β· deepening Β· flash_drought Β· persistent Β· recovering
|
PRISM Climate
Daily 4km gridded climate variables spatially aggregated to county boundaries using a pre-built cell-to-county weight table.
| Destination field | Type | Source variable | Transformation |
|---|---|---|---|
| fips | string(5) | prism_cell_county_weights | FIPS from spatial weight table. Each county cell mapped via pre-built weights. |
| obs_date | date | obs_date | Passed from ingestion call. |
| precip_mm | numeric(7,2) | ppt | Precipitation in mm. Weighted mean of valid raster cells in county. Rounded to 2 decimals. |
| tmax_c | numeric(5,2) | tmax | Maximum daily temperature in Β°C. Weighted mean of valid raster cells. |
| tmin_c | numeric(5,2) | tmin | Minimum daily temperature in Β°C. Weighted mean of valid raster cells. |
| vpdmax_hpa | numeric(6,2) | vpdmax | Maximum vapor pressure deficit in hPa. Weighted mean of valid raster cells. |
| cell_count | smallint | derived | Number of valid (non-NaN) raster cells contributing to the county mean. |
Rolling window sums (7d, 30d, 90d) are computed at query time from precip_mm daily values β they are not stored as separate columns.
Growing Degree Days
Derived daily from PRISM tmax/tmin using the simple averaging method with crop-specific base and cap temperatures.
| Destination field | Type | Inputs | Base / Cap (Β°C) |
|---|---|---|---|
| gdd_corn | numeric(7,3) | tmax_c, tmin_c | Base 10.0 / Cap 30.0 |
| gdd_soybeans | numeric(7,3) | tmax_c, tmin_c | Base 10.0 / Cap 30.0 |
| gdd_wheat | numeric(7,3) | tmax_c, tmin_c | Base 0.0 / Cap 26.0 |
| gdd_cotton | numeric(7,3) | tmax_c, tmin_c | Base 15.5 / Cap 37.7 |
| gdd_sorghum | numeric(7,3) | tmax_c, tmin_c | Base 10.0 / Cap 37.8 |
All GDD values are clipped to a minimum of 0 and rounded to 3 decimal places. Null when either tmax or tmin is null for that county-date. Season-to-date accumulations and prior-year comparisons are computed at query time by summing daily gdd_* values from January 1.
NASA SMAP Soil Moisture
Daily HDF5 granules spatially aggregated to county level using a pre-built cell-to-county weight table.
| Destination field | Type | HDF5 array | Pass | Transformation |
|---|---|---|---|---|
| sm_mean_am | numeric(6,4) | sm | AM | Mean of valid (non-fill) cells within county. Fill value excluded. Null when no valid cells. |
| sm_std_am | numeric(6,4) | sm | AM | Standard deviation of valid cells within county. |
| coverage_pct_am | numeric(5,2) | sm | AM | Valid cell count / total county cell count Γ 100. |
| sm_error_am | numeric(6,4) | sm_error | AM | Mean retrieval uncertainty over SM-valid cells. |
| surface_temp_am | numeric(6,2) | surface_temp | AM | Mean surface temperature (K) over SM-valid cells. |
| vwc_am | numeric(6,4) | vwc | AM | Mean vegetation water content (kg/mΒ²) over SM-valid cells. |
| freeze_thaw_am | numeric(5,3) | freeze_thaw | AM | Mean freeze/thaw state flag over all county cells (0 = thawed, 1 = frozen). |
| sm_mean_pm / β¦_pm | same | same arrays | PM | Same aggregations from the ascending (PM) overpass. Often null at higher latitudes. |
smap_soil_properties (static, extracted once)
| Destination field | Type | HDF5 array | Transformation |
|---|---|---|---|
| bulk_density | numeric(7,3) | bulk_density | Mean of county cells (g/cmΒ³). |
| clay_fraction | numeric(5,3) | clay_fraction | Mean of county cells (fraction 0β1). |
NASS Crop Yields
Annual county-level crop yields and harvested acreage from the USDA NASS Quick Stats API.
| Destination field | Type | Source field | Transformation |
|---|---|---|---|
| fips | string(5) | state_fips_code + county_code | Concatenated and zero-padded: state.zfill(2) + county.zfill(3). Rows with county code "998" or "999" (state rollups) discarded. |
| year | smallint | year | Parsed to integer. |
| crop | string | commodity_desc | Normalized to canonical name. "CORN" β "CORN", "SOYBEANS" β "SOYBEANS", "WHEAT" + class "WINTER" β "WHEAT_WINTER", etc. |
| yield_value | numeric(10,2) | Value (YIELD query) | Commas removed, parsed to float. Null when NASS value is "(D)" (suppressed for confidentiality) or blank. |
| yield_unit | string | unit_desc | Passed as-is. E.g. "BU / ACRE", "LB / ACRE". |
| acres_harvested | integer | Value (AREA HARVESTED query) | Commas removed, parsed to int. Joined to yield rows by (fips, year). Null when suppressed. |
NASS Crop Progress
Weekly state-level crop progress percentages from USDA NASS Quick Stats.
| Destination field | Type | Source field | Transformation |
|---|---|---|---|
| state_fips | string(2) | state_fips_code | Zero-padded to 2 digits. Rows with state code "00" (national aggregate) discarded. |
| week_ending | date | week_ending | Parsed from "YYYY-MM-DD". Rows with missing or invalid dates discarded. |
| crop | string | commodity_desc | Normalized to canonical name. "WHEAT" + class "WINTER" β "WHEAT_WINTER", "WHEAT" + "SPRING" β "WHEAT_SPRING". |
| stage | string | unit_desc | "PCT " prefix stripped. "PCT PLANTED" β "PLANTED", "PCT EMERGED" β "EMERGED", etc. |
| progress_pct | numeric(5,2) | Value | Commas removed, parsed to float. Null when value is "(D)", "(NA)", "(Z)", or blank. |
MODIS NDVI
NASA MODIS MOD13A2 16-day NDVI composites ingested via NASA AppEEARS, aggregated to county level and normalized against a 25-year seasonal baseline.
| Destination field | Type | Source field | Transformation |
|---|---|---|---|
| fips | char(5) | filename / category | Extracted from AppEEARS category field. Zero-padded to 5 digits. |
| obs_date | date | Date | Parsed from AppEEARS date column (YYYY-MM-DD). |
| ndvi_mean | numeric(7,4) | MOD13A2_061__1_km_16_days_NDVI | Mean across all pixels within county boundary. Pixels with pixel_reliability > 1 excluded (marginal or snow/ice quality). Scale factor 0.0001 applied. Null if no valid pixels remain. |
| pixel_reliability | float | MOD13A2_061__1_km_16_days_pixel_reliability | Mean pixel_reliability value across county pixels used for quality filtering. 0=good, 1=marginal, 2=snow/ice, 3=cloudy. |
| Destination field (ndvi_baseline) | Type | Source | Computation |
|---|---|---|---|
| doy_period | smallint | obs_date | 16-day period of year: ((doy β 1) / 16 + 1). Periods 1β23. |
| ndvi_mean | numeric(7,4) | ndvi_observations | Mean NDVI across 2000β2024 obs for this county Γ period. Tiered quality filter: use pixel_reliability=0 if β₯10 obs; else pixel_reliabilityβ€1 if β₯10 obs; else low_confidence=true. |
| ndvi_std | numeric(7,4) | ndvi_observations | Standard deviation of NDVI across baseline years for this county Γ period. |
| low_confidence | boolean | computed | True when fewer than 10 valid observations exist across the baseline window. Z-score set to null for these counties/periods. |
Crop Stress Index
Composite 0β100 agricultural stress index computed at query time. Component weights are derived from Pearson correlations between each signal's growing-season summary and county-level NASS yield deviations from trend (2001β2024, ~41k county-years for corn). Separate models for CORN, SOYBEANS, WHEAT_WINTER, SORGHUM, COTTON, and GENERAL.
| Output field | Raw scorer range | Inputs | Computation |
|---|---|---|---|
| crop | string | county_primary_crop | Dominant crop by 5-year avg harvested acres (USDA NASS). Determines which weight set is applied. Falls back to GENERAL if no NASS data. |
| components.drought | 0β40 | drought_severity, burden_52w, severity_change_4w | Severity mapped to base points (D-1=0, D0=6, D1=16, D2=26, D3=34, D4=40). +4 if worsened past 4 weeks; +2 if 52-week burden >40%. Capped at 40. Weight in score: CORN 31, SOYBEANS 27, WHEAT_WINTER 35, SORGHUM 31, COTTON 34, GENERAL 31. |
| components.vpd | 0β15 | vpd_30d_avg_hpa (prism_observations) | 30-day avg VPD vs empirical percentiles (P50/P75/P90/P95 = 22.0/28.5/37.3/43.9 hPa) β 0/3/7/11/15 pts. Weight: CORN 21, SOYBEANS 26, WHEAT_WINTER 26, SORGHUM 22, COTTON 20, GENERAL 22. Null if PRISM unavailable. |
| components.precipitation | 0β15 | precip_90d_mm, precip_7d_mm | 90-day total deficit (0β11 pts) + 4 if recent 7-day rate < 25% of 90-day daily avg. Capped at 15. Not included in WHEAT_WINTER or COTTON models (rβ0). Weight: CORN 14, SOYBEANS 21, SORGHUM 18, GENERAL 15. |
| components.soil_moisture | 0β15 | sm_mean_am (smap_observations) | VWC thresholds: <0.08=15, <0.12=11, <0.18=7, <0.25=3, β₯0.25=0. Not in WHEAT_WINTER or SORGHUM models. Score is null (not computed) for soil-dependent crops when SMAP unavailable. Weight: CORN 14, SOYBEANS 5, COTTON 10, GENERAL 7. |
| components.ndvi_anomaly | 0β10 | ndvi_observations, ndvi_baseline | Z-score of current NDVI vs 25-year seasonal baseline: z β€ β2.0=10, β€ β1.5=8, β€ β1.0=5, β€ β0.5=2, > β0.5=0. Null when low_confidence=true or no NDVI data. Weight: CORN 20, SOYBEANS 21, WHEAT_WINTER 39, SORGHUM 29, COTTON 36, GENERAL 25. |
| score | 0β100 | all components | raw_total / available_weight_sum Γ 100, rounded. available_weight_sum = sum of weights for components with valid data. Scales correctly when signals are temporarily unavailable. |
| data_coverage_pct | integer | all components | available_weight_sum / crop_model_total_weight Γ 100. 100 = all signals present for this crop model. |
FEMA Flood Zones
County-level flood zone area percentages derived from the FEMA National Flood Hazard Layer (NFHL) via ArcGIS outStatistics.
| Destination field | Type | Source field | Transformation |
|---|---|---|---|
| fips | char(5) | county FIPS | 5-digit FIPS; one row per county (primary key). Counties absent from TIGER boundaries skipped. |
| sfha_pct | numeric(5,2) | FLD_ZONE (A*, V*) | % of county land area in Special Flood Hazard Area β all A-prefix and V-prefix zones. Area in sq degrees converted to mΒ² via centroid-latitude scaling; expressed as % of Census TIGER ALAND. |
| moderate_pct | numeric(5,2) | FLD_ZONE=X, ZONE_SUBTY contains "0.2" or "500" | % of county land in 0.2% annual chance flood zone (shaded Zone X). Identified by ZONE_SUBTY containing "0.2" or "500". |
| minimal_pct | numeric(5,2) | FLD_ZONE=X (unshaded), B, C | % of county land in minimal risk zones β unshaded Zone X, B, and C zones. |
| mapped_pct | numeric(5,2) | sfha + moderate + minimal | % of county land covered by any effective FIRM designation. Excludes Zone D (undetermined). If total zone area exceeds TIGER ALAND, all zones are scaled proportionally to cap mapped_pct at 100. |
| fetched_at | timestamptz | server clock | Timestamp of ingestion run. Set by DB default (NOW()). |
Open water (OPEN WATER zone) and Zone D (undetermined risk) are excluded from all percentages. NULL values indicate the county returned no NFHL features from the API.
FEMA National Risk Index
County-level composite and per-hazard risk scores, Expected Annual Loss, and social vulnerability / community resilience ratings from FEMA NRI v1.20.
County base and composite scores
| Destination field | Type | NRI source field | Transformation |
|---|---|---|---|
| fips | char(5) | STCOFIPS | Zero-padded to 5 digits. Rows with invalid FIPS discarded. |
| population | integer | POPULATION | 2020 Census population. Rounded to integer. NULL when source is null or β999. |
| building_value_usd | bigint | BUILDVALUE | Total building replacement value (USD). Rounded to integer. |
| agri_value_usd | bigint | AGRIVALUE | Total agricultural value (USD). Rounded to integer. |
| area_sqmi | numeric(10,2) | AREA | County area in square miles. |
| risk_score / risk_rating | numeric(6,2) / varchar(30) | RISK_SCORE / RISK_RATNG | Composite NRI risk score (0β100 percentile vs all US counties) and text rating. |
| eal_score / eal_rating | numeric(6,2) / varchar(30) | EAL_SCORE / EAL_RATNG | Composite Expected Annual Loss score and rating. |
| eal_total_usd | bigint | EAL_VALT | Total EAL across all hazards β buildings + ag + population proxy (USD). |
| sovi_score / sovi_rating | numeric(8,4) / varchar(30) | SOVI_SCORE / SOVI_RATNG | CDC/ATSDR Social Vulnerability Index score and rating. |
| resl_score / resl_rating | numeric(8,4) / varchar(30) | RESL_SCORE / RESL_RATNG | Community Resilience score and rating. |
| fetched_at | timestamptz | server clock | Timestamp of ingestion run. Set by DB default (NOW()). |
Per-hazard columns (17 hazards Γ 4 fields)
| Column pattern | Type | NRI source field | Transformation |
|---|---|---|---|
| {hazard}_risk_score | numeric(6,2) | {PREFIX}_RISKS | Hazard-specific risk index score (0β100). NULL when NRI value is null or the sentinel β999 (Not Applicable). |
| {hazard}_risk_rating | varchar(30) | {PREFIX}_RISKR | Text rating: Very High / High / Relatively High / Medium / Relatively Low / Low / Very Low / Not Applicable. Empty string coerced to NULL. |
| {hazard}_eal_usd | bigint | {PREFIX}_EALB | Expected Annual Loss β buildings only (USD). Rounded to integer. |
| {hazard}_alr | numeric(12,10) | {PREFIX}_ALRB | Annualized Loss Rate β buildings (loss / building exposure value). |
Hazard prefixes: avln avalanche Β· cfld coastal flooding Β· cwav cold wave Β· drgt drought Β· erqk earthquake Β· hail hail Β· hwav heat wave Β· hrcn hurricane Β· ifld inland flooding Β· istm ice storm Β· lnds landslide Β· ltng lightning Β· swnd strong wind Β· trnd tornado Β· tsun tsunami Β· vlcn volcanic activity Β· wfir wildfire
RMA Crop Insurance Loss History
Annual county-level crop insurance indemnity, premium, and loss records from USDA RMA Cause of Loss ZIP files. One row per county Γ commodity Γ insurance plan Γ stage Γ cause Γ month.
| Destination field | Type | Source column (1-indexed) | Transformation |
|---|---|---|---|
| fips | char(5) | cols 2 + 4 | state_code.zfill(2) + county_code.zfill(3). Rows where state="00" or county="000" (aggregated totals) are discarded. |
| state_fips | char(2) | col 2 | Zero-padded to 2 digits. |
| county_fips | char(3) | col 4 | Zero-padded to 3 digits. |
| state_abbr | varchar | col 3 | Passed as-is. NULL if blank. |
| county_name | varchar | col 5 | Passed as-is. NULL if blank. |
| commodity_year | smallint | col 1 | Crop year (e.g. 2023). Part of the composite primary key. |
| commodity_code | smallint | col 6 | RMA commodity code. Rows with null code discarded. |
| commodity_name | varchar | col 7 | Passed as-is. NULL if blank. |
| insurance_plan_code | smallint | col 8 | RMA insurance plan code. Part of composite key. Rows with null code discarded. |
| insurance_plan_abbr | varchar | col 9 | Passed as-is. NULL if blank. |
| stage_code | varchar | col 10 | Crop stage at loss. Defaults to "00" if blank. Part of composite key. |
| cause_of_loss_code | smallint | col 11 (29-col) / col 12 (30-col) | RMA cause of loss code (e.g. 58 = Drought). In the 30-column format (2024+), RMA inserted a sub-stage column at position 10, shifting cause-of-loss and all subsequent columns by one. Part of composite key. |
| cause_of_loss_desc | varchar | col 12 / col 13 | Human-readable cause description as published by RMA. NULL if blank. |
| month_of_loss | smallint | col 13 / col 14 | 1β12; 0 = month unspecified. Part of composite key. |
| policies_earning_premium | integer | col 16 / col 17 | Integer count. NULL if blank. |
| policies_indemnified | integer | col 17 / col 18 | Integer count. NULL if blank. |
| net_planted_acres | numeric | col 18 / col 19 | Commas stripped, parsed to float. Col 19/20 (net endorsed acres) is not stored. |
| liability | numeric | col 20 / col 21 | Total liability (USD). Commas stripped. |
| total_premium | numeric | col 21 / col 22 | Total premium (USD). Cols 22β26 (producer paid premium, subsidies, EFA discount) and col 27 (net determined acres) are not stored. |
| indemnity_amount | numeric | col 28 / col 29 | Indemnity paid (USD). Commas stripped. |
| loss_ratio | numeric | col 29 / col 30 | indemnity / total_premium. Passed from source as-is. |
| ingested_at | timestamptz | server clock | Set to NOW() on each upsert. |
Composite primary key: (commodity_year, state_fips, county_fips, commodity_code, insurance_plan_code, stage_code, cause_of_loss_code, month_of_loss). History available from 1989 onward. The API endpoint aggregates across all plans/stages/months to produce per-county summaries.
Yield Volatility
Actuarial yield risk metrics derived from USDA NASS county yield history. Computed by scripts/setup/build_yield_trend.py and stored in county_crop_yield_trend.
| Destination field | Type | Source / computation |
|---|---|---|
| yield_mean | real | numpy mean of yield_value over all fitted years for (fips, crop) |
| yield_std | real | numpy std(ddof=1) of yield_value β sample standard deviation |
| yield_cv | real | (yield_std / yield_mean) Γ 100. NULL when mean = 0 |
| slope_5y | real | OLS slope from scipy.stats.linregress over the most recent 5 data years only. NULL when n_years < 5 |
| trend_divergence | real | slope_5y β slope (long-run). Positive = recent acceleration. NULL when slope_5y is NULL |
Climate & Peril Frequency
Long-run peril frequency indicators derived from PRISM daily climate data and USDM drought records. Computed by scripts/setup/build_climate_risk.py and stored in county_climate_risk.
| Destination field | Type | Source / computation |
|---|---|---|
| drought_weeks_10y | smallint | COUNT of rows in drought_observations where drought_severity β₯ 1 AND valid_date β₯ 10-year cutoff |
| drought_freq_pct | numeric(5,2) | (D1+ weeks / total weeks) Γ 100 over full USDM history |
| extreme_drought_weeks_10y | smallint | COUNT where drought_severity β₯ 3 AND within 10-year window |
| heat_stress_days_10y | smallint | COUNT of prism_observations rows where tmax_c > 35 AND within 10-year window |
| heat_stress_days_avg_annual | numeric(5,1) | pandas mean of annual heat-stress day counts over full PRISM history |
| frost_days_10y | smallint | COUNT where tmin_c < 0 AND within 10-year window |
| frost_days_avg_annual | numeric(5,1) | pandas mean of annual frost day counts over full PRISM history |
| excess_moisture_days_10y | smallint | COUNT where precip_mm > 25 AND within 10-year window |
| annual_precip_mean_mm | numeric(7,1) | pandas mean of SUM(precip_mm) per year, grouped by (fips, year), over full history |
| annual_precip_cv | numeric(6,2) | (std / mean) Γ 100 of annual precipitation totals |
| prism_years | smallint | Number of PRISM years used β applies to heat_stress_*, frost_*, excess_moisture_*, annual_precip_* |
| drought_years | smallint | Number of USDM years used β applies to drought_weeks_10y, drought_freq_pct, extreme_drought_weeks_10y |
| computed_at | date | Date this row was last computed by build_climate_risk.py |
SMAP Soil Moisture Baseline
Per-county per-week-of-year historical statistics used to compute sm_z_score at query time. Computed by scripts/setup/build_smap_baseline.py and stored in smap_baseline. Analogous to ndvi_baseline.
| Destination field | Type | Source / computation |
|---|---|---|
| sm_mean | numeric(6,4) | pandas mean of sm_mean_am grouped by (fips, ISO week). NULL when low_confidence = TRUE |
| sm_std | numeric(6,4) | pandas std of sm_mean_am (default ddof=1). NULL when low_confidence = TRUE |
| obs_count | smallint | Number of valid am-pass observations for this (fips, week) pair |
| low_confidence | boolean | TRUE when obs_count < 5 β prevents spurious z-scores from thin history |
| sm_z_score (at query time) | float | (sm_mean_am β smap_baseline.sm_mean) / smap_baseline.sm_std, rounded to 2 decimal places. NULL when low_confidence or sm_std = 0 |
County Risk Flags
Pre-computed renewal alert flags derived from drought state, yield trend, climate risk, and primary crop. Computed by scripts/setup/build_risk_flags.py and stored in county_risk_flags.
| Flag | Type | Threshold logic |
|---|---|---|
| drought_alert | boolean | drought_metrics.drought_regime IN ('persistent', 'deepening') AND drought_observations.drought_severity β₯ 2 (D2+) |
| trend_alert | boolean | county_crop_yield_trend.below_trend_5y β₯ 3 for the county's primary crop |
| volatility_alert | boolean | county_crop_yield_trend.yield_cv > 75th percentile of yield_cv across all counties sharing the same primary crop |
| climate_alert | boolean | county_climate_risk.heat_stress_days_10y > 75th percentile nationally |