# A simple script to download and organize HRRR smoke data import os import sys import datetime import pytz import time import numpy as np import pandas as pd import xarray as xr from matplotlib import pyplot as plt import cartopy.crs as ccrs import cartopy.feature as cfeature import cartopy.io.img_tiles as cimgt from herbie import Herbie sys.path.append('/uufs/chpc.utah.edu/common/home/u0890904/LAIR_1/atmos') from utils import datetime_utils def get_dts_inrange(dtr,oof_path,hours,hours_tz): # Get a list of all the dates in the EM27 out of focus data hours_tz = pytz.timezone(hours_tz) dates = dtr.get_dates_in_range() em27_dts = [] inst_ids = os.listdir(oof_path) inst_ids = [inst_id for inst_id in inst_ids if not inst_id.startswith('.')] for date in dates: for inst_id in inst_ids: datestr = datetime.datetime.strftime(date,'%Y%m%d') inst_path = os.path.join(oof_path,inst_id) inst_files = os.listdir(inst_path) inst_files = [inst_file for inst_file in inst_files if not inst_file.startswith('.')] for inst_file in inst_files: if datestr in inst_file: dt = datetime.datetime.strptime(datestr, '%Y%m%d') loc_dt = hours_tz.localize(dt) for hour in hours: loc_dt = loc_dt.replace(hour=hour) utc_dt = loc_dt.astimezone(pytz.utc) em27_dts.append(utc_dt) em27_dts = sorted(list(set(em27_dts))) return em27_dts def get_xarray_for_hour(dtstr,save_dir): H = Herbie( dtstr, # Model initialization time model="hrrr", # Model name fxx=0, # Forecast step, in hours product="sfc", # Model product ) dss = H.xarray(r":MASSDEN|COLMD:",save_dir = save_dir) ds = xr.merge(dss) ds = ds.drop_vars([coord for coord in ds.coords if coord not in ['time', 'latitude', 'longitude']]) #ds = ds.rename({'unknown': 'tc_mdens'}) return ds def get_vars_nearest(ds,pointdf): nearest_ds = ds.herbie.pick_points(pointdf) dt = pd.to_datetime(nearest_ds.time.values) tc_mdens = float(nearest_ds.tc_mdens.values[0]) mdens = float(nearest_ds.mdens.values[0]) return dt, tc_mdens, mdens def create_format_df(dts,tc_mdenss,mdenss,tz = 'US/Mountain'): df = pd.DataFrame({'dt': dts, 'tc_mdens': tc_mdenss, 'mdens': mdenss}) df = df.set_index('dt') df.index = df.index.tz_localize('UTC').tz_convert(tz) df = df.sort_index() return df def main(): t1 = time.time() project_path = '/uufs/chpc.utah.edu/common/home/u0890904/LAIR_1/Projects/Ratio_paper_2025_v5' project_data_path = os.path.join(project_path,'Data') project_oof_path = os.path.join(project_path,'Data','EM27_oof') hrrr_save_dir = '/uufs/chpc.utah.edu/common/home/lin-group9/agm/HRRR/SLV_Smoke' dtr = datetime_utils.DateTimeRange('2022-05-01', '2025-02-01') hours = [7,8,9,10,11,12,13,14,15,16,17,18,19,20,21] hours_tz = 'US/Mountain' em27_dts = get_dts_inrange(dtr,project_oof_path,hours,hours_tz) pointdf = pd.DataFrame({'latitude': [40.768], 'longitude': [-111.854]}) dts = [] tc_mdenss = [] mdenss = [] for dt in em27_dts: dtstr = dt.strftime('%Y-%m-%d %H:00') print(dtstr) ds = get_xarray_for_hour(dtstr,hrrr_save_dir) dt, tc_mdens, mdens = get_vars_nearest(ds,pointdf) dts.append(dt) tc_mdenss.append(tc_mdens) mdenss.append(mdens) df = create_format_df(dts,tc_mdenss,mdenss,tz = hours_tz) df.to_parquet(os.path.join(project_data_path,'hrrr_smoke.parquet'), index=True) t2 = time.time() print(f'Time elapsed: {round(t2-t1)} seconds') if __name__ == '__main__': main()