HRRR Download Scripting Tips


If you have not already, please register as a user before downloading data. Citation details can be found at the bottom of this page.




You may write your own script to automate the download process, but PLEASE do not download an excessive number of files in a short period of time on multiple nodes (you agreed to not do this when you read the Best Practices).

HRRR GRIB2 files are large: sfc files are >100 MB and prs files are >380 MB each. If you download a day's worth of prs analyses, that's over 9 GB!


GRIB2 files are downloaded from a URL of the form

https://pando-rgw01.chpc.utah.edu/[model type]/[fields]/[YYYYMMDD]/[file name]

Metadata for each file can be viewed from the same URL, except with .idx appended to the GRIB2 file name:

https://pando-rgw01.chpc.utah.edu/[model type]/[fields]/[YYYYMMDD]/[file name].idx

The model type and variable fields available include:

  • [model type] hrrr for the operational HRRR
    • [fields] sfc
    • [fields] prs
    • [fields] subh (sparse availability, if any)
    • [fields] nat (sparse availability, if any)
  • [model type] hrrrX for the experimental HRRR
    • [fields] sfc
  • [model type] hrrrak for HRRR Alaska
    • [fields] sfc
    • [fields] prs

[YYYYMMDD] is the model run date in UTC (e.g., 20171228).

[file name] is in the format [model type].t[00-23]z.wrf[fields]f[00-18].grib2, where the two-digit number following t is the model run hour and the two-digit number following f is the model forecast hour.
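As a quick illustration, here is a minimal Python sketch (the variable names are mine, not the archive's) that builds a file URL from these pieces:

from datetime import datetime

model = 'hrrr'                  # [model type]
fields = 'sfc'                  # [fields]
run = datetime(2018, 1, 1, 0)   # run date and hour (UTC)
fxx = 0                         # forecast hour

file_name = f'{model}.t{run:%H}z.wrf{fields}f{fxx:02d}.grib2'
url = f'https://pando-rgw01.chpc.utah.edu/{model}/{fields}/{run:%Y%m%d}/{file_name}'
print(url)
# -> https://pando-rgw01.chpc.utah.edu/hrrr/sfc/20180101/hrrr.t00z.wrfsfcf00.grib2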

Model Name          Model Type   Model Cycle                            Archived Forecasts
Operational HRRR    hrrr         Hourly (00-23 UTC)                     sfc: f00-f18 or f00-f36; prs: f00
Experimental HRRR   hrrrX        Hourly (00-23 UTC)                     sfc: f00
HRRR Alaska         hrrrak       Every 3 hours (00, 03, ..., 21 UTC)    sfc: f00-f18 or f00-f36

Example

https://pando-rgw01.chpc.utah.edu/hrrr/sfc/20180101/hrrr.t00z.wrfsfcf00.grib2

https://pando-rgw01.chpc.utah.edu/hrrr/sfc/20180101/hrrr.t00z.wrfsfcf00.grib2.idx

The alternative download page may help you better understand the URL structure.

cURL download full file

curl -O https://pando-rgw01.chpc.utah.edu/hrrr/sfc/20180101/hrrr.t00z.wrfsfcf00.grib2


cURL download full file and rename

curl -o hrrr20180101_00zf00.grib2 https://pando-rgw01.chpc.utah.edu/hrrr/sfc/20180101/hrrr.t00z.wrfsfcf00.grib2


wget download full file

wget https://pando-rgw01.chpc.utah.edu/hrrr/sfc/20180101/hrrr.t00z.wrfsfcf00.grib2


cURL download single variable

GRIB files are gridded binary files made of "messages" or "fields" stacked on top of each other. Each field contains the data for one variable at a specific level across the model domain. It is possible to download just a portion of the full GRIB2 file, and the result is itself a valid GRIB2 file.

If you know the byte range of the variable you want (found in the .idx file), you can retrieve that single variable. The .idx files share the same URL as the GRIB2 file, except with .idx appended to the end. For example, from the .idx file for the F00 forecast valid at 0000 UTC 1 January 2018, we see that the byte range for TMP:2 m starts at 34884036 and ends at 36136433.

curl -o 20180101_00zf00_2mTemp.grib2 --range 34884036-36136433 https://pando-rgw01.chpc.utah.edu/hrrr/sfc/20180101/hrrr.t00z.wrfsfcf00.grib2

After inspecting the downloaded file, you will see cURL has downloaded a valid GRIB2 file with only the 2 meter temperature variable.

You could repeat these steps with different byte ranges to retrieve other variables and append the output to a single file:

curl --range ######-###### https://pando-rgw01.chpc.utah.edu/hrrr/sfc/20180101/hrrr.t00z.wrfsfcf00.grib2 >> outFile.grib2


Precipitation Variables

Most precipitation fields in the model analyses (F00) are zero. This is because precipitation is not an instantaneous variable like temperature or humidity; precipitation rate and accumulation are measured over a period. When you download precipitation fields like PRATE or APCP, use F01 or greater, because F00 is all zeros.

cURL download several variables

Unfortunately, curl --range won't work if you request more than one range. I don't know why, but it must be a limitation of the Pando archive. Fortunately, similar variables are usually grouped together, like the U and V wind components, so you can request a single range that spans all the variables you want. This example gets TMP, POT, SPFH, DPT, RH, UGRD, VGRD, and WIND (the 2 m fields plus the 10 m winds).

curl -o 20180101_00zf00_2mTemp2mDPT10mwind.grib2 --range 34884036-44863087 https://pando-rgw01.chpc.utah.edu/hrrr/sfc/20180101/hrrr.t00z.wrfsfcf00.grib2


Details on different methods to download HRRR data with Python are on GitHub.

Python HRRR Download Examples

I use Anaconda with Python 3. You may use my environment.yml file; instructions for creating the conda environment are on GitHub. If you are unfamiliar with conda environments, the README explains how to create my environment from the environment.yml file.

Below is a summary of the examples.


How to download HRRR GRIB2 files

If you just want to download a few files, the easiest way is with the web download interface which makes downloading files as easy as clicking a button.

Most users, however, need to download many HRRR files, and you will need to write a script for that. The simplest way to download a file from Pando is with urllib.request. Writing a script is as simple as changing the URL to get the files you want.

import urllib.request
download_this = 'https://pando-rgw01.chpc.utah.edu/hrrr/sfc/20200624/hrrr.t01z.wrfsfcf17.grib2'
save_as = 'my_file.grib2'
urllib.request.urlretrieve(download_this, save_as)

I made this Jupyter Notebook for you. It demonstrates how to use my download_HRRR function to download many files from the Pando (or NOMADS) HRRR archive. All you need to give the function is a list of datetimes and forecast hours.

Jupyter Notebook: Part 1
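If you would rather roll your own loop, here is a rough sketch (this is not the notebook's download_HRRR function) that grabs the f00 analysis from the first six runs of 1 January 2018:

from datetime import datetime, timedelta
import urllib.request

runs = [datetime(2018, 1, 1) + timedelta(hours=h) for h in range(6)]
forecast_hours = [0]

for run in runs:
    for fxx in forecast_hours:
        url = (f'https://pando-rgw01.chpc.utah.edu/hrrr/sfc/'
               f'{run:%Y%m%d}/hrrr.t{run:%H}z.wrfsfcf{fxx:02d}.grib2')
        save_as = f'hrrr.{run:%Y%m%d}.t{run:%H}z.wrfsfcf{fxx:02d}.grib2'
        urllib.request.urlretrieve(url, save_as)   # download and save the file
        print(f'Downloaded {save_as}')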

How to download select variables from a HRRR GRIB2 file

Full HRRR files are very large (~150 MB), and that adds up quickly if you need many days and forecast hours. Often you only need some of the data in the file. Remember that a GRIB2 file is a stack of binary messages concatenated together, so it is possible to download only the parts of the file containing the specific variables or "fields" you want.

You will save a lot of disk space and improve download time if you download just the variables you need. A single HRRR variable grid is about 1 MB.

The challenge in downloading parts of the full GRIB2 file is finding the byte range for the variable you want. The beginning byte of each variable is given in the index, or .idx, file.

Partial downloads with cURL require a known byte range. The .grib2.idx files (sfc example, prs example) are plain-text metadata files that give the beginning byte of each field in the file; each GRIB2 file has its own index file. To find the byte range for a variable, the download function provided in the notebooks below searches the index for the line that contains the specified variable abbreviation and level.
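Here is a rough Python sketch of that idea. find_byte_range is a hypothetical helper, not the notebook's function; it assumes the colon-separated .idx layout (message number, starting byte, date, variable, level, forecast time), and the variable and level strings must match what you see in the .idx file.

import urllib.request

def find_byte_range(idx_url, variable, level):
    """Return the (start, end) byte range of one field listed in a .idx file.
    end is None for the last field in the file (download to the end)."""
    lines = urllib.request.urlopen(idx_url).read().decode().splitlines()
    for i, line in enumerate(lines):
        parts = line.split(':')
        if parts[3] == variable and parts[4] == level:
            start = int(parts[1])
            end = int(lines[i + 1].split(':')[1]) - 1 if i + 1 < len(lines) else None
            return start, end
    raise ValueError(f'{variable} at {level} not found in {idx_url}')

url = 'https://pando-rgw01.chpc.utah.edu/hrrr/sfc/20180101/hrrr.t00z.wrfsfcf00.grib2'
start, end = find_byte_range(url + '.idx', 'TMP', '2 m above ground')

# Request only that byte range (same idea as curl --range)
byte_range = f'bytes={start}-' if end is None else f'bytes={start}-{end}'
req = urllib.request.Request(url, headers={'Range': byte_range})
with open('hrrr_2mTemp.grib2', 'wb') as f:
    f.write(urllib.request.urlopen(req).read())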

My Jupyter Notebook Part 3 shows how I do this and provides a function to download a set of variables from a HRRR GRIB2 file. Part 3 combines the functions for downloading many files and for downloading subsets.

Jupyter Notebook: Part 2 Jupyter Notebook: Part 3

How to read GRIB2 data with Python

Once you have a GRIB2 file, there are two key packages for reading GRIB2 data. Both can be installed via conda-forge.

  1. pygrib is what I started to learn and still use sometimes. | Video Demo | Documentation
  2. cfgrib works well for reading GRIB2 data as xarray datasets. Make sure you have the latest version (>0.9.8). | Documentation

Getting started with cfgrib:

import cfgrib

a = cfgrib.open_datasets('hrrr.t00z.wrfsfcf08.grib2')

The loaded data is a list of xarray Dataset objects. Each item in the list is a dataset for a different level type, for example entire atmosphere, height above ground, isobaric levels, or surface.
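To see what ended up in each dataset, a quick inspection loop (using the list a from above) works:

# Print the variables and coordinates held by each xarray Dataset in the list
for ds in a:
    print(list(ds.data_vars), list(ds.coords))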

I go over this, and give you a useful function that downloads and reads a HRRR file, in my Jupyter Notebook #4.

Jupyter Notebook: Part 4

A quick note on multi-threading your downloads: it probably isn't worth it. The archive's bandwidth is limited and becomes a bottleneck if you try to download too much (which also makes it slow for us to put new data on the archive). I tried multiprocessing and multi-threading and found that the download speed saturates at about three threads.

List and download Pando objects with s3fs

It is possible to access the Pando S3 "hrrr" buckets directly with s3fs. Documentation. The secret sauce is to set the correct endpoint_url.

import s3fs
# Access Pando
fs = s3fs.S3FileSystem(anon=True, client_kwargs={'endpoint_url':"https://pando-rgw01.chpc.utah.edu/"})

# List objects in a path
fs.ls('hrrr/sfc/20190101/')

# Download the first file and rename it the same name (without the directory structure)
files = fs.ls('hrrr/sfc/20190101/')
fs.get(files[0], files[0].split('/')[-1])

This is a good method if you don't know the file names on Pando (for example, when browsing GOES files), but the HRRR file names are very predictable. The examples earlier on this page show how to download files with the straightforward urllib.request and how to download a subset of a file (a single-variable grid) by exploiting cURL's byte-range feature.


You can use rclone to copy files from Pando to your own disk.

Configure rclone with the following settings (this should be in your .rclone.conf file)...

[publicPando]
type = s3
endpoint = https://pando-rgw01.chpc.utah.edu

Then you can list buckets and files with

rclone lsd publicPando:hrrr/sfc           # list a directory
rclone ls publicPando:hrrr/sfc/20190101/  # list all files in a directory
And rclone can copy files from Pando to your home directory.

How the .idx files are created
wgrib2 hrrr.t09z.wrfsfcf17.grib2 -t -var -lev -ftime > hrrr.t09z.wrfsfcf17.grib2.idx
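Each line of the resulting index lists the message number, its starting byte, the reference time, the variable, the level, and the forecast time, separated by colons. An illustrative (made-up) line looks like:

1:0:d=2018010100:REFC:entire atmosphere:anl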
HRRR Projection Information
The model projection is contained in the GRIB2 file, but it's somewhat buried. It can also be found on the HRRR main website in the namelist.wps file. Here are the important pieces:

dx = 3000,
dy = 3000,
map_proj = 'lambert',
ref_lat = 38.5,
ref_lon = -97.5,
truelat1 = 38.5,
truelat2 = 38.5,
stand_lon = -97.5,
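If you want to build this projection in Python, here is a minimal cartopy sketch based on the values above. The spherical earth radius is my assumption (a common NCEP GRIB2 setting); check your file's metadata if exact geolocation matters.

import cartopy.crs as ccrs

# Lambert conformal projection from the namelist.wps values above.
# Assumption: grid defined on a sphere of radius 6371229 m (verify against your GRIB2 file).
globe = ccrs.Globe(ellipse='sphere', semimajor_axis=6371229, semiminor_axis=6371229)
hrrr_proj = ccrs.LambertConformal(
    central_longitude=-97.5,          # stand_lon
    central_latitude=38.5,            # ref_lat
    standard_parallels=(38.5, 38.5),  # truelat1, truelat2
    globe=globe,
)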

How do I get the latitude/longitude grid?

Latitude and longitude for every HRRR grid point are defined as part of each GRIB message. The values are not stored explicitly for every grid box (that would take a lot of space); instead, they are calculated by the wgrib2 utility from the stored projection information.

If you are using pygrib, you can get the variable data, latitude, and longitude like this: value, lat, lon = grbs[1].data()
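For example, a minimal pygrib sketch using the file name from the earlier examples:

import pygrib

# Open a local HRRR file and pull data plus lat/lon for the first message
# (pygrib message numbering starts at 1).
grbs = pygrib.open('hrrr.t00z.wrfsfcf00.grib2')
value, lat, lon = grbs[1].data()
print(grbs[1])   # short description of what message 1 contains
grbs.close()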

For convenience and some unique applications, I created an HDF5 file that contains just the HRRR latitude and longitude grids. HRRR_latlon.h5
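If you use that file, a short h5py snippet can show what it contains. I am not assuming the dataset names here; list the keys first and read whichever hold latitude and longitude.

import h5py

# Open the lat/lon file, list its datasets, then read the ones you need
with h5py.File('HRRR_latlon.h5', 'r') as f:
    print(list(f.keys()))        # see what the datasets are actually called
    # lat = f['latitude'][:]     # hypothetical name; use whatever keys() shows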

List of missing and incomplete files
Wind Vectors: Grid Relative vs Earth Relative

If you are dealing with a vector quantity, like wind direction, you need to convert the U and V wind components from grid-relative to earth-relative to correctly orient the wind vectors.

Convert winds Grid-relative to Earth-Relative
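For reference, one common formulation of that rotation for an NCEP Lambert conformal grid is sketched below. Treat it as an assumption, not the authoritative method, and verify the sign convention against the linked script before relying on it.

import numpy as np

def grid_to_earth_winds(u_grid, v_grid, lon):
    """Rotate grid-relative HRRR winds to earth-relative.
    Assumes the HRRR Lambert conformal projection (truelat1 = 38.5, stand_lon = -97.5).
    lon is the longitude of each grid point in degrees."""
    lon = np.asarray(lon, dtype=float)
    lon = np.where(lon > 180, lon - 360, lon)       # handle 0-360 longitudes
    rotcon = np.sin(np.radians(38.5))               # sin(truelat1)
    angle = np.radians(rotcon * (lon - (-97.5)))    # rotation angle at each grid point
    u_earth = np.cos(angle) * u_grid + np.sin(angle) * v_grid
    v_earth = -np.sin(angle) * u_grid + np.cos(angle) * v_grid
    return u_earth, v_earth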