HRRR Archive at the University of Utah
Frequently Asked Questions
About the HRRR archive
What is this HRRR archive?
This HRRR archive is a collection of output from NCEP's High Resolution Rapid Refresh model. The model was developed by NOAA ESRL and is run operationally every hour at NCEP's Environmental Modeling Center. It continues to be developed by ESRL.
The operational HRRR generates hourly forecasts gridded at 3 km for 18 hours over the contiguous United States, making it the highest spatial and temporal resolution forecast system run by NCEP. Details on the variables produced by HRRR (page still being worked on) are available here: HRRR Variables
HRRR analyses and forecasts are exceptionally valuable to the research community. However, an official HRRR archive does not exist. We began archiving HRRR data in April 2015 to support research efforts at the University of Utah. Instead of downloading all available files, we only download the files most useful for our research. We realize this data is valuable to many others and have made the archive publicly accessible for research purposes. As of 11 August 2017, the archive contains 39 terabytes of data.
Additional details have been published and can be found in Computers and Geosciences.
What files are contained in the HRRR archive?
Output files, in GRIB2 format, contained in the archive include:
Operational HRRR (hrrr):
- sfc: Surface fields for analyses (beginning April 18, 2015) and forecast hours (beginning July 15, 2016)
  - File format: hrrr.t[00-23]z.wrfsfcf[00-18].grib2
  - File size: ~120 MB
- prs: Pressure fields for analyses (beginning April 18, 2015)
  - File format: hrrr.t[00-23]z.wrfprsf00.grib2
  - File size: ~380 MB
- subh: Sub-hourly files for analyses (beginning May 11, 2017)
  - File format: hrrr.t[00-23]z.wrfsubhf00.grib2
  - File size: ~120 MB
- bufr: Vertical profiles available for KSLC, KODG, and KPVU
  - File format: [KSLC,KODG,KPVU]_[YYYYMMDDHH].buf
  - File size: ~75 KB
Experimental HRRR (hrrrX):
- sfc: Surface fields for analyses (beginning December 1, 2016)
  - File format: hrrr.t[00-23]z.wrfsfcf00.grib2
  - File size: ~120 MB
Note: Not all hours are available for the experimental runs.
HRRR Alaska (hrrrAK):
- sfc: Surface fields for analyses and forecast hours (beginning December 1, 2016)
  - File format: hrrr.t[00,03,06,09,12,15,18,21]z.wrfsfcf[00-36].grib2
  - File size: ~6 MB, or ~100 MB if the full file was downloaded
  - Only selected variables were retained in the early part of the archive. Full files have been downloaded since summer 2017.
- prs: Pressure fields for analyses (beginning December 1, 2016)
  - File format: hrrr.t[00,03,06,09,12,15,18,21]z.wrfprsf00.grib2
  - File size: ~205 MB
Note: Not all hours are available for the experimental runs.
Note: Some days and hours in our archive may not be available. Either the forecast wasn't run that hour (typical for the experimental models), or our download scripts failed to download everything.
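If you script against the operational file names listed above, it can help to build and check them programmatically. The following is a minimal sketch with hypothetical helper names (`hrrr_filename`, `parse_hrrr_filename`); it encodes only the operational sfc/prs/subh patterns shown here and assumes the current f00-f18 sfc range (sfc forecasts before HRRRv2 only went to f15).

```python
import re

# Hypothetical table: highest forecast hour archived per operational file type,
# taken from the file list above (sfc f00-f18, prs and subh analyses only).
MAX_FXX = {"sfc": 18, "prs": 0, "subh": 0}

FILENAME_RE = re.compile(r"hrrr\.t(\d{2})z\.wrf(sfc|prs|subh)f(\d{2})\.grib2$")


def hrrr_filename(run_hour: int, field: str, fxx: int = 0) -> str:
    """Build e.g. 'hrrr.t06z.wrfsfcf12.grib2' for run_hour=6, field='sfc', fxx=12."""
    if not 0 <= run_hour <= 23:
        raise ValueError("run_hour must be 0-23")
    if field not in MAX_FXX:
        raise ValueError(f"unknown file type: {field!r}")
    if not 0 <= fxx <= MAX_FXX[field]:
        raise ValueError(f"{field} files are archived only for f00-f{MAX_FXX[field]:02d}")
    return f"hrrr.t{run_hour:02d}z.wrf{field}f{fxx:02d}.grib2"


def parse_hrrr_filename(name: str) -> tuple:
    """Inverse of hrrr_filename: return (run_hour, file type, forecast hour)."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"not an operational HRRR GRIB2 file name: {name!r}")
    return int(match.group(1)), match.group(2), int(match.group(3))
```

For example, `hrrr_filename(6, "sfc", 12)` gives `hrrr.t06z.wrfsfcf12.grib2`, while asking for a prs forecast hour raises an error, since only prs analyses are kept.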
What are GRIB2 files?
GRIB2, or Gridded Binary Version 2, is a standard file format used by meteorologists for model data sets. There are several useful tools for working with the data.
- wgrib2: a command line utility used to read GRIB2 files.
- pygrib: a Python module used to read GRIB2 files.
- NOAA Toolkit: graphical software that can read and visualize GRIB2 files. I highly recommend this tool if you haven't used GRIB2 files before.
Where do the HRRR output files come from?
The operational HRRR (hrrr) is downloaded via HTTP from the NOMADS server.
Experimental HRRR (hrrrX) and HRRR Alaska (hrrrAK) are downloaded via FTP from NOAA ESRL (credentials required).
What version of HRRR is in this archive?
HRRRv1 was the operational model prior to August 23, 2016. We downloaded the operational HRRR from here: http://nomads.ncep.noaa.gov/pub/data/nccf/nonoperational/com/hrrr/prod/. Although the URL says "nonoperational", it was the operational HRRR. The files just hadn't been moved to the operational space.
HRRRv2 was implemented at NCEP on August 23, 2016, beginning with the 12z run. That day I began downloading from the new URL: http://nomads.ncep.noaa.gov/pub/data/nccf/com/hrrr/prod/.
For more details on the HRRRv2 implementation, check out the following announcement: http://www.nws.noaa.gov/om/notification/tin16-26rap_hrrrraaa.htm
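The version history above implies a simple rule for which NOMADS directory (and model version) a given operational run came from. This is a sketch under that reading only: the function name is hypothetical, it returns just the base directories quoted above (the layout beneath them is not described here), and it ignores HRRRv3 because that version was still experimental at the time of writing.

```python
from datetime import datetime

# Cutover quoted above: HRRRv2 began with the 12z run on 23 August 2016.
HRRRV2_START = datetime(2016, 8, 23, 12)

V1_URL = "http://nomads.ncep.noaa.gov/pub/data/nccf/nonoperational/com/hrrr/prod/"
V2_URL = "http://nomads.ncep.noaa.gov/pub/data/nccf/com/hrrr/prod/"


def hrrr_source(run_time: datetime) -> tuple:
    """Return (model version, NOMADS base URL) the archive used for a run time."""
    if run_time < HRRRV2_START:
        return "HRRRv1", V1_URL
    return "HRRRv2", V2_URL
```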
HRRRv3 is the current experimental version under development and testing at ESRL. It is expected to become the operational model in February 2018 (source). Version 3 will extend the forecast period for the 00z, 06z, 12z, and 18z runs out to 36 hours. Version 3 has improved Thompson microphysics, an improved MYNN PBL scheme, a land surface model update with 15-second MODIS data, and refined roughness lengths for certain land use types, with additional improvements in assimilation methods, including new datasets such as lightning and radar radial velocity.
When do we download the HRRR data?
HRRR is downloaded from NOMADS once a day at 6:35 PM Mountain Time.
HRRR-Alaska is downloaded from ESRL twice a day at 12:35 PM and 6:35 PM Mountain Time.
HRRR-X is downloaded from ESRL twice a day at 12:05 PM and 6:05 PM Mountain Time.
Since we retrieve HRRR data only once or twice a day (a process that may take several hours), this HRRR archive is not meant to be used for real-time or operational products.
Vision for the Future
At the 2017 Annual AMS meeting I talked with a guy on the AMS Board on Data Stewardship. He said the lack of a HRRR archive was one concern brought up at their board meeting.
Archiving high resolution model data is an expensive task. This is a growing issue in the atmospheric science community as we move to higher resolution models that generate terabytes of data annually.
Perhaps the solution is in cloud computing, where we can "bring the data to the computing." An archive in the cloud can be kept in one place, and researchers can use cloud resources to process the data, perform analyses, and initialize WRF simulations without downloading the model output to their own computers. It would be especially beneficial if the large dataset could be mined efficiently for a specific variable at a point or subgrid over a range of dates or times. Perhaps this model data needs to be stored in HDF5 format rather than GRIB2. It would also be nice if there were fewer redundant data fields between the files. For instance, some fields are available in the prs, sfc, and nat files. The benefits and complications of cloud computing were discussed at the 2017 Modeling Research in the Cloud Workshop.
Until those challenges are overcome, I would like to make this HRRR archive easily available to everyone to demonstrate the interest and need for having such an archive. This archive has already proven beneficial to many researchers and has served many applications.
In the future, server-side data processing could generate time series or wind roses for point locations in the HRRR model. My current methods for doing this are extremely inefficient because they still require downloading a temporary file containing a single variable over the entire grid (about 1.9 million grid points over the contiguous United States).
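The point-extraction step itself is simple once a field is decoded: find the grid point nearest the station, then read that same index from every hour's field. This sketch assumes the latitude, longitude, and value grids have already been read from GRIB2 into nested lists (for example with pygrib, not shown), and the function names are hypothetical. Because the HRRR grid is fixed, the nearest index only has to be found once. A pure-Python loop over ~1.9 million points is slow, so a vectorized version (e.g. NumPy) would be the practical choice.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two points given in degrees.

    Only sin^2 of half the longitude difference is used, so longitudes in
    -180..180 or 0..360 give the same result.
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def nearest_index(lats, lons, lat0, lon0):
    """(row, col) of the grid point closest to (lat0, lon0); lats/lons are 2-D nested lists."""
    best, best_d = None, float("inf")
    for i, (lat_row, lon_row) in enumerate(zip(lats, lons)):
        for j, (lat, lon) in enumerate(zip(lat_row, lon_row)):
            d = haversine_km(lat, lon, lat0, lon0)
            if d < best_d:
                best, best_d = (i, j), d
    return best


def point_time_series(fields, index):
    """Read one value per (valid_time, 2-D field) pair at a fixed (row, col)."""
    i, j = index
    return [(valid_time, field[i][j]) for valid_time, field in fields]
```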
In short, this project is turning into a classic case of one of my favorite children's books If You Give A Mouse A Cookie where starting one thing has grown and continues to grow into something bigger, doing things I hadn't thought I'd need before.
Who archives the HRRR data?
The HRRR archive at the University of Utah is managed by Brian Blaylock and the MesoWest group on resources at Utah's Center for High Performance Computing. Please contact Brian with questions regarding the archive.
Why do we archive the HRRR?
While you can find current HRRR output at NOMADS, to our knowledge there is no official HRRR archive that is publicly and easily available. This "gap" in the NOAA data archives will hopefully be filled in the future, given NOAA's vision that all NOAA environmental data shall be Discoverable, Accessible, Useable, and Preserved. We hope to be part of the solution to that goal.
This HRRR archive has been created to support various research endeavours in the Department of Atmospheric Sciences at the University of Utah. Things we do include:
- Initialize the Weather Research and Forecast (WRF) model with HRRR analyses as initial and boundary conditions. More info here.
- Model verification, where HRRR analyses and forecasts are compared to observed conditions.
- Retrospective analysis of high-impact weather events.
- Basic statistics of variables (max, min, mean, percentiles).
You are welcome to use the archive for your own research, but as a courtesy, please register and read the Best Practices before downloading from the archive (click buttons at top).
Where is the HRRR archive?
The archive is physically located at the University of Utah Downtown Data Center in Salt Lake City, Utah. It is hosted on Utah's Center for High Performance Computing Pando archive storage system.
This object storage system is similar to Amazon's S3 storage, but less expensive. We are currently using over 40 TB of our 60 TB allocation. At the current storage rate (~150 GB/day), I expect the Pando archive will be full around mid-December 2017. We will buy more storage if a proposal is funded; otherwise, we'll have to remove the HRRR forecast or sub-hourly grids if we want to continue saving HRRR analyses.
But you probably aren't interested in its physical location. You want to know how to download from the archive. Lucky for you, the data is publicly available to those outside the University of Utah. We only ask that you fill out the registration form before downloading. The registration form helps us keep track of the number of people who find this data useful, which helps us justify making the archive available to you. Archive Registration
After registering, you will be redirected to the interactive download page. Click the "Scripting Tips" button for instructions to download from Pando with wget or cURL. These tips also show how you can target specific variables you are interested in without downloading the entire file.
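The mid-December estimate follows from simple arithmetic: 20 TB of headroom at ~150 GB/day lasts roughly 133 days, a little over four months. The sketch below (hypothetical function name) assumes decimal units (1 TB = 1000 GB), a constant ingest rate, and no deletions; since "over 40 TB" is already used, the true figure is somewhat shorter.

```python
def days_until_full(used_tb: float, capacity_tb: float, rate_gb_per_day: float) -> float:
    """Days until an allocation fills at a constant ingest rate.

    Assumes decimal units (1 TB = 1000 GB) and no deletions or compression.
    """
    if rate_gb_per_day <= 0:
        raise ValueError("rate must be positive")
    remaining_gb = (capacity_tb - used_tb) * 1000
    return remaining_gb / rate_gb_per_day


# Figures quoted above: over 40 TB used of a 60 TB allocation, growing ~150 GB/day.
print(round(days_until_full(40, 60, 150)))  # 133
```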
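The Scripting Tips page is the authority for how this works on Pando. As a general illustration of the idea, a GRIB2 file is a sequence of self-contained messages, so if you know where each message starts you can request only the bytes for one variable. This sketch assumes a wgrib2-style short inventory (`.idx`) file listing `record:byte offset:d=date:VAR:level:forecast:` is available for the file; that availability, and the helper names, are assumptions here.

```python
def parse_inventory(text):
    """Parse wgrib2-style short inventory lines: 'n:offset:d=YYYYMMDDHH:VAR:level:fcst:'."""
    records = []
    for line in text.strip().splitlines():
        parts = line.split(":")
        records.append({"offset": int(parts[1]), "var": parts[3],
                        "level": parts[4], "fcst": parts[5]})
    return records


def byte_ranges(records, var, level):
    """Inclusive 'start-end' byte ranges for the messages matching var and level.

    A message ends one byte before the next record with a larger offset. The last
    message in the file gets an open-ended range ('start-'), read to end of file.
    """
    ranges = []
    for k, rec in enumerate(records):
        if rec["var"] != var or rec["level"] != level:
            continue
        # Sub-fields of one message (e.g. records '3.1' and '3.2') share an offset,
        # so look ahead for the first record that starts at a strictly larger offset.
        end = next((r["offset"] - 1 for r in records[k + 1:]
                    if r["offset"] > rec["offset"]), None)
        ranges.append(f"{rec['offset']}-" if end is None else f"{rec['offset']}-{end}")
    return ranges
```

A range such as `1000-2499` could then be passed to cURL as `curl --range 1000-2499 -o TMP_2m.grib2 <file URL>` (or sent as an HTTP `Range: bytes=1000-2499` header). The downloaded bytes form a valid single-message GRIB2 file.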
Ok, so what is the Pando archive?
Pando is a colony of quaking aspen trees in southern Utah, linked together by a single root system and thought to be the oldest and largest living organism in the world. According to Wikipedia, Pando is Latin for "I spread". The name is fitting for the CHPC object storage system because of its resilience and scalability, like the tree system.
The underlying software running Pando is Ceph, a project supported by Red Hat. While Ceph can be configured in a few ways, in this archive Ceph manages the data objects with the Amazon S3 API. It is a separate "island" from the rest of the CHPC file system, so output from other programs cannot be written directly to it. Instead, files are copied to Pando through utilities like rclone or s3cmd.
Pando is built in three parts. All of these can be scaled to meet needs of the growing archive.
- 9 OSD servers (sixteen 8 TB drives each) - These contain the data objects.
- 3 Monitors - Monitors keep a map of the data objects. When a request for data is made, these monitors are contacted for the object map and return the object's ID.
- 1 Rados Gateway node - You, as the client, make requests through this gateway for data downloads.
Adapted from Sam Liston
What days are available to download?
The HRRR archive began on 18 April 2015, and we have continued to download since, with a few hiccups here and there.
In the beginning, we downloaded operational HRRR analyses (i.e., forecast hour 0) for the surface and pressure files.
Beginning 27 July 2016, we store all the operational forecasts for the surface fields. Unfortunately, we decided the pressure field files (~315 MB) are too large for us to save the forecast hours.
Experimental versions of the model were later added to the archive.
- 2015-04-18: The HRRR archive is born! Only HRRR analysis hours (f00) were stored for sfc and prs files.
- 2016-07-27: Began storing HRRR 15 hour sfc forecasts.
- 2016-08-24: Began storing HRRR 18 hour sfc forecasts (HRRRv2 became operational).
- 2016-12-01: Began storing experimental HRRR-Alaska prs analyses and select sfc variables for analyses and forecasts.
- 2016-12-01: Began storing available experimental HRRR surface analyses.
- 2017-03-01: Moved HRRR archive from local file system to Pando storage.
- 2017-05-10: Began storing HRRR subhourly analyses and forecasts.
- 2017-06-08: Began storing all available HRRR-Alaska sfc variables.
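To get a rough idea of which products to expect on a given day, the timeline can be encoded as a lookup table. This is a minimal sketch: the entries come straight from the list above, the function name is hypothetical, and because some days and hours are missing (see the note on failed downloads), the result describes what should exist, not what does.

```python
from datetime import date

# (start date, model, file type, what is stored), transcribed from the timeline above.
TIMELINE = [
    (date(2015, 4, 18), "hrrr", "sfc", "f00"),
    (date(2015, 4, 18), "hrrr", "prs", "f00"),
    (date(2016, 7, 27), "hrrr", "sfc", "f00-f15"),
    (date(2016, 8, 24), "hrrr", "sfc", "f00-f18"),
    (date(2016, 12, 1), "hrrrAK", "prs", "analyses"),
    (date(2016, 12, 1), "hrrrAK", "sfc", "select variables"),
    (date(2016, 12, 1), "hrrrX", "sfc", "analyses (when available)"),
    (date(2017, 5, 10), "hrrr", "subh", "analyses and forecasts"),
    (date(2017, 6, 8), "hrrrAK", "sfc", "all variables"),
]


def expected_products(day):
    """Map (model, file type) to the latest timeline entry on or before `day`."""
    products = {}
    for start, model, ftype, detail in sorted(TIMELINE):
        if start <= day:
            products[(model, ftype)] = detail  # later entries overwrite earlier ones
    return products
```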
Tip: Browse the interactive web download page for a visual sense of what data files are available each day.
Note: HRRRx forecasts run with the eclipse algorithm are available on Pando for all forecast hours between August 18 and 22. Enjoy!
Who uses the HRRR archive?
Many people are interested in the HRRR archive, including yourself (obviously, or else you wouldn't have Googled HRRR archive).
The initial purpose of this HRRR archive was to serve atmospheric science research at the University of Utah, particularly my own research for my Master's and PhD degrees. For example, I used HRRR analyses to initialize WRF simulations.
The archive is searched for quite often by people like you. Since this archive was published online, I have received inquiries and download requests from National Weather Service employees, researchers at the National Institute of Standards and Technology, a United States Air Force captain, employees at NVIDIA, Lockheed Martin, and commercial wind power companies, university professors, graduate students, airline forecasters, postdocs, a bunch of students working on a capstone project, and many others.
For many researchers, having a HRRR archive allows them to use past weather data at high temporal and spatial resolution without the need to run their own WRF simulations. This is a huge time saver.
How is the HRRR archive useful to you? Send me an email and let me know.
I use the HRRR analyses to initialize WRF's boundary and initial conditions. Check out my instructions for initializing WRF with the HRRR here: http://home.chpc.utah.edu/~u0553130/Brian_Blaylock/hrrr.html
2-day Time Series: Time series graphs of observed values for select MesoWest stations and HRRR values for f00, f06, f12, and f18 for the last two days. Check out the HRRR Point Forecasts page, and select the clock next to a station to view verification of the HRRR model for the last two days.
"Hovmoller" Diagram: Imagine a Hovmoller diagram with forecast hour on the y-axis and valid time on the x-axis. These are created for a variable over each HRRR forecast hour and compared with the observed value. On the HRRR Point Forecasts page, select the clock next to a station and click "Hovmoller". GitHub Code
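The bookkeeping behind such a diagram is that each cell (forecast hour f, valid time V) comes from the run initialized at V minus f hours. The sketch below (hypothetical function name, not the code linked above) lists which operational sfc file each cell would be read from. It returns the run date separately because the directory layout on Pando is not described here.

```python
from datetime import datetime, timedelta


def hovmoller_files(valid_times, forecast_hours):
    """Rows = forecast hours (y-axis), columns = valid times (x-axis).

    Each cell is (run date 'YYYYMMDD', file name) for the run that verifies at
    that valid time: run_time = valid_time - forecast_hour.
    """
    rows = []
    for fxx in forecast_hours:
        row = []
        for valid in valid_times:
            run = valid - timedelta(hours=fxx)
            row.append((f"{run:%Y%m%d}", f"hrrr.t{run:%H}z.wrfsfcf{fxx:02d}.grib2"))
        rows.append(row)
    return rows


valid = [datetime(2017, 3, 14, h) for h in range(3)]
grid = hovmoller_files(valid, [0, 6, 12, 18])
print(grid[1][0])  # ('20170313', 'hrrr.t18z.wrfsfcf06.grib2')
```

Reading across a row shows one forecast lead time verifying at successive hours, and reading up a column shows successively older runs verifying at the same moment.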
I created "climatologies" (composites) of wind speed, temperature, and other variables for every hour of the day over an almost two-year period. These may be used for MesoWest range checks to flag suspect or bad observations.
You can view all my climatological plots in the gallery here. I managed to get the computation time to find the max, min, and mean values for the ~2 years of data down to 20 minutes using 30 processors (much better than 5 hours on a single processor). Brute-force calculation of percentiles takes quite a bit more time and memory. The 1st, 5th, 10th, 90th, 95th, and 99th percentiles were computed for each hour of every day in the two years, and those took about one hour (download time from Pando plus calculation) per variable.
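Why percentiles cost more than max, min, and mean can be shown in a few lines. Min, max, and mean can be accumulated in one pass with memory proportional only to the number of grid points, whereas a brute-force percentile needs every sample for each point at once. This single-process sketch is not the 30-processor code described above. It assumes each hour's field is already a flat list of values, and the range-check function with its 1st/99th percentile thresholds is a hypothetical example.

```python
import statistics


def running_stats(grids):
    """One pass over flat grids (one list per hour): per-point min, max, and mean.

    Memory stays proportional to the number of grid points, not the number of hours.
    """
    count, mins, maxs, sums = 0, None, None, None
    for grid in grids:
        if sums is None:
            mins, maxs, sums = list(grid), list(grid), [0.0] * len(grid)
        for k, v in enumerate(grid):
            mins[k] = min(mins[k], v)
            maxs[k] = max(maxs[k], v)
            sums[k] += v
        count += 1
    if count == 0:
        raise ValueError("no grids supplied")
    return mins, maxs, [s / count for s in sums]


def brute_force_percentiles(grids, pcts=(1, 5, 10, 90, 95, 99)):
    """Hold every sample for every point, then compute percentiles per point."""
    out = []
    for values in zip(*grids):  # one tuple of samples per grid point
        # 99 cut points; cuts[p - 1] is the p-th percentile (needs >= 2 samples).
        cuts = statistics.quantiles(values, n=100, method="inclusive")
        out.append({p: cuts[p - 1] for p in pcts})
    return out


def is_suspect(obs, point_pcts, low=1, high=99):
    """Hypothetical range check: flag an observation outside the [low, high] percentile band."""
    return not (point_pcts[low] <= obs <= point_pcts[high])
```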
The HRRR analyses mean winds are comparable with the results of James et al. 2017.
Investigated a gravity wave in St. Louis and thunderstorms in Atlanta.
Quantified inversion strength during the Utah Fine Particulate Air Quality Study, January 2017.
New England Snow Storm, March 13-14 2017
Great Salt Lake surface temperature (before HRRR assimilation was fixed)
Tips for CHPC Users
Do you work/study at the University of Utah, too? Cool! Go Utes!! Here are a few helpful tips that will make your life easier. That is, if your life involves using HRRR data from our handy archive. Please share with me cool stuff you learn!
Is the archive on the local file system?
Yes, we keep the most recent month (maybe a bit more) on the horel-group/archive file system. Just modify the date in the example below to get to that directory.
Then navigate to the HRRR directory you are interested in:
However, be aware that as disk usage on horel-group grows, files are compressed and removed. In the future, HRRR model output won't be available in the home directory.
Want a behind-the-scenes tour of the HRRR archive?
The process we use to download HRRR and move it to the Pando archive system is documented on GitHub. Horel-S3
How do I get the HRRR data if what I'm looking for is compressed?
If you find that what you're looking for is compressed, the following instructions will help get what you need:
Copy the models.tar.gz file to your own space. Yes, this is large (~20 GB). It contains all the model data we archive (I warned you that this was inefficient).
Before untarring the entire directory, you can check if it contains the files you need with something like this:
less -p /hrrr/hrrr.t models.tar.gz
You can untar the entire file, but that would take a looooonng time, and you don't need everything in that file anyways.
You can extract just the HRRR data with this one-liner (still takes a bit of time):
tar -xzvf models.tar.gz 20160101/models/hrrr/
Or you can get more creative and use the following to get just the pressure fields. (In this case it helps to know the contents of the tarred file.) Starting in the directory you wish to copy the files into, type:
tar -zxvf models.tar.gz --wildcards --no-anchored 'hrrr.t*z.wrfprsf00.grib2'
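If you would rather do the selective extraction from a Python script, the standard library `tarfile` and `fnmatch` modules can approximate the wildcard `tar` command. This is a sketch with a hypothetical function name. It matches the pattern against each member's base name, which is close to, but not identical to, GNU tar's `--no-anchored` matching, and it uses the `"data"` extraction filter when the running Python supports it.

```python
import fnmatch
import posixpath
import tarfile


def extract_matching(archive_path, pattern, dest="."):
    """Extract only regular files whose base name matches a shell-style pattern.

    Roughly mirrors:
        tar -zxvf models.tar.gz --wildcards --no-anchored 'hrrr.t*z.wrfprsf00.grib2'
    Returns the member names that were extracted.
    """
    with tarfile.open(archive_path, "r:gz") as tar:
        members = [m for m in tar
                   if m.isfile() and fnmatch.fnmatch(posixpath.basename(m.name), pattern)]
        # Safer extraction filter where available (Python 3.12+ and recent patch releases).
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(path=dest, members=members, **kwargs)
    return [m.name for m in members]
```

For example, `extract_matching("models.tar.gz", "hrrr.t*z.wrfprsf00.grib2")` would pull out only the pressure-field analyses. Like `tar`, it still has to decompress the whole archive to find them.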
For more information and citation details, read this article:
Blaylock B., J. Horel and S. Liston, 2017: Cloud Archiving and Data Mining of High Resolution Rapid Refresh Model Output. Computers and Geosciences. Accepted. https://doi.org/10.1016/j.cageo.2017.08.005