Using wget to download data

A large amount of GOES-R Space Weather data is available on the public GOES-R website.

This page provides an example wget command for downloading one month of a particular data product, but the command could be modified to download more or less data as needed.

To download a large amount of data at once, you can use a recursive wget command. This approach is awkward to use programmatically (e.g., from a Python script that downloads only certain dates or products), but it is the best way to download a whole block of data. Here’s an example wget command that recursively downloads every file whose name ends with “nc” into a new “data.ngdc.noaa.gov” directory (with nested subdirectories matching the URL path):

wget --recursive \
     --no-clobber \
     --no-parent \
     --accept nc \
     --wait=0.2 \
     --waitretry=10 \
     -e robots=off \
     https://data.ngdc.noaa.gov/platforms/solar-space-observing-satellites/goes/goes16/l2/data/xrsf-l2-avg1m_science/2021/05/
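Once the command above has finished, it can be useful to confirm what was saved. The following is a minimal sketch, not part of the original instructions: count_nc_files is a hypothetical helper name, and the '*.nc' pattern assumes the downloaded files use the “.nc” extension.

```shell
# Hypothetical helper: count the .nc files saved under a download directory.
count_nc_files() {
    # $1 = top-level directory created by wget (defaults to data.ngdc.noaa.gov)
    find "${1:-data.ngdc.noaa.gov}" -type f -name '*.nc' 2>/dev/null | wc -l | tr -d ' '
}

# List the first few downloaded files, including their nested subdirectories.
find data.ngdc.noaa.gov -type f -name '*.nc' 2>/dev/null | sort | head -n 5

# Print the total number of downloaded files (0 if nothing has been downloaded yet).
echo "NetCDF files downloaded: $(count_nc_files)"
```

Run it from the directory where wget was started, since the “data.ngdc.noaa.gov” directory is created there.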

The --recursive flag tells wget to download the files in the specified directory and the files in its subdirectories, down to wget's default maximum depth of five levels.

--no-clobber prevents files from being overwritten if they already exist, which is used in this example to prevent files from being redownloaded unnecessarily.

--no-parent prevents wget from moving up the directory tree and downloading more than a user wants.

--accept nc means that only files ending in “nc” will be saved.

--wait=0.2 makes wget wait 0.2 seconds between downloads, which keeps the request rate below about 5 requests per second and avoids overwhelming the web server (servers often limit clients to roughly 10 requests/second).

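--waitretry=10 makes wget pause before retrying a failed download. wget uses linear backoff: it waits 1 second after the first failure on a given file, 2 seconds after the second, and so on, up to the 10-second maximum set here. This acts as a cooldown period in case the web server is throttling requests.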

-e robots=off tells wget to ignore the server's robots.txt file. This is sometimes necessary when a web server uses robots.txt to block indexing, because that configuration can inadvertently block batch downloads as well, which would otherwise force users to download files one at a time.

https://data.ngdc.noaa.gov/platforms/solar-space-observing-satellites/goes/goes16/l2/data/xrsf-l2-avg1m_science/2021/05/ is a subdirectory of the “xrsf-l2-avg1m_science” product containing data for May 2021. If you wish to download a different dataset or a different date range, you can modify the URL as needed. If you would like to download multiple datasets, then you will need to run a series of wget commands using relevant paths.
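A series of wget commands can be generated with a small shell loop. The sketch below is one possible way to do it, not an official script: build_url, DRY_RUN, and the variable names are hypothetical, the month list assumes those directories exist on the server, and any extra product directory names must be taken from the website itself.

```shell
# Sketch: run the same recursive wget command once per product and month.
BASE_URL="https://data.ngdc.noaa.gov/platforms/solar-space-observing-satellites/goes/goes16/l2/data"
PRODUCTS="xrsf-l2-avg1m_science"   # add more product directory names, space-separated
YEAR="2021"
MONTHS="04 05 06"                  # assumed to exist on the server
DRY_RUN="${DRY_RUN:-1}"            # 1 = only print the URLs, 0 = actually download

build_url() {
    # $1 = product directory, $2 = four-digit year, $3 = two-digit month
    printf '%s/%s/%s/%s/\n' "$BASE_URL" "$1" "$2" "$3"
}

for product in $PRODUCTS; do
    for month in $MONTHS; do
        url="$(build_url "$product" "$YEAR" "$month")"
        if [ "$DRY_RUN" = "1" ]; then
            # Dry run: show what would be fetched without contacting the server.
            echo "Would download: $url"
        else
            wget --recursive --no-clobber --no-parent --accept nc \
                 --wait=0.2 --waitretry=10 -e robots=off "$url"
        fi
    done
done
```

By default the loop only prints the URLs it would fetch; setting DRY_RUN=0 in the environment runs the real downloads. Because --no-clobber is kept, rerunning the loop should skip files that are already present.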