Using wget to download data
A large amount of GOES-R Space Weather data is available on the public GOES-R website.
This page provides an example wget command for downloading one month of a particular data product, but the command could be modified to download more or less data as needed.
To download a large amount of data at once, you can use a recursive wget command. This approach is awkward to use programmatically (e.g. in a Python script that downloads only certain dates or products), but it is the most direct way to download a whole block of data. Here’s an example wget command that recursively downloads every file whose name ends with “nc” into a new “data.ngdc.noaa.gov” directory (with nested subdirectories matching the URL path):
wget --recursive \
--no-clobber \
--no-parent \
--accept nc \
--wait=0.2 \
--waitretry=10 \
-e robots=off \
https://data.ngdc.noaa.gov/platforms/solar-space-observing-satellites/goes/goes16/l2/data/xrsf-l2-avg1m_science/2021/05/
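After a command like the one above finishes, one way to confirm what was saved is to count the files under the mirror directory. The sketch below assumes that the command was run from the current directory and that the data files use a ".nc" extension; count_nc_files is a hypothetical helper name, not part of wget.

```shell
#!/bin/sh
# Count regular files ending in ".nc" anywhere under the given directory.
# count_nc_files is a hypothetical helper name, not a wget feature.
count_nc_files() {
    find "$1" -type f -name '*.nc' | wc -l | tr -d ' '
}

# wget names the top-level directory after the host in the example URL.
MIRROR_DIR="data.ngdc.noaa.gov"
if [ -d "$MIRROR_DIR" ]; then
    echo "$(count_nc_files "$MIRROR_DIR") .nc files under $MIRROR_DIR"
else
    echo "$MIRROR_DIR not found; run the wget command from this directory first"
fi
```

Comparing the count before and after rerunning the command is a quick way to see whether an interrupted download has picked up more files.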
The --recursive flag tells wget to follow the links in the specified directory listing and download the files in that directory and in its subdirectories (by default, up to five levels deep).
--no-clobber prevents files from being overwritten if they already exist, which is used in this example to prevent files from being redownloaded unnecessarily.
--no-parent prevents wget from moving up the directory tree and downloading more than a user wants.
--accept nc means that only files ending in “nc” will be saved.
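--wait=0.2 makes wget wait 0.2 seconds between downloads in order to avoid overwhelming the web server. This caps the rate at roughly five requests per second, which stays under the limits that servers often impose (10 requests/second is common).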
--waitretry=10 makes wget pause between retries of a failed download, backing off linearly from 1 second after the first failure up to a maximum of 10 seconds. This acts as a cooldown period in case the web server is throttling requests.
-e robots=off tells wget to ignore the server's robots.txt file. This is sometimes necessary because a server that is configured to block indexing may inadvertently block recursive batch downloads as well, which would otherwise force users to download files individually.
https://data.ngdc.noaa.gov/platforms/solar-space-observing-satellites/goes/goes16/l2/data/xrsf-l2-avg1m_science/2021/05/ is the subdirectory of the “xrsf-l2-avg1m_science” product that contains data for May 2021. To download a different dataset or a different date range, modify the URL accordingly. To download multiple datasets, run a series of wget commands, one for each relevant path.
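Running a series of wget commands can be sketched as a small shell loop. In this minimal sketch, the base URL and options come from the example above, while the month list, the variable names, and the build_url and download_all helpers are illustrative assumptions; check the directory listing on the server before relying on any particular product or month path. The commands are only printed unless DRY_RUN is set to 0.

```shell
#!/bin/sh
# Sketch: run one recursive wget per product/month combination.
# BASE_URL and the options match the single-month example; PRODUCTS and
# MONTHS are illustrative values that should be checked against the server.
BASE_URL="https://data.ngdc.noaa.gov/platforms/solar-space-observing-satellites/goes/goes16/l2/data"
PRODUCTS="xrsf-l2-avg1m_science"   # space-separated product directory names
MONTHS="2021/04 2021/05"           # space-separated year/month subdirectories
WGET_OPTS="--recursive --no-clobber --no-parent --accept nc --wait=0.2 --waitretry=10 -e robots=off"

# Print the directory URL for one product ($1) and one year/month ($2).
build_url() {
    printf '%s/%s/%s/\n' "$BASE_URL" "$1" "$2"
}

# Loop over every product/month pair. With DRY_RUN=1 (the default here)
# each command is printed instead of being run.
download_all() {
    for product in $PRODUCTS; do
        for month in $MONTHS; do
            url=$(build_url "$product" "$month")
            if [ "${DRY_RUN:-1}" = "1" ]; then
                echo "wget $WGET_OPTS $url"
            else
                # WGET_OPTS is unquoted on purpose so it splits into options.
                wget $WGET_OPTS "$url"
            fi
        done
    done
}

download_all
```

Because --no-clobber is among the options, the loop can be rerun after an interruption and files that already exist locally are skipped.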