Download SUVI L1b files

Purpose:: Download SUVI L1b files from the NOAA webserver.

First of all: if you have wget and want an easy solution outside of Python, here are a few bash one-liner examples (remove the #) that can be applied to GOES-16, GOES-17, and different wavelengths with minor changes:

Download an entire day of 171 data to the current directory (long and short exposures):

#wget -nH -nd -r -np -A *.fits.gz https://data.ngdc.noaa.gov/platforms/solar-space-observing-satellites/goes/goes16/l1b/suvi-l1b-fe171/2021/02/18/

Download only the 171 data between 1 and 2 pm that day to the current directory:

#wget -nH -nd -r -np -A OR_SUVI-L1b-Fe171_G16_s2021???13*.fits.gz https://data.ngdc.noaa.gov/platforms/solar-space-observing-satellites/goes/goes16/l1b/suvi-l1b-fe171/2021/02/18/

Same as above, but for all SUVI wavelengths, downloaded into their respective subdirectories:

#for w in fe094 fe131 fe171 fe195 fe284 he304; do wget -nH -nd -r -np --directory-prefix=$w -A OR_SUVI-L1b-?e???_G16_s2021???13*.fits.gz https://data.ngdc.noaa.gov/platforms/solar-space-observing-satellites/goes/goes16/l1b/suvi-l1b-$w/2021/02/18/; done;

Now let’s use Python. Import the necessary libraries:

__author__ = "cbethge"

from bs4 import BeautifulSoup
from astropy.time import Time, TimeDelta
import requests, os
import numpy as np

Define parser for the SUVI websites using BeautifulSoup:

def list_url_directory(url, ext=''):
    page = requests.get(url).text
    soup = BeautifulSoup(page, 'html.parser')
    return [url + node.get('href') for node in soup.find_all('a') if node.get('href').endswith(ext)]

Now we define the variable date_time. The general template for is ‘YYYY-MM-DDThh:mm:ss’, but it can have the following formats:

Single date/time:
'2020-01-05T12:30:00'

Several dates/times:
'2020-01-05T12:30:00, 2020-04-23T11:43:00, 2020-05-11T17:05:00'

JSOC-style with start time, timespan, and cadence (image every 20 min for 1 hour in this example):
'2020-01-05T12:30:00/1h@20m'

If a single or several explicit date_times are given, the code will only download the data closest to those timestamps. For the JSOC-style, it will download everything in the given range with the given cadence. Note that SUVI has an imaging cadence of 4 minutes, so any given cadence should be a multiple of 4 minutes. An exception is the 195 channel, where images are taken more frequently. Accepted units for the timespan and cadence are: ‘d’ (days), ‘h’ (hours), and ‘m’ (minutes).

date_time = '2020-01-05T12:30:00/1h@20m'

A few other definitions:

spacecraft  = 16         # GOES 16 or 17?
wavelengths = [171,195]  # Wavelengths. Valid values: 93, 94, 131, 171, 195, 284, 304, 305.
outdir      = './L1b'    # The download path. Subdirectories for the wavelengths will be created.
query_only  = False      # If True, then the filenames are printed only, nothing is downloaded.
verbose     = True       # If True, then print the filenames when downloading.
long_exp    = True       # Download long exposures?
short_exp   = False      # Download short exposures? Note that it will only download one of the two
                         # short exposures for 94 and 131, the one that is closer to the given time.
                         # So depending if you want the 'short exposure' or the 'short flare exposure'
                         # in those channels, it might take a bit of fiddling with the chosen start time.

Run the code:

for wavelength in wavelengths:
    # Split the date argument at the commas (if applicable)
    date_time = date_time.replace(" ","").split(',')
    if len(date_time) == 1:
        # If it is not several dates, take only the first item. That way,
        # we can distinguish between lists and strings below.
        date_time = date_time[0]

    # this should stay the same for now
    baseurl1 = 'https://data.ngdc.noaa.gov/platforms/solar-space-observing-satellites/goes/goes'
    baseurl2 = '/l1b/'
    ext = '.fits.gz'

    # check for existing output directory and correct spacecraft and wavelength numbers
    if not query_only:
        # Create the output directory if it does not exist
        try:
            os.makedirs(outdir)
        except FileExistsError:
            # directory already exists
            pass

    spacecraft_numbers = [16,17]

    if spacecraft not in spacecraft_numbers:
        raise Exception('Invalid spacecraft number: '+str(spacecraft)+'. Valid values are: 16, 17.')

    wvln_path = dict({ 93:'suvi-l1b-fe094',  94:'suvi-l1b-fe094', 131:'suvi-l1b-fe131', 171:'suvi-l1b-fe171', \
                      195:'suvi-l1b-fe195', 284:'suvi-l1b-fe284', 304:'suvi-l1b-he304', 305:'suvi-l1b-he304' })

    if wavelength not in wvln_path:
        raise Exception('Invalid wavelength: '+str(wavelength)+'. Valid values are: 93, 94, 131, 171, 195, 284, 304, 305.')

    # Figure out what kind of date_time was given.
    if isinstance(date_time, str):
        # Check if it is a JSOC-style query
        if len(date_time.split('/')) == 2:
            if len(date_time.split('@')) == 2:
                cadence_string = date_time.split('@')[1]
                timespan_string = date_time.split('@')[0].split('/')[1]
                cadence = float(cadence_string[:-1])
                cadence_unit = cadence_string[-1]
                if cadence_unit == 'm':
                    cadence = cadence*60.
                elif cadence_unit == 'h':
                    cadence = cadence*60.*60.
                elif cadence_unit == 'd':
                    cadence = cadence*60.*60*24.
                else:
                    raise Exception('Not a valid time unit (must be m, h, or d).')
            else:
                cadence = 240.
                timespan_string = date_time.split('/')[1]

            timespan = float(timespan_string[:-1])
            timespan_unit = timespan_string[-1]
            if timespan_unit == 'm':
                timespan = timespan*60.
            elif timespan_unit == 'h':
                timespan = timespan*60.*60.
            elif timespan_unit == 'd':
                timespan = timespan*60.*60*24.
            else:
                raise Exception('Not a valid time unit (must be m, h, or d).')

            t0 = Time(date_time.split('/')[0], scale='utc', format='isot')
            tmp_timestamp = []
            counter = 0
            while counter*cadence <= timespan:
                tmp_timestamp.append(counter*cadence)
                counter += 1

            timestamp = t0+TimeDelta(tmp_timestamp, format='sec')
            urls = []
            for time in timestamp:
                urls.append(baseurl1+str(spacecraft)+baseurl2+wvln_path[wavelength]+'/'+time.value[0:10].replace('-','/')+'/')

        else:
            # Only one date, and no JSOC-style query
            timestamp = [Time(date_time, scale='utc', format='isot')]
            urls = [baseurl1+str(spacecraft)+baseurl2+wvln_path[wavelength]+'/'+date_time[0:10].replace('-','/')+'/']

    elif isinstance(date_time, list):
        # if the argument was a list of dates
        timestamp = []
        urls = []
        for this_date in date_time:
            timestamp.append(Time(this_date, scale='utc', format='isot'))
            urls.append(baseurl1+str(spacecraft)+baseurl2+wvln_path[wavelength]+'/'+this_date[0:10].replace('-','/')+'/')


    # Before we run, check if all of the websites are there.
    # Cook the urls down to unique values. To do that, convert
    # to a numpy array, use np.unique, and then convert back
    # to a list. Tried by using conversion to a set first,
    # but that doesn't keep the correct order for the dates.
    urls_arr = np.array(urls)
    urls_unique = np.unique(urls_arr).tolist()
    all_files  = []
    start_time = []
    end_time   = []
    for url in urls_unique:
        request = requests.get(url)
        if not request.status_code == 200:
            raise Exception('Website not found: '+url)
        else:
            # If all of the websites were found, go ahead and make lists of files and dates.
            print('Querying', url, 'for SUVI files...')
            for file in list_url_directory(url, ext):
                all_files.append(file)
                file_base = os.path.basename(file)
                start_time.append(url[-11:-1].replace('/','-')+'T'+file_base[30:32]+':'+file_base[32:34]+':'+\
                                      file_base[34:36]+'.'+file_base[36]+'00')
                end_time.append(url[-11:-1].replace('/','-')+'T'+file_base[46:48]+':'+file_base[48:50]+':'+\
                                    file_base[50:52]+'.'+file_base[52]+'00')

    # Create the subdirectory for the current wavelength
    this_outdir = os.path.join(outdir, str(wavelength))
    try:
        os.makedirs(this_outdir)
    except FileExistsError:
        # directory already exists
        pass

    # Make astropy time objects from the start and end times, compute the exposure time from that.
    start_time = Time(start_time, scale='utc', format='isot')
    end_time = Time(end_time, scale='utc', format='isot')
    exposure_time = end_time-start_time

    # Sort in long and short exposures.
    long_exposures  = np.where(np.around(exposure_time.sec) == 1)
    short_exposures = np.where(np.around(exposure_time.sec) == 0)
    long_exposure_files  = np.array(all_files)[long_exposures]
    short_exposure_files = np.array(all_files)[short_exposures]

    # Now go through all of the requested times and download/print the files.
    for time in timestamp:
        if long_exp:
            delta_t = time-start_time[long_exposures]
            which_file = np.abs(delta_t).argmin()
            if query_only:
                print('Long exposure: ', long_exposure_files[which_file])
            else:
                if verbose:
                    print('Long exposure: ', long_exposure_files[which_file])
                f = requests.get(long_exposure_files[which_file])
                open(os.path.join(this_outdir, os.path.basename(long_exposure_files[which_file])), 'wb').write(f.content)
        if short_exp:
            delta_t = time-start_time[short_exposures]
            which_file = np.abs(delta_t).argmin()
            if query_only:
                print('Short exposure:', short_exposure_files[which_file])
            else:
                if verbose:
                    print('Short exposure:', short_exposure_files[which_file])
                f = requests.get(short_exposure_files[which_file])
                open(os.path.join(this_outdir, os.path.basename(short_exposure_files[which_file])), 'wb').write(f.content)

Querying https://data.ngdc.noaa.gov/platforms/solar-space-observing-satellites/goes/goes16/l1b/suvi-l1b-fe171/2020/01/05/ for SUVI files...
Long exposure:  https://data.ngdc.noaa.gov/platforms/solar-space-observing-satellites/goes/goes16/l1b/suvi-l1b-fe171/2020/01/05/OR_SUVI-L1b-Fe171_G16_s20200051230404_e20200051230414_c20200051231078.fits.gz
Long exposure:  https://data.ngdc.noaa.gov/platforms/solar-space-observing-satellites/goes/goes16/l1b/suvi-l1b-fe171/2020/01/05/OR_SUVI-L1b-Fe171_G16_s20200051250405_e20200051250415_c20200051251081.fits.gz
Long exposure:  https://data.ngdc.noaa.gov/platforms/solar-space-observing-satellites/goes/goes16/l1b/suvi-l1b-fe171/2020/01/05/OR_SUVI-L1b-Fe171_G16_s20200051310405_e20200051310415_c20200051311080.fits.gz
Long exposure:  https://data.ngdc.noaa.gov/platforms/solar-space-observing-satellites/goes/goes16/l1b/suvi-l1b-fe171/2020/01/05/OR_SUVI-L1b-Fe171_G16_s20200051330406_e20200051330416_c20200051331076.fits.gz
Querying https://data.ngdc.noaa.gov/platforms/solar-space-observing-satellites/goes/goes16/l1b/suvi-l1b-fe195/2020/01/05/ for SUVI files...
Long exposure:  https://data.ngdc.noaa.gov/platforms/solar-space-observing-satellites/goes/goes16/l1b/suvi-l1b-fe195/2020/01/05/OR_SUVI-L1b-Fe195_G16_s20200051229504_e20200051229514_c20200051230173.fits.gz
Long exposure:  https://data.ngdc.noaa.gov/platforms/solar-space-observing-satellites/goes/goes16/l1b/suvi-l1b-fe195/2020/01/05/OR_SUVI-L1b-Fe195_G16_s20200051249505_e20200051249515_c20200051250177.fits.gz
Long exposure:  https://data.ngdc.noaa.gov/platforms/solar-space-observing-satellites/goes/goes16/l1b/suvi-l1b-fe195/2020/01/05/OR_SUVI-L1b-Fe195_G16_s20200051309505_e20200051309515_c20200051310180.fits.gz
Long exposure:  https://data.ngdc.noaa.gov/platforms/solar-space-observing-satellites/goes/goes16/l1b/suvi-l1b-fe195/2020/01/05/OR_SUVI-L1b-Fe195_G16_s20200051329506_e20200051329516_c20200051330197.fits.gz

Total running time of the script: (0 minutes 8.692 seconds)

Gallery generated by Sphinx-Gallery