Demo Workflow

model_catalogs provides easy access to a curated set of ocean models through Intake catalogs; the end product is an xarray Dataset.

An Intake catalog has one or more Sources listed in it; a Source is a distinct product that has a urlpath pointing to where the model output can be accessed. An Intake catalog can also contain nested catalogs, each with its own sources.

import model_catalogs as mc

import intake
import pandas as pd

import os

import xarray as xr

Main Catalog

The first step is to set up the main catalog, which provides access to all of the models and to the model_sources available for each model. The catalog is set up by reading in the transform.yaml catalog file, pointing it to the hard-wired original (“orig”) catalog file for each model, and reading and copying in previously calculated boundary information for each model (also saved in the repository). Every model catalog in the main catalog has at least one model_source, and may have more. Each model_source has a urlpath that points to a location of model output, so that a user can look at some output right away. If the model output is available at a static link, urlpath is that link. If the model output is available as a series of files that need to be aggregated (often the case for NOAA OFS models), urlpath lists two example files. Later in the process, the urlpath for unaggregated model_sources is filled in with the full list of file locations covering the date range the user requests.

The main catalog is compiled from parts saved in the repository, but some of the model catalog files contain Intake user parameters for today and/or yesterday that are filled in when the main catalog is set up. Because these values change daily, the individual compiled catalog files are saved into a user application cache and are considered “fresh” for 6 hours (set in mc.FRESH['compiled'] in the __init__ file). When mc.setup() is run, the catalog is rebuilt if more than 6 hours have passed since it was last compiled, or if override=True is passed to force a rebuild. Otherwise, the previously written “compiled” catalog files are reread.
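The freshness rule above can be sketched with the standard library alone. This is a minimal illustration, assuming freshness is judged by a cached file's modification time; is_fresh and needs_rebuild are hypothetical helpers, not part of model_catalogs, and the 6-hour default simply mirrors the mc.FRESH['compiled'] value described above.

```python
import os
import time


def is_fresh(path, max_age_hours=6.0):
    """Return True if `path` exists and was modified less than `max_age_hours` ago.

    Hypothetical helper: model_catalogs' own freshness check may differ.
    """
    if not os.path.exists(path):
        return False  # nothing cached yet, so it has to be built
    age_seconds = time.time() - os.path.getmtime(path)
    return age_seconds < max_age_hours * 3600


def needs_rebuild(path, override=False, max_age_hours=6.0):
    """Rebuild when forced with override=True or when the cached file is stale."""
    return override or not is_fresh(path, max_age_hours)
```

The override flag short-circuits the age check, matching the described behaviour of mc.setup(override=True).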

Each model catalog nested in the main catalog points to an original model catalog as the “target” of an Intake transform in model_catalogs. This transform is run when model output is requested with source.to_dask(), and it applies the metadata stored in the model catalogs. The original model information can also be revealed with source.target.

Sample main catalog

main_cat                  cat                source                       Description
(Nested Intake Catalog)   (Intake Catalog)   (Intake Source)
                          CIOFS              coops-forecast-agg           static link to pre-aggregated forecast model output
                                             coops-forecast-noagg         unaggregated model output from the recent past through the forecast
                                             ncei-archive-noagg           unaggregated nowcast model output from the past through the recent past
                                             ncei-archive-agg             aggregated forecast files from the past
                          WCOFS              coops-forecast-noagg         unaggregated model output from the recent past through the forecast
                                             ncei-archive-noagg           unaggregated nowcast model output from the past through the recent past
                          GOFS               hycom-forecast-agg           static link to forecast model output
                          RTOFS              nomads-forecast-yesterday    known links to files, prepopulated with “yesterday” to get correct links
                                             nomads-forecast-today        known links to files, prepopulated with “today” to get correct links; these files only become available around midday, so depending on the time of day they are less reliable to use for the forecast
                          …all the rest are similar
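Each level of the main catalog behaves like a mapping: iterating over a catalog yields its entry names, and indexing by name returns the nested catalog or source. Under that assumption, the whole hierarchy in the table can be summarized with a short loop. list_sources is a hypothetical helper, and it is exercised here on a plain dict stand-in, so treat its behaviour on a live catalog as an assumption.

```python
def list_sources(catalog):
    """Map each model name to the list of its model_source names.

    Works on any two-level mapping: iterating `catalog` yields model names,
    and iterating `catalog[model]` yields that model's source names.
    """
    return {model: list(catalog[model]) for model in catalog}


# Plain-dict stand-in for part of the table above (sources replaced by None).
sample = {
    "CIOFS": {"coops-forecast-agg": None, "coops-forecast-noagg": None},
    "GOFS": {"hycom-forecast-agg": None},
}
print(list_sources(sample))
```

With the real catalog, after main_cat = mc.setup(), the same call would be list_sources(main_cat).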

NOTE: The model domain boundaries have been calculated in advance and saved in the repository. However, they can be recalculated with mc.calculate_boundaries() if a model has changed.

NOTE: The user application cache directory can be revealed with mc.CAT_PATH.

main_cat = mc.setup()
# Or if you have changed a model catalog file, you can incorporate those changes into main_cat with:
# main_cat = mc.setup(override=True)

See what models are available with:

list(main_cat)
['NYOFS',
 'CBOFS-RGRID',
 'DBOFS-RGRID',
 'WCOFS-RGRID',
 'LEOFS-RGRID',
 'RTOFS-EAST',
 'RTOFS-AK',
 'RTOFS-2D',
 'WCOFS-2DS',
 'GOFS',
 'GFS-1-2DEG',
 'SFBOFS-RGRID',
 'RTOFS',
 'CREOFS-RGRID',
 'TBOFS',
 'NGOFS2',
 'GFS-1DEG',
 'NGOFS2-RGRID',
 'GOMOFS-2DS',
 'CIOFS',
 'LMHOFS-RGRID',
 'GOMOFS-RGRID',
 'LMHOFS',
 'CIOFS-RGRID',
 'DBOFS',
 'LEOFS',
 'SFBOFS',
 'TBOFS-RGRID',
 'NGOFS2-2DS',
 'RTOFS-WEST',
 'LOOFS',
 'WCOFS',
 'GOMOFS',
 'GFS-1-4DEG',
 'CREOFS',
 'LSOFS',
 'CBOFS']

Examine metadata of models in source catalog

The main catalog contains information about each model. The full list of metadata keys for a model catalog can be found with:

main_cat['CBOFS'].metadata.keys()
dict_keys(['alpha_shape', 'bounding_box', 'filetype', 'geospatial_bounds', 'grid_dim_(xyz)', 'grid_res_max_(km)', 'grid_res_min_(km)', 'grid_type', 'html_desc', 'long_name', 'catalog_dir'])

Each source also has its own metadata:

main_cat['CBOFS']['coops-forecast-agg'].metadata.keys()
dict_keys(['axis', 'catalog_dir', 'output_period_(hr)', 'overall_end_datetime', 'overall_start_datetime', 'standard_names'])

Domain boundaries

import shapely.wkt

P = shapely.wkt.loads(main_cat['CBOFS'].metadata['geospatial_bounds'])
P
[Figure: CBOFS model domain boundary polygon]
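Because P is a shapely geometry, it can be used to check whether a location of interest falls inside a model domain. The sketch below uses a small made-up square in WKT (wkt_bounds and P_demo are hypothetical stand-ins) so it runs without the catalog. It assumes the WKT coordinates are in (longitude, latitude) order; with the real catalog, the polygon would come from main_cat['CBOFS'].metadata['geospatial_bounds'] as above.

```python
import shapely.wkt
from shapely.geometry import Point

# Hypothetical stand-in for a model's 'geospatial_bounds' WKT string.
wkt_bounds = "POLYGON ((-77 37, -75 37, -75 40, -77 40, -77 37))"
P_demo = shapely.wkt.loads(wkt_bounds)


def in_domain(polygon, lon, lat):
    """Return True if (lon, lat) lies inside the domain polygon."""
    # shapely works in (x, y) order, so longitude comes first.
    return polygon.contains(Point(lon, lat))


print(in_domain(P_demo, -76.0, 38.5))  # True: inside the square
print(in_domain(P_demo, -70.0, 38.5))  # False: east of the square
print(P_demo.bounds)                   # (minx, miny, maxx, maxy)
```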

Variables

A mapping of the variables relevant to NOAA applications has been written into each source catalog. It maps a CF-convention standard_name to the corresponding variable name in the model dataset. An example list of variables that may be used is:

    eastward_sea_water_velocity
    eastward_wind
    northward_sea_water_velocity
    northward_wind
    sea_ice_area_fraction
    sea_ice_thickness
    sea_surface_elevation
    sea_water_temperature
    sea_water_practical_salinity

The full standard_name list is available in the docs.

Examine the variable mapping for a given model:

main_cat['CBOFS']['coops-forecast-agg'].metadata['standard_names']
{'angle': 'angle_of_rotation_from_east_to_x',
 'eastward_sea_water_velocity': 'u',
 'eastward_wind': 'Uwind',
 'northward_sea_water_velocity': 'v',
 'northward_wind': 'Vwind',
 'ocean_s_coordinate_g1': ['s_rho', 's_w'],
 'sea_floor_depth': 'h',
 'sea_surface_elevation': 'zeta',
 'sea_water_practical_salinity': 'salt',
 'sea_water_temperature': 'temp',
 'time': 'time',
 'upward_sea_water_velocity': 'w'}
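The mapping is an ordinary dict, so it can be used in both directions: forward to find the model variable for a standard_name, and reversed to find which standard_name a model variable carries. The minimal sketch below uses the CBOFS mapping printed above. Some entries (here ocean_s_coordinate_g1) map to a list of variables, which the reversal has to unpack. The helper names model_var and reverse_mapping are hypothetical.

```python
# CBOFS mapping, copied from the output above.
standard_names = {
    'angle': 'angle_of_rotation_from_east_to_x',
    'eastward_sea_water_velocity': 'u',
    'eastward_wind': 'Uwind',
    'northward_sea_water_velocity': 'v',
    'northward_wind': 'Vwind',
    'ocean_s_coordinate_g1': ['s_rho', 's_w'],
    'sea_floor_depth': 'h',
    'sea_surface_elevation': 'zeta',
    'sea_water_practical_salinity': 'salt',
    'sea_water_temperature': 'temp',
    'time': 'time',
    'upward_sea_water_velocity': 'w',
}


def model_var(mapping, standard_name):
    """Return the model variable name(s) for a standard_name, or None if absent."""
    return mapping.get(standard_name)


def reverse_mapping(mapping):
    """Invert {standard_name: var or [vars]} into {var: standard_name}."""
    reverse = {}
    for std_name, varnames in mapping.items():
        # Some standard names map to several model variables (a list).
        if isinstance(varnames, str):
            varnames = [varnames]
        for var in varnames:
            reverse[var] = std_name
    return reverse


print(model_var(standard_names, 'sea_water_practical_salinity'))  # salt
print(reverse_mapping(standard_names)['zeta'])  # sea_surface_elevation
```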

Conveniences: urlpath, model output source, and server check

You can find the model output location(s) by querying urlpath on a source. Every source has a default urlpath that can be checked.

main_cat['CBOFS']['coops-forecast-noagg'].urlpath  # shows 2 default files
['https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CBOFS/MODELS/2022/10/04/nos.cbofs.fields.n001.20221004.t00z.nc',
 'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CBOFS/MODELS/2022/10/04/nos.cbofs.fields.n002.20221004.t00z.nc']

For sources whose urlpath is a list of files to be aggregated (rather than a single static link), the datetimes associated with those files can be examined separately with source.dates:

main_cat['CBOFS']['coops-forecast-noagg'].dates
[Timestamp('2022-10-03 19:00:00'), Timestamp('2022-10-03 20:00:00')]
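The NOAA OFS file names in urlpath follow a regular pattern, visible in the URLs above: nos.<model>.<filetype>.<n|f><hour>.<YYYYMMDD>.t<cycle>z.nc, where n marks nowcast files and f marks forecast files. The sketch below extracts those pieces with a regular expression. parse_ofs_filename is a hypothetical helper, not how model_catalogs computes source.dates. Converting the pieces into a valid time depends on each model's cycle length, which this sketch does not handle.

```python
import re
from datetime import datetime

OFS_NAME = re.compile(
    r"nos\.(?P<model>\w+)\.(?P<filetype>\w+)\."
    r"(?P<kind>[nf])(?P<hour>\d{3})\."
    r"(?P<date>\d{8})\.t(?P<cycle>\d{2})z\.nc$"
)


def parse_ofs_filename(url):
    """Split a NOAA OFS file URL into its components, or return None."""
    match = OFS_NAME.search(url)
    if match is None:
        return None
    parts = match.groupdict()
    return {
        "model": parts["model"],
        "filetype": parts["filetype"],
        "kind": "nowcast" if parts["kind"] == "n" else "forecast",
        "hour": int(parts["hour"]),
        # Model cycle (run) time, not the valid time of the output.
        "cycle": datetime.strptime(parts["date"] + parts["cycle"], "%Y%m%d%H"),
    }


url = ("https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CBOFS/MODELS/"
       "2022/10/04/nos.cbofs.fields.n001.20221004.t00z.nc")
print(parse_ofs_filename(url))
```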

The sources in model_catalogs are Intake transforms, each pointing to a source in another Intake catalog file that is the target of the transform. It is the target source that needs urlpath defined; however, model_catalogs brings urlpath forward so you can query it from the transform too. The original model information, the target of the transform, can be found with source.target. For example:

main_cat['CBOFS']['coops-forecast-noagg'].target
coops-forecast-noagg:
  args:
    combine: by_coords
    compat: override
    coords: minimal
    data_vars: minimal
    engine: netcdf4
    parallel: true
    urlpath:
    - https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CBOFS/MODELS/2022/10/04/nos.cbofs.fields.n001.20221004.t00z.nc
    - https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CBOFS/MODELS/2022/10/04/nos.cbofs.fields.n002.20221004.t00z.nc
  description: Unaggregated 3D Chesapeake Bay model in ROMS from 30 days ago with
    48 hour forecast.
  driver: intake_xarray.opendap.OpenDapSource
  metadata:
    axis:
      T: ocean_time
      X:
      - xi_rho
      - xi_u
      - xi_v
      - xi_psi
      Y:
      - eta_rho
      - eta_u
      - eta_v
      - eta_psi
      Z:
      - s_rho
      - s_w
    catalog_dir: /Users/kthyng/projects/model_catalogs/model_catalogs/catalogs/orig/
    catloc: https://opendap.co-ops.nos.noaa.gov/thredds/catalog/NOAA/CBOFS/MODELS/catalog.xml
    output_period_(hr): 1
    overall_end_datetime: 48 hours after present time
    overall_start_datetime: 1 month before present day
    sample_locs:
    - https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CBOFS/MODELS/2022/10/04/nos.cbofs.fields.n001.20221004.t00z.nc
    - https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CBOFS/MODELS/2022/10/04/nos.cbofs.fields.n002.20221004.t00z.nc
    standard_names:
      angle: angle_of_rotation_from_east_to_x
      eastward_sea_water_velocity: u
      eastward_wind: Uwind
      northward_sea_water_velocity: v
      northward_wind: Vwind
      ocean_s_coordinate_g1:
      - s_rho
      - s_w
      sea_floor_depth: h
      sea_surface_elevation: zeta
      sea_water_practical_salinity: salt
      sea_water_temperature: temp
      time: ocean_time
      upward_sea_water_velocity: w

You can also check whether the server for a given Source is currently running with:

main_cat['CBOFS']['coops-forecast-noagg'].status
True
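A reachability check of this kind can be approximated with the standard library. This is a hedged sketch: server_is_up is a hypothetical helper, and the status property in model_catalogs may test the server differently. Here any successful HTTP response counts as "up", while connection failures, timeouts, HTTP error codes, and malformed URLs count as "down".

```python
import urllib.request


def server_is_up(url, timeout=10):
    """Return True if `url` answers with a successful HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except (OSError, ValueError):
        # URLError, HTTPError and socket timeouts are all OSError subclasses;
        # ValueError covers strings urllib cannot parse as URLs.
        return False


# Example (needs network access), using the catloc URL from the target above:
# server_is_up("https://opendap.co-ops.nos.noaa.gov/thredds/catalog/NOAA/CBOFS/MODELS/catalog.xml")
```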

Find availability for a model

You can query a specific model for the availability of any or all of its model_sources. This step is optional, but it helps the user decide which model to use and what date ranges are possible. The start and end datetimes found are saved into the user application cache and are reused while still considered fresh (as determined by the FRESH parameters for each model_source and for start and end times).

  • If no model_source is input, the availability is found and inserted into each source in the input model catalog: mc.find_availability(main_cat['CIOFS']); this returns an Intake Catalog.

  • Alternatively, the user can choose one or more model_sources to check, for example if they are only interested in the available datetime range of a particular model_source: mc.find_availability(main_cat['CIOFS'], model_source='coops-forecast-agg'); this returns an Intake Catalog.

  • Simplest of all, the user can directly input the Intake Source they want to evaluate: mc.find_availability(main_cat['CIOFS']['coops-forecast-agg']); this returns an Intake Source.

In the following example, some model_sources for the CIOFS model may already be fresh on the first call. By the second call, all of the availabilities have just been calculated and are fresh, so it runs quickly.

%%time
cat = mc.find_availability(main_cat['CIOFS'])
CPU times: user 488 ms, sys: 14.9 ms, total: 503 ms
Wall time: 3.62 s
%%time
cat = mc.find_availability(main_cat['CIOFS'], verbose=True)
coops-forecast-agg: 2022-09-28T01:00:00.000000000 to 2022-10-07T12:00:00.000000000
coops-forecast-noagg: 2022-09-04 13:00:00 to 2022-10-07 12:00:00
ncei-archive-agg: 2019-09-16T19:00:00.000000000 to 2022-10-01T18:00:00.000000000
ncei-archive-noagg: 2019-09-16 19:00:00 to 2022-10-01 18:00:00
CPU times: user 6.28 ms, sys: 1.87 ms, total: 8.15 ms
Wall time: 7.03 ms

Setup model catalog for desired date range

For unaggregated model_sources of NOAA OFS models, such as “coops-forecast-noagg”, the locations of the model files needed for the desired date range must be determined and saved to urlpath in the source. mc.select_date_range() does this.

There are four modes for mc.select_date_range():

  1. Run mc.find_availability() for all model_sources, then request your desired date range in mc.select_date_range(), and the code will figure out which model_source to use.

  2. Skip straight to mc.select_date_range() because you already know which model_source to use for your model, and input that model_source.

  3. In the special case that the catalog contains only one model_source, it does not need to be input.

  4. In the special case that the model output is available at a static link or at deterministic paths, the role of mc.select_date_range() is to store the desired start and end dates in the metadata so that the resulting Dataset is subset to that time range.
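Mode #1 comes down to comparing the requested range against each model_source's availability window. The sketch below is a simplified, hypothetical version of that choice: choose_model_source is not the library's function, and the real selection rules may differ. It returns the first model_source, in the order given, whose window fully covers the request. The windows are the CIOFS availabilities printed by mc.find_availability() above.

```python
from datetime import datetime


def choose_model_source(availability, start, end):
    """Return the first model_source whose (start, end) window covers [start, end]."""
    for name, (avail_start, avail_end) in availability.items():
        if avail_start <= start and end <= avail_end:
            return name
    return None  # no single model_source covers the whole range


ciofs = {
    "coops-forecast-agg": (datetime(2022, 9, 28, 1), datetime(2022, 10, 7, 12)),
    "coops-forecast-noagg": (datetime(2022, 9, 4, 13), datetime(2022, 10, 7, 12)),
    "ncei-archive-agg": (datetime(2019, 9, 16, 19), datetime(2022, 10, 1, 18)),
    "ncei-archive-noagg": (datetime(2019, 9, 16, 19), datetime(2022, 10, 1, 18)),
}

print(choose_model_source(ciofs, datetime(2022, 10, 5), datetime(2022, 10, 6)))
# coops-forecast-agg
print(choose_model_source(ciofs, datetime(2021, 1, 1), datetime(2021, 1, 2)))
# ncei-archive-agg
```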

mc.select_date_range() can take in an Intake Catalog or Source, but always returns a Source.

NOTE: Some model sources, like the GOFS forecast, have static links to the model output and can be accessed immediately. For those models, mc.select_date_range() does not need to be run, and the model output can be read directly from the main catalog. You don’t need to know this, though; the same process described here works for them too.

NOTE: All model sources have working sample files stored in urlpath, so you can always run main_cat[model][model_source].to_dask() and get something back. However, these are only sample files and are replaced with the proper file paths when a user runs mc.select_date_range().

Example of #1. After this is run, model output can be accessed through source_WCOFS.

start_date = pd.Timestamp.today()
main_cat = mc.setup()
cat = mc.find_availability(main_cat['WCOFS'])
source_WCOFS = mc.select_date_range(cat, start_date=start_date, end_date=start_date, override=True)
source_WCOFS.urlpath  # shows files for the date range selection
['https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/WCOFS/MODELS/2022/10/05/nos.wcofs.fields.n003.20221005.t03z.nc',
 'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/WCOFS/MODELS/2022/10/05/nos.wcofs.fields.n006.20221005.t03z.nc',
 'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/WCOFS/MODELS/2022/10/05/nos.wcofs.fields.n009.20221005.t03z.nc',
 'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/WCOFS/MODELS/2022/10/05/nos.wcofs.fields.n012.20221005.t03z.nc',
 'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/WCOFS/MODELS/2022/10/05/nos.wcofs.fields.n015.20221005.t03z.nc',
 'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/WCOFS/MODELS/2022/10/05/nos.wcofs.fields.n018.20221005.t03z.nc',
 'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/WCOFS/MODELS/2022/10/05/nos.wcofs.fields.n021.20221005.t03z.nc',
 'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/WCOFS/MODELS/2022/10/05/nos.wcofs.fields.n024.20221005.t03z.nc']

The dates associated with the newly-found filenames in urlpath can be checked with source.dates:

source_WCOFS.dates
[Timestamp('2022-10-05 00:00:00'),
 Timestamp('2022-10-05 03:00:00'),
 Timestamp('2022-10-05 06:00:00'),
 Timestamp('2022-10-05 09:00:00'),
 Timestamp('2022-10-05 12:00:00'),
 Timestamp('2022-10-05 15:00:00'),
 Timestamp('2022-10-05 18:00:00'),
 Timestamp('2022-10-05 21:00:00')]

Example of #2:

today = pd.Timestamp.today()
main_cat = mc.setup()
source_LMHOFS = mc.select_date_range(main_cat['LMHOFS']['coops-forecast-noagg'], start_date=today, end_date=today)
source_LMHOFS.urlpath  # shows files for the date range selection
['https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/LMHOFS/MODELS/2022/10/05/nos.lmhofs.fields.n000.20221005.t06z.nc',
 'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/LMHOFS/MODELS/2022/10/05/nos.lmhofs.fields.n001.20221005.t06z.nc',
 'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/LMHOFS/MODELS/2022/10/05/nos.lmhofs.fields.n002.20221005.t06z.nc',
 'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/LMHOFS/MODELS/2022/10/05/nos.lmhofs.fields.n003.20221005.t06z.nc',
 'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/LMHOFS/MODELS/2022/10/05/nos.lmhofs.fields.n004.20221005.t06z.nc',
 'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/LMHOFS/MODELS/2022/10/05/nos.lmhofs.fields.n005.20221005.t06z.nc',
 'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/LMHOFS/MODELS/2022/10/05/nos.lmhofs.fields.n006.20221005.t06z.nc',
 'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/LMHOFS/MODELS/2022/10/05/nos.lmhofs.fields.n001.20221005.t12z.nc',
 'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/LMHOFS/MODELS/2022/10/05/nos.lmhofs.fields.n002.20221005.t12z.nc',
 'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/LMHOFS/MODELS/2022/10/05/nos.lmhofs.fields.n003.20221005.t12z.nc',
 'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/LMHOFS/MODELS/2022/10/05/nos.lmhofs.fields.n004.20221005.t12z.nc',
 'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/LMHOFS/MODELS/2022/10/05/nos.lmhofs.fields.n005.20221005.t12z.nc',
 'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/LMHOFS/MODELS/2022/10/05/nos.lmhofs.fields.n006.20221005.t12z.nc',
 'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/LMHOFS/MODELS/2022/10/05/nos.lmhofs.fields.n001.20221005.t18z.nc',
 'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/LMHOFS/MODELS/2022/10/05/nos.lmhofs.fields.n002.20221005.t18z.nc',
 'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/LMHOFS/MODELS/2022/10/05/nos.lmhofs.fields.n003.20221005.t18z.nc',
 'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/LMHOFS/MODELS/2022/10/05/nos.lmhofs.fields.n004.20221005.t18z.nc',
 'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/LMHOFS/MODELS/2022/10/05/nos.lmhofs.fields.n005.20221005.t18z.nc',
 'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/LMHOFS/MODELS/2022/10/05/nos.lmhofs.fields.f000.20221005.t18z.nc',
 'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/LMHOFS/MODELS/2022/10/05/nos.lmhofs.fields.f001.20221005.t18z.nc',
 'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/LMHOFS/MODELS/2022/10/05/nos.lmhofs.fields.f002.20221005.t18z.nc',
 'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/LMHOFS/MODELS/2022/10/05/nos.lmhofs.fields.f003.20221005.t18z.nc',
 'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/LMHOFS/MODELS/2022/10/05/nos.lmhofs.fields.f004.20221005.t18z.nc',
 'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/LMHOFS/MODELS/2022/10/05/nos.lmhofs.fields.f005.20221005.t18z.nc']

Example of #3 and #4:

today = pd.Timestamp.today()
main_cat = mc.setup()
source_GOFS = mc.select_date_range(main_cat['GOFS'], start_date=today, end_date=today+pd.Timedelta('1 day'))
source_GOFS.urlpath
'http://tds.hycom.org/thredds/dodsC/GLBy0.08/latest'

Note that since the GOFS forecast output is available at a static link, it can also be read directly from the main catalog. The difference is that the directly read Dataset, ds_GOFS_direct, retains all times in the output, whereas source_GOFS.to_dask() is filtered to the specified date range.

ds_GOFS_direct = main_cat['GOFS']['hycom-forecast-agg'].to_dask()
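Restricting a Dataset that was read directly to a date range is a standard xarray selection on its time coordinate. The sketch below illustrates that kind of filtering; it is not model_catalogs' internal code. subset_time is a hypothetical helper, and a small synthetic Dataset stands in for the GOFS output. Label-based slicing with .sel includes both endpoints. GOFS names its time coordinate time, while ROMS-based OFS models such as CBOFS use ocean_time, hence the time_name parameter.

```python
import numpy as np
import pandas as pd
import xarray as xr


def subset_time(ds, start, end, time_name="time"):
    """Return `ds` restricted to times between `start` and `end`, inclusive."""
    # Label-based slicing with .sel includes both endpoints.
    return ds.sel({time_name: slice(start, end)})


# Synthetic stand-in for model output: 8 values, 3 hours apart, on 2022-10-05.
times = pd.date_range("2022-10-05 00:00", "2022-10-05 21:00", periods=8)
ds_demo = xr.Dataset({"surf_el": ("time", np.arange(8.0))}, coords={"time": times})

subset = subset_time(ds_demo, pd.Timestamp("2022-10-05 06:00"), pd.Timestamp("2022-10-05 12:00"))
print(subset["surf_el"].values)  # [2. 3. 4.]
```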

Direct workflow

If you already know what model and model_source you want to use, you can skip ahead to get your desired date range with:

source = mc.select_date_range(main_cat['CIOFS']['coops-forecast-noagg'], start_date=start_date, end_date=start_date)
source.urlpath
['https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CIOFS/MODELS/2022/10/05/nos.ciofs.fields.n006.20221005.t00z.nc',
 'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CIOFS/MODELS/2022/10/05/nos.ciofs.fields.n001.20221005.t06z.nc',
 'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CIOFS/MODELS/2022/10/05/nos.ciofs.fields.n002.20221005.t06z.nc',
 'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CIOFS/MODELS/2022/10/05/nos.ciofs.fields.n003.20221005.t06z.nc',
 'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CIOFS/MODELS/2022/10/05/nos.ciofs.fields.n004.20221005.t06z.nc',
 'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CIOFS/MODELS/2022/10/05/nos.ciofs.fields.n005.20221005.t06z.nc',
 'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CIOFS/MODELS/2022/10/05/nos.ciofs.fields.n006.20221005.t06z.nc',
 'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CIOFS/MODELS/2022/10/05/nos.ciofs.fields.n001.20221005.t12z.nc',
 'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CIOFS/MODELS/2022/10/05/nos.ciofs.fields.n002.20221005.t12z.nc',
 'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CIOFS/MODELS/2022/10/05/nos.ciofs.fields.n003.20221005.t12z.nc',
 'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CIOFS/MODELS/2022/10/05/nos.ciofs.fields.n004.20221005.t12z.nc',
 'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CIOFS/MODELS/2022/10/05/nos.ciofs.fields.n005.20221005.t12z.nc',
 'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CIOFS/MODELS/2022/10/05/nos.ciofs.fields.n006.20221005.t12z.nc',
 'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CIOFS/MODELS/2022/10/05/nos.ciofs.fields.f001.20221005.t12z.nc',
 'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CIOFS/MODELS/2022/10/05/nos.ciofs.fields.f002.20221005.t12z.nc',
 'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CIOFS/MODELS/2022/10/05/nos.ciofs.fields.f003.20221005.t12z.nc',
 'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CIOFS/MODELS/2022/10/05/nos.ciofs.fields.f004.20221005.t12z.nc',
 'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CIOFS/MODELS/2022/10/05/nos.ciofs.fields.f005.20221005.t12z.nc',
 'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CIOFS/MODELS/2022/10/05/nos.ciofs.fields.f006.20221005.t12z.nc',
 'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CIOFS/MODELS/2022/10/05/nos.ciofs.fields.f007.20221005.t12z.nc',
 'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CIOFS/MODELS/2022/10/05/nos.ciofs.fields.f008.20221005.t12z.nc',
 'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CIOFS/MODELS/2022/10/05/nos.ciofs.fields.f009.20221005.t12z.nc',
 'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CIOFS/MODELS/2022/10/05/nos.ciofs.fields.f010.20221005.t12z.nc',
 'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CIOFS/MODELS/2022/10/05/nos.ciofs.fields.f011.20221005.t12z.nc']

Read in model output

Model output can be read from examples #1 and #2 above, or directly from a source with a static link.

Example #1:

%%time
source_WCOFS.to_dask()
CPU times: user 3.83 s, sys: 416 ms, total: 4.25 s
Wall time: 7.83 s
<xarray.Dataset>
Dimensions:        (tracer: 2, boundary: 4, s_rho: 40, s_w: 41, eta_rho: 1016,
                    xi_rho: 348, eta_u: 1016, xi_u: 347, eta_v: 1015,
                    xi_v: 348, eta_psi: 1015, xi_psi: 347, ocean_time: 2)
Coordinates: (12/19)
  * s_rho          (s_rho) float64 -0.9875 -0.9625 -0.9375 ... -0.0375 -0.0125
  * s_w            (s_w) float64 -1.0 -0.975 -0.95 -0.925 ... -0.05 -0.025 0.0
    lon_rho        (eta_rho, xi_rho) float64 dask.array<chunksize=(1016, 348), meta=np.ndarray>
    lat_rho        (eta_rho, xi_rho) float64 dask.array<chunksize=(1016, 348), meta=np.ndarray>
    lon_u          (eta_u, xi_u) float64 dask.array<chunksize=(1016, 347), meta=np.ndarray>
    lat_u          (eta_u, xi_u) float64 dask.array<chunksize=(1016, 347), meta=np.ndarray>
    ...             ...
  * xi_v           (xi_v) int64 0 1 2 3 4 5 6 7 ... 341 342 343 344 345 346 347
  * xi_psi         (xi_psi) int64 0 1 2 3 4 5 6 ... 340 341 342 343 344 345 346
  * eta_rho        (eta_rho) int64 0 1 2 3 4 5 ... 1010 1011 1012 1013 1014 1015
  * eta_u          (eta_u) int64 0 1 2 3 4 5 6 ... 1010 1011 1012 1013 1014 1015
  * eta_v          (eta_v) int64 0 1 2 3 4 5 6 ... 1009 1010 1011 1012 1013 1014
  * eta_psi        (eta_psi) int64 0 1 2 3 4 5 ... 1009 1010 1011 1012 1013 1014
Dimensions without coordinates: tracer, boundary
Data variables: (12/80)
    ntimes         int32 ...
    ndtfast        int32 ...
    dt             float64 ...
    dtfast         float64 ...
    dstart         datetime64[ns] ...
    nHIS           int32 ...
    ...             ...
    vbar           (ocean_time, eta_v, xi_v) float32 dask.array<chunksize=(1, 1015, 348), meta=np.ndarray>
    u              (ocean_time, s_rho, eta_u, xi_u) float32 dask.array<chunksize=(1, 40, 1016, 347), meta=np.ndarray>
    v              (ocean_time, s_rho, eta_v, xi_v) float32 dask.array<chunksize=(1, 40, 1015, 348), meta=np.ndarray>
    w              (ocean_time, s_w, eta_rho, xi_rho) float32 dask.array<chunksize=(1, 41, 1016, 348), meta=np.ndarray>
    temp           (ocean_time, s_rho, eta_rho, xi_rho) float32 dask.array<chunksize=(1, 40, 1016, 348), meta=np.ndarray>
    salt           (ocean_time, s_rho, eta_rho, xi_rho) float32 dask.array<chunksize=(1, 40, 1016, 348), meta=np.ndarray>
Attributes: (12/37)
    file:                            nos.wcofs.fields.nowcast.20221005.t03z_0...
    format:                          netCDF-3 64bit offset file
    Conventions:                     CF-1.4, SGRID-0.3
    type:                            ROMS/TOMS history file
    title:                           wcofs nowcast RUN in operational mode
    var_info:                        varinfo.dat
    ...                              ...
    tiling:                          008x060
    history:                         ROMS/TOMS, Version 3.9, Wednesday - Octo...
    ana_file:                        ROMS/Functionals/ana_btflux.h
    CPP_options:                     mode, ADD_FSOBC, ADD_M2OBC, ANA_BSFLUX, ...
    DODS_EXTRA.Unlimited_Dimension:  ocean_time
    EXTRA_DIMENSION.N:               40

Example #2:

%%time
source_LMHOFS.to_dask()
CPU times: user 3.28 s, sys: 1.16 s, total: 4.45 s
Wall time: 14.7 s
<xarray.Dataset>
Dimensions:             (nele: 171377, node: 90806, three: 3, time: 24,
                         maxnode: 11, maxelem: 9, four: 4, siglay: 20,
                         siglev: 21)
Coordinates:
    lon                 (node) float32 dask.array<chunksize=(90806,), meta=np.ndarray>
    lat                 (node) float32 dask.array<chunksize=(90806,), meta=np.ndarray>
    lonc                (nele) float32 dask.array<chunksize=(171377,), meta=np.ndarray>
    latc                (nele) float32 dask.array<chunksize=(171377,), meta=np.ndarray>
  * time                (time) datetime64[ns] 2022-10-05 ... 2022-10-05T23:00...
    sigma_layers        (siglay, node) float32 dask.array<chunksize=(20, 90806), meta=np.ndarray>
    sigma_levels        (siglev, node) float32 dask.array<chunksize=(21, 90806), meta=np.ndarray>
Dimensions without coordinates: nele, node, three, maxnode, maxelem, four,
                                siglay, siglev
Data variables: (12/48)
    nprocs              int32 ...
    partition           (nele) int32 dask.array<chunksize=(171377,), meta=np.ndarray>
    x                   (node) float32 dask.array<chunksize=(90806,), meta=np.ndarray>
    y                   (node) float32 dask.array<chunksize=(90806,), meta=np.ndarray>
    xc                  (nele) float32 dask.array<chunksize=(171377,), meta=np.ndarray>
    yc                  (nele) float32 dask.array<chunksize=(171377,), meta=np.ndarray>
    ...                  ...
    atmos_press         (time, node) float32 dask.array<chunksize=(1, 90806), meta=np.ndarray>
    wet_nodes           (time, node) int32 dask.array<chunksize=(1, 90806), meta=np.ndarray>
    wet_cells           (time, nele) int32 dask.array<chunksize=(1, 171377), meta=np.ndarray>
    wet_nodes_prev_int  (time, node) int32 dask.array<chunksize=(1, 90806), meta=np.ndarray>
    wet_cells_prev_int  (time, nele) int32 dask.array<chunksize=(1, 171377), meta=np.ndarray>
    wet_cells_prev_ext  (time, nele) int32 dask.array<chunksize=(1, 171377), meta=np.ndarray>
Attributes: (12/17)
    title:                           LMHOFS
    institution:                     School for Marine Science and Technology
    source:                          FVCOM_4.3
    history:                         model started at: 05/10/2022   07:15
    references:                      http://fvcom.smast.umassd.edu, http://co...
    Conventions:                     CF-1.0
    ...                              ...
    Surface_Heat_Forcing:            FVCOM variable surface heat forcing file...
    Surface_Wind_Forcing:            FVCOM variable surface Wind forcing:\nFI...
    Surface_PrecipEvap_Forcing:      FVCOM periodic surface precip forcing:\n...
    DODS.strlen:                     26
    DODS.dimName:                    DateStrLen
    DODS_EXTRA.Unlimited_Dimension:  time

Example #3 (if you know the model output is available at a static link):

%%time
main_cat['GOFS']['hycom-forecast-agg'].to_dask()
CPU times: user 121 µs, sys: 1 µs, total: 122 µs
Wall time: 126 µs
<xarray.Dataset>
Dimensions:            (depth: 40, lat: 4251, lon: 4500, time: 8)
Coordinates:
  * depth              (depth) float64 0.0 2.0 4.0 6.0 ... 3e+03 4e+03 5e+03
  * lat                (lat) float64 -80.0 -79.96 -79.92 ... 89.92 89.96 90.0
  * lon                (lon) float64 0.0 0.07996 0.16 0.24 ... 359.8 359.8 359.9
  * time               (time) datetime64[ns] 2022-10-05T18:00:00 ... 2022-10-...
    time_run           (time) datetime64[ns] 2022-10-04T12:00:00 ... 2022-10-...
Data variables:
    time_offset        (time) datetime64[ns] 2022-09-30T18:00:00 ... 2022-10-...
    water_u            (time, depth, lat, lon) float32 ...
    water_u_bottom     (time, lat, lon) float32 ...
    water_v            (time, depth, lat, lon) float32 ...
    water_v_bottom     (time, lat, lon) float32 ...
    water_temp         (time, depth, lat, lon) float32 ...
    water_temp_bottom  (time, lat, lon) float32 ...
    salinity           (time, depth, lat, lon) float32 ...
    salinity_bottom    (time, lat, lon) float32 ...
    surf_el            (time, lat, lon) float32 ...
Attributes: (12/14)
    classification_level:      UNCLASSIFIED
    distribution_statement:    Approved for public release. Distribution unli...
    downgrade_date:            not applicable
    classification_authority:  not applicable
    institution:               Fleet Numerical Meteorology and Oceanography C...
    source:                    HYCOM archive file
    ...                        ...
    field_type:                instantaneous
    Conventions:               CF-1.4, NAVO_netcdf_v1.1
    _CoordSysBuilder:          ucar.nc2.dataset.conv.CF1Convention
    cdm_data_type:             GRID
    featureType:               GRID
    location:                  Proto fmrc:GLBy0.08_930_FMRC

Other topics

NOAA OFS models: how to use filetypes besides default 3D “fields”

All NOAA OFS models are available with the 3D “fields” filetype. Some models also have other filetypes:

  • regular_grid: model output interpolated to a rectilinear grid

  • 2ds: surface-only model output, with different variable names

The available model configurations, including their filetypes, can be seen by listing the main catalog, since the filetype is part of each catalog name:

list(main_cat)
['NYOFS',
 'CBOFS-RGRID',
 'DBOFS-RGRID',
 'WCOFS-RGRID',
 'LEOFS-RGRID',
 'RTOFS-EAST',
 'RTOFS-AK',
 'RTOFS-2D',
 'WCOFS-2DS',
 'GOFS',
 'GFS-1-2DEG',
 'SFBOFS-RGRID',
 'RTOFS',
 'CREOFS-RGRID',
 'TBOFS',
 'NGOFS2',
 'GFS-1DEG',
 'NGOFS2-RGRID',
 'GOMOFS-2DS',
 'CIOFS',
 'LMHOFS-RGRID',
 'GOMOFS-RGRID',
 'LMHOFS',
 'CIOFS-RGRID',
 'DBOFS',
 'LEOFS',
 'SFBOFS',
 'TBOFS-RGRID',
 'NGOFS2-2DS',
 'RTOFS-WEST',
 'LOOFS',
 'WCOFS',
 'GOMOFS',
 'GFS-1-4DEG',
 'CREOFS',
 'LSOFS',
 'CBOFS']
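Judging from the names listed above, catalogs for the regular_grid filetype end in -RGRID and those for 2ds end in -2DS. Names without either suffix are the default configuration, which includes non-OFS models such as GOFS and RTOFS-2D. Assuming that naming convention holds, the configurations can be grouped with a simple string check. group_by_filetype is a hypothetical helper, and its suffix table would need updating if new filetypes were added.

```python
# Suffix-to-filetype table inferred from the catalog names above.
FILETYPE_SUFFIXES = {"-RGRID": "regular_grid", "-2DS": "2ds"}


def group_by_filetype(names):
    """Group catalog names by filetype suffix; unsuffixed names go to 'default'."""
    groups = {"default": [], "regular_grid": [], "2ds": []}
    for name in names:
        for suffix, filetype in FILETYPE_SUFFIXES.items():
            if name.endswith(suffix):
                groups[filetype].append(name)
                break
        else:  # no suffix matched
            groups["default"].append(name)
    return groups


names = ["TBOFS", "TBOFS-RGRID", "GOMOFS-2DS", "GOFS", "RTOFS-2D"]
print(group_by_filetype(names))
```

With the real catalog, group_by_filetype(list(main_cat))['regular_grid'] would list every regular-grid configuration.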
start_date = pd.Timestamp.today()
source_TBOFSRG = mc.select_date_range(main_cat['TBOFS-RGRID'], model_source='coops-forecast-noagg', 
                                      start_date=start_date, end_date=start_date)
ds = source_TBOFSRG.to_dask()

Access variables and axis

Metadata is added to the model Datasets to make certain variables and axes easier to access. Variable access is set up by adding standard_name attributes to the variables when the Dataset is opened, based on the variables available. Axis access is set up by first adding some attributes and then using cf-xarray. See the following examples.
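The standard_name step can be pictured as copying a source's standard_names mapping onto the variables' attributes, after which xarray's filter_by_attrs can find them. The sketch below illustrates the general idea and is not model_catalogs' actual code. add_standard_names is hypothetical, and the Dataset is a tiny synthetic one. The helper skips mapping entries whose variable is absent and, for simplicity, entries that map to a list of variables.

```python
import numpy as np
import xarray as xr


def add_standard_names(ds, mapping):
    """Set standard_name attrs on `ds` in place from {standard_name: var} and return it."""
    for std_name, varname in mapping.items():
        # Skip list-valued entries and variables this Dataset does not have.
        if isinstance(varname, str) and varname in ds.variables:
            ds[varname].attrs["standard_name"] = std_name
    return ds


demo = xr.Dataset(
    {
        "salt": (("y", "x"), np.zeros((2, 3))),
        "temp": (("y", "x"), np.ones((2, 3))),
    }
)
mapping = {
    "sea_water_practical_salinity": "salt",
    "sea_water_temperature": "temp",
    "sea_floor_depth": "h",  # absent from demo, so skipped
    "ocean_s_coordinate_g1": ["s_rho", "s_w"],  # list-valued, so skipped
}
demo = add_standard_names(demo, mapping)
salty = demo.filter_by_attrs(standard_name="sea_water_practical_salinity")
print(list(salty.data_vars))  # ['salt']
```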

Filter an xarray Dataset by the attribute of standard_name (returns an xarray Dataset):

ds.filter_by_attrs(standard_name='sea_water_practical_salinity')
<xarray.Dataset>
Dimensions:     (ocean_time: 24, Depth: 11, ny: 393, nx: 329)
Coordinates:
  * Depth       (Depth) float64 0.0 2.0 4.0 6.0 8.0 ... 12.0 15.0 20.0 25.0 30.0
    Latitude    (ny, nx) float64 dask.array<chunksize=(393, 329), meta=np.ndarray>
    Longitude   (ny, nx) float64 dask.array<chunksize=(393, 329), meta=np.ndarray>
  * ocean_time  (ocean_time) datetime64[ns] 2022-10-05 ... 2022-10-05T23:00:00
  * nx          (nx) int64 0 1 2 3 4 5 6 7 8 ... 321 322 323 324 325 326 327 328
  * ny          (ny) int64 0 1 2 3 4 5 6 7 8 ... 385 386 387 388 389 390 391 392
Data variables:
    salt        (ocean_time, Depth, ny, nx) float32 dask.array<chunksize=(1, 11, 393, 329), meta=np.ndarray>
Attributes: (12/33)
    file:                            nos.tbofs.fields.nowcast.20221005.t00z_0...
    format:                          netCDF-4/HDF5 file
    Conventions:                     CF-1.4, SGRID-0.3
    type:                            ROMS/TOMS history file
    title:                           tbofs nowcast RUN in operational mode
    var_info:                        varinfo.dat
    ...                              ...
    compiler_flags:                  -fp-model precise -ip -O2
    tiling:                          008x016
    history:                         ROMS/TOMS, Version 3.9, Wednesday - Octo...
    ana_file:                        ROMS/Functionals/ana_btflux.h, ROMS/Func...
    CPP_options:                     mode, ADD_FSOBC, ADD_M2OBC, ANA_BSFLUX, ...
    DODS_EXTRA.Unlimited_Dimension:  ocean_time
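xarray's Dataset.filter_by_attrs also accepts a callable in place of an exact value. The callable receives each variable's attribute value, or None when the attribute is missing. The sketch below uses a small synthetic Dataset with made-up values to contrast an exact match with a prefix match that selects several related variables at once.

```python
import numpy as np
import xarray as xr

# Small synthetic Dataset standing in for a model Dataset (values are made up)
ds = xr.Dataset(
    {
        "salt": (("ny", "nx"), np.full((2, 3), 30.0), {"standard_name": "sea_water_practical_salinity"}),
        "temp": (("ny", "nx"), np.full((2, 3), 15.0), {"standard_name": "sea_water_temperature"}),
        "zeta": (("ny", "nx"), np.zeros((2, 3)), {"standard_name": "sea_surface_height"}),
    }
)

# Exact match on the attribute value, as in the example above
salinity = ds.filter_by_attrs(standard_name="sea_water_practical_salinity")

# Callable match: keep every variable whose standard_name starts with "sea_water".
# Variables lacking the attribute are passed None, so guard against that.
water_props = ds.filter_by_attrs(
    standard_name=lambda value: value is not None and value.startswith("sea_water")
)

print(list(salinity.data_vars))       # ['salt']
print(sorted(water_props.data_vars))  # ['salt', 'temp']
```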

Alternatively, look up the model-specific variable name for a standard name in the source metadata and use it directly (returns an xarray DataArray):

varname = source_TBOFSRG.metadata['standard_names']['sea_water_practical_salinity']
ds[varname]
<xarray.DataArray 'salt' (ocean_time: 24, Depth: 11, ny: 393, nx: 329)>
dask.array<concatenate, shape=(24, 11, 393, 329), dtype=float32, chunksize=(1, 11, 393, 329), chunktype=numpy.ndarray>
Coordinates:
  * Depth       (Depth) float64 0.0 2.0 4.0 6.0 8.0 ... 12.0 15.0 20.0 25.0 30.0
    Latitude    (ny, nx) float64 dask.array<chunksize=(393, 329), meta=np.ndarray>
    Longitude   (ny, nx) float64 dask.array<chunksize=(393, 329), meta=np.ndarray>
  * ocean_time  (ocean_time) datetime64[ns] 2022-10-05 ... 2022-10-05T23:00:00
  * nx          (nx) int64 0 1 2 3 4 5 6 7 8 ... 321 322 323 324 325 326 327 328
  * ny          (ny) int64 0 1 2 3 4 5 6 7 8 ... 385 386 387 388 389 390 391 392
Attributes:
    long_name:      salinity
    units:          PSU
    time:           ocean_time
    grid:           grid
    location:       face
    field:          salinity, scalar, series
    standard_name:  sea_water_practical_salinity
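A direct dictionary lookup raises a bare KeyError when a model does not provide a requested quantity. The sketch below is a hypothetical wrapper that reports which standard names are available instead. It assumes only the metadata shape seen above, a 'standard_names' entry mapping standard names to variable names. The metadata dict in the example is an illustrative subset, not real catalog output.

```python
def varname_for(metadata: dict, standard_name: str) -> str:
    """Return the model's variable name for a CF standard name.

    Hypothetical helper; assumes metadata looks like
    {"standard_names": {standard_name: variable_name, ...}}.
    """
    names = metadata.get("standard_names", {})
    try:
        return names[standard_name]
    except KeyError:
        available = ", ".join(sorted(names)) or "none"
        raise KeyError(
            f"{standard_name!r} is not available for this model; available: {available}"
        ) from None


# Illustrative subset of a source's metadata
metadata = {"standard_names": {"sea_water_practical_salinity": "salt"}}
print(varname_for(metadata, "sea_water_practical_salinity"))  # salt
```

With a real source, you would pass source_TBOFSRG.metadata and then index the Dataset with the returned name, as in ds[varname].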

Use cf-xarray to refer to the time axis without knowing the name of the time variable:

ds.cf['T']
<xarray.DataArray 'ocean_time' (ocean_time: 24)>
array(['2022-10-05T00:00:00.000000000', '2022-10-05T01:00:00.000000000',
       '2022-10-05T02:00:00.000000000', '2022-10-05T03:00:00.000000000',
       '2022-10-05T04:00:00.000000000', '2022-10-05T05:00:00.000000000',
       '2022-10-05T06:00:00.000000000', '2022-10-05T07:00:00.000000000',
       '2022-10-05T08:00:00.000000000', '2022-10-05T09:00:00.000000000',
       '2022-10-05T10:00:00.000000000', '2022-10-05T11:00:00.000000000',
       '2022-10-05T12:00:00.000000000', '2022-10-05T13:00:00.000000000',
       '2022-10-05T14:00:00.000000000', '2022-10-05T15:00:00.000000000',
       '2022-10-05T16:00:00.000000000', '2022-10-05T17:00:00.000000000',
       '2022-10-05T18:00:00.000000000', '2022-10-05T19:00:00.000000000',
       '2022-10-05T20:00:00.000000000', '2022-10-05T21:00:00.000000000',
       '2022-10-05T22:00:00.000000000', '2022-10-05T23:00:00.000000000'],
      dtype='datetime64[ns]')
Coordinates:
  * ocean_time  (ocean_time) datetime64[ns] 2022-10-05 ... 2022-10-05T23:00:00
Attributes:
    long_name:      time since initialization
    field:          time, scalar, series
    standard_name:  time
    axis:           T
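The axis: T attribute in the output above is what allows the time coordinate to be found by role instead of by name. The sketch below is a simplified stand-in for ds.cf['T']: a hypothetical find_axis helper that only checks the axis attribute. cf-xarray itself also uses other CF clues, such as standard_name and units, which this sketch ignores.

```python
import pandas as pd
import xarray as xr


def find_axis(ds: xr.Dataset, axis: str) -> xr.DataArray:
    """Return the single coordinate whose 'axis' attribute equals axis.

    Simplified, hypothetical stand-in for cf-xarray's ds.cf[axis].
    """
    matches = [name for name, coord in ds.coords.items() if coord.attrs.get("axis") == axis]
    if len(matches) != 1:
        raise KeyError(f"expected one coordinate with axis={axis!r}, found {matches}")
    return ds[matches[0]]


# Synthetic Dataset whose time coordinate has a model-specific name
time = pd.date_range("2022-10-05 00:00", "2022-10-05 23:00", periods=24)
ds = xr.Dataset(
    coords={"ocean_time": ("ocean_time", time, {"axis": "T", "standard_name": "time"})}
)
t = find_axis(ds, "T")
print(t.name, t.size)  # ocean_time 24
```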