Demo Workflow#
model_catalogs provides easy access to specific curated ocean models through Intake catalogs, with the ultimate product being an xarray Dataset.
An Intake catalog has one or more Sources listed in it; a Source is a distinct product that has a urlpath pointing to where the model output can be accessed. An Intake catalog can also contain nested catalogs, each of which has its own sources.
import model_catalogs as mc
import intake
import pandas as pd
import os
import xarray as xr
Main Catalog#
The first step is to set up the main catalog. The main catalog provides access to all of the models and the various model_sources that are available for each model. The catalog is set up by reading in the transform.yaml catalog file, pointing it to the hard-wired original (“orig”) catalog file for each model, and reading and copying in previously-calculated boundary information for the model (also saved in the repository). Every model catalog in the main catalog has at least a single model_source, but may also have others. Each model_source model catalog has a urlpath that points to a location for the model output so that a user can look at some model output. If the model output is available at a static link, that link is provided by urlpath. If the model output is available as a series of files that need to be aggregated (often for NOAA OFS models), then two example files are provided. Later in the process, the urlpath for all unaggregated models will be filled in with the full listing of file locations that are requested by the user.
The main catalog is compiled from parts that are saved to the repository, but some of the model catalog files have Intake user parameters for today and/or yesterday that are filled in when the main catalog is run. To save time, the individual compiled catalog files are saved into a user application cache and are considered “fresh” for 6 hours (set in mc.FRESH['compiled'] in the __init__ file) since they need to be recreated for each day. When mc.setup() is run, the catalog will be put together fresh if either more than 6 hours have passed since last time it was run or override=True is input to force it to be recreated. Otherwise, the previously-written “compiled” catalog files will be reread.
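The freshness rule described above can be sketched with the standard library alone. This is a minimal illustration, not the code model_catalogs uses: the helper name is_fresh, the use of file modification time, and the 6-hour default are assumptions made for the example.

```python
import os
import tempfile
import time


def is_fresh(path, max_age_hours=6.0, now=None):
    """Return True if `path` exists and was modified within `max_age_hours`.

    Hypothetical helper: assumes freshness is judged by file modification time.
    """
    if not os.path.exists(path):
        return False
    now = time.time() if now is None else now
    age_seconds = now - os.path.getmtime(path)
    return age_seconds <= max_age_hours * 3600


# A throwaway file stands in for a compiled catalog file in the cache.
with tempfile.NamedTemporaryFile(suffix=".yaml", delete=False) as f:
    demo_path = f.name

fresh_now = is_fresh(demo_path)  # just written, so well within 6 hours
# Pretend 7 hours have passed since the file was written.
fresh_later = is_fresh(demo_path, now=os.path.getmtime(demo_path) + 7 * 3600)
os.remove(demo_path)
missing = is_fresh(demo_path)  # the file no longer exists
```

A caller following this pattern would recompile the catalog whenever the check returns False or an override flag is set, and reread the cached file otherwise.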
Each model catalog nested in the main catalog points to an original model catalog as a “target” for an Intake transform in model_catalogs. This transform is run when model output is requested from the model with source.to_dask() and it applies metadata stored in the model catalogs. The original model information can also be revealed with source.target.
Sample main catalog
| Model | model_source | Description |
|---|---|---|
| CIOFS | | |
| | coops-forecast-agg | static link to pre-aggregated forecast model output |
| | coops-forecast-noagg | unaggregated model output from recently through forecast |
| | ncei-archive-noagg | unaggregated nowcast model output from the past through not too long ago |
| | ncei-archive-agg | aggregated forecast files from the past |
| WCOFS | | |
| | coops-forecast-noagg | unaggregated model output from recently through forecast |
| | ncei-archive-noagg | unaggregated nowcast model output from the past through not too long ago |
| GOFS | | |
| | hycom-forecast-agg | static link to forecast model output |
| RTOFS | | |
| | nomads-forecast-yesterday | known links to files, prepopulated with “yesterday” to get correct links |
| | nomads-forecast-today | known links to files, prepopulated with “today” to get correct links. These files are only available midday so aren’t as reliable to use as the forecast depending on the time of day. |
…all the rest are similar
NOTE: The model domain boundaries are previously calculated and saved into the repository. However, they can be recalculated with mc.calculate_boundaries() if a model has been changed.
NOTE: The user application cache directory can be revealed with mc.CAT_PATH.
main_cat = mc.setup()
# Or if you have changed a model catalog file, you can incorporate those changes into main_cat with:
# main_cat = mc.setup(override=True)
See what models are available with:
list(main_cat)
['NYOFS',
'CBOFS-RGRID',
'DBOFS-RGRID',
'WCOFS-RGRID',
'LEOFS-RGRID',
'RTOFS-EAST',
'RTOFS-AK',
'RTOFS-2D',
'WCOFS-2DS',
'GOFS',
'GFS-1-2DEG',
'SFBOFS-RGRID',
'RTOFS',
'CREOFS-RGRID',
'TBOFS',
'NGOFS2',
'GFS-1DEG',
'NGOFS2-RGRID',
'GOMOFS-2DS',
'CIOFS',
'LMHOFS-RGRID',
'GOMOFS-RGRID',
'LMHOFS',
'CIOFS-RGRID',
'DBOFS',
'LEOFS',
'SFBOFS',
'TBOFS-RGRID',
'NGOFS2-2DS',
'RTOFS-WEST',
'LOOFS',
'WCOFS',
'GOMOFS',
'GFS-1-4DEG',
'CREOFS',
'LSOFS',
'CBOFS']
Examine metadata of models in source catalog#
The main catalog contains information about each model. The full list of metadata keys for a model catalog can be found with:
main_cat['CBOFS'].metadata.keys()
dict_keys(['alpha_shape', 'bounding_box', 'filetype', 'geospatial_bounds', 'grid_dim_(xyz)', 'grid_res_max_(km)', 'grid_res_min_(km)', 'grid_type', 'html_desc', 'long_name', 'catalog_dir'])
Each source also has its own metadata:
main_cat['CBOFS']['coops-forecast-agg'].metadata.keys()
dict_keys(['axis', 'catalog_dir', 'output_period_(hr)', 'overall_end_datetime', 'overall_start_datetime', 'standard_names'])
Domain boundaries#
import shapely.wkt
P = shapely.wkt.loads(main_cat['CBOFS'].metadata['geospatial_bounds'])
P
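If shapely is not available, the extent of a simple boundary polygon can still be read from its WKT text. The sketch below assumes the string is a single 2D POLYGON with coordinates in longitude-latitude order; the helper name wkt_polygon_bounds and the rectangle used are illustrative only and are not the real CBOFS boundary.

```python
import re


def wkt_polygon_bounds(wkt):
    """Return (min_lon, min_lat, max_lon, max_lat) for a simple WKT POLYGON.

    Only the exterior ring of a single 2D polygon is handled; use shapely
    for anything more complicated.
    """
    match = re.match(r"\s*POLYGON\s*\(\((?P<ring>[^()]*)\)", wkt)
    if match is None:
        raise ValueError("expected a WKT POLYGON")
    # Each vertex is "lon lat"; vertices are separated by commas.
    points = [
        tuple(float(v) for v in pair.split())
        for pair in match["ring"].split(",")
    ]
    lons = [p[0] for p in points]
    lats = [p[1] for p in points]
    return min(lons), min(lats), max(lons), max(lats)


# Illustrative rectangle, NOT the real CBOFS domain boundary.
demo_wkt = "POLYGON ((-77.5 36.5, -75.5 36.5, -75.5 39.5, -77.5 39.5, -77.5 36.5))"
bounds = wkt_polygon_bounds(demo_wkt)
```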
Variables#
A mapping of the variables relevant to NOAA applications has been written into each source catalog. The mapping is from a standard CF convention variable name to the model dataset variable name. An example list of possible variables used is:
eastward_sea_water_velocity
eastward_wind
northward_sea_water_velocity
northward_wind
sea_ice_area_fraction
sea_ice_thickness
sea_surface_elevation
sea_water_temperature
sea_water_practical_salinity
The full standard_name list is available in the docs.
Examine the variable mapping for a given model:
main_cat['CBOFS']['coops-forecast-agg'].metadata['standard_names']
{'angle': 'angle_of_rotation_from_east_to_x',
'eastward_sea_water_velocity': 'u',
'eastward_wind': 'Uwind',
'northward_sea_water_velocity': 'v',
'northward_wind': 'Vwind',
'ocean_s_coordinate_g1': ['s_rho', 's_w'],
'sea_floor_depth': 'h',
'sea_surface_elevation': 'zeta',
'sea_water_practical_salinity': 'salt',
'sea_water_temperature': 'temp',
'time': 'time',
'upward_sea_water_velocity': 'w'}
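Because some entries map to a single variable name and others to a list, it can help to normalize lookups. The following is a small sketch using a subset of the mapping shown above; the helper names variables_for and standard_name_for are hypothetical and are not part of model_catalogs.

```python
# Subset of the CBOFS "coops-forecast-agg" mapping shown above.
standard_names = {
    "eastward_sea_water_velocity": "u",
    "northward_sea_water_velocity": "v",
    "ocean_s_coordinate_g1": ["s_rho", "s_w"],
    "sea_floor_depth": "h",
    "sea_surface_elevation": "zeta",
    "sea_water_practical_salinity": "salt",
    "sea_water_temperature": "temp",
}


def variables_for(standard_name, mapping):
    """Return the dataset variable name(s) for a standard name, always as a list."""
    value = mapping[standard_name]
    return list(value) if isinstance(value, (list, tuple)) else [value]


def standard_name_for(varname, mapping):
    """Reverse lookup: the standard name whose entry is or contains `varname`."""
    for key, value in mapping.items():
        names = value if isinstance(value, (list, tuple)) else [value]
        if varname in names:
            return key
    return None


salt_vars = variables_for("sea_water_practical_salinity", standard_names)
vertical_vars = variables_for("ocean_s_coordinate_g1", standard_names)
zeta_name = standard_name_for("zeta", standard_names)
```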
Conveniences: urlpath, model output source, and server check#
You can find the model output location(s) by querying urlpath on a source. Every source has a default urlpath that can be checked.
main_cat['CBOFS']['coops-forecast-noagg'].urlpath # shows 2 default files
['https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CBOFS/MODELS/2022/10/04/nos.cbofs.fields.n001.20221004.t00z.nc',
'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CBOFS/MODELS/2022/10/04/nos.cbofs.fields.n002.20221004.t00z.nc']
For sources whose urlpath is a list of files to be aggregated (rather than a single static link), the datetimes associated with those files can be examined with the dates attribute of the source:
main_cat['CBOFS']['coops-forecast-noagg'].dates
[Timestamp('2022-10-03 19:00:00'), Timestamp('2022-10-03 20:00:00')]
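The relationship between a file name and its datetime can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: the function valid_time is hypothetical, and the rule that nowcast ("n") steps count forward from 6 hours before the cycle time while forecast ("f") steps count forward from the cycle time is inferred only from the sample output on this page.

```python
import re
from datetime import datetime, timedelta

# Matches the tail of names like "nos.cbofs.fields.n001.20221004.t00z.nc".
FILE_RE = re.compile(
    r"\.(?P<kind>[nf])(?P<step>\d{3})\.(?P<date>\d{8})\.t(?P<cycle>\d{2})z\.nc$"
)


def valid_time(url, nowcast_hours=6):
    """Estimate the datetime a NOAA OFS file represents from its name.

    Hypothetical helper; `nowcast_hours=6` is an assumption inferred from
    the sample output shown above.
    """
    m = FILE_RE.search(url)
    if m is None:
        raise ValueError(f"unrecognized file name: {url}")
    cycle = datetime.strptime(m["date"] + m["cycle"], "%Y%m%d%H")
    step = timedelta(hours=int(m["step"]))
    if m["kind"] == "n":
        # Assumption: nowcast steps start `nowcast_hours` before the cycle.
        return cycle - timedelta(hours=nowcast_hours) + step
    # Assumption: forecast steps start at the cycle time.
    return cycle + step


base = "https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CBOFS/MODELS/2022/10/04/"
urls = [
    base + "nos.cbofs.fields.n001.20221004.t00z.nc",
    base + "nos.cbofs.fields.n002.20221004.t00z.nc",
]
times = [valid_time(u) for u in urls]
```

Under these assumptions, the two default CBOFS files map to the same two timestamps that the dates attribute reports above.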
The sources in model_catalogs are Intake Transforms, each pointing to an original Intake catalog source that is the target of the transform. It is the target that needs to have urlpath defined; however, model_catalogs brings the urlpath forward so you can query it from the transform too. The original model information, the target of the transform, can be found with source.target. For example:
main_cat['CBOFS']['coops-forecast-noagg'].target
coops-forecast-noagg:
args:
combine: by_coords
compat: override
coords: minimal
data_vars: minimal
engine: netcdf4
parallel: true
urlpath:
- https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CBOFS/MODELS/2022/10/04/nos.cbofs.fields.n001.20221004.t00z.nc
- https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CBOFS/MODELS/2022/10/04/nos.cbofs.fields.n002.20221004.t00z.nc
description: Unaggregated 3D Chesapeake Bay model in ROMS from 30 days ago with
48 hour forecast.
driver: intake_xarray.opendap.OpenDapSource
metadata:
axis:
T: ocean_time
X:
- xi_rho
- xi_u
- xi_v
- xi_psi
Y:
- eta_rho
- eta_u
- eta_v
- eta_psi
Z:
- s_rho
- s_w
catalog_dir: /Users/kthyng/projects/model_catalogs/model_catalogs/catalogs/orig/
catloc: https://opendap.co-ops.nos.noaa.gov/thredds/catalog/NOAA/CBOFS/MODELS/catalog.xml
output_period_(hr): 1
overall_end_datetime: 48 hours after present time
overall_start_datetime: 1 month before present day
sample_locs:
- https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CBOFS/MODELS/2022/10/04/nos.cbofs.fields.n001.20221004.t00z.nc
- https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CBOFS/MODELS/2022/10/04/nos.cbofs.fields.n002.20221004.t00z.nc
standard_names:
angle: angle_of_rotation_from_east_to_x
eastward_sea_water_velocity: u
eastward_wind: Uwind
northward_sea_water_velocity: v
northward_wind: Vwind
ocean_s_coordinate_g1:
- s_rho
- s_w
sea_floor_depth: h
sea_surface_elevation: zeta
sea_water_practical_salinity: salt
sea_water_temperature: temp
time: ocean_time
upward_sea_water_velocity: w
You can also check whether the server for a given Source is currently running with:
main_cat['CBOFS']['coops-forecast-noagg'].status
True
Find availability for a model#
You can query a specific model for its availability for any or all model_sources. This step is optional, but it helps a user decide which model and model_source to use and see what date ranges are possible. The start and end datetimes found are saved into the user application cache and are reused if still considered fresh (as determined by the FRESH parameters for each model_source and for start or end times).
If no model_sources are input, the availability will be found and inserted into each source in the input model catalog:
mc.find_availability(main_cat['CIOFS']); this returns an Intake Catalog
Alternatively, the user can choose one or more model_sources to check, for example if they are only interested in the available datetime range of a particular model_source:
mc.find_availability(main_cat['CIOFS'], model_source='coops-forecast-agg'); this returns an Intake Catalog
Or, simplest of all, the user can directly input the Intake Source they want to evaluate:
mc.find_availability(main_cat['CIOFS']['coops-forecast-agg']); this returns an Intake Source
In the following example, some model_sources for the CIOFS model may still be fresh for the first call below. The second time the request is made, all previously-calculated availabilities are fresh so it is fast to run.
%%time
cat = mc.find_availability(main_cat['CIOFS'])
CPU times: user 488 ms, sys: 14.9 ms, total: 503 ms
Wall time: 3.62 s
%%time
cat = mc.find_availability(main_cat['CIOFS'], verbose=True)
coops-forecast-agg: 2022-09-28T01:00:00.000000000 to 2022-10-07T12:00:00.000000000
coops-forecast-noagg: 2022-09-04 13:00:00 to 2022-10-07 12:00:00
ncei-archive-agg: 2019-09-16T19:00:00.000000000 to 2022-10-01T18:00:00.000000000
ncei-archive-noagg: 2019-09-16 19:00:00 to 2022-10-01 18:00:00
CPU times: user 6.28 ms, sys: 1.87 ms, total: 8.15 ms
Wall time: 7.03 ms
Setup model catalog for desired date range#
For unaggregated model_sources for NOAA OFS models like “coops-forecast-noagg”, the locations of the necessary model files need to be determined and saved to urlpath in the source, based on the date range desired. mc.select_date_range() does this.
There are four modes for mc.select_date_range():
1. Run mc.find_availability() for all model_sources, then request your desired date range in mc.select_date_range() and the code will figure out what model_source to use.
2. Skip straight to mc.select_date_range() because you know which model_source to use for your model, and input that model_source.
3. In the special case of there only being one model_source in the cat, it does not need to be input.
4. In the special case of the model output being available at a static link or deterministic paths, the function of mc.select_date_range() is to input the desired start and end dates into the metadata so that the ultimate Dataset is subselected to that time range.
mc.select_date_range() can take in an Intake Catalog or Source, but always returns a Source.
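The idea behind the first mode, matching a requested date range against the availability of each model_source, can be sketched in plain Python. The availability windows below are copied from the verbose CIOFS output shown earlier; the helper sources_covering is hypothetical, and how model_catalogs chooses among several matching model_sources is not described here, so the sketch only lists the candidates.

```python
from datetime import datetime

# Availability windows copied from the verbose CIOFS output shown earlier.
availability = {
    "coops-forecast-agg": (datetime(2022, 9, 28, 1), datetime(2022, 10, 7, 12)),
    "coops-forecast-noagg": (datetime(2022, 9, 4, 13), datetime(2022, 10, 7, 12)),
    "ncei-archive-agg": (datetime(2019, 9, 16, 19), datetime(2022, 10, 1, 18)),
    "ncei-archive-noagg": (datetime(2019, 9, 16, 19), datetime(2022, 10, 1, 18)),
}


def sources_covering(start, end, windows):
    """Return the model_source names whose window fully contains [start, end]."""
    return [
        name
        for name, (first, last) in windows.items()
        if first <= start and end <= last
    ]


# A recent request is covered only by the forecast model_sources.
recent = sources_covering(datetime(2022, 10, 5), datetime(2022, 10, 6), availability)
# An older request is covered only by the archive model_sources.
older = sources_covering(datetime(2020, 1, 1), datetime(2020, 1, 2), availability)
```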
NOTE:
Some model sources have static links to the model output, in which case they can be immediately accessed, like the GOFS forecast. For those models, mc.select_date_range() does not need to be run and the model output can be accessed directly from the main catalog. You don’t need to know this though — you can use the same process described here too.
NOTE:
All model sources have working sample files stored in urlpath so that you can always run main_cat[model][model_source].to_dask() and get something back. However, these are only sample files and will be replaced with the proper file paths when a user runs mc.select_date_range().
Example of #1. After this is run, model output can be accessed through source_WCOFS.
start_date = pd.Timestamp.today()
main_cat = mc.setup()
cat = mc.find_availability(main_cat['WCOFS'])
source_WCOFS = mc.select_date_range(cat, start_date=start_date, end_date=start_date, override=True)
source_WCOFS.urlpath # shows files for the date range selection
['https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/WCOFS/MODELS/2022/10/05/nos.wcofs.fields.n003.20221005.t03z.nc',
'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/WCOFS/MODELS/2022/10/05/nos.wcofs.fields.n006.20221005.t03z.nc',
'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/WCOFS/MODELS/2022/10/05/nos.wcofs.fields.n009.20221005.t03z.nc',
'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/WCOFS/MODELS/2022/10/05/nos.wcofs.fields.n012.20221005.t03z.nc',
'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/WCOFS/MODELS/2022/10/05/nos.wcofs.fields.n015.20221005.t03z.nc',
'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/WCOFS/MODELS/2022/10/05/nos.wcofs.fields.n018.20221005.t03z.nc',
'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/WCOFS/MODELS/2022/10/05/nos.wcofs.fields.n021.20221005.t03z.nc',
'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/WCOFS/MODELS/2022/10/05/nos.wcofs.fields.n024.20221005.t03z.nc']
The dates associated with the newly-found filenames in urlpath can be checked with source.dates:
source_WCOFS.dates
[Timestamp('2022-10-05 00:00:00'),
Timestamp('2022-10-05 03:00:00'),
Timestamp('2022-10-05 06:00:00'),
Timestamp('2022-10-05 09:00:00'),
Timestamp('2022-10-05 12:00:00'),
Timestamp('2022-10-05 15:00:00'),
Timestamp('2022-10-05 18:00:00'),
Timestamp('2022-10-05 21:00:00')]
Example of #2:
today = pd.Timestamp.today()
main_cat = mc.setup()
source_LMHOFS = mc.select_date_range(main_cat['LMHOFS']['coops-forecast-noagg'], start_date=today, end_date=today)
source_LMHOFS.urlpath # shows files for the date range selection
['https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/LMHOFS/MODELS/2022/10/05/nos.lmhofs.fields.n000.20221005.t06z.nc',
'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/LMHOFS/MODELS/2022/10/05/nos.lmhofs.fields.n001.20221005.t06z.nc',
'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/LMHOFS/MODELS/2022/10/05/nos.lmhofs.fields.n002.20221005.t06z.nc',
'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/LMHOFS/MODELS/2022/10/05/nos.lmhofs.fields.n003.20221005.t06z.nc',
'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/LMHOFS/MODELS/2022/10/05/nos.lmhofs.fields.n004.20221005.t06z.nc',
'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/LMHOFS/MODELS/2022/10/05/nos.lmhofs.fields.n005.20221005.t06z.nc',
'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/LMHOFS/MODELS/2022/10/05/nos.lmhofs.fields.n006.20221005.t06z.nc',
'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/LMHOFS/MODELS/2022/10/05/nos.lmhofs.fields.n001.20221005.t12z.nc',
'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/LMHOFS/MODELS/2022/10/05/nos.lmhofs.fields.n002.20221005.t12z.nc',
'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/LMHOFS/MODELS/2022/10/05/nos.lmhofs.fields.n003.20221005.t12z.nc',
'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/LMHOFS/MODELS/2022/10/05/nos.lmhofs.fields.n004.20221005.t12z.nc',
'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/LMHOFS/MODELS/2022/10/05/nos.lmhofs.fields.n005.20221005.t12z.nc',
'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/LMHOFS/MODELS/2022/10/05/nos.lmhofs.fields.n006.20221005.t12z.nc',
'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/LMHOFS/MODELS/2022/10/05/nos.lmhofs.fields.n001.20221005.t18z.nc',
'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/LMHOFS/MODELS/2022/10/05/nos.lmhofs.fields.n002.20221005.t18z.nc',
'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/LMHOFS/MODELS/2022/10/05/nos.lmhofs.fields.n003.20221005.t18z.nc',
'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/LMHOFS/MODELS/2022/10/05/nos.lmhofs.fields.n004.20221005.t18z.nc',
'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/LMHOFS/MODELS/2022/10/05/nos.lmhofs.fields.n005.20221005.t18z.nc',
'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/LMHOFS/MODELS/2022/10/05/nos.lmhofs.fields.f000.20221005.t18z.nc',
'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/LMHOFS/MODELS/2022/10/05/nos.lmhofs.fields.f001.20221005.t18z.nc',
'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/LMHOFS/MODELS/2022/10/05/nos.lmhofs.fields.f002.20221005.t18z.nc',
'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/LMHOFS/MODELS/2022/10/05/nos.lmhofs.fields.f003.20221005.t18z.nc',
'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/LMHOFS/MODELS/2022/10/05/nos.lmhofs.fields.f004.20221005.t18z.nc',
'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/LMHOFS/MODELS/2022/10/05/nos.lmhofs.fields.f005.20221005.t18z.nc']
Example of #3 and #4:
today = pd.Timestamp.today()
main_cat = mc.setup()
source_GOFS = mc.select_date_range(main_cat['GOFS'], start_date=today, end_date=today+pd.Timedelta('1 day'))
source_GOFS.urlpath
'http://tds.hycom.org/thredds/dodsC/GLBy0.08/latest'
Note that since the GOFS forecast output is available at a static link, it can also be accessed directly. The difference is that the Dataset read directly (source_GOFS_direct below) retains all times in the output, whereas the Dataset from source_GOFS is filtered to the specified date range.
source_GOFS_direct = main_cat['GOFS']['hycom-forecast-agg'].to_dask()
Direct workflow#
If you already know what model and model_source you want to use, you can skip ahead to get your desired date range with:
source = mc.select_date_range(main_cat['CIOFS']['coops-forecast-noagg'], start_date=start_date, end_date=start_date)
source.urlpath
['https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CIOFS/MODELS/2022/10/05/nos.ciofs.fields.n006.20221005.t00z.nc',
'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CIOFS/MODELS/2022/10/05/nos.ciofs.fields.n001.20221005.t06z.nc',
'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CIOFS/MODELS/2022/10/05/nos.ciofs.fields.n002.20221005.t06z.nc',
'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CIOFS/MODELS/2022/10/05/nos.ciofs.fields.n003.20221005.t06z.nc',
'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CIOFS/MODELS/2022/10/05/nos.ciofs.fields.n004.20221005.t06z.nc',
'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CIOFS/MODELS/2022/10/05/nos.ciofs.fields.n005.20221005.t06z.nc',
'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CIOFS/MODELS/2022/10/05/nos.ciofs.fields.n006.20221005.t06z.nc',
'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CIOFS/MODELS/2022/10/05/nos.ciofs.fields.n001.20221005.t12z.nc',
'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CIOFS/MODELS/2022/10/05/nos.ciofs.fields.n002.20221005.t12z.nc',
'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CIOFS/MODELS/2022/10/05/nos.ciofs.fields.n003.20221005.t12z.nc',
'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CIOFS/MODELS/2022/10/05/nos.ciofs.fields.n004.20221005.t12z.nc',
'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CIOFS/MODELS/2022/10/05/nos.ciofs.fields.n005.20221005.t12z.nc',
'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CIOFS/MODELS/2022/10/05/nos.ciofs.fields.n006.20221005.t12z.nc',
'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CIOFS/MODELS/2022/10/05/nos.ciofs.fields.f001.20221005.t12z.nc',
'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CIOFS/MODELS/2022/10/05/nos.ciofs.fields.f002.20221005.t12z.nc',
'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CIOFS/MODELS/2022/10/05/nos.ciofs.fields.f003.20221005.t12z.nc',
'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CIOFS/MODELS/2022/10/05/nos.ciofs.fields.f004.20221005.t12z.nc',
'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CIOFS/MODELS/2022/10/05/nos.ciofs.fields.f005.20221005.t12z.nc',
'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CIOFS/MODELS/2022/10/05/nos.ciofs.fields.f006.20221005.t12z.nc',
'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CIOFS/MODELS/2022/10/05/nos.ciofs.fields.f007.20221005.t12z.nc',
'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CIOFS/MODELS/2022/10/05/nos.ciofs.fields.f008.20221005.t12z.nc',
'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CIOFS/MODELS/2022/10/05/nos.ciofs.fields.f009.20221005.t12z.nc',
'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CIOFS/MODELS/2022/10/05/nos.ciofs.fields.f010.20221005.t12z.nc',
'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CIOFS/MODELS/2022/10/05/nos.ciofs.fields.f011.20221005.t12z.nc']
Read in model output#
Model output can be accessed from our examples 1 and 2 above, or from an example with a static link.
Example #1:
%%time
source_WCOFS.to_dask()
CPU times: user 3.83 s, sys: 416 ms, total: 4.25 s
Wall time: 7.83 s
<xarray.Dataset>
Dimensions: (tracer: 2, boundary: 4, s_rho: 40, s_w: 41, eta_rho: 1016,
xi_rho: 348, eta_u: 1016, xi_u: 347, eta_v: 1015,
xi_v: 348, eta_psi: 1015, xi_psi: 347, ocean_time: 2)
Coordinates: (12/19)
* s_rho (s_rho) float64 -0.9875 -0.9625 -0.9375 ... -0.0375 -0.0125
* s_w (s_w) float64 -1.0 -0.975 -0.95 -0.925 ... -0.05 -0.025 0.0
lon_rho (eta_rho, xi_rho) float64 dask.array<chunksize=(1016, 348), meta=np.ndarray>
lat_rho (eta_rho, xi_rho) float64 dask.array<chunksize=(1016, 348), meta=np.ndarray>
lon_u (eta_u, xi_u) float64 dask.array<chunksize=(1016, 347), meta=np.ndarray>
lat_u (eta_u, xi_u) float64 dask.array<chunksize=(1016, 347), meta=np.ndarray>
... ...
* xi_v (xi_v) int64 0 1 2 3 4 5 6 7 ... 341 342 343 344 345 346 347
* xi_psi (xi_psi) int64 0 1 2 3 4 5 6 ... 340 341 342 343 344 345 346
* eta_rho (eta_rho) int64 0 1 2 3 4 5 ... 1010 1011 1012 1013 1014 1015
* eta_u (eta_u) int64 0 1 2 3 4 5 6 ... 1010 1011 1012 1013 1014 1015
* eta_v (eta_v) int64 0 1 2 3 4 5 6 ... 1009 1010 1011 1012 1013 1014
* eta_psi (eta_psi) int64 0 1 2 3 4 5 ... 1009 1010 1011 1012 1013 1014
Dimensions without coordinates: tracer, boundary
Data variables: (12/80)
ntimes int32 ...
ndtfast int32 ...
dt float64 ...
dtfast float64 ...
dstart datetime64[ns] ...
nHIS int32 ...
... ...
vbar (ocean_time, eta_v, xi_v) float32 dask.array<chunksize=(1, 1015, 348), meta=np.ndarray>
u (ocean_time, s_rho, eta_u, xi_u) float32 dask.array<chunksize=(1, 40, 1016, 347), meta=np.ndarray>
v (ocean_time, s_rho, eta_v, xi_v) float32 dask.array<chunksize=(1, 40, 1015, 348), meta=np.ndarray>
w (ocean_time, s_w, eta_rho, xi_rho) float32 dask.array<chunksize=(1, 41, 1016, 348), meta=np.ndarray>
temp (ocean_time, s_rho, eta_rho, xi_rho) float32 dask.array<chunksize=(1, 40, 1016, 348), meta=np.ndarray>
salt (ocean_time, s_rho, eta_rho, xi_rho) float32 dask.array<chunksize=(1, 40, 1016, 348), meta=np.ndarray>
Attributes: (12/37)
file: nos.wcofs.fields.nowcast.20221005.t03z_0...
format: netCDF-3 64bit offset file
Conventions: CF-1.4, SGRID-0.3
type: ROMS/TOMS history file
title: wcofs nowcast RUN in operational mode
var_info: varinfo.dat
... ...
tiling: 008x060
history: ROMS/TOMS, Version 3.9, Wednesday - Octo...
ana_file: ROMS/Functionals/ana_btflux.h
CPP_options: mode, ADD_FSOBC, ADD_M2OBC, ANA_BSFLUX, ...
DODS_EXTRA.Unlimited_Dimension: ocean_time
EXTRA_DIMENSION.N: 40
Example #2:
%%time
source_LMHOFS.to_dask()
CPU times: user 3.28 s, sys: 1.16 s, total: 4.45 s
Wall time: 14.7 s
<xarray.Dataset>
Dimensions: (nele: 171377, node: 90806, three: 3, time: 24,
maxnode: 11, maxelem: 9, four: 4, siglay: 20,
siglev: 21)
Coordinates:
lon (node) float32 dask.array<chunksize=(90806,), meta=np.ndarray>
lat (node) float32 dask.array<chunksize=(90806,), meta=np.ndarray>
lonc (nele) float32 dask.array<chunksize=(171377,), meta=np.ndarray>
latc (nele) float32 dask.array<chunksize=(171377,), meta=np.ndarray>
* time (time) datetime64[ns] 2022-10-05 ... 2022-10-05T23:00...
sigma_layers (siglay, node) float32 dask.array<chunksize=(20, 90806), meta=np.ndarray>
sigma_levels (siglev, node) float32 dask.array<chunksize=(21, 90806), meta=np.ndarray>
Dimensions without coordinates: nele, node, three, maxnode, maxelem, four,
siglay, siglev
Data variables: (12/48)
nprocs int32 ...
partition (nele) int32 dask.array<chunksize=(171377,), meta=np.ndarray>
x (node) float32 dask.array<chunksize=(90806,), meta=np.ndarray>
y (node) float32 dask.array<chunksize=(90806,), meta=np.ndarray>
xc (nele) float32 dask.array<chunksize=(171377,), meta=np.ndarray>
yc (nele) float32 dask.array<chunksize=(171377,), meta=np.ndarray>
... ...
atmos_press (time, node) float32 dask.array<chunksize=(1, 90806), meta=np.ndarray>
wet_nodes (time, node) int32 dask.array<chunksize=(1, 90806), meta=np.ndarray>
wet_cells (time, nele) int32 dask.array<chunksize=(1, 171377), meta=np.ndarray>
wet_nodes_prev_int (time, node) int32 dask.array<chunksize=(1, 90806), meta=np.ndarray>
wet_cells_prev_int (time, nele) int32 dask.array<chunksize=(1, 171377), meta=np.ndarray>
wet_cells_prev_ext (time, nele) int32 dask.array<chunksize=(1, 171377), meta=np.ndarray>
Attributes: (12/17)
title: LMHOFS
institution: School for Marine Science and Technology
source: FVCOM_4.3
history: model started at: 05/10/2022 07:15
references: http://fvcom.smast.umassd.edu, http://co...
Conventions: CF-1.0
... ...
Surface_Heat_Forcing: FVCOM variable surface heat forcing file...
Surface_Wind_Forcing: FVCOM variable surface Wind forcing:\nFI...
Surface_PrecipEvap_Forcing: FVCOM periodic surface precip forcing:\n...
DODS.strlen: 26
DODS.dimName: DateStrLen
DODS_EXTRA.Unlimited_Dimension: time
Example #3 (if you know you have direct access to static link):
%%time
main_cat['GOFS']['hycom-forecast-agg'].to_dask()
CPU times: user 121 µs, sys: 1 µs, total: 122 µs
Wall time: 126 µs
<xarray.Dataset>
Dimensions: (depth: 40, lat: 4251, lon: 4500, time: 8)
Coordinates:
* depth (depth) float64 0.0 2.0 4.0 6.0 ... 3e+03 4e+03 5e+03
* lat (lat) float64 -80.0 -79.96 -79.92 ... 89.92 89.96 90.0
* lon (lon) float64 0.0 0.07996 0.16 0.24 ... 359.8 359.8 359.9
* time (time) datetime64[ns] 2022-10-05T18:00:00 ... 2022-10-...
time_run (time) datetime64[ns] 2022-10-04T12:00:00 ... 2022-10-...
Data variables:
time_offset (time) datetime64[ns] 2022-09-30T18:00:00 ... 2022-10-...
water_u (time, depth, lat, lon) float32 ...
water_u_bottom (time, lat, lon) float32 ...
water_v (time, depth, lat, lon) float32 ...
water_v_bottom (time, lat, lon) float32 ...
water_temp (time, depth, lat, lon) float32 ...
water_temp_bottom (time, lat, lon) float32 ...
salinity (time, depth, lat, lon) float32 ...
salinity_bottom (time, lat, lon) float32 ...
surf_el (time, lat, lon) float32 ...
Attributes: (12/14)
classification_level: UNCLASSIFIED
distribution_statement: Approved for public release. Distribution unli...
downgrade_date: not applicable
classification_authority: not applicable
institution: Fleet Numerical Meteorology and Oceanography C...
source: HYCOM archive file
... ...
field_type: instantaneous
Conventions: CF-1.4, NAVO_netcdf_v1.1
_CoordSysBuilder: ucar.nc2.dataset.conv.CF1Convention
cdm_data_type: GRID
featureType: GRID
location: Proto fmrc:GLBy0.08_930_FMRC
Other topics#
NOAA OFS models: how to use filetypes besides default 3D “fields”#
All NOAA OFS model configurations are available with 3D fields filetypes. However, for some models there are other filetypes:
regular_grid: model output interpolated to rectilinear grid
2ds: only surface model output, variable names changed
You can see what model configurations are available, specifically including filetypes, by looking at the main catalog since they are in the catalog name:
list(main_cat)
['NYOFS',
'CBOFS-RGRID',
'DBOFS-RGRID',
'WCOFS-RGRID',
'LEOFS-RGRID',
'RTOFS-EAST',
'RTOFS-AK',
'RTOFS-2D',
'WCOFS-2DS',
'GOFS',
'GFS-1-2DEG',
'SFBOFS-RGRID',
'RTOFS',
'CREOFS-RGRID',
'TBOFS',
'NGOFS2',
'GFS-1DEG',
'NGOFS2-RGRID',
'GOMOFS-2DS',
'CIOFS',
'LMHOFS-RGRID',
'GOMOFS-RGRID',
'LMHOFS',
'CIOFS-RGRID',
'DBOFS',
'LEOFS',
'SFBOFS',
'TBOFS-RGRID',
'NGOFS2-2DS',
'RTOFS-WEST',
'LOOFS',
'WCOFS',
'GOMOFS',
'GFS-1-4DEG',
'CREOFS',
'LSOFS',
'CBOFS']
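Since the filetype is encoded in the catalog name, the available filetypes per NOAA OFS model can be tabulated from the names alone. The sketch below assumes that a "-RGRID" suffix marks regular_grid output, a "-2DS" suffix marks 2ds output, and no suffix means the default 3D fields; the helper names are hypothetical, and the sketch is meant for NOAA OFS names only (names such as RTOFS-EAST or GFS-1-2DEG would simply be treated as their own models).

```python
# A few NOAA OFS catalog names taken from the list above.
names = [
    "CBOFS", "CBOFS-RGRID",
    "WCOFS", "WCOFS-RGRID", "WCOFS-2DS",
    "NGOFS2", "NGOFS2-RGRID", "NGOFS2-2DS",
    "TBOFS", "TBOFS-RGRID",
]

# Assumption: these suffixes correspond to the filetypes described above.
SUFFIX_TO_FILETYPE = {"-RGRID": "regular_grid", "-2DS": "2ds"}


def split_name(name):
    """Split a catalog name into (model, filetype); no suffix means 'fields'."""
    for suffix, filetype in SUFFIX_TO_FILETYPE.items():
        if name.endswith(suffix):
            return name[: -len(suffix)], filetype
    return name, "fields"


def filetypes_by_model(catalog_names):
    """Group filetypes under each base model name, preserving input order."""
    grouped = {}
    for name in catalog_names:
        model, filetype = split_name(name)
        grouped.setdefault(model, []).append(filetype)
    return grouped


by_model = filetypes_by_model(names)
```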
start_date = pd.Timestamp.today()
source_TBOFSRG = mc.select_date_range(main_cat['TBOFS-RGRID'], model_source='coops-forecast-noagg',
start_date=start_date, end_date=start_date)
ds = source_TBOFSRG.to_dask()
Access variables and axis#
Metadata has been added to the model Datasets to make it easier to access variables and axes. Variable access is set up by adding standard variable names to the Datasets when they are opened, based on the variables available; axis access is set up by first adding some attributes and then using cf-xarray. See the following examples.
Filter an xarray Dataset by the attribute of standard_name (returns an xarray Dataset):
ds.filter_by_attrs(standard_name='sea_water_practical_salinity')
<xarray.Dataset>
Dimensions: (ocean_time: 24, Depth: 11, ny: 393, nx: 329)
Coordinates:
* Depth (Depth) float64 0.0 2.0 4.0 6.0 8.0 ... 12.0 15.0 20.0 25.0 30.0
Latitude (ny, nx) float64 dask.array<chunksize=(393, 329), meta=np.ndarray>
Longitude (ny, nx) float64 dask.array<chunksize=(393, 329), meta=np.ndarray>
* ocean_time (ocean_time) datetime64[ns] 2022-10-05 ... 2022-10-05T23:00:00
* nx (nx) int64 0 1 2 3 4 5 6 7 8 ... 321 322 323 324 325 326 327 328
* ny (ny) int64 0 1 2 3 4 5 6 7 8 ... 385 386 387 388 389 390 391 392
Data variables:
salt (ocean_time, Depth, ny, nx) float32 dask.array<chunksize=(1, 11, 393, 329), meta=np.ndarray>
Attributes: (12/33)
file: nos.tbofs.fields.nowcast.20221005.t00z_0...
format: netCDF-4/HDF5 file
Conventions: CF-1.4, SGRID-0.3
type: ROMS/TOMS history file
title: tbofs nowcast RUN in operational mode
var_info: varinfo.dat
... ...
compiler_flags: -fp-model precise -ip -O2
tiling: 008x016
history: ROMS/TOMS, Version 3.9, Wednesday - Octo...
ana_file: ROMS/Functionals/ana_btflux.h, ROMS/Func...
CPP_options: mode, ADD_FSOBC, ADD_M2OBC, ANA_BSFLUX, ...
DODS_EXTRA.Unlimited_Dimension: ocean_time
Alternatively, you could back out the variable name and use it directly (returns an xarray DataArray):
varname = source_TBOFSRG.metadata['standard_names']['sea_water_practical_salinity']
ds[varname]
<xarray.DataArray 'salt' (ocean_time: 24, Depth: 11, ny: 393, nx: 329)>
dask.array<concatenate, shape=(24, 11, 393, 329), dtype=float32, chunksize=(1, 11, 393, 329), chunktype=numpy.ndarray>
Coordinates:
* Depth (Depth) float64 0.0 2.0 4.0 6.0 8.0 ... 12.0 15.0 20.0 25.0 30.0
Latitude (ny, nx) float64 dask.array<chunksize=(393, 329), meta=np.ndarray>
Longitude (ny, nx) float64 dask.array<chunksize=(393, 329), meta=np.ndarray>
* ocean_time (ocean_time) datetime64[ns] 2022-10-05 ... 2022-10-05T23:00:00
* nx (nx) int64 0 1 2 3 4 5 6 7 8 ... 321 322 323 324 325 326 327 328
* ny (ny) int64 0 1 2 3 4 5 6 7 8 ... 385 386 387 388 389 390 391 392
Attributes:
long_name: salinity
units: PSU
time: ocean_time
grid: grid
location: face
field: salinity, scalar, series
standard_name: sea_water_practical_salinity
Refer to time axis without knowing the time variable name with cf-xarray:
ds.cf['T']
<xarray.DataArray 'ocean_time' (ocean_time: 24)>
array(['2022-10-05T00:00:00.000000000', '2022-10-05T01:00:00.000000000',
'2022-10-05T02:00:00.000000000', '2022-10-05T03:00:00.000000000',
'2022-10-05T04:00:00.000000000', '2022-10-05T05:00:00.000000000',
'2022-10-05T06:00:00.000000000', '2022-10-05T07:00:00.000000000',
'2022-10-05T08:00:00.000000000', '2022-10-05T09:00:00.000000000',
'2022-10-05T10:00:00.000000000', '2022-10-05T11:00:00.000000000',
'2022-10-05T12:00:00.000000000', '2022-10-05T13:00:00.000000000',
'2022-10-05T14:00:00.000000000', '2022-10-05T15:00:00.000000000',
'2022-10-05T16:00:00.000000000', '2022-10-05T17:00:00.000000000',
'2022-10-05T18:00:00.000000000', '2022-10-05T19:00:00.000000000',
'2022-10-05T20:00:00.000000000', '2022-10-05T21:00:00.000000000',
'2022-10-05T22:00:00.000000000', '2022-10-05T23:00:00.000000000'],
dtype='datetime64[ns]')
Coordinates:
* ocean_time (ocean_time) datetime64[ns] 2022-10-05 ... 2022-10-05T23:00:00
Attributes:
long_name: time since initialization
field: time, scalar, series
standard_name: time
axis: T