CORDEX

Let’s look at the cordex sub-command in detail:

!clef cordex --help
Usage: clef cordex [OPTIONS] [QUERY]...

  Search ESGF and local database for CORDEX files.

  Constraints can be specified multiple times, in which case they are
  combined    using OR: -v tas -v tasmin will return anything matching
  variable = 'tas' or variable = 'tasmin'. The --latest flag will check ESGF
  for the latest version available, this is the default behaviour NB. for
  CORDEX data associated to CMIP6 use  the cmip6 command with CORDEX as
  activity_id

Options:
  --latest / --all-versions       Return only the latest version or all of
                                  them. Default: --latest

  --replica / --no-replica        Return both original files and replicas.
                                  Default: --no-replica

  --distrib / --no-distrib        Distribute search across all ESGF nodes.
                                  Default: --distrib

  --csv / --no-csv                Send output to csv file including extra
                                  information. Works only with --local and
                                  --remote. Default: --no-csv

  --stats / --no-stats            Write summary of query results. Works only
                                  with --local and --remote. Default: --no-
                                  stats

  --debug / --no-debug            Show debug output. Default: --no-debug
  -d, --domain FACET              CORDEX region name
  -e, --experiment FACET          Experiment
  -dex, --driving_experiment FACET
                                  CMIP5 experiment of driving GCM or
                                  'evaluation' for re-analysis

  -dmod, --driving_model FACET    Model/analysis used to drive the model (eg.
                                  ECMWF­ERAINT)

  -m, --rcm_name FACET            Identifier of the CORDEX Regional Climate
                                  Model

  -rcmv, --rcm_version FACET      Identifier for reruns with perturbed
                                  parameters or smaller RCM release upgrades

  -v, --variable FACET            Variable name in file
  -f, --time_frequency FACET      Output frequency indicator
  -en, --ensemble FACET           Ensemble member of the driving GCM
  -vrs, --version FACET           Data publication version
  -cf, --cf_standard_name FACET   CF-Conventions name of the variable
  -ef, --experiment_family FACET  Experiment family: All, Historical, RCP
  -inst, --institute FACET        identifier for the institution that is
                                  responsible for the scientific aspects of
                                  the CORDEX simulation

  --and [domain|experiment|driving_experiment|driving_model|rcm_name|rcm_version|variable|time_frequency|ensemble|version|cf_standard_name|experiment_family|institute]
                                  Attributes for which we want to add AND
                                  filter, i.e. -v tasmin -v tasmax --and
                                  variable will return only model/ensemble
                                  that have both

  --help                          Show this message and exit.

cordex works in the same way but some constraints are specific to its experiment design. These are the cordex domain, rcm_name, rcm_version for the regional model, and the driving_model and driving_experiment for the driving model. CORDEX also does not use tables so you always have to use f--frequency to select different timesteps.

!clef cordex -v tas -e historical -dmod CSIRO-BOM-ACCESS1-3 -en r1i1p1 -f mon
/g/data/rr3/publications/CORDEX/output/AUS-44/UNSW/CSIRO-BOM-ACCESS1-3/historical/r1i1p1/UNSW-WRF360J/v1/mon/tas/latest/
/g/data/rr3/publications/CORDEX/output/AUS-44/UNSW/CSIRO-BOM-ACCESS1-3/historical/r1i1p1/UNSW-WRF360K/v1/mon/tas/latest/
/g/data/rr3/publications/CORDEX/output/AUS-44/UNSW/CSIRO-BOM-ACCESS1-3/historical/r1i1p1/UNSW-WRF360L/v1/mon/tas/latest/
/g/data/rr3/publications/CORDEX/output/AUS-44i/UNSW/CSIRO-BOM-ACCESS1-3/historical/r1i1p1/UNSW-WRF360J/v1/mon/tas/latest/
/g/data/rr3/publications/CORDEX/output/AUS-44i/UNSW/CSIRO-BOM-ACCESS1-3/historical/r1i1p1/UNSW-WRF360K/v1/mon/tas/latest/
/g/data/rr3/publications/CORDEX/output/AUS-44i/UNSW/CSIRO-BOM-ACCESS1-3/historical/r1i1p1/UNSW-WRF360L/v1/mon/tas/latest/

Everything available on ESGF is also available locally

CMIP6

Similarly cmip6 has its own arguments but usage is the same:

!clef cmip6 --help
Usage: clef cmip6 [OPTIONS] [QUERY]...

  Search ESGF and local database for CMIP6 files Constraints can be
  specified multiple times, in which case they are combined using OR:  -v
  tas -v tasmin will return anything matching variable = 'tas' or variable =
  'tasmin'. The --latest flag will check ESGF for the latest version
  available, this is the default behaviour

Options:
  -mip, --activity [AerChemMIP|C4MIP|CDRMIP|CFMIP|CMIP|CORDEX|DAMIP|DCPP|DynVarMIP|FAFMIP|GMMIP|GeoMIP|HighResMIP|ISMIP6|LS3MIP|LUMIP|OMIP|PAMIP|PMIP|RFMIP|SIMIP|ScenarioMIP|VIACSAB|VolMIP]
  -e, --experiment x              CMIP6 experiment, list of available depends
                                  on activity

  --source_type [AER|AGCM|AOGCM|BGC|CHEM|ISM|LAND|OGCM|RAD|SLAB]
  -t, --table x                   CMIP6 CMOR table: Amon, SIday, Oday ...
  -m, --model, --source_id x      CMIP6 model id: GFDL-AM4, CNRM-CM6-1 ...
  -v, --variable x                CMIP6 variable name as in filenames
  -mi, --member TEXT              CMIP6 member id: <sub-exp-id>-r#i#p#f#
  -g, --grid, --grid_label TEXT   CMIP6 grid label: i.e. gn for the model
                                  native grid

  -nr, --resolution, --nominal_resolution TEXT
                                  Approximate resolution: '250 km', pass in
                                  quotes

  --frequency [1hr|1hrCM|1hrPt|3hr|3hrPt|6hr|6hrPt|day|dec|fx|mon|monC|monPt|subhrPt|yr|yrPt]
  --realm [aerosol|atmos|atmosChem|land|landIce|ocean|ocnBgchem|seaIce]
  -se, --sub_experiment_id TEXT   Only available for hindcast and forecast
                                  experiments: sYYYY

  -vl, --variant_label TEXT       Indicates a model variant: r#i#p#f#
  --cf_standard_name TEXT         CF variable standard_name, use instead of
                                  variable constraint

  --and [variable_id|experiment_id|table_id|realm|frequency|member_id|source_id|source_type|activity_id|grid|grid_label|nominal_resolution|sub_experiment_id]
                                  Attributes for which we want to add AND
                                  filter, i.e. --and variable_id to apply to
                                  variable values

  --cite                          Write list of citations for query results,
                                  works only with --remote and --local
                                  options. Default: False

  --institution TEXT              Modelling group institution id: IPSL, NOAA-
                                  GFDL ...

  --latest / --all-versions       Return only the latest version or all of
                                  them. Default: --latest

  --replica / --no-replica        Return both original files and replicas.
                                  Default: --no-replica

  --distrib / --no-distrib        Distribute search across all ESGF nodes.
                                  Default: --distrib

  --csv / --no-csv                Send output to csv file including extra
                                  information. Works only with --local and
                                  --remote. Default: --no-csv

  --stats / --no-stats            Write summary of query results. Works only
                                  with --local and --remote. Default: --no-
                                  stats

  --debug / --no-debug            Show debug output. Default: --no-debug
  --help                          Show this message and exit.

The cmip6 sub-command works in the same way but some constraints are different. As well as changes in terminology CMIP6 has more attributes (facets) that can be used to select the data. Examples of these are the activity which groups experiments, resolution which is an approximation of the actual resolution and grid.

Controlling the ouput: clef options

clef allows some control over the output by using different flags after clef:

!clef --local cmip6 -e 1pctCO2 -t Amon -v tasmax -v tasmin -g gr
/g/data/oi10/replicas/CMIP6/CMIP/CNRM-CERFACS/CNRM-CM6-1-HR/1pctCO2/r1i1p1f2/Amon/tasmax/gr/v20191021
/g/data/oi10/replicas/CMIP6/CMIP/CNRM-CERFACS/CNRM-CM6-1/1pctCO2/r1i1p1f2/Amon/tasmax/gr/v20180626
/g/data/oi10/replicas/CMIP6/CMIP/CNRM-CERFACS/CNRM-ESM2-1/1pctCO2/r10i1p1f2/Amon/tasmax/gr/v20200529
...
/g/data/oi10/replicas/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/1pctCO2/r1i1p1f1/Amon/tasmin/gr/v20180727
/g/data/oi10/replicas/CMIP6/CMIP/THU/CIESM/1pctCO2/r1i1p1f1/Amon/tasmin/gr/v20200417

In this example we used the --local option for the main command clef to get only the local matching data path as output. Note also that: - we are using abbreviations for the options where available; - we are passing the variable -v option twice; - we used the CMIP6 specific option -g/--grid to search for all data that is not on the model native grid. This doesn’t indicate a grid common to all the CMIP6 output only to the model itself, the same is true for member_id and other attributes.

--local is actually executing the query directly on the NCI clef.nci.org.au database, which is different from the default query where the search is executed first on the ESGF and then its results are matched locally. In the example above the final result is exactly the same, whichever way we perform the query. This way of searching can give you more results if a node is offline or if a version have been unpublished from the ESGF but is still available locally.

!clef --missing cmip6 -e 1pctCO2 -v clw -v clwvi -t Amon -g gr
Available on ESGF but not locally:
CMIP6.CMIP.CAS.FGOALS-f3-L.1pctCO2.r1i1p1f1.Amon.clw.gr.v20200620
CMIP6.CMIP.CAS.FGOALS-f3-L.1pctCO2.r1i1p1f1.Amon.clwvi.gr.v20200620
...
CMIP6.CMIP.THU.CIESM.1pctCO2.r1i1p1f1.Amon.clw.gr.v20200417
CMIP6.CMIP.THU.CIESM.1pctCO2.r1i1p1f1.Amon.clwvi.gr.v20200417

This time we used the --missing option and the tool returned only the results matching the constraints that are available on the ESGF but not locally (we changed variables to make sure to get some missing data back).

!clef --remote cmip6 -e 1pctCO2 -v tasmin -t Amon -g gr
CMIP6.CMIP.CNRM-CERFACS.CNRM-CM6-1-HR.1pctCO2.r1i1p1f2.Amon.tasmin.gr.v20191021
CMIP6.CMIP.CNRM-CERFACS.CNRM-CM6-1.1pctCO2.r1i1p1f2.Amon.tasmin.gr.v20180626
...
CMIP6.CMIP.IPSL.IPSL-CM6A-LR.1pctCO2.r1i1p1f1.Amon.tasmin.gr.v20180727
CMIP6.CMIP.NIMS-KMA.KACE-1-0-G.1pctCO2.r1i1p1f1.Amon.tasmin.gr.v20200115
CMIP6.CMIP.THU.CIESM.1pctCO2.r1i1p1f1.Amon.tasmin.gr.v20200417

The --remote option returns the Dataset_ids of the data matching the constraints, regardless that they are available locally or not.

Please note that --local, --remote and --missing together with --request, which we will look at next, are all options of the main command clef and they need to come before any sub-commands.

Requesting new data

What should we do if we found out there is some data we are interested to that has not been downloaded or requested yet? This is a complex data collection, NCI, in consultation with the community, decided the best way to manage it was to have one point of reference. Part of this agreement is that NCI will download the files and update the database that clef is interrrogating. After consultation with the community a priority list was decided and NCI has started downloading anything that falls into it as soon as become available. Users can then request from the NCI helpdesk, other combinations of variables, experiments etc that do not fall into this list. The list is available from the NCI climate confluence website: Even without consulting the list you can use clef, as we demonstrated above, to search for a particular dataset, if it is not queued or downloaded already clef will give you an option to request it from NCI. Let’s see how it works.

%%bash
clef --request cmip6 -e 1pctCO2 -v clw -v clwvi -t Amon -g gr
no
Available on ESGF but not locally:
CMIP6.CMIP.CAS.FGOALS-f3-L.1pctCO2.r1i1p1f1.Amon.clw.gr.v20200620
CMIP6.CMIP.CAS.FGOALS-f3-L.1pctCO2.r1i1p1f1.Amon.clwvi.gr.v20200620
...
CMIP6.CMIP.THU.CIESM.1pctCO2.r1i1p1f1.Amon.clw.gr.v20200417
CMIP6.CMIP.THU.CIESM.1pctCO2.r1i1p1f1.Amon.clwvi.gr.v20200417
Do you want to proceed with request for missing files? (N/Y)
 No is default
Your request has been saved in
 /home/581/pxp581/clef/docs/CMIP6_pxp581_20210429T135117.txt
You can use this file to request the data via the NCI helpdesk: help@nci.org.au  or https://help.nci.org.au.

We run the same query which gave us as a result 4 missing datasets but this time we used the --request option after clef. The tool will execute the query remotely, then look for matches locally and on the NCI download list. Having found none gives as an option of putting in a request. It will accept any of the following as a positive answer: > Y YES y yes

With anything else or if you don’t pass anything it will assume you don’t want to put in a request. It still saved the request in a file we can use later.

!head -n 4 CMIP6_*.txt
dataset_id=CMIP6.CMIP.CAS.FGOALS-f3-L.1pctCO2.r1i1p1f1.Amon.clw.gr.v20200620
dataset_id=CMIP6.CMIP.CAS.FGOALS-f3-L.1pctCO2.r1i1p1f1.Amon.clwvi.gr.v20200620
dataset_id=CMIP6.CMIP.CAS.FGOALS-f3-L.1pctCO2.r2i1p1f1.Amon.clw.gr.v20200620
dataset_id=CMIP6.CMIP.CAS.FGOALS-f3-L.1pctCO2.r2i1p1f1.Amon.clwvi.gr.v20200620

If I answered yes the tool would have sent an e-mail to the NCI helpdesk with the text file attached, NCI can pass that file as input to their download tool and queue your request. NB if you are running clef from gadi you cannot send an e-mail so in that case the tool will skip the question and just remind you to send an e-mail to the NCI helpdesk yourself to finalise the request.