AND Filter
We have started adding features to CleF that allow more complex queries, beginning with the following case. Say you want to find all the CMIP6 models that have both precipitation (pr) and soil moisture (mrso) for a particular experiment (historical). Until now you had to select each variable separately and then work out on your own which models had both.
We will show how this works by first calling the function interactively. There is also a command line option, but it returns only a list of the models. First, since we may pass more than one value for an attribute, the values in the constraints dictionary are lists. Next we define the attributes for which we want all values to be present, only variable_id in this case. Finally we tell the function which attributes define a simulation; most often these are model and member.
constraints = {'variable_id': ['pr','mrso'], 'frequency': ['mon'], 'experiment_id': ['historical']}
allvalues = ['variable_id']
fixed = ['source_id', 'member_id']
results, selection = matching(s, allvalues, fixed, project='CMIP6', **constraints)
The function returns the subset of the original query results that belongs to the selected simulations (results) and the selected model/member combinations that have both variables (selection). NB: the abbreviated versions of the constraint keys currently will not work here; you have to use the full attribute names. Printing the length of both, and the first item of selection, shows that the results have been grouped by model/member and then filtered.
print(len(results),len(selection))
selection.iloc[0,:]
174 87
comb {(pr,), (mrso,)}
frequency {mon}
version {v20191108}
path {/g/data/fs38/publications/CMIP6/CMIP/CSIRO-AR...
table_id {Amon, Lmon}
index (322, 608)
Name: (ACCESS-CM2, r1i1p1f1), dtype: object
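The grouping and filtering described above can be sketched in plain Python. This is only an illustration of the idea, not the CleF implementation: the function and_filter and the toy records are invented for this example, and the real matching() works on query results rather than on a list of dictionaries.

```python
from collections import defaultdict
from itertools import product


def and_filter(records, allvalues, fixed, constraints):
    """Return {simulation: combos} for simulations holding every required combination."""
    # Every combination of the requested values that a simulation must provide
    required = set(product(*(constraints[attr] for attr in allvalues)))

    # Group the records by the attributes that define a simulation
    found = defaultdict(set)
    for rec in records:
        key = tuple(rec[attr] for attr in fixed)
        found[key].add(tuple(rec[attr] for attr in allvalues))

    # Keep only the simulations where nothing required is missing
    return {key: combos for key, combos in found.items() if required <= combos}


# Toy records standing in for query results (invented for illustration)
records = [
    {"source_id": "MODEL-A", "member_id": "r1i1p1f1", "variable_id": "pr"},
    {"source_id": "MODEL-A", "member_id": "r1i1p1f1", "variable_id": "mrso"},
    {"source_id": "MODEL-B", "member_id": "r1i1p1f1", "variable_id": "pr"},
]
constraints = {"variable_id": ["pr", "mrso"]}

selected = and_filter(records, ["variable_id"], ["source_id", "member_id"], constraints)
print(sorted(selected))  # [('MODEL-A', 'r1i1p1f1')]
```

MODEL-B is dropped because it only has pr. The set stored for MODEL-A, {('pr',), ('mrso',)}, has the same shape as the comb field in the output above.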
The full definition of matching() shows all the function arguments:
matching(session, cols, fixed, project='CMIP5', local=True, latest=True, **kwargs)
From this you can see that, like search(), project defaults to 'CMIP5' and latest to True. We have not yet needed the local argument, which is True by default; later examples set it to False to run the same query remotely.
AND filter on more than one attribute
We can pass more than one value for more than one attribute. Let’s add piControl to the experiment list.
constraints = {'variable_id': ['pr','mrso'], 'frequency': ['mon'], 'experiment_id': ['historical', 'piControl']}
results, selection = matching(s, allvalues, fixed, project='CMIP6', **constraints)
print(len(results),len(selection))
selection.iloc[0,:]
275 93
comb {(pr,), (mrso,)}
frequency {mon}
version {v20191112, v20191108}
path {/g/data/fs38/publications/CMIP6/CMIP/CSIRO-AR...
table_id {Amon, Lmon}
index (322, 624, 680, 766)
Name: (ACCESS-CM2, r1i1p1f1), dtype: object
As you can see, we now get many more results but only a few more combinations after applying the filter. This is because we are still defining a simulation by model and member only: experiment is not included, so the results for the two experiments are grouped together. To fix this we need to add experiment_id to the fixed list.
fixed = ['source_id', 'member_id','experiment_id']
results, selection = matching(s, allvalues, fixed, project='CMIP6', **constraints)
print(len(results),len(selection))
selection.iloc[0,:]
270 135
comb {(pr,), (mrso,)}
frequency {mon}
version {v20191108}
path {/g/data/fs38/publications/CMIP6/CMIP/CSIRO-AR...
table_id {Amon, Lmon}
index (322, 680)
Name: (ACCESS-CM2, r1i1p1f1, historical), dtype: object
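The effect of the fixed list can be seen with a small stand-alone sketch. It assumes the filter simply groups records by the fixed attributes before checking the variables; group_values and the two records are invented for illustration and are not part of CleF.

```python
from collections import defaultdict


def group_values(records, fixed, attr):
    """Map each simulation key to the set of `attr` values found for it."""
    groups = defaultdict(set)
    for rec in records:
        groups[tuple(rec[name] for name in fixed)].add(rec[attr])
    return dict(groups)


# Invented records: MODEL-A has pr in one experiment and mrso in the other
records = [
    {"source_id": "MODEL-A", "member_id": "r1i1p1f1",
     "experiment_id": "historical", "variable_id": "pr"},
    {"source_id": "MODEL-A", "member_id": "r1i1p1f1",
     "experiment_id": "piControl", "variable_id": "mrso"},
]
wanted = {"pr", "mrso"}

loose = group_values(records, ["source_id", "member_id"], "variable_id")
strict = group_values(records, ["source_id", "member_id", "experiment_id"], "variable_id")

# Number of simulations holding both variables under each definition
n_loose = sum(wanted <= found for found in loose.values())
n_strict = sum(wanted <= found for found in strict.values())
print(n_loose, n_strict)  # 1 0
```

With model and member only, the two experiments are pooled and MODEL-A appears to have both variables. Once experiment_id is part of the simulation definition, neither experiment has both on its own.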
If instead we wanted to find all model/member combinations that have both variables for both experiments, we should keep fixed as it was and add experiment_id to the allvalues list.
allvalues = ['variable_id', 'experiment_id']
fixed=['source_id','member_id']
results, selection = matching(s, allvalues, fixed, project='CMIP6', **constraints)
print(len(results),len(selection))
selection.iloc[0,:]
168 42
comb {(mrso, piControl), (mrso, historical), (pr, p...
frequency {mon}
version {v20191112, v20191108}
path {/g/data/fs38/publications/CMIP6/CMIP/CSIRO-AR...
table_id {Amon, Lmon}
index (322, 624, 680, 766)
Name: (ACCESS-CM2, r1i1p1f1), dtype: object
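When more than one attribute is listed in allvalues, the combinations that must all be present are the Cartesian product of the requested values. The following sketch, which only assumes the constraints dictionary used above, lists them with itertools.product; it illustrates what comb contains rather than how CleF builds it.

```python
from itertools import product

constraints = {
    "variable_id": ["pr", "mrso"],
    "frequency": ["mon"],
    "experiment_id": ["historical", "piControl"],
}
allvalues = ["variable_id", "experiment_id"]

# Every combination a simulation must provide to pass the filter
required = set(product(*(constraints[attr] for attr in allvalues)))

print(len(required))  # 4
for combo in sorted(required):
    print(combo)
```

The four tuples pair each variable with each experiment, as in the comb field shown above.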
AND filter applied to remote ESGF query
You can of course run the same query for CMIP5; in that case you can omit project when calling the function, since its default value is 'CMIP5'. Another default is local=True, which tells the function to perform the query directly on the local database. With local=False you can perform the same query on the ESGF database, to see what has been published.
constraints = {'variable': ['tasmin','tasmax'], 'cmor_table': ['Amon'], 'experiment': ['historical','rcp26', 'rcp85']}
allvalues = ['variable', 'experiment']
fixed=['model','ensemble']
results, selection = matching(s, allvalues, fixed, local=False, **constraints)
print(len(results),len(selection))
selection.iloc[0,:]
1494 47
comb {(tasmax, historical), (tasmax, rcp26), (tasma...
dataset_id {cmip5.output1.CNRM-CERFACS.CNRM-CM5.historica...
version {(v20110629,), (v20110901,), (v20110930,)}
cmor_table {Amon}
index (422, 423, 424, 425, 426, 427, 476, 477, 478, ...
Name: (CNRM-CM5, r1i1p1), dtype: object
Note that the attribute names are different because we are now querying CMIP5. comb lists all the combinations that have to be present for a model/ensemble to be returned, and we get a dataset_id rather than a directory path.
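If you switch between the two projects often, a small lookup table can help keep the attribute names straight. The mapping below is an assumption inferred from the examples in this section, and to_cmip6 is a hypothetical helper, not a CleF function. It translates names only; the values themselves (experiment names, for instance) also differ between CMIP5 and CMIP6 and are left alone.

```python
# Assumed correspondence between CMIP5 and CMIP6 attribute names,
# inferred from the examples in this section; not part of CleF itself
CMIP5_TO_CMIP6 = {
    "variable": "variable_id",
    "experiment": "experiment_id",
    "model": "source_id",
    "ensemble": "member_id",
    "cmor_table": "table_id",
}


def to_cmip6(names):
    """Translate a list of CMIP5 attribute names, leaving unknown names untouched."""
    return [CMIP5_TO_CMIP6.get(name, name) for name in names]


allvalues = to_cmip6(["variable", "experiment"])
fixed = to_cmip6(["model", "ensemble"])
print(allvalues)  # ['variable_id', 'experiment_id']
print(fixed)      # ['source_id', 'member_id']
```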
AND filter on the command line
The command line version of matching is called with the --and flag followed by the attribute for which we want all values; the flag can be used more than once. By default a simulation is defined by its model/ensemble combination, and only model, ensemble and version are returned in the final result.
!clef --local cmip5 -v tasmin -v tasmax -e rcp26 -e rcp85 -e historical -t Amon --and variable
ACCESS1.0 / r1i1p1 versions: 20120727, 20120115
ACCESS1.0 / r2i1p1 versions: 20130726
ACCESS1.0 / r3i1p1 versions: 20140402
...
MRI-CGCM3 / r2i1p1 versions: 20120701
MRI-CGCM3 / r3i1p1 versions: 20120701
MRI-CGCM3 / r4i1p2 versions: 20120701
MRI-CGCM3 / r5i1p2 versions: 20120701
MRI-ESM1 / r1i1p1 versions: 20140210, 20130307
NorESM1-M / r1i1p1 versions: 20120412
NorESM1-M / r2i1p1 versions: 20120412
NorESM1-M / r3i1p1 versions: 20120412
inmcm4 / r1i1p1 versions: 20130207
The same works with --remote and cmip6:
!clef --remote cmip6 -v pr -v mrso -e piControl -mi r1i1p1f1 --frequency mon --and variable_id
ACCESS-CM2 / r1i1p1f1 versions: v20191112
ACCESS-ESM1-5 / r1i1p1f1 versions: v20191214
AWI-ESM-1-1-LR / r1i1p1f1 versions: v20200212
...
NorESM2-MM / r1i1p1f1 versions: v20191108
SAM0-UNICON / r1i1p1f1 versions: v20190910
TaiESM1 / r1i1p1f1 versions: v20200302, v20200211
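Since the command line returns plain text, you may want to read it back into Python. The sketch below assumes each line is formatted exactly as printed above ("model / member versions: v1, v2"); parse_and_output is a hypothetical helper written for this example, and the sample lines are copied from the output shown.

```python
def parse_and_output(text):
    """Parse 'MODEL / MEMBER versions: V1, V2' lines into a dictionary."""
    parsed = {}
    for line in text.splitlines():
        if " versions: " not in line or " / " not in line:
            continue  # skip blank lines and anything in another format
        simulation, versions = line.split(" versions: ", 1)
        model, member = (part.strip() for part in simulation.split(" / ", 1))
        parsed[(model, member)] = [v.strip() for v in versions.split(",")]
    return parsed


sample = """ACCESS-CM2 / r1i1p1f1 versions: v20191112
TaiESM1 / r1i1p1f1 versions: v20200302, v20200211"""

parsed = parse_and_output(sample)
print(parsed[("TaiESM1", "r1i1p1f1")])  # ['v20200302', 'v20200211']
```

Keying the dictionary on the (model, member) pair mirrors the default definition of a simulation used by the --and flag.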