Advanced query
For readers familiar with the variant query feature, advanced queries can be seen as a superset of variant queries: they can query metadata information as well as variant information.
Advanced queries allow an arbitrary combination of filters. Standard LAPIS filters take the form
filter1 AND filter2 AND ... filterN
and cannot express more custom cases such as filter1 OR (filter2 AND NOT filter3). The advanced queries feature allows such combinations and the creation of more custom queries.
Advanced queries can be used to filter sequences and metadata and can be passed to the server through the query parameter advancedQuery. Don’t forget to encode/escape the query correctly (in JavaScript, this can be done with the encodeURIComponent() function)!
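As a minimal sketch of the encoding step outside JavaScript, the Python snippet below percent-encodes a query and appends it as the advancedQuery parameter. The base URL and the function name build_advanced_query_url are hypothetical and only serve the illustration; replace the URL with the endpoint of your own LAPIS instance.

```python
from urllib.parse import quote, unquote

# Hypothetical endpoint, used only for illustration.
BASE_URL = "https://lapis.example.org/sample/aggregated"


def build_advanced_query_url(query: str, base_url: str = BASE_URL) -> str:
    """Percent-encode the query and attach it as the advancedQuery parameter."""
    # safe="" makes quote() escape every reserved character, including
    # '&', '|', ':' and parentheses, so the query survives as one value.
    encoded = quote(query, safe="")
    return f"{base_url}?advancedQuery={encoded}"


query = "300G & !400- & (S:123T | S:234A)"
url = build_advanced_query_url(query)
print(url)

# Decoding the parameter value gives back the original query.
assert unquote(url.split("advancedQuery=", 1)[1]) == query
```

Note that quote() with safe="" also escapes !, ( and ), which encodeURIComponent() leaves as they are; both forms decode to the same query string.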
The formal specification of the query language is available here as an ANTLR v4 grammar. In the following, we provide an informal description and examples. The respective unit test provides a full list of possible atomic queries.
Features
Variant Queries
We support mutation and insertion queries for both nucleotide and amino acid sequences; see the mutation filter page for more details. Note the addition of the MAYBE operator to query ambiguous nucleotide symbols.
Metadata Queries
Standard metadata queries take the form metadataField=query, for example:
country=Ghana
Note that if the queried value contains anything other than letters and numbers (for example spaces), it must be enclosed in single quotes, for example:
country='United States of America'
To search for empty fields (fields that are null) use the ISNULL operator:
IsNull(host)
For dates and numbers (int or float) we allow queries for ranges, using the >= and <= operators, for example:
date>=2021-01-01
date<=2021-12-31
For string fields we also allow regex search. To use regex search on a metadata field, append .regex to the metadata field name and enclose the pattern in single quotes:
host.regex='.*bos.*'
For regex searches, advanced queries use the google/re2 regex syntax.
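The metadata filter forms above can be assembled programmatically. The following Python sketch builds equality, range and regex filters as strings. The helper names are hypothetical, and two points are assumptions rather than documented behavior: that both bounds of a range can be joined with AND, and that values containing a single quote need escaping rules not covered on this page (the sketch rejects them).

```python
import re


def equals_filter(field: str, value: str) -> str:
    """Build field=value, quoting the value unless it is purely alphanumeric."""
    if "'" in value:
        # Escaping of single quotes is not described here, so refuse to guess.
        raise ValueError("values containing single quotes are not handled")
    if re.fullmatch(r"[A-Za-z0-9]+", value):
        return f"{field}={value}"
    return f"{field}='{value}'"


def range_filter(field: str, lower: str, upper: str) -> str:
    """Combine a lower and an upper bound (assumed to be joinable with AND)."""
    return f"{field}>={lower} AND {field}<={upper}"


def regex_filter(field: str, pattern: str) -> str:
    """Append .regex to the field name and quote the pattern."""
    return f"{field}.regex='{pattern}'"


print(equals_filter("country", "Ghana"))
print(equals_filter("country", "United States of America"))
print(range_filter("date", "2021-01-01", "2021-12-31"))
print(regex_filter("host", ".*bos.*"))
```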
Boolean operators
The query language understands Boolean logic. Expressions can be connected with & (and), | (or) and ! (not).
Both & and AND are recognized as and, | and OR are recognized as or, and ! and NOT are recognized as not.
Parentheses ( and ) can be used to define the order of the operations.
We also add a custom syntax, N-of and exactly-N-of, to match sequences for which at least N or exactly N out of a list of expressions are fulfilled.
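To make the difference between N-of and exactly-N-of concrete, here is a small Python sketch of their semantics over a list of already-evaluated expressions, together with a helper that formats the bracket syntax. It illustrates the counting rule only and is not the server's implementation; all function names are hypothetical.

```python
from typing import Sequence


def n_of(n: int, results: Sequence[bool]) -> bool:
    """True when at least n of the expressions are fulfilled."""
    return sum(results) >= n


def exactly_n_of(n: int, results: Sequence[bool]) -> bool:
    """True when exactly n of the expressions are fulfilled."""
    return sum(results) == n


def format_n_of(n: int, expressions: Sequence[str], exactly: bool = False) -> str:
    """Render the bracket syntax, e.g. [3-of: a, b, c] or [exactly-2-of: a, b]."""
    prefix = "exactly-" if exactly else ""
    return f"[{prefix}{n}-of: {', '.join(expressions)}]"


# A sequence fulfilling three of four expressions matches 2-of but not exactly-2-of.
fulfilled = [True, True, False, True]
print(n_of(2, fulfilled))
print(exactly_n_of(2, fulfilled))
print(format_n_of(3, ["123A", "234T", "S:345-"]))
```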
Examples
- Get the sequences with the nucleotide mutation 300G, without a deletion at position 400 and either the AA change S:123T or the AA change S:234A:
  300G & !400- & (S:123T | S:234A)
  This can also be written as
  300G AND NOT 400- AND (S:123T OR S:234A)
- Get all sequences from the USA that do not have cows as a host and that also have the mutation 300G:
  NOT host='bos taurus' AND 300G AND country=USA
- Get the sequences with at least 3 out of five mutations/deletions:
  [3-of: 123A, 234T, S:345-, ORF1a:456K, ORF7A:100-]
- Get the sequences that fulfill exactly 2 out of 4 conditions:
  [exactly-2-of: 123A & 234T, !234T, S:345- | S:346-, [2-of: 222T, 333G, 444A, 555C]]