Skip to content

Advanced query

For those readers familiar with the variant query feature advanced queries can be seen a superset of variant queries, in which not only variant information can be queried but also metadata information.

Advanced queries should allow an arbitrary combination of filters. Standard LAPIS filters take the form filter1 AND filter2 AND ... filterN but they cannot query for more custom cases such as filter1 OR (filter2 AND NOT filter3), the advanced queries feature allows such combinations and the creation of more custom queries.

Advanced queries can be used to filter sequences and metadata and can be passed to the server through the query parameter advancedQuery. Don’t forget to encode/escape the query correctly (in JavaScript, this can be done with the encodeURIComponent() function)!

The formal specification of the query language is available here as an ANTLR v4 grammar. In following, we provide an informal description and examples. The respective unit test provides a full list of possible atomic queries.

Features

Variant Queries

We support mutation and insertion queries for both nucleotide and amino acid sequences, see the mutation filter page for more details. Note the addition of the MAYBE operator to query ambiguous nucleotide symbols.

Metadata Queries

Standard metadata queries take the form metadataField=query, for example

country=Ghana

Note that if the metadata field does not only contain letters and numbers it must be enclosed in single quotes, for example

country='United States of America'

To search for empty fields (fields that are null) use the ISNULL operator:

IsNull(host)

For dates and numbers (int or float) we allow queries for ranges, using the >= and <= operators, for example:

date>=2021-01-01
date<=2021-12-31

For string fields we also allow regex search. To use the regex substring search on a metadata field you must append .regex to the end of the metadata field name and enclose the query in single quotes:

host.regex='.*bos.*'

For regex searches the advanced queries use the google/re2 regex syntax.

Boolean operators

The query language understands Boolean logic. Expressions can be connected with & (and), | (or) and ! (not). Both & and AND are recognized as and, | and OR are recognized as or, and ! and NOT are recognized as not. Parentheses ( and ) can be used to define the order of the operations.

We also add a custom syntax N-of and exactly-N-of to match sequences for which at least or exactly N out of a list of expressions are fulfilled.

Examples

  • Get the sequences with the nucleotide mutation 300G, without a deletion at position 400 and either the AA change S:123T or the AA change S:234A:

    300G & !400- & (S:123T | S:234A)

    This can also be written as

    300G AND NOT 400- AND (S:123T OR S:234A)
  • Get all sequences from the USA that do not have cows as a host and that also have the mutation 300G:

    NOT host='bos taurus' AND 300G AND country=USA
  • Get the sequences with at least 3 out of five mutations/deletions:

    [3-of: 123A, 234T, S:345-, ORF1a:456K, ORF7A:100-]
  • Get the sequences that fulfill exactly 2 out of 4 conditions:

    [exactly-2-of: 123A & 234T, !234T, S:345- | S:346-, [2-of: 222T, 333G, 444A, 555C]]