
Start LAPIS and SILO

Every LAPIS instance must be backed by a SILO instance, which acts as its data source. SILO can also be operated stand-alone; LAPIS is a layer of convenience and abstraction around it.

We provide ready-to-use Docker images of SILO and LAPIS and recommend using them, so this tutorial explains how. You will build a Docker Compose file step by step.

Prerequisites

  • You have Docker installed.
  • You have some knowledge of how to use Docker and Docker Compose.
  • You have the latest Docker images:
docker pull ghcr.io/genspectrum/lapis-v2
docker pull ghcr.io/genspectrum/lapis-silo
  • Create a directory for the tutorial:
mkdir ~/lapisExample
cd ~/lapisExample

Writing Configuration

Both LAPIS and SILO need to know which metadata columns are available in the dataset. Furthermore, you need to define which column acts as the primary key and which column SILO should use to generate partitions. In this example, LAPIS is also configured as an open instance (opennessLevel: OPEN), meaning that the underlying data requires no access restrictions.

~/lapisExample/config/database_config.yaml
schema:
  instanceName: testInstance
  metadata:
    - name: primaryKey
      type: string
    - name: date
      type: date
    - name: region
      type: string
      generateIndex: true
    - name: country
      type: string
      generateIndex: true
    - name: division
      type: string
      generateIndex: true
    - name: pangoLineage
      type: pango_lineage
    - name: age
      type: int
    - name: qc_value
      type: float
  opennessLevel: OPEN
  primaryKey: primaryKey
  dateToSortBy: date
  partitionBy: pangoLineage
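
The role of primaryKey, dateToSortBy and partitionBy can be illustrated with a small consistency check: each of them should name a column declared under metadata. The following Python sketch is not part of LAPIS or SILO. The function name check_column_references is made up for illustration, and the configuration is written as a plain dictionary mirroring database_config.yaml so that no YAML library is needed.

```python
# Illustrative only: a dictionary mirroring database_config.yaml.
config = {
    "schema": {
        "instanceName": "testInstance",
        "metadata": [
            {"name": "primaryKey", "type": "string"},
            {"name": "date", "type": "date"},
            {"name": "region", "type": "string", "generateIndex": True},
            {"name": "country", "type": "string", "generateIndex": True},
            {"name": "division", "type": "string", "generateIndex": True},
            {"name": "pangoLineage", "type": "pango_lineage"},
            {"name": "age", "type": "int"},
            {"name": "qc_value", "type": "float"},
        ],
        "opennessLevel": "OPEN",
        "primaryKey": "primaryKey",
        "dateToSortBy": "date",
        "partitionBy": "pangoLineage",
    }
}


def check_column_references(config):
    """Return the schema keys whose value is not a declared metadata column.

    Hypothetical helper: this is a sanity check for the tutorial config,
    not the validation that LAPIS or SILO perform themselves.
    """
    schema = config["schema"]
    declared = {column["name"] for column in schema["metadata"]}
    reference_keys = ("primaryKey", "dateToSortBy", "partitionBy")
    return [key for key in reference_keys if schema.get(key) not in declared]


# An empty list means every reference points to a declared column.
print(check_column_references(config))  # prints []
```

If you rename a metadata column later, a check like this points out which of the three references still uses the old name.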

Starting SILO Preprocessing

Download the example dataset from the end-to-end tests:

  • pangolineage_alias.json
  • reference_genomes.json
  • small_metadata_set.tsv
  • all fasta files for the sequences

SILO expects FASTA files (optionally compressed with zstandard or xz) in the same directory, named nuc_<sequence_name>.fasta for nucleotide sequences or gene_<sequence_name>.fasta for amino acid sequences. Each sequence_name has to match a name defined in reference_genomes.json.

Put those files into the folder ~/lapisExample/data/.
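
Once the files are in place, the naming scheme can be checked mechanically. The sketch below is a hypothetical helper, not a SILO tool: you pass it the sequence names yourself (taken from your reference_genomes.json), and it reports which expected FASTA files are missing. It matches by prefix so that compressed files with an additional suffix also count; the .zst suffix and the names main, E and S in the demo are assumptions chosen for illustration.

```python
import tempfile
from pathlib import Path


def missing_fasta_files(data_dir, nucleotide_names, gene_names):
    """Return expected base names (e.g. 'nuc_main.fasta') with no matching file."""
    expected = [f"nuc_{name}.fasta" for name in nucleotide_names]
    expected += [f"gene_{name}.fasta" for name in gene_names]
    present = [path.name for path in Path(data_dir).iterdir() if path.is_file()]
    # Prefix match: a compressed file such as 'gene_E.fasta.zst' (assumed
    # suffix) still satisfies the expectation 'gene_E.fasta'.
    return [
        base
        for base in expected
        if not any(name.startswith(base) for name in present)
    ]


# Self-contained demo in a temporary directory with placeholder names.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "nuc_main.fasta").touch()
    Path(tmp, "gene_E.fasta.zst").touch()
    print(missing_fasta_files(tmp, ["main"], ["E", "S"]))  # prints ['gene_S.fasta']
```

For the tutorial, you would call the function with the path to ~/lapisExample/data and the names from your reference genomes file.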

Now SILO needs to know where it can find those files. You have to provide a “preprocessing config” for that. Note that you need to provide the paths where the files will be stored in the Docker container. Filenames are relative to the input directory. Since you don’t provide the input directory explicitly, SILO will fall back to the default /data.

~/lapisExample/config/preprocessing_config.yaml
metadataFilename: 'small_metadata_set.tsv'
pangoLineageDefinitionFilename: 'pangolineage_alias.json'
referenceGenomeFilename: 'reference_genomes.json'

To start the preprocessing, you have to:

  • start SILO in the preprocessing mode
  • mount the data into the container to the default location
  • mount the preprocessing config into the container to the default location
  • mount the database config into the container to the default location
  • mount the output directory into the container to the default location

Add a corresponding service to the docker-compose.yaml:

~/lapisExample/docker-compose.yaml
version: '3.9'
services:
  silo-preprocessing:
    image: ghcr.io/genspectrum/lapis-silo
    command: --preprocessing
    volumes:
      - ~/lapisExample/data:/preprocessing/input
      - ~/lapisExample/config/preprocessing_config.yaml:/app/preprocessing_config.yaml
      - ~/lapisExample/config/database_config.yaml:/app/database_config.yaml
      - ~/lapisExample/output:/preprocessing/output

After this has completed, the output directory should contain the result of the preprocessing. That result has to be provided to SILO in the next step.

Starting SILO

To start the SILO API, you have to:

  • start SILO in the api mode,
  • expose port 8081,
  • mount the preprocessing result into the container,
  • wait for the preprocessing to complete.

Add a corresponding service to the docker-compose.yaml:

~/lapisExample/docker-compose.yaml
version: '3.9'
services:
  silo-preprocessing:
    image: ghcr.io/genspectrum/lapis-silo
    command: --preprocessing
    volumes:
      - ~/lapisExample/data:/preprocessing/input
      - ~/lapisExample/config/preprocessing_config.yaml:/app/preprocessing_config.yaml
      - ~/lapisExample/config/database_config.yaml:/app/database_config.yaml
      - ~/lapisExample/output:/preprocessing/output
  silo-api:
    image: ghcr.io/genspectrum/lapis-silo
    command: --api
    ports:
      - '8081:8081'
    volumes:
      - ~/lapisExample/output:/data
    depends_on:
      silo-preprocessing:
        condition: service_completed_successfully

Start the services:

docker compose up

Now SILO should be available at http://localhost:8081 and http://localhost:8081/info should show that SILO contains sequences.
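
If you prefer to check the endpoint from a script rather than a browser, a minimal Python sketch using only the standard library could look like this. The helper name fetch_text is made up for illustration; it makes no assumption about the format of the /info response and simply returns the body as text, or None when the service cannot be reached.

```python
import urllib.request


def fetch_text(url, timeout=5.0):
    """Return the response body as text, or None if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except OSError:
        # Covers refused connections, timeouts and HTTP error statuses
        # (urllib's URLError and HTTPError are subclasses of OSError).
        return None


body = fetch_text("http://localhost:8081/info")
if body is None:
    print("SILO is not reachable yet on port 8081")
else:
    print(body)
```

While the containers are still starting, the function returns None, so you can simply run the script again after a few seconds.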

Starting LAPIS

Now you can start LAPIS. You have to:

  • expose port 8080 to the host,
  • mount the database configuration and the reference genomes to the default locations in the Docker container,
  • provide LAPIS with the SILO URL.

Add a corresponding service to the docker-compose.yaml:

~/lapisExample/docker-compose.yaml
version: '3.9'
services:
  lapis:
    image: ghcr.io/genspectrum/lapis-v2
    command: --silo.url=http://silo-api:8081
    ports:
      - '8080:8080'
    volumes:
      - ~/lapisExample/config/database_config.yaml:/workspace/database_config.yaml
      - ~/lapisExample/data/reference_genomes.json:/workspace/reference_genomes.json
  silo-preprocessing:
    image: ghcr.io/genspectrum/lapis-silo
    command: --preprocessing
    volumes:
      - ~/lapisExample/data:/preprocessing/input
      - ~/lapisExample/config/preprocessing_config.yaml:/app/preprocessing_config.yaml
      - ~/lapisExample/config/database_config.yaml:/app/database_config.yaml
      - ~/lapisExample/output:/preprocessing/output
  silo-api:
    image: ghcr.io/genspectrum/lapis-silo
    command: --api
    ports:
      - '8081:8081'
    volumes:
      - ~/lapisExample/output:/data
    depends_on:
      silo-preprocessing:
        condition: service_completed_successfully

Start the services again:

docker compose up

Now LAPIS should be available at http://localhost:8080. LAPIS offers a Swagger UI that serves as a good starting point for exploring its functionality.

Further Reading