
Introduction and Goals

This document was inspired by the arc42 template.

It describes LAPIS (Lightweight API for Sequences) and SILO, which together form a platform that gives easy access to genomic sequence data alongside metadata of the sequenced samples. The platform filters potentially large sets of sequence data and returns the result over the web, so that users can build their own analyses of the data.

The initial implementation of LAPIS is specialized for SARS-CoV-2 and is used by CoV-Spectrum. With this project, we want to develop a generalized API that is configurable for a wider range of organisms. The solution consists of two systems, LAPIS and SILO, which are designed to work together but could also be operated independently of each other. SILO serves as a database with a specialized query language (SILO queries); in principle, it could be replaced by any other database that implements this query language.

This document focuses on LAPIS, but we usually think of LAPIS and SILO as operating together as a single unit. We refer to this combination as SILO-LAPIS.
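The separation between the API layer and an exchangeable database can be illustrated with a minimal sketch. Everything named here (`SequenceDatabase`, `InMemoryDatabase`, `ApiLayer`, and the dictionary-based query) is hypothetical and chosen for illustration only; real SILO queries use their own specialized query language, which is not reproduced here.

```python
from typing import Protocol


class SequenceDatabase(Protocol):
    """Hypothetical interface: anything that evaluates a query and returns rows."""

    def execute(self, query: dict) -> list[dict]:
        ...


class InMemoryDatabase:
    """Toy stand-in for the database; it only understands equality filters."""

    def __init__(self, rows: list[dict]) -> None:
        self._rows = rows

    def execute(self, query: dict) -> list[dict]:
        # Keep the rows whose metadata matches every key/value pair in the query.
        return [
            row
            for row in self._rows
            if all(row.get(key) == value for key, value in query.items())
        ]


class ApiLayer:
    """Toy stand-in for the API: it depends on the interface, not on a concrete database."""

    def __init__(self, database: SequenceDatabase) -> None:
        self._database = database

    def count(self, filters: dict) -> int:
        return len(self._database.execute(filters))


# Made-up example records, not real sequence metadata.
rows = [
    {"id": "s1", "country": "Switzerland"},
    {"id": "s2", "country": "Germany"},
    {"id": "s3", "country": "Switzerland"},
]
api = ApiLayer(InMemoryDatabase(rows))
print(api.count({"country": "Switzerland"}))  # prints 2
```

Because `ApiLayer` only relies on the `execute` method, any other backend that offers the same method could be passed in without changing the API code.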

The following goals have been established for this system:

| Priority | Goal |
|---|---|
| 1 | Provide an API to query large sets of genomic sequence data very efficiently. |
| 2 | Provide the infrastructure so that other researchers (“maintainers”) can easily set up their own instance. |

Requirements Overview

| Requirement | Description |
|---|---|
| Create an instance for a given organism | Create an instance of the whole system by providing a configuration for an organism. |
| Store data efficiently | Store data in compressed form. |
| Provide web access to data | Provide endpoints for custom user queries to the data. |
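To make the web access requirement more concrete, the following sketch shows how a client might assemble a query URL from metadata filters. The host `lapis.example.org`, the endpoint path, and the filter names are placeholders assumed for illustration; they are not taken from this document, so consult the endpoint documentation of an actual instance for the real names.

```python
from urllib.parse import urlencode


def build_query_url(base_url: str, endpoint: str, filters: dict) -> str:
    """Combine a base URL, an endpoint path, and filters into a GET URL."""
    url = f"{base_url.rstrip('/')}/{endpoint.lstrip('/')}"
    if filters:
        # Sort the filters so that the same query always yields the same URL.
        url += "?" + urlencode(sorted(filters.items()))
    return url


# All values below are hypothetical placeholders.
url = build_query_url(
    "https://lapis.example.org",
    "/sample/aggregated",
    {"dateFrom": "2021-01-01", "country": "Switzerland"},
)
print(url)
# prints https://lapis.example.org/sample/aggregated?country=Switzerland&dateFrom=2021-01-01
```

Programmatically generated queries, as mentioned for advanced users and tool developers, could be built on a helper of this kind.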

Quality Goals

| Quality Category | Quality | Description |
|---|---|---|
| Usability | Ease of use | Users can submit queries easily. |
| | Ease of use | Maintainers can easily create a new instance for a new organism. |
| | Ease of learning | The queries should be as easy to write as possible. We provide material to assist in learning the query language. |
| Performance efficiency | Time behaviour | It is possible to query millions of sequences in less than a second. |
| | Scalability | Resource usage (query response time, memory usage) grows at most linearly with the number of stored sequences. |
| Maintainability | Reusability | It is possible to use LAPIS with any other database that implements the SILO query language. |
| | Testability | SILO-LAPIS is well tested at end-to-end scope. The tests serve as examples for users and maintainers. |
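One way to check the scalability goal against measurements is to compare the cost per sequence across dataset sizes. The sketch below is an assumption about how such a check could look, not a description of the project's actual tests; the function name, the tolerance, and the sample numbers are made up for illustration.

```python
def grows_at_most_linearly(measurements, tolerance=1.5):
    """Check that the cost per sequence does not rise with dataset size.

    measurements: list of (number_of_sequences, cost) pairs, where cost may be
    a response time or a memory footprint.
    tolerance: allowed factor above the per-sequence cost of the smallest dataset.
    """
    ordered = sorted(measurements)
    base_n, base_cost = ordered[0]
    base_per_sequence = base_cost / base_n
    # Linear growth keeps cost / n roughly constant; faster growth increases it.
    return all(cost / n <= tolerance * base_per_sequence for n, cost in ordered[1:])


# Made-up numbers: cost grows in proportion to the number of sequences.
linear = [(1_000, 0.01), (10_000, 0.09), (100_000, 1.0)]
# Made-up numbers: cost grows much faster than the number of sequences.
superlinear = [(1_000, 0.01), (10_000, 1.0)]

print(grows_at_most_linearly(linear))       # prints True
print(grows_at_most_linearly(superlinear))  # prints False
```

The tolerance absorbs measurement noise; a stricter or looser factor may be appropriate depending on how stable the measurements are.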

Stakeholders

| Role | Expectations |
|---|---|
| Database researcher | Can develop new genomic data engineering algorithms for LAPIS. |
| Developer | Can fix bugs and add new features to LAPIS. |
| User (beginner level) | Can write simple queries to LAPIS by hand and get a fast result. |
| User (advanced level/tool developers) | Can write advanced, possibly programmatically generated queries to LAPIS and get a fast result. |
| Maintainer | Can host the software on their own servers for their own organism configuration. |