Installation guide on a single server

Global requirements

Sarus can be deployed on any VM with Docker installed. All services are deployed from Docker images. The more powerful the machine, the faster the processing.
Particularly resource-intensive tasks include processing large data sources, generating synthetic data with text and images, and deep learning.
The Sarus Docker images require at least 10GB of disk space for installation, and datasets require additional space. We recommend starting with 100GB of disk space. Ideally, the data sources fit on a local disk so that synthetic data generation and processing are done directly within the instance. Alternatively, Sarus can leverage external computation engines (e.g. Synapse SQL engine, Redshift SQL engine…).

Architecture overview

Sarus is made up of several services that communicate through HTTP requests. Each service runs in a separate container.
The main components are:
  • DataAdmin API and UI: mainly for Data owners/administrators who will manage datasets and users’ access

  • Private Learning Gateway API: for Data scientists and analysts who will remotely run analyses on the sensitive data (without direct access)

  • Nginx as a proxy

Requirements

Sarus does not require a specific OS because it runs on Docker, but it needs the following software installed:

  • docker version 19.03.8 or higher

  • docker-compose version 1.27.x or higher. Note that many Linux distributions ship with an older version

To install the software, you need at least 10GB of disk space.

We also recommend that the data sources can fit on the local disk so that synthetic data generation and processing can be done within the instance if need be.
Sarus can use a GPU to accelerate processing. A GPU is not required, except for text and image data.
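
The requirements above can be checked before installing. The following is a minimal sketch, not part of the Sarus installer: the helper names version_ge and report are hypothetical, and it assumes a Linux host where sort supports -V (GNU coreutils).

```shell
# version_ge A B: succeeds when version A is greater than or equal to version B.
# Relies on `sort -V` (version sort), which GNU coreutils provides.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# report NAME FOUND REQUIRED: prints whether the installed version is sufficient.
report() {
  if version_ge "$2" "$3"; then
    echo "$1 $2: OK (>= $3)"
  else
    echo "$1 $2: too old, need >= $3"
  fi
}

if command -v docker >/dev/null 2>&1; then
  # `docker --version` prints a line such as "Docker version 20.10.5, build 55c4c88".
  report docker "$(docker --version | sed -E 's/^Docker version ([0-9.]+).*/\1/')" 19.03.8
else
  echo "docker: not installed"
fi

if command -v docker-compose >/dev/null 2>&1; then
  report docker-compose "$(docker-compose version --short)" 1.27.0
else
  echo "docker-compose: not installed"
fi

# Free space on the filesystem holding the current directory (at least 10GB is needed).
df -h .
```

Run it from the directory where you plan to unzip the installer, so that df reports the relevant disk.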

Resource recommendations

To install and run Sarus on mainly numeric data, we suggest:

  • GCP: c2-standard-4 instance (approx. $0.2/h)

  • AWS: c5.2xlarge (approx. $0.4/h) or m5.xlarge (approx. $0.2/h), with a 100GB attached SSD and an AMI with a clean Ubuntu 20.04 OS

For text and image data, more powerful instances are required.
If this applies to you, contact us so that we can recommend a suitable configuration.

Securing Sarus (forcing https)

We recommend securing the connection to the Sarus API with an SSL certificate.

This will force users to query Sarus with https instead of http, which means the communication between the user (from the Admin UI or from the SDK) and Sarus will be encrypted.

Please note that, without SSL, the username and password sent by the user to Sarus for authentication travel in plain text over the network and can therefore be intercepted by an attacker.
SSL certificates are tied to a specific hostname or domain name. The SSL certificate must therefore be created by your IT organization and signed by a Certificate Authority recognized by your clients (browser or SDK).
The workflow is as follows:
  • prepare a VM to install Sarus and make sure it will be accessible by its hostname over the network (your DNS has to be configured)

  • generate an SSL certificate tied to this hostname

  • copy this SSL certificate onto the VM (certificate chain and key)

  • install Sarus and configure the .env file with the path to the SSL certificate (see the instructions below)

  • check that encryption is enforced by trying to connect to the Sarus UI with http: you should be redirected to https.
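
The last check in the workflow above can also be done from a terminal. This is a sketch under assumptions: the function names are hypothetical, sarus.example.com stands in for your hostname, and any 3xx status pointing to an https:// URL is treated as a valid redirect, since the exact status code returned is not specified here.

```shell
# is_https_redirect "CODE URL": succeeds for a 3xx status whose target is an https URL.
is_https_redirect() {
  code=${1%% *}     # text before the first space: the HTTP status code
  target=${1#* }    # text after the first space: the redirect target (may be empty)
  case "$code" in
    3??) ;;
    *) return 1 ;;
  esac
  case "$target" in
    https://*) return 0 ;;
    *) return 1 ;;
  esac
}

# check_http_redirect HOST: requests http://HOST/ without following redirects and
# inspects the status code and redirect target reported by curl.
check_http_redirect() {
  result=$(curl -s -o /dev/null -w '%{http_code} %{redirect_url}' "http://$1/") || return 1
  is_https_redirect "$result"
}

# Example (placeholder hostname):
# check_http_redirect sarus.example.com && echo "http is redirected to https"
```

If the check fails even though the certificate is configured, confirm that the hostname resolves and that the server is reachable on port 80.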

Installation steps on a single server with Docker compose

You should have received an installer zip file with a secret token from your Sarus representative.

You can then follow the installation steps:

1. Install docker and docker-compose

To install docker on Debian/Ubuntu, follow the instructions at https://docs.docker.com/engine/install/ubuntu/
To install docker-compose, see https://docs.docker.com/compose/install/. For example:
sudo curl -L "https://github.com/docker/compose/releases/download/1.27.4/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose

2. Unzip the provided installer zip file

3. Create a .env file from the env.template
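
One way to do this without overwriting an existing configuration is sketched below. The create_env_file helper is hypothetical; it assumes env.template sits in the directory where the installer was unzipped. Restricting the file permissions is a suggestion, since the file will hold secrets and passwords.

```shell
# create_env_file DIR: copies DIR/env.template to DIR/.env unless .env already exists.
create_env_file() {
  dir=${1:-.}
  if [ ! -f "$dir/env.template" ]; then
    echo "env.template not found in $dir" >&2
    return 1
  fi
  if [ -e "$dir/.env" ]; then
    echo "$dir/.env already exists, leaving it untouched" >&2
    return 0
  fi
  cp "$dir/env.template" "$dir/.env"
  # The file will contain secrets, so make it readable by its owner only.
  chmod 600 "$dir/.env"
}

# Example, from the unzipped installer directory:
# create_env_file .
```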

4. Configure the .env file

Set the required fields, in particular the secrets and passwords, including those of the first Sarus Admin user.
For a local setup, you only need to set the passwords in the REQUIRED section of the env template. Leave the FS_PATH variable blank.
For a setup on Google Cloud Platform, you need to set two additional variables:
  • GCLOUD_KEY_FILE_PATH: the path on the host machine to the Google Cloud credentials file to use

  • FS_PATH: the path of the GCS bucket or directory to use as the storage backend. It should start with gs:// (something like gs://store-datasets-264717/my_dir)

If you want to reach the UI from outside the server, set the NGINX_HOST variable to the server's hostname or IP address.
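
As an illustration, the Google Cloud Platform entries in the .env file could look like the fragment below. The credentials path and the hostname are placeholders, not defaults, and the password variables of the REQUIRED section are omitted because their names come from your env.template.

```shell
# Placeholder path to the Google Cloud credentials file on the host machine.
GCLOUD_KEY_FILE_PATH=/opt/sarus/gcloud-credentials.json
# Storage backend: must start with gs:// (leave blank for a local setup).
FS_PATH=gs://store-datasets-264717/my_dir
# Placeholder hostname (or IP address) used to reach the UI from outside the server.
NGINX_HOST=sarus.example.com
```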

5. Launch the installer (installer.sh)

The installer asks for a password: enter the one provided by Sarus. It then pulls the Docker images and launches the containers, after which the app is available on port 80.

6. Shut down or restart Sarus

To shut down Sarus, run docker-compose down (adding -v will also delete the stored data).
To restart it, run docker-compose up -d.

Network configuration

If you have provided SSL certificates using the SSL_DIR_PATH variable in the .env file:

  • Private Learning Gateway is on port 5000 with https enabled

  • DataAdmin API and UI are on port 443 with https enabled (+ redirect from 80)

Otherwise:

  • Private Learning Gateway is on port 5000 without https

  • DataAdmin API and UI are on port 80 without https
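
The port rules above can be summarized in a small helper that prints the base URLs to expect for a given configuration. This is a sketch for illustration: sarus_urls is a hypothetical function, and the hostname and certificate directory in the example are placeholders.

```shell
# sarus_urls HOST [SSL_DIR_PATH]: prints the DataAdmin URL, then the
# Private Learning Gateway URL, depending on whether SSL_DIR_PATH is set.
sarus_urls() {
  host=$1
  ssl_dir=$2
  if [ -n "$ssl_dir" ]; then
    echo "https://$host:443"
    echo "https://$host:5000"
  else
    echo "http://$host:80"
    echo "http://$host:5000"
  fi
}

# Example with placeholder values:
sarus_urls sarus.example.com /etc/sarus/ssl
```

These are the ports to open in your firewall or security group for the users who need to reach Sarus.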

You’re ready to go!

The first Admin user can now log in to the Sarus UI!
For this first login, use the credentials you specified in the .env file. You will be asked to change your admin password for better security.
You can then invite new users from the “Users” section in the left menu of the UI.