Integration with MongoDB

Requirements & installation

Sarus can be deployed on a dedicated VM with Docker installed or on a Kubernetes cluster. For a deployment on a VM, Sarus does not require a specific OS as all services are deployed with Docker images.
Sarus Docker images require at least 10GB of disk space for installation, and datasets will require additional space. We recommend starting with 100GB of disk space, at least 4 cores / vCPUs, and 16GB of RAM.
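
As a quick pre-flight check on a VM, the recommended sizing can be compared against what the host reports. The sketch below is only an illustration and not part of Sarus: the function names and thresholds are hypothetical, the thresholds mirror the recommendations above, and reading RAM through os.sysconf assumes a Linux host.

```python
import os
import shutil
from typing import List, Tuple

# Recommended sizing from this guide (decimal gigabytes).
MIN_DISK_GB = 100
MIN_CPUS = 4
MIN_RAM_GB = 16


def check_resources(disk_gb: float, cpus: int, ram_gb: float) -> List[str]:
    """Return a list of human-readable shortfalls (empty if all is fine)."""
    problems = []
    if disk_gb < MIN_DISK_GB:
        problems.append(f"disk: {disk_gb:.0f}GB free, {MIN_DISK_GB}GB recommended")
    if cpus < MIN_CPUS:
        problems.append(f"cpu: {cpus} cores, {MIN_CPUS} recommended")
    if ram_gb < MIN_RAM_GB:
        problems.append(f"ram: {ram_gb:.0f}GB, {MIN_RAM_GB}GB recommended")
    return problems


def read_host_resources(path: str = "/") -> Tuple[float, int, float]:
    """Read free disk space, CPU count and physical RAM of the current host."""
    disk_gb = shutil.disk_usage(path).free / 1e9
    cpus = os.cpu_count() or 0
    # Assumption: a Linux host, where these sysconf names are available.
    ram_gb = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1e9
    return disk_gb, cpus, ram_gb


if __name__ == "__main__":
    for problem in check_resources(*read_host_resources()):
        print("WARNING", problem)
```

Keeping the comparison in check_resources separate from the host probing makes it easy to reuse the same thresholds for a VM you are only planning to provision.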

For the full install documentation, please visit the Installation guide.

Constraint on the data structure

Sarus is a powerful tool for analytics and AI. Currently, it supports only datasets with a schema. For data stored in MongoDB as collections, this means that all documents in a collection must have the same schema.

Sarus infers this schema from the first document in the collection. It will recognize the data types and generate synthetic data accordingly.
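
Because the schema is taken from the first document, it can help to check a collection for deviating documents before connecting it. The following is a minimal sketch of such a check, not the way Sarus infers schemas internally. It assumes documents are available as plain Python dictionaries (for example, the documents yielded by PyMongo's collection.find()), and it compares fields by Python type name, which is a simplifying assumption. The function names are hypothetical.

```python
from typing import Any, Dict, Iterable, List, Tuple


def schema_of(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Map each field to its type name, recursing into nested documents."""
    schema: Dict[str, Any] = {}
    for key, value in doc.items():
        if isinstance(value, dict):
            schema[key] = schema_of(value)
        else:
            schema[key] = type(value).__name__
    return schema


def find_schema_mismatches(
    docs: Iterable[Dict[str, Any]],
) -> List[Tuple[int, Dict[str, Any]]]:
    """Return (index, schema) for each document that differs from the first."""
    reference = None
    mismatches = []
    for index, doc in enumerate(docs):
        current = schema_of(doc)
        if reference is None:
            reference = current  # the first document defines the schema
        elif current != reference:
            mismatches.append((index, current))
    return mismatches


# Illustrative documents; the third one stores "age" as a string.
docs = [
    {"_id": 1, "name": "Ada", "age": 36},
    {"_id": 2, "name": "Alan", "age": 41},
    {"_id": 3, "name": "Grace", "age": "unknown"},
]
for index, schema in find_schema_mismatches(docs):
    print(f"document {index} deviates from the first document: {schema}")
```

An empty result means every document has the same fields and types as the first one, which is the situation this section requires.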

Connecting Sarus to MongoDB using the Atlas SQL interface and the JDBC driver

Sarus connects to MongoDB through the Atlas SQL Interface and the JDBC driver. You first need to enable the Atlas SQL Interface on your instance. Check the MongoDB documentation to learn more about the Atlas SQL Interface.

  A. Connect to Your Federated Database Instance

  1. Navigate to your federated database instance. If it isn’t already displayed, select Data Federation from the left navigation panel.

  2. Click Connect to open the federated database instance connection modal.

  3. Select Connect using the Atlas SQL Interface.

  4. Select JDBC Driver.

  5. Copy your connection information. Atlas Data Federation provides a connection string to connect to your federated database instance. You’ll need this in a later step.

_images/mongodb-generate-string.png

  B. Connect from Sarus.

Once you’re logged in to the Sarus UI, you need to create a DataConnection object that will allow Sarus to query MongoDB.

  1. Add a new DataConnection. In Sarus, click DataConnection, then create, and click the MongoDB logo.

  2. In the form, enter the following information:

  • name: the name given to the DataConnection to be created.

  • username: the MongoDB user to connect with.

  • password: the password of that MongoDB user.

  • server: the server name from the URL exposing the MongoDB Atlas SQL interface. This is the part of the connection string you got from step A.5 that lies between the prefix ‘jdbc:mongodb://’ and the name of your virtual database, without any ‘/’ symbols.

  • database: the name of your virtual database.
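
The server and database values can be read off the connection string by hand, or extracted with a few lines of code. The sketch below assumes the string has the shape jdbc:mongodb://<server>/<database> optionally followed by ?options. The function name and the example string are hypothetical, so replace the example with the string you copied from Atlas.

```python
from typing import Tuple

JDBC_PREFIX = "jdbc:mongodb://"


def split_jdbc_url(url: str) -> Tuple[str, str]:
    """Return (server, database) from an Atlas SQL JDBC connection string."""
    if not url.startswith(JDBC_PREFIX):
        raise ValueError(f"expected a string starting with {JDBC_PREFIX!r}")
    rest = url[len(JDBC_PREFIX):]
    # Drop any query options such as "?ssl=true&authSource=admin".
    rest = rest.split("?", 1)[0]
    # The server is everything before the first "/", the database follows it.
    server, _, database = rest.partition("/")
    database = database.strip("/")
    if not server or not database:
        raise ValueError("connection string has no server or database part")
    return server, database


# Hypothetical connection string, for illustration only.
example = (
    "jdbc:mongodb://example-instance.a.query.mongodb.net/"
    "VirtualDatabase0?ssl=true&authSource=admin"
)
server, database = split_jdbc_url(example)
print("server:", server)
print("database:", database)
```

The two printed values correspond to the server and database fields of the form. Neither contains a ‘/’ symbol.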

When you validate the form, a test is performed to check that Sarus can actually access the MongoDB collection.

_images/screenshot_dataconnection_mongodb.png

You’re now ready to go!

You can now create new datasets in Sarus and make your MongoDB data accessible in a privacy-preserving way.