
MongoDB Tasks

MongoDB Tasks are Luigi Tasks that operate on MongoDB databases.

mortar-luigi

These Tasks are part of the mortar-luigi project, an open-source collection of extensions for using Mortar from within Luigi. mortar-luigi is installed automatically when you use the mortar local:luigi or mortar luigi commands.

Configuration

All MongoDB Tasks require the following section to be added to your client.template.cfg file:

[mongodb]
mongo_conn: ${CONN}
mongo_db: ${DB}
mongo_input_collection: ${COLLECTION}

You'll also need to set the following Secure Configuration Parameters on your project:

    mortar config:set CONN=mongodb://<username>:<password>@<host>:<port>
    mortar config:set DB=<databasename>
    mortar config:set COLLECTION=lastfm_plays
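To see how the template and the parameters fit together, here is a minimal sketch. It assumes the ${NAME} placeholders in client.template.cfg are filled by plain string substitution from your project's configuration parameters before the file is parsed. The helper names (render_client_cfg, read_mongodb_section) and the example values are hypothetical; this shows the idea using Python's standard library, not Mortar's own rendering code.

```python
from configparser import ConfigParser
from string import Template

# The [mongodb] section of client.template.cfg, as shown above.
TEMPLATE = """\
[mongodb]
mongo_conn: ${CONN}
mongo_db: ${DB}
mongo_input_collection: ${COLLECTION}
"""


def render_client_cfg(template_text, params):
    """Fill ${NAME} placeholders with configuration parameter values.

    Hypothetical helper. Template.substitute raises KeyError when a
    parameter is missing, so an unset value fails early and loudly.
    """
    return Template(template_text).substitute(params)


def read_mongodb_section(cfg_text):
    """Parse rendered config text and return the [mongodb] options as a dict."""
    # interpolation=None keeps '%' characters in passwords from being
    # treated as interpolation syntax.
    parser = ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    return dict(parser["mongodb"])


if __name__ == "__main__":
    # Hypothetical values standing in for `mortar config:set` parameters.
    params = {
        "CONN": "mongodb://user:secret@db.example.com:27017",
        "DB": "recsys",
        "COLLECTION": "lastfm_plays",
    }
    settings = read_mongodb_section(render_client_cfg(TEMPLATE, params))
    print(settings["mongo_input_collection"])  # lastfm_plays
```

Because ConfigParser splits each option at the first ':' it finds, the colons inside the MongoDB connection string stay in the value.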

SanityTestMongoDBCollection

This sanity test is an automated smoke test that checks whether data was successfully written to a MongoDB collection. It quickly confirms that the collection has been populated by looking up a set of sentinel ids and checking that the collection holds at least a threshold number of records.

Example Usage

import luigi
from mortar.luigi import mongodb


class SanityTestMyMongoDBCollection(mongodb.SanityTestMongoDBCollection):

    # Id field to check
    id_field = 'id'

    # number of results required to be returned for each id
    result_length = luigi.IntParameter(default=5)

    # when testing specific ids, how many are allowed to fail
    failure_threshold = luigi.IntParameter(default=2)

    # number of entries required to be in the collection
    min_total_results = luigi.IntParameter(default=100)

    def collection_name(self):
        return 'my-collection-name'

    def output_token(self):
        return 'path-to-token-output'

    # sentinel ids expected to be in the result data
    def ids(self):
        return ["id1", "id2", "id3", "id4", "id5"]

    def requires(self):
        # WriteMongoDBCollections stands for your own upstream task that
        # populates the collection; it is not part of mortar-luigi.
        return [WriteMongoDBCollections()]

Elements specific to this task are:

  • id_field: Name of the field used to look up records for each sentinel id.
  • result_length: Minimum number of records that each tested id must return.
  • failure_threshold: Maximum number of sentinel ids that may fail before the test raises an error.
  • min_total_results: Minimum number of records the collection must contain.

The collection_name function returns the name of the collection to be tested. The ids function returns the list of sentinel ids to check. An id fails if it is missing from the collection or returns fewer than result_length records, and the test fails if more than failure_threshold ids fail.
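The pass/fail rule can be sketched in plain Python. This is a hypothetical reimplementation for illustration, not mortar-luigi's code. It checks an in-memory list of documents (dicts) instead of querying a live collection. Against a real collection, the two counts would come from queries such as PyMongo's Collection.count_documents, once with an empty filter for the total and once per sentinel id.

```python
from collections import Counter


def sanity_check(documents, id_field, ids, result_length,
                 failure_threshold, min_total_results):
    """Return (passed, failed_ids) for a list of documents.

    Hypothetical stand-in: `documents` is a list of dicts, not a MongoDB
    collection.
    """
    # Total record count must reach min_total_results.
    total_ok = len(documents) >= min_total_results

    # Number of records per id value; Counter returns 0 for absent ids.
    counts = Counter(doc.get(id_field) for doc in documents)

    # An id fails if it is missing or has fewer than result_length records.
    failed = [i for i in ids if counts[i] < result_length]

    # At most failure_threshold failing ids are tolerated.
    return total_ok and len(failed) <= failure_threshold, failed


if __name__ == "__main__":
    ids = ["id1", "id2", "id3", "id4", "id5"]
    # 103 documents: id3 has only 3 records; id4 and id5 are absent.
    docs = ([{"id": "id1"}] * 5 + [{"id": "id2"}] * 5
            + [{"id": "id3"}] * 3 + [{"id": "filler"}] * 90)
    print(sanity_check(docs, "id", ids, 5, 2, 100))
    # (False, ['id3', 'id4', 'id5']): three ids fail, but only 2 may
```

Returning the failing ids alongside the boolean makes it easier to report which sentinels were missing when the check fails.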

output_token returns the path where Luigi writes the token that marks the task as finished.

requires is the standard Luigi method that declares a task's dependencies. These can be other Luigi tasks, data in a specified location, or nothing at all.
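The interplay between the finished token and requires can be illustrated with a toy runner. This is a hypothetical, stripped-down model, not Luigi's API or scheduler. It assumes a task counts as complete once its token file exists, runs dependencies depth-first before the task itself, and skips anything already finished. In this toy the runner writes the token after run() returns; in real Luigi, a task's run() produces its own outputs.

```python
import os
import tempfile


class ToyTask:
    """Hypothetical model of a Luigi-style task (not Luigi's classes)."""

    def __init__(self, token_dir):
        self.token_dir = token_dir

    def requires(self):
        return []  # no dependencies unless a subclass overrides this

    def output_token(self):
        # One token file per task class, e.g. <token_dir>/SanityTest.token
        return os.path.join(self.token_dir, type(self).__name__ + ".token")

    def complete(self):
        # Finished once the token file exists.
        return os.path.exists(self.output_token())

    def run(self):
        pass


class WriteCollection(ToyTask):
    def run(self):
        pass  # would write data to MongoDB


class SanityTest(ToyTask):
    def requires(self):
        return [WriteCollection(self.token_dir)]

    def run(self):
        pass  # would query the collection and check the sentinel ids


def build(task, log):
    """Run dependencies first, then the task; skip tasks with a token."""
    if task.complete():
        return
    for dep in task.requires():
        build(dep, log)
    task.run()
    log.append(type(task).__name__)
    # Write the token so later builds treat this task as finished.
    with open(task.output_token(), "w") as f:
        f.write("done\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        log = []
        build(SanityTest(d), log)
        build(SanityTest(d), log)  # tokens exist now, so nothing reruns
        print(log)  # ['WriteCollection', 'SanityTest']
```

This is why deleting a task's token is enough to make it run again on the next invocation.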

Example in Context

The mortar-recsys project shows how the SanityTestMongoDBCollection Task works in a complete pipeline. There, the Task checks whether the output of a recommendation engine, stored in MongoDB, contains a few expected pieces of data.