Scheduling Jobs to Run Automatically
Mortar can run your Pig and Luigi scripts automatically on a schedule you define.
Creating a New Scheduled Job
Jobs can be scheduled to run daily, weekly, or monthly at a specified time of day. To set up a schedule, log in to Mortar and visit the Schedules page.
Any existing schedules will be shown, and new schedules can be created by clicking the Schedule Job button at the top right of the page.
Common Schedule Fields
The Schedule section at the top of the Schedule Job modal tells Mortar when you'd like your job to run.
The meaning of each field is as follows:
Name: A unique name that you provide for your schedule. For example, "My Music Recommendation Engine".
Repeat: How often should the job run? Options include Daily, Weekly, and Monthly.
Day of Month / Day of Week: Which day of the month or week should the job run? Only required for Weekly and Monthly scheduled jobs.
Time of Day (UTC): Time of day that the job should run. Important: to avoid issues with Daylight Saving Time, this time is in UTC, so convert your local time to UTC before entering it.
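If you're unsure how to do the UTC conversion, the following is a minimal sketch using only the Python standard library. The helper name `to_utc_time_of_day` is hypothetical and not part of Mortar. It takes a local wall-clock time and your current offset from UTC, and returns the UTC time of day plus the number of days the date shifts. The day shift is included on the assumption that the Day of Week / Day of Month fields follow the same UTC clock.

```python
from datetime import datetime, timedelta, timezone


def to_utc_time_of_day(hour, minute, utc_offset_hours):
    """Convert a local wall-clock time to (UTC "HH:MM", day shift).

    utc_offset_hours is your local offset from UTC, e.g. -5 for UTC-5.
    The day shift is +1 if the UTC time falls on the next calendar day,
    -1 if it falls on the previous day, and 0 otherwise.
    """
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    # The date itself is arbitrary; only the time of day matters here.
    local = datetime(2000, 1, 1, hour, minute, tzinfo=local_tz)
    utc = local.astimezone(timezone.utc)
    day_shift = (utc.date() - local.date()).days
    return utc.strftime("%H:%M"), day_shift


print(to_utc_time_of_day(9, 0, -5))   # 09:00 at UTC-5 -> ('14:00', 0)
print(to_utc_time_of_day(9, 0, -4))   # 09:00 at UTC-4 -> ('13:00', 0)
print(to_utc_time_of_day(22, 0, -5))  # 22:00 at UTC-5 -> ('03:00', 1)
```

The first two calls show why the field is in UTC: the same local time maps to different UTC times depending on whether daylight saving is in effect, so a schedule fixed in UTC will appear to move by an hour on your local clock when the clocks change.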
Luigi Pipeline Schedules
To schedule a Luigi pipeline, select the name of the Mortar Project where your pipeline lives in the Project field. Then, select your Luigi pipeline script in the Script field.
Your pipeline will run with the code on the master branch of your project. So, whenever you're ready to promote code to run automatically, just push it to the master branch of your project in GitHub. See the "Deploy To Mortar" section below for more info on how to do so.
You can also set a few additional parameters for your job:
Retry Failures: If a Task in your Luigi pipeline fails or is incomplete (missing required data), Mortar can rerun the pipeline every 15 minutes until it succeeds. This is very helpful for production pipelines that need to wait for data to arrive, or for pipelines whose jobs sometimes fail but succeed when restarted. If you enable this feature, you can choose how many hours to keep retrying before giving up.
Receive email when job finishes: If this option is checked, you will receive an email when your Luigi pipeline completes (success or failure). If you have selected the Retry Failures option, you will only receive an email when the last retry is completed.
Script Parameters: If your Luigi pipeline needs command-line parameters, you can provide them here. Be sure to use dashes instead of underscores in your parameter names, as Luigi expects all parameters to be provided with dashes.
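To illustrate the dash rule for Script Parameters, here is a small sketch that turns Python-style parameter names into the dashed form Luigi expects on the command line. The helper `to_luigi_args` and the parameter names `output_path` and `num_days` are hypothetical examples, and the `--name value` layout is Luigi's usual command-line convention; the exact format the Script Parameters field accepts is an assumption here.

```python
def to_luigi_args(params):
    """Build a list of command-line arguments with dashed parameter names.

    Only the parameter names are rewritten; values are passed through
    unchanged, so underscores inside a value (e.g. a path) are kept.
    """
    args = []
    for name, value in params.items():
        args.append("--" + name.replace("_", "-"))
        args.append(str(value))
    return args


# Hypothetical parameters for an example pipeline.
example = {"output_path": "s3://my-bucket/out_dir", "num_days": 7}
print(" ".join(to_luigi_args(example)))
# --output-path s3://my-bucket/out_dir --num-days 7
```

A parameter declared in your Task as `output_path` would therefore be entered as `output-path`.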
Pig Job Schedules
You can schedule a Pig job for either a Mortar Project or a Web Project.
To schedule a Mortar Project Pig job, select the name of the Mortar Project where your Pig script lives in the Project field. Then, select your Pig script in the Script field.
Then set the remaining fields, which are common to all scheduled Pig jobs and are described below.
To schedule a Web Project Pig job, select the name of your Web Project in the Project field. Then set the remaining fields, which are described below.
The following fields are common to all scheduled Pig jobs:
Use Spot Instances: Should the Hadoop cluster run on Spot Instances? See Spot Instance Clusters for more information.
Keep Cluster Running After Job Finishes: Should the cluster be kept running for up to one hour after the job finishes? Usually you'll want to disable this for scheduled jobs, but you can leave it on for debugging if needed.
Cluster Size: The number of nodes in the Hadoop cluster to be launched.
Receive email when job finishes: If this option is checked, you will receive an email when your Pig job completes (success or failure).
Email <address> if job fails: If this option is checked, an email will be sent to the provided address only if the job fails. This is often used to send an alert to your PagerDuty email endpoint if a job fails.
Script Parameters: Any parameters to provide to your Pig script. Mortar will auto-populate the default parameters defined in your script, but you can add as many parameters as you like.
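If you want to see which default parameters your script defines before scheduling it, the sketch below lists the `%default` statements in a Pig script. It is only an approximation: the helper `pig_defaults` is hypothetical, the simple line-by-line regular expression is an assumption and not how Mortar itself reads your script, and the script text and bucket name are made-up examples.

```python
import re

# Matches lines such as:  %default INPUT_PATH 's3n://my-bucket/input'
# A trailing semicolon, if present, is ignored.
DEFAULT_RE = re.compile(r"^\s*%default\s+(\w+)\s+(.*?)\s*;?\s*$")


def pig_defaults(script_text):
    """Return a dict of parameter name -> default value found in the script."""
    defaults = {}
    for line in script_text.splitlines():
        match = DEFAULT_RE.match(line)
        if not match:
            continue
        name, value = match.groups()
        # Strip one pair of matching surrounding quotes, if any.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        defaults[name] = value
    return defaults


# Hypothetical Pig script used only for illustration.
script = """
%default INPUT_PATH 's3n://my-bucket/input'
%default MIN_PLAYS 5
songs = LOAD '$INPUT_PATH' USING PigStorage();
"""
print(pig_defaults(script))
# {'INPUT_PATH': 's3n://my-bucket/input', 'MIN_PLAYS': '5'}
```

Any value you enter in the Script Parameters field for one of these names would take the place of the default when the scheduled job runs.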
Deploy To Mortar
For Mortar projects, schedules are run using the code on the master branch of your project.
To deploy your code to the master branch of your Mortar project, commit it with git and push it to the mortar remote. More information on git is available here, but the basic steps are as follows:
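# ensure you are on the master branch
git checkout master
# stage the files or directories you changed
git add my-file-or-directory
# commit the staged changes
git commit -m "my commit message"
# push master to the mortar remote to deploy it
git push mortar master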