Jobs

A job is defined as any executable program that resides on the file system. It is represented as a file in the files system whose name is the same as the job name. Jobs can depend on each other. Jobs can also have start times before which a job may not by run.

When a job is run by the run wrapper (bin/run), two status semaphore files are created in the log directory. The first is created when a job starts and has a name of $FamilyName.$JobName.pid. This file contains some attributes of the job. When the job completes, more attributes are written to this file.

When the job completes, another semaphore file is written to the log directory. The name of this file will be $FamilyName.$JobName.0 if the job ran successfully, and $FamilyName.$JobName.1 if the job failed. In either case, the file will contain the exit code of the job (0 in the case of success and non-zero otherwise).

When a job is run by the run_with_log run wrapper, any output the job sends to stdout or stderr will be captured and stored in a file called $FamilyName.$JobName.$pid.$start_time.stdout in the log directory.

Within TaskForest, every job has a status, which is one of the following values:

  • Waiting - One or more dependencies of the job have not been met.
  • Ready - All dependencies have been met; the job will run during the next iteration of the taskforest main loop.
  • Running - The job is currently running
  • Success - The job has run successfully
  • Failure - The job was run, but the program exited with a non-zero return code
Jobs & Families

Jobs can be grouped together into ``Families.'' A family has a start time associated with it before which none of its jobs may run. A family also has a either (a) a list of days-of-the-week or (b) a calendar associated with it. Jobs within a family may only run on the days specified by the days-of-the-week or the calendar.

Jobs and families are given simple names. A family is described in a family file whose name is the family name. Each family file is a text file that contains 1 or more job names. The layout of the job names within a family file determine the dependencies between the jobs (if any). There are several reasons why text files are a good choice for Family files.

Family names and job names should contain only the characters shown below:
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_

Let's see a few examples. In these examples the dashes (-), pipes (|) and line numbers are not parts of the files. They're only there for illustration purposes. The main script expects environment variables or command line options or configuration file settings that specify the locations of the directory that contain family files, the directory that contains job files, and the directory where the logs will be written. The directory that contains family files should contain only family files.

EXAMPLE 1 - Family file named F_ADMIN

   +-------------------------------------------------------
01 |start => '02:00', tz => 'GMT', days => 'Mon,Wed,Fri'
02 |
03 | J_ROTATE_LOGS()
04 |
   +-------------------------------------------------------
  

The first line in any family file always contains 3 bits of information about the family: the start time, the time zone, and the days on which this jobs in this family are run, or the calendar that specifies on which dates jobs in this family are run.

In this case, this family starts at 2:00 a.m. Chicago time. The time is adjusted for daylight savings time. This family 'runs' on Monday, Wednesday and Friday only. Pay attention to the format: it's important.

Valid days are 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun'. Days must be separated by commas.

All start times (for families and jobs) are in 24-hour format. '00:00' is midnight, '12:00' is noon, '13:00' is 1:00 p.m. and '23:59' is one minute before midnight.

There is only one job in this family - J_ROTATE_LOGS. This family will start at 2:00 a.m., at which time J_ROTATE_LOGS will immediately be run. Note the empty parentheses [()]. These are required.

What does it mean to say that J_ROTATE_LOGS will be run? It means that the system will look for a file called J_ROTATE_LOGS in the directory that contains job files. That file should be executable. The system will execute that file (run that job) and keep track of whether it succeeded or failed. The J_ROTATE_LOGS script can be any executable file: a shell script, a perl script, a C program etc.

To run the program, the system actually runs a wrapper script that invokes the job script. The location of the wrapper script is specified on the command line or in an environment variable.

Now, let's look at a slightly more complicated example:

EXAMPLE 2 - Job Dependencies

This family file is named WEB_ADMIN.


   +-------------------------------------------------------
01 |start => '02:00', tz => 'GMT', calendar => 'Weekdays'
02 |
03 |               J_ROTATE_LOGS()
04 |
05 | J_RESOLVE_DNS()       Delete_Old_Logs()
06 |
07 |               J_WEB_REPORTS()      
08 |
09 |    J_EMAIL_WEB_RPT_DONE()  # send me a notification
10 |
   +-------------------------------------------------------
  

A few things to point out here:

  • Blank lines are ignored.
  • A hash (#) and anything after it, until the end of the line is treated as a comment and ignored.
  • Job and family names do not have to start with J_ or be in upper case.
  • Instead of specifying days of the week on which this Family may run, this Family specifies a calendar to which it adheres. The calendar used in this case is called 'Weekdays.' For a detailed description of calendars, please see the Calendars documentation page.
EXAMPLE 3 - Time Dependencies

Let's say that we don't want J_RESOLVE_DNS to start before 9:00 a.m. because it's very IO-intensive and we want to wait until the relatively quiet time of 9:00 a.m. In that case, we can put a time dependency of the job. This adds a restriction to the job, saying that it may not run before the time specified. We would do this as follows:


   +-------------------------------------------------------
01 |start => '02:00', tz => 'GMT', calendar => 'Weekdays'
02 |
03 |               J_ROTATE_LOGS()
04 |
05 | J_RESOLVE_DNS(start => '09:00')  Delete_Old_Logs()
06 |
07 |               J_WEB_REPORTS()      
08 |
09 |    J_EMAIL_WEB_RPT_DONE()  # send me a notification
10 |
   +-------------------------------------------------------
  

J_ROTATE_LOGS will still start at 2:00, as always. As soon as it succeeds, Delete_Old_Logs is started. If J_ROTATE_LOGS succeeds before 09:00, the system will wait until 09:00 before starting J_RESOLVE_DNS. It is possible that Delete_Old_Logs would have started and complete by then. J_WEB_REPORTS would not have started in that case, because it is dependent on two jobs, and both of them have to run successfully before it can run.

For completeness, you may also specify a timezone for a job's time dependency as follows:

05 | J_RESOLVE_DNS(start=>'10:00', tz=>'America/New_York') ...
EXAMPLE 4 - Job Forests

You can see in the example above that line 03 is the start of a group of dependent job. No job on any other line can start unless the job on line 03 succeeds. What if you wanted two or more groups of jobs in the same family that start at the same time (barring any time dependencies) and proceed independently of each other?

To do this you would separate the groups with a line containing one or more dashes (only). Consider the following family:


   +-------------------------------------------------------
01 |start => '02:00', tz => 'GMT', calendar => 'Weekdays'
02 |
03 |               J_ROTATE_LOGS()
04 |
05 | J_RESOLVE_DNS(start => '09:00')    Delete_Old_Logs()
06 |
07 |               J_WEB_REPORTS()      
08 |
09 |    J_EMAIL_WEB_RPT_DONE()  # send me a notification
10 |
11 |-------------------------------------------------------
12 |
13 | J_UPDATE_ACCOUNTS_RECEIVABLE()
14 |
15 | J_ATTEMPT_CREDIT_CARD_PAYMENTS()
16 |
17 |-------------------------------------------------------
18 |
19 | J_SEND_EXPIRING_CARDS_EMAIL()
20 |
   +-------------------------------------------------------

Because of the lines of dashes on lines 11 and 17, the jobs on lines 03, 13 and 19 will all start at 02:00. These jobs are independent of each other. J_ATTEMPT_CREDIT_CARD_PAYMENT will not run if J_UPDATE_ACCOUNTS_RECEIVABLE fails. That failure, however will not prevent J_SEND_EXPIRING_CARDS_EMAIL from running.

Finally, you can specify a job to run repeatedly every 'n' minutes, as follows:


   +-------------------------------------------------------
01 |start => '02:00', tz => 'GMT', calendar => 'Weekdays'
02 |
03 | J_CHECK_DISK_USAGE(every=>'30', until=>'23:00')
04 |
   +-------------------------------------------------------

This means that J_CHECK_DISK_USAGE will be called every 30 minutes and will not run on or after 23:00. By default, the 'until' time is 23:59. If the job starts at 02:00 and takes 25 minutes to run to completion, the next occurance will still start at 02:30, and not at 02:55. By default, every repeat occurrance will only have one dependency - the time - and will not depend on earlier occurances running successfully or even running at all. If line 03 were:

J_CHECK_DISK_USAGE(every=>'30', until=>'23:00', chained=>1)

...then each repeat job will be dependent on the previous occurance.

EXAMPLE 5 - Tokens
EXAMPLE 6 - Calendars