Each job can be assigned a startup window interval. The startup window applies to jobs that are waiting for a start time in scheduling queue B: it is the allowed period within which a job must leave the B queue.
As an example, a job is set to start at 12:00. The system goes down at 11:50 and comes back up at 12:30. When the system comes back up, the scheduler decides whether or not to actually start the 12:00 job based on the window interval. The job is now 30 minutes behind schedule. If the window interval is 30 minutes or greater, the job is started. If it is less than 30 minutes, it is NOT started.[4]
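The decision can be sketched as a small shell function (purely illustrative; the scheduler's actual check is internal to the product):

```shell
#!/bin/sh
# Decide whether a late job should still be started.
# $1 = minutes the job is behind schedule
# $2 = startup window in minutes (empty means `not set', i.e. infinite)
should_start() {
    delay=$1
    window=$2
    if [ -z "$window" ]; then
        echo yes            # no window set: always start
    elif [ "$delay" -le "$window" ]; then
        echo yes            # within the window: start
    else
        echo no             # window exceeded: do not start
    fi
}

should_start 30 30    # 30 minutes late, 30-minute window
should_start 30 15    # 30 minutes late, 15-minute window
```

With a 30-minute delay, a 30-minute (or unset) window lets the job run, while a 15-minute window suppresses it, matching the example above.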
The main objective of the startup window is to allow a system that has been down to gracefully catch up and not create unnecessary or undesired batch jobs.
For example, to establish 15 minutes as the startup window for your job MYJOB, use the following command:
Schedule> chjob myjob -general=startup_window:00:15
The execution window is the interval within which a job must be initiated and completed. If a job runs beyond this window, any jobs on its initiate list are not activated. In terms of the scheduling queues, this is the time it takes a job to travel from the C or D queue to the M queue.
As an example, a series of linked batch jobs A, B, and C were started on Monday morning. Normally A initiates B and B initiates C. The A job is submitted and then activates the B job. Due to an error, the B job is placed in the PENDING state and forgotten. On Thursday someone notices that the B job is still on the system and PENDING, and starts it, not realizing that it is no longer needed. The B job has now been in the system for 3 days. If the execution window is set to less than 3 days, the C job will NOT be started when the B job terminates. If it had been set to 3 days or greater, the C job would be activated.
The objective of the execution window interval is to allow a system that has been down or work that has been postponed for a lengthy time a graceful method for catching up without creating extra batch jobs.
For example, to establish 12 hours as the execution window for your job MYJOB, use the following command:
Schedule> chjob myjob -general=execute_window:12
2.1.14 Pre/Post scripts
For each job, two scripts can be defined. The pre-job script is a file that contains a series of commands to be executed before the job's own commands. The post-job script is executed afterwards. This is a useful place for operations that are common to all jobs (or a family of jobs).[5]
By modifying your local default job you can have a common pre-job and post-job command added automatically to each new job that you create. For example, to have the script start_job execute at the beginning of each new job that you create, use the following commands:
Schedule> mkjob default -general=pre_com_file:/usr/start_files/start_job
Schedule> mkjob new_job
When NEW_JOB is initiated and enters an execution queue, the temporary script contains commands of the following form:
/usr/start_files/start_job
... job commands ...
Both the pre and post scripts should be set up to always complete with a success status code, so that the main procedure always gets a chance to do its work.
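A pre-job script that follows this advice might look like the following sketch (the path and setup command are hypothetical):

```shell
#!/bin/sh
# Sketch of a pre-job script that always reports success, so the job's
# own commands always get a chance to run. The setup step is hypothetical.
run_pre_job() {
    # Attempt the shared setup; tolerate any failure.
    mkdir -p "${WORK_DIR:-/tmp/myjob_work}" 2>/dev/null ||
        echo "pre-job setup failed, continuing anyway" >&2
    return 0    # always report success
}

run_pre_job
```

The key point is the unconditional success return: even if the shared setup fails, the main job commands still execute.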
To include the actual contents of a PreCom or PostCom file in the temporary command, use a command of the following format:
Schedule> mk default -general=pre_com_file:"I myfile"
The "I" indicates that the contents of "myfile" are to be included into
the temporary script that is submitted.
2.1.15 Job sets
A job in the SCHEDULE database is essentially a self-contained processing unit: all the information needed to perform the processing is defined for each job in the control file. Related jobs can, in turn, be grouped into a set.
A set of jobs is just a collection of jobs that are connected to each other using either prerequisites or initiates. They can be in the same or different directories. They can be on the same or different nodes in a network.
Three pieces of information are passed between the members of a job set during execution to control and coordinate the processing: the set parameter, the set tag, and the set id. This allows multiple invocations of the same job set to process different data without any interference between them.
2.1.15.1 Using set parameters
The set parameter is used to pass information between jobs. For
example, we have set up a series of three jobs (JOB1, JOB2 and JOB3
each initiates the next job) that process a given data file into a
report. To start up the series and send along the file name use a
command of the following format:
Schedule> submit job1 -set_parameter="DATAFILE1.DAT" -initiates
The -initiates qualifier is present so that all initiate jobs will be triggered on completion. (The default behavior of the submit command is to just run the one job independently of all interconnections.)
The first job (JOB1) reads and processes the input data file and generates an output file which is then processed by the second job (JOB2). The commands for the first job (JOB1) would look something like the following.
define input_file 'schedule_set_parameter'
process_data_program
#
# pass the data to job2
#
Schedule> chque -entry='schedule_entry' -set_parameter="output.dat"
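On the receiving side, JOB2 picks the file name up through the set parameter symbol. A sketch in plain shell (assuming the value arrives in $schedule_set_parameter, as in the JOB1 commands above; the default file name is hypothetical):

```shell
#!/bin/sh
# Sketch of JOB2: process the file named by the set parameter.
process_file() {
    # Prefer an explicit argument, then the set parameter symbol,
    # then a hypothetical default.
    file=${1:-${schedule_set_parameter:-input.dat}}
    echo "processing $file"
}

process_file output.dat
```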
2.1.15.2 Using set tags
The set tag value is used to pass data between jobs in a job set. The
first job in a job set that gets initiated will either pick up the set
tag value from the initiate operation or from the job tag value in the
control file.
For example, we have a series of jobs (MONTHLY, DAILY and REPORT; each job initiates the next in this order). When MONTHLY completes it initiates DAILY, and DAILY then initiates REPORT. In the REPORT job it is important to be able to identify which job ran first (either MONTHLY or DAILY). This is easily done by establishing a default tag value for each job in the following fashion:
Schedule> chjob monthly -general=job_tag:"MONTHLY"
Schedule> chjob daily -general=job_tag:"DAILY"
In the REPORT job a check of the tag value can be made to determine how the set was started.
unix> if schedule_set_tag .eqs. "MONTHLY" then goto extra_stuff
#
.
.
extra_stuff
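In ordinary POSIX shell syntax the same branch can be sketched as follows (assuming the tag value is available as $schedule_set_tag; the step names are hypothetical):

```shell
#!/bin/sh
# Sketch: branch on how the job set was started.
handle_tag() {
    case $1 in
        MONTHLY) echo "month-end extras" ;;   # set was started by MONTHLY
        *)       echo "normal processing" ;;  # started by DAILY (or other)
    esac
}

handle_tag "${schedule_set_tag:-DAILY}"
```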
2.1.15.3 Using set ids
The set id number identifies a job set. All members of a job set are assigned the same set id during processing. The set id can be left off, in which case the system assigns one automatically; alternatively, a number can be explicitly provided when the first job of the set is initiated. The automatically assigned numbers are always in the range 1 to 99,999, so it is best to assign manually only numbers greater than this.
The set id is needed if a failed job must be rerun manually to allow a job set to complete its processing. For example, to start a job by itself with a given set id, use the following command:
Schedule> submit job1 -set_id=23
2.1.16 Automatic job restarts
Any job can be set up to be automatically resubmitted for execution if a failure occurs. Typical failures are a node shutdown or a program aborting. In a cluster environment it is useful to have jobs submitted into a generic batch queue; then, if a node goes down, the job is resubmitted into the generic queue, which can in turn route it to another node.
To enable restarts for a particular job, set its restart count to a number greater than zero:
Schedule> chjob /demo/a/start -general=restart_count:4
The restart count indicates how many times a job can be restarted if it fails. After this number of restarts have occurred the job is considered to have failed. Once a job has terminated (either successfully or failed) the initiate list associated with the job is examined and any appropriate jobs on the list are started.
A group of environment variables is automatically set up whenever a job is submitted. These symbols can be used by the job to determine whether or not the job is restarting. The environment variables used are listed below:
Symbol | Description
---|---
SCHEDULE_STEP | current step number
SCHEDULE_ENTRY | scheduling queue entry number
SCHEDULE_RESTARTING | 1 if the job is restarting, otherwise 0
SCHEDULE_RESTART_COUNT | number of restarts that have occurred
SCHEDULE_RESTART_LIMIT | allowed number of restarts
The basic idea is to update the STEP number as the job proceeds. At the beginning of the job, a check is made for the RESTARTING flag; if it is set, a GOTO is done to the correct step. A typical command sequence using this process would appear as follows:
$ if schedule_restarting then goto step_'schedule_step'
$!
$step_1:
Schedule> chque -entry='schedule_entry' -step=1
$ run program1
#
step_2:
schedule chque -entry='schedule_entry' -step=2
program2
#
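On Unix systems the same restart pattern can be sketched in plain POSIX shell. This is illustrative only: the schedule chque calls that record the current step are shown as comments (they require the scheduler to be present), and the step bodies are placeholders:

```shell
#!/bin/sh
# Sketch of restartable steps driven by the environment variables above.
run_steps() {
    start=${1:-1}    # step number to resume from
    if [ "$start" -le 1 ]; then
        # schedule chque -entry="$SCHEDULE_ENTRY" -step=1   (record progress)
        echo "step 1"
    fi
    if [ "$start" -le 2 ]; then
        # schedule chque -entry="$SCHEDULE_ENTRY" -step=2
        echo "step 2"
    fi
}

if [ "${SCHEDULE_RESTARTING:-0}" -eq 1 ]; then
    run_steps "${SCHEDULE_STEP:-1}"    # resume at the recorded step
else
    run_steps 1                        # fresh run: start from the top
fi
```

A fresh run executes every step; a restart at step 2 skips the already-completed step 1.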
The above method is excellent for handling certain types of failures. Another method is to assign an initiate list to the job and, on that list, provide a job with a condition level of FATAL and/or ERROR. This error job will then be started whenever the job fails (or still fails after exhausting all allowed restarts).
[4] A value of `not set' is equivalent to an infinite window interval.
[5] In a Satellite-Central network configuration be sure that these pre/post command files exist on all systems. Only the commands that are contained in the SCHEDULE Database are copied across the net.
[6] The automatically assigned numbers are in the range 1 to 99,999.
Several other features of SCHEDULE are explained in more
detail in the following sections.
2.3 EnterpriseSCHEDULE Workgroups
EnterpriseSCHEDULE features the concept of workgroups. Workgroup commands are detailed in Chapter 12. The basic functions of Workgroups are to replicate job data between nodes and to distribute job execution across them.
It is recommended that Workgroup configuration and maintenance be performed in the EnterpriseSCHEDULE Windows Client. The client features a Wizard that makes setting up a Workgroup a simple task. This section is included only to explain the concepts behind Workgroups. See the Windows Client documentation for instructions on working with Workgroups.
Workgroups can apply load balancing to EnterpriseSCHEDULE job flow by determining the best way to distribute the workload. For instance, you can direct more job runs to your more powerful servers and allow jobs to run across multiple machines, alleviating the load on any single server. This can all be automated by setting a few simple parameters on the Workgroup in which the jobs run.
A workgroup is two or more nodes in a networked environment that maintain duplicate copies of the EnterpriseSCHEDULE database (through automated replication) and/or facilitate the execution of jobs on multiple nodes.
2.3.1 Creating a Workgroup
All Workgroups are created in the \\syscontrol\workgroups folder in the EnterpriseSCHEDULE database. You must have All Access rights to create, modify, delete or copy and paste a workgroup. To create a new Workgroup:
The workgroup can now be configured to replicate job data and/or
distribute job activity.
2.3.2 Configuring a Workgroup
The Workgroup is configured using either the EnterpriseSCHEDULE Windows
Client or using commands. For a description of the commands used to
configure the workgroup, see Chapter 12.
2.3.2.1 Execution mode
The job's execution mode determines how jobs assigned to a workgroup
will be processed. The choices are: