CHAPTER 5

DYNAMIC TASK DUPLICATION BASED SCHEDULING ALGORITHM (DyDupSA)

5.1 INTRODUCTION

Parallel computing has an impact on many domains, from engineering and scientific applications to commercial applications. Its popularity stems from falling computer hardware costs and rising computation speeds. The performance of parallel computing depends on efficient scheduling. Scheduling is the process that maps and manages the execution of interdependent tasks onto distributed processors. Scheduling and allotment of tasks to processors are major issues in parallel computing. The scheduling problem is NP-complete for most of its variants, except for a few simplified cases Sho 15.

With the advent of high-speed communication technologies and the concept of a Network Of Workstations (NOW), distributed processing has been investigated extensively for parallel computing. The computing power of a local network can even exceed that of a supercomputer. Not only is this cost-effective, since the investment in moderately powerful computers is low, but applications can also be partitioned to extract the maximum benefit from a particular design. Applications that use such computing environments include fluid flow, weather modelling, database systems, and image processing. To obtain maximum benefit, a parallel application is first transformed into a Directed Acyclic Graph (DAG) by a task partitioning algorithm; the resulting tasks are then mapped onto processors by a scheduling algorithm. Most scheduling algorithms focus mainly on one optimization criterion, namely the schedule length or makespan (i.e. the total completion time of a job). Duplication based scheduling algorithms are designed to avoid the communication of results between tasks and thereby reduce the makespan Kur 12.

Among the many types and methods of scheduling, task duplication based scheduling is one of the most effective techniques for achieving a minimum makespan. The main idea behind duplication based scheduling is to utilize processor idle time by duplicating predecessor tasks. Task duplication avoids transferring results from a predecessor over a communication channel, which would otherwise incur Inter Process Communication (IPC); this can eliminate waiting slots on other processors Sam 02.

Multiprocessor task scheduling is applied to a wide variety of problems in which multiple tasks are to be performed on a number of processors. The tasks considered here are non-preemptive. The goal is to schedule these tasks on the processors while meeting the required constraints.

This chapter proposes a Dynamic Task Duplication based Scheduling Algorithm (DyDupSA), which duplicates tasks level-wise and then dynamically migrates tasks from one processor to another at run time. The aim of DyDupSA is to minimize the makespan and maximize processor utilization.

5.2 TASK DUPLICATION

Task duplication based scheduling provides greater efficiency and a smaller makespan than other scheduling techniques. Its main idea is to use processor idle time to duplicate predecessor tasks. This avoids the data transfer from a predecessor to a successor, thus reducing communication time and network overhead, and potentially reducing the waiting time of a task. Task duplication based scheduling is most useful for systems with high communication latencies and low bandwidths. Such algorithms have been developed in the past for both homogeneous and heterogeneous processors Rav 13.
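The payoff described above can be shown with a toy calculation: a successor pays the communication cost only when its predecessor's result lives on a different processor, so duplicating the predecessor locally drives that cost to zero. The class name, method, and numbers below are illustrative, not taken from the thesis.

```java
// Sketch: effect of duplication on a successor's earliest start time.
// All names and numbers here are illustrative assumptions.
public class DuplicationEffect {
    // parentFinish: when the predecessor finishes on some processor;
    // commTime: cost of shipping its result across the network, charged
    // only when parent and child sit on different processors.
    static double earliestStart(double parentFinish, double commTime, boolean sameProcessor) {
        return parentFinish + (sameProcessor ? 0.0 : commTime);
    }

    public static void main(String[] args) {
        double remote = earliestStart(5.0, 3.0, false); // parent elsewhere: pay comm time
        double local  = earliestStart(5.0, 3.0, true);  // parent duplicated locally: comm time is zero
        System.out.println(remote + " " + local);       // 8.0 5.0
    }
}
```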

Duplications are based on task dependencies. A duplication algorithm should ensure beneficial duplications and avoid unnecessary ones, so that the idle time slots between task executions are used effectively. Duplication also aims to avoid communication contention, which occurs when large sets of data are transported frequently Mut 15.

Task duplication is a relatively new approach to DAG scheduling. Duplication-based algorithms outperform non-duplication-based ones in terms of generating smaller schedule lengths; however, this is achieved at the expense of higher time complexity and a larger number of processors Dor 10. In this chapter, a scheduling scheme called DyDupSA is proposed to schedule the tasks of a DAG onto a heterogeneous system. DyDupSA targets a network of workstations with processors of varying computing speeds. The primary objective of the scheme is to minimize the makespan and maximize processor utilization.

The constraints and assumptions the problem considers are:

1. The system contains a limited set of fully connected heterogeneous processors.

2. Different communication times exist between tasks.

3. Each task has precedence constraints that must be satisfied: a task cannot execute until all of its prerequisite tasks have completed.

4. Task duplication is allowed, which helps minimize communication time; i.e., if a task and its predecessor are on the same processor, the communication time is zero Rav 13.

In a DAG, tasks are distributed across levels. Pictorially, the parent tasks occupy the top level and children are placed in successive levels towards the bottom; a child depends on its parent(s). There can be no dependencies between tasks of the same level, and a child may have parents in any level above its own. Duplication executes particular tasks on more than one processor, making their results locally available to more descendants; the message passing time then becomes zero and inter-processor communication is avoided. Processor idle time may be used to execute a duplicated task. Research generally shows that duplication leads to earlier completion of the DAG and avoids communication contention between processors. Since idle slots are used effectively, waiting time is avoided and processor performance improves considerably. Many algorithms have produced quality results on homogeneous computing systems; however, scheduling tasks onto processors in a heterogeneous environment remains a complicated problem due to the varying nature of processors, speeds, and network bandwidths Mut 15.
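The level structure described above can be derived directly from the DAG. The sketch below assigns each task one more than the maximum level of its parents, with roots at level 0; the four-task graph and the assumption that tasks are numbered in topological order are illustrative, not the thesis example.

```java
import java.util.*;

// Sketch of level assignment in a DAG: a task's level is one more than
// the maximum level of its parents; root tasks sit at level 0.
public class DagLevels {
    static int[] levels(int n, int[][] edges) {
        List<List<Integer>> parents = new ArrayList<>();
        for (int i = 0; i < n; i++) parents.add(new ArrayList<>());
        for (int[] e : edges) parents.get(e[1]).add(e[0]); // e = {parent, child}
        int[] level = new int[n];
        // Tasks are assumed numbered in topological order, so one pass suffices.
        for (int t = 0; t < n; t++) {
            int lv = 0;
            for (int p : parents.get(t)) lv = Math.max(lv, level[p] + 1);
            level[t] = lv;
        }
        return level;
    }

    public static void main(String[] args) {
        // Illustrative graph: 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3
        int[] lv = levels(4, new int[][]{{0, 1}, {0, 2}, {1, 3}, {2, 3}});
        System.out.println(Arrays.toString(lv)); // [0, 1, 1, 2]
    }
}
```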

5.3 PROPOSED METHODOLOGY

In DyDupSA, a level based Task Tree (TT) is first generated from the DAG, and a Processor List (PL) is generated from the available processors. The root task is duplicated and assigned to all processors in the first level. For the second level, the average communication time is calculated by dividing the sum of the communication times by the total number of tasks in that level. If the communication time of a task is greater than the average communication time of the current level, that task is duplicated and assigned to all processors. After the tasks of a level are completed, these steps are repeated level by level until all tasks are scheduled. Finally, a child task is migrated to its parent's processor when the task has a higher computation time. The proposed algorithm has two main objectives: maximizing processor utilization and minimizing the total completion time (makespan).

Figure 5.1. Workflow of DyDupSA

It is found that duplicating tasks in each level reduces communication time, minimizes the makespan, and maximizes processor utilization. The performance of the algorithm is measured by comparing it with the DCP and TDSA algorithms. The results show a minimized makespan and effective processor utilization with balanced loads across processors in multiprocessor environments. Figure 1.4 shows the sample DAG with 10 tasks. Figure 5.1 shows the workflow diagram of DyDupSA, and the pseudocode of DyDupSA is given in figure 5.2.

1: Input DAG
2: Create Task Tree (TT)
3: Arrange the processors in descending order
4: For each level CL
5:     Avg(CT) = Sum(CT) in CL / total number of tasks in CL
6:     Assign task Ti to the processor pj that has the minimum EFT of task Ti
7:     If CT(Ti) > Avg(CT)
8:         Duplicate Ti to every pj ∈ FP
9:     Repeat steps 5 to 8 until all tasks at that level are executed
10: Migrate the child task to its parent's processor if the task has a higher computation time/communication time

Figure 5.2. The DyDupSA Algorithm
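A minimal executable sketch of steps 4 to 9 above, under several stated simplifications: EFT is approximated by each processor's free time, the set FP is assumed to be all processors, level-0 (root) tasks are duplicated unconditionally as in the scheme's first step, migration (step 10) is omitted, and every communication and execution time is invented for illustration.

```java
import java.util.*;

// Illustrative level-wise duplication rule: per level, compute the average
// communication time, duplicate any task whose communication time exceeds
// it onto every processor, and place the rest on the processor that would
// finish them earliest. This is a sketch, not the full DyDupSA algorithm.
public class LevelDuplicationSketch {
    static double[] free; // free[p] = time at which processor p becomes free

    // levelsCT[lv][t] / levelsET[lv][t]: assumed communication / execution
    // times of task t at level lv. Returns, per task in level order, the
    // set of processors it is placed on.
    static List<Set<Integer>> schedule(double[][] levelsCT, double[][] levelsET, int procs) {
        free = new double[procs];
        List<Set<Integer>> placement = new ArrayList<>();
        for (int lv = 0; lv < levelsCT.length; lv++) {
            // Step 5: average communication time of the current level
            double avg = Arrays.stream(levelsCT[lv]).average().orElse(0.0);
            for (int t = 0; t < levelsCT[lv].length; t++) {
                Set<Integer> on = new HashSet<>();
                if (lv == 0 || levelsCT[lv][t] > avg) {
                    // Steps 7-8: heavy communicator (or root): duplicate everywhere
                    for (int p = 0; p < procs; p++) { free[p] += levelsET[lv][t]; on.add(p); }
                } else {
                    // Step 6: otherwise pick the processor that finishes it earliest
                    int best = 0;
                    for (int p = 1; p < procs; p++) if (free[p] < free[best]) best = p;
                    free[best] += levelsET[lv][t];
                    on.add(best);
                }
                placement.add(on);
            }
        }
        return placement;
    }

    public static void main(String[] args) {
        // Level 1 average CT is 2.5, so only the task with CT 4.0 is duplicated.
        double[][] ct = {{5.0}, {1.0, 4.0}};
        double[][] et = {{2.0}, {3.0, 3.0}};
        System.out.println(schedule(ct, et, 2)); // e.g. [[0, 1], [0], [0, 1]]
    }
}
```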

5.4 RESULTS AND DISCUSSION

The proposed DyDupSA is simulated in a Java environment, and its performance is compared with the DCP and TDSA algorithms. DyDupSA is tested using arbitrary task graphs and regular graphs. The number of tasks in the arbitrary task graphs is varied from 10 to 2000, and the number of processors from 3 to 44.

For the DAG shown in figure 1.4, the tasks T0, T1, T2 and T8 are duplicated to all processors based on the average communication time. After duplication, tasks are migrated according to the migration condition. In the first level, T1 has the highest computation time, but it is already duplicated. In the next level, the task with the highest computation time, T5, is migrated from processor P1 to P0. After migrating the duplicated tasks, the DyDupSA algorithm produces a smaller makespan than before migration; the execution time is reduced from 18.6 time units to 16.8 time units. Table 5.1 shows the duplicated tasks before migration, and table 5.2 shows the duplicated tasks after migration, with the migrated tasks highlighted.
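As a quick check of the migration arithmetic, the durations of T5, T4 and T9 from table 5.2 can be chained on P0 starting from T8's finish time of 10.4; back-to-back execution lands on the reported 16.8 time units. The helper below is only an illustration of that sum.

```java
// Sketch: replaying P0's timeline after T5 migrates from P1 to P0.
// Durations are read off table 5.2 (end minus start); the assumption that
// each task starts exactly when the previous one ends matches that table.
public class MigrationCheck {
    static double finishTime(double start, double[] durations) {
        double t = start;
        for (double d : durations) t += d; // chain tasks back-to-back
        return t;
    }

    public static void main(String[] args) {
        // T5 (3.2), T4 (1.6), T9 (1.6) run consecutively on P0 from 10.4.
        System.out.println(finishTime(10.4, new double[]{3.2, 1.6, 1.6})); // about 16.8
    }
}
```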

Table 5.1. Duplicated Task before Migration

Task   Processor   Start Time   End Time
T0     P1          0            1
T0     P0          0            0.80
T0     P2          0            0.67
T1     P1          1            10
T1     P0          0.80         8
T1     P2          0.67         6.67
T2     P1          10           11
T2     P0          8            8.80
T2     P2          6.67         7.34
T7     P2          7.34         8.01
T8     P1          11           13
T8     P0          8.80         10.40
T8     P2          8.01         9.34
T3     P2          9.34         11.34
T5     P1          13           17
T6     P2          11.34        12.67
T4     P0          10.40        12
T9     P0          17           18.60

Table 5.2. Duplicated Task after Migration

Task   Processor   Start Time   End Time
T0     P1          0            1
T0     P0          0            0.80
T0     P2          0            0.67
T1     P1          1            10
T1     P0          0.80         8
T1     P2          0.67         6.67
T2     P1          10           11
T2     P0          8            8.80
T2     P2          6.67         7.34
T7     P2          7.34         8.01
T8     P1          11           13
T8     P0          8.80         10.40
T8     P2          8.01         9.34
T3     P2          9.34         11.34
T5     P0          10.40        13.60
T6     P2          11.34        12.67
T4     P0          13.60        15.20
T9     P0          15.20        16.80

Figure 5.3 shows the Gantt chart for the given arbitrary task graph after duplicating tasks but before migration. For example, processor P0 is assigned T0, T1, T2, T8, T4 and T9; processor P1 is assigned T0, T1, T2, T8 and T5; and processor P2 is assigned T0, T1, T2, T7, T8, T3 and T6. Thus the tasks T0, T1, T2 and T8 are duplicated on all processors, which reduces the communication time. Before migration, the execution time is 18.6 time units.

P0: T0(0-0.8), T1(0.8-8), T2(8-8.8), T8(8.8-10.4), T4(10.4-12), T9(17-18.6)

P1: T0(0-1), T1(1-10), T2(10-11), T8(11-13), T5(13-17)

P2: T0(0-0.67), T1(0.67-6.67), T2(6.67-7.34), T7(7.34-8.01), T8(8.01-9.34), T3(9.34-11.34), T6(11.34-12.67)

Figure 5.3. Gantt chart for the Given Task Graph before Migration

Figure 5.4 shows the Gantt chart for task duplication after migration. For example, processor P0 is assigned T0, T1, T2, T8, T5, T4 and T9; processor P1 is assigned T0, T1, T2 and T8; and processor P2 is assigned T0, T1, T2, T7, T8, T3 and T6. After migration, the execution time is 16.8 time units.

P0: T0(0-0.8), T1(0.8-8), T2(8-8.8), T8(8.8-10.4), T5(10.4-13.6), T4(13.6-15.2), T9(15.2-16.8)

P1: T0(0-1), T1(1-10), T2(10-11), T8(11-13)

P2: T0(0-0.67), T1(0.67-6.67), T2(6.67-7.34), T7(7.34-8.01), T8(8.01-9.34), T3(9.34-11.34), T6(11.34-12.67)

Figure 5.4. Gantt chart for Task Duplication after Migration

To evaluate the performance of the DyDupSA algorithm, experiments are conducted using arbitrary task graphs and regular graphs.

5.4.1 Makespan

The makespan is defined as the time from the point at which the first task begins execution to the point at which the last task completes execution. The results for the arbitrary task graphs are shown in figure 5.5. It is observed that DyDupSA outperforms DCP and TDSA in makespan.
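Under this definition, the makespan can be computed from any schedule list of (start, end) pairs as the gap between the earliest start and the latest finish. The sketch below uses a few rows of table 5.2 as sample intervals.

```java
// Sketch: makespan as the span from the earliest start to the latest
// finish over all scheduled task instances.
public class Makespan {
    static double makespan(double[][] intervals) {
        double first = Double.MAX_VALUE, last = 0.0;
        for (double[] iv : intervals) {
            first = Math.min(first, iv[0]); // earliest start
            last  = Math.max(last, iv[1]);  // latest finish
        }
        return last - first;
    }

    public static void main(String[] args) {
        // A few (start, end) rows from table 5.2's after-migration schedule.
        double[][] sched = {{0, 1}, {0, 0.8}, {0, 0.67}, {10.4, 13.6}, {15.2, 16.8}};
        System.out.println(makespan(sched)); // 16.8
    }
}
```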

Table 5.3. Comparison of makespan of DyDupSA

with DCP algorithm and TDSA Algorithm for Arbitrary Task Graph

Tasks   Processors   DCP    TDSA   DyDupSA
10      3            85     30     17
100     10           231    139    130
200     14           298    286    198
500     22           590    340    232
1000    31           1169   587    343
2000    44           2261   1036   388

Figure 5.5. Comparison of makespan of DyDupSA

with DCP algorithm and TDSA algorithm for Arbitrary Task Graph

The DyDupSA algorithm is also applied to the Gaussian Elimination graph and the Laplace Equation graph, and produces better results than the other algorithms. Figure 5.6 depicts the comparison of DyDupSA with the DCP and TDSA algorithms. For a 17-task Gaussian Elimination graph, the number of processors is varied from 3 to 7 in steps of 2 and the results are tabulated.

Figure 5.6. Comparison of makespan for DyDupSA algorithm

with DCP algorithm and TDSA Algorithm for Gaussian Elimination Graph

Figure 5.7 shows the comparison of makespan with the DCP and TDSA algorithms. For a 16-task Laplace Equation graph, the number of processors is varied from 4 to 8 in steps of 2 and the results are tabulated.

Figure 5.7. Comparison of makespan of DyDupSA algorithm

with DCP algorithm and TDSA Algorithm for Laplace Equation Graph

5.4.2 Processor utilization

The execution time and idle time of a processor are obtained from the schedule list and used to calculate processor utilization. Table 5.4 shows the processor utilization of the DCP, TDSA and DyDupSA algorithms. The results for the arbitrary task graphs are plotted in figure 5.8. It is observed that DyDupSA outperforms the DCP and TDSA algorithms in processor utilization; thus all processors are utilized effectively.
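A common way to obtain such a utilization figure is to divide the total busy time by the capacity of the schedule (number of processors times makespan), idle time being everything that is not busy. The busy times below are illustrative assumptions, not the thesis measurements.

```java
// Sketch: average processor utilization as total busy time divided by
// (number of processors * makespan). Inputs are illustrative.
public class Utilization {
    static double utilization(double[] busy, double makespan) {
        double total = 0.0;
        for (double b : busy) total += b; // sum of per-processor busy times
        return total / (busy.length * makespan);
    }

    public static void main(String[] args) {
        // Three processors busy 15.2, 13.0 and 12.67 units of a 16.8-unit schedule.
        double u = utilization(new double[]{15.2, 13.0, 12.67}, 16.8);
        System.out.printf("%.2f%n", u); // about 0.81
    }
}
```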

Table 5.4. Comparison study of processor utilization of DyDupSA algorithm

with DCP algorithm and TDSA Algorithm for Arbitrary Task Graph

Tasks   Processors   DCP (%)   TDSA (%)   DyDupSA (%)
10      3            68        70         72
100     10           60        65         68
200     14           65        70         71
500     22           65        68         70
1000    31           70        78         80
2000    44           73        86         88

Figure 5.8. Comparison of processor utilization of DyDupSA algorithm

with DCP algorithm and TDSA Algorithm for Arbitrary Task Graph

Figure 5.9 illustrates the comparison of processor utilization of the DyDupSA algorithm with the DCP and TDSA algorithms for the Gaussian Elimination graph. DyDupSA achieves higher processor utilization than the DCP and TDSA algorithms.

Figure 5.9. Comparison of processor utilization of DyDupSA algorithm

with DCP algorithm and TDSA Algorithms for Gaussian Elimination Graph

Figure 5.10 depicts the comparison of processor utilization of DyDupSA with the DCP and TDSA algorithms for the Laplace Equation graph. DyDupSA produces the highest processor utilization among the three algorithms, and while maximizing processor utilization the load is also balanced.

Figure 5.10. Comparison of processor utilization of DyDupSA algorithm

with DCP algorithm and TDSA Algorithm for Laplace Equation Graph

The time complexity of the DyDupSA algorithm is O(np), where n is the number of tasks and p is the number of processors.

5.5 CONCLUSION

Multiprocessor task scheduling is a rich area demanding efficient methods to minimize task computation time in real-world applications. Scheduling tasks onto computational processors is a complex problem: scheduling is the decision process by which application components are assigned to available resources to optimize various performance metrics, and the multiprocessor scheduling problem can be modelled with a DAG. Task duplication based scheduling provides greater efficiency and a smaller makespan than other scheduling techniques. The proposed DyDupSA algorithm duplicates tasks in each level to reduce communication time, and migrates tasks after duplication to further reduce the makespan. Thus, DyDupSA minimizes the makespan and maximizes processor utilization for both arbitrary task graphs and regular graphs.