OpenMP Parallel For
The parallel directive #pragma omp parallel makes the code parallel, that is, it forks the master thread into a number of parallel threads, but it doesn’t actually share out the work: every thread executes the whole region. To illustrate, consider the following piece of code:
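#include <stdio.h>
#include <omp.h>
int main(void){
    int i;
    /* private(i) gives each thread its own copy of i; a shared i would be a data race */
    #pragma omp parallel private(i)
    {
        for(i=0; i<10; i++){
            printf("Hello World %d\n",i);
        }
    }
    return 0;
}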
If there were ten threads, then 100 Hello Worlds would be produced, because each thread runs all ten iterations. Note that the order in which they are produced is NOT GUARANTEED.
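To make the duplication concrete, here is a minimal sketch that counts how many times the loop body runs inside a plain parallel region. The function name count_replicated_iterations is my own invention for illustration, and the counter is protected with an atomic update so the tally is reliable.

```c
#include <stdio.h>

/* Hypothetical helper: runs an n-iteration loop inside a plain parallel
   region and returns how many times the loop body ran in total. */
long count_replicated_iterations(int n)
{
    long total = 0;  /* shared between all threads */

    #pragma omp parallel
    {
        int i;  /* declared inside the region, so each thread has its own copy */
        for (i = 0; i < n; i++) {
            /* atomic makes the increment of the shared counter safe */
            #pragma omp atomic
            total++;
        }
    }
    return total;  /* n multiplied by the number of threads in the team */
}
```

If the runtime provides four threads, count_replicated_iterations(10) returns 40 rather than 10. Compiled without OpenMP support (for example, without -fopenmp on gcc), the pragmas are ignored and the result is simply n.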
What we are really after is the parallel for directive, which combines the parallel directive with the for work-sharing construct so that the loop iterations are divided among the threads.
#include <stdio.h>
#include <omp.h>
int main(void){
    int i;
    /* the ten iterations are shared out among the threads */
    #pragma omp parallel for
    for(i=0; i<10; i++){
        printf("Thread ID: %d Hello World %d\n",omp_get_thread_num(),i);
    }
    return 0;
}
The for directive applies to the for loop immediately following it. Notice that, in contrast to before, we don’t have to outline a parallel region with curly braces {} after this directive. This program yields:
prog@michael-laptop:~$ ./test
Thread ID: 3 Hello World 9
Thread ID: 1 Hello World 3
Thread ID: 1 Hello World 4
Thread ID: 0 Hello World 0
Thread ID: 0 Hello World 1
Thread ID: 0 Hello World 2
Thread ID: 2 Hello World 6
Thread ID: 2 Hello World 7
Thread ID: 1 Hello World 5
Thread ID: 2 Hello World 8
prog@michael-laptop:~$
Notice what I said about the order: each iteration is printed exactly once, but not in sequence. By default, the loop index (i in this context) is made private by the for directive, so each thread works on its own copy.
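As a small sketch of what this buys us in practice, the function below squares every element of an array. The name square_all and its parameters are assumptions made up for this example. Each iteration writes to a different element, so no extra synchronisation is needed.

```c
/* Hypothetical example: square every element of in[] into out[].
   The iterations are independent, so they can be shared out freely. */
void square_all(const int *in, int *out, int n)
{
    int i;  /* made private to each thread by the for directive */

    #pragma omp parallel for
    for (i = 0; i < n; i++) {
        out[i] = in[i] * in[i];  /* each iteration touches only its own element */
    }
}
```

Whichever thread ends up handling a given i, every index from 0 to n-1 is processed exactly once, so the result is the same as the serial loop.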
At the end of the parallel for loop there is an implicit barrier, where all threads wait until they have all finished. There are, however, some rules for the parallel for directive:
- The loop index, i, is incremented by a fixed amount each iteration e.g. i++ or i+=step.
- The start and end values must not change during the loop.
- There must be no “breaks” in the loop, i.e. no statement that jumps out of the loop body early. Function calls are, however, permitted and run as you would expect.
- The comparison operator must be one of <, <=, >= or >.
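The rules above can be sketched with two small functions; both names are hypothetical and only serve as illustration. The first uses a fixed stride with a >= test. The second shows one possible way to live without break: rather than leaving the loop early, each thread records what it found in a shared flag.

```c
/* Hypothetical example: a loop in the permitted form that counts down
   with a fixed step of 2 and a >= comparison. */
void negate_every_other(int *a, int n)
{
    int i;

    #pragma omp parallel for
    for (i = n - 1; i >= 0; i -= 2) {
        a[i] = -a[i];
    }
}

/* Hypothetical example: a search that would normally use break.
   Every iteration runs; a match just sets a shared flag. */
int contains_negative(const int *a, int n)
{
    int i;
    int found = 0;  /* shared flag */

    #pragma omp parallel for
    for (i = 0; i < n; i++) {
        if (a[i] < 0) {
            /* atomic write: several threads may set the flag at once */
            #pragma omp atomic write
            found = 1;
        }
    }
    return found;
}
```

The price of the second version is that the loop always runs to completion, even if a match turns up in the first iteration.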
There may be times when you want to perform some operation in the order of the iterations. This can be achieved with an ordered clause on the parallel for directive and an ordered directive inside the loop. Each thread will wait until the previous iteration has finished its ordered section before proceeding with its own.
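#include <stdio.h>
#include <omp.h>
int expensive_function(int i); /* assumed to be defined elsewhere */
int main(void){
    int i,a[10];
    #pragma omp parallel for ordered
    for(i=0;i<10;i++){
        a[i]=expensive_function(i); /* runs in parallel */
        /* only the statement below is executed in iteration order */
        #pragma omp ordered
        printf("Thread ID: %d Hello World %d\n",omp_get_thread_num(),i);
    }
    return 0;
}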
This will now print out the Hello Worlds in order. N.B. There is a penalty for this: the threads have to wait until the preceding iteration has finished with its ordered section of code. It is only worthwhile if expensive_function() really is expensive, so that the work done in parallel outweighs the time spent waiting.
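For a version that can be checked without reading printed output, here is a minimal sketch that records the order in which the ordered sections run. The body of expensive_function below is only a stand-in that I made up (a short busy loop returning i squared), and run_ordered with its parameters is likewise hypothetical.

```c
/* Stand-in for the expensive function: burns some time, returns i squared. */
static int expensive_function(int i)
{
    volatile long spin = 0;
    long k;
    for (k = 0; k < 100000; k++) {
        spin += k;
    }
    return i * i;
}

/* Hypothetical example: results[i] receives expensive_function(i), and
   order[] records the sequence in which the ordered sections executed. */
void run_ordered(int *results, int *order, int n)
{
    int i;
    int next = 0;  /* shared, but only touched inside the ordered section */

    #pragma omp parallel for ordered
    for (i = 0; i < n; i++) {
        int value = expensive_function(i);  /* this part runs in parallel */

        #pragma omp ordered
        {
            /* ordered sections run one at a time, in iteration order */
            results[i] = value;
            order[next++] = i;
        }
    }
}
```

After the call, order[] should hold 0, 1, 2, ... up to n-1 however many threads took part, because only the call to expensive_function() overlaps between threads.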