OpenMP Tutorial – Critical, Atomic and Reduction

Atomic and Critical

  • critical: the enclosed code block will be executed by only one thread at a time, and not simultaneously executed by multiple threads. It is often used to protect shared data fromrace conditions.
  • atomic: the memory update (write, or read-modify-write) in the next instruction will be performed atomically. It does not make the entire statement atomic; only the memory update is atomic. A compiler might use special hardware instructions for better performance than when using¬†critical.

Consider this code which numerically approximates pi:


int main(void){
double pi,x;
int i,N;
pi=0.0;
N=1000;
#pragma omp parallel for
for(i=0;i x=(double)i/N;
pi+=4/(1+x*x);
}
pi=pi/N;
printf("Pi is %f\n",pi);
}

We compile this with gcc main.c -o test, ignoring the -fopenmp options, this means that the #pragma omp parallel for will be interpreted as a comment i.e. ignored. We run it and this is the result:
<

prog@michael-laptop:~$ gcc test.c -o test
prog@michael-laptop:~$ ./test
Pi is 3.142592

Now compile with the -fopenmp option and run:

prog@michael-laptop:~$ gcc test.c -o test -fopenmp
prog@michael-laptop:~$ ./test
Pi is 2.785016

Oh dear… Let’s examine what went wrong. Well, by default and as we have not specified it as private, the variable x is shared. This means all threads have the same memory address of the variable x. Therefore, thread i will compute some value at x and store it at memory address &x, thread j will then compute its value of x and store it at &x BEFORE thread i has used its value to make its contribution to pi. The threads are all over writing each others values of x because they all have the same memory address for x. Our first correction is that x must be made private:
#pragma omp parallel for private(x)

Secondly, we have a “Race Condition” for pi. Let me illustrate this with a simple example. Here is what would ideally happen:

 

  • Thread 1 reads the current value of pi : 0
  • Thread 1 increments the value of pi : 1
  • Thread 1 stores the new value of pi: 1
  • Thread 2 reads the current value of pi: 1
  • Thread 2 increments the value of pi: 2
  • Thread 2 stores the value of pi: 2

What is actually happening is more like this:

  • Thread 1 reads the current value of pi: 0
  • Thread 2 reads the current value of pi: 0
  • Thread 1 increments pi: 1
  • Thread 2 increments pi: 1
  • Thread 1 stores its value of pi: 1
  • Thread 2 stores its value of pi: 1

The way to correct this is to tell the code to execute the read/write of pi only one thread at a time. This can be achieved with critical or atomic. Add
#pragma omp atomic Just before pi get’s updated and you’ll see that it works.

This scenario crops up time and time again where you are updating some value inside a parallel loop so in the end it had its own clause made for it. All the above can be achieved by simply making pi a reduction variable.

Reduction

To make pi a reduction variable the code is changed as follows:

 

int main(void){
double pi,x;
int i,N;
pi=0.0;
N=1000;
#pragma omp parallel for private(x) reduction(+:pi)
for(i=0;i&lt;N;i++){
x=(double)i/N;
pi+=4/(1+x*x);
}
pi=pi/N;
printf("Pi is %f\n",pi);
}

This is simply the quick and neat way of achieving all what we did above.

Leave a Reply