## Parallel Random Number Generation using TRNG

Over the last few days I’ve been trawling internet forums and reading discussions about the best parallel random number generation libraries. Given the trend in CPU architectures whereby each generation contains an ever increasing number of cores, it makes sense to start a project by considering which available libraries make best use of this technology. To my surprise and disappointment, popular scientific libraries like Boost or GSL provide no native support for parallel random number generation. Recently, though, I came across TRNG, an excellent random number generation library for C++ built specifically with parallel architectures in mind.

The first libraries I came across were RngStream and SPRNG. It seems SPRNG was built specifically with MPI, i.e. distributed memory architectures, in mind, and there are some excellent examples and resources for getting started with parallel MCMC on Darren Wilkinson’s blog. As a result, it seems a bit contrived to get SPRNG to work with OpenMP, i.e. shared memory architectures. Moreover, I specifically wanted to use OpenMP because I wanted to write an extension for R for use on personal computers. RngStream is written by the man himself, Pierre L’Ecuyer, and is much more OpenMP-amenable. Both of these, however, only generate uniform pseudo-random numbers. This isn’t a fault, but it means you need to code up transformations and samplers yourself to generate non-uniform pseudo-random numbers. While this would be a good exercise, life is short, and I’d rather leave this sort of thing to the professionals (I don’t want to code up my own Ziggurat algorithm). Also, the generators, or engines, are of defined types, and I found it hard to convert them into the corresponding types of other libraries like Boost or GSL so that I could use their non-uniform generation code. That probably says more about my ability than about the actual difficulty of the problem, and Darren Wilkinson provides some of his own code for getting the RNG engine of SPRNG into the correct datatype to be compatible with GSL.

## TRNG

At this point I was quite discouraged, but then I came across TRNG, written by Heiko Bauke. At first glance TRNG is an excellently documented C++ PRNG (which stands for pseudo random number generator, not parallel; that would be PPRNG) library built specifically with parallel architectures in mind. Not only does it provide non-uniform distributions, but it can be used easily with MPI, OpenMP, CUDA and TBB, for which many examples are supplied. The documentation is excellent, and the many examples of the same problem coded with each of the aforementioned parallelization methods are enlightening. If that weren’t enough, TRNG can be used in combination and interchangeably with the Boost random as well as the TR1/C++11 random libraries. That is, the engines/generators from TRNG can be used with the distribution functions of Boost and C++11, which was exactly the problem I encountered with RngStream and SPRNG. The way TRNG and RngStream work is slightly different. Whereas RngStream generates multiple independent streams, TRNG uses a single stream and either divides it into blocks, or interleaves it between different processors by a leapfrog-type scheme, much like dealing out cards round a table. The point of all this is that the streams of different processors never overlap, otherwise one would get the same draws on processor A as on processor B. While purists might argue that L’Ecuyer’s method is more rigorous, I’m happy enough with the way Heiko has done it, especially given TRNG’s out-of-the-box ease of use and compatibility.

### Installation of TRNG

Clone the repository off GitHub, then configure, build, and install:

```
[michael@michael ~]$ git clone https://github.com/rabauke/trng4
[michael@michael ~]$ cd trng4/
[michael@michael trng4]$ ./configure --prefix=/usr
[michael@michael trng4]$ make
[michael@michael trng4]$ make install
[michael@michael trng4]$ sudo bash
[root@michael trng4]# ldconfig
[root@michael trng4]#
```


The `--prefix=` argument just sets where I want the files to be installed and is not necessary; if omitted, the default is /usr/local. After `make install`, run ldconfig as root in order to update the dynamic linker/loader with the presence of the new library.

Basically, there exists a cache, /etc/ld.so.cache, which the dynamic linker/loader uses at run-time as a cross-reference between a library’s soname and its full file path. ldconfig is normally run during boot but can also be run at any time to update the cache with the locations of new libraries. Here is what happens if you don’t run ldconfig, as I did the first time.

```
[michael@michael ~]$ g++ hello_world.cc -L /usr/lib -ltrng4
[michael@michael ~]$ ./a.out
./a.out: error while loading shared libraries: libtrng4.so.0: cannot open shared object file: No such file or directory
```


It compiled fine, but at run-time the loader couldn’t find the library.
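A handy way to diagnose this kind of failure (a general shell technique, not specific to TRNG) is `ldd`, which lists a binary’s shared-library dependencies and marks any the loader cannot resolve as “not found”. For the example above one would run `ldd ./a.out` and look for a `libtrng4.so.0 => not found` line; the snippet below uses /bin/ls so it runs anywhere:

```shell
# List the shared libraries a binary depends on; entries the dynamic
# loader cannot resolve are shown as "not found".
ldd /bin/ls
```

As a one-off alternative to running ldconfig, you can also point the loader at the library’s directory directly, e.g. `LD_LIBRARY_PATH=/usr/lib ./a.out`.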

## Parallel Random Number Generation in C++

Addendum: I think instead of using `trng::yarn2 gen[max]` (a variable-length array, which is not standard C++) it is better to allocate on the heap:

```cpp
trng::yarn2 *gen;
gen = new trng::yarn2[max];
// ... use gen[i] ...
delete[] gen;   // remember to free when done
```

The approach will be to generate the PRNGs in C++ and call them from R using Rcpp. First let’s consider the C++ code to generate some random uniforms.

```cpp
#include <cstdlib>
#include <iostream>
#include <omp.h>
#include <trng/yarn2.hpp>
#include <trng/uniform01_dist.hpp>

int main() {
  int rank;
  const int max = omp_get_max_threads();  // number of threads available
  trng::yarn2 gen[max];                   // one generator per thread
  trng::uniform01_dist<> u;
  std::cout << max << " = max num of threads" << std::endl;

  // Leapfrog: generator i takes every max'th draw of the single
  // underlying stream, starting at offset i.
  for (int i = 0; i < max; i++) {
    gen[i].split(max, i);
  }

  #pragma omp parallel for private(rank)
  for (int i = 0; i < max; ++i) {
    rank = omp_get_thread_num();          // which thread am I?
    #pragma omp critical
    std::cout << u(gen[rank]) << " from thread " << rank << std::endl;
  }
  return EXIT_SUCCESS;
}
```



which is compiled and run as follows:

```
[michael@michael ~]$ g++ omprng.cpp -o omprng -fopenmp -ltrng4
[michael@michael ~]$ ./omprng
```


The salient feature of this code is the leapfrog process achieved by calling split. There exists a single sequence of random uniforms, and `gen[i].split(max, i)` divides it into max subsequences that leapfrog each other, grabbing the i-th subsequence. You can think of this as max players sitting around a poker table, with split() continuously dealing out random uniforms to each of the players. The code says: let processor i be “player” i and use the sequence of random uniforms dealt to it.

## Parallel Random Number Generation in R using Rcpp

Thanks to Rcpp, the above C++ code can be trivially changed so that it can be called from R: include the Rcpp header, add the `// [[Rcpp::export]]` attribute, and change the function return type.

```cpp
#include <cstdlib>
#include <iostream>
#include <omp.h>
#include <trng/yarn2.hpp>
#include <trng/uniform01_dist.hpp>
#include <Rcpp.h>

// [[Rcpp::export]]
Rcpp::NumericVector prunif(int n) {
  int rank;
  const int max = omp_get_max_threads();  // number of threads available
  trng::yarn2 gen[max];                   // one generator per thread
  trng::uniform01_dist<> u;
  Rcpp::NumericVector draws(n);

  // Leapfrog the single stream across the max threads
  for (int i = 0; i < max; i++) {
    gen[i].split(max, i);
  }

  #pragma omp parallel for private(rank)
  for (int i = 0; i < n; ++i) {
    rank = omp_get_thread_num();
    draws[i] = u(gen[rank]);
  }

  return draws;
}
```


This code can be compiled and loaded into R on the fly, so let’s test it.

### Speedup Performance

```
> library(Rcpp)
> library(rbenchmark)
> Sys.setenv("PKG_CXXFLAGS"="-fopenmp")
> Sys.setenv("PKG_LIBS"="-ltrng4")
> sourceCpp("prunif.cpp")
> benchmark(replications=100, runif(1000000), prunif(1000000))
            test replications elapsed relative user.self sys.self user.child
2 prunif(1e+06)          100   0.611     1.00     2.227    0.114          0
1  runif(1e+06)          100   3.837     6.28     3.745    0.086          0
```


*Figure: Parallel RNG speedup.*

There are a few things to note. Spawning threads incurs its own overhead, so the parallel version will obviously be slower for very few draws. As the number of draws becomes larger, the time taken to spawn new threads is dwarfed by the time taken to create the draws, and so it becomes worthwhile to do it in parallel. One caveat is that prunif and runif did not, in this case, use the same generating algorithm: R’s algorithm can be changed with RNGkind(), and the TRNG algorithm can be changed by using one of its engines other than trng::yarn2. Even if they were the same, though, I would expect the same qualitative behaviour.

## Discussion

Generating large samples of random numbers in one hit quickly is not the reason I started looking for a good parallel random number generator. Rarely is it important to me to generate a large number of draws in one go, but it certainly is important to me to have independent streams. Generally I will port expensive parts of my R code, usually for loops, to C++, and somewhere within these loops I will inevitably need to draw some random numbers. Since these pieces of code are self-evidently expensive, I will want to compute them in parallel in C++ if I can, and so it is very important to have independent streams from which to draw random numbers.

## 4 thoughts on “Parallel Random Number Generation using TRNG”

1. Hi Michael, many thanks for sharing this! I wonder how difficult it would be to build an R package that makes all the distributions in TRNG available at the R level, by using Rcpp. I guess that would be quite useful, as I don’t know of any package that provides such functionality.

• Hey Matteo, it would be nice to have a package for parallel random number generation for sure, but it might be possible to achieve this with the C++11 pseudo random number generators instead of TRNG (I wrote this post before I was aware of the C++11 PRNGs). A lot of people would be happy to have one C++11 PRNG object per thread and seed them differently (i.e. seed according to thread id); this is what I always used to do. But the purists out there, I think, would say that seeding according to thread id is simply not enough, as it may still be possible to get overlapping streams. Having said that, I think in practice the chances of getting an overlapping stream are rare, so you might be willing to accept that risk. I know enough to be cautious, but I don’t really know how best to proceed.

2. Actually using one seed per thread is what I generally do too… but sometimes I do wonder about the hidden risks!

3. Thanks, nice writeup.
Here’s the best theoretical discussion of threads and rng streams, with a clear discussion of problems and justification for TRNG’s methodology, FWIW.
http://arxiv.org/abs/0905.4238