C++11 versus R Standalone Random Number Generation Performance Comparison

If you are writing some C++ code with the intent of calling it from R, or even developing it into a package, you might wonder whether it is better to use the pseudo-random number library native to C++11 or the R standalone math library. On the one hand, users of your package might have an outdated compiler which doesn't support C++11, but on the other hand there are potential speedups to be won by using the library native to C++11. I decided to compare the performance of these two libraries.

#define MATHLIB_STANDALONE
#include <iostream>
#include <vector>
#include <random>
#include <chrono>
#include "Rmath.h"

int main(int argc, char *argv[])
{
        int ndraws=100000000;
        std::vector<double> Z(ndraws);
        std::mt19937 engine;
        std::normal_distribution<double> N(0,1);

        // time the normal draws from the C++11 <random> library
        auto start = std::chrono::steady_clock::now();
        for(auto & z : Z ) {
                z=N(engine);
        }
        auto end = std::chrono::steady_clock::now();
        std::chrono::duration<double> elapsed=end-start;

        std::cout <<  elapsed.count() << " seconds - C++11" << std::endl;

        // time the same number of draws from the R standalone library
        start = std::chrono::steady_clock::now();
        for(auto & z : Z ) {
                z=rnorm(0,1);
        }
        end = std::chrono::steady_clock::now();
        elapsed=end-start;

        std::cout <<  elapsed.count() << " seconds - R Standalone" << std::endl;

        return 0;
}

Compile and run with:

[michael@michael coda]$ g++ normal.cpp -o normal -std=c++11 -O3 -lRmath
[michael@michael coda]$ ./normal 

Normal Generation

5.2252 seconds - C++11
6.0679 seconds - R Standalone

Gamma Generation

11.2132 seconds - C++11
12.4486 seconds - R Standalone

Cauchy Generation

6.31157 seconds - C++11
6.35053 seconds - R Standalone

As expected, the C++11 implementation is faster, but not by a huge amount. Since the computational cost of my code is dominated by other linear algebra procedures of O(n^3), I'd actually be willing to use the R standalone library because its syntax is more user-friendly.

Recursive Search Within Files in Terminal

When I inherit a load of code from people, I often like to see which files call certain functions. A way to do this is to use grep recursively via the -R option. Say I want to find all the files in which 'rgamma' appears.

[msl33@hotel Documents]$ grep -R 'rgamma' *
601/lab/4/script4.R:lamda=rgamma(n,shape=5,rate=6)
601/lab/9/9.R:#  X[i,]=rgamma
601/lab/9/9.R:#   X=matrix(rgamma(length(a)*k,a,1),k,length(a),byrow=T)
601/lab/9/9.R:# theta=rgamma(10000,7,1000)
[msl33@hotel Documents]$

This will look recursively within files to find appearances of ‘rgamma’. As it turns out rgamma appears once in “script4.R” and thrice in “9.R”.
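
If you just want a list of the files that mention the function, the -l flag prints only the matching filenames, and -n adds line numbers when you do want to see the matches themselves:

[msl33@hotel Documents]$ grep -Rl 'rgamma' *
[msl33@hotel Documents]$ grep -Rn 'rgamma' *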

Mustang Vim Colourscheme

I just came across this awesome vim colourscheme called mustang. The author’s deviantArt page is here but a slightly modified version is found here, which provides colouring for NERDTree.

[screenshot: vim with the Mustang colourscheme]

R snippets for vim-SnipMate

Vim is my editor of choice, reasonably so, whether it be for coding C++, LaTeX or even R. I've used RStudio, which even has a Vim mode, but I still prefer to use Vim. Vim has its own R plugin, namely Vim-R-plugin, but this post is about snippets. SnipMate is an awesome auto-completion plugin for Vim which is fully customizable. One simply writes a string, rnorm for example, and presses tab to auto-complete the code to rnorm(n=,mean=,sd=), where repeated presses of tab cycle through the placeholders at the function parameters.

The strings to recognize, referred to as snippets, are stored in a snippets file "languagetype.snippets" along with the corresponding code to auto-complete. These can be user-defined for any language, not just R. It's usually not necessary to write these snippets files yourself, as there are already existing snippets files within the Vim community, including one for R. Here is a github repository containing snippets files for a great many languages, including R. Simply put the R one into your vim-SnipMate snippets directory. The last thing to do is to tell Vim to recognize an R filetype. If you open an R file and type

:set ft=r

then this will tell Vim that the file is an R file. Obviously you don't want to do this every time, so to get Vim to automatically recognize ".r" and ".R" extensions as R files, simply add the following lines to your .vimrc:

au BufNewFile,BufRead *.r set filetype=r
au BufNewFile,BufRead *.R set filetype=r

Writing Custom Snippets

This is an example of how to complete a for loop having written only "for". Add the following code to the r.snippets file.

snippet for
    for(${1:i} in ${2:begin}:${3:end}){
    ${4}
    }${5}

This defines "for" as the snippet, i.e. the string after which we press tab. The dollar signs define the placeholders: "${x:text}" defines the x'th placeholder and the highlighted text which will be replaced.

for<tab>

becomes

for(i in begin:end){

}


Repeated pressing of tab cycles between i, begin, end, the body of the for loop, and then breaks out of the for loop.

AWK Remove Lines of Multiple Files

My girlfriend is a statistician and acquired some data in the form of 200 text files. The problem was that each file had two lines of non-informative description which needed removing before the data could be processed. Instead of removing the lines from every file by hand, I wrote this for her:

mkdir news
for file in *; do
  # skip directories, e.g. the freshly created "news" folder itself
  [ -f "$file" ] || continue
  # keep every line except the first two and store the result under news/
  awk '{if (NR!=1 && NR!=2) {print}}' "$file" > news/"$file"
done
# if this script (named "script") lives in the same directory, remove its processed copy
rm -f news/script

This uses awk to remove lines 1 and 2 from each file in the current directory and stores the reduced files in the newly created directory "news". Just run the shell script in the directory containing all the files.
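
If the originals don't need to be kept, a stream editor can do the same job in place; a sketch assuming GNU sed (for the -i option) and that the data files share a .txt extension:

sed -i '1,2d' *.txt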

rsync Tutorial and Help with Examples

rsync is what people used to use before Dropbox. It is used to sync remote folders with local ones. Say, for example, that I have some work on my office computer in folder 'foo'. It is the weekend and I need to work on this project, but I don't want to go into the office. rsync can be used to pull the folder from the remote (office) machine onto my local (home) computer. Then, when I am done with the changes, rsync can push all the modified files back to the office computer. rsync also has some really great features to ensure it only transfers the files which have been modified.

Home$ rsync -avzu office:.matlab/myfunctions/figs/ .
receiving incremental file list
./
1_a_0_neg.eps
1_a_0_pos.eps
1_a_10_neg.eps
1_a_10_pos.eps

sent 90 bytes  received 12464 bytes  2789.78 bytes/sec
total size is 36238  speedup is 2.89

-a, --archive
Stands for archive mode. Basically this means that the structure from the office machine is kept on the local machine (i.e. same symbolic links etc.)

-v, --verbose
Increases verbosity

-z, --compress
Compresses the files for transfer purposes and then uncompresses them on the local machine. Really great if transferring massive amounts of data.

-u, --update
This means that if the destination folder has files with a timestamp newer than the files in the source folder, then the files in the destination folder will not be overwritten with the older source ones.

These are just a few of the most common arguments. Lastly, a good one to remember:

-n, --dry-run
This won't actually transfer any data; it merely shows you what rsync WOULD do if you removed this argument. A good choice if you want to be careful.
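
To push the weekend's changes back to the office machine, the same options work in the other direction, and a dry run first shows what would be transferred (assuming the same paths as in the pull example above):

Home$ rsync -avzun . office:.matlab/myfunctions/figs/
Home$ rsync -avzu . office:.matlab/myfunctions/figs/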

How to crop a pdf when using the ps2pdf converter via -dEPSCrop

Often I find myself converting eps or ps files to pdf for inclusion in a LaTeX document using the ps2pdf converter. The problem is that ps2pdf often produces a large white border around the figure of interest which was not present in the eps file. Obviously this is annoying when trying to include the graphic in a LaTeX file surrounded by text. The solution is the -dEPSCrop option.

The following flags will remedy the problem.

lindon$ ps2pdf -dPDFSETTINGS=/prepress -dEPSCrop test.eps

Now the pdf doesn't include the unnecessary white-space border.
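
If a whole directory of figures needs converting, the same flags work in a loop; a sketch assuming the .eps files sit in the current directory:

lindon$ for f in *.eps; do ps2pdf -dPDFSETTINGS=/prepress -dEPSCrop "$f"; done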

GNU Wget Tutorial

As a student, you may find yourself wanting to download lots of lecture slides and other materials off a module homepage, which can become quite an arduous task. Thankfully, GNU created Wget, which is already installed on most Linux machines. It is best demonstrated by example:

wget -r -l5 -np -k -nH --cut-dirs=5 --load-cookies cookies.txt http://www2.warwick.ac.uk/fac/sci/physics/current/teach/module_home/px421/

-r
Means wget acts recursively, i.e. it follows links found on the current page (much like a search engine spider).

-l
Specifies the depth, which means how many of these links it can follow. If you imagine all the links on the current page forming branches away from it, then the links on those pages forming branches away from those, then -l5 sets the maximum branch distance away from the current page.

-np
No parent: wget will only progress down the directory tree, i.e. it will not work its way back up into http://www2.warwick.ac.uk/fac/sci/physics/current/teach/module_home/

-k
Convert Hyperlinks. When wget downloads a page, say index.html, there will be links on that page just like viewed in your browser but -k will convert them to local links, so that you can navigate your way through the pages on your local machine.

-nH
No host directories. Otherwise wget would create a folder named "www2.warwick.ac.uk" and all the downloaded material would get stored in there, which is normally undesirable.

--cut-dirs=5
Otherwise wget would recreate the intermediate directories

fac/
sci/
physics/
current/
teach/
module_home/

as a directory tree which you don't want to have to click through. With -nH and --cut-dirs=5, the first five of those components are stripped, so the downloaded pages end up directly under module_home/.

--load-cookies
Normally content is restricted and you need to log in, so you need to supply wget with some cookies. If you are a Firefox user then there is an extension called 'cookie exporter', which you can use to output your cookies to a file called cookies.txt.

That’s it!

Libraries

Compiling converts your source code into object or machine code, which the processor can understand, so the compiler produces an object file (.o) from your source code. The linker then pieces the object files together and from these produces an executable. If you wish to "compile only", i.e. to obtain merely the object file, you can add the "-c" flag at compilation:

michael@michael-laptop:~$ gcc -c test.c

This produces the object file “test.o”. You can inspect this object file with the nm command. It basically lists all the symbols defined in the object file.
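
For the object file just produced, that is simply:

michael@michael-laptop:~$ nm test.o

Each line of output gives a symbol's value, its type (e.g. T for code in the text section, U for undefined) and its name.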

Static Libraries

A static library is an archive of object files. All we do is include this archive in the compile line just as we would the .o files. Any executable created by linking with a static library contains its own copy of the object code it needs from the archive, so only the executable need be distributed. The archive is created with the ar command, e.g.

michael@michael-laptop:~$ ar r library.a file1.o file2.o file3.o 

” r Insert the files member into archive (with replacement).”
You can then display what files are in an archive with the t option.

michael@michael-laptop:~$ ar t library.a
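
Linking against the archive then looks just like linking against plain object files; a sketch assuming a main.o that calls functions defined in the archived objects:

michael@michael-laptop:~$ gcc main.o library.a -o program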

Shared Libraries

It is usually a better idea to use shared libraries over static libraries. This is because the modules that a program requires are loaded into memory from shared objects at run time or load time, whereas static libraries are put together at compile time. This has the advantage that I can change the object files in the library without needing to rebuild my executable. If the library were static and I made a change, then I would need to rebuild all the executables which depend on that library. Shared libraries have 3 names:

soname

has the prefix "lib" and the suffix .so followed by a full stop and then the major version number, e.g. libtest.so.1.
We would only increment the major version number if we make a change which breaks backward compatibility, e.g. changing the number of arguments that a function takes.

realname

is the actual filename which contains the library code. It gains a minor version number plus a release number in addition to the soname, e.g. libtest.so.1.0.1.

linker name

is the name of the library which the linker refers to at compilation. It is the same as the soname, just without the version number, e.g. libtest.so.
The linker name is basically a symbolic link to the soname, which is itself a symbolic link to the real name.

To create a shared library, we need to compile our source code as follows:

michael@michael-laptop:~$ gcc -fPIC -c test.c

The "-fPIC" option tells the compiler to produce Position Independent Code, which means the code can function regardless of where in memory it is loaded. We can then proceed by using the "-shared" option of gcc and passing the soname to the linker via the -Wl flag.

michael@michael-laptop:~$ gcc -shared -Wl,-soname,libtest.so.1 -o libtest.so.1.0.1 test.o

The -shared option tells the compiler that the output file should be a shared library.
-Wl,option
Pass option as an option to the linker. If option contains commas, it is split into multiple options at the commas. Do not include any whitespace.
The -soname option specifies the soname, duh.
The -o option specifies the real name.

Now that the shared library has been created, we need to install it, namely with ldconfig. The ldconfig program generates a symbolic link, named as the soname, to the realname. The -n option specifies the directory where the shared library is found.
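
For a library built in the current directory, that is simply:

michael@michael-laptop:~$ ldconfig -n .
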
Finally, we need to create a new symbolic link with a filename as the linker name, to the soname.

michael@michael-laptop:~$ ln -s libtest.so.1 libtest.so

where the -s option stands for symbolic.
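
With the links in place, an executable can be linked against the library via the linker name; a sketch assuming a main.c that uses the library, with the library sitting in the current directory:

michael@michael-laptop:~$ gcc main.c -L. -ltest -o main

At run time the dynamic loader still needs to find libtest.so.1, for example via ldconfig's cache or the LD_LIBRARY_PATH environment variable.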

at command

Scheduling a process to run automatically at a certain date and time can be quite useful. This is achieved with the at command. The at command reads a series of commands from the standard input and lumps them together into one single at-job to be executed at some point in the future.

Syntax: at [-V] [-q queue] [-f file] [-mkdv] [-t time]

You can man at to find out about all the options.

Here is an example:

michael@michael-laptop:~$ at -mv 4:44 sep 21
Wed Sep 21 04:44:00 2011

warning: commands will be executed using /bin/sh
at> g++ main.cpp -o test
at> nohup ./test &
at> <EOT>
job 13 at Wed Sep 21 04:44:00 2011
michael@michael-laptop:~$ 

The -m option mails the user when the at-job has been executed. The -v option simply produces the first line, i.e. it displays when the job will be executed. One does not type <EOT> itself; it appears when you press CTRL+D on a new line to finish entering commands.

To see what at-jobs have been scheduled, enter atq (or, alternatively, at -l):

michael@michael-laptop:~$ atq
13	Wed Sep 21 04:44:00 2011 a michael
michael@michael-laptop:~$ at -l
13	Wed Sep 21 04:44:00 2011 a michael
michael@michael-laptop:~$ 

To remove this job, enter atrm and the ID:

michael@michael-laptop:~$ atrm 13
michael@michael-laptop:~$ atq
michael@michael-laptop:~$

The at-job inherits the environment of the terminal that schedules it and hence has the same working directory and environment variables. If at some point in the future you forget what commands an at-job contains, you can view them with at -c followed by the job number. This lists the environment variables of the at-job, with the commands at the bottom.
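
For the job scheduled above, that would be:

michael@michael-laptop:~$ at -c 13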