Send Lines of Code from Vim to R/Julia/Python REPL

Vim + Tmux + Slime Workflow

Editing code in anything other than Vim seems to drain my coding stamina rapidly, but when working with a language such as R that has an interactive REPL (read evaluate print loop) it is often useful to send lines of code thereto for purposes of debugging. I have used the Vim-R-Plugin for some time but recently my differences with R have led me to Julia, and so I have been looking for a way to send code from Vim to Julia. My search has resulted in vim-slime.

Vim Slime

vim-slime is absolutely great, it’s a plugin for vim that lets you send text to screen or tmux. This is perfect for programming any language that has an REPL, simply load a session up in a tmux, open up vim, tell it what tmux window and session you want to send text to and away you go. This will work for any language!
A great post can be found here.

Edit Locally, Execute Remotely

I’ve long since desired to be able to do this. It would be great if you could edit the file locally, but have the Julia session execute on a remote machine, say my office computer, so that all the heavy lifting is done by the remote machine allowing me to get on with my own personal things on my local machine. Vim-slime provides an elegant solution. Simply start a tmux session, ssh to your remote machine, start up an interactive REPL of the language of your choice, open another terminal locally and start vim on your code and now you can send lines to your remote R/Julia/Python session! This has dramatically improved my workflow as my office computer is much more powerful than my laptop.

vimslime

send code from vim to r python julia etc repl

C++ Merge Sort Algorithm

C++ Merge Sort Algorithm

#include <random>
#include <iostream>
#include <vector>
#include <algorithm>

using namespace std;

vector<int> merge_sort(const vector<int>& input)
{
	if(input.size()<=1) return input;
	vector<int> output(input.size());

	//Split Vector//
	int midpoint=0.5*input.size();
	vector<int> input_left(input.begin(),input.begin()+midpoint);
	vector<int> input_right(input.begin()+midpoint,input.end());

	input_left=merge_sort(input_left);
	input_right=merge_sort(input_right);
	merge(input_left.begin(),input_left.end(),input_right.begin(),input_right.end(),output.begin());

	return output;
}

int main(){

	//Create unsorted vector of ints
	vector<int> unsorted(40);
	iota(unsorted.begin(),unsorted.end(),-20);
	shuffle(unsorted.begin(),unsorted.end(),default_random_engine());

	//Perform merge_sort//
	vector<int> sorted=merge_sort(unsorted);

	//Display results//
	cout << "Unsorted: " <<  endl;
	for(auto value:unsorted)  cout << value << " ";
	cout <<  endl;
	cout << "Sorted: " <<  endl;
	for(auto value:sorted)  cout << value << " ";
	cout <<  endl;

}

stdout

Unsorted:
-2 14 7 -15 -9 -17 -8 -1 13 1 -10 -7 16 -19 6 2 -12 -11 8 -18 -14 10 5 4 17 12 15 -16 -5 18 -4 -3 -6 -20 0 3 9 -13 11 19
Sorted:
-20 -19 -18 -17 -16 -15 -14 -13 -12 -11 -10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

Compile R and OpenBLAS from Source Guide

1. Get OpenBLAS
2.1 Get R
2.2 Specific Instructions for DSS Users
3. Validation
4. Benchmark

This guide is intended to aid any R and Linux user who desires a threaded version of BLAS. In particular I hope this will allow other grad students, who like me do not have many user privileges on their office computer, to follow suit and exploit multiple cores to speed up their linear algebra computations within R. The following will be performed on Scientific Linux 6.4 but has should be completely general. If you are a Ubuntu user, then there is an elegant and streamlined process for changing BLAS libraries and a recommended post about it here. I use Fedora on my laptop, and the following has also been tested thereupon.

My office computer has a quadcore processor with two threads per core but I also have access to a departmental computer with 4 sockets and 12 cores per socket (1 thread per core), so it really makes sense to use a threaded version of BLAS. If you are curious about the hardware on your own computer you can run the command “cat /proc/cpuinfo” or “lscpu”.

Unfortunately my office computer is part of a network upon which I do not have permissions to change ‘/usr/lib64/R/lib/libRblas.so’. Moreover R appears to be running serially: if you start up R and get the PID (process ID) from ‘top’ or ‘ps aux | grep R’ or something and then execute ‘cat /proc/PID/status | grep Threads’ you can see there is only one thread available.

[msl33@cabbage ~]$ cat /proc/13605/status | grep Threads
Threads:	1

(where 13605 was the process ID of my R process. That is using the default R on the network. One could appeal to the network administrator to change things for you but they probably won’t because a parallel BLAS implementation may cause problems for other users who require a serial BLAS, such as those that use the multicore environment to perform inherently parallel algorithms such as parallel tempering instead of using idle cores to speed up the linear algebra. There are also some known conflicts with the multicore package in R. There is, however, nothing stopping the user from compiling one’s own custom R build in one’s home directory and just changing the executable path thereto. In addition, you then have the power and freedom customize R to your needs – at the moment I have some very large matrices which would benefit from a threaded BLAS but at some point I may want to revert to a tuned serial BLAS such at ATLAS for certain parallel algorithms.

Firstly, go ahead and create a directory in which to keep all your custom software.

[msl33@cabbage ~]$ pwd
/home/grad/msl33
[msl33@cabbage ~]$ mkdir software


Download OpenBLAS

Make a directory “openblas” in the “software directory.

[msl33@cabbage ~]$ cd software/
[msl33@cabbage software]$ mkdir openblas

Next, grab the tarball from the OpenBLAS homepage. Change directory into where you downloaded the tarball and extract the files from it.

[msl33@cabbage ~]$ cd Downloads/
[msl33@cabbage Downloads]$ tar -xvf xianyi-OpenBLAS-v0.2.9-0-gf773f49.tar.gz 

While this is running, fill a kettle with some water and turn it on, this stage is very important.

Change directory into where you extracted the files and verify that NO_AFFINITY=1 is uncommented in the Makefile.rule. If so proceed and run make.

[msl33@cabbage ~/Downloads]$ cd xianyi-OpenBLAS-347dded/
[msl33@cabbage xianyi-OpenBLAS-347dded]$ cat Makefile.rule | grep NO_AFFINITY
NO_AFFINITY = 1
[msl33@cabbage xianyi-OpenBLAS-347dded]$ make

Now is a good time to “make” some tea with the water prepared earlier. When done successfully one will see
openblas confirmation
Now, as instructed above, install to the “software” directory made earlier.

[msl33@cabbage xianyi-OpenBLAS-347dded]$ make PREFIX=/home/grad/msl33/software/openblas install
...
Install OK!

In openblas/lib there will be a file “libopenblas.so”, needed for later. That’s it for openblas, next we will do R.


Download R

Let’s create an R directory in software. Go onto the R homepage, then download, then choose a mirror and grab the tarball of the latest version. Download it to your “software” directory and extract it as before with “tar -xvf R-3.1.1.tar.gz”. Once extracted, remove the tarball and change directory into R-3.1.1. Before running the configure script one might bring some customizations into consideration in the name of efficiency. One might consider upping the optimization level from 2 to 3 and adding march or mtune by editing “config.site” and changing “## CFLAGS=” on line 53 to “CFLAGS=’-O3 -march=native'” and making similar changes for FFLAGS and CXXFLAGS. It is noted in the R Installation and Administration documentation that these can produce worthwhile speedups but come with a warning that the build will be less reliable, with segfaults and numerical errors creeping in. It is safest to leave things regular (reccommended link) but I’ll take the risk. Now, if you are not using a computer on the duke statistical science network, run the configure script, otherwise see the additional instructions before running configure.

[msl33@cabbage R-3.1.1]$ ./configure --prefix=/home/grad/msl33/software/R --enable-R-shlib --enable-BLAS-shlib --enable-memory-profiling --with-tcltk=no


BEGIN ADDITIONAL INSTRUCTIONS FOR DUKE STATISTICAL SCIENCE STUDENTS

[On the DSS computers some further instructions are required to locate headers and libraries. The first time I tried to make on my office computer I encountered this error. “jni.h” could not be found. The error was resolved by locating it and then export the environment variable JAVA_HOME.

[msl33@cabbage software]$ locate jni.h
/usr/lib/jvm/java-1.7.0-sun-1.7.0.11/include/jni.h
[msl33@cabbage software]$ export JAVA_HOME=/usr/lib/jvm/java-1.7.0-sun-1.7.0.11/

In addition, when running the configure script the readline headers/libs could not be found. We’ll just borrow them from some other software. Add to CFLAGS, FFLAGS, CXXFLAGS “-I/opt/EPD_Free/include -L/opt/EPD_Free/lib” in addition to any other flags that you have set. Also make a lib directory and copy them in.

[msl33@cabbage R-3.1.1]$ mkdir lib
[msl33@cabbage R-3.1.1]$ cp /opt/EPD_Free/lib/libreadline.* lib/
[msl33@cabbage R-3.1.1]$ cp /opt/EPD_Free/lib/libncurses* lib/

Now run the configure line above.]

END ADDITIONAL INSTRUCTIONS FOR DUKE STATISTICAL SCIENCE STUDENTS

Once the configure has completed, you’ll see a summary below like
openblas configure
Now issue the command “make”, it will take some time. Once make has finished, you can execute “make install” to populate the software/R directory created earlier but you don’t need to. Change directories to lib and make a backup of libRblas.so and create a symbolic link to the openblas library that was made earlier.

[msl33@cabbage ~]$ cd software/R-3.1.1/lib
[msl33@cabbage lib]$ pwd
/home/grad/msl33/software/R-3.1.1/lib
[msl33@cabbage lib]$ mv libRblas.so libRblas.so.keep
[msl33@cabbage lib]$ ln -s /home/grad/msl33/software/openblas/lib/libopenblas.so libRblas.so

That was the last step.


Setup Validation

The R executable in the bin directory should now use openblas. Note this is the R executable you now need to run in order to use the custom built R with openblas. Just typing R in terminal will load the old /usr/lib64… which we students didn’t have the permissions to alter. You can, however, create an alias in your .bashrc file by inserting the line ‘alias R=”/home/grad/msl33/software/R-3.1.1/bin/./R”‘. Now when you type R in a terminal it will load the new R and not the old one. One can check that R executable depends on the correct linked shared blas library with the “ldd” command.

[msl33@cabbage bin]$ pwd
/home/grad/msl33/software/R-3.1.1/bin
[msl33@cabbage bin]$ ./R CMD ldd exec/./R | grep blas
	libRblas.so => /home/grad/msl33/software/R-3.1.1/lib/libRblas.so (0x00007f62e3fb7000)
[msl33@cabbage bin]$ ls -lt ../lib | grep openblas
lrwxrwxrwx  1 msl33 grad      53 Jul 16 15:35 libRblas.so -> /home/grad/msl33/software/openblas/lib/libopenblas.so

In addition, execute “./R” from the “bin” directory (or just R if you set up the alias) and grab the process id.

[msl33@cabbage bin]$ ps aux | grep R | grep software | awk '{print $2}'
2412
[msl33@cabbage bin]$ cat /proc/`ps aux | grep R | grep software | awk '{print $2}'`/status | grep Threads
Threads:	8
[msl33@cabbage bin]$ 

Evidently the R session now has 8 threads available. Finally, lets perform an eigen-decomposition and look at the cpu usage using top. You’ll see it light up all of your cores.
openblas cpu usage


Benchmark

Using this benchmark the reference BLAS took 32.1 seconds whilst openBLAS took 7.1 seconds.

C++11 versus R Standalone Random Number Generation Performance Comparison

If you are writing some C++ code with the intent of calling it from R or even developing it into a package you might wonder whether it is better to use the pseudo random number library native to C++11 or the R standalone library. On the one hand users of your package might have an outdated compiler which doesn’t support C++11 but on the other hand perhaps there are potential speedups to be won by using the library native to C++11. I decided to compare the performance of these two libraries.

#define MATHLIB_STANDALONE
#include <iostream>
#include <vector>
#include <random>
#include <chrono>
#include "Rmath.h"

int main(int argc, char *argv[])
{
        int ndraws=100000000;
        std::vector<double> Z(ndraws);
        std::mt19937 engine;
        std::normal_distribution<double> N(0,1);

        auto start = std::chrono::steady_clock::now();
        for(auto & z : Z ) {
                z=N(engine);
        }
        auto end = std::chrono::steady_clock::now();
        std::chrono::duration<double> elapsed=end-start;

        std::cout <<  elapsed.count() << " seconds - C++11" << std::endl;

        start = std::chrono::steady_clock::now();
        for(auto & z : Z ) {
                z=rnorm(0,1);
        }
        end = std::chrono::steady_clock::now();
        elapsed=end-start;

        std::cout <<  elapsed.count() << " seconds - R Standalone" << std::endl;

        return 0;
}

Compiling and run with:

[michael@michael coda]$ g++ normal.cpp -o normal -std=c++11 -O3 -lRmath
[michael@michael coda]$ ./normal 

Normal Generation

5.2252 seconds - C++11
6.0679 seconds - R Standalone

Gamma Generation

11.2132 seconds - C++11
12.4486 seconds - R Standalone

Cauchy

6.31157 seconds - C++11
6.35053 seconds - R Standalone

As expected the C++11 implementation is faster but not by a huge amount. As the computational cost of my code is dominated by other linear algebra procedures of O(n^3) I’d actually be willing to use the R standalone library because the syntax is more user friendly.

Recursive Search Within Files in Terminal

When I inherit load of code from people, often I like to see what files call certain functions. A way to do this is to use grep recursively by using the -R option. Say I want to find all the files in which ‘rgamma’ appears.

msl33@hotel Documents]$ grep -R 'rgamma' *
601/lab/4/script4.R:lamda=rgamma(n,shape=5,rate=6)
601/lab/9/9.R:#  X[i,]=rgamma
601/lab/9/9.R:#   X=matrix(rgamma(length(a)*k,a,1),k,length(a),byrow=T)
601/lab/9/9.R:# theta=rgamma(10000,7,1000)
msl33@hotel Documents]$

This will look recursively within files to find appearances of ‘rgamma’. As it turns out rgamma appears once in “script4.R” and thrice in “9.R”.

Mustang Vim Colourscheme

I just came across this awesome vim colourscheme called mustang. The author’s deviantArt page is here but a slightly modified version is found here, which provides colouring for NERDTree.

vim colorscheme

vim colorscheme

R snippets for vim-SnipMate

Vim is my editor of choice, reasonable so, whether it be for coding C++, LaTeX or even R. I’ve used RStudio, which even has a Vim-Mode, but I still prefer to use Vim. Vim has it’s own R plugin, namely Vim-R-plugin, but this post is about snippets. SnipMate is an awesome auto-completion plugin for Vim which is fully customizable. One simply writes a string, rnorm for example, and presses tab to autocomplete the code to rnorm(n=,mean=,sd=), where repeated press of tab cycles through the placeholders at the function parameters. The strings to recognize, referred to as snippets, are stored in a snippets file “languagetype.snippets” along with the corresponding code to auto-complete. These can be user defined for any language, not just R. It’s usually not necessary to write these snippets files yourself, as there are already existing snippets files within the vim community, including one for R. Here is a github repository containing snippets files for a great many languages, including R. Simply put this into your vim-SnipMate snippets directory. The last thing to do is to tell Vim to recognize an r filetype. If you open an R file and type,

:set ft=r

then this will tell Vim that the file is an R file. Obviously you don’t want to do this all the time, so to get Vim to automatically recognize “.r” and “.R” extensions as R files simply append your .vimrc file with:

au BufNewFile,BufRead *.r set filetype=r
au BufNewFile,BufRead *.R set filetype=r

Writing Custom Snippets

This is an example of how to complete a for loop having written only “for”. Append the r.snippets file with the following code.

snippet for
    for(${1:i} in ${2:begin}:${3:end}){
    ${4}
    }${5}

This defines “for” as the snippet, the string after which we will want to press tab. The dollar signs define the place holders. “${x:text}” defines the x’th placeholder and the highlighted text which will be replaced.

for<tab>

becomes

for(i in begin:end){

}

.

Repeated pressing of tab cycles between i, begin, end, body of foor loop and then breaks out of the for loop.

AWK Remove Lines of Multiple Files

My girlfriend is a statistician and acquired some data in the form of 200 text files. The problem is that each file had two lines of non-informative description which needed removing. The goal was to process the data to remove the descriptive header. Instead of removing the lines from every file, I wrote this for her


mkdir news
for file in *;
do awk '{if (NR!=1 && NR!=2) {print}}' "$file" > new_"$file";
mv new_"$file" news/"$file";
done

cd news;
rm script;
rm news;

This uses awk to remove lines 1 and 2 from all of the files in the current directory and store the reduced files in the newly created directory “news”. Just run the shell script in the directory containing all the files.

rsync Tutorial and Help with Examples

rsync is what people used to use before Dropbox. It is used to sync remote folders with local ones. Say for example tha I have some work on my office computer in folder ‘foo’. It is the weekend and I need to work on this project but I don’t want to go into the office. rsync can be used to pull the folder from the remote (office) machine onto my local (home) computer. Then, when I am done with the changes, rsync can push all the modified files back to the office computer. rsync has some really great features so that it will only transfer the files which have been modified.

Home$ rsync -avzu office:.matlab/myfunctions/figs/ .
receiving incremental file list
./
1_a_0_neg.eps
1_a_0_pos.eps
1_a_10_neg.eps
1_a_10_pos.eps

sent 90 bytes  received 12464 bytes  2789.78 bytes/sec
total size is 36238  speedup is 2.89

-a –archive
Stands for archive mode. Basically this means that the structure from the office machine is kept on the local machine (i.e. same symbolic links etc.)

-v, –verbose
Increases verbosity

-z, –compress
Compresses the files for transfer purposes and then uncompresses them on the local machine. Really great if transferring massive data.

-u, –update
This means that if the destination folder has files with a timestamp which is newer than the files in the source folder, then the files in the destination folder will not be overwritten with the older source ones.

These are just a few most common arguments. Lastly, a good argument to remember

-n, –dry-run
This won’t actually do any transfer of data, it merely shows you what rsync WOULD do if you removed this argument. A good choice if you want to be careful.

How to crop a pdf when using the ps2pdf converter via -dEPSCrop

Often I find myself converting eps or ps to pdf files for inclusion in a latex document using the ps2pdf converter. The problem is that often when using ps2pdf I get a large white border around the figure of interest in my pdf file which was not included in the eps file. Obviously this is annoying when trying to include the graphic in a latex file surrounded by text. The solution comes with using the -dEPSCrop option.

The following flags will remedy the problem.

lindon$ ps2pdf -dPDFSETTINGS=/prepress -dEPSCrop test.eps

Now the pdf doesn’t included the unnecessary white space border.