
HiPLARM installation

Here we present a guide to installing HiPLARM and the prerequisite libraries needed by the package. Ideally, the user will have an NVIDIA GPU and a multi-core CPU to take advantage of all aspects of the package; however, the multi-core aspects can still be used if an NVIDIA GPU is not present. We recommend that the user download our install script, which fully automates the entire install process. For more advanced users, or those wishing to experiment with different aspects of the prerequisite libraries, we provide some instructions below.

Quick Start

For users running on Linux distributions we have provided a build script that will download and install the prerequisite libraries and the HiPLARM package. The script can be downloaded here. It assumes the user has downloaded and installed the CUDA libraries, which are available from the NVIDIA website. The user simply provides a build location and the location of the CUDA installation; if these are not given, defaults are used. The script also offers a choice between the ATLAS and OpenBLAS libraries. For example, if the user wishes to use the ATLAS libraries:

$:sh buildScript.sh --prefix=/home/jsmyth/Libraries \
    --cuda-home=/usr/local/cuda --with-atlas

Or the OpenBLAS libraries:

$:sh buildScript.sh --prefix=/home/jsmyth/Libraries \
    --cuda-home=/usr/local/cuda --with-openblas

If the user does not have an NVIDIA GPU on their system, they can specify this with the --no-gpu flag:

$:sh buildScript.sh --prefix=/home/jsmyth/Libraries \
    --cuda-home=/home/jsmyth/cuda --with-openblas --no-gpu

This downloads and installs the hwloc, OpenBLAS/ATLAS, PLASMA and MAGMA shared libraries as well as the HiPLARM package. The script also performs auto-tuning on the HiPLARM routines.


HiPLARM Prerequisites

For more advanced users we provide a brief description of the prerequisite libraries and their install procedures. These steps are merely guidelines; to gain full advantage of the libraries we recommend that the user read the documentation provided on the relevant websites. The user may also use the script as a reference for installing the software stack below.

CUDA

CUDA is NVIDIA's parallel computing architecture. Currently, HiPLARM can only run on NVIDIA GPUs, so users must download the relevant drivers and the CUDA toolkit, which is available here. Simply follow the instructions given and CUDA will be ready for use. The user should remember to set the LD_LIBRARY_PATH and PATH variables. Users who do not have GPU-enabled systems can forego the GPU-related libraries.
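
A minimal sketch of the variables to set, assuming CUDA is installed in the default /usr/local/cuda location (adjust the paths to your installation):

$:export PATH=$PATH:/usr/local/cuda/bin
$:export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64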

Optimised BLAS

For optimal performance of the PLASMA and MAGMA libraries it is crucial that the user provides an optimised version of the BLAS routines. As described in the PLASMA installation guide, using the reference Netlib BLAS will produce performance an order of magnitude slower than optimised BLAS routines such as ATLAS or OpenBLAS.

Note: We mention only OpenBLAS and ATLAS here as these are what we have tested the package with. Users are not confined to these and can use other versions such as MKL and ACML; they will just need to ensure that the correct flags are used for each package. The MAGMA library contains sample makefiles for different versions, and PLASMA instructions can be found on the PLASMA website.

OpenBLAS

OpenBLAS is an optimised BLAS library based on the GotoBLAS project, which is itself no longer being updated. The installation procedure for OpenBLAS is relatively simple. The user can download the package as a zip or tar file here. We suggest that the user creates a single directory for all the relevant files and folders; for demonstration purposes we will refer to it as BLDDIR.

$:export BLDDIR=/home/jsmyth/Libraries
$:tar -xf xianyi-OpenBLAS-v0.2.2-0-g71d29fa.tar.gz
$:cd OpenBLAS
# In the Makefile.rule file set NO_AFFINITY=1
$:make
$:make PREFIX=$BLDDIR install

The install process automatically detects the system settings and optimises the library accordingly.
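
As a quick sanity check, the user may wish to confirm that the shared library was installed where expected and that its dependencies resolve; the paths below simply follow the BLDDIR convention used above:

$:ls $BLDDIR/lib/libopenblas*
$:ldd $BLDDIR/lib/libopenblas.so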

ATLAS

ATLAS (Automatically Tuned Linear Algebra Software) provides an optimised version of the BLAS routines for C and Fortran. A subset of the LAPACK routines is also provided, but a full LAPACK package can be built in as well, and our instructions cater for this. In addition, we will be compiling shared versions of the libraries. We suggest that users interested in extracting the most optimal performance read the detailed documentation provided on the ATLAS website. ATLAS also provides multi-threaded versions of the libraries for parallel computation. Prior to starting, the user should download the latest LAPACK version here and the latest version of the ATLAS library here.

Note: ATLAS requires CPU throttling to be turned off. For more information on how to do this see here.
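
On many Linux systems throttling can be disabled by switching the CPU frequency governor to performance; a sketch assuming the cpufrequtils package is installed (repeat for each CPU core):

$:sudo cpufreq-set -c 0 -g performance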

$:tar -xf atlas3.10.0.tar.bz2
$:cd ATLAS
$:mkdir build
$:cd build
$:../configure --prefix="$BLDDIR" --shared \
    --with-netlib-lapack-tarfile="$BLDDIR/lapack-3.4.1.tgz" \
    -Fa alg '-fPIC -m64 -fPIC'
$:make
$:cd lib
$:make shared
# for threaded shared libs
$:make ptshared
# Now for the lapack shared library
$:gcc -fPIC -Xlinker -zmuldefs -shared -o liblapack.so \
    -Wl,-whole-archive liblapack.a -Wl,-no-whole-archive \
    -L. -lf77blas -lcblas -latlas -lgfortran
$:cp *.so $BLDDIR/lib
$:cd ../
$:make install

R

Users should ensure that they have a correctly compiled version of R. R should be compiled with the --enable-BLAS-shlib flag. Also, for completeness, users should build R linked against the version of optimised BLAS they use, whether that be one of the versions mentioned above or another. Some example configure invocations are provided below.

./configure --with-x=no \
    --with-blas="-L/usr/local/atlas_3.8.4/lib -lf77blas -latlas" \
    --with-lapack="-L/usr/local/atlas_3.8.4/lib -llapack -lcblas" \
    --prefix=/home/jsmyth/opt --enable-R-shlib

or for OpenBLAS

./configure --with-x=no \
    --with-blas="-L/usr/local/openblas/lib -lopenblas" \
    --with-lapack="-L/usr/local/openblas/lib -lopenblas" \
    --prefix=/home/jsmyth/opt --enable-R-shlib

Here the ATLAS libraries are installed in /usr/local/atlas_3.8.4/lib and the OpenBLAS libraries in /usr/local/openblas/lib; they may be located in other directories depending on the initial installation. For more detailed information the user should refer to the R installation and administration page here.
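
To verify which BLAS the freshly built R is linked against, one possibility is to inspect the main R shared library (the exact path depends on the install prefix; R RHOME prints the installation directory):

$:ldd "$(R RHOME)/lib/libR.so" | grep -i 'blas\|lapack'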


hwloc

hwloc is a software package for discovering the topology of multi-core systems. It detects components such as cores, sockets, caches and NUMA nodes, among other features. hwloc is part of the OpenMPI project and can be downloaded here. It also installs a pkg-config file so that it can be detected by other libraries such as PLASMA. On Debian-based Linux distributions hwloc can alternatively be installed with the apt-get command-line utility, e.g. sudo apt-get install hwloc.

$:tar -xf hwloc-1.5.tar.gz
$:cd hwloc-1.5
$:./configure --prefix=$BLDDIR
$:make && make install

Following this, the user should update the PKG_CONFIG_PATH and LD_LIBRARY_PATH variables:

export PKG_CONFIG_PATH=$PKG_CONFIG_PATH:$BLDDIR/lib/pkgconfig
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$BLDDIR/lib
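
A quick way to confirm that pkg-config can now find hwloc (the reported version should match the one just installed):

$:pkg-config --modversion hwloc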

PLASMA

PLASMA (Parallel Linear Algebra Software for Multicore Architectures) is one of the major libraries underpinning the HiPLARM package; it provides high-performance LAPACK and BLAS routines for shared-memory, multi-core systems. As before, if the user is building the libraries outside our build script we recommend they read the PLASMA installation guide prior to installation. We provide two installation examples, using ATLAS and OpenBLAS. Again, we assume that the libraries mentioned above are installed in the $BLDDIR/lib directory.

Note: It is important that the user uses single-threaded BLAS. ATLAS does not have any dynamic setting for the thread number, so users should link the single-threaded ATLAS libraries explicitly. If a multi-threaded BLAS is installed, set the thread number to 1, e.g.

$:export OPENBLAS_NUM_THREADS=1
# or
$:export OMP_NUM_THREADS=1

Similarly, if the user is using another version of BLAS, e.g. MKL or ACML, the same procedure should be followed.
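
For instance, with MKL the corresponding variable is (per Intel's documentation):

$:export MKL_NUM_THREADS=1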

Using OpenBLAS

$:tar -xf plasma-installer_2.4.5.tar.gz
$:cd plasma-installer
$:./setup.py --prefix="$BLDDIR" \
    --blaslib="-L$BLDDIR/lib -lopenblas" \
    --lapclib="-L$BLDDIR/lib -lopenblas" \
    --cflags="-O2 -fPIC -I$BLDDIR/include" \
    --fflags="-O2 -fPIC" --notesting \
    --ldflags_c="-I$BLDDIR/include"
$:cd $BLDDIR/lib
# Make shared libraries
$:gcc -fPIC -shared -o libquark.so \
    -Wl,-whole-archive libquark.a \
    -Wl,-no-whole-archive -L. -lhwloc
$:gcc -fPIC -shared -o libcoreblas.so \
    -Wl,-whole-archive libcoreblas.a \
    -Wl,-no-whole-archive -L. -lquark -lopenblas
$:gcc -fPIC -shared -o libplasma.so \
    -Wl,-whole-archive libplasma.a \
    -Wl,-no-whole-archive -L. -lcoreblas -lpthread -lquark
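
As an optional check that the new shared libraries resolve all of their dependencies, ldd can be run on the top-level library:

$:ldd $BLDDIR/lib/libplasma.so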

Using ATLAS

$:tar -xf plasma-installer_2.4.5.tar.gz
$:cd plasma-installer
$:./setup.py --prefix="$BLDDIR" \
    --blaslib="-L$BLDDIR/lib -lsatlas -llapack" \
    --cflags="-O2 -fPIC -I$INCDIR" --fflags="-O2 -fPIC" \
    --notesting --ldflags_c="-I$INCDIR" --downlapc
$:cd $BLDDIR/lib
# Make shared libraries
$:gcc -fPIC -shared -o libquark.so \
    -Wl,-whole-archive libquark.a \
    -Wl,-no-whole-archive -L. -lhwloc
$:gcc -fPIC -shared -o libcoreblas.so \
    -Wl,-whole-archive libcoreblas.a \
    -Wl,-no-whole-archive -L. -lquark -llapack -lsatlas -llapacke
$:gcc -fPIC -shared -o libplasma.so \
    -Wl,-whole-archive libplasma.a \
    -Wl,-no-whole-archive -L. -lcoreblas -lpthread -lquark -llapacke

MAGMA

The MAGMA library is the second major library underpinning the HiPLARM project. It is a dense linear algebra library designed for heterogeneous and hybrid architectures, specifically multi-core and GPU systems; there is now also some support for multi-GPU systems. MAGMA is intended to work optimally on top of the software stack detailed above, so we recommend the user first follows the instructions above and installs the relevant libraries. Further documentation on the library is available here, and the library itself can be downloaded here. Instructions for different architectures are included in the downloadable package, but we will assume the user has followed a process similar to that already described. We provide instructions for the OpenBLAS and ATLAS libraries, in addition to instructions on how to build shared libraries for the MAGMA package. This requires some additional patches for the MAGMA source code, which we have provided on our server here.

Before starting the build process the user needs to perform the following:
  1. Download and unpack MAGMA
  2. Enter the src folder in the MAGMA directory
  3. Delete zheevd_m.cpp
  4. Delete cheevd_m.cpp
  5. Download and unpack our patch in the same src directory
  6. Edit the make.inc file in the main magma directory with the code below
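
For steps 1 to 5, a sketch follows; the archive names are illustrative and should be replaced with the versions actually downloaded:

$:tar -xf magma-1.2.1.tar.gz
$:cd magma-1.2.1/src
$:rm zheevd_m.cpp cheevd_m.cpp
# unpack our patch here, in the src directory
$:tar -xf hiplarm-magma-patch.tar.gz
$:cd ..
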
MAGMA with OpenBLAS

Copy make.inc.goto to make.inc. Using a text editor, alter the following settings in the file:

OPTS    = -O3 -DADD_ -fPIC
F77OPTS = -O3 -DADD_ -fPIC
FOPTS   = -O3 -DADD_ -x f95-cpp-input -fPIC
NVOPTS  = -O3 -DADD_ \
          --compiler-options '-fPIC',-fno-strict-aliasing \
          -DUNIX
LDOPTS  = -fPIC -Xlinker -zmuldefs
LIB     = -lopenblas -lpthread -lcuda -lcudart -lcublas \
          -lm -lcoreblas -lquark -lplasma
CUDADIR = $CUDADIR
LIBDIR  = -L$LIBDIR -L$CUDADIR/lib64 -L/usr/lib64
INC     = -I$CUDADIR/include

Here $CUDADIR is where CUDA is installed and $LIBDIR is where OpenBLAS and the other libraries mentioned above are installed. The file contains some further settings not shown above. The user must also specify whether they have a Fermi or Tesla GPU, but this is apparent in the full make.inc file.
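
For reference, in the MAGMA sample make.inc files the GPU generation is selected with a variable along the following lines (check the file shipped with your MAGMA version for the exact name and accepted values):

GPU_TARGET = Fermi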

Following this the user must build shared versions of the libraries. Enter the lib directory in the main magma folder and execute the following:

$:gcc -fPIC -Xlinker -zmuldefs -shared -o libmagmablas.so \
    -Wl,-whole-archive libmagmablas.a -Wl,-no-whole-archive
$:gcc -DMAGMA_WITH_PLASMA -fPIC -Xlinker -zmuldefs \
    -shared -o libmagma.so -Wl,-whole-archive libmagma.a \
    -Wl,-no-whole-archive -L. -lmagmablas

The libraries are now ready for use and the user can place them in a directory of their choice. The user should also be aware that the header files in the include directory are also required by HiPLARM.
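
For example, to gather the libraries and headers alongside the rest of the stack (paths follow the BLDDIR convention used earlier):

$:cp libmagma.so libmagmablas.so $BLDDIR/lib
$:cp ../include/*.h $BLDDIR/include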

MAGMA with ATLAS

If the user has decided to use the ATLAS library then the instructions below should be followed. We assume the user has followed or read steps 1 to 6 in this section.

Copy the make.inc.atlas file in the main magma directory to make.inc. Using a text editor, make the following changes/additions:

OPTS    = -O3 -DADD_ -fPIC
F77OPTS = -O3 -DADD_ -fPIC
FOPTS   = -O3 -DADD_ -x f95-cpp-input -fPIC
NVOPTS  = -O3 -DADD_ \
          --compiler-options '-fPIC',-fno-strict-aliasing \
          -DUNIX
LDOPTS  = -fPIC -Xlinker -zmuldefs
LIB     = -llapack -lsatlas -lpthread -lcuda -lcudart -lcublas \
          -lm -lcoreblas -lquark -lplasma -llapacke
CUDADIR = $CUDADIR
LIBDIR  = -L$LIBDIR -L$CUDADIR/lib64 -L/usr/lib64
INC     = -I$CUDADIR/include

Again, $LIBDIR is the directory where all the libraries described above are installed; if these are in alternate locations then each location should be entered. $CUDADIR is the directory of the CUDA installation.

Following this the user must build shared versions of the libraries. Enter the lib directory in the main magma folder.

$:gcc -fPIC -Xlinker -zmuldefs -shared -o libmagmablas.so \
    -Wl,-whole-archive libmagmablas.a -Wl,-no-whole-archive
$:gcc -DMAGMA_WITH_PLASMA -fPIC -Xlinker -zmuldefs \
    -shared -o libmagma.so -Wl,-whole-archive libmagma.a \
    -Wl,-no-whole-archive -L. -lmagmablas

The libraries are now ready for use and the user can place them in a directory of their choice. The user should also be aware that the header files in the include directory are also required by HiPLARM.


HiPLARM Installation

HiPLARM is installed like a regular R package, with some extra configure flags. After downloading it from CRAN or our website, the user can install it with the R CMD INSTALL command. The following is an example using the ATLAS-optimised BLAS routines.

R CMD INSTALL \
    --configure-args="--with-lapack=-L/home/jsmyth/Numerical/lib \
    -llapack -lsatlas \
    --with-plasma-lib=/home/jsmyth/Numerical \
    --with-cuda-home=/usr/local/cuda \
    --with-magma-lib=/home/jsmyth/Numerical" \
    HiPLARM_0.1.tar.gz

Note: This assumes a user called jsmyth has installed all the relevant libraries in /home/jsmyth/Numerical/lib and the header files in /home/jsmyth/Numerical/include. The user should also ensure, prior to building HiPLARM, that LD_LIBRARY_PATH is set to the correct directory or directories.
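
For instance, matching the paths above:

$:export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/jsmyth/Numerical/lib:/usr/local/cuda/lib64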

Should the user wish to build HiPLARM with OpenBLAS the following command can be used:

R CMD INSTALL \
    --configure-args="--with-lapack=-L/home/jsmyth/Numerical/lib \
    -lopenblas \
    --with-plasma-lib=/home/jsmyth/Numerical \
    --with-cuda-home=/usr/local/cuda \
    --with-magma-lib=/home/jsmyth/Numerical" \
    HiPLARM_0.1.tar.gz

Again, we assume that all the prerequisite libraries are installed in a single directory, which in this example is /home/jsmyth/Numerical/lib.

Auto-tuning HiPLARM

In HiPLARM we have provided auto-tuning functionality that allows the user to take optimal advantage of their system architecture. For smaller matrix sizes it is less efficient to use GPUs, as the cost of transferring the data outweighs the speed of computation; here PLASMA provides the computational advantage, whilst for larger sizes this is achieved using MAGMA and GPUs. The auto-tuning suite finds this crossover point so as to make optimal use of the hardware for a given problem size. The suite need only be run once after installation and the values are saved for future use. It is run automatically as part of our install script, but the user can also run it themselves, targeting all or particular routines.
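
The full tuning run can also be launched non-interactively from the shell using Rscript, mirroring the R call shown below:

$:Rscript -e 'library(HiPLARM); OptimiseAll(128, FALSE)'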

In R itself

library(HiPLARM)
OptimiseAll(128, FALSE)

Note: The value of 128 is the spacing between the test problem sizes; for a more accurate crossover point the user can choose a smaller spacing, but 128 should be sufficient. The second argument is the verbose option: if TRUE, all timing information is output.

The user can also target specific routines using separate optimisation functions, e.g. OptimiseChol(). Documentation on these and their use is provided in the package within R using ?OptimiseChol etc.

Using HiPLARM

HiPLARM is used in much the same way as the regular Matrix package in R: simply load the package in the usual fashion and begin using the routines. Before starting R and loading HiPLARM, the user should ensure that their LD_LIBRARY_PATH variable points to the directories containing the prerequisite libraries.

$:export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/jsmyth/libraries/lib

In addition, the user should be aware of some settings that may impact performance; in particular, the BLAS thread-count variables described in the PLASMA section above should remain set to 1. A short example session in R:

library(HiPLARM)
# Create two 4096 x 4096 Matrix objects filled with random normal values;
# HiPLARM dispatches the multiplication to PLASMA or MAGMA as appropriate
A <- Matrix(rnorm(4096 * 4096), ncol = 4096)
B <- Matrix(rnorm(4096 * 4096), ncol = 4096)
A %*% B


A note on Windows and Mac

Windows and Mac are currently not supported, but may be in the future.