HPCPorts

Management of software is a task that has been around as long as computers themselves. In the last decade or two, many "package management" systems have been created. Some support both binary and source-based distribution, and some support only source-based packages. The needs of software management on High Performance Computing (HPC) systems are driven by the following realities:

  • Most people do not have root access to these systems.
  • Many HPC systems have heavily customized userspace tools and compiler toolchains.
  • In many cases, "standard" tools that come with the OS are horribly out of date or compiled with a different compiler which is not compatible with the one we are using.
  • In the case of systems with static compilation, updating a low-level package requires rebuilding all dependent higher-level packages.

Given these facts, if a user wants to install a complicated application with many dependencies, they are faced with building many pieces of commodity software from scratch just to ensure that everything has been compiled consistently. They are also faced with the burden of updating this installed software as needed when new versions of packages are released. HPCPorts is designed to make this situation tractable.

Documentation and Download

If you want to use HPCPorts to manage software on your own system, it is available on GitHub under a BSD license. You can read the documentation here, and browse / clone the source here.

I maintain installations of HPCPorts on several machines. Usually I attempt to install as many packages as possible within the constraints of the system (for example, it is not practical to use python on a system that only supports static compilation). If you just want to use HPCPorts on one of these maintained systems, you do not have to download anything; just see below for how to use the modules that I have already installed.

HPCPorts at NERSC

There are several steps that are common to using the installs of HPCPorts on the machines at NERSC. The first step is to confirm which shell you are using:

$> echo $SHELL

If this is not the shell you wish to use, log in to NIM and change your shell. Now you will need to edit your shell resource file. If you are using bash for your shell, you should edit ~/.bashrc.ext. If you are using tcsh, you should edit ~/.tcshrc.ext. Near the top of this file (before the machine-specific sections), put one of the following lines (the first for bash, the second for tcsh):

source /project/projectdirs/cmb/modules/hpcports_NERSC.sh

OR

source /project/projectdirs/cmb/modules/hpcports_NERSC.csh

This special shell snippet will automatically determine which experiment filegroups you are in, and add experiment-specific module locations into your search path. It also adds shell aliases to allow loading particular "flavors" of HPCPorts on each machine. As a further example, here is what a minimal ~/.bashrc.ext might look like:

# begin .bashrc.ext
#
# User additions to .bashrc go in this file
#

# HPCPorts environment
source /project/projectdirs/cmb/modules/hpcports_NERSC.sh

# Global Settings for all machines

alias qwork='cd ${PBS_O_WORKDIR}'
export EDITOR='emacs -nw'

# Machine-specific settings

if [ $NERSC_HOST == "datatran" ]; then
  # specific settings for data transfer nodes (dtn01,dtn02, etc)
  echo "" > /dev/null
fi

if [ $NERSC_HOST == "hopper" ]; then
  # specific settings for hopper
  echo "" > /dev/null
fi

if [ $NERSC_HOST == "edison" ]; then
  # specific settings for edison
  echo "" > /dev/null
fi

if [ $NERSC_HOST == "carver" ]; then
  # specific settings for carver
  echo "" > /dev/null
fi

# end .bashrc.ext
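
For tcsh users, a minimal ~/.tcshrc.ext might look like the following sketch (assuming the same layout as above; only the csh syntax differs):

# begin .tcshrc.ext
#
# User additions to .tcshrc go in this file
#

# HPCPorts environment
source /project/projectdirs/cmb/modules/hpcports_NERSC.csh

# Global Settings for all machines

alias qwork 'cd $PBS_O_WORKDIR'
setenv EDITOR "emacs -nw"

# Machine-specific settings

if ( $NERSC_HOST == "hopper" ) then
  # specific settings for hopper
endif

if ( $NERSC_HOST == "edison" ) then
  # specific settings for edison
endif

# end .tcshrc.ext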

After logging in, you must run a machine-specific command to select the compiler environment you want to use. This command has the form:

$> hpcports <environment>

The choices of environment on each machine are listed under the name of the machine below. Now you are ready to load some modules. Note that in order to avoid naming conflicts with other system-installed module files, all modules installed by HPCPorts have the "-hpcp" suffix in their names. For convenience, there is an alias name "cmb" which points to the "cmb-hpcp" module. So for example, you can load modules like this:

$> module load astromatic-hpcp
$> module load scipy-hpcp
$> module load cmb
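
To see which HPCPorts packages are available in the currently selected environment, you can use the standard module listing; for example (the modules tool prints its listing to stderr, hence the redirect):

$> module avail 2>&1 | grep hpcp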

What follows are machine-specific details for the supported NERSC systems.

carver.nersc.gov

On carver, the primary HPCPorts flavor is the "gnu" one, which uses a newer version of gcc than the system-installed compiler. To select this you would do:

$> hpcports gnu

and then load the desired modules.

Important Notes:

  • The latest OpenMPI installed by NERSC (1.6.3) was misconfigured with thread support enabled, which is not possible over the InfiniBand interconnect; jobs which use MPI threading support will die. Because of this, I use a self-built GNU toolchain with gcc-4.9.0 and binutils 2.24, along with OpenMPI 1.8.1 provided by HPCPorts. Threading support is disabled.
  • The version of the environment modules on carver was too old to support the large module hierarchy needed by HPCPorts. Running the "hpcports (env)" command will do an in-place swap of the modules tools with a newer version (3.2.10) built by me.
  • The version of tcsh on carver was too old to support large environments. Running the "hpcports (env)" command will add a newer version of tcsh into your PATH and spawn a new shell. When logging out of the machine, you may find that you have to type "exit" twice.

Because HPCPorts uses a newer version of OpenMPI than the version installed by NERSC, you DO NOT need to specify the "-bynode", "-bysocket", and "-bind-to-socket" options to mpirun. Instead, mpirun simply does the correct thing. You can always display the actual process placement information with the "--report-bindings" option to mpirun.
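
For example, a minimal sketch of launching an MPI job from a carver batch script and displaying the resulting bindings (the process count and executable path are placeholders):

$> mpirun -np 32 --report-bindings /path/to/my/favorite/executable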

sgn0{1,2,3,4}.nersc.gov

On the science gateway nodes, there is only one flavor of HPCPorts. The environment is initialized with:

$> hpcports

and then you can load the desired modules.

Important Notes:

  • The version of the environment modules on the science gateway nodes was too old to support the large module hierarchy needed by HPCPorts. Running the "hpcports (env)" command will do an in-place swap of the modules tools with a newer version (3.2.10) built by me.
  • The compilers on the science gateway nodes were slightly old, and the installation of gcc-4.7.x had been relocated after install (which breaks libtool files). HPCPorts on the science gateway nodes uses the same gcc-4.9.0 and binutils toolchain that I use on carver.

hopper.nersc.gov

On hopper, there is one toolchain to choose from. It uses a newer version of gcc and has been compiled with the standard Cray programming environment, using static linking to maximize performance on large jobs. One consequence of static linking is that only a subset of HPCPorts packages have been installed (no python, for example). You select this flavor after logging in, by doing:

$> hpcports gnu

and then load the desired modules.
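
If you are building your own code against HPCPorts packages in this environment, the Cray compiler wrappers (cc, CC, ftn) handle the static linking. A rough sketch, assuming a hypothetical "cfitsio-hpcp" module and a cfitsio_PREFIX variable following the same <package>_PREFIX convention as the openmpi_PREFIX variable used in the CCM script below:

$> module load cfitsio-hpcp
$> cc -O2 -o myprog myprog.c -I${cfitsio_PREFIX}/include -L${cfitsio_PREFIX}/lib -lcfitsio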

edison.nersc.gov

On edison, there are three toolchain flavors to choose from:

  • The "gnu" flavor uses a newer version of gcc, and has been compiled with the standard Cray programming environment. This version of HPCPorts uses static linking and is designed to maximize performance on large jobs. One consequence of static linking is that only a subset of HPCPorts packages have been installed (no python, for example).
  • The "shared_gnu" flavor is built with the same Cray compiler, but all packages are built with dynamic linking. This toolchain flavor supports python and other tools like HARP which load plugins at runtime. The shared_gnu toolchain does not permit network access from the compute nodes.
  • The "ccm_gnu" flavor is built using Cray's "Cluster Compatibility Mode". The software stack in this case is almost identical to carver, and processes running on the compute nodes can access the network.

For a given login session, you must choose what environment you want to use. You select this flavor after logging in, by doing:

$> hpcports gnu # (for best performance on statically compiled MPI code)

OR

$> hpcports shared_gnu # (for python codes or anything using runtime plugins)

OR

$> hpcports ccm_gnu # (for codes which require network access)

and then load the desired modules.
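
For reference, here is a rough sketch of a batch script for the static "gnu" flavor, which launches the job with aprun (the standard Cray launcher) rather than mpirun. The core count and executable path are placeholders, and the queue directive is omitted:

#PBS -S /bin/bash
#PBS -l mppwidth=48
#PBS -N test
#PBS -j oe
#PBS -l walltime=0:30:00
#PBS -V

cd $PBS_O_WORKDIR

# 48 cores total, 24 cores per node on edison;
# run one MPI process per core
aprun -n 48 /path/to/my/favorite/executable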

Special Instructions for CCM mode on edison

Using CCM requires some extra work when running MPI jobs. The general documentation for CCM is on the NERSC website here. Because the shell environment is not propagated to the nodes executing the MPI processes, we must use another technique. I have created a command which dumps your environment to a file in your home directory, and the hpcports_NERSC.(c)sh file then loads this environment file if it exists. After the MPI command executes, you run another command to clear this file. There are additional options needed to the mpirun command to enable proper job startup within the CCM environment. Here is an example PBS script on edison which runs an MPI job in CCM mode:

#PBS -S /bin/bash          
#PBS -l mppwidth=48        
#PBS -N test               
#PBS -j oe                 
#PBS -q ccm_queue              
#PBS -l walltime=0:30:00   
#PBS -V                    

cd $PBS_O_WORKDIR

# Total cores (24 per node)
# This MUST match the mppwidth parameter above
CORES=48                   

# Processes per node
NODE_PROC=4

# Threads per process
NODE_THREAD=$(( 24 / NODE_PROC ))

# Total number of processes
NPROC=$(( CORES / NODE_THREAD ))

# Set OpenMP threads
export OMP_NUM_THREADS=${NODE_THREAD}

# Dump full environment, which will be loaded
# by the shell on compute nodes when sourcing
# hpcports_NERSC.sh
hpcpenv set

# Run job, with optional display of process affinity
ccmrun mpirun \
--prefix ${openmpi_PREFIX} \
--hostfile ${PBS_NODEFILE} \
-np ${NPROC} --report-bindings \
/path/to/my/favorite/executable

# Clear environment
hpcpenv clear

Note that the "hpcpenv set" and "hpcp clear" commands only work within a PBS script- since they use the $PBS_JOBID to label the dump files. If some jobs die, there may be a bunch of stale dump files in your home directory. You can either remove them manually or use the "clearall" option:

$> hpcpenv clearall

OR

$> rm ~/.hpcpenv_*

IPAC

Many HPCPorts packages are installed on the "max" Beowulf cluster at IPAC. If you have an account on that machine, you can follow these instructions to access those modules. The first step is to edit your shell resource file (either ~/.bashrc or ~/.tcshrc) and append the HPCPorts directory to your module search path by adding one line:

module use /planck/tools/hpcports/env/modulefiles
There is no "hpcports" toolchain selection function on max, since there is only one flavor of HPCPorts installed there. In order to access planck-proprietary software modules, see additional instructions on the Planck experiment page. After that, log out and log back in. Now you can access any of the modules. Note that in order to avoid naming conflicts with other system-installed module files, all modules installed by HPCPorts have the "-hpcp" suffix in their names. For convenience, there is an alias name "cmb" which points to the "cmb-hpcp" module. So for example, you can load modules like this:

$> module load astromatic-hpcp
$> module load clique-hpcp
$> module load cmb

riemann.lbl.gov

This cluster at LBNL is used by the SDSS. There is only one toolchain supported (using a recent version of gcc). All tools on top of this are built by HPCPorts, including OpenMPI. You should edit your ~/.bashrc or ~/.tcshrc, depending on your choice of shell, and add one of these lines (the first for bash, the second for tcsh):

source /clusterfs/riemann/data/kisner/software/hpcports_riemann.sh

OR

source /clusterfs/riemann/data/kisner/software/hpcports_riemann.csh

This will switch the default version of the environment modules to a newer version (3.2.10) built by me. It will also add a couple of directories to your module search path. After that, log out and log back in. Now you can access any of the modules. Note that in order to avoid naming conflicts with other system-installed module files, all modules installed by HPCPorts have the "-hpcp" suffix in their names. So for example, you can load modules like this:

$> module load astromatic-hpcp
$> module load ipython-hpcp
$> module load scipy-hpcp
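
Since OpenMPI on riemann is provided by HPCPorts as well, MPI codes are run through its mpirun after loading the corresponding module (the "openmpi-hpcp" module name here is an assumption based on the naming convention above, and the process count is a placeholder):

$> module load openmpi-hpcp
$> mpirun -np 8 ./my_program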

Important Notes:

  • The version of the environment modules on riemann was too old to support the large module hierarchy needed by HPCPorts. Sourcing the file above will do an in-place swap of the modules tools with a newer version (3.2.10) built by me. You could put that source command inside a function / alias if you wish to control this behavior in a more fine-grained way.
  • Although many software packages existed on riemann, they were built with a variety of different compilers. I have installed a base of gcc-4.8.2 and this is used to build all packages in HPCPorts.
  • If you experience issues with the tcsh environment size, either switch to bash or do "module load tcsh-hpcp" to get a more recent tcsh with support for larger environments (see the sketch below).
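
As a sketch of the last note above, you could load the newer tcsh and then replace your current shell with it (the exec step is just one way to pick up the newly loaded binary):

$> module load tcsh-hpcp
$> exec tcsh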

sisu.csc.fi

This machine at the Finnish supercomputing center (CSC) is used by some members of the Planck collaboration. There are two versions of HPCPorts installed here. The first is a statically compiled version to be used on the compute nodes, which has only a subset of available packages. The second is a dynamically built version that can be used on the login / PBS nodes, and which contains python and other tools requiring dynamic linking. You should edit your ~/.bashrc or ~/.tcshrc, depending on your choice of shell, and add one of these lines (the first for bash, the second for tcsh):

source /proj/planck/software/hpcports_sisu.sh

OR

source /proj/planck/software/hpcports_sisu.csh

This will add a shell alias to allow selecting which flavor of HPCPorts you want to use. After logging in, you can select between the compute node or login node versions with:

$> hpcports gnu

OR

$> hpcports login

Now you can access any of the modules. Note that in order to avoid naming conflicts with other system-installed module files, all modules installed by HPCPorts have the "-hpcp" suffix in their names. For convenience, there is an alias name "cmb" which points to the "cmb-hpcp" module. So for example, you can load modules like this:

$> module load astromatic-hpcp
$> module load cmb