User Guide

Please see the table of links to the other topics in this guide, located on the far right and below each section, as well as the embedded table of links to page subsections.

Scope

This guide is not intended to describe each topic in full detail, but rather to describe what you need to know specifically for our environment. More details about a command, software package, or technology can be found elsewhere in most cases. Please email us if you are having trouble finding information you need.

 

This user guide also assumes familiarity with the following topics:

Cluster Computing

Unix/Linux

Research Computing resources are presented in a Linux environment. Be comfortable working from a command-line (notably the "bash" shell). Users are expected to understand how to edit text files, and how to manipulate file and directory permissions.

Secure Shell (SSH)

Access to Research Computing resources is generally provided via SSH. Understand how to log into a remote server using SSH, and how to use SSH to transfer files between the remote server and your local workstation.

Shell scripting

Compute jobs launched on Research Computing compute resources are initialized by user-written shell scripts. Beyond that, many common operations can be simplified and automated using shell scripts.

 

Accounts

Getting an account

Access to Research Computing resources is available to CU-Boulder, Colorado State University, and RMACC members. If you are a UCB or CSU user, request access using the RC Account Request portal. If you are a non-UCB or non-CSU RMACC user, RC will provide information on how to request access to Summit during the second quarter of 2017.

External users who are collaborating with a UCB research group must obtain a sponsored affiliate account with CU before they will be able to request a Research Computing account.

Account deletion

User accounts may be subject to deletion based on a review of resource balance or allocation time (i.e. a project has ended or an account exists during a review cycle that isn't associated with any active projects). Users will have one month's warning to reapply for a project or to retrieve files, data, and application codes. After the one-month period, the data associated with the account on the specific resource will be deleted and no backups will be preserved.

Authentication

Research Computing resources require "two-factor" authentication for external access. Two authentication mechanisms are currently supported:

Duo
A cloud-based authentication platform that supports out-of-band authentication using a smartphone application.
Vasco
A two-factor authentication system based on a PIN and a physical one-time password (OTP) generator.

Users may be set up with the Vasco system, the Duo system, or both.

Duo authentication

Research Computing uses Duo Security for its preferred authentication platform. After receiving a Research Computing account, contact us at rc-help@colorado.edu or visit one of the OIT help desks and ask for Research Computing Duo enrollment in order to request a Duo invitation. You will need to present a valid photo ID.

Enrollment

Once an RC administrator enrolls you into Duo, you will receive an enrollment email from Duo Security that outlines the steps to set up your two-factor authentication. RC currently supports Duo "push" and "phone call" for authentication, with "push" being the preferred method.

Logging in

To log in using Duo, after having registered a Duo credential, enter your password as duo:identikey-pw, where the constant duo: prefix activates Duo authentication and identikey-pw is replaced with your CU campus IdentiKey password. If your password is correct, Duo will automatically prompt your phone app or call the phone number registered during enrollment. Follow the instructions provided to complete the login.
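
For example, a login session might look like the following sketch (the username is illustrative, and the password you type is not echoed to the screen; it is shown here only for clarity):

$ ssh -l rc_username login.rc.colorado.edu
Password: duo:your-identikey-password
(Duo then sends a push notification or places a phone call; approve it to complete the login.)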

Vasco OTP

The Vasco one-time password (OTP) system generates a new password approximately every 30 seconds that is displayed on a hardware authenticator device. This six-digit OTP is combined with a secret four-digit PIN, resulting in a unique 10-digit numeric password.

Obtaining a Vasco authenticator

OTP authenticators are available at the IT Service Center in the Technology Learning Center. You can order an authenticator online, but you will need to retrieve it in person when it is ready. You will be asked to show ID to be given an authenticator.

OTP authenticators are also available at the Research Computing office in the Administrative and Research Center. Request an appointment by sending email to rc-help@colorado.edu. Again, you will be asked to show ID to be given an authenticator.

If you cannot visit any of the above locations in person, contact the IT Service Center or email us at rc-help@colorado.edu to arrange an alternative.

Registering an OTP authenticator

An OTP authenticator must be registered with your account before it can be used. To register your device, visit the OTP Self Service page, and follow the instructions under Register Your Device to register your authenticator using your CU-Boulder IdentiKey.

Logging in

To log in using Vasco, after having registered a Vasco token, enter your password as pinotp, replacing pin with the four-digit numeric PIN you selected or were assigned during registration, and replacing otp with the six-digit one-time password generated by the token.
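
For example, if your PIN were 1234 and the token currently displayed 567890, you would enter 1234567890 as your password (values illustrative only).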

Getting help

If your authenticator or Duo credential is lost or damaged, or if you experience any other technical problems, please contact us right away at rc-help@colorado.edu.

Further reading

You can read more about the CU Boulder IdentiKey password system on the OIT website.

Remote access and logging in




Access to Research Computing resources is available over the campus network and the general Internet largely by way of the Secure Shell, or ssh, protocol. Access is provided via one of the dedicated login nodes.

When you have logged into an RC login node, you are accessing one of several virtual machines on which you can edit files, submit jobs to the compute clusters, and access storage resources. Use of the login nodes should be restricted to editing, data transfer, and job submission. Running CPU- or memory-intensive processes on the login nodes is actively discouraged because doing so is likely to impact other users' work. Data analysis and large-scale data transfers should be done via scheduled jobs. Software compilation is best done on the targeted resource via an interactive job or (for applications that are meant to run on Summit) on one of the Summit compile nodes.

The secure shell (ssh)

Once you have

  • obtained a Research Computing account;

and

  • obtained, registered, and tested an OTP authenticator;
  • or registered with Duo

you should be able to log into an RC login node using the secure shell (ssh).

The ssh command can be run from the Linux and OS X command-line.

ssh -l rc_username login.rc.colorado.edu

where you should replace rc_username with your actual Research Computing username.

When logging in from Windows, we recommend the PuTTY application.

The first time you log in, the system will configure your internal ssh key pair, used for authentication between internal hosts.

Attempting to create internal ssh config for connecting to
CURC managed resources...  Generating public/private rsa key pair.
Your identification has been saved in /home/example/.ssh/curc.
Your public key has been saved in /home/example/.ssh/curc.pub.
The key fingerprint is:
ed:43:02:88:99:10:d5:87:ed:84:51:b1:69:73:24:b4
example@login01.rc.colorado.edu

Further reading

The ssh protocol and its related applications have many features. It can be used as simply as described here, to provide simple shell access to the RC login nodes; but it can also

  • transfer files,
  • forward X11 GUI application interfaces,
  • proxy TCP connections,
  • multiplex sessions through a single connection,

and much more.

If you are interested in using advanced ssh features, we recommend the following additional reading:

Linux and the command line

The default shell

The user's shell (the program that accepts and executes commands) is selected during the account request process. Most users are using the default shell, "Bash."

When you log in you may see a basic shell prompt: -bash-2.11 $

You can customize the prompt by setting the PS1 environment variable. Documentation for setting PS1 is available in the bash manpage, and many examples are available online.

$ PS1="[\u@\h \W]\\$ "

When a user logs in interactively and starts a Bash shell, a number of scripts are executed automatically. Users typically edit these scripts in their home directory to set up their environment upon login.

.bash_profile
executed for login shells
.bashrc
executed for non-login shells (and not re-executed if you invoke a second shell)
.logout
executed at logout

In some cases, it may be appropriate to record all configuration in .bashrc and "source" .bashrc from .bash_profile.

source ~/.bashrc
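
A minimal .bash_profile that defers to .bashrc might look like the following (a sketch; adapt it to your own configuration):

# ~/.bash_profile
if [ -f ~/.bashrc ]; then
    source ~/.bashrc
fi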

It is possible to load modules from your .bashrc or .bash_profile, but this can have unintended consequences. In the Research Computing environment it is very important to be aware that your jobs will run in a different environment from the interactive login session. Your job scripts must be written to load relevant modules as needed, regardless of whether they are loaded into the command line environment at job submission.

If you would like to change your shell, please contact us at rc-help@colorado.edu, providing your username and preferred shell.

Compute resources

Summit

The Summit supercomputer is a heterogeneous Linux compute cluster with an aggregate performance of about 450 TFLOPS (trillion floating-point operations per second.) Funding for Summit was provided by the National Science Foundation (MRI Award Nos. ACI-1532235 and ACI-1532236), the University of Colorado Boulder, and Colorado State University.

Operating system Red Hat Enterprise Linux 7
Compute nodes 488 total (452 general compute, 11 GPU, 5 high-mem, and 20 Phi)
CPU cores 12,632 total
System memory 70.8 TiB total
Parallel storage 1.2 PB high-performance GPFS scratch filesystem
Primary interconnect Intel Omni-Path 100 Gb/s fat-tree topology with 2:1 oversubscription between groups of 32 compute nodes
Ethernet 1 GbE to each compute node with 10 GbE aggregate connectivity to the CU Science Network

More detailed technical specifications can be found on the Resources page.

Access to Summit requires a Research Computing account and a supporting PI or faculty member who agrees to abide by the relevant export control requirements.

Blanca

The Research Computing Condo Computing service offers researchers the opportunity to purchase and own compute nodes that will be operated as part of a cluster, named Blanca. The aggregate cluster is made available to all condo partners while maintaining priority for the owner of each node.

Benefits to partners include

  • Data center rack space – including redundant power and cooling – is provided, as is scratch disk space.
  • Partners get significantly prioritized access on nodes that they own and can run jobs on any nodes that are not currently in use by other partners.
  • System configuration and administration, as well as technical support for users, is provided by RC staff.
  • A standard software stack appropriate for a range of research needs is provided. Partners are able to install additional software applications as needed.
  • Bulk discount pricing is available for all compute node purchases.

Three types of compute nodes are available. The specifications below are updated periodically. If you need a newer configuration, such as the latest CPU or GPU model, please email rc-help@colorado.edu.

Node type Specifications Cost Potential uses
Compute node
  • 2x 14-core 2.4 GHz Intel “Broadwell” processors
  • 128 GB RAM at 2400 MHz
  • 1x 1 TB 7200 RPM hard drive
  • 10 gigabit/s Ethernet
$6,875/node general purpose high-performance computation that doesn’t require low-latency parallel communication between nodes
GPU node
  • 2x 14-core 2.4 GHz Intel “Broadwell” processors
  • 128 GB RAM at 2400 MHz
  • 1x 1 TB 7200 RPM hard drive
  • 10 gigabit/s Ethernet
  • 1x NVIDIA P100 GPU coprocessor (12 GB RAM)
$13,560/node molecular dynamics, image processing, deep learning
Dual GPU node
  • 2x 14-core 2.4 GHz Intel “Broadwell” processors
  • 128 GB RAM at 2400 MHz
  • 1x 1 TB 7200 RPM hard drive
  • 10 gigabit/s Ethernet
  • 2x NVIDIA P100 GPU coprocessors (16 GB RAM)
$13,560/node molecular dynamics, image processing, deep learning
Himem
  • 4x 12-core 3.0 GHz Intel “Ivy Bridge” processors
  • 1024 GB RAM at 1600 MHz
  • 10x 1 TB 7200 RPM hard drives in high-performance RAID configuration
  • 10 gigabit/s Ethernet
$34,511/node genetics/genomics applications

Crestone

Crestone is deprecated and may be decommissioned soon.

Crestone is a Dell PowerEdge M1000e Blade system and is provided for jobs requiring more memory or longer runtimes than are available on Janus. This system is intended for single-node jobs and does not have access to a high-speed, low-latency network interconnect.

Operating system Red Hat Enterprise Linux 6
Compute nodes 16
Compute cores 192 total, two Intel Xeon X5660 (6x2.8 GHz "Westmere") processors per node, Hyper-Threading enabled
System memory 1,536 GiB total, 96 GiB per node
Local storage 2 TB hard disk per node

Crestone nodes are accessed via the crestone QOS.

Storage and filesystems

The Research Computing environment provides several types of storage. Understanding these storage options, how they impact job execution, and how they may impact other running jobs is essential when running in the Research Computing environment.

Home

Each user has a home directory available from all Research Computing nodes at /home/${USER}/. Each home directory is limited to 2 GB to prevent the use of the home directory as a target for job output, software installs, or other data likely to be used during a compute job. Home directories are not stored on a high-performance file-system and, as such, they are not intended to be written to by compute jobs. Use of a home directory during a high-performance or parallel compute job may negatively affect the environment for all users.

A hidden .snapshot/ directory is available in each home directory (and in each subdirectory) and contains recent copies of the files at 2-hour, daily, and weekly intervals. These snapshots can be used to recover files after accidental deletion or corruption.
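
For example, to recover an accidentally deleted file from a snapshot (snapshot names vary, so list the directory first; the file name here is illustrative):

$ ls ~/.snapshot/
$ cp ~/.snapshot/SNAPSHOT_NAME/analysis.txt ~/analysis.txt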

Home directories are intended for the use of their owner only; sharing the contents of home directories with other users is strongly discouraged.

Home directories are protected by the redundancy features of the OneFS clustered file system, and are backed-up to a second site each night for disaster recovery.

Projects

Each user has access to a 250GB projects directory available from all Research Computing nodes at /projects/${USER}/. The projects directory is intended to store software builds and smaller data sets. Projects directories may be shared with other RC users. Like home directories, project directories are not intended to be written to by compute jobs. Significant I/O to a project directory during a high-performance or parallel compute job may negatively affect the environment for all users.

A hidden .snapshot/ directory is available in each project directory (and in each subdirectory) and contains recent copies of the files at 6-hour, daily, and weekly intervals. These snapshots can be used to recover files after accidental deletion or corruption.

Project directories are protected by the redundancy features of the OneFS clustered file system, and are backed-up to a second site each night for disaster recovery.

Summit scratch

A high-performance parallel scratch filesystem meant for I/O from jobs running on Summit is available at /scratch/summit/$USER/.

By default, each user is limited to a quota of 10 TB of storage space and 20M files and directories. Email rc-help@colorado.edu if you need these limits increased.

Summit scratch is a storage space most likely to provide the highest I/O performance for jobs running on Summit and is the preferred storage target for these jobs.

Summit scratch is mounted on all Summit compute nodes via the GPFS protocol over the Omni-Path interconnect. It is also mounted on login nodes and data-transfer nodes (ie, Globus endpoint nodes) via the NFS protocol.

High-performance scratch directories are not backed up or checkpointed, and are not appropriate for long-term storage. Data may be purged at any time. Files are automatically removed 90 days after their initial creation.

Inappropriate use of Summit scratch, including attempts to circumvent the automatic file purge policy, may result in loss of access to Summit.

Summit Scratch is served by a DDN GRIDScaler appliance running GPFS (aka IBM Spectrum Scale) version 4.2.

General scratch

A central general-purpose scratch filesystem is available at /rc_scratch/$USER/. This space is intended for short term/parallel storage over the length of a job run. Its primary purpose is for I/O from jobs on the Blanca cluster.

No administrative limits are placed on the amount of data that can be placed in a general-purpose global scratch directory.

Summit compute jobs should use Summit scratch, not the general-purpose scratch filesystem. As such, the general-purpose scratch filesystem is mounted read-only on Summit compute nodes.

General-purpose scratch directories are not backed up or checkpointed, and are not appropriate for long-term storage. Data may be purged at any time. Files may be automatically removed 90 days after their initial creation.


The PetaLibrary

The PetaLibrary is a cost-subsidized service for the storage, archival, and sharing of research data, housed in the Space Sciences data center at CU-Boulder. It is available for a modest fee to any US-based researcher affiliated with the University of Colorado Boulder. For more details visit our Petalibrary page.

The PetaLibrary stores and archives research data, broadly defined as the scholarly output from researchers at the University of Colorado, as well as digital archives and special collections. The PetaLibrary is not an appropriate storage target for administrative data, data that is copyrighted by someone other than the project owner or members, or sensitive data (e.g., HIPAA, FERPA, ITAR, or Classified).

PetaLibrary storage is allocated to individual projects. A PetaLibrary project is purchased and overseen by a Principal Investigator (PI), but may be made accessible to multiple additional members. A PI may have multiple PetaLibrary projects, each with a different storage allocation and list of authorized members. Each project may also have a Point of Contact (POC) person who is allowed to request changes to the project on behalf of the PI.

Each project is presented as a unique path on a posix-compliant filesystem.

The PetaLibrary system provides two primary services: PetaLibrary Active and PetaLibrary Archive.

The PetaLibrary has a 10 Gb/s connection to the CU Science Network, allowing transfers at speeds up to 800 MB/s.

PetaLibrary Active

The PetaLibrary Active storage service is mounted at /work/ on all Research Computing computational systems. This storage may be used by compute workloads, but it is not designed to be performant under I/O-intensive applications or parallel writes. (Parallel or concurrent IO should target Summit scratch or /rc_scratch instead, depending on which cluster it originates from.)

PetaLibrary Archive and hierarchical storage management (HSM)

PetaLibrary Archive is a hierarchical storage system that dynamically and simultaneously manages data on both disk and tape storage systems while presenting all data as part of a single, unified filesystem namespace. The process by which files are automatically migrated between disk and tape storage is called "hierarchical storage management" (HSM). Normally, recently-used or small files are kept on low-latency disk storage, while older or larger files are migrated to bulk tape storage. This provides good performance when accessing frequently-used data while remaining cost-effective for storing large quantities of archive data.

PetaLibrary Archive provides an optional "additional copy on tape" to protect against tape media failure. (All disk storage is protected from media failure by parity in a disk array.)

Sharing data

Data in the PetaLibrary can be shared with collaborators who do not possess a CU Identikey or RC account via the Globus sharing service. To enable sharing, a member of the PetaLibrary project must authenticate to the "CU-Boulder Research Computing" Globus endpoint and configure a "Globus shared endpoint" that references the intended PetaLibrary storage area. You can then grant access (read and/or write) to the shared endpoint to anyone with a Globus account.

A Globus Plus subscription is required to use the Globus sharing service. One Globus Plus subscription is provided with each PetaLibrary project.

To comply with NSF requirements, external collaborators should have a US-based affiliation.

For more information, contact us at rc-help@colorado.edu.

Local scratch

Local scratch directories reside directly on individual compute nodes and are assigned automatically during job execution. Because these directories are assigned and removed automatically, the location of the directory is assigned to the $SLURM_SCRATCH environment variable.

No administrative limits are placed on the amount of data that can be placed in a temporary local scratch directory. Scratch directories are limited only by the physical capacity of their storage.

Local scratch directories are not backed up or checkpointed, and are not appropriate for long-term storage. Local scratch directories, and all data contained in them, are removed automatically when the running job ends.
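
A job script might stage data through the local scratch directory as in the following sketch (the input path, program name, and output destination are illustrative):

#!/bin/bash
#SBATCH --ntasks 1
#SBATCH --time 01:00:00

# Stage input to node-local scratch, run there, then copy results to Summit scratch
cp /projects/$USER/input.dat $SLURM_SCRATCH/
cd $SLURM_SCRATCH
./my_program input.dat > output.dat
cp output.dat /scratch/summit/$USER/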

Implementation of local scratch directories
Partition Description
Summit (Haswell, GPU, KnL) 180 GB local SSD.
Summit (Himem) 10 TB local RAID filesystem.
Crestone 0.8 TB local SATA drive.
Blanca 0.8 TB local SATA drive.

Allocations

Quick overview of steps needed to start computing on Summit

  • In order to run jobs on Summit, you need to be a member of an "allocation" of compute time. Allocations are used to facilitate the reasonable and fair sharing of our limited resources.
  • If you are a member of a CU Boulder research group that already has a project and allocation on Summit (including condo allocations), ask your PI to email us with permission for you to access the appropriate project.
  • Otherwise, request access to the UCB General allocation by sending an email to rc-help@colorado.edu with a few sentences outlining your proposed use of Summit.

The general allocation is available for initial startup and testing as well as small-scale production work. Within one year, ensure that your research group has created its own Summit project and has requested an allocation of CPU time associated with that project. Only PIs can create projects.

Please read through the following sections for more information about projects and allocations.

Projects, General access, and Allocations

In this context a Project is a defined research effort and a container for compute time Allocations, as well as listings of expected goals and reported results. Projects require PI approval and supervision. For our purposes, a PI is long-term research or teaching faculty, not a student, postdoc, or visitor in a shorter-term position.

Projects are required for use of RC resources as a means to document the PI supervision of computational work, the reporting of results, and most of all to facilitate the reasonable and fair sharing of our limited resources.

CU Boulder faculty/staff/student users will either be part of a PI-supervised Project or can initially get access to Summit via “General” access for up to one year. The “General” compute time allocation is provided so that UCB users may familiarize themselves with the system and do the preliminary work needed to inform a Project and Project Allocation.

Outside collaborators, visitors, etc. will need to be part of a Project and its related allocation in order to use Summit. This ensures that the outside collaborator is supervised by a UCB PI/Faculty and is working on UCB research.

Projects may span multiple years; however, compute time allocations associated with Projects will need to be re-evaluated yearly.

Users who are part of a Project can obtain compute time Allocations to increase priority and allow for greater throughput. There is no charge for the allocation; however, there is a process to request, review, and award these Allocations. Approximately 50 million SU are available to be allocated to about 50 UCB research groups via Project allocations each year.

Allocation requests will require prior use of Summit for testing, scaling and optimization work to inform the Allocation request. Unused hours from Janus will not carry over onto Summit.

We strongly urge you to email rc-help@colorado.edu with questions about the request process or to begin a collaboration on your Allocation request with RC. Doing so can help avoid wasted effort and delays in allocation approval.

General Access without a Project

  • (Startup/Temporary compute time allocation)
  • Requested via a portal (in development). In the meantime, email rc-help@colorado.edu to request access to the general allocation.
  • Easy to get for new UCB Summit users
  • Limited to a year
  • Impractical for larger scale operation
  • Results reporting required
  • Not available to Sponsored Affiliates, Visitors, external collaborators
  • Users can use about 10K SU per month with reasonable queue waits

Project

  • Approximately 5 year maximum duration
  • Reasonably specific, not a general request for group resources for a variety of goals
  • Container for compute time Allocations
  • Administered via a web portal
  • Supervised by PI “Long-term” faculty
  • Establishes clarity of longer term research effort
  • Organize expected and reported results (papers, degree awards, etc).
  • Requires annual confirmation of Project continuation and user list.
  • Required for Sponsored Affiliates, Visitors, and external collaborators

Project Compute Allocation

  • Establishes an amount of available compute time to project members, requested as Service Units or “SU” (based on core-hours) but awarded as a % share of the system.
  • Rewards users who work within their budget with increased priority on Summit.
  • Typically allows for more throughput and shorter wait times than users in General
  • Jobs continue to run at a lower priority even if the % share is overused
  • General-like testing allocations are possible when conditions warrant (i.e., visitors or users who have already had a year of unsupervised General access but need to test for a new Project)
  • 50 million SU available per year to allocate to about 50 UCB research groups

Project Description - Include this information when creating a Project

  • Title (Unique, descriptive, about 60 characters or less)
  • PI (Must have a Research Computing account)
  • Concise description of the research project and its significance
  • Estimated Project timeframe/duration
  • Concise description of the related computational work
  • Goals (typically degrees, publications, presentations, data products)
  • Grants that support this project, specifically grants that are directly related to work performed on RC resources
  • General description of the data storage, sharing and management for the project (and beyond if applicable).
  • Create a new project using the web portal

Allocation Request Worksheet

This worksheet can help you develop your Allocation request proposal. You can use it as a template or to check your proposal to determine if it is complete. Please address all of the bullet points in the worksheet. If you feel that some of the requested information is not applicable, please note why that is rather than leaving that section out of your request entirely.

Requests for larger allocations will need more detailed justification. As a general guideline, a request for 300K SU might require a two-page proposal, while a 1M+ SU request might require three to five pages.

Why do we ask for so much detail? It's not to create unnecessary irritating work for you! It helps to ensure that CU's heavily subscribed HPC resources are being used properly and fairly. The allocation request process gives you the opportunity to ensure that you have a clear plan for using Summit, that your workflow is appropriate for the resources you plan to use, and that your application is running efficiently. Recently, a number of groups have realized improvements of 4x or greater in overall efficiency by working through the steps in this allocation worksheet. That immediately leads to greater research productivity!

Introduction and summary

  • Concise Description - Describe the portion of the Project that this computational work supports
  • Allocation goals - Describe the anticipated goals for this particular effort as a subset of the Project goals.
  • Duration: indicate if this allocation is for one year or until completion of a nearer-term goal, whichever is sooner.
  • Expected followup - indicate if this is the final allocation for the Project or if work will likely continue.

Computational method

  • Describe the application(s) that you will be running
  • Details of application optimization
    • If the application is an RC-provided module, please indicate this; in that case you don't need to provide further details about application optimization.
    • If the application has been optimized and detailed in a previous Allocation Request, please indicate the specifics or paste in those details with attribution.
    • Describe how the computational algorithm was optimized; note whether optimized numerical libraries such as Intel MKL are used; note any compiler optimization flags used.
    • When was optimization and scaling testing performed, and by whom? Please include information such as Job IDs and/or username and dates.
  • Workflow optimization
    • For parallel applications, show how the total job time changes as more cores or nodes are used (ie, provide scaling information)
    • Describe how nodes are fully utilized in terms of memory, CPU or both.
    • Describe how the workflow was structured to fit the resources, walltime limits, etc.

I/O

  • Describe the disk I/O by job type
  • Temporary files - indicate the size and number of job specific temporary files and how/if they are removed.
  • Local vs scratch - Usage of on-node local disk vs /scratch
  • Output files - Describe the nature, size and number of output files that remain after job completion.

Data Management

  • Describe how much data this effort will produce, per job and overall.
  • Indicate how much of that is temporary "raw" output that will be post-processed and then deleted.
  • Indicate how much will need to be migrated off scratch to safe storage. (Recall that scratch filesystems are purged at regular intervals and thus can only be used for temporary storage.)
  • Describe this "safe" storage (RC PetaLibrary, department file server, etc.)

CPU time ("SU") request

  • Break down the estimated number of jobs of each type and the cores and walltime they require.
  • Indicate the Billing Weight modifier for the resource type. The different Summit nodes (Haswell, KnL, GPU, and high-memory) are charged or “billed” against your award differently based on actual node purchase cost.
  • Multiply and total to determine your SU request
  • See the Summit Partitions in our User Guide for more details on the various Summit nodes
Example table for calculating total SU requirement

Node Type     Jobs   Cores   Hours   Weight   Total
shas          100    24      12      1        28,800
sgpu          -      -       -       2.5      -
smem          50     12      4       6        14,400
sknl          -      -       -       0.1      -
Grand Total                                   43,200

Project allocation requests should be submitted through the relevant project in the web portal.

Batch queueing and job scheduling

Research Computing uses a queueing system called Slurm to manage compute resources and to schedule jobs that use them. Users use Slurm commands to submit batch and interactive jobs and to monitor their progress during execution.

Access to Slurm is provided by the slurm module.

$ module load slurm/cluster-name

where you should replace cluster-name with "summit" to submit jobs to Summit, and with "blanca" to submit jobs to Blanca. If you do not specify a cluster-name, it will default to Janus.

Batch jobs

Slurm is primarily a resource manager for batch jobs: a user writes a job script that Slurm schedules to run non-interactively when resources are available. Users primarily submit computational jobs to the Slurm queue using the sbatch command.

$ sbatch job-script.sh

sbatch takes a number of command-line arguments. These arguments can be supplied on the command-line:

$ sbatch --ntasks 16 job-script.sh

or embedded in the header of the job script itself using #SBATCH directives:

#!/bin/bash
#SBATCH --ntasks 16

You can use the scancel command to cancel a job that has been queued, whether the job is pending or currently running. Jobs are cancelled by specifying the job id that is assigned to the job during submission.
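
For example, if sbatch reported "Submitted batch job 123456" at submission time, the job could be cancelled with (job ID illustrative):

$ scancel 123456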

Example batch job script: hello-world.sh

#!/bin/bash

#SBATCH --ntasks 1
#SBATCH --output hello-world.out
#SBATCH --qos debug
#SBATCH --time=00:05:00

echo Running on $(hostname --fqdn):  'Hello, world!'

This minimal example job script, hello-world.sh, when submitted with sbatch, writes the name of the cluster node on which the job ran, along with the standard programmer's greeting, "Hello, world!", into the output file hello-world.out

$ sbatch hello-world.sh

Note that any Slurm arguments must precede the name of the job script.

Example: Serial jobs

Job requirements

Slurm uses the requirements declared by job scripts and submission arguments to schedule and execute jobs as efficiently as possible. To minimize the time your jobs spend waiting to run, define your job's resource requirements as accurately as possible.

--nodes
The number of nodes your job requires to run.
--mem
The amount of memory required on each node.
--ntasks
The number of simultaneous tasks your job requires. (These tasks are analogous to MPI ranks.)
--ntasks-per-node
The number of tasks (or cores) your job will use on each node.
--time
The amount of time your job needs to run.

The --time requirement (also referred to as "walltime") deserves special mention. Job execution time can be somewhat variable, leading some users to overestimate (or even maximize) the defined time limit to prevent premature job termination; but an unnecessarily long time limit may delay the start of the job and allow undetected stuck jobs to waste more resources before they are terminated.

For all resources, --time included, smaller resource requirements generally lead to shorter wait times.

Summit nodes can be shared, meaning each such node may execute multiple jobs simultaneously, even from different users.

Additional job parameters are documented with the sbatch command.
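
For example, these requirements might be combined in a job script as follows (a sketch for a hypothetical multi-node job; adjust the values to match your own workload):

#SBATCH --nodes 2
#SBATCH --ntasks-per-node 24
#SBATCH --mem 48000
#SBATCH --time 04:00:00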


Summit Partitions

On Summit, nodes with the same hardware configuration are grouped into partitions. You will need to specify a partition using --partition in order for your job to run on the appropriate type of node.

Partition name Description # of nodes cores/node RAM/core (GB) Max Walltime Billing weight
shas General Compute with Haswell CPUs (default) 380 24 5.25 24H 1
sgpu GPU-enabled 10 24 5.25 24H 2.5
smem High-memory 5 48 42 7D 6
sknl Phi (Knights Landing) CPU 20 64 TBD 24H 0.1

More details about each type of node can be found here.


Quality of service (QOS)

On Blanca, a QoS is specified to submit a job to either a group's high-priority queue or to the shared low-priority queue.

On Summit, QoSes are used to constrain or modify the characteristics that a job can have. For example, by selecting the "debug" QoS, a user can obtain higher queue priority for a job with the tradeoff that the maximum allowed wall time is reduced from what would otherwise be allowed on that partition. We recommend specifying a QoS (using the --qos flag or directive in Slurm) as well as a partition for every job.

The available Summit QoSes are

QOS name Description Max walltime Max jobs/user Node limits Priority boost
normal default Derived from partition n/a 256/user 0
debug For quicker turnaround when testing 1H 1 32/job Equiv. of 3-day queue wait time
long For jobs needing longer wall times 7D n/a 22/user; 40 nodes total 0
condo For groups who have purchased Summit nodes 7D n/a n/a Equiv. of 1 day queue wait time
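
For example, a short test job on the general compute nodes might specify both a partition and a QoS (a sketch):

#SBATCH --partition shas
#SBATCH --qos debug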

Shell variables and environment

Jobs submitted to Summit are not automatically set up with the same environment variables as the shell from which they were submitted. Any necessary modules must therefore be loaded, and any environment variables needed by the job must be set, within the job script itself. These settings should be included after any #SBATCH directives in the job script.


Job arrays

Job arrays provide a mechanism for running several instances of the same job with minor variations.

Job arrays are submitted using sbatch, similar to standard batch jobs.

$ sbatch --array=[0-9] job-script.sh

Each job in the array will have access to a $SLURM_ARRAY_TASK_ID environment variable set to the value that represents that job's position in the array. By consulting this variable, the running job can perform the appropriate variant task.

Example array job script: array-job.sh

#!/bin/bash

#SBATCH --array 0-9
#SBATCH --ntasks 1
#SBATCH --output array-job.out
#SBATCH --open-mode append
#SBATCH --qos debug
#SBATCH --time=00:05:00

echo "$(hostname --fqdn): index ${SLURM_ARRAY_TASK_ID}"

This minimal example job script, array-job.sh, when submitted with sbatch, submits ten jobs with indexes 0 through 9. Each job appends the name of the cluster node on which the job ran, along with the job's array index, into the output file array-job.out

$ sbatch array-job.sh
Example: Array jobs

Allocations

Access to computational resources is allocated via shares of CPU time assigned to Slurm allocation accounts. You can determine your default allocation account using the sacctmgr command.

$ sacctmgr list Users Users=$USER format=DefaultAccount

Use the --account argument to submit a job for an account other than your default.

#SBATCH --account=crcsupport

You can use the sacctmgr command to list your available accounts.

$ sacctmgr list Associations Users=$USER format=Account

Job mail

Slurm can be configured to send email notifications at different points in a job's lifetime. This is configured using the --mail-type and --mail-user arguments.

#SBATCH --mail-type=END
#SBATCH --mail-user=user@example.com

The --mail-type argument configures which points during job execution should generate notifications. Valid values include BEGIN, END, FAIL, and ALL.


Resource accounting

Resources used by Slurm jobs are recorded in the Slurm accounting database. This accounting data is used to track allocation usage.

The sacct command displays accounting data from the Slurm accounting database. To query the accounting data for a single job, use the --job argument.

$ sacct --job $jobid

sacct queries can take some time to complete. Please be patient.

You can change the fields that are printed with the --format option, and the fields available can be listed using the --helpformat option.

$ sacct --job=200 --format=jobid,jobname,qos,user,nodelist,state,start,maxrss,end

If you don't have a record of your job IDs, you can use date-range queries in sacct to find your job.

$ sacct --user=$USER --starttime=2017-01-01 --endtime=2017-01-03

To query the resources being used by a running job, use sstat instead:

$ sstat -a -j JobID.batch

where you should replace JobID with the actual ID of your running job. sstat is especially useful for determining how much memory your job is using; see the "MaxRSS" field.

Monitoring job progress

The squeue command can be used to inspect the Slurm job queue and a job's progress through it.

By default, squeue will list all jobs currently queued by all users. This is useful for inspecting the full queue; but, more often, users simply want to inspect the current state of their own jobs.

$ squeue --user=$USER

Slurm can provide an estimate of when your jobs will start, along with what resources it expects to dispatch your jobs to. Please keep in mind that this is only an estimate!

$ squeue --user=$USER --start

More detailed information about a specific job can be accessed using the scontrol command.

$ scontrol show job $SLURM_JOB_ID

Memory limits

To better balance the allocation of memory to CPU cores (for example, to prevent users from letting their jobs use all the memory on a shared node while only requesting a single core), we have limited each core to a fixed amount of memory. This limit is dependent on the requested node. You can either specify how much memory you need in MB and let Slurm assign the correct number of cores, or you can proportionally set the number of cores relative to the memory that your job will need.

Node type per-CPU limit per-node limit
crestone 3,942 MiB
shas 4,944 MiB 118,658 MiB
sgpu 4,944 MiB 118,658 MiB
smem 42,678 MiB 2,048,544 MiB
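
For example, a job needing roughly 20 GB of memory on a shas node could either request the memory directly or request enough cores to cover it (a sketch; values illustrative):

#SBATCH --mem 20000

or, equivalently, at about 4,944 MiB per core:

#SBATCH --ntasks 5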

Interactive jobs

Interactive jobs allow users to log in to a compute node to run commands interactively on the command line. They are commonly run with the debug QoS as part of an interactive programming and debugging workflow. The simplest way to establish an interactive session is to use the sinteractive command:

$ sinteractive --qos=debug  --time=01:00:00

This will open a login shell using one core on one node for one hour. It also provides X11 forwarding via the submit host and can thus be used to run GUI applications.

If you prefer to submit an existing job script or other executable as an interactive job, use the salloc command.

$ salloc --qos debug job-script.sh

If you do not provide a command to execute, salloc starts up a Slurm job that nodes will be assigned to, but it does not log you in to the allocated node(s).

The sinteractive and salloc commands each support the same parameters as sbatch, and can override any default configuration. Note that any #SBATCH directives in your job script will not be interpreted by salloc when it is executed in this way. You must specify all arguments directly on the command line.


Topology-aware scheduling

Summit's general compute nodes are arranged into "islands" of about 30 nodes on a single Omni-Path switch. Nodes connected to the same switch have full interconnect bandwidth to other nodes in that same island. The bandwidth between islands is only half as much (ie, 2:1 blocking.) Thus, a job that does a lot of inter-node MPI communication may run faster if it is assigned to nodes in the same island.

If the --switches=1 directive is used, Slurm will put all of the job's tasks on nodes connected to a single switch. Keep in mind that jobs requesting topology-aware scheduling can use a maximum of 32 nodes and may spend a long time in the queue waiting for switch-specific nodes to become available. To specify the maximum amount of time a job should wait for a single switch, use --switches=1@DD-HH:MM:SS and replace DD-HH:MM:SS with the desired number of days, hours, minutes, and seconds. After that time elapses, Slurm will schedule the job on any available nodes.
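
For example, a 16-node job willing to wait up to twelve hours for nodes on a single switch might include (a sketch):

#SBATCH --nodes 16
#SBATCH --switches=1@0-12:00:00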

File and data transfer

Research Computing offers two primary mechanisms for data transfer: SSH-based scp and sftp; and GridFTP, including access from Globus Connect.

SSH file transfer (SCP / SFTP)

SSH-based file transfer is not particularly efficient or performant (especially compared to the Globus and GridFTP method detailed below) but, because it uses the same software that is already used for interaction with Research Computing resources, it is still commonly used.

Command-line secure copy (SCP)

Files can be transferred to and from the Research Computing environment using the scp command from a Unix or Linux command-line (including Mac OS X).

$ scp ${local_filename} rc_username@login.rc.colorado.edu:/path/to/target-directory
(Please modify the above command with the appropriate paths and your RC username.)
Tutorial: When transferring files using scp, it is particularly useful to use OpenSSH ControlMaster to reduce OTP password entries.
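
To copy a file in the other direction, from Research Computing to your local workstation, reverse the arguments (a sketch; adjust the paths and username):

$ scp rc_username@login.rc.colorado.edu:/path/to/remote-file ${local_directory}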

Interactive clients

Files can also be transferred to and from the Research Computing environment using a number of interactive file transfer applications. The most basic of these is sftp, available from a Unix or Linux command-line (including Mac OS X).

$ sftp rc_username@login.rc.colorado.edu

Once an SFTP connection is established, files can be transferred using the get and put commands. More information can be accessed using the help command.

Alternative (and graphical) file transfer clients are available for Windows, including

  • FileZilla, a multi-protocol, multi-platform file-transfer application.
  • WinSCP, a basic SCP/SFTP file-transfer application for Windows.

Graphical file-transfer applications often retain passwords for automatic authentication for later transfers. Because Research Computing uses one-time passwords for authentication, you must disable password retention / saving in your file-transfer client. Failure to do so may cause your account to be temporarily disabled after the client attempts and fails to authenticate repeatedly in the background.

File-system and application integration

SSH-accessible filesystems can be directly integrated into compatible applications and Operating Systems.

Configuration of these OS and application-level integrations is currently outside the scope of this guide.

Globus

Globus addresses the deficiencies of secure copy (scp) by automating large data transfers, resuming failed transfers, and simplifying the implementation of high-performance transfers between computing centers.

Globus.org and Globus Online

Globus Online is a Software as a Service (SaaS) deployment of the Globus Toolkit which provides end-users with a browser interface to initiate data transfers between endpoints registered with the Globus Alliance. Globus Online allows registered users to “drag and drop” files from one endpoint to another. Endpoints are terminals for data; they can be laptops or supercomputers, and anything in between. The servers at Globus.org act as intermediaries, negotiating, monitoring, and optimizing transfers through firewalls and across network address translation (NAT). Under certain circumstances, with high-performance hardware, transfer rates exceeding 1 GB/s are possible.

We recommend reading through Globus.org's overview documentation.

Getting an account

To use Globus Online, you will first need to sign up for an account at Globus.org. If you prefer to use your CU Identikey account for authentication to Globus Online, choose "University of Colorado Boulder" from the dropdown menu at the Login page and follow the instructions from there.

Transferring data to/from a local workstation

You can use Globus Online to transfer data between your local workstation (e.g., your laptop or desktop) and Research Computing. In this workflow, you configure your local workstation as a Globus endpoint using Globus Connect.

  1. Log in to Globus.org
  2. Use the Manage Endpoints interface to “add Globus Connect Personal” as an endpoint. (More information at Globus.org support.)
  3. Transfer Files, using your new workstation endpoint for one side of the transfer, the Research Computing endpoint (CU-Boulder Research Computing) for the other side. (You will be required to authenticate to the Research Computing endpoint using your RC account and OTP or Duo.)

Transferring data between two remote endpoints

Globus.org can also be used to transfer data between two remote Globus endpoints (e.g., between your local compute center's Globus endpoint and the Research Computing endpoint.)

  1. Log in to Globus.org
  2. Transfer Files, using the Research Computing endpoint (CU-Boulder Research Computing) for one side of the transfer, and another endpoint of your choice for the other side. (You will be required to authenticate to the Research Computing endpoint using your RC account and OTP. The other endpoint may require its own authentication as well.)

Globus Connect command-line interface

Globus.org provides a command-line interface (CLI) as an alternative to its web interface. This command-line interface is provided over an SSH connection to a Globus.org server.

  1. Use the Manage Identities interface at Globus.org to upload your ssh public key.
  2. Connect to Globus.org using an ssh client.

    $ ssh -l globus_username cli.globusonline.org
  3. The Globus.org command-line interface can start and manage transfers, manage files on an endpoint, and configure endpoints associated with your account. Use the help command for more information on the commands available, or visit the Globus.org support system.

Software

Research Computing manages a set of supported software for use on RC systems. This software is published using an Environment Modules system.

Users are encouraged to build custom software in a project directory. Users may even maintain and publish local module files to dynamically configure a running environment to use the software. If this module is useful for the general user community, it may be adopted as a centrally-supported module.

We are happy to help with the installation of custom or third-party software. If you need assistance, please contact us at rc-help@colorado.edu.

Category Software Module name Comments
Commercial Applications Matlab matlab/R2016b
Mathematica mathematica/9.0
IDL idl/8.5
Compilers GCC gcc/6.1.0
Intel intel/17.0.0
PGI pgi/16.5
MPI Implementations OpenMPI openmpi/1.10.2
Intel intel/2017.0.098
Debuggers And Optimization Allinea allinea/6.0.4 The debugger program is ddt
Totalview totalview/2016.6.21
Tau tau/2.25.1
PAPI papi/5.4.3
Perfsuite perfsuite/1.1.4
Utilities Boost boost/1.61.0
FFTW fftw/3.3.4
GSL gsl/2.1
Python python/2.7.11, python/3.5.1
R R/3.3.0
Szip szip/2.1
HDF5 hdf5/1.10
MKL 17.0.0
NetCDF4 netcdf/4.4.0
LAPACK (via Intel MKL)
GDAL gdal/2.1.0
Visualization Paraview paraview/5.0.1
Visit visit/2.6.2
NCL ncl/6.3.0

Environment modules

The use of a module system means that most software is not accessible by default and must be loaded using the module command. This allows RC to provide multiple versions of software concurrently and allows users to easily switch between versions.

"Loading a module" sets or modifies a user's environment variables to enable access to the software package provided by that module. For instance, the $PATH variable might be updated so that appropriate executables for that package can be used.

Research Computing's current Lua-based Lmod environment module system is hierarchical, with five layers to support programs built with compiler and library consistency requirements. Modules are only available to be loaded once their dependencies have been satisfied. This prevents accidental loading of modules that are inconsistent with each other. The layers include

  • Independent programs
  • Compilers
  • Compiler dependent programs
  • MPI implementations
  • MPI dependent programs

Thus, in order to load an MPI-dependent program, it's first necessary to load a compiler (eg, Intel), and then an MPI implementation (eg, IMPI). See examples below.
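
For example, making an MPI-dependent package visible might proceed as follows (module names are illustrative; run module avail to see what is currently provided):

$ module load intel      # compiler
$ module load impi       # MPI implementation
$ module avail           # MPI-dependent modules are now listed and can be loaded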

Enabling the current module system

User accounts created prior to Summer 2016 may be using an older modules system that only has access to earlier versions of software packages. RC highly recommends that all users switch to the Lmod modules system. Old modules will not work on Summit!

/curc/tools/utils/switch_lmod.sh
Switch to the new modules collection
/curc/tools/utils/switch_lmod.sh -r
Revert to the legacy modules collection

Common commands

The typical usage of the module command is outlined in the following table.

The module command may be shortened to the ml alias, with slightly different semantics.

module avail (ml av)
List available software. If a module is not listed here, it might have an unmet dependency and thus be unavailable for loading until a package higher in the module hierarchy is loaded.
Search for software that is not listed using the module spider command.
module spider openmpi (ml spider openmpi)
Search for particular software. In this example we are searching for the OpenMPI library.
module load gcc (ml gcc)
Load a module to use the software. In this example we are loading the GNU Compiler Collection. We have not specified a version, so the default version will be loaded.
module load gcc/6.1.0
Load gcc version 6.1.0
module unload gcc (ml -gcc)
Remove/unload a module.
module swap gcc intel (ml -gcc intel)
Swap a module. In this example we are swapping gcc for intel. This will unload gcc and load intel; if there are any gcc-dependent modules, they will also be unloaded and the intel-dependent versions (if available) will be loaded in their place.
module purge (ml purge)
Remove all modules. Note that the slurm module will not be unloaded with this purge, as it is sticky. If a user wants to unload a sticky module, they must specify the --force option.
module save foo (ml save foo)
Save the state of all loaded modules. In this example we are saving all loaded modules as a collection called foo.
module restore foo (ml restore foo)
Restore a state of saved modules. In this example we are restoring all modules that were saved as the collection called foo.

Additional module sub-commands are documented in the module help command.


Loading modules in Slurm jobs

In order for an application running in a Slurm job to have access to any necessary module-provided software packages, those modules need to be loaded in the job script. Module load commands should be placed after any #SBATCH directives and before the actual executable is called.
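
A job script following this pattern might look like the following sketch (the module names and executable are illustrative):

#!/bin/bash
#SBATCH --ntasks 24
#SBATCH --partition shas
#SBATCH --qos normal
#SBATCH --time 02:00:00

# Load the software environment the job needs, then run the program
module purge
module load intel/17.0.0
module load impi

mpirun -np $SLURM_NTASKS ./my_mpi_program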

Software compilation

If the software package or version you need is not available as an environment module, you may compile it yourself. The recommended location for user-installed software is the project directory, which is snapshotted and can easily be shared with members of a research group.

It is extremely important that software be built on the same type of node as it is intended to run on. This means that compiling on the login nodes is highly discouraged, because the login nodes have a different hardware architecture and different operating system packages from the compute nodes. Applications compiled on login nodes are unlikely to perform well and may not even run on compute nodes. Software that was built on Janus needs to be recompiled in order to run on Summit.

Compile nodes

Dedicated nodes are provided for compiling software meant to run on the Summit compute cluster. Access to these nodes is provided from any login node via SSH.

$ ssh scompile

The Summit compile nodes are otherwise identical to Summit general compute nodes and are most likely to produce problem-free builds, especially for applications that use MPI or are heavily hardware-optimized. Applications that are intended to run on GPU nodes must be compiled directly on a GPU node since the standard Summit compile nodes do not include the necessary GPU drivers. Start an interactive session to log into a GPU node for compiling:

 $ sinteractive --partition=sgpu --qos=debug --time=1:00:00 --exclusive

Compilers

RC provides three compiler suites, each of which can be enabled by loading the appropriate environment module. We recommend using the Intel Compiler Suite when compiling applications to be run on Research Computing resources, e.g.:

$ module load intel/17.0.0
Compiler Suite Module Family Language Command
Intel Compiler Suite intel C icc
Fortran ifort
C++ icpc
The Portland Group pgi C pgcc
Fortran pgfortran
C++ pgc++
GNU Compiler Collection gcc C gcc
Fortran gfortran
C++ g++
Java gcj

MPI

Two MPI implementations are available as modules for use with Summit: OpenMPI and Intel MPI. We recommend Intel MPI.

All MPI implementations available in the Research Computing Environment provide standard wrapper commands.

Language Wrapper command
C mpicc
Fortran mpifc, mpif77, mpif90
C++ mpicxx

The MPI module you load is dependent on the compiler suite you have loaded. For example, if you have the Intel compiler loaded, you will only be able to load an MPI implementation built with that compiler. A compiler module must therefore be loaded before an MPI module.
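
For example, compiling a simple MPI program might look like the following (a sketch; the module names and source file are illustrative, and you should confirm the exact module names with module avail):

$ module load intel
$ module load impi
$ mpicc -O2 hello_mpi.c -o hello_mpi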

Optimization

Please choose appropriate compiler flags to produce an optimized executable. Application performance on modern computer architectures such as Summit often depends critically on compile-time optimization. It's not unusual to see a factor of two speedup when choosing correct optimization options. Keep in mind that that is equivalent to doubling the size of your compute allocation!

The general optimization flag for most compilers is -OX, where X is an integer specifying the optimization level. Level 2 (-O2) is appropriate for most applications; level 3 (-O3) produces additional speedups in many cases but may also reduce numerical accuracy. Try both to see which works better for your program.

Processor-specific optimization can also lead to noticeably improved performance. For the CPUs on Summit nodes, -march=core-avx2 or -xcore-avx2 (Intel compiler) or -mtune=haswell (GNU compiler) turns on the appropriate CPU-specific optimization.
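
For example (source and output file names are illustrative):

$ icc -O2 -xcore-avx2 -o mycode mycode.c      # Intel compiler
$ gcc -O3 -mtune=haswell -o mycode mycode.c   # GNU compiler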

The Intel compilers can provide additional information about how well your executable is vectorized, that is, whether multiple floating point operations per CPU cycle are enabled. Run man icc or man ifort and search for vec-report or qopt-report for more details.

Many optimized precompiled numerical functions are available in packages such as Intel's Math Kernel Library or the GNU Scientific Library. By linking these into your program, rather than writing your own versions, you can be sure that you are getting excellent optimization and vectorization without additional work on your part. MKL and GSL are both provided as RC modules.

Administration and communication

Regularly scheduled maintenance

A regular maintenance period is scheduled from 7:00 AM to 7:00 PM on the first Wednesday of each month. If necessary, an additional maintenance period may be scheduled on the third Wednesday of the month. During maintenance periods, jobs may be prevented from running, and login access to RC resources may be restricted.

Upcoming and ongoing maintenance will be announced on the rc-announce mailing list and at www.rc.colorado.edu.

rc-announce

RC-Announce is the main channel Research Computing staff use to reach our community of users. As a member of this list you will receive system notifications and updates about important news and events. We also encourage users to follow us on Twitter at @CUBoulderRC.

You can subscribe to rc-announce at https://lists.rc.colorado.edu/mailman/listinfo/rc-announce.

Frequently asked questions

If you have a question that is not answered in the user guide, or need general help with a Research Computing system, please contact us at rc-help@colorado.edu. Some of our most frequently asked questions are presented here, with references to the user guide where more information can be found.

Remote access and logging in

How do I get an account?

Please email rc-help@colorado.edu for allocation request or management questions during this transition to Summit.

External users must first obtain a sponsored affiliate account with CU.

How do I register a one-time password (OTP) authenticator?

Register your authenticator at otp.colorado.edu. The process is documented in the user guide.

How do I log into my account?

Computational access to Research Computing systems is primarily provided via SSH. Access to storage resources is primarily provided via SSH and Globus.

Allocations

How do I get an allocation?

All new accounts are eligible for a start-up allocation upon request. Additional allocations are granted upon request with sufficient justification. More information is available in the user guide.

Batch queueing and job scheduling

What queues are available?

Research Computing uses Slurm, which has a single job queue. Instead of multiple queues, Slurm uses resource partitions and quality-of-service (QOS) values to set limits and priorities and to assign jobs to resources.

More information on selecting a QOS value for your job is available in the user guide.

How can I run multiple serial jobs in parallel?

Research Computing provides a simple load balancer that can run multiple serial processes as tasks within a single larger job; a sketch is shown below.
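
As a rough sketch only (the loadbalance module name, the lb launcher, and the file names below are assumptions; see the load balancer documentation in the user guide for the exact interface), you list one serial command per line in a text file and launch them together as one job:

$ cat cmd_file
./serial_program input1.txt
./serial_program input2.txt
./serial_program input3.txt
$ module load loadbalance
$ mpirun lb cmd_file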

When will my job start running?

Slurm can provide an estimate of when your jobs will start, along with what resources it expects to dispatch your jobs to.

$ squeue --user=$USER --start

Start times vary, and are largely dependent on the existing job load. A small debug job should start within a few minutes. A production 96-core job might take a few hours to start. A large, long job may spend several days or a week in the queue before starting.

More information is available in the user guide.

Storage

How long are backups saved in the PetaLibrary?

The PetaLibrary does not offer a conventional backup service. Storage options with "replication" or "second copy on tape" are designed to protect against a media failure, not accidental deletion. Changes to files on primary storage propagate automatically to the replicated copy, and if a file is deleted from the main storage it should not be expected to remain on the copy.

Can I mount my PetaLibrary project directory to my desktop computer?

We do not currently support NFS or CIFS exports from the PetaLibrary to the campus network, but similar (unsupported) access could be provided using sshfs.
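
As an unsupported sketch (the hostname, remote path, and local mount point below are placeholders), an sshfs mount from a Linux workstation might look like this:

$ mkdir -p ~/petalibrary
$ sshfs username@login.rc.colorado.edu:/path/to/petalibrary/project ~/petalibrary
$ fusermount -u ~/petalibrary    # unmount when finished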

How long does it take to access a file in the PetaLibrary Archive?

For files that have been migrated to tape, it can take up to a minute to load the appropriate tape cartridge, plus another minute or two to advance the tape to the spot where the file is located. Data is then read from the tape at up to 160 MB/s. If all tape drives are in use by other recall operations, the recall will wait until a tape drive becomes available.

What tools and processes are available on the PetaLibrary for data and metadata management, curation, and information lifecycle management?

The PetaLibrary is designed primarily as a storage facility, leaving management of the stored data to the project owner. However, CU Research Data Services can offer assistance with many aspects of the data management process.

Can I back up my PC to the PetaLibrary?

The PetaLibrary has been funded to store research data, not for general-purpose storage or backups. The UCB Files storage service is an alternative for general file storage.

Why shouldn’t I use rsync to synchronize my local data with PetaLibrary Archive?

rsync synchronizes data by comparing file content in addition to file metadata. In an HSM environment like PetaLibrary Archive, these content comparisons require that all files be recalled from tape to disk, making rsync an inefficient mechanism for synchronizing files.

Why are my PetaLibrary transfers slow?

The PetaLibrary has a 10 Gb/s connection to the CU Science Network, allowing transfers at speeds of up to 800 MB/s, but slower networks in your department or institution are likely to be the greatest limiting factor when moving data in and out of the PetaLibrary. If you have confirmed with your local network administrators that you have a 10 Gb/s or faster connection between yourself and the PetaLibrary, contact us at rc-help@colorado.edu for help troubleshooting the connection.

Software

Will you install software for me?

We prefer that software first be installed by a user who will use it. More information is available in the user guide.

How can I profile my code?

We recommend using the HPCToolkit.
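
A typical HPCToolkit workflow looks roughly like the following (this assumes an HPCToolkit module is available; the measurement and database directory names are generated by the tools and may differ on your system):

$ hpcrun ./myapp
$ hpcstruct ./myapp
$ hpcprof -S myapp.hpcstruct hpctoolkit-myapp-measurements
$ hpcviewer hpctoolkit-myapp-database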

How do I access LAPACK and BLAS?

We recommend using Intel's Math Kernel Library (MKL), especially if you are compiling with the Intel Compiler Suite.

$ module load intel/intel-13.0.0

If you prefer an open source implementation, we provide the GNU Scientific Library.

$ module load gsl/gsl-1.15_gcc-4.7.2
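
Once the appropriate module is loaded, linking can be as simple as the following (file names are illustrative; the -mkl flag applies to the Intel compilers):

$ icc -mkl -o myprog myprog.c                    # link against MKL
$ gcc -o myprog myprog.c -lgsl -lgslcblas -lm    # link against GSL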

JupyterHub

JupyterHub is a multi-user server for Jupyter (formerly known as IPython) notebooks. It provides a web service that allows you to create and share documents that contain live code, equations, visualizations and explanatory text. The CURC environment includes support for parallel computation on local HPC resources.

Our current JupyterHub documentation is located here

Related Tutorials