Storage and filesystems

The Research Computing environment provides several types of storage. Understanding these storage options, how they affect job execution, and how they can impact other running jobs is essential when working in the Research Computing environment.

Home

Each user has a home directory available from all Research Computing nodes at /home/${USER}/. Each home directory is limited to 2 GB to prevent its use as a target for job output, software installs, or other data likely to be used during a compute job. Home directories are not stored on a high-performance filesystem and, as such, are not intended to be written to by compute jobs. Use of a home directory during a high-performance or parallel compute job may negatively affect the environment for all users.
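
As a quick way to check usage against the 2 GB limit, standard tools can summarize what a home directory holds (a generic sketch; your system may also provide a dedicated quota-reporting command):

    # Report the total space used by your home directory
    du -sh /home/${USER}/

    # Show the largest top-level items, to find candidates to move to a project or scratch directory
    du -sh /home/${USER}/* 2>/dev/null | sort -h | tail -n 10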

A hidden .snapshot/ directory is available in each home directory (and in each subdirectory) and contains recent copies of the files at 2-hour, daily, and weekly intervals. These snapshots can be used to recover files after accidental deletion or corruption.
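
For example, a file accidentally deleted from a home directory could be restored by copying it back out of a snapshot (the snapshot name below is a placeholder; list the .snapshot/ directory to see which snapshots actually exist):

    # List the snapshots available for your home directory
    ls /home/${USER}/.snapshot/

    # Copy a file back from a chosen snapshot (snapshot and file names are illustrative)
    cp /home/${USER}/.snapshot/<snapshot_name>/myfile.txt /home/${USER}/myfile.txt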

Home directories are intended for the use of their owner only; sharing the contents of home directories with other users is strongly discouraged.

Home directories are protected by the redundancy features of the OneFS clustered filesystem, and are backed up to a second site each night for disaster recovery.

Projects

Each user has access to a 250 GB projects directory available from all Research Computing nodes at /projects/${USER}/. The projects directory is intended to store software builds and smaller data sets. Project directories may be shared with other RC users. Like home directories, project directories are not intended to be written to by compute jobs. Significant I/O to a project directory during a high-performance or parallel compute job may negatively affect the environment for all users.

A hidden .snapshot/ directory is available in each project directory (and in each subdirectory) and contains recent copies of the files at 6-hour, daily, and weekly intervals. These snapshots can be used to recover files after accidental deletion or corruption.

Project directories are protected by the redundancy features of the OneFS clustered filesystem, and are backed up to a second site each night for disaster recovery.

Summit scratch

A high-performance parallel scratch filesystem meant for I/O from jobs running on Summit is available at /scratch/summit/$USER/.

By default, each user is limited to a quota of 10 TB of storage space and 20 million files and directories. Email rc-help@colorado.edu if you need these limits increased.

Summit scratch is the storage space most likely to provide the highest I/O performance for jobs running on Summit, and it is the preferred storage target for those jobs.

Summit scratch is mounted on all Summit compute nodes via GPFS over the Omni-Path interconnect. It is also mounted on login nodes and data-transfer nodes (i.e., Globus endpoint nodes) via NFS.
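
A minimal batch-script sketch that stages its work in Summit scratch might look like the following (the job parameters and program name are placeholders, not site-specific recommendations):

    #!/bin/bash
    #SBATCH --job-name=scratch-example
    #SBATCH --nodes=1
    #SBATCH --time=01:00:00

    # Create a per-job working directory on the Summit scratch filesystem
    WORKDIR=/scratch/summit/${USER}/${SLURM_JOB_ID}
    mkdir -p "${WORKDIR}"
    cd "${WORKDIR}"

    # Run the application so that its output lands in scratch rather than /home or /projects
    ./my_program --output results.dat   # placeholder program and options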

High-performance scratch directories are not backed up or checkpointed, and are not appropriate for long-term storage. Data may be purged at any time. Files are automatically removed 90 days after their initial creation.
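
To see which files in a scratch directory are approaching that window, a rough check with find can help (find tests modification time, which only approximates the creation-time-based purge policy):

    # List files not modified in the last 80 days, as an approximation of the purge criterion
    find /scratch/summit/${USER}/ -type f -mtime +80 -print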

Inappropriate use of Summit scratch, including attempts to circumvent the automatic file purge policy, may result in loss of access to Summit.

Summit scratch is served by a DDN GRIDScaler appliance running GPFS (also known as IBM Spectrum Scale) version 4.2.

General scratch

A central general-purpose scratch filesystem is available at /rc_scratch/$USER/. This space is intended for short-term, parallel storage over the length of a job run. Its primary purpose is I/O from jobs running on the Blanca cluster.

No administrative limits are placed on the amount of data that can be placed in a general-purpose global scratch directory.

Summit compute jobs should use Summit scratch, not the general-purpose scratch filesystem. As such, the general-purpose scratch filesystem is mounted read-only on Summit compute nodes.

General-purpose scratch directories are not backed up or checkpointed, and are not appropriate for long-term storage. Data may be purged at any time. Files may be automatically removed 90 days after their initial creation.

The PetaLibrary

The PetaLibrary is a cost-subsidized service for the storage, archival, and sharing of research data, housed in the Space Sciences data center at CU-Boulder. It is available for a modest fee to any US-based researcher affiliated with the University of Colorado Boulder. For more details, visit our PetaLibrary page.

The PetaLibrary stores and archives research data, broadly defined as the scholarly output from researchers at the University of Colorado, as well as digital archives and special collections. The PetaLibrary is not an appropriate storage target for administrative data, data that is copyrighted by someone other than the project owner or members, or sensitive data (e.g., HIPAA, FERPA, ITAR, or Classified).

PetaLibrary storage is allocated to individual projects. A PetaLibrary project is purchased and overseen by a Principal Investigator (PI), but may be made accessible to multiple additional members. A PI may have multiple PetaLibrary projects, each with a different storage allocation and list of authorized members. Each project may also have a Point of Contact (POC) person who is allowed to request changes to the project on behalf of the PI.

Each project is presented as a unique path on a POSIX-compliant filesystem.

The PetaLibrary system provides two primary services: PetaLibrary Active and PetaLibrary Archive.

The PetaLibrary has a 10 Gb/s connection to the CU Science Network, allowing transfers at speeds up to 800 MB/s.

PetaLibrary Active

The PetaLibrary Active storage service is mounted at /work/ on all Research Computing computational systems. This storage may be used by compute workloads, but it is not designed to perform well under I/O-intensive applications or parallel writes. (Parallel or concurrent I/O should instead target Summit scratch or /rc_scratch, depending on the cluster from which it originates.)
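
For example, results can be generated in scratch during a job and then copied into a PetaLibrary Active allocation afterwards (the project path below is hypothetical; substitute the path of your own allocation under /work/):

    # Copy finished results from Summit scratch into a PetaLibrary Active allocation
    rsync -av /scratch/summit/${USER}/results/ /work/my_petalibrary_project/results/   # hypothetical project path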

PetaLibrary Archive and hierarchical storage management (HSM)

PetaLibrary Archive is a hierarchical storage system that dynamically and simultaneously manages data on both disk and tape storage systems while presenting all data as part of a single, unified filesystem namespace. The process by which files are automatically migrated between disk and tape storage is called "hierarchical storage management" (HSM). Normally, recently-used or small files are kept on low-latency disk storage, while older or larger files are migrated to bulk tape storage. This provides good performance when accessing frequently-used data while remaining cost-effective for storing large quantities of archive data.

PetaLibrary Archive provides an optional "additional copy on tape" to protect against tape media failure. (All disk storage is protected from media failure by parity in a disk array.)

Sharing data

Data in the PetaLibrary can be shared via the Globus sharing service with collaborators who do not have a CU IdentiKey or RC account. To enable sharing, a member of the PetaLibrary project must authenticate to the "CU-Boulder Research Computing" Globus endpoint and configure a "Globus shared endpoint" that references the intended PetaLibrary storage area. Read and/or write access to the shared endpoint can then be granted to anyone with a Globus account.
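
For those who prefer the Globus command-line interface over the web application, access to an existing shared endpoint can be granted with a command along these lines (the endpoint UUID, path, and e-mail address are placeholders, and the exact syntax may vary between Globus CLI versions):

    # Grant read access on a shared endpoint to an external collaborator (all values are placeholders)
    globus endpoint permission create "<shared-endpoint-uuid>:/path/to/shared/data/" \
        --permissions r --identity collaborator@example.org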

A Globus Plus subscription is required to use the Globus sharing service. One Globus Plus subscription is provided with each PetaLibrary project.

To comply with NSF requirements, external collaborators should have a US-based affiliation.

For more information, contact us at rc-help@colorado.edu.

Local scratch

Local scratch directories reside directly on individual compute nodes and are created automatically during job execution. Because these directories are created and removed automatically, their location is provided to the job in the $SLURM_SCRATCH environment variable.

No administrative limits are placed on the amount of data that can be placed in a temporary local scratch directory. Scratch directories are limited only by the physical capacity of their storage.

Local scratch directories are not backed up or checkpointed, and are not appropriate for long-term storage. Local scratch directories, and all data contained in them, are removed automatically when the running job ends.
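
A sketch of how a job script might use this space, keeping in mind that anything left in it disappears when the job ends (the program name and the destination directory are placeholders):

    #!/bin/bash
    #SBATCH --job-name=local-scratch-example
    #SBATCH --nodes=1
    #SBATCH --time=01:00:00

    # Do I/O-heavy temporary work in the node-local scratch directory provided by Slurm
    cd "${SLURM_SCRATCH}"
    ./my_program --tmpdir "${SLURM_SCRATCH}" --output results.dat   # placeholder program

    # Copy anything worth keeping to a longer-lived location before the job ends,
    # because the local scratch directory and its contents are removed automatically
    cp results.dat /scratch/summit/${USER}/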

Implementation of local scratch directories

Partition                     Description
Summit (Haswell, GPU, KNL)    180 GB local SSD
Summit (Himem)                10 TB local RAID filesystem
Crestone                      0.8 TB local SATA drive
Blanca                        0.8 TB local SATA drive