AWS S3 Overview
Amazon S3 provides pay-as-you-go, off-site, cloud-based storage with an array of tools for access control and data sharing. Common use cases include backups, hosting datasets and portals, and programmatic access to objects, though S3 can integrate into and improve data workflows in many other areas.
Capability Highlights:
- Graphical and CLI clients
- Easy to use from CADES environments
- Long-term archival storage via Glacier
- Data tiering and aging policies
- Stored objects can (if desired) be exposed via https://
- Host static websites from an S3 bucket
- Create privately shareable pre-signed URLs, with or without expiration
- No cost to retrieve data to AWS VMs within the same region
- Support for multipart uploads and custom metadata tags
- Supported by Globus Premium Connectors
- Traffic is over port 443 (HTTPS)
- S3 interactions from ORNL systems do not (usually) require firewall exceptions
- The S3 protocol is used by many providers and vendors of on-premises object storage; it is not limited to AWS.
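As a rough illustration of the HTTPS exposure mentioned in the list above, the sketch below builds a virtual-hosted-style URL for an object using only the Python standard library. The bucket name, key, and region are hypothetical placeholders, the endpoint pattern is the AWS one (other S3-compatible providers use their own hostnames), and the URL would only resolve if the object has actually been made readable.

```python
from http.client import HTTPS_PORT
from urllib.parse import quote, urlsplit


def object_url(bucket: str, key: str, region: str = "us-east-1") -> str:
    """Build a virtual-hosted-style HTTPS URL for an S3 object (AWS endpoint pattern)."""
    # Percent-encode the key but keep "/" so key prefixes stay readable.
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key, safe='/')}"


# Hypothetical bucket and key, for illustration only.
url = object_url("example-bucket", "datasets/run 01/results.csv")

# The URL carries no explicit port, so the HTTPS default applies.
port = urlsplit(url).port or HTTPS_PORT

print(url)
print(port)
```

Because the URL carries no explicit port, clients fall back to the standard HTTPS port, which is generally why no special firewall exception is needed.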
Getting Started with S3
While the official AWS S3 Docs are extensive, the quick start guides below are designed to help ORNL scientific users become familiar with S3. Working with object storage differs from working with POSIX-based filesystems, and the guides here are intended to introduce S3 data workflows.
- S3 Object Storage User Guide
- S3 AWS Command Line Interface (CLI)
- S3 Advanced Usage
- S3 in a Python Virtual Environment
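To give a feel for how object storage differs from a POSIX filesystem, the following plain-Python sketch emulates the way an S3-style listing groups a flat set of keys by prefix and delimiter. It makes no AWS API calls; the `list_keys` helper and the bucket contents are hypothetical and only approximate the listing behavior.

```python
def list_keys(keys, prefix="", delimiter="/"):
    """Emulate how an S3-style listing groups flat keys by prefix and delimiter."""
    objects, common_prefixes = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            # Everything up to the next delimiter is reported as a "folder".
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return objects, sorted(common_prefixes)


# Hypothetical bucket contents: a flat mapping of keys to bytes, no real directories.
bucket = {
    "README.txt": b"hello",
    "runs/2023/a.csv": b"1,2",
    "runs/2023/b.csv": b"3,4",
    "runs/2024/c.csv": b"5,6",
}

top_objects, top_prefixes = list_keys(bucket)
run_objects, run_prefixes = list_keys(bucket, prefix="runs/")

print(top_objects, top_prefixes)
print(run_objects, run_prefixes)
```

The "folders" here exist only as shared key prefixes. There is nothing to create or remove like a directory, and renaming one means rewriting every key beneath it.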