AWS Command Line Interface (CLI) Tool

1. AWS CLI Installation

The AWS Command Line tool is used to interact with the storage service and can be scripted for automated workflows. Installation is summarized below; for full details, consult the official AWS CLI install guide.

CADES SHPC Users

The aws client is provided via a software module, though you may install a local version in your home directory if you wish.
From the SHPC login nodes:

-bash-4.2$ module load python/3.6.3
-bash-4.2$ aws --version
aws-cli/1.16.12 Python/3.6.3 Linux/3.10.0-862.9.1.el7.x86_64 botocore/1.12.2

Windows Users

Download from https://aws.amazon.com/cli/.

OSX Users

See AWS macOS instructions.

Linux/OpenStack Users

You may encounter issues if awscli and awscli-plugin-endpoint are installed from different sources, e.g., one from your distribution's package manager (apt or yum) and the other from pip. Installing both via pip usually allows them to work together well.

📝 Note: It is recommended to install the components in a Python virtual environment, the instructions for which are available here.
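
The virtual-environment approach can be sketched as follows. This is a minimal example under stated assumptions: python3 with the venv module is available, and the function name install_awscli_venv and the default directory ~/awscli-venv are hypothetical names chosen for illustration.

```shell
# Sketch: install awscli and the endpoint plugin into a private virtual
# environment so that both packages come from pip.
# Assumption: the default directory below is an arbitrary, hypothetical choice.
install_awscli_venv() {
    venv_dir="${1:-$HOME/awscli-venv}"
    # Create the virtual environment.
    python3 -m venv "$venv_dir" || return 1
    # Activate it in the current shell so that pip installs into it.
    . "$venv_dir/bin/activate" || return 1
    pip install awscli awscli-plugin-endpoint
}

# Usage: install_awscli_venv                (uses ~/awscli-venv)
#        install_awscli_venv /path/to/venv  (uses the given directory)
```

The environment must be activated again in each new shell session before the aws command it provides is on your PATH.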

If you wish to install system-wide (as root) you may do so with pip via:

sudo pip install awscli
sudo pip install awscli-plugin-endpoint

2. Initial AWS CLI Profile Configuration

Specify IAM user credentials by editing your ~/.aws/credentials file, or run aws configure and paste the appropriate values into the prompts. The access keys are stored in ~/.aws/credentials:

[default]
aws_access_key_id = <accessKey>
aws_secret_access_key = <secretKeyValue>

The default region name and default output format that aws configure also prompts for are stored in ~/.aws/config:

[default]
region = <leave blank or us-east-1>
output = <leave blank or text or json>

Changing Default S3 Endpoints and Multiple Profiles

By default the external Amazon S3 service is assumed, so if you wish to use AWS no change is needed.

If you wish to use some other S3 provider (such as an on-premises S3-compatible service) you will need to explicitly define the endpoint to use.

Example: To switch to an on-premises S3 endpoint and make it the endpoint used by the default profile:

aws configure set plugins.endpoint awscli_plugin_endpoint
aws configure --profile default set s3.endpoint_url http://or-rda-s3.ornl.gov
aws configure --profile default set s3api.endpoint_url http://or-rda-s3.ornl.gov

📝 Note: The first command enables the "endpoint" plugin, which allows easy switching between interacting with multiple internal (S3) identities or external (AWS) accounts by passing a --profile argument. Your ~/.aws/config and ~/.aws/credentials must have profiles and credentials defined for each identity.
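
As an illustration of what a two-profile setup might look like, the sketch below writes sample config and credentials files into a scratch directory instead of touching ~/.aws. The profile name aws-external and all key placeholders are hypothetical; the [plugins] and endpoint_url entries mirror what the three aws configure commands above are expected to produce.

```shell
#!/bin/sh
# Sketch: a two-profile AWS CLI configuration written to a scratch directory.
# Assumption: "aws-external" and the <...> placeholders are illustrative only.
demo_dir=$(mktemp -d)

cat > "$demo_dir/config" <<'EOF'
[plugins]
endpoint = awscli_plugin_endpoint

[default]
s3 =
    endpoint_url = http://or-rda-s3.ornl.gov
s3api =
    endpoint_url = http://or-rda-s3.ornl.gov

[profile aws-external]
region = us-east-1
output = json
EOF

cat > "$demo_dir/credentials" <<'EOF'
[default]
aws_access_key_id = <onPremAccessKey>
aws_secret_access_key = <onPremSecretKey>

[aws-external]
aws_access_key_id = <awsAccessKey>
aws_secret_access_key = <awsSecretKey>
EOF

# Point the CLI at the scratch files for this shell session only.
export AWS_CONFIG_FILE="$demo_dir/config"
export AWS_SHARED_CREDENTIALS_FILE="$demo_dir/credentials"
```

With files like these in place, a command run without --profile would use the on-premises endpoint, while adding --profile aws-external would select the external AWS account. Note that named profiles are written as [profile name] in the config file but as [name] in the credentials file.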

Further information on configuring multiple named profiles:

https://docs.aws.amazon.com/cli/latest/userguide/cli-multiple-profiles.html

https://github.com/wbinglee/awscli-plugin-endpoint

3. Basic S3 Storage Operations

Integrated User Manual

The AWS CLI has help text integrated into it. To invoke this, use aws help. To get detailed help about a supported feature, build your command line and append help to it. For example, if you want help with the S3 copy command, type:

aws s3 cp help

General Format

As we are dealing with the S3 service we will almost always be specifying one of two commands to run: aws s3 or aws s3api.

Create a New Bucket

Buckets are storage areas similar to Unix volumes or Windows drives. Most s3 commands require a bucket to be specified.

Create a new bucket with:

aws s3 mb s3://mynewbucket

Listing S3 Buckets

Use ls without a bucket name to list all buckets visible to you:

aws s3 ls s3://

Or use the s3api command with list-buckets, which offers additional options:

aws s3api list-buckets
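
One of the additional options is the global --query argument, which filters the response before it is printed. A minimal sketch, assuming you only want the bucket names; the function name list_bucket_names is hypothetical:

```shell
# Sketch: print one bucket name per line.
# --query selects only the Name field of each bucket, and --output text
# prints the names tab-separated, which tr then splits onto separate lines.
# Extra arguments (for example --profile <name>) are passed through to aws.
list_bucket_names() {
    aws s3api list-buckets --query 'Buckets[].Name' --output text "$@" | tr '\t' '\n'
}

# Usage: list_bucket_names
#        list_bucket_names --profile <name>
```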

Display Capacity Used

Several methods exist for reporting S3 capacity utilization: the CLI, the utilization metrics in the AWS web console, and third-party tools.

To return the overall capacity used by a bucket (total size in bytes, followed by the number of objects):

aws s3api list-objects --bucket rda-sup-backups --output json --query "[sum(Contents[].Size), length(Contents[])]" --profile cades-ops-s3
[
    1277153105,
    8667
]
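
The first number in the output above is a raw byte count. A small helper such as the following (the name bytes_to_gib is hypothetical) can convert it to a more readable unit:

```shell
# Sketch: convert a byte count to GiB (1 GiB = 1024^3 bytes), two decimals.
bytes_to_gib() {
    awk -v bytes="$1" 'BEGIN { printf "%.2f GiB\n", bytes / (1024 * 1024 * 1024) }'
}

bytes_to_gib 1277153105    # prints: 1.19 GiB
```

Alternatively, aws s3 ls accepts the --recursive, --human-readable, and --summarize options, which together print every object followed by a total object count and total size.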

Copying Files Into and Out of S3

Copying files into S3 is very similar to copying files with cp on the Unix command line or with scp. As in all operations, the aws command is used together with the configured endpoint.

We specify the S3 service and that we want to copy files. The direction can be either local → S3 or S3 → local file system; simply reverse the order of the two arguments.

aws s3 cp <local filename> s3://<bucket>/<remote filename>

Example:

aws s3 cp largefile s3://mynewbucket/largefile

As with all aws commands, you may optionally add --profile to select the named profile and matching credentials to use.
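
The two copy directions can be sketched as a pair of thin wrapper functions. The names s3_upload and s3_download are hypothetical; any arguments after the first three (for example --profile <name>) are passed through to aws unchanged.

```shell
# Sketch: local -> S3. Arguments: <local file> <bucket> <remote name> [aws options]
s3_upload() {
    src=$1; bucket=$2; key=$3
    shift 3
    aws s3 cp "$src" "s3://$bucket/$key" "$@"
}

# Sketch: S3 -> local, the same command with source and destination reversed.
# Arguments: <bucket> <remote name> <local file> [aws options]
s3_download() {
    bucket=$1; key=$2; dest=$3
    shift 3
    aws s3 cp "s3://$bucket/$key" "$dest" "$@"
}

# Usage: s3_upload largefile mynewbucket largefile --profile <name>
#        s3_download mynewbucket largefile ./largefile.copy
```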

List Files

To list the files in a bucket, type:

aws s3 ls <bucket>

Example:

aws s3 ls mynewbucket

📝 Note: Object storage is non-hierarchical, so directories do not exist. They are emulated through 'prefix' paths, which are marked with PRE in ls output. You can ls through a series of PRE entries much as you would a directory structure. Be sure to end the final PRE path with a trailing /
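
To show how PRE entries can be picked out of a listing, here is a small sketch that filters ls output down to the prefix names. The function name list_prefixes is hypothetical, and the sample listing is made-up data that only resembles real output.

```shell
# Sketch: keep only the prefix ("directory") names from `aws s3 ls` output.
# Limitation: a prefix name containing spaces would be cut at the first space.
list_prefixes() {
    awk '$1 == "PRE" { print $2 }'
}

# Illustrative input resembling a bucket listing; names, date, and size are
# invented for this example.
printf '%s\n' \
    '                           PRE model_output/' \
    '                           PRE logs/' \
    '2018-09-20 10:15:02   52428800 largefile' | list_prefixes
```

In practice the function would be fed from a real listing, for example aws s3 ls s3://mynewbucket/ | list_prefixes.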

Syncing Files

The S3 service provides a capability similar to that of the rsync command. As with the copy command, the direction of synchronization can be either to S3 or from S3. The <local directory> can be relative or absolute. Because only new or changed files are transferred, this is significantly faster than copying files individually when you have a moderate number of files.

aws s3 sync <local directory> s3://<bucket>/<directory>

Example:

aws s3 sync --quiet /home/xyz/project/model_output s3://mynewbucket/model_output

While a sync operation runs, a status line is continually updated with the current transfer statistics. Above we see the optional parameter --quiet, which suppresses this output. This is useful when capturing command output, as the progress display otherwise fills log files with a large amount of unintelligible output.

Syncing creates destination PRE (prefix) paths automatically.
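
For scripted workflows, a sync can be previewed first and then run quietly with a one-line result written to a log. The following is a sketch only; the function names sync_preview and sync_logged and the log format are hypothetical, while --dryrun and --quiet are options of aws s3 sync.

```shell
# Sketch: show what a sync would transfer without transferring anything.
sync_preview() {
    aws s3 sync "$1" "$2" --dryrun
}

# Sketch: run the sync quietly and append a single status line to a log file,
# relying on the exit status of aws rather than on its progress output.
sync_logged() {
    src=$1; dest=$2; log=$3
    if aws s3 sync "$src" "$dest" --quiet; then
        echo "$(date '+%Y-%m-%d %H:%M:%S') sync OK: $src -> $dest" >> "$log"
    else
        echo "$(date '+%Y-%m-%d %H:%M:%S') sync FAILED: $src -> $dest" >> "$log"
        return 1
    fi
}

# Usage: sync_preview /home/xyz/project/model_output s3://mynewbucket/model_output
#        sync_logged  /home/xyz/project/model_output s3://mynewbucket/model_output sync.log
```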

Removing Files

Removing a single file:

aws s3 rm s3://<bucket>/<filename>

Example:

aws s3 rm s3://mynewbucket/largefile

Removing a directory:

With the addition of the --recursive option an entire directory (that is, every object under a prefix) can be removed. Example:

aws s3 rm --recursive s3://mynewbucket/large_directory
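
Because a recursive removal cannot be undone, it may be worth previewing it first. The sketch below (the function name rm_prefix_checked is hypothetical) uses the --dryrun option of aws s3 rm to list what would be deleted and only proceeds after an explicit confirmation.

```shell
# Sketch: preview a recursive delete, then ask before actually deleting.
rm_prefix_checked() {
    target=$1
    # --dryrun prints the operations that would be performed without running them.
    aws s3 rm --recursive --dryrun "$target" || return 1
    printf 'Delete everything listed above? [y/N] '
    # Treat end-of-input (no answer) as "no".
    read -r answer || answer=n
    case $answer in
        y|Y) aws s3 rm --recursive "$target" ;;
        *)   echo "Nothing deleted." ;;
    esac
}

# Usage: rm_prefix_checked s3://mynewbucket/large_directory/
```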