AWS Command Line Interface (CLI) Tool
1. aws cli Installation
The AWS Command Line tool is used to interact with the storage service and can be scripted for automated workflows. Installation is summarized below; for full details, consult the official AWS CLI install guide.
CADES SHPC Users
- The aws client is provided via a software module, though you may install a local version in your home directory if you wish.
- From the SHPC login nodes:
-bash-4.2$ module load python/3.6.3
-bash-4.2$ aws --version
aws-cli/1.16.12 Python/3.6.3 Linux/3.10.0-862.9.1.el7.x86_64 botocore/1.12.2
Windows Users
- Download from https://aws.amazon.com/cli/.
macOS Users
- See AWS macOS instructions.
Linux/OpenStack Users
You may encounter issues if awscli and the awscli-plugin-endpoint are installed from different sources, e.g. one from your distribution's package manager (apt or yum) and one from pip. Installing both via pip usually allows them to work together well.
📝 Note: It is recommended to install the components in a Python virtual environment, the instructions for which are available here.
If you wish to install system-wide (as root), you may do so with pip via:
sudo pip install awscli
sudo pip install awscli-plugin-endpoint
2. Initial aws cli Profile Configuration
Specify IAM user credentials by editing your ~/.aws/credentials file, or run aws configure and paste the appropriate values into the prompts, to create entries like:
[default]
aws_access_key_id = <accessKey>
aws_secret_access_key = <secretKeyValue>
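aws configure also prompts for "Default region name" (leave blank or enter us-east-1) and "Default output format" (leave blank or enter text or json). These two values are saved to ~/.aws/config as region and output, not to the credentials file.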
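As a minimal sketch of the files involved, the helper below appends one profile to both files. The function name write_aws_profile, its arguments, and the region and output values are illustrative assumptions, not part of the AWS CLI; the file layout follows the standard credentials and config formats.

```shell
# Hypothetical helper (not part of the AWS CLI): append one profile to the
# credentials and config files inside the given directory.
# Arguments: target directory, profile name, access key, secret key.
write_aws_profile() {
    aws_dir="$1"; profile="$2"; key="$3"; secret="$4"
    mkdir -p "$aws_dir" && chmod 700 "$aws_dir" || return 1

    # Credentials file: the section header is the bare profile name.
    cat >> "$aws_dir/credentials" <<EOF
[$profile]
aws_access_key_id = $key
aws_secret_access_key = $secret
EOF
    chmod 600 "$aws_dir/credentials"

    # Config file: every profile except "default" uses a "profile " prefix.
    if [ "$profile" = "default" ]; then
        header="[default]"
    else
        header="[profile $profile]"
    fi
    # Assumed example values; adjust or omit region and output as needed.
    cat >> "$aws_dir/config" <<EOF
$header
region = us-east-1
output = json
EOF
}
```

For example, write_aws_profile "$HOME/.aws" default "<accessKey>" "<secretKeyValue>" would create the default profile. Because the helper appends, running it twice for the same profile name produces duplicate sections.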
Changing Default S3 Endpoints and Multiple Profiles
By default the external Amazon S3 service is assumed; if you wish to use AWS, no changes are needed.
If you wish to use another S3 provider (such as an on-premises S3-compatible service), you must explicitly define the endpoint to use.
Example: To switch to an on-premises S3 endpoint and set it for the default profile:
aws configure set plugins.endpoint awscli_plugin_endpoint
aws configure --profile default set s3.endpoint_url http://or-rda-s3.ornl.gov
aws configure --profile default set s3api.endpoint_url http://or-rda-s3.ornl.gov
📝 Note: The first command enables the "endpoint" plugin, which allows easy switching between multiple internal (S3) identities or external (AWS) accounts by passing a --profile argument. Your ~/.aws/config and ~/.aws/credentials must have profiles and credentials defined for each identity.
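The same three commands can be wrapped so that any named profile can be pointed at its own endpoint. This is only a sketch: set_s3_endpoint is a hypothetical function name, and it assumes the awscli-plugin-endpoint package is already installed.

```shell
# Hypothetical helper: enable the endpoint plugin and point one named
# profile at a custom S3 endpoint for both the s3 and s3api commands.
# Arguments: profile name, endpoint URL.
set_s3_endpoint() {
    profile="$1"; url="$2"
    aws configure set plugins.endpoint awscli_plugin_endpoint || return 1
    aws configure --profile "$profile" set s3.endpoint_url "$url" || return 1
    aws configure --profile "$profile" set s3api.endpoint_url "$url"
}
```

After running set_s3_endpoint cades-ops-s3 http://or-rda-s3.ornl.gov, a command such as aws s3 ls --profile cades-ops-s3 would use the on-premises endpoint, while profiles without an endpoint_url setting continue to use AWS.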
Further information on configuring multiple named profiles:
https://docs.aws.amazon.com/cli/latest/userguide/cli-multiple-profiles.html
https://github.com/wbinglee/awscli-plugin-endpoint
3. Basic S3 Storage Operations
Integrated User Manual
The AWS CLI has help text built in. To invoke it, use aws help. To get detailed help about a specific feature, build your command line and append help to it. For example, for help with the S3 copy command, type:
aws s3 cp help
General Format
As we are dealing with the S3 service, we will almost always be specifying one of two commands to run: aws s3 or aws s3api.
Create a New Bucket
Buckets are storage areas similar to Unix volumes or Windows drives. Almost every s3 command requires a bucket to be specified.
Create a new bucket with:
aws s3 mb s3://mynewbucket
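In scripts it can help to create the bucket only when it is missing. The sketch below uses aws s3api head-bucket for the check; ensure_bucket is a hypothetical function name, and note that head-bucket also fails when the bucket exists but you lack access to it, in which case the mb call would fail too.

```shell
# Hypothetical helper: create a bucket only if head-bucket cannot find it.
# Argument: bucket name (without the s3:// prefix).
ensure_bucket() {
    bucket="$1"
    if aws s3api head-bucket --bucket "$bucket" >/dev/null 2>&1; then
        echo "bucket $bucket already exists"
    else
        # head-bucket failed: bucket is missing (or not accessible to you).
        aws s3 mb "s3://$bucket"
    fi
}
```

Usage: ensure_bucket mynewbucket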
Listing S3 Buckets
Use ls without a bucket name to list all buckets visible to you:
aws s3 ls s3://
Or use the s3api command with list-buckets, which offers additional options:
aws s3api list-buckets
Display Capacity Used
Many methods exist for reporting S3 capacity utilization, including the CLI, the utilization metrics in the AWS web console, and third-party tools.
To return the overall capacity used by a bucket (total size in bytes, followed by the number of objects):
aws s3api list-objects --bucket rda-sup-backups --output json --query "[sum(Contents[].Size), length(Contents[])]" --profile cades-ops-s3
[
1277153105,
8667
]
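The byte count is easier to read when converted to GiB. The following sketch assumes the same query is run with --output text, so that each output line holds a byte total and an object count separated by whitespace; summarize_usage is a hypothetical helper name.

```shell
# Hypothetical helper: read "bytes count" pairs from standard input and
# print a single human-readable total. Summing every line keeps the result
# correct even if the CLI prints one pair per page of results.
summarize_usage() {
    awk '{ bytes += $1; count += $2 }
         END { printf "%.2f GiB in %d objects\n", bytes / (1024 * 1024 * 1024), count }'
}
```

Usage, reusing the bucket and profile from the example above:

aws s3api list-objects --bucket rda-sup-backups --output text --query "[sum(Contents[].Size), length(Contents[])]" --profile cades-ops-s3 | summarize_usage

For the sample figures shown above, this would print 1.19 GiB in 8667 objects.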
Copying Files Into and Out of S3
Copying files into S3 is very similar to copying files on the Unix command line or with scp. The aws command is used, along with the endpoint specification; both are common to all operations.
We specify the S3 service and that we want to copy files. The direction can be either local → S3 or S3 → local file system, simply by reversing the order of the two arguments.
aws s3 cp <local filename> s3://<bucket>/<remote filename>
Example:
aws s3 cp largefile s3://mynewbucket/largefile
As with all aws commands, --profile may optionally be added to specify the named profile and matching credentials to use.
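To illustrate both copy directions together, the sketch below uploads a file, downloads it again to a temporary location, and compares the two copies. roundtrip_check is a hypothetical helper name, and the object key is simply assumed to be the file's base name.

```shell
# Hypothetical helper: upload a file, download it again, and compare.
# Arguments: local file path, bucket name.
roundtrip_check() {
    file="$1"; bucket="$2"
    key=$(basename "$file")
    copy=$(mktemp) || return 1

    aws s3 cp "$file" "s3://$bucket/$key" || return 1    # local -> S3
    aws s3 cp "s3://$bucket/$key" "$copy" || return 1    # S3 -> local

    if cmp -s "$file" "$copy"; then
        echo "round trip OK: $key"
    else
        echo "round trip MISMATCH: $key" >&2
        rm -f "$copy"
        return 1
    fi
    rm -f "$copy"
}
```

Usage: roundtrip_check largefile mynewbucket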
List Files
To list the files in a bucket, type:
aws s3 ls <bucket>
Example:
aws s3 ls mynewbucket
📝 Note: Object storage is non-hierarchical; directories do not exist. They are emulated through 'prefix' paths, which are marked with PRE in ls output. You can ls through a series of PRE entries as you would a directory structure. Be sure to end the final PRE path with a closing /
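A small wrapper can guarantee the closing / when listing a prefix. This is a sketch only; list_prefix is a hypothetical function name, and the prefix used in the usage line is an example.

```shell
# Hypothetical helper: list the contents of a prefix, always ending the
# path with exactly one closing slash.
# Arguments: bucket name, optional prefix (with or without trailing slash).
list_prefix() {
    bucket="$1"
    prefix="${2:-}"
    prefix="${prefix%/}"                 # drop a trailing slash if one was given
    if [ -n "$prefix" ]; then
        aws s3 ls "s3://$bucket/$prefix/"
    else
        aws s3 ls "s3://$bucket/"
    fi
}
```

Usage: list_prefix mynewbucket model_output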
Syncing Files
The S3 service provides a capability similar to that of the rsync command. As with the copy command, the direction of synchronization can be either to S3 or from S3. The <local directory> can be relative or absolute. This is significantly faster if you have a moderate number of files.
aws s3 sync <local directory> s3://<bucket>/<directory>
Example:
aws s3 sync --quiet /home/xyz/project/model_output s3://mynewbucket/model_output
When the sync operation is used, a status line is continuously updated with the current command statistics. Above we see the optional parameter --quiet, which suppresses this output. This is useful when capturing command output, as the progress line otherwise fills log files with a large amount of unintelligible output.
Syncing creates destination PRE (prefix) paths automatically.
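For unattended jobs, a quiet sync can be paired with a one-line log entry recording whether it succeeded. The sketch below assumes a log file path of your choosing; sync_to_s3 is a hypothetical helper name.

```shell
# Hypothetical helper: run a quiet sync and append a one-line result to a log.
# Arguments: source directory, destination S3 path, log file.
sync_to_s3() {
    src="$1"; dest="$2"; log="$3"
    if aws s3 sync "$src" "$dest" --quiet; then
        status="ok"
    else
        status="failed"
    fi
    echo "$(date '+%Y-%m-%d %H:%M:%S') sync $src -> $dest: $status" >> "$log"
    [ "$status" = "ok" ]                 # return non-zero if the sync failed
}
```

Usage: sync_to_s3 /home/xyz/project/model_output s3://mynewbucket/model_output "$HOME/s3-sync.log"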
Removing Files
Removing a single file:
aws s3 rm s3://<bucket>/<filename>
Example:
aws s3 rm s3://mynewbucket/largefile
Removing a directory:
With the addition of the --recursive option, an entire directory can be removed.
Example:
aws s3 rm --recursive s3://mynewbucket/large_directory
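Because a recursive removal cannot be undone, it may be worth previewing it first with the --dryrun option, which lists what would be deleted without deleting anything. In the sketch below, remove_prefix and its --yes argument are hypothetical conventions, not AWS CLI features.

```shell
# Hypothetical helper: preview a recursive delete by default, and only
# perform it when "--yes" is passed as the second argument.
# Arguments: full S3 path, optional "--yes".
remove_prefix() {
    target="$1"; confirm="${2:-}"
    if [ "$confirm" = "--yes" ]; then
        aws s3 rm --recursive "$target"
    else
        # --dryrun prints the operations without performing them.
        aws s3 rm --recursive --dryrun "$target"
    fi
}
```

Usage: run remove_prefix s3://mynewbucket/large_directory to preview, then repeat with --yes appended to delete.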