Advanced S3 Operations & FAQ

1. Pre-signed URL Sharing

Generate a pre-signed URL for an S3 object. Anyone who receives the URL can retrieve the object with a plain HTTP GET request, no AWS credentials required, until the URL expires.

Configuration

For our environment you must configure your AWS profile to use Signature Version 4 (s3v4) signing and ensure a region is specified.

You may use a named profile or the 'default' profile.

$ aws configure set profile.YourProfileName.s3.signature_version s3v4
$ aws configure set profile.YourProfileName.region us-east-1
# Or replace profile.YourProfileName. with default. if you only use the default profile.


# Verify your aws profile is configured correctly:
$ aws configure list --profile YourProfileName
      Name                    Value             Type    Location
      ----                    -----             ----    --------
   profile               YourProfileName       manual    --profile
access_key     ****************DNQ8 shared-credentials-file    
secret_key     ****************m+p9 shared-credentials-file    
    region                us-east-1      config-file    ~/.aws/config
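
With the profile configured, the CLI's presign command generates the shareable URL. A minimal sketch, assuming a bucket and object of your own (names below are placeholders; --expires-in is in seconds and defaults to 3600):

$ aws s3 presign s3://your-bucket/your-object --expires-in 3600 --profile YourProfileName

The command prints the URL to stdout; anyone holding it can fetch the object with curl or a browser until it expires.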

2. Parallel S3 Operations
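
To be documented in full. Meanwhile, note that the CLI's high-level transfer commands (cp, mv, and sync) already split large transfers into parallel multipart requests, and the degree of parallelism is tunable per profile. A sketch with illustrative values:

$ aws configure set profile.YourProfileName.s3.max_concurrent_requests 20
$ aws configure set profile.YourProfileName.s3.multipart_chunksize 64MB

max_concurrent_requests caps how many requests run at once; multipart_chunksize sets the part size used for multipart transfers.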

3. Upload Object With Custom User Metadata

Add an object with custom user metadata during cp, mv, or sync (requires client version >= 1.9.10):

(awscli) pae@koolaid:~$ aws s3 cp NVIDIA-Linux-x86_64-340.98.run s3://cades-s3user-pae --metadata '{"x-amz-meta-cms-id":"juicyfruit"}' --profile rda_eby

Retrieve metadata for an object (key):

(awscli) pae@koolaid:~$ aws s3api head-object --bucket cades-s3user-pae --key NVIDIA-Linux-x86_64-340.98.run --profile rda_eby
{
    "LastModified": "Tue, 21 Aug 2018 19:31:14 GMT",
    "ContentLength": 69984344,
    "ETag": "\"cfbe7baeaeae7bea413754ace19891ce-9\"",
    "Metadata": {
        "x-amz-meta-cms-id": "juicyfruit"
    }
}
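
User metadata can only be set when an object is written; to change it afterwards, copy the object over itself with a REPLACE metadata directive. A sketch reusing the bucket and key from above (the new metadata value is illustrative):

$ aws s3 cp s3://cades-s3user-pae/NVIDIA-Linux-x86_64-340.98.run s3://cades-s3user-pae/NVIDIA-Linux-x86_64-340.98.run --metadata '{"x-amz-meta-cms-id":"spearmint"}' --metadata-directive REPLACE --profile rda_eby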

4. Encrypt Objects During Upload

To be documented. See https://sixfeetup.com/blog/hidden-features-via-aws-cli
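
Until then, a minimal sketch using the CLI's server-side encryption flag (whether a given endpoint honors it depends on the deployment; the bucket name is a placeholder):

$ aws s3 cp sensitive-data.tar s3://your-bucket/ --sse AES256 --profile YourProfileName

--sse AES256 asks the server to encrypt the object at rest with AES-256; --sse aws:kms selects KMS-managed keys where available.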

5. Writing Data Directly From an Application

If your application would benefit from skipping the local disk and writing data directly to S3, this is possible in a sizable number of programming languages; of most relevance to scientific computing are the C++, Python, and Java SDKs. For a complete list, see the AWS Tools page (https://aws.amazon.com/tools/). We have tested the Python interface (boto3) and found it to be highly performant. The example script below puts the contents of a data string into an object named 'test.txt'; this works for any serializable object.

#!/usr/bin/env python
import boto3

# Open the S3 service through boto3's resource interface.
s3 = boto3.resource('s3')

# Upload the string directly as the object 'test.txt'; nothing is written to local disk.
data = 'This is some test data in a string for S3'
s3.Bucket('cades-8d73a078-94c6-4a73-a668-345fc6ee8618').put_object(Key='test.txt', Body=data)
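
To confirm the write from the shell, the CLI can stream the object straight to stdout by passing '-' as the destination:

$ aws s3 cp s3://cades-8d73a078-94c6-4a73-a668-345fc6ee8618/test.txt -
This is some test data in a string for S3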

6. Utilization Query

Requires network access to the S3 API endpoint and port, and the IAM permissions to list the bucket.

$ aws s3api list-objects --bucket mydata --output json --query "[sum(Contents[].Size), length(Contents[])]"
[
    98620405600492,
    109899
]
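
As an alternative, the high-level ls command reports the same totals without a JMESPath query; --summarize appends the object count and total size to the recursive listing:

$ aws s3 ls s3://mydata --recursive --summarize | tail -2
Total Objects: 109899
   Total Size: 98620405600492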

https://www.exratione.com/2016/11/analyzing-the-contents-of-very-large-s3-buckets/

FAQ

awscli_plugin_endpoint

If you see the error:

ModuleNotFoundError: No module named 'awscli_plugin_endpoint'

This means the Python environment that provides the plugin is not loaded. Run:

module load python

Additional Developer Resources

Hortonworks Cloud Data Access book.