
A science-focused, more humane R interface to AWS.

Authentication

To use this package you'll need two AWS secrets and an AWS region, supplied via the following three environment variables:

  • AWS_ACCESS_KEY_ID
  • AWS_SECRET_ACCESS_KEY
  • AWS_REGION

You can set these within R, for the current R session only, like so:

Sys.setenv(
  AWS_ACCESS_KEY_ID = "",
  AWS_SECRET_ACCESS_KEY = "",
  AWS_REGION = "us-west-2"
)

Or set them in a variety of ways so they're available across R sessions. See the R Startup chapter of the book What They Forgot to Teach You About R for more details.
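
For example, a common approach is to put them in your ~/.Renviron file, which R reads at startup. The values below are placeholders; substitute your own credentials:

AWS_ACCESS_KEY_ID=your-access-key-id
AWS_SECRET_ACCESS_KEY=your-secret-access-key
AWS_REGION=us-west-2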

Package API Overview

  • aws_billing*: manage AWS billing details
  • aws_bucket*: manage S3 buckets
  • aws_file_*: manage files in S3 buckets
  • aws_user*: manage AWS users
  • aws_group*: manage AWS groups
  • aws_role*: manage AWS roles
  • aws_policy*: manage AWS policies
  • aws_db*: interact with AWS database services Redshift and RDS
  • aws_secrets*: manage secrets (AWS Secrets Manager)
  • aws_vpc_security*: manage VPC security groups

Working with S3

This vignette doesn't cover all of the package API above; instead, it focuses on working with files, as that's likely a common use case for sixtyfour users.

Buckets

Make a random bucket name

library(sixtyfour)

random_bucket_name <- function() {
  glue::glue("egs-{paste0(sample(letters, size = 12), collapse = '')}")
}
bucket <- random_bucket_name()

Create a bucket - check if it exists first

exists <- aws_bucket_exists(bucket)

if (!exists) {
  aws_bucket_create(bucket)
}
#> [1] "http://egs-tnxoekbrjcdl.s3.amazonaws.com/"

Create some files in a temporary directory

library(fs)
tdir <- fs::path(tempdir(), "apples")
fs::dir_create(tdir)
tfiles <- replicate(n = 10, fs::file_temp(tmp_dir = tdir, ext = ".txt"))
invisible(lapply(tfiles, function(x) write.csv(mtcars, x)))

Upload them to the newly created bucket

aws_bucket_upload(path = tdir, bucket = bucket)
#> [1] "s3://private/var/folders/qt/fzq1m_bj2yb_7b2jz57s9q7c0000gp/T/RtmpyTGZNW/apples"

List objects in the bucket

objects <- aws_bucket_list_objects(bucket)
objects
#> # A tibble: 10 × 8
#>    bucket_name      key        uri    size type  owner etag  last_modified
#>    <chr>            <chr>      <chr> <fs:> <chr> <chr> <chr> <dttm>
#>  1 egs-tnxoekbrjcdl file102ee… s3:/… 1.74K file  <NA>  "\"6… 2024-03-19 23:28:06
#>  2 egs-tnxoekbrjcdl file102ee… s3:/… 1.74K file  <NA>  "\"6… 2024-03-19 23:28:06
#>  3 egs-tnxoekbrjcdl file102ee… s3:/… 1.74K file  <NA>  "\"6… 2024-03-19 23:28:06
#>  4 egs-tnxoekbrjcdl file102ee… s3:/… 1.74K file  <NA>  "\"6… 2024-03-19 23:28:06
#>  5 egs-tnxoekbrjcdl file102ee… s3:/… 1.74K file  <NA>  "\"6… 2024-03-19 23:28:06
#>  6 egs-tnxoekbrjcdl file102ee… s3:/… 1.74K file  <NA>  "\"6… 2024-03-19 23:28:07
#>  7 egs-tnxoekbrjcdl file102ee… s3:/… 1.74K file  <NA>  "\"6… 2024-03-19 23:28:07
#>  8 egs-tnxoekbrjcdl file102ee… s3:/… 1.74K file  <NA>  "\"6… 2024-03-19 23:28:07
#>  9 egs-tnxoekbrjcdl file102ee… s3:/… 1.74K file  <NA>  "\"6… 2024-03-19 23:28:07
#> 10 egs-tnxoekbrjcdl file102ee… s3:/… 1.74K file  <NA>  "\"6… 2024-03-19 23:28:07

Cleanup - delete the bucket.

aws_bucket_delete(bucket)
#> Error: BucketNotEmpty (HTTP 409). The bucket you tried to delete is not empty

If there’s files in your bucket you can not delete it. Delete files, then delete bucket again

aws_file_delete(objects$uri)
aws_bucket_delete(bucket)
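
If you do this often, a small helper along these lines can empty and delete a bucket in one step. This is just a sketch built from the functions shown above; empty_and_delete() is a hypothetical name, not part of sixtyfour:

# Hypothetical helper: delete all objects in a bucket, then the bucket itself
empty_and_delete <- function(bucket) {
  objects <- aws_bucket_list_objects(bucket)
  # only delete files if the bucket isn't already empty
  if (nrow(objects) > 0) {
    aws_file_delete(objects$uri)
  }
  aws_bucket_delete(bucket)
}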

Files

The file functions accept and return character vectors of length one or greater, including the functions that take two inputs, such as a source and a destination file. Unlike most functions in this package, which use paws under the hood, the file functions are built on s3fs (itself built on paws), as it provides a cleaner interface to S3.

First, create a bucket:

my_bucket <- random_bucket_name()
aws_bucket_create(my_bucket)
#> [1] "http://egs-fzpqkyhmtxlg.s3.amazonaws.com/"

Then, upload some files:

temp_files <- replicate(n = 3, tempfile(fileext = ".txt"))
for (i in temp_files) cat(letters, "\n", file = i)
remote_files <- s3_path(my_bucket, basename(temp_files))
aws_file_upload(path = temp_files, remote_path = remote_files)
#> [1] "s3://egs-fzpqkyhmtxlg/file102ee45c0b74.txt"
#> [2] "s3://egs-fzpqkyhmtxlg/file102ee36f3eee8.txt"
#> [3] "s3://egs-fzpqkyhmtxlg/file102ee48c9dd87.txt"

List files in the bucket

obs <- aws_bucket_list_objects(my_bucket)
obs
#> # A tibble: 3 × 8
#>   bucket_name      key         uri    size type  owner etag  last_modified
#>   <chr>            <chr>       <chr> <fs:> <chr> <chr> <chr> <dttm>
#> 1 egs-fzpqkyhmtxlg file102ee3… s3:/…    53 file  <NA>  "\"a… 2024-03-19 23:28:10
#> 2 egs-fzpqkyhmtxlg file102ee4… s3:/…    53 file  <NA>  "\"a… 2024-03-19 23:28:10
#> 3 egs-fzpqkyhmtxlg file102ee4… s3:/…    53 file  <NA>  "\"a… 2024-03-19 23:28:10

Fetch file attributes

aws_file_attr(remote_files)
#> # A tibble: 3 × 38
#>   bucket_name    key   uri    size type  etag  last_modified       delete_marker
#>   <chr>          <chr> <chr> <fs:> <chr> <chr> <dttm>              <lgl>
#> 1 egs-fzpqkyhmt… file… s3:/…    53 file  "\"a… 2024-03-19 23:28:10 NA
#> 2 egs-fzpqkyhmt… file… s3:/…    53 file  "\"a… 2024-03-19 23:28:10 NA
#> 3 egs-fzpqkyhmt… file… s3:/…    53 file  "\"a… 2024-03-19 23:28:10 NA
#> # ℹ 30 more variables: accept_ranges <chr>, expiration <chr>, restore <chr>,
#> #   archive_status <chr>, checksum_crc32 <chr>, checksum_crc32_c <chr>,
#> #   checksum_sha1 <chr>, checksum_sha256 <chr>, missing_meta <int>,
#> #   version_id <chr>, cache_control <chr>, content_disposition <chr>,
#> #   content_encoding <chr>, content_language <chr>, content_type <chr>,
#> #   expires <dttm>, website_redirect_location <chr>,
#> #   server_side_encryption <chr>, metadata <list>, …
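
The result is a regular tibble, so you can subset it to just the fields you need; for example, using base R indexing:

attrs <- aws_file_attr(remote_files)
# keep a few of the 38 columns
attrs[, c("key", "size", "last_modified")]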

Check if one or more files exist

aws_file_exists(remote_files[1])
#> [1] TRUE
aws_file_exists(remote_files)
#> [1] TRUE TRUE TRUE

Copy

new_bucket <- random_bucket_name()
aws_bucket_create(new_bucket)
#> [1] "http://egs-gqalfwjhukzr.s3.amazonaws.com/"

# add existing files to the new bucket
aws_file_copy(remote_files, new_bucket)
#> [1] "s3://egs-gqalfwjhukzr/file102ee45c0b74.txt"
#> [2] "s3://egs-gqalfwjhukzr/file102ee36f3eee8.txt"
#> [3] "s3://egs-gqalfwjhukzr/file102ee48c9dd87.txt"

# copy files to a bucket that doesn't exist yet
# force = TRUE creates the bucket without an interactive prompt
aws_file_copy(remote_files, random_bucket_name(), force = TRUE)
#> [1] "s3://egs-tsadhcfyrjmo/file102ee45c0b74.txt"
#> [2] "s3://egs-tsadhcfyrjmo/file102ee36f3eee8.txt"
#> [3] "s3://egs-tsadhcfyrjmo/file102ee48c9dd87.txt"

Download

tfile <- tempfile()
aws_file_download(remote_files[1], tfile)
#> [1] "/var/folders/qt/fzq1m_bj2yb_7b2jz57s9q7c0000gp/T//RtmpyTGZNW/file102ee497415cd"
readLines(tfile)
#> [1] "a b c d e f g h i j k l m n o p q r s t u v w x y z "

Rename

aws_file_exists(remote_files[1])
#> [1] TRUE
aws_file_rename(remote_files[1], s3_path(dirname(remote_files[1]), "myfile.txt"))
#> [1] "s3://egs-fzpqkyhmtxlg/myfile.txt"
aws_file_exists(remote_files[1])
#> [1] FALSE
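
As with the bucket example above, you can clean up by deleting each bucket's files and then the bucket itself. A sketch covering the two buckets we kept names for (the third bucket's name was generated inline above and not saved):

for (b in c(my_bucket, new_bucket)) {
  objs <- aws_bucket_list_objects(b)
  aws_file_delete(objs$uri)
  aws_bucket_delete(b)
}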