A science-focused, more humane R interface to AWS.
Authentication
To use this package, you need two AWS secrets and an AWS region, set in the following three environment variables:
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
AWS_REGION
You can set these within R for the current session:
Sys.setenv(
  AWS_ACCESS_KEY_ID = "",
  AWS_SECRET_ACCESS_KEY = "",
  AWS_REGION = "us-west-2"
)
Or set them in one of several ways so that they are available across R sessions. See the R Startup chapter of the book What They Forgot to Teach You About R for more details.
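Before calling any package function, it can help to confirm that all three variables are actually set. The following is a minimal sketch in base R; `missing_aws_env()` is a hypothetical helper name used for illustration and is not part of sixtyfour.

```r
# Hypothetical helper (not part of sixtyfour): return the names of any
# required AWS environment variables that are unset or empty.
missing_aws_env <- function(
  required = c("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION")
) {
  values <- Sys.getenv(required, unset = "")
  required[!nzchar(values)]
}

# Report anything that still needs to be configured
missing <- missing_aws_env()
if (length(missing) > 0) {
  message("Unset AWS variables: ", paste(missing, collapse = ", "))
}
```

The check only tells you that a value is present, not that the credentials are valid.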
Package API Overview
- aws_billing*: manage AWS billing details
- aws_bucket*: manage S3 buckets
- aws_file_*: manage files in S3 buckets
- aws_user*: manage AWS users
- aws_group*: manage AWS groups
- aws_role*: manage AWS roles
- aws_policy*: manage AWS policies
- aws_db*: interact with AWS database services Redshift and RDS
- aws_secrets*: secrets manager
- aws_vpc_security*: VPC security groups
Working with S3
This vignette doesn't cover every part of the package API listed above. Instead, it focuses on working with files, since that is likely a common use case for sixtyfour users.
Buckets
Make a random bucket name
random_bucket_name <- function() {
  glue::glue("egs-{paste0(sample(letters, size = 12), collapse = '')}")
}
bucket <- random_bucket_name()
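S3 bucket names must be 3 to 63 characters long, use only lowercase letters, numbers, dots, and hyphens, and begin and end with a letter or number. A name of the form `egs-` followed by 12 lowercase letters meets those rules. As a rough sketch, a name can be checked locally before any request is sent; `is_plausible_bucket_name()` is a hypothetical helper, not part of sixtyfour, and it covers only the rules listed here, not every S3 naming restriction.

```r
# Hypothetical helper (not part of sixtyfour): check a subset of the S3
# bucket naming rules (length 3 to 63, allowed characters, first and last
# character alphanumeric).
is_plausible_bucket_name <- function(x) {
  grepl("^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$", x, perl = TRUE)
}

# A valid name, a name with uppercase letters, and a name that is too short
is_plausible_bucket_name(c("egs-tnxoekbrjcdl", "Egs-Upper", "ab"))
```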
Create the bucket, checking first whether it already exists
# load the package that provides the aws_* functions
library(sixtyfour)
exists <- aws_bucket_exists(bucket)
if (!exists) {
  aws_bucket_create(bucket)
}
#> [1] "http://egs-tnxoekbrjcdl.s3.amazonaws.com/"
Create some files in a temporary directory
library(fs)
tdir <- fs::path(tempdir(), "apples")
fs::dir_create(tdir)
tfiles <- replicate(n = 10, fs::file_temp(tmp_dir = tdir, ext = ".txt"))
invisible(lapply(tfiles, function(x) write.csv(mtcars, x)))
Upload them to the newly created bucket
aws_bucket_upload(path = tdir, bucket = bucket)
#> [1] "s3://private/var/folders/qt/fzq1m_bj2yb_7b2jz57s9q7c0000gp/T/RtmpyTGZNW/apples"
List objects in the bucket
objects <- aws_bucket_list_objects(bucket)
objects
#> # A tibble: 10 × 8
#> bucket_name key uri size type owner etag last_modified
#> <chr> <chr> <chr> <fs:> <chr> <chr> <chr> <dttm>
#> 1 egs-tnxoekbrjcdl file102ee… s3:/… 1.74K file <NA> "\"6… 2024-03-19 23:28:06
#> 2 egs-tnxoekbrjcdl file102ee… s3:/… 1.74K file <NA> "\"6… 2024-03-19 23:28:06
#> 3 egs-tnxoekbrjcdl file102ee… s3:/… 1.74K file <NA> "\"6… 2024-03-19 23:28:06
#> 4 egs-tnxoekbrjcdl file102ee… s3:/… 1.74K file <NA> "\"6… 2024-03-19 23:28:06
#> 5 egs-tnxoekbrjcdl file102ee… s3:/… 1.74K file <NA> "\"6… 2024-03-19 23:28:06
#> 6 egs-tnxoekbrjcdl file102ee… s3:/… 1.74K file <NA> "\"6… 2024-03-19 23:28:07
#> 7 egs-tnxoekbrjcdl file102ee… s3:/… 1.74K file <NA> "\"6… 2024-03-19 23:28:07
#> 8 egs-tnxoekbrjcdl file102ee… s3:/… 1.74K file <NA> "\"6… 2024-03-19 23:28:07
#> 9 egs-tnxoekbrjcdl file102ee… s3:/… 1.74K file <NA> "\"6… 2024-03-19 23:28:07
#> 10 egs-tnxoekbrjcdl file102ee… s3:/… 1.74K file <NA> "\"6… 2024-03-19 23:28:07
Clean up by deleting the bucket. This fails while the bucket still contains files:
aws_bucket_delete(bucket)
#> Error: BucketNotEmpty (HTTP 409). The bucket you tried to delete is not empty
A bucket that still contains files cannot be deleted. Delete the files first, then delete the bucket:
aws_file_delete(objects$uri)
aws_bucket_delete(bucket)
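The two cleanup steps can be wrapped in a single function. This is only a sketch: `delete_bucket_and_contents()` is a hypothetical helper, not part of sixtyfour, and it assumes that `aws_bucket_list_objects()` returns a zero-row table for an empty bucket.

```r
# Hypothetical helper (not part of sixtyfour): empty a bucket, then delete it.
# Assumes the sixtyfour functions used in this vignette are available.
delete_bucket_and_contents <- function(bucket) {
  objects <- aws_bucket_list_objects(bucket)
  # Only call the file delete function when there is something to delete
  if (NROW(objects) > 0) {
    aws_file_delete(objects$uri)
  }
  aws_bucket_delete(bucket)
}
```

Calling `delete_bucket_and_contents(bucket)` would then replace the separate file and bucket deletion calls. Because it removes every object in the bucket, use it with care.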
Files
All or most of the file functions accept and return character vectors of length one or greater. This includes the functions that take two inputs, such as a source and a destination file. Most functions in this package use the paws package under the hood, but the file functions use s3fs (which is itself built on paws) because it provides a cleaner interface to S3.
First, create a bucket:
my_bucket <- random_bucket_name()
aws_bucket_create(my_bucket)
#> [1] "http://egs-fzpqkyhmtxlg.s3.amazonaws.com/"
Then, upload some files
temp_files <- replicate(n = 3, tempfile(fileext = ".txt"))
for (i in temp_files) cat(letters, "\n", file = i)
remote_files <- s3_path(my_bucket, basename(temp_files))
aws_file_upload(path = temp_files, remote_path = remote_files)
#> [1] "s3://egs-fzpqkyhmtxlg/file102ee45c0b74.txt"
#> [2] "s3://egs-fzpqkyhmtxlg/file102ee36f3eee8.txt"
#> [3] "s3://egs-fzpqkyhmtxlg/file102ee48c9dd87.txt"
List files in the bucket
obs <- aws_bucket_list_objects(my_bucket)
obs
#> # A tibble: 3 × 8
#> bucket_name key uri size type owner etag last_modified
#> <chr> <chr> <chr> <fs:> <chr> <chr> <chr> <dttm>
#> 1 egs-fzpqkyhmtxlg file102ee3… s3:/… 53 file <NA> "\"a… 2024-03-19 23:28:10
#> 2 egs-fzpqkyhmtxlg file102ee4… s3:/… 53 file <NA> "\"a… 2024-03-19 23:28:10
#> 3 egs-fzpqkyhmtxlg file102ee4… s3:/… 53 file <NA> "\"a… 2024-03-19 23:28:10
Fetch file attributes
aws_file_attr(remote_files)
#> # A tibble: 3 × 38
#> bucket_name key uri size type etag last_modified delete_marker
#> <chr> <chr> <chr> <fs:> <chr> <chr> <dttm> <lgl>
#> 1 egs-fzpqkyhmt… file… s3:/… 53 file "\"a… 2024-03-19 23:28:10 NA
#> 2 egs-fzpqkyhmt… file… s3:/… 53 file "\"a… 2024-03-19 23:28:10 NA
#> 3 egs-fzpqkyhmt… file… s3:/… 53 file "\"a… 2024-03-19 23:28:10 NA
#> # ℹ 30 more variables: accept_ranges <chr>, expiration <chr>, restore <chr>,
#> # archive_status <chr>, checksum_crc32 <chr>, checksum_crc32_c <chr>,
#> # checksum_sha1 <chr>, checksum_sha256 <chr>, missing_meta <int>,
#> # version_id <chr>, cache_control <chr>, content_disposition <chr>,
#> # content_encoding <chr>, content_language <chr>, content_type <chr>,
#> # expires <dttm>, website_redirect_location <chr>,
#> # server_side_encryption <chr>, metadata <list>, …
Check if one or more files exist
aws_file_exists(remote_files[1])
#> [1] TRUE
aws_file_exists(remote_files)
#> [1] TRUE TRUE TRUE
Copy
new_bucket <- random_bucket_name()
aws_bucket_create(new_bucket)
#> [1] "http://egs-gqalfwjhukzr.s3.amazonaws.com/"
# add existing files to the new bucket
aws_file_copy(remote_files, new_bucket)
#> [1] "s3://egs-gqalfwjhukzr/file102ee45c0b74.txt"
#> [2] "s3://egs-gqalfwjhukzr/file102ee36f3eee8.txt"
#> [3] "s3://egs-gqalfwjhukzr/file102ee48c9dd87.txt"
# copy into a bucket that doesn't exist yet; it is created as part of the copy
# force = TRUE skips the interactive prompt, so this works non-interactively
aws_file_copy(remote_files, random_bucket_name(), force = TRUE)
#> [1] "s3://egs-tsadhcfyrjmo/file102ee45c0b74.txt"
#> [2] "s3://egs-tsadhcfyrjmo/file102ee36f3eee8.txt"
#> [3] "s3://egs-tsadhcfyrjmo/file102ee48c9dd87.txt"
Download
tfile <- tempfile()
aws_file_download(remote_files[1], tfile)
#> [1] "/var/folders/qt/fzq1m_bj2yb_7b2jz57s9q7c0000gp/T//RtmpyTGZNW/file102ee497415cd"
readLines(tfile)
#> [1] "a b c d e f g h i j k l m n o p q r s t u v w x y z "
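Since the file functions are described as working on character vectors, several files can in principle be downloaded in one call. The following is a sketch under that assumption: `download_to_dir()` is a hypothetical helper, not part of sixtyfour, and it relies on `aws_file_download()` accepting vectors of equal length for both arguments.

```r
# Hypothetical helper (not part of sixtyfour): download several S3 files
# into one local directory, keeping their base names.
download_to_dir <- function(remote_paths, dest_dir = tempdir()) {
  local_paths <- file.path(dest_dir, basename(remote_paths))
  # Source and destination are paired element-wise
  aws_file_download(remote_paths, local_paths)
  # Return the local paths so they can be read afterwards
  local_paths
}
```

With the objects uploaded earlier, `download_to_dir(remote_files)` would return the local paths, which could then be passed to `lapply(..., readLines)`.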
Rename
aws_file_exists(remote_files[1])
#> [1] TRUE
aws_file_rename(remote_files[1], s3_path(dirname(remote_files[1]), "myfile.txt"))
#> [1] "s3://egs-fzpqkyhmtxlg/myfile.txt"
aws_file_exists(remote_files[1])
#> [1] FALSE