| Title: | Seamless AWS Cloud Bursting for Parallel R Workloads |
|---|---|
| Description: | A 'future' backend that enables seamless execution of parallel R workloads on 'Amazon Web Services' ('AWS', <https://aws.amazon.com>), including 'EC2' and 'Fargate'. 'staRburst' handles environment synchronization, data transfer, quota management, and worker orchestration automatically, allowing users to scale from local execution to 100+ cloud workers with a single line of code change. |
| Authors: | Scott Friedman [aut, cre] |
| Maintainer: | Scott Friedman <[email protected]> |
| License: | Apache License 2.0 |
| Version: | 0.3.8 |
| Built: | 2026-06-03 03:06:15 UTC |
| Source: | https://github.com/scttfrdmn/starburst |
This is the entry point called by the future package when plan(starburst) is active.
## S3 method for class 'starburst'
future(
  expr,
  envir = parent.frame(),
  substitute = TRUE,
  lazy = FALSE,
  seed = FALSE,
  globals = TRUE,
  packages = NULL,
  stdout = TRUE,
  conditions = "condition",
  label = NULL,
  ...
)

| Argument | Description |
|---|---|
| expr | Expression to evaluate |
| envir | Environment for evaluation |
| substitute | Whether to substitute the expression |
| lazy | Whether to lazily evaluate (always FALSE for remote) |
| seed | Random seed |
| globals | Globals to export (TRUE for auto-detection, list for manual) |
| packages | Packages to load |
| stdout | Whether to capture stdout (TRUE, FALSE, or NA) |
| conditions | Character vector of condition classes to capture |
| label | Optional label for the future |
| ... | Additional arguments |
A StarburstFuture object
Launch a future on the Starburst backend
## S3 method for class 'StarburstBackend'
launchFuture(backend, future, ...)

| Argument | Description |
|---|---|
| backend | A StarburstBackend object |
| future | The future object to launch |
| ... | Additional arguments |
The future object (invisibly)
List futures for StarburstBackend
## S3 method for class 'StarburstBackend'
listFutures(backend, ...)

| Argument | Description |
|---|---|
| backend | A StarburstBackend object |
| ... | Additional arguments |
List of futures (empty for this backend)
Number of workers for StarburstBackend
## S3 method for class 'StarburstBackend'
nbrOfWorkers(evaluator)

| Argument | Description |
|---|---|
| evaluator | A StarburstBackend object |
Number of workers
A future backend for running parallel R workloads on AWS (EC2 or Fargate)
## S3 method for class 'starburst'
plan(
  strategy,
  workers = 10,
  cpu = 4,
  memory = "8GB",
  region = NULL,
  timeout = 3600,
  auto_quota_request = interactive(),
  launch_type = "EC2",
  instance_type = "c7g.xlarge",
  use_spot = TRUE,
  warm_pool_timeout = 3600,
  detached = FALSE,
  ...
)

| Argument | Description |
|---|---|
| strategy | The starburst strategy marker (ignored, for S3 dispatch) |
| workers | Number of parallel workers |
| cpu | vCPUs per worker (1, 2, 4, 8, or 16) |
| memory | Memory per worker (supports GB notation, e.g., "8GB") |
| region | AWS region (default: from config or "us-east-1") |
| timeout | Maximum runtime in seconds (default: 3600) |
| auto_quota_request | Automatically request quota increases (default: interactive()) |
| launch_type | Launch type: EC2 or FARGATE (default: EC2) |
| instance_type | EC2 instance type when using EC2 launch type (default: c7g.xlarge) |
| use_spot | Use EC2 Spot instances for cost savings (default: TRUE) |
| warm_pool_timeout | Timeout for warm pool in seconds (default: 3600) |
| detached | Use detached session mode (deprecated, use starburst_session instead) |
| ... | Additional arguments passed to future backend |
A future plan object
if (starburst_is_configured()) {
  future::plan(starburst, workers = 50)
  results <- future.apply::future_lapply(1:100, function(i) i^2)
}
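The cpu and memory arguments above accept a fixed set of vCPU counts and a "GB" string. As a rough illustration of checking such values before submitting work, here is a minimal base R sketch. The helpers parse_memory_gb() and validate_cpu() are hypothetical names introduced for this example; they are not part of staRburst and do not describe how the package validates its inputs.

```r
# Hypothetical helpers for illustration only; not part of the staRburst API.

# Convert a string such as "8GB" into a number of gigabytes.
parse_memory_gb <- function(memory) {
  stopifnot(is.character(memory), length(memory) == 1)
  pattern <- "^([0-9]+(\\.[0-9]+)?)[[:space:]]*GB$"
  m <- regmatches(memory, regexec(pattern, memory, ignore.case = TRUE))[[1]]
  if (length(m) == 0) {
    stop("memory must look like \"8GB\", got: ", memory)
  }
  as.numeric(m[2])
}

# Check a vCPU count against the values listed in the argument table.
validate_cpu <- function(cpu, allowed = c(1, 2, 4, 8, 16)) {
  if (!is.numeric(cpu) || length(cpu) != 1 || !(cpu %in% allowed)) {
    stop("cpu must be one of: ", paste(allowed, collapse = ", "))
  }
  invisible(cpu)
}

parse_memory_gb("8GB")
validate_cpu(4)
```

Running checks like these locally can surface a typo in a setting before any cloud resources are requested.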
Print method for session status
## S3 method for class 'StarburstSessionStatus'
print(x, ...)

| Argument | Description |
|---|---|
| x | A StarburstSessionStatus object |
| ... | Additional arguments (ignored) |
Invisibly returns x.
Checks whether the future task has completed execution
## S3 method for class 'StarburstFuture'
resolved(x, ...)

| Argument | Description |
|---|---|
| x | A StarburstFuture object |
| ... | Additional arguments |
Logical indicating if the future is resolved
Retrieves the result from a resolved future
## S3 method for class 'StarburstFuture'
result(future, ...)

| Argument | Description |
|---|---|
| future | A StarburstFuture object |
| ... | Additional arguments |
A FutureResult object
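Checking for completion and then retrieving the result is a poll-then-collect pattern. The sketch below shows a generic polling loop in base R. The helper wait_until() is a hypothetical name introduced here, not a staRburst or future function; the commented line at the end shows how it might be combined with future::resolved(), assuming f is a future created elsewhere.

```r
# Hypothetical helper for illustration only; not part of staRburst or future.
# Calls is_done() repeatedly until it returns TRUE or the timeout elapses.
wait_until <- function(is_done, timeout = 60, interval = 1) {
  stopifnot(is.function(is_done), timeout >= 0, interval >= 0)
  deadline <- Sys.time() + timeout
  repeat {
    if (isTRUE(is_done())) return(TRUE)    # condition met
    if (Sys.time() >= deadline) return(FALSE)  # gave up
    Sys.sleep(interval)
  }
}

# Stand-in for a remote task that reports completion on the third check.
checks <- 0
done_on_third_check <- function() {
  checks <<- checks + 1
  checks >= 3
}
finished <- wait_until(done_on_third_check, timeout = 5, interval = 0)

# With a real future f, a call might look like:
# wait_until(function() future::resolved(f), timeout = 600, interval = 5)
```

Polling with an interval avoids blocking the R session indefinitely and keeps the number of status requests bounded.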
Submits the future task to AWS for execution
## S3 method for class 'StarburstFuture'
run(future, ...)

| Argument | Description |
|---|---|
| future | A StarburstFuture object |
| ... | Additional arguments |
The future object (invisibly)
This function should never be called directly. Use plan(starburst, ...) instead.
starburst(...)

| Argument | Description |
|---|---|
| ... | Arguments passed to StarburstBackend() |
Does not return a value; always signals an error if called directly.
This object exists as a strategy marker for plan.
Monitor quota increase request
starburst_check_quota_request(case_id, region = NULL)

| Argument | Description |
|---|---|
| case_id | Case ID from quota increase request |
| region | AWS region |
Invisibly returns the quota request details, or NULL on error.
if (starburst_is_configured()) {
  starburst_check_quota_request("case-12345")
}
Manually delete Docker images from ECR to save storage costs. Images will be rebuilt on next use (adds 3-5 min delay).
starburst_cleanup_ecr(force = FALSE, region = NULL)

| Argument | Description |
|---|---|
| force | Delete all images immediately, ignoring TTL |
| region | AWS region (default: from config) |
Invisibly returns TRUE on success or FALSE if not configured.
if (starburst_is_configured()) {
  # Delete images past TTL
  starburst_cleanup_ecr()

  # Delete all images immediately (save $0.50/month)
  starburst_cleanup_ecr(force = TRUE)
}
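To make "past TTL" concrete, the following base R sketch flags images whose age exceeds a TTL in days. The function images_past_ttl() and the sample push timestamps are hypothetical and exist only for this illustration; which images staRburst or ECR actually removes is decided by the package and the service, not by this helper.

```r
# Hypothetical helper for illustration only; not part of the staRburst API.
# Returns TRUE for each image older than ttl_days at time `now`.
images_past_ttl <- function(pushed_at, ttl_days, now = Sys.time()) {
  age_days <- as.numeric(difftime(now, pushed_at, units = "days"))
  age_days > ttl_days
}

# Made-up push timestamps and a fixed "now" so the result is reproducible.
pushed <- as.POSIXct(c("2026-01-01", "2026-05-20"), tz = "UTC")
now <- as.POSIXct("2026-06-01", tz = "UTC")

expired <- images_past_ttl(pushed, ttl_days = 30, now = now)
expired
# The first image (151 days old) is past a 30-day TTL; the second (12 days) is not.
```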
Creates a cluster object for managing AWS Fargate workers using Future backend
starburst_cluster(
  workers = 10,
  cpu = 4,
  memory = "8GB",
  platform = "X86_64",
  region = NULL,
  timeout = 3600
)

| Argument | Description |
|---|---|
| workers | Number of parallel workers |
| cpu | CPU units per worker |
| memory | Memory per worker |
| platform | CPU architecture (X86_64 or ARM64) |
| region | AWS region |
| timeout | Maximum runtime in seconds |
A starburst_cluster object
if (starburst_is_configured()) {
  cluster <- starburst_cluster(workers = 20)
  results <- cluster$map(data, function(x) x * 2)
}
Configure staRburst options
starburst_config(
  max_cost_per_job = NULL,
  cost_alert_threshold = NULL,
  auto_cleanup_s3 = NULL,
  ...
)

| Argument | Description |
|---|---|
| max_cost_per_job | Maximum cost per job in dollars |
| cost_alert_threshold | Cost threshold for alerts |
| auto_cleanup_s3 | Automatically clean up S3 files after completion |
| ... | Additional configuration options |
Invisibly returns the updated configuration list.
if (starburst_is_configured()) {
  starburst_config(
    max_cost_per_job = 10,
    cost_alert_threshold = 5
  )
}
Runs a small sample of tasks locally to estimate cloud execution time and cost. Provides an informed prediction before you spend money on cloud execution.
starburst_estimate(
  .x,
  .f,
  workers = 10,
  cpu = 2,
  memory = "8GB",
  platform = "X86_64",
  sample_size = 10,
  region = NULL,
  ...
)

| Argument | Description |
|---|---|
| .x | A vector or list to iterate over |
| .f | A function to apply to each element |
| workers | Number of parallel workers to estimate for |
| cpu | CPU units per worker (1, 2, 4, 8, or 16) |
| memory | Memory per worker (e.g., "8GB") |
| platform | CPU architecture: "X86_64" (default) or "ARM64" (Graviton3) |
| sample_size | Number of items to run locally for estimation (default: 10) |
| region | AWS region |
| ... | Additional arguments passed to .f |
Invisibly returns a list of estimates and prints a summary to the console.
if (starburst_is_configured()) {
  # Estimate before running
  starburst_estimate(1:1000, expensive_function, workers = 50)

  # Then decide whether to proceed
  results <- starburst_map(1:1000, expensive_function, workers = 50)
}
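The idea of timing a small local sample and scaling it up can be sketched with simple arithmetic. The function extrapolate_runtime() below is a hypothetical helper written for this illustration; it is not the algorithm used by starburst_estimate(). It assumes equal-cost tasks and ignores worker startup, data transfer, and any speed difference between local and cloud CPUs.

```r
# Hypothetical helper for illustration only; not how starburst_estimate() works.
# sample_secs: per-item timings measured locally, in seconds.
extrapolate_runtime <- function(sample_secs, n_total, workers) {
  stopifnot(is.numeric(sample_secs), length(sample_secs) > 0,
            n_total >= 1, workers >= 1)
  per_item <- mean(sample_secs)
  rounds <- ceiling(n_total / workers)  # batches each worker must process
  list(
    per_item_secs = per_item,
    serial_secs = per_item * n_total,   # one worker doing everything
    parallel_secs = per_item * rounds   # idealized wall-clock time
  )
}

# Three sampled items took 1, 2, and 3 seconds; extrapolate to 1000 items.
est <- extrapolate_runtime(c(1, 2, 3), n_total = 1000, workers = 50)
est$parallel_secs
# 1000 items over 50 workers is 20 rounds at 2 seconds each, so 40 seconds.
```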
Returns TRUE if starburst_setup() has been run, the
configuration file exists, and AWS credentials are available.
Useful for guarding example code that requires AWS credentials.
starburst_is_configured()
TRUE if configured and credentials are available, FALSE otherwise.
starburst_is_configured()
List all detached sessions in S3
starburst_list_sessions(region = NULL)

| Argument | Description |
|---|---|
| region | AWS region (default: from config) |
Data frame with session information
if (starburst_is_configured()) {
  sessions <- starburst_list_sessions()
  print(sessions)
}
View worker logs
starburst_logs(task_id = NULL, cluster_id = NULL, last_n = 50, region = NULL)

| Argument | Description |
|---|---|
| task_id | Optional task ID to view logs for specific task |
| cluster_id | Optional cluster ID to view logs for specific cluster |
| last_n | Number of last log lines to show (default: 50) |
| region | AWS region (default: from config) |
Invisibly returns the list of log events, or NULL if no
events were found.
if (starburst_is_configured()) {
  # View recent logs
  starburst_logs()

  # View logs for specific task
  starburst_logs(task_id = "abc-123")

  # View last 100 lines
  starburst_logs(last_n = 100)
}
Parallel map function that executes on AWS Fargate using the Future backend
starburst_map(
  .x,
  .f,
  workers = 10,
  cpu = 4,
  memory = "8GB",
  platform = "X86_64",
  region = NULL,
  timeout = 3600,
  .progress = TRUE,
  ...
)

| Argument | Description |
|---|---|
| .x | A vector or list to iterate over |
| .f | A function to apply to each element |
| workers | Number of parallel workers (default: 10) |
| cpu | CPU units per worker (1, 2, 4, 8, or 16) |
| memory | Memory per worker (e.g., "8GB") |
| platform | CPU architecture (X86_64 or ARM64) |
| region | AWS region |
| timeout | Maximum runtime in seconds per task |
| .progress | Show progress bar (default: TRUE) |
| ... | Additional arguments passed to .f |
A list of results, one per element of .x
if (starburst_is_configured()) {
  # Simple parallel computation
  results <- starburst_map(1:100, function(x) x^2, workers = 10)

  # With custom configuration
  results <- starburst_map(
    data_list,
    expensive_function,
    workers = 50,
    cpu = 4,
    memory = "8GB"
  )
}
Show quota status
starburst_quota_status(region = NULL)

| Argument | Description |
|---|---|
| region | AWS region (default: from config) |
Invisibly returns a list with quota information including current limit, usage, and any pending requests.
if (starburst_is_configured()) {
  starburst_quota_status()
}
Rebuild environment image
starburst_rebuild_environment(region = NULL, force = FALSE)

| Argument | Description |
|---|---|
| region | AWS region (default: from config) |
| force | Force rebuild even if current environment hasn't changed |
Invisibly returns NULL. Called for its side effect of
rebuilding and pushing the Docker environment image.
if (starburst_is_configured()) {
  starburst_rebuild_environment()
}
Request quota increase (user-facing)
starburst_request_quota_increase(vcpus = 500, region = NULL)

| Argument | Description |
|---|---|
| vcpus | Desired vCPU quota |
| region | AWS region (default: from config) |
Invisibly returns TRUE if the increase was requested,
FALSE if already sufficient or cancelled.
if (starburst_is_configured()) {
  starburst_request_quota_increase(vcpus = 500)
}
Creates a new detached session that can run computations independently of your R session. You can close R and reattach later to collect results.
starburst_session(
  workers = 10,
  cpu = 4,
  memory = "8GB",
  region = NULL,
  timeout = 3600,
  session_timeout = 3600,
  absolute_timeout = 86400,
  launch_type = "EC2",
  instance_type = "c7g.xlarge",
  use_spot = TRUE,
  warm_pool_timeout = 3600
)

| Argument | Description |
|---|---|
| workers | Number of parallel workers (default: 10) |
| cpu | vCPUs per worker (default: 4) |
| memory | Memory per worker, e.g., "8GB" (default: "8GB") |
| region | AWS region (default: from config or "us-east-1") |
| timeout | Task timeout in seconds (default: 3600) |
| session_timeout | Active timeout in seconds (default: 3600) |
| absolute_timeout | Maximum session lifetime in seconds (default: 86400) |
| launch_type | "FARGATE" or "EC2" (default: "EC2") |
| instance_type | EC2 instance type for EC2 launch (default: "c7g.xlarge") |
| use_spot | Use spot instances for EC2 (default: TRUE) |
| warm_pool_timeout | EC2 warm pool timeout in seconds (default: 3600) |
A StarburstSession object with methods:
submit(expr, ...) - Submit a task to the session
status() - Get progress summary
collect(wait = FALSE) - Collect completed results
extend(seconds = 3600) - Extend timeout
cleanup() - Terminate and cleanup
if (starburst_is_configured()) {
  # Create detached session
  session <- starburst_session(workers = 10)

  # Submit tasks
  task_ids <- lapply(1:100, function(i) {
    session$submit(quote(expensive_computation(i)))
  })

  # Close R and come back later...
  session_id <- session$session_id

  # Reattach
  session <- starburst_session_attach(session_id)

  # Collect results
  results <- session$collect(wait = TRUE)
}
Reattach to a previously created detached session
starburst_session_attach(session_id, region = NULL)

| Argument | Description |
|---|---|
| session_id | Session identifier |
| region | AWS region (default: from config) |
A StarburstSession object
if (starburst_is_configured()) {
  session <- starburst_session_attach("session-abc123")
  status <- session$status()
  results <- session$collect()
}
One-time configuration to set up AWS resources for staRburst
starburst_setup(
  region = "us-east-1",
  force = FALSE,
  use_public_base = TRUE,
  ecr_image_ttl_days = NULL,
  build_image = TRUE
)

| Argument | Description |
|---|---|
| region | AWS region (default: "us-east-1") |
| force | Force re-setup even if already configured |
| use_public_base | Use public base Docker images (default: TRUE). Set to FALSE to build private base images in your ECR. |
| ecr_image_ttl_days | Number of days to keep Docker images in ECR (default: NULL = never delete). AWS will automatically delete images older than this many days. This prevents surprise costs if you stop using staRburst. Recommended: 30 days for regular users, 7 days for occasional users. When images are deleted, they will be rebuilt on next use (adds 3-5 min). |
| build_image | Build the worker environment image during setup (default: TRUE). Set to FALSE to provision AWS resources (S3/ECR/ECS/VPC), write config, and check quotas without triggering the multi-minute Docker image build. The image is then built lazily on first worker launch via |
Invisibly returns the configuration list.
if (starburst_is_configured()) {
  # Default: keep images forever (~$0.50/month idle cost)
  starburst_setup()

  # Auto-delete images after 30 days (saves money if you stop using it)
  starburst_setup(ecr_image_ttl_days = 30)

  # Use private base images with 7-day cleanup
  starburst_setup(use_public_base = FALSE, ecr_image_ttl_days = 7)

  # Provision resources without building the image (fast; CI / connectivity checks)
  starburst_setup(build_image = FALSE)
}
One-time setup for EC2 launch type. Creates IAM roles, instance profiles, and capacity providers for specified instance types.
starburst_setup_ec2(
  region = "us-east-1",
  instance_types = c("c7g.xlarge", "c7i.xlarge"),
  force = FALSE
)

| Argument | Description |
|---|---|
| region | AWS region (default: "us-east-1") |
| instance_types | Character vector of instance types to set up (default: c("c7g.xlarge", "c7i.xlarge")) |
| force | Force re-setup even if already configured |
Invisibly returns TRUE on success or FALSE on failure
or cancellation.
if (starburst_is_configured()) {
  # Setup with default instance types (Graviton and Intel)
  starburst_setup_ec2()

  # Setup with custom instance types
  starburst_setup_ec2(instance_types = c("c7g.2xlarge", "r7g.xlarge"))
}
Show staRburst status
starburst_status()
Invisibly returns a list with current configuration and quota information.
A future backend for running parallel R workloads on AWS ECS
StarburstBackend(
  workers = 10,
  cpu = 4,
  memory = "8GB",
  region = NULL,
  timeout = 3600,
  launch_type = "EC2",
  instance_type = "c6a.large",
  use_spot = FALSE,
  warm_pool_timeout = 3600,
  ...
)

| Argument | Description |
|---|---|
| workers | Number of parallel workers |
| cpu | vCPUs per worker (1, 2, 4, 8, or 16) |
| memory | Memory per worker (supports GB notation, e.g., "8GB") |
| region | AWS region (default: from config or "us-east-1") |
| timeout | Maximum runtime in seconds (default: 3600) |
| launch_type | "EC2" or "FARGATE" (default: "EC2") |
| instance_type | EC2 instance type (e.g., "c6a.large") |
| use_spot | Use spot instances (default: FALSE) |
| warm_pool_timeout | Pool timeout in seconds (default: 3600) |
| ... | Additional arguments |
A StarburstBackend object