---
title: "Performance Guide: When to Use Cloud vs Local"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Performance Guide: When to Use Cloud vs Local}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(eval = FALSE)
```

## The Honest Truth

Cloud is not magic. It has overhead, costs money, and AWS CPUs are slower per-core than a modern laptop like an M4 Pro. But with the right instance types and spot pricing, cloud can be both faster and cheaper than you expect.

**Cloud wins when:**

- Total workload > 2–4 hours sequential
- You run the same analysis multiple times (parameter sweeps, recurring jobs)
- You need results faster than local parallel can deliver
- Your laptop needs to stay usable
- Budget allows $1–5 per job (often less with spot instances)

**Local wins when:**

- Workload < 1 hour
- One-time quick analysis
- You have a powerful multi-core machine (16+ cores)
- Data is too sensitive for cloud
- Zero budget

---

## Real Performance Data

**Workload**: 500M Monte Carlo iterations (9.7 hours sequential)

```
Sequential (1 core):        9.7 hours
M4 Pro (10 perf cores):     67.5 minutes
AWS EC2 c8a (50 workers):   ~30 minutes    ← winner

Cloud speedup: 2.2x vs M4 Pro parallel, 19x vs sequential
Cost: ~$1.10 (spot instances)
```

**Why 2.2x and not 5x?**

1. **Per-core gap**: c8a AMD is ~65–70% of M4 Pro speed (3.0 GHz AMD EPYC vs 3.5–4.0 GHz Apple Silicon)
2. **Startup overhead**: ~2 minutes for first result (pool warmup + S3 transfer) — amortized across subsequent runs
3. **Straggler effect**: some variation in task completion time across workers
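
The headline ratios can be rechecked from the reported timings. The sketch below also computes a rough ceiling for the cloud speedup from the per-core gap alone; that ceiling assumes perfect scaling on both machines, which is an illustration rather than a measured result.

```r
# Timings reported in the benchmark above, in minutes
sequential_min <- 9.7 * 60
local_min      <- 67.5
cloud_min      <- 30

speedup_vs_local      <- local_min / cloud_min        # 2.25
speedup_vs_sequential <- sequential_min / cloud_min   # 19.4

# Rough ceiling: 50 workers at 65-70% of an M4 Pro core, versus 10 local cores.
# Assumes perfect scaling on both sides, so it is an upper bound only.
ceiling_vs_local <- 50 * c(0.65, 0.70) / 10           # 3.25 3.50

round(c(speedup_vs_local, speedup_vs_sequential), 2)
ceiling_vs_local
```

Under these assumptions the per-core gap brings the naive 5x down to about 3.25–3.5x; startup overhead and stragglers would account for the rest of the distance to the observed 2.2x.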

---

## Per-Task Overhead

Once workers are running, each task incurs a fixed overhead from S3 data transfer. The table implies roughly 2–3 seconds per task:

| Task Duration | Overhead % | Efficiency |
|--------------|------------|------------|
| 1 second     | 200–300%   | Negative   |
| 10 seconds   | 20–30%     | Moderate   |
| 30 seconds   | 7–10%      | Good       |
| 60 seconds   | 3–5%       | Excellent  |
| 5 minutes    | <1%        | Optimal    |

**Sweet spot**: tasks that run for 2–10 minutes each. For faster tasks, batch them.
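
The table follows from dividing a fixed per-task overhead by the task duration. A minimal sketch, assuming an overhead of 2.5 seconds (a midpoint implied by the 1-second row, not a measured value); `overhead_pct` is a hypothetical helper:

```r
# Hypothetical helper: overhead as a percentage of useful task time.
# overhead_s = 2.5 is an assumed midpoint of the 2-3 s implied by the table.
overhead_pct <- function(task_s, overhead_s = 2.5) {
  100 * overhead_s / task_s
}

task_s <- c(1, 10, 30, 60, 300)
data.frame(task_s = task_s, overhead_pct = round(overhead_pct(task_s), 1))
```

Substituting an overhead measured on your own workload gives a more reliable picture than the assumed default.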

---

## Batching

The most important optimization. Instead of sending 10,000 tiny tasks, group them:

```{r batching-bad}
# Bad: 10,000 tasks of 0.1s each — overhead dominates
starburst_map(1:10000, quick_fn, workers = 100)
```

```{r batching-good}
# Good: 100 tasks of ~10s each (100 items x 0.1s), so overhead no longer dominates
items <- 1:10000
batches <- split(items, ceiling(seq_along(items) / 100))
starburst_map(batches, function(batch) lapply(batch, quick_fn), workers = 100)
```

**Batch size formula:**

```{r batch-size}
# Profile your function first
per_item_time <- 0.5   # seconds, from local profiling
target_task_duration <- 60  # aim for 60s minimum per task

batch_size <- ceiling(target_task_duration / per_item_time)
# Result: 120 items per batch
```
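
The formula and the `split()` idiom can be combined into one helper. This is a sketch only: `make_batches` is a hypothetical name and not part of staRburst.

```r
# Hypothetical helper, not part of staRburst: split `items` into batches
# sized so each batch runs for roughly `target_s` seconds.
make_batches <- function(items, per_item_s, target_s = 60) {
  batch_size <- max(1, ceiling(target_s / per_item_s))
  unname(split(items, ceiling(seq_along(items) / batch_size)))
}

batches <- make_batches(1:10000, per_item_s = 0.5)
length(batches)        # number of tasks to submit
lengths(batches)[1]    # items in the first batch
```

With 0.5 seconds per item this yields 84 batches of up to 120 items. The last batch is smaller (40 items), so it finishes early rather than becoming a straggler.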

---

## Choosing Instance Types

| Instance | Architecture | Price/Perf | Best For |
|----------|-------------|------------|----------|
| **c8a** | AMD 8th gen | ★★★★★ | Default — best overall |
| **c8g** | Graviton4 ARM | ★★★★ | Best ARM64 option |
| **c7a** | AMD 7th gen | ★★★★ | Proven, stable |
| **c8i** | Intel 8th gen | ★★★ | High single-thread needs |

```{r instance-choice}
# Recommended: c8a with spot instances
plan(starburst,
  workers     = 50,
  instance_type = "c8a.xlarge",  # AMD 8th gen — best price/performance
  use_spot    = TRUE             # 70% cheaper than on-demand
)
```

**Spot vs on-demand (50 workers, us-east-1):**

```
c8a on-demand:  $7.20/hr
c8a spot:       $2.16/hr   ← 70% savings, low interruption risk
```

---

## The Startup Cost Problem

Workers need ~2 minutes to start cold (pool warmup + environment sync). This cost is **fixed** — it doesn't scale with job size. The key is to amortize it.

### Run once — marginal value

```
Local parallel (M4 Pro):  67.5 min, $0
Cloud 50 workers (cold):  46.5 min, $3.82
  → Saved 21 min for $3.82 = $10.91/hr saved
  → Marginal
```

### Run 10 times (parameter sweep) — much better

```
Local parallel:            675 min (10 × 67.5), laptop tied up
Cloud (workers stay warm): 375 min (10 min startup once + 10 × 36.5)
  → Saved 5 hours for $38.20 = $7.64/hr saved
  → Good value, laptop stays usable
```

### Daily recurring job — excellent

```
100 runs/month
Local:   6,750 min, laptop tied up 4.7 hrs/day
Cloud:   3,660 min, ~0.3% startup overhead (10 min of 3,660)
  → ~$380/month, 51 hours saved
```
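
The three scenarios share the same arithmetic, which can be reproduced for any number of runs. The sketch below uses the figures from the examples above (36.5 min per cloud run, a one-time 10 min startup, $3.82 per run); `amortize` is a hypothetical helper, and linear cost per run is an assumption.

```r
# Figures taken from the examples above; startup_min = 10 matches the
# parameter-sweep arithmetic and is an assumption for other setups.
amortize <- function(runs, local_min = 67.5, cloud_run_min = 36.5,
                     startup_min = 10, cost_per_run = 3.82) {
  cloud_total <- startup_min + runs * cloud_run_min
  saved_hr    <- (runs * local_min - cloud_total) / 60
  data.frame(
    runs             = runs,
    cloud_min        = cloud_total,
    startup_share    = round(startup_min / cloud_total, 4),
    saved_hr         = round(saved_hr, 2),
    usd_per_hr_saved = round(runs * cost_per_run / saved_hr, 2)
  )
}

amortize(c(1, 10, 100))
```

The `startup_share` column shows how quickly the fixed cost fades: about 22% for a single run, under 3% at 10 runs, and about 0.3% at 100.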

---

## Decision Framework

**How long is the workload?**
```
< 30 min:     Don't bother
30–60 min:    Local probably better
1–4 hours:    Cloud starts to make sense
4–8 hours:    Cloud clearly better
> 8 hours:    Easy choice
```

**How many times will you run it?**
```
Once:          Startup is ~20% of job time
2–5 times:     Startup amortizes to ~5–10%
10+ times:     Startup < 3%
Daily:         Keep warm pool, effectively zero
```

**What's your local hardware?**
```
4–6 cores:         Cloud dominates
8–10 cores (M4):   Cloud wins but it's closer
16+ cores:         Cloud might not win on speed
HPC cluster:       Use your cluster
```
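
The workload-length thresholds can be turned into a lookup for scripted checks. A minimal sketch, assuming the durations refer to sequential run time in hours and that each band includes its lower bound; `workload_advice` is a hypothetical helper.

```r
# Hypothetical helper encoding the workload-length table above.
# Boundary handling (left-closed intervals) is an assumption.
workload_advice <- function(hours) {
  as.character(cut(
    hours,
    breaks = c(0, 0.5, 1, 4, 8, Inf),
    labels = c("Don't bother", "Local probably better",
               "Cloud starts to make sense", "Cloud clearly better",
               "Easy choice"),
    right = FALSE
  ))
}

workload_advice(c(0.25, 0.75, 2, 6, 9.7))
```

This covers only the first question. Run count and local hardware still need to be weighed separately.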

---

## Common Patterns

### Pattern: One-shot analysis — use local

```{r one-shot-local}
library(parallel)
cl <- makeCluster(detectCores() - 1)
results <- parLapply(cl, data, your_function)
stopCluster(cl)
```

### Pattern: Parameter sweep — use cloud

```{r param-sweep}
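# Pay startup cost once, run many combinations
grid <- expand.grid(alpha = seq(0.1, 1.0, 0.1), beta = seq(0.1, 1.0, 0.1))
results <- vector("list", nrow(grid))
for (i in seq_len(nrow(grid))) {
  alpha <- grid$alpha[i]
  beta  <- grid$beta[i]
  # Store each combination's output instead of overwriting `results`
  results[[i]] <- starburst_map(
    data,
    function(x) model(x, alpha = alpha, beta = beta),
    workers = 50
  )
}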
```

### Pattern: Daily production job — keep warm pool

```{r warm-pool}
# Start warm pool once in the morning
plan(starburst, workers = 50, warm_pool_timeout = 28800)  # 8 hours

# All runs during the day start in < 30s
results_am <- starburst_map(morning_data, process)
results_pm <- starburst_map(afternoon_data, process)
# Pool shuts down automatically after 8 hours of inactivity
```

### Pattern: Hybrid — develop local, scale on cloud

```{r hybrid}
# Iterate quickly on a small sample locally
results_test <- lapply(data[1:100], your_function)

# When logic is right, scale to full dataset on cloud
results_full <- starburst_map(data, your_function, workers = 100)
```

---

## Cost Estimation

**Quick formula (EC2 spot, us-east-1):**

```
Cost ≈ workers × hours × $0.044/worker/hour
```

| Job size | Workers | Wall time | Spot cost |
|----------|---------|-----------|-----------|
| Small (1 hr sequential)  | 10  | ~6 min  | ~$0.04 |
| Medium (5 hr sequential) | 25  | ~12 min | ~$0.22 |
| Large (10 hr sequential) | 50  | ~25 min | ~$0.92 |

Use `starburst_estimate()` for a precise estimate before running.
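
For a quick check by hand, the formula reproduces the table rows. `spot_cost` below is a hypothetical helper for illustration, separate from the package's `starburst_estimate()`.

```r
# Hypothetical helper; `starburst_estimate()` is the package's own estimator.
# rate_usd is the per-worker spot rate from the quick formula above.
spot_cost <- function(workers, wall_min, rate_usd = 0.044) {
  workers * (wall_min / 60) * rate_usd
}

jobs <- data.frame(workers = c(10, 25, 50), wall_min = c(6, 12, 25))
jobs$spot_usd <- round(spot_cost(jobs$workers, jobs$wall_min), 2)
jobs
```

Spot prices vary by region and over time, so treat the default rate as a placeholder to override.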

---

## Common Pitfalls

**Too many small tasks:**
```{r pitfall-small}
# Bad: each task is 0.1s — overhead is 20-30x the work
starburst_map(1:10000, function(x) sqrt(x), workers = 100)

# Good: batch into groups of 100
batches <- split(1:10000, ceiling(seq_along(1:10000) / 100))
starburst_map(batches, function(b) sapply(b, sqrt), workers = 100)
```

**More workers than tasks:**
```{r pitfall-workers}
# Bad: 40 workers sit idle
starburst_map(1:10, fn, workers = 50)

# Good: match workers to workload
starburst_map(1:100, fn, workers = 25)  # 4 tasks per worker
```
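
One way to avoid idle workers is to cap the pool at the task count before submitting. A sketch with a hypothetical helper, `size_pool`, which is not part of staRburst:

```r
# Hypothetical helper: cap workers at the task count and report how many
# rounds of tasks the busiest worker handles.
size_pool <- function(n_tasks, max_workers) {
  workers <- min(n_tasks, max_workers)
  list(workers = workers, rounds = ceiling(n_tasks / workers))
}

size_pool(10, 50)    # 10 workers, 1 round
size_pool(100, 25)   # 25 workers, 4 rounds
```

The resulting `workers` value can then be passed as the `workers` argument shown in the examples above.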

**Sending large data to every worker:**
```{r pitfall-data}
# Bad: huge_matrix serialized and sent to each of 50 workers
huge_matrix <- matrix(rnorm(1e8), ncol = 1000)
starburst_map(1:50, function(i) process(huge_matrix, i), workers = 50)

# Good: generate data inside the worker
starburst_map(1:50, function(i) {
  data <- generate_chunk(i)  # create data on the worker
  process(data)
}, workers = 50)
```
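
To see why the first version hurts, a back-of-envelope estimate of the bytes involved helps. The sketch assumes the matrix is shipped uncompressed to each worker and ignores serialization overhead.

```r
# Back-of-envelope payload size for the "bad" example above.
# Ignores serialization overhead and any compression (assumptions).
n_values   <- 1e8
bytes_each <- 8                      # R doubles are 8 bytes
workers    <- 50

per_worker_gb <- n_values * bytes_each / 1e9
total_gb      <- per_worker_gb * workers
c(per_worker_gb = per_worker_gb, total_gb = total_gb)
```

That is about 0.8 GB per worker and 40 GB in total. For an object already in memory, `format(object.size(x), units = "MB")` gives a comparable figure.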

---

## Use Case Quick Reference

| Use Case | Task Duration | Batch Size | Workers | Expected Speedup |
|----------|--------------|------------|---------|------------------|
| Fast calculations | 0.001s | 1000+ per task | 20–50 | 3–8x |
| API calls | 0.5–2s | 20–100 per task | 20–50 | 8–15x |
| Data processing | 10–60s | 5–20 per task | 20–50 | 12–20x |
| Report generation | 60–300s | 1–5 per task | 20–50 | 15–25x |
| Model training | 2–10 min | 1–2 per task | 20–50 | 18–30x |

---

## AWS Authentication

staRburst uses the [paws](https://github.com/paws-r/paws) AWS SDK, which supports
the full AWS credential chain:

- **Environment variables**: `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_SESSION_TOKEN`
- **Named profiles**: set `AWS_PROFILE=myprofile` to select a profile from `~/.aws/credentials`
- **AWS SSO / `aws login`**: supported via named profiles configured with SSO in `~/.aws/config`. Requires AWS CLI v2; `aws login` has been available since November 2025.
- **IAM instance roles**: automatic when running on EC2 or ECS

```bash
# Standard profile
export AWS_PROFILE=my-aws-account
Rscript -e "library(starburst); starburst_setup_ec2()"

# SSO profile (AWS CLI v2)
aws login --profile my-sso-profile
export AWS_PROFILE=my-sso-profile
Rscript -e "library(starburst); starburst_setup_ec2()"
```

No explicit configuration is required in staRburst — it defers entirely to paws credential discovery.