
High-Throughput Uploads: Performance Tuning and Parallelism

Maximize upload throughput with concurrency tuning, keep-alive optimization, server-side scaling, and practical measurement techniques for chunked upload pipelines.

Ops · Updated 2026-04-26

Bandwidth Is Rarely the Bottleneck

Most upload performance problems aren't bandwidth problems. A user on a 100 Mbps connection uploading 2 MB chunks should see each chunk transfer in under 200 ms. In practice, they often see 500 ms or more per chunk — connection setup overhead, server processing time, proxy buffering, and configuration gaps eat the difference.

Improving upload throughput means identifying and eliminating these hidden costs. The approach is systematic: measure, identify the bottleneck, adjust one variable, re-measure. Resist the urge to change everything at once.

Client-Side Tuning

Three Resumable.js configuration options directly affect throughput:

simultaneousUploads — the number of chunks uploaded in parallel. Default is 3. Increasing this overlaps per-request overhead across multiple connections. The optimal value depends on available bandwidth, server capacity, and browser connection limits.

chunkSize — the size of each chunk in bytes. Larger chunks mean fewer HTTP requests (less overhead) but less granular progress and higher per-failure cost. Smaller chunks mean more requests but better resume granularity. The optimal chunk sizes guide covers this trade-off in detail.

testChunks — when true, a GET request is sent for each chunk to check if it already exists. On a fresh upload with no interruption, these test requests are pure overhead. If you can detect fresh uploads (no prior upload state in localStorage), skip the test phase.

const r = new Resumable({
  target: '/api/upload',
  chunkSize: 4 * 1024 * 1024,   // 4 MB — fewer requests than 1 MB
  simultaneousUploads: 4,         // 4 parallel streams
  testChunks: hasExistingState(), // Only test if resuming
  maxChunkRetries: 3,
});
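The hasExistingState() helper is application-defined. A minimal sketch, assuming the application records a localStorage key per upload when it starts (the key prefix here is illustrative):

// Sketch: detect whether a prior upload left resumable state behind.
// Assumes the app writes a 'resumable:<identifier>' key when an
// upload begins and clears it on completion.
function hasExistingState() {
  for (let i = 0; i < localStorage.length; i++) {
    if (localStorage.key(i).startsWith('resumable:')) return true;
  }
  return false;
}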

The interaction between chunkSize and simultaneousUploads matters. With 4 MB chunks and 4 parallel streams, you have 16 MB of data in flight at any moment. On a 50 Mbps upload connection, that's about 2.5 seconds of data queued. Increasing either value beyond what your connection can sustain just adds memory pressure without improving throughput.
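To make the in-flight math explicit, here's a small illustrative helper (not part of Resumable.js) using the same numbers:

// Estimate data in flight and how many seconds of uplink it represents.
function inFlight(chunkSizeBytes, simultaneousUploads, uplinkMbps) {
  const bytes = chunkSizeBytes * simultaneousUploads;
  const uplinkBytesPerSec = (uplinkMbps * 1_000_000) / 8;
  return {
    inFlightMB: bytes / (1024 * 1024),
    queuedSeconds: +(bytes / uplinkBytesPerSec).toFixed(2),
  };
}

console.log(inFlight(4 * 1024 * 1024, 4, 50));
// { inFlightMB: 16, queuedSeconds: 2.68 }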

Keep-Alive: The Overlooked Multiplier

Every new HTTP connection requires a TCP handshake (1 round trip) and, for HTTPS, a TLS handshake (1–2 additional round trips). On a connection with 100 ms latency, that's 200–300 ms of overhead per new connection — before a single byte of file data is sent.

HTTP keep-alive reuses existing connections for subsequent requests, eliminating this overhead. For chunked uploads sending dozens or hundreds of requests to the same endpoint, the savings are substantial.
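A quick calculation shows the scale. Assuming a 100 ms RTT and TLS 1.2 (two handshake round trips), reusing one connection across a 100-chunk upload avoids 99 handshakes:

// Handshake cost avoided by keep-alive (illustrative numbers).
const rttMs = 100;
const handshakeMs = rttMs * (1 /* TCP */ + 2 /* TLS 1.2 */);
const chunks = 100;
// The first request still pays the handshake; the other 99 don't.
console.log(`~${(handshakeMs * (chunks - 1)) / 1000}s of handshake overhead avoided across ${chunks} chunks`);
// ~29.7s of handshake overhead avoided across 100 chunks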

Modern browsers enable keep-alive by default. The question is whether your server and reverse proxy honor it:

# nginx: enable keep-alive for upload endpoints
upstream upload_backend {
    server 127.0.0.1:3000;
    keepalive 32;              # Pool of keep-alive connections to backend
}

server {
    location /api/upload {
        proxy_pass http://upload_backend;
        proxy_http_version 1.1;
        proxy_set_header Connection "";   # Enable keep-alive to upstream

        keepalive_timeout 120s;           # Keep connections open between chunks
        keepalive_requests 1000;          # Allow many requests per connection
    }
}

Without this configuration, nginx opens a new connection to your backend for every chunk request (by default it proxies with HTTP/1.0 and sends Connection: close to upstreams), defeating keep-alive even though the browser reuses its connection to nginx.

Verify keep-alive is working by checking the Connection header in responses and monitoring connection counts on your server. If you see new connections per request in your server logs, something in the chain is closing connections prematurely.
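One way to check from a script: Node's keep-alive agent exposes request.reusedSocket, which is true whenever a request rode an existing socket. A minimal probe sketch (the endpoint URL is a placeholder):

// Sketch: verify connection reuse with Node's keep-alive agent.
const https = require('https');

const agent = new https.Agent({ keepAlive: true });
const url = 'https://your-server.com/api/upload'; // placeholder endpoint

function probe(n) {
  const req = https.get(url, { agent }, (res) => {
    console.log(`request ${n}: reusedSocket=${req.reusedSocket}`);
    res.resume(); // drain so the socket can return to the pool
    res.on('end', () => (n < 3 ? probe(n + 1) : agent.destroy()));
  });
}

probe(1);
// request 1: reusedSocket=false
// request 2: reusedSocket=true   <- connection reuse is working
// request 3: reusedSocket=true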

Browser Connection Limits

Browsers limit concurrent connections per origin:

  • HTTP/1.1: ~6 connections per origin (varies by browser)
  • HTTP/2: 1 connection with multiplexed streams (100+ concurrent requests typical)

With HTTP/1.1, setting simultaneousUploads above 5 is counterproductive — the excess requests queue behind the connection limit, and you're consuming connections that other parts of your application need (API calls, images, WebSocket).

HTTP/2 changes the math. Multiplexing allows many concurrent requests over a single connection, eliminating the per-connection overhead. If your server supports HTTP/2, you can increase simultaneousUploads more aggressively — though you're still sharing bandwidth, not multiplying it.

Check your server's HTTP version:

curl -sI https://your-server.com/api/upload | grep -i "http/"
# HTTP/2 200 — good, multiplexing available
# HTTP/1.1 200 — limited to ~6 concurrent connections

Server-Side Scaling

The upload endpoint is often the most I/O-intensive part of a web application. Each chunk request writes data to disk or storage. Optimize for that workload:

Async I/O — use non-blocking file writes. In Node.js, use streams. In Python, use async frameworks or offload writes to a thread pool. A synchronous write that blocks for 10 ms per chunk limits your throughput to 100 chunks/second per worker.

// Node.js: stream chunk data to disk.
// This sketch assumes Resumable.js is configured with method: 'octet',
// so the chunk arrives as the raw request body and the resumable*
// parameters arrive in the query string. (The default multipart
// encoding needs a streaming parser such as busboy or multer; piping
// req directly would write the multipart envelope to disk.)
const fs = require('fs');

app.post('/api/upload', (req, res) => {
  const chunkPath = getChunkPath(req.query.resumableIdentifier, req.query.resumableChunkNumber);
  const writeStream = fs.createWriteStream(chunkPath);

  req.pipe(writeStream);

  writeStream.on('finish', () => res.sendStatus(200));
  writeStream.on('error', () => res.sendStatus(500));
});

Worker processes — run multiple upload handler processes. With Node.js, use cluster mode or PM2. With Python, use Gunicorn with multiple workers. Each worker handles chunks independently.
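A minimal cluster sketch, assuming the Express app from above lives in its own module (./upload-server is a placeholder):

// Sketch: one upload worker per CPU core with Node's cluster module.
const cluster = require('cluster');
const os = require('os');

if (cluster.isPrimary) { // Node 16+; use cluster.isMaster on older versions
  for (let i = 0; i < os.cpus().length; i++) cluster.fork();

  cluster.on('exit', (worker) => {
    console.log(`worker ${worker.process.pid} exited, restarting`);
    cluster.fork(); // keep the pool at full size
  });
} else {
  // Each worker runs the same app; the primary distributes connections.
  require('./upload-server'); // hypothetical module that calls app.listen()
}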

Upload endpoint isolation — separate your upload endpoint from your API and web servers. Upload traffic is bursty, I/O-heavy, and can consume significant memory. Running it in the same process as your API means a burst of uploads can degrade API response times. Deploy upload handlers on dedicated instances or containers.

Reverse Proxy Configuration

A misconfigured reverse proxy is the most common cause of upload performance problems. Key settings for nginx:

location /api/upload {
    # Don't buffer the upload body to disk — stream it to the backend
    proxy_request_buffering off;

    # Allow large chunk sizes
    client_max_body_size 20m;

    # Generous timeouts for slow connections
    proxy_read_timeout 300s;
    proxy_send_timeout 300s;
    client_body_timeout 300s;

    # Keep-alive settings (see above)
    proxy_http_version 1.1;
    proxy_set_header Connection "";
}

proxy_request_buffering off is critical. By default, nginx buffers the entire request body to disk before forwarding it to the backend. For upload chunks, this means the data is written to disk twice — once by nginx, once by your application. Disabling buffering streams data directly to the backend, halving disk I/O and reducing latency.

client_max_body_size must accommodate your largest chunk. Set it slightly above your chunkSize to account for multipart/form-data overhead (boundary strings, headers).

The timeouts guide covers timeout configuration for upload endpoints in detail.

Network Bottlenecks

Some throughput limits are outside your control but worth understanding:

Upload/download asymmetry — most consumer internet connections have asymmetric speeds. A "100 Mbps" cable connection might offer only 10 Mbps upload. Your 4 MB chunks at 10 Mbps upload take ~3.2 seconds each. No amount of server tuning changes this.

ISP throttling — some ISPs throttle sustained upload traffic. Users may see high initial throughput that drops after several seconds or megabytes.

VPN overhead — corporate VPNs add encryption overhead and route traffic through distant servers. Upload throughput through a VPN is often 50–70% of direct throughput.

Geographic distance — latency between client and server directly impacts per-request overhead and parallel upload effectiveness. If your users are global, consider upload endpoints in multiple regions.

Measuring Effectively

You can't tune what you don't measure. Use Resumable.js events to capture timing data:

const uploadMetrics = {};

r.on('fileAdded', (file) => {
  // Stamp the file as it enters the queue. If you call r.upload()
  // right away (the usual pattern), this is effectively the start
  // of the transfer.
  uploadMetrics[file.uniqueIdentifier] = {
    startTime: Date.now(),
    totalBytes: file.size,
  };
});

r.on('fileSuccess', (file) => {
  const metrics = uploadMetrics[file.uniqueIdentifier];
  if (!metrics) return;

  const totalTime = (Date.now() - metrics.startTime) / 1000;
  const throughputMbps = (metrics.totalBytes * 8) / (totalTime * 1_000_000);
  const chunkCount = file.chunks.length;

  console.log(`File: ${file.fileName}`);
  console.log(`Total time: ${totalTime.toFixed(1)}s`);
  console.log(`Throughput: ${throughputMbps.toFixed(1)} Mbps`);
  // Average wall-clock time per chunk; with N parallel streams the
  // latency of an individual chunk request is roughly N times this.
  console.log(`Avg chunk time: ${((totalTime * 1000) / chunkCount).toFixed(0)}ms`);
});

Key metrics to track:

  • Effective throughput — total bytes / wall-clock time (includes all overhead)
  • Per-chunk latency — time from chunk request start to response
  • Overhead ratio — compare per-chunk time to (chunk size / bandwidth); the difference is overhead (see the sketch after this list)
  • Failure rate — percentage of chunks requiring retry
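
The overhead ratio is easy to compute once you have per-chunk timings. A small illustrative helper:

// Sketch: overhead ratio for a single chunk.
// idealMs is the pure transfer time at the measured uplink rate;
// anything above that is connection setup, server processing, proxying.
function overheadRatio(chunkBytes, uplinkMbps, measuredMs) {
  const idealMs = (chunkBytes * 8) / (uplinkMbps * 1000); // bytes -> bits, Mbps -> bits/ms
  return {
    idealMs: Math.round(idealMs),
    overheadMs: Math.round(measuredMs - idealMs),
    ratio: (measuredMs / idealMs).toFixed(2),
  };
}

console.log(overheadRatio(4 * 1024 * 1024, 50, 1200));
// { idealMs: 671, overheadMs: 529, ratio: '1.79' }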

Load Testing Upload Endpoints

Before tuning for production, simulate realistic load:

# Using curl to simulate chunked uploads
for i in $(seq 1 50); do
  dd if=/dev/urandom bs=2M count=1 2>/dev/null | \
    curl -s -o /dev/null -w "%{time_total}\n" \
    -X POST -F "file=@-" \
    -F "resumableChunkNumber=$i" \
    -F "resumableTotalChunks=50" \
    -F "resumableIdentifier=loadtest-001" \
    https://your-server.com/api/upload &
done
wait

For more realistic load testing, use tools like k6, Artillery, or Locust with custom scripts that simulate the full Resumable.js protocol: test chunks, upload chunks in parallel, and complete uploads.
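As a starting point, here's a minimal k6 sketch. It assumes the octet encoding from the server example above (raw chunk body, parameters in the query string); the endpoint, identifier, and chunk counts are placeholders, and a real script would add the test-chunk GETs as well:

// k6 sketch: simulate parallel chunk uploads (octet encoding assumed).
import http from 'k6/http';
import { check } from 'k6';

// 4 virtual users ~ 4 parallel streams; 50 iterations ~ 50 chunks.
export const options = { vus: 4, iterations: 50 };

const CHUNK = 'x'.repeat(2 * 1024 * 1024); // 2 MB dummy payload

export default function () {
  // __VU and __ITER are k6 built-ins; combine them for unique chunk numbers (4 = VU count).
  const n = __ITER * 4 + __VU;
  const qs =
    `resumableChunkNumber=${n}&resumableTotalChunks=50` +
    `&resumableIdentifier=loadtest-001&resumableChunkSize=${CHUNK.length}`;

  const res = http.post(`https://your-server.com/api/upload?${qs}`, CHUNK, {
    headers: { 'Content-Type': 'application/octet-stream' },
  });
  check(res, { 'chunk accepted': (r) => r.status === 200 });
}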

Monitor server-side metrics during load tests: CPU, memory, disk I/O, and network. The bottleneck is usually disk I/O (writing chunks) or connection handling (too many concurrent requests for the worker pool).

Tuning Workflow

Follow this sequence:

  1. Baseline — measure throughput with default settings (simultaneousUploads: 3, chunkSize: 1MB)
  2. Chunk size — try 2 MB, 4 MB, 8 MB. Larger chunks reduce overhead. Find the point where increasing size no longer improves throughput.
  3. Parallelism — try 2, 3, 4, 5 simultaneous uploads. Watch for diminishing returns and server-side saturation.
  4. Keep-alive — verify connections are reused. Fix nginx or backend config if they aren't.
  5. Proxy buffering — disable proxy_request_buffering. Measure the improvement.
  6. Server workers — increase worker processes until disk I/O or CPU is the limit.
  7. Re-baseline — measure again with all changes. Compare to step 1.

Change one variable at a time. The rate limits guide helps you set safe upper bounds so that tuning for throughput doesn't create a denial-of-service risk for your own infrastructure.

The parallel vs resumable guide provides additional context on how concurrency and resumability interact, and how to balance throughput with resilience.

Throughput tuning is iterative. Your optimal configuration depends on your specific infrastructure, user base, and file sizes. Measure, adjust, measure again.