# MIME Types and Extensions Are Not Proof
Every file that arrives at your server has a stated MIME type and an extension. Neither is trustworthy. The MIME type comes from the browser's best guess — often derived from the extension itself. The extension is just a string the user controls. Rename malware.exe to photo.jpg, and the browser will happily report image/jpeg.
Real upload security starts by inspecting actual file content. The technique is straightforward: read the raw bytes at the beginning of a file and compare them against known signatures. This is what operating systems do internally, and it's what your upload pipeline should do too.
## What Magic Bytes Are
Most binary file formats begin with a fixed byte sequence — a magic number or file signature — that identifies the format regardless of what the file is named. These are standardized and well-documented:
| Format | Magic Bytes (hex) | Offset |
|---|---|---|
| JPEG | FF D8 FF | 0 |
| PNG | 89 50 4E 47 0D 0A 1A 0A | 0 |
| PDF | 25 50 44 46 | 0 |
| GIF | 47 49 46 38 | 0 |
| ZIP | 50 4B 03 04 | 0 |
| MP4 | 66 74 79 70 | 4 |
| WebP | 52 49 46 46 ... 57 45 42 50 | 0, 8 |
Some formats (like MP4) place signatures at a fixed offset rather than byte zero. A few — like plain text, CSV, and some XML — have no magic bytes at all. For those, you fall back to content heuristics or simply allow the type based on context.
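One workable heuristic for signature-less formats is to test whether the leading bytes look like text at all: no NUL bytes, and valid UTF-8. A minimal sketch (the 8 KiB sample size and the function name are illustrative choices, not a standard):

```python
def looks_like_text(data: bytes) -> bool:
    """Heuristic for signature-less formats like TXT and CSV.

    Rejects data containing NUL bytes or failing UTF-8 decoding --
    a rough but effective binary-vs-text test.
    """
    sample = data[:8192]  # inspect only the first 8 KiB
    if b'\x00' in sample:
        return False
    try:
        sample.decode('utf-8')
    except UnicodeDecodeError as exc:
        # Tolerate a multi-byte character cut off at the sample boundary
        if exc.start < len(sample) - 3:
            return False
    return True
```

This only says "plausibly text", not "safe CSV" — pair it with the contextual allow-list the paragraph above describes.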
## Client-Side Pre-Checks
Checking magic bytes on the client before uploading gives users instant feedback. If someone selects an .exe disguised as .png, you can reject it immediately rather than uploading gigabytes of useless data. This is a UX improvement, not a security measure — that distinction matters.
With Resumable.js, the fileAdded event fires before any upload begins. You can read the first bytes of the file using FileReader and validate the signature:
```javascript
const SIGNATURES = {
  'image/jpeg': [
    { bytes: [0xFF, 0xD8, 0xFF], offset: 0 }
  ],
  'image/png': [
    { bytes: [0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A], offset: 0 }
  ],
  'application/pdf': [
    { bytes: [0x25, 0x50, 0x44, 0x46], offset: 0 }
  ],
};

function checkMagicBytes(file, allowedTypes) {
  return new Promise((resolve, reject) => {
    // Read just the first 16 bytes — enough for most signatures
    const slice = file.slice(0, 16);
    const reader = new FileReader();
    reader.onload = (e) => {
      const arr = new Uint8Array(e.target.result);
      for (const type of allowedTypes) {
        const sigs = SIGNATURES[type];
        if (!sigs) continue;
        for (const sig of sigs) {
          const match = sig.bytes.every(
            (byte, i) => arr[sig.offset + i] === byte
          );
          if (match) {
            resolve({ valid: true, detectedType: type });
            return;
          }
        }
      }
      resolve({ valid: false, detectedType: null });
    };
    reader.onerror = () => reject(new Error('Failed to read file'));
    reader.readAsArrayBuffer(slice);
  });
}
```
Wire this into Resumable.js in the fileAdded handler. Because the handler is async, its return value is a Promise and Resumable.js ignores it — calling removeFile is what actually rejects the file:

```javascript
const r = new Resumable({
  target: '/api/upload',
  chunkSize: 2 * 1024 * 1024,
  // File type and validation options
  fileType: ['jpg', 'jpeg', 'png', 'pdf'],
});

r.on('fileAdded', async (file) => {
  const allowed = ['image/jpeg', 'image/png', 'application/pdf'];
  const result = await checkMagicBytes(file.file, allowed);
  if (!result.valid) {
    // Removing the file — not the return value — is what rejects it
    r.removeFile(file);
    showError(`File "${file.fileName}" does not match an allowed type.`);
  }
});
```
This pairs well with the built-in file validation options like fileType and maxFileSize. Use both: extension filtering catches casual mistakes, magic-byte checks catch deliberate deception.
## Server-Side Validation: The Real Gate
Client-side checks are trivially bypassed. Anyone with browser dev tools or a script can POST whatever bytes they want. Your server must re-validate every assembled file independently.
After all chunks arrive and you've merged them into the final file, inspect it:
```python
MAGIC_TABLE = {
    'image/jpeg': [(0, b'\xFF\xD8\xFF')],
    'image/png': [(0, b'\x89PNG\r\n\x1a\n')],
    'application/pdf': [(0, b'%PDF')],
}

def validate_file_signature(filepath, allowed_types):
    with open(filepath, 'rb') as f:
        header = f.read(16)
    for mime_type in allowed_types:
        for offset, magic in MAGIC_TABLE.get(mime_type, []):
            if header[offset:offset + len(magic)] == magic:
                return mime_type
    return None  # No match — reject the file
```
If you're using Django or Flask for your chunk receiver, run this check in the same handler that merges the final chunk. See the patterns in the Python chunked upload guide for how to structure that merge step.
## Chunk Integrity: Checksums Per Piece
When you're uploading a large file in chunks, corruption can happen at any point — network glitches, dropped packets, flipped bits in transit. Verifying the assembled file alone isn't enough if you want to know which chunk went bad.
The approach: compute a checksum for each chunk before sending (SHA-256 by preference; MD5 detects accidental corruption but offers no protection against deliberate tampering), include it as a header or parameter, and verify it server-side.
```javascript
async function computeChunkHash(blob) {
  const buffer = await blob.arrayBuffer();
  const hashBuffer = await crypto.subtle.digest('SHA-256', buffer);
  const hashArray = Array.from(new Uint8Array(hashBuffer));
  return hashArray.map(b => b.toString(16).padStart(2, '0')).join('');
}
```
On the server, compare the received chunk's hash against the header value. If they don't match, reject that chunk — Resumable.js will retry it automatically.
You can also store per-chunk checksums and verify the full file after assembly by recomputing from the stored chunks. This is the same approach S3 uses for multipart upload integrity (the ETag on each part is an MD5).
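Recomputing the whole-file digest from stored chunks does not require a second pass over the assembled file: SHA-256 is a streaming hash, so feeding the chunks to one running hash object in upload order yields the digest of their concatenation. A sketch:

```python
import hashlib
from pathlib import Path

def hash_from_chunks(chunk_paths) -> str:
    """Compute the whole-file SHA-256 by streaming each stored chunk,
    in upload order, into a single running hash. The result equals
    hashing the assembled file."""
    sha = hashlib.sha256()
    for path in chunk_paths:
        sha.update(Path(path).read_bytes())
    return sha.hexdigest()
```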
## End-of-File Verification
After reassembling all chunks, verify the complete file:
- Signature check — re-read the magic bytes from the assembled file (not from the first chunk in isolation, though they should match).
- File size — compare against the totalSize declared by the client. A mismatch means chunks were lost or tampered with.
- Whole-file hash — if the client computes a SHA-256 of the entire original file before upload and sends it as metadata, compare after assembly. This catches any corruption or manipulation across the full pipeline.
- Format-specific validation — for images, try decoding the file. For PDFs, parse the trailer. A file can have valid magic bytes but be truncated or internally corrupt.
```python
import hashlib
import os

def verify_assembled_file(filepath, expected_size, expected_hash=None):
    actual_size = os.path.getsize(filepath)
    if actual_size != expected_size:
        return False, f'Size mismatch: expected {expected_size}, got {actual_size}'
    if expected_hash:
        sha = hashlib.sha256()
        with open(filepath, 'rb') as f:
            for block in iter(lambda: f.read(8192), b''):
                sha.update(block)
        if sha.hexdigest() != expected_hash:
            return False, 'Hash mismatch'
    return True, 'OK'
```
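Format-specific validation can sometimes be done with the standard library alone. For PNG, the chunk layout is simple enough to walk directly: every chunk carries a CRC-32, and a well-formed file ends with an IEND chunk. A structural sketch (it validates layout and checksums, not pixel data, and is not a substitute for a real decoder):

```python
import struct
import zlib

PNG_MAGIC = b'\x89PNG\r\n\x1a\n'

def png_structure_ok(data: bytes) -> bool:
    """Walk PNG chunks, verifying each CRC-32 and requiring a
    terminating IEND chunk. Catches truncation and bit corruption
    that a magic-byte check alone would miss."""
    if not data.startswith(PNG_MAGIC):
        return False
    pos = len(PNG_MAGIC)
    while pos + 8 <= len(data):
        length, ctype = struct.unpack('>I4s', data[pos:pos + 8])
        end = pos + 8 + length + 4  # header + payload + CRC
        if end > len(data):
            return False  # truncated chunk
        payload = data[pos + 8:pos + 8 + length]
        crc = struct.unpack('>I', data[pos + 8 + length:end])[0]
        if zlib.crc32(ctype + payload) != crc:
            return False  # corrupted chunk
        if ctype == b'IEND':
            return True
        pos = end
    return False  # ran out of bytes before IEND
```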
## Trust Boundaries
Draw a clear line: client validation is UX, server validation is security.
Client-side magic-byte checks exist to give users fast feedback. They prevent wasted bandwidth and improve the upload experience. They are not security controls.
Server-side validation exists to enforce policy. The server should validate file signatures, check sizes, scan for malware, and verify integrity regardless of what the client claims to have already done. See the security guide for the full threat model.
Your server receiver sits at the trust boundary. Everything before it — the browser, the network, the client-side code — is untrusted input.
## Common Pitfalls
Polyglot files are valid in multiple formats simultaneously. A file can be a valid JPEG and a valid ZIP at the same time — this is how some image-based exploits work. Checking magic bytes alone won't catch these. If your application processes uploads (resizes images, parses documents), use dedicated libraries that will reject malformed structures.
Zip bombs pass signature validation perfectly — they're valid ZIP files. A 42 KB zip bomb can expand to 4.5 petabytes. Never extract or decompress uploaded archives without size limits and recursion depth checks.
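A cheap pre-check is possible because the ZIP central directory declares uncompressed sizes before anything is extracted. A sketch with the stdlib zipfile module (the limits are illustrative; declared sizes can themselves be forged, so hard limits during actual extraction are still required, and nested archives need a separate recursion check):

```python
import zipfile

MAX_TOTAL_UNCOMPRESSED = 500 * 1024 * 1024  # 500 MB cap, illustrative
MAX_RATIO = 100  # reject > 100:1 expansion, illustrative

def zip_is_safe(path) -> bool:
    """Inspect the central directory and reject archives whose
    declared uncompressed size or compression ratio is suspicious.
    Runs before any byte is extracted."""
    with zipfile.ZipFile(path) as zf:
        infos = zf.infolist()
        total_out = sum(info.file_size for info in infos)
        total_in = sum(info.compress_size for info in infos)
        if total_out > MAX_TOTAL_UNCOMPRESSED:
            return False
        if total_in and total_out / total_in > MAX_RATIO:
            return False
    return True
```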
Files with valid headers but malicious content are the hardest to detect. An SVG file with embedded JavaScript, a PDF with auto-executing actions, a DOCX containing macros. Magic bytes only validate the container format, not the payload. For these, you need content-aware scanning — either antivirus integration or format-specific sanitization.
## Practical Pipeline
A complete validation pipeline for chunked uploads looks like this:
- Client: check magic bytes on file selection → reject or warn immediately
- Client: compute per-chunk SHA-256 → include as header on each chunk request
- Server: verify chunk checksum on receipt → reject corrupted chunks (triggers retry)
- Server: assemble chunks after all arrive → verify total size and whole-file hash
- Server: check magic bytes on assembled file → reject type mismatches
- Server: run format-specific validation → decode images, parse headers
- Server: scan with antivirus if applicable → quarantine flagged files
- Server: move to permanent storage only after all checks pass
Each layer catches a different class of problem. Client checks catch user mistakes. Chunk checksums catch transport corruption. Server validation catches deliberate attacks. Together they form a defense-in-depth approach that matches the file validation philosophy: never rely on a single check.
The overhead of this pipeline is minimal. Reading 16 bytes for magic checks is instant. SHA-256 of a 2 MB chunk takes a few milliseconds at most on modern hardware. The security value far exceeds the cost.
