safetar
*******

Hardened TAR extraction for Python - secure by default.

"safetar" is a zero-dependency, production-grade wrapper around
Python's "tarfile" module that defends against the most common
TAR-based attacks: TarSlip path traversal, decompression bombs,
symlink/hardlink attacks, device file injection, and crafted archives.


Features
========

* **TarSlip protection** - relative traversal, absolute paths, Unicode
  NFC normalisation attacks, PAX path overrides, GNU long-name
  reassembly, and null bytes in filenames are all blocked.

* **Decompression bomb protection** - archive-level compression ratio
  monitoring across GZ, BZ2, and XZ streams aborts extraction before
  runaway decompression can exhaust disk or memory.

* **File size limits** - per-member and total extraction size limits
  enforced at stream time (not based on untrusted header values).

* **Symlink policy** - configurable: "REJECT" (default), "IGNORE", or
  "RESOLVE_INTERNAL" (full chain verification with TOCTOU defence via
  deferred batch creation).

* **Hardlink policy** - configurable: "REJECT" (default) or "INTERNAL"
  (target must exist on disk; forward references rejected).

* **Forbidden entry types** - character devices, block devices, FIFOs,
  and unknown type codes are always rejected.

* **setuid/setgid/sticky bit stripping** - dangerous permission bits
  are removed by default.

* **UID/GID ownership clamping** - archived ownership is clamped to
  the current user by default.

* **Timestamp sanitisation** - mtime values are clamped to
  "[0, 2**32 - 1]".

* **Sparse file policy** - "REJECT" (default) or "MATERIALISE"
  (extract as dense).
* **Atomic writes** - every member is written to a temporary file
  first; the destination is only created after all checks pass. No
  partial files are left on disk after a security abort.

* **Secure by default** - all limits are active without any
  configuration.

* **Zero dependencies** - standard library only.

* **Python 3.12 data_filter** - applied as an additional defensive
  layer when available.


Prerequisites
=============

Python 3.10 or later. No additional packages required.


Installation
============

With "uv":

   uv pip install safetar

Or with "pip":

   pip install safetar


Quick start
===========

Drop-in replacement for the common "tarfile" extraction pattern:

   from safetar import safe_extract

   safe_extract("path/to/upload.tar.gz", "/var/files/extracted/")

Or use the "SafeTarFile" context manager for more control:

   from safetar import SafeTarFile

   with SafeTarFile("path/to/upload.tar.gz") as stf:
       print(stf.getnames())
       stf.extractall("/var/files/extracted/")


Custom limits
=============

See the Default limits section for the built-in values.

   from safetar import SafeTarFile, SymlinkPolicy, HardlinkPolicy

   with SafeTarFile(
       "path/to/upload.tar.gz",
       max_file_size=100 * 1024 * 1024,          # 100 MiB per member (default: 1 GiB)
       max_total_size=500 * 1024 * 1024,         # 500 MiB total (default: 5 GiB)
       max_files=1_000,                          # (default: 10 000)
       max_ratio=50.0,                           # (default: 200)
       symlink_policy=SymlinkPolicy.IGNORE,      # (default: SymlinkPolicy.REJECT)
       hardlink_policy=HardlinkPolicy.INTERNAL,  # (default: HardlinkPolicy.REJECT)
   ) as stf:
       stf.extractall("/var/files/extracted/")


Recursive extraction
====================

When an archive contains nested ".tar" files, set "recursive=True" to
descend into them automatically. All safety limits apply at every
level. Each nested archive is extracted into a directory named after
it (without the extension). The nested ".tar" file is removed from
disk after recursive extraction (see "_extract_nested_archive" in
"_core.py").

   from safetar import SafeTarFile

   # archive.tar
   #   readme.txt
   #   inner.tar          ← will be descended into, not extracted as a blob
   #     inner_file.txt

   with SafeTarFile("path/to/archive.tar.gz", recursive=True, max_nesting_depth=3) as stf:
       stf.extractall("/var/files/extracted/")

   # Result on disk:
   #   /var/files/extracted/readme.txt
   #   /var/files/extracted/inner/inner_file.txt

By default, "recursive=False" and nested tar archives are extracted as
regular files. When "recursive=True", safetar detects and extracts
nested tar archives automatically using content-based detection
("tarfile.is_tarfile()"), avoiding extension-spoofing attacks.

All security protections are applied to nested archives:

* Nesting depth is enforced ("max_nesting_depth")

* File size limits apply across all nested extractions
  ("max_file_size", "max_total_size")

* Symlink, hardlink, and sparse policies are enforced

* Permission, ownership, and timestamp sanitisation is applied

* All other security checks (path traversal, decompression bombs,
  etc.)
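
The content-based detection mentioned above can be illustrated with the standard library alone. The following is a minimal sketch, not safetar's implementation: "looks_like_tar" and "demo" are hypothetical names used only for this example, and the only library call relied on is "tarfile.is_tarfile()".

```python
import io
import tarfile
import tempfile
from pathlib import Path


def looks_like_tar(path: Path) -> bool:
    """Hypothetical helper: decide by content, never by file extension."""
    return path.is_file() and tarfile.is_tarfile(path)


def demo() -> dict[str, bool]:
    """Build two files with misleading names and classify them by content."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)

        # A real tar archive hiding behind a ".txt" extension.
        disguised = root / "report.txt"
        payload = b"hello"
        with tarfile.open(disguised, "w") as tf:
            info = tarfile.TarInfo("inner_file.txt")
            info.size = len(payload)
            tf.addfile(info, io.BytesIO(payload))

        # Plain text that merely claims to be a tar archive.
        fake = root / "fake.tar"
        fake.write_bytes(b"this is not a tar archive\n")

        return {p.name: looks_like_tar(p) for p in (disguised, fake)}


if __name__ == "__main__":
    print(demo())
```

Because the decision depends only on what the file contains, renaming a file neither hides a nested archive nor makes ordinary data look like one.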


Security event monitoring
=========================

   from safetar import SafeTarFile, SecurityEvent

   def my_monitor(event: SecurityEvent) -> None:
       print(f"[safetar] {event.event_type} archive={event.archive_hash}")

   with SafeTarFile(
       "path/to/upload.tar.gz",
       on_security_event=my_monitor
   ) as stf:
       stf.extractall("/var/files/extracted/")


Default limits
==============

+----------------------------+--------------------+
| Parameter                  | Default            |
|============================|====================|
| "max_file_size"            | 1 GiB              |
+----------------------------+--------------------+
| "max_total_size"           | 5 GiB              |
+----------------------------+--------------------+
| "max_files"                | 10 000             |
+----------------------------+--------------------+
| "max_ratio"                | 200                |
+----------------------------+--------------------+
| "max_nesting_depth"        | 3                  |
+----------------------------+--------------------+
| "recursive"                | False              |
+----------------------------+--------------------+
| "symlink_policy"           | REJECT             |
+----------------------------+--------------------+
| "hardlink_policy"          | REJECT             |
+----------------------------+--------------------+
| "sparse_policy"            | REJECT             |
+----------------------------+--------------------+
| "strip_special_bits"       | True               |
+----------------------------+--------------------+
| "preserve_ownership"       | False              |
+----------------------------+--------------------+
| "clamp_timestamps"         | True               |
+----------------------------+--------------------+


Environment variable configuration
==================================

See the Default limits section for the built-in values.

Every default can be overridden at process start via environment
variables, without modifying call sites. Explicit constructor
arguments always take precedence over environment variables.

+-----------------------------------------+-----------------------------+
| Environment variable                    | Parameter                   |
|=========================================|=============================|
| "SAFETAR_MAX_FILE_SIZE"                 | "max_file_size"             |
+-----------------------------------------+-----------------------------+
| "SAFETAR_MAX_TOTAL_SIZE"                | "max_total_size"            |
+-----------------------------------------+-----------------------------+
| "SAFETAR_MAX_FILES"                     | "max_files"                 |
+-----------------------------------------+-----------------------------+
| "SAFETAR_MAX_RATIO"                     | "max_ratio"                 |
+-----------------------------------------+-----------------------------+
| "SAFETAR_MAX_NESTING_DEPTH"             | "max_nesting_depth"         |
+-----------------------------------------+-----------------------------+
| "SAFETAR_RECURSIVE"                     | "recursive"                 |
+-----------------------------------------+-----------------------------+
| "SAFETAR_SYMLINK_POLICY"                | "symlink_policy"            |
+-----------------------------------------+-----------------------------+
| "SAFETAR_HARDLINK_POLICY"               | "hardlink_policy"           |
+-----------------------------------------+-----------------------------+
| "SAFETAR_SPARSE_POLICY"                 | "sparse_policy"             |
+-----------------------------------------+-----------------------------+
| "SAFETAR_STRIP_SPECIAL_BITS"            | "strip_special_bits"        |
+-----------------------------------------+-----------------------------+
| "SAFETAR_PRESERVE_OWNERSHIP"            | "preserve_ownership"        |
+-----------------------------------------+-----------------------------+
| "SAFETAR_CLAMP_TIMESTAMPS"              | "clamp_timestamps"          |
+-----------------------------------------+-----------------------------+

Integer and float variables accept standard numeric strings. Boolean
variables accept "1" / "true" / "yes" / "on" (truthy) or "0" / "false"
/ "no" / "off" (falsy), case-insensitively. Policy variables accept
the lower-case enum value names (e.g.
"SAFETAR_SYMLINK_POLICY=resolve_internal").
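
The parsing rules above can be sketched in a few lines of standard-library Python. This is an illustration only, not safetar's actual code: "env_bool" and "env_int" are hypothetical helper names, and the sketch assumes that a value which does not parse falls back to the supplied default.

```python
import os

# Accepted spellings, compared case-insensitively.
_TRUTHY = {"1", "true", "yes", "on"}
_FALSY = {"0", "false", "no", "off"}


def env_bool(name: str, default: bool) -> bool:
    """Hypothetical helper: read a boolean override from the environment."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    value = raw.lower()
    if value in _TRUTHY:
        return True
    if value in _FALSY:
        return False
    # Unrecognised value: keep the default rather than raising.
    return default


def env_int(name: str, default: int) -> int:
    """Hypothetical helper: read an integer override from the environment."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    try:
        return int(raw)
    except ValueError:
        # Unparseable value: keep the default rather than raising.
        return default
```

An explicit constructor argument would simply bypass these helpers, which matches the stated precedence of arguments over environment variables.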

Unrecognised or unparseable values are silently ignored and the
built-in default is used instead.


CLI
===

"safetar" ships with a CLI for quick extraction:

   # Extract an archive
   safetar extract path/to/archive.tar.gz /var/files/extracted/

   # List archive contents
   safetar list path/to/archive.tar.gz

   # Extract with custom limits
   safetar extract archive.tar /output/ \
       --max-file-size 104857600 \
       --max-total-size 524288000 \
       --max-files 1000

   # Enable recursive extraction
   safetar extract archive.tar /output/ --recursive

   # Show help
   safetar --help

The CLI supports all the same security options as the Python API.


Testing
=======

All tests run inside Docker to prevent accidental pollution of the
host system:

   make test

To test a specific Python version:

   make test-env ENV=py312


Writing documentation
=====================

Keep the following hierarchy:

   =====
   title
   =====

   header
   ======

   sub-header
   ----------

   sub-sub-header
   ~~~~~~~~~~~~~~

   sub-sub-sub-header
   ^^^^^^^^^^^^^^^^^^

   sub-sub-sub-sub-header
   ++++++++++++++++++++++

   sub-sub-sub-sub-sub-header
   **************************


License
=======

MIT


Support
=======

For security issues, contact me at the e-mail address given in the
Author section.

For all other issues, use GitHub.


Author
======

Artur Barseghyan


Project documentation
=====================

Contents:


Table of Contents
^^^^^^^^^^^^^^^^^

* safetar

  * Features
  * Prerequisites
  * Installation
  * Quick start
  * Custom limits
  * Recursive extraction
  * Security event monitoring
  * Default limits
  * Environment variable configuration
  * CLI
  * Testing
  * Writing documentation
  * License
  * Support
  * Author
  * Project documentation

* Security Policy

  * Reporting a Vulnerability
  * Supported Versions

* Contributor guidelines

  * Developer prerequisites
  * Code standards
  * Virtual environment
  * Testing
  * Releases
  * Adding tests
  * Pull requests
  * GitHub Actions
  * Questions
  * Issues

* Release history and notes

  * 0.1.2
  * 0.1.1
  * 0.1

* Package

  * Indices and tables

* Project source-tree

  * .coderabbit.yaml
  * .coveralls.yml
  * AGENTS.md
  * CONTRIBUTING.rst
  * README.rst
  * conftest.py
  * docker-compose.yml
  * pyproject.toml
  * src/safetar/__init__.py
  * src/safetar/_core.py
  * src/safetar/_events.py
  * src/safetar/_exceptions.py
  * src/safetar/_guard.py
  * src/safetar/_sandbox.py
  * src/safetar/_streamer.py
  * src/safetar/cli/__init__.py
  * src/safetar/cli/_main.py
  * src/safetar/tests/__init__.py
  * src/safetar/tests/conftest.py
  * src/safetar/tests/test_cli.py
  * src/safetar/tests/test_guard.py
  * src/safetar/tests/test_integration.py
  * src/safetar/tests/test_sandbox.py
  * src/safetar/tests/test_streamer.py