Loading YAML configs safely in CLI apps
[project]
name = "secure-yaml-cli"
version = "1.0.0"
requires-python = ">=3.10"
dependencies = [
"pyyaml>=6.0",
"pydantic>=2.0",
]
pip install pyyaml pydantic
The Arbitrary Code Execution Vulnerability in YAML
Legacy yaml.load() implementations default to unsafe deserialization. Crafted files can trigger !!python/object/apply: tags during parsing. These payloads bypass standard input boundaries and execute arbitrary system commands. When architecting robust Advanced Input Parsing & User Experience pipelines, treating external config files as untrusted input is non-negotiable.
- PyYAML's legacy yaml.load() supports arbitrary Python object instantiation by default
- Untrusted configs can trigger ConstructorError or silent RCE
- Always isolate config parsing from execution logic
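To make the risk concrete, here is a minimal sketch of how a safe loader reacts to a hostile document. The payload string and the helper name try_safe_load are illustrative assumptions, not part of any library. Nothing is executed, because the safe loader refuses the tag before any Python object is built.

```python
import yaml

# Hypothetical hostile document: it asks the loader to call os.system("echo pwned").
PAYLOAD = '!!python/object/apply:os.system ["echo pwned"]'


def try_safe_load(text: str) -> str:
    """Return 'rejected' if safe_load refuses the document, else 'loaded'."""
    try:
        yaml.safe_load(text)
    except yaml.constructor.ConstructorError:
        # The safe loader has no constructor for Python-specific tags.
        return "rejected"
    return "loaded"


print(try_safe_load(PAYLOAD))       # hostile tag
print(try_safe_load("port: 8080"))  # ordinary mapping
```

A CLI would normally catch this exception at the config boundary and exit with a clear message, rather than let the traceback reach the user.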
Secure Loading Pattern with yaml.safe_load()
Replace yaml.load() with yaml.safe_load() to restrict deserialization to standard primitives. This function explicitly blocks custom constructors and object instantiation. Pair it with pathlib for deterministic file resolution. Use context managers for safe I/O handling.
import yaml
from pathlib import Path


def load_config(config_path: Path) -> dict:
    if not config_path.exists():
        raise FileNotFoundError(f"Config not found: {config_path}")
    with open(config_path, "r", encoding="utf-8") as f:
        # safe_load blocks !!python/object tags by default
        return yaml.safe_load(f) or {}
- safe_load() is the only recommended entry point for external YAML
- Always specify encoding="utf-8" to prevent locale decoding errors
- Return empty dict on None to avoid TypeError on iteration
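One gap worth noting: safe_load() can also return a list or a scalar when the file's top level is not a mapping, and the `or {}` fallback does not catch that. The sketch below shows one way to guard against it. It assumes a hypothetical helper named parse_mapping that works on YAML text rather than a path.

```python
import yaml


def parse_mapping(text: str) -> dict:
    """Parse YAML text and insist that the top level is a mapping."""
    data = yaml.safe_load(text)
    if data is None:
        # Empty input, or input that contains only comments.
        return {}
    if not isinstance(data, dict):
        raise ValueError(
            f"Expected a mapping at the top level, got {type(data).__name__}"
        )
    return data


print(parse_mapping("host: example.org\nport: 8080"))
```

Rejecting non-mapping documents at this point keeps later code from failing with a less helpful error when it tries to unpack the result as keyword arguments.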
Strict Schema Validation with Pydantic (Python 3.10+)
Parsed YAML lacks type guarantees. Injecting Pydantic models immediately after loading enforces strict contracts. This approach provides actionable validation errors and aligns with production standards for Handling Configuration Files & Env Vars. It prevents silent misconfigurations from propagating into CLI execution.
from pydantic import BaseModel, Field, ValidationError
from typing import Optional


class CLIConfig(BaseModel):
    host: str = Field(default="localhost", pattern=r"^[a-zA-Z0-9.-]+$")
    port: int = Field(ge=1024, le=65535)
    debug: bool = False
    retries: Optional[int] = Field(default=3, ge=0)


# Usage after safe_load
raw_data = load_config(Path("config.yaml"))
try:
    config = CLIConfig(**raw_data)
except ValidationError as e:
    print(f"Invalid config: {e}")
    raise SystemExit(1)
- Pydantic v2 uses pattern for regex validation
- Fail-fast validation prevents runtime crashes
- Type coercion handles string-to-int conversions automatically
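The points above can be sketched as a small helper that turns a ValidationError into one short line per problem, which tends to read better in a terminal than the default multi-line dump. The helper name describe_errors and the trimmed CLIConfig model are assumptions made for illustration. The sketch relies on Pydantic v2's default (lax) mode for the string-to-int coercion.

```python
from pydantic import BaseModel, Field, ValidationError


class CLIConfig(BaseModel):
    # Trimmed copy of the model above, kept small for the example.
    host: str = Field(default="localhost", pattern=r"^[a-zA-Z0-9.-]+$")
    port: int = Field(ge=1024, le=65535)
    debug: bool = False


def describe_errors(raw: dict) -> list[str]:
    """Return one 'field: message' line per validation problem (empty if valid)."""
    try:
        CLIConfig(**raw)
    except ValidationError as exc:
        return [
            f"{'.'.join(str(part) for part in err['loc'])}: {err['msg']}"
            for err in exc.errors()
        ]
    return []


# Lax mode coerces the numeric string "8080" to the integer 8080.
print(CLIConfig(port="8080").port)

# A privileged port violates ge=1024 and is reported against the field name.
for line in describe_errors({"port": 80}):
    print(line)
```

If string-to-int coercion is not wanted, Pydantic v2 also offers a strict mode. Whether to enable it depends on how forgiving the CLI should be about quoted values in YAML.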
Exact Error Resolution: ConstructorError & SafeLoader
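Migrating legacy code often surfaces yaml.constructor.ConstructorError. The message reads: could not determine a constructor for the tag 'tag:yaml.org,2002:python/object/apply:os.system' (PyYAML expands the !! shorthand to the tag:yaml.org,2002: prefix). The error is raised when a safe loader meets a Python-specific tag it refuses to construct, so it usually appears right after a switch to safe_load() on files that relied on such tags. A related migration failure: yaml.load() without an explicit Loader emits a YAMLLoadWarning on PyYAML 5.1 and later 5.x releases, and raises TypeError on PyYAML 6.0+, where Loader is a required argument. Malicious payloads frequently target older versions. Resolution requires two steps. First, replace yaml.load(data) with yaml.safe_load(data). Second, if legacy constraints force yaml.load(), explicitly pass Loader=yaml.SafeLoader. Never use yaml.FullLoader or yaml.UnsafeLoader for user-provided files.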
# LEGACY (VULNERABLE)
# config = yaml.load(data)
# FIXED (SAFE)
config = yaml.safe_load(data)
# OR explicitly:
# config = yaml.load(data, Loader=yaml.SafeLoader)
- Exact error: ConstructorError: could not determine a constructor for the tag...
- Fix: yaml.safe_load() or yaml.load(data, Loader=yaml.SafeLoader)
- Audit all third-party CLI plugins that consume YAML configs
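For the audit step, a rough starting point is a static scan for yaml.load(...) calls that pass no Loader. The sketch below uses only the standard-library ast module. The function name find_unsafe_yaml_loads is hypothetical. The scan only recognizes calls written literally as yaml.load, so aliased imports such as `from yaml import load` are missed, and calls that explicitly pass FullLoader or UnsafeLoader are not flagged.

```python
import ast


def find_unsafe_yaml_loads(source: str) -> list[int]:
    """Return line numbers of yaml.load(...) calls that pass no Loader."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        is_yaml_load = (
            isinstance(func, ast.Attribute)
            and func.attr == "load"
            and isinstance(func.value, ast.Name)
            and func.value.id == "yaml"
        )
        if not is_yaml_load:
            continue
        # Loader may be given as the second positional argument or by keyword.
        has_loader = len(node.args) >= 2 or any(
            kw.arg == "Loader" for kw in node.keywords
        )
        if not has_loader:
            hits.append(node.lineno)
    return sorted(hits)


SAMPLE = (
    "import yaml\n"
    "a = yaml.load(data)\n"
    "b = yaml.load(data, Loader=yaml.SafeLoader)\n"
    "c = yaml.safe_load(data)\n"
)
print(find_unsafe_yaml_loads(SAMPLE))
```

Running a check like this over each plugin's source gives a shortlist for manual review. It does not replace reading how the plugin actually handles its config files.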