| field | value | date |
|---|---|---|
| author | Ken D'Ambrosio <ken@jots.org> | 2026-06-08 17:09:51 +0000 |
| committer | Ken D'Ambrosio <ken@jots.org> | 2026-06-08 17:09:51 +0000 |
| commit | da28a20f091372375822f9dde4486ecade859e7e | |
| tree | 80d02f26c1b9d52f1a09e36f5d8946b1e3fedf6a | |
| parent | 4ba9f6451f5ab1e5ae95c0871d6fa594f49372cc | |

Add opt-in facial recognition: detection and embedding storage

- scripts/faces.py: Python helper using face_recognition (dlib/HOG) to
  detect faces and return 128-D encodings as JSON; called by update.rb
- scripts/update.rb: enrich_faces() stores face boxes and encodings in
  album.json per image (null = not yet processed, [] = processed/none found);
  skips files already processed; gated on faces.enabled in config.yml
- Reads CONFIG_PATH (same env var as app.rb) to check faces.enabled flag
- Feature is off by default; enabled in this install via config.yml
- README.md, DESIGN.md: document installation, opt-in config, data model,
  and planned clustering/people-management pipeline

People management UI and clustering script are the next milestone.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

| mode | file | lines changed |
|---|---|---|
| -rw-r--r-- | DESIGN.md | 85 |
| -rw-r--r-- | README.md | 38 |
| -rw-r--r-- | scripts/faces.py | 51 |
| -rw-r--r-- | scripts/update.rb | 31 |

4 files changed, 202 insertions, 3 deletions
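
The commit message gives the per-image `faces` field three states: `null` (not yet processed), `[]` (processed, no faces found), and a non-empty array of detected faces. Only the first state should trigger detection. The commit implements that check in Ruby inside `enrich_faces`; the Python sketch below restates the same rule for illustration. `needs_face_scan` is a hypothetical helper, and the flat filename-to-metadata mapping is an assumed simplification of `album.json`, whose full layout is not shown in this commit.

```python
import json


def needs_face_scan(meta: dict) -> bool:
    """Return True only when 'faces' is missing or null, i.e. not yet processed.

    An empty list means "processed, none found" and must NOT trigger a rescan.
    """
    return meta.get("faces") is None


# Assumed, simplified album layout: filename -> metadata dict.
album = json.loads(
    '{"a.jpg": {"faces": null}, "b.jpg": {"faces": []}, "c.jpg": {}}'
)
pending = [name for name, meta in album.items() if needs_face_scan(meta)]
```

Here `pending` contains `a.jpg` and `c.jpg` but not `b.jpg`. Testing for `None` rather than truthiness is the point of the design: a plain `if not meta.get("faces")` would treat `[]` as unprocessed and rescan face-free photos on every run.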
diff --git a/DESIGN.md b/DESIGN.md
--- a/DESIGN.md
+++ b/DESIGN.md
@@ -116,6 +116,32 @@ Password hashing uses `OpenSSL::PKCS5.pbkdf2_hmac` from Ruby's standard library
 | **ExifTool** | Backing tool for MiniExiftool — must be installed on the server |
 | **ffmpeg** | Video thumbnail extraction (frame at 2 s) and duration probing via `ffprobe` |
 
+### Optional: facial recognition
+
+| Component | Purpose |
+|-----------|---------|
+| **Python 3** | Runtime for `scripts/faces.py` |
+| **face_recognition** (PyPI) | dlib-backed face detection and 128-D embedding extraction |
+| `/opt/albumen/venv/` | Python virtual environment isolating the dependency |
+
+Install (one-time, takes ~30 min to compile dlib on CPU):
+
+```bash
+apt install python3-pip python3-dev python3-venv cmake build-essential libopenblas-dev liblapack-dev
+python3 -m venv /opt/albumen/venv
+/opt/albumen/venv/bin/pip install face_recognition
+```
+
+Enable by adding to `config.yml`:
+
+```yaml
+faces:
+  enabled: true
+```
+
+When disabled (or when the venv doesn't exist), `update.rb` simply skips face detection and the
+rest of the app is unaffected.
+
 ---
 
 ## Data Model — `album.json`
@@ -164,6 +190,7 @@ used. The file is written atomically (write to a `.tmp` file, then
 | `taken_at` | `null` | ISO 8601 timestamp from EXIF; used for chronological sorting |
 | `width` / `height` | `null` | Pixel dimensions recorded by `update.rb` |
 | `exif_absent` | `null` | Set to `true` by `update.rb` when exiftool found no metadata; skips re-extraction on future rescans |
+| `faces` | `null` | Set by `update.rb` when `faces.enabled`; array of `{"box": [top,right,bottom,left], "encoding": [128 floats]}` per detected face; `[]` means processed with no faces found; `null` means not yet processed |
 
 When `taken_at` is present on *any* file in an album, the entire album is
 sorted chronologically. Albums with no `taken_at` data stay in filename
@@ -385,6 +412,64 @@ to the `albumen` user so the web app can read the files.
 
 ---
 
+## Facial Recognition (opt-in)
+
+Enabled by setting `faces.enabled: true` in `config.yml`. When disabled,
+no Python is invoked and no face data is stored or displayed.
+
+### Detection pipeline
+
+`update.rb` calls `enrich_faces` for each image file where `meta['faces']`
+is `nil` (not yet processed). It shells out to `scripts/faces.py`, which:
+
+1. Loads the image with `face_recognition.load_image_file` (handles JPEG,
+   PNG, HEIC, etc. via Pillow).
+2. Runs `face_locations(model="hog")` — the HOG model is fast on CPU and
+   accurate for frontal/near-frontal faces. (The CNN model is more accurate
+   but requires a GPU to be practical.)
+3. For each detected location, calls `face_encodings` to produce a
+   128-dimensional L2-normalised embedding vector.
+4. Prints a JSON array to stdout; on any error prints `[]` so `update.rb`
+   always gets valid JSON.
+
+The result is stored as `meta['faces']` in `album.json`. An empty array
+`[]` means "processed, no faces found" and prevents re-processing. A `null`
+value means "not yet processed."
+
+Encodings are stored in full (128 floats each) to allow re-clustering
+without reprocessing all images.
+
+### Clustering and people management (planned)
+
+A second pass (`scripts/cluster_faces.rb`) will:
+
+1. Walk all `album.json` files and collect every `{encoding, source_file,
+   box}` tuple.
+2. Cluster them with a threshold distance (~0.6 in L2 space, empirically
+   good for dlib encodings).
+3. Write `/var/albumen/people.json` — a map of stable UUIDs → cluster
+   metadata (name, representative encoding, member list).
+
+The admin `/admin/people` UI will let you:
+- Name unidentified clusters ("Who is this?").
+- Merge two clusters that are the same person.
+- Remove a photo from a cluster (false positive).
+
+Public `/people` and `/people/:id` routes will let any visitor browse by
+person.
+
+### Performance notes
+
+- HOG face detection: ~0.5–2 s per image on a single CPU core.
+- A library of 10,000 images takes ~3–6 hours to index fully, but the
+  sentinel-based skip means subsequent `update.rb` runs only process new
+  photos.
+- Encodings stored in `album.json` are ~3.5 KB per face. A library with
+  an average of 2 faces per photo adds ~70 MB of JSON across 10,000 photos
+  — negligible.
+
+---
+
 ## Security
 
 **Path traversal:** `resolve_dir` and `resolve_file` call `File.expand_path`
diff --git a/README.md b/README.md
--- a/README.md
+++ b/README.md
@@ -35,6 +35,11 @@ back end, plain HTML/CSS/JS front end. Live at **https://albumen.jots.org**.
 - **Run Update** button scans for new/removed files and generates missing thumbnails;
   **Force rescan all** checkbox bypasses the sentinel and rescans every directory
 
+### Facial recognition (opt-in)
+- Detects faces in photos and stores 128-D embeddings alongside each image
+- Powered by [face_recognition](https://github.com/ageitgey/face_recognition) (dlib/HOG, CPU-only)
+- People management and browse-by-person UI in progress
+
 ### Media support
 
 | Category | Extensions |
@@ -169,6 +174,39 @@ The PBKDF2-SHA256 hash is stored in `/opt/albumen/config.yml` (readable only by
 
 ---
 
+## Facial recognition setup
+
+Face detection is opt-in. Install once, then enable in `config.yml`.
+
+### 1. Install Python dependencies (server, ~30 min first time)
+
+```bash
+apt install python3-pip python3-dev python3-venv cmake build-essential libopenblas-dev liblapack-dev
+python3 -m venv /opt/albumen/venv
+/opt/albumen/venv/bin/pip install face_recognition
+```
+
+### 2. Enable in config.yml
+
+Add to `/opt/albumen/config.yml`:
+
+```yaml
+faces:
+  enabled: true
+```
+
+### 3. Run the update script
+
+```bash
+ruby /opt/albumen/scripts/update.rb
+```
+
+The update script will now detect faces in images and store bounding boxes and
+embeddings in each album's `album.json`. This is a one-time cost per image;
+subsequent runs skip already-processed photos.
+
+---
+
 ## Service management
 
 ```bash
diff --git a/scripts/faces.py b/scripts/faces.py
new file mode 100644
index 0000000..d072376
--- /dev/null
+++ b/scripts/faces.py
@@ -0,0 +1,51 @@
+#!/usr/bin/env python3
+"""
+Detect faces in an image and return their bounding boxes and 128-D encodings.
+
+Usage: python3 faces.py <image_path>
+
+Stdout: JSON array — one object per face:
+  [{"box": [top, right, bottom, left], "encoding": [128 floats]}, ...]
+
+Returns "[]" when no faces are found or the image cannot be opened.
+Errors are written to stderr; stdout is always valid JSON.
+"""
+import sys
+import json
+
+
+def main():
+    if len(sys.argv) < 2:
+        print("[]")
+        return
+
+    path = sys.argv[1]
+    try:
+        import face_recognition
+    except ImportError as e:
+        print(f"face_recognition not available: {e}", file=sys.stderr)
+        print("[]")
+        return
+
+    try:
+        img = face_recognition.load_image_file(path)
+    except Exception as e:
+        print(f"Could not load {path}: {e}", file=sys.stderr)
+        print("[]")
+        return
+
+    try:
+        locations = face_recognition.face_locations(img, model="hog")
+        encodings = face_recognition.face_encodings(img, locations)
+        result = [
+            {"box": list(loc), "encoding": enc.tolist()}
+            for loc, enc in zip(locations, encodings)
+        ]
+        print(json.dumps(result))
+    except Exception as e:
+        print(f"Detection error for {path}: {e}", file=sys.stderr)
+        print("[]")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/scripts/update.rb b/scripts/update.rb
index 822405f..d6effe5 100644
--- a/scripts/update.rb
+++ b/scripts/update.rb
@@ -23,9 +23,10 @@ require 'fileutils'
 require 'mini_magick'
 require 'mini_exiftool'
 
-MEDIA_ROOT = (ENV['MEDIA_ROOT'] || '/var/albumen').freeze
-CACHE_ROOT = (ENV['CACHE_ROOT'] || '/opt/albumen/cache/thumbs').freeze
-THUMB_SIZE = 300
+MEDIA_ROOT  = (ENV['MEDIA_ROOT'] || '/var/albumen').freeze
+CACHE_ROOT  = (ENV['CACHE_ROOT'] || '/opt/albumen/cache/thumbs').freeze
+CONFIG_PATH = (ENV['CONFIG_PATH'] || '/opt/albumen/config.yml').freeze
+THUMB_SIZE  = 300
 
 IMAGE_EXTS = %w[jpg jpeg png gif webp heic heif tiff bmp].freeze
 VIDEO_EXTS = %w[mp4 mov avi mkv webm m4v ogv].freeze
@@ -34,6 +35,11 @@ MEDIA_EXTS = (IMAGE_EXTS + VIDEO_EXTS + AUDIO_EXTS).freeze
 TRANSCODE_EXTS = %w[avi mkv mov].freeze # not universally browser-playable; convert to MP4
 SENTINEL_FILE = '.albumen_scanned'.freeze
 
+_cfg = File.exist?(CONFIG_PATH) ? YAML.load_file(CONFIG_PATH, symbolize_names: true) : {}
+FACES_ENABLED = (_cfg.dig(:faces, :enabled) == true).freeze
+VENV_PYTHON   = File.expand_path('../venv/bin/python3', __dir__).freeze
+FACES_SCRIPT  = File.expand_path('faces.py', __dir__).freeze
+
 # Explicit directory argument implies force — you asked for it, it should run.
 FORCE_UPDATE = !!(ARGV.delete('--force') || ARGV[0])
 
@@ -171,6 +177,25 @@ def enrich_image(full, name, meta)
       warn " #{name}: dimension error — #{e.message}"
     end
   end
+
+  enrich_faces(full, name, meta)
+end
+
+def enrich_faces(full, name, meta)
+  return unless FACES_ENABLED
+  return unless meta['faces'].nil? # already processed ([] means "processed, none found")
+  return unless File.exist?(VENV_PYTHON) && File.exist?(FACES_SCRIPT)
+
+  begin
+    out = IO.popen([VENV_PYTHON, FACES_SCRIPT, full], err: '/dev/null', &:read).strip
+    faces = JSON.parse(out.empty? ? '[]' : out)
+    if faces.is_a?(Array)
+      meta['faces'] = faces
+      puts " #{name}: #{faces.length} face(s)" unless faces.empty?
+    end
+  rescue StandardError => e
+    warn " #{name}: face detection error — #{e.message}"
+  end
 end
 
 def enrich_video(full, name, meta)
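
DESIGN.md describes the planned clustering pass only as grouping encodings "with a threshold distance (~0.6 in L2 space)" and names a future Ruby script for it; the algorithm itself is not specified in this commit. As a rough illustration of what threshold clustering over the stored encodings could look like, here is a minimal greedy sketch in Python, matching `faces.py`. The function names and the choice of each cluster's first member as its representative are assumptions made for the example, not the project's design.

```python
import math
from typing import List, Sequence

THRESHOLD = 0.6  # distance cutoff quoted in DESIGN.md for dlib encodings


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def greedy_cluster(
    encodings: List[Sequence[float]], threshold: float = THRESHOLD
) -> List[List[int]]:
    """Group encodings by distance to a cluster representative.

    Each encoding joins the first cluster whose representative (assumed here
    to be that cluster's first member) is within `threshold`; otherwise it
    starts a new cluster. Returns lists of indices into `encodings`.
    """
    clusters: List[List[int]] = []
    for i, enc in enumerate(encodings):
        for members in clusters:
            if l2_distance(enc, encodings[members[0]]) <= threshold:
                members.append(i)
                break
        else:  # no existing cluster was close enough
            clusters.append([i])
    return clusters


# Toy 3-D vectors stand in for real 128-D encodings.
toy = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [1.0, 1.0, 1.0], [1.0, 1.2, 1.0]]
groups = greedy_cluster(toy)
```

With the toy data, `groups` is `[[0, 1], [2, 3]]`. A greedy pass like this depends on input order and on how the representative is chosen, which is one reason the planned admin UI's merge and remove actions would still be needed to correct results by hand.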
