Add opt-in facial recognition: detection and embedding storage

- scripts/faces.py: Python helper using face_recognition (dlib/HOG) to
  detect faces and return 128-D encodings as JSON; called by update.rb
- scripts/update.rb: enrich_faces() stores face boxes and encodings in
  album.json per image (null = not yet processed, [] = processed/none
  found); skips files already processed; gated on faces.enabled in
  config.yml
- Reads CONFIG_PATH (same env var as app.rb) to check faces.enabled flag
- Feature is off by default; enabled in this install via config.yml
- README.md, DESIGN.md: document installation, opt-in config, data model,
  and planned clustering/people-management pipeline

People management UI and clustering script are the next milestone.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
author: Ken D'Ambrosio <ken@jots.org> 2026-06-08 17:09:51 +0000
committer: Ken D'Ambrosio <ken@jots.org> 2026-06-08 17:09:51 +0000
commit: da28a20f091372375822f9dde4486ecade859e7e (patch)
tree: 80d02f26c1b9d52f1a09e36f5d8946b1e3fedf6a
parent: 4ba9f6451f5ab1e5ae95c0871d6fa594f49372cc (diff)
4 files changed, 202 insertions, 3 deletions
diff --git a/DESIGN.md b/DESIGN.md
index 7639ffc..a7f368c 100644
--- a/DESIGN.md
+++ b/DESIGN.md
@@ -116,6 +116,32 @@ Password hashing uses `OpenSSL::PKCS5.pbkdf2_hmac` from Ruby's standard library
 | **ExifTool** | Backing tool for MiniExiftool — must be installed on the server |
 | **ffmpeg** | Video thumbnail extraction (frame at 2 s) and duration probing via `ffprobe` |
 
+### Optional: facial recognition
+
+| Component | Purpose |
+|-----------|---------|
+| **Python 3** | Runtime for `scripts/faces.py` |
+| **face_recognition** (PyPI) | dlib-backed face detection and 128-D embedding extraction |
+| `/opt/albumen/venv/` | Python virtual environment isolating the dependency |
+
+Install (one-time, takes ~30 min to compile dlib on CPU):
+
+```bash
+apt install python3-pip python3-dev python3-venv cmake build-essential libopenblas-dev liblapack-dev
+python3 -m venv /opt/albumen/venv
+/opt/albumen/venv/bin/pip install face_recognition
+```
+
+Enable by adding to `config.yml`:
+
+```yaml
+faces:
+  enabled: true
+```
+
+When disabled (or when the venv doesn't exist), `update.rb` simply skips face detection and the
+rest of the app is unaffected.
+
 ---
 
 ## Data Model — `album.json`
@@ -164,6 +190,7 @@ used. The file is written atomically (write to a `.tmp` file, then
 | `taken_at` | `null` | ISO 8601 timestamp from EXIF; used for chronological sorting |
 | `width` / `height` | `null` | Pixel dimensions recorded by `update.rb` |
 | `exif_absent` | `null` | Set to `true` by `update.rb` when exiftool found no metadata; skips re-extraction on future rescans |
+| `faces` | `null` | Set by `update.rb` when `faces.enabled`; array of `{"box": [top,right,bottom,left], "encoding": [128 floats]}` per detected face; `[]` means processed with no faces found; `null` means not yet processed |
 
 When `taken_at` is present on *any* file in an album, the entire album is
 sorted chronologically. Albums with no `taken_at` data stay in filename
@@ -385,6 +412,64 @@ to the `albumen` user so the web app can read the files.
 
 ---
 
+## Facial Recognition (opt-in)
+
+Enabled by setting `faces.enabled: true` in `config.yml`. When disabled,
+no Python is invoked and no face data is stored or displayed.
+
+### Detection pipeline
+
+`update.rb` calls `enrich_faces` for each image file where `meta['faces']`
+is `nil` (not yet processed). It shells out to `scripts/faces.py`, which:
+
+1. Loads the image with `face_recognition.load_image_file` (handles JPEG, PNG, etc. via
+   Pillow; HEIC needs an extra Pillow plugin, otherwise those files hit the `[]` error path).
+2. Runs `face_locations(model="hog")` — the HOG model is fast on CPU and
+   accurate for frontal/near-frontal faces. (The CNN model is more accurate
+   but requires a GPU to be practical.)
+3. For each detected location, calls `face_encodings` to produce a
+   128-dimensional L2-normalised embedding vector.
+4. Prints a JSON array to stdout; on any error prints `[]` so `update.rb`
+   always gets valid JSON.
+
+The result is stored as `meta['faces']` in `album.json`. An empty array
+`[]` means "processed, no faces found" and prevents re-processing. A `null`
+value means "not yet processed."
+
+Encodings are stored in full (128 floats each) to allow re-clustering
+without reprocessing all images.
+
+### Clustering and people management (planned)
+
+A second pass (`scripts/cluster_faces.rb`) will:
+
+1. Walk all `album.json` files and collect every `{encoding, source_file,
+   box}` tuple.
+2. Cluster them with a threshold distance (~0.6 in L2 space, empirically
+   good for dlib encodings).
+3. Write `/var/albumen/people.json` — a map of stable UUIDs → cluster
+   metadata (name, representative encoding, member list).
+
+The admin `/admin/people` UI will let you:
+- Name unidentified clusters ("Who is this?").
+- Merge two clusters that are the same person.
+- Remove a photo from a cluster (false positive).
+
+Public `/people` and `/people/:id` routes will let any visitor browse by
+person.
+
+### Performance notes
+
+- HOG face detection: ~0.5–2 s per image on a single CPU core.
+- A library of 10,000 images takes ~1.5–5.5 hours to index fully at that rate, but the
+  sentinel-based skip means subsequent `update.rb` runs only process new
+  photos.
+- Encodings stored in `album.json` are ~3.5 KB per face. A library with
+  an average of 2 faces per photo adds ~70 MB of JSON across 10,000 photos
+  — negligible.
+
+---
+
 ## Security
 
 **Path traversal:** `resolve_dir` and `resolve_file` call `File.expand_path`
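
Note on the planned clustering pass described in DESIGN.md above: `scripts/cluster_faces.rb` does not exist yet, so the following is only a minimal sketch of one way the thresholded step could work. It uses a greedy single-pass assignment against a per-cluster representative, which is an assumption and not necessarily the algorithm the author intends. `THRESHOLD`, `l2_distance`, `cluster_faces`, and the symbol-keyed hash layout are hypothetical names, and tiny 3-D vectors stand in for real 128-D encodings.

```ruby
# Greedy threshold clustering over face encodings (illustrative sketch only).
# THRESHOLD mirrors the ~0.6 distance mentioned in DESIGN.md; all names are hypothetical.
THRESHOLD = 0.6

# Euclidean (L2) distance between two equal-length arrays of floats.
def l2_distance(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

# Assign each face to the first cluster whose representative encoding lies
# within the threshold; otherwise start a new cluster with this face as its
# representative. The result depends on input order.
def cluster_faces(faces, threshold: THRESHOLD)
  clusters = []
  faces.each do |face|
    match = clusters.find { |c| l2_distance(c[:representative], face[:encoding]) <= threshold }
    if match
      match[:members] << face[:source_file]
    else
      clusters << { representative: face[:encoding], members: [face[:source_file]] }
    end
  end
  clusters
end

# Tiny 3-D vectors stand in for real 128-D encodings.
faces = [
  { source_file: 'a.jpg', encoding: [0.0, 0.0, 0.0] },
  { source_file: 'b.jpg', encoding: [0.1, 0.0, 0.0] },
  { source_file: 'c.jpg', encoding: [2.0, 0.0, 0.0] }
]
clusters = cluster_faces(faces)
clusters.each { |c| puts c[:members].join(', ') }
```

Here `a.jpg` and `b.jpg` end up in one cluster and `c.jpg` in another. A greedy pass is order-dependent and compares every face against every existing cluster, so a real implementation might prefer a proper clustering method once the library grows.
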
diff --git a/README.md b/README.md
index adc19a6..8167c0b 100644
--- a/README.md
+++ b/README.md
@@ -35,6 +35,11 @@ back end, plain HTML/CSS/JS front end.  Live at **https://albumen.jots.org**.
 - **Run Update** button scans for new/removed files and generates missing thumbnails;
   **Force rescan all** checkbox bypasses the sentinel and rescans every directory
 
+### Facial recognition (opt-in)
+- Detects faces in photos and stores 128-D embeddings alongside each image
+- Powered by [face_recognition](https://github.com/ageitgey/face_recognition) (dlib/HOG, CPU-only)
+- People management and browse-by-person UI in progress
+
 ### Media support
 
 | Category | Extensions |
@@ -169,6 +174,39 @@ The PBKDF2-SHA256 hash is stored in `/opt/albumen/config.yml` (readable only by
 
 ---
 
+## Facial recognition setup
+
+Face detection is opt-in. Install once, then enable in `config.yml`.
+
+### 1. Install Python dependencies (server, ~30 min first time)
+
+```bash
+apt install python3-pip python3-dev python3-venv cmake build-essential libopenblas-dev liblapack-dev
+python3 -m venv /opt/albumen/venv
+/opt/albumen/venv/bin/pip install face_recognition
+```
+
+### 2. Enable in config.yml
+
+Add to `/opt/albumen/config.yml`:
+
+```yaml
+faces:
+  enabled: true
+```
+
+### 3. Run the update script
+
+```bash
+ruby /opt/albumen/scripts/update.rb
+```
+
+The update script will now detect faces in images and store bounding boxes and
+embeddings in each album's `album.json`. This is a one-time cost per image;
+subsequent runs skip already-processed photos.
+
+---
+
 ## Service management
 
 ```bash
diff --git a/scripts/faces.py b/scripts/faces.py
new file mode 100644
index 0000000..d072376
--- /dev/null
+++ b/scripts/faces.py
@@ -0,0 +1,51 @@
+#!/usr/bin/env python3
+"""
+Detect faces in an image and return their bounding boxes and 128-D encodings.
+
+Usage: python3 faces.py <image_path>
+
+Stdout: JSON array — one object per face:
+  [{"box": [top, right, bottom, left], "encoding": [128 floats]}, ...]
+
+Prints "[]" when no faces are found or on any error (missing library, unreadable image, detection failure).
+Errors are written to stderr; stdout is always valid JSON.
+"""
+import sys
+import json
+
+
+def main():
+    if len(sys.argv) < 2:
+        print("[]")
+        return
+
+    path = sys.argv[1]
+    try:
+        import face_recognition
+    except ImportError as e:
+        print(f"face_recognition not available: {e}", file=sys.stderr)
+        print("[]")
+        return
+
+    try:
+        img = face_recognition.load_image_file(path)
+    except Exception as e:
+        print(f"Could not load {path}: {e}", file=sys.stderr)
+        print("[]")
+        return
+
+    try:
+        locations = face_recognition.face_locations(img, model="hog")
+        encodings = face_recognition.face_encodings(img, locations)
+        result = [
+            {"box": list(loc), "encoding": enc.tolist()}
+            for loc, enc in zip(locations, encodings)
+        ]
+        print(json.dumps(result))
+    except Exception as e:
+        print(f"Detection error for {path}: {e}", file=sys.stderr)
+        print("[]")
+
+
+if __name__ == "__main__":
+    main()
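
Note on the output contract of `scripts/faces.py` above: the docstring promises a JSON array of `{"box": [top, right, bottom, left], "encoding": [128 floats]}` objects. A caller that wants to check that shape before storing it could do something like the sketch below. `ENCODING_DIM`, `valid_face?`, and `parse_faces` are hypothetical helpers, not part of this commit, and the sketch is stricter than `update.rb`, which accepts any JSON array and treats blank output as `[]`. It also assumes box coordinates arrive as JSON integers.

```ruby
require 'json'

ENCODING_DIM = 128 # dimensionality stated in the faces.py docstring

# True when +face+ looks like {"box" => [4 integers], "encoding" => [128 numbers]}.
def valid_face?(face)
  return false unless face.is_a?(Hash)

  box = face['box']
  enc = face['encoding']
  box.is_a?(Array) && box.length == 4 && box.all? { |v| v.is_a?(Integer) } &&
    enc.is_a?(Array) && enc.length == ENCODING_DIM && enc.all? { |v| v.is_a?(Numeric) }
end

# Parse the script's stdout. Returns the array of faces, or nil when the
# output is not valid JSON or does not match the expected shape, so the
# caller can leave the image unprocessed instead of recording bad data.
def parse_faces(stdout)
  faces = JSON.parse(stdout)
  return nil unless faces.is_a?(Array) && faces.all? { |f| valid_face?(f) }

  faces
rescue JSON::ParserError
  nil
end

sample = JSON.generate([{ 'box' => [10, 90, 80, 20], 'encoding' => Array.new(ENCODING_DIM, 0.5) }])
p parse_faces(sample).length
p parse_faces('[]')
p parse_faces('not json')
```

Returning `nil` for malformed output keeps the "not yet processed" state intact, so a later run can retry the file.
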
diff --git a/scripts/update.rb b/scripts/update.rb
index 822405f..d6effe5 100644
--- a/scripts/update.rb
+++ b/scripts/update.rb
@@ -23,9 +23,10 @@ require 'fileutils'
 require 'mini_magick'
 require 'mini_exiftool'
 
-MEDIA_ROOT = (ENV['MEDIA_ROOT'] || '/var/albumen').freeze
-CACHE_ROOT = (ENV['CACHE_ROOT'] || '/opt/albumen/cache/thumbs').freeze
-THUMB_SIZE = 300
+MEDIA_ROOT  = (ENV['MEDIA_ROOT']  || '/var/albumen').freeze
+CACHE_ROOT  = (ENV['CACHE_ROOT']  || '/opt/albumen/cache/thumbs').freeze
+CONFIG_PATH = (ENV['CONFIG_PATH'] || '/opt/albumen/config.yml').freeze
+THUMB_SIZE  = 300
 
 IMAGE_EXTS     = %w[jpg jpeg png gif webp heic heif tiff bmp].freeze
 VIDEO_EXTS     = %w[mp4 mov avi mkv webm m4v ogv].freeze
@@ -34,6 +35,11 @@ MEDIA_EXTS     = (IMAGE_EXTS + VIDEO_EXTS + AUDIO_EXTS).freeze
 TRANSCODE_EXTS = %w[avi mkv mov].freeze  # not universally browser-playable; convert to MP4
 SENTINEL_FILE  = '.albumen_scanned'.freeze
 
+_cfg          = (File.exist?(CONFIG_PATH) && YAML.load_file(CONFIG_PATH, symbolize_names: true)) || {}  # empty file loads as false
+FACES_ENABLED = (_cfg.dig(:faces, :enabled) == true).freeze
+VENV_PYTHON   = File.expand_path('../venv/bin/python3', __dir__).freeze
+FACES_SCRIPT  = File.expand_path('faces.py', __dir__).freeze
+
 # Explicit directory argument implies force — you asked for it, it should run.
 FORCE_UPDATE = !!(ARGV.delete('--force') || ARGV[0])
 
@@ -171,6 +177,25 @@ def enrich_image(full, name, meta)
       warn "  #{name}: dimension error — #{e.message}"
     end
   end
+
+  enrich_faces(full, name, meta)
+end
+
+def enrich_faces(full, name, meta)
+  return unless FACES_ENABLED
+  return unless meta['faces'].nil?  # already processed ([] means "processed, none found")
+  return unless File.exist?(VENV_PYTHON) && File.exist?(FACES_SCRIPT)
+
+  begin
+    out = IO.popen([VENV_PYTHON, FACES_SCRIPT, full], err: '/dev/null', &:read).strip
+    faces = JSON.parse(out.empty? ? '[]' : out)
+    if faces.is_a?(Array)
+      meta['faces'] = faces
+      puts "  #{name}: #{faces.length} face(s)" unless faces.empty?
+    end
+  rescue StandardError => e
+    warn "  #{name}: face detection error — #{e.message}"
+  end
 end
 
 def enrich_video(full, name, meta)
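
Note on the `faces` field handled by `enrich_faces` above: the commit distinguishes three states, `null` (not yet processed), `[]` (processed, none found), and a non-empty array. The sketch below shows how a consumer might classify entries on that basis. `face_state` and the filename-keyed sample hash are hypothetical and do not reflect the actual layout of `album.json`, which this diff does not show.

```ruby
require 'json'

# Classify one file's metadata by its `faces` value.
# A missing key and an explicit JSON null both read as nil in Ruby.
def face_state(meta)
  faces = meta['faces']
  return :pending if faces.nil?    # never processed
  return :no_faces if faces.empty? # processed, nothing detected

  :has_faces
end

# Hypothetical per-file metadata, keyed by filename for illustration only.
files = JSON.parse(<<~DATA)
  {
    "new.jpg":     { "faces": null },
    "legacy.jpg":  {},
    "scenery.jpg": { "faces": [] },
    "group.jpg":   { "faces": [{ "box": [1, 2, 3, 4], "encoding": [0.1] }] }
  }
DATA

states = files.transform_values { |meta| face_state(meta) }
states.each { |name, state| puts "#{name}: #{state}" }
```

Only the `:pending` entries would be handed to `faces.py` on the next run, which is what makes repeat runs cheap.
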