Diffstat (limited to 'DESIGN.md')
| -rw-r--r-- | DESIGN.md | 148 |
1 file changed, 108 insertions, 40 deletions
@@ -26,18 +26,29 @@ Apache reverse proxy (192.168.10.1 / albumen.jots.org)
 Puma application server (127.0.0.1:4567)
   │  Rack / Sinatra
   ▼
-app.rb ──reads──► /var/albumen/ (media files + album.json sidecars)
-       ──reads──► /opt/albumen/cache/thumbs/ (generated thumbnails)
-       ──reads──► /opt/albumen/config.yml (password hash + session secret)
+app.rb ──reads──► /var/albumen/ (media files + album.json + faces.json sidecars)
+       ──reads──► /opt/albumen/cache/thumbs/ (generated thumbnails)
+       ──reads──► /opt/albumen/config.yml (password hash + session secret)
+
+face_daemon.rb ──reads──► /var/albumen/ (image files)
+               ──writes─► /var/albumen/**/faces.json (per-directory face sidecar)
 ```

 ### Process model

-The app runs as the `albumen` system user under systemd
+The web app runs as the `albumen` system user under systemd
 (`config/albumen.service`). Puma is configured for 1 worker with
 4–8 threads (`config/puma.rb`). Logs go to `/opt/albumen/log/`.
 On crash, systemd restarts the process after 5 seconds.

+The face detection daemon (`config/face_daemon.service`) runs as a
+separate systemd service under the same `albumen` user. It polls
+`MEDIA_ROOT` every `poll_interval` seconds (default 300), processes
+any images not yet in a `faces.json` sidecar, and writes results
+atomically. It never touches `album.json`, so there is no write
+contention with `update.rb`. All process output is written to the
+shared log at `/opt/albumen/log/albumen.log`.
+
 ### Reverse proxy

 The proxy (Apache or nginx — a sample nginx config is in
@@ -57,7 +68,8 @@ is set to unlimited because large video files may be uploaded via rsync.
   config.ru
   Gemfile / Gemfile.lock
   config/
-    albumen.service      ← systemd unit file
+    albumen.service      ← systemd unit file for the web app
+    face_daemon.service  ← systemd unit file for the face detection daemon
     puma.rb              ← Puma config
     nginx-albumen.conf   ← sample reverse-proxy config
   views/
@@ -74,6 +86,8 @@ is set to unlimited because large video files may be uploaded via rsync.
     img/audio.svg        ← placeholder thumbnail for audio files
   scripts/
     update.rb            ← post-upload scan/enrich script
+    face_daemon.rb       ← face detection daemon (polls for new images, writes faces.json)
+    faces.py             ← dlib CNN face detection helper called by the daemon
     set_password.rb      ← PBKDF2-SHA256 password setter
   cache/thumbs/          ← generated thumbnail cache (mirrored path structure)
   tmp/                   ← Puma pid / state files
@@ -83,11 +97,13 @@ is set to unlimited because large video files may be uploaded via rsync.
 /var/albumen/           ← media root (owned by albumen user)
   album.json            ← root-level sidecar (optional)
   SomeAlbum/
-    album.json          ← per-album metadata sidecar
+    album.json          ← per-album metadata sidecar (owned by update.rb)
+    faces.json          ← per-album face data sidecar (owned by face_daemon.rb)
     photo1.jpg
     photo2.jpg
     SubAlbum/
       album.json
+      faces.json
       photo3.jpg
 ```

@@ -190,7 +206,6 @@ used. The file is written atomically (write to a `.tmp` file, then
 | `taken_at` | `null` | ISO 8601 timestamp from EXIF; used for chronological sorting |
 | `width` / `height` | `null` | Pixel dimensions recorded by `update.rb` |
 | `exif_absent` | `null` | Set to `true` by `update.rb` when exiftool found no metadata; skips re-extraction on future rescans |
-| `faces` | `null` | Set by `update.rb` when `faces.enabled`; array of `{"box": [top,right,bottom,left], "encoding": [128 floats]}` per detected face; `[]` means processed with no faces found; `null` means not yet processed |

 When `taken_at` is present on *any* file in an album, the entire
 album is sorted chronologically. Albums with no `taken_at` data stay in filename
@@ -417,57 +432,110 @@ to the `albumen` user so the web app can read the files.
 Enabled by setting `faces.enabled: true` in `config.yml`. When
 disabled, no Python is invoked and no face data is stored or displayed.

+### Architecture: separate daemon
+
+Face detection runs in a completely separate process (`scripts/face_daemon.rb`,
+managed by `config/face_daemon.service`) and is entirely decoupled from
+`update.rb`. This design keeps the two operations from conflicting:
+
+- **`update.rb`** owns `album.json` in each directory. It indexes media,
+  extracts EXIF data, and generates thumbnails as fast as possible so
+  newly uploaded photos are browseable immediately.
+- **`face_daemon.rb`** owns `faces.json` in each directory. It runs
+  continuously in the background, processing images that haven't been
+  detected yet. There is no file locking or write contention between the
+  two processes.
+
+### Data model — `faces.json`
+
+Each directory gets a `faces.json` sidecar written by the daemon:
+
+```json
+{
+  "photo1.jpg": [
+    {"box": [top, right, bottom, left], "encoding": [128 floats]},
+    ...
+  ],
+  "photo2.jpg": [],
+  "photo3.jpg": null
+}
+```
+
+| Value | Meaning |
+|-------|---------|
+| key absent | not yet processed |
+| `null` | error during last detection attempt; will retry |
+| `[]` | processed successfully, no faces found |
+| `[{box, encoding}, ...]` | one entry per detected face |
+
+`app.rb` reads `faces.json` via `load_faces(dir)` and merges face data
+into each entry's `:faces` field. The field is `nil` (absent key in
+`faces.json`) when the daemon hasn't processed the image yet.
+
 ### Detection pipeline

-`update.rb` calls `enrich_faces` for each image file where `meta['faces']`
-is `nil` (not yet processed). It shells out to `scripts/faces.py`, which:
+The daemon shells out to `scripts/faces.py`, which uses the **CNN model**
+(`model="cnn"`) for higher accuracy. The CNN model detects:
+- Faces at angles up to ~45° profile
+- Small faces in group photos
+- Faces in non-ideal lighting

-1. Loads the image with `face_recognition.load_image_file` (handles JPEG,
-   PNG, HEIC, etc. via Pillow).
-2. Runs `face_locations(model="hog")` — the HOG model is fast on CPU and
-   accurate for frontal/near-frontal faces. (The CNN model is more accurate
-   but requires a GPU to be practical.)
-3. For each detected location, calls `face_encodings` to produce a
-   128-dimensional L2-normalised embedding vector.
-4. Prints a JSON array to stdout; on any error prints `[]` so `update.rb`
-   always gets valid JSON.
+Trade-off: CNN is ~10–30× slower than HOG on CPU. The daemon compensates
+with `--workers N` (default 20) — dlib releases the Python GIL during
+C++ inference, so threads achieve genuine CPU parallelism.

-The result is stored as `meta['faces']` in `album.json`. An empty array
-`[]` means "processed, no faces found" and prevents re-processing. A `null`
-value means "not yet processed."
+`faces.py` accepts a batch of image paths and returns a JSON dict mapping
+each path to its result list. Null for a path means detection failed
+(file unreadable or corrupt); the daemon marks that entry `null` in
+`faces.json` so it is retried on the next pass.

 Encodings are stored in full (128 floats each) to allow
 re-clustering without reprocessing all images.

-### Clustering and people management (planned)
+### Daemon operation

-A second pass (`scripts/cluster_faces.rb`) will:
+```
+loop:
+  for each directory in MEDIA_ROOT (recursive):
+    pending = images whose name is absent from faces.json (or null)
+    if pending not empty:
+      call faces.py --workers 20 <pending paths>
+      merge results into faces.json (atomic write)
+  sleep POLL_INTERVAL seconds (default 300, in 1-second increments for prompt shutdown)
+```

-1. Walk all `album.json` files and collect every `{encoding, source_file,
-   box}` tuple.
-2. Cluster them with a threshold distance (~0.6 in L2 space, empirically
-   good for dlib encodings).
-3. Write `/var/albumen/people.json` — a map of stable UUIDs → cluster
-   metadata (name, representative encoding, member list).
+SIGTERM / SIGINT trigger graceful shutdown between directories.

-The admin `/admin/people` UI will let you:
-- Name unidentified clusters ("Who is this?").
-- Merge two clusters that are the same person.
-- Remove a photo from a cluster (false positive).
+### Configuration

-Public `/people` and `/people/:id` routes will let any visitor browse by
-person.
+```yaml
+faces:
+  enabled: true
+  workers: 20         # threads passed to faces.py
+  poll_interval: 300  # seconds between full-tree sweeps
+```

 ### Performance notes

-- HOG face detection: ~0.5–2 s per image on a single CPU core.
-- A library of 10,000 images takes ~3–6 hours to index fully, but the
-  sentinel-based skip means subsequent `update.rb` runs only process new
-  photos.
-- Encodings stored in `album.json` are ~3.5 KB per face. A library with
+- CNN face detection with 20 workers: ~4–6 images/minute on a 64-core CPU.
+- A library of ~20,000 photos takes roughly 2.5–3.5 days on initial pass.
+- Subsequent daemon passes only process new photos.
+- Encodings stored in `faces.json` are ~3.5 KB per face. A library with
   an average of 2 faces per photo adds ~70 MB of JSON across 10,000
   photos — negligible.

+### Clustering and people management (planned)
+
+A second pass (`scripts/cluster_faces.rb`) will:
+
+1. Walk all `faces.json` files and collect every `{encoding, source_file, box}` tuple.
+2. Cluster them with a threshold distance (~0.6 in L2 space, empirically good for dlib encodings).
+3. Write `/var/albumen/people.json` — a map of stable UUIDs → cluster metadata.
+
+The admin `/admin/people` UI will let you name clusters, merge duplicates,
+and remove false positives. Public `/people` routes will allow browsing by
+person.
+
 ---

 ## Security
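The Configuration block added in this diff sets `workers` (default 20) and `poll_interval` (default 300). The surrounding text says face detection stays off unless `faces.enabled: true` is set. A minimal sketch of reading that block with those defaults filled in follows. It assumes `config.yml` is plain YAML with a top-level `faces:` key as shown. `face_settings` and `FACE_DEFAULTS` are hypothetical names, not part of the Albumen code.

```ruby
require 'yaml'

# Hypothetical defaults mirroring the design notes: detection is off unless
# enabled, 20 worker threads, 300 seconds between sweeps.
FACE_DEFAULTS = { 'enabled' => false, 'workers' => 20, 'poll_interval' => 300 }.freeze

# Parses config.yml text and returns the effective `faces` settings,
# with any key missing from the file taken from FACE_DEFAULTS.
def face_settings(config_yaml)
  config = YAML.safe_load(config_yaml) || {} # empty file parses to nil/false
  FACE_DEFAULTS.merge(config['faces'] || {})
end

settings = face_settings(<<~YAML)
  faces:
    enabled: true
    workers: 8
YAML
p settings
```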
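The Daemon operation pseudocode selects `pending` as images whose name is absent from `faces.json` or mapped to `null`. In Ruby, `Hash#[]` returns `nil` both for a missing key and for a parsed JSON `null`, so a single `nil?` check covers both cases. Below is a sketch under two assumptions: a hypothetical `pending_images` helper, and an illustrative extension list (the diff does not say which formats the daemon accepts).

```ruby
require 'json'

# Assumed list of image extensions; the real daemon's list may differ.
IMAGE_EXTS = %w[.jpg .jpeg .png .heic .gif .webp].freeze

# Image filenames in `dir` that still need detection: absent from
# faces.json, or recorded as null because the last attempt failed.
def pending_images(dir)
  faces_path = File.join(dir, 'faces.json')
  done = File.exist?(faces_path) ? JSON.parse(File.read(faces_path)) : {}
  Dir.children(dir)
     .select { |name| IMAGE_EXTS.include?(File.extname(name).downcase) }
     .select { |name| File.file?(File.join(dir, name)) } # skip subdirectories
     .select { |name| done[name].nil? } # nil covers both "absent" and null
     .sort
end
```

Images recorded as `[]` (no faces) or with a face list are skipped. That matches the table's rule that only absent keys and `null` entries are (re)processed.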
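The pseudocode step "merge results into faces.json (atomic write)" can reuse the write-to-`.tmp`-then-rename pattern the design already describes for `album.json`. The sketch below makes two assumptions:

- `results` is already keyed by bare filename.
- The daemon is the only writer of `faces.json`, as the diff states, so a fixed `faces.json.tmp` name cannot collide.

`merge_faces!` is a hypothetical name.

```ruby
require 'json'

# Merges a batch of detection results into dir/faces.json and replaces the
# file atomically. `results` maps image filename => array of faces, or nil
# when detection failed (stored as JSON null so the next pass retries it).
def merge_faces!(dir, results)
  path = File.join(dir, 'faces.json')
  existing = File.exist?(path) ? JSON.parse(File.read(path)) : {}
  merged = existing.merge(results) # new results overwrite older entries

  # Write a sibling temp file, flush it to disk, then rename over the
  # original. On POSIX, rename within one filesystem is atomic, so readers
  # such as app.rb see either the old faces.json or the new one.
  tmp = "#{path}.tmp"
  File.open(tmp, 'w') do |f|
    f.write(JSON.pretty_generate(merged))
    f.flush
    f.fsync
  end
  File.rename(tmp, path)
  merged
end
```

`faces.py` returns a dict keyed by the image paths it was given. A caller could therefore normalise its output with `raw.transform_keys { |p| File.basename(p) }` before merging.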
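The Daemon operation loop, together with the note that SIGTERM/SIGINT trigger a graceful shutdown between directories, fits a flag-based design:

- Signal handlers only set a flag.
- Each sweep checks the flag before starting the next directory.
- The poll sleep runs in 1-second steps, so a stop request is noticed quickly.

This is a hypothetical skeleton: `FaceDaemonLoop` and its `max_sweeps` test hook are invented for illustration. The per-directory work (pending scan, `faces.py` call, merge) is passed in as a block.

```ruby
# Hypothetical skeleton of a polling loop like face_daemon.rb's.
class FaceDaemonLoop
  def initialize(media_root, poll_interval: 300, &process_dir)
    @media_root = media_root
    @poll_interval = poll_interval
    @process_dir = process_dir
    @stop = false
  end

  # Request shutdown; honoured at the next directory boundary or sleep tick.
  def stop!
    @stop = true
  end

  # Trap handlers only flip the flag, so a directory already in progress
  # finishes before the loop exits.
  def install_signal_handlers
    %w[TERM INT].each { |sig| Signal.trap(sig) { stop! } }
  end

  # One sweep: MEDIA_ROOT itself plus every subdirectory, in sorted order.
  def sweep
    subdirs = Dir.glob(File.join(@media_root, '**', '*/')).map { |d| d.chomp('/') }
    ([@media_root] + subdirs).sort.each do |dir|
      break if @stop
      @process_dir.call(dir)
    end
  end

  # Runs until stopped; max_sweeps exists only so the loop can be tested.
  def run(max_sweeps: nil)
    sweeps = 0
    until @stop
      sweep
      sweeps += 1
      break if max_sweeps && sweeps >= max_sweeps
      # Sleep in 1-second increments so a stop request is noticed promptly.
      @poll_interval.times do
        break if @stop
        sleep 1
      end
    end
  end
end
```

A real entry point would call `install_signal_handlers` and then `run`, with the block performing the pending-image scan and merge for each directory.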
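On the web side, the diff says `app.rb` reads `faces.json` via `load_faces(dir)` and merges the data into each entry's `:faces` field. Below is a hypothetical version of that read path. It assumes entries are hashes with a `:name` key; the real entry structure is not shown in this diff. It also treats an unparseable sidecar defensively as "nothing processed yet".

```ruby
require 'json'

# Hypothetical stand-in for app.rb's load_faces(dir): the parsed faces.json
# for one directory, or an empty hash when the daemon has not written one.
def load_faces(dir)
  path = File.join(dir, 'faces.json')
  return {} unless File.exist?(path)
  JSON.parse(File.read(path))
rescue JSON::ParserError
  {} # defensive: treat an unreadable sidecar as "nothing processed yet"
end

# Copies each image's face list onto its entry. A missing key and a JSON
# null both surface as faces: nil.
def attach_faces(entries, faces)
  entries.map { |entry| entry.merge(faces: faces[entry[:name]]) }
end
```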
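The planned `scripts/cluster_faces.rb` pass clusters encodings at a threshold of about 0.6 in L2 space, but the diff does not name a clustering algorithm. One simple possibility is single-linkage clustering with a union-find forest: any two faces closer than the threshold end up in the same cluster, transitively. The sketch below is illustrative only; `l2_distance` and `cluster_faces` are hypothetical names.

Two caveats apply:

- Single linkage can chain two different people together through intermediate faces.
- The pairwise loop is quadratic in the number of faces, so a real pass over a large library would likely need something faster.

```ruby
# Euclidean (L2) distance between two encodings of equal length.
def l2_distance(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

# Single-linkage clustering via union-find. `faces` is an array of hashes
# with an "encoding" key, as stored in faces.json. Returns clusters as
# arrays of indexes into `faces`, ordered by their first member.
def cluster_faces(faces, threshold: 0.6)
  parent = (0...faces.size).to_a
  find = lambda do |i|
    while parent[i] != i
      parent[i] = parent[parent[i]] # path halving keeps trees shallow
      i = parent[i]
    end
    i
  end

  faces.each_index do |i|
    (i + 1...faces.size).each do |j|
      next if l2_distance(faces[i]['encoding'], faces[j]['encoding']) > threshold
      root_i = find.call(i)
      root_j = find.call(j)
      parent[root_j] = root_i unless root_i == root_j
    end
  end

  faces.each_index.group_by { |i| find.call(i) }.values
end

# Toy 2-D encodings stand in for the real 128-float vectors.
faces = [
  { 'encoding' => [0.0, 0.0] },
  { 'encoding' => [0.3, 0.0] },
  { 'encoding' => [5.0, 5.0] },
  { 'encoding' => [5.2, 5.0] }
]
p cluster_faces(faces)
```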
