| author | Ken D'Ambrosio <ken@jots.org> | 2026-06-08 18:36:07 +0000 |
|---|---|---|
| committer | Ken D'Ambrosio <ken@jots.org> | 2026-06-08 18:36:07 +0000 |
| commit | 625b3d5176f2c274e91fcf28bda8e45cc0477722 (patch) | |
| tree | 6ca16ad6f4a830b65dcddbd78ad7e7a2f1655682 | |
| parent | ecc872a1fd43c0863e3171a1faf533adc3e3a4c5 (diff) | |
Separate face detection into standalone daemon
- Strip all face code from update.rb; add shared log helper writing to
/opt/albumen/log/albumen.log with [update] prefix. update.rb now owns
only album.json; face_daemon.rb owns faces.json.
- New scripts/face_daemon.rb: polls MEDIA_ROOT for unprocessed images,
calls faces.py in batches, writes per-directory faces.json sidecars
atomically. Graceful SIGTERM/SIGINT shutdown between directories.
- New config/face_daemon.service: systemd unit running as albumen user,
Restart=on-failure, logs via SyslogIdentifier=albumen-faces.
- app.rb: add FACES_ENABLED constant; load_faces() helper reads faces.json;
album_files() merges face data into each entry as :faces field.
- Update README.md and DESIGN.md to document the new daemon architecture,
faces.json schema, and service management commands.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
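The commit message says face_daemon.rb owns faces.json, and the diff documents three states per image: key absent, null, or a list. As a reading aid, here is a minimal sketch of how those states could be told apart and which ones get picked up on the next sweep. The `face_state` helper and the sample filenames are hypothetical and not part of this commit; the daemon's own selection logic is `pending_images` in scripts/face_daemon.rb.

```ruby
require 'json'

# Hypothetical helper (not part of the commit): classify one image's
# detection state from a parsed faces.json hash.
def face_state(faces, name)
  return :unprocessed unless faces.key?(name) # key absent: never attempted
  value = faces[name]
  return :retry if value.nil?                 # null: last attempt failed
  value.empty? ? :no_faces : :has_faces       # [] vs. [{box, encoding}, ...]
end

# Made-up sample data in the documented schema.
faces = JSON.parse(<<~JSON)
  {
    "a.jpg": [],
    "b.jpg": null,
    "c.jpg": [{"box": [1, 2, 3, 4], "encoding": []}]
  }
JSON

# Only unprocessed and previously failed images are pending.
pending = %w[a.jpg b.jpg c.jpg d.jpg].select do |name|
  %i[unprocessed retry].include?(face_state(faces, name))
end
```

Note that app.rb looks entries up with `face_data[name]`, so a reader of the `:faces` field sees `nil` for both the absent-key and the null case; only the daemon needs to distinguish them.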
| -rw-r--r-- | DESIGN.md | 148 |
| -rw-r--r-- | README.md | 41 |
| -rw-r--r-- | app.rb | 14 |
| -rw-r--r-- | config/face_daemon.service | 27 |
| -rw-r--r-- | scripts/face_daemon.rb | 148 |
| -rw-r--r-- | scripts/update.rb | 103 |
6 files changed, 355 insertions, 126 deletions
@@ -26,18 +26,29 @@ Apache reverse proxy (192.168.10.1 / albumen.jots.org) Puma application server (127.0.0.1:4567) │ Rack / Sinatra ▼ -app.rb ──reads──► /var/albumen/ (media files + album.json sidecars) - ──reads──► /opt/albumen/cache/thumbs/ (generated thumbnails) - ──reads──► /opt/albumen/config.yml (password hash + session secret) +app.rb ──reads──► /var/albumen/ (media files + album.json + faces.json sidecars) + ──reads──► /opt/albumen/cache/thumbs/ (generated thumbnails) + ──reads──► /opt/albumen/config.yml (password hash + session secret) + +face_daemon.rb ──reads──► /var/albumen/ (image files) + ──writes─► /var/albumen/**/faces.json (per-directory face sidecar) ``` ### Process model -The app runs as the `albumen` system user under systemd +The web app runs as the `albumen` system user under systemd (`config/albumen.service`). Puma is configured for 1 worker with 4–8 threads (`config/puma.rb`). Logs go to `/opt/albumen/log/`. On crash, systemd restarts the process after 5 seconds. +The face detection daemon (`config/face_daemon.service`) runs as a +separate systemd service under the same `albumen` user. It polls +`MEDIA_ROOT` every `poll_interval` seconds (default 300), processes +any images not yet in a `faces.json` sidecar, and writes results +atomically. It never touches `album.json`, so there is no write +contention with `update.rb`. All process output is written to the +shared log at `/opt/albumen/log/albumen.log`. + ### Reverse proxy The proxy (Apache or nginx — a sample nginx config is in @@ -57,7 +68,8 @@ is set to unlimited because large video files may be uploaded via rsync. config.ru Gemfile / Gemfile.lock config/ - albumen.service ← systemd unit file + albumen.service ← systemd unit file for the web app + face_daemon.service ← systemd unit file for the face detection daemon puma.rb ← Puma config nginx-albumen.conf ← sample reverse-proxy config views/ @@ -74,6 +86,8 @@ is set to unlimited because large video files may be uploaded via rsync. 
img/audio.svg ← placeholder thumbnail for audio files scripts/ update.rb ← post-upload scan/enrich script + face_daemon.rb ← face detection daemon (polls for new images, writes faces.json) + faces.py ← dlib CNN face detection helper called by the daemon set_password.rb ← PBKDF2-SHA256 password setter cache/thumbs/ ← generated thumbnail cache (mirrored path structure) tmp/ ← Puma pid / state files @@ -83,11 +97,13 @@ is set to unlimited because large video files may be uploaded via rsync. /var/albumen/ ← media root (owned by albumen user) album.json ← root-level sidecar (optional) SomeAlbum/ - album.json ← per-album metadata sidecar + album.json ← per-album metadata sidecar (owned by update.rb) + faces.json ← per-album face data sidecar (owned by face_daemon.rb) photo1.jpg photo2.jpg SubAlbum/ album.json + faces.json photo3.jpg ``` @@ -190,7 +206,6 @@ used. The file is written atomically (write to a `.tmp` file, then | `taken_at` | `null` | ISO 8601 timestamp from EXIF; used for chronological sorting | | `width` / `height` | `null` | Pixel dimensions recorded by `update.rb` | | `exif_absent` | `null` | Set to `true` by `update.rb` when exiftool found no metadata; skips re-extraction on future rescans | -| `faces` | `null` | Set by `update.rb` when `faces.enabled`; array of `{"box": [top,right,bottom,left], "encoding": [128 floats]}` per detected face; `[]` means processed with no faces found; `null` means not yet processed | When `taken_at` is present on *any* file in an album, the entire album is sorted chronologically. Albums with no `taken_at` data stay in filename @@ -417,57 +432,110 @@ to the `albumen` user so the web app can read the files. Enabled by setting `faces.enabled: true` in `config.yml`. When disabled, no Python is invoked and no face data is stored or displayed. 
+### Architecture: separate daemon + +Face detection runs in a completely separate process (`scripts/face_daemon.rb`, +managed by `config/face_daemon.service`) and is entirely decoupled from +`update.rb`. This design keeps the two operations from conflicting: + +- **`update.rb`** owns `album.json` in each directory. It indexes media, + extracts EXIF data, and generates thumbnails as fast as possible so + newly uploaded photos are browseable immediately. +- **`face_daemon.rb`** owns `faces.json` in each directory. It runs + continuously in the background, processing images that haven't been + detected yet. There is no file locking or write contention between the + two processes. + +### Data model — `faces.json` + +Each directory gets a `faces.json` sidecar written by the daemon: + +```json +{ + "photo1.jpg": [ + {"box": [top, right, bottom, left], "encoding": [128 floats]}, + ... + ], + "photo2.jpg": [], + "photo3.jpg": null +} +``` + +| Value | Meaning | +|-------|---------| +| key absent | not yet processed | +| `null` | error during last detection attempt; will retry | +| `[]` | processed successfully, no faces found | +| `[{box, encoding}, ...]` | one entry per detected face | + +`app.rb` reads `faces.json` via `load_faces(dir)` and merges face data +into each entry's `:faces` field. The field is `nil` (absent key in +`faces.json`) when the daemon hasn't processed the image yet. + ### Detection pipeline -`update.rb` calls `enrich_faces` for each image file where `meta['faces']` -is `nil` (not yet processed). It shells out to `scripts/faces.py`, which: +The daemon shells out to `scripts/faces.py`, which uses the **CNN model** +(`model="cnn"`) for higher accuracy. The CNN model detects: +- Faces at angles up to ~45° profile +- Small faces in group photos +- Faces in non-ideal lighting -1. Loads the image with `face_recognition.load_image_file` (handles JPEG, - PNG, HEIC, etc. via Pillow). -2. 
Runs `face_locations(model="hog")` — the HOG model is fast on CPU and - accurate for frontal/near-frontal faces. (The CNN model is more accurate - but requires a GPU to be practical.) -3. For each detected location, calls `face_encodings` to produce a - 128-dimensional L2-normalised embedding vector. -4. Prints a JSON array to stdout; on any error prints `[]` so `update.rb` - always gets valid JSON. +Trade-off: CNN is ~10–30× slower than HOG on CPU. The daemon compensates +with `--workers N` (default 20) — dlib releases the Python GIL during +C++ inference, so threads achieve genuine CPU parallelism. -The result is stored as `meta['faces']` in `album.json`. An empty array -`[]` means "processed, no faces found" and prevents re-processing. A `null` -value means "not yet processed." +`faces.py` accepts a batch of image paths and returns a JSON dict mapping +each path to its result list. Null for a path means detection failed +(file unreadable or corrupt); the daemon marks that entry `null` in +`faces.json` so it is retried on the next pass. Encodings are stored in full (128 floats each) to allow re-clustering without reprocessing all images. -### Clustering and people management (planned) +### Daemon operation -A second pass (`scripts/cluster_faces.rb`) will: +``` +loop: + for each directory in MEDIA_ROOT (recursive): + pending = images whose name is absent from faces.json (or null) + if pending not empty: + call faces.py --workers 20 <pending paths> + merge results into faces.json (atomic write) + sleep POLL_INTERVAL seconds (default 300, in 1-second increments for prompt shutdown) +``` -1. Walk all `album.json` files and collect every `{encoding, source_file, - box}` tuple. -2. Cluster them with a threshold distance (~0.6 in L2 space, empirically - good for dlib encodings). -3. Write `/var/albumen/people.json` — a map of stable UUIDs → cluster - metadata (name, representative encoding, member list). +SIGTERM / SIGINT trigger graceful shutdown between directories. 
-The admin `/admin/people` UI will let you: -- Name unidentified clusters ("Who is this?"). -- Merge two clusters that are the same person. -- Remove a photo from a cluster (false positive). +### Configuration -Public `/people` and `/people/:id` routes will let any visitor browse by -person. +```yaml +faces: + enabled: true + workers: 20 # threads passed to faces.py + poll_interval: 300 # seconds between full-tree sweeps +``` ### Performance notes -- HOG face detection: ~0.5–2 s per image on a single CPU core. -- A library of 10,000 images takes ~3–6 hours to index fully, but the - sentinel-based skip means subsequent `update.rb` runs only process new - photos. -- Encodings stored in `album.json` are ~3.5 KB per face. A library with +- CNN face detection with 20 workers: ~4–6 images/minute on a 64-core CPU. +- A library of ~20,000 photos takes roughly 2.5–3.5 days on initial pass. +- Subsequent daemon passes only process new photos. +- Encodings stored in `faces.json` are ~3.5 KB per face. A library with an average of 2 faces per photo adds ~70 MB of JSON across 10,000 photos — negligible. +### Clustering and people management (planned) + +A second pass (`scripts/cluster_faces.rb`) will: + +1. Walk all `faces.json` files and collect every `{encoding, source_file, box}` tuple. +2. Cluster them with a threshold distance (~0.6 in L2 space, empirically good for dlib encodings). +3. Write `/var/albumen/people.json` — a map of stable UUIDs → cluster metadata. + +The admin `/admin/people` UI will let you name clusters, merge duplicates, +and remove false positives. Public `/people` routes will allow browsing by +person. + --- ## Security @@ -36,8 +36,9 @@ back end, plain HTML/CSS/JS front end. Live at **https://albumen.jots.org**. 
**Force rescan all** checkbox bypasses the sentinel and rescans every directory ### Facial recognition (opt-in) -- Detects faces in photos and stores 128-D embeddings alongside each image -- Powered by [face_recognition](https://github.com/ageitgey/face_recognition) (dlib/HOG, CPU-only) +- Detects faces in photos and stores 128-D embeddings in per-directory `faces.json` sidecar files +- Powered by [face_recognition](https://github.com/ageitgey/face_recognition) (dlib CNN model, CPU-only) +- Runs as a background daemon (`face_daemon.service`), completely decoupled from the update script - People management and browse-by-person UI in progress ### Media support @@ -176,7 +177,9 @@ The PBKDF2-SHA256 hash is stored in `/opt/albumen/config.yml` (readable only by ## Facial recognition setup -Face detection is opt-in. Install once, then enable in `config.yml`. +Face detection is opt-in and runs as a **separate background daemon** so it +never slows down the update script. Photos are indexed and browseable +immediately after upload; faces are detected asynchronously in the background. ### 1. Install Python dependencies (server, ~30 min first time) @@ -193,31 +196,47 @@ Add to `/opt/albumen/config.yml`: ```yaml faces: enabled: true + workers: 20 # parallel threads (set to ~nproc/3 to leave headroom) + poll_interval: 300 # seconds between full sweeps ``` -### 3. Run the update script +### 3. Install and start the daemon ```bash -ruby /opt/albumen/scripts/update.rb +cp /opt/albumen/config/face_daemon.service /etc/systemd/system/ +systemctl daemon-reload +systemctl enable face_daemon +systemctl start face_daemon ``` -The update script will now detect faces in images and store bounding boxes and -embeddings in each album's `album.json`. This is a one-time cost per image; -subsequent runs skip already-processed photos. 
+The daemon polls the media tree every `poll_interval` seconds, processes any +images not yet in a `faces.json` sidecar, and logs to +`/opt/albumen/log/albumen.log` with a `[faces]` prefix. + +Initial detection of a large library (~20,000 photos with CNN model on a +64-core CPU) takes roughly 2.5–3.5 days. Only new photos are processed on +subsequent passes. --- ## Service management ```bash +# Web app systemctl status albumen # is it running? systemctl restart albumen # restart (e.g. after editing app.rb) journalctl -u albumen -f # live service logs -tail -f /opt/albumen/log/puma.stdout.log # Puma access log -tail -f /opt/albumen/log/puma.stderr.log # Puma error log + +# Face detection daemon +systemctl status face_daemon # is it running? +systemctl restart face_daemon # restart the daemon +journalctl -u face_daemon -f # live daemon logs (systemd journal) + +# Shared activity log (both update.rb and face_daemon.rb write here) +tail -f /opt/albumen/log/albumen.log ``` -The service runs as the `albumen` user. App code lives in `/opt/albumen/`. +Both services run as the `albumen` user. App code lives in `/opt/albumen/`. --- @@ -23,7 +23,8 @@ VIDEO_EXTS = %w[mp4 mov avi mkv webm m4v ogv].freeze AUDIO_EXTS = %w[mp3 flac ogg wav m4a aac].freeze MEDIA_EXTS = (IMAGE_EXTS + VIDEO_EXTS + AUDIO_EXTS).freeze -APP_CONFIG = (File.exist?(CONFIG_PATH) ? YAML.load_file(CONFIG_PATH, symbolize_names: true) : {}).freeze +APP_CONFIG = (File.exist?(CONFIG_PATH) ? YAML.load_file(CONFIG_PATH, symbolize_names: true) : {}).freeze +FACES_ENABLED = (APP_CONFIG.dig(:faces, :enabled) == true).freeze # ── Sinatra config ───────────────────────────────────────────────────────────── @@ -83,7 +84,17 @@ helpers do { 'files' => {}, 'visible' => true } end + def load_faces(dir) + path = File.join(dir, 'faces.json') + return {} unless File.exist?(path) + JSON.parse(File.read(path)) + rescue JSON::ParserError + {} + end + def album_files(dir, data) + face_data = FACES_ENABLED ? 
load_faces(dir) : {} + files = Dir.children(dir) .sort .select { |n| MEDIA_EXTS.include?(File.extname(n).downcase.delete_prefix('.')) } @@ -107,6 +118,7 @@ helpers do shutter: meta['shutter'], iso: meta['iso'], transcoded_to: meta['transcoded_to'], + faces: face_data[name], } end diff --git a/config/face_daemon.service b/config/face_daemon.service new file mode 100644 index 0000000..4babccc --- /dev/null +++ b/config/face_daemon.service @@ -0,0 +1,27 @@ +[Unit] +Description=Albumen face detection daemon +After=network.target albumen.service +Wants=albumen.service + +[Service] +Type=simple +User=albumen +Group=albumen +WorkingDirectory=/opt/albumen + +Environment=MEDIA_ROOT=/var/albumen +Environment=CONFIG_PATH=/opt/albumen/config.yml +Environment=LOG_PATH=/opt/albumen/log/albumen.log +Environment=VENV_PYTHON=/opt/albumen/venv/bin/python3 +Environment=FACES_SCRIPT=/opt/albumen/scripts/faces.py + +ExecStart=/usr/bin/ruby /opt/albumen/scripts/face_daemon.rb +Restart=on-failure +RestartSec=30 + +StandardOutput=journal +StandardError=journal +SyslogIdentifier=albumen-faces + +[Install] +WantedBy=multi-user.target diff --git a/scripts/face_daemon.rb b/scripts/face_daemon.rb new file mode 100644 index 0000000..5e817cd --- /dev/null +++ b/scripts/face_daemon.rb @@ -0,0 +1,148 @@ +#!/usr/bin/env ruby +# frozen_string_literal: true +# +# Face detection daemon for Albumen. +# +# Polls MEDIA_ROOT for images not yet in a per-directory faces.json sidecar +# and runs faces.py (dlib CNN model) on them. Never touches album.json — +# zero write contention with update.rb. 
+# +# faces.json schema (per directory): +# filename → null error during detection; will retry next pass +# filename → [] processed, no faces found +# filename → [{box,encoding}] face data +# (key absent) not yet processed +# +# Configuration — from ENV or /opt/albumen/config.yml (under `faces:` key): +# workers: 20 # ThreadPoolExecutor workers passed to faces.py +# poll_interval: 300 # seconds between full-tree sweeps +# +# Signal handling: SIGTERM / SIGINT triggers graceful shutdown between dirs. + +require 'json' +require 'yaml' +require 'fileutils' +require 'open3' + +MEDIA_ROOT = (ENV['MEDIA_ROOT'] || '/var/albumen').freeze +CONFIG_PATH = (ENV['CONFIG_PATH'] || '/opt/albumen/config.yml').freeze +LOG_PATH = (ENV['LOG_PATH'] || '/opt/albumen/log/albumen.log').freeze +VENV_PYTHON = (ENV['VENV_PYTHON'] || '/opt/albumen/venv/bin/python3').freeze +FACES_SCRIPT = (ENV['FACES_SCRIPT'] || '/opt/albumen/scripts/faces.py').freeze + +IMAGE_EXTS = %w[jpg jpeg png gif webp heic heif tiff bmp].freeze + +_cfg = File.exist?(CONFIG_PATH) ? 
(YAML.load_file(CONFIG_PATH, symbolize_names: true) rescue {}) : {} +FACES_WORKERS = (_cfg.dig(:faces, :workers) || 20).to_i.freeze +POLL_INTERVAL = (_cfg.dig(:faces, :poll_interval) || 300).to_i.freeze + +$shutdown = false +Signal.trap('TERM') { $shutdown = true } +Signal.trap('INT') { $shutdown = true } + +# ── Logging ─────────────────────────────────────────────────────────────────── + +def log(msg) + $stdout.puts msg + $stdout.flush + ts = Time.now.strftime('%Y-%m-%d %H:%M:%S') + File.open(LOG_PATH, 'a') { |f| f.puts "[#{ts}] [faces] #{msg}" } +rescue StandardError + # never crash on log failure +end + +# ── faces.json helpers ──────────────────────────────────────────────────────── + +def load_faces_json(path) + return {} unless File.exist?(path) + JSON.parse(File.read(path)) +rescue JSON::ParserError + {} +end + +def save_faces_atomic(path, data) + tmp = "#{path}.tmp.#{Process.pid}" + File.write(tmp, JSON.generate(data)) + File.rename(tmp, path) +rescue StandardError => e + File.unlink(tmp) rescue nil + log " Error saving #{path}: #{e.message}" +end + +# Returns image filenames that still need processing. +# null-valued entries (prior errors) are retried; [] entries are done. +def pending_images(dir) + faces = load_faces_json(File.join(dir, 'faces.json')) + Dir.children(dir) + .select { |n| IMAGE_EXTS.include?(File.extname(n).downcase.delete_prefix('.')) } + .reject { |n| faces.key?(n) && !faces[n].nil? } + .sort +end + +# ── Core processing ─────────────────────────────────────────────────────────── + +def process_dir(dir) + pending = pending_images(dir) + return if pending.empty? + + rel = dir.delete_prefix(MEDIA_ROOT).delete_prefix('/') + label = rel.empty? ? '(root)' : rel + log "#{label}: #{pending.size} image(s) pending" + + paths = pending.map { |n| File.join(dir, n) } + cmd = [VENV_PYTHON, FACES_SCRIPT, '--workers', FACES_WORKERS.to_s, *paths] + + stdout, stderr, status = Open3.capture3(*cmd) + + unless status.success? 
|| stdout.strip.start_with?('{') + log " faces.py error (exit #{status.exitstatus}): #{stderr.strip}" + return + end + + begin + results = JSON.parse(stdout) + rescue JSON::ParserError => e + log " faces.py output is not valid JSON: #{e.message}" + return + end + + faces_path = File.join(dir, 'faces.json') + faces = load_faces_json(faces_path) # re-read before writing (pick up concurrent changes) + + pending.each do |name| + full = File.join(dir, name) + faces[name] = results[full] + detail = faces[name].nil? ? 'error (will retry)' : + faces[name].empty? ? 'no faces' : + "#{faces[name].length} face(s)" + log " #{name}: #{detail}" + end + + save_faces_atomic(faces_path, faces) +end + +def run_pass + dirs = [MEDIA_ROOT] + Dir.glob("#{MEDIA_ROOT}/**/*/").sort + dirs.each do |dir| + return if $shutdown + process_dir(dir) + end +end + +# ── Main loop ───────────────────────────────────────────────────────────────── + +log "Starting (workers=#{FACES_WORKERS}, poll_interval=#{POLL_INTERVAL}s, media=#{MEDIA_ROOT})" + +loop do + break if $shutdown + run_pass + break if $shutdown + + # Sleep in 1-second increments so SIGTERM/SIGINT takes effect promptly + POLL_INTERVAL.times do + break if $shutdown + sleep 1 + end +end + +log 'Shutting down.' diff --git a/scripts/update.rb b/scripts/update.rb index 5671330..f909510 100644 --- a/scripts/update.rb +++ b/scripts/update.rb @@ -16,6 +16,9 @@ # - Safe to re-run at any time; all operations are idempotent. # - Unchanged directories are skipped via a .albumen_scanned sentinel file; # pass --force to bypass. +# +# Face detection is NOT handled here. Run face_daemon.rb (or let the systemd +# service manage it) to detect faces and write per-directory faces.json files. 
require 'json' require 'yaml' @@ -23,27 +26,30 @@ require 'fileutils' require 'mini_magick' require 'mini_exiftool' -MEDIA_ROOT = (ENV['MEDIA_ROOT'] || '/var/albumen').freeze -CACHE_ROOT = (ENV['CACHE_ROOT'] || '/opt/albumen/cache/thumbs').freeze -CONFIG_PATH = (ENV['CONFIG_PATH'] || '/opt/albumen/config.yml').freeze -THUMB_SIZE = 300 +MEDIA_ROOT = (ENV['MEDIA_ROOT'] || '/var/albumen').freeze +CACHE_ROOT = (ENV['CACHE_ROOT'] || '/opt/albumen/cache/thumbs').freeze +LOG_PATH = (ENV['LOG_PATH'] || '/opt/albumen/log/albumen.log').freeze +THUMB_SIZE = 300 IMAGE_EXTS = %w[jpg jpeg png gif webp heic heif tiff bmp].freeze VIDEO_EXTS = %w[mp4 mov avi mkv webm m4v ogv].freeze AUDIO_EXTS = %w[mp3 flac ogg wav m4a aac].freeze MEDIA_EXTS = (IMAGE_EXTS + VIDEO_EXTS + AUDIO_EXTS).freeze -TRANSCODE_EXTS = %w[avi mkv mov].freeze # not universally browser-playable; convert to MP4 +TRANSCODE_EXTS = %w[avi mkv mov].freeze SENTINEL_FILE = '.albumen_scanned'.freeze -_cfg = File.exist?(CONFIG_PATH) ? YAML.load_file(CONFIG_PATH, symbolize_names: true) : {} -FACES_ENABLED = (_cfg.dig(:faces, :enabled) == true).freeze -FACES_WORKERS = (_cfg.dig(:faces, :workers) || 4).freeze -VENV_PYTHON = File.expand_path('../venv/bin/python3', __dir__).freeze -FACES_SCRIPT = File.expand_path('faces.py', __dir__).freeze - # Explicit directory argument implies force — you asked for it, it should run. FORCE_UPDATE = !!(ARGV.delete('--force') || ARGV[0]) +def log(msg) + $stdout.puts msg + $stdout.flush + ts = Time.now.strftime('%Y-%m-%d %H:%M:%S') + File.open(LOG_PATH, 'a') { |f| f.puts "[#{ts}] [update] #{msg}" } +rescue StandardError + # never crash on log failure +end + # ── Directory processing ─────────────────────────────────────────────────────── def process_dir(dir, idx, total) @@ -51,20 +57,15 @@ def process_dir(dir, idx, total) label = rel.empty? ? 
'(root)' : rel prefix = "[#{idx}/#{total}]" - pending_faces = false unless FORCE_UPDATE sentinel = File.join(dir, SENTINEL_FILE) if File.exist?(sentinel) && File.mtime(sentinel) >= File.mtime(dir) - if faces_pending?(dir) - pending_faces = true # fall through, but only to run face detection - else - puts "#{prefix} Skipping #{label} (unchanged)" - return - end + log "#{prefix} Skipping #{label} (unchanged)" + return end end - puts "#{prefix} Scanning #{label}#{' (face detection pending)' if pending_faces}" + log "#{prefix} Scanning #{label}" json_path = File.join(dir, 'album.json') data = load_json(json_path) @@ -84,9 +85,9 @@ def process_dir(dir, idx, total) thumb = File.join(CACHE_ROOT, rel.empty? ? "#{n}.th.jpg" : "#{rel}/#{n}.th.jpg") if File.exist?(thumb) File.unlink(thumb) - puts " Removed: #{n} (+ thumb)" + log " Removed: #{n} (+ thumb)" else - puts " Removed: #{n}" + log " Removed: #{n}" end end @@ -96,20 +97,19 @@ def process_dir(dir, idx, total) base = File.basename(name, '.*') target = "#{base}.mp4" if current.include?(target) - # MP4 already exists — just ensure the marker is recorded data['files'][name] ||= {} data['files'][name]['transcoded_to'] = target next end full = File.join(dir, name) dest = File.join(dir, target) - puts " Transcoding: #{name} → #{target}" + log " Transcoding: #{name} → #{target}" transcode_to_mp4(full, dest) if File.exist?(dest) data['files'][name] ||= {} data['files'][name]['transcoded_to'] = target current << target - puts " → done" + log " → done" else warn " Transcode failed: #{name}" end @@ -132,8 +132,6 @@ def process_dir(dir, idx, total) generate_thumb_if_needed(full, rel, name, ext) end - batch_detect_faces(dir, current, data) if FACES_ENABLED - atomic_write_json(json_path, data) FileUtils.touch(File.join(dir, SENTINEL_FILE)) end @@ -152,7 +150,7 @@ def enrich_image(full, name, meta) raw = exif.date_time_original || exif.create_date || exif.date_time if raw meta['taken_at'] = raw.respond_to?(:strftime) ? 
raw.strftime('%Y-%m-%dT%H:%M:%S') : raw.to_s - puts " #{name}: taken_at = #{meta['taken_at']}" + log " #{name}: taken_at = #{meta['taken_at']}" end end @@ -169,14 +167,12 @@ def enrich_image(full, name, meta) warn " #{name}: EXIF error — #{e.message}" end - # If exiftool found nothing at all, record that so we don't retry on every re-scan. if meta['taken_at'].nil? && meta['camera'].nil? && meta['aperture'].nil? && meta['shutter'].nil? && meta['iso'].nil? meta['exif_absent'] = true end end - # Dimensions (skip if already recorded) if meta['width'].nil? begin img = MiniMagick::Image.open(full) @@ -186,37 +182,6 @@ def enrich_image(full, name, meta) warn " #{name}: dimension error — #{e.message}" end end - -end - -def batch_detect_faces(dir, names, data) - return unless File.exist?(VENV_PYTHON) && File.exist?(FACES_SCRIPT) - - unprocessed = names.select do |name| - IMAGE_EXTS.include?(File.extname(name).downcase.delete_prefix('.')) && - (data['files'][name] || {})['faces'].nil? - end - return if unprocessed.empty? - - puts " Detecting faces in #{unprocessed.length} image(s) (#{FACES_WORKERS} workers)…" - paths = unprocessed.map { |n| File.join(dir, n) } - cmd = [VENV_PYTHON, FACES_SCRIPT, '--workers', FACES_WORKERS.to_s] + paths - - begin - out = IO.popen(cmd, err: '/dev/null', &:read).strip - results = JSON.parse(out.empty? ? '{}' : out) - raise 'expected Hash' unless results.is_a?(Hash) - - results.each do |path, faces| - name = File.basename(path) - next unless data['files'].key?(name) - next if faces.nil? # error on this file — leave faces: null to retry - data['files'][name]['faces'] = faces - puts " #{name}: #{faces.length} face(s)" unless faces.empty? 
- end - rescue StandardError => e - warn " Face detection batch error — #{e.message}" - end end def enrich_video(full, name, meta) @@ -232,12 +197,12 @@ end # ── Thumbnail generation ─────────────────────────────────────────────────────── def generate_thumb_if_needed(full, rel, name, ext) - return if AUDIO_EXTS.include?(ext) # audio uses a static icon + return if AUDIO_EXTS.include?(ext) cache = File.join(CACHE_ROOT, rel.empty? ? "#{name}.th.jpg" : "#{rel}/#{name}.th.jpg") return if File.exist?(cache) - puts " Generating thumb: #{name}" + log " Generating thumb: #{name}" FileUtils.mkdir_p(File.dirname(cache)) if VIDEO_EXTS.include?(ext) @@ -295,16 +260,6 @@ rescue JSON::ParserError => e {} end -def faces_pending?(dir) - return false unless FACES_ENABLED - json_path = File.join(dir, 'album.json') - return false unless File.exist?(json_path) - (load_json(json_path)['files'] || {}).any? do |name, meta| - IMAGE_EXTS.include?(File.extname(name).downcase.delete_prefix('.')) && - meta['faces'].nil? - end -end - # Fields the admin controls — never overwrite with stale values from our earlier read. ADMIN_ALBUM_KEYS = %w[title description cover cover_dynamic sort_reverse visible].freeze ADMIN_FILE_KEYS = %w[title caption visible].freeze @@ -348,7 +303,7 @@ if Process.uid == 0 begin require 'etc' pw = Etc.getpwnam(service_user) - puts "Fixing ownership of #{start} → #{service_user}" + log "Fixing ownership of #{start} → #{service_user}" FileUtils.chown_R(pw.uid, pw.gid, start) rescue ArgumentError warn "Warning: user '#{service_user}' not found; skipping chown" @@ -361,4 +316,4 @@ dirs = dirs.uniq total = dirs.size dirs.each_with_index { |d, i| process_dir(d, i + 1, total) } -puts 'Done.' +log 'Done.' |
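
Both sidecars in this change are written with a write-then-rename step so a reader never sees a half-written file. A minimal, self-contained sketch of that pattern follows, assuming a POSIX filesystem where a rename within one directory is atomic. The name `write_json_atomic` and the sample data are hypothetical; the daemon's own helper is `save_faces_atomic`.

```ruby
require 'json'
require 'tmpdir'

# Illustrative sketch of the write-then-rename pattern; `write_json_atomic`
# is a hypothetical name, not the daemon's own helper.
def write_json_atomic(path, data)
  tmp = "#{path}.tmp.#{Process.pid}" # same directory, so the rename stays on one filesystem
  File.write(tmp, JSON.generate(data))
  File.rename(tmp, path)             # replaces any existing file in one step
ensure
  # The temp file only still exists if the write or rename failed.
  File.unlink(tmp) if tmp && File.exist?(tmp)
end

# Write twice into a throwaway directory, then read back the result and
# list the directory to confirm no temp file is left behind.
round_trip = Dir.mktmpdir do |dir|
  path = File.join(dir, 'faces.json')
  write_json_atomic(path, 'photo1.jpg' => [])
  write_json_atomic(path, 'photo1.jpg' => [], 'photo2.jpg' => nil)
  [JSON.parse(File.read(path)), Dir.children(dir)]
end
```

Putting the temp file next to its target matters: a rename across filesystems is not atomic, so a temp file under `/tmp` would lose the guarantee.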
