From d32b5e99afc6f0cffefa594510cda0e4f414db75 Mon Sep 17 00:00:00 2001
From: Ken D'Ambrosio
Date: Fri, 22 May 2026 22:50:35 +0000
Subject: Speed up update.rb and fix UI always forcing full rescan
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- update.rb: skip exiftool on images marked exif_absent (set after first
  failed attempt); prevents repeated slow scans of old photos with no EXIF
- update.rb: explicit directory argument now implies force — passing a path
  always rescans that subtree regardless of sentinel mtime
- app.rb: /admin/update no longer hardcodes --force; sentinel-based skipping
  is used by default, making UI updates finish in seconds instead of minutes
- admin/album.erb: add "Force rescan all" checkbox to Run Update button;
  checked state passes force=1 to the server and restores --force behavior
- README.md, DESIGN.md: document sentinel skipping, exif_absent flag, and
  explicit-directory force behavior

Co-Authored-By: Claude Sonnet 4.6
---
 DESIGN.md             |  25 +++++++++--
 README.md             |  15 ++++++-
 app.rb                |  72 ++++++++++++++++++++++++++++-
 public/css/style.css  |   9 ++++
 scripts/update.rb     |  36 ++++++++++++---
 views/admin/album.erb | 122 +++++++++++++++++++++++++++++++++++++++++++++++++-
 6 files changed, 266 insertions(+), 13 deletions(-)

diff --git a/DESIGN.md b/DESIGN.md
index 2edcc1a..7639ffc 100644
--- a/DESIGN.md
+++ b/DESIGN.md
@@ -163,6 +163,7 @@ used. The file is written atomically (write to a `.tmp` file, then
 | `visible` | `true` | If `false`, hidden from non-admin visitors |
 | `taken_at` | `null` | ISO 8601 timestamp from EXIF; used for chronological sorting |
 | `width` / `height` | `null` | Pixel dimensions recorded by `update.rb` |
+| `exif_absent` | `null` | Set to `true` by `update.rb` when exiftool found no metadata; skips re-extraction on future rescans |
 
 When `taken_at` is present on *any* file in an album, the entire album is
 sorted chronologically. Albums with no `taken_at` data stay in filename
@@ -342,17 +343,31 @@ Run this after copying new media files onto the server. It is safe to
 re-run at any time — all operations are idempotent.
 
 ```bash
-ruby /opt/albumen/scripts/update.rb [optional/subdir]
+ruby /opt/albumen/scripts/update.rb               # full tree, skip unchanged dirs
+ruby /opt/albumen/scripts/update.rb 2024-Italy    # explicit subtree, always runs
+ruby /opt/albumen/scripts/update.rb --force       # full tree, ignore all sentinels
 ```
 
-**What it does, per directory:**
+**Change detection** — each directory gets a `.albumen_scanned` sentinel file
+whose mtime is set at the end of a successful scan. On subsequent runs the
+script compares `sentinel.mtime >= dir.mtime`: if true the directory is skipped
+entirely (no file I/O). A global run with nothing new completes in well under a
+second regardless of library size.
+
+Providing an explicit subdirectory argument bypasses the sentinel for that
+subtree, so `update.rb some-album` always rescans that album even if the
+directory mtime appears unchanged. `--force` bypasses sentinels for the whole
+tree.
+
+**What it does, per directory (when not skipped):**
 
 1. Reads the existing `album.json` (or starts from defaults).
-2. Removes stale `files` entries for deleted files.
+2. Removes stale `files` entries for deleted files (and their thumbnails).
 3. For each media file:
    - **Images:** reads EXIF `DateTimeOriginal` (or `CreateDate`) and stores
      it as `taken_at`; reads pixel dimensions. Both are skipped if already
-     recorded.
+     recorded. If exiftool finds no metadata at all, sets `exif_absent: true`
+     so the tool is not re-invoked on future rescans of that file.
    - **Videos:** runs `ffprobe` to record duration. Skipped if already recorded.
    - **All non-audio:** generates a thumbnail if one doesn't already exist.
@@ -361,6 +376,8 @@ ruby /opt/albumen/scripts/update.rb [optional/subdir]
    fields (`title`, `description`, `cover`, `sort_reverse`, `visible`,
    per-file `title`/`caption`/`visible`).
 5. Writes the updated JSON atomically.
+6. Touches the `.albumen_scanned` sentinel so the next global run skips
+   this directory.
 
 **Ownership:** When run as root (the typical case after an rsync), the script
 calls `FileUtils.chown_R` to transfer ownership of the media tree
diff --git a/README.md b/README.md
index a056891..adc19a6 100644
--- a/README.md
+++ b/README.md
@@ -32,6 +32,8 @@ back end, plain HTML/CSS/JS front end. Live at **https://albumen.jots.org**.
 - Per-album: title, description, cover image (specific file or random), sub-album order, visibility
 - Per-file: caption, visibility
 - Save button at top and bottom of the edit form
+- **Run Update** button scans for new/removed files and generates missing thumbnails;
+  **Force rescan all** checkbox bypasses the sentinel and rescans every directory
 
 ### Media support
 
@@ -84,20 +86,29 @@ The update script walks the media tree, creates/updates `album.json` files with
 EXIF dates and image dimensions, and pre-generates thumbnails.
 
 ```bash
-# On the server — process the entire tree
+# On the server — process the entire tree (skips unchanged directories)
 ruby /opt/albumen/scripts/update.rb
 
-# Process only one album (and its sub-albums)
+# Process only one album (and its sub-albums) — always runs regardless of mtime
 ruby /opt/albumen/scripts/update.rb 2024-Italy
 
 # With an absolute path
 ruby /opt/albumen/scripts/update.rb /var/albumen/2024-Italy
+
+# Force a full rescan of everything, ignoring all change detection
+ruby /opt/albumen/scripts/update.rb --force
 ```
 
 **Resilience guarantees — safe to interrupt and re-run at any point:**
 - `album.json` is written atomically (temp file + rename); no partial writes.
+- Unchanged directories are skipped via a `.albumen_scanned` sentinel file —
+  a global run with nothing new typically completes in under a second.
+- Providing an explicit directory bypasses the sentinel for that subtree, so
+  `update.rb some-album` always rescans that album even if nothing appears changed.
 - Thumbnails that already exist are skipped entirely.
 - EXIF metadata already recorded is not re-extracted.
+- Images with no EXIF data are marked `exif_absent` after the first attempt so
+  exiftool is not re-invoked on them in subsequent rescans.
 - Deleted files are pruned from `album.json` automatically.
 
 Typical workflow:
diff --git a/app.rb b/app.rb
index ff7740c..6b11d5d 100644
--- a/app.rb
+++ b/app.rb
@@ -37,6 +37,7 @@ configure do
   set :bind, '127.0.0.1'
   set :port, 4567
   set :logging, true
+  Rack::Utils.multipart_part_limit = 2000 # default 128; allow bulk photo uploads
 end
 
 configure :production do
@@ -478,9 +479,13 @@ end
 post '/admin/update' do
   require_admin!
   rel = params[:rel].to_s.chomp('/')
+  force = params[:force].to_s == '1'
   job_id = SecureRandom.hex(8)
   script = File.join(__dir__, 'scripts', 'update.rb')
-  cmd = rel.empty? ? ['ruby', script] : ['ruby', script, rel]
+  args = []
+  args << '--force' if force
+  args << rel unless rel.empty?
+  cmd = ['ruby', script, *args]
 
   UPDATE_JOBS_MUTEX.synchronize do
     UPDATE_JOBS[job_id] = { status: :running, lines: [] }
@@ -515,6 +520,71 @@ get '/admin/update/:id' do
   { status: job[:status], lines: job[:lines] }.to_json
 end
 
+post '/admin/upload' do
+  require_admin!
+
+  rel = params['rel'].to_s.chomp('/')
+
+  sub_name = params['new_album_name'].to_s.strip
+  sub_name = '' if sub_name.match?(%r{[/\x00]}) || %w[. ..].include?(sub_name)
+
+  target_rel = if !sub_name.empty?
+                 rel.empty? ? sub_name : "#{rel}/#{sub_name}"
+               else
+                 rel
+               end
+
+  target_dir = if target_rel.empty?
+                 MEDIA_ROOT
+               else
+                 full = File.expand_path(target_rel, MEDIA_ROOT)
+                 halt 400, 'Invalid path' unless full.start_with?("#{MEDIA_ROOT}/")
+                 full
+               end
+
+  FileUtils.mkdir_p(target_dir)
+
+  files = params['files[]'] || params['files']
+  files = [files] unless files.is_a?(Array)
+  files = files.compact
+
+  saved = 0
+  files.each do |f|
+    next unless f.is_a?(Hash) && f[:filename].to_s.strip != ''
+    name = File.basename(f[:filename].to_s.encode('UTF-8', invalid: :replace, undef: :replace).gsub("\x00", ''))
+    next if name.empty?
+    ext = File.extname(name).downcase.delete_prefix('.')
+    next unless MEDIA_EXTS.include?(ext)
+    dest = File.join(target_dir, name)
+    FileUtils.cp(f[:tempfile].path, dest)
+    saved += 1
+  end
+
+  job_id = SecureRandom.hex(8)
+  script = File.join(__dir__, 'scripts', 'update.rb')
+  cmd = target_rel.empty? ? ['ruby', script] : ['ruby', script, target_rel]
+
+  UPDATE_JOBS_MUTEX.synchronize { UPDATE_JOBS[job_id] = { status: :running, lines: [] } }
+
+  Thread.new do
+    begin
+      IO.popen(cmd, err: [:child, :out]) do |io|
+        io.each_line { |line| UPDATE_JOBS_MUTEX.synchronize { UPDATE_JOBS[job_id][:lines] << line.chomp } }
+      end
+      code = $?.exitstatus
+      UPDATE_JOBS_MUTEX.synchronize { UPDATE_JOBS[job_id][:status] = code.zero? ? :done : :error }
+    rescue => e
+      UPDATE_JOBS_MUTEX.synchronize do
+        UPDATE_JOBS[job_id][:status] = :error
+        UPDATE_JOBS[job_id][:lines] << "Error: #{e.message}"
+      end
+    end
+  end
+
+  content_type :json
+  { job_id: job_id, saved: saved, album_rel: target_rel }.to_json
+end
+
 # ── Thumbnail generation ───────────────────────────────────────────────────────
 
 def generate_thumb(source, dest, ext)
diff --git a/public/css/style.css b/public/css/style.css
index 059d340..abbedf6 100644
--- a/public/css/style.css
+++ b/public/css/style.css
@@ -400,3 +400,12 @@ tr.delete-marked td { background: rgba(192,57,43,.08); }
 .update-log { background: #1a1a1a; color: #e0e0e0; font-size: .8rem; line-height: 1.5;
   padding: 12px 14px; border-radius: var(--radius); max-height: 340px; overflow-y: auto;
   white-space: pre-wrap; word-break: break-all; margin: 0; }
+
+/* ── Admin upload ──────────────────────────────────────────────────────── */
+.admin-upload { margin-top: 32px; }
+.admin-upload h2 { font-size: 1rem; color: var(--text-dim); margin-bottom: 6px; }
+.upload-file-row { display: flex; align-items: center; gap: 10px; flex-wrap: wrap; margin-top: 8px; }
+.upload-file-count { font-size: .85rem; color: var(--text-dim); }
+.upload-panel { margin-top: 12px; }
+.upload-progress-wrap { background: var(--bg3); border-radius: 4px; height: 8px; overflow: hidden; margin-bottom: 10px; }
+.upload-progress-bar { height: 100%; width: 0%; background: var(--accent); border-radius: 4px; transition: width .15s linear; }
diff --git a/scripts/update.rb b/scripts/update.rb
index 9953505..822405f 100644
--- a/scripts/update.rb
+++ b/scripts/update.rb
@@ -1,9 +1,12 @@
 #!/usr/bin/env ruby
 # frozen_string_literal: true
 #
-# Usage: ruby update.rb [relative/path]
-#   Without argument: process entire MEDIA_ROOT tree.
-#   With argument: process only that subdirectory (and its children).
+# Usage: ruby update.rb [--force] [relative/path]
+#   Without argument: process entire MEDIA_ROOT tree, skipping directories
+#                     whose mtime hasn't changed since the last scan.
+#   With argument:    process only that subdirectory (and its children),
+#                     always scanning regardless of mtime (explicit request).
+#   --force:          scan entire tree ignoring all mtime sentinels.
 #
 # Resilience guarantees:
 #   - album.json is written atomically (temp-file + rename), so a crash
@@ -11,6 +14,8 @@
 #   - Thumbnails are checked before generation; already-done work is skipped.
 #   - EXIF and dimension extraction are skipped if already recorded.
 #   - Safe to re-run at any time; all operations are idempotent.
+#   - Unchanged directories are skipped via a .albumen_scanned sentinel file;
+#     pass --force to bypass.
 
 require 'json'
 require 'yaml'
@@ -27,12 +32,25 @@ VIDEO_EXTS = %w[mp4 mov avi mkv webm m4v ogv].freeze
 AUDIO_EXTS = %w[mp3 flac ogg wav m4a aac].freeze
 MEDIA_EXTS = (IMAGE_EXTS + VIDEO_EXTS + AUDIO_EXTS).freeze
 TRANSCODE_EXTS = %w[avi mkv mov].freeze # not universally browser-playable; convert to MP4
+SENTINEL_FILE = '.albumen_scanned'.freeze
+
+# Explicit directory argument implies force — you asked for it, it should run.
+FORCE_UPDATE = !!(ARGV.delete('--force') || ARGV[0])
 
 # ── Directory processing ───────────────────────────────────────────────────────
 
 def process_dir(dir)
   rel = dir.delete_prefix(MEDIA_ROOT).delete_prefix('/')
   label = rel.empty? ? '(root)' : rel
+
+  unless FORCE_UPDATE
+    sentinel = File.join(dir, SENTINEL_FILE)
+    if File.exist?(sentinel) && File.mtime(sentinel) >= File.mtime(dir)
+      puts "Skipping #{label} (unchanged)"
+      return
+    end
+  end
+
   puts "Scanning #{label}"
 
   json_path = File.join(dir, 'album.json')
@@ -102,13 +120,15 @@ def process_dir(dir)
   end
 
   atomic_write_json(json_path, data)
+  FileUtils.touch(File.join(dir, SENTINEL_FILE))
 end
 
 # ── Metadata enrichment ────────────────────────────────────────────────────────
 
 def enrich_image(full, name, meta)
-  needs_exif = meta['taken_at'].nil? || meta['camera'].nil? ||
-               meta['aperture'].nil? || meta['shutter'].nil? || meta['iso'].nil?
+  needs_exif = !meta['exif_absent'] &&
+               (meta['taken_at'].nil? || meta['camera'].nil? ||
+                meta['aperture'].nil? || meta['shutter'].nil? || meta['iso'].nil?)
   if needs_exif
     begin
       exif = MiniExiftool.new(full, numerical: false)
@@ -133,6 +153,12 @@ def enrich_image(full, name, meta)
     rescue StandardError => e
       warn " #{name}: EXIF error — #{e.message}"
     end
+
+    # If exiftool found nothing at all, record that so we don't retry on every re-scan.
+    if meta['taken_at'].nil? && meta['camera'].nil? &&
+       meta['aperture'].nil? && meta['shutter'].nil? && meta['iso'].nil?
+      meta['exif_absent'] = true
+    end
   end
 
   # Dimensions (skip if already recorded)
diff --git a/views/admin/album.erb b/views/admin/album.erb
index 15a043f..14a8f07 100644
--- a/views/admin/album.erb
+++ b/views/admin/album.erb
@@ -140,6 +140,9 @@

Update

Scans this album for new/removed files, extracts EXIF data, and generates missing thumbnails.

+
+
+

Upload

+

Add photos or videos. To create a new sub-album, fill in a name below. Up to 2 GB per file (reverse proxy may impose a lower limit).

+
+ +
+
+ + + No files chosen + +
+ +
+
--
cgit v1.2.3
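
Reviewer note: the sentinel check described in DESIGN.md and added to `process_dir` can be tried in isolation. The following is a minimal sketch, not code from this patch. The helper names `needs_scan?`, `mark_scanned`, and `demo` are hypothetical, and the mtimes are pinned with `File.utime` only so the comparison is deterministic regardless of filesystem timestamp granularity.

```ruby
require 'fileutils'
require 'tmpdir'

SENTINEL = '.albumen_scanned' # sentinel file name taken from the patch

# Hypothetical helper: true when the directory should be scanned, i.e. when
# forced, never scanned before, or modified after the sentinel was touched.
def needs_scan?(dir, force: false)
  return true if force

  sentinel = File.join(dir, SENTINEL)
  return true unless File.exist?(sentinel)

  # Same comparison as the patch, inverted: skip when sentinel >= dir.
  File.mtime(sentinel) < File.mtime(dir)
end

# Hypothetical helper: record that a scan finished successfully.
def mark_scanned(dir)
  FileUtils.touch(File.join(dir, SENTINEL))
end

# Walks through the four cases and returns the decision made in each.
def demo
  Dir.mktmpdir do |dir|
    results = {}
    results[:never_scanned] = needs_scan?(dir)

    mark_scanned(dir)
    now = Time.now
    File.utime(now, now, File.join(dir, SENTINEL))
    File.utime(now, now - 60, dir) # directory older than the sentinel
    results[:unchanged] = needs_scan?(dir)
    results[:forced] = needs_scan?(dir, force: true)

    File.utime(now, now + 60, dir) # simulate a later change to the directory
    results[:changed] = needs_scan?(dir)
    results
  end
end

p demo
```

One consequence worth keeping in mind when reading the patch: a directory's mtime changes when entries are added, removed, or renamed, but not when an existing file is overwritten in place, so an edit of that kind would still need an explicit path or `--force` to be picked up.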