ai-media-hub/TODO.md
AI Assistant e4262613c3
Fix Artgrid collector matching and split ranker
2026-03-13 19:31:57 +09:00

AI Media Hub Handover

Working Rule

  • From this point on, every meaningful change should be appended to this file so the next handoff can reconstruct:
    • what changed
    • why it changed
    • how it was verified
    • what remains risky
  • Treat this file as both backlog and handover log, not just a static TODO list.

Current Session Update (2026-03-13)

  • Added a local self-test workflow before push/container build:
    • scripts/selftest.sh
    • scripts/mock_searxng.py
  • Fixed Korean query translation fallback behavior:
    • If GEMINI_API_KEY is missing or Gemini translation fails, the code now still attempts Google Translate fallback.
    • If Google Translate fallback fails, dictionary replacement fallback still runs.
  • Added Go tests for translation fallback logic.
  • Fixed frontend HLS preview wiring:
    • hls.js is now loaded in frontend/index.html
    • frontend now tries hls.js first, then native HLS playback if available
  • Corrected the practical local verification note:
    • go build ./backend from repo root fails because the default output binary name (backend) collides with the existing backend/ directory
    • the verified build command is go build -o /tmp/... ./backend, which writes the binary to an explicit output path instead
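
The translation fallback order noted in this update (Gemini, then Google Translate, then dictionary replacement) can be sketched as a chain of callables. This is a minimal Python illustration, not the Go code in the backend: the function names, the stand-in translators, and the two sample dictionary entries are all hypothetical.

```python
from typing import Callable, Dict, Iterable, Optional

# Hypothetical sample entries; the real Korean media-term dictionary is not shown in this file.
SAMPLE_DICTIONARY: Dict[str, str] = {"야경": "night view", "도시": "city"}


def dictionary_fallback(query: str, table: Dict[str, str] = SAMPLE_DICTIONARY) -> str:
    """Replace known terms in place; unknown text passes through unchanged."""
    for source, target in table.items():
        query = query.replace(source, target)
    return query


def translate_with_fallback(
    query: str,
    translators: Iterable[Callable[[str], Optional[str]]],
) -> str:
    """Try each translator in order and return the first non-empty result.

    A translator signals failure by raising or by returning None/empty text.
    Dictionary replacement runs last, so the caller always gets a string back.
    """
    for translate in translators:
        try:
            result = translate(query)
        except Exception:
            continue  # e.g. missing API key, network error, malformed response
        if result and result.strip():
            return result.strip()
    return dictionary_fallback(query)


if __name__ == "__main__":
    def gemini_stub(q: str) -> Optional[str]:
        raise RuntimeError("GEMINI_API_KEY is missing")  # simulated failure

    def google_stub(q: str) -> Optional[str]:
        return None  # simulated failed HTTP fallback

    print(translate_with_fallback("도시 야경", [gemini_stub, google_stub]))
```

The point of the shape is that a missing key or a failed call at one level never short-circuits the levels below it, which is the behavior this session's fix restored.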

Current Session Update (2026-03-13, Search/Preview Follow-up)

  • Investigated a production search failure using downloaded frontend logs.
  • Identified the main timeout cause:
    • too many search results were being collected
    • too many Gemini Vision batches were being evaluated sequentially
    • backend debug messages were broadcasting oversized result payloads
  • Applied search pipeline optimization:
    • reduced per-source result caps
    • reduced query fan-out for Google Video
    • reduced enrichment cap
    • limited Gemini Vision evaluation to top-ranked candidates only
  • Improved Google Video filtering:
    • added bans for music/BGM/trailer-style noise results
  • Improved Envato enrichment fidelity:
    • source page metadata is now preferred over search-engine proxy thumbnails
    • source snippet/title are now taken from page metadata when available
    • preview mp4 extraction now works via HTML/JSON-LD parsing
    • added Python HTML fetch fallback for Cloudflare-challenged Envato pages because Go HTTP alone was receiving 403 challenge pages in testing
  • Improved Artgrid fidelity:
    • source page title/description/thumbnail are now preferred over search-engine snippets when available
    • preview extraction is still not considered solved for all Artgrid clips because public HTML tested here did not expose a stable mp4/m3u8 URL
  • Improved logging:
    • backend search debug events now emit summaries, timings, source counts, preview counts, and Gemini batch stats instead of giant raw arrays
    • frontend now logs raw non-JSON error bodies instead of collapsing them to {} on gateway/proxy failures
  • Improved result rendering:
    • search cards now show source snippet/description separately from AI reason to reduce confusion between asset metadata and Gemini commentary
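
Two of the optimizations above lend themselves to a short sketch: limiting Gemini Vision evaluation to the top-ranked candidates in batches, and emitting a compact summary instead of raw result arrays in debug events. The Python below is illustrative only; the function names and the `source` / `preview_url` field names are assumptions, and the real caps and batch sizes are not stated in this file.

```python
from collections import Counter
from typing import Any, Dict, List, Sequence


def plan_vision_batches(ranked: Sequence[Any], top_k: int, batch_size: int) -> List[List[Any]]:
    """Keep only the first top_k ranked candidates and split them into batches."""
    if top_k < 0 or batch_size <= 0:
        raise ValueError("top_k must be >= 0 and batch_size must be > 0")
    selected = list(ranked[:top_k])
    return [selected[i:i + batch_size] for i in range(0, len(selected), batch_size)]


def summarize_results(results: Sequence[Dict[str, Any]], elapsed_ms: int) -> Dict[str, Any]:
    """Build a small, JSON-friendly debug summary instead of broadcasting every result."""
    return {
        "total": len(results),
        "elapsed_ms": elapsed_ms,
        "source_counts": dict(Counter(r.get("source", "unknown") for r in results)),
        "preview_count": sum(1 for r in results if r.get("preview_url")),
    }


if __name__ == "__main__":
    candidates = [f"candidate-{i}" for i in range(10)]
    print(plan_vision_batches(candidates, top_k=5, batch_size=2))
    print(summarize_results([{"source": "envato", "preview_url": "x"}, {"source": "artgrid"}], 120))
```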

Current Session Update (2026-03-13, Regression Fix)

  • A regression was found after search optimization:
    • Envato and Artgrid disappeared entirely for some real searches while Google Video still returned results
  • Root cause:
    • the first optimization reduced query-variant breadth too aggressively
    • the first 3 query variants were not enough to recover Envato/Artgrid in some real SearXNG result sets
  • Fix applied:
    • search now runs in two stages
    • stage 1 searches only the first few variants for speed
    • stage 2 searches additional variants only for sources that still returned zero results
  • Intent:
    • keep the anti-timeout optimization
    • recover Envato/Artgrid recall when the early pass is too narrow
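
The two-stage search described above can be sketched as follows. This is a Python illustration under stated assumptions, not the Go implementation: `two_stage_search`, the `search` callable, and the default of three first-pass variants (taken from the "first 3 query variants" note) are hypothetical names and parameters.

```python
from typing import Callable, Dict, List, Sequence

# (source, query_variant) -> list of result dicts; stands in for a SearXNG call.
SearchFn = Callable[[str, str], List[dict]]


def two_stage_search(
    sources: Sequence[str],
    variants: Sequence[str],
    search: SearchFn,
    first_pass: int = 3,
) -> Dict[str, List[dict]]:
    """Run a narrow first pass, then widen only for sources that came back empty."""
    results: Dict[str, List[dict]] = {source: [] for source in sources}

    # Stage 1: every source, but only the first few variants (keeps latency down).
    for source in sources:
        for variant in variants[:first_pass]:
            results[source].extend(search(source, variant))

    # Stage 2: the remaining variants, only for sources with zero results so far.
    for source in sources:
        if results[source]:
            continue
        for variant in variants[first_pass:]:
            results[source].extend(search(source, variant))

    return results
```

Sources that already returned something in stage 1 never pay for the extra variants, so the anti-timeout benefit is kept while empty sources get a second chance.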

Current Session Update (2026-03-13, HTML Snapshot Analysis)

  • Used saved HTML snapshots supplied by the user for:
    • Envato item page
    • Artgrid clip page
  • Findings:
    • Envato page exposes clean VideoObject JSON-LD with:
      • exact asset title
      • rich description
      • thumbnail URL
      • preview mp4 URL
    • Artgrid page exposes reliable meta fields for:
      • title
      • description
      • thumbnail
      • canonical URL
    • Artgrid snapshot still does not expose a stable preview mp4 or m3u8 in the saved HTML or downloaded asset bundle inspected here
  • Fixes applied from the snapshots:
    • Envato enrichment now prefers VideoObject JSON-LD over generic meta tags
    • Envato search cards should now align much better with the actual source asset and preview
    • Artgrid title/description are now cleaned so Gemini/source text is less polluted by site suffixes and generic boilerplate
  • Remaining limitation:
    • Artgrid hover-video preview cannot be derived reliably from the provided snapshot alone
    • if Artgrid preview video is still required, the next useful artifact is a browser HAR or DevTools network capture from an opened clip page
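
As an illustration of the VideoObject JSON-LD finding above, here is a minimal Python sketch that pulls a VideoObject out of page HTML using only the standard library. It is not the backend's Go parser. `name`, `description`, `thumbnailUrl`, and `contentUrl` are standard schema.org VideoObject properties, but that the Envato preview mp4 sits in `contentUrl` is an assumption, as are the function and class names.

```python
import json
from html.parser import HTMLParser
from typing import Any, Dict, Iterator, List, Optional


class _JSONLDCollector(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buffer: List[str] = []
        self.blocks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and (dict(attrs).get("type") or "").lower() == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def _walk(node: Any) -> Iterator[Dict[str, Any]]:
    """Yield every dict in a decoded JSON tree (covers lists and @graph nesting)."""
    if isinstance(node, dict):
        yield node
        for value in node.values():
            yield from _walk(value)
    elif isinstance(node, list):
        for item in node:
            yield from _walk(item)


def _first(value: Any) -> Any:
    """schema.org allows a single value or a list; take the first entry of a list."""
    if isinstance(value, list):
        return value[0] if value else None
    return value


def extract_video_object(html: str) -> Optional[Dict[str, Any]]:
    """Return title/description/thumbnail/preview from the first VideoObject, or None."""
    collector = _JSONLDCollector()
    collector.feed(html)
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks rather than failing the whole page
        for node in _walk(data):
            node_type = node.get("@type")
            types = node_type if isinstance(node_type, list) else [node_type]
            if "VideoObject" in types:
                return {
                    "title": node.get("name"),
                    "description": node.get("description"),
                    "thumbnail": _first(node.get("thumbnailUrl")),
                    "preview": _first(node.get("contentUrl")),  # assumed preview field
                }
    return None
```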

Current Session Update (2026-03-13, Collector Refactor)

  • Refactored the search pipeline into source-specific collectors:
    • envatoCollector
    • artgridCollector
    • googleVideoCollector
  • SearchService now acts mainly as:
    • collector orchestration
    • query-pass control
    • dedupe
    • cross-source enrichment scheduling
  • Goal of the refactor:
    • reduce cross-source coupling
    • make future source-specific fixes safer
    • make it easier to replace or disable one source without destabilizing the others
  • Current implementation note:
    • collectors are still in Go code under backend services, but the responsibilities are now separated by source instead of one monolithic search loop
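
The division of labor after the refactor (source-specific collectors, with the search service handling orchestration and dedupe) might look roughly like the sketch below. It is written in Python for brevity while the real collectors are Go; the `Collector` protocol, `StaticCollector`, and `run_collectors` are hypothetical names, and deduping by exact URL is an assumption about how dedupe works.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Protocol, Set


class Collector(Protocol):
    """What the orchestrator needs from any source-specific collector."""

    name: str

    def collect(self, query: str) -> List[Dict[str, str]]: ...


@dataclass
class StaticCollector:
    """Stand-in collector that returns canned items (for illustration and tests)."""

    name: str
    items: List[Dict[str, str]] = field(default_factory=list)

    def collect(self, query: str) -> List[Dict[str, str]]:
        return list(self.items)


def run_collectors(
    collectors: Iterable[Collector],
    query: str,
    enabled: Set[str],
) -> List[Dict[str, str]]:
    """Run enabled collectors in order, tag each item with its source, dedupe by URL."""
    merged: List[Dict[str, str]] = []
    seen: Set[str] = set()
    for collector in collectors:
        if collector.name not in enabled:
            continue  # disabling one source does not touch the others
        try:
            items = collector.collect(query)
        except Exception:
            continue  # a failing source should not take down the whole search
        for item in items:
            url = item.get("url")
            if not url or url in seen:
                continue
            seen.add(url)
            merged.append({**item, "source": collector.name})
    return merged
```

Because the orchestrator only depends on `name` and `collect`, one source can be swapped out or disabled without editing the loop, which is the decoupling goal stated above.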

Current Session Update (2026-03-13, Artgrid Collector Fix + Ranker Split)

  • Artgrid collector regression fixed:
    • real search results can come back as artlist.io/stock-footage/clip/.../<id> instead of only artgrid.io/clip/<id>/...
    • renderable filtering was rejecting those URLs, which caused the "SearXNG returned no renderable results." error for Artgrid-only searches
  • Fix applied:
    • Artgrid renderability now accepts both artgrid.io and artlist.io/stock-footage/clip/... clip URLs
    • Artgrid result links are normalized into https://artgrid.io/clip/<id>/<slug> inside the collector flow before filtering/enrichment
  • Refactor continued:
    • ranking / Gemini candidate evaluation / recommendation merge logic moved out of handlers/api.go
    • new service layer file: backend/services/ranker.go
    • handler is now thinner and less coupled to search internals
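
The Artgrid link normalization described above can be sketched as below. This Python version is an illustration, not the collector's Go code, and it rests on two loudly stated assumptions: that the elided segment in artlist.io/stock-footage/clip/.../<id> is a single slug, and that clip ids are numeric. The function name and regexes are hypothetical.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Assumed path shapes (not confirmed by this file):
#   artgrid.io  -> /clip/<id>/<slug>
#   artlist.io  -> /stock-footage/clip/<slug>/<id>
_ARTGRID_PATH = re.compile(r"^/clip/(?P<id>\d+)(?:/(?P<slug>[^/]+))?/?$")
_ARTLIST_PATH = re.compile(r"^/stock-footage/clip/(?P<slug>[^/]+)/(?P<id>\d+)/?$")


def normalize_artgrid_url(raw: str) -> Optional[str]:
    """Map either accepted URL shape to https://artgrid.io/clip/<id>/<slug>.

    Returns None for anything that does not look like a clip URL, so the
    same function doubles as the renderability check.
    """
    parsed = urlparse(raw)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]

    if host == "artgrid.io":
        match = _ARTGRID_PATH.match(parsed.path)
    elif host == "artlist.io":
        match = _ARTLIST_PATH.match(parsed.path)
    else:
        return None
    if not match:
        return None

    base = f"https://artgrid.io/clip/{match.group('id')}"
    slug = match.group("slug")
    return f"{base}/{slug}" if slug else base


if __name__ == "__main__":
    print(normalize_artgrid_url("https://artlist.io/stock-footage/clip/city-night-aerial/123456"))
```

Normalizing before filtering and enrichment means later stages only ever see one canonical URL shape per clip.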

Local Self-Test Workflow

  • Primary command:
    • bash scripts/selftest.sh
  • What it currently verifies:
    • Go formatting for touched backend files
    • Python syntax for worker + mock SearXNG
    • go test ./...
    • backend binary build
    • local app boot with temp SQLite/download dirs
    • /healthz
    • /api/search using a local mock SearXNG server
    • /api/upload
  • Purpose:
    • allow safe local regression checks before push or container build without depending on real SearXNG, Gemini, or browser interaction
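
For orientation, a mock SearXNG endpoint of the kind the self-test relies on can be very small. The sketch below is not the contents of scripts/mock_searxng.py, which is not shown in this file; it is a hypothetical stand-in that serves a fixed payload on /search. The `query` and `results[].url/title/content` fields mirror a subset of SearXNG's JSON output as I understand it, and the sample URL is made up.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Any, Dict
from urllib.parse import parse_qs, urlparse


def build_payload(query: str) -> Dict[str, Any]:
    """Return a fixed, deterministic response body for any query."""
    return {
        "query": query,
        "results": [
            {
                "url": "https://elements.envato.com/example-item-ABC123",  # made-up URL
                "title": f"Mock result for {query}",
                "content": "Static snippet served by the mock.",
            }
        ],
    }


class MockSearxngHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        parsed = urlparse(self.path)
        if parsed.path != "/search":
            self.send_error(404)
            return
        query = parse_qs(parsed.query).get("q", [""])[0]
        body = json.dumps(build_payload(query)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args) -> None:
        pass  # keep self-test output quiet


def start_mock_server(port: int = 0) -> ThreadingHTTPServer:
    """Start the mock on 127.0.0.1 in a daemon thread; port 0 picks a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), MockSearxngHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

A self-test could call `start_mock_server()`, read the chosen port from `server.server_address[1]`, and presumably point SEARXNG_BASE_URL at it before booting the app; how scripts/selftest.sh actually wires this up is not documented here.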

Project Summary

  • Project: ai-media-hub
  • Goal: AI-assisted media discovery + ingest dashboard for Unraid
  • Backend: Go
  • Worker: Python + yt-dlp + ffmpeg
  • Frontend: HTML + Vanilla JS + Tailwind CDN
  • Database: SQLite
  • Current search backend: SearXNG
  • Current vision/ranking backend: Gemini 2.5 Flash
  • Deployment target: single Docker container on Unraid
  • Git remote: https://git.savethenurse.com/savethenurse/ai-media-hub.git

Current Architecture

  • backend/main.go: App bootstrap, env loading, static frontend serving, route registration
  • backend/handlers/api.go: Upload/download/search APIs, WebSocket progress broadcast, debug event broadcast
  • backend/services/cse.go: Actual search backend service. Despite the filename, this is no longer Google CSE logic; it now wraps SearXNG search, source filtering, result enrichment, and preview asset parsing
  • backend/services/gemini.go: Query translation, deterministic query expansion helper, Gemini vision scoring. Also extracts the first video frame with ffmpeg when no thumbnail exists
  • backend/models/db.go: SQLite init + download history
  • worker/downloader.py: yt-dlp probe/download + ffmpeg clip extraction
  • frontend/index.html: Main dashboard UI, preview modal, debug log panel
  • frontend/app.js: API calls, WebSocket status bar, hover preview playback, debug logger panel, platform toggles
  • frontend/style.css: Custom styles, clamp helpers, slider thumb styles, debug panel scrollbar styles
  • unraid-template.xml: Unraid template for current git.savethenurse.com image source

Search Flow: Current Implementation

  1. User enters a query in Zone A.
  2. Frontend sends /api/search with:
    • query
    • selected platforms
  3. Backend translates the query to English in GeminiService.TranslateQuery. Fallback order:
    • Gemini translation
    • Google Translate HTTP fallback
    • small Korean media-term dictionary replacement
  4. Backend builds deterministic English search variants in GeminiService.ExpandQuery.
  5. Backend calls SearchService.SearchMedia(...).
  6. Search service queries SearXNG for:
    • Envato
    • Artgrid
    • Google Video
  7. Search service filters source URLs aggressively:
    • Google Video: YouTube-only
    • Envato: elements.envato.com item URLs only
    • Artgrid: artgrid.io/clip/... and artlist.io/stock-footage/clip/... clip URLs, normalized to https://artgrid.io/clip/<id>/<slug>
  8. Search service enriches results:
    • Envato: parses item page HTML, preferring VideoObject JSON-LD over generic meta tags, for title, description, thumbnail, and preview video URL
    • Artgrid: attempts clip API + HTML parsing for thumbnails and preview sources
  9. Backend ranks all results locally.
  10. Backend evaluates only the top-ranked candidates with Gemini vision in batches.
  11. Backend merges Gemini recommendations + fallback ranked items and returns JSON to frontend.
  12. Frontend renders cards and hover previews.
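
Step 11 (merging Gemini recommendations with fallback ranked items) is sketched below in Python. The real logic lives in Go in backend/services/ranker.go and is not reproduced in this file, so the function name, the `url` key used for identity, and the `limit` parameter are all assumptions.

```python
from typing import Dict, List, Sequence, Set


def merge_recommendations(
    recommended: Sequence[Dict[str, str]],
    ranked: Sequence[Dict[str, str]],
    limit: int,
) -> List[Dict[str, str]]:
    """Put Gemini-recommended items first, then fill from the local ranking.

    Items are identified by URL; an item already taken from the recommended
    list is not repeated when it also appears in the fallback ranking.
    """
    if limit <= 0:
        return []
    merged: List[Dict[str, str]] = []
    seen: Set[str] = set()
    for item in list(recommended) + list(ranked):
        url = item["url"]
        if url in seen:
            continue
        seen.add(url)
        merged.append(item)
        if len(merged) >= limit:
            break
    return merged
```

If Gemini returns nothing (for example when the key is missing), the output degrades to the locally ranked list rather than an empty response.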

Direct Downloader Flow: Current Implementation

  1. User enters URL in Zone C.
  2. Frontend checks duplicate history via /api/history/check.
  3. Frontend loads preview metadata via /api/download/preview.
  4. Preview modal opens with:
    • media preview
    • duration
    • crop dual-thumb slider
    • quality select
  5. User confirms download.
  6. Backend launches Python worker.
  7. Worker downloads source with yt-dlp, clips with ffmpeg, emits JSON progress lines.
  8. Backend rebroadcasts progress over WebSocket.
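
Steps 7 and 8 rely on the worker writing one JSON object per line. A minimal Python sketch of both ends of that protocol follows; the `stage` and `percent` field names are hypothetical, since the actual fields emitted by worker/downloader.py are not listed in this file, and the real consumer is the Go backend rather than Python.

```python
import io
import json
import sys
from typing import Any, Dict, Iterable, Iterator, TextIO


def emit_progress(stage: str, percent: float, stream: TextIO = sys.stdout) -> None:
    """Write one progress event as a single JSON line and flush immediately.

    Flushing matters: without it the parent process may only see progress
    in large bursts once the pipe buffer fills.
    """
    stream.write(json.dumps({"stage": stage, "percent": percent}) + "\n")
    stream.flush()


def parse_progress_lines(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield decoded progress events, skipping non-JSON noise on the same stream."""
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # e.g. stray tool output mixed into stdout
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(event, dict):
            yield event


if __name__ == "__main__":
    buffer = io.StringIO()
    emit_progress("download", 42.5, buffer)
    emit_progress("clip", 100.0, buffer)
    for event in parse_progress_lines(buffer.getvalue().splitlines()):
        print(event)
```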

Current Features Implemented

  • Project folder structure
  • Dockerfile
  • Gitea workflow
  • Unraid template
  • SQLite download history
  • File upload
  • yt-dlp direct downloader
  • Preview modal for direct download
  • Crop selection slider
  • Quality selection
  • WebSocket realtime progress
  • Search source toggles
  • Search card hover preview support
  • Debug log panel in frontend
  • .log download from debug panel

Important Current Constraints / Known Problems

  • Search backend has been rewritten multiple times and is still the main unstable area.
  • Envato previews are parsed mainly from page HTML metadata / structured data.
  • Artgrid previews are partially inferred from:
    • clip page HTML
    • clip API attempts
    • HLS preview handling in frontend
  • Search relevance is still not considered stable enough.
  • Gemini batch evaluation exists, but search quality can still degrade if upstream SearXNG results are noisy.
  • Frontend JavaScript was not linted with Node tooling in this environment because node is not installed here.
  • Full browser-level preview validation is still not covered by the local self-test script.
  • Search cards now separate source snippet from AI reason, but metadata fidelity still depends on source enrichment quality.
  • Artgrid public pages inspected from this environment still did not expose a stable public preview video URL in HTML, so Artgrid hover-video support may remain partial until a browser-captured HTML/HAR sample reveals the real preview source pattern.

Frontend Debug Logger

  • UI button: bottom-right Logs
  • Files:
    • frontend/index.html
    • frontend/app.js
    • frontend/style.css
  • Logs currently capture:
    • API request / response
    • WebSocket progress messages
    • ignored WS debug messages
    • status updates
    • platform toggle state
    • preview source attach / detach
    • hover start / hover end
    • modal preview open / close
    • browser errors
    • promise rejections
    • backend debug broadcasts

Current Environment Variables

  • APP_ROOT
  • APP_ADDR
  • SQLITE_PATH
  • DOWNLOADS_DIR
  • FRONTEND_DIR
  • WORKER_SCRIPT
  • SEARXNG_BASE_URL
  • SEARXNG_GOOGLE_VIDEO_ENGINE
  • SEARXNG_WEB_ENGINE
  • GEMINI_API_KEY
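
A quick way to see which of these variables are set in a given environment is sketched below. The variable names come from the list above; the helper itself is hypothetical, and because this file does not state defaults or which variables are mandatory, it only reports what is set and what is not.

```python
import os
from typing import Dict, List, Mapping, Tuple

ENV_NAMES: Tuple[str, ...] = (
    "APP_ROOT",
    "APP_ADDR",
    "SQLITE_PATH",
    "DOWNLOADS_DIR",
    "FRONTEND_DIR",
    "WORKER_SCRIPT",
    "SEARXNG_BASE_URL",
    "SEARXNG_GOOGLE_VIDEO_ENGINE",
    "SEARXNG_WEB_ENGINE",
    "GEMINI_API_KEY",
)


def read_env(environ: Mapping[str, str] = os.environ) -> Tuple[Dict[str, str], List[str]]:
    """Return (set variables, unset variable names); empty strings count as unset."""
    values = {name: environ[name] for name in ENV_NAMES if environ.get(name)}
    missing = [name for name in ENV_NAMES if name not in values]
    return values, missing


if __name__ == "__main__":
    _, unset = read_env()
    print("unset:", ", ".join(unset) or "(none)")
```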

Unraid Template Notes

  • Current image repository in template: git.savethenurse.com/savethenurse/ai-media-hub:latest
  • Current registry in template: https://git.savethenurse.com

Docker / Build Notes

  • Dockerfile uses:
    • Go build stage
    • static ffmpeg image stage
    • Python runtime stage
  • Heavy apt ffmpeg install path was removed earlier to reduce build time.

Git / Push Workflow Used So Far

  • Branch: main
  • Remote: origin
  • All requested changes were committed and pushed incrementally to: https://git.savethenurse.com/savethenurse/ai-media-hub.git

Recent Relevant Commits

  • 8ed1e84 Add in-app debug log panel
  • 823bf12 Reflect selected platforms in search status
  • cceb040 Update platform status and HLS previews
  • ad8afd5 Tighten source filters and add platform toggles
  • 27000db Hide overlays during hover preview
  • b78865d Rewrite search flow and enrich preview assets
  • de24886 Filter non-English expansions and prefer stock sources
  • 0bd458d Boost translated search fallback and source priority

Next Priority Areas

  • Search backend quality stabilization: the search service is the main unresolved area.
  • Envato / Artgrid preview extraction hardening
  • Search result relevance validation against real user queries
  • Better matching between rendered description and actual linked asset
  • Add browser-level verification for preview/HLS behavior
  • Add more automated coverage for search ranking / filtering logic
  • If Artgrid hover preview is still required, collect one real clip HTML/HAR from a browser session and derive a stable preview URL parser
  • Add proper frontend build/lint step if Node becomes available

Verified Locally In This Environment

  • go build -o /tmp/ai-media-hub ./backend
  • go test ./... (currently no broad test suite beyond the added fallback tests)
  • Python syntax check for worker + self-test helper
  • local app boot / /healthz through scripts/selftest.sh
  • local /api/search against mock SearXNG through scripts/selftest.sh
  • local /api/upload through scripts/selftest.sh
  • Not verified: full browser-level validation was not reproducible in this environment

Short Handover Summary

  • The codebase exists and runs.
  • Upload/download features mostly exist.
  • Search is implemented but is still the most fragile subsystem.
  • A visible debug logging panel now exists in the web UI and should be used first when continuing work.