ai-media-hub/TODO.md
AI Assistant b43886e950
build-push / docker (push) Successful in 4m52s
Add in-app result viewer and expand Gemini review
2026-03-16 10:12:12 +09:00

AI Media Hub Handover

Working Rule

  • This file is both backlog and handover log.
  • Every meaningful change should record:
    • what changed
    • why it changed
    • how it was verified
    • what is still risky or incomplete
  • If a push fails or a change remains local-only, that must be written here explicitly.

Current State At A Glance

  • Project: ai-media-hub
  • Goal: AI-assisted media discovery + ingest dashboard for Unraid
  • Backend: Go
  • Worker: Python + yt-dlp + ffmpeg
  • Frontend: HTML + Vanilla JS + Tailwind CDN
  • Database: SQLite
  • Search backend: SearXNG
  • AI translation / visual ranking: Gemini 2.5 Flash
  • Deployment target: single Docker container on Unraid
  • Git remote: https://git.savethenurse.com/savethenurse/ai-media-hub.git

Current Status Summary

  • Upload / direct download flow is implemented and broadly usable.
  • Search is implemented end-to-end and now refactored into source-specific collectors.
  • Search remains the main unstable subsystem.
  • Envato metadata and preview extraction are much stronger than before.
  • Artgrid metadata fidelity is improved, but stable public hover-video preview extraction is still not solved.
  • Frontend now logs more useful API and debug information than earlier versions.
  • A local self-test workflow now exists and should be run before container builds or pushes.

Current Architecture

  • backend/main.go
    • app bootstrap
    • env loading
    • static frontend serving
    • route registration
  • backend/handlers/api.go
    • upload / download / search APIs
    • WebSocket progress broadcast
    • debug event broadcast
    • search request orchestration only, with ranking/Gemini logic mostly moved out to backend/services/ranker.go
  • backend/services/cse.go
    • SearXNG querying
    • shared search helpers
    • source-specific enrich helpers
    • URL filtering / parsing utilities
  • backend/services/search_collectors.go
    • source-specific collectors:
      • envatoCollector
      • artgridCollector
      • googleVideoCollector
  • backend/services/ranker.go
    • ranking
    • Gemini candidate cap logic
    • Gemini batch evaluation wrapper
    • recommendation merge logic
  • backend/services/gemini.go
    • query translation
    • deterministic query expansion
    • Gemini vision scoring
    • video frame extraction via ffmpeg when needed
  • backend/models/db.go
    • SQLite init
    • download history
  • worker/downloader.py
    • yt-dlp probe / download
    • ffmpeg clip extraction
  • frontend/index.html
    • main dashboard UI
    • result viewer modal
    • preview modal
    • debug log panel
  • frontend/app.js
    • API calls
    • WebSocket status bar
    • result viewer modal
    • hover preview playback
    • direct download handoff for Google Video results
    • debug logger panel
    • platform toggles
  • frontend/style.css
    • custom styles
    • clamp helpers
    • slider thumb styles
    • debug panel scrollbar styles
  • scripts/selftest.sh
    • local smoke test flow
  • scripts/mock_searxng.py
    • local mock SearXNG used by self-test
  • unraid-template.xml
    • Unraid template for current image source

Search Flow: Current Implementation

  1. User enters a query in Zone A.
  2. Frontend sends /api/search with:
    • query
    • selected platforms
  3. Backend translates the query in GeminiService.TranslateQuery.
    • Gemini translation if available
    • Google Translate HTTP fallback
    • Korean media-term dictionary fallback
    • explicit normalization for known compound phrases such as 사이버 펑크 -> cyberpunk
  4. Backend builds deterministic English search variants in GeminiService.ExpandQuery.
  5. SearchService.SearchMedia(...) orchestrates source-specific collectors.
  6. Collectors query SearXNG separately for:
    • Envato
    • Artgrid
    • Google Video
  7. Each collector applies source-specific acceptance logic.
    • Google Video: YouTube-only plus noise filtering
    • Envato: elements.envato.com item URLs only
    • Artgrid: accepts both:
      • artgrid.io/clip/...
      • artlist.io/stock-footage/clip/...
  8. Artgrid canonical links are normalized to:
    • https://artgrid.io/clip/<id>/<slug>
  9. Results are enriched source-by-source.
    • Envato:
      • VideoObject JSON-LD preferred
      • page meta preferred over search-engine proxy thumbnail
      • preview mp4 extraction via JSON-LD / HTML parsing
      • Python HTML fetch fallback used when Go HTTP fetch gets Cloudflare challenge pages
    • Artgrid:
      • page title / description / thumbnail cleaning
      • homepage / challenge HTML is now rejected so generic site metadata does not overwrite clip metadata
      • preview video extraction still not stable
  10. Enriched results are ranked by the shared ranker.
  11. All ranked candidates are evaluated with Gemini Vision in batches.
  12. Merge order now prefers:
    • Gemini recommended items
    • Gemini-reviewed non-recommended items
    • keyword fallback items only if Gemini output is incomplete
  13. Frontend renders cards, result viewer modal, and hover previews.
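
The merge order in the search flow above can be sketched as follows. This is a minimal illustration in Python, not the Go code in backend/services/ranker.go. The Candidate type, its field names, and the choice to sort each tier by keyword score are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """Hypothetical result record; field names are illustrative only."""
    url: str
    keyword_score: float
    # True/False once Gemini has reviewed the item, None if it returned no verdict.
    gemini_recommended: Optional[bool] = None


def merge_results(candidates: List[Candidate]) -> List[Candidate]:
    """Order results as: Gemini recommended, Gemini-reviewed but not
    recommended, then keyword-only items Gemini did not cover."""
    recommended = [c for c in candidates if c.gemini_recommended is True]
    reviewed = [c for c in candidates if c.gemini_recommended is False]
    # This tier is non-empty only when the Gemini output was incomplete.
    fallback = [c for c in candidates if c.gemini_recommended is None]

    def by_score(c: Candidate) -> float:
        # Assumption: within a tier, a higher keyword score ranks first.
        return -c.keyword_score

    return (
        sorted(recommended, key=by_score)
        + sorted(reviewed, key=by_score)
        + sorted(fallback, key=by_score)
    )
```

With this shape, a strong keyword match that Gemini never reviewed still sorts below every reviewed item, which matches the "only if Gemini output is incomplete" rule.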

Direct Downloader Flow: Current Implementation

  1. User enters URL in Zone C.
  2. Frontend checks duplicate history via /api/history/check.
  3. Frontend loads preview metadata via /api/download/preview.
  4. Preview modal opens with:
    • media preview
    • duration
    • crop dual-thumb slider
    • quality select
  5. User confirms download.
  6. Backend launches Python worker.
  7. Worker downloads source with yt-dlp, clips with ffmpeg, emits JSON progress lines.
  8. Backend rebroadcasts progress over WebSocket.
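
The JSON progress lines in steps 7 and 8 can be sketched as below. The event fields ("stage", "percent") are assumptions for illustration; the actual schema in worker/downloader.py is not recorded in this file. The reader side is shown in Python for symmetry even though the real backend is Go.

```python
import json
import sys
from typing import IO, Iterable, Iterator


def emit_progress(stage: str, percent: float, out: IO[str] = sys.stdout) -> None:
    """Worker side: write one JSON object per line and flush immediately,
    so the parent process sees progress without buffering delays."""
    out.write(json.dumps({"stage": stage, "percent": round(percent, 1)}) + "\n")
    out.flush()


def read_progress(lines: Iterable[str]) -> Iterator[dict]:
    """Reader side: yield parsed events, skipping blank lines and anything
    that is not a JSON object (for example stray tool output)."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(event, dict):
            yield event
```

Skipping unparseable lines instead of failing keeps a single noisy line on stdout from aborting the whole progress stream.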

Major Work Completed So Far

  • Added local self-test workflow:
    • scripts/selftest.sh
    • scripts/mock_searxng.py
  • Fixed translation fallback when Gemini key is missing.
  • Added tests for translation fallback logic.
  • Added HLS frontend wiring:
    • hls.js script
    • native HLS fallback
  • Reduced search timeout risk by:
    • limiting collector result caps
    • limiting enrichment scope
    • limiting Gemini Vision evaluation scope
    • replacing oversized raw debug result payloads with summaries
  • Improved Google Video filtering:
    • rejects more music / trailer / BGM style noise
  • Improved Envato fidelity:
    • real title / description / thumbnail / preview from source page
  • Improved Artgrid fidelity:
    • accepts canonical Artlist URLs
    • normalizes Artgrid clip URLs
    • cleans title / description better
  • Refactored search into source-specific collectors.
  • Moved ranking and Gemini batch handling into backend/services/ranker.go.
  • Fixed server-side 500 caused by Gemini candidate cap exceeding available ranked candidates.
  • Improved frontend logging:
    • raw non-JSON error body logging
    • more compact debug payload rendering
  • Changed hover preview playback to lazy attach on hover:
    • attach source on mouseenter
    • wait for readiness before play()
    • detach source on mouseleave
  • Added in-app result viewer modal for search results:
    • results now open in a modal instead of directly opening a new tab
    • modal shows embedded site iframe, external open button, source summary, and full AI note
  • Google Video results can now jump directly into the existing direct-download preview / crop flow from the result viewer
  • Gemini reason generation is now intended to be Korean-first for readability
  • Gemini Vision evaluation now covers all ranked results instead of only a top subset
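
The Artgrid URL handling listed above (accepting Artlist canonical URLs and normalizing to https://artgrid.io/clip/<id>/<slug>) can be sketched as below. This is a Python illustration, not the Go collector code. It assumes the clip id is the purely numeric path segment and that exactly one id and one slug follow the clip prefix; the real segment order on either site is not confirmed here, so the sketch accepts both orders.

```python
from typing import Optional
from urllib.parse import urlparse


def normalize_artgrid_url(raw: str) -> Optional[str]:
    """Return the canonical Artgrid clip URL, or None if the URL is not an
    accepted clip URL. Expects an absolute URL with a scheme."""
    parsed = urlparse(raw)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    parts = [p for p in parsed.path.split("/") if p]

    if host == "artgrid.io" and parts[:1] == ["clip"]:
        rest = parts[1:]
    elif host == "artlist.io" and parts[:2] == ["stock-footage", "clip"]:
        rest = parts[2:]
    else:
        return None  # homepage, category page, or unrelated domain

    # Assumption: the id is the numeric segment, the slug is the other one.
    ids = [p for p in rest if p.isdigit()]
    slugs = [p for p in rest if not p.isdigit()]
    if len(ids) != 1 or len(slugs) != 1:
        return None
    return f"https://artgrid.io/clip/{ids[0]}/{slugs[0]}"
```

Query strings and fragments are dropped on purpose, so two search hits that differ only in tracking parameters collapse to the same canonical link.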

Current Features Implemented

  • Project folder structure
  • Dockerfile
  • Gitea workflow
  • Unraid template
  • SQLite download history
  • File upload
  • yt-dlp direct downloader
  • Preview modal for direct download
  • Crop selection slider
  • Quality selection
  • WebSocket realtime progress
  • Search source toggles
  • Search card hover preview support
  • Result viewer modal for search results
  • Google Video direct-download handoff from search results
  • Debug log panel in frontend
  • .log download from debug panel
  • Local self-test workflow
  • Source-specific search collectors
  • Shared ranker service layer

Important Current Constraints / Known Problems

  • Search backend quality is still the most fragile subsystem.
  • Search relevance is still heuristic-heavy and not yet benchmarked against a durable real-query set.
  • Embedded result viewer uses an iframe, so some third-party sites may still block embedding with X-Frame-Options / CSP.
  • Artgrid hover-video preview is still partial / unresolved:
    • provided Artgrid HTML snapshots and downloaded asset bundles did not expose a stable public preview mp4/m3u8 URL
    • public HTML often only exposes title / description / thumbnail / canonical URL
  • Artgrid can still be sensitive to how SearXNG indexes canonical domains.
  • Full browser-level validation is still not covered by local self-test.
  • Frontend JavaScript still has no Node-based lint/build step in this environment.
  • Search cards now separate source snippet from AI reason, but metadata fidelity still depends on source enrichment quality.
  • Gemini notes are now intended to be Korean, but final output quality still depends on Gemini response consistency.
  • The local self-test script is better than before, but it is still a smoke test, not full integration coverage.
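
For the iframe constraint above, a rough pre-check on a target page's response headers could look like the sketch below. Nothing like this exists in the codebase as far as this file records; the function name and the embedder origin are hypothetical. It only handles exact origin matches and "*" in frame-ancestors, not wildcard host patterns.

```python
from typing import Mapping


def embedding_blocked(headers: Mapping[str, str], embedder_origin: str) -> bool:
    """Guess whether a third-party page refuses to be embedded in an iframe
    served from embedder_origin, based on its response headers."""
    lowered = {k.lower(): v for k, v in headers.items()}

    # A CSP frame-ancestors directive takes precedence over X-Frame-Options.
    csp = lowered.get("content-security-policy", "")
    for directive in csp.split(";"):
        tokens = directive.strip().split()
        if tokens and tokens[0].lower() == "frame-ancestors":
            sources = [t.lower() for t in tokens[1:]]
            allowed = "*" in sources or embedder_origin.lower() in sources
            return not allowed

    # For a cross-origin embed, both DENY and SAMEORIGIN block the iframe.
    xfo = lowered.get("x-frame-options", "").strip().lower()
    return xfo in ("deny", "sameorigin")
```

A check like this could let the result viewer skip straight to the external open button when the iframe is known to fail.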

Current Risks Around Search Quality

  • Upstream SearXNG quality still controls the candidate pool.
  • Gemini Vision can only rerank the candidates it receives.
  • If source enrichment fails, Gemini may still judge a weaker proxy thumbnail or fallback image.
  • Compound Korean intents are better handled now, but the translation path is still heuristic and can drift on niche concepts.
  • Running Gemini Vision across all ranked results increases latency and token usage compared with the earlier capped approach.
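
The latency and token cost of evaluating every ranked candidate scales with the number of batches. The sketch below shows a clamped cap, batch slicing, and the resulting request count. It is a Python illustration under assumed names; the batch size used by the Go ranker is not recorded in this file.

```python
import math
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def clamp_cap(cap: int, available: int) -> int:
    """Never ask for more candidates than exist. An unclamped cap is the
    kind of mismatch behind the earlier candidate-cap 500."""
    return max(0, min(cap, available))


def batches(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive batches; the last one may be shorter."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def request_count(candidate_count: int, batch_size: int) -> int:
    """Number of Gemini Vision calls needed to cover all candidates."""
    return math.ceil(candidate_count / batch_size)
```

For example, 7 ranked candidates with a batch size of 3 need 3 calls, where a cap of 3 would have needed only 1.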

Frontend Debug Logger

  • UI button: bottom-right Logs
  • Files:
    • frontend/index.html
    • frontend/app.js
    • frontend/style.css
  • Logs currently capture:
    • API request / response
    • WebSocket progress messages
    • ignored WS debug messages
    • status updates
    • platform toggle state
    • result viewer modal open / close
    • preview source attach / detach
    • hover start / hover end
    • hover play errors
    • modal preview open / close
    • browser errors
    • promise rejections
    • backend debug broadcasts

Current Environment Variables

  • APP_ROOT
  • APP_ADDR
  • SQLITE_PATH
  • DOWNLOADS_DIR
  • FRONTEND_DIR
  • WORKER_SCRIPT
  • SEARXNG_BASE_URL
  • SEARXNG_GOOGLE_VIDEO_ENGINE
  • SEARXNG_WEB_ENGINE
  • GEMINI_API_KEY
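
A quick way to see which of these variables are set before booting the app is sketched below. The variable names come from the list above; the helper itself is hypothetical and no default values are assumed. An unset variable is not necessarily fatal (a missing GEMINI_API_KEY, for instance, triggers the translation fallback).

```python
import os
from typing import Dict, List, Mapping, Tuple

KNOWN_VARS = (
    "APP_ROOT",
    "APP_ADDR",
    "SQLITE_PATH",
    "DOWNLOADS_DIR",
    "FRONTEND_DIR",
    "WORKER_SCRIPT",
    "SEARXNG_BASE_URL",
    "SEARXNG_GOOGLE_VIDEO_ENGINE",
    "SEARXNG_WEB_ENGINE",
    "GEMINI_API_KEY",
)


def load_settings(env: Mapping[str, str] = os.environ) -> Tuple[Dict[str, str], List[str]]:
    """Return (settings, missing): non-empty values by name, plus the names
    that are unset or blank, in the order listed above."""
    settings: Dict[str, str] = {}
    missing: List[str] = []
    for name in KNOWN_VARS:
        value = env.get(name, "").strip()
        if value:
            settings[name] = value
        else:
            missing.append(name)
    return settings, missing
```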

Local Self-Test Workflow

  • Primary command:
    • bash scripts/selftest.sh
  • What it currently verifies:
    • Go formatting for touched backend files
    • Python syntax for worker + mock SearXNG
    • go test ./...
    • backend binary build
    • local app boot with temp SQLite/download dirs
    • /healthz
    • /api/search using local mock SearXNG
    • /api/upload
  • Notes:
    • search step now retries to reduce startup timing flakiness
    • this is a smoke test, not a browser-level verification suite
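
The retry behavior noted above can be sketched as a small helper. scripts/selftest.sh is a shell script, so this Python version only illustrates the idea; the attempt count and delay are assumptions, not the script's actual values.

```python
import time
from typing import Callable


def retry(
    check: Callable[[], bool],
    attempts: int = 5,
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call check() until it returns True or attempts run out.
    OSError (for example connection refused while the app is still
    booting) counts as a failed attempt rather than a crash."""
    for attempt in range(1, attempts + 1):
        try:
            if check():
                return True
        except OSError:
            pass
        if attempt < attempts:
            sleep(delay_seconds)  # no sleep after the final attempt
    return False
```

A check function here would typically issue the /api/search request against the mock SearXNG and return True on a 200 response.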

Verified Locally In This Environment

  • go build -o /tmp/ai-media-hub ./backend
  • go test ./...
  • Python syntax check for worker + self-test helper
  • local app boot / /healthz through scripts/selftest.sh
  • local /api/search against mock SearXNG through scripts/selftest.sh
  • local /api/upload through scripts/selftest.sh
  • Not verified: full browser-level validation could not be reproduced in this environment

Unraid / Docker / CI Notes

  • Dockerfile uses:
    • Go build stage
    • static ffmpeg image stage
    • Python runtime stage
  • Heavy apt ffmpeg install path was removed earlier to reduce build time.
  • Gitea workflow builds and pushes:
    • git.savethenurse.com/savethenurse/ai-media-hub:latest
    • git.savethenurse.com/savethenurse/ai-media-hub:${{ github.sha }}

Recent Relevant Commits

  • 9637b76 Improve query intent handling and preview playback
  • 6d9391b Expand Artgrid query coverage to artlist canonical URLs
  • d8cc32e Fix Gemini candidate cap causing search 500s
  • e426261 Fix Artgrid collector matching and split ranker
  • 5aebbef Refactor search into source-specific collectors
  • ae091c5 Improve source parsing from Envato and Artgrid HTML
  • 06ea4f3 Restore Envato and Artgrid fallback search breadth
  • 7dfb1ad Stabilize search pipeline and improve preview diagnostics
  • 6f3149a Add local self-test flow and fix fallback regressions
  • f968458 Rewrite TODO as project handover

Git / Push Status

  • Last commit known to have been pushed:
    • 6d9391b (pushed successfully)
  • Local-only work currently exists:
    • 9637b76 Improve query intent handling and preview playback
  • Push status for 9637b76:
    • not pushed
    • remote rejected the push with:
      • remote unpack failed: unable to create temporary object directory
      • remote rejected main -> main (unpacker error)
  • Interpretation:
    • current blocker appears to be on the remote git server side, not a local git history issue

Highest-Value Next Steps

  • Retry the push of local commit 9637b76 once the remote git storage/unpacker issue is resolved
  • Build collector-specific integration tests with recorded SearXNG samples
  • Separate source enrichment tests from live network behavior using local fixtures
  • Add a browser-level preview validation path, especially for hover video
  • If Artgrid hover preview is still required, obtain one real clip HAR / DevTools network export and derive a stable preview URL parser
  • Build a small fixed real-query benchmark set to evaluate search quality before further tuning
  • If frontend tooling becomes available, add lint/build checks

Short Handover Summary

  • The codebase runs.
  • Upload and direct download features are implemented and broadly usable.
  • Search has been significantly refactored and is in better shape than before, but is still the main unstable area.
  • Envato source fidelity is much better than earlier.
  • Artgrid source fidelity is better, but preview-video extraction is still incomplete.
  • There is now a local self-test workflow.
  • There is one known local commit that has not been pushed because the remote repo reported an unpacker error.