savethenurse/ai-media-hub

AI Assistant 6972178a2b

Reduce Gemini fallback noise and restore result volume

2026-03-16 14:38:46 +09:00

AI Media Hub Handover

Working Rule

This file is both backlog and handover log.
Every meaningful change should record:
- what changed
- why it changed
- how it was verified
- what is still risky or incomplete
If a push fails or a change remains local-only, that must be written here explicitly.

Current State At A Glance

Project: ai-media-hub
Goal: AI-assisted media discovery + ingest dashboard for Unraid
Backend: Go
Worker: Python + yt-dlp + ffmpeg
Frontend: HTML + Vanilla JS + Tailwind CDN
Database: SQLite
Search backend: SearXNG
AI translation / visual ranking: Gemini 2.5 Flash
Deployment target: single Docker container on Unraid
Git remote: https://git.savethenurse.com/savethenurse/ai-media-hub.git

Current Status Summary

Upload / direct download flow is implemented and broadly usable.
Search is implemented end-to-end and is now refactored into source-specific collectors.
Search remains the main unstable subsystem.
Envato metadata and preview extraction are much stronger than before, including additional hydration-data preview fallback.
Artgrid metadata fidelity is improved, but stable public hover-video preview extraction is still not solved.
Frontend now logs more useful API and debug information than earlier versions.
A local self-test workflow now exists and should be run before container builds or pushes.

Current Architecture

backend/main.go
- app bootstrap
- env loading
- static frontend serving
- route registration
backend/handlers/api.go
- upload / download / search APIs
- WebSocket progress broadcast
- debug event broadcast
- search request orchestration only; ranking and Gemini logic mostly moved to backend/services/ranker.go
backend/services/cse.go
- SearXNG querying
- shared search helpers
- source-specific enrich helpers
- URL filtering / parsing utilities
backend/services/search_collectors.go
- source-specific collectors:
  - envatoCollector
  - artgridCollector
  - googleVideoCollector
backend/services/ranker.go
- ranking
- Gemini candidate cap logic
- Gemini batch evaluation wrapper
- recommendation merge logic
backend/services/gemini.go
- query translation
- deterministic query expansion
- Gemini vision scoring
- video frame extraction via ffmpeg when needed
backend/models/db.go
- SQLite init
- download history
worker/downloader.py
- yt-dlp probe / download
- ffmpeg clip extraction
frontend/index.html
- main dashboard UI
- result viewer modal
- preview modal
- debug log panel
frontend/app.js
- API calls
- WebSocket status bar
- result viewer modal
- hover preview playback
- direct download handoff for Google Video results
- debug logger panel
- platform toggles
frontend/style.css
- custom styles
- clamp helpers
- slider thumb styles
- debug panel scrollbar styles
scripts/selftest.sh
- local smoke test flow
scripts/mock_searxng.py
- local mock SearXNG used by self-test
unraid-template.xml
- Unraid template for current image source

Search Flow: Current Implementation

User enters a query in Zone A.
Frontend sends /api/search with:
- query
- selected platforms
Backend translates the query in GeminiService.TranslateQuery.
- Gemini translation if available
- Google Translate HTTP fallback
- Korean media-term dictionary fallback
- explicit normalization for known compound phrases such as 사이버 펑크 -> cyberpunk
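The fallback order above can be read as a chain of translators tried in sequence. The following is a minimal Python sketch of that idea only; the real logic lives in Go inside GeminiService.TranslateQuery. The function names, the `Translator` type, and the phrase table are assumptions made for illustration (only the 사이버 펑크 -> cyberpunk entry comes from this document).

```python
from typing import Callable, Optional, Sequence

# Hypothetical phrase table; only this one entry is taken from the handover.
KNOWN_PHRASES = {"사이버 펑크": "cyberpunk"}

Translator = Callable[[str], Optional[str]]


def normalize_known_phrases(query: str) -> str:
    """Replace known compound phrases before any translator sees the query."""
    for source, target in KNOWN_PHRASES.items():
        query = query.replace(source, target)
    return query


def translate_query(query: str, translators: Sequence[Translator]) -> str:
    """Try each translator in order and return the first non-empty result.

    A translator signals failure by raising or by returning None / "".
    If every translator fails, the normalized query is returned unchanged.
    """
    normalized = normalize_known_phrases(query)
    for translate in translators:
        try:
            result = translate(normalized)
        except Exception:
            continue  # fall through to the next translator in the chain
        if result and result.strip():
            return result.strip()
    return normalized
```

In this sketch the caller would pass the translators in the order Gemini, Google Translate HTTP, dictionary, so a missing Gemini key simply means the first entry fails and the next one is tried.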
Backend builds deterministic English search variants in GeminiService.ExpandQuery.
SearchService.SearchMedia(...) orchestrates source-specific collectors.
Collectors query SearXNG separately for:
- Envato
- Artgrid
- Google Video
Each collector applies source-specific acceptance logic.
- Google Video: YouTube-only plus noise filtering
- Envato: elements.envato.com item URLs only
- Artgrid: accepts both:
  - artgrid.io/clip/...
  - artlist.io/stock-footage/clip/...
Artgrid canonical links are normalized to:
- https://artgrid.io/clip/<id>/<slug>
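As an illustration of the acceptance and normalization rule above, here is a small Python sketch (the actual collector is Go). It assumes that a clip path carries exactly one numeric id segment and one slug segment, in either order; that assumption, the function name, and the example URLs are hypothetical and may not match the real collector.

```python
from typing import Optional
from urllib.parse import urlparse

# ASSUMPTION: these are the only two accepted path prefixes, per the list above.
_PREFIX_BY_HOST = {
    "artgrid.io": "/clip/",
    "artlist.io": "/stock-footage/clip/",
}


def canonical_artgrid_url(raw_url: str) -> Optional[str]:
    """Return https://artgrid.io/clip/<id>/<slug>, or None if not a clip URL."""
    parsed = urlparse(raw_url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    prefix = _PREFIX_BY_HOST.get(host)
    if prefix is None or not parsed.path.startswith(prefix):
        return None
    segments = [s for s in parsed.path[len(prefix):].split("/") if s]
    # ASSUMPTION: the id is the numeric segment, the slug is the other one.
    clip_id = next((s for s in segments if s.isdigit()), None)
    slug = next((s for s in segments if not s.isdigit()), None)
    if clip_id is None or slug is None:
        return None
    return f"https://artgrid.io/clip/{clip_id}/{slug}"
```

Query strings and fragments are dropped on purpose so that two search hits for the same clip collapse to one canonical link.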
Results are enriched source-by-source.
- Envato:
  - VideoObject JSON-LD preferred
  - page meta preferred over search-engine proxy thumbnail
  - preview mp4 extraction via JSON-LD / HTML parsing
  - Python HTML fetch fallback used when Go HTTP fetch gets Cloudflare challenge pages
- Artgrid:
  - page title / description / thumbnail cleaning
  - homepage / challenge HTML is now rejected so generic site metadata does not overwrite clip metadata
  - preview video extraction still not stable
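The "VideoObject JSON-LD preferred" step could look roughly like the Python sketch below. It is not the project's Go implementation: the function names are hypothetical, and it only handles a top-level object or a top-level list, not `@graph` wrappers or list-valued `@type`. The field names `name`, `description`, `thumbnailUrl`, and `contentUrl` are standard schema.org VideoObject properties.

```python
import json
import re
from typing import Optional

_JSON_LD_RE = re.compile(
    r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.IGNORECASE | re.DOTALL,
)


def find_video_object(html: str) -> Optional[dict]:
    """Return the first schema.org VideoObject found in JSON-LD script tags."""
    for match in _JSON_LD_RE.finditer(html):
        try:
            data = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # malformed block, try the next one
        candidates = data if isinstance(data, list) else [data]
        for item in candidates:
            if isinstance(item, dict) and item.get("@type") == "VideoObject":
                return item
    return None


def preview_fields(html: str) -> dict:
    """Pick title, description, thumbnail, and preview URL; missing ones stay None."""
    video = find_video_object(html) or {}
    return {
        "title": video.get("name"),
        "description": video.get("description"),
        "thumbnail": video.get("thumbnailUrl"),
        "preview_url": video.get("contentUrl"),
    }
```

When every field comes back as None, a caller would fall back to page meta tags or other HTML parsing, which matches the preference order described above.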
Ranked results are passed through the shared ranker.
All ranked candidates are evaluated with Gemini Vision in batches.
Merge order now prefers:
- Gemini recommended items
- Gemini-reviewed non-recommended items
- keyword fallback items only if Gemini output is incomplete
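A compact way to picture this merge order is the Python sketch below. It is an illustration under assumptions, not the code in backend/services/ranker.go: the `url` key, the `verdicts` mapping, and the `limit` parameter are all hypothetical.

```python
from typing import Dict, Iterable, List, Optional


def merge_results(
    ranked: Iterable[dict],
    verdicts: Dict[str, bool],
    limit: Optional[int] = None,
) -> List[dict]:
    """Order results: recommended, then reviewed, then keyword fallback.

    `ranked` is the keyword-ranked candidate list (best first).
    `verdicts` maps a candidate URL to True / False for candidates that
    Gemini actually reviewed; unreviewed candidates are simply absent.
    """
    ranked = list(ranked)
    recommended = [c for c in ranked if verdicts.get(c["url"]) is True]
    reviewed = [c for c in ranked if verdicts.get(c["url"]) is False]
    # Fallback items exist only when Gemini did not cover every candidate.
    fallback = [c for c in ranked if c["url"] not in verdicts]
    merged = recommended + reviewed + fallback
    return merged[:limit] if limit is not None else merged
```

Within each group the original keyword ranking is preserved, so Gemini changes which group an item lands in but not the order inside a group.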

Frontend renders cards, result viewer modal, and hover previews.

Direct Downloader Flow: Current Implementation

User enters URL in Zone C.
Frontend checks duplicate history via /api/history/check.
Frontend loads preview metadata via /api/download/preview.
Preview modal opens with:
- media preview
- duration
- crop dual-thumb slider
- quality select
User confirms download.
Backend launches Python worker.
Worker downloads source with yt-dlp, clips with ffmpeg, emits JSON progress lines.
Backend rebroadcasts progress over WebSocket.
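The "JSON progress lines" contract between worker and backend can be sketched as one JSON object per line on stdout. The example below is a hedged Python sketch: the event fields (`type`, `stage`, `percent`) and both function names are assumptions, not the actual schema used by worker/downloader.py.

```python
import json
import sys
from typing import Optional, TextIO


def emit_progress(stage: str, percent: float, out: TextIO = sys.stdout) -> None:
    """Write one progress event as a single JSON line and flush immediately."""
    event = {"type": "progress", "stage": stage, "percent": round(percent, 1)}
    out.write(json.dumps(event, ensure_ascii=False) + "\n")
    out.flush()  # the parent process reads the pipe line by line


def parse_progress_line(line: str) -> Optional[dict]:
    """Parse one worker output line; return None for non-JSON noise."""
    line = line.strip()
    if not line.startswith("{"):
        return None
    try:
        event = json.loads(line)
    except json.JSONDecodeError:
        return None
    return event if isinstance(event, dict) else None
```

Treating anything that is not a JSON object as noise keeps stray yt-dlp or ffmpeg output from breaking the progress stream.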

Major Work Completed So Far

Added local self-test workflow:
- scripts/selftest.sh
- scripts/mock_searxng.py
Fixed translation fallback when Gemini key is missing.
Added tests for translation fallback logic.
Added HLS frontend wiring:
- hls.js script
- native HLS fallback
Reduced search timeout risk by:
- limiting collector result caps
- limiting enrichment scope
- limiting Gemini Vision evaluation scope
- replacing oversized raw debug result payloads with summaries
Improved Google Video filtering:
- rejects more music / trailer / BGM style noise
Improved Envato fidelity:
- real title / description / thumbnail / preview from source page
Improved Artgrid fidelity:
- accepts canonical Artlist URLs
- normalizes Artgrid clip URLs
- cleans title / description better
Refactored search into source-specific collectors.
Moved ranking and Gemini batch handling into backend/services/ranker.go.
Fixed server-side 500 caused by Gemini candidate cap exceeding available ranked candidates.
Improved frontend logging:
- raw non-JSON error body logging
- more compact debug payload rendering
Changed hover preview playback to lazy attach on hover:
- attach source on mouseenter
- wait for readiness before play()
- detach source on mouseleave
Added in-app result viewer modal for search results:
- results now open in a modal instead of directly opening a new tab
- modal now prefers internal preview media over embedded third-party iframes to avoid embed blocking
- external open button remains available
Google Video results can now jump directly into the existing direct-download preview / crop flow from the result viewer
Gemini reason generation is now intended to be Korean-first for readability
Gemini Vision evaluation now covers all ranked results instead of only a top subset
Search results now prioritize AI note text visually ahead of source summary
Search query order and final top results now include light randomness so repeated searches are less static
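The candidate-cap fix listed above comes down to never selecting more candidates than actually exist. A minimal Python sketch of that guard, plus the batching it feeds, follows; the function names are hypothetical and the real code is Go in backend/services/ranker.go.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def select_for_gemini(ranked: Sequence[T], cap: int) -> List[T]:
    """Take at most `cap` candidates without reaching past the end."""
    safe_cap = max(0, min(cap, len(ranked)))
    return list(ranked[:safe_cap])


def batches(items: Sequence[T], size: int) -> List[List[T]]:
    """Split items into consecutive batches of at most `size` elements."""
    if size <= 0:
        raise ValueError("batch size must be positive")
    return [list(items[i:i + size]) for i in range(0, len(items), size)]
```

Python slicing silently tolerates an oversized bound, so the explicit clamp is there to mirror what a bounds-checked implementation needs.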

Current Features Implemented

Project folder structure
Dockerfile
Gitea workflow
Unraid template
SQLite download history
File upload
yt-dlp direct downloader
Preview modal for direct download
Crop selection slider
Quality selection
WebSocket realtime progress
Search source toggles
Search card hover preview support
Result viewer modal for search results
Google Video direct-download handoff from search results
Debug log panel in frontend
.log download from debug panel
Local self-test workflow
Source-specific search collectors
Shared ranker service layer

Important Current Constraints / Known Problems

Search backend quality is still the most fragile subsystem.
Search relevance is still heuristic-heavy and not yet benchmarked against a durable real-query set.
Embedded third-party result viewing is no longer relied on because many providers block iframe embedding with X-Frame-Options / CSP.
Artgrid hover-video preview is still partial / unresolved:
- provided Artgrid HTML snapshots and downloaded asset bundles did not expose a stable public preview mp4/m3u8 URL
- public HTML often only exposes title / description / thumbnail / canonical URL
Artgrid can still be sensitive to how SearXNG indexes canonical domains.
Full browser-level validation is still not covered by the local self-test.
Frontend JavaScript still has no Node-based lint/build step in this environment.
Search cards now separate source snippet from AI reason, but metadata fidelity still depends on source enrichment quality.
Gemini notes are now intended to be Korean, but final output quality still depends on Gemini response consistency.
The local self-test script is better than before, but it is still a smoke test, not full integration coverage.

Current Risks Around Search Quality

Upstream SearXNG quality still controls the candidate pool.
Gemini Vision can only rerank the candidates it receives.
If source enrichment fails, Gemini may still judge a weaker proxy thumbnail or fallback image.
Compound Korean intents are better handled now, but the translation path is still heuristic and can drift on niche concepts.
Running Gemini Vision across all ranked results increases latency and token usage compared with the earlier capped approach.

Frontend Debug Logger

UI button: bottom-right Logs
Files:
- frontend/index.html
- frontend/app.js
- frontend/style.css
Logs currently capture:
- API request / response
- WebSocket progress messages
- ignored WS debug messages
- status updates
- platform toggle state
- result viewer modal open / close
- preview source attach / detach
- hover start / hover end
- hover play errors
- modal preview open / close
- browser errors
- promise rejections
- backend debug broadcasts

Recent Change Log

Date: 2026-03-16
What changed:
- The result modal layout was rebuilt: a 16:9 embedded viewer on top, the full AI note at bottom left, and an action panel at bottom right.
- Google Video results now load YouTube embed URLs in the modal viewer and keep the white Direct Download action in the lower-right panel.
- When Gemini evaluation comes back mostly negative or too weak, the backend now runs one supplemental search pass with broader intent variants and reevaluates the merged pool.
- Failed Gemini batch evaluations now retry sequentially candidate-by-candidate with a short delay so more candidates can still be processed when batch/token evaluation is unstable.
Why it changed:
- The requested modal information hierarchy was different from the previous implementation.
- The user wanted negative Gemini feedback to trigger more exploration instead of stopping at the first pool.
- Batch-level Gemini failures were causing too many results to skip evaluation entirely.
How it was verified:
- code-path inspection against the updated modal wiring and search flow
What is still risky or incomplete:
- Non-YouTube third-party pages can still refuse iframe embedding via CSP or X-Frame-Options.
- Sequential Gemini retries improve coverage but also increase latency when the model is degraded.
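The batch-then-sequential retry described in this entry can be sketched as follows. This is a hedged Python illustration, not the Go implementation: the callables, the delay default, and the decision to omit candidates whose single retry also fails are all assumptions.

```python
import time
from typing import Callable, Dict, Sequence


def evaluate_with_retry(
    candidates: Sequence[str],
    evaluate_batch: Callable[[Sequence[str]], Dict[str, bool]],
    evaluate_one: Callable[[str], bool],
    delay_seconds: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, bool]:
    """Evaluate a batch; on failure, retry candidate by candidate.

    Candidates whose individual retry also fails are left out of the result,
    so callers can tell "not reviewed" apart from a negative verdict.
    """
    try:
        return dict(evaluate_batch(candidates))
    except Exception:
        pass  # batch failed as a whole, fall back to sequential evaluation
    verdicts: Dict[str, bool] = {}
    for index, candidate in enumerate(candidates):
        if index > 0:
            sleep(delay_seconds)  # short pause between single-item calls
        try:
            verdicts[candidate] = evaluate_one(candidate)
        except Exception:
            continue
    return verdicts
```

The worst case costs one failed batch call plus one call per candidate with a pause in between, which is the latency risk noted above.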

Date: 2026-03-16
What changed:
- Bumped frontend asset version and added result-modal initialization guards to avoid click failures when browser cache serves mismatched HTML/JS.
Why it changed:
- The result modal stopped opening after the modal markup refactor, which is consistent with stale cached frontend assets or partially initialized modal DOM.
How it was verified:
- static code inspection of modal DOM/JS bindings
What is still risky or incomplete:
- Browser cache behavior itself was not fully reproduced here, so a hard refresh may still be needed in an already-open client session.

Date: 2026-03-16
What changed:
- Envato preview extraction now also inspects INITIAL_HYDRATION_DATA when direct page meta / JSON-LD preview URLs are missing.
- Search result cards and result modal now surface AI note before source summary text.
- Google Video direct download action moved into the AI note area and now seeds Zone C input before opening the shared preview/download modal.
- Result modal no longer depends on third-party iframe embedding and instead shows internal preview media plus external-open fallback.
- Search flow now shuffles collector query order and lightly randomizes the top merged results to reduce identical repeated outputs.
Why it changed:
- Some Envato items still missed preview URLs.
- Third-party iframe embedding was failing for blocked sites and creating a poor modal experience.
- The user wanted AI note to be the primary explanatory text and Google Video download action to be more obvious.
- Repeated searches returning the same ordering made the discovery experience feel too static.
How it was verified:
- go test ./...
- python3 -m py_compile worker/downloader.py scripts/mock_searxng.py
- bash scripts/selftest.sh
- added unit coverage for Envato hydration preview extraction
What is still risky or incomplete:
- Envato hydration structure could change again, so this fallback is still heuristic.
- Full browser-level validation of the revised result modal and button placement was not fully reproducible in this environment.
- Search randomness currently changes ordering and query traversal, but does not guarantee materially different source pools if upstream SearXNG returns a narrow candidate set.
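The "light randomness" in this entry can be pictured as two small operations: shuffling the order in which query variants are sent, and shuffling only a short window at the top of the merged list. The Python sketch below is illustrative; the function names and the `window` parameter are assumptions, and the real Go code may randomize differently.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def shuffled_queries(queries: Sequence[str], rng: random.Random) -> List[str]:
    """Return the query variants in a random traversal order."""
    order = list(queries)
    rng.shuffle(order)
    return order


def jitter_top(results: Sequence[T], window: int, rng: random.Random) -> List[T]:
    """Shuffle only the first `window` results; the tail keeps its ranking."""
    window = max(0, min(window, len(results)))
    head = list(results[:window])
    rng.shuffle(head)
    return head + list(results[window:])
```

Passing a seeded `random.Random` makes the behavior reproducible in tests while production code can use an unseeded instance. Neither function changes which candidates exist, which is why a narrow upstream pool still produces similar results.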

Current Environment Variables

APP_ROOT
APP_ADDR
SQLITE_PATH
DOWNLOADS_DIR
FRONTEND_DIR
WORKER_SCRIPT
SEARXNG_BASE_URL
SEARXNG_GOOGLE_VIDEO_ENGINE
SEARXNG_WEB_ENGINE
GEMINI_API_KEY

Local Self-Test Workflow

Primary command:
- bash scripts/selftest.sh
What it currently verifies:
- Go formatting for touched backend files
- Python syntax for worker + mock SearXNG
- go test ./...
- backend binary build
- local app boot with temp SQLite/download dirs
- /healthz
- /api/search using local mock SearXNG
- /api/upload
Notes:
- search step now retries to reduce startup timing flakiness
- this is a smoke test, not a browser-level verification suite
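The retry note above refers to waiting for the freshly booted app before probing it. scripts/selftest.sh itself is a bash script; the following is only a Python sketch of the same idea, with a hypothetical `wait_until_ready` helper and hypothetical defaults.

```python
import time
from typing import Callable


def wait_until_ready(
    probe: Callable[[], bool],
    attempts: int = 10,
    delay_seconds: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `probe` until it returns True or the attempts run out."""
    for attempt in range(attempts):
        try:
            if probe():
                return True
        except OSError:
            pass  # e.g. connection refused while the server is still starting
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False
```

A probe could wrap `urllib.request.urlopen` against /healthz; `urllib.error.URLError` is a subclass of `OSError`, so a refused connection counts as "not ready yet" rather than a crash.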

Verified Locally In This Environment

go build -o /tmp/ai-media-hub ./backend
go test ./...
Python syntax check for worker + self-test helper
local app boot / /healthz through scripts/selftest.sh
local /api/search against mock SearXNG through scripts/selftest.sh
local /api/upload through scripts/selftest.sh
Not verified: full browser-level validation (not fully reproducible in this environment)

Recent Change Log

Date: 2026-03-16
What changed:
- Added /api/preview/stream so remote preview assets are fetched through the backend instead of relying on the browser to load Envato / Artgrid media directly.
- MP4 previews are cached on disk under the downloads area, and HLS playlists are rewritten so segment fetches also flow through the same backend proxy route.
Why it changed:
- Direct browser loading of remote preview URLs was still unstable and often failed due to upstream restrictions or missing headers.
How it was verified:
- code-path inspection of preview proxy and playlist rewrite flow
What is still risky or incomplete:
- HLS caching is not yet persisted segment-by-segment; current implementation rewrites playlists and proxies segment requests live.
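The playlist rewrite mentioned in this entry can be sketched as: resolve each URI line against the playlist URL, then point it back at the proxy route. This Python sketch is an assumption-laden illustration, not the Go handler: in particular, the `url` query parameter name on /api/preview/stream is hypothetical.

```python
from urllib.parse import quote, urljoin

# ASSUMPTION: the proxy takes the upstream URL in a `url` query parameter.
PROXY_ROUTE = "/api/preview/stream?url="


def rewrite_playlist(playlist_text: str, playlist_url: str) -> str:
    """Point every URI line of an HLS playlist at the backend proxy route."""
    rewritten = []
    for line in playlist_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            rewritten.append(line)  # tags, comments, and blank lines pass through
            continue
        absolute = urljoin(playlist_url, stripped)  # resolve relative segment paths
        rewritten.append(PROXY_ROUTE + quote(absolute, safe=""))
    return "\n".join(rewritten)
```

This sketch leaves URIs inside tag attributes (for example `#EXT-X-KEY` or `#EXT-X-MAP`) untouched, so encrypted or fMP4 streams would need extra handling.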

Date: 2026-03-16
What changed:
- Relaxed final recommendation merge so Gemini-reviewed non-negative items can still appear, and only a small preview-capable ranked filler set is used when the result list is otherwise too thin.
Why it changed:
- The stricter recommended=true only merge made the visible result set collapse too aggressively in real searches.
How it was verified:
- go test ./...
- bash scripts/selftest.sh
What is still risky or incomplete:
- Filler results can still appear with the fallback reason when Gemini-reviewed positives are scarce, though the amount is now intentionally capped.

Date: 2026-03-16
What changed:
- Google Video embed URL now uses youtube-nocookie with explicit origin to reduce player load failures.
- Gemini Vision prompt now forces a Yes or No verdict per candidate and only Yes candidates are merged into the final result set.
- Candidate visual fetch now prefers Envato / Artgrid preview video frames before thumbnails and sends a Referer header for thumbnail fetches.
- Envato / Artgrid enrichment now retries preview extraction once after a short delay when the first fetch still lacks a usable preview URL.
Why it changed:
- The user reported YouTube player error 153, too many fallback-style results, and source pages that appear to need additional loading time before preview URLs become visible.
How it was verified:
- log review from ai-media-hub-2026-03-16T05-05-58-704Z.log
- go test ./...
What is still risky or incomplete:
- If a YouTube video has embedding disabled by the uploader, no embed URL variant will fully bypass that restriction.
- Delay-and-retry HTML fetching cannot execute JavaScript, so pages that only expose preview URLs after full client-side rendering may still need a real browser-based fetch path later.
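For the youtube-nocookie change in this entry, an embed URL with an explicit origin could be built roughly as below. This is a Python sketch under assumptions (the frontend does this in JavaScript): the helper names are hypothetical, and the 11-character id pattern reflects how YouTube ids currently look rather than a documented guarantee. `origin` is a documented YouTube embedded-player parameter.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

# ASSUMPTION: video ids are 11 characters from this alphabet.
_VIDEO_ID_RE = re.compile(r"^[A-Za-z0-9_-]{11}$")


def youtube_video_id(url: str) -> Optional[str]:
    """Extract the video id from watch, youtu.be, shorts, and embed URLs."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    candidate = None
    if host == "youtu.be":
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in ("youtube.com", "m.youtube.com", "youtube-nocookie.com"):
        if parsed.path == "/watch":
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        else:
            parts = [p for p in parsed.path.split("/") if p]
            if len(parts) >= 2 and parts[0] in ("embed", "shorts"):
                candidate = parts[1]
    if candidate and _VIDEO_ID_RE.match(candidate):
        return candidate
    return None


def nocookie_embed_url(url: str, origin: str) -> Optional[str]:
    """Build a youtube-nocookie embed URL that declares the page origin."""
    video_id = youtube_video_id(url)
    if video_id is None:
        return None
    query = urlencode({"origin": origin})
    return f"https://www.youtube-nocookie.com/embed/{video_id}?{query}"
```

Returning None for anything that does not parse as a YouTube link lets the caller fall back to thumbnail or external-open behavior instead of rendering a broken player.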

Date: 2026-03-16
What changed:
- Reduced the heaviest search-stage caps slightly: fewer query variants per request, smaller per-source result caps, lower enrichment scope, and a bounded Gemini candidate set.
Why it changed:
- The widened search configuration was pushing the request past the reverse-proxy timeout and surfacing 504 Gateway Time-out.
How it was verified:
- go test ./...
- bash scripts/selftest.sh
What is still risky or incomplete:
- Search coverage is still broader than the original baseline, but there is now an explicit tradeoff between result volume and request latency.

Date: 2026-03-16
What changed:
- Increased collector result caps and widened source-specific search query templates for Envato, Artgrid, and Google Video.
- Strengthened Gemini query-expansion and vision prompts with a professional video-editor framing.
- Restored result modal media fallback so Google Video uses YouTube embed while Envato and Artgrid can show preview video or thumbnail instead of blocked iframe pages.
- Expanded generic preview URL parsing so HTML-embedded .mp4 and .m3u8 sources are accepted more broadly.
Why it changed:
- Search result volume was too low.
- The user wanted Gemini to reason more like a professional editor.
- Envato iframe pages were being refused, Google Video modal opening was broken, and preview extraction still missed known media URLs.
How it was verified:
- local code inspection against attached Envato / Artgrid HTML samples
- go test ./...
What is still risky or incomplete:
- The attached Artgrid HTML sample is a generic homepage shell, so preview extraction still depends on what the live clip page or downstream assets expose at runtime.
- Some providers can still refuse iframe rendering even when Artgrid pages currently appear to work.

Unraid / Docker / CI Notes

Dockerfile uses:
- Go build stage
- static ffmpeg image stage
- Python runtime stage
Heavy apt ffmpeg install path was removed earlier to reduce build time.
Gitea workflow builds and pushes:
- git.savethenurse.com/savethenurse/ai-media-hub:latest
- git.savethenurse.com/savethenurse/ai-media-hub:${{ github.sha }}

Recent Relevant Commits

9637b76 Improve query intent handling and preview playback
6d9391b Expand Artgrid query coverage to artlist canonical URLs
d8cc32e Fix Gemini candidate cap causing search 500s
e426261 Fix Artgrid collector matching and split ranker
5aebbef Refactor search into source-specific collectors
ae091c5 Improve source parsing from Envato and Artgrid HTML
06ea4f3 Restore Envato and Artgrid fallback search breadth
7dfb1ad Stabilize search pipeline and improve preview diagnostics
6f3149a Add local self-test flow and fix fallback regressions
f968458 Rewrite TODO as project handover

Git / Push Status

Last commit known to have been pushed:
- 6d9391b was pushed successfully
Local-only work currently exists:
- 9637b76 Improve query intent handling and preview playback
Push status for 9637b76:
- not pushed
- remote rejected the push with:
  - remote unpack failed: unable to create temporary object directory
  - remote rejected main -> main (unpacker error)
Interpretation:
- current blocker appears to be on the remote git server side, not a local git history issue

Highest-Value Next Steps

Retry the push of local commit 9637b76 once the remote git storage/unpacker issue is resolved
Build collector-specific integration tests with recorded SearXNG samples
Separate source enrichment tests from live network behavior using local fixtures
Add a browser-level preview validation path, especially for hover video
If Artgrid hover preview is still required, obtain one real clip HAR / DevTools network export and derive a stable preview URL parser
Build a small fixed real-query benchmark set to evaluate search quality before further tuning
If frontend tooling becomes available, add lint/build checks

Short Handover Summary

The codebase runs.
Upload and direct download are implemented and broadly usable.
Search has been significantly refactored and is in better shape than before, but it is still the main unstable area.
Envato source fidelity is much better than earlier.
Artgrid source fidelity is better, but preview-video extraction is still incomplete.
There is now a local self-test workflow.
There is one known local commit that has not been pushed because the remote repo reported an unpacker error.

Recent Change Log

Date: 2026-03-16
What changed:
- Removed per-batch Gemini fallback injection so empty Gemini sub-results no longer automatically turn into many "Gemini Vision 응답이 부족해..." ("Gemini Vision response was insufficient...") items.
- Relaxed final merge to keep more non-negative Gemini-reviewed candidates and allow a larger capped preview-capable filler set when the visible result list is too small.
- Slightly raised Envato / Artgrid caps and enrichment scope again after the stricter merge caused visible result count to collapse too far.
- Bumped frontend asset version to 20260316e so clients pick up the newer preview-proxy behavior.
Why it changed:
- The user still saw too few results and too many fallback-labeled items despite the earlier filtering changes.
How it was verified:
- go test ./...
What is still risky or incomplete:
- If the browser is holding an older cached app.js, a hard refresh may still be needed before the proxied preview path is actually used on the client.
