# AI Media Hub Handover

## Working Rule

- This file is both backlog and handover log.
- Every meaningful change should record:
  - what changed
  - why it changed
  - how it was verified
  - what is still risky or incomplete
- If a push fails or a change remains local-only, that must be written here explicitly.

## Current State At A Glance

- Project: `ai-media-hub`
- Goal: AI-assisted media discovery + ingest dashboard for Unraid
- Backend: Go
- Worker: Python + `yt-dlp` + `ffmpeg`
- Frontend: HTML + Vanilla JS + Tailwind CDN
- Database: SQLite
- Search backend: `SearXNG`
- AI translation / visual ranking: `Gemini 2.5 Flash`
- Deployment target: single Docker container on Unraid
- Git remote: `https://git.savethenurse.com/savethenurse/ai-media-hub.git`

## Current Status Summary

- Upload / direct download flow is implemented and broadly usable.
- Search is implemented end-to-end and has been refactored into source-specific collectors.
- Search remains the main unstable subsystem.
- Envato metadata and preview extraction are much stronger than before, including an additional hydration-data preview fallback.
- Artgrid metadata fidelity is improved, but stable public hover-video preview extraction is still not solved.
- Frontend now logs more useful API and debug information than earlier versions.
- A local self-test workflow now exists and should be run before container builds or pushes.

## Current Architecture

- `backend/main.go`
  - app bootstrap
  - env loading
  - static frontend serving
  - route registration
- `backend/handlers/api.go`
  - upload / download / search APIs
  - WebSocket progress broadcast
  - debug event broadcast
  - search request orchestration only, with ranking/Gemini logic mostly moved out
- `backend/services/cse.go`
  - SearXNG querying
  - shared search helpers
  - source-specific enrich helpers
  - URL filtering / parsing utilities
- `backend/services/search_collectors.go`
  - source-specific collectors:
    - `envatoCollector`
    - `artgridCollector`
    - `googleVideoCollector`
- `backend/services/ranker.go`
  - ranking
  - Gemini candidate cap logic
  - Gemini batch evaluation wrapper
  - recommendation merge logic
- `backend/services/gemini.go`
  - query translation
  - deterministic query expansion
  - Gemini vision scoring
  - video frame extraction via `ffmpeg` when needed
- `backend/models/db.go`
  - SQLite init
  - download history
- `worker/downloader.py`
  - `yt-dlp` probe / download
  - `ffmpeg` clip extraction
- `frontend/index.html`
  - main dashboard UI
  - result viewer modal
  - preview modal
  - debug log panel
- `frontend/app.js`
  - API calls
  - WebSocket status bar
  - result viewer modal
  - hover preview playback
  - direct download handoff for Google Video results
  - debug logger panel
  - platform toggles
- `frontend/style.css`
  - custom styles
  - clamp helpers
  - slider thumb styles
  - debug panel scrollbar styles
- `scripts/selftest.sh`
  - local smoke test flow
- `scripts/mock_searxng.py`
  - local mock SearXNG used by self-test
- `unraid-template.xml`
  - Unraid template for current image source

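The "recommendation merge logic" listed under `backend/services/ranker.go` is implemented in Go and is not reproduced here. As a rough illustration only, the three-tier merge order this document gives for the search flow (Gemini-recommended, then Gemini-reviewed but not recommended, then keyword fallback) might look like the Python sketch below. The `Candidate` type, the `keyword_score` field, and the reading of "Gemini output is incomplete" as "fewer reviewed items than requested" are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    url: str
    keyword_score: float                # hypothetical heuristic score
    recommended: Optional[bool] = None  # None means Gemini never reviewed it


def merge_results(candidates: List[Candidate], wanted: int) -> List[Candidate]:
    """Order results: recommended first, then reviewed-but-not-recommended,
    then unreviewed keyword fallbacks only if the first two tiers fall short."""
    by_score = sorted(candidates, key=lambda c: c.keyword_score, reverse=True)
    recommended = [c for c in by_score if c.recommended is True]
    reviewed = [c for c in by_score if c.recommended is False]
    fallback = [c for c in by_score if c.recommended is None]

    merged = recommended + reviewed
    if len(merged) < wanted:
        # Assumed meaning of "incomplete": not enough reviewed items to fill the page.
        merged += fallback[: wanted - len(merged)]
    return merged[:wanted]
```

Keeping the fallback tier strictly last means an unreviewed item can never outrank one that Gemini actually looked at, whatever its keyword score.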
## Search Flow: Current Implementation

1. User enters a query in Zone A.
2. Frontend sends a request to `/api/search` with:
    - `query`
    - selected `platforms`
3. Backend translates the query in `GeminiService.TranslateQuery`.
    - Gemini translation if available
    - Google Translate HTTP fallback
    - Korean media-term dictionary fallback
    - explicit normalization for known compound phrases such as `사이버 펑크` -> `cyberpunk`
4. Backend builds deterministic English search variants in `GeminiService.ExpandQuery`.
5. `SearchService.SearchMedia(...)` orchestrates source-specific collectors.
6. Collectors query SearXNG separately for:
    - Envato
    - Artgrid
    - Google Video
7. Each collector applies source-specific acceptance logic.
    - Google Video: YouTube-only plus noise filtering
    - Envato: `elements.envato.com` item URLs only
    - Artgrid: accepts both:
        - `artgrid.io/clip/...`
        - `artlist.io/stock-footage/clip/...`
8. Artgrid canonical links are normalized to:
    - `https://artgrid.io/clip/<id>/<slug>`
9. Results are enriched source-by-source.
    - Envato:
        - `VideoObject` JSON-LD preferred
        - page meta preferred over search-engine proxy thumbnail
        - preview MP4 extraction via JSON-LD / HTML parsing
        - Python HTML fetch fallback used when the Go HTTP fetch gets Cloudflare challenge pages
    - Artgrid:
        - page title / description / thumbnail cleaning
        - homepage / challenge HTML is now rejected so generic site metadata does not overwrite clip metadata
        - preview video extraction still not stable
10. Enriched results are passed through the shared ranker.
11. All ranked candidates are evaluated with Gemini Vision in batches.
12. Merge order now prefers:
    - Gemini recommended items
    - Gemini-reviewed non-recommended items
    - keyword fallback items only if Gemini output is incomplete
13. Frontend renders cards, result viewer modal, and hover previews.

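Steps 7 and 8 can be illustrated with a small Python sketch of the Artgrid acceptance and normalization rule. The real collector lives in Go; the function name below is hypothetical, and the example assumes that the path after `clip/` holds exactly two segments, one all-digit clip id and one slug, in either order. That segment layout is a guess for illustration and is not confirmed by this document.

```python
from typing import Optional
from urllib.parse import urlparse

# Accepted host -> required path prefix, taken from the acceptance rules in step 7.
ACCEPTED_PREFIXES = {
    "artgrid.io": "/clip/",
    "artlist.io": "/stock-footage/clip/",
}


def normalize_artgrid_url(raw: str) -> Optional[str]:
    """Return the canonical artgrid.io clip URL, or None if the URL is rejected."""
    parsed = urlparse(raw)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]

    prefix = ACCEPTED_PREFIXES.get(host)
    if prefix is None or not parsed.path.startswith(prefix):
        return None

    segments = [s for s in parsed.path[len(prefix):].split("/") if s]
    # Assumption: one numeric id segment and one slug segment, order unknown.
    ids = [s for s in segments if s.isdigit()]
    slugs = [s for s in segments if not s.isdigit()]
    if len(ids) != 1 or len(slugs) != 1:
        return None
    return f"https://artgrid.io/clip/{ids[0]}/{slugs[0]}"
```

Rejecting anything that does not match the expected shape, rather than guessing, keeps homepage and category URLs out of the candidate pool. Query strings are dropped because the canonical form in step 8 has none.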
## Direct Downloader Flow: Current Implementation

1. User enters URL in Zone C.
2. Frontend checks duplicate history via `/api/history/check`.
3. Frontend loads preview metadata via `/api/download/preview`.
4. Preview modal opens with:
    - media preview
    - duration
    - crop dual-thumb slider
    - quality select
5. User confirms download.
6. Backend launches Python worker.
7. Worker downloads source with `yt-dlp`, clips with `ffmpeg`, emits JSON progress lines.
8. Backend rebroadcasts progress over WebSocket.

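The JSON progress lines in steps 7 and 8 follow a common pattern: one JSON object per line on the worker's output, flushed immediately, with the reader skipping anything that is not a JSON object. The sketch below shows that pattern in Python. The field names `stage` and `percent` are hypothetical; the actual schema emitted by `worker/downloader.py` is not documented here.

```python
import json
import sys
from typing import Optional, TextIO


def emit_progress(stage: str, percent: float, out: TextIO = sys.stdout) -> None:
    """Write one progress event as a single JSON line and flush right away,
    so a parent process reading the pipe sees it without buffering delay."""
    event = {"stage": stage, "percent": round(percent, 1)}  # hypothetical fields
    out.write(json.dumps(event) + "\n")
    out.flush()


def parse_progress_line(line: str) -> Optional[dict]:
    """Return the event dict, or None for lines that are not JSON objects
    (for example stray tool output mixed into the stream)."""
    try:
        event = json.loads(line)
    except json.JSONDecodeError:
        return None
    return event if isinstance(event, dict) else None
```

Treating unparseable lines as "no event" rather than as an error lets the reader survive tool output that leaks into the same stream.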
## Major Work Completed So Far

- Added local self-test workflow:
  - `scripts/selftest.sh`
  - `scripts/mock_searxng.py`
- Fixed translation fallback when Gemini key is missing.
- Added tests for translation fallback logic.
- Added HLS frontend wiring:
  - `hls.js` script
  - native HLS fallback
- Reduced search timeout risk by:
  - limiting collector result caps
  - limiting enrichment scope
  - limiting Gemini Vision evaluation scope
  - replacing oversized raw debug result payloads with summaries
- Improved Google Video filtering:
  - rejects more music / trailer / BGM style noise
- Improved Envato fidelity:
  - real title / description / thumbnail / preview from source page
- Improved Artgrid fidelity:
  - accepts canonical Artlist URLs
  - normalizes Artgrid clip URLs
  - cleans title / description better
- Refactored search into source-specific collectors.
- Moved ranking and Gemini batch handling into `backend/services/ranker.go`.
- Fixed server-side 500 caused by Gemini candidate cap exceeding available ranked candidates.
- Improved frontend logging:
  - raw non-JSON error body logging
  - more compact debug payload rendering
- Changed hover preview playback to lazy attach on hover:
  - attach source on `mouseenter`
  - wait for readiness before `play()`
  - detach source on `mouseleave`
- Added in-app result viewer modal for search results:
  - results now open in a modal instead of directly opening a new tab
  - modal now prefers internal preview media over embedded third-party iframes to avoid embed blocking
  - external open button remains available
- Google Video results can now jump directly into the existing direct-download preview / crop flow from the result viewer
- Gemini reason generation is now intended to be Korean-first for readability
- Gemini Vision evaluation now covers all ranked results instead of only a top subset
- Search results now prioritize AI note text visually ahead of source summary
- Search query order and final top results now include light randomness so repeated searches are less static

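The candidate-cap fix mentioned above comes down to clamping the cap to the number of candidates that actually exist before slicing and batching. The Python sketch below shows that guard; the function names are hypothetical and the real logic is in `backend/services/ranker.go`. In Go, a slice expression whose upper bound exceeds the slice's capacity panics at run time, which is one plausible way such a cap could surface as a 500, though this document does not state the exact mechanism.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def cap_candidates(ranked: Sequence[T], cap: int) -> List[T]:
    """Take at most `cap` items, never more than are actually available."""
    limit = max(0, min(cap, len(ranked)))
    return list(ranked[:limit])


def batches(items: Sequence[T], size: int) -> List[List[T]]:
    """Split items into consecutive batches of at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [list(items[i:i + size]) for i in range(0, len(items), size)]
```

Python slicing already clamps silently, so the explicit `min` here mainly mirrors the check a Go implementation has to make by hand.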
## Current Features Implemented

- [x] Project folder structure
- [x] Dockerfile
- [x] Gitea workflow
- [x] Unraid template
- [x] SQLite download history
- [x] File upload
- [x] yt-dlp direct downloader
- [x] Preview modal for direct download
- [x] Crop selection slider
- [x] Quality selection
- [x] WebSocket realtime progress
- [x] Search source toggles
- [x] Search card hover preview support
- [x] Result viewer modal for search results
- [x] Google Video direct-download handoff from search results
- [x] Debug log panel in frontend
- [x] `.log` download from debug panel
- [x] Local self-test workflow
- [x] Source-specific search collectors
- [x] Shared ranker service layer

## Important Current Constraints / Known Problems

- Search backend quality is still the most fragile subsystem.
- Search relevance is still heuristic-heavy and not yet benchmarked against a durable real-query set.
- Embedded third-party result viewing is no longer relied on because many providers block iframe embedding with `X-Frame-Options` / CSP.
- Artgrid hover-video preview is still partial / unresolved:
  - provided Artgrid HTML snapshots and downloaded asset bundles did not expose a stable public preview mp4/m3u8 URL
  - public HTML often only exposes title / description / thumbnail / canonical URL
- Artgrid can still be sensitive to how SearXNG indexes canonical domains.
- Full browser-level validation is still not covered by local self-test.
- Frontend JavaScript still has no Node-based lint/build step in this environment.
- Search cards now separate source snippet from AI reason, but metadata fidelity still depends on source enrichment quality.
- Gemini notes are now intended to be Korean, but final output quality still depends on Gemini response consistency.
- The local self-test script is better than before, but it is still a smoke test, not full integration coverage.

## Current Risks Around Search Quality

- Upstream SearXNG quality still controls the candidate pool.
- Gemini Vision can only rerank the candidates it receives.
- If source enrichment fails, Gemini may still judge a weaker proxy thumbnail or fallback image.
- Compound Korean intents are better handled now, but the translation path is still heuristic and can drift on niche concepts.
- Running Gemini Vision across all ranked results increases latency and token usage compared with the earlier capped approach.

## Frontend Debug Logger

- UI button: bottom-right `Logs`
- Files:
  - `frontend/index.html`
  - `frontend/app.js`
  - `frontend/style.css`
- Logs currently capture:
  - API request / response
  - WebSocket progress messages
  - ignored WS debug messages
  - status updates
  - platform toggle state
  - result viewer modal open / close
  - preview source attach / detach
  - hover start / hover end
  - hover play errors
  - modal preview open / close
  - browser errors
  - promise rejections
  - backend debug broadcasts

## Recent Change Log

- Date: `2026-03-16`
  - What changed:
    - Result modal layout was rebuilt to match a top `16:9` embedded viewer with bottom-left full AI note and bottom-right action panel.
    - Google Video results now load YouTube embed URLs in the modal viewer and keep the white `Direct Download` action in the lower-right panel.
    - When Gemini evaluation comes back mostly negative or too weak, the backend now runs one supplemental search pass with broader intent variants and reevaluates the merged pool.
    - Failed Gemini batch evaluations now retry sequentially candidate-by-candidate with a short delay so more candidates can still be processed when batch/token evaluation is unstable.
  - Why it changed:
    - The requested modal information hierarchy was different from the previous implementation.
    - The user wanted negative Gemini feedback to trigger more exploration instead of stopping at the first pool.
    - Batch-level Gemini failures were causing too many results to skip evaluation entirely.
  - How it was verified:
    - code-path inspection against the updated modal wiring and search flow
  - What is still risky or incomplete:
    - Non-YouTube third-party pages can still refuse iframe embedding via CSP or `X-Frame-Options`.
    - Sequential Gemini retries improve coverage but also increase latency when the model is degraded.

- Date: `2026-03-16`
  - What changed:
    - Bumped frontend asset version and added result-modal initialization guards to avoid click failures when browser cache serves mismatched HTML/JS.
  - Why it changed:
    - The result modal stopped opening after the modal markup refactor, which is consistent with stale cached frontend assets or partially initialized modal DOM.
  - How it was verified:
    - static code inspection of modal DOM/JS bindings
  - What is still risky or incomplete:
    - Browser cache behavior itself was not fully reproduced here, so a hard refresh may still be needed in an already-open client session.

- Date: `2026-03-16`
  - What changed:
    - Envato preview extraction now also inspects `INITIAL_HYDRATION_DATA` when direct page meta / JSON-LD preview URLs are missing.
    - Search result cards and result modal now surface AI note before source summary text.
    - Google Video direct download action moved into the AI note area and now seeds Zone C input before opening the shared preview/download modal.
    - Result modal no longer depends on third-party iframe embedding and instead shows internal preview media plus external-open fallback.
    - Search flow now shuffles collector query order and lightly randomizes the top merged results to reduce identical repeated outputs.
  - Why it changed:
    - Some Envato items still missed preview URLs.
    - Third-party iframe embedding was failing for blocked sites and creating a poor modal experience.
    - The user wanted AI note to be the primary explanatory text and Google Video download action to be more obvious.
    - Repeated searches returning the same ordering made the discovery experience feel too static.
  - How it was verified:
    - `go test ./...`
    - `python3 -m py_compile worker/downloader.py scripts/mock_searxng.py`
    - `bash scripts/selftest.sh`
    - added unit coverage for Envato hydration preview extraction
  - What is still risky or incomplete:
    - Envato hydration structure could change again, so this fallback is still heuristic.
    - Full browser-level validation of the revised result modal and button placement was not fully reproducible in this environment.
    - Search randomness currently changes ordering and query traversal, but does not guarantee materially different source pools if upstream SearXNG returns a narrow candidate set.

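The batch-failure fallback described in the change log (retry candidate-by-candidate with a short delay when a batch evaluation fails) can be sketched in Python as below. This is an illustration under assumptions, not the Go implementation: the `evaluate` callable stands in for the Gemini batch call, candidates are plain strings, verdicts are booleans, and the default delay value is invented.

```python
import time
from typing import Callable, Dict, List, Sequence


def evaluate_with_fallback(
    batch: Sequence[str],
    evaluate: Callable[[List[str]], Dict[str, bool]],  # stand-in for the Gemini call
    delay_seconds: float = 0.5,                        # hypothetical delay
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, bool]:
    """Try the whole batch first; on failure, evaluate one candidate at a time."""
    try:
        return evaluate(list(batch))
    except Exception:
        pass  # fall through to the per-candidate path

    verdicts: Dict[str, bool] = {}
    for index, candidate in enumerate(batch):
        if index > 0:
            sleep(delay_seconds)  # space out calls while the model is unstable
        try:
            verdicts.update(evaluate([candidate]))
        except Exception:
            continue  # this candidate stays unevaluated; keep going with the rest
    return verdicts
```

The trade-off matches the risk noted above: coverage improves because one bad candidate no longer sinks the batch, but a degraded model now costs one call plus one delay per candidate.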
## Current Environment Variables

- `APP_ROOT`
- `APP_ADDR`
- `SQLITE_PATH`
- `DOWNLOADS_DIR`
- `FRONTEND_DIR`
- `WORKER_SCRIPT`
- `SEARXNG_BASE_URL`
- `SEARXNG_GOOGLE_VIDEO_ENGINE`
- `SEARXNG_WEB_ENGINE`
- `GEMINI_API_KEY`

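This document does not say which of these variables are required or what their defaults are, so the sketch below does not guess. It only reports which of the documented names are unset or blank in a given environment, which can be a quick sanity check before booting the container. The helper name is hypothetical.

```python
import os
from typing import List, Mapping

# Names copied from the list above; no defaults are assumed.
DOCUMENTED_VARS = [
    "APP_ROOT",
    "APP_ADDR",
    "SQLITE_PATH",
    "DOWNLOADS_DIR",
    "FRONTEND_DIR",
    "WORKER_SCRIPT",
    "SEARXNG_BASE_URL",
    "SEARXNG_GOOGLE_VIDEO_ENGINE",
    "SEARXNG_WEB_ENGINE",
    "GEMINI_API_KEY",
]


def unset_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return documented variable names that are missing or blank in `env`."""
    return [name for name in DOCUMENTED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    for name in unset_vars():
        print(f"not set: {name}")
```

An unset variable is not necessarily an error. For example, the translation path is described as having fallbacks when the Gemini key is missing.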
## Local Self-Test Workflow

- Primary command:
  - `bash scripts/selftest.sh`
- What it currently verifies:
  - Go formatting for touched backend files
  - Python syntax for worker + mock SearXNG
  - `go test ./...`
  - backend binary build
  - local app boot with temp SQLite/download dirs
  - `/healthz`
  - `/api/search` using local mock SearXNG
  - `/api/upload`
- Notes:
  - search step now retries to reduce startup timing flakiness
  - this is a smoke test, not a browser-level verification suite

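The retry noted above exists because a freshly started server may refuse connections for a moment. `scripts/selftest.sh` is a shell script and its actual retry loop is not shown in this document; the Python sketch below only illustrates the general poll-until-ready pattern. The function names, attempt count, and delay are assumptions.

```python
import time
import urllib.request
from typing import Callable


def wait_until_ready(
    probe: Callable[[], bool],
    attempts: int = 10,          # hypothetical limit
    delay_seconds: float = 0.5,  # hypothetical delay
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `probe` until it returns True or the attempts run out."""
    for attempt in range(attempts):
        try:
            if probe():
                return True
        except OSError:
            pass  # e.g. connection refused while the server is still starting
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False


def healthz_probe(base_url: str) -> Callable[[], bool]:
    """Build a probe that checks `<base_url>/healthz` for HTTP 200."""
    def probe() -> bool:
        with urllib.request.urlopen(base_url + "/healthz", timeout=2) as resp:
            return resp.status == 200
    return probe
```

A caller would use something like `wait_until_ready(healthz_probe("http://127.0.0.1:8080"))`, where the address is a placeholder. `urllib.error.URLError` is a subclass of `OSError`, so connection failures from `urlopen` are caught by the same handler.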
## Verified Locally In This Environment

- [x] `go build -o /tmp/ai-media-hub ./backend`
- [x] `go test ./...`
- [x] Python syntax check for worker + self-test helper
- [x] local app boot / `/healthz` through `scripts/selftest.sh`
- [x] local `/api/search` against mock SearXNG through `scripts/selftest.sh`
- [x] local `/api/upload` through `scripts/selftest.sh`
- [ ] full browser-level validation was not fully reproducible in this environment

## Recent Change Log

- Date: `2026-03-16`
  - What changed:
    - Reduced the heaviest search-stage caps slightly: fewer query variants per request, smaller per-source result caps, lower enrichment scope, and a bounded Gemini candidate set.
  - Why it changed:
    - The widened search configuration was pushing the request past the reverse-proxy timeout and surfacing `504 Gateway Time-out`.
  - How it was verified:
    - `go test ./...`
    - `bash scripts/selftest.sh`
  - What is still risky or incomplete:
    - Search coverage is still broader than the original baseline, but there is now an explicit tradeoff between result volume and request latency.

- Date: `2026-03-16`
  - What changed:
    - Increased collector result caps and widened source-specific search query templates for Envato, Artgrid, and Google Video.
    - Strengthened Gemini query-expansion and vision prompts with a professional video-editor framing.
    - Restored result modal media fallback so Google Video uses YouTube embed while Envato and Artgrid can show preview video or thumbnail instead of blocked iframe pages.
    - Expanded generic preview URL parsing so HTML-embedded `.mp4` and `.m3u8` sources are accepted more broadly.
  - Why it changed:
    - Search result volume was too low.
    - The user wanted Gemini to reason more like a professional editor.
    - Envato iframe pages were being refused, Google Video modal opening was broken, and preview extraction still missed known media URLs.
  - How it was verified:
    - local code inspection against attached Envato / Artgrid HTML samples
    - `go test ./...`
  - What is still risky or incomplete:
    - The attached Artgrid HTML sample is a generic homepage shell, so preview extraction still depends on what the live clip page or downstream assets expose at runtime.
    - Some providers can still refuse iframe rendering even when Artgrid pages currently appear to work.

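The "generic preview URL parsing" for HTML-embedded `.mp4` and `.m3u8` sources mentioned in the change log is implemented in Go and its exact rules are not given here. One way such a parser could work is sketched below in Python: unescape JSON-style `\/` sequences, then pull out absolute URLs whose path ends in one of the two extensions. The regular expression and function name are assumptions for illustration.

```python
import re
from typing import List

# Absolute http(s) URL ending in .mp4 or .m3u8, with an optional query string.
# The lookahead rejects lookalikes such as "clip.mp4.jpg".
MEDIA_URL = re.compile(
    r"https?://[^\s\"'<>]+?\.(?:mp4|m3u8)(?:\?[^\s\"'<>]*)?(?=[\s\"'<>]|$)",
    re.IGNORECASE,
)


def extract_preview_urls(html: str) -> List[str]:
    """Return unique media URLs in first-seen order."""
    text = html.replace("\\/", "/")  # undo JSON-escaped slashes inside script blobs
    found: List[str] = []
    for match in MEDIA_URL.finditer(text):
        url = match.group(0)
        if url not in found:
            found.append(url)
    return found
```

This only finds URLs that appear literally in the HTML. It cannot recover previews that are assembled at runtime by page scripts, which fits the Artgrid limitation described earlier.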
## Unraid / Docker / CI Notes

- Dockerfile uses:
  - Go build stage
  - static ffmpeg image stage
  - Python runtime stage
- Heavy apt ffmpeg install path was removed earlier to reduce build time.
- Gitea workflow builds and pushes:
  - `git.savethenurse.com/savethenurse/ai-media-hub:latest`
  - `git.savethenurse.com/savethenurse/ai-media-hub:${{ github.sha }}`

## Recent Relevant Commits

- `9637b76` Improve query intent handling and preview playback
- `6d9391b` Expand Artgrid query coverage to artlist canonical URLs
- `d8cc32e` Fix Gemini candidate cap causing search 500s
- `e426261` Fix Artgrid collector matching and split ranker
- `5aebbef` Refactor search into source-specific collectors
- `ae091c5` Improve source parsing from Envato and Artgrid HTML
- `06ea4f3` Restore Envato and Artgrid fallback search breadth
- `7dfb1ad` Stabilize search pipeline and improve preview diagnostics
- `6f3149a` Add local self-test flow and fix fallback regressions
- `f968458` Rewrite TODO as project handover

## Git / Push Status

- Last pushed commit known in earlier work:
  - `6d9391b` was pushed successfully
- Local-only work currently exists:
  - `9637b76 Improve query intent handling and preview playback`
- Push status for `9637b76`:
  - not pushed
  - remote rejected the push with:
    - `remote unpack failed: unable to create temporary object directory`
    - `remote rejected main -> main (unpacker error)`
- Interpretation:
  - current blocker appears to be on the remote git server side, not a local git history issue

## Highest-Value Next Steps

- [ ] Retry the push of local commit `9637b76` once the remote git storage/unpacker issue is resolved
- [ ] Build collector-specific integration tests with recorded SearXNG samples
- [ ] Separate source enrichment tests from live network behavior using local fixtures
- [ ] Add a browser-level preview validation path, especially for hover video
- [ ] If Artgrid hover preview is still required, obtain one real clip HAR / DevTools network export and derive a stable preview URL parser
- [ ] Build a small fixed real-query benchmark set to evaluate search quality before further tuning
- [ ] If frontend tooling becomes available, add lint/build checks

## Short Handover Summary

- The codebase builds, passes `go test ./...`, and boots locally through `scripts/selftest.sh`.
- Upload and direct download flows are implemented and broadly usable.
- Search has been significantly refactored and is in better shape than before, but it is still the main unstable area.
- Envato source fidelity is much better than before.
- Artgrid source fidelity is better, but preview-video extraction is still incomplete.
- There is now a local self-test workflow.
- One local commit (`9637b76`) has not been pushed because the remote repo reported an unpacker error.