Stabilize search pipeline and improve preview diagnostics
build-push / docker (push) Successful in 4m14s

AI Assistant
2026-03-13 18:32:54 +09:00
parent 6f3149a443
commit 7dfb1ad2de
8 changed files with 463 additions and 45 deletions
+30 -1
@@ -23,6 +23,33 @@
- `go build ./backend` from repo root conflicts with the existing `backend/` directory name
- the verified build command is now `go build -o /tmp/... ./backend`
## Current Session Update (2026-03-13, Search/Preview Follow-up)
- Investigated a production search failure using downloaded frontend logs.
- Identified the main timeout cause:
- too many search results were being collected
- too many Gemini Vision batches were being evaluated sequentially
- backend debug messages were broadcasting oversized result payloads
- Applied search pipeline optimization:
- reduced per-source result caps
- reduced query fan-out for Google Video
- reduced enrichment cap
- limited Gemini Vision evaluation to top-ranked candidates only
- Improved Google Video filtering:
- added bans for music/BGM/trailer-style noise results
- Improved Envato enrichment fidelity:
- source page metadata is now preferred over search-engine proxy thumbnails
- source snippet/title are now taken from page metadata when available
- preview mp4 extraction now works via HTML/JSON-LD parsing
- added a Python HTML fetch fallback for Cloudflare-challenged Envato pages, because the Go HTTP client alone received 403 challenge pages in testing
- Improved Artgrid fidelity:
- source page title/description/thumbnail are now preferred over search-engine snippets when available
- preview extraction is not yet reliable across Artgrid clips, because the public HTML tested here did not expose a stable mp4/m3u8 URL
- Improved logging:
- backend search debug events now emit summaries, timings, source counts, preview counts, and Gemini batch stats instead of giant raw arrays
- frontend now logs raw non-JSON error bodies instead of collapsing them to `{}` on gateway/proxy failures
- Improved result rendering:
- search cards now show source snippet/description separately from AI reason to reduce confusion between asset metadata and Gemini commentary
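
The result caps and the top-ranked-only Gemini Vision evaluation listed above could be sketched in Go roughly as follows. This is a minimal illustration, not the code from this commit: the `Candidate` type, the function names, and the numeric limits are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// Candidate is a hypothetical search result; the field names are illustrative only.
type Candidate struct {
	Source string
	URL    string
	Score  float64
}

// capPerSource keeps at most limit results per source, preserving input order.
func capPerSource(in []Candidate, limit int) []Candidate {
	seen := make(map[string]int)
	out := make([]Candidate, 0, len(in))
	for _, c := range in {
		if seen[c.Source] >= limit {
			continue
		}
		seen[c.Source]++
		out = append(out, c)
	}
	return out
}

// topK returns the k highest-scoring candidates without mutating the input.
func topK(in []Candidate, k int) []Candidate {
	sorted := append([]Candidate(nil), in...)
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].Score > sorted[j].Score })
	if k < 0 {
		k = 0
	}
	if k > len(sorted) {
		k = len(sorted)
	}
	return sorted[:k]
}

// batches splits candidates into groups of at most size items.
func batches(in []Candidate, size int) [][]Candidate {
	if size <= 0 {
		return nil
	}
	var out [][]Candidate
	for start := 0; start < len(in); start += size {
		end := start + size
		if end > len(in) {
			end = len(in)
		}
		out = append(out, in[start:end])
	}
	return out
}

func main() {
	// Placeholder data; the limits 2, 3, and 2 below are arbitrary example values.
	results := []Candidate{
		{Source: "envato", URL: "https://example.com/a", Score: 0.91},
		{Source: "envato", URL: "https://example.com/b", Score: 0.40},
		{Source: "envato", URL: "https://example.com/c", Score: 0.75},
		{Source: "artgrid", URL: "https://example.com/d", Score: 0.66},
		{Source: "google_video", URL: "https://example.com/e", Score: 0.58},
	}
	capped := capPerSource(results, 2)
	selected := topK(capped, 3)
	for i, b := range batches(selected, 2) {
		fmt.Printf("vision batch %d: %d candidates\n", i, len(b))
	}
}
```

Capping before ranking bounds the work done per source, and ranking before batching bounds the number of sequential vision calls, which is the timeout cause the notes identify.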
## Local Self-Test Workflow
- Primary command:
- `bash scripts/selftest.sh`
@@ -145,7 +172,8 @@
- Gemini batch evaluation exists, but search quality can still degrade if upstream SearXNG results are noisy.
- Frontend JavaScript was not linted with Node tooling in this environment because `node` is not installed here.
- Full browser-level preview validation is still not covered by the local self-test script.
-- Search cards still render recommendation reason text, not a robust asset description/snippet mapping.
+- Search cards now separate source snippet from AI reason, but metadata fidelity still depends on source enrichment quality.
+- Artgrid public pages inspected from this environment still did not expose a stable public preview video URL in HTML, so Artgrid hover-video support may remain partial until a browser-captured HTML/HAR sample reveals the real preview source pattern.
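
For pages that do embed preview metadata, JSON-LD extraction of the kind mentioned for Envato might look like the sketch below. It assumes the page carries a `<script type="application/ld+json">` block with a schema.org-style `contentUrl` field; that layout, the function names, and the sample URL are assumptions for illustration, not the parser added in this commit.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
	"strings"
)

// ldJSONBlock matches the body of <script type="application/ld+json"> elements.
// A regular expression is a shortcut for this sketch; a real HTML parser is more robust.
var ldJSONBlock = regexp.MustCompile(`(?is)<script[^>]*type=["']application/ld\+json["'][^>]*>(.*?)</script>`)

// findContentURL walks decoded JSON and returns a non-empty string "contentUrl" value.
// Map iteration order is random, so with several such fields the choice is arbitrary.
func findContentURL(v any) string {
	switch t := v.(type) {
	case map[string]any:
		if s, ok := t["contentUrl"].(string); ok && s != "" {
			return s
		}
		for _, child := range t {
			if s := findContentURL(child); s != "" {
				return s
			}
		}
	case []any:
		for _, child := range t {
			if s := findContentURL(child); s != "" {
				return s
			}
		}
	}
	return ""
}

// extractPreviewURL returns a JSON-LD contentUrl that looks like an mp4 or m3u8 link.
func extractPreviewURL(html string) (string, bool) {
	for _, m := range ldJSONBlock.FindAllStringSubmatch(html, -1) {
		var doc any
		if err := json.Unmarshal([]byte(m[1]), &doc); err != nil {
			continue // skip malformed JSON-LD blocks
		}
		u := findContentURL(doc)
		lower := strings.ToLower(u)
		if strings.Contains(lower, ".mp4") || strings.Contains(lower, ".m3u8") {
			return u, true
		}
	}
	return "", false
}

func main() {
	// Hypothetical page fragment; real source pages may be structured differently.
	page := `<html><head><script type="application/ld+json">
	{"@type":"VideoObject","name":"Sample clip","contentUrl":"https://example.com/preview.mp4"}
	</script></head></html>`
	if u, ok := extractPreviewURL(page); ok {
		fmt.Println("preview:", u)
	} else {
		fmt.Println("no preview URL found")
	}
}
```

When no such block exists, as reported for the Artgrid pages inspected here, the function returns `false` and the caller can fall back to thumbnail-only rendering.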
## Frontend Debug Logger
- UI button: bottom-right `Logs`
@@ -215,6 +243,7 @@
- [ ] Better matching between rendered description and actual linked asset
- [ ] Add browser-level verification for preview/HLS behavior
- [ ] Add more automated coverage for search ranking / filtering logic
- [ ] If Artgrid hover preview is still required, capture the HTML/HAR of one real clip from a browser session and derive a stable preview URL parser from it
- [ ] Add proper frontend build/lint step if Node becomes available
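
As a starting point for the filtering coverage item above, a noise filter for music/BGM/trailer-style titles can be written as a small pure function that is easy to test with a table of cases. The sketch below is an assumption about shape only: the term list is taken from the categories named in these notes, while `isNoise` and the whole-word matching rule are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// bannedTerms lists the noise categories named in the notes; the real list may differ.
var bannedTerms = map[string]bool{"music": true, "bgm": true, "trailer": true}

// isNoise reports whether any whole word in the title is a banned term.
// Matching whole words avoids rejecting titles such as "Musician tuning a guitar".
func isNoise(title string) bool {
	words := strings.FieldsFunc(strings.ToLower(title), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	for _, w := range words {
		if bannedTerms[w] {
			return true
		}
	}
	return false
}

func main() {
	// Table of example titles; these are made-up inputs, not logged production results.
	titles := []string{
		"Official Trailer (2024)",
		"Relaxing BGM loop",
		"Aerial city skyline at dusk",
		"Musician tuning a guitar",
	}
	for _, t := range titles {
		fmt.Printf("%q noise=%v\n", t, isNoise(t))
	}
}
```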
## Verified Locally In This Environment