diff --git a/TODO.md b/TODO.md
index fce32e2..72f944a 100644
--- a/TODO.md
+++ b/TODO.md
@@ -429,35 +429,79 @@
 - `f968458` Rewrite TODO as project handover
 
 ## Git / Push Status
-- Last pushed commit known in earlier work:
-  - `6d9391b` was pushed successfully
-- Local-only work currently exists:
-  - `9637b76 Improve query intent handling and preview playback`
-- Push status for `9637b76`:
-  - not pushed
-  - remote rejected the push with:
-    - `remote unpack failed: unable to create temporary object directory`
-    - `remote rejected main -> main (unpacker error)`
-- Interpretation:
-  - current blocker appears to be on the remote git server side, not a local git history issue
+- Current branch in ongoing work: `main`
+- Current state:
+  - latest work in this environment has been pushed successfully multiple times after the earlier remote unpacker issue
+  - the older push failure note is historical context only and should not be treated as the current repo state
+- Operational note:
+  - because the frontend is static and aggressively cached, browser hard refreshes are often required after UI / modal / preview changes
 
 ## Highest-Value Next Steps
-- [ ] Re-try push of local commit once remote git storage/unpacker issue is resolved
+- [ ] Reduce `/api/search` latency further without collapsing result count
+- [ ] Improve Envato / Artgrid preview acquisition reliability so Gemini Vision sees real frames more often
+- [ ] Revisit Google Video UX:
+  - current YouTube embed was abandoned due error `153`
+  - current in-app panel is more reliable but less rich than a true embedded watch page
 - [ ] Build collector-specific integration tests with recorded SearXNG samples
 - [ ] Separate source enrichment tests from live network behavior using local fixtures
-- [ ] Add a browser-level preview validation path, especially for hover video
+- [ ] Add a browser-level preview validation path, especially for hover video and preview proxy routing
 - [ ] If Artgrid hover preview is still required, obtain one real clip HAR / DevTools network export and derive a stable preview URL parser
 - [ ] Build a small fixed real-query benchmark set to evaluate search quality before further tuning
 - [ ] If frontend tooling becomes available, add lint/build checks
 
 ## Short Handover Summary
 - The codebase runs.
-- Upload/download features mostly exist.
-- Search has been significantly refactored and is in a better shape than before, but is still the main unstable area.
-- Envato source fidelity is much better than earlier.
-- Artgrid source fidelity is better, but preview-video extraction is still incomplete.
+- Upload / direct-download features mostly exist and are broadly usable.
+- Search is functional but still the least stable subsystem by a wide margin.
+- Envato source fidelity is better than before, but Cloudflare / fetch failures still affect enrichment and preview acquisition.
+- Artgrid source fidelity is improved, but query coverage and preview extraction are still unreliable.
 - There is now a local self-test workflow.
-- There is one known local commit that has not been pushed because the remote repo reported an unpacker error.
+- Backend debug logging is now much more detailed and intended to support exported log-file analysis from the in-app `Logs` panel.
+
+## Current Reality Check
+- Search request flow is now heavily instrumented.
+- The frontend `Logs` panel can capture:
+  - API request start / completion
+  - SearXNG request / response counts
+  - collector query expansion
+  - enrichment start / finish
+  - Gemini translation / vision preparation / batch failures
+  - preview proxy fetch / cache events
+- The latest broad issue pattern observed in logs is:
+  - too many SearXNG calls for a single request can still push total latency too high
+  - Envato / Artgrid often fail to provide enough preview-capable candidates for Gemini Vision
+  - Google Video is frequently the easiest source to retrieve and therefore can dominate final results
+  - YouTube embed error `153` made the prior Google modal approach unreliable
+
+## Active Problems
+- `504 Gateway Time-out`
+  - Root cause: `/api/search` can still become too expensive when query expansion, source collectors, enrichment, and Gemini batch retries stack together.
+  - Current mitigation: request-level time budget and partial-result return path.
+  - Residual risk: fewer reviewed results can be returned when the budget is exhausted.
+- Too many Google Video-only result sets
+  - Root cause: Envato / Artgrid queries can still produce repeated `rawCount: 0` responses from SearXNG.
+  - Current mitigation: looser unquoted query variants for both sources.
+  - Residual risk: upstream SearXNG quality still dominates discovery.
+- Gemini Vision partial or weak evaluation
+  - Root cause: many candidates still lack usable thumbnails / preview frames, so Gemini sees fewer real visuals than the raw result count suggests.
+  - Current mitigation: more verbose visual-fetch logging, preview-video-first strategy for Envato / Artgrid, and partial backfill from ranked candidates.
+  - Residual risk: if source media cannot be fetched, Gemini quality still degrades sharply.
+- Envato / Artgrid preview instability
+  - Root cause: source HTML can be incomplete, fetches can fail, and some previews may only appear after client-side rendering or protected media access paths.
+  - Current mitigation: JSON-LD/meta/hydration parsing, delayed retry, preview proxy route, MP4 cache, and HLS playlist rewriting.
+  - Residual risk: a real browser-rendered fetch path may still be needed later for some pages.
+- Google Video popup UX
+  - Root cause: YouTube embed error `153`.
+  - Current mitigation: dedicated in-app Google panel instead of direct embed.
+  - Residual risk: this is reliable but not as rich as showing the live watch page.
+
+## Current Technical Notes
+- Preview proxy route:
+  - `/api/preview/stream`
+  - MP4 responses can be cached to disk
+  - HLS playlists are rewritten so segment fetches also flow through the backend
+- Frontend cache busting is done via `/app.js?v=...`
+- If behavior in the browser does not match the latest backend/frontend code, the first assumption should be stale frontend assets until proven otherwise
 
 ## Recent Change Log
 - Date: `2026-03-16`