Improve source parsing from Envato and Artgrid HTML

AI Assistant
2026-03-13 19:03:21 +09:00
parent 06ea4f3ecd
commit ae091c5a7d
3 changed files with 141 additions and 3 deletions
@@ -64,6 +64,30 @@
- keep the anti-timeout optimization
- recover Envato/Artgrid recall when the early pass is too narrow
## Current Session Update (2026-03-13, HTML Snapshot Analysis)
- Used saved HTML snapshots supplied by the user for:
  - Envato item page
  - Artgrid clip page
- Findings:
  - Envato page exposes clean `VideoObject` JSON-LD with:
    - exact asset title
    - rich description
    - thumbnail URL
    - preview mp4 URL
  - Artgrid page exposes reliable meta fields for:
    - title
    - description
    - thumbnail
    - canonical URL
  - Artgrid snapshot still does **not** expose a stable preview mp4 or m3u8 in the saved HTML or downloaded asset bundle inspected here
- Fixes applied from the snapshots:
  - Envato enrichment now prefers `VideoObject` JSON-LD over generic meta tags
  - Envato search cards should now align much better with the actual source asset and preview
  - Artgrid title/description are now cleaned so Gemini/source text is less polluted by site suffixes and generic boilerplate
- Remaining limitation:
  - Artgrid hover-video preview cannot be derived reliably from the provided snapshot alone
  - if Artgrid preview video is still required, the next useful artifact is a browser HAR or DevTools network capture from an opened clip page
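
The parsing preference described above (JSON-LD `VideoObject` first, generic meta tags as fallback, then suffix cleanup) could look roughly like the sketch below. This is not the code from this commit: the function names, the `og:*` fallback keys, and the suffix pattern are illustrative assumptions, and only the Python standard library is used.

```python
import json
import re
from html.parser import HTMLParser


class SnapshotParser(HTMLParser):
    """Collects JSON-LD script bodies and <meta> property/name content."""

    def __init__(self):
        super().__init__()
        self.jsonld_blocks = []
        self.meta = {}
        self._in_jsonld = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "script" and (a.get("type") or "").lower() == "application/ld+json":
            self._in_jsonld = True
            self._buf = []
        elif tag == "meta":
            key = a.get("property") or a.get("name")
            if key and a.get("content"):
                self.meta.setdefault(key, a["content"])

    def handle_data(self, data):
        if self._in_jsonld:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.jsonld_blocks.append("".join(self._buf))
            self._in_jsonld = False


def find_video_object(blocks):
    """Return the first JSON-LD node typed VideoObject, or None."""
    for raw in blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue
        nodes = data if isinstance(data, list) else [data]
        flat = []
        for node in nodes:
            if isinstance(node, dict):
                flat.append(node)
                # Some pages wrap entities in an @graph array; flatten one level.
                flat.extend(n for n in node.get("@graph", []) if isinstance(n, dict))
        for node in flat:
            types = node.get("@type")
            types = types if isinstance(types, list) else [types]
            if "VideoObject" in types:
                return node
    return None


# Assumed suffix shape, e.g. "Title | Artgrid" or "Title - Envato Elements".
SITE_SUFFIX = re.compile(r"\s*[|\-]\s*(?:Artgrid|Envato)\b.*$", re.IGNORECASE)


def clean_title(title):
    """Strip a trailing site-name suffix from a page title."""
    return SITE_SUFFIX.sub("", title).strip() if title else title


def first(value):
    """schema.org allows a single value or a list; take the first entry."""
    return value[0] if isinstance(value, list) and value else value


def extract_source_fields(html):
    """Prefer VideoObject JSON-LD fields; fall back to meta tags."""
    parser = SnapshotParser()
    parser.feed(html)
    video = find_video_object(parser.jsonld_blocks) or {}
    return {
        "title": video.get("name") or clean_title(parser.meta.get("og:title")),
        "description": video.get("description") or parser.meta.get("og:description"),
        "thumbnail": first(video.get("thumbnailUrl")) or parser.meta.get("og:image"),
        "preview": first(video.get("contentUrl")),  # None when no JSON-LD video exists
    }
```

With a page that carries only meta tags, as the Artgrid snapshot did, `preview` stays `None`, which matches the remaining limitation noted above.
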
## Local Self-Test Workflow
- Primary command:
- `bash scripts/selftest.sh`