enrichment: drop LLM for structured info, dedup images by sha + phash

Per user request, the LLM is no longer asked to extract rooms/size/rent/WBS;
those fields come from the inberlinwohnen.de scraper, which is reliable. Haiku
is now used for one narrow job: picking which <img> URLs on the listing page
are actual flat photos (as opposed to logos, badges, ads, or employee
portraits). If the LLM call fails for any reason, the unfiltered candidate
list is passed through unchanged.
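The photo-selection step and its fallback can be sketched roughly as follows. This is a minimal illustration, not the commit's code: select_flat_photos and the classify callable are hypothetical names, with classify standing in for the Haiku request (not shown here). The sketch also assumes that page order should be preserved and that any URL the model returns which was not among the candidates should be ignored.

```python
from typing import Callable, List, Sequence


def select_flat_photos(
    candidates: Sequence[str],
    classify: Callable[[Sequence[str]], List[str]],
) -> List[str]:
    """Keep only the candidate <img> URLs that the classifier marks as flat photos.

    `classify` is a hypothetical stand-in for the Haiku call: it receives the
    candidate URLs and returns the subset it considers actual flat photos.
    """
    try:
        # Building the set inside the try also catches malformed responses
        # (e.g. None instead of a list).
        allowed = set(classify(candidates))
    except Exception:
        # Any LLM failure (API error, timeout, bad response): pass the
        # unfiltered candidate list through.
        return list(candidates)
    # Preserve page order and drop URLs the model invented.
    return [url for url in candidates if url in allowed]
```

Filtering against the original candidate list, rather than trusting the model's output directly, guarantees that the result is always a subset of what was actually on the page.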

Image dedup runs in two tiers:
1. SHA-256 of the bytes: drops distinct URLs that point to byte-identical files
2. Perceptual hash (Pillow + imagehash, Hamming distance ≤ 5): drops
   "same image at a different resolution" duplicates from srcset / CDN
   variants, which were filling galleries with 2–4× copies
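A minimal sketch of the two-tier dedup, assuming each fetched image is already available as raw bytes plus a 64-bit perceptual hash stored as a plain int. FetchedImage, hamming and dedup_images are hypothetical names, not the commit's code. In the real pipeline the hash would come from imagehash.phash on a Pillow image (imagehash hash objects already give the Hamming distance via a - b); using an int here keeps the sketch standard-library only. Keeping the first occurrence of each image is also an assumption.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Set


@dataclass
class FetchedImage:
    url: str
    data: bytes
    # 64-bit perceptual hash, e.g. int(str(imagehash.phash(img)), 16) (assumption)
    phash: int


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def dedup_images(images: List[FetchedImage], max_distance: int = 5) -> List[FetchedImage]:
    """Drop byte-identical files first, then perceptual near-duplicates.

    The first occurrence is kept in both tiers.
    """
    seen_sha: Set[str] = set()
    kept: List[FetchedImage] = []
    for img in images:
        # Tier 1: exact duplicates, regardless of which URL served them.
        digest = hashlib.sha256(img.data).hexdigest()
        if digest in seen_sha:
            continue
        seen_sha.add(digest)
        # Tier 2: same picture re-encoded or resized (srcset / CDN variants).
        if any(hamming(img.phash, k.phash) <= max_distance for k in kept):
            continue
        kept.append(img)
    return kept
```

The cheap SHA check runs first, so only byte-distinct files reach the pairwise phash comparison. That comparison is quadratic, which is acceptable for listing galleries of a few dozen images. A variant could keep the highest-resolution copy of each near-duplicate group instead of the first one.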

UI:
- Wohnungsliste now shows scraper data only (rooms/size/rent/WBS)
- Detail panel shows only the images + "Zur Original-Anzeige →"; the
  description, features, pros & cons, and key/value table are gone
- The per-row "erneut versuchen" link and the "analysiert…/?" status chips
  were tied to LLM extraction and have been removed; the header
  "Bilder nachladen (N)" button still surfaces pending/failed batches for admins

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
EiSiMo 2026-04-21 15:29:55 +02:00
parent 374368e4af
commit 0aa4c6c2bb
6 changed files with 137 additions and 233 deletions


@@ -403,13 +403,7 @@ def _wohnungen_context(user) -> dict:
         }, filters):
             continue
         last = db.last_application_for_flat(uid, f["id"])
-        enrichment_data = None
-        if f["enrichment_json"]:
-            try:
-                enrichment_data = json.loads(f["enrichment_json"])
-            except Exception:
-                enrichment_data = None
-        flats_view.append({"row": f, "last": last, "enrichment": enrichment_data})
+        flats_view.append({"row": f, "last": last})
     rejected_view = db.rejected_flats(uid)
     enrichment_counts = db.enrichment_counts()
@@ -489,12 +483,6 @@ def partial_wohnung_detail(request: Request, flat_id: str, user=Depends(require_
     flat = db.get_flat(flat_id)
     if not flat:
         raise HTTPException(404)
-    enrichment_data = None
-    if flat["enrichment_json"]:
-        try:
-            enrichment_data = json.loads(flat["enrichment_json"])
-        except Exception:
-            enrichment_data = None
     slug = enrichment.flat_slug(flat_id)
     image_urls = [
         f"/flat-images/{slug}/{i}"
@@ -503,7 +491,6 @@ def partial_wohnung_detail(request: Request, flat_id: str, user=Depends(require_
     ctx = {
         "request": request,
         "flat": flat,
-        "enrichment": enrichment_data,
         "enrichment_status": flat["enrichment_status"],
         "image_urls": image_urls,
     }