Commit graph

8 commits

Author SHA1 Message Date
ee7ba6c6ff fix: round €/m² in Telegram, drop "Bilder nachladen" admin button, fix lightbox visibility
- notifications: round sqm_price to whole € in Telegram match messages
  (was emitting raw float like "12.345614 €/m²").

- wohnungen: remove the admin-only "Bilder nachladen (N)" button. It
  flickered into view whenever a freshly-scraped flat was still in
  pending state, which was effectively random from the user's point of
  view, and the manual backfill it triggered isn't needed anymore — new
  flats are auto-enriched at scrape time. Also drops the dead helpers
  it was the sole caller of: enrichment.kick_backfill,
  enrichment._backfill_runner, db.flats_needing_enrichment,
  db.enrichment_counts.

- lightbox: the modal didn't appear because Tailwind's Play CDN injects
  its own .hidden { display: none } rule at runtime, which kept fighting
  our class toggle. Switch the show/hide to inline style.display so no
  external stylesheet can mask it. Single-class .lightbox now only owns
  the layout — the initial-hidden state is on the element via
  style="display:none".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 12:48:14 +02:00
d06dfdaca1 refactor: rename wohnungsdidi → lazyflat
Container names, FastAPI titles, email subjects, filenames, brand text,
session cookie, User-Agent, docstrings, README. Volume lazyflat_data and
/data/lazyflat.sqlite already used the new name, so on-disk data is
preserved; dropped the now-obsolete legacy-rename comments.

Side effect: SESSION_COOKIE_NAME change logs everyone out on deploy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 09:26:12 +02:00
eb73b5e415 correctness batch: atomic writes, task refs, hmac, import-star, pickle
Per review §2:

- web/db.py: new _tx() context manager wraps multi-statement writers in
  BEGIN IMMEDIATE … COMMIT/ROLLBACK (our connections run in autocommit
  mode, so plain `with _lock:` doesn't give atomicity). partnership_accept
  (UPDATE + DELETE) and cleanup_retention (3 deletes/updates) now use it.
- Fire-and-forget tasks: add module-level _bg_tasks sets in web/app.py and
  web/enrichment.py. A _spawn() helper holds a strong ref until the task
  finishes so the GC can't drop it mid-flight (CPython's event loop only
  weakly references pending tasks).
- apply/main.py: require_api_key uses hmac.compare_digest, matching web's
  check. Also imports now use explicit names instead of `from settings *`.
- apply/language.py: replace `from settings import *` + `from paths import *`
  with explicit imports — this is the pattern that caused the LANGUAGE
  NameError earlier.
- alert/utils.py: pickle-based hash_any_object → deterministic JSON+sha256.
  Cheaper, portable across Python versions, no pickle attack surface.
- web/notifications.py: /fehler links repointed to /bewerbungen (the
  former page doesn't exist).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 19:14:26 +02:00
0c18f0870a rename to wohnungsdidi + didi logo + footer for all + seconds-only counter
- App is now called "wohnungsdidi" everywhere user-facing (page title,
  nav brand, login header, notification subjects, report filename,
  FastAPI titles, log messages)
- Brand dot replaced with an image of Didi (web/static/didi.webp),
  rendered as a round 2.25rem avatar in _layout + login
- "Programmiert für Annika ♥" footer now shows for every logged-in user,
  not only Annika
- Count-up shows only seconds ("vor 73 s") regardless of age — no
  rollover to minutes/hours
- Data continuity: DB file stays /data/lazyflat.sqlite and the Docker
  volume stays lazyflat_data so the rename doesn't strand existing data
- Session cookie renamed to wohnungsdidi_session (one-time logout on
  rollout)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 17:29:24 +02:00
83db8cd902 enrichment: size floor + LLM fallback for opaque CDN URLs
Two issues surfaced on HOWOGE and similar sites:

1. Tiny icons/1x1 tracking pixels leaked through (e.g. image #5, 1.8 KB).
   Added MIN_IMAGE_BYTES = 15_000 and MIN_IMAGE_DIMENSION = 400 px on the
   short side; files below either threshold are dropped before saving.
   Pillow already gives us the dims as part of the phash pass, so the
   check is free.

2. Listings whose image URLs are opaque CDN hashes
   (.../fileadmin/_processed_/2/3/xcsm_<hash>.webp.pagespeed.ic.<hash>.webp)
   caused the LLM URL picker to reject every candidate, yielding 0 images
   for legit flats. Fixes: (a) prompt now explicitly instructs Haiku to
   keep same-host /fileadmin/_processed_/ style URLs even when the filename
   is illegible, (b) if the model still returns an empty set we fall back
   to the unfiltered Playwright candidates, trusting the pre-filter instead
   of erasing the gallery.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 16:44:47 +02:00
0aa4c6c2bb enrichment: drop LLM for structured info, dedup images by sha + phash
Per user request, the LLM is no longer asked to extract rooms/size/rent/WBS —
those come from the inberlinwohnen.de scraper which is reliable. Haiku is now
used for one narrow job: pick which <img> URLs from the listing page are
actual flat photos (vs. logos, badges, ads, employee portraits). On any LLM
failure the unfiltered candidate list passes through.

Image dedup runs in two tiers:
1. SHA256 of bytes — drops different URLs that point to byte-identical files
2. Perceptual hash (Pillow + imagehash, Hamming distance ≤ 5) — drops the
   "same image at a different resolution" duplicates from srcset / CDN
   variants that were filling galleries with 2–4× copies

UI:
- Wohnungsliste falls back to scraper-only display (rooms/size/rent/wbs)
- Detail panel only shows images + "Zur Original-Anzeige →"; description /
  features / pros & cons / kv table are gone
- Per-row "erneut versuchen" link + the "analysiert…/?" status chips were
  tied to LLM extraction and are removed; the header "Bilder nachladen (N)"
  button still surfaces pending/failed batches for admins

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 15:29:55 +02:00
a8f698bf5e enrichment: capture failure cause + admin retry button
Each enrichment failure now records {"_error": "...", "_step": "..."} into
enrichment_json, mirrors the message into the errors log (visible in
/logs/protokoll), and the list shows the cause as a tooltip on the
"Fehler beim Abrufen der Infos" text. Admins also get a "erneut versuchen"
link per failed row that re-queues just that flat (POST /actions/enrich-flat).

The pipeline raises a typed EnrichmentError per step (fetch / llm / crash)
so future failure modes don't get swallowed as a silent "failed".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 15:05:39 +02:00
eb66284172 enrichment: Haiku flat details + image gallery on expand
apply service
- POST /internal/fetch-listing: headless Playwright fetch of a listing URL,
  returns {html, image_urls[], final_url}. Uses the same browser
  fingerprint/profile as the apply run so bot guards don't kick in

web service
- New enrichment pipeline (web/enrichment.py):
  /internal/flats → upsert → kick() enrichment in a background thread
    1. POST /internal/fetch-listing on apply
    2. llm.extract_flat_details(html, url) — Haiku tool-use call returns
       structured JSON (address, rooms, rent, description, pros/cons, etc.)
    3. Download each image directly to /data/flats/<slug>/NN.<ext>
    4. Persist enrichment_json + image_count + enrichment_status on the flat
- llm.py: minimal Anthropic /v1/messages wrapper, no SDK
- DB migration v5 adds enrichment_json/_status/_updated_at + image_count
- Admin "Altbestand anreichern" button (POST /actions/enrich-all) queues
  backfill for all pending/failed rows; runs in a detached task
- GET /partials/wohnung/<id> renders _wohnung_detail.html
- GET /flat-images/<slug>/<n> serves the downloaded image

UI
- Chevron on each list row toggles an inline detail pane (HTMX fetch on
  first open, hx-preserve keeps it open across the 3–30 s polls)
- CSS .flat-gallery normalises image tiles to a 4/3 aspect with object-fit:
  cover so different source sizes align cleanly
- "analysiert…" / "?" chips on the list reflect enrichment_status

Config
- ANTHROPIC_API_KEY + ANTHROPIC_MODEL wired into docker-compose's web
  service (default model: claude-haiku-4-5-20251001)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 14:46:12 +02:00