- Python 100%
- 247/247 L0 + 846/846 L1 + 877/1962 L2 covered, so any product
walking the OFF taxonomy upward will hit an icon
- 1999/2000 icons (en:rabbit-meat dropped — gpt-image moderation
false-positive on raw animal product wording)
- same comic_v4 style as bls-icons; mixable in one app
Pipeline:
- gpt-5-mini prompter chooses Single / Group / Process per category
- gpt-image-2 quality=low via OpenAI Batch API
- rembg + BiRefNet-massive on Modal A10G for transparent variants
- two manual review rounds (flood-vs-AI swap + feedback-driven regen
for 32 items) — total cost ~$8
Storage: icons + icons_raw via Git LFS so a metadata-only clone stays
small.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|---|---|---|
| icons | ||
| icons_raw | ||
| style | ||
| tools | ||
| .gitattributes | ||
| .gitignore | ||
| CONTRIBUTING.md | ||
| generate.py | ||
| grid.png | ||
| grid_alpha.png | ||
| items.csv | ||
| LICENSE | ||
| modal_postprocess.py | ||
| off_categories.py | ||
| off_categories_ranked.csv | ||
| README.md | ||
| requirements.txt | ||
Open Food Facts Category Icon Set
In my project ACP (Adaptive Calorie Tracker) I am using the Open Food Facts taxonomy to categorize generic products. Like my companion repo bls-icons (German BLS 4.0 nutritional database), I needed clean, same-styled icons for each entry — so I generated my own with AI. Same comic_v4 style as bls-icons, so an app can mix both sets without visual seams.
100 random samples from the dataset:
Same items with backgrounds removed (the checker is just to show the alpha — the actual files are transparent):
Use it in your app
git clone ssh://git@git.moritz.run:2222/moritz/off-icons.git
cd off-icons
git lfs pull # download all 1999 PNGs (~3 GB)
import csv
items = list(csv.DictReader(open("items.csv", encoding="utf-8")))
slug = items[0]["slug"] # e.g. "en__plant-based-foods"
icon_path = f"icons/{slug}.png" # transparent PNG
code = slug.replace("__", ":", 1) # → "en:plant-based-foods"
Without git lfs pull you only get the metadata (~3 MB, clones in seconds).
Dataset
| OFF categories covered | 1999 of the top-2000 by importance |
| Top-level (L0) coverage | 247 / 247 (100%) |
| L1 coverage | 846 / 846 (100%) |
| L2 coverage | 877 / 1962 (44%) |
| Resolution | 1024×1024 PNG |
icons_raw/ |
source images, white background |
icons/ |
transparent (after background removal) |
items.csv |
one row per icon: rank, code, slug, name_de, name_en, depth, parents |
off_categories_ranked.csv |
full 14,248-row ranking (build input) |
The top-2000 slice covers every L0 and L1 category, so any product walking the
taxonomy upward from its most specific tag will eventually hit an icon. One
icon (en:rabbit-meat) is missing — OpenAI's image moderation false-flagged
the prompt and I left it out rather than fight the safety filter.
Slug mapping
OFF taxonomy IDs use colons (en:plant-based-foods) which are illegal as
Windows filenames and risky as OpenAI batch custom_ids. We replace : with
__:
en:plant-based-foods ↔ en__plant-based-foods.png
items.csv carries both code and slug columns. Reverse-lookup at
runtime: code = slug.replace("__", ":", 1) — every OFF code has exactly one
colon.
How it was made
- Source. Open Food Facts taxonomy + product database, parsed by
off_categories.py. 14,248 categories total, ranked by importance for German products:score = n_products_de + 1000 × (8 − depth)so root categories (foundation of the icon-fallback chain) sit at the top, then frequent ones. - Top-N selection.
tools/make_items.py --top 2000slices the ranked CSV intoitems.csv. The 2000-item cap covers all L0 + all L1 categories. - Prompt generation. Per category,
gpt-5-minireads the German + English name, depth, and parent chain (for disambiguation), and decides between three visual modes — Single Item (one motif, e.g.Käse→ a cheese wedge), Group (2-3 representatives, e.g.Milchprodukte→ cheese- milk + yoghurt), or Process Bucket (one packaging archetype, e.g.
Tiefkühlprodukte→ a generic frozen-food box). Style spec (style/comic_v4.md) is identical to bls-icons. Submitted as a chat-completions Batch (50% off, async). ~$1.50 for the full run.
- milk + yoghurt), or Process Bucket (one packaging archetype, e.g.
- Image generation.
gpt-image-2at qualitylow, 1024×1024, via the OpenAI Batch API. Output is a PNG with white background. ~$5.50 for 2000 images. - Background removal.
BiRefNet-massivevia therembglibrary on Modal serverless A10G GPUs. ~$0.20 and ~5–8 min wall time. - Manual review. Every icon was reviewed in two rounds via
tools/review.py:- Round 1: pick between BiRefNet alpha and a cheap PIL flood-fill (the latter wins for ~17% of icons where BiRefNet over-erased low-contrast details). 32 icons got verbal feedback for a re-try.
- Round 2: 32 prompt rewrites + image regens, reviewed against the originals; 31 swapped in, 1 reverted. Total refinement cost: ~$0.05.
Total cost end-to-end: ~$8 for 1999 production-ready icons.
Repo layout
.
├── items.csv 2000 icons (rank, code, slug, name_de, name_en, depth, parents)
├── off_categories.py downloads + ranks the full OFF taxonomy
├── off_categories_ranked.csv 14,248-row full ranking (build input)
├── generate.py end-to-end pipeline (prompter + image batch + bg removal)
├── modal_postprocess.py Modal entry point for background removal
├── style/comic_v4.md visual style spec (shared with bls-icons)
├── grid.png README preview (regenerable via tools/make_grid.py)
├── grid_alpha.png transparent variant
├── icons_raw/ white-bg PNGs (LFS)
├── icons/ transparent PNGs after bg removal (LFS)
└── tools/
├── make_items.py slice top-N from the ranked CSV → items.csv
├── make_grid.py regenerate grid.png / grid_alpha.png
├── review.py Tk reviewer (round 1: flood vs AI, round 2: 4-cell compare)
├── apply_flood.py apply round-1 flood-swap decisions to icons/
├── regen_with_feedback.py round-2 regen pipeline (re-batches with feedback appended)
└── apply_v2.py merge round-2 decisions back into icons/ + icons_raw/
Regenerate
# companion file: style spec lives here too
pip install -r requirements.txt
# OpenAI key for prompter + image gen
echo "OPENAI_API_KEY=sk-..." > .env
# slice ranked categories → items.csv (top 2000)
python tools/make_items.py --top 2000
# end-to-end (~$8, completes in <24 h via Batch API)
python generate.py # submit
python generate.py --fetch # poll, download, bg-remove, sync
# (optional) review loop
python tools/review.py
python tools/apply_flood.py
python tools/regen_with_feedback.py --submit
python tools/regen_with_feedback.py --fetch
modal run modal_postprocess.py --in-dir review_v2/raw --out-dir review_v2/alpha
python tools/review.py --v2
python tools/apply_v2.py
Models
| Step | Model | Mode | Approx. cost (full 2000-item run) |
|---|---|---|---|
| Prompter | gpt-5-mini (reasoning_effort=minimal) |
OpenAI Batch | ~$1.50 |
| Image gen | gpt-image-2 quality low |
OpenAI Batch | ~$5.50 |
| Background removal | BiRefNet-massive (rembg) |
Modal A10G GPU | ~$0.20 |
Storage
Icons are stored via Git LFS (*.png in icons/ and icons_raw/). A
plain git clone fetches only the small text/CSV files; binaries arrive on
first checkout (or git lfs pull). The repo itself stays small enough to
clone in seconds.
License
Released into the public domain under CC0 1.0 (matching bls-icons). Use, modify, and redistribute the icons, code, and metadata for any purpose without attribution.