AI-generated icon dataset for Open Food Facts categories


moritz fb08baeffe		2026-05-10 11:43:20 +02:00
release v1.0: 1999 icons covering the top 2000 OFF categories
- 247/247 L0 + 846/846 L1 + 877/1962 L2 covered, so any product walking the OFF taxonomy upward will hit an icon
- 1999/2000 icons (en:rabbit-meat dropped: gpt-image moderation false-positive on raw animal product wording)
- same comic_v4 style as bls-icons; mixable in one app
Pipeline:
- gpt-5-mini prompter chooses Single / Group / Process per category
- gpt-image-2 quality=low via OpenAI Batch API
- rembg + BiRefNet-massive on Modal A10G for transparent variants
- two manual review rounds (flood-vs-AI swap + feedback-driven regen for 32 items), total cost ~$8
Storage: icons + icons_raw via Git LFS so a metadata-only clone stays small.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
icons	release v1.0: 1999 icons covering the top 2000 OFF categories	2026-05-10 11:43:20 +02:00
icons_raw	release v1.0: 1999 icons covering the top 2000 OFF categories	2026-05-10 11:43:20 +02:00
style	release v1.0: 1999 icons covering the top 2000 OFF categories	2026-05-10 11:43:20 +02:00
tools	release v1.0: 1999 icons covering the top 2000 OFF categories	2026-05-10 11:43:20 +02:00
.gitattributes	release v1.0: 1999 icons covering the top 2000 OFF categories	2026-05-10 11:43:20 +02:00
.gitignore	release v1.0: 1999 icons covering the top 2000 OFF categories	2026-05-10 11:43:20 +02:00
CONTRIBUTING.md	sync CONTRIBUTING.md from moritz-meta	2026-05-09 12:03:02 +02:00
generate.py	release v1.0: 1999 icons covering the top 2000 OFF categories	2026-05-10 11:43:20 +02:00
grid.png	release v1.0: 1999 icons covering the top 2000 OFF categories	2026-05-10 11:43:20 +02:00
grid_alpha.png	release v1.0: 1999 icons covering the top 2000 OFF categories	2026-05-10 11:43:20 +02:00
items.csv	release v1.0: 1999 icons covering the top 2000 OFF categories	2026-05-10 11:43:20 +02:00
LICENSE	release v1.0: 1999 icons covering the top 2000 OFF categories	2026-05-10 11:43:20 +02:00
modal_postprocess.py	release v1.0: 1999 icons covering the top 2000 OFF categories	2026-05-10 11:43:20 +02:00
off_categories.py	init off-icons: ranking script + 14248 categories sorted by DE-importance	2026-05-09 11:20:50 +02:00
off_categories_ranked.csv	init off-icons: ranking script + 14248 categories sorted by DE-importance	2026-05-09 11:20:50 +02:00
README.md	release v1.0: 1999 icons covering the top 2000 OFF categories	2026-05-10 11:43:20 +02:00
requirements.txt	release v1.0: 1999 icons covering the top 2000 OFF categories	2026-05-10 11:43:20 +02:00

README.md

Open Food Facts Category Icon Set

In my project ACP (Adaptive Calorie Tracker), I use the Open Food Facts taxonomy to categorize generic products. As with my companion repo bls-icons (German BLS 4.0 nutritional database), I needed clean, consistently styled icons for each entry, so I generated my own with AI. The set uses the same comic_v4 style as bls-icons, so an app can mix both sets without visual seams.

100 random samples from the dataset (grid.png):

The same items with backgrounds removed (grid_alpha.png; the checkerboard only visualizes the alpha channel, the actual files are transparent):

Use it in your app

git clone ssh://git@git.moritz.run:2222/moritz/off-icons.git
cd off-icons
git lfs pull        # download all 1999 PNGs (~3 GB)

import csv
# Load the metadata; the context manager closes the file after reading.
with open("items.csv", encoding="utf-8", newline="") as f:
    items = list(csv.DictReader(f))
slug = items[0]["slug"]                        # e.g. "en__plant-based-foods"
icon_path = f"icons/{slug}.png"                # transparent PNG
code = slug.replace("__", ":", 1)              # → "en:plant-based-foods"

Without git lfs pull you only get the metadata (~3 MB, clones in seconds).

Dataset


OFF categories covered	1999 of the top-2000 by importance
Top-level (L0) coverage	247 / 247 (100%)
L1 coverage	846 / 846 (100%)
L2 coverage	877 / 1962 (45%)
Resolution	1024×1024 PNG
`icons_raw/`	source images, white background
`icons/`	transparent (after background removal)
`items.csv`	one row per icon: `rank, code, slug, name_de, name_en, depth, parents`
`off_categories_ranked.csv`	full 14,248-row ranking (build input)

The top-2000 slice covers every L0 and L1 category, so any product walking the taxonomy upward from its most specific tag will eventually hit an icon. One icon (en:rabbit-meat) is missing: OpenAI's image moderation falsely flagged the prompt, and I left it out rather than fight the safety filter.
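The upward walk described above can be sketched as a breadth-first search over parent categories. This is only an illustration, not code from the repo: `resolve_icon` is a hypothetical helper, `parents_of` is an assumed mapping from a category code to its direct parents (built however you like, for example from the `parents` column or from the OFF taxonomy itself), and `available` is the set of codes that have an icon.

```python
from collections import deque


def resolve_icon(code, parents_of, available):
    """Return the nearest code (the category itself or an ancestor) with an icon.

    parents_of: hypothetical dict mapping a category code to its direct parents.
    available:  set of codes for which an icon file exists.
    Returns None if no ancestor has an icon.
    """
    seen = {code}
    queue = deque([code])
    while queue:
        current = queue.popleft()
        if current in available:
            return current
        # Breadth-first, so closer ancestors are preferred over distant ones.
        for parent in parents_of.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return None


# Toy data for illustration only; the parent chain is made up, not read from items.csv.
parents_of = {
    "en:rabbit-meat": ["en:meats"],
    "en:meats": ["en:meats-and-their-products"],
}
available = {"en:meats", "en:meats-and-their-products"}

fallback = resolve_icon("en:rabbit-meat", parents_of, available)
```

With this toy data, the missing `en:rabbit-meat` icon falls back to its direct parent `en:meats`. The breadth-first order matters when a category has several parents: the closest ancestor with an icon wins.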

Slug mapping

OFF taxonomy IDs contain a colon (en:plant-based-foods), which is not allowed in Windows filenames and is risky in OpenAI batch custom_ids. The colon is therefore replaced with a double underscore (__):

en:plant-based-foods   ↔   en__plant-based-foods.png

items.csv carries both code and slug columns. To reverse the mapping at runtime, use code = slug.replace("__", ":", 1); this works because every OFF code has exactly one colon.
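As a small sketch, the mapping can be wrapped in two helpers. The function names `code_to_slug` and `slug_to_code` are hypothetical and not part of the repo; the replacement rule itself is the one stated above.

```python
def code_to_slug(code: str) -> str:
    """Turn an OFF code such as 'en:plant-based-foods' into a filename-safe slug."""
    # Only the first colon is replaced; OFF codes contain exactly one.
    return code.replace(":", "__", 1)


def slug_to_code(slug: str) -> str:
    """Reverse the mapping: the first '__' becomes the colon again."""
    return slug.replace("__", ":", 1)


slug = code_to_slug("en:plant-based-foods")
icon_path = f"icons/{slug}.png"
```

Limiting both replacements to the first occurrence keeps the round trip stable even if the part after the language prefix happens to contain a double underscore.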

How it was made

Source. Open Food Facts taxonomy + product database, parsed by off_categories.py. It yields 14,248 categories in total, ranked by importance for German products: score = n_products_de + 1000 × (8 − depth). The depth bonus puts root categories (the foundation of the icon-fallback chain) at the top, followed by frequently used ones.
Top-N selection. tools/make_items.py --top 2000 slices the ranked CSV into items.csv. The 2000-item cap covers all L0 + all L1 categories.
Prompt generation. Per category, gpt-5-mini reads the German + English name, depth, and parent chain (for disambiguation), and decides between three visual modes: Single Item (one motif, e.g. Käse (cheese) → a cheese wedge), Group (2-3 representatives, e.g. Milchprodukte (dairy products) → cheese + milk + yoghurt), or Process Bucket (one packaging archetype, e.g. Tiefkühlprodukte (frozen products) → a generic frozen-food box). The style spec (style/comic_v4.md) is identical to bls-icons. Prompts were submitted as a chat-completions Batch (50% off, async); the full run cost ~$1.50.
Image generation. gpt-image-2 at quality low, 1024×1024, via the OpenAI Batch API. Output is a PNG with white background. ~$5.50 for 2000 images.
Background removal. BiRefNet-massive via the rembg library on Modal serverless A10G GPUs. ~$0.20 and ~5–8 min wall time.
Manual review. Every icon was reviewed in two rounds via tools/review.py:
- Round 1: pick between BiRefNet alpha and a cheap PIL flood-fill (the latter wins for ~17% of icons where BiRefNet over-erased low-contrast details). 32 icons got verbal feedback for a re-try.
- Round 2: 32 prompt rewrites + image regens, reviewed against the originals; 31 swapped in, 1 reverted. Total refinement cost: ~$0.05.
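The ranking formula from the Source step can be written out as a short function. This is a sketch of the stated formula only, not the code in off_categories.py; the name `importance_score` is hypothetical, and it assumes that L0 categories have depth 0.

```python
def importance_score(n_products_de: int, depth: int) -> int:
    """score = n_products_de + 1000 * (8 - depth), as stated in the Source step.

    n_products_de: number of German products tagged with the category.
    depth:         taxonomy depth (assumed: 0 for top-level L0 categories).
    """
    return n_products_de + 1000 * (8 - depth)


# Illustrative inputs, not real counts from the ranked CSV.
root_score = importance_score(n_products_de=0, depth=0)
deep_score = importance_score(n_products_de=1500, depth=2)
```

Each level closer to the root is worth 1000 products, so in this example an empty root category (8000) still outranks a depth-2 category with 1500 products (7500).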

Total cost end-to-end: ~$8 for 1999 production-ready icons.
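Round 1 of the review compares the BiRefNet alpha against a cheap PIL flood-fill. The repo's actual implementation is not shown in this README; the following is only a minimal sketch of how such a flood-fill could look with Pillow. The function name `flood_remove_background`, the corner seeding, and the threshold default are assumptions.

```python
from PIL import Image, ImageDraw


def flood_remove_background(img, thresh=30):
    """Make the near-white region connected to the image corners transparent.

    Hypothetical helper: seeds a flood-fill at each corner and replaces the
    connected near-white area with fully transparent pixels.
    """
    rgba = img.convert("RGBA")
    width, height = rgba.size
    corners = [(0, 0), (width - 1, 0), (0, height - 1), (width - 1, height - 1)]
    for corner in corners:
        r, g, b, a = rgba.getpixel(corner)
        # Seed only from corners that are still opaque and close to white.
        if a == 255 and min(r, g, b) >= 255 - thresh:
            ImageDraw.floodfill(rgba, corner, (255, 255, 255, 0), thresh=thresh)
    return rgba


# Tiny synthetic example: a red square on a white background.
sample = Image.new("RGB", (8, 8), "white")
sample.paste((255, 0, 0), (2, 2, 6, 6))
result = flood_remove_background(sample)
```

Because the fill only spreads through connected near-white pixels, white areas enclosed by the motif stay opaque. That is one plausible reason a flood-fill can preserve low-contrast interior details that a segmentation model erases.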

Repo layout

.
├── items.csv                  one row per icon (rank, code, slug, name_de, name_en, depth, parents)
├── off_categories.py          downloads + ranks the full OFF taxonomy
├── off_categories_ranked.csv  14,248-row full ranking (build input)
├── generate.py                end-to-end pipeline (prompter + image batch + bg removal)
├── modal_postprocess.py       Modal entry point for background removal
├── style/comic_v4.md          visual style spec (shared with bls-icons)
├── grid.png                   README preview (regenerable via tools/make_grid.py)
├── grid_alpha.png             transparent variant
├── icons_raw/                 white-bg PNGs (LFS)
├── icons/                     transparent PNGs after bg removal (LFS)
└── tools/
    ├── make_items.py          slice top-N from the ranked CSV → items.csv
    ├── make_grid.py           regenerate grid.png / grid_alpha.png
    ├── review.py              Tk reviewer (round 1: flood vs AI, round 2: 4-cell compare)
    ├── apply_flood.py         apply round-1 flood-swap decisions to icons/
    ├── regen_with_feedback.py round-2 regen pipeline (re-batches with feedback appended)
    └── apply_v2.py            merge round-2 decisions back into icons/ + icons_raw/

Regenerate

# companion file: style spec lives here too
pip install -r requirements.txt

# OpenAI key for prompter + image gen
echo "OPENAI_API_KEY=sk-..." > .env

# slice ranked categories → items.csv (top 2000)
python tools/make_items.py --top 2000

# end-to-end (~$8, completes in <24 h via Batch API)
python generate.py            # submit
python generate.py --fetch    # poll, download, bg-remove, sync

# (optional) review loop
python tools/review.py
python tools/apply_flood.py
python tools/regen_with_feedback.py --submit
python tools/regen_with_feedback.py --fetch
modal run modal_postprocess.py --in-dir review_v2/raw --out-dir review_v2/alpha
python tools/review.py --v2
python tools/apply_v2.py

Models

Step	Model	Mode	Approx. cost (full 2000-item run)
Prompter	`gpt-5-mini` (`reasoning_effort=minimal`)	OpenAI Batch	~$1.50
Image gen	`gpt-image-2` quality `low`	OpenAI Batch	~$5.50
Background removal	`BiRefNet-massive` (rembg)	Modal A10G GPU	~$0.20

Storage

Icons are stored via Git LFS (*.png in icons/ and icons_raw/). The Git history itself holds only the small text/CSV files and LFS pointers, so the repo stays small enough to clone in seconds. The binaries are downloaded on checkout when Git LFS is installed, or explicitly with git lfs pull.

License

Released into the public domain under CC0 1.0 (matching bls-icons). Use, modify, and redistribute the icons, code, and metadata for any purpose without attribution.
