AIFORUS — AI News Intelligence Platform
Production full-stack system I designed and built solo at Aiforus (Seoul). A Python pipeline (FastAPI + PostgreSQL + ML tagging) continuously discovers AI-related news from global media, scrapes and normalizes content, and applies zero-shot classification. A React + Vite dashboard with Leaflet maps and Recharts visualizes the output for non-technical users.
The problem
Aiforus needed a foundation for AI-news intelligence that could grow over years without being rewritten. The first step was a collection layer that kept up with thousands of global sources, recovered cleanly from failures, and stayed neutral about downstream interpretation — so summarization, scoring, and analytics could be iterated independently above it. The second step was a dashboard that a non-engineer could open and immediately answer two questions: what is happening in AI news right now, and where is it coming from.
I built both ends — the Python pipeline and the React dashboard — as the solo engineer on the project.
Architecture: pipeline + dashboard
The backend is split into four independently runnable services orchestrated by a thin operator layer. Each service owns one responsibility and never reaches across boundaries.
A separate policy/ directory holds the rules in bilingual KO/EN docs. Policy is authoritative: the code implements the spec, not the other way around. This keeps the system audit-friendly and lets non-engineers evolve the rules without touching Python.
Backend — URL discovery (CLT)
The collector reads from per-country source sets that I maintain separately from the code. Each source has its own discovery strategy — RSS, sitemap, listing pages — and its own polling cadence. Failures are first-class: a source that throws on Tuesday is expected to recover on Wednesday, and the scheduler treats the failure log as signal rather than an exception.
The CLT service writes only discovered URLs and metadata. It never fetches article bodies and never makes editorial decisions — that lives downstream so the collection layer stays cheap and unopinionated.
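The per-source strategy, cadence, and failure handling described above can be sketched roughly as follows. This is an illustration rather than the production collector: the `Source` fields, the `poll` function, and the capped backoff policy are all assumed names and choices, and `discover` stands in for whatever RSS, sitemap, or listing-page fetcher a source uses.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from enum import Enum
from typing import Callable


class Strategy(Enum):
    RSS = "rss"
    SITEMAP = "sitemap"
    LISTING = "listing"


@dataclass
class Source:
    # Hypothetical shape of one entry in a per-country source set.
    name: str
    country: str
    strategy: Strategy
    poll_every: timedelta
    next_run: datetime
    consecutive_failures: int = 0
    failure_log: list[str] = field(default_factory=list)


def poll(source: Source, discover: Callable[[Source], list[str]], now: datetime) -> list[str]:
    """Run one discovery attempt; a failure is logged and rescheduled, never raised."""
    try:
        urls = discover(source)
    except Exception as exc:  # broad on purpose: any source error is routine
        source.consecutive_failures += 1
        source.failure_log.append(f"{now.isoformat()} {type(exc).__name__}: {exc}")
        # Assumed policy: exponential backoff capped at 8x the normal cadence.
        factor = min(2 ** source.consecutive_failures, 8)
        source.next_run = now + source.poll_every * factor
        return []
    # Success resets the failure streak and restores the normal cadence.
    source.consecutive_failures = 0
    source.next_run = now + source.poll_every
    return urls
```

The function returns only URLs, which matches the collector's contract of never fetching bodies. The failure log stays attached to the source, so it can be reviewed when judging source-set quality.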
Backend — content normalization (SCR)
The scraper picks up new URLs from the queue, fetches the page, and runs it through a two-stage parser: trafilatura for the primary extraction and newspaper3k as a fallback for pages where trafilatura under-performs. Language is detected with fasttext so downstream tagging knows which model to apply.
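A minimal sketch of the two-stage fallback, assuming the parsers are passed in as plain callables so the example runs without either library installed. In practice `primary` could wrap `trafilatura.extract` and `fallback` a newspaper3k wrapper. The `min_chars` threshold for "under-performs" is an assumption, not the pipeline's actual rule.

```python
from typing import Callable, Optional

# An extractor takes raw HTML and returns article text, or None on failure.
Extractor = Callable[[str], Optional[str]]


def extract_with_fallback(
    html: str,
    primary: Extractor,
    fallback: Extractor,
    min_chars: int = 200,  # assumed cutoff for a usable extraction
) -> tuple[Optional[str], str]:
    """Return (text, parser_name); text is None if both parsers under-perform."""
    for name, extractor in (("primary", primary), ("fallback", fallback)):
        try:
            text = extractor(html)
        except Exception:
            text = None  # a parser crash counts as under-performing
        if text and len(text.strip()) >= min_chars:
            return text.strip(), name
    return None, "none"
```

Returning the parser name alongside the text makes it possible to record which stage produced each article, which helps when comparing parser quality per source.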
Each fetch goes through a filter chain — paywalls, JS-only pages, duplicate canonicals — before normalization. Rejected URLs are kept with a reason code, which was essential for debugging source-set quality.
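One way to structure such a filter chain is a list of small functions that each return a reason code or `None`. The sketch below is illustrative only: the `Page` fields, the reason codes, the paywall markers, and the 50-character heuristic are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Page:
    url: str
    canonical_url: str
    html: str
    visible_text: str


# A filter returns a reason code to reject the page, or None to let it pass.
Filter = Callable[[Page], Optional[str]]


def paywall_filter(page: Page) -> Optional[str]:
    markers = ("subscribe to continue", "subscribers only")  # hypothetical markers
    lowered = page.html.lower()
    return "PAYWALL" if any(m in lowered for m in markers) else None


def js_only_filter(page: Page) -> Optional[str]:
    # Heuristic: a scripted page with almost no server-rendered text.
    if "<script" in page.html.lower() and len(page.visible_text.strip()) < 50:
        return "JS_ONLY"
    return None


def make_duplicate_filter(seen: set[str]) -> Filter:
    # `seen` holds canonical URLs already accepted; the caller updates it.
    def duplicate_filter(page: Page) -> Optional[str]:
        return "DUPLICATE_CANONICAL" if page.canonical_url in seen else None
    return duplicate_filter


def run_filters(page: Page, filters: list[Filter]) -> Optional[str]:
    """Return the first rejection reason, or None if every filter passes."""
    for check in filters:
        reason = check(page)
        if reason is not None:
            return reason
    return None
```

Storing the returned reason code next to the rejected URL gives the per-source rejection breakdown that makes source-set debugging practical.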
Backend — AI tagging (TAG)
The tagger applies three independent passes per article: an AI relevance score, a zero-shot topic classification, and a separate Korean-language topic detector built on anchor phrases. Each pass writes to its own column so we can run shadow versions side-by-side without breaking consumers.
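The column-per-pass idea can be illustrated with a small runner in which each pass writes under its own key and a failing shadow pass cannot affect the others. Everything here is a hypothetical sketch: the column names, the anchor phrases, and the substring-counting detector are stand-ins, and the real Korean detector may use anchor phrases quite differently.

```python
from typing import Callable, Optional

Pass = Callable[[str], object]

# Hypothetical anchor phrases per Korean topic (illustrative, not the real lists).
KO_ANCHORS: dict[str, tuple[str, ...]] = {
    "generative_ai": ("생성형 AI", "거대언어모델"),
    "ai_semiconductors": ("AI 반도체", "HBM"),
}


def ko_topic(text: str) -> Optional[str]:
    """Pick the topic whose anchor phrases occur most often, or None if none occur."""
    hits = {
        topic: sum(text.count(phrase) for phrase in phrases)
        for topic, phrases in KO_ANCHORS.items()
    }
    best = max(hits, key=hits.get)
    return best if hits[best] > 0 else None


def run_passes(text: str, passes: dict[str, Pass]) -> dict[str, object]:
    """Run each pass independently and store its result under its own column."""
    row: dict[str, object] = {}
    for column, tagger in passes.items():
        try:
            row[column] = tagger(text)
        except Exception:
            row[column] = None  # one failing pass must not block the others
    return row
```

Adding a shadow version then means registering one more entry, for example a `topic_ko_shadow` column, while consumers keep reading the existing columns unchanged.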
Models live behind a thin abstraction. The current backend is sentence-transformers, but the interface is small enough that swapping to an LLM API later is a contained change. This matters because the team explicitly does not want to be locked to one vendor's model.
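A thin abstraction of this kind can be expressed as a single-method interface. The sketch below uses `typing.Protocol`; the names `TopicScorer`, `KeywordScorer`, and `classify`, along with the 0.5 threshold, are assumptions for illustration. The toy keyword backend exists only so the example runs without model weights.

```python
from typing import Optional, Protocol, Sequence


class TopicScorer(Protocol):
    """Anything that can score a text against candidate labels."""

    def score(self, text: str, labels: Sequence[str]) -> dict[str, float]: ...


class KeywordScorer:
    """Toy stand-in backend: scores labels by normalized keyword frequency."""

    def score(self, text: str, labels: Sequence[str]) -> dict[str, float]:
        lowered = text.lower()
        raw = {label: float(lowered.count(label.lower())) for label in labels}
        total = sum(raw.values())
        return {label: (value / total if total else 0.0) for label, value in raw.items()}


def classify(
    text: str,
    labels: Sequence[str],
    scorer: TopicScorer,
    threshold: float = 0.5,  # assumed cutoff
) -> Optional[str]:
    """Return the best label if its score clears the threshold, else None."""
    scores = scorer.score(text, labels)
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None
```

Because `classify` depends only on the `score` method, an embedding-based backend or an LLM API client can be substituted by writing one new class, with no change to the calling code.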
Dashboard — map view
The dashboard is a React 19 + Vite SPA that consumes the FastAPI surface and visualizes the pipeline's output. The map view uses react-leaflet to plot tracked sources, sized by article volume in the selected window. Clicking a marker filters the news panel to that source. The marker layer is memoized so re-renders during filtering only touch the news panel, not Leaflet.
Dashboard — trends view
A Recharts area chart shows article volume by AI topic over time. Korean-language topics surface as a separate stacked breakdown because the pipeline classifies them with a dedicated model — the UI follows the data, not the other way around. Hover gives the exact count per topic per day.
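A stacked area chart of this kind typically consumes one row per day with one numeric key per topic. The sketch below shows how such rows could be produced on the Python side, assuming the aggregation happens in the API rather than in the browser. The function name and input shape are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date


def daily_topic_counts(articles: list[tuple[date, str]]) -> list[dict[str, object]]:
    """Pivot (published_date, topic) pairs into one row per day, sorted by date."""
    per_day: dict[date, Counter] = defaultdict(Counter)
    for published, topic in articles:
        per_day[published][topic] += 1

    topics = sorted({topic for _, topic in articles})
    rows: list[dict[str, object]] = []
    for day in sorted(per_day):
        row: dict[str, object] = {"date": day.isoformat()}
        for topic in topics:
            # Counter returns 0 for topics with no articles that day,
            # so every row carries the same set of keys.
            row[topic] = per_day[day][topic]
        rows.append(row)
    return rows
```

Filling missing topics with zero keeps the stacked areas continuous and lets the hover tooltip report an exact count for every topic on every day.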
Dashboard — news panel
The news panel is a virtualized list of articles filtered by whatever is selected on the map and trends views. Each row shows the source, the detected language, and the topic tag. Clicking opens the original URL in a new tab — the dashboard never mirrors article content, by policy. A top-level ErrorBoundary plus a centralized API client keeps backend hiccups from breaking the UI.
Stack & operations
Backend: Python 3.11, FastAPI, PostgreSQL with additive-only migrations, torch / transformers / sentence-transformers for the AI passes, fasttext for language detection, trafilatura + newspaper3k for parsing. Two Docker images (API + scheduler) plus a docker-compose dev stack.
Frontend: React 19, Vite 7, react-leaflet, Recharts, served from a multi-stage nginx Docker image. No state library: React built-ins plus a small fetch helper were enough for a read-heavy dashboard that keeps no state between views.
What I learned
The biggest win wasn't a clever model — it was treating failures and policy changes as first-class workflows. Once "this source broke" and "we now classify Korean articles differently" stopped being incidents and became routine operations, the system started compounding instead of degrading. On the frontend side, the dashboard launched fast because I skipped TypeScript and a state library — both correct calls at the time, but TS is the next thing I'd add given how much the API surface has grown.