1. The Problem with Traditional Development

Most product teams operate on a familiar rhythm: brainstorm features, build them, ship them, and then — maybe — check an analytics dashboard a few weeks later. The gap between what users actually do and what gets built next is enormous, and it is almost always filled with assumptions rather than evidence.

Traditional analytics dashboards are passive instruments. They sit there, accumulating data, waiting for someone to log in, write a query, interpret the results, and translate those results into a Jira ticket. In practice, this means the signal from user behavior arrives late, diluted, and filtered through multiple layers of human interpretation.

For a small team building climate intelligence tools, this lag is unacceptable. Every week spent building the wrong feature is a week not spent on the feature that actually helps researchers, analysts, and policymakers access the information they need.

"The best product decisions come from data, not opinions. But most teams treat analytics as an afterthought — a rearview mirror when what you need is a headlight."

We wanted something fundamentally different: a system where user behavior doesn't just get recorded — it gets automatically transformed into development priorities and, in many cases, into actual code changes. It is the Japanese concept of kaizen (改善), continuous improvement, encoded into the infrastructure itself.

2. The Software 2.0 Philosophy

In 2017, Andrej Karpathy introduced the concept of "Software 2.0" to describe a shift from hand-written programs to learned programs — neural networks trained on data that effectively write their own logic. The core insight was profound: instead of humans specifying every rule, let the data define the behavior.

We took this idea and applied it to product development itself. Our interpretation: let user behavior data write the development roadmap. Not metaphorically — literally. The system collects events, analyzes patterns, generates prioritized recommendations, and opens GitHub Issues with specific action items. In some cases, an AI agent then picks up those issues, writes the code, and opens a pull request.

The key insight is closing the loop. Every product team collects data. Very few teams make that data automatically generate actionable development priorities. Even fewer close the loop all the way to code.

The three stages of maturity:

  • Stage 1, Collect Data: most teams stop here.
  • Stage 2, Auto-Prioritize: data generates development priorities. Few teams reach this.
  • Stage 3, Auto-Fix Code: close the loop entirely. This is Software 2.0.

Traditional development: Code → Ship → Analytics (maybe).
Software 2.0 development: Code → Ship → Auto-Collect → Auto-Analyze → Auto-Prioritize → Auto-Fix → back to Code, in a continuous loop.
The difference is not just speed — it is the elimination of manual interpretation as a bottleneck.

3. Architecture: The Six Components

The feedback loop consists of six discrete components, each designed to operate independently while forming a continuous chain. Here is how each works, from both the business and the technical perspective.

The Software 2.0 cycle, step by step:

  1. User action: Climate Library UI interactions.
  2. gi-track.js: rich event tracking (sessions, features, funnels, errors).
  3. /track-event: a Worker ingests structured events into Cloudflare KV.
  4. /insights: auto-generates development priorities from user behavior.
  5. weekly-insights.yml: every Monday, a GitHub Issue with priorities.
  6. sw2-actionable: a human labels the Issue for auto-resolution.
  7. resolve-insight.yml: a Claude agent analyzes, fixes, and creates a PR.
  8. Improved UX: a human reviews and merges, and the loop restarts at step 1.

3.1 Event Tracking — gi-track.js

Business value: Captures every meaningful user interaction — feature usage, funnel progression, errors, session lifecycle — without impacting page performance or user experience. Respects Do-Not-Track (DNT) headers by design, because privacy is not optional in climate tech.

Technical implementation: A lightweight client-side script (~3KB) that uses the Beacon API for non-blocking event dispatch. Events are categorized into four types: feature (what users do), funnel (where users are in the journey), error (where users hit friction), and session (engagement lifecycle). Events are batched and sent asynchronously to minimize network overhead. Session IDs are generated locally — no cookies, no fingerprinting.
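A minimal sketch of the batching idea, with hypothetical names (the real gi-track.js API may differ). The dispatch function is injectable so the Beacon API can be swapped out in tests or non-browser environments:

```javascript
// Sketch of a batching event tracker. `send` is injected so that in a
// browser it can wrap the Beacon API, while tests can use a fake sender.
function createTracker(endpoint, send) {
  // Locally generated session ID: no cookies, no fingerprinting.
  const sessionId = `s-${Date.now()}-${Math.random().toString(36).slice(2, 10)}`;
  let queue = [];

  return {
    // type is one of: "feature" | "funnel" | "error" | "session"
    track(type, name, props = {}) {
      queue.push({ type, name, props, sessionId, ts: Date.now() });
    },
    // Flush the whole batch in a single non-blocking dispatch.
    flush() {
      if (queue.length === 0) return false;
      const payload = JSON.stringify(queue);
      queue = [];
      return send(endpoint, payload);
    },
  };
}

// In a browser, `send` would be something like:
//   (url, body) => navigator.sendBeacon(url, body)
```

Batching plus sendBeacon means event dispatch survives page unloads without blocking navigation, which is why the script can stay out of the user's way.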

3.2 Event Ingestion — /track-event

Business value: Processes thousands of events per day at the edge, at near-zero cost. No databases to manage, no servers to scale. Events are available for analysis within seconds of being sent.

Technical implementation: A Cloudflare Worker endpoint that validates incoming events against a structured schema, then writes them to Cloudflare KV using daily aggregation keys (evt:{date}). Each day's events are stored as a single KV entry that gets incrementally updated, keeping read costs minimal. All data has a 90-day TTL — we care about trends, not archaeological data.
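The daily-aggregation write path can be sketched as two pure functions (names and value shape are illustrative; the production Worker's schema may differ):

```javascript
// All events for one UTC day share a single KV entry under evt:{date}.
function dayKey(ts) {
  return `evt:${new Date(ts).toISOString().slice(0, 10)}`; // e.g. "evt:2024-01-15"
}

// Incrementally merge a new batch into the day's existing aggregate
// (existing is null on the first write of the day).
function mergeDay(existing, incoming) {
  const day = existing || { count: 0, events: [] };
  return { count: day.count + incoming.length, events: day.events.concat(incoming) };
}

// Inside the Worker handler this would look roughly like:
//   const key = dayKey(Date.now());
//   const day = await env.EVENTS.get(key, "json");
//   await env.EVENTS.put(key, JSON.stringify(mergeDay(day, batch)),
//                        { expirationTtl: 90 * 24 * 3600 }); // 90-day TTL
```

One key per day keeps KV reads cheap: a week of analysis is seven `get` calls, not thousands.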

3.3 Auto-Analysis — /insights

Business value: This is the brain of the system. It automatically identifies what to build next based on actual user behavior — not guesses, not stakeholder opinions, not competitor feature lists. It answers the question every product team struggles with: "What should we work on this week?"

Technical implementation: The /insights endpoint reads aggregated event data and runs five types of analysis to generate prioritized recommendations.

Five Priority Types
  • funnel_dropoff — Users are abandoning the product at a specific step. Highest urgency: this is lost value.
  • friction — Repeated errors or failed operations. Users are trying to do something and failing.
  • adoption_gap — Features that exist but almost nobody uses. We built it and they didn't come; the question is why.
  • model_preference — Which LLM models users actually choose when given options. Guides infrastructure investment.
  • engagement — Low session duration or event count. The product isn't sticky enough.
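The mapping from metrics to priority types can be sketched as a simple classifier. The thresholds and metric names below are invented for illustration; the real /insights endpoint tunes its cutoffs against production data:

```javascript
// Map aggregated metrics to priority types. Thresholds are illustrative.
// model_preference is frequency-based rather than threshold-based, so it
// is omitted from this sketch.
function classify(metrics) {
  const priorities = [];
  if (metrics.dropoffRate > 0.5) {
    priorities.push({ type: "funnel_dropoff", severity: "high" });
  }
  if (metrics.errorCount >= 10) {
    priorities.push({ type: "friction", severity: "high" });
  }
  if (metrics.featureUsageRate < 0.05) {
    priorities.push({ type: "adoption_gap", severity: "medium" });
  }
  if (metrics.avgSessionEvents < 3) {
    priorities.push({ type: "engagement", severity: "medium" });
  }
  return priorities;
}
```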

3.4 Analytics Dashboard — Visual Intelligence

Business value: The team can see user behavior in real time without writing SQL, configuring Mixpanel, or waiting for a data analyst. Funnel visualization, event breakdowns, and trend lines are all live and always current.

Technical implementation: A React single-page application using Recharts for visualization, pulling data directly from the Worker API. No build step required for updates — the dashboard reads the same KV data that powers the automated analysis. This means the dashboard and the auto-analysis engine are always looking at the same truth.

3.5 Scheduled Reports — GitHub Actions

Business value: Development priorities arrive as GitHub Issues every Monday morning. No meetings required to decide what to work on — the data has already decided. The team reviews, triages, and labels. This is where human judgment enters the loop.

Technical implementation: A cron-triggered GitHub Actions workflow runs weekly, calls the /insights endpoint, parses the JSON response, and creates a formatted GitHub Issue with priority type, severity, affected metrics, and suggested actions. Issues are tagged with sw2-actionable for those that can be addressed by the automated agent.
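A hypothetical shape for weekly-insights.yml; the step names, secret names, and issue-creation command are illustrative, not the production workflow:

```yaml
# Illustrative sketch of a Monday-morning insights workflow.
name: Weekly Insights
on:
  schedule:
    - cron: "0 7 * * 1"   # every Monday, 07:00 UTC
jobs:
  report:
    runs-on: ubuntu-latest
    permissions:
      issues: write
    steps:
      - name: Fetch insights
        run: curl -fsS "$INSIGHTS_URL" > insights.json
        env:
          INSIGHTS_URL: ${{ secrets.INSIGHTS_URL }}   # hypothetical secret
      - name: Open GitHub Issue
        run: |
          gh issue create \
            --title "Weekly Software 2.0 priorities" \
            --body-file insights.json \
            --label "sw2-insights"
        env:
          GH_TOKEN: ${{ github.token }}
```

In practice the JSON would be formatted into readable markdown before becoming the Issue body; the sketch keeps that step out for brevity.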

3.6 Auto-Resolution Agent — Closing the Loop

Business value: An AI agent reads the Issue, analyzes the relevant codebase, proposes a fix, and creates a pull request — often within minutes of the Issue being labeled. This is not replacing developers; it is eliminating the mechanical translation from "we know what's wrong" to "here's a first draft of the fix."

Technical implementation: Powered by Claude Code via claude-code-action, triggered when an Issue receives the sw2-actionable label. The agent reads the Issue body, searches the codebase for relevant files, generates changes, creates a feature branch, commits, and opens a PR with a description linking back to the originating Issue and the data that motivated it.
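The label trigger can be sketched as follows. The `on: issues` event and the `if:` guard are standard GitHub Actions syntax; the claude-code-action inputs and version shown are illustrative, not its documented interface:

```yaml
# Illustrative trigger for resolve-insight.yml: run only when the
# sw2-actionable label is applied to an Issue.
name: Resolve Insight
on:
  issues:
    types: [labeled]
jobs:
  resolve:
    if: github.event.label.name == 'sw2-actionable'
    runs-on: ubuntu-latest
    permissions:
      contents: write
      pull-requests: write
      issues: read
    steps:
      - uses: actions/checkout@v4
      - uses: anthropics/claude-code-action@v1   # version illustrative
        with:
          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
```

Gating on the label (rather than on Issue creation) is what makes the human triage step in Section 5 a hard requirement instead of a convention.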

4. The Conversion Funnel: What We Actually Measure

At the heart of the system is a six-step conversion funnel that maps the complete user journey through Climate Library:

Climate Library Funnel
visit → load_sources → select_sources → generate → studio_use → publish

Each step represents a meaningful commitment. Visit means someone found us. Load sources means they engaged with the knowledge pipeline. Select sources means they made a deliberate choice about what information to use. Generate means they asked an LLM to synthesize climate intelligence. Studio use means they edited and refined the output. Publish means they produced something they considered worth sharing.

The system automatically calculates drop-off rates between each pair of consecutive steps. When a drop-off exceeds a threshold, it becomes a funnel_dropoff priority — the highest-urgency signal in the system, because it represents users who wanted to get value but couldn't.
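The drop-off calculation itself is small. A sketch, with an assumed 50% threshold and illustrative function names:

```javascript
// Drop-off between each pair of consecutive funnel steps; any rate above
// `threshold` becomes a funnel_dropoff priority.
const FUNNEL = ["visit", "load_sources", "select_sources", "generate", "studio_use", "publish"];

function dropoffs(counts, threshold = 0.5) {
  const flagged = [];
  for (let i = 0; i < FUNNEL.length - 1; i++) {
    const from = counts[FUNNEL[i]] || 0;
    const to = counts[FUNNEL[i + 1]] || 0;
    if (from === 0) continue; // no traffic reached this step
    const rate = 1 - to / from;
    if (rate > threshold) {
      flagged.push({ from: FUNNEL[i], to: FUNNEL[i + 1], rate: Number(rate.toFixed(2)) });
    }
  }
  return flagged;
}
```

For example, 50 users reaching studio_use but only 19 reaching publish gives a 0.62 drop-off, which would be flagged.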

Example: Detected Drop-off
62% drop-off between studio and publish
Users are creating content but not publishing it. This suggests the publish flow has too much friction, the preview doesn't match expectations, or users lack confidence in the output quality. Auto-generated Issue: "Reduce studio-to-publish friction — investigate publish UX barriers."

This is the difference between "we think the publish flow might need work" and "62% of users who invest time in the studio abandon the product before publishing, and here's a ticket to fix it." The data doesn't just inform — it acts.

5. Human-in-the-Loop: Ningensei (人間性) by Design

It would be tempting to make this cycle fully autonomous — and technically, we could. But we deliberately chose not to. The Japanese concept of ningensei — the essential quality of humanness — applies directly to product development: some decisions require judgment that no algorithm should make alone.

There are two intentional human checkpoints in the loop:

  • Label triage: When weekly insights arrive as GitHub Issues, a human reviews them before applying the sw2-actionable label. This is where product judgment lives. Not every data signal should be acted on — sometimes a drop-off is acceptable, sometimes a low-adoption feature is intentionally niche, sometimes the right response is "wait and observe."
  • Pull request review: Every PR generated by the agent goes through standard code review. The agent proposes; a human approves, requests changes, or rejects. Code quality, architectural consistency, and edge cases still benefit enormously from human eyes.

This is the difference between "AI replacing developers" and "AI augmenting developers." The loop handles the mechanical work — collecting data, identifying patterns, translating priorities into code — while humans handle the judgment work: deciding what matters, evaluating trade-offs, maintaining quality standards.

"The agent proposes, the human disposes. Automation handles the 'what' and the 'how.' Humans decide the 'whether' and the 'why.'"

6. Results and What Comes Next

The Software 2.0 cycle is live on Climate Library at greenimpacts.org. Here is what we have observed in the early stages:

  • Funnel visibility: For the first time, we can see exactly where users succeed and where they struggle — not in aggregate dashboards reviewed monthly, but in automated weekly reports that arrive ready to act on.
  • Friction detection: Error patterns that previously went unnoticed for weeks are now surfaced within days, with specific context about what the user was trying to do when the error occurred.
  • Model preference insights: We now have clear data on which LLM models users prefer for different tasks, guiding our infrastructure investment and API cost optimization.
  • Development velocity: The time from "user encounters a problem" to "PR with a fix is open" has compressed from weeks to days for many categories of issues.

What comes next:

  • A/B testing integration: Using the same event pipeline to measure the impact of changes, not just detect the need for them.
  • Cohort analysis: Understanding how different user segments (researchers vs. policymakers vs. analysts) move through the funnel differently.
  • Predictive priority scoring: Using historical data to predict which issues will have the highest impact before they fully manifest.

There is a meta-insight worth noting: this system improves itself. More users generate better behavioral data. Better data produces more accurate priorities. More accurate priorities lead to a better product. A better product attracts more users. This is the virtuous cycle that Software 2.0 was always meant to enable — not just data writing programs, but data driving the entire product evolution.

The virtuous cycle: more users → better data → better priorities → better product → more users. Each iteration of the loop makes the next iteration more precise; the system's accuracy compounds over time.