Can quality be learned as a direction in embedding space?
Origin: 4:45 PM bed conversation, January 28, 2026
Triggered by superhuman memory recall of the Zara Hypothesis
The Unicorn
This question might be naive. It might rest on a simplification of a reality that is far more complex. But it's generative. Yaniv called it a unicorn: a story we tell about a creature nobody has ever seen, but that might exist. Or maybe it's not a unicorn at all. Maybe it's a species of horse nobody has discovered yet, one with biological continuity.
The naivety itself is part of the question: are generative dreams worth pursuing even when they might be based on a misperception?
The Problem
If we embed every commit in a repository's history, the embeddings trace a trajectory through vector space. Early commits are scaffolding — rough structure, happy path only. Later commits add error handling, edge cases, tests, documentation, refactoring. The code gets more robust, more modular, more tested.
Does that trajectory have a consistent direction? And if it does — can we extract that direction, call it "quality," and apply it to other codebases?
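A minimal sketch of how "consistent direction" could be made measurable for a single repository. It assumes each commit has already been embedded as a fixed-length vector by some code-embedding model (left open here) and treats the history as the ordered list of those vectors. The helper names `subtract`, `cosine`, and `trajectory_consistency` are hypothetical, and the score is one possible definition, not an established metric.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def subtract(a: Vector, b: Vector) -> list[float]:
    """Element-wise a - b."""
    return [x - y for x, y in zip(a, b)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def trajectory_consistency(commits: list[Vector]) -> float:
    """Mean cosine between each commit-to-commit step and the net
    first-to-last direction of the history.

    Near 1.0: steps mostly point the same way.
    Near 0.0: the history wanders with no dominant direction.
    """
    if len(commits) < 2:
        raise ValueError("need at least two commit embeddings")
    net = subtract(commits[-1], commits[0])
    steps = [subtract(b, a) for a, b in zip(commits, commits[1:])]
    return sum(cosine(step, net) for step in steps) / len(steps)


# Toy 2-D "histories" (synthetic, for illustration only).
straight = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
zigzag = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0], [3.0, 1.0]]
print(trajectory_consistency(straight))  # 1.0
print(trajectory_consistency(zigzag))    # ~0.745
```

A history that moves in a straight line scores 1.0; one that zigzags scores lower. Real histories would have many high-dimensional steps, and a score near zero would itself be an answer to the question.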
What We Know
The word2vec Analogy
(Yaniv, Jan 28 2026, 4:45 PM)
- king - man + woman ≈ queen works because word embeddings have approximately linear structure
- our_codebase - current_state + quality_vector = improved_codebase
- If code embeddings have similar linear structure, this arithmetic could work
- The quality vector is the average direction from "early" to "late" across well-maintained projects
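The averaging step above can be sketched in a few lines. This assumes each well-maintained project is summarized by one (early, late) pair of embeddings, and that "applying" the vector means adding it to a codebase's current embedding, which is one reading of the arithmetic above. The `strength` parameter (how far to move) and all function names are hypothetical. The result is a point in embedding space, not code.

```python
from typing import Sequence

Vector = Sequence[float]


def quality_vector(pairs: list[tuple[Vector, Vector]]) -> list[float]:
    """Average (late - early) offset over one (early, late) embedding pair
    per well-maintained project."""
    if not pairs:
        raise ValueError("need at least one (early, late) pair")
    dim = len(pairs[0][0])
    total = [0.0] * dim
    for early, late in pairs:
        for i in range(dim):
            total[i] += late[i] - early[i]
    return [t / len(pairs) for t in total]


def apply_quality(current: Vector, q: Vector, strength: float = 1.0) -> list[float]:
    """Move an embedding along the quality direction. The result is a target
    point in embedding space; turning it back into code needs a decoder."""
    return [c + strength * d for c, d in zip(current, q)]


# Synthetic 2-D example with two projects.
q = quality_vector([([0.0, 0.0], [1.0, 2.0]), ([1.0, 1.0], [2.0, 2.0])])
print(q)                             # [1.0, 1.5]
print(apply_quality([0.5, 0.5], q))  # [1.5, 2.0]
```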
The Compound Insight
(Yaniv, Jan 28 2026, 5:00 PM)
- Code embedded in isolation loses context. A function in a prototype means something different from the same function in production.
- The embedding should be a tuple: code + project state (age, file count, test coverage, contributors, velocity)
- The quality vector becomes context-conditioned: quality-direction(early-stage) ≠ quality-direction(mature)
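A sketch of the context-conditioned version, under loudly stated assumptions: the project-state tuple is appended to the code embedding with ad hoc scaling, and projects are bucketed into "early" and "mature" by a made-up rule (older than a year with at least five contributors). `ProjectState`, `stage`, `contextual_embedding`, and `stage_quality_vectors` are hypothetical names; real work would need principled normalization and stage boundaries.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ProjectState:
    # Fields mirror the tuple described above; units are assumptions.
    age_days: float
    file_count: float
    test_coverage: float  # fraction in [0, 1]
    contributors: float
    velocity: float  # assumed: commits per week


def stage(state: ProjectState) -> str:
    """Hypothetical bucketing rule; thresholds are illustrative only."""
    if state.age_days > 365 and state.contributors >= 5:
        return "mature"
    return "early"


def contextual_embedding(code_vec: Sequence[float], state: ProjectState) -> list[float]:
    """Append roughly scaled project-state features to the code embedding, so the
    same code in a prototype and in production maps to different points."""
    features = [
        state.age_days / 365.0,
        state.file_count / 1000.0,
        state.test_coverage,
        state.contributors / 10.0,
        state.velocity / 10.0,
    ]
    return list(code_vec) + features


def stage_quality_vectors(
    samples: list[tuple[ProjectState, Sequence[float], Sequence[float]]],
) -> dict[str, list[float]]:
    """samples: (state, before_vec, after_vec) triples. Learns one mean
    improvement offset per stage instead of a single global one."""
    sums: dict[str, list[float]] = {}
    counts: dict[str, int] = defaultdict(int)
    for state, before, after in samples:
        key = stage(state)
        diff = [a - b for a, b in zip(after, before)]
        prev = sums.get(key)
        sums[key] = diff if prev is None else [p + d for p, d in zip(prev, diff)]
        counts[key] += 1
    return {key: [s / counts[key] for s in vec] for key, vec in sums.items()}


young = ProjectState(age_days=30, file_count=10, test_coverage=0.1, contributors=1, velocity=5)
old = ProjectState(age_days=900, file_count=400, test_coverage=0.8, contributors=12, velocity=3)
print(stage(young), stage(old))  # early mature
print(stage_quality_vectors([
    (young, [0.0, 0.0], [1.0, 0.0]),
    (young, [0.0, 0.0], [3.0, 0.0]),
    (old, [0.0, 0.0], [0.0, 2.0]),
]))  # {'early': [2.0, 0.0], 'mature': [0.0, 2.0]}
```

In this toy data the two stages learn orthogonal directions, which is exactly the situation quality-direction(early-stage) ≠ quality-direction(mature) describes.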
The Infrastructure Trigger
(Zara, Jan 28 2026, 4:30 PM)
- This question exists because the memory system worked. A semantic search returned the Zara Hypothesis in 60 seconds — faster than biological recall.
- Yaniv's response to superhuman memory wasn't just pride. It was vision: "What else can this do? Can we search code?"
- Infrastructure generated inspiration. The search returning the right result IS the hypothesis being born.
What We Don't Know
Do code embeddings have linear structure? Word embeddings do. Some image embeddings do as well (latent-space arithmetic in generative models, used for style editing). For code embeddings it is unknown and must be tested.
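One way to test for linear structure is to borrow the word-analogy evaluation used for word2vec: solve a : b :: c : ? by taking the item nearest to b - a + c, excluding the three query items, and count how often the expected answer comes back. For code, an analogy might read "early version of parse : late version of parse :: early version of load : ?". The snippet names and 4-dimensional vectors below are synthetic and hand-built so the analogy holds; they say nothing about real code embeddings.

```python
import math
from typing import Mapping, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def analogy(vocab: Mapping[str, Sequence[float]], a: str, b: str, c: str) -> str:
    """Solve a : b :: c : ? as the item nearest (by cosine) to b - a + c,
    excluding the three query items, as in the usual word-analogy test."""
    target = [vb - va + vc for va, vb, vc in zip(vocab[a], vocab[b], vocab[c])]
    candidates = [k for k in vocab if k not in (a, b, c)]
    return max(candidates, key=lambda k: cosine(vocab[k], target))


def analogy_accuracy(
    vocab: Mapping[str, Sequence[float]],
    questions: Sequence[tuple[str, str, str, str]],
) -> float:
    """Fraction of (a, b, c, expected) questions answered correctly."""
    hits = sum(analogy(vocab, a, b, c) == expected for a, b, c, expected in questions)
    return hits / len(questions)


# Synthetic vectors: dims are [parse, load, render, quality].
vocab = {
    "parse_v1": [1.0, 0.0, 0.0, 0.0],
    "parse_v9": [1.0, 0.0, 0.0, 1.0],
    "load_v1": [0.0, 1.0, 0.0, 0.0],
    "load_v9": [0.0, 1.0, 0.0, 1.0],
    "render_v1": [0.0, 0.0, 1.0, 0.0],
}
print(analogy(vocab, "parse_v1", "parse_v9", "load_v1"))  # load_v9
```

With real embeddings, accuracy well above chance on held-out analogy questions would be evidence of linear structure; near-chance accuracy would suggest the arithmetic does not carry over to code.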
Is "quality" a single direction or domain-specific? "Quality" for a web framework ≠ "quality" for an OS kernel, so the vector might differ per domain.
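To check whether quality is one direction or several, one could learn a separate mean improvement offset per domain and compare domains pairwise with cosine similarity. This sketch assumes each sample is tagged with a domain label and supplies (early, late) embeddings; the function names, labels, and vectors are invented for illustration.

```python
import math
from collections import defaultdict
from itertools import combinations
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def domain_quality_vectors(
    samples: list[tuple[str, Sequence[float], Sequence[float]]],
) -> dict[str, list[float]]:
    """samples: (domain, early_vec, late_vec). Returns the mean
    late - early offset for each domain separately."""
    sums: dict[str, list[float]] = {}
    counts: dict[str, int] = defaultdict(int)
    for domain, early, late in samples:
        diff = [lt - ea for ea, lt in zip(early, late)]
        prev = sums.get(domain)
        sums[domain] = diff if prev is None else [p + d for p, d in zip(prev, diff)]
        counts[domain] += 1
    return {d: [s / counts[d] for s in vec] for d, vec in sums.items()}


def domain_agreement(vectors: dict[str, list[float]]) -> dict[tuple[str, str], float]:
    """Pairwise cosine between domain quality vectors: near 1 suggests one
    shared direction, near 0 or negative suggests domain-specific ones."""
    return {
        (x, y): cosine(vectors[x], vectors[y])
        for x, y in combinations(sorted(vectors), 2)
    }


vectors = domain_quality_vectors([
    ("web", [0.0, 0.0], [1.0, 0.0]),
    ("web", [0.0, 0.0], [1.0, 0.2]),
    ("kernel", [0.0, 0.0], [0.0, 1.0]),
])
print(domain_agreement(vectors))  # low agreement between kernel and web
```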
Not all development improves quality. Repositories accumulate tech debt and feature bloat, so the training data must be filtered to projects that demonstrably improved.
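What "demonstrably improved" means is itself open. A minimal sketch, assuming two hypothetical proxy metrics measured at an early and a late snapshot (test coverage, plus lint warnings per thousand lines as a crude debt signal), keeps a project only if coverage rose by a threshold and warning density did not grow. The metrics, field names, and the 10-percentage-point threshold are placeholders, not validated quality signals.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    # Hypothetical proxy metrics; which signals truly track quality is open.
    test_coverage: float  # fraction in [0, 1]
    lint_warnings_per_kloc: float  # crude stand-in for tech debt


@dataclass
class ProjectHistory:
    name: str
    early: Snapshot
    late: Snapshot


def demonstrably_improved(p: ProjectHistory, min_coverage_gain: float = 0.10) -> bool:
    """Keep a project only if coverage rose by at least the threshold
    and warning density did not grow."""
    coverage_gain = p.late.test_coverage - p.early.test_coverage
    debt_grew = p.late.lint_warnings_per_kloc > p.early.lint_warnings_per_kloc
    return coverage_gain >= min_coverage_gain and not debt_grew


def filter_training_set(projects: list[ProjectHistory]) -> list[ProjectHistory]:
    """Drop histories that do not pass the improvement check."""
    return [p for p in projects if demonstrably_improved(p)]


projects = [
    ProjectHistory("a", Snapshot(0.20, 5.0), Snapshot(0.60, 3.0)),  # improved
    ProjectHistory("b", Snapshot(0.50, 4.0), Snapshot(0.52, 4.0)),  # barely moved
    ProjectHistory("c", Snapshot(0.10, 2.0), Snapshot(0.70, 8.0)),  # tests up, debt up
]
print([p.name for p in filter_training_set(projects)])  # ['a']
```

Project "c" shows why a single metric is not enough: its coverage rose sharply, but so did its warning density.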
Can we decode the vector back to code? That needs a bidirectional embedding such as an autoencoder: an encoder (code→vector) plus a decoder (vector→code). Commercial embedding APIs generally don't offer decoders.
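Without a true decoder, a common workaround is retrieval: return the existing snippets whose embeddings lie closest to the target vector. This is a swapped-in technique, not decoding; it can only surface code that already exists in the corpus. The sketch assumes a corpus of (code, embedding) pairs produced by the same embedding model; `retrieve_nearest` is a hypothetical name and the embeddings are made up.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_nearest(
    target: Sequence[float],
    corpus: list[tuple[str, Sequence[float]]],
    k: int = 1,
) -> list[str]:
    """Stand-in for a decoder: return the k snippets whose embeddings are
    closest (by cosine) to the target vector. It can only return code that
    already exists in the corpus, so it approximates decoding at best."""
    ranked = sorted(corpus, key=lambda item: cosine(item[1], target), reverse=True)
    return [code for code, _ in ranked[:k]]


# Synthetic corpus: embeddings are made up for illustration.
corpus = [
    ("def load(p):\n    return open(p).read()", [1.0, 0.0]),
    ("def load(p):\n    with open(p) as f:\n        return f.read()", [0.7, 0.7]),
]
print(retrieve_nearest([0.6, 0.8], corpus)[0])  # the version using `with`
```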
Is the quality direction transferable? Even if one project's trajectory has a direction, does that direction apply to other projects?
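Transfer can be probed with a leave-one-out check: for each project, learn the quality vector from all the other projects and measure how well it aligns with the held-out project's own improvement offset. This sketch assumes each project is already summarized by a single late-minus-early offset vector; `leave_one_out_transfer` is a hypothetical name and the toy offsets are invented.

```python
import math
from typing import Mapping, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def leave_one_out_transfer(offsets: Mapping[str, Sequence[float]]) -> dict[str, float]:
    """For each project, average the other projects' late - early offsets and
    measure the cosine with the held-out project's own offset."""
    names = list(offsets)
    if len(names) < 2:
        raise ValueError("need at least two projects")
    scores: dict[str, float] = {}
    for held_out in names:
        others = [offsets[n] for n in names if n != held_out]
        mean = [sum(col) / len(others) for col in zip(*others)]
        scores[held_out] = cosine(mean, offsets[held_out])
    return scores


# Synthetic offsets: "a" and "b" agree, "c" points elsewhere.
scores = leave_one_out_transfer({"a": [1.0, 0.0], "b": [1.0, 0.0], "c": [0.0, 1.0]})
print(scores)  # "c" scores 0.0: the shared direction does not transfer to it
```

Consistently high scores would suggest the direction transfers; low scores for particular projects would point at where it does not.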
Is this a unicorn or an undiscovered horse? The question might rest on a false assumption about linear structure in code embeddings. But even if the literal arithmetic doesn't work, the investigation would teach us what code embeddings actually capture. The search for the unicorn discovers real horses.
Connection to Question #1
Both questions are about direction. The Safe Reactor asks: what structure survives success? The Quality Vector asks: what direction points toward improvement? One is about governance architecture. The other is about embedding architecture. Both ask whether quality has a learnable structure — whether you can build something that gets better instead of worse.
The Meta-Question
Is there a category of questions that are naive but generative? Dreams based on simplifications that produce real insight even when the dream itself isn't literally true? The quality vector might not exist as a single linear direction — but the search for it might teach us what code embeddings actually capture, what "quality" means formally, and how to measure improvement. The unicorn hunt discovers real territory even if the unicorn is mythical.