The LLM Wiki - Ramshankar Yadhunath

A couple of days ago I came across Andrej Karpathy's LLM Wiki idea. I came across it quite unceremoniously on an Insta page while I was scrolling social media waiting for my kettle to boil. Immediate thoughts - Hey, this is eerily similar to Tiago Forte's second brain! Well, it's not quite the second brain, but it pretty much relates to the same topic - Personal Knowledge Management (PKM).

I have had a long and complex relationship with PKM over the years. In undergrad, my PKM was pretty much a bunch of A4-size sheets and notebooks with diagrams and writing that I seldom revisited. Somewhere along the way, I discovered the idea of journalling and began using my journal as a TODO + PKM system. Not strictly bullet journal-like, but it did serve its purpose. Cut to 2023, I finally got myself a copy of Tiago Forte's Second Brain book and spent that December taking meticulous notes on the process. I found myself so entangled in it that I came up with an "inspired" process for my own PKM and also tried to set up something similar for my team at work - let's call it *Enterprise* Knowledge Management (EKM). Both lost their shine soon enough, though for vastly different reasons. PKM died because every process I added made the system harder to maintain, and "bookkeeping" my knowledge was an unsavoury task. EKM died because, unless there is a dedicated person cracking the whip on anyone who does not follow the right knowledge management processes, enterprises really cannot fight the problem of knowledge sharing within the org.

![[second-brain-book.png]]

Given these past experiences, when I read that Karpathy had an idea on how LLMs can help with knowledge management, I was all ears. In this post, I shall think through the [LLM wiki approach as written by Karpathy here](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f).
As part of this, I am also going to try and apply the approach in a basic fashion on my own Obsidian vault. A couple of important reminders

1. Obsidian works well for me, but I am not a power user. I do not have fancy knowledge graphs, nor is my vault featured among the beautiful Obsidian vault examples online.
2. I reject the need to have a super sophisticated digital note taking system because I am still quite analog in my note taking. When I want to seriously understand something, my first instinct is to open up a notebook and scribble into it.

## The Core Ideas

The most important idea for me personally is how Karpathy views LLM agents like Claude Code, OpenAI Codex etc. He starts with *"This is an idea file, it is designed to be copy pasted to your own LLM Agent (e.g. OpenAI Codex, Claude Code, OpenCode / Pi, or etc.). Its goal is to communicate the high level idea, but your agent will build out the specifics in collaboration with you."*. This means he trusts the agents to be useful enough to help users build out sophisticated PKM tools based off the "idea file". While this is a no-brainer, it also highlights the importance of "knowing what you want" before asking the agent to build it.

In Karpathy's view, the LLM Wiki approach acts as a natural way to solve the maintenance crisis of human-curated knowledge bases. Human-maintained wikis (in tools like Obsidian or Roam) tend to fail because the maintenance burden grows faster than the value they return. At scale, "bookkeeping" defeats humans.

If you have been following AI, you would definitely have come across RAG. RAG is when an LLM retrieves information from external data sources and uses it as the basis for generating an answer to a query. RAG is stateless - each query triggers a fresh search, extraction, and synthesis cycle, and the result then vanishes into chat history. This creates "rediscovery", i.e. the same intellectual work repeats every time a similar query is asked.
The LLM Wiki inverts this - it builds a persistent synthesis layer (the wiki itself) that evolves with each interaction. Rather than querying raw documents, you query your own compiled understanding. Monday's insights inform Thursday's queries because the wiki has accumulated and preserved them.

As per Karpathy's definition, tasks in knowledge management can be broken down into human work vs LLM work. Or simply put - thinking vs admin work.

![[human-work-vs-llm-work.png]]

The "wiki" is treated as a compounding artefact that grows over time. As far as the architecture is concerned, there are 3 components

1. Raw - Immutable sources of info, i.e. articles, YouTube videos, images
2. Wiki - LLM-generated markdown files with summaries, entity pages, cross-references
3. Schema - A configuration document (like `CLAUDE.md`) defining wiki structure and conventions

I put together the following mermaid diagram[^1] to showcase the above components and the 3 operations (Ingest, Query and Lint) as set out by Karpathy.

![[llm-wiki-mermaid.png]]

Some good practices mentioned in the gist

1. The human should ideally stay involved during the Ingest process. While bulk ingest is an option, working through a single piece in isolation helps you analyse the material qualitatively and ensures the LLM is indexing your thoughts and interpretation, not just producing its own summary in a vacuum. Chris Lettieri explicitly drills into this in his piece, [An LLM Wiki won't compound your knowledge. Here's what will](https://bitsofchris.com/p/an-llm-wiki-wont-compound-your-knowledge).
2. When querying, ensure good quality answers are added back into the wiki for the future
3. Run a lint periodically to "health-check" the wiki

Karpathy doesn't go into extensive detail about indexing and logging as standalone concepts, but they emerge naturally from his three operations. During the Ingest phase, the LLM "updates indexes" as it reads sources and writes summary pages.
This indexing ensures the wiki remains queryable and organized, allowing the Query operation to search efficiently through accumulated knowledge. The indexes serve as navigation aids within the wiki, helping connect related pages and maintain the interconnected structure that makes synthesis valuable.

Logging appears implicitly in the schema and the Lint operation. As sources are ingested and pages are updated, maintaining records of what was added and when becomes part of the wiki's governance. The Lint operation then reviews these records to flag stale claims, contradictions, and pages that have drifted from their sources. The schema defines how this bookkeeping happens, ensuring consistent practices across all three operations.

In essence, indexing and logging are mechanical housekeeping tasks that the LLM handles as part of its job to maintain wiki health without requiring human intervention.

## Let's build it!

Alright, now with the core ideas above, I am going to attempt to build this over my Obsidian vault. In all fairness, it's only going to be a PoC.

I have had a phase of looking to "learn & write" about climate and tech and sustainability. In fact, my LinkedIn bio reads "Climate Enthusiast" (yeah, it's time I change that cringy tagline to impress the algorithmic overlords). On this vault, I have been writing up some notes and questions on climate tech and sustainability for months. My Obsidian folder was growing, but without structure. I had research notes scattered everywhere, no coherent way to think through bigger questions, and no clear next step. It felt like bookkeeping again (the exact problem that killed my previous PKM attempts). And the joy got sucked out of it pretty fast. So, maybe the LLM Wiki can revive that!?

I'm also aware this is my first attempt. The system will evolve as I run more synthesis cycles. I expect some iterations to land cleanly, others to need rework.
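
To make the indexing, logging and linting described under The Core Ideas a little more concrete, here is a minimal Python sketch of one possible shape for that bookkeeping. It is an illustration under stated assumptions, not what Karpathy's gist or any particular agent actually does: the `raw/` and `wiki/` folder names, the `index.md` and `log.md` files, the tab-separated log format, and the `ingest`, `lint` and `summarise` names are all hypothetical. `summarise` is a plain callback standing in for the LLM.

```python
import hashlib
from datetime import date
from pathlib import Path
from typing import Callable, Dict, List


def _digest(text: str) -> str:
    """Short content hash used to notice when a source has changed."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


def ingest(vault: Path, source: Path, summarise: Callable[[str], str]) -> Path:
    """Write a wiki page for one raw source, then update the index and the log."""
    wiki = vault / "wiki"  # assumed folder name
    wiki.mkdir(parents=True, exist_ok=True)
    text = source.read_text(encoding="utf-8")

    # The page itself: in a real setup the summary would come from the agent.
    page = wiki / f"{source.stem}.md"
    page.write_text(f"# {source.stem}\n\n{summarise(text)}\n", encoding="utf-8")

    # Index: one wikilink per page, added only if it is not already listed.
    index = wiki / "index.md"  # assumed file name
    entry = f"- [[{source.stem}]]\n"
    existing = index.read_text(encoding="utf-8") if index.exists() else ""
    if entry not in existing:
        index.write_text(existing + entry, encoding="utf-8")

    # Log: append-only record of what was ingested and when.
    with (wiki / "log.md").open("a", encoding="utf-8") as log:  # assumed file name
        log.write(f"{date.today().isoformat()}\tingest\t{source.name}\t{_digest(text)}\n")
    return page


def lint(vault: Path) -> List[str]:
    """Return raw sources that were never ingested or changed since their last ingest."""
    logged: Dict[str, str] = {}
    log = vault / "wiki" / "log.md"
    if log.exists():
        for line in log.read_text(encoding="utf-8").splitlines():
            _day, _action, name, digest = line.split("\t")
            logged[name] = digest  # later entries overwrite earlier ones

    flagged = []
    for source in sorted((vault / "raw").glob("*")):
        if source.is_file() and logged.get(source.name) != _digest(source.read_text(encoding="utf-8")):
            flagged.append(source.name)
    return flagged
```

Calling `ingest(vault, vault / "raw" / "some-note.md", summarise)` for one source at a time mirrors the advice to stay involved during Ingest, and `lint(vault)` can then be run periodically as the health check. A hash comparison only catches the mechanical part of linting; spotting contradictions or stale claims would still need the agent to read the pages.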
Since Karpathy's vision was deliberately high level and since my creativity was non-existent a few days ago, I started with a YouTube tutorial to "inspire me". And I realised something important: the LLM drives a lot of the build. In other words, you don't need to have everything figured out before you start. You figure it out as you go, talking through it with the agent[^2].

<iframe width="700" height="350" src="https://www.youtube.com/embed/zVEb19AwkqM" title="Karpathy's LLM Wiki: Watch Me Build a Knowledge Base From Scratch!" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>

Here's what the workflow looks like in practice, based on the tutorial:

![[llm-wiki-deep-dove-onchain-garage.png]]

Now, I get the process. But I am not really sure I would be able to focus on something if I did not have a list of things I wanted to focus on. In my Obsidian vault, I already had a "Questions" page, which is where I write up questions that I think are worth finding answers to. So, I decided to break away from the framework a little and introduce a new component to my workflow: instead of trying to ingest and organise everything upfront, questions start the process. This is not really at odds with the LLM Wiki approach either; in fact, it enhances it. See the example below. Since the agent can refer to the questions I have on that page, it picks up something open and unanswered from the page itself. So it processes less context, and my learning is actually directed toward questions I genuinely want to answer.

![[Screenshot 2026-04-14 at 09.02.05.png]]

### My workflow

The workflow splits into five clear steps

1. I pick a question or the agent recommends one
2. I research it and collect materials
3. The agent synthesises what I found into a structured breakdown
4. I refine it until I'm happy with it
5. The agent logs it and moves forward

Nothing fancy.

![[climate-llm-wiki-workflow.png]]

The whole workflow is guided by questions. The questions page is where I pop in random ideas I would like to know or write about. Research needs to be human in some way. Get involved. Then the agent finds sources and proposes a breakdown. The key thing here is to stay involved at each step. As Chris Lettieri highlights: [human involvement during research prevents the agent from making vacuum summaries that don't reflect your actual understanding](https://bitsofchris.com/p/an-llm-wiki-wont-compound-your-knowledge). It's a great read, written by an AI enthusiast with just enough skepticism to give him credibility!

I also realised I could ask the agent to find resources for me (see below). This felt like cheating at first, but honestly, it's a practical win if your goal is to gain domain expertise quickly. That said, there's a real tradeoff: you gain speed but lose the serendipity of hunting for sources yourself. Sometimes the papers you stumble on accidentally are the ones that shift your thinking. So it depends on whether you're optimizing for understanding fast or understanding deeply.

![[Screenshot 2026-04-14 at 21.52.14.png]]

In the synthesis step, I ask the LLM to break a question down into sub-components. For the greenhouse gases question, the agent suggested splitting it into these sub-components - what greenhouse gases are, how the physics works, types and potency, sources by sector, current levels, historical context, and the impact. In a way, this adds more coverage to a specific question. Makes the process of learning fun!

![[Screenshot 2026-04-14 at 21.53.03.png]]

What does a **synthesis note** actually look like? It's a markdown file with frontmatter (status, sources, sub-components, dates), followed by coherent sections that walk through the answer.
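
As a rough illustration of that layout, here is a small Python sketch that renders the skeleton of a synthesis note: YAML-style frontmatter followed by one heading per sub-component. The field names (`status`, `created`, `sources`, `sub_components`), the `draft` status value, the `render_note` helper and the placeholder source path are assumptions made for this example; the real note in the vault may be laid out differently. The sub-components are the seven the agent suggested for the greenhouse gases question.

```python
from datetime import date
from typing import List, Optional

# The seven sub-components suggested for the greenhouse gases question.
GHG_SUB_COMPONENTS = [
    "What are greenhouse gases",
    "How the physics works",
    "Types and potency",
    "Sources by sector",
    "Current levels",
    "Historical context",
    "The impact",
]


def render_note(
    question: str,
    sub_components: List[str],
    sources: List[str],
    status: str = "draft",  # assumed status value
    created: Optional[date] = None,
) -> str:
    """Return a markdown skeleton: frontmatter, title, then one section per sub-component."""
    created = created or date.today()

    # Frontmatter block (field names are hypothetical).
    lines = ["---", f"status: {status}", f"created: {created.isoformat()}", "sources:"]
    lines += [f"  - {source}" for source in sources]
    lines.append("sub_components:")
    lines += [f"  - {component}" for component in sub_components]
    lines += ["---", "", f"# {question}", ""]

    # One empty section per sub-component, to be filled in during synthesis.
    for component in sub_components:
        lines += [f"## {component}", "", "TODO: synthesis goes here", ""]
    return "\n".join(lines)


# "raw/example-source.md" is a placeholder path, not a real source.
note = render_note("What are greenhouse gases?", GHG_SUB_COMPONENTS, ["raw/example-source.md"])
```

Keeping the sub-components in the frontmatter as well as in the headings is a design choice in this sketch: it would let an agent or a lint step check which sections are still unwritten without parsing the body.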
In this example, the greenhouse gases synthesis has seven sections, but depending on the question, you might have anywhere from 3 to 10 sub-components.

![[Screenshot 2026-04-14 at 21.54.04.png]]

### Why did I choose to diverge from the LLM Wiki approach?

The main architectural difference between what Karpathy suggests and what I built is this: Karpathy's approach is source-driven. You ingest sources, and the wiki grows from processing them. My approach is question-driven. A question pulls research toward it, and the wiki grows from answering questions. This matters because I realized I don't want to spend time ingesting everything I find. I want to ingest only what helps me answer something I actually want to know.

These divergences aren't improvements on Karpathy's idea. They're just adaptations. Karpathy was building a system at a different scale and for different purposes. I was cleaning up a specific folder and trying to avoid the maintenance burden that killed my previous PKM attempts.

There are no hard rules. It's an open world with a lot of liberty to decide what really works for each individual. **Experimentation is king.**

[^1]: The mermaid syntax was easily put together by Claude. But it involved a lot of rechecking to ensure the process made sense!
[^2]: Personally, I think this is a dangerous game. Scope creep alert!