Building a Personal AI Research OS

The Problem: Fragmented Knowledge

Most knowledge management systems fail because they are passive. Research notes, GitHub repositories, and video transcripts remain siloed in tools like Obsidian, Notion, or browser bookmarks. When starting a new project, developers often fall back on a generic LLM context window and lose the accumulated value and history of their own earlier work. The goal is to move from hoarding information to an active, agentic research operating system.

The Three-Layer Architecture

Instead of relying on heavy vector databases or proprietary platforms like NotebookLM, the authors propose a file-based system that is human-readable and agent-native. The system relies on three distinct layers:

Raw Layer: Immutable source files (Markdown, PDFs, transcripts). This is the source of truth.
Index Layer: A central index.yaml file that acts as a catalog. It contains metadata (title, author, date, origin) and a summary for every source. This file is the primary entry point for agents, allowing them to reason about what information is available without needing to scan thousands of files.
Wiki Layer: A synthesized layer of derivatives. This includes comparisons between concepts, entity deep-dives, and project-specific notes. These are generated by agents and stored as Markdown files, so the knowledge base compounds over time.
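
As a rough illustration of the Index layer, the sketch below models one catalog record and a keyword lookup that reads only the catalog, never the raw files. The fields title, author, date, origin, and summary come from the description above; the path field, the top-level "sources" key, and all function names are assumptions made for this example. It emits JSON, which is for practical purposes also valid YAML 1.2, purely to stay within the Python standard library; a real index.yaml would more likely be written with a YAML library.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class IndexEntry:
    """One catalog record in the Index layer."""
    path: str     # hypothetical field: where the source lives in the Raw layer
    title: str
    author: str
    date: str     # ISO 8601 date string, e.g. "2024-05-01"
    origin: str   # e.g. "youtube", "github", "web"
    summary: str


def dump_index(entries: list[IndexEntry]) -> str:
    """Serialize the catalog as JSON text (assumed layout: {"sources": [...]})."""
    return json.dumps({"sources": [asdict(e) for e in entries]}, indent=2)


def load_index(text: str) -> list[IndexEntry]:
    """Parse catalog text produced by dump_index back into entries."""
    return [IndexEntry(**item) for item in json.loads(text)["sources"]]


def find_sources(entries: list[IndexEntry], keyword: str) -> list[IndexEntry]:
    """Return entries whose title or summary mentions the keyword.

    Only the catalog is consulted, so an agent can decide which raw
    files are worth opening without scanning the Raw layer.
    """
    needle = keyword.lower()
    return [
        e for e in entries
        if needle in e.title.lower() or needle in e.summary.lower()
    ]
```

An agent would typically call find_sources first and open only the files named in the returned path fields.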

The Deep Research Algorithm

Rather than a single prompt, the system uses an iterative, multi-agent loop:

Orchestrator Agent: Breaks a high-level topic into sub-questions.
Worker Agents: Execute queries against the Second Brain (the local knowledge base) and the public web, using tools like Google Search or custom scrapers.
Ranking & Compaction: To keep the context window from overflowing, the system ranks sources by relevance to the topic. Only the highest-ranked sources are scraped in full; the rest are kept as summaries.
Persistence: Unlike a standard chat session, the output is saved as a new entry in the Wiki layer, meaning the research is reusable for future projects.
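
The four steps above can be sketched as a single function. This is a minimal illustration, not the authors' implementation: the orchestrator and workers are passed in as plain callables (in practice they would wrap LLM and search calls), the word-overlap relevance score is a stand-in for whatever ranking the real system uses, and every name here (Finding, relevance, compact, deep_research) is hypothetical.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass
class Finding:
    source: str      # where the information came from
    summary: str     # short description, always cheap to keep
    full_text: str   # complete content, expensive to keep in context


def relevance(topic: str, finding: Finding) -> int:
    """Toy score (an assumption): count of topic words found in the summary."""
    topic_words = set(topic.lower().split())
    return len(topic_words & set(finding.summary.lower().split()))


def compact(topic: str, findings: list[Finding], keep_full: int) -> list[str]:
    """Rank findings; keep full text for the top few and summaries for the rest."""
    ranked = sorted(findings, key=lambda f: relevance(topic, f), reverse=True)
    sections = []
    for position, finding in enumerate(ranked):
        body = finding.full_text if position < keep_full else finding.summary
        sections.append(f"## {finding.source}\n{body}")
    return sections


def deep_research(
    topic: str,
    plan: Callable[[str], list[str]],               # orchestrator: topic -> sub-questions
    workers: list[Callable[[str], list[Finding]]],  # each worker: question -> findings
    wiki_dir: Path,
    keep_full: int = 2,
) -> Path:
    """Run one research pass and persist the result as a Wiki page."""
    findings: list[Finding] = []
    for question in plan(topic):
        for worker in workers:
            findings.extend(worker(question))

    sections = compact(topic, findings, keep_full)

    # Persistence: write the report into the Wiki layer so it can be reused.
    wiki_dir.mkdir(parents=True, exist_ok=True)
    slug = "".join(c if c.isalnum() else "-" for c in topic.lower()).strip("-")
    page = wiki_dir / f"{slug}.md"
    page.write_text(f"# {topic}\n\n" + "\n\n".join(sections) + "\n", encoding="utf-8")
    return page
```

Passing the orchestrator and workers as callables keeps the loop itself testable with stubs, and leaves the choice of model and search tool outside the sketch.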

Why File-Based Over Vector Databases?

For personal research, vector databases add unnecessary infrastructure overhead. A file-based system using YAML and Markdown is:

Inspectable: You can open, edit, and delete files manually.
Portable: It works across any OS and is not tied to a specific cloud provider.
Agent-Friendly: LLMs are highly proficient at reading and writing structured text files, making them ideal for managing a local knowledge base.

Implementation Strategy

Consolidation: Move scattered notes into a single local directory (e.g., an Obsidian vault).
Automation: Use agentic tools (like Claude Code or Codex) to automate the ingestion of new sources (YouTube transcripts, GitHub repos, web links).
Maintenance: Treat the Wiki as a living document. When research becomes stale, update the Wiki page rather than starting a new search from scratch.
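
Two of these steps lend themselves to small helpers, sketched below under stated assumptions. The ingest function copies a new source into the Raw layer and refuses to overwrite an existing file, which is one way to honor the "immutable" rule; stale_pages flags Wiki pages by file modification time, which is only a proxy for staleness chosen for this example. The function names, the flat directory layout, and the age threshold are all hypothetical.

```python
import shutil
import time
from pathlib import Path
from typing import Optional


def ingest(source: Path, raw_dir: Path) -> Path:
    """Copy a new source file into the Raw layer without overwriting anything."""
    raw_dir.mkdir(parents=True, exist_ok=True)
    target = raw_dir / source.name
    if target.exists():
        # Raw files are treated as immutable, so a name clash is an error.
        raise FileExistsError(f"{target} already exists; raw sources are immutable")
    shutil.copy2(source, target)
    return target


def stale_pages(
    wiki_dir: Path,
    max_age_days: float,
    now: Optional[float] = None,
) -> list[Path]:
    """List Wiki pages (*.md) last modified more than max_age_days ago."""
    current = time.time() if now is None else now
    cutoff = current - max_age_days * 86400  # seconds per day
    return sorted(p for p in wiki_dir.glob("*.md") if p.stat().st_mtime < cutoff)
```

An agent could run stale_pages on a schedule and refresh only the pages it returns, instead of repeating a full search.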
