Product / May 2026

Knowledge Graphs for Research: Connecting Papers, Data, Notes, and Code

How graph-based context helps AI understand relationships between papers, methods, datasets, figures, and code instead of treating every file as isolated text.

Research teams do not suffer from a shortage of information. They suffer from a shortage of continuity.

A typical project begins with PDFs, interview notes, datasets, code notebooks, citations, figures, Slack threads, and half-finished drafts. At first, everyone remembers what matters. A senior researcher knows which paper supports which claim. A PhD student remembers why one dataset was excluded. A collaborator can explain why a model output changed between two drafts.

Then the project moves. Files are renamed. Folders are reorganized. People leave. The literature review becomes separated from the analysis. Code becomes separated from the claim it produced. The final paper cites a polished result, but the reasoning trail behind it becomes hard to reconstruct.

This is the problem a research knowledge graph should solve.

Not by creating yet another database that someone has to maintain after the fact, but by growing quietly while the team reads, writes, analyzes, and revises. The graph should emerge from the work itself.

From document storage to project memory

Most research tools still treat knowledge as a collection of files. A PDF manager stores papers. A notes app stores observations. A drive folder stores datasets. A code repository stores scripts. A reference manager stores citations.

Each tool is useful, but each one captures only part of the research process.

The real project memory lives in the relationships between these materials: this claim is supported by this passage; this figure was generated by this notebook; this table depends on this dataset; this paragraph cites this paper but contradicts another; this reviewer comment led to this revision.

A knowledge graph makes those relationships explicit.

Instead of representing a project as folders and files, it represents it as connected entities: documents, passages, claims, citations, authors, datasets, variables, computations, figures, notes, decisions, and drafts. Those entities become nodes. Their relationships become edges.

That structure matters because research is relational. A single passage can support several claims. A dataset can feed multiple analyses. A notebook cell can produce a table that later becomes a figure. A citation can be used as background in one section and evidence in another.

Plain search struggles with this. Vector search can retrieve semantically similar text, but similarity is not the same as evidence. A graph can preserve the difference between “related to,” “supports,” “contradicts,” “computed from,” “cites,” and “revises.”
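To make the distinction concrete, here is a minimal sketch of typed edges in Python. It is illustrative only: the `Edge` class, the `edges_into` helper, the relation vocabulary, and the node identifiers are assumptions made for this example, not a description of any particular product's schema.

```python
from dataclasses import dataclass

# Hypothetical relation vocabulary, taken from the examples in the text above.
RELATIONS = {"related_to", "supports", "contradicts", "computed_from", "cites", "revises"}


@dataclass(frozen=True)
class Edge:
    source: str    # node id, e.g. "passage:p1" (made-up identifier)
    relation: str  # must be one of RELATIONS
    target: str    # node id, e.g. "claim:c1" (made-up identifier)

    def __post_init__(self):
        # Reject relation names outside the agreed vocabulary.
        if self.relation not in RELATIONS:
            raise ValueError(f"unknown relation: {self.relation}")


def edges_into(edges, target, relation):
    """Return the source ids of edges with the given relation pointing at target."""
    return [e.source for e in edges if e.target == target and e.relation == relation]


edges = [
    Edge("passage:p1", "supports", "claim:c1"),
    Edge("passage:p2", "contradicts", "claim:c1"),
    Edge("passage:p3", "related_to", "claim:c1"),
]

# All three passages would look "similar" to the claim in a vector index;
# the edge type is what separates evidence from counter-evidence.
supporting = edges_into(edges, "claim:c1", "supports")
contradicting = edges_into(edges, "claim:c1", "contradicts")
```

A similarity score alone would rank all three passages as relevant to the claim. The typed edge is what lets a system say which one backs it and which one disputes it.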

That distinction is becoming increasingly important in AI-assisted research.

Why GraphRAG matters for research teams

Retrieval-augmented generation, or RAG, has become a common way to ground AI systems in external documents. But standard RAG often treats a knowledge base as chunks of text. It retrieves snippets that look relevant and passes them to a language model.

That works for simple lookup questions. It is weaker when the answer depends on relationships across many documents, such as “Which claims in our draft are weakly supported?” or “What changed between the first and final analysis?” or “Which studies contradict the assumption behind this model?”

This is where graph-based retrieval becomes useful. Microsoft Research describes GraphRAG as combining text extraction, network analysis, and LLM prompting to better understand private datasets, especially when questions require connecting information across a corpus rather than retrieving one passage (Microsoft Research: GraphRAG).

The open GraphRAG documentation similarly frames it as a structured, hierarchical approach to RAG that extracts a knowledge graph from raw text, builds community structure, and uses those structures during retrieval (GraphRAG documentation).

For research teams, the implication is straightforward: project memory should not only retrieve documents. It should retrieve context.

If a researcher asks, “Why did we make this claim?” the system should return not just a paragraph from a paper, but the chain: the draft claim, the cited passage, the note where it was interpreted, the dataset or computation that supports it, and the later revision where the claim was narrowed.
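The chain described above can be sketched as a bounded graph traversal. This is a minimal illustration, not how GraphRAG or any specific product retrieves context: the edge list, the relation names (including `interprets`), the identifiers, and the `context_for` function are all hypothetical.

```python
from collections import deque

# Hypothetical edges as (source, relation, target). All ids are made up.
EDGES = [
    ("claim:c1", "cites", "passage:p1"),
    ("note:n1", "interprets", "passage:p1"),
    ("claim:c1", "computed_from", "computation:nb3"),
    ("computation:nb3", "computed_from", "dataset:survey-v2"),
    ("claim:c1-narrowed", "revises", "claim:c1"),
    ("claim:c9", "cites", "passage:p7"),  # unrelated to claim:c1
]


def context_for(start, edges, max_hops=2):
    """Breadth-first walk over edges in both directions, up to max_hops away."""
    neighbours = {}
    for src, rel, dst in edges:
        neighbours.setdefault(src, []).append((rel, dst, "out"))
        neighbours.setdefault(dst, []).append((rel, src, "in"))

    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for rel, other, direction in neighbours.get(node, []):
            if other not in seen:
                seen.add(other)
                found.append((node, rel, direction, other))
                queue.append((other, hops + 1))
    return found


chain = context_for("claim:c1", EDGES)
reached = {other for _, _, _, other in chain}
```

Starting from the draft claim, the walk reaches the cited passage, the note that interprets it, the computation and dataset behind it, and the later narrowed revision, while leaving the unrelated claim and passage out. The hop limit is a design choice: it keeps the returned context small enough to hand to a language model.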

That is the difference between document search and research memory.

Provenance is the foundation of trust

More than one researcher described the same small panic before a deadline: a figure exists, the manuscript refers to it, but nobody is completely sure which notebook generated the final version.

There are several notebooks with similar names. Several output folders. Several versions of the dataset. The figure is correct, probably, but the trail is harder to prove than it should be.

A knowledge graph should make that relationship explicit: this figure came from this notebook, using this dataset version, with these parameters, and supports this claim in the draft.

A useful research knowledge graph must track provenance. Provenance means knowing where something came from, how it was transformed, and what depends on it.

This is not only a technical nicety. It is central to reproducibility. FAIR data principles emphasize that research objects should be findable, accessible, interoperable, and reusable, with rich metadata and persistent identifiers supporting reuse by both humans and machines (GO FAIR: FAIR Principles).

In computational research, provenance becomes even more important because claims often depend on chains of data cleaning, modeling, notebook execution, and parameter choices.

Recent work on computational notebooks shows why this matters. A 2024 GigaScience study examined the reproducibility of Jupyter notebooks associated with biomedical publications, highlighting the difficulty of rerunning computational work even when notebooks are publicly available (GigaScience: Reproducibility of Jupyter notebooks).

Related work on FAIR Jupyter uses a knowledge graph approach to make notebook-related research objects more granularly searchable and reusable (FAIR Jupyter).

In practice, this means a research memory system should connect outputs back to inputs. A chart should know which code produced it. A code cell should know which dataset version it used. A draft paragraph should know which result it summarizes. A citation should know which passage it supports.
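One way to picture outputs connected back to inputs is a lineage lookup. The sketch below is an assumption-laden illustration: the `derived_from` mapping, the identifiers, and the use of a truncated SHA-256 content hash as a dataset version id are choices made for this example only.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Short content hash, used here as a stand-in for a dataset version id."""
    return hashlib.sha256(data).hexdigest()[:12]


# Two hypothetical versions of the same dataset; the second has one more row.
raw_v1 = b"id,score\n1,0.4\n2,0.9\n"
raw_v2 = b"id,score\n1,0.4\n2,0.9\n3,0.7\n"
ds_v1 = "dataset:" + fingerprint(raw_v1)
ds_v2 = "dataset:" + fingerprint(raw_v2)

# Each output maps to the inputs it was derived from (all ids are made up).
derived_from = {
    "paragraph:results-3": ["figure:fig2"],
    "figure:fig2": ["cell:analysis.ipynb#12"],
    "cell:analysis.ipynb#12": [ds_v2, "params:bins=20"],
}


def lineage(artifact, derived_from):
    """Return every upstream input of an artifact, nearest first."""
    upstream, frontier = [], [artifact]
    while frontier:
        current = frontier.pop(0)
        for parent in derived_from.get(current, []):
            if parent not in upstream:
                upstream.append(parent)
                frontier.append(parent)
    return upstream


trail = lineage("paragraph:results-3", derived_from)
```

Here `trail` lists the figure, the notebook cell, the exact dataset version, and the parameters behind a draft paragraph. Because the dataset id is derived from content, a silently edited file yields a different id rather than reusing the old one.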

Without this, AI can sound confident while losing the evidentiary trail. With it, AI can answer with context, caveats, and sources.

A living graph, not an archive

The most important design principle is that the graph must grow during the work.

Many knowledge management systems fail because they ask researchers to stop working and become librarians. That rarely happens. Researchers are busy. They will not maintain a separate ontology after every meeting or manually connect every claim to every source.

A living research graph has to be built from natural activity: uploading PDFs, annotating passages, importing citations, writing notes, running notebooks, editing drafts, responding to reviewer comments, and discussing findings. Each action should add structure automatically or semi-automatically.

When a researcher highlights a passage, the system can connect it to a note. When a note becomes a claim, the claim can inherit its evidence. When a notebook generates a figure, the figure can remain linked to the computation. When a draft changes, the graph can preserve the earlier version and the reason for revision.
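A rough sketch of how everyday actions could add structure is shown below. It does not describe Autograph or any real system: the `ProjectGraph` class, its event handler names, and the relation names are hypothetical, chosen only to mirror the four examples above.

```python
class ProjectGraph:
    """Toy graph that grows as a side effect of ordinary research actions."""

    def __init__(self):
        self.edges = set()   # (source, relation, target) triples
        self.reasons = {}    # (new_id, old_id) -> reason for the revision

    def link(self, source, relation, target):
        self.edges.add((source, relation, target))

    def on_highlight(self, passage_id, note_id):
        # Highlighting a passage ties the resulting note to that passage.
        self.link(note_id, "annotates", passage_id)

    def on_promote(self, note_id, claim_id):
        # A note becomes a claim; the claim inherits the note's passages as evidence.
        self.link(claim_id, "derived_from", note_id)
        for src, rel, dst in list(self.edges):
            if src == note_id and rel == "annotates":
                self.link(dst, "supports", claim_id)

    def on_figure(self, figure_id, computation_id):
        # A generated figure stays linked to the computation that produced it.
        self.link(figure_id, "computed_from", computation_id)

    def on_revision(self, old_id, new_id, reason):
        # The earlier version is kept, along with why it changed.
        self.link(new_id, "revises", old_id)
        self.reasons[(new_id, old_id)] = reason


g = ProjectGraph()
g.on_highlight("passage:p1", "note:n1")
g.on_promote("note:n1", "claim:c1")
g.on_figure("figure:fig2", "cell:analysis.ipynb#12")
g.on_revision("draft:v1", "draft:v2", "reviewer asked for a narrower claim")
```

No one in this sketch is asked to draw an edge by hand. Each edge is a by-product of something the researcher was already doing.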

This is the direction Paradocs takes with Autograph, its knowledge layer for research teams. Autograph maps relationships between files, claims, passages, citations, data, and computations so that AI responses stay grounded in the project’s own evidence.

The goal is not to replace researchers’ judgment. It is to preserve the context that makes judgment possible.

Durable memory for long research projects

Research projects often outlive their original structure. A grant becomes a paper. A paper becomes a dataset. A dataset becomes a follow-up study. A collaborator joins halfway through. Another leaves before submission. Six months later, someone needs to know why a choice was made.

A living knowledge graph gives the project continuity across those transitions.

It allows a team to ask better questions: What evidence supports this section? Which claims depend on outdated data? Which papers have we cited but not actually used? Which findings are robust across analyses? Which assumptions changed between drafts?
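Two of these questions can be expressed as simple queries over typed edges. The sketch below assumes a hypothetical edge list and relation vocabulary (`cites`, `part_of`, `supports`, `computed_from`, `revises`); a real system would need a richer model of what counts as "used" or "outdated."

```python
# Hypothetical edges as (source, relation, target). All ids are made up.
EDGES = [
    ("draft:d1", "cites", "paper:a"),
    ("draft:d1", "cites", "paper:b"),
    ("passage:a-p3", "part_of", "paper:a"),
    ("passage:a-p3", "supports", "claim:c1"),
    ("claim:c1", "computed_from", "dataset:v1"),
    ("claim:c2", "computed_from", "dataset:v2"),
    ("dataset:v2", "revises", "dataset:v1"),
]


def cited_but_unused(edges):
    """Papers that are cited but have no passage supporting any claim."""
    cited = {t for _, r, t in edges if r == "cites"}
    supporting = {s for s, r, _ in edges if r == "supports"}
    used = {t for s, r, t in edges if r == "part_of" and s in supporting}
    return sorted(cited - used)


def claims_on_outdated_data(edges):
    """Claims computed from something that a later version has revised."""
    superseded = {t for _, r, t in edges if r == "revises"}
    return sorted(s for s, r, t in edges if r == "computed_from" and t in superseded)


unused_papers = cited_but_unused(EDGES)
stale_claims = claims_on_outdated_data(EDGES)
```

In this toy project, `paper:b` is cited without contributing any evidence, and `claim:c1` still rests on a dataset version that has since been replaced.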

That is the promise of research memory: not more storage, but more continuity.

For AI-assisted research, this continuity is the difference between a chatbot that summarizes files and a collaborator that understands the project.

A useful research knowledge graph is not built after the fact. It grows as the research grows.

The Paradocs Approach

Paradocs is built around the idea that research memory should grow naturally from the work itself.

Researchers should not have to stop what they are doing to manually maintain a separate knowledge system. The graph should emerge as they read papers, annotate passages, write notes, run notebooks, edit drafts, cite sources, and discuss findings with their team.

That is why Paradocs connects documents, data, code, citations, notes, and writing inside one workspace. No file is merely stored. Each one becomes part of a living project graph that tracks how ideas, evidence, analysis, and outputs relate to each other.

A passage in a paper can be linked to a note. A note can become a claim. A claim can connect to a draft paragraph. A figure can remain tied to the notebook and dataset that produced it. A revision can preserve the reason behind a change.

This makes AI more useful because it gives the model structured context instead of isolated text chunks. It also makes AI more trustworthy because answers can be grounded in the project's own evidence, not just generated from loose semantic similarity.

The goal is not to create another place where researchers have to organize information.

The goal is to preserve the relationships that research teams usually lose.

Paradocs turns scattered files into connected project memory, so teams can ask not only where something is, but why it matters, what supports it, and how it changed over time.

Because a research project is not just a folder of documents.

It is a living system of claims, evidence, analysis, and decisions.
