Archy Knowledgebase

Build a context-aware knowledge graph for advanced RAG.

Ingest files, databases, and web pages. Extract entities and relations. Resolve duplicates. Load Neo4j. Then answer questions with hybrid retrieval: vector search plus graph traversal.

Pipeline (Dagster)

Ingest → Extract → Resolve → Graph → Retrieve

Neo4j · Qdrant · Redis · Parquet snapshots

Ingest

Canonical documents + chunks from files, web, and databases.

docs/*.md → Document
html → cleaned text
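
As a rough illustration of the ingest step, the sketch below turns raw Markdown or HTML into a document with chunks. The `Document` fields, the `chunk_text` and `html_to_text` helpers, and the 200-character chunk size are assumptions made for this example, not Archy Knowledgebase's actual schema or API.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser


@dataclass
class Document:
    # Hypothetical canonical document shape; the real schema may differ.
    source: str
    text: str
    chunks: list[str] = field(default_factory=list)


def chunk_text(text: str, max_chars: int = 200) -> list[str]:
    """Group paragraphs into chunks of at most max_chars.

    A single paragraph longer than max_chars is kept whole.
    """
    chunks, current = [], ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def ingest(source: str, raw: str) -> Document:
    # HTML sources are cleaned first; everything else is treated as plain text.
    text = html_to_text(raw) if source.endswith(".html") else raw
    return Document(source=source, text=text, chunks=chunk_text(text))
```

Splitting on paragraph boundaries keeps each chunk readable on its own, which tends to help when chunks are later shown as evidence.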

Extract + resolve

Entities, relations, and probabilistic deduplication to keep the graph clean.

Profiles choose extractors (spaCy rules, LLMs) and matchers (Splink).
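
To make rule-based extraction concrete, here is a minimal sketch that pulls entities and relations out of text with regular expressions. It uses plain `re` as a stand-in for spaCy patterns, and the rule phrases, relation names, and `Relation` type are hypothetical, chosen only for illustration.

```python
import re
from typing import NamedTuple


class Relation(NamedTuple):
    source: str
    rel_type: str
    target: str


# Hypothetical rule set: surface phrase -> relation type.
RULES = {
    "depends on": "DEPENDS_ON",
    "is owned by": "OWNED_BY",
    "uses": "USES",
}

# An entity name is a run of capitalized tokens, e.g. "Checkout Service".
_NAME = r"[A-Z][\w-]*(?:\s[A-Z][\w-]*)*"
_PATTERN = re.compile(
    rf"({_NAME})\s+({'|'.join(map(re.escape, RULES))})\s+({_NAME})"
)


def extract_relations(text: str) -> list[Relation]:
    """Return one Relation per '<Name> <phrase> <Name>' match."""
    return [
        Relation(m.group(1), RULES[m.group(2)], m.group(3))
        for m in _PATTERN.finditer(text)
    ]


def extract_entities(relations: list[Relation]) -> list[str]:
    """Entities are the distinct endpoints of the extracted relations."""
    return sorted({r.source for r in relations} | {r.target for r in relations})
```

A rule-based pass like this is cheap and predictable; an LLM extractor would typically cover phrasing the rules miss.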

Hybrid retrieval

Answer questions with evidence: vector search plus graph traversals.

Vector recall: Qdrant
Relationship context: Neo4j

Why teams use it

Retrieve the “why” with evidence, not vibes.

Vector search alone finds text. Knowledge graphs capture relationships: systems used by services, ownership, dependencies, decisions, and the trail of evidence. Archy Knowledgebase combines both so answers stay grounded in your sources.

Hybrid retrieval

Combine vector recall with graph traversal to surface the most relevant context and the relationships around it.
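
The combination described above can be sketched in memory. Here a dict of toy embeddings stands in for Qdrant and an adjacency list with weighted edges stands in for Neo4j; the data, function names, and two-hop default are all assumptions for illustration rather than the product's retrieval code.

```python
import math

# Toy data standing in for Qdrant (chunk embeddings) and Neo4j (weighted edges).
CHUNKS = {
    "c1": {"vector": [1.0, 0.0], "entities": ["Checkout Service"]},
    "c2": {"vector": [0.0, 1.0], "entities": ["Payments Team"]},
}
GRAPH = {
    "Checkout Service": [("DEPENDS_ON", "Billing API", 0.9)],
    "Billing API": [("OWNED_BY", "Payments Team", 0.8)],
    "Payments Team": [],
}


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vector_search(query_vec, top_k=1):
    """Rank chunk ids by cosine similarity to the query vector."""
    ranked = sorted(
        CHUNKS,
        key=lambda cid: cosine(query_vec, CHUNKS[cid]["vector"]),
        reverse=True,
    )
    return ranked[:top_k]


def expand(entities, max_hops=2):
    """Breadth-first traversal collecting (source, relation, target, weight) edges."""
    seen, frontier, edges = set(entities), list(entities), []
    for _ in range(max_hops):
        next_frontier = []
        for node in frontier:
            for rel, target, weight in GRAPH.get(node, []):
                edges.append((node, rel, target, weight))
                if target not in seen:
                    seen.add(target)
                    next_frontier.append(target)
        frontier = next_frontier
    return edges


def hybrid_retrieve(query_vec, top_k=1, max_hops=2):
    # Step 1: vector recall. Step 2: graph context around the recalled entities.
    chunk_ids = vector_search(query_vec, top_k)
    seeds = [e for cid in chunk_ids for e in CHUNKS[cid]["entities"]]
    return {"chunks": chunk_ids, "edges": expand(seeds, max_hops)}
```

Calling `hybrid_retrieve([0.9, 0.1])` recalls the chunk about the checkout service and then follows its dependency and ownership edges, so the answer can cite both the text and the relationships around it.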

Entity resolution

Probabilistic matching deduplicates entities so your graph stays coherent across sources and naming variations.
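
One common way to frame probabilistic matching is the Fellegi-Sunter model, which Splink builds on. The sketch below is a hand-rolled, much simplified version: the `m` and `u` probabilities, the prior, and the 0.85 similarity threshold are made-up illustrative values (a real setup would estimate them from data), and none of this is Splink's API.

```python
import math
from difflib import SequenceMatcher

# Assumed per-field probabilities (illustrative, not estimated from data):
# m = P(field agrees | same entity), u = P(field agrees | different entities).
FIELD_PARAMS = {
    "name": {"m": 0.95, "u": 0.05},
    "type": {"m": 0.90, "u": 0.30},
}
PRIOR_MATCH = 0.1  # assumed prior probability that a random pair matches


def _normalize(value: str) -> str:
    """Lowercase and drop punctuation/whitespace so naming variants line up."""
    return "".join(ch for ch in value.lower() if ch.isalnum())


def _agrees(a: str, b: str, threshold: float = 0.85) -> bool:
    """Fuzzy agreement on normalized strings."""
    return SequenceMatcher(None, _normalize(a), _normalize(b)).ratio() >= threshold


def match_probability(rec_a: dict, rec_b: dict) -> float:
    """Combine per-field evidence into a posterior match probability."""
    log_odds = math.log(PRIOR_MATCH / (1 - PRIOR_MATCH))
    for name, p in FIELD_PARAMS.items():
        if _agrees(rec_a[name], rec_b[name]):
            log_odds += math.log(p["m"] / p["u"])
        else:
            log_odds += math.log((1 - p["m"]) / (1 - p["u"]))
    return 1 / (1 + math.exp(-log_odds))
```

Pairs scoring above a chosen cutoff would be merged into one graph node; here "Billing API" and "billing-api" score high while "Billing API" and "Checkout Service" score low.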

Snapshots + lineage

Materializations, caching, and Parquet snapshots let you reproduce results, compare runs, and debug extraction changes.
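
To show what comparing runs can look like, this minimal sketch diffs two snapshots of extracted relations. Plain lists of dicts stand in for rows read from Parquet files, and the `diff_snapshots` helper and its column names are hypothetical.

```python
def diff_snapshots(old_rows, new_rows, key=("source", "rel_type", "target")):
    """Report which relation rows were added or removed between two runs."""

    def index(rows):
        # Identify each row by the values of its key columns.
        return {tuple(row[k] for k in key) for row in rows}

    old, new = index(old_rows), index(new_rows)
    return {"added": sorted(new - old), "removed": sorted(old - new)}


run_1 = [{"source": "Checkout Service", "rel_type": "USES", "target": "Billing API"}]
run_2 = [
    {"source": "Checkout Service", "rel_type": "DEPENDS_ON", "target": "Billing API"},
]
changes = diff_snapshots(run_1, run_2)
```

A diff like this makes it easier to see whether a prompt or rule change altered the extracted graph, and in which direction.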

Configurable by profile

Adapt extraction and matching to your domain.

Use profile-driven configuration to choose extractors (spaCy patterns, LLMs), tune chunking and prompts, and switch matching strategies per source type or language.

Profiles control

extractors: [spacy, azure-openai]
matchers: [splink, rules]
graph: neo4j (weighted edges)
retrieval: qdrant + graph traversals
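
As a sketch of how such a profile might be represented and selected per source type, the example below models the settings above as a small dataclass. The `Profile` class, the `PROFILES` registry, the "web" profile, and the `profile_for` helper are hypothetical; only the component names come from the list above.

```python
from dataclasses import dataclass

KNOWN_EXTRACTORS = {"spacy", "azure-openai"}
KNOWN_MATCHERS = {"splink", "rules"}


@dataclass(frozen=True)
class Profile:
    extractors: tuple[str, ...]
    matchers: tuple[str, ...]
    graph: str = "neo4j"
    retrieval: tuple[str, ...] = ("qdrant", "graph")

    def __post_init__(self):
        # Fail early on typos instead of at pipeline run time.
        unknown = (set(self.extractors) - KNOWN_EXTRACTORS) | (
            set(self.matchers) - KNOWN_MATCHERS
        )
        if unknown:
            raise ValueError(f"unknown components: {sorted(unknown)}")


# Hypothetical registry: one profile per source type, plus a default.
PROFILES = {
    "default": Profile(
        extractors=("spacy", "azure-openai"), matchers=("splink", "rules")
    ),
    "web": Profile(extractors=("spacy",), matchers=("rules",)),
}


def profile_for(source_type: str) -> Profile:
    """Pick the profile for a source type, falling back to the default."""
    return PROFILES.get(source_type, PROFILES["default"])
```

Keeping profiles immutable and validated up front means a run can record exactly which configuration produced a given snapshot.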

FAQ

A few quick answers about how Archy Knowledgebase fits into your workflow.

© 2026 Meet Archy