Archy Knowledgebase
Build a context-aware knowledge graph for advanced RAG.
Ingest files, databases, and web pages. Extract entities and relations. Resolve duplicates. Load the resulting graph into Neo4j. Then answer questions with hybrid retrieval: vector search plus graph traversal.
Pipeline (Dagster)
Ingest → Extract → Resolve → Graph → Retrieve
Ingest
Canonical documents + chunks from files, web, and databases.
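A minimal sketch of what "canonical documents + chunks" could look like. It assumes a content-hashed document ID and fixed-size character windows with overlap. `Document`, `Chunk`, `chunk_document`, and the default sizes are hypothetical illustrations, not Archy Knowledgebase's actual ingest code.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    index: int
    text: str

    @property
    def chunk_id(self) -> str:
        # Hypothetical ID scheme: parent document ID plus chunk position.
        return f"{self.doc_id}:{self.index}"


@dataclass
class Document:
    source: str  # file path, URL, or table/query identifier
    text: str
    doc_id: str = field(init=False)

    def __post_init__(self) -> None:
        # Stable ID from source + content: re-ingesting unchanged input gives the same ID.
        digest = hashlib.sha256(f"{self.source}\n{self.text}".encode("utf-8"))
        self.doc_id = digest.hexdigest()[:16]


def chunk_document(doc: Document, size: int = 800, overlap: int = 100) -> list[Chunk]:
    """Split a document into overlapping character windows (hypothetical defaults)."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    chunks = []
    start, index = 0, 0
    while start < len(doc.text):
        chunks.append(Chunk(doc.doc_id, index, doc.text[start:start + size]))
        if start + size >= len(doc.text):
            break  # this window already reaches the end of the text
        start += size - overlap
        index += 1
    return chunks
```

Character windows keep the sketch dependency-free; a real pipeline would more likely split on tokens, sentences, or headings.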
Extract + resolve
Entities, relations, and probabilistic deduplication to keep the graph clean.
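To make "extract entities and relations" concrete, here is a rule-based sketch that turns text into (subject, relation, object) triples, each carrying the chunk it came from as evidence. Plain regular expressions stand in for the spaCy patterns or LLM extractors the project mentions. The rule set, relation names, and `extract` function are hypothetical.

```python
import re
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    relation: str
    obj: str
    chunk_id: str  # evidence: which chunk the relation came from


# Hypothetical rule set: a capitalized name, a trigger phrase, a capitalized name.
PATTERNS = [
    (re.compile(r"\b(?P<s>[A-Z][\w-]*) uses (?P<o>[A-Z][\w-]*)"), "USES"),
    (re.compile(r"\b(?P<s>[A-Z][\w-]*) depends on (?P<o>[A-Z][\w-]*)"), "DEPENDS_ON"),
    (re.compile(r"\b(?P<s>[A-Z][\w-]*) is owned by (?P<o>[A-Z][\w-]*)"), "OWNED_BY"),
]


def extract(text: str, chunk_id: str) -> list[Triple]:
    """Apply every pattern to the text and emit one triple per match."""
    triples = []
    for pattern, relation in PATTERNS:
        for m in pattern.finditer(text):
            triples.append(Triple(m.group("s"), relation, m.group("o"), chunk_id))
    return triples
```

For example, `extract("Checkout uses Postgres. Checkout is owned by Payments.", "c1")` yields a `USES` triple and an `OWNED_BY` triple, both pointing back to chunk `c1`.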
Hybrid retrieval
Answer questions with evidence: vector search plus graph traversals.
Why teams use it
Retrieve the “why” with evidence, not vibes.
Vector search alone finds similar text. Knowledge graphs capture relationships: systems used by services, ownership, dependencies, decisions, and the trail of evidence. Archy Knowledgebase combines both so answers stay grounded in your sources.
Hybrid retrieval
Combine vector recall with graph traversal to surface the most relevant context and the relationships around it.
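A minimal sketch of the hybrid pattern described above. First, take the top-k chunks by cosine similarity to the query embedding. Then collect the entities those chunks mention and walk the graph outward a fixed number of hops to gather related facts. Assumptions: embeddings are already computed, brute-force cosine over a dict stands in for a vector index, and an in-memory edge list stands in for Neo4j. `hybrid_retrieve` and its parameters are hypothetical.

```python
import math
from collections import deque


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_retrieve(query_vec, chunk_vecs, chunk_entities, edges, k=2, hops=1):
    """Vector recall for top-k chunks, then expand their entities up to `hops` in the graph."""
    # 1. Vector recall: rank every chunk by cosine similarity (brute force, no index).
    top = sorted(chunk_vecs, key=lambda cid: cosine(query_vec, chunk_vecs[cid]), reverse=True)[:k]
    # 2. Seed entities: everything mentioned in the recalled chunks (sorted for determinism).
    seeds = sorted({e for cid in top for e in chunk_entities.get(cid, [])})
    # 3. Graph traversal: breadth-first over (subject, relation, object) edges, ignoring direction.
    adjacency = {}
    for edge in edges:
        subject, _, obj = edge
        adjacency.setdefault(subject, []).append(edge)
        adjacency.setdefault(obj, []).append(edge)
    seen, facts = set(seeds), []
    frontier = deque((entity, 0) for entity in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue  # do not expand beyond the hop limit
        for edge in adjacency.get(node, []):
            if edge not in facts:
                facts.append(edge)
            subject, _, obj = edge
            neighbor = obj if node == subject else subject
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return top, facts
```

The returned chunks supply the text evidence and the returned facts supply the surrounding relationships. Together they form the context an answer can cite.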
Entity resolution
Probabilistic matching deduplicates entities so your graph stays coherent across sources and naming variations.
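One standard form of probabilistic matching is Fellegi–Sunter scoring. Each comparison between two records either agrees or disagrees. An agreement adds log2(m/u) to the pair's weight, and a disagreement adds log2((1−m)/(1−u)), where m = P(agree | same entity) and u = P(agree | different entities). The sketch below scores pairs this way and merges pairs above a threshold with union-find. The page does not say which model Archy Knowledgebase uses. The comparisons, m/u values, and threshold here are illustrative assumptions, not tuned or estimated parameters.

```python
import math
import re

# Hypothetical (m, u) per comparison: P(agree | match), P(agree | non-match).
COMPARISONS = {
    "name_exact": (0.90, 0.01),
    "name_tokens": (0.95, 0.10),
    "same_type": (0.99, 0.40),
}


def tokens(name: str) -> list[str]:
    # Lowercase and split on anything non-alphanumeric: "payments-team" -> ["payments", "team"].
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).split()


def agreements(a: dict, b: dict) -> dict:
    ta, tb = tokens(a["name"]), tokens(b["name"])
    union = set(ta) | set(tb)
    jaccard = len(set(ta) & set(tb)) / len(union) if union else 0.0
    return {
        "name_exact": ta == tb,
        "name_tokens": jaccard >= 0.5,
        "same_type": a["type"] == b["type"],
    }


def match_weight(a: dict, b: dict) -> float:
    """Sum of log-likelihood ratios over all comparisons (Fellegi-Sunter style)."""
    weight = 0.0
    for key, agrees in agreements(a, b).items():
        m, u = COMPARISONS[key]
        weight += math.log2(m / u) if agrees else math.log2((1 - m) / (1 - u))
    return weight


def resolve(entities: list[dict], threshold: float = 5.0) -> list[list[str]]:
    """Cluster entities whose pairwise weight clears the threshold (union-find)."""
    parent = list(range(len(entities)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(entities)):
        for j in range(i + 1, len(entities)):
            if match_weight(entities[i], entities[j]) >= threshold:
                parent[find(j)] = find(i)
    clusters: dict[int, list[str]] = {}
    for i, entity in enumerate(entities):
        clusters.setdefault(find(i), []).append(entity["name"])
    return list(clusters.values())
```

The all-pairs loop is quadratic, so it only suits small inputs. Larger graphs usually add a blocking step so that only plausible candidate pairs get scored.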
Snapshots + lineage
Dagster materializations, caching, and Parquet snapshots let you reproduce results, compare runs, and debug extraction changes.
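Reproducible caching hinges on a deterministic key. If the stage configuration and the set of inputs are unchanged, a stored snapshot can be reused. Comparing two runs then reduces to diffing their outputs. The sketch below keeps snapshots in an in-memory dict as a stand-in for Parquet files. `fingerprint`, `cached_run`, and `diff_snapshots` are hypothetical helpers, not project APIs.

```python
import hashlib
import json


def fingerprint(config: dict, input_ids: list[str]) -> str:
    """Deterministic key for a stage run: same config + same inputs -> same key."""
    # sort_keys and sorted inputs make logically equal runs serialize identically.
    payload = json.dumps(
        {"config": config, "inputs": sorted(input_ids)},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def cached_run(cache: dict, config: dict, input_ids: list[str], compute):
    """Reuse a stored snapshot when the fingerprint matches; otherwise compute and store it."""
    key = fingerprint(config, input_ids)
    if key not in cache:
        cache[key] = compute()
    return key, cache[key]


def diff_snapshots(old: set, new: set) -> dict:
    """Compare two runs' outputs, e.g. sets of extracted triples."""
    return {"added": sorted(new - old), "removed": sorted(old - new)}
```

Changing the extractor or a chunking parameter produces a new key, which forces a recompute. Diffing the old and new triple sets then shows exactly what the change added or removed.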
Configurable by profile
Adapt extraction and matching to your domain.
Use profile-driven configuration to choose extractors (spaCy patterns, LLMs), tune chunking and prompts, and switch matching strategies per source type or language.
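A minimal sketch of profile selection by source type and language, with the most specific match winning and a wildcard default as the fallback. The `Profile` fields, extractor and matcher identifiers, and the example profiles are hypothetical placeholders for whatever the real configuration defines.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Profile:
    extractor: str        # hypothetical identifiers, e.g. "spacy_patterns" or "llm"
    chunk_size: int       # characters per chunk
    chunk_overlap: int
    matcher: str          # e.g. "probabilistic" or "exact"
    prompt: Optional[str] = None  # only meaningful for LLM extractors


# Keyed by (source_type, language); "*" is a wildcard.
PROFILES = {
    ("*", "*"): Profile("spacy_patterns", 800, 100, "probabilistic"),
    ("web", "*"): Profile("llm", 1200, 200, "probabilistic",
                          prompt="Extract services, systems, owners, and dependencies."),
    ("web", "en"): Profile("spacy_patterns", 1000, 150, "probabilistic"),
    ("database", "*"): Profile("spacy_patterns", 400, 0, "exact"),
}


def select_profile(source_type: str, language: str) -> Profile:
    """Most specific match wins: (type, lang), then (type, *), then the default."""
    for key in ((source_type, language), (source_type, "*"), ("*", "*")):
        if key in PROFILES:
            return PROFILES[key]
    raise KeyError("no default profile configured")
```

Under this scheme, English web pages use pattern rules, other web pages fall back to an LLM prompt, and structured database rows use exact matching.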
Profiles control