Vector search revolutionised how we find semantically similar content, but it has limitations. It can't represent relationships between concepts, struggles with multi-hop reasoning, and loses structural information that humans naturally use when understanding domains. Knowledge graphs solve these problems by explicitly modelling entities and their relationships—and when combined with LLMs, they create search systems that truly understand your data.
This guide covers the complete implementation of knowledge graph-enhanced search systems: from selecting graph databases and designing ontologies, through entity extraction and graph construction, to querying and integrating with RAG pipelines. You'll learn patterns that work at scale with production-ready code examples.
Knowledge graphs add a dimension that vector embeddings alone cannot capture: explicit relationships between concepts.
When a user asks "What regulations affect products similar to ours?", a knowledge graph can: find your product, traverse to similar products, find regulations connected to those products, and return structured, explainable results. Vector search alone would struggle to navigate these relationship chains.
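As a taste of what that traversal looks like in practice, here is the query expressed in Cypher via the Python driver; the labels and relationship types (Product, SIMILAR_TO, REGULATES) are illustrative, not a fixed schema:

```python
from neo4j import GraphDatabase

# Illustrative schema: (:Product)-[:SIMILAR_TO]-(:Product)<-[:REGULATES]-(:Regulation)
query = """
MATCH (p:Product {name: $product})-[:SIMILAR_TO]-(similar:Product)
MATCH (reg:Regulation)-[:REGULATES]->(similar)
RETURN DISTINCT reg.name AS regulation, collect(similar.name) AS via_products
"""

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    for record in session.run(query, product="TurboWidget"):
        print(record["regulation"], "via", record["via_products"])
```

Each MATCH clause is one hop in the chain, and the result carries the path that justifies it, which is exactly the explainability vector search lacks.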
Your choice of graph database affects performance, query capabilities, and integration complexity. Here's how to decide:
| Database | Best For | Query Language | Vector Support |
|---|---|---|---|
| Neo4j | General purpose, enterprise | Cypher | Native (5.x+) |
| Amazon Neptune | AWS integration, managed | Gremlin, SPARQL | Via OpenSearch |
| ArangoDB | Multi-model flexibility | AQL | Native |
| NebulaGraph | High-scale distributed | nGQL | Experimental |
| FalkorDB | Redis ecosystem, speed | Cypher | Via Redis |
Neo4j is the most mature option with excellent Python and TypeScript support:
```python
from neo4j import GraphDatabase
from typing import Any, Dict, List, Optional
import os

class KnowledgeGraphDB:
    def __init__(self):
        self.driver = GraphDatabase.driver(
            os.getenv("NEO4J_URI", "bolt://localhost:7687"),
            auth=(
                os.getenv("NEO4J_USER", "neo4j"),
                os.getenv("NEO4J_PASSWORD")
            )
        )

    def close(self):
        self.driver.close()

    def create_entity(self, entity_type: str, properties: Dict[str, Any]) -> str:
        """Create a node with given type and properties."""
        # Labels cannot be parameterised in Cypher, so the entity type is
        # interpolated directly -- only pass trusted values here.
        with self.driver.session() as session:
            result = session.run(
                f"""
                CREATE (n:{entity_type} $props)
                RETURN elementId(n) as id
                """,
                props=properties
            )
            return result.single()["id"]

    def create_relationship(
        self,
        from_id: str,
        to_id: str,
        rel_type: str,
        properties: Optional[Dict[str, Any]] = None
    ):
        """Create a typed relationship between two nodes."""
        with self.driver.session() as session:
            session.run(
                f"""
                MATCH (a), (b)
                WHERE elementId(a) = $from_id AND elementId(b) = $to_id
                CREATE (a)-[r:{rel_type} $props]->(b)
                """,
                from_id=from_id,
                to_id=to_id,
                props=properties or {}
            )

    def find_connected(
        self,
        entity_id: str,
        relationship_types: Optional[List[str]] = None,
        max_depth: int = 2
    ) -> List[Dict]:
        """Find entities connected within max_depth hops."""
        rel_filter = ""
        if relationship_types:
            rel_filter = ":" + "|".join(relationship_types)

        # Variable-length path bounds cannot be parameterised either,
        # hence the f-string for max_depth.
        with self.driver.session() as session:
            result = session.run(
                f"""
                MATCH path = (start)-[{rel_filter}*1..{max_depth}]-(connected)
                WHERE elementId(start) = $entity_id
                RETURN connected, relationships(path) as rels
                """,
                entity_id=entity_id
            )
            return [dict(record) for record in result]
```
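A quick usage sketch; the labels and properties here are illustrative:

```python
db = KnowledgeGraphDB()

product_id = db.create_entity("Product", {"name": "TurboWidget"})
reg_id = db.create_entity("Regulation", {"name": "Widget Safety Act"})
db.create_relationship(reg_id, product_id, "REGULATES", {"since": 2023})

# Everything within two hops of the product, via any relationship type
neighbours = db.find_connected(product_id, max_depth=2)
db.close()
```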
Neo4j 5.x supports native vector indexes for hybrid graph + semantic search:

```python
# Further KnowledgeGraphDB methods (shown unindented for brevity).

def setup_vector_index(self, index_name: str, label: str, property_name: str, dimensions: int = 1536):
    """Create a vector index for semantic search."""
    with self.driver.session() as session:
        session.run(
            f"""
            CREATE VECTOR INDEX {index_name} IF NOT EXISTS
            FOR (n:{label})
            ON (n.{property_name})
            OPTIONS {{
                indexConfig: {{
                    `vector.dimensions`: {dimensions},
                    `vector.similarity_function`: 'cosine'
                }}
            }}
            """
        )

def add_embedding(self, entity_id: str, embedding: List[float], property_name: str = "embedding"):
    """Add embedding vector to an existing node."""
    with self.driver.session() as session:
        session.run(
            f"""
            MATCH (n)
            WHERE elementId(n) = $entity_id
            SET n.{property_name} = $embedding
            """,
            entity_id=entity_id,
            embedding=embedding
        )

def vector_search(
    self,
    query_embedding: List[float],
    label: str,  # kept for interface symmetry; the index already fixes the label
    index_name: str,
    top_k: int = 10
) -> List[Dict]:
    """Find similar nodes using vector similarity."""
    with self.driver.session() as session:
        result = session.run(
            f"""
            CALL db.index.vector.queryNodes('{index_name}', {top_k}, $embedding)
            YIELD node, score
            RETURN node, score
            ORDER BY score DESC
            """,
            embedding=query_embedding
        )
        return [{"node": dict(r["node"]), "score": r["score"]} for r in result]
```
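A minimal end-to-end sketch of the vector path. Note the index dimensions must match your embedding model: all-MiniLM-L6-v2 (used by the entity resolver later in this guide) produces 384-dimension vectors, not the 1536 default above.

```python
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dimensional embeddings

db = KnowledgeGraphDB()
db.setup_vector_index("document_embeddings", "Document", "embedding", dimensions=384)

doc_id = db.create_entity("Document", {"title": "Widget compliance overview"})
db.add_embedding(doc_id, encoder.encode("Widget compliance overview").tolist())

hits = db.vector_search(
    encoder.encode("regulations for widgets").tolist(),
    label="Document",
    index_name="document_embeddings",
    top_k=5,
)
```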
A well-designed ontology is the foundation of an effective knowledge graph. It defines what entity types exist and how they can relate.

```python
from enum import Enum
from typing import ClassVar, Dict, List, Tuple

class EntityType(Enum):
    DOCUMENT = "Document"
    SECTION = "Section"
    PERSON = "Person"
    ORGANIZATION = "Organization"
    CONCEPT = "Concept"
    PRODUCT = "Product"
    REGULATION = "Regulation"

class RelationType(Enum):
    # Document structure
    CONTAINS = "CONTAINS"          # Document -> Section
    REFERENCES = "REFERENCES"      # Document -> Document

    # Entity relationships
    AUTHORED_BY = "AUTHORED_BY"    # Document -> Person
    PUBLISHED_BY = "PUBLISHED_BY"  # Document -> Organization
    MENTIONS = "MENTIONS"          # Document -> Entity
    RELATED_TO = "RELATED_TO"      # Entity -> Entity

    # Domain-specific
    REGULATES = "REGULATES"        # Regulation -> Product
    SUPERSEDES = "SUPERSEDES"      # Regulation -> Regulation
    WORKS_FOR = "WORKS_FOR"        # Person -> Organization

class OntologySchema:
    """Define valid relationships between entity types."""
    valid_relationships: ClassVar[Dict[RelationType, List[Tuple[EntityType, EntityType]]]] = {
        RelationType.CONTAINS: [
            (EntityType.DOCUMENT, EntityType.SECTION),
        ],
        RelationType.AUTHORED_BY: [
            (EntityType.DOCUMENT, EntityType.PERSON),
        ],
        RelationType.MENTIONS: [
            (EntityType.DOCUMENT, EntityType.PERSON),
            (EntityType.DOCUMENT, EntityType.ORGANIZATION),
            (EntityType.DOCUMENT, EntityType.CONCEPT),
            (EntityType.DOCUMENT, EntityType.PRODUCT),
            (EntityType.SECTION, EntityType.CONCEPT),
        ],
        RelationType.REGULATES: [
            (EntityType.REGULATION, EntityType.PRODUCT),
            (EntityType.REGULATION, EntityType.ORGANIZATION),
        ],
    }

    @classmethod
    def validate_relationship(
        cls,
        rel_type: RelationType,
        from_type: EntityType,
        to_type: EntityType
    ) -> bool:
        """Check if a relationship is valid according to schema."""
        valid_pairs = cls.valid_relationships.get(rel_type, [])
        return (from_type, to_type) in valid_pairs
```
Enforce your ontology to maintain graph quality:

```python
class ValidatedKnowledgeGraph(KnowledgeGraphDB):
    def __init__(self, schema: OntologySchema):
        super().__init__()
        self.schema = schema

    def create_relationship_validated(
        self,
        from_id: str,
        from_type: EntityType,
        to_id: str,
        to_type: EntityType,
        rel_type: RelationType,
        properties: Optional[Dict] = None
    ):
        """Create relationship with schema validation."""
        if not self.schema.validate_relationship(rel_type, from_type, to_type):
            raise ValueError(
                f"Invalid relationship: {from_type.value} "
                f"-[{rel_type.value}]-> {to_type.value}"
            )
        return self.create_relationship(from_id, to_id, rel_type.value, properties)
```
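For instance, the schema above rejects a Person REGULATES Product edge (the entities here are hypothetical):

```python
kg = ValidatedKnowledgeGraph(OntologySchema())

person_id = kg.create_entity(EntityType.PERSON.value, {"name": "Ada Example"})
product_id = kg.create_entity(EntityType.PRODUCT.value, {"name": "TurboWidget"})

try:
    kg.create_relationship_validated(
        person_id, EntityType.PERSON,
        product_id, EntityType.PRODUCT,
        RelationType.REGULATES,  # not in valid_relationships for these types
    )
except ValueError as e:
    print(e)  # Invalid relationship: Person -[REGULATES]-> Product
```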
Building a knowledge graph requires extracting entities and relationships from source content. LLMs excel at this task with proper prompting.

```python
import json
from typing import Any, Dict, List

from pydantic import BaseModel

class ExtractedEntity(BaseModel):
    name: str
    type: str
    properties: Dict[str, Any] = {}

class ExtractedRelationship(BaseModel):
    from_entity: str
    to_entity: str
    relationship_type: str
    properties: Dict[str, Any] = {}

class ExtractionResult(BaseModel):
    entities: List[ExtractedEntity]
    relationships: List[ExtractedRelationship]

class EntityExtractor:
    def __init__(self, llm_client, ontology: OntologySchema):
        # llm_client: any async client exposing complete(prompt, response_format=...)
        self.llm = llm_client
        self.ontology = ontology

    async def extract_from_text(self, text: str, context: str = "") -> ExtractionResult:
        """Extract entities and relationships from text."""
        entity_types = [e.value for e in EntityType]
        rel_types = [r.value for r in RelationType]

        prompt = f"""
        Extract entities and relationships from the following text.

        Entity Types: {entity_types}
        Relationship Types: {rel_types}

        Context: {context}

        Text:
        {text}

        Return JSON in this exact format:
        {{
            "entities": [
                {{"name": "...", "type": "...", "properties": {{}}}}
            ],
            "relationships": [
                {{"from_entity": "...", "to_entity": "...", "relationship_type": "...", "properties": {{}}}}
            ]
        }}

        Rules:
        - Only use entity types and relationship types from the provided lists
        - Entity names should be normalised (consistent casing, no abbreviations)
        - Include relevant properties like dates, identifiers, descriptions
        - Only extract relationships that are explicitly stated or strongly implied
        """

        response = await self.llm.complete(prompt, response_format="json")
        return ExtractionResult.model_validate(json.loads(response))

    async def extract_batch(
        self,
        documents: List[Dict[str, str]]
    ) -> List[ExtractionResult]:
        """Extract from multiple documents with entity resolution."""
        results = []
        for doc in documents:
            result = await self.extract_from_text(
                doc["content"],
                context=doc.get("metadata", "")
            )
            results.append(result)
        return results
```
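For a sentence like "Acme Ltd published the Widget Safety Act, which regulates the TurboWidget", a successful extraction would parse into something along these lines (hypothetical values, shown to make the target shape concrete):

```python
ExtractionResult(
    entities=[
        ExtractedEntity(name="Acme Ltd", type="Organization"),
        ExtractedEntity(name="Widget Safety Act", type="Regulation"),
        ExtractedEntity(name="TurboWidget", type="Product"),
    ],
    relationships=[
        ExtractedRelationship(
            from_entity="Widget Safety Act",
            to_entity="TurboWidget",
            relationship_type="REGULATES",
        ),
    ],
)
```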
The same entity may be mentioned differently across documents. Entity resolution merges these references:

```python
import numpy as np
from sentence_transformers import SentenceTransformer
from typing import Dict, List

class EntityResolver:
    def __init__(self, similarity_threshold: float = 0.85):
        self.encoder = SentenceTransformer('all-MiniLM-L6-v2')
        self.threshold = similarity_threshold
        self.entity_index: Dict[str, List[str]] = {}   # type -> [names]
        self.embeddings: Dict[str, np.ndarray] = {}    # name -> embedding

    def add_entity(self, name: str, entity_type: str) -> str:
        """Add entity and return canonical name (resolved if duplicate)."""
        embedding = self.encoder.encode(name)

        # Check for an existing similar entity of the same type
        for existing_name in self.entity_index.get(entity_type, []):
            existing_emb = self.embeddings[existing_name]
            similarity = np.dot(embedding, existing_emb) / (
                np.linalg.norm(embedding) * np.linalg.norm(existing_emb)
            )

            if similarity >= self.threshold:
                # Found a match - return the existing canonical name
                return existing_name

        # No match - add as a new entity
        if entity_type not in self.entity_index:
            self.entity_index[entity_type] = []
        self.entity_index[entity_type].append(name)
        self.embeddings[name] = embedding
        return name

    def resolve_extraction(
        self,
        extraction: ExtractionResult
    ) -> ExtractionResult:
        """Resolve all entities in an extraction result."""
        name_mapping = {}

        # Resolve entities
        resolved_entities = []
        for entity in extraction.entities:
            canonical_name = self.add_entity(entity.name, entity.type)
            name_mapping[entity.name] = canonical_name

            if canonical_name == entity.name:  # New entity
                resolved_entities.append(entity)

        # Update relationship references
        resolved_relationships = []
        for rel in extraction.relationships:
            resolved_rel = ExtractedRelationship(
                from_entity=name_mapping.get(rel.from_entity, rel.from_entity),
                to_entity=name_mapping.get(rel.to_entity, rel.to_entity),
                relationship_type=rel.relationship_type,
                properties=rel.properties
            )
            resolved_relationships.append(resolved_rel)

        return ExtractionResult(
            entities=resolved_entities,
            relationships=resolved_relationships
        )
```

With extraction and resolution in place, a builder class ties the pipeline together and writes the results to the graph:

```python
class GraphBuilder:
    def __init__(self, db: KnowledgeGraphDB, extractor: EntityExtractor, resolver: EntityResolver):
        self.db = db
        self.extractor = extractor
        self.resolver = resolver
        self.entity_id_map: Dict[str, str] = {}  # name -> db_id

    async def process_document(self, document: Dict[str, Any]) -> Dict[str, Any]:
        """Process a document and add to knowledge graph."""
        # 1. Extract entities and relationships
        extraction = await self.extractor.extract_from_text(
            document["content"],
            context=document.get("title", "")
        )

        # 2. Resolve entities
        resolved = self.resolver.resolve_extraction(extraction)

        # 3. Create nodes
        for entity in resolved.entities:
            if entity.name not in self.entity_id_map:
                entity_id = self.db.create_entity(
                    entity.type,
                    {"name": entity.name, **entity.properties}
                )
                self.entity_id_map[entity.name] = entity_id

        # 4. Create relationships
        for rel in resolved.relationships:
            from_id = self.entity_id_map.get(rel.from_entity)
            to_id = self.entity_id_map.get(rel.to_entity)
            if from_id and to_id:
                self.db.create_relationship(
                    from_id, to_id, rel.relationship_type, rel.properties
                )

        return {
            "entities_created": len(resolved.entities),
            "relationships_created": len(resolved.relationships)
        }
```
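Wiring the pipeline together might look like this; llm_client stands in for whatever async wrapper you use for your model provider:

```python
import asyncio

async def ingest(documents):
    db = KnowledgeGraphDB()
    extractor = EntityExtractor(llm_client, OntologySchema())  # llm_client: your async LLM wrapper
    resolver = EntityResolver(similarity_threshold=0.85)
    builder = GraphBuilder(db, extractor, resolver)

    for doc in documents:
        stats = await builder.process_document(doc)
        print(doc.get("title", "untitled"), stats)

    db.close()

asyncio.run(ingest([{"title": "Widget Safety Act", "content": "..."}]))
```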
Effective querying combines graph traversal with semantic search for powerful retrieval.

```python
class GraphQueryEngine:
    def __init__(self, db: KnowledgeGraphDB):
        self.db = db

    def find_related_documents(
        self,
        entity_name: str,
        max_depth: int = 2
    ) -> List[Dict]:
        """Find documents connected to an entity."""
        # Cypher does not accept parameters in variable-length bounds,
        # so max_depth is interpolated into the query string.
        query = f"""
        MATCH (e {{name: $name}})
        MATCH path = (e)-[*1..{max_depth}]-(d:Document)
        RETURN DISTINCT d, length(path) as distance
        ORDER BY distance
        LIMIT 20
        """
        with self.db.driver.session() as session:
            result = session.run(query, name=entity_name)
            return [dict(r) for r in result]

    def find_path_between(
        self,
        entity1_name: str,
        entity2_name: str,
        max_length: int = 5
    ) -> List[Dict]:
        """Find shortest paths between two entities."""
        query = f"""
        MATCH (e1 {{name: $name1}}), (e2 {{name: $name2}})
        MATCH path = shortestPath((e1)-[*1..{max_length}]-(e2))
        RETURN path, length(path) as path_length
        ORDER BY path_length
        LIMIT 5
        """
        with self.db.driver.session() as session:
            result = session.run(query, name1=entity1_name, name2=entity2_name)
            return [dict(r) for r in result]

    def get_entity_context(
        self,
        entity_name: str,
        relationship_types: Optional[List[str]] = None
    ) -> Optional[Dict]:
        """Get an entity with all its immediate relationships."""
        rel_filter = ""
        if relationship_types:
            rel_filter = ":" + "|".join(relationship_types)

        query = f"""
        MATCH (e {{name: $name}})
        OPTIONAL MATCH (e)-[r{rel_filter}]-(connected)
        RETURN e as entity,
               collect({{
                   relationship: type(r),
                   direction: CASE WHEN startNode(r) = e THEN 'outgoing' ELSE 'incoming' END,
                   connected: connected
               }}) as connections
        """
        with self.db.driver.session() as session:
            result = session.run(query, name=entity_name)
            record = result.single()
            return dict(record) if record else None
```

Pure traversal answers relationship questions; blending it with vector similarity ranks documents by both meaning and connectivity:

```python
import numpy as np

class HybridSearchEngine:
    def __init__(self, db: KnowledgeGraphDB, embedding_model):
        self.db = db
        self.embedder = embedding_model

    async def hybrid_search(
        self,
        query: str,
        vector_weight: float = 0.5,
        top_k: int = 10
    ) -> List[Dict]:
        """
        Combine vector similarity with graph connectivity.

        Score = vector_weight * vector_score + (1 - vector_weight) * graph_score
        """
        # 1. Get query embedding
        query_embedding = self.embedder.encode(query).tolist()

        # 2. Vector search for initial candidates
        vector_results = self.db.vector_search(
            query_embedding,
            label="Document",
            index_name="document_embeddings",
            top_k=top_k * 3  # Get more candidates for reranking
        )

        # 3. Enhance with graph context
        enhanced_results = []
        for result in vector_results:
            doc_id = result["node"]["id"]

            # Get graph connectivity score
            graph_context = await self._get_graph_score(doc_id, query_embedding)

            combined_score = (
                vector_weight * result["score"] +
                (1 - vector_weight) * graph_context["score"]
            )

            enhanced_results.append({
                "document": result["node"],
                "vector_score": result["score"],
                "graph_score": graph_context["score"],
                "combined_score": combined_score,
                "connected_entities": graph_context["entities"]
            })

        # 4. Sort by combined score and return top_k
        enhanced_results.sort(key=lambda x: x["combined_score"], reverse=True)
        return enhanced_results[:top_k]

    async def _get_graph_score(
        self,
        doc_id: str,
        query_embedding: List[float]
    ) -> Dict:
        """Calculate graph-based relevance score."""
        # Get entities mentioned in the document
        query = """
        MATCH (d:Document {id: $doc_id})-[:MENTIONS]->(e)
        RETURN e.name as name, e.embedding as embedding
        """
        with self.db.driver.session() as session:
            result = session.run(query, doc_id=doc_id)
            entities = [dict(r) for r in result]

        if not entities:
            return {"score": 0, "entities": []}

        # Average cosine similarity of mentioned entities to the query
        scores = []
        for entity in entities:
            if entity.get("embedding"):
                similarity = np.dot(query_embedding, entity["embedding"]) / (
                    np.linalg.norm(query_embedding) * np.linalg.norm(entity["embedding"])
                )
                scores.append(similarity)

        avg_score = np.mean(scores) if scores else 0
        return {
            "score": float(avg_score),
            "entities": [e["name"] for e in entities]
        }
```
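A quick call, continuing the earlier sketches; the weight is an assumption to tune against your own relevance data:

```python
# Inside an async context, with db and encoder from the earlier sketches
engine = HybridSearchEngine(db, encoder)

results = await engine.hybrid_search(
    "What regulations affect products similar to ours?",
    vector_weight=0.4,  # lean towards graph connectivity for relational questions
    top_k=5,
)
for r in results:
    print(round(r["combined_score"], 3), r["document"].get("title"), r["connected_entities"])
```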
Integrating knowledge graphs with RAG pipelines enhances retrieval and provides structured context to the LLM.

```python
class GraphRAG:
    def __init__(
        self,
        llm_client,
        query_engine: GraphQueryEngine,
        search_engine: HybridSearchEngine
    ):
        self.llm = llm_client
        # get_entity_context and find_path_between live on GraphQueryEngine,
        # so that is what GraphRAG needs (rather than the raw KnowledgeGraphDB)
        self.graph = query_engine
        self.search = search_engine

    async def answer_query(self, query: str) -> Dict[str, Any]:
        """Answer a query using knowledge graph-enhanced retrieval."""
        # 1. Extract entities from query
        query_entities = await self._extract_query_entities(query)

        # 2. Hybrid search for relevant documents
        search_results = await self.search.hybrid_search(query, top_k=5)

        # 3. Get graph context for found entities
        graph_context = await self._build_graph_context(
            query_entities,
            [r["document"]["id"] for r in search_results]
        )

        # 4. Format context for LLM
        context = self._format_context(search_results, graph_context)

        # 5. Generate answer
        answer = await self._generate_answer(query, context)

        return {
            "answer": answer,
            "sources": search_results,
            "graph_context": graph_context
        }

    async def _extract_query_entities(self, query: str) -> List[str]:
        """Extract entity mentions from query."""
        prompt = f"""
        Extract entity names from this query:
        "{query}"

        Return as JSON array of strings.
        """
        response = await self.llm.complete(prompt, response_format="json")
        return json.loads(response)

    async def _build_graph_context(
        self,
        query_entities: List[str],
        document_ids: List[str]
    ) -> Dict:
        """Build structured context from graph relationships."""
        context = {
            "entity_relationships": [],
            "paths": []
        }

        # Get relationships for query entities
        for entity_name in query_entities:
            entity_context = self.graph.get_entity_context(entity_name)
            if entity_context:
                context["entity_relationships"].append(entity_context)

        # Find paths between query entities and document entities
        for doc_id in document_ids:
            doc_entities = self._get_document_entities(doc_id)
            for q_entity in query_entities:
                for d_entity in doc_entities:
                    paths = self.graph.find_path_between(q_entity, d_entity)
                    if paths:
                        context["paths"].extend(paths[:2])  # Top 2 paths

        return context

    def _get_document_entities(self, doc_id: str) -> List[str]:
        """Names of entities a document mentions (via its MENTIONS edges)."""
        query = """
        MATCH (d:Document {id: $doc_id})-[:MENTIONS]->(e)
        RETURN e.name as name
        """
        with self.graph.db.driver.session() as session:
            return [r["name"] for r in session.run(query, doc_id=doc_id)]

    def _format_context(
        self,
        search_results: List[Dict],
        graph_context: Dict
    ) -> str:
        """Format context for LLM consumption."""
        sections = []

        # Document context
        sections.append("## Relevant Documents")
        for i, result in enumerate(search_results, 1):
            doc = result["document"]
            sections.append(f"### Document {i}: {doc.get('title', 'Untitled')}")
            sections.append(doc.get("content", ""))
            if result.get("connected_entities"):
                sections.append(f"Related entities: {', '.join(result['connected_entities'])}")

        # Graph context
        if graph_context.get("entity_relationships"):
            sections.append("\n## Entity Relationships")
            for entity_ctx in graph_context["entity_relationships"]:
                entity = entity_ctx["entity"]
                sections.append(f"### {entity.get('name', 'Unknown')}")
                for conn in entity_ctx.get("connections", [])[:5]:
                    sections.append(
                        f"- {conn['direction']}: {conn['relationship']} -> "
                        f"{conn['connected'].get('name', 'Unknown')}"
                    )

        return "\n\n".join(sections)

    async def _generate_answer(self, query: str, context: str) -> str:
        """Generate answer using LLM with graph-enhanced context."""
        prompt = f"""
        Answer the following question using the provided context.
        Include relevant relationships and connections in your answer.

        Context:
        {context}

        Question: {query}

        Answer:
        """
        return await self.llm.complete(prompt)
```
Knowledge graphs at scale require careful attention to performance. Here are key optimisation strategies:

```python
def setup_indexes(db: KnowledgeGraphDB):
    """Configure indexes for common query patterns."""
    indexes = [
        # Unique constraints (also create indexes)
        "CREATE CONSTRAINT doc_id IF NOT EXISTS FOR (d:Document) REQUIRE d.id IS UNIQUE",
        "CREATE CONSTRAINT entity_name IF NOT EXISTS FOR (e:Entity) REQUIRE e.name IS UNIQUE",

        # Property indexes for common lookups
        "CREATE INDEX doc_title IF NOT EXISTS FOR (d:Document) ON (d.title)",
        "CREATE INDEX doc_created IF NOT EXISTS FOR (d:Document) ON (d.created_at)",
        "CREATE INDEX entity_type IF NOT EXISTS FOR (e:Entity) ON (e.type)",

        # Full-text indexes for search
        """CREATE FULLTEXT INDEX doc_content IF NOT EXISTS
           FOR (d:Document) ON EACH [d.title, d.content]""",

        # Relationship property indexes (for filtered traversals)
        "CREATE INDEX rel_created IF NOT EXISTS FOR ()-[r:MENTIONS]-() ON (r.created_at)",
    ]

    with db.driver.session() as session:
        for index in indexes:
            try:
                session.run(index)
            except Exception as e:
                print(f"Index creation note: {e}")
```

Repeated traversals are good caching candidates. Here is a TTL cache for a TypeScript service layer:

```typescript
import { createHash } from 'crypto';

interface CachedQuery {
  result: any;
  timestamp: Date;
  query: string;
}

class GraphQueryCache {
  private cache: Map<string, CachedQuery> = new Map();
  private readonly ttlMs: number;

  constructor(ttlMs: number = 300000) { // 5 minute default TTL
    this.ttlMs = ttlMs;
  }

  private hashQuery(query: string, params: Record<string, any>): string {
    const input = JSON.stringify({ query, params });
    return createHash('md5').update(input).digest('hex');
  }

  async executeWithCache<T>(
    query: string,
    params: Record<string, any>,
    executor: () => Promise<T>
  ): Promise<T> {
    const hash = this.hashQuery(query, params);

    // Check cache
    const cached = this.cache.get(hash);
    if (cached && (Date.now() - cached.timestamp.getTime()) < this.ttlMs) {
      return cached.result as T;
    }

    // Execute and cache
    const result = await executor();
    this.cache.set(hash, {
      result,
      timestamp: new Date(),
      query
    });

    return result;
  }

  invalidatePattern(pattern: string): void {
    // Drop any cached entry whose query text matches the pattern
    // (matching on the raw query rather than its hash, so patterns are meaningful)
    for (const [key, entry] of this.cache) {
      if (entry.query.includes(pattern)) {
        this.cache.delete(key);
      }
    }
  }
}
```

Monitor these metrics in production; a latency-sampling sketch for the first row follows the table:

| Metric | Target | Action if Exceeded |
|---|---|---|
| Query P95 latency | < 100ms | Add indexes, optimise queries |
| Traversal depth | < 4 hops avg | Review ontology, add shortcuts |
| Cache hit rate | > 60% | Increase cache TTL, warm cache |
| Memory usage | < 80% heap | Scale cluster, archive old data |
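A minimal sketch for tracking the P95 latency target; the percentile maths is standard, and the session wiring mirrors KnowledgeGraphDB above:

```python
import time
import numpy as np
from typing import List

class LatencyTracker:
    def __init__(self, window: int = 1000):
        self.samples: List[float] = []
        self.window = window

    def timed_run(self, session, query: str, **params):
        """Run a query and record its wall-clock latency in milliseconds."""
        start = time.perf_counter()
        result = list(session.run(query, **params))
        self.samples.append((time.perf_counter() - start) * 1000)
        self.samples = self.samples[-self.window:]  # keep a rolling window
        return result

    @property
    def p95_ms(self) -> float:
        return float(np.percentile(self.samples, 95)) if self.samples else 0.0

# Alert when the table's 100 ms target is exceeded:
# if tracker.p95_ms > 100: add indexes or optimise the offending queries
```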
Knowledge graphs transform how AI systems understand and navigate complex information. By explicitly modelling entities and relationships, you enable queries that pure vector search cannot handle—multi-hop reasoning, relationship-aware retrieval, and explainable connections between concepts.
The patterns in this guide—from ontology design through entity extraction to hybrid search—provide a foundation for building production knowledge graph systems. Start with a focused ontology, implement robust entity resolution, and integrate thoughtfully with your existing RAG pipeline for immediate improvements in retrieval quality.
As your graph grows, the value compounds. Each new document adds not just content but connections, and the system becomes increasingly capable of surfacing relevant information through relationship paths that would be invisible to traditional search.