# Providers: LLM & Embedder Configuration
GraphRAG SDK uses two provider types: LLM (for text generation and extraction) and Embedder (for vector embeddings). Both are abstract base classes with built-in implementations for LiteLLM and OpenRouter.
## Provider Overview

| Provider | LLM Class | Embedder Class | Install Extra | Models Supported |
|---|---|---|---|---|
| LiteLLM | `LiteLLM` | `LiteLLMEmbedder` | `pip install graphrag-sdk[litellm]` | Azure OpenAI, OpenAI, Anthropic, Cohere, 100+ |
| OpenRouter | `OpenRouterLLM` | `OpenRouterEmbedder` | `pip install graphrag-sdk[openrouter]` | All OpenRouter models |
| Custom | Subclass `LLMInterface` | Subclass `Embedder` | -- | Anything |
## LiteLLM (Recommended)

LiteLLM provides a unified interface to 100+ LLM providers. It is the recommended default.

### LLM
```python
from graphrag_sdk import LiteLLM

# Azure OpenAI
llm = LiteLLM(
    model="azure/gpt-4.1",
    api_key="your-azure-key",
    api_base="https://your-resource.openai.azure.com/",
    api_version="2024-12-01-preview",
    temperature=0.0,  # default: 0.0
    max_tokens=None,  # default: None (provider default)
)

# OpenAI direct
llm = LiteLLM(
    model="gpt-4o",
    api_key="your-openai-key",
)

# Anthropic
llm = LiteLLM(
    model="anthropic/claude-sonnet-4-20250514",
    api_key="your-anthropic-key",
)
```
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `model` | `str` | required | Model identifier (use `provider/model` format for non-OpenAI) |
| `api_key` | `str \| None` | `None` | API key (or set via environment variable) |
| `api_base` | `str \| None` | `None` | Base URL (required for Azure) |
| `api_version` | `str \| None` | `None` | API version (required for Azure) |
| `temperature` | `float` | `0.0` | Sampling temperature |
| `max_tokens` | `int \| None` | `None` | Max output tokens |
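Since `LiteLLM` implements the `LLMInterface` described below, a configured instance is called the same way as any other provider. A minimal usage sketch (the prompt and output handling here are illustrative):

```python
llm = LiteLLM(model="gpt-4o", api_key="your-openai-key")

# invoke() returns an LLMResponse; the generated text is on .content
response = llm.invoke("Summarize the difference between a graph and a tree.")
print(response.content)
```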
### Embedder
```python
from graphrag_sdk import LiteLLMEmbedder

# Azure OpenAI
embedder = LiteLLMEmbedder(
    model="azure/text-embedding-ada-002",
    api_key="your-azure-key",
    api_base="https://your-resource.openai.azure.com/",
    api_version="2024-12-01-preview",
)

# OpenAI direct
embedder = LiteLLMEmbedder(
    model="text-embedding-3-small",
    api_key="your-openai-key",
)
```
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `model` | `str` | required | Embedding model identifier |
| `api_key` | `str \| None` | `None` | API key |
| `api_base` | `str \| None` | `None` | Base URL |
| `api_version` | `str \| None` | `None` | API version |
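The embedder follows the `Embedder` interface described below, so single texts go through `embed_query()`. A minimal sketch:

```python
embedder = LiteLLMEmbedder(model="text-embedding-3-small", api_key="your-openai-key")

# embed_query() returns the embedding as a list[float]
vector = embedder.embed_query("graph databases")
print(len(vector))  # dimensionality depends on the model
```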
## OpenRouter

OpenRouter aggregates models from multiple providers behind a single API.

### LLM
```python
from graphrag_sdk import OpenRouterLLM

llm = OpenRouterLLM(
    model="anthropic/claude-sonnet-4-20250514",
    api_key="your-openrouter-key",  # or set OPENROUTER_API_KEY env var
    temperature=0.0,
    max_tokens=None,
    extra_headers={},  # optional custom headers
)
```
### Embedder
```python
from graphrag_sdk import OpenRouterEmbedder

embedder = OpenRouterEmbedder(
    model="openai/text-embedding-ada-002",
    api_key="your-openrouter-key",
    extra_headers={},
)
```
## LLMInterface ABC

To integrate a provider not covered by LiteLLM or OpenRouter, subclass `LLMInterface`.

### Required Method
```python
from graphrag_sdk import LLMInterface
from graphrag_sdk.core.models import LLMResponse

class MyLLM(LLMInterface):
    def __init__(self, model_name: str = "my-model", **kwargs):
        super().__init__(model_name=model_name)
        # Initialize your client

    def invoke(self, prompt: str, **kwargs) -> LLMResponse:
        """Synchronous text generation. REQUIRED."""
        response = my_client.generate(prompt)
        return LLMResponse(content=response.text)
```
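Once the wrapped client is in place, the subclass is used like any built-in provider (`MyLLM` and its backing `my_client` are the placeholders from above):

```python
llm = MyLLM(model_name="my-model")

response = llm.invoke("Extract the entities from this sentence.")
print(response.content)  # LLMResponse carries the generated text
```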
### Optional Overrides

| Method | Default Behavior | Override When |
|---|---|---|
| `ainvoke(prompt, max_retries=3)` | Runs `invoke()` in a thread pool with retry | You have a native async client |
| `ainvoke_messages(messages, max_retries=3)` | Concatenates messages into a single prompt and calls `ainvoke()` | You have a native multi-turn chat API |
| `invoke_with_model(prompt, response_model)` | Calls `invoke()` and parses JSON into a Pydantic model | Your provider has native structured output |
| `ainvoke_with_model(prompt, response_model)` | Calls `ainvoke()` and parses JSON | Same, async version |
| `abatch_invoke(prompts, max_concurrency)` | Concurrent `ainvoke()` with semaphore | You have a native batch API |
`ainvoke_messages()` is called by `completion()` when conversation history is provided. Override it to pass messages natively to your LLM's chat API for proper multi-turn handling:
```python
from graphrag_sdk.core.models import ChatMessage, LLMResponse

class MyLLM(LLMInterface):
    def invoke(self, prompt: str, **kwargs) -> LLMResponse:
        response = my_client.generate(prompt)
        return LLMResponse(content=response.text)

    async def ainvoke_messages(self, messages: list[ChatMessage], *, max_retries=3, **kwargs) -> LLMResponse:
        """Native multi-turn -- pass messages directly to your chat API."""
        response = await my_client.chat(
            messages=[m.to_dict() for m in messages],
        )
        return LLMResponse(content=response.text)
```
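The same pattern applies to structured output: if your provider can constrain generation to a schema, overriding `invoke_with_model()` skips the default JSON-parsing round trip. A sketch, under the assumptions that the method returns the validated Pydantic instance and that `my_client.generate_structured()` stands in for your provider's schema-constrained endpoint:

```python
from typing import Type, TypeVar

from pydantic import BaseModel

T = TypeVar("T", bound=BaseModel)

class MyLLM(LLMInterface):
    # invoke() as in the Required Method example...

    def invoke_with_model(self, prompt: str, response_model: Type[T], **kwargs) -> T:
        # Hypothetical native structured-output call -- replace with your
        # provider's equivalent.
        raw = my_client.generate_structured(
            prompt, schema=response_model.model_json_schema()
        )
        return response_model.model_validate_json(raw)
```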
### Constructor Parameters

```python
LLMInterface.__init__(
    model_name: str,                   # Model identifier
    model_params: dict | None = None,  # Provider-specific params
    max_concurrency: int = 12,         # Parallel call limit for abatch_invoke
)
```
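Subclasses forward these through `super().__init__()`. For example, to raise the parallelism cap used by `abatch_invoke()` (the values here are illustrative):

```python
class MyLLM(LLMInterface):
    # invoke() as in the Required Method example...

    def __init__(self, model_name: str = "my-model", **kwargs):
        super().__init__(
            model_name=model_name,
            model_params={"top_p": 0.9},  # provider-specific params, illustrative
            max_concurrency=24,           # allow more concurrent batch calls
        )
```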
## Embedder ABC

### Required Methods
```python
from graphrag_sdk import Embedder

class MyEmbedder(Embedder):
    @property
    def model_name(self) -> str:
        """Identifier for the embedding model. REQUIRED."""
        return "my-embedding-model"

    def embed_query(self, text: str, **kwargs) -> list[float]:
        """Embed a single text. REQUIRED."""
        return my_model.encode(text).tolist()
```
The `model_name` property is used by the graph config node to validate that the same embedding model is used for ingestion and retrieval.
### Optional Overrides

| Method | Default Behavior | Override When |
|---|---|---|
| `aembed_query(text)` | Runs `embed_query()` in a thread pool | You have async embedding |
| `embed_documents(texts)` | Sequential `embed_query()` per text | You can batch embeddings |
| `aembed_documents(texts)` | Runs `embed_documents()` in a thread pool | You have async batch |
### Batch Embedding

The `embed_documents()` and `aembed_documents()` methods are critical for performance: the ingestion pipeline calls them with hundreds or thousands of texts. If your provider supports batch embedding, always override these methods:
```python
class MyEmbedder(Embedder):
    def embed_query(self, text: str, **kwargs) -> list[float]:
        return self.model.encode(text).tolist()

    def embed_documents(self, texts: list[str], **kwargs) -> list[list[float]]:
        # Batch embedding -- much faster than sequential
        return self.model.encode(texts).tolist()
```
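If the backend also exposes an async batch endpoint, override `aembed_documents()` the same way to avoid the default thread-pool fallback. A sketch, where `my_async_client.embed_batch()` is a stand-in for your provider's async API:

```python
class MyEmbedder(Embedder):
    # embed_query() / embed_documents() as above...

    async def aembed_documents(self, texts: list[str], **kwargs) -> list[list[float]]:
        # Hypothetical async batch call
        vectors = await my_async_client.embed_batch(texts)
        return [list(v) for v in vectors]
```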
## Environment Variables
For convenience, you can configure providers via environment variables instead of passing parameters directly. LiteLLM respects standard environment variables:
| Variable | Provider |
|---|---|
| `OPENAI_API_KEY` | OpenAI |
| `AZURE_API_KEY`, `AZURE_API_BASE`, `AZURE_API_VERSION` | Azure OpenAI |
| `ANTHROPIC_API_KEY` | Anthropic |
| `COHERE_API_KEY` | Cohere |
| `OPENROUTER_API_KEY` | OpenRouter |
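With the relevant variables exported, the explicit credential arguments can be dropped:

```python
from graphrag_sdk import LiteLLM, LiteLLMEmbedder

# Assumes OPENAI_API_KEY is set in the environment (e.g. exported in your shell)
llm = LiteLLM(model="gpt-4o")
embedder = LiteLLMEmbedder(model="text-embedding-3-small")
```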
See the LiteLLM documentation for the full list of supported providers and their environment variables.
## Choosing a Provider

| Use Case | Recommendation |
|---|---|
| Production (Azure) | LiteLLM with `azure/` prefix |
| Development (OpenAI) | LiteLLM with OpenAI models |
| Budget-conscious | `OpenRouterLLM` for model price comparison |
| Local models | Custom `LLMInterface` wrapping Ollama, vLLM, etc. |
| Local embeddings | Custom `Embedder` wrapping sentence-transformers (see the sketch below) |
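For the local-embeddings row above, a minimal sketch wrapping `sentence-transformers` (assumes `pip install sentence-transformers`; the model choice is illustrative):

```python
from graphrag_sdk import Embedder
from sentence_transformers import SentenceTransformer

class LocalEmbedder(Embedder):
    def __init__(self, model: str = "all-MiniLM-L6-v2"):
        self._model_name = model
        self.model = SentenceTransformer(model)

    @property
    def model_name(self) -> str:
        return self._model_name

    def embed_query(self, text: str, **kwargs) -> list[float]:
        return self.model.encode(text).tolist()

    def embed_documents(self, texts: list[str], **kwargs) -> list[list[float]]:
        # encode() accepts a list of texts and batches natively
        return self.model.encode(texts).tolist()
```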