**Essential Configuration:**
1. **Client Configuration:**
```python
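import chromadb
from chromadb.config import Settings

# Pick ONE of the clients below.

# In-memory (development; data is lost when the process exits)
client = chromadb.Client()

# Persistent (single machine; data is stored on disk)
client = chromadb.PersistentClient(path="/path/to/chroma/data")

# HTTP client (production; connects to a running Chroma server)
client = chromadb.HttpClient(
    host="localhost",
    port=8000,
    settings=Settings(
        # Provider path and setting names differ between Chroma releases.
        chroma_client_auth_provider="chromadb.auth.basic.BasicAuthClientProvider",
        # Placeholder only; do not hardcode real credentials.
        chroma_client_auth_credentials="admin:password",
    ),
)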
```
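The HTTP client example hardcodes `admin:password`. One way to avoid that is to read connection details from the environment. The sketch below is a minimal, standard-library-only illustration; the variable names `CHROMA_HOST`, `CHROMA_PORT`, `CHROMA_USER`, and `CHROMA_PASSWORD` and the helper `chroma_connection_from_env` are assumptions made for this example, not names that Chroma defines.

```python
import os


def chroma_connection_from_env(env=None):
    """Collect HTTP client connection settings from environment variables.

    The variable names used here are hypothetical; choose whatever fits
    your deployment.
    """
    env = os.environ if env is None else env
    user = env.get("CHROMA_USER")
    password = env.get("CHROMA_PASSWORD")
    if not user or not password:
        raise RuntimeError("CHROMA_USER and CHROMA_PASSWORD must be set")
    return {
        "host": env.get("CHROMA_HOST", "localhost"),
        "port": int(env.get("CHROMA_PORT", "8000")),
        # Basic auth credentials use the "user:password" form.
        "credentials": f"{user}:{password}",
    }


# Example with an explicit mapping instead of the real environment.
conn = chroma_connection_from_env(
    {"CHROMA_USER": "admin", "CHROMA_PASSWORD": "s3cret", "CHROMA_PORT": "9000"}
)
```

The returned `host` and `port` can be passed to `chromadb.HttpClient`, and `credentials` to the `chroma_client_auth_credentials` setting.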
2. **Collection Management:**
```python
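from chromadb.utils import embedding_functions

# Create a collection with a custom embedding function
collection = client.create_collection(
    name="my_collection",
    embedding_function=embedding_functions.OpenAIEmbeddingFunction(
        api_key="your-openai-key",
        model_name="text-embedding-3-small",
    ),
    metadata={"description": "My vector collection"},
)

# Get or create a collection. If the collection was created with a non-default
# embedding function, pass the same embedding_function here as well
# (some Chroma versions do not persist it with the collection).
collection = client.get_or_create_collection(name="my_collection")

# Delete a collection (removes all of its data)
client.delete_collection(name="my_collection")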
```
3. **Adding Documents:**
```python
collection.add(
    documents=["Document 1 text", "Document 2 text", "Document 3 text"],
    metadatas=[
        {"source": "web", "date": "2024-01-01"},
        {"source": "pdf", "date": "2024-01-02"},
        {"source": "api", "date": "2024-01-03"},
    ],
    ids=["id1", "id2", "id3"],
)

# Add with precomputed embeddings. IDs must be unique within the collection,
# and every embedding must match the collection's dimensionality
# (the 3-dimensional vectors here are illustrative only).
collection.add(
    embeddings=[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    documents=["doc1", "doc2"],
    ids=["id4", "id5"],
)
```
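Because IDs must be unique within a collection, hand-written IDs like `id1` become error-prone as the data grows. A common approach is to derive each ID from the document text. The following is a small sketch of that idea using only the standard library; `make_document_ids` and the `doc` prefix are hypothetical names chosen for illustration.

```python
import hashlib


def make_document_ids(documents, prefix="doc"):
    """Derive a stable ID for each document from a hash of its text."""
    ids = []
    seen = set()
    for text in documents:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]
        doc_id = f"{prefix}-{digest}"
        if doc_id in seen:
            # Identical texts hash to the same ID; surface that early.
            raise ValueError(f"duplicate document text yields duplicate ID: {doc_id}")
        seen.add(doc_id)
        ids.append(doc_id)
    return ids


documents = ["Document 1 text", "Document 2 text", "Document 3 text"]
ids = make_document_ids(documents)
# ids can then be passed along with the texts:
# collection.add(documents=documents, ids=ids)
```

Since the same text always produces the same ID, re-running an ingestion job does not create new IDs for unchanged documents.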
4. **Querying with Filters:**
```python
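# Semantic search
results = collection.query(
    query_texts=["What is RAG?"],
    n_results=10,
)

# With metadata and document filtering
results = collection.query(
    query_texts=["What is RAG?"],
    n_results=10,
    where={"source": "web"},  # Exact match on a metadata field
    where_document={"$contains": "retrieval"},  # Substring match on document text
)

# Complex filters
# Range operators ($gt, $gte, $lt, $lte) compare numeric values only, so this
# assumes each document also stores a numeric "timestamp" metadata field
# (Unix epoch seconds) instead of relying on the ISO date strings.
results = collection.query(
    query_texts=["machine learning"],
    n_results=5,
    where={
        "$and": [
            {"source": {"$in": ["web", "pdf"]}},
            {"timestamp": {"$gte": 1704067200}},  # 2024-01-01T00:00:00Z
        ]
    },
)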
```
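When filters are assembled at runtime, the number of conditions varies, and `$and` is meant for combining two or more sub-expressions. The helper below is one possible way to handle that: it returns `None`, a single condition, or an `$and` expression depending on how many conditions it receives. `build_where` is a hypothetical name, not part of the Chroma API.

```python
def build_where(*conditions):
    """Combine metadata conditions into a single `where` filter.

    Returns None for no conditions, the condition itself for exactly one,
    and an "$and" expression for two or more. Empty or None conditions
    are skipped.
    """
    active = [c for c in conditions if c]
    if not active:
        return None
    if len(active) == 1:
        return active[0]
    return {"$and": active}


where = build_where(
    {"source": {"$in": ["web", "pdf"]}},
    {"date": {"$ne": "2024-01-03"}},
)
# where can then be passed as collection.query(..., where=where)
```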
5. **Embedding Functions:**
```python
# OpenAI embeddings
from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction

embedding_fn = OpenAIEmbeddingFunction(
    api_key="your-key",
    model_name="text-embedding-3-small",
)

# Sentence Transformers (local; requires the sentence-transformers package)
from chromadb.utils.embedding_functions import SentenceTransformerEmbeddingFunction

embedding_fn = SentenceTransformerEmbeddingFunction(
    model_name="all-MiniLM-L6-v2"
)

# Cohere embeddings (requires the cohere package)
from chromadb.utils.embedding_functions import CohereEmbeddingFunction

embedding_fn = CohereEmbeddingFunction(
    api_key="your-cohere-key",
    model_name="embed-english-v3.0",
)

# Custom embedding function
from chromadb import Documents, EmbeddingFunction, Embeddings


class MyEmbeddingFunction(EmbeddingFunction):
    def __call__(self, input: Documents) -> Embeddings:
        # Replace with your embedding logic; return one vector per input text.
        embeddings = [[0.0, 0.0, 0.0] for _ in input]  # placeholder vectors
        return embeddings
```
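To make the call contract of a custom embedding function concrete (a list of texts in, one fixed-length vector of floats per text out), here is a toy, dependency-free sketch. It is a plain class rather than a subclass of Chroma's `EmbeddingFunction` so that it runs without Chroma installed; the class name `HashingEmbeddingFunction` and the token-hashing scheme are assumptions for illustration and carry no semantic meaning.

```python
import hashlib
import math


class HashingEmbeddingFunction:
    """Toy embedder: counts tokens hashed into one of `dim` buckets.

    Not semantically meaningful; it only illustrates the expected shape
    of inputs and outputs.
    """

    def __init__(self, dim=8):
        self.dim = dim

    def __call__(self, input):
        embeddings = []
        for text in input:
            vector = [0.0] * self.dim
            for token in text.lower().split():
                digest = hashlib.md5(token.encode("utf-8")).digest()
                vector[digest[0] % self.dim] += 1.0
            # L2-normalize so vectors are comparable across text lengths.
            norm = math.sqrt(sum(v * v for v in vector))
            if norm > 0:
                vector = [v / norm for v in vector]
            embeddings.append(vector)
        return embeddings


embed = HashingEmbeddingFunction(dim=8)
vectors = embed(["What is RAG?", "machine learning"])
```

In a real project the same `__call__` body would live in a subclass of `chromadb.EmbeddingFunction` and delegate to an actual model.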
6. **Update and Delete:**
```python
# Update documents
collection.update(
    ids=["id1"],
    documents=["Updated document text"],
    metadatas=[{"source": "updated"}],
)

# Delete by IDs
collection.delete(ids=["id1", "id2"])

# Delete by filter
collection.delete(where={"source": "old"})
```
7. **Environment Variables (Server):**
```bash
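# Authentication (variable names and provider paths differ between Chroma
# releases; confirm them against the docs for your installed version)
export CHROMA_SERVER_AUTH_CREDENTIALS="admin:password"  # placeholder
export CHROMA_SERVER_AUTH_PROVIDER="chromadb.auth.basic.BasicAuthServerProvider"
# Persistence
export IS_PERSISTENT=TRUE
export PERSIST_DIRECTORY=/path/to/data
# Server port
export CHROMA_SERVER_HTTP_PORT=8000
# Telemetry
export CHROMA_OTEL_COLLECTION_ENDPOINT=""  # Empty: no OpenTelemetry trace export
export ANONYMIZED_TELEMETRY=FALSE          # Disable anonymized usage telemetry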
```
8. **Performance Optimization:**
```python
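# Batch operations: one add() call per batch instead of one call per document.
# Very large lists can exceed the client's maximum batch size; split them if so.
collection.add(
    documents=large_document_list,
    ids=large_id_list,
    metadatas=large_metadata_list,
)

# Use a persistent client so data is stored on disk and survives restarts
client = chromadb.PersistentClient(path="./chroma_data")

# Keep n_results small; fewer results means less data to rank and return
results = collection.query(query_texts=["query"], n_results=10)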
```
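For lists too large to send in a single `add()` call, the data can be split into fixed-size batches. The generator below is a minimal sketch of that; `iter_batches` and the default `batch_size=1000` are assumptions for illustration, and the right batch size depends on your Chroma version and deployment.

```python
def iter_batches(ids, documents, metadatas=None, batch_size=1000):
    """Yield keyword-argument dicts covering the input lists in order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    if len(ids) != len(documents):
        raise ValueError("ids and documents must have the same length")
    if metadatas is not None and len(metadatas) != len(ids):
        raise ValueError("metadatas must match ids in length")
    for start in range(0, len(ids), batch_size):
        end = start + batch_size
        yield {
            "ids": ids[start:end],
            "documents": documents[start:end],
            "metadatas": None if metadatas is None else metadatas[start:end],
        }


sample_ids = [f"id{i}" for i in range(5)]
sample_docs = [f"Document {i} text" for i in range(5)]
batches = list(iter_batches(sample_ids, sample_docs, batch_size=2))
# Each batch can be sent with: collection.add(**batch)
```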
**Best Practices:**
- Use persistent or server mode for production (not in-memory)
- Batch document additions for better performance
- Choose appropriate embedding models (size vs accuracy trade-off)
- Store the fields you frequently filter on as metadata, with consistent types (range operators require numeric values)
- Use meaningful collection names and metadata
- Implement proper authentication in server mode
- Back up the persistence directory (for example `./chroma_data`) regularly
- Monitor embedding generation costs (API-based models)
- Use local embedding models (Sentence Transformers) for cost savings
- Implement retry logic for API-based embedding functions
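The retry advice for API-based embedding functions can be sketched as a small wrapper with exponential backoff. This is an illustrative helper under stated assumptions, not something Chroma provides: `call_with_retry`, its parameters, and the `flaky_embed` stand-in are all hypothetical. It catches every `Exception` for brevity; in practice you would likely narrow that to the transient errors your embedding provider raises.

```python
import time


def call_with_retry(func, *args, attempts=3, base_delay=0.5, sleep=time.sleep, **kwargs):
    """Call func, retrying on exceptions with exponential backoff.

    Waits base_delay, then 2 * base_delay, and so on between attempts.
    Re-raises the last exception once all attempts are used.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func(*args, **kwargs)
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            sleep(base_delay * (2 ** attempt))


# Stand-in for an embedding call that fails twice before succeeding.
calls = {"count": 0}


def flaky_embed(texts):
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient failure")
    return [[0.0] for _ in texts]


delays = []
result = call_with_retry(flaky_embed, ["What is RAG?"], sleep=delays.append)
```

Passing `sleep=delays.append` records the backoff delays instead of actually waiting, which keeps the example fast; omit it to use `time.sleep`.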