# qmd-ollama

An Ollama backend for qmd that enables vector embeddings and semantic search via Ollama's API.
qmd ships only with the llama_cpp and sentence_tf backends, both of which require heavy dependencies (llama-cpp-python or PyTorch). This package adds an Ollama backend that uses your existing Ollama server, with zero extra dependencies beyond `requests`.
## Features

- Embeddings via Ollama's `/api/embed` endpoint (any embedding model)
- Cosine-similarity reranking using the same embedding model
- Simple query expansion fallback (no LLM generation needed)
- Auto-patches qmd on import: just `pip install` and it works
- Configurable via environment variables
## Installation

```sh
pip install git+https://github.com/akshaydeshraj/qmd-ollama.git
```

## Configuration

Set these environment variables:
| Variable | Default | Description |
|---|---|---|
| `OLLAMA_HOST` | `http://localhost:11434` | Ollama server URL |
| `QMD_OLLAMA_MODEL` | `qwen3-embedding:0.6b` | Embedding model name |
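As a rough illustration of how these variables drive the request, here is a sketch that builds an `/api/embed` call from the env-configured defaults. The helper name `embed_request` is hypothetical (not part of this package), and the payload shape (`model` plus `input`) follows Ollama's documented embed API:

```python
import os

def embed_request(texts, host=None, model=None):
    """Build the Ollama /api/embed request as (url, payload).

    Defaults come from the same environment variables the package reads.
    """
    host = host or os.environ.get("OLLAMA_HOST", "http://localhost:11434")
    model = model or os.environ.get("QMD_OLLAMA_MODEL", "qwen3-embedding:0.6b")
    return f"{host}/api/embed", {"model": model, "input": texts}

# Sending it requires a running Ollama server, e.g.:
#   import requests
#   url, payload = embed_request(["hello world"])
#   vectors = requests.post(url, json=payload, timeout=60).json()["embeddings"]
```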
## Usage

After installation, qmd automatically picks up the Ollama backend:
```sh
export OLLAMA_HOST=http://your-ollama-server:11434
qmd embed
qmd search "your query"
qmd query "your query"
```

The package monkey-patches qmd's `create_llm_backend()` to try the Ollama backend first, before falling back to llama_cpp/sentence_tf.
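A minimal sketch of what such an import-time patch can look like. The names here (`patch_backend_factory`, the exception-based fallback) are illustrative assumptions, not this package's actual implementation:

```python
import functools

def patch_backend_factory(module, ollama_factory):
    """Wrap module.create_llm_backend so the Ollama backend is tried first.

    If ollama_factory raises (e.g. the server is unreachable), fall back
    to the original factory (llama_cpp / sentence_tf).
    """
    original = module.create_llm_backend

    @functools.wraps(original)
    def create_llm_backend(*args, **kwargs):
        try:
            return ollama_factory(*args, **kwargs)  # try Ollama first
        except Exception:
            return original(*args, **kwargs)        # fall back to built-in backends
    module.create_llm_backend = create_llm_backend
```

Running this once at import time is what makes `pip install` alone sufficient: the patched factory is in place before qmd selects a backend.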
## How it works

The Ollama backend:

- Calls `POST /api/embed` for embeddings
- Uses cosine similarity between query/document embeddings for reranking
- Returns simple lexical + vector query expansions (no LLM generation)
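The cosine-similarity reranking step can be sketched as follows (a generic implementation of the standard formula, not this package's exact code; `rerank` and its `(doc_id, embedding)` input shape are assumptions):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rerank(query_vec, docs):
    """Sort (doc_id, embedding) pairs by similarity to the query, best first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in docs]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Because the same embedding model produces both query and document vectors, no separate reranker model is needed.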
## Recommended models

- `qwen3-embedding:0.6b` (recommended: SOTA for its size, 1024 dimensions, 32K context)
- `nomic-embed-text` (works, but an older generation)

Any Ollama embedding model should work.
## License

MIT