Instructions to use nvidia/llama-embed-nemotron-8b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- sentence-transformers
How to use nvidia/llama-embed-nemotron-8b with sentence-transformers:
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("nvidia/llama-embed-nemotron-8b", trust_remote_code=True)

    sentences = [
        "The weather is lovely today.",
        "It's so sunny outside!",
        "He drove to the stadium."
    ]
    embeddings = model.encode(sentences)

    similarities = model.similarity(embeddings, embeddings)
    print(similarities.shape)
    # [3, 3]
- Transformers
How to use nvidia/llama-embed-nemotron-8b with Transformers:
    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("feature-extraction", model="nvidia/llama-embed-nemotron-8b", trust_remote_code=True)

    # Load model directly
    from transformers import AutoModel

    model = AutoModel.from_pretrained("nvidia/llama-embed-nemotron-8b", trust_remote_code=True, dtype="auto")
- Notebooks
- Google Colab
- Kaggle
Upstream transformers support with `use_bidirectional_attention`
#13
by michaelfeil - opened
Can you make a PR in transformers for use_bidirectional_attention for the llama arch?
@michaelfeil do you mean adding bidirectional support to Llama's architecture in transformers? Is there any issue with using the custom LlamaBidirectionalModel class?
Yeah, similar to Gemma3Embedding.
E.g., refer to the enum implementation in Text-embeddings-inference!
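For anyone following along, here is a rough sketch of what a flag like `use_bidirectional_attention` would toggle. A decoder-only Llama normally applies a causal mask, so each token sees only itself and earlier tokens, while an embedding model lets every token see every non-padding token. This is plain Python for illustration only, not the code from this repo, transformers, or Text-embeddings-inference; `build_attention_mask` and its parameters are hypothetical names.

```python
def build_attention_mask(padding_mask, bidirectional):
    """Return an n x n boolean matrix (hypothetical helper).

    Entry [i][j] is True when query token i may attend to key token j.
    padding_mask holds 1 for real tokens and 0 for padding.
    """
    n = len(padding_mask)
    mask = []
    for i in range(n):
        row = []
        for j in range(n):
            # Padding tokens are never attended to in either mode.
            visible = bool(padding_mask[j])
            if not bidirectional:
                # Causal mode: token i sees only positions j <= i.
                visible = visible and j <= i
            row.append(visible)
        mask.append(row)
    return mask


# Three real tokens followed by one padding token.
padding = [1, 1, 1, 0]
causal = build_attention_mask(padding, bidirectional=False)
full = build_attention_mask(padding, bidirectional=True)

for name, m in (("causal", causal), ("bidirectional", full)):
    print(name)
    for row in m:
        print("  " + " ".join("1" if v else "0" for v in row))
```

In the causal case the first token sees only itself; in the bidirectional case it also sees the two later real tokens, which is the behavior the custom class provides and the upstream flag would expose.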
Since it's merged here, https://huggingface.co/nvidia/llama-embed-nemotron-8b/commit/1acaf42b890bafa464ef9a58d1c0db0dd26120d4, I am closing.
michaelfeil changed discussion status to closed