Upstream transformers support with `use_bidirectional_attention`

#13

by michaelfeil - opened Jan 23

Discussion

michaelfeil

Jan 23

Can you open a PR in transformers that adds `use_bidirectional_attention` for the Llama architecture?

ybabakhin

NVIDIA org Jan 26

@michaelfeil do you mean adding bidirectional support to Llama's architecture in transformers? Is there any issue with using the custom `LlamaBidirectionalModel` class?

michaelfeil

Jan 27

Yeah, similar to Gemma3Embedding.

michaelfeil

Jan 27 • edited Jan 27

E.g., refer to the enum implementation in text-embeddings-inference.

michaelfeil

Jan 27

Since it's merged here, https://huggingface.co/nvidia/llama-embed-nemotron-8b/commit/1acaf42b890bafa464ef9a58d1c0db0dd26120d4, I am closing.

michaelfeil changed discussion status to closed Jan 27
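
For readers unfamiliar with the flag discussed in this thread, here is a minimal sketch of what a `use_bidirectional_attention` switch conceptually toggles: a causal mask versus a full (bidirectional) mask. This is not the code from transformers, text-embeddings-inference, or the linked commit; the function name `build_attention_mask` is hypothetical, and padding handling is left out.

```python
def build_attention_mask(seq_len, use_bidirectional_attention=False):
    """Return a seq_len x seq_len mask; True means query i may attend to key j.

    Hypothetical helper for illustration only (not a transformers API).
    Padding tokens are ignored in this sketch.
    """
    if use_bidirectional_attention:
        # Every token can attend to every other token (encoder-style).
        return [[True] * seq_len for _ in range(seq_len)]
    # Causal (decoder-style): token i attends only to positions j <= i.
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


causal = build_attention_mask(3)
bidirectional = build_attention_mask(3, use_bidirectional_attention=True)
```

With the flag off, the mask is lower-triangular; with it on, every entry is allowed. Bidirectional attention is a common choice for embedding models, since each token's representation can then draw on the whole input.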
