Bootstrapping Audio Embeddings from Multimodal LLMs
Turn any multimodal LLM into a small audio embedding model that beats CLAP with 25x less data.
A tiny transformer that fingerprints embedding models by reading raw numerical digits. No feature engineering.
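To make the digit-reading idea concrete, here is a minimal, hypothetical sketch: render each embedding value as decimal text, tokenize it character by character, and let a tiny transformer classify which model produced the vector. The vocabulary, sizes, and architecture below are illustrative assumptions, not the post's actual recipe.

```python
# Hypothetical sketch: guess which model produced an embedding by feeding
# its raw decimal digits to a tiny transformer classifier. All sizes and
# names here are illustrative assumptions, not the post's actual recipe.
import torch
import torch.nn as nn

VOCAB = "0123456789.- "                      # digits, point, sign, separator
CHAR2ID = {c: i for i, c in enumerate(VOCAB)}
MAX_LEN = 512

def digits_of(vec: torch.Tensor, precision: int = 4) -> torch.Tensor:
    """Render an embedding as a digit string, then map characters to token ids."""
    text = " ".join(f"{x:.{precision}f}" for x in vec.tolist())[:MAX_LEN]
    ids = [CHAR2ID[c] for c in text]
    ids += [CHAR2ID[" "]] * (MAX_LEN - len(ids))   # pad to a fixed length
    return torch.tensor(ids)

class DigitFingerprinter(nn.Module):
    def __init__(self, n_models: int, d: int = 64):
        super().__init__()
        self.embed = nn.Embedding(len(VOCAB), d)
        self.pos = nn.Parameter(torch.zeros(MAX_LEN, d))
        layer = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d, n_models)           # which model made this vector?

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        h = self.embed(token_ids) + self.pos[: token_ids.size(1)]
        return self.head(self.encoder(h).mean(dim=1))   # mean-pool, then classify

# Usage: classify a random 128-dim "embedding" among 5 candidate source models.
model = DigitFingerprinter(n_models=5)
tokens = digits_of(torch.randn(128)).unsqueeze(0)
print(model(tokens).shape)                            # torch.Size([1, 5])
```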
Two sub-1B multilingual embedding models with best-in-class performance, available on Elastic Inference Service, llama.cpp, and MLX.
New 2B vision-language model achieves SOTA on multilingual VQA with no catastrophic forgetting on text-only tasks.
New 0.6B-parameter listwise reranker that considers the query and all candidate documents in a single context window.
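To illustrate what "listwise in a single context window" means, here is a minimal sketch that packs the query and all candidates into one prompt, so the model can weigh documents against each other instead of scoring each in isolation. The template and identifiers are illustrative assumptions, not the reranker's actual training format.

```python
# Sketch of the listwise setup: the query and every candidate share one
# context window. The prompt template below is an illustrative assumption,
# not the reranker's actual input format.
def build_listwise_prompt(query: str, docs: list[str]) -> str:
    lines = [f"Query: {query}", "",
             "Rank the passages below from most to least relevant."]
    for i, doc in enumerate(docs, start=1):
        lines.append(f"[{i}] {doc}")                  # each candidate gets an id
    lines.append("Ranking (identifiers only, best first):")
    return "\n".join(lines)

prompt = build_listwise_prompt(
    "how do transformers handle long context?",
    ["Attention cost grows quadratically with sequence length.",
     "Bananas are rich in potassium.",
     "Sparse attention reduces the cost of long sequences."],
)
print(prompt)
```

A pointwise reranker scores each passage without ever seeing the others; packing them into one context lets the model trade candidates off directly.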
Embedding models aren't the most glamorous aspect of the AI industry, but image generators and chatbots couldn't exist without them.
We brought multimodal embeddings to llama.cpp and GGUF, and uncovered a few surprising issues along the way.
Code generation LLMs → code embeddings: 0.5B/1.5B models achieve SOTA performance across 25 code retrieval benchmarks.
Jina MCP streamlines agent development by connecting our APIs to any LLM, reducing custom glue code and improving workflow reliability.
4000 tokens/sec for a 3B-parameter embedding model on an L4 GPU is probably as fast as you'll get with llama.cpp. Or is it?
Sharing what we saw and learned at SIGIR 2025, feat. CLIP-AdaM, RE-AdaptIR and evaluations for LLM-based retrieval systems.
Image resolution is crucial for embedding visually rich documents. Too small and models miss key details; too large and they can't connect the parts.
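The tradeoff shows up directly in token counts: a back-of-envelope sketch, assuming a patch size of 14 as is common in ViT-style encoders (an illustrative choice, not any specific model's configuration).

```python
# Back-of-envelope sketch: vision-token count grows quadratically with image
# side length. Patch size 14 is a common ViT-style choice, assumed here for
# illustration rather than taken from any specific model.
PATCH = 14

for side in (224, 448, 896, 1792):
    tokens = (side // PATCH) ** 2
    print(f"{side}x{side} -> {tokens:5d} vision tokens")

# 224x224   ->   256 tokens: cheap, but fine print in a scanned page is lost.
# 1792x1792 -> 16384 tokens: detail survives, yet attention over one page is
# now expensive and connecting distant parts gets harder.
```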