Bootstrapping Audio Embeddings from Multimodal LLMs
Turn any multimodal LLM into a small audio embedding model that beats CLAP with 25x less data.
A tiny transformer that fingerprints embedding models by reading raw numerical digits. No feature engineering.
Two sub-1B multilingual embeddings with best-in-class performance, available on Elastic Inference Service, llama.cpp, and MLX.
New 2B vision language model achieves SOTA on multilingual VQA with no catastrophic forgetting on text-only tasks.
New 0.6B-parameter listwise reranker that considers the query and all candidate documents in a single context window.
Embedding models aren't the most glamorous aspect of the AI industry, but image generators and chatbots couldn't exist without them.
We brought multimodal embeddings to llama.cpp and GGUF, and uncovered a few surprising issues along the way.
Code generation LLMs → code embeddings: 0.5B/1.5B models achieve SOTA performance across 25 code retrieval benchmarks.
Jina MCP streamlines agent development by connecting our APIs to any LLM, reducing custom code and improving workflow reliability.
4000 tokens/sec for a 3B-parameter embedding model on L4 GPU is probably as fast as you'll get with llama.cpp. Or is it?
Sharing what we saw and learned at SIGIR 2025, feat. CLIP-AdaM, RE-AdaptIR and evaluations for LLM-based retrieval systems.
Image resolution is crucial for embedding visually rich documents. Too small and models miss key details; too large and they can't connect the parts.
Press
JinaVDR is a new benchmark spanning 95 tasks across 20 languages for visual document retrieval, soon on MTEB.
Tech Blog
While others rely on prompt tuning and hope for the best, submodular optimization gives you a principled framework with theoretical guarantees for better context engineering.
Tech Blog
Many know the importance of query diversity in DeepResearch, but few know how to solve it rigorously via submodular optimization.
Tech Blog
Quantization gives smaller embeddings. We show that fine-tuned quantization gives you near-lossless embeddings, too.
Press
Jina Embeddings v4 is a 3.8 billion parameter universal embedding model for multimodal and multilingual retrieval that supports both single-vector and multi-vector embedding outputs.
Tech Blog
As serious as we are about MTEB, we also love vibe-testing. Correlations is a simple GUI we use for validating citations in DeepSearch, debugging late chunking, and vibe-testing embeddings. Now it's open-source.
Events
We collect some of the most interesting papers from ICLR 2025, featuring TIPS, FlexPrefill, Zero-Shot Rerankers, SVD-LLM, Hymba, and more.
Tech Blog
Text similarity: 0.7. Image similarity: 0.5. Which document is more relevant? You literally cannot tell—and that's the core problem breaking multimodal search. We solve it with unified reranking.
Tech Blog
Boost robustness and performance with model soups: averaging weights. No extra cost, better results.
Tech Blog
Size bias refers to how the length of text inputs affects similarity, regardless of semantic relevance. It explains why search systems sometimes return long, barely-relevant documents instead of shorter, more precise matches to your query.
Press
Introducing jina-reranker-m0, our new multilingual multimodal reranker for retrieving visual documents, with SOTA performance on multilingual long-document and code search tasks.
Tech Blog
Standard LLM or reasoning model: which is better for DeepSearch? In this post, we explore using DeepSeek-R1 in our DeepSearch implementation to choose the next action.