
My binary vector search is better than your FP32 vectors

Binary Vector Search: The 30x Memory Reduction Revolution with Preserved Accuracy


Within the field of vector search, an intriguing development has arisen: binary vector search. This approach shows promise in tackling the long-standing issue of memory consumption by achieving a remarkable 30x reduction. However, a critical aspect that sparks debate is its effect on accuracy.

We believe that binary vector search, combined with specific optimization techniques, can maintain comparable accuracy. To provide clarity on this subject, we present a series of experiments that demonstrate the effects and implications of this approach.

What is a binary vector?

A binary vector is a representation of a vector where each element in the vector is encoded as a binary value, typically either 0 or 1. This encoding scheme transforms the original vector, which may contain real-valued or high-dimensional data, into a binary format.

$$V[1\times 256]=\left[ \begin{matrix} -0.021 & 0.243 & 0.065 & -0.223 & \cdots & 0.452 & -0.248 \end{matrix} \right]$$

$$BV[1\times 256]=\left[ \begin{matrix} 0 & 1 & 1 & 0 & \cdots & 1 & 0 \end{matrix} \right]$$
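A minimal Python sketch of the binarization step, assuming the simplest thresholding scheme (positive components map to 1, the rest to 0, which matches the convention described later in this post):

```python
# Sketch of sign-based binarization: positive values become 1,
# non-positive values become 0. One bit per component instead of 32.
def binarize(vector):
    """Quantize a float vector into a binary vector."""
    return [1 if x > 0 else 0 for x in vector]

v = [-0.021, 0.243, 0.065, -0.223, 0.452, -0.248]
print(binarize(v))  # -> [0, 1, 1, 0, 1, 0]
```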

Binary vectors require only one bit of memory per element, while the original float32 vectors need 4 bytes (32 bits) per element. Using binary vectors can therefore reduce memory usage by up to 32 times. The smaller representation also makes distance computation cheaper, which translates into a notable increase in Requests Per Second (RPS) for binary vector operations.

Let's consider an example where we have 1 million vectors, and each vector is represented by float32 values in a 3072-dimensional space. In this scenario, the original float32 vector index would require around 20 gigabytes (GB) of memory to store all the vectors.

Now, if we were to use binary vectors instead, the memory usage would be significantly reduced. In this case, the binary vector index would take approximately 600 megabytes (MB) to store all 1 million vectors.
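The arithmetic behind these figures can be sketched as follows. Note this counts raw vector data only; the ~20 GB and ~600 MB figures above also include index overhead:

```python
# Back-of-the-envelope memory math for 1 million 3072-dimensional vectors.
n_vectors, dims = 1_000_000, 3072

fp32_bytes = n_vectors * dims * 4     # 4 bytes per float32 component
binary_bytes = n_vectors * dims // 8  # 1 bit per component, bit-packed

print(fp32_bytes / 2**30)    # raw float32 data: ~11.4 GiB
print(binary_bytes / 2**20)  # raw binary data: ~366 MiB
print(fp32_bytes // binary_bytes)  # -> 32x reduction
```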

One might expect this reduction in memory to cause a significant drop in accuracy, since binarization discards most of the original information.

Surprisingly, our experiments showed that the decrease in accuracy was not as big as expected. Even though binary vectors lose some specific details, they can still capture important patterns and similarities that allow them to maintain a reasonable level of accuracy.

Experiment

To evaluate the performance metrics in comparison to the original vector approach, we conducted benchmarking using the dbpedia-entities-openai3-text-embedding-3-large-3072-1M dataset. The benchmark was performed on a Google Cloud virtual machine (VM) with specifications of n2-standard-8, which includes 8 virtual CPUs and 32GB of memory. We used pgvecto.rs v0.2.1 as the vector database.

After inserting 1 million vectors into the database table, we built indexes for both the original float32 vectors and the binary vectors.

CREATE TABLE openai3072 (
  id bigserial PRIMARY KEY,
  text_embedding_3_large_3072_embedding vector(3072),
  text_embedding_3_large_3072_bvector bvector(3072)
);

CREATE INDEX openai_vector_index ON openai3072 USING vectors (text_embedding_3_large_3072_embedding vector_l2_ops);

CREATE INDEX openai_vector_index_bvector ON openai3072 USING vectors (text_embedding_3_large_3072_bvector bvector_l2_ops);

After building the indexes, we conducted vector search queries to assess the performance. These queries were executed with varying limits, indicating the number of search results to be retrieved (limit 5, 10, 50, 100).

We observed that the Requests Per Second (RPS) for binary vector search was approximately 3000, whereas the RPS for the original vector search was only around 300.

The RPS metric indicates the number of requests or queries that can be processed by the system per second. A higher RPS value signifies a higher throughput and faster response time.

However, the accuracy of the binary vector search was reduced to about 80% compared to the original vector search. While this decrease may not be seen as significant in some cases, it can be considered unacceptable in certain situations where achieving high accuracy is crucial.

Optimization: adaptive retrieval

Luckily, there is a simple and effective method called adaptive retrieval, borrowed from Matryoshka Representation Learning, that improves the accuracy.

The name sounds complex, but the idea behind adaptive retrieval is straightforward. Let's say we want to find the best 100 candidates. We can follow these steps:

  1. Query the binary vector index to retrieve a larger set (e.g. 200 candidates) from the 1 million embeddings. This is a fast operation.

  2. Rerank the candidates using a KNN query to retrieve the top 100 candidates. Please notice that we are running KNN instead of ANN. KNN is well-suited for scenarios where we need to work with smaller sets and perform accurate similarity search, making it an excellent choice for reranking in this case.
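The two steps above can be sketched in pure Python. This is illustrative only: the dataset is random, and a real system runs step 1 against an HNSW index over the binary vectors rather than a linear scan:

```python
import math
import random

random.seed(0)
DIM, N = 64, 1000
db = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(N)]
query = [random.gauss(0, 1) for _ in range(DIM)]

def binarize(v):
    return [1 if x > 0 else 0 for x in v]

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def adaptive_search(query, k):
    bq = binarize(query)
    # Step 1: cheap shortlist of 2*k candidates by Hamming distance
    # over the binary vectors (an index does this fast in practice).
    shortlist = sorted(range(N), key=lambda i: hamming(binarize(db[i]), bq))[:2 * k]
    # Step 2: exact KNN rerank of the small shortlist using the
    # full-precision vectors.
    return sorted(shortlist, key=lambda i: l2(db[i], query))[:k]

top = adaptive_search(query, 10)
print(len(top))  # 10 reranked candidate ids
```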

By incorporating this reranking step, we can achieve a notable increase in accuracy, potentially reaching up to 95%. Additionally, the system maintains a high Requests Per Second (RPS), approximately 1700. Furthermore, despite these improvements, the memory usage of the index remains significantly smaller, around 30 times less, compared to the original vector representation.

Below is the SQL code that can be used to execute the adaptive retrieval:

CREATE OR REPLACE FUNCTION match_documents_adaptive(
  query_embedding vector(3072),
  match_count int
)
RETURNS SETOF openai3072
LANGUAGE SQL
AS $$
-- Step 1: Query binary vector index to retrieve match_count * 2 candidates
WITH shortlist AS (
  SELECT *
  FROM openai3072
  ORDER BY text_embedding_3_large_3072_bvector <-> binarize(query_embedding)
  LIMIT match_count * 2
)
-- Step 2: Rerank the candidates using a KNN query to retrieve the top candidates
SELECT *
FROM shortlist
ORDER BY text_embedding_3_large_3072_embedding <-> query_embedding
LIMIT match_count;
$$;

Comparison with shortening vectors

OpenAI's latest embedding model, text-embedding-3-large, has a feature that allows you to shorten vectors.

It produces embeddings with 3072 dimensions by default, but you can remove dimensions from the end of the vector and still retain a valid representation of the text. For example, you can shorten the embeddings to 1024 dimensions.
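A sketch of what shortening looks like if you truncate the embedding yourself (OpenAI's documentation recommends re-normalizing the truncated vector so that distance comparisons remain meaningful; the example values are made up):

```python
import math

def shorten(embedding, dims):
    """Keep the first `dims` components and re-normalize to unit length."""
    prefix = embedding[:dims]
    norm = math.sqrt(sum(x * x for x in prefix))
    return [x / norm for x in prefix]

e = [0.3, -0.4, 0.5, 0.1, -0.2, 0.6]
short = shorten(e, 3)
print(len(short))  # 3
# The shortened vector is unit-length again:
print(round(sum(x * x for x in short), 6))  # 1.0
```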

This feature can help you save memory and make your requests faster, just like binary vectors. It would be a good idea to compare the performance and see which one works better for your needs.

Based on what we discovered, the conclusion is clear: Binary vectors significantly outperform shortened vectors.

We performed similar benchmarks to compare with binary vectors. We created indexes using the same dataset and machine type, but with varying dimensionalities. One index had 256 dimensions, while the other had 1024 dimensions.

The 1024-dimensional index achieved an accuracy of approximately 85% with a request rate of 1000 requests per second (RPS). On the other hand, the 256-dimensional index had around 60% accuracy with a higher request rate of 1200 RPS.

The 1024-dimensional index required approximately 8GB of memory, while the 256-dimensional index used around 2GB. In comparison, the binary vector approach achieved an accuracy of around 80% with a request rate of 3000 RPS, and its memory usage was approximately 600MB.

We implemented adaptive retrieval with lower-dimensional indexes. The binary vector index still outperformed the 256-dimensional index in terms of both request rate (RPS) and accuracy, while also exhibiting lower memory usage. On the other hand, the adaptive retrieval with the 1024-dimensional index achieved a higher accuracy of 99%; however, it had a relatively lower request rate and consumed 12 times more memory compared to the other indexes.

Summary

By combining binary vectors with adaptive retrieval, we can maintain a high level of accuracy while reducing index memory usage by roughly 30 times. The benchmark metrics from the experiments above are summarized here:

Index                            Accuracy          RPS      Index memory
float32, 3072 dims (baseline)    100% (reference)  ~300     ~20 GB
binary (bvector)                 ~80%              ~3000    ~600 MB
binary + adaptive retrieval      ~95%              ~1700    ~600 MB
float32, 1024 dims               ~85%              ~1000    ~8 GB
float32, 256 dims                ~60%              ~1200    ~2 GB

It is important to note that these outcomes are specific to the OpenAI text-embedding-3-large model, which possesses this particular property.

Comments





J

I understand the reduction of memory using binary vectors. However, if you use the normal vectors for knn re-ranking, you still need the complete vectors for all items, right? That sounds like you need even more memory. Can you elaborate on that?

C
Ce Gao · 1y ago

Thanks for the question. You need to store the full-precision vector data, but you can build the index with binary vectors, so the memory usage of the index is reduced. Data and index are two different things that need storage.

J

Thank you, I get it now. Ce Gao

J

Cool article. I understand how you transform the fp32 vector into a bit vector, but how do you do the nearest-neighbor search on the set of bit vectors to get the initial 200 you describe in your article? Do you use brute force, or Annoy, or something else?

C
Ce Gao · 2y ago

Still using HNSW to build an index for the bit vectors. It takes 0.6GB to store the indexes. In the experiment we use our own pgvecto.rs, the vector search extension in Postgres.

T

May I ask how the shortening index is performed? Did you use PCA or some dimension reduction method?

C
Ce Gao · 2y ago

No, that is actually a new feature of the OpenAI embedding model. You have the ability to selectively discard or drop certain dimensions, and the model will still function appropriately.

Under the hood is Matryoshka Representation Learning https://aniketrege.github.io/blog/2024/mrl/

Y
Yuxin Xu · 2y ago

Can I ask how we got the calculation of "can reduce memory usage by up to 32 times"? Thank you! Been stuck in that step for a while not knowing why.

Y
Yuxin Xu · 2y ago

Is it because of the below example (20/0.6 = 33.3333...)?

C
Ce Gao · 2y ago

Yuxin Xu Hello, that statement represents a theoretical result. In the case of FP32 (floating-point 32-bit) format, it requires 32 bits to store values, whereas a binary vector only needs 1 bit per element.

Z

Great documentation! I wonder if there's room to tune the threshold by which we decide whether a float is 0 or 1. Obviously, the threshold tuning should be done on a sample of the data. What do you think?

C
Ce Gao · 2y ago

Thanks! It would be an improvement to adjust the cutoff point that decides whether a number is considered 0 or 1. Currently, we just convert positive numbers to 1 and negative numbers to 0.

VectorChord | Vector Search in Postgres