Comment by talipozturk

Comment by talipozturk 4 hours ago

0 replies

co-founder of Vectroid: We forked Lucene. Lucene is awesome for search in general, filters, and obviously full-text search. Very mature and well supported by so many big names and amazing engineers. So we take advantage of that but we had to change a few things to make it work perfectly for Vector use-case. We basically think Vector should be the main data type as it is the most difficult one to deal with. For instance, we modified Lucene to use X number of CPU / threads to build a single segment index. As a result, if/when needed, we can utilize hundreds of CPUs to index quicker and generate less number of segments that will enable lower query latency. We also built a custom File System Directory for Lucene to work off of GCS directly (or S3 later on). It can by-pass the kernel, read from network and write directly into the memory... no SSD, no page-cache, no mmap involved. Perhaps I should not say more...