However Briefly

I made a search engine worse than Elasticsearch (2024)

created: June 5, 2025, 6:37 p.m. | updated: June 6, 2025, 3:35 p.m.

And in this shame, you too can experience the humility and understanding of what a real, honest-to-goodness, not-a-side-project search engine does to make lexical search fast.

A Magic WAND (Or how SearchArray is top 8m retrieval while Elasticsearch == top K retrieval)

In lexical search systems, you search for multiple terms.

Caching the FULL query, not just individual BM25 term scoring

SearchArray is just a system for computing BM25 scores (or whatever similarity).

    score ( term ) return scores

But in a regular search engine like Solr, Elasticsearch, OpenSearch, or Vespa, this logic is expressed in the search engine's Query DSL.

Maybe this simple bm25_search:

    def bm25_search(corpus, query):
        query = snowball_tokenizer(query)
        scores = np.zeros(len(corpus))
        for q in query:
            scores += corpus['text_snowball'].array.score(q)
        return scores

(Leaving out some annoying threading, but you can look at all the code here.)
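To make the "score everything, then pick the best" shape of that bm25_search concrete, here is a rough, self-contained sketch. It is not SearchArray's code: ToyIndex, tokenize, and top_k are hypothetical names made up for illustration, the tokenizer is a plain whitespace split rather than a snowball stemmer, and the scoring is textbook BM25 with assumed defaults k1=1.2 and b=0.75.

```python
import math
from collections import Counter

import numpy as np


def tokenize(text):
    # Hypothetical stand-in for snowball_tokenizer: lowercase + whitespace split.
    return text.lower().split()


class ToyIndex:
    """Tiny in-memory index for illustration only; NOT SearchArray's API."""

    def __init__(self, docs, k1=1.2, b=0.75):
        self.k1, self.b = k1, b
        self.term_freqs = [Counter(tokenize(d)) for d in docs]
        self.doc_lens = np.array(
            [sum(tf.values()) for tf in self.term_freqs], dtype=float
        )
        self.avg_len = self.doc_lens.mean()
        self.num_docs = len(docs)

    def score(self, term):
        # One BM25 score per document for a single term (0.0 where absent).
        tf = np.array([c[term] for c in self.term_freqs], dtype=float)
        df = np.count_nonzero(tf)
        idf = math.log(1 + (self.num_docs - df + 0.5) / (df + 0.5))
        norm = tf + self.k1 * (1 - self.b + self.b * self.doc_lens / self.avg_len)
        return idf * tf * (self.k1 + 1) / norm


def bm25_search(index, query):
    # Brute force: every query term touches every document's score slot.
    scores = np.zeros(index.num_docs)
    for q in tokenize(query):
        scores += index.score(q)
    return scores


def top_k(scores, k):
    # Only after scoring the whole corpus do we keep the k best doc ids.
    k = min(k, len(scores))
    candidates = np.argpartition(-scores, k - 1)[:k]
    return candidates[np.argsort(-scores[candidates])]
```

Calling bm25_search(ToyIndex(docs), "quick fox") returns one score per document in the corpus, and top_k then throws most of them away. That wasted work on documents that could never make the cut is what a top K engine tries to avoid.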

2 days, 12 hours ago: Hacker News