Search Engine Awesome
- Search Benchmark
- quickwit-oss/tantivy
- MIT, Rust
- The Rust equivalent of Lucene
- Based on Tantivy
- toshi-search/Toshi
- MIT, Rust
- mosuka/bayard
- MIT, Rust
- quickwit-oss/quickwit
- AGPLv3, Rust
- Like Elasticsearch, but highly reliable & cost-efficient for log management.
- lnx-search/lnx
- MIT, Rust
- toshi-search/Toshi
- valeriansaliou/sonic
- MPL-2.0, Rust
- schema-less search backend
- Very lightweight; by default indexes at most 1000 records per word
- groonga/groonga
- LGPL-2.1, C
- Has MySQL and PostgreSQL plugins; very easy to use
- manticoresoftware/manticoresearch
- GPL-2.0, C++
- Database for search
- MySQL protocol and syntax
- Supports row-wise, columnar, and document storage
- forked from Sphinx 2.3.2 in 2017
- vs MeiliSearch https://www.reddit.com/r/selfhosted/comments/w89tgh/comment/ihq798e/
- typesense/typesense
- GPL-3.0, C++
- No CJK support
- Support for writing systems without spaces between words typesense/typesense#228
- pisa-engine/pisa
- Performant Indexes and Search for Academia
- Apache Lucene Core
- olivernn/lunr.js
- Apache nutch
- sphinxsearch
- No longer open source since v3
- blevesearch/bleve
- Apache-2.0, Go
- mosuka/blast
- Based on bleve
- os-fulltext-search-solutions
- gajus/liqe
- Lucene-like parser and search engine
- blugelabs/bluge
- Apache-2.0, Go
- indexing library for Go
- Bluge Based
- prabhatsharma/zinc
- Apache-2.0, Go+Vue
- alternative to elasticsearch
- Targets log analysis, not ES compatibility: https://github.com/prabhatsharma/zinc/issues/52#issuecomment-1000064449
- mosuka/phalanx
- Apache-2.0, Go
- cloud-native distributed search engine
- prabhatsharma/zinc
- MeiliSearch
- MIT, Rust
- The index must fit in RAM; suited to small datasets
- BucketSort
- wibyweb/wiby
- GPLv2
- wilsonzlin/edgesearch
- MIT, Rust
- WASM code at Cloudflare Worker
- jameslittle230/stork
- Apache-2.0, Rust
- tinysearch/tinysearch
- Apache-2.0, MIT, Rust
- Web Search
- https://searchhut.org/
- Submit a domain for indexing: https://searchhut.org/about
- https://sr.ht/~sircmpwn/searchhut/
- GPLv3, Go
- https://news.ycombinator.com/item?id=32104609
- Crawler
SearchHut Bot 0.0 (GNU AGPL 3.0); https://sr.ht/~sircmpwn/searchhut; <[email protected]>
- StractOrg/stract
- AGPLv3, Rust, Svelte
- axum web framework, rocksdb
- https://searchhut.org/
- yacy/yacy_search_server
- GPLv2+, Java
- Distributed Peer-to-Peer Web Search Engine and Intranet Search Appliance
- mwmbl/mwmbl
- AGPLv3, Python
- searx/searx
- AGPLv3, Python
- Privacy-respecting, hackable metasearch engine
- https://searx.org/
- https://searx.space/
- List of search engines
- metasearch engine
- Aggregates results from other search engines
- beir-cellar/beir
- Heterogeneous Benchmark for Information Retrieval
- Web
- pagefind
- nearform/lyra
- @nearform/lyra
- in-memory, typo-tolerant, full-text search engine
- fuzzyjs
- lucaong/minisearch
Local Search
- xapian/xapian
- The Lemur Project
- A Local Search Engine
- Building Monocle, a universal personal search engine for life
- recoll
- desktop full-text search tool
- HN
Chinese Word Segmentation
- ik
- jieba
- fxsjy/jieba
- Python
- hightman/scws
- PHP
- Simple Chinese Word Segmentation
- go-ego/gse
Library
- Golang
- FST - finite state transducer
- BurntSushi/fst
- Rust
- Extract
Tech
- TF-IDF
- FST - finite state transducer
- term -> id
- Used to implement suggestions - O(length(query))
- Shares suffixes
- Uses less memory than a trie
- DFSA - Deterministic acyclic finite state acceptor
- trie
- Suited to English dictionaries: small character set, few unique prefixes
- Double Array Trie
- Suited to Chinese dictionaries; low memory usage
- https://linux.thai.net/~thep/datrie/datrie.html
- Ternary Search Tree
- Ragel
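The term -> id mapping and the O(length(query)) lookup noted for FSTs and tries above can be illustrated with a plain trie. The following is only a minimal sketch: `TrieNode` and `TermDictionary` are hypothetical names, not taken from any library listed here, and a real FST would additionally share common suffixes between terms, which this sketch does not attempt.

```python
from typing import Dict, List, Optional


class TrieNode:
    """One node per character; term_id is set only where a term ends."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.term_id: Optional[int] = None


class TermDictionary:
    """Hypothetical term -> id dictionary backed by a plain trie."""

    def __init__(self) -> None:
        self.root = TrieNode()
        self.next_id = 0

    def add(self, term: str) -> int:
        """Insert a term and return its id (existing terms keep their id)."""
        node = self.root
        for ch in term:
            if ch not in node.children:
                node.children[ch] = TrieNode()
            node = node.children[ch]
        if node.term_id is None:
            node.term_id = self.next_id
            self.next_id += 1
        return node.term_id

    def _walk(self, prefix: str) -> Optional[TrieNode]:
        """Follow one edge per character: O(len(prefix)) steps."""
        node = self.root
        for ch in prefix:
            child = node.children.get(ch)
            if child is None:
                return None
            node = child
        return node

    def lookup(self, query: str) -> Optional[int]:
        """Exact term -> id lookup; cost depends on the query length only."""
        node = self._walk(query)
        return node.term_id if node is not None else None

    def suggest(self, prefix: str) -> List[str]:
        """Return all stored terms starting with prefix, sorted."""
        start = self._walk(prefix)
        if start is None:
            return []
        results: List[str] = []
        stack = [(start, prefix)]
        while stack:
            node, text = stack.pop()
            if node.term_id is not None:
                results.append(text)
            for ch, child in node.children.items():
                stack.append((child, text + ch))
        return sorted(results)


terms = TermDictionary()
for word in ["search", "searching", "seat", "index"]:
    terms.add(word)

print(terms.lookup("seat"))   # 2
print(terms.lookup("sea"))    # None: a prefix, not a stored term
print(terms.suggest("sea"))   # ['search', 'searching', 'seat']
```

In this sketch "search" and "searching" share their prefix nodes, but the repeated endings of unrelated terms are stored separately; merging those shared tails is the step that lets an FST or DFSA use less memory than a trie.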
Story
Misc
- RaRe-Technologies/gensim
- LGPL-2.1, Python
- Topic Modelling for Humans
- oborchers/Fast_Sentence_Embeddings
- mayabot/mynlp