Database Awesome
name | stand for |
---|---|
Relational DBMS | 关系型数据库 |
Key-value stores | KV 存储 |
Document stores | 文档存储 |
Time Series DBMS | 时序数据库 |
Graph DBMS | 图数据库 |
Object oriented DBMS | 面向对象数据库 |
Search engines | 搜索引擎 |
RDF stores | RDF 存储 |
Wide column stores | 宽列存储 |
Column stores | 列存储 |
Multivalue DBMS | 多值数据库 |
Native XML DBMS | XML 数据库 |
Spatial DBMS | 空间数据库 |
Event Stores | 事件存储 |
Content stores | 内容存储 |
Navigational DBMS | 导航数据库 |
- PostgreSQL Awesome
- NewSQL
- cockroachdb
- Realtime/Baas/Streaming/实时
- MaterializeInc/materialize
- 基于 PostgreSQL
- 实时的
MATERIALIZED VIEW
- CREATE SOURCE
- 支持 Kafka, S3
- supabase/supabase
- 基于 PostgreSQL
- 类比 Firebase
- MaterializeInc/materialize
- CommandLine/CUI/CLI
- xo/usql
- MIT, Go
- Universal command-line interface for SQL databases
usql -c '\drivers'
- 推荐 🌟 没有依赖,方便 Windows 这样的环境使用,支持的驱动多
- dbcli/pgcli
- BSC-3, Python
- dbcli/mycli
- xo/usql
- Ranking
- https://db-engines.com/en/ranking
- https://db-benchmarks.com/
- by manticoresearch
理论
- Sargable - Search ARGument ABLE
- wikipedia Sargable
- 定义查询可利用索引的场景
关系型
- PostgreSQL
- SQLite
- MySQL
- FirebirdSQL/firebird
- MPL-1.1, C++
- since 1981
- ANSI SQL
MySQL
- Viteness
- TiDB
- baidu/BaikalDB
- Apache-2.0, C++
- HTAP
分布式
- 分布式/Disributed/HA/高可用
- rqlite/rqlite
- SQLite + Raft
- tigerbeetle/tigerbeetle
- Apache-2.0, Zig
- OLTP
- PostgreSQL
- citusdata/citus
- PostgreSQL Sharding
- cdb
- yugabyte/yugabyte-db
- ydb-platform/ydb
- citusdata/citus
时序数据库
- 可观察性 - 日志、指标、追踪
- IoT
- timescale
- CeresDB/ceresdb
- GreptimeTeam/greptimedb
- questdb/questdb
- Java, Apache 2.0
- PostgreSQL 兼容协议
- m3db/m3
- from Uber
- Distributed TSDB, Aggregator and Query Engine, Prometheus Sidecar, Graphite Compatible, Metrics Platform
- taosdata/TDengine
OLAP
- Cube 计算
- BI
- cubejs
- duckdb/duckdb
- SQLite for OLAP, C++, MIT
- ClickHouse
- Apache Druid
- Apache Kylin
- Apache Pinot
- by LinkedIn
- Apache Doris
- by 百度
- Comparison of OLAP servers
OLTP
- stoneatom/stonedb
- GPLv2, C++
- MySQL HTAP and MySQL-native database for oltp, real-time analytics
Column Store
- 面向分析
- Clickhouse
- DuckDB
- Apache Druid
- Apache Pinot
- InfluxDB
- PostgreSQL cstore_fdw, vops
- ORC
- greenplum-db/gpdb
- format
- orc
- parquet
- List of column-oriented DBMSes
Wide-Column
- 宽列 - 二维 KV
- 并非标准列存储
- Scylla
- Apache Cassandra
- Apache Accumulo
- Apache HBase
- Hypertable
- 商业
- Bigtable
- DataStax Enterprise
- DataStax Astra DB
- Azure Tables
图
Stream
- airbyte
- nifi
- benthosdev/benthos
消息队列
Data versioning
- dolthub/dolt
- Apache-2.0, Go
- Git for Data
- treeverse/lakeFS
- Apache-2.0, Go
- Data version control for your data lake
- https://lakefs.io/blog/dvc-vs-git-vs-dolt-vs-lakefs/
- Git LFS
- DVS
有趣
- wiredtiger/wiredtiger
- MongoDB
- tantaman/aphrodite
- facebookincubator/LogDevice
- Distributed storage for sequential data
- FerretDB/FerretDB
- Apache-2.0, Go
- mongodb wire protocol to PostgreSQL
- fcoury/oxide
- Apache-2.0, Rust
- mongodb wire protocol to PostgreSQL
- alash3al/redix
- Apache-2.0, Go
- redis wire protocol to PostgreSQL
- TAO: Facebook’s Distributed Data Store for the Social Graph
- TAO - The Associations and Objects
- HN
- EvgSkv/logica
- 逻辑编程转 SQL
- ApsaraDB
- by Alibaba
- xtdb/xtdb
- MIT, Clojure
- codenotary/immudb
- immutable database
- maxmunzel/kvass
- MIT, Go
- personal key-value store
- debezium/debezium
- Apache-2.0, Java
- CDC - Change data capture
- nhost/nhost
- MIT, TS
- Hasura+S3
- Firebase Alternative with GraphQL
- surrealdb/surrealdb
- 服务端 Rust, BSL
- 概念 NAMESPACE, DATABASE
- 本地存储: RocksDB
- 分布式存储: tikv
docker run --rm -p 8000:8000 surrealdb/surrealdb:latest start --log debug --user root --pass root
curl --request POST \
--header "Accept: application/json" \
--user "root:root" \
--data "INFO FOR DB;" \
http://localhost:8000/sql
SQLite Awesome
- dgllghr/stanchion
- Column
- proofrock/ws4sqlite
- Query sqlite via http
- HN
- Hosting SQLite on Github Pages
- subzerocloud/blue-steel
- RUST
- REST api for SQLite & PostgreSQL
- https://lumosql.org/
- SQLite+LMDB
- pocketbase/pocketbase
- MIT, Go
- realtime backend in 1 file
- Replication
- vlcn-io/cr-sqlite
- MIT, Rust
- CRDT
- 之前为 tantaman/conflict-free-sqlite
- benbjohnson/litestream
- Apache-2.0, Go
- Streaming replication for SQLite
- 外部进程
- 将一个 sqlite 同步到另外的 sqlite - 依赖中间存储
- superfly/litefs
- Apache-2.0, Go
- 基于 fuse 的 litestream - 批量同步
- maxpert/marmot
- 基于 NATS 同步 sqlite
- vlcn-io/cr-sqlite
- SQLite Based Distributed Database/基于 SQLite 的分布式数据库
- dqlite
- GPLv3, C
- rqlite
- MIT, Golang
- HTTPS API
- node-discovery & automatic clustering - Kubernetes, Consul, etcd, DNS
- hashicorp/raft
- https://github.com/rqlite/rqlite/blob/master/DOC/PERFORMANCE.md
- In-memory DB < 2GB
- 只能用 deterministic SQL - 因为 raft 复制,幂等
- losfair/mvsqlite
- Apache-2.0, Rust
- MVCC SQLite that runs on FoundationDB
- bloomberg/comdb2
- Apache-2.0, C
- dqlite
- LumoSQL/LumoSQL
- SQLite over LMDB
- https://github.com/planetopendata/awesome-sqlite
ID
- ULID
- niieani/hashids.js
- YouTube-like ids
- Sharding & IDs at Instagram
Big Data
Vector
面向向量的多用于 搜索 和 机器学习。
Vector ->
number[]
常见模型的向量维度
- OpenAI text-embedding-3-large 3072 dims
- mistral-derivative 4096 dims
- https://huggingface.co/spaces/mteb/leaderboard
- pgvector/pgvector
- PostgreSQL/MIT, C
- since 2021-07
- PG extension: vector
- 比 pgvecto.rs 更成熟, 使用面更广泛, 最近开发没有 pgvecto.rs 活跃, 功能特性更少
- 适合目前已有环境,能够很方便直接使用,对 vector 要求不高的场景,避免单独部署 DB 直接利用现有 PG
- 特性
- 2000 dim
- 使用 PG 管理索引存储
- HNSW_MAX_DIM = IVFFLAT_MAX_DIM = 2000
- pgvector#461 Increase max vectors dimension limit for index
- PG 默认 8KB Page, 不可调整, 限制了 1page 能存放的 float32
- pgvector#461 Increase max vectors dimension limit for index
- Supabase PostgreSQL Image 包含了 pgvector
- tensorchord/pgvecto.rs
- Apache-2.0, Rust
- since 2023-05
- PG extension: vectors
- 适合需要更高性能,更多功能,更大维度的场景,可以针对 vector 需求单独部署一个 DB
- docker tensorchord/pgvecto-rs:pg16-v0.2.1
- 不是 TLE / trusted extension
- AWS 不支持
- Supabase 的 PG 镜像未包含
- supabase/postgres#895 support pgvecto.rs extension
- vs pgvector
- by pgvecto.rs
- 功能更丰富
- 支持 sparse vector
- 65535 dim
- 支持 FP16, INT8
- 独立的索引管理
- 更好的 filtering 支持
- https://blog.pgvecto.rs/pgvector-vs-pgvectors-in-2024-a-comprehensive-comparison-for-vector-search-in-postgresql
- immich-app/immich#5830 Why Pgvecto.rs over pgvector?
- filtering 结果更准确
- 索引更快
- 插入更快
- 功能更多 - quantization, VBASE
- qdrant/qdrant
- Apache-2.0, Rust
- milvus-io/milvus
- Apache-2.0, Go
- Vector database for scalable similarity search and AI applications.
- weaviate/weaviate
- BSD-3, Golang
- cloud-native, modular, real-time vector search engine
- facebookincubator/velox
- Apache-2.0, C++
- C++ vectorized database acceleration library
- erikbern/ann-benchmarks
- Benchmarks of approximate nearest neighbor
- eto-ai/lance
- Apache-2.0, Rust
- columnar format
- lancedb/lancedb
- Apache-2.0, Rust
- Not All Vector Databases Are Made Equal
- https://gpt-index.readthedocs.io/en/latest/examples/vector_stores/LanceDBIndexDemo.html
- jerryjliu/llama_index
- sqlite-vss
- Lib
- nmslib/hnswlib
- facebookresearch/faiss
- MIT, C++
- efficient similarity search and clustering of dense vectors
- unum-cloud/usearch
Geo
- https://s2geometry.io/
- uber/h3
- Apache-2.0, C
- Hexagonal hierarchical geospatial indexing system
Tools
- Netflix/ndbench
- Data Store Benchmark
Interactive
- pocketbase/pocketbase
- MIT, Go, SQLite
- sqlpad/sqlpad
- Web-based SQL editor and visualizing
- pinterest/querybook
- chartbrew/chartbrew
- MIT, Node.js
- Data visualization tool for SQL databases
Library
- Parser/SQL
- prql/prql
- Apache-2.0, Rust
- Pipelined Relational Query Language
- Open Source SQL Parsers
- runreveal/pql
- Golang
- Pipelined Query Language
- https://news.ycombinator.com/item?id=39539252
- Azure Kusto Query Language (KQL)
- Azure appinsights logs
- prql/prql
迁移/升级/Schema 变更/Migration
不需要重复部署的环境可以不用考虑
常见模式迁移
- DSL - XML, YAML, HCL
- 支持 diff
- 多一层学习成本
- 支持多种数据库
- 还支持除了 table 以外的对象
- 必须先写 DSL - SSOT
- SQL
- 概念简单
- 能利用所有 SQL 能力
- 支持很多场景
- 版本管理为主
- ORM -> SQL
- 类似 DSL
- 减少额外定义 DSL
- 生成 SQL 不透明
- 功能局限
- 一般 diff 不会做移除列操作
- 局限于 table
版本目录模式
[version]-[title].sql
- 通过 注释 分割 up/down
--- up
--- down
[version]-[title].<up|down>.sql
[version]-[title]/
- 目录区分版本up.sql
down.sql
- Java
- flywaydb
- Apache-2.0, SQL
db/migration/<Prefix><Version>__<Description>.sql
mvn clean flyway:migrate -Dflyway.configFiles=myFlywayConfig.conf
- IDEA JPA Budy 有 Flyway 集成
- liquibase
- XML
- flywaydb
- Golang
- ariga/atlas
- DSL, Go
- golang-migrate/migrate
- MIT, Go
VER_DESC.<up|down>.sql
migrate create -ext sql -dir db/migrations -seq create_users_table
- amacneil/dbmate
- SQL, Go
- framework-agnostic database migration tool
- 注释分割
-- migrate:up
- pressly/goose
- MIT, SQL/Go
- ariga/atlas
- 平台/服务/工具
- bytebase/bytebase
- MIT, Go, Vue
- Safe database schema change and version control for DevOps teams.
- bytebase/bytebase
- djrobstep/migra
- PG Schema Diff
- fabianlindfors/reshape
- zero-downtime schema migration tool for Postgres
- sqitchers/sqitch
- MIT, Perl
- skeema
- MySQL & MariaDB
- sqlite user_version
- rickbergfalk/postgrator
- Hasura - https://hasura.io/docs/latest/migrations-metadata-seeds/manage-migrations/
VER_DESC/<up|down>.sql
TBD
- Database eLearning
- DB-Engines Ranking
- Aphyr jepsen
- EventQL
- EventQL is a distributed, column-oriented database built for large-scale event collection and analytics. It runs super-fast SQL and MapReduce queries.
- Carte
- SQL FOR THINGS DATA
- Real-time SQL. Simple scaling. Millions of inserts per second.
- CrateDB offers Standard-SQL, real-time queries and document support (JSON) in a simple, horizontal way to scale.
- IoT & Sensors
- Stream millions of data points per second into CrateDB and query them in real time to visualize, track, and predict.
- Log & Event Analytics
- Analyze logs from IT infrastructure to monitor security, compliance, usage, billing, and more.
- Time Series, Geospatial, Machine Learning
- Versatile SQL engine enables complex queries, text searches, and aggregations – in real time or against volumes of historic data.
- caesar0301/awesome-public-datasets
- https://dbmstools.com/