Database Awesome
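| name | stands for | 
|---|---|
| Relational DBMS | Tables of rows and columns, queried with SQL | 
| Key-value stores | Values looked up by a unique key | 
| Document stores | Schema-flexible documents such as JSON | 
| Time Series DBMS | Timestamped measurements and events | 
| Graph DBMS | Nodes and edges with relationship traversal | 
| Object oriented DBMS | Objects stored as in object-oriented languages | 
| Search engines | Full-text indexing and ranked search | 
| RDF stores | Subject-predicate-object triples | 
| Wide column stores | Rows with a flexible, very large set of columns | 
| Column stores | Data laid out by column for analytics | 
| Multivalue DBMS | Fields that can hold multiple values | 
| Native XML DBMS | XML documents as the native data model | 
| Spatial DBMS | Geometric and geographic data | 
| Event Stores | Append-only sequences of events | 
| Content stores | Content repositories for digital content | 
| Navigational DBMS | Records reached by following links | 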

| db | lang | 
|---|---|
| MSSQL | T-SQL/Transact-SQL | 
| PostgreSQL | PL/pgSQL | 
| Oracle | PL/SQL | 
| MySQL | Stored procedures (SQL/PSM-style extension of SQL syntax; no PL/SQL) | 
| DB2 | SQL PL | 

- PostgreSQL Awesome
- NewSQL
- cockroachdb
 
- Realtime/BaaS/Streaming
- MaterializeInc/materialize
- PostgreSQL wire-compatible
- Real-time, incrementally maintained MATERIALIZED VIEW
- CREATE SOURCE
- Supports Kafka and S3
 
 
- supabase/supabase
- Built on PostgreSQL
- Comparable to Firebase
 
 
- CommandLine/CUI/CLI
- xo/usql
- MIT, Go
- Universal command-line interface for SQL databases
- usql -c '\drivers'
- Recommended 🌟: no dependencies, convenient in environments such as Windows, and supports many drivers
 
- dbcli/pgcli
- BSD-3, Python
 
- dbcli/mycli
 
- Ranking/Benchmark
- https://github.com/mgramin/awesome-db-tools
Theory
- Sargable - Search ARGument ABLE
- wikipedia Sargable
- Describes when a query predicate can make use of an index
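To make the sargable idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `users` table, the `idx_users_email` index and the `plan` helper are hypothetical names chosen for illustration. The exact plan wording varies between SQLite versions, so the sketch only relies on the leading `SEARCH` or `SCAN` keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, age INTEGER);
    CREATE INDEX idx_users_email ON users (email);
""")

def plan(sql: str) -> str:
    """Return the query-plan detail text SQLite reports for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # Each row is (id, parent, notused, detail); keep only the detail column.
    return " | ".join(row[3] for row in rows)

# Sargable: the bare column is compared to a constant, so the index applies.
sargable = plan("SELECT * FROM users WHERE email = 'a@example.com'")

# Not sargable: wrapping the column in a function hides it from the plain index.
non_sargable = plan("SELECT * FROM users WHERE lower(email) = 'a@example.com'")

print(sargable)
print(non_sargable)
```

The first plan should report an index search and the second a full table scan. Rewriting the predicate so the column stands alone, or indexing the expression itself, is the usual way to make such a query sargable again.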
 
Relational
- PostgreSQL
- SQLite
- MS Access
- .accdb, .mdb
 
- MySQL
- Oracle
- FirebirdSQL/firebird
- MPL-1.1, C++
- since 1981
- ANSI SQL
 
MySQL
- Vitess
- TiDB
- baidu/BaikalDB
- Apache-2.0, C++
- HTAP
 
Distributed
- Distributed/HA/High availability
- rqlite/rqlite
- SQLite + Raft
 
- tigerbeetle/tigerbeetle
- Apache-2.0, Zig
- OLTP
 
- PostgreSQL
- citusdata/citus
- PostgreSQL Sharding
 
- cdb
- yugabyte/yugabyte-db
- ydb-platform/ydb
 
Time-Series Databases
- Observability: logs, metrics, traces
- IoT
- timescale
- CeresDB/ceresdb
- GreptimeTeam/greptimedb
- questdb/questdb
- Java, Apache 2.0
- PostgreSQL wire-protocol compatible
 
- m3db/m3
- from Uber
- Distributed TSDB, Aggregator and Query Engine, Prometheus Sidecar, Graphite Compatible, Metrics Platform
 
- taosdata/TDengine
OLAP
- Cube computation
- BI
- cubejs
- duckdb/duckdb
- SQLite for OLAP, C++, MIT
 
- ClickHouse
- Apache Druid
- Apache Kylin
- Apache Pinot
- by LinkedIn
 
- Apache Doris
- by Baidu
 
- Comparison of OLAP servers
OLTP
- stoneatom/stonedb
- GPLv2, C++
- MySQL HTAP and MySQL-native database for oltp, real-time analytics
 
Column Store
- Analytics-oriented
- ClickHouse
- DuckDB
- Apache Druid
- Apache Pinot
- InfluxDB
- PostgreSQL cstore_fdw, vops
- ORC
 
- greenplum-db/gpdb
- format
- orc
- parquet
 
- List of column-oriented DBMSes
Wide-Column
- Wide-column: a two-dimensional key-value store
- Not a standard column store
 
- Scylla
- Apache Cassandra
- Apache Accumulo
- Apache HBase
- Hypertable
- Commercial
- Bigtable
- DataStax Enterprise
- DataStax Astra DB
- Azure Tables
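The "two-dimensional key-value" description of wide-column stores above can be pictured as a map addressed by a row key and then a column key. The sketch below is a toy in-memory model, not how any of the listed systems is implemented; `WideColumnTable` and the `profile:*` column names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Optional

class WideColumnTable:
    """Toy model of a wide-column table: value = table[row_key][column_key]."""

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, str]] = defaultdict(dict)

    def put(self, row_key: str, column: str, value: str) -> None:
        # Any row may introduce new columns; there is no fixed schema per row.
        self._rows[row_key][column] = value

    def get(self, row_key: str, column: str) -> Optional[str]:
        return self._rows.get(row_key, {}).get(column)

    def row(self, row_key: str) -> Dict[str, str]:
        # Return the row's columns sorted by name for stable, readable output.
        return dict(sorted(self._rows.get(row_key, {}).items()))

table = WideColumnTable()
table.put("user:1", "profile:name", "Ada")
table.put("user:1", "profile:email", "ada@example.com")
table.put("user:2", "profile:name", "Linus")  # rows need not share columns

print(table.row("user:1"))
print(table.get("user:2", "profile:email"))
```

A column store, by contrast, keeps all values of one column together on disk, which is why the two categories are listed separately.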
 
Graph
Stream
- airbyte
- nifi
- benthosdev/benthos
Message Queues
Data versioning
- dolthub/dolt
- Apache-2.0, Go
- Git for Data
 
- treeverse/lakeFS
- Apache-2.0, Go
- Data version control for your data lake
- https://lakefs.io/blog/dvc-vs-git-vs-dolt-vs-lakefs/
 
- Git LFS
- DVC
Interesting
- wiredtiger/wiredtiger
- MongoDB
 
- tantaman/aphrodite
- facebookincubator/LogDevice
- Distributed storage for sequential data
 
- FerretDB/FerretDB
- Apache-2.0, Go
- mongodb wire protocol to PostgreSQL
 
- fcoury/oxide
- Apache-2.0, Rust
- mongodb wire protocol to PostgreSQL
 
- alash3al/redix
- Apache-2.0, Go
- redis wire protocol to PostgreSQL
 
- TAO: Facebook’s Distributed Data Store for the Social Graph
- TAO - The Associations and Objects
- HN
 
- EvgSkv/logica
- Compiles logic programming to SQL
 
- ApsaraDB
- by Alibaba
 
- xtdb/xtdb
- MIT, Clojure
 
- codenotary/immudb
- immutable database
 
- maxmunzel/kvass
- MIT, Go
- personal key-value store
 
- debezium/debezium
- Apache-2.0, Java
- CDC - Change data capture
 
- nhost/nhost
- MIT, TS
- Hasura+S3
- Firebase Alternative with GraphQL
 
- surrealdb/surrealdb
- Server: Rust, BSL
- Concepts: NAMESPACE, DATABASE
- Local storage: RocksDB
- Distributed storage: TiKV
 
docker run --rm -p 8000:8000 surrealdb/surrealdb:latest start --log debug --user root --pass root
curl --request POST \
  --header "Accept: application/json" \
  --user "root:root" \
  --data "INFO FOR DB;" \
  http://localhost:8000/sql
SQLite Awesome
- nalgeon/sqlean
- MIT, C
- Collection of extensions
 
- dgllghr/stanchion
- Column
 
- proofrock/ws4sqlite
- Query sqlite via http
- HN
 
- Hosting SQLite on Github Pages
- subzerocloud/blue-steel
- Rust
- REST api for SQLite & PostgreSQL
 
- https://lumosql.org/
- SQLite+LMDB
 
- pocketbase/pocketbase
- MIT, Go
- realtime backend in 1 file
 
- Replication
- vlcn-io/cr-sqlite
- MIT, Rust
- CRDT
- Formerly tantaman/conflict-free-sqlite
 
- benbjohnson/litestream
- Apache-2.0, Go
- Streaming replication for SQLite
- Runs as an external process
- Syncs one SQLite database to another SQLite database via intermediate storage
 
- superfly/litefs
- Apache-2.0, Go
- FUSE-based take on litestream: batched sync
 
- maxpert/marmot
- Syncs SQLite over NATS
 
 
- SQLite-based distributed databases
- dqlite
- LGPLv3, C
 
- rqlite
- MIT, Golang
- HTTPS API
- node-discovery & automatic clustering - Kubernetes, Consul, etcd, DNS
- hashicorp/raft
- https://github.com/rqlite/rqlite/blob/master/DOC/PERFORMANCE.md
- In-memory DB < 2GB
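- Only deterministic SQL is safe: statements are replicated through Raft, so each one must yield the same result on every node (idempotent)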
 
- losfair/mvsqlite
- Apache-2.0, Rust
- MVCC SQLite that runs on FoundationDB
 
- bloomberg/comdb2
- Apache-2.0, C
 
 
- LumoSQL/LumoSQL
- SQLite over LMDB
 
- https://github.com/planetopendata/awesome-sqlite
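The rqlite note above about deterministic SQL can be illustrated without rqlite itself. The sketch below assumes a simplified model in which every replica re-executes the same logged statement; it is not rqlite's code, and `make_replica`, `replicate` and `contents` are hypothetical helpers.

```python
import sqlite3

def make_replica() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (v INTEGER)")
    return conn

def replicate(statement, replicas, params=()):
    """Apply the same logged statement to every replica, as a replicated log would."""
    for conn in replicas:
        conn.execute(statement, params)

def contents(conn):
    return conn.execute("SELECT v FROM t ORDER BY rowid").fetchall()

# Non-deterministic: each replica evaluates random() on its own.
r1, r2 = make_replica(), make_replica()
replicate("INSERT INTO t VALUES (random())", [r1, r2])
diverged = contents(r1) != contents(r2)

# Deterministic: the value is fixed before the statement enters the log.
r3, r4 = make_replica(), make_replica()
replicate("INSERT INTO t VALUES (?)", [r3, r4], (42,))
consistent = contents(r3) == contents(r4)

print(diverged, consistent)
```

Computing values such as random numbers or timestamps in the application, and sending them as literals or parameters, keeps replicas identical under this model.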
ID
- ULID
- niieani/hashids.js
- YouTube-like ids
 
- Sharding & IDs at Instagram
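As a sketch of how a ULID is put together: a 48-bit millisecond timestamp followed by 80 random bits, encoded as 26 Crockford base32 characters. The `ulid` function and its optional parameters are hypothetical and exist only to make the example reproducible; monotonic generation within the same millisecond is not handled here.

```python
import os
import time

# Crockford base32 alphabet (no I, L, O, U), already in ASCII sort order.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

def ulid(timestamp_ms=None, randomness=None) -> str:
    """Build a 26-character ULID string from a timestamp and 10 random bytes."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    if randomness is None:
        randomness = os.urandom(10)  # 80 bits of entropy
    # 48-bit timestamp in the high bits, 80 random bits in the low bits.
    value = (timestamp_ms << 80) | int.from_bytes(randomness, "big")
    chars = []
    for _ in range(26):  # 26 characters x 5 bits cover the 128-bit value
        chars.append(CROCKFORD[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))

print(ulid())
```

Because the timestamp occupies the leading characters and the alphabet is in ASCII order, sorting the strings lexicographically sorts them by creation time.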
Big Data
Vector
Vector-oriented databases are mostly used for search and machine learning.
Vector -> number[]
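Since a vector is just an array of numbers, similarity search reduces to comparing arrays. The following minimal sketch does an exact, brute-force nearest-neighbour lookup by cosine similarity. The three-dimensional vectors are made-up toy data, and `cosine_similarity` and `nearest` are hypothetical helper names.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(query, items, k=1):
    """Exact top-k search: score every stored vector, then sort by similarity."""
    ranked = sorted(
        items,
        key=lambda name: cosine_similarity(query, items[name]),
        reverse=True,
    )
    return ranked[:k]

docs = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}
print(nearest([1.0, 0.0, 0.0], docs, k=2))  # ['cat', 'dog']
```

Scanning every vector is linear in the size of the collection. Approximate indexes such as HNSW or IVFFlat trade a little recall for much faster lookups.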
Vector dimensions of common models
- OpenAI text-embedding-3-large 3072 dims
- mistral-derivative 4096 dims
- https://huggingface.co/spaces/mteb/leaderboard
- pgvector/pgvector
- PostgreSQL/MIT, C
- since 2021-07
- PG extension: vector
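- More mature and more widely used than pgvecto.rs, but recent development is less active and it has fewer features
- Suited to an existing environment where it can be used directly and vector requirements are modest; reuses the existing PG instead of deploying a separate DB
- Features
- 2000 dim
- Index storage is managed by PG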
 
- HNSW_MAX_DIM = IVFFLAT_MAX_DIM = 2000
- pgvector#461 Increase max vectors dimension limit for index
- PG uses 8KB pages by default (fixed at build time), which limits how many float32 values fit in one page
 
 
- The Supabase PostgreSQL image includes pgvector
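The link between the 8KB page and the 2000-dimension index limit noted above comes down to simple arithmetic. This is a rough sketch: it counts only the raw float32 payload and ignores page headers and per-tuple overhead, which is why the real limit sits below the theoretical maximum. The constant and function names are hypothetical.

```python
PAGE_SIZE_BYTES = 8 * 1024  # PostgreSQL default page size
FLOAT32_BYTES = 4           # one vector component

def vector_payload_bytes(dims: int) -> int:
    """Raw size of a float32 vector, without any storage overhead."""
    return dims * FLOAT32_BYTES

max_floats_in_page = PAGE_SIZE_BYTES // FLOAT32_BYTES

print(max_floats_in_page)                             # 2048
print(vector_payload_bytes(2000))                     # 8000
print(vector_payload_bytes(3072) > PAGE_SIZE_BYTES)   # True
```

A 3072-dimension embedding such as text-embedding-3-large therefore cannot fit in a single page as full-precision float32, which is what the linked issue is about.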
 
- tensorchord/pgvecto.rs
- Apache-2.0, Rust
- since 2023-05
- PG extension: vectors
- Suited to workloads that need higher performance, more features, or larger dimensions; a separate DB can be deployed just for vector needs
- docker tensorchord/pgvecto-rs:pg16-v0.2.1
- Not a TLE / trusted extension
- Not supported on AWS
- Not included in the Supabase PG image
- supabase/postgres#895 support pgvecto.rs extension
 
 
- vs pgvector
- by pgvecto.rs
- Richer feature set
- Supports sparse vectors
- 65535 dim
- Supports FP16 and INT8
- Independent index management
- Better filtering support
- https://blog.pgvecto.rs/pgvector-vs-pgvectors-in-2024-a-comprehensive-comparison-for-vector-search-in-postgresql
 
- immich-app/immich#5830 Why Pgvecto.rs over pgvector?
- More accurate filtering results
- Faster indexing
- Faster inserts
- More features: quantization, VBASE
 
 
- qdrant/qdrant
- Apache-2.0, Rust
 
- milvus-io/milvus
- Apache-2.0, Go
- Vector database for scalable similarity search and AI applications.
 
- weaviate/weaviate
- BSD-3, Golang
- cloud-native, modular, real-time vector search engine
 
- facebookincubator/velox
- Apache-2.0, C++
- C++ vectorized database acceleration library
 
- erikbern/ann-benchmarks
- Benchmarks of approximate nearest neighbor
 
- eto-ai/lance
- Apache-2.0, Rust
- columnar format
 
- lancedb/lancedb
- Apache-2.0, Rust
 
- Not All Vector Databases Are Made Equal
- https://gpt-index.readthedocs.io/en/latest/examples/vector_stores/LanceDBIndexDemo.html
- jerryjliu/llama_index
- sqlite-vss
- Lib
- nmslib/hnswlib
- facebookresearch/faiss
- MIT, C++
- efficient similarity search and clustering of dense vectors
 
- unum-cloud/usearch
 
Geo
- https://s2geometry.io/
- uber/h3
- Apache-2.0, C
- Hexagonal hierarchical geospatial indexing system
 
Tools
- Netflix/ndbench
- Data Store Benchmark
 
Interactive
- pocketbase/pocketbase
- MIT, Go, SQLite
 
- sqlpad/sqlpad
- Web-based SQL editor and visualizing
 
- pinterest/querybook
- chartbrew/chartbrew
- MIT, Node.js
- Data visualization tool for SQL databases
 
Library
- Parser/SQL
- prql/prql
- Apache-2.0, Rust
- Pipelined Relational Query Language
 
- Open Source SQL Parsers
- runreveal/pql
- Golang
- Pipelined Query Language
- https://news.ycombinator.com/item?id=39539252
 
- Azure Kusto Query Language (KQL)
- Azure appinsights logs
 
 
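Migration/Upgrade/Schema Changes
Environments that never need to be redeployed can skip this.
Common schema migration approaches
- DSL - XML, YAML, HCL
- Supports diff
- Adds another layer to learn
- Supports multiple databases
- Also supports objects other than tables
- The DSL must be written first, as the SSOT (single source of truth)
- SQL
- Conceptually simple
- Can use the full power of SQL
- Covers many scenarios
- Mainly version-based management
- ORM -> SQL
- Similar to a DSL
- Avoids defining a separate DSL
- Generated SQL is opaque
- Limited functionality
- The diff generally does not drop columns
- Limited to tables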
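To illustrate the diff-based approaches above, including the point that a diff generally does not drop columns, here is a minimal sketch. It compares two column maps for one table and emits only additive statements. `diff_table` and the `users` example are hypothetical, and real tools handle types, constraints and many more object kinds.

```python
from typing import Dict, List, Tuple

def diff_table(
    table: str, current: Dict[str, str], desired: Dict[str, str]
) -> Tuple[List[str], List[str]]:
    """Return (statements to run, columns left untouched because they would be dropped)."""
    statements = []
    for column, col_type in desired.items():
        if column not in current:
            statements.append(f"ALTER TABLE {table} ADD COLUMN {column} {col_type}")
    # Columns that exist only in the live schema are reported, never dropped.
    not_dropped = [column for column in current if column not in desired]
    return statements, not_dropped

current_schema = {"id": "INTEGER", "legacy_flag": "TEXT"}
desired_schema = {"id": "INTEGER", "email": "TEXT"}

statements, not_dropped = diff_table("users", current_schema, desired_schema)
print(statements)   # ['ALTER TABLE users ADD COLUMN email TEXT']
print(not_dropped)  # ['legacy_flag']
```

Leaving removals to an explicit, hand-written migration is a conservative default, since a dropped column cannot be recovered from the schema alone.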
Versioned directory layouts
- [version]-[title].sql
- up/down sections split by comments
- --- up
- --- down
 
- [version]-[title].<up|down>.sql
- [version]-[title]/
- One directory per version
- up.sql
- down.sql
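The single-file layout above, with `--- up` and `--- down` comment markers, can be parsed with a few lines of code. This sketch assumes a numeric version prefix, which the layout itself does not mandate; `parse_name` and `split_up_down` are hypothetical helpers, not part of any tool listed here.

```python
import re

# Assumed filename shape: [version]-[title].sql with a numeric version.
FILENAME = re.compile(r"^(?P<version>\d+)-(?P<title>.+)\.sql$")

def parse_name(filename: str):
    """Split '[version]-[title].sql' into (version, title)."""
    match = FILENAME.match(filename)
    if match is None:
        raise ValueError(f"not a migration file: {filename}")
    return int(match.group("version")), match.group("title")

def split_up_down(text: str):
    """Split one migration file into its up and down SQL sections."""
    sections = {"up": [], "down": []}
    current = None
    for line in text.splitlines():
        marker = line.strip().lower()
        if marker == "--- up":
            current = "up"
        elif marker == "--- down":
            current = "down"
        elif current is not None:
            sections[current].append(line)
    return "\n".join(sections["up"]).strip(), "\n".join(sections["down"]).strip()

example = """--- up
CREATE TABLE users (id INTEGER PRIMARY KEY);
--- down
DROP TABLE users;
"""

print(parse_name("0001-create-users.sql"))
print(split_up_down(example))
```

Sorting files by the parsed version gives the order in which the up sections run; rolling back runs the down sections in reverse order.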
 
- Platforms/Services/Tools
- bytebase/bytebase
- MIT, Go, Vue
- Safe database schema change and version control for DevOps teams.
 
 
- Java
- flywaydb
- Apache-2.0, SQL
- db/migration/<Prefix><Version>__<Description>.sql
- mvn clean flyway:migrate -Dflyway.configFiles=myFlywayConfig.conf
- IDEA JPA Buddy has Flyway integration
 
- liquibase
- XML
 
 
- Golang
- ariga/atlas
- Apache-2.0, DSL, Go
- SQL, HCL, ORM
- database schema as code
 
- golang-migrate/migrate
- MIT, Go
- VER_DESC.<up|down>.sql
- migrate create -ext sql -dir db/migrations -seq create_users_table
 
- amacneil/dbmate
- SQL, Go
- framework-agnostic database migration tool
- Sections split by comments: -- migrate:up
 
- pressly/goose
- MIT, SQL/Go
- Supports SQL migrations and Go functions.
 
 
- NodeJS
- salsita/node-pg-migrate
- MIT, TS
- Node.js database migration management for PostgreSQL
 
 
- djrobstep/migra
- PG Schema Diff
 
- fabianlindfors/reshape
- zero-downtime schema migration tool for Postgres
 
- sqitchers/sqitch
- MIT, Perl
 
- skeema
- MySQL & MariaDB
 
- sqlite user_version
- rickbergfalk/postgrator
- Hasura - https://hasura.io/docs/latest/migrations-metadata-seeds/manage-migrations/
- VER_DESC/<up|down>.sql
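For the `sqlite user_version` entry in the list above: SQLite stores an integer in the database header that applications can read and set with `PRAGMA user_version`, which is enough for a small migration runner. The sketch below is one possible approach; the `MIGRATIONS` list and the `migrate` function are hypothetical.

```python
import sqlite3

# Ordered migrations; position N in the list brings the schema to version N + 1.
MIGRATIONS = [
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)",
    "ALTER TABLE users ADD COLUMN email TEXT",
]

def migrate(conn: sqlite3.Connection) -> int:
    """Apply migrations newer than PRAGMA user_version and return the final version."""
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    for version, sql in enumerate(MIGRATIONS[current:], start=current + 1):
        conn.execute(sql)
        # PRAGMA does not accept bound parameters; version is always an int here.
        conn.execute(f"PRAGMA user_version = {version}")
    conn.commit()
    return conn.execute("PRAGMA user_version").fetchone()[0]

conn = sqlite3.connect(":memory:")
print(migrate(conn))  # 2
print(migrate(conn))  # 2, nothing left to apply
```

A fresh database reports `user_version` 0, so the same function covers both first-time setup and upgrades.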
 
TBD
- Database eLearning
- DB-Engines Ranking
- Aphyr jepsen
- EventQL
- EventQL is a distributed, column-oriented database built for large-scale event collection and analytics. It runs super-fast SQL and MapReduce queries.
 
- CrateDB
- SQL FOR THINGS DATA
- Real-time SQL. Simple scaling. Millions of inserts per second.
- CrateDB offers Standard-SQL, real-time queries and document support (JSON) in a simple, horizontal way to scale.
- IoT & Sensors
- Stream millions of data points per second into CrateDB and query them in real time to visualize, track, and predict.
 
- Log & Event Analytics
- Analyze logs from IT infrastructure to monitor security, compliance, usage, billing, and more.
 
- Time Series, Geospatial, Machine Learning
- Versatile SQL engine enables complex queries, text searches, and aggregations – in real time or against volumes of historic data.
 
 
- caesar0301/awesome-public-datasets
- https://dbmstools.com/