skyitachi's blogann-benchmarks中hnsw简单解读 2024-07-23
hnswlib和faiss_hnsw在ann-benchmark中的参数解读
benchmark链接
只需要关注hnswlib和hnsw(faiss) 即可
声明
- 数据集: sift-128-euclidean数据集(向量的维度是128)
- k-nn的k是10
- hnswlib和faiss_hnsw的benchmark都是基于单线程的
benchmark的细节点说明
hnswlib中有相同图例的点, eg:
Parameters: hnswlib ({'M': 12, 'efConstruction': 500})
这样得点标记了好几个,但是QPS和Recall都不相同,
原因在于hnswlib的benchmark配置中有query-args参数1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20// ann_benchmarks/algorithms/hnswlib/config.yml
float:
any:
- base_args: ['@metric']
constructor: HnswLib
disabled: false
docker_tag: ann-benchmarks-hnswlib
module: ann_benchmarks.algorithms.hnswlib
name: hnswlib
run_groups:
M-12:
arg_groups: [{M: 12, efConstruction: 500}]
args: {}
query_args: [[10, 20, 40, 80, 120, 200, 400, 600, 800]]
...
// query_args 对应了 hnswlib/module.py中的set_query_arguments函数的参数
def set_query_arguments(self, ef):
self.p.set_ef(ef)hnswlib中set_ef和faiss_hnsw中设置efSearch的效果是一样的, faiss中的图例就标注了efSearch(ef)
,"faiss ({'M': 12, 'efConstruction': 500}, ef: 80)
efSearch不影响build,所以在Recall/Build time的图上hnswlib就没有”重复”的点了
hnswlib和faiss_hnsw的性能表现
主要看recall和build的表现
ann-benchmark中的表现
参考ann-benchmark的论文,他们运行的硬件环境是: Amazon EC2 c5.4xlarge instances that are equipped with Intel Xeon Platinum 8124M CPU (16 cores available, 3.00 GHz, 25.0MB Cache) and 32GB of RAM running Amazon Linux.
ann-benchmarks网站上的benchmark如下:
说明:
整体QPS随着recall的上升成指数级的下降 (两者都是)
相同召回率的情况下,hnswlib的QPS要高一点, 由于faiss标注了ef(efSearch)但是hnswlib没有标注,所以比较起来不太直观,后面我在本地自己跑的时候给hnswlib带上了ef参数,这样就更加直观了
说明: 1. Build Index Time也是hnswlib占优
本地的表现
硬件环境: AMD Ryzen 5 3600 (3.6G Hz), 6 core 12 threads, 32GB of RAM running Ubuntu 24.04
这里我修改了ann-benchmarks中hnswlib的环境和在efConstruction=200和500的情况下比较hnswlib的表现
1 | // ann_benchmarks/algorithms/hnswlib/config.yml |
Recall & QPS 表现
关于recall 我们更需要关注是否通过参数配置到足够高的精度,我这里以0.99的recall作为基础,主要比较在0.99以上的召回率条件下(qps > 2000),两种算法实现的qps及其对应的参数
algorithm | parameters | k-nn(recall) | qps | build(s) | indexsize(kb) |
---|---|---|---|---|---|
faiss_hnsw | faiss ({‘M’: 24, ‘efConstruction’: 500}, ef: 80) | 0.99031 | 2561.923162412995 | 1458.773098230362 | 794864.0 |
faiss_hnsw | faiss ({‘M’: 16, ‘efConstruction’: 500}, ef: 120) | 0.99347 | 2194.419076982479 | 1285.849939107895 | 732492.0 |
faiss_hnsw | faiss ({‘M’: 36, ‘efConstruction’: 500}, ef: 80) | 0.99314 | 2191.7604350595634 | 1585.1033165454865 | 888936.0 |
faiss_hnsw | faiss ({‘M’: 48, ‘efConstruction’: 500}, ef: 80) | 0.99411 | 2077.6779248107487 | 1667.5446255207062 | 982148.0 |
hnswlib | hnswlib ({‘M’: 64, ‘efConstruction’: 200}, ef: 80) | 0.9901500000000001 | 3731.2651844102425 | 496.65849924087524 | 1150800.0 |
hnswlib | hnswlib ({‘M’: 96, ‘efConstruction’: 200}, ef: 80) | 0.9902599999999999 | 3616.44180507205 | 513.7249546051025 | 1400632.0 |
hnswlib | hnswlib ({‘M’: 36, ‘efConstruction’: 500}, ef: 80) | 0.99308 | 3571.6658501707066 | 1156.385510444641 | 931660.0 |
hnswlib | hnswlib ({‘M’: 24, ‘efConstruction’: 200}, ef: 120) | 0.9939 | 3314.9200264352426 | 426.75320744514465 | 838564.0 |
hnswlib | hnswlib ({‘M’: 48, ‘efConstruction’: 500}, ef: 80) | 0.99433 | 3290.7135482210724 | 1246.9214706420898 | 1026148.0 |
hnswlib | hnswlib ({‘M’: 64, ‘efConstruction’: 500}, ef: 80) | 0.9946999999999999 | 3153.6867226554614 | 1276.8294219970703 | 1146332.0 |
hnswlib | hnswlib ({‘M’: 24, ‘efConstruction’: 500}, ef: 120) | 0.9955999999999999 | 3139.927141602247 | 1008.0483357906342 | 838676.0 |
hnswlib | hnswlib ({‘M’: 12, ‘efConstruction’: 200}, ef: 200) | 0.9920199999999999 | 3100.2119285498034 | 313.95580410957336 | 745664.0 |
hnswlib | hnswlib ({‘M’: 12, ‘efConstruction’: 500}, ef: 200) | 0.99271 | 3040.3355289720653 | 709.9178235530853 | 745132.0 |
hnswlib | hnswlib ({‘M’: 96, ‘efConstruction’: 500}, ef: 80) | 0.9949 | 3036.296166689699 | 1317.8283751010895 | 1400940.0 |
hnswlib | hnswlib ({‘M’: 36, ‘efConstruction’: 200}, ef: 120) | 0.99544 | 2873.9128879689624 | 466.08784890174866 | 931436.0 |
hnswlib | hnswlib ({‘M’: 48, ‘efConstruction’: 200}, ef: 120) | 0.9956799999999999 | 2690.4460742892884 | 482.9417383670807 | 1025096.0 |
hnswlib | hnswlib ({‘M’: 64, ‘efConstruction’: 200}, ef: 120) | 0.99576 | 2671.663602183599 | 496.65849924087524 | 1150800.0 |
hnswlib | hnswlib ({‘M’: 36, ‘efConstruction’: 500}, ef: 120) | 0.99753 | 2612.247989122775 | 1156.385510444641 | 931660.0 |
hnswlib | hnswlib ({‘M’: 96, ‘efConstruction’: 200}, ef: 120) | 0.99588 | 2581.161300550079 | 513.7249546051025 | 1400632.0 |
hnswlib | hnswlib ({‘M’: 16, ‘efConstruction’: 200}, ef: 200) | 0.9962 | 2565.1980331237614 | 364.242990732193 | 778360.0 |
hnswlib | hnswlib ({‘M’: 48, ‘efConstruction’: 500}, ef: 120) | 0.9979100000000001 | 2394.6802442922076 | 1246.9214706420898 | 1026148.0 |
hnswlib | hnswlib ({‘M’: 64, ‘efConstruction’: 500}, ef: 120) | 0.9980800000000001 | 2282.691070497442 | 1276.8294219970703 | 1146332.0 |
hnswlib | hnswlib ({‘M’: 96, ‘efConstruction’: 500}, ef: 120) | 0.99821 | 2188.4360442789643 | 1317.8283751010895 | 1400940.0 |
hnswlib | hnswlib ({‘M’: 24, ‘efConstruction’: 200}, ef: 200) | 0.99818 | 2166.896152958348 | 426.75320744514465 | 838564.0 |
hnswlib | hnswlib ({‘M’: 8, ‘efConstruction’: 200}, ef: 400) | 0.99244 | 2138.951169866413 | 256.2939279079437 | 715140.0 |
hnswlib | hnswlib ({‘M’: 8, ‘efConstruction’: 500}, ef: 400) | 0.9936 | 2124.6338197635687 | 588.7308986186981 | 715372.0 |
hnswlib | hnswlib ({‘M’: 24, ‘efConstruction’: 500}, ef: 200) | 0.9989100000000001 | 2017.995520316913 | 1008.0483357906342 | 838676.0 |
Build Index Time
algorithm | parameters | indexsize | build |
---|---|---|---|
hnswlib | hnswlib ({‘M’: 8, ‘efConstruction’: 200}, ef: 400) | 715140.0 | 256.2939279079437 |
hnswlib | hnswlib ({‘M’: 12, ‘efConstruction’: 200}, ef: 200) | 745664.0 | 313.95580410957336 |
hnswlib | hnswlib ({‘M’: 16, ‘efConstruction’: 200}, ef: 200) | 778360.0 | 364.242990732193 |
hnswlib | hnswlib ({‘M’: 24, ‘efConstruction’: 200}, ef: 120) | 838564.0 | 426.75320744514465 |
hnswlib | hnswlib ({‘M’: 36, ‘efConstruction’: 200}, ef: 120) | 931436.0 | 466.08784890174866 |
hnswlib | hnswlib ({‘M’: 48, ‘efConstruction’: 200}, ef: 120) | 1025096.0 | 482.9417383670807 |
hnswlib | hnswlib ({‘M’: 64, ‘efConstruction’: 200}, ef: 120) | 1150800.0 | 496.65849924087524 |
hnswlib | hnswlib ({‘M’: 96, ‘efConstruction’: 200}, ef: 80) | 1400632.0 | 513.7249546051025 |
hnswlib | hnswlib ({‘M’: 8, ‘efConstruction’: 500}, ef: 400) | 715372.0 | 588.7308986186981 |
hnswlib | hnswlib ({‘M’: 12, ‘efConstruction’: 500}, ef: 200) | 745132.0 | 709.9178235530853 |
hnswlib | hnswlib ({‘M’: 24, ‘efConstruction’: 500}, ef: 120) | 838676.0 | 1008.0483357906342 |
hnswlib | hnswlib ({‘M’: 36, ‘efConstruction’: 500}, ef: 120) | 931660.0 | 1156.385510444641 |
hnswlib | hnswlib ({‘M’: 48, ‘efConstruction’: 500}, ef: 120) | 1026148.0 | 1246.9214706420898 |
hnswlib | hnswlib ({‘M’: 64, ‘efConstruction’: 500}, ef: 120) | 1146332.0 | 1276.8294219970703 |
faiss_hnsw | faiss ({‘M’: 16, ‘efConstruction’: 500}, ef: 120) | 732492.0 | 1285.849939107895 |
hnswlib | hnswlib ({‘M’: 96, ‘efConstruction’: 500}, ef: 120) | 1400940.0 | 1317.8283751010895 |
faiss_hnsw | faiss ({‘M’: 24, ‘efConstruction’: 500}, ef: 80) | 794864.0 | 1458.773098230362 |
faiss_hnsw | faiss ({‘M’: 36, ‘efConstruction’: 500}, ef: 80) | 888936.0 | 1585.1033165454865 |
faiss_hnsw | faiss ({‘M’: 48, ‘efConstruction’: 500}, ef: 80) | 982148.0 | 1667.5446255207062 |
说明
1.efConstruction越小,build 耗费时间越小,牺牲的精确性可以通过加大ef来弥补, 要想获取最佳性能需要对M,efConstruction, ef这三个参数进行平衡
2.indexsize 只和M有关系
3.整体而言,hnswlib的性能仍然要比faiss的hnsw的要好一点, 两者差距不大
4.efConstruction=200的情况下通过适当调大ef也能实现较高的召回率,也不会带来性能损失,但是对Build Index Time 会有较大的提升
其他细节指标解读
- ann-benchmarks中通过python data_export.py可以得出详细的指标数据, 其中有两列
epsilon
,largeepsilon
这里其实对应得是在算recall的时候,允许的距离误差大小(使用euclidean距离,就是向量召回的点和实际的knn的点距离误差),epsilon
是0.01,largeepsilon
, 允许的误差越大,对应的recall 就越高, 论文中的实际公式如下:
- indexsize在graph base的算法中都挺大的,不过整体上hnswlib的更小一点
总结
这里列举都是在特定数据集下的表现,事实上不同数据集下不同类型的算法表现不尽相同,hnswlib不是在任何情况下的表现都由于faiss_hnsw, 那在实际的生产环境中,还是需要对不同的算法进行benchmark,从而得出更好的参数和选出更好的算法
ann-benchmarks的论文指出虽然graph-based的算法在rand-euclidean的数据集下性能还不如faiss-ivf,但是在实际的数据集上,往往会有所谓的global structure出现,graph based的算法一般都是更好的选择,graph based中的hnsw也是更好的选择
ann-benchmarks也指出如果使用GPU的话,graph based的算法不如ivf这种简单的算法