Good stuff!
You absolutely cooked with these! One of them has replaced my daily driver for important tasks.
Used: GLM-4.5-Air-Q8_0-FFN-IQ4_XS-IQ4_XS-Q5_0-v2
It scored much better on my mini private benchmark and my general vibe test, and on perplexity (just to throw that in the mix), all while retaining solid performance.
It crushed Unsloth's UD Q3_K_XL. The Q4_K_XL did better perplexity-wise, but it wasn't usable speed-wise on my hardware, topping out at 5 tok/s TG. I'll have to give your v1 a whirl; there's no similarly sized quant in the v2 batch to give it a fair shake against.
Bartowski’s IQ4_XS was a very close second, though it is 7.3 GB smaller, so it might be a tie. In all fairness, his models are always superb, at least to me.
I tried your two smaller variants, but GLM-4.5-Air-Q8_0-FFN-IQ4_XS-IQ4_XS-IQ4_NL-v2 ran slow for some reason, and GLM-4.5-Air-Q8_0-FFN-IQ4_XS-IQ3_S-IQ4_NL-v2 didn't perform well; Bartowski's was the clear winner over those two.
Here are some perplexity numbers; take them with a grain of salt. I only ran 75 chunks, which gives a decent rough idea without waiting hours. Speeds vary because I was experimenting with different Linux kernel schedulers and lost track of which run used which.
—
[I may have made some mistakes below; it's late and I wanted to say well done and thank you before I forgot again :P]
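For anyone who wants to reproduce these, the runs below came out of llama.cpp's llama-perplexity with the settings visible in the logs (n_ctx=512, batch size 2048, 75 chunks). A rough sketch of the invocation; the model path, test file, and -ngl value are placeholders, not my exact setup:

llama-perplexity -m /path/to/GLM-4.5-Air-Q8_0-FFN-IQ4_XS-IQ4_XS-Q5_0-v2.gguf \
    -f wiki.test.raw -c 512 -b 2048 --chunks 75 -ngl 99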
--- GLM-4.5-Air-Q8_0-FFN-IQ4_XS-IQ4_XS-Q5_0-v2
llama_model_loader: loaded meta data with 42 key-value pairs and 803 tensors from /home/iam/Downloads/Models/GLM-4.5-Air-Q8_0-FFN-IQ4_XS-IQ4_XS-Q5_0-v2.gguf (version GGUF V3 (latest))
perplexity: tokenizing the input ..
perplexity: tokenization took 416.239 ms
perplexity: calculating perplexity over 75 chunks, n_ctx=512, batch_size=2048, n_seq=4
perplexity: 30.57 seconds per pass - ETA 9.55 minutes
0 2.9273 1.074086 0.110931
0 3.6302 1.289277 0.091056
0 2.8024 1.030472 0.071197
0 2.4754 0.906421 0.061893
2048 2.4320 0.888695 0.053184
2048 2.3032 0.834286 0.045946
2048 2.3109 0.837642 0.042740
2048 2.3804 0.867259 0.039812
4096 2.4878 0.911397 0.037920
4096 2.5030 0.917473 0.035596
4096 2.4952 0.914377 0.033811
4096 2.6972 0.992215 0.034238
6144 3.0065 1.100791 0.034947
6144 3.0562 1.117159 0.033817
6144 3.2102 1.166325 0.033267
6144 3.2910 1.191181 0.032403
8192 3.4199 1.229612 0.032256
8192 3.6551 1.296125 0.032286
8192 3.5916 1.278592 0.031342
8192 3.6296 1.289119 0.030439
10240 3.7176 1.313089 0.029906
10240 3.6849 1.304233 0.029053
10240 3.5997 1.280847 0.028252
10240 3.5304 1.261423 0.027445
12288 3.4873 1.249130 0.026728
12288 3.4546 1.239704 0.026094
12288 3.4317 1.233046 0.025630
12288 3.4680 1.243567 0.025265
14336 3.5115 1.256040 0.024873
14336 3.5660 1.271458 0.024639
14336 3.6333 1.290136 0.024492
14336 3.6811 1.303213 0.024190
16384 3.7518 1.322231 0.023961
16384 3.7869 1.331557 0.023630
16384 3.8771 1.355076 0.023508
16384 3.9272 1.367921 0.023202
18432 3.9458 1.372646 0.022873
18432 4.0396 1.396151 0.022811
18432 4.0721 1.404162 0.022516
18432 4.1046 1.412116 0.022214
20480 4.1845 1.431381 0.022065
20480 4.2002 1.435133 0.021790
20480 4.1996 1.434992 0.021529
20480 4.2363 1.443691 0.021322
22528 4.3437 1.468731 0.021293
22528 4.4034 1.482380 0.021116
22528 4.3713 1.475056 0.020881
22528 4.3036 1.459455 0.020690
24576 4.2605 1.449396 0.020426
24576 4.2656 1.450588 0.020238
24576 4.3086 1.460603 0.020014
24576 4.3411 1.468132 0.019874
26624 4.3904 1.479427 0.019750
26624 4.4209 1.486334 0.019599
26624 4.4429 1.491318 0.019466
26624 4.4728 1.498014 0.019298
28672 4.4688 1.497114 0.019107
28672 4.4908 1.502041 0.018977
28672 4.5009 1.504285 0.018828
28672 4.5472 1.514502 0.018696
30720 4.5920 1.524306 0.018616
30720 4.6526 1.537417 0.018515
30720 4.7047 1.548567 0.018451
30720 4.7340 1.554775 0.018309
32768 4.7450 1.557083 0.018166
32768 4.7570 1.559608 0.018027
32768 4.7581 1.559856 0.017854
32768 4.7639 1.561063 0.017747
34816 4.8057 1.569812 0.017649
34816 4.8080 1.570286 0.017504
34816 4.7971 1.568006 0.017360
34816 4.7959 1.567756 0.017256
36864 4.8084 1.570359 0.017168
36864 4.8356 1.576015 0.017108
36864 4.8370 1.576296 0.017024
Final estimate: PPL = 4.8370 +/- 0.08234
llama_perf_context_print: load time = 44132.52 ms
llama_perf_context_print: prompt eval time = 364498.92 ms / 38400 tokens ( 9.49 ms per token, 105.35 tokens per second)
llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
llama_perf_context_print: total time = 366322.90 ms / 38401 tokens
llama_perf_context_print: graphs reused = 0
--- Bartowski's IQ4_XS
llama_model_loader: loaded meta data with 48 key-value pairs and 803 tensors from /mnt/spare/AI-Models/bartowski_zai-org_GLM-4.5-Air-GGUF/zai-org_GLM-4.5-Air-IQ4_XS-00001-of-00002.gguf (version GGUF V3 (latest))
perplexity: tokenizing the input ..
perplexity: tokenization took 429.382 ms
perplexity: calculating perplexity over 75 chunks, n_ctx=512, batch_size=2048, n_seq=4
perplexity: 49.96 seconds per pass - ETA 15.60 minutes
0 2.8947 1.062871 0.109924
0 3.6463 1.293711 0.090985
0 2.8217 1.037327 0.071466
0 2.4826 0.909291 0.061360
2048 2.4939 0.913846 0.053093
2048 2.3597 0.858539 0.045797
2048 2.3665 0.861413 0.042619
2048 2.4086 0.879040 0.039719
4096 2.5175 0.923267 0.038010
4096 2.5372 0.931073 0.035643
4096 2.5322 0.929097 0.033912
4096 2.7331 1.005434 0.034361
6144 3.0467 1.114045 0.035039
6144 3.0967 1.130325 0.033895
6144 3.2476 1.177920 0.033323
6144 3.3271 1.202089 0.032447
8192 3.4563 1.240184 0.032295
8192 3.6927 1.306361 0.032311
8192 3.6289 1.288931 0.031380
8192 3.6709 1.300431 0.030491
10240 3.7559 1.323319 0.029952
10240 3.7253 1.315147 0.029107
10240 3.6375 1.291295 0.028326
10240 3.5655 1.271312 0.027513
12288 3.5203 1.258544 0.026792
12288 3.4936 1.250919 0.026180
12288 3.4680 1.243571 0.025720
12288 3.5007 1.252962 0.025355
14336 3.5472 1.266146 0.024981
14336 3.6006 1.281088 0.024748
14336 3.6663 1.299173 0.024582
14336 3.7152 1.312440 0.024296
16384 3.7827 1.330450 0.024029
16384 3.8183 1.339800 0.023691
16384 3.9087 1.363198 0.023558
16384 3.9586 1.375897 0.023249
18432 3.9766 1.380432 0.022926
18432 4.0682 1.403210 0.022859
18432 4.1011 1.411264 0.022566
18432 4.1345 1.419365 0.022268
20480 4.2155 1.438768 0.022122
20480 4.2316 1.442579 0.021849
20480 4.2300 1.442213 0.021590
20480 4.2656 1.450571 0.021383
22528 4.3734 1.475535 0.021353
22528 4.4370 1.489977 0.021186
22528 4.3998 1.481569 0.020948
22528 4.3368 1.467128 0.020756
24576 4.2936 1.457128 0.020500
24576 4.2989 1.458359 0.020312
24576 4.3443 1.468867 0.020094
24576 4.3756 1.476049 0.019948
26624 4.4237 1.486981 0.019817
26624 4.4540 1.493793 0.019664
26624 4.4738 1.498228 0.019529
26624 4.5022 1.504571 0.019357
28672 4.4951 1.502985 0.019163
28672 4.5159 1.507602 0.019029
28672 4.5253 1.509675 0.018881
28672 4.5711 1.519764 0.018750
30720 4.6149 1.529295 0.018666
30720 4.6770 1.542649 0.018569
30720 4.7310 1.554137 0.018514
30720 4.7611 1.560484 0.018373
32768 4.7702 1.562383 0.018226
32768 4.7805 1.564544 0.018082
32768 4.7799 1.564423 0.017905
32768 4.7859 1.565678 0.017795
34816 4.8295 1.574738 0.017700
34816 4.8319 1.575237 0.017553
34816 4.8212 1.573017 0.017406
34816 4.8214 1.573064 0.017304
36864 4.8335 1.575573 0.017215
36864 4.8599 1.581021 0.017152
36864 4.8610 1.581246 0.017071
Final estimate: PPL = 4.8610 +/- 0.08298
llama_perf_context_print: load time = 173919.89 ms
llama_perf_context_print: prompt eval time = 368140.00 ms / 38400 tokens ( 9.59 ms per token, 104.31 tokens per second)
llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
llama_perf_context_print: total time = 369979.83 ms / 38401 tokens
llama_perf_context_print: graphs reused = 0
--- GLM-4.5-Air-Q8_0-FFN-IQ4_XS-IQ3_S-IQ4_NL-v2
llama_model_loader: loaded meta data with 42 key-value pairs and 803 tensors from /home/iam/Downloads/Models/GLM-4.5-Air-Q8_0-FFN-IQ4_XS-IQ3_S-IQ4_NL-v2.gguf (version GGUF V3 (latest))
perplexity: tokenizing the input ..
perplexity: tokenization took 488.22 ms
perplexity: calculating perplexity over 75 chunks, n_ctx=512, batch_size=2048, n_seq=4
perplexity: 28.32 seconds per pass - ETA 8.83 minutes
0 2.8853 1.059639 0.106120
0 3.6131 1.284562 0.088691
0 2.8089 1.032805 0.069683
0 2.4854 0.910434 0.060735
2048 2.5242 0.925939 0.052864
2048 2.3811 0.867562 0.045616
2048 2.3988 0.874955 0.042466
2048 2.4706 0.904455 0.040034
4096 2.6382 0.970083 0.038649
4096 2.6573 0.977324 0.036241
4096 2.6420 0.971535 0.034435
4096 2.8416 1.044364 0.034708
6144 3.1722 1.154421 0.035447
6144 3.2265 1.171411 0.034338
6144 3.3781 1.217312 0.033709
6144 3.4544 1.239663 0.032809
8192 3.5776 1.274699 0.032526
8192 3.8182 1.339771 0.032517
8192 3.7472 1.321019 0.031550
8192 3.7788 1.329414 0.030605
10240 3.8608 1.350878 0.030067
10240 3.8219 1.340751 0.029217
10240 3.7290 1.316144 0.028412
10240 3.6505 1.294861 0.027595
12288 3.6010 1.281208 0.026861
12288 3.5662 1.271512 0.026251
12288 3.5391 1.263864 0.025773
12288 3.5716 1.273012 0.025382
14336 3.6184 1.286035 0.025017
14336 3.6695 1.300064 0.024766
14336 3.7352 1.317812 0.024596
14336 3.7824 1.330358 0.024291
16384 3.8537 1.349035 0.024040
16384 3.8917 1.358849 0.023700
16384 3.9838 1.382239 0.023569
16384 4.0347 1.394941 0.023262
18432 4.0494 1.398558 0.022925
18432 4.1428 1.421368 0.022857
18432 4.1757 1.429274 0.022572
18432 4.2092 1.437267 0.022276
20480 4.2893 1.456134 0.022133
20480 4.3027 1.459246 0.021862
20480 4.3007 1.458786 0.021597
20480 4.3361 1.466976 0.021390
22528 4.4439 1.491525 0.021356
22528 4.5067 1.505558 0.021186
22528 4.4710 1.497607 0.020944
22528 4.4021 1.482075 0.020722
24576 4.3570 1.471793 0.020471
24576 4.3622 1.472981 0.020286
24576 4.4065 1.483071 0.020064
24576 4.4384 1.490289 0.019921
26624 4.4892 1.501666 0.019800
26624 4.5208 1.508695 0.019646
26624 4.5427 1.513530 0.019513
26624 4.5708 1.519696 0.019338
28672 4.5626 1.517903 0.019146
28672 4.5843 1.522632 0.019010
28672 4.5921 1.524342 0.018859
28672 4.6375 1.534165 0.018722
30720 4.6830 1.543947 0.018638
30720 4.7463 1.557358 0.018536
30720 4.7994 1.568488 0.018481
30720 4.8282 1.574482 0.018337
32768 4.8394 1.576789 0.018193
32768 4.8503 1.579047 0.018052
32768 4.8494 1.578865 0.017876
32768 4.8544 1.579885 0.017764
34816 4.8972 1.588672 0.017669
34816 4.8990 1.589029 0.017525
34816 4.8846 1.586090 0.017378
34816 4.8854 1.586245 0.017280
36864 4.9002 1.589272 0.017199
36864 4.9272 1.594769 0.017136
36864 4.9276 1.594861 0.017049
Final estimate: PPL = 4.9276 +/- 0.08401
llama_perf_context_print: load time = 47535.27 ms
llama_perf_context_print: prompt eval time = 401160.26 ms / 38400 tokens ( 10.45 ms per token, 95.72 tokens per second)
llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
llama_perf_context_print: total time = 403384.70 ms / 38401 tokens
llama_perf_context_print: graphs reused = 0
--- GLM-4.5-Air-Q8_0-FFN-IQ4_XS-IQ4_XS-IQ4_NL-v2
perplexity: tokenizing the input ..
perplexity: tokenization took 677.204 ms
perplexity: calculating perplexity over 75 chunks, n_ctx=512, batch_size=2048, n_seq=4
perplexity: 43.64 seconds per pass - ETA 13.63 minutes
0 2.9353 1.076825 0.114285
0 3.6513 1.295081 0.092247
0 2.8132 1.034314 0.072015
0 2.4761 0.906694 0.062623
2048 2.4622 0.901070 0.054131
2048 2.3425 0.851220 0.046791
2048 2.3405 0.850365 0.043396
2048 2.3972 0.874281 0.040338
4096 2.5288 0.927738 0.038668
4096 2.5460 0.934539 0.036251
4096 2.5359 0.930534 0.034401
4096 2.7378 1.007150 0.034744
6144 3.0513 1.115577 0.035395
6144 3.1022 1.132128 0.034242
6144 3.2534 1.179711 0.033637
6144 3.3325 1.203726 0.032748
8192 3.4603 1.241369 0.032543
8192 3.6963 1.307335 0.032529
8192 3.6283 1.288757 0.031558
8192 3.6635 1.298418 0.030623
10240 3.7504 1.321874 0.030096
10240 3.7206 1.313882 0.029244
10240 3.6328 1.289993 0.028430
10240 3.5594 1.269591 0.027603
12288 3.5156 1.257219 0.026886
12288 3.4817 1.247511 0.026252
12288 3.4578 1.240641 0.025783
12288 3.4930 1.250768 0.025420
14336 3.5380 1.263557 0.025024
14336 3.5906 1.278309 0.024774
14336 3.6564 1.296492 0.024601
14336 3.7063 1.310035 0.024314
16384 3.7773 1.329001 0.024067
16384 3.8126 1.338309 0.023724
16384 3.9027 1.361661 0.023591
16384 3.9522 1.374263 0.023279
18432 3.9720 1.379272 0.022957
18432 4.0673 1.402984 0.022906
18432 4.0992 1.410800 0.022610
18432 4.1310 1.418520 0.022304
20480 4.2102 1.437517 0.022156
20480 4.2262 1.441296 0.021881
20480 4.2256 1.441160 0.021624
20480 4.2629 1.449959 0.021416
22528 4.3714 1.475092 0.021384
22528 4.4330 1.489076 0.021208
22528 4.4003 1.481676 0.020964
22528 4.3341 1.466519 0.020765
24576 4.2871 1.455621 0.020503
24576 4.2919 1.456719 0.020313
24576 4.3343 1.466569 0.020087
24576 4.3666 1.473986 0.019945
26624 4.4168 1.485407 0.019823
26624 4.4464 1.492084 0.019668
26624 4.4685 1.497042 0.019534
26624 4.4978 1.503591 0.019363
28672 4.4915 1.502194 0.019169
28672 4.5138 1.507148 0.019037
28672 4.5238 1.509349 0.018892
28672 4.5710 1.519743 0.018761
30720 4.6145 1.529203 0.018674
30720 4.6759 1.542429 0.018571
30720 4.7306 1.554042 0.018516
30720 4.7591 1.560058 0.018374
32768 4.7689 1.562121 0.018230
32768 4.7802 1.564489 0.018088
32768 4.7804 1.564533 0.017914
32768 4.7854 1.565567 0.017801
34816 4.8267 1.574170 0.017700
34816 4.8278 1.574392 0.017551
34816 4.8192 1.572612 0.017411
34816 4.8189 1.572543 0.017309
36864 4.8326 1.575388 0.017222
36864 4.8603 1.581091 0.017160
36864 4.8620 1.581446 0.017080
Final estimate: PPL = 4.8620 +/- 0.08304
llama_perf_context_print: load time = 50600.65 ms
llama_perf_context_print: prompt eval time = 492767.38 ms / 38400 tokens ( 12.83 ms per token, 77.93 tokens per second)
llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
llama_perf_context_print: total time = 495380.22 ms / 38401 tokens
llama_perf_context_print: graphs reused = 0
Thanks for the feedback! Glad to know it's working well out in the wild. Here's some data that may interest you:
These first three charts include all the v2 quants as well as GLM-4.5-Air-Q8_0-FFN-IQ3_S-IQ3_S-Q5_0, GLM-4.5-Air-Q8_0-FFN-IQ4_XS-IQ4_XS-Q5_0 (v1), and some other custom quants.


This chart is older and shows all of the v1 quants as well as some quants from Unsloth and Bartowski.
(All KLD values are computed against a standard pure Q8_0 quant)
Overall the IQ4_XS-IQ4_XS-Q5_0-v2 does seem to be the best bang-for-buck. :3
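If anyone wants to reproduce the KLD numbers, llama.cpp's llama-perplexity can compute them in two passes, roughly like this (file names below are placeholders):

# 1) save the baseline logits from the pure Q8_0 quant
llama-perplexity -m GLM-4.5-Air-Q8_0.gguf -f wiki.test.raw --kl-divergence-base q8_0-logits.bin

# 2) score a custom quant against that baseline
llama-perplexity -m GLM-4.5-Air-Q8_0-FFN-IQ4_XS-IQ4_XS-Q5_0-v2.gguf -f wiki.test.raw \
    --kl-divergence-base q8_0-logits.bin --kl-divergence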
I found this in the list of quants after wanting to try Air again. I'd recommend putting one of these charts on your model page; they're an incredible quick reference and exactly what I was looking for in these discussions. Discussions get checked so infrequently that most people won't see the charts there, though. I'd highly recommend at least putting the KL-divergence chart on the model page, and maybe your last sentence as well.