
Conversation

leejet (Owner) commented on Sep 17, 2025:

Try to fix #837.

    if (!force_clip_f32) {
        auto tensor_type = tensor_types.find(prefix + "token_embedding.weight");
  -     if (tensor_type != tensor_types.end())
  +     if (tensor_type != tensor_types.end() && tensor_type->second == GGML_TYPE_F16) {
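
For context, a minimal sketch of what the patched condition implies, assuming (as the crash in #837 suggests) that the guarded branch adopts the stored tensor type; the helper name and the stand-in enum are hypothetical, not the project's actual code:

```cpp
#include <map>
#include <string>

// Minimal stand-in for ggml's type enum; only the values used here.
enum ggml_type { GGML_TYPE_F32, GGML_TYPE_F16 };

// Hypothetical helper showing the effect of the patched check: the token
// embedding keeps its stored type only when that type is F16; any other
// stored type (including quantized ones) falls back to F32.
ggml_type token_embedding_wtype(const std::map<std::string, ggml_type>& tensor_types,
                                const std::string& prefix,
                                bool force_clip_f32) {
    ggml_type wtype = GGML_TYPE_F32;  // safe default
    if (!force_clip_f32) {
        auto tensor_type = tensor_types.find(prefix + "token_embedding.weight");
        // Patched condition: only an F16 embedding keeps its stored type.
        if (tensor_type != tensor_types.end() && tensor_type->second == GGML_TYPE_F16) {
            wtype = GGML_TYPE_F16;
        }
    }
    return wtype;
}
```

The practical effect is that quantized token embeddings are always upcast to F32, which is what the review comment below questions.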
Contributor commented:

Maybe lower quants could be kept at F16 too (or even Q8)? The main reason for quantization is reducing memory usage, so it'd be surprising if a Q8_0 or Q4_0 quant ended up using more VRAM than F16.
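
A minimal sketch of that suggestion, reusing the names from the diff above; GGML_TYPE_Q8_0 and GGML_TYPE_Q4_0 are real ggml type tags, but the helper and the exact set of kept types are assumptions for illustration:

```cpp
#include <map>
#include <string>

// Stand-in for the relevant ggml type tags.
enum ggml_type { GGML_TYPE_F32, GGML_TYPE_F16, GGML_TYPE_Q8_0, GGML_TYPE_Q4_0 };

// Hypothetical relaxation: F16 passes through unchanged, and quantized
// embeddings are upcast only to F16 rather than F32, so a Q8_0 or Q4_0
// checkpoint never spends more VRAM on this tensor than an F16 one would.
ggml_type suggested_wtype(const std::map<std::string, ggml_type>& tensor_types,
                          const std::string& prefix) {
    auto it = tensor_types.find(prefix + "token_embedding.weight");
    if (it == tensor_types.end())
        return GGML_TYPE_F32;      // tensor not found: keep the safe default
    switch (it->second) {
        case GGML_TYPE_F16:
        case GGML_TYPE_Q8_0:
        case GGML_TYPE_Q4_0:
            return GGML_TYPE_F16;  // keep at F16, per the suggestion
        default:
            return GGML_TYPE_F32;
    }
}
```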


Successfully merging this pull request may close the following issue: clip related crash, segmentation fault.