
Conversation

leejet (Owner) commented on Sep 17, 2025:

Try to fix #837.

    if (!force_clip_f32) {
        auto tensor_type = tensor_types.find(prefix + "token_embedding.weight");
  -     if (tensor_type != tensor_types.end())
  +     if (tensor_type != tensor_types.end() && tensor_type->second == GGML_TYPE_F16) {
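
For context, a minimal sketch of what the patched condition implies, assuming (as the crash in #837 suggests) that the guarded branch adopts the stored tensor type; the helper name and the stand-in enum are hypothetical, not the project's actual code:

```cpp
#include <map>
#include <string>

// Minimal stand-in for ggml's type enum; only the values used here.
enum ggml_type { GGML_TYPE_F32, GGML_TYPE_F16 };

// Hypothetical helper showing the effect of the patched check: the token
// embedding keeps its stored type only when that type is F16; any other
// stored type (including quantized ones) falls back to F32.
ggml_type token_embedding_wtype(const std::map<std::string, ggml_type>& tensor_types,
                                const std::string& prefix,
                                bool force_clip_f32) {
    ggml_type wtype = GGML_TYPE_F32;  // safe default
    if (!force_clip_f32) {
        auto tensor_type = tensor_types.find(prefix + "token_embedding.weight");
        // Patched condition: only an F16 embedding keeps its stored type.
        if (tensor_type != tensor_types.end() && tensor_type->second == GGML_TYPE_F16) {
            wtype = GGML_TYPE_F16;
        }
    }
    return wtype;
}
```

The practical effect is that quantized token embeddings are always upcast to F32, which is what the review comment below questions.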
Contributor commented:

Maybe lower quants could be kept at F16 too (or even Q8)? The main reason for quantization is reducing memory usage, so it'd be surprising if a Q8_0 or Q4_0 quant ended up using more VRAM than F16.
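
A minimal sketch of that suggestion, reusing the names from the diff above; GGML_TYPE_Q8_0 and GGML_TYPE_Q4_0 are real ggml type tags, but the helper and the exact set of kept types are assumptions for illustration:

```cpp
#include <map>
#include <string>

// Stand-in for the relevant ggml type tags.
enum ggml_type { GGML_TYPE_F32, GGML_TYPE_F16, GGML_TYPE_Q8_0, GGML_TYPE_Q4_0 };

// Hypothetical relaxation: F16 passes through unchanged, and quantized
// embeddings are upcast only to F16 rather than F32, so a Q8_0 or Q4_0
// checkpoint never spends more VRAM on this tensor than an F16 one would.
ggml_type suggested_wtype(const std::map<std::string, ggml_type>& tensor_types,
                          const std::string& prefix) {
    auto it = tensor_types.find(prefix + "token_embedding.weight");
    if (it == tensor_types.end())
        return GGML_TYPE_F32;      // tensor not found: keep the safe default
    switch (it->second) {
        case GGML_TYPE_F16:
        case GGML_TYPE_Q8_0:
        case GGML_TYPE_Q4_0:
            return GGML_TYPE_F16;  // keep at F16, per the suggestion
        default:
            return GGML_TYPE_F32;
    }
}
```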


Successfully merging this pull request may close the following issue: clip related crash, segmentation fault.