another AI question

Edward Capriolo edlinuxguru at gmail.com
Wed Apr 8 08:26:01 EDT 2026


I spent a lot of time and heartache dealing with vLLM and ollama.
Here are the issues with vLLM: CPU is barely supported, it takes 60 GB of
docker layers to build, and then it runs super slow. Ollama is much better
on CPU, but community-wise they take months to merge common-sense
features. Still, that's better than vLLM, which is a black hole of denial
that never merges anyway. The problem I have with ollama is similar to
vLLM: the install itself is about 14 GB, as it downloads every C and BLAS
library in existence.

Enter deliverance
https://github.com/edwardcapriolo/deliverance

- Written for CPU
- Written in Java, with selected C modules for some heavy lifting
- A binary of < 55 MB! (not 20 GB of Python, C, and tensor libraries)
- Compiles in < 5 minutes (including tests)
- Available on Docker Hub: https://hub.docker.com/r/ecapriolo/deliveranc
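
If you want to try the image, a pull sketch (the image name here is my
guess from the repo name, and the run arguments will depend on the
entrypoint, so check the Docker Hub page for the real instructions):

docker pull ecapriolo/deliverance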

I do nice work with quantized models for Qwen, Gemma, and Llama. I do most
of my dev work on a Core i5 that is 8 years old.

"Oh, but Ed, I hate the Java."
Well, the tensor library does some math operations with SIMD in C:
https://github.com/edwardcapriolo/deliverance/blob/main/native/src/main/c/simd/vector_simd.c
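
For flavor, here is a minimal AVX2/FMA dot-product sketch of the kind of
kernel vector_simd.c provides. This is not the deliverance code, just an
illustration (compile with something like: cc -O2 -mavx2 -mfma -c dot_f32.c):

/* Minimal AVX2/FMA dot product -- an illustration, not the deliverance
 * kernel. */
#include <immintrin.h>
#include <stddef.h>

float dot_f32(const float *a, const float *b, size_t n) {
    __m256 acc = _mm256_setzero_ps();
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        /* acc += a[i..i+7] * b[i..i+7], 8 floats per step */
        acc = _mm256_fmadd_ps(_mm256_loadu_ps(a + i),
                              _mm256_loadu_ps(b + i), acc);
    }
    float lanes[8];
    _mm256_storeu_ps(lanes, acc);
    float sum = 0.0f;
    for (int k = 0; k < 8; k++) sum += lanes[k];  /* horizontal add */
    for (; i < n; i++) sum += a[i] * b[i];        /* scalar tail */
    return sum;
}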

And it even has web_dawn CPU support (confession: I don't have a GPU to
test on):
https://github.com/edwardcapriolo/deliverance/blob/main/native/src/main/c/gpu/vector_gpu.c
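
One common way to structure that kind of dual-backend code is a runtime
dispatch that falls back to the CPU path when no GPU device is usable.
This is a hypothetical sketch, not deliverance's actual design; every name
in it is made up:

/* Hypothetical backend dispatch -- not the deliverance API. */
#include <stddef.h>

typedef float (*dot_fn)(const float *, const float *, size_t);

/* CPU path: in practice, the SIMD kernel from vector_simd.c. */
static float dot_cpu(const float *a, const float *b, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; i++) s += a[i] * b[i];
    return s;
}

/* GPU path: in a real build this would submit a web_dawn compute pass;
 * stubbed here so the sketch compiles. */
static float dot_gpu(const float *a, const float *b, size_t n) {
    return dot_cpu(a, b, n);
}

/* A real probe would try to create a Dawn device; stubbed here. */
static int gpu_available(void) { return 0; }

/* Pick the best available kernel once at startup. */
dot_fn pick_dot(void) {
    return gpu_available() ? dot_gpu : dot_cpu;
}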

If anyone wants to loan me access to a BSD system, or a BSD system with a
GPU, I can run some tests there and we can have some fun.

The dependencies are very light on the C side. My Alpine system that I
test on looks like this:

doas apk add maven
doas apk add git
doas apk add curl
doas apk add docker-compose
doas apk add openjdk25
doas apk add gpg
doas apk add bash
doas apk add clang20-libclang=20.1.8-r0
doas apk add llvm clang lld
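
For reference, a standalone build of a kernel like the dot-product sketch
above would look roughly like this with that clang toolchain (the flags
and file name are my assumption; the real module builds through maven):

clang -O2 -mavx2 -mfma -fPIC -shared -o libdot.so dot_f32.c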

That will build the SIMD C module, which as I mentioned is significantly
less effort than the 12 GB of BLAS libraries ollama installs and the 4 GB
of TensorFlow stuff transformers will install.


Thanks,
Edward


On Tue, Apr 7, 2026 at 5:50 PM Martin Cracauer <cracauer at cons.org> wrote:

> The situation with LLMs on FreeBSD is not totally catastrophic.
>
> The NVidia drivers are currently broken on my 5090, so I cannot
> compare Vulkan/FreeBSD to Linux/Cuda.
>
> But they work on my 2080ti with Vulkan and run both ollama and
> llama.cpp, accelerated.
>
> My laptop with "AMD Ryzen 7 PRO 4750U with Radeon Graphics" also
> runs Vulkan and accelerates ollama (although only by a factor of 3
> compared to CPU).  This combo does not run llama.cpp.
>
> Now that NVidia drivers are running on at least one of my cards I'll
> give it another go to run CUDA through Linuxulator.
>
> Martin
> --
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> Martin Cracauer <cracauer at cons.org>   http://www.cons.org/cracauer/
>