
Hacker News: Newest

Make loading weights 10-100x faster by jart · Pull Request #613 · ggerganov/llama.cpp
This is a breaking change that's going to give us three benefits: your inference commands should load 100x faster, you may be able to safely load models 2x larger, and you can run many concurrent inference...