
Hacker News: Newest

Make loading weights 10-100x faster by jart · Pull Request #613 · ggerganov/llama.cpp
This is a breaking change that's going to give us three benefits: your inference commands should load 100x faster, you may be able to safely load models 2x larger, and you can run many concurrent inference...