Release Notes
What's new in 0.3.2
- New: Ability to pin models to the top is back!
- Right-click on a model in My Models and select "Pin to top" to pin it to the top of the list.
- Chat migration dialog now appears in the chat side bar.
- You can migrate your chats from pre-0.3.0 versions from there.
- As of v0.3.1, system prompts now migrated as well.
- Your old chats are NOT deleted.
- Don't show a badge with a number on the downloads button if there are no downloads
- Added a button to collapse the FAQ sidebar in the Discover tab
- Reduced default context size from 8K to 4K tokens to alleviate Out of Memory issues
- You can still configure any context size you want in the model load settings
- Added a warning next to the Flash Attention in model load settings
- Flash Attention is experimental and may not be suitable for all models
- Updated the bundled
llama.cpp
engine to 3246fe84d78c8ccccd4291132809236ef477e9ea
(Aug 27)
Bug Fixes
- Bug fix: My Models model size aggregation was incorrect when you had multi-part model files
- Bug fix: (Linux) RAG would fail due to missing bundled embedding model (fixed)
- Bug fix: Flash Attention - KV cache quantization is back to FP16 by default
- In 0.3.0, the K and V were both set to Q8, which introduced large latencies in some cases
- You might notice an increase in memory consumption when FA is ON compared with 0.3.1, but on par with 0.2.31
- Bug fix: On some setups app would hang at start up (fix + mitigation)
- Bug fix: Fixed an issue where the downloads panel was dragged over the app top bar
- Bug fix: Fixed typos in built-in code snippets (server tab)
Even More