LM Studio 0.3.5

2024-10-22

LM Studio 0.3.5 introduces headless mode, on-demand model loading, and updates to mlx-engine to support Pixtral (MistralAI's vision-enabled LLM).


👾 We are hiring a TypeScript SDK Engineer in NYC to build apps and SDKs for on-device AI

Get the latest LM Studio:

  • macOS: Download the .dmg file from here.
  • Windows: Download the .exe file from here.
  • Linux: Download the .AppImage file from here.

LM Studio as your local LLM background server

In this release we're adding a combination of developer-facing features aimed at making it much more ergonomic to use LM Studio as your background LLM provider. We've implemented headless mode, on-demand model loading, server auto-start, and a new CLI command to download models from the terminal. These features are useful for powering local web apps, code editor or web browser extensions, and much more.

Headless mode

Normally, to use LM Studio's functionality you'd have to keep the application open. This sounds obvious when considering LM Studio's graphical user interface. But for certain developer workflows, mainly ones that use LM Studio exclusively as a server, keeping the application running results in unnecessary consumption of resources such as video memory. Moreover, it's cumbersome to remember to launch the application after a reboot and enable the server manually. No more! Enter: headless mode 👻.

Headless mode, or "Local LLM Service", enables you to leverage LM Studio's technology (completions, chat completions, embeddings, and structured outputs via llama.cpp or Apple MLX) as a local server powering your app.

Once you turn on "Enable Local LLM Service", LM Studio's process will run without the GUI upon machine startup.


Enable the LLM server to start on machine login

Minimize to tray

To switch to using LM Studio in the background, you can minimize it to the tray. This will hide the dock icon and free up resources taken up by the graphical user interface.

Minimize to tray on Windows


Send LM Studio to run in the background on Windows

Minimize to tray on Mac


Send LM Studio to run in the background on macOS

Remember last server state

If you turn the server ON, it'll auto-start the next time the application starts, whether launched by you or at machine login when running in service mode. The same goes for turning the server OFF.

To ensure the server is on, run the following command:

# Start the server
lms server start

Conversely, to ensure the server is off, run:

# Stop the server
lms server stop
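
Once the server is up, any of its OpenAI-compatible endpoints will respond. As a quick smoke test, here's a minimal embeddings request; this is a sketch that assumes the server's default port (1234), and the model name is a placeholder for an embedding model you've downloaded:

# Request embeddings from the local server (default port 1234 assumed)
curl http://localhost:1234/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "your-embedding-model",
    "input": "LM Studio is running as a local LLM service."
  }'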

On-demand model loading

Before v0.3.5: to use a model through LM Studio, you first had to load it yourself, either through the UI, via lms load, or through lmstudio-js.

After v0.3.5: to use a model, simply send an inference request to it. If the model is not yet loaded, it'll be loaded before your request returns. This means the first request might take a few seconds while the loading operation finishes, but subsequent calls should be zippy as usual.
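
For instance, a first chat completion call to a model that hasn't been loaded will trigger the load transparently. A minimal sketch, assuming the default port (1234); the model identifier below is illustrative and should match an entry from GET /v1/models:

# First call triggers just-in-time loading; subsequent calls reuse the loaded model
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.2-1b-instruct",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'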

Combine on-demand loading with Per-Model Settings

With on-demand model loading, you might wonder how to configure load settings such as Context Length, GPU offload %, Flash Attention, and more. This is solved by LM Studio's per-model default settings feature.

Using per-model settings, you can predetermine which load parameters the software will use by default when loading a given model.

🛠️ API change: GET /v1/models behavior

  • Without JIT loading (pre-0.3.5 default): returns only the models that are already loaded into memory
  • With JIT loading: returns all local models that can be loaded
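
You can observe the difference directly; this assumes the server's default port (1234):

# With JIT loading on, this lists every downloaded model, not just the loaded ones
curl http://localhost:1234/v1/models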

How to turn on JIT model loading

If you've used LM Studio before, turn Just-In-Time model loading on by flipping this switch in the Developer tab. New installs have it on by default.


Load models on demand

lms get

LM Studio's CLI, lms, gains a new command that lets you download models directly from the terminal.

lms is updated automatically when you install a new version of LM Studio.

Download a model: lms get {author}/{repo}

To download Meta's Llama 3.2 1B, run:

lms get bartowski/llama-3.2-1b-instruct-gguf

Download a specific quantization

We're introducing the following notation for specifying a quantization: @{quantization}

Get the q4_k_m quant

lms get lmstudio-community/llama-3.2-1b-instruct-gguf@q4_k_m

Get the q8_0 quant

lms get lmstudio-community/llama-3.2-1b-instruct-gguf@q8_0

Provide an explicit huggingface.co URL

You can also provide an explicit Hugging Face URL to download a specific model:

lms get https://huggingface.co/lmstudio-community/granite-3.0-2b-instruct-GGUF

The quantization notation works here too!

lms get https://huggingface.co/lmstudio-community/granite-3.0-2b-instruct-GGUF@q8_0

This will download the q8_0 quant for this model.

Pixtral support with Apple MLX

In LM Studio 0.3.4 we introduced support for Apple MLX. Read about it here. In 0.3.5 we've updated the underlying MLX engine (which is open source) and added support for MistralAI's Pixtral!

This was made possible by adopting Blaizzy/mlx-vlm version 0.0.15.

You can download Pixtral through Model Search (⌘ + ⇧ + M) or by using lms get like so:

lms get mlx-community/pixtral-12b-4bit

Take it for a spin if your Mac has 16GB+ RAM, preferably 32GB+.
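
Once downloaded, Pixtral can be queried through the server's OpenAI-compatible chat endpoint using the standard vision message format. A minimal sketch, assuming the default port (1234) and a base64-encoded image on hand; the model identifier should match what GET /v1/models reports:

# Ask Pixtral about an image via the OpenAI-style vision message format
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mlx-community/pixtral-12b-4bit",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe this image."},
        {"type": "image_url",
         "image_url": {"url": "data:image/jpeg;base64,<BASE64_IMAGE>"}}
      ]
    }]
  }'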

Bug fixes


LM Studio 0.3.5 - Full Changelog

  • Run LM Studio as a service (headless)
    • lms load, lms server start no longer require launching the GUI
    • ability to run on machine startup
  • Server start / stop button will remember last setting
    • This is useful when LM Studio is running as a service
  • Improvement to Model Search
    • Hugging Face search now happens automatically without Cmd / Ctrl + Enter
  • Just-In-Time model loading for OpenAI endpoints
  • Button to toggle Mission Control full screen / modal modes
  • Update llama.cpp-based JSON response generation; now supports more complex JSON schemas
  • Tray menu options to minimize app to tray, copy server base URL
  • Checkbox to add lms to PATH during onboarding on Linux
  • [Mac][MLX Vision] Bump mlx-vlm version to 0.0.15, support Qwen2VL
  • [Mac][MLX Engine] Updated Transformers to 4.45.0
  • [UI] Move Chat Appearance control to top bar
  • [UI] Tweaks to size of per-message action buttons
  • Localization:

Even More