Image Input
Required Python SDK version: 1.1.0
Some models, known as VLMs (Vision-Language Models), can accept images as input. You can pass images to the model using the .respond() method.
Prerequisite: Get a VLM (Vision-Language Model)
If you don't yet have a VLM, you can download a model like qwen2-vl-2b-instruct using the following command:
lms get qwen2-vl-2b-instruct
1. Instantiate the Model
Connect to LM Studio and obtain a handle to the VLM (Vision-Language Model) you want to use.
import lmstudio as lms
model = lms.llm("qwen2-vl-2b-instruct")
2. Prepare the Image
Use the prepare_image() function or the files namespace method to get a handle to the image that can subsequently be passed to the model.
import lmstudio as lms
image_path = "/path/to/image.jpg" # Replace with the path to your image
image_handle = lms.prepare_image(image_path)
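If you are working with a scoped Client instance rather than the convenience API, the same operation is available through the client's files namespace. A minimal sketch, assuming the Client can be used as a context manager and exposes prepare_image() under client.files:
import lmstudio as lms

# Sketch of the files-namespace variant, assuming prepare_image()
# is exposed under the client's .files namespace.
with lms.Client() as client:
    image_handle = client.files.prepare_image("/path/to/image.jpg")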
If you only have the raw data of the image, you can supply the raw data directly as a bytes object without having to write it to disk first. Due to this feature, binary filesystem paths are not supported (as they will be handled as malformed image data rather than as filesystem paths).
Binary IO objects are also accepted as local file inputs.
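For example, a minimal sketch of supplying the image as raw data or as an open binary file, based on the note above:
import lmstudio as lms

# Option A: pass the raw image data as a bytes object.
with open("/path/to/image.jpg", "rb") as f:
    image_handle = lms.prepare_image(f.read())

# Option B: pass an open binary IO object directly.
with open("/path/to/image.jpg", "rb") as f:
    image_handle = lms.prepare_image(f)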
The LM Studio server supports JPEG, PNG, and WebP image formats.
3. Pass the Image to the Model in .respond()
Generate a prediction by passing the image to the model in the .respond() method.
import lmstudio as lms
image_path = "/path/to/image.jpg" # Replace with the path to your image
image_handle = lms.prepare_image(image_path)
model = lms.llm("qwen2-vl-2b-instruct")
chat = lms.Chat()
chat.add_user_message("Describe this image please", images=[image_handle])
prediction = model.respond(chat)
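The returned prediction can then be handled like any other text prediction result. A minimal sketch, assuming the result renders its generated text when printed:
print(prediction)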