Structured Output

You can enforce a particular response format from an LLM by providing a JSON schema to the /v1/chat/completions endpoint, via LM Studio's REST API (or via any OpenAI client).


Start LM Studio as a server

To use LM Studio programmatically from your own code, run LM Studio as a local server.

You can turn on the server from the "Developer" tab in LM Studio, or via the lms CLI:

lms server start
If you don't have lms installed, get it by running npx lmstudio install-cli

This will allow you to interact with LM Studio via an OpenAI-like REST API. For an intro to LM Studio's OpenAI-like API, see Running LM Studio as a server.
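
Once the server is running, a quick way to confirm it is reachable is to list the available models via the OpenAI-compatible /v1/models endpoint. The sketch below assumes the server is on the default port 1234 and that the openai Python package is installed; the api_key value is a placeholder, since the local server does not verify it.

from openai import OpenAI

# Point the OpenAI client at the local LM Studio server (default port assumed)
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

# List the models LM Studio currently exposes; any output means the server is reachable
for model in client.models.list().data:
    print(model.id)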


Structured Output

The API supports structured JSON outputs through the /v1/chat/completions endpoint when given a JSON schema. Doing this will cause the LLM to respond in valid JSON conforming to the schema provided.

It follows the same format as OpenAI's Structured Output API and is expected to work via the OpenAI client SDKs.

Example using curl

This example demonstrates a structured output request using the curl utility.

To run this example on Mac or Linux, use any terminal. On Windows, use Git Bash.

curl http://{{hostname}}:{{port}}/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "{{model}}",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful jokester."
      },
      {
        "role": "user",
        "content": "Tell me a joke."
      }
    ],
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "joke_response",
        "strict": "true",
        "schema": {
          "type": "object",
          "properties": {
            "joke": {
              "type": "string"
            }
          },
        "required": ["joke"]
        }
      }
    },
    "temperature": 0.7,
    "max_tokens": 50,
    "stream": false
  }'

All parameters recognized by /v1/chat/completions will be honored, and the JSON schema should be provided in the json_schema field of response_format.

The generated JSON will be returned as a string in the usual response field, choices[0].message.content, and will need to be parsed back into an object.
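
For illustration, an abridged response to the request above could look like the following, with the schema-conforming JSON embedded as a string (the joke text itself is left as a placeholder):

{
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "{ \"joke\": \"...\" }"
      },
      "finish_reason": "stop"
    }
  ]
}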

Example using python

from openai import OpenAI
import json

# Initialize OpenAI client that points to the local LM Studio server
client = OpenAI(
    base_url="http://localhost:1234/v1",
    api_key="lm-studio"
)

# Define the conversation with the AI
messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Create 1-3 fictional characters"}
]

# Define the expected response structure
character_schema = {
    "type": "json_schema",
    "json_schema": {
        "name": "characters",
        "schema": {
            "type": "object",
            "properties": {
                "characters": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "name": {"type": "string"},
                            "occupation": {"type": "string"},
                            "personality": {"type": "string"},
                            "background": {"type": "string"}
                        },
                        "required": ["name", "occupation", "personality", "background"]
                    },
                    "minItems": 1,
                }
            },
            "required": ["characters"]
        },
    }
}

# Get response from AI
response = client.chat.completions.create(
    model="your-model",
    messages=messages,
    response_format=character_schema,
)

# Parse and display the results
results = json.loads(response.choices[0].message.content)
print(json.dumps(results, indent=2))
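
If you are using the official OpenAI Python SDK (version 1.40 or later), you can also derive the JSON schema from a Pydantic model with the SDK's beta.chat.completions.parse helper instead of writing the schema by hand. LM Studio's API is expected to work via the OpenAI client SDKs, but treat the following as a sketch rather than a guarantee; the model name is a placeholder, and whether the output validates depends on the model you use.

from openai import OpenAI
from pydantic import BaseModel

# Describe the expected response structure as a Pydantic model
class Joke(BaseModel):
    joke: str

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

# The SDK converts the Pydantic model into a json_schema response_format
completion = client.beta.chat.completions.parse(
    model="your-model",
    messages=[
        {"role": "system", "content": "You are a helpful jokester."},
        {"role": "user", "content": "Tell me a joke."},
    ],
    response_format=Joke,
)

# message.parsed is a Joke instance when parsing succeeds
print(completion.choices[0].message.parsed)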

Important: Not all models are capable of structured output, particularly LLMs smaller than 7B parameters.

Check the model card README if you are unsure whether the model supports structured output.

Structured output engine

  • For GGUF models, LM Studio uses llama.cpp's grammar-based sampling APIs.
  • For MLX models, LM Studio uses Outlines.

The MLX implementation is available on GitHub: lmstudio-ai/mlx-engine.

