Requires LM Studio 0.3.6 or newer. Still WIP, endpoints may change.
LM Studio now has its own REST API, in addition to its OpenAI compatibility mode.
The REST API includes enhanced stats such as Tokens Per Second and Time To First Token (TTFT), as well as rich information about models, such as loaded vs. unloaded state, max context length, quantization, and more.
{"id":"chatcmpl-i3gkjwthhw96whukek9tz","object":"chat.completion","created":1731990317,"model":"granite-3.0-2b-instruct","choices":[{"index":0,"logprobs":null,"finish_reason":"stop","message":{"role":"assistant","content":"Greetings, I'm a helpful AI, here to assist,\nIn providing answers, with no distress.\nI'll keep it short and sweet, in rhyme you'll find,\nA friendly companion, all day long you'll bind."}}],"usage":{"prompt_tokens":24,"completion_tokens":53,"total_tokens":77},"stats":{"tokens_per_second":51.43709529007664,"time_to_first_token":0.111,"generation_time":0.954,"stop_reason":"eosFound"},"model_info":{"arch":"granite","quant":"Q4_K_M","format":"gguf","context_length":4096},"runtime":{"name":"llama.cpp-mac-arm64-apple-metal-advsimd","version":"1.3.0","supported_formats":["gguf"]}}
POST /api/v0/completions
Text Completions API. You provide a prompt and receive a completion.
Example request
curl http://localhost:1234/api/v0/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "granite-3.0-2b-instruct",
    "prompt": "the meaning of life is",
    "temperature": 0.7,
    "max_tokens": 10,
    "stream": false,
    "stop": "\n"
  }'
Response format
{"id":"cmpl-p9rtxv6fky2v9k8jrd8cc","object":"text_completion","created":1731990488,"model":"granite-3.0-2b-instruct","choices":[{"index":0,"text":" to find your purpose, and once you have","logprobs":null,"finish_reason":"length"}],"usage":{"prompt_tokens":5,"completion_tokens":9,"total_tokens":14},"stats":{"tokens_per_second":57.69230769230769,"time_to_first_token":0.299,"generation_time":0.156,"stop_reason":"maxPredictedTokensReached"},"model_info":{"arch":"granite","quant":"Q4_K_M","format":"gguf","context_length":4096},"runtime":{"name":"llama.cpp-mac-arm64-apple-metal-advsimd","version":"1.3.0","supported_formats":["gguf"]}}
POST /api/v0/embeddings
Text Embeddings API. You provide a text input and receive its embedding vector in return.
Example request
curl http://127.0.0.1:1234/api/v0/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-nomic-embed-text-v1.5",
    "input": "Some text to embed"
  }'
Example response
{"object":"list","data":[{"object":"embedding","embedding":[-0.016731496900320053,0.028460891917347908,-0.1407836228609085,
... (truncated for brevity) ...,0.02505224384367466,-0.0037634256295859814,-0.04341062530875206],"index":0}],"model":"text-embedding-nomic-embed-text-v1.5@q4_k_m","usage":{"prompt_tokens":0,"total_tokens":0}}
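As an illustration (not part of the docs above), a small Python sketch that calls the embeddings endpoint twice and compares the results by cosine similarity; it assumes the same local server and embedding model as the example, and uses the requests library.

import math
import requests

def embed(text: str) -> list[float]:
    # Request an embedding vector for a single string from the local server
    response = requests.post(
        "http://127.0.0.1:1234/api/v0/embeddings",
        json={"model": "text-embedding-nomic-embed-text-v1.5", "input": text},
        timeout=60,
    )
    response.raise_for_status()
    return response.json()["data"][0]["embedding"]

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Standard cosine similarity: dot product over the product of vector norms
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity(embed("Some text to embed"), embed("A completely different sentence")))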