LM Studio now has its own REST API, in addition to OpenAI compatibility mode (learn more).
The REST API includes enhanced stats such as Tokens Per Second and Time To First Token (TTFT), as well as rich information about models, such as loaded vs. unloaded state, max context length, quantization, and more.
{"id":"chatcmpl-i3gkjwthhw96whukek9tz","object":"chat.completion","created":1731990317,"model":"granite-3.0-2b-instruct","choices":[{"index":0,"logprobs":null,"finish_reason":"stop","message":{"role":"assistant","content":"Greetings, I'm a helpful AI, here to assist,\nIn providing answers, with no distress.\nI'll keep it short and sweet, in rhyme you'll find,\nA friendly companion, all day long you'll bind."}}],"usage":{"prompt_tokens":24,"completion_tokens":53,"total_tokens":77},"stats":{"tokens_per_second":51.43709529007664,"time_to_first_token":0.111,"generation_time":0.954,"stop_reason":"eosFound"},"model_info":{"arch":"granite","quant":"Q4_K_M","format":"gguf","context_length":4096},"runtime":{"name":"llama.cpp-mac-arm64-apple-metal-advsimd","version":"1.3.0","supported_formats":["gguf"]}}
POST /api/v0/completions
Text Completions API. You provide a prompt and receive a completion.
Example request
curl http://localhost:1234/api/v0/completions \
-H "Content-Type: application/json" \
-d '{
"model": "granite-3.0-2b-instruct",
"prompt": "the meaning of life is",
"temperature": 0.7,
"max_tokens": 10,
"stream": false,
"stop": "\n"
}'
Response format
{"id":"cmpl-p9rtxv6fky2v9k8jrd8cc","object":"text_completion","created":1731990488,"model":"granite-3.0-2b-instruct","choices":[{"index":0,"text":" to find your purpose, and once you have","logprobs":null,"finish_reason":"length"}],"usage":{"prompt_tokens":5,"completion_tokens":9,"total_tokens":14},"stats":{"tokens_per_second":57.69230769230769,"time_to_first_token":0.299,"generation_time":0.156,"stop_reason":"maxPredictedTokensReached"},"model_info":{"arch":"granite","quant":"Q4_K_M","format":"gguf","context_length":4096},"runtime":{"name":"llama.cpp-mac-arm64-apple-metal-advsimd","version":"1.3.0","supported_formats":["gguf"]}}
POST /api/v0/embeddings
Text Embeddings API. You provide a text and receive an embedding vector representing it.
Example request
curl http://127.0.0.1:1234/api/v0/embeddings \
-H "Content-Type: application/json" \
-d '{
"model": "text-embedding-nomic-embed-text-v1.5",
"input": "Some text to embed"
}'
Example response
{"object":"list","data":[{"object":"embedding","embedding":[-0.016731496900320053,0.028460891917347908,-0.1407836228609085,
... (truncated for brevity) ...,0.02505224384367466,-0.0037634256295859814,-0.04341062530875206],"index":0}],"model":"text-embedding-nomic-embed-text-v1.5@q4_k_m","usage":{"prompt_tokens":0,"total_tokens":0}}
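A common use of the returned vectors is similarity search. The sketch below is a minimal Python example, assuming the requests package is installed; the endpoint and model name match the curl example above, and the second input string is purely illustrative. It embeds two strings and compares them with cosine similarity.

import math
import requests  # third-party HTTP client, assumed installed

def embed(text):
    resp = requests.post(
        "http://127.0.0.1:1234/api/v0/embeddings",
        json={"model": "text-embedding-nomic-embed-text-v1.5", "input": text},
        timeout=60,
    )
    resp.raise_for_status()
    # The embedding vector lives under data[0].embedding, as in the response above.
    return resp.json()["data"][0]["embedding"]

a = embed("Some text to embed")
b = embed("A different sentence")

# Cosine similarity between the two embedding vectors
dot = sum(x * y for x, y in zip(a, b))
cos = dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))
print(round(cos, 4))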