Model Registry API
The Model Registry API (mModelStore) provides model storage and management capabilities for the AI Foundation Package. This service allows you to register, upload, download, and manage AI models in GGUF and ONNX formats.
Base URL
http://localhost:8083/mimik-ai/store/v1
The Model Registry service runs as part of the AI Foundation Package. The default port is 8083.
Authentication
Mutating operations (POST, PUT, DELETE) require a Bearer token in the Authorization header:
Authorization: Bearer 1234
The default API key is 1234, configured in the [mmodelstore-v1] section of the addon .ini file. See Addon Configuration for details.
Read operations (GET) do not require authentication.
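For example, a requests session can carry the token for all mutating calls while plain GETs skip it (a minimal sketch; the model id in the DELETE call is a placeholder):

import requests

BASE_URL = "http://localhost:8083/mimik-ai/store/v1"

session = requests.Session()
session.headers.update({"Authorization": "Bearer 1234"})  # needed for POST/PUT/DELETE

# Read operations work without the header
print(requests.get(f"{BASE_URL}/models").status_code)

# Mutating operations go through the authenticated session
print(session.delete(f"{BASE_URL}/models/example-model").status_code)  # placeholder id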
Quick Reference
| Method | Endpoint | Description |
|---|---|---|
| GET | /models | List all models |
| POST | /models | Create model metadata |
| PUT | /models | Update model configuration |
| GET | /models/{id} | Get model details |
| DELETE | /models/{id} | Delete model |
| POST | /models/{id}/upload | Upload model file |
| POST | /models/{id}/download | Download model from URL |
Two-Step Provisioning
Model provisioning follows a two-step process:
- Create metadata: register the model with its configuration
- Provision the file: either upload it directly or have the service download it from a URL
This separation allows you to configure the model before the file transfer, and supports both local uploads and remote downloads.
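As a rough end-to-end sketch in Python (reusing the model id and file name from the examples later on this page), the two steps look like this:

import requests

BASE_URL = "http://localhost:8083/mimik-ai/store/v1"
AUTH = {"Authorization": "Bearer 1234"}

# Step 1: create the metadata entry
requests.post(
    f"{BASE_URL}/models",
    headers={**AUTH, "Content-Type": "application/json"},
    json={
        "id": "smollm2-360m",
        "version": "1.0.0",
        "kind": "llm",
        "gguf": {"chatTemplateHint": "chatml", "initContextSize": 2048},
    },
).raise_for_status()

# Step 2: provision the file (direct upload shown; download-from-URL works too)
with open("SmolLM2-360M-Instruct-Q8_0.gguf", "rb") as f:
    model = requests.post(
        f"{BASE_URL}/models/smollm2-360m/upload",
        headers=AUTH,
        files={"file": f},
    ).json()

print(model["readyToUse"])  # true once the file is provisioned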
Model Kinds
The Model Registry supports four model kinds:
| Kind | Description | File Format | Use Case |
|---|---|---|---|
| llm | Large Language Model | GGUF | Text generation, chat, reasoning |
| vlm | Vision Language Model | GGUF + mmproj | Multimodal (text + images) |
| embed | Embedding Model | GGUF | Text embeddings, semantic search |
| onnx | ONNX Model | ONNX | Image classification, predictive AI |
Endpoints
Create Model
Create a new model entry with metadata. The model file is provisioned separately.
Request
POST /models
Headers
| Header | Required | Value |
|---|---|---|
| Content-Type | Yes | application/json |
| Authorization | Yes | Bearer <token> |
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| id | string | Yes | Unique model identifier |
| version | string | Yes | Model version (metadata only) |
| kind | string | Yes | Model type: llm, vlm, embed, onnx |
| gguf | object | No | GGUF configuration (for llm, vlm, embed) |
| onnx | object | No | ONNX configuration (for onnx kind) |
GGUF Configuration
| Field | Type | Description |
|---|---|---|
| chatTemplateHint | string | Chat template format (see supported values below) |
| initContextSize | integer | Context window size for model initialization |
| initGpuLayerSize | integer | GPU layers to offload during initialization |
ONNX Configuration
| Field | Type | Description |
|---|---|---|
| executionProvider | string | Execution provider: cpu, cuda, coreml, tensorrt |
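The example below creates an LLM; for the onnx kind, the request body carries the onnx object instead of gguf. A sketch of such a body (the id and provider values are illustrative):

# Hypothetical request body for an ONNX model entry
onnx_model_body = {
    "id": "mobilenet-v2",  # placeholder identifier
    "version": "1.0.0",
    "kind": "onnx",
    "onnx": {"executionProvider": "cpu"},
}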
Example: Create LLM
- cURL
- JavaScript
- Python
curl -X POST "http://localhost:8083/mimik-ai/store/v1/models" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer 1234" \
-d '{
"id": "smollm2-360m",
"version": "1.0.0",
"kind": "llm",
"gguf": {
"chatTemplateHint": "chatml",
"initContextSize": 2048,
"initGpuLayerSize": 99
}
}'
const response = await fetch('http://localhost:8083/mimik-ai/store/v1/models', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'Authorization': 'Bearer 1234'
},
body: JSON.stringify({
id: 'smollm2-360m',
version: '1.0.0',
kind: 'llm',
gguf: {
chatTemplateHint: 'chatml',
initContextSize: 2048,
initGpuLayerSize: 99
}
})
});
const model = await response.json();
console.log('Model created:', model.id);
import requests
response = requests.post(
"http://localhost:8083/mimik-ai/store/v1/models",
headers={
"Content-Type": "application/json",
"Authorization": "Bearer 1234"
},
json={
"id": "smollm2-360m",
"version": "1.0.0",
"kind": "llm",
"gguf": {
"chatTemplateHint": "chatml",
"initContextSize": 2048,
"initGpuLayerSize": 99
}
}
)
model = response.json()
print(f"Model created: {model['id']}")
Response (201 Created)
{
"id": "smollm2-360m",
"version": "1.0.0",
"kind": "llm",
"readyToUse": false,
"createdAt": 1729591200000,
"gguf": {
"chatTemplateHint": "chatml",
"initContextSize": 2048,
"initGpuLayerSize": 99
}
}
Creating a model that already exists updates the metadata (returns 200 OK instead of 201 Created).
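A client that needs to distinguish the two outcomes can branch on the status code; a small sketch (omitting the optional gguf block):

import requests

response = requests.post(
    "http://localhost:8083/mimik-ai/store/v1/models",
    headers={"Content-Type": "application/json", "Authorization": "Bearer 1234"},
    json={"id": "smollm2-360m", "version": "1.0.0", "kind": "llm"},
)
if response.status_code == 201:
    print("Model created")
elif response.status_code == 200:
    print("Existing model metadata updated")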
Upload Model File
Upload a model file directly via multipart form data. This is Step 2a of provisioning.
Request
POST /models/{id}/upload
Headers
| Header | Required | Value |
|---|---|---|
| Content-Type | Yes | multipart/form-data |
| Authorization | Yes | Bearer <token> |
Form Data
| Field | Type | Required | Description |
|---|---|---|---|
| file | binary | Yes | The model file (.gguf or .onnx) |
Example
- cURL
- JavaScript
- Python
curl -X POST "http://localhost:8083/mimik-ai/store/v1/models/smollm2-360m/upload" \
-H "Authorization: Bearer 1234" \
-F "file=@SmolLM2-360M-Instruct-Q8_0.gguf"
const formData = new FormData();
formData.append('file', fileInput.files[0]);
const response = await fetch('http://localhost:8083/mimik-ai/store/v1/models/smollm2-360m/upload', {
method: 'POST',
headers: {
'Authorization': 'Bearer 1234'
},
body: formData
});
const model = await response.json();
console.log('Model uploaded, ready:', model.readyToUse);
import requests
with open("SmolLM2-360M-Instruct-Q8_0.gguf", "rb") as f:
    response = requests.post(
        "http://localhost:8083/mimik-ai/store/v1/models/smollm2-360m/upload",
        headers={"Authorization": "Bearer 1234"},
        files={"file": f}
    )
model = response.json()
print(f"Model uploaded, ready: {model['readyToUse']}")
Response (200 OK)
{
"id": "smollm2-360m",
"version": "1.0.0",
"kind": "llm",
"readyToUse": true,
"totalSize": 386000000,
"createdAt": 1729591200000,
"gguf": {
"chatTemplateHint": "chatml",
"initContextSize": 2048,
"initGpuLayerSize": 99
}
}
Download Model File
Download a model file from a URL. This is Step 2b of provisioning. Returns a Server-Sent Events stream with download progress.
Request
POST /models/{id}/download
Headers
| Header | Required | Value |
|---|---|---|
| Content-Type | Yes | application/json |
| Authorization | Yes | Bearer <token> |
| Accept | No | text/event-stream (for SSE progress) |
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | URL to download the model file |
| mmprojUrl | string | No | URL for VLM multimodal projection file |
Example: Download LLM
- cURL
- JavaScript
curl -X POST "http://localhost:8083/mimik-ai/store/v1/models/smollm2-360m/download" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer 1234" \
-d '{
"url": "https://huggingface.co/lmstudio-community/SmolLM2-360M-Instruct-GGUF/resolve/main/SmolLM2-360M-Instruct-Q8_0.gguf?download=true"
}'
const response = await fetch('http://localhost:8083/mimik-ai/store/v1/models/smollm2-360m/download', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'Authorization': 'Bearer 1234'
},
body: JSON.stringify({
url: 'https://huggingface.co/lmstudio-community/SmolLM2-360M-Instruct-GGUF/resolve/main/SmolLM2-360M-Instruct-Q8_0.gguf?download=true'
})
});
// Handle SSE stream for progress
const reader = response.body.getReader();
const decoder = new TextDecoder();
while (true) {
const { done, value } = await reader.read();
if (done) break;
const chunk = decoder.decode(value);
const lines = chunk.split('\n');
for (const line of lines) {
if (line.startsWith('data: ')) {
const data = JSON.parse(line.slice(6));
if (data.done) {
console.log('Download complete:', data.model);
} else {
const percent = ((data.size / data.totalSize) * 100).toFixed(1);
console.log(`Progress: ${percent}%`);
}
}
}
}
SSE Response Stream
data: {"size": 100000000, "totalSize": 386000000}
data: {"size": 250000000, "totalSize": 386000000}
data: {"size": 386000000, "totalSize": 386000000}
data: {"done": true, "model": {"id": "smollm2-360m", "readyToUse": true, ...}}
Example: Download VLM with mmproj
curl -X POST "http://localhost:8083/mimik-ai/store/v1/models/llava-1.6/download" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer 1234" \
-d '{
"url": "https://example.com/llava-model.gguf",
"mmprojUrl": "https://example.com/llava-mmproj.gguf"
}'
VLM downloads include mmproj progress:
data: {"size": 1048576000, "totalSize": 1048576000}
data: {"mmproj": {"size": 26214400, "totalSize": 52428800}}
data: {"done": true, "model": {...}}
To cancel a download in progress, disconnect from the SSE stream. The server detects disconnection, stops the download, and removes partial files.
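With the Python sketch above, for example, closing the streaming response is enough to drop the SSE connection. The snippet below is illustrative only and uses a placeholder download URL; it cancels after the first progress event:

import requests

response = requests.post(
    "http://localhost:8083/mimik-ai/store/v1/models/smollm2-360m/download",
    headers={"Content-Type": "application/json", "Authorization": "Bearer 1234"},
    json={"url": "https://example.com/model.gguf"},  # placeholder URL
    stream=True,
)

# Disconnect after the first progress event to demonstrate cancellation
for line in response.iter_lines(decode_unicode=True):
    if line and line.startswith("data: "):
        response.close()  # the server stops the download and removes partial files
        break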
List Models
Retrieve all models in the registry.
Request
GET /models
Query Parameters
| Parameter | Type | Description |
|---|---|---|
| kind | string | Filter by model type: llm, vlm, embed, onnx |
| ready | boolean | Filter by ready status: true or false |
Example
- cURL
- JavaScript
- Python
# List all models
curl "http://localhost:8083/mimik-ai/store/v1/models"
# List only LLM models
curl "http://localhost:8083/mimik-ai/store/v1/models?kind=llm"
# List only ready models
curl "http://localhost:8083/mimik-ai/store/v1/models?ready=true"
# Combine filters
curl "http://localhost:8083/mimik-ai/store/v1/models?kind=llm&ready=true"
// List all models
const response = await fetch('http://localhost:8083/mimik-ai/store/v1/models');
const data = await response.json();
console.log(`Found ${data.data.length} models`);
data.data.forEach(model => {
console.log(`${model.id} (${model.kind}): ready=${model.readyToUse}`);
});
// Filter by kind
const llmResponse = await fetch('http://localhost:8083/mimik-ai/store/v1/models?kind=llm');
const llmModels = await llmResponse.json();
import requests
# List all models
response = requests.get("http://localhost:8083/mimik-ai/store/v1/models")
data = response.json()
print(f"Found {len(data['data'])} models")
for model in data['data']:
    print(f"{model['id']} ({model['kind']}): ready={model['readyToUse']}")
Response (200 OK)
{
"data": [
{
"id": "smollm2-360m",
"version": "1.0.0",
"kind": "llm",
"readyToUse": true,
"totalSize": 386000000,
"createdAt": 1729591200000,
"gguf": {
"chatTemplateHint": "chatml",
"initContextSize": 2048,
"initGpuLayerSize": 99
}
},
{
"id": "nomic-embed-text",
"version": "1.0.0",
"kind": "embed",
"readyToUse": true,
"totalSize": 274000000,
"createdAt": 1729591300000,
"gguf": {
"initContextSize": 8192
}
}
]
}
Get Model Details
Retrieve detailed information about a specific model.
Request
GET /models/{id}
Query Parameters
| Parameter | Value | Description |
|---|---|---|
| alt | media | Download the model file instead of metadata |
Example
- cURL
- JavaScript
- Python
# Get model metadata
curl "http://localhost:8083/mimik-ai/store/v1/models/smollm2-360m"
# Download model file
curl "http://localhost:8083/mimik-ai/store/v1/models/smollm2-360m?alt=media" -o model.gguf
const response = await fetch('http://localhost:8083/mimik-ai/store/v1/models/smollm2-360m');
if (response.ok) {
const model = await response.json();
console.log(`Model: ${model.id}`);
console.log(`Kind: ${model.kind}`);
console.log(`Ready: ${model.readyToUse}`);
console.log(`Size: ${(model.totalSize / 1024 / 1024).toFixed(0)} MB`);
} else {
console.error('Model not found');
}
import requests
response = requests.get("http://localhost:8083/mimik-ai/store/v1/models/smollm2-360m")
if response.status_code == 200:
    model = response.json()
    print(f"Model: {model['id']}")
    print(f"Kind: {model['kind']}")
    print(f"Ready: {model['readyToUse']}")
    print(f"Size: {model['totalSize'] / 1024 / 1024:.0f} MB")
else:
    print("Model not found")
Response (200 OK)
{
"id": "smollm2-360m",
"version": "1.0.0",
"kind": "llm",
"readyToUse": true,
"totalSize": 386000000,
"createdAt": 1729591200000,
"gguf": {
"chatTemplateHint": "chatml",
"initContextSize": 2048,
"initGpuLayerSize": 99
}
}
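The cURL tab above fetches the file with alt=media; a comparable Python sketch that streams it to disk (the output file name is arbitrary):

import requests

with requests.get(
    "http://localhost:8083/mimik-ai/store/v1/models/smollm2-360m?alt=media",
    stream=True,
) as response:
    response.raise_for_status()
    with open("model.gguf", "wb") as f:
        for chunk in response.iter_content(chunk_size=1024 * 1024):
            f.write(chunk)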
Update Model
Update model configuration (metadata fields only).
Request
PUT /models
Headers
| Header | Required | Value |
|---|---|---|
Content-Type | Yes | application/json |
Authorization | Yes | Bearer <token> |
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| id | string | Yes | Model identifier to update |
| action | string | Yes | Must be "update" |
| gguf | object | No | Updated GGUF configuration |
| onnx | object | No | Updated ONNX configuration |
Example
curl -X PUT "http://localhost:8083/mimik-ai/store/v1/models" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer 1234" \
-d '{
"id": "smollm2-360m",
"action": "update",
"gguf": {
"initContextSize": 4096,
"initGpuLayerSize": 99
}
}'
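Only a cURL example is shown for this endpoint; an equivalent request in Python might look like:

import requests

response = requests.put(
    "http://localhost:8083/mimik-ai/store/v1/models",
    headers={"Content-Type": "application/json", "Authorization": "Bearer 1234"},
    json={
        "id": "smollm2-360m",
        "action": "update",
        "gguf": {"initContextSize": 4096, "initGpuLayerSize": 99},
    },
)
print(response.status_code)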
Delete Model
Remove a model from the registry. This deletes the model entry and associated files.
Request
DELETE /models/{id}
Headers
| Header | Required | Value |
|---|---|---|
| Authorization | Yes | Bearer <token> |
Example
- cURL
- JavaScript
- Python
curl -X DELETE "http://localhost:8083/mimik-ai/store/v1/models/smollm2-360m" \
-H "Authorization: Bearer 1234"
const response = await fetch('http://localhost:8083/mimik-ai/store/v1/models/smollm2-360m', {
method: 'DELETE',
headers: {
'Authorization': 'Bearer 1234'
}
});
if (response.ok) {
console.log('Model deleted successfully');
}
import requests
response = requests.delete(
"http://localhost:8083/mimik-ai/store/v1/models/smollm2-360m",
headers={"Authorization": "Bearer 1234"}
)
if response.ok:
    print("Model deleted successfully")
Deleting a model removes all associated files and cancels any in-progress downloads. This cannot be undone.
Model Schema
Full Model Object
{
"id": "smollm2-360m",
"version": "1.0.0",
"kind": "llm",
"readyToUse": true,
"totalSize": 386000000,
"createdAt": 1729591200000,
"gguf": {
"chatTemplateHint": "chatml",
"initContextSize": 2048,
"initGpuLayerSize": 99
}
}
Field Descriptions
| Field | Type | Description |
|---|---|---|
| id | string | Unique model identifier |
| version | string | Model version (metadata only, not part of unique key) |
| kind | string | Model type: llm, vlm, embed, onnx |
| readyToUse | boolean | Whether the model file is provisioned and ready |
| totalSize | integer | File size in bytes (set by system after upload/download) |
| createdAt | integer | Creation timestamp in milliseconds (set by system) |
| gguf | object | GGUF configuration (for llm, vlm, embed kinds) |
| onnx | object | ONNX configuration (for onnx kind) |
ID Format Rules
- Allowed characters: alphanumeric, dash, underscore, dot
- Pattern: ^[a-zA-Z0-9._-]+$
- Maximum length: 255 characters
- Cannot start with . or -
- Cannot contain ..
| Example | Valid | Notes |
|---|---|---|
| smollm2-360m | Yes | Standard format |
| llama-3.2-1b | Yes | With dots |
| my_model_v1 | Yes | With underscores |
| ../etc/passwd | No | Path traversal blocked |
| .hidden | No | Cannot start with dot |
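Clients can pre-validate identifiers before calling the API; the sketch below mirrors the rules listed above (the service itself remains the source of truth):

import re

ID_PATTERN = re.compile(r"^[a-zA-Z0-9._-]+$")

def is_valid_model_id(model_id: str) -> bool:
    # Mirror the documented ID rules client-side
    return (
        bool(ID_PATTERN.match(model_id))
        and len(model_id) <= 255
        and not model_id.startswith((".", "-"))
        and ".." not in model_id
    )

print(is_valid_model_id("smollm2-360m"))   # True
print(is_valid_model_id("../etc/passwd"))  # False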
Supported Chat Template Hints
For GGUF models (llm, vlm, embed), the chatTemplateHint field specifies the chat format:
| Value | Models |
|---|---|
| chatml | Many fine-tuned models |
| llama2 | Llama 2 family |
| llama3 | Llama 3 family |
| phi3 | Phi-3 family |
| mistral-v1, mistral-v3, mistral-v7 | Mistral family |
| gemma | Gemma family |
| deepseek, deepseek2, deepseek3 | DeepSeek family |
| command-r | Cohere Command-R |
| falcon3 | Falcon 3 |
| zephyr | Zephyr models |
| vicuna, vicuna-orca | Vicuna family |
| openchat | OpenChat models |
Match the chatTemplateHint to your model. Using the wrong template may result in poor-quality responses or formatting issues.
Error Responses
| Code | Description |
|---|---|
| 400 | Bad request (invalid input parameter) |
| 401 | Unauthorized (missing authentication) |
| 403 | Forbidden (invalid API key) |
| 404 | Not found (model does not exist) |
| 500 | Internal server error |
Error Format
{
"error": {
"code": 404,
"message": "Model 'unknown-model' not found"
}
}
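A client can surface these errors uniformly by reading the code and message fields of the error body shown above; a small sketch using a deliberately unknown model id:

import requests

response = requests.get("http://localhost:8083/mimik-ai/store/v1/models/unknown-model")
if not response.ok:
    error = response.json().get("error", {})
    print(f"Request failed ({error.get('code')}): {error.get('message')}")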