
Upload a Model

The Model Registry manages your AI models using a two-step provisioning process. This guide shows you how to register and provision models for the four supported model kinds (llm, vlm, embed, onnx). See AI Foundation Overview for model kind descriptions.

Two-Step Provisioning

Model provisioning separates metadata creation from file provisioning:

  1. Create metadata: Register the model with its configuration
  2. Provision file: Either upload directly or download from URL

This approach allows you to configure the model before transferring the file.

Diagram: Two-step provisioning. Create Metadata, then choose to Upload File or Download from URL, resulting in Model Ready.

Prerequisites

Before uploading a model:

  • mimOE runtime is running (Quick Start)
  • You have a model file in GGUF or ONNX format, or a URL to download from
  • Terminal or command prompt access

Step 1: Create Model Metadata

Register the model with its configuration. The metadata is the same regardless of whether you download or upload the file.

curl -X POST "http://localhost:8083/mimik-ai/store/v1/models" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer 1234" \
  -d '{
    "id": "smollm2-360m",
    "version": "1.0.0",
    "kind": "llm",
    "gguf": {
      "chatTemplateHint": "chatml",
      "initContextSize": 2048,
      "initGpuLayerSize": 99
    }
  }'
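
The same registration can be issued from code. A minimal sketch using only the Python standard library, assuming the endpoint, token, and payload shown in the curl command above (`create_metadata` is a hypothetical helper name, not part of the API):

```python
import json
import urllib.request

BASE = "http://localhost:8083/mimik-ai/store/v1"  # registry base URL from this guide
TOKEN = "1234"  # example bearer token from this guide

def create_metadata(payload: dict) -> dict:
    """POST model metadata to the registry and return the parsed JSON response."""
    req = urllib.request.Request(
        f"{BASE}/models",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {TOKEN}"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# The same payload as the curl example above.
payload = {
    "id": "smollm2-360m",
    "version": "1.0.0",
    "kind": "llm",
    "gguf": {"chatTemplateHint": "chatml",
             "initContextSize": 2048,
             "initGpuLayerSize": 99},
}
# create_metadata(payload)  # requires a running mimOE instance
```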

Response shows readyToUse: false until the file is provisioned:

{
  "id": "smollm2-360m",
  "version": "1.0.0",
  "kind": "llm",
  "readyToUse": false,
  "gguf": {
    "chatTemplateHint": "chatml",
    "initContextSize": 2048,
    "initGpuLayerSize": 99
  }
}
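
A client should not route requests to a model until readyToUse flips to true. A small sketch of that check, assuming the GET endpoint shown later in this guide and the example token (`is_ready` and `get_model` are hypothetical helper names):

```python
import json
import urllib.request

BASE = "http://localhost:8083/mimik-ai/store/v1"  # registry base URL from this guide

def is_ready(model: dict) -> bool:
    """True once the model's file has been provisioned."""
    return bool(model.get("readyToUse"))

def get_model(model_id: str, token: str = "1234") -> dict:
    """Fetch the current metadata for one model."""
    req = urllib.request.Request(
        f"{BASE}/models/{model_id}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# The Step 1 response, before the file is provisioned:
assert is_ready({"id": "smollm2-360m", "readyToUse": False}) is False
```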

Step 2: Provision the Model File

After creating the metadata, provision the model file using one of two methods:

Option A: Download from URL

Download the model from a URL. The response streams progress via Server-Sent Events:

curl -X POST "http://localhost:8083/mimik-ai/store/v1/models/smollm2-360m/download" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer 1234" \
  -d '{
    "url": "https://huggingface.co/lmstudio-community/SmolLM2-360M-Instruct-GGUF/resolve/main/SmolLM2-360M-Instruct-Q8_0.gguf?download=true"
  }'

Progress stream:

data: {"size": 100000000, "totalSize": 386000000}
data: {"size": 250000000, "totalSize": 386000000}
data: {"size": 386000000, "totalSize": 386000000}
data: {"done": true, "model": {"id": "smollm2-360m", "readyToUse": true, ...}}
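
The stream above can be consumed line by line. A sketch of a parser for these events, assuming each event arrives on a single `data:` line in the shapes shown (`parse_progress` is a hypothetical helper name):

```python
import json

def parse_progress(line: str):
    """Turn one SSE 'data:' line into (percent_complete, done)."""
    if not line.startswith("data:"):
        return None  # ignore blank/comment keep-alive lines
    event = json.loads(line[len("data:"):])
    if event.get("done"):
        return (100.0, True)
    return (round(100.0 * event["size"] / event["totalSize"], 1), False)

print(parse_progress('data: {"size": 100000000, "totalSize": 386000000}'))
# (25.9, False)
```
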

Cancel Download

To cancel a download in progress, disconnect from the SSE stream. The server detects disconnection, stops the download, and removes partial files.
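
In code, cancellation is simply closing the connection early. A sketch under that assumption: `lines` is any iterable of SSE lines (e.g. an open HTTP response), and returning from the loop closes it, which the server detects as a disconnect (`stream_events` is a hypothetical helper name):

```python
import json

def stream_events(lines, should_cancel):
    """Yield download events until done, or stop early to cancel."""
    for line in lines:
        if not line.startswith("data:"):
            continue
        event = json.loads(line[len("data:"):])
        yield event
        if event.get("done") or should_cancel(event):
            return  # closing the stream cancels the download server-side

# Simulated stream: cancel once more than half is downloaded.
fake = [
    'data: {"size": 100000000, "totalSize": 386000000}',
    'data: {"size": 250000000, "totalSize": 386000000}',
    'data: {"size": 386000000, "totalSize": 386000000}',
]
seen = list(stream_events(fake, lambda e: e["size"] > e["totalSize"] / 2))
print(len(seen))  # 2
```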

Option B: Upload Local File

If you have a model file on your machine, upload it directly:

curl -X POST "http://localhost:8083/mimik-ai/store/v1/models/smollm2-360m/upload" \
  -H "Authorization: Bearer 1234" \
  -F "file=@/path/to/model.gguf"

Response shows the model is ready:

{
  "id": "smollm2-360m",
  "version": "1.0.0",
  "kind": "llm",
  "readyToUse": true,
  "totalSize": 386000000,
  "gguf": {
    "chatTemplateHint": "chatml",
    "initContextSize": 2048,
    "initGpuLayerSize": 99
  }
}

Vision Language Models (VLM)

VLM models require two files: the main model and a multimodal projection (mmproj) file.

Download Only

VLM models must be provisioned via URL download. Local file upload is not supported for VLMs because the two files must be provisioned together.

Register VLM

curl -X POST "http://localhost:8083/mimik-ai/store/v1/models" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer 1234" \
  -d '{
    "id": "llava-1.6",
    "version": "1.0.0",
    "kind": "vlm",
    "gguf": {
      "chatTemplateHint": "llama3",
      "initContextSize": 4096,
      "initGpuLayerSize": 32
    }
  }'

Download VLM with mmproj

Provide both URLs in the download request:

curl -X POST "http://localhost:8083/mimik-ai/store/v1/models/llava-1.6/download" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer 1234" \
  -d '{
    "url": "https://example.com/llava-model.gguf",
    "mmprojUrl": "https://example.com/llava-mmproj.gguf"
  }'

Progress stream includes mmproj progress:

data: {"size": 1048576000, "totalSize": 1048576000}
data: {"mmproj": {"size": 26214400, "totalSize": 52428800}}
data: {"done": true, "model": {...}}
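
A VLM download interleaves main-model and mmproj events, so a client needs to distinguish them. A sketch, with the event shapes assumed from the sample stream above (`describe_vlm_event` is a hypothetical helper name):

```python
import json

def describe_vlm_event(line: str) -> str:
    """Render one SSE line; mmproj progress arrives as a nested "mmproj" object."""
    event = json.loads(line[len("data:"):])
    if event.get("done"):
        return "done"
    if "mmproj" in event:
        p = event["mmproj"]
        return f'mmproj {100 * p["size"] // p["totalSize"]}%'
    return f'model {100 * event["size"] // event["totalSize"]}%'

print(describe_vlm_event('data: {"mmproj": {"size": 26214400, "totalSize": 52428800}}'))
# mmproj 50%
```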

GGUF Configuration Options

chatTemplateHint

The chat template determines how messages are formatted. Match this to your model:

| Value | Models |
| --- | --- |
| llama3 | Llama 3 family |
| phi3 | Phi-3 family |
| mistral-v3 | Mistral family |
| gemma | Gemma family |
| chatml | Many fine-tuned models |
| deepseek2 | DeepSeek family |

See the Model Registry API for the complete list.

initContextSize

The context window size for model initialization:

| Value | Use Case |
| --- | --- |
| 2048 | Quick responses, lower memory |
| 4096 | Standard (recommended for most models) |
| 8192 | Longer conversations |
| 16384+ | Extended context models |

initGpuLayerSize

Number of layers to offload to GPU. Higher values use more GPU memory but run faster:

| Value | Description |
| --- | --- |
| 0 | CPU only |
| 99 | Maximum GPU offload (recommended) |

Troubleshooting

Download Fails or Times Out

Symptom: The download returns an error or stops midway

Solution:

  1. Check your internet connection
  2. Verify the URL is correct and accessible
  3. Ensure you have enough disk space
  4. Retry the download (it will replace any partial file)
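
Step 4 can be automated. A sketch of a retry wrapper with exponential backoff, where `fn` stands in for whatever issues the download request (a hypothetical callable, not part of the API); the registry replaces any partial file on each retry:

```python
import time

def retry(fn, retries=3, base=2.0):
    """Call fn(); on failure, wait base * 2**attempt seconds and try again."""
    for attempt in range(retries):
        try:
            return fn()
        except Exception:
            if attempt == retries - 1:
                raise  # out of attempts; surface the error
            time.sleep(base * 2 ** attempt)

# Example: a flaky operation that succeeds on the third attempt.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"

print(retry(flaky, base=0.0))  # ok
```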

Model Not Ready After Download

Symptom: readyToUse stays false

Solution: Check the model status:

curl -H "Authorization: Bearer 1234" \
  "http://localhost:8083/mimik-ai/store/v1/models/your-model"

If the download failed, retry it.

Upload Fails

Symptom: Upload returns an error

Solution:

  1. Verify the file exists and is readable
  2. Ensure you created the metadata first
  3. Check that the file format matches the kind

Wrong Chat Template

Symptom: Model produces poor quality or malformed responses

Cause: Incorrect chatTemplateHint for the model

Solution: Update the model configuration:

curl -X PUT "http://localhost:8083/mimik-ai/store/v1/models" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer 1234" \
  -d '{
    "id": "your-model",
    "action": "update",
    "gguf": {
      "chatTemplateHint": "correct-template"
    }
  }'

Next Steps

Now that you've uploaded a model, you can start running inference against it.

API Reference

See the Model Registry API for complete documentation, including listing, updating, and deleting models.