
Upload a Model

The Model Registry manages your AI models using a two-step provisioning process. This guide shows you how to register and provision models for the four supported model kinds (llm, vlm, embed, onnx). See AI Foundation Overview for model kind descriptions.

Two-Step Provisioning

Model provisioning separates metadata creation from file provisioning:

  1. Create metadata: Register the model with its configuration
  2. Provision file: Either upload directly or download from URL

This approach allows you to configure the model before transferring the file.

Diagram: Two-step provisioning. Create Metadata, then choose to Upload File or Download from URL, resulting in Model Ready.

Prerequisites

Before uploading a model:

  • mimOE runtime is running (Quick Start)
  • You have a model file in GGUF or ONNX format, or a URL to download from
  • Terminal or command prompt access

Step 1: Create Model Metadata

Register the model with its configuration. The metadata is the same regardless of whether you download or upload the file.

curl -X POST "http://localhost:8083/mimik-ai/store/v1/models" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer 1234" \
  -d '{
    "id": "smollm2-360m",
    "version": "1.0.0",
    "kind": "llm",
    "gguf": {
      "chatTemplateHint": "chatml",
      "initContextSize": 2048,
      "initGpuLayerSize": 99
    }
  }'
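
The same registration can be issued from code. A minimal sketch using only the Python standard library, assuming the endpoint, token, and payload shown in the curl command above (`create_metadata` is a hypothetical helper name, not part of the API):

```python
import json
import urllib.request

BASE = "http://localhost:8083/mimik-ai/store/v1"  # registry base URL from this guide
TOKEN = "1234"  # example bearer token from this guide

def create_metadata(payload: dict) -> dict:
    """POST model metadata to the registry and return the parsed JSON response."""
    req = urllib.request.Request(
        f"{BASE}/models",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {TOKEN}"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# The same payload as the curl example above.
payload = {
    "id": "smollm2-360m",
    "version": "1.0.0",
    "kind": "llm",
    "gguf": {"chatTemplateHint": "chatml",
             "initContextSize": 2048,
             "initGpuLayerSize": 99},
}
# create_metadata(payload)  # requires a running mimOE instance
```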

Response shows readyToUse: false until the file is provisioned:

{
  "id": "smollm2-360m",
  "version": "1.0.0",
  "kind": "llm",
  "readyToUse": false,
  "gguf": {
    "chatTemplateHint": "chatml",
    "initContextSize": 2048,
    "initGpuLayerSize": 99
  }
}
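
A client should not route requests to a model until readyToUse flips to true. A small sketch of that check, assuming the GET endpoint shown later in this guide and the example token (`is_ready` and `get_model` are hypothetical helper names):

```python
import json
import urllib.request

BASE = "http://localhost:8083/mimik-ai/store/v1"  # registry base URL from this guide

def is_ready(model: dict) -> bool:
    """True once the model's file has been provisioned."""
    return bool(model.get("readyToUse"))

def get_model(model_id: str, token: str = "1234") -> dict:
    """Fetch the current metadata for one model."""
    req = urllib.request.Request(
        f"{BASE}/models/{model_id}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# The Step 1 response, before the file is provisioned:
assert is_ready({"id": "smollm2-360m", "readyToUse": False}) is False
```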

Step 2: Provision the Model File

After creating the metadata, provision the model file using one of two methods:

Option A: Download from URL

Download the model from a URL. The response streams progress via Server-Sent Events:

curl -X POST "http://localhost:8083/mimik-ai/store/v1/models/smollm2-360m/download" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer 1234" \
  -d '{
    "url": "https://huggingface.co/lmstudio-community/SmolLM2-360M-Instruct-GGUF/resolve/main/SmolLM2-360M-Instruct-Q8_0.gguf?download=true"
  }'

Progress stream:

data: {"size": 100000000, "totalSize": 386000000}
data: {"size": 250000000, "totalSize": 386000000}
data: {"size": 386000000, "totalSize": 386000000}
data: {"done": true, "model": {"id": "smollm2-360m", "readyToUse": true, ...}}
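
The stream above can be consumed line by line. A sketch of a parser for these events, assuming each event arrives on a single `data:` line in the shapes shown (`parse_progress` is a hypothetical helper name):

```python
import json

def parse_progress(line: str):
    """Turn one SSE 'data:' line into (percent_complete, done)."""
    if not line.startswith("data:"):
        return None  # ignore blank/comment keep-alive lines
    event = json.loads(line[len("data:"):])
    if event.get("done"):
        return (100.0, True)
    return (round(100.0 * event["size"] / event["totalSize"], 1), False)

print(parse_progress('data: {"size": 100000000, "totalSize": 386000000}'))
# (25.9, False)
```
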

Cancel Download

To cancel a download in progress, disconnect from the SSE stream. The server detects disconnection, stops the download, and removes partial files.
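
In code, cancellation is simply closing the connection early. A sketch under that assumption: `lines` is any iterable of SSE lines (e.g. an open HTTP response), and returning from the loop closes it, which the server detects as a disconnect (`stream_events` is a hypothetical helper name):

```python
import json

def stream_events(lines, should_cancel):
    """Yield download events until done, or stop early to cancel."""
    for line in lines:
        if not line.startswith("data:"):
            continue
        event = json.loads(line[len("data:"):])
        yield event
        if event.get("done") or should_cancel(event):
            return  # closing the stream cancels the download server-side

# Simulated stream: cancel once more than half is downloaded.
fake = [
    'data: {"size": 100000000, "totalSize": 386000000}',
    'data: {"size": 250000000, "totalSize": 386000000}',
    'data: {"size": 386000000, "totalSize": 386000000}',
]
seen = list(stream_events(fake, lambda e: e["size"] > e["totalSize"] / 2))
print(len(seen))  # 2
```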

Option B: Upload Local File

If you have a model file on your machine, upload it directly:

curl -X POST "http://localhost:8083/mimik-ai/store/v1/models/smollm2-360m/upload" \
  -H "Authorization: Bearer 1234" \
  -F "file=@/path/to/model.gguf"

Response shows the model is ready:

{
  "id": "smollm2-360m",
  "version": "1.0.0",
  "kind": "llm",
  "readyToUse": true,
  "totalSize": 386000000,
  "gguf": {
    "chatTemplateHint": "chatml",
    "initContextSize": 2048,
    "initGpuLayerSize": 99
  }
}

Vision Language Models (VLM)

VLM models require two files: the main model and a multimodal projection (mmproj) file.

Download Only

VLM models must be provisioned via URL download. Local file upload is not supported for VLMs because the two files must be provisioned together.

Register VLM

curl -X POST "http://localhost:8083/mimik-ai/store/v1/models" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer 1234" \
  -d '{
    "id": "llava-1.6",
    "version": "1.0.0",
    "kind": "vlm",
    "gguf": {
      "chatTemplateHint": "llama3",
      "initContextSize": 4096,
      "initGpuLayerSize": 32
    }
  }'

Download VLM with mmproj

Provide both URLs in the download request:

curl -X POST "http://localhost:8083/mimik-ai/store/v1/models/llava-1.6/download" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer 1234" \
  -d '{
    "url": "https://example.com/llava-model.gguf",
    "mmprojUrl": "https://example.com/llava-mmproj.gguf"
  }'

Progress stream includes mmproj progress:

data: {"size": 1048576000, "totalSize": 1048576000}
data: {"mmproj": {"size": 26214400, "totalSize": 52428800}}
data: {"done": true, "model": {...}}
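
A VLM download interleaves main-model and mmproj events, so a client needs to distinguish them. A sketch, with the event shapes assumed from the sample stream above (`describe_vlm_event` is a hypothetical helper name):

```python
import json

def describe_vlm_event(line: str) -> str:
    """Render one SSE line; mmproj progress arrives as a nested "mmproj" object."""
    event = json.loads(line[len("data:"):])
    if event.get("done"):
        return "done"
    if "mmproj" in event:
        p = event["mmproj"]
        return f'mmproj {100 * p["size"] // p["totalSize"]}%'
    return f'model {100 * event["size"] // event["totalSize"]}%'

print(describe_vlm_event('data: {"mmproj": {"size": 26214400, "totalSize": 52428800}}'))
# mmproj 50%
```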

GGUF Configuration Options

chatTemplateHint

The chat template determines how messages are formatted. Match this to your model:

| Value | Models |
| --- | --- |
| llama3 | Llama 3 family |
| phi3 | Phi-3 family |
| mistral-v3 | Mistral family |
| gemma | Gemma family |
| chatml | Many fine-tuned models |
| deepseek2 | DeepSeek family |

See the Model Registry API for the complete list.

initContextSize

The context window size for model initialization:

| Value | Use Case |
| --- | --- |
| 2048 | Quick responses, lower memory |
| 4096 | Standard (recommended for most models) |
| 8192 | Longer conversations |
| 16384+ | Extended context models |

initGpuLayerSize

Number of layers to offload to GPU. Higher values use more GPU memory but run faster:

| Value | Description |
| --- | --- |
| 0 | CPU only |
| 99 | Maximum GPU offload (recommended) |

Troubleshooting

Download Fails or Times Out

Symptom: The download returns an error or stops midway

Solution:

  1. Check your internet connection
  2. Verify the URL is correct and accessible
  3. Ensure you have enough disk space
  4. Retry the download (it will replace any partial file)
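
Step 4 can be automated. A sketch of a retry wrapper with exponential backoff, where `fn` stands in for whatever issues the download request (a hypothetical callable, not part of the API); the registry replaces any partial file on each retry:

```python
import time

def retry(fn, retries=3, base=2.0):
    """Call fn(); on failure, wait base * 2**attempt seconds and try again."""
    for attempt in range(retries):
        try:
            return fn()
        except Exception:
            if attempt == retries - 1:
                raise  # out of attempts; surface the error
            time.sleep(base * 2 ** attempt)

# Example: a flaky operation that succeeds on the third attempt.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"

print(retry(flaky, base=0.0))  # ok
```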

Model Not Ready After Download

Symptom: readyToUse stays false

Solution: Check the model status:

curl -H "Authorization: Bearer 1234" \
  "http://localhost:8083/mimik-ai/store/v1/models/your-model"

If the download failed, retry it.

Upload Fails

Symptom: Upload returns an error

Solution:

  1. Verify the file exists and is readable
  2. Ensure you created the metadata first
  3. Check that the file format matches the kind

Wrong Chat Template

Symptom: Model produces poor quality or malformed responses

Cause: Incorrect chatTemplateHint for the model

Solution: Update the model configuration:

curl -X PUT "http://localhost:8083/mimik-ai/store/v1/models" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer 1234" \
  -d '{
    "id": "your-model",
    "action": "update",
    "gguf": {
      "chatTemplateHint": "correct-template"
    }
  }'

Next Steps

Now that you've uploaded a model, you can start running inference against it.

API Reference

See the Model Registry API for complete documentation, including listing, updating, and deleting models.