Upload a Model
The Model Registry manages your AI models using a two-step provisioning process. This guide shows you how to register and provision models for the four supported model kinds (llm, vlm, embed, onnx). See AI Foundation Overview for model kind descriptions.
Two-Step Provisioning
Model provisioning separates metadata creation from file provisioning:
1. Create metadata: register the model with its configuration
2. Provision the file: either upload it directly or download it from a URL
This approach allows you to configure the model before transferring the file.
Prerequisites
Before uploading a model:
- The mimOE runtime is running (see Quick Start)
- You have a model file in GGUF or ONNX format, or a URL to download one from
- Terminal or command prompt access
Step 1: Create Model Metadata
Register the model with its configuration. The metadata is the same regardless of whether you download or upload the file.
LLM (GGUF):
curl -X POST "http://localhost:8083/mimik-ai/store/v1/models" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer 1234" \
-d '{
"id": "smollm2-360m",
"version": "1.0.0",
"kind": "llm",
"gguf": {
"chatTemplateHint": "chatml",
"initContextSize": 2048,
"initGpuLayerSize": 99
}
}'
Embedding Model:
curl -X POST "http://localhost:8083/mimik-ai/store/v1/models" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer 1234" \
-d '{
"id": "nomic-embed-text",
"version": "1.0.0",
"kind": "embed",
"gguf": {
"initContextSize": 8192
}
}'
ONNX Model:
curl -X POST "http://localhost:8083/mimik-ai/store/v1/models" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer 1234" \
-d '{
"id": "mobilenet-v2",
"version": "1.0.0",
"kind": "onnx",
"onnx": {
"executionProvider": "cpu"
}
}'
Response shows readyToUse: false until the file is provisioned:
{
"id": "smollm2-360m",
"version": "1.0.0",
"kind": "llm",
"readyToUse": false,
"gguf": {
"chatTemplateHint": "chatml",
"initContextSize": 2048,
"initGpuLayerSize": 99
}
}
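You can confirm the registration at any point with a GET on the model id; this is the same status call used under Troubleshooting below:
curl "http://localhost:8083/mimik-ai/store/v1/models/smollm2-360m" \
-H "Authorization: Bearer 1234"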
Step 2: Provision the Model File
After creating the metadata, provision the model file using one of two methods:
- Option A: Download from URL: fetch the file directly from Hugging Face or another source
- Option B: Upload local file: send a model file from your machine
Option A: Download from URL
Download the model from a URL. The response streams progress via Server-Sent Events:
curl -X POST "http://localhost:8083/mimik-ai/store/v1/models/smollm2-360m/download" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer 1234" \
-d '{
"url": "https://huggingface.co/lmstudio-community/SmolLM2-360M-Instruct-GGUF/resolve/main/SmolLM2-360M-Instruct-Q8_0.gguf?download=true"
}'
Progress stream:
data: {"size": 100000000, "totalSize": 386000000}
data: {"size": 250000000, "totalSize": 386000000}
data: {"size": 386000000, "totalSize": 386000000}
data: {"done": true, "model": {"id": "smollm2-360m", "readyToUse": true, ...}}
To cancel a download in progress, disconnect from the SSE stream. The server detects disconnection, stops the download, and removes partial files.
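For scripted use, the stream is easy to follow from the shell. Here is a minimal sketch, assuming jq is installed; the endpoint and payload are exactly those shown above, and pressing Ctrl-C disconnects the stream, cancelling the download as just described:
curl -sN -X POST "http://localhost:8083/mimik-ai/store/v1/models/smollm2-360m/download" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer 1234" \
-d '{"url": "https://huggingface.co/lmstudio-community/SmolLM2-360M-Instruct-GGUF/resolve/main/SmolLM2-360M-Instruct-Q8_0.gguf?download=true"}' |
while read -r line; do
json="${line#data: }"                       # strip the SSE "data: " prefix
[ "$json" = "$line" ] && continue           # skip anything that is not a data line
if [ "$(printf '%s' "$json" | jq -r '.done // false')" = "true" ]; then
echo "download complete"; break
fi
printf '%s' "$json" | jq -r '"\(.size * 100 / .totalSize | floor)%"'   # percent done
done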
Option B: Upload Local File
If you have a model file on your machine, upload it directly:
curl -X POST "http://localhost:8083/mimik-ai/store/v1/models/smollm2-360m/upload" \
-H "Authorization: Bearer 1234" \
-F "file=@/path/to/model.gguf"
Response shows the model is ready:
{
"id": "smollm2-360m",
"version": "1.0.0",
"kind": "llm",
"readyToUse": true,
"totalSize": 386000000,
"gguf": {
"chatTemplateHint": "chatml",
"initContextSize": 2048,
"initGpuLayerSize": 99
}
}
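The same endpoint pattern works for the other kinds. For example, to provision the ONNX model registered in Step 1 (the file path is a placeholder):
curl -X POST "http://localhost:8083/mimik-ai/store/v1/models/mobilenet-v2/upload" \
-H "Authorization: Bearer 1234" \
-F "file=@/path/to/model.onnx"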
Vision Language Models (VLM)
VLMs require two files: the main model and a multimodal projection (mmproj) file.
VLMs must be provisioned via URL download; local file upload is not supported because a VLM requires two separate files.
Register VLM
curl -X POST "http://localhost:8083/mimik-ai/store/v1/models" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer 1234" \
-d '{
"id": "llava-1.6",
"version": "1.0.0",
"kind": "vlm",
"gguf": {
"chatTemplateHint": "llama3",
"initContextSize": 4096,
"initGpuLayerSize": 32
}
}'
Download VLM with mmproj
Provide both URLs in the download request:
curl -X POST "http://localhost:8083/mimik-ai/store/v1/models/llava-1.6/download" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer 1234" \
-d '{
"url": "https://example.com/llava-model.gguf",
"mmprojUrl": "https://example.com/llava-mmproj.gguf"
}'
Progress stream includes mmproj progress:
data: {"size": 1048576000, "totalSize": 1048576000}
data: {"mmproj": {"size": 26214400, "totalSize": 52428800}}
data: {"done": true, "model": {...}}
GGUF Configuration Options
chatTemplateHint
The chat template determines how messages are formatted. Match this to your model:
| Value | Models |
|---|---|
| llama3 | Llama 3 family |
| phi3 | Phi-3 family |
| mistral-v3 | Mistral family |
| gemma | Gemma family |
| chatml | Many fine-tuned models |
| deepseek2 | DeepSeek family |
See the Model Registry API for the complete list.
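For context, the hint tells the runtime how to wrap each message before inference. With chatml, for example, a conversation is formatted roughly like this (the runtime applies the template for you; it is shown here only to illustrate why the hint must match the model's training format):
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Hello!<|im_end|>
<|im_start|>assistant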
initContextSize
The context window size for model initialization:
| Value | Use Case |
|---|---|
| 2048 | Quick responses, lower memory |
| 4096 | Standard (recommended for most models) |
| 8192 | Longer conversations |
| 16384+ | Extended context models |
initGpuLayerSize
Number of layers to offload to GPU. Higher values use more GPU memory but run faster:
| Value | Description |
|---|---|
| 0 | CPU only |
| 99 | Maximum GPU offload (recommended) |
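Intermediate values are also valid; the VLM example above offloads 32 layers. Both init settings can be changed after registration with the same update call shown under Troubleshooting. A sketch, assuming the endpoint accepts partial gguf settings as in that example:
curl -X PUT "http://localhost:8083/mimik-ai/store/v1/models" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer 1234" \
-d '{
"id": "smollm2-360m",
"action": "update",
"gguf": {
"initContextSize": 4096,
"initGpuLayerSize": 0
}
}'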
Troubleshooting
Download Fails or Times Out
Symptom: Download returns an error or stops mid-way
Solution:
- Check your internet connection
- Verify the URL is correct and accessible
- Ensure you have enough disk space
- Retry the download (it will replace any partial file)
Model Not Ready After Download
Symptom: readyToUse stays false
Solution: Check the model status:
curl "http://localhost:8083/mimik-ai/store/v1/models/your-model"
If the download failed, retry it.
Upload Fails
Symptom: Upload returns an error
Solution:
- Verify the file exists and is readable
- Ensure you created the metadata first
- Check that the file format matches the kind (GGUF for llm, vlm, and embed; ONNX for onnx)
Wrong Chat Template
Symptom: Model produces poor-quality or malformed responses
Cause: Incorrect chatTemplateHint for the model
Solution: Update the model configuration:
curl -X PUT "http://localhost:8083/mimik-ai/store/v1/models" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer 1234" \
-d '{
"id": "your-model",
"action": "update",
"gguf": {
"chatTemplateHint": "correct-template"
}
}'
Next Steps
Now that you've uploaded a model:
- Inference API Guide: Run inference with your model
- Chat with SmolLM2: Complete chat example
- Finding Models: Discover models and model sizes
API Reference
For complete API documentation, including listing, updating, and deleting models, see the Model Registry API reference.