Finding Models on Hugging Face
Hugging Face is the largest public repository of AI models, hosting hundreds of thousands of pre-trained models that are free to download. This guide shows you how to find models compatible with mimOE, evaluate their characteristics, and download them for on-device inference.
Understanding Model Formats
Before searching, understand which format you need:
GGUF Models
Use when: You need generative AI capabilities (chat, text generation, code completion)
File extension: .gguf
Common use cases:
- Conversational AI
- Text generation
- Code completion
- Question answering
- Creative writing
Typical size: 1.5GB - 8GB (quantized versions)
ONNX Models
Use when: You need predictive AI capabilities (classification, detection, embeddings)
File extension: .onnx
Common use cases:
- Image classification
- Object detection
- Text embeddings
- Sentiment analysis
- Regression models
Typical size: 10MB - 500MB
Finding GGUF Models
Method 1: Search with Filters
1. Go to Hugging Face Models
2. In the search filters, select:
   - Format: GGUF
   - Libraries: transformers (optional)
   - Tasks: Text Generation
3. Sort by one of:
   - Most downloads: Popular, well-tested models
   - Most likes: Community favorites
   - Trending: Recently popular models
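You can also run the same search programmatically. A minimal sketch using the huggingface_hub Python library (the gguf tag and the downloads sort key mirror the website's filters, but verify them against the current Hub API docs):

```python
# pip install huggingface-hub
from huggingface_hub import HfApi

api = HfApi()

# List the ten most-downloaded models tagged "gguf"
for model in api.list_models(filter="gguf", sort="downloads", direction=-1, limit=10):
    print(model.id, model.downloads)
```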
Method 2: Search by Keywords
Use the search bar with specific keywords:
```
phi-3 gguf
llama-3 gguf
mistral gguf
gemma gguf
```
Popular GGUF Models for mimOE
Here are recommended models that work well on-device:
Small Models (2-4GB): Recommended for Most Devices
| Model | Size | Context | Best For |
|---|---|---|---|
| Phi-3-mini-4k | 2.4GB (Q4) | 4K tokens | General chat, coding, Q&A |
| Gemma-2B | 1.8GB (Q4) | 8K tokens | Fast responses, low memory |
| TinyLlama-1.1B | 0.6GB (Q4) | 2K tokens | Ultra-fast, resource-constrained |
Medium Models (4-8GB): For Capable Devices
| Model | Size | Context | Best For |
|---|---|---|---|
| Llama-3-8B | 4.7GB (Q4) | 8K tokens | High-quality responses |
| Mistral-7B | 4.1GB (Q4) | 8K tokens | Instruction following, reasoning |
| Phi-3-medium | 7.6GB (Q4) | 128K tokens | Long context, complex tasks |
Understanding Quantization Levels
GGUF models come in different quantization levels. Fewer bits per weight mean a smaller file but slightly lower output quality.
| Quantization | Size vs Original | Quality | Use When |
|---|---|---|---|
| Q2 | ~25% | Acceptable | Extremely limited memory |
| Q3 | ~33% | Good | Limited memory |
| Q4 | ~40% | Very Good | Recommended default |
| Q5 | ~50% | Excellent | You have extra memory |
| Q6 | ~60% | Near-perfect | Quality is paramount |
| Q8 | ~80% | Near-identical | Research/benchmarking |
Q4 (Q4_K_M variant) offers the best balance of size, speed, and quality for most use cases.
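To sanity-check these sizes, remember that a model file is roughly parameters × bits-per-weight / 8 bytes. A back-of-the-envelope sketch (the bits-per-weight values are approximate averages for common K-quant variants, not exact figures):

```python
# Approximate average bits per weight for common K-quant variants
BITS_PER_WEIGHT = {"Q2": 2.6, "Q3": 3.4, "Q4": 4.5, "Q5": 5.5, "Q6": 6.6, "Q8": 8.5}

def estimate_gb(params_billions: float, quant: str) -> float:
    """Rough GGUF file size in GB: parameters * bits-per-weight / 8."""
    return params_billions * BITS_PER_WEIGHT[quant] / 8

# Phi-3-mini has ~3.8B parameters: ~2.1 GB at Q4, close to the 2.4 GB
# listed above (metadata and unquantized layers add some overhead)
print(f"{estimate_gb(3.8, 'Q4'):.1f} GB")
```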
Downloading GGUF Models
Once you've found a model, download it:
Option 1: Click "Download" on model page
Navigate to the Files tab and click the download icon next to the .gguf file.
Option 2: Use curl (faster for large files)
```bash
# Example: Download Phi-3-mini Q4
curl -L -o phi-3-mini-4k-instruct-q4.gguf \
  "https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf/resolve/main/Phi-3-mini-4k-instruct-q4.gguf?download=true"
```
Option 3: Use Hugging Face CLI
```bash
# Install the Hugging Face CLI
pip install huggingface-hub

# Download one model file into ./models
huggingface-cli download microsoft/Phi-3-mini-4k-instruct-gguf \
  Phi-3-mini-4k-instruct-q4.gguf \
  --local-dir ./models
```
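The same download can also be scripted. A short sketch using hf_hub_download from the huggingface_hub library, which caches files and resumes interrupted transfers:

```python
from huggingface_hub import hf_hub_download

# Fetch one file from the repo into ./models
path = hf_hub_download(
    repo_id="microsoft/Phi-3-mini-4k-instruct-gguf",
    filename="Phi-3-mini-4k-instruct-q4.gguf",
    local_dir="./models",
)
print(f"Saved to {path}")
```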
Finding ONNX Models
Method 1: Search with Filters
1. Go to Hugging Face Models
2. In the search filters, select:
   - Format: ONNX
   - Libraries: transformers or onnx
   - Tasks: Select your task (Image Classification, Object Detection, etc.)
Method 2: Search by Task + "ONNX"
```
mobilenet onnx
resnet onnx
bert onnx
yolo onnx
```
Popular ONNX Models for mimOE
Image Classification
| Model | Size | Input Size | Top-1 Accuracy | Speed |
|---|---|---|---|---|
| MobileNetV2 | 14MB | 224x224 | 72% | Very Fast |
| ResNet-50 | 98MB | 224x224 | 76% | Fast |
| EfficientNet-B0 | 20MB | 224x224 | 77% | Fast |
Object Detection
| Model | Size | Input Size | Use Case |
|---|---|---|---|
| YOLOv8n | 6MB | 640x640 | Real-time detection (fast) |
| YOLOv8s | 22MB | 640x640 | Better accuracy |
| YOLOv8m | 52MB | 640x640 | High accuracy |
Text Embeddings
| Model | Size | Embedding Dim | Use Case |
|---|---|---|---|
| all-MiniLM-L6-v2 | 90MB | 384 | Fast semantic search |
| BERT-base | 420MB | 768 | Higher quality embeddings |
Downloading ONNX Models
Option 1: Direct download
Click "Download" on the model's Files tab.
Option 2: Use curl
```bash
# Example: Download MobileNetV2
curl -L -o mobilenet_v2.onnx \
  "https://huggingface.co/onnx-community/mobilenet_v2_1.0_224/resolve/main/model.onnx?download=true"
```
Option 3: Python script to export
Many models don't have pre-exported ONNX versions. You can export them yourself with Hugging Face Optimum:
```python
# pip install optimum[onnxruntime]
from optimum.onnxruntime import ORTModelForImageClassification

# Load the PyTorch model and export it to ONNX in one step
model = ORTModelForImageClassification.from_pretrained(
    "microsoft/resnet-50",
    export=True,
)

# Writes model.onnx (plus config) to the target directory
model.save_pretrained("./resnet-50-onnx")
```
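After exporting (or downloading) an ONNX model, it's worth confirming that the file loads and inspecting the input shape it expects before wiring it into your app. A quick check with onnxruntime:

```python
# pip install onnxruntime
import onnxruntime as ort

session = ort.InferenceSession("./resnet-50-onnx/model.onnx")

# Print each input's name, shape, and element type
for inp in session.get_inputs():
    print(inp.name, inp.shape, inp.type)
```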
Evaluating Models
Before downloading, check these characteristics:
Model Card
Every model has a "Model Card" describing:
- Purpose: What the model is designed for
- Training data: What data it was trained on
- Limitations: Known weaknesses
- License: Usage restrictions
Files Tab
Check the model's files:
- Size: Will it fit in your available memory?
- Format: Is it .gguf or .onnx?
- Variants: Are multiple quantization levels available?
Community Activity
Indicators of model quality:
- Downloads: More downloads = more tested
- Likes: Community endorsement
- Discussions: Active community support
- Recent updates: Is it maintained?
Model Selection Criteria
For GGUF (Generative AI)
Choose based on:
1. Memory constraints
Rule of thumb: you need roughly 1.5-2x the model file size in available RAM (a programmatic check follows this list).
- 4GB RAM → Up to 2B parameters (TinyLlama 1.1B, Gemma-2B)
- 8GB RAM → Up to 4B parameters (Phi-3-mini, SmolLM2)
- 16GB RAM → Up to 7B parameters comfortably (Llama-3-8B, Mistral-7B)
- 32GB RAM → Up to 13B parameters (Llama-2-13B, CodeLlama-13B)
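Here is a minimal sketch of that check, assuming the psutil package is available; it uses the conservative 1.5x end of the range:

```python
# pip install psutil
import os
import psutil

def fits_in_memory(model_path: str, multiplier: float = 1.5) -> bool:
    """True if the model file times a runtime-overhead multiplier fits in free RAM."""
    needed = os.path.getsize(model_path) * multiplier
    return needed <= psutil.virtual_memory().available

print(fits_in_memory("./models/Phi-3-mini-4k-instruct-q4.gguf"))
```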
2. Task complexity
- Simple chat → Phi-3-mini, Gemma-2B
- Code generation → Phi-3-mini, CodeLlama
- Long documents → Phi-3-medium (128K context)
- Maximum quality → Llama-3-8B, Mistral-7B
3. Response speed
- Fastest → TinyLlama (0.6GB)
- Fast → Phi-3-mini (2.4GB)
- Balanced → Llama-3-8B Q4 (4.7GB)
For ONNX (Predictive AI)
Choose based on:
1. Task type
- Image classification → MobileNetV2, ResNet-50
- Object detection → YOLOv8n, YOLOv8s
- Text embeddings → all-MiniLM-L6-v2
- Sentiment analysis → DistilBERT
2. Accuracy vs. Speed
- Speed priority → MobileNetV2, YOLOv8n, MiniLM
- Accuracy priority → ResNet-50, YOLOv8m, BERT-base
- Balanced → EfficientNet-B0, YOLOv8s
3. Input constraints
- Limited preprocessing → Models with 224x224 input
- High resolution → Models with 640x640+ input
Common Pitfalls to Avoid
GGUF Models
- Don't download multiple quantizations: Pick one (Q4 recommended)
- Check context length: Longer isn't always better (more memory)
- Verify it's an instruct model: Base models aren't fine-tuned for chat
ONNX Models
- Check input preprocessing: Must match model expectations (see the sketch after this list)
- Verify output format: Some models return logits, others probabilities
- Look for complete examples: Preprocessing is critical
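As an example of matching expectations: most 224x224 image classifiers want RGB input scaled to [0, 1], normalized with ImageNet statistics, and laid out as NCHW. A typical preprocessing sketch; verify the exact constants and layout against your model's card, since they vary:

```python
import numpy as np
from PIL import Image

# Standard ImageNet normalization constants; confirm against the model card
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(image_path: str) -> np.ndarray:
    """Resize to 224x224, scale to [0, 1], normalize, reorder HWC -> NCHW."""
    img = Image.open(image_path).convert("RGB").resize((224, 224))
    arr = np.asarray(img, dtype=np.float32) / 255.0   # HWC in [0, 1]
    arr = (arr - MEAN) / STD                          # per-channel normalize
    return arr.transpose(2, 0, 1)[np.newaxis, ...]    # shape (1, 3, 224, 224)
```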
General
- Read the license: Some models have commercial restrictions
- Check file integrity: Large downloads can get corrupted (see the checksum sketch after this list)
- Test on sample data first: Before deploying
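For the integrity check, the file's page on Hugging Face lists a SHA256 for large (LFS-hosted) files; a streaming hash avoids loading a multi-gigabyte model into memory:

```python
import hashlib

def sha256sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MB chunks so large models don't exhaust memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Compare against the SHA256 shown on the model's file page
print(sha256sum("./models/Phi-3-mini-4k-instruct-q4.gguf"))
```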
Example Model Searches
"I want a chatbot"
Search: phi-3 gguf or llama-3 gguf
Recommendation: Phi-3-mini-4k-instruct Q4 (2.4GB)
- Fast responses
- Good quality
- Works on most devices
"I want to classify images"
Search: mobilenet onnx or resnet onnx
Recommendation: MobileNetV2 ONNX (14MB)
- Very fast
- Good accuracy for common objects
- Small size
"I want semantic search"
Search: sentence-transformers onnx or all-MiniLM onnx
Recommendation: all-MiniLM-L6-v2 (90MB)
- Fast embedding generation
- Good semantic understanding
- Widely used and tested
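Under the hood, semantic search ranks documents by cosine similarity between embedding vectors. A sketch with placeholder 384-dimensional vectors standing in for all-MiniLM-L6-v2 output:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Placeholder embeddings; in practice these come from the model
query = np.random.rand(384).astype(np.float32)
document = np.random.rand(384).astype(np.float32)
print(cosine_similarity(query, document))
```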
"I want code completion"
Search: phi-3 gguf or codellama gguf
Recommendation: Phi-3-mini-4k-instruct Q4 (2.4GB)
- Excellent coding capabilities
- Fast
- Reasonable size
Next Steps
Now that you know how to find models:
- Quick Start: Upload your first model
- Chat with SmolLM2: Complete GGUF tutorial
- Upload Models: Detailed upload guide