
Finding Models on Hugging Face

Hugging Face is the largest repository of AI models, with thousands of pre-trained models available for free. This guide shows you how to find compatible models for mimOE, evaluate their characteristics, and download them for on-device inference.

Understanding Model Formats

Before searching, understand which format you need:

GGUF Models

Use when: You need generative AI capabilities (chat, text generation, code completion)

File extension: .gguf

Common use cases:

  • Conversational AI
  • Text generation
  • Code completion
  • Question answering
  • Creative writing

Typical size: 1.5GB - 8GB (quantized versions)

ONNX Models

Use when: You need predictive AI capabilities (classification, detection, embeddings)

File extension: .onnx

Common use cases:

  • Image classification
  • Object detection
  • Text embeddings
  • Sentiment analysis
  • Regression models

Typical size: 10MB - 500MB

Finding GGUF Models

Method 1: Search with Filters

  1. Go to Hugging Face Models

  2. In the search filters, select:
     • Format: GGUF
     • Libraries: transformers (optional)
     • Tasks: Text Generation

  3. Sort by:
     • Most downloads: Popular, well-tested models
     • Most likes: Community favorites
     • Trending: Recently popular models

Method 2: Search by Keywords

Use the search bar with specific keywords:

phi-3 gguf
llama-3 gguf
mistral gguf
gemma gguf
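
If you prefer to search programmatically, the huggingface_hub Python package can run the same kind of query. A minimal sketch (the search term, sort order, and result limit are just examples):

from huggingface_hub import HfApi

api = HfApi()

# List GGUF builds of Phi-3, most-downloaded first
for model in api.list_models(search="phi-3 gguf", sort="downloads", limit=5):
    print(model.id)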

Here are recommended models that work well on-device:

Small Models (under 3GB): For Most Devices

| Model | Size | Context | Best For |
| --- | --- | --- | --- |
| Phi-3-mini-4k | 2.4GB (Q4) | 4K tokens | General chat, coding, Q&A |
| Gemma-2B | 1.8GB (Q4) | 8K tokens | Fast responses, low memory |
| TinyLlama-1.1B | 0.6GB (Q4) | 2K tokens | Ultra-fast, resource-constrained |


Medium Models (4-8GB): For Capable Devices

| Model | Size | Context | Best For |
| --- | --- | --- | --- |
| Llama-3-8B | 4.7GB (Q4) | 8K tokens | High-quality responses |
| Mistral-7B | 4.1GB (Q4) | 8K tokens | Instruction following, reasoning |
| Phi-3-medium | 7.6GB (Q4) | 128K tokens | Long context, complex tasks |


Understanding Quantization Levels

GGUF models come in different quantization levels. Fewer bits per weight means a smaller file but slightly lower output quality.

| Quantization | Size vs Original | Quality | Use When |
| --- | --- | --- | --- |
| Q2 | ~25% | Acceptable | Extremely limited memory |
| Q3 | ~33% | Good | Limited memory |
| Q4 | ~40% | Very Good | Recommended default |
| Q5 | ~50% | Excellent | You have extra memory |
| Q6 | ~60% | Near-perfect | Quality is paramount |
| Q8 | ~80% | Virtually identical | Research/benchmarking |

Recommended Quantization

Q4 (Q4_K_M variant) offers the best balance of size, speed, and quality for most use cases.
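
As a rough sanity check, you can estimate a quantized file size from the parameter count and the bits stored per weight. A back-of-the-envelope sketch (the ~4.5 bits/weight figure for Q4_K_M is an approximation, not an exact spec):

def estimate_gguf_size_gb(num_params: float, bits_per_weight: float) -> float:
    """Rough GGUF file size: parameters x bits per weight, converted to gigabytes."""
    return num_params * bits_per_weight / 8 / 1e9

# Phi-3-mini has ~3.8B parameters; Q4_K_M stores roughly 4.5 bits per weight
print(f"{estimate_gguf_size_gb(3.8e9, 4.5):.1f} GB")  # ~2.1 GB, in line with the 2.4GB listed above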

Downloading GGUF Models

Once you've found a model, download it:

Option 1: Click "Download" on model page

Navigate to the Files tab and click the download icon next to the .gguf file.

Option 2: Use curl (faster for large files)

# Example: Download Phi-3-mini Q4
curl -L -o phi-3-mini-4k-instruct-q4.gguf \
"https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf/resolve/main/Phi-3-mini-4k-instruct-q4.gguf?download=true"

Option 3: Use Hugging Face CLI

# Install HF CLI
pip install huggingface-hub

# Download model
huggingface-cli download microsoft/Phi-3-mini-4k-instruct-gguf \
Phi-3-mini-4k-instruct-q4.gguf \
--local-dir ./models
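
If you'd rather script the download, the same huggingface-hub package exposes hf_hub_download from Python; a minimal sketch using the file from the CLI example above:

from huggingface_hub import hf_hub_download

# Downloads the file (with caching and resume support) and returns the local path
path = hf_hub_download(
    repo_id="microsoft/Phi-3-mini-4k-instruct-gguf",
    filename="Phi-3-mini-4k-instruct-q4.gguf",
    local_dir="./models",
)
print(path)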

Finding ONNX Models

Method 1: Search with Filters

  1. Go to Hugging Face Models

  2. In the search filters, select:
     • Format: ONNX
     • Libraries: transformers or onnx
     • Tasks: Select your task (Image Classification, Object Detection, etc.)

Method 2: Search by Task + "ONNX"

mobilenet onnx
resnet onnx
bert onnx
yolo onnx

Image Classification

| Model | Size | Input Size | Top-1 Accuracy | Speed |
| --- | --- | --- | --- | --- |
| MobileNetV2 | 14MB | 224x224 | 72% | Very Fast |
| ResNet-50 | 98MB | 224x224 | 76% | Fast |
| EfficientNet-B0 | 20MB | 224x224 | 77% | Fast |


Object Detection

| Model | Size | Input Size | Use Case |
| --- | --- | --- | --- |
| YOLOv8n | 6MB | 640x640 | Real-time detection (fast) |
| YOLOv8s | 22MB | 640x640 | Better accuracy |
| YOLOv8m | 52MB | 640x640 | High accuracy |


Text Embeddings

| Model | Size | Embedding Dim | Use Case |
| --- | --- | --- | --- |
| all-MiniLM-L6-v2 | 90MB | 384 | Fast semantic search |
| BERT-base | 420MB | 768 | Higher quality embeddings |


Downloading ONNX Models

Option 1: Direct download

Click "Download" on the model's Files tab.

Option 2: Use curl

# Example: Download MobileNetV2
curl -L -o mobilenet_v2.onnx \
"https://huggingface.co/onnx-community/mobilenet_v2_1.0_224/resolve/main/model.onnx?download=true"

Option 3: Python script to export

Many models don't have pre-exported ONNX versions. You can export them yourself with the Optimum library:

from optimum.onnxruntime import ORTModelForImageClassification

# Load the original PyTorch weights and export them to ONNX in one step
model = ORTModelForImageClassification.from_pretrained(
    "microsoft/resnet-50",
    export=True,
)

# Save the exported ONNX model (plus config) to a local directory
model.save_pretrained("./resnet-50-onnx")
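
After exporting, it's worth confirming the model loads and checking its expected input shape before wiring it into your app. A quick sketch with onnxruntime, assuming the export wrote model.onnx into that directory:

import onnxruntime as ort

# Load the exported model and inspect its input/output signatures
session = ort.InferenceSession("./resnet-50-onnx/model.onnx")

for inp in session.get_inputs():
    print("input:", inp.name, inp.shape, inp.type)
for out in session.get_outputs():
    print("output:", out.name, out.shape, out.type)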

Evaluating Models

Before downloading, check these characteristics:

Model Card

Every model has a "Model Card" describing:

  • Purpose: What the model is designed for
  • Training data: What data it was trained on
  • Limitations: Known weaknesses
  • License: Usage restrictions

Files Tab

Check the model's files:

  • Size: Will it fit in your available memory?
  • Format: Is it .gguf or .onnx?
  • Variants: Multiple quantization levels available?

Community Activity

Indicators of model quality:

  • Downloads: More downloads = more tested
  • Likes: Community endorsement
  • Discussions: Active community support
  • Recent updates: Is it maintained?

Model Selection Criteria

For GGUF (Generative AI)

Choose based on:

1. Memory constraints

Rule of thumb: You need ~1.5-2x the model file size in available RAM.

  • 4GB RAM → Up to 2B parameters (TinyLlama 1.1B, Gemma-2B)
  • 8GB RAM → Up to 4B parameters (Phi-3-mini, SmolLM2)
  • 16GB RAM → Up to 8B parameters comfortably (Llama-3-8B, Mistral-7B)
  • 32GB RAM → Up to 13B parameters (Llama-2-13B, CodeLlama-13B)
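
To apply the rule of thumb above programmatically, you can compare the model file size against the device's free memory. A minimal sketch, assuming the third-party psutil package is available:

import psutil  # assumption: psutil is installed for querying free memory

MODEL_FILE_GB = 2.4  # example: Phi-3-mini Q4
HEADROOM = 2.0       # upper end of the 1.5-2x rule of thumb

available_gb = psutil.virtual_memory().available / 1e9
if available_gb >= MODEL_FILE_GB * HEADROOM:
    print(f"OK: {available_gb:.1f} GB free, the model should fit")
else:
    print(f"Tight fit: only {available_gb:.1f} GB free, consider a smaller model")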

2. Task complexity

  • Simple chat → Phi-3-mini, Gemma-2B
  • Code generation → Phi-3-mini, CodeLlama
  • Long documents → Phi-3-medium (128K context)
  • Maximum quality → Llama-3-8B, Mistral-7B

3. Response speed

  • Fastest → TinyLlama (0.6GB)
  • Fast → Phi-3-mini (2.4GB)
  • Balanced → Llama-3-8B Q4 (4.7GB)

For ONNX (Predictive AI)

Choose based on:

1. Task type

  • Image classification → MobileNetV2, ResNet-50
  • Object detection → YOLOv8n, YOLOv8s
  • Text embeddings → all-MiniLM-L6-v2
  • Sentiment analysis → DistilBERT

2. Accuracy vs. Speed

  • Speed priority → MobileNetV2, YOLOv8n, MiniLM
  • Accuracy priority → ResNet-50, YOLOv8m, BERT-base
  • Balanced → EfficientNet-B0, YOLOv8s

3. Input constraints

  • Limited preprocessing → Models with 224x224 input
  • High resolution → Models with 640x640+ input

Common Pitfalls to Avoid

GGUF Models

  • Don't download multiple quantizations: Pick one (Q4 recommended)
  • Check context length: Longer isn't always better (more memory)
  • Verify it's an instruct model: Base models aren't fine-tuned for chat

ONNX Models

  • Check input preprocessing: Must match model expectations
  • Verify output format: Some models return logits, others probabilities
  • Look for complete examples: Preprocessing is critical (see the sketch after this list)
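
As an illustration of what "matching the model's expectations" means, here is a typical ImageNet-style preprocessing sketch for a classifier such as MobileNetV2. The input size, channel layout (NCHW), and normalization constants below are common defaults, not guarantees; confirm them against the specific model card:

import numpy as np
from PIL import Image

# Common ImageNet normalization constants (verify against the model card)
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(path: str, size: int = 224) -> np.ndarray:
    image = Image.open(path).convert("RGB").resize((size, size))
    array = np.asarray(image, dtype=np.float32) / 255.0  # scale to [0, 1]
    array = (array - MEAN) / STD                         # normalize per channel
    array = array.transpose(2, 0, 1)                     # HWC -> CHW
    return array[np.newaxis, ...]                        # add batch dim -> NCHW

# batch = preprocess("cat.jpg")  # shape (1, 3, 224, 224), dtype float32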

General

  • Read the license: Some models have commercial restrictions
  • Check file integrity: Large downloads can get corrupted (a checksum sketch follows this list)
  • Test on sample data first: Before deploying
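
For the file-integrity point, one simple approach is to compute a local SHA-256 hash and compare it with the checksum shown in the file's details on Hugging Face. A minimal sketch (the file path is just an example):

import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large models don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

print(sha256_of("./models/Phi-3-mini-4k-instruct-q4.gguf"))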

Example Model Searches

"I want a chatbot"

Search: phi-3 gguf or llama-3 gguf

Recommendation: Phi-3-mini-4k-instruct Q4 (2.4GB)

  • Fast responses
  • Good quality
  • Works on most devices

"I want to classify images"

Search: mobilenet onnx or resnet onnx

Recommendation: MobileNetV2 ONNX (14MB)

  • Very fast
  • Good accuracy for common objects
  • Small size

"I want semantic search"

Search: sentence-transformers onnx or all-MiniLM onnx

Recommendation: all-MiniLM-L6-v2 (90MB)

  • Fast embedding generation
  • Good semantic understanding
  • Widely used and tested

"I want code completion"

Search: phi-3 gguf or codellama gguf

Recommendation: Phi-3-mini-4k-instruct Q4 (2.4GB)

  • Excellent coding capabilities
  • Fast
  • Reasonable size

Next Steps

Now that you know how to find models:
