AI Models Guide

AI Models & Selection

SOHAM routes your queries across 13+ specialized models from Groq, Google, HuggingFace, and OpenRouter. Here's what's available and when to use each.

Auto Mode – Recommended for Most Users

SOHAM analyzes your query and routes it to the most capable model automatically.

🧮

Math problems

Routed to math-capable models (Qwen, Llama)

💻

Code questions

Routed to DeepSeek V3.2 or Llama 3.3 70B

🖼️

Image tasks

Routed to Gemini multimodal models

🔍

Web queries

Auto-triggers DuckDuckGo search + AI synthesis

💬

General chat

Routed to fast conversational models

📄

Document tasks

Routed to large-context models
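The routing above can be pictured as a simple classify-then-dispatch step. The sketch below is purely illustrative, assuming keyword-based classification; the category patterns, model IDs, and fallback are hypothetical and not SOHAM's actual router.

```python
import re

# Hypothetical routing table: (query pattern, model ID). Both columns are
# illustrative stand-ins, not SOHAM's real configuration.
ROUTES = [
    (r"\b(integral|equation|solve|derivative)\b", "qwen-2.5-7b-instruct"),  # math
    (r"\b(code|function|bug|debug|compile)\b",    "deepseek-v3.2"),         # coding
    (r"\b(image|photo|picture|diagram)\b",        "gemini-2.5-flash"),      # vision
    (r"\b(latest|news|today|current)\b",          "web-search+synthesis"),  # web
]

def route(query: str) -> str:
    """Return the model chosen for a query; fall back to a fast chat model."""
    q = query.lower()
    for pattern, model in ROUTES:
        if re.search(pattern, q):
            return model
    return "llama-3.1-8b-instant"  # general chat / default
```

In this sketch, a query like "solve this equation" would dispatch to the math model, while small talk falls through to the fast general-purpose default.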

Available Models

General Purpose

Fast, versatile models for everyday tasks

Llama 3.1 8B Instant

Groq – Ultra-fast inference, great for quick answers and general chat

Fastest response
Streaming
Low latency

Llama 3.2 1B Instruct
Free

HuggingFace – Compact free model, good for lightweight tasks

Free
Lightweight
No limits

Llama 3.1 70B Instruct
Free

HuggingFace – Large free model with strong reasoning

Free
High quality
Complex reasoning

Qwen 2.5 7B Instruct
Free

HuggingFace – Strong multilingual model, excellent at math and code

Free
Multilingual
Math & code

Coding Specialists

Models fine-tuned for programming, debugging, and architecture

DeepSeek V3.2

HuggingFace – State-of-the-art coding model, best for complex programming tasks

Code generation
Debugging
Architecture advice

Conversational

Models optimized for natural dialogue

RNJ-1 Instruct

HuggingFace – Efficient conversational model with good personality

Natural dialogue
Context awareness
Friendly tone

Gemini 2.5 Flash Lite

Google – Lightweight model, fast and helpful

Fast responses
Casual chat
Helpful tone

Multimodal (Vision)

Models that understand and analyze images

Gemini 2.5 Flash

Google – Latest multimodal model, best for image analysis

Image analysis
Visual Q&A
Large context

Gemini Flash Latest

Google – Latest Gemini Flash with improved vision capabilities

Image understanding
Text + vision
Fast processing

Free / OpenRouter

Completely free models with no usage limits

OpenRouter Auto (Free)
Free

OpenRouter – Automatically selects the best available free model from OpenRouter

Auto selection
No limits
Multiple providers
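For developers curious what "auto selection" looks like at the API level, here is a minimal sketch of building a request against OpenRouter's chat completions endpoint. The `openrouter/auto` slug shown asks OpenRouter to pick a model itself; whether the pick is free depends on your account and the exact slug SOHAM uses, so treat the details as an assumption.

```python
import json
import urllib.request

def build_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a chat completion request to OpenRouter.

    The model slug "openrouter/auto" delegates model choice to OpenRouter;
    the API key here is a placeholder.
    """
    payload = {
        "model": "openrouter/auto",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        "https://openrouter.ai/api/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Sending the returned request with `urllib.request.urlopen` (or any HTTP client) yields an OpenAI-style chat completion response, with the actually selected model reported in the response body.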

How to Select a Model

Desktop

  1. Click the Settings icon (⚙️) in the header
  2. Find the "AI Model" dropdown
  3. Select "Auto" or a specific model
  4. Settings save automatically

Mobile

  1. Tap the model button in the chat header
  2. A bottom sheet opens with all models
  3. Tap to select your preferred model
  4. Sheet closes and the model is applied

💡 Model Selection Tips

For coding tasks

Use DeepSeek V3.2 for complex programming, debugging, and architecture questions.

For math problems

Auto mode routes to Qwen 2.5 or Llama models, which handle math well.

For image analysis

Use Gemini 2.5 Flash for image understanding and visual Q&A.

For speed

Groq's Llama 3.1 8B Instant is the fastest model available.

For free unlimited use

HuggingFace models (Llama, Qwen) and OpenRouter Auto have no usage limits.

For general chat

Auto mode or RNJ-1 Instruct work best for everyday conversations.

Model Providers

Groq

Ultra-fast inference with streaming

2 models available
Fastest responses
Streaming
Low latency

HuggingFace

Open-source models, many free

6 models available
Free models
Open source
Diverse selection

Google

Advanced multimodal capabilities

3 models available
Vision support
Large context
Multimodal

OpenRouter

Aggregated access to many models

1 model available
Auto selection
Free tier
Multiple providers