Our Technology Stack
We are tool-agnostic but deeply experienced. We choose technologies based on the problem, not the hype cycle. Here is what we work with every day.
LLMs & Foundation Models
Large language models and multimodal AI
GPT-4.1 / GPT-4o
OpenAI's frontier models for reasoning, coding, and generation
Claude 4 Opus / Sonnet
Anthropic's latest for analysis, agentic tasks, and 1M-token context
Llama 4 Scout / Maverick
Meta's open-weight models for on-premise and fine-tuning
Gemini 2.5 Pro / Flash
Google's multimodal models with native tool use and thinking
Mistral Large / Codestral
Efficient open models for cost-sensitive and coding workloads
DeepSeek-V3 / R1
High-performance open reasoning models with MoE architecture
Qwen 2.5 / QwQ
Alibaba's multilingual models with strong code and math
Grok 3
xAI's reasoning model with extended thinking capabilities
Image & Video Generation
Diffusion models, video synthesis, and creative AI
Flux 1.1 Pro / Dev
Black Forest Labs' state-of-the-art image generation
DALL-E 3
OpenAI's text-to-image with precise prompt adherence
Stable Diffusion 3.5
Open-weight image generation for self-hosted pipelines
Sora / Veo 2
OpenAI and Google video generation models
Kling 2.0
Kuaishou's video generation with motion control
ComfyUI
Node-based workflow builder for diffusion pipelines
Computer Vision
Detection, segmentation, and image analysis
YOLO11 / YOLOv10
Latest Ultralytics real-time detection and segmentation
SAM 2
Meta's Segment Anything for zero-shot image and video segmentation
Florence-2
Microsoft's unified vision model for captioning, detection, and grounding
Grounding DINO 1.5
Open-set object detection with text-guided grounding
PyTorch 2.x
Primary deep learning framework with torch.compile
OpenCV
Image processing and classical vision algorithms
TensorRT 10
NVIDIA inference optimization for edge and data center
ONNX Runtime
Cross-platform model inference and optimization
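Detection and segmentation pipelines like the ones above are evaluated and post-processed with Intersection-over-Union. As a minimal pure-Python sketch (box format assumed to be (x1, y1, x2, y2); detector-specific details omitted):

```python
def iou(box_a, box_b):
    """Intersection-over-Union for two axis-aligned (x1, y1, x2, y2) boxes.

    This is the core overlap metric behind detector evaluation and
    non-maximum suppression in models like YOLO and Grounding DINO.
    """
    # Intersection rectangle (empty if boxes are disjoint).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

Two boxes of area 4 overlapping in a unit square score 1/7; identical boxes score 1.0.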
Speech & Audio
Speech-to-text, TTS, and voice AI
Whisper v3 Turbo
OpenAI's latest speech recognition -- faster, more accurate
Deepgram Nova-3
Real-time streaming ASR with diarization
ElevenLabs
Production-grade TTS with voice cloning
OpenAI Realtime API
Native speech-to-speech for voice agents
Sesame CSM
Conversational speech model with emotional intonation
Piper / Kokoro
Open-source TTS for on-premise and edge deployment
Agentic AI & Orchestration
Agent frameworks, tool use, and workflow engines
LangGraph
Stateful multi-agent orchestration with cycles and persistence
Claude Agent SDK
Anthropic's framework for building production AI agents
OpenAI Agents SDK
Agent loops with handoffs, guardrails, and tool use
CrewAI
Multi-agent collaboration with role-based delegation
MCP (Model Context Protocol)
Anthropic's open standard for connecting agents to tools and data
Temporal
Durable workflow execution for long-running agent tasks
LangChain / LangSmith
LLM application framework with tracing and evaluation
Celery
Distributed task queues for async processing
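All of these frameworks productionize the same underlying loop: the model either calls a tool or returns a final answer, and the runtime executes tools and feeds results back. A minimal sketch of that loop, with `fake_model` standing in for a real LLM call (names and routing logic are purely illustrative):

```python
def calculator(expression: str) -> str:
    # Hypothetical tool: evaluates simple arithmetic.
    # Restricted eval is fine for a sketch; never do this on untrusted input.
    return str(eval(expression, {"__builtins__": {}}))

TOOLS = {"calculator": calculator}

def fake_model(messages):
    """Stand-in for an LLM: route arithmetic to the tool once, then answer."""
    last = messages[-1]
    if last["role"] == "user":
        return {"tool": "calculator", "args": "2 + 3 * 4"}
    return {"answer": f"The result is {last['content']}"}

def run_agent(user_input, max_steps=5):
    messages = [{"role": "user", "content": user_input}]
    for _ in range(max_steps):
        action = fake_model(messages)
        if "answer" in action:          # model decided it is done
            return action["answer"]
        result = TOOLS[action["tool"]](action["args"])  # execute the tool call
        messages.append({"role": "tool", "content": result})
    return "step limit reached"
```

Frameworks like LangGraph add persistence, branching, and guardrails around this loop rather than replacing it.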
RAG, Embeddings & Data
Vector databases, retrieval, and knowledge systems
Pinecone
Managed vector database for production RAG
Qdrant
High-performance vector search with filtering and payloads
Weaviate
Open-source vector DB with hybrid keyword + semantic search
pgvector / pgvectorscale
Vector search as a PostgreSQL extension -- no extra infra
LlamaIndex
Data framework for ingestion, indexing, and retrieval
Unstructured.io
Document parsing for PDFs, images, tables, and slides
Cohere Embed / Rerank
Production embeddings and reranking for retrieval quality
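At their core, all of these vector stores answer one query: given an embedding, return the k nearest documents by similarity. A pure-Python sketch of that operation (embeddings and index layout are toy assumptions; production systems add ANN indexes, filtering, and hybrid search):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec, index, k=2):
    """index: list of (doc_id, embedding) pairs. Returns best-k (id, score)."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in index]
    return sorted(scored, key=lambda s: s[1], reverse=True)[:k]
```

With an index of three toy vectors, a query along [1, 0] ranks the aligned document first and the diagonal one second.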
Web & Backend
Application frameworks, databases, and APIs
Flask / FastAPI
Python web frameworks -- sync and async API serving
Next.js / React
Frontend framework for dashboards and SSR applications
PostgreSQL / MySQL
Primary relational databases with JSON support
Redis / Valkey
Caching, queues, rate limiting, and session storage
SQLAlchemy / Prisma
ORMs for Python and TypeScript backends
gRPC / tRPC
High-performance service-to-service communication
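The rate limiting we back with Redis typically follows the token-bucket pattern. An in-memory sketch of the algorithm (class name and interface are illustrative; timestamps are injected rather than read from the clock so behavior is deterministic):

```python
class TokenBucket:
    """Token-bucket rate limiter: capacity tokens, steady refill rate."""

    def __init__(self, capacity, refill_per_sec):
        self.capacity = capacity
        self.refill = refill_per_sec
        self.tokens = float(capacity)
        self.last = 0.0  # timestamp of the previous check

    def allow(self, now):
        """Return True if a request at time `now` is within the limit."""
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

In production the same state lives in Redis so every API replica shares one bucket per client.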
GPU & Compute Hardware
Accelerators, NPUs, and inference infrastructure
NVIDIA H100 / H200
Data center GPUs for large-scale training and inference
NVIDIA L4 / L40S
Cost-efficient GPUs for inference and fine-tuning
AMD MI300X
192GB HBM3 accelerator for LLM training and HPC
Google TPU v5e / v5p
Custom silicon for JAX/TF training on GCP
AWS Inferentia2 / Trainium
AWS custom chips for cost-efficient training and inference
Intel Gaudi 3
Alternative accelerator with native PyTorch support
NVIDIA Jetson Orin
Edge AI computing for vision, robotics, and IoT
NVLink / InfiniBand
High-speed GPU interconnect for multi-node clusters
Model Serving & Inference
Efficient model deployment and optimization
vLLM
High-throughput LLM serving with PagedAttention
TGI (Text Generation Inference)
Hugging Face's production LLM serving engine
Ollama
Local LLM runner for dev, testing, and edge deployment
llama.cpp / GGUF
CPU and mixed-precision inference for quantized models
Triton Inference Server
NVIDIA multi-framework model serving platform
BentoML
Unified model packaging and deployment framework
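The quantized inference that llama.cpp and GGUF enable rests on a simple idea: store weights as small integers plus a scale. A per-tensor symmetric int8 sketch (real formats use more refined block-wise schemes, so treat this as the principle, not the spec):

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: map floats to [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0 or 1.0  # avoid zero scale
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from int8 values and the shared scale."""
    return [v * scale for v in q]
```

The round trip loses at most half a quantization step per weight, which is why 8-bit (and careful 4-bit) models stay close to full-precision quality at a fraction of the memory.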
Robotics & Embodied AI
Robot control, simulation, and autonomous systems
ROS2 Jazzy / Rolling
Latest Robot Operating System for production robotics
Nav2
Navigation stack for autonomous mobile robots
MoveIt 2
Motion planning for robotic manipulation
NVIDIA Isaac Sim / Lab
GPU-accelerated simulation for synthetic data and sim-to-real
Gazebo Harmonic
Open-source physics simulation for testing
LeRobot
Hugging Face's open robotics learning framework
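Under Nav2-style navigation sits the unicycle motion model for differential-drive robots. A one-step Euler integration sketch (function name and step size are illustrative):

```python
import math

def diff_drive_step(x, y, theta, v, omega, dt):
    """One Euler step of the unicycle model: v is linear velocity (m/s),
    omega is angular velocity (rad/s), theta is heading (rad)."""
    x += v * math.cos(theta) * dt
    y += v * math.sin(theta) * dt
    theta += omega * dt
    return x, y, theta
```

Driving straight for one second at 1 m/s advances x by one meter; spinning in place at pi/2 rad/s turns the robot a quarter circle.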
Deployment & Infrastructure
Cloud, containers, GPU clouds, and CI/CD
Docker / Podman
Containerization for all services
Kubernetes / K3s
Container orchestration -- full-scale and lightweight
AWS / GCP / Azure
Hyperscale cloud providers for GPU and general compute
Lambda Labs / CoreWeave
GPU-first cloud providers for ML workloads
Terraform / Pulumi
Infrastructure as code -- declarative and programmatic
GitHub Actions
CI/CD pipelines and automation
NVIDIA GPU Operator
Automated GPU driver and toolkit management on Kubernetes
Observability & AI Ops
Monitoring, tracing, evaluation, and guardrails
Grafana / Prometheus
Metrics dashboards, alerting, and time-series monitoring
LangSmith / LangFuse
LLM tracing, prompt evaluation, and cost tracking
Weights & Biases
Experiment tracking, model registry, and dataset versioning
Sentry
Error tracking and performance monitoring
OpenTelemetry
Vendor-neutral distributed tracing standard
Guardrails AI / NeMo
Safety rails, output validation, and content filtering
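The tracing these tools standardize comes down to spans: named, timed records of each operation. A toy context-manager version (the in-memory `SPANS` list stands in for a real exporter):

```python
import time
from contextlib import contextmanager

SPANS = []  # collected (name, duration_seconds) records; a real tracer exports these

@contextmanager
def span(name):
    """Record wall-clock duration for a named operation, even on error."""
    start = time.perf_counter()
    try:
        yield
    finally:
        SPANS.append((name, time.perf_counter() - start))
```

Wrapping a retrieval call in `with span("retrieve"):` yields the per-step latency data that dashboards and LLM cost tracking are built on.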
Security & Compliance
Data protection, access control, and audit
OAuth 2.0 / OIDC
Standard authentication and authorization protocols
Vault / SOPS
Secrets management and encrypted config
PII Detection / Redaction
Automated sensitive data scanning in AI pipelines
SOC 2 / HIPAA tooling
Compliance frameworks for regulated AI deployments
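The simplest form of PII redaction is pattern substitution. A deliberately minimal sketch with two illustrative patterns (production detection uses far more robust, locale-aware rules and models):

```python
import re

# Illustrative patterns only -- not exhaustive and not locale-aware.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # US SSN shape

def redact(text):
    """Replace matched PII with placeholder tokens."""
    text = EMAIL.sub("[EMAIL]", text)
    return SSN.sub("[SSN]", text)
```

Running redaction before text enters an AI pipeline keeps sensitive values out of prompts, logs, and vendor APIs.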
Technology-Agnostic, Not Technology-Indifferent
We have strong opinions on technology selection, loosely held. The right tool depends on your constraints -- latency, cost, compliance, team expertise, and scale. We will recommend what fits your problem, not what is trending on Hacker News.