Our Technology Stack
We are tool-agnostic but deeply experienced. We choose technologies based on the problem, not the hype cycle. Here is what we work with every day.
LLMs & Foundation Models
Large language models and multimodal AI
GPT-4.1 / GPT-4o
OpenAI's frontier models for reasoning, coding, and generation
Claude 4 Opus / Sonnet
Anthropic's latest for analysis, agentic tasks, and 1M-token context
Llama 4 Scout / Maverick
Meta's open-weight models for on-premise and fine-tuning
Gemini 2.5 Pro / Flash
Google's multimodal models with native tool use and thinking
Mistral Large / Codestral
Efficient open models for cost-sensitive and coding workloads
DeepSeek-V3 / R1
High-performance open reasoning models with MoE architecture
Qwen 2.5 / QwQ
Alibaba's multilingual models with strong code and math
Grok 3
xAI's reasoning model with extended thinking capabilities
Image & Video Generation
Diffusion models, video synthesis, and creative AI
Flux 1.1 Pro / Dev
Black Forest Labs' state-of-the-art image generation
DALL-E 3
OpenAI's text-to-image with precise prompt adherence
Stable Diffusion 3.5
Open-weight image generation for self-hosted pipelines
Sora / Veo 2
OpenAI and Google video generation models
Kling 2.0
Kuaishou's video generation with motion control
ComfyUI
Node-based workflow builder for diffusion pipelines
Computer Vision
Detection, segmentation, and image analysis
YOLO11 / YOLOv10
Latest Ultralytics real-time detection and segmentation
SAM 2
Meta's Segment Anything for zero-shot image and video segmentation
Florence-2
Microsoft's unified vision model for captioning, detection, and grounding
Grounding DINO 1.5
Open-set object detection with text-guided grounding
PyTorch 2.x
Primary deep learning framework with torch.compile
OpenCV
Image processing and classical vision algorithms
TensorRT 10
NVIDIA inference optimization for edge and data center
ONNX Runtime
Cross-platform model inference and optimization
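Detection and segmentation pipelines like the ones above are evaluated and post-processed with Intersection-over-Union. As a minimal pure-Python sketch (box format assumed to be (x1, y1, x2, y2); detector-specific details omitted):

```python
def iou(box_a, box_b):
    """Intersection-over-Union for two axis-aligned (x1, y1, x2, y2) boxes.

    This is the core overlap metric behind detector evaluation and
    non-maximum suppression in models like YOLO and Grounding DINO.
    """
    # Intersection rectangle (empty if boxes are disjoint).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

Two boxes of area 4 overlapping in a unit square score 1/7; identical boxes score 1.0.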
Speech & Audio
Speech-to-text, TTS, and voice AI
Whisper v3 Turbo
OpenAI's latest speech recognition -- faster, more accurate
Deepgram Nova-3
Real-time streaming ASR with diarization
ElevenLabs
Production-grade TTS with voice cloning
OpenAI Realtime API
Native speech-to-speech for voice agents
Sesame CSM
Conversational speech model with emotional intonation
Piper / Kokoro
Open-source TTS for on-premise and edge deployment
Agentic AI & Orchestration
Agent frameworks, tool use, and workflow engines
LangGraph
Stateful multi-agent orchestration with cycles and persistence
Claude Agent SDK
Anthropic's framework for building production AI agents
OpenAI Agents SDK
Agent loops with handoffs, guardrails, and tool use
CrewAI
Multi-agent collaboration with role-based delegation
MCP (Model Context Protocol)
Anthropic's open standard for connecting agents to tools and data
Temporal
Durable workflow execution for long-running agent tasks
LangChain / LangSmith
LLM application framework with tracing and evaluation
Celery
Distributed task queues for async processing
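All of these frameworks productionize the same underlying loop: the model either calls a tool or returns a final answer, and the runtime executes tools and feeds results back. A minimal sketch of that loop, with `fake_model` standing in for a real LLM call (names and routing logic are purely illustrative):

```python
def calculator(expression: str) -> str:
    # Hypothetical tool: evaluates simple arithmetic.
    # Restricted eval is fine for a sketch; never do this on untrusted input.
    return str(eval(expression, {"__builtins__": {}}))

TOOLS = {"calculator": calculator}

def fake_model(messages):
    """Stand-in for an LLM: route arithmetic to the tool once, then answer."""
    last = messages[-1]
    if last["role"] == "user":
        return {"tool": "calculator", "args": "2 + 3 * 4"}
    return {"answer": f"The result is {last['content']}"}

def run_agent(user_input, max_steps=5):
    messages = [{"role": "user", "content": user_input}]
    for _ in range(max_steps):
        action = fake_model(messages)
        if "answer" in action:          # model decided it is done
            return action["answer"]
        result = TOOLS[action["tool"]](action["args"])  # execute the tool call
        messages.append({"role": "tool", "content": result})
    return "step limit reached"
```

Frameworks like LangGraph add persistence, branching, and guardrails around this loop rather than replacing it.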
RAG, Embeddings & Data
Vector databases, retrieval, and knowledge systems
Pinecone
Managed vector database for production RAG
Qdrant
High-performance vector search with filtering and payloads
Weaviate
Open-source vector DB with hybrid keyword + semantic search
pgvector / pgvectorscale
Vector search as a PostgreSQL extension -- no extra infra
LlamaIndex
Data framework for ingestion, indexing, and retrieval
Unstructured.io
Document parsing for PDFs, images, tables, and slides
Cohere Embed / Rerank
Production embeddings and reranking for retrieval quality
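At their core, all of these vector stores answer one query: given an embedding, return the k nearest documents by similarity. A pure-Python sketch of that operation (embeddings and index layout are toy assumptions; production systems add ANN indexes, filtering, and hybrid search):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec, index, k=2):
    """index: list of (doc_id, embedding) pairs. Returns best-k (id, score)."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in index]
    return sorted(scored, key=lambda s: s[1], reverse=True)[:k]
```

With an index of three toy vectors, a query along [1, 0] ranks the aligned document first and the diagonal one second.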
Web & Backend
Application frameworks, databases, and APIs
Flask / FastAPI
Python web frameworks -- sync and async API serving
Next.js / React
Frontend framework for dashboards and SSR applications
PostgreSQL / MySQL
Primary relational databases with JSON support
Redis / Valkey
Caching, queues, rate limiting, and session storage
SQLAlchemy / Prisma
ORMs for Python and TypeScript backends
gRPC / tRPC
High-performance service-to-service communication
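The rate limiting we back with Redis typically follows the token-bucket pattern. An in-memory sketch of the algorithm (class name and interface are illustrative; timestamps are injected rather than read from the clock so behavior is deterministic):

```python
class TokenBucket:
    """Token-bucket rate limiter: capacity tokens, steady refill rate."""

    def __init__(self, capacity, refill_per_sec):
        self.capacity = capacity
        self.refill = refill_per_sec
        self.tokens = float(capacity)
        self.last = 0.0  # timestamp of the previous check

    def allow(self, now):
        """Return True if a request at time `now` is within the limit."""
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

In production the same state lives in Redis so every API replica shares one bucket per client.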
GPU & Compute Hardware
Accelerators, NPUs, and inference infrastructure
NVIDIA H100 / H200
Data center GPUs for large-scale training and inference
NVIDIA L4 / L40S
Cost-efficient GPUs for inference and fine-tuning
AMD MI300X
192GB HBM3 accelerator for LLM training and HPC
Google TPU v5e / v5p
Custom silicon for JAX/TF training on GCP
AWS Inferentia2 / Trainium
AWS custom chips for cost-efficient training and inference
Intel Gaudi 3
Alternative accelerator with native PyTorch support
NVIDIA Jetson Orin
Edge AI computing for vision, robotics, and IoT
NVLink / InfiniBand
High-speed GPU interconnect for multi-node clusters
Model Serving & Inference
Efficient model deployment and optimization
vLLM
High-throughput LLM serving with PagedAttention
TGI (Text Generation Inference)
Hugging Face's production LLM serving engine
Ollama
Local LLM runner for dev, testing, and edge deployment
llama.cpp / GGUF
CPU and mixed-precision inference for quantized models
Triton Inference Server
NVIDIA multi-framework model serving platform
BentoML
Unified model packaging and deployment framework
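The quantized inference that llama.cpp and GGUF enable rests on a simple idea: store weights as small integers plus a scale. A per-tensor symmetric int8 sketch (real formats use more refined block-wise schemes, so treat this as the principle, not the spec):

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: map floats to [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0 or 1.0  # avoid zero scale
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from int8 values and the shared scale."""
    return [v * scale for v in q]
```

The round trip loses at most half a quantization step per weight, which is why 8-bit (and careful 4-bit) models stay close to full-precision quality at a fraction of the memory.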
Robotics & Embodied AI
Robot control, simulation, and autonomous systems
ROS2 Jazzy / Rolling
Latest Robot Operating System for production robotics
Nav2
Navigation stack for autonomous mobile robots
MoveIt 2
Motion planning for robotic manipulation
NVIDIA Isaac Sim / Lab
GPU-accelerated simulation for synthetic data and sim-to-real
Gazebo Harmonic
Open-source physics simulation for testing
LeRobot
Hugging Face's open robotics learning framework
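Under Nav2-style navigation sits the unicycle motion model for differential-drive robots. A one-step Euler integration sketch (function name and step size are illustrative):

```python
import math

def diff_drive_step(x, y, theta, v, omega, dt):
    """One Euler step of the unicycle model: v is linear velocity (m/s),
    omega is angular velocity (rad/s), theta is heading (rad)."""
    x += v * math.cos(theta) * dt
    y += v * math.sin(theta) * dt
    theta += omega * dt
    return x, y, theta
```

Driving straight for one second at 1 m/s advances x by one meter; spinning in place at pi/2 rad/s turns the robot a quarter circle.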
Deployment & Infrastructure
Cloud, containers, GPU clouds, and CI/CD
Docker / Podman
Containerization for all services
Kubernetes / K3s
Container orchestration -- full-scale and lightweight
AWS / GCP / Azure
Hyperscale cloud providers for GPU and general compute
Lambda Labs / CoreWeave
GPU-first cloud providers for ML workloads
Terraform / Pulumi
Infrastructure as code -- declarative and programmatic
GitHub Actions
CI/CD pipelines and automation
NVIDIA GPU Operator
Automated GPU driver and toolkit management on Kubernetes
Observability & AI Ops
Monitoring, tracing, evaluation, and guardrails
Grafana / Prometheus
Metrics dashboards, alerting, and time-series monitoring
LangSmith / LangFuse
LLM tracing, prompt evaluation, and cost tracking
Weights & Biases
Experiment tracking, model registry, and dataset versioning
Sentry
Error tracking and performance monitoring
OpenTelemetry
Vendor-neutral distributed tracing standard
Guardrails AI / NeMo
Safety rails, output validation, and content filtering
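The tracing these tools standardize comes down to spans: named, timed records of each operation. A toy context-manager version (the in-memory `SPANS` list stands in for a real exporter):

```python
import time
from contextlib import contextmanager

SPANS = []  # collected (name, duration_seconds) records; a real tracer exports these

@contextmanager
def span(name):
    """Record wall-clock duration for a named operation, even on error."""
    start = time.perf_counter()
    try:
        yield
    finally:
        SPANS.append((name, time.perf_counter() - start))
```

Wrapping a retrieval call in `with span("retrieve"):` yields the per-step latency data that dashboards and LLM cost tracking are built on.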
Security & Compliance
Data protection, access control, and audit
OAuth 2.0 / OIDC
Standard authentication and authorization protocols
Vault / SOPS
Secrets management and encrypted config
PII Detection / Redaction
Automated sensitive data scanning in AI pipelines
SOC 2 / HIPAA tooling
Compliance frameworks for regulated AI deployments
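The simplest form of PII redaction is pattern substitution. A deliberately minimal sketch with two illustrative patterns (production detection uses far more robust, locale-aware rules and models):

```python
import re

# Illustrative patterns only -- not exhaustive and not locale-aware.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # US SSN shape

def redact(text):
    """Replace matched PII with placeholder tokens."""
    text = EMAIL.sub("[EMAIL]", text)
    return SSN.sub("[SSN]", text)
```

Running redaction before text enters an AI pipeline keeps sensitive values out of prompts, logs, and vendor APIs.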
Technology-Agnostic, Not Technology-Indifferent
We have strong opinions on technology selection, loosely held. The right tool depends on your constraints -- latency, cost, compliance, team expertise, and scale. We will recommend what fits your problem, not what is trending on Hacker News.