Deep learning that ships

Neural networks are a means, not a brand. We pick architecture by task shape, build the training pipeline to be reproducible, and treat inference cost and latency as design constraints from the first experiment, not a post-hoc scramble.

[Figure: layered neural network with skip connections and a training-loop gradient sweep — layers 256 → 1024 → 1024 → 10, back-prop with AdamW, bf16, lr 3e-4; loss 3.1 → 0.42 over 60 epochs]

ARCHITECTURE STACK

Six families, each with its own reason

Feed-forward still lands. Convolution still owns images at scale. Recurrent state still beats attention on some streaming signals. Transformers dominate the rest. Fine-tuning is the default before full training.

1960s+ ANN

Feed-forward networks

Dense layers, back-prop, early activation functions. Still the right baseline when the signal has no spatial or sequential structure and the dataset is small enough that anything deeper overfits.

  • Dense · MLP
  • ReLU · GELU · Swish
  • Batch norm · layer norm
  • Adam · AdamW · SGD momentum
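The baseline really is this small: a dense stack is matrix multiplies with a non-linearity between them. A NumPy sketch with toy, made-up layer sizes (forward pass only, no training loop):

```python
import numpy as np

def mlp_forward(x, weights, biases):
    """Forward pass through a dense feed-forward stack with ReLU
    between layers; no activation on the output layer."""
    h = x
    for i, (W, b) in enumerate(zip(weights, biases)):
        h = h @ W + b
        if i < len(weights) - 1:
            h = np.maximum(h, 0.0)   # ReLU
    return h

rng = np.random.default_rng(0)
sizes = [8, 16, 16, 3]               # input, two hidden layers, output
weights = [rng.standard_normal((a, b)) * 0.1 for a, b in zip(sizes, sizes[1:])]
biases = [np.zeros(b) for b in sizes[1:]]
out = mlp_forward(rng.standard_normal((4, 8)), weights, biases)
print(out.shape)  # (4, 3): batch of 4, three outputs each
```

Everything deeper in this stack is a variation on that loop: different connectivity, different activations, cleverer optimisers.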

1998+ CNN

Convolutional networks

Translation-invariant feature detectors for images, spectrograms and structured 2D signals. ResNet and EfficientNet families are still production workhorses outside vision-LM territory.

  • ResNet · ResNeXt · RegNet
  • EfficientNet · ConvNeXt
  • U-Net · DeepLab
  • YOLO · DETR
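The translation invariance is mechanical: the same small kernel slides over every position. A minimal valid cross-correlation (what deep-learning frameworks call convolution), with a hand-built vertical-edge detector as a hypothetical filter:

```python
import numpy as np

def conv2d(img, kernel):
    """Valid 2-D cross-correlation: dot-product the kernel
    against every patch of the image."""
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

edge = np.array([[1.0, 0.0, -1.0]] * 3)   # vertical-edge detector
img = np.zeros((6, 6)); img[:, 3:] = 1.0  # step edge at column 3
resp = conv2d(img, edge)
print(resp.shape)        # (4, 4)
print(abs(resp).max())   # 3.0 — strongest response sits on the edge
```

A trained CNN learns thousands of such kernels instead of hand-writing one; the sliding-window arithmetic is identical.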

1997+ RNN · LSTM · GRU

Sequential networks

Recurrent state for text, audio and telemetry. Less fashionable since Transformers, but superior for some streaming / low-latency signals where attention's quadratic cost is a hard stop.

  • LSTM · GRU · bi-directional
  • Seq2seq · attention
  • Neural ODE · state-space
  • Mamba · S4 · S6
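The streaming advantage is visible in the update rule: one recurrent step touches only the current input and a fixed-size hidden state, so memory stays constant however long the stream runs. A toy GRU cell with random weights and illustrative dimensions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(h, x, params):
    """One GRU step: update gate z, reset gate r, candidate state."""
    Wz, Uz, Wr, Ur, Wh, Uh = params
    z = sigmoid(x @ Wz + h @ Uz)              # how much to update
    r = sigmoid(x @ Wr + h @ Ur)              # how much past state to expose
    h_tilde = np.tanh(x @ Wh + (r * h) @ Uh)  # candidate state
    return (1 - z) * h + z * h_tilde

rng = np.random.default_rng(6)
d_in, d_h = 4, 8
params = [rng.standard_normal(s) * 0.3
          for s in [(d_in, d_h), (d_h, d_h)] * 3]
h = np.zeros(d_h)
for x in rng.standard_normal((10, d_in)):  # consume a 10-step stream
    h = gru_step(h, x, params)
print(h.shape)  # (8,) — same size after 10 steps or 10 million
```

Compare attention, which must keep (and score against) every past token: the recurrent cell's per-step cost never grows.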

2017+ Transformer

Attention architectures

Self-attention over tokens. The substrate for modern language, vision (ViT, DINO) and multimodal models (CLIP, Flamingo, GPT-4V). Scaling laws and mixture-of-experts still unfold here.

  • Encoder · decoder · enc-dec
  • ViT · Swin · DeiT
  • Mixture of experts
  • RoPE · ALiBi · MLA
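The core operation is small enough to write out: every token's query is scored against every token's key, the scores are softmaxed, and the values are mixed by those weights — which is also where the quadratic (seq × seq) cost comes from. A single-head sketch with made-up dimensions, no mask, no multi-head split:

```python
import numpy as np

def self_attention(x, Wq, Wk, Wv):
    """Scaled dot-product self-attention over x of shape (seq, d_model)."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(q.shape[-1])       # (seq, seq): the quadratic part
    scores -= scores.max(axis=-1, keepdims=True)  # stabilise the softmax
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)            # each row is a distribution
    return w @ v, w

rng = np.random.default_rng(1)
x = rng.standard_normal((5, 8))                   # 5 tokens, d_model = 8
Wq, Wk, Wv = (rng.standard_normal((8, 8)) * 0.2 for _ in range(3))
out, attn = self_attention(x, Wq, Wk, Wv)
print(out.shape)   # (5, 8)
print(attn.shape)  # (5, 5) — one weight per token pair
```

Production variants add causal masking, multiple heads, RoPE-style position encoding and a KV cache; the scoring-and-mixing core stays this.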

2014+ Generative

GAN · Diffusion · VAE

Distributions learned well enough to sample from. Diffusion now dominates image and audio; autoregressive transformers dominate text and code; hybrids (consistency, flow-matching) are rising.

  • Stable Diffusion · SDXL · Flux
  • VAE · VQ-VAE · VQGAN
  • Consistency models
  • Flow-matching
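The diffusion training signal comes from a closed-form forward process: the noisy sample at any timestep is a known mix of clean data and Gaussian noise, so the model can be trained to predict the noise directly. A 1-D sketch with the standard linear beta schedule (real pipelines noise image latents, not small vectors):

```python
import numpy as np

def noisy_sample(x0, t, abar, rng):
    """q(x_t | x_0) = N(sqrt(abar_t) * x_0, (1 - abar_t) * I):
    sample x_t in one shot and return the noise the model must predict."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(abar[t]) * x0 + np.sqrt(1.0 - abar[t]) * eps
    return xt, eps

betas = np.linspace(1e-4, 0.02, 1000)  # linear schedule from the DDPM paper
abar = np.cumprod(1.0 - betas)         # cumulative signal fraction per step
rng = np.random.default_rng(5)
x0 = rng.standard_normal(16)           # stand-in for a clean latent
xt, eps = noisy_sample(x0, 999, abar, rng)
# abar[0] is ~0.9999 (barely noised); abar[999] is tiny (almost pure noise)
```

Sampling runs this in reverse: start from noise and iteratively subtract the predicted noise; consistency and flow-matching methods collapse those many reverse steps into few.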

2019+ Transfer

Fine-tune · adapter · LoRA

Pretrained weights plus targeted training. LoRA / QLoRA for parameter-efficient fine-tuning, full fine-tune when the task shifts domain, distillation to compress teacher into student.

  • LoRA · QLoRA · DoRA
  • Full fine-tune · DPO · PPO
  • Distillation · knowledge transfer
  • Adapter layers · prefix tuning
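The LoRA trick in one expression: freeze the pretrained weight W and learn a low-rank update AB scaled by alpha/r, cutting trainable parameters from d_in·d_out down to r·(d_in + d_out). A NumPy sketch with illustrative dimensions and alpha:

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16):
    """Frozen base weight W plus low-rank update (alpha/r) * A @ B.
    Only A (d_in, r) and B (r, d_out) would be trained."""
    r = A.shape[1]
    return x @ W + (alpha / r) * (x @ A @ B)

rng = np.random.default_rng(2)
d_in, d_out, r = 64, 64, 4
W = rng.standard_normal((d_in, d_out)) * 0.02  # pretrained, stays frozen
A = rng.standard_normal((d_in, r)) * 0.01
B = np.zeros((r, d_out))                       # B starts at zero
x = rng.standard_normal((3, d_in))

# At init the adapter is a no-op: the model starts exactly at the base weights.
assert np.allclose(lora_forward(x, W, A, B), x @ W)
```

That zero-init of B is the standard trick: training begins from the pretrained behaviour and the adapter only moves it as gradients accumulate. QLoRA adds a quantised base model; DoRA reparameterises the update.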

Foundation first

No deep model outperforms its data. The data engineering discipline carries the collection, cleaning and labelling stack that deep learning feeds on.

Open data engineering ↗

TRAINING LIFECYCLE

Five stages from dataset to deployed weights

Every model we ship passes through the same reviewable sequence. Each stage ends with versioned artefacts, not just a commit message.

  1. Dataset

    Curation, deduplication, contamination check against public eval sets. For custom data: licensing review, PII filter, quality scoring, held-out slices by source and cohort.

  2. Architecture

    Pick the model family from task shape, not vintage. Set hyperparameter search space: width, depth, LR schedule, batch size, warmup, regularisation. Sweep before commit.

  3. Train

    Distributed training with FSDP or DeepSpeed ZeRO. Mixed precision (bf16), gradient checkpointing, activation offload. Checkpoint every N steps; rewind on instability.

  4. Evaluate

    Offline suites, leakage-controlled holdout, slice analysis across cohorts. Human pairwise review for generative output. Drift baseline established on staging traffic.

  5. Ship

    Quantisation (INT8, 4-bit GPTQ/AWQ), KV-cache tuning, batching strategy. vLLM or TensorRT-LLM for serving, SLO gates on latency and quality, canary rollout.
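The dataset stage's deduplication can start as simply as exact content hashing; production pipelines add near-duplicate detection (e.g. MinHash) and n-gram overlap checks against the eval sets. A minimal sketch of the exact-dedup pass:

```python
import hashlib

def dedup(records):
    """Keep the first occurrence of each record, comparing by a
    hash of lightly normalised text (strip + lowercase)."""
    seen, out = set(), []
    for text in records:
        h = hashlib.sha256(text.strip().lower().encode()).hexdigest()
        if h not in seen:
            seen.add(h)
            out.append(text)
    return out

docs = ["The cat sat.", "the cat sat.", "A new doc."]
print(dedup(docs))  # ['The cat sat.', 'A new doc.']
```

Hashing normalised text rather than raw bytes is a judgment call: it catches trivial case/whitespace duplicates while staying cheap enough to run over billions of records.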

GENERATIVE SURFACES

Four domains where generative models reach production

Generation is not a demo. It is licensing, safety, latency and evaluation working together. These are the four surfaces where we take generative models to accountable output.

Image

Latent diffusion pipelines

Stable Diffusion, SDXL, Flux for text-to-image and image-to-image. ControlNet, LoRA adapters, IP-Adapter for conditioning. Evaluation on FID, human pairwise and downstream use-case metrics.

Language

LLM training and alignment

Base-model pretraining at <1B parameters when the moat is the data, continued pretraining on domain corpora, SFT then DPO or PPO for alignment. Evals on closed-book and RAG settings.

Audio

Speech and music synthesis

Text-to-speech (XTTS, StyleTTS2), streaming synthesis for assistants, music continuation (MusicGen, Stable Audio). Multi-speaker cloning under licensing and consent guardrails.

Multimodal

Vision-language and video

CLIP-family encoders for retrieval, vision-language models for captioning and VQA, diffusion video (SVD, CogVideoX) for short-form generation with temporal consistency.

SCALE + SAFETY

Training at scale, serving at cost, governed on release

The hard part is not the first epoch. It is running a 70B-parameter training job across a spot-instance cluster without losing a week to restarts, then serving the result at a latency the product actually requires.

Distributed training

  • FSDP · DeepSpeed ZeRO-3
  • Tensor + pipeline parallelism
  • Ray Train orchestration
  • Spot-instance checkpointing

Inference optimisation

  • vLLM · TensorRT-LLM · TGI
  • INT8 · GPTQ · AWQ · 4-bit
  • Speculative decoding
  • KV-cache quantisation
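Behind the INT8 entry: symmetric per-tensor quantisation maps the largest weight magnitude to 127 and rounds everything else onto that grid, which bounds the reconstruction error by half a quantisation step. A sketch of that simplest scheme (GPTQ and AWQ do considerably more, e.g. per-channel scales and error compensation):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor INT8 quantisation: scale so the max
    magnitude maps to 127, then round to the nearest integer level."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(4)
w = (rng.standard_normal((256, 256)) * 0.05).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = q.astype(np.float32) * scale   # dequantise to measure error
err = np.abs(w - w_hat).max()
# worst-case error is half a step: err <= scale / 2
```

The weight matrix now costs one byte per entry plus a single float scale, a 4x cut versus float32 before any further tricks.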

Evaluation & QA

  • DeepEval · Ragas · LM-Eval
  • LangSmith · Braintrust
  • Red-team + jailbreak suites
  • Pairwise human review

Safety & governance

  • LlamaGuard · ShieldGemma
  • PII + prompt-injection filters
  • Model cards · training logs
  • EU AI Act · SOC 2 mapping

PAPERS LEDGER

The eight papers we keep open on the desk

Not a literature review. The ideas below changed how we run deep learning in production. Every time someone new joins the team, they read these before touching an experiment.

2012

AlexNet

Krizhevsky · Sutskever · Hinton

CNNs + GPU training broke the ImageNet benchmark and started the deep-learning wave.

2014

GAN

Goodfellow et al.

Adversarial training opened generative modelling for images, audio and molecules.

2015

ResNet

He · Zhang · Ren · Sun

Skip connections let networks train much deeper without vanishing gradients.

2017

Transformer

Vaswani et al. ("Attention Is All You Need")

Self-attention replaced recurrence and became the substrate for modern language, vision and multimodal models.

2020

GPT-3

Brown et al.

Scaling laws: bigger models + more data + few-shot prompting shifted what "learning" meant in practice.

2020

DDPM

Ho · Jain · Abbeel

Denoising diffusion reframed generative modelling; Stable Diffusion and successors grew from this line.

2021

LoRA

Hu et al.

Low-rank adapters made parameter-efficient fine-tuning the pragmatic default for domain adaptation.

2023

Mamba

Gu · Dao

State-space models challenged the attention monopoly on long sequences with linear-time alternatives.

Adjacent disciplines

Every production neural network leans on its neighbours. These are the disciplines we co-run on most engagements.

Train · fine-tune · serve

Have the data, need the weights

Share the task, the data shape and the inference budget. We come back with an architecture shortlist, compute estimate and training plan inside ten working days. No benchmark leaderboards; only numbers tied to your use case.