AI Engineer (Image & Video)
About:
CATCHES builds physics-backed AI for garment simulation and virtual try-on, used by luxury fashion brands. We launched at Nvidia GTC 2026 after several years in stealth at the cutting edge of physics-informed research and development, and we now handle enterprise-scale, consumer-facing virtual try-ons for brand partners globally.
We are backed by investors from both sides of the industry, including Antoine Arnault, Natalia Vodianova Arnault, Roy Chung (founder, Apollo.io), Dillon Erb (founder, Paperspace), Gary Sheinbaum (former CEO, Tommy Hilfiger), and Sarah Willersdorf (former Head of Luxury, BCG).
Role:
You'll be working directly with our image and video models, owning quality, consistency, and performance at production scale.
The outputs are consumer-facing and visible to a global audience of fashion brands and their customers, so the bar is high.
That means fine-tuning, optimising inference, and making deliberate choices about model architecture and tooling.
You'll have real input into how we build and evolve our visual AI stack.
You:
Strong hands-on experience with diffusion-based image and/or video generation models.
Proven ability to optimise VLMs for performance, quality, and consistency in production environments.
Experience fine-tuning or adapting foundation models for specific domains or outputs.
Solid Python skills and familiarity with model serving and inference infrastructure.
Ability to evaluate and iterate on visual outputs with a rigorous, metrics-driven approach.
Nice to have:
Hands-on experience with FLUX and/or LTX.
Familiarity with ControlNet, IP-Adapter, or similar conditioning approaches.
Experience with video generation frameworks and temporal consistency techniques.
Knowledge of garment, fashion, or e-commerce visual generation use cases.
Experience with cloud-based GPU inference (GCP, AWS, or equivalent).
Apply for the job
If this sounds like your kind of problem, we'd love to hear from you. We welcome applications from all backgrounds.
