
Curiosity and judgment; I test ideas in the open.


training models, building evals, and testing ideas against real benchmarks

Matin Mahmood

about me

founder, applied ML researcher, and open-source builder

I am a founder and applied ML researcher in Berlin, working on the training side of AI: embedding models, retrieval evaluation, RL-style environments, and tools for understanding model behavior. My current work is hyper³labs: hierarchy-aware image-text models, HyperView, and eval loops that turn retrieval failures into better data and training runs.

Previously, I worked on smart-substation ML at GE Vernova and helped build open-source energy infrastructure tooling through PyPSA-Earth and earth-osm. I grew up between Germany and Lahore and studied Artificial Intelligence and Mathematics at the University of Edinburgh.

I build in public and share progress on X and on my blog.

career

education

projects

pre vibe-coding era only

hyper³labs / HyperView

Open-source workbench for understanding embedding spaces and dataset/model failures. HyperView links image grids, Euclidean/Poincare layouts, selections, panels, and an agent-readable CLI into one local workspace.

Python
TypeScript
PyTorch
CLIP
UMAP
Poincare

Captain Search

CLI-first search layer for AI agents. Captain Search routes web search, code search, and webpage extraction across providers, returns clean Markdown, installs a reusable agent skill, and includes provider/telemetry health checks.

Python
MCP
CLI
Web Search
Code Search
Agents

earth-osm / PyPSA-Earth

Python package and modelling pipeline work for global power-grid data. earth-osm extracts OpenStreetMap infrastructure into ML-ready files; PyPSA-Earth became an open global energy-system model.

Python
GeoPandas
OpenStreetMap
PyPSA
Energy

Alignment Graph

Interactive map of AI alignment literature built after winning the Apart Research AI Safety Hackathon. The project clusters thousands of papers with LLM assistance so newcomers and researchers can explore the field.

LLMs
NLP
Graph Visualization
AI Safety
Claude

other work

things worth mentioning

A compact log of talks, programs, wins, and technical work that sits around the main project portfolio.

  • Mar 2026: secured German ministry-supported AI compute for a Hyper3-CLIP proposal on hyperbolic adapters for multimodal retrieval.

  • Mar 8, 2026: presented HyperView in a Berlin Computer Vision Group workshop on image-embedding geometry. Event

  • Feb 1, 2026: presented HyperView in a Berlin Computer Vision Group workshop on dataset curation, outliers, and geometric projections. Event

  • Dec 2025: presented HyperView at Merantix AI Campus Hacker Room Demo Day. Video

  • Oct 2025: won the Lyceum / Merantix Model Brawls prize with the top-performing model in a two-hour GPU software challenge.

  • Sep 2025: accepted into and joined the Merantix AI Campus Hacker Room residency in Berlin while building hyper³labs / HyperView. Program

  • Jun 2025: built a Qwen2.5VL-to-action driving prototype for 0xRobots, streaming camera frames into a VLA loop. Code

  • Mar-Sep 2025: selected for the Apart Research Fellowship after building AI safety tooling through Apart Lab Studio. Program

  • Sep 2024: built an NLP occupational-standards analysis tool in the Alan Turing Institute Data Study Group. Program

  • Mar 2024: won 1st place at the Apart Research AI Safety Hackathon with Alignment Graph, including a $1,000 prize. Site Writeup

  • Nov 2023: toured the US with Open Energy Transition / PyPSA meets Earth, organizing open energy-modelling exchanges across Austin, Stanford/San Francisco, Golden, Princeton, and Washington DC. Event

  • 2019: became a DHL UK Scholar and a Barclays Local Genius finalist while studying AI and mathematics at Edinburgh.

blog

latest posts

    What I learned from using DSPy, generated scripts, offline simulators, and GEPA-style evolutionary loops on small optimization challenges.
    A small Gemma 3 270M experiment using WinoGrande, lm-eval, and prefix tuning as a way to reason about prompt-optimization headroom.
    A research log on causal pooling, LLM2Vec-style masking, NV-Embed-style latent pooling, F2LLM data, accelerator training, and NanoBEIR validation.

contact me

send the curious note

If something here overlaps with what you are building, researching, or trying to understand, send a note. Short, specific emails are very welcome.