Senior Data Scientist, AI & Automation

Ben Drury

I turn messy, real-world data problems into production systems, from scoping and modelling through to deployment. I measure success by impact delivered, not complexity built.

Currently at Cromwell Tools, working on AI extraction and structured content. Previously led automation and ML at Zoro UK.

Current role

Data Scientist

Cromwell Tools

How I work

Production ML

Scope, ship, measure impact

Education

MSc Data Analytics

Distinction, De Montfort University

Python & SQL · AI & ML · Vector search · ETL · FastAPI · GCP

Portfolio

Interactive projects & demos

Live implementations you can explore: RAG, LLMs, multimodal validation, and more.


Beyond this site

Building & live work

Separate from portfolio demos: products and platforms I'm shipping elsewhere, including work in progress.

In development

SpecLogic

Data-first buying decisions

A decision engine for high-ticket purchases, turning complex specs into comparable scores so buyers choose with confidence, not guesswork.

  • Comparison engine
  • Technical benchmarks
  • Category scoring
speclogic.co.uk

Architecture & delivery

Data science work without a demo

Production systems where the value is in pipelines, scale, and outcomes, not a clickable UI on this site.

Zoro UK · ML

Hybrid Classification Pipeline

Auto-categorised 5M+ records across 2,000+ categories at 84% accuracy, halving manual categorisation and accelerating supplier onboarding.

Product data → Classifier → Taxonomy
  • 84% accuracy at scale
  • 50% less manual work
  • Faster supplier onboarding
Python · NLP · BigQuery · SQL

Zoro UK · Entity resolution

Vector Product Grouping

Vector-based model grouped 70% of SKUs across 300K+ families at 90%+ accuracy, reclaiming 30+ hours of manual effort per week.

SKU embeddings → Similarity → Product families
  • 90%+ grouping accuracy
  • 70% SKUs auto-grouped
  • 30+ hrs/week saved
Python · Vector search · Embeddings · GCP
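The card above describes three stages: SKU embeddings, similarity, and product families. As a minimal illustrative sketch (not the production method), such a grouping step could link SKUs whose embeddings exceed a similarity threshold and then take connected components as families. It assumes embeddings are already computed, uses a hypothetical threshold of 0.9, and `cosine`, `group_skus`, and the toy SKUs are hypothetical names chosen for illustration.

```python
import math
from itertools import combinations


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def group_skus(embeddings: dict[str, list[float]], threshold: float = 0.9) -> list[set[str]]:
    """Hypothetical grouping: link SKU pairs whose cosine similarity meets the
    threshold, then return the connected components (via union-find) as families."""
    parent = {sku: sku for sku in embeddings}

    def find(x: str) -> str:
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Brute-force O(n^2) pair comparison, for illustration only;
    # at millions of SKUs an approximate nearest-neighbour index would replace this loop.
    for a, b in combinations(embeddings, 2):
        if cosine(embeddings[a], embeddings[b]) >= threshold:
            parent[find(a)] = find(b)

    families: dict[str, set[str]] = {}
    for sku in embeddings:
        families.setdefault(find(sku), set()).add(sku)
    return list(families.values())


# Toy, made-up embeddings: two similar drills and one unrelated glove.
toy = {
    "DRILL-18V": [0.9, 0.1, 0.0],
    "DRILL-20V": [0.88, 0.15, 0.0],
    "GLOVE-M": [0.0, 0.1, 0.95],
}
print(sorted(sorted(f) for f in group_skus(toy)))
# → [['DRILL-18V', 'DRILL-20V'], ['GLOVE-M']]
```

One design caveat: connected components chain transitively, so A~B and B~C can merge A and C even when they are dissimilar; a stricter clustering step or a higher threshold would limit that.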

Cromwell Tools · Current

Structured Content & Extraction

AI extraction from PDFs and supplier feeds, plus a componentised content framework that replaces unstructured blobs with reusable fields.

PDFs & feeds → Extract + normalise → Structured PIM
  • Automated PDF/supplier extraction
  • Structured page fields
  • Cross-channel consistency
Python · LLMs · ETL · GCP

Career

Experience & education

A quick scroll through roles and study. Full detail in the CV.

  1. Dec 2025 to Present

    Data Scientist

    Cromwell Tools

    Building AI extraction from PDFs and supplier data, leading a structured content framework for product pages, and developing normalisation models for attribute consistency across channels.

  2. Oct 2023 to Dec 2025

    Senior Data Automation Analyst

    Zoro UK (Grainger plc)

    Technical ownership of automation for 5M+ products, including hybrid classification and vector-based product grouping, plus mentoring and presenting ROI to global leadership.

  3. Nov 2022 to Oct 2023

    Data Automation Analyst

    Zoro UK (Grainger plc)

    Built Python ETLs and supplier pipelines; developed a TF-IDF alternative-product model, A/B tested it via Unleash, and scaled it to full site traffic once results reached statistical significance.

  4. Mar 2022 to Nov 2022

    Content Quality Analyst

    Zoro UK (Grainger plc)

    Cut handover-to-launch time to 3 days through onboarding automation (1M+ image downloads) and Looker/BigQuery dashboards for catalogue quality.

  5. Oct 2020 to Feb 2022

    MSc Data Analytics (Distinction)

    De Montfort University, Leicester

    Advanced analytics, statistics, and data systems with applied projects.

  6. Oct 2017 to Jul 2020

    BSc (Hons) Computer Science (First Class)

    De Montfort University, Leicester

    Software engineering and data foundations with practical, project-led focus.

Toolkit

Tech stack & strengths

Tools and domains I work in day to day, from notebooks to production APIs.

  • Python (Languages)
  • SQL / BigQuery (Data)
  • LLMs & RAG (AI)
  • FastAPI (Backend)
  • GCP (Cloud)
  • Docker (DevOps)
  • ETL Pipelines (Data)
  • Vector Search (ML)
  • Git / CI (Engineering)
  • Vertex AI (ML)

Systems

Python · SQL · FastAPI · Docker · GCP (Cloud Run, Vertex AI, BigQuery)

ML & AI

Vector search · NLP · Clustering · Information retrieval · Multimodal extraction

Delivery

Technical roadmapping · Stakeholder management · Agile delivery · Production monitoring

Certifications

Databricks Gen AI & AI Agents · Google Cloud Generative AI · TensorFlow