Athlete doing a pull-up test – a trained model analyses fitness deficits for a personalised training plan

Study Check: No AI Chatbot Needed – Tests & Machine Learning Build Your Training Plan

Christopher Klenk · 6 min read

Runners run. Gym-goers bench. Almost nobody trains their actual weak point – because almost nobody knows where it is. Chinese researchers have now published a machine-learning model in Scientific Reports that detects exactly that and automatically builds a training plan around it. With real data, a randomised trial – and a few gaps you should know about. And without any AI chatbot.

At a glance

Chinese researchers have built a model, trained on real data from four fitness tests, that detects where your biggest deficit lies and aligns your training plan precisely around it. In a trial with 1,160 participants, the group that got the model-generated plan did noticeably better than the group that trained normally: more pull-ups, better running times, 23.5% less overweight. That sounds impressive – until you see that the diet was completely overhauled at the same time, and that this effect is not factored out of the results. Still, the study demonstrates a principle that works.

The idea behind it is simpler than it sounds

You do four tests: a 3,000-metre run, pull-ups, sit-ups in 60 seconds, and a short shuttle sprint. That data is fed into a machine-learning model – a system trained on thousands of real fitness measurement profiles that has learned to spot patterns. It looks at where you perform worst compared to similar profiles. From there it identifies your biggest deficit – and that is exactly what gets the most focus in your training plan.
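The paper's model gets there via classification and SHAP, but the basic question "where am I furthest behind comparable people?" can be illustrated much more simply with z-scores against a reference group. This is a deliberately simplified stand-in, not the study's method, and every reference mean and standard deviation below is a made-up placeholder. For running and shuttle times lower is better, so their sign is flipped.

```python
# Simplified stand-in for "where do you perform worst compared to similar people?"
# All reference numbers are HYPOTHETICAL placeholders, not data from the study.

# test -> (group mean, group standard deviation, higher_is_better)
REFERENCE: dict[str, tuple[float, float, bool]] = {
    "run_3000m_s": (780.0, 60.0, False),  # 3,000 m time in seconds
    "pull_ups": (10.0, 4.0, True),        # repetitions
    "sit_ups_60s": (38.0, 8.0, True),     # repetitions in 60 seconds
    "shuttle_s": (30.0, 2.0, False),      # shuttle time in seconds
}


def performance_z_scores(results: dict[str, float]) -> dict[str, float]:
    """z-score per test, signed so that positive always means 'better than the group'."""
    scores = {}
    for test, value in results.items():
        mean, sd, higher_is_better = REFERENCE[test]
        z = (value - mean) / sd
        scores[test] = z if higher_is_better else -z
    return scores


def biggest_deficit(results: dict[str, float]) -> str:
    """Name of the test with the lowest signed z-score."""
    scores = performance_z_scores(results)
    return min(scores, key=scores.get)


if __name__ == "__main__":
    # example values: 14:30 over 3,000 m, 8 pull-ups, 28 sit-ups, 32 s shuttle
    me = {"run_3000m_s": 870.0, "pull_ups": 8.0, "sit_ups_60s": 28.0, "shuttle_s": 32.0}
    print(performance_z_scores(me))
    print(biggest_deficit(me))
```

With these invented reference values, the 3,000-metre run comes out as the weakest test (z = −1.5), just ahead of the sit-ups (−1.25). Swap in a different reference group and the answer can change, which is why the comparison group matters so much.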

The key ingredient is SHAP (SHapley Additive exPlanations), a method that essentially forces the model to explain itself: it is not enough to say "you are overweight", it has to show why. In plain terms, the model says: "your pull-up count contributed by far the most to this classification." That value then directly drives how your training plan is put together. Bigger strength deficit, more strength training. Poor endurance, more HIIT and endurance sessions.
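How an attribution value could turn into a plan is easy to sketch. The mapping from tests to training blocks, the five-session weekly budget and the proportional split below are all hypothetical; the paper's own rule set is not reproduced here. Positive values are treated as "pushed the classification in the unwanted direction" and earn sessions, negative ones are ignored.

```python
# HYPOTHETICAL rule layer: attribution values -> weekly training blocks.
# The mapping, the session budget and the proportional split are assumptions
# for illustration; the paper's actual prescription rules are not reproduced.

BLOCK_FOR_TEST = {
    "pull_ups": "upper-body strength",
    "sit_ups_60s": "core strength",
    "run_3000m_s": "endurance / HIIT",
    "shuttle_s": "speed and agility",
}


def allocate_sessions(attributions: dict[str, float], weekly_sessions: int = 5) -> dict[str, int]:
    """Split a weekly session budget in proportion to each test's positive attribution.

    Negative attributions earn no sessions. Sessions lost to rounding down are handed
    out by largest fractional remainder (largest-remainder method).
    """
    positive = {t: v for t, v in attributions.items() if v > 0}
    total = sum(positive.values())
    if total == 0:
        return {}
    raw = {t: weekly_sessions * v / total for t, v in positive.items()}
    plan = {t: int(r) for t, r in raw.items()}
    leftover = weekly_sessions - sum(plan.values())
    by_remainder = sorted(raw, key=lambda t: raw[t] - plan[t], reverse=True)
    for t in by_remainder[:leftover]:
        plan[t] += 1
    return {BLOCK_FOR_TEST[t]: n for t, n in plan.items() if n > 0}


if __name__ == "__main__":
    shap_like = {"pull_ups": 0.21, "sit_ups_60s": 0.05, "run_3000m_s": 0.04, "shuttle_s": -0.08}
    print(allocate_sessions(shap_like))
```

With +0.21 for pull-ups and −0.08 for the shuttle, plus invented values of +0.05 for sit-ups and +0.04 for the run, the sketch assigns three upper-body strength sessions, one core session and one endurance session.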

For those who want the details: what the model actually does

Technically, it is a 1D-CNN with multi-head attention as a feature extractor, followed by a LightGBM classifier. The CNN processes the 15 split times from the 3,000-metre run as a genuine time series – the other three tests are simply duplicated to fit the format. The 94.5% classification accuracy for BMI categories sounds impressive, but the paper shows that the prescription layer on top of it (SHAP → training building blocks) is rule-based, not another layer of machine learning.
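The "duplicated to fit the format" detail is easier to see as data. A minimal sketch, assuming the 15 splits are equal sections (3,000 / 15 = 200 metres each) and that the four tests become four channels of equal length; the channel order, the function name and the lack of normalisation are assumptions, not taken from the paper.

```python
# HYPOTHETICAL input layout: 4 channels x 15 time steps for a 1D-CNN.
# Only the run is a real time series; the three scalar tests are repeated
# across all 15 steps so every channel has the same length.

N_STEPS = 15


def build_input(splits_s: list[float], pull_ups: float, sit_ups: float, shuttle_s: float) -> list[list[float]]:
    """Return a 4 x 15 matrix (channels x time steps)."""
    if len(splits_s) != N_STEPS:
        raise ValueError(f"expected {N_STEPS} split times, got {len(splits_s)}")
    return [
        list(splits_s),          # channel 0: split times, a genuine time series
        [pull_ups] * N_STEPS,    # channel 1: scalar duplicated
        [sit_ups] * N_STEPS,     # channel 2: scalar duplicated
        [shuttle_s] * N_STEPS,   # channel 3: scalar duplicated
    ]


if __name__ == "__main__":
    x = build_input([58.0] * N_STEPS, 8, 28, 32.0)
    print(len(x), [len(channel) for channel in x])
```

Only channel 0 varies along the time axis. A convolution over the three constant channels sees the same value at every step, so anything the CNN learns about pacing over time can only come from the run.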

This is not a chatbot – and that matters

Before anyone asks: this has nothing to do with a chatbot technically. No ChatGPT, no Claude, no prompt. What the researchers trained is a specialised model on real measurement data – and that difference matters in practice.

Imagine you ask ChatGPT: "I do 8 pull-ups, run 3,000 metres in 14 minutes and manage 32 sit-ups in 60 seconds – where is my biggest deficit?" ChatGPT gives you an answer. A plausible one, even. But it is based on texts the model happened to read at some point – articles, forums, books about training. It does not know how your 14 minutes compare to thousands of other people of similar body weight. It does not classify, it interprets.

The model in this study does what ChatGPT does not: it classifies. It was trained on 6,698 real measurement profiles and in doing so learned which combination of running time, pull-ups, sit-ups and shuttle time is statistically linked to which weight category. When you feed in your four test values, you do not get a generated answer back – you get a prediction that follows from measured data. And via SHAP you see it in black and white: your pull-up count contributed most to the classification, with a value of +0.21. The shuttle run worked against it at −0.08. That is not interpretation, that is calculation.
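To make "that is calculation" concrete: SHAP is built on Shapley values, and for a model with only four inputs they can be computed exactly by brute force, averaging each test's marginal contribution over all subsets of the other tests. The scoring function below is an invented toy, not the study's CNN-plus-LightGBM model, so it will not reproduce the +0.21 and −0.08; tests left out of a subset are set to hypothetical baseline values.

```python
from collections.abc import Callable
from itertools import combinations
from math import factorial

FEATURES = ["run_3000m_s", "pull_ups", "sit_ups_60s", "shuttle_s"]


def toy_score(x: dict[str, float]) -> float:
    """HYPOTHETICAL toy model, not the study's: a made-up score with one interaction term."""
    run = x["run_3000m_s"] - 780
    shuttle = x["shuttle_s"] - 30
    return (
        0.002 * run
        - 0.03 * (x["pull_ups"] - 10)
        - 0.01 * (x["sit_ups_60s"] - 40)
        + 0.02 * shuttle
        + 0.0001 * run * shuttle  # interaction between run and shuttle
    )


def shapley_values(
    model: Callable[[dict[str, float]], float],
    x: dict[str, float],
    baseline: dict[str, float],
) -> dict[str, float]:
    """Exact Shapley values by enumerating every subset of the other features.

    Features outside a subset are replaced by their baseline value.
    """
    n = len(FEATURES)

    def value(subset: set[str]) -> float:
        mixed = {f: (x[f] if f in subset else baseline[f]) for f in FEATURES}
        return model(mixed)

    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        total = 0.0
        for size in range(n):
            # classic Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


if __name__ == "__main__":
    person = {"run_3000m_s": 870.0, "pull_ups": 8.0, "sit_ups_60s": 28.0, "shuttle_s": 32.0}
    base = {"run_3000m_s": 780.0, "pull_ups": 10.0, "sit_ups_60s": 40.0, "shuttle_s": 30.0}
    phi = shapley_values(toy_score, person, base)
    print(phi, sum(phi.values()), toy_score(person) - toy_score(base))
```

The attributions always add up to the difference between the person's score and the baseline score (here 0.418), and the run/shuttle interaction is split evenly between the two tests. That additivity is what makes the numbers checkable rather than a matter of opinion.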

An AI chatbot can explain what a plan for someone with a strength deficit should look like. This model figures out for itself that you have a strength deficit.

What the trial shows – and what it does not prove

1,160 participants, 12 weeks, two groups: one received the model-generated training plan, the other trained according to a standard protocol. The numbers from the first group are clearly better – on average 4.5 more pull-ups, 1.4 minutes faster over 3,000 metres, 3.3 seconds faster in the shuttle. Something happened there.

The problem is stated in the paper itself: alongside the training plan, the intervention also included a detailed nutrition programme with specific calorie targets and macro goals. And that nutrition programme was not factored out of the results. So you cannot cleanly say how much came from the training plan and how much from the diet change.

Two more points you should know

The sample consists exclusively of 18- to 20-year-old male students from a Chinese vocational school. No women, no older adults, no competitive athletes – it is about as homogeneous as it gets. On top of that, the model classifies BMI categories, not body fat or muscle mass. Anyone with real muscle ends up in the "overweight" cluster, no matter how fit they actually are. The model does not pick up on that.
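Why muscle fools the model follows from the formula itself: BMI is body weight divided by height squared, with no information about what that weight consists of. A minimal sketch, assuming the common WHO adult cutoffs (the study may use different thresholds) and an invented athlete of 1.80 m and 85 kg.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight divided by height squared (kg/m^2)."""
    return weight_kg / height_m ** 2


def who_category(bmi_value: float) -> str:
    """WHO adult cutoffs, used here as an assumption; the study's thresholds may differ."""
    if bmi_value < 18.5:
        return "underweight"
    if bmi_value < 25.0:
        return "normal"
    if bmi_value < 30.0:
        return "overweight"
    return "obese"


if __name__ == "__main__":
    value = bmi(85.0, 1.80)  # hypothetical muscular athlete
    print(round(value, 1), who_category(value))
```

That athlete gets a BMI of about 26.2 and lands in "overweight", whether the extra kilograms are fat or muscle.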

The principle works – what you can do with it today

You will not simply be able to download the model from the study – neither the data nor the code is public. But the underlying principle is already usable today, with what you have: know your weakest result and consistently align your training plan around it. The study's numbers suggest that this beats a generic plan; because the diet changed at the same time, they do not prove it cleanly, but they are more than gut feeling.

Three ways to use this today

  1. Simple: Do the four tests yourself and feed the values into a chatbot prompt – for example: "I do 8 pull-ups, run 3,000 metres in 14:30, manage 28 sit-ups in 60 seconds and 32 seconds in the 30×2 shuttle. Which is my biggest deficit compared to a fit athlete of my age – and how should I train for the next 4 weeks?" No specialised model, but the basic principle still works.

  2. Better: Use your wearable data (HRV, training load, VO₂max estimate) as input instead of four individual tests. More data points, better prioritisation.

  3. For tech enthusiasts: Train your own small model on your own training data from the last few years. Doable in Python – and then actually personalised to you, not to data from a Chinese vocational school.

What happens when you combine the best of both worlds – specialised model for the diagnosis, chatbot for the plan – is the really interesting question. The building blocks all exist today. What is missing is someone to put them together.

The prompt paradox hits directly here: anyone who knows what distinguishes endurance, maximal strength and anaerobic capacity gets a usable prioritisation from the chatbot and can build a solid training plan themselves with ChatGPT, Gemini or Claude. Anyone who does not gets a confidently worded answer they have no way of judging.

Source: Mo M. et al. A machine learning framework for personalized exercise prescription based on BMI and physical fitness assessment. Scientific Reports (2026). DOI: 10.1038/s41598-026-42405-2