Runner checks his GPS sports watch after training: VO2max estimate from a wearable vs. a lab test

Calculating VO2max with AI: what LLMs and wearables really deliver

Christopher Klenk · 13 min read

Can AI estimate my VO2max? Technically yes, but whether the result is usable depends heavily on which kind of AI you mean. An LLM like ChatGPT applies formulas to the inputs you give it. A wearable algorithm evaluates sensor data. Both estimate, via completely different paths and with different precision.

The answer differs significantly by method. And the most interesting finding from the research: more AI does not automatically mean more accuracy. A wearable without any ML component beats some AI-driven solutions in error rate.

At a glance

LLMs (ChatGPT, Claude, Gemini) can calculate VO2max if you give them valid inputs (race time, Cooper test, HR data). AI wearables (Garmin, Apple Watch) estimate from sensor data with ~3–16% error, heavily dependent on fitness level. Polar reaches ~8–13% using a physiological formula without ML, though the evidence is mixed. For tracking trends, wearables are enough. For training zone planning and race preparation, the lactate step test remains the gold standard.

Which methods sit behind the estimate, and what do they actually measure?

First things first: no wearable and no LLM measures your VO2max. They estimate it, via completely different paths.

Briefly explained: what VO2max actually means

VO2max describes how much oxygen your body can use per minute and per kilogram of body weight at maximum effort (ml/kg/min). It is measured properly in a lab via spiroergometry at maximal load: you run or cycle until you cannot go on while wearing a mask that measures your gas exchange. Everything else is an approximation.

Wearables like Garmin and Apple Watch use heart rate and GPS pace as primary inputs. The basic principle: your HR response at a given pace says something about your aerobic capacity. The algorithm (FirstBeat Analytics at Garmin, an in-house model at Apple) compares that response with population data from millions of training sessions and interpolates a VO2max value from it.

Polar takes a different route. No training needed, no GPS. OwnIndex uses resting heart rate, HRV and your user profile: static, reproducible. The model is based on decades of research in collaboration with the University of Jyväskylä, not on ML. A well-calibrated simple model can beat a complex one, but the evidence is more mixed than the marketing suggests.

LLMs like ChatGPT, Claude or Gemini do something different: they calculate. They apply validated sports physiology formulas to your inputs. The result is only as good as your input; more on that in the next section.

Infographic: the three methods for VO2max estimation compared (LLM, GPS wearable and resting measurement)

Three methods, three completely different paths to the same number.

ChatGPT, Claude, Gemini: how to calculate your VO2max with an LLM

Yes, ChatGPT, Claude and Gemini can calculate your VO2max. But the word "calculate" is deliberate here: LLMs measure nothing, they compute. They apply validated sports physiology formulas to your inputs, so the result is only as good as your input.

The most common misunderstanding: someone writes "Can you estimate my VO2max?" and the LLM starts asking questions or improvising. What works is a clear input using one of three validated methods:

Method 1: Daniels VDOT (best method for runners)

Jack Daniels (the sports scientist, not the whiskey) developed VDOT as a practical shortcut. The idea: your race time over a known distance reveals everything important about your current aerobic performance, combining oxygen uptake capacity and running economy. VDOT is therefore not a directly measured physiological value but a performance index: it describes which VO2max value corresponds mathematically to your race performance. All training paces are derived from this one value: Easy, Marathon, Tempo, Interval, Repetition. The decisive advantage over a lab test: you need no equipment, just an honestly run time. And you can track VDOT across the season; if it rises, your training is working.

Prerequisite: a race time over a known distance, run under normal conditions.

Prompt example: "I ran 10 km in 47:32, under normal weather conditions and well rested. Calculate my VDOT according to Daniels and give me training paces for Easy, Marathon, Tempo and Interval."

The LLM gives you a VDOT value and the matching training zones in min/km, directly usable for your next training plan.

Screenshot: VDOT calculation with Claude (prompt with 10 km race time, output with training paces in min/km)

This is what a VDOT prompt looks like in practice: input in, training zones out.
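To make the calculation behind such a prompt visible, here is a minimal sketch of a VDOT estimate. It assumes the commonly cited Daniels-Gilbert regression equations; the article itself does not state which formula an LLM applies, and the function name `vdot_from_race` is purely illustrative.

```python
import math


def vdot_from_race(distance_m: float, time_min: float) -> float:
    """Estimate VDOT from one race result.

    Assumption: the commonly cited Daniels-Gilbert regression,
    not a formula quoted in the article.
    """
    velocity = distance_m / time_min  # metres per minute
    # Estimated oxygen cost of running at this velocity (ml/kg/min).
    vo2 = -4.60 + 0.182258 * velocity + 0.000104 * velocity ** 2
    # Estimated fraction of VO2max sustainable for a race of this duration.
    fraction = (
        0.8
        + 0.1894393 * math.exp(-0.012778 * time_min)
        + 0.2989558 * math.exp(-0.1932605 * time_min)
    )
    return vo2 / fraction


# The race time from the prompt example: 10 km in 47:32.
vdot = vdot_from_race(10_000, 47 + 32 / 60)
print(f"VDOT: {vdot:.1f}")
```

Running the same function on a later race time gives a comparable index, which is how VDOT can be tracked across a season.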

Method 2: Cooper test (no device, no race experience needed)

You run as far as possible in 12 minutes. Measure distance in metres.

Prompt example: "Cooper test: I ran 2,480 metres in 12 minutes. Calculate my VO2max and classify it for a 38-year-old man."

Formula: VO2max = (distance in metres - 504.9) / 44.73, with the result in ml/kg/min. Well validated, no equipment required.
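As a quick check on what the LLM returns, the Cooper formula can be reproduced in a few lines. This is a sketch; the function name `cooper_vo2max` is illustrative, and the input is the distance from the prompt example.

```python
def cooper_vo2max(distance_m: float) -> float:
    """Cooper test: VO2max estimate in ml/kg/min from the distance run in 12 minutes."""
    return (distance_m - 504.9) / 44.73


# 2,480 metres in 12 minutes, as in the prompt example.
print(round(cooper_vo2max(2480), 1))  # → 44.2
```

The age- and sex-specific classification asked for in the prompt is not part of the formula; it comes from reference tables.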

Method 3: HR reserve (if you have heart rate data)

Prompt example: "Resting HR 52 bpm, max HR 187 bpm. On a 30-minute easy run I averaged 138 bpm at 6:20 min/km. Estimate my VO2max using the HR reserve method."

Less precise than VDOT, and it assumes a valid HRmax estimate, which is itself error-prone. Useful when no race data is available, but treat the result with caution.
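The article does not spell out a formula for the HR reserve method, so the following is only one plausible reading, under two loudly stated assumptions: the oxygen cost of the easy run is approximated with the ACSM running equation for level ground, and the fraction of heart-rate reserve used is treated as equal to the fraction of VO2 reserve. The function name and the resting VO2 of 3.5 ml/kg/min are part of that sketch, not of the source.

```python
RESTING_VO2 = 3.5  # ml/kg/min, conventional resting value (assumption)


def vo2max_from_hr_reserve(rest_hr: float, max_hr: float,
                           run_hr: float, pace_min_per_km: float) -> float:
    """Rough VO2max estimate from a submaximal run (assumption-heavy sketch)."""
    speed_m_per_min = 1000 / pace_min_per_km
    # Assumption: ACSM running equation on level ground (no grade term).
    vo2_run = RESTING_VO2 + 0.2 * speed_m_per_min
    # Fraction of heart-rate reserve used during the run.
    hrr_fraction = (run_hr - rest_hr) / (max_hr - rest_hr)
    # Assumption: %HRR is roughly equal to % of VO2 reserve.
    return RESTING_VO2 + (vo2_run - RESTING_VO2) / hrr_fraction


# Values from the prompt example: rest 52, max 187, 138 bpm at 6:20 min/km.
estimate = vo2max_from_hr_reserve(52, 187, 138, 6 + 20 / 60)
print(f"Estimated VO2max: {estimate:.1f} ml/kg/min")
```

Changing `max_hr` by a few beats shifts the result noticeably, which is the sensitivity to HRmax mentioned above.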

Context makes the difference

The decisive advantage over an online calculator: you can feed in factors that bias your result. Heat slows pace, poor sleep raises HR, altitude training distorts both. Example: "I ran the 10 km at 28 degrees Celsius and after a bad night. I reckon that cost me 1–2 minutes. Take that into account in the calculation." Claude or ChatGPT will then give you a corrected estimate with reasoning. A static calculator cannot do that.

The limit is clear: garbage in, garbage out. Whoever enters a half-hearted time or leaves out context gets back a useless value.

Which wearable estimates VO2max most accurately?

| Wearable | AI/ML? | Method | Error rate (MAPE) |
|---|---|---|---|
| Garmin | Yes | FirstBeat ML, HR + GPS/pace | ~3–10% (fitness-level dependent) |
| Apple Watch | Yes | Apple ML model, HR + GPS | ~16% |
| Suunto | Yes | FirstBeat license (identical to Garmin) | ~3–10% |
| Polar | No | OwnIndex, resting HR + HRV + profile | ~8–13% |
| Fitbit | No | CardioFitness Score, resting-HR formula | ~15–20% |
| Whoop | n/a | No VO2max value | n/a |

MAPE values come from independent validation studies; they vary by fitness level, device and measurement conditions. Garmin's spread is especially pronounced: in moderately trained runners independent studies show 2.8–4.1%, in highly trained athletes ~9–10%.

Bar chart: MAPE comparison of wearables (Garmin, Apple Watch, Polar, Fitbit)

Error rates compared: Garmin varies considerably with the wearer's fitness level.
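For readers unfamiliar with the metric: MAPE (mean absolute percentage error) averages the relative deviation of each estimate from its lab reference. The sketch below shows the arithmetic only; the two value pairs are made up for illustration and are not taken from any validation study.

```python
def mape(estimates: list[float], references: list[float]) -> float:
    """Mean absolute percentage error of estimates against reference values."""
    if not estimates or len(estimates) != len(references):
        raise ValueError("need two non-empty lists of equal length")
    relative_errors = [abs(e - r) / r for e, r in zip(estimates, references)]
    return 100 * sum(relative_errors) / len(relative_errors)


# Hypothetical illustration values (ml/kg/min), NOT study data.
lab_values = [50.0, 40.0]
watch_values = [45.0, 44.0]
print(f"MAPE: {mape(watch_values, lab_values):.1f}%")
```

Because the errors are taken as absolute values, MAPE says nothing about direction: an over-estimate and an under-estimate of the same size count equally.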

Whoop is a deliberate exception: the device promises no VO2max value, and that is more honest than an estimate without any validation basis.

The error rate depends heavily on who wears the device. Garmin and Apple are calibrated for recreational athletes and deliver usable numbers there. Among well-trained athletes the accuracy breaks down: the algorithm simply has too little reference data in that performance range.

Polar OwnIndex does something different: no training, no GPS, just a resting measurement. That is more reproducible, but the evidence on accuracy is mixed. One study with the Polar Vantage shows a MAPE of 13.2%; other results come out more favourably. What Polar delivers is consistency, not necessarily precision.

One bias is rarely mentioned: well-trained athletes tend to be underestimated by GPS-based algorithms. The models work with population data from the recreational-athlete range, so anyone clearly above that gets systematically low values. If you then use that number directly for training-zone planning, you end up training in the wrong intensity zone.

Where all AI methods hit their limits, and why the value still matters

Before we get to the limits: why is VO2max relevant in the first place? The answer goes further than most people think.

Performance status and training progress: the obvious one. A VO2max that rises from 48 to 52 is a valid signal that training is working. If it falls despite training, that can point to overtraining or insufficient recovery. As a trend value it is more precise than any subjective feeling.

Race prediction: via Daniels VDOT a performance index can be derived from a race time, from which training paces for all intensity ranges follow. Important: the direction is race time → VDOT → paces, not lab VO2max value → paces. Measured VO2max and VDOT correlate strongly but are not the same thing.

Longevity: the underestimated aspect. According to current research, VO2max is one of the strongest single predictors of all-cause mortality. Mandsager et al. (2018, JAMA Network Open) showed in a large study that people with low fitness had a significantly higher mortality risk than those with average fitness; the difference was larger than for most classic risk factors. Peter Attia summarises these and other studies in "Outlive": cardiorespiratory fitness is the single strongest modifiable risk factor for a long, healthy life.

That means tracking VO2max is not just a performance topic for competitive athletes. It is a health topic, even for someone who never plans a race.

Even so, VO2max alone is a weak reference point for precise load management when you are building your own AI running training plan. That sounds paradoxical after everything just said, but it holds true.

My Polar Pacer Pro shows a value of 41 via OwnIndex, measured at rest, without a single training session. No GPS, no pace, no sweat. Just resting heart rate and HRV, compared against a reference model. Polar's classification system rates that as "Good", a status value based on a particular reference dataset, not a universal classification. What that number does not tell me: which heart-rate zone my base training should happen in, or where my anaerobic threshold sits. OwnIndex gives me a reference point, not a training plan.

Screenshot: Polar Pacer Pro OwnIndex fitness test, VO2max estimate via resting measurement

Polar OwnIndex: VO2max estimated at rest from just resting HR and HRV.

What happens in the lab goes beyond a single number. You get the ventilatory thresholds VT1 and VT2, the points at which your metabolism tips. You get a complete lactate profile: how much lactate at which pace, where your aerobic threshold sits, where your anaerobic one does. Those are the values from which you build training zones that fit you, not an average person in a reference dataset.

On top of that comes movement economy. Two athletes with an identical VO2max of 41 can run at completely different speeds because one simply uses less oxygen per stride. No algorithm can measure that from resting heart rate and HRV.

And then there is the hardware problem. Optical HR sensors on the wrist have known weaknesses: at high intensity, with movement artefacts, with certain skin types or in cold weather. Faulty HR data in, faulty VO2max estimate out, no matter how good the model behind it is.

I am not writing this to trash wearables. I wear my own Polar every day. But I know what the number on the display achieves, and where it stops helping.

When is AI calculation enough, and who really needs the lab?

The short answer: most recreational athletes do not need a lactate step test. LLM calculation or a wearable estimate is enough for most training purposes.

You do not have any value yet: where to start?

No wearable, no lab, no race time? No problem. The Cooper test needs a track or a measured distance, 12 minutes and no equipment. The resulting VO2max estimate is rough, but it is your personal starting value, not that of an average person from a formula. You can work with it straight away.

When AI calculation is sufficient

When you are tracking your aerobic trend: a VO2max rising from 38 to 42 is a valid signal, even if the absolute value is a few points off. When you want rough training zones (Easy vs Tempo vs Hard), an LLM-calculated value works well enough as a starting point. When you are not planning a race and simply want to train more healthily: a wearable or a Cooper test plus an LLM calculation is perfectly sufficient.

Who really should go to the lab

Competitive athletes preparing for a specific event (sub-3 marathon, triathlon, cycling race) who want to calibrate their training zones precisely. The lab test delivers VT1 and VT2 (the ventilatory thresholds) and a complete lactate profile. Those are the values from which you build zones that fit you, not an average from reference data.

Athletes with VO2max above ~55 ml/kg/min: wearable algorithms are calibrated for recreational athletes. In that performance range the estimate becomes unreliable because the models simply have too little training data here. Anyone training at that level gets different zones from the lab than from any app.

What if my VO2max does not rise despite training?

That scenario occurs more often than people think, and neither AI nor wearables can explain it directly. A stagnating VO2max with constant training usually points to one of three problems: the intensity distribution is wrong (too much in the middle range, too little real Zone-2 work and too few hard intervals), recovery is insufficient (volume too high for your regeneration capacity), or the value itself is simply measured incorrectly. A lactate step test reveals which of these three problems is at play; that is the real reason for the test, not the number in itself.

The practical recommendation

Start with method 1 or 2 via an LLM. Use the result as a trend indicator. If you are training systematically to a plan but results fail to appear, then a lactate step test is worthwhile. Not before.
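The decision rules from this section can be condensed into a small helper. This is a sketch of the article's rule of thumb only: the function and parameter names are hypothetical, and the 55 ml/kg/min cut-off is the approximate figure given above, not a hard physiological boundary.

```python
HIGH_VO2MAX_THRESHOLD = 55.0  # ml/kg/min, approximate cut-off from the article


def lab_test_recommended(vo2max_estimate: float,
                         competitive_event_prep: bool,
                         plateau_despite_plan: bool) -> bool:
    """Return True if the article's rules of thumb point to a lab test."""
    return (
        competitive_event_prep          # precise zones for a specific event
        or plateau_despite_plan         # systematic training without results
        or vo2max_estimate > HIGH_VO2MAX_THRESHOLD  # estimates unreliable here
    )


# Recreational runner tracking a trend: the estimate is enough.
print(lab_test_recommended(42.0, False, False))  # → False
# Same runner stuck on a plateau despite a structured plan.
print(lab_test_recommended(42.0, False, True))   # → True
```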

VO2max is one of several physiological parameters you should hand over to the AI. Which other values are decisive for endurance and strength training is explained in Physiological foundations for AI training: your data as prompt context.

FAQ

Can ChatGPT calculate my VO2max? Yes, if you provide valid inputs: a race time (Daniels VDOT), a Cooper-test distance or HR data from a submaximal test. The LLM applies validated formulas and can take context factors into account. It does not measure; it calculates.

How accurate is Garmin's VO2max estimate? Heavily dependent on fitness level: in moderately trained runners recent studies show a MAPE of 2.8–4.1%, in highly trained athletes ~9–10%. There is no blanket answer.

Can the Apple Watch measure my VO2max? Estimate yes, measure no. The Apple Watch computes an estimate from heart rate and GPS pace after at least 20 minutes of outdoor activity. Independent studies show a MAPE of ~16% across all fitness levels.

When is a lactate step test worth doing instead of an AI estimate? When you want to determine training zones precisely, before race preparation, or when you hit a performance plateau. The lab test delivers threshold values (VT1, VT2) and a complete metabolic profile, which is something no wearable and no LLM gives you.

Why does my Polar estimate differ from the lab value? OwnIndex is based on a resting measurement: resting HR, HRV and user profile. Errors arise from imprecise profile data, poor sensor contact during measurement, or HRV fluctuations driven by daily form. As a trend value over weeks, OwnIndex is reliable; for precise training zones the lactate step test remains the better choice.