
New Study Reveals Holes in Wearable Device Scores


By Frederic Sabater Pastor,
CTS Expert Coach,
Associate Researcher at University of Perpignan Via Domitia

In recent years the market has seen an influx of new companies creating wearable devices to measure sleep, recovery, breathing, blood glucose, blood oxygen, and more. They are developing innovative products and algorithms, and doing some fantastic marketing. Athletes are eager for devices that promise to improve fitness, enhance recovery, or provide insights that help them perform better. But marketing moves faster than evidence, and once science catches up, the claims don’t always match the results.

Some of the information wearables provide is both accurate and useful for athletes and coaches. That appears to be the case with heart rate variability (HRV) and resting heart rate (RHR). Other types of data, like pulse oximetry, are useful in certain situations. However, marketing claims tend to run ahead of the science that validates them, and strain and recovery scores may be one such case.

Evidence on Strain and Recovery Scores

A recently published paper studied the usefulness of the WHOOP strap. The researchers’ goal was to test whether WHOOP-derived measures (including HRV, RHR, Strain and Recovery) were associated with laboratory measurements of metabolism (resting metabolic rate and T3 Thyroid hormone). They also looked at whether WHOOP measures are “significantly associated with a validated, sports-specific questionnaire commonly used to capture sports-specific and general stress, as well as recovery and the balance between stress and recovery, that is, the RESTQ.” In other words, researchers looked at physiological and psychological measures that indicate an athlete’s stress and recovery states, and compared them against the information presented by WHOOP’s strap and app.

What the scientists did:

The authors recruited an NCAA Division I swim team, which included male and female Olympic Trials qualifiers, All-Americans and national team members from different countries. Subjects wore a WHOOP strap during training, but researchers specifically targeted a 6-week “Overload” period, as classified by the team’s coaches, which would lead to a peak just before the championship season.

The rationale for this timing was that the training overload would lead to increased stress (in part related to energy deficiency induced by the increased training load). This energy deficiency would then result in lower resting metabolic rates and thyroid hormone levels, both of which are metabolic markers that have been associated with energy deficiency. The overall stress would also be reflected in the RESTQ.

It is important to recognize that the purpose of this type of research is to find less invasive and less expensive ways to gather valid information about athletes. The lab tests used to evaluate lower metabolic rates and thyroid hormone levels are both invasive and expensive. If the metrics from a WHOOP strap can be shown to correlate with lab measurements accurately and reliably, coaches could track athletes’ stress states more frequently or consistently.

What WHOOP measures:

Besides HRV and RHR, WHOOP generates a few of its own metrics that were used in this study. It is important to point out that these metrics are black boxes. We have no idea what algorithms wearable companies use to calculate those metrics, or even what data they consider for input variables in their equations. That’s not a knock against WHOOP, specifically. It’s true across the wearable sensor/device industry.

WHOOP says Strain is calculated based on how high heart rate was during a workout. Recovery is calculated based on a combination of metrics, including HRV, RHR, “sleep performance”, breathing rate, skin temperature and blood oxygen levels. Recovery is then presented as a percentage. Each of those variables has the potential for error when measured by an arm-mounted device. We don’t know how the information is combined to come up with a value to present to an athlete, nor how the algorithm corrects for potential measurement errors. Finally, WHOOP attempts to communicate exercise energy expenditure, which seems to be based on heart rate during exercise.

What researchers found:

These swimmers were serious. They completed three strength training sessions per week and swam almost every day. They averaged 18 hours and 20 minutes of total training per week, swimming 6700 yards per day. However, one oddity in the data was the reported average exercise energy expenditure: 901 kcal/day for the males and 578 kcal/day for the females. This is notable because the men weighed 83.8 kg on average (185 lbs) and averaged 2 h 34 min of training per day. That energy expenditure seems incongruous with the level of training and caliber of the subjects. It is likely that the energy expenditure (which was estimated from heart rate, which was also used to calculate Strain) was underestimated.
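To see why that figure looks low, it helps to convert it into a rate. The sketch below is a back-of-the-envelope check using only the averages quoted above for the men (901 kcal/day, 2 h 34 min of training per day, 83.8 kg). The function name is made up for illustration, and the calculation assumes the reported expenditure is spread evenly across the training time, which is a simplification.

```python
def exercise_energy_rate(kcal_per_day, training_minutes_per_day, body_mass_kg):
    """Return (kcal per hour, kcal per kg per hour) of training.

    Assumes the daily exercise energy expenditure is spread evenly
    over the daily training time (a simplifying assumption).
    """
    hours = training_minutes_per_day / 60
    kcal_per_hour = kcal_per_day / hours
    return kcal_per_hour, kcal_per_hour / body_mass_kg


# Averages reported for the male swimmers in the article
per_hour, per_kg_hour = exercise_energy_rate(
    kcal_per_day=901,
    training_minutes_per_day=2 * 60 + 34,
    body_mass_kg=83.8,
)
print(f"{per_hour:.0f} kcal/h, {per_kg_hour:.1f} kcal/kg/h")
```

That works out to roughly 350 kcal per hour, or a little over 4 kcal per kilogram per hour. For rough context, one MET is conventionally taken as about 1 kcal per kilogram per hour, so this rate sits in the range usually described as moderate activity, which is hard to square with elite swimmers in an overload block.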

Besides the energy expenditure glitch, they found that HRV was significantly correlated with sport-specific stress and total stress, as well as with lower-than-expected resting metabolic rate (which would be considered metabolic suppression, linked to energy deficiency and possibly overreaching). This is not surprising, since WHOOP has been previously shown to be pretty good at measuring HRV, and HRV should definitely get worse (i.e., lower) with stress. However, Strain and Recovery showed no relationship with any of the metabolic or RESTQ-derived stress variables.

How do we know?

The table below reports r values, which measure the strength and direction of the linear relationship between two variables. An r value of 1 indicates a perfect positive correlation, whereas -1 indicates a perfect negative correlation. An r value of zero means there is no linear correlation between the two variables. In the table below, let’s use the “total recovery” row as an example. Across HRV, RHR, Strain, and Recovery, the correlation with “total recovery” as measured by the RESTQ resulted in r values of -0.05, -0.18, -0.03, and -0.01, respectively. In other words, virtually no correlation.
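For readers who want to see what an r value actually computes, here is a minimal sketch of the Pearson correlation coefficient in plain Python. The function name and the sample numbers are invented for illustration; they are not data from the study, and the paper's own statistical software is not described here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values, chosen only to show the three textbook cases
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))    # rises together
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))    # one rises, the other falls
print(pearson_r([1, 2, 3, 4], [1, -1, -1, 1]))  # no linear trend
```

The first pair moves in lockstep, the second in exact opposition, and the third has no linear trend at all, which correspond to r values of 1, -1, and 0.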

In regard to the table below, the researchers reported, “We found a negative correlation between HRV and sport-specific stress (r = −0.462; p = 0.026) and total stress (r = −0.459; p = 0.028) in all participants. No other correlations between HRV, RHR, strain, or recovery and general stress, any recovery subscales (general, sport-specific, and total), or recovery–stress balance variables were found in all participants.” In research contexts, correlations are typically considered strong when the r value falls between 0.8 and 1.0 or between -0.8 and -1.0. The two negative correlations they found, at r = −0.462 and r = −0.459, would be classified as weak to moderate.
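One common way to put an r value in perspective is to square it: r² (the coefficient of determination) is the share of variance in one variable that a straight-line relationship with the other accounts for. The sketch below applies that to the two reported correlations and to the 0.8 cutoff mentioned above. The function names are illustrative and do not come from the study.

```python
def shared_variance(r):
    """Return r squared, the fraction of variance explained by a linear fit."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must be between -1 and 1")
    return r * r


def is_strong(r, cutoff=0.8):
    """True if |r| meets the cutoff for a strong correlation (0.8, per the article)."""
    return abs(r) >= cutoff


# The two significant correlations quoted from the paper
for label, r in [("sport-specific stress", -0.462), ("total stress", -0.459)]:
    print(f"HRV vs {label}: r = {r}, r^2 = {shared_variance(r):.2f}, "
          f"strong: {is_strong(r)}")
```

Both correlations come out at an r² of about 0.21, meaning HRV accounts for roughly a fifth of the variation in those stress scores, and neither meets the 0.8 cutoff.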

Table from Whoop wearable device study

Male vs. Female Data

When data were analyzed by sex, none of the WHOOP variables were related to metabolic or stress variables for the females. However, in males, the researchers found significant relationships between stress and HRV or resting heart rate. The interesting twist was that they also found significant correlations between Strain and stress variables derived from the RESTQ (general stress, sport-specific stress, total stress and stress balance). In other words, the RESTQ, a low-tech questionnaire, revealed the expected correlations when the wearable device did not.

Figure from WHOOP study in swimmers

Another interesting finding was that the correlations ran opposite to the direction I would have expected: higher Strain corresponded with lower stress levels. The authors speculated that perhaps athletes who felt less stressed were able to push harder during training, thereby generating higher Strain scores. As an aside, a suppressed heart rate during exercise at a given intensity has been associated with overreaching, which could also explain the result, because Strain relies on heart rate data. But if heart rate cannot be trusted in that state, Strain loses some of its ability to reflect how strained an athlete is during exercise.

Final thoughts

In their conclusion, the authors said that WHOOP-derived HRV may provide insights into metabolism and sport-specific stress among athletes. That’s a well-established conclusion; HRV is useful for endurance athletes. However, this study should also serve as a word of caution about metrics derived from black-box proprietary algorithms, such as Strain and Recovery scores.

It is important to make a distinction between “measures” of things that can be measured (such as heart rate or HRV) and “estimates” of unknown parameters. Some of these unknowns are impossible to measure directly, yet they are assigned a specific number, such as “stress”, “readiness”, “recovery”, “body battery”, or “sleep quality”. At best, those will be educated guesses. That doesn’t mean they have no value, however. There may be a use for them in training (for example, looking at trends over time), but we caution athletes not to be overly reliant on algorithm-based scores to obtain insights or make decisions about the training process.

Think of this as the classic “trust but verify” scenario. Pay attention to the scores over time, but always view them in context of measurable data and subjective perceptions of stress and recovery (which evidence continues to show are remarkably accurate). In many cases, our own subjective perception of stress may be at least as good as any wearable device.

About Coach Frederic

Frederic Sabater Pastor (@fredericspast) is a multitalented and multilingual (English 🇺🇸, Spanish 🇪🇸, Catalan) Spanish coach based in France. He has published multiple acclaimed articles in the International Journal of Sports Physiology and Performance in the past few years, such as “VO2max and Velocity at VO2max Play a Role in Ultradistance Trail-Running Performance”. In addition to his success in academia, Frederic has had coaching success with athletes in a variety of disciplines including triathlons, track and road races, obstacle course events, and ultra/trail races.


Kellmann, Michael, and K Wolfgang Kallus. Recovery-Stress Questionnaire for Athletes : User Manual. Champaign, Ill. ; Leeds, Human Kinetics, 2001.

Lundstrom, E. A., De Souza, M. J., Koltun, K. J., Strock, N. C. A., Canil, H. N., & Williams, N. I. (2023). Wearable technology metrics are associated with energy deficiency and psychological stress in elite swimmers. International Journal of Sports Science & Coaching, 0(0). https://doi.org/10.1177/17479541231206424

Miller, Dean J et al. “A Validation of Six Wearable Devices for Estimating Sleep, Heart Rate and Heart Rate Variability in Healthy Adults.” Sensors (Basel, Switzerland) vol. 22,16 6317. 22 Aug. 2022, doi:10.3390/s22166317

Strock, Nicole C A et al. “Indices of Resting Metabolic Rate Accurately Reflect Energy Deficiency in Exercising Women.” International journal of sport nutrition and exercise metabolism vol. 30,1 (2020): 14-24. doi:10.1123/ijsnem.2019-0199


Comments 9

  2. Interesting angle…. Attacking marketing is purely marketing. These devices are the best things we have ever had. Period. You’re basically saying that the buttons on the remote control don’t always work properly. Sure, just get up and change the channel then.
    I’ll continue using the metrics (remote control) provided. What would you suggest? A monitor (coach) for the monitor?? Aaahh, marketing. Lol

  3. Not surprised. I have a wearable watch, I won’t say the name so that giant company doesn’t try to cancel me. But it’s detected my Afib just a couple of times the last few years. I know how to read ECG s and the instant ECG app, which I love btw, shows Afib and I can feel Afib heart rate changes very well since my resting HR is in the high 40s and it jumps instantly to the 90s. When I started looking in to it, it says it doesn’t take true continuous readings so duh, it going to miss some episodes. And when I’m working out and my HR seemingly jumps 10-15 BPM, that’s just artifact. So yeah, wearables have limitations.

  4. I was, like many cyclists interested in the Whoop strap when many popular American Cyclocross athletes were using social media to talk about the device a few years back (yay marketing!), but was almost as quickly disinterested in the device when I discovered that they used a wrist strap to measure HR. I had and have found that wrist-based HR monitoring devices to be TERRIBLY inaccurate. And if you can’t measure HR accurately, it would seem to me all of the other metrics and algorithms kind of fall apart.

      1. Yes, which is why I see an issue with part of this study. I didn’t look at the references to be honest (suppose I could, and while I should, honestly how many people actually do?) and I know wrist HR can be extremely off when the optical sensor gets wet, at least in my experience, with my 5 y.o. Garmin watch.
