A massive two-year study finds that fitness apps can help people take more steps, but the improvements are modest and uneven, raising questions about who benefits most and how to make digital health tools sustainable.

Study: Can fitness apps work long term? A 24-month quasi-experiment of 516,818 Canadian fitness app users.
In a recent study published in the British Journal of Sports Medicine, researchers evaluated the long-term effectiveness of fitness apps.
Although more than 100,000 commercial fitness apps are available across app stores, little is known about their long-term effects. Systematic reviews of randomized controlled trials (RCTs) have found no study examining the impact of fitness apps beyond one year. Understanding these long-term effects is crucial to designing fitness apps effectively.
However, longitudinal RCTs are difficult to implement in digital environments because of software development costs and retention problems. Moreover, the controlled conditions of conventional RCTs may not reflect how fitness apps are used in the real world. As such, robust quasi-experiments that incorporate strategies for improving internal validity may help uncover the long-term efficacy of fitness apps.
About the study
In the present study, researchers investigated whether a multi-component fitness app could increase physical activity (PA) over a two-year period. The “Carrot Rewards” app was a commercial fitness app offering micro financial incentives, developed through a public-private collaboration in Canada. Participants downloaded the app between December 2016 and December 2018.
The app was discontinued on June 19, 2019, due to insufficient funding, and data were collected until June 18, 2019. Users could start the app’s key feature, ‘Steps,’ upon download. A one- to two-week pre-intervention (baseline) period, with no PA incentives or personalized daily step goals, preceded the intervention proper; during this period, users were asked to wear their device daily.
Users earned very small (“micro”) daily incentives for achieving adaptive daily step goals, and the withdrawal of daily rewards in December 2018 created a natural experiment to assess durability.
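The article does not spell out how the adaptive daily step goals were computed, so the Python sketch below is purely illustrative of the general idea: a goal that ratchets up when met and eases down when missed. The function name, increment, and floor are assumptions, not the study’s actual algorithm.

```python
def next_daily_goal(current_goal: int, yesterday_steps: int,
                    increment: int = 500, floor: int = 1000) -> int:
    """Illustrative adaptive step-goal rule (NOT the study's actual algorithm):
    raise the goal when it is achieved, lower it gently when it is missed."""
    if yesterday_steps >= current_goal:
        return current_goal + increment  # reward success with a harder goal
    return max(current_goal - increment // 2, floor)  # ease off, keep a floor

# Example: a user with a 6,000-step goal who walked 6,400 steps yesterday
print(next_daily_goal(6000, 6400))  # -> 6500
```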
A valid baseline step count required at least five days of valid step counts for users who downloaded the app before July 26, 2017, and at least three days for those who downloaded it afterward. Carrot Rewards staff established these criteria, which align with the minimum days needed for a valid weekly average daily step count. Thereafter, the team calculated weekly average daily step counts, requiring at least four valid days per week.
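As a concrete illustration of this weekly averaging rule, the pandas sketch below keeps only weeks with at least four valid days, as described above. The data frame, column names, and sample values are hypothetical.

```python
import pandas as pd

# Hypothetical per-user, per-day step records; 'valid' marks days that
# meet the wear-time criteria described above.
df = pd.DataFrame({
    "user_id": [1] * 7,
    "date": pd.date_range("2017-01-02", periods=7),
    "steps": [5200, 6100, 0, 7400, 5900, 6300, 4800],
    "valid": [True, True, False, True, True, True, True],
})
df["week"] = df["date"].dt.isocalendar().week

# Keep only valid days, then average per user-week, requiring >=4 valid days.
valid = df[df["valid"]]
agg = valid.groupby(["user_id", "week"])["steps"].agg(["mean", "count"])
weekly_avg = agg.loc[agg["count"] >= 4, "mean"]  # the study's validity rule
print(weekly_avg)
```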
The primary outcome was the weekly average daily step count. The analyses included users with a valid baseline step count and at least one other valid study week. A multiple linear regression model was used to assess the impact of Carrot Rewards over the 24-month period. Secondary analyses, also using multiple linear regression, examined how select covariates (baseline PA, start season, geographic location, and app engagement) influenced the longitudinal effects.
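The article does not report the exact model specification, but a multiple linear regression on the covariates it names might look like the hypothetical statsmodels sketch below; the variable names and simulated data are assumptions.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated stand-in for the analytic dataset: one row per user, with the
# change in weekly average daily step count at follow-up as the outcome.
rng = np.random.default_rng(0)
n = 1_000
data = pd.DataFrame({
    "step_change": rng.normal(300, 1500, n),      # steps/day change from baseline
    "baseline_steps": rng.normal(6035, 2500, n),  # baseline PA
    "start_season": rng.choice(["winter", "spring", "summer", "fall"], n),
    "region": rng.choice(["metro_toronto", "other_urban", "rural"], n),
    "engagement": rng.choice(["low", "medium", "high"], n),
})

# Multiple linear regression on the covariates named in the text.
model = smf.ols(
    "step_change ~ baseline_steps + C(start_season) + C(region) + C(engagement)",
    data=data,
).fit()
print(model.summary())
```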
Results
The analytic sample comprised 516,818 users, with an average baseline step count of 6,035. Approximately 47.1% of users were categorized as low-active at baseline. Twelve-month retention rates ranged from 48% to 68% across start seasons. At 24 months, the rates were 47% for Winter 2016/17 and 38% for Spring 2017. The team observed slight increases in the weekly average daily step count from baseline at all time points.
In particular, a 464-step/day increase was observed 12 months after baseline, and a 242-step/day increase was noted at 24 months. The increases seen at six months were largely maintained at 12 and 18 months. Users who started using the app in Winter 2016/17 showed reductions in weekly average daily step counts around study weeks 52 and 104, corresponding to the winters of 2017/18 and 2018/19.
At 12 months, 106,726 users increased their step count by ≥1,000 per day, and 24,937 showed a ≥1,000-step/day increase at 24 months. In relative terms, at 12 months about 41% of users increased their daily steps by 1,000 or more, while about 25% decreased by that amount; at 24 months, about 39% increased and about 27% decreased.
Users with earlier start seasons, who had more prolonged exposure to rewards, experienced slight increases in their weekly average step counts from baseline to 12 months. However, among these users, the increases from baseline diminished by 24 months.
Notably, increases in weekly average daily step counts were larger for baseline “low active” users (+1,986 steps/day at 24 months). Conversely, individuals with very high baseline PA showed substantial reductions (−3,969 steps/day at 24 months), possibly linked to motivational “crowding out” or regression to the mean. PA increases were minimal across geographic locations, with the densest region, Metropolitan Toronto, showing the smallest increases from baseline. All app-engagement levels showed small or minimal increases; at 12 months, those with the highest engagement showed slightly smaller gains, a pattern that attenuated by 24 months.
Conclusions
Taken together, there were very small increases in the weekly average daily step count from baseline at all time points. Although average gains did not reach the commonly cited 1,000-step threshold, more users improved than declined by ≥1,000 steps per day at both 12 and 24 months. Modest population-level increases were therefore maintained over two years, with more people exhibiting clinically meaningful increases than reductions.
Interpretation is limited by the non-randomized design, attrition at 12 and 24 months, and potential measurement error in step counts under free-living conditions. The authors also highlight that micro-incentives proved financially unsustainable, underscoring the need for alternative models such as lotteries or AI-personalized goals.