
Reproducibility of objectively measured physical activity and sedentary time over two seasons in children; Comparing a day-by-day and a week-by-week approach

  • Eivind Aadland ,

    Roles Conceptualization, Formal analysis, Investigation, Methodology, Project administration, Validation, Writing – original draft, Writing – review & editing

    eivind.aadland@hvl.no

    Affiliation Faculty of Teacher Education and Sports, Campus Sogndal, Western Norway University of Applied Sciences, Norway

  • Lars Bo Andersen,

    Roles Funding acquisition, Supervision, Writing – review & editing

    Affiliation Faculty of Teacher Education and Sports, Campus Sogndal, Western Norway University of Applied Sciences, Norway

  • Turid Skrede,

    Roles Investigation, Project administration, Resources, Writing – review & editing

    Affiliation Faculty of Teacher Education and Sports, Campus Sogndal, Western Norway University of Applied Sciences, Norway

  • Ulf Ekelund,

    Roles Funding acquisition, Supervision, Writing – review & editing

    Affiliation Department of Sports Medicine, Norwegian School of Sport Sciences, Norway

  • Sigmund Alfred Anderssen,

    Roles Conceptualization, Funding acquisition, Project administration, Supervision, Writing – review & editing

    Affiliations Faculty of Teacher Education and Sports, Campus Sogndal, Western Norway University of Applied Sciences, Norway, Department of Sports Medicine, Norwegian School of Sport Sciences, Norway

  • Geir Kåre Resaland

    Roles Conceptualization, Funding acquisition, Investigation, Project administration, Resources, Writing – review & editing

    Affiliations Faculty of Teacher Education and Sports, Campus Sogndal, Western Norway University of Applied Sciences, Norway, Center for Health Research, Førde Central Hospital, Norway

Abstract

Introduction

Knowledge of the reproducibility of accelerometer-determined physical activity (PA) and sedentary time (SED) estimates is a prerequisite for conducting high-quality epidemiological studies. Yet, estimates of reproducibility might differ depending on the approach used to analyze the data. The aim of the present study was to determine the reproducibility of objectively measured PA and SED in children by directly comparing a day-by-day and a week-by-week approach applied to data collected over two weeks during two different seasons 3–4 months apart.

Methods

676 11-year-old children from the Active Smarter Kids study conducted in Sogn og Fjordane county, Norway, performed 7 days of accelerometer monitoring (ActiGraph GT3X+) during January-February and April-May 2015. Reproducibility was calculated using a day-by-day and a week-by-week approach applying mixed effect modelling and the Spearman Brown prophecy formula, and reported using intra-class correlation (ICC), Bland Altman plots and 95% limits of agreement (LoA).

Results

Applying a week-by-week approach, no variable provided ICC estimates ≥ 0.70 for one week of measurement in any model (ICC = 0.29–0.66 not controlling for season; ICC = 0.49–0.67 when controlling for season). LoA for these models approximated a factor of 1.3–1.7 of the sample PA level standard deviations. Compared to the week-by-week approach, the day-by-day approach resulted in overly optimistic reliability estimates (ICC = 0.62–0.77 not controlling for season; ICC = 0.64–0.77 when controlling for season).

Conclusions

Reliability is lower when analyzed over different seasons using a week-by-week approach than when applying a day-by-day approach and the Spearman-Brown prophecy formula over a short monitoring period. We therefore suggest that the day-by-day approach and the Spearman-Brown prophecy formula be used with caution when determining reliability.

Trial Registration

The study was registered at ClinicalTrials.gov on 7 April 2014 with identification number NCT02132494.

Introduction

Objective assessment of movement has moved the field of physical activity (PA) monitoring substantially forward by replacing self-report measures that suffer from many well-known limitations. Still, there are many unresolved issues regarding data reduction and quality assessment of data derived from accelerometry, which has resulted in great variation in the procedures used and criteria applied to define what constitutes a valid measurement [1]. Behavior varies greatly over time. Thus, an important aspect of accelerometer measurement is how many days or periods of measurement must be included to obtain reproducible estimates of habitual activity level. This is particularly true when children live in an area with significant changes in weather across seasons [2–4]. As most diseases that can be prevented by PA develop over longer periods, the “true” habitual PA level would be more closely related to health than a short (for example, 7-day) snapshot. Association analyses will inherently suffer from severe regression dilution bias if they rely on a monitoring period that is too short [5]. Although the length of the period that constitutes a person’s “habitual” or “regular” PA level is not easily defined, a 7-day period is arguably a short, and possibly insufficient, period.

Most studies in children apply a criterion of a minimum of 3 or 4 wear days to constitute a valid accelerometer-measurement period [1]. Although findings vary between studies in both adults [6–10] and children [11–22], most evidence suggests that a reasonable reliability (i.e., intra-class correlation (ICC)) of ~ 0.70–0.80 is achieved with 3–7 days of monitoring. Most previous studies have estimated the reliability of single days and thereafter calculated the number of days needed to reach a reasonable reliability level (often considered to be ICC = 0.80), based on the Spearman-Brown prophecy formula, for measurements conducted over a single 7-day period. Unfortunately, such study designs have been criticized as likely to underestimate the number of monitoring days needed, and their conclusions should therefore be interpreted with caution [23–25]. Importantly, these results are in principle only generalizable to the included days, as the inclusion of additional days, weeks or seasons will add variability to the measurement and thus lower the reliability estimates for a given number of days (i.e., the proportion of the total variance attributable to a fixed number of days will decrease if the total variance increases).
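The variance-partitioning logic above can be illustrated with a toy numeric sketch (all variance components below are hypothetical, chosen only to show the direction of the effect):

```python
# Hypothetical variance components (arbitrary units)
between_child = 4.0   # variance between children (the "signal")
within_week = 6.0     # day-to-day variance within a single monitoring week
seasonal = 3.0        # extra variance introduced by sampling a second season

# One-way ICC: share of total variance attributable to between-child differences
icc_single_week = between_child / (between_child + within_week)
icc_across_seasons = between_child / (between_child + within_week + seasonal)

print(round(icc_single_week, 2))     # 0.4
print(round(icc_across_seasons, 2))  # 0.31
```

Adding seasonal variability inflates the denominator while the between-child variance is unchanged, so the reliability obtained from the same number of monitoring days drops.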

Some studies have determined the reliability of several periods of measurement over the course of two weeks up to a year, all of which have shown considerable intra-individual variation [26, 27, 25, 28, 29]. Reliability has been shown to be ~ 0.70–0.80 for one out of two and three consecutive weeks of measurement in preschool children and adults, respectively [28, 29]. However, poorer estimates are found in studies including several seasons [26, 27, 25], leaving reliability estimates of ~ 0.50 for one week of monitoring in children [27, 25]. Of particular interest, Wickel and Welk [25] showed that even applying three measurement periods across different seasons did not result in a reliability of 0.80 using an absolute agreement definition (i.e., not controlling for season). This finding agrees with studies showing substantial seasonal variation in activity level in children and adolescents [2–4], which is obviously not captured when relying on a single measurement period. While the lower reliability estimates from these latter studies involving several monitoring periods might be due to variation across seasons, there might also be differences between the analytic approaches applied. To the best of our knowledge, no previous study has directly compared a day-by-day and a week-by-week approach for determining the reliability of accelerometer outcomes; we therefore address this question here. Furthermore, few studies have determined the intra-individual week-by-week reproducibility of accelerometer outcomes using absolute measures of agreement (i.e., limits of agreement (LoA) and/or standard error of measurement (SEM)) [28, 29]. These previous studies should be extended to the evaluation of agreement over different seasons.

The present study had two aims: 1) to determine the reproducibility of accelerometer-determined PA and sedentary time (SED) for one out of two 7-day measurement periods obtained during two different seasons separated by 3–4 months in a large sample of children; and 2) to directly compare a day-by-day and a week-by-week approach for analyzing reproducibility of accelerometer data. We hypothesized great variability across the monitoring periods for all accelerometer outcomes, resulting in reliability estimates lower than ICC = 0.80, and lower reliability using a week-by-week as compared to a day-by-day approach.

Materials and methods

Participants

The present analyses are based on data obtained in fifth-grade children from the Active Smarter Kids (ASK) cluster-randomized trial, conducted in Norway during 2014–2015 [30, 31]. Physical activity was measured with accelerometry at baseline (mainly May to June 2014) and follow-up (April to May 2015) in all children, as well as in approximately two-thirds of the children, whom we invited to complete a mid-term measurement (January to February 2015). In the present study, we include the mid-term and the follow-up measurements, to allow for comparison of PA and SED over two different seasons separated by 3–4 months. Additionally, as the intervention was ongoing at both these time-points, we included both the intervention and the control groups. We have previously published a detailed description of the study [30], and therefore provide only a brief overview of the accelerometer data handling herein.

Our procedures and methods conform to ethical guidelines defined by the World Medical Association’s Declaration of Helsinki and its subsequent revisions. The South-East Regional Committee for Medical Research Ethics approved the study protocol (reference number 2013/1893). We obtained written informed consent from each child’s parents or legal guardian and from the responsible school authorities prior to all testing. The study is registered in Clinicaltrials.gov with identification number: NCT02132494.

Procedures

Physical activity was measured using the ActiGraph GT3X+ accelerometer (Pensacola, FL, USA) [32]. During both measurements, participants were instructed to wear the accelerometer at all times over 7 consecutive days, except during water activities (swimming, showering) or while sleeping. Units were initialized at a sampling rate of 30 Hz. Files were analyzed at 10-second epochs using the KineSoft analytical software version 3.3.80 (KineSoft, Loughborough, UK). Data were restricted to the hours 06:00 to 23:59. In all analyses, consecutive periods of ≥ 20 minutes of zero counts were defined as non-wear time [33, 1]. Results are reported for overall PA level (cpm), as well as minutes per day spent SED (< 100 cpm), in light PA (LPA) (100–2295 cpm), in moderate PA (MPA) (2296–4011 cpm), in vigorous PA (VPA) (≥ 4012 cpm), and in moderate-to-vigorous PA (MVPA) (≥ 2296 cpm), determined using previously established and validated cut points [34, 35]. We report main results for four different wear time requirements (≥ 8 and ≥ 10 hours/day, and ≥ 3 and ≥ 5 days/week), and include sensitivity analyses requiring the inclusion of both weekdays and weekend days (≥ 3 weekdays and ≥ 1 weekend day, and ≥ 4 weekdays and ≥ 2 weekend days).
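As a rough sketch of the data-reduction rules described above (the function and variable names are our own illustration, not the KineSoft implementation), the non-wear criterion and intensity cut points could be applied as follows:

```python
import numpy as np

def flag_non_wear(counts, epoch_s=10, min_zero_min=20):
    """Mark epochs lying in runs of >= min_zero_min consecutive minutes
    of zero counts as non-wear time, per the criterion in the text."""
    min_run = int(min_zero_min * 60 / epoch_s)  # epochs in a qualifying run
    non_wear = np.zeros(len(counts), dtype=bool)
    run_start = None
    for i, c in enumerate(list(counts) + [1]):  # sentinel closes a trailing run
        if c == 0:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start >= min_run:
                non_wear[run_start:i] = True
            run_start = None
    return non_wear

def classify_intensity(cpm):
    """Map counts per minute to the cut points used in the study:
    SED < 100, LPA 100-2295, MPA 2296-4011, VPA >= 4012."""
    bins = [100, 2296, 4012]
    labels = np.array(["SED", "LPA", "MPA", "VPA"])
    return labels[np.digitize(cpm, bins)]
```

For example, `flag_non_wear([0] * 125 + [310])` flags the first 125 epochs (a zero-run longer than 20 minutes at 10-second epochs), while shorter zero-runs are retained as wear time.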

Statistical analyses

Children’s characteristics were reported as frequencies, means and standard deviations (SD). Differences between included and excluded children, differences in PA level between measurements, and differences in intra-individual variation for the combined period (14 days) against the mean of the two separate weeks were tested using a mixed effect model including random intercepts for children. Wear time was included as a covariate for analyses of PA and SED.

We estimated reliability using two approaches: 1) day-by-day analyses, and 2) week-by-week analyses. In both approaches, reliability for single days (day-by-day approach) and weeks (week-by-week approach) of measurement (ICCs) was assessed using variance partitioning, applying a one-way random effect model not controlling for season (i.e., determining reliability based on an absolute agreement definition) and a two-way mixed effect model controlling for season (i.e., determining reliability based on a consistency definition) [36]. All models were adjusted for wear time by adding wear time as a covariate, as wear time has a strong association with PA and SED estimates and also impacts reliability [29], and since most studies control for wear time. The number of days (day-by-day approach) and weeks (week-by-week approach) needed to obtain a reliability of 0.80 (N) was estimated using the Spearman-Brown prophecy formula (ICC for average measurements [ICCk]) [6, 36]: N = [ICCt/(1 − ICCt)] × [(1 − ICCs)/ICCs], where N = the number of days or weeks needed, ICCt = the desired level of reliability, and ICCs = the reliability for single days or weeks. Additionally, the ICCk (between-subject variance/[between-subject variance + residual variance/k]) for k = 6 (i.e., the mean number of monitoring days/week) was calculated to directly compare reliability estimates for one week of measurement from the day-by-day and the week-by-week approaches.
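The reliability quantities above follow directly from the variance components; a minimal sketch (the variance values in the example are hypothetical, not the study's estimates):

```python
def icc_single(between_var, residual_var):
    """One-way ICC for a single day or week of measurement
    (between-subject variance over total variance)."""
    return between_var / (between_var + residual_var)

def spearman_brown_n(icc_s, icc_target=0.80):
    """Spearman-Brown prophecy formula: number of days or weeks (N)
    needed to reach icc_target, given the single-measurement ICC."""
    return (icc_target / (1 - icc_target)) * ((1 - icc_s) / icc_s)

def icc_k(between_var, residual_var, k):
    """Reliability of the average of k measurements (ICCk), e.g.
    k = 6 monitoring days to approximate one week."""
    return between_var / (between_var + residual_var / k)

# Hypothetical example: between-subject variance 1.0, residual variance 1.5
icc_s = icc_single(1.0, 1.5)        # 0.4
n_days = spearman_brown_n(icc_s)    # 6.0 days needed to reach ICC = 0.80
icc_week = icc_k(1.0, 1.5, k=6)     # 0.8
```

Note that the two formulas are consistent: with a single-day ICC of 0.4, the prophecy formula returns N = 6 days, and averaging over k = 6 days indeed yields ICCk = 0.80.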

In the week-by-week analyses, we additionally applied Bland-Altman plots, showing the difference between the two weeks as a function of their mean [37], to visualize the week-by-week measurement variability. We calculated 95% LoA and the coefficient of variation (CV) from the residual (i.e., within-subject) variance error term of the variance partitioning models (LoA = √residual variance × √2 × 1.96; CV = √residual variance/mean value) [38]. We assessed whether the variability varied as a function of the mean activity level (i.e., whether the data were homoscedastic or heteroscedastic) by correlating absolute differences against mean values using Pearson’s correlation coefficient (r). For quantifying measurement error, an absolute measure of error (e.g., LoA) provides the correct estimate for homoscedastic data (where there is no association between variability and mean values), whereas a relative measure of error (e.g., CV) provides the correct estimate for heteroscedastic data (where variability increases with increasing mean values) [39]. Yet, both measures provide valid reliability estimates at the mean sample PA levels.

All analyses were performed using IBM SPSS Statistics v. 23 (IBM Corp., Armonk, NY, USA). A p-value < .05 indicated statistical significance.

Results

Participants’ characteristics

Of the 1129 children included in the ASK-study, 676 children provided accelerometer data at the mid-term and post measurement, of whom 615 children (50% boys) fulfilled the ≥ 480 minutes/day and ≥ 3 days/week wear criterion (Table 1). There were no differences between the included (n = 615) and excluded (n = 514) children in anthropometry (p ≥ .092) or PA level at the post measurement (p ≥ .218). For the included children, the number of wear days was similar between the winter and spring measurement, whereas the valid wear time was marginally higher during the spring measurement. Overall PA level (cpm) and intensity-specific PA was significantly higher (except for LPA in girls), and SED was significantly lower, in the spring than in the winter for both boys and girls. The greatest increase from the winter to the spring measurement was seen for VPA (50% in boys and 44% in girls), overall PA level (31% in boys and 26% in girls), and MVPA (28% in boys and 23% in girls).

Table 1. The children’s characteristics.

Values are mean (SD) if not otherwise stated.

https://doi.org/10.1371/journal.pone.0189304.t001

Reliability based on a day-by-day approach

Table 2 shows the reliability for single days of measurement (ICCs) and the number of days (N) needed to achieve a reliability of 0.80, as estimated by the Spearman-Brown prophecy formula. For all variables, reliability increased marginally (N decreased by 0.1–0.8 days) when applying a stricter wear time criterion (10 hours/day vs. 8 hours/day). For intensity-specific PA and SED, reliability was marginally better during the winter (N was 0.1–2.4 days lower than in the spring), whereas a profound difference was found for overall PA level (cpm), for which N ~ 7 days in the winter and ~ 12 days in the spring (S1 Table). The mean intra-individual SDs increased by 4.2–13.9% across variables when including two weeks of measurement compared to the mean of the two separate weeks (overall PA: 221 vs. 194 cpm, p < .001; SED: 79.3 vs. 76.1 min/day, p < .001; LPA: 42.5 vs. 40.7 min/day, p < .001; MPA: 14.9 vs. 14.0 min/day, p < .001; VPA: 14.0 vs. 12.5 min/day, p < .001; MVPA: 26.8 vs. 24.7 min/day, p < .001). Consistent with this increased variation, reliability estimates decreased when analyzing the overall 14-day period compared to either of the two weeks. When applying the whole 14-day period, we estimated that 7–15 and 7–14 days of measurement were needed to reach a reliability level of 0.80 when not controlling and controlling for season, respectively.

Table 2. Reliability for single days of measurement (ICCs) and number of days needed to achieve a reliability of 0.80 (N) for the two weeks (winter and spring).

https://doi.org/10.1371/journal.pone.0189304.t002

Reliability based on a week-by-week approach

We found minor improvements in week-by-week reliability when data were accumulated over longer daily wear time (≥ 8 to ≥ 10 hours) and more days (≥ 3 to ≥ 5 days) (Table 3), and when requiring both weekdays and weekend days (S2 Table). The bias (spring minus winter) between the weeks averaged 137 (95% CI 124 to 151) cpm (p < .001) for overall PA, and −10.2 (−14.3 to −6.1), 5.5 (3.2 to 7.8), 4.6 (3.7 to 5.4), 8.4 (7.6 to 9.3), and 13.1 (11.6 to 14.5) min/day (all p < .001) for SED, LPA, MPA, VPA, and MVPA, respectively. As shown in Table 3, no variable provided ICC estimates ≥ 0.70 for one week of measurement in any model, with values of 0.29–0.66 when not controlling for season (absolute agreement definition) and 0.49–0.67 when controlling for season (consistency definition), indicating substantial intra-individual variation over time for all outcomes, as shown in Fig 1. Agreement (LoA) for these models approximated a factor of 1.3–1.7 of the sample PA level SDs. CVs were small to moderate for SED (0.05–0.06) and LPA (0.08–0.09), but large for MPA (0.19–0.22), VPA (0.33–0.44), MVPA (0.21–0.27), and overall PA (0.21–0.28). Variability increased with increasing activity level for overall PA level (r for absolute differences vs. mean activity level = 0.56, p < .001), MPA (r = 0.27, p < .001), VPA (r = 0.55, p < .001) and MVPA (r = 0.39, p < .001), but not for SED and LPA (r = −0.05–0.06, p ≥ .152). The number of weeks needed to reach a reliability level of 0.80 was 2–10 when not controlling for season, and 2–4 when controlling for season. Overall PA level was clearly the least reliable outcome across models.

Fig 1. Bland Altman plots of agreement for different outcome variables over two weeks of measurement performed in the winter and spring, 3 to 4 months apart.

Bland Altman plots (the mean of two weeks of measurement on the x-axis versus the difference between them on the y-axis) for (a) overall physical activity (cpm), and minutes per day spent (b) sedentary (SED), (c) in light physical activity (LPA), (d) in moderate physical activity (MPA), (e) in vigorous physical activity (VPA), and (f) in moderate-to-vigorous physical activity (MVPA). Results are based on a ≥ 8 hours & ≥ 3 days wear time criterion (n = 615). The full line is the bias between weeks, whereas the dotted lines are 95% limits of agreement corrected for wear time and season.

https://doi.org/10.1371/journal.pone.0189304.g001

Table 3. The week-by-week reproducibility for different outcome variables for two weeks of measurement.

https://doi.org/10.1371/journal.pone.0189304.t003

Reliability was similar for the intervention and control groups, the maximum difference being ICC = 0.05 across outcomes and models.

Comparison of reliability estimates across approaches

As reliability estimates differed between the day-by-day and the week-by-week approaches, we present a direct comparison of the two in Table 4. Estimates using the day-by-day approach are averaged over 6 monitoring days, making them comparable to the weekly averages in terms of the number of monitoring days included. Although both calculations were based on exactly the same data, reliability estimates were substantially higher using the day-by-day approach (ICC = 0.62–0.77) than the week-by-week approach (ICC = 0.29–0.65).

Table 4. The week-by-week reliability for different outcome variables for two weeks of measurement.

https://doi.org/10.1371/journal.pone.0189304.t004

Discussion

The present study aimed to determine the reproducibility of accelerometer-determined PA and SED over two different seasons and to directly compare a day-by-day and a week-by-week approach for analyzing the reproducibility of accelerometer data. Our results suggest that 1) the reliability for one out of two week-long measurements undertaken 3–4 months apart was clearly lower than estimates from most previous studies that have relied on a single monitoring period, and that 2) a day-by-day approach overestimated the reliability compared to a week-by-week approach. Our findings indicate that the children’s PA level varied by up to ± 1.3–1.7 SD units between the two measurements, reflecting substantial measurement error for all variables.

Most previous studies investigating reliability and the required number of accelerometer monitoring days have estimated reliability based on day-by-day analyses of a single 7-day monitoring period [11, 16, 17, 40, 18, 19, 22, 20, 21, 12–15]. In general, these studies conclude that 3–7 monitoring days are sufficient in children. This approach, however, restricts variation and underestimates the number of monitoring days and periods needed to obtain reliable estimates. We applied two monitoring periods covering two different seasons, with findings very similar to previous studies that have applied multiple measurement periods over the course of several seasons. These studies have yielded substantially lower reliability estimates in adults [26] and children [27, 25], concluding that more than one monitoring period is needed to reach a reliability level of 0.80. Mattocks et al. [27] measured overall PA, MVPA and SED over four 7-day periods spanning approximately one year using the ActiGraph 7164 accelerometer in 11–12-year-old children; the ICC for one period of measurement varied from 0.45 to 0.59 across outcome variables. Wickel and Welk [25] found an ICC of 0.46 for one out of three 7-day periods of steps assessed with the Digiwalker pedometer in 80 children aged ~ 10 years. The present findings, along with these previous findings, question the validity of one week of measurement for determining children’s “true” habitual activity level.

Whereas we found that 7–15 days of measurement were required to reach a reliability of 0.80 based on the day-by-day analyses, 2–10 weeks of measurement were required based on the week-by-week analyses. These contrasting findings strengthen the argument that estimates of the number of days needed derived from the traditional approach, that is, applying the Spearman-Brown prophecy formula to single days, should be treated with caution. We cannot fully explain these contrasting findings, but they support previous studies that have warned against a possible overestimation of reliability by the day-by-day approach [23–25]. This is especially clear when the assessment is spread across different seasons. For example, two studies have revealed similar results for a day-by-day and a week-by-week approach [28, 29]; however, contrary to the present study, these studies were based on two consecutive weeks of measurement. In contrast, both the present study and others that have included multiple seasons [27, 25] found increased variability in estimates. Apparently, seasonality has a more profound effect on the week-by-week analysis than on the day-by-day analysis, as illustrated by the differences in reliability estimates with and without controlling for season shown in Table 4. The difference in variance between the two monitoring periods (Table 1) could explain the findings, as the model assumes compound symmetry and the ICC is sensitive to asymmetry [36]; however, this difference between measurements applies to both analytic approaches. Nevertheless, it is clear that applying the Spearman-Brown prophecy formula/the ICCk calculated for average days [36], which implies dividing the residual variance by the desired number of days, seems overly optimistic when compared to the week-by-week approach. Notably, this limitation also applies to the estimation of the number of weeks needed in the week-by-week approach.

As noise in exposure (x) variables will attenuate regression coefficients (regression dilution bias), and noise in outcome (y) variables will increase standard errors [5], unreliable measures weaken researchers’ ability to draw valid conclusions. In epidemiology, researchers are generally interested in the long-term “true” habitual PA level, rather than activity during the most recent days. Some health characteristics, for example insulin resistance, lipid metabolism and blood pressure, might change with acute increases or decreases in PA [41]. In contrast, a child’s level of fatness, aerobic fitness or motor skill takes months or years to change. For such stable traits, association analyses will inherently suffer from regression dilution bias if they rely on a 7-day monitoring period that provides an insufficient snapshot of children’s habitual activity level. Similarly, tracking coefficients for PA are generally low to moderate [42–44], probably due to measurement error as much as to true change over time. Interestingly, our reliability estimates over 3–4 months are quite similar to many tracking estimates reported in the literature. This finding challenges our understanding of behavioral change versus measurement error, as the two are different sides of the same coin.

Although an increased monitoring length might improve the validity of study conclusions, the burden on participants should be kept minimal to maximize response rate and compliance. We have previously performed 2- and 3-week monitoring protocols in preschool children and adults, respectively, without any major issues regarding compliance [28, 29]. More recently, we have also successfully performed a 2-week monitoring protocol in larger samples of children, adults and older people, demonstrating this protocol’s acceptability in various contexts. Still, performing measurements over separate as opposed to consecutive periods might pose an increased burden for participants, as well as for researchers. Notably, the required monitoring volume depends on the research question posed, as population estimates on a group level require a lower level of reproducibility than individual-level estimates used for association analyses [24].

Strengths and limitations

The main strength of the present study is the inclusion of a large and representative sample of children. As reliability estimates (i.e., ICCs) depend on the sample variation [37, 45, 38], the ICCs presented herein should be generalizable to other contexts, including large-scale population studies. Another strength is the inclusion of measurements conducted 3–4 months apart, during two different seasons. These data thus clearly serve the aim of the study: we introduced more variability than a shorter time frame would, while restricting the duration to a few months, over which “true” changes would be expected to be limited. A limitation, though, is the inclusion of only two weeks and two seasons, as including more observations would probably introduce more variability and lead to more conservative reproducibility estimates [27, 25]. Moreover, Norway has profound seasonal differences in weather conditions, which might limit generalizability to areas with less pronounced seasonality. Finally, the inclusion of the intervention group in the current analyses might have added variation to the data, as the intervention group could be expected to change their PA level over time. However, the intervention was ongoing during both measurements, there was no effect of the intervention on PA levels [31], and reliability estimates differed only marginally between the intervention and control groups.

Conclusion

We conclude that one-week accelerometer monitoring periods conducted during two different seasons 3–4 months apart showed modest reproducibility between measurements in a large sample of children (ICC for one week = 0.32–0.67). The traditional approach to estimating the number of wear days needed for accelerometer measurements, applying the Spearman-Brown prophecy formula to single days of measurement over a short monitoring period, resulted in more optimistic reliability estimates than a week-by-week approach. Thus, consistent with previous studies that have raised concerns about the traditional approach to estimating the reliability of accelerometer monitoring protocols, we suggest that results from studies using a day-by-day approach to determine reliability be interpreted with caution. Researchers should consider extending the monitoring period beyond a single 7-day period in future studies.

Supporting information

S1 File. The data file underlying the study findings.

https://doi.org/10.1371/journal.pone.0189304.s001

(XLSX)

S1 Table. Reliability for single days of measurement (ICCs) and number of days needed to achieve a reliability of 0.80 (N) for the two weeks (winter and spring) separately.

https://doi.org/10.1371/journal.pone.0189304.s002

(DOCX)

S2 Table. The week-by-week reliability for one out of two weeks of measurement for different wear criteria requiring both weekdays (3 or 4 days) and weekend days (1 or 2 days).

https://doi.org/10.1371/journal.pone.0189304.s003

(DOCX)

Acknowledgments

We thank all children, parents and teachers at the participating schools for their excellent cooperation during the data collection. We also thank Katrine Nyvoll Aadland, Mette Stavnsbo, Øystein Lerum, Einar Ylvisåker, and students at the Western Norway University of Applied Sciences (formerly Sogn og Fjordane University College) for their assistance during the data collection.

References

  1. Cain KL, Sallis JF, Conway TL, Van Dyck D, Calhoon L. Using accelerometers in youth physical activity studies: a review of methods. Journal of Physical Activity & Health. 2013;10(3):437–50.
  2. Atkin AJ, Sharp SJ, Harrison F, Brage S, Van Sluijs EMF. Seasonal variation in children's physical activity and sedentary time. Medicine and Science in Sports and Exercise. 2016;48(3):449–56. pmid:26429733
  3. Gracia-Marco L, Ortega FB, Ruiz JR, Williams CA, Hagstromer M, Manios Y, et al. Seasonal variation in physical activity and sedentary time in different European regions. The HELENA study. Journal of Sports Sciences. 2013;31(16):1831–40. pmid:24050788
  4. Ridgers ND, Salmon J, Timperio A. Too hot to move? Objectively assessed seasonal changes in Australian children's physical activity. International Journal of Behavioral Nutrition and Physical Activity. 2015;12.
  5. Hutcheon JA, Chiolero A, Hanley JA. Random measurement error and regression dilution bias. British Medical Journal. 2010;340.
  6. Trost SG, McIver KL, Pate RR. Conducting accelerometer-based activity assessments in field-based research. Medicine and Science in Sports and Exercise. 2005;37(11):S531–S43.
  7. Jerome GJ, Young DR, Laferriere D, Chen CH, Vollmer WM. Reliability of RT3 accelerometers among overweight and obese adults. Medicine and Science in Sports and Exercise. 2009;41(1):110–4. pmid:19092700
  8. Coleman KJ, Epstein LH. Application of generalizability theory to measurement of activity in males who are not regularly active: a preliminary report. Research Quarterly for Exercise and Sport. 1998;69(1):58–63. pmid:9532623
  9. Matthews CE, Ainsworth BE, Thompson RW, Bassett DR. Sources of variance in daily physical activity levels as measured by an accelerometer. Medicine and Science in Sports and Exercise. 2002;34(8):1376–81. pmid:12165695
  10. Hart TL, Swartz AM, Cashin SE, Strath SJ. How many days of monitoring predict physical activity and sedentary behaviour in older adults? International Journal of Behavioral Nutrition and Physical Activity. 2011;8.
  11. Basterfield L, Adamson AJ, Pearce MS, Reilly JJ. Stability of habitual physical activity and sedentary behavior monitoring by accelerometry in 6- to 8-year-olds. Journal of Physical Activity & Health. 2011;8(4):543–7.
  12. Addy CL, Trilk JL, Dowda M, Byun W, Pate RR. Assessing preschool children's physical activity: how many days of accelerometry measurement. Pediatric Exercise Science. 2014;26(1):103–9. pmid:24092773
  13. Hinkley T, O'Connell E, Okely AD, Crawford D, Hesketh K, Salmon J. Assessing volume of accelerometry data for reliability in preschool children. Medicine and Science in Sports and Exercise. 2012;44(12):2436–41. pmid:22776873
  14. Hislop J, Law J, Rush R, Grainger A, Bulley C, Reilly JJ, et al. An investigation into the minimum accelerometry wear time for reliable estimates of habitual physical activity and definition of a standard measurement day in pre-school children. Physiological Measurement. 2014;35(11):2213–28. pmid:25340328
  15. 15. Penpraze V, Reilly JJ, MacLean CM, Montgomery C, Kelly LA, Paton JY et al. Monitoring of physical activity in young children: How much is enough? Pediatric Exercise Science. 2006;18(4):483–91.
  16. 16. Ojiambo R, Cuthill R, Budd H, Konstabel K, Casajus JA, Gonzalez-Aguero A et al. Impact of methodological decisions on accelerometer outcome variables in young children. International Journal of Obesity. 2011;35:S98–S103. pmid:21483428
  17. 17. Rich C, Geraci M, Griffiths L, Sera F, Dezateux C, Cortina-Borja M. Quality control methods in accelerometer data processing: defining minimum wear time. Plos One. 2013;8(6).
  18. 18. Kang M, Bassett DR, Barreira TV, Tudor-Locke C, Ainsworth B, Reis JP et al. How many days are enough? A study of 365 days of pedometer monitoring. Research Quarterly for Exercise and Sport. 2009;80(3):445–53. pmid:19791630
  19. 19. Murray DM, Catellier DJ, Hannan PJ, Treuth MS, Stevens J, Schmitz KH et al. School-level intraclass correlation for physical activity in adolescent girls. Medicine and Science in Sports and Exercise. 2004;36(5):876–82. pmid:15126724
  20. 20. Treuth MS, Sherwood NE, Butte NF, McClanahan B, Obarzanek E, Zhou A et al. Validity and reliability of activity measures in African-American girls for GEMS. Medicine and Science in Sports and Exercise. 2003;35(3):532–9. pmid:12618587
  21. 21. Trost SG, Pate RR, Freedson PS, Sallis JF, Taylor WC. Using objective physical activity measures with youth: How many days of monitoring are needed? Medicine and Science in Sports and Exercise. 2000;32(2):426–31. pmid:10694127
  22. 22. Janz KF, Witt J, Mahoney LT. The stability of childrens physical-activity as measured by accelerometry and self-report. Medicine and Science in Sports and Exercise. 1995;27(9):1326–32. pmid:8531633
  23. 23. Baranowski T, Masse LC, Ragan B, Welk G. How many days was that? We're still not sure, but we're asking the question better! Medicine and Science in Sports and Exercise. 2008;40(7):S544–S9.
  24. 24. Matthews CE, Hagstromer M, Pober DM, Bowles HR. Best practices for using physical activity monitors in population-based research. Medicine and Science in Sports and Exercise. 2012;44:S68–S76. pmid:22157777
  25. 25. Wickel EE, Welk GJ. Applying generalizability theory to estimate habitual activity levels. Medicine and Science in Sports and Exercise. 2010;42(8):1528–34. pmid:20139788
  26. 26. Levin S, Jacobs DR, Ainsworth BE, Richardson MT, Leon AS. Intra-individual variation and estimates of usual physical activity. Annals of Epidemiology. 1999;9(8):481–8. pmid:10549881
  27. 27. Mattocks C, Leary S, Ness A, Deere K, Saunders J, Kirkby J et al. Intraindividual variation of objectively measured physical activity in children. Medicine and Science in Sports and Exercise. 2007;39(4):622–9. pmid:17414799
  28. 28. Aadland E, Johannessen K. Agreement of objectively measured physical activity and sedentary time in preschool children. Preventive Medicine Reports. 2015;2:635–9. pmid:26844129
  29. 29. Aadland E, Ylvisåker E. Reliability of objectively measured sedentary time and physical activity in adults. PLoS ONE. 2015;10(7):1–13.
  30. 30. Resaland GK, Moe VF, Aadland E, Steene-Johannessen J, Glosvik Ø, Andersen JR et al. Active Smarter Kids (ASK): Rationale and design of a cluster-randomized controlled trial investigating the effects of daily physical activity on children's academic performance and risk factors for non-communicable diseases. BMC Public Health. 2015;15:709–. pmid:26215478
  31. 31. Resaland GK, Aadland E, Moe VF, Aadland KN, Skrede T, Stavnsbo M et al. Effects of physical activity on schoolchildren's academic performance: The Active Smarter Kids (ASK) cluster-randomized controlled trial. Preventive Medicine. 2016;91:322–8. pmid:27612574
  32. 32. John D, Freedson P. ActiGraph and Actical physical activity monitors: a peek under the hood. Medicine and Science in Sports and Exercise. 2012;44(1 Suppl 1):S86–S9.
  33. 33. Esliger DW, Copeland JL, Barnes JD, Tremblay MS. Standardizing and optimizing the use of accelerometer data for free-living physical activity monitoring. Journal of Physical Activity & Health. 2005;2(3):366.
  34. 34. Evenson KR, Catellier DJ, Gill K, Ondrak KS, McMurray RG. Calibration of two objective measures of physical activity for children. Journal of Sports Sciences. 2008;26(14):1557–65. pmid:18949660
  35. 35. Trost SG, Loprinzi PD, Moore R, Pfeiffer KA. Comparison of accelerometer cut points for predicting activity intensity in youth. Medicine and Science in Sports and Exercise. 2011;43(7):1360–8. pmid:21131873
  36. 36. McGraw KO, Wong SP. Forming inferences about some intraclass correlation coefficients. Psychological Methods. 1996;1(1):30–46.
  37. 37. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986;1(8476):307–10. pmid:2868172
  38. 38. Weir JP. Quantifying test-retest reliability using the intraclass correlation coefficient and the SEM. Journal of Strength and Conditioning Research. 2005;19(1):231–40. pmid:15705040
  39. 39. Atkinson G, Nevill AM. Statistical methods for assessing measurement error (reliability) in variables relevant to sports medicine. Sports Medicine. 1998;26(4):217–38 pmid:9820922
  40. 40. Chinapaw MJM, de Niet M, Verloigne M, De Bourdeaudhuij I, Brug J, Altenburg TM. From sedentary time to sedentary patterns: accelerometer data reduction decisions in youth. Plos One. 2014;9(11).
  41. 41. Thompson PD, Crouse SF, Goodpaster B, Kelley D, Moyna N, Pescatello L. The acute versus the chronic response to exercise. Medicine And Science In Sports And Exercise. 2001;33(6 Suppl):S438.
  42. 42. Jones RA, Hinkley T, Okely AD, Salmon J. Tracking physical activity and sedentary behavior in childhood a systematic review. American Journal of Preventive Medicine. 2013;44(6):651–8. pmid:23683983
  43. 43. Biddle SJH, Pearson N, Ross GM, Braithwaite R. Tracking of sedentary behaviours of young people: A systematic review. Preventive Medicine. 2010;51(5):345–51. pmid:20682330
  44. 44. Telama R. Tracking of physical activity from childhood to adulthood: A review. Obesity Facts. 2009;2(3):187–95. pmid:20054224
  45. 45. Hopkins WG. Measures of reliability in sports medicine and science. Sports Medicine. 2000;30(1):1–15. pmid:10907753