Article

Quantile Coarsening Analysis of High-Volume Wearable Activity Data in a Longitudinal Observational Study

1 Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY 10032, USA
2 IBM Watson Research Center, Yorktown Heights, NY 10598, USA
3 Center for Behavioral Cardiovascular Health, Department of Medicine, Columbia University Medical Center, New York, NY 10032, USA
4 Department of Neurology, Columbia University Medical Center, New York, NY 10032, USA
* Author to whom correspondence should be addressed.
Submission received: 13 August 2018 / Revised: 4 September 2018 / Accepted: 6 September 2018 / Published: 12 September 2018
(This article belongs to the Special Issue Data Analytics and Applications of the Wearable Sensors in Healthcare)

Abstract

Owing to advances in sensor technologies on wearable devices, it is feasible to measure the physical activity of an individual continuously over a long period. These devices afford opportunities to understand individual behaviors, which may then provide a basis for tailored behavior interventions. The large volume of data, however, poses challenges in data management and analysis. We propose a novel quantile coarsening analysis (QCA) of daily physical activity data, with the goal of reducing the volume of data while preserving key information. We applied QCA to a longitudinal study of 79 healthy participants whose step counts were monitored for up to 1 year by a Fitbit device, performed cluster analysis of daily activity, and characterized each individual's activity signature, or pattern, in terms of the clusters identified. Using 21,393 time series of daily physical activity, we identified eight clusters. Employment and partner status were each associated with 5 of the 8 clusters. Using less than 2% of the original data, QCA provides an accurate approximation of the mean physical activity, forms meaningful activity patterns associated with individual characteristics, and is a versatile tool for dimension reduction of densely sampled data.

1. Introduction

Physical activity has been shown to improve cardiovascular health and reduce the risk of mortality [1,2,3,4], and it is an important component of primary prevention for many chronic diseases and conditions such as Type 2 diabetes and obesity [5,6]. Walking, in particular, is recognized as an easily accessible, convenient, and familiar mode of physical activity, and thus is an appealing strategy for the promotion of health and well-being. As such, there is impetus for examining walking behaviors as a predictor of multiple health outcomes in ambulatory, community-dwelling adults.
Advances in sensor technologies on wearable devices have enabled the continuous and accurate collection of step counts and other walking parameters over an extended period of time, thus providing a voluminous stream of data. The large amount of data provides an opportunity to better understand daily physical activity patterns across populations. However, conventional analytical approaches measure physical activity patterns by predefined summary statistics such as total step counts and average minutes with activity on a given day. By summarizing physical activity at the daily level, these methods ignore between-day heterogeneity within a person, as they fail to capture the within-day patterns of activity. An understanding of within-day patterns of physical activity is important for individualizing the mobile experience, such as the timing of push notifications and activity updates [7,8], and for identifying changes in an individual’s daily routines, thereby facilitating tailored behavior intervention [9,10]. Given the broad use of step-counting trackers to monitor and improve physical activity [11,12,13,14,15,16,17,18,19,20,21,22,23,24,25], analyzing sensor data beyond predefined daily features can have significant public health impact.
Multivariate finite mixture modeling (MFMM) is a clustering method that identifies homogeneous subgroups without assuming that the number of subgroups is known. The MFMM analysis is model-based and data-driven: it aims to produce subgroups with features arising from the same statistical distribution, dividing the data into an optimal number of subgroups based on specific criteria such as the Bayesian information criterion [26]. Clustering algorithms utilizing MFMM methods have been applied to identify dietary patterns [27,28] and physical activity patterns based on questionnaire data [29]. These algorithms often entail prespecifying only a small-to-moderate number of features as input variables, as the computational complexity grows exponentially with the addition of more features [30]. In the present context, where the goal is to examine within-day activity patterns and hundreds of physical activity inputs can be recorded from sensors throughout a day (e.g., minute-by-minute step counts), existing clustering algorithms may prove computationally infeasible unless the dimension of the data is properly reduced in a pre-processing step.
Dimension reduction of sensor data continuously collected can be achieved by time series modeling of the data [31,32,33,34]. Typically, a time series is first transformed to a domain relevant to the scientific interest, and is then summarized by a few parameters (e.g., autocorrelation). These parameters in turn serve as input features in a clustering algorithm. In this article, we take a similar approach and propose a two-step method for analyzing sensor data as time series: the proposed method first transforms the daily physical activity data into a coarsened probability density function of quantiles of activity time, and then applies the MFMM analysis using the quantiles as input features. The method is thus called quantile coarsening analysis (QCA). This approach is motivated by the consideration that time of activity, as well as the amount of activity, is of primary interest in our application. As will be shown in Statistical Analyses below, the resolution or coarseness of dimension reduction can be set by users in accordance with the needs in their application; such flexibility distinguishes the proposed method from the traditional parametric modeling of time series data [35]. The purposes of this article are to demonstrate the feasibility of QCA in a data set of 21,393 time series of daily physical activity, and to examine its estimation properties under various degrees of coarseness.

2. Materials and Methods

2.1. Study Cohort

A single cohort, 12-month, intensive observational study was conducted in healthy adults with an objective to collect their personal daily stress and physical activity for associative analysis. The study was approved by Columbia University Medical Center’s (CUMC) institutional review board. All participants provided informed consent. Access to the study dataset and information about the study’s execution and materials is publicly available [36].
Potentially eligible participants were identified and screened at CUMC. The inclusion criteria were (i) aged 18 years or older; (ii) self-reported intermittent exerciser (i.e., exercise 6–11 times per month but did not have a regular workout schedule); (iii) having access to a personal computer and a smartphone. Exclusion criteria included individuals who (i) were unavailable for 12 continuous months; (ii) had serious medical comorbidity that would compromise their ability to engage in usual physical activity; (iii) had occupational work demands that required rigorous activity; or (iv) were unable to read and speak English. From January 2014 to July 2015, a total of 79 participants were enrolled and followed for 12 months. For the purpose of this article, we considered the physical activity data (described below). Details of enrollment, participant characteristics, and other association studies of stress level were previously reported [37]. Briefly, the data set for the present analysis consisted of 45 females and 34 males, with an overall mean age of 31.9 years (±9.5 years). In addition, we considered the following variables for association with physical activity: race/ethnicity (27 non-Hispanic whites vs. 52 others), education as an ordinal variable (13 having less than college vs. 34 completing college vs. 32 attaining graduate or professional degrees), employment status (64 full-time employed vs. 8 part-time), and partner status (32 having a partner or spouse vs. 45 being single).

2.2. Physical Activity

Physical activity was monitored continuously for up to 12 months using a wrist-worn Fitbit activity monitor (Fitbit Flex) [38]. The Fitbit device, containing an accelerometer and an altimeter, tracks the wearer’s daily physical activity including steps, distance walked, and stairs climbed, and has been previously validated for measuring physical activity [39]. While the Fitbit devices (including the Fitbit Flex) have been demonstrated to have good validity for the objective measurement of physical activity, their accuracy has largely been reported for stepping-related physical activities (e.g., walking and running) [40]. Similar to other research-grade accelerometers, the Fitbit devices have poor accuracy for the measurement of cycling [41,42]. Furthermore, Fitbit instructs users to not swim with the Fitbit Flex because it is not waterproof [43], thus rendering it unable to assess swimming-based exercise.
Data from the devices were automatically uploaded to the Fitbit website whenever the device was within 15 feet of the base station, which was plugged into the participant’s own personal computer. Participants were instructed to sync and charge their device every 5–7 days to ensure no loss of activity data. The Fitbit accelerometer recorded data in one-minute epochs, starting at 12:00 a.m. and ending at 11:59 p.m. every day, yielding a time series of 1440 minute-epochs per day per individual. The raw minute-by-minute step count data were extracted from the manufacturer’s website using Fitabase (Small Steps Labs, San Diego, CA, USA) and were reduced using a novel QCA, described in Statistical Analyses below. Specifically, the raw data relevant to the present article were the timestamped step counts over one-minute intervals; the data for each participant were saved as an “RDATA” file labeled with a unique participant ID. Based on the raw data, we calculated other predefined physical activity measurements, including total daily step counts, the duration in minutes of physical activity (PA, defined as having at least 50 steps in a minute), and activity midday (defined as the time when 50% of daily step counts were achieved).
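As an illustration of the predefined measurements described above, the following is a minimal Python sketch, not the study's own code. The function name `daily_summaries`, the 0-based minute indexing, and the convention of reporting activity midday as the number of minutes elapsed since midnight at the end of the qualifying minute are assumptions made for this example; the toy step counts are hypothetical.

```python
def daily_summaries(steps, pa_threshold=50):
    """Summarize one day of minute-level step counts.

    steps[i] is the step count in minute i after midnight (assumed layout).
    Returns (total steps, PA minutes, activity midday in minutes after midnight).
    """
    total = sum(steps)
    # PA minutes: minutes with at least `pa_threshold` steps.
    pa_minutes = sum(1 for s in steps if s >= pa_threshold)

    midday = None  # left undefined for a day with no recorded steps
    running = 0
    for minute, s in enumerate(steps):
        running += s
        if total > 0 and 2 * running >= total:
            midday = minute + 1  # end of the minute in which 50% was reached
            break
    return total, pa_minutes, midday


# Toy day: three bursts of walking at 8:00 a.m., noon, and 5:00 p.m.
day = [0] * 1440
day[480], day[720], day[1020] = 60, 100, 40
print(daily_summaries(day))  # (200, 2, 721)
```

In this toy day, half of the 200 steps has been accumulated by the end of the noon minute, so the activity midday is 721 minutes after midnight.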

2.3. Statistical Analyses

2.3.1. Quantile Coarsening Analysis (QCA)

Let $Y(t)$ denote the step counts at time $t$ and $S(t) = \int_0^t Y(u)\,du$ be the cumulative activity up to time $t \in (0, t_{\max})$. Then
$$T(p) = \inf\{\, t : S(t) \ge p\, S(t_{\max}) \,\}$$
denotes the time where $100p$ percent of the total activity has been achieved and will be referred to as the $100p$th quantile of the activity time [44]. Specifically, activity midday is defined by the 50th quantile, $T(0.5)$. The idea of QCA is to represent a time series $Y(t)$ using multiple quantiles $T(p_j)$ for a prespecified set of $p_1 < p_2 < \dots < p_K$, together with the total daily counts $S(t_{\max})$. The number $K$ of quantiles determines the number of components used to represent $Y(t)$, and hence controls the resolution or coarseness of the approximation. Define the $K$th order quantile-coarse function of $Y(t)$ as
$$C_K Y(t) = \frac{S(t_{\max})}{(K+1)\,\{T(p_{j+1}) - T(p_j)\}} \quad \text{for } T(p_j) \le t < T(p_{j+1})$$
for $j = 0, \dots, K$, with the convention that $T(0) = 0$ and $T(1) = t_{\max}$. While $p_j$ can be any values between 0 and 1, we consider an evenly spaced grid, i.e., setting $p_j = j/(K+1)$ for $j = 1, \dots, K$. It can be easily shown that the quantile-coarse function is invariant under the quantile transformation. That is to say, applying the quantile transformation to $C_K Y(t)$ will result in the same quantile representation as applying the transformation to the original $Y(t)$, i.e., $C_K\{C_K Y(t)\} = C_K Y(t)$. As a result, there is no loss in information by converting between coarsened data and quantiles back and forth for any given $K$.
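The definitions above can be sketched in Python for minute-level data. This is an illustrative discretization rather than the authors' implementation: the function names `quantile_times` and `quantile_coarse` are hypothetical, $S(t)$ is taken to be the sum of the first $t$ minute counts, and exact rational arithmetic (`fractions.Fraction`) is used so that the invariance property can be checked without rounding error. The sketch assumes non-negative counts and distinct consecutive quantile times; when two quantile times coincide, the corresponding interval is empty and is skipped.

```python
from fractions import Fraction
from itertools import accumulate


def quantile_times(y, K):
    """Return [T(p_0), ..., T(p_{K+1})] on the grid p_j = j / (K + 1).

    y[i] is the count in minute i, so S(t) = sum(y[:t]) for t = 0..len(y).
    T(p) is the smallest t with S(t) >= p * S(t_max); T(0) = 0, T(1) = t_max.
    """
    t_max = len(y)
    cum = [0] + list(accumulate(y))  # cum[t] = S(t)
    total = cum[-1]
    if total == 0:
        raise ValueError("quantiles are undefined for a day with no activity")
    times = [0]
    t = 0
    for j in range(1, K + 1):
        target = Fraction(j, K + 1) * total
        while cum[t] < target:  # S is non-decreasing, so scan forward only
            t += 1
        times.append(t)
    times.append(t_max)
    return times


def quantile_coarse(y, K):
    """Piecewise-constant C_K Y(t), returned as one value per minute."""
    times = quantile_times(y, K)
    total = sum(y)
    coarse = [Fraction(0)] * len(y)
    for lo, hi in zip(times, times[1:]):
        if hi > lo:  # coinciding quantile times give an empty interval
            height = Fraction(total, (K + 1) * (hi - lo))
            coarse[lo:hi] = [height] * (hi - lo)
    return coarse


# Toy 8-minute "day" with K = 3 (quartiles of activity time).
day = [0, 4, 0, 0, 4, 4, 0, 4]
print(quantile_times(day, 3))  # [0, 2, 5, 6, 8]
coarse = quantile_coarse(day, 3)
print(quantile_coarse(coarse, 3) == coarse)  # True
```

Each interval between consecutive quantile times carries the same share $S(t_{\max})/(K+1)$ of the daily total, which is why coarsening the already coarsened curve returns it unchanged in this example.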
Our data set consisted of a total of 21,393 days of minute-by-minute step counts from 79 study participants. For each daily time series, we evaluated the quantile-coarse function. The mean time series $\bar{Y}(t)$ of each cluster was then approximated by the corresponding mean quantile-coarse function $C_K \bar{Y}(t)$. We calculated the integrated mean squared error:
$$\int_0^{t_{\max}} \{\, C_K \bar{Y}(t) - \bar{Y}(t) \,\}^2 \, dt$$
to assess the accuracy of the quantile coarsening method under various coarseness values $K$.
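For minute-level data, the integral above can be approximated by a Riemann sum over the sampling grid. The sketch below is a hedged illustration only: the function name `integrated_squared_error`, the default step `dt = 1.0` (one minute), and the toy curves are assumptions, and how the approximating curve is formed for each cluster is left to the caller.

```python
def integrated_squared_error(approx, reference, dt=1.0):
    """Riemann-sum version of the integral of {approx(t) - reference(t)}^2 dt."""
    if len(approx) != len(reference):
        raise ValueError("curves must be sampled on the same time grid")
    return dt * sum((a - r) ** 2 for a, r in zip(approx, reference))


# Toy "mean curve" on an 8-minute day (hypothetical numbers).
mean_curve = [0, 4, 0, 0, 4, 4, 0, 4]

# K = 0: only the daily total is kept, so the approximation is flat.
flat = [sum(mean_curve) / len(mean_curve)] * len(mean_curve)

# K = 3: piecewise-constant curve with equal mass between quartile times
# (heights worked out by hand for this toy curve).
coarse = [2, 2, 4 / 3, 4 / 3, 4 / 3, 4, 2, 2]

err_k0 = integrated_squared_error(flat, mean_curve)
err_k3 = integrated_squared_error(coarse, mean_curve)
print(err_k0, round(err_k3, 2))  # 32.0 26.67
```

Even in this small example, the error falls when quantile times are added to the daily total, which mirrors the comparison against the K = 0 baseline.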

2.3.2. Cluster Analysis

We performed cluster analysis using MFMM with the quantile-coarse function $C_K Y(t)$ as input. Specifically, we considered $K = 19$ so that each time series $Y(t)$ was represented by a total of 20 features, namely, $T(0.05)$, $T(0.10)$, $T(0.15)$, …, $T(0.95)$, and $S(t_{\max})$. Note that although we did not use common features such as PA minutes as direct inputs of the cluster analysis, these features were implicitly incorporated as they could be approximated from a quantile-coarse function. The number of clusters was determined based on the Bayesian information criterion [45]. After the MFMM analysis, physical activity features of each cluster were described using means and standard deviations, along with the mean time series $\bar{Y}(t)$ of each cluster.

2.3.3. Association Studies

To identify important factors affecting a participant’s physical activity behaviors in terms of the identified clusters, the association between cluster membership and participant characteristics was assessed using a generalized linear mixed model (GLMM) with a logit link in a univariate manner, with an adjustment for a weekend/weekday random effect nested within a subject random effect. For comparison purposes, we also examined the association of step-count based clusters with participant characteristics using the same GLMM approach.

3. Results

3.1. Physical Activity Clusters by Multivariate Finite Mixture Modeling

The MFMM analysis found an eight-cluster solution among the 21,393 series. Table 1 reports some summary physical activity measures in each cluster. The clusters were organized according to the average daily step counts, which were in concordance with PA duration. The least active cluster (Cluster 1) on average completed just below 1000 steps a day with 7.3 min in PA; this subgroup of activity either depicted a very sedentary pattern or effectively identified inactivity due to non-wear. The most active group (Cluster 8) had about 10,000 counts on average with 73 min in PA. The next two most active clusters (Clusters 6 and 7) had similar activity level to Cluster 8 and were within 1000 steps daily on average. However, activity midday in these clusters, ranging from noon to 3:00 p.m., occurred earlier than that of Cluster 8. While not as inactive as Cluster 1, Clusters 2 and 3 had low PA level when compared to the higher clusters, with different activity midday. Clusters 4 and 5 represented days of intermediate PA level.
Figure 1 shows the mean activity curves of the clusters, and the superimposed cumulative activities of the clusters (lower right figure). These plots reveal additional cluster-defining features. Specifically, Cluster 2 was characterized by very early (i.e., late night) activity. Clusters 6 and 8 had peak activity averaged at around noon and 6:00 p.m. respectively, whereas Cluster 5 had multiple peaks throughout the day (at around 8:00 a.m., noon, and 5:00 p.m.).

3.2. Activity Patterns and the Weekends

Table 1 also shows the proportion of daily activity falling on a weekend for each cluster, and demonstrates a range across the eight groups with ≥40% of time series in Clusters 2 and 6 occurring on a weekend, and 16% in Cluster 5 being on a weekend. Generally, it is also noted that the time series in the inactive clusters (Clusters 1–3) tended to fall on weekends.
Figure 2 further shows that the PA patterns of the 79 participants were very different on weekdays and on weekends, with Cluster 5 being clearly a weekday phenomenon in most participants. This was consistent with the fact that Cluster 5 was characterized by spikes in activity around morning commute, lunch, and evening commute (Figure 1). At the same time, the heatmaps showed variation among the participants, and some did not follow this weekday/weekend differential (e.g., Participants 11 and 16). In addition, the PA patterns on the weekends were more dispersed than those on the weekdays, suggesting that weekend activities were less structured and more heterogeneous across participants.

3.3. Physical Activity Clusters and Participant Characteristics

Table 2 gives the association between each cluster and participant characteristics, in terms of odds ratio of falling into one activity cluster vs. the others using GLMM. In this cohort, employment and partner statuses were the most influential predictors of activity, each associated with 5 PA clusters. Specifically, Cluster 5 was highly significantly (p < 0.01) associated with being full-time employed and having a partner/spouse. Interestingly, the association between Cluster 5 and employment status was significant after adjusting for the weekend/weekday effect, suggesting that employment had a structural impact on an individual’s behaviors and habits beyond the physical constraint it has during a workweek.
In contrast, Clusters 2 and 4, both having very early activity (Figure 1), were associated with singles with part-time jobs; having a younger age and receiving less education were also associated with these two clusters.
To a lesser extent, race/ethnicity was also predictive of an individual’s activity behaviors. Specifically, non-Hispanic whites were more likely to engage in physical activities consistent with Clusters 5 and 8, and less with Cluster 2. Finally, it is interesting to note that the inactive cluster (Cluster 1) was not associated with any particular characteristics.

3.4. Accuracy of Approximation

Table 3 gives the integrated mean squared errors of the quantile-coarse function using different values of $K$ for estimating the mean activity of the 8 patterns. Accuracy improves as the original function $\bar{Y}(t)$ is represented with a larger number $K$ of quantiles, with the initial improvement being most substantial. With $K = 19$, the mean squared error was about 3% on average of the error when daily activity was summarized using only the total daily counts ($K = 0$).

4. Discussion

We have proposed a novel QCA for reducing dimension of data collected from wearable devices, and for representing data in conjunction with downstream analyses such as MFMM and association studies. The proposed method contributes to the analysis and management of wearable data in two ways. First, quantile transformation lends itself to making inferences about the time of activity, which could be useful in distinguishing individuals and days from a single individual with differing patterns of PA accrual. Using data from an intensive, 12-month observational study, we were able to identify 8 unique clusters (or subgroups) that characterized the various types of PA accrual patterns observed at the day-level and were able to link these clusters with participant characteristics that provided important contextual information regarding the observed patterns. For example, we observed a “worker” cluster (Cluster 5) associated with employment status wherein spikes in activity were observed around times of day that typically coincide with morning commute, lunch, and evening commute. We also observed active clusters that accrued much of the activity earlier or later in the day (Clusters 6–8), possibly reflective of morning or evening exercise. On the other hand, it is interesting to note that the most active pattern (Cluster 8) accumulates steps late in the day and is associated with full-time employment, suggesting these are intentional leisure-time physical activities. This is consistent with the literature that individuals who meet physical activity guidelines are those who engage in leisure-time physical activity [46]. In contrast, when we performed cluster analysis using total step counts only (i.e., not including time of activity as inputs), all but one cluster had an activity midday at 2:30 p.m. (Table 1). 
As a result, we identified fewer and weaker associations between the step-count based clusters and participant characteristics (Table 2); this analysis did provide nuances about the nature of activity, which in turn could be useful for developing applications of individualized intervention.
Second, QCA facilitates large-scale data reduction, as quantile transformation requires only simple and scalable computations. We have demonstrated the method in a dataset of 21,393 time series (over 30 million minute-by-minute counts) from 79 participants for up to 1-year follow-up as a proof of concept. In real-life situations where deployment of mobile sensors such as Fitbit can occur at a much larger scale for a much longer duration, the large data volume will be a practical issue for storage and analyses and for the deployment of edge computing [47]. In a typical application, data are transmitted from the devices and stored externally on a server or in a cloud platform for specific analyses. Quantile coarsening in this context can be used as a data pre-processing step to minimize the volume of data transmission, storage, and persistence demand. As the size of the wearable devices tends to be small, their computational capacity is often limited. As such, continuous sensing may pose challenges to existing multi-modal analysis techniques using wearable devices. Since quantile transformation is easy to implement and can be computed independently of data from other individuals, simple scripts can be written to execute on the edge devices as well as on the server level. Depending on the purpose of the analysis, the end-user can specify the level of resolution in terms of the number K of quantiles needed. Our analyses show that the mean quantile-coarse function provides good approximation of the original mean function with only 20 data points per day per individual, representing less than 2% of the original amount of data (1440 data points). In addition, at the deployment time, QCA can also be applied on the incoming streams of data to compare to pre-stored cluster characteristics identified from the cluster analysis. 
This can lend support to the implementation of many other dynamic, just-in-time adaptive interventions that are key to persuasive reminders and sustainable behavioral changes [48].
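The data-reduction figures quoted above can be checked with a few lines of arithmetic. The sketch below uses only numbers stated in the text (21,393 daily series, 1440 minute-epochs per day, K = 19 quantiles plus the daily total); the variable names are illustrative.

```python
MINUTES_PER_DAY = 1440   # one-minute epochs per daily time series
N_DAYS = 21_393          # daily time series in the data set
K = 19                   # number of quantiles used in the cluster analysis

features_per_day = K + 1                      # K quantile times + daily total
raw_points = N_DAYS * MINUTES_PER_DAY         # minute-by-minute counts
reduced_points = N_DAYS * features_per_day    # values kept after coarsening
fraction_kept = features_per_day / MINUTES_PER_DAY

print(raw_points)                # 30805920
print(reduced_points)            # 427860
print(f"{fraction_kept:.2%}")    # 1.39%
```

The raw total is consistent with the "over 30 million" counts mentioned above, and 20 values out of 1440 is about 1.4% of the original data, in line with the stated bound of less than 2%.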
The high volume of step count data offers opportunities to tailor behavior intervention to each individual in a highly personalized manner. Specifically, we have created an activity behavior signature for each individual over time (Figure 2), which can serve as the basis of adaptive intervention. For instance, we could adapt the “dose” and timing of push notifications if there are indications that an individual deviates from his/her own norm. The use of signatures is broadly applicable to other behavioral intervention systems such as centralized recommenders of health apps [49,50,51]. To allow for such tailored intervention, it is important to acknowledge that individual behaviors are not monolithic, but heterogeneous. It is therefore important to note that our analysis goal was to identify clusters of daily activity as building blocks of each signature, as opposed to identifying clusters of individuals. While within-day metrics (such as intensity and regularity [52]) have been examined to reflect the heterogeneity in between-day activities of each individual, these approaches typically are semi-quantitative and are intended for visualization.
In the present article, we have shown the feasibility of QCA in a small cohort of relatively healthy individuals. The study design and analytical methods can be easily deployed to other populations. For example, the Northern Manhattan Study aims to assess risk factors for stroke and cardiovascular diseases, and has examined and analyzed the physical activity patterns of the cohort based on paper questionnaires [3,29]. It would be an interesting next step to follow up on these individuals to monitor and assess their mobility issues using wearables, and to provide additional information (signature) that contributes to cardiovascular risks.
We applied QCA to step count data. The method, however, is applicable to other varieties of data and supports monitoring of biometrics (e.g., heart rate, ambulatory blood pressure, etc.), location (e.g., indoor/outdoor), behaviors (e.g., medication adherence), exogenous factors such as weather, and user-input data via ecological momentary assessment. There is a growing trend towards self-monitoring on a daily basis with goals such as tracking health status, ameliorating exacerbations of chronic conditions, and avoiding episodic hospitalization; see [53,54,55,56,57] for example. As such, wearable devices are well-suited for this new approach to patient care, provided that they are capable of handling complex analysis efficiently (resulting in smaller and lighter devices with longer battery life). At the same time, we acknowledge that accelerometers are not capable of capturing some of the more common forms of aerobic exercise. Research- and commercial-grade accelerometers such as those made by Fitbit have poor accuracy for the measurement of cycling and cannot be worn while swimming because they are not waterproof [41,42]. However, given the versatility of QCA, it should provide useful unified analytical tools for the high data variety in multi-modal monitoring as sensing technologies advance.

Author Contributions

Conceptualization, Y.K.C. and K.M.D.; Methodology & Formal analysis, Y.K.C.; Resources, Y.K.C. and K.M.D.; Writing—Original Draft, Y.K.C.; Writing—Review & Editing, Y.K.C., P.-Y.S.H., I.E., J.Z.W. and K.M.D.

Funding

This work was partly supported by NIH grants R01HL111195 and R01MH109496.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. World Health Organization. Global Health Risks: Mortality and Burden of Disease Attributable to Selected Major Risks; WHO Press: Geneva, Switzerland, 2009. [Google Scholar]
  2. Garcia, M.C.; Bastian, B.; Rossen, L.M.; Anderson, R.; Minino, A.; Yoon, P.W.; Faul, M.; Massetti, G.; Thomas, C.C.; Hong, Y.; et al. Potentially Preventable Deaths among the Five Leading Causes of Death—United States, 2010 and 2014; Morbidity and Mortality Weekly Report (MMWR): Atlanta, GA, USA, 18 November 2016; Volume 65, pp. 1245–1255. [Google Scholar]
  3. Cheung, Y.K.; Moon, Y.P.; Kulick, E.R.; Sacco, R.L.; Elkind, M.S.V.; Willey, J.Z. Leisure-time physical activity and cardiovascular mortality in an elderly population in northern Manhattan: A prospective cohort study. J. Gen. Int. Med. 2017, 32, 168–174. [Google Scholar] [CrossRef] [PubMed]
  4. Diaz, K.M.; Howard, V.J.; Hutto, B.; Colabianchi, N.; Vena, J.E.; Safford, M.M.; Blair, S.N.; Hooker, S.P. Patterns of sedentary behavior and mortality in U.S. middle-aged and older adults. Ann. Int. Med. 2017, 167, 465–475. [Google Scholar] [CrossRef] [PubMed]
  5. Motl, R.W. Theoretical models for understanding physical activity behavior among children and adolescents—Social cognitive theory and self-determination theory. J. Teach. Phys. Edu. 2007, 26, 350–357. [Google Scholar] [CrossRef]
  6. Bravata, D.M.; Smith-Spangler, C.; Sundaram, V.; Gienger, A.L.; Lin, N.; Lewis, R.; Stave, C.D.; Olkin, I.; Sirard, J.R. Using pedometers to increase physical activity and improve health. JAMA 2007, 298, 2296–2304. [Google Scholar] [CrossRef] [PubMed]
  7. Consolvo, S.; McDonald, D.W.; Toscos, T.; Chen, M.Y.; Froehlich, J.; Harrison, B.; Klasnja, P.; LaMarca, A.; LeGrand, L.; Libby, R.; et al. Activity sensing in the wild: A field trial of UbiFit Garden. In Proceedings of the the SIGCHI Conference on Human Factors in Computing Systems, Florence, Italy, 5–10 April 2008; ACM Press: New York, NY, USA, 2008. [Google Scholar] [CrossRef]
  8. Lin, J.J.; Mamykina, L.; Lindtner, S.; Delajoux, G.; Strub, H.B. Fish’n'Steps: Encouraging physical activity with an interactive computer game. In Proceedings of the 8th international conference on Ubiquitous Computing, Orange County, CA, USA, 17–19 September 2006; Dourish, P., Friday, A., Eds.; Springer: Berlin/Heidelberg, Germany, 2006; Volume 4206, pp. 261–278. [Google Scholar] [CrossRef]
  9. Miller, A.D.; Mynatt, E.D. A School-based Pervasive Social Fitness System for Everyday Adolescent Health. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Toronto, ON, Canada, 26 April–1 May 2014; ACM Press: New York, NY, USA, 2014; pp. 823–2832. [Google Scholar] [CrossRef]
  10. Munson, S.; Consolvo, S. Exploring goal-setting, rewards, self-monitoring, and sharing to motivate physical activity. In Proceedings of the 6th International. Conference on Pervasive Computing Technologies for Healthcare, San Diego, CA, USA, 21–24 May 2012; pp. 25–32. [Google Scholar] [CrossRef]
  11. Pillay, J.D.; van der Ploeg, H.P.; Kolbe-Alexander, T.L.; Proper, K.I.; van Stralen, M.M.; Tomaz, S.A.; van Mechelen, W.; Lambert, E.V. The association between daily steps and health, and the mediating role of body composition: A pedometer-based, cross-sectional study in an employed South African population. BMC Publ. Health 2005, 15, 174. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  12. Evenson, K.R.; Wen, F.; Furberg, R.D. Assessing Validity of the Fitbit Indicators for U.S. Public Health Surveillance. Am. J. Prev. Med. 2017, 53, 931–932.
  13. Wang, J.B.; Cadmus-Bertram, L.A.; Natarajan, L.; White, M.M.; Madanat, H.; Nichols, J.F.; Ayala, G.X.; Pierce, J.P. Wearable Sensor/Device (Fitbit One) and SMS Text-Messaging Prompts to Increase Physical Activity in Overweight and Obese Adults: A Randomized Controlled Trial. Telemed. E-Health 2015, 21, 782–792.
  14. Bentley, F.; Tollmar, K.; Stephenson, P.; Levy, L.; Jones, B.; Robertson, S.; Price, E.; Catrambone, R.; Wilson, J. Health Mashups: Presenting Statistical Patterns between Wellbeing Data and Context in Natural Language to Promote Behavior Change. ACM Trans. Comput. Hum. Interact. 2013, 20, 1–27.
  15. Yoon, S.; Schwartz, J.E.; Burg, M.M.; Kronish, I.M.; Alcantara, C.; Julian, J.; Parsons, F.; Davidson, K.W.; Diaz, K.M. Using Behavioral Analytics to Increase Exercise: A Randomized N-of-1 Study. Am. J. Prev. Med. 2018, 54, 559–567.
  16. Cadmus-Bertram, L.A.; Marcus, B.H.; Patterson, R.E.; Parker, B.A.; Morey, B.L. Randomized Trial of a Fitbit-Based Physical Activity Intervention for Women. Am. J. Prev. Med. 2015, 49, 414–418.
  17. Auffray, C.; Balling, R.; Barroso, I.; Bencze, L.; Benson, M.; Bergeron, J.; Bernal-Delgado, E.; Blomberg, N.; Bock, C.; Conesa, A. Making sense of big data in health research: Towards an EU action plan. Genome Med. 2016, 8, 71.
  18. Cheung, Y.; Hsueh, P.; Qian, M.; Yoon, S.; Meli, L.; Diaz, K.M.; Schwartz, J.E.; Kronish, I.M.; Davidson, K.W. Are nomothetic or ideographic approaches superior in predicting daily exercise behaviors? Analyzing N-of-1 mHealth data. Methods Inf. Med. 2017, 56, 452–460.
  19. Hsiao, M.; Hsueh, P.; Ramakrishnan, S. Personalized adherence activity recognition via model-driven sensor data assessment. Stud. Health Technol. Inf. 2012, 180, 1050–1054.
  20. Kim, Y.; Welk, G.J.; Braun, S.I.; Kang, M. Extracting Objective Estimates of Sedentary Behavior from Accelerometer Data: Measurement Considerations for Surveillance and Research Applications. PLoS ONE 2015, 10, e0118078.
  21. Swan, M. The Quantified Self: Fundamental Disruption in Big Data Science and Biological Discovery. Big Data 2013, 1, 85–99.
  22. Dijkhuis, T.B.; Blaauw, F.J.; van Ittersum, M.W.; Velthuijsen, H.; Aiello, M. Personalized Physical Activity Coaching: A Machine Learning Approach. Sensors 2018, 18, 623.
  23. Du, H.; Venkatakrishnan, A.; Youngblood, G.M.; Ram, A.; Pirolli, P. A Group-Based Mobile Application to Increase Adherence in Exercise and Nutrition Programs: A Factorial Design Feasibility Study. JMIR MHealth UHealth 2016, 4, e4.
  24. Burg, M.M.; Schwartz, J.E.; Kronish, I.M.; Diaz, K.M.; Alcantara, C.; Duer-Hefele, J.; Davidson, K.W. Does Stress Result in You Exercising Less? Or Does Exercising Result in You Being Less Stressed? Or Is It Both? Testing the Bi-directional Stress-Exercise Association at the Group and Person (N of 1) Level. Ann. Behav. Med. 2017, 51, 799–809.
  25. Hartman, S.J.; Nelson, S.H.; Weiner, L.S. Patterns of Fitbit Use and Activity Levels Throughout a Physical Activity Intervention: Exploratory Analysis from a Randomized Controlled Trial. JMIR MHealth UHealth 2018, 6, e29.
  26. McLachlan, G.; Peel, D. Finite Mixture Models; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2005.
  27. Hu, F.B.; Rimm, E.; Smith-Warner, S.A.; Feskanich, D.; Stampfer, M.J.; Ascherio, A.; Sampson, L.; Willett, W.C. Reproducibility and validity of dietary patterns assessed with a food-frequency questionnaire. Am. J. Clin. Nutr. 1999, 69, 243–249.
  28. Hu, F.B. Dietary pattern analysis: A new direction in nutritional epidemiology. Curr. Opin. Lipidol. 2002, 13, 3–9.
  29. Cheung, Y.K.; Yu, G.; Wall, M.M.; Sacco, R.L.; Elkind, M.S.V.; Willey, J.Z. Patterns of leisure-time physical activity using multivariate finite mixture modeling and cardiovascular risk factors in the Northern Manhattan Study. Ann. Epidemiol. 2015, 25, 469–474.
  30. Inaba, M.; Katoh, N.; Imai, H. Applications of weighted Voronoi diagrams and randomization to variance-based k-clustering. In Proceedings of the 10th ACM Symposium on Computational Geometry, Stony Brook, NY, USA, 6–9 June 1994; pp. 332–339.
  31. Fan, J.; Yao, Q. Nonlinear Time Series: Nonparametric and Parametric Methods; Springer: Berlin, Germany, 2003.
  32. Fryzlewicz, P.; Oh, H.S. Thick pen transformation for time series. J. R. Stat. Soc. Ser. B 2011, 73, 499–529.
  33. Tsay, R.S. Some methods for analyzing big dependent data. J. Bus. Econ. Stat. 2016, 34, 673–688.
  34. Lim, Y.; Oh, H.S.; Cheung, Y.K. Functional clustering of accelerometer data via transformed input variables. Unpublished manuscript.
  35. Box, G.E.P.; Jenkins, G.M.; Reinsel, G.C. Time Series Analysis; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2008.
  36. Diaz, K. Ecological Link of Psychosocial Stress to Exercise: Personalized Pathways. Available online: https://osf.io/kmszn/ (accessed on 11 September 2018).
  37. Burg, M.M.; Schwartz, J.E.; Kronish, I.M.; Diaz, K.M.; Alcantara, C.; Duer-Hefele, J.; Davidson, K.W. Does stress result in you exercising less? Or does exercising result in you being less stressed? Or is it both? Testing the bi-directional stress-exercise association at the group and person (n of 1) level. Ann. Behav. Med. 2017, 51, 799–809.
  38. Fitbit Flex. Available online: http://www.fitbit.com (accessed on 11 September 2018).
  39. Diaz, K.; Krupka, D.J.; Chang, M.J.; Peacock, J.; Ma, Y.; Goldsmith, J.; Schwartz, J.E.; Davidson, K.W. Fitbit: An accurate and reliable device for wireless physical activity tracking. Int. J. Cardiol. 2015, 185, 138–140.
  40. Evenson, K.R.; Goto, M.M.; Furberg, R.D. Systematic review of the validity and reliability of consumer-wearable activity trackers. Int. J. Behav. Nutr. Phys. Act. 2015, 12, 159.
  41. Sasaki, J.E.; Hickey, A.; Mavilia, M.; Tedesco, J.; John, D.; Kozey Keadle, S.; Freedson, P.S. Validation of the Fitbit wireless activity tracker for prediction of energy expenditure. J. Phys. Act. Health 2015, 12, 149–154.
  42. Wallen, M.P.; Gomersall, S.R.; Keating, S.E.; Wisloff, U.; Coombes, J.S. Accuracy of heart rate watches: Implications for weight management. PLoS ONE 2016, 11, e0154420.
  43. Fitbit Flex. Available online: https://staticcs.fitbit.com/content/assets/help/manuals/manual_flex_en_US.pdf (accessed on 11 September 2018).
  44. Koenker, R. Quantile Regression (Econometric Society Monographs); Cambridge University Press: Cambridge, UK, 2005.
  45. Fraley, C.; Raftery, A.E. How many clusters? Which clustering method? Answers via model based cluster analysis. Comput. J. 1998, 41, 578–589.
  46. Nang, E.E.K.; Khoo, E.Y.; Salim, A.; Tai, E.S.; Lee, J.; Van Dam, R.M. Patterns of physical activity in different domains and implications for intervention in a multi-ethnic Asian population: A cross-sectional study. BMC Public Health 2010, 10, 644.
  47. Garcia Lopez, P.; Montresor, A.; Epema, D.; Datta, A.; Higashino, T.; Iamnitchi, A.; Barcellos, M.; Felber, P.; Riviere, E. Edge-centric computing: Vision and challenges. ACM SIGCOMM Comput. Commun. Rev. 2015, 45, 37–42.
  48. Klasnja, P.; Hekler, E.B.; Shiffman, S.; Boruvka, A.; Almirall, D.; Tewari, A.; Murphy, S.A. Microrandomized trials: An experimental design for developing just-in-time adaptive interventions. Health Psychol. 2015, 34, 1220–1228.
  49. Mohr, D.C.; Cheung, K.; Schueller, S.M.; Brown, C.H.; Duan, N. Continuous evaluation of evolving behavioral intervention technologies. Am. J. Prev. Med. 2013, 45, 517–523.
  50. Cheung, K.; Ling, W.; Karr, C.J.; Weingardt, K.; Schueller, S.M.; Mohr, D.C. Evaluation of a recommender app for apps for the treatment of depression and anxiety: An analysis of longitudinal user engagement. J. Am. Med. Inf. Assoc. 2018, 25, 955–962.
  51. Hu, X.; Hsueh, P.S.; Qian, M.; Chen, C.-H.; Diaz, K.M.; Cheung, Y.K. A First Step Towards Behavioral Coaching for Managing Stress: A Case Study on Optimal Policy Estimation with Multi-stage Threshold Q-learning. AMIA Annu. Symp. Proc. 2017, 2017, 930–939.
  52. Marschollek, M. A semi-quantitative method to denote generic physical activity phenotypes from long-term accelerometer data—The ATLAS index. PLoS ONE 2013, 8, e63522.
  53. Rodriguez-Paras, C.; Tippey, K.; Brown, E.; Sasangohar, F.; Creech, S.; Kum, H.-C.; Lawley, M.; Benzer, J.K. Posttraumatic Stress Disorder and Mobile Health: App Investigation and Scoping Literature Review. JMIR MHealth UHealth 2017, 5, e156.
  54. Kiral-Kornek, I.; Roy, S.; Nurse, E.; Mashford, B.; Karoly, P.; Carroll, T.; Payne, D.; Saha, S.; Baldassano, S.; O’Brien, T.; et al. Epileptic Seizure Prediction Using Big Data and Deep Learning: Toward a Mobile System. EBioMedicine 2018, 27, 103–111.
  55. Garcia-Alamino, J.M.; Ward, A.M.; Alonso-Coello, P.; Perera, R.; Bankhead, C.; Fitzmaurice, D.; Heneghan, C.J. Self-monitoring and self-management of oral anticoagulation. Cochrane Database Syst. Rev. 2010, 7, CD003839.
  56. Roditi, D.; Robinson, M.E. The role of psychological interventions in the management of patients with chronic pain. Psychol. Res. Behav. Manag. 2011, 4, 41–49.
  57. Wells, N.; Pasero, C.; McCaffery, M. Improving the Quality of Care Through Pain Assessment and Management. In Patient Safety and Quality: An Evidence-Based Handbook for Nurses. Agency for Healthcare Research and Quality (US); NCBI: Bethesda, MD, USA, 2008. Available online: http://0-www-ncbi-nlm-nih-gov.brum.beds.ac.uk/pubmed/21328759 (accessed on 6 September 2018).
Figure 1. Mean activity of the 8 physical activity clusters by multivariate finite mixture modeling. Lower right: Superimposed cumulative step counts of the 8 clusters.
Figure 2. Heatmap of activity patterns of the 79 participants on weekdays and weekends. The color code indicates the proportion of days that a participant fell into each activity cluster.
Table 1. Physical activity clusters by multivariate finite mixture modeling.

| Cluster ID | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| N | 409 | 1302 | 2285 | 2751 | 7819 | 1678 | 2326 | 2823 |
| Daily step counts | 961 | 6227 | 6855 | 8037 | 8999 | 9379 | 9396 | 10,038 |
| Activity midday (a) | 11:30 a.m. | 1:00 p.m. | 2:00 p.m. | 3:30 p.m. | 2:00 p.m. | Noon | 3:00 p.m. | 5:00 p.m. |
| PA minutes (b) | 7.3 | 42.3 | 45.6 | 52.8 | 59.9 | 65.9 | 65.1 | 72.7 |
| Weekend (c) | 37% | 40% | 39% | 35% | 16% | 46% | 30% | 23% |

(a) Time of day when 50% of daily counts were achieved; time was rounded to the nearest half-hour. (b) Duration (in minutes) with ≥50 counts per minute. (c) Percent of time series in the cluster that fell on a weekend.
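The footnote definitions of activity midday and PA minutes can be illustrated with a minimal Python sketch. It assumes a single day stored as 1440 per-minute step counts; the function name `summarize_day`, the end-of-minute convention for the 50% crossing, and the toy data are illustrative assumptions, not the authors' code. The values in Table 1 are cluster-level summaries, and the exact computation behind them is not given here.

```python
from typing import Sequence, Tuple


def summarize_day(counts: Sequence[int], threshold: int = 50) -> Tuple[int, float, int]:
    """Return (daily total, activity midday in hours, PA minutes) for one day.

    counts: per-minute step counts starting at midnight (assumed layout).
    threshold: minimum counts per minute to qualify as a PA minute.
    """
    total = sum(counts)
    # PA minutes: number of minutes with at least `threshold` counts.
    pa_minutes = sum(1 for c in counts if c >= threshold)
    if total == 0:
        return 0, float("nan"), pa_minutes

    # Activity midday: first minute at which the running total
    # reaches 50% of the daily total.
    running = 0
    midday_minute = 0
    for minute, c in enumerate(counts):
        running += c
        if 2 * running >= total:
            midday_minute = minute + 1  # clock time at the end of this minute
            break

    # Round to the nearest half-hour (half-up), expressed in hours.
    midday_hours = ((midday_minute + 15) // 30) / 2
    return total, midday_hours, pa_minutes


# Toy day (hypothetical): 60 steps/min for 08:00-08:30, 100 steps/min for 18:00-18:30.
day = [0] * 1440
for m in range(480, 510):
    day[m] = 60
for m in range(1080, 1110):
    day[m] = 100

print(summarize_day(day))  # (4800, 18.0, 60)
```

In this toy day the morning bout contributes 1800 of the 4800 steps, so the 50% mark is reached a few minutes into the evening bout and rounds to 6:00 p.m.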
Table 2. Association (odds ratio) of physical activity clusters and participant characteristics.

| Cluster ID | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| Age (a) | 0.99 | 0.96 *** | 0.99 | 0.98 * | 1.02 * | 1.01 | 1.00 | 1.01 |
| Male (ref: Female) | 0.77 | 0.94 | 0.80 | 1.37 | 0.86 | 1.04 | 1.02 | 0.95 |
| NHW (b) (ref: others) | 0.65 | 0.60 * | 0.80 | 0.85 | 1.35 * | 1.04 | 0.91 | 1.23 * |
| Education (c) | 1.02 | 0.66 ** | 1.01 | 0.75 * | 1.15 | 1.17 | 0.91 | 1.11 |
| Full-time (FT) (ref: Part-time, PT) | 1.17 | 0.44 * | 0.93 | 0.42 *** | 3.49 *** | 0.57 ** | 1.01 | 1.41 * |
| Being single (ref: Partner/spouse) | 0.74 | 2.37 *** | 0.76 * | 1.72 *** | 0.65 ** | 1.02 | 1.19 * | 0.85 |

(a) Odds ratio per one-year increase in age. (b) NHW: Non-Hispanic white. (c) Education as an ordinal variable: 0 = less than college; 1 = college graduate; 2 = above college. * p ≤ 0.05, ** p ≤ 0.01, *** p ≤ 0.001.
Table 3. Integrated mean squared errors in estimating the mean activity of the eight clusters (columns are cluster IDs).

| K | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| 0 (a) | 563 | 3191 | 15,674 | 16,106 | 22,800 | 32,004 | 26,226 | 47,884 |
| 3 | 224 | 1690 | 5189 | 3926 | 13,473 | 8631 | 7356 | 15,360 |
| 9 | 59 | 609 | 880 | 908 | 2763 | 1519 | 2132 | 3084 |
| 19 | 29 | 255 | 253 | 342 | 676 | 506 | 626 | 826 |
| 39 | 19 | 131 | 98 | 135 | 181 | 188 | 211 | 237 |

(a) K = 0 corresponds to approximation using daily step counts only; activity is assumed to be uniform throughout the day.

MDPI and ACS Style

Cheung, Y.K.; Hsueh, P.-Y.S.; Ensari, I.; Willey, J.Z.; Diaz, K.M. Quantile Coarsening Analysis of High-Volume Wearable Activity Data in a Longitudinal Observational Study. Sensors 2018, 18, 3056. https://0-doi-org.brum.beds.ac.uk/10.3390/s18093056