ANNALSATS Articles in Press. Published on 19-October-2017 as 10.1513/AnnalsATS.201702-101OC
Discovering Pediatric Asthma Phenotypes Based on Response to Controller Medication Using
Machine Learning
Mindy K. Ross MD, MBA, MAS1, Jinsung Yoon MS2, Auke van der Schaar MSE3, Mihaela van der
Schaar PhD2,4
1 Department of Pediatrics, Division of Pediatric Pulmonology and Sleep Medicine, University of
California Los Angeles, Los Angeles, CA
2 Department of Electrical Engineering, University of California Los Angeles, Los Angeles, CA
3 Stratagem Technologies, London, United Kingdom
4 Man-Institute, University of Oxford, Oxford, United Kingdom
Corresponding Author:
Mindy K. Ross
10833 Le Conte Ave
MDCC 22-387B
Los Angeles, CA 90095
Author Contributions: MKR conceptualized the project, was involved in the methodology,
validation, data curation, writing – original draft, review & editing, supervision, and project
administration. JY was involved in the algorithm methodology, software application, validation,
formal analysis, data curation, data visualization, and writing – review & editing. AVDS was
involved in the algorithm methodology, software application, validation, formal analysis, data
curation, data visualization, and writing – review & editing. MVDS was involved in the
supervision of the methodology, software application, validation, formal analysis, resources,
data curation, data visualization, and writing – review & editing.
Sources of Support: Ross- U54TR001627; Yoon, van der Schaar, M.- NSF ECCS1407712
Running Title: Pediatric Asthma Phenotypes and Treatment Response
ATS Descriptor Code: 1.16
MeSH Terms: Obesity, Personalized Medicine
Word Count: 3,330
This article has a data supplement, which is accessible from this issue's table of contents online
at www.atsjournals.org
Copyright © 2017 by the American Thoracic Society
Abstract
Rationale: Pediatric asthma has variable underlying inflammation and symptom control.
Approaches to address this heterogeneity such as clustering methods to find phenotypes and
predict outcomes have been investigated. However, clustering based upon the relationship
between treatment and clinical outcome has not been performed, and machine learning
approaches for long-term outcome prediction in pediatric asthma have not been studied in
depth.
Objectives: Our objectives were to use our novel machine learning algorithm, Predictor Pursuit
(PP), to discover pediatric asthma phenotypes based on asthma control in response to controller
medications, to predict longitudinal asthma control among children with asthma, and to
identify features associated with asthma control within each discovered pediatric phenotype.
Methods: We applied PP to the Childhood Asthma Management Program study data (n=1,019)
to discover phenotypes based on asthma control between assigned controller therapy groups
(budesonide vs. nedocromil). We confirmed PP’s ability to discover phenotypes using the
Asthma Clinical Research Network/Childhood Asthma Research and Education network data.
We next predicted children’s asthma control over time and compared PP’s performance to
traditional prediction methods. Last, we identified clinical features most correlated with asthma
control in the discovered phenotypes.
Results: Four phenotypes were discovered in both datasets: allergic-not-obese (A(+)/O(-)), obese-not-allergic (A(-)/O(+)), allergic-and-obese (A(+)/O(+)), and not-obese-not-allergic (A(-)/O(-)). Of the
well-controlled children in the CAMP dataset, we found more non-obese children treated with
budesonide than nedocromil (p=0.015) and more obese children treated with nedocromil than
budesonide (p=0.008). Within the obese group, more A(+)/O(+) children were well-controlled
with nedocromil than budesonide (p=0.022) or placebo (p=0.011). The PP algorithm performed
significantly better (p<0.001) than traditional machine learning algorithms for both short and
long-term asthma control prediction. Current asthma control state and bronchodilator response were the
features most predictive of short-term asthma control regardless of type of controller
medication or phenotype. Bronchodilator response and serum eosinophils were the most
predictive features of long-term asthma control regardless of type of controller medication or phenotype.
Conclusions: Advanced statistical machine learning approaches can be a powerful tool to
discover phenotypes based upon treatment response and can aid outcome prediction
in complex medical conditions such as asthma.
Abstract Word Count: 350
Asthma is a complex disease with heterogeneous inflammation between and within individuals.
Over recent decades, the construct of TH2-predominant inflammation has broadened to include
TH1, TH17, TReg inflammatory profiles, and more.1,2 Because inhaled corticosteroids have a wide
range of anti-inflammatory properties, they are recommended as the first-line asthma
medication for all patients with persistent asthma.3 However, not all individuals respond the
same to controller medication within and between medication classes4,5 and ten percent or
more of those with asthma are considered difficult to control.6 In order to find patterns in the
data, approaches such as clustering and predictive modeling have been used.
Clustering methods (e.g. k-means or latent class analysis) have found specific
phenotypes in adults and children that include allergic markers, body mass index (BMI), age of
asthma onset, clinical manifestations, and severity.7-19 However, these methods have limitations.
They often require clinicians to choose the features (variables) included in the model, which can
introduce feature selection bias. In addition, there is lack of statistical confirmation of the
differences between clusters. Most importantly for clinical decisions, they do not inform which
treatment an individual patient may benefit from the most.
Traditional prediction algorithms (e.g. logistic regression) to predict asthma control have
shown promise. It has been found that short-term asthma control is most indicative of future
control.20-22 However, long-term (one year or more) asthma control prediction is more
challenging, in part due to instability of features that change over time, such as adherence and
seasonality.23,24 Standard models often cannot capture these complex relationships because
they usually apply a “one-size-fits-all” model to the entire feature space (i.e. all dimensional
combinations of variables in the dataset). To our knowledge, long-term asthma control
prediction has not been assessed in depth in children.
Our novel machine learning tool, Predictor Pursuit (PP),25 addresses these limitations of
other machine learning and prediction methods. The PP tool was designed to discover
phenotypes and predict clinical outcomes in an entirely data-driven fashion with the ability to
find heterogeneous relationships among clinical features and outcomes. Therefore, the asthma
domain is ideal for our tool to discover complex data patterns that have clinical relevance.
In this study, we used the PP machine learning algorithm to 1) discover statistically
distinct asthma phenotypes based upon asthma control according to the type of controller
therapy, 2) predict asthma control state (both long and short term) based on clinical features,
and 3) identify the most predictive clinical features of asthma control state for the discovered
phenotypes. Some of the results have been previously reported in the form of an abstract.26
Methods
Predictor Pursuit (PP) Algorithm
The first capability of PP is to identify phenotypes (subgroups) of children based upon statistical
differences in asthma control status. The method iteratively discovers phenotypes in a
dataset until there are no statistical differences that can lead to further division (Figure 1). This
permits independent, data-driven discoveries.
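The iterative division described above can be illustrated with a minimal sketch. This is not the authors' implementation of PP. It assumes binary features, a binary outcome (1 = well-controlled), two treatment arms, and a fixed threshold on a Welch two-sample t statistic as the splitting criterion. The names `welch_t`, `arm_difference`, `discover_phenotypes`, `min_size`, and `t_threshold` are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch two-sample t statistic; returns 0.0 when it is undefined."""
    if len(a) < 2 or len(b) < 2:
        return 0.0
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return 0.0 if se == 0 else (mean(a) - mean(b)) / se


def arm_difference(rows):
    """|t| for the outcome (1 = well-controlled) between the two treatment arms."""
    arm0 = [r["outcome"] for r in rows if r["arm"] == 0]
    arm1 = [r["outcome"] for r in rows if r["arm"] == 1]
    return abs(welch_t(arm0, arm1))


def discover_phenotypes(rows, features, min_size=10, t_threshold=2.0, path=()):
    """Recursively split on the binary feature whose subgroups show the largest
    between-arm difference; stop when no split passes the threshold."""
    best = None
    for f in features:
        left = [r for r in rows if r[f] == 0]
        right = [r for r in rows if r[f] == 1]
        if len(left) < min_size or len(right) < min_size:
            continue  # hypothetical minimum subgroup size
        score = max(arm_difference(left), arm_difference(right))
        if score >= t_threshold and (best is None or score > best[0]):
            best = (score, f, left, right)
    if best is None:
        return [(path, rows)]  # no further statistically distinct division
    _, f, left, right = best
    rest = [g for g in features if g != f]
    return (discover_phenotypes(left, rest, min_size, t_threshold, path + ((f, 0),))
            + discover_phenotypes(right, rest, min_size, t_threshold, path + ((f, 1),)))
```

Each returned pair holds the sequence of (feature, value) splits that defines a subgroup and the rows assigned to it. A fixed t threshold stands in here for a formal significance test.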
The second capability is to predict clinical outcome based on all available features.
Predictor Pursuit sequentially divides the feature space and assigns different predictive models
for each discovered feature subspace to capture relationships between features and the
outcomes in order to maximize prediction accuracy (Figure 2). It continues to divide the feature
space until there are no prediction performance improvements that can lead to a further
division. The advantage of this method is that we can use heterogeneous (different) predictive
models for the discovered feature subspaces, which is not possible with traditional machine
learning methods such as logistic regression (Figure 3). The PP method does not require
variable standardization because it uses statistical differences (the two sample t-test)27 as
criteria rather than a distance metric. Additional details on the above methods are provided in
an online data supplement.
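The idea of assigning different predictive models to different feature subspaces can be sketched as follows. This is an illustrative toy, not the PP algorithm itself. It assumes binary features, a binary outcome, and a deliberately tiny set of candidate models (predict the majority class, copy one feature, or invert one feature). A split is accepted only when it improves accuracy on a validation set. All function names and the `min_size` parameter are hypothetical.

```python
def accuracy(model, rows):
    """Fraction of (x, y) rows for which model(x) equals y."""
    return sum(model(x) == y for x, y in rows) / len(rows) if rows else 0.0


def fit_majority(rows):
    """Constant predictor returning the majority outcome of the training rows."""
    label = 1 if 2 * sum(y for _, y in rows) >= len(rows) else 0
    return lambda x: label


def best_model(train, n_features):
    """Pick the candidate model with the highest training accuracy."""
    candidates = [fit_majority(train)]
    for j in range(n_features):
        candidates.append(lambda x, j=j: x[j])       # outcome follows feature j
        candidates.append(lambda x, j=j: 1 - x[j])   # outcome opposes feature j
    return max(candidates, key=lambda m: accuracy(m, train))


def pursue(train, valid, n_features, min_size=5):
    """Return a predictor; divide the feature space only while validation
    accuracy improves, fitting a separate model in each subspace."""
    node_model = best_model(train, n_features)
    best_acc, best_split = accuracy(node_model, valid), None
    for j in range(n_features):
        parts = [([r for r in train if r[0][j] == v],
                  [r for r in valid if r[0][j] == v]) for v in (0, 1)]
        if any(len(tr) < min_size or len(va) < min_size for tr, va in parts):
            continue
        models = [best_model(tr, n_features) for tr, _ in parts]
        correct = sum(accuracy(m, va) * len(va) for m, (_, va) in zip(models, parts))
        acc = correct / len(valid)
        if acc > best_acc:
            best_acc, best_split = acc, (j, parts)
    if best_split is None:
        return node_model  # no performance improvement from further division
    j, parts = best_split
    children = [pursue(tr, va, n_features, min_size) for tr, va in parts]
    return lambda x: children[x[j]](x)
```

In a dataset where the outcome follows one feature in one subgroup and opposes it in the other, no single candidate model does better than chance, whereas the divided predictor can fit each subgroup separately.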
Study Population
We applied the PP algorithm to two datasets: the Childhood Asthma Management Program
(CAMP) trial, and the Asthma Clinical Research Network (ACRN)/Childhood Asthma Research
and Education (CARE) network.
The CAMP trial28 is a large randomized, placebo-controlled pediatric asthma study. The
de-identified dataset was obtained by request through the National Heart, Lung and Blood
Institute’s Biologic Specimen and Data Repository Information Coordinating Center
(BioLINCC).29 The study included 1,041 participants with mild to moderate persistent asthma,
ages 5-12 years old assigned to budesonide, nedocromil, or placebo medication. Per the study
protocol, the children were assessed at baseline and every four months. There were 962
features (variables) collected at these intervals including sociodemographics, lung function
measurements, asthma morbidity, use of healthcare resources, side effects, change in
controller medication, missed school days, physical growth and development, and psychological
development.
The Asthma Clinical Research Network (ACRN)/Childhood Asthma Research and
Education (CARE) network dataset30 is a collection of adults and children with mild-to-severe
asthma from multiple studies and is a mix of observational and randomized-controlled (with
and without cross-over design) studies. The ACRN/CARE data was obtained by request from the
NHLBI SNP Health Association Asthma Resource Project (SHARP) through the database of
Genotypes and Phenotypes (dbGaP).31 There were a total of 1,353 adults and children in the
dataset. There was a variety of inhaled corticosteroids (ICS) in the dataset (beclomethasone,
budesonide, flunisolide, fluticasone, and triamcinolone), so we grouped them into one category
for comparison to montelukast. We selected children aged 5-12 years old with any ICS or
montelukast (n=684). We harmonized 56 features across the datasets. Features included
sociodemographics, lung function measurements, asthma symptoms, use of healthcare
resources, and physical growth.
To ensure a consistent data trend over time, we excluded individuals with fewer than
four clinical follow-up visits documented over the entire study period in the CAMP dataset and
fewer than four consecutive visits in the ACRN/CARE dataset. We excluded children with other
known pulmonary conditions, such as cystic fibrosis. The UCLA Institutional Review Board
approved this study.
Features (Variables)
We excluded features with more than 5% missing data. Missing data was imputed using the k-nearest neighbors (k-NN) imputation methodology.32 In addition to the provided features, we
generated features documented in the literature as relevant to asthma phenotypes, such as
allergic status, obese status, bronchodilator response, and adherence.8 In both datasets, a BMI
of ≥95th percentile for age defined obesity. This was calculated by converting the raw BMI
scores to percentiles using a BMI conversion table based on age and sex. In the CAMP dataset,
we defined an allergic state by the presence of any one of the following features: a positive skin
test to any allergen, history of allergy shots, or physician diagnosis of allergy. In the ACRN/CARE
dataset, there were not as many features available; therefore, allergic status was determined
by a positive skin test to any allergen. For the CAMP dataset only, bronchodilator response was
defined as the change in the forced expiratory volume in one second (FEV1%) following
bronchodilator administration. Non-adherence was determined by a “no” response to the question
“takes medicine as prescribed.”
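As an illustration of k-NN imputation, the following minimal sketch fills each missing value with the mean of that feature over the k most similar records. The details are assumptions rather than the study's exact procedure: missing entries are represented as `None`, similarity is a root-mean-squared difference over the features both records have observed, and the function name `knn_impute` is hypothetical.

```python
import math


def knn_impute(rows, k=3):
    """Return a copy of rows in which each None entry is replaced by the mean
    of that column over the k nearest rows that have the value observed."""
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf  # nothing to compare on
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = sorted(
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )[:k]
            if donors:
                filled[i][j] = sum(v for _, v in donors) / len(donors)
    return filled
```

Distances are always computed from the original rows, so an imputed value never influences another imputation. Features on different scales would normally be standardized before computing distances, a step omitted here for brevity.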
Outcome Measurement
The outcome measurement was asthma control state (well-controlled vs. not well-controlled)
as defined by the 2007 NAEPP asthma guideline criteria for impairment and risk.3
After feature exclusion, “not well-controlled” was defined in the CAMP dataset by the
presence of one or more of the following: FEV1 <80% predicted, symptoms >2 times per week,
use of short-acting beta agonist >2 days per week, any limitations in normal activity, or any
emergency room visit or hospitalization. In the ACRN/CARE dataset, “not well-controlled” was
defined as FEV1 <80% predicted, symptoms >2 times per week, rescue medication use <1 time
per week, any emergency room visit or hospitalization, or oral steroid use more than once per
year.
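The CAMP outcome definition can be expressed as a simple rule. The sketch below encodes the CAMP criteria listed above. The dictionary keys are hypothetical field names, not the variable names used in the CAMP dataset.

```python
def camp_not_well_controlled(visit):
    """True if any CAMP criterion for 'not well-controlled' is met at a visit.

    `visit` is a dict with hypothetical keys describing one study visit.
    """
    return bool(
        visit["fev1_pct_predicted"] < 80          # FEV1 <80% predicted
        or visit["symptom_times_per_week"] > 2    # symptoms >2 times per week
        or visit["saba_days_per_week"] > 2        # short-acting beta agonist >2 days per week
        or visit["activity_limited"]              # any limitation in normal activity
        or visit["er_or_hospitalization"]         # any emergency room visit or hospitalization
    )


# Example: a visit meeting none of the criteria is classified as well-controlled.
example_visit = {
    "fev1_pct_predicted": 92,
    "symptom_times_per_week": 1,
    "saba_days_per_week": 0,
    "activity_limited": False,
    "er_or_hospitalization": False,
}
is_well_controlled = not camp_not_well_controlled(example_visit)
```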
Study Design/Analysis
Phenotype discovery. We applied PP to each dataset to divide the children into phenotypes
that maximized the difference in asthma control between controller therapy groups (CAMP
trial=budesonide vs. nedocromil; ACRN/CARE data=ICS vs. montelukast). For both datasets, we
separated the data into independent training and testing sets (i.e. the training set discovered
the phenotypes and then confirmed these in the independent testing sets). We used 50% of the
cases at random for the training set and the other 50% were the testing set. We examined
associations between the type of asthma controller therapy and asthma control within each
phenotype using proportional tests33 (with significance level of 0.05) and verified these results
with permutation tests.34 We used interaction tests35 to test whether the associations between
the type of asthma controller therapy and asthma control varied between phenotypes. While
the ACRN/CARE dataset contained mostly participants from randomized trials, it also included
observational data. Therefore, we used inverse propensity of treatment weighting (IPTW) to
account for treatment selection bias from this dataset before applying PP.36
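The comparison of well-controlled proportions between two treatment groups, with a permutation check, can be sketched as follows. This is a generic illustration, assuming a pooled two-sided z-test for two proportions and a label-shuffling permutation test on the difference in proportions. It is not taken from the study's code, and the function names are hypothetical.

```python
import math
import random


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-sided z-test for x1/n1 versus x2/n2; returns (z, p_value)."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("test undefined when all outcomes are identical")
    z = (x1 / n1 - x2 / n2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p_value


def permutation_p_value(outcomes_a, outcomes_b, n_perm=10000, seed=0):
    """Share of random relabelings whose absolute difference in proportions
    is at least as large as the observed one."""
    rng = random.Random(seed)
    n_a = len(outcomes_a)

    def abs_diff(a, b):
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = abs_diff(outcomes_a, outcomes_b)
    pooled = list(outcomes_a) + list(outcomes_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs_diff(pooled[:n_a], pooled[n_a:]) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0
```

Agreement between the two p-values suggests that the normal approximation is adequate for the subgroup sizes involved.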
For each phenotype identified in the CAMP study data, we used a Markov chain
model to estimate the likelihood of patients within phenotypes remaining in their
current asthma control state at four-month intervals based upon the previous state. The results
are presented in the supplement (Figure E4).
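A first-order Markov chain of this kind can be estimated by counting transitions between consecutive visits. The sketch below assumes each child's history is a list of control states (1 = well-controlled, 0 = not well-controlled) at successive four-month visits. The function name `transition_probabilities` is hypothetical and the model fitted in the study may differ in detail.

```python
from collections import Counter


def transition_probabilities(sequences):
    """Estimate P(next state | previous state) from per-patient state sequences."""
    counts = Counter()
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[(prev, nxt)] += 1
    totals = Counter()
    for (prev, _), c in counts.items():
        totals[prev] += c
    return {(prev, nxt): c / totals[prev] for (prev, nxt), c in counts.items()}
```

The entry for `(1, 1)` is then the estimated probability that a well-controlled child remains well-controlled at the next visit.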
Asthma control prediction and predictive features. We performed the final two aims of
the study using the CAMP dataset because of its larger sample size and greater number of
features. First, we applied PP to predict asthma control based on all available features over
both the short-term (four months) and long-term (after 4 follow-up visits, or approximately one
year). If asthma was “well-controlled” at three or more follow-up visits, we classified the
participant as “well-controlled”; otherwise, they were classified as “not well-controlled.” In
order to validate the results, we trained the predictive model on a training set and tested it on
an independent testing set using 5-fold cross validation. We compared the results to traditional
machine learning methods: neural networks, logistic regression, adaptive boosting, random
forest, Support Vector Machine (SVM), and Naïve Bayes using two sample t-tests.
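The long-term label and the cross-validation split described in this paragraph can be sketched in a few lines. The function names and the `seed` parameter are hypothetical. The rule of three or more well-controlled visits comes from the text. The fold construction is one common way to perform 5-fold cross-validation and is not necessarily the one used in the study.

```python
import random


def long_term_label(visit_states, min_controlled=3):
    """1 ('well-controlled') if at least `min_controlled` follow-up visits were
    well-controlled (1), otherwise 0 ('not well-controlled')."""
    return 1 if sum(visit_states) >= min_controlled else 0


def k_fold_indices(n, k=5, seed=0):
    """Yield (train_indices, test_indices) for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test
```

Each participant appears in exactly one testing fold, so every prediction is made by a model that did not see that participant during training.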
Next, we identified the strongest (based upon Pearson correlation coefficient) indicative
features, for short and long-term asthma control, for the four discovered phenotypes. We
determined the important features regardless of assigned medication and then analyzed the
features within each phenotype by assigned treatment (budesonide or nedocromil). We further
studied the predictive value of each feature over the long-term using the Python sklearn
package and the results are presented in the supplement (Figure E5).
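Ranking features by the strength of their Pearson correlation with the outcome can be sketched as follows. This is a generic illustration, not the study's code. The function names are hypothetical, and features with zero variance are assigned a correlation of 0 by convention.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # correlation undefined for a constant sequence
    return sxy / math.sqrt(sxx * syy)


def rank_features(columns, outcome, top=10):
    """Return up to `top` (feature name, r) pairs ordered by decreasing |r|."""
    scored = [(name, pearson_r(values, outcome)) for name, values in columns.items()]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)[:top]
```

Ranking on the absolute value keeps strongly negative correlations near the top of the list alongside strongly positive ones.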
Results
There were 1,019 children from the CAMP study and 669 children from the ACRN/CARE dataset
in the final analysis. The baseline features of the groups are described in Tables 1 and 2. There
were 602 (out of 962) clinical features (variables) used in our model from the CAMP study and
54 (out of 57) from the ACRN/CARE dataset. Over the end-segment of the studies, we classified
36.7% of the children as well-controlled in the CAMP study and 21.5% as well-controlled in the
ACRN/CARE dataset.
Phenotype Discovery
The algorithm discovered that obesity and allergy related features were the most statistically
significant features that distinguished poor control from good control in both the training and
testing sets. A sensitivity analysis was performed around one of PP’s parameters, the minimum
number of patients in each subgroup. This parameter was varied from 10 to 200. There was no
difference in discovered phenotypes in the range of 10 to 200, but above 200, no phenotypes
were discovered. Four groups were identified: allergic-not-obese (A(+)/O(-)), obese-not-allergic
(A(-)/O(+)), allergic-and-obese (A(+)/O(+)), not-allergic-not-obese (A(-)/O(-)). Overall, for both
datasets, we did not find a significant difference in the distribution of assigned controller
medications between phenotypes (Tables 1 and 2).
In both the training and test sets of the CAMP study, two phenotypes had significantly
different control states by assigned controller medication (Table 3). The A(+)/O(+) phenotype
contained more children that were well-controlled with nedocromil vs. budesonide (52.6% vs.
16.7%, p=0.022) or vs. placebo (52.6% vs. 19.4%, p=0.011). The A(+)/O(-) phenotype had more
well-controlled children treated with budesonide than nedocromil (47.2% vs. 31.0%, p=0.030).
In this phenotype, there was no significant difference in asthma control between budesonide and placebo (47.2%
vs. 35.2%, p=0.110). Longitudinal asthma control among A(+)/O(-) and A(+)/O(+) phenotypes
stratified by controller therapy is shown in Figure 4. For the A(-)/O(+) or A(-)/O(-) phenotypes,
there was no significant difference in the number of well-controlled children assigned
budesonide, nedocromil or placebo.
Well-controlled obese children (with or without allergies) were more frequently treated
with nedocromil as compared to budesonide (52.2% vs. 16.0%, p=0.008) or placebo (52.2% vs.
20.9%, p=0.009), and non-obese children were more often treated with budesonide as
compared to nedocromil (48.6% vs. 33.8%, p=0.015) or placebo (48.6% vs. 34.2%, p=0.016).
In the ACRN/CARE dataset, asthma control varied between those receiving ICS
compared to montelukast among those with the A(+)/O(-) phenotype (Table 4). This group had
more well-controlled children treated with ICS (40.0% vs. 19.2%, p=0.004). There was no
significant difference in the number of well-controlled children between ICS and montelukast
for children categorized as A(+)/O(+), A(-)/O(+), or A(-)/O(-). Well-controlled allergic children (with or
without obesity) were more frequently treated with an ICS as compared to montelukast (39.9%
vs. 24.1%, p=0.019), and non-obese children were more often treated with an ICS rather than
montelukast (33.6% vs. 18.9%, p=0.011).
Outcome Prediction and Predictive Features
For the entire CAMP study, Predictor Pursuit’s short-term prediction accuracy (AUC) for level of
control over time using all features in the dataset that met inclusion criteria was 0.86. The long-term prediction accuracy was 0.66. In both cases, PP performed statistically better (with
significance level 0.05) when compared to typical machine-learning methods on this dataset
(Table 5). Our results for short-term asthma control prediction were consistent with previous
literature that the current control state is the most indicative of control state at the next
assessment based on the Pearson correlation coefficient (r=0.58). The two most correlated
predictive features for long-term control were bronchodilator response (r = -0.28) and serum
percent eosinophils (r = -0.16). The 10 strongest predictive features are reported in the
supplement (Table E1).
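For readers less familiar with the AUC metric reported above, the following sketch computes it directly from its pairwise definition: the probability that a randomly chosen well-controlled case receives a higher predicted score than a randomly chosen not-well-controlled case, with ties counted as one half. The function name `auc` is hypothetical and the study's own computation may have used a library routine.

```python
def auc(scores, labels):
    """Area under the ROC curve from predicted scores and binary labels (1/0)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two groups.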
When the cohort was divided into the four phenotypes (A(+)/O(-), A(-)/O(+), A(+)/O(+), and
A(-)/O(-)), short-term asthma control was still best indicated by previous asthma control state
(Table E2). Over the long-term, bronchodilator response and serum eosinophils predicted
better asthma control (Table 6, Table E3). When these groups were examined based on
assigned medication (budesonide or nedocromil), the strongest predictive features remained
the current control state for short-term prediction and bronchodilator response for long-term
prediction (Table 7, Table E4).
Discussion
Predictor Pursuit discovered that obesity and allergy features determined phenotypes with the
most significant differences in asthma control based on assigned controller medication. More
well-controlled non-obese (specifically allergic-non-obese) children were treated with
budesonide versus nedocromil, whereas more well-controlled obese (specifically allergic-and-obese) children were treated with nedocromil versus budesonide. Asthma control over the
short and long-term was predicted with better accuracy by PP than standard machine learning
algorithms such as logistic regression. The most relevant features for short-term control
prediction were the current control state and bronchodilator response. The strongest predictive
features for long-term control prediction were bronchodilator response and serum eosinophils.
Of the four discovered phenotypes, we found that more well-controlled A(+)/O(+)
children were treated with nedocromil rather than budesonide or placebo. More well-controlled A(+)/O(-) children were assigned to budesonide rather than nedocromil, but there was
no statistically significant advantage over placebo. The PP clustering algorithm was able to find
these patterns because of its advantages over traditional machine learning algorithms:
(1) there is no clinician bias of input variables (i.e. it is an entirely data-driven methodology), (2)
it identifies phenotypes based on relationships between treatment and controllability, and (3) it
guarantees that the discovered phenotypes have statistically significant differences between
the treatment responses.
Our study is aligned with the finding by Forno et al. that overweight/obese-asthmatics in
the CAMP study responded less favorably to budesonide, which our study shows with statistical
significance, but they did not assess the response to nedocromil.37 It is notable that our method
discovered this obese-asthmatic group using a machine learning approach rather than manually
choosing the cohort to study. Another secondary analysis of CAMP data by Howrylak et al. used
spectral clustering to find clusters within this dataset and evaluate the cumulative probability of
oral steroid course and time to switch controller therapy.38 Their approach used a smaller
number of clinical features to find clusters, whereas our method used hundreds of variables to
determine clusters based on response to medication in a data-driven fashion. In addition, by
using the BMI percentile to define obese children, which was not performed in their paper, we
discovered that obesity is an important feature correlated with treatment response.
Our findings support previous knowledge that TH2-based allergic asthma responds most
favorably to ICS.39 Although there can be some allergic overlap, obese asthma tends to be less
responsive to ICS.40-44 This is thought to be due in part to other active inflammatory
pathways involving TH17, mast cells, and neutrophils.45-50 The literature reports that children
with obese and allergic asthma have more severe and/or poorly controlled symptoms,42,51 so it
is valuable to determine which, if any, medications may be more conducive to treat this group
of children.
Because of the additional or different underlying inflammation, non-ICS based controller
medications could potentially be targeted toward certain children with obese-asthma. Previous
reports in the literature about these controller medications have suggested that the leukotriene
receptor antagonist montelukast may have an effect on neutrophilic inflammation, and a
recent abstract reported that adults with mild A(+)/O(+) asthma had a more favorable response
to montelukast than ICS.52 The 5-lipoxygenase inhibitors have been suggested to affect the
leukotriene B4 pathway produced by neutrophils.53-56 Mast cells have been implicated in obese
asthma and nedocromil and cromolyn may also have an effect on neutrophil function in
addition to mast cells.57,58 Finally, theophylline has been shown to affect apoptosis and
chemotaxis of neutrophils.59,60
In regard to asthma control state prediction, PP outperformed traditional predictive
models such as logistic regression and SVM in the short and long-term time periods. It was able
to discover more detailed predictive features for long-term asthma control other than the
current control state. The PP algorithm (aka ConfidentMatch) has also been successfully applied
to the heart transplant domain in terms of predicting transplant outcomes.25 A next step is
integrating these predictive models into real-time data in order to identify children who are
most at-risk for an asthma exacerbation.
While PP’s findings appear to be clinically aligned, our study does have some
shortcomings. First, PP has limitations typical of any machine learning approach in that we
assume better predictive power with larger datasets. Some features were eliminated due to
missing data and the small amount of data imputation may have affected our results. The
prediction accuracy for asthma control was higher for short-term intervals (four months) rather
than longer intervals. This may be due to other factors that affect long-term controllability such as
adherence, seasonality, and other features that change over time. In terms of adherence, we
could not eliminate children that reported non-adherence because the sample size would have
been too small for our method to analyze. However, we noted in the CAMP study data that the
level of adherence was similar between all treatment groups and the PP algorithm did not
identify non-adherence as a feature that determined different medication response between
the two groups.
While our goal was to predict asthma control by phenotype based on controller medication,
we are of course limited in our clinical assumptions with this post-hoc analysis of a pre-existing
dataset. By the end of the study, the majority of children were not well-controlled and often in
the well-controlled group, the treatment medications did not perform significantly better than
placebo.
In conclusion, PP discovered differences in asthma control state in response to controller medication
based on allergic and obese-related features. The PP algorithm was also able to predict
pediatric asthma control state over the long-term with greater accuracy than standard machine
learning approaches. However, in order to make clinical assumptions to guide controller
medication choice for a given phenotype, prospective studies on larger datasets of real-world
data are needed.
The long-term goal for this line of research is to eventually determine which asthma
medication maximizes the probability of a well-controlled state and incorporate this
information into the overall asthma treatment plan for children with asthma. For precision
medicine in asthma, treatment choice based on asthma phenotype is one part of a
comprehensive approach to asthma management that would also take into consideration
sociodemographics, environment, adherence, genetics, and other factors.
Acknowledgements
We thank Dr. Peter Szilagyi for his critical analysis of the manuscript revision, Kyeong Ho (Kenneth) Moon
for data processing, and Dr. Douglas Bell for introducing this collaboration. We acknowledge the
NIH GWAS Data Repository, the NHLBI, and the investigator(s) who contributed to the
phenotype data from their original studies. This manuscript was prepared using CAMP
Research Materials obtained from the NHLBI Biologic Specimen and Data Repository
Information Coordinating Center and does not necessarily reflect the opinions or views of the
CAMP or the NHLBI.
References
1. Lloyd CM, Hessel EM. Functions of T cells in asthma: more than just T(H)2 cells. Nat Rev
Immunol. 2010;10(12):838-848.
2. Gelfand EW, Alam R. The other side of asthma: Steroid-refractory disease in the absence of
TH2-mediated inflammation. J Allergy Clin Immunol. 2015;135(5):1196-1198.
3. National Asthma Education and Prevention Program. Expert Panel Report 3 (EPR-3): Guidelines for the
Diagnosis and Management of Asthma-Summary Report 2007. J Allergy Clin Immunol.
2007;120(5 Suppl):S94-138.
4. Szefler SJ, Phillips BR, Martinez FD, Chinchilli VM, Lemanske RF, Strunk RC, Zeiger RS, Larsen
G, Spahn JD, Bacharier LB, Bloomberg GR, Guilbert TW, Heldt G, Morgan WJ, Moss MH,
Sorkness CA, Taussig LM. Characterization of within-subject responses to fluticasone and
montelukast in childhood asthma. J Allergy Clin Immunol. 2005;115(2):233-242.
5. Fitzpatrick AM, Jackson DJ, Mauger DT, Boehmer SJ, Phipatanakul W, Sheehan WJ, Moy JN,
Paul IM, Bacharier LB, Cabana MD, Covar R, Holguin F, Lemanske RF, Jr., Martinez FD,
Pongracic JA, Beigelman A, Baxi SN, Benson M, Blake K, Chmiel JF, Daines CL, Daines MO,
Gaffin JM, Gentile DA, Gower WA, Israel E, Kumar HV, Lang JE, Lazarus SC, Lima JJ, Ly N,
Marbin J, Morgan W, Myers RE, Olin JT, Peters SP, Raissy HH, Robison RG, Ross K, Sorkness
CA, Thyne SM, Szefler SJ, National Institutes of Health/National Heart L, Blood Institute A.
Individualized therapy for persistent asthma in young children. J Allergy Clin Immunol.
2016;138(6):1608-1618 e1612.
6. Chung KF, Wenzel SE, Brozek JL, Bush A, Castro M, Sterk PJ, Adcock IM, Bateman ED, Bel EH,
Bleecker ER, Boulet LP, Brightling C, Chanez P, Dahlen SE, Djukanovic R, Frey U, Gaga M,
Gibson P, Hamid Q, Jajour NN, Mauad T, Sorkness RL, Teague WG. International ERS/ATS
guidelines on definition, evaluation and treatment of severe asthma. Eur Respir J.
2014;43(2):343-373.
7. Green RH, Brightling CE, Bradding P. The reclassification of asthma based on
subphenotypes. Curr Opin Allergy Clin Immunol. 2007;7(1):43-50.
8. Lotvall J, Akdis CA, Bacharier LB, Bjermer L, Casale TB, Custovic A, Lemanske RF, Jr., Wardlaw
AJ, Wenzel SE, Greenberger PA. Asthma endotypes: a new approach to classification of
disease entities within the asthma syndrome. J Allergy Clin Immunol. 2011;127(2):355-360.
9. Wenzel S. Severe asthma: from characteristics to phenotypes to endotypes. Clin Exp Allergy.
2012;42(5):650-658.
10. Haldar P, Pavord ID, Shaw DE, Berry MA, Thomas M, Brightling CE, Wardlaw AJ, Green RH.
Cluster analysis and clinical asthma phenotypes. Am J Respir Crit Care Med.
2008;178(3):218-224.
11. Weatherall M, Travers J, Shirtcliffe PM, Marsh SE, Williams MV, Nowitz MR, Aldington S,
Beasley R. Distinct clinical phenotypes of airways disease defined by cluster analysis. Eur
Respir J. 2009;34(4):812-818.
12. Moore WC, Meyers DA, Wenzel SE, Teague WG, Li H, Li X, D'Agostino R, Jr., Castro M,
Curran-Everett D, Fitzpatrick AM, Gaston B, Jarjour NN, Sorkness R, Calhoun WJ, Chung KF,
Comhair SA, Dweik RA, Israel E, Peters SP, Busse WW, Erzurum SC, Bleecker ER, National
Heart L, Blood Institute's Severe Asthma Research P. Identification of asthma phenotypes
using cluster analysis in the Severe Asthma Research Program. Am J Respir Crit Care Med.
2010;181(4):315-323.
13. Siroux V, Basagana X, Boudier A, Pin I, Garcia-Aymerich J, Vesin A, Slama R, Jarvis D, Anto
JM, Kauffmann F, Sunyer J. Identifying adult asthma phenotypes using a clustering
approach. Eur Respir J. 2011;38(2):310-317.
14. Schatz M, Hsu JW, Zeiger RS, Chen W, Dorenbaum A, Chipps BE, Haselkorn T. Phenotypes
determined by cluster analysis in severe or difficult-to-treat asthma. J Allergy Clin Immunol.
2014;133(6):1549-1556.
15. Deliu M, Sperrin M, Belgrave D, Custovic A. Identification of Asthma Subtypes Using
Clustering Methodologies. Pulm Ther. 2016;2:19-41.
16. Loza MJ, Adcock I, Auffray C, Chung KF, Djukanovic R, Sterk PJ, Susulic VS, Barnathan ES,
Baribaud F, Silkoff PE, Adept, Investigators UB. Longitudinally Stable, Clinically Defined
Clusters of Patients with Asthma Independently Identified in the ADEPT and U-BIOPRED
Asthma Studies. Ann Am Thorac Soc. 2016;13 Suppl 1:S102-103.
17. Loureiro CC, Sa-Couto P, Todo-Bom A, Bousquet J. Cluster analysis in phenotyping a
Portuguese population. Rev Port Pneumol (2006). 2015.
18. Spycher BD, Silverman M, Brooke AM, Minder CE, Kuehni CE. Distinguishing phenotypes of
childhood wheeze and cough using latent class analysis. Eur Respir J. 2008;31(5):974-981.
19. Fitzpatrick AM, Teague WG, Meyers DA, Peters SP, Li X, Li H, Wenzel SE, Aujla S, Castro M,
Bacharier LB, Gaston BM, Bleecker ER, Moore WC, National Institutes of Health/National
Heart L, Blood Institute Severe Asthma Research P. Heterogeneity of severe asthma in
childhood: confirmation by cluster analysis of children in the National Institutes of
Health/National Heart, Lung, and Blood Institute Severe Asthma Research Program. J Allergy
Clin Immunol. 2011;127(2):382-389 e381-313.
20. Luo G, Stone BL, Fassl B, Maloney CG, Gesteland PH, Yerram SR, Nkoy FL. Predicting asthma
control deterioration in children. BMC medical informatics and decision making.
2015;15:84.
21. Sharma HP, Matsui EC, Eggleston PA, Hansel NN, Curtin-Brosnan J, Diette GB. Does current
asthma control predict future health care use among black preschool-aged inner-city
children? Pediatrics. 2007;120(5):e1174-1181.
22. Finkelstein J, Jeong IC. Machine learning approaches to personalize early prediction of
asthma exacerbations. Annals of the New York Academy of Sciences. 2017;1387(1):153-165.
23. Johnson KM, FitzGerald JM, Tavakoli H, Chen W, Sadatsafavi M. Stability of Asthma
Symptom Control in a Longitudinal Study of Mild-Moderate Asthmatics. J Allergy Clin
Immunol Pract. 2017.
24. Schatz M, Zeiger RS, Yang SJ, Chen W, Crawford W, Sajjan S, Allen-Ramey F. Change in
asthma control over time: predictors and outcomes. J Allergy Clin Immunol Pract.
2014;2(1):59-64.
25. Yoon J, Alaa AM, Cadeiras M, van der Schaar M. Personalized Donor-Recipient Matching for
Organ Transplantation. arXiv preprint arXiv:1611.03934. 2016.
26. Ross MK, Yoon J, Ho Moon K, Van Der Schaar M. A Personalized Approach To Asthma
Control Over Time: Discovering Phenotypes Using Machine Learning. C26. ASTHMA IN
INFANTS AND CHILDREN: Am Thoracic Soc; 2017:A5093-A5093.
27. Zimmerman DW. Teacher’s Corner: A Note on Interpretation of the Paired-Samples t Test.
Journal of Educational and Behavioral Statistics. 1997;22(3):349-360.
28. Long-term effects of budesonide or nedocromil in children with asthma. The Childhood
Asthma Management Program Research Group. N Engl J Med. 2000;343(15):1054-1063.
29. Giffen CA, Carroll LE, Adams JT, Brennan SP, Coady SA, Wagner EL. Providing Contemporary
Access to Historical Biospecimen Collections: Development of the NHLBI Biologic Specimen
and Data Repository Information Coordinating Center (BioLINCC). Biopreserv Biobank.
2015;13(4):271-279.
30. Denlinger LC, Sorkness CA, Chinchilli VM, Lemanske RF, Jr. Guideline-defining asthma clinical
trials of the National Heart, Lung, and Blood Institute's Asthma Clinical Research Network
and Childhood Asthma Research and Education Network. J Allergy Clin Immunol.
2007;119(1):3-11; quiz 12-13.
31. Mailman MD, Feolo M, Jin Y, Kimura M, Tryka K, Bagoutdinov R, Hao L, Kiang A, Paschall J,
Phan L, Popova N, Pretel S, Ziyabari L, Lee M, Shao Y, Wang ZY, Sirotkin K, Ward M,
Kholodov M, Zbicz K, Beck J, Kimelman M, Shevelev S, Preuss D, Yaschenko E, Graeff A,
Ostell J, Sherry ST. The NCBI dbGaP database of genotypes and phenotypes. Nat Genet.
2007;39(10):1181-1186.
32. Troyanskaya O, Cantor M, Sherlock G, Brown P, Hastie T, Tibshirani R, Botstein D, Altman
RB. Missing value estimation methods for DNA microarrays. Bioinformatics. 2001;17(6):520-525.
33. McNemar Q. Note on the sampling error of the difference between correlated proportions or
percentages. Psychometrika. 1947;12(2):153-157.
34. Good P. Permutation tests: a practical guide to resampling methods for testing hypotheses.
Springer Science & Business Media; 2013.
35. Altman DG, Bland JM. Interaction revisited: the difference between two estimates. BMJ.
2003;326(7382):219.
36. Austin PC. An Introduction to Propensity Score Methods for Reducing the Effects of
Confounding in Observational Studies. Multivariate Behav Res. 2011;46(3):399-424.
37. Forno E, Lescher R, Strunk R, Weiss S, Fuhlbrigge A, Celedon JC, Childhood Asthma
Management Program Research G. Decreased response to inhaled steroids in overweight
and obese asthmatic children. J Allergy Clin Immunol. 2011;127(3):741-749.
38. Howrylak JA, Fuhlbrigge AL, Strunk RC, Zeiger RS, Weiss ST, Raby BA, Childhood Asthma
Management Program Research G. Classification of childhood asthma phenotypes and long-term clinical responses to inhaled anti-inflammatory medications. J Allergy Clin Immunol.
2014;133(5):1289-1300, 1300 e1281-1212.
39. Walker C, Bode E, Boer L, Hansel TT, Blaser K, Virchow JC, Jr. Allergic and nonallergic
asthmatics have distinct patterns of T-cell activation and cytokine production in peripheral
blood and bronchoalveolar lavage. Am Rev Respir Dis. 1992;146(1):109-115.
40. Peters-Golden M, Swern A, Bird SS, Hustad CM, Grant E, Edelman JM. Influence of body
mass index on the response to asthma controller agents. Eur Respir J. 2006;27(3):495-503.
41. Boulet LP, Franssen E. Influence of obesity on response to fluticasone with or without
salmeterol in moderate asthma. Respir Med. 2007;101(11):2240-2247.
42. Sutherland ER, Goleva E, King TS, Lehman E, Stevens AD, Jackson LP, Stream AR, Fahy JV,
Leung DY, Asthma Clinical Research N. Cluster analysis of obesity and asthma phenotypes.
PLoS One. 2012;7(5):e36631.
43. Meagher LC, Cousin JM, Seckl JR, Haslett C. Opposing effects of glucocorticoids on the rate
of apoptosis in neutrophilic and eosinophilic granulocytes. J Immunol. 1996;156(11):4422-4428.
44. Sampson AP. The role of eosinophils and neutrophils in inflammation. Clin Exp Allergy.
2000;30 Suppl 1:22-27.
45. Dixon AE, Holguin F, Sood A, Salome CM, Pratley RE, Beuther DA, Celedon JC, Shore SA,
American Thoracic Society Ad Hoc Subcommittee on O, Lung D. An official American
Thoracic Society Workshop report: obesity and asthma. Proc Am Thorac Soc. 2010;7(5):325-335.
46. Telenga ED, Tideman SW, Kerstjens HA, Hacken NH, Timens W, Postma DS, van den Berge
M. Obesity in asthma: more neutrophilic inflammation as a possible explanation for a
reduced treatment response. Allergy. 2012;67(8):1060-1068.
47. McGrath KW, Icitovic N, Boushey HA, Lazarus SC, Sutherland ER, Chinchilli VM, Fahy JV,
Asthma Clinical Research Network of the National Heart L, Blood I. A large subgroup of mild-to-moderate asthma is persistently noneosinophilic. Am J Respir Crit Care Med.
2012;185(6):612-619.
48. Moore WC, Hastie AT, Li X, Li H, Busse WW, Jarjour NN, Wenzel SE, Peters SP, Meyers DA,
Bleecker ER, National Heart L, Blood Institute's Severe Asthma Research P. Sputum
neutrophil counts are associated with more severe asthma phenotypes using cluster
analysis. J Allergy Clin Immunol. 2014;133(6):1557-1563 e1555.
49. Sismanopoulos N, Delivanis DA, Mavrommati D, Hatziagelaki E, Conti P, Theoharides TC. Do
mast cells link obesity and asthma? Allergy. 2013;68(1):8-15.
50. Liu J, Divoux A, Sun J, Zhang J, Clement K, Glickman JN, Sukhova GK, Wolters PJ, Du J,
Gorgun CZ, Doria A, Libby P, Blumberg RS, Kahn BB, Hotamisligil GS, Shi GP. Genetic
deficiency and pharmacological stabilization of mast cells reduce diet-induced obesity and
diabetes in mice. Nat Med. 2009;15(8):940-945.
51. Dixon AE, Poynter ME. Mechanisms of Asthma in Obesity. Pleiotropic Aspects of Obesity
Produce Distinct Asthma Phenotypes. Am J Respir Cell Mol Biol. 2016;54(5):601-608.
52. Farzan S, Khan S, Elera C, Akerman M. Montelukast Is a Better Controller in Obese Atopic
Asthmatics. J Allergy Clin Immun. 2016;137(2):Ab210-Ab210.
53. Anderson R, Theron AJ, Gravett CM, Steel HC, Tintinger GR, Feldman C. Montelukast inhibits
neutrophil pro-inflammatory activity by a cyclic AMP-dependent mechanism. Brit J
Pharmacol. 2009;156(1):105-115.
54. Theron AJ, Gravett CM, Steel HC, Tintinger GR, Feldman C, Anderson R. Leukotrienes C4 and
D4 sensitize human neutrophils for hyperreactivity to chemoattractants. Inflamm Res.
2009;58(5):263-268.
55. Al Saadi MM, Meo SA, Mustafa A, Shafi A, Tuwajri AS. Effects of Montelukast on free radical
production in whole blood and isolated human polymorphonuclear neutrophils (PMNs) in
asthmatic children. Saudi Pharm J. 2011;19(4):215-220.
56. Busse WW. Leukotrienes and inflammation. Am J Respir Crit Care Med. 1998;157(6 Pt
1):S210-213.
57. Rand TH, Lopez AF, Gamble JR, Vadas MA. Nedocromil sodium and cromolyn (sodium
cromoglycate) selectively inhibit antibody-dependent granulocyte-mediated cytotoxicity. Int
Arch Allergy Appl Immunol. 1988;87(2):151-158.
58. Yazid S, Leoni G, Getting SJ, Cooper D, Solito E, Perretti M, Flower RJ. Antiallergic cromones
inhibit neutrophil recruitment onto vascular endothelium via annexin-A1 mobilization.
Arterioscler Thromb Vasc Biol. 2010;30(9):1718-1724.
59. Condino-Neto A, Vilela MM, Cambiucci EC, Ribeiro JD, Guglielmi AA, Magna LA, De Nucci G.
Theophylline therapy inhibits neutrophil and mononuclear cell chemotaxis from chronic
asthmatic children. Br J Clin Pharmacol. 1991;32(5):557-561.
60. Yasui K, Agematsu K, Shinozaki K, Hokibara S, Nagumo H, Nakazawa T, Komiyama A.
Theophylline induces neutrophil apoptosis through adenosine A2A receptor antagonism. J
Leukoc Biol. 2000;67(4):529-535.
Figure Legends
Figure 1. Block diagram of Predictor Pursuit (PP) for phenotype discovery : The entire patient
group, : A patient subgroup, : Minimum number of patients in the patient subgroup. A
“greedy algorithm” (left) is an algorithm that is repeatedly applied to each newly generated
patient group (right), until no more subspaces exist that satisfy the rule of statistically different
clinical outcomes. The output of the function is the set of identified phenotypes.
Figure 2. Block diagram of Predictor Pursuit (PP) for outcome prediction. ℎ : Assigned
predictive model to the patient group. In comparison to the phenotype discovery function
of PP (Figure 1), the output of the function is a predictive model. Furthermore, the outcome
prediction function of PP has different criteria for division of the patient groups. The final tree
(left) is achieved after PP repeatedly applies the steps (right) to the newly generated patient
group until there are no more patient subgroups that yield additional improvement of
prediction accuracy.
Figure 3. Predictor Pursuit (PP) constructs different predictive models for various subspaces as
opposed to standard machine learning methods that apply one predictive (i.e. a “one-size-fits-all”) model for the entire feature space. The PP algorithm (left) simultaneously divides the
patient groups and assigns a corresponding predictive model to each patient group to further
improve (maximize) the prediction accuracy, whereas standard logistic regression (right) tries
to find a single predictive model for the entire group that maximizes the prediction accuracy.
Figure 4. Percent of well-controlled children over time of the allergic-not-obese (A(+)/O(-)),
obese-and-allergic (A(+)/O(+)), obese-not-allergic (A(-)/O(+)), and not-obese-not-allergic (A(-)/O(-))
children treated with budesonide (bud), nedocromil (ned), and placebo in the CAMP study data.
Table 1. Baseline characteristics of the CAMP study children (5-12 years old)
Feature | Entire (1,019) | Budesonide (302) | Nedocromil (307) | Placebo (410)
Age, years | 8.9 ± 2.1 | 9.0 ± 2.1 | 8.9 ± 2.1 | 8.9 ± 2.2
Gender, male | 609 (59.8%) | 175 (58.0%) | 201 (65.5%) | 233 (56.8%)
Age asthma onset, years | 3.8 ± 2.3 | 3.7 ± 2.2 | 3.8 ± 2.4 | 3.9 ± 2.4
Adherent to medication, yes | 617 (60.6%) | 178 (58.9%) | 193 (62.9%) | 246 (60.0%)
FEV1 % predicted | 93.8 ± 14.3 | 93.7 ± 14.6 | 93.5 ± 14.4 | 94.0 ± 14.0
Bronchodilator Response ≥12% | 319 (31.3%) | 100 (33.1%) | 95 (30.9%) | 124 (30.2%)
Blood Eosinophils, IU/L | 202 ± 175 | 192 ± 146 | 208 ± 203 | 206 ± 171
Body Mass Index, kg/m2 | 18.46 ± 5.1 | 18.83 ± 6.7 | 18.24 ± 4.8 | 18.35 ± 3.9
Ethnicity: Black | 137 (13.4%) | 44 (14.6%) | 37 (12.1%) | 56 (13.7%)
Ethnicity: Hispanic | 97 (9.5%) | 32 (10.6%) | 28 (9.1%) | 37 (9.0%)
Ethnicity: Other | 89 (8.7%) | 30 (9.9%) | 26 (8.5%) | 33 (8.0%)
Ethnicity: White | 696 (68.3%) | 196 (64.9%) | 216 (70.4%) | 284 (69.3%)
Severity: Mild | 474 (46.6%) | 140 (46.4%) | 139 (45.3%) | 195 (47.5%)
Severity: Moderate | 545 (53.4%) | 162 (53.6%) | 168 (54.7%) | 215 (52.5%)
Phenotypes: Allergic-not-Obese | 574 (56.3%) | 173 (57.3%) | 181 (59.0%) | 220 (53.7%)
Phenotypes: Obese-not-Allergic | 43 (4.2%) | 15 (5.0%) | 13 (4.2%) | 15 (3.7%)
Phenotypes: Allergic-and-Obese | 128 (12.6%) | 38 (12.6%) | 34 (11.1%) | 56 (13.7%)
Phenotypes: Not-Obese-not-Allergic | 274 (26.9%) | 76 (25.2%) | 79 (25.7%) | 119 (29.0%)
CAMP=Childhood Asthma Management Program
Data are mean ± standard deviation or frequency (percentage)
Table 2. Baseline characteristics of the ACRN/CARE dataset children (5-12 years old)
Feature | Entire (669) | ICS (284) | Montelukast (175) | Placebo (210)
Age, years | 8.2 ± 3.2 | 7.8 ± 2.9 | 9.2 ± 3.4 | 7.9 ± 3.1
Gender, male | 418 (62.5%) | 180 (63.4%) | 110 (62.9%) | 128 (61.0%)
Age asthma onset, years | 3.9 ± 2.1 | 3.8 ± 1.9 | 4.3 ± 2.5 | 3.7 ± 2.1
FEV1 % predicted | 93.2 ± 8.4 | 92.8 ± 8.0 | 94.1 ± 10.0 | 93.0 ± 7.3
Bronchodilator response ≥12% | 79 (11.8%) | 35 (12.3%) | 31 (17.7%) | 13 (6.2%)
Blood Eosinophils, IU/L | 415 ± 261 | 436 ± 274 | 363 ± 276 | 432 ± 222
Body Mass Index, kg/m2 | 18.0 ± 3.0 | 17.7 ± 2.7 | 18.2 ± 3.2 | 18.2 ± 3.2
Ethnicity: Black | 83 (12.4%) | 31 (10.9%) | 23 (13.1%) | 29 (13.8%)
Ethnicity: Hispanic | 137 (20.5%) | 60 (21.1%) | 39 (22.3%) | 38 (18.1%)
Ethnicity: White | 377 (56.4%) | 162 (57.0%) | 95 (54.3%) | 120 (57.1%)
Ethnicity: Other | 72 (10.8%) | 31 (10.9%) | 18 (10.3%) | 23 (11.0%)
Severity: Mild | 125 (18.7%) | 49 (17.3%) | 46 (26.3%) | 30 (14.3%)
Severity: Moderate | 492 (73.5%) | 218 (76.8%) | 106 (60.6%) | 168 (80.0%)
Severity: Severe | 52 (7.8%) | 17 (6.0%) | 23 (13.1%) | 12 (5.7%)
Phenotypes: Allergic-not-Obese | 384 (57.4%) | 165 (58.1%) | 99 (56.6%) | 120 (57.1%)
Phenotypes: Obese-not-Allergic | 32 (4.8%) | 12 (4.2%) | 6 (3.4%) | 14 (6.7%)
Phenotypes: Allergic-and-Obese | 62 (9.3%) | 25 (8.8%) | 19 (10.9%) | 18 (8.6%)
Phenotypes: Not-Obese-not-Allergic | 191 (28.6%) | 82 (28.9%) | 51 (29.1%) | 58 (27.6%)
ACRN = Asthma Clinical Research Network. CARE = Childhood Asthma Research and Education.
ICS = Inhaled corticosteroids
Data are mean ± standard deviation or frequency (percentage)
Table 3. Associations between asthma controller therapy and asthma control in the CAMP
dataset by phenotype classification
Outcome (Well-Controlled Asthma)
Phenotype (n) | Bud | Ned | p-value1 (proportional) | p-value1 (permutation) | Placebo | p-value2 (proportional) | p-value2 (permutation)
Allergic-not-Obese (277) | 34 (47.2%) | 31 (31.0%) | 0.0304 | 0.0274 | 37 (35.2%) | 0.1100 | 0.1009
Obese-not-Allergic (18) | 1 (14.3%) | 2 (50.0%) | 0.2008 | 0.1851 | 2 (28.6%) | 0.4773 | 0.3936
Allergic-and-Obese (73) | 3 (16.7%) | 10 (52.6%) | 0.0220 | 0.0196 | 7 (19.4%) | 0.0113 | 0.0237
Not-Obese-not-Allergic (142) | 21 (51.2%) | 19 (39.6%) | 0.2713 | 0.3011 | 17 (32.1%) | 0.0607 | 0.0536
Allergic (350) | 37 (41.1%) | 41 (34.5%) | 0.3245 | 0.3435 | 44 (31.2%) | 0.1239 | 0.1429
Non-Allergic (160) | 22 (45.8%) | 21 (40.3%) | 0.0824 | 0.0912 | 19 (31.7%) | 0.1317 | 0.1436
Obese (91) | 4 (16.0%) | 12 (52.2%) | 0.0079 | 0.0105 | 9 (20.9%) | 0.0094 | 0.0152
Non-Obese (419) | 55 (48.6%) | 50 (33.8%) | 0.0151 | 0.0123 | 54 (34.2%) | 0.0164 | 0.0334
CAMP=Childhood Asthma Management Program. Bud=budesonide. Ned=nedocromil. The
difference between budesonide and nedocromil was tested for significance (p-value1). Then
the “better performing” medication was compared to placebo (p-value2).
Table 4. Associations between asthma controller therapy and asthma control in the ACRN/CARE
dataset based on phenotype classification.
Outcome (Well-Controlled Asthma)
Phenotype (n) | Inhaled Corticosteroids (284) | Montelukast (175) | p-value (proportional) | p-value (permutation)
Allergic-not-Obese (134) | 31 (40.0%) | 11 (19.2%) | 0.0044 | 0.0059
Obese-not-Allergic (12) | 0 (0.0%) | 1 (25.0%) | 0.0680 | 0.0513
Allergic-and-Obese (21) | 2 (14.9%) | 1 (11.9%) | 0.4261 | 0.3912
Not-Obese-not-Allergic (63) | 4 (12.1%) | 2 (7.4%) | 0.2725 | 0.2123
Allergic (155) | 36 (39.9%) | 16 (24.1%) | 0.0192 | 0.0342
Non-Allergic (75) | 5 (10.5%) | 3 (9.1%) | 0.4236 | 0.4756
Obese (33) | 3 (14.3%) | 1 (7.9%) | 0.2980 | 0.3294
Non-Obese (197) | 38 (33.6%) | 16 (18.9%) | 0.0105 | 0.0174
ACRN = Asthma Clinical Research Network. CARE = Childhood Asthma Research and Education.
No placebo data is included because the inverse propensity of treatment weighting
(IPTW) does not allow for comparison of more than two treatments.
Table 5. Prediction accuracy of short-term (four months) and long-term (one year) asthma control in
the CAMP study
Prediction Accuracy (Area Under Curve)
Algorithm | Short-term | p-value | Long-term | p-value
Predictor Pursuit | 0.8533 ± 0.0117 | | 0.6517 ± 0.0129 |
Neural Network | 0.8297 ± 0.0101 | 0.0317 | 0.6282 ± 0.0091 | 0.0146
Logistic Regression | 0.8199 ± 0.0123 | 0.0123 | 0.6168 ± 0.0105 | 0.0047
Adaptive Boosting | 0.8080 ± 0.0174 | 0.0077 | 0.5884 ± 0.0251 | 0.0003
Random Forest | 0.8211 ± 0.0184 | 0.0349 | 0.5626 ± 0.0226 | <0.0001
Naïve Bayes | 0.7963 ± 0.0139 | <0.0001 | 0.5496 ± 0.0128 | <0.0001
Support Vector Machine | 0.8045 ± 0.0165 | 0.0040 | 0.5936 ± 0.0137 | 0.0005
CAMP= Childhood Asthma Management Program
Data are mean ± SD
p-values are for the comparison with Predictor Pursuit
Table 6. The three most predictive features to predict long-term (one year) pediatric asthma
controllability in the CAMP study based on discovered phenotype regardless of assigned
medication.
Rank | A(+)/O(+) Feature | r | A(+)/O(-) Feature | r | A(-)/O(+) Feature | r | A(-)/O(-) Feature | r
1 | Bronchodilator Response | -0.30 | Bronchodilator Response | -0.30 | Bronchodilator Response | -0.30 | Bronchodilator Response | -0.32
2 | Total eos | -0.17 | Croup ever | 0.16 | Croup ever | 0.18 | Total eos | -0.18
3 | Eos % | -0.17 | Eos % | -0.16 | Total eos | -0.17 | Eos % | -0.17
CAMP=Childhood Asthma Management Program. r = Pearson correlation coefficient.
Eos=eosinophils cells/microliter. WBC=white blood count
Table 7. The three most predictive features to predict long-term (one year) pediatric asthma
controllability in the CAMP study based on phenotype and assigned medication
Medication | Rank | A(+)/O(+) Feature | r | A(+)/O(-) Feature | r | A(-)/O(+) Feature | r | A(-)/O(-) Feature | r
Bud | 1 | Bronchodilator Response | -0.29 | Bronchodilator Response | -0.29 | Bronchodilator Response | -0.28 | Bronchodilator Response | -0.31
Bud | 2 | Total eos count by %WBC | -0.22 | Wheezes apart from cold | -0.18 | Total eos count by %WBC | -0.21 | Total eos count by %WBC | -0.22
Bud | 3 | Female | 0.19 | Daily asthma meds | -0.17 | Ever in hospital for asthma | -0.21 | Total eos count | -0.19
Ned | 1 | Bronchodilator Response | -0.32 | Bronchodilator Response | -0.31 | Bronchodilator Response | -0.34 | Bronchodilator Response | -0.34
Ned | 2 | Croup | 0.18 | Daily asthma med use | -0.16 | Croup | 0.22 | Eos % | -0.19
Ned | 3 | Use commercial cockroach spray | 0.17 | Croup | 0.15 | Use commercial cockroach spray | 0.18 | Eos % | -0.18
CAMP=Childhood Asthma Management Program. r=correlation. Eos=eosinophils. WBC=white
blood count. Bud=budesonide. Ned=nedocromil.
Figure 1. Block diagram of Predictor Pursuit (PP) for phenotype discovery (left: greedy algorithm, right: a
recursive step of the algorithm).
Figure 2. Block diagram of Predictor Pursuit (PP) for outcome prediction.
Figure 3. Predictor Pursuit (PP) constructs different predictive models for various subspaces as opposed to
standard machine learning methods that apply one predictive (i.e. a “one-size-fits-all”) model for the entire
feature space.
Figure 4. Percent well-controlled children over time of the allergic-not-obese (A(+)/O(-)) and obese-and-allergic (A(+)/O(+)) children treated with budesonide (bud), nedocromil (ned), and placebo in the CAMP
study data.
Online Data Supplement
Discovering Pediatric Asthma Phenotypes Based on Response to Controller Medication Using
Machine Learning
Mindy K. Ross MD, MBA, MAS, Jinsung Yoon MS, Auke van der Schaar MSE, Mihaela van der
Schaar PhD
Technical Details
Methods
We created Predictor Pursuit (PP) to 1) discover patient subgroups (phenotypes) based
upon the statistically different responses to medication and to 2) achieve a predictive model for
treatment outcomes by simultaneously discovering the patient subgroups and assigning
corresponding predictive models for each subgroup.
Phenotype Discovery (using PP)
To determine phenotypes, Predictor Pursuit iteratively performs the following two steps:
1. Starting from the entire patient population, the algorithm enumerates all possible splits of
the patients into two subgroups (pairs) based on the features. It then tests whether the
medication responses differ statistically between the two subgroups; we compute the p-values
for this using the proportional test (2). Among all candidate pairs, the algorithm selects the
pair for which the difference between the subgroups' p-values is largest (maximized).
2. The maximization in step 1 is performed under a constraint: the difference in clinical
outcomes between the two patient subgroups must be statistically significant (the level of
statistical significance is 0.05). This constraint ensures that the discovered subgroups show
statistically different (significant versus not significant) responses to the medication with
respect to clinical outcomes.
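The proportional test used in step 1 can be illustrated with a minimal sketch. It assumes the test is the standard pooled two-proportion z-test with a two-sided p-value; the function name and the example counts are illustrative only and are not taken from the authors' software.

```python
import math


def two_proportion_p_value(success_a, n_a, success_b, n_b):
    """Two-sided p-value of the pooled two-proportion z-test (assumed form of the test)."""
    if n_a <= 0 or n_b <= 0:
        raise ValueError("both groups must be non-empty")
    rate_a = success_a / n_a
    rate_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    variance = pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b)
    if variance == 0.0:
        # Every patient has the same outcome: no evidence of a difference.
        return 1.0
    z = (rate_a - rate_b) / math.sqrt(variance)
    # P(|Z| >= |z|) for a standard normal Z.
    return math.erfc(abs(z) / math.sqrt(2.0))


# Illustrative counts: 34 of 72 well controlled on one drug, 31 of 100 on the other.
p = two_proportion_p_value(34, 72, 31, 100)
print(f"p = {p:.4f}, significant at 0.05: {p < 0.05}")
```

A p-value below 0.05 marks the response difference in that subgroup as statistically significant, which is the quantity the constraint in step 2 refers to.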
Figure E1 (Figure 1 in the manuscript) portrays the block diagram of Predictor Pursuit,
which sequentially discovers patient subgroups until no further feasible division remains (based
on the second step in the method described above). The final leaves of the tree discovered by our
personalization method represent patient subgroups (phenotypes). Note that PP guarantees
the statistical significance of each division because significance is imposed as the constraint in
Step 2. As a hypothetical example, the algorithm first splits the patients into young and old
subgroups because, of all features, age yields the most significant difference in treatment
response to Medication A versus B (among the old and young patients, the old patients
responded best to Medication A). A second split divides the young patients into males and
females (females responded best to Medication B), and the young males are further split into
smokers and non-smokers (young male non-smoker patients responded to Medication A). This
continues until no further significant differences between paired treatment responses are
found.
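The recursive procedure described above can be sketched as follows. This is one possible reading of the two steps, not the authors' implementation: it assumes binary features, two treatments labeled "A" and "B", a pooled two-proportion z-test, and it interprets the constraint as "exactly one of the two subgroups shows a significant treatment difference". All names (`discover_phenotypes`, `best_split`, the record keys) are hypothetical.

```python
import math

ALPHA = 0.05  # significance level stated in the text


def two_proportion_p_value(s_a, n_a, s_b, n_b):
    """Two-sided pooled two-proportion z-test (assumed form of the proportional test)."""
    pooled = (s_a + s_b) / (n_a + n_b)
    variance = pooled * (1 - pooled) * (1 / n_a + 1 / n_b)
    if variance == 0:
        return 1.0
    z = (s_a / n_a - s_b / n_b) / math.sqrt(variance)
    return math.erfc(abs(z) / math.sqrt(2))


def treatment_p_value(group):
    """p-value for the control rate under treatment A versus B; None if an arm is empty."""
    arm_a = [p["controlled"] for p in group if p["treatment"] == "A"]
    arm_b = [p["controlled"] for p in group if p["treatment"] == "B"]
    if not arm_a or not arm_b:
        return None
    return two_proportion_p_value(sum(arm_a), len(arm_a), sum(arm_b), len(arm_b))


def best_split(group, features, min_size):
    """Step 1 and step 2: pick the feature whose two subgroups differ most in p-value."""
    best = None
    for feature in features:
        left = [p for p in group if p[feature]]
        right = [p for p in group if not p[feature]]
        if len(left) < min_size or len(right) < min_size:
            continue
        p_left, p_right = treatment_p_value(left), treatment_p_value(right)
        if p_left is None or p_right is None:
            continue
        # Assumed constraint: significant in one subgroup, not significant in the other.
        if (p_left < ALPHA) == (p_right < ALPHA):
            continue
        score = abs(p_left - p_right)
        if best is None or score > best[0]:
            best = (score, feature, left, right)
    return best


def discover_phenotypes(group, features, min_size=20, path=()):
    """Apply the split recursively; each returned path describes one leaf (phenotype)."""
    split = best_split(group, features, min_size)
    if split is None:
        return [path]
    _, feature, left, right = split
    return (discover_phenotypes(left, features, min_size, path + ((feature, True),))
            + discover_phenotypes(right, features, min_size, path + ((feature, False),)))
```

Each patient is a dictionary such as `{"allergic": True, "obese": False, "treatment": "A", "controlled": True}`, and the function returns one tuple of (feature, value) pairs per discovered subgroup. The minimum subgroup size plays the role of the "minimum number of patients" parameter in Figure 1; its default here is arbitrary.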
Outcome Prediction (using PP)
Predictor Pursuit adopts a novel framework for crafting complex predictive models out of
simpler baseline models by creating a phenotypic characterization of the clinical feature space in
which different predictive models are assigned to disjoint partitions of the feature space (1). The
set of partitions that cover the entire feature space, together with a set of predictive models each
tailored to a given partition, leads to an overall more complex and granular predictive model
(Figure E2, Figure 2 in the manuscript).
In detail, Predictor Pursuit divides the clinical feature space into disjoint subsets,
where is to be determined based on the given dataset, in such a way that for each subset, we
optimize a separate predictive model that minimizes the overall expected risk (3). We write
{ , … , } as a partition of the feature space , where all such partitions are ensured to be
disjoint and cover the entire . Given the above construct, the learning problem becomes a
problem of (simultaneously) finding the optimal partitioning { , … , } of the clinical feature
Copyright © 2017 by the American Thoracic Society
Page 38 of 55
Page 39 of 55
ANNALSATS Articles in Press. Published on 19-October-2017 as 10.1513/AnnalsATS.201702-101OC
space, together with the optimal predictive model ℎ
∈ ℋ associated with every partition . This
optimal partitioning and corresponding optimal predictive model-constructing problem can be
formalized as follow.
min min
{ ,…, } ,…, ∈ℋ
∑
# ℱ ∈ × ℱ ℎ
, !"$
subjectto = ⋃
# , and
∩ 3 = ∅for∀ ≠ However, the computational complexity of solving the above optimization problem
exponentially increases (4-5). Furthermore, the objective cannot be computed with a finite
number of samples (3). Therefore, Predictor Pursuit adopts an efficient greedy algorithm for
approximating the solution to the optimal partitioning problem with alternative objective (the
upper bound of the expected loss based on empirical loss). As a first step to construct such an
algorithm, we reformulate the optimal partitioning problem by incorporating two more
constraints. First, we restrict the partitions of the feature space to be hyper-cubes. A hyper-cubic
partition of the feature space is defined as { , … , } where = ∏
#:
3 , ;
3 " ,:
3 ≤ ;
3 ,
:
3 &;
3 ∈ ℝ. Second, we restrict the number of partitions to be ? ∈ ℤ to solve this problem
sequentially and iteratively. The original optimization problem with these additional constraints
can be stated as follows.
1
min ∑
# min D ∑
A B { ,…, }
∈ℋ
subjectto
X
V
G ,K M F L
Eℎ
E
J
G
F H, !I H$
+ OP
Q log T
D
= ∏
#:
3 , ;
3 " ,:
3 ≤ ;
3 , :
3 , ;
3 ∈ ℝ
≤ ?, where ∈ ℤ
W
V
U = ⋃
# , and
∩ 3 = ∅for∀ ≠ Copyright © 2017 by the American Thoracic Society
ANNALSATS Articles in Press. Published on 19-October-2017 as 10.1513/AnnalsATS.201702-101OC
Let Opt be the optimal partition for the above optimization problem. We construct a
greedy algorithm that iteratively solves the above optimization problem to obtain an
approximate solution to the original optimization problem. More specifically, we solve the
above optimization problem recursively on each $\mathcal{X}_i$ separately, up to the point where we no longer
expect to improve the objective function (i.e., the optimal $M$ is 1). The final partition is the union
of the partitions generated by applying this procedure recursively to each $\mathcal{X}_i$. We write
the final partition achieved by the greedy algorithm as $\{\hat{\mathcal{X}}^*_1, \dots, \hat{\mathcal{X}}^*_M\}$ and the corresponding
predictive models as $\{\hat{h}^*_1, \dots, \hat{h}^*_M\}$. The final predictive model achieved by Predictor Pursuit can be
written as follows.
$$
h^*(x) = \sum_{i=1}^{M} \mathbb{1}(x \in \hat{\mathcal{X}}^*_i) \times \hat{h}^*_i(x)
$$
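The piecewise form of the final predictive model can be illustrated with a minimal Python sketch. The box bounds, the two constant local models, and the function names (`in_box`, `predict`) are hypothetical and chosen only for illustration; they are not the partitions or models fitted in this study. Half-open intervals are an assumption made here so that neighboring boxes do not overlap.

```python
from typing import Callable, List, Sequence, Tuple

Box = List[Tuple[float, float]]            # one (lower, upper) interval per feature
Model = Callable[[Sequence[float]], float]


def in_box(x: Sequence[float], box: Box) -> bool:
    """Indicator 1(x in X_i): every coordinate lies in its half-open interval [lo, hi)."""
    return all(lo <= xj < hi for xj, (lo, hi) in zip(x, box))


def predict(x: Sequence[float], boxes: List[Box], models: List[Model]) -> float:
    """Evaluate h*(x) = sum_i 1(x in X_i) * h_i(x) over a disjoint partition."""
    return sum(h(x) for box, h in zip(boxes, models) if in_box(x, box))


# Hypothetical partition of a two-feature space into two boxes, split on feature 0.
boxes = [
    [(0.0, 0.5), (0.0, 1.0)],
    [(0.5, 1.0), (0.0, 1.0)],
]
# Hypothetical local models: one constant prediction per box.
models = [lambda x: 0.2, lambda x: 0.8]

left = predict([0.25, 0.5], boxes, models)   # falls in the first box
right = predict([0.5, 0.5], boxes, models)   # boundary point belongs to the second box
```

Because the boxes are disjoint, exactly one indicator is non-zero for any point inside the feature space, so the sum simply selects the local model of the box that contains the point.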
Predictive Features
The top 10 predictive features for short-term and long-term controllability were obtained
by ranking the features by the absolute value of their correlation with the labels. Because the
labels differ between the two analyses (short-term versus long-term controllability), the method
yields a different set of top 10 predictive features for each outcome.
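The correlation-based ranking can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the feature names and toy values are hypothetical, Pearson correlation is assumed as the correlation measure, and the helper names (`pearson`, `top_k_by_abs_corr`) are not from the original analysis.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences (neither may be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def top_k_by_abs_corr(
    features: Dict[str, Sequence[float]], label: Sequence[float], k: int = 10
) -> List[Tuple[str, float]]:
    """Sort features by |correlation with the label|, keeping the signed value."""
    scored = [(name, pearson(values, label)) for name, values in features.items()]
    scored.sort(key=lambda item: abs(item[1]), reverse=True)
    return scored[:k]


# Hypothetical toy data: 1 = well-controlled at follow-up, 0 = not.
label = [1, 1, 0, 0, 1, 0]
features = {
    "airways_hyperreactivity": [2.0, 3.0, 9.0, 8.0, 1.0, 7.0],
    "age": [6.0, 9.0, 7.0, 8.0, 10.0, 6.0],
}
ranking = top_k_by_abs_corr(features, label, k=2)
```

Running the same function once with the short-term label and once with the long-term label would give the two separate rankings described above.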
In addition, we determined the predictive power of each
feature, in terms of area under the curve (AUC), across the long-term time period using
Python (sklearn package), based upon the asthma phenotypes identified by PP. We analyzed
phenotype groups according to assigned treatment to describe the most predictive features that
determined controllability over time. The predictive power of a feature was
evaluated by the prediction accuracy of the Adaptive Boosting algorithm (11). We used a single
feature at a time and computed the prediction accuracy (AUC) for
each feature. We then sorted the features by their AUC values.
We defined the top five predictive features as those with the highest AUC values. This
methodology extends readily to multi-feature analysis by evaluating subsets of the
feature set instead of single features.
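The single-feature ranking can be sketched as follows. Note the simplification: this sketch does not train the Adaptive Boosting classifier described above; it uses the raw feature value as the prediction score and computes the AUC from pairwise comparisons. The data and function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def rank_features_by_auc(
    features: Dict[str, Sequence[float]], labels: Sequence[int], k: int = 5
) -> List[Tuple[str, float]]:
    """Score each feature alone and keep the k features with the highest AUC."""
    scored = []
    for name, values in features.items():
        a = auc(values, labels)
        # A lower feature value may indicate the positive class, so take the
        # better of the two directions (an assumption of this sketch).
        scored.append((name, max(a, 1.0 - a)))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical toy data: 1 = well-controlled, 0 = not.
labels = [1, 1, 0, 0, 1, 0]
features = {
    "age": [6.0, 9.0, 7.0, 8.0, 10.0, 6.0],
    "airways_hyperreactivity": [2.0, 3.0, 9.0, 8.0, 1.0, 7.0],
}
top = rank_features_by_auc(features, labels, k=5)
```

To follow the text more closely, the call to `auc` would be replaced by the AUC of a boosted classifier trained on that one feature; the ranking step stays the same.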
State-Transition Model (probability of control)
As a supplement for understanding the short-term response to different medications, we used a
Markov chain framework (6-7) to construct a state-transition model of asthma controllability
for the CAMP trial. At four-month intervals, for each phenotype identified in the CAMP study,
we used the Markov chain model to estimate the likelihood that patients within a phenotype would
remain in their current asthma control state given their previous control state. A Markov
chain consists of three components: states $\mathcal{S}$, actions $\mathcal{A}$, and transition probabilities $P_a(s, s')$;
we define each of these components to construct the state-transition model (6-7).
First, the states of asthma $\mathcal{S}$ are defined using the 2007 NAEPP Asthma Guideline (8)
criteria (State 1: Well-Controlled, State 2: Not-Well-Controlled, and State 3: Very-Poorly-Controlled).
Next, the actions for asthma management $\mathcal{A}$ are either Budesonide (Bud) or
Nedocromil (Ned), based on the Childhood Asthma Management Program (CAMP) trial (9).
Finally, the transition probability $P_a(s, s')$ is defined as the probability that the next state will
be $s'$ when the current state is $s$ and the current action is $a$. Mathematically, it can be defined as
$$
P_a(s, s') = P(s_{t+1} = s' \mid s_t = s, a_t = a)
$$
To complete the state-transition model, we need to estimate all of the transition
probabilities $P_a(i, j)$ for all $a \in \{1, 2\}$ and $i, j \in \{1, 2, 3\}$. We used the Monte Carlo method (10) to
estimate these transition probabilities (Figure E3). Based on the mathematical definition of
the transition probability, $P_a(i, j) = P(s_{t+1} = j \mid s_t = i, a_t = a)$, the estimator is
$$
\hat{P}_a(i, j) = \hat{P}(s_{t+1} = j \mid s_t = i, a_t = a) = \frac{N(s_{t+1} = j, s_t = i, a_t = a)}{N(s_t = i, a_t = a)}
$$
for all $t \in T$, the entire time horizon. $N(s_t = i, a_t = a)$ is defined as the number of cases in which the
current state is $i$ under medication $a$. $N(s_{t+1} = j, s_t = i, a_t = a)$ is similarly defined as
the number of cases in which the current state is $i$ and the next state is $j$ under medication $a$.
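The counting estimator above can be written as a short Python sketch. The trajectories below are hypothetical toy data, not CAMP data, and the function name `estimate_transitions` is an illustrative choice; states are coded 1 to 3 as defined above and the action is the assigned medication.

```python
from collections import Counter
from typing import Dict, List, Tuple

Trajectory = Tuple[str, List[int]]  # (medication, control states at consecutive visits)


def estimate_transitions(trajectories: List[Trajectory]) -> Dict[Tuple[str, int, int], float]:
    """Estimate P_a(i, j) as N(s_{t+1}=j, s_t=i, a_t=a) / N(s_t=i, a_t=a)."""
    pair_counts: Counter = Counter()   # N(s_{t+1}=j, s_t=i, a_t=a)
    state_counts: Counter = Counter()  # N(s_t=i, a_t=a), counted where a next visit exists
    for action, states in trajectories:
        for i, j in zip(states, states[1:]):
            pair_counts[(action, i, j)] += 1
            state_counts[(action, i)] += 1
    return {
        (a, i, j): n / state_counts[(a, i)]
        for (a, i, j), n in pair_counts.items()
    }


# Hypothetical patients: 1 = well, 2 = not well, 3 = very poorly controlled.
trajectories = [
    ("Bud", [2, 1, 1, 1]),
    ("Bud", [3, 2, 1, 2]),
    ("Ned", [2, 2, 1, 1]),
]
P = estimate_transitions(trajectories)
```

Transitions that never occur in the data are simply absent from the returned dictionary, which corresponds to an estimated probability of zero.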
Features
For this particular study, data processing included generating features from certain
variables of interest for asthma phenotypes, each labeled as present or absent. These
were: obesity, allergic status, BMI, airways hyperreactivity (CAMP only), and adherence (CAMP
only). Obesity was defined as body mass index ≥95th percentile for age. The presence of an
allergic state was defined by any one of the following features in the CAMP dataset:
history of allergy shots, positive allergen skin testing, or physician diagnosis of allergy.
Airways hyperreactivity was determined by the difference between pre-FEV1% and post-FEV1%,
where FEV1% is the percent predicted forced expiratory volume in 1 second. Non-adherence was determined by
a “no” response to the question “takes medicine as prescribed.”
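The present/absent feature definitions above can be sketched as a small Python function. All field names (`bmi_percentile_for_age`, `allergy_shots_history`, and so on) are hypothetical stand-ins for the dataset variables, and the example record is invented for illustration.

```python
from typing import Dict, Union

Record = Dict[str, Union[float, bool, str]]


def derive_features(record: Record) -> Dict[str, bool]:
    """Turn raw variables into present/absent features (field names are hypothetical)."""
    return {
        # Obesity: body mass index at or above the 95th percentile for age.
        "obese": record["bmi_percentile_for_age"] >= 95,
        # Allergic state: any one of the three indicators is present.
        "allergic": any(
            bool(record[key])
            for key in ("allergy_shots_history", "skin_test_positive", "physician_dx_allergy")
        ),
        # Flag set by a "no" answer to the question "takes medicine as prescribed".
        "answered_no_to_adherence": record["takes_medicine_as_prescribed"] == "no",
    }


# Invented example record.
example: Record = {
    "bmi_percentile_for_age": 96.0,
    "allergy_shots_history": False,
    "skin_test_positive": True,
    "physician_dx_allergy": False,
    "takes_medicine_as_prescribed": "yes",
}
flags = derive_features(example)
```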
Asthma severity was determined in both the CAMP and ACRN/CARE datasets based on
the NAEPP guidelines. In the CAMP dataset, severity was determined from FEV1% predicted,
nighttime awakenings, symptoms that interfered with daily activity, and healthcare utilization
(emergency room visits, medical visits, hospitalizations). In the ACRN/CARE dataset, severity
was determined from the FEV1% predicted, symptoms per week, rescue medication use per
week, emergency room or hospitalizations, and oral steroid use.
Outcome (control state)
After feature exclusion, the variables that comprised guideline criteria to define control in the
CAMP study were a) impairment: FEV1, symptoms of phlegm, nighttime awakenings, short-acting
beta-agonist use for symptom control, and interference with normal activity and b) risk:
use of healthcare resources (emergency room, hospitalizations). The missing rate of the oral
steroid data was too high for it to be included. Although we assessed baseline adherence in the CAMP
data (similar across groups), we were not able to exclude non-adherent patients because the
resulting sample size would have been too small. In the ACRN/CARE dataset, the variables available to determine
control were FEV1% predicted, symptoms per week, rescue medication use per week,
emergency room or hospital utilization, and oral steroid use. For both datasets, the worst-case
control category across criteria was assigned.
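The worst-case assignment can be sketched in Python as taking the maximum category over the available criteria. The criterion names and the per-criterion categories below are hypothetical, the guideline thresholds that map raw values to categories are not reproduced here, and skipping missing criteria is an assumption of this sketch.

```python
from typing import Dict, Optional

CONTROL_LABELS = {
    1: "Well-Controlled",
    2: "Not-Well-Controlled",
    3: "Very-Poorly-Controlled",
}


def overall_control(criterion_categories: Dict[str, Optional[int]]) -> int:
    """Return the worst (highest-numbered) category among the observed criteria."""
    observed = [c for c in criterion_categories.values() if c is not None]
    if not observed:
        raise ValueError("no control criteria available for this visit")
    return max(observed)


# Hypothetical visit: each criterion already mapped to a category from 1 to 3.
visit = {
    "fev1": 1,
    "nighttime_awakenings": 2,
    "short_acting_beta_agonist_use": 1,
    "healthcare_use": None,  # missing at this visit (skipped by assumption)
}
category = overall_control(visit)
label = CONTROL_LABELS[category]
```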
Additional results
State-Transition Model (probability of control)
For all patients, the probability of being in a well-controlled state was higher with
budesonide than with nedocromil. The A(+)/O(-) patients were 7% more likely to be in a
well-controlled state in the budesonide group than in the nedocromil group. For the A(+)/O(+) group,
among patients who were well-controlled, the largest difference in the probability of remaining
well-controlled at each interval (26%) favored those treated with nedocromil. However, for patients in a “not-well”
or “very-poorly” controlled state, budesonide outperformed nedocromil in moving them to a
more controlled category (Figure E4).
Predictive Features
We identified the top 10 predictive features across phenotypes. The top three ranked
features are reported in the manuscript and Tables E1-4 show the top 10 predictive features in all
categories.
For the single-feature prediction using Python, we were unable to analyze the
obese-and-allergic phenotype because the sample was too small for this
method. Instead, we evaluated all allergic patients (which included obese patients) to identify
predictor variables of control. On average, a lower value of these features was associated
with a well-controlled state. During the first 400 days of treatment, for those treated with
nedocromil, airways hyperreactivity, total eosinophil count, social desirability score (13), and
bone mineral content/density were predictive of a well-controlled state. For those treated with
budesonide, airways hyperreactivity, age, hemoglobin value, social desirability score, and bone
mineral content/density were the most predictive features. For nedocromil during the last
1000-1400 days, the most predictive features for a well-controlled state were airways hyperreactivity,
hemoglobin value, social desirability score, and bone mineral content/density. The most
predictive variables for the budesonide group were age and bone mineral content/density. Of
these variables, airways hyperreactivity and peripheral blood eosinophil count (Figure E5) have
been identified in previous cluster studies as relevant to asthma control.
References
E1. Yoon J, Alaa AM, Cadeiras M, van der Schaar M. Personalized Donor-Recipient Matching for Organ Transplantation. Association for the Advancement of Artificial Intelligence (AAAI) Conference. 2017.
E2. Zimmerman DW. A note on interpretation of the paired-samples t test. Journal of Educational and Behavioral Statistics. 1997;22(3):349-360.
E3. Shalev-Shwartz S, Ben-David S. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press. 2014.
E4. Hyafil L, Rivest RL. Constructing Optimal Binary Decision Trees is NP-Complete. Information Processing Letters. 1976.
E5. Berend D, Tassa T. Improved Bounds on Bell Numbers and on Moments of Sums of Random Variables. Probability and Mathematical Statistics. 2010;30(2):185-205.
E6. Asmussen S. Applied Probability and Queues. Springer Science & Business Media. 2003:7.
E7. Parzen E. Stochastic Processes. Courier Dover Publications. 2015:188.
E8. National Asthma Education and Prevention Program. Expert Panel Report 3 (EPR-3): Guidelines for the Diagnosis and Management of Asthma-Summary Report 2007. J Allergy Clin Immunol. 2007;120(5 Suppl):S94-138.
E9. Childhood Asthma Management Program Research Group. The Childhood Asthma Management Program (CAMP): design, rationale, and methods. Control Clin Trials. 1999;20(1):91-120.
E10. Berg BA. Markov Chain Monte Carlo Simulations and Their Statistical Analysis (With Web-Based Fortran Code). Hackensack: World Scientific. 2004.
E11. Freund Y, Schapire RE. A decision-theoretic generalization of on-line learning and an application to boosting. European Conference on Computational Learning Theory. Springer Berlin Heidelberg. 1995.
E12. Hall P, Racine J, Li Q. Cross-Validation and the Estimation of Conditional Probability Densities. Journal of the American Statistical Association. 2004;99(468):1015-1026.
E13. Dadds MR, Perrin S, Yule W. Social desirability and self-reported anxiety in children: An analysis of the RCMAS Lie Scale. Journal of Abnormal Child Psychology. 1998;26(4):311-317.
Figure Legends
Figure E1. Block diagram of Predictor Pursuit (left: greedy algorithm, right: a recursive step of
the algorithm)
Figure E2. Block diagram of Predictor Pursuit for outcome prediction
Figure E3. State-transition model diagram
Figure E4. State-transition model from the CAMP dataset for the responses to budesonide versus
nedocromil for a) all patients, b) allergic-not-obese patients, and c) obese-and-allergic
patients. Each node represents a control state (Well, Not Well, Very Poor) with the
probability that patients will remain in that state at each interval. The arrows are the
probabilities of patients moving to a better (solid line) or worse (dotted line) control state at each
four-month interval.
Figure E5. Airways hyperreactivity (AH) and peripheral blood eosinophil (Eos) count over
time for all allergic patients (A(+)/O(+/-)). Lower AH and Eos values both predict a more
well-controlled state.
Table E1. The top 10 relevant features and their correlations for predicting short-term (four months) and
long-term (one year) pediatric asthma controllability in the CAMP study, regardless of
assigned treatment

| Rank | Short-term (four months): Feature name | Correlation | Long-term (one year): Feature name | Correlation |
|------|----------------------------------------|-------------|------------------------------------|-------------|
| 1 | Current control state | 0.5778 | Airways hyperreactivity | -0.2780 |
| 2 | Airways hyperreactivity | -0.2255 | Eosinophils (%) | -0.1634 |
| 3 | Standing Height | -0.1844 | Wheezes apart from cold | 0.1543 |
| 4 | Age | -0.1732 | Total eosinophil % of WBC, cells/mL | -0.1535 |
| 5 | Eosinophils (%) | -0.1678 | Gender (Female) | 0.1471 |
| 6 | Total eosinophil % of WBC, cells/mL | -0.1585 | Croup | 0.1380 |
| 7 | L1 BMC (grams) | -0.1580 | Daily asthma meds | 0.1302 |
| 8 | L1 area (cm2) | -0.1569 | Total eosinophil count | -0.1274 |
| 9 | Wheezes apart from cold | -0.1567 | Attacks after exercise | 0.1274 |
| 10 | Total spine BMC | -0.1509 | Nasal discharge | -0.1261 |

CAMP=Childhood Asthma Management Program. L=lumbar. BMC=Bone mineral content.
WBC=white blood cells
Table E2. The top 10 relevant features and correlation to predict short-term (four months)
pediatric asthma controllability in the CAMP study according to discovered phenotype
regardless of assigned medication.
Rank
A(+)/O(+)
Feature
A(+)/O(-)
R
Feature
A(-)/O(+)
R
Feature
A(-)/O(-)
R
Feature
R
1
Current control
0.5675
Current control
0.5622
Current control
0.5557
Current control
0.5471
2
Airways
Hyperreactivity
-0.2294
Airways
Hyperreactivity
-0.2614
Airways
Hyperreactivity
-0.2269
Airways
Hyperreactivity
-0.2344
3
4
Height
0.2191
Height
0.1802
Height
0.2103
Height
0.1927
0.1945
L1 BMC
(grams)
Age
0.1907
0.1792
L1 BMC
(grams)
Age
0.1727
0.1935
Wheezes apart
from cold
Age
-0.1668
5
L1 BMC
(grams)
Age
0.1704
6
Weights
0.1935
0.1785
Weights
0.1668
7
L1 area (cm2)
0.1867
0.1783
Yes: airways
hyperreactivity
Current grade in
school
L3 BMC
(grams)
0.1861
L3 BMC
(grams)
L1 area (cm2)
0.1773
L3 BMC
(grams)
Total spine
BMC (grams)
L1 area (cm2)
0.1607
8
Total spine
BMC (grams)
L2 BMC
(grams)
Weights
L2 BMC
(grams)
0.1577
9
10
0.1842
0.1840
L1 BMC
(grams)
Daily asthma
meds August
# months daily
asthma med
Total spine
BMC (grams)
L2 BMC
(grams)
0.1601
0.1572
-0.1566
-0.1531
0.1500
0.1478
0.1781
0.1693
0.1606
0.1587
CAMP=Childhood Asthma Management Program. R=correlation. L=lumbar. BMC=Bone
mineral content. WBC=white blood cells.
Table E3. The top 10 relevant features and correlation to predict long-term (one year)
pediatric asthma controllability in the CAMP study based on discovered phenotype
regardless of assigned medication
Rank
A(+)/O(+)
A(-)/O(+)
A(-)/O(-)
R
Feature
R
Feature
R
Feature
R
1
Airways
Hyperreactivity
-0.3003
Airways
Hyperreactivity
-0.2993
Airways
Hyperreactivity
-0.3046
Airways
Hyperreactivity
-0.3167
2
Total eos count
by %WBC
-0.1710
Croup
0.1641
Croup
0.1770
Total eos count
by %WBC
-0.1801
3
Eos %
-0.1663
Eos %
-0.1594
-0.1654
Eos %
-0.1736
4
Croup
0.1605
-0.1469
-0.1637
-0.1611
5
Female
0.1524
6
Any positive
core test
Attacks after
exercise
Total eos count
-0.1518
Daily asthma
meds
Wheezes apart
from cold
Total eos count
by %WBC
Female
Any positive
core test
Total eos count
-0.1292
Coughs on most
days
-0.1201
7
8
9
10
Feature
A(+)/O(-)
Coughs most
days
Wheezes apart
from cold
-0.1487
-0.1476
-0.1457
-0.1454
-0.1446
-0.1418
0.1305
-0.1220
Total eos count
by %WBC
Eos %
Wheezes apart
from cold
Total eos
-0.1453
Wheezes apart
from cold
Female
-0.1425
Total eos
-0.1592
Any positive
core test
Ever in hospital
from asthma
Female
-0.1423
Attacks after
exercise
Croup
-0.1524
Any positive
core test
Nasal
obstruction
-0.1467
Attacks after
exercise
-0.1405
0.1403
-0.1368
0.1608
0.1493
-0.1357
CAMP=Childhood Asthma Management Program. R=correlation. L=lumbar. BMC=Bone
mineral content. WBC=white blood cells.
Table E4. The top 10 relevant features and correlation to predict long-term (one year) pediatric
asthma controllability in the CAMP study based on phenotype and assigned medication
Rank
A(+)/O(+)
Feature
A(+)/O(-)
R
Feature
A(-)/O(+)
R
Feature
A(-)/O(-)
R
Feature
R
Bud
-0.1835
5
Attacks after
exercise
Eos %
Airways
Hyperreactivity
Wheezes apart
from cold
Daily asthma
meds
Croup
-0.1772
Eos %
-0.1605
6
Total eos count
-0.1757
Female
0.1469
7
Wheezes apart
from cold
Asthma med,
2+ sx or bronch
Weights
-0.1755
Total eos count
by %WBC
Asthma med, 2+
sx or bronch
Any positive
core test
Total eos count
-0.1402
1
2
3
4
8
9
10
Airways
Hyperreactivity
Total eos count
by %WBC
Female
-0.2858
-0.2157
0.1929
-0.1667
-0.1628
Other illness in
past 12 months
-0.1582
Airways
hyperreactivity
Croup
-0.3244
-0.2861
-0.1824
-0.1660
0.1630
-0.1371
-0.1236
-0.1235
Airways
Hyperreactivity
Total eos count
by %WBC
Ever in hospital
for asthma
Croup
Daily asthma
meds
Basement or
attic
Wheezes apart
from cold
Total eos count
-0.2031
Airways
Hyperreactivity
Total eos count
by %WBC
Total eos count
0.2002
Eos %
-0.1894
-0.1916
Attacks after
exercise
Wheezes apart
from colds
Croup
-0.1723
-0.2759
-0.2058
-0.1860
-0.1821
-0.2207
-0.1933
-0.1693
0.1638
Any positive
core test
Ever in hospital
for asthma
Any positive
core test
-0.1605
-0.3402
0.2153
Airways
hyperreactivity
Eos %
-0.1881
Use commercial
cockroach spray
Cockroaches
0.1836
Eos %
-0.1813
0.1644
-0.1789
-0.1590
Wheezes apart
from colds
Croup
-0.1582
Female
0.1739
-0.1715
Asthma med, 2+
sx or bronch
Any positive
core test
-0.1814
-0.3097
-0.1798
-0.1625
-0.1577
-0.1396
Ned
0.1690
4
Use commercial
cockroach spray
Nausea, current
Airways
hyperreactivity
Daily asthma
meds
Croup
-0.1590
Eos %
-0.1403
5
Eos %
-0.1488
-0.129
6
Asthma med,
2+ symptoms or
bronch
Other HEENT
abnormality
Months asthma
med, 2+ sx or
bronch
Total eos count
by %WBC
Nasal discharge
-0.1486
Coughs on most
days
Any positive
core test
-0.1236
Asthma med, 2+
sx or bronch
Eos %
-0.1382
Female
0.1187
Female
0.1483
-0.1319
Asthma med, 2+
symptoms or
bronch
Wheezes apart
from colds
Total eos count
-0.1174
Coughing after
eating
-0.1483
Attacks after
exercise
Nausea, current
-0.1163
Coughs on most
days
Use humidifier
-0.1481
Total eos count
-0.1582
-0.1399
Nasal discharge
-0.1553
1
2
3
7
8
9
10
0.1795
-0.1319
-0.1278
-0.3074
-0.1563
0.1527
-0.1148
Airways
hyperreactivity
Croup
-0.3439
0.1782
0.1612
CAMP=Childhood Asthma Management Program. R=correlation. L=lumbar. BMC=Bone
mineral content. WBC=white blood cells
Figures
Figure E1.
Figure E2.
Figure E3.
Figure E4.
Figure E5.