A Novel Pupillometric-Based Application for the Automated Diagnosis of ADHD Using Machine Learning

New York City Science and Engineering Fair 2020
Junior Science and Humanities Symposium 2020

Shubh Khanna
Hunter College High School, New York, NY, 10128

Khanna, 2019

Table of Contents
List of Figures
Ethics Statement
Introduction
Methods
Results and Conclusions
References and Literature Cited

List of Figures
Figure 1 Diagram of patient data collection
Figure 2 Screenshot of data analyzed by the machine learning algorithms

Ethics Statement
All research was conducted independently, but with guidance from an experienced teacher, and followed state and federal regulatory guidance applicable to the humane and ethical conduct of such research.

Abstract
Attention-deficit/hyperactivity disorder (ADHD) is a neurodevelopmental disorder that affects more than 6 million children.
ADHD is characterized by hyperactivity, impulsivity, and inattention, and often co-occurs with other psychiatric conditions such as anxiety, depression, schizophrenia, and autism (O’Connell et al., 2019). It can be severely debilitating for affected individuals, adversely impacting academic and occupational outcomes and impairing socio-emotional development (Mueller et al., 2010). As such, the timely and accurate diagnosis of ADHD is critical for giving patients the treatments needed to mitigate its effects. However, the current diagnosis of ADHD is subjective, time-consuming, and inaccurate. Clinical assessments often last multiple hours, and the demand for these examinations greatly exceeds the capacity of the available developmental pediatric clinics (Duda et al., 2016). In this project, a machine learning model was developed that analyzes de-identified, open-source pupil dynamics data from ADHD-positive and ADHD-negative subjects, using pupil movement as an objective biomarker to characterize ADHD. An empirical evaluation of various machine learning models showed that the Naive Bayes classification algorithm yielded the highest accuracy and precision. After being trained on the data, the model was evaluated on a held-out testing set, diagnosing ADHD with 90.6% accuracy and a 93.5% average precision-recall rate, and yielding an area under the receiver operating characteristic curve (AUROC) of 0.926. The model offers a novel and reliable technical approach to diagnosing ADHD that is time-efficient, accurate, and freely accessible, and it illustrates the potential of pupillometry as a biomarker for identifying ADHD.

I. Introduction
Attention-deficit/hyperactivity disorder (ADHD) is a clinically heterogeneous neurobehavioral disorder characterized by inattention, impulsivity, and hyperactivity.
ADHD is the most prevalent childhood behavioral disorder: studies have shown that its prevalence in children and adolescents is approximately 5-8% (Peng et al., 2013). ADHD is caused by an array of genetic, neurological, and behavioral factors. Symptoms stem from problems in behavioral and cognitive control, and have been attributed to deficient dopaminergic signaling. Brown et al. developed a comprehensive model to describe the complex cognitive functions impaired by ADHD, in which the impairments of the brain's executive functions are attributed to inherited problems in the chemistry of the system. Furthermore, the dynamic developmental behavioural theory predicts that behaviour and symptoms in ADHD result from the interplay between individual predispositions and the surroundings, with hypofunctioning dopamine branches representing the main individual predispositions (Sagvolden et al., 2005).

To treat the disorder, psychiatrists typically prescribe methylphenidate, a stimulant, as the initial treatment, and atomoxetine, a noradrenaline reuptake inhibitor, as the usual second treatment. Both methylphenidate and atomoxetine increase catecholamine availability at synapses, altering neural activity. This reliance on medication is why improper diagnosis can be extremely detrimental to a patient: methylphenidate and atomoxetine are targeted prescription drugs that can cause side effects in patients without ADHD.

The diagnosis of ADHD is inaccurate (the misdiagnosis rate is more than 20%), in part because of its comorbidity with similar behavioral and psychological disorders. Currently, diagnosis is based solely on observed behavior and reported symptoms, creating a risk of over- and under-diagnosis. ADHD patients display clear heterogeneities at the clinical and biological levels, further complicating diagnosis.
There are well-established, strong associations with lower IQ and intellectual disability; with specific learning and developmental problems, such as reading disability, speech and language problems, and motor coordination difficulties; and with autistic spectrum disorders. ADHD also shows a high level of comorbidity with other psychiatric and behavioural disorders, notably conduct disorder, antisocial behaviour, and mood disorders, which is explained by shared heritability. As a result, ADHD might be best considered as one of a cluster of externalising disorders that have various psychological implications (Andrews, 2009). Some of these neurodevelopmental problems may present later in development and might arise as a consequence of having ADHD (analogous to diabetes potentially resulting in blindness) (Willcutt et al., 2012). Early diagnosis of this disorder is of prime importance in preventing subsequent complications such as negative effects on children’s social interactions.

However, the diagnosis of ADHD is flawed for two reasons: (1) it relies on subjective and inefficient methods, and (2) it is unable to detect ADHD at its earliest stages. There is no objective test to diagnose ADHD, and clinical tests cost between $800 and $2000 per screening, making diagnosis of ADHD inaccessible to a majority of the world. Clinical assessment typically involves the use of multiple methods and informants, which is often slow and inefficient. The diagnosis is based on a checklist of eighteen symptoms, nine related to inattention and nine related to hyperactivity and impulsivity. Clinicians make a positive diagnosis if a patient shows at least six symptoms in either category, to a degree that is inappropriate for their age and that disrupts or impairs social, school, or work functioning.
These subjective clinical assessments, the only way doctors are able to evaluate ADHD, often last multiple hours, and the demand for these examinations greatly exceeds the capacity of the available developmental pediatric clinics (Duda et al., 2016). As a result, children are often waitlisted for over a year, preventing a timely diagnosis and delaying the start of necessary treatment (Duda et al., 2016).

Other measures used in the assessment of ADHD include computerized performance tests and lab or analogue observations. However, problems exist with each of these assessment tools. For example, diagnostic interviews and behavior rating scales are subjective, prone to social desirability bias, and dependent upon accurate reporting of symptoms. Moreover, the majority of subjects tested for ADHD are children, who typically do not give objective or accurate answers that can lay a clear path to an appropriate diagnosis. Additionally, rating scales often contain vague and poorly defined items. Assessment methods such as computerized performance tests have been criticized for poor clinical utility. Lab or analogue observations have also been criticized for high subjectivity, as they depend upon the observer’s criteria for what qualifies as inattentive or hyperactive behavior, and few observational measures have norms.

This inefficient and expensive diagnostic process exacerbates socioeconomic divides and limits the ability to treat ADHD: racial and ethnic minorities have been reported to be diagnosed with ADHD at lower rates than white children, and therefore may have unmet treatment needs. African American children are diagnosed with ADHD at only two-thirds the rate of white children despite displaying greater symptomatology, and Hispanic children are also underdiagnosed.
Misdiagnosis, or a lack of timely diagnosis, is detrimental to patients because it leads to inappropriate treatment interventions, such as providing stimulant medication to a child with a reading disorder or providing only academic interventions to a child with ADHD. Furthermore, untreated ADHD during childhood is a risk factor for later adult mental health issues, which extend beyond academic impairment to lethal disorders. The lack of treatment impairs social and occupational functioning and increases the likelihood of developing comorbid disorders like anxiety, depression, personality disorders, and antisocial behaviors. It is thus necessary to diagnose ADHD accurately, promptly, and cost-effectively.

To reduce the likelihood of misdiagnosis and clearly identify ADHD at its onset, an objective assessment using a biological marker is imperative to define and diagnose the disorder. Oculomotor paradigms provide a promising methodology for characterizing maturational abnormalities in brain systems associated with neurodevelopmental disorders. During the last few years, they have been applied to a number of psychiatric disorders, such as obsessive-compulsive disorder (Rosenberg et al., 1997), autism (Kemner et al., 2004), dyslexia (Ramus, 2003), and Tourette syndrome (Swenney, 2004).

A promising biomarker of oculomotor changes in human samples is pupil-size dynamics. After controlling for stimulus luminance, pupil size correlates with task difficulty, emotional valence, physical effort, motor output, and arousal state. Independent findings suggest that these fluctuations in pupil size reflect the state of the brain norepinephrine (NE) system. This system originates in the locus coeruleus (LC) and projects throughout the cerebral cortex, hippocampus, thalamus, and midbrain, among others.
Brain areas associated with attentional processing (e.g., parietal cortex, pulvinar nucleus, superior colliculus) receive particularly dense LC-NE innervation. Data from animal models, both rodents and non-human primates, have shown a central role for the LC in selective attention. Pupil size can now be monitored in humans completely noninvasively, using a remote camera and infrared light.

We hypothesized that if pupil size reflects the activity of the LC-NE system, one of the important attentional systems in humans, then pupil size can be used as an objective biomarker to diagnose ADHD. If this is the case, a machine learning model based on changes in pupil size should reflect the behavioral differences observed between ADHD patients and control subjects, and can be implemented in a freely accessible, web-based application that uses a standard computer camera to diagnose ADHD. By extracting a descriptive and comprehensive set of features from the preprocessed eye movement data and feeding it into the machine learning model, we hypothesized that an accurate and reliable mechanism to diagnose ADHD could be developed. Furthermore, we sought to empirically compare a range of machine learning models, and to develop a neural network architecture that analyzes the raw time series of eye gaze coordinates and pupil sizes using long short-term memory (LSTM) networks, a variant of recurrent neural networks that employs memory cells to retain information over longer spans of temporal data. Our web application aims to change the way our society approaches the diagnosis of ADHD and other psychiatric disorders.

II. Methods

II.I. Dataset
The machine learning model developed in this project was trained and evaluated on a dataset collected from subjects with and without ADHD, with three groups labelled Off-ADHD, On-ADHD, and Ctrl.
The dataset was obtained as part of a study conducted by Wainstein et al. (2019). Off-ADHD corresponds to positively diagnosed subjects not taking any medication, while On-ADHD corresponds to those taking medication. Ctrl represents the control group of healthy subjects. A group of 50 subjects participated in the study: 28 were patients diagnosed with ADHD, drawn from a pool of diagnosed patients, and 22 were healthy control children recruited from local schools. All ADHD children were being treated with methylphenidate at the time of the study, and a subgroup of 17 ADHD patients (3 girls, age: 11.19±0.86 years) performed the task twice, on and off medication. All ADHD subjects were diagnosed following standard procedures by a neurologist, and participated with parental consent. All procedures were approved by the Ethics Committee of the School of Medicine of Pontificia Universidad Católica de Chile.

Figure I: Data collection. From Wainstein et al., 2017.

II.II. Data Preprocessing
Analysis of the raw pupil data was programmed and executed in Python. Adopting methods from Wainstein et al., the raw pupil data were preprocessed so that pupillometric features could later be extracted. Pupil data surrounding blinks were removed from the time series used in the analyses. Specifically, pupil data within 50 ms of a period of missing data (a gap) spanning more than 75 ms were removed. Pupil diameter during these periods was estimated using cubic spline interpolation, which works as follows: four points (A, B, C, and D) are placed around the onset and offset of the blink. Point B is placed slightly before the onset of the blink; point C is placed slightly after the offset of the blink. Point A is then placed before point B; point D is placed after point C. Points are equally spaced, such that the distances between A and B, B and C, etc. are constant.
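The gap detection and point placement described so far can be sketched in Python. This is a minimal illustration under stated assumptions, not the project's actual code: the function names `find_gaps` and `control_points`, the fixed sampling period `dt_ms`, and the use of `None` for missing samples are all hypothetical, and only the 75 ms and 50 ms thresholds come from the text.

```python
from typing import List, Optional, Tuple


def find_gaps(samples: List[Optional[float]], dt_ms: float,
              min_gap_ms: float = 75.0) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive, for every run of
    missing samples (None) that lasts longer than min_gap_ms."""
    gaps = []
    start = None
    # A trailing non-missing sentinel closes a gap that reaches the end.
    for i, value in enumerate(list(samples) + [0.0]):
        if value is None:
            if start is None:
                start = i
        elif start is not None:
            if (i - start) * dt_ms > min_gap_ms:
                gaps.append((start, i))
            start = None
    return gaps


def control_points(gap: Tuple[int, int], dt_ms: float, n_samples: int,
                   margin_ms: float = 50.0) -> Tuple[int, int, int, int]:
    """Place equally spaced sample indices A, B, C, D around one gap.

    B is the last sample kept before the blink and C the first sample
    kept after it, once margin_ms of data on each side is discarded.
    Indices are clamped to the recording, which can break the equal
    spacing for gaps close to either end (a simplification here).
    """
    start, end = gap
    pad = round(margin_ms / dt_ms)   # samples discarded on each side
    b = start - pad - 1
    c = end + pad
    spacing = c - b
    a, d = b - spacing, c + spacing

    def clamp(i: int) -> int:
        return max(0, min(n_samples - 1, i))

    return clamp(a), clamp(b), clamp(c), clamp(d)


# Example: 100 samples at 10 ms, one 100 ms blink and one 30 ms dropout.
trace: List[Optional[float]] = [3.0] * 100
for i in range(40, 50):
    trace[i] = None
for i in range(70, 73):
    trace[i] = None

gaps = find_gaps(trace, dt_ms=10.0)                      # [(40, 50)]
points = control_points(gaps[0], dt_ms=10.0, n_samples=len(trace))
print(gaps, points)                                      # (13, 34, 55, 76)
```

Only the 100 ms gap qualifies, because the 30 ms dropout falls below the 75 ms threshold. The four returned indices are then the anchor points for the interpolation.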
After this, a smooth line is drawn through all four points, replacing the missing and distorted data between B and C (Mathot et al.).

Figure II: Data analyzed. From Wainstein et al., 2017.

To obtain representative pupil diameter metrics, data from each patient were baseline-adjusted and smoothed with a bandpass Butterworth filter between 0.025 Hz and 4 Hz (Wainstein et al.). The low-pass component removed high-frequency noise, and the high-pass component detrended the slow basal change of the pupil diameter across trials (Wainstein et al.). Furthermore, outliers, defined as periods of pupil change more than 3 standard errors from the mean, were discarded. Additionally, dilation speed outliers were removed from the raw pupil data; these are samples that show a disproportionately large absolute change in pupil size relative to their adjacent samples (Kret et al.). Because pupil size measurements may be non-uniformly spaced in time, the absolute change between adjacent samples is divided by their temporal separation, producing the normalized dilation speed between samples. Lastly, all trials with more than 50% missing data (due to blinks or outliers) were removed. For each of the 160 trials, 8 s of data were analyzed. After filtering the data and removing outliers, the pupil time series was normalized by calculating z-scores of the pupil size every second, separately for each trial.

II.III. Feature Extraction
A number of pupillometric features were extracted from each trial as inputs to the machine learning models: the maximum pupil size during probe presentation; the mean and maximum pupil dilation velocity before and after stimulus onset, as well as the differences between these values; the difference in maximum accumulated pupil dilation velocity before and after stimulus onset; the mean and median pupil size during the trial; and the skew, kurtosis, and standard deviation of these values. For each patient, each feature was averaged across the 160 trials.

II.IV. Machine Learning Application
The following classification algorithms were trained on the dataset: K-Nearest Neighbors, Random Forest, Gradient Boosting Classifier, Naive Bayes, Decision Tree Classifier, and AdaBoost Classifier. An empirical evaluation of each algorithm was then conducted: the accuracy, precision, recall, and AUROC values were calculated for each algorithm, allowing their effectiveness to be compared. The selected model was then placed in the backend of a computer web camera application, which takes in visual eye data, converts it into pupil dynamics measurements, and returns a prediction.

II.V. Deep Learning Application
A long short-term memory (LSTM) neural network architecture developed by Pompsun et al. was adopted and modified. It consists of 3 hidden layers, each taking as input a fixed-size sequence of raw gaze points extracted by a sliding window approach. From each gaze point, we took a 6-dimensional feature vector consisting of the horizontal and vertical gaze coordinates and the pupil size for both eyes (Pompsun et al.). The model receives a feature vector as its input and outputs a probability distribution over all classes for every single time step.
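The input and output shapes described above can be illustrated with a short Python sketch. It covers only the sliding-window extraction and the form of a per-step class distribution, not the LSTM itself. The helper names `sliding_windows` and `softmax`, the window length, the stride, and the example scores are all assumptions made for illustration; only the 6-dimensional feature vector and the three class labels come from the text.

```python
import math
from typing import List, Sequence

CLASSES = ["Off-ADHD", "On-ADHD", "Ctrl"]  # group labels from the dataset


def sliding_windows(gaze_points: Sequence[Sequence[float]],
                    window: int, stride: int) -> List[List[Sequence[float]]]:
    """Cut a recording into fixed-size, possibly overlapping sequences.

    Each gaze point is a 6-dimensional vector:
    (x_left, y_left, pupil_left, x_right, y_right, pupil_right).
    """
    return [list(gaze_points[i:i + window])
            for i in range(0, len(gaze_points) - window + 1, stride)]


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn raw per-class scores into a probability distribution."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Ten synthetic gaze points, window of 4 samples, stride of 2 samples.
recording = [[float(t)] * 6 for t in range(10)]
windows = sliding_windows(recording, window=4, stride=2)
print(len(windows), len(windows[0]), len(windows[0][0]))  # 4 4 6

# Hypothetical network scores for one time step, mapped to probabilities.
probabilities = softmax([1.2, 0.3, -0.5])
print(dict(zip(CLASSES, probabilities)))
```

In a real model the scores passed to `softmax` would come from the network's output layer at each time step; here they are placeholder values.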
The model extracts temporal feature representations and the dynamic disparities between the two eyes from the weighted input and the sequential information preserved in the network’s hidden state.

III. Results

III.I. Validation Metrics
The accuracy, precision, recall, and AUROC values were calculated for each algorithm, showing that the Naive Bayes Classifier yielded the best values, with an accuracy of 90.6%, an AUROC of 91.9%, and an average precision-recall rate of 93.6%. The machine learning model’s strong performance illustrates the strong correlation between pupillometric responses to a visuospatial memory task and the presence of ADHD, supporting our method of diagnosing ADHD. The machine learning algorithms are currently being further optimized, and a web application incorporating this model is being developed in order to change how ADHD is diagnosed.

IV. Discussion

IV.I. Future Implications
This model offers a novel and reliable technical approach to diagnosing ADHD that is time-efficient and freely accessible. This study highlights pupil dynamics as an effective biomarker for ADHD diagnosis and implemented an application that uses the built-in computer web camera to measure pupillometrics and diagnose a subject accurately. This novel approach to diagnosing a widespread neurobehavioral disorder will allow children across the world to receive the help and support they need and to reach their potential. Moreover, the eye tracking methodology developed in this research could also be applied to other neurological conditions, thereby democratizing access to screenings for other neurobehavioral disorders. At the moment, this study is pursuing IRB approval to conduct further testing of the web application at local institutions and hospitals, so that the work can be as tangible and effective as possible.

V. References

Andrews, G., Pine, D.S., Hobbs, M.J., Anderson, T.M., & Sunderland, M. (2009).
Neurodevelopmental disorders: Cluster 2 of the proposed meta-structure for DSM-V and ICD-11. [Review]. Psychological Medicine, 39, 2013-2023.

Brown TE: ADHD comorbidities. Handbook for ADHD complications in children and adults. American Psychiatric Publishing 2009.

Costa, Vincent D, and Peter H Rudebeck. “More than Meets the Eye: the Relationship between Pupil Size and Locus Coeruleus Activity.” Neuron vol. 89,1 (2016): 8-10. doi:10.1016/j.neuron.2015.12.031

D. Niall Hartnett, Jason M. Nelson & Anne N. Rinn (2004) Gifted or ADHD? The possibilities of misdiagnosis, Roeper Review, 26:2, 73-76, DOI: 10.1080/02783190409554245

Duda, M., Ma, R., Haber, N. et al. Use of machine learning for behavioral distinction of autism and ADHD. Transl Psychiatry 6, e732 (2016) doi:10.1038/tp.2015.221.

Fatima, M. and Pasha, M. (2017) Survey of Machine Learning Algorithms for Disease Diagnostic. Journal of Intelligent Learning Systems and Applications, 9, 1-16. https://doi.org/10.4236/jilsa.2017.91001.

Ford-Jones, Polly Christine. “Misdiagnosis of attention deficit hyperactivity disorder: 'Normal behaviour' and relative maturity.” Paediatrics & child health vol. 20,4 (2015): 200-2. doi:10.1093/pch/20.4.200

Gonon, F. The dopaminergic hypothesis of attention-deficit/hyperactivity disorder needs re-examining. Trends in neurosciences 32, 2–8, doi:10.1016/j.tins.2008.09.010 (2009).

Jae-Won Kim, Vinod Sharma, Neal D. Ryan. Predicting Methylphenidate Response in ADHD Using Machine Learning Approaches. International Journal of Neuropsychopharmacology. Volume 18, Issue 11. October 2015. doi: 10.1093/ijnp/pyv052

Johnson, K. A. et al. Response variability in attention deficit hyperactivity disorder: evidence for neuropsychological heterogeneity. Neuropsychologia 45, 630–638, doi:10.1016/j.neuropsychologia.2006.03.034 (2007).

Kret, Mariska & Sjak-Shie, Elio. (2018). Preprocessing pupil size data: Guidelines and code. Behavior Research Methods. 51. 10.3758/s13428-018-1075-y.

Lichtenstein, P., Carlstrom, E., Rastam, M., Gillberg, C., & Anckarsater, H. (2010). The genetics of autism spectrum disorders and related neuropsychiatric disorders in childhood. American Journal of Psychiatry, 167, 1357–1363.

Mueller, A., Candrian, G., Kropotov, J.D. et al. Classification of ADHD patients on the basis of independent ERP components using a machine learning system. Nonlinear Biomed Phys 4, S1 (2010) doi:10.1186/1753-4631-4-S1-S1.

Privitera, C. M., Renninger, L. W., Carney, T., Klein, S. & Aguilar, M. Pupil dilation during visual target detection. J Vis 10, 3, doi:10.1167/10.10.3 (2010).

Rello, L., & Ballesteros, M. (2015). Detecting readers with dyslexia using machine learning with eye tracking measures. Proceedings of the 12th Web for All Conference on - W4A 15. doi:10.1145/2745555.2746644.

Rojas-Líbano, D., Wainstein, G., Carrasco, X. et al. A pupil size, eye-tracking and neuropsychological dataset from ADHD children during a cognitive task. Sci Data 6, 25 (2019) doi:10.1038/s41597-019-0037-2.

Rutter, M. (2011). Research review: Child psychiatric diagnosis and classification: Concepts, findings, challenges and potential. Journal of Child Psychology and Psychiatry, 52, 647–660.

Sagvolden T, Johansen EB, Aase H, Russell VA: A dynamic developmental theory of attention-deficit/hyperactivity disorder (ADHD) predominantly hyperactive/impulsive and combined subtypes. Behav Brain Sci 2005, 28:397-419.

Serdar Baltaci & Didem Gokcay (2016) Stress Detection in Human–Computer Interaction: Fusion of Pupil Dilation and Facial Temperature Features, International Journal of Human–Computer Interaction, 32:12, 956-966, doi: 10.1080/10447318.2016.1220069.

Sheehan, D. V. et al. The Mini-International Neuropsychiatric Interview (M.I.N.I.): the development and validation of a structured diagnostic psychiatric interview for DSM-IV and ICD-10. The Journal of clinical psychiatry 59(Suppl 20), 22–33; quiz 34–57 (1998).

Solanto MV. Neuropsychopharmacological mechanisms of stimulant drug action in attention-deficit hyperactivity disorder: a review and integration. Behav Brain Res. 1998;94:127-152.

Timm, F., & Barth, E. (2011). Accurate Eye Centre Localisation By Means Of Gradients. Proceedings of the International Conference on Computer Vision Theory and Applications. doi:10.5220/0003326101250130

Uddin LQ, Kelly AMC, Biswal BB, Margulies DS, Shehzad Z, et al (2008) Network homogeneity reveals decreased integrity of default-mode network in ADHD. Journal of Neuroscience Methods 169: 249–254.

Van Slooten JC, Jahfari S, Knapen T, Theeuwes J (2019) How pupil responses track value-based decision-making during and after reinforcement learning. PLOS Computational Biology 15(5): e1007031. https://doi.org/10.1371/journal.pcbi.1007031

Vaidya, C. J. et al. Selective effects of methylphenidate in attention deficit hyperactivity disorder: A functional magnetic resonance study. Proceedings of the National Academy of Sciences 95, 14494–14499, doi:10.1073/pnas.95.24.14494 (1998).

Vaurio, R. G., Simmonds, D. J. & Mostofsky, S. H. Increased intra-individual reaction time variability in attention-deficit/hyperactivity disorder across response inhibition tasks with different cognitive demands. Neuropsychologia 47, 2389–2396, doi:10.1016/j.neuropsychologia.2009.01.022 (2009).

Wainstein, G., Rojas-Líbano, D., Crossley, N.A. et al. Pupil Size Tracks Attentional Performance In Attention-Deficit/Hyperactivity Disorder. Sci Rep 7, 8228 (2017) doi:10.1038/s41598-017-08246.

Willcutt, E.G., Nigg, J., Pennington, B.F., Solanto, M.V., Rohde, L.A., Tannock, R., ... & Lahey, B.B. (2012). Validity of DSM-IV attention deficit/hyperactivity disorder symptom dimensions and subtypes. Journal of Abnormal Psychology.