Identifying Parkinson's Disease Using ANN and KNN Algorithms Based on Voice Disorders

Kavyashree G. M., Asst. Prof., Abhishek Singh, Sidhant Kaul, Vatsal Jain
Department of Computer Science and Engineering, Sir M Visvesvaraya Institute of Technology, Bangalore, Karnataka, India, 562157

Abstract— Speech signal processing has received a lot of attention in recent years due to its wide range of applications. In this study, we performed a comparative analysis of machine learning classifiers for the efficient detection of Parkinson's disease from a speech disorder known as dysphonia. To obtain a robust detection process, we used an artificial neural network (ANN) and the K-Nearest Neighbors (KNN) algorithm to distinguish between PD patients and healthy individuals. Experimental results show that the ANN classifier achieved higher average accuracy than the KNN classifier. The UCI data set used consists of 31 subjects, 23 of whom were diagnosed with Parkinson's disease. The established system can use the ANN to distinguish healthy individuals from people with PD with 96.7% accuracy.

Keywords— Parkinson's disease; ANN; KNN; dysphonia

I. INTRODUCTION

Neurodegenerative diseases have consequences at the personal, occupational and societal levels. They are progressive medical conditions that affect the brain and nervous system and lead to the death of nerve cells. The best known and most common are Alzheimer's disease and Parkinson's disease. Parkinson's disease is specifically associated with the loss of dopaminergic neurons in the basal ganglia. The cost of medication for Parkinson's disease is very high, and no cure has yet been found; medication is limited to early-stage treatment intended to improve the patient's quality of life. Several methods have been used to identify the symptoms of Parkinson's disease, most of which rely on motor behavior that appears only in the advanced stages of the disease. The most commonly used conventional methods for diagnosing the disease are expensive, invasive imaging methods, namely SPECT and CT tomography, which are mainly effective during the mature stages of the disease. In addition to these traditional methods, practitioners follow several other diagnostic paths. Some are based on handwriting, exploiting the relationship between handwriting and nervous system disorders. Current research focuses on speech analysis as a diagnostic pathway; indeed, voice-based systems have been the focus of recent PD telemedicine studies. Therefore, several audio signal processing algorithms are used to extract the information needed for the evaluation of PD, and the extracted features are passed to learning algorithms to build trustworthy decision support systems.

The acoustic parameters most commonly used in acoustic analysis applications today, and most frequently cited in the literature, are the fundamental frequency, jitter, shimmer and HNR. The fundamental frequency (F0), measured in Hertz, is defined as the number of times the vocal folds repeat the generated sound wave in a given time; it also corresponds to the opening and closing cycle of the glottis. This frequency has a typical range of values for different genders and age groups. However, these values are not stationary, because F0 is also used to transmit prosody [1]. Jitter is defined as the cycle-to-cycle variation of the fundamental frequency. It is mainly due to the inability to control the vibration of the vocal folds, and the voices of patients with medical conditions often show a high percentage of jitter; most researchers estimate that, for sustained vocalizations in adults, it ranges from 0.5 to 1.0%. Shimmer refers to the variation in amplitude of the sound wave [2]. It changes in response to decreased glottal resistance and lesions of the vocal folds, and correlates with the presence of noise and breathiness. Values above roughly 3% in adults and 0.4-1% in children are considered signs of pathological speech [3].
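The following minimal sketch (not the MDVP/Praat implementation referred to later in this paper) shows how local jitter and shimmer can be computed once a sequence of per-cycle pitch periods and peak amplitudes has been extracted; the synthetic input values are purely illustrative.

# Minimal sketch: local jitter and shimmer from per-cycle pitch periods (seconds)
# and per-cycle peak amplitudes. Not the MDVP/Praat implementation.
import numpy as np

def local_jitter_percent(periods):
    """Mean absolute difference of consecutive periods, divided by the mean period."""
    periods = np.asarray(periods, dtype=float)
    return 100.0 * np.mean(np.abs(np.diff(periods))) / np.mean(periods)

def local_shimmer_percent(amplitudes):
    """Mean absolute difference of consecutive cycle amplitudes, divided by the mean amplitude."""
    amplitudes = np.asarray(amplitudes, dtype=float)
    return 100.0 * np.mean(np.abs(np.diff(amplitudes))) / np.mean(amplitudes)

# Illustrative values only: a sustained vowel near 150 Hz with small cycle-to-cycle variation.
rng = np.random.default_rng(0)
periods = 1.0 / 150.0 + rng.normal(0, 2e-5, size=200)
amps = 1.0 + rng.normal(0, 0.01, size=200)
print(f"jitter  ~ {local_jitter_percent(periods):.2f} %")
print(f"shimmer ~ {local_shimmer_percent(amps):.2f} %")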
Parkinson's disease detection is based on the use of different classifiers. They are compared using metrics such as classification accuracy, Matthews correlation coefficient (MCC), Spearman correlation coefficient, specificity, sensitivity, F-score (F-measure), etc. Each of these metrics has a formula that can be computed to decide which classifier is qualitatively the most appropriate for a given study. Before defining these criteria, we need to look at the confusion matrix. Also called a contingency table, it is a tool for measuring the performance of a learning model on classification problems by checking how well its predictions compare with reality. Table I shows the confusion matrix for a two-class classifier.

TABLE I. CONFUSION MATRIX

                   Predicted positive   Predicted negative
Actual positive    TP                   FN
Actual negative    FP                   TN

TP (True Positive): the prediction is positive while the actual value is positive.
TN (True Negative): the prediction is negative while the actual value is negative.
FP (False Positive): the prediction is positive while the actual value is negative.
FN (False Negative): the prediction is negative while the actual value is positive.

a) Matthews correlation coefficient (MCC):

MCC = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}    (1)

This coefficient is used as a quality measure of binary and multi-class classification. It is based on the true and false positives and negatives and returns a value between -1 and 1: 1 indicates a perfect classifier (perfect prediction), 0 indicates a classifier no better than random prediction, and -1 indicates complete contradiction between prediction and observation.

b) Spearman correlation coefficient: This coefficient analyzes monotonic non-linear relations (if one of the variables increases, the other does the same, and vice versa):

r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}    (2)

where n is the number of observations and d_i = rg(X_i) - rg(Y_i) is the difference between the ranks of the two variables for each observation.

c) Classification accuracy, specificity, sensitivity: The classification model can be evaluated on the basis of the confusion matrix; the following performance measures are extracted from it.

Precision: the proportion of correct predictions among the points that were predicted positive.

Precision = \frac{TP}{TP + FP}    (3)

Accuracy: the number of correct predictions made by the model over all predictions made.

Accuracy = \frac{TP + TN}{TP + TN + FP + FN}    (4)

Sensitivity: the rate of true positives.

Sensitivity = \frac{TP}{TP + FN}    (5)

Specificity: the rate of true negatives.

Specificity = \frac{TN}{TN + FP}    (6)

d) F1-score: the harmonic average of the recall (sensitivity) and the precision.

F_{score} = 2 \cdot \frac{precision \cdot sensitivity}{precision + sensitivity}    (7)

In general, classifier scores are based on the confusion matrix, and the metrics extracted from it quantify the classifier's performance.
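To make Eqs. (1) and (3)-(7) concrete, the short sketch below computes them directly from the four confusion-matrix counts; the counts in the example are hypothetical and are not taken from the paper's experiments.

# Sketch: the metrics of Eqs. (1) and (3)-(7) computed from the confusion-matrix counts.
# Assumes all denominators are non-zero (i.e. both classes appear in predictions and truth).
import math

def confusion_metrics(tp, tn, fp, fn):
    precision   = tp / (tp + fp)                                          # Eq. (3)
    accuracy    = (tp + tn) / (tp + tn + fp + fn)                         # Eq. (4)
    sensitivity = tp / (tp + fn)                                          # Eq. (5), true-positive rate
    specificity = tn / (tn + fp)                                          # Eq. (6), true-negative rate
    f1  = 2 * precision * sensitivity / (precision + sensitivity)         # Eq. (7)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))                    # Eq. (1)
    return dict(precision=precision, accuracy=accuracy, sensitivity=sensitivity,
                specificity=specificity, f1=f1, mcc=mcc)

# Hypothetical counts for a two-class PD/healthy problem (not the paper's results).
print(confusion_metrics(tp=40, tn=12, fp=4, fn=2))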
To conduct the evaluation, several authors base their work on the overall accuracy. This is an indicator of overall performance: it indicates what proportion of all reference samples were classified correctly, regardless of class size. Overall accuracy is generally expressed as a percentage, with 100% representing a perfect classification in which all reference samples are correctly classified. In this study, two classifiers, namely ANN and KNN, were applied to distinguish between healthy subjects and Parkinson's disease (PD) patients. In terms of accuracy, the ANN gives a classification rate of 96.7% and the KNN gives an accuracy of 79.3%.

II. MATERIALS AND METHODS

The current study is based on the Parkinson's disease database of the UCI Machine Learning Repository [4] and its acoustic properties.

A. Dataset: The database consists of 195 sustained vocalizations from 31 male and female subjects, of whom 23 were diagnosed with Parkinson's disease. Subjects' ages ranged from 46 to 85 years (mean 65.8, standard deviation 9.8). For each subject, an average of six vocalizations lasting from 1 to 36 seconds were recorded. For this database, the main purpose is to distinguish healthy people from those with Parkinson's disease. This is done using the status column, defined as 0 for healthy individuals and 1 for PD patients. Each column in the table corresponds to a particular voice measure, and each row corresponds to one of the 195 voice recordings of these individuals (identified by the "name" column).

B. Audio Recordings: Subjects were aged between 46 and 85 years old (mean 65.8, standard deviation 9.8). An average of six vocalizations with durations of 1-36 s were recorded for each of the subjects; see Table II for the attribute details. Figures 1 and 2 show two examples of these audio signals [5].

Fig. 1. Example of speech signal of a healthy individual
Fig. 2. Example of speech signal of a subject with PD

Vocalizations were recorded in an Industrial Acoustics Company (IAC) sound-treated booth using a head-mounted microphone (AKG C420) positioned 8 cm from the lips. The microphone was calibrated using a Class 1 sound level meter (B&K 2238) placed 30 cm from the loudspeaker. Computerized Speech Laboratory (CSL) 4300B hardware (KAY Elemetrics) was used for direct computer recording of the speech signal, sampled at 44.1 kHz with 16-bit resolution [5].

C. Feature extraction/selection: To compute the features, all speech signals are analysed using both conventional and non-standard measurement methods. Each method generates a single number for each of the 195 signals. Table II lists the measurements used to characterize this study.

TABLE II. ATTRIBUTE DETAILS FOR DIAGNOSIS OF PD PATIENTS

Measure               Description
MDVP:Fo (Hz)          Average vocal fundamental frequency
MDVP:Fhi (Hz)         Maximum vocal fundamental frequency
MDVP:Flo (Hz)         Minimum vocal fundamental frequency
MDVP:Jitter (%)       Fundamental frequency perturbation (%)
MDVP:Jitter (Abs)     Absolute jitter in microseconds
MDVP:RAP              Relative Average Perturbation
MDVP:PPQ              Five-point Period Perturbation Quotient
Jitter:DDP            Average absolute difference of differences between cycles, divided by the average period
MDVP:Shimmer          Local amplitude perturbation
MDVP:Shimmer (dB)     Local amplitude perturbation (decibels)
Shimmer:APQ3          Three-point Amplitude Perturbation Quotient
Shimmer:APQ5          Five-point Amplitude Perturbation Quotient
MDVP:APQ              Eleven-point Amplitude Perturbation Quotient
Shimmer:DDA           Average absolute difference between the amplitudes of consecutive periods
NHR                   Noise-to-Harmonics Ratio
HNR                   Harmonics-to-Noise Ratio
RPDE                  Recurrence Period Density Entropy
D2                    Correlation Dimension
DFA                   Detrended Fluctuation Analysis
spread1               Fundamental frequency variation
spread2               Fundamental frequency variation
PPE                   Pitch Period Entropy

1) Calculation of conventional measurements: These calculations were performed using the Praat software [6]. The conventional measurements are based on applying short-term autocorrelation to successive segments of the speech signal in order to determine the frequency of vocal-fold vibration (F0, or equivalently the pitch period) and the time position of the onset of each vocal-fold vibration cycle (the pitch marks) [7].
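The sketch below illustrates the core of this idea: a short-term autocorrelation pitch estimate for a single analysis frame. It is a simplified stand-in for Praat's algorithm [7], which adds windowing, candidate interpolation and octave-error handling not shown here; the 0.3 voicing threshold and the synthetic 180 Hz test tone are arbitrary illustrative choices.

# Simplified sketch of short-term autocorrelation pitch estimation for one frame.
import numpy as np

def estimate_f0_autocorr(frame, fs, f0_min=75.0, f0_max=500.0):
    """Return an F0 estimate (Hz) for one frame, or None if no clear peak is found."""
    frame = frame - np.mean(frame)
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]  # autocorrelation, lags >= 0
    ac /= ac[0] + 1e-12                                            # normalise by lag-0 energy
    lag_min = int(fs / f0_max)                                     # shortest admissible period
    lag_max = min(int(fs / f0_min), len(ac) - 1)                   # longest admissible period
    if lag_max <= lag_min:
        return None
    best_lag = lag_min + int(np.argmax(ac[lag_min:lag_max]))
    return fs / best_lag if ac[best_lag] > 0.3 else None           # crude voicing threshold

# Synthetic test: a 180 Hz sine sampled at 44.1 kHz, as in the recordings described above.
fs = 44100
t = np.arange(int(0.04 * fs)) / fs
print(estimate_f0_autocorr(np.sin(2 * np.pi * 180 * t), fs))       # ~180 Hz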
To calculate the jitter and periodicity measurements, the sequence of fundamental frequencies, one per vocal-fold cycle, is used. The successive absolute differences between the frequencies of consecutive cycles are taken and averaged over the number of cycles, then normalized by the overall mean frequency. The shimmer and amplitude-noise measurements are obtained from the sequence of maximum amplitudes of the signal within each vocal cycle; the average absolute difference of this sequence is taken as a measure of the variation between cycle amplitudes. The noise-to-harmonics (and harmonics-to-noise) ratio is obtained from a signal-to-noise estimate based on the autocorrelation of each cycle. For more information on how the traditional measures are calculated, see [6], [7] and [8].

2) Calculation of non-standard measurements: The calculation of the correlation dimension (D2) is based on time-delayed embedding of the signal to reconstruct the phase space of the nonlinear dynamical system generating the speech signal [10]. In this reconstructed phase space, the complex dynamics involved in dysphonia appear as geometrically self-similar (fractal) objects [9]. We use the implementation provided by the Time Series Analysis package (TISEAN) [11].

Recurrence Period Density Entropy (RPDE) quantifies the extent to which the dynamics in the reconstructed phase space, after time-delay embedding, can be considered strictly periodic, i.e., repeating exactly [13]. The recurrence period T is the time after which the signal returns to the same point in phase space. Deviations from periodicity in the distribution P(T) of these recurrence periods, as estimated by the entropy H, have been shown to be useful indicators of general voice disorders, because common speech pathologies reflect an impaired ability to sustain voicing through periodic vibration of the vocal folds [13]. The RPDE values (H_norm) are normalized to the range [0, 1] by dividing by the entropy of the uniform distribution.

Finally, DFA is a measure of the degree of stochastic self-similarity of the noise in the speech signal, which is produced mostly by turbulent airflow through the vocal folds [13]. For general voice disorders, it has been found that subjects with dysphonia have higher scaling exponents than healthy subjects [8]. The DFA algorithm computes the amount of amplitude variation F(L) of the speech signal over a range of time scales L; the self-similarity of the speech signal is then measured by the slope α of the straight line in the log-log plot of L versus F(L). These slope values (α_norm) are normalized to the range [0, 1] by a simple nonlinear transformation [8].
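As a concrete illustration of this procedure, the sketch below implements a basic first-order DFA and recovers the scaling exponent α as the slope of the log-log fit. It omits the additional normalization of α to [0, 1] mentioned above and is not the exact implementation used to produce the data set; the chosen scales and the white-noise test signal are illustrative assumptions.

# Sketch of first-order detrended fluctuation analysis (DFA): alpha is the slope
# of log F(L) against log L over a range of window sizes L.
import numpy as np

def dfa_alpha(x, scales=(16, 32, 64, 128, 256, 512)):
    x = np.asarray(x, dtype=float)
    y = np.cumsum(x - np.mean(x))                 # integrated, mean-removed profile
    fluct = []
    for L in scales:
        n_win = len(y) // L
        f2 = []
        for w in range(n_win):
            seg = y[w * L:(w + 1) * L]
            t = np.arange(L)
            coeffs = np.polyfit(t, seg, 1)        # local linear trend of the window
            f2.append(np.mean((seg - np.polyval(coeffs, t)) ** 2))
        fluct.append(np.sqrt(np.mean(f2)))        # F(L): RMS detrended fluctuation
    # slope of the log-log plot of L versus F(L)
    return np.polyfit(np.log(scales), np.log(fluct), 1)[0]

rng = np.random.default_rng(1)
print(dfa_alpha(rng.normal(size=8192)))           # ~0.5 for uncorrelated white noise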
D. Classifier baseline: The Artificial Neural Network (ANN) and K-Nearest Neighbors (KNN) are used as the machine learning algorithms in our research. They are among the most popular classifiers in the literature, and each has its own technique and decision procedure for separating PD patients from healthy subjects.

1) ANN classification: Biological neural networks were behind the advent of artificial neural networks. An ANN consists of a large network of interconnected neurons that exchange signals with each other. The links are weighted on the basis of past experience, which makes the network adaptive and able to learn [14]. Artificial neural networks are widely used to model complex pattern-recognition, prediction and classification problems. Figure 3 shows the general structure of an artificial neural network.

Fig. 3. General structure of an artificial neural network

There are many different optimization algorithms for training ANNs, with different characteristics in terms of memory requirements, speed and accuracy, so the algorithm must be chosen carefully according to the problem at hand. The main optimization algorithms used in the training phase of multi-layer networks are gradient descent (GD), batch backpropagation (BBP), conjugate gradient descent (CGD) and the Levenberg-Marquardt (LM) training algorithm. Our study uses the LM algorithm for the classification process. It is a nonlinear optimization algorithm, developed by Kenneth Levenberg and Donald Marquardt, that combines Newton's method with the gradient-descent property of backpropagation [15]. It is designed to work specifically with loss functions that take the form of a sum of squared errors,

f = \sum_{i=1}^{m} e_i^2    (8)

where m is the number of instances in the data set, and it works with the gradient vector and the Jacobian matrix.

2) KNN classification: K-Nearest Neighbors (KNN) is one of the most commonly used supervised machine learning algorithms and one of the simplest classification algorithms. It is a data classification method that determines which group a data point belongs to by examining the surrounding data points. Cases are classified according to the majority vote of their neighbors, the case being assigned to the most common class among its K nearest neighbors as measured by a distance function. In summary, the formal kNN classification rule is

P = \arg\min ( d_e(t, o, k) )    (9)

where t is the training data, o is the object to be classified, P is the class assigned to the new object, k is the number of nearest neighbors to consider and d_e is the Euclidean distance,

d_e(t, o, k) = \sqrt{ \sum_{i=1}^{L} (t_{i,k} - o_{i,k})^2 }    (10)

where L is the length of each data vector. To obtain the best results, a suitable value of K and an appropriate distance measure must be chosen. Historically, the optimal value of k for most data sets has been between 3 and 10. There are several ways to calculate the distance, and one method may be preferred depending on the problem at hand; however, the Euclidean distance is the most popular choice.
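A minimal, self-contained implementation of Eqs. (9) and (10) might look as follows. It illustrates the rule itself, not the configuration used in our experiments (which, as reported below, used the cosine distance and k = 1); the toy 2-D data are purely illustrative.

# Minimal k-NN following Eqs. (9)-(10): Euclidean distance to every training vector,
# then a majority vote among the k closest labels.
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    dists = np.sqrt(np.sum((X_train - x_new) ** 2, axis=1))   # Eq. (10): Euclidean distances
    nearest = np.argsort(dists)[:k]                           # indices of the k closest samples
    return Counter(y_train[nearest]).most_common(1)[0][0]     # Eq. (9): majority class

# Toy 2-D example: class 1 clustered near (1, 1), class 0 near (0, 0).
X = np.array([[0.0, 0.1], [0.1, 0.0], [1.0, 0.9], [0.9, 1.1], [1.1, 1.0]])
y = np.array([0, 0, 1, 1, 1])
print(knn_predict(X, y, np.array([0.95, 1.0]), k=3))          # -> 1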
III. RESULTS AND DISCUSSION

This study is based on the 22 acoustic features extracted from the dysphonia measurements, which are used to identify PD patients. The 22 features are listed in Table II.

A. ANN classification: MATLAB's neural network pattern recognition tool (nprtool) was used to perform the classification process. The data are divided into three parts for the artificial neural network. Validation: 5% of the data set is used to stop training once the network stops improving. Test: 25% of the data set is used to measure the performance of the network built in the training and validation steps. The remaining 70% of the data set is used for training. Figure 4 shows the view of the ANN. The regression graphs of the training, validation and test data sets are shown in Figures 5, 6 and 7, respectively, and Figure 8 shows the regression over the entire data set.

Fig. 4. The view of the ANN
Fig. 5. Training regression
Fig. 6. Validation regression
Fig. 7. Test regression
Fig. 8. Regression over the whole data set

B. KNN classification: The data were split into training (70%) and testing (30%) sets. The KNN classifier stores all of the training data; this hold-out evaluation is a widely used and reliable technique for estimating the accuracy of prediction systems and avoiding over-fitting. The number of neighbors is used to classify new examples, and the distance function determines the nearest neighbors. Using the cosine distance, the best accuracy of 79% was obtained when the number of neighbors was k = 1.

A 96.7% correct classification rate was achieved using the ANN, whereas the KNN achieved an accuracy of 79.31%. The classification results produced by the ANN are therefore the best of the two. Our results show that the ANN outperforms the KNN for classifying PD patients, which is consistent with the higher accuracy of the results obtained.
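For readers who want to reproduce a comparable experiment outside MATLAB, the sketch below is a rough Python analogue of the setup described above: it loads the UCI voice data set, applies a 70/30 split, and trains an MLP (as the ANN) and a 1-nearest-neighbor classifier with the cosine distance. The file name, scaling step, network size and solver (scikit-learn does not provide Levenberg-Marquardt training) are assumptions, so the resulting accuracies will not exactly match the 96.7% and 79.31% reported here.

# Rough Python analogue of the experiment (the authors used MATLAB's nprtool).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

df = pd.read_csv("parkinsons.data")                     # UCI data set described in Section II-A
X = df.drop(columns=["name", "status"]).values          # the 22 acoustic features
y = df["status"].values                                 # 0 = healthy, 1 = PD

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)
scaler = StandardScaler().fit(X_tr)                     # assumed preprocessing step
X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)

# Assumed network size and solver; the paper trained with Levenberg-Marquardt in MATLAB.
ann = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=0).fit(X_tr, y_tr)
knn = KNeighborsClassifier(n_neighbors=1, metric="cosine").fit(X_tr, y_tr)

print("ANN accuracy:", accuracy_score(y_te, ann.predict(X_te)))
print("KNN accuracy:", accuracy_score(y_te, knn.predict(X_te)))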
IV. CONCLUSION

In this study we compared two machine learning classifiers, the artificial neural network (ANN) and K-Nearest Neighbors (KNN), using them to distinguish PD patients from healthy subjects. The ANN algorithm yielded the best score: the experiments achieved 96.7% classification accuracy for the ANN, depending on the data set and the number of acoustic features used. MATLAB is one of the software packages widely used for this purpose. Nowadays, in the medical imaging domain, many classification techniques are applied in order to achieve the highest possible accuracy. This work can be extended to other machine learning algorithms and more diverse data sets in order to improve classifier performance and reach the highest accuracy. As a perspective, we plan to work on ANNs and on hybrid classification methods in which ANNs are aggregated with other machine learning classifiers.

REFERENCES

[1] J. P. Teixeira, D. Ferreira, and S. Carneiro, "Speech acoustic analysis - measurement of jitter and shimmer for speech diagnosis of medical conditions," in Proc. 6th Luso-Mozambican Engineering Conference, Maputo, Mozambique, 2011.
[2] I. Zwetsch, R. Fagundes, T. Russomano, and D. Scolari, "Digital signal processing in the differential diagnosis of benign larynx diseases," Porto Alegre, 2006.
[3] I. Guimarães, A Ciência e a Arte da Voz Humana. Escola Superior de Saúde de Alcoitão, 2007.
[4] M. A. Little, P. E. McSharry, E. J. Hunter, J. Spielman, and L. O. Ramig, "Suitability of dysphonia measurements for telemonitoring of Parkinson's disease," IEEE Transactions on Biomedical Engineering, 2009.
[5] M. A. Little, P. E. McSharry, E. J. Hunter, J. Spielman, and L. O. Ramig, "Suitability of dysphonia measurements for telemonitoring of Parkinson's disease," 2009.
[6] P. Boersma and D. Weenink, "Praat, a system for doing phonetics by computer," Glot Int., vol. 5, pp. 341-345, 2001.
[7] P. Boersma, "Accurate short-term analysis of the fundamental frequency and the harmonics-to-noise ratio of a sampled sound," presented at the Inst. Phonet. Sci., University of Amsterdam, Amsterdam, The Netherlands, 1993, vol. 17.
[8] KayPENTAX, "Kay Elemetrics disordered voice database, model 4337," Kay Elemetrics, Lincoln Park, NJ, 1996-2005.
[9] J. J. Jiang, Y. Zhang, and C. McGilligan, "Chaos in voice, from modeling to measurement," J. Voice, vol. 20, pp. 2-17, 2006.
[10] H. Kantz and T. Schreiber, Nonlinear Time Series Analysis, new ed. Cambridge, U.K.: Cambridge Univ. Press, 1999.
[11] R. Hegger, H. Kantz, and T. Schreiber, "Practical implementation of nonlinear time series methods: The TISEAN package," Chaos, vol. 9, pp. 413-435, 1999.
[12] M. A. Little, P. E. McSharry, S. J. Roberts, D. A. Costello, and I. M. Moroz, "Exploiting nonlinear recurrence and fractal scaling properties for voice disorder detection," Biomed. Eng. Online, vol. 6, p. 23, 2007.
[13] R. P. Dixit, "On defining aspiration," in Proc. 12th Int. Conf. Linguistics, Tokyo, Japan, 1988, pp. 606-610.
[14] S. Manik, L. M. Saini, and N. Vadera, "Counting and classification of white blood cells using artificial neural network (ANN)," in Proc. IEEE 1st International Conference on Power Electronics, Intelligent Control and Energy Systems, 2016.
[15] M. T. Hagan and M. B. Menhaj, "Training feedforward networks with the Marquardt algorithm," IEEE Transactions on Neural Networks, vol. 5, no. 6, pp. 989-993, 1994.