i Preface Welcome to the Volume 10 Number 1 of the International Journal of Design, Analysis and Tools for Integrated Circuits and Systems (IJDATICS). This volume is comprised of research papers from the International Conference on Recent Advancements in Computing in AI, Internet of Things (IoT) and Computer Engineering Technology (CICET), October 18-19, 2021, Taipei, Taiwan. CICET 2021 is hosted by The Tamkang University amid pleasant surroundings in Taipei, which is a delightful city for the conference and traveling around. CICET 2021 serves a communication platform for researchers and practitioners both from academia and industry in the areas of Computing in AI, IoT, Integrated Circuits and Systems and Computer Engineering Technology. The main target of CICET 2021 is to bring together software/hardware engineering researchers, computer scientists, practitioners and people from industry and business to exchange theories, ideas, techniques and experiences related to all aspects of CICET. Recent progress in Deep Learning (DL) has unleashed some of the promises of Artificial Intelligence (AI), moving it from the realm of toy applications to a powerful tool that can be leveraged across a wide number of industries. In recognition of this, CICET 2021 has selected Artificial Intelligence and Machine Learning (ML) as this year’s central theme. The Program Committee of CICET 2021 consists of more than 150 experts in the related fields of CICET both from academia and industry. CICET 2021 is organized by The Tamkang University, Taipei, Taiwan, and co-organized by AI University Research Centre (AI-URC) and Research Institute of Big Data Analytics (RIBDA), Xi’an Jiaotong-Liverpool University, China as well as supporting by: Swinburne University of Technology Sarawak Campus, Malaysia; Taiwanese Association for Artificial Intelligence, Taiwan; Trcuteco, Belgium; International Journal of Design, Analysis and Tools for Integrated Circuits and Systems, International DATICS Research Group. The CICET 2021 Technical Program includes 1 invited speaker and 15 oral presentations. We are beholden to all of the authors and speakers for their contributions to CICET 2021. On behalf of the program committee, we would like to welcome the delegates and their guests to CICET 2021. We hope that the delegates and guests will enjoy the conference. Professor Ka Lok Man, Xi’an Jiaotong-Liverpool University, China Professor Young B. Park, Dankook University, Korea Chairs of CICET 2021 ii Table of Contents Vol. 10, No. 1, November 2021 _____________________________________________________________________________________ Preface ………………………………………………………………………………....... i Table of Contents ……………………………………………………………………….. ii _____________________________________________________________________________________ 1. Xin Hu and Dingkun Li, Korean EFL Learners’ Suprasegmental Features, Yiwu Art School of Zhejiang, China 1 2. Yuhan Li, Ketian Wang, Yujia Zhai and Quan Zhang, Automatic Stacking System Based on ABB Robots and Digital Twin Monitoring, Xi’an Jiaotong-Liverpool University, China 7 3. Yong Shing Voon, Yunze Wu, Xinzhi Lin and Kamran Siddique, Performance Analysis of CPU, GPU and TPU for Deep Learning Applications, Xiamen University Malaysia, Malaysia 12 4. Tianyu Cai, Yi Yen Low and Kamran Siddique, A Review of Defense Solutions Against Cache Side- Channel Attacks, Xiamen University Malaysia, Malaysia 19 5. Aayush Srivastava and Ou Liu, Using Twitter Sentiment Analysis to Assess US Airline Industry, Aston University, Birmingham, UK 27 6. Dezheng Yang, Dongkun Hou and Jie Zhang, Differential Privacy in Social Network Analysis: A Systematic Literature Review, Xi’an Jiaotong-Liverpool University, China 34 7. Zhi Lin, Jie Zhang, Steven Guan and Ka Lok Man, Performance Analysis of Compressed Transmission and Storage of Blockchain Enabled Federated Learning in Internet of Vehicles, Xi’an Jiaotong-Liverpool University, China 39 8. Zitian Peng, Dongkun Hou, Jie Zhang and Zheng Zhang, A Systematic Literature Review of Privacy Protection Methods in Federated Learning: Issues, Classification, and Application, Xi’an Jiaotong- Liverpool University, China 45 9. Zheng Zhang, Dongkun Hou, Jie Zhang and Zitian Peng, A Privacy-Preserving Middleware Framework for Fog Computing-Enhanced IoT using Differential Privacy, Xi’an Jiaotong-Liverpool University, China 51 10. Justina Mandravickaitė and Tomas Krilavičius, Testing performance of NER models for Russian, Vytautas Magnus University, Lithuania 55 11. Yu-Li Wang and Shwu-Huey Yen, Bi-Directed Super Resolution Network for Real-World Images Corrupted by Unknown Degradation, Tamkang University, Taiwan 59 12. Runjie Wang and Gabriela Mogos, Visual Cryptography on mobile devices, Xi’an Jiaotong- Liverpool University, China 65 13. Rory Custance and Gabriela Mogos, Raspberry pi Firewall and Intrusion Detection System, Xi’an Jiaotong-Liverpool University, China 70 14. Shuaibu Musa Adam, Najib Hamisu Umar, Vladimir Hahanov, Ka Lok Man, Svetlana Chumachenko and Eugenia Litvinova, Wind-Diesel Hybrid Design Project: A Case Study of Masirah Island in Oman, Federal University Dutsin-Ma, Katsina, Nigeria 74 iii 15. Abubakar Ya’u Muhammad, Shamsuddeen Yusuf, Shuaibu Musa Adam, Ka Lok Man, Vladimir Hahanov, Svetlana Chumachenko and Eugenia Litvinova, Design and Simulation of Microstrip Patch Array Antenna for Wireless Communication System, Kano University of Science and Technology, KUST Kano, Nigeria 78 INTERNATIONAL JOURNAL OF DESIGN, ANALYSIS AND TOOLS FOR INTERGRATED CIRCUITS AND SYSTEMS, VOL. 10, NO. 1, NOVEMBER 2021 Analysis of Superasegmental Features among Korean EFL Students in Visualization Xin Hu1, Dingkun Li2* guidance can be demonstrated in different interactive speech Abstract— This paper delves into some aspects of software and websites (Lee, 2008). Moreover, Murphy (2003) suprasegmental features such as syllable structure, stress, and also states that research in the different fields of L2 learning and rhythm and compares them between NS and NNS. It is teaching has shown that the use of explicit instruction can have investigated in spectrograms and sound waveforms that 1. On the aspect of syllable structure in English, the onset and the coda in useful effects in learning. Linguistic research on segmental English syllable structure are characterized to have a maximum of features of phonemes gives birth to the development of 3 and 4 consonant clusters, respectively. In contrast, Korean Audiolingualism in the 1940s and 1950s. In the 1970s, we allows only 1 single consonant in onset and coda position. This encountered the advent of 'Communicative Language Teaching' cross-linguistic difference gives rise to the insertion of the neutral in the area of EFL teaching, which values 'fluency' over vowel /ɨ/ to break up the consonant clusters in English words, in 'accuracy' in order to fulfill the primary purpose of language, which the inserted vowel forms an independent wave chunk. 2. Refer to stress in English, it is universally recognized as every that is, communication. The Communicative Approach, which single English word or sentence consist of its own stress. On the emerged in the 1970s and is currently dominant in language contrary, Korean lacks stress placed at the level of the word. It teaching, is said to play a guiding role in today's pronunciation follows that Korean EFL learners tend to put an approximately teaching and research. It follows that suprasegmental features of equal prominence on every syllable in a word and to exhibit a CLT in pronunciation receive more attention from both ESL tendency to put a strong prominence particularly on the first and EFL researchers. More importantly, a more significant syllable of a word with more than 2 syllables, which is dubbed an 'initial prominence phenomenon' in this paper. 3. In relation to distinction is often detected between the two groups of learners English rhythm, English is certainly a stress-timed rhythm, but in the area of suprasegmentals than segmentals. This has led me Korean is a syllable-timed rhythm. The core differences between into research on suprasegmental features among Korean EFL the stress-timed rhythm and the syllable-timed are on the form of learners. In this paper, we delve into the aspects of 'foot', which is established when stressed and unstressed syllables syllabification, stress, and rhythm, which are most deviant from occur in relatively regular alternating patterns in sentences, led to those of English native speakers. To show such deviance, We a phenomenon of that the number of feet depends on the timing of articulation within a whole sentence. Given that suprasegmental have conducted an experiment on these aspects with one native features play a more important role than segmental ones, this speaker (NS) and non-native speaker (NNS). Based on the paper finds its significance in exhibiting these features in performance of the NNS, We argue that such suprasegmental visualization between NS and NNS, meanwhile, has a historical features play a significant role in enhancing communicative significance of laying a milestone for the future researchers in the competence and accordingly suggest some implications to EFL phonetics field. pronunciation teaching to Korean EFL learners. Index Terms—segmental feature, suprasegmental feature, syllable, stress, rhythm, foot, linking II. LITERATURE REVIEW I. INTRODUCTION A. Research on Segmentals and Suprasegmentals I t is obvious that pronunciation is one of the important As a first approximation, segmentals are widely defined as components of language skills. However, it has been “the basic inventory of distinctive sounds” which are combined neglected in EFL teaching and research, as well as treated as to form a spoken language (Morley, 1991; Celce-Murcia et al., Cinderella, as dubbed in Kelly (1969). Since then, the pendulum 1996; Florez, 1998). Anderson-Hsieh et al. (1992) compare the of pronunciation teaching and research has begun to swing in relative contributions made to intelligibility between segmental the opposite direction with the emergence of the Reform and suprasegmental features and find that the latter can earn Movement in phonetics in the 1890s. With the advent of this higher scores in enhancing intelligibility. As the two main movement, there appeared an opposite approach in components in pronunciation teaching and research, segmentals pronunciation teaching and research, which is referred to as the and suprasegmentals, have undergone fluctuations in the past 'Analytic-Linguistic Approach' (Celce-Murcia et al., 1996). In several decades, so have the language teaching methods, the Analytic-Linguistic Approach, explicit influences of ranging from Grammar-Translation (GT) in the 1800s to pronunciation pedagogy is enhanced. The clear information Communicative Language Teaching (CLT) in the 1970s. The teaching method sees its basic tenet in teaching the grammar Xin Hu is with the Department of English Education, Yiwu Art School of rules of the target language and translating it into the native Zhejiang Province, Zhejiang, China (email:andy9100910@126.com) language of learners based on these rules. Affirmatively, for the Dingkun Li is with the Department of Big Data Research Center, Karamay Central Hospital, Xinjiang, China (email:33644251@qq.com) 1 INTERNATIONAL JOURNAL OF DESIGN, ANALYSIS AND TOOLS FOR INTERGRATED CIRCUITS AND SYSTEMS, VOL. 10, NO. 1, NOVEMBER 2021 sake of accurate communication in terms of grammar and one NNS in comparison with one NS. spelling, a segmental feature, the Grammar-Translation method still contributes to the underlying skills and exercises for EFL III. SUPRASEGMENTAL PERFORMANCE IN COMPARISON learners (Fish, 2003). Deeply influenced by the Reform In this chapter, we compare and contrast between a native Movement, Audiolingualism appeared in the 1940s and had speaker (NS) and a non-native speaker (NNS) in their been dominant until the 1950s. This teaching method highly performance of syllabification, stress, and rhythm. The NS is a valued 'accuracy' in pronunciation. It follows that most research female who comes from the US, who is considered to be and teaching was implemented with a main focus on segmental authentic in the production of these suprasegmental features. features such as individual phonemes in words (Morley, 1991). The NNS is a 3rd year middle school student, a typical EFL In the 1970s, we witness a great turning point in language learner who performs these features uniquely characteristic of teaching with the advent of Communicative Language Teaching Korean phonetics and phonology. (CLT). CLT places its primary goal on 'communication' rather than the 'accurate use' of individual phonemes and words. The A. Vowel Insertion in Syllabification method stresses the need to help students achieve a certain level Restriction on Onset and Coda in Korean of pronunciation skills for fluent communication, dubbed as a As illustrated in syllable structure (1), a nucleus is surrounded 'threshold level' in the literature above which communication is by a consonant cluster of onset and coda, which consists of one not threatened by misunderstanding caused by incorrect or more consonants across languages. Syllables in Korean, pronunciation (Celce-Murica, Brinton & Goodwin, 1996). unlike in English, are characterized to strictly have 1 consonant in onset and coda position.1 This strict restriction on the number B. Suprasegmentals and Intelligibility of onset and coda is seen to impose a huge burden on Korean Pennington & Richards (1986) point out that the teaching of EFL learners' syllabifying English words with more than 2 phonetics should include segmental and suprasegmental consonants in the positions. Let's consider the syllabification. features, pronunciation habits, and intelligibility. Functional (1) a. Christmas /krɪsməs/ → /krɪsɨməs/ intelligibility here can be defined as an ability to make oneself b. spring /sprɪŋ/ → /sɨpɨrɪŋ/ relatively easily understood. Celce-Murcia et al. (1996) also (2) a. ghost /gəʊst/ → /gəʊsɨtɨ/ take intelligible pronunciation as a model and realistic goal of b. risk /rɪsk/ → /rɪsɨkɨ/ teaching pronunciation. Two dichotomic terms of intelligibility In (1a), the word Christmas has a consonant cluster /kr/ in can be said to be accuracy and fluency. The two concepts are onset, which is not allowed in Korean syllable structure. best represented by Audio-Lingual Method (ALM) in the Accordingly, the neutral vowel /ɨ/ is inserted to break this 1940s-1950s and CLT, respectively. Accuracy was highly cluster.2 In (1b), the neutral vowel is likewise inserted after /s/ valued in ALM, and thereby it can be achieved through and /p/ in order to break the consonant cluster /spr/, virtually intensive and explicit pronunciation teaching. On the other allowing only the single consonant /r/ to become the onset. In hand, fluency is one of the goals of CLT. In other words, the contrast, the 2 words in (2a,b), ghost and risk, contain consonant communicative target for EFL learners is universally clusters, /st/ and /sk/, in coda, respectively. The principle of recognized as intelligibility. Korean syllable structure, mentioned above, holds here as well, resulting in the insertion of the neutral vowel. It is generally noted that the acquisition of pronunciation is This cross-linguistic difference yields a difference in usually associated with pronunciation models. There are, at articulation, as illustrated below. large, 5 standard models of pronunciation such as Received Pronunciation (RP), General American (GA), Canadian, Australian, and Indian. The first two models have been recognized to be the most dominant ones of all. The pronunciation model that We adopt here in this paper is GA mentioned above, based upon which We analyze the aspects of Fig.1. Syllabification of phoned "He phoned me on Friday, too." one Korean EFL learner in comparison with one native speaker, Figure 1 above is a visual representation of the sentence He focusing our main attention on intelligibility at which the phoned me on Friday, too. A close look at the waveform of the threshold level is set. In fact, this course is complicated in that word phoned /foUnd/, which has a consonant cluster /nd/ in there is little agreement as to what phonological aspects threaten coda, reveals a difference in performance between an NS and an an EFL leaner’s intelligibility and therefore there is relatively NNS. That is, the NNS is shown to have inserted the Korean little information that guides the students to decide what aspects neutral vowel /ɨ/ after /d/, pointed out by the arrow (↑), to form of superasegmental features can cause unintelligibility (Munro an independent syllable /dɨ/ along with the coda consonant /d/, and Derwing, 1995). In connection with this problem, Kenworthy (1987) argues that intelligibility is the most sensible goal and sets intelligibility as being understood by a listener at a 1 Syllables in English are observed to maximally have 3 and 4 consonants in onset and coda, given time in a given situation. Consequently, we support respectively, as in spring /sprɪŋ/ and contempts /kən’tempts/. 2 The neutral vowel /ɨ/, analogous to /ə/ in English, is uniquely found in Korean and constitutes a Kenworthy’s argument so that we implement an experiment on nucleus for any consonant excluded from onset as in spring /sprɪŋ/ → /sɨpɨrɪŋ/ (CCCVCC → those characteristics of suprasegmental features performed by CVCVCV). 2 INTERNATIONAL JOURNAL OF DESIGN, ANALYSIS AND TOOLS FOR INTERGRATED CIRCUITS AND SYSTEMS, VOL. 10, NO. 1, NOVEMBER 2021 resulting in two syllables, /foUn•dɨ/, consequently breaking the first, second, and third syllable, respectively. He asked 29 consonant cluster /nd/ in coda. Thus, the vowel-insertion Korean college students to record these words for the analysis of phenomena are often observed among Korean EFL leaners their stress pattern. His findings are as follows below, where E because Korean EFL learners are unconscious of vowel refers to ‘evenly distributed over every syllable’ and N/A to 'not insertion when they produce and perceive English speech. applicable'. Linking Table 1. Stress Pattern (N=29) In Korean orthography, syllables in a word stand separate Stress Position DIfferent (1st) poLITE (2nd) interRUPT (3rd) from and independent of each other, so that there exists a 1st 24 11 10 clear-cut boundary between syllables. This in part gives rise to a 2nd 0 5 0 difference in performance of linking between an NS and an 3rd 0 N/A 5 NNS. E 5 13 14 Accuracy Rate (%): 82.7 17.2 17.2 We note that the average accuracy rate of 3 words (82.7%, 17.2%, 17.2%) is 39.03%, which is quite low as expected. What is noteworthy here is that the rates of the first syllable and the evenly distributed stress (E) pattern are very high, 44.82% and 48.27%, respectively. Putting aside the first syllable pattern, we pay attention to the Fig. 2. Linking of late night The phrase in question consists of two words, late /leɪt/ and E pattern here and introduce the two different stress patterns of night /naɪt/, each of which contains one syllable. We note that marvelous by NS and NNS. the coda /t/ is followed by the onset /n/ in the phrase /leɪt•naɪt/. Now the NS is seen to have linked these 2 words (syllables) with the coda /t/ unreleased, while the NNS is seen to have inserted /ɨ/ after /t/ to form an independent syllable, as manifested by the arrow (↑), resulting in three syllables as in /leɪ•tɨ• naɪt/.3 Vowel Fig.3. Stress Pattern of marvelous insertion is frequently observed in the process of linking among Figure 3 above is a visual illustration of the word marvelous Korean EFL learners and is discovered more noticeably in /'mɑr:vələs/ with the primary stress on the first syllable /mar/, comparison with English native speakers. which is more prominent than the neighboring unstressed syllables. It is shown that the waveform, spectrogram, and pitch B. Stress and Word-Level Prominence contour, displayed in 3 layers, exhibit a different pattern between NS and NNS. First of all, the waveform of NS is rather Equal Prominence continuous without interruption among 3 syllables. The Stress plays a vital role in English because it is phonemic as spectrogram of NS manifests that he places stress on the first illustrated below. syllable as the first chunk appears to be thick. In contrast, NNS (3) a: reCORD /rɪˈkɔːd/ verb → REcord is seen to have inserted the neutral vowel /ɨ/ after the fricative /s/ /ˈrekɔːd/ noun and to display interrupted syllable boundaries as manifested by b: rePORT /rɪˈpɔːt/ verb → REport 4 distinct chunks of waveforms. The spectrogram performed by /ˈrɪpɔːt/ noun NNS shows 'evenly distributed chunks' across three syllables. c: adDRSS /əˈdres/ verb → ADdress The visual representation of stress above exhibits the /ˈædres/ noun characteristics of Korean EFL learners' stress pattern, in which d: exPORT /ɪkˈspɔːt/ verb → EXport /ˈekspɔːt/ noun every syllable in a word appears to approximately receive equal As seen in (3a,b,c,d), which the position of stress on each pair prominence. of words changes their meaning. This phonemic feature is Initial Prominence Phenomenon lexical in nature, so the stress position of every lexical item is Lee & Rhee (2018) assert that Korean EFL learners are predetermined. On the contrary, Korean lacks stress and never observed to apply Korean phonological rules to their speaking shows such effect; thus, stress shift does not cause the change of of English, in which the first syllable of (Korean) words is word meaning. It follows that Korean EFL learners have stressed. Kang (2013) confirms this fact by saying that Korean difficulties in placing stress on English words. Kang (2013) EFL learners tend to place a primary stress on the first syllable conducted an experiment of pronouncing word stress by Korean in English words, the placement of which he calls an 'initial EFL learners in order to find their stress pattern of three words, prominence phenomenon'. A visualization of this stress pattern different, polite, and interrupt. We note that stress falls on the is shown below for expository purposes. 3 Note that the visual waveform pointed by the arrow (↑) in Figure 2 does not necessarily manifest the insertion of the neutral vowel /ɨ/. However, the sound recording of this phrase clearly shows the insertion of the vowel of our concern. 3 INTERNATIONAL JOURNAL OF DESIGN, ANALYSIS AND TOOLS FOR INTERGRATED CIRCUITS AND SYSTEMS, VOL. 10, NO. 1, NOVEMBER 2021 Fig.4. Stress Pattern of photography Fig.7. The Rhythm Pattern of 5 Syllables by NNS "The birds eat the worms." As shown above, the second chunk of photography /fəˈtɑɡrəfi/ is much thicker and darker in NS’s waveform pattern, which represents that the stress is put on the second syllable of this word. In addition, the second chunk of photography in the NS’s spectrogram forms a dark shade Fig.8. The Rhythm Pattern of 6 Syllables by NNS "The birds will eat the column, which exhibits the same effect as well. In contrast, the worms." first chunk of the word for NNS indicated by the arrow (↑), is in Figure 5 displays the waveform of sentence (4a) Birds eat a more darken shade than the other 3 chunks. Clearly, this worms. We obviously note three distinct chunks of waveforms, constitutes an exemplary evidence for the 'initial prominence each one of which represents birds, eat, and worms, phenomenon', which is generally observed among Korean EFL respectively. The sentence in (4b), The birds eat worms, learners. consisting of 4 syllables (words), is displayed in visualization in Figure 6. It is obvious that this sentence is represented visually C. Rhythm and Sentence-Level Prominence as four distinct chunks of waveforms. Interestingly, however, the third word (syllable), eat, is represented as the smallest Syllable-Timed Rhythm waveform chunk despite it being a content word. The other 2 Korean is typologically regarded as a language of figures, Figure 7 and 8, exhibit the two sentences in (4c) and syllable-timed rhythm, whereas English as a language of (4d), respectively. As displayed in these figures, the number of stress-timed rhythm. In syllable-timed rhythm, first of all, each words (syllables) in both sentences is clearly represented in the syllable in a word is said to receive an approximately equal form of wave chunks, 5 for (4c) and 6 for (4d), respectively. prominence, and its timing thus depends on the number of Even in these rhythm patterns, the content word eat is syllables in a sentence. Let us look at the paradigm of sentences represented as the weakest in terms of prominence and the below with the different number of words (syllables).4 smallest in terms of waveform. Looking at the 4 rhythm (4) a. Birds eat worms. (3 syllables) patterns, displayed in Figure 5 through 8, we can conclusively b. The birds eat worms. (4 syllables) say that the NNS has failed to differentiate between content c. The birds eat the worms. (5 syllables) words and function words, thereby placing an approximately d. The birds will eat the worms. (6 syllables) equal prominence on each word (syllable). This in turn leads to According to the basic tenet of syllable-timed rhythm, different timing in each sentence, which will be illustrated illustrated above, we expect that all the sentences will take shortly. different time for articulation with (4a) the shortest and (4d) the In Figure 5 through 8, we have indicated the starting point longest. This is due to Korean EFL learners' articulatory trait and the end point of articulation for the four sentences in (4) by that they put an approximately equal prominence on every the arrows (↓) so that we can calculate the timing of each syllable, which in turn renders it to take equal time. It follows sentence. The timing is thus calculated by subtracting the that the timing for a whole sentence is proportionate to the starting point from the end point. In Figure 5 for (4a), for number of words (syllables) in it. The waveforms of (4a-d) are instance, articulation starts at the point of 2.9 and ends at the visualized as below for further discussion. point of 4.2, resulting in the timing of 1.3 seconds. Now we end up with 1.9, 2.0, and 2.1 for (4b), (4c), and (4d), respectively. It is obvious that the timing increases according to the number of words (syllables) of each sentence, consequently conforming to the basic tenet of syllable-timed rhythm. Fig.5. The Rhythm Pattern of 3 Syllables by NNS "Birds eat worms." D. Foot and Rhythm A foot is grouped by syllables, which contains one strong-stressed syllable and multiple weak and unstressed syllables as well as they are combined together to form metrical feet in English, along with strong-stressed syllables occurring at Fig.6. The Rhythm Pattern of 4 Syllables by NNS "The birds eat worms." regular intervals (Celce-Murcia et al., 2010). The length of the utterance will greatly count on the number of stressed-syllables. Accordingly, the metrical feet, the term of which has been widely used by English phonologist who has been good expertise in metrical phonology since the 1970s (Hayes, 1980; 4 In this paper, I use the terms word and syllable interchangeably because the distinction of them Liberman & Prince, 1977) and prosodic phonology (Nespor & is blurred when words in a sentence are syllabified. 4 INTERNATIONAL JOURNAL OF DESIGN, ANALYSIS AND TOOLS FOR INTERGRATED CIRCUITS AND SYSTEMS, VOL. 10, NO. 1, NOVEMBER 2021 Vogal 1986; Selkirk, 1980, 1981). Especially in English, the stress-timed rhythm typically consists of regular and patterned feet of stressed and unstressed syllables. In this pattern, the stressed syllable of a content word and the unstressed syllables of (a) function word(s) which follow(s) it are combined to be pronounced as a metrical unit, called 'foot' in English. Fig.12. The Rhythm Pattern of 3 Feet / 6 Syllables by NS "The birds will eat the worms." Therefore, each foot is assumed to take an approximately equal Figure 9 shows the waveform of sentence (5a) Birds eat worms. time no matter how many syllables it contains. The rhythmic We clearly note three dark chunks of waveforms. Each of them pattern of the sentences in (4) can be represented as (5), in represents birds, eat, and worms, respectively. The rhythm which content words are in bold and the vertical line (|) pattern of sentence (5b), The birds eat worms, with 4 syllables indicates a foot boundary. (words) is also illustrated and visualized in 3 distinct chunks in (5) a. Birds |eat |worms. (3 feet / 3 syllables) Figure 10. What concerns us here is the 1st chunk, the birds, b. The birds |eat |worms. (3 feet / 4 syllables) which seemingly looks like 2 chunks. A sharp contrast arises c. The birds |eat the |worms. (3 feet / 5 syllables) here between the chunk of the birds by NNS in Figure 6 and that d. The birds will |eat the |worms. (3 feet / 6 syllables) by NS in Figure 10. In the former, there exists a rather long We note above that each sentence in (5) contains a different interval between the and birds. This indicates that the NNS has number of syllables, but it unanimously has 3 feet regardless of articulated the two syllables (words) as two separate feet with an the number of syllables in it. Now the syllables (words) in each equal prominence on each of them. In the latter, however, we section divided by boundaries are fused into a single unit called notice a short interval between the and birds, which enables us 'foot'. Sentence (5d), for example, has three feet with emphasis to say that the two are completely fused on the content words in bold, birds, eat, and worms, into a single metrical unit called foot. respectively. The other syllables (words), the, will, and the, are The other 2 figures, Figure 13 and 14, exhibit the two attached to the content words, birds and eat, consequently sentences in (5c) and (5d), respectively. As displayed in these maintaining 3 feet likewise as in (5a-c). Now a stress-timed figures, the number of words (syllables) in both sentences is rhythm is established in (5a-d) by uttering 3 feet at regular clearly represented in the form of wave chunks, 3 for (5c) and 3 intervals with each stressed syllable given more prominence for (18d), respectively. Even in these rhythm patterns, the than the other unstressed syllables. As each foot is produced at a function words, the and will, are shown to be attenuated in their regular interval, all the sentences in (5) are presumed to take waveforms adjacent to the emphasized syllables. Looking at the generally the same amount of time because of the same number 4 rhythmic patterns, illustrated in Figure 9 through 12, we can of feet regardless of a different number of syllables in each arguably say that the NS has created 3 metrical feet in each sentence. It follows that the timing of 4 sentences in (5) is sentence in (5) by placing an approximately equal prominence roughly the same if they are articulated according to on each metrical foot. This in turn leads to equal timing in each stress-timed rhythm. The rhythm patterns of the above 4 sentence according to the number of metrical feet, which will be sentences are displayed below. illustrated shortly. In Figure 9 through 12, we have indicated the starting point and the end point of articulation for the four sentences in (5) by the arrows (↓) so that we can calculate the timing of each sentence. The timing is thus calculated by subtracting the Fig.9. The Rhythm Pattern of 3 Feet / 3 Syllables by NS "Birds eat worms." starting point from the end point. In Figure 11 for (5a), for "Thebirdseat worms." instance, articulation starts at the point of 1.3 and ends at the point of 2.75, resulting in the timing of 1.45 seconds. Now we likewise end up with 1.45 seconds all for (5b), (5c), and (5d). It is obvious, therefore, that the equal timing for 3 metrical feet of each sentence is coherent to the basic tenet of stress-timed rhythm. Fig.10. The Rhythm Pattern of 3 Feet / 4 Syllables by NS IV. CONCLUSION The findings of this study reveal that there exist huge differences in suprasegmental features between English and Korean in terms of syllable structure, stress, and rhythm. As for syllable structure, Korean EFL learners arise a process of Fig.11. The Rhythm Pattern of 3 Feet / 5 Syllables by NS resyllabification in the experiement. That is, Korean EFL "The birds eat the worms." learners are often seen to insert a neutral vowel /ɨ/ in order to break consonant clusters. This process is also shown to take place even with single stop and fricative consonants in coda 5 INTERNATIONAL JOURNAL OF DESIGN, ANALYSIS AND TOOLS FOR INTERGRATED CIRCUITS AND SYSTEMS, VOL. 10, NO. 1, NOVEMBER 2021 position such as /p,b,t,d,k,g,s,z/, as visualized in Figure 2, in REFERENCES which the insertion of the vowel in the phrase late night, as a [1] Anderson-Hsieh, J., Johnson, R., & Koehler, K. 1992. The relationship linking device, forms an independent wave chunk indicated by between native speaker judgements of nonnative pronunciation and deviance in segmentals, prosody, and syllable structure. Language the arrow. According to stress, it represents two characteristics. Learning, 42: 529-555. One is that the lack of stress among Korean EFL Learners, [2] Avery, P. & Ehrlich, S. 1992. Teaching American English naturally leads to placing an equal prominence on every syllable Pronunciation. Oxford: Oxford University Press. in a word, as displayed visually in Figure 3, in which 4 syllables [3] Celce-Murcia, M. & McIntosh, L. 1991. Teaching English as a Second or Foreign Language. Cambridge, MA: New Burry House Publishers. of the word marvelous, articulated by NNS, are stressed [4] Celce-Murcia, M., Briton, D. M., Goodwin, J. M., & Griner, B. 2010. approximately equally. The other characterization is what we Teaching Pronunciation: A Course Book and Reference Guide (2nd ed.). call 'initial prominence phenomenon', in which they tend to put a Cambridge: Cambridge University Press. [5] Chela-Flores, B. 1997. Rhythmic patterns as basic units in pronunciation primary stress on the initial syllable of multi-syllable words, as teaching. ONOMAZEIN, 2: 111-134. visualized in Figure 4, in which the first wave chunk of [6] Celce-Murcia, M., Brinton, D. M., & Goodwin, J. M. 1996. Teaching photography appears thicker and darker than the others. In Pronunciation: A Reference for Teachers of English to Speakers of Other Language. Cambridge: Cambridge University Press. correlation with rhythm, Korean is treated to belong to [7] Fant, Gunnar. 1970. Acoustic Theory of Speech Production with languages of syllable-timed rhythm, which is defined as a Calculations Based on X-ray Studies of Russian Articulations. Boston: rhythm in which the timing is determined by the number of Mouton. syllables (words) in a sentence. Given that Korean exhibits an [8] Florez, M. C. 1998. Improving adult ESL learners’ pronunciation skills. Phonetics, 25(2): 169-186. evenly distributed stress pattern among syllables in words, it is [9] Fish, S. E. 2003. Is There a Text in This Class? - The Authority of obvious that the articulation timing of a sentence is proportional Interpretative Communities (12th ed.). Boston: Harvard University Press. to the number of syllables (words) contained in a sentence. This [10] Fraser, H. & Perth, H. F. 1999. ESL pronunciation teaching: Could it be more effective? Australian Language Matters, 7(4): 7-8. rhythmic pattern is exactly shown visually in Figures 5, 6, 7, and [11] Fries, Charles C. 1945. Teaching and Learning English as a Foreign 8, which contain 3, 4, 5, and 6 syllables (words), respectively. Language. Ann Arbor, MI: The University of Michigan Press. Accordingly, the timing of these 4 sentences is differentiated [12] Hammer, J. 2007. How to Teach English. Harlow: Pearson Education. [13] Hoard, J. E. 1971. Aspiration, tenseness and syllabification in English. and visually shown to be 1.3, 1.9, 2.0, and 2.1 seconds, Language, 47: 133-140. respectively, consequently conforming to the basic tenet of [14] Kang, Seung-Man. 2013. Stress, rhythm, and intelligibility in English. syllable-timed rhythm. Ultimately, this paper attempts to exhibit The Jungang Journal of English Language and Literature, 55(3): 53-76. [15] Kelly, L. G. 1969. Centuries of Language Teaching. Rowley, MA: different suprasegmental features in phonology between Newbury House. English and Korean, and it aims at applying the experimental [16] Kenworthy, J. 1987. Teaching English Pronunciation. London: analysis to enhance the level of intelligibility and intercultural Longman. communication for Korean EFL learners in their foreign [17] Lee, J. K. 2008. An intonational assessment of English communicative competence of Korean college students with the curriculum. Modern language teaching and learning. Given that English English Education, 9(1): 256-282. pronunciation education in Korean EFL settings should swing [18] Lee, Jeong-Hwa & Rhee, Seok-Chae. 2018. Acoustic analysis of Korean its pendulum to the focus on the teaching of suprasegmental trisyllabic words produced by English and Korean speakers. Phonetics Speech Science, 10(2): 1-6. features. This paper finds its contribution in the English [19] Munro, M. J. & Derwing, T. M. 1995. Foreign accent, comprehensibility pedagogical approach to pronunciation teaching to Korean EFL and intelligibility in the speech of second language learners. Language learners by showing the visualized patterns of such features in Learning, 45: 73-97. [20] Morley, J. 1991. The pronunciation component in teaching English to comparison between a native and non-native speaker of English. speakers of other languages. TESOL Quarterly, 25(3): 481-520. [21] Murphy, J. 2003. Pronunciation. In D. Nunan (ed.), Practical English ACKNOWLEDGMENT Language Teaching. pp. 111-128. Boston: McGraw-Hill. [22] Pennington, M. & Richards, J. 1986. Pronunciation revisited. TESOL We would like to express my deepest gratitude to the many Quarterly, 21(2): 207-225. people who have supported and helped me throughout the [23] Pulgram, E. 1970. Syllable, Word, Nexus, Cursus. The Hague: Mouton. completion of this thesis. First, we sincerely thank my thesis [24] Selkirk, E. O. 1980. The role of prosodic categories in English word stress. Linguistic Inquiry, 11(3): 563-605. advisor, Dr. Dingkun Li in Computer Science Department and [25] Selkirk, E. O. 1981. Epenthesis and degenerate sllable in Cairene Arabic. Dr. Seung-Man Kang in English Education Department at MIT Working Papers in Lingusitcs, 3(2): 209-232. Chungbuk National University. They taught me the ins and outs [26] Xu, L. H. 1991. Developing student confidence in speaking English. Modern English Teacher, 3(3): 74-89. of research method and cultivated my interest in phonetics analysis which we could benefit from considerably. Besides them, we likewise thank my dear friend and classmate, Breauna Oldham, a female American teacher who served as the ESL language teacher at the International Service Center of Chungbuk National University, and Dong-yong Kim, a Korean middle school English teacher who is my classmate in the graduate school of English Education Department, arranged the experiment project with his students from Chengju Nam middle school for the phonetic recordings. 6 INTERNATIONAL JOURNAL OF DESIGN, ANALYSIS AND TOOLS FOR INTERGRATED CIRCUITS AND SYSTEMS, VOL. 10, NO. 1, NOVEMBER 2021 Performance Analysis of CPU, GPU and TPU for Deep Learning Applications Yong Shing Voon*, Yunze Wu*, Xinzhi Lin* and Kamran Siddique Abstract—Deep Neural Networks have gained popularity due be used via cloud services to perform various research and AI to their superiority in performance in various applications. Hence, applications. domain-specific architectures have been introduced to speed up the model training process. Since the launch of TPUs by Google, As TPUs were fairly recently introduced, most of the there were few literatures that examines the performance of TPUs comparative studies on the performance analysis of processing compared to existing processing units such as the CPU and GPU units on deep learning applications in current literature only in deep learning applications. The recent inclusion of the PyTorch discusses the Central Processing Unit (CPU) and GPU [1], [2], framework to use Google Colaboratory’s resources also provided [3]. We chose to compare the performance using the an opportunity for us to build a PyTorch-based Colab notebook Convolutional Neural Network (CNN) with aims to follow up that can be used for further study in this area. In this study, we evaluated the performance of the Intel Xeon CPU, Nvidia T4 GPU on the experiments done in [2] and [3]. According to [4], the and TPU v2-8 by using the Convolutional Neural Network (CNN) use of CNN incurred less noticeable amounts of multi-chip model on the CIFAR-10 and MNIST datasets. Our study also overhead compared to other neural networks. In this study, we considers the accuracy and convergence of the model while examined and compared the performances of the CPU, GPU testing, which were implemented by data augmentation and batch and TPU by implementing image classification models on the normalization techniques in the VGG16 model trained from CIFAR-10 and MNIST datasets. A VGG16 model with batch scratch. We achieved a model accuracy of 86% and 96% for the normalization was trained from scratch on the Intel Xeon CPU, CIFAR-10 and MNIST datasets respectively. During the Nvidia T4 GPU and TPU v2 respectively. All experiments were experiments, we observed that the GPU performed 25x faster than performed using Google’s cloud server. However, cloud and the CPU; meanwhile the TPU only achieved a 12x speedup internet overhead were not considered in this study. compared to the CPU as there were some issues faced. We Contributions – We first provided a brief comparison of the proposed some issues and possible solutions in this paper for CPU, GPU and TPU architecture in a theoretical manner. Then, further study. we compared the performance of the Intel Xeon CPU, Nvidia Index Terms—CPU, GPU, TPU, Deep Learning, Convolutional T4 GPU and TPU v2 via Google Colaboratory (Colab) in the Neural Network, Image Classification, Google Colab, CIFAR-10, application of image classification with CNN. We produced a MNIST, PyTorch reusable Pytorch-based framework with Colab notebook for future parameter tuning and troubleshooting with Colab’s Input Pipeline Analyzer to improve the results of our study. I. INTRODUCTION In recent years, deep learning has become more popular due II. RESEARCH METHODOLOGY to increased data volumes and advancement in computing A. Hardware Specifications and Configuration technologies. Deep Neural Networks (DNN) are preferred over classical machine learning methods due to their superiority in As we intend to evaluate the performance of CPU, GPU and performance when trained with massive datasets. They can be TPU, two personal computers were used for testing. The first used in many artificial intelligence (AI) applications such as one uses Intel Core i7, 12GB RAM and NVIDIA GeForce GTX image recognition, speech recognition and natural language 1050 Ti, while the second one uses Intel Core i3, 16GB RAM processing. However, deep learning does require a fair bit of and does not have a dedicated GPU. Due to the inconsistencies computing resources to run efficiently. Although they may be and insufficiencies in computer specifications, we utilized more powerful than classical machine learning approaches, Colab, a cloud-based service to perform all the tests. deep learning models often take a longer time to train. Hence, As Colab randomly assigns any available hardware to us, domain-specific processors such as Graphics Processing Units the hardware specifications were not certain in the beginning of (GPUs) have been introduced in an effort to speed up the the study. The finalized hardware specifications used in this computational process. More recently, Google launched the experiment is shown in Table I. It is also important to note that Tensor Processing Unit (TPU), which claims to perform better Colab has a maximum execution time of 12 hours and a than many other AI accelerators that are available today. At the maximum idle time of 90 minutes [5]. moment, the TPUs i.e.: Cloud TPU v2 and Cloud TPU v3 can TABLE I. HARDWARE SPECIFICATIONS *Yong Shing Voon, Yunze Wu and Xinzhi Lin are co-first authors. Parameter CPU GPU TPU All authors are with the Department of Information and Communication Model Name Intel Xeon Nvidia T4 v2-8 Technology, Xiamen University Malaysia, Sepang, Selangor, Malaysia (email: CST1809818@xmu.edu.my, CST1809238@xmu.edu.my, Frequency 2.30GHz 2.30GHz * CST1809216@xmu.edu.my, tmkamran@gmail.com). 12 INTERNATIONAL JOURNAL OF DESIGN, ANALYSIS AND TOOLS FOR INTERGRATED CIRCUITS AND SYSTEMS, VOL. 10, NO. 1, NOVEMBER 2021 Parameter CPU GPU TPU our experiments, we defined a metric as shown in the formula Model Name Intel Xeon Nvidia T4 v2-8 below. No. of CPU 2 2 8 Cores Memory - 16GB 64GiB 4.1 TFLOPS/8.1 Performance - * TFLOPS where T is total running time; e is epoch; D is dataset size Memory and B is batch size. - 0.82GHz/1.59GHz * Clock Available D. Model Training 12GB 12GB * RAM In this study, we considered the model’s accuracy and Disk Space 25GB 358GB * convergence aside from testing the performances of the * Information not available different architectures. We are of the opinion that if the B. Datasets Used processing units were to complete a certain task, then they should perform it well enough. Hence, we set the target that the 1) CIFAR-10 [6]: The CIFAR-10 dataset is a well-known models trained should achieve an accuracy of 85% or above. dataset that consists of 60,000 32×32 coloured images of In order to accomplish this objective, we performed a few nature. There are a total of 10 classes, with 6 thousand images techniques to optimize the initial VGG16 model. Firstly, we per class. Some of the classes include airplane, car, horse and performed data augmentation to introduce randomness to the ship. In total, there are 50,000 training images and 10,000 test original dataset: images. Random crop: We set the padding to be 4 and random crop to be the original size of the image. 2) MNIST [7]: The MNIST dataset is a large collection of Random vertical and horizontal flip: The images will be handwritten digits. It consists of 60,000 training images and randomly flipped in the vertical or horizontal direction. 10,000 testing images, all of which are 28×28 greyscale images Then, we performed batch normalization to standardize the of the digits 0 to 9. inputs to the layer of the network. We performed this step to C. Evaluation Methods accelerate the training process, so that our model converges We trained a VGG16 model from scratch to perform the faster, thereby reducing the epoch counts required. It also image classification tasks for both datasets. We used the reduces generalization error, as we were able to use the same PyTorch framework for our model building. The model for both datasets. hyperparameters used for the VGG16 model and for the The epoch counts were also carefully selected by first respective datasets are shown in Table II. implementing the VGG16 model on our personal laptops. We obtained an accuracy of about 85% at epoch 116 for the CIFAR- TABLE II. HYPERPARAMETER SETTINGS 10 dataset on our laptop with GPU. For the MNIST dataset, only 5 epochs were required for the model to achieve an accuracy of Hyperparameter CIFAR-10 MNIST about 96%. Lastly, the batch sizes and learning rates were tuned Loss Function Cross Entropy Cross Entropy by trial and error to achieve good model accuracies. Learning Rate 0.001 0.01 For the TPU, we used the PyTorch XLA package which enables us to execute PyTorch code in Google Colab’s TPU. Optimizer ADAM SGD Otherwise, the execution was not possible. Training Images 50,000 60,000 III. LITERATURE REVIEW Test Images 10,000 10,000 Several studies have been carried out with the aim to Batch Size 16 to 16384 16/32/48/64 compare and analyze the performance of different processing units on different deep learning applications. Different Epoch Count 5/120 5 perspectives and methodologies were studied in the related works. Buber and Diri [1] studied the performances of the Intel In the first part of the experiment, we used batch sizes of 16, Xeon Gold 6126 CPU and the Tesla K80 GPU by executing a 32, 48 and 64 to train both datasets in order to examine the Recurrent Neural Network (RNN) model to classify English effects of batch size. In the second part, we further examined web pages. In this study, the use of cloud-based CPU and GPU the performance of the GPU and TPU by training the model have been applied, so the tests were only run for 3 epochs due to with the CIFAR-10 dataset for 120 epochs. Lastly, we tested the high costs. The experiment has four different test cases, whereby GPU and TPU by using extremely large batch sizes, i.e.: 1024, the authors altered one of the following four variables mentioned 2048, 4096, 8192 and 16,384. For the final test, we did not below: consider the accuracy of the model. The epoch counts for each set of the experiment was kept CPU specification constant to provide a fair comparison for the different Batch size architectures tested. To standardize and evaluate the output of Hidden layer size Whether transfer learning has been used 13 INTERNATIONAL JOURNAL OF DESIGN, ANALYSIS AND TOOLS FOR INTERGRATED CIRCUITS AND SYSTEMS, VOL. 10, NO. 1, NOVEMBER 2021 From this study, it was observed that the training time In [4], a study was performed to benchmark the decreased after doubling core frequency and increasing number performances of the CPU, GPU and TPU for deep learning of cores respectively. Besides, parallelization power is increased applications. The ParaDnn deep learning benchmark model was when the batch size increases. The lead of the GPU in training proposed by the authors, which consists of end-to-end Fully time compared to the CPU becomes more obvious too, due to connected models (FCs), RNNs and CNNs. In addition to the greatly parallel architecture of the GPU. On the other hand, ParaDnn, the authors also included two more workloads from the authors observed that the increased number of hidden layers the MLPerf [12] benchmark suite constructed with the does not increase accuracies, despite increased training time due TensorFlow framework to perform the experiment. Tensorflow to more learning parameters. In the case of transfer learning, the was used as it was the only framework that was supported by difference in working time is not obvious with or without it, Google Colab at that time. In their findings, the authors however model accuracies were observed to be slightly identified the architectural limitations of the TPU and proposed increased. The authors also mentioned that the model’s accuracy some ideas to improve the TPU’s performance. Some of the can be enhanced with more epochs being executed. Generally, issues mentioned were multi-chip overhead and host-device the GPU’s performance was 4-5 times faster than the CPU in all balance. The authors stated that CNN incurred less noticeable test cases. amounts of multi-chip overhead as compared to the other models In another study, K. V. Sai Sundar et al. [3] also examined tested. However, this study does not consider the model’s the performances of CPU and GPU when running DNN models accuracy or convergence. by using the TensorFlow [8] framework. The authors used CIFAR-10 in their experiment. A few improvements were made IV. BACKGROUND STUDY by the authors on the Inception-v3 and Resnet models to obtain A. Convolutional Neural Network faster and more accurate results, even when the dataset is small. Apart from this, the batch sizes and optimizers used in each test The Convolutional Neural Network (CNN) [13] consists of case were also varied for the two different neural network the convolution layer, pooling layer and fully connected layer. model. The authors were able to conclude that the Tesla k20c CNN is widely applied in image-related tasks because it can GPU had a 40x speedup while the Titan X GPU had an 82x grasp details in 2-dimensional images more effectively speedup with respect to the Intel Xeon CPU when Inception-v3 compared to the Fully Connected Neural Network (FCNN). [9] model was executed. For the implementation of the Resnet- Apart from this, the CNN is derived from the FCNN by 50 [10] model, the speedup relative to the CPU speed for the substituting its layers with convolutional layers except for the Tesla k20c and Titan X GPUs are 52x and 102x respectively. last layer because FCNN involves a large number of calculations In the study performed by Wang et al. [11], the performance of weight matrix as illustrated in Fig. 1. There are two essential and energy efficiency of different processors from Intel (CPU), attributes of convolutional layer: NVIDIA (GPU), AMD (GPU) and Google (TPU) were locally connected: each of the neuron in the lth compared and studied. This paper first provided in-depth convolutional layer is only linked with a random neuron insights and analysis on the comparisons between CPU, GPU in the local window in lth+1 layer. As illustrated in Fig. and TPU regarding the most common operations in neural 2, the connection between lth and lth+1 layer drops from network, i.e.: matrix multiplication and convolution in 2D Ml × M(l+1) to Ml × K where K is the size of the kernel, Ml dimension. The authors plotted the trends of FLOPS of different and M(l+1) are the number of neurons in lth layer and AI accelerators at three levels of increasing computation l+1th layer, respectively. intensity. In this paper, six different neural networks, i.e.: shared weights: as a parameter, the kernel can be shared VGG16, Resnet50, Inception V3, 2-Layer LSTM, Deep Speech with all neurons in lth layer, which is displayed in Fig. 2. 2 and Transformer language model were executed on the selected processing units. According to the authors, the performance of NVIDIA’s V100 GPU with Tensor Core is still limited by the bottleneck issue, causing its performance to lag behind Google’s TPU V2 by a large margin, despite having the best performance among the evaluated GPUs with FP32 and Mixed Precision training for the execution of the CNN model. The authors also mentioned that there is still room for Fig. 1. The fully connected layer. (All neurons in fully connected layer should hardware optimization for the TPU v3-8 to better fit modern be linked together.) deep learning networks. The study compared the performance between the Tesla V100 GPU, Tesla V100 GPU with Tensor Cores and TPUs. It was observed that the TPUs outperformed the Tesla V100 in all three Resnet50, Inception c3 and Transformer models. In the former two models, TPU V2-8 achieved 1.5x higher throughput compared to the Tesla V100 with Tensor Cores, whereas TPU V3-8 had a 2.7x higher throughput. In the Transformer language model however, it was observed that the performance of the TPU V2-8 and both Tesla Fig. 2. The convolutional layer. (The weights are the same for the same coloured lines.) V100 GPUs are very similar, whereas the TPU V3-8 showed a more significant 1.7x speedup. 14 INTERNATIONAL JOURNAL OF DESIGN, ANALYSIS AND TOOLS FOR INTERGRATED CIRCUITS AND SYSTEMS, VOL. 10, NO. 1, NOVEMBER 2021 Although the convolutional layer can have a better effect architecture aggregates 32,768 Arithmetic Logic Units (ALU) when dealing with RGB images and reduces the scale of and uses the special datatype bfloat16. Hence, it sacrifices calculation by, the calculation in the convolutional layer still precision for higher processing speeds. However, workloads takes up a lot of computing resources, especially for models that are not based on matrix multiplication would not work well which have a large number of layers and channels. Therefore, on TPUs [17]. Besides, large and effective batch sizes should be we consider that the convolution part demands the most used to maximize the optimization effects provided by TPUs resources from the computer. [18]. B. Central Processing Unit (CPU) E. Comparison across CPU, GPU and TPU The design of the CPU is focused to have low time delay. It The components of the CPU are much more complicated has a powerful Arithmetic Logic Unit (ALU) so that the than the GPU. The CPU not only has a considerably larger arithmetic operation can be finished in a very short clock cycle. cache, but also consists of complex control unit and many The precision that CPUs can achieve nowadays is 64-bit double. optimized circuits for branch decision while the GPU has large- It only takes 1 to 3 clock cycles for the CPU to carry out addition scaled computation units and a relatively smaller portion of and multiplication operations. These days, CPU frequencies can storage units and control unit. This provides a pure computing reach up to 3 GHz. Apart from this, the larger cache of the CPU environment without interruption. The GPU chip includes also reduces time delays because more data can be stored into larger number of registers, threads and Single Instruction the cache, hence reducing memory access. According to the Multiple Data (SIMD) streams, which guarantees its principle of spatial locality, the larger the size of the cache, the calculation efficiency with high-speed parallel computation. In higher the cache hit rate since there is a higher probability of the terms of storage, the GPU chip has a smaller cache and DRAM content requested being in the cache. Moreover, the complex compared to the CPU chip [19]. Since the target of the GPU control unit can lower the delay through its ability of branch cache is not for long-term data storage, it acts like a buffer for prediction. For example, if the execution of some instructions temporary storage in the computation process. So, it will requires the results from a former instruction, the control unit involve much more delay when parallel computation needs to can determine the locations of these instructions in the pipeline access data from the same buffer frequently. and transfer the result needed towards the latter instruction as When comparing the GPU to the TPU, [20] stated that the soon as possible. TPU has a much larger Local Unified Buffer (UB). This is used to store intermediate results to be accessed quickly, similar to C. Graphic Processing Unit (GPU) the cache in the CPU and GPU. The UB has a size of 24MiB Generally, the GPU consists of the decoder, texture cache, and takes up about 29% of the TPU chip. In comparison, the vertex cache and shared local storage. Specifically, more than accumulators, which are 4MiB in size, takes up about 6% of the 40% of space of modern GPU chip are occupied by ALU, which chip. With the powerful UB and accumulators, the TPU saves is important for its high-speed computation. Different from a large amount of time in acquiring data from the DRAM. CPUs, GPUs are designed for parallel processing [14]. These Another difference between the GPU and TPU is that days, most GPUs utilize the single instruction multiple data TPU tolerates low computing precision. As shown in [21], (SIMD) stream architecture [15], whereby a single instruction despite using a lower-precision approximation, the model can can trigger data parallelism due to each processor having their achieve the same, or sometimes even higher accuracy. When own dedicated memory. For a typical image processing process, using a lower or mixed precision, the overall processing speed the vertex shader in the GPU is responsible for handling the data can be increased by a large margin, because the number of flow from the image including light, colour and positions. Then, transistors required in each computing operation would fall the information of the image will be stored in the shared local sharply. Therefore, assuming that the total volume of transistors storage as pixels. Lastly, each pixel will be executed separately is the same for the GPU and TPU, more operations can be by different threads of different ALUs according to some performed on the TPU per unit time, allowing results to be computation rules defined. achieved at a higher rate with complicated and powerful deep D. Google Tensor Processing Unit (TPU) learning algorithms. The Tensor Processing Unit (TPU) is a domain-specific architecture developed by Google dedicated for deep learning, V. EXPERIMENTAL RESULTS specifically neural network models. The TPU is an AI A. CPU, GPU and TPU with Small Batch Sizes accelerator application-specific integrated circuit (ASIC), and For this part of the experiment, the following settings were cannot perform general tasks like the CPU or GPU. However, defined: this is the reason that sets it apart from the other processors, since Dataset: CIFAR-10 and MNIST it focuses on matrix processing tasks which are the main Batch sizes: 16, 32, 48, 64 operations in neural network models. The TPU is able to achieve Epoch count: 5 great computational throughput and process massive datasets as the von Neumann bottleneck issue is greatly reduced by only focusing on specific workloads [16]. For Google TPUs, each board consists of 4 chips, while each chip has 2 cores. Every core has a matrix unit (MXU), vector unit (VPU) and a scalar unit, all connected to an 8GB high bandwidth memory (HBM). The matrix processor uses a 128 x 128 systolic array architecture to perform the vast number of hard-wired calculations. This 15 INTERNATIONAL JOURNAL OF DESIGN, ANALYSIS AND TOOLS FOR INTERGRATED CIRCUITS AND SYSTEMS, VOL. 10, NO. 1, NOVEMBER 2021 relatively small. We deduced that this is because the MNIST dataset is a simpler image classification problem as it only contains black and white images while the CIFAR-10 dataset proved to be more challenging for the architectures tested. Fig. 3. Comparison of average time per batch of the CPU, GPU and TPU with batch sizes 16 to 64 and epoch count of 5 for the CIFAR-10 dataset. For the CIFAR-10 dataset, we intended to train the model for 120 epochs, as this would give us a high accuracy. However, we realized that the Intel Xeon CPU on Colab could only produce 1 epoch after 16 minutes. Since Colab had a 12-hour Fig. 5. Comparison of total running time of the CPU, GPU and TPU with batch sizes 16 to 64 and epoch count of 5 for the CIFAR-10 dataset. maximum execution time limit for free users, we would not be able to complete the test in time. Hence, we sacrificed the model As shown in Fig. 5, as the batch size is increased, the total accuracy for the CIFAR-10 dataset, achieving a maximum training time for all 3 processors decreased. However, the TPU accuracy of only 57.83%, 55.67% and 54.48% with 5 epochs showed the highest reduction rate at an average rate of 1.26x, for the CPU, GPU and TPU respectively. followed by the GPU at 1.24x and the CPU at only 1.08x. As the batch size increases, the number of parallelizable matrix However, this would not affect our performance evaluation multiplication operations increases too. Hence, the TPU and as we compared the average running time per batch as shown GPU were able to utilize their parallelization power enabled by in Fig. 3. It is evident that when batch size increases, the overall their architecture to achieve a shorter total running time. average time per batch increases. This is because when batch size increases, the number of matrix multiplications also B. GPU and TPU with Large Epoch Counts increases. Naturally, it will take a longer time to train as the For this part of the experiment, the following settings were batch sizes increases. As shown in Fig, 3, the average running defined: time per batch for the CPU is 25x of the GPU and 12x of the Dataset: CIFAR-10 TPU. Batch sizes: 16, 32, 48, 64 Epoch count: 120 Fig. 6. Comparison of average time per batch of the GPU and TPU with batch Fig. 4. Comparison of average time per batch of the CPU, GPU and TPU with sizes 16 to 64 and epoch count of 120 for the CIFAR-10 dataset. batch sizes 16 to 64 and epoch count of 5 for the MNIST dataset. Since the test to run 120 epochs for the Intel Xeon CPU on For the MNIST dataset, we were able to achieve good Colab was unfeasible, we only performed the original test on accuracies despite only running for 5 epochs, i.e.: 96.44%, the GPU and TPU. We observed that for small batch sizes, the 96.73% and 95.37% for the CPU, GPU and TPU respectively. GPU and TPU both took a long time to train all 4 batch sizes, The trend of the graph plotted in Fig. 4 is similar to the graph which was about 4 and 5 hours respectively. However, it was plotted for the CIFAR-10 dataset. However, the difference in observed that the TPU consistently took 2x more running time average time per batch for the CPU with the GPU and TPU than the GPU, which was out of our expectations, since we were not as significant as in the CIFAR-10 dataset. expected the TPU to be the better performer. Furthermore, the difference in average time per batch between Even though we ensured that our batch sizes were multiples the GPU and TPU for the CIFAR-10 dataset was almost 2 times, of 8 to suit the TPU architecture, we were unable to achieve the while the difference observed for the MNIST dataset was desired speedup for the TPU. 16 INTERNATIONAL JOURNAL OF DESIGN, ANALYSIS AND TOOLS FOR INTERGRATED CIRCUITS AND SYSTEMS, VOL. 10, NO. 1, NOVEMBER 2021 C. GPU and TPU with Large Batch Size Based on our study, we deduced that there may be too many For the final part of the experiment, the following settings data preprocessing steps for the TPU to handle, since it is a were defined: purely computational architecture. While we strived to achieve Dataset: CIFAR-10 a good accuracy for our model, we may have sacrificed the Batch sizes: 1024, 2048, 4096, 8192, 16384 opportunity to fully take advantage of the TPU architecture. Epoch count: 5 Hence, further code optimization or locally-performed data preprocessing steps can be performed to reduce the effects of non-matrix multiplication operations. Another aspect worth looking into is the issue of input processing bottleneck that is caused by the TPU being able to train much faster than the CPU could fetch the input data for it. This can be further investigated on the Colab framework we produced by using the Input Pipeline Analyzer tool on Colab. There may also be cloud and internet overhead, which were not considered in this study. VII. CONCLUSION In this study, we first presented a brief theoretical comparison of the CPU, GPU and TPU. Then, we performed three tests to examine the performance of the Intel Xeon CPU, Fig. 7. Comparison of total running time of the GPU and TPU with batch sizes Nvidia T4 GPU and TPU v2-8 in the application of deep 1024 to 16384 and epoch count of 5 for the CIFAR-10 dataset. learning, particularly in the task of image classification. We For the final testing, we referred to [22], which suggested trained a VGG16 model with batch normalization from scratch that batch sizes of at least 64, and are multiples of 128 should to classify images in the CIFAR-10 and MNIST datasets. We be utilized, where the suggested batch size was 1024. Hence, varied the batch sizes and epoch counts to assess their effects we dramatically increased the batch sizes from our previous on the performance of the different architectures. We also settings to test if our model was indeed affected by batch sizes considered the model’s accuracy and convergence by which were too small. This was also one of the simplest options implementing batch normalization and data augmentation available, hence we performed this test instead of other options techniques while optimizing the code for the architectures due to time constraints. tested. Due to financial, time and hardware constraints, we As shown in Fig. 7, the average time per batch taken for the faced some difficulties to take full advantage of the TPU TPU was still approximately 2x of the time taken for the GPU. architecture. Nonetheless, it is safe to conclude that we were However, when the batch size of 16384 was used, the GPU able to maximize the parallelization power of the GPU, failed to execute instantly as it has run out of memory. The achieving a 25x speedup as compared to the CPU’s average TPU, however, proved to be superior in handling extremely running time per batch. The TPU, although not unleashed to its large batch sizes as it had no problem performing the full model full potential, was able to achieve a 12x speedup when training. The possible issues and problems that caused the compared to the CPU. We were also able to achieve a model consistent delay in the training time for the TPU will be accuracy of approximately 86% and 96% for the CIFAR-10 and discussed in Section VI. MNIST datasets respectively. Other than that, we also proposed issues that one may face in an attempt to make full use of VI. LIMITATIONS AND FUTURE RESEARCH Google Colab’s TPU resources. Some future research DIRECTIONS directions and possible solutions to the issues raised were also Even though we were aware that the TensorFlow provided in Section VI. framework was the native framework that is most compatible with Google Colab, we still decided to use the PyTorch REFERENCES framework as it was recently made available to Google Colab’s resources. This may pose some incompatibility issues and made [1] E. Buber and B. Diri, "Performance Analysis and CPU vs GPU it harder to troubleshoot as compared to the TensorFlow Comparison for Deep Learning," 2018 6th International Conference on Control Engineering & Information Technology (CEIT), 2018, pp. 1-6, framework. Besides, the random delegation of available doi: 10.1109/CEIT.2018.8751930. processing units also made our model building and optimization [2] T. Carneiro, R. V. Medeiros Da NóBrega, T. Nepomuceno, G. Bian, V. process more challenging, as it was uncertain what hardware H. C. De Albuquerque and P. P. R. Filho, "Performance Analysis of we would be assigned. Google Colaboratory as a Tool for Accelerating Deep Learning Prior to performing Experiments B and C, we have adhered Applications," in IEEE Access, vol. 6, pp. 61677-61685, 2018, doi: 10.1109/ACCESS.2018.2874767. to most of the requirements stated in the troubleshooting guide [3] K. V. Sai Sundar, L. R. Bonta, A. K. Reddy B., P. K. Baruah and S. S. provided by Google [22]. For example, we ensured that the Sankara, "Evaluating Training Time of Inception-v3 and Resnet-50,101 timing of our model training was only timed for the running Models using TensorFlow across CPU and GPU," 2018 Second time of the epochs. Besides, the layer size chosen for our model International Conference on Electronics, Communication and Aerospace also aligned with the one suggested in the guide. In Experiment Technology (ICECA), 2018, pp. 1964-1968, doi: 10.1109/ICECA.2018.8474878.J. Clerk Maxwell, A Treatise on C, we also ensured a large and compatible batch size with the Electricity and Magnetism, 3rd ed., vol. 2. Oxford: Clarendon, 1892, TPU, but the results were still not optimistic. pp.68–73. 17 INTERNATIONAL JOURNAL OF DESIGN, ANALYSIS AND TOOLS FOR INTERGRATED CIRCUITS AND SYSTEMS, VOL. 10, NO. 1, NOVEMBER 2021 [4] Wang, Y., Wei, G., & Brooks, D. (2019). Benchmarking TPU, GPU, and https://www.extremetech.com/gaming/269335-how-graphics-cards- CPU Platforms for Deep Learning. ArXiv, abs/1907.10701. work. [5] “Colaboratory,” Google. [Online]. Available: [15] M. Levinas, Everything You Need to Know About GPU Architecture and https://research.google.com/colaboratory/faq.html#:~:text=How%20lon How It Has Evolved, 23-Mar-2021. [Online]. Available: g%20can%20notebooks%20run,or%20based%20on%20your%20usage. https://blog.cherryservers.com/everything-you-need-to-know-about- [6] A. Krizhevsky, V. Nair, and G. Hinton, “The cifar-10 dataset,” [Online]. gpu-architecture. Available: http://www.cs.toronto.edu/kriz/cifar.html, 2014. [16] K. Sato, “What makes TPUs fine-tuned for deep learning? | Google Cloud [7] Y. LeCun and C. Cortes, “MNIST handwritten digit database,” 2010. Blog,” Google Cloud, 31-Aug-2018. [Online]. Available: [8] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. https://cloud.google.com/blog/products/ai-machine-learning/what- Corrado, A. Davis, J. Dean, M. Devin, et al., “Tensorflflow: Large-scale makes-tpus-fine-tuned-for-deep-learning. machine learning on heterogeneous distributed systems,” arXiv preprint [17] R. Sagar, “A Beginner's Guide To TPUs,” Analytics India Magazine, 13- arXiv:1603.04467, 2016. May-2020. [Online]. Available: https://analyticsindiamag.com/tpu- [9] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens and Z. Wojna, "Rethinking beginners-guide-google/. the Inception Architecture for Computer Vision," 2016 IEEE Conference [18] “Cloud Tensor Processing Units (TPUs),” Google Cloud. [Online]. on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 2818- Available: https://cloud.google.com/tpu/docs/tpus. 2826, doi: 10.1109/CVPR.2016.308. [19] X. Wang and W. Zhang, "A Sample-Based Dynamic CPU and GPU LLC [10] K. He, X. Zhang, S. Ren and J. Sun, "Deep Residual Learning for Image Bypassing Method for Heterogeneous CPU-GPU Architectures," 2017 Recognition," 2016 IEEE Conference on Computer Vision and Pattern IEEE Trustcom/BigDataSE/ICESS, 2017, pp. 753-760, doi: Recognition (CVPR), 2016, pp. 770-778, doi: 10.1109/CVPR.2016.90. 10.1109/Trustcom/BigDataSE/ICESS.2017.309. [11] Y. Wang et al., "Benchmarking the Performance and Energy Efficiency [20] N. P. Jouppi et al., "In-datacenter performance analysis of a tensor of AI Accelerators for AI Training," 2020 20th IEEE/ACM International processing unit," 2017 ACM/IEEE 44th Annual International Symposium Symposium on Cluster, Cloud and Internet Computing (CCGRID), 2020, on Computer Architecture (ISCA), 2017, pp. 1-12, doi: pp. 744-751, doi: 10.1109/CCGrid49817.2020.00-15. 10.1145/3079856.3080246. [12] D. Patterson, “MLPerf: SPEC for ML,” [Online]. Available: https:// [21] S. Wang and P. Kanwar, “BFloat16: The secret to high performance on rise.cs.berkeley.edu/ blog/ mlperf-spec-for-ml/ , 2018. Cloud TPUs,” Google Cloud, 24-Aug-2019. [Online]. Available: [13] Qiu, X., 2020. Neural Networks and Deep Learning. 1st ed. Beijing: https://cloud.google.com/blog/products/ai-machine-learning/bfloat16- China Machine Press, pp.115-116. the-secret-to-high-performance-on-cloud-tpus. [14] J. C. Hruska, “How Do Graphics Cards Work?,” ExtremeTech, 10-Feb- [22] “Troubleshooting,” Google Cloud. [Online]. Available: 2021. [Online]. Available: https://cloud.google.com/tpu/docs/troubleshooting#training-speed. 18 INTERNATIONAL JOURNAL OF DESIGN, ANALYSIS AND TOOLS FOR INTERGRATED CIRCUITS AND SYSTEMS, VOL. 10, NO. 1, NOVEMBER 2021 Automatic Stacking System Based on ABB Robots and Digital Twin Monitoring Yuhan Li*, Ketian Wang*, Yujia Zhai, Quan Zhang convey will stop immediately is unwarranted in actual Abstract— During COVID-19, the logistics industry plays a cru- situations due to the existence of inertia; and the research cial role in managing the outbreak and maintaining basic social conducted by Zhang [6] is not sufficiently practically to be consumption needs. To minimize the risk of infection, unmanned applied in industry. The above researches led to the decision to intelligent sorting systems are suggested to be implemented. This use a dual detection method by infrared sensors and vision paper covers the developing process of an automatic stacking cameras to achieve precise grasping. Meanwhile, build a system based on ABB IRB 120 robot with two auxiliary functions: synchronous data processing and digital twin simulation. Detailed runtime environment realized with the support of PC SDK to logic and method will be introduced, and relevant figures will be achieve digital twin monitoring. presented. The project will be evaluated and improvement will be This project utilizes the ABB IRB 120 robot with a magnetic suggested at the conclusion of this paper. sucker tool and an external convey as the stacking mechanism. Index Terms— ABB Robot; RobotStudio; Industrial For the detection and recognition system, Cognex 2000 series Application; Automatic Sorting and Stacking; Digital Twin; industry cameras and inferred sensors are selected (see Fig. 1). Real-time Data Processing. This project aims to imitate the actual industry applications that involves detecting, measuring, optimizing, stacking and monitoring to verify the logic of a nova stacking method and I. INTRODUCTION investigate the feasibility of building real time PC SDK digital Under the pandemic of COVID-19, the logistics industry has twin model. become an ever-flourishing domain. Considering the in- In the following paper, the develop logic and operational creasing consumption enthusiasm for the population under detail of the RAPID and the PC SDK host will be separately quarantine and the urgent need to allocate medical supplies, covered in the main section, followed by the discussion of packages transported globally have reached an unprecedented limitations and further improvement. Please notice that this scale [1]. Traditionally, the process of sorting and stacking are paper requires basic ABB robot operation knowledge and operated by human workers; however, a shortage of labour RobotStudio programming experience. It is strongly force has been posed since the beginning of the pandemic [2]. recommended to read this paper together with the RAPID, Since the mounting of Hannover Messe 2011, it has long been Integrated-Vision and PC SDK Application Manual [7] [8] [9]. recognized that the concept of “Smart Factory” is one of the major themes for Industry 4.0 [3]. Therefore, the demand II. THEORIES, METHODS AND RESULTS remained high for smart unmanned factories where deliveries A. RAPID Programming are automatically sorted and operational status are uploaded to be monitored remotely. The operational logic of the RAPID program is illustrated in Typical operations in a logistics centre, for example, trans- the flow chart as shown in Fig. 2. The initialize process involves porting by convey, sorting according to the bar code, have moving the robot arm to its home position and emptying the already been universally implemented by major express coop- occupation record array. Then, the convey will be enabled to eration [4]. However in the area of automatic optimal-arranged transport the object on which into the Recognition- and-Grab stacking and real time digital twin monitoring, limited amount area. Afterwards, the camera will detect the type of the object, of information could be found. Ma and Han applied infrared followed by updating the object’s central point position and sensor to detect the existence of objects on the convey [5]. deflecting angle. Subsequently, the most suitable coordinate in Zhang introduced fixed location pick-up method to grip objects the stack will be decided according to the present occupation [6]. Both the research are conducted by software simulation, the situation and the priority of stacking. Finally, the robot arm will limitation of which being obvious if applied to actual industry grab and place the object into the stack and the system will be situation. Assumptions made by Ma and Han [5] that the ready for the following operation cycle. 1) Convey Motion Control: The convey functions as the mechanism to transport the objects from one terminal into the * These authors contributed equally to this work. Recognition-and-Grab area at the other terminal. By adjusting All authors are with the Department of Mechatronics and Robotics, Xi’an the trigger distance of the terminal inferred sensor, its signal Jiaotong-Liverpool University, Suzhou, Jiangsu, China. (email: Yuhan.Li18@ will be an indicator to decide whether the objects have entered student.xjtlu.edu.cn, Ketian.Wang18@student.xjtlu.edu.cn, Yujia.Zhai@xjtlu. edu.cn, Quan.Zhang@student.xjtlu.edu.cn). the area or not, which when triggered, disenables the convey. This process does not require a high accuracy because the 7 INTERNATIONAL JOURNAL OF DESIGN, ANALYSIS AND TOOLS FOR INTERGRATED CIRCUITS AND SYSTEMS, VOL. 10, NO. 1, NOVEMBER 2021 camera will later detect object’s precise position. 2) Camera Detection and Target Recognition: By import- ing the targets into the Integrated Vision in RobotStudio and performing image recognition training [9], desired parameters like type of objects, central point offset and deflecting angle could be output to RAPID for further process. In this system however, the frame transformation value obtained by the Integrated Vision is in direct proportional to the object’s actual central point offset. Detailed relationship which are shown in Fig. 3 (y = 1.7886x + 135.94 in mathematical form), with the slope of which being 1.7886. Therefore, the the actual position offset equals to the value obtained the Integrated Vision divided by 1.7886. Fig. 1. Work Station Overview (A: Cognex InSight Camera; B:Recognition-and-Grab Area; C: Checkerboard) Fig. 3. Offset Relationship Diagram The deflecting angle obtained by the camera is within the range of [-90◦ , 270◦ ]. However, considering the symmetry the object, for a circle, no angle offset is required; for square, an angle offset within the range of [0◦, 90◦] are required; for a rectangular, an angle offset within the range of [0◦, 180◦] are required. To achieve consistency, the deflecting angle detected by the camera will be further processed by adding 180◦ each time until it reaches a positive value. 3) Categorizing Algorithm: In this project, three types of objects of the same height will be considered, which are square, circle and rectangular. The detailed dimensions of the three objects are shown in Fig. 4. Leaving a 1mm margin between two blocks, a unit block could be defined as a spacing of 42 × 42 × 21 mm, which is also the dimension of the checkerboard. Therefore, each square or circle occupies one unit block and each rectangular occupies two neighbouring unit blocks. To represent the scaled-down model of the real stacking situation, the target stack area is established as an area Fig. 2. Programming Logic Built in RAPID of 3 × 3 × 3 unit blocks which is outlined by the blue line in Fig. 5. A three-dimensional 3 × 3 × 3 Occupation-Record- 8 INTERNATIONAL JOURNAL OF DESIGN, ANALYSIS AND TOOLS FOR INTERGRATED CIRCUITS AND SYSTEMS, VOL. 10, NO. 1, NOVEMBER 2021 Array is further established to recode the occupation situation, B. Communication, Data Processing and Digital Twin where for each element, “0” represents unoccupied and “1” Simulation represents occupied. Different from other communicating methods, TCP allows An exhaustive searching and scoring algorithm is deployed transferring data precisely with a reasonable speed, therefore to decide the most appropriate coordinate. Precisely, to being reliable in industrial application. The operational logic of retrieving all the potential spacing that accommodates the the client design is shown in the flow chart (see Fig. 6). current object, the system will traverse all the spacing successively. For each square and circle, every underneath supported vacant spacing will be a potential spacing; and for each rectangular, two underneath supported neighbouring spacings could be a potential spacing. In default, the rectangular will be arranged along the X-axis, and if it will be arranged along the Y-axis, a 90-degree additional angle offset will be added to the rotationaloffset. When one potential spacing is verified, it will be scored according to the priority of stacking. A coefficient will be assigned to each spacing in the X, Y and Z-axis, and the final score of a certain coordinate is the multiplication of its three corresponding coefficients. After traverse every coordinate and scoring each potential spacing, the one with the highest score will be selected as the optimal coordinate. The coefficients on each axis can be customized according to the ideal priority of stacking. In this project, the blocks are expected to be arranged in the sequence of X, Y and Z-axis; thus, one feasible coefficient set can be the red number in Fig. 5. Fig. 6. Design Flow Chart This client application, which is built by Visual Studio 2019, consists of three major sections: existed controller list monitoring and connection, robotic system data display, and digital twin (real time robot monitoring). Before programming, the development kit named PC SDK need to be downloaded. To establish a new C# Winform program, three library files need to be added, which are ABB.Robotics.Controllers.PC.dll, and RobotSrudio.Services.RobApi.Desktop.dll, Fig. 4. Objects Size Information (all three be found in the install path, coded by ABB Ltd). Subsequently, use using instruction to apply classes which might be used in namespace. Finally, apply relevant instructions and logics to achieve the functions stated above. 1) Controller Monitoring and Connection: Controller monitoring it is achieved via clicking the “REFRESH” button in the interface shown in Fig. 7a, which will refresh the controller list. Subsequently, select the controller in the list andclick the “Connect Controller” button, the connection will be initialized. To disconnect, select the current controller in the list and click the “Disconnect Controller” button. 2) Robotic System Data Write and Read: Synchronous data processing is one of the main innovations of this ap- plication, achieving which, two timers are introduced to the program. One triggers the display of current robot’s axial information (rotating degrees of 6 axes) and vision camera’s detected data Fig. 5. Target Stacking Area with Multiply Coefficients 9 INTERNATIONAL JOURNAL OF DESIGN, ANALYSIS AND TOOLS FOR INTERGRATED CIRCUITS AND SYSTEMS, VOL. 10, NO. 1, NOVEMBER 2021 (work object and work tool names, detected object’s name, 3) Digital Twin Model Simulation: Based on the storage data position, and rotating angle). The other triggers the processing in AxisData.xlsx, the digital twin model is simulated via and storing of the current axes data into the file named MATLAB. The convenience innovation of this real-time AxisData.xls with the path located in Documents. The clock monitoring is the timer-triggered self-refreshing that needs no cycle for first timer is 50 ms, and due to the limitation of the manual button clicking. However, as mentioned above, data file processing speed the clock cycle for second timer is 1000 refreshing rate is limited by the processing speed of Microsoft ms, which is significantly greater than the first one. Jointly, Excel that the visualized status will be updated every 1000 ms, they will be updated and displayed in the rest two interface causing high delay and discontinuity in the animation. named “Motion Unit Status” (Fig. 7b) and “Vision Camera Target Status” (Fig. 7c). Fig. 8. Digital Twin Model For testing, a series of rotated angle changes for these six axes within a working period are plotted in below (Fig. 9). (a) Controller List (a) Axis 1 (b) Axis 2 (b) Motion Unit Status (c) Axis 3 (d) Axis 4 III. EXPERIMENTS (c) Vision Camera Target Information (e) Axis 5 (f) Axis 6 Fig. 7. Client UI Design Fig. 9. Motion Unit Situation for One Working Period 10 INTERNATIONAL JOURNAL OF DESIGN, ANALYSIS AND TOOLS FOR INTERGRATED CIRCUITS AND SYSTEMS, VOL. 10, NO. 1, NOVEMBER 2021 It is illustrated in Fig. 9 that the motion state for whole robot tuitions and useful suggestions on our research thesis. We are with graphical form. Go further for it, assume that there is a deeply indebted of their help in the completion of this project. mistake occurs in one of the axes, and the respective diagram will have significant vibrations or other uncommon visualized REFERENCES phenomenon. Therefore, it might be a criterion, and engineers [1] “China’s exports of anti-epidemic products such are able to check the system and detect problem conveniently as masks to the UK increase under the new crown ep and effectively. idemic.” https://www.bbc.com/zhongwen/trad/business-5 4811967 RFC6120 Extensible Messaging and Presence P IV. CONCLUSIONS rotocol (XMPP): Core. P. Saint-Andre. March 2011 In conclusion, this project successfully utilizing a scoring [2] “Labor force survey (basic aggregate) 2020 (Reiwa 2 yea algorithm that instructs the ABB robot to arrange the blocks rs) average results.” https://www.stat.go.jp/data/roudou/s into a given stacking area. Additionally, data communication okuhou/nen/ft/index.html environments are established to monitor the real time [3] A. Rojko, “Industry 4.0 Concept: Background and Overv operational state through a digital twin model. Thus, the logic iew,” International Journal of Interactive Mobile Technol of the stacking method and feasibility of building real time ogies (iJIM), vol. 11, p. 77, 07/24 2017, doi: 10.3991/iji digital twin model have been verified. For further information m.v11i5.7072 . and guidance, please contact the corresponding authors. [4] “Maximize the Benefits of Outsourcing.” https://www.dh However, two limitations have emerged during the process. l.com/hk-en/ home/our-divisions/supply-chain/solutions. Firstly, in real applications, a stack is usually composed of html objects with diverse dimensions; however in this project, three [5] Z. Ma, K. Han, “Simulation Research on Palletizing Wor objects with the same height are considered. Therefore, the kstation Based on ABB Robot,” (in Chinese), Modern Inf universality and practicability of the scoring algorithm ormation Technology, vol. 4, p. 4, 2020, doi: 10.19850/j. introduced in this paper is undetermined. Secondly, the delay cnki.2096-4706.2020.16.041. and discontinuity in the digital twin model will weaken the [6] D. Zhang, “Study on Programming Optimization of ABB perceivability of malfunctions, thus diminishing the effective- Robot Palletizing,” (in Chinese), Integrated Circuit Appl ness of immediate responses. ications, vol. 38, p. 2, 2021, doi: 10.19339/j.issn.1674-25 In future research, machine learning could be introduced to 83.2021.05.036. train a particular neural network that decides the best stacking position; and other advanced data processing methods, for [7] “Application manual: PC SDK” https://library.e.abb.com example, SQL, are highly recommended to smooth the /public/ 124d6b59313ed85fc125793400410c5b/3HAC03 monitoring model. 6957-en.pdf [8] “Technical reference manual: RAIPD Instructions, Functi V. ACKNOWLEDGEMENT ons and Data types” https://library.e.abb.com/public/6888 Author order is determined by contribution; the 1st and 2nd 94b98123f87bc1257cc50044e809/Technical%20referenc authors have the same contributions. e% 20manual RAPID 3HAC16581-1 revJ en.pdf We would like to extend our sincere gratitude to our [9] “Application manual: Integrated Vision.” https://us.v-cdn. supervisors, Yujia Zhai and Quan Zhang, for their instructive net/5020483/ uploads/editor/pa/rct9e0e66w1f.pdf 11 INTERNATIONAL JOURNAL OF DESIGN, ANALYSIS AND TOOLS FOR INTERGRATED CIRCUITS AND SYSTEMS, VOL. 10, NO. 1, NOVEMBER 2021 A Review of Defence Solutions against Cache Side-channel Attacks Tianyu Cai*, Yi Yen Low*, and Kamran Siddique attack is high. In most cases, the mitigation methods used to Abstract— Nowadays the storing of sensitive information in defend against side channel attacks may lead to negative impact cyberspace shows a significant increase and this led to an on the performance of the system. Thirdly, the platform is escalation of cyberattacks. Hence, the protection against the highly dependent. The vulnerabilities used in side-channel confidentiality of data or programs has been a matter of special attacks are in the implementation and operation of the program. concern to many security experts. One of the problematic Therefore, the success of the attack depends on the execution cyberattacks is the cache side-channel attacks as caches are environment and hardware architecture of the program. playing an important rule to ensure the performance of computer. The leakage of information can occur by observing the distinct This paper mainly focuses on reviewing a particular type of access time for a cache miss or hit. This paper conducts some side channel attack which is based on the cache. The rest of this background research about the categorization of different cache paper is organized as follows. Section II gives some side-channel attacks. Besides that, several strategies which are background information about cache side channel attacks. used to defend against cache side-channel attacks such as Sections III and IV show the detection method and some detection method, prevention method and some proposed secure prevention methods against attacks. Section V introduces some cache design will be reviewed. Also, the difficulties about the secure cache architecture which can effectively mitigate the defenses in cache side-channel attacks and some recommended side channel attacks. The Section VI does the conclusion which strategies will also be discussed in the following content. provides research challenges, summary and recommendation Index Terms— cache side-channel attacks, cache design, deep for this paper. learning, detection, prevention, challenges. II. BACKGROUND I. INTRODUCTION A. Basic Principles of Cache Side Channel Attacks With a series of hardware security vulnerabilities in Intel chips like Spectre [1] and Meltdown [2] leaked by security Cache is a small amount of memory used to buffer data that researchers, the security threats caused by the characteristics of is frequently used by CPU. It uses the principle of locality in the hardware architecture are getting more and more attention data access to avoid high delays when data is reaccessed in from many cybersecurity experts and academics. In terms of main memory. Since the caches can be shared between different hardware attack, the side channel attack is the most common processes, Cache-side channel attacks are very easy to attack which may bring great security threats to user implement in CPU. By monitoring the Cache access behavior information and equipment security. Different from traditional of a given program, hackers can analyze Cache hit and miss attacks that directly access to the user’s data, the side channel times to infer sensitive information about that program. For attacks get sensitive information from the system by exploiting example, the algorithm shown in Figure1 is an RSA square and analyzing physical information like time, power exponentiation, and the sensitive data in the algorithm can be consumption and temperature from certain monitored resources obtained by the side channel attack. [3]. This type of attack is commonly used to circumvent the For each loop, the squ () and mod () functions are executed security provided by encryption algorithms and other security once, and the mul () and mod () functions are executed once mechanisms. when the 𝑒𝑖 is 1. Therefore, the number of executions of the Generally speaking, side channel attacks have three function squ () in the third line can indicate the number of bits characteristics [4]. The first is that this kind of attack is difficult in the binary index e, and whether the function mul() in the to be detected. Most side-channel attacks do not require high sixth line is executed or not indicates whether the value of the system permissions. As long as the collection of side channel bit is 1. By observing the Cache access behavior of these two information can be completed, subsequent analysis can be functions and comparing the time of the address access, hackers carried out. Secondly, the cost of defending against this type of can determine whether the function was executed or not, and thus infer the value of all the bits of the exponent e [5]. *Tianyu Cai and Yi Yen Low are co-first authors. All authors are with the Department of Information and Communication Technology, Xiamen University Malaysia (email: CST1809196@xmu.edu.my, CST1809730@xmu.edu.my, tmkamran@gmail.com). 19 INTERNATIONAL JOURNAL OF DESIGN, ANALYSIS AND TOOLS FOR INTERGRATED CIRCUITS AND SYSTEMS, VOL. 10, NO. 1, NOVEMBER 2021 program with the overall time difference. The attackers can infer the execution process of the program based on this difference in the execution time. However, this attack mode needs to trigger the user program several times, and needs to judge the overall execution time of the program, which has poor accuracy and long detection time. Therefore, the application scope of this attack mode is relatively small [7]. D. Prime + Probe Attack The Prime+Probe attack was first mentioned by [8]. The characteristic of this attack is that there is no need for the Fig. 1. RSA square exponentiation [5] attacker and the victim to share memory. The attack principle is similar to the EVICT +Reload attack, which uses the time difference between Cache hit and missed to determine the B. General Method of Cache Side Channel Attacks Cache access behavior of the target program. However, because the Prime+Probe attack does not rely on the shared memory of According to the way the attacker obtains information, the spyware and the target program, unlike EVICT +Reload Cache side-channel attacks can be divided into three types: attack, the hackers can only measure the time difference between a spy's access to certain Cache group. 1) Time-based attacks: The attacker can estimate the Using the Cache access feature, the attack is divided into number of cache hits and misses of the target three steps: 1) In the padding phase, the attacker populates all program by observing the overall running time of the rows in a specific Cache group by accessing part of his certain calculation. memory space; 2) In the target access phase, the attacker waits for the target program to access the memory rows in the Cache 2) Trace-based attacks: When the target process is group, so as to clear the corresponding Cache rows filled in the running, the attacker can speculate which memory first phase; 3) In the probe phase, the attacker accesses all the access has caused a cache hit by observing the rows in the Cache group again and measures the interview time. approximate cache behavior. If the line has been purged from the Cache by the target program, the query time is long. On the other hand, if all 3) Access-based attacks: The attacker analyzes the padding is still stored in the Cache, the access time is shorter. target process's access to the specified Case group, With this attack, the attacker determines whether the target and then infers the sensitive data accessed by the program accesses the specified Cache group, thereby inferring target process. sensitive data [7]. E. Flush + Reload Attack C. ECIVT + Reload Attack The Flush + Relooad attack is a variant of the Prime + EVICT + Reload Attack is a typical type of access-based Probe attack. The memory page sharing between the spy Cache side channel attack, mainly targeting L1-Cache to program and the target program allows this attack to clear the retrieve sensitivity information. The principle of this type memory lines in the cache at all levels, including L3 [9]. In Intel attack is that the attacker takes advantage of the characteristics processors, user threads can flush both readable and executable of Cache group association, and clears and checks all the pages with the clflush directive, which allows hackers to attack memory rows in the specified Cache group. Since the spyware by flushing pages shared with the target program. By flushing process and the target process share the Cache, the spyware the target memory location frequently with clflush instruction, process can detect which row in the cache group has been hackers can analyse the time taken to reload the row to reloaded by the target process, so that the private Cache data in determine whether the target program has also cached the row the inner core of the target process can be obtained [6]. in Cache. The specific attack process is as follows:1) The attacker The Flush+Reload attack consists of three steps: 1) The measures the time required for the execution of the program monitored memory location is refreshed from the cache; 2) The under normal operation; 2) According to the previously spyware waits for the target program to access the memory line; determined vulnerabilities, the attacker fills the corresponding 3) The spyware reloads the flushed memory row and measures location cache and "clears" this part of content from the cache; the load time [7]. 3) The attackers measure the time required for program execution again, so as to judge the execution process of the user 20 INTERNATIONAL JOURNAL OF DESIGN, ANALYSIS AND TOOLS FOR INTERGRATED CIRCUITS AND SYSTEMS, VOL. 10, NO. 1, NOVEMBER 2021 III. DETECTION METHOD to prevent attacks. However, these implementations require a lot of manual work to adapt to different environments. These A. Using Hardware Performance Counter (HPC) problems can be solved by automatic compilation techniques HPC is a special register typically configured in modern proven by Coppens et al [14]. They use if-conversion to microprocessors to store counts of system activities like cache eliminate the conditional move instruction at the back-end of hits/misses. And it is one of the most common side channel compile, which shows that the automated compiler technique attack detection tools which can find the cache references and that can be used to eliminate the key-dependent control flow. misses caused by the side channel attack. Chiappetta et al. propose three HPC based detection methods targeting the B. Constant-Time Techniques Flush + Reload cache side channel attacks [10] . The first method of their research is to find out the correlation between Once the hackers get the variation of encryption time victims and spies using the information provided by the HPC. with respect to data, they can easily implement the side channel The other methods are based on the machine learning attack on victim’s system. Therefore, make the time of system technique using five relevant features like total accesses of L3 constant can be a good method to prevent cache-based side cache to predict the spies. In terms of cross VM cache side channel attack. However, this kind of technique is difficult to channel attack, Mohammad-Mahdi et al. propose a Gaussian implement because of the complexity of hardware and anomaly detection method using the information provided from algorithm. To solve this problem, Brickell et al. propose a the HPC and Intel Cache Monitoring Technology (CMT) [11]. method using a small compact S-table that all the cache lines This method periodically analyzes the behavior of virtual are contained in it and can be accessed and permuted every machines to identify malicious activity based on the round [15]. The key processes in this method are constant time information of cache misses and it has a very high detection and the distribution of average time follows a Gaussian rate with only 2% performance overhead. distribution. C. Preventing Memory Sharing B. Using Performance Counter Monitor (PCM) Flush + Reload based side channel attack mainly PCM is a visual monitoring tool built into Windows. It depends on the shared physical memory in the system to steal records system resource usage in real time at the operating information. To defense Flush+Reload side channel attack, system level and can monitor the number of hardware events Zhou et al. propose a copy-on-access model to manage the such as the CPU kernel, cache, memory controller and memory shared physical pages through security domain [16]. Its main chip. Therefore, it is a very useful tool for side channel principle is to make each domain have its own copy by copying detection. Gulmezoglu et al. propose a model called the physical pages that are being accessed by multiple different FortuneTeller to detect different types of side channel attack security domains. In this case, the user's access to the copy of [12]. They use PCM and deep learning technique Recurrent the page is invisible to hackers in the Flush + Reload side- Neural Networks (RNN) to train a model to predict what value channel attack. of performance counter will be. Once the value of counter dose not satisfy the prediction, the side channel attack can be determined by the system. However, D. Cache Flushing deep learning requires a large amount of data and time cost to train the model to complete the detection. In order to achieve Cache clearance technology is to clear up the changes of real-time detection channel attack detection, Jonghyeon et al. microarchitecture state caused by the execution of user propose a detection method based on PCM and machine programs with the assistance of operating system. Zhang et al. learning algorithm [13]. They use machine learning propose a system called Duppel to help VM mitigate the cache- classification algorithms to detect whether some abnormal based side channel attack [17]. The mechanism of it is to events appearing in the system. This method can detect various periodically clean the cache of L1 and L2 cache which can side channel attack with more than 90% accuracy in the effectively prevent the attacker from obtaining the accurate different environment. time variation of the victim. Unnecessary cache cleanup operations are skipped in this method to achieve less than 7% of the performance overhead. IV. PREVENTION METHOD A. Automatic Compile Technology E. Reduce Measurement Accuracy Cache based side channel attack are carried out by obtaining information from the physical implementation of a Only the attackers know the program execution and cryptographic system. Therefore, adding noise or memory access pattern of the victim, can they carry out the side randomization to the encryption implementation is a good way channel attack on the victim. Therefore, the accuracy of 21 INTERNATIONAL JOURNAL OF DESIGN, ANALYSIS AND TOOLS FOR INTERGRATED CIRCUITS AND SYSTEMS, VOL. 10, NO. 1, NOVEMBER 2021 obtaining information leads directly to whether the attack is B. Partitioning-based Caches – SecVerilog Cache successful or not. Taesoo Kim et al. propose a system-level mechanism that can manages a set of locked Cache lines for each processor core [18]. Each virtual machine can load its own sensitive data into its locked Cache line, making it invisible to other virtual machines. In this case, the attackers are not able to obtain accurate information of the system. V. SECURE CACHE ARCHITECTURE A. Partitioning-based Caches – Static Partitioning (SP) Cache Fig. 3. SecVerilog cache [19] Static partitioning techniques will also be used in SecVerilog cache. In SecVerilog cache, cache blocks which are located between security levels Low (L) and High (H) will be statically partitioned. As shown in Figure 3, distinct instructions will be tagged with distinct labels which are H and L. Both H and L partitions can be read by H instructions while L instructions can only access to L partition. For those programs which are using SecVerilog cache, each instruction of the source code requires to include a timing label. This timing label is used to indicate whether the Low or High security level is accessing the data according to the code. Timing label here is similar to the Fig. 2. SP cache [19] function of a process ID as they both can differentiate between the attacker’s instructions (Low security level) and victim’s To build a secure cache architecture for SP cache, instructions (High security level). partitioning technique is pressed into service. As shown in Based on this kind of cache design, the operation’s timing Figure 2, the cache will be statically separated for both attacker produced from Low partition will not be affected by the and victim. Thus, the attacker and victim will have distinct operations of the High partition. For a write or read miss, both cache ways (or sets). Among different processes, the eviction partitions can only be modified by their corresponding of cache line is prohibited. Since the sharing of cache is not instructions. When there is a cache miss occurs due to Low allowed in this cache architecture, it can be used to prevent the instructions and the retrieved data exists in the High partition, interferences among the memory accesses of the attacker and it will result in a cache miss. Then, in order to maintain the victim. Nevertheless, this security protection leads to consistency, the data in High partition will be moved to Low degradation of cache performance and it can only be applied in partition. However, when the data is found in the cache, High some extremely security-sensitive applications [4]. By instructions will lead to a cache hit in both partitions [20]. applying partitioning techniques, the cache ways of SP cache will be statically partitioned into two partitions based on distinct process IDs of the attacker and the victim. C. Partitioning-based Caches – SHARP Cache For the attacker, the cache ways are separated into Low Cache line CVB partition while the cache ways will be partitioned into High partition for the victim. Generally, the attacker is related to Low security and the victim belongs to High security. The Low Fig. 4. SHARP cache [19] partition which is assigned to the attacker cannot be modified by the victim’s memory accesses while the High partition The implementation of SHARP cache involves both which is assigned to processes such as the victim cannot be randomization and partitioning scheme and it is mainly targeted modified by the attacker’s memory accesses. However, once at those inclusive caches. In general, SHARP cache is designed the retrieved data is currently in cache, both the attacker’s to prevent eviction-based attacks and protect the data of victim memory accesses and the victim’s memory accesses can cause from being flushed by some malicious processes. Every cache a cache hit in either partition [20]. block will be augmented with the core valid bits (CVB) as shown in Figure 4. In SHARP cache, cache hit among distinct data of processes is accepted. 22 INTERNATIONAL JOURNAL OF DESIGN, ANALYSIS AND TOOLS FOR INTERGRATED CIRCUITS AND SYSTEMS, VOL. 10, NO. 1, NOVEMBER 2021 Once a cache miss occurs and there is some data requires to E. Randomization-based Caches – Random Fill (RF) Cache be evicted, those data not owning by present processes are evicted first. Instead, if the data stated above is not available, the eviction of data from same process will be done. Otherwise, when the current data in cache is not from the identical process, the eviction of data in the cache set will be conducted in random order. An interrupt will be generated by this random eviction to notify the operating system of any possible suspicious event [21]. D. Randomization-based Caches – Random Permutation (RP) Cache P ID Original cache line Fig. 5. RP cache (1) [19] Fig. 7. Random Fill cache [19] To provide protection against reuse-based attacks, the cache fill strategy is redesigned to decorrelate the desired memory access and the cache fill. Figure 7 shows the architecture of random fill cache. For the applications in this cache, some new instructions will be used to control whether the demanded data refers to a random fill request or a normal request. By performing the normal request, the cache will be filled with the missing line and it will also send the returned data to the processor while the random fill request deals with the cache filling but there is no any data being sent to the processor. Fig. 6. RP cache (2) [19] The processing of cache hits in RF cache is similar to a By applying randomization technique in RP cache, the normal cache. When the critical data of the victim is being accessing to memory address and the cache timing can be accessed, it will be executed as a Nofill request and the decorrelated. As shown in Figure 5, for each RP cache block, a accessing of the demanded data can be implemented without protection bit (P) and a process ID will be extended for each any caching. Nofill request here refers to a demand fetching line to indicate whether a certain cache block is being protected which can forwards the returned data directly to the processor or not. Figure 6 shows a new permutation table (PT) will be but the cache will not be filled at the same time. Meanwhile, for used in RP cache so that pre-calculated permuted set number of the processing of a random fill request, it will bring an arbitrary each cache set can be stored. The number of tables are decided data within the addresses’ range into the cache. To deal with by the number of protected processes. During the memory other memory accesses of process and the accessing to the access processes, cache hits occurs when both the process ID normal victim’s memory, a normal request is performed so that and the address are identical. the normal replacement policy can be achieved [22]. Once a cache miss occurs due to data D which is belong to cache set S, if the data to be brought in and the data to be evicted are in the same process but their protection bits are different, it will evict an arbitrary data from random cache set S’ and the accessing of data D can be done without any caching. Instead, if both data not belong to same process, it will store D in an evicted cache block of S’. At the same time, the mapping between S’ and S is swapped. Otherwise, the normal replacement policy will be carried out [20]. 23 INTERNATIONAL JOURNAL OF DESIGN, ANALYSIS AND TOOLS FOR INTERGRATED CIRCUITS AND SYSTEMS, VOL. 10, NO. 1, NOVEMBER 2021 F. Randomization-based Caches – Non-Deterministic Cache be evicted by the victim’s processes. This will further lead to the future cache misses at attacker side and provide an opportunity for the attacker to draw inferences about the cache accesses done by the victim. There are some accessed-based cache side-channel attacks which are belong to this type of attacks such as Percival’s attack. Type II: Cache Miss-based Attacks due to Internal Interference Different with type I attacks, this attack does not require the attacker and the victim to run their programs concurrently. For the attacker, he only needs to make a Fig. 8. Non-Deterministic cache [19] measurement about the total execution time performed by the victim such as time to complete an encryption for a plaintext For this Non-Deterministic cache, cache access decay will be block. A shorter execution time means that there will be less used to implement randomization between the timing of cache cache misses occur during the victim’s execution process. access and the access of cache block. As shown in Figure 8, Based on this observation, the attacker can obtain the local counters are used to record the interval between its information related to the memory accesses performed by the corresponding data activeness and it will be incremented on victim. There are some timing-based cache side-channel each clock tick of the global counter once there is an untouched attacks which can be categorized into this class and one of the data. An invalidation of the corresponding cache line will take typical instances that belongs to this class is Bernstein’s attack. place if the predefined value is reached by the counter. In this Non-Deterministic cache, the initial value of local counters will Type III: Cache Hit-based Attacks due to External be set randomly and this value cannot more than the global Interference counter’s maximum value. This cache designed ensures the changing in cache delay can be randomized. For this class of attacks, some memory space will be shared by both the victim and the attacker such as sharable By controlling the interval between cache delay, it results in cryptography library. Initially, some or all the sharable memory distinct cache miss and cache hit statistics since the randomized blocks will be evicted by the attacker from the cache. The counter which is corresponding to different cache line will be attacker requires to wait a certain time after the execution of the used to determine the invalidation. Hence, the decorrelation of victim is completed. After that, the reading of the sharable cache’s access time and the accessed address can be done. memory blocks and the measurement of the access time can be However, by comparing to the other secure caches, this Non- done by the attacker. According to the observation result, a Deterministic cache may cause a tremendous performance short access time indicates there will be a cache hit occurs at degradation [20]. attacker side and this means that the victim has accessed to a certain cache line at that interval and then re-fetched it into cache. By inference, the attacker can obtain the information G. Effectiveness of Secure Caches against Timing-based about the memory addresses accessed by the victim. The Cache Side Channel Attacks access-based cache side-channel attacks will belong to this type of attacks. As stated in [23], for the cache side channel attacks, the major cause of the effective attacks is related to interference. Type IV: Cache Hit-based Attacks due to Internal Interference here refers to either the internal interference that Interference can be found inside the own program of the victim or the external interference which exists between the programs of the These attacks are conceptually similar to type II attacks as victim and attacker. Based on the observations of the cache it will require the attacker to do the measurement of the total behaviors done by the attacker, the cache side-channel attack execution time from the victim side. In these attacks, the main can be categorized into four different types as bellow: concern is to observe the cache hits occur in the victim’s code. A shorter execution time indicates there will be more cache hits Type I: Cache Miss-based Attacks due to External occur when the victim runs the execution. Hence, by using the Interference cache collision related to memory accesses, an inference can be drawn by the attacker to acquire some information about the For this type of attacks, the processes of both the encryption keys. Some timing-based cache attacks are a kind of victim and the attacker will be run on the identical processor this attack such as Bonneau’s attack. and the identical data cache will be shared by them. In this way, To model the possible vulnerabilities of cache timing, a the cache lines which are used to hold the data of attacker may new three-step model is proposed in [20]. By referring to the 24 INTERNATIONAL JOURNAL OF DESIGN, ANALYSIS AND TOOLS FOR INTERGRATED CIRCUITS AND SYSTEMS, VOL. 10, NO. 1, NOVEMBER 2021 analysis results proposed by the author, Table 1 and Table 2 VI. CONCLUSIONS show the effectiveness of the six secure caches stated above to thwart different types of timing-based attacks. The results show A. Research Challenges that the protection against the internal interference is harder There is often a tension between the performance and the than prevention of the attacks due to external interference since security protection of a computer architecture design. When the SP cache, SecVerilog cache and SHARP cache and RP cache designers deal with the design of computer architecture, they fail in this protection. Meanwhile, the design of may focus on the aspects which can easily show their design is Nondeterministic cache which will completely randomize the better. For instance, area, performance, power numbers, etc. access time of the cache successfully prevent all the attacks. However, there is no such metric that can be used to describe Nevertheless, the design of this cache may lead to a great clearly which design is secure than others. Besides that, to test expense for a complex application. the vulnerability of a cache design, it can be conducted by using either real machines or simulation. By running on real machines, some modification of the hardware can be done by adding some Table 1. Effectiveness of partitioning-based caches against cache side-channel attacks [19] new features and implementing the security testing. Compared with real systems, the result obtained from the SP cache SecVerilog SHARP cache simulation sometimes is not accurate enough to represent the cache real systems and it may result in unclear performance effects Attacks on √ √ √ for some security features. Furthermore, although some ideal Cache Misses due cache designs have been proposed but there might be an issue to external interference with the implementation in commercial processors. Also, there Attacks on × × × is limited formal work on proving the security of a design and Cache Misses due more work related to the modeling of all possible cache side- to internal channel attacks is required [20]. interference Attacks on × √ × Cache Hits due to B. Summary & Recommendations external interference To provide a secure protection against the cache side- Attacks on × × × channel attacks, there are a large number of secure cache Cache Hits due to architectures, detection and prevention method being proposed. internal interference The design of secure cache can be classified into two strategies √: can prevent certain attack which are the partitioning and the randomization techniques. ×: fail to prevent certain attack For the partitioning technique, this strategy will focus on the division of the cache into distinct zones to deal with distinct Table 2. Effectiveness of randomization-based caches against cache side- processes. In this way, the cache sharing can be prevented and channel attacks [19] hence the information leakage can be thwarted. There are some caches follow this design idea such as Static Partitioning (SP) RP cache RF cache Non- Deterministic cache, SecVerilog cache, SHARP cache, Sanctum cache, PL cache cache and others. In the randomization approach, the Attacks on √ × √ information about the side-channel will be randomized so that Cache Misses due the accuracy of the leaked information from the cache can be to external decreased. For example, RP cache, Newcache, RF cache, interference Attacks on × × CEASER cache, etc. √ Cache Misses due As discussed in [24], the capability of side-channel attacks to internal and the strategy to mitigate the corresponding damage have interference attracted a great attention of those cybersecurity experts and Attacks on √ √ √ academics. Generally, the protection against cache side- Cache Hits due to external channel attack can be categorized into two main strategies. First interference technique refers to reduction of the leaked signals from the Attacks on × √ √ computer. This strategy requires the deployment of distinct Cache Hits due to security strategies such as the elimination of the leaked cache internal information, adding some random noise into the computer interference computation process and the detection against the unfriendly √: can prevent certain attack ×: fail to prevent certain attack environment. To prevent the leakage of cache information, the design of software and hardware can consider to allocate more sophisticated cache or the using of “leakage free code”. 25 INTERNATIONAL JOURNAL OF DESIGN, ANALYSIS AND TOOLS FOR INTERGRATED CIRCUITS AND SYSTEMS, VOL. 10, NO. 1, NOVEMBER 2021 Some delays, breaks or unessential computation can be detection through Intel Cache Monitoring Technology added deliberately into the process as this useless random noise and Hardware Performance Counters. In Proceedings of the 2018 Third International Conference on Fog and will mislead the attacker. Hence, this will increase the difficulty Mobile Edge Computing, Barcelona, Spain, 23–26 April for the attacker to analysis or decrypt some sensitive 2018. information such as the extraction of cryptographic key. The [12] Gulmezoglu, B.; Moghimi, A.; Eisenbarth, T.; Sunar, B. detection of hostile environment includes the discovering of FortuneTeller: Predicting Microarchitectural Attacks via Unsupervised Deep Learning. arXiv 2019, some possible malicious modifications to cryptographic arXiv:1907.03651. process which will trigger the defensive measure. Therefore, [13] Cho, J., Kim, T., Kim, S., Im, M., Kim, T., & Shin, Y. this technique is effective to thwart against some possible (2020). Real-time detection for cache side channel attack analysis attacks. The second defense refers to the breaking of using performance counter monitor. Applied the connection between the critical information and the leaked Sciences, 10(3), 984. data. However, these proposed solutions are never secure [14] Coppens B, Verbauwhede I, Bosschere KD, Sutter BD (2009) Practical mitigations for timing-based side- enough as the attacker will always seek for new manners to channel attacks on modern x86 processors. In: 2009 30th mount the attack and some new approaches to decipher the IEEE symposium on security and privacy, pp 45–6 system signals. [15] Brickell E, Graunke G, Neve M, Seifert J (2006) Software REFERENCES mitigations to hedge AES, against cache-based software side channel vulnerabilities. IACR Cryptology ePrint [1] P. Kocher, J. Horn, A. Fogh, D. Genkin, D. Gruss, W. Archive 2006:52 Haas, M. Hamburg, M. Lipp, S. Mangard, T. Prescher, M. [16] Zhou Z, Reiter MK, Zhang Y (2016) A software approach Schwarz and Y. Yarom, "Spectre attacks: Exploiting to defeating side channels in last-level caches. In: speculative execution," 2019 IEEE Symposium on Proceedings of the 2016 ACM SIGSAC conference on Security and Privacy (SP), May 2019. computer and communications security. ACM, New [2] M. Lipp, M. Schwarz, D. Gruss, T. Prescher, W. Haas, J. York, CCS ’16, pp 871–882,. https://doi. Horn, S. Mangard, P. Kocher, D. Genkin, Y. Yarom, M. org/10.1145/2976749.2978 Hamburg, and R. Strackx, “Meltdown,” Communications [17] Zhang Y, Reiter MK (2013) Duppel: retrofitting of the ACM, vol. 63, no. 6, pp. 46–56, 2020. commodity ¨ operating systems to mitigate cache side [3] T.-H. Le, C. Canovas, and J. Clédière, “An overview of channels in the cloud. In: 20th ACM SIGSAC conference side channel analysis attacks,” Proceedings of the 2008 on computer and communications security. ACM, New ACM symposium on Information, computer and York, pp 827–838 communications security - ASIACCS '08, 2008. [18] Kim, T., Peinado, M., & Mainar-Ruiz, G. (2012). [4] He, Z., & Lee, R. B. (2017, October). How secure is your {STEALTHMEM}: System-level protection against cache against side-channel attacks? Proceedings of the cache-based side channel attacks in the cloud. In 21st 50th Annual IEEE/ACM International Symposium on {USENIX} Security Symposium ({USENIX} Security Microarchitecture, 341-353. 12) (pp. 189-204). doi:10.1145/3123939.3124546 [19] “Processor architecture security - yale university.” [5] Miao Xinliang, Jiang Liehui, Chang Rui. Review on [Online]. Available: Access-driven Cache Side Channel Attack [J]. Journal of https://caslab.csl.yale.edu/tutorials/acaces2019/acaces20 Computer Research and Development, 2020, 57(4): 824. 19_proc_arch_sec_part-3.pdf. [6] Gruss, D., Spreitzer, R., & Mangard, S. (2015). Cache [20] Deng, S., Xiong, W., & Szefer, J. (2019). Analysis of template attacks: Automating attacks on inclusive last- Secure Caches Using a Three-Step Model for Timing- level caches. In 24th {USENIX} Security Symposium Based Attacks. Journal of Hardware and Systems ({USENIX} Security 15) (pp. 897-912). Security, 3(4), 397-425. doi:10.1007/s41635-019-00075- [7] Wang C, Wei S, Zhang F, & Song K. (2021). A review of 9 cache side channel defense. Computer Research and [21] M. Yan, B. Gopireddy, T. Shull, and J. Torrellas, “Secure Development, 58(4), 794. hierarchy-aware cache replacement policy (sharp),” [8] Osvik, D. A., Shamir, A., & Tromer, E. (2006, February). Proceedings of the 44th Annual International Symposium Cache attacks and countermeasures: the case of AES. on Computer Architecture, 2017. In Cryptographers’ track at the RSA conference (pp. 1- [22] Liu, F., & Lee, R. B. (2014). Random Fill Cache 20). Springer, Berlin, Heidelberg. Architecture. 2014 47th Annual IEEE/ACM International [9] Schwarz, M., Weiser, S., Gruss, D., Maurice, C., & Symposium on Microarchitecture, 203-215. Mangard, S. (2017, July). Malware guard extension: doi:10.1109/micro.2014.28 Using SGX to conceal cache attacks. In International [23] Zhang, T., & Lee, R. B. (2014). New Models of Cache Conference on Detection of Intrusions and Malware, and Architectures Characterizing Information Leakage from Vulnerability Assessment (pp. 3-24). Springer, Cham. Cache Side Channels. Annual Computer Security [10] Chiappetta, M.; Savas, E.; Yilmaz, C. Real time detection Applications Conference (ACSAC). of cache-based side channel attacks using hardware [24] “Side-channel attacks: strategies and defenses,” Intertrust performance counters. Appl. Soft Comput. 2016, 49, Technologies, 18-Aug-2021. [Online]. Available: 1162–1174. https://www.intertrust.com/blog/side-channel-attacks- [11] Mohammad-Mahdi, B.; Thibaut, S.; Marc, L.; Sudholt, strategies-and-defenses/. M.; Menaud, J. Cache-based side channel attacks 26 INTERNATIONAL JOURNAL OF DESIGN, ANALYSIS AND TOOLS FOR INTERGRATED CIRCUITS AND SYSTEMS, VOL. 10, NO. 1, NOVEMBER 2021 Using Twitter Sentiment Analysis to Assess US Airline Industry Aayush Srivastava, Ou Liu Abstract - Sentiment analysis is extremely beneficial in exponential rate. Because Sentiment Analysis data can have a social media monitoring as it gives an overview of the wider lot of commercial value, even a minor improvement in public opinion behind a certain topic. Same applies to the implementation performance can add up to a lot of money. As Airline Industry which is a major contributor to world’s a result, two Supervised Machine Learning models best suited GDP. Therefore, there is a significant need to analyse according to the dataset are compared in this work, as adapting customer data in this sector to get people’s opinion about models for various business use-cases and domains is a the dominant players in Airline Industry. significant need. In this research, a dataset consisting of Twitter Tweets The proposed approach is implemented using the US Airline for 6 major US Airlines is used to perform Sentiment Dataset after examining past work on the topic and highlighting Analysis. Unsupervised Models are used for Aspect and the most effective approaches. Sentiment Analysis is conducted Subjective Level Sentiment Analysis to determine the to determine the Percentage of Negative Aspect for each airline words used to describe each sentiment. Further, two and determining a classification model specific to the twitter Supervised Learning Models (SVC and Naïve Bayes airline data. The aim of this research is to extract valuable Classifiers) are compared to determine which model is the information from Twitter Airline data by classifying customer best fit for the Airline Industry and can be used to predict responses to airline services in order to provide constructive the sentiment of future tweets. This research would help the feedback to airlines on their services and to help customers Airline Industry to improve on their existing services as choose the best airline. they can analyse what negative aspects customers are talking about and also help the customers choose the best II. LITERATURE REVIEW airline for their travel. Sentiment analysis has been used in a variety of fields in recent years, from assessing movie reviews to stock marketing I. INTRODUCTION analysis, but there has been little research done in the airline Sentiment Analysis is an important tool for customer business. Wu and Liao [2] examined the leading and trailing relationship management. It is the process of identifying metrics for 38 airline businesses using Data Envelopment opinions expressed in a piece of text, by applying Natural Analysis (DEA) and annual business reports. Similarly, Language Processing and Text Mining Techniques, especially Hannigan et al. [3] used the annual business reports to examine to determine the polarity of the text i.e., positive, negative or the association between several performance parameters in neutral. Twitter serves as a good source to get customer United States of America based airlines from 1996 to 2011. feedback, opinions and to perform Sentiment Analysis. Since Using regression models, they discovered a positive association Airline Industry is one of the largest and leading industries in between the price and performance over time, as well as a the world, with approximately 2,250,000 passengers travelling negative relationship with the quality of the services. The every day in the United States of America [1], for this project, authors address future research in terms of studying various the focus is set on six major US Airlines i.e., Southwest, data sources that could provide new insights into the study. American, United, Delta, US Airways, and Virgin America. Sultan and Simpson Jr. [4] compiled data from surveys of US The data is taken from the standard Kaggle Dataset: Twitter US and European travellers and developed the SERVQUAL model Airline Sentiment which was released by CrowdFlower, based on customer expectations and opinions for airline containing 14640 tweets. performance. The reviews were divided into five categories in For data gathered from Twitter, it seems that there is a the model: Tangibles, Reliability, Responsiveness, Assurance, consensus that machine learning algorithms, using relatively and Empathy. Min and Min [5] discovered 18 factors that can simple feature generating methods may produce invaluable be used as a standard in airline services. Anitsal et al. [1] study findings for research purpose and for use in production the sentiment of passengers for the top 10 US airlines from environments with real-life use-cases. Proper and efficient Skytrax, building on the work of Min and Min [5]. implementation is necessary as such applications provide Li [6] focuses on the gaps in Sentiment Analysis for the extremely valuable data for businesses. The amount of data airline review site Skytrax in order to achieve industry collected by microblogging services like Twitter is rising at an standards such as dependability, discriminating validity, and external validity. The study demonstrates how Skytrax research Both authors are with Operations and Information Management Department, based on numerical ratings is unreliable for judging any Aston University, Birmingham, UK (email: 200167828@aston.ac.uk; airline's performance. According to Kaur and Duhan [7], o.liu@aston.ac.uk ). 27 INTERNATIONAL JOURNAL OF DESIGN, ANALYSIS AND TOOLS FOR INTERGRATED CIRCUITS AND SYSTEMS, VOL. 10, NO. 1, NOVEMBER 2021 negation handling, domain generalisation, pronoun resolution, such as automated advertising or political propaganda. The data language generalisation, related knowledge, and mapping itself could be unethical in certain situations. When data is slangs are six critical issues that occur when transforming written in a language other than English, it may take on a textual information into a meaningful analysis. Li [6] has done completely different meaning when translated into English. As outstanding work in overcoming these obstacles to a reasonable a result, it should be avoided because the data could result in degree, but there is still potential for error when dealing with model failure. We can't control data bias, but we can be bogus or spam reviews. This study will aid subjective level conscious of it for better research understanding. analysis, as document-based analysis does not allow for the extraction of features from the content. There has been a lot of B. Data pre-processing research done on LCCs, but it has all been done through Data pre-processing is the process of transforming data into passenger surveys. Yee Liau and Pei Tan [8] on the other hand, a format that can be understood. The data gathered from Twitter analysed consumer opinions for LCCs in Malaysia using the is frequently inconsistent, containing missing values, outliers, microblogging site Twitter; Twitter tweets were collected for 2 and anomalies that must be addressed prior to any analysis. months for the Airlines. As the majority of the tweets were After multiple rounds of pre-processing, cleaned data is written in Malayan, a Malay lexicon was created for Sentiment acquired. Words from the English Vocabulary are subjected to Analysis, and the tweets were further divided into four clusters: Sentiment Analysis. As a result, the model's input data must be flight cancellations, ticket promotions, customer satisfaction, cleaned and validated for quality. and delays. Our data pre-processing includes the following steps: text Twitter data is well-researched as it is very approachable data normalization, tokenization, stop-words handling, Stemming / source for valuable information. This research aims to extract Lemmatisation, N-gram converting, Infrequent word filtering, valuable information from Twitter Airline data by classifying and synonym handling. After these pre-processing steps, the customer responses to airline services in order to provide dataset is reduced to seven columns, as shown in Fig. 1 constructive feedback to airlines on their services and to help customers choose the best airline. It will explain the feature engineering process, which includes pre-processing approaches for transforming text into data that machine learning algorithms can analyse. III. SENTIMENT ANALYSIS AND EVALUATION A. Data Source In this project, the data is sourced from standard Kaggle dataset: Twitter US Airline Sentiment released by CrowdFlower. It contains a total of 14640 tweets for six major US Airlines that are: Southwest, American, US Airways, Delta, United and Virgin America. The tweets were a mix of positive, negative and neutral sentiment. Primary data collection processes such as interviews, surveys, site visits, etc. are not required for this work. It's possible that tweets include erroneous information or are biased in some way. These tweets may contain nonsensical data, Fig 1: Preprocessed Dataframe 28 INTERNATIONAL JOURNAL OF DESIGN, ANALYSIS AND TOOLS FOR INTERGRATED CIRCUITS AND SYSTEMS, VOL. 10, NO. 1, NOVEMBER 2021 C. Data Analysis and Visualization We first create a Dataframe for each airline and calculating the number of neutral, positive, and negative entries for each. The table/figure below describe the sentiment count per airline. Fig. 2: Sentiment Count per Airline Fig. 4: Individual Description of Sentiment per Airline We can notice right away that United has the highest total number of negative comments. This isn’t however enough to establish that United receives the most negative feedback. Because if we look at US airlines, it appears that US Airways has a higher percentage of bad reviews than United. We must still consider the favourable feedback they have received. When determining which airline is the worst, we must consider not only the negative but also the total counts. The airline with the most negative reviews is not necessarily the worst. Assume that there are two airlines. One airline has the most negative feedback, despite the fact that it has a significant overall amount of feedback because there are many positive and neutral feedbacks as well. The other airline has fewer negative reviews, but it also has fewer positive or neutral reviews. To display the word frequencies for negative feedback, neutral feedback, and positive feedback, we built three bar charts. Fig. 3: Sentiment Count per Airline Bar Chart It can be said that neutral count, positive count, and negative count differ from one another based on this graph. We utilise a bar chart to make the three counts of different airlines more visible. United Airlines appears to have the most negative feedback, based on a count. For easier viewing, we can divide the bar chart above into three independent charts. The neutral count, positive count, and negative count charts are the three types of charts. 29 INTERNATIONAL JOURNAL OF DESIGN, ANALYSIS AND TOOLS FOR INTERGRATED CIRCUITS AND SYSTEMS, VOL. 10, NO. 1, NOVEMBER 2021 Fig. 5. Negative word frequency Bar Chart The top 50 most common words in negative feedback can be seen in the bar chart above. The words “service”, “hold”, “hours”, and “cancelled” appear substantially more frequently than other terms. This suggests that when individuals are dissatisfied with an airline's service, they are more likely to criticise it. Also, the word "cancelled" implies that flight cancellation is the most common reason for passengers to express their dissatisfaction on Twitter. We can not only see the frequency rank of terms in feedback from a bar chart like this, but we can also tell which words have a clearly higher frequency than others. We now describe the neutral feedback. 30 INTERNATIONAL JOURNAL OF DESIGN, ANALYSIS AND TOOLS FOR INTERGRATED CIRCUITS AND SYSTEMS, VOL. 10, NO. 1, NOVEMBER 2021 Figure 6: Neutral word frequency Bar Chart The top 50 most frequent words in neutral feedback can be seen The top 50 most frequent words in favourable comments can in the bar chart above. “Need”, “please”, “dm”, and “flights” be seen in the bar chart above. “Thank”, “thanks”, and “great” are among the words that appear more frequently than others. are among the words that appear more frequently than others. This demonstrates that when passengers have a great flight The word frequency of positive feedbacks are shown below. experience, they are more likely to praise the airline on social media. Figure 7: Positive words frequency Bar Chart 31 INTERNATIONAL JOURNAL OF DESIGN, ANALYSIS AND TOOLS FOR INTERGRATED CIRCUITS AND SYSTEMS, VOL. 10, NO. 1, NOVEMBER 2021 assist customers in selecting their preferred airline. It will also benefit US Airlines in identifying areas for improvement and In order to train (or fit) the classifier model using Supervised comparing their performance to that of their competitors in Machine Learning methods, an existing dataset with target order to get a competitive advantage in the market. values is required. Another dataset is required that is not When compared to the work of Li [6] and Yee Liau and Pei presented to the classifier during training so that the trained Tan [8], the results are more efficient. Most of their prior work classifier's performance can be evaluated on unseen data before is focused on either Document Level Sentiment Analysis or it is used in production. In this context, evaluating implies research on low-cost carriers (LLC). Extending the prior airline comparing the existing target data to the trained classifier's research, we were able perform the Aspects and Subjective predictions. Evaluating can be as easy as counting the correct Level analysis. This research differs from previous studies in predictions (to gain an overall percentage), or it might involve that it employs Supervised Machine Learning models along more sophisticated metrics such as sensitivity, selectivity, and with Unsupervised Machine Learning models; in the airline F1-score, among others. business, there is limited research on Negative Aspect Given the extraordinarily long-tailed distribution of word Detection. As a result, our study fills a gap in the literature. frequency in English (and any other natural language), some During the Exploratory Data Analysis (EDA), we discovered words may appear in practically every document, while others that US Airways has significant Customer service issues may be extremely rare. This results in a significant numerical followed by American and United Airlines. This is verified imbalance in the dataset representation matrix, and in rare when we perform Sentiment Analysis and detect the Negative Aspect Percentage for each airline where we get similar results. situations, it can even cause software limits (dividing small The negative words that are used in the tweets can be seen when numbers by very large numbers can result numbers which are we have the Bar Chart of top 50 negative words. impossible to represent in a given software architecture, so some calculation simple cannot be made). Some classifiers (for V. CONCLUSIONS example, Naive Bayes) can handle this situation successfully, but not all of them. As a result, the so-called 'term frequency– In this research, we looked at the previous work in this domain and extended the research to perform the Aspect Level inverse document frequency' approach is frequently used to Sentiment Analysis. We were able to determine the Percentage normalise the bag-of-words matrix. The following function is of Negative Feedback of all airlines and concluded that US applied to each point in the bag of words matrix: Airways is the worst airline to travel, followed by American Airlines. TFIDF(t, d, D) = tf(t, d) * idf(t, D) Thereafter, we performed the Subjective Level Sentiment Where ‘tf’ is the frequency of a term in a single document, Analysis and determined the words which are being used to and ‘idf’ is the global, inverse frequency of a term across all describe the sentiment of a tweet. We can see that “help”, documents. To put it another way, tokens that are relatively rare “service”, “hold”, “hours”, and “cancelled” were the top five but specific to a document receive a high score, whereas tokens negative terms. We weren't expecting to see words like “no”, that are overly frequent receive a low score. since they might be considered as Stopwords, but it makes We can attempt both the linear SVC (Support Vector perfect sense for them to appear frequently in negative remarks. Classification) model and the Naive Bayes model, according to “Need”, “please”, “help”, “flight”, and “thanks” are the top five the machine learning map page [9], and choose the one with the neutral words. This makes sense because question words are highest accuracy. generally neutral. Hence, we vectorise the training data and give is as input to SVC and Naïve Bayes classifiers. As the result, the SVC model “Thank”, “thanks”, “great”, “love”, and “service” are the top (0.805) outperforms the Naive Bayes (0.758) model in terms of five positive phrases. This makes sense because many accuracy. Now, the SVC model may be used to forecast future favourable remarks are simply thank you notes to the airlines. airline sentiments in tweets. This may make data analysts' jobs The research has analysed the negative aspects of the airlines, a lot easier, and it can also assist airlines figure out what's which are helpful to the customers in choosing the best airline causing their bad feedback. They may then improve their for their travel purpose. The word description can be analysed flights, services, and overall quality as a result. Finally, by by the airline industry to improve on their existing services properly evaluating such customer feedback data, they may since it provides immediate feedback from customers who use increase their earnings. a specific airline. This refers to Unsupervised Machine Learning. IV. DISCUSSION Finally, SVC model can be used to predict customer sentiment To the best of knowledge, this research on the airline industry, from the future tweets as it showed an accuracy of particularly using Aspect and Subjective Sentiment Analysis, approximately 80 percent with the current dataset. We used has been limited in the past. This study will fill in the gaps and 32 INTERNATIONAL JOURNAL OF DESIGN, ANALYSIS AND TOOLS FOR INTERGRATED CIRCUITS AND SYSTEMS, VOL. 10, NO. 1, NOVEMBER 2021 labelled data for training and testing the SVC classifier. This REFERENCES refers to Supervised Machine Learning approach. [1] Anitsal, M., Anitsal, I. and Anitsal, S. (2017). A sentiment This research presents a lot of room for improvement in the analysis of air passengers of top ten us based airlines. future. Furthermore, this project can be implemented in other Atlantic Marketing Association Proceedings, pp. 37–50. regional languages and in a variety of industries, such as education, transportation, healthcare and so on. This work can [2] Wu, W.-Y. and Liao, Y.-K. (2014). A balanced scorecard also be used to analyse Twitter Airline Data generated during envelopment approach to assess airlines’ performance. the Covid-19 pandemic. We can overcome these limitations by, Industrial Management & Data Systems 114(1): 123–143. Creating an experiment framework that can function [3] Hannigan, T., Hamilton III, R. D. and Mudambi, R. (2015). in tandem with commercial tools, allowing users and Competition and competitiveness in the us airline industry, administrators to customise Sentiment Analysis Competitiveness Review 25(2): 134–155. software to their own needs. Creating a multilingual framework that can identify [4] Sultan, F. and Simpson Jr, M. C. (2000). International general grammatical errors, spelling mistakes, etc and service variants: airline passenger expectations and include it in the data pre-processing step to further perceptions of service quality, Journal of services improve the performance of the classifiers. marketing 14(3): 188–216. Defining an abbreviation library for words that are usually used while posting content on social [5] Min, H. and Min, H. (2015). Benchmarking the service media/twitter. This would help to get the meaning quality of airlines in the united states: an exploratory from the text posted online. analysis, Benchmarking: An International Journal 22(5): 734– 751. [6] Li, G. (2017). Application of sentiment analysis: assessing the reliability and validity of the global airlines rating program, B.S. thesis, University of Twente. [7] Kaur, A. and Duhan, N. (2015). A survey on sentiment analysis and opinion mining, International Journal of Innovations & Advancement in Computer Science 4: 107– 116. [8] Yee Liau, B. and Pei Tan, P. (2014). Gaining customer knowledge in low-cost airlines through text mining, Industrial Management & Data Systems 114(9): 1344– 1359. [9] Scikit-learn.org. 2021. Choosing the right estimator — scikit-learn 0.24.2 documentation. [online] Available at: <https://scikit- learn.org/stable/tutorial/machine_learning_map/index.ht ml> [Accessed 18 July 2021]. 33 INTERNATIONAL JOURNAL OF DESIGN, ANALYSIS AND TOOLS FOR INTERGRATED CIRCUITS AND SYSTEMS, VOL. 10, NO. 1, NOVEMBER 2021 Differential Privacy in Social Network Analysis: A Systematic Literature Review Dezheng Yang, Dongkun Hou, Jie Zhang relationship of nodes and edges. Data mining and latent model Abstract— In recent years, a new network architecture called analysis of social networks may yield richer and more accurate Publish/Subscribe (Pub/Sub) is getting popular due to the information by social network analysis [7]. Most of these prevalence of IoT. The unique feature of the Pub/Sub architecture is that a broker plays the role of a server, which receives messages applied datasets, however, are related to personal privacy, and with a specific topic published by the publishers, and then so attackers can extract the individual information in the public forwards the messages to the subscribers who previously network data even if the data are mostly anonymous [8]. subscribed the topic. This can decouple the publishers and the To strengthen the security of graph anonymization methods, subscribers both in time and space. They don't even need to be aware of the existence of each other. However, once the broker some researchers attempted to add noise to social network fails, the message delivery service also fails. To solve the single graphs [9]. However, adding only some disordered noises to point of failure (SPOF) problem of the Pub/Sub model, in this the graph network could be insecure when the attacker has research we implement the hierarchical broker architecture based knowledge of the context of the network data [10]. The on ActiveMQ, an open source software. Specifically, we deploy a emergence of DP provides a new idea of privacy protection for two-layer broker cluster in an IoT environment with Raspberry Pi 3 development boards. The data is synchronized between the social network analysis. upper-layer brokers (called broker hubs), and they run in a dual- Some researches have summarized the applications of DP at active configuration to serve the publishers. The lower-layer various times and area [4] [11]. For social network analysis, brokers (just called brokers) are managed by the broker hubs, and some researchers are keener to discuss the valuable information are responsible for serving the subscribers. We conducted two experiments to test the fault tolerance capability of the that they can extract, but pay less attention to the application of hierarchical broker architecture, as well as evaluating the system DP [12] [13]. As a systematic literature review, this paper asks performance of the hierarchical broker architecture compared the following research questions (RQ) on how DP could be used with a single broker. in social network analysis, and answers the questions by Index Terms— ActiveMQ, Hierarchical Broker, Internet of studying the related literature. Things, Publish/Subscribe. • RQ1. What are privacy issues facing social network analysis? • RQ2. How can DP be applied in social network analysis? I. INTRODUCTION • RQ3. What were DP approaches applied to social Differential Privacy (DP) is a way of privacy protection network analysis? presented by Dwork et al. to protect the privacy of personal information in published data [1]. In this way, even if someone with malicious motives try to extract sensitive and personal In the subsequent sections of this paper, information on how information, they can not deduce the true privacy data. DP has papers are screened and how they are classified are described been applied to meet data privacy requirement in data mining in section II. The section III will answer the research questions [2], deep learning [3], informational medicine [4], and social raised in section I. Finally, some possible improvements and network analysis [5] [6]. DP guarantees that the attackers with problems will be discussed in section IV. a high level of knowledge can not obtain any sensitive privacy information from the datasets. II. METHODOLOGY A social network focuses on the relationship between individual and other groups related to them, and is an A. Review methods abstraction of the relationship between individuals [5]. In social As a Systematic Literature Review (SLR). The structural network graph, individuals and groups are represented as nodes, organization and searching method of this article follows the and their relationships are represented as edges. It is a powerful approach proposed in an introductory article [14]. The literature tool to understand the structure of the network, and the covered in this paper was obtained through major conference databases and Google Scholar searches, and after some screening, the time frame was limited to the post-2000 period, All authors are with the School of Advanced Technology, Xi’an Jiaotong- and only conference and journal papers with clear author Liverpool University, Suzhou, Jiangsu, P.R.China. (email: information were considered. Dezheng.Yang20@student.xjtlu.edu.com, Dongkun Hou20@student.xjtlu.edu.cn, Jie.Zhang01@xjtlu.edu.cn). 34 INTERNATIONAL JOURNAL OF DESIGN, ANALYSIS AND TOOLS FOR INTERGRATED CIRCUITS AND SYSTEMS, VOL. 10, NO. 1, NOVEMBER 2021 B. Literature classification Sensitive information is illegally obtained by attackers’ unauthorized means, which is called privacy attacks. In social In the total literature, 15 are related to traditional DP, and 11 network analysis, the most common privacy attack is the are closely related to social network analysis. 3 articles are inference attack which is to reverse the private information of about Local Differential Privacy (LDP), and 6 articles are about a member [9]. privacy Moreover, 3 pieces of conceptual literature involve Generally, there are two types of inference attacks including social network analysis. All are from academic conferences and personal attribute inference and user de-anonymization. journals. Personal attribute inference can determine whether a target node possesses a known attribute by the neighboring nodes in III. RESEARCH RESULT the network [8]. User de-anonymization determines the privacy information of a node by comparing the anonymity graph with This section focuses on answering the three research a reference graph. The reference graph may be a graph with real questions posed in the above section. Since DP has different information but fewer node attributes [16]. applicable definitions in different contexts, and the detailed C. Common privacy models in social network analysis (RQ2) descriptions are clearly given in [1] [15], so we would not re- peat them in this paper, but only briefly describe the traditional Based on RQ2, this section enumerates common privacy definition of DP. models in social network analysis and describes general terms. 1) Node Privacy: According to [17] [5], a typical social A. Preliminary of DP network graph G is composed of a given node denoted V and Theoretically DP provides a strong privacy guarantee for the the connecting lines between the nodes, called edges denoted private information of the participating investigators. It is E. The adjacency graph G2 of a given network graph G1 is defined in the context of adjacent social network graphs, which composed of G1 adding or removing query Q satisfies the implies the existence of a single node or edge or graph graph G1 = (V1, E1), G2 = (V2, E2), V2 = V1 – E(x), where difference between two graphs or sets of graphs. DP is defined Vi and Ei are all nodes and edges in Gi, and (vi, vj) represents as EQ.1. For a randomized query Q: D R, domain D and range the specific node and the edges between node i and node j R satisfied (E, δ) DP [1]. For any adjacent graph or graph set correspondingly. 𝐸(𝑥) = {(𝑣𝑖, 𝑣𝑗)|𝑣𝑖, 𝑣𝑗 ∈ 𝑉1, 𝑣𝑖 ≠ G1, G2, the result K follows: 𝑣𝑗, (𝑣𝑖, 𝑣𝑗) ∈ 𝐸1} . When a query is privatized, an attacker Pr[𝑄(𝐺1) ∈ 𝐾] ≤ 𝑒 𝜖 Pr[𝑄(𝐺2) ∈ 𝐾] + 𝛿 (1) would not be able to determine whether a person or an Where E denotes the privacy budget parameter, which is tuned investigated person appears in the dataset. DP has a natural to weigh the accuracy and privacy of query Q. In general, a adaptation to this privacy model, but also limits the sensitively smaller privacy budget represents less privacy leakage, but the computations and queries. additional noise required would increase accordingly. The 2) Edge Privacy: Similar to node privacy, a privatized parameter δ in the EQ.1 represents the probability of allowing query Q satisfies edge privacy if the number of nodes and E-DP to be broken. This probability should preferably be less information are the same. In the graphs G1 = (V1, E1) and G2 than 1/ G. When its value is 0, it means that the query Q follows = (V2, E2), denoted as V1 = V2, and E2 + E(x) = E1, |E(x)| = the most strictly defined E-DP. k [5]. In this model, two adjacent network graphs are different Another important concept of DP is sensitivity. The role of only in k edges, meaning that the adjacent graph G2 of the considering DP is to protect adjacent datasets from attackers, network graph G1 is formed by deleting or adding k edges to which essentially adds noise to obfuscate the adjacent datasets G1. When k = 1, it commonly uses single-edge privacy so that others could not distinguish the real data. The difference criterion in the existing literature [18]. It is important to note of adjacent datasets is also made sensitive, and both in terms that edge privacy has a lower privacy guarantee than node of the difference value need to make the real data produce privacy. However, it allows more multiplicity of query recognizable features. For social network analysis, the privatization than node privacy. sensitivity of a query Q : D → R is: 3) Out-link privacy: Out-link privacy is a model similar to edge privacy [5]. If the adjacent graph G2 of a network graph ∆𝑄 = max |𝑄(𝐺1) − 𝑄(𝐺2)| (2) 𝐺1,𝐺2 G1 is composed of k out-link removed or added at a point in Where G1, G2 are the adjacent datasets, graph or graph set. G1, this model satisfies out-link privacy. An attacker cannot This is a traditional definition of DP, which covers the strong determine whether someone has provided their own data. The mathematical guarantees of DP, and also, answer the RQ1. For privacy guarantee of this model is somewhere between node noise generation, a common approach is Laplace noise. privacy and edge privacy. It improves the edge privacy. For Variants on DP in different cases and noise functions would not a node, although the inner links pointing from other nodes to be presented in this paper. that node are not privatized, the node (the investigated person) could plausibly deny the relationship with other nodes and is B. Privacy issues in social network analysis (RQ1) untraceable. Out-link privacy could privatize more queries than edge privacy. 35 INTERNATIONAL JOURNAL OF DESIGN, ANALYSIS AND TOOLS FOR INTERGRATED CIRCUITS AND SYSTEMS, VOL. 10, NO. 1, NOVEMBER 2021 Table 1. Privacy model and the feasibility of corresponding privatization group aggregation. Therefore, it is necessary to privatize the targets triangle counts. Privacy Privatization Target model Triangle Degree Centrality [20] In the worst scenario, changing a node to a graph containing counting [19] distribution [5] n edges can reduce the number of triangles of the graph by 2n. Node Privacy Conditional Invalid Invalid A [23], for which the sensitivity is unbounded, both the larger [17] Edge Privacy Conditional Effective Invalid the graph, the more sensitive it may be. Therefore, node privacy [21] is usually not suitable to be used to privatize triangle counts, Out-link Effective Effective Conditional but considering the idea of smoothing sensitivity, node privacy Privacy [5]* can be made private in graphs with low maximum degree [5]. Partition Conditional Conditional Conditional Privacy [5]* For edge privacy, the worst case is changing an edge in a * Only theory is feasible, lack of cases. graph containing n nodes that is added or removed would change the number of triangles by 2n,−so theoretical edge 4) Partition Privacy: Unlike the above privacy model, privacy also does not guarantee well the privatization of the the partitioned privacy, proposed by [5], is the ensemble of number of triangles. However, given the idea of smoothing social network graphs. The adjacent dataset is no longer two sensitivity, the edge privacy model could be used when the network graphs, but the ensemble of two network graphs G1, number of changed triangles is small. But, when the number G2, where G2 = G1 - gi, and gi denotes the subgraphs in the of changed triangles is high, it produces a large error [21]. ensemble. Specifically, many social network analyses are 2) Degree Distribution: The degree distribution of a performed in the network ensemble, which divides respondents graph usually divides the nodes by the degree of the nodes in into different groups for analysis. What is privatized in partitioned privacy is some subgraph or subset of the ensemble. the graph and plots them as a histogram. Usually, it represents It offers stricter privacy protections than node privacy, and it the structure of social networks for subsequent model analysis protects the entire social network rather than the member. or similarity comparison [24]. 3) Out-link privacy: All the above models are based on For node privacy, changing a node in a graph containing k traditional DP. Nowadays, there is a generative graph model edges could affect a maximum of 2k + 1 values of the node that exploits LDP, and many researchers try to add noise to the degree distribution. Since the sensitivity depends on the information locally at the information gathering stage [22], so maximum value that a single node could affect, it is usually that the data analysts are not informed of the real data. The considered infeasible to privatize the degree distribution using noisy data would generate a generative network graph with node privacy. high similarity to the original data according to the algorithm. For edge privacy, in the case of k edge privacy (adjacent Another way is to generate network graphs with similar graphs differing by k edges), the maximum possible impact features based on the original network graph after processing is 4k. When the graph is large enough, this is a noise that could [7], and these new methods would not be discussed in detail in be ignored, and the feasibility of the approach has been this review. successfully demonstrated. [25]. 3) Centrality: In social network analysis, the centrality D. Application of DP in social network analysis (RQ3) of a node individual indicates the importance of individual in This section reviews the implementation of DP in social net- the network. The node with the highest centrality is the most work analysis. Which including DP used to privatize triangle influential and central member of the network [20]. The value counting, degree distribution and centrality. Table 1 shows the of centrality is usually determined by the number of shortest feasibility of privacy model on corresponding application. paths through a node in the graph or the distance from each For out-link privacy and partition privacy, we consider the boundary. relevant techniques that are not yet mature, because there are Measures of centrality present an obstacle to the traditional few relevant examples and only theoretical proofs are given in DP approaches in social network analysis. The distribution of [5]. And in that results, out-link privacy and partition privacy centrality is very sensitive due to the presence of bridges (key could be used to privatize the triangle counting and degree nodes that connect two subgraphs) that may appear in most distribution, but partially suitable for centrality. The principles graphs. Removing a node or an edge could lead to a complete of the two model have been roughly introduced in the previous change of the original path in the network, resulting in section, so we do not discuss them here. catastrophic errors. In general, neither node privacy nor edge 1) Triangle Counting: The triangle in the social network privacy can solve the centrality privatization problem. graph indicates that three nodes are connected to each other At this point, all the questions raised in the previous with edges [19]. The triangle count is an important parameter chapters have been answered. In the next section, the of the clustering as a result, the change in the number of problems will be reflected and discussed, and possible ways triangles is proportional to the size of the graph coefficient, to improve them will be suggested. which is a common metric for describing physical signs like 36
Enter the password to open this PDF file:
-
-
-
-
-
-
-
-
-
-
-
-