Preface

Welcome to Volume 5, Number 2 of the International Journal of Design, Analysis and Tools for Integrated Circuits and Systems (IJDATICS). This issue comprises i) enhanced and extended versions of research papers from the International DATICS Workshops in 2014, and ii) regular manuscript submissions in 2014. The DATICS Workshops were created by a network of researchers and engineers from both academia and industry in the areas of i) Design, Analysis and Tools for Integrated Circuits and Systems and ii) Communication, Computer Science, Software Engineering and Information Technology. The main aim of the DATICS Workshops is to bring together software/hardware engineering researchers, computer scientists, practitioners and people from industry to exchange theories, ideas, techniques and experiences.

This IJDATICS issue presents three high-quality academic papers. The mix provides a well-rounded snapshot of current research in the field and a springboard for future work and discussion. The three papers presented in this volume are summarized as follows:

• Text Clustering: Ciganaitė, Mackutė-Varoneckienė, and Krilavičius investigate the performance of bag-of-words and term frequency - inverse document frequency representations, together with three common clustering algorithms, on less-studied languages such as Azeri or Lithuanian.

• Feedback Control System: Lei, Lee, Kwan, Lee, Huang, and Kwok propose and build a smart three-layer hierarchical power conversion system for laboratories as a means of enhancing cross-disciplinary integrated design coursework.

• HDL SoC Diagnostics: Hahanov, Abbas, Litvinova, and Chumachenko describe a technology for diagnosing SoC HDL-models based on a transaction graph.

We are grateful to all of the authors for their contributions to Volume 5, Number 2 of IJDATICS. We would also like to thank the IJDATICS editorial team.

Editors:
Ka Lok Man, Xi'an Jiaotong-Liverpool University, China
David Afolabi, Xi'an Jiaotong-Liverpool University, China

Table of Contents, Vol. 5, No. 2, December 2014

Preface .......................................................................... i
Table of Contents ............................................................... ii
1. Text Document Clustering
   G. Ciganaitė, A. Mackutė-Varoneckienė, T. Krilavičius ....................... 1
2. Undergraduate Cross-Discipline Integrated Learning through Designing a Smart Hierarchical Power Conversion System
   Chi-Un Lei, Christopher H.T. Lee, T.O. Kwan, C.K. Lee, K.B. Huang, R.Y.K. Kwok ... 6
3. HDL SoC TAB-model for Diagnosis and Repair
   Vladimir Hahanov, Baghdadi Ammar Awni Abbas, Eugenia Litvinova, Svetlana Chumachenko ... 9

Text Document Clustering
G. Ciganaitė, A. Mackutė-Varoneckienė, T. Krilavičius

Abstract — Document clustering is a well-known technique for improving information retrieval results, organizing them, and browsing a collection of documents. It is quite well investigated for English and some other popular languages; however, the performance of different techniques, combining features, their selection and clustering methods for less popular languages, such as Azeri, Lithuanian or Russian, is not known.
In this paper we compare the performance of two document representations, namely word frequency in documents (bag of words, BOW) and term frequency - inverse document frequency (TF-IDF), and three clustering algorithms (k-means, spherical k-means and hierarchical clustering with Ward linkage) with Euclidean and cosine distances. The performance was evaluated using a number of criteria: precision, recall, F-score, Rand index and purity. Overall, the best results were achieved using the TF-IDF document representation, cosine dissimilarity and spherical k-means.

Index Terms — Text document clustering; distance measures; k-means algorithm; spherical k-means algorithm; hierarchical clustering with Ward linkage.

I. INTRODUCTION

Document clustering is a well-known technique for grouping documents by their similarity (or dissimilarity) [1], [2]. The goal of clustering is to group objects into clusters such that objects within a cluster are as homogeneous as possible, while objects in different clusters clearly differ from each other [3], [2]. Document clustering is well researched for English and some other popular languages [1], [2], and is applied for improving information retrieval results [2], [4]. As in most cases in Natural Language Processing, clustering is not language agnostic; therefore we investigate the performance of different document representations and clustering techniques for the Azeri language [5], [6], [7]. The most common clustering methods are k-means [2] and spherical k-means [8] for flat (partitional) clustering, and hierarchical clustering [2]; therefore we experiment with them in combination with the most common document representations, namely bag of words (BOW) [9] and TF-IDF [7], [10], [11]. The performance of the techniques is compared with each other and with a random baseline using a number of criteria: precision, recall, F-score, Rand index and purity [2].

Document representation techniques are discussed in Section II, measures of the relation between documents are described in Section III-A, the k-means, spherical k-means and hierarchical clustering algorithms are presented in Sections III-B – III-D, respectively, and clustering quality evaluation indexes are described in Section III-E. Experimental results are given in Section IV. Conclusions and future plans are given in Section V.

G. Ciganaitė and A. Mackutė-Varoneckienė – Vytautas Magnus University, Faculty of Informatics, Kaunas, Lithuania. T. Krilavičius – Vytautas Magnus University, Faculty of Informatics, Kaunas, Lithuania, and Baltic Institute of Advanced Technology, Vilnius, Lithuania. The research was partially funded by ESFA (VP1-3.1-MM-10-V-02-025).

II. DOCUMENT REPRESENTATION

The feature set is represented by an n × m matrix with m features (columns) and n objects (rows), where usually n < m. In this case, features are words (t) and objects are documents (d ∈ D, where D is the set of documents). The best known method to form the feature matrix is Bag Of Words (BOW). This method counts the frequency ω_{t,d} of words in a document; the most popular variant is the simple frequency ω_{t,d} = f(t, d). The problem with BOW is that words which occur many times but are not significant can have a large influence on the dissimilarity measure. The solution is TF-IDF [11], [10]. It reduces the influence of insignificant words on the distance measure and increases the influence of significant words. The TF-IDF weight is the product of the frequency and idf_{t,D} = ln(N/n), where N is the number of documents in D and n = |{d ∈ D : t ∈ d}| is the number of documents which contain the word t; then

tfidf(t, d, D) = ω_{t,d} · idf_{t,D} [11], [12], [2].

In practice, further methods can be used, for example stemming, but not all languages have tools for such preprocessing, while BOW does not require any information about the properties of a language.
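The BOW and TF-IDF weighting defined above can be made concrete with a short, self-contained sketch. The corpus, tokenizer and variable names below are invented for illustration (the paper's experiments were run in R, not Python); the weights follow ω_{t,d} = f(t, d) and idf_{t,D} = ln(N/n).

```python
# Minimal sketch: build a BOW matrix and re-weight it with TF-IDF,
# using idf_{t,D} = ln(N / n_t) and tfidf(t, d, D) = w_{t,d} * idf_{t,D}.
import math
from collections import Counter

docs = [
    "economy growth market economy",
    "sport match goal sport",
    "market policy economy",
]

tokenized = [d.split() for d in docs]                  # naive whitespace tokenizer
counts = [Counter(doc) for doc in tokenized]           # word frequencies per document
vocab = sorted({t for doc in tokenized for t in doc})  # feature set (words)

# BOW: w_{t,d} = f(t, d), the raw frequency of word t in document d
bow = [[c[t] for t in vocab] for c in counts]

# document frequency n_t = |{d in D : t in d}| and idf_{t,D} = ln(N / n_t)
N = len(tokenized)
n_t = {t: sum(1 for doc in tokenized if t in doc) for t in vocab}
idf = {t: math.log(N / n_t[t]) for t in vocab}

# TF-IDF: w_{t,d} * idf_{t,D}
tfidf = [[row[j] * idf[t] for j, t in enumerate(vocab)] for row in bow]

print(vocab)
print(bow[0])
print([round(x, 3) for x in tfidf[0]])
```

Words that occur in every document get idf = ln(1) = 0, which is exactly the damping of insignificant terms described above.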
III. CLUSTERING METHODS

In this paper we use the most popular clustering techniques: k-means, spherical k-means and hierarchical methods, which are defined in Sections III-B–III-D, respectively. We have an n × m matrix of features, the set of classes C = (C_1, C_2, ..., C_k), and the assignment of each document to one of them. The main goal of clustering is to group documents into clusters that correspond to their classes, but we do not depend on the labels of the classes or of the clusters we obtain. This means that even if documents are grouped into cluster C_i while we know that all of them belong to class j (i ≠ j), the clustering is still considered correct.

A. Distance measure

A distance measure in clustering is a numeric expression of the similarity or dissimilarity between documents d_1 = (ω_{t_1,d_1}, ω_{t_2,d_1}, ..., ω_{t_m,d_1}) and d_2 = (ω_{t_1,d_2}, ω_{t_2,d_2}, ..., ω_{t_m,d_2}). In the case of a binary feature representation, where ω_{t,d} = 1 when the word occurs in a document and ω_{t,d} = 0 otherwise, the Jaccard index [2], [3] can be used:

dist_J(d_i, d_j) = 1 − |d_i ∩ d_j| / |d_i ∪ d_j|   (1)

In other cases the most popular measure is the Minkowski distance:

dist(d_i, d_j) = (∑_{k=1}^{m} |ω_{t_k,d_i} − ω_{t_k,d_j}|^p)^{1/p},

where k = 1, 2, ..., m and i, j = 1, 2, ..., n. When p = 1, the Minkowski distance is called the Manhattan distance:

d_M(d_i, d_j) = ∑_{k=1}^{m} |ω_{t_k,d_i} − ω_{t_k,d_j}|   (2)

and when p = 2, it is the Euclidean distance [13], [14], [2], [3]:

d_E(d_i, d_j) = sqrt(∑_{k=1}^{m} (ω_{t_k,d_i} − ω_{t_k,d_j})^2)   (3)

For text documents the cosine distance [15], [16], [2], [3] is often used:

d_cos(d_i, d_j) = 1 − (∑_{k=1}^{m} ω_{t_k,d_i} · ω_{t_k,d_j}) / (sqrt(∑_{k=1}^{m} ω_{t_k,d_i}^2) · sqrt(∑_{k=1}^{m} ω_{t_k,d_j}^2))   (4)

B. K-means

K-means is an iterative technique whose basic aim is to assign documents to the nearest centroid (the object which is considered the center of a cluster). Usually, k-means uses the Euclidean distance to determine document interrelation. The method works in 4 steps [17], [2], [3]:

Step 1: select the number of clusters k. In this paper we assume that the number k is known. Naturally, in general it is not known, and different techniques are used to determine it, e.g. the elbow method, silhouette, or the gap statistic [18], [19].

Step 2: assign all documents to the nearest centroid c_i, i = 1, 2, ..., k. The initial centroids are selected at random among the documents, and the other documents are assigned to the centroid to which the distance is shortest. As a result of the random initial centroids, the outcome of the method will usually differ between runs.

Step 3: recalculate the centroids c_i. In this step the method calculates the mean of each cluster individually, and this value is taken as the new centroid. In Step 2 the centroids are actual documents; after the first recalculation a centroid need not be a document but is a point in the space R^m.

Step 4: repeat the 2nd and 3rd steps until the centroids no longer change their position and documents no longer migrate between clusters.

The main criterion of k-means clustering is the SSE:

SSE = ∑_{i=1}^{k} ∑_{d ∈ C_i} dist(c_i, d)^2   (5)

where c_i is the centroid of cluster C_i and d is a document in cluster C_i. Clustering can then be treated as an optimization problem in which minimizing the SSE is the objective function [1], [2].
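The k-means procedure above translates directly into a short routine. The sketch below is an illustration on a toy matrix rather than the paper's corpus: it follows Steps 1–4 with the Euclidean distance (3) and reports the SSE criterion (5); swapping in the cosine dissimilarity (4) only requires replacing the distance function.

```python
# Illustrative k-means loop (Section III-B): random initial centroids among the
# documents, assignment to the nearest centroid, centroid recomputation as the
# cluster mean, and a stop condition when centroids no longer move.
import numpy as np

def euclidean(a, b):
    return np.sqrt(np.sum((a - b) ** 2))

def kmeans(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # Step 2: random initial centroids
    for _ in range(max_iter):
        # Step 2: assign each document to the nearest centroid
        labels = np.array([np.argmin([euclidean(x, c) for c in centroids]) for x in X])
        # Step 3: recompute centroids as cluster means (points in R^m, not documents)
        new_centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                  else centroids[j] for j in range(k)])
        if np.allclose(new_centroids, centroids):              # Step 4: stop when stable
            break
        centroids = new_centroids
    sse = sum(euclidean(x, centroids[l]) ** 2 for x, l in zip(X, labels))  # criterion (5)
    return labels, centroids, sse

X = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels, centroids, sse = kmeans(X, k=2)
print(labels, round(sse, 4))
```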
C. Spherical k-means

Spherical k-means (SK-means) is a popular method for document clustering. Its execution is quite similar to k-means, but SK-means uses cosine similarity and operates on vectors that lie on the unit sphere [20], [21]. Let us assume that the number of clusters k is known.

Step 1: initialize the centroid set {c_1^(0), c_2^(0), ..., c_k^(0)}; c_i^(t) is the mean vector of a cluster normalized to unit Euclidean norm:

c_i^(0) = ∑_{x ∈ π_i^(0)} x / ||∑_{x ∈ π_i^(0)} x||,

where π_i^(0) is the initial document partition, t = 0, 1, ... is the iteration index, and i = 1, 2, ..., k.

Step 2: assign documents to the closest centroid by cosine similarity.

Step 3: recompute the centroids corresponding to the partition obtained in Step 2:

c_i^(t+1) = ∑_{x ∈ π_i^(t+1)} x / ||∑_{x ∈ π_i^(t+1)} x||.

Step 4: repeat the 2nd and 3rd steps while the increase of the objective function Φ({π_i}_{i=1}^{k}) is greater than ε. The objective function is

Φ({π_i}_{i=1}^{k}) = ∑_{i=1}^{k} ∑_{x ∈ π_i} x^T c_i.

D. Hierarchical clustering

The main aim of hierarchical clustering is to create a hierarchy of clusters. Usually, agglomerative or divisive [15], [1] approaches are used. In this paper we use an agglomerative method, because it is less complex [2], [3]. The agglomerative hierarchical clustering algorithm consists of 4 steps [2]:

Step 1: calculate the proximity matrix. This is an n × n matrix which describes the distances between documents. Distances can be calculated by one of the measures described above.

Step 2: merge the closest pair of clusters. According to the proximity matrix, the closest pair of documents is merged into one cluster.

Step 3: update the proximity matrix. While every document is in a separate cluster, merging is a very simple operation: the closest pair of documents is merged into one cluster. After the first iteration, the similarity between clusters has to be calculated, and one of the following criteria is usually used. The single link method interprets similarity as the shortest segment between documents in separate clusters:

dist_s(U, V) = min_{x_i ∈ U, y_j ∈ V} dist(x_i, y_j),   (6)

According to the complete link method, similarity is the longest segment between documents in separate clusters:

dist_c(U, V) = max_{x_i ∈ U, y_j ∈ V} dist(x_i, y_j),   (7)

With average link, similarity is the distance averaged over all pairs of documents in the two clusters:

dist_a(U, V) = (∑_{x_i ∈ U} ∑_{y_j ∈ V} dist(x_i, y_j)) / (n_U n_V)   (8)

With the Ward method, two clusters are the closest if the SSE increases the least after merging them:

dist_w(U, V) = ||Ū − V̄||^2 / (1/n_U + 1/n_V)   (9)

Here U, V are clusters, x_i, y_j are documents, n_U, n_V are the numbers of elements in clusters U and V respectively, and Ū, V̄ are the means of clusters U and V respectively.

Step 4: repeat the 2nd and 3rd steps until only one cluster is left.

Experiments have shown that the results of single link, complete link and average link are very poor, while the results of the Ward method and k-means are quite similar.
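For comparison with the k-means sketch above, the following illustration (using scipy/scikit-learn rather than the R tools employed in the paper) shows one common way to approximate spherical k-means, namely running ordinary k-means on L2-normalized rows, since Euclidean distance on the unit sphere orders pairs the same way as cosine dissimilarity, together with agglomerative clustering under Ward linkage (9) cut into k flat clusters. Note that true spherical k-means additionally re-normalizes each centroid, as in Step 3 of Section III-C.

```python
# Sketch of the two remaining methods on a toy feature matrix.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import normalize
from scipy.cluster.hierarchy import linkage, fcluster

X = np.random.default_rng(0).random((20, 5))   # toy matrix: 20 documents, 5 terms
k = 3

# "Spherical" k-means approximation: project documents onto the unit sphere,
# then run ordinary k-means (cosine dissimilarity and Euclidean distance agree
# on the unit sphere up to a monotone transformation).
X_unit = normalize(X, norm="l2")
skm_labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_unit)

# Hierarchical clustering with Ward linkage, cut into k flat clusters.
Z = linkage(X, method="ward")
hw_labels = fcluster(Z, t=k, criterion="maxclust")

print(skm_labels)
print(hw_labels)
```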
E. Clustering Quality Evaluation

If documents are assigned to the right classes, the model can be considered suitable for the document set. Indexes (evaluation measures) are the standard way to check this suitability. In this paper we use the following measures: precision, recall, F-measure, purity and the Rand index. All of them are computed from a confusion matrix; the following contingency table is formed in accordance with it [3], [15], [1], [2]:

TABLE I
CONTINGENCY TABLE

              | Relevant            | Not relevant
Retrieved     | True positive (tp)  | False positive (fp)
Not retrieved | False negative (fn) | True negative (tn)

Assuming that the real classes are known, the contingency table allows the following indexes to be estimated.

Precision – the fraction of selected documents that are relevant [2]:

P = tp / (tp + fp) ∈ [0, 1]   (10)

Recall – the fraction of relevant documents that are retrieved [2]:

R = tp / (tp + fn) ∈ [0, 1]   (11)

The F-measure is the harmonic mean of P and R [2]:

F = 2PR / (P + R)   (12)

The index F varies in the interval [0, 1]; a higher value means better clustering.

The Rand index measures the probability of correct decisions:

RI = (tp + tn) / (tp + fp + fn + tn) ∈ [0, 1]   (13)

Purity:

purity(U, V) = (1/n) ∑_i max_j |u_i ∩ v_j|   (14)

Here U = (u_1, u_2, ..., u_i) and V = (v_1, v_2, ..., v_j) are clusters. The index sums the number of objects of the most popular class in each cluster and divides by the total number of objects n; purity(U, V) ∈ [0, 1], and a higher value means better clustering.
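One way to compute these indexes for a clustering is to count tp, fp, fn and tn over all document pairs (same cluster vs. same class), which matches the contingency-table view used for the Rand index. The sketch below is an illustrative implementation under that pairwise interpretation, not the paper's R code; purity (14) is computed directly from the majority class of each cluster.

```python
# Pairwise precision (10), recall (11), F (12), Rand index (13) and purity (14).
from collections import Counter
from itertools import combinations

def pair_confusion(labels_true, labels_pred):
    tp = fp = fn = tn = 0
    for (t1, p1), (t2, p2) in combinations(list(zip(labels_true, labels_pred)), 2):
        same_class, same_cluster = (t1 == t2), (p1 == p2)
        if same_cluster and same_class:
            tp += 1
        elif same_cluster:
            fp += 1
        elif same_class:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def scores(labels_true, labels_pred):
    tp, fp, fn, tn = pair_confusion(labels_true, labels_pred)
    P = tp / (tp + fp)
    R = tp / (tp + fn)
    F = 2 * P * R / (P + R)
    RI = (tp + tn) / (tp + fp + fn + tn)
    # purity: majority true class per predicted cluster, summed and divided by n
    n = len(labels_true)
    purity = sum(Counter(t for t, p in zip(labels_true, labels_pred) if p == c)
                 .most_common(1)[0][1] for c in set(labels_pred)) / n
    return P, R, F, RI, purity

true = [0, 0, 0, 1, 1, 2, 2, 2]
pred = [1, 1, 0, 0, 0, 2, 2, 2]
print([round(v, 3) for v in scores(true, pred)])
```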
IV. EXPERIMENTAL EVALUATION

A. Dataset

All experiments were performed with an Internet media text corpus in the Azeri language. The corpus consists of 3143 documents, and each of them is characterized by 118714 features. Table II describes the structure of each class.

TABLE II
STRUCTURE OF EACH CLASS

Class ID | Title         | # of Documents
(1)      | Economy       | 500
(2)      | Energetics    | 328
(3)      | Disasters     | 490
(4)      | Policy        | 500
(5)      | Entertainment | 480
(6)      | Sport         | 495
(7)      | Technology    | 350
Overall  |               | 3143

B. Experiments

The clustering algorithms were executed with two different distance measures and two feature representation methods, and the clustering results were evaluated using the measures of Section III-E. The statistical tool R [22] was chosen for the experiments. First of all we formed a 3143 × 118714 BOW feature matrix. To reduce clustering time and memory requirements we removed terms which occur in more than 2/3 of the documents; features which occur in fewer than 3 documents were removed as insignificant as well. Thus, only 34.88% of the features were left, and the size of the reduced feature matrix is 3143 × 41406. From this matrix the TF-IDF feature matrix was calculated.

We use a random baseline for the evaluation of results: each document is assigned to one of the 7 classes at random, ten times. We then estimated the values of the indexes and present the results as mean and standard deviation. Clustering results are presented in Table III: HWE – hierarchical Ward method with Euclidean distance, HWC – hierarchical Ward method with cosine distance, KM – k-means with Euclidean distance, SKM – spherical k-means with cosine similarity. Experiments with the KM and SKM algorithms were repeated 10 times and the results are presented as mean ± standard deviation of the index values.

The high values of all indexes in Table III show that the cosine distance is a very effective and appropriate dissimilarity measure for document clustering, regardless of the document representation. Comparing the results of BOW and TF-IDF, it can be said that the TF-IDF feature representation is more effective: execution time is shorter, and the SKM method with the TF-IDF matrix produced the best clustering of all the methods tested. It is possible that a deeper study of the representation matrix could make SKM execution time shorter and positively affect the clustering results. It is important to note that all methods showed better results than the random baseline.

TABLE III
CLUSTERING RESULTS

BOW
Method | P | R | F | RI | purity | Relative duration
HWE | 0.302 | 0.359 | 0.328 | 0.82 | 0.41 | 642.41
HWC | 0.814 | 0.698 | 0.752 | 0.913 | 0.696 | 5.82
KM  | 0.214 ± 0.008 | 0.261 ± 0.009 | 0.235 ± 0.008 | 0.796 ± 0.003 | 0.326 ± 0.021 | 10.78
SKM | 0.723 ± 0.053 | 0.674 ± 0.055 | 0.698 ± 0.054 | 0.905 ± 0.015 | 0.673 ± 0.048 | 279.01

TF-IDF
Method | P | R | F | RI | purity | Relative duration
HWE | 0.301 | 0.318 | 0.309 | 0.802 | 0.341 | 642.99
HWC | 0.692 | 0.54 | 0.606 | 0.88 | 0.603 | 1.00
KM  | 0.264 ± 0.049 | 0.25 ± 0.018 | 0.255 ± 0.023 | 0.793 ± 0.008 | 0.298 ± 0.037 | 9.31
SKM | 0.821 ± 0.07 | 0.794 ± 0.073 | 0.807 ± 0.072 | 0.948 ± 0.019 | 0.827 ± 0.054 | 211.95

Random baseline | 0.161 ± 0.017 | 0.161 ± 0.017 | 0.161 ± 0.017 | 0.754 ± 0.076 | 0.176 ± 0.018 | –

Fig. 1. Chart of clustering quality.

V. CONCLUSIONS

a) Results: Experiments were performed with 1) an Azeri corpus; 2) two document representation techniques, namely BOW and TF-IDF; 3) two dissimilarity measures: Euclidean and cosine; 4) three clustering methods: k-means, spherical k-means and hierarchical clustering with Ward linkage; 5) clustering results were evaluated using the precision, recall, F-score, Rand index and purity criteria and compared with each other and with a random baseline.

b) Conclusions: 1) The best results were achieved using TF-IDF, spherical k-means and cosine dissimilarity. 2) All methods showed better results than the random baseline.

c) Future plans: 1) Reduction of the feature matrix for faster and more precise clustering. 2) Features as symbolic n-grams. 3) Outlier analysis. 4) Determination of the number of clusters. 5) Experiments with more clustering methods.

REFERENCES

[1] P.-N. Tan, M. Steinbach, and V. Kumar, Introduction to Data Mining, First Edition. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 2005.
[2] C. D. Manning, P. Raghavan, and H. Schütze, Introduction to Information Retrieval. New York, NY, USA: Cambridge University Press, 2008.
[3] M. Kantardzic, Data Mining: Concepts, Models, Methods and Algorithms. New York, NY, USA: John Wiley & Sons, Inc., 2002.
[4] E. M. Rasmussen, "Clustering algorithms," in Information Retrieval: Data Structures & Algorithms, 1992, pp. 419–442.
[5] G. Ciganaitė, A. Mackutė-Varoneckienė, and T. Krilavičius, "Text documents clustering," in Informacinės technologijos. XIX tarpuniversitetinė magistrantų ir doktorantų konferencija "Informacinė visuomenė ir universitetinės studijos" (IVUS 2014): konferencijos pranešimų medžiaga, 2014.
[6] T. Krilavičius, Ž. Medelis, J. Kapočiūtė-Dzikienė, and T. Žalandauskas, "News media analysis using focused crawl and natural language processing: Case of Lithuanian news websites," in Information and Software Technologies. Springer, 2012, pp. 48–61.
[7] A. Mackutė-Varoneckienė and T. Krilavičius, Empirical Study on Unsupervised Feature Selection for Document Clustering. IOS Press, 2014.
[8] I. S. Dhillon, Y. Guan, and J. Kogan, "Refining clusters in high dimensional text data," in Proceedings of the Workshop on Clustering High Dimensional Data and its Applications, Second SIAM International Conference on Data Mining. SIAM, 2002, pp. 71–82.
[9] C. Potts, "Distributional approaches to word meanings," 2013.
[10] J. Ramos, "Using TF-IDF to determine word relevance in document queries."
[11] J. Zobel and A. Moffat, "Exploring the similarity space," SIGIR Forum, vol. 32, pp. 18–34, 1998.
[12] K. W. Boyack et al., "Clustering more than two million biomedical publications: comparing the accuracies of nine text-based similarity approaches," PLOS ONE, vol. 6, March 2011.
[13] A. K. Jain, M. N. Murty, and P. J. Flynn, "Data clustering: A review," 1999.
[14] A. Singh, A. Yadav, and A. Rana, "K-means with three different distance metrics."
[15] M. Steinbach, G. Karypis, and V. Kumar, "A comparison of document clustering techniques," in KDD Workshop on Text Mining, 2000.
[16] K. Hornik, I. Feinerer, M. Kober, and C. Buchta, "Spherical k-means clustering," Journal of Statistical Software, vol. 50, no. 10, pp. 1–22, 2012.
[17] V. Čekanavičius and G. Murauskas, Statistika 2. TEV, 2002.
[18] T. M. Kodinariya and P. R. Makwana, "Review on determining number of clusters in k-means clustering," International Journal of Advance Research in Computer Science and Management Studies, 2013.
[19] R. Tibshirani, G. Walther, and T. Hastie, "Estimating the number of clusters in a data set via the gap statistic," vol. 63, pp. 411–423, 2000.
[20] I. S. Dhillon, Y. Guan, and J. Kogan, "Refining clusters in high dimensional text data," in Proceedings of the Workshop on Clustering High Dimensional Data and its Applications, Second SIAM International Conference on Data Mining. SIAM, 2002, pp. 71–82.
[21] S. Zhong, "Efficient online spherical k-means clustering," in Proc. 2005 IEEE International Joint Conference on Neural Networks, vol. 5, 2005, pp. 3180–3185.
[22] R. Ihaka and R. Gentleman, The R programming language. [Online]. Available: http://www.r-project.org/

Aušra Mackutė-Varoneckienė received the PhD degree in Informatics at Vytautas Magnus University (Kaunas, Lithuania) in 2007. She is currently a researcher and lecturer at the Faculty of Informatics, Vytautas Magnus University. Her main research interests are data mining, text mining, global optimization, multiobjective optimization, and multidimensional data visualization.

Tomas Krilavičius received the PhD degree in Computer Science from the University of Twente, Enschede, The Netherlands, in 2006. He now works at the Faculty of Informatics, Vytautas Magnus University, and at the Baltic Institute of Advanced Technology. His current research interests include robotics, language technologies and data mining.

Greta Ciganaitė received a bachelor degree in mathematics at Vytautas Magnus University (Kaunas, Lithuania) in 2015. Her main research interests are statistics and data mining. She is currently working on text document clustering.

Undergraduate Cross-Discipline Integrated Learning through Designing a Smart Hierarchical Power Conversion System
Chi-Un Lei, Christopher H.T. Lee, T.O. Kwan, C.K. Lee, K.B. Huang, R.Y.K. Kwok

Abstract — The manuscript describes the design of a smart hierarchical power conversion system as a project vehicle in a cross-discipline integrated design project (IDP) course.
A three-layer hierarchical design process has been used to design the project vehicle: the electrical circuit level, the feedback control system level and the decision support network level. The project allows senior undergraduate students to consolidate their technical knowledge and design skills in feedback control systems, energy storage systems, big data analysis systems and high-power electrical systems. Since students learn experientially, the project is also useful for motivating students towards adopting electrical system design, as well as for providing solid learning evidence to their future employers, in the era of the Internet of Things and Big Data.

Index Terms — cross-discipline learning, data sciences, design project, energy conversion, project-based learning, project vehicle.

I. INTRODUCTION

Recently, emerging student learning technologies and processes have been introduced to reshape the scope of engineering education [1–6]. In particular, in order to assess students' competence comprehensively, various project-based learning curricula have been revamped. In our department, we have also reformulated the cross-discipline integrated design project (IDP) course. We hope that through the revamped IDP, teams of senior undergraduate students can have an opportunity to apply and integrate their knowledge and skills in practice, to implement a practical electrical/electronic system, in the era of the Internet of Things (IoT) and Big Data [7–11]. In order to help students learn effectively and solidly, the project vehicle (i.e. the product to be designed) has to be designed carefully. In this paper, we discuss how a smart energy converter can be used as a project vehicle in our IDP course. The contributions of our paper are as follows:

• Pedagogical requirements and technical requirements of the project vehicle are described.
• The design of the smart power conversion system is outlined, based on an intelligence hierarchy.
• Problems encountered and experience gained in the prototyping stage are summarized.

This research is partially supported by the Research Development Fund (RDF-13-01-13) from Xi'an Jiaotong-Liverpool University, China. Chi-Un Lei, Christopher H.T. Lee and T.O. Kwan are with the Department of Electrical and Electronic Engineering, University of Hong Kong, Hong Kong. Email: culei@eee.hku.hk

II. REQUIREMENTS OF THE PROJECT VEHICLE

According to Kolb's learning cycle [12], for learning to take place in experiential learning courses (e.g. design courses), the project vehicle should be designed carefully. In particular, the vehicle should be designed such that the following events can be accomplished:

• concrete experiencing (i.e. working actively, instead of simply watching or reading),
• reflective observation (i.e. observing and reflecting on what has been done and experienced),
• abstract conceptualization (i.e. framing the observation), and
• active experimentation (i.e. putting what has been learnt into practice).

Meanwhile, in order to help students acquire product design skills, the project vehicle should help students master the following contents:

• design principles of an integrated system,
• design techniques for electronic systems,
• use of computer-aided design tools and equipment for building electronic systems,
• integrated knowledge and skills from different electrical and electronic engineering disciplines (i.e., computer engineering, electronic engineering and electrical engineering), and
• techniques of problem solving and project management in a mixed design team.
Thus, the project vehicle should have a clear design purpose, functionality and hierarchy, such that students are guided to conceive, design and implement an electronic system [13]. In the course, the project vehicle contains the following modules:

• electrical circuits that interact with physical quantities, including high-power quantities,
• feedback control systems that integrate actuators, sensors and microcontrollers for processing, and
• decision support networks that turn external and internal data into actionable knowledge for the system.

III. DESIGN OF THE SMART POWER CONVERSION SYSTEM

Based on the requirements described in Section II, a smart energy conversion system for temperature regulation in smart laboratories [3] has been proposed as the project vehicle. Besides circuits for sensing and actuating, feedback control and decision support have been added to enhance the functionality of the system. The block diagram of the system is shown in Fig. 1. Through designing the system, students can explore the following topics:

• control of thermoelectric cooling systems,
• power storage and regulation,
• feedback control via open-source hardware, and
• decision support via big data analysis.

A. Electrical Circuit Level

In order to regulate the environment, a solid-state thermoelectric heat pump (i.e. a Peltier device) is used to transfer heat from the air in metal ventilation ducts to the ambient. Two blower fans are placed at the opening of the ventilation ducts in order to draw air out of the ducts. The current/power supplied to the electrical devices (i.e. the system cooling strength) is controlled by a pulse-width modulation power driver. For safety reasons, the "high-voltage" input power source is replaced by a 48 V AC current source. Furthermore, an array of large capacitors is used as a temporary power boosting device. In normal situations, the supplied power is not enough for the Peltier device and blower fans to perform a full-load operation, but it can charge up the capacitors one by one. However, when the temperature has to be lowered in a short period of time, the charged capacitors can be discharged to provide extra power for the fans and the Peltier device, in order to perform a full-load operation temporarily (e.g. for 15 seconds). Operation control of the capacitors, fans and Peltier device is handled by an Intel 8051 microcontroller, through a battery-management-system configuration. We have provided a standard topology for students to imitate in the design process, and a prototype (shown in Fig. 2) has been developed for demonstrations. Meanwhile, students are encouraged to revise the standard and develop their own design, according to their strengths and limitations.
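The normal-charge/boost behaviour described above can be summarised as a small control policy. The following Python sketch is a hypothetical, simplified simulation of that policy; all names, thresholds and the toy thermal model are invented for illustration, and the actual course prototype implements the logic on an Intel 8051 through a battery-management-style configuration.

```python
# Hypothetical, simplified simulation of the capacitor charge/boost policy:
# under normal supply the capacitor bank charges while the Peltier and fans run
# below full load; when a fast temperature drop is requested, the bank discharges
# to allow a temporary full-load burst. All numbers are illustrative.
def control_step(temp, target, cap_charge, boost_active, cap_full=100.0):
    need_boost = temp > target + 2.0                     # large error: request a boost
    if boost_active and cap_charge > 0:
        duty, cap_charge = 1.0, cap_charge - 10.0        # full-load burst from the capacitors
        boost_active = cap_charge > 0
    elif need_boost and cap_charge >= cap_full:
        duty, boost_active = 1.0, True                   # start a boost once fully charged
    else:
        duty = 0.6                                       # partial load; spare power charges the bank
        cap_charge = min(cap_full, cap_charge + 5.0)
    return duty, cap_charge, boost_active

temp, cap, boosting = 30.0, 0.0, False
for _ in range(40):
    duty, cap, boosting = control_step(temp, target=24.0,
                                       cap_charge=cap, boost_active=boosting)
    temp -= 0.1 + 0.4 * duty                             # toy thermal response
print(round(temp, 2), round(cap, 1), boosting)
```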
B. Feedback Control System Level

An open-source microcontroller board (Arduino) is used for providing feedback to the circuits and establishing closed-loop feedback control. In the system, sensors are connected to the board for measuring temperatures, currents and other physical quantities. Based on the measurements, feedback control signals (e.g. the desired fan speed and the delivery power of the Peltier device) are sent from the Arduino board to the Intel MCS-51 microcontroller board. Multi-thread control and external interrupt control are used with teacher/community guidance. An LCD panel is also installed for displaying messages to users. In the prototyping stage, we discovered that there may be insufficient input/output (I/O) ports for sensing and actuating in some situations; thus, students have to learn to use multiplexing techniques (via shift registers) to save I/O ports.

Fig. 1. Block diagram of the proposed energy conversion system (48 V AC current source, AC-DC converter, DC storage and stabilizer, PWM motor controller, Peltier device and motors with sensors, Intel 8051 and Arduino UNO microcontrollers, and a WiFi-equipped Arduino UNO for weather data aggregation and channel data analysis).

C. Decision Support Network Level

An Arduino with a WiFi shield is used to make high-level control judgments, as well as to exchange data between the system and the internet. In particular, the system can aggregate real-time data via the APIs of online services (e.g. weather information and user schedules), as well as distribute data to online Internet-of-Things (IoT) channels for detailed analysis through Matlab. Through these interfaces, analytic charts and exploratory summaries of the data can be shown to users in support of manual control decisions. Furthermore, LEDs and an infrared remote controller have been installed for notifications and manual control. In order to connect to the internet, a WiFi shield has been used, and I2C has been used to communicate with the other Arduino boards. Thus, open-source peripheral libraries have been used for JSON decoding, network communications, service API communications and inter-board communications. In the prototyping stage, we discovered that there may be insufficient internal memory for network communications and for using online services in some situations; thus, students have to perform a tradeoff analysis and design a slim program to save memory. Through the design work, students learn techniques for data exchange, analysis and visualization, which are valuable skills in the coming Big Data Era.
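The decision-support idea of aggregating an external feed and turning it into a setpoint for the feedback layer can be sketched as follows. This is a hypothetical illustration: the URL, JSON field and decision rules are placeholders rather than the course's actual services, and the real prototype performs the aggregation on the WiFi-equipped Arduino.

```python
# Hypothetical decision-support sketch: pull an external signal (e.g. a weather
# feed), combine it with a user schedule, and derive a cooling setpoint.
import json
from urllib.request import urlopen

WEATHER_URL = "https://example.org/weather.json"      # placeholder endpoint

def fetch_outdoor_temp(url=WEATHER_URL):
    try:
        with urlopen(url, timeout=5) as resp:
            return float(json.load(resp)["temperature"])   # assumed field name
    except Exception:
        return None                                        # fall back to local control

def decide_setpoint(outdoor_temp, lab_occupied):
    if not lab_occupied:
        return 28.0                   # relax cooling when nobody is scheduled
    if outdoor_temp is not None and outdoor_temp > 32.0:
        return 23.0                   # pre-cool harder on hot days
    return 25.0

print(decide_setpoint(fetch_outdoor_temp(), lab_occupied=True))
```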
IV. CONCLUSIONS

In this paper, we have shown how a smart hierarchical power conversion system can be used as a project vehicle for an integrated design project. In particular, we have developed a smart cooler for smart laboratories as the standard prototype. We hope that through designing the conversion system, students are better equipped to learn other advanced EEE materials and to practically design electronic systems for their future career development in the era of the Internet of Things and Big Data.

REFERENCES

[1] N. Paulino, J. P. Oliveira, and R. Santos-Tavares, "The design of an audio power amplifier as a class project for undergraduate students," in Proc. IEEE ISCAS, pp. 2565–2568, May 2013.
[2] C.-U. Lei, H. K.-H. So, E. Y. Lam, K. K.-Y. Wong, R. Y.-K. Kwok, and C. K. Chan, "Teaching introductory electrical engineering: a project-based learning experience," in Proc. IEEE TALE, Aug. 2012.
[3] C.-U. Lei, K. L. Man, H.-N. Liang, E. G. Lim, and K. Wan, "Building an Intelligent Laboratory Environment via a Cyber-Physical System," International Journal of Distributed Sensor Networks, vol. 2013, Article ID 109014, 9 pages, 2013.
[4] C.-U. Lei, "Teaching Introductory Circuits and Systems: Enhancing Learning Experience via Iterative Design Process and Pre-/Post-Project Learning Activities," in Proc. IEEE ISCAS, pp. 2413–2416, Jun. 2014.
[5] C.-U. Lei, N. Wong, and K. L. Man, "Integration of a Wireless Sensor Network Project for Introductory Circuits and Systems Learning," in Proc. IEEE ISCAS, pp. 2569–2572, May 2013.
[6] C.-U. Lei, C. H. T. Lee, T. O. Kwan, C. K. Lee, K. B. Huang, R. Y. K. Kwok, and K. L. Man, "The Design of a Smart Power Conversion System as an Undergraduate Cross-Discipline Integrated Design Project," in Proc. IEEE ISOCC, Nov. 2014.
[7] M. E. Porter and J. E. Heppelmann, "How Smart, Connected Products Are Transforming Competition," Harvard Business Review, vol. 92, no. 11, Nov. 2014.
[8] M. Iansiti and K. R. Lakhani, "Digital Ubiquity: How Connections, Sensors, and Data Are Revolutionizing Business," Harvard Business Review, vol. 92, no. 11, Nov. 2014.
[9] G. F. Hurlburt and J. Voas, "Big Data, Networked Worlds," Computer, vol. 47, no. 4, pp. 84–87, Apr. 2014.
[10] G. Kortuem, A. K. Bandara, N. Smith, M. Richards, and M. Petre, "Educating the Internet-of-Things Generation," Computer, vol. 46, no. 2, pp. 53–61, Feb. 2013.
[11] L. Trappeniers et al., "The Internet of Things: the Next Technological Revolution," Computer, vol. 46, no. 2, pp. 24–25, Feb. 2013.
[12] D. A. Kolb, Experiential Learning: Experience as the Source of Learning and Development. Englewood Cliffs, NJ: Prentice-Hall, 1984.
[13] E. F. Crawley, "Creating the CDIO syllabus, a universal template for engineering education," in Proc. FIE, pp. F3F-8–F3F-13, 2002.

Fig. 2. Implementation of the project vehicle (electrical circuit level and decision feedback system level) in the prototyping stage.

HDL SoC TAB-model for Diagnosis and Repair
Vladimir Hahanov, Baghdadi Ammar Awni Abbas, Eugenia Litvinova, Svetlana Chumachenko

Abstract — This article describes a technology for diagnosing SoC HDL-models based on a transaction graph. The diagnosis method is focused on decreasing fault detection time and the memory needed to store the diagnosis matrix by forming ternary relations between tests, monitors, and functional components. The following problems are solved: creation of a digital system model in the form of a transaction graph and a multi-tree of fault detection tables, as well as ternary matrices for the activation of functional components on the selected set of monitors by test patterns; and development of a method for analyzing the activation matrix in order to detect faulty blocks with a given depth, with synthesis of logic functions for subsequent embedded hardware fault diagnosis.

Keywords – HDL SoC model; diagnosis; faulty block detection; transaction graph.

I. TAB-MODEL FOR DIAGNOSING FAULTY SOC COMPONENTS

The goal is the creation of a TAB-matrix model (Tests – Assertions – functional Blocks) and a diagnosis method that decrease testing time and storage memory by forming the ternary relations (test – monitor – functional component) in a single table.
The problems to be solved are: 1) development of a digital system HDL-model in the form of a transaction graph for diagnosing functional blocks by using an assertion set [1-6,15]; 2) development of a method for analyzing the TAB-matrix to detect a minimal set of faulty blocks [4-7,13]; 3) synthesis of logic functions for an embedded fault diagnosis procedure [8-11,14].

The model for testing a digital system HDL-code is represented by the following xor-relation between the parameters <test – functionality – faulty blocks B*>:

T ⊕ B ⊕ B* = 0;  B* = T ⊕ B = ⊕{T × A, B},   (1)

which transforms the relationship of the components into the TAB-matrix:

M = {T × A} × {B},  M_ij = (T_i × A) ⊕ B_j.   (2)

Here, a coordinate of the matrix is equal to 1 if the test–monitor pair (T_i × A) detects or activates some faults of the functional block B_j ∈ B.

An analytical model for verification by using temporal assertions (additional observation statements or lines) is focused on achieving the specified diagnosis depth and is presented as follows:

Ω = f(G, A, B, S, T);  G = (A * B) × S;  S = f(T, B);
A = {A_1, A_2, ..., A_i, ..., A_h};  B = {B_1, B_2, ..., B_i, ..., B_n};
S = {S_1, S_2, ..., S_i, ..., S_m};  T = {T_1, T_2, ..., T_i, ..., T_k}.   (3)

Here G = (A * B) × S is the functionality, represented by a Code-Flow Transaction (CFT) graph (Fig. 1); S = {S_1, S_2, ..., S_i, ..., S_m} are the nodes or software states when simulating test segments (patterns). Otherwise the graph can be considered as an ABC-graph – an Assertion Based Coverage graph. Each state S_i = {S_i1, S_i2, ..., S_ij, ..., S_ip} is determined by the values of the essential design variables (Boolean variables, register variables, memory). The oriented arcs of the graph are represented by the set of software blocks:

B = {B_1, B_2, ..., B_i, ..., B_n}.   (4)

An assertion A_i ∈ A = {A_1, A_2, ..., A_i, ..., A_n} can be inserted into each block B_i – a sequence of code statements which determines the state of the graph node S_i = f(T, B_i) depending on the test pattern. The assertion monitor, uniting the assertions of the incoming arcs, A(S_i) = A_i1 ∨ A_i2 ∨ ... ∨ A_ij ∨ ... ∨ A_iq, can be put on e
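As a toy illustration of how the ternary relation in (1)–(2) can be used for diagnosis, the sketch below (not the authors' exact procedure; the activation data and the ranking rule are invented) builds a small matrix M over (test, monitor) pairs and functional blocks, and ranks candidate faulty blocks by the XOR distance between each block's activation column and the observed pass/fail vector of the monitors.

```python
# Toy TAB-matrix diagnosis sketch: rows of M are (test, monitor) pairs, columns
# are functional blocks B1..B4, and M[i][j] = 1 if pair i activates faults of
# block j. Candidate faulty blocks are those whose activation pattern best
# explains the observed failures (minimal XOR distance).
M = [
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
]
observed_fail = [1, 1, 0, 0]   # which (test, monitor) pairs flagged an error

def xor_distance(column, observed):
    return sum(c ^ o for c, o in zip(column, observed))

columns = [[row[j] for row in M] for j in range(len(M[0]))]
distances = [xor_distance(col, observed_fail) for col in columns]
best = min(distances)
candidates = [f"B{j + 1}" for j, d in enumerate(distances) if d == best]
print(candidates)   # here: ['B3'], the block activated by exactly the failing pairs
```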