Preface

Welcome to Volume 4, Number 2 of the International Journal of Design, Analysis and Tools for Integrated Circuits and Systems (IJDATICS). This issue comprises i) enhanced and extended versions of research papers from the International DATICS Workshops in 2012 and 2013, and ii) ordinary manuscript submissions from 2012 and 2013.

DATICS Workshops were created by a network of researchers and engineers from both academia and industry in the areas of i) Design, Analysis and Tools for Integrated Circuits and Systems and ii) Communication, Computer Science, Software Engineering and Information Technology. The main target of the DATICS Workshops is to bring together software/hardware engineering researchers, computer scientists, practitioners and people from industry to exchange theories, ideas, techniques and experiences.

This IJDATICS issue presents five high-quality academic papers. This mix provides a well-rounded snapshot of current research in the field and a springboard for driving future work and discussion. The papers presented in this volume are summarized as follows:

• Distributed Control: Bukowiec investigates how distributed control systems can be implemented on FPGAs via Petri nets.
• Reconfigurable Logic: Tkacz and Adamski apply Gentzen reasoning to implement structured configurable controllers on FPGAs.
• Embedded Systems: Knirsch, Schnarz and Wietzke present a novel approach to arbitrate shared resources in partitioned multicore systems via library interposition.
• Digital Circuits: Kim proposes two methodologies, one for predicting transistor aging in nanometer digital circuits and one for controlling periodic jitter leakage.

We are beholden to all of the authors for their contributions to Volume 4, Number 2 of IJDATICS. We would also like to thank the IJDATICS editorial team.
Editors:
Ka Lok Man, Xi’an Jiaotong-Liverpool University, China, and Baltic Institute of Advanced Technology (BPTI), Lithuania
Chi-Un Lei, University of Hong Kong, Hong Kong
Amir-Mohammad Rahmani, University of Turku, Finland
Nan Zhang, Xi’an Jiaotong-Liverpool University, China
David Afolabi, Xi’an Jiaotong-Liverpool University, China

Table of Contents
Vol. 4, No. 2, December 2013

Preface
Table of Contents
1. Distributed Control Systems Design as Petri Nets for FPGAs. Arkadiusz Bukowiec
2. Design of Structured Configurable Controllers Using Gentzen Reasoning. Jacek Tkacz and Marian Adamski
3. SHARB: Shared Resource Arbitration in Partitioned Multicore Systems via Library Interposition. Andreas Knirsch, Pierre Schnarz, and Joachim Wietzke
4. Transistor Aging Prediction in Nanometer Digital Circuits. Kyung Ki Kim
5. A New Method for Periodic Jitter Leakage Control. Kyung Ki Kim

INTERNATIONAL JOURNAL OF DESIGN, ANALYSIS AND TOOLS FOR INTEGRATED CIRCUITS AND SYSTEMS, VOL. 4, NO. 2, DECEMBER 2013

Distributed Control Systems Design as Petri Nets for FPGAs

Arkadiusz Bukowiec

A. Bukowiec is with the Institute of Computer Engineering and Electronics, University of Zielona Góra, ul. Podgórna 50, 65-246 Zielona Góra, Poland (Email: a.bukowiec@iie.uz.zgora.pl).
The research was financed from budget resources intended for science in 2010–2013 as an own research project No. N N516 513939.

Abstract: The paper describes a new method for the implementation of application-specific distributed control systems constructed using FPGA devices. The initial steps of the proposed control algorithm rely on the notion of a Petri net, which is an easy way to describe parallel processes. The subsequent steps of the algorithm consist in the decomposition of a given Petri net, with the use of a coloring algorithm, into a set of state machine type subnets. As a result, each subnet represents one parallel process. These subnets are then implemented independently in different FPGA devices. To ensure proper communication between all subnets, the entire control system uses a globally asynchronous locally synchronous (GALS) architecture, with each subnet synchronized by its local clock signal. Global communication between components is buffer-based and uses additional signals, generated in a given subnet and distributed to the remaining ones.

Index Terms: Boolean algebra; Decomposition; Field programmable gate arrays; Logic synthesis; Sequential circuits.

I. INTRODUCTION

Petri nets (PNs) [1], [2] are one of the most popular concepts used in the formal design and synthesis of application-specific concurrent logic controllers [3], [4]. Such a graphical representation of an algorithm is very convenient. It offers an easy way to represent concurrent processes and, in addition, mathematical algorithms can be applied for formal analysis and verification of the designed model [5], [6], [7]. The digital design of such controllers is very often implemented using integrated circuits such as field programmable gate arrays (FPGAs) [8], [9], [10], [11]. The most typical implementation of Petri nets in FPGA devices uses the one-hot local state encoding method, where each place is represented by a single flip-flop [3]. This approach requires the hardware implementation of a large number of logic functions and flip-flops included in logic cells. Moreover, there is no possibility to distribute such a system.

This paper proposes a method for the implementation of a Petri net representing a distributed application-specific logic controller [12]. To allow its subsequent decomposition, the Petri net is initially colored [13], with places that have the same color forming one state machine (SM) module. In the process, we use a new subnet extraction procedure that supplements the known methods of Petri net SM coloring [3], [14], [15], [16], [17]. As a result, each SM module can be implemented in a separate FPGA device. To ensure proper communication, a globally asynchronous locally synchronous (GALS) architecture is used in the entire system. Each of the SM components is synchronized by a local clock signal. For the entire system to use an asynchronous mode of synchronization, additional communication signals are required.

In such a solution the places of each SM subnet can be encoded with a minimal-length binary vector. This encoding allows the digital circuit of an SM subnet to be decomposed into a double-level structure with separate circuits for firing transitions and generating microoperations [18], [19]. In this case, the microoperation decoder can be realized with the use of FPGA embedded memory blocks [20]. This is done in a way that leads to a balanced usage of all the logical resources of the FPGA device.

II. PETRI NET

A simple Petri net [1], [2] is defined as a triple

$$PN = (P, T, F), \tag{1}$$

where:
$P$ is a finite non-empty set of places, $P = \{p_1, \ldots, p_M\}$,
$T$ is a finite non-empty set of transitions, $T = \{t_1, \ldots, t_S\}$,
$F$ is a set of arcs (describing flow relations) from places to transitions and from transitions to places:

$$F \subseteq (P \times T) \cup (T \times P), \qquad P \cap T = \emptyset.$$

The sets of input and output transitions of a place $p_m \in P$ are defined respectively as follows:

$$\bullet p_m = \{t_s \in T : (t_s, p_m) \in F\}, \qquad p_m \bullet = \{t_s \in T : (p_m, t_s) \in F\}.$$

The sets of input and output places of a transition $t_s \in T$ are defined respectively as follows:

$$\bullet t_s = \{p_m \in P : (p_m, t_s) \in F\}, \qquad t_s \bullet = \{p_m \in P : (t_s, p_m) \in F\}.$$

A marking of a Petri net is defined as a function

$$M : P \to \mathbb{N}.$$

Intuitively, given a place $p_m$, the function $M(p_m)$ returns the number of tokens in $p_m$. A place or a set of places is marked when it contains a token. A transition $t_s$ can be fired if all its input places are marked. Firing a transition removes tokens from its input places and puts one token in each output place. When the initial marking $M_0$ is additionally specified, the Petri net can be represented as a tuple:

$$PN = (P, T, F, M_0). \tag{2}$$

[Fig. 1. Architecture of distributed control system. Blocks $SMN_1$, $SMN_2$, ..., $SMN_I$, each with inputs X, XZ, CLK, RES and outputs Y, YZ, connected to the buses X, Y and Z and to the common reset RES.]

A. Colored Petri Net

A Petri net can be enhanced by assigning colors to places and transitions [1], [3], [13]. In a state machine (SM) colored Petri net the colors aid an intuitive as well as formal consistency check for all the sequential processes covering the net. Each color recognizes one SM subnet. The rules for the Petri net coloring are as follows [14], [15]:
• each place and transition must have at least one color assigned,
• if a place is assigned a given color, each of its input and output transitions must be assigned the same color,
• the input places of each transition must be assigned different colors,
• the output places of each transition must be assigned different colors,
• the input and output places of a given transition must share the same set of colors,
• the initially marked places cannot share the same set of colors,
• the number of different colors shared by the initially marked places must be equal to the total number of colors.
B. Interpreted Petri Net

A Petri net enhanced with an additional feature for information exchange is called an interpreted Petri net [2]. This exchange is made by the use of binary signals. If an interpreted Petri net is also enhanced with colors, it is often called a colored interpreted Petri net. Interpreted Petri nets are used as models of concurrent logic controllers.

The Boolean variables occurring in the interpreted Petri net can be divided into three sets:
$X$ is the set of input variables, $X = \{x_1, \ldots, x_L\}$,
$Y$ is the set of output variables, $Y = \{y_1, \ldots, y_N\}$,
$Z$ is the set of internal communication variables (in most cases not used, with $Z = \emptyset$).

An interpreted Petri net has a guard condition $\varphi_s$ associated with every transition $t_s$. The guard condition $\varphi_s$ is defined to be a Boolean function of a subset of variables from the sets $X$ and $Z$. In a special case, the condition $\varphi_s$ can be defined as 1 (always true). A transition $t_s$ can be fired if all its input places are marked and the current value of the corresponding Boolean function $\varphi_s$ is equal to 1. The conjunction $\psi_m$ associated with a place $p_m$ is an elementary conjunction of positive literals formed from output variables from the set $Y$. If the place $p_m$ is marked, the output variables from the corresponding conjunction $\psi_m$ are set and the other variables are reset.

C. Macroplace

Macroplaces correspond to a subnet or a part of a subnet assigned a particular color and enhance a Petri net with a hierarchy [2], [5]. In this article we make frequent use of the so-called mono-active macroplaces, that is macroplaces consisting of sequential places, with one input and one output [21]. We note that the theory of macroplaces is well developed and includes much more complex extensions of the basic definition, including the notion of a multi-active macroplace [22].

III. SYSTEM ARCHITECTURE

The use of the globally asynchronous locally synchronous (GALS) architecture for a distributed control system seems to be preferable. Since each color recognizes one sequential process, the decomposition of a colored Petri net into SM subnets is based on the assigned colors. As a result, we obtain $I$ SM subnets, where $I$ is the number of the Petri net colors. (The following section describes the process of decomposition in more detail.) In the next step each SM subnet is implemented as a double-level sequential circuit ($SMN_i$) with local clock signals (Fig. 1) [23], [19]. The communication between components is made asynchronous in a buffer-based mode. We note that there exists a multitude of other communication methods, such as Handshake, FIFO-, Controller-, or Lookup-based [24]. However, in the considered architecture there is no real data exchange, since only the triggers are sent to other components. Note that FIFO-, Controller-, and Lookup-based communication types are not adequate here, since they are designed for systems where a large amount of data is exchanged. Only the Handshake method could be applicable, but it requires a modification of the control algorithm of the SM subnets. Hence we suggest the use of dedicated buffers to send all the triggers.

Each $SMN_i$ circuit (Fig. 2) consists of a first-level combinational circuit $CC_i$, a memory register $RG_i$, and a second-level output decoder $Y_i$.

[Fig. 2. Architecture of single SM module. The combinational circuit CC (inputs X, XZ, Q; output D) drives the register RG (inputs D, CLK, RES; output Q), which drives the decoder Y (outputs Y, YZ); CLK is the local clock.]

The combinational circuit $CC_i$ is responsible for the generation of the excitation functions for the memory register. It implements the transition functions, and it can be described as follows:

$$D^i = D^i(X^i, X_Z^i, Q^i), \tag{3}$$

where:
$X^i$ is a subset of the set of the Petri net input variables ($X^i \subseteq X$),
$X_Z^i$ is a subset of the set of the additional internal variables used for synchronization ($X_Z^i \subseteq Z$),
$Q^i$ is the set of variables used to store a code of the currently marked place in a given SM subnet.

Recall that a given SM subnet recognizes one sequential process at a time, hence only one place can be marked at a given moment. The memory register $RG_i$ consists of D-type flip-flops. It is used to store the code of the currently marked place. The number of required flip-flops depends on the number of places as well as on the method of encoding. (We formally define this number in the following section.) The decoder $Y_i$ is responsible for the generation of the Petri net output signals $Y^i$ that are under the control of a given SM subnet and of the additional internal variables used for synchronization, $Y_Z^i$. These can be defined as follows:

$$Y^i = Y^i(Q^i), \qquad Y_Z^i = Y_Z^i(Q^i). \tag{4}$$

Connections between all the $SMN_i$ circuits have to be created (Fig. 1). Here $Z$ denotes a bus implementing the buffer-based communication method, which can be defined as follows:

$$Z = \bigcup_{i=1}^{I} Y_Z^i. \tag{5}$$

The bus $Y$ is an output of the controller and it can be defined as a sum of the outputs of all $SMN_i$ circuits:

$$Y = \bigcup_{i=1}^{I} Y^i. \tag{6}$$

The bus $X$ is an input of the controller.

IV. SYNTHESIS METHOD

The main idea of the proposed synthesis method is based on the decomposition of a Petri net into SM subnets and the minimal encoding of places. The places are encoded separately in each subnet using a vector of minimal length (in bits). The decomposition is based on the colors assigned to the places of a given colored Petri net. The output variables assigned to places are decoded in the second-level circuit. This is implemented with the embedded memory blocks, which allows a balanced usage of the resources of modern FPGA devices [18].

The starting point of the synthesis method is a colored interpreted Petri net. We note that there are several Petri net coloring algorithms and we do not intend to discuss them here; for more information see [16], [17]. Since the coloring algorithms do not care about the output signals, we need to make sure that places whose elementary conjunctions include the same output signal have at least one color in common. This is required to ensure the proper functioning of the control system. It also prevents the occurrence of a single resource access conflict between concurrent processes. All this can be viewed as an extension of a coloring algorithm. Note that colors should also be assigned to elementary conjunctions of places and that each elementary conjunction should be assigned the same color as its place. Hence the rules for the Petri net coloring should be extended with one condition:
• Elementary conjunctions containing a common output signal must share at least one of the colors.

The entire synthesis process consists of the following steps:

1) Decomposition into SM subnets. The purpose of this step is to obtain subnets from the initial colored Petri net, assuming that the Petri net is colored with $I$ different colors, where $i = 1, \ldots, I$ is the number of a particular color. The process of decomposition begins from the first color, denoted $C_1$. All the places colored with this color form the first SM subnet, denoted $SMN_1$. All the subsequent subnets are created in a similar way, except that all the sequences of places which have already been placed in one of the previously created subnets (assigned a color number lower than the one currently evaluated) are replaced by a macroplace, and a macroplace doubler $dmp_m \in DMP_i$ is placed in the given SM subnet. Note that, as a result, several doublers could be obtained. Doublers occurring sequentially can be replaced by a single doubler.

2) Constructing the set of synchronizing variables. The goal here is to construct the set $Z$, consisting of variables used for the synchronization of the $SMN_i$ subnets. These variables are assigned to places and transitions. Each transition has to be analyzed. If a transition $t_s$ belongs to more than one $SMN_i$ subnet, then each elementary conjunction $\psi_m$ assigned to a transition input place $p_m \in \bullet t_s$ (note that in the case of an SM subnet $|\bullet t_s| = 1$) should be replaced by a new elementary conjunction $\psi_m^* = \psi_m \wedge y_Z^{p_m}$, where $y_Z^{p_m}$ is the new synchronizing signal and $y_Z^{p_m} \in Y_Z^i$. We are then required to replace the guard condition $\varphi_s$ assigned to the transition $t_s$ in each subnet $SMN_i$ by a new guard condition $\varphi_s^i = \varphi_s \wedge \bigl(\bigwedge x_Z^{p_m}\bigr)$, where $\bigwedge x_Z^{p_m}$ is the conjunction of the signals $y_Z^{p_m}$ generated by the remaining subnets and connected to the appropriate inputs $x_Z^{p_m} \in X_Z^i$ of the considered subnet $SMN_i$.

3) Places encoding. The aim of this step is to assign a binary code $K(p_m)$ to each place $p_m$ in each subnet $SMN_i$. All the subnets have to be analyzed separately. The encoding should be done using the lowest possible number of bits. This allows an efficient use of the FPGA embedded memory blocks for the purposes of the decoder $Y_i$. In such a case the number of bits used is

$$R_i = \lceil \log_2 |P_i| \rceil, \tag{7}$$

where $P_i \subseteq (P \cup DMP_i)$ is the set of places of a given subnet $SMN_i$. To store this code we use the variables $Q^i = \{q_0, \ldots, q_{R_i - 1}\}$. The place belonging to the initial marking $M_0$ should be assigned the code equal to 0. If a given subnet does not include any place from the initial marking $M_0$, the code equal to 0 should be assigned to the place $dmp_m$ that replaced the macroplace containing such a place. The subsequent places should receive subsequent Gray codes. We note that it is the most effective encoding for this type of circular state machines [25].

4) Forming conjunctions. The purpose of this step is to form the conjunctions describing each place $p_m$, each transition $t_s$ and the additional hold conditions. These conjunctions aid the construction of the logical formulas related to the system (3), described in the next step. For each subnet $SMN_i$ the conjunctions are formed separately. Given a place $p_m$, the conjunction describing $p_m$ is formed with literals from the variables $q_r \in Q^i$, using the code of the place $p_m$ (denoted $K(p_m)$). If a variable $q_r$ equals 0, as encoded by $K(p_m)$, then the negative literal $\bar{q}_r$ is used. If $q_r$ equals 1, then the positive literal $q_r$ is used:

$$p_m = \bigwedge_{r=0}^{R_i - 1} q_r^{l}, \tag{8}$$

where $l \in \{0, 1\}$, $q_r^0 = \bar{q}_r$ and $q_r^1 = q_r$. The conjunction related to a transition $t_s$ is formed using the conjunction describing its input place $p_m \in \bullet t_s$ and its guard condition:

$$t_s = p_m \wedge \varphi_s. \tag{9}$$

Additional hold conditions are required to ensure that the code of the current place is not lost when it is required for more than one clock cycle. The formula describing the hold condition of a place $p_m$ is the conjunction of $p_m$ and the negated disjunction of the conjunctions describing its output transitions:

$$hp_m = \overline{\bigvee p_m \bullet} \wedge p_m. \tag{10}$$

5) Building formulas. The goal of this step is to build up the formulas describing the system (3), which is implemented using the combinational circuit $CC_i$. These formulas are formed for D-type flip-flops and are built from the conjunctions of transition conditions and the conjunctions of the hold conditions for places. If a variable $q_r$ is equal to 1 (given the code $K(p_m)$ for the place $p_m$), then the respective formula, denoted $D_r$, is a disjunction of the conjunctions of the input transitions and the hold condition of the place $p_m$:

$$D_r = \bigvee_{m=1}^{M} \Bigl( \bigl( \bigvee \bullet p_m \vee hp_m \bigr) \wedge K(p_m)[r] \Bigr). \tag{11}$$

6) Creating decoder memory blocks. Since the system (4) has a regular structure, the decoders $Y_i$ can be implemented with the use of the embedded memory blocks. This allows a balanced usage of the FPGA device resources. The content of these memory blocks can be described by a truth table or by a set of formulas. However, the truth table method is more suitable in this case. The variables $q_r \in Q^i$ represent the address, whereas the variables $y \in (Y^i \subseteq Y) \cup Y_Z^i$ represent the word stored under this address. The set $Y^i$ contains only the output variables controlled by the subnet $SMN_i$. In each row of the truth table only the variables $y$ that form a part of the elementary conjunction $\psi_m$ related to the place $p_m$ (represented by the variables $q_r$, given the code $K(p_m)$) are assigned the value 1. Other variables receive the value 0.

7) Forming logic circuit. The aim of this step is to create models of each subnet in a hardware description language (HDL) according to the schematic in Fig. 2, and to connect them into the control system according to the schematic in Fig. 1. The module related to the combinational circuit $CC_i$ is described with the use of continuous assignments. It describes the formulas created in step 5. The memory $RG_i$ is described as an $R_i$-bit D-type register with an asynchronous reset, triggered by a positive clock edge. The decoder $Y_i$ is described as a process using the case statement. Note that since the embedded memory blocks are synchronous, the decoder has to be synchronous as well; this situation is not shown in the schematic. Note also that it should be triggered by a negative edge. The result is that the output values are ready and stable after a single clock signal period [20]. As a consequence of the behavior of the embedded memory blocks, the reset signal of the decoder has to be synchronous. An HDL description of this module should include a special synthesis directive appropriate for the use of embedded memory blocks. The syntax of this directive depends on the language, the device and the synthesis tool vendor. The description of the case statement is created based on the truth table created in step 6. The top-level module of the subnet should describe the connections between these three modules based on the schematic shown in Fig. 2. The local clock and reset signals have to be connected to the memory $RG_i$ and the decoder $Y_i$. The connection of the clock and reset signals to the decoder is not shown in the schematic because, in the general case, the decoder can be asynchronous. Because a simple buffer-based method of synchronization is applied, all local clock signals are required to have the same frequency. They could be shifted in phase and they could have different duty cycles. This means that a local clock generator could be applied for each subnet. Subnets created by following the above steps should be passed on separately to the third-party synthesis and implementation tools. Each subnet is implemented in a separate FPGA device of the distributed control system. Devices have to be connected by external buses according to the schematic in Fig. 1, where each block represents one device.

[Fig. 3. Sample Petri net PN1. Place colors: p1, p2: C1; p3, p7: C2; p4, p5, p6: C1, C2; p8, p9: C3; p10, p11: C2, C3.]

[Fig. 4. Industrial control process. Aggregate feeder, content feeder, water feeder, scales, mixer arm, content mixer and timer.]

V. AN EXAMPLE PROCEDURE

We use an example Petri net PN1 (Fig. 3) to illustrate the method of Petri net synthesis described in the previous section. This Petri net describes the control process of an industrial mixer of aggregate content and water (Fig. 4) [8], [26]. The outputs of the controller are connected to the valves of the tanks ($yt_1$, $yt_2$, $yv_1$, $yv_2$, $yv_3$) and to the engine of the mixer ($ym$). The inputs give information about the state of the tanks ($xf_3$), the scale ($xf_1$, $xn_1$, $xn_2$), the timer ($xf_4$) and the flow meter ($xf_2$). The Petri net is colored using three colors $C_1$, $C_2$ and $C_3$, shown in the figure as the lower-right label assigned to places. Such a colored Petri net can be considered an entry point of our synthesis method.

Firstly, the Petri net has to be decomposed (step 1). The first subnet (Fig. 5a) consists of all the places colored with the color $C_1$. The second one (Fig. 5b) has one doubler for the macroplace $dmp_1$. This doubler is used instead of the places $p_4$, $p_5$, and $p_6$, since these places have already been used in the first subnet. A doubler for the macroplace $dmp_2$ in the third subnet (Fig. 5c) is also created, replacing $p_{10}$ and $p_{11}$. We note that the replacement of the sequential places by a macroplace doubler has also removed some transitions from the resulting SM subnet. Also, the output signals from the elementary conjunctions of these places are now under the control of only one SM subnet.

[Fig. 5. Subnets of the Petri net PN1: (a) SMN1, (b) SMN2, (c) SMN3.]

Secondly, the synchronization variables are created (step 2). In our case, there are only four transitions belonging to more than one subnet: $t_2$ and $t_5$ belong to $SMN_1$ and $SMN_2$, and $t_6$ and $t_8$ belong to $SMN_2$ and $SMN_3$. Additional signals, denoted $y_Z^{p_2}$ and $y_Z^{p_3}$, were created for the transition $t_2$. These signals were assigned to the conjunctions $\psi_2^*$ and $\psi_3^*$, respectively. These conjunctions relate to the places $p_2$ and $p_3$ and are connected with the input signals denoted $x_Z^{p_2}$ and $x_Z^{p_3}$, respectively. The input signals are added to the guard conditions denoted $\varphi_2^2$ and $\varphi_2^1$ (for the transition $t_2$ in $SMN_2$ and $SMN_1$), respectively. In a similar way the signals for the three remaining transitions have to be created. The entire process is shown in Fig. 5.

The next step is to encode the places (step 3). In accordance with (7) we are required to use $R_1 = 3$, $R_2 = 3$, and $R_3 = 2$ bits in order to encode all the places. A possible encoding of these places is shown in Table I. It is done separately for each subnet using variables from the corresponding set $Q^i$. Since $M_0 = \{p_1, p_3, p_9\}$, the places $p_1$, $p_3$, and $p_9$ are all assigned codes equal to 0.

TABLE I
ENCODING OF PLACES

| SMN1 place | Code q2 q1 q0 | SMN2 place | Code q2 q1 q0 | SMN3 place | Code q1 q0 |
|---|---|---|---|---|---|
| p1 | 000 | p3 | 000 | p9 | 00 |
| p2 | 001 | dmp1 | 001 | p8 | 01 |
| p4 | 011 | p7 | 011 | dmp2 | 11 |
| p5 | 010 | p10 | 010 | | |
| p6 | 110 | p11 | 110 | | |

After the encoding process is finished, we can start forming the conjunctions (step 4). The place conjunctions are formed on the basis of the place codes according to (8). The example conjunctions for the subnet $SMN_1$ look as follows:

$$
\begin{aligned}
p_1 &= \bar{q}_2 \wedge \bar{q}_1 \wedge \bar{q}_0,\\
p_2 &= \bar{q}_2 \wedge \bar{q}_1 \wedge q_0,\\
p_4 &= \bar{q}_2 \wedge q_1 \wedge q_0,\\
&\;\;\vdots\\
p_6 &= q_2 \wedge q_1 \wedge \bar{q}_0.
\end{aligned}
$$

The transition conjunction consists of the conjunction of the transition input place and the transition guard condition according to (9). For $SMN_1$, we have:

$$
\begin{aligned}
t_1 &= p_1 \wedge xn_1,\\
t_2 &= p_2 \wedge x_Z^{p_3},\\
t_3 &= p_4 \wedge xf_1,\\
&\;\;\vdots\\
t_5 &= p_6 \wedge xf_1 \wedge x_Z^{dmp_1}.
\end{aligned}
$$

The hold conjunction for each place consists of the negation of the disjunction of all the conjunctions of its output transitions and of the place conjunction, according to (10). For $SMN_1$, we have:

$$
\begin{aligned}
hp_1 &= \overline{t_1} \wedge p_1,\\
hp_2 &= \overline{t_2} \wedge p_2,\\
hp_4 &= \overline{t_3} \wedge p_4,\\
&\;\;\vdots\\
hp_6 &= \overline{t_5} \wedge p_6.
\end{aligned}
$$

Now the formulas describing the combinational circuits can be formed (step 5). A formula has to be formed for every variable $D_r$ based on (11) in every subnet. For example, the formula for $D_0$ of $SMN_1$ is:

$$D_0 = t_1 \vee hp_2 \vee t_2 \vee hp_4.$$

Since the variable $q_0$ is equal to 1 in the codes of the places $p_2$ and $p_4$, the formula $D_0$ is a disjunction of all the input transition conjunctions of these places. In the considered case these are $t_1$ and $t_2$. In addition, the hold conjunctions of these places have to be added. In a similar way we can form the remaining formulas for $D_1$ and $D_2$, as well as $D_0$, $D_1$ and $D_2$ of $SMN_2$, and $D_0$ and $D_1$ of $SMN_3$. These formulas can be minimized after substituting the conjunctions for the corresponding variables.

We are now ready to create the decoder memory blocks (step 6). Each of the subnets requires a separate table. Table II shows the three tables for the considered example.

TABLE II
TABLES OF DECODER OPERATIONS

| Decoder Y1: address q2 q1 q0 | Operation Y1 | Decoder Y2: address q2 q1 q0 | Operation Y2 | Decoder Y3: address q1 q0 | Operation Y3 |
|---|---|---|---|---|---|
| 000 | 10000 | 000 | 001000 | 00 | 100 |
| 001 | 00010 | 001 | 000100 | 01 | 010 |
| 011 | 00100 | 011 | 000010 | 11 | 001 |
| 010 | 01000 | 010 | 010000 | | |
| 110 | 00101 | 110 | 100001 | | |

$Y_1 = yt_1\, yt_2\, yv_1\, y_Z^{p_2}\, y_Z^{p_6}$
$Y_2 = yv_3\, ym\, y_Z^{p_3}\, y_Z^{dmp_1}\, y_Z^{p_7}\, y_Z^{p_{11}}$
$Y_3 = yv_2\, y_Z^{p_8}\, y_Z^{dmp_2}$

Finally, the logic circuit can be formed (step 7). Each $SMN_i$ circuit can be easily described in any HDL. Our example was described in Verilog. The process of the description is illustrated based on $SMN_1$. All other $SMN_i$ circuits have to be described in the same way. The conjunctions were described in a separate file (Fig. 6) and then included into the combinational module CC1. The combinational module was described using continuous assignments (Fig. 7). The memory register RG1 was described as a 3-bit D-type register with an asynchronous reset, triggered by a positive edge (Fig. 8). A typical synthesis template was used [27]. The decoder Y1 was described as a process using the case statement (Fig. 9), as mentioned earlier. In our example it is targeted at Xilinx Spartan or Virtex devices. This is important for this circuit because it has to be implemented in embedded memory blocks. Since each family of FPGA devices has different memory blocks, this circuit has to be targeted at a specific family. The decoder is a synchronous one with a synchronous reset because the embedded memory blocks of Xilinx devices, called BlockRAM, require such behavior. It is triggered by a negative edge. The description of this module should include the special synthesis attribute bram_map for the use of BlockRAMs during the synthesis and implementation process. The top-level module of the $SMN_1$ subnet should describe the connections between these three components using the instantiation mechanism (Fig. 10), based on the schematic shown in Fig. 2. The local clock and reset signals have to be connected to the memory RG and the decoder Y.

```verilog
// conjunctions.vh: place, transition and hold conjunctions of SMN1

// Place conjunctions (8) for the codes of Table I:
// p1 = 000, p2 = 001, p4 = 011, p5 = 010, p6 = 110
`define p1 ~Q[2] & ~Q[1] & ~Q[0]
`define p2 ~Q[2] & ~Q[1] &  Q[0]
`define p4 ~Q[2] &  Q[1] &  Q[0]
`define p5 ~Q[2] &  Q[1] & ~Q[0]
`define p6  Q[2] &  Q[1] & ~Q[0]

// Transition conjunctions (9): input place AND guard condition
`define t1 `p1 & XN1
`define t2 `p2 & XZP3
`define t3 `p4 & XF1
`define t4 `p5 & XN2
`define t5 `p6 & XZDMP1 & XF1

// Hold conjunctions (10): keep the place while its output transition is disabled
`define hp1 ~(`t1) & `p1
`define hp2 ~(`t2) & `p2
`define hp4 ~(`t3) & `p4
`define hp5 ~(`t4) & `p5
`define hp6 ~(`t5) & `p6
```

Fig. 6. Verilog description of conjunctions for SMN1

```verilog
`include "conjunctions.vh"

// First-level combinational circuit CC1: excitation functions of SMN1
module CC1 (XN1, XF1, XN2, XF2, XZP3, XZDMP1, Q, D);
  input        XN1, XF1, XN2, XF2, XZP3, XZDMP1;
  input  [2:0] Q;
  output [2:0] D;

  // D[r] collects the input transitions and hold conditions
  // of every place whose code has bit r set, see (11)
  assign D[0] = `t1 | `hp2 | `t2 | `hp4;
  assign D[1] = `t2 | `hp4 | `t3 | `hp5 | `t4 | `hp6;
  assign D[2] = `t4 | `hp6;
endmodule
```

Fig. 7. Verilog description of combinational circuit CC of SMN1

```verilog
// Memory register RG1: 3-bit D-type register, asynchronous reset, rising edge
module RG1 (CLK, RES, Q, D);
  input        CLK, RES;
  input  [2:0] D;
  output [2:0] Q;
  reg    [2:0] Q;

  always @(posedge CLK or posedge RES)
    if (RES)
      Q <= 3'b0;   // code 000 is the initially marked place p1
    else
      Q <= D;
endmodule
```

Fig. 8. Verilog description of register RG of SMN1

```verilog
// Second-level decoder Y1: synchronous ROM addressed by the place code Q
module Y1 (CLK, RES, YT1, YT2, YV1, YZP2, YZP6, Q);
  input       CLK, RES;
  input [2:0] Q;
  output      YT1, YT2, YV1, YZP2, YZP6;
  reg         YT1, YT2, YV1, YZP2, YZP6;

  // synthesis attribute bram_map of Y1 is yes
  // Falling-edge clock and synchronous reset; contents follow Table II
  always @(negedge CLK)
    if (RES)
      {YT1, YT2, YV1, YZP2, YZP6} <= 5'b00000;
    else
      case (Q)
        3'b000:  {YT1, YT2, YV1, YZP2, YZP6} <= 5'b10000;
        3'b001:  {YT1, YT2, YV1, YZP2, YZP6} <= 5'b00010;
        3'b011:  {YT1, YT2, YV1, YZP2, YZP6} <= 5'b00100;
        3'b010:  {YT1, YT2, YV1, YZP2, YZP6} <= 5'b01000;
        3'b110:  {YT1, YT2, YV1, YZP2, YZP6} <= 5'b00101;
        default: {YT1, YT2, YV1, YZP2, YZP6} <= 5'b00000;
      endcase
endmodule
```

Fig. 9. Verilog description of decoder Y of SMN1

```verilog
// Top-level module of SMN1: connects CC1, RG1 and Y1 as in Fig. 2
module SMN1 (CLK, RES, XF1, XF2, XN1, XN2,
             ZDMP1, ZP3, YT1, YT2, YV1, ZP2, ZP6);
  input  CLK, RES;
  input  XF1, XF2, XN1, XN2;
  input  ZDMP1, ZP3;          // synchronizing inputs from the bus Z
  output YT1, YT2, YV1;
  output ZP2, ZP6;            // synchronizing outputs to the bus Z

  wire [2:0] D;               // excitation functions
  wire [2:0] Q;               // code of the currently marked place

  CC1 CC (.D(D), .Q(Q), .XF1(XF1), .XF2(XF2),
          .XN1(XN1), .XN2(XN2), .XZDMP1(ZDMP1),
          .XZP3(ZP3));

  RG1 RG (.CLK(CLK), .RES(RES), .D(D), .Q(Q));

  Y1 Y (.CLK(CLK), .RES(RES), .Q(Q),
        .YT1(YT1), .YT2(YT2), .YV1(YV1),
        .YZP2(ZP2), .YZP6(ZP6));
endmodule
```

Fig. 10. Verilog description of the top-level module of SMN1

The $SMN_i$ modules do not have to be connected together in any module because they are going to be implemented in separate devices. However, for simulation purposes, a main top-level module was created according to the schematic in Fig. 1. Then a simple test-bench for this main top-level module was written. This test-bench emulates the control process of the mixer (Fig. 4). It generates all the information about the state of the tanks ($xf_3$), the scale ($xf_1$, $xn_1$, $xn_2$), the timer ($xf_4$) and the flow meter ($xf_2$). The sample simulation (Fig. 11) shows one cycle of the process.

When the simulation results are correct, we are ready to pass the design on to the third-party synthesis and implementation tools. The results for a Xilinx Spartan device are shown in Table III. The entire control system can be set up on the basis of the obtained circuit.

VI. CONCLUSIONS

The paper presents a method of distributed control system realization. A formal description of the method is accompanied by a simple example. The specification of the control algorithm uses the notion of a Petri net, which allows an easy description of parallel processes. We note that it is possible to apply formal verification methods to test the algorithm. The proposed method of synthesis is based on the decomposition of

[Fig. 11. Simulation of sample Petri net PN1]

TABLE III
RESOURCES OF LOGIC CIRCUIT OF PETRI NET PN1

| | SMN1 | SMN2 | SMN3 |
|---|---|---|---|
| Number of LUTs | 6 | 6 | 3 |
| Number of Flip-Flops | 3 | 3 | 2 |
| Number of BRAMs | 1 | 1 | 1 |

[3] M. Węgrzyn, P. Wolański, M. Adamski, and J. Monteiro, “Coloured Petri net model of application specific logic controller programs,” in Proceedings of IEEE International Symposium on Industrial Electronics ISIE’97, vol. 1. Guimarães, Portugal: Piscataway, 1997, pp. 158–163.
[4] M. Doligalski, “Behavioral specification diversification for logic controllers implemented in FPGA devices,” in Proceedings of the Annual FPGA Conference, ser. FPGAworld’12. New York, USA: ACM, 2012, pp. 6:1–6:5. [Online].
Available: http://doi.acm.org/10.1145/2451636.2451642 [5] J. Esparza and M. Silva, “On the analysis and synthesis of free choice a Petri net into SM subnets. The architecture for the distributed systems,” in Advances in Petri Nets 1990, ser. Lecture Notes in Computer control system is based on the local synchronization of each of Science, G. Rozenberg, Ed. Berlin/Heidelberg: Springer-Verlag, 1991, the SM subnets. There is no global synchronization however, vol. 483, pp. 243–286. [6] K. Barkaoui and M. Minoux, “A polynomial-time graph algorithm which means that the entire system is of GALS type. In order to decide liveness of some basic classes of bounded Petri nets,” in to synchronize the entire system we propose a method of Application and Theory of Petri Nets, ser. Lecture Notes in Computer asynchronous communication via additional control signals in Science, K. Jensen, Ed. Berlin/Heidelberg: Springer, 1992, vol. 616, pp. 62–75. [Online]. Available: http://dx.doi.org/10.1007/3-540-55676-1 4 the buffer-based mode. The synthesis method and connected [7] A. Karatkevich and T. Gratkowski, “Analysis of the operational Petri nets system architecture apply to the FPGA devices. In addition, by a distributed system,” in Proceedings of the International Conference we propose the realization of instruction decoders with the use on Modern Problems of Radio Engineering, Telecommunications and of the embedded memory blocks, since it allows a balanced Computer Science TCSET’04, Lviv Polytechnic National University. Lviv, Ukraine: Lviv, Publishing House of Lviv Polytechnic, 2004, pp. usage of FPGA device resources. 319–322. [8] L. Gniewek and J. Kluska, “Hardware implementation of fuzzy Petri net as a controller,” IEEE Transactions on Systems, Man, and Cybernetics R EFERENCES – Part B: Cybernetics, vol. 34, no. 3, pp. 1315–1324, 2004. [1] T. Murata, “Petri nets: Properties, analysis and applications,” Proceed- [9] L. Gomes, A. Costa, J. Barros, and P. 
Lima, “From Petri net models ings of the IEEE, vol. 77, no. 4, pp. 541–580, 1989. [Online]. Available: to VHDL implementation of digital controllers,” in 33rd Annual Con- http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=24143 ference of the IEEE Industrial Electronics Society IECON’07. Taipei, [2] A. Karatkevich, Dynamic Analysis of Petri Net-Based Discrete Systems, Taiwan: IEEE, 2007, pp. 94–99. ser. Lecture Notes in Control and Information Sciences. Berlin: [10] T. Łuba, G. Borowik, and A. Kraśniewski, “Synthesis of finite state ma- Springer-Verlag, 2007, vol. 356. chines for implementation with programmable structures,” Electronics INTERNATIONAL JOURNAL OF DESIGN, ANALYSIS AND TOOLS FOR INTEGRATED CIRCUITS AND SYSTEMS, VOL. 4, NO. 2, DECEMBER 2013 9 and Telecommunications Quarterly, vol. 55, no. 2, pp. 183–200, 2009. [11] R. Winiewski, A. Barkalov, L. Titarenko, and W. Halang, “Design of microprogrammed controllers to be implemented in fpgas,” International Journal of Applied Mathematics and Computer Science, vol. 21, no. 2, pp. 401–412, 2011. [12] A. Bukowiec and P. Mróz, “An FPGA synthesis of the distributed control systems designed with Petri nets,” in Proceedings of IEEE 3rd International Conference on Networked Embedded Systems for Every Application NESEA’12, Liverpool, United Kingdom, 2012, p. [6]. [13] K. Jensen, K. Kristensen, and L. Wells, “Coloured Petri nets and CPN tools for modelling and validation of concurrent systems,” International Journal on Software Tools for Technology Transfer (STTT), vol. 9, no. 3, pp. 213–254, 2007. [14] K. Biliński, M. Adamski, J. Saul, and E. Dagless, “Petri-net-based algo- rithms for parallel-controller synthesis,” IEE Proceedings – Computers and Digital Techniques, vol. 141, no. 6, pp. 405–412, 1994. [15] T. Kozłowski, E. Dagless, J. Saul, M. Adamski, and J. Szajna, “Parallel controller synthesis using Petri nets,” IEE Proceedings – Computers and Digital Techniques, vol. 142, no. 4, pp. 263–271, 1995. 
[16] A. Wȩgrzyn, “On decomposition of Petri net by means of coloring,” in Proceedings of IEEE East-West Design & Test Workshop EWDTW’06, Sochi, Russia, 2006, pp. 407–413. [17] J. Tkacz, “State machine type colouring of Petri net by means of using a symbolic deduction method,” Measurement Automation and Monitoring, vol. 53, no. 5, pp. 120–122, 2007. [18] A. Bukowiec, “Synthesis of FSMs based on architectural decomposition with joined multiple encoding,” International Journal of Electronics and Telecommunications, vol. 58, no. 1, pp. 35–41, 2012. [19] A. Bukowiec and M. Adamski, “Logic synthesis for FPGAs of inter- preted Petri net with common operation memory,” in 11th IFAC/IEEE International Conference on Programmable Devices and Embedded Systems PDeS 2012. Proceedings., Z. Bradáč, F. Bradáč, and F. Zezulka, Eds. Brno, Czech Republic: IFAC-PapersOnLine, 2012, pp. 57–62. [20] A. Bukowiec, Synthesis of Finite State Machines for FPGA devices based on Architectural Decomposition, ser. Lecture Notes in Control and Computer Science. Zielona Góra: University of Zielona Góra Press, 2009, vol. 13. [21] M. Adamski, “Formal logic design of reprogrammable controllers,” in Design of Embedded Control Systems, M. Adamski, A. Karatkevich, and M. Wȩgrzyn, Eds. New York: Springer, 2005, pp. 15–26. [22] A. Karatkevich, “On macroplaces in Petri nets,” in Proceedings of IEEE East-West Design & Test Symposium EWDTS’08. Lviv, Ukraine: IEEE, 2008, pp. 418–422. [23] A. Bukowiec and M. Adamski, “Synthesis of Petri nets into FPGA with operation flexible memories,” in Proceedings of the IEEE 15th International Symposium on Design and Diagnostics of Electronic Circuits and Systems DDECS’12, Tallinn, Estonia, 2012, pp. 16–21. [24] S. Suhaib, D. Mathaikutty, and S. Shukla, “Dataflow architectures for GALS,” Electronic Notes in Theoretical Computer Science, vol. 200, no. 1, pp. 33–50, Feb. 2008. [25] H. 
Kubátová, “Finite state machine implementation in FPGAs,” in Design of Embedded Control Systems, M. Adamski, A. Karatkevich, and M. Wȩgrzyn, Eds. New York: Springer, 2005, pp. 177–187. [26] G. Łabiak, M. Adamski, M. Doligalski, J. Tkacz, and A. Bukowiec, “UML modelling in rigorous design methodology for discrete con- trollers,” International Journal of Electronics and Telecommunications, vol. 58, no. 1, pp. 27–34, 2012. [27] J. M. Lee, Verilog QuickStart: A Practical Guide to Simulation and Synthesis in Verilog. Norwell, MA: Kluwer Academic Publishers, 1999. Arkadiusz Bukowiec was born in Zielona Góra. He received a Bachelor degree in computer engi- neering from Technical University of Zielona Góra. During these studies he completed industrial practice at Aldec Inc. in Henderson, NV, USA. Then, he received Master degree and a Ph.D. degree in com- puter science from the University of Zielona Góra. During the Ph.D. studies he spent one semester at New University of Lisbon, Portugal. Since 2003, he has been working at the University of Zielona Góra. His research interests include methods of design, synthesis and verification of digital circuits. INTERNATIONAL JOURNAL OF DESIGN, ANALYSIS AND TOOLS FOR INTEGRATED CIRCUITS AND SYSTEMS, VOL. 4, NO. 2, DECEMBER 2013 10 Design of Structured Configurable Controllers Using Gentzen Reasoning Jacek Tkacz and Marian Adamski Abstract—The paper is concentrated on behavioral and struc- Logic, describe both structure of the net as well as the intended tural specification of reconfigurable logic controllers (RLC). The behavior of the logic controller, given in professional hardware initial description is given as a hierarchical modular control description language. Gentzen symbolic reasoning [8], [9] interpreted Petri net. 
The strategy developed and promoted in this paper is based on the hierarchical decomposition of Petri establishes a link between Boolean expressions, commonly Nets into nested, self-contained and structurally ordered subnets, used in a digital system design on RTL level and the Petri which are suitable for distributed state encoding as well as flexible net model of a Hierarchical Concurrent State Machine [1], reconfiguration. All structured modules are easily recognized by [7]. their symbolic names of the configuration (coordination) places, The paper introduces only the short outline of proposed which are only marked if selected modules are active. On the abstract level of the logic synthesis specification is written in general design methodology. It is assumed that the reader formal propositional Gentzen sequent language. This decision has a basic knowledge of Petri nets [9], [10], [11], [12]. The rules describe both the structure of the net and the intended modular approach to specification and synthesis of concurrent behavior of the logic controller. The rapid modeling in FPGA controllers is focused and direct hierarchical mapping of Petri can be done directly from rule-based expressions, written in the nets into FPGA structure is demonstrated. hardware description language, for example in VHDL. The proposed design methodology is based on a formal Index Terms—Gentzen logic, sequents, Petri net, logic synthe- mapping of specification in propositional Gentzen logic into sis, logic controllers, FPGA, VHDL. very close equivalent description of the implementation, which is accepted directly by VHDL [13]. I. I NTRODUCTION II. T HEORETIC PRELIMINARY T HE presents covers an effective technique for computer- based synthesis of Reconfigurable Logic Controllers, which start from the given Petri net based behavioural spec- A. Colored Control Interpreted Petri net A simple Petri net [10], [14] is defined as a triple ification. 
The main goal of the proposed design style is to continuously preserve the direct, self-evident correspondence P N = (P, T, F ), (1) between a hierarchical interpreted Petri net and modular struc- where: tured implementation of the digital system, which is designed P is a finite non-empty set of places, in Field Programmable Logic (FPL). The control part of the P = {p1 , . . . , pM } small discrete embedded system can be realised as Application T is a finite non-empty set of transitions, Specific Logic Controller, which is implemented as a separated T = {t1 , . . . , tS } FPGA device or a part of embedded integrated microsystem. In F is a set of arcs from places to transitions and from such a case it frequently collaborates with electromechanical transitions to places. operational units. On the other hand the discrete controller An interpreted Petri net is enhanced with feature for infor- can be also applied as a central part of the modern integrated mation exchange with environment [11], [12], [14], [15]. This embedded microsystem. exchange is made by using binary signals [13], [16]. There The main goal of the proposed rapid design style is to can be distinguished two types of interpreted Petri nets: Moore preserve the self-evident correspondence among modular inter- type and Mealy type. preted Petri net, the related symbolic rule-based specification, The Boolean variables occurring in the interpreted control and final logic design expressions, which are directly mapped Petri net are divided into two sets: into configurable logic arrays FPGA [1], [2], [3]. X is a set of input variables, X = {x1 , . . . , xL }, The paper demonstrates logic design techniques, which can Y is a set of output variables, Y = {y1 , . . . , yN }. be used for rigorous computer-based synthesis of Reconfig- urable Logic Controllers (RLC) [1], [4], [5], [6], [7] and its The guarded (labeled) transition ts is enabled and can be fired hardware implementation. 
Initially, the behavior of the con- if all its input places (•ts ) are marked and current value of troller is specified as hierarchical colored control interpreted corresponding Boolean function ϕs is logically true (one). In Petri net. The decision rules, written in Propositional Sequent case of Moore type interpreted Petri net, ψm is an elementary conjunction of some output variables from the set Y . If the J. Tkacz and M. Adamski are with Institute of Computer Engineering and place pm is marked the output variables from corresponding Electronics, University of Zielona Góra, ul. Podgórna 50, 65-246 Zielona conjunction ψm are set to logical one. Góra, Poland. E-mail: {j.tkacz, m.adamski}@iie.uz.zgora.pl The research was financed from budget resources intended for science in A Petri net can be also enhanced by assigning colors 2010–2013 as an own research project No. N N516 513939. to places and transitions [9], [12], [15]. These colors help INTERNATIONAL JOURNAL OF DESIGN, ANALYSIS AND TOOLS FOR INTEGRATED CIRCUITS AND SYSTEMS, VOL. 4, NO. 2, DECEMBER 2013 11 to intuitively and formally validate the consistency of all 1 (a∨b∨c)∧(a∨¬b) |-; sequential processes in the considered control interpreted Petri ∧|- Removed sequents net [1], [7], [11], [13], [17]. Each color can recognize one state 2 (a∨b∨c),(a∨¬b) |-; machine module. The rules for Petri net coloring are given in ∨|- [16], [18]. 3 a,a∨¬b|-; 5 b∨c,a∨¬b|-; ∨|- ∨|- B. Hierarchical Concurent State Machine In the traditional sequential state machine model the logic 4 a,a|-; a,¬b|-; 6 b,a∨¬b|-; 9 c,a∨¬b|-; controller changes only its global internal states, which are ¬ |- 7 8 ∨|- 10 11 ∨|- usually recognised by their mnemonic names. The set of all a|-b; b,a|-; b,¬b|-; c,a|-; c,¬b|-; the possible global internal states is finite and fixed. 
¬ |- In the considered hierarchical Petri net approach a concur- 12 c|-b; rent state machine simultaneously holds several local states and some independent local state changes can occur, forcing Fig. 1. Example of sequent normalization separate local internal events. The local state changes in the modeled local state space are first precisely recognized and then graphically represented as related Petri net transitions, example the rule of the disjunction operator elimination will be usually with several input places and output places. The global presented. If the main logical operator in sequent is disjunction states of the controller can be given as maximal subsets of the located in antecedent then two new related together sequent local states, which simultaneously hold. will be produced. The first sequent contains the first argument The efficient tactic presented in this paper is based on of in antecedent, and the second sequent will contain the modular, possibly hierarchical decomposition of the local state second argument of disjunction also in antecedent (2). space of the concurrent state machine into the self-contained and structurally ordered subsets. They are easily recognized Θ, Φ ∨ Ψ, Γ ` Π (2) by individual part of internal state code from embedded Θ, Φ, Γ ` Π Θ, Ψ, Γ ` Π places. The total codes of Petri net subnets, represented If the main logical operator in sequent is disjunction located in as macroplaces, are obtained by merging of common parts succedent then in the next step only one sequent is produced of appropriate individual place codes. Since the local state and logic operator or is replaced by comma in succedent. In changes are isolated in particular modular parts of the FPGA such away both of arguments of disjunction are separated by structure, the combinational excitation functions for flip-flops, a comma in succedent (3). which belong to the state and output registers of the controller, are relatively simple. 
They can be easily implemented by any Λ ` Θ, Φ ∨ Ψ, Γ (3) kind of macrocell, with JK, D or T flip-flops. In the paper T Λ ` Θ, Φ, Ψ flip-flop [4], [5], [19], [20] with additional enabling inputs is The elimination process is repeated, while only normalized promoted. When the structured state encoding is applied, the sequents without logical operators are obtained (Fig. 1). redesign of the digital system is much simpler. Any part of The normalized sequent is a sequent without any logical a behavioural specification can be immediately recognised in operators. The sequent is a tautology iff it has the same the regular cell structure of digital circuits on the proper level formula in antecedent and succedent. The located tautology of abstraction and improved or rejected and replaced. sequents could be removed from further normalization. Iff all normalized sequents are tautologies the analyzed root sequent C. Reasoning in Gentzen system is also a tautology. When one of leafs in deduction tree (Fig. The sequent is a formalized statement used for deduction 1) is not a tautology it means that it is a counterexample and calculi [8], [21], [22]. In the sequent calculus, sequents for analyzed sequent. The consensus method is included into are used for specification of judgment that are characteristic formal deduction (Gentzen cut rule). to deduction system. The sequent is defined as a ordered The current version of implementation of Gentzen system pair (Γ, ∆), where Γ and ∆ are finite sets of formulas, and ”GENTZEN v6.7.2” accept many types of logical operators, Γ = {A1 , A2 , . . . , Am }, ∆ = {B1 , B2 , . . . , Bn }. Instead of taken from Palasm (*, +, /, ->, <->, <+>), VHDL (and, (Γ, ∆) it is used notation with use of turnstile symbol Γ ` ∆. or, nor, xor, and, not, <=), Verilog (&, |, !, ˆ), Linear Γ is called the antecedent and ∆ is a succedent of the sequent. logic (⊕, ⊗, () and NuSMV (&, |, xor, xnor, !). 
The The sequent Γ ` ∆ is satisfiable for Vmthe valuation Wnv iff for implemented system is optimized through the introduction of the same valuation v the formula i=1 Ai → j=1 Bj is elements of the Thelens algorithm [14], [23] and resolution satisfied. In proposed implementation of Gentzen system there algorithm [22]. Results obtained from the system can be are defined ten rules of elimination of logic operators. For each automatically transformed to VHDL, VERILOG, NuSMV operator (negation, disjunction, conjunction, implication and (model for model checking system), Espresso, and classic CNF equivalency), there are defined two rules of its elimination. and DNF form. As result of sequent normalization (Fig. 1) First rule is used when the operator is located in antecedent it is proven that the relatively complicated sequent 1 can be and the second one when it is located in succedent. As an decomposed into two simple sequents 4 and 12. INTERNATIONAL JOURNAL OF DESIGN, ANALYSIS AND TOOLS FOR INTEGRATED CIRCUITS AND SYSTEMS, VOL. 4, NO. 2, DECEMBER 2013 12 Aggregate Cement Water TABLE I feeder feeder feeder L IST OF TRANSITIONS AND GUARDS Transition Guard Interpretation of guard YT1 YT2 XN2 t1 XN1 Required value of aggregate is reached XN1 t2 1 Always true Scales XN1 YT1 t3 XF1 The scale is empty XF1 Mixer YV2 YV1 XN2 YT2 t4 XN2 Required value of cement is reached controller arm XF2 Logic XF1 YV1 t5 XF1 The scale is empty YM XF2 YV2 t6 1 Always true YV3 Content XF4 XF3 t7 XF4 Ingredients are intermixed Timer XF4 YM mixer t8 XF3 Cement mixer is empty XF3 t9 XF2 Required value of cement is reached YV3 TABLE II L IST OF PLACES AND OUTPUTS Fig. 2. 
Mechanical part of discrete control system Place Output Interpretation of place [1] P1 YT1 First dozing of cement YT1 P1 P2 - Waiting t1 XN1 P3 - Waiting P4 YV1 First emptying the scale [1] P2 [2] P3 P5 YT1 Second dozing of cement P6 YV1 Second emptying the scale t2 P7 - Waiting YV1 P8 - Waiting [1] P4 P9 YV2 Dosing of water t3 XF1 P10 YM Mixing of compounds P11 YV3 Emptying the mixer YT2 [1] P5 [2] P12 t4 XN2 YV1 YV2 could be colored by a designer during the initial specification [1] P6 [3] P9 process to demonstrate its preferable State Machines subnets t5 t9 XF2 (Fig. 3). These colors help to validate intuitively and formally XF1 [2] P7 the consistency of all sequential processes in the considered [3] P8 discrete state model. The control Petri net is covered by t6 three separated State-Machine components SM1 , SM2 , SM3 , YM [2] P10 recognized by colors {[1], [2], [3]}: t7 XF4 [3] P13 SM1 [1] = {P1 , P2 , P4 , P5 , P6 }; YV3 [2] P11 SM2 [2] = {P3 , P12 , P7 , P10 , P11 }; t8 XF3 SM3 [3] = {P9 , P8 , P13 }; Fig. 3. Petri net model B. Place centered specification of Petri net in Gentzen logic In formal description of the Petri net, letters stand for the III. RULE - BASED SPECIFICATION OF P ETRI NET symbols from Gentzen propositional logic [8], [9]. Symbol A. Example of control system and denotes conjunction symbol or denotes disjunction, sym- bol not - negation, symbol ⇐ backward implication, symbol The controlled plant (Fig. 2) consists of three feeders, xor - exclusive or, symbol ⇔ equivalence. scales and content mixer. The example first introduced by P. In the paper the specification of Petri net is concentrated Misiurewicz has been used as benchmark in several papers, around places with their input and output transitions. Such among others in [13], [24]. strategy makes possible to represent Petri net places as a The logic controller has six inputs separated elementary parts of Petri net. 
{XN1 , XN2 , XF1 , XF2 , XF3 , XF4 } and six outputs The logic description describes the changes of Petri net {Y T1 , Y T2 , Y V1 , Y N2 , Y V3 , Y M }. The example of control- markings, separately for any place. The autonomous place interpreted Petri net, which describes the behavior of control system is presented in Fig. 3. The Petri net places {P1 , P2 , . . . , P11 } stand for the local states of concurrent state a) b) c) ti tj machine. The transitions t1 . . . t9 describe events in terms of local changes inside the Petri net state space. Boolean Pi Pn Pm Pi expressions XN1 . . . XF4 called guards give the external MPn tj conditions for transitions to be enabled and fired. The colored tk tl guardti ti Pm coordination places P12 and P13 in Fig. 3 are optional. The guarded events are strongly related with transitions guardtl tl of the net (Table I). The Moore type outputs Y T1 . . . Y M are attached to places (Table II). The basic, one-level net Fig. 4. Symbolic representation of Petri net parts INTERNATIONAL JOURNAL OF DESIGN, ANALYSIS AND TOOLS FOR INTEGRATED CIRCUITS AND SYSTEMS, VOL. 4, NO. 2, DECEMBER 2013 13 TABLE III Pn is considered together with its input {ti . . . tj } and output P RECONDITION AND OUTPUTS transitions {tk . . . tl } as a basic component of the net (Fig. 4). Precondition of transitions Moore type outputs The precondition for firing transition ti is: ti ⇐ pm and pl ` t1 ⇐ P1 and XN1 ; ` Y T 1 ⇐ P1 ; and guardti . The place safely gets its token if one of its input ` t2 ⇐ P2 and P3 ; ` Y V1 ⇐ P4 ; transition fires. The next marking for a place Pn (Fig. 4a) is ` t3 ⇐ P4 and XF1 ; ` Y T 2 ⇐ P5 ; ` t4 ⇐ P5 and XN2 ; ` Y V1 ⇐ P6 ; defined as follows: ` t5 ⇐ P6 and XF1 ; ` Y V2 ⇐ P9 ; ` t6 ⇐ P7 and P8 ; ` Y V3 ⇐ P11 ; @Pn ⇐ Pn xor ((ti xor tj ) xor (tk xor tl )) (4) ` t7 ⇐ P10 and XF4 ; ` Y V M ⇐ P10 ; ` t8 ⇐ P11 and XF3 ; If net is safe xor operator in expression (ti xor tj) can be ` t9 ⇐ P9 and XF2 ; replaced by or. 
If the place deterministically loses its token the expression (tk xor tl ) is simplified to (tk or tl ) [19]. YT1 P1 [1] The precondition of local transitions ti (Fig. 4b) is defined as t1 XN1 follows: [1] [1] [2] [2] P2 MP1 MP2 P3 ti ⇐ Pm and Pi and guardti (5) t2 Macroplace M Pn from the example (Fig. 4c) contains se- YV1 P4 [1, 2] quential places Pi and Pm . The common, boundary transition t3 XF1 tl presented on Fig. 4c can be described as follows: [1 2] YT2 MP3 P5 [1, 2] tl ⇐ M Pn and Pm and guardtl (6) t4 XN2 YV1 [3] To make the specification close with VHDL syntax and P6 [1, 2] P9 [3] YV2 semantics, the sequents with empty left side are used: ”` Φ;” t5 XF1 t9 XF2 MP5 where Φ is formula in propositional logic. Symbol @ defines next operator from propositional temporal logic and it is [2] MP4 P7 [2] P8 [3] usually also omitted. t6 The first part of state-event-state (place-transition-place) YM description of one level Petri net is given i the Table III. P10 [2, 3] Each line of description represents a single event in Petri [2, 3] t7 XF4 MP6 net as as transition with its preconditions. YV3 P11 [2, 3] The Moore type combinational outputs are related with t8 XF3 places, which contain token when that output is active: ` Y T 1 ⇐ P1 ; ` Y V1 ⇐ P4 or P6 ; ... ` Y V3 ⇐ P11 ; Fig. 5. Hierarchical macronet Different forms of rule-based specification can be found in papers [4], [14], [19]. Petri net specification format (PNSF) Some of registered outputs can serve also as local one-hot serves as convenient textual form for an automatic translation state codes, except Y V1 , which is generated twice in local of transition rules into VHDL or Verilog code [13]. states P4 and P6 : Y V1 ⇐ P4 or P6 : After concurrent one-hot encoding symbols P1 ..P11 are ` @Y T1 ⇐ Y T1 xor (t5 xor t1 ); treated as a names of flip-flops contained in distributed local ` @Y T2 ⇐ Y T2 xor (t3 xor t4 ); state register. 
The outputs can be traditionally generated in a combinational circuit: YT1 ⇐ P1 . . . YM ⇐ P10 (Table III). Finally, the changes of place markings are:

⊢ @P1 ⇐ P1 xor (t5 xor t1);
⊢ @P2 ⇐ P2 xor (t1 xor t2);
⊢ @P3 ⇐ P3 xor (t8 xor t2);
⊢ @P4 ⇐ P4 xor (t2 xor t3);
⊢ @P5 ⇐ P5 xor (t3 xor t4);
⊢ @P6 ⇐ P6 xor (t4 xor t5);
⊢ @P7 ⇐ P7 xor (t5 xor t6);
⊢ @P8 ⇐ P8 xor (t9 xor t6);
⊢ @P9 ⇐ P9 xor (t8 xor t9);
⊢ @P10 ⇐ P10 xor (t6 xor t7);
⊢ @P11 ⇐ P11 xor (t7 xor t8);

⊢ @YV2 ⇐ YV2 xor (t8 xor t9);
⊢ @YM ⇐ YM xor (t6 xor t7);
⊢ @YV3 ⇐ YV3 xor (t7 xor t8);

The full list of preconditions of local transitions and descriptions of Moore-type outputs is given in Table III.

IV. COLORED CONTROL INTERPRETED PETRI NET

A. Concurrently decomposed colored Petri net

The rules for the Petri net coloring are [7], [16], [18]:
1) If a place has a color, each of its input and output transitions must have the same color.
2) Each place and transition must have at least one color.
3) The input places of each transition must hold different colors.
4) The output places of each transition must hold different colors.
5) The input and output places of a transition must share the same set of colors.
6) There are no two or more initially marked places which share exactly the same set of colors.
7) The number of different colors which are shared by the initially marked places is equal to the total number of colors.

The colors {[1], [2], [3]}, which paint the Petri net places in Fig. 3, separate three concurrent sequences of local state changes in non-overlapping state machine components. The new version of the computer program iCPN efficiently reduces the net and finds a suitable minimal coloring from the topological structure of the interpreted Petri net graph [2]. The concurrently decomposed Petri net can be considered as a restricted form of general Jensen Petri nets [15]. Conventionally the net is state-encoded using eight variables {Q1 . . . Q8} distributed among linked state machines (Table V). Alternatively, the net could be one-hot encoded using thirteen variables {Q1 . . . Q13} [2], [5], [12], [13].

TABLE V
EXAMPLE OF ENCODING
SM1 [1]         SM2 [2]         SM3 [3]
{Q1, Q2, Q3}    {Q4, Q5, Q6}    {Q7, Q8}
P1 = 000        P3 = 000        P9 = 00
P2 = 001        P12 = 001       P8 = 01
P4 = 011        P7 = 011        P13 = 11
P5 = 010        P10 = 010
P6 = 110        P11 = 110

B. Petri net-based testing of liveness and coloring using Gentzen logic

It is assumed that the Petri net has already been reduced to a hierarchical macronet (Fig. 5), which will be analyzed in the next steps. The Gentzen sequents show in a symbolic way the relations between input and output places of all transitions. They are determined on the basis of the topological structure of an uninterpreted skeleton of the Petri net. Using the method presented in [10], separate sequents for siphons (deadlocks) (7) and traps (8) were created.

MP3 → (MP1 + MP2),
(MP1 + MP4) → MP3,
MP6 → (MP4 + MP5),
(MP2 + MP5) → MP6 ⊢;        (7)

(MP1 + MP2) → MP3,
MP3 → (MP1 + MP4),
(MP4 + MP5) → MP6,
MP6 → (MP2 + MP5) ⊢;        (8)

In order to check if the Petri net is live, traps equal to siphons (deadlocks) are detected [7], [16], [23]. Sets of marked traps which are equivalent to siphons determine potential state machine subnets, which are present inside the Petri net. Each of the subnets is marked with a different color, flagging also its places and transitions. Traps not included in siphons indicate potential net defects. The net is not live if all the siphons do not contain traps.

After removing the right sides of the reduced sequents and after leaving out only the siphons dominated by marked traps, three potential automata subnets are obtained (Table IV).

TABLE IV
SIPHONS AND TRAPS
Siphons               Traps                 Colors
MP2, MP3, MP4, MP6    MP2, MP3, MP4, MP6    c2
MP1, MP3              MP1, MP3              c1
MP5, MP6              MP5, MP6              c3

What differentiates sequent calculus from other methods, e.g. those used in [10], [14], is that the conversion to clausal form is not required. Moreover, by using cut and consensus, the laborious process of result selection is avoided. Siphons and traps that are not minimal are eliminated beforehand.

C. Encoding inside state machine modules

The encoding from Table V can be used to obtain a more compact, dense state space. The number of logic variables is reduced to eight by commercial tools [2]. Every state machine subnet is encoded separately. The changes of marking are now represented by changes of local state variables. The current value of a registered signal is denoted Q, and its next value is written as @Q [4].

-- SM1 --
⊢ @Q1 ⇐ Q1 xor (t4 xor t5);
⊢ @Q2 ⇐ Q2 xor (t2 xor t5);
⊢ @Q3 ⇐ Q3 xor (t1 xor t3);
-- SM2 --
⊢ @Q4 ⇐ Q4 xor (t7 xor t8);
⊢ @Q5 ⇐ Q5 xor (t5 xor t8);
⊢ @Q6 ⇐ Q6 xor (t2 xor t6);
-- SM3 --
⊢ @Q7 ⇐ Q7 xor (t6 xor t8);
⊢ @Q8 ⇐ Q8 xor (t9 xor t8);

In this case combinational outputs should be generated as follows:

⊢ YT1 ⇐ not Q1 and not Q2 and not Q3;
...
⊢ YV3 ⇐ Q4 and Q5 and not Q6;

V. IMPLEMENTATION OF COLORED HIERARCHICAL MACRONET

A. Modular specification of logic controller

Together with coloring, a Petri net can be converted into a suitable hierarchical description. As an example, the initial, basic net from Fig. 3 was reduced to the macronet with macroplaces MP1 . . . MP6 (Fig. 5). Transitions with more than one input place or more than one output place, such as t2, t5, t6, t8, are called boundary transitions. Transfer transitions with one input and one output place are hidden inside first-order macroplaces. Fusion of Series Places (FSP) and Fusion of Parallel Places (FPP) [9], [14] are used recursively during coloring [16], [18], until the macronet becomes irreducible.
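The marking-update sequents of the form @P ⇐ P xor (t_in xor t_out) used above can be exercised with a small token-game simulation. The following is a minimal sketch in Python, not the authors' tool: the arc table is read off the eleven place-marking sequents, the initial marking {P1, P3, P9} is an assumption (the places with all-zero codes in Table V), and the external input guards (XN1, XF1, ...) are deliberately ignored. The names `ARCS`, `INITIAL`, `enabled`, `step` and `run` are hypothetical helpers introduced only for this illustration.

```python
# Arcs read off the marking-update sequents "@P <= P xor (t_in xor t_out)":
# place -> (transition that marks it, transition that unmarks it).
ARCS = {
    "P1": ("t5", "t1"), "P2": ("t1", "t2"), "P3": ("t8", "t2"),
    "P4": ("t2", "t3"), "P5": ("t3", "t4"), "P6": ("t4", "t5"),
    "P7": ("t5", "t6"), "P8": ("t9", "t6"), "P9": ("t8", "t9"),
    "P10": ("t6", "t7"), "P11": ("t7", "t8"),
}

# Assumed initial marking: the places encoded with all-zero codes in Table V.
INITIAL = frozenset({"P1", "P3", "P9"})


def enabled(marking, t):
    """A transition is enabled when every one of its input places is marked.

    External input guards (XN1, XF1, ...) are ignored in this sketch.
    """
    return all(p in marking for p, (_, t_out) in ARCS.items() if t_out == t)


def step(marking, fired):
    """Apply @P = P xor (t_in xor t_out) to every place for the fired set."""
    return frozenset(
        p for p, (t_in, t_out) in ARCS.items()
        if (p in marking) ^ ((t_in in fired) ^ (t_out in fired))
    )


def run(sequence, marking=INITIAL):
    """Fire the transitions one by one, refusing any that is not enabled."""
    for t in sequence:
        if not enabled(marking, t):
            raise ValueError(f"{t} is not enabled in {sorted(marking)}")
        marking = step(marking, {t})
    return marking


if __name__ == "__main__":
    cycle = ["t1", "t2", "t3", "t4", "t5", "t9", "t6", "t7", "t8"]
    print(sorted(run(cycle)))
```

Under these assumptions, firing t1, t2, t3, t4, t5, t9, t6, t7, t8 in that order brings the sketch back to its initial marking, and `step` also accepts a set of concurrently fired transitions such as {t1, t9}, which mirrors the way all flip-flops are updated in the same clock cycle.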
It should be noted that macroplaces which are painted with disjoint sets of colors are evidently concurrent to each other. Macroplaces sharing the same color are sequentially related to each other. The special implicit configuration (coordination) places MP1 . . . MP6 detect all the Petri net subnets which they dominate. During the hierarchical state encoding only a proper subset of them is necessary to detect the groups of places [7].

B. General template for modular logic design

Symbols MP1 . . . MP6 are the names of macroplaces as well as the names of their coordination places (subnet flags) from Fig. 5. Petri net places p1 . . . p11 are related to capital letters denoting single-bit memory elements (flip-flops): P1 . . . P11.

The main part of a novel template for the formal description of a Petri net in the Gentzen sequent logic language is as follows:

Preconditions of boundary transitions:

⊢ t2 ⇐ MP1 and MP2 and P2;
⊢ t5 ⇐ MP3 and P6;
⊢ t6 ⇐ MP4 and MP5 and P7 and P8;
⊢ t8 ⇐ MP6 and P11;

Preconditions of local transitions:

⊢ t1 ⇐ MP1 and P1 and XN1;
⊢ t3 ⇐ MP3 and P4 and XF1;
⊢ t4 ⇐ MP3 and P5 and XN2;
⊢ t7 ⇐ MP6 and P10 and XF4;
⊢ t9 ⇐ MP5 and P9 and XF2;

Flags of macrostates (macroplaces):

⊢ @MP1 ⇐ MP1 xor (t5 xor t2);
⊢ @MP2 ⇐ MP2 xor (t8 xor t2);
⊢ @MP3 ⇐ MP3 xor (t2 xor t5);
⊢ @MP4 ⇐ MP4 xor (t5 xor t6);
⊢ @MP5 ⇐ MP5 xor (t8 xor t6);
⊢ @MP6 ⇐ MP6 xor (t6 xor t8);

Places are encoded inside macroplaces. Changes of local places are as follows:

⊢ @P1 ⇐ (P1 and MP1) xor (t5 xor t1);
⊢ @P2 ⇐ (P2 and MP1) xor (t1 xor t2);
...
⊢ @P10 ⇐ (P10 and MP6) xor (t6 xor t7);
⊢ @P11 ⇐ (P11 and MP6) xor (t7 xor t8);

C. One-hot encoding of macroplaces

For rapid prototyping, macroplaces are coded by means of registered outputs Q1 . . . Q6 as shown in Fig. 6:

MP1 ⇔ Q1; MP2 ⇔ Q2; MP3 ⇔ Q3;
MP4 ⇔ Q4; MP5 ⇔ Q5; MP6 ⇔ Q6;

It is easy, but not recommended, to find codes for the local places P1 . . . P11 using only additional variables Q7 . . . Q17.

D. Local encoding inside macroplaces with registered outputs

The local places can be encoded using registered outputs. The codes of local places are as follows:

P1 ⇔ YT1; P2 ⇔ not YT1; P3 ⇔ not YV3;
P4 ⇔ YV1a; P5 ⇔ YT2; P6 ⇔ YV1b;
P7 ⇔ not YV1; P8 ⇔ not YV2; P9 ⇔ YV2;
P10 ⇔ YM; P11 ⇔ YV3;

After this kind of encoding of the places {P1, P4, P5, P9, P10}, the preconditions of local transitions are as follows:

⊢ t1 ⇐ MP1 and YT1 and XN1;
⊢ t3 ⇐ MP3 and YV1a and XF1;
⊢ t4 ⇐ MP3 and YT2 and XN2;
⊢ t7 ⇐ MP6 and YM and XF4;
⊢ t9 ⇐ MP5 and YV2 and XF2;

As a result of replacing the names of places by the related output names, the changes of local places are described as follows:

⊢ @YT1 ⇐ (YT1 and MP1) xor (t5 xor t1); /* @P1 */
⊢ @YV1a ⇐ (YV1a and MP3) xor (t2 xor t3); /* @P4 */
⊢ @YT2 ⇐ (YT2 and MP3) xor (t3 xor t4); /* @P5 */
⊢ @YV1b ⇐ (YV1b and MP3) xor (t4 xor t5); /* @P6 */
⊢ @YV2 ⇐ (YV2 and MP5) xor (t8 xor t9); /* @P9 */
⊢ @YM ⇐ (YM and MP6) xor (t6 xor t7); /* @P10 */
⊢ @YV3 ⇐ (YV3 and MP6) xor (t7 xor t8); /* @P11 */

E. State machine style for macroplace encoding

As a next optimization, the macroplaces can be encoded in state machine style. The first subnet, corresponding to color [1], which contains macroplaces MP1 and MP3, can be encoded by one logic variable (Q1). The second subnet, which contains macroplaces MP5 and MP6, can also be encoded by one logic variable (Q3). The last subnet, corresponding to color [3], which contains macroplaces {MP2, MP3, MP4, MP6}, needs two logic variables, Q2 and Q4. Macroplaces MP2 and MP4 get one-hot codes:

MP1 ⇔ Q1; MP3 ⇔ not Q1; MP5 ⇔ Q3;
MP6 ⇔ not Q3; MP2 ⇔ Q2; MP4 ⇔ Q4;        (9)

The symbol ⇔ used above denotes logic equivalence. The number of flip-flops is reduced from six to four. The number of expressions describing flags is now only four:

⊢ @Q1 ⇐ Q1 xor (t5 xor t2);
⊢ @Q2 ⇐ Q2 xor (t8 xor t2);
⊢ @Q4 ⇐ Q4 xor (t5 xor t6);
⊢ @Q3 ⇐ Q3 xor (t8 xor t6);

In this case the preconditions of boundary transitions and the preconditions of local transitions should also be changed by replacing macroplaces with the codes presented in expressions (9).

VI. VHDL-STYLE OF THE MODULAR PETRI NET DESCRIPTION

The preferable way of controller rapid prototyping is hierarchical design from a formal assertion-based [25] behavioral description, using professional HDL syntax. One of the possible versions of the general template [13] is presented in Fig. 6. For pragmatic reasons the controller is realized as a synchronous digital system with a distributed state register MP1 . . . MP6 and a distributed output register YT1 . . . YM. In general, the state register and the output register can be merged.

All concurrently enabled transitions can fire independently, in any order. It is considered that after animation and classical analysis, the implemented interpreted Petri net is checked as safe, live, reversible and without unsolved conflicts [14], [15]. If, nevertheless, some transitions of the net are in conflict or the net is not safe (1-bounded), the detected partial state of the net is frozen (state changes stop). Registered outputs can be used both for preconditions and for local state coding. The simulation results obtained from the Active-HDL tool are shown in Fig. 7.

    library IEEE;
    use IEEE.STD_LOGIC_1164.all;

    entity reactor is
      port (CLK, RESET: in std_logic;
            XN1, XN2, XF1, XF2, XF3, XF4: in std_logic;
            YT1, YT2, YV1, YV2, YV3, YM: out std_logic);
    end reactor;

    architecture reactor_t of reactor is
      signal Q1, Q2, Q3, Q4, Q5, Q6: std_logic;
      signal Y_T1, Y_T2, Y_V1a, Y_V1b: std_logic;
      signal Y_V2, Y_V3, Y_M: std_logic;
      signal t1, t2, t3, t4, t5, t6, t7, t8, t9: std_logic;
    begin
      -- Precondition of border transitions
      t2 <= Q1 and Q2 and not Y_T1 and not Y_V3;
      t5 <= Q3 and Y_V1b and XF1;
      t6 <= Q4 and Q5 and not Y_V1b and not Y_V2;
      t8 <= Q6 and Y_V3 and XF3;
      -- Precondition of local transitions
      t1 <= Q1 and Y_T1 and XN1;
      t3 <= Q3 and Y_V1a and XF1;
      t4 <= Q3 and Y_T2 and XN2;
      t7 <= Q6 and Y_M and XF4;
      t9 <= Q5 and Y_V2 and XF2;

      FF: process (CLK, RESET) -- Transition firing
      begin
        if RESET = '1' then -- Asynchronous reset
          Q1 <= '1'; Q2 <= '1'; Q3 <= '0';
          Q4 <= '0'; Q5 <= '1'; Q6 <= '0';
          Y_T1 <= '1'; Y_T2 <= '0'; Y_V1a <= '0';
          Y_V1b <= '0'; Y_V2 <= '1'; Y_V3 <= '0'; Y_M <= '0';
        elsif rising_edge(CLK) then
          Q1 <= Q1 xor (t5 xor t2);
          Q2 <= Q2 xor (t8 xor t2);
          Q3 <= Q3 xor (t2 xor t5);
          Q4 <= Q4 xor (t5 xor t6);
          Q5 <= Q5 xor (t8 xor t6);
          Q6 <= Q6 xor (t6 xor t8);
          Y_T1 <= (Y_T1 and Q1) xor (t5 xor t1);
          Y_T2 <= (Y_T2 and Q3) xor (t3 xor t4);
          Y_V1a <= (Y_V1a and Q3) xor (t2 xor t3);
          Y_V1b <= (Y_V1b and Q3) xor (t4 xor t5);
          Y_V2 <= (Y_V2 and Q5) xor (t8 xor t9);
          Y_V3 <= (Y_V3 and Q6) xor (t7 xor t8);
          Y_M <= (Y_M and Q6) xor (t6 xor t7);
        end if;
      end process;

      -- Outputs
      YT1 <= Y_T1;
      YT2 <= Y_T2;
      YV1 <= Y_V1a or Y_V1b;
      YV2 <= Y_V2;
      YV3 <= Y_V3;
      YM <= Y_M;
    end reactor_t;

Fig. 6. Sample VHDL code

Fig. 7. Simulation results from AHDL tool

VII. RESULTS OF EXPERIMENTS

The macroplace-centered and place-centered decomposition and encoding of an SM-colored Petri net is preferable from the point of view of FPGA resource utilization [2], [5]. Additionally, it makes possible the flexible reuse of previously tested, encoded Petri net components. Synthesis after classic hierarchical one-hot state encoding of macroplaces and places needs 17 flip-flops. In the case of economical rapid concurrent one-hot local encoding of merged places and registered outputs (Fig. 6) it is necessary to use only 13 shared additional encoding variables. The synthesis result using Xilinx Virtex-II Pro is: 12 slices, 13 flip-flops and 23 LUTs. After dense encoding of macroplaces the number of flip-flops is reduced to 11.

Hierarchical encoding using macroplaces and registered outputs gives balanced, economical synthesis results as well as flexibility during redesign of the controller. The coordination places also serve as flags during partial reconfiguration of the net. After modification of the water feeder in the mechanical part (Fig. 2) it is easy to find the local places P8 and P9, which are encapsulated in MP5, and replace them by another subset without destroying the other parts of the previous design.

The minimal number of coding variables for the classic implementation with separated linked State Machine Components of the net is equal to eight. It is then necessary to use complicated logic expressions for register excitation and for decoding the seven outputs.

VIII. SUMMARY

The rigorous digital design process starts from the hierarchical concurrent state machine model (HCSM), which has been formally derived from a modular, colored control interpreted Petri net. The colored tokens, arcs, places and transitions separate hierarchically and concurrently related State Machine components. The rule-based textual logic description of the Petri net in VHDL syntax is accepted by professional design tools like Active-HDL (Aldec, USA) and Xilinx ISE. The flexible, readable template for Petri net description is directly recognized by the VHDL compiler and simulator as well as by the formal reasoning system. The logic specification is mapped one-to-one into Field Programmable Gate Array macrocells.

Combinatorial procedures in the formal design of logic controllers are supported by Gentzen sequent calculus. The experimental design system can be used as a shell for existing proprietary tools, developed at Zielona Góra, as well as with a standard professional environment for digital synthesis and verification. It can also support the assertion-based design methodology of application specific logic controllers with checking techniques embedded in configurable hardware. The advantages are a structured and modular state space of logic controllers, suitable for model checking. The self-evident VHDL template is also suitable for rapid modifications by hand.

REFERENCES

[1] M. Adamski, “Formal logic design of reprogrammable controllers,” in Design of Embedded Control Systems, M. Adamski, A. Karatkevich, and M. Wȩgrzyn, Eds. New York: Springer, 2005, pp. 15–26.
[2] G. Łabiak, M. Adamski, M. Doligalski, J. Tkacz, and A. Bukowiec, “UML modelling in rigorous design methodology for discrete controllers,” International Journal of Electronics and Telecommunications, vol. 58, no. 1, pp. 27–34, 2012.
[3] J. Tkacz and M. Adamski, “Logic design of structured configurable controllers,” in Proceedings of IEEE 3rd International Conference on Networked Embedded Systems for Every Application NESEA’12, Liverpool, United Kingdom, 2012, p. [6].
[4] M. Adamski, “Specification and synthesis of Petri net based reprogrammable logic controller,” in Proceedings of 5th IFAC International Conference on Programmable Devices and Embedded Systems PDeS’01, Brno, Czech Republic, 2001, pp. 95–100.
[5] A. Bukowiec, “Synthesis of FSMs based on architectural decomposition with joined multiple encoding,” International Journal of Electronics and Telecommunications, vol. 58, no. 1, pp. 35–41, 2012.
[6] M. Doligalski, Behavioral Specification Diversification of Reconfigurable Logic Controllers, ser. Lecture Notes in Control and Computer Science. Zielona Góra: University of Zielona Góra Press, 2012, vol. 20.
[7] M. Adamski and J. Tkacz, “Formal reasoning in logic design of reconfigurable controllers,” in Proceedings of 11th IFAC/IEEE International Conference on Programmable Devices and Embedded Systems PDeS’12, Brno, Czech Republic, 2012, pp. 1–6.
[8] J. H. Gallier, Logic for Computer Science: Foundations of Automatic Theorem Proving. New York: Harper & Row Publishers, 1985.
[9] C. Girault and R. Valk, Petri Nets for System Engineering: A Guide to Modeling, Verification, and Applications. Berlin/Heidelberg: Springer-Verlag, 2003.
[10] T. Murata, “Petri nets: Properties, analysis and applications,” Proceedings of the IEEE, vol. 77, no. 4, pp. 541–580, 1989.
[11] M. Adamski, A. Karatkevich, and M. Wȩgrzyn, Design of Embedded Control Systems. New York: Springer Science+Business Media, Inc., 2005.
[12] A. Yakovlev, L. Gomes, and L. Lavagno, Hardware Design and Petri Nets. Boston: Kluwer, 2000.
[13] M. Adamski and M. Wȩgrzyn, “Petri nets mapping into reconfigurable logic controllers,” Electronics and Telecommunications Quarterly, vol. 55, no. 2, pp. 157–182, 2009.
[14] A. Karatkevich, Dynamic Analysis of Petri Net-Based Discrete Systems, ser. Lecture Notes in Control and Information Sciences. Berlin: Springer-Verlag, 2007, vol. 356.
[15] K. Jensen, K. Kristensen, and L. Wells, “Coloured Petri nets and CPN tools for modelling and validation of concurrent systems,” International Journal on Software Tools for Technology Transfer (STTT), vol. 9, no. 3, pp. 213–254, 2007.
[16] T. Kozłowski, E. Dagless, J. Saul, M. Adamski, and J. Szajna, “Parallel controller synthesis using Petri nets,” IEE Proceedings – Computers and Digital Techniques, vol. 142, no. 4, pp. 263–271, 1995.
[17] M. Wȩgrzyn, P. Wolański, M. Adamski, and J. Monteiro, “Coloured Petri net model of application specific logic controller programs,” in Proceedings of IEEE International Symposium on Industrial Electronics ISIE’97, vol. 1. Guimarães, Portugal: Piscataway, 1997, pp. 158–163.
[18] K. Biliński, M. Adamski, J. Saul, and E. Dagless, “Petri-net-based algorithms for parallel-controller synthesis,” IEE Proceedings – Computers and Digital Techniques, vol. 141, no. 6, pp. 405–412, 1994.
[19] M. Adamski and M. Wȩgrzyn, “Design of reconfigurable logic controllers from Petri net-based specifications,” in 4th IFAC Workshop on Discrete-Event System Design - DESDes ’09, Gandia Beach, Spain, 2009, pp. 233–238.
[20] L. Gomes, A. Costa, J. Barros, and P. Lima, “From Petri net models to VHDL implementation of digital controllers,” in 33rd Annual Conference of the IEEE Industrial Electronics Society IECON’07. Taipei, Taiwan: IEEE, 2007, pp. 94–99.
[21] M. D’Agostino, D. Gabbay, and J. Posegga, Handbook of Tableau Methods. Dordrecht: Kluwer Academic Publishers, 1999.
[22] M. Ben-Ari, Mathematical Logic for Computer Science, 3rd ed. London: Springer, 2012.
[23] J. Tkacz, “State machine type colouring of Petri net by means of using a symbolic deduction method,” Measurement Automation and Monitoring, vol. 53, no. 5, pp. 120–122, 2007.
[24] L. Gniewek and J. Kluska, “Hardware implementation of fuzzy Petri net as a controller,” IEEE Transactions on Systems, Man, and Cybernetics – Part B: Cybernetics, vol. 34, no. 3, pp. 1315–1324, 2004.
[25] H. Foster, A. Krolnik, and D. Lacey, Assertion-Based Design, 2nd ed. Norwell: Kluwer Academic Publishers, 2004.

Jacek Tkacz, Ph.D., graduated from the University of Zielona Góra and has worked in the Chair of Computer Engineering since 2001. His research is devoted to symbolic methods of theorem proving and their application to computer science and electronics. He is also interested in novel design and development technologies for application software, including mobile applications. During the years 1997-2005 Dr. Tkacz was involved in the design and development of the PROLIB software, used by many Polish libraries.

Marian Adamski, Professor, is Head of the Institute of Computer Science and Electronics at the University of Zielona Góra. Prof. Adamski’s research interests include the design of digital systems, understood as digital microsystems, and formal methods in programming of logical controllers. He is a member of IEEE, IEE, ACM, PTEiTS (Polish Society for Theoretical and Applied Electrical Engineering) and PTI (Polish Computer Science Society).

SHARB: Shared Resource Arbitration in Partitioned Multicore Systems via Library Interposition

Andreas Knirsch, Pierre Schnarz, Joachim Wietzke

A. Knirsch, P. Schnarz, and J. Wietzke: In-Car Multimedia Labs, Faculty of Computer Science, University of Applied Sciences Darmstadt, Haardtring 100, D-64295 Darmstadt, Germany. Email: andreas.knirsch@h-da.de, pierre.schnarz@h-da.de, and joachim.wietzke@h-da.de.

Abstract—With the increasing computational capabilities of multicore hardware platforms for embedded systems, the extent of software-based functionalities also grows continuously. Concurrently, formerly distributed functionalities are now integrated into a common platform. This is supported by multicore architectures, which allow the parallel computation of independently developed software components. This constitutes new opportunities and challenges. One of the latter is to integrate software components with different temporal requirements on a common hardware platform. Shared system resources compound the temporal interference between different software components. In the following an approach is presented that supports the development and integration process through the use of static priorities to manage the access to shared resources from in-parallel executed components. The domain of In-Car Multimedia is utilized to illustrate challenges and the proposed solution. The applicability is demonstrated through the use of a prototypical implementation.

Index Terms—multicore processing, parallel programming, embedded software, multimedia systems, automotive applications

I. INTRODUCTION

For the past few decades the extent of automotive software (SW) based functionalities has grown continuously. More recently, this has particularly applied to automotive infotainment systems, which are now available for all classifications of cars. A change in that process of growth is not foreseeable. Current in-vehicle infotainment (IVI) systems provide a rich variety of functionality and information to the vehicles’ passengers. In the past, the main focus was on applications that offered information and entertainment facilities. Meanwhile, IVI systems implement the human machine interface (HMI) for a number of comfort functions using a link to different vehicular fieldbus networks. Within this context the interaction with the user gains significance. This shifts the focus from the straight supply of contents to interactive multimedia systems, known as in-car multimedia (ICM) systems.

The latest ICM systems are connected to infrastructure-based wireless access networks. With this connectivity vehicles evolve into mobile network nodes, which enables a wide range of new applications. Based on this development, the evolvements within the area of consumer electronics (CE) have already influenced the applications available within cars and the demands of their users. Formerly limited to the synchronization of data with CE devices, they are now fully integrated and aim to make use of the vehicle’s visualization and operation capabilities [1], [2].

For future systems, the incorporation of mobile devices might be realized as dynamic integration on the abstraction level of applications (apps), similar to the current state-of-the-art application deployment within the domain of smartphones. A prerequisite is the availability of appropriate SW frameworks and open standards to support the functional development of the ICM systems’ SW components [3], [4].

Connectivity to both the vehicle’s environment and in-vehicle subsystems constitutes another important prerequisite for future applications, which also includes the area of driver assistance applications. For example, available sensor technology like satellite positioning receivers (e.g. GPS), yaw-rate sensors, or wheel-speed sensors has already extended beyond the vehicular system’s boundaries to factor in the current situation of the traffic for route calculation and guidance. Networking positively affects future driver assistance systems through an augmented set of sensors. These may require a very accurate determination of the geographic position and anticipated driving directions. An ICM system can provide both of these by considering a satellite and an odometric based positioning system after alignment to accurate map data. A prerequisite for very accurate map data is the capability to dynamically update the on-board data using appropriate communication channels. Thereby ICM systems act as the producer and consumer for data used by other electronic control units (ECU) and remote applications ("within the cloud"). Although ICM systems were operated within a safety relevant environment before, they are gaining significance within the context of functional safety and dependability regarding their increasing role as handler for data and information at the vehicular system’s boundary.

ICM systems still act, among other things, as FM-tuner, media player for local audio/video files and network streams, navigational device, phone, address book, email client and Internet browser. These applications are offered to the driver and other passengers through several displays and control elements. They are presented to the users in terms of an integral and context sensitive HMI, allowing multimodal operations, and are adaptable to individual users. The underlying functionality of these applications is integrated into a SW system on a common hardware (HW) platform: the "head-unit" [4].

ICM SW systems are decomposed into interdependent components to counter a growing system complexity [5] and to achieve short development cycles. Independent third-party suppliers concurrently develop the SW components under a high division of labor. Thereby the tier-1 original equipment manufacturer (OEM) adopts the role of the integrator. The individual components and their interfaces are essentially defined by the use of a functional viewpoint, although they also have to fulfill different temporal requirements and provide sufficient responsiveness to user inputs. This includes and combines both time and event triggered tasks [6], for which operating systems provide adequate support. They provide control regarding the assignment of computational capacity to tasks using appropriate scheduling strategies and task priorities. In practice round robin scheduling (SCHED_RR), in combination with different priority levels, has demonstrated applicability for ICM systems, not least because of the relatively good maintainability and predictability for systems with many tasks (>500). Unfortunately this cannot be ensured for all components implemented by third party suppliers. Also, it has to be considered that different scheduling strategies are not necessarily compatible with each other (e.g. cooperative and preemptive), which also applies for independently structured priority concepts.

The adaptation of SW components using incompatible scheduling strategies and task priorities to achieve a homogeneous system that fulfills all temporal requirements implies significant coordination efforts, both at the organizational level and for the developers and the integrators, respectively. A possible implication might be the re-engineering of considerable parts of certain SW components, which on their own already fulfill all the functional and non-functional requirements, but which are not suitable for the concurrent use of shared resources. In this case the requirements for the components could be qualified as not appropriate. However, it also has to be noted that the already existing components of former systems ("legacy SW") and third-party SW (COTS) have to be considered, which "cannot" be changed at all. The underlying causes might be manifold: an integrator (tier-1 OEM) probably has no interest in financing the necessary efforts, or the third-party supplier does not want to incorporate the required changes due to strategic reasons. Hence, with an increasing number of parties involved, a homogenization of the SW components and their behavior becomes more complex. Also, the rising number of components puts emphasis on the challenge to enforce and fulfill superordinate temporal requirements. Additionally, even though the integration of components is done routinely, it is characterized by the use of heuristic and improvised approaches [7].

As a result of the integration onto a common platform the available system resources are concurrently used by different components. Besides the CPU this also applies for memory and input/output (IO) devices. Thereby, and with the focus on the overall system architecture, the complexity is reduced and simplified in comparison to a federated architecture consisting of separate HW components. The reason for this is the avoidance of additional HW components and necessary communication facilities for inter-connecting. The simplification at HW level and intercommunication led to an increased integration density at SW level. According to the rising demands regarding the functionality of ICM systems the demand for computational power increases. Here multicore (MC) based HW platforms provide a solution [8, p.167 ff.]. Their assignment issues a new challenge in respect of the implementation and the necessary synchronization of parallel computed tasks [9], [10], and in particular regarding the interplay and unwanted interference in between certain SW components.

The availability of multiple processor cores forms the basis for an approach to structure SW components through the use of execution domains (ED) [11]. The utilization of a single operating system avoids the additional overhead introduced through a virtualization-based approach for structuring components. With EDs, formerly physically isolated SW systems can be integrated into a highly integrated head-unit. The temporal behavior of the targeted system also becomes predictable for high system load situations.

Even though MC platforms provide multiple computational cores, there are still a number of resources that are available only once. Access to such shared resources is realized using a concurrent behavior, meaning each accessing component has to compete with others for the shared resource. On a single core system such access is implicitly arbitrated by the task scheduler and controlled using task priorities: access to a certain resource is only granted as long as the accessing task is scheduled for computation by the task scheduler and when the state is changed to "running". This relies on the limitation that in a single core system only a single task in the state "running" is possible. If there are multiple computational cores available, multiple tasks can be computed in parallel and they can also potentially access the same shared resource in parallel. The temporal order of the latter is not deterministic. This affects the temporal behavior and predictability of those accessing tasks. Therefore, the correct functioning of time critical tasks cannot be guaranteed, especially in high system-load situations. This might lead to the perception that the use of single core systems can ease the integration of multiple SW components due to implicit arbitration using the operating system’s task scheduler. However, with such systems the "bottleneck" is the scheduling of tasks using incompatible priorities and scheduling strategies, not forgetting to mention the insufficient computational power. The move to MC systems is unavoidable for CPU-intensive parallel applications. This implies a move of the "bottleneck", where multiple tasks compete for a single resource in parallel. A configurable arbitration of such concurrent access is not available.

In the following, an approach is presented to arbitrate the access to shared resources in a parallel computed and component-based SW system that is configurable for the integrator. For this purpose, the example of an ICM system is used. The main objective is to improve the predictability of the temporal behavior, especially for high system-load situations, and therewith to improve the reliability of the integrated system. For this purpose further detail is included on the problem definition and a set of requirements for a resource arbiter is specified. A prototype is realized based on those specifications, which is also presented.

II. PRIORITIZED ACCESS CONTROL

The arbitration of resource access for shared resources is not a new field of research [12]. Generally speaking, the management of resource access has to be done at some point. This can be realized by interrupting the processing of an accessor (e.g. preempting a low-prioritized thread within a single core environment) or at the other extreme: delegated to the targeted system’s user (e.g. multiple different audio streams consumed by the user in parallel). If the arbitration has to be predefined, based on the system requirements the access control needs to be reflected by the system. With multiple computational resources and accessors this control has to be provided to the developer and the integrator, respectively. Hence, the utilization of MC systems in combination with component-based systems requires a practicable solution to affect the temporal order of concurrent access. The SW components described here are developed independently, but depend on each other due to functional relationships. This requires efficient communication facilities, provided by the use of shared memory regions and adequate mechanisms for synchronization like semaphores, mutual exclusions and condition variables. SW frameworks can provide the necessary abstraction and support for the development process and improve the maintainability of the targeted system [13]. Such an abstraction may include a usable and domain-specific interface for concurrent and parallel processing, e.g. by providing capabilities to define EDs [11] and their priorities to enforce temporal requirements. Regarding the access of shared resources, a framework should also enable an integrator to define the temporal order of accessing these to improve the predictability of the system’s behavior. In relation to the exemplary domain of ICM systems, the targeted system has to provide the functionality in a coherent and uniform way in agreement with the vehicle’s user interface design. This supports achieving the goal of providing the perception of an ensemble in one piece. This also includes the predefined behavior of the system.

A. Scope of application

The approach relies on a Symmetric Multiprocessing (SMP) system. Therewith additional costs are saved in comparison to system designs that are based on virtualization or Asymmetric Multiprocessing (AMP) and with respect to computational power and memory usage due to multiple instances of operating systems. However, the proposed approach is also portable to such architectural designs. Furthermore, it may be that they are also required. Within AMP and virtualization based systems concurrent access to shared resources might be necessary, depending on the integration density and hence the shared use of common resources.

Further, the static scheduling of tasks is presumed to improve predictability and therefore maintainability due to simplified analysis capabilities during runtime. Additionally, these can be principally processed within their Worst Case Execution Time (WCET), although this condition might be limited to a certain threshold due to the mix of time- and event-triggered task characteristics.

dynamic scheduling (e.g. based on deadlines calculated during runtime) would improve the efficiency with regard to processing cores and resource utilization, but decrease predictability. Within the context of ICM systems that employ various tasks of differing importance, triggered either by time or events, and clustered into components that are developed in parallel by independent organizations, the need for predictability prevails to foster a deterministic temporal behavior of the overall system. This is further supported by the problematic and costly maintenance for vehicular systems after they have left the production line.

The shared resources addressed here exclude the multiple available CPU cores, but include I/O devices (e.g. automotive fieldbus connections like CAN and MOST, serial connections, files). It is further presumed that these devices are utilized using operating systems and available HW drivers.

B. Requirements

Based on the problem statement the following requirements for an arbiter can be derived:

REQ-1 The access latency for a shared resource is predictable.
REQ-2 A change of third-party SW is not necessary.
REQ-3 The access to shared resources can be temporally ordered using statically defined priorities.

Although this enumeration is not complete, it defines the most essential architectural driver for the implementation of a resource arbiter.

C. Related work

The Automotive Open System Architecture (AUTOSAR) [14] is the outcome of a consortium consisting of automobile manufacturers, suppliers and producers of development tools. It is a standardized architecture, development approach and application programming interface (API). With release 4.0 it also supports MC platforms. The main focus of AUTOSAR is on mechanisms regarding the communication between applications running on different processor cores. The utilization of shared resources by applications which are deployed on different cores is not supported [15, p.45]. This limits the degree of freedom regarding the structuring of SW components and is thus a disadvantage for systems consisting of many components, e.g. ICM systems.

The ACTORS project [16] addresses embedded SW intensive systems in combination with resource utilization and high demands regarding adaptability and efficiency. This problem domain can also be mapped to ICM systems. The project proposes virtualization techniques for SW isolation to improve predictability and reliability. The resource management is implemented in user-space and adapts the resource reservation
This means the SW system dynamically during runtime while considering optimal system is a ”schedulable taskset” as long as the concurring access occupancy. Although this promises good utilization of the to shared resources is not considered and the system is not system’s resources, the resource management does not take operating within a high-load scenario. This implies that during into account semantic dependencies in relation to the desired such a high system-load neither low prioritized tasks nor low temporal system behavior. This could be achieved through the prioritized access is scheduled for computation. The use of allocation of static priorities that are defined by an integrator. INTERNATIONAL JOURNAL OF DESIGN, ANALYSIS AND TOOLS FOR INTEGRATED CIRCUITS AND SYSTEMS, VOL. 4, NO. 2, DECEMBER 2013 21 Nesbit et al. predict that the available mechanisms and strategies for managing the resources of future MC systems will be insufficient. They further describe that a shared uti- lization of resources by tasks executed simultaneously could lead to unpredictable individual durations of the respective threads related to the involved tasks. This may include the violation of requirements and associated Qualities of Service (QoS) [17]. Similar to the ACTORS project, they propose a spatial separation by use of virtualization and feedback channels for the adaptive management of resource utilization Fig. 1. Architectural layers of SHARB during runtime. Their main focus is on the computational resources of the underlying HW platform. Waldspurger et al. identify the conflation of concurrent the library interposition mechanism [21] the arbitration layer access to physical HW as a central challenge within the context introduced with SHARB is hidden to SW components that of I/O virtualization [18]. This includes prioritization and access managed shared resources. Applying this mechanism, arbitration. 
They specify a minimal additional overhead for the indirection as critical. Although this is unquestionably one of the most important characteristics of a resource arbiter, in the following overhead is regarded as secondary to the predictability needed to fulfill temporal requirements during high system-load situations.

III. ARBITRATION OF I/O ACCESS WITHIN USER-SPACE

The specified requirements were realized and validated for applicability by use of a prototypical implementation, the Shared Resource Arbiter (SHARB). A preliminary discussion of the general operation, accompanied by an early proof-of-concept of the applied techniques, is presented in [19]. However, this work had to be substantially refactored with a focus on applicability and thus evolved to the current implementation, presented in the following section. SHARB utilizes OpenICM [20] as SW infrastructure. OpenICM is an academic SW framework maintained at the ICM Labs of the University of Applied Sciences Darmstadt. It implements the concepts described in [13] with the use of the Portable Operating System Interface (POSIX) API. OpenICM does not yet provide any facilities for shared resource arbitration apart from computational resources. However, it provides a mature API that abstracts from an underlying operating system with a focus on ICM systems. This includes features to modularize a SW system into parallel computed components with respect to different priorities and an efficient asynchronous inter-communication. This means SHARB not only makes use of the existing facilities of OpenICM, but is also a suitable enhancement to OpenICM that provides the necessary abstraction for a predefined temporal behavior of the concurrent access to shared resources.

A. Architecture constraints and design decisions

With consideration of REQ-1, SHARB avoids the use of dynamic memory and employs efficient communication facilities for internal synchronization and data transfer. Therefore, the abstractions provided by OpenICM are utilized for shared memory, message queues and the parallel execution of tasks. The interface to the application layer on top of SHARB as well as the interface to the operating system below conforms to the POSIX API to achieve portability. With the use of

also referred to as interposing, introduces an intermediate layer that provides capabilities to modify, prevent or substitute the functionality of referenced libraries. This appears transparent to both the caller (the application) and the callee (library), and is based only upon the configuration of the loader and runtime linker. The loader overloads resource-access relevant symbols during runtime for the dynamic binding with the use of symbols provided by the interposing resource arbiter. This means SHARB forms an additional layer on top of the operating systems and system libraries to intercept certain calls, reinterpret those and redirect them to available system libraries where appropriate. This obviates any functional changes regarding the access to resources from the applicational layer. Hence, through the use of SHARB no modifications to the already existing SW are necessary. This means no recompilation of supplied binary SW is required, with respect to REQ-2. Further, no change within the layer of the operating system is necessary because SHARB operates in user-space. In combination with its conformance to POSIX this eases a port to other system platforms.

Through OpenICM as an infrastructural SW framework for the application layer, the creation of threads is abstracted and unified. This includes the association of contextual data of a particular SW component with the thread by use of the POSIX API. Thereby SHARB is able to identify the context of a particular SW component by the implicit identifier of an accessing thread. In combination with the identifier of the accessed resource, SHARB determines the priority of a certain access based on statically defined priorities as part of the configuration associated with the SW components' contextual data.

B. Architecture and functional principle

In the following section the internal functional principles of SHARB are illustrated to describe the arbitration of concurrent access to shared resources. The essential architectural components are depicted in Figure 1. Applicational SW components are EDs, based on the concepts as proposed in [11]. For the prioritization of resource accesses the Device Manager (DM) delegates all relevant calls from EDs to a Service Driver (SD). Relevant calls include primitives like open, read, write and close, as well as those used for the control and initialization

Fig. 4. Resource access without arbitration

into the elements of the arbiter to denominate the execution contexts of the respective activities.

For an open() the effective access priorities are determined, depending on the calling ED and the requested resource, and a logical handle is created, which is used as the identifier for the association between ED and the resource (Figure 2 (activities 2-3)). After creating a new thread for the SD, communication channels are established for internal message events, followed by the call of the open routine using the device driver API from the DI (Figure 2 (activities 4-5)). The DM is blocked until the result of the "real open()" is signaled. Subsequent calls to open() for the same resource will not affect the device driver as long as there is at least one active SD. This improves efficiency for resources that are used multiple times.

Fig. 2. Main activities to process a call to open()

For a read() the DM delegates the call to the related SD by use of the logical handle, which was created during the open() (Figure 3 (activity 2)).
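The open() and read() handling described above can be illustrated with a minimal sketch. It is a toy model under stated assumptions, not the SHARB implementation: the class name `DeviceManager`, the priority table, and the convention that a smaller number means a higher access priority are all hypothetical, and threads, message queues and the device driver are replaced by plain bookkeeping.

```python
from itertools import count


class DeviceManager:
    """Toy stand-in for the DM: maps logical handles to (ED, resource, priority)."""

    def __init__(self, priorities):
        # priorities: {(ed, resource): level}, a hypothetical static configuration;
        # a smaller level means a higher access priority (assumption).
        self._priorities = priorities
        self._handles = {}       # logical handle -> (ed, resource, priority)
        self._active = {}        # resource -> number of active associations (SDs)
        self._next = count(1)
        self.driver_opens = []   # records each simulated "real open()"

    def open(self, ed, resource):
        # The access priority depends on the calling ED and the requested resource.
        priority = self._priorities[(ed, resource)]
        if self._active.get(resource, 0) == 0:
            # Only the first open of a resource reaches the (simulated) driver.
            self.driver_opens.append(resource)
        self._active[resource] = self._active.get(resource, 0) + 1
        handle = next(self._next)
        self._handles[handle] = (ed, resource, priority)
        return handle

    def read(self, handle, nbytes):
        # The logical handle identifies the association created during open().
        ed, resource, priority = self._handles[handle]
        # A real SD would enqueue a prioritized job here; this sketch only describes it.
        return {"ed": ed, "resource": resource, "priority": priority, "nbytes": nbytes}


dm = DeviceManager({("ED1", "RA"): 1, ("ED2", "RA"): 2})
h1 = dm.open("ED1", "RA")
h2 = dm.open("ED2", "RA")
request = dm.read(h2, 16)
print(dm.driver_opens, request["priority"])
```

The second open() of the same resource does not touch the simulated driver, which mirrors the statement that subsequent opens do not affect the device driver while an SD is active.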
The SD is already aware of the access priority, which is used to prioritize an available worker instance of the thread pool within the DI (Figure 3 (activities 3-4)). The DI copies any read data to an internal shared memory buffer (SHM-buf), which is copied into the destination buffer (DEST-buf) by the DM on the signal from the respective SD (Figure 3 (activities 5-7)).

Fig. 3. Main activities to process a call to read()

In Figure 4 three exemplary EDs are depicted which access two resources. It illustrates the co-operation of multiple accessors competing for shared resources (R). For this scenario the following assumptions are given:

1) ED1 utilizes RA
2) ED2 utilizes both RA and RB
3) ED3 utilizes RB

This means, RA and RB are shared resources. The order of access is not defined for EDs executed in parallel (e.g. each on a different core).

SHARB introduces a new abstraction layer, as depicted in Figure 5.

• For each shared R, a separate DI is created during initialization.
• For each association between ED and R, a separate SD is created during runtime.
• For each DI, n connections are handled during runtime.

Derived from the example introduced in Figure 5, the effective threads for the access of ED1 and ED2 are depicted in Figure 6 with their SHARB Priority Levels (SPL). The SPLs are a concept to order the access-priority in dependency of the relation between a certain ED and R. This concept is implemented by the OS's task scheduler. SPL-0 supersedes all other SPLs and is reserved for the management of the effective resource access, realized by DIs. The subsequent SPLs correspond to the respective access priority derived from the relation between ED and R. A single SPL represents a task queue. The task queues related to a single resource must be

of resources, as defined within the POSIX API. For each association between an ED and a resource, a dedicated SD is created, which is executed as a thread within the context (process frame) of the associated ED. A SD is connected to a Device Instance (DI) to actually perform the access to the resource abstraction provided by the OS. The details about the access to a certain resource are encapsulated within its associated DI, which is executed as a thread in an independent context.

Within this context a thread pool manages standby worker threads, which receive prioritized jobs (DI-job) triggered by events. A DI-job represents a read or a write access to a certain resource. With the use of the thread pool, additional costs for thread creation can be avoided during runtime. The communication between DM, SD and DI relies on shared memory, POSIX message queues and binary semaphores. The prioritization is realized using static thread priorities given to the DI-jobs and the SDs, whereas all DI-jobs and SDs associated to a common DI are bound to a common processor core. This "cpu affinity" supports both a deterministic order of resource access and an efficient communication within SHARB through the use of a common cache memory hierarchy. The foundation for the prioritization of DI-jobs and SDs is the utilization of the operating system's CPU scheduler by use of a round robin scheduling strategy with fixed priorities. Additionally, the messages between the components are marked with priorities that are considered during their handling.

Figures 2 and 3 provide an alternative view on the prototype's design. They depict the activities for two exemplary scenarios for a call to open() and read(). These are partitioned

C. Impact

With the described approach the access to resources can be prioritized to achieve a more predictable temporal behavior.
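The ordering effect of SPLs on pending accesses to one resource can be illustrated with a small sketch. It only models the queueing order and is an assumption-laden simplification: the class `DeviceInstance`, the numbering of SPLs (1 for the high and 2 for the low access priority) and the use of a heap are hypothetical choices for illustration, whereas the prototype relies on the operating system's fixed-priority scheduler.

```python
import heapq
from itertools import count


class DeviceInstance:
    """Toy model of one DI: pending jobs are served by SPL, FIFO within one SPL."""

    def __init__(self):
        self._pending = []
        self._seq = count()   # tie-breaker that preserves arrival order per SPL

    def submit(self, spl, ed):
        # A smaller SPL value means a higher access priority (assumption).
        heapq.heappush(self._pending, (spl, next(self._seq), ed))

    def drain(self):
        # Jobs run to completion one after another (no pre-emption of a job).
        served = []
        while self._pending:
            _spl, _seq, ed = heapq.heappop(self._pending)
            served.append(ed)
        return served


di_a = DeviceInstance()
for _ in range(3):
    di_a.submit(spl=2, ed="ED2")   # three low-priority accesses arrive first
for _ in range(2):
    di_a.submit(spl=1, ed="ED1")   # two high-priority accesses arrive afterwards
order = di_a.drain()
print(order)
```

Although the accesses of ED2 were submitted first, all pending accesses of ED1 are served before them, which is the temporal order an integrator would configure through static priorities.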
A prerequisite is the definition of a static configuration, which provides the capability to specify priorities for resources (and groups of them) as needed by the accessing SW components. An access priority is not specified for an ED or a resource, but for a combination of both of them. Hence, an ED could make use of different priorities for different resources. This offers the required degree of freedom to the integrator, which is necessary to achieve a predefined temporal system behavior without the need to modify the implementations of the accessing SW components. Further, it is possible to arbitrate the access to a selected set of resources. This implies that additional costs for the management only occur when necessary. With this selective deployment of SHARB the overhead introduced through the additional indirection is kept to a minimum, with the focus being on the overall system. It is also possible to combine SHARB with the implementation of another resource arbiter through selective resource configuration.

Additionally, the attributes and strategies used to access a specific resource can also be optimized using the decoupling of SW components and the resources. This means, for example, that the use of new HW with legacy SW is feasible without the need to change the SW. Apart from the demultiplexing of accesses, SHARB can enforce an optimal configuration for the abstracted resources or HW devices (e.g. rate of transmission, byte order, synchronization methods) through the intercepting of control calls. This might provide further freedom during the integration process.

Moreover, the described abstraction provides the capability to realize different access strategies with regard to the resource or the accessing SW component. This applies to read access in particular. The following enumeration lists a selection of such strategies for stream-oriented resources:

• A call of read() starts at the position after a previous read() of an access of the same component.
• A call of read() starts at the position after a previous read() of an access of any component.
• A call of read() returns the most recent data, independent of any previous access.

Furthermore, the abstraction allows the implementation of filters which may manipulate, discard or add transmitted data in a way that is transparent for the application layer. In addition, the latter could be used to substitute, simulate or emulate unavailable resources in the early development stages and therefore to reduce risks during system integration.

Fig. 5. Resource access with arbitration

Fig. 6. Prioritization of the resource access using SPLs and CPU affinity

bound to a single processing core. Figure 6 illustrates both the assignment of threads to SPLs and the partitioning into distinct scheduling domains using core c1 and core c2. Therefore, the example shown in Figure 4 is refined:

4) ED1 has higher access priority than ED2 for RA
5) ED3 has higher access priority than ED2 for RB

For the OS's task scheduler the threads of the arbiter are lined up in different priority scheduling queues, as depicted in Figure 7. The DI-jobs represent effective access calls to the corresponding resource (e.g. ED1 is accessing RA two times; ED2 is accessing RA three times; both are assigned to core c1). The OS's priority levels utilized for the SPLs may supersede the priorities assigned to EDs which are collocated on the same processing core. Although this is not a prerequisite, it prevents unwanted temporal interference between SHARB and the applications. Alternatively, dedicated processing cores might be reserved for SHARB, e.g. general-purpose cores with a reduced clock rate or features within a heterogeneous MC HW architecture.

IV. EVALUATION

In the following section the evaluation of the proposed approach based on the described prototypical implementation is discussed.

A. Test setup

The test system is an x86 MC HW platform, based on

Fig. 7.
Prioritized scheduling queues per core

two Intel Xeon E5504 processors with four cores each. Even though equivalent performance is not available for current ICM systems, the test environment's characteristics can be compared with next generation head-units. The operating systems used are GNU/Linux 3.6, GNU/Linux 3.6 patched with PREEMPT_RT real-time extension (in the following referred to as PREEMPT_RT), and QNX Neutrino 6.5. The shared resource is a dedicated kernel module, which implements a character oriented device driver (char dev) that returns a given number of bytes into a target buffer on a POSIX read request. To simulate latency within a real resource driver and respectively a real device, the module responds to read access with a fixed latency of 50 ms, independent of the number of requested bytes or repetitions. Although, a real driver or device might not answer using a fixed latency, this testing environment prevents additional variance during the time measurements. This leads to reproducible results. The scope of the evaluation is reduced to the testing of the prototype (and not any driver or device implementations). Further, the module behaves as a blocking device, which implies only one accessor can read at a time.

Fig. 8. Test setup for evaluation of prioritization.

B. Results - prioritization

To evaluate the correctness of the approach, the time to read a predefined number of bytes is measured. Therefore, two components (ED1 and ED2) are bound to different processor cores. These components are implemented to read from the described device driver (RA) concurrently and start simultaneously. The dependencies are illustrated in Figure 8. The coordination of the simultaneous start is implemented through the use of a semaphore, triggered for both ED1 and ED2. For ED1 a high and for ED2 a low resource access-priority is configured. Due to the implementation using the library interpositioning mechanism the arbiter can be activated without much effort by adapting the search path of the dynamic linker. The measurement is performed as a loop to collect values from 10,000 runs.

symbols of Table I to support the clear distinction between computing and the arriving task. Corresponding to Figure 6, tasks related to ED1 are of higher access priority than tasks related to ED2. This means SD1.A and DI-job1.A are configured for a high SPL, whereas SD2.A and DI-job2.A are configured for a low SPL, respectively.

TABLE I
DEFINITIONS

Symbol  Parameter         Description
a_i     Arrival time      time at which a task becomes ready
s_i     Start time        time at which a task starts its execution
f_i     Finishing time    time at which a task finishes its execution
p_i     Preemption time   time at which a task is interrupted due to the arrival of a higher priority task
r_i     Resume time       time at which a task continues its execution after preemption
C_i     Computation time  time needed to compute a task without interruption
D_i     Delay             time introduced through pre-emption (D_i = r_i - p_i)
E_i     Execution time    time needed to compute a task including delays (E_i = C_i + D_i)
L_i     Latency           time until a ready task starts computation (L_i = s_i - a_i)

i corresponds to the related task, e.g. SD or DI-job.

TABLE II
PERMUTATIONS OF TASK ARRIVALS FOR A SINGLE SCHEDULING DOMAIN

Scenario  Computing  Arriving   Condition                      Impact
1         SD1.A      SD2.A      f_SD1.A > a_SD2.A              L
2         SD1.A      DI-job2.A  f_SD1.A > a_DI-job2.A          L
3         SD2.A      SD1.A      f_SD2.A > a_SD1.A              D
4         SD2.A      DI-job1.A  f_SD2.A > a_DI-job1.A          D
5         DI-job1.A  DI-job2.A  f_DI-job1.A > a_DI-job2.A      L
6         DI-job1.A  SD2.A      f_DI-job1.A > a_SD2.A          L
7         DI-job2.A  DI-job1.A  f_DI-job2.A > a_DI-job1.A      L
8         DI-job2.A  SD1.A      f_DI-job2.A > a_SD1.A          L

Impact is either delay (D) or latency (L), whereas D is related to the computing task and L is related to the arriving task.
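The Impact column of Table II can be reproduced with a short sketch. It assumes, as the evaluation states, that SDs are pre-emptible while DI-jobs are not, and that tasks related to ED1 carry the higher access priority. The `Task` type, the `impact` function and the numeric SPL values (smaller means higher priority) are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    spl: int            # smaller value = higher access priority (assumption)
    preemptible: bool   # SDs are pre-emptible, DI-jobs are not


def impact(computing, arriving):
    """Return 'D' if the computing task is delayed, else 'L' (the arriving task waits)."""
    if computing.preemptible and arriving.spl < computing.spl:
        return "D"   # a higher-priority arrival pre-empts the computing task
    return "L"       # otherwise the arriving task starts late


SD1 = Task("SD1.A", spl=1, preemptible=True)
DI1 = Task("DI-job1.A", spl=1, preemptible=False)
SD2 = Task("SD2.A", spl=2, preemptible=True)
DI2 = Task("DI-job2.A", spl=2, preemptible=False)

# (computing, arriving) pairs in the order of scenarios 1 to 8 of Table II.
scenarios = [(SD1, SD2), (SD1, DI2), (SD2, SD1), (SD2, DI1),
             (DI1, DI2), (DI1, SD2), (DI2, DI1), (DI2, SD1)]
impacts = [impact(c, a) for c, a in scenarios]
print(impacts)
```

Only scenarios 3 and 4 yield a delay, because they are the only cases in which a pre-emptible low-priority task is computing when a high-priority task arrives.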
The results of the measurements show that the high-prioritized ED1 successfully completes read before the low prioritized ED2 gets access for every test run when SHARB is activated. With the arbiter deactivated, the number of test runs where ED1 finishes before ED2 compared with where ED2 finishes before ED1 is uniformly distributed.

In addition to the previously described empirical methods to prove predictable access prioritization, a theoretical consideration of SHARB is provided in the following.

Therefore a set of parameters to define points in time and periods of time is provided in Table I. Furthermore, Table II lists eight permutations of task arrivals related to the concurrent access of different EDs using different priorities within a common scheduling domain. Hence, they cause pre-emption of, or introduce latency to tasks. The table also specifies the conditions of the concurring tasks, using the

A more trivial scenario with only one accessing ED is not considered here, because in such a case access prioritization has no effect on the tasks' computation order.

In order to show the correctness of SHARB three exemplary scenarios selected from Table II are detailed in the following section. They show either the delay in an already computing task's execution time or a delay in the start of the computation (latency). For reasons of clarity and comprehensibility any additional latency caused through the tasks' context switches was neglected. Further, the illustration is reduced to the two fundamental types of task derived from the architectural elements of SHARB. These differ in their behavior regarding pre-emption: The SDs are pre-emptible whereas the DI-jobs are non-pre-emptible. The latter is caused through the DI-jobs' main task of accessing the actual resource, which implies the current access must be finished before switching to a subsequent DI-job (although this is highly dependent on the type of the actual resource and therefore leaves space for further optimization).

Following previous notations, Figures 9, 10 and 11 visualize tasks separated by their configured SPL. The focus is on the scheduling order of the subsequent tasks.

Figure 9 depicts the occurrence of a low-priority access during a computing high-priority access. In particular, scenario 6 of Table II is addressed here. The low-priority tasks SD2.A and DI-job2.A are delayed until all high-priority tasks are finished. This implies an introduced latency for SD2.A which may affect ED2:

$$L_{\mathrm{SD2.A}} = f_{\mathrm{DI\text{-}job1.A}} - a_{\mathrm{SD2.A}}$$

Fig. 9. Low SPL latency (scenario 6).

Figure 10 depicts scenario 3. A high priority SD (and subsequent DI-job) pre-empts a low priority SD. In particular, the task-scheduler of the OS pre-empts SD2.A and starts the computation of SD1.A immediately after the arrival of SD1.A. SD2.A resumes computation after SD1.A and the subsequent DI-job1.A finish computation. This causes a delay which may affect ED2:

$$D_{\mathrm{SD2.A}} = f_{\mathrm{DI\text{-}job1.A}} - a_{\mathrm{SD1.A}}$$

$$E_{\mathrm{SD2.A}} = D_{\mathrm{SD2.A}} + C_{\mathrm{SD2.A}}$$

Fig. 10. Low SPL delay (scenario 3).

Figure 11 depicts scenario 7. A high-priority DI-job arrives while a low priority DI-job is computing. The DI-job2.A locks resource RA until the current job is finished. DI-job1.A is delayed until the low-priority DI-job2.A releases RA. The maximum latency for a high-priority DI-job is the computation time of a single low-priority DI-job. Respectively, this may affect ED1:

$$L_{\mathrm{DI\text{-}job1.A}} = f_{\mathrm{DI\text{-}job2.A}} - a_{\mathrm{DI\text{-}job1.A}}$$

Fig. 11. High SPL delay (scenario 7).

To summarize, the temporal order of the concurring access to the shared resource is configurable with SHARB. Hence, with its use the predictability of the behavior increases and REQ-3 is met.

C. Results - temporal overhead

For the temporal overhead a single ED reads from a single R. The results provided are based on 90,300 measurements. These were taken natively (without SHARB/not prioritized) and arbitrated to visualize the costs in terms of temporal overhead. The amount of data (1-256 bytes) and the number of repetitions (1-30) were iterated during the data collection, with 25 measurements recorded for each permutation to achieve reliable and evaluable results. In this context repetitions are related to the number of calls to read() between an open() and close(). This means, a single read consists of the call "open() - read() - close()", whereas an access with eight repetitions consists of the calls "open() - 8*read() - close()".

Figures 12 and 13 visualize the results as a percentage in relation to non-arbitrated access for Linux, PREEMPT_RT and QNX. The respective overhead results from the mean of the absolute time measurements for an arbitrated access, less the mean of the absolute time measurements for a native access.

Fig. 12. Overhead in relation to repetitions in percentages.

Fig. 13. Overhead in relation to amount of data in percentages.

Figure 12 shows the overhead as the mean of different data sizes in relation to the repetitions. For a few repetitions a significant percentage of overhead can be observed. This is caused by the initial setup phase to establish the internal administration and communication infrastructure. These temporal costs amortize when accessing the resource five times or more. Further, this effect can be mitigated by the pre-initialization of SHARB (the evaluation was performed by initialization on-demand). Nevertheless, the temporal overhead levels off at about 20 percent for QNX and Linux.

Figure 13 shows the effect of SHARB related to different amounts of data, whereas for each data size the mean duration based on different repetitions of the read() access is measured. The overhead with QNX is significantly higher compared to the use with PREEMPT_RT. However, the temporal overhead with QNX is much more stable in variance and therefore more predictable (distribution with QNX is 0.14%, compared to 1.85% with PREEMPT_RT, and 3.14% with Linux).

Further conclusions about the predictability regarding access latency are founded on the standard deviation.

Fig. 14. Standard deviation in relation to native access and repetitions.

Fig. 15. Standard deviation in relation to native access and amount of data.

Fig. 16. Overhead in relation to repetitions.

Fig. 17. Overhead in relation to amount of data.

also show that the latency is scattered only slightly more with activated SHARB and is independent of size and repetitions. In particular, the standard deviation built from the mean of different sizes for the different number of reads (as shown in Figure 14) illustrates a predictable temporal behavior that is independent of the repetitions of a call to read(). For example, both the distribution of the relative standard deviation with SHARB on QNX and also the standard deviation itself is smaller than the measures of SHARB on Linux. However, Linux patched with PREEMPT_RT still performs comparably to the measures achieved with QNX.

Figures 16 and 17 provide details regarding the absolute overhead (mean duration arbitrated less mean duration native). They correspond to Figures 12 and 13 for the observations regarding the comparison of QNX, PREEMPT_RT and Linux: On QNX, SHARB behaves more deterministically but introduces more overhead.

Although the additional costs may appear high considering the efficiency of the targeted system, the predictability of the access order prevails with respect to the deterministic behavior of the overall system. This is the main objective of SHARB.
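The overhead metric used in this evaluation (mean arbitrated duration less mean native duration, also expressed relative to the native duration) can be written down as a short sketch. The function names are hypothetical, the sample durations are made-up illustrative values rather than measurements from the evaluation, and the use of the population standard deviation for the relative spread is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, pstdev


def overhead(native_ms, arbitrated_ms):
    """Return (absolute overhead in ms, overhead in percent of the native mean)."""
    native_mean = mean(native_ms)
    absolute = mean(arbitrated_ms) - native_mean
    return absolute, 100.0 * absolute / native_mean


def relative_stddev(samples_ms):
    """Standard deviation as a percentage of the mean (population estimator assumed)."""
    return 100.0 * pstdev(samples_ms) / mean(samples_ms)


# Illustrative durations in milliseconds, not taken from the evaluation.
native = [50.0, 50.5, 49.5]
arbitrated = [60.0, 60.5, 59.5]

absolute, percent = overhead(native, arbitrated)
print(absolute, percent, relative_stddev(arbitrated))
```

With these illustrative values the absolute overhead is 10 ms, which corresponds to 20 percent of the native mean.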
Figures 14 and 15 show the standard deviation of the time necessary to access a resource using SHARB in comparison to native access, which is also in relation to the repetitions and the amount of data respectively. Nevertheless, the measurements

V. CONCLUSION AND OUTLOOK

The integration of different SW components into a common platform implies demanding challenges. MC HW platforms can provide support, but also create new challenges due to the additional parallelism. In contrast to the concurrent multi-tasking (quasi-parallelism), concurrent resource access on a MC HW platform is not predictable in terms of temporal order. This article motivates the necessity to arbitrate the access to shared I/O resources. Therefore, the example of component based ICM systems is used, which are developed with a high division of labor. Requirements were defined, which form the basis for an arbiter: SHARB. It introduces a thin architectural layer in between application components and system libraries for accessing particular resources. SHARB appears transparent to both the applicational layer and the libraries underneath, which obviates the need to change either the applications or libraries. Furthermore, SHARB resides within user-space and therefore does not require any changes to the OS. A prototypical implementation supports the verification of the presented approach for arbitration. The applicability and introduced overhead of the prototypical implementation were theoretically and quantitatively evaluated. For the latter, different target OSs are compared.

Although SHARB is targeted for use in SMP based ICM systems utilizing a single OS, the approach is also portable to other usage contexts and architectures with similar requirements. This includes architectures focusing on a more strict isolation of components. These may also require the management of concurrent access to shared resources, based on their density of integration or grade of parallelism. This refers, for example, to AMP systems [8, p.168] and architectures based upon virtualization using hypervisors [22]. For certain I/O resources within the context of multimedia, including video and audio, arbitration is still a subject of research [23].

REFERENCES

[1] R. Bose, J. Brakensiek, and K.-Y. Park, "Terminal Mode – Transforming Mobile Devices into Automotive Application Platforms," in Proc. Automotive User Interfaces and Interactive Vehicular Applications (AutomotiveUI'10). ACM, 2010, pp. 148–155.
[2] J. Sonnenberg, "Service and User Interface Transfer from Nomadic Devices to Car Infotainment Systems," in Proc. Automotive User Interfaces and Interactive Vehicular Applications (AutomotiveUI'10). ACM, 2010, pp. 162–165.
[3] G. Macario, M. Torchiano, and M. Violante, "An In-Vehicle Infotainment Software Architecture Based on Google Android," in Proc. IEEE International Symposium on Industrial Embedded Systems (SIES'09), 2009, pp. 257–260.
[4] G. Smethurst, "Changing the In-Vehicle Infotainment Landscape," GENIVI Alliance, Whitepaper, 2010.
[5] M. Broy, "Challenges in Automotive Software Engineering," in Proc.
[12] M. Broy and T. Streicher, "Specification and Design of Shared Resource Arbitration," Int. Journal of Parallel Programming, vol. 20, pp. 1–22, 1991.
[13] J. Wietzke and M. T. Tran, Automotive Embedded Systeme. Berlin, DE: Springer, 2005.
[14] S. Bunzel, "AUTOSAR – the Standardized Software Architecture," Informatik-Spektrum, vol. 34, pp. 79–83, 2011.
[15] AUTOSAR, "Specification of Multi-Core OS Architecture," vol. Version 1.1.0, Release 4.0, Revision 2, 2010.
[16] E. Bini, G. Buttazzo, J. Eker, S. Schorr et al., "Resource Management on Multicore Systems: The ACTORS Approach," Micro, IEEE, vol. 31, no. 3, pp. 72–81, 2011.
[17] K. J. Nesbit et al., "Multicore Resource Management," IEEE Micro, vol. 28, pp. 6–16, 2008.
[18] C. Waldspurger and M. Rosenblum, "I/O Virtualization," ACM Communications, vol. 55, pp. 66–73, 2012.
[19] A. Knirsch, J. Wietzke, R. Moore, and P. S. Dowland, "Resource Management for Multicore Aware Software Architectures of In-Car Multimedia Systems," in Informatik schafft Communities, ser. Lecture Notes in Informatics, vol. P-192. Köllen, 2011, p. 216.
[20] A. Knirsch, S. Vergata, and J. Wietzke, "Strukturierung von Multimediasystemen für Fahrzeuge," in Proc. Kommunikation unter Echtzeitbedingungen (Echtzeit'12). Springer, 2012, pp. 69–78.
[21] G. Nakhimovsky, "Building library interposers for fun and profit," Unix Insider, Jul. 2001.
[22] S. Vergata, A. Knirsch, and J. Wietzke, "Integration zukünftiger In-Car-Multimediasysteme unter Verwendung von Virtualisierung und Multi-Core-Plattformen," in Proc. Herausforderungen durch Echtzeitbetrieb (Echtzeit'11). Springer, 2012, pp. 21–28.
[23] A. Knirsch, A. Theis, J. Wietzke, and R. Moore, "Compositing User Interfaces in Partitioned In-Vehicle Infotainment," in Mensch & Computer 2013 - Workshopband. Oldenbourg Verlag, 2013, pp. 63–70.

Andreas Knirsch is currently a PhD student in the Centre for Security, Communications and Network Research at Plymouth University, UK. He is further a researcher with the In-Car Multimedia Labs at h_da Hochschule Darmstadt, DE where he received his MSc in Computer Science in 2009. Prior to academia, he gained several years industrial experience in SW development. His research interests include SW frameworks, system partitioning, and operating systems for ICM.

Pierre Schnarz is currently a PhD student in the Centre for Security, Communications and Network Research at Plymouth University, UK. He is further a researcher with the In-Car Multimedia Labs at h_da Hochschule Darmstadt, DE where he received his MSc in Computer Science in 2012. His research
interests include multi-operating systems, trusted 28th International Conference on Software Engineering (ICSE’06). inter-OS communication and synchronization, and ACM, 2006, pp. 33–42. security requirements of ICM environments. [6] H. Kopetz, “Event-triggered versus time-triggered real-time systems,” in Operating Systems of the 90s and Beyond, ser. Lecture Notes in Computer Science. Springer, 1991, vol. 563, pp. 86–101. [7] A. Sangiovanni-Vincentelli and M. Di Natale, “Embedded System Design for Automotive Applications,” Computer, vol. 40, no. 10, pp. 42–51, 2007. Joachim Wietzke is a Professor of Informatics [8] J. Wietzke, Embedded Technologies. Berlin, DE: Springer, 2012. and heads the In-Car Multimedia Labs at h da [9] B. Cantril and J. Bonwick, “Real-World Concurrency,” ACM Queue, Hochschule Darmstadt, DE. Prof. Wietzke has been vol. 6, pp. 16–25, 2008. involved in a number of industrial projects related [10] H. Sutter, “The free lunch is over: A fundamental turn toward concur- to SW project task forces as well as next generation rency in software,” Dr. Dobb’s Journal, vol. 30, no. 3, pp. 202–210, ICM systems. Prior to his current position, he gained 2005. extensive industrial experience in head positions of [11] A. Knirsch, J. Wietzke, R. Moore, and P. S. Dowland, “An Approach SW development units at Bosch, Blaupunkt, and for Structuring Heterogeneous Automotive Software Systems by use Harman/Becker. His research interests include next of Multicore Architectures,” in Proc. Sixth Collaborative Research generation ICM systems and HMIs. Symposium on Security, E-learning, Internet and Networking (SEIN’10), 2010, pp. 19–30. INTERNATIONAL JOURNAL OF DESIGN, ANALYSIS AND TOOLS FOR INTERGRATED CIRCUITS AND SYSTEMS, VOL. 4, NO. 2, DECEMBER 2013 28 Transistor Aging Prediction in Nanometer Digital Circuits Kyung Ki Kim Abstract—In nanometer technology, accurate aging prediction correctly over their entire lifetime. 
However, such an approach of MOSFET digital circuits is one of the most critical issues for excessively increases the size and power dissipation of a system. more reliable adaptive system design. This paper proposes a new This challenge for resilient circuits will require a design on-chip aging prediction circuit to monitor BTI and HCI aging paradigm shift in all aspects of VLSI design. As the impact of effects on digital circuits. The proposed circuit deploys a flip-flop reliability, new techniques in designing aging-resilient circuits based delay detector for monitoring a guardband violation of sequential logics. The outputs of the proposed circuit can be used are necessary to reduce the impact of the aging stresses on as a control signal in reliable self-adaptive systems. A 0.11µm performance, power, and yield or to predict the failure of a CMOS technology has been used to implement and evaluate the system. proposed circuits. As an easy solution to the aging phenomena, circuit designers need to consider these reliability mechanisms in the early design Index Terms—Aging effect, aging prediction, bias temperature stages to make sure that MOSFET circuits are operated with instability, hot carrier injection, reliability. enough margins (called guardband) to function correctly over their entire lifetime. However, since this solution excessively increases the circuit size and power dissipation of a system, new I. INTRODUCTION circuit design techniques should be introduced for the resilient A S technology is scaled down more aggressively, it has become ever harder to design reliable circuits with each technology node. Under normal operation conditions, a circuits. 
This challenge for resilient circuits will require a design paradigm shift to adaptive design for overcoming the performance degradation due to aging phenomena; in addition, transistor device can be changed by various stress sources such an accurate on-chip prediction circuit technique to monitor as negative bias temperature instability (NBTI), positive bias aging phenomena would be one of the key issues in the adaptive temperature instability (PBTI), hot carrier injection (HCI), and design techniques. The outputs of the prediction circuits can be time-dependent dielectric breakdown (TDDB): The NBTI has used as control signals in the new self-adaptive system using become a key reliability issue in nanometer PMOS devices. effective methods such as adaptive body biasing, supply voltage It describes the parameter degradation under a negative scaling, frequency scaling, etc. (static) bias stress mode at elevated temperature. A For a good adaptive technique, the self-adaptive system has corresponding dual effect, known as PBTI is seen for NMOS to include an on-chip aging prediction circuits whose outputs devices, when a positive (static) bias stress is applied across the are strongly correlated with threshold-voltage degradation gate oxide of the NMOS device [1]-[4]. The HCI causes a caused by aging stresses. In this paper, we propose a new degradation of the electrical parameters of a transistor when the one-chip aging prediction circuit which detects a guardband transistor is switching [5]-[8]. The TDDB causes a conduction violation of sequential logics. Recently, on-chip NBTI sensor path to form through a gate dielectric layer placed under circuits have been proposed, but they have some weak points in electrical stress, leading to parametric or functional failure measuring the impact of NBTI on digital circuits: Ref. [11] and [9][10]. 
These stress sources (NBTI, PBTI, HCI, and TDDB) [12] proposed a fully digital on-chip NBTI monitor, but the change the threshold voltage of the transistor device, which proposed circuits suffer from a less direct correlation between causes temporal degradation in device reliability and even result the frequency-degradation of the monitor circuit and the in failures in the transistor circuits. Vth-degradation caused by NBTI stress. Ref. [13] presents a The reliability (aging) effect has traditionally been the area of compact structure in the sub-threshold region to digitalize the process engineers. However, in the future, even the smallest of NBTI stress, but the presented structure is too sensitive to variations can slow down a transistor’s switching speed, and an temperature variation due to the circuit operation in the aging device may not perform adequately at a very low voltage. sub-threshold region. Ref. [14] proposed a circuit aging failure Because of such dilemmas, the transistor aging is emerging as a prediction scheme using the detection of a guardband violation, circuit designer’s problem. Therefore, circuit designers need to but the scheme has a complicated delay element with large area consider these reliability effects in the early stages of design to overhead. Moreover, it is not easy to apply the scheme to a make sure there are enough margins for circuits to function self-adaptive system. The Author is with the school of electronic and electrical engineering, In this paper, we propose a new fully digital on-chip aging Daegu University, Gyeongsan, South Korea (e-mail: kkkim@daegu.ac.kr). prediction circuit with a simple delay element using a 0.11 µm This research was supported by Basic Science Research Program through CMOS technology where outputs are strongly correlated with the National Research Foundation of Korea(NRF) funded by the Ministry of Education, Science and Technology. 
(2011-0014255) INTERNATIONAL JOURNAL OF DESIGN, ANALYSIS AND TOOLS FOR INTERGRATED CIRCUITS AND SYSTEMS, VOL. 4, NO. 2, DECEMBER 2013 29 the Vth-degradation caused by BTI and HCI stress. The proposed circuits can be easily applied to a self-adaptive system for a reliable operation. II. RELIABILITY PHENOMENON As gate oxide thickness and dimension of scale shrink in integrated circuit design, reliability problems occur since nanometer transistors are stressed by high electrics field and high switching activity over extended periods of time. These stress factors lead to device aging, resulting in performance degradation and eventually design failure during the expected lifetime of the designs. The MOSFET transistor aging phenomena, such as NBTI, PBTI, HCI and TDDB, will be the most critical device degradation mechanisms and become a Fig. 1. NBTI stress on a PMOSFET device in a nanometer technology limiting factor in nanoscale region [15]-[17]. NBTI degradation affects PMOS transistors when a negative bias is applied to the gate or, equivalently, when the gate is grounded and a positive bias is applied to source/drain as shown in Fig. 1. The presence of hydrogenated Si-bonds (Si-H) at the interface between Si and gate oxide, boron penetration into the gate oxide, and presence of impurities in the oxide originate interface and oxide charge traps. In inversion mode, holes can be injected into these traps which lead to Vth increase and Idsat decrease. An increase in Vth reduces the voltage overdrive (VDD-Vth), decreasing the circuit stability and margins. NBTI degrades performance and yield of PMOS devices. PBTI is seen for NMOS when a positive bias stress is applied across the gate oxide for the NMOS device. Although the impact of NBTI is higher than that of PBTI, PBTI has become increasingly important with the use of Hf-based dielectrics in the gate-oxide Fig. 2. HCI stress on a NMOSFET device in a nanometer technology. for leakage reduction [1]-[4]. 
HCI describes a degradation of the electrical parameters of MOSFETs under a dynamic stress mode. If a channel hot carrier collides with a crystal atom near the drain region, it may produce an electron-hole pair by impact ionization, also called avalanche pair production, as shown in Fig. 2. Electrons from impact ionization could have enough energy to be injected into the gate oxide region and charge existing oxide traps or generate new oxide-interface traps. The end result of hot carrier injection into gate oxide is a degradation of transistor parameters such as saturation current (Idsat) and threshold voltage (Vth) [5]-[8].

TDDB is a time-dependent gate oxide breakdown when subjected to a voltage and temperature stress. The breakdown happens when a connecting path of traps is formed across the gate oxide, forming a conducting path from the gate to the substrate or gate to source and drain as shown in Fig. 3. The precise point at which the breakdown occurs is statistically distributed. As a result, only statistical averages can be predicted. For this reason, usually a large gate oxide area must be used in order to be able to detect the breakdown. Gate oxide breakdown manifests itself as an increase of gate current [9][10].

Fig. 3. TDDB stress on a MOSFET device in a nanometer technology. (Labels: gate, source, drain, substrate, pinhole/defect in gate oxide.)

III. ON-CHIP AGING PREDICTION CIRCUIT

This section presents a new on-chip aging prediction circuit which deploys a flip-flop based delay detector for monitoring a guardband violation of sequential logics. The proposed circuit detects the moment when the critical path delay of a combinational logic in a sequential design exceeds a normal value which guarantees a correct circuit operation. That is, it monitors if the output signal transition of the combinational logic is generated at the guardband zone of the sequential design due to aging effects.
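The monitoring idea stated in this section can be illustrated with a small behavioral sketch. It is not the proposed circuit: it only checks whether the arrival time of the combinational output transition falls into the guardband zone, assuming that this zone is the last part of the clock period. The names `ClockTiming`, `violates_guardband` and `first_flagged_step`, the fixed-fraction aging model, and all numeric values are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClockTiming:
    """Clock period and guardband width in picoseconds (illustrative values)."""
    period_ps: float
    guardband_ps: float

    @property
    def guardband_start_ps(self) -> float:
        # Assumption: the guardband zone is the last part of the clock period.
        return self.period_ps - self.guardband_ps


def violates_guardband(arrival_ps: float, timing: ClockTiming) -> bool:
    """True when the output transition arrives inside the guardband zone or later."""
    return arrival_ps >= timing.guardband_start_ps


def first_flagged_step(fresh_delay_ps, growth_per_step, timing, max_steps=100):
    """Return the first aging step at which a violation is flagged, or None."""
    delay_ps = fresh_delay_ps
    for step in range(max_steps + 1):
        if violates_guardband(delay_ps, timing):
            return step
        # Hypothetical aging model: the path delay grows by a fixed fraction per step.
        delay_ps *= 1.0 + growth_per_step
    return None


timing = ClockTiming(period_ps=500.0, guardband_ps=50.0)
flagged_at = first_flagged_step(fresh_delay_ps=400.0, growth_per_step=0.02, timing=timing)
print("violation first flagged at aging step:", flagged_at)
```

In a self-adaptive system, the step at which the flag is raised would be the point where a compensation mechanism is triggered, before the path delay actually exceeds the clock period.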
The prediction circuit generates a logic “1” signal when the combinational logic brings about the guardband violation.

The core circuit for detecting the guardband violation due to aging effects consists of a delay line circuit including buffer chains and a modified flip-flop for aging prediction as shown in Fig. 4. The modified flip-flop includes an exclusive-OR, a pulse generator, a two-input AND gate, and a failure decision block. In Fig. 4, all the blocks of the aging prediction circuit use a MEAS signal to turn off the prediction circuit during no-measurement mode and save power dissipation. The delay line plays an important role to delay the CLK signal, and the falling transition of the CLK signal is pushed to the guardband region. The buffer of the delay line consists of two inverters to focus on the long falling transition. The Pulse_Generator makes a pulse at the falling transition time of the delayed CLK signal. The pulse width of the Pulse_Generator is dependent on the AND gate size and NMOS size. A current output signal “D” of the combinational logic and a previous output signal “Q2” of the combinational logic are asserted to the exclusive-OR gate. When the two signals are different from each other, which means the output of the combinational logic has a transition, the exclusive-OR generates a logic “1”. If the logic “1” of the exclusive-OR is generated before the pre-determined guardband region, the proposed circuit makes the output a logic “1”, which is generated from the AND gate with two inputs asserted from the exclusive-OR and the pulse generator. The Failure-Decision-Block makes a final output depending on the n6 signal, and it is reset every clock cycle and measurement cycle.

Fig. 4. Block diagram of the proposed aging prediction circuit: (a) Total scheme, (b) Buffer block, (c) Pulse generator block, (d) Failure decision block.

Figure 5 shows a timing diagram for the proposed circuit in both cases of no aging and aging failure. As shown in Fig. 5(a), each pulse generated from the Pulse_Generator is triggered just before the guardband region. As expected, the output signal is converted to the high voltage since the input of the combinational logic is propagated before the guardband region. On the other hand, in Fig. 5(b), the ex-OR output is converted to the high voltage after the generated pulse (or at the guardband region), so the output signal is not triggered and remains at the low voltage. In this case, an aging failure gets detected, and the output can be used as a control signal of a self-adaptive system to compensate the aging failure.

Fig. 5. Timing diagram for the proposed circuit: (a) No guardband violation, (b) guardband violation.

IV. EXPERIMENTAL RESULTS

The proposed circuits have been designed and evaluated using a 0.11 µm MOSFET technology model (VDD=1.1V). For a long term NBTI-stress simulation, we have increased the number of cycles in the stressed input signal with 0.5 duty cycle and 2 GHz frequency. The HCI stress time for these experiments is 400 µsec, which is not the age (run time) but the actual stress time (switching time). A 4x4 multiplier has been used as a benchmark circuit in our simulation.

Table I shows the circuit level overhead associated with the proposed circuit. In the case of delay overhead, the penalty is very small because the proposed circuit does not have influence on the multiplier delay time. The power overhead in the no-measurement mode is also very small because the measurement signal is used to turn off the proposed circuit. On the other hand, in the measurement mode, the multiplier with the modified flip-flop consumes 10% more power than the multiplier with a normal flip-flop. Finally, the area overhead is 28 transistors per modified flip-flop when the delay line is shared among the flip-flops. Therefore, the simulation results show that the overall overhead impact on the multiplier is very small.

TABLE I
SIMULATION RESULTS

Overhead                                             Result
Delay overhead                                       <1%
Power overhead (in no-measurement mode)              ~0.1%
Power overhead (in measurement mode)                 ~10%
Area overhead (transistor count per modified F/F)    28 (sharing of delay elements)

V. CONCLUSION

This paper proposes a novel on-chip aging prediction circuit in a 0.11 µm technology for monitoring a guardband violation of sequential logics. The simulation results show that the proposed circuits achieve a good aging failure prediction and low overhead. For a good adaptive design technique for overcoming the performance degradation due to aging phenomena, our accurate aging prediction circuit would be a practicable solution in nanoscale CMOS circuits.

REFERENCES

[1] Vattikonda, R., W. Wenping, and C. Yu. "Modeling and minimization of PMOS NBTI effect for robust nanometer design", in Design Automation Conference, 2006 43rd ACM/IEEE, 2006, pp. 1047-1052, 2006.
[2] S. Zafar, et.al, “A comparative study of NBTI and PBTI (charge trapping) in SiO2/HFO2 stacks with fUSI, TiN, ReGates,” IEEE Proc. VLSI Technology, pp. 23-25, June 2006.
[3] J. Martin-Martinez, R. Rodriguez, M. Nafria, X. Aymerich, “Time-dependent variability related to BTI effects in MOSFETs: impact on CMOS differential amplifiers,” IEEE Trans. on Device and Materials Reliability, Vol. 9, Issue 2, pp. 305-310, June 2009.
[4] S. Yang, H. Yang, C. Chuang, W. Hwang, “Timing control degradation and NBTI/PBTI tolerant design for Write-replica circuit in nanoscale CMOS SRAM,” IEEE Int’ Sym. On VLSI Design, Automation and Test, pp. 162-165, April 2009.
[5] A. Bravaix, C. Guerin, V. Huard, D. Roy, J. M. Roux, E. Vincent, “Hot-carrier acceleration factors for low power management in DC-AC stressed 40nm NMOS node at high temperature,” International Reliability Physics Symposium Proceedings (IEEE IRPS), 47th Annual, pp. 26–30, April 2009.
[6] M. F. Lu, S. Chiang, A. Liu, S. Huang-Lu, et. al. “Hot carrier degradation in novel strained-Si nMOSFETs,” International Reliability Physics Symposium Proceedings (IEEE IRPS), 42nd Annual, 25–29 April, 18–22, 2004.
[7] Y. Y. Chen, M. Gardner, J. Fulford, D. Wristers, A. B. Joshi, et. al. “Enhanced hot-hole degradation in P+-poly PMOSFETs with oxynitride gate dielectrics,” International Symposium on VLSI Technology, Systems, and Applications (VLSI), 8–10 June, 86–89, 1999.
[8] F. Arnaud, J. Liu, Y. M. Lee, et. al. “32nm general purpose bulk CMOS technology for high performance applications at low voltage,” IEEE IEDM’08, pp. 1-4, Dec. 2008.
[9] S. Sahhaf, R. Degraeve, P. J. Roussel, B. Kaczer, T. Kauerauf, G. Groeseneken, “A new TDDB reliability prediction methodology accounting for multiple SBD and wear out,” IEEE Trans. On Electron Devices, Vol. 56, Issue 7, pp. 1424-1432, July 2009.
[10] R. Moonen, P. Vanmeerbeek, G. Lekens, W. De Ceuninck, P. Moens, J. Boutsen, “Study of time-dependent dielectric breakdown on gate oxide capacitors at high temperature,” IEEE Physical and Failure Analysis of Integrated Circuits (IPFA’07), pp. 288-291, July 2007.
[11] J. Keane, D. Persaud, C. H. Kim, “An all-in-one silicon odometer for separately monitoring HCI, BTI, and TDDB,” Proc. IEEE VLSI Circuits Conf., pp. 108-109, June 2009.
[12] T. Kim, R. Persaud, C. H. Kim, “Silicon odometer: An on-chip reliability monitor for measuring frequency degradation of digital circuits,” IEEE Journal of Solid-State Circuit, Vol. 43, No. 4, pp. 874-880, April 2008.
[13] E. Karl, P. Singh, D. Blaauw, D. Sylvester, “Compact in-situ sensors for monitoring negative bias-temperature-instability effect and oxide degradation,” Proc. of IEEE ISSCC Conf., pp. 410-411, Feb. 2008.
[14] M. Agarwal, B. C. Paul, M. Zhang, S. Mitra, “Circuit failure and its application to transistor aging,” IEEE VLSI Test Symposium (VTS’07), pp. 227-286, May 2007.
[15] A. B. Kahng, “Design challenges at 65nm and beyond,” IEEE DATE, pp. 1-2, March 2007.
[16] J. W. McPherson, “Reliability challenges for 45nm and beyond,” IEEE Design Automation Conference, pp. 176-181, July 2006.
[17] D. Rittman, “Nanometer Reliability,” http://www.tayden.com/publications/Nanometer%20Reliabil-ity.pdf

Kyung Ki Kim received the B.S. and M.S. degrees in electronic engineering from Yeungnam University, Kyeongsan, South Korea, in 1995 and 1997, respectively, and the Ph.D. degree in computer engineering from Northeastern University, Boston, MA, in 2008. In 2008, he was a member of the technical staff with Sun Microsystems, Santa Clara, CA, where he was involved in the ROCK project. In 2009, he was a senior researcher with Illinois Institute of Technology, Chicago, IL. Currently, he is an assistant professor at Daegu University, South Korea. His current research focuses on nanoscale CMOS design, high speed low power VLSI design, analog VLSI circuit design, electronic CAD, asynchronous circuits, and nano-electronics.

A New Method for Periodic Jitter Leakage Control

Kyung Ki Kim

Abstract—This paper presents a jitter decomposition method in the test of serial data channels. It is difficult to detect minor adjacent spectral jitter components due to frequency leakage in the jitter decomposition. In particular, when spectral jitter decomposition into random jitter (RJ) and periodic jitter (PJ) is performed, some degree of PJ leakage can occur. The PJ leakage can cause serious error in separating PJ and RJ. Therefore, in this paper, a new method to avoid PJ leakage has been proposed.
The proposed method shows significant improvement in PJ leakage control.

Index Terms—Jitter, jitter decomposition, periodic jitter, PJ leakage, random jitter.

I. INTRODUCTION

As serial data systems have data rates in excess of several gigabits per second, timing jitter can cause data errors. Timing jitter is defined as the deviation of a signal transition time from the ideal transition time. A correct model and analysis of jitter is essential for testing high speed serial data channels. In Ref. [1], we presented a novel modeling analysis of jitter as applicable to the test of serial data channels.

In general, total jitter (TJ) consists of Deterministic Jitter (DJ) and Random Jitter (RJ). On the assumption that each jitter component is independent, the distribution of TJ is given by the convolution of the distributions of DJ and RJ.

DJ consists of several subcomponents caused by different and mostly physically based phenomena, such as electromagnetic interference, crosstalk and bandwidth limitation. The subcomponents of DJ are Duty-Cycle Distortion (DCD), Inter-Symbol Interference (ISI), Periodic Jitter (PJ), and Bounded Uncorrelated Jitter (BUJ). As DJ is not random and is bounded, it is usually quantified by its peak-to-peak value. Among the subcomponents, DCD and ISI are referred to as data correlated jitter, while PJ and BUJ are referred to as data uncorrelated jitter [2].

RJ originates from various device-originated noise sources (such as thermal and flicker noise) and is characterized by a Gaussian distribution. It has been shown that it is theoretically unbounded in amplitude. Multiple random jitter sources add in an RMS fashion; a peak-to-peak value is required when RJ is combined with DJ. Random jitter (RJ) is unbounded and uncorrelated jitter; therefore, it is quantified by its standard deviation, or RMS value [3][4].

Figure 1 shows the block diagram of the jitter classification [5]. Separating and identifying the jitter components is very important for deriving TJ in the test of high speed serial data channels. In addition, the separation or decomposition of the jitter components makes it possible to establish the source of the phenomena leading to jitter and possible remedies. Methods such as the Tailfit Algorithm, the One-Shot Time-Interval Methodology, and the Spectral Methodology have been proposed on this critical issue [6]-[9]. It is relatively simple to measure each jitter component separately, but it is challenging to measure and analyze multiple jitter components if they are either simultaneously injected, or already present in a signal over a serial channel.

Fig. 1. Timing jitter components.

Among the decomposition methods, the spectral jitter decomposition using Fast Fourier Transform (FFT) normally gives good precision in estimating the frequency of the PJ [10]. The spectral jitter decomposition is a method for measuring jitter spectrum, the means, and power distribution over frequency. PJ can be caused by repetitive data patterns, but PJ frequency leakage can cause error in decomposition of RJ and PJ. The error is subject to the FFT which reveals the spectral components and their power.

In this paper, a jitter decomposition method has been presented, and also a new method to avoid PJ leakage has been proposed. The new method shows significant improvement in PJ leakage control. Measurement results of RJrms are accurate for all PJ values simulated.

II. JITTER CLASSIFICATION

Total Jitter (TJ) consists of two components: Deterministic Jitter (DJ) and Random Jitter (RJ). On the assumption that each jitter component is independent, the distribution of TJ is given by the convolution of the distributions of DJ and RJ. DJ consists of four subcomponents: Duty-Cycle Distortion (DCD), Inter-Symbol Interference (ISI), Periodic Jitter (PJ), and Bounded Uncorrelated Jitter (BUJ). As DJ is not random and is bounded, it is usually quantified by its peak-to-peak value. Among the subcomponents, DCD and ISI are referred to as data correlated jitter, while PJ and BUJ are referred to as data uncorrelated jitter. RJ is unbounded and uncorrelated jitter; therefore, it is quantified by its standard deviation, or RMS value [5][11].

A. Random Jitter (RJ) model

RJ is commonly modeled by the Gaussian distribution and therefore, it is assumed to be unbounded in amplitude [5]. RJ comes from various device noise sources, such as thermal and flicker noise. By the central limit theorem, the distribution of a large number of uncorrelated noise sources approaches a Gaussian distribution and is given by

$$ J_{RJ}(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{x^{2}}{2\sigma^{2}}} \tag{1} $$

where σ is the standard deviation of the jitter distribution or the RMS value, and J_RJ is the probability that the edge will occur at time x, where x is the deviation from the mean value of the transition time. Since the peak-to-peak value is infinite, it depends on the desired Bit Error Rate (BER) of RJ. The relationship between RMS and peak-to-peak jitter values (and the conversion as given previously in Equation (1)) is used for computing the peak-to-peak value of RJ as

$$ \mathrm{Jitter}_{p\text{-}p} = \alpha \times \mathrm{Jitter}_{RMS} \tag{2} $$

where α is determined by

$$ 0.5 \cdot \mathrm{erfc}\!\left(\frac{\alpha}{2\sqrt{2}}\right) = \mathrm{BER} \tag{3} $$

A BER of 10^-12 or better implies that the peak-to-peak value of RJ is 14σ.

B. Periodic Jitter (PJ) model

PJ is typically caused by unwanted modulation, such as electromagnetic interference, and is uncorrelated to any data pattern [12]. PJ is modeled as a sum of cosine functions given by

$$ PJ_{total}(t) = \sum_{i=0}^{N} A_i \cos(\omega_i t + \theta_i) \tag{4} $$

where PJ_total(t) denotes the total periodic jitter, N is the number of cosine components (tones), A_i is the amplitude in units of time, ω_i is the modulation frequency, t is the time, and θ_i is the initial phase.

The PDF of a single tone PJ is given by

$$ J_{PJ}(x) = \begin{cases} \dfrac{1}{\pi\sqrt{A^{2}-x^{2}}} & \text{if } |x| \le A \\ 0 & \text{otherwise} \end{cases} \tag{5} $$

C. Duty Cycle Distortion (DCD) model

DCD results in HIGH bits (logic 1) having a different width than LOW bits (logic 0). One of the sources of DCD is the difference in propagation delay between LOW to HIGH and HIGH to LOW transitions. Sources of DCD can be comparator threshold offset errors, turn-on delays, and turn-off delays. DCD is considered as a shift in time of the rising and falling edges. In this model, only two variables are needed, one for the shifting of the rising edges and the other for the shifting of the falling edges. Samples of the discrete input data are moved by these two variables [13].

D. Inter-Symbol Interference (ISI) model

ISI comes from the dispersion of signals due to attenuation and reflection in the transmission media. The characterization of ISI developed in [13] is used. The ISI model has a Low Pass Filter (LPF) with bandwidth limiting effects and ringing.

III. SPECTRAL JITTER DECOMPOSITION

This section presents how to separate TJ into RJ, PJ, DCD and ISI in a spectral method. Fig. 2 shows the jitter spectrum of each jitter component. As shown in Fig. 2, RJ is assumed to be white Gaussian where the spectrum is broad and flat. DJ is periodic in time domain. Since the data pattern is periodic, DJ has a spectrum of impulses. Figure 3 illustrates our new jitter decomposition steps.

Fig. 2. Jitter spectrum. (Four panels: TJ, PJ, RJ, and ISI; spectrum magnitude versus frequency in GHz.)

Fig. 3. Decomposition steps.

The first step of the jitter decomposition is to separate TJ into RJ+PJ and DCD+ISI. In order to separate RJ+PJ from TJ, a decomposition method named “Pattern Maker” can be deployed. DCD and ISI are dependent on data pattern, and their impulses appear at multiples of 0.5/N where N is the input pattern length. The histograms of DCD+ISI are made of every edge in the pattern. The mean values of these histograms are compared to the ideal edge location. The [...] separate RJ+PJ from TJ, RJ can be obtained by computing the Root Mean Square (RMS) value of the noise floor in frequency domain. On the other hand, the PJ spectrum can be separated from TJ by setting to zero all the bins attributable to RJ. Then, an inverse FFT of the DJ spectrum has to be performed, and the peak-to-peak value of PJ can be measured in time domain.

Fig. 4. Pattern Marker Measurement.

IV. NEW METHOD FOR PJ LEAKAGE CONTROL

FFT assumes that the time record contains a representative section of an endless periodic signal. It assumes that time records can be seamlessly concatenated. If this is not the case, a phenomenon called leakage occurs. It is difficult to detect minor adjacent spectral components due to the leakage as shown in Fig. 5. When spectral jitter decomposition into RJ+PJ is performed, some degree of leakage can occur. Therefore, PJ leakage causes error in separating PJ and RJ. To remove the PJ leakage, the peak frequency of PJ is measured, and the data has to be truncated to obtain an integer number of PJ periods as shown in Fig. 6. That is, to avoid the PJ leakage, the data has to contain an integer number of PJ periods. If the truncated data is used in the FFT, the PJ leakage is removed.

Fig. 5. Leakage in FFT.

Fig. 6. New method for PJ leakage control.

Table I presents the jitter measurement error depending on the proportion of RJ to PJ. As the PJ_A (0.5 PJpeak-to-peak) value and the RJrms value are changed, the error is affected by the proportion of RJ to PJ: the error increases when RJ is smaller than PJ, as shown in Table I. This means that PJ leakage has a big influence on ISI in the case of small RJ, whereas the ISI spectrum is indistinguishable from RJ in the case of big RJ. PJ and ISI impulses sometimes overlap and cannot be correctly separated. In addition, RJ is not white because RJ passes through an LPF (ISI); RJ is bigger at low frequency and smaller at high frequency. Therefore, the non-flat noise floor brings about errors in separating PJ and RJ.

TABLE I
ERROR DEPENDING ON THE PROPORTION OF RJ TO PJ

PJ_A (UI)   RJ_rms (UI)   Error ISI (%)   Error PJ (%)   Error RJ (%)
0.05        0.01           19.80           1.84            5.16
0.05        0.03           16.28          -1.54          -11.71
0.05        0.10          -27.89           4.87          -17.38
0.30        0.01           45.03           3.20           33.26
0.30        0.03           17.61           1.58           17.99
0.30        0.10            2.92           4.55           -8.71
0.50        0.01           80.89           2.72           70.54
0.50        0.03           53.67           2.19            2.04
0.50        0.10           23.25           3.07           -8.72

V. EXPERIMENTAL RESULTS

To validate the proposed method for PJ leakage control, numerical simulations are performed. The simulation setup is as follows: input is 4096 copies of PRBS-7 pattern, UI (unit interval) is 200 ps (5 GHz bit rate), 127 bits per pattern, 1024 samples per bit, PJ frequency is 5 MHz with leakage, RJrms is 0.01 UI, 0.03 UI, and 0.1 UI, PJpeak-to-peak is 0.1 UI, 0.6 UI, and 1 UI, and ISIpeak-to-peak is 0.05 UI, 0.2 UI, and 0.5 UI.

The separated jitter results with PJ leakage are shown in Table II and Table III. The separated RJrms is not accurate for big PJ and small RJ, as expected, where the biggest RJrms error is 27.8%. On the other hand, the results of PJ and ISI are quite accurate, where the PJpeak-to-peak error is less than 6.09% and the ISIpeak-to-peak error is less than 1.56%.
measured mean location is subtracted from the calculated ideal Table IV and V show the experimental results when the edge location as shown in Fig. 4. proposed method for PJ leakage control has been used. As In order to separate DCD+ISI into DCD and ISI, IFFT and shown the tables, the proposed method shows significant histograms can be deployed. If inverse FFT for the DCD+ISI improvement in leakage control. The measurement results of spectrum has to be performed, two histograms in time domain is RJrms are accurate for all PJ value simulated. The results of PJ computed for rising edge and falling edge, where DCD is the and RJ are more accurate where PJpeak-to-peak error is less than difference between the two mean values of the histograms, and 1 %, and ISIpeak-to-peak error is less than 0.6 %. ISI is the average peak-to-peak of histograms. RJ is a Gaussian and white with RJrms=σ, the power spectrum density (PSD) is σ2 for the entire frequency domain. After INTERNATIONAL JOURNAL OF DESIGN, ANALYSIS AND TOOLS FOR INTERGRATED CIRCUITS AND SYSTEMS, VOL. 4, NO. 2, DECEMBER 2013 35 TABLE II VI. CONCLUSION SEPARATED JITTER RESULTS WITH PJ LEAKAGE (RJ << PJ) In this paper, a new jitter decomposition method has been Injected Jitter (UI) Measured Jitter (UI) Error (%) presented and proposed a new method to avoid PJ leakage in the RJ PJ ISI RJ PJ ISI RJ PJ ISI rms p-p p-p rms p-p p-p rms p-p p-p decomposition has been proposed. The simulation results show 0.01 0.1 0.05 0.0101 0.1043 0.0508 1.1 4.32 1.56 that the new method gives significant improvement in PJ 0.01 0.1 0.2 0.0101 0.104 0.2021 1.07 4.02 1.07 leakage control in case of that RJ is much less than PJ. 0.01 0.1 0.5 0.0101 0.1047 0.5 1.16 4.69 0.00 0.01 0.6 0.05 0.0112 0.6235 0.0508 11.74 3.92 1.56 Measurement results of RJrms are accurate for all PJ values 0.01 0.6 0.2 0.0111 0.6217 0.2021 11.39 3.62 1.07 simulated. 
In the future, we will apply this method to a real 0.01 0.6 0.5 0.0111 0.6204 0.5 11.46 3.39 0.00 0.01 1 0.05 0.0128 1.0398 0.0508 27.57 3.98 1.56 serial data channel. 0.01 1 0.2 0.0128 1.0383 0.2021 27.80 3.83 1.07 0.01 1 0.5 0.0128 1.0349 0.5 27.70 3.49 0.00 0.03 0.1 0.05 0.0301 0.1037 0.0508 0.27 3.68 1.56 REFERENCES 0.03 0.1 0.2 0.0301 0.1035 0.2021 0.22 3.46 1.07 [1] K. K. Kim, J. Huang, Y. Kim, and F. Lombardi, “Analysis and Simulation 0.03 0.1 0.5 0.0301 0.1061 0.5 0.25 6.09 0.00 0.03 0.6 0.05 0.0305 0.6205 0.0508 1.76 3.42 1.56 of Jitter Sequences for Testing Serial Data Channels”, IEEE Transactions 0.03 0.6 0.2 0.0305 0.6203 0.2021 1.76 3.38 1.07 on Industrial Informatics, Vol. 4, No. 2, pp. 134-143, MAY 2008. [2] Mike P. Li., “Jitter And Signaling Test For High-Speed Links,” IEEE TABLE III Custom Integrated Circuits, pp. 65-72, Sept. 2006. SEPARATED JITTER RESULTS WITH PJ LEAKAGE (RJ ≤ PJ) [3] F. Herzel and B. Razavi, “A Study of Oscillator Jitter Due to Supply and Substrate Noise,”” IEEE Trans. on Circuits and Systems II: Analog and Injected Jitter (UI) Measured Jitter (UI) Error (%) Digital Signal Processing, vol. 46, no. 1, pp. 56–62, Jan. 1999. RJ PJ ISI RJ PJ ISI RJ PJ ISI [4] “Precision Jitter Analysis Using the Agilent 86100C” DCA-J. Agilent rms p-p p-p rms p-p p-p rms p-p p-p Technologies, 2007. 0.03 0.6 0.50 0.0305 0.6206 0.5000 1.66 3.43 0.00 0.03 1.0 0.05 0.0311 1.0336 0.0508 3.70 3.36 1.56 [5] N. Ou, T. Farahmand et al., “Jitter Models for the Design and Test of 0.03 1.0 0.20 0.0312 1.0342 0.2021 3.99 3.42 1.07 Gbps-Speed Serial Interconnects,” IEEE Design and Test of Computers, 0.03 1.0 0.50 0.0312 1.0389 0.5000 3.86 3.89 0.00 vol. 21, Jul-Aug 2004. 0.10 0.1 0.05 0.1001 0.1060 0.0508 0.09 6.01 1.56 [6] Gene L. Harding, “A jitter education: A more detailed look at jitter for the 0.10 0.1 0.20 0.1000 0.1092 0.2021 -0.02 9.16 1.07 sophomore,” IEEE Frontiers in education conference(FIE), pp. 
T1C-17- 0.10 0.1 0.50 0.1000 0.1033 0.5000 0.02 3.29 0.00 0.10 0.6 0.05 0.1002 0.6186 0.0508 0.19 3.11 1.56 T1C-21, Oct. 2007. 0.10 0.6 0.20 0.1002 0.6201 0.2021 0.20 3.35 1.07 [7] Qingqi Dou and Jacob A. Abraham, “Jitter decomposition by time lag 0.10 0.6 0.50 0.1002 0.6253 0.5000 0.21 4.22 0.00 correlation,” IEEE ISQED’06, pp.6, March 2006. 0.10 1.0 0.05 0.1006 1.0295 0.0508 0.58 2.95 1.56 [8] Jiun-Lang Huang, “A Random Jitter Extraction Technique in the 0.10 1.0 0.20 0.1005 1.0313 0.2021 0.55 3.13 1.07 Presence of Sinusoidal Jitter,” Test Symposium, 2006. ATS ’06. 15th 0.10 1.0 0.50 0.1006 1.0289 0.5000 0.64 2.89 0.00 Asian, pp. 318-326, Nov. 2006. [9] Lattice Semiconductor Corp., “LatticeSC SERDES Jitter,” Technical TABLE IV Note TN1084: http://www.latticesemi.com/lit/docs/technotes/tn1084.pdf, SEPARATED JITTER RESULTS USING THE PROPOSE METHOD (RJ << PJ) April 2007. Injected Jitter (UI) Measured Jitter (UI) Error (%) [10] C.-K. Ong, D. Hong, K.-T. Cheng, and L.-C. Wang, “Jitter Spectral RJ PJ ISI RJ PJ ISI RJ PJ ISI Extraction for Multi-gigahertz Signal,” in Proc. ASP-DAC, Asia and rms p-p p-p rms p-p p-p rms p-p p-p South Pacific. IEEE, pp. 298-303, Jan. 2004. 0.01 0.1 0.05 0.01005 0.10011 0.05078 0.539 0.110 1.562 [11] Agilent Technologies, “Using Clock Jitter Analysis to Reduce BER in 0.01 0.1 0.20 0.01005 0.09997 0.20215 0.472 -0.026 1.074 Serial Data Applications,” Application Note: 0.01 0.1 0.50 0.01004 0.10016 0.50000 0.418 0.158 0.000 http://cp.literature.agilent.com/litweb/pdf/5989-5718EN.pdf, Dec. 2006. 0.01 0.6 0.05 0.01004 0.59994 0.05078 0.370 -0.011 1.562 0.0 0.6 0.20 0.01005 0.60000 0.20215 0.521 0.000 1.074 [12] C.-Y. Kuo and J.-L. Huang, “A period tracking based on-chip sinusoidal 0.01 0.6 0.50 0.01005 0.59998 0.50000 0.469 -0.003 0.000 jitter extraction technique,” IEEE VLSI Test Symposium, pp. 6, May 0.01 1.0 0.05 0.01005 1.00004 0.05078 0.473 0.004 1.562 2006. 0.01 1.0 0.20 0.01005 1.00006 0.20215 0.547 0.006 1.074 [13] James F. 
Buckwalter and Ali Hajimiri, “Analysis and Equalization of 0.01 1.0 0.50 0.01006 0.99988 0.50000 0.590 -0.012 0.000 Data-Dependent Jitter,” IEEE Journal of Solid-State Circuits, Vol. 41, No. 0.03 0.1 0.05 0.03000 0.09991 0.05078 -0.011 -0.094 1.562 3, pp. 607-620, March 2006. 0.03 0.1 0.20 0.02998 0.10053 0.20215 -0.077 0.529 1.074 0.03 0.1 0.50 0.02998 0.09954 0.50000 -0.065 -0.459 0.000 0.03 0.6 0.05 0.03000 0.59989 0.05078 -0.006 -0.019 1.562 0.03 0.6 0.20 0.03002 0.59951 0.20215 0.056 -0.082 1.074 TABLE V Kyung Ki Kim received the B.S. and M.S. degrees in SEPARATED JITTER RESULTS USING THE PROPOSE METHOD (RJ ≤ PJ) electronic engineering from Yeungnam University, Kyeongsan, South Korea, in 1995 and 1997, respectively, Injected Jitter (UI) Measured Jitter (UI) Error (%) RJ PJ ISI RJ PJ ISI RJ PJ ISI and the Ph.D. degree in computer engineering from rms p-p p-p rms p-p p-p rms p-p p-p Northeastern University, Boston, MA, in 2008. In 2008, 0.03 0.6 0.50 0.03000 0.60001 0.50000 0.005 0.002 0.000 he was a member of the technical staff with Sun 0.03 1.0 0.05 0.02998 1.00014 0.05078 -0.064 0.014 1.562 Microsystems, Santa Clara, CA, where he was involved in 0.03 1.0 0.20 0.03000 0.99993 0.20215 -0.001 -0.007 1.074 ROCK project. In 2009, he was a senior researcher with 0.03 1.0 0.50 0.02997 1.00101 0.50000 -0.105 0.101 0.000 Illinois Institute of Technology, Chicago, IL. Currently, he is an assistant 0.10 0.1 0.05 0.09996 0.09892 0.05078 -0.040 -1.081 1.562 professor at Daegu University, South Korea. His current research focuses on 0.10 0.1 0.20 0.10002 0.09985 0.20215 0.020 -0.155 1.074 0.10 0.1 0.50 0.09998 0.09974 0.50000 -0.021 -0.256 0.000 nanoscale CMOS design, high speed low power VLSI design, analog VLSI 0.10 0.6 0.05 0.09986 0.59921 0.05078 -0.144 -0.131 1.562 circuit design, electronic CAD, asynchronous circuit, and nano-electronics. 
0.10 0.6 0.20 0.09989 0.60159 0.20215 -0.107 0.265 1.074 0.10 0.6 0.50 0.10001 0.59948 0.50000 0.009 -0.086 0.000 0.10 1.0 0.05 0.09983 1.00056 0.05078 -0.169 0.056 1.562 0.10 1.0 0.20 0.09991 0.99947 0.20215 -0.092 -0.053 1.074 0.10 1.0 0.50 0.09991 1.00017 0.50000 -0.090 0.017 0.000 INTERNATIONAL JOURNAL OF DESIGN, ANALYSIS AND TOOLS FOR INTEGRATED CIRCUITS AND SYSTEMS The International Journal of Design, Analysis and Tools for Integrated Circuits and Systems (IJDATICS) was created by a network of researchers and engineers both from academia and industry. IJDATICS is an international journal intended for professionals and researchers in all fields of design, analysis and tools for integrated circuits and systems. The objective of the IJDATICS is to serve a better understanding between the community of researchers and practitioners both from academia and industry. Editor-In-Chief Ka Lok Man Xi'an Jiaotong-Liverpool University, China, Baltic Institute of Advanced Technology, Lithuania Co-Editor-In-Chief Chi-Un Lei University of Hong Kong, Hong Kong Amir-Mohammad Rahmani University of Turku, Finland Managing Editor Michele Mercaldi, Kaiyu Wan, Tomas Krilavičius, EnvEve, Switzerland Xi'an Jiaotong-Liverpool University, China Baltic Institute of Advanced Technologies, Lithuania Vytautas Magnus University, Lithuania Journal Secretary Publishing Manager Jun Wang, Nan Zhang, Fujitsu Laboratories of America, Inc., USA Xi'an Jiaotong-Liverpool University, China Linguistic Editor Nigel Julian Dixon, Caren Crowley, Xi'an Jiaotong-Liverpool University, China Katholieke Universiteit Leuven, Belgium Associate Editor Chao Lu Hai-Ning Liang Mou Ling Dennis Wong Arctic Sand Technologies Inc. Cambridge, MA, US Xi'an Jiaotong-Liverpool University, China Swinburne University of Technology, Malaysia Editorial Board Vladimir Hahanov, Kharkov National University of Franck Vedrine, CEA LIST, France Cheng C. 
Liu, University of Wisconsin at Stout, USA Radio Electronics, Ukraine Bruno Monsuez, ENSTA, France Farhan Siddiqui, Walden University, Minneapolis, Paolo Prinetto, Politecnico di Torino, Italy Kang Yen, Florida International University, USA USA Massimo Poncino, Politecnico di Torino, Italy Takenobu Matsuura, Tokai University, Japan Katsumi Wasaki, Shinshu University, Japan Alberto Macii, Politecnico di Torino, Italy R. Timothy Edwards, MultiGiG, Inc., USA Pankaj Gupta, Microsoft Corporation, USA Joongho Choi, University of Seoul, South Korea Olga Tveretina, Karlsruhe University, Germany Masoud Daneshtalab, University of Turku, Finland Wei Li, Fudan University, China Maria Helena Fino, Universidade Nova De Lisboa, Amit Chaudhry, Technology Panjab University, India Michel Schellekens, University College Cork, Ireland Portugal Bharat Bhushan Agarwal, I.F.T.M., University, India Emanuel Popovici, University College Cork, Ireland Adrian Patrick ORiordan, University College Cork, Abhilash Goyal, Oracle (SunMicrosystems), USA Jong-Kug Seon, System LSI Lab., LS Industrial Ireland Boguslaw Cyganek, AGH University of Science and Systems R&D Center, South Korea Grzegorz Labiak, University of Zielona Gora, Poland Technology, Poland Umberto Rossi, STMicroelectronics, Italy Jian Chang, Texas Instruments, Inc, USA Yeo Kiat Seng, Nanyang Technological University, Franco Fummi, University of Verona, Italy Yeh-Ching Chung, National Tsing-Hua University, Singapore Graziano Pravadelli, University of Verona, Italy Taiwan Youngmin Kim, UNIST Academy-Industry Research Yui Fai Lam, Hong Kong University of Science and Anna Derezinska, Warsaw University of Technology, Corporation, South Korea Technology, Hong Kong Poland Tom English, Xlinx, Ireland Ajay Patel, Intelligent Support Ltd, United Kingdom Kyoung-Rok Cho, Chungbuk National University, Nicolas Vallee, RATP, France Jinfeng Huang, Philips & LiteOn Digital Solutions South Korea Rajeev Narayanan, Cadence Design Systems, Austin, 
Netherlands, The Netherlands Yuanyuan Zeng, Wuhan university, China TX, USA Thierry Vallee, Georgia Southern University, D.P. Vasudevan, University College Cork, Ireland Xuan Guan, Freescale Semiconductor, Austin, TX, Statesboro, Georgia, USA Arkadiusz Bukowiec, University of Zielona Gora, USA Monica Donno, Minteos, Italy Poland Pradip Kumar Sadhu, Indian School of Mines, India Jun-Dong Cho, Sung Kyun Kwan University, South Maziar Goudarzi, Sharif University of Technology, Fei Qiao, Tsinghua University, China Korea Iran Ding-Yuan Cheng, National Chiao Tung University, AHM Zahirul Alam, International Islamic University Jin Song Dong, National University of Singapore, Taiwan Malaysia, Malaysia Singapore Shin-Il Lim, Seokyeong University, Seoul Korea Gregory Provan, University College Cork, Ireland Dhamin Al-Khalili, Royal Military College of Canada, Pradeep Sharma, IEC College of Engineering & Miroslav N. Velev, Aries Design Automation, USA Canada Technology, Greater M. Nasir Uddin, Lakehead University, Canada Zainalabedin Navabi, University of Tehran, Iran Noida, GB Nagar UP, India Dragan Bosnacki, Eindhoven University of Lyudmila Zinchenko, Bauman Moscow State Ausra Vidugiriene, Vytautas Magnus University, Technology, The Netherlands Technical University, Russia Lithuania Milan Pastrnak, Siemens IT Solutions and Services, Muhammad Almas Anjum, National University of Sheung-Hung Poon, National Tsing Hua University, Slovakia Sciences and Technology (NUST), Pakistan Taiwan John Herbert, University College Cork, Ireland Deepak Laxmi Narasimha, University of Malaya, Lixin Cheng, Suzhou Institute of Nano-Tech and Zhe-Ming Lu, Sun Yat-Sen University, China Malaysia Nano-Bionics (SINANO), Jeng-Shyang Pan, National Kaohsiung University of Danny Hughes, Katholieke Universiteit Leuven, Chinese Academy of Sciences, China Applied Sciences, Taiwan Belgium Yue Yang, Suzhou Institute of Nano-Tech and Nano- Chin-Chen Chang, Feng Chia University, Taiwan A.P. 
Sathish Kumar, PSG Institute of Advanced Bionics (SINANO), Studies, India Chinese Academy of Sciences, China Mong-Fong Horng, Shu-Te University, Taiwan N. Jaisankar, VIT University. India Yo-Sub Han, Yonsei University, South Korea Liang Chen, University of Northern British Columbia, Canada Atif Mansoor, National University of Sciences and Chien-Chang Chen, Tamkang University, Taiwan Technology (NUST), Pakistan Hui-huang Hsu, Tamkang University, Taiwan Chee-Peng Lim, University of Science Malaysia, Steven Hollands, Synopsys, Ireland Malaysia Siamak Mohammadi, University of Tehran, Iran Hwann-Tzong Chen, National Tsing Hua University, Salah Merniz, Mentouri University, Constantine, Felipe Klein, State University of Campinas Taiwan Algeria (UNICAMP), Brazil Wichian Sittiprapaporn, Mahasarakham University, Oscar Valero, University of Balearic Islands, Spain Enggee Lim, Xi'an Jiaotong-Liverpool University, Thailand Yang Yi, Sun Yat-Sen University, China China Aseem Gupta, Freescale Semiconductor Inc., Austin, Damien Woods, University of Seville, Spain Kevin Lee, Murdoch University, Australia TX, USA Matthieu Moy, Verimag Laboratory, France Prabhat Mahanti, University of New Brunswick, Saint Kevin Marquet, Verimag Laboratory, France Ramy Iskander, LIP6 Laboratory, France John, Canada Brian Logan, University of Nottingham, UK Suryaprasad Jayadevappa, PES School of Engineering, Tammam Tillo, Xi'an Jiaotong-Liverpool University, Asoke Nath, St. Xavier's College (Autonomous), India India China Tharwon Arunuphaptrairong, Chulalongkorn Shanmugasundaram Hariharan, Pavendar Wen Chang Huang, Kun Shan University, Taiwan University, Thailand Bharathidasan College of Engineering and Masahiro Sasaki, The University of Tokyo, Japan Shin-Ya Takahasi, Fukuoka University, Japan Technology, India Shishir K. 
Shandilya, NRI Institute of Information Shiho Kim, Chungbuk National University, Korea Chung-Ho Chen, National Cheng-Kung University, Science & Technology, India Hi Seok Kim, Cheongju University, Korea Taiwan J.P.M. Voeten, Eindhoven University of Technology, Yanyan Wu, Xi'an Jiaotong-Liverpool University, Kyung Ki Kim, Daegu University, Korea The Netherlands China
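As an illustration of the leakage-control step described in Section IV of "A New Method for Periodic Jitter Leakage Control", the following minimal sketch shows why truncating a record to an integer number of PJ periods removes leakage. It is not the author's implementation: the function names, the single-tone test signal, the naive DFT, and the "power outside the strongest bin" leakage measure are all assumptions made for this example, and the number of samples per PJ period is taken as already known (in the paper it would come from the measured PJ peak frequency).

```python
import cmath
import math


def one_sided_spectrum(samples):
    """Magnitudes of DFT bins 0..N/2 (naive O(N^2) DFT, adequate for short records)."""
    n = len(samples)
    mags = []
    for b in range(n // 2 + 1):
        acc = sum(s * cmath.exp(-2j * math.pi * b * k / n)
                  for k, s in enumerate(samples))
        mags.append(abs(acc) / n)
    return mags


def truncate_to_whole_periods(samples, samples_per_period):
    """Keep the longest prefix that spans an integer number of PJ periods.

    samples_per_period is assumed to be known from the measured PJ peak
    frequency; if it is not an integer, the cut is rounded to the nearest sample.
    """
    whole_periods = math.floor(len(samples) / samples_per_period + 1e-9)
    keep = int(round(whole_periods * samples_per_period))
    return samples[:keep]


def leakage_ratio(samples):
    """Fraction of one-sided spectral power that falls outside the strongest bin.

    This is a hypothetical figure of merit for the example, not a metric from the paper.
    """
    power = [m * m for m in one_sided_spectrum(samples)]
    return 1.0 - max(power) / sum(power)


def pj_record(n_samples, samples_per_period, amplitude=0.05):
    """Single-tone PJ: one cosine term of Equation (4) with zero initial phase."""
    return [amplitude * math.cos(2 * math.pi * k / samples_per_period)
            for k in range(n_samples)]


if __name__ == "__main__":
    record = pj_record(210, 20.0)                       # 10.5 PJ periods
    trimmed = truncate_to_whole_periods(record, 20.0)   # 10 PJ periods
    print(len(record), leakage_ratio(record))
    print(len(trimmed), leakage_ratio(trimmed))
```

With the untruncated record the tone falls halfway between two bins, so a large share of its power spreads into neighbouring bins; after truncation the tone lands on a single bin. A real measurement would also contain RJ and data-dependent jitter, and a non-integer number of samples per period leaves a small residual that this sketch does not address.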