Linear discriminant analysis

Gaussian class densities with common covariance matrix.

Two-class classification problem:

f_u(u \mid C_i) = \frac{1}{\sqrt{(2\pi)^r \det(\Sigma)}} \exp\!\left( -\frac{(u(t) - \mu_i)^T \Sigma^{-1} (u(t) - \mu_i)}{2} \right), \quad i = 1, 2

It follows that

P(C_1 \mid u(t)) = \frac{f_u(u \mid C_1)\, P(C_1)}{f_u(u \mid C_1)\, P(C_1) + f_u(u \mid C_2)\, P(C_2)} = \frac{1}{1 + e^{-z(t)}}

where

z(t) = u^T(t)\, \Sigma^{-1} (\mu_1 - \mu_2) + \frac{1}{2}\, \mu_2^T \Sigma^{-1} \mu_2 - \frac{1}{2}\, \mu_1^T \Sigma^{-1} \mu_1 + \log \frac{P(C_1)}{P(C_2)} = \varphi^T(t)\, \theta

with

\varphi(t) = [\, 1 \;\; u_1(t) \;\; u_2(t) \;\cdots\; u_r(t) \,]^T, \quad \theta = [\, \beta_0 \;\; \beta_1 \;\cdots\; \beta_r \,]^T = [\, \beta_0 \;\; \beta^T \,]^T

and

\beta_0 = \frac{1}{2}\, \mu_2^T \Sigma^{-1} \mu_2 - \frac{1}{2}\, \mu_1^T \Sigma^{-1} \mu_1 + \log \frac{P(C_1)}{P(C_2)}, \quad \beta = \Sigma^{-1} (\mu_1 - \mu_2)

Roberto Diversi, Learning and Estimation of Dynamical Systems M – p. 15/21

Of course, P(C_2 \mid u(t)) = 1 - P(C_1 \mid u(t)). As in logistic regression, the decision boundary is linear and is given by the hyperplane z(t) = 0, that is, \varphi^T(t)\, \theta = 0.

The estimates required in steps 1. and 2. can be computed as follows:

\hat{\mu}_1 = \frac{1}{N_1} \sum_{t=1}^{N} y(t)\, u(t), \quad \hat{\mu}_2 = \frac{1}{N_2} \sum_{t=1}^{N} (1 - y(t))\, u(t)

\hat{\Sigma} = \frac{1}{N} \sum_{t=1}^{N} y(t)\, (u(t) - \hat{\mu}_1)(u(t) - \hat{\mu}_1)^T + \frac{1}{N} \sum_{t=1}^{N} (1 - y(t))\, (u(t) - \hat{\mu}_2)(u(t) - \hat{\mu}_2)^T

\hat{P}(C_1) = \frac{N_1}{N_1 + N_2}, \quad \hat{P}(C_2) = 1 - \hat{P}(C_1)

where N_1 is the number of inputs belonging to class C_1 and N_2 = N - N_1 is the number of inputs belonging to class C_2.

A new input u(t) is assigned to class C_1 if \hat{P}(C_1 \mid u(t)) > 0.5 and to class C_2 otherwise. As in logistic regression, a different decision criterion can be used.

Multiclass problem:

f_u(u \mid C_i) = \frac{1}{\sqrt{(2\pi)^r \det(\Sigma)}} \exp\!\left( -\frac{(u(t) - \mu_i)^T \Sigma^{-1} (u(t) - \mu_i)}{2} \right), \quad i = 1, 2, \dots, M

It follows that

P(C_i \mid u(t)) = \frac{f_u(u \mid C_i)\, P(C_i)}{\sum_{k=1}^{M} f_u(u \mid C_k)\, P(C_k)} = \frac{e^{z_i(t)}}{\sum_{k=1}^{M} e^{z_k(t)}}

where

z_i(t) = u^T(t)\, \Sigma^{-1} \mu_i - \frac{1}{2}\, \mu_i^T \Sigma^{-1} \mu_i + \log P(C_i) = u^T(t)\, \beta_i + \beta_{i0} = \varphi^T(t)\, \theta_i

The input space U = \mathbb{R}^r can thus be divided into M regions defined by a set of hyperplanes. Each hyperplane represents the linear decision boundary between two classes. More precisely, the hyperplane separating classes k and j is described by the equation z_k(t) - z_j(t) = 0, that is, \varphi^T(t)\, (\theta_k - \theta_j) = 0.

The required estimates can be computed as follows:

\hat{\mu}_i = \frac{1}{N_i} \sum_{t:\, y(t) = y_i} u(t), \quad i = 1, 2, \dots, M

\hat{\Sigma} = \frac{1}{N} \sum_{i=1}^{M} \sum_{t:\, y(t) = y_i} (u(t) - \hat{\mu}_i)(u(t) - \hat{\mu}_i)^T

\hat{P}(C_i) = \frac{N_i}{N}, \quad i = 1, 2, \dots, M

where N_i is the number of inputs belonging to class C_i.

A new input u(t) is assigned to the class C_k such that

k = \arg\max_{i \in \{1, 2, \dots, M\}} \hat{P}(C_i \mid u(t))
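The estimation and classification steps above can be sketched in NumPy. This is a minimal illustration, not code from the slides: the function names (fit_lda, lda_posteriors), the 0-based class labels standing in for C_1, ..., C_M, and the synthetic two-class data are all assumptions made for the example.

```python
import numpy as np

def fit_lda(U, y, M):
    """Estimate the LDA parameters from inputs U (N x r) and labels y in {0, ..., M-1}.

    Returns the class means mu_i, the pooled covariance Sigma and the priors P(C_i),
    using the sample estimates given on the slides.
    """
    N, r = U.shape
    mu = np.zeros((M, r))
    Sigma = np.zeros((r, r))
    P = np.zeros(M)
    for i in range(M):
        Ui = U[y == i]                  # the N_i inputs belonging to class C_{i+1}
        mu[i] = Ui.mean(axis=0)         # mu_i = (1/N_i) * sum of class-i inputs
        D = Ui - mu[i]
        Sigma += D.T @ D                # accumulate within-class scatter
        P[i] = len(Ui) / N              # P(C_i) = N_i / N
    Sigma /= N                          # pooled covariance estimate
    return mu, Sigma, P

def lda_posteriors(u, mu, Sigma, P):
    """Posteriors P(C_i | u) from the linear discriminants z_i."""
    Sinv = np.linalg.inv(Sigma)
    # z_i = u^T Sigma^{-1} mu_i - (1/2) mu_i^T Sigma^{-1} mu_i + log P(C_i)
    z = np.array([u @ Sinv @ m - 0.5 * m @ Sinv @ m + np.log(p)
                  for m, p in zip(mu, P)])
    e = np.exp(z - z.max())             # softmax, shifted for numerical stability
    return e / e.sum()

# Two well-separated Gaussian classes sharing a common covariance (synthetic data)
rng = np.random.default_rng(0)
U = np.vstack([rng.normal([0.0, 0.0], 1.0, (100, 2)),
               rng.normal([5.0, 5.0], 1.0, (100, 2))])
y = np.repeat([0, 1], 100)

mu, Sigma, P = fit_lda(U, y, M=2)
post = lda_posteriors(np.array([5.0, 5.0]), mu, Sigma, P)
print(post.argmax())                    # a point near the second mean -> class 1
```

Because the classes share one covariance matrix, the decision boundaries z_k(t) - z_j(t) = 0 computed from these estimates are hyperplanes, exactly as in the derivation above.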