MATH 111 Chapter 2C

MATH 111 Complete Notes
Chapter 2: Geometry and Regression

2.1 Equation of a plane
2.2 Approximation and best-fit
2.3 Hats and codes

2.1C The equation of a plane

Orthogonality

The dot product of two vectors u = (a, b, c) and v = (x, y, z) is defined as:

    u • v = (a, b, c)•(x, y, z) = ax + by + cz.

I have given this definition in R³ but it can be formulated in any Rⁿ. Thus

    (1, 2, 3, 4)•(4, 3, 2, 1) = 4 + 6 + 6 + 4 = 20.
    (1, 2)•(4, –2) = 4 – 4 = 0.

The dimensions of u and v have to be the same, and their dot product is a scalar, a real number.

The dot product is called a "product" so you might suspect that it distributes over addition (as any self-respecting "product" ought to do). And indeed it does:

    w•(u + v) = w•u + w•v

It is also useful to observe that it is commutative:

    u • v = v • u

Note (other dot-product notations): the dot product is sometimes written as the product of a row vector with a column vector; in this second case we are regarding it as a matrix multiplication.

The length of a vector u is defined as

    ||u|| = ||(a, b, c)|| = √(a² + b² + c²)

Note that the dot product of a vector with itself is the square of its length:

    u • u = (a, b, c)•(a, b, c) = a² + b² + c² = ||u||²

For example, the vector (1, –2, 3) has length √(1 + 4 + 9) = √14.

The distance between two points is the length of their vector difference. For example, the distance between the points P = (1, 2, 3) and Q = (4, –2, 1) is:

    dist[P, Q] = ||P – Q|| = ||(1, 2, 3) – (4, –2, 1)|| = ||(–3, 4, 2)|| = √((–3)² + 4² + 2²) = √29

Orthogonality. We say that two vectors u and v are orthogonal, and write u ⊥ v, if the corresponding arrows drawn from the origin are perpendicular. Is there an easy way to tell when two vectors in R³ are orthogonal? Indeed there is.

The Orthogonality Theorem.
Two vectors are orthogonal if and only if their dot product is zero:

    u ⊥ v  ⟺  u • v = 0

Proof. Consider the triangle formed by the vectors u, v and u – v. The cosine law gives us:

    ||u – v||² = ||u||² + ||v||² – 2 ||u|| ||v|| cos θ

Expand the left side:

    ||u – v||² = (u – v)•(u – v) = u•u – u•v – v•u + v•v = ||u||² + ||v||² – 2 u•v

Compare the two sides of the first equation:

    2 u•v = 2 ||u|| ||v|| cos θ

Now we make the argument. If u ⊥ v then θ = 90°, and that implies cos θ = 0 and thus u • v = 0. Conversely, if u • v = 0, and neither u nor v is zero (so that they have positive length), we must have cos θ = 0. Since the angle between two vectors lies between 0° and 180°, this forces θ = 90°, and u and v are orthogonal.

Example 2.1.1. Find a vector that is orthogonal to both u = (1, 1, 1) and v = (1, –1, 2).

Solution. If the vector is (x, y, z), we want

    (x, y, z)•(1, 1, 1) = x + y + z = 0
    (x, y, z)•(1, –1, 2) = x – y + 2z = 0

Eliminate one variable in the easiest way, by adding the equations:

    2x + 3z = 0

We have a system of two equations in three unknowns, so we expect a free variable: the solution will not be unique (any multiple of a solution will be a solution). Thus we can exercise the degree of freedom, letting x = 3. That makes z = –2. The second equation then gives y = x + 2z = 3 – 4 = –1. We get the solution:

    (x, y, z) = (3, –1, –2)

And we can check that it is orthogonal to both u and v.

Example 2.1.2. Convince me that the equation 3x + 2y – z = 0 describes a plane.

A parametric reformulation. Try to parameterize the graph with the points of the x-y plane and see if that will reveal a planar graph.
Using s and t as parameters for x and y, the equation can be written:

    x = s
    y = t
    z = 3s + 2t

Writing this in vector form:

    [x, y, z] = s[1, 0, 3] + t[0, 1, 2]

And this tells us that the graph is the set of all linear combinations of two vectors which do not lie along the same line. And that's clearly a plane.

A dot product formulation. Write the left-hand side of the equation as a dot product. The expression 3x + 2y – z can be regarded as the dot product of the vector n = [3, 2, –1] with the vector OP = [x, y, z]. The equation becomes:

    3x + 2y – z = 0
    [3, 2, –1]•[x, y, z] = 0
    n • OP = 0

which says OP ⊥ n. There –– we've transformed the algebraic statement into a geometric one. The equation specifies the set of all points P for which OP is perpendicular to the fixed vector n. Our geometric intuition tells us exactly what that will look like –– those points P lie in the plane which passes through the origin perpendicular to n.

Note: Both approaches convince us that the equation describes a plane. The vector n = [3, 2, –1] is orthogonal to the plane and is called a normal to the plane.

Question. Does the equation 3x + 2y – z = –6 also describe a plane?

Yes it does. Write it as z = 3x + 2y + 6 and we see that it is exactly the plane of the above example raised by 6 units. In particular, it has the same normal: [3, 2, –1].

Example 2.1.3. Find an equation for the plane which is perpendicular to the vector n = [1, –2, 4] and passes through the point Q(–4, 5, 6).

Solution. We look for a condition that will hold for a point P(x, y, z) precisely when it lies on the plane. The idea comes from considering the vector QP. Since this vector lies in the plane, it is perpendicular to n:

    QP ⊥ n
    QP • n = 0
    (x + 4, y – 5, z – 6)•(1, –2, 4) = 0
    (x + 4) – 2(y – 5) + 4(z – 6) = 0

This is called the point-normal form. If we like we can put all the constants over to the right side.
We get what's called the standard form:

    x – 2y + 4z = 10

Example 2.1.4. Find a standard form equation ax + by + cz = d for the plane described by the parametric equation P = A + s q + t r:

    [x, y, z] = [4, 2, 1] + s[1, 2, 2] + t[–1, 2, 0]

Solution. The key ingredient of the normal form equation is the normal vector n = [a, b, c] that is orthogonal to the plane. How might we find that? Well, the vectors q and r are both parallel to the plane, so n must be orthogonal to both:

    n • q = [a, b, c]•[1, 2, 2] = a + 2b + 2c = 0
    n • r = [a, b, c]•[–1, 2, 0] = –a + 2b = 0

This gives us two equations in the 3 unknowns a, b and c, but of course since n can only be determined up to a constant, we expect a degree of freedom. The second equation says a = 2b, so we set b = 1 and a = 2 (using the degree of freedom). Then the first equation says 2c = –a – 2b = –2 – 2 = –4, giving c = –2. The vector n can be taken as:

    n = [2, 1, –2]

Using the fact that the point A(4, 2, 1) is on the plane (set s = t = 0), we get the point-normal form:

    2(x – 4) + (y – 2) – 2(z – 1) = 0

We can get the standard form by putting the constants over to the right:

    2x + y – 2z = 8

The equation of a plane. We have two forms. The point-normal form

    a(x – x₀) + b(y – y₀) + c(z – z₀) = 0

gives you a point Q(x₀, y₀, z₀) on the plane and shows clearly that the plane is perpendicular to the vector [a, b, c]. The standard form

    ax + by + cz = d

is more compact but less transparent. The normal vector [a, b, c] is still perpendicular to the plane, but this is not immediately clear from the equation. It's worth emphasizing:

Proposition. The vector [a, b, c] is orthogonal to the plane ax + by + cz = d.

Example 2.1.5. Parametric equation of a line. If we interpret t as time, the three equations

    x = 3 + t
    y = 1 + 2t
    z = 2 + t

describe a point P moving in 3-space.

(a) What does its trajectory look like?
(b) How fast is P moving at any time?

Solution. (a) These are linear equations, and thus we feel that the trajectory should be a line. To establish this, write the equations in vector form p = a + t v:

    [x, y, z] = [3, 1, 2] + t[1, 2, 1]

At t = 0, P is at the point A(3, 1, 2). The points at t > 0 are found by adding the t-multiple of the vector v = [1, 2, 1]. Our geometric intuition tells us that this is a line L. The points on the line are obtained by adding multiples of v to A.

(b) In any unit of time, the point moves a distance given by the length of the vector v:

    ||v|| = √(1² + 2² + 1²) = √6

This is the speed of the point. We might also have argued that the speed is the length of the velocity vector: speed = ||dp/dt|| = ||v|| = √6.

Example 2.1.6. A reflection problem. Given the point A(2, 4, –6) and the plane x – 2y + z = 6, find the reflection B of A in the plane. That is, if the plane is a mirror, B is the image of A.

Solution. Here's one strategy. Find the line through A perpendicular to the plane. Let it hit the plane at C. Then locate B so that C is the halfway point between A and B.

Okay. A normal to the plane is n = [1, –2, 1], so the line through A orthogonal to the plane has this as direction vector. The line is:

    [x, y, z] = [2, 4, –6] + t[1, –2, 1]

This might also be written

    x = 2 + t
    y = 4 – 2t
    z = –6 + t

Now we have to find C. That's where the line intersects the plane. So plug the equation of the line into the equation of the plane:

    (2 + t) – 2(4 – 2t) + (–6 + t) = 6
    6t = 6 + 12 = 18
    t = 3

Thus the intersection point is C = (2, 4, –6) + 3(1, –2, 1) = (5, –2, –3).

How to find B? One way is to write:

    B = A + 2(AC) = A + 2(C – A) = –A + 2C = –(2, 4, –6) + 2(5, –2, –3) = (8, –8, 0)

There's a really neat way to find B without finding C, just from the fact that you know the t-value of C. Can you see it? Look. Think of the line. A has t-value 0. C has t-value 3.
What do you think is the t-value of B? It's t = 6. That's because as t changes we move along the line at constant speed. Thus putting t = 6:

    B = (2 + 6, 4 – 12, –6 + 6) = (8, –8, 0)

Example 2.1.7. How far apart are the following two planes?

    z = 3x + 2y
    z = 3x + 2y + 6

Solution. The planes are parallel: indeed, the first plane passes through the origin and the second is obtained from the first by "lifting" it 6 units, that is, translating it 6 units up the z axis. Writing the planes in standard form:

    –3x – 2y + z = 0
    –3x – 2y + z = 6

We see that, as expected, they both have the same normal n = [3, 2, –1].

Note: What do we mean by the distance between two parallel planes? We mean the shortest, or perpendicular, distance between them. And to get hold of that we need the normal vector.

What is wanted is the perpendicular distance between the planes. We will use a strategy similar to that of the previous example. Take a point on the first plane, draw a line from that point orthogonal to the planes, and extend it until it hits the second plane. The length of that line segment will be the distance between the planes.

Okay. For a point on the first plane, take the origin. The line will have as direction vector the normal n, so will have equation:

    [x, y, z] = [0, 0, 0] + t[3, 2, –1],  or  x = 3t, y = 2t, z = –t

This will intersect the second plane –3x – 2y + z = 6 when:

    –3(3t) – 2(2t) + (–t) = 6

which solves to give t = –3/7. Thus the intersection with the second plane is at

    [x, y, z] = –(3/7)[3, 2, –1]

and the distance between the planes will be the length of this vector, which is (3/7)√14 ≈ 1.6.

2.2C Approximation and best-fit

Example 2.2.1. Consider the system of equations

    [ 1   1 ] [α]   [1]
    [–2   2 ] [β] = [1]
    [ 2   0 ]       [1]

which has the form A β = w. Clearly there is no solution. The problem here is to find a way to identify a "best approximation" to a solution. Use a diagram to formulate a geometric condition for this.
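Before reasoning geometrically, it is worth confirming numerically that the system really is inconsistent. A minimal sketch (assuming NumPy is available; the matrix entries are as displayed above): a solution exists exactly when w lies in the column space of A, i.e. when appending w as an extra column does not raise the rank.

```python
import numpy as np

# The system A beta = w of Example 2.2.1.
A = np.array([[1.0, 1.0],
              [-2.0, 2.0],
              [2.0, 0.0]])
w = np.array([1.0, 1.0, 1.0])

# Compare rank of A with rank of the augmented matrix [A | w].
rank_A = np.linalg.matrix_rank(A)
rank_Aw = np.linalg.matrix_rank(np.column_stack([A, w]))
print(rank_A, rank_Aw)  # 2 3 -> w is not in the column space: no exact solution
```

Since the augmented rank is larger, w cannot be written as a combination of the columns, which is exactly the geometric picture developed next.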
One way we have learned to think about this system is that it's asking us to write w as a linear combination of the columns of A. Call these columns u and v. Then write the system as:

    α u + β v = w

Now u and v span a plane, and since w cannot be written as a linear combination of u and v, it does not lie in that plane.

[Figure: the plane spanned by u and v, with w above it; the error ε runs from the point A β in the plane up to w.]

Working with the diagram, we might propose that a good candidate for a "best" approximation to a solution would be the linear combination that was closest to w, that is, that minimized the length of the vector ε drawn from the linear combination A β to w. And our geometric intuition tells us that this will be the foot of the perpendicular dropped from the point w onto the plane. In that case, the best approximation to β would be the coefficients of the linear combination of u and v that gave us the foot of the perpendicular.

A least-squares solution of the equation A β = w is a vector β = β^ that minimizes the length of the "error" ε = w – A β. Geometrical considerations tell us that this will happen when ε is orthogonal to the columns of A. The vector β^ can be calculated as the set of coefficients of the linear combination of u and v that gives us the foot of the perpendicular dropped from the point w onto the plane spanned by the columns of A.

Note: Of course if the system happens to have a solution, then w will lie in the plane, and the minimum ε will be 0.

Example 2.2.2. A matrix computational scheme.
In Example 2.2.1 we used a simple geometrical idea to specify a "best approximation" β to a solution of a linear system such as A β = w:

    [ 1   1 ] [α]   [1]
    [–2   2 ] [β] = [1]
    [ 2   0 ]       [1]

To summarize the method, we first wrote the equation as A β + ε = w:

    [ 1   1 ] [α]   [ε₁]   [1]
    [–2   2 ] [β] + [ε₂] = [1]
    [ 2   0 ]       [ε₃]   [1]

and then wrote the first term as a linear combination of the columns of A, giving α u + β v + ε = w:

    α[1, –2, 2] + β[1, 2, 0] + [ε₁, ε₂, ε₃] = [1, 1, 1]

The ε vector serves as an error, and now there is always a solution to the system; in fact there is a solution for every choice of α and β –– we can always choose the εᵢ to make it work. Of course we want to choose the εᵢ to be as small (in some sense) as possible, and indeed the criterion we used in Example 2.2.1 was to choose ε to be of minimum length, and that happens when it is orthogonal to the plane spanned by the columns of A. Note that this minimizes the sum of the squares of the εᵢ (which is the square of the length of ε): the method gives a solution which minimizes ||ε||² = ε₁² + ε₂² + ε₃².

Now how do we effectively "use" this orthogonality condition? There's an elegant computational dance. Take the dot product of the equation with both u and v:

    α u•u + β u•v + u•ε = u•w
    α v•u + β v•v + v•ε = v•w

Write these in matrix form:

    [u•u  u•v] [α]   [u•ε]   [u•w]
    [v•u  v•v] [β] + [v•ε] = [v•w]

Since ε is orthogonal to the plane, u•ε and v•ε are both zero, and we get a "square" system in which the number of equations is the same as the number of unknowns:

    [u•u  u•v] [α]   [u•w]
    [v•u  v•v] [β] = [v•w]

The dot products are:

    u•u = 9,  u•v = v•u = –3,  v•v = 5,  u•w = 1,  v•w = 3

so the system is

    [ 9  –3] [α]   [1]
    [–3   5] [β] = [3]

We could solve this by elimination, but instead we use the matrix inverse approach. The inverse of the coefficient matrix is

    (1/36) [5  3]
           [3  9]

so

    [α, β] = (1/36)[5·1 + 3·3, 3·1 + 9·3] = (1/36)[14, 30] = (1/18)[7, 15]

We get α = 7/18 and β = 15/18.

As a check we calculate the error:

    ε = w – A β^ = [1, 1, 1] – (1/9)[11, 8, 7] = (1/9)[–2, 1, 2]

and check that ε is orthogonal to the vectors u and v. Always do that!
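The computational dance above is easy to carry out numerically. A sketch (assuming NumPy) that forms the square system and solves it, reproducing α = 7/18 and β = 15/18:

```python
import numpy as np

# The system A beta = w of Example 2.2.2.
A = np.array([[1.0, 1.0],
              [-2.0, 2.0],
              [2.0, 0.0]])
w = np.array([1.0, 1.0, 1.0])

# The "square" normal equations (A^T A) beta = A^T w.
AtA = A.T @ A    # entries are the dot products u.u, u.v, v.u, v.v
Atw = A.T @ w    # entries are u.w and v.w
beta_hat = np.linalg.solve(AtA, Atw)   # [7/18, 15/18]

# The check recommended above: the error must be orthogonal to u and v.
eps = w - A @ beta_hat
print(beta_hat, A.T @ eps)             # second vector is (numerically) zero
```

The two printed entries of A.T @ eps being zero is exactly the orthogonality condition u•ε = v•ε = 0.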
A technical note: In textbooks on the subject, you will often see the original equation A β + ε = w transformed by multiplying both sides on the left by the transpose of A:

    Aᵀ A β + Aᵀ ε = Aᵀ w

But if you look carefully, that's just a slick way of getting our 2×2 system. Quite simply, the entries of the matrix Aᵀ A are the dot products u•u, u•v, v•u and v•v, and the system above is exactly the one we already formulated:

    (Aᵀ A) β + Aᵀ ε = Aᵀ w
    [u•u  u•v] [α]   [u•ε]   [u•w]
    [v•u  v•v] [β] + [v•ε] = [v•w]

No matter how you think of it, what is important is that you can easily arrive at the corresponding square system of equations which will allow you to find the least-squares approximation. The forms of these matrices are given below for the 2×2 and the 3×3 cases.

Table of matrices for the 2×2 and 3×3 cases:

    A = [u v]      AᵀA = [u•u  u•v]       Aᵀw = [u•w]
                         [v•u  v•v]             [v•w]

    A = [u v w]    AᵀA = [u•u  u•v  u•w]  Aᵀz = [u•z]
                         [v•u  v•v  v•w]        [v•z]
                         [w•u  w•v  w•w]        [w•z]

The least-squares solution of the equation A β = w is the solution β of the "square" system of equations:

    Aᵀ A β = Aᵀ w

Here Aᵀ is the transpose of A, defined as the matrix whose rows are the columns of A. Thus:

    [ 1   1 ]ᵀ
    [–2   2 ]  =  [1  –2  2]
    [ 2   0 ]     [1   2  0]

Example 2.2.3. Find the least squares approximation to the solution of the system of equations:

    2x + y = 3
    x – y = 1
    x + y = 2

Solution. In matrix form the system is A x = b:

    [2   1] [x]   [3]
    [1  –1] [y] = [1]
    [1   1]       [2]

The equation for the least-squares solution x is Aᵀ A x = Aᵀ b. We calculate:

    AᵀA = [u•u  u•v] = [6  2]      Aᵀb = [u•b] = [9]
          [v•u  v•v]   [2  3]            [v•b]   [4]

And the equation to be solved is:

    [6  2] [x]   [9]
    [2  3] [y] = [4]

The inverse of the coefficient matrix is

    (1/14) [ 3  –2]
           [–2   6]

so

    [x, y] = (1/14)[3·9 – 2·4, –2·9 + 6·4] = (1/14)[19, 6]

We get x = 19/14, y = 6/14.

Example 2.2.4. Use the above approach to find a least squares solution for the equation:

    [1  –6] [α]   [3]
    [0  –5] [β] = [1]
    [2   3]       [2]

Solution.
The new system is Aᵀ A β = Aᵀ w:

    [ 1   0   2] [1  –6] [α]   [ 1   0   2] [3]
    [–6  –5   3] [0  –5] [β] = [–6  –5   3] [1]
                 [2   3]                    [2]

    [5    0] [α]   [  7]
    [0   70] [β] = [–17]

The equations are uncoupled, the first involving only α and the second only β. They read 5α = 7 and 70β = –17, and the solution is

    [α, β] = [7/5, –17/70]

Note: This is an example in which the columns of A are orthogonal. Notice that this gives us a diagonal coefficient matrix for our new system, and the solution can be found immediately.

Example 2.2.5. Find the line y = αx + β that "best fits" the data tabulated below, and calculate and display the error vector ε, defined as the difference between the y-value of the point and the height of the approximating line. [Thus in the scatter-plot diagram, ε₂ is positive (point above the line) and the other three errors are negative.]

Solution. Plug the data points into the equation. We get yᵢ = αxᵢ + β + εᵢ:

    10 = α + β + ε₁
    40 = 2α + β + ε₂
    20 = 3α + β + ε₃
    30 = 4α + β + ε₄

where the εᵢ are the "errors" which measure the vertical distance between the i-th data point and the line. In vector-matrix form, this is X β + ε = y:

    [1  1] [α]   [ε₁]   [10]
    [2  1] [β] + [ε₂] = [40]
    [3  1]       [ε₃]   [20]
    [4  1]       [ε₄]   [30]

If there was a line passing exactly through all four data points, then we could find a pair (α, β) which satisfied all four equations with εᵢ = 0. But this is not the case, so we try to choose α and β to make the error ε as small as possible. The ideas above tell us that this will be the case when ε is orthogonal to the columns of the matrix X. In this example, these are vectors in R⁴, but the same dot product analysis works. With u = [1, 2, 3, 4] and v = [1, 1, 1, 1], the system we get is:

    [u•u  u•v] [α]   [u•y]        [30  10] [α]   [270]
    [v•u  v•v] [β] = [v•y]  i.e.  [10   4] [β] = [100]

The inverse of the coefficient matrix is

    (1/20) [  4  –10]
           [–10   30]

so

    [α, β] = (1/20)[4·270 – 10·100, –10·270 + 30·100] = (1/20)[80, 300] = [4, 15]

This tells us that the best fit line is y = 4x + 15. The error is:

    ε = y – X β^ = [10, 40, 20, 30] – [19, 23, 27, 31] = [–9, 17, –7, –1]

As usual, check that ε is orthogonal to the columns of X.
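The line fit above follows the same normal-equations recipe; here is a sketch (assuming NumPy, with the four data points hard-coded) that reproduces the line y = 4x + 15 and the error vector:

```python
import numpy as np

# Data points (x_i, y_i) for the line fit of Example 2.2.5.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([10.0, 40.0, 20.0, 30.0])

# Design matrix X = [u v] with columns u = x and v = all-ones.
X = np.column_stack([x, np.ones_like(x)])

# Solve the square system (X^T X) [alpha, beta] = X^T y.
coeffs = np.linalg.solve(X.T @ X, X.T @ y)   # [4, 15] -> the line y = 4x + 15

# The signed errors, and the orthogonality check.
eps = y - X @ coeffs                          # [-9, 17, -7, -1]
print(coeffs, X.T @ eps)                      # second vector is (numerically) zero
```

The same code handles the quadratic fit of the next example if a column x² is appended to X.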
Data for Example 2.2.5:

    x   y
    1   10
    2   40
    3   20
    4   30

Note: The εᵢ are actually signed errors: when the point is below the line, εᵢ is negative. This is clear from the equations above.

Note: The regression line is often called the least squares line. That's because it is the sum of the squares of the errors εᵢ that is being minimized.

Example 2.2.6. The data points below have been observed. It is desired to fit a quadratic regression model:

    y = αx² + βx + γ

Find the least squares quadratic polynomial and plot its graph along with the data points. Calculate the residual vector ε.

    x   y
    0   0
    0   1
    1   0
    1   1
    2   1
    2   2

Solution. We write the six equations, one for each data point, as the vector equation X β + ε = y:

    [0  0  1] [α]   [ε₁]   [0]
    [0  0  1] [β] + [ε₂] = [1]
    [1  1  1] [γ]   [ε₃]   [0]
    [1  1  1]       [ε₄]   [1]
    [4  2  1]       [ε₅]   [1]
    [4  2  1]       [ε₆]   [2]

The condition that ε be orthogonal to the columns of X is Xᵀε = 0:

    Xᵀ(X β + ε) = Xᵀ y
    Xᵀ X β + Xᵀ ε = Xᵀ y
    Xᵀ X β = Xᵀ y

With u = [0, 0, 1, 1, 4, 4] (the x² column), v = [0, 0, 1, 1, 2, 2] (the x column) and w = [1, 1, 1, 1, 1, 1], the dot products are

    u•u = 34, v•v = 10, w•w = 6, u•v = v•u = 18, u•w = w•u = 10, v•w = w•v = 6, u•y = 13, v•y = 7, w•y = 5

and the system is:

    [34  18  10] [α]   [13]
    [18  10   6] [β] = [ 7]
    [10   6   6] [γ]   [ 5]

We can solve this by elimination, or use technology to calculate the matrix inverse. A good website is: http://www.bluebit.gr/matrix-calculator/calculate.aspx. The solution is

    [α, β, γ] = (1/2)[1, –1, 1]

giving us the best fit parabola

    y = (1/2)(x² – x + 1)

The residual vector is

    ε = y – X β^ = [0, 1, 0, 1, 1, 2] – [1/2, 1/2, 1/2, 1/2, 3/2, 3/2] = (1/2)[–1, 1, –1, 1, –1, 1]

As usual, check that ε is orthogonal to the columns of X.

[Figure: the six data points and the parabola y = (x² – x + 1)/2.]

Note: Having seen this picture, can you see a way you might have deduced that this would have to be the answer without doing any work at all?

Example 2.2.7.
Find the plane z = αx + βy passing through the origin that "best fits" the data below, and calculate and display the error vector ε.

    x   y   z
    1   1   10
    1   2   30
    2   1   20
    2   2   50

Solution. Plug the data points into the equation. We get zᵢ = αxᵢ + βyᵢ + εᵢ:

    10 = α + β + ε₁
    30 = α + 2β + ε₂
    20 = 2α + β + ε₃
    50 = 2α + 2β + ε₄

where the εᵢ are the "errors" which measure the vertical distance between the i-th data point and the plane. In vector-matrix form, this is X β + ε = z:

    [1  1] [α]   [ε₁]   [10]
    [1  2] [β] + [ε₂] = [30]
    [2  1]       [ε₃]   [20]
    [2  2]       [ε₄]   [50]

The least-squares condition is that ε be orthogonal to the columns of the matrix X, that is, Xᵀε = 0. And the algebraic condition for that is (Xᵀ X) β = Xᵀ z. With x = [1, 1, 2, 2] and y = [1, 2, 1, 2] the dot products are x•x = 10, x•y = y•x = 9, y•y = 10, x•z = 180, y•z = 190, so:

    [x•x  x•y] [α]   [x•z]        [10   9] [α]   [180]
    [y•x  y•y] [β] = [y•z]  i.e.  [ 9  10] [β] = [190]

The inverse of the coefficient matrix is

    (1/19) [10  –9]
           [–9  10]

so

    [α, β] = (1/19)[10·180 – 9·190, –9·180 + 10·190] = (1/19)[90, 280]

This tells us that the best fit plane is

    z = (1/19)(90x + 280y)

The error is:

    ε = z – X β^ = [10, 30, 20, 50] – (1/19)[370, 650, 460, 740] = (1/19)[–180, –80, –80, 210]

As usual, check that ε is orthogonal to the columns of X.

Example 2.2.8. The table below shows the mid-term test mark (out of 10) and the final mark for four of Jason's friends who took the course last year. Jason's mid-term test mark is 7. He decides to use a linear model y = mx + b to predict his final mark, where x is the midterm mark received and y is the expected final mark. What does he discover?

Solution. We use a "least squares" model.
    Student   Mid   Final
    Brenda     8     90
    Alnoor     7     87
    Kit        9     94
    Tim        7     75

When we plug in the data, we get the equations:

    90 = 8m + b
    87 = 7m + b
    94 = 9m + b
    75 = 7m + b

We write this in matrix form and include the error: X b + ε = y, with X = [u v]:

    [8  1] [m]   [ε₁]   [90]
    [7  1] [b] + [ε₂] = [87]
    [9  1]       [ε₃]   [94]
    [7  1]       [ε₄]   [75]

The least squares solution is the solution of the equation (Xᵀ X) b = Xᵀ y. The dot products are u•u = 243, u•v = v•u = 31, v•v = 4, u•y = 2700, v•y = 346, so:

    [243  31] [m]   [2700]
    [ 31   4] [b] = [ 346]

The inverse of the coefficient matrix is

    (1/11) [  4  –31]
           [–31  243]

so

    [m, b] = (1/11)[4·2700 – 31·346, –31·2700 + 243·346] = (1/11)[74, 378]

The best fit regression equation is

    y = (74x + 378)/11

Jason's estimate of his final mark is

    y = (74·7 + 378)/11 = 896/11 ≈ 81.5

Now check that the error ε is orthogonal to the columns of X:

    ε = y – X b^ = [90, 87, 94, 75] – (1/11)[970, 896, 1044, 896] = (1/11)[20, 61, –10, –71]

and we calculate 8×20 + 7×61 + 9×(–10) + 7×(–71) = 0 and also 20 + 61 – 10 – 71 = 0. Check!

2.3C The Hat Problem and Error-correcting Codes

Three players are sitting around a circle and either a red or a blue hat is placed on each person's head. The colour of each hat is determined by a coin toss, with the outcome of each toss having no effect on the others. Each person can see the colour of the others' hats, but not her own. After a brief pause, during which time players study the other hats, each player either "passes" or attempts to guess the colour of the hat on her head. All three responses are simultaneous, so that no player can use the information gained from the response of another player. If at least one player has guessed the colour of the hat on her head and no player has guessed wrong, the group shares a prize of one million dollars. Otherwise, if all have passed or at least one player has guessed wrong, there is no prize.

Now the point is that these three players are not in competition; they are a team.
They are not allowed to communicate with one another during the hat ceremony, but they can get together beforehand and talk strategy. The question is, how well can they do? Find a strategy which will maximize the probability of winning the prize, and find that probability.

This is a good exploratory problem for small group work. It's easy enough to find a strategy which will give a 50% chance of winning. Pick one person to be the "captain" and have her guess red and have all the others pass. She will be right with probability 50%. The question is, can they do better than that?

At this point a student in my class will often attempt to argue that it's impossible to do better than 50%. Here's the idea. No matter what the strategy, a person gains no information from the other players on the colour of her own hat, which will always be red or blue with equal probability. We deduce that the probability of being right when a colour is guessed has to be 50%. Thus, on average, everyone who guesses a colour must guess wrong 50% of the time. Well, given that, how on earth could anyone do better than to win the prize half the time?

In fact, there's a strategy that will win 75% of the time. Can you find it?

Note: This exposition of the hat problem, and of error-correcting codes in the next section, is based on a unit constructed by Richard Hoshino who was, at the time, a Ph.D. student at Dalhousie University.

Here's another way to make the argument. Consider player 1. Take a state (a set of hat colours) for which the strategy requires player 1 to guess a colour. To be specific (but general), suppose the state is RBR. Now consider the "partner" state obtained by switching player 1's hat colour: BBR. From player 1's point of view, these states are the same and he will make the same guess in each of them. But he will clearly guess right in one and wrong in the other. Since these two states are equally likely, he will guess wrong exactly half the time.
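The pairing argument can be checked exhaustively, since there are only 8 states. Below is a small sketch that enumerates them for one player following an arbitrary fixed rule (the particular rule here is hypothetical, chosen only for illustration) and counts right versus wrong guesses:

```python
from itertools import product

# A hypothetical fixed rule for player 1: she looks at the other two
# hats and always guesses (no passing). Any deterministic rule will do.
def rule(other_two):
    return "R" if other_two[0] == other_two[1] else "B"

right = wrong = 0
for state in product("RB", repeat=3):   # the 8 equally likely states
    guess = rule(state[1:])             # player 1 sees hats 2 and 3 only
    if guess == state[0]:
        right += 1
    else:
        wrong += 1

# Pairing states that differ only in player 1's hat forces right == wrong.
print(right, wrong)  # 4 4
```

Whatever rule is substituted, the counts come out equal, exactly as the partner-state pairing predicts.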
Note: For a couple of years I looked around for a "neat" way to introduce error-correcting codes. This awesome problem, created by a Ph.D. student, Todd Ebert, at the University of California, was the very thing.

Pass or go for the mix. Here we describe a strategy that will win 75% of the time. Number the players 1, 2 and 3 and use 0 for blue and 1 for red. Then the 8 possible configurations are:

    000  100  010  001  110  101  011  111

The first one listed is the all-blue case; the second is the case where player 1 has red and the others have blue; etc. It's important to note that these states are all equally likely, each occurring with probability (1/2)³ = 1/8. That's because the three coins are fair and independent.

Now notice that these are of two kinds, pure and mixed. The first and the last (000 and 111) are pure and the six in the middle are mixed. Here's the strategy. Look at the other two hats. If they are different colours, pass; if they are the same colour, guess the opposite, that is, "go for the mix."

It's not hard to see that this will produce a win precisely when the configuration is mixed. In that case, the player with the minority colour will "go for the mix" and guess right, and the other two will pass. For the two pure configurations, everyone will "go for the mix" and will guess wrong.

Notice that, for this strategy, there are the same number of right and wrong guesses, as we argued would have to be the case. Indeed, all states are equally likely to occur, and the 6 mixed states have 1 right guess each while the 2 pure states each have 3 wrong guesses.

Not only are the numbers of right and wrong guesses the same, there is a natural 1-1 correspondence between the right guesses and the wrong guesses. Given a fixed strategy, every time there is a state in which I make a right guess, there will be a companion state, with all hat colours the same except mine, in which I will guess wrong. I will make the same guess in both states, and one will necessarily be wrong and the other will be right.

Note: What we've done in our 6/8 strategy is taken the right guesses and spread them out as thinly as possible (only one per winning state) and taken the wrong guesses and concentrated them as thickly as possible (3 per losing state). Strategies which accomplish this are called maximally effective. We simply can't do better than that.

This observation gives us a way to argue that we could never do any better than this 6/8 win.
Indeed, simply to get those 6 winning states requires at least 6 right guesses (one per winning state), and that will require at least 6 wrong guesses, and we've only 2 states left. In fact we can just do it, by having all the guesses wrong for those 2 states. But in fact we're "maxed out" with this configuration.

An important definition. Strategies which accomplish this, which use only one right guess in any winning state and all wrong guesses in any losing state, are called maximally effective. We are now going to look at the general n-hat game and ask when we might expect maximally effective strategies.

The n-hat game.

Proposition 1. For the n-hat game, the probability of winning can never exceed n/(n+1). This target is met only with a maximally effective strategy.

The argument above generalizes. A maximally effective strategy has one right guess for each W-state and n wrong guesses for each L-state.
Then:

    # right guesses = (# W-states)
    # wrong guesses = n · (# L-states)

Since # right = # wrong, we deduce (# W-states) = n · (# L-states), and so

    (Total # states) = (# W-states) + (# L-states) = (n + 1)(# L-states)

Thus the probability of losing is 1/(n+1), and hence the probability of winning is n/(n+1), and that's clearly the best we can do.

Proposition 2. A maximally effective strategy can only occur if n is of the form 2^k – 1, i.e. for n = 3, 7, 15, 31, 63, etc.

This follows from the last equation above: (Total # states) = (n+1)(# L-states). Now the total number of states is a power of 2 (= 2^n). It follows that n+1 must be a power of 2, and that means that n is one less than a power of 2. And we are done.

A geometric description of the 3-player game. There is a useful and powerful geometric realization of the state space of the hat game. For the 3-player game, the 8 possible states can be represented as the 8 vertices (which we will call nodes) of the unit cube, with one axis for each player. For example, the node 101 represents the state in which players 1 and 3 have red hats (1) and player 2 has a blue hat.

Notice that two nodes are connected by an edge precisely when they differ in a single slot. As a consequence of that, a player in a game, able to see the other two hats but not his own, can be regarded as sitting on one of the edges, wondering which of the two endpoints belongs to the actual state. Thus, while the nodes represent the possible states, the edges represent all the possible situations a player can experience.

[Figure: the unit cube with axes x₁, x₂, x₃ and nodes 000, 001, …, 111; the four edges parallel to the x₁ axis are bolded.]

For example, for the 3-hat game, take player 1. Suppose she sees red hats on both the other players. Then the possibilities are 011 and 111, and the edge joining these is the heavy edge in the top right corner of the diagram. The three other edges that she might "find herself on" are also bolded. These are the four edges that are parallel to the x₁ axis. Similarly, player 2's four edges are those that are parallel to the x₂ axis. And player 3's four edges are those that are vertical.
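The 6/8 claim for the "pass or go for the mix" strategy is small enough to verify by brute force over the 8 nodes of the cube. A sketch ("R" and "B" for the two colours):

```python
from itertools import product

# Each player looks at the other two hats: if they match, guess the
# opposite colour ("go for the mix"); if they differ, pass (None).
def response(others):
    if others[0] == others[1]:
        return "B" if others[0] == "R" else "R"
    return None

wins = 0
for state in product("RB", repeat=3):   # the 8 nodes of the cube
    guesses = [response(state[:i] + state[i+1:]) for i in range(3)]
    someone_right = any(g == state[i] for i, g in enumerate(guesses) if g)
    nobody_wrong = all(g == state[i] for i, g in enumerate(guesses) if g)
    wins += someone_right and nobody_wrong

print(wins, "of 8 states win")  # 6 of 8 -> probability 3/4
```

Only the two pure states lose, matching the W-L labeling of the cube discussed next.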
Now, given a particular strategy, some states will be winning and the others will be losing, and we can display this with a labeling of the nodes, either a W or an L. For the "go for the mix" strategy, the cube is labeled with L at the two pure nodes 000 and 111 and W at the six mixed nodes. Note that:

1. All neighbours of an L-node are W.
2. Each W-node has exactly one L-neighbour.
3. A player on a W-W edge passes.
4. A player on a W-L edge "goes for the W."

Generalization to the n-hat game. We now "extend" our analysis of the 3-hat game to the case of n hats. We do this, not by generalizing the strategy of the 3-hat game, but by generalizing its structure. That is, it is the structure we have found on the unit cube in 3-space that we will generalize.

We begin with two remarkable results. The first (Prop. 3) starts with a labeling of the n-cube with the symbols A and B having two combinatorial properties. What the Proposition then tells us is that we can use the labeling to construct a maximally effective strategy for the n-hat game for which the A-nodes will win and the B-nodes will lose. Thus we will be able to get the W-L labeling from our A's and B's by replacing A by W and B by L.

Proposition 3. Suppose we can find a way to label the nodes of the n-dimensional cube with either an A or a B so that the following two properties hold:

(a) All neighbours of a B-node are A.
(b) Each A-node has exactly one B-neighbour.

Then let the players adopt the following strategy:

(c) If you are on an A-A edge, pass.
(d) If you are on a B-A edge, guess the colour corresponding to the A-node.
Then:

- the A-nodes will all produce a win,
- the B-nodes will all produce a loss,
- this is a maximally effective strategy for the n-hat game,

so from what we have shown earlier, its winning probability will be n/(n+1).

Note: For the strategy described above to make sense, we must note that while a player does not know what node he is on, he knows that it is one of two possible nodes connected by an edge. Since he knows the label of every node, he knows whether that edge is A-A or A-B.

Proposition 3 is easily verified. Since an A-node is connected to exactly one B (that's (b)), it will produce one right guess (that's (d)), and since all its other neighbours are A (that's (b) again), all other responses will be a PASS (that's (c)). So an A-node will win (with exactly one right guess). Secondly, a B-node has all A neighbours (that's (a)), so everyone will guess the A (that's (d)) and it will lose with n wrong guesses. Along the way we have also verified maximal effectiveness.

Given Proposition 3, the question becomes: when can we construct such a labeling? What we showed earlier tells us that we would never be able to find such a labeling for an n which was not of the form 2^k – 1 (as there would be no maximally effective strategies). What we now show is that for those n which are of the form 2^k – 1, we can always find such a labeling.

Proposition 4. If n