Platformy programowania
Prof. C. Jędrzejek, Room 317, building with the clock
Mob. tel. 793 161 101
Winter 2020

Precision and Recall – classification measures

When evaluating a search tool or a classifier, we are interested in at least two performance measures:
• Precision: within a given set of positively-labeled results, the fraction that are true positives = tp / (tp + fp)
• Recall: given a set of positively-labeled results, the fraction of all positives that were retrieved = tp / (tp + fn)
"Positively-labeled" means judged "relevant" by the search engine, or assigned to the class by a classifier. tp = true positives, fp = false positives, etc.

Precision and Recall

Search tools and classifiers normally assign scores to items. Sorting by score and sweeping a threshold over it gives a precision–recall plot, which shows what the performance would be at each score threshold. In IR, a query may consist of, e.g., 5 words (features) describing an object.

Be careful of "Accuracy"

The simplest measure of performance would be the fraction of items that are correctly classified, the "accuracy":
  accuracy = (tp + tn) / (tp + tn + fp + fn)
But this measure is dominated by the larger set (of positives or of negatives) and favors trivial classifiers: e.g., if 5% of items are truly positive, then a classifier that always says "negative" is 95% accurate.

Confusion matrix with the numerical values of some classifier

Zadanie 1 (Exercise 1) – similar to this:
  Sensitivity = TP / (TP + FN) = ?
  Specificity = TN / (TN + FP) = ?
• Sensitivity: the ability of a test to correctly identify patients with a disease.
• Specificity: the ability of a test to correctly identify people without the disease.
• True positive: the person has the disease and the test is positive. ...
• False negative: the person has the disease and the test is negative.

Decision trees
• A tree where
  – internal node = a test on a single attribute
  – branch = an outcome of the test
  – leaf node = a class or a class distribution
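A minimal sketch of the measures above, computed from raw confusion-matrix counts; the counts below are invented for illustration:

```python
# Invented confusion-matrix counts for a 1000-item test set (illustration only).
tp, fp, fn, tn = 40, 10, 5, 945

precision   = tp / (tp + fp)                  # of the positively-labeled items, how many are right
recall      = tp / (tp + fn)                  # of all true positives, how many were retrieved
sensitivity = recall                          # same quantity, medical-test terminology
specificity = tn / (tn + fp)                  # of all true negatives, how many were rejected
accuracy    = (tp + tn) / (tp + tn + fp + fn)

print(f"precision={precision:.3f} recall={recall:.3f} "
      f"specificity={specificity:.3f} accuracy={accuracy:.3f}")

# The accuracy trap: with 5% positives, always answering "negative"
# (tp=fp=0, fn=50, tn=950) scores 95% accuracy with zero recall.
trivial_accuracy = (0 + 950) / (0 + 950 + 0 + 50)
print(f"trivial 'always negative' accuracy={trivial_accuracy:.2f}")
```

Note how the trivial classifier's accuracy (0.95) is close to the real one's, while its recall is 0: this is why precision and recall are reported alongside accuracy.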
(Source: EDBT2000 tutorial, Konstanz, 27-28.3.2000)

Classical example: play tennis?

Training set from Quinlan's book:

  Outlook   Temperature  Humidity  Windy  Class
  sunny     hot          high      false  N
  sunny     hot          high      true   N
  overcast  hot          high      false  P
  rain      mild         high      false  P
  rain      cool         normal    false  P
  rain      cool         normal    true   N
  overcast  cool         normal    true   P
  sunny     mild         high      false  N
  sunny     cool         normal    false  P
  rain      mild         normal    false  P
  sunny     mild         normal    true   P
  overcast  mild         high      true   P
  overcast  hot          normal    false  P
  rain      mild         high      true   N

Decision tree obtained with ID3 (Quinlan 86):

  outlook
    sunny    -> humidity: high -> N, normal -> P
    overcast -> P
    rain     -> windy: true -> N, false -> P

From decision trees to classification rules
• One rule is generated for each path in the tree from the root to a leaf.
• Rules are generally simpler to understand than trees.
• Example: IF outlook = sunny AND humidity = normal THEN play tennis.

Example tree for "Play?" (the same tree with Yes/No leaves):

  Outlook
    sunny    -> Humidity: high -> No, normal -> Yes
    overcast -> Yes
    rain     -> Windy: true -> No, false -> Yes

Building a Decision Tree [Q93]
• Top-down tree construction
  – At the start, all training examples are at the root.
  – Partition the examples recursively, choosing one attribute each time.
• Bottom-up tree pruning
  – Remove subtrees or branches, in a bottom-up manner, to improve the estimated accuracy on new cases.

Choosing the Splitting Attribute
• At each node, the available attributes are evaluated on the basis of how well they separate the classes of the training examples. A goodness function is used for this purpose.
• Typical goodness functions:
  – information gain (ID3/C4.5)
  – information gain ratio
  – Gini index
(witten&eibe)

Which attribute to select?

A criterion for attribute selection
• Which is the best attribute?
  – The one that will result in the smallest tree.
  – Heuristic: choose the attribute that produces the "purest" nodes.
• Popular impurity criterion: information gain
  – Information gain increases with the average purity of the subsets that an attribute produces.
• Strategy: choose the attribute that results in the greatest information gain.

Zadanie 2 (Exercise 2): weather data with an ID code attribute

  ID  Outlook   Temperature  Humidity  Windy  Play?
  A   sunny     hot          high      false  No
  B   sunny     hot          high      true   No
  C   overcast  hot          high      false  Yes
  D   rain      mild         high      false  Yes
  E   rain      cool         normal    false  Yes
  F   rain      cool         normal    true   No
  G   overcast  cool         normal    true   Yes
  H   sunny     mild         high      false  No
  I   sunny     cool         normal    false  Yes
  J   rain      mild         normal    false  Yes
  K   sunny     mild         normal    true   Yes
  L   overcast  mild         high      true   Yes
  M   overcast  hot          normal    false  Yes
  N   rain      mild         high      true   No

Information gain

Information gain IG(A) is the difference in entropy from before to after the set S is split on an attribute A – in other words, how much the uncertainty in S was reduced by splitting S on A.
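As a sketch of this definition, the gain of "Outlook" can be computed directly on the 14 play-tennis examples (base-2 logarithm, so the result is in bits; only the Outlook and Play? columns are needed):

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

# (Outlook, Play?) pairs from the weather data above.
data = [("sunny", "No"), ("sunny", "No"), ("overcast", "Yes"), ("rain", "Yes"),
        ("rain", "Yes"), ("rain", "No"), ("overcast", "Yes"), ("sunny", "No"),
        ("sunny", "Yes"), ("rain", "Yes"), ("sunny", "Yes"), ("overcast", "Yes"),
        ("overcast", "Yes"), ("rain", "No")]

labels = [play for _, play in data]
before = entropy(labels)                     # entropy of the whole set: ~0.940 bits

# Expected entropy after splitting on Outlook: weighted sum over its values.
after = 0.0
for value in {outlook for outlook, _ in data}:
    subset = [play for outlook, play in data if outlook == value]
    after += len(subset) / len(data) * entropy(subset)

gain = before - after                        # IG(Outlook): ~0.247 bits
print(f"gain(Outlook) = {gain:.3f} bits")
```

The same loop applied to Temperature, Humidity and Windy reproduces the comparison of candidate attributes at the root.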
Example: attribute "Outlook"
• "Outlook" = "Sunny":
  info([2,3]) = entropy(2/5, 3/5) = -(2/5) log(2/5) - (3/5) log(3/5) = 0.971 bits
• "Outlook" = "Overcast":
  info([4,0]) = entropy(1, 0) = -1 log(1) - 0 log(0) = 0 bits
  (Note: log(0) is not defined, but we evaluate 0 * log(0) as zero.)
• "Outlook" = "Rainy":
  info([3,2]) = entropy(3/5, 2/5) = -(3/5) log(3/5) - (2/5) log(2/5) = 0.971 bits
• Expected information for the attribute:
  info([2,3],[4,0],[3,2]) = (5/14) * 0.971 + (4/14) * 0 + (5/14) * 0.971 = 0.693 bits

Computing the information gain
• Information gain = (information before split) - (information after split):
  gain("Outlook") = info([9,5]) - info([2,3],[4,0],[3,2]) = 0.940 - 0.693 = 0.247 bits
• Information gain for the attributes from the weather data:
  gain("Outlook")     = 0.247 bits
  gain("Temperature") = 0.029 bits
  gain("Humidity")    = 0.152 bits
  gain("Windy")       = 0.048 bits

Continuing to split (within the "sunny" branch):
  gain("Humidity")    = 0.971 bits
  gain("Temperature") = 0.571 bits
  gain("Windy")       = 0.020 bits

The final decision tree
• Note: not all leaves need to be pure; sometimes identical instances have different classes.
• Splitting stops when the data can't be split any further.

Highly-branching attributes
• Problematic: attributes with a large number of values (extreme case: an ID code).
• Subsets are more likely to be pure if there is a large number of values.
• Information gain is therefore biased towards choosing attributes with a large number of values.
• This may result in overfitting (selection of an attribute that is non-optimal for prediction).

Weather data with ID code (same table as in Zadanie 2):
  (the 14 rows A-N as listed above in Zadanie 2)

Split on the ID code attribute
• Entropy of the split = 0, since each leaf node is "pure", containing only one case.
• Information gain is therefore maximal for the ID code.

k-NN: basic idea
• The k-NN classification rule assigns to a test sample the majority category label of its k nearest training samples.
• In practice, k is usually chosen to be odd, so as to avoid ties.
• The k = 1 rule is generally called the nearest-neighbor classification rule.

k-NN algorithm
• For each training instance t = (x, f(x)): add t to the set Tr_instances.
• Given a query instance q to be classified:
  – Let x_1, ..., x_k be the k training instances in Tr_instances nearest to q.
  – Return
      f^(q) = argmax_{v in V} sum_{i=1..k} delta(v, f(x_i))
• Here V is the finite set of target class values, and delta(a,b) = 1 if a = b and 0 otherwise (the Kronecker delta).
• Intuitively, the k-NN algorithm assigns to each new query instance the majority class among its k nearest neighbors.

Simple illustration: q is "+" under 1-NN, but "-" under 5-NN.

Distance-weighted k-NN
• Replace
    f^(q) = argmax_{v in V} sum_{i=1..k} delta(v, f(x_i))
  by
    f^(q) = argmax_{v in V} sum_{i=1..k} [1 / d(x_i, x_q)^2] * delta(v, f(x_i))

Zadanie 3 (Exercise 3): Does the method used in "A Practical Introduction to K-Nearest Neighbors Algorithm for Regression (with Python code)" by AISHWARYA SINGH agree with the formula
    f^(q) = argmax_{v in V} sum_{i=1..k} [1 / d(x_i, x_q)^2] * delta(v, f(x_i)) ?

Zadanie 4 (Exercise 4): Adapt the notebook from the 26.01 lecture to the classification case and run the computations.

Zadanie 5 (Exercise 5): Which of the following statements are true? You evaluate the performance of your classifier on independent test data and observe that the results are substantially worse than on the training set. Which of the following measures are likely to improve the performance of your spam detection system?
• using additional features
• increasing the weight-decay parameter
• increasing the number of gradient-descent iterations
• reducing the number of gradient-descent iterations
• reducing the number of hidden neurons in the MLP
• obtaining more training data
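The k-NN rule and its distance-weighted variant described earlier can be sketched in plain Python (math.dist gives the Euclidean distance; the 2-D points are invented for the illustration):

```python
from collections import defaultdict
from math import dist  # Euclidean distance (Python 3.8+)

def knn_classify(query, training, k=3, weighted=False):
    """training: list of (feature_vector, label) pairs.
    Returns argmax_v sum_i delta(v, f(x_i)) over the k nearest neighbors,
    optionally weighting each vote by 1 / d(x_i, q)^2."""
    nearest = sorted(training, key=lambda t: dist(t[0], query))[:k]
    votes = defaultdict(float)
    for x, label in nearest:
        d2 = dist(x, query) ** 2
        # small epsilon guards against a training point coinciding with q
        votes[label] += 1.0 / (d2 + 1e-12) if weighted else 1.0
    return max(votes, key=votes.get)

# Mirrors the slide's illustration: q is "+" under 1-NN but "-" under 5-NN;
# distance weighting restores "+" because the one close neighbor dominates.
train = [((0.0, 0.1), "+"),
         ((1.0, 1.0), "-"), ((1.2, 0.9), "-"),
         ((0.9, 1.2), "-"), ((1.1, 1.1), "-")]
q = (0.0, 0.0)
print(knn_classify(q, train, k=1))                 # +
print(knn_classify(q, train, k=5))                 # -
print(knn_classify(q, train, k=5, weighted=True))  # +
```

The unweighted 5-NN vote is 4-to-1 for "-", but with inverse-square-distance weights the single neighbor at distance 0.1 outweighs the four neighbors at distance ~1.5.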