Ben-Gurion University of the Negev
Data Structures 202.1.1031
Assignment No. 4

Responsible staff members:
Prof. Paz Carmi (carmip@bgu.ac.il)
Ilan Naiman (naimani@post.bgu.ac.il)
Jules Zisser (zisserh@post.bgu.ac.il)

Publish date: June 1st, 2025
Submission date: June 22nd, 2025

Contents
• Homework Guidelines
• Integrity Statement
• ADT Implementation
• Huffman Code And Lempel–Ziv (LZ78)
• B-trees
• Amortized Analysis
• Sorting

Homework Guidelines

• Submission is in pairs or individually. We recommend working in pairs to encourage discussion and mutual inspiration. For the avoidance of doubt, if the assignment is submitted by a pair, all of the submitted answers should be the outcome of joint work.
• Whether you submit as a pair or as an individual, you must choose a group in the group assignment resource in Moodle. Groups created for individuals are of the form Assignment4_s_20, and groups for pairs are of the form Assignment4_p_100.
• Your submission should be a single typed PDF file (composed with LaTeX or any word processor such as MS Word), and its name should be your group name (as chosen in Moodle); e.g., for pair 17, the file name should be Assignment4_p_17.pdf.
• Questions regarding the assignment should be asked in the dedicated forum in Moodle or during the office hours of the responsible staff members. The forum is intended for discussion between the students regarding the assignment. When clarification is required, the staff will answer questions in the pinned FAQ thread of the forum.
• $\log()$ in the course is in base 2, i.e., $\log_2()$, unless stated otherwise.
• A submission that does not include the integrity statement will receive a score of 0.
• Each question details the maximal answer length. An answer longer than this limit will not be checked.
• Points will be deducted from answers that are unclear to the readers. That includes answers with poor formatting, missing details, or imprecise, verbose explanations.
• For LaTeX users: you can use the package lineno and the environment internallinenumbers (using \begin{} and \end{}) for automatic line numbering, or use the macros defined above (in the source code), \startAnswerBox and \closeAnswerBox.

Before starting your work, read the following instructions carefully:

• Submit each version of your work as you go, so that each submission contains a larger completed part of your work. This way, you won't lose major parts of your work due to a technical issue or unexpected circumstances.
• DO NOT wait until the final hour to submit your work, because there might be a power stoppage, computer issues, internet issues, Moodle issues, etc.
• Keep a continuous backup of your work on a cloud platform that you can access even if your computer malfunctions.
• If you work in a pair, both partners must verify that the work was submitted on time and in full. This way, you will prevent uncomfortable situations in case of technical issues.

Integrity Statement

We, <Insert your name>, ID number <Insert your ID>, and <Insert your name>, ID number <Insert your ID>, assert that the work we submitted is entirely our own. We have not received any part from any other student in the class, nor did we give parts of it for others to use. We realize that if our work is found to contain code/answers that are not originally our own, a formal case will be opened against us with the BGU disciplinary committee.

ADT Implementation

Question 1: (24 Points)

The great retail company Kamazol™ is a web-based company that sells many different items to its clients. An important part of the website's behavior is to display the most popular items that the platform users buy, meaning the items with the most purchases. In this question, we wish to implement the ADT described below, where $n$ is the maximal number of unique items in the platform at any given time, as specified by the platform manager in the Init phase.

Remark: The items are inserted into the data structure by using the Record operation.
Remark: An item is identified by a unique integer between $0$ and $u$, where $u$ is a parameter that is given in the initialization of the structure.
Remark: Whenever there are two items with the same count, you may choose their ordering arbitrarily.
Hint: Make sure that in your answer, you consider all possible cases.

| Operation | Description | Time Complexity |
| Init(n, u) | Initializes a structure that supports n unique items (that is, the maximal number of items in the system at any given time). | $\Theta(n)$ worst case |
| Record(S, item, count) | Records the purchase of count instances of item. If item has been purchased before, increases its purchase count by count; otherwise, inserts the item with purchase count count. | $\Theta(\log n)$ expected |
| CountOf(S, item) | Returns the purchase count of item if it exists, and $-1$ otherwise. | $\Theta(1)$ expected |
| MostPopular(S, k) | Returns the k most popular items in the platform, where $k \le n$. | $\Theta(k)$ worst case |
| RemoveLeastPopular(S) | Removes the least popular item in the platform. | $\Theta(1)$ expected |

Answer 1:

A general description of the data structure: [Answer box: 5 lines]
Init(n, u) description: [Answer box: 5 lines]
Init(n, u) time complexity analysis: [Answer box: 3 lines]
Record(S, item, count) description: [Answer box: 20 lines]
Record(S, item, count) time complexity analysis: [Answer box: 8 lines]
CountOf(S, item) description: [Answer box: 5 lines]
CountOf(S, item) time complexity analysis: [Answer box: 3 lines]
MostPopular(S, k) description: [Answer box: 5 lines]
MostPopular(S, k) time complexity analysis: [Answer box: 3 lines]
RemoveLeastPopular(S) description: [Answer box: 5 lines]
RemoveLeastPopular(S) time complexity analysis: [Answer box: 3 lines]

Huffman Code And Lempel–Ziv (LZ78)

Question 2: (12 points)

Let $|T|$ denote the length of a text $T$. We say that a text $T$ is in $\mathcal{T}(n, d)$ if and only if $|T| = n$ and $T$ contains $d$ unique characters. Given a text $T$, let Huffman(T) be the text produced by applying the Huffman code to $T$.

1. Give an explicit example of a text $T$ in $\mathcal{T}(n, d)$ that minimizes |Huffman(T)|, where $n = 32$ and $d = 6$.
2. Give an explicit example of a text $T$ in $\mathcal{T}(n, d)$ whose LZ78 parsing consists of the fewest possible blocks, where $n = 28$ and $d = 3$.
3. Among all strings in $\mathcal{T}(28, 3)$ whose LZ78 parsing consists of the fewest possible blocks, give an explicit string $T$ that minimizes |Huffman(T)|.

Answer 2:

1. [Answer box: 5 lines]
2. [Answer box: 5 lines]
3. [Answer box: 5 lines]

B-trees

Question 3: (6 points)

1. Give an example of a B-tree $T$ with $t = 3$, where inserting a key $x$ into $T$ and then deleting $x$ results in a B-tree that is structurally different from $T$.
2. Give an example of a B-tree $T$ with $t = 3$, where deleting $x$ from $T$ and then inserting $x$ into $T$ results in a B-tree that is structurally different from $T$.

Answer 3:

1.
The tree T: [A drawing of the original tree T]
The element x is: (Write your answer here)
[A drawing of the tree after inserting x]
[A drawing of the tree after deleting x]

2.
The tree T: [A drawing of the original tree T]
The element x is: (Write your answer here)
[A drawing of the tree after deleting x]
[A drawing of the tree after inserting x]

Amortized Analysis

In the lecture, you were introduced to the following algorithm for building a heap:

Algorithm 1: buildMaxHeap(A, n)
    for i ← Parent(A.size − 1) downto 0 do
        maxHeapify(A, i)    (Algorithm 2)
    end for

where maxHeapify is the following procedure you learned in class:

Algorithm 2: maxHeapify(A, i)
    l ← Left(i)
    r ← Right(i)
    largest ← i
    if l ≤ A.size − 1 AND A[l] > A[largest] then
        largest ← l
    end if
    if r ≤ A.size − 1 AND A[r] > A[largest] then
        largest ← r
    end if
    if largest ≠ i then
        exchange A[i] with A[largest]
        maxHeapify(A, largest)
    end if

Question 4: (20 points)

In the buildMaxHeap procedure above, there are $\lfloor n/2 \rfloor$ calls to the maxHeapify operation, where $n$ = A.size. Present and analyze the amortized cost of a single maxHeapify operation in this sequence of operations using the accounting method.

Answer 4: [Answer box: 20 lines]

Sorting

Question 5: (22 points)

1. You are given an array $A$ with $n - 1$ numbers, where $n = 2^k$ for some integer $k$. One of the values appears exactly $n/2$ times, another appears exactly $n/4$ times, and so on. More formally, for every $1 \le i \le \log n$ there exists a value that appears exactly $n/2^i$ times in the array. Describe an algorithm that sorts $A$ with a worst-case running time of $\Theta(n)$, and analyze its running time.

Example:
Input: 5, 1, 2, 5, 2, 5, 5
Output: 1, 2, 2, 5, 5, 5, 5

2. Definition: Two values in an array are called adjacent (or "consecutive") if they are neighbours in the sorted array.
For example, in the array $[9, 4, 7, 1]$, the value 7 is adjacent to 4 on its left and to 9 on its right.

Let $A$ be an array containing $n - 1$ numbers, where $n = 2^k$ for some positive integer $k$. For every integer $1 \le i \le k$, define the quota $q_i = n/2^i$. For every quota $q_i$, one of the following holds:
(a) There exists a single value that occurs exactly $q_i$ times in $A$, or
(b) There exist two adjacent values whose combined frequency is exactly $q_i$ (each may occur any positive number of times, but the sum is $q_i$).

Design a deterministic algorithm that sorts $A$ in worst-case $\Theta(n)$ time under the frequency condition above.

Answer 5:

1. [Answer box: 18 lines]
2. [Answer box: 18 lines]

Question 6: (16 points)

You are given an array $A$ with $n$ elements such that each element contains a key (an integer) and additional satellite data. The array contains exactly $k$ unique key values.

1. Assume that $k \in \Theta(\log n)$. Describe a stable sorting algorithm that sorts $A$ with a running time of $\Theta(n \log \log n)$.
2. Assume that $k \in \Theta(\sqrt{n})$. Describe a stable sorting algorithm that sorts $A$ with an expected running time of $\Theta(n)$. You may use $\Theta(n)$ extra space.

Example:
Input: (5, a), (1, a), (2, a), (5, d), (2, a), (5, c), (5, b)
Output: (1, a), (2, a), (2, a), (5, a), (5, d), (5, c), (5, b)

Answer 6:

1. Solution for $k \in \Theta(\log n)$: [Answer box: 16 lines]
2. Solution for $k \in \Theta(\sqrt{n})$: [Answer box: 15 lines]

Good Luck!
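
For readers who want to experiment with the buildMaxHeap and maxHeapify pseudocode from the Amortized Analysis section, the following is a minimal Python sketch of those two procedures. It assumes 0-based array indexing with Left(i) = 2i + 1, Right(i) = 2i + 2 and Parent(i) = ⌊(i − 1)/2⌋; the assignment does not define these helpers, so these definitions are an assumption. The call and exchange counters are illustrative additions that are not part of the original pseudocode, and the sketch does not answer Question 4.

```python
def left(i):
    # Assumed 0-based child index; not defined in the assignment.
    return 2 * i + 1


def right(i):
    # Assumed 0-based child index; not defined in the assignment.
    return 2 * i + 2


def parent(i):
    # Assumed 0-based parent index; not defined in the assignment.
    return (i - 1) // 2


def max_heapify(a, i):
    """Sift a[i] down until the subtree rooted at i is a max-heap.

    Returns the number of exchanges performed (an illustrative counter).
    """
    l, r = left(i), right(i)
    largest = i
    if l <= len(a) - 1 and a[l] > a[largest]:
        largest = l
    if r <= len(a) - 1 and a[r] > a[largest]:
        largest = r
    if largest != i:
        a[i], a[largest] = a[largest], a[i]
        return 1 + max_heapify(a, largest)
    return 0


def build_max_heap(a):
    """Turn the list a into a max-heap in place.

    Returns (top_level_calls, total_exchanges); both are illustrative
    counters added for experimentation.
    """
    calls = 0
    exchanges = 0
    # Iterate from the parent of the last element down to the root.
    for i in range(parent(len(a) - 1), -1, -1):
        exchanges += max_heapify(a, i)
        calls += 1
    return calls, exchanges


def is_max_heap(a):
    # Every non-root element must be at most its parent.
    return all(a[parent(i)] >= a[i] for i in range(1, len(a)))


if __name__ == "__main__":
    data = [5, 1, 2, 5, 2, 5, 5]
    calls, exchanges = build_max_heap(data)
    print(data, calls, exchanges, is_max_heap(data))
```

Running the sketch on arrays of different sizes lets you check that the loop issues exactly ⌊n/2⌋ top-level maxHeapify calls and observe how the exchanges are distributed across those calls.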