More Examples and Applications on AVL Tree


CSCI2100 Tutorial 11 Jianwen Zhao Department of Computer Science and Engineering The Chinese University of Hong Kong Adapted from the slides of the previous offerings of the course

Recall that in lectures we studied the AVL tree, which is one type of self-balancing binary search tree (BST). The aim was to store a set S of integers while supporting the following operations:
- Predecessor query: given an integer q, find its predecessor in S.
- Insertion: add a new integer to S.
- Deletion: remove an integer from S.
We want all of these operations to run in O(log n) time in the worst case, where n is the number of integers in S. To accomplish this with a BST, we must ensure that it is balanced after every operation, and the AVL tree is one way of doing so.
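To make the predecessor query concrete, here is a minimal sketch on a plain BST. It assumes that the predecessor of q means the largest element of S that does not exceed q; the `Node` class and the function name are illustrative and do not come from the lectures.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def predecessor(root, q):
    """Return the largest key <= q in the BST, or None if no such key exists."""
    best = None
    node = root
    while node is not None:
        if node.key <= q:
            best = node.key    # candidate; a larger candidate can only be to the right
            node = node.right
        else:
            node = node.left   # node.key is too large, so look at smaller keys
    return best

# Illustrative tree: 20 at the root, 10 on the left, 30 on the right, 25 under 30.
root = Node(20, Node(10), Node(30, Node(25)))
print(predecessor(root, 27))
print(predecessor(root, 5))
```

The loop follows a single root-to-leaf path, so its cost is proportional to the height of the tree, which is O(log n) when the tree is balanced.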

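Rebalancing
We know that a tree is balanced as long as, at every node, the heights of the two subtrees differ by at most 1, and that an insertion or deletion can only cause a 2-level imbalance (where the heights differ by 2). In lectures we explored the Left-Left and Left-Right cases in detail, so here we will look at Right-Right and Right-Left.
[Figure: the two cases. In both, the left subtree has height h and the right subtree has height h + 2. In Right-Right, the subtrees of the right child have heights h or h + 1 (left) and h + 1 (right). In Right-Left, they have heights h + 1 (left) and h (right).]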

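Right-Right
Similar to Left-Left, fix by a rotation.
Before: node a has left subtree A (height h) and right child b (height h + 2); b has left subtree B (height x) and right subtree C (height h + 1).
After: b is the root of the subtree, with left child a (height x + 1) and right subtree C (height h + 1); a has left subtree A (height h) and right subtree B (height x).
Note that x = h or h + 1, and the ordering from left to right of A, a, B, b, C is preserved after the rotation.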
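The Right-Right fix can be sketched as a single left rotation. This is a minimal illustration rather than the course's reference code: the names `Node`, `height`, `update` and `rotate_left` are assumptions, and the height of an empty subtree is taken to be 0.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        self.height = 1 + max(height(left), height(right))

def height(node):
    # Assumed convention: an empty subtree has height 0.
    return node.height if node else 0

def update(node):
    node.height = 1 + max(height(node.left), height(node.right))

def rotate_left(a):
    """Right-Right fix: the right child b of a becomes the root of the subtree."""
    b = a.right
    a.right = b.left   # subtree B moves from b's left to a's right
    b.left = a
    update(a)          # a is now below b, so recompute a first
    update(b)
    return b

def inorder(node):
    return inorder(node.left) + [node.key] + inorder(node.right) if node else []

# A right-leaning chain 10 -> 20 -> 30 (keys chosen for illustration).
root = Node(10, None, Node(20, None, Node(30)))
root = rotate_left(root)
print(root.key, root.height, inorder(root))
```

The in-order sequence is the same before and after the call, which is the ordering property noted above.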

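Right-Left
Similar to Left-Right, fix by a double rotation.
Before: node a has left subtree A (height h) and right child b (height h + 2); b has left child c (height h + 1) and right subtree D (height h); c has left subtree B (height x) and right subtree C (height y).
After: c is the root of the subtree (height h + 2), with left child a (height h + 1, subtrees A and B) and right child b (height h + 1, subtrees C and D).
Note that x and y must each be h or h - 1. Furthermore, at least one of them must be h.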
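A sketch of the Right-Left fix as two single rotations: first a right rotation at b, then a left rotation at a. The helper names and the convention that an empty subtree has height 0 are assumptions made for this illustration.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        self.height = 1 + max(height(left), height(right))

def height(node):
    # Assumed convention: an empty subtree has height 0.
    return node.height if node else 0

def update(node):
    node.height = 1 + max(height(node.left), height(node.right))

def rotate_left(a):
    b = a.right
    a.right, b.left = b.left, a
    update(a)
    update(b)
    return b

def rotate_right(a):
    b = a.left
    a.left, b.right = b.right, a
    update(a)
    update(b)
    return b

def fix_right_left(a):
    """Double rotation: c (the left child of a's right child b) becomes the root."""
    a.right = rotate_right(a.right)   # turns the case into Right-Right
    return rotate_left(a)

# a = 10, b = 30, c = 20 (keys chosen for illustration).
root = Node(10, None, Node(30, Node(20), None))
root = fix_right_left(root)
print(root.key, root.left.key, root.right.key, root.height)
```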

Insertion
Let's see these in action with some concrete examples involving insertion and deletion. Suppose we start with an empty tree and add 10, and. Inserting yields:
[Figure: the tree after the insertion, with the subtree heights along the insertion path marked for recalculation.]
We first traverse from the root to a leaf and add a node with the new key. The heights of the subtrees along this path are now invalidated, so we traverse back up to the root and recalculate the height at each node. When we get to node 10, we find that we have an imbalance, in this case of type Right-Right.

We fix the imbalance via a rotation according to the Right-Right case:
[Figure: the tree before and after the rotation.]
One should check that at the end the tree is balanced and satisfies the binary search tree property!

Suppose we then add 25, and 50. Upon inserting 50, we will find the need for another Right-Right rebalancing:
[Figure: the tree after inserting 50, and the result of the Right-Right rotation.]

To get a Right-Left case, let's add 35, 33, 37, 60 and 38 (in this exact order). Upon inserting 38 we will find:
[Figure: the tree after inserting 38, with subtree heights, and the result of the Right-Left double rotation.]
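The insertion procedure described above (descend to a leaf, then walk back up recalculating heights and rebalancing) can be sketched recursively as follows. All names are illustrative, and an empty subtree is assumed to have height 0. The keys 20, 30 and 40 in the demo are assumptions chosen so that the sequence triggers two Right-Right cases and one Right-Left case; the other keys appear in the text.

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right, self.height = key, None, None, 1

def height(n):
    return n.height if n else 0

def update(n):
    n.height = 1 + max(height(n.left), height(n.right))

def rotate_left(a):
    b = a.right
    a.right, b.left = b.left, a
    update(a)
    update(b)
    return b

def rotate_right(a):
    b = a.left
    a.left, b.right = b.right, a
    update(a)
    update(b)
    return b

def rebalance(n):
    update(n)
    diff = height(n.left) - height(n.right)
    if diff == 2:
        if height(n.left.left) < height(n.left.right):    # Left-Right
            n.left = rotate_left(n.left)
        return rotate_right(n)                             # Left-Left
    if diff == -2:
        if height(n.right.right) < height(n.right.left):  # Right-Left
            n.right = rotate_right(n.right)
        return rotate_left(n)                              # Right-Right
    return n

def insert(n, key):
    if n is None:
        return Node(key)
    if key < n.key:
        n.left = insert(n.left, key)
    elif key > n.key:
        n.right = insert(n.right, key)
    return rebalance(n)   # runs on the way back up to the root

def inorder(n):
    return inorder(n.left) + [n.key] + inorder(n.right) if n else []

root = None
for key in [10, 20, 30, 25, 40, 50, 35, 33, 37, 60, 38]:
    root = insert(root, key)
print(root.key, root.left.key, root.right.key, root.height)
print(inorder(root))
```

The recursion makes the "traverse back up" step implicit: `rebalance` is applied to every node on the insertion path as the calls return.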

Deletion
Let's look at an example of a deletion as well. Suppose that some sequence of operations later we arrive at the following tree and want to delete a key:
[Figure: the tree with subtree heights, shown before the deletion, after removing the node, and after the first rebalance.]
After identifying and deleting the appropriate node (more on this later), we again need to traverse back up to the root and rebalance. Here we run into a Right-Left case.

... and we are still not done, because there is another rebalance we need to do, this time of type Right-Right:
[Figure: the tree before and after the Right-Right rotation at the root.]
Remember that at most one rebalance is needed on an insertion, but a deletion may require more than one!
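The following sketch shows a deletion that triggers two rebalances on the way back up to the root. It is an illustration under assumptions: the helper names are invented, an empty subtree has height 0, a node with two children is replaced by its in-order successor, and the keys 20, 30 and 40 (including the deleted key 20) are assumed for the demo tree.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        self.height = 1 + max(height(left), height(right))

def height(n):
    return n.height if n else 0

def update(n):
    n.height = 1 + max(height(n.left), height(n.right))

def rotate_left(a):
    b = a.right
    a.right, b.left = b.left, a
    update(a)
    update(b)
    return b

def rotate_right(a):
    b = a.left
    a.left, b.right = b.right, a
    update(a)
    update(b)
    return b

fixed = []  # keys of the nodes at which a rebalance was performed

def rebalance(n):
    update(n)
    diff = height(n.left) - height(n.right)
    if diff == 2:
        fixed.append(n.key)
        if height(n.left.left) < height(n.left.right):    # Left-Right
            n.left = rotate_left(n.left)
        return rotate_right(n)                             # Left-Left
    if diff == -2:
        fixed.append(n.key)
        if height(n.right.right) < height(n.right.left):  # Right-Left
            n.right = rotate_right(n.right)
        return rotate_left(n)                              # Right-Right
    return n

def delete(n, key):
    if n is None:
        return None
    if key < n.key:
        n.left = delete(n.left, key)
    elif key > n.key:
        n.right = delete(n.right, key)
    elif n.left is None:
        return n.right
    elif n.right is None:
        return n.left
    else:
        succ = n.right                 # in-order successor: leftmost node on the right
        while succ.left:
            succ = succ.left
        n.key = succ.key
        n.right = delete(n.right, succ.key)
    return rebalance(n)

def inorder(n):
    return inorder(n.left) + [n.key] + inorder(n.right) if n else []

root = Node(35,
            Node(30, Node(20), Node(33, Node(31))),
            Node(40, Node(37, Node(36)),
                 Node(50, Node(45, None, Node(47)), Node(60))))
root = delete(root, 20)
print(fixed)      # nodes where a rebalance happened, bottom-up
print(root.key)
```

In this run the first rebalance (at node 30) is a Right-Left case and the second (at node 35) is a Right-Right case, so `fixed` records two entries for a single deletion.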

Special Exercise 10 Problem 5
Problem: Let S be a dynamic set of integers, and n = |S|. Describe a data structure that supports the following operations on S with the required performance guarantees:
- Insert a new element into S in O(log n) time.
- Delete an element from S in O(log n) time.
- Report the k smallest elements of S in O(k) time, for any k satisfying 1 ≤ k ≤ n.
Your data structure must consume O(n) space at all times.

Special Exercise 10 Problem 5
Solution: We maintain a balanced binary search tree T over S, which consumes O(n) space at all times.
- For insertion, insert the new element into T, rebalance T, and identify the smallest element e in T (O(log n) time in total).
- For deletion, remove the element from T, rebalance T, and identify the smallest element e in T (O(log n) time in total).
- For reporting the k smallest elements, start from e, report it, and repeat the following until k elements have been reported: starting from the current node, find the node that stores the successor s of the current node, and report s.
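The reporting step can be sketched with parent pointers and a successor walk. This is a simplified illustration: rebalancing is omitted for brevity, so the code shows only the reporting logic and not the time bound, which relies on T being balanced. The names `insert`, `successor` and `k_smallest` are assumptions, and the keys are taken from the example that follows.

```python
class Node:
    def __init__(self, key, parent=None):
        self.key, self.parent = key, parent
        self.left = self.right = None

def insert(root, key):
    """Plain BST insertion with parent pointers (no rebalancing in this sketch)."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = Node(key, node)
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = Node(key, node)
                return root
            node = node.right

def minimum(node):
    while node.left is not None:
        node = node.left
    return node

def successor(node):
    if node.right is not None:
        return minimum(node.right)        # leftmost node of the right subtree
    while node.parent is not None and node is node.parent.right:
        node = node.parent                # climb while we are a right child
    return node.parent

def k_smallest(e, k):
    """Report up to k keys, starting from the node e that stores the minimum."""
    out = []
    node = e
    while node is not None and len(out) < k:
        out.append(node.key)
        node = successor(node)
    return out

root = None
for key in [15, 10, 35, 3, 12, 32, 73, 36, 60]:
    root = insert(root, key)
e = minimum(root)   # the solution keeps e up to date after every update
print(k_smallest(e, 5))
```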

Special Exercise 10 Problem 5
Example: Suppose that k = 5 and that, after a sequence of insertions and deletions, we have the following BST:
[Figure: a BST whose recoverable keys are 15, 10, 35, 73, 3, 12, 32, 36 and 60.]

Then we can report the 5 smallest elements as follows:
[Figure: six snapshots of the same BST, one per step of the traversal.]

Cost analysis
Lemma: During the process of traversing the k smallest elements, any node is visited at most twice.
Proof: Our algorithm traverses the k elements in a bottom-up manner. Therefore, there are only two cases for visiting a node u.
Case 1: u has a right child v. After visiting u, we have to visit its child node v, and then go back to visit u again. Hence, u is visited twice.
Case 2: u does not have a right child. After visiting u, we directly go up to visit the next node and never come back to u again. Hence, u is visited only once.

Visiting a node once takes O(1) time, and by the lemma each node is visited at most twice. Hence, the time for reporting the k smallest elements is bounded by 2k · O(1) = O(k).

BST with counter: dynamic version
In Tutorial 10, we introduced the range count problem and solved it by augmenting a balanced BST so that each node u stores the number of nodes in the subtree of u. For example:
[Figure: a BST with 11 nodes. The root 29 has cnt = 11, its children 12 and 47 have cnt = 4 and cnt = 6, and each leaf has cnt = 1.]

When we perform an insertion or deletion, we need to rebalance the BST and also update the counter values of some nodes. Suppose that we insert a new key into the following BST:
[Figure: the BST before and after the insertion. The counters along the insertion path are marked cnt = ?.]
The node with key 15 becomes imbalanced, and the counter values of the nodes along the path we traversed during the insertion are now invalidated.

We first rebalance the BST via a rotation (Right-Right case), and then recalculate the counter values of the aforementioned nodes.
[Figure: the BST after the rotation at node 15, which makes 18 the root of that subtree, and after the counters have been recalculated: the root 29 has cnt = 12, node 12 has cnt = 5, and node 18 has cnt = 3.]

Recall that an insertion or deletion essentially descends a root-to-leaf path, so only the counter values of the nodes on this path need to be updated. Therefore, we can update the counter values of these nodes in bottom-up order, following the same idea as the update of their subtree heights.
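A minimal sketch of the bottom-up counter update: the counter of each node on the insertion path is recomputed as the recursion returns, and a rotation recomputes the two nodes it rearranges. The names are illustrative, the insertion is a plain BST insertion, and the rotation is applied by hand at node 15 to mirror the example. The inserted key 20 is an assumption.

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right, self.cnt = key, None, None, 1

def cnt(n):
    return n.cnt if n else 0

def update(n):
    n.cnt = 1 + cnt(n.left) + cnt(n.right)

def insert(n, key):
    if n is None:
        return Node(key)
    if key < n.key:
        n.left = insert(n.left, key)
    else:
        n.right = insert(n.right, key)
    update(n)   # executed on the way back up, i.e. in bottom-up order
    return n

def rotate_left(a):
    b = a.right
    a.right, b.left = b.left, a
    update(a)   # a is now the child, so recompute it first
    update(b)
    return b

root = None
for key in [29, 12, 47, 3, 15, 18]:
    root = insert(root, key)
root = insert(root, 20)                         # path: 29 -> 12 -> 15 -> 18
root.left.right = rotate_left(root.left.right)  # Right-Right fix at node 15
sub = root.left.right
print(root.cnt, root.left.cnt, sub.key, sub.cnt)
```

The rotation does not change the size of the subtree it acts on, so the counters of the ancestors (here 12 and 29) stay valid after it.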

Count of Smaller Numbers Problem
Problem: Given an array A of n integers, describe an algorithm to produce a new array B such that B[i] (1 ≤ i ≤ n) is the number of elements in A that (i) are smaller than A[i], and (ii) are to the right of A[i].
Example:
A: 29 12 47 71 15 3 41 18 11 92 68
To the right of A[8] = 41, there are 2 smaller elements, so B[8] = 2. To the right of A[5] = 71, there are 6 smaller elements, so B[5] = 6. Hence, the output array B should be:
B: 5 2 6 4 6 2 0 2 1 0 1 0

Solution
Clearly, B[n] = 0. We maintain a balanced BST T with counters. At the beginning, T contains only A[n]. Then, for i = n - 1 down to 1, do the following:
- Perform a range count, i.e., find the number of integers in T that are in the range (-∞, A[i]]. Denote the result by c.
- Set B[i] = c.
- Insert A[i] into T.
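The steps above can be sketched with an AVL tree whose nodes store both a height and a counter. This is an illustration under assumptions: the names are invented, indices are 0-based, the elements of A are assumed to be distinct (so counting keys in (-∞, A[i]] equals counting strictly smaller keys), and the small input array is made up for the demo.

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None
        self.height, self.cnt = 1, 1

def height(n):
    return n.height if n else 0

def cnt(n):
    return n.cnt if n else 0

def update(n):
    n.height = 1 + max(height(n.left), height(n.right))
    n.cnt = 1 + cnt(n.left) + cnt(n.right)

def rotate_left(a):
    b = a.right
    a.right, b.left = b.left, a
    update(a)
    update(b)
    return b

def rotate_right(a):
    b = a.left
    a.left, b.right = b.right, a
    update(a)
    update(b)
    return b

def rebalance(n):
    update(n)
    diff = height(n.left) - height(n.right)
    if diff == 2:
        if height(n.left.left) < height(n.left.right):
            n.left = rotate_left(n.left)
        return rotate_right(n)
    if diff == -2:
        if height(n.right.right) < height(n.right.left):
            n.right = rotate_right(n.right)
        return rotate_left(n)
    return n

def insert(n, key):
    if n is None:
        return Node(key)
    if key < n.key:
        n.left = insert(n.left, key)
    else:
        n.right = insert(n.right, key)
    return rebalance(n)

def count_leq(n, q):
    """Range count: number of keys in (-inf, q] along one root-to-leaf path."""
    total = 0
    while n is not None:
        if n.key <= q:
            total += cnt(n.left) + 1   # n and its whole left subtree qualify
            n = n.right
        else:
            n = n.left
    return total

def count_smaller(A):
    B = [0] * len(A)
    root = None
    for i in range(len(A) - 1, -1, -1):   # right to left
        B[i] = count_leq(root, A[i])
        root = insert(root, A[i])
    return B

print(count_smaller([5, 2, 6, 1]))
```

Starting from an empty tree makes the last entry of B come out as 0 without a special case, which matches the observation that B[n] = 0.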

Example
A: 29 12 47 71 15 3 41 18 11 92 68 (n = 12)
Suppose that after a sequence of operations we have i = 5. Then our BST T, constructed on the elements from A[6] to A[12], should be:
[Figure: root 18 (cnt = 7) with children 11 (cnt = 3) and 68 (cnt = 3); the leaves 3, 15, 41 and 92 each have cnt = 1.]

Since A[5] = 71, we perform a range count to find the number of integers in the above BST that are in the range (-∞, 71]. This gives c = 6. Hence, B[5] = 6.

Cost Analysis
For each element A[i], we perform one range count and one insertion, each of which takes O(log n) time. We have n elements in total. Hence, the whole algorithm terminates in O(n log n) time.