Data Structures and Algorithms

Август 2, 2022

Главная
Информатика
Data Structures and Algorithms

Содержание

2. Recap Elementary data structures ADT Array based vs. linked implementation Worst case time complexity to help
3. Today’s Objectives What is a “MAP or Dictionary ADT”? What choices do we have to implement
4. Map or Dictionary
5. Map or Dictionary Models a searchable dynamic set of key-value entries Main operations are: searching, inserting,
6. The Map ADT get(k): if the map M has an entry with key k, return its
7. Example Operation Output Map isEmpty() true Ø put(5,A) null (5,A) put(7,B) null (5,A),(7,B) put(2,C) null (5,A),(7,B),(2,C)
8. A Simple List-Based Map We can implement a map using an unsorted list We store the
9. The get(k) Algorithm Algorithm get(k): while map.hasNext() do p = map.next() { the next element in
10. The put(k,v) Algorithm Algorithm put(k,v): while map.hasNext() do p = map.next() if p.element().getKey() = k then
11. The remove(k) Algorithm Algorithm remove(k): while map.hasNext() do p = map.next() if p.element().getKey() = k then
12. Performance of a List-Based Map Performance: put takes O(1) time since we can insert the new
13. Hash Map
14. Let’s Start With this Question How much time does it take to lookup an item in
15. Example Suppose you’re writing a program to access employee records for a company with 1000 employees.
16. Example (cont.) The easiest way to do this is by using an array (we already know
17. Example (cont.) The speed and simplicity of data access using this array-based database make it very
18. Example (cont.) But mostly, the keys are not so well behaved A simple example would be
19. What Did We Learn From The Example? Arrays are very fast when it comes to accessing
20. Hash Map Hash Table is a very practical way to maintain a map
21. Hash Table
22. Hash Table
23. Hash Function A hash function h maps keys of a given type to integers in a
24. Simple Hash Function for Integers
25. General Hash Functions A hash function is usually specified as the composition of two functions: Hash
26. Parts of a Hash Function © 2014 Goodrich, Tamassia, Goldwasser
27. Ideal Hash Function Every resulting hash value has exactly one input that will produce it Same
28. Some Principles If n items are placed in m buckets, and n is greater than m,
29. Collisions So collisions are inevitable Our goal should therefore be to minimize collisions We will achieve
30. Tip! Designing a hash function is a black art It is always better to use a
31. Hash Codes Memory address: We reinterpret the memory address of the key object as an integer
32. Hash Codes (cont.) Integer cast: We reinterpret the bits of the key as an integer Suitable
33. Hash Codes (cont.) Component sum: We partition the bits of the key into components of fixed
34. Hash Codes (cont.) We partition the bits of the key into a sequence of components of
35. Compression Functions Division: h2 (y) = y mod N The size N of the hash table
36. Compression Functions Multiply, Add and Divide (MAD) h2 (y) = [(ay + b) mod p] mod
37. Collision Handling
38. Collision Handling Separate Chaining: let each cell in the table point to a linked list of
39. Analysis of get(k) in Separate Chaining
40. Analysis of get(k) in Separate Chaining
41. Collision Handling Open Addressing: the colliding item is placed in a different cell of the table
42. Example Example: Linear probing h(x) = x mod 13 Insert keys 18, 41, 22, 44, 59,
43. Example Example: Linear probing h(x) = x mod 13 Insert keys 18, 41, 22, 44, 59,
44. Search with Linear Probing Consider a hash table A that uses linear probing get(k) We start
45. Updates with Linear Probing To handle insertions and deletions, we introduce a special object, called DEFUNCT,
46. Updates with Linear Probing put(k, o) We throw an exception if the table is full We
47. Collision Handling Open Addressing: the colliding item is placed in a different cell of the table
48. Double Hashing The secondary hash function cannot have zero values The table size N must be
49. Double Hashing Common choice of compression function for the secondary hash function: d(k) = q −
50. Example Consider a hash table storing integer keys that handles collision with double hashing N =
51. Example Consider a hash table storing integer keys that handles collision with double hashing N =
52. Analysis of get(k) in Open Addressing
54. Скачать презентацию

Слайд 2

Recap
Elementary data structures
ADT
Array based vs. linked implementation
Worst case time complexity to

help us choose based on our needs

Слайд 3

Today’s Objectives
What is a “MAP or Dictionary ADT”?
What choices do we

have to implement a MAP?
What is a hash function and a hash table?
What is collision and how to handle it?
How to analyze time complexity of a Hash Map?

Слайд 4

Map or Dictionary

Слайд 5

Map or Dictionary
Models a searchable dynamic set of key-value entries
Main operations

are: searching, inserting, and deleting items
Applications:
Compiler symbol table
A news indexing service

Слайд 6

The Map ADT
get(k): if the map M has an entry with

key k, return its associated value; else, return null
put(k, v): insert entry (k, v) into the map M; if key k is not already in M, then return null; else, return old value associated with k
remove(k): if the map M has an entry with key k, remove it from M and return its associated value; else, return null
size(), isEmpty()
entrySet(): return an iterable collection of the entries in M
keySet(): return an iterable collection of the keys in M
values(): return an iterator of the values in M

Слайд 7

Example
Operation Output Map
isEmpty() true Ø
put(5,A) null (5,A)
put(7,B) null (5,A),(7,B)
put(2,C) null (5,A),(7,B),(2,C)
put(8,D) null (5,A),(7,B),(2,C),(8,D)
put(2,E) C (5,A),(7,B),(2,E),(8,D)
get(7) B (5,A),(7,B),(2,E),(8,D)
get(4) null (5,A),(7,B),(2,E),(8,D)
get(2) E (5,A),(7,B),(2,E),(8,D)
size() 4 (5,A),(7,B),(2,E),(8,D)
remove(5) A (7,B),(2,E),(8,D)
remove(2) E (7,B),(8,D)
get(2) null (7,B),(8,D)
isEmpty() false (7,B),(8,D)
© 2014 Goodrich, Tamassia, Goldwasser

Слайд 8

A Simple List-Based Map
We can implement a map using an unsorted

list
We store the items of the map in a list S (based on a doublylinked list), in arbitrary order

Слайд 9

The get(k) Algorithm
Algorithm get(k):
while map.hasNext() do
p = map.next() { the next

element in the map}
if p.element().getKey() = k then
return p.element().getValue()
return null {there is no entry with key equal to k}

Слайд 10

The put(k,v) Algorithm
Algorithm put(k,v):
while map.hasNext() do
p = map.next()
if p.element().getKey() = k

then
t = p.element().getValue()
map.set(p,(k,v))
return t {return the old value}
map.addLast((k,v))
n = n + 1 {increment variable storing number of entries}
return null { there was no entry with key equal to k }

Слайд 11

The remove(k) Algorithm
Algorithm remove(k):
while map.hasNext() do
p = map.next()
if p.element().getKey() = k

then
t = p.element().getValue()
map.remove(p)
n = n – 1 {decrement number of entries}
return t {return the removed value}
return null {there is no entry with key equal to k}

Слайд 12

Performance of a List-Based Map
Performance:
put takes O(1) time since we can

insert the new item at the beginning or at the end of the sequence
get and remove take O(n) time since in the worst case (the item is not found) we traverse the entire sequence to look for an item with the given key
The unsorted list implementation is effective only for maps of small size or for maps in which puts are the most common operations, while searches and removals are rarely performed (e.g., historical record of logins to a workstation)

Слайд 13

Hash Map

Слайд 14

Let’s Start With this Question
How much time does it take to

lookup an item in an array, if you already know its index?

Слайд 15

Example
Suppose you’re writing a program to access employee records for a

company with 1000 employees.
Goal: fastest possible access to any individual record
Each employee has a number from 1(founder) to 1000 (the most recent worker)
Employees are seldom laid off, and even when they are, their record stays in the database.

Слайд 16

Example (cont.)
The easiest way to do this is by using an

array (we already know the size)
Each employee record occupies one cell of the array
The index number of the cell is the employee number
empRecord rec = databaseArray[72];
databaseArray[totalEmployees++] = newRecord;

Слайд 17

Example (cont.)
The speed and simplicity of data access using this array-based

database make it very attractive.
However, it works in our example only because keys are well organized
Sequentially from 1 to a known maximum
No deletions required
New items can be added sequentially at the end

Слайд 18

Example (cont.)
But mostly, the keys are not so well behaved
A simple

example would be when keys are of type String.
Array indexing requires integer
One more problem: Even when using integers, the value could be outside of the range of the array

Слайд 19

What Did We Learn From The Example?
Arrays are very fast when

it comes to accessing an item based on its index
But “key” ? “index” mapping only works when
keys are integers, and
are within the bound, and
are not changed

Слайд 20

Hash Map
Hash Table is a very practical way to maintain a

map

Слайд 21

Hash Table

Слайд 22

Hash Table

Слайд 23

Hash Function
A hash function h maps keys of a given type

to integers in a fixed interval [0, N − 1]
Example: h(x) = x mod N is a hash function for integer keys
The integer h(x) is called the hash value of key x

Слайд 24

Simple Hash Function for Integers

Слайд 25

General Hash Functions
A hash function is usually specified as the composition

of two functions:
Hash code: h1: keys → integers
Compression function: h2: integers → [0, N → 1]

The hash code is applied first, and the compression function is applied next on the result, i.e., h(x) = h2(h1(x))
The goal of the hash function is to “disperse” the keys in a uniform manner

Слайд 26

Parts of a Hash Function
© 2014 Goodrich, Tamassia, Goldwasser

Слайд 27

Ideal Hash Function
Every resulting hash value has exactly one input that

will produce it
Same key hashes to the same index (repeatable)
Hash value is widely different if even a single bit is different in the key (avalanche)
Should work in general (for different types)

Слайд 28

Some Principles
If n items are placed in m buckets, and n

is greater than m, one or more buckets contain two or more items (Pigeonhole Principle)
This is called collision (two keys hash to the same index)
Birthday paradox

https://en.wikipedia.org/wiki/Birthday_problem

Слайд 29

Collisions
So collisions are inevitable
Our goal should therefore be to minimize collisions
We

will achieve it through:
Generating better hash codes
Performing better compression
Handling collisions

Слайд 30

Tip!
Designing a hash function is a black art
It is always better

to use a known good algorithm
But sometimes, as a student, it is better to try to design one for the sake of practice

Слайд 31

Hash Codes
Memory address:
We reinterpret the memory address of the key object

as an integer (default hash code of all Java objects)
Good in general, except that it is not repeatable

Слайд 32

Hash Codes (cont.)
Integer cast:
We reinterpret the bits of the key as

an integer
Suitable for keys of length less than or equal to the number of bits of the integer type (e.g., byte, short, int and float in Java)

Слайд 33

Hash Codes (cont.)
Component sum:
We partition the bits of the key into

components of fixed length (e.g., 16 or 32 bits) and we sum the components
Fails to treat permutations differently (“abc”, “cba”, “cab”)

Слайд 34

Hash Codes (cont.)
We partition the bits of the key into a

sequence of components of fixed length (e.g., 8, 16 or 32 bits)
a0 a1 … an−1
We evaluate the polynomial
p(z) = a0 + a1 z + a2 z2 + … … + an−1zn−1
at a fixed value z
Especially suitable for strings (e.g., the choice z = 33 gives at most 6 collisions on a set of 50,000 English words)

Polynomial accumulation:

Слайд 35

Compression Functions
Division:
h2 (y) = y mod N
The size N of the

hash table is usually chosen to be a prime
Helps “spread out” the distribution of hashed values
Try inset keys with hash codes {200, 205, 210, 215, …, 600} into a table size of 100 vs. 101

Слайд 36

Compression Functions
Multiply, Add and Divide (MAD)
h2 (y) = [(ay + b)

mod p] mod N
p is a prime number larger than N
a and b are integers from the interval [0, p – 1], with a > 0

Слайд 37

Collision Handling

Слайд 38

Collision Handling
Separate Chaining: let each cell in the table point to

a linked list of entries that map there
Separate chaining is simple, but requires additional memory outside the table

Слайд 39

Analysis of get(k) in Separate Chaining

Слайд 40

Analysis of get(k) in Separate Chaining

Слайд 41

Collision Handling
Open Addressing: the colliding item is placed in a different

cell of the table
Linear Probing: handles collision by placing the item in the next (circularly) available cell
Each cell inspected is called a probe
Colliding items lump together, causing future collisions to cause a longer sequence of probes

Слайд 42

Example
Example:
Linear probing
h(x) = x mod 13
Insert keys 18, 41, 22, 44,

59, 32, 31, 73, in this order

Слайд 43

Example
Example:
Linear probing
h(x) = x mod 13
Insert keys 18, 41, 22, 44,

59, 32, 31, 73, in this order

Слайд 44

Search with Linear Probing
Consider a hash table A that uses linear

probing
get(k)
We start at cell h(k)
We probe consecutive locations until one of the following occurs
An item with key k is found, or
An empty cell is found, or
N cells have been unsuccessfully probed

Algorithm get(k)
i ← h(k)
p ← 0
repeat
c ← A[i]
if c = ∅
return null
else if c.getKey () = k
return c.getValue()
else
i ← (i + 1) mod N
p ← p + 1
until p = N
return null

Слайд 45

Updates with Linear Probing
To handle insertions and deletions, we introduce a

special object, called DEFUNCT, which replaces deleted elements
remove(k)
We search for an entry with key k
If such an entry (k, o) is found, we replace it with the special item DEFUNCT and we return element o
Else, we return null

Слайд 46

Updates with Linear Probing
put(k, o)
We throw an exception if the table

is full
We start at cell h(k)
We probe consecutive cells until the following occurs
A cell i is found that is either empty or stores DEFUNCT, or
We store (k, o) in cell i

Слайд 47

Collision Handling
Open Addressing: the colliding item is placed in a different

cell of the table
B. Double Hashing: uses a secondary hash function d(k) and handles collision by placing an items in the first available of cell of the series

(h(k) + jd(k)) mod N
for j = 1, … , N − 1

Слайд 48