Data Structures and Algorithms

Содержание

Слайд 2

Recap Elementary data structures ADT Array based vs. linked implementation Worst

Recap

Elementary data structures
ADT
Array based vs. linked implementation
Worst case time complexity to

help us choose based on our needs
Слайд 3

Today’s Objectives What is a “MAP or Dictionary ADT”? What choices

Today’s Objectives

What is a “MAP or Dictionary ADT”?
What choices do we

have to implement a MAP?
What is a hash function and a hash table?
What is collision and how to handle it?
How to analyze time complexity of a Hash Map?
Слайд 4

Map or Dictionary

Map or Dictionary

Слайд 5

Map or Dictionary Models a searchable dynamic set of key-value entries

Map or Dictionary

Models a searchable dynamic set of key-value entries
Main operations

are: searching, inserting, and deleting items
Applications:
Compiler symbol table
A news indexing service
Слайд 6

The Map ADT get(k): if the map M has an entry

The Map ADT

get(k): if the map M has an entry with

key k, return its associated value; else, return null
put(k, v): insert entry (k, v) into the map M; if key k is not already in M, then return null; else, return old value associated with k
remove(k): if the map M has an entry with key k, remove it from M and return its associated value; else, return null
size(), isEmpty()
entrySet(): return an iterable collection of the entries in M
keySet(): return an iterable collection of the keys in M
values(): return an iterator of the values in M
Слайд 7

Example Operation Output Map isEmpty() true Ø put(5,A) null (5,A) put(7,B)

Example

Operation Output Map
isEmpty() true Ø
put(5,A) null (5,A)
put(7,B) null (5,A),(7,B)
put(2,C) null (5,A),(7,B),(2,C)
put(8,D) null (5,A),(7,B),(2,C),(8,D)
put(2,E) C (5,A),(7,B),(2,E),(8,D)
get(7) B (5,A),(7,B),(2,E),(8,D)
get(4) null (5,A),(7,B),(2,E),(8,D)
get(2) E (5,A),(7,B),(2,E),(8,D)
size() 4 (5,A),(7,B),(2,E),(8,D)
remove(5) A (7,B),(2,E),(8,D)
remove(2) E (7,B),(8,D)
get(2) null (7,B),(8,D)
isEmpty() false (7,B),(8,D)

© 2014 Goodrich, Tamassia, Goldwasser

Слайд 8

A Simple List-Based Map We can implement a map using an

A Simple List-Based Map

We can implement a map using an unsorted

list
We store the items of the map in a list S (based on a doublylinked list), in arbitrary order

© 2014 Goodrich, Tamassia, Goldwasser

Слайд 9

The get(k) Algorithm Algorithm get(k): while map.hasNext() do p = map.next()

The get(k) Algorithm

Algorithm get(k):
while map.hasNext() do
p = map.next() { the next

element in the map}
if p.element().getKey() = k then
return p.element().getValue()
return null {there is no entry with key equal to k}
Слайд 10

The put(k,v) Algorithm Algorithm put(k,v): while map.hasNext() do p = map.next()

The put(k,v) Algorithm

Algorithm put(k,v):
while map.hasNext() do
p = map.next()
if p.element().getKey() = k

then
t = p.element().getValue()
map.set(p,(k,v))
return t {return the old value}
map.addLast((k,v))
n = n + 1 {increment variable storing number of entries}
return null { there was no entry with key equal to k }
Слайд 11

The remove(k) Algorithm Algorithm remove(k): while map.hasNext() do p = map.next()

The remove(k) Algorithm

Algorithm remove(k):
while map.hasNext() do
p = map.next()
if p.element().getKey() = k

then
t = p.element().getValue()
map.remove(p)
n = n – 1 {decrement number of entries}
return t {return the removed value}
return null {there is no entry with key equal to k}
Слайд 12

Performance of a List-Based Map Performance: put takes O(1) time since

Performance of a List-Based Map

Performance:
put takes O(1) time since we can

insert the new item at the beginning or at the end of the sequence
get and remove take O(n) time since in the worst case (the item is not found) we traverse the entire sequence to look for an item with the given key
The unsorted list implementation is effective only for maps of small size or for maps in which puts are the most common operations, while searches and removals are rarely performed (e.g., historical record of logins to a workstation)
Слайд 13

Hash Map

Hash Map

Слайд 14

Let’s Start With this Question How much time does it take

Let’s Start With this Question

How much time does it take to

lookup an item in an array, if you already know its index?
Слайд 15

Example Suppose you’re writing a program to access employee records for

Example

Suppose you’re writing a program to access employee records for a

company with 1000 employees.
Goal: fastest possible access to any individual record
Each employee has a number from 1(founder) to 1000 (the most recent worker)
Employees are seldom laid off, and even when they are, their record stays in the database.
Слайд 16

Example (cont.) The easiest way to do this is by using

Example (cont.)

The easiest way to do this is by using an

array (we already know the size)
Each employee record occupies one cell of the array
The index number of the cell is the employee number
empRecord rec = databaseArray[72];
databaseArray[totalEmployees++] = newRecord;
Слайд 17

Example (cont.) The speed and simplicity of data access using this

Example (cont.)

The speed and simplicity of data access using this array-based

database make it very attractive.
However, it works in our example only because keys are well organized
Sequentially from 1 to a known maximum
No deletions required
New items can be added sequentially at the end
Слайд 18

Example (cont.) But mostly, the keys are not so well behaved

Example (cont.)

But mostly, the keys are not so well behaved
A simple

example would be when keys are of type String.
Array indexing requires integer
One more problem: Even when using integers, the value could be outside of the range of the array
Слайд 19

What Did We Learn From The Example? Arrays are very fast

What Did We Learn From The Example?

Arrays are very fast when

it comes to accessing an item based on its index
But “key” ? “index” mapping only works when
keys are integers, and
are within the bound, and
are not changed
Слайд 20

Hash Map Hash Table is a very practical way to maintain a map

Hash Map

Hash Table is a very practical way to maintain a

map
Слайд 21

Hash Table

Hash Table

 

Слайд 22

Hash Table

Hash Table

 

Слайд 23

Hash Function A hash function h maps keys of a given

Hash Function

A hash function h maps keys of a given type

to integers in a fixed interval [0, N − 1]
Example: h(x) = x mod N is a hash function for integer keys
The integer h(x) is called the hash value of key x
Слайд 24

Simple Hash Function for Integers

Simple Hash Function for Integers

Слайд 25

General Hash Functions A hash function is usually specified as the

General Hash Functions

A hash function is usually specified as the composition

of two functions:
Hash code: h1: keys → integers
Compression function: h2: integers → [0, N → 1]

The hash code is applied first, and the compression function is applied next on the result, i.e., h(x) = h2(h1(x))
The goal of the hash function is to “disperse” the keys in a uniform manner

Слайд 26

Parts of a Hash Function © 2014 Goodrich, Tamassia, Goldwasser

Parts of a Hash Function

© 2014 Goodrich, Tamassia, Goldwasser

Слайд 27

Ideal Hash Function Every resulting hash value has exactly one input

Ideal Hash Function

Every resulting hash value has exactly one input that

will produce it
Same key hashes to the same index (repeatable)
Hash value is widely different if even a single bit is different in the key (avalanche)
Should work in general (for different types)
Слайд 28

Some Principles If n items are placed in m buckets, and

Some Principles

If n items are placed in m buckets, and n

is greater than m, one or more buckets contain two or more items (Pigeonhole Principle)
This is called collision (two keys hash to the same index)
Birthday paradox

https://en.wikipedia.org/wiki/Birthday_problem

Слайд 29

Collisions So collisions are inevitable Our goal should therefore be to

Collisions

So collisions are inevitable
Our goal should therefore be to minimize collisions
We

will achieve it through:
Generating better hash codes
Performing better compression
Handling collisions
Слайд 30

Tip! Designing a hash function is a black art It is

Tip!

Designing a hash function is a black art
It is always better

to use a known good algorithm
But sometimes, as a student, it is better to try to design one for the sake of practice
Слайд 31

Hash Codes Memory address: We reinterpret the memory address of the

Hash Codes

Memory address:
We reinterpret the memory address of the key object

as an integer (default hash code of all Java objects)
Good in general, except that it is not repeatable

© 2014 Goodrich, Tamassia, Goldwasser

Слайд 32

Hash Codes (cont.) Integer cast: We reinterpret the bits of the

Hash Codes (cont.)

Integer cast:
We reinterpret the bits of the key as

an integer
Suitable for keys of length less than or equal to the number of bits of the integer type (e.g., byte, short, int and float in Java)

© 2014 Goodrich, Tamassia, Goldwasser

Слайд 33

Hash Codes (cont.) Component sum: We partition the bits of the

Hash Codes (cont.)

Component sum:
We partition the bits of the key into

components of fixed length (e.g., 16 or 32 bits) and we sum the components
Fails to treat permutations differently (“abc”, “cba”, “cab”)

© 2014 Goodrich, Tamassia, Goldwasser

Слайд 34

Hash Codes (cont.) We partition the bits of the key into

Hash Codes (cont.)
We partition the bits of the key into a

sequence of components of fixed length (e.g., 8, 16 or 32 bits)
a0 a1 … an−1
We evaluate the polynomial
p(z) = a0 + a1 z + a2 z2 + … … + an−1zn−1
at a fixed value z
Especially suitable for strings (e.g., the choice z = 33 gives at most 6 collisions on a set of 50,000 English words)

© 2014 Goodrich, Tamassia, Goldwasser

Polynomial accumulation:

Слайд 35

Compression Functions Division: h2 (y) = y mod N The size

Compression Functions

Division:
h2 (y) = y mod N
The size N of the

hash table is usually chosen to be a prime
Helps “spread out” the distribution of hashed values
Try inset keys with hash codes {200, 205, 210, 215, …, 600} into a table size of 100 vs. 101
Слайд 36

Compression Functions Multiply, Add and Divide (MAD) h2 (y) = [(ay

Compression Functions

Multiply, Add and Divide (MAD)
h2 (y) = [(ay + b)

mod p] mod N
p is a prime number larger than N
a and b are integers from the interval [0, p – 1], with a > 0
Слайд 37

Collision Handling

Collision Handling

 

Слайд 38

Collision Handling Separate Chaining: let each cell in the table point

Collision Handling

Separate Chaining: let each cell in the table point to

a linked list of entries that map there
Separate chaining is simple, but requires additional memory outside the table
Слайд 39

Analysis of get(k) in Separate Chaining

Analysis of get(k) in Separate Chaining

 

Слайд 40

Analysis of get(k) in Separate Chaining

 

Analysis of get(k) in Separate Chaining

Слайд 41

Collision Handling Open Addressing: the colliding item is placed in a

Collision Handling

Open Addressing: the colliding item is placed in a different

cell of the table
Linear Probing: handles collision by placing the item in the next (circularly) available cell
Each cell inspected is called a probe
Colliding items lump together, causing future collisions to cause a longer sequence of probes
Слайд 42

Example Example: Linear probing h(x) = x mod 13 Insert keys

Example

Example:
Linear probing
h(x) = x mod 13
Insert keys 18, 41, 22, 44,

59, 32, 31, 73, in this order
Слайд 43

Example Example: Linear probing h(x) = x mod 13 Insert keys

Example

Example:
Linear probing
h(x) = x mod 13
Insert keys 18, 41, 22, 44,

59, 32, 31, 73, in this order
Слайд 44

Search with Linear Probing Consider a hash table A that uses

Search with Linear Probing

Consider a hash table A that uses linear

probing
get(k)
We start at cell h(k)
We probe consecutive locations until one of the following occurs
An item with key k is found, or
An empty cell is found, or
N cells have been unsuccessfully probed

Algorithm get(k)
i ← h(k)
p ← 0
repeat
c ← A[i]
if c = ∅
return null
else if c.getKey () = k
return c.getValue()
else
i ← (i + 1) mod N
p ← p + 1
until p = N
return null

Слайд 45

Updates with Linear Probing To handle insertions and deletions, we introduce

Updates with Linear Probing

To handle insertions and deletions, we introduce a

special object, called DEFUNCT, which replaces deleted elements
remove(k)
We search for an entry with key k
If such an entry (k, o) is found, we replace it with the special item DEFUNCT and we return element o
Else, we return null
Слайд 46

Updates with Linear Probing put(k, o) We throw an exception if

Updates with Linear Probing

put(k, o)
We throw an exception if the table

is full
We start at cell h(k)
We probe consecutive cells until the following occurs
A cell i is found that is either empty or stores DEFUNCT, or
We store (k, o) in cell i
Слайд 47

Collision Handling Open Addressing: the colliding item is placed in a

Collision Handling

Open Addressing: the colliding item is placed in a different

cell of the table
B. Double Hashing: uses a secondary hash function d(k) and handles collision by placing an items in the first available of cell of the series

(h(k) + jd(k)) mod N
for j = 1, … , N − 1

Слайд 48

Double Hashing The secondary hash function cannot have zero values The

Double Hashing

The secondary hash function cannot have zero values
The table size

N must be prime to allow probing of all the cells.
Слайд 49

Double Hashing Common choice of compression function for the secondary hash

Double Hashing

Common choice of compression function for the secondary hash function:


d(k) = q − (k mod q)
where
q < N
q is a prime
The possible values for d(k) are 1, 2, … , q

Слайд 50

Example Consider a hash table storing integer keys that handles collision

Example

Consider a hash table storing integer keys that handles collision with

double hashing
N = 13
h(k) = k mod 13
d(k) = 7 − k mod 7
Insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order
Слайд 51

Example Consider a hash table storing integer keys that handles collision

Example

Consider a hash table storing integer keys that handles collision with

double hashing
N = 13
h(k) = k mod 13
d(k) = 7 − k mod 7
Insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order
Слайд 52

Analysis of get(k) in Open Addressing

 

Analysis of get(k) in Open Addressing