DSA Tutorial
🔍

Hash Table Time & Space Complexity

The Complete Hash Table Operation Complexity Table

Every hash table operation has two complexities — average and worst case. The gap between them is what makes hashing unique among data structures.

OperationAverage CaseWorst CaseWorst Case Trigger
Insert (put / set / [])O(1)O(n)All keys collide into one chain
Lookup (get / has / count)O(1)O(n)All keys collide into one chain
Delete (remove / erase)O(1)O(n)All keys collide into one chain
Size (size / len)O(1)O(1)Stored as a field
Iterate all entriesO(n)O(n)Must visit every entry
Rehash (resize)O(n)O(n)Triggered by load factor
Build from n elementsO(n)O(n²)All keys collide during build
Set union (n + m elements)O(n + m)O(n + m)Insert all elements of both
Set intersectionO(min(n, m))O(n × m)Each check is a lookup
Set differenceO(n)O(n × m)Each check is a lookup

Why O(1) Average — The Statistical Guarantee

Hash tables achieve O(1) average by spreading keys uniformly across the table. With n entries in a table of capacity c, the expected chain length at any slot is:

Expected chain length = n / c = load factor α

With α = 0.75 (Java HashMap default):
  Expected comparisons per lookup = 1 + α/2 ≈ 1.375
  That rounds to O(1) — a fixed constant regardless of n

With α = 0.5:
  Expected comparisons per lookup ≈ 1.25 — faster

With α = 2.0 (twice as many entries as slots):
  Expected comparisons per lookup ≈ 2 — still O(1), but slower constant

The "O(1)" is not magic — it is the consequence of maintaining a low load factor. The load factor is kept below a threshold by rehashing when it gets too high. Rehashing is O(n) but happens exponentially less often as the table grows, giving O(1) amortized per insertion.

Why O(n) Worst Case — The Collision Catastrophe

The worst case occurs when all n keys hash to the same slot. Every lookup must scan the entire chain of length n.

Table size = 7, hash(k) = k % 7

Insert keys: 0, 7, 14, 21, 28, 35, 42
  hash(0)  = 0
  hash(7)  = 0
  hash(14) = 0
  hash(21) = 0
  hash(28) = 0
  hash(35) = 0
  hash(42) = 0

All 7 keys land at slot 0. Lookup for key 42:
  Scan slot 0 chain: 0 → 7 → 14 → 21 → 28 → 35 → 42
  7 comparisons → O(n) for a table of n=7 entries

This is called a hash-flooding attack when done intentionally. An attacker who knows the hash function can craft keys that always collide, degrading a web server's hash table from O(1) to O(n) per request.

Java 8 defends against this by switching from linked lists to red-black trees when a chain exceeds 8 entries — capping worst-case lookup at O(log n) per bucket.

Python 3.3+ defends with hash randomization — the hash function seed changes with each process startup, so an attacker cannot predict which keys will collide.

Load Factor and Its Effect on Performance

The load factor α governs the tradeoff between time and space in a hash table.

Load factor α = number of entries / table capacity

α → 0:   Very sparse table
  Lots of empty slots, very few collisions
  Fast lookups, wastes memory
  
α = 0.5:  Half full
  Good balance — fast and reasonably compact
  
α = 0.75: Java HashMap threshold
  Well-studied sweet spot for chaining
  Forces resize when reached
  
α = 1.0:  Full (for chaining)
  Every slot has one chain entry on average
  Acceptable but collisions increasing
  Open addressing MUST resize before this

α > 1.0:  Overloaded (chaining only — open addressing cannot exceed 1.0)
  Multiple entries per slot
  Performance degrades linearly with α

Performance by load factor (chaining, successful lookup):
  α = 0.5  → ~1.25 comparisons
  α = 0.75 → ~1.38 comparisons
  α = 1.0  → ~1.50 comparisons
  α = 2.0  → ~2.00 comparisons
  α = 5.0  → ~3.50 comparisons
  
These are all O(1) — the constant grows, but not with n.

Amortized O(1) Insert — The Rehashing Math

Each individual insert is O(1) except when it triggers a rehash, which is O(n). Why is the amortized cost still O(1)?

Doubling strategy: capacity doubles on every resize

Rehash history for n = 1000 insertions (starting capacity 16):
  Rehash at entry 13  → copy 12  elements
  Rehash at entry 25  → copy 24  elements
  Rehash at entry 49  → copy 48  elements
  Rehash at entry 97  → copy 96  elements
  Rehash at entry 193 → copy 192 elements
  Rehash at entry 385 → copy 384 elements
  Rehash at entry 769 → copy 768 elements

Total elements copied across all rehashes:
  12 + 24 + 48 + 96 + 192 + 384 + 768 = 1524

For n = 1000 insertions: 1524 extra copies ≈ 1.5n

Total work: n insertions + 1.5n copies = 2.5n = O(n)
Amortized per insertion: O(n) / n = O(1)

This geometric series argument — copies are 1 + 2 + 4 + ... + n/2 ≈ n — is identical to the dynamic array amortized analysis. Every expensive rehash is "pre-paid" by the n/2 cheap insertions that preceded it.

Space Complexity Reference

StructureSpace UsedNotes
HashMap with n entriesO(n)n key-value pairs + hash table array
HashSet with n entriesO(n)n keys + hash table array
Frequency array (int[26])O(1)Fixed size regardless of n
Frequency array (int[128])O(1)Fixed size for ASCII
Frequency map of n distinct keysO(n)n entries
Prefix sum map for array of nO(n)Up to n distinct prefix sums
Two HashMaps for bijectionO(n)Two maps, combined O(n)

Space Inside Each Entry

Java HashMap<String, Integer> per entry:
  Key object reference:   8 bytes (pointer)
  Value object reference: 8 bytes (pointer)
  Key String object:      ~40 + 2×length bytes (header + UTF-16 chars)
  Value Integer object:   ~16 bytes (header + int)
  Chain node overhead:    ~24 bytes (next pointer + hash code + header)
  Per entry total:        ~96 + 2×keyLength bytes

Python dict per entry (CPython):
  key object reference:   8 bytes
  value object reference: 8 bytes
  hash code cache:        8 bytes
  Per entry total:        ~24 bytes (plus object overhead for key/value)

C++ unordered_map<string, int> per entry:
  Key string (SSO inline): ~32 bytes for short strings
  Value int:                4 bytes
  Linked list node:        ~8 bytes (next pointer)
  Per entry total:         ~44 bytes + key length for long keys

JavaScript Map per entry:
  V8 manages internally — approximately 40-80 bytes per entry

Algorithm Complexity Using Hash Tables

When you use a hash map or hash set inside a larger algorithm, the hash operations contribute to the total complexity.

AlgorithmTimeSpaceHash Table Role
Two SumO(n)O(n)Store seen values for O(1) complement lookup
Contains DuplicateO(n)O(n)Store seen elements, O(1) existence check
Longest Consecutive SequenceO(n)O(n)O(1) neighbor existence check
Group AnagramsO(n × k)O(n × k)Frequency key → group lookup in O(1)
Top K Frequent ElementsO(n)O(n)Frequency build O(n) + bucket sort O(n)
Subarray Sum Equals KO(n)O(n)Prefix sum count in O(1) per step
Valid AnagramO(n)O(1)Fixed 26-array, O(1) comparison
Isomorphic StringsO(n)O(1)Two maps, bounded alphabet
LRU Cache (get + put)O(1) eachO(capacity)HashMap for O(1) key lookup
Word Frequency CountO(n)O(k)k distinct words, each O(1) insert

Hash Table vs Other Structures — When to Choose What

Understanding complexity differences guides the right structure choice.

Query type: "Does this element exist?"
  Array (unsorted):   O(n)    — linear scan
  Array (sorted):     O(log n) — binary search
  HashSet:            O(1)    — use HashSet

Query type: "What value is associated with this key?"
  Array of pairs:     O(n)    — linear scan
  Sorted array:       O(log n) — binary search for key
  HashMap:            O(1)    — use HashMap

Query type: "What is the minimum/maximum key?"
  HashMap/HashSet:    O(n)    — must scan all
  TreeMap/TreeSet:    O(log n) — leftmost/rightmost in BST
  → Use tree if min/max queries needed

Query type: "Give me all keys in sorted order"
  HashMap:            O(n log n) — collect and sort
  TreeMap:            O(n)    — in-order traversal
  → Use TreeMap if ordering needed

Query type: "All keys between x and y?"
  HashMap:            O(n)    — scan and filter
  TreeMap:            O(log n + k) — range query, k = result count
  → Use TreeMap for range queries

Reading Constraints to Predict Whether Hashing Applies

n ≤ 100:
  Any approach works — O(n²) brute force acceptable
  Hashing is fine but not required

n ≤ 10,000:
  O(n²) may time out on slow judges
  Hashing converts O(n²) lookup loops to O(n) — usually required

n ≤ 100,000 (10^5):
  O(n²) will time out
  O(n log n) or O(n) required
  HashMap/HashSet gives O(n) lookup, set operations O(n)

n ≤ 1,000,000 (10^6):
  Must be O(n) or O(n log n)
  Frequency arrays preferred over hash maps when possible (O(1) space)
  Hash maps for unbounded or large-range values

Memory signals:
  "lowercase letters only" → int[26], O(1) space, O(n) build
  "ASCII printable"        → int[128], O(1) space, O(n) build
  "arbitrary integers"     → HashMap, O(n) space
  "arbitrary strings"      → HashMap, O(n × avg_length) space

The O(1) vs O(n) Space Tradeoff

Every hash-based optimization trades space for time. Three canonical tradeoffs:

Tradeoff 1 — Existence Check:
  Brute force: scan array for each element → O(n) per check
  HashSet:     build once in O(n), check in O(1)
  Cost: O(n) space for the set
  Gain: O(n) per check → O(1) per check

Tradeoff 2 — Lookup by Value:
  Brute force: scan for matching value → O(n) per lookup
  HashMap:     build value→index map in O(n), lookup in O(1)
  Cost: O(n) space for the map
  Gain: O(n) per lookup → O(1) per lookup

Tradeoff 3 — Frequency Precomputation:
  Brute force: count element occurrences each time → O(n) per count
  Frequency map: build once in O(n), each count query in O(1)
  Cost: O(k) space where k = distinct elements
  Gain: O(n) per count → O(1) per count

Complexity Comparison: HashMap, TreeMap, Array

For interview problems, knowing when O(1) hash beats O(log n) tree helps avoid over-engineering and signals correct reasoning.

OperationUnsorted ArraySorted ArrayHashMapTreeMap
Lookup by keyO(n)O(log n)O(1) avgO(log n)
InsertO(1) appendO(n) shiftO(1) avgO(log n)
Delete by keyO(n)O(n)O(1) avgO(log n)
Min / MaxO(n)O(1)O(n)O(log n)
Floor / CeilingO(n)O(log n)O(n)O(log n)
Iterate sortedO(n log n)O(n)O(n log n)O(n)
Range queryO(n)O(log n + k)O(n)O(log n + k)

Use HashMap when you need fast key-value operations and do not care about order. Use TreeMap when you need ordering, min/max, floor/ceiling, or range queries. Never use TreeMap just for sorted output if you can sort the HashMap keys once at the end — that is O(n log n) either way.

Hidden Complexity: Hash Table Operations That Are Not O(1)

Several operations look like they should be O(1) but are actually more expensive.

Operation                         Looks Like    Actually Is
map.entrySet() iteration          O(1) setup    O(n) total — visits every bucket
map.values().stream().sorted()    O(n log n)    O(n log n) — sort after O(n) collect
"abc".hashCode() (first call)     O(1)?         O(3) = O(k) — hashes each char once
s.equals(t) used as map key check O(1)?         O(min(n,m)) char-by-char comparison
map.containsValue(v)              O(1)?         O(n) — must scan all values
new HashMap<>(existingMap)        O(1)?         O(n) — copies all n entries
Collections.frequency(list, x)   O(1)?         O(n) — linear scan, not a hash lookup

The most common mistake: using map.containsValue(v) thinking it is O(1). It is O(n) — HashMap only provides O(1) lookup by key, not by value. If you need O(1) value lookup, maintain a second HashMap with the values as keys.

Practical Complexity Checklist for Hash Table Problems

Step 1: Identify the query type.
  "Does X exist?"         → HashSet, O(1) per query
  "How many times X?"     → Frequency map, O(1) after O(n) build
  "What maps to X?"       → HashMap, O(1) per query
  "Minimum/maximum key?"  → TreeMap or scan HashMap in O(n)

Step 2: Determine if hashing is the bottleneck or the tool.
  Hashing replaces O(n) scan → O(1) lookup
  If the rest of the algorithm is O(n²), hashing drops it to O(n)
  If the rest is O(n log n), hashing drops it to O(n log n) still

Step 3: Account for space.
  n entries in map → O(n) space
  int[26] for letters → O(1) space
  Prefix sum map → O(n) space
  Two maps for bijection → O(n) space combined

Step 4: Check for hidden O(n) inside loops.
  map.containsValue() inside a loop → O(n²) total
  string.equals() as map key check → O(k) per comparison
  Building a new map inside a loop → O(n) per iteration → O(n²) total

Step 5: Verify load factor assumptions.
  Standard library maps stay below threshold automatically
  Custom hash tables: declare capacity as ~n/0.75 to avoid rehashing
  Pre-size with new HashMap<>(n * 4 / 3 + 1) when n is known

Interview Questions

Q: What causes O(n) worst-case lookup in a hash table, and how do Java and Python defend against it?

O(n) worst-case occurs when all n keys hash to the same slot, creating a chain of length n that must be fully scanned. This can happen accidentally with a poor hash function, or intentionally as a hash-flooding attack. Java 8+ defends by converting chains longer than 8 entries to red-black trees, capping worst-case at O(log n) per bucket. Python 3.3+ defends with hash randomization — the hash seed changes with each process, making it impossible for an attacker to predict collisions without knowing the seed.

Q: How does the load factor affect hash table performance, and what happens at extreme values?

The load factor α = entries/capacity determines the expected chain length (for chaining) or probe distance (for open addressing). At α = 0.75 (Java's threshold), expected lookups require ~1.4 comparisons — effectively O(1). At α → 0, the table is nearly empty with minimal collisions but wastes memory. At α → 1 (for chaining), chains average length 1 — still O(1) but with more variance. For open addressing, α → 1 means nearly every slot is occupied and probing becomes catastrophically slow — O(1/(1-α)) grows unboundedly. Open addressing tables must resize well before reaching α = 1.

Q: Why is building a hash map from n elements O(n) and not O(n²)?

Each individual insertion is O(1) amortized. With a good hash function and a maintained load factor, the expected chain length at any slot is O(α) ≈ O(1). Inserting n elements each taking O(1) gives O(n) total. Rehashing is O(n) when triggered, but the geometric series argument shows total copy work across all rehashes is also O(n). So n insertions + O(n) total rehash work = O(n) overall.

Q: When would you use a TreeMap instead of a HashMap, despite O(log n) being slower than O(1)?

When the problem requires ordered traversal, minimum or maximum key, floor/ceiling queries, or range queries. HashMap provides none of these — all require O(n) on a HashMap but O(log n) or O(n) respectively on a TreeMap. Examples: finding the closest number to a target (floor/ceiling), iterating entries in sorted key order, or finding all keys within a range. If an interview problem mentions "sorted," "range," "closest," or "k-th smallest/largest key," consider TreeMap.

FAQs

Is O(1) hash table lookup guaranteed or just expected?

Expected (average case). The guarantee only holds when the hash function distributes keys uniformly and the load factor stays below the threshold. In theory, adversarial keys can force all lookups to O(n). In practice with a good hash function on real data, O(1) holds consistently. For security-sensitive applications (web servers, compilers), use hash randomization or cryptographic hashes to prevent adversarial exploitation.

Does a larger hash table always mean faster lookup?

Not indefinitely. Increasing capacity reduces the load factor and expected chain length — up to a point. A very sparse table (α → 0) has near-zero collisions but poor cache behavior: the table array is mostly empty, and the few occupied slots are spread far apart, causing cache misses during the hash computation itself. The sweet spot is typically α between 0.5 and 0.75 — low enough for few collisions, dense enough for cache efficiency.

What is the space complexity of a HashMap key-value pair in Java with String keys and Integer values?

Each entry includes: the key String object (~40 + 2×length bytes for UTF-16 storage), the Integer value object (~16 bytes), the chain node (~32 bytes for next pointer, hash code cache, and object header), plus 8 bytes each for the key and value references in the node. For a short string key of 5 characters: ~40 + 10 + 16 + 32 + 8 + 8 = ~114 bytes per entry. This is why numeric keys (int, long) benefit from primitive-specialized maps (IntIntHashMap in Eclipse Collections or similar) that avoid autoboxing overhead.

Quick Quiz

Question 1: A HashMap has 1000 entries and capacity 2048. What is the load factor and what does it imply about performance?

  • A) α = 2.048, performance is degraded
  • B) α = 0.488, performance is good — well below 0.75
  • C) α = 0.75, Java will resize immediately
  • D) α = 1.0, every slot has exactly one entry

Answer: B) α ≈ 0.488. Load factor = 1000 / 2048 ≈ 0.488. This is well below Java's 0.75 threshold, so no resize will be triggered and expected chain length is less than 0.5 — nearly collision-free. The next resize will trigger at 2048 × 0.75 = 1536 entries.

Question 2: You call map.containsValue(42) inside a loop that runs n times. What is the total time complexity?

  • A) O(n) — containsValue is O(1) like containsKey
  • B) O(n log n) — containsValue is O(log n)
  • C) O(n²) — containsValue is O(n), called n times
  • D) O(1) — values are cached

Answer: C) O(n²). containsValue(v) must scan all n entries to check every value — O(n). Called n times inside a loop: O(n) × O(n) = O(n²). Fix: maintain a second HashMap of value→key for O(1) reverse lookup.

Question 3: Inserting n elements into a HashMap one by one, where each triggers rehashing (growing from 1 to 2 to 4 to ... to n). What is the total number of element copies across all rehashes?

  • A) O(n²) — each resize copies the growing table
  • B) O(n log n) — logarithmic number of resizes
  • C) O(n) — geometric series sums to ~2n
  • D) O(1) per resize — amortized cost

Answer: C) O(n). Copy counts: 1, 2, 4, 8, ..., n/2. Sum = 1 + 2 + 4 + ... + n/2 = n - 1 ≈ O(n). Total work: n insertions + n copies = O(n). Amortized per insertion: O(1).

Question 4: For an interview problem with n ≤ 10^6 elements, which is more appropriate — a frequency array int[26] or a HashMap<Character, Integer>?

  • A) HashMap — more general and equally fast
  • B) int[26] — O(1) space, faster access, no hashing overhead
  • C) Both are identical in performance at this scale
  • D) TreeMap — needed for ordered character output

Answer: B) int[26]. For lowercase letters, an int[26] array uses exactly 104 bytes regardless of n. A HashMap uses O(n) memory in general (though only up to 26 entries for lowercase). Array access is a direct index — no hash computation, no pointer chasing, no cache misses from scattered heap objects. For n = 10^6, the frequency array is 10-50x faster in practice and uses orders of magnitude less memory.

Summary

Hash table complexity is governed by one principle: O(1) average operations are maintained by keeping the load factor below a threshold, triggering O(n) rehashing when needed, which is amortized to O(1) per insertion by the doubling strategy.

The key complexities to carry as permanent intuition:

  • Insert, lookup, delete → O(1) average, O(n) worst case — the fundamental guarantee
  • Build from n elements → O(n) amortized — n × O(1) insertions
  • Rehash → O(n) per event, O(1) amortized across n insertions
  • Iteration → always O(n) — must visit every entry
  • containsValue → O(n) — not O(1) like containsKey
  • size/len → O(1) — stored as a field
  • Set union → O(n + m); intersection → O(min(n, m)) average; difference → O(n)
  • Frequency array (fixed alphabet) → O(1) space, O(n) build — always prefer over HashMap for bounded alphabets
  • Prefix sum map → O(n) space, O(n) build, O(1) per subarray query
  • Load factor sweet spot: 0.5–0.75 for time-space balance
  • Worst-case O(n) is caused by hash collisions — defended by randomized seeds (Python) or chain-to-tree conversion (Java 8+)
  • Use TreeMap (O(log n)) over HashMap (O(1) avg) only when ordering, min/max, or range queries are needed

In the next section, you will find Practice Problems organized by pattern to apply these hash table techniques on real interview problems.

Hash Table Time & Space Complexity