Hash Table Time & Space Complexity
The Complete Hash Table Operation Complexity Table
Every hash table operation has two complexities — average and worst case. The gap between them is what makes hashing unique among data structures.
| Operation | Average Case | Worst Case | Worst Case Trigger |
|---|---|---|---|
Insert (put / set / []) | O(1) | O(n) | All keys collide into one chain |
Lookup (get / has / count) | O(1) | O(n) | All keys collide into one chain |
Delete (remove / erase) | O(1) | O(n) | All keys collide into one chain |
Size (size / len) | O(1) | O(1) | Stored as a field |
| Iterate all entries | O(n) | O(n) | Must visit every entry |
| Rehash (resize) | O(n) | O(n) | Triggered by load factor |
| Build from n elements | O(n) | O(n²) | All keys collide during build |
| Set union (n + m elements) | O(n + m) | O(n + m) | Insert all elements of both |
| Set intersection | O(min(n, m)) | O(n × m) | Each check is a lookup |
| Set difference | O(n) | O(n × m) | Each check is a lookup |
Why O(1) Average — The Statistical Guarantee
Hash tables achieve O(1) average by spreading keys uniformly across the table. With n entries in a table of capacity c, the expected chain length at any slot is:
Expected chain length = n / c = load factor α With α = 0.75 (Java HashMap default): Expected comparisons per lookup = 1 + α/2 ≈ 1.375 That rounds to O(1) — a fixed constant regardless of n With α = 0.5: Expected comparisons per lookup ≈ 1.25 — faster With α = 2.0 (twice as many entries as slots): Expected comparisons per lookup ≈ 2 — still O(1), but slower constant
The "O(1)" is not magic — it is the consequence of maintaining a low load factor. The load factor is kept below a threshold by rehashing when it gets too high. Rehashing is O(n) but happens exponentially less often as the table grows, giving O(1) amortized per insertion.
Why O(n) Worst Case — The Collision Catastrophe
The worst case occurs when all n keys hash to the same slot. Every lookup must scan the entire chain of length n.
Table size = 7, hash(k) = k % 7 Insert keys: 0, 7, 14, 21, 28, 35, 42 hash(0) = 0 hash(7) = 0 hash(14) = 0 hash(21) = 0 hash(28) = 0 hash(35) = 0 hash(42) = 0 All 7 keys land at slot 0. Lookup for key 42: Scan slot 0 chain: 0 → 7 → 14 → 21 → 28 → 35 → 42 7 comparisons → O(n) for a table of n=7 entries
This is called a hash-flooding attack when done intentionally. An attacker who knows the hash function can craft keys that always collide, degrading a web server's hash table from O(1) to O(n) per request.
Java 8 defends against this by switching from linked lists to red-black trees when a chain exceeds 8 entries — capping worst-case lookup at O(log n) per bucket.
Python 3.3+ defends with hash randomization — the hash function seed changes with each process startup, so an attacker cannot predict which keys will collide.
Load Factor and Its Effect on Performance
The load factor α governs the tradeoff between time and space in a hash table.
Load factor α = number of entries / table capacity α → 0: Very sparse table Lots of empty slots, very few collisions Fast lookups, wastes memory α = 0.5: Half full Good balance — fast and reasonably compact α = 0.75: Java HashMap threshold Well-studied sweet spot for chaining Forces resize when reached α = 1.0: Full (for chaining) Every slot has one chain entry on average Acceptable but collisions increasing Open addressing MUST resize before this α > 1.0: Overloaded (chaining only — open addressing cannot exceed 1.0) Multiple entries per slot Performance degrades linearly with α Performance by load factor (chaining, successful lookup): α = 0.5 → ~1.25 comparisons α = 0.75 → ~1.38 comparisons α = 1.0 → ~1.50 comparisons α = 2.0 → ~2.00 comparisons α = 5.0 → ~3.50 comparisons These are all O(1) — the constant grows, but not with n.
Amortized O(1) Insert — The Rehashing Math
Each individual insert is O(1) except when it triggers a rehash, which is O(n). Why is the amortized cost still O(1)?
Doubling strategy: capacity doubles on every resize Rehash history for n = 1000 insertions (starting capacity 16): Rehash at entry 13 → copy 12 elements Rehash at entry 25 → copy 24 elements Rehash at entry 49 → copy 48 elements Rehash at entry 97 → copy 96 elements Rehash at entry 193 → copy 192 elements Rehash at entry 385 → copy 384 elements Rehash at entry 769 → copy 768 elements Total elements copied across all rehashes: 12 + 24 + 48 + 96 + 192 + 384 + 768 = 1524 For n = 1000 insertions: 1524 extra copies ≈ 1.5n Total work: n insertions + 1.5n copies = 2.5n = O(n) Amortized per insertion: O(n) / n = O(1)
This geometric series argument — copies are 1 + 2 + 4 + ... + n/2 ≈ n — is identical to the dynamic array amortized analysis. Every expensive rehash is "pre-paid" by the n/2 cheap insertions that preceded it.
Space Complexity Reference
| Structure | Space Used | Notes |
|---|---|---|
| HashMap with n entries | O(n) | n key-value pairs + hash table array |
| HashSet with n entries | O(n) | n keys + hash table array |
| Frequency array (int[26]) | O(1) | Fixed size regardless of n |
| Frequency array (int[128]) | O(1) | Fixed size for ASCII |
| Frequency map of n distinct keys | O(n) | n entries |
| Prefix sum map for array of n | O(n) | Up to n distinct prefix sums |
| Two HashMaps for bijection | O(n) | Two maps, combined O(n) |
Space Inside Each Entry
Java HashMap<String, Integer> per entry: Key object reference: 8 bytes (pointer) Value object reference: 8 bytes (pointer) Key String object: ~40 + 2×length bytes (header + UTF-16 chars) Value Integer object: ~16 bytes (header + int) Chain node overhead: ~24 bytes (next pointer + hash code + header) Per entry total: ~96 + 2×keyLength bytes Python dict per entry (CPython): key object reference: 8 bytes value object reference: 8 bytes hash code cache: 8 bytes Per entry total: ~24 bytes (plus object overhead for key/value) C++ unordered_map<string, int> per entry: Key string (SSO inline): ~32 bytes for short strings Value int: 4 bytes Linked list node: ~8 bytes (next pointer) Per entry total: ~44 bytes + key length for long keys JavaScript Map per entry: V8 manages internally — approximately 40-80 bytes per entry
Algorithm Complexity Using Hash Tables
When you use a hash map or hash set inside a larger algorithm, the hash operations contribute to the total complexity.
| Algorithm | Time | Space | Hash Table Role |
|---|---|---|---|
| Two Sum | O(n) | O(n) | Store seen values for O(1) complement lookup |
| Contains Duplicate | O(n) | O(n) | Store seen elements, O(1) existence check |
| Longest Consecutive Sequence | O(n) | O(n) | O(1) neighbor existence check |
| Group Anagrams | O(n × k) | O(n × k) | Frequency key → group lookup in O(1) |
| Top K Frequent Elements | O(n) | O(n) | Frequency build O(n) + bucket sort O(n) |
| Subarray Sum Equals K | O(n) | O(n) | Prefix sum count in O(1) per step |
| Valid Anagram | O(n) | O(1) | Fixed 26-array, O(1) comparison |
| Isomorphic Strings | O(n) | O(1) | Two maps, bounded alphabet |
| LRU Cache (get + put) | O(1) each | O(capacity) | HashMap for O(1) key lookup |
| Word Frequency Count | O(n) | O(k) | k distinct words, each O(1) insert |
Hash Table vs Other Structures — When to Choose What
Understanding complexity differences guides the right structure choice.
Query type: "Does this element exist?" Array (unsorted): O(n) — linear scan Array (sorted): O(log n) — binary search HashSet: O(1) — use HashSet Query type: "What value is associated with this key?" Array of pairs: O(n) — linear scan Sorted array: O(log n) — binary search for key HashMap: O(1) — use HashMap Query type: "What is the minimum/maximum key?" HashMap/HashSet: O(n) — must scan all TreeMap/TreeSet: O(log n) — leftmost/rightmost in BST → Use tree if min/max queries needed Query type: "Give me all keys in sorted order" HashMap: O(n log n) — collect and sort TreeMap: O(n) — in-order traversal → Use TreeMap if ordering needed Query type: "All keys between x and y?" HashMap: O(n) — scan and filter TreeMap: O(log n + k) — range query, k = result count → Use TreeMap for range queries
Reading Constraints to Predict Whether Hashing Applies
n ≤ 100: Any approach works — O(n²) brute force acceptable Hashing is fine but not required n ≤ 10,000: O(n²) may time out on slow judges Hashing converts O(n²) lookup loops to O(n) — usually required n ≤ 100,000 (10^5): O(n²) will time out O(n log n) or O(n) required HashMap/HashSet gives O(n) lookup, set operations O(n) n ≤ 1,000,000 (10^6): Must be O(n) or O(n log n) Frequency arrays preferred over hash maps when possible (O(1) space) Hash maps for unbounded or large-range values Memory signals: "lowercase letters only" → int[26], O(1) space, O(n) build "ASCII printable" → int[128], O(1) space, O(n) build "arbitrary integers" → HashMap, O(n) space "arbitrary strings" → HashMap, O(n × avg_length) space
The O(1) vs O(n) Space Tradeoff
Every hash-based optimization trades space for time. Three canonical tradeoffs:
Tradeoff 1 — Existence Check: Brute force: scan array for each element → O(n) per check HashSet: build once in O(n), check in O(1) Cost: O(n) space for the set Gain: O(n) per check → O(1) per check Tradeoff 2 — Lookup by Value: Brute force: scan for matching value → O(n) per lookup HashMap: build value→index map in O(n), lookup in O(1) Cost: O(n) space for the map Gain: O(n) per lookup → O(1) per lookup Tradeoff 3 — Frequency Precomputation: Brute force: count element occurrences each time → O(n) per count Frequency map: build once in O(n), each count query in O(1) Cost: O(k) space where k = distinct elements Gain: O(n) per count → O(1) per count
Complexity Comparison: HashMap, TreeMap, Array
For interview problems, knowing when O(1) hash beats O(log n) tree helps avoid over-engineering and signals correct reasoning.
| Operation | Unsorted Array | Sorted Array | HashMap | TreeMap |
|---|---|---|---|---|
| Lookup by key | O(n) | O(log n) | O(1) avg | O(log n) |
| Insert | O(1) append | O(n) shift | O(1) avg | O(log n) |
| Delete by key | O(n) | O(n) | O(1) avg | O(log n) |
| Min / Max | O(n) | O(1) | O(n) | O(log n) |
| Floor / Ceiling | O(n) | O(log n) | O(n) | O(log n) |
| Iterate sorted | O(n log n) | O(n) | O(n log n) | O(n) |
| Range query | O(n) | O(log n + k) | O(n) | O(log n + k) |
Use HashMap when you need fast key-value operations and do not care about order. Use TreeMap when you need ordering, min/max, floor/ceiling, or range queries. Never use TreeMap just for sorted output if you can sort the HashMap keys once at the end — that is O(n log n) either way.
Hidden Complexity: Hash Table Operations That Are Not O(1)
Several operations look like they should be O(1) but are actually more expensive.
Operation Looks Like Actually Is map.entrySet() iteration O(1) setup O(n) total — visits every bucket map.values().stream().sorted() O(n log n) O(n log n) — sort after O(n) collect "abc".hashCode() (first call) O(1)? O(3) = O(k) — hashes each char once s.equals(t) used as map key check O(1)? O(min(n,m)) char-by-char comparison map.containsValue(v) O(1)? O(n) — must scan all values new HashMap<>(existingMap) O(1)? O(n) — copies all n entries Collections.frequency(list, x) O(1)? O(n) — linear scan, not a hash lookup
The most common mistake: using map.containsValue(v) thinking it is O(1). It is O(n) — HashMap only provides O(1) lookup by key, not by value. If you need O(1) value lookup, maintain a second HashMap with the values as keys.
Practical Complexity Checklist for Hash Table Problems
Step 1: Identify the query type. "Does X exist?" → HashSet, O(1) per query "How many times X?" → Frequency map, O(1) after O(n) build "What maps to X?" → HashMap, O(1) per query "Minimum/maximum key?" → TreeMap or scan HashMap in O(n) Step 2: Determine if hashing is the bottleneck or the tool. Hashing replaces O(n) scan → O(1) lookup If the rest of the algorithm is O(n²), hashing drops it to O(n) If the rest is O(n log n), hashing drops it to O(n log n) still Step 3: Account for space. n entries in map → O(n) space int[26] for letters → O(1) space Prefix sum map → O(n) space Two maps for bijection → O(n) space combined Step 4: Check for hidden O(n) inside loops. map.containsValue() inside a loop → O(n²) total string.equals() as map key check → O(k) per comparison Building a new map inside a loop → O(n) per iteration → O(n²) total Step 5: Verify load factor assumptions. Standard library maps stay below threshold automatically Custom hash tables: declare capacity as ~n/0.75 to avoid rehashing Pre-size with new HashMap<>(n * 4 / 3 + 1) when n is known
Interview Questions
Q: What causes O(n) worst-case lookup in a hash table, and how do Java and Python defend against it?
O(n) worst-case occurs when all n keys hash to the same slot, creating a chain of length n that must be fully scanned. This can happen accidentally with a poor hash function, or intentionally as a hash-flooding attack. Java 8+ defends by converting chains longer than 8 entries to red-black trees, capping worst-case at O(log n) per bucket. Python 3.3+ defends with hash randomization — the hash seed changes with each process, making it impossible for an attacker to predict collisions without knowing the seed.
Q: How does the load factor affect hash table performance, and what happens at extreme values?
The load factor α = entries/capacity determines the expected chain length (for chaining) or probe distance (for open addressing). At α = 0.75 (Java's threshold), expected lookups require ~1.4 comparisons — effectively O(1). At α → 0, the table is nearly empty with minimal collisions but wastes memory. At α → 1 (for chaining), chains average length 1 — still O(1) but with more variance. For open addressing, α → 1 means nearly every slot is occupied and probing becomes catastrophically slow — O(1/(1-α)) grows unboundedly. Open addressing tables must resize well before reaching α = 1.
Q: Why is building a hash map from n elements O(n) and not O(n²)?
Each individual insertion is O(1) amortized. With a good hash function and a maintained load factor, the expected chain length at any slot is O(α) ≈ O(1). Inserting n elements each taking O(1) gives O(n) total. Rehashing is O(n) when triggered, but the geometric series argument shows total copy work across all rehashes is also O(n). So n insertions + O(n) total rehash work = O(n) overall.
Q: When would you use a TreeMap instead of a HashMap, despite O(log n) being slower than O(1)?
When the problem requires ordered traversal, minimum or maximum key, floor/ceiling queries, or range queries. HashMap provides none of these — all require O(n) on a HashMap but O(log n) or O(n) respectively on a TreeMap. Examples: finding the closest number to a target (floor/ceiling), iterating entries in sorted key order, or finding all keys within a range. If an interview problem mentions "sorted," "range," "closest," or "k-th smallest/largest key," consider TreeMap.
FAQs
Is O(1) hash table lookup guaranteed or just expected?
Expected (average case). The guarantee only holds when the hash function distributes keys uniformly and the load factor stays below the threshold. In theory, adversarial keys can force all lookups to O(n). In practice with a good hash function on real data, O(1) holds consistently. For security-sensitive applications (web servers, compilers), use hash randomization or cryptographic hashes to prevent adversarial exploitation.
Does a larger hash table always mean faster lookup?
Not indefinitely. Increasing capacity reduces the load factor and expected chain length — up to a point. A very sparse table (α → 0) has near-zero collisions but poor cache behavior: the table array is mostly empty, and the few occupied slots are spread far apart, causing cache misses during the hash computation itself. The sweet spot is typically α between 0.5 and 0.75 — low enough for few collisions, dense enough for cache efficiency.
What is the space complexity of a HashMap key-value pair in Java with String keys and Integer values?
Each entry includes: the key String object (~40 + 2×length bytes for UTF-16 storage), the Integer value object (~16 bytes), the chain node (~32 bytes for next pointer, hash code cache, and object header), plus 8 bytes each for the key and value references in the node. For a short string key of 5 characters: ~40 + 10 + 16 + 32 + 8 + 8 = ~114 bytes per entry. This is why numeric keys (int, long) benefit from primitive-specialized maps (IntIntHashMap in Eclipse Collections or similar) that avoid autoboxing overhead.
Quick Quiz
Question 1: A HashMap has 1000 entries and capacity 2048. What is the load factor and what does it imply about performance?
- ›A) α = 2.048, performance is degraded
- ›B) α = 0.488, performance is good — well below 0.75
- ›C) α = 0.75, Java will resize immediately
- ›D) α = 1.0, every slot has exactly one entry
Answer: B) α ≈ 0.488. Load factor = 1000 / 2048 ≈ 0.488. This is well below Java's 0.75 threshold, so no resize will be triggered and expected chain length is less than 0.5 — nearly collision-free. The next resize will trigger at 2048 × 0.75 = 1536 entries.
Question 2: You call map.containsValue(42) inside a loop that runs n times. What is the total time complexity?
- ›A) O(n) — containsValue is O(1) like containsKey
- ›B) O(n log n) — containsValue is O(log n)
- ›C) O(n²) — containsValue is O(n), called n times
- ›D) O(1) — values are cached
Answer: C) O(n²). containsValue(v) must scan all n entries to check every value — O(n). Called n times inside a loop: O(n) × O(n) = O(n²). Fix: maintain a second HashMap of value→key for O(1) reverse lookup.
Question 3: Inserting n elements into a HashMap one by one, where each triggers rehashing (growing from 1 to 2 to 4 to ... to n). What is the total number of element copies across all rehashes?
- ›A) O(n²) — each resize copies the growing table
- ›B) O(n log n) — logarithmic number of resizes
- ›C) O(n) — geometric series sums to ~2n
- ›D) O(1) per resize — amortized cost
Answer: C) O(n). Copy counts: 1, 2, 4, 8, ..., n/2. Sum = 1 + 2 + 4 + ... + n/2 = n - 1 ≈ O(n). Total work: n insertions + n copies = O(n). Amortized per insertion: O(1).
Question 4: For an interview problem with n ≤ 10^6 elements, which is more appropriate — a frequency array int[26] or a HashMap<Character, Integer>?
- ›A) HashMap — more general and equally fast
- ›B)
int[26]— O(1) space, faster access, no hashing overhead - ›C) Both are identical in performance at this scale
- ›D) TreeMap — needed for ordered character output
Answer: B) int[26]. For lowercase letters, an int[26] array uses exactly 104 bytes regardless of n. A HashMap uses O(n) memory in general (though only up to 26 entries for lowercase). Array access is a direct index — no hash computation, no pointer chasing, no cache misses from scattered heap objects. For n = 10^6, the frequency array is 10-50x faster in practice and uses orders of magnitude less memory.
Summary
Hash table complexity is governed by one principle: O(1) average operations are maintained by keeping the load factor below a threshold, triggering O(n) rehashing when needed, which is amortized to O(1) per insertion by the doubling strategy.
The key complexities to carry as permanent intuition:
- ›Insert, lookup, delete → O(1) average, O(n) worst case — the fundamental guarantee
- ›Build from n elements → O(n) amortized — n × O(1) insertions
- ›Rehash → O(n) per event, O(1) amortized across n insertions
- ›Iteration → always O(n) — must visit every entry
- ›
containsValue→ O(n) — not O(1) like containsKey - ›
size/len→ O(1) — stored as a field - ›Set union → O(n + m); intersection → O(min(n, m)) average; difference → O(n)
- ›Frequency array (fixed alphabet) → O(1) space, O(n) build — always prefer over HashMap for bounded alphabets
- ›Prefix sum map → O(n) space, O(n) build, O(1) per subarray query
- ›Load factor sweet spot: 0.5–0.75 for time-space balance
- ›Worst-case O(n) is caused by hash collisions — defended by randomized seeds (Python) or chain-to-tree conversion (Java 8+)
- ›Use TreeMap (O(log n)) over HashMap (O(1) avg) only when ordering, min/max, or range queries are needed
In the next section, you will find Practice Problems organized by pattern to apply these hash table techniques on real interview problems.