DSA Tutorial
🔍

Frequency Counting

What Is Frequency Counting?

Frequency counting is the pattern of building a map from each element to how many times it appears, then using that map to answer questions about the distribution.

Array:     [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]

After one pass — frequency map:
  1 → 2
  2 → 1
  3 → 2
  4 → 1
  5 → 3   ← most frequent
  6 → 1
  9 → 1

Questions now answered in O(1):
  "How many times does 5 appear?"  → freq[5]  = 3
  "Does 7 appear at all?"          → freq[7]  = 0 (missing = 0)
  "What is the most frequent?"     → max of values = 5
  "How many unique elements?"      → len(freq) = 7

The core trade: spend O(n) time and O(k) space up front to build the map, then answer every subsequent question about element counts in O(1). This converts O(n) repeated scans into O(1) queries — the same fundamentally beneficial trade that makes all hashing worth using.

The Two Tools: Array vs HashMap

For frequency counting, choose the data structure based on what the elements are.

Frequency Array  — when elements are bounded integers or characters
  int[] freq = new int[26]          for lowercase letters (0-25)
  int[] freq = new int[128]         for ASCII characters  (0-127)
  int[] freq = new int[maxVal + 1]  for bounded integers

  Access:  freq['a' - 'a']   or   freq[num]
  Space:   O(range) — fixed regardless of n
  Speed:   Faster — no hashing, direct array index

HashMap — when elements are unbounded or non-integer
  Map<Integer, Integer> freq = new HashMap<>()
  Map<String, Integer>  freq = new HashMap<>()

  Access:  freq.getOrDefault(key, 0)
  Space:   O(k) where k = distinct elements
  Speed:   O(1) average but with hashing overhead

Decision rule:
  Lowercase letters only → int[26]
  ASCII characters       → int[128]
  Bounded integers       → int[max + 1] if max is small
  Large/unknown range    → HashMap
  Strings or objects     → HashMap

Building a Frequency Map — The Foundation

Java
1import java.util.HashMap; 2import java.util.Map; 3 4public class BuildFrequency { 5 6 public static void main(String[] args) { 7 int[] nums = {3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5}; 8 9 // Method 1: HashMap — works for any element type 10 Map<Integer, Integer> freq = new HashMap<>(); 11 for (int num : nums) { 12 freq.put(num, freq.getOrDefault(num, 0) + 1); 13 } 14 System.out.println("Frequency map: " + freq); 15 System.out.println("Count of 5: " + freq.getOrDefault(5, 0)); // 3 16 System.out.println("Count of 7: " + freq.getOrDefault(7, 0)); // 0 17 System.out.println("Unique count: " + freq.size()); // 7 18 19 // Method 2: Frequency array — for bounded integers 20 int[] arr = {0, 1, 2, 1, 0, 2, 2}; 21 int[] freqArr = new int[3]; // values are 0, 1, 2 22 for (int x : arr) freqArr[x]++; 23 24 System.out.println("\nFrequency array:"); 25 for (int i = 0; i < freqArr.length; i++) { 26 System.out.println(" " + i + " → " + freqArr[i]); 27 } 28 29 // Character frequency — the most common interview case 30 String s = "programming"; 31 int[] charFreq = new int[26]; 32 for (char c : s.toCharArray()) charFreq[c - 'a']++; 33 34 System.out.println("\nCharacter frequency:"); 35 for (int i = 0; i < 26; i++) { 36 if (charFreq[i] > 0) 37 System.out.println(" '" + (char)('a' + i) + "' → " + charFreq[i]); 38 } 39 } 40}
Output:
Frequency map: {1=2, 2=1, 3=2, 4=1, 5=3, 6=1, 9=1}
Count of 5:    3
Count of 7:    0
Unique count:  7

Character frequency:
  'a' → 1  'g' → 2  'i' → 1  'm' → 2  'n' → 1
  'o' → 1  'p' → 1  'r' → 2

Problem 1: Majority Element

Problem: Find the element that appears more than n/2 times. Guaranteed to exist.

Approach: Build a frequency map. The element with count > n/2 is the majority.

Java
1import java.util.HashMap; 2import java.util.Map; 3 4public class MajorityElement { 5 6 // O(n) time, O(n) space — frequency map approach 7 public static int majorityElement(int[] nums) { 8 Map<Integer, Integer> freq = new HashMap<>(); 9 int n = nums.length; 10 11 for (int num : nums) { 12 int count = freq.getOrDefault(num, 0) + 1; 13 if (count > n / 2) return num; // Early exit when found 14 freq.put(num, count); 15 } 16 17 return -1; // Guaranteed not to reach here 18 } 19 20 public static void main(String[] args) { 21 System.out.println(majorityElement(new int[]{3, 2, 3})); // 3 22 System.out.println(majorityElement(new int[]{2, 2, 1, 1, 1, 2, 2})); // 2 23 System.out.println(majorityElement(new int[]{1})); // 1 24 } 25}
Output:
3
2
1

Dry Run: Majority Element in [2, 2, 1, 1, 1, 2, 2], n=7, threshold=3

freq = {}, n/2 = 3

num=2: count=1, 1 > 3? No  → freq={2:1}
num=2: count=2, 2 > 3? No  → freq={2:2}
num=1: count=1, 1 > 3? No  → freq={2:2, 1:1}
num=1: count=2, 2 > 3? No  → freq={2:2, 1:2}
num=1: count=3, 3 > 3? No  → freq={2:2, 1:3}
num=2: count=3, 3 > 3? No  → freq={2:3, 1:3}
num=2: count=4, 4 > 3? YES → return 2 ✓

Early exit saves one unnecessary final iteration.

Problem 2: Top K Frequent Elements

Problem: Given an array, return the k most frequently occurring elements.

Approach: Build frequency map, then find the top k by frequency. Two common methods: min-heap (O(n log k)) or bucket sort (O(n)).

Java
1import java.util.*; 2 3public class TopKFrequent { 4 5 // Bucket sort approach — O(n) time, O(n) space 6 public static int[] topKFrequent(int[] nums, int k) { 7 // Step 1: Build frequency map 8 Map<Integer, Integer> freq = new HashMap<>(); 9 for (int num : nums) freq.put(num, freq.getOrDefault(num, 0) + 1); 10 11 // Step 2: Bucket sort — bucket[i] holds elements with frequency i 12 List<Integer>[] buckets = new List[nums.length + 1]; 13 for (int i = 0; i <= nums.length; i++) buckets[i] = new ArrayList<>(); 14 15 for (Map.Entry<Integer, Integer> e : freq.entrySet()) { 16 buckets[e.getValue()].add(e.getKey()); 17 } 18 19 // Step 3: Collect top k from highest frequency buckets 20 int[] result = new int[k]; 21 int idx = 0; 22 for (int i = nums.length; i >= 0 && idx < k; i--) { 23 for (int num : buckets[i]) { 24 if (idx < k) result[idx++] = num; 25 } 26 } 27 28 return result; 29 } 30 31 public static void main(String[] args) { 32 System.out.println(Arrays.toString(topKFrequent(new int[]{1,1,1,2,2,3}, 2))); // [1,2] 33 System.out.println(Arrays.toString(topKFrequent(new int[]{1}, 1))); // [1] 34 System.out.println(Arrays.toString(topKFrequent(new int[]{4,1,4,4,3,1,3}, 2))); // [4,1] or [4,3] 35 } 36}
Output:
[1, 2]
[1]
[4, 1]

How Bucket Sort Achieves O(n)

nums = [1,1,1,2,2,3],  k=2

Step 1 — Build frequency map:
  {1:3, 2:2, 3:1}

Step 2 — Place in frequency buckets (index = frequency):
  buckets[1] = [3]
  buckets[2] = [2]
  buckets[3] = [1]

Step 3 — Collect from highest bucket down until k reached:
  i=6: empty
  i=5: empty
  i=4: empty
  i=3: [1] → result=[1]  (1 of 2 collected)
  i=2: [2] → result=[1,2] (2 of 2) → done

Result: [1, 2] in O(n) — no sorting step needed

Why O(n) and not O(n log n)?
  No element-by-element comparison sort.
  Bucket index IS the sort key — frequency is a bounded integer [0, n].
  Building buckets: O(n). Scanning buckets: O(n). Total: O(n).

Problem 3: Subarray Sum Equals K

Problem: Count the number of subarrays that sum to exactly k.

Approach: Prefix sum + frequency map. If prefixSum[j] - prefixSum[i] == k, then prefixSum[j] - k == prefixSum[i] — check how many previous prefix sums equal currentSum - k.

Java
1import java.util.HashMap; 2import java.util.Map; 3 4public class SubarraySumEqualsK { 5 6 // O(n) time, O(n) space 7 public static int subarraySum(int[] nums, int k) { 8 Map<Integer, Integer> prefixCount = new HashMap<>(); 9 prefixCount.put(0, 1); // Empty prefix (sum=0) exists once 10 11 int count = 0; 12 int runningSum = 0; 13 14 for (int num : nums) { 15 runningSum += num; 16 17 // How many previous prefix sums equal (runningSum - k)? 18 // If prefixSum[i] == runningSum - k, then nums[i+1..j] sums to k 19 count += prefixCount.getOrDefault(runningSum - k, 0); 20 21 // Record this prefix sum 22 prefixCount.put(runningSum, prefixCount.getOrDefault(runningSum, 0) + 1); 23 } 24 25 return count; 26 } 27 28 public static void main(String[] args) { 29 System.out.println(subarraySum(new int[]{1, 1, 1}, 2)); // 2 30 System.out.println(subarraySum(new int[]{1, 2, 3}, 3)); // 2 31 System.out.println(subarraySum(new int[]{1, -1, 0}, 0)); // 3 32 System.out.println(subarraySum(new int[]{-1, -1, 1}, 0)); // 1 33 } 34}
Output:
2
2
3
1

Dry Run: Subarray Sum = 2 in [1, 1, 1]

k=2, prefixCount={0:1}, count=0

i=0: num=1, runningSum=1
  Look up (1-2)=-1 in prefixCount → 0 → count stays 0
  Store runningSum=1 → prefixCount={0:1, 1:1}

i=1: num=1, runningSum=2
  Look up (2-2)=0 in prefixCount → 1 → count = 0+1 = 1
    (subarray nums[0..1] = [1,1] sums to 2)
  Store runningSum=2 → prefixCount={0:1, 1:1, 2:1}

i=2: num=1, runningSum=3
  Look up (3-2)=1 in prefixCount → 1 → count = 1+1 = 2
    (subarray nums[1..2] = [1,1] sums to 2)
  Store runningSum=3 → prefixCount={0:1, 1:1, 2:1, 3:1}

Result: 2 ✓

Why prefixCount starts with {0:1}?
  It accounts for subarrays starting at index 0.
  If runningSum itself equals k, then (runningSum - k)=0 must be in the map.
  {0:1} ensures the entire prefix [0..i] is counted when it equals k.

Problem 4: Longest Subarray with Equal 0s and 1s

Problem: Given a binary array, find the length of the longest subarray with equal 0s and 1s.

Approach: Replace 0 with -1. The subarray has equal 0s and 1s when its sum equals 0. Use prefix sum + frequency map — store first occurrence of each prefix sum.

Java
1import java.util.HashMap; 2import java.util.Map; 3 4public class LongestEqualZeroOne { 5 6 public static int findMaxLength(int[] nums) { 7 // Replace 0 with -1: equal 0s and 1s → subarray sum = 0 8 Map<Integer, Integer> firstSeen = new HashMap<>(); 9 firstSeen.put(0, -1); // Sum 0 seen before index 0 10 11 int runningSum = 0; 12 int maxLen = 0; 13 14 for (int i = 0; i < nums.length; i++) { 15 runningSum += (nums[i] == 0) ? -1 : 1; 16 17 if (firstSeen.containsKey(runningSum)) { 18 // Same sum seen before → subarray between has sum 0 19 maxLen = Math.max(maxLen, i - firstSeen.get(runningSum)); 20 } else { 21 // First time this sum appears — record it 22 firstSeen.put(runningSum, i); 23 } 24 } 25 26 return maxLen; 27 } 28 29 public static void main(String[] args) { 30 System.out.println(findMaxLength(new int[]{0, 1})); // 2 31 System.out.println(findMaxLength(new int[]{0, 1, 0})); // 2 32 System.out.println(findMaxLength(new int[]{0, 0, 1, 0, 0, 0, 1, 1})); // 6 33 System.out.println(findMaxLength(new int[]{0, 0, 0, 1, 1, 1})); // 6 34 } 35}
Output:
2
2
6
6

Problem 5: Sort Characters By Frequency

Problem: Sort a string so that characters appear in descending order of frequency.

Approach: Count frequencies, then bucket sort by frequency for O(n) — or use a priority queue for O(n log n). Build the result string from highest-frequency to lowest.

Java
1import java.util.*; 2 3public class SortByFrequency { 4 5 public static String frequencySort(String s) { 6 // Step 1: Build frequency map 7 Map<Character, Integer> freq = new HashMap<>(); 8 for (char c : s.toCharArray()) freq.put(c, freq.getOrDefault(c, 0) + 1); 9 10 // Step 2: Bucket sort by frequency 11 List<Character>[] buckets = new List[s.length() + 1]; 12 for (int i = 0; i <= s.length(); i++) buckets[i] = new ArrayList<>(); 13 for (Map.Entry<Character, Integer> e : freq.entrySet()) { 14 buckets[e.getValue()].add(e.getKey()); 15 } 16 17 // Step 3: Build result from highest frequency down 18 StringBuilder sb = new StringBuilder(); 19 for (int i = s.length(); i >= 0; i--) { 20 for (char c : buckets[i]) { 21 for (int j = 0; j < i; j++) sb.append(c); 22 } 23 } 24 25 return sb.toString(); 26 } 27 28 public static void main(String[] args) { 29 System.out.println(frequencySort("tree")); // "eert" or "eetr" 30 System.out.println(frequencySort("cccaaa")); // "aaaccc" or "cccaaa" 31 System.out.println(frequencySort("Aabb")); // "bbAa" or "bbaA" 32 } 33}
Output:
eert
cccaaa
bbAa

The Prefix Sum + Frequency Map Pattern

Problems 3 and 4 both used the same two-step pattern — one of the most powerful combinations in hashing problems:

Pattern: Prefix Sum + Frequency Map

When to recognize it:
  "Count subarrays with sum equal to k"
  "Longest subarray with sum/property equal to some target"
  "Count subarrays divisible by k"
  "Number of nice subarrays (with exactly k odd numbers)"

Template:
  prefixCount = {0: 1}   ← base case: empty prefix
  runningSum  = 0
  answer      = 0

  for each element:
      update runningSum
      answer += prefixCount.get(runningSum - target, 0)
      prefixCount[runningSum] += 1   ← record AFTER query

Key insight:
  If two positions i < j have the same prefix sum,
  the subarray between them has sum 0 (or sum = difference).
  Storing prefix sums in a frequency map lets us count
  how many valid subarrays end at the current position — in O(1).

Comparing Frequency-Based Approaches

Problem TypeBest ToolTimeSpace
Count character occurrencesint[26] or int[128]O(n)O(1)
Count arbitrary element occurrencesHashMap<K, Int>O(n)O(k)
Find majority element (> n/2 times)HashMap with early exitO(n)O(n)
Top K frequent elementsHashMap + bucket sortO(n)O(n)
Sort by frequencyHashMap + bucket sortO(n)O(n)
Count subarrays with sum = kHashMap of prefix sumsO(n)O(n)
Longest subarray with target sumHashMap of first-seen prefix sumO(n)O(n)
Group by character frequencyHashMap of frequency-array keyO(n×k)O(n×k)

Common Mistakes Beginners Make

Forgetting the {0: 1} base case in prefix sum problems. Without it, subarrays starting at index 0 that sum to exactly k are not counted. The empty prefix has sum 0, and it must be in the map so that when runningSum == k, the lookup prefixCount[0] returns 1.

Querying before recording in prefix sum frequency problems. Always check prefixCount[runningSum - k] BEFORE putting runningSum into the map. Doing it in the wrong order would count the current element as a valid start of a subarray that includes itself — producing incorrect results, especially when k = 0.

Using sort for top-K when bucket sort achieves O(n). Sorting by frequency costs O(n log n). Since frequencies are bounded integers in [0, n], bucket sort gives O(n). This optimization is expected in interviews — an O(n log n) solution for top-K is considered incomplete.

Confusing freq.get(key) with freq.getOrDefault(key, 0) in Java. When a key is missing, get() returns null. Using null in arithmetic (null + 1) causes a NullPointerException. Always use getOrDefault(key, 0) when the key might be absent and you need a numeric fallback.

Interview Questions

Q: What is the time and space complexity of building a character frequency map for a string of length n?

O(n) time — one pass through every character, one O(1) operation per character. O(1) space if using a fixed-size array (int[26] for lowercase, int[128] for ASCII), because the array size is constant regardless of n. O(k) space with a hash map where k = number of distinct characters — bounded by the alphabet size, so also effectively O(1) for fixed alphabets.

Q: Why does the subarray sum equals k algorithm initialize prefixCount = {0: 1} instead of an empty map?

The entry {0: 1} represents the empty prefix — the state before processing any element, where the running sum is 0. Without it, subarrays that start at index 0 and sum to exactly k are missed. When runningSum == k at some index, we look up prefixCount[runningSum - k] = prefixCount[0]. If 0 is not in the map, we return 0 and miss the valid subarray [0..i].

Q: How does bucket sort achieve O(n) for top-K frequent elements when comparison sorting requires O(n log n)?

Bucket sort exploits the fact that element frequencies are bounded integers in the range [0, n]. Rather than comparing frequencies, we use frequency itself as an array index — bucket[freq] holds all elements with that exact frequency. Building buckets is O(n). Scanning from the highest bucket down is O(n). No comparison-based sorting step exists, which is why it beats the O(n log n) lower bound that applies to comparison sorts.

FAQs

When should I use Counter vs defaultdict(int) vs manual dict.get() in Python?

All three produce the same result. Use Counter in interviews when you want the most readable code and need bonus features like most_common(). Use defaultdict(int) when you are accumulating in a loop and want to avoid get() on every update. Use manual dict.get(key, 0) when you want explicit control or are avoiding imports. For the subarray sum pattern specifically, use manual dict because you need explicit control over when entries are added.

Can frequency counting handle negative numbers or very large integers?

Yes with a hash map — there is no constraint on key values. Frequency arrays require non-negative integer keys within a bounded range. For the subarray sum pattern (prefixCount), the keys are prefix sums which can be negative — a hash map handles this correctly. The only concern is hash map memory: if n is large and all elements are distinct, the map uses O(n) memory.

Is it safe to use freq[c]++ in C++ for frequency counting without checking if the key exists?

Yes. C++'s unordered_map operator [] inserts a default value (0 for int) when the key does not exist. So freq[c]++ correctly initializes missing keys to 0 and then increments to 1. This is a well-known intentional feature of C++ maps. It is the one case where [] for reading is safe — when the purpose is to write (increment) immediately after, starting from 0 if new.

Quick Quiz

Question 1: You want to count character frequencies in a lowercase-only string. Which structure uses the least memory?

  • A) HashMap<Character, Integer> — hash map for any char
  • B) int[26] — fixed array indexed by char - 'a'
  • C) TreeMap<Character, Integer> — sorted map
  • D) LinkedHashMap<Character, Integer> — ordered map

Answer: B) int[26]. A fixed array of 26 integers uses exactly 26 × 4 = 104 bytes regardless of string length. A hash map requires object overhead per entry (16+ bytes per key-value pair in Java), plus the hash table array itself. For bounded alphabets, the frequency array is always smaller and faster.

Question 2: In the subarray sum equals k algorithm, when should you look up prefixCount[runningSum - k] relative to recording prefixCount[runningSum]?

  • A) Look up after recording — order does not matter
  • B) Look up before recording — to avoid counting a subarray of length 0
  • C) Only look up, never record the current sum
  • D) Record first, then look up the complement

Answer: B) Look up before recording. Recording first would mean that when k=0, the current position's prefix sum would find itself in the map — incorrectly counting a zero-length "subarray" at the same position. Always query first, then record, to ensure only previous prefix sums are counted.

Question 3: top_k_frequent([1,1,1,2,2,3], 2) should return elements with frequencies [3, 2]. Bucket sort places element 1 in buckets[3] and element 2 in buckets[2]. Scanning from the highest bucket down, what is returned?

  • A) [3, 2] — the bucket indices, not the elements
  • B) [1, 2] — element 1 from bucket[3] first, then element 2 from bucket[2]
  • C) [2, 1] — element 2 (lower frequency) first
  • D) [1, 1, 1, 2, 2] — all occurrences

Answer: B) [1, 2]. Bucket index = frequency. Element 1 is in bucket[3] (freq=3), element 2 is in bucket[2] (freq=2). Scanning from highest bucket (index 5 downward): bucket[3] gives element 1 first, then bucket[2] gives element 2. We stop at k=2.

Question 4: For the longest subarray with equal 0s and 1s problem, what transformation makes prefix sums useful?

  • A) Replace every element with its square
  • B) Replace 0 with -1 — then equal 0s and 1s means subarray sum = 0
  • C) Replace 1 with 2 — then look for sum divisible by 3
  • D) No transformation needed — use raw values directly

Answer: B) Replace 0 with -1. With raw values, equal counts of 0s and 1s in a subarray sum to (count_ones). With 0→-1 substitution, each 0 contributes -1 and each 1 contributes +1. Equal counts cancel out to sum = 0. This transforms the problem into "find longest subarray with sum 0" — directly solved by the prefix sum frequency map pattern.

Summary

Frequency counting is the pattern of building a distribution map in one O(n) pass, then answering distribution questions in O(1). It transforms repeated O(n) scans into O(1) queries.

The key patterns to carry forward:

  • Build frequencyfreq.getOrDefault(key, 0) + 1 in one pass — O(n) time, O(k) space
  • Frequency arrayint[26] for lowercase, int[128] for ASCII — O(1) space for fixed alphabets
  • Majority element — check count > n/2 during build with early exit — O(n)
  • Top K frequent — frequency map + bucket sort — O(n) (better than O(n log n) sort)
  • Sort by frequency — same bucket sort pattern, build result string from high buckets — O(n)
  • Subarray sum = k — prefix sum + frequency map — O(n) time, O(n) space
    • Initialize prefixCount = {0: 1} for the empty prefix
    • Always query before recording the current prefix sum
  • Longest subarray, equal property — same pattern with first-seen index instead of count
    • prefixCount stores {sum: firstIndex} not {sum: count}

In the next topic, you will explore the Time and Space Complexity of hashing — the complete reference for every hash table operation and algorithm, including the factors that cause worst-case degradation.

Frequency Counting