Java Choosing the Right Collection
Java Choosing the Right Collection
The Java Collections Framework has over twenty concrete implementations. Most developers default to ArrayList and HashMap for everything and move on. That works until it does not — until a list.contains() inside a loop silently turns an O(n) operation into O(n²), or a HashMap in a shared service causes a race condition that only reproduces under load, or a HashSet breaks an alphabetically sorted output in a client demo. Collection choice is an architectural decision. Getting it right at design time prevents a class of bugs that are hard to diagnose later.
The Complete Java Collections Landscape
JAVA COLLECTIONS FRAMEWORK — FULL MAP
INTERFACES AND THEIR IMPLEMENTATIONS:
Collection<E> ← root of element-holding collections
│
├── List<E> ← ORDERED sequence, duplicates ALLOWED
│ │ "What is at position i?"
│ ├── ArrayList ← resizable array, O(1) get [DEFAULT CHOICE]
│ ├── LinkedList ← doubly-linked, O(1) head/tail, also Deque
│ └── CopyOnWriteArrayList ← thread-safe, snapshot read, write-rare
│
├── Set<E> ← UNIQUE elements, NO duplicates
│ │ "Have I seen this before?"
│ ├── HashSet ← O(1) all ops, no order [DEFAULT CHOICE]
│ ├── LinkedHashSet ← O(1), insertion order preserved
│ └── TreeSet ← O(log n), SORTED, NavigableSet API
│
└── Queue<E> / Deque<E> ← PROCESSING ORDER (FIFO, LIFO, priority)
│ "What should I process next?"
├── ArrayDeque ← circular array, O(1) both ends [DEFAULT CHOICE]
├── PriorityQueue ← binary min-heap, priority order (NOT FIFO)
├── LinkedBlockingQueue ← bounded/unbounded, BLOCKING, thread-safe
└── PriorityBlockingQueue ← priority + blocking, thread-safe
Map<K, V> (NOT a Collection subtype)
│ "What is the value for key X?"
├── HashMap ← O(1) all ops, no order [DEFAULT CHOICE]
├── LinkedHashMap ← O(1), insertion order (LRU with accessOrder)
├── TreeMap ← O(log n), SORTED keys, NavigableMap API
├── EnumMap ← array-backed, enum keys only, fastest
└── ConcurrentHashMap ← thread-safe, lock-free reads, bucket writes
THREAD-SAFE ALTERNATIVES:
CopyOnWriteArrayList ← List, snapshot iteration, write-rare
ConcurrentHashMap ← Map, bucket-level locking, atomic ops
ConcurrentHashMap.newKeySet() ← Set, backed by ConcurrentHashMap
ConcurrentSkipListMap ← sorted Map, thread-safe TreeMap equivalent
ConcurrentSkipListSet ← sorted Set, thread-safe TreeSet equivalent
LinkedBlockingQueue ← Queue, blocking producer-consumer
ArrayBlockingQueue ← Queue, bounded blocking, array-backed
Basic Overview — The Four Core Questions
Every collection decision maps to one of four questions. Answer the question first, then pick the implementation.
QUESTION 1: "Does position or order matter AND are duplicates natural?"
→ List
Examples: ordered pipeline steps, search result rankings,
shopping cart (same item twice = two entries),
message history, undo/redo stack content
QUESTION 2: "Do I need to know whether I have seen this element?"
→ Set
Examples: processed order IDs, visited URLs, active user IDs,
unique tags, distinct product categories
QUESTION 3: "Do I need to look up a value by a meaningful identifier?"
→ Map
Examples: userId → UserProfile, SKU → stockLevel,
word → frequency, config key → config value,
city → population
QUESTION 4: "Do I need to process items in a defined order?"
→ Queue / Deque
Examples: FIFO request queue, LIFO undo stack,
priority-based task scheduler,
producer-consumer pipeline
COMBINED STRUCTURES (when a single interface is not enough):
Map<K, List<V>> → group items by category (duplicates within group OK)
Map<K, Set<V>> → group items with uniqueness per group
Map<K, Integer> → count occurrences of each key
Set of Map entries → same as Map.entrySet()
When to Use Each Interface
Choosing List
USE List WHEN:
1. Sequence and position carry meaning
— ordered execution: step 1 must run before step 2
— ranked results: rank 1 is different from rank 5
— chronological history: events must be replayed in order
2. Duplicates are semantically correct
— the same product added twice to a cart = two line items
— the same error appearing three times in a log = three events
3. Index-based access is needed
— results.get(0) for the top result
— pipeline.get(currentStage) for workflow position
LIST IMPLEMENTATION GUIDE:
ArrayList → default for almost everything
O(1) get, O(1) amortised add at end
cache-friendly contiguous memory
LinkedList → ONLY when frequent O(1) insertions AND removals
at BOTH ends AND List interface is also needed
O(n) get(index) makes it wrong for random access
Use ArrayDeque instead for pure queue/stack
CopyOnWriteArrayList → ONLY for concurrent, read-heavy,
write-rare lists (listener registries, config)
O(n) full array copy on every write
Choosing Set
USE Set WHEN:
1. Membership testing is the primary operation
— "Has this orderId been processed?" → O(1) with HashSet
— "Has this URL been crawled?" → O(1) with HashSet
— "Is this userId active?" → O(1) with HashSet
The same question with List: O(n) scan every time
2. Deduplication is required
— collecting unique tags from multiple articles
— building a distinct visitor list from a raw event stream
— ensuring no duplicate notification is sent
3. Set algebra is needed
— intersection (retainAll): skills a candidate has AND the job requires
— difference (removeAll): required skills the candidate is missing
— union (addAll): all skills from any of several candidates
SET IMPLEMENTATION GUIDE:
HashSet → default for speed, no ordering concern
O(1) average add/remove/contains
LinkedHashSet → when insertion order must be preserved alongside uniqueness
"First-seen" dedup: keeps first occurrence position
Deterministic test output
TreeSet → when sorted order is needed, or range queries required
O(log n) operations
floor(), ceiling(), headSet(), tailSet()
Rejects null elements
Choosing Map
USE Map WHEN:
1. Key-based lookup by a meaningful identifier
— userId → session (not a slot in an array, an ID with business meaning)
— productSku → inventoryCount
— configKey → configValue
2. Counting or frequency analysis
— word → count: map.merge(word, 1, Integer::sum)
— event type → occurrences
3. Grouping elements by a category key
— category → List<Product>
— date → List<Event>
— role → Set<Permission>
MAP IMPLEMENTATION GUIDE:
HashMap → default for almost all single-threaded use
O(1) average all ops, no ordering
LinkedHashMap → when insertion-order iteration matters
Deterministic API responses
LRU cache: new LinkedHashMap<>(n, 0.75f, true)
+ override removeEldestEntry()
TreeMap → when sorted key order is required
Range views: headMap(), tailMap(), subMap()
Nearest key: floorKey(), ceilingKey()
O(log n) operations
ConcurrentHashMap → for multi-threaded shared maps
Lock-free reads, per-bucket writes
Atomic compound ops: putIfAbsent, computeIfAbsent, merge
EnumMap → keys are enum constants only
Array-backed, no hashing needed, fastest possible
Choosing Queue or Deque
USE Queue / Deque WHEN:
1. Processing order is the concern, not lookup
— FIFO: process requests in arrival order
— LIFO: undo the most recent action first
— Priority: always process the highest-urgency item next
2. Both-ends access is needed simultaneously
— sliding window algorithms
— work-stealing schedulers
QUEUE/DEQUE IMPLEMENTATION GUIDE:
ArrayDeque → DEFAULT for all single-threaded queue and stack use
Replaces java.util.Stack (Stack is synchronised even
in single-threaded use — legacy overhead)
O(1) both ends, no per-element allocation
PriorityQueue → when items have varying priority
Always returns the minimum (or max with reverseOrder())
NOT FIFO — insertion order is irrelevant
Used in Dijkstra, A*, top-K problems
LinkedBlockingQueue → producer-consumer with multiple threads
take() blocks consumers when empty
put() blocks producers when full (bounded)
ArrayBlockingQueue → bounded blocking queue for back-pressure control
Fixed capacity set at construction
The Decision Framework — Four Steps
STEP 1: CHOOSE THE INTERFACE
Is order/position meaningful AND duplicates are natural?
→ List
Is uniqueness the core constraint?
→ Set
Is lookup by a meaningful key the primary operation?
→ Map
Is processing order the concern (FIFO, LIFO, priority)?
→ Queue or Deque
STEP 2: CHOOSE ORDERING WITHIN THE INTERFACE
No ordering needed?
→ ArrayList / HashSet / HashMap / ArrayDeque
Insertion order must be preserved?
→ ArrayList (always) / LinkedHashSet / LinkedHashMap
Sorted natural or comparator order?
→ TreeSet / TreeMap (and NavigableSet/NavigableMap API)
Priority order (smallest or custom comparator first)?
→ PriorityQueue
STEP 3: CHECK THREAD-SAFETY REQUIREMENT
Single-threaded only?
→ Standard java.util implementations (faster, less memory)
Multi-threaded, read-heavy, write-rare?
→ CopyOnWriteArrayList / ConcurrentHashMap.newKeySet()
Multi-threaded, balanced reads and writes?
→ ConcurrentHashMap / ConcurrentSkipListMap
Multi-threaded producer-consumer queue?
→ LinkedBlockingQueue / ArrayBlockingQueue
STEP 4: CHECK SIZE AND PERFORMANCE CONSTRAINTS
Expected size < 50 elements?
→ Any structure works; standard implementations fine
Expected size in thousands with frequent lookup?
→ Hash-based (HashSet, HashMap) for O(1) or tree-based for sorted
Expected size in millions?
→ Pre-size: new HashMap<>(n / 0.75 + 1), new ArrayList<>(n)
→ Avoid CopyOnWriteArrayList (O(n) write copies entire large array)
→ Consider memory: LinkedList ~28 bytes per node vs ArrayList ~4 bytes
Core Operations with Examples
The Four Core Patterns Side by Side
1// File: CollectionSelectionDemo.java
2
3import java.util.*;
4import java.util.stream.Collectors;
5
6public class CollectionSelectionDemo {
7
8 public static void main(String[] args) {
9
10 // PATTERN 1: List — ordered history, duplicates are real events
11 System.out.println("=== List — search history (order + duplicates) ===");
12 List<String> searchHistory = new ArrayList<>();
13 searchHistory.add("java collections");
14 searchHistory.add("hashmap tutorial");
15 searchHistory.add("java collections"); // same search repeated — real event
16 searchHistory.add("arraylist vs linkedlist");
17 System.out.println("History : " + searchHistory);
18 System.out.println("Count : " + searchHistory.size()); // 4, not 3
19 System.out.println("Recent : " + searchHistory.get(searchHistory.size() - 1));
20
21 System.out.println();
22
23 // PATTERN 2: Set — deduplication + O(1) membership
24 System.out.println("=== Set — processed notifications (uniqueness) ===");
25 Set<String> dispatched = new HashSet<>();
26 String[] incoming = {"NTF-001","NTF-002","NTF-001","NTF-003","NTF-002"};
27 int sent = 0, skipped = 0;
28 for (String notifId : incoming) {
29 if (dispatched.add(notifId)) { // false = duplicate, already dispatched
30 sent++;
31 System.out.println(" SENT : " + notifId);
32 } else {
33 skipped++;
34 System.out.println(" SKIPPED: " + notifId + " (already dispatched)");
35 }
36 }
37 System.out.printf("Result: %d sent, %d skipped%n", sent, skipped);
38
39 System.out.println();
40
41 // PATTERN 3: Map — key-value lookup
42 System.out.println("=== Map — user role lookup (O(1) by key) ===");
43 Map<String, String> userRoles = new HashMap<>();
44 userRoles.put("priya@corp.com", "ADMIN");
45 userRoles.put("rohan@corp.com", "DEVELOPER");
46 userRoles.put("ananya@corp.com", "VIEWER");
47 String[] requesters = {"priya@corp.com", "karan@corp.com", "rohan@corp.com"};
48 for (String email : requesters) {
49 System.out.printf(" %-20s → %s%n",
50 email, userRoles.getOrDefault(email, "GUEST"));
51 }
52
53 System.out.println();
54
55 // PATTERN 4: Queue — FIFO processing order
56 System.out.println("=== Queue — task processing (FIFO order) ===");
57 Queue<String> taskQueue = new ArrayDeque<>();
58 taskQueue.offer("TASK-A");
59 taskQueue.offer("TASK-B");
60 taskQueue.offer("TASK-C");
61 System.out.println("Queue: " + taskQueue);
62 while (!taskQueue.isEmpty()) {
63 System.out.println(" Processing: " + taskQueue.poll()); // FIFO — A first
64 }
65 }
66}Output:
=== List — search history (order + duplicates) ===
History : [java collections, hashmap tutorial, java collections, arraylist vs linkedlist]
Count : 4
Recent : arraylist vs linkedlist
=== Set — processed notifications (uniqueness) ===
SENT : NTF-001
SENT : NTF-002
SKIPPED: NTF-001 (already dispatched)
SENT : NTF-003
SKIPPED: NTF-002 (already dispatched)
Result: 3 sent, 2 skipped
=== Map — user role lookup (O(1) by key) ===
priya@corp.com → ADMIN
karan@corp.com → GUEST
rohan@corp.com → DEVELOPER
=== Queue — task processing (FIFO order) ===
Queue: [TASK-A, TASK-B, TASK-C]
Processing: TASK-A
Processing: TASK-B
Processing: TASK-C
Ordering and Thread-Safety Variants
1// File: OrderingVariantsDemo.java
2
3import java.util.*;
4import java.util.concurrent.ConcurrentHashMap;
5import java.util.concurrent.CopyOnWriteArrayList;
6
7public class OrderingVariantsDemo {
8
9 public static void main(String[] args) {
10
11 // Variant 1: LinkedHashSet — unique + insertion order
12 System.out.println("=== LinkedHashSet — unique tags, first-seen order ===");
13 Set<String> uniqueTags = new LinkedHashSet<>();
14 String[] raw = {"java","spring","java","kafka","spring","redis","java"};
15 for (String tag : raw) uniqueTags.add(tag);
16 System.out.println("Input : " + Arrays.toString(raw));
17 System.out.println("Result: " + uniqueTags); // java, spring, kafka, redis
18
19 System.out.println();
20
21 // Variant 2: TreeMap — sorted keys + range navigation
22 System.out.println("=== TreeMap — sorted price tiers + range queries ===");
23 TreeMap<Integer, String> priceTiers = new TreeMap<>();
24 priceTiers.put(199, "Basic");
25 priceTiers.put(499, "Standard");
26 priceTiers.put(999, "Premium");
27 priceTiers.put(2999, "Enterprise");
28
29 int budget = 600;
30 System.out.println("Sorted tiers: " + priceTiers);
31 System.out.println("Best within Rs." + budget + " : " + priceTiers.floorKey(budget)); // 499
32 System.out.println("Next above Rs." + budget + " : " + priceTiers.ceilingKey(budget)); // 999
33 System.out.println("Tiers <= Rs." + budget + " : " + priceTiers.headMap(budget + 1));
34
35 System.out.println();
36
37 // Variant 3: ConcurrentHashMap — thread-safe shared map
38 System.out.println("=== ConcurrentHashMap — thread-safe counter ===");
39 Map<String, Integer> concurrentCounter = new ConcurrentHashMap<>();
40 // merge() is atomic — safe to call from multiple threads simultaneously
41 String[] events = {"LOGIN","VIEW","LOGIN","PURCHASE","VIEW","LOGIN"};
42 for (String event : events) {
43 concurrentCounter.merge(event, 1, Integer::sum);
44 }
45 System.out.println("Event counts: " + concurrentCounter);
46
47 System.out.println();
48
49 // Variant 4: CopyOnWriteArrayList — thread-safe listener list
50 System.out.println("=== CopyOnWriteArrayList — thread-safe listeners ===");
51 List<String> listeners = new CopyOnWriteArrayList<>();
52 listeners.add("AuditLogger");
53 listeners.add("MetricsSender");
54 listeners.add("EmailNotifier");
55 System.out.println("Registered: " + listeners);
56 // Iteration never throws CME even if another thread adds/removes
57 for (String listener : listeners) {
58 System.out.println(" Notifying: " + listener);
59 }
60 }
61}Output:
=== LinkedHashSet — unique tags, first-seen order ===
Input : [java, spring, java, kafka, spring, redis, java]
Result: [java, spring, kafka, redis]
=== TreeMap — sorted price tiers + range queries ===
Sorted tiers: {199=Basic, 499=Standard, 999=Premium, 2999=Enterprise}
Best within Rs.600 : 499
Next above Rs.600 : 999
Tiers <= Rs.600 : {199=Basic, 499=Standard}
=== ConcurrentHashMap — thread-safe counter ===
Event counts: {PURCHASE=1, LOGIN=3, VIEW=2}
=== CopyOnWriteArrayList — thread-safe listeners ===
Registered: [AuditLogger, MetricsSender, EmailNotifier]
Notifying: AuditLogger
Notifying: MetricsSender
Notifying: EmailNotifier
Real-World Example — CRED Rewards and Offer Engine
CRED's offer engine demonstrates four different collection choices in one cohesive system. An ArrayList holds the ordered offer pipeline steps where sequence is fixed by business rules. A HashSet tracks which user IDs have already received a particular offer for idempotency. A LinkedHashMap maintains the user's reward history in insertion order for display. A TreeMap indexes offer tiers by minimum credit score for range-based eligibility lookup.
1// File: CreditOffer.java
2
3public record CreditOffer(
4 String offerId,
5 String title,
6 double cashbackPercent,
7 int minCreditScore) {
8
9 @Override
10 public String toString() {
11 return String.format("[%s] %-28s %.1f%% cashback (min score: %d)",
12 offerId, title, cashbackPercent, minCreditScore);
13 }
14}1// File: OfferEngine.java
2
3import java.util.ArrayList;
4import java.util.HashSet;
5import java.util.LinkedHashMap;
6import java.util.List;
7import java.util.Map;
8import java.util.Set;
9import java.util.TreeMap;
10
11public class OfferEngine {
12
13 // TreeMap — O(log n) range lookup: "best offer for credit score X"
14 private final TreeMap<Integer, CreditOffer> offersByMinScore = new TreeMap<>();
15
16 // HashSet — O(1) idempotency: "has this user received this offer?"
17 private final Set<String> dispatchedKeys = new HashSet<>();
18
19 // LinkedHashMap — insertion-order reward history per user for display
20 private final Map<String, List<String>> userRewardHistory = new LinkedHashMap<>();
21
22 // ArrayList — ordered offer pipeline: validation steps run in fixed sequence
23 private final List<String> pipelineSteps = new ArrayList<>();
24
25 public void registerOffer(CreditOffer offer) {
26 offersByMinScore.put(offer.minCreditScore(), offer);
27 }
28
29 public void initPipeline() {
30 pipelineSteps.add("VALIDATE_USER");
31 pipelineSteps.add("CHECK_CREDIT_SCORE");
32 pipelineSteps.add("FIND_ELIGIBLE_OFFER");
33 pipelineSteps.add("CHECK_IDEMPOTENCY");
34 pipelineSteps.add("DISPATCH_OFFER");
35 pipelineSteps.add("RECORD_HISTORY");
36 }
37
38 public void processUser(String userId, int creditScore) {
39
40 System.out.println(" Processing: userId=" + userId + " score=" + creditScore);
41
42 // ArrayList — iterate ordered pipeline (position = execution order)
43 for (String step : pipelineSteps) {
44 System.out.println(" [" + step + "]");
45
46 if ("FIND_ELIGIBLE_OFFER".equals(step)) {
47 // TreeMap floorEntry — best offer whose minScore <= user's score
48 Map.Entry<Integer, CreditOffer> entry = offersByMinScore.floorEntry(creditScore);
49 if (entry == null) {
50 System.out.println(" No eligible offer found. Stopping.");
51 return;
52 }
53 System.out.println(" Eligible: " + entry.getValue());
54
55 } else if ("CHECK_IDEMPOTENCY".equals(step)) {
56 String dispatchKey = userId + ":" + offersByMinScore.floorKey(creditScore);
57 // HashSet.add() — returns false if already dispatched (O(1))
58 if (!dispatchedKeys.add(dispatchKey)) {
59 System.out.println(" DUPLICATE — offer already sent. Stopping.");
60 return;
61 }
62 System.out.println(" Idempotency key registered: " + dispatchKey);
63
64 } else if ("RECORD_HISTORY".equals(step)) {
65 // LinkedHashMap + ArrayList — ordered reward history per user
66 CreditOffer offer = offersByMinScore.floorEntry(creditScore).getValue();
67 userRewardHistory
68 .computeIfAbsent(userId, k -> new ArrayList<>())
69 .add(offer.title() + " (" + offer.cashbackPercent() + "%)");
70 System.out.println(" History recorded.");
71 }
72 }
73 System.out.println(" DONE.\n");
74 }
75
76 public void printHistory() {
77 System.out.println("=".repeat(62));
78 System.out.println(" USER REWARD HISTORY (insertion order)");
79 System.out.println("=".repeat(62));
80 // LinkedHashMap iterates in user-first-seen order
81 userRewardHistory.forEach((userId, rewards) -> {
82 System.out.println(" " + userId + ":");
83 rewards.forEach(r -> System.out.println(" - " + r));
84 });
85 System.out.println("=".repeat(62));
86 }
87
88 public static void main(String[] args) {
89
90 OfferEngine engine = new OfferEngine();
91 engine.initPipeline();
92
93 engine.registerOffer(new CreditOffer("OFR-01", "Silver Cashback Card", 1.5, 600));
94 engine.registerOffer(new CreditOffer("OFR-02", "Gold Rewards Program", 3.0, 720));
95 engine.registerOffer(new CreditOffer("OFR-03", "Platinum Elite Offer", 5.5, 800));
96 engine.registerOffer(new CreditOffer("OFR-04", "Black Card Privileges", 8.0, 850));
97
98 System.out.println("--- Processing users ---\n");
99 engine.processUser("user-priya", 820); // eligible for Platinum
100 engine.processUser("user-rohan", 740); // eligible for Gold
101 engine.processUser("user-priya", 820); // duplicate — idempotency check stops it
102 engine.processUser("user-ananya", 560); // score too low for any offer
103 engine.processUser("user-karan", 860); // eligible for Black Card
104
105 engine.printHistory();
106
107 System.out.println("\n--- Collection roles in this system ---");
108 System.out.println("ArrayList : ordered pipeline steps (sequence = business rule)");
109 System.out.println("HashSet : O(1) idempotency check (no duplicate offers)");
110 System.out.println("TreeMap : O(log n) score range lookup (floorEntry)");
111 System.out.println("LinkedHashMap + ArrayList: insertion-order reward history per user");
112 }
113}Output:
--- Processing users ---
Processing: userId=user-priya score=820
[VALIDATE_USER]
[CHECK_CREDIT_SCORE]
[FIND_ELIGIBLE_OFFER]
Eligible: [OFR-03] Platinum Elite Offer 5.5% cashback (min score: 800)
[CHECK_IDEMPOTENCY]
Idempotency key registered: user-priya:800
[DISPATCH_OFFER]
[RECORD_HISTORY]
History recorded.
DONE.
Processing: userId=user-rohan score=740
[VALIDATE_USER]
[CHECK_CREDIT_SCORE]
[FIND_ELIGIBLE_OFFER]
Eligible: [OFR-02] Gold Rewards Program 3.0% cashback (min score: 720)
[CHECK_IDEMPOTENCY]
Idempotency key registered: user-rohan:720
[DISPATCH_OFFER]
[RECORD_HISTORY]
History recorded.
DONE.
Processing: userId=user-priya score=820
[VALIDATE_USER]
[CHECK_CREDIT_SCORE]
[FIND_ELIGIBLE_OFFER]
Eligible: [OFR-03] Platinum Elite Offer 5.5% cashback (min score: 800)
[CHECK_IDEMPOTENCY]
DUPLICATE — offer already sent. Stopping.
Processing: userId=user-ananya score=560
[VALIDATE_USER]
[CHECK_CREDIT_SCORE]
[FIND_ELIGIBLE_OFFER]
No eligible offer found. Stopping.
Processing: userId=user-karan score=860
[VALIDATE_USER]
[CHECK_CREDIT_SCORE]
[FIND_ELIGIBLE_OFFER]
Eligible: [OFR-04] Black Card Privileges 8.0% cashback (min score: 850)
[CHECK_IDEMPOTENCY]
Idempotency key registered: user-karan:850
[DISPATCH_OFFER]
[RECORD_HISTORY]
History recorded.
DONE.
==============================================================
USER REWARD HISTORY (insertion order)
==============================================================
user-priya:
- Platinum Elite Offer (5.5%)
user-rohan:
- Gold Rewards Program (3.0%)
user-karan:
- Black Card Privileges (8.0%)
==============================================================
--- Collection roles in this system ---
ArrayList : ordered pipeline steps (sequence = business rule)
HashSet : O(1) idempotency check (no duplicate offers)
TreeMap : O(log n) score range lookup (floorEntry)
LinkedHashMap + ArrayList: insertion-order reward history per user
Performance Considerations
| Collection | Add | Get/Contains | Remove | Ordered? | Thread-safe? |
|---|---|---|---|---|---|
| ArrayList | O(1) amort | O(n) contains, O(1) get(i) | O(n) | Insertion | No |
| LinkedList | O(1) ends | O(n) | O(1) ptr | Insertion | No |
| HashSet | O(1) avg | O(1) avg | O(1) avg | No | No |
| LinkedHashSet | O(1) avg | O(1) avg | O(1) avg | Insertion | No |
| TreeSet | O(log n) | O(log n) | O(log n) | Sorted | No |
| HashMap | O(1) avg | O(1) avg | O(1) avg | No | No |
| LinkedHashMap | O(1) avg | O(1) avg | O(1) avg | Insertion | No |
| TreeMap | O(log n) | O(log n) | O(log n) | Sorted keys | No |
| ArrayDeque | O(1) amort | O(n) contains | O(1) ends | Insertion | No |
| PriorityQueue | O(log n) | O(1) peek | O(log n) | Priority | No |
| CopyOnWriteArrayList | O(n) | O(n) contains, O(1) get(i) | O(n) | Insertion | Yes |
| ConcurrentHashMap | O(1) avg | O(1) avg | O(1) avg | No | Yes |
The O(n) contains trap is the most costly collection mistake. ArrayList.contains() scans every element. For 10,000 items and 10,000 lookups: 100,000,000 comparisons. HashSet.contains() on the same data: 10,000 operations. Replacing a List with a Set for membership-test-heavy code is the single highest-impact collection change in production Java. During code reviews, this pattern — list.contains(item) inside a loop — is a near-automatic refactor to Set.
Best Practices
Choose the narrowest interface that satisfies your requirements. Declare List<E>, Set<E>, Map<K,V> — not ArrayList, HashSet, HashMap. This disciplines callers to the intended contract and lets you swap implementations without changing signatures. The only exception: when you need an implementation-specific method (e.g., TreeMap.floorKey() or ArrayDeque.push()), declare the next most specific interface (NavigableMap, Deque) rather than the concrete class.
Identify the dominant operation and choose accordingly. Is contains() called frequently? Use Set or Map. Is get(index) critical? Use ArrayList. Is add/remove at both ends frequent? Use ArrayDeque. Is sorted order always required? Use TreeSet or TreeMap. Is order-preserving deduplication needed? Use LinkedHashSet. Let the access pattern drive the choice — do not default to familiar classes.
Pre-size hash-based collections when the expected entry count is known. new HashMap<>(expectedSize / 0.75 + 1) allocates the backing array large enough to avoid any resize. new ArrayList<>(expectedSize) avoids O(n) array copies during growth. For a HashMap loading 100,000 entries, pre-sizing eliminates seven resize operations — each of which copies every existing entry.
For grouping, always use computeIfAbsent(). map.computeIfAbsent(key, k -> new ArrayList<>()).add(value) is atomic and clear. The three-line alternative — containsKey check, construct, put, then get — is both verbose and has a subtle issue: the get() after put() is a second hash lookup. computeIfAbsent() returns the existing or newly created collection in a single operation.
Common Mistakes
Mistake 1 — Using List.contains() in a Loop Instead of a Set
1// WRONG — O(n²): contains() scans the full list on every iteration
2List<String> processedIds = new ArrayList<>();
3for (String incoming : millionsOfIncomingIds) {
4 if (!processedIds.contains(incoming)) { // O(n) scan each time
5 processedIds.add(incoming);
6 process(incoming);
7 }
8}
9
10// CORRECT — O(n): HashSet.add() returns false for duplicates in O(1)
11Set<String> processedIds2 = new HashSet<>();
12for (String incoming : millionsOfIncomingIds) {
13 if (processedIds2.add(incoming)) { // O(1): false = duplicate, already seen
14 process(incoming);
15 }
16}Mistake 2 — Choosing HashMap When Order Is Required
1// WRONG — HashMap iteration order is undefined and can change after resize
2Map<String, Integer> configDisplay = new HashMap<>();
3configDisplay.put("timeout", 30);
4configDisplay.put("retries", 3);
5configDisplay.put("baseUrl", 443);
6// Iterating for display: order is random — timeout may appear last
7
8// CORRECT — LinkedHashMap when insertion order must be preserved
9Map<String, Integer> configOrdered = new java.util.LinkedHashMap<>();
10configOrdered.put("timeout", 30); // iterates: timeout, retries, baseUrl
11configOrdered.put("retries", 3); // always in this sequence
12configOrdered.put("baseUrl", 443);Mistake 3 — Using java.util.Stack Instead of ArrayDeque
1// WRONG — Stack extends Vector, every push/pop acquires a global mutex
2// even in single-threaded code
3java.util.Stack<String> callStack = new java.util.Stack<>();
4callStack.push("main()");
5callStack.push("processOrder()");
6String top = callStack.pop(); // acquires lock — unnecessary single-thread overhead
7
8// CORRECT — ArrayDeque: same LIFO semantics, zero synchronisation cost
9Deque<String> callStack2 = new ArrayDeque<>();
10callStack2.push("main()");
11callStack2.push("processOrder()");
12String top2 = callStack2.pop(); // O(1) amortised, no lockMistake 4 — Using a Sorted Collection When Sorting Once on Output Is Sufficient
1// WRONG — TreeSet pays O(log n) on every insert when sorted output
2// is only needed once at the end
3Set<String> accumulator = new TreeSet<>();
4for (String item : millionsOfInserts) {
5 accumulator.add(item); // O(log n) per insert → O(n log n) total with tree overhead
6}
7
8// CORRECT — collect into HashSet O(1) per insert, sort once at the end
9Set<String> accumulator2 = new HashSet<>();
10for (String item : millionsOfInserts) {
11 accumulator2.add(item); // O(1) per insert → O(n) total
12}
13List<String> sorted = new ArrayList<>(accumulator2);
14java.util.Collections.sort(sorted); // O(n log n) once at the end
15// Same total asymptotic cost, but lower constant factors than TreeSet for bulk loadInterview Questions
Q1. How do you decide which Java collection to use?
Start with the shape of the data: List for ordered sequences where duplicates are natural, Set for unique element membership, Map for key-value lookup, Queue/Deque for processing order. Then choose the implementation based on three factors: ordering requirement (no-order → Hash, insertion-order → Linked, sorted → Tree), thread-safety requirement (single-threaded → standard java.util, multi-threaded → java.util.concurrent), and dominant operation (frequent contains() → hash-based, range queries → tree-based, positional access → ArrayList).
Q2. When should you replace a List with a Set or Map in production code?
Replace List with Set when list.contains(element) is called in a loop or repeatedly — this is O(n) per call, O(n²) total. A Set makes contains() O(1). Replace List with Map when the code scans a list by comparing some field value to find the matching entry — for example, list.stream().filter(u -> u.getId().equals(id)).findFirst(). That is O(n) per lookup. A Map<String, User> keyed by id gives O(1). Both substitutions are high-impact and appear routinely in code reviews.
Q3. What is the difference between HashSet, LinkedHashSet, and TreeSet?
All three implement Set and reject duplicate elements. HashSet uses a hash table — O(1) average for all operations, no ordering guarantee, fastest for pure membership testing. LinkedHashSet adds a doubly-linked list threading all entries in insertion sequence — O(1) operations, predictable iteration order. TreeSet uses a Red-Black tree — O(log n) for all operations, always iterates in sorted natural or Comparator order, and provides NavigableSet range methods (first(), last(), headSet(), tailSet(), floor(), ceiling()).
Q4. When would you use TreeMap over HashMap?
Use TreeMap when sorted key order is required for iteration, when range views are needed (headMap(toKey), tailMap(fromKey), subMap(from, to)), or when nearest-key navigation is needed (floorKey(), ceilingKey()). Common use cases: time-series data keyed by timestamp, price tier lookup by score range, alphabetically sorted menus, and any use case where "give me all entries between A and B" is a real operation. HashMap is O(1) but cannot answer range or order questions. TreeMap is O(log n) but can.
Q5. What are the thread-safe collection choices and when should each be used?
For a thread-safe List with read-heavy, write-rare workloads: CopyOnWriteArrayList — snapshot iteration, never CME, O(n) write. For a thread-safe Map with concurrent reads and writes: ConcurrentHashMap — lock-free reads, per-bucket writes, atomic compound operations. For a thread-safe sorted Map: ConcurrentSkipListMap. For a producer-consumer queue with blocking semantics: LinkedBlockingQueue or ArrayBlockingQueue. Never use Hashtable or Collections.synchronizedMap() in new code — both hold a global lock on every method, serialising all access.
Q6. Why is ArrayDeque recommended over Stack and LinkedList for queue and stack operations?
java.util.Stack extends Vector, which synchronises every method using a global mutex — push() and pop() acquire a lock on every call even in single-threaded code. This is pure overhead. LinkedList allocates a Node object per element, increasing GC pressure and causing CPU cache misses on iteration since nodes are scattered in heap memory. ArrayDeque uses a circular resizable array — contiguous memory, cache-friendly, O(1) amortised at both ends, zero synchronisation cost. The Java documentation explicitly recommends ArrayDeque as the preferred replacement for both.
FAQs
What is the simplest rule for choosing between List, Set, and Map?
Three questions: Does the order of elements matter and are duplicates valid data? → List. Is uniqueness the constraint and do you need O(1) membership testing? → Set. Do you need to look up a value by a meaningful key? → Map. More than 80% of collection decisions fall into one of these three, and the default implementations (ArrayList, HashSet, HashMap) are correct for the majority of each category.
When should you use EnumMap over HashMap?
When all keys are constants of a single enum type. EnumMap uses the enum constant's ordinal as a direct array index — no hashing, no collision chains, no resize. It is the fastest possible Map for enum keys and uses less memory than HashMap. Common cases: feature flags by environment enum, route dispatch by HTTP method enum, state machine transitions by state enum. new EnumMap<>(DayOfWeek.class) is measurably faster than HashMap<DayOfWeek, V> for this exact use case.
Is it ever correct to use a List where a Set would work?
Yes, when duplicates are semantically meaningful (the same item added twice to a cart represents two units), when positional access is required (the fifth search result is distinct from the first), or when the list is small enough that O(n) contains is acceptable (under about 50 elements where constant factors dominate). LinkedHashSet covers the case where uniqueness AND order are both required. If both properties are needed but the list is also small and gets iterated together: LinkedHashSet is still the right choice.
How do I convert a List to a Set for deduplication?
new LinkedHashSet<>(list) deduplicates while preserving first-occurrence order in O(n). new HashSet<>(list) deduplicates in O(n) with no order guarantee. To get an ArrayList back: new ArrayList<>(new LinkedHashSet<>(list)). This one-liner is the idiomatic Java pattern for "deduplicate this list while preserving order."
What collection should I use to count occurrences?
Map<T, Integer> is the standard structure. The merge() method makes counting clean: map.merge(element, 1, Integer::sum) inserts 1 if absent or adds 1 to the existing count. For concurrent counting from multiple threads, ConcurrentHashMap with the same merge() call is thread-safe. Collections.frequency(list, element) exists but is O(n) per call — use a Map for counting over the full collection.
What is the difference between PriorityQueue and ArrayDeque for task scheduling?
ArrayDeque processes tasks in FIFO order — the task that arrived first is processed first. PriorityQueue processes tasks in priority order — the task with the lowest value (or custom Comparator-defined priority) is always processed next, regardless of arrival order. Use ArrayDeque when "first come, first served" fairness is the requirement. Use PriorityQueue when urgency or weight drives the processing order — incident systems, Dijkstra's shortest path, top-K streaming algorithms.
Summary
Collection choice in Java is a four-step decision: what shape is the data, what operations dominate, what ordering and uniqueness constraints apply, and is thread safety required. ArrayList, HashSet, HashMap, and ArrayDeque cover the majority of single-threaded cases. LinkedHashSet and LinkedHashMap cover the insertion-order cases. TreeSet and TreeMap cover sorted-order and range-query cases. CopyOnWriteArrayList and ConcurrentHashMap cover the concurrent cases.
The highest-impact improvement from getting collection choice right is turning O(n²) List.contains() loops into O(n) Set.add() loops. The second most impactful is replacing parallel lists or scanned lists-of-objects with a Map for O(1) key-based lookup. Both improvements are straightforward once you can see the pattern — and both are invisible to the developer who defaults to ArrayList for every collection need.
For interviews: articulate the List/Set/Map decision rule clearly, explain why HashSet.contains() is O(1) while ArrayList.contains() is O(n), describe when LinkedHashSet/LinkedHashMap are needed, explain the TreeMap range-query API, and contrast ConcurrentHashMap with Collections.synchronizedMap(). These questions span TCS fresher rounds through Flipkart and Razorpay system design discussions.
What to Read Next
| Topic | Link |
|---|---|
| How ArrayList's resizable array implements the List contract internally | Java ArrayList → |
| How HashSet uses HashMap internally to enforce Set uniqueness in O(1) | Java HashSet → |
| How HashMap's hash table achieves O(1) key lookup with bucket chaining | Java HashMap → |
| How TreeMap's Red-Black tree enables sorted keys and range navigation | Java TreeMap → |
| How the Collections Framework organises all interfaces and implementations | Java Collections Framework → |