The Java Collections Framework has over twenty concrete implementations. Most developers default to ArrayList and HashMap for everything and move on. That works until it does not — until a list.contains() inside a loop silently turns an O(n) operation into O(n²), or a HashMap in a shared service causes a race condition that only reproduces under load, or a HashSet breaks an alphabetically sorted output in a client demo. Collection choice is an architectural decision. Getting it right at design time prevents a class of bugs that are hard to diagnose later.

The Complete Java Collections Landscape

JAVA COLLECTIONS FRAMEWORK — FULL MAP

  INTERFACES AND THEIR IMPLEMENTATIONS:

  Collection<E>                     ← root of element-holding collections
  │
  ├── List<E>                        ← ORDERED sequence, duplicates ALLOWED
  │   │   "What is at position i?"
  │   ├── ArrayList                  ← resizable array, O(1) get  [DEFAULT CHOICE]
  │   ├── LinkedList                 ← doubly-linked, O(1) head/tail, also Deque
  │   └── CopyOnWriteArrayList       ← thread-safe, snapshot read, write-rare
  │
  ├── Set<E>                         ← UNIQUE elements, NO duplicates
  │   │   "Have I seen this before?"
  │   ├── HashSet                    ← O(1) all ops, no order  [DEFAULT CHOICE]
  │   ├── LinkedHashSet              ← O(1), insertion order preserved
  │   └── TreeSet                    ← O(log n), SORTED, NavigableSet API
  │
  └── Queue<E> / Deque<E>           ← PROCESSING ORDER (FIFO, LIFO, priority)
      │   "What should I process next?"
      ├── ArrayDeque                 ← circular array, O(1) both ends  [DEFAULT CHOICE]
      ├── PriorityQueue              ← binary min-heap, priority order (NOT FIFO)
      ├── LinkedBlockingQueue        ← bounded/unbounded, BLOCKING, thread-safe
      └── PriorityBlockingQueue      ← priority + blocking, thread-safe

  Map<K, V>   (NOT a Collection subtype)
  │   "What is the value for key X?"
  ├── HashMap                        ← O(1) all ops, no order  [DEFAULT CHOICE]
  ├── LinkedHashMap                  ← O(1), insertion order (LRU with accessOrder)
  ├── TreeMap                        ← O(log n), SORTED keys, NavigableMap API
  ├── EnumMap                        ← array-backed, enum keys only, fastest
  └── ConcurrentHashMap              ← thread-safe, lock-free reads, bucket writes

  THREAD-SAFE ALTERNATIVES:
    CopyOnWriteArrayList             ← List,  snapshot iteration, write-rare
    ConcurrentHashMap                ← Map,   bucket-level locking, atomic ops
    ConcurrentHashMap.newKeySet()    ← Set,   backed by ConcurrentHashMap
    ConcurrentSkipListMap            ← sorted Map, thread-safe TreeMap equivalent
    ConcurrentSkipListSet            ← sorted Set, thread-safe TreeSet equivalent
    LinkedBlockingQueue              ← Queue, blocking producer-consumer
    ArrayBlockingQueue               ← Queue, bounded blocking, array-backed

Basic Overview — The Four Core Questions

Every collection decision maps to one of four questions. Answer the question first, then pick the implementation.

QUESTION 1: "Does position or order matter AND are duplicates natural?"
  → List
  Examples: ordered pipeline steps, search result rankings,
            shopping cart (same item twice = two entries),
            message history, undo/redo stack content

QUESTION 2: "Do I need to know whether I have seen this element?"
  → Set
  Examples: processed order IDs, visited URLs, active user IDs,
            unique tags, distinct product categories

QUESTION 3: "Do I need to look up a value by a meaningful identifier?"
  → Map
  Examples: userId → UserProfile, SKU → stockLevel,
            word → frequency, config key → config value,
            city → population

QUESTION 4: "Do I need to process items in a defined order?"
  → Queue / Deque
  Examples: FIFO request queue, LIFO undo stack,
            priority-based task scheduler,
            producer-consumer pipeline

  COMBINED STRUCTURES (when a single interface is not enough):
    Map<K, List<V>>   → group items by category (duplicates within group OK)
    Map<K, Set<V>>    → group items with uniqueness per group
    Map<K, Integer>   → count occurrences of each key
    Set of Map entries → same as Map.entrySet()

When to Use Each Interface

Choosing List

USE List WHEN:
  1. Sequence and position carry meaning
     — ordered execution: step 1 must run before step 2
     — ranked results: rank 1 is different from rank 5
     — chronological history: events must be replayed in order

  2. Duplicates are semantically correct
     — the same product added twice to a cart = two line items
     — the same error appearing three times in a log = three events

  3. Index-based access is needed
     — results.get(0) for the top result
     — pipeline.get(currentStage) for workflow position

LIST IMPLEMENTATION GUIDE:
  ArrayList   → default for almost everything
               O(1) get, O(1) amortised add at end
               cache-friendly contiguous memory

  LinkedList  → ONLY when frequent O(1) insertions AND removals
               at BOTH ends AND List interface is also needed
               O(n) get(index) makes it wrong for random access
               Use ArrayDeque instead for pure queue/stack

  CopyOnWriteArrayList → ONLY for concurrent, read-heavy,
               write-rare lists (listener registries, config)
               O(n) full array copy on every write

Choosing Set

USE Set WHEN:
  1. Membership testing is the primary operation
     — "Has this orderId been processed?"  → O(1) with HashSet
     — "Has this URL been crawled?"        → O(1) with HashSet
     — "Is this userId active?"            → O(1) with HashSet
     The same question with List: O(n) scan every time

  2. Deduplication is required
     — collecting unique tags from multiple articles
     — building a distinct visitor list from a raw event stream
     — ensuring no duplicate notification is sent

  3. Set algebra is needed
     — intersection (retainAll): skills a candidate has AND the job requires
     — difference (removeAll): required skills the candidate is missing
     — union (addAll): all skills from any of several candidates

SET IMPLEMENTATION GUIDE:
  HashSet       → default for speed, no ordering concern
                  O(1) average add/remove/contains

  LinkedHashSet → when insertion order must be preserved alongside uniqueness
                  "First-seen" dedup: keeps first occurrence position
                  Deterministic test output

  TreeSet       → when sorted order is needed, or range queries required
                  O(log n) operations
                  floor(), ceiling(), headSet(), tailSet()
                  Rejects null elements

Choosing Map

USE Map WHEN:
  1. Key-based lookup by a meaningful identifier
     — userId → session (not a slot in an array, an ID with business meaning)
     — productSku → inventoryCount
     — configKey → configValue

  2. Counting or frequency analysis
     — word → count: map.merge(word, 1, Integer::sum)
     — event type → occurrences

  3. Grouping elements by a category key
     — category → List<Product>
     — date → List<Event>
     — role → Set<Permission>

MAP IMPLEMENTATION GUIDE:
  HashMap         → default for almost all single-threaded use
                    O(1) average all ops, no ordering

  LinkedHashMap   → when insertion-order iteration matters
                    Deterministic API responses
                    LRU cache: new LinkedHashMap<>(n, 0.75f, true)
                               + override removeEldestEntry()

  TreeMap         → when sorted key order is required
                    Range views: headMap(), tailMap(), subMap()
                    Nearest key: floorKey(), ceilingKey()
                    O(log n) operations

  ConcurrentHashMap → for multi-threaded shared maps
                    Lock-free reads, per-bucket writes
                    Atomic compound ops: putIfAbsent, computeIfAbsent, merge

  EnumMap         → keys are enum constants only
                    Array-backed, no hashing needed, fastest possible

Choosing Queue or Deque

USE Queue / Deque WHEN:
  1. Processing order is the concern, not lookup
     — FIFO: process requests in arrival order
     — LIFO: undo the most recent action first
     — Priority: always process the highest-urgency item next

  2. Both-ends access is needed simultaneously
     — sliding window algorithms
     — work-stealing schedulers

QUEUE/DEQUE IMPLEMENTATION GUIDE:
  ArrayDeque      → DEFAULT for all single-threaded queue and stack use
                    Replaces java.util.Stack (Stack is synchronised even
                    in single-threaded use — legacy overhead)
                    O(1) both ends, no per-element allocation

  PriorityQueue   → when items have varying priority
                    Always returns the minimum (or max with reverseOrder())
                    NOT FIFO — insertion order is irrelevant
                    Used in Dijkstra, A*, top-K problems

  LinkedBlockingQueue → producer-consumer with multiple threads
                    take() blocks consumers when empty
                    put() blocks producers when full (bounded)

  ArrayBlockingQueue → bounded blocking queue for back-pressure control
                    Fixed capacity set at construction

The Decision Framework — Four Steps

STEP 1: CHOOSE THE INTERFACE

  Is order/position meaningful AND duplicates are natural?
    → List

  Is uniqueness the core constraint?
    → Set

  Is lookup by a meaningful key the primary operation?
    → Map

  Is processing order the concern (FIFO, LIFO, priority)?
    → Queue or Deque

STEP 2: CHOOSE ORDERING WITHIN THE INTERFACE

  No ordering needed?
    → ArrayList / HashSet / HashMap / ArrayDeque

  Insertion order must be preserved?
    → ArrayList (always) / LinkedHashSet / LinkedHashMap

  Sorted natural or comparator order?
    → TreeSet / TreeMap (and NavigableSet/NavigableMap API)

  Priority order (smallest or custom comparator first)?
    → PriorityQueue

STEP 3: CHECK THREAD-SAFETY REQUIREMENT

  Single-threaded only?
    → Standard java.util implementations (faster, less memory)

  Multi-threaded, read-heavy, write-rare?
    → CopyOnWriteArrayList / ConcurrentHashMap.newKeySet()

  Multi-threaded, balanced reads and writes?
    → ConcurrentHashMap / ConcurrentSkipListMap

  Multi-threaded producer-consumer queue?
    → LinkedBlockingQueue / ArrayBlockingQueue

STEP 4: CHECK SIZE AND PERFORMANCE CONSTRAINTS

  Expected size < 50 elements?
    → Any structure works; standard implementations fine

  Expected size in thousands with frequent lookup?
    → Hash-based (HashSet, HashMap) for O(1) or tree-based for sorted

  Expected size in millions?
    → Pre-size: new HashMap<>(n / 0.75 + 1), new ArrayList<>(n)
    → Avoid CopyOnWriteArrayList (O(n) write copies entire large array)
    → Consider memory: LinkedList ~28 bytes per node vs ArrayList ~4 bytes

Core Operations with Examples

The Four Core Patterns Side by Side

Java

// File: CollectionSelectionDemo.java

import java.util.*;
import java.util.stream.Collectors;

public class CollectionSelectionDemo {

    public static void main(String[] args) {

        // PATTERN 1: List — ordered history, duplicates are real events
        System.out.println("=== List — search history (order + duplicates) ===");
        List<String> searchHistory = new ArrayList<>();
        searchHistory.add("java collections");
        searchHistory.add("hashmap tutorial");
        searchHistory.add("java collections"); // same search repeated — real event
        searchHistory.add("arraylist vs linkedlist");
        System.out.println("History  : " + searchHistory);
        System.out.println("Count    : " + searchHistory.size()); // 4, not 3
        System.out.println("Recent   : " + searchHistory.get(searchHistory.size() - 1));

        System.out.println();

        // PATTERN 2: Set — deduplication + O(1) membership
        System.out.println("=== Set — processed notifications (uniqueness) ===");
        Set<String> dispatched = new HashSet<>();
        String[] incoming = {"NTF-001","NTF-002","NTF-001","NTF-003","NTF-002"};
        int sent = 0, skipped = 0;
        for (String notifId : incoming) {
            if (dispatched.add(notifId)) { // false = duplicate, already dispatched
                sent++;
                System.out.println("  SENT   : " + notifId);
            } else {
                skipped++;
                System.out.println("  SKIPPED: " + notifId + " (already dispatched)");
            }
        }
        System.out.printf("Result: %d sent, %d skipped%n", sent, skipped);

        System.out.println();

        // PATTERN 3: Map — key-value lookup
        System.out.println("=== Map — user role lookup (O(1) by key) ===");
        Map<String, String> userRoles = new HashMap<>();
        userRoles.put("priya@corp.com",  "ADMIN");
        userRoles.put("rohan@corp.com",  "DEVELOPER");
        userRoles.put("ananya@corp.com", "VIEWER");
        String[] requesters = {"priya@corp.com", "karan@corp.com", "rohan@corp.com"};
        for (String email : requesters) {
            System.out.printf("  %-20s → %s%n",
                email, userRoles.getOrDefault(email, "GUEST"));
        }

        System.out.println();

        // PATTERN 4: Queue — FIFO processing order
        System.out.println("=== Queue — task processing (FIFO order) ===");
        Queue<String> taskQueue = new ArrayDeque<>();
        taskQueue.offer("TASK-A");
        taskQueue.offer("TASK-B");
        taskQueue.offer("TASK-C");
        System.out.println("Queue: " + taskQueue);
        while (!taskQueue.isEmpty()) {
            System.out.println("  Processing: " + taskQueue.poll()); // FIFO — A first
        }
    }
}

Output:
=== List — search history (order + duplicates) ===
History  : [java collections, hashmap tutorial, java collections, arraylist vs linkedlist]
Count    : 4
Recent   : arraylist vs linkedlist

=== Set — processed notifications (uniqueness) ===
  SENT   : NTF-001
  SENT   : NTF-002
  SKIPPED: NTF-001 (already dispatched)
  SENT   : NTF-003
  SKIPPED: NTF-002 (already dispatched)
Result: 3 sent, 2 skipped

=== Map — user role lookup (O(1) by key) ===
  priya@corp.com       → ADMIN
  karan@corp.com       → GUEST
  rohan@corp.com       → DEVELOPER

=== Queue — task processing (FIFO order) ===
Queue: [TASK-A, TASK-B, TASK-C]
  Processing: TASK-A
  Processing: TASK-B
  Processing: TASK-C

Ordering and Thread-Safety Variants

Java

// File: OrderingVariantsDemo.java

import java.util.*;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

public class OrderingVariantsDemo {

    public static void main(String[] args) {

        // Variant 1: LinkedHashSet — unique + insertion order
        System.out.println("=== LinkedHashSet — unique tags, first-seen order ===");
        Set<String> uniqueTags = new LinkedHashSet<>();
        String[] raw = {"java","spring","java","kafka","spring","redis","java"};
        for (String tag : raw) uniqueTags.add(tag);
        System.out.println("Input : " + Arrays.toString(raw));
        System.out.println("Result: " + uniqueTags); // java, spring, kafka, redis

        System.out.println();

        // Variant 2: TreeMap — sorted keys + range navigation
        System.out.println("=== TreeMap — sorted price tiers + range queries ===");
        TreeMap<Integer, String> priceTiers = new TreeMap<>();
        priceTiers.put(199,  "Basic");
        priceTiers.put(499,  "Standard");
        priceTiers.put(999,  "Premium");
        priceTiers.put(2999, "Enterprise");

        int budget = 600;
        System.out.println("Sorted tiers: " + priceTiers);
        System.out.println("Best within Rs." + budget + " : " + priceTiers.floorKey(budget));   // 499
        System.out.println("Next above Rs." + budget + "  : " + priceTiers.ceilingKey(budget)); // 999
        System.out.println("Tiers <= Rs." + budget + "    : " + priceTiers.headMap(budget + 1));

        System.out.println();

        // Variant 3: ConcurrentHashMap — thread-safe shared map
        System.out.println("=== ConcurrentHashMap — thread-safe counter ===");
        Map<String, Integer> concurrentCounter = new ConcurrentHashMap<>();
        // merge() is atomic — safe to call from multiple threads simultaneously
        String[] events = {"LOGIN","VIEW","LOGIN","PURCHASE","VIEW","LOGIN"};
        for (String event : events) {
            concurrentCounter.merge(event, 1, Integer::sum);
        }
        System.out.println("Event counts: " + concurrentCounter);

        System.out.println();

        // Variant 4: CopyOnWriteArrayList — thread-safe listener list
        System.out.println("=== CopyOnWriteArrayList — thread-safe listeners ===");
        List<String> listeners = new CopyOnWriteArrayList<>();
        listeners.add("AuditLogger");
        listeners.add("MetricsSender");
        listeners.add("EmailNotifier");
        System.out.println("Registered: " + listeners);
        // Iteration never throws CME even if another thread adds/removes
        for (String listener : listeners) {
            System.out.println("  Notifying: " + listener);
        }
    }
}

Output:
=== LinkedHashSet — unique tags, first-seen order ===
Input : [java, spring, java, kafka, spring, redis, java]
Result: [java, spring, kafka, redis]

=== TreeMap — sorted price tiers + range queries ===
Sorted tiers: {199=Basic, 499=Standard, 999=Premium, 2999=Enterprise}
Best within Rs.600  : 499
Next above Rs.600   : 999
Tiers <= Rs.600     : {199=Basic, 499=Standard}

=== ConcurrentHashMap — thread-safe counter ===
Event counts: {PURCHASE=1, LOGIN=3, VIEW=2}

=== CopyOnWriteArrayList — thread-safe listeners ===
Registered: [AuditLogger, MetricsSender, EmailNotifier]
  Notifying: AuditLogger
  Notifying: MetricsSender
  Notifying: EmailNotifier

Real-World Example — CRED Rewards and Offer Engine

CRED's offer engine demonstrates four different collection choices in one cohesive system. An ArrayList holds the ordered offer pipeline steps where sequence is fixed by business rules. A HashSet tracks which user IDs have already received a particular offer for idempotency. A LinkedHashMap maintains the user's reward history in insertion order for display. A TreeMap indexes offer tiers by minimum credit score for range-based eligibility lookup.

Java

// File: CreditOffer.java

public record CreditOffer(
        String offerId,
        String title,
        double cashbackPercent,
        int minCreditScore) {

    @Override
    public String toString() {
        return String.format("[%s] %-28s %.1f%% cashback (min score: %d)",
            offerId, title, cashbackPercent, minCreditScore);
    }
}

Java

// File: OfferEngine.java

import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class OfferEngine {

    // TreeMap — O(log n) range lookup: "best offer for credit score X"
    private final TreeMap<Integer, CreditOffer> offersByMinScore = new TreeMap<>();

    // HashSet — O(1) idempotency: "has this user received this offer?"
    private final Set<String> dispatchedKeys = new HashSet<>();

    // LinkedHashMap — insertion-order reward history per user for display
    private final Map<String, List<String>> userRewardHistory = new LinkedHashMap<>();

    // ArrayList — ordered offer pipeline: validation steps run in fixed sequence
    private final List<String> pipelineSteps = new ArrayList<>();

    public void registerOffer(CreditOffer offer) {
        offersByMinScore.put(offer.minCreditScore(), offer);
    }

    public void initPipeline() {
        pipelineSteps.add("VALIDATE_USER");
        pipelineSteps.add("CHECK_CREDIT_SCORE");
        pipelineSteps.add("FIND_ELIGIBLE_OFFER");
        pipelineSteps.add("CHECK_IDEMPOTENCY");
        pipelineSteps.add("DISPATCH_OFFER");
        pipelineSteps.add("RECORD_HISTORY");
    }

    public void processUser(String userId, int creditScore) {

        System.out.println("  Processing: userId=" + userId + " score=" + creditScore);

        // ArrayList — iterate ordered pipeline (position = execution order)
        for (String step : pipelineSteps) {
            System.out.println("    [" + step + "]");

            if ("FIND_ELIGIBLE_OFFER".equals(step)) {
                // TreeMap floorEntry — best offer whose minScore <= user's score
                Map.Entry<Integer, CreditOffer> entry = offersByMinScore.floorEntry(creditScore);
                if (entry == null) {
                    System.out.println("      No eligible offer found. Stopping.");
                    return;
                }
                System.out.println("      Eligible: " + entry.getValue());

            } else if ("CHECK_IDEMPOTENCY".equals(step)) {
                String dispatchKey = userId + ":" + offersByMinScore.floorKey(creditScore);
                // HashSet.add() — returns false if already dispatched (O(1))
                if (!dispatchedKeys.add(dispatchKey)) {
                    System.out.println("      DUPLICATE — offer already sent. Stopping.");
                    return;
                }
                System.out.println("      Idempotency key registered: " + dispatchKey);

            } else if ("RECORD_HISTORY".equals(step)) {
                // LinkedHashMap + ArrayList — ordered reward history per user
                CreditOffer offer = offersByMinScore.floorEntry(creditScore).getValue();
                userRewardHistory
                    .computeIfAbsent(userId, k -> new ArrayList<>())
                    .add(offer.title() + " (" + offer.cashbackPercent() + "%)");
                System.out.println("      History recorded.");
            }
        }
        System.out.println("  DONE.\n");
    }

    public void printHistory() {
        System.out.println("=".repeat(62));
        System.out.println("  USER REWARD HISTORY (insertion order)");
        System.out.println("=".repeat(62));
        // LinkedHashMap iterates in user-first-seen order
        userRewardHistory.forEach((userId, rewards) -> {
            System.out.println("  " + userId + ":");
            rewards.forEach(r -> System.out.println("    - " + r));
        });
        System.out.println("=".repeat(62));
    }

    public static void main(String[] args) {

        OfferEngine engine = new OfferEngine();
        engine.initPipeline();

        engine.registerOffer(new CreditOffer("OFR-01", "Silver Cashback Card",  1.5, 600));
        engine.registerOffer(new CreditOffer("OFR-02", "Gold Rewards Program",  3.0, 720));
        engine.registerOffer(new CreditOffer("OFR-03", "Platinum Elite Offer",  5.5, 800));
        engine.registerOffer(new CreditOffer("OFR-04", "Black Card Privileges", 8.0, 850));

        System.out.println("--- Processing users ---\n");
        engine.processUser("user-priya",  820); // eligible for Platinum
        engine.processUser("user-rohan",  740); // eligible for Gold
        engine.processUser("user-priya",  820); // duplicate — idempotency check stops it
        engine.processUser("user-ananya", 560); // score too low for any offer
        engine.processUser("user-karan",  860); // eligible for Black Card

        engine.printHistory();

        System.out.println("\n--- Collection roles in this system ---");
        System.out.println("ArrayList  : ordered pipeline steps (sequence = business rule)");
        System.out.println("HashSet    : O(1) idempotency check (no duplicate offers)");
        System.out.println("TreeMap    : O(log n) score range lookup (floorEntry)");
        System.out.println("LinkedHashMap + ArrayList: insertion-order reward history per user");
    }
}

Output:
--- Processing users ---

  Processing: userId=user-priya score=820
    [VALIDATE_USER]
    [CHECK_CREDIT_SCORE]
    [FIND_ELIGIBLE_OFFER]
      Eligible: [OFR-03] Platinum Elite Offer         5.5% cashback (min score: 800)
    [CHECK_IDEMPOTENCY]
      Idempotency key registered: user-priya:800
    [DISPATCH_OFFER]
    [RECORD_HISTORY]
      History recorded.
  DONE.

  Processing: userId=user-rohan score=740
    [VALIDATE_USER]
    [CHECK_CREDIT_SCORE]
    [FIND_ELIGIBLE_OFFER]
      Eligible: [OFR-02] Gold Rewards Program         3.0% cashback (min score: 720)
    [CHECK_IDEMPOTENCY]
      Idempotency key registered: user-rohan:720
    [DISPATCH_OFFER]
    [RECORD_HISTORY]
      History recorded.
  DONE.

  Processing: userId=user-priya score=820
    [VALIDATE_USER]
    [CHECK_CREDIT_SCORE]
    [FIND_ELIGIBLE_OFFER]
      Eligible: [OFR-03] Platinum Elite Offer         5.5% cashback (min score: 800)
    [CHECK_IDEMPOTENCY]
      DUPLICATE — offer already sent. Stopping.

  Processing: userId=user-ananya score=560
    [VALIDATE_USER]
    [CHECK_CREDIT_SCORE]
    [FIND_ELIGIBLE_OFFER]
      No eligible offer found. Stopping.

  Processing: userId=user-karan score=860
    [VALIDATE_USER]
    [CHECK_CREDIT_SCORE]
    [FIND_ELIGIBLE_OFFER]
      Eligible: [OFR-04] Black Card Privileges        8.0% cashback (min score: 850)
    [CHECK_IDEMPOTENCY]
      Idempotency key registered: user-karan:850
    [DISPATCH_OFFER]
    [RECORD_HISTORY]
      History recorded.
  DONE.

==============================================================
  USER REWARD HISTORY (insertion order)
==============================================================
  user-priya:
    - Platinum Elite Offer (5.5%)
  user-rohan:
    - Gold Rewards Program (3.0%)
  user-karan:
    - Black Card Privileges (8.0%)
==============================================================

--- Collection roles in this system ---
ArrayList  : ordered pipeline steps (sequence = business rule)
HashSet    : O(1) idempotency check (no duplicate offers)
TreeMap    : O(log n) score range lookup (floorEntry)
LinkedHashMap + ArrayList: insertion-order reward history per user

Performance Considerations

Collection	Add	Get/Contains	Remove	Ordered?	Thread-safe?
ArrayList	O(1) amort	O(n) contains, O(1) get(i)	O(n)	Insertion	No
LinkedList	O(1) ends	O(n)	O(1) ptr	Insertion	No
HashSet	O(1) avg	O(1) avg	O(1) avg	No	No
LinkedHashSet	O(1) avg	O(1) avg	O(1) avg	Insertion	No
TreeSet	O(log n)	O(log n)	O(log n)	Sorted	No
HashMap	O(1) avg	O(1) avg	O(1) avg	No	No
LinkedHashMap	O(1) avg	O(1) avg	O(1) avg	Insertion	No
TreeMap	O(log n)	O(log n)	O(log n)	Sorted keys	No
ArrayDeque	O(1) amort	O(n) contains	O(1) ends	Insertion	No
PriorityQueue	O(log n)	O(1) peek	O(log n)	Priority	No
CopyOnWriteArrayList	O(n)	O(n) contains, O(1) get(i)	O(n)	Insertion	Yes
ConcurrentHashMap	O(1) avg	O(1) avg	O(1) avg	No	Yes

The O(n) contains trap is the most costly collection mistake. ArrayList.contains() scans every element. For 10,000 items and 10,000 lookups: 100,000,000 comparisons. HashSet.contains() on the same data: 10,000 operations. Replacing a List with a Set for membership-test-heavy code is the single highest-impact collection change in production Java. During code reviews, this pattern — list.contains(item) inside a loop — is a near-automatic refactor to Set.

Best Practices

Choose the narrowest interface that satisfies your requirements. Declare List<E>, Set<E>, Map<K,V> — not ArrayList, HashSet, HashMap. This disciplines callers to the intended contract and lets you swap implementations without changing signatures. The only exception: when you need an implementation-specific method (e.g., TreeMap.floorKey() or ArrayDeque.push()), declare the next most specific interface (NavigableMap, Deque) rather than the concrete class.

Identify the dominant operation and choose accordingly. Is contains() called frequently? Use Set or Map. Is get(index) critical? Use ArrayList. Is add/remove at both ends frequent? Use ArrayDeque. Is sorted order always required? Use TreeSet or TreeMap. Is order-preserving deduplication needed? Use LinkedHashSet. Let the access pattern drive the choice — do not default to familiar classes.

Pre-size hash-based collections when the expected entry count is known. new HashMap<>(expectedSize / 0.75 + 1) allocates the backing array large enough to avoid any resize. new ArrayList<>(expectedSize) avoids O(n) array copies during growth. For a HashMap loading 100,000 entries, pre-sizing eliminates seven resize operations — each of which copies every existing entry.

For grouping, always use computeIfAbsent(). map.computeIfAbsent(key, k -> new ArrayList<>()).add(value) is atomic and clear. The three-line alternative — containsKey check, construct, put, then get — is both verbose and has a subtle issue: the get() after put() is a second hash lookup. computeIfAbsent() returns the existing or newly created collection in a single operation.

Common Mistakes

Mistake 1 — Using List.contains() in a Loop Instead of a Set

Java

// WRONG — O(n²): contains() scans the full list on every iteration
List<String> processedIds = new ArrayList<>();
for (String incoming : millionsOfIncomingIds) {
    if (!processedIds.contains(incoming)) { // O(n) scan each time
        processedIds.add(incoming);
        process(incoming);
    }
}

// CORRECT — O(n): HashSet.add() returns false for duplicates in O(1)
Set<String> processedIds2 = new HashSet<>();
for (String incoming : millionsOfIncomingIds) {
    if (processedIds2.add(incoming)) { // O(1): false = duplicate, already seen
        process(incoming);
    }
}

Mistake 2 — Choosing HashMap When Order Is Required

Java

// WRONG — HashMap iteration order is undefined and can change after resize
Map<String, Integer> configDisplay = new HashMap<>();
configDisplay.put("timeout",  30);
configDisplay.put("retries",  3);
configDisplay.put("baseUrl",  443);
// Iterating for display: order is random — timeout may appear last

// CORRECT — LinkedHashMap when insertion order must be preserved
Map<String, Integer> configOrdered = new java.util.LinkedHashMap<>();
configOrdered.put("timeout", 30);   // iterates: timeout, retries, baseUrl
configOrdered.put("retries", 3);    // always in this sequence
configOrdered.put("baseUrl", 443);

Mistake 3 — Using java.util.Stack Instead of ArrayDeque

Java

// WRONG — Stack extends Vector, every push/pop acquires a global mutex
// even in single-threaded code
java.util.Stack<String> callStack = new java.util.Stack<>();
callStack.push("main()");
callStack.push("processOrder()");
String top = callStack.pop(); // acquires lock — unnecessary single-thread overhead

// CORRECT — ArrayDeque: same LIFO semantics, zero synchronisation cost
Deque<String> callStack2 = new ArrayDeque<>();
callStack2.push("main()");
callStack2.push("processOrder()");
String top2 = callStack2.pop(); // O(1) amortised, no lock

Mistake 4 — Using a Sorted Collection When Sorting Once on Output Is Sufficient

Java

// WRONG — TreeSet pays O(log n) on every insert when sorted output
// is only needed once at the end
Set<String> accumulator = new TreeSet<>();
for (String item : millionsOfInserts) {
    accumulator.add(item); // O(log n) per insert → O(n log n) total with tree overhead
}

// CORRECT — collect into HashSet O(1) per insert, sort once at the end
Set<String> accumulator2 = new HashSet<>();
for (String item : millionsOfInserts) {
    accumulator2.add(item); // O(1) per insert → O(n) total
}
List<String> sorted = new ArrayList<>(accumulator2);
java.util.Collections.sort(sorted); // O(n log n) once at the end
// Same total asymptotic cost, but lower constant factors than TreeSet for bulk load

Interview Questions

Q1. How do you decide which Java collection to use?

Start with the shape of the data: List for ordered sequences where duplicates are natural, Set for unique element membership, Map for key-value lookup, Queue/Deque for processing order. Then choose the implementation based on three factors: ordering requirement (no-order → Hash, insertion-order → Linked, sorted → Tree), thread-safety requirement (single-threaded → standard java.util, multi-threaded → java.util.concurrent), and dominant operation (frequent contains() → hash-based, range queries → tree-based, positional access → ArrayList).

Q2. When should you replace a List with a Set or Map in production code?

Replace List with Set when list.contains(element) is called in a loop or repeatedly — this is O(n) per call, O(n²) total. A Set makes contains() O(1). Replace List with Map when the code scans a list by comparing some field value to find the matching entry — for example, list.stream().filter(u -> u.getId().equals(id)).findFirst(). That is O(n) per lookup. A Map<String, User> keyed by id gives O(1). Both substitutions are high-impact and appear routinely in code reviews.

Q3. What is the difference between HashSet, LinkedHashSet, and TreeSet?

All three implement Set and reject duplicate elements. HashSet uses a hash table — O(1) average for all operations, no ordering guarantee, fastest for pure membership testing. LinkedHashSet adds a doubly-linked list threading all entries in insertion sequence — O(1) operations, predictable iteration order. TreeSet uses a Red-Black tree — O(log n) for all operations, always iterates in sorted natural or Comparator order, and provides NavigableSet range methods (first(), last(), headSet(), tailSet(), floor(), ceiling()).

Q4. When would you use TreeMap over HashMap?

Use TreeMap when sorted key order is required for iteration, when range views are needed (headMap(toKey), tailMap(fromKey), subMap(from, to)), or when nearest-key navigation is needed (floorKey(), ceilingKey()). Common use cases: time-series data keyed by timestamp, price tier lookup by score range, alphabetically sorted menus, and any use case where "give me all entries between A and B" is a real operation. HashMap is O(1) but cannot answer range or order questions. TreeMap is O(log n) but can.

Q5. What are the thread-safe collection choices and when should each be used?

For a thread-safe List with read-heavy, write-rare workloads: CopyOnWriteArrayList — snapshot iteration, never CME, O(n) write. For a thread-safe Map with concurrent reads and writes: ConcurrentHashMap — lock-free reads, per-bucket writes, atomic compound operations. For a thread-safe sorted Map: ConcurrentSkipListMap. For a producer-consumer queue with blocking semantics: LinkedBlockingQueue or ArrayBlockingQueue. Never use Hashtable or Collections.synchronizedMap() in new code — both hold a global lock on every method, serialising all access.

Q6. Why is ArrayDeque recommended over Stack and LinkedList for queue and stack operations?

java.util.Stack extends Vector, which synchronises every method using a global mutex — push() and pop() acquire a lock on every call even in single-threaded code. This is pure overhead. LinkedList allocates a Node object per element, increasing GC pressure and causing CPU cache misses on iteration since nodes are scattered in heap memory. ArrayDeque uses a circular resizable array — contiguous memory, cache-friendly, O(1) amortised at both ends, zero synchronisation cost. The Java documentation explicitly recommends ArrayDeque as the preferred replacement for both.

FAQs

What is the simplest rule for choosing between List, Set, and Map?

Three questions: Does the order of elements matter and are duplicates valid data? → List. Is uniqueness the constraint and do you need O(1) membership testing? → Set. Do you need to look up a value by a meaningful key? → Map. More than 80% of collection decisions fall into one of these three, and the default implementations (ArrayList, HashSet, HashMap) are correct for the majority of each category.

When should you use EnumMap over HashMap?

When all keys are constants of a single enum type. EnumMap uses the enum constant's ordinal as a direct array index — no hashing, no collision chains, no resize. It is the fastest possible Map for enum keys and uses less memory than HashMap. Common cases: feature flags by environment enum, route dispatch by HTTP method enum, state machine transitions by state enum. new EnumMap<>(DayOfWeek.class) is measurably faster than HashMap<DayOfWeek, V> for this exact use case.

Is it ever correct to use a List where a Set would work?

Yes, when duplicates are semantically meaningful (the same item added twice to a cart represents two units), when positional access is required (the fifth search result is distinct from the first), or when the list is small enough that O(n) contains is acceptable (under about 50 elements where constant factors dominate). LinkedHashSet covers the case where uniqueness AND order are both required. If both properties are needed but the list is also small and gets iterated together: LinkedHashSet is still the right choice.

How do I convert a List to a Set for deduplication?

new LinkedHashSet<>(list) deduplicates while preserving first-occurrence order in O(n). new HashSet<>(list) deduplicates in O(n) with no order guarantee. To get an ArrayList back: new ArrayList<>(new LinkedHashSet<>(list)). This one-liner is the idiomatic Java pattern for "deduplicate this list while preserving order."

What collection should I use to count occurrences?

Map<T, Integer> is the standard structure. The merge() method makes counting clean: map.merge(element, 1, Integer::sum) inserts 1 if absent or adds 1 to the existing count. For concurrent counting from multiple threads, ConcurrentHashMap with the same merge() call is thread-safe. Collections.frequency(list, element) exists but is O(n) per call — use a Map for counting over the full collection.

What is the difference between PriorityQueue and ArrayDeque for task scheduling?

ArrayDeque processes tasks in FIFO order — the task that arrived first is processed first. PriorityQueue processes tasks in priority order — the task with the lowest value (or custom Comparator-defined priority) is always processed next, regardless of arrival order. Use ArrayDeque when "first come, first served" fairness is the requirement. Use PriorityQueue when urgency or weight drives the processing order — incident systems, Dijkstra's shortest path, top-K streaming algorithms.

Summary

Collection choice in Java is a four-step decision: what shape is the data, what operations dominate, what ordering and uniqueness constraints apply, and is thread safety required. ArrayList, HashSet, HashMap, and ArrayDeque cover the majority of single-threaded cases. LinkedHashSet and LinkedHashMap cover the insertion-order cases. TreeSet and TreeMap cover sorted-order and range-query cases. CopyOnWriteArrayList and ConcurrentHashMap cover the concurrent cases.

The highest-impact improvement from getting collection choice right is turning O(n²) List.contains() loops into O(n) Set.add() loops. The second most impactful is replacing parallel lists or scanned lists-of-objects with a Map for O(1) key-based lookup. Both improvements are straightforward once you can see the pattern — and both are invisible to the developer who defaults to ArrayList for every collection need.

For interviews: articulate the List/Set/Map decision rule clearly, explain why HashSet.contains() is O(1) while ArrayList.contains() is O(n), describe when LinkedHashSet/LinkedHashMap are needed, explain the TreeMap range-query API, and contrast ConcurrentHashMap with Collections.synchronizedMap(). These questions span TCS fresher rounds through Flipkart and Razorpay system design discussions.

What to Read Next

Topic	Link
How ArrayList's resizable array implements the List contract internally	Java ArrayList →
How HashSet uses HashMap internally to enforce Set uniqueness in O(1)	Java HashSet →
How HashMap's hash table achieves O(1) key lookup with bucket chaining	Java HashMap →
How TreeMap's Red-Black tree enables sorted keys and range navigation	Java TreeMap →
How the Collections Framework organises all interfaces and implementations	Java Collections Framework →

←

PreviousList vs Set vs Map

NextComparable

→