217. Contains Duplicate

LeetCode: Problem #217
Difficulty: Easy
Topics: Array Hash Map

Problem Statement

Given an integer array nums, return true if any value appears at least twice in the array, and return false if every element is distinct.

Constraints:

1 <= nums.length <= 10⁵
-10⁹ <= nums[i] <= 10⁹

Example

Input:  nums = [1, 2, 3, 1]
Output: true
Explanation: 1 appears at index 0 and index 3

Input:  nums = [1, 2, 3, 4]
Output: false
Explanation: all elements are distinct

Input:  nums = [1, 1, 1, 3, 3, 4, 3, 2, 4, 2]
Output: true
Explanation: multiple duplicates exist

How to Think About This Problem

Step 1 — Understand what's being asked

We need to detect whether any number shows up more than once. The moment we find a single repeat, we can return true immediately. If we finish scanning the entire array without finding one, we return false.

Notice: we don't need to find ALL duplicates, count them, or report their positions. We just need to answer "yes or no — does any duplicate exist?"

Step 2 — Identify the constraint that matters

The constraint that makes this interesting is scale: up to 100,000 elements. A naive approach that checks every element against every other element requires O(n²) comparisons — up to 10 billion operations for the largest inputs. That's too slow.

The real question becomes: how do we check "have I seen this number before?" in less than O(n) time per lookup?

Step 3 — Think about data structures

When you need fast membership testing ("is X already in my collection?"), a hash set is the right tool.

List membership check: O(n) — scans the entire list
Hash set membership check: O(1) — computes a hash and checks one bucket

By maintaining a set of numbers we've already seen, we can answer "seen before?" in constant time for every element.

Step 4 — Build the intuition

Imagine you're a bouncer at a club with a guest list. As people arrive, you check their name against your notepad. If the name is on the notepad, they've already been in — that's a duplicate. If not, you write it down and let them through.

The notepad is your hash set. You never scan the whole notepad — you just check the relevant entry. That's the O(1) lookup guarantee.

Approaches

Approach 1 — Brute Force

Intuition: Compare every possible pair of elements and check if any two are equal.

Steps:

For each element at index i, loop through all elements at index j > i
If nums[i] == nums[j], a duplicate exists — return true
If no match found after all pairs, return false

Illustration:

nums = [1, 2, 3, 1]

i=0 (num=1):
  j=1: 1 == 2? No
  j=2: 1 == 3? No
  j=3: 1 == 1? YES → return True ✓

(If no match:)
i=1 (num=2):
  j=2: 2 == 3? No
  j=3: 2 == 1? No
i=2 (num=3):
  j=3: 3 == 1? No
→ return False

Complexity:

Time: O(n²) — for each of n elements, we scan up to n-1 others
Space: O(1) — no extra data structure used

Code:

def contains_duplicates_brute_force(nums: list[int]) -> bool:
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            if nums[i] == nums[j]:
                return True
    return False

Approach 2 — Optimal (Hash Set)

Intuition: Walk through the array once, tracking every number in a set; the moment we encounter a number already in the set, we have our duplicate.

Steps:

Create an empty set seen
For each number in nums:
- If num is already in seen → return True
- Otherwise → add num to seen
If the loop completes → return False

Illustration:

nums = [1, 2, 3, 1]

num=1: seen={}       → 1 not in seen → seen={1}
num=2: seen={1}      → 2 not in seen → seen={1, 2}
num=3: seen={1, 2}   → 3 not in seen → seen={1, 2, 3}
num=1: seen={1,2,3}  → 1 IS in seen  → return True ✓

nums = [1, 2, 3, 4]

num=1: seen={}         → seen={1}
num=2: seen={1}        → seen={1, 2}
num=3: seen={1, 2}     → seen={1, 2, 3}
num=4: seen={1, 2, 3}  → seen={1, 2, 3, 4}
Loop ends             → return False ✓

Complexity:

Time: O(n) — single pass, O(1) lookup and insert per element
Space: O(n) — set grows up to n elements in the worst case (all unique)

Code:

def contains_duplicates_hash_map(nums: list[int]) -> bool:
    seen = set()
    for num in nums:
        if num in seen:
            return True
        seen.add(num)
    return False

Approach 3 — Length Comparison (One-Liner)

Intuition: A set automatically removes duplicates; if the set is smaller than the original array, at least one duplicate existed.

Steps:

Convert nums to a set (all duplicates removed)
Compare the length of the set with the original array length
If lengths differ, duplicates existed

Illustration:

nums = [1, 2, 3, 1]

set(nums) = {1, 2, 3}   → length 3
len(nums) = 4

3 != 4 → True ✓

nums = [1, 2, 3, 4]

set(nums) = {1, 2, 3, 4} → length 4
len(nums) = 4

4 == 4 → False ✓

Complexity:

Time: O(n) — building the set requires one pass
Space: O(n) — the set holds up to n elements

Code:

def contains_duplicates_len(nums: list[int]) -> bool:
    return len(nums) != len(set(nums))

Note: This approach always processes the entire array before returning, even if a duplicate appears at index 1. Approach 2 short-circuits on the first duplicate found. For most interview settings, Approach 2 is preferred.

Solution Breakdown — Step by Step

def contains_duplicates_hash_map(nums: list[int]) -> bool:
    seen = set()
    for num in nums:
        if num in seen:
            return True
        seen.add(num)
    return False

Line by line:

seen = set()

A hash set that tracks every distinct number we've visited
Unlike a list, membership testing (in) is O(1) — hash-based lookup
We don't need to store indices here (contrast with Two Sum), just presence

for num in nums:

Iterate directly over values — we don't need indices, so no enumerate
Each iteration processes one number and makes an O(1) decision

if num in seen:

O(1) hash set lookup — Python computes hash(num) and checks one bucket
If this returns True, we found the same number twice — work is done

return True

Early exit: we return the moment the first duplicate is confirmed
No need to scan the rest of the array

seen.add(num)

Record this number so future iterations can detect it as a duplicate
Critical: this comes AFTER the membership check — if we added first, every number would falsely appear to be a duplicate of itself

return False

Only reached if the loop finishes without finding any duplicate
Means all elements in nums were distinct

Common Mistakes

1. Using a list instead of a set

# WRONG — O(n) lookup, total becomes O(n²)
seen = []
if num in seen:  # scans every element of seen
    return True
seen.append(num)

Always use set() for membership testing. This is the entire performance difference.

2. Adding before checking

# WRONG — every number immediately "duplicates" itself
seen.add(num)
if num in seen:  # always True!
    return True

Always check membership before adding. Add is the fallback when no match is found.

3. Checking length without understanding the trade-off

# Works but doesn't short-circuit
return len(nums) != len(set(nums))

This is fine for correctness but builds the full set even when nums[0] == nums[1]. In interviews, mention this trade-off — it shows depth of understanding.

Pattern Recognition

Use the "have I seen this before?" hash set pattern when you see:

"Does any element appear more than once?"

"Find the first repeated element"

"Check for membership in O(1)"

Similar problems:

Two Sum — same hash-lookup pattern, but stores index alongside value (uses dict not set)
Longest Substring Without Repeating Characters — sliding window + set to track current window
Happy Number — detect cycle by tracking seen values in a set
Find the Duplicate Number — duplicate detection with space constraint (O(1) space solution)
Intersection of Two Arrays — set membership across two arrays

Real World Use Cases

1. Idempotency key deduplication in distributed payment systems

Payment APIs use idempotency keys to prevent duplicate charges when a client retries a failed request. The server stores processed keys in a Redis set and checks membership on every incoming request — the exact "have I seen this before?" pattern. Processing 50,000 requests per second makes O(1) lookup non-negotiable.

2. Event deduplication in Kafka consumer pipelines

In high-throughput event streaming, network retries often cause the same message to be delivered more than once. Consumers maintain a sliding-window set of recently processed message IDs and check for membership before processing each event, preventing duplicate writes to the database.

3. Username/email uniqueness validation during user registration

When a user registers, the system checks whether their chosen username or email already exists in the database. Database systems implement this with unique indexes (hash-based internally), which perform the same O(1) duplicate detection at storage scale rather than in-memory scale.

Quick Summary

Approach	Time	Space
Brute Force (nested loops)	O(n²)	O(1)
Optimal (hash set, early exit)	O(n)	O(n)
One-liner (length comparison)	O(n)	O(n)

Key Takeaways

A hash set gives O(1) membership testing — use it whenever you need "have I seen X?"
Early return on first duplicate is more efficient than building the full set first
Trading O(n) space for O(n) time (instead of O(n²) time with O(1) space) is almost always the right call
The in operator on a Python list is O(n); on a set it is O(1) — this distinction defines your algorithm's complexity

Where to Practice

Platform	Problem	Difficulty
LeetCode #217	Contains Duplicate	Easy

This problem is part of the Blind 75 and NeetCode 150 interview prep lists.