SE 2FA3 — Formal Languages and Automata — is a course about one of the deepest questions in computer science: what problems can computers solve, and how do we describe the inputs those programs read?
Rather than asking "how do I write a program," this course asks "what kinds of patterns can a machine even recognize?" The answer turns out to be a beautiful hierarchy — and your midterm focuses on the simplest level of that hierarchy: regular languages.
The Core Equivalence — The Most Important Idea
The entire course rests on one stunning theorem:
Regular Expression ⟺ NFA ⟺ DFA ⟺ Regular Language
These four things all describe exactly the same class of languages. They are just different ways of expressing the same power. Understanding this equivalence — and knowing how to convert between them — is the heart of this course.
A Mental Model: The Language Machine
Think of a language as a yes/no filter on strings. You feed it a string, it says "this string is in the language" (accept) or "it's not" (reject).
A DFA (Deterministic Finite Automaton) is one way to physically build such a filter — a machine with a finite number of "rooms" (states) that reads one character at a time and moves between rooms, finally saying yes or no based on which room it ends up in.
What Topics Appear on the Midterm?
| Topic | Frequency | Marks |
|---|---|---|
| True/False about theory | Every exam | 5 marks |
| DFA Construction | Every exam | 2–3 marks |
| NFA Construction | Every exam | 2–3 marks |
| Regular Expressions | Every exam | 5 marks |
| Subset/Product Construction | Most exams | 3–5 marks |
| Pumping Lemma | Occasionally | 5 marks |
You're allowed 1 page of notes (front and back). Build it as you go through this lesson — every key formula is noted with what to write down.
Alphabet (Σ)
An alphabet Σ is a finite, non-empty set of symbols.
Examples:
- Σ = {a, b} — the two-letter alphabet used in most exam problems
- Σ = {0, 1} — binary alphabet, used for bit-string problems
- Σ = {a, b, c, ..., z} — the English alphabet
Strings
A string (or word) over Σ is a finite sequence of symbols from Σ.
The length |w| of string w is the number of symbols in it.
The empty string ε (epsilon) has length 0 — it contains no symbols.
| String | Length | Over Σ = {a,b}? |
|---|---|---|
| ε | 0 | Yes |
| a | 1 | Yes |
| aabb | 4 | Yes |
| abba | 4 | Yes |
| abc | 3 | No ('c' is not in Σ) |
Σ* — The Universal Set
Σ* (pronounced "Sigma star") is the set of ALL strings over Σ, including ε.
Because an alphabet is non-empty, Σ* is always infinite.
Σ* = {ε, a, b, aa, ab, ba, bb, aaa, aab, aba, ...}
Languages
A language L over Σ is any subset of Σ*.
A language can be finite or infinite, and ∅ (the empty set) and Σ* itself are valid languages.
Over Σ = {a, b}:
- L₁ = {ε, a, b} — finite language with 3 strings
- L₂ = {aⁿbⁿ | n ≥ 0} = {ε, ab, aabb, aaabbb, ...} — infinite language
- L₃ = {w | w contains at least one "ab"} — another infinite language
- L₄ = ∅ — the empty language (no strings at all)
Operations on Languages
| Operation | Symbol | Meaning | Example |
|---|---|---|---|
| Union | L₁ ∪ L₂ | Strings in L₁ OR L₂ | {a} ∪ {b} = {a,b} |
| Intersection | L₁ ∩ L₂ | Strings in BOTH L₁ and L₂ | {a,b} ∩ {b,c} = {b} |
| Complement | Σ* − L | All strings NOT in L | Σ* − {a} = {ε, b, aa, ab, ...} |
| Concatenation | L₁ · L₂ | Strings from L₁ followed by strings from L₂ | {ab}·{ba} = {abba} |
| Kleene Star | L* | Zero or more concatenations of L | {ab}* = {ε, ab, abab, ababab, ...} |
| Kleene Plus | L⁺ | One or more concatenations | {ab}⁺ = {ab, abab, ...} |
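For finite languages, the operations in the table map directly onto Python sets. The sketch below is only an illustration: the helper names `concat` and `star_up_to` are made up for this example, and because L* is usually infinite the star is cut off at a maximum length.

```python
from itertools import product

def concat(l1, l2):
    """Concatenation L1 · L2: every string of l1 followed by every string of l2."""
    return {x + y for x, y in product(l1, l2)}

def star_up_to(lang, max_len):
    """All strings of L* with length <= max_len (L* itself is usually infinite)."""
    result = {""}      # zero concatenations gives the empty string ε
    frontier = {""}
    while frontier:
        # Extend the newest strings by one more block from lang.
        frontier = {w for w in concat(frontier, lang)
                    if len(w) <= max_len and w not in result}
        result |= frontier
    return result

print({"a"} | {"b"})                            # union
print({"a", "b"} & {"b", "c"})                  # intersection: {'b'}
print(concat({"ab"}, {"ba"}))                   # {'abba'}
print(sorted(star_up_to({"ab"}, 6), key=len))   # ['', 'ab', 'abab', 'ababab']
```

Union and intersection need no helper at all, since `|` and `&` on sets already do the job.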
Every finite language is regular. This kills many True/False questions! If you see "L = {aⁿbⁿ | n ≤ 100}", don't be fooled — that's a finite set of 101 strings, so it IS regular and its complement IS also regular.
What Makes a Language "Regular"?
A language is regular if it can be described by any of these equivalent formalisms:
- A DFA (Deterministic Finite Automaton)
- An NFA (Nondeterministic Finite Automaton)
- A Regular Expression
Regular languages are closed under union, intersection, complement, concatenation, and Kleene star. Applying any of these operations to regular languages always produces another regular language.
Formal Definition
A DFA M = (Q, Σ, δ, s, F) consists of:
- Q — a finite set of states
- Σ — the input alphabet
- δ: Q × Σ → Q — the transition function (reads a state + symbol, outputs a state)
- s ∈ Q — the start state
- F ⊆ Q — the set of accepting (final) states
The transition function δ is total — every state must have exactly one transition for every symbol in Σ. No missing transitions, no multiple transitions. This is what makes it "deterministic."
How a DFA Reads a String
The extended transition function δ̂ tells us where we end up after reading a whole string:
δ̂(q, ε) = q (reading nothing leaves the state unchanged)
δ̂(q, wa) = δ(δ̂(q, w), a) (read w first, then symbol a)
String x is ACCEPTED by M iff δ̂(s, x) ∈ F
In plain English: start at the start state, read the string one symbol at a time following the arrows, and at the end — if you're in an accepting state, accept; otherwise reject.
Anatomy of a DFA Diagram
DFA that accepts all strings containing "ab" as a substring. Σ = {a,b}.
→ single arrow = start state | double circle = accepting state
Tracing a String Through a DFA
Let's trace the string baab through the DFA above.
→ read 'b': δ(q₀, b) = q₀ (stay — no 'ab' yet)
→ read 'a': δ(q₀, a) = q₁ (saw an 'a' — waiting for 'b')
→ read 'a': δ(q₁, a) = q₁ (another 'a' — still waiting for 'b')
→ read 'b': δ(q₁, b) = q₂ (got "ab"! → accepting state)
End state: q₂ ∈ F ✓ → ACCEPT
Transition Table
A DFA can also be written as a transition table — every state × symbol combination has exactly one output state.
| State | On 'a' | On 'b' | Notes |
|---|---|---|---|
| → q₀ | q₁ | q₀ | Start: haven't seen 'a' yet |
| q₁ | q₁ | q₂ | Just saw 'a', waiting for 'b' |
| ★ q₂ | q₂ | q₂ | Saw "ab" — stay accepted forever |
→ marks start state, ★ marks accepting states
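A transition table is already almost a program. As a minimal sketch (state names written as plain strings, and the function names `run` and `accepts` chosen just for this example), the table above can be stored in a dictionary and the extended transition function becomes a simple loop:

```python
# Transition table for the "contains ab" DFA, copied from the table above.
DELTA = {
    ("q0", "a"): "q1", ("q0", "b"): "q0",
    ("q1", "a"): "q1", ("q1", "b"): "q2",
    ("q2", "a"): "q2", ("q2", "b"): "q2",
}
START, ACCEPTING = "q0", {"q2"}

def run(word, state=START):
    """Extended transition function: the state reached after reading word."""
    for symbol in word:
        state = DELTA[(state, symbol)]
    return state

def accepts(word):
    """A word is accepted iff the run ends in an accepting state."""
    return run(word) in ACCEPTING

print(run("baab"), accepts("baab"))   # q2 True
print(accepts("bba"))                 # False
```

Because δ is total, the dictionary lookup can never fail on a string over {a, b}.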
The Golden Rule
States = Memory. A DFA state represents everything the machine needs to remember about what it has read so far. Ask yourself: "what information do I need to track to decide acceptance?" Each distinct memory value becomes a state.
Step-by-Step Method
1. Identify the memory. What about the string read so far determines the outcome? Could be: "last two characters," "count mod 3," "did I see 'ab'?", "am I in a valid prefix?"
2. Add a "dead/trap" state if strings can become permanently rejected. This state has transitions back to itself on every symbol.
3. Fill in the transitions. Every state must have exactly one outgoing edge per symbol. No exceptions.
4. Mark start and accepting states. Start = memory for "read nothing." Accepting = memories that indicate the string satisfies the language property.
5. Test. Trace at least one string that should be accepted and one that should be rejected. Fix any errors.
Worked Example 1: Strings ending in "ab" (2024 Q2)
Build a DFA with exactly 4 states for L = {x | x ends with "ab"}.
We need to know how close we are to completing "ab" at the end of the string. Three situations are enough to decide acceptance; the fourth state is there only because the question asks for exactly 4 states:
- q₀: No promising suffix (start, or the last character was a 'b' that did not complete "ab")
- q₁: Last character was 'a' (potential start of "ab")
- q₂: Last two characters were "ab" ← ACCEPT
- q₃: Reached from q₂ on 'b'. It is not a dead state: it behaves exactly like q₀ and exists only to bring the state count to 4.
The full transition table:
| State | Meaning | On 'a' | On 'b' | Accepting? |
|---|---|---|---|---|
| → q₀ | Start / "reset" (saw 'b' not preceded by 'a') | q₁ | q₀ | No |
| q₁ | Last char was 'a' | q₁ | q₂ | No |
| ★ q₂ | Last two chars were "ab" | q₁ | q₃ | Yes |
| q₃ | Was in "ab" state, then saw 'b' — now just "bb" suffix | q₁ | q₀ | No |
Trace "aab": q₀ →(a) q₁ →(a) q₁ →(b) q₂ ✓ (accepted — ends in "ab")
Trace "aba": q₀ →(a) q₁ →(b) q₂ →(a) q₁ ✗ (rejected — "aba" doesn't end in "ab")
Trace "bab": q₀ →(b) q₀ →(a) q₁ →(b) q₂ ✓ (accepted — ends in "ab")
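Hand traces only cover a few strings. One way to gain more confidence, sketched here with made-up helper names, is to compare the DFA against Python's `str.endswith` on every string up to some length:

```python
from itertools import product

# The 4-state table from this worked example.
DELTA = {
    "q0": {"a": "q1", "b": "q0"},
    "q1": {"a": "q1", "b": "q2"},
    "q2": {"a": "q1", "b": "q3"},
    "q3": {"a": "q1", "b": "q0"},
}

def accepts(word):
    state = "q0"
    for symbol in word:
        state = DELTA[state][symbol]
    return state == "q2"

def mismatches(max_len):
    """Strings up to max_len where the DFA disagrees with str.endswith."""
    return [w for n in range(max_len + 1)
              for w in map("".join, product("ab", repeat=n))
              if accepts(w) != w.endswith("ab")]

print(mismatches(8))   # []
```

An empty list is not a proof, but a wrong transition usually shows up on a very short string.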
Worked Example 2: Binary multiples of 3 (2023 Q2)
Build a DFA for L = {x | x is a binary string representing a multiple of 3}.
Track the value of the string read so far, modulo 3. When you read bit b at the end of a number n, the new value is 2n + b. So we only need 3 states!
| State | Meaning (value mod 3) | On '0' | On '1' | Accepting? |
|---|---|---|---|---|
| →★ q₀ | value ≡ 0 (mod 3) | q₀ | q₁ | Yes (0 is multiple of 3) |
| q₁ | value ≡ 1 (mod 3) | q₂ | q₀ | No |
| q₂ | value ≡ 2 (mod 3) | q₁ | q₂ | No |
If current value ≡ r (mod 3) and we read bit b, new value = 2r + b.
From q₀ (r=0): read 0 → 0 mod 3 = 0 = q₀; read 1 → 1 mod 3 = 1 = q₁ ✓
From q₁ (r=1): read 0 → 2 mod 3 = 2 = q₂; read 1 → 3 mod 3 = 0 = q₀ ✓
From q₂ (r=2): read 0 → 4 mod 3 = 1 = q₁; read 1 → 5 mod 3 = 2 = q₂ ✓
For "NOT multiples of 3": flip the accepting states → q₁ and q₂ accept, q₀ doesn't.
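The update rule "new value = 2r + b" works for any modulus, not just 3. The sketch below generates the transition table from that rule; `mod_dfa` and `remainder` are names invented for this example, and ε is treated as value 0 as in the table above.

```python
def mod_dfa(m):
    """Transition table for 'binary value mod m': state r, bit b -> (2r + b) mod m."""
    return {(r, b): (2 * r + b) % m for r in range(m) for b in (0, 1)}

def remainder(bits, m):
    """Run the mod-m DFA on a bit string and return the final state (the remainder)."""
    table = mod_dfa(m)
    state = 0                      # start state: ε is treated as value 0
    for ch in bits:
        state = table[(state, int(ch))]
    return state

print(remainder("110", 3))   # 0, because 6 is a multiple of 3
print(remainder("111", 3))   # 1, because 7 = 2*3 + 1
```

Calling `mod_dfa(3)` reproduces the three rows of the table, with states 0, 1, 2 standing for q₀, q₁, q₂.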
Worked Example 3: Strings where |x| is even
L = {x ∈ Σ* | |x| is even} over Σ = {a, b}.
Track length mod 2 — only 2 states needed:
| State | Meaning | On 'a' | On 'b' | Accepting? |
|---|---|---|---|---|
| →★ q₀ | Even length so far | q₁ | q₁ | Yes (ε has even length 0) |
| q₁ | Odd length so far | q₀ | q₀ | No |
When the problem says "exactly 4 states," count carefully. You may need a dead/trap state even if the language seems simple. A trap state has transitions to itself on all symbols and is never accepting.
What Makes an NFA Different?
An NFA N = (Q, Σ, δ, s, F) is like a DFA but with three relaxations:
- δ: Q × (Σ ∪ {ε}) → 2^Q, so transitions return a SET of states (possibly empty)
- From any state, on any symbol, you can go to zero, one, or many states
- ε-transitions are allowed — free moves without consuming input
Acceptance: A string is accepted if there EXISTS at least one computation path that ends in an accepting state.
DFA vs NFA — Side by Side
| Feature | DFA | NFA |
|---|---|---|
| Transitions per symbol | Exactly 1 | 0, 1, or many |
| ε-transitions | ❌ Not allowed | ✅ Allowed |
| Acceptance rule | The (unique) end state ∈ F | ANY path ends in F |
| Languages recognized | Regular only | Regular only (same!) |
| Design effort | More states, more complex | Often simpler to design |
| State count | Up to 2ⁿ states for an equivalent n-state NFA | Sometimes exponentially fewer |
Despite the differences, NFA and DFA recognize exactly the same class of languages — the regular languages. For every NFA there exists an equivalent DFA, and vice versa. This is a theorem, not just a claim.
Why Use NFAs Then?
NFAs are much easier to design for many languages. You can "guess" which branch to take, knowing one accepting path is enough. The machine is conceptually parallel — imagine it exploring all paths at once.
Σ = {0,1}. This NFA nondeterministically "guesses" when the last two symbols are about to be "11".
NFA for L = {w | w ends in "11"}. The nondeterminism happens at q₀ on '1' — the machine can stay at q₀ OR move to q₁.
Tracing an NFA — Parallel Computation
Think of an NFA as running ALL possible paths simultaneously. When you hit a fork (e.g., on '1' from q₀ you can go to q₀ OR q₁), the machine splits and pursues both.
Input: 011. Start: {q₀}
Read '0': from q₀ on '0' → {q₀} → Current set: {q₀}
Read '1': from q₀ on '1' → {q₀, q₁} → Current set: {q₀, q₁}
Read '1': from q₀ on '1' → {q₀, q₁}; from q₁ on '1' → {q₂} → Current set: {q₀, q₁, q₂}
End: {q₀, q₁, q₂} ∩ F = {q₂} ≠ ∅ → ACCEPT ✓
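The "all paths at once" view is easy to simulate: keep the current set of states and update it symbol by symbol. This is a small sketch for the "ends in 11" NFA, with missing dictionary entries standing for the empty set; the names `step` and `accepts` are just for illustration, and ε-transitions are not handled.

```python
# NFA for "ends in 11": q0 loops on 0 and 1, and on 1 may also move to q1.
NFA = {
    ("q0", "0"): {"q0"},
    ("q0", "1"): {"q0", "q1"},
    ("q1", "1"): {"q2"},
}   # any missing (state, symbol) pair means the empty set
ACCEPTING = {"q2"}

def step(states, symbol):
    """All states reachable from any state in `states` by reading `symbol`."""
    out = set()
    for q in states:
        out |= NFA.get((q, symbol), set())
    return out

def accepts(word):
    current = {"q0"}
    for symbol in word:
        current = step(current, symbol)
    return bool(current & ACCEPTING)   # accept if ANY path ended in F

print(accepts("011"))    # True
print(accepts("0110"))   # False
```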
ε-transitions
ε-transitions let you move to another state without reading any input. They're extremely useful for designing NFAs by composing smaller pieces.
An NFA for L((aa+b)*(bb+a)*) might use ε-transitions to connect the two halves, meaning: "guess" when you've finished the first half and are starting the second.
The Core Idea
A DFA must be deterministic — at any point it's in one state. But when simulating an NFA, we might be in multiple states at once. The insight: make those sets of NFA states become single DFA states.
Key Idea: Each DFA state is a set of NFA states. The DFA tracks "which NFA states could we possibly be in right now?"
The Algorithm
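1. Start state: the DFA starts in the set containing only the NFA's start state. (Take its ε-closure if there are ε-transitions.)
2. Transitions: for each DFA state S and each symbol a, collect all NFA states reachable from any state in S by reading a. That set is the DFA state reached from S on a.
3. Repeat: keep a worklist and process each unvisited DFA state until no new sets appear. ∅ (the empty set) is a valid dead state; it transitions to itself on everything.
4. Accepting states: a DFA state is accepting if it contains at least one NFA accepting state.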
Worked Example: NFA for "strings ending in 11"
NFA: Q = {q₀, q₁, q₂}, Σ = {0,1}, s = q₀, F = {q₂}
NFA transitions: δ(q₀,0)={q₀}, δ(q₀,1)={q₀,q₁}, δ(q₁,0)=∅, δ(q₁,1)={q₂}, δ(q₂,0)=∅, δ(q₂,1)=∅
Build the Subset Table
| DFA State (Set) | On '0' | On '1' | Accepting? |
|---|---|---|---|
| → {q₀} | δ(q₀,0) = {q₀} | δ(q₀,1) = {q₀,q₁} | No (q₂ ∉ {q₀}) |
| {q₀,q₁} | δ(q₀,0)∪δ(q₁,0) = {q₀}∪∅ = {q₀} | δ(q₀,1)∪δ(q₁,1) = {q₀,q₁}∪{q₂} = {q₀,q₁,q₂} | No |
| ★ {q₀,q₁,q₂} | δ(q₀,0)∪δ(q₁,0)∪δ(q₂,0) = {q₀} | δ(q₀,1)∪δ(q₁,1)∪δ(q₂,1) = {q₀,q₁,q₂} | Yes ✓ (q₂ ∈ this set) |
We only generated 3 DFA states — no others were reachable. The final DFA has states {q₀}, {q₀,q₁}, {q₀,q₁,q₂} with the last one accepting.
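The table-filling procedure can be sketched as a worklist loop over sets of NFA states. This is an illustrative version only: it assumes an NFA without ε-transitions, and the function name and dictionary layout are choices made for this example.

```python
from collections import deque

def subset_construction(nfa, start, accepting, alphabet):
    """Build the reachable part of the subset DFA (no ε-transitions assumed)."""
    start_set = frozenset([start])
    table, queue = {}, deque([start_set])
    while queue:
        current = queue.popleft()
        if current in table:
            continue                      # already processed this DFA state
        table[current] = {}
        for a in alphabet:
            # Union of the NFA moves from every state in the current set.
            target = frozenset(r for q in current for r in nfa.get((q, a), ()))
            table[current][a] = target
            if target not in table:
                queue.append(target)
    dfa_accepting = {s for s in table if s & accepting}
    return table, start_set, dfa_accepting

NFA = {("q0", "0"): {"q0"}, ("q0", "1"): {"q0", "q1"}, ("q1", "1"): {"q2"}}
table, start, acc = subset_construction(NFA, "q0", {"q2"}, "01")
for state, row in table.items():
    print(sorted(state), {a: sorted(t) for a, t in row.items()}, state in acc)
```

On the "ends in 11" NFA this produces the same three reachable sets as the table above.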
Always present subset construction as a table. The table format is exactly what the marker expects and makes your work easy to follow. Show the computation for each cell: "δ(q₀,0) ∪ δ(q₁,0) = ..."
How Many States Can You Get?
An NFA with n states can produce a DFA with up to 2ⁿ states (one for each possible subset). In practice, usually far fewer subsets are reachable. In exam problems, you typically get 3–6 DFA states.
The Idea
Given two DFAs M₁ and M₂, the product construction builds a new DFA that runs both simultaneously. Each state of the product DFA is a pair (p, q) — one state from each machine.
Product DFA M = (Q₁ × Q₂, Σ, δ, (s₁,s₂), F) where:
δ((p, q), a) = (δ₁(p, a), δ₂(q, a))
For UNION (L₁ ∪ L₂):
F = {(p,q) | p ∈ F₁ OR q ∈ F₂}
For INTERSECTION (L₁ ∩ L₂):
F = {(p,q) | p ∈ F₁ AND q ∈ F₂}
For DIFFERENCE (L₁ \ L₂):
F = {(p,q) | p ∈ F₁ AND q ∉ F₂}
Worked Example (2024 Q3)
Build DFAs for:
- M₁: L(M₁) = {x | |x| mod 2 = 0} — strings of even length
- M₂: L(M₂) = {x | |x| mod 3 = 0} — strings whose length is divisible by 3
M₁ states: e (even, ACCEPT), o (odd). Start: e. δ(e,a)=δ(e,b)=o, δ(o,a)=δ(o,b)=e.
M₂ states: 0 (ACCEPT), 1, 2. Start: 0. Every symbol increments the count mod 3: δ(0,a)=δ(0,b)=1, δ(1,a)=δ(1,b)=2, δ(2,a)=δ(2,b)=0.
Product for M₃ = M₁ ∪ M₂
States: (e,0), (e,1), (e,2), (o,0), (o,1), (o,2). Start: (e,0). Accept if p∈F₁ OR q∈F₂.
| State | On any symbol | Accept? (even OR div3) |
|---|---|---|
| ★ (e, 0) | (o, 1) | Yes — e∈F₁ AND 0∈F₂ |
| ★ (e, 1) | (o, 2) | Yes — e∈F₁ |
| ★ (e, 2) | (o, 0) | Yes — e∈F₁ |
| ★ (o, 0) | (e, 1) | Yes — 0∈F₂ |
| (o, 1) | (e, 2) | No |
| (o, 2) | (e, 0) | No |
Product for M₄ = M₁ ∩ M₂
Same structure, but accept only when BOTH are in accepting states (even AND divisible by 3 = divisible by 6):
| State | Accept? (even AND div3) |
|---|---|
| ★ (e, 0) | Yes |
| (e, 1) | No |
| (e, 2) | No |
| (o, 0) | No |
| (o, 1) | No |
| (o, 2) | No |
For intersection, only strings of length divisible by 6 are accepted — makes sense!
Union = OR = lenient (accept if either machine accepts)
Intersection = AND = strict (accept only if both accept)
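The product construction is mechanical enough to write as a dictionary comprehension. The sketch below rebuilds the length-mod-2 and length-mod-3 machines from this example and pairs them up; all variable names are chosen for illustration.

```python
from itertools import product

SIGMA = "ab"
# M1 tracks length mod 2 (states e, o); M2 tracks length mod 3 (states 0, 1, 2).
D1 = {(p, a): ("o" if p == "e" else "e") for p in "eo" for a in SIGMA}
D2 = {(q, a): (q + 1) % 3 for q in range(3) for a in SIGMA}
F1, F2 = {"e"}, {0}

# Product transition: move both components at once.
DELTA = {((p, q), a): (D1[(p, a)], D2[(q, a)])
         for p, q in product("eo", range(3)) for a in SIGMA}
START = ("e", 0)
UNION = {(p, q) for p, q in product("eo", range(3)) if p in F1 or q in F2}
INTER = {(p, q) for p, q in product("eo", range(3)) if p in F1 and q in F2}

def run(word):
    state = START
    for a in word:
        state = DELTA[(state, a)]
    return state

print(run("ababab") in INTER)                    # True: length 6
print(run("aba") in UNION, run("aba") in INTER)  # True False: length 3
```

Only the accepting set changes between union and intersection; the states and transitions are identical.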
Why Regular Expressions?
Building a DFA or NFA can be tedious. Regular expressions (regex) give us a compact algebraic notation for describing the same languages, and they're much easier to write. They're also equivalent — every regex corresponds to an NFA and vice versa.
The Building Blocks
| Expression | Language it denotes | Example |
|---|---|---|
| ∅ | The empty language (no strings) | (nothing) |
| ε | The language {ε}, containing only the empty string | "" |
| a (any symbol) | The language {a}, containing just that one symbol | "a" |
| r + s | L(r) ∪ L(s): strings from r OR s | a+b → {a, b} |
| rs | L(r) · L(s): strings from r then s | ab → {ab} |
| r* | L(r)*: zero or more repetitions of r | a* → {ε, a, aa, ...} |
Operator Precedence (High to Low)
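1. * (Kleene star): highest, applies only to the symbol or parenthesized group immediately before it
2. Concatenation: middle, left-to-right
3. + (union): lowest, applies to the largest possible expressions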
Examples:
ab* = a(b*) — star binds to b only
a+bc = a+(bc) — + is lowest, so bc is one piece
(a+b)* = any combo of a's and b's in any order
Essential Shorthands
| Shorthand | Equivalent | Meaning |
|---|---|---|
| r⁺ | rr* | One or more r's |
| r? | ε + r | Zero or one r (optional) |
| (a+b) | Σ | Any single a or b (often written Σ when Σ = {a,b}) |
| (a+b)* | Σ* | Any string over {a,b} |
(bb + a)*(aa + b)* reads as: "zero or more occurrences of (bb or a), followed by zero or more occurrences of (aa or b)."
This matches strings like: ε, a, bb, aaa, bba, bbaa, aab, ...
Strategy: Think in Constraints
To write a regex for "all strings satisfying property X," ask yourself:
- What can come before and after the key pattern?
- Can I break this into cases (use +)?
- What characters are forbidden in what positions?
- Am I tracking a count? Think mod — or use DFA reasoning then write regex from that.
Key insight: If 'a' is followed by 'b' anywhere, the string is rejected. So once we see any 'a', we can never see a 'b' afterward. This means: all the b's come before all the a's.
Answer: b*a*
Test: "ba" ✓, "bba" ✓, "aaa" ✓, "bbb" ✓, "ε" ✓, "ab" ✗ (correct!), "bab" ✗ (correct!)
Note: "bab" fails because after the middle 'a', there's a 'b'.
Key insight: An odd-length string can be written as: one character, then any number of pairs of characters.
Answer: (a+b)((a+b)(a+b))*
Or equivalently: ((a+b)(a+b))*(a+b)
The pattern picks any single symbol, then zero or more pairs. Total: 1 + 2k = odd ✓
Test: "a" (length 1 ✓), "aba" (length 3 ✓), "ab" (length 2 ✗, correctly rejected).
Key insight: Every 'a' must be immediately followed by a 'b', unless it is the last symbol of the string. So a's can only appear inside "ab" blocks, except possibly a final 'a'.
Answer: b*(abb*)*(ε+a) or equivalently (b+ab)*(ε+a)
Reading: any number of b's, then any number of (a followed by one or more b's), optionally ending in a single a.
Test: "aba" ✓, "abab" ✓, "b" ✓, "ε" ✓, "aa" ✗ (correct!), "bab" ✓, "baa" ✗ (correct!)
Key insight: The string must contain "aa" exactly once, with no other pair of consecutive a's. So the part before the "aa" must contain no "aa" and must not end in 'a', and the part after it must contain no "aa" and must not start with 'a'. Otherwise a run of three or more a's would create a second occurrence.
Let NO_AA = (b+ab)*(ε+a) = strings with no consecutive a's.
Before the "aa": (b+ab)* = strings with no "aa" that do not end in 'a' (NO_AA without the optional final a).
After the "aa": (b+ba)* = strings with no "aa" that do not start with 'a' (the mirror image).
Answer: (b+ab)* aa (b+ba)*
Test: "aa" ✓, "baab" ✓, "abaaba" ✓, "aaa" ✗ (two overlapping occurrences), "aabaa" ✗ (two occurrences).
Key insight: The number of b's is 1, 2, 4, 5, 7, 8, ... (not 0, 3, 6, 9...). Split into two cases: #b ≡ 1 (mod 3) or #b ≡ 2 (mod 3).
Let GRP = ba*ba*ba* (exactly 3 b's, each followed by any number of a's).
For #b ≡ 1 (mod 3): a* GRP* b a* (groups of 3 b's, plus 1 extra)
For #b ≡ 2 (mod 3): a* GRP* b a* b a* (groups of 3 b's, plus 2 extra)
Answer: the union of both cases, a* GRP* b a* + a* GRP* b a* b a*.
Tip: on the exam, these counting patterns are often better designed as a DFA first, then described in English.
Key insight: Either "00" comes before "11" or "11" comes before "00".
Let ANY = (0+1)*
Answer: (0+1)* 00 (0+1)* 11 (0+1)* + (0+1)* 11 (0+1)* 00 (0+1)*
Why two cases are enough: the string needs at least one "00" AND at least one "11" somewhere. An occurrence of "00" and an occurrence of "11" cannot overlap, so one must end before the other begins. The two cases cover whether "00" or "11" appears first.
Key insight: Length ≡ 1 (mod 3) OR length ≡ 2 (mod 3).
Let Σ = (0+1).
Answer: Σ(ΣΣΣ)* + ΣΣ(ΣΣΣ)*
First part: lengths 1, 4, 7, ... (≡ 1 mod 3)
Second part: lengths 2, 5, 8, ... (≡ 2 mod 3)
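A regex answer can be sanity-checked by brute force: translate the textbook notation into Python's `re` syntax and compare against a plain-English predicate on every short string. This is a sketch under stated assumptions: `to_python` and `disagreements` are helper names invented here, `+` is always read as union (never as "one or more"), and the predicates are this example's reading of the exercises (for instance, "no 'ab' substring" for b*a*).

```python
import re
from itertools import product

def to_python(regex):
    """Translate textbook notation (+ for union, ε for empty string) to re syntax."""
    return regex.replace(" ", "").replace("+", "|").replace("ε", "")

def disagreements(regex, predicate, alphabet="ab", max_len=8):
    """Strings up to max_len on which the regex and the predicate disagree."""
    pattern = re.compile(to_python(regex))
    return [w for n in range(max_len + 1)
              for w in map("".join, product(alphabet, repeat=n))
              if bool(pattern.fullmatch(w)) != predicate(w)]

print(disagreements("b*a*", lambda w: "ab" not in w))                   # []
print(disagreements("(a+b)((a+b)(a+b))*", lambda w: len(w) % 2 == 1))   # []
print(disagreements("(b+ab)*(ε+a)", lambda w: "aa" not in w))           # []
```

`fullmatch` matters here: the whole string must match, which is what "w ∈ L(r)" means. A non-empty result list gives you a concrete counterexample to debug with.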
Strategy — State Elimination Method:
- If multiple accepting states, add a new single accepting state with ε-transitions from each.
- Add a new start state with ε-transition to the original start.
- Eliminate states one by one: for each state q being eliminated, for each pair (p, r) of predecessor and successor, add a direct transition labeled p_to_q · q_loop* · q_to_r.
- When only start and accept remain, the label on the remaining transition is your regex.
This is algebraic and mechanical — practice it on the 2024 Q4d and 2025 Q4c diagrams.
The Conversion Map
Regex → NFA: Thompson's Construction
NFA → DFA: Subset Construction
DFA → Regex: State Elimination
Regex → NFA (Thompson's Construction)
Build NFAs for primitive cases, then combine with ε-transitions.
| Regex | NFA Structure |
|---|---|
| a (single symbol) | →(s) —a→ ((t)) |
| r + s (union) | New start → ε → NFA_r; New start → ε → NFA_s; both old accepts → ε → new accept |
| rs (concat) | NFA_r's accept → ε → NFA_s's start |
| r* (star) | New start/accept → ε → NFA_r start; NFA_r accept → ε → NFA_r start (loop); NFA_r accept → ε → new accept; new start → ε → new accept |
This is: (a*)(aa)(b*) — three concatenated pieces.
- Build NFA for a* → a loop state
- Build NFA for aa → two consecutive a-transitions
- Build NFA for b* → a b loop state
- Connect with ε-transitions: a* accept → ε → aa start, aa accept → ε → b* start
Or just draw it directly with three states: q₀ is the start state with a self-loop on 'a'; q₀ goes to q₁ on 'a'; q₁ goes to q₂ on 'a'; q₂ is accepting with a self-loop on 'b'. The machine loops on 'a' at q₀, guesses when the final two a's begin, and from then on reads only b's at q₂.
Regex → DFA Directly
For many exam problems, you can build a DFA directly from a regex by thinking about what the regex means. Identify the states (what do you need to remember?) and fill in transitions.
This language = "any number of (aa or b) blocks, then any number of (bb or a) blocks." The key: in the first part, a's come in pairs; in the second, b's come in pairs.
States needed: track "which part are we in" and "did we see an odd a or b."
This is complex — for exam purposes, build the NFA first (much easier!) then convert via subset construction if a DFA is required.
DFA → Regex (State Elimination)
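Pick a state q to eliminate (not the start or accept state) and let R be the label of q's self-loop; if q has no self-loop, R* is just ε. For each pair of states p (that goes to q) and r (that q goes to), add a direct p→r edge labeled: (label of p→q)(R*)(label of q→r)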
If there was already a p→r edge, take the union (+) of old and new labels.
For the 2025 Q4c and Q4d diagrams, state elimination is the intended approach. Practice doing it on small 3-state DFAs. The algebra can get messy — simplify as you go.
Definition
Two states p and q in a DFA are equivalent (written p ≡ q) if:
∀x ∈ Σ*: δ̂(p, x) ∈ F ⟺ δ̂(q, x) ∈ F
In plain English: from both p and q, the same set of strings leads to acceptance. They behave identically for all future inputs.
The Exam Trap — Two Common Confusions
p ≡ q does NOT mean δ(p,a) = δ(q,a)
Equivalent states can have transitions going to different states — as long as those different states are themselves equivalent to each other.
Example: Suppose q₀ and q₁ both reject all future inputs. They're equivalent. But δ(q₀, a) = q₂ and δ(q₁, a) = q₃, where q₂ and q₃ are also equivalent to each other. So q₀ ≡ q₁ but δ(q₀, a) ≠ δ(q₁, a).
If p ∈ F and q ∉ F → p and q are definitely NOT equivalent
Proof: use x = ε as the witness. δ̂(p, ε) = p ∈ F (p accepts ε), but δ̂(q, ε) = q ∉ F (q rejects ε). Since they disagree on ε, they're not equivalent.
This is TRUE and appears as a True/False claim. It's always TRUE.
How to Check Equivalence
Two states p and q are NOT equivalent if there exists a "distinguishing string" x such that exactly one of δ̂(p,x) ∈ F and δ̂(q,x) ∈ F holds.
In the DFA for "strings containing ab": is q₀ ≡ q₁?
The string "ab" does not separate them: from q₀ it leads to q₂ (accept), and from q₁ it also leads to q₂ (a → q₁, b → q₂).
But the string "b" does: from q₀, reading "b" stays at q₀ (reject), while from q₁, reading "b" reaches q₂ (accept). So x = "b" is a distinguishing string → q₀ ≢ q₁.
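Searching for a distinguishing string can be automated with a breadth-first search over pairs of states. The sketch below uses the "contains ab" DFA; `distinguishing_string` is a hypothetical helper name, and it returns a shortest witness or `None` when the two states are equivalent.

```python
from collections import deque

DELTA = {
    ("q0", "a"): "q1", ("q0", "b"): "q0",
    ("q1", "a"): "q1", ("q1", "b"): "q2",
    ("q2", "a"): "q2", ("q2", "b"): "q2",
}
ACCEPTING = {"q2"}

def distinguishing_string(p, q, alphabet="ab"):
    """Shortest x where exactly one of δ̂(p,x), δ̂(q,x) is accepting, else None."""
    seen = {(p, q)}
    queue = deque([(p, q, "")])
    while queue:
        s, t, x = queue.popleft()
        if (s in ACCEPTING) != (t in ACCEPTING):
            return x                       # x separates the original pair
        for a in alphabet:
            nxt = (DELTA[(s, a)], DELTA[(t, a)])
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt[0], nxt[1], x + a))
    return None                            # no pair ever disagreed: p ≡ q

print(repr(distinguishing_string("q0", "q1")))   # 'b'
print(repr(distinguishing_string("q1", "q2")))   # '' (ε already separates them)
print(repr(distinguishing_string("q2", "q2")))   # None
```

The search always terminates because there are only finitely many pairs of states.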
Summary of Key Claims
| Claim | True/False | Why |
|---|---|---|
| p ≡ q → ∀a: δ(p,a) = δ(q,a) | FALSE | Transitions can differ but lead to equivalent states |
| p ≡ q → ∀a: δ(p,a) ≡ δ(q,a) | TRUE | This is what equivalence actually guarantees |
| p∈F, q∉F → p ≢ q | TRUE | ε witnesses the difference |
| p ≢ q → ∃a: δ(p,a) ≠ δ(q,a) | FALSE | Counterexample: p ∈ F, q ∉ F with δ(p,a) = δ(q,a) for every a. The transitions are identical, yet ε distinguishes p from q |
Why Do We Need This?
Not all languages are regular. The Pumping Lemma gives us a way to prove a language cannot be recognized by any DFA, NFA, or regex. The classic non-regular language is L = {aⁿbⁿ | n ≥ 0} — you can't count to n with finite memory.
The Theorem
If L is a regular language, then there exists an integer k ≥ 1 (the pumping length) such that every string s ∈ L with |s| ≥ k can be split into three parts s = xyz where:
1. |y| ≥ 1: the middle part y is non-empty
2. |xy| ≤ k: x and y together are at most k characters
3. ∀i ≥ 0: xyⁱz ∈ L, so pumping y any number of times keeps the string in L
Using It: The Adversarial Game
To prove L is NOT regular, you play a game against a demon. You win if you show the Pumping Lemma fails.
| Player | Move |
|---|---|
| Demon | Picks k (you don't know what k is) |
| You | Pick a string s ∈ L with |s| ≥ k (your choice depends on k) |
| Demon | Picks how to split s = xyz (satisfying conditions 1 and 2) |
| You | Pick i ≥ 0 to show xyⁱz ∉ L |
If you can always find such an i (for any split the demon picks), L is not regular.
Worked Example: L = {aⁿbⁿ | n ≥ 0}
Step 1 (demon picks k): We don't know what k is, but we know it's some positive integer.
Step 2 (we pick s): Choose s = aᵏbᵏ. This is in L (it has k a's followed by k b's) and |s| = 2k ≥ k. ✓
Step 3 (demon splits s = xyz): Since |xy| ≤ k, and s starts with k a's, both x and y must lie entirely within the a-prefix. So x = aᵐ for some m ≥ 0 and y = aʲ for some j ≥ 1, with m + j ≤ k.
Step 4 (we pick i): Choose i = 0, so xy⁰z = xz = aᵏ⁻ʲbᵏ.
This has k−j a's and k b's. Since j ≥ 1, k−j < k. So we have fewer a's than b's → NOT in L. ✓ We win!
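The "for ALL demon splits" part of the argument can be illustrated by enumeration. The sketch below, with made-up function names, lists every legal split of s = aᵏbᵏ for a given k and checks that pumping down (i = 0) always leaves L. It only tests finitely many values of k, so it is a sanity check on the reasoning and not a replacement for the proof.

```python
def in_L(w):
    """Membership in {a^n b^n | n >= 0}."""
    n = len(w) // 2
    return w == "a" * n + "b" * n

def splits(s, k):
    """All (x, y, z) with s = xyz, |y| >= 1 and |xy| <= k."""
    for end in range(1, min(k, len(s)) + 1):
        for start in range(end):
            yield s[:start], s[start:end], s[end:]

def pumping_down_always_escapes(k):
    """True if for every legal split of a^k b^k, the string xz is outside L."""
    s = "a" * k + "b" * k
    return all(not in_L(x + z) for x, y, z in splits(s, k))

print(all(pumping_down_always_escapes(k) for k in range(1, 20)))   # True
```

Changing `s` to a poorly chosen string such as `"ab" * k` with a language like (ab)* shows the opposite outcome: some split survives pumping, which is exactly what the lemma predicts for a regular language.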
The 2025 exam asked you to critique a faulty proof. Things to check for:
- Did they pick a string that's actually in L?
- Does their split satisfy |xy| ≤ k?
- Did they handle ALL possible demon splits (not just one)?
- Did they actually show the pumped string leaves L?
Common Choices for s
| Language type | Good choice for s | Why it works |
|---|---|---|
| {aⁿbⁿ} | aᵏbᵏ | y lands in a's only; pumping unbalances a's and b's |
| {ww | w ∈ Σ*} | aᵏbᵏaᵏbᵏ | Pumping disrupts the duplication pattern |
| Palindromes | aᵏbaᵏ | Pumping a's only on one side breaks palindrome structure |
| {a^(p) | p prime} | aᵖ where p ≥ k is prime | Pumped versions have non-prime length |
For each claim: if TRUE, you just need to write "True." If FALSE, you must explain WHY with a counterexample or proof. Be precise and concise.
The 6 Recurring Patterns
Every True/False question falls into one of these patterns. Master these and you'll get Q1 every time:
| # | Pattern | Answer | Key Insight |
|---|---|---|---|
| A | "Finite language → complement non-regular" | FALSE | All finite languages are regular. Complement of regular = regular. |
| B | "Every NFA has an equivalent DFA" | TRUE | Subset construction theorem. |
| C | "There exists NFA with no equivalent DFA" | FALSE | Every NFA has an equivalent DFA (same as B). |
| D | "p≡q implies δ(p,a)=δ(q,a)" | FALSE | Equivalence is about future acceptance, not identical transitions. |
| E | "p∈F, q∉F → p,q not equivalent" | TRUE | ε is a distinguishing witness. (δ̂(p,ε)=p∈F but δ̂(q,ε)=q∉F) |
| F | "Complement of regular is non-regular" | FALSE | Regular languages are closed under complement. |
All 15 Claims — Complete Bank
TRUE. This is the NFA→DFA equivalence theorem. Subset construction converts any NFA to an equivalent DFA. No explanation needed — just write "True."
FALSE. L = {ε, ab, aabb} is a finite language. All finite languages are regular. Since regular languages are closed under complement, Σ*−L is also regular.
TRUE. This is the literal definition of DFA acceptance. No explanation needed.
FALSE. The regex requires the string to start with 'b' (the first symbol is a literal 'b'). But "aaba" starts with 'a'. No string starting with 'a' can match this regex.
FALSE. State equivalence means δ(p,a) and δ(q,a) are equivalent states, not necessarily the same state. Counterexample: Take a minimal DFA; duplicate any accepting state. The two copies are equivalent but δ goes to different (yet equivalent) copies.
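TRUE. Take any DFA M and view it as an NFA. Adding redundant (for example, unreachable) states gives an NFA N that recognizes the same language, and M has fewer states than N. The reverse cannot happen for minimal machines: every DFA is itself an NFA, so the smallest NFA for a language is never larger than the smallest DFA.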
TRUE. Every finite language is regular. The Pumping Lemma holds for all regular languages. Set k = (length of longest string + 1). No string in L satisfies |s| ≥ k, so the condition is vacuously true.
FALSE. For every NFA, subset construction produces an equivalent DFA. No NFA can recognize a language that no DFA can recognize.
FALSE. Regular languages are closed under complement. The complement of any regular language is also regular. (To build a DFA for the complement: flip all accepting states.)
FALSE. Non-equivalence means there exists a string x distinguishing p and q. But the immediate transitions δ(p,a) and δ(q,a) can be the same state. Example: p∈F, q∉F, but δ(p,a)=δ(q,a)=r for all a. Still p≢q (ε witnesses it), but transitions are identical.
FALSE. L has only 101 strings (finite). All finite languages are regular. Complement of a regular language is regular.
TRUE. The first alternative (a+ba)* includes (a)* which contains "aa" (take "a" twice). So aa ∈ L((a+ba)*) ⊆ L((a+ba)* + (ab+ba)*).
TRUE. Classic result: the language of strings where the n-th-from-last bit is 1. An NFA needs only n+1 states, while every equivalent DFA needs at least 2ⁿ states. So there exists an NFA that is smaller than every equivalent DFA.
FALSE. This universal claim is false. For many simple languages, a DFA can have fewer or equal states to the NFA. Also, you can always add useless states to an NFA to make it larger than any equivalent DFA. The claim "ALL NFAs have fewer states than ALL equivalent DFAs" is clearly wrong.
TRUE. Use ε as the witness. δ̂(p,ε) = p ∈ F (p accepts ε). δ̂(q,ε) = q ∉ F (q rejects ε). Since the string ε is accepted from p but rejected from q, p and q are not equivalent by definition.
Try each problem yourself first, then compare with the worked solution that follows it. Honest self-assessment is the key to learning.
For each claim, state True or False. If False, explain why.
(a) If L₁ and L₂ are both non-regular, then L₁ ∪ L₂ is non-regular.
(b) There exists a DFA with 2 states that recognizes L = {w | w has odd length}.
(c) For all DFAs M: L(M) ≠ ∅.
(d) If L is infinite, then L is non-regular.
(e) For all NFAs N, the subset construction produces a DFA with exactly 2^|Q| states.
Solutions:
(a) FALSE. L₁ = {aⁿbⁿ | n≥0} and L₂ = complement of L₁ are both non-regular, but L₁ ∪ L₂ = Σ* which IS regular.
(b) TRUE. States: "even length" (start) and "odd length" (ACCEPT). Start at "even." Every symbol swaps between them.
(c) FALSE. A DFA with no accepting states (F = ∅) recognizes the empty language ∅.
(d) FALSE. L = {a}* = {ε, a, aa, ...} is infinite and regular.
(e) FALSE. The subset construction can produce up to 2^|Q| states, but typically far fewer are reachable.
(a) [3 marks] Build a DFA for L = {w ∈ {a,b}* | w contains an even number of a's}.
(b) [2 marks] Build a DFA for L = {w ∈ {0,1}* | the binary number w is divisible by 4}.
Solutions:
(a) Track #a's mod 2. Two states: q_even (ACCEPT, start), q_odd.
δ(q_even, a) = q_odd, δ(q_even, b) = q_even
δ(q_odd, a) = q_even, δ(q_odd, b) = q_odd
Accepting: q_even (ε has 0 a's = even ✓).
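(b) Track the value read so far modulo 4, with the same update rule as for multiples of 3: reading bit b turns value n into 2n + b. States s, q1, q2, q3 stand for remainders 0, 1, 2, 3. The start state s is also the only accepting state (ε is treated as value 0, which is divisible by 4). Equivalently, a non-empty string is accepted when it consists only of 0's or ends in "00".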
δ(s, 0)=s, δ(s, 1)=q1
δ(q1, 0)=q2, δ(q1, 1)=q3
δ(q2, 0)=s, δ(q2, 1)=q1
δ(q3, 0)=q2, δ(q3, 1)=q3
NFA over Σ = {a, b}: Q = {0, 1, 2}, s = 0, F = {2}.
Transitions: δ(0,a) = {0,1}, δ(0,b) = {0}, δ(1,a) = ∅, δ(1,b) = {2}, δ(2,a) = ∅, δ(2,b) = ∅.
(a) Describe the language in plain English.
(b) Convert to a DFA via subset construction. Show all work.
Solutions:
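(a) The language is: all strings that end in "ab". From state 0 the NFA guesses that the current 'a' is the second-to-last symbol and moves to state 1; state 2 has no outgoing transitions, so the 'b' that reaches it must be the last symbol.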
(b) Subset Construction Table:
| DFA State | On 'a' | On 'b' | Accept? |
|---|---|---|---|
| → {0} | {0,1} | {0} | No |
| {0,1} | {0,1} | {0,2} | No |
| ★ {0,2} | {0,1} | {0} | Yes (2∈F) |
3 DFA states. Start: {0}. Accept: {0,2}.
Σ = {a, b}. Write regular expressions for:
(a) All strings where every 'a' is immediately followed by 'b'.
(b) All strings of even length that start and end with the same symbol.
(c) All strings where the number of a's is divisible by 3.
(d) All strings that contain neither "aa" nor "bb".
Solutions:
(a) Every 'a' must be followed by 'b', so a's can only occur inside "ab" blocks, and the string cannot end in 'a'. Answer: (b + ab)*
(b) Split by the first and last symbol: a(a+b)*a or b(a+b)*b. For the total length to be even, the inner part must have even length, so replace (a+b)* with ((a+b)(a+b))*. Answer: a((a+b)(a+b))*a + b((a+b)(a+b))*b
(c) Groups of 3 a's: b*(ab*ab*ab*)*. Any b's may be interspersed, and a's come in groups of 3. No trailing b* is needed because each group already ends in b*.
(d) No "aa" and no "bb" — strings must alternate: ab, ba, aba, bab, abab, ... Answer: (ab)*(ε+a) + (ba)*(ε+b)
Final Exam Tips
- Finite languages are ALWAYS regular — and so are their complements. This kills many T/F questions.
- p≡q does NOT mean δ(p,a)=δ(q,a) — equivalence = same future behavior, not same transitions.
- For subset construction: show your table with all states and transitions. A state is accepting if it contains any NFA accepting state.
- For pumping lemma: choose s carefully, account for ALL demon splits (not just one), and explicitly show the pumped string leaves L.
- For regex: simplify. If your answer is correct but needlessly complicated, you lose marks. Think about the simplest equivalent expression.
You've completed the full lesson! Go back through any sections that felt unclear, practice writing DFAs and NFAs on paper, and build your cheat sheet. Good luck on your midterm! 🎓