SE 2FA3 — Full Lesson

Chapter 01 — Introduction

The Big Picture

What is this course actually about, and why does it matter?

SE 2FA3 — Formal Languages and Automata — is a course about one of the deepest questions in computer science: what problems can computers solve, and how do we describe the inputs those programs read?

Rather than asking "how do I write a program," this course asks "what kinds of patterns can a machine even recognize?" The answer turns out to be a beautiful hierarchy — and your midterm focuses on the simplest level of that hierarchy: regular languages.

The Core Equivalence — The Most Important Idea

The entire course rests on one stunning theorem:

Regular Expression ⟺ NFA ⟺ DFA ⟺ Regular Language

These four things all describe exactly the same class of languages. They are just different ways of expressing the same power. Understanding this equivalence — and knowing how to convert between them — is the heart of this course.

A Mental Model: The Language Machine

Think of a language as a yes/no filter on strings. You feed it a string, it says "this string is in the language" (accept) or "it's not" (reject).

A DFA (Deterministic Finite Automaton) is one way to physically build such a filter — a machine with a finite number of "rooms" (states) that reads one character at a time and moves between rooms, finally saying yes or no based on which room it ends up in.

What Topics Appear on the Midterm?

Topic	Frequency	Marks
True/False about theory	Every exam	5 marks
DFA Construction	Every exam	2–3 marks
NFA Construction	Every exam	2–3 marks
Regular Expressions	Every exam	5 marks
Subset/Product Construction	Most exams	3–5 marks
Pumping Lemma	Occasionally	5 marks

⚡ Exam Note

You're allowed 1 page of notes (front and back). Build it as you go through this lesson — every key formula is noted with what to write down.

Chapter 02 — Foundations

Alphabets, Strings & Languages

The mathematical vocabulary you need for everything else.

Alphabet (Σ)

Definition

An alphabet Σ is a finite, non-empty set of symbols.

Examples:

Σ = {a, b} — the two-letter alphabet used in most exam problems
Σ = {0, 1} — binary alphabet, used for bit-string problems
Σ = {a, b, c, ..., z} — the English alphabet

Strings

Definition

A string (or word) over Σ is a finite sequence of symbols from Σ.

The length |w| of string w is the number of symbols in it.

The empty string ε (epsilon) has length 0 — it contains no symbols.

String	Length	Over Σ = {a,b}?
`ε`	0	Yes
`a`	1	Yes
`aabb`	4	Yes
`abba`	4	Yes
`abc`	3	No — 'c' not in Σ

Σ* — The Universal Set

Definition

Σ* (pronounced "Sigma star") is the set of ALL strings over Σ, including ε.

Because Σ is non-empty, Σ* is always infinite. (For an empty alphabet, Σ* would contain only ε.)

For Σ = {a, b}:
Σ* = {ε, a, b, aa, ab, ba, bb, aaa, aab, aba, ...}
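
To make the "shortest strings first" listing above concrete, here is a minimal Python sketch that enumerates Σ* in that order. The function name `sigma_star` is made up for this illustration, and ε is represented by the empty Python string.

```python
from itertools import count, islice, product

def sigma_star(alphabet):
    """Yield every string over `alphabet`, shortest first (ε is "")."""
    symbols = sorted(alphabet)
    for length in count(0):                      # lengths 0, 1, 2, ...
        for letters in product(symbols, repeat=length):
            yield "".join(letters)

# Σ* is infinite, so only ever take a finite slice of the generator.
first_ten = list(islice(sigma_star({"a", "b"}), 10))
print(first_ten)
# ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb', 'aaa', 'aab', 'aba']
```

The generator never terminates on its own, which mirrors the fact that Σ* is infinite for any non-empty alphabet.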

Languages

Definition

A language L over Σ is any subset of Σ*.

A language can be finite or infinite, and ∅ (the empty set) and Σ* itself are valid languages.

Examples

Over Σ = {a, b}:

L₁ = {ε, a, b} — finite language with 3 strings
L₂ = {aⁿbⁿ | n ≥ 0} = {ε, ab, aabb, aaabbb, ...} — infinite language
L₃ = {w | w contains at least one "ab"} — another infinite language
L₄ = ∅ — the empty language (no strings at all)

Operations on Languages

Operation	Symbol	Meaning	Example
Union	L₁ ∪ L₂	Strings in L₁ OR L₂	{a} ∪ {b} = {a,b}
Intersection	L₁ ∩ L₂	Strings in BOTH L₁ and L₂	{a,b} ∩ {b,c} = {b}
Complement	Σ* − L	All strings NOT in L	Σ* − {a} = {ε, b, aa, ab, ...}
Concatenation	L₁ · L₂	Strings from L₁ followed by strings from L₂	{ab}·{ba} = {abba}
Kleene Star	L*	Zero or more concatenations of L	{ab}* = {ε, ab, abab, ababab, ...}
Kleene Plus	L⁺	One or more concatenations	{ab}⁺ = {ab, abab, ...}
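
The operations in the table can be tried out on small finite languages. The sketch below represents a language as a Python set of strings; the helper names (`concat`, `star_up_to`, `plus_up_to`) are assumptions of this example, and the star is cut off at a maximum length because L* is usually infinite.

```python
def concat(L1, L2):
    """L1 · L2: every string of L1 followed by every string of L2."""
    return {u + v for u in L1 for v in L2}

def star_up_to(L, max_len):
    """The strings of L* whose length is at most max_len."""
    result = {""}                 # zero concatenations gives ε
    frontier = {""}
    while frontier:
        longer = {w for w in concat(frontier, L) if len(w) <= max_len}
        frontier = longer - result
        result |= frontier
    return result

def plus_up_to(L, max_len):
    """The strings of L⁺ = L · L* whose length is at most max_len."""
    return {w for w in concat(L, star_up_to(L, max_len)) if len(w) <= max_len}

print({"a"} | {"b"})                    # union
print({"a", "b"} & {"b", "c"})          # intersection
print(concat({"ab"}, {"ba"}))           # {'abba'}
print(sorted(star_up_to({"ab"}, 6)))    # ['', 'ab', 'abab', 'ababab']
print(sorted(plus_up_to({"ab"}, 6)))    # ['ab', 'abab', 'ababab']
```

Union and intersection are just the built-in set operators; complement is left out because Σ* − L is infinite whenever L is finite.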

⚡ Key Fact for Exams

Every finite language is regular. This kills many True/False questions! If you see "L = {aⁿbⁿ | n ≤ 100}", don't be fooled — that's a finite set of 101 strings, so it IS regular and its complement IS also regular.

What Makes a Language "Regular"?

A language is regular if it can be described by any of these equivalent formalisms:

A DFA (Deterministic Finite Automaton)
An NFA (Nondeterministic Finite Automaton)
A Regular Expression

Regular languages are closed under union, intersection, complement, concatenation, and Kleene star. If L is regular, any of these operations on L also produces a regular language.

Quick Check: Which of the following is TRUE about Σ*?

Σ* is a finite set

ε ∈ Σ* for any non-empty alphabet

Σ* contains only single symbols

Every language L equals Σ*

Chapter 03 — Automata

Deterministic Finite Automata

The machine that decides if a string belongs to a language.

Formal Definition

Definition — DFA

A DFA M = (Q, Σ, δ, s, F) consists of:

Q — a finite set of states
Σ — the input alphabet
δ: Q × Σ → Q — the transition function (reads a state + symbol, outputs a state)
s ∈ Q — the start state
F ⊆ Q — the set of accepting (final) states

⚠️ Critical Property

The transition function δ is total — every state must have exactly one transition for every symbol in Σ. No missing transitions, no multiple transitions. This is what makes it "deterministic."

How a DFA Reads a String

The extended transition function δ̂ tells us where we end up after reading a whole string:

δ̂(q, ε) = q (reading ε stays put)
δ̂(q, wa) = δ(δ̂(q, w), a) (read w first, then symbol a)

String x is ACCEPTED by M iff δ̂(s, x) ∈ F

In plain English: start at the start state, read the string one symbol at a time following the arrows, and at the end — if you're in an accepting state, accept; otherwise reject.

Anatomy of a DFA Diagram

DFA that accepts all strings containing "ab" as a substring. Σ = {a,b}.

→ single arrow = start state | double circle = accepting state

Tracing a String Through a DFA

Let's trace the string baab through the DFA above.

δ̂(q₀, baab)
→ read 'b': δ(q₀, b) = q₀ (stay — no 'ab' yet)
→ read 'a': δ(q₀, a) = q₁ (saw an 'a' — waiting for 'b')
→ read 'a': δ(q₁, a) = q₁ (another 'a' — still waiting for 'b')
→ read 'b': δ(q₁, b) = q₂ (got "ab"! → accepting state)

End state: q₂ ∈ F ✓ → ACCEPT

Transition Table

A DFA can also be written as a transition table — every state × symbol combination has exactly one output state.

State	On 'a'	On 'b'	Notes
→ q₀	q₁	q₀	Start: haven't seen 'a' yet
q₁	q₁	q₂	Just saw 'a', waiting for 'b'
★ q₂	q₂	q₂	Saw "ab" — stay accepted forever

→ marks start state, ★ marks accepting states
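
The transition table translates almost line for line into code. Below is a minimal Python sketch of this DFA, with δ stored as a dictionary and δ̂ computed by a loop; the names `DELTA`, `delta_hat` and `accepts` are chosen for this example only.

```python
# δ for the "contains ab" DFA, copied from the transition table.
DELTA = {
    ("q0", "a"): "q1", ("q0", "b"): "q0",
    ("q1", "a"): "q1", ("q1", "b"): "q2",
    ("q2", "a"): "q2", ("q2", "b"): "q2",
}
START = "q0"
ACCEPTING = {"q2"}

def delta_hat(state, string):
    """Extended transition function: follow δ one symbol at a time."""
    for symbol in string:
        state = DELTA[(state, symbol)]
    return state

def accepts(string):
    """A string is accepted iff δ̂(start, string) is an accepting state."""
    return delta_hat(START, string) in ACCEPTING

print(delta_hat("q0", "baab"))  # q2
print(accepts("baab"))          # True
print(accepts("bbaa"))          # False
```

Because δ is total, the dictionary has exactly |Q| × |Σ| = 6 entries, and a lookup can never fail on a string over {a, b}.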

What does it mean for a DFA to be "deterministic"?

There can be multiple paths for the same input string

From any state, each symbol leads to exactly one next state

The DFA accepts all possible input strings

ε-transitions are used to move between states

Chapter 04 — Techniques

Building DFAs — How to Actually Do It

A systematic approach with worked examples from past exams.

The Golden Rule

States = Memory. A DFA state represents everything the machine needs to remember about what it has read so far. Ask yourself: "what information do I need to track to decide acceptance?" Each distinct memory value becomes a state.

Step-by-Step Method

1

Identify what you need to remember

What about the string read so far determines the outcome? Could be: "last two characters," "count mod 3," "did I see 'ab'?", "am I in a valid prefix?"

2

List all possible memory values → one state each

Add a "dead/trap" state if strings can become permanently rejected. This state has transitions back to itself on every symbol.

3

Fill in ALL transitions (every state × every symbol)

Every state must have exactly one outgoing edge per symbol. No exceptions.

4

Mark start state (→) and accepting states (double circle)

Start = memory for "read nothing." Accepting = memories that indicate the string satisfies the language property.

5

Test with examples

Trace at least one string that should be accepted and one that should be rejected. Fix any errors.

Worked Example 1: Strings ending in "ab" 2024 Q2

Build a DFA with exactly 4 states for L = {x | x ends with "ab"}.

Step 1 — What do we track?

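We need to know how close we are to completing "ab" at the end. Three situations are enough to decide acceptance, so the minimal DFA has 3 states; to reach the required 4, the "reset" situation is split into two states (q₀ and q₃) that behave identically:

q₀: No progress toward "ab" (start, or the input so far ends in 'b' without ending in "ab")
q₁: Last character was 'a' (potential start of "ab")
q₂: Last two characters were "ab" ← ACCEPT
q₃: Read 'b' right after "ab", so the suffix is "bb". It behaves like q₀ and is not a trap state, because an 'a' still leads to q₁

The full transition table: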

State	Meaning	On 'a'	On 'b'	Accepting?
→ q₀	Start / "reset" (saw 'b' not preceded by 'a')	q₁	q₀	No
q₁	Last char was 'a'	q₁	q₂	No
★ q₂	Last two chars were "ab"	q₁	q₃	Yes
q₃	Was in "ab" state, then saw 'b' — now just "bb" suffix	q₁	q₀	No

Verification

Trace "aab": q₀ →(a) q₁ →(a) q₁ →(b) q₂ ✓ (accepted — ends in "ab")

Trace "aba": q₀ →(a) q₁ →(b) q₂ →(a) q₁ ✗ (rejected — "aba" doesn't end in "ab")

Trace "bab": q₀ →(b) q₀ →(a) q₁ →(b) q₂ ✓ (accepted — ends in "ab")
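
Hand traces cover three strings; a brute-force check covers all short ones. The sketch below encodes the 4-state table and compares it against Python's `str.endswith` on every string over {a, b} up to length 8. The names `ENDS_AB`, `run` and `mismatches` are assumptions of this example.

```python
from itertools import product

# Transition table of the 4-state DFA for "ends with ab".
ENDS_AB = {
    "q0": {"a": "q1", "b": "q0"},
    "q1": {"a": "q1", "b": "q2"},
    "q2": {"a": "q1", "b": "q3"},
    "q3": {"a": "q1", "b": "q0"},
}

def run(table, start, accepting, string):
    """Simulate a DFA given as a nested dictionary."""
    state = start
    for symbol in string:
        state = table[state][symbol]
    return state in accepting

# Collect every string on which the DFA and the specification disagree.
mismatches = [
    "".join(w)
    for n in range(9)
    for w in product("ab", repeat=n)
    if run(ENDS_AB, "q0", {"q2"}, "".join(w)) != "".join(w).endswith("ab")
]
print(mismatches)  # []
```

An empty list means the DFA agrees with the specification on all 511 strings tested. This is a sanity check, not a proof, but it catches most transition mistakes.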

Worked Example 2: Binary multiples of 3 2023 Q2

Build a DFA for L = {x | x is a binary string representing a multiple of 3}.

Key Insight

Track the value of the string read so far, modulo 3. When you read bit b at the end of a number n, the new value is 2n + b. So we only need 3 states!

State	Meaning (value mod 3)	On '0'	On '1'	Accepting?
→★ q₀	value ≡ 0 (mod 3)	q₀	q₁	Yes (0 is multiple of 3)
q₁	value ≡ 1 (mod 3)	q₂	q₀	No
q₂	value ≡ 2 (mod 3)	q₁	q₂	No

Why the transitions work

If current value ≡ r (mod 3) and we read bit b, new value = 2r + b.

From q₀ (r=0): read 0 → 0 mod 3 = 0 = q₀; read 1 → 1 mod 3 = 1 = q₁ ✓

From q₁ (r=1): read 0 → 2 mod 3 = 2 = q₂; read 1 → 3 mod 3 = 0 = q₀ ✓

From q₂ (r=2): read 0 → 4 mod 3 = 1 = q₁; read 1 → 5 mod 3 = 2 = q₂ ✓

For "NOT multiples of 3": flip the accepting states → q₁ and q₂ accept, q₀ doesn't.
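
The update rule "new value = 2r + b (mod 3)" is all the DFA does, so it fits in a few lines of Python. This is a sketch under the assumption that the string is read most-significant bit first; the function names are invented for the example.

```python
def step(remainder, bit):
    """One DFA transition: state r on bit b moves to (2r + b) mod 3."""
    return (2 * remainder + int(bit)) % 3

def remainder_mod_3(bits):
    """Run the DFA from q0 and return the index of the final state."""
    r = 0
    for bit in bits:
        r = step(r, bit)
    return r

def is_multiple_of_3(bits):
    return remainder_mod_3(bits) == 0      # accepting state is q0

def is_not_multiple_of_3(bits):
    return not is_multiple_of_3(bits)      # same DFA, accepting states flipped

print(is_multiple_of_3("110"))   # True  (6 = 2 * 3)
print(is_multiple_of_3("111"))   # False (7)
```

Comparing `remainder_mod_3(format(n, "b"))` with `n % 3` for a range of n is a quick way to confirm that every row of the table is right.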

Worked Example 3: Strings where |x| is even

L = {x ∈ Σ* | |x| is even} over Σ = {a, b}.

Track length mod 2 — only 2 states needed:

State	Meaning	On 'a'	On 'b'	Accepting?
→★ q₀	Even length so far	q₁	q₁	Yes (ε has even length 0)
q₁	Odd length so far	q₀	q₀	No

⚡ Common Exam Trick

When the problem says "exactly 4 states," count carefully. You may need a dead/trap state even if the language seems simple. A trap state has transitions to itself on all symbols and is never accepting.

You want a DFA for L = {w | w contains exactly one 'a'}. How many states do you need?

2 states

3 states

4 states

5 states

Chapter 05 — Automata

Nondeterministic Finite Automata

Easier to design, same power as DFAs.

What Makes an NFA Different?

Definition — NFA

An NFA N = (Q, Σ, δ, s, F) is like a DFA but with three relaxations:

δ: Q × (Σ ∪ {ε}) → 2^Q — transitions return a SET of states (possibly empty)
From any state, on any symbol, you can go to zero, one, or many states
ε-transitions are allowed — free moves without consuming input

Acceptance: A string is accepted if there EXISTS at least one computation path that ends in an accepting state.

DFA vs NFA — Side by Side

Feature	DFA	NFA
Transitions per symbol	Exactly 1	0, 1, or many
ε-transitions	❌ Not allowed	✅ Allowed
Acceptance rule	The (unique) end state ∈ F	At least one path ends in F
Languages recognized	Regular only	Regular only (same!)
Design effort	More states, more complex	Often simpler to design
State count	Up to 2ⁿ states for an n-state NFA	Sometimes exponentially fewer than the DFA

⚠️ Critical Point

Despite the differences, NFA and DFA recognize exactly the same class of languages — the regular languages. For every NFA there exists an equivalent DFA, and vice versa. This is a theorem, not just a claim.

Why Use NFAs Then?

NFAs are much easier to design for many languages. You can "guess" which branch to take, knowing one accepting path is enough. The machine is conceptually parallel — imagine it exploring all paths at once.

Example — NFA for "strings ending in 11"

Σ = {0,1}. This NFA nondeterministically "guesses" when the last two symbols are about to be "11".

NFA for L = {w | w ends in "11"}. The nondeterminism happens at q₀ on '1' — the machine can stay at q₀ OR move to q₁.

Tracing an NFA — Parallel Computation

Think of an NFA as running ALL possible paths simultaneously. When you hit a fork (e.g., on '1' from q₀ you can go to q₀ OR q₁), the machine splits and pursues both.

Trace "011" through the NFA:

Start: {q₀}
Read '0': from q₀ on '0' → {q₀} → Current set: {q₀}
Read '1': from q₀ on '1' → {q₀, q₁} → Current set: {q₀, q₁}
Read '1': from q₀ on '1' → {q₀, q₁}; from q₁ on '1' → {q₂} → Current set: {q₀, q₁, q₂}

End: {q₀, q₁, q₂} ∩ F = {q₂} ≠ ∅ → ACCEPT ✓
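
The "current set" bookkeeping in this trace is easy to mechanize. The sketch below simulates the NFA for "ends in 11" (transitions δ(q₀,0)={q₀}, δ(q₀,1)={q₀,q₁}, δ(q₁,1)={q₂}, everything else ∅) by tracking the set of states it could be in. It assumes there are no ε-transitions, and the names are made up for the example.

```python
# Missing entries mean the empty set: that path simply dies.
NFA_DELTA = {
    ("q0", "0"): {"q0"},
    ("q0", "1"): {"q0", "q1"},
    ("q1", "1"): {"q2"},
}
NFA_ACCEPTING = {"q2"}

def nfa_states_after(string, start="q0"):
    """Return the set of states the NFA could be in after reading string."""
    current = {start}
    for symbol in string:
        # Union of δ(q, symbol) over every state q we might be in.
        current = set().union(*(NFA_DELTA.get((q, symbol), set()) for q in current))
    return current

def nfa_accepts(string):
    """Accept iff at least one reachable state is accepting."""
    return bool(nfa_states_after(string) & NFA_ACCEPTING)

print(sorted(nfa_states_after("011")))  # ['q0', 'q1', 'q2']
print(nfa_accepts("011"))               # True
print(nfa_accepts("110"))               # False
```

Nothing here "guesses": exploring all branches at once is exactly the parallel view of nondeterminism.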

ε-transitions

ε-transitions let you move to another state without reading any input. They're extremely useful for designing NFAs by composing smaller pieces.

Example

An NFA for L((aa+b)*(bb+a)*) might use ε-transitions to connect the two halves, meaning: "guess" when you've finished the first half and are starting the second.

An NFA is in state set {q₁, q₂} after reading some string. q₂ is an accepting state. The string...

Is rejected because q₁ is not accepting

Is accepted because q₂ ∈ F

Is only accepted if both q₁ and q₂ are accepting

Cannot be in two states at once — this is an error

Chapter 06 — Constructions

Subset Construction (NFA → DFA)

The algorithm that proves NFA = DFA in power.

The Core Idea

A DFA must be deterministic — at any point it's in one state. But when simulating an NFA, we might be in multiple states at once. The insight: make those sets of NFA states become single DFA states.

Key Idea: Each DFA state is a set of NFA states. The DFA tracks "which NFA states could we possibly be in right now?"

The Algorithm

1

Start state = {s_NFA}

The DFA starts in the set containing only the NFA's start state. (Include ε-closure if there are ε-transitions.)

2

For each DFA state S and each symbol a, compute δ_DFA(S, a)

δ_DFA(S, a) = ∪_{q ∈ S} δ_NFA(q, a)

Collect all NFA states reachable from any state in S by reading a.

3

Add any new sets as new DFA states

Keep a worklist. Process each unvisited DFA state. ∅ (empty set) is a valid dead state — it transitions to itself on everything.

4

Accepting DFA states = any set containing an NFA accepting state

S is accepting ⟺ S ∩ F_NFA ≠ ∅

Worked Example: NFA for "strings ending in 11"

NFA: Q = {q₀, q₁, q₂}, Σ = {0,1}, s = q₀, F = {q₂}

NFA transitions: δ(q₀,0)={q₀}, δ(q₀,1)={q₀,q₁}, δ(q₁,0)=∅, δ(q₁,1)={q₂}, δ(q₂,0)=∅, δ(q₂,1)=∅

Build the Subset Table

DFA State (Set)	On '0'	On '1'	Accepting?
→ {q₀}	δ(q₀,0) = {q₀}	δ(q₀,1) = {q₀,q₁}	No (q₂ ∉ {q₀})
{q₀,q₁}	δ(q₀,0)∪δ(q₁,0) = {q₀}∪∅ = {q₀}	δ(q₀,1)∪δ(q₁,1) = {q₀,q₁}∪{q₂} = {q₀,q₁,q₂}	No
★ {q₀,q₁,q₂}	δ(q₀,0)∪δ(q₁,0)∪δ(q₂,0) = {q₀}	δ(q₀,1)∪δ(q₁,1)∪δ(q₂,1) = {q₀,q₁,q₂}	Yes ✓ (q₂ ∈ this set)

We only generated 3 DFA states — no others were reachable. The final DFA has states {q₀}, {q₀,q₁}, {q₀,q₁,q₂} with the last one accepting.
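
The worklist procedure can be sketched in Python as follows. This version assumes an NFA without ε-transitions (so no ε-closure step), represents each DFA state as a `frozenset` of NFA states, and uses function and variable names invented for this example.

```python
from collections import deque

def subset_construction(nfa_delta, alphabet, start, nfa_accepting):
    """Build the reachable part of the DFA for an ε-free NFA."""
    start_set = frozenset({start})
    dfa_delta = {}
    seen = {start_set}
    worklist = deque([start_set])
    while worklist:
        S = worklist.popleft()
        for a in alphabet:
            # δ_DFA(S, a) = union of δ_NFA(q, a) over all q in S.
            T = frozenset(r for q in S for r in nfa_delta.get((q, a), ()))
            dfa_delta[(S, a)] = T
            if T not in seen:          # a new DFA state: process it later
                seen.add(T)
                worklist.append(T)
    accepting = {S for S in seen if S & nfa_accepting}
    return seen, dfa_delta, start_set, accepting

# The NFA for "ends in 11" from the worked example.
nfa = {("q0", "0"): {"q0"}, ("q0", "1"): {"q0", "q1"}, ("q1", "1"): {"q2"}}
states, delta, start, accepting = subset_construction(nfa, "01", "q0", {"q2"})

for S in sorted(states, key=len):
    print(sorted(S), "->", sorted(delta[(S, "0")]), sorted(delta[(S, "1")]))
print(len(states))  # 3
```

Only reachable subsets are ever created, which is why the result has 3 states rather than the worst-case 2³ = 8.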

⚡ Exam Tip

Always present subset construction as a table. The table format is exactly what the marker expects and makes your work easy to follow. Show the computation for each cell: "δ(q₀,0) ∪ δ(q₁,0) = ..."

How Many States Can You Get?

An NFA with n states can produce a DFA with up to 2ⁿ states (one for each possible subset). In practice, usually far fewer subsets are reachable. In exam problems, you typically get 3–6 DFA states.

In subset construction, when is a DFA state S an accepting state?

Only if S is the start state

If all NFA states in S are accepting

If S contains at least one NFA accepting state

If S is the empty set ∅

Chapter 07 — Constructions

Product Construction

Combining two DFAs for union and intersection.

The Idea

Given two DFAs M₁ and M₂, the product construction builds a new DFA that runs both simultaneously. Each state of the product DFA is a pair (p, q) — one state from each machine.

Given M₁ = (Q₁, Σ, δ₁, s₁, F₁) and M₂ = (Q₂, Σ, δ₂, s₂, F₂):

Product DFA M = (Q₁ × Q₂, Σ, δ, (s₁,s₂), F) where:
δ((p, q), a) = (δ₁(p, a), δ₂(q, a))

For UNION (L₁ ∪ L₂):
F = {(p,q) | p ∈ F₁ OR q ∈ F₂}

For INTERSECTION (L₁ ∩ L₂):
F = {(p,q) | p ∈ F₁ AND q ∈ F₂}

For DIFFERENCE (L₁ \ L₂):
F = {(p,q) | p ∈ F₁ AND q ∉ F₂}

Worked Example 2024 Q3

Build DFAs for:

M₁: L(M₁) = {x | |x| mod 2 = 0} — strings of even length
M₂: L(M₂) = {x | |x| mod 3 = 0} — strings whose length is divisible by 3

M₁: Even length (Σ = {a,b})

States: e (even, ACCEPT), o (odd). Start: e. δ(e,a)=δ(e,b)=o, δ(o,a)=δ(o,b)=e.

M₂: Length divisible by 3

States: 0 (ACCEPT), 1, 2. Start: 0. Every symbol increments count mod 3: δ(0,a)=δ(0,b)=1, δ(1,a)=δ(1,b)=2, δ(2,a)=δ(2,b)=0.

Product for M₃ = M₁ ∪ M₂

States: (e,0), (e,1), (e,2), (o,0), (o,1), (o,2). Start: (e,0). Accept if p∈F₁ OR q∈F₂.

State	On any symbol	Accept? (even OR div3)
★ (e, 0)	(o, 1)	Yes — e∈F₁ AND 0∈F₂
★ (e, 1)	(o, 2)	Yes — e∈F₁
★ (e, 2)	(o, 0)	Yes — e∈F₁
★ (o, 0)	(e, 1)	Yes — 0∈F₂
(o, 1)	(e, 2)	No
(o, 2)	(e, 0)	No

Product for M₄ = M₁ ∩ M₂

Same structure, but accept only when BOTH are in accepting states (even AND divisible by 3 = divisible by 6):

State	Accept? (even AND div3)
★ (e, 0)	Yes
(e, 1)	No
(e, 2)	No
(o, 0)	No
(o, 1)	No
(o, 2)	No

For intersection, only strings of length divisible by 6 are accepted — makes sense!
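
Here is a minimal Python sketch of the product construction for these two machines. The only difference between union and intersection is the rule that decides which pairs accept, so that rule is passed in as a function. All names (`D1`, `D2`, `product_dfa`, and so on) are assumptions of this example.

```python
from itertools import product

SIGMA = "ab"
# M1: even length. M2: length divisible by 3 (states 0, 1, 2).
D1 = {(p, a): ("o" if p == "e" else "e") for p in "eo" for a in SIGMA}
D2 = {(q, a): (q + 1) % 3 for q in range(3) for a in SIGMA}
F1, F2 = {"e"}, {0}

def product_dfa(combine):
    """combine(p_accepts, q_accepts) -> bool picks the accepting pairs."""
    states = list(product("eo", range(3)))
    delta = {((p, q), a): (D1[(p, a)], D2[(q, a)])
             for (p, q) in states for a in SIGMA}
    accepting = {(p, q) for (p, q) in states if combine(p in F1, q in F2)}
    return delta, ("e", 0), accepting

def accepts(dfa, string):
    delta, state, accepting = dfa
    for a in string:
        state = delta[(state, a)]
    return state in accepting

union = product_dfa(lambda x, y: x or y)
intersection = product_dfa(lambda x, y: x and y)

print(sorted(union[2]))         # [('e', 0), ('e', 1), ('e', 2), ('o', 0)]
print(sorted(intersection[2]))  # [('e', 0)]
```

Passing `lambda x, y: x and not y` instead would give the difference L₁ \ L₂ with no other change.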

⚡ Memory Trick

Union = OR = lenient (accept if either machine accepts)
Intersection = AND = strict (accept only if both accept)

Chapter 08 — Regular Expressions

Regular Expressions

A concise algebraic language for describing regular languages.

Why Regular Expressions?

Building a DFA or NFA can be tedious. Regular expressions (regex) give us a compact algebraic notation for describing the same languages, and they're much easier to write. They're also equivalent — every regex corresponds to an NFA and vice versa.

The Building Blocks

Expression	Language it denotes	Example
`∅`	The empty language — no strings	(nothing)
`ε`	The language {ε} — only the empty string	""
`a` (any symbol)	The language {a} — just that one symbol	"a"
`r + s`	L(r) ∪ L(s) — strings from r OR s	`a+b` → {a, b}
`rs`	L(r) · L(s) — strings from r then s	`ab` → {ab}
`r*`	L(r)* — zero or more repetitions of r	`a*` → {ε, a, aa, ...}

Operator Precedence (High to Low)

1. * (Kleene star) — highest, applies to smallest possible expression
2. Concatenation — middle, left-to-right
3. + (union) — lowest, applies to largest possible expressions

Examples:
ab* = a(b*) — star binds to b only
a+bc = a+(bc) — + is lowest, so bc is one piece
(a+b)* = any combo of a's and b's in any order

Essential Shorthands

Shorthand	Equivalent	Meaning
`r⁺`	`rr*`	One or more r's (the superscript ⁺ is not the union operator +)
`r?`	`ε + r`	Zero or one r (optional)
`Σ`	`(a+b)`	Any single symbol, when Σ={a,b}
`Σ*`	`(a+b)*`	Any string over {a,b}

Reading Regex Aloud

(bb + a)*(aa + b)* reads as: "zero or more occurrences of (bb or a), followed by zero or more occurrences of (aa or b)."

This matches strings like: ε, a, bb, aaa, bba, bbaa, aab, ...
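
You can test course-style expressions with Python's `re` module after a small translation: the textbook union `+` becomes `|`, and ε becomes an empty alternative. The helper below is a sketch that assumes `+` always means union (no r⁺ shorthand) and that ∅ does not occur; `to_python` and `matches` are names invented for this example.

```python
import re

def to_python(textbook_regex):
    """Translate textbook notation into Python `re` syntax.

    Assumes '+' is always union and 'ε' is the empty string.
    """
    return textbook_regex.replace(" ", "").replace("+", "|").replace("ε", "")

def matches(textbook_regex, string):
    """True iff the WHOLE string is in the language of the expression."""
    return re.fullmatch(to_python(textbook_regex), string) is not None

print(to_python("(bb + a)*(aa + b)*"))      # (bb|a)*(aa|b)*
print(matches("(bb + a)*(aa + b)*", "bbaa"))  # True
print(matches("(bb + a)*(aa + b)*", "ba"))    # False
```

`re.fullmatch` is used rather than `re.search` because language membership is about the entire string, not about finding the pattern somewhere inside it.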

What does a*b⁺ describe? (Here ⁺ is the Kleene plus, not union.)

Exactly one 'a' and exactly one 'b'

Any mix of a's and b's

Zero or more a's followed by one or more b's

Zero or more a's followed by zero or more b's

Chapter 09 — Regular Expressions

Writing Regular Expressions

Every exam pattern, with full worked solutions.

Strategy: Think in Constraints

To write a regex for "all strings satisfying property X," ask yourself:

What can come before and after the key pattern?
Can I break this into cases (use +)?
What characters are forbidden in what positions?
Am I tracking a count? Think mod — or use DFA reasoning then write regex from that.

Pattern: strings over {a, b} that do not contain "ab" as a substring.

Key insight: If 'a' is followed by 'b' anywhere, the string is rejected. So once we see any 'a', we can never see a 'b' afterward. This means: all the b's come before all the a's.

Answer: b*a*

Test: "ba" ✓, "bba" ✓, "aaa" ✓, "bbb" ✓, "ε" ✓, "ab" ✗ (correct!), "bab" ✗ (correct!)

Note: "bab" fails because after the middle 'a', there's a 'b'.

Pattern: strings over {a, b} of odd length.

Key insight: An odd-length string can be written as: one character, then any number of pairs of characters.

Answer: (a+b)((a+b)(a+b))*

Or equivalently: ((a+b)(a+b))*(a+b)

The pattern picks any single symbol, then zero or more pairs. Total: 1 + 2k = odd ✓

Test: "a" (length 1 ✓), "aba" (length 3 ✓), "ab" (length 2 ✗ correct).

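Pattern: strings over {a, b} with no two consecutive a's (no "aa" substring).

Key insight: Every 'a' must be immediately followed by a 'b' (or end the string). So a's can only appear as "ab" blocks, except possibly a final 'a'.

Answer: (b+ab)*(ε+a) or equivalently b*(abb*)*(ε+a)

Reading: any number of b's, then any number of (a followed by one or more b's), optionally ending in a single a. Note that b*(ab*)*(ε+a) would be wrong: with no b's inside the group, (ab*)* can produce "aa".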

Test: "aba" ✓, "abab" ✓, "b" ✓, "ε" ✓, "aa" ✗ (correct!), "bab" ✓, "baa" ✗ (correct!)

Pattern: strings over {a, b} that contain "aa" exactly once.

Key insight: The string must contain "aa" exactly once, and no other consecutive a's. In particular "aaa" is out, because it contains two overlapping occurrences of "aa". Split the string around the single "aa": the part before it must have no "aa" and must not end in 'a'; the part after it must have no "aa" and must not start with 'a'.

Let NO_AA = (b+ab)*(ε+a) = strings with no consecutive a's.

Before the "aa": (b+ab)*, which is NO_AA without the optional final 'a', so every 'a' is immediately followed by a 'b'.

After the "aa": (b+ba)*, the mirror image, so every 'a' is immediately preceded by a 'b'.

Answer: (b+ab)* aa (b+ba)*

Test: "aa" ✓, "baab" ✓, "abaaba" ✓, "aaa" ✗ (correct!), "aabaa" ✗ (correct!), "abab" ✗ (correct!)

The tempting answer (b+ab)* aa (b+ab)*(ε+a) is wrong: it accepts "aaa" and "aaab".
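
A brute-force check is a good safety net for case analysis like this. The sketch below tests one candidate expression, (b+ab)* aa (b+ba)*, written in Python `re` syntax, against a direct count of "aa" occurrences on every string up to a given length. The candidate and the helper names are assumptions of this example, and passing the check is evidence rather than proof.

```python
import re
from itertools import product

# Textbook form: (b+ab)* aa (b+ba)*
CANDIDATE = r"(b|ab)*aa(b|ba)*"

def count_aa(w):
    """Number of positions where "aa" starts (so "aaa" counts as 2)."""
    return sum(w[i:i + 2] == "aa" for i in range(len(w) - 1))

def disagreements(max_len):
    """Strings where the regex and the specification give different answers."""
    bad = []
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            w = "".join(letters)
            if (re.fullmatch(CANDIDATE, w) is not None) != (count_aa(w) == 1):
                bad.append(w)
    return bad

print(disagreements(10))  # []
```

Swapping a different candidate into `CANDIDATE` shows immediately which strings it gets wrong, which is handy for debugging an exam-style answer.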

Pattern: strings over {a, b} in which the number of b's is not divisible by 3.

Key insight: The number of b's is 1, 2, 4, 5, 7, 8, ... (not 0, 3, 6, 9...). Split into two cases: #b ≡ 1 (mod 3) or #b ≡ 2 (mod 3).

Let A = a* (any number of a's, used between b's)

Let GRP = ba*ba*ba* = a block with exactly 3 b's, each followed by any number of a's

For #b ≡ 1 (mod 3): a*(ba*ba*ba*)* b a* — groups of 3 b's, plus 1 extra

For #b ≡ 2 (mod 3): a*(ba*ba*ba*)* b a* b a* — groups of 3 b's, plus 2 extra

Answer: union of both cases with +.

Tip: on the exam, these counting patterns are often better designed as a DFA first, then described in English.

Pattern: strings over {0, 1} that contain both "00" and "11" as substrings.

Key insight: Either "00" comes before "11" or "11" comes before "00".

Let ANY = (0+1)*

Answer: (0+1)* 00 (0+1)* 11 (0+1)* + (0+1)* 11 (0+1)* 00 (0+1)*

In words: the string needs at least one "00" AND at least one "11" somewhere. Because "00" and "11" share no symbols, the two occurrences cannot overlap, so one ends before the other begins. The two cases cover whether "00" or "11" appears first.

Pattern: strings over {0, 1} whose length is not divisible by 3.

Key insight: Length ≡ 1 (mod 3) OR length ≡ 2 (mod 3).

Let Σ = (0+1).

Answer: Σ(ΣΣΣ)* + ΣΣ(ΣΣΣ)*

First part: lengths 1, 4, 7, ... (≡ 1 mod 3)

Second part: lengths 2, 5, 8, ... (≡ 2 mod 3)
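
Several of the answers in this chapter can be sanity-checked the same way: translate the expression to Python `re` syntax (union + becomes |), write the property as a one-line predicate, and compare the two on every short string. The table of checks and the names below are assumptions of this sketch.

```python
import re
from itertools import product

CHECKS = [
    # (description, Python regex, reference predicate, alphabet)
    ('no "ab"', r"b*a*", lambda w: "ab" not in w, "ab"),
    ("odd length", r"(a|b)((a|b)(a|b))*", lambda w: len(w) % 2 == 1, "ab"),
    ('no "aa"', r"(b|ab)*(|a)", lambda w: "aa" not in w, "ab"),
    ("#b not divisible by 3",
     r"a*(ba*ba*ba*)*ba*|a*(ba*ba*ba*)*ba*ba*",
     lambda w: w.count("b") % 3 != 0, "ab"),
    ('both "00" and "11"',
     r"(0|1)*00(0|1)*11(0|1)*|(0|1)*11(0|1)*00(0|1)*",
     lambda w: "00" in w and "11" in w, "01"),
    ("length not divisible by 3",
     r"(0|1)((0|1){3})*|(0|1){2}((0|1){3})*",
     lambda w: len(w) % 3 != 0, "01"),
]

def mismatches(regex, predicate, alphabet, max_len=8):
    """Strings up to max_len where regex and predicate disagree."""
    pattern = re.compile(regex)
    return [
        "".join(w)
        for n in range(max_len + 1)
        for w in product(alphabet, repeat=n)
        if (pattern.fullmatch("".join(w)) is not None) != predicate("".join(w))
    ]

results = {name: mismatches(rx, pred, sigma) for name, rx, pred, sigma in CHECKS}
for name, bad in results.items():
    print(name, "->", "OK" if not bad else bad[:5])
```

Every line should print OK. If a line lists strings instead, those are counterexamples to the expression as typed.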

Strategy — State Elimination Method:

If multiple accepting states, add a new single accepting state with ε-transitions from each.
Add a new start state with ε-transition to the original start.
Eliminate states one by one: for each state q being eliminated, for each pair (p, r) of predecessor and successor, add a direct transition labeled p_to_q · q_loop* · q_to_r.
When only start and accept remain, the label on the remaining transition is your regex.

This is algebraic and mechanical — practice it on the 2024 Q4d and 2025 Q4c diagrams.

Chapter 10 — Conversions

Regex ↔ Automaton Conversions

Going back and forth between the three representations.

The Conversion Map

Regex → NFA: Thompson's Construction
NFA → DFA: Subset Construction
DFA → Regex: State Elimination
DFA → DFA: already a DFA, just simplify

Regex → NFA (Thompson's Construction)

Build NFAs for primitive cases, then combine with ε-transitions.

Regex	NFA Structure
`a` (single symbol)	→(s) —a→ ((t))
`r + s` (union)	New start → ε → NFA_r; New start → ε → NFA_s; both old accepts → ε → new accept
`rs` (concat)	NFA_r's accept → ε → NFA_s's start
`r*` (star)	New start/accept → ε → NFA_r start; NFA_r accept → ε → NFA_r start (loop); NFA_r accept → ε → new accept; new start → ε → new accept

Example — NFA for a*aab* 2025 Q4a

This is: (a*)(aa)(b*) — three concatenated pieces.

Build NFA for a* → a loop state
Build NFA for aa → two consecutive a-transitions
Build NFA for b* → a b loop state
Connect with ε-transitions: a* accept → ε → aa start, aa accept → ε → b* start

Or just draw it directly: q₀ is the start state with a self-loop on 'a'; q₀ →(a) q₁ →(a) q₂; q₂ is accepting (double circle) with a self-loop on 'b'. The machine loops on 'a' at q₀, nondeterministically takes one 'a' to q₁ and another 'a' to q₂, and then any b's keep it at q₂.

Regex → DFA Directly

For many exam problems, you can build a DFA directly from a regex by thinking about what the regex means. Identify the states (what do you need to remember?) and fill in transitions.

Example — DFA for (aa+b)*(bb+a)* 2025 Q2b

This language = "any number of (aa or b) blocks, then any number of (bb or a) blocks." The key: in the first part, a's come in pairs; in the second, b's come in pairs.

States needed: track "which part are we in" and "did we see an odd a or b."

This is complex — for exam purposes, build the NFA first (much easier!) then convert via subset construction if a DFA is required.

DFA → Regex (State Elimination)

1

Add new start state q_s with ε-transition to old start state.

2

Add new accepting state q_f with ε-transitions from all old accepting states.

3

Eliminate interior states one by one. When eliminating state q with self-loop R (or ε if no loop):

For each pair of states p (that goes to q) and r (that q goes to), add a direct p→r edge labeled: (label of p→q)(R*)(label of q→r)

If there was already a p→r edge, take the union (+) of old and new labels.

4

When only q_s and q_f remain, the label on q_s→q_f is your regex.
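
The four steps are mechanical enough to sketch in Python. The version below stores edge labels as strings in Python `re` syntax (so union is `|` and ε is the empty string), which lets the result be tested directly with `re.fullmatch`. It assumes the symbols contain no regex metacharacters, and the names `eliminate_states`, `new_start` and `new_final` are invented for this example. The output is correct but not simplified.

```python
import re

def eliminate_states(states, edges, start, accepting):
    """edges maps (p, r) -> label; returns a regex, or None for ∅."""
    R = dict(edges)
    S, T = "new_start", "new_final"       # play the roles of q_s and q_f
    R[(S, start)] = ""                    # "" stands for an ε-edge
    for f in accepting:
        R[(f, T)] = ""
    for q in states:                      # eliminate original states in order
        loop = R.pop((q, q), None)
        star = f"(?:{loop})*" if loop is not None else ""
        incoming = {p: lab for (p, x), lab in R.items() if x == q}
        outgoing = {r: lab for (x, r), lab in R.items() if x == q}
        R = {k: lab for k, lab in R.items() if q not in k}
        for p, a in incoming.items():
            for r, b in outgoing.items():
                new = f"(?:{a}){star}(?:{b})"       # p_to_q · loop* · q_to_r
                R[(p, r)] = f"{R[(p, r)]}|{new}" if (p, r) in R else new
    return R.get((S, T))

# DFA for "contains ab"; parallel edges are merged into one label.
CONTAINS_AB = {
    ("q0", "q0"): "b", ("q0", "q1"): "a",
    ("q1", "q1"): "a", ("q1", "q2"): "b",
    ("q2", "q2"): "a|b",
}
regex = eliminate_states(["q0", "q1", "q2"], CONTAINS_AB, "q0", {"q2"})
print(regex)
print(re.fullmatch(regex, "baab") is not None)  # True
```

Stripping the redundant `(?:...)` groups from the printed result by hand leaves b*aa*b(a|b)*, which in course notation is b*aa*b(a+b)*.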

Exam Note

For the 2025 Q4c and Q4d diagrams, state elimination is the intended approach. Practice doing it on small 3-state DFAs. The algebra can get messy — simplify as you go.

Chapter 11 — Theory

State Equivalence

The concept behind DFA minimization — and a recurring exam trap.

Definition

Two states p and q in a DFA are equivalent (written p ≡ q) if:

∀x ∈ Σ*: δ̂(p, x) ∈ F ⟺ δ̂(q, x) ∈ F

In plain English: from both p and q, the same set of strings leads to acceptance. They behave identically for all future inputs.

The Exam Trap — Two Common Confusions

⚠️ Critical Distinction — Appears Every Exam

p ≡ q does NOT mean δ(p,a) = δ(q,a)

Equivalent states can have transitions going to different states — as long as those different states are themselves equivalent to each other.

Example: Suppose q₀ and q₁ both reject all future inputs. They're equivalent. But δ(q₀, a) = q₂ and δ(q₁, a) = q₃, where q₂ and q₃ are also equivalent to each other. So q₀ ≡ q₁ but δ(q₀, a) ≠ δ(q₁, a).

⚠️ The Other Direction — Also an Exam Trap

If p ∈ F and q ∉ F → p and q are definitely NOT equivalent

Proof: use x = ε as the witness. δ̂(p, ε) = p ∈ F (p accepts ε), but δ̂(q, ε) = q ∉ F (q rejects ε). Since they disagree on ε, they're not equivalent.

This is TRUE and appears as a True/False claim. It's always TRUE.

How to Check Equivalence

Two states p and q are NOT equivalent if there exists a "distinguishing string" x such that exactly one of δ̂(p,x) ∈ F and δ̂(q,x) ∈ F holds.

Example

In a DFA for "strings containing ab": is q₀ ≡ q₁?

The string "ab" does not help: from q₀, reading "ab" goes q₀ → q₁ → q₂ (accept), and from q₁, reading "ab" goes q₁ → q₁ → q₂ (accept). Both accept, so "ab" does not distinguish them.

The string "b" does: from q₀, reading "b" stays at q₀ (reject), while from q₁, reading "b" leads to q₂ (accept). So x = "b" is a distinguishing string → q₀ ≢ q₁.
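
Searching for a distinguishing string can be automated with a breadth-first search over pairs of states. The sketch below does this for the "contains ab" DFA and returns the shortest distinguishing string, or `None` when the two states are equivalent. The function name and data layout are assumptions of this example.

```python
from collections import deque

DELTA = {
    ("q0", "a"): "q1", ("q0", "b"): "q0",
    ("q1", "a"): "q1", ("q1", "b"): "q2",
    ("q2", "a"): "q2", ("q2", "b"): "q2",
}
ACCEPTING = {"q2"}

def distinguishing_string(p, q, alphabet="ab"):
    """Shortest x with exactly one of δ̂(p,x), δ̂(q,x) in F, else None."""
    seen = {(p, q)}
    queue = deque([(p, q, "")])
    while queue:
        s, t, x = queue.popleft()
        if (s in ACCEPTING) != (t in ACCEPTING):
            return x                       # the pair disagrees after reading x
        for a in alphabet:
            nxt = (DELTA[(s, a)], DELTA[(t, a)])
            if nxt not in seen:            # each pair is explored once
                seen.add(nxt)
                queue.append((*nxt, x + a))
    return None                            # no disagreement reachable: p ≡ q

print(repr(distinguishing_string("q0", "q1")))  # 'b'
print(repr(distinguishing_string("q0", "q2")))  # ''  (ε already distinguishes)
print(distinguishing_string("q1", "q1"))        # None
```

The search terminates because there are only |Q|² pairs of states, which is also why equivalence of states is always decidable.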

Summary of Key Claims

Claim	True/False	Why
p ≡ q → ∀a: δ(p,a) = δ(q,a)	FALSE	Transitions can differ but lead to equivalent states
p ≡ q → ∀a: δ(p,a) ≡ δ(q,a)	TRUE	This is what equivalence actually guarantees
p∈F, q∉F → p ≢ q	TRUE	ε witnesses the difference
p ≢ q → ∃a: δ(p,a) ≠ δ(q,a)	FALSE	Counterexample: p∈F, q∉F with δ(p,a) = δ(q,a) for every a. ε distinguishes p and q even though their transitions are identical

States p and q are equivalent (p ≡ q). Which of the following MUST be true?

δ(p, a) = δ(q, a) for every symbol a

p ∈ F and q ∉ F

For every x ∈ Σ*, δ̂(p,x) ∈ F ⟺ δ̂(q,x) ∈ F

p ∉ F

Chapter 12 — Theory

The Pumping Lemma

The tool for proving languages are NOT regular.

Why Do We Need This?

Not all languages are regular. The Pumping Lemma gives us a way to prove a language cannot be recognized by any DFA, NFA, or regex. The classic non-regular language is L = {aⁿbⁿ | n ≥ 0} — you can't count to n with finite memory.

The Theorem

Pumping Lemma Statement

If L is a regular language, then there exists an integer k ≥ 1 (the pumping length) such that every string s ∈ L with |s| ≥ k can be split into three parts s = xyz where:

|y| ≥ 1 — the middle part y is non-empty
|xy| ≤ k — x and y together are at most k characters
∀i ≥ 0: xyⁱz ∈ L — pumping y any number of times keeps the string in L

Using It: The Adversarial Game

To prove L is NOT regular, you play a game against a demon. You win if you show the Pumping Lemma fails.

Demon	Picks k (you don't know what k is)
You	Pick a string s ∈ L with |s| ≥ k (your choice depends on k)
Demon	Picks how to split s = xyz (satisfying conditions 1 and 2)
You	Pick i ≥ 0 to show xyⁱz ∉ L

If you can always find such an i (for any split the demon picks), L is not regular.

Worked Example: L = {aⁿbⁿ | n ≥ 0}

1

Demon picks k.

We don't know what k is, but we know it's some positive integer.

2

We pick s = aᵏbᵏ

This is in L (it has k a's followed by k b's) and |s| = 2k ≥ k. ✓

3

Demon picks s = xyz with |y| ≥ 1, |xy| ≤ k.

Since |xy| ≤ k, and s starts with k a's, both x and y must be entirely within the a-prefix. So x = aᵐ for some m ≥ 0 and y = aʲ for some j ≥ 1, with m + j ≤ k. (We write m rather than i here, because i is reserved for the pumping exponent.)

s = aᵐ · aʲ · aᵏ⁻ᵐ⁻ʲ bᵏ
= x y z

4

We pick i = 0 (remove y).

xy⁰z = xz = aᵏ⁻ʲ bᵏ

This has k−j a's and k b's. Since j ≥ 1, k−j < k. So we have fewer a's than b's → NOT in L. ✓ We win!
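
For any fixed k you can let a program play the demon and try every legal split. The sketch below enumerates all splits of s = aᵏbᵏ with |y| ≥ 1 and |xy| ≤ k and checks that pumping with the chosen i always leaves L. It is a sanity check for particular values of k, not a replacement for the proof, and the function names are invented for this example.

```python
def in_L(w):
    """Membership in L = {a^n b^n | n >= 0}."""
    n = len(w) // 2
    return w == "a" * n + "b" * n

def demon_splits(s, k):
    """Every (x, y, z) with s = xyz, |y| >= 1 and |xy| <= k."""
    for end_xy in range(1, min(k, len(s)) + 1):
        for start_y in range(end_xy):
            yield s[:start_y], s[start_y:end_xy], s[end_xy:]

def we_always_win(k, i=0):
    """True if pumping with exponent i defeats every legal split."""
    s = "a" * k + "b" * k
    return all(not in_L(x + y * i + z) for x, y, z in demon_splits(s, k))

print(all(we_always_win(k) for k in range(1, 15)))        # True
print(all(we_always_win(k, i=2) for k in range(1, 15)))   # True
print(we_always_win(5, i=1))                              # False
```

The last line is a reminder that i = 1 never works: xy¹z is just s itself, which is in L by construction.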

⚡ Exam Tip: Critique Questions

The 2025 exam asked you to critique a faulty proof. Things to check for:

Did they pick a string that's actually in L?
Does their split satisfy |xy| ≤ k?
Did they handle ALL possible demon splits (not just one)?
Did they actually show the pumped string leaves L?

Common Choices for s

Language type	Good choice for s	Why it works
{aⁿbⁿ}	aᵏbᵏ	y lands in a's only; pumping unbalances a's and b's
{ww | w ∈ Σ*}	aᵏbᵏaᵏbᵏ	Pumping disrupts the duplication pattern
Palindromes	aᵏbaᵏ	Pumping a's only on one side breaks palindrome structure
{aᵖ | p prime}	aᵖ where p ≥ k is prime	Pumping with i = p + 1 gives length p(1 + |y|), which is not prime

You're proving L = {aⁿbⁿ} is non-regular. The demon picks k = 5. You choose s = a⁵b⁵. The demon then splits it as x = a², y = a², z = ab⁵. What's wrong?

s is not in L

The split violates the pumping conditions |y| ≥ 1 and |xy| ≤ k

You cannot find any i that pumps outside L

Nothing: the split is valid (|y| = 2 ≥ 1, |xy| = 4 ≤ 5), and pumping with i = 0 gives a³b⁵ ∉ L, so the proof works.

Chapter 13 — Exam Prep

True/False Mastery

All 15 claims from 3 years of exams — with full explanations.

Q1 Strategy

For each claim: if TRUE, you just need to write "True." If FALSE, you must explain WHY with a counterexample or proof. Be precise and concise.

The 6 Recurring Patterns

Every True/False question falls into one of these patterns. Master these and you'll get Q1 every time:

#	Pattern	Answer	Key Insight
A	"Finite language → complement non-regular"	FALSE	All finite languages are regular. Complement of regular = regular.
B	"Every NFA has an equivalent DFA"	TRUE	Subset construction theorem.
C	"There exists NFA with no equivalent DFA"	FALSE	Every NFA has an equivalent DFA (same as B).
D	"p≡q implies δ(p,a)=δ(q,a)"	FALSE	Equivalence is about future acceptance, not identical transitions.
E	"p∈F, q∉F → p,q not equivalent"	TRUE	ε is a distinguishing witness. (δ̂(p,ε)=p∈F but δ̂(q,ε)=q∉F)
F	"Complement of regular is non-regular"	FALSE	Regular languages are closed under complement.

All 15 Claims — Complete Bank

TRUE. This is the NFA→DFA equivalence theorem. Subset construction converts any NFA to an equivalent DFA. No explanation needed — just write "True."

FALSE. L = {ε, ab, aabb} is a finite language. All finite languages are regular. Since regular languages are closed under complement, Σ*−L is also regular.

TRUE. This is the literal definition of DFA acceptance. No explanation needed.

FALSE. The regex requires the string to start with 'b' (the first symbol is a literal 'b'). But "aaba" starts with 'a'. No string starting with 'a' can match this regex.

FALSE. If p ≡ q, then δ(p,a) and δ(q,a) are equivalent states, but they need not be the same state. Counterexample: a DFA with two accepting states p and q where δ(p,a)=q and δ(q,a)=p for every symbol a. Both states accept every string, so p ≡ q, yet δ(p,a) = q ≠ p = δ(q,a).

TRUE. Take any DFA M and add redundant (for example, unreachable) states to get an NFA N that recognizes the same language. M then has fewer states than N. This works only because N is not minimal: every DFA is itself an NFA, so the smallest NFA for a language never has more states than the minimal DFA.

TRUE. Every finite language is regular. The Pumping Lemma holds for all regular languages. Set k = (length of longest string + 1). No string in L satisfies |s| ≥ k, so the condition is vacuously true.

FALSE. For every NFA, subset construction produces an equivalent DFA. No NFA can recognize a language that no DFA can recognize.

FALSE. Regular languages are closed under complement. The complement of any regular language is also regular. (To build a DFA for the complement: flip all accepting states.)

FALSE. Non-equivalence means there exists a string x distinguishing p and q. But the immediate transitions δ(p,a) and δ(q,a) can be the same state. Example: p∈F, q∉F, but δ(p,a)=δ(q,a)=r for all a. Still p≢q (ε witnesses it), but transitions are identical.

FALSE. L has only 101 strings (finite). All finite languages are regular. Complement of a regular language is regular.

TRUE. The first alternative (a+ba)* includes (a)* which contains "aa" (take "a" twice). So aa ∈ L((a+ba)*) ⊆ L((a+ba)* + (ab+ba)*).

TRUE. Classic result: take the language of binary strings whose n-th symbol from the end is 1. An NFA with n+1 states recognizes it, but every equivalent DFA needs at least 2ⁿ states. So there exists an NFA that is smaller than every equivalent DFA.

FALSE. The universal claim fails. Every DFA is already an NFA, so an NFA can have exactly as many states as an equivalent DFA. You can also add useless states to any NFA to make it larger than the minimal equivalent DFA. The claim "ALL NFAs have fewer states than ALL equivalent DFAs" is therefore wrong.

TRUE. Use ε as the witness. δ̂(p,ε) = p ∈ F (p accepts ε). δ̂(q,ε) = q ∉ F (q rejects ε). Since the string ε is accepted from p but rejected from q, p and q are not equivalent by definition.
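
Patterns D and E can be checked on tiny machines. The sketch below decides state equivalence by searching pairs of states for a distinguishing string. The function name equivalent and the two toy DFAs are assumptions made for this illustration, not part of the exam material.

```python
from collections import deque

def equivalent(delta, accepting, p, q, alphabet):
    """True iff no string is accepted from exactly one of p and q."""
    seen = {(p, q)}
    queue = deque([(p, q)])
    while queue:
        u, v = queue.popleft()
        if (u in accepting) != (v in accepting):
            return False      # the string read so far distinguishes p and q
        for a in alphabet:
            nxt = (delta[u, a], delta[v, a])
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True

# Pattern D: p and q swap on every symbol and both accept.
delta_d = {("p", "a"): "q", ("q", "a"): "p"}
print(equivalent(delta_d, {"p", "q"}, "p", "q", "a"))   # True
print(delta_d["p", "a"] == delta_d["q", "a"])            # False

# Pattern E: p accepts, q does not, and both move to the same state r.
delta_e = {("p", "a"): "r", ("q", "a"): "r", ("r", "a"): "r"}
print(equivalent(delta_e, {"p"}, "p", "q", "a"))         # False
```

In the second machine the search stops at the very first pair, which corresponds to ε being the distinguishing witness.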

Chapter 14 — Exam Prep

Full Practice Problems

Exam-style questions with full worked solutions.

How to Use This Section

Try each problem yourself first, then compare against the worked solution that follows it. Honest self-assessment is the key to learning.

For each claim, state True or False. If False, explain why.

(a) If L₁ and L₂ are both non-regular, then L₁ ∪ L₂ is non-regular.

(b) There exists a DFA with 2 states that recognizes L = {w | w has odd length}.

(c) For all DFAs M: L(M) ≠ ∅.

(d) If L is infinite, then L is non-regular.

(e) For all NFAs N, the subset construction produces a DFA with exactly 2^|Q| states.

Solutions:

(a) FALSE. L₁ = {aⁿbⁿ | n≥0} and L₂ = complement of L₁ are both non-regular, but L₁ ∪ L₂ = Σ* which IS regular.

(b) TRUE. States: "even length" (start) and "odd length" (ACCEPT). Every symbol swaps between them, so the machine ends in "odd length" exactly when |w| is odd.

(c) FALSE. A DFA with no accepting states (F = ∅) recognizes the empty language ∅.

(d) FALSE. L = {a}* = {ε, a, aa, ...} is infinite and regular.

(e) FALSE. The subset construction can produce up to 2^|Q| states, but typically far fewer are reachable.

(a) [3 marks] Build a DFA for L = {w ∈ {a,b}* | w contains an even number of a's}.

(b) [2 marks] Build a DFA for L = {w ∈ {0,1}* | the binary number w is divisible by 4}.

Solutions:

(a) Track #a's mod 2. Two states: q_even (ACCEPT, start), q_odd.

δ(q_even, a) = q_odd, δ(q_even, b) = q_even
δ(q_odd, a) = q_even, δ(q_odd, b) = q_odd

Accepting: q_even (ε has 0 a's = even ✓).

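(b) A binary number is divisible by 4 exactly when its last two bits are 00 (reading strings shorter than two bits as padded with leading zeros, so ε and "0" both have value 0 and are accepted). Each state records the last two bits read, which is the value mod 4; the start state s doubles as "nothing read yet".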

States: s (start/accept, last bits "00"), q1 (last bits "01"), q2 (last bits "10"), q3 (last bits "11")
δ(s, 0)=s, δ(s, 1)=q1
δ(q1, 0)=q2, δ(q1, 1)=q3
δ(q2, 0)=s, δ(q2, 1)=q1
δ(q3, 0)=q2, δ(q3, 1)=q3
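
Both transition tables can be checked by brute force. The sketch below simulates each DFA on every string up to length 8 and compares the result with the specification. The names run_dfa and all_strings are hypothetical helpers, and the length bound is an arbitrary choice.

```python
from itertools import product

def run_dfa(delta, start, accepting, w):
    """Simulate a DFA given as a dict (state, symbol) -> state."""
    state = start
    for ch in w:
        state = delta[state, ch]
    return state in accepting

even_a = {("q_even", "a"): "q_odd", ("q_even", "b"): "q_even",
          ("q_odd", "a"): "q_even", ("q_odd", "b"): "q_odd"}

div4 = {("s", "0"): "s", ("s", "1"): "q1",
        ("q1", "0"): "q2", ("q1", "1"): "q3",
        ("q2", "0"): "s", ("q2", "1"): "q1",
        ("q3", "0"): "q2", ("q3", "1"): "q3"}

def all_strings(alphabet, max_len):
    for n in range(max_len + 1):
        for tup in product(alphabet, repeat=n):
            yield "".join(tup)

for w in all_strings("ab", 8):
    assert run_dfa(even_a, "q_even", {"q_even"}, w) == (w.count("a") % 2 == 0)

for w in all_strings("01", 8):
    value = int(w, 2) if w else 0      # ε is read as the number 0
    assert run_dfa(div4, "s", {"s"}, w) == (value % 4 == 0)

print("both DFAs agree with their specifications up to length 8")
```

A finite check like this is evidence, not a proof, but it catches a wrong transition quickly.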

NFA over Σ = {a, b}: Q = {0, 1, 2}, s = 0, F = {2}.

Transitions: δ(0,a) = {0,1}, δ(0,b) = {0}, δ(1,a) = ∅, δ(1,b) = {2}, δ(2,a) = ∅, δ(2,b) = ∅.

(a) Describe the language in plain English.

(b) Convert to a DFA via subset construction. Show all work.

Solutions:

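(a) The language is: all strings that end with "ab". (State 1 = "guessed that this 'a' starts the final 'ab'", state 2 = "read that final 'ab'". State 2 has no outgoing transitions, so any further input kills that branch.)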

(b) Subset Construction Table:

DFA State	On 'a'	On 'b'	Accept?
→ {0}	{0,1}	{0}	No
{0,1}	{0,1}	{0,2}	No
★ {0,2}	{0,1}	{0}	Yes (2∈F)

3 DFA states. Start: {0}. Accept: {0,2}.
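
The table above can be reproduced mechanically. The sketch below is a minimal subset construction for NFAs without ε-moves that explores only reachable subsets. The function name subset_construction and the dictionary encoding of δ are choices made for this illustration.

```python
from collections import deque

def subset_construction(nfa_delta, start, accepting, alphabet):
    """Build the reachable part of the subset-construction DFA (no ε-moves)."""
    start_set = frozenset([start])
    dfa_delta = {}
    seen = {start_set}
    queue = deque([start_set])
    while queue:
        S = queue.popleft()
        for a in alphabet:
            # Union of the NFA moves from every state in S; missing entries mean ∅.
            T = frozenset(r for q in S for r in nfa_delta.get((q, a), ()))
            dfa_delta[S, a] = T
            if T not in seen:
                seen.add(T)
                queue.append(T)
    # A subset is accepting if it contains at least one NFA accepting state.
    dfa_accepting = {S for S in seen if S & accepting}
    return dfa_delta, start_set, dfa_accepting, seen

nfa = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}
delta, start, acc, states = subset_construction(nfa, 0, {2}, "ab")

for S in sorted(states, key=sorted):
    print(sorted(S), sorted(delta[S, "a"]), sorted(delta[S, "b"]), S in acc)
# [0] [0, 1] [0] False
# [0, 1] [0, 1] [0, 2] False
# [0, 2] [0, 1] [0] True
```

Only 3 of the 2³ = 8 possible subsets are reachable here, which is why the table has three rows.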

Σ = {a, b}. Write regular expressions for:

(a) All strings where every 'a' is immediately followed by 'b'.

(b) All strings of even length that start and end with the same symbol.

(c) All strings where the number of a's is divisible by 3.

(d) All strings that contain neither "aa" nor "bb".

Solutions:

(a) Every 'a' must be immediately followed by 'b', so a's can only appear inside the block "ab", while b's may appear freely. Answer: (b + ab)*

(b) Split on the shared first and last symbol: a(a+b)*a or b(a+b)*b. For the total length to be even, the inner part must have even length, so replace (a+b)* with ((a+b)(a+b))*. Answer: a((a+b)(a+b))*a + b((a+b)(a+b))*b. The shortest matches are aa and bb; ε is not matched, on the reading that a string needs a first and last symbol to qualify.

(c) Group the a's in threes, with any number of b's in between: b*(ab*ab*ab*)*. Each pass through the starred group contributes exactly three a's.

(d) No "aa" and no "bb" — strings must alternate: ab, ba, aba, bab, abab, ... Answer: (ab)*(ε+a) + (ba)*(ε+b)
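
Regex answers are easy to get subtly wrong, so a brute-force comparison against the plain-English specification is a useful habit. The sketch below transcribes the four answers into Python's re syntax (in equivalent form where that is shorter), which writes union as | instead of +. The CASES table and the length bound of 8 are assumptions made for this illustration.

```python
import re
from itertools import product

# name -> (regex in Python syntax, specification as a predicate on strings)
CASES = {
    "a": (r"(b|ab)*",
          lambda w: all(w[i + 1:i + 2] == "b" for i, c in enumerate(w) if c == "a")),
    "b": (r"a((a|b)(a|b))*a|b((a|b)(a|b))*b",
          lambda w: len(w) > 0 and len(w) % 2 == 0 and w[0] == w[-1]),
    "c": (r"b*(ab*ab*ab*)*",
          lambda w: w.count("a") % 3 == 0),
    "d": (r"(ab)*a?|(ba)*b?",
          lambda w: "aa" not in w and "bb" not in w),
}

for name, (pattern, spec) in CASES.items():
    for n in range(9):
        for tup in product("ab", repeat=n):
            w = "".join(tup)
            matched = re.fullmatch(pattern, w) is not None
            assert matched == spec(w), (name, w)

print("all four regexes agree with their specifications up to length 8")
```

re.fullmatch is used so the whole string must match, which corresponds to language membership rather than substring search.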

Final Exam Tips

⚡ Top 5 Things to Remember

Finite languages are ALWAYS regular — and so are their complements. This kills many T/F questions.
p≡q does NOT mean δ(p,a)=δ(q,a) — equivalence = same future behavior, not same transitions.
For subset construction: show your table with all states and transitions. A state is accepting if it contains any NFA accepting state.
For pumping lemma: choose s carefully, account for ALL demon splits (not just one), and explicitly show the pumped string leaves L.
For regex: simplify. If your answer is correct but needlessly complicated, you lose marks. Think about the simplest equivalent expression.

You've completed the full lesson! Go back through any sections that felt unclear, practice writing DFAs and NFAs on paper, and build your cheat sheet. Good luck on your midterm! 🎓