The Myhill-Nerode theorem is a statement about the equivalence classes of words with respect to formal languages. It is used to determine the minimal number of states required by a DFA to recognize a language or to prove that a language is not regular.
Let x, y be words with respect to some alphabet Σ, and L be a language with respect to the same alphabet Σ.
A distinguishing extension between x and y with respect to L is any word z such that either xz or yz is in L, but not both.
If such a word z exists, then x and y are distinguishable with respect to L.
For example, let L be the language of binary words with more instances of 1 than 0.
Let x = 100 and y = 0010.
The extension z = 10 is not distinguishing, because both xz and yz are not in L.
The extension z = 111 is also not distinguishing, because both xz and yz are in L.
The extension z = 11 is distinguishing, because xz is in L but yz is not.
An equivalence class is a set of words that are indistinguishable with respect to some language.
If x and y are words in the same equivalence class, then for any word z, xz and yz are either both accepted or both rejected by L.
Equivalence classes partition Σ*; each word belongs to one and only one equivalence class.
For example, consider the following DFA.
It recognizes binary words that begin and end with opposite symbols.
The words 00 and 010 both end in the same state, q1.
Concatenating any extension to these words corresponds to following the same transitions from the same state.
In other words, if z is any arbitrary word, then 00 ⋅ z ends in the same state as 010 ⋅ z.
More generally, if two words end in the same state in a DFA, they are indistinguishable.
The states of this DFA correspond to the following equivalence classes.
The Myhill-Nerode theorem states that L is regular if and only if ~L has a finite number of equivalence classes.
It follows that if ~L has an infinite number of equivalence classes, L is not regular.
Consider the language L = { an ⋅ bn | n ∈ N }.
That is, the language of all words that are some amount of a's followed by the same amount of b's.