Work in progress!
Regular expressions are a symbolic representation of regular languages.
They express languages that can be recognized by finite automata.
Regular expressions are more concise than formal definitions of finite automata, and can be algebraically manipulated.
Regular expressions define a set of constants with respect to an alphabet Σ:
For example, let L be the language expressed by the regular expression a(b+c)*d . This expression denotes any word beginning with a single "a", followed by an arbitrary number of "b" or "c", and ending with a single "d".
L = { ad, abd, acd, abbd, abcd, acbd, accd, abbbd, abbcd, abcbd, abccd, acbbd, … }
This is not a complete list. There are an infinite amount of identities of regular expressions.
The union of any set with the empty set is the set itself.
This is similar to adding 0.
The concatenation of two sets is the set of each possible pairwise concatenation of a word in the left set with a word in the right set. Because there are no words in the empty set, there is no concatenation of any word with an element of the empty set.
This is similar to multiplying by zero.
The concatenation of any word with the null word is the word itself.
This is similar to multiplying by 1.
Kleene closure represents an arbitrary number of repeated concatenations.
By (3), this concatenation will equal the word itself, which in this case is ε.
This is similar to raising 1 to the n-th power.
Be careful! One could assume ∅* = ∅, but the Kleene closure of any language always includes ε.
The union of any set with itself is itself.
The concatenation of "at least zero" with "at least zero" remains "at least zero".
The sum of two arbitrary natural numbers is another arbitrary natural number.
The concatenation of "exactly one" with "at least zero" is "at least one".
Both forms correspond to the Kleene plus operation.
The product of two arbitrary numbers is another arbitrary number.
By (8), RR* and R*R each represent the Kleene plus of R.
Kleene Plus is simply Kleene closure minus the null word.
Consider words expressed by the left hand side (QR)n for any value of n ≥ 0:
Concatenation distributes across unions.
This is similar to distribution multiplication across addition in algebra.
However, because concatenation is non-commutative, the direction of distribution is important.
Arden's Lemma is used to solve systems of equations of regular languages.
First, consider the first equation, R = Q + RS.
That is, R can expand to either Q or RS.
Suppose it expands to RS. Then, the subsequent R can be evaluated again.
R → RS → RSS → … → RSSS…SSS
Suppose R expands to Q. Then, the evaluation is terminated.
RSSS…SSS → QSSS…SSS
This expression cannot be expanded any further.
Similarly for the second equation, R = Q + SR.
Unlike this first equation which is recursive to the left, this one is recursive to the right.
Thus, the terminating expression Q will be on the right.
R → SR → SSR → … → SSS…SSSR → SSS…SSSQ
If ε ∈ S, then there are infinitely many solutions, i.e. R = Σ*.
In this case, R = Q + SR ⇒ Σ* = Q + SΣ*
Because S can potentially be nullable, it does not restrict Σ* by concatenation.
Therefore, Σ* = Q + SΣ* ⇒ Σ* = Q + Σ* ⇒ Σ* = Σ*
If ε ∉ S, then the values of RS or SR cannot equal R.
This restricts the possible values of R that satisfy the equation to just one.
1. Which of the following languages is not equivalent to (R + S)* ?
2. Prove: RaR*Rb = RcR*Rd where a, b, c, d ∈ ℕ and a+b = c+d