Work in progress!

Regular Expressions

Regular expressions are a symbolic representation of regular languages.
They express languages that can be recognized by finite automata.

Regular expressions are more concise than formal definitions of finite automata, and can be algebraically manipulated.

Definition

Regular expressions define a set of constants with respect to an alphabet Σ:

∅, the empty language
ε, the null word
a, where a ∈ Σ (any single symbol)

Regular expressions define a set of operations where Q, R, S are regular expressions:

( QR ) denotes the concatenation of words expressed by Q with words expressed by R.
For example, if Q = { a, b } and R = { c, d }, then QR = { ac, ad, bc, bd }
Concatenation is associative, so (QR)S = Q(RS).
Concatenation is non-commutative, so QR does not necessarily equal RQ.
( Q + R ) denotes the union of words expressed by either Q or R.
For example, if Q = { a, b } and R = { c, d }, then Q + R = { a, b, c, d }
Union is associative, so (Q + R) + S = Q + (R + S).
Union is commutative, so Q + R = R + Q.
( R* ) denotes the Kleene closure of words expressed by R, that is, zero or more concatenated words of R.
For example, if R = { a, b }, then R* = { ε, a, b, aa, ab, ba, bb, aaa, aab, aba, … }
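
The three operations can be tried out on small finite sets of words. The following is a minimal Python sketch, not part of the formal definition: the function names `concat`, `union`, and `star` are chosen here for illustration, and because a Kleene closure is usually infinite, `star` takes an assumed length bound `max_len` and returns only the words up to that length.

```python
from itertools import product

def concat(q, r):
    """Every word of q followed by every word of r."""
    return {x + y for x, y in product(q, r)}

def union(q, r):
    """Words that belong to q or to r."""
    return q | r

def star(r, max_len):
    """Words of the Kleene closure of r that have length <= max_len."""
    result = {""}      # the null word is always in the closure
    frontier = {""}    # words found in the previous round
    while frontier:
        longer = {w for w in concat(frontier, r) if len(w) <= max_len}
        frontier = longer - result
        result |= frontier
    return result

Q = {"a", "b"}
R = {"c", "d"}
print(sorted(concat(Q, R)))  # ['ac', 'ad', 'bc', 'bd']
print(sorted(union(Q, R)))   # ['a', 'b', 'c', 'd']
print(sorted(star(Q, 2), key=lambda w: (len(w), w)))
# ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
```

The empty string `""` stands in for ε, and the empty set `set()` stands in for ∅.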

For example, let L be the language expressed by the regular expression a(b+c)*d. This expression denotes any word beginning with a single "a", followed by any number (possibly zero) of "b" or "c" symbols, and ending with a single "d".
L = { ad, abd, acd, abbd, abcd, acbd, accd, abbbd, abbcd, abcbd, abccd, acbbd, … }
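
The shortest words of L can also be listed mechanically. The sketch below uses Python's `re` module, whose syntax writes union as `|` rather than `+`; the helper `words_up_to` and the length bound of 4 are assumptions made only for this illustration.

```python
import re
from itertools import product

# In Python's regex syntax, '|' plays the role that '+' plays in the text.
pattern = re.compile(r"a(b|c)*d")

def words_up_to(alphabet, max_len):
    """Yield every word over the alphabet, shortest first."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)

L = [w for w in words_up_to("abcd", 4) if pattern.fullmatch(w)]
print(L)
# ['ad', 'abd', 'acd', 'abbd', 'abcd', 'acbd', 'accd']
```

`fullmatch` is used so that the whole word must be expressed by the pattern, not just a prefix of it.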

Regular Expression Cheat Sheet

Basic Identities

For any regular expressions Q, R, and S over Σ:

∅ + R = R
∅R = R∅ = ∅
εR = Rε = R
ε* = ε
∅* = ε
R + R = R
R*R* = R*
RR* = R*R
(R*)* = R*
ε + RR* = ε + R*R = R*
(QR)*Q = Q(RQ)*
(Q + R)S = QS + RS and Q(R + S) = QR + QS

Arden's Lemma

For any regular expressions Q, R, and S over Σ:

If R = Q + RS, then R = QS* .
If R = Q + SR, then R = S*Q .

If ε ∉ S, then these solutions are unique.

This is not a complete list. There are infinitely many identities of regular expressions.

Basic Identities

1. ∅ + R = R

The union of any set with the empty set is the set itself.
This is similar to adding 0.

2. ∅R = R∅ = ∅

The concatenation of two sets is the set of each possible pairwise concatenation of a word in the left set with a word in the right set. Because there are no words in the empty set, there is no concatenation of any word with an element of the empty set.
This is similar to multiplying by zero.

3. εR = Rε = R

The concatenation of any word with the null word is the word itself.
This is similar to multiplying by 1.

4. ε* = ε

Kleene closure represents an arbitrary number of repeated concatenations.
By (3), this concatenation will equal the word itself, which in this case is ε.
This is similar to raising 1 to the n-th power.

5. ∅* = ε

Be careful! One could assume ∅* = ∅, but the Kleene closure of any language always includes ε.

6. R + R = R

The union of any set with itself is itself.

7. R*R* = R*

The concatenation of "at least zero" with "at least zero" remains "at least zero".
The sum of two arbitrary natural numbers is another arbitrary natural number.

8. RR* = R*R

The concatenation of "exactly one" with "at least zero" is "at least one".
Both forms correspond to the Kleene plus operation.

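9. (R*)* = R*

Repeating "at least zero" repetitions "at least zero" times is still "at least zero" repetitions.
The product of two arbitrary natural numbers is another arbitrary natural number.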

10. ε + RR* = ε + R*R = R*

By (8), RR* and R*R each represent the Kleene plus of R.
Kleene plus denotes one or more repetitions, so it omits only the zero-repetition case of Kleene closure. Adding ε back restores R*.

11. (QR)*Q = Q(RQ)*

Consider words expressed by the left-hand side (QR)ⁿQ for any value of n ≥ 0:

(QR)⁰Q = Q
(QR)¹Q = QRQ
(QR)²Q = QRQRQ
(QR)³Q = QRQRQRQ

Consider words expressed by the right hand side Q(RQ)ⁿ for the same values of n:

Q(RQ)⁰ = Q
Q(RQ)¹ = QRQ
Q(RQ)² = QRQRQ
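Q(RQ)³ = QRQRQRQ

For every n, both sides expand to the same alternating pattern of Q and R, so they express the same language.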

12. (Q + R)S = QS + RS and Q(R + S) = QR + QS

Concatenation distributes across unions.
This is similar to distributing multiplication across addition in algebra.
However, because concatenation is non-commutative, the direction of distribution is important.
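
The twelve identities can be spot-checked on concrete languages. The sketch below is only a sanity check under stated assumptions, not a proof: languages are cut off at an assumed bound `MAX_LEN`, the sample languages `Q`, `R`, and `S` are arbitrary choices, and the helper names `trunc`, `cat`, and `star` are hypothetical. Cutting off at a fixed length is sound for this purpose because a word of length n in a concatenation or closure can only be built from words of length at most n.

```python
MAX_LEN = 6  # only words up to this length are compared

def trunc(lang):
    """Keep the words of lang that are at most MAX_LEN long."""
    return frozenset(w for w in lang if len(w) <= MAX_LEN)

def cat(q, r):
    """Concatenation, cut off at MAX_LEN."""
    return trunc(x + y for x in q for y in r)

def star(r):
    """Kleene closure, cut off at MAX_LEN."""
    result, frontier = {""}, {""}
    while frontier:
        frontier = set(cat(frontier, r)) - result
        result |= frontier
    return frozenset(result)

EMPTY = frozenset()      # the empty language
EPS = frozenset({""})    # the language containing only the null word

# Arbitrary sample languages chosen for this illustration.
Q = frozenset({"a", "ab"})
R = frozenset({"b", "ba"})
S = frozenset({"", "c"})

checks = {
    1: (EMPTY | R) == R,
    2: cat(EMPTY, R) == cat(R, EMPTY) == EMPTY,
    3: cat(EPS, R) == cat(R, EPS) == R,
    4: star(EPS) == EPS,
    5: star(EMPTY) == EPS,
    6: (R | R) == R,
    7: cat(star(R), star(R)) == star(R),
    8: cat(R, star(R)) == cat(star(R), R),
    9: star(star(R)) == star(R),
    10: (EPS | cat(R, star(R))) == (EPS | cat(star(R), R)) == star(R),
    11: cat(star(cat(Q, R)), Q) == cat(Q, star(cat(R, Q))),
    12: cat(Q | R, S) == (cat(Q, S) | cat(R, S))
        and cat(Q, R | S) == (cat(Q, R) | cat(Q, S)),
}

for number, holds in checks.items():
    print(number, holds)
```

Each line should print `True`. A single `False` would disprove an identity, but agreement on one choice of languages does not prove it.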

Arden's Lemma

Arden's Lemma is used to solve systems of equations of regular languages.

Arden's Lemma

For any regular expressions Q, R, and S over Σ:

If R = Q + RS, then R = QS* .
If R = Q + SR, then R = S*Q .

These solutions are the smallest languages that satisfy the equations. If ε ∉ S, then these are the only solutions.
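
As an illustration of solving a system with the second form of the lemma, consider a hypothetical pair of equations in the unknowns X and Y over the alphabet { a, b, c }. This system is not taken from the text; it is only a small worked sketch.

```latex
\begin{align*}
X &= aX + bY \\
Y &= \varepsilon + cY \\[4pt]
Y &= c^{*}\varepsilon = c^{*}
  && \text{(lemma with } Q = \varepsilon,\ S = c\text{)} \\
X &= bc^{*} + aX
  && \text{(substitute } Y\text{)} \\
X &= a^{*}bc^{*}
  && \text{(lemma with } Q = bc^{*},\ S = a\text{)}
\end{align*}
```

Because ε ∉ a and ε ∉ c, each application of the lemma yields the only solution, so X = a*bc* is the unique solution for X.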

Proof

First, consider the first equation, R = Q + RS.
That is, R can expand to either Q or RS.
Suppose it expands to RS. Then, the subsequent R can be evaluated again.
R → RS → RSS → … → RSSS…SSS
Suppose R expands to Q. Then, the evaluation is terminated.
RSSS…SSS → QSSS…SSS
This expression cannot be expanded any further.

Similarly for the second equation, R = Q + SR.
Unlike the first equation, which is recursive on the left, this one is recursive on the right.
Thus, the terminating expression Q will be on the right.
R → SR → SSR → … → SSS…SSSR → SSS…SSSQ

If ε ∈ S, then the solution is no longer unique, and there can be infinitely many solutions. For example, R = Σ* also satisfies the second equation.
In this case, R = Q + SR ⇒ Σ* = Q + SΣ*
Because S is nullable, SΣ* contains every word of Σ*, so SΣ* = Σ*.
Therefore, Σ* = Q + SΣ* ⇒ Σ* = Q + Σ* ⇒ Σ* = Σ*

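If ε ∉ S, then every word in RS or SR is strictly longer than the word of R it was built from.
The shortest words of any solution must therefore come from Q, and by induction on word length the membership of every longer word is determined. This restricts the possible values of R that satisfy the equation to just one.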
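
The argument above can be mirrored on concrete languages for the second equation, R = Q + SR. The sketch below is a check under assumptions rather than a proof: languages are cut off at an assumed bound `MAX_LEN`, the sample languages are arbitrary, and `trunc`, `cat`, `star`, and `least_solution` are hypothetical helper names. `least_solution` imitates the repeated expansion in the proof by starting from the empty language and applying R ← Q + SR until nothing changes.

```python
MAX_LEN = 5  # only words up to this length are compared

def trunc(lang):
    """Keep the words of lang that are at most MAX_LEN long."""
    return frozenset(w for w in lang if len(w) <= MAX_LEN)

def cat(q, r):
    """Concatenation, cut off at MAX_LEN."""
    return trunc(x + y for x in q for y in r)

def star(r):
    """Kleene closure, cut off at MAX_LEN."""
    result, frontier = {""}, {""}
    while frontier:
        frontier = set(cat(frontier, r)) - result
        result |= frontier
    return frozenset(result)

def least_solution(q, s):
    """Iterate R <- Q + SR from the empty language until it is stable."""
    r = frozenset()
    while True:
        nxt = trunc(q) | cat(s, r)
        if nxt == r:
            return r
        r = nxt

# Case 1: S is not nullable.
Q = frozenset({"b"})
S = frozenset({"a"})
R = cat(star(S), Q)                  # R = S*Q
assert R == Q | cat(S, R)            # R satisfies R = Q + SR
assert least_solution(Q, S) == R     # the expansion reaches the same R

# Case 2: S is nullable, so a second solution exists.
S_NULL = frozenset({"", "a"})
SIGMA_STAR = star(frozenset({"a", "b"}))
smallest = cat(star(S_NULL), Q)
assert smallest == Q | cat(S_NULL, smallest)      # S*Q still works
assert SIGMA_STAR == Q | cat(S_NULL, SIGMA_STAR)  # so does Σ*
assert smallest != SIGMA_STAR                     # and they differ
print(sorted(R, key=len))  # ['b', 'ab', 'aab', 'aaab', 'aaaab']
```

In the nullable case, S*Q remains the smallest solution, while Σ* over { a, b } is a strictly larger one.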

Exercises

1. Which of the following expressions is not equivalent to (R + S)*?

(R* + S*)*
R*(SR*)*
(R + SR*)*
R*(R*S)*
(R*S*)*

2. Prove: R^aR*R^b = R^cR*R^d where a, b, c, d ∈ ℕ and a+b = c+d
