Theory of Automata
The theory of automata is the study of abstract machines and computational problems. The computation problems can be solved using these abstract machines. The abstract machine is called the automata or automaton.
Different names of Automata Theory are
- Theory of computation (TOC)
- Automata Theory
- Theory of Formal Languages and Computation
- Formal languages, Automata, Grammars
Terminologies in Theory of Automata
Below are the most important terminologies explained in simple and clear wording
1. Alphabet (Σ) and Symbol
An alphabet denoted as “Σ” (also called sigma) is a nonempty finite set of symbols. The elements of the alphabet are called symbols. A symbol is a single element (atomic entity) of Length 1.
Example:
-
Σ = {0,1}
-
Σ = {a,b}
-
Σ = {1,2,3}
- Σ = {1,2,3, a,b,c}
In the above examples 0,1, a,b, 1,2,3 are symbols that contain the length exactly 1.
| Important:
ε (epsilon) is not an alphabet symbol; it is the unique empty string of length 0. |
2. String (Word)
A string is a finite sequence of symbols from an alphabet (Σ).
- If Σ = {0,1}, then 0, 01, 1101, 11010,…so on, are valid strings.
- If Σ = {a,b}, then a, aaab, bbbba, ababba,…so on, are valid strings.
- If Σ = {0,1, 2,3,…..,9}, then 12345, 56654, 11455, 8888888,…so on, are valid strings.
- If Σ = {a,b,c,d,…,z}, then theory, beautiful, hello,…so on, are valid strings.
The number of symbols in a string is called the length of the string. For example, if w = 1011, then the length of the string (|w|) = 4. The string with no symbols is called the empty string.It is represented by ε (called epsilon). The length of an empty string is 0
- The empty string ε is a string of length 0; for example, if Σ = {a, b}, then ε ∈ Σ*. The empty set ∅ is a set with no elements; for example, L = ∅ means the language contains no strings at all.
Let the alphabet be Σ = {a, b}, which means we can form strings using only the symbols a and b.
- A string of length 0 is called the empty string and is written as ε.
- For length 1, the possible strings are: a, b.
- For length 2, the possible strings are: aa, ab, ba, bb.
- For length 3, the possible strings are: aaa, aab, aba, abb, baa, bab, bba, bbb.
- For length 4, the possible strings are: aaaa, aaab, aaba, aabb, abaa, abab, abba, abbb, baaa, baab, baba, babb, bbaa, bbab, bbba, bbbb.
In general, for alphabet size 2, the total number of strings of length n is 2ⁿ.
Important Points:
|
Set Vs String
- Set: A collection of distinct elements where order does not matter, e.g., {a, b} = {b, a}.
- String: An ordered sequence of symbols where order matters, e.g., “ab” ≠ “ba”.
Here is a clear comparison table for Set vs String in TOC:
| Basis | Set | String |
|---|---|---|
| Definition | A collection of elements | A sequence of symbols |
| Order | Order does not matter, i.e., {a,b} = {b,a} | Order matters, i.e., ab is not equal to ba |
| Repetition | Duplicate elements not allowed | Symbols can repeat, i.e., aababbba |
| Written As | {a, b, c} | abc |
| Example | {a, b} | ab |
| Length | No concept of length, finite or infinite | Has length which is always finite |
| Empty Form | ∅ (empty set) | ε (empty string) |
| Used For | Define languages | Form elements of languages |
Exercise:
| Question 1: Let Σ= {a}, How many Strings are there over Σ ?
Answer: If Σ = {a}, then there are infinitely many strings over Σ because we can form ε, a, aa, aaa, aaaa, and so on endlessly (since Σ* is infinite). For every length n ≥ 0, there is exactly one string (aⁿ), such as ε (length 0), a (length 1), aa (length 2), aaa (length 3), and so forth without end, but each string has finite length. Question 02: Consider the strings over the alphabet {a,b,c}.
Answer: i. Each position has 3 choices. So, 35=243 strings is the answer ii. Each of n positions has 3 choices. So, 3n iii. Length = 7, Positions fixed are
The remaining positions are 4, therefore each has 3 choices. So, 34 = 81 strings iv. Length ≤ 5 means: Total: 30+31+32+33+34+35 =1+3+9+27+81+243 = 364 strings |
NOTE:
- Every Individual String has a finite length. But there are infinitely many strings over any Σ.
- Analogy: Each Movie has a finite length, but we can create infinitely many movies.
Reversal of a String:
The reversal of a string is obtained by writing the symbols of the string in the opposite order.
- Syntax: If w = a₁a₂a₃…aₙ, then the reversal of the string is written as wᴿ = aₙaₙ₋₁…a₂a₁.
- Example: If w = abc, then its reversal is wᴿ = cba.
Palindrome String:
A palindrome string is a string that reads the same forward and backward (i.e., w = wᴿ).
- Example:
aba,aa,abba, madam, eye.
Concatenation of Strings
Concatenation is the operation of joining two strings together to form a new string.
- Notation: If x and y are strings, their concatenation is written as xy.
- Example: If x = ab and y = ba, then xy = abba. Similarly If x = ab, y = ba, and z = mn, then zxy = mnabba.
- Length Rule: |xy| = |x| + |y| (the length of the new string is the sum of both lengths).
- Order Matters: xy ≠ yx in general.
- Example: Let x = ab and y = c
- xy = abc
- yx = cab
- Since abc ≠ cab, the order of strings matters in concatenation.
- Associative Property: (xy)z = x(yz).
- Identity Element: ε (epsilon) acts as the identity element for concatenation.
- Left Identity: εx = x.
- Right Identity: xε = x.
- Example with ε: If x = ab, then εx = ab and xε = ab.
Power of a String
If w = ab, then the power of the string means repeated concatenation of w with itself.
Example:
- w⁰ = ε
- w¹ = ab
- w² = abab
- w³ = ababab
- w⁴ = abababab
- In general, wⁿ means concatenating the string w with itself n times.
Prefix of string
A prefix of a string is any part taken from the beginning of the string.
- Example: If w = abcde, prefixes are ε, a, ab, abc, abcd, abcde.
Suffix of string
A suffix of a string is any part taken from the end of the string.
- Example: If w = abcde, suffixes are ε, e, de, cde, bcde, abcde.
Substring of string
A substring is any continuous sequence of symbols inside a string.
- Example: If w = abcde, substrings include a, ab, bc, bcd, cde, abc, abcde.
Note: every prefix and suffix is a substring, but not every substring is a prefix or suffix.
Number of Prefixes, Suffixes, and Substrings of a String
Consider a string w = abc having length = 3, so n =3. Look at the following table
| Concept | Formula | Number | Example |
|---|---|---|---|
| Prefix | n + 1 | 3 + 1 = 4 | ε, a, ab, abc |
| Suffix | n + 1 | 3 + 1 = 4 | ε, c, bc, abc |
| Substring | n(n + 1) / 2 | 3(4) / 2 = 6 | a, b, c, ab, bc, abc |
How to get Substring formula: n(n + 1) / 2
Let w = abcde (length n = 5). A substring is any continuous part of the string. To count them, we consider all possible starting positions.
-
Starting at position 1 (a): a, ab, abc, abcd, abcde → 5 substrings
-
Starting at position 2 (b): b, bc, bcd, bcde → 4 substrings
-
Starting at position 3 (c): c, cd, cde → 3 substrings
-
Starting at position 4 (d): d, de → 2 substrings
-
Starting at position 5 (e): e → 1 substring
Adding them gives 5 + 4 + 3 + 2 + 1 = 15 substrings. In general, the number of substrings of a string of length n is n(n+1)/2, which matches this example: 5(5+1)/2 = 15.
Subsequence of a String:
A subsequence of a string is formed by deleting zero or more symbols from the string without changing the order of the remaining symbols.
- Example: Let w = abc, then the subsequences will be ε, a, b, c, ab, ac, bc, abc.
Note: Unlike substrings, a subsequence does not require symbols to be consecutive.
Counting Subsequences
If a string has n different symbols, then each symbol can either be in
cluded or not included in a subsequence.
-
So, for each of the n symbols, there are 2 choices: include or exclude.
-
Total number of subsequences = 2ⁿ
Example:
Let w = abc (n = 3, all symbols different)
-
Total subsequences = 2³ = 8
-
Subsequences: ε, a, b, c, ab, ac, bc, abc
Note: This formula counts the empty subsequence ε as well.
|
Note: If a symbol repeats, subsequences that differ only in the position of identical symbols are considered the same. The total number of distinct subsequences is less than 2ⁿ. |
Set of ALL strings over the given alphabet
The set of all strings over a given alphabet Σ is denoted by Σ* (called the Kleene Star of Σ). It includes all possible strings of any finite length, including the empty string ε.
-
Properties:
- Includes strings of length 0, 1, 2, 3, … infinitely.
- Concatenation of any number of symbols from Σ forms a string in Σ*.
The content of Σ* depends on the number of symbols in the alphabet.
- Unary alphabet (|Σ| = 1), Σ = {a}: Σ* = {ε, a, aa, aaa, …} – strings are just repeated a’s.
- Binary alphabet (|Σ| = 2), Σ = {0,1}: Σ* = {ε, 0, 1, 00, 01, 10, 11, 000, …} – all combinations of 0 and 1.
- Ternary alphabet (|Σ| = 3), Σ = {a,b,c}: Σ* = {ε, a, b, c, aa, ab, ac, ba, bb, bc, …} – all combinations of a, b, c.
- Quaternary alphabet (|Σ| = 4), Σ = {a,b,c,d}: Σ* = {ε, a, b, c, d, aa, ab, ac, ad, ba, bb, …} – all sequences using 4 symbols.
- Observation: For alphabet size k, number of strings of length n = kⁿ, and Σ* is the union of strings of all lengths 0,1,2,… infinitely.
This shows how Σ* grows rapidly as the alphabet size increases.
3. Language (L)
A language is a set of strings over an alphabet. Let’s take Σ = {a, b}. A language over Σ is any set of strings formed from Σ, including finite or infinite sets. Here are some examples of languages L₁ over Σ:
-
L₁ = {ε} → language containing only the empty string.
-
L₂ = {a, b} → language containing all strings of length 1.
-
L₃ = {aa, ab, ba, bb} → all strings of length 2.
-
L₄ = {a, ab, abb, abbb, …} → strings starting with a.
-
L₅ = {b, ba, baa, baaa, …} → strings starting with b.
-
L₆ = {w ∈ Σ* | w ends with a} → strings of any length that end with a.
-
L₇ = {w ∈ Σ* | w has even length} → strings like ε, ab, aa, bb, aabb, …
-
L₈ = Σ* → language of all possible strings over {a,b}.
-
L₉ = {w ∈ Σ* | w contains exactly two a’s} → strings like “aab”, “aba”, “baa”.
-
L₁₀ = ∅ → empty language containing no strings.
Observation: A language over Σ can be any subset of Σ* (L⊆Σ*), finite or infinite, empty, or the whole set.
|
Important: Here’s the difference between ∅ (empty set) and ε (empty string) in Theory of Automata:
|
Operations on Language
In Theory of Automata, a language is a set of strings over an alphabet Σ. Several operations can be performed on languages to create new languages. Here’s a concise explanation:
-
Union (L₁ ∪ L₂) → All strings that belong to L₁ or L₂ or both.
-
Example: L₁ = {a, b}, L₂ = {b, c} → L₁ ∪ L₂ = {a, b, c}
-
-
Intersection (L₁ ∩ L₂) → All strings that belong to both L₁ and L₂.
-
Example: L₁ = {a, b}, L₂ = {b, c} → L₁ ∩ L₂ = {b}
-
-
Complement (Lᶜ) → All strings in Σ* that are not in L.
-
Example: Σ = {a, b}, L = {a} → Lᶜ = {ε, b, aa, ab, ba, bb, …}
-
-
Concatenation (L₁L₂) → All strings formed by joining a string from L₁ with a string from L₂.
-
Example: L₁ = {a, b}, L₂ = {c, d} → L₁L₂ = {ac, ad, bc, bd}
-
-
Kleene Star (L)* → All strings formed by concatenating zero or more strings from L, including ε.
-
Example: L = {a} → L* = {ε, a, aa, aaa, …}
-
-
Kleene Plus (L⁺) → All strings formed by concatenating one or more strings from L, excluding ε.
-
Example: L = {a} → L⁺ = {a, aa, aaa, …}
-
-
Difference (L₁ − L₂) → All strings in L₁ that are not in L₂.
-
Example: L₁ = {a, b, c}, L₂ = {b, c} → L₁ − L₂ = {a}
-
These operations allow constructing complex languages from simpler ones.
Reversal of a Language:
4. Kleene Star (Σ*)
The set of all possible strings (including ε) formed from Σ.
If Σ = {a}, then:
Σ* = {ε, a, aa, aaa, aaaa, …}
5. Kleene Plus (Σ⁺)
All possible strings except ε.
Σ⁺ = Σ* − {ε}
6. Finite Automaton (FA)
A mathematical machine that accepts or rejects strings.
Main types:
-
DFA (Deterministic Finite Automaton)
-
NFA (Non-Deterministic Finite Automaton)
7. Deterministic Finite Automaton (DFA)
A DFA is defined as:
M = (Q, Σ, δ, q₀, F)
Where:
-
Q → Finite set of states
-
Σ → Input alphabet
-
δ → Transition function
-
q₀ → Initial state
-
F → Set of final states
In DFA:
-
Exactly one transition for each symbol from each state.
8. Non-Deterministic Finite Automaton (NFA)
In NFA:
-
A state may have multiple transitions for the same symbol.
-
May include ε-transitions.
9. Regular Expression (RE)
A formal way to describe regular languages.
Operators:
-
Union ( + or | )
-
Concatenation
-
Kleene Star (*)
Example:
-
(a+b)*abb
10. Regular Language
A language that can be:
-
Accepted by DFA/NFA
-
Described using Regular Expression
11. Grammar
A formal way to generate strings of a language.
G = (V, Σ, P, S)
Where:
-
V → Variables
-
Σ → Terminals
-
P → Productions
-
S → Start symbol
12 Chomsky Hierarchy
Proposed by Noam Chomsky.
Types:
-
Type 0 → Recursively Enumerable
-
Type 1 → Context Sensitive
-
Type 2 → Context Free
-
Type 3 → Regular
Circles and arrows form the automata machine, where Circles represent states, and arrows represent transitions. An automaton takes some input strings as input, and this input goes through some states and may be entered in the final state.
Basic Elements – Theory of Automata
Grammar defines production rules, which generate strings (made from input symbols). These strings form a language, which can be accepted or rejected by an automata machine.

The following are some essential elements of the theory of Automata.
1. Grammar
A grammar is a formal set of rules used to generate strings in a language. It includes:
- A start symbol
- Non-Terminals (variables)
- Terminals (alphabet symbols)
- Production rules
Example:
Let the grammar be:
-
Start Symbol: S
-
Terminals: a, b
-
All Production Rules:
-
S → aS
-
S → bS
-
S → a
-
Can represented as S → aS | bS | a
Production rules are the specific rewriting rules in a grammar. They tell how to replace non-terminals with terminals or other non-terminals.
From above grammar, rules are:
-
S → aS
-
S → bS
-
S → a
These allow generating strings like:
-
a
-
aa (S → aS → aa)
-
aba (S → aS → abS → aba)
| Grammar is commonly used to represent a Set of Production Rules with the letter ‘P’.
Note: In all Production rules, P : S → aS | bS | a. Where S → a is one the production rule. |
The followings are the major types of grammar in Automata
-
Type-3: Regular Grammar – Used by Finite Automata to generate Regular Languages.
-
Type-2: Context-Free Grammar (CFG) – Used by Pushdown Automata to generate Context-Free Languages.
-
Type-1: Context-Sensitive Grammar (CSG) – Used by Linear Bounded Automata to generate Context-Sensitive Languages.
-
Type-0: Unrestricted Grammar – Used by Turing Machines to generate Recursively Enumerable Languages.
2. Input Strings
Input strings are sequences of terminal symbols (like a, b) that are derived by applying production rules starting from the start symbol. It is a collection of some symbols from the Sigma (∑). The symbol ‘W’ denotes the string.
String Examples (from the Grammar above):
-
a
-
aa
-
ababa
-
bba
These are input strings we may feed into an automaton for testing.
| Important: Sigma (∑) contains a finite set of symbols known as an alphabet. Symbols involve letters, the alphabet, or any picture. Strings are made from input symbols (∑).
Examples: ∑ = {a, b} , ∑ = {A, B, C, D} , ∑ = {0, 1, ….., 5] |
If ∑ = {x, y}, then various strings that can be generated from ∑ are {x, y, xxx, xyxy, yy, xyy, …..,}.
An empty string, also known as an epsilon string, is a string that contains no symbols or has zero occurrences of symbols. Symbol ‘ε’ denotes the epsilon string. A string’s length refers to the number of characters it contains. Symbol ‘|w|’ denotes the length of that string. If the string is w= 010, its length is 3.
3. Language
A language is the set of all valid strings that can be generated by a grammar. Grammar uses production rules to create the set of strings.
| Note: Languages in automata are classified into four major types: Regular, Context-Free, Context-Sensitive, and Recursively Enumerable, based on the Chomsky Hierarchy. |
Example 01:
The language L = { all strings over a and b that end in a }
So:
-
Valid strings (in the language): a, aa, aba, bba
-
Invalid strings: b, ab, bb
Example: 02
Language (L) = {Set of string of length 2} = {xx, yy, xy, yx} // Finite Language
Example: 03
L2 = {Set of all strings starts with ‘x’} = {x, xx, x, xyy, xyyyy, xyxyxyyyx, ………,} // Infinite Language
4. Automata Machine
An automaton (like a DFA or NFA) is a state machine that recognizes the language by processing input strings and decides whether they belong to a language.
It consists of:
- States
- Transitions based on input symbols
- A start state
- Accepting (final) state(s)
The automaton takes each input string and processes it through states. If it ends in an accepting state, the string is accepted; otherwise, it’s rejected.
| Note: Some examples of automata machines are Finite Automaton (FA), Pushdown Automaton (PDA), Linear Bounded Automaton (LBA), Turing Machine (TM), Moore machine, Mealy machine etc. |
Validation of an Automata Machine
-
An automaton is designed to recognize a language (a set of valid strings).
-
To validate that the automaton is correct, we test it using strings:
-
It should accept all strings in the language.
-
It should reject all strings not in the language.
-
So, we validate the machine by checking how it behaves on many test strings — both valid and invalid — based on the definition of the target language.
Example:
Let the language L = all binary strings that end in 1.
We can validate the DFA for this language using strings like:
-
“1” → Should be accepted
-
“101” → Accepted
-
“100” → Rejected
-
“” (empty string) → Rejected
If the DFA accepts all strings in L and rejects all others, then it is valid for language L.
Theory of Automata Notations
The following diagram represents the notations used in automata theory.

Important: Languages are generated according to their grammar and accepted or rejected by automata machines. Automata are abstract machines that accept or reject regular languages.
Note: “It is possible to express all regular languages using Regular Expressions.”
Types of Automata Machines
The following table presents information about different types of automata and their respective recognized languages and computational abilities. Each automaton type ranges from regular to ω-regular language classes. The purpose of the table is to demonstrate the varying computational capacities of different automata models.
| Automata Machine | Recognized Languages | Computational Power |
|---|---|---|
| Finite Automaton (FA) | Regular Languages | Limited to Regular Languages |
| Two-Way Finite Automaton | Regular Languages | Same as Finite Automaton |
| Mealy Machine | Output depends on state and input | Limited to Regular Languages |
| Moore Machine | Output depends only on the state | Limited to Regular Languages |
| Pushdown Automaton (PDA) | Context-Free Languages | More powerful than Finite Automaton |
| Linear Bounded Automaton (LBA) | Context-Sensitive Languages | More powerful than PDA |
| Nested Stack Automaton | Certain types of nested structures | More expressive than PDAs |
| Register Machine | Integer functions | Turing-equivalent |
| Turing Machine (TM) | Recursively Enumerable Languages | The most powerful model of computation |
| Multi-Tape Turing Machine | Recursively Enumerable Languages | Same as the Turing Machine |
| Non-deterministic Turing Machine (NDTM) | Recursively Enumerable Languages | Same as the Turing Machine |
| Deterministic Büchi Automaton | ω-limit Languages | Recognizes languages with infinite words satisfying Büchi acceptance condition |
| Nondeterministic Büchi Automaton | ω-Regular Languages | Recognizes languages with infinite words satisfying Büchi acceptance condition |
| Weighted Automaton | Weighted Languages | Assign weights to infinite words |
| Rabin Automaton, Streett Automaton, Parity Automaton, Muller Automaton | ω-Regular Languages | Recognize various classes of ω-regular languages |
Applications Theory of Automata
Following are the applications of thoery of automata
- Compiler Design: Utilizes automata theory for lexical analysis and parsing in compiler construction.
- Parsing and Syntax Analysis: Uses context-free grammars and pushdown automata for analyzing syntax in compilers.
- Lexical Analysis: Applies regular expressions and finite automata to identify tokens in programming language source code.
- Natural Language Processing (NLP): Involves language recognition and syntax analysis for applications like speech recognition.
- Software Engineering: Utilizes automata for specifying and verifying correctness in software systems.
- Model Checking: Verifies system specifications to ensure correctness in hardware and software.
- Network Protocol Design: Uses finite state machines to model and analyze network protocol behavior.
- DNA Computing: Applies automata concepts for solving computational problems using DNA strands.
- Formal Verification: Uses automata for verifying correctness in hardware and software systems.
- Regular Expressions in Text Processing: Widely used for searching and data extraction tasks.
- Databases: Applies finite automata and regular expressions for query processing and pattern matching.
- VLSI Design: Utilizes automata theory in modeling and optimizing sequential circuits in VLSI.
- Robotics: Implements finite state machines to model and control robot behavior based on sensory input.