Theory of Automata

The theory of automata is the study of abstract machines and computational problems. The computation problems can be solved using these abstract machines. The abstract machine is called the automata or automaton.

Different names of Automata Theory are 

  • Theory of computation (TOC)
  • Automata Theory
  • Theory of Formal Languages and Computation
  • Formal languages, Automata, Grammars 

Terminologies in Theory of Automata

Below are the most important terminologies explained in simple and clear wording

1. Alphabet (Σ) and Symbol

An alphabet denoted as “Σ” (also called sigma) is a nonempty finite set of symbols. The elements of the alphabet are called symbols. A symbol is a single element (atomic entity) of Length 1.

Example:

  • Σ = {0,1}

  • Σ = {a,b}

  • Σ = {1,2,3}

  • Σ = {1,2,3, a,b,c}

In the above examples 0,1, a,b, 1,2,3 are symbols that contain the length exactly 1.

Important:

ε (epsilon) is not an alphabet symbol; it is the unique empty string of length 0.

2. String (Word)

A string is a finite sequence of symbols from an alphabet (Σ).

  • If Σ = {0,1}, then 0, 01, 1101, 11010,…so on, are valid strings.
  • If Σ = {a,b}, then a, aaab, bbbba, ababba,…so on, are valid strings.
  • If Σ = {0,1, 2,3,…..,9}, then 12345, 56654, 11455, 8888888,…so on, are valid strings.
  • If Σ = {a,b,c,d,…,z}, then theory, beautiful, hello,…so on, are valid strings.

The number of symbols in a string is called the length of the string.  For example, if w = 1011, then the length of the string (|w|) = 4. The string with no symbols is called the empty string.It is represented by ε (called epsilon). The length of an empty string is 0

  • The empty string ε is a string of length 0; for example, if Σ = {a, b}, then ε ∈ Σ*. The empty set is a set with no elements; for example, L = ∅ means the language contains no strings at all.

Let the alphabet be Σ = {a, b}, which means we can form strings using only the symbols a and b.

  • A string of length 0 is called the empty string and is written as ε.
  • For length 1, the possible strings are: a, b.
  • For length 2, the possible strings are: aa, ab, ba, bb.
  • For length 3, the possible strings are: aaa, aab, aba, abb, baa, bab, bba, bbb.
  • For length 4, the possible strings are: aaaa, aaab, aaba, aabb, abaa, abab, abba, abbb, baaa, baab, baba, babb, bbaa, bbab, bbba, bbbb.

In general, for alphabet size 2, the total number of strings of length n is 2ⁿ.

Important Points:

  • If Σ = {0,1}, then the string 01 is not equal to the string 10.
  • If Σ = {0,1}, then strings like a110 or 10b cannot be derived from the alphabet because the alphabet does not contain the symbols a and b.

Set Vs String

  • Set: A collection of distinct elements where order does not matter, e.g., {a, b} = {b, a}.
  • String: An ordered sequence of symbols where order matters, e.g., “ab” ≠ “ba”.

Here is a clear comparison table for Set vs String in TOC:

Basis Set String
Definition A collection of elements A sequence of symbols
Order Order does not matter, i.e., {a,b} = {b,a} Order matters, i.e., ab is not equal to ba
Repetition Duplicate elements not allowed Symbols can repeat, i.e., aababbba
Written As {a, b, c} abc
Example {a, b} ab
Length No concept of length, finite or infinite Has length which is always finite
Empty Form ∅ (empty set) ε (empty string)
Used For Define languages Form elements of languages

Exercise:

Question 1: Let Σ= {a}, How many Strings are there over Σ ?

AnswerIf Σ = {a}, then there are infinitely many strings over Σ because we can form ε, a, aa, aaa, aaaa, and so on endlessly (since Σ* is infinite). For every length n ≥ 0, there is exactly one string (aⁿ), such as ε (length 0), a (length 1), aa (length 2), aaa (length 3), and so forth without end, but each string has finite length.

Question 02: Consider the strings over the alphabet {a,b,c}.

  1. i. How many strings of length 5 are there?
  2. How many strings of length n are there?
  3. How many strings of length 7 are there, in which the first, middle, and last symbols are b?
  4. How many strings of length at most 5 are there?

Answer:

i. Each position has 3 choices. So, 35=243 strings is the answer

ii. Each of n positions has 3 choices. So, 3n

iii. Length = 7, Positions fixed are

  • 1st = b

  • 4th (middle) = b

  • 7th = b

The remaining positions are 4, therefore each has 3 choices. So, 34 = 81 strings

iv. Length ≤ 5 means:
0, 1, 2, 3, 4, 5

Total:

30+31+32+33+34+35 =1+3+9+27+81+243  364 strings

NOTE:

  • Every Individual String has a finite length. But there are infinitely many strings over any Σ.
  • Analogy: Each Movie has a finite length, but we can create infinitely many movies.

Reversal of a String:

The reversal of a string is obtained by writing the symbols of the string in the opposite order.

  • Syntax: If w = a₁a₂a₃…aₙ, then the reversal of the string is written as wᴿ = aₙaₙ₋₁…a₂a₁.
  • Example: If w = abc, then its reversal is wᴿ = cba.

Palindrome String:

A palindrome string is a string that reads the same forward and backward (i.e., w = wᴿ).

  • Example: aba, aa, abba, madam, eye.

Concatenation of Strings

Concatenation is the operation of joining two strings together to form a new string.

  • Notation: If x and y are strings, their concatenation is written as xy.
  • Example: If x = ab and y = ba, then xy = abba. Similarly If x = ab, y = ba, and z = mn, then zxy = mnabba.
  • Length Rule: |xy| = |x| + |y| (the length of the new string is the sum of both lengths).
  • Order Matters: xy ≠ yx in general.
    • Example: Let x = ab and y = c
    • xy = abc
    • yx = cab
    • Since abc ≠ cab, the order of strings matters in concatenation.
  • Associative Property: (xy)z = x(yz).
  • Identity Element: ε (epsilon) acts as the identity element for concatenation.
  • Left Identity: εx = x.
  • Right Identity: xε = x.
  • Example with ε: If x = ab, then εx = ab and xε = ab.

Power of a String

If w = ab, then the power of the string means repeated concatenation of w with itself.

Example:

  • w⁰ = ε
  • w¹ = ab
  • w² = abab
  • w³ = ababab
  • w⁴ = abababab
  • In general, wⁿ means concatenating the string w with itself n times.

Prefix of string

A prefix of a string is any part taken from the beginning of the string.

  • Example: If w = abcde, prefixes are ε, a, ab, abc, abcd, abcde.

Suffix of string

A suffix of a string is any part taken from the end of the string.

  • Example: If w = abcde, suffixes are ε, e, de, cde, bcde, abcde.

Substring of string

A substring is any continuous sequence of symbols inside a string.

  • Example: If w = abcde, substrings include a, ab, bc, bcd, cde, abc, abcde.

Note: every prefix and suffix is a substring, but not every substring is a prefix or suffix.

Number of Prefixes, Suffixes, and Substrings of a String

Consider a string w = abc having length = 3, so n =3. Look at the following table

Concept Formula Number Example
Prefix n + 1 3 + 1 = 4 ε, a, ab, abc
Suffix n + 1 3 + 1 = 4 ε, c, bc, abc
Substring n(n + 1) / 2 3(4) / 2 = 6 a, b, c, ab, bc, abc

How to get Substring formula: n(n + 1) / 2

Let w = abcde (length n = 5). A substring is any continuous part of the string. To count them, we consider all possible starting positions.

  • Starting at position 1 (a): a, ab, abc, abcd, abcde → 5 substrings

  • Starting at position 2 (b): b, bc, bcd, bcde → 4 substrings

  • Starting at position 3 (c): c, cd, cde → 3 substrings

  • Starting at position 4 (d): d, de → 2 substrings

  • Starting at position 5 (e): e → 1 substring

Adding them gives 5 + 4 + 3 + 2 + 1 = 15 substrings. In general, the number of substrings of a string of length n is n(n+1)/2, which matches this example: 5(5+1)/2 = 15.

Subsequence of a String:

A subsequence of a string is formed by deleting zero or more symbols from the string without changing the order of the remaining symbols.

  • Example: Let w = abc, then the subsequences will be ε, a, b, c, ab, ac, bc, abc.

Note: Unlike substrings, a subsequence does not require symbols to be consecutive.

Counting Subsequences

If a string has n different symbols, then each symbol can either be in

cluded or not included in a subsequence.

  • So, for each of the n symbols, there are 2 choices: include or exclude.

  • Total number of subsequences = 2ⁿ

Example:
Let w = abc (n = 3, all symbols different)

  • Total subsequences = 2³ = 8

  • Subsequences: ε, a, b, c, ab, ac, bc, abc

Note: This formula counts the empty subsequence ε as well.

Note: If a symbol repeats, subsequences that differ only in the position of identical symbols are considered the same. The total number of distinct subsequences is less than 2ⁿ.

Set of ALL strings over the given alphabet

The set of all strings over a given alphabet Σ is denoted by Σ* (called the Kleene Star of Σ). It includes all possible strings of any finite length, including the empty string ε.

  • Properties:

    • Includes strings of length 0, 1, 2, 3, … infinitely.
    • Concatenation of any number of symbols from Σ forms a string in Σ*.

The content of Σ* depends on the number of symbols in the alphabet.

  • Unary alphabet (|Σ| = 1), Σ = {a}: Σ* = {ε, a, aa, aaa, …} – strings are just repeated a’s.
  • Binary alphabet (|Σ| = 2), Σ = {0,1}: Σ* = {ε, 0, 1, 00, 01, 10, 11, 000, …} – all combinations of 0 and 1.
  • Ternary alphabet (|Σ| = 3), Σ = {a,b,c}: Σ* = {ε, a, b, c, aa, ab, ac, ba, bb, bc, …} – all combinations of a, b, c.
  • Quaternary alphabet (|Σ| = 4), Σ = {a,b,c,d}: Σ* = {ε, a, b, c, d, aa, ab, ac, ad, ba, bb, …} – all sequences using 4 symbols.
  • Observation: For alphabet size k, number of strings of length n = kⁿ, and Σ* is the union of strings of all lengths 0,1,2,… infinitely.

This shows how Σ* grows rapidly as the alphabet size increases.

3. Language (L)

A language is a set of strings over an alphabet. Let’s take Σ = {a, b}. A language over Σ is any set of strings formed from Σ, including finite or infinite sets. Here are some examples of languages L₁ over Σ:

  • L₁ = {ε} → language containing only the empty string.

  • L₂ = {a, b} → language containing all strings of length 1.

  • L₃ = {aa, ab, ba, bb} → all strings of length 2.

  • L₄ = {a, ab, abb, abbb, …} → strings starting with a.

  • L₅ = {b, ba, baa, baaa, …} → strings starting with b.

  • L₆ = {w ∈ Σ* | w ends with a} → strings of any length that end with a.

  • L₇ = {w ∈ Σ* | w has even length} → strings like ε, ab, aa, bb, aabb, …

  • L₈ = Σ* → language of all possible strings over {a,b}.

  • L₉ = {w ∈ Σ* | w contains exactly two a’s} → strings like “aab”, “aba”, “baa”.

  • L₁₀ = ∅ → empty language containing no strings.

Observation: A language over Σ can be any subset of Σ* (LΣ*), finite or infinite, empty, or the whole set.

Important: Here’s the difference between ∅ (empty set) and ε (empty string) in Theory of Automata:

  • ε (epsilon)

    • Represents the empty string, a string of length 0.

    • It belongs to Σ* for any alphabet Σ.

    • Example: If Σ = {a, b}, ε ∈ Σ*

  • ∅ (empty set)

    • Represents the empty language, a set containing no strings at all.

    • It does not contain ε unless explicitly stated.

    • Example: If L = ∅, then L contains no strings, not even ε.

  • Epsilon Set {ε}:
    • A language (set) containing only the empty string.
    • For examaple L = {ε}
    • Finite language of size 1, contains ε

Operations on Language

In Theory of Automata, a language is a set of strings over an alphabet Σ. Several operations can be performed on languages to create new languages. Here’s a concise explanation:

  • Union (L₁ ∪ L₂) → All strings that belong to L₁ or L₂ or both.

    • Example: L₁ = {a, b}, L₂ = {b, c} → L₁ ∪ L₂ = {a, b, c}

  • Intersection (L₁ ∩ L₂) → All strings that belong to both L₁ and L₂.

    • Example: L₁ = {a, b}, L₂ = {b, c} → L₁ ∩ L₂ = {b}

  • Complement (Lᶜ) → All strings in Σ* that are not in L.

    • Example: Σ = {a, b}, L = {a} → Lᶜ = {ε, b, aa, ab, ba, bb, …}

  • Concatenation (L₁L₂) → All strings formed by joining a string from L₁ with a string from L₂.

    • Example: L₁ = {a, b}, L₂ = {c, d} → L₁L₂ = {ac, ad, bc, bd}

  • Kleene Star (L)* → All strings formed by concatenating zero or more strings from L, including ε.

    • Example: L = {a} → L* = {ε, a, aa, aaa, …}

  • Kleene Plus (L⁺) → All strings formed by concatenating one or more strings from L, excluding ε.

    • Example: L = {a} → L⁺ = {a, aa, aaa, …}

  • Difference (L₁ − L₂) → All strings in L₁ that are not in L₂.

    • Example: L₁ = {a, b, c}, L₂ = {b, c} → L₁ − L₂ = {a}

These operations allow constructing complex languages from simpler ones.

Reversal of a Language:

4. Kleene Star (Σ*)

The set of all possible strings (including ε) formed from Σ.

If Σ = {a}, then:

Σ* = {ε, a, aa, aaa, aaaa, …}

5. Kleene Plus (Σ⁺)

All possible strings except ε.

Σ⁺ = Σ* − {ε}

6. Finite Automaton (FA)

A mathematical machine that accepts or rejects strings.

Main types:

  • DFA (Deterministic Finite Automaton)

  • NFA (Non-Deterministic Finite Automaton)

7. Deterministic Finite Automaton (DFA)

A DFA is defined as:

M = (Q, Σ, δ, q₀, F)

Where:

  • Q → Finite set of states

  • Σ → Input alphabet

  • δ → Transition function

  • q₀ → Initial state

  • F → Set of final states

In DFA:

  • Exactly one transition for each symbol from each state.

8. Non-Deterministic Finite Automaton (NFA)

In NFA:

  • A state may have multiple transitions for the same symbol.

  • May include ε-transitions.

9. Regular Expression (RE)

A formal way to describe regular languages.

Operators:

  • Union ( + or | )

  • Concatenation

  • Kleene Star (*)

Example:

  • (a+b)*abb

10. Regular Language

A language that can be:

  • Accepted by DFA/NFA

  • Described using Regular Expression

11. Grammar

A formal way to generate strings of a language.

G = (V, Σ, P, S)

Where:

  • V → Variables

  • Σ → Terminals

  • P → Productions

  • S → Start symbol

12 Chomsky Hierarchy

Proposed by Noam Chomsky.

Types:

  1. Type 0 → Recursively Enumerable

  2. Type 1 → Context Sensitive

  3. Type 2 → Context Free

  4. Type 3 → Regular

 

Circles and arrows form the automata machine, where Circles represent states, and arrows represent transitions. An automaton takes some input strings as input, and this input goes through some states and may be entered in the final state.

Basic Elements – Theory of Automata

Grammar defines production rules, which generate strings (made from input symbols). These strings form a language, which can be accepted or rejected by an automata machine.

Elements in Theory of Automata

The following are some essential elements of the theory of Automata.

1. Grammar

A grammar is a formal set of rules used to generate strings in a language. It includes:

  • A start symbol
  • Non-Terminals (variables)
  • Terminals (alphabet symbols)
  • Production rules

Example:

Let the grammar be:

  • Start Symbol: S

  • Terminals: a, b

  • All Production Rules:

    • S → aS

    • S → bS

    • S → a

Can represented as S → aS | bS | a

Production rules are the specific rewriting rules in a grammar. They tell how to replace non-terminals with terminals or other non-terminals.

 From above grammar, rules are:

  • S → aS

  • S → bS

  • S → a

These allow generating strings like:

  • a

  • aa (S → aS → aa)

  • aba (S → aS → abS → aba)

Grammar is commonly used to represent a Set of Production Rules with the letter ‘P’.

Note: In all Production rules, P : S → aS | bS | a. Where S → a is one the production rule.

The followings are the major types of  grammar in Automata

  • Type-3: Regular Grammar – Used by Finite Automata to generate Regular Languages.

  • Type-2: Context-Free Grammar (CFG) – Used by Pushdown Automata to generate Context-Free Languages.

  • Type-1: Context-Sensitive Grammar (CSG) – Used by Linear Bounded Automata to generate Context-Sensitive Languages.

  • Type-0: Unrestricted Grammar – Used by Turing Machines to generate Recursively Enumerable Languages.

2. Input Strings

Input strings are sequences of terminal symbols (like a, b) that are derived by applying production rules starting from the start symbol. It is a collection of some symbols from the Sigma (∑). The symbol ‘W’ denotes the string.

String Examples (from the Grammar above):

  • a

  • aa

  • ababa

  • bba

These are input strings we may feed into an automaton for testing.

Important: Sigma (∑) contains a finite set of symbols known as an alphabet. Symbols involve letters, the alphabet, or any picture. Strings are made from input symbols (∑).

Examples: ∑ = {a, b} , ∑ = {A, B, C, D} , ∑ = {0, 1, ….., 5]  

If ∑ = {x, y}, then various strings that can be generated from ∑ are {x, y, xxx, xyxy, yy, xyy, …..,}.

An empty string, also known as an epsilon string, is a string that contains no symbols or has zero occurrences of symbols. Symbol ‘ε’ denotes the epsilon string. A string’s length refers to the number of characters it contains. Symbol ‘|w|’ denotes the length of that string. If the string is w= 010, its length is 3.

3. Language

A language is the set of all valid strings that can be generated by a grammar. Grammar uses production rules to create the set of strings. 

Note: Languages in automata are classified into four major types: Regular, Context-Free, Context-Sensitive, and Recursively Enumerable, based on the Chomsky Hierarchy.

Example 01:

The language L = { all strings over a and b that end in a }

So:

  • Valid strings (in the language): a, aa, aba, bba

  • Invalid strings: b, ab, bb

Example: 02

Language (L) = {Set of string of length 2} = {xx, yy, xy, yx}        //  Finite Language

Example: 03

L2 = {Set of all strings starts with ‘x’} = {x, xx, x, xyy, xyyyy, xyxyxyyyx, ………,}      //  Infinite Language

4. Automata Machine

An automaton (like a DFA or NFA) is a state machine that recognizes the language by processing input strings and decides whether they belong to a language.

It consists of:

  • States
  • Transitions based on input symbols
  • A start state
  • Accepting (final) state(s)

The automaton takes each input string and processes it through states. If it ends in an accepting state, the string is accepted; otherwise, it’s rejected.

Note: Some examples of automata machines are Finite Automaton (FA),  Pushdown Automaton (PDA), Linear Bounded Automaton (LBA), Turing Machine (TM), Moore machine, Mealy machine etc.

Validation of an Automata Machine

  • An automaton is designed to recognize a language (a set of valid strings).

  • To validate that the automaton is correct, we test it using strings:

    • It should accept all strings in the language.

    • It should reject all strings not in the language.

So, we validate the machine by checking how it behaves on many test strings — both valid and invalid — based on the definition of the target language.

 Example:

Let the language L = all binary strings that end in 1.

We can validate the DFA for this language using strings like:

  • “1” → Should be accepted

  • “101” → Accepted

  • “100” → Rejected

  • “” (empty string) → Rejected

If the DFA accepts all strings in L and rejects all others, then it is valid for language L.

Theory of Automata Notations

The following diagram represents the notations used in automata theory.

Notations in Theory Of Automata

Important: Languages are generated according to their grammar and accepted or rejected by automata machines. Automata are abstract machines that accept or reject regular languages.

Note: “It is possible to express all regular languages using Regular Expressions.”

Types of Automata Machines

The following table presents information about different types of automata and their respective recognized languages and computational abilities. Each automaton type ranges from regular to ω-regular language classes. The purpose of the table is to demonstrate the varying computational capacities of different automata models.

Automata Machine Recognized Languages Computational Power
Finite Automaton (FA) Regular Languages Limited to Regular Languages
Two-Way Finite Automaton Regular Languages Same as Finite Automaton
Mealy Machine Output depends on state and input Limited to Regular Languages
Moore Machine Output depends only on the state Limited to Regular Languages
Pushdown Automaton (PDA) Context-Free Languages More powerful than Finite Automaton
Linear Bounded Automaton (LBA) Context-Sensitive Languages More powerful than PDA
Nested Stack Automaton Certain types of nested structures More expressive than PDAs
Register Machine Integer functions Turing-equivalent
Turing Machine (TM) Recursively Enumerable Languages The most powerful model of computation
Multi-Tape Turing Machine Recursively Enumerable Languages Same as the Turing Machine
Non-deterministic Turing Machine (NDTM) Recursively Enumerable Languages Same as the Turing Machine
Deterministic Büchi Automaton ω-limit Languages Recognizes languages with infinite words satisfying Büchi acceptance condition
Nondeterministic Büchi Automaton ω-Regular Languages Recognizes languages with infinite words satisfying Büchi acceptance condition
Weighted Automaton Weighted Languages Assign weights to infinite words
Rabin Automaton, Streett Automaton, Parity Automaton, Muller Automaton ω-Regular Languages Recognize various classes of ω-regular languages

Applications Theory of Automata

Following are the applications of thoery of automata 

  • Compiler Design: Utilizes automata theory for lexical analysis and parsing in compiler construction.
  • Parsing and Syntax Analysis: Uses context-free grammars and pushdown automata for analyzing syntax in compilers.
  • Lexical Analysis: Applies regular expressions and finite automata to identify tokens in programming language source code.
  • Natural Language Processing (NLP): Involves language recognition and syntax analysis for applications like speech recognition.
  • Software Engineering: Utilizes automata for specifying and verifying correctness in software systems.
  • Model Checking: Verifies system specifications to ensure correctness in hardware and software.
  • Network Protocol Design: Uses finite state machines to model and analyze network protocol behavior.
  • DNA Computing: Applies automata concepts for solving computational problems using DNA strands.
  • Formal Verification: Uses automata for verifying correctness in hardware and software systems.
  • Regular Expressions in Text Processing: Widely used for searching and data extraction tasks.
  • Databases: Applies finite automata and regular expressions for query processing and pattern matching.
  • VLSI Design: Utilizes automata theory in modeling and optimizing sequential circuits in VLSI.
  • Robotics: Implements finite state machines to model and control robot behavior based on sensory input.

"Your Support, Our Priority"

"We Make It Easy"