Introduction to B Tree in Data Structure
A B-tree is a self-balancing search tree used in data structures to manage large amounts of sorted data efficiently. B-trees are widely used in database indexing, file systems, and search engines due to their ability to store multiple keys in each node, reducing disk I/O operations. Unlike binary search trees, a B-tree node can have more than two children, allowing efficient handling of large datasets. Key points of B Tree are
- A B-Tree is a self-balancing search tree in which nodes can have multiple keys and children.
- It holds a balanced height where all leaf nodes are at the same level
- It is arranged in sorted order
Components of Each Node In B Tree
The major components of B tree are BP (Block Pointer), Keys, CP (Child Pointer), and DP (Data Pointer). Let’s explain all of these with an simple example of B tree Order with 3
-
BP (Block Pointer):The BP refers to the disk block address of the current node, enabling efficient retrieval from disk storage. It does not point to other nodes, just the current one.
-
Keys: These are stored in sorted order within the node and are used for searching and navigating the tree.
-
CP (Child Pointer): This pointer directs to child nodes (applicable for internal nodes) and is used for tree traversal. For instance, CP1, CP2, CP3, etc., are child pointers that point to disk blocks 200, 300, and 400, respectively (referring to child nodes).
-
DP (Data Pointer): This pointer links to the actual data records that are stored elsewhere. Each key is associated with a DP to access the corresponding data.
Example: Similar to searching for a topic in a book, the B-Tree enables you to jump directly to the relevant data block (via the BP) instead of having to scan everything.
Key Properties for B-tree in Term of Order
B tree of order “m” has the following properties
- Maximum Number of Children:
- Every node can have a maximum of
m
children (wherem
is the order of the B-tree).
- Every node can have a maximum of
- Minimum Number of Children:
- Leaf Node: Minimum children = 0 (a leaf node can be empty).
- Root Node: Minimum children = 2 (if the root has children). If the root node has no children (i.e., the tree is empty or only the root exists), the minimum children is 1.
- Internal Node (non-root): Minimum children = ⌈m/2⌉ (the ceiling of half the order).
- Maximum Number of Keys:
- Every node can have a maximum of
m - 1
keys, which corresponds to having m children (one less key than children).
- Every node can have a maximum of
- Minimum Number of Keys:
- Root Node (with children): Minimum keys = 1 (if the root has children).
- Internal Nodes (non-root): Minimum keys = ⌈m/2⌉ – 1 (the ceiling of half the order minus 1).
- Leaf Nodes: Minimum keys = 0 (a leaf node can be empty).
Key Properties for B-tree with Help of Table
Order (m) |
internal node Max Children (m) |
internal node Min Children (⌈m/2⌉) |
internal node Max Keys (m – 1) |
internal node Min Keys (⌈m/2⌉ – 1) |
---|---|---|---|---|
1 | 1 | ⌈1/2⌉ = ⌈0.5⌉ = 1 | 1 – 1 = 0 | ⌈1/2⌉ – 1 = 1 – 1 = 0 |
2 | 2 | ⌈2/2⌉ = ⌈1⌉ = 1 | 2 – 1 = 1 | ⌈2/2⌉ – 1 = 1 – 1 = 0 |
3 | 3 | ⌈3/2⌉ = ⌈1.5⌉ = 2 | 3 – 1 = 2 | ⌈3/2⌉ – 1 = 2 – 1 = 1 |
4 | 4 | ⌈4/2⌉ = ⌈2⌉ = 2 | 4 – 1 = 3 | ⌈4/2⌉ – 1 = 2 – 1 = 1 |
5 | 5 | ⌈5/2⌉ = ⌈2.5⌉ = 3 | 5 – 1 = 4 | ⌈5/2⌉ – 1 = 3 – 1 = 2 |
Special Cases (Some fixed values)
1.) If Root has some children then Root Min Children will 2.
2.) If the root node has no children then
- Root node Min Children = 1
- Root node Min Keys = 1
- leaf node Min keys = 0
Ordering of Elements (keys) in B Tree
1. Keys Are Sorted Within Nodes
In a B-tree, each node can store multiple keys (up to m - 1
keys if the order is m
). These keys are always stored in sorted order (non-decreasing order). This means that the keys inside a node are arranged from smallest to largest.
Example:
Consider a B-tree with an order of m = 4
(meaning each node can have up to 3 keys and 4 children).
- Example with 03 keys:
[10, 20, 30]
.
These keys are sorted in increasing order. No matter how many keys a node contains, they will always be arranged in the same way: from the smallest on the left to the largest on the right.
2. Children Are Ordered Based on Keys
In a B-tree, the children of each node are also organized in a way that corresponds to the keys stored in the node.
Each internal node of a B-tree contains a number of keys and has one more child pointer than the number of keys. These child pointers point to the subtrees (or children) of the current node. The keys in the node divide the child nodes into intervals based on their values.
Ordering of Children:
- A node with k keys will have k + 1 children (subtrees).
- The key values help determine which child a given key belongs to. Specifically, the relationship is as follows
- The first child (child 0) contains all keys less than the first key in the node.
- The second child (child 1) contains keys that are greater than or equal to the first key but less than the second key.
- The third child (child 2) contains keys that are greater than or equal to the second key but less than the third key, and so on.
- The last child contains all keys that are greater than or equal to the last key.
Examples of B Tree
Example 01: B-tree of order 3 with values from 1 to 10.
Example 02: B-tree of order 4 with random values (5,3,21,9,1,13,2,7,10,12,4,8)
Example 03: B-tree of order 5 with alphabet values (D,H,Z,K,B,P,Q,E,A,S,W,T,C,L,N,Y,M)
Example 04: B-tree of order 05 with values from 1 to 20.
How to find the Order (m) of B-Tree
Understanding memory calculation for a single node (block) is essential, as knowing how many children (m) a node (block) can have determines the order of the B-tree.
Question: Consider a B tree with Key size = 10 bytes block size is 512 bytes, data pointer of size 8 bytes and block pointer is 5 bytes find the order of B tree?
Note: The order (m) of a B-tree is defined as the maximum number of children a node can have.
Solution:
- Block Size (512 bytes)
- The block size represents the storage capacity of each node in the B-tree.
- Each node can hold keys, pointers, and other metadata within 512 bytes.
- Block Pointer (child pointer) Size (5 bytes)
- A block pointer is used to connect nodes in a B-tree.
- Each internal node requires pointers to its child nodes.
- Since a node with m keys has m+1 children, it needs m+1 block pointers.
- Key Size (10 bytes)
- The key represents a unique identifier for data in a B-tree.
- Each key occupies 10 bytes of memory in a node.
- Data Pointer Size (8 bytes)
- Each key in a B-tree has an associated data pointer.
- The data pointer stores the actual location of the data corresponding to the key.
- Each key has one data pointer of 8 bytes.
Step-by-Step Calculation of B-tree Order (m)
Total Node Storage Capacity (Block Size)
Each node has a total storage capacity of 512 bytes.
Memory Calculation for one Node (Block)
A B-tree node contains
- m block pointers (each of 5 bytes) → Total = m × 5
- m-1 keys (each of 10 bytes) → Total = (m – 1) × 10
- m-1 data pointers (each of 8 bytes) → Total = (m – 1) × 8
Thus, the total memory used by a node can be written as:
m×5 + (m−1)×10+(m−1)×8 ≤ 512 5m + 10m−10+ 8m−8 ≤ 512 23m - 18 ≤ 512 23m ≤ 530 m≤530/23 m≤23.04 Since m must be an integer, we take the floor value: m=23
This means that each internal node in this B-tree can have at most 23 children and at most 22 keys (since max keys = m – 1).
Applications of B Tree
- Databases (MySQL, PostgreSQL, Oracle) for indexing.
- File Systems (NTFS, HFS+, EXT4) for efficient storage.
- Key-value stores and Indexing in search engines.
- Operating Systems for managing virtual memory and paging.