In Recovery... | How expensive are page splits in terms of transaction log?
In the example, I'm going to create a table with approximately 1000-byte long rows:
USE pagesplittest;
CREATE CLUSTERED INDEX BigRows_CL ON BigRows (c1);
INSERT INTO BigRows VALUES (2, 'a');
INSERT INTO BigRows VALUES (3, 'a');
INSERT INTO BigRows VALUES (4, 'a');
INSERT INTO BigRows VALUES (6, 'a');
INSERT INTO BigRows VALUES (7, 'a');
I've engineered the case where the clustered index data page has space for one more row, and ...
I've left a 'gap' at c1=5. Let's add it as part of an explicit transaction and see how much transaction log is generated:
INSERT INTO BigRows VALUES (8, 'a');
SELECT [database_transaction_log_bytes_used] FROM sys.dm_tran_database_transactions
WHERE [database_id] = DB_ID ('PageSplitTest');
-----------------------------------
That's about what I'd expect for that row. Now what about when I cause a page split by inserting the 'missing' c1=5 row into the full page?
COMMIT TRAN
INSERT INTO BigRows VALUES (5, 'a');
SELECT [database_transaction_log_bytes_used] FROM sys.dm_tran_database_transactions
WHERE [database_id] = DB_ID ('PageSplitTest');
-----------------------------------
Wow. 5.5x more bytes are written to the transaction log as part of the system transaction that does the split.
The ratio gets worse as the row size gets smaller. For a row with an approximately 100-byte long row (use the same code as above, but change to a CHAR (100), insert 67 rows with a 'gap' somewhere then insert the 68th to cause the split), the two numbers are 328 and 5924 - the split cause 18 times more log to be generated! For a row with an approximately 10-byte long row, I got numbers of 240 and 10436, because I created skewed data (about 256 rows with the key value 8) and then inserted key value 5 which forced a (rare) non-middle page split. That's a ratio of more than 43 times more log generated! You can try this yourself if you want: I changed the code to have a CHAR (10), inserted values 1, 2, 3, 4, 6, 7, then inserted 256 key values of 8 and then 2 of 5. The resulting page had only 6 rows - it split after the key value 5 - the Storage Engine doesn't always do a 50/50 page split. And that's not even causing nasty cascading page-splits, or splits that have to split a page multiple times to fit a new (variable-sized) row in.
Bottom line: page splits don't just cause extra IOs and index fragmentation, they generate a *lot* more transaction log. And all that log has to be (potentially) backed up, log shipped, mirrored....
Hide