How to minimize page splits in SQL Server to improve database performance



Page Splits:

A page is 8 KB of data, which can be index-related, data-related, large object binary (LOB) data, and so on.

When you insert rows into a table they go on a page, into 'slots'. Each row has a row length, and only so many rows fit on an 8 KB page. What happens when a row's length increases, for instance because you entered a bigger product name in a varchar column? SQL Server needs to move the other rows along to make room for your modification. If the combined new length of all the rows on the page no longer fits on that page, SQL Server grabs a new page and moves the rows to the right or left of your modification onto it. That is called a 'page split'.
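As a minimal sketch of that effect (the dbo.Products table and its values here are hypothetical, purely for illustration), an in-place UPDATE that lengthens a varchar value makes the row grow inside its slot:

-- Hypothetical table; names and values are illustrative only
CREATE TABLE dbo.Products
( ProductID INT NOT NULL PRIMARY KEY,
  ProductName VARCHAR(200) NOT NULL );
GO

INSERT INTO dbo.Products (ProductID, ProductName) VALUES (1, 'Tea');
GO

-- The longer value makes the row longer; if the rows on its page no longer
-- all fit, SQL Server must move some of them to a new page (a page split)
UPDATE dbo.Products
SET ProductName = REPLICATE('A much longer product name ', 7)
WHERE ProductID = 1;
GO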

Page splits arise when records from one page are moved to another page during changes to your table. Suppose a new record (Martin) must be inserted, in key order, between Adam and Rony. Since there is no room left on that page, some records need to shift around: SQL Server allocates a second page, and the page split occurs when part of the existing records move onto it.

This movement fragments the pages, which is very bad for performance, and is reported as a page split.
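A quick server-wide indicator is the Page Splits/sec counter, which SQL Server exposes through the sys.dm_os_performance_counters DMV. A rough sketch (note this counter also includes the harmless end-page splits of ever-increasing indexes, so it cannot distinguish good splits from bad ones):

-- Cumulative page split count since the instance started
SELECT object_name, counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Page Splits/sec'
  AND object_name LIKE '%Access Methods%';
GO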

Page splits are considered very bad for performance, and there are a number of techniques to reduce, or even eliminate, the risk of page splits.
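One symptom worth checking before any deeper tracing is index fragmentation, using the standard sys.dm_db_index_physical_stats DMV (the same function the tracking query later in this article joins against):

-- Fragmentation per index in the current database; high values often accompany page splits
SELECT OBJECT_NAME(ips.object_id) AS table_name,
       i.name AS index_name,
       ips.avg_fragmentation_in_percent,
       ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
    ON ips.object_id = i.object_id
   AND ips.index_id = i.index_id
ORDER BY ips.avg_fragmentation_in_percent DESC;
GO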

 

Example code for tracking page splits:

We can find the bad page splits using the event sqlserver.transaction_log. This event monitors all activity in the transaction log, so we need to use it with caution. We can filter on the 'operation' field, looking for the value 11, which means LOP_DELETE_SPLIT. This is the deletion of rows that happens when SQL Server moves rows from one page to another during a page split, in other words a bad page split.

 

Extended Events for SQL Server provides a generic tracing and troubleshooting framework that allows deeper and more granular control of tracing than was possible with earlier methods such as DBCC commands, SQL Trace, and Profiler. Those earlier methods still exist; Extended Events is not a replacement for them.

 

For this we need to create the event session in T-SQL. The code to create the session is:

 

IF EXISTS (SELECT 1
           FROM sys.server_event_sessions
           WHERE name = 'PageSplits_Tracker')
    DROP EVENT SESSION [PageSplits_Tracker] ON SERVER;
GO

--The description of the transaction_log event is: "Occurs when a record is added to the
--SQL Server transaction log. This is a very high volume event that will affect the
--performance of the server. Therefore, you should use appropriate filtering to reduce
--the number of events, and only use this event for targeted troubleshooting during a
--short time period."

-- LOP_DELETE_SPLIT : A page split has occurred. Rows have moved physically.

CREATE EVENT SESSION PageSplits_Tracker
ON SERVER
ADD EVENT sqlserver.transaction_log(
    WHERE operation = 11  -- LOP_DELETE_SPLIT
)
ADD TARGET package0.histogram(
    SET filtering_event_name = 'sqlserver.transaction_log',
        source_type = 0,
        source = 'database_id');
GO

 

 

--package0.histogram : You can use the histogram target to troubleshoot performance issues.
--filtering_event_name : Any event present in the Extended Events session.
--source_type : The type of object that the bucket is based on:
--              0 for an event, 1 for an action.
--source : The event column or action name that is used as the data source.

 

-- Start the event session
ALTER EVENT SESSION PageSplits_Tracker
ON SERVER
STATE = START;
GO

 

-- Create the database
CREATE DATABASE Performance_Tracker;
GO
USE [Performance_Tracker];
GO

 

-- Create a badly splitting clustered index table
CREATE TABLE PageSplits
( ROWID UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID() PRIMARY KEY,
  Data INT NOT NULL DEFAULT (RAND()*1000),
  Change_Date DATETIME2 NOT NULL DEFAULT CURRENT_TIMESTAMP);
GO

 

-- This index should mid-split based on the DEFAULT column value
CREATE INDEX IX_PageSplitsPk_Data ON PageSplits (Data);
GO

-- This index should end-split based on the DEFAULT column value
CREATE INDEX IX_PageSplitsPk_ChangeDate ON PageSplits (Change_Date);
GO

 

-- Create a table with an increasing clustered index
CREATE TABLE PageSplits_Index
( ROWID INT IDENTITY NOT NULL PRIMARY KEY,
  Data INT NOT NULL DEFAULT (RAND()*1000),
  Change_Date DATETIME2 NOT NULL DEFAULT DATEADD(mi, RAND()*-1000, CURRENT_TIMESTAMP));
GO

-- This index should mid-split based on the DEFAULT column value
CREATE INDEX IX_PageSplits_Index_ChangeDate ON PageSplits_Index (Change_Date);
GO

 

-- Insert the default values repeatedly into the tables
WHILE 1=1
BEGIN
    INSERT INTO PageSplits DEFAULT VALUES;
    INSERT INTO PageSplits_Index DEFAULT VALUES;
    WAITFOR DELAY '00:00:00.005';
END
GO

 

--If we start up this workload and allow it to run for a couple of minutes, we can then query the histogram
--target for our session to find the database where the mid-page splits are occurring.

 

-- Query the target data to identify the worst splitting database_id
WITH cte AS
(
    SELECT
        n.value('(value)[1]', 'int') AS database_id,
        DB_NAME(n.value('(value)[1]', 'int')) AS database_name,
        n.value('(@count)[1]', 'bigint') AS split_count
    FROM (SELECT CAST(target_data AS XML) AS target_data
          FROM sys.dm_xe_sessions AS s
          JOIN sys.dm_xe_session_targets AS t
              ON s.address = t.event_session_address
          WHERE s.name = 'PageSplits_Tracker'
            AND t.target_name = 'histogram') AS tab
    CROSS APPLY target_data.nodes('HistogramTarget/Slot') AS q(n)
)
SELECT * FROM cte;

 

database_id   database_name         split_count
-----------   -------------------   -----------
16            Performance_Tracker   123

 

--With the database_id of the worst splitting database, we can change the event session configuration
--to look only at that database, and change the histogram target configuration to bucket on alloc_unit_id,
--so that we can track down the worst splitting indexes in the database experiencing the mid-page splits.

-- Drop the event session so we can recreate it
-- focused on the highest splitting database

 

DROP EVENT SESSION [PageSplits_Tracker]
ON SERVER;

-- Create the event session to track LOP_DELETE_SPLIT transaction_log operations in the server
CREATE EVENT SESSION [PageSplits_Tracker]
ON SERVER
ADD EVENT sqlserver.transaction_log(
    WHERE operation = 11    -- LOP_DELETE_SPLIT
      AND database_id = 16  -- CHANGE THIS BASED ON TOP SPLITTING DATABASE!
)
ADD TARGET package0.histogram(
    SET filtering_event_name = 'sqlserver.transaction_log',
        source_type = 0,          -- event column
        source = 'alloc_unit_id');
GO

 

-- Start the event session again
ALTER EVENT SESSION [PageSplits_Tracker]
ON SERVER
STATE = START;
GO

 

--With the new event session definition, we can rerun the problematic workload for another period of ten minutes
--or more, and look at the worst splitting indexes based on the alloc_unit_ids captured in the histogram target:

 

WHILE 1=1
BEGIN
    INSERT INTO PageSplits DEFAULT VALUES;
    INSERT INTO PageSplits_Index DEFAULT VALUES;
    WAITFOR DELAY '00:00:00.005';
END
GO

 

-- Query target data to get the top splitting objects in the database:
SELECT
    o.name AS table_name,
    i.name AS index_name,
    tab.split_count,
    indexstats.index_type_desc AS IndexType,
    indexstats.avg_fragmentation_in_percent,
    i.fill_factor
FROM (SELECT
          n.value('(value)[1]', 'bigint') AS alloc_unit_id,
          n.value('(@count)[1]', 'bigint') AS split_count
      FROM (SELECT CAST(target_data AS XML) AS target_data
            FROM sys.dm_xe_sessions AS s
            JOIN sys.dm_xe_session_targets AS t
                ON s.address = t.event_session_address
            WHERE s.name = 'PageSplits_Tracker'
              AND t.target_name = 'histogram') AS tab
      CROSS APPLY target_data.nodes('HistogramTarget/Slot') AS q(n)
) AS tab
JOIN sys.allocation_units AS au
    ON tab.alloc_unit_id = au.allocation_unit_id
JOIN sys.partitions AS p
    ON au.container_id = p.partition_id
JOIN sys.indexes AS i
    ON p.object_id = i.object_id
        AND p.index_id = i.index_id
JOIN sys.objects AS o
    ON p.object_id = o.object_id
JOIN sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, NULL) AS indexstats
    ON i.object_id = indexstats.object_id
        AND i.index_id = indexstats.index_id
WHERE o.is_ms_shipped = 0
ORDER BY indexstats.avg_fragmentation_in_percent DESC;

 

 

table_name         index_name                       split_count   IndexType            avg_fragmentation_in_percent   fill_factor
PageSplits_Index   IX_PageSplits_Index_ChangeDate   286           NONCLUSTERED INDEX   99.57894737                    0
PageSplits         PK__PageSpli__97BD02EBEA21A6BC   566           CLUSTERED INDEX      99.37238494                    0
PageSplits         IX_PageSplitsPk_Data             341           NONCLUSTERED INDEX   98.98989899                    0
PageSplits         IX_PageSplitsPk_ChangeDate       3             NONCLUSTERED INDEX   1.747572816                    0

 

--With this information we can now go back, change our FILLFACTOR specifications, and retest/monitor the impact
--to determine whether we've achieved an appropriate reduction in mid-page splits to last the time between
--our index rebuild operations.

-- Change FILLFACTOR based on split occurrences to minimize page splits

Using fill factor, we can minimize page splits:

Fill factor: When an index is created with a fill factor percentage, a percentage of each index page is left free after the index is created, rebuilt, or reorganized. This free space absorbs additional rows as page splits occur, reducing the chance that a page split in a data page causes a page split in the index structure as well. But even with a fill factor that leaves 10% to 20% of each page free, index pages eventually fill up and are split the same way that a data page is split.

A page is the basic unit of data storage in SQL Server; its size is 8 KB (8,192 bytes). Data is stored in the leaf-level pages of an index. The fill factor decides the percentage of space in a leaf-level page that is filled with data; the remaining space is reserved for future growth of data in the page. Fill factor is a number from 1 to 100, and its default value is 0, which behaves the same as 100. So a fill factor of 70 means 70% of the space is filled with data and the remaining 30% is left vacant for future use. The higher the fill factor, the more data is stored in the page. The fill factor setting is applied when we create or rebuild an index.
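To confirm what fill factor an index currently carries, sys.indexes records it (0 meaning the default). A quick check against the demo tables used in this article:

-- Inspect the stored fill factor per index
SELECT OBJECT_NAME(object_id) AS table_name, name AS index_name, fill_factor
FROM sys.indexes
WHERE object_id IN (OBJECT_ID('PageSplits'), OBJECT_ID('PageSplits_Index'));
GO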

 

ALTER INDEX PK__PageSpli__97BD02EBEA21A6BC ON PageSplits REBUILD WITH (FILLFACTOR = 70);
ALTER INDEX IX_PageSplitsPk_Data ON PageSplits REBUILD WITH (FILLFACTOR = 70);
ALTER INDEX IX_PageSplits_Index_ChangeDate ON PageSplits_Index REBUILD WITH (FILLFACTOR = 80);
GO

-- Stop the event session to clear the target
ALTER EVENT SESSION [PageSplits_Tracker]
ON SERVER
STATE = STOP;
GO

-- Start the event session again
ALTER EVENT SESSION [PageSplits_Tracker]
ON SERVER
STATE = START;
GO

 

-- Do the workload once again
WHILE 1=1
BEGIN
    INSERT INTO PageSplits DEFAULT VALUES;
    INSERT INTO PageSplits_Index DEFAULT VALUES;
    WAITFOR DELAY '00:00:00.005';
END
GO

 

--With the reset performed, we can start up our workload generation again and
--monitor the effect of the FILLFACTOR specifications on the indexes with our code.
--After another two-minute period, the following splits were noted by once again
--querying the target data for the top splitting objects in the database.
--This time no page splits were found for the indexes IX_PageSplitsPk_ChangeDate,
--PK__PageSpli__97BD02EBEA21A6BC, and IX_PageSplitsPk_Data.

 

table_name         index_name                       split_count   IndexType            avg_fragmentation_in_percent   fill_factor
PageSplits_Index   IX_PageSplits_Index_ChangeDate   7             NONCLUSTERED INDEX   1.534526854                    80

 

  

Summary:

It is recommended to keep the default fill factor for databases dominated by SELECT queries, such as data warehouses, and it is highly advisable to consider changing the fill factor if you are managing a database with a large number of INSERT and UPDATE queries hitting it. In such databases, a fill factor other than 0 or 100 can be good for performance if the new data is evenly distributed throughout the table.
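If you do decide to change the instance-wide default, it is an advanced sp_configure option; a sketch follows (the value 90 is only an example, the new default applies only to subsequent index create/rebuild operations, and the setting takes effect after an instance restart):

-- Change the server-wide default fill factor
EXEC sys.sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sys.sp_configure 'fill factor (%)', 90;
RECONFIGURE;
GO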

Consider a situation in which existing rows are updated with data that lengthens them, for instance when you add extra columns to the table. Here you should consider a fill factor of less than 100: the extra space on each page helps to minimize page splits caused by the extra row length. However, if all the data is added to the end of the table, the empty space in the index pages will never be filled. For example, if the index key column is an IDENTITY column, the key for new rows is always increasing, and the index rows are logically added to the end of the index. In such a situation, you should set your FILLFACTOR to 100.
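As a closing sketch of that append-only case (the dbo.AuditLog table here is hypothetical, purely for illustration):

-- Hypothetical append-only log table: new rows always land on the last page,
-- so mid-page splits do not occur and reserved free space would simply be wasted
CREATE TABLE dbo.AuditLog
( LogID INT IDENTITY NOT NULL PRIMARY KEY,
  LogText VARCHAR(400) NOT NULL );
GO

-- Pack the pages full; there are no mid-page inserts to absorb
ALTER INDEX ALL ON dbo.AuditLog REBUILD WITH (FILLFACTOR = 100);
GO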