dwdii: Data Movement with Partition Switching in SQL Server

Data Movement with Partition Switching in SQL Server

This post is a follow-on to my prior “part 1” post Table Partitioning in SQL Server where I introduced the topic and laid-out the data definition which will be the basis for this post on data movement with partition switching in SQL Server.

The process of data movement with partition switching is mentioned in many articles as a seemingly simple operation invoked via a variant of the ALTER TABLE statement. Generally, this is true, but in practice, I have found several key steps which are often not addressed in the average partition switching article. In this post, I hope to shine more light on the real meat of partition switching in a more real world scenario.

In my “real world scenario”, as mentioned in the part 1 post, we have a parent table with a one to many relationship with a partitioned child table containing a foreign key back to the parent. As shown in the DDL from part 1, both tables have identity columns which are the primary keys as well. The identity column on the partitioned child table complicates the data movement, but I’ll show a solution based in part on the DBCC CHECKIDENT suggestion from Transferring Data Efficiently by Using Partition Switching.

The Data Movement - Phase 1

To begin with, given the parent table in our scenario, we start with a simple insert. For simplicity, the parent table includes only an identity column and the timestamp, but could easily contain any additional columns you might need.

1
2
3
4
5
6
7
DECLARE @ParentID BIGINT

-- ParentTable insert
INSERT INTO [DWD].[ParentTable] ([Timestamp]) VALUES (GETUTCDATE());

-- Get the parent key for use during the data loading.
SET @ParentID = SCOPE_IDENTITY();

Next, we have to deal with the child identity challenge. Rather than use DBCC CHECKIDENT I’ve chosen to use the IDENT_CURRENT function. Keep in mind that this data movement must run serially, pipeline-style, and cannot have multiple instances executing this same SQL concurrently.

1
2
3
4
5
6
7
8
9
10
11
DECLARE @DestPartition INT
DECLARE @IdentCurrent BIGINT
DECLARE @StageIdentCurrent BIGINT
DECLARE @NewIdentStage BIGINT

SET @DestPartition = (SELECT $PARTITION.PF1_Right(@ParentID));
SET @IdentCurrent = (IDENT_CURRENT('[DWD].[ChildPartitionedTable]'));
SET @NewIdentStage = @IdentCurrent + 1

-- Reseed the staging table so we keep the IDENTITY up to date.
DBCC CHECKIDENT ('[DWD].[ChildPartitionedTable_Stage]', RESEED, @NewIdentStage) WITH NO_INFOMSGS

First, we call the partitioning function (line 6) to determine the destination partition for the given partitioning key.
Call IDENT_CURRENT on the partitioned child table to get the last allocated IDENTITY value.
Increment the IDENTITY value.
Finally, re-seed the staging table to match the partitioned child table’s next IDENTITY value.

If you are playing along on your side, you can run the following statements to verify where things stand. In development, I use these statements to make sure things are proceeding as expected in my stored proc.

1
2
SET @StageIdentCurrent = (IDENT_CURRENT('[DWD].[ChildPartitionedTable_Stage]'));
SELECT [ParentID] = @ParentID, [DestPartition] = @DestPartition, [IDENT_CURRENT] = @IdentCurrent, [Stage_IDENT_CURRENT] = @StageIdentCurrent;

Next, we prep the staging table by dropping old constraints so we can add revised constraints after data load. These constraints are logistical best practices to ensure, with the help of SQL Server, our staging data is in bounds for the partition switch coming up soon.

1
2
3
4
5
6
7
8
9
10
-- Stage Table: Drop the CHECK constraint and PK on staging table
IF  EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[DWD].PK_ChildPartitionedTable_Stage_ID') AND type in (N'PK'))
BEGIN
	ALTER TABLE [DWD].[ChildPartitionedTable_Stage] DROP CONSTRAINT PK_ChildPartitionedTable_Stage_ID  
END

IF  EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[DWD].CheckRange_ChildPartitionedTable_Stage_ParentTable_ID') AND type in (N'C'))
BEGIN
	ALTER TABLE [DWD].[ChildPartitionedTable_Stage] DROP CONSTRAINT CheckRange_ChildPartitionedTable_Stage_ParentTable_ID  
END

Here is the data load. For demonstration purposes, I’m selecting data from the AdventureWorks2012 database, but in practice I would use a bulk insert via the .NET Framework’s SQLBulkCopy class, SSIS, or some other bulk insert option.

1
2
3
-- Bulk load into staging table 
INSERT INTO [DWD].[ChildPartitionedTable_Stage] (ParentTable_ID,Timestamp,Description) 
	SELECT @ParentID, GETUTCDATE(), CarrierTrackingNumber FROM AdventureWorks2012.Sales.SalesOrderDetail;

Next, we add our primary key onto the staging table to facilitate lookups and constraint checking

1
2
3
4
ALTER TABLE [DWD].[ChildPartitionedTable_Stage] ADD 
	CONSTRAINT PK_ChildPartitionedTable_Stage_ID
				PRIMARY KEY CLUSTERED (ID, ParentTable_ID)
				WITH (IGNORE_DUP_KEY = OFF)

As mentioned earlier, a partition switching best practice is to add a check constraint on the staging table to ensure only new partition values are present. Additionally, we update the check constraint on the destination table to verify we are in partition range and no existing rows would fall in the new parition range. Our ‘6’ comes from the new Parent_ID allocated during the ParentTable insert (@ParentID from line 7 of the first code block, above). ParentTable_ID is the key for our partitioning function. The range 0 - 6 is hardcoded due to ALTER TABLE requiring constants, so this would be an sp_executesql call or implemented in application code executed via the .NET SqlCommand class.

1
2
3
4
5
6
7
8
-- Best practice to add the check constraint on the staging table after the primary index
IF  NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[DWD].CheckRange_ChildPartitionedTable_Stage_ParentTable_ID') AND type in (N'C'))
BEGIN
	ALTER TABLE [DWD].[ChildPartitionedTable_Stage] ADD CONSTRAINT CheckRange_ChildPartitionedTable_Stage_ParentTable_ID CHECK (6 = ParentTable_ID)
END

ALTER TABLE [DWD].[ChildPartitionedTable] DROP CONSTRAINT CheckRange_ChildPartitionedTable_ParentTable_ID  
ALTER TABLE [DWD].[ChildPartitionedTable] ADD CONSTRAINT CheckRange_ChildPartitionedTable_ParentTable_ID CHECK (ParentTable_ID >= 0 AND 6 > ParentTable_ID)

Again, the ‘6’ comes in when the partition function is altered to split the boundary at 6 to effectively create the new partition.

1
2
ALTER PARTITION SCHEME PS1_Right NEXT USED FgSandbox2;
ALTER PARTITION FUNCTION PF1_Right () SPLIT RANGE (6);

Finally, the actual ALTER TABLE ... SWITCH TO statement which performs the data movement from the staging table to the paritioned destination table.

1
2
3
-- Do the meta-data switch op 
ALTER TABLE [DWD].[ChildPartitionedTable_Stage]
	SWITCH TO [DWD].[ChildPartitionedTable] PARTITION $PARTITION.PF1_Right(6)

I realize I moved fairly quickly through this article, but I hope it will help show (in a slightly more concrete way) how partition switching can be used to load large volumes of data efficiently.

Some articles I found useful:

Best,

Daniel (@dwdii)

Data Movement with Partition Switching in SQL Server

The Data Movement - Phase 1

Articles