Very large tables can be a bear to load, query and archive. How is “very large table” defined? Usually in hundreds of gigabytes or terabytes. Regardless, if you have a given table which is so large as to be causing maintenance or query bottlenecks, you might benefit from table partitioning.
I my ‘real world’ scenario, I have a parent table which is not very large, but a child table which is expected to receive a minimum of 100,000 rows/day (likely 10x that). This constitutes a very large table for us and will certainly benefit from table partitioning both in the loading and querying aspects.
There are many articles that cover the basic steps (listed below) of creating a partitioned table. The [Partitioned Tables and Indexes] (http://technet.microsoft.com/en-us/library/ms190787.aspx) article from Microsoft TechNet is a good example.
I will quickly go through these steps in the following T-SQL inorder to set the stage for the area I want to focus on, which is the actual data loading phase of partitioned tables.
Below is a simple CREATE PARTITION FUNCTION statement that starts off
with a single boundary at zero. Given the
RANGE RIGHT, this means that all negative numbers are in the left-most partition, and
zero starts the right most partition initially.
1 CREATE PARTITION FUNCTION PF1_Right (bigint) AS RANGE RIGHT FOR VALUES (0);
You can this partition function using the ‘$PARTITION’ database object as shown below, which will return
that the key value of zero will be stored in the 2nd partition.
1 SELECT $PARTITION.PF1_Right(0);
Next, we’ll create a filegroup for the physical storage of the partitioned table. It is optional (you could use the default PRIMARY filgroup), but recommended. At least the initial filegroup must exist before the partition scheme can be defined. Below are two simple ALTER DATABASE statements to add a filegroup named “FgSandbox2” to the Sandbox database (change the name from Sandbox to your database name), and then adding a physical file named Sandbox2.ndf to the filegroup (change the database name and path as necessary).
1 2 3 4 5 6 7 8 9 10 11 12 13 ALTER DATABASE Sandbox ADD FILEGROUP FgSandbox2; ALTER DATABASE Sandbox ADD FILE ( Name = Sandbox2, FILENAME = 'D:\Databases\MSSQL11.MSSQLSERVER\Data\Sandbox2.ndf', SIZE = 50 MB, MAXSIZE = UNLIMITED, FILEGROWTH = 10 % ) TO FILEGROUP FgSandbox2
Below, the CREATE PARTITION SCHEME statement is used to create the
partitioning scheme which will be assigned to the table. The
ALL TO syntax is used to specify that the FgSandbox2 filegroup
will be used for all partitions of the partitioned table.
1 2 3 CREATE PARTITION SCHEME PS1_Right AS PARTITION PF1_Right ALL TO (FgSandbox2);
Although not specifically required for partitioning, it is a best practice to create and use a non-dbo schema for encapsulating tables. The CREATE SCHEMA statement below creates a new schema named ‘DWD’ (my initials). The partition scheme and function created earlier are not (can’t be) assigned a schema, but shortly we will be creating the the partitioned table and it can and shout be assigned to our non-dbo schema.
1 CREATE SCHEMA [DWD];
Next, we’ll create a series of tables. As mentioned earlier, my scenario involves a partitioned child table with a foreign key to a non-partitined parent table. This obviously means creating 2 tables here, and because my goal is to show parition switching from a staging table, we’ll create this table as well.
First, the parent table… for simplicity it includes only the primary key identity column (ID), and a timestamp for informational purposes.
1 2 3 4 5 CREATE TABLE [DWD].[ParentTable] ( ID BIGINT NOT NULL IDENTITY (1,1) CONSTRAINT PK_ParentTable_ID PRIMARY KEY CLUSTERED, Timestamp DATETIME NOT NULL );
Next is the actual partitioned table. As mentioned, this table has a foreign key to the parent table. In addition, we have a primary key identity column, in keeping with our ‘real world’ scenario, and again information columns timestamp and description.
Take note of the
ON clause at the end of the statement. It indicates that the table will be using the partition scheme PS1_Right,
based ont he ParentTable_ID to determine partition into which a given row is stored.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 CREATE TABLE [DWD].[ChildPartitionedTable] ( ID BIGINT NOT NULL IDENTITY(1,1), ParentTable_ID BIGINT NOT NULL CONSTRAINT FK_ChildPartitionedTable_ParentTable_ID FOREIGN KEY REFERENCES [DWD].[ParentTable](ID), Timestamp SMALLDATETIME NOT NULL, Description NVARCHAR(100) NULL, CONSTRAINT PK_ChildPartitionedTable_ID PRIMARY KEY CLUSTERED (ID, ParentTable_ID) WITH (IGNORE_DUP_KEY = OFF), CONSTRAINT CheckRange_ChildPartitionedTable_ParentTable_ID CHECK (ParentTable_ID >= 0 AND 20 >= ParentTable_ID) ) ON PS1_Right ( ParentTable_ID );
Finally, we prepare a staging table which will be used to initially load the data into the database. Notice that the staging table is in the same filegroup as the partition scheme partition. When more than one filegroup is used by the partition scheme, either multiple staging tables must be used, or each load needs to create it’s own staging table in the proper filegroup relevant to the given data load.
1 2 3 4 5 6 7 8 CREATE TABLE [DWD].[ChildPartitionedTable_Stage] ( ID BIGINT NOT NULL IDENTITY(1,1), ParentTable_ID BIGINT NOT NULL CONSTRAINT FK_ChildPartitionedTable_Stage_ParentTable_ID FOREIGN KEY REFERENCES [DWD].[ParentTable](ID), Timestamp SMALLDATETIME NOT NULL, Description NVARCHAR(100) NULL ) ON FgSandbox2; -- Important that the staging table is on the same FileGroup as the destination partition.
… Coming soon - Part 2 : Data Movement with Partition Switching …
Some articles I found useful: