SQL Server Partitioning for Performance
Often enough I get questions regarding partitioning to try to help alleviate performance issues with very large tables. A common misconception is that due to the table being large, when queries are run against it and SQL Server has to read the entire table, if you partition the table, SQL Server will read less data. The reality is that a table scan is a table scan, regardless if the table is partitioned or not. There are exceptions, if you are partitioning and putting each file on a separate disk to leverage faster storage IO, sure, that could help.
When DBA’s are looking to improve overall query performance for very large tables, they have several tools available.
The most common approach is to properly index the table for the queries that are executed against the table most often and/or are doing table scans or full index scans. Many times, it may all come down to refactoring the queries themselves where queries/reports are doing select *.
Writing queries to pull back only the data needed is key. Indexing tables so that fewer rows are read can drastically improve overall performance.
Another option to consider is compression. Data compression can decrease how much space the table takes up in the buffer pool. Reading data that is compressed is also typically quicker.
As simple as it sounds, archive data. If data isn’t required to be available for most end users, archiving the data into another table for a select group of users can eliminate excessive reads. Many times customers will search from today to the beginning of time. If they should really be limited to 12 months, either restrict them to 12 months or build an archive process to move that data out to an archive table. If possible, if the true data retention is 12 months, delete the data. Organizations really should have a data governance team that determines data retention policies. Unfortunately for data professionals, most organizations want to keep data forever. That is great news for storage vendors!
So that begs the question – When Should You Partition?
The number one reason that I look to partitioning is to be able to leverage piecemeal restores. If I have large databases that have years of archive data that will never change, I can partition data into yearly or quarterly ReadOnly filegroups. If a restore is needed quickly, I can restore the read/write file groups first to restore user functionality, then bring the ReadOnly filegroups online later.
Partitioning older data into ReadOnly file groups can also allow overall maintenance improvements. Filegroup level backups can allow you to not need to backup read only data as frequently. Index and statistics maintenance is also not regularly needed on data that isn’t changing. Similar for DBCC CHECKDB scans.
Older data in separate filegroups also allows for an easy method to ‘delete’ chunks of data, simply drop the filegroup.
A saying I’ve heard many times is that, “You partition for maintenance, not performance”. I’ve seen this play out time and time again with various customers over the past 15-20 years.
Leave a Reply