About a month ago I was in a position to take over support of a large database and handle the upgrade from SQL 2000 to SQL 2005. The database is approximately 800 GB in size. Not the largest DB in the world, but the largest for the company I work for. Upon taking over the database I started looking for the basic stuff, like backups, maintenance tasks, etc. What I discovered concerned me greatly. The previous DBA had no script to run DBCC CHECKDB, backups only occurred twice a week, there was a single filegroup, and more. The data in this database is static; once it is loaded it never changes, and in the rare case that the data is incorrect, it is deleted for that day and reloaded. The data is loaded into monthly tables and then merged into quarterly tables, a task the Sys Admin performs through an application GUI. With the data already carved up into nice, neat quarterly tables, it made sense to split it into multiple filegroups. I checked with the vendor of the application about this and they said we have the latitude to make the change. As I started the task of moving the data into filegroups, I learned that we did not have a clustered index on any of the tables. What we did have was 17 nonclustered indexes on a table with 27 columns. All the rows in the tables for 7 1/2 years of data total 1.5 billion rows. That is a lot of indexes: approximately 541 GB of indexes on 222 GB of data.
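For anyone who hasn't done it, adding a filegroup in SQL 2005 is a two-step ALTER DATABASE: create the filegroup, then add at least one data file to it. A minimal sketch, with hypothetical database, file, and path names:

```sql
-- Hypothetical names: one filegroup per quarter, one data file each.
ALTER DATABASE BigStaticDB ADD FILEGROUP FG_2007_Q1;

ALTER DATABASE BigStaticDB ADD FILE
(
    NAME       = 'BigStaticDB_2007_Q1',
    FILENAME   = 'E:\SQLData\BigStaticDB_2007_Q1.ndf',
    SIZE       = 25GB,
    FILEGROWTH = 1GB
) TO FILEGROUP FG_2007_Q1;
```

With the quarterly filegroups in place, the remaining (and much harder) step is moving each table's data onto its filegroup, which is where the clustered index story below comes in.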
So what do I do? There was no clustered index to alter in order to move the data into the new filegroups, so I had to create one. Unfortunately there are no columns in the tables that are unique, so I picked the most selective column and created the clustered index on that. The move went very smoothly; however, my indexes GREW by over 100 GB. I couldn't understand why the indexes would have grown just because I created a clustered index. What I was able to find out from research and help from SSC is that when a table is a heap, SQL Server assigns an 8-byte row identifier (RID) that the nonclustered indexes use to locate rows. Once a clustered index exists, the nonclustered indexes carry the clustered key instead, and because my clustered index was not unique, SQL Server had to attach an additional 4-byte "uniqueifier" to make each key unique. Those extra 4 bytes, across 17 indexes and 1.5 billion rows, added up to over 100 GB of additional space. Since I had the data moved into the new filegroups, I decided to drop the clustered index and put the indexes back in their original state. After reading a great deal more about indexes, I decided that the best thing to do at this point would be to create an identity column of data type INT on each table in question and make it the clustered index. I contacted the vendor to discuss modifying their database structure. After 30 minutes on the phone with their product development team, explaining in detail the difference between a heap table and one with a clustered index and how nonclustered indexes work, they agreed that creating the identity column was the smart thing to do. I got the green light to move forward and have been told that my changes for the filegroups and identity column will make it into their next release of the product.
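The identity-column approach boils down to two statements per table. A sketch with hypothetical table, index, and filegroup names: a narrow INT identity key is added, then a unique clustered index is built on it. Because the key is unique, no uniqueifier is needed, and the nonclustered indexes carry a 4-byte INT instead of the 8-byte RID. Creating (or rebuilding) the clustered index ON a filegroup is also what physically moves the table's data there.

```sql
-- Hypothetical names. Add a narrow, ever-increasing identity key...
ALTER TABLE dbo.Trades_2007_Q1
    ADD TradeID INT IDENTITY(1,1) NOT NULL;

-- ...then cluster on it. The ON clause places the data on the new filegroup.
CREATE UNIQUE CLUSTERED INDEX CIX_Trades_2007_Q1_TradeID
    ON dbo.Trades_2007_Q1 (TradeID)
    ON FG_2007_Q1;
```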
The end result is:
1) 8 filegroups; only the primary filegroup has data that changes. After a full backup we can now back up the primary filegroup (80 GB) in a matter of minutes versus hours.
2) A clustered index on all large tables, with a reduction of 112 GB because the nonclustered indexes now point to a 4-byte identity key instead of an 8-byte RID.
3) Nightly backup of the primary file group.
4) Weekly DBCC CHECKDB checks.
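Items 3 and 4 come down to two scheduled statements; a sketch, again with a hypothetical database name and backup path:

```sql
-- Nightly: back up just the read/write primary filegroup.
BACKUP DATABASE BigStaticDB
    FILEGROUP = 'PRIMARY'
    TO DISK = 'F:\Backups\BigStaticDB_PRIMARY.bak'
    WITH INIT;

-- Weekly: full consistency check of the database.
DBCC CHECKDB ('BigStaticDB') WITH NO_INFOMSGS, ALL_ERRORMSGS;
```

Since the quarterly filegroups never change after load, they only need a fresh backup after the quarterly merge; the nightly job covers everything that actually moves.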
What are my next steps? I plan to evaluate the current 17 nonclustered indexes, all of which are on single columns, to see which ones, if any, we can remove and where a composite index would make sense. I have already identified several changes that would reduce query times by 9 seconds each; a typical query takes 10 - 12 seconds. Make sure to come back to my blog for additional details.
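One way to find removal candidates is the sys.dm_db_index_usage_stats DMV that is new in SQL 2005. A sketch (hypothetical database name): indexes with few or no seeks, scans, or lookups but plenty of updates are costing maintenance without paying for reads. One caveat: the stats reset when the instance restarts, so they need to cover a representative workload period before acting on them.

```sql
-- How often each nonclustered index is read vs. maintained.
SELECT  OBJECT_NAME(s.object_id) AS table_name,
        i.name                   AS index_name,
        s.user_seeks, s.user_scans, s.user_lookups, s.user_updates
FROM    sys.dm_db_index_usage_stats AS s
JOIN    sys.indexes AS i
        ON  i.object_id = s.object_id
        AND i.index_id  = s.index_id
WHERE   s.database_id = DB_ID('BigStaticDB')
        AND i.type_desc = 'NONCLUSTERED'
ORDER BY s.user_seeks + s.user_scans + s.user_lookups;  -- least-read first
```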