What is Data Compression? Issues with Data Compression

Data Compression

Data compression reduces the amount of storage space required to host the data in a database. There is a caveat, however: depending on the amount of data stored in a table, the allocation unit size of the disk, and the data types involved, compressed data can actually end up using more storage.

One of the greatest advantages of migrating a database to SQL Server 2008 is the ability to compress data, thereby reducing disk overhead. Compressing data in SQL Server 2008 is a relatively simple task.

This feature is available only in SQL Server 2008 Enterprise Edition and Developer Edition. The reduction in the I/O required to pull data leads to a significant improvement in performance, especially in data warehouse and data mart environments where a DBA works with extremely large databases.
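As a sketch of how simple enabling compression is, both tables and indexes can be rebuilt with a compression setting; the table and index names below are illustrative, not from the original article:

```sql
-- Enable row compression on a (hypothetical) table by rebuilding it
ALTER TABLE dbo.SalesOrderDetail
REBUILD WITH (DATA_COMPRESSION = ROW);

-- Nonclustered indexes are compressed independently and must be rebuilt separately
ALTER INDEX IX_SalesOrderDetail_ProductID
ON dbo.SalesOrderDetail
REBUILD WITH (DATA_COMPRESSION = ROW);
```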

There are two types of data compression:

  • Row Compression
  • Page Compression

Row Compression:

Row-level compression provides savings by not storing trailing blank characters in fixed-length character columns such as the char data type. NULL and 0 values are not stored and so incur no additional storage overhead. The following data types in SQL Server 2008 support row-level compression:

  • smallint
  • int
  • bigint
  • decimal
  • numeric
  • bit
  • smallmoney
  • money
  • float
  • real
  • datetime
  • datetime2
  • datetimeoffset
  • char
  • nchar
  • binary
  • timestamp/rowversion
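Row compression can also be specified when a table is created, rather than via a rebuild. A minimal sketch, with hypothetical table and column names:

```sql
CREATE TABLE dbo.OrderArchive
(
    OrderID    int       NOT NULL,
    CustomerID int       NOT NULL,
    OrderDate  datetime  NOT NULL,
    Notes      char(200) NULL  -- trailing blanks are not stored under row compression
)
WITH (DATA_COMPRESSION = ROW);
```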

Page Compression:

Page compression uses a more complex algorithm, also known as dictionary compression, to minimize the storage space of the data. SQL Server 2008 looks at all the data stored in a page and builds a dictionary from it that can be referenced for repeated values; only the dictionary ID and any difference from the dictionary value are stored. This provides great savings for data with similar patterns. Page compression includes row compression as well, so we get the benefit of both.
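Switching a table to page compression uses the same rebuild syntax; the table name here is illustrative:

```sql
-- Rebuild a (hypothetical) table with page compression;
-- page compression includes row compression, so no row-level savings are lost
ALTER TABLE dbo.OrderArchive
REBUILD WITH (DATA_COMPRESSION = PAGE);
```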

Issues With Data Compression:

Data compression is not useful in all cases. Depending on the size of the data, the performance requirements, and whether the data is encrypted, data compression may not be the ideal solution. There is a CPU overhead associated with data compression, and compressing the data may adversely impact the performance of the system. High-volume OLTP databases in particular can be significantly impacted when implementing data compression.

 

Is Data Compression the Ideal Solution?

Before implementing data compression on a database, a DBA should consider the following:

  • Estimate the potential storage savings you could get by implementing data compression.
  • Complete a baseline performance analysis of the database server and reproduce it in a staging or development environment. Enable compression and evaluate the performance against that baseline.
  • Identify the tables that would provide the biggest benefit from compressing data.

Is the Table worth Compressing?

SQL Server 2008 ships with a system stored procedure named sp_estimate_data_compression_savings, which accepts five parameters:

  1. @schema_name
  2. @object_name
  3. @index_id
  4. @partition_number
  5. @data_compression

@object_name and @data_compression are the two most critical parameters. @object_name refers to the table you wish to evaluate, and @data_compression can have one of three values: NONE, ROW, or PAGE. Depending on the value passed, the stored procedure estimates the corresponding savings.
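A typical call might look like the following; the schema and table names are illustrative:

```sql
-- Estimate the savings page compression would yield for a (hypothetical) table
EXEC sp_estimate_data_compression_savings
    @schema_name      = 'dbo',
    @object_name      = 'SalesOrderDetail',
    @index_id         = NULL,   -- NULL = evaluate all indexes on the table
    @partition_number = NULL,   -- NULL = evaluate all partitions
    @data_compression = 'PAGE';
```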

The other parameters provide more advanced estimation options. @schema_name allows a DBA to estimate the savings for tables in a different schema. @index_id lets you restrict the estimate to a single index on the table, identified by its index ID; if this parameter is not passed, the stored procedure assesses all the indexes on that table. @partition_number lets you evaluate the potential savings for a particular partition of a table. This can be very useful for estimating the savings on archived data: since archived data rarely changes, it is an ideal candidate for compression.
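Targeting a single partition of archived data can be sketched as follows; the object name, index ID, and partition number are all hypothetical:

```sql
-- Estimate savings for one archive partition only
EXEC sp_estimate_data_compression_savings
    @schema_name      = 'dbo',
    @object_name      = 'OrderArchive',
    @index_id         = 1,      -- clustered index
    @partition_number = 4,      -- the partition holding archived rows
    @data_compression = 'PAGE';
```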