SQL Server 2008: Change Data Capture (CDC)
Another one of the new features within SQL Server 2008 is the Change Data Capture (CDC). CDC is designed to capture insert, update and delete activity on a SQL table and place the information into a separate relational table. It uses an asynchronous capture mechanism that reads the transaction logs and populates the CDC table with the row's data which change. The CDC table mirrors the column structure of the tracked table, together with metadata regarding the change. In later releases, there is also going to be a audit feature allowing you to audit who accesses what information and when, based more on select statements.
To enable CDC, the following two statements are required. Note, SQLServerAgent must be running for this to work.
use testDatabase
--Activate CDC
EXEC sys.sp_cdc_enable_db_change_data_capture
--IsDatabaseEnabled?
SELECT is_cdc_enabled FROM sys.databases WHERE name = 'testDatabase'
--Enable CDC on table
EXEC sys.sp_cdc_enable_table_change_data_capture @source_schema = 'dbo', @source_name = 'Table_1', @role_name = 'cdc_test'
--IsTableEnabled?
SELECT is_tracked_by_cdc FROM sys.tables WHERE name = 'table_1'
The sp_cdc_enable_table... statement enables the tracking of the changes for a table, parameters are required for setting the source schema, the source name which is the table you want to track, and the role name, if the role does not exist it will create it for you.
After executing these statements you will have a number of new tables within your database. The tables are:
· cdc.captured_columns - Information about the captured columns being tracked.
· cdc.change_tables - Tables being tracked.
· cdc.ddl_history - Changes to the tracked table, details the command issued on the table.
· cdc.index_columns - Guessing this is tracked indexed columns.
· cdc.lsn_time_mapping - When a transaction starts and ends. Relates to rows in the tracked tables.
There is also a table for each table which is tracked, in my case I have just the one - cdc.dbo_Table_1_CT. If we execute a change to the table, such as inserting a new record.
INSERT INTO Table_1 VALUES (1, 'First Test')
A new row is added to the table, with a record also being inserted into lsn_time_mapping.
__$start_lsn __$end_lsn __$seqval __$operation __$update_mask id TestMessage
0x0000001E0000006B001A NULL 0x0000001E0000006B0018 2 0x03 1 First Test
If we update the table again using
UPDATE Table_1 SET TestMessage = 'Updated First Test' WHERE id = 1
Two records are inserted into the tracked table. One with the TestMessage 'First Test' (the original message) and the other with 'Updated First Test' both relating to a single transaction in the lsn_time_mapping table.
If we do
DELETE FROM Table_1 WHERE id = 1
We have a single row inserted into dbo_Table_1_CT saying which row was deleted.
Within the dbo_Table_1_CT there is a column __$operation. This gives you the ID relating to the operation executed on the row. 2 = Insert. 3 = Before Update. 4 = Post Update. 1 = Delete.
If we change the design of the table while it is being tracked, a row is inserted into cdc.ddl_history with the command executed on the table.
source_object_id object_id required_column_update ddl_command
---------------- ----------- ---------------------- --------------------------------
1077578877 1093578934 0 ALTER TABLE dbo.table_1 ADD Status nvarchar(50) NULL
However, if we execute another insert command on the table, the new column does not appear in the result set/dbo_table_1_ct. It would be nice to keep the two insync and not have to worry about updating the table if it is being tracked. To get the tables back insync, I had to disable it (capture_instance parameter relates to the columns being captured I think) and re-enable the tracking.
The commands to do this are:
EXEC sys.sp_cdc_disable_table_change_data_capture @source_schema = 'dbo', @source_name = 'Table_1', @capture_instance = 'all'
EXEC sys.sp_cdc_enable_table_change_data_capture @source_schema = 'dbo', @source_name = 'Table_1', @role_name = 'cdc_test'
In case you are worried about this using up too much space, there is a sys.sp_cdc_cleanup_change_table stored procedure which can be used to remove entries up to a point in time. You could then link this to a Job to archive the information, or simply report and remove.