Change data capture. Practice experience. Part 1

New really great feature appears in SQL Server 2008 – change data capture. In few words: Change data capture records insert, update, and delete activity that is applied to a SQL Server table. This makes the details of the changes available in an easily consumed relational format. Further details can be found on MSDN:

Few months ago I’ve faced a task to create an OLAP solution to provide business with a list of charts and reports on SharePoint portal.

The target architecture was designed the following way:

So, there is a database (BI), where users enter data, and there is a warehouse database (BIWarehouse – used for Analysis Services), on which all reports are processing. The only question was how to catch what data was changed in BI database and transfer from one database to another only modified data. I’ve decided to use for this purpose change data capture mechanism. As it was my first business solution based on this technology, I’ll appreciate deeply all comments and remarks.

Now, to clarify my task I have to some words about BIWarehouse database structure. It can bу separate to the following parts:

  • Reference books – simple tables that contain glossary information (rates names, user names, business units etc).
  • Set of tables that store information about rates. These tables are ground for all reports. I’ll call then Main tables in the text below.

On this post I’ll cover only how I parse information for reference books, data tables updates I’ll blog in the next post. Also I will not speak about how to turn on data change on SQL Server. I start from that point, that change data capture is enabled for specified database, so let’s come directly to modifications of the tables. First of all, we have to add the needed table on BI database to change data capture. This is done by the following command:

EXECUTE sys.sp_cdc_enable_table
    @source_schema = N'dbo'
  , @source_name = N'Ref_BusinessUnit'
  , @role_name = null;

If you want to disable change data capture for the specific table, you have to run the command sys.sp_cdc_disable_table:

execute sys.sp_cdc_disable_table
  @source_schema =  N'dbo' ,
  @source_name = 'Ref_BusinessUnit',
  @capture_instance = N'dbo_Ref_BusinessUnit'

After we have enable change data capture for a table, if we make any changes, they will appear in special table. For Ref_BusinessUnit it is cdc.dbo_Ref_BusinessUnit_CT. The select from it will return all changes that were made to records:

The detailed description of all returned columns: Here I post only short comments on columns I’ll use in my procedures:

Column nameDescription
__$start_lsnCommit LSN associated with the change that preserves the commit order of the change. Changes committed in the same transaction share the same commit LSN value.
__$operationIdentifies the data manipulation language (DML) operation needed to apply the row of change data to the target data source. Can be one of the following:1 = delete2 = insert3 = update (captured column values are those before the update operation). This value applies only when the row filter option ‘all update old’ is specified.4 = update (captured column values are those after the update operation)

So, let’s return to my task and renew it using change data capture possibilities: every 30 minutes I have to identify all records that were changed since last synchronization in BI database and make the same changes in BIWarehouse database.

Firstly I have to store somewhere LSN till the synchronization was made last time. If I try to get data without such parameter, I’ll have to synchronize all changes that were made to the table after last change table cleanup. To avoid it, I’ve created special table dbo.Service_WarehouseConfig, in which last synchronized LSN for each table is stored:

Now let’s say few words about structure of each ref. book in the BIWarehouse database:

The main fields are:

  • <Ref. Book name>ID – ID reference of the record in the BI database,
  • List of fields that are needed for OLAP cube.
  • __LastUPdatedTime – date and time when the record was updated last time.

Now, at last, all preparations are made and we come to procedure that will transfer data from BI to BIWarehouse.

DECLARE @from_lsn binary(10), @to_lsn binary(10), @save_to_lsn binary(10)

-- Get the LSN till synchronization was made the last time
select @save_to_lsn = t.LastLSN from BIWarehouse.dbo.Service_WarehouseConfig t with(xlock) where ID = 'Ref_BusinessUnit table'

-- If the parameter is not defined, the synchronization is made for the first time and we take the minimum value.
if @save_to_lsn is null begin
      select @from_lsn = sys.fn_cdc_get_min_lsn('dbo_Ref_BusinessUnit')
-- In other case, we the next LSN value following the saved one
else begin
      select @from_lsn = sys.fn_cdc_increment_lsn(@save_to_lsn)
-- Get the maximum LSN value.
select @to_lsn = sys.fn_cdc_get_max_lsn()

We’ve done all necessary preparations and in @from_lsn and @to_lsn variables have the period in which all modifications are laying.

-- Start processings onlу if FROM values less then TO value
if @from_lsn < @to_lsn begin

-- Declare table to store all changes that were made.
      Declare @TableAnalysis table (
            Operation int,
            StartLsn binary(10),
            ID int,
            Name nvarchar(126)
      Declare @IsError bit = 0
    -- Put all changes into temp table, using special CDC function. If function returns an error – stop processing
      begin try
            insert into @TableAnalysis (Operation, StartLsn, ID, Name)
            select __$operation, __$start_lsn, ID, Name
            from  cdc.fn_cdc_get_all_changes_dbo_Ref_BusinessUnit (@from_lsn, @to_lsn, 'all update old');
      end try
      begin catch
            select @IsError = 1
      end catch

It’s recommended to work with change tables only through special designed functions. Sometimes it returns errors (I think that this happens when I execute it immediately after a change is made), so in order to prevent procedure from generating an exeption, I’ve made here a try-catch block.

The code below updates BIWarehouse. I do not delete any records in ref. books as they can be used in Main tables.

   If (@IsError = 0) begin
-- Add new records to BIWarehouse.
  insert into BIWarehouse.dbo.Ref_BusinessUnit (BusinessUnitID, Name, __LastUpdatedTime)
            select t.ID, t.Name, GetDate()
            from @TableAnalysis t
            where t.Operation = 2
-- Update existing records.
  update BIWarehouse.dbo.Ref_BusinessUnit
            set Name = t.Name, __LastUpdatedTime = GetDate()
            from (
                  select t1.Operation, t1.ID, t1.Name,
-- If there were several updates of one record from last synchronization – we take only last version.
  row_number() over (
                        partition by t1.Operation
                        order by t1.StartLsn desc
                  )  RowNum
                  from @TableAnalysis t1
            ) t
            where t.Operation = 4 and t.RowNum = 1 and BIWarehouse.dbo.Ref_BusinessUnit.BusinessUnitID = t.ID

-- Save the LSN till we made a synchronization.
  update BIWarehouse.dbo.Service_WarehouseConfig
            set LastSLN = @to_lsn
            where ID = 'Ref_BusinessUnit table'

So, I can say that change data capture significantly decreases the amount of efforts you have to make to get all changes that were made to particular tables. I’ve found working with this functionality easy and will for sure use it in my future projects.

That’s all for today. In the next posts I’ll continue speaking about changes data capture, but will concentrate on update data in Main tables.

Добавить комментарий

Ваш адрес email не будет опубликован. Обязательные поля помечены *