Sometimes we may have situations where we only load a day’s worth of data into our data warehouse. We may also need to keep the destination data in line with the incoming data.
For instance, imagine we had a shopping basket filled with items one day, which we load into our data warehouse, but the next day (for whatever reason) two of the products are no longer valid. How can we remove these records without deleting all of the other values in our data warehouse for our other trolleys that aren’t in the incoming data set?
The answer is with an incremental MERGE:
--Declare temp tables to store data DECLARE @Target TABLE ( PackageId INTEGER, JournalId INTEGER, Amount MONEY, IsDeleted BIT DEFAULT 0 ) DECLARE @Source TABLE ( PackageId INTEGER, JournalId INTEGER, Amount MONEY ) --Insert some dummy data INSERT INTO @Source VALUES (1,1,10),(1,2,20),(1,3,30),(2,1,40),(2,2,50) --Now merge it into the destination table. MERGE INTO @Target AS TARGET USING @Source AS SOURCE ON TARGET.PackageId = SOURCE.PackageId AND TARGET.JournalId = SOURCE.JournalId --When we match on the merge key, then update the destination table with the new value WHEN MATCHED THEN UPDATE SET TARGET.Amount = Source.Amount --Whne we don't match, then we're a new record, so insert the new record into the destination WHEN NOT MATCHED BY TARGET THEN INSERT ( PackageId, JournalId, Amount ) VALUES ( SOURCE.PackageId, SOURCE.JournalId, SOURCE.Amount ); --Now show our results. SELECT * FROM @Source; SELECT * FROM @Target; --Now let's get rid of some data and carry out an update to show how we would handle a merge for --an incremental load DELETE FROM @Source WHERE PackageId = 1 OR (PackageId = 2 AND JournalId = 1); UPDATE @Source SET Amount = Amount + 20 --Use a CTE to limit the destination rows we want to match against, becuase if we've only loaded --1 day worth of values in from our source system, we could end up deleting lots of data by accident ;WITH tgtCTE AS ( SELECT T.* FROM @Source S INNER JOIN @Target T ON T.PackageId = S.PackageId ) MERGE INTO tgtCTE AS TARGET USING @Source AS SOURCE ON TARGET.PackageId = SOURCE.PackageId AND TARGET.JournalId = SOURCE.JournalId --Again, update any existing records WHEN MATCHED THEN UPDATE SET TARGET.Amount = Source.Amount --Again, insert any new records WHEN NOT MATCHED BY TARGET THEN INSERT ( PackageId, JournalId, Amount ) VALUES ( SOURCE.PackageId, SOURCE.JournalId, SOURCE.Amount ) --Here is the intersting part; we're going to soft delete any records from the --destination table that do not exist in the incoming data set. WHEN NOT MATCHED BY SOURCE THEN UPDATE SET TARGET.IsDeleted = 1; --Show our final results. As you can see in the second merge we only had 1 record for PackageId = 2 --the other JournalId didn't exist in the source table, so it was soft deleted from the destination. SELECT * FROM @Source; SELECT * FROM @Target;
Go ahead and run the above code and see how it all works. Hopefully this will help you on your way.
Paul White gives a great example here at DBA.StackExchange