Merge – Incremental Soft Delete

Standard

Sometimes we may have situations where we only load a day’s worth of data into our data warehouse. We may also need to keep the destination data in line with the incoming data.

For instance, imagine we had a shopping basket filled with items one day, which we load into our data warehouse, but the next day (for whatever reason) two of the products are no longer valid. How can we remove these records without deleting all of the other values in our data warehouse for our other trolleys that aren’t in the incoming data set?

The answer is with an incremental MERGE: 

--Declare temp tables to store data
DECLARE @Target TABLE
(
PackageId INTEGER,
JournalId INTEGER,
Amount MONEY,
IsDeleted BIT DEFAULT 0
)


DECLARE @Source TABLE
(
PackageId INTEGER,
JournalId INTEGER,
Amount MONEY
)

--Insert some dummy data
INSERT INTO @Source VALUES (1,1,10),(1,2,20),(1,3,30),(2,1,40),(2,2,50)

--Now merge it into the destination table.
MERGE INTO @Target AS TARGET
	USING
	@Source AS SOURCE
	ON TARGET.PackageId = SOURCE.PackageId
	AND TARGET.JournalId = SOURCE.JournalId

--When we match on the merge key, then update the destination table with the new value
WHEN MATCHED THEN
	UPDATE SET
		TARGET.Amount = Source.Amount

--Whne we don't match, then we're a new record, so insert the new record into the destination
WHEN NOT MATCHED BY TARGET THEN
	INSERT
	(
	PackageId,
	JournalId,
	Amount
	)
	VALUES
	(
	SOURCE.PackageId,
	SOURCE.JournalId,
	SOURCE.Amount
	);

--Now show our results.
SELECT * FROM @Source;
SELECT * FROM @Target;


--Now let's get rid of some data and carry out an update to show how we would handle a merge for
--an incremental load
DELETE FROM @Source WHERE PackageId = 1 OR (PackageId = 2 AND JournalId = 1);
UPDATE @Source SET Amount = Amount + 20

--Use a CTE to limit the destination rows we want to match against, becuase if we've only loaded
--1 day worth of values in from our source system, we could end up deleting lots of data by accident
;WITH tgtCTE AS
(
SELECT T.*
FROM @Source S
INNER JOIN @Target T ON T.PackageId = S.PackageId
)


MERGE INTO tgtCTE AS TARGET
	USING @Source AS SOURCE
	ON TARGET.PackageId = SOURCE.PackageId
	AND TARGET.JournalId = SOURCE.JournalId

--Again, update any existing records
WHEN MATCHED THEN
	UPDATE SET
		TARGET.Amount = Source.Amount

--Again, insert any new records
WHEN NOT MATCHED BY TARGET THEN
	INSERT
	(
	PackageId,
	JournalId,
	Amount
	)
	VALUES
	(
	SOURCE.PackageId,
	SOURCE.JournalId,
	SOURCE.Amount
	)

--Here is the intersting part; we're going to soft delete any records from the 
--destination table that do not exist in the incoming data set.
WHEN NOT MATCHED BY SOURCE THEN UPDATE SET TARGET.IsDeleted = 1;


--Show our final results. As you can see in the second merge we only had 1 record for PackageId = 2
--the other JournalId didn't exist in the source table, so it was soft deleted from the destination.
SELECT * FROM @Source;
SELECT * FROM @Target;

Go ahead and run the above code and see how it all works. Hopefully this will help you on your way.

Paul White gives a great example here at DBA.StackExchange