It’s easy: use the replace function in T-SQL
Select replace('aaaaax', 'a', 'b')
But what if we want to replace multiple strings? We have call replace multiple time within itself until it gets so ugly we immediately disown the code and send it to a boarding school in Switzerland just so we don’t see it anymore:
Select replace(replace(replace(replace('abcdefgh', 'a', 'b'), 'b', 'c'), 'c','d'),'d','e')
And if there are 10 characters to replace, it will look like some monsters from deep inside the depths of hell. Or whatever.
Testing
Let’s first create a table containing a sequence of integers, from 1 to 10000. The first time I saw this technique was in Louis Davidson’s book
Pro SQL Server 2008 Relational Database Design and Implementation (http://www.amazon.com/Server-Relational-Database-Implementation-Experts/dp/143020866X/ref=sr_1_5?ie=UTF8&qid=1372519918&sr=8-5&keywords=louis+davidson+pro+sql+server)
I really like this approach, so here it is:
USE [Tests]GO/****** Object: Table [dbo].[sequence] Script Date: 6/29/2013 8:35:09 AM ******/
SET ANSI_NULLS ON
GOSET QUOTED_IDENTIFIER ON
GOCREATE TABLE [dbo].[sequence](
[Number] [int] NULL
) ON [PRIMARY]
GO;with digits as
(
select (1) as number
Unionselect (2) as number
Unionselect (3) as number
Unionselect (4) as number
Unionselect (5) as number
Unionselect (6) as number
Unionselect (7) as number
Unionselect (8) as number
Unionselect (9) as number
Unionselect (0) as number
)
insert into sequence
select d1.number + 10*d2.number+100*d3.number+1000*d4.number +1 from digits d1 cross join digits d2
cross join digits d3
cross join digits d4
Let’s create a test table that will contain just one column (name) and 10000 rows. This table will be used to test the different replace strategies we will consider.
USE [Tests]GO/****** Object: Table [dbo].[ReplaceTest] Script Date: 6/29/2013 11:29:28 PM ******/
SET ANSI_NULLS ON
GOSET QUOTED_IDENTIFIER ON
GOCREATE TABLE [dbo].[ReplaceTest](
    [Name] [nvarchar](100) NULL) ON [PRIMARY]
GOwith InitialString as
(Select 'abcdefghijklmnopqrstuvwxyz' as initial)
insert into ReplaceTestselect InitialString.initial+cast(number as varchar(5))
from InitialString cross join sequence
So now we have table containing 10000 strings. Notice there is no index on this table, and this is by design. The goal here is to analyze the replace strategies without indexing the table, focusing only on the algorithm.
The Replace Code from Hell
The first replace code to test is the obvious one:
select replace(replace(replace(replace(replace(ReplaceTest.Name,'a', 'b'), 'b', 'c'), 'c', 'd'), 'd', 'e'), 'e', 'f') from ReplaceTest
Using a trace on the sql server, let’s look at the number of reads and query duration for the code above:
So we have 129 reads and 380 ms duration.
Replacing using CTE – A Bad Idea?
Let’s do things… a bit differently. Let’s do some replacing using CTE, recursive CTE and create a more configurable, more friendly SQL code.
;WITH s AS
(SELECT ROW_NUMBER() OVER (ORDER BY ReplaceTest.Name) AS rono, ReplaceTest.Name AS s FROM dbo.ReplaceTest),
replacetble (id, toReplace, replaceWith)
AS(select 1 as id, 'a' as toReplace, 'b' as replaceWith
Union ALL
select 2 as id, 'b', 'c'
Union ALL
select 3 as id, 'c', 'd'
Union ALL
select 4 as id, 'd', 'e'
Union ALL
select 5 as id, 'e', 'f'),
replacenow(z, idr ) AS(
SELECT REPLACE(s.s,(select COALESCE(replacetble.toReplace,'') from replacetble where id =1 ),(select COALESCE(replacetble.replaceWith,'') from replacetble where id =1 ))AS z,1 AS idr FROM s
UNION ALL
SELECT REPLACE(replacenow.z,(SELECT COALESCE(replacetble.toReplace,'') FROM replacetble
WHERE id = idr),(SELECT COALESCE(replacetble.replaceWith,'') FROM replacetble
WHERE id = idr)) AS z, idr+1 FROM replacenow
WHERE idr>=1 AND idr<=LEN(replacenow.z)
)
SELECT z,idr FROM replacenow
WHERE idr=(SELECT max(id) FROM replacetble)+1
Two things are important here: The replacetable CTE that simply defines the rules of the replace (first replace ‘a’ with ‘b’, then ‘b’ with ‘c’ and so on) and the replacenow CTE that uses a recursive strategy and actually performs the replace. I think this code looks better than the first one, it is more configurable - simply add a new UNION ALL at the end of the replacetable CTE with the string you’d want to replace:
Union ALL
select 6 as id, 'f', 'g'
But, it has a problem. It is very slow. How slow? Take a look
So we have 583162 reads, we have 1935 ms duration and we even have 221 writes, just to show how slow this code really is.
Coming up in part two: A different take on the whole recursive CTE idea and we’ll have fun with some SQL CLR.
Cheers.
 
 Postări
Postări
 
 
 
