Part 1 here: http://verrigo.blogspot.ro/2013/06/on-replacing-in-strings-in-sql-server.html
So last time we looked at two strategies for replacing strings using T-SQL. The first one (using repeated calls of replace function), and the second one, using a recursive CTE approach.
Two things to notice. First, with the recursive CTE approach, the replace is configurable and much easier to look at. Second, the performance of the first approach is a lot better than the second one.
Approach | Reads | Writes | Duration |
Replace(Replace… | 129 | 0 | 380 |
Using CTE | 583162 | 221 | 1935 |
Let’s CTE again (Is it worth it?)
So, let’s remember the CTE approach again
;WITH s AS
(SELECT ROW_NUMBER() OVER (ORDER BY ReplaceTest.Name) AS rono, ReplaceTest.Name AS s FROM dbo.ReplaceTest),
replacetble (id, toReplace, replaceWith)
AS
(select 1 as id, 'a' as toReplace, 'b' as replaceWith
Union ALL
select 2 as id, 'b', 'c'
Union ALL
select 3 as id, 'c', 'd'
Union ALL
select 4 as id, 'd', 'e'
Union ALL
select 5 as id, 'e', 'f'),
replacenow(z, idr ) AS
(
SELECT REPLACE(s.s,(select COALESCE(replacetble.toReplace,'') from replacetble where id =1 ),(select COALESCE(replacetble.replaceWith,'') from replacetble where id =1 ))AS z,1 AS idr FROM s
UNION ALL
SELECT REPLACE(replacenow.z,(SELECT COALESCE(replacetble.toReplace,'') FROM replacetble
WHERE id = idr),(SELECT COALESCE(replacetble.replaceWith,'') FROM replacetble
WHERE id = idr)) AS z, idr+1 FROM replacenow
WHERE idr>=1 AND idr<=LEN(replacenow.z)
)
SELECT z,idr FROM replacenow
WHERE idr=(SELECT max(id) FROM replacetble)+1
First thing to do here is to get rid of the replacetable CTE and start using a table. A table and an index on it.
CREATE TABLE [dbo].[ReplaceTable](
[id] [int] IDENTITY(1,1) NOT NULL,
[toReplace] [varchar](100) NULL,
[replaceWith] [varchar](100) NULL,
CONSTRAINT [PK_ReplaceTable] PRIMARY KEY CLUSTERED
(
[id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
GO
insert into ReplaceTable
select 'a' as toReplace, 'b' as replaceWith
Union ALL
select 'b', 'c'
Union ALL
select 'c', 'd'
Union ALL
select 'd', 'e'
Union ALL
select 'e', 'f'
GO /****** Object: Index [NonClusteredIndex-20130702-222441] Script Date: 7/11/2013 10:20:44 PM ******/ CREATE NONCLUSTERED INDEX [NonClusteredIndex-20130702-222441] ON [dbo].[ReplaceTable] ( [toReplace] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] GO /****** Object: Index [NonClusteredIndex-20130702-225644] Script Date: 7/11/2013 10:20:44 PM ******/ CREATE NONCLUSTERED INDEX [NonClusteredIndex-20130702-225644] ON [dbo].[ReplaceTable] ( [replaceWith] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] GO
Next, we will create a clusterd index on the ReplaceTest table:
/****** Object: Index [ClusteredIndex-20130702-213612] Script Date: 7/11/2013 10:30:13 PM ******/
DROP INDEX [ClusteredIndex-20130702-213612] ON [dbo].[ReplaceTest] WITH ( ONLINE = OFF )
GO
/****** Object: Index [ClusteredIndex-20130702-213612] Script Date: 7/11/2013 10:30:13 PM ******/
CREATE CLUSTERED INDEX [ClusteredIndex-20130702-213612] ON [dbo].[ReplaceTest]
(
[Name] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
And now, the code will suffer minor adjustments in order to take advantage of the recent changes:
;WITH replacenow(z, idr ) AS
(
SELECT REPLACE(replacetest.name,(select COALESCE(replacetable.toReplace,'') from replacetable where id =1 ),(select COALESCE(replacetable.replaceWith,'') from replacetable where id =1 ))AS z,1 AS idr FROM replacetest
UNION ALL
SELECT REPLACE(replacenow.z,(SELECT COALESCE(replacetable.toReplace,'') FROM replacetable
WHERE id = idr),(SELECT COALESCE(replacetable.replaceWith,'') FROM replacetable
WHERE id = idr)) AS z, idr+1 FROM replacenow
WHERE idr>=1 AND idr<=LEN(replacenow.z)
)
SELECT z,idr FROM replacenow
WHERE idr=(SELECT max(id) FROM replacetable)+1
Next, we take a look at the two queries again. We will examine the execution plan and use the sql profiler to determine the number of reads, writes and duration for each query.
A plan is a plan is plan…
If so far the conclusion is not obvious by now, check out the photos below showing the execution plan for each query:
Execution plan for the first approach
Execution plan for the CTE approach
Let’s look at the result from SQL Profiler:
Reads, Writes, Duration for the first approach
Reads, Writes, Duration for the CTE approach
So, to sum it up, there’s really only one way to do this kind of replacing in SQL Server
Approach | Reads | Writes | Duration |
Replace(Replace… | 144 | 0 | 936 |
Using CTE | 823316 | 341 | 6961 |
Fun with SQL CLR
For the final scenario, we’re gonna take a look at the following request: Strip out all the diacritics in a text and replace them with their corresponding characters. So for example: “astăzi” will become “astazi”.
In order to achieve this and not use so many calls of the replace functions that will make the code look horrible, we will use SQL CLR.
First, let’s create a C# class that will hold our RemoveDiacritics method
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using Microsoft.SqlServer.Server;
using System.Globalization;
namespace SQLFunctions
{
public class SVF
{
public static string RemoveDiacritics(string s)
{
string stFormD = s.Normalize(NormalizationForm.FormD);
StringBuilder sb = new StringBuilder();
for (int ich = 0; ich < stFormD.Length; ich++) {
UnicodeCategory uc = CharUnicodeInfo.GetUnicodeCategory(stFormD[ich]);
if (uc != UnicodeCategory.NonSpacingMark) { sb.Append(stFormD[ich]); }
}
return (sb.ToString().Normalize(NormalizationForm.FormC));
}
}
}
Build the class and then run the following sql code:
sp_configure 'clr enabled', 1
GO
reconfigure
GO
create assembly SQLFunctions from 'SQLFunctions.dll'
GO
create function RemoveDiacritics(@s nvarchar(200)) returns nvarchar(200)
as external name [SQLFunctions].[SQLFunctions.SVF].[RemoveDiacritics]
SQL CLR RemoveDiacritics execution plan
Reads, Writes, Duration for SQL CLR RemoveDiacritics
So this method, having 349 reads, 0 writes and 992 ms duration is much nicer to look at and does suffer in terms of performance.
Couple of notes
1. I got the code for the sql clr method that uses the Normalize method from here: http://stackoverflow.com/a/249126/404311
2. I am using SSMS Tools Pack (http://www.ssmstoolspack.com/) to save the execution plans to .jpg files. So big thanks to Mladen Prajdić (@MladenPrajdic) for this
Cheers,