Random Things: 2013

duminică, 14 iulie 2013

On Replacing in Strings in SQL Server - part 2

Part 1 here: http://verrigo.blogspot.ro/2013/06/on-replacing-in-strings-in-sql-server.html

So last time we looked at two strategies for replacing strings using T-SQL. The first one (using repeated calls of replace function), and the second one, using a recursive CTE approach.

Two things to notice. First, with the recursive CTE approach, the replace is configurable and much easier to look at. Second, the performance of the first approach is a lot better than the second one.

Approach	Reads	Writes	Duration
Replace(Replace…	129	0	380
Using CTE	583162	221	1935

Let’s CTE again (Is it worth it?)

So, let’s remember the CTE approach again

;WITH s AS

(SELECT ROW_NUMBER() OVER (ORDER BY ReplaceTest.Name) AS rono, ReplaceTest.Name AS s FROM dbo.ReplaceTest),

replacetble (id, toReplace, replaceWith)

AS

(select 1 as id, 'a' as toReplace, 'b' as replaceWith

Union ALL

select 2 as id,  'b', 'c'

Union ALL

select 3 as id,   'c', 'd'

Union ALL

select 4 as id,  'd', 'e'

Union ALL

select 5 as id,  'e', 'f'),

replacenow(z, idr ) AS

SELECT REPLACE(s.s,(select COALESCE(replacetble.toReplace,'')  from replacetble where id =1 ),(select COALESCE(replacetble.replaceWith,'') from replacetble where id =1 ))AS z,1 AS idr FROM s

UNION ALL

SELECT REPLACE(replacenow.z,(SELECT COALESCE(replacetble.toReplace,'') FROM replacetble

WHERE id = idr),(SELECT COALESCE(replacetble.replaceWith,'') FROM replacetble

WHERE id = idr)) AS z, idr+1 FROM replacenow

WHERE idr>=1 AND idr<=LEN(replacenow.z)

SELECT z,idr FROM replacenow

WHERE idr=(SELECT max(id) FROM replacetble)+1

First thing to do here is to get rid of the replacetable CTE and start using a table. A table and an index on it.

CREATE TABLE [dbo].[ReplaceTable](

    [id] [int] IDENTITY(1,1) NOT NULL,

    [toReplace] [varchar](100) NULL,

    [replaceWith] [varchar](100) NULL,

 CONSTRAINT [PK_ReplaceTable] PRIMARY KEY CLUSTERED

    [id] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

) ON [PRIMARY]

GO

SET ANSI_PADDING OFF

GO

GO

insert into ReplaceTable

select 'a' as toReplace, 'b' as replaceWith

Union ALL

select  'b', 'c'

Union ALL

select   'c', 'd'

Union ALL

select   'd', 'e'

Union ALL

select   'e', 'f'

GO

/****** Object:  Index [NonClusteredIndex-20130702-222441]    Script Date: 7/11/2013 10:20:44 PM ******/
CREATE NONCLUSTERED INDEX [NonClusteredIndex-20130702-222441] ON [dbo].[ReplaceTable]
(
    [toReplace] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO

/****** Object:  Index [NonClusteredIndex-20130702-225644]    Script Date: 7/11/2013 10:20:44 PM ******/
CREATE NONCLUSTERED INDEX [NonClusteredIndex-20130702-225644] ON [dbo].[ReplaceTable]
(
    [replaceWith] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO

Next, we will create a clusterd index on the ReplaceTest table:

/****** Object:  Index [ClusteredIndex-20130702-213612]    Script Date: 7/11/2013 10:30:13 PM ******/

DROP INDEX [ClusteredIndex-20130702-213612] ON [dbo].[ReplaceTest] WITH ( ONLINE = OFF )

GO

/****** Object:  Index [ClusteredIndex-20130702-213612]    Script Date: 7/11/2013 10:30:13 PM ******/

CREATE CLUSTERED INDEX [ClusteredIndex-20130702-213612] ON [dbo].[ReplaceTest]

    [Name] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

GO

And now, the code will suffer minor adjustments in order to take advantage of the recent changes:

;WITH replacenow(z, idr ) AS

SELECT REPLACE(replacetest.name,(select COALESCE(replacetable.toReplace,'')  from replacetable where id =1 ),(select COALESCE(replacetable.replaceWith,'') from replacetable where id =1 ))AS z,1 AS idr FROM replacetest

UNION ALL

SELECT REPLACE(replacenow.z,(SELECT COALESCE(replacetable.toReplace,'') FROM replacetable

WHERE id = idr),(SELECT COALESCE(replacetable.replaceWith,'') FROM replacetable

WHERE id = idr)) AS z, idr+1 FROM replacenow

WHERE idr>=1 AND idr<=LEN(replacenow.z)

SELECT z,idr FROM replacenow

WHERE idr=(SELECT max(id) FROM replacetable)+1

Next, we take a look at the two queries again. We will examine the execution plan and use the sql profiler to determine the number of reads, writes and duration for each query.

A plan is a plan is plan…

If so far the conclusion is not obvious by now, check out the photos below showing the execution plan for each query:

Execution plan for the first approach

Execution plan for the CTE approach

Let’s look at the result from SQL Profiler:

Reads, Writes, Duration for the first approach

Reads, Writes, Duration for the CTE approach

So, to sum it up, there’s really only one way to do this kind of replacing in SQL Server

Approach	Reads	Writes	Duration
Replace(Replace…	144	0	936
Using CTE	823316	341	6961

Fun with SQL CLR

For the final scenario, we’re gonna take a look at the following request: Strip out all the diacritics in a text and replace them with their corresponding characters. So for example: “astăzi” will become “astazi”.

In order to achieve this and not use so many calls of the replace functions that will make the code look horrible, we will use SQL CLR.

First, let’s create a C# class that will hold our RemoveDiacritics method

using System;

using System.Collections.Generic;

using System.Linq;

using System.Text;

using System.Threading.Tasks;

using Microsoft.SqlServer.Server;

using System.Globalization;

namespace SQLFunctions

    public class SVF

        public static string RemoveDiacritics(string s)

            string stFormD = s.Normalize(NormalizationForm.FormD);

            StringBuilder sb = new StringBuilder();

            for (int ich = 0; ich < stFormD.Length; ich++) {

                UnicodeCategory uc = CharUnicodeInfo.GetUnicodeCategory(stFormD[ich]);

                if (uc != UnicodeCategory.NonSpacingMark) { sb.Append(stFormD[ich]); }

            return (sb.ToString().Normalize(NormalizationForm.FormC));

Build the class and then run the following sql code:

sp_configure 'clr enabled', 1

GO

reconfigure

GO

create assembly SQLFunctions from 'SQLFunctions.dll'

GO

create function RemoveDiacritics(@s nvarchar(200)) returns nvarchar(200)

as external name [SQLFunctions].[SQLFunctions.SVF].[RemoveDiacritics]

Now, let’s see how a call to our new function compares to the previous methods. First, the execution plan and then the profiler output

SQL CLR RemoveDiacritics execution plan

Reads, Writes, Duration for SQL CLR RemoveDiacritics

So this method, having 349 reads, 0 writes and 992 ms duration is much nicer to look at and does suffer in terms of performance.

Couple of notes

1. I got the code for the sql clr method that uses the Normalize method from here: http://stackoverflow.com/a/249126/404311

2. I am using SSMS Tools Pack (http://www.ssmstoolspack.com/) to save the execution plans to .jpg files. So big thanks to Mladen Prajdić (@MladenPrajdic) for this

Cheers,

sâmbătă, 29 iunie 2013

On Replacing in Strings in SQL Server - part 1

It’s easy: use the replace function in T-SQL

Select replace('aaaaax', 'a', 'b')

But what if we want to replace multiple strings? We have call replace multiple time within itself until it gets so ugly we immediately disown the code and send it to a boarding school in Switzerland just so we don’t see it anymore:

Select replace(replace(replace(replace('abcdefgh', 'a', 'b'), 'b', 'c'), 'c','d'),'d','e')

And if there are 10 characters to replace, it will look like some monsters from deep inside the depths of hell. Or whatever.

Testing

Let’s first create a table containing a sequence of integers, from 1 to 10000. The first time I saw this technique was in Louis Davidson’s book

Pro SQL Server 2008 Relational Database Design and Implementation (http://www.amazon.com/Server-Relational-Database-Implementation-Experts/dp/143020866X/ref=sr_1_5?ie=UTF8&qid=1372519918&sr=8-5&keywords=louis+davidson+pro+sql+server)

I really like this approach, so here it is:

USE [Tests]

GO

/****** Object:  Table [dbo].[sequence]    Script Date: 6/29/2013 8:35:09 AM ******/

SET ANSI_NULLS ON

GO

SET QUOTED_IDENTIFIER ON

GO

CREATE TABLE [dbo].[sequence](

    [Number] [int] NULL

) ON [PRIMARY]

GO

;with digits as

select (1) as number

Union

select (2) as number

Union

select (3) as number

Union

select (4) as number

Union

select (5) as number

Union

select (6) as number

Union

select (7) as number

Union

select (8) as number

Union

select (9) as number

Union

select (0) as number

insert into sequence

select d1.number + 10*d2.number+100*d3.number+1000*d4.number +1 from digits d1 cross join digits d2

cross join digits d3

cross join digits d4

Let’s create a test table that will contain just one column (name) and 10000 rows. This table will be used to test the different replace strategies we will consider.

USE [Tests]

GO

/****** Object:  Table [dbo].[ReplaceTest]    Script Date: 6/29/2013 11:29:28 PM ******/

SET ANSI_NULLS ON

GO

SET QUOTED_IDENTIFIER ON

GO

CREATE TABLE [dbo].[ReplaceTest](

    [Name] [nvarchar](100) NULL

) ON [PRIMARY]

GO

with InitialString as

(Select 'abcdefghijklmnopqrstuvwxyz' as initial)

insert into ReplaceTest

select InitialString.initial+cast(number as varchar(5))

from InitialString cross join sequence

So now we have table containing 10000 strings. Notice there is no index on this table, and this is by design. The goal here is to analyze the replace strategies without indexing the table, focusing only on the algorithm.

The Replace Code from Hell

The first replace code to test is the obvious one:

select replace(replace(replace(replace(replace(ReplaceTest.Name,'a', 'b'), 'b', 'c'), 'c', 'd'), 'd', 'e'), 'e', 'f') from ReplaceTest

Using a trace on the sql server, let’s look at the number of reads and query duration for the code above:

So we have 129 reads and 380 ms duration.

Replacing using CTE – A Bad Idea?

Let’s do things… a bit differently. Let’s do some replacing using CTE, recursive CTE and create a more configurable, more friendly SQL code.

;WITH s AS

(SELECT ROW_NUMBER() OVER (ORDER BY ReplaceTest.Name) AS rono, ReplaceTest.Name AS s FROM dbo.ReplaceTest),

replacetble (id, toReplace, replaceWith)

AS

(select 1 as id, 'a' as toReplace, 'b' as replaceWith

Union ALL

select 2 as id,  'b', 'c'

Union ALL

select 3 as id,   'c', 'd'

Union ALL

select 4 as id,  'd', 'e'

Union ALL

select 5 as id,  'e', 'f'),

replacenow(z, idr ) AS

SELECT REPLACE(s.s,(select COALESCE(replacetble.toReplace,'')  from replacetble where id =1 ),(select COALESCE(replacetble.replaceWith,'') from replacetble where id =1 ))AS z,1 AS idr FROM s

UNION ALL

SELECT REPLACE(replacenow.z,(SELECT COALESCE(replacetble.toReplace,'') FROM replacetble

WHERE id = idr),(SELECT COALESCE(replacetble.replaceWith,'') FROM replacetble

WHERE id = idr)) AS z, idr+1 FROM replacenow

WHERE idr>=1 AND idr<=LEN(replacenow.z)

SELECT z,idr FROM replacenow

WHERE idr=(SELECT max(id) FROM replacetble)+1

Two things are important here: The replacetable CTE that simply defines the rules of the replace (first replace ‘a’ with ‘b’, then ‘b’ with ‘c’ and so on) and the replacenow CTE that uses a recursive strategy and actually performs the replace. I think this code looks better than the first one, it is more configurable - simply add a new UNION ALL at the end of the replacetable CTE with the string you’d want to replace:

Union ALL

select 6 as id,  'f', 'g'

But, it has a problem. It is very slow. How slow? Take a look

So we have 583162 reads, we have 1935 ms duration and we even have 221 writes, just to show how slow this code really is.

Coming up in part two: A different take on the whole recursive CTE idea and we’ll have fun with some SQL CLR.

Cheers.

Random Things

duminică, 14 iulie 2013

On Replacing in Strings in SQL Server - part 2

Part 1 here: http://verrigo.blogspot.ro/2013/06/on-replacing-in-strings-in-sql-server.html

Let’s CTE again (Is it worth it?)

A plan is a plan is plan…

Fun with SQL CLR

Couple of notes

sâmbătă, 29 iunie 2013

On Replacing in Strings in SQL Server - part 1

Testing

The Replace Code from Hell

Replacing using CTE – A Bad Idea?

Despre mine

Arhivă blog

Subscribe

duminică, 14 iulie 2013

On Replacing in Strings in SQL Server - part 2

Part 1 here: http://verrigo.blogspot.ro/2013/06/on-replacing-in-strings-in-sql-server.html

Let’s CTE again (Is it worth it?)

A plan is a plan is plan…

Fun with SQL CLR

Couple of notes

sâmbătă, 29 iunie 2013

On Replacing in Strings in SQL Server - part 1

Testing

The Replace Code from Hell

Replacing using CTE – A Bad Idea?

Despre mine

Arhivă blog