This is a supporting post for a rather lengthy article about How to transform any query to Html (scheduled for 2014-12-16).
The case for generating html output directly in T-Sql
It is a widely discussed idea in the community that all server side data handling with T-SQL should not consider issues of data presentation – i.e. how it should be rendered on clients. I agree mostly, but there is at least one exception to this rule – when using database mail to send query results to clients.
When talking to people other than DBAs you need to transport the message contained in your data. Marketing or sales people expect formatted output and the only way to process query results for database mail is to transform them to html.
The problem of special characters
Of course, when transforming row content as string values to html you need to transform any contained special characters as well – a simple example being the frequent Umlauts in German or accented letters in French.
Unfortunately there is no inbuilt function to html-encode special characters in T-SQL. There are several ways to address the problem. One way would be to create a clr-integrated assembly employing .NET-framework functions – potentially an elegant solution since the transform could be handled with a simple call to HtmlEncode in System.Web – but the System.Web namespace cannot be integrated in clr-assemblies.
So I decided to work on a semi-complete solution which is written natively in T-SQL and should do for most practical use cases (at least for mine).
1. Creating a table for Html Entities
My solution employs a table to provide replacements for common characters. The first step is to create this table, using a stored procedure. For this I adapted a somewhat complete list of common special characters from Wikipedia.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 |
-- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -- p_CreateHtmlEntityTable -- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - USE <YOUR DATABASE NAME HERE> GO IF OBJECT_ID(N'p_CreateHtmlEntityTable', 'P') IS NOT NULL DROP PROCEDURE p_CreateHtmlEntityTable GO CREATE PROCEDURE p_CreateHtmlEntityTable AS BEGIN IF OBJECT_ID(N'HtmlEntities', 'U') IS NOT NULL DROP TABLE HtmlEntities; CREATE TABLE htmlEntities ( charValue NCHAR, htmlEntity NVARCHAR(20), CodePointHex NVARCHAR(12), UnicodeHex NVARCHAR(12), UnicodeDec INT, descr NVARCHAR(200) ); INSERT htmlEntities (htmlEntity, UnicodeHex, descr) VALUES ('&', '', 'ampersand'),-- erroneous conversion when after " ('"', '', 'quotation mark (APL quote)'), (''', '', 'apostrophe (apostrophe-quote); see below'), ('<', 'C;', 'less-than sign'), ('>', 'E;', 'greater-than sign'), ... ('♦', '੪', 'black diamond suit[f]'); UPDATE htmlEntities SET CodePointHex = Replace(Replace(UnicodeHex, '&#', ''), ';', '') FROM htmlEntities; UPDATE htmlEntities SET UnicodeDec = CONVERT(INT, dbo.HexStrToVarBinary(CodePointHex)), charValue = NCHAR(CONVERT(INT, dbo.HexStrToVarBinary(CodePointHex))) FROM htmlEntities END GO |
The full-blown code can be downloaded at the bottom of this article.
What I’m essentially doing here is creating a lookup table containing different representations of special characters:
- Column
charValue
shows the special character itself. - Column
htmlEntity
contains the replacement to be used in html output - Column
UnicodeHex
shows the Unicode Hex Value as string - Column
UnicodeDec
shows the Unicode Hex Value in decimal form - Column
CodePointHex
shows the Unicode Codepoint in Hex
The initial Inserts after table creation only takes in the columns htmlEntity, UnicodeHex and description as literal values. At the end I calculate UnicodeDecimal and derive the charValue from it – avoiding character misrepresentations due to pasting literals into the SSMS code editor.
Before I forget: this Stored Procedure needs the HexStrToVarBinary-Function provided by Micheal Letterle in his post T-SQL Hex string to VarBinary improved. I include it as courtesy in the download below.
2. A Function to html-encode special characters
The second step is to create a function to simplify the html-encoding itself.
1 2 3 4 5 6 7 8 9 10 11 12 13 |
IF OBJECT_ID('f_htmlEncode', 'FN') IS NOT NULL DROP FUNCTION f_htmlEncode GO CREATE FUNCTION f_htmlEncode (@input NVARCHAR(MAX)) RETURNS NVARCHAR(MAX) AS BEGIN -- Declare the return variable here DECLARE @result NVARCHAR(MAX) = @input; SELECT @result = REPLACE(@result COLLATE Latin1_General_CS_AS, NCHAR(UnicodeDec), htmlEntity) FROM htmlEntities OPTION (MAXRECURSION 32000) RETURN @result END GO |
A first try with this function was not as satisfying as desired. I had to play around with the ordering of html entities in the replacement table to achieve a proper encoding in the end. And to encode upper and lower case variants correctly it is important to specify the collation as case sensitive with COLLATE Latin1_General_CS_AS.
3. Usage and testing for correctness
To ensure that the encoding is working as expected I wrote a short test. The testing strategy is to html-encode all special characters in a single string and compare them to a string generated directly from their html counterparts.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 |
EXEC p_CreateHtmlEntityTable; PRINT ' ' PRINT 'Testing html encoding for all characters in table htmlEntities' DECLARE @specialChars NVARCHAR(MAX) = '' SELECT @specialChars = COALESCE(@specialChars + '|' + NCHAR(UnicodeDec), '') FROM htmlEntities DECLARE @htmlEntities NVARCHAR(MAX) = '' SELECT @htmlEntities = COALESCE(@htmlEntities + '|' + htmlEntity, '') FROM htmlEntities DECLARE @convHtmlEntities NVARCHAR(MAX) = '' SELECT @convHtmlEntities = dbo.f_htmlEncode(@specialChars) IF @convHtmlEntities = @htmlEntities PRINT 'Success' ELSE BEGIN PRINT 'Encoding failed!' PRINT '@Conversion result =' PRINT @convHtmlEntities PRINT ' ' PRINT 'Expected result = ' PRINT @HtmlEntities END |
First I execute the stored procedure to update table htmlEntities
. Then I create a string containing all special characters in the table – converted on the fly with NCHAR(UnicodeDec)
to avoid mis-representations. Step 3 is to build the expected conversion result – a string concatenated from all corresponding html entities . Then the specialChars are html-encoded, compared to the expected result and a success- or failure-message is printed.
I had to repeat this test several times to iron out some encoding problems related to the ordering of replacement entries – but now it works.
Hope this will help you with similar tasks at hand.
To spare you the typing: Here’s the Code to Download.
References:
- List of character entities taken from Wikipedia’s List_of_XML_and_HTML_character_entity_references
- Stackoverflow provides info about Case Sensitivity in T-Sql character replacement
- Inspiration: https://devio.wordpress.com/2009/07/11/convert-unicode-hex-codepoint-to-unicode-character-in-sql-server/
- T-SQL UNICODE-Function: http://msdn.microsoft.com/en-us/library/ms180059.aspx
- The Role of Collations in String Conversion (CASE Sensitivity) : http://msdn.microsoft.com/en-us/library/ms190920.aspx
- SQL Server Collations and Unicode Support: http://msdn.microsoft.com/en-us/library/ms143726.aspx
- CASE Sensitivity in String Conversion: http://msdn.microsoft.com/en-us/library/ms190920.aspx
- HexStrToVarBinary taken from http://michaeldotnet.blogspot.de/2007/11/t-sql-hex-string-to-varbinary-improved.html
Please read my reply to Mark. Unfortunately, I cannot provide any further help
I have the same issue as Mark. Any solution to this problem?
The solution above doesn’t include a way to encode only parts of a string. My first idea would be keep the url in a separate column and include it in the output unencoded. See the follow-up post where I’ve applied the encoding function with several columns to output a full html email : http://www.manfredkipfelsberger.de/transform-any-query-to-html/
hey there – love the code, but… one of the columns I am outputting is a website url but due to the urlencoding it keeps replacing my ” with & quot ; (as expected due to urlencoding support).
Is there a way I can have the html output to allow urls to bypass this encoding so I get an actual clickable href link?
Ta.
Sorry, the Disqus editor screwed it up: the correct code for THORN is 00DE, not 00FE.
Hi Mark,
I just had a look – actually the Unicode-Hex value of Þ is wrong and should be �DE instead of $#00FE.
I tested your issue with this correction and it works.
Regards, Manfred
When I have N’ the ‘, the output is ” þe “. I’ve even tried to eliminate Þ from my Html Entities table, but to no avail.