Client side HTML encoding and decoding

One of the things that I have found strange about JavaScript is its lack of built-in functions for HTML encoding and decoding. Most server-side languages have this functionality built in. JavaScript does have the escape, encodeURIComponent, encodeURI, unescape, decodeURIComponent and decodeURI functions, but these are aimed at making strings portable and at encoding URIs and URI parameters. There is no function for HTML encoding.
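
To make the gap concrete, here is a minimal sketch comparing URI encoding with the kind of output HTML encoding needs. The helper basicHtmlEncode is a hypothetical name used only for illustration; it is not a built-in function and not part of Encoder.js, and it only handles the four characters < > & ".

```javascript
// URI encoding is designed for URLs, so it percent-encodes characters
// rather than producing HTML entities.
var raw = '<b>Tom & Jerry</b>';
var uriEncoded = encodeURIComponent(raw);
// uriEncoded is "%3Cb%3ETom%20%26%20Jerry%3C%2Fb%3E"

// Hypothetical helper (an assumption for this example, not a real API):
// encodes only the four characters that are significant in HTML markup.
function basicHtmlEncode(text) {
  return String(text)
    .replace(/&/g, '&amp;')   // ampersand first, so later entities are not re-encoded
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

var htmlEncoded = basicHtmlEncode(raw);
// htmlEncoded is "&lt;b&gt;Tom &amp; Jerry&lt;/b&gt;"
```

The ampersand has to be replaced first; otherwise the ampersands introduced by the other replacements would themselves be encoded.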

Now you may think that there is not much demand for JavaScript HTMLEncode and HTMLDecode functions, as any textual content that needs encoding should be encoded server-side before the HTML page is rendered, and not long ago I would have agreed with you. However, I have started working more and more with AJAX, especially with RSS feeds and other content delivered client side such as Google's AJAX APIs. As a result I increasingly need to reformat content delivered from external sources, especially by HTML encoding or decoding it client side using JavaScript.

For more details about reformatting content with JavaScript, and the problems associated with simple replace statements, you can read my related blog article.

My Encoder Object

I have therefore created a little library of functions designed to help me encode and decode HTML with JavaScript, which you can download here: Encoder.js.

There are a number of useful functions within the object which I will outline here:

  • HTML2Numerical: Converts HTML entities to their numerical equivalents.
  • NumericalToHTML: Converts numerical entities to their HTML equivalents.
  • numEncode: Numerically encodes Unicode characters.
  • htmlDecode: Decodes HTML encoded text to its original state.
  • htmlEncode: Encodes HTML to either numerical or HTML entities. This is determined by the EncodeType property.
  • XSSEncode: Encodes the basic characters used in XSS attacks to malform HTML.
  • correctEncoding: Corrects any double encoded ampersands.
  • stripUnicode: Removes all Unicode characters.
  • hasEncoded: Returns true if a string contains HTML encoded entities.
// Example of using the HTML encode object

// Set the type of encoding to numerical entities, e.g. &#38; instead of &amp;
Encoder.EncodeType = "numerical";

// Or set it to encode to HTML entities, e.g. &amp; instead of &#38;
Encoder.EncodeType = "entity";

// HTML encode the text from an input element (pass its value, not the element itself).
// This will prevent double encoding.
var encoded = Encoder.htmlEncode(document.getElementById('input').value);

// To encode but allow double encoding, which means any existing entities such as
// &amp; will be converted to &amp;amp;
var dblEncoded = Encoder.htmlEncode(document.getElementById('input').value, true);

// Decode the now encoded text
var decoded = Encoder.htmlDecode(encoded);

// Check whether the text still contains HTML/Numerical entities
var containsEncoded = Encoder.hasEncoded(decoded);
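
To give a feel for what functions such as numEncode and hasEncoded are described as doing, here is a rough sketch under my own assumptions. The names numEncodeSketch and hasEncodedSketch are hypothetical, and this is not the code inside Encoder.js; in particular, treating "anything above code point 127" as the set of characters to encode, and the exact entity pattern, are assumptions made for this example.

```javascript
// Hypothetical sketch: replace every character above code point 127
// with a decimal numeric entity such as &#233;
function numEncodeSketch(text) {
  // Array.from iterates by code point, so surrogate pairs stay together.
  return Array.from(String(text), function (ch) {
    var code = ch.codePointAt(0);
    return code > 127 ? '&#' + code + ';' : ch;
  }).join('');
}

// Hypothetical sketch: report whether the text contains something shaped
// like a decimal, hexadecimal or named entity. It checks the shape only,
// not whether a named entity actually exists.
function hasEncodedSketch(text) {
  return /&#[0-9]{1,7};|&#x[0-9a-f]{1,6};|&[a-z][a-z0-9]{1,10};/i.test(String(text));
}

var numeric = numEncodeSketch('caf\u00e9 \u20ac5');
// numeric is "caf&#233; &#8364;5"
var found = hasEncodedSketch(numeric);
// found is true
```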

HTML Encoder and Decoder

I have pre-filled the entry box with some example content, encoded it, and then decoded the result to show that the decoding works. Notice that I have used the standard characters < > & " as well as UTF-8 text (I have no idea what the Arabic says) and some already encoded entities, to show that anything already encoded will not get double encoded. If you don't mind double encoding, call Encoder.htmlEncode(value, true) with the second parameter set to true or, if you are using my form for your own encoding, make sure the checkbox is ticked.
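
The difference between the two modes can be sketched as follows. This is only an illustration of the idea, not the Encoder.js implementation: encodeSketch and its allowDoubleEncoding parameter are hypothetical names, and the regular expression used to recognise an existing entity is an assumption.

```javascript
// Hypothetical sketch of optional double encoding.
// When allowDoubleEncoding is false, an ampersand that already starts
// something shaped like an entity (&amp; &#38; &#x26;) is left alone.
function encodeSketch(text, allowDoubleEncoding) {
  var ampPattern = allowDoubleEncoding
    ? /&/g
    : /&(?!(?:#[0-9]+|#x[0-9a-f]+|[a-z][a-z0-9]*);)/gi;
  return String(text)
    .replace(ampPattern, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

var sample = 'Fish &amp; chips < "salt" & vinegar';

var single = encodeSketch(sample);
// single is 'Fish &amp; chips &lt; &quot;salt&quot; &amp; vinegar'

var doubled = encodeSketch(sample, true);
// doubled is 'Fish &amp;amp; chips &lt; &quot;salt&quot; &amp; vinegar'
```

In the first call the existing &amp; is preserved and only the bare ampersand is encoded; in the second every ampersand is encoded, so the existing entity becomes &amp;amp;.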


