Ajax Chinese Encoding Solution
Published: 2010-06-03 12:44:00
Category: Frontend
Summary: In mainstream browsers, Ajax objects send data encoded in UTF-8. When both frontend and backend files are uniformly encoded in UTF-8, the setup is therefore straightforward. If a GB character set is genuinely required, a workaround is needed. This article explains solutions for both character sets.
Introduction
In mainstream browsers, Ajax (XMLHttpRequest) objects send request data encoded in UTF-8. When every frontend and backend file is also saved as UTF-8, the setup is straightforward. If a GB character set such as GB2312 is genuinely required, a workaround is needed. This article covers both cases, using PHP for the server-side examples.
The following solutions have been tested in IE series, Firefox 3, Chrome 4, and Opera 10.
UTF-8 Classic Solution
Both frontend and backend files should be uniformly encoded in UTF-8.
HTML file declaration:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
JavaScript files should also be saved as UTF-8. The key to making sure parameters are always transmitted as UTF-8 is to pass every parameter value through encodeURIComponent(), which is what the following line in SF.HTTP.xhr_utf.js does:
param = encodeURIComponent(param);
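As a minimal sketch of what that line does to Chinese text, the snippet below encodes a value on its own and as part of a query string. The buildQuery helper is a hypothetical name introduced only for this illustration; it is not part of SF.HTTP.

```javascript
// Illustrative only: buildQuery is a hypothetical helper, not part of SF.HTTP.
function buildQuery(params) {
    var pairs = [];
    for (var name in params) {
        if (Object.prototype.hasOwnProperty.call(params, name)) {
            // encodeURIComponent percent-encodes the UTF-8 bytes of each character.
            pairs.push(name + '=' + encodeURIComponent(params[name]));
        }
    }
    return pairs.join('&');
}

var encoded = encodeURIComponent('北京');
var query = buildQuery({ var1: '北京', var2: 'test' });

console.log(encoded); // %E5%8C%97%E4%BA%AC
console.log(query);   // var1=%E5%8C%97%E4%BA%AC&var2=test
```

Each Chinese character becomes three percent-encoded bytes, so the request contains only ASCII and the server can decode it as UTF-8 regardless of the browser.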
Note that at this point, the request data received by the backend PHP program is UTF-8 encoded. If you need it in another character set, convert it with iconv. The PHP program's output must also be UTF-8 encoded and declared with a Content-Type header:
header("Content-Type: text/html; charset=UTF-8");
Note that to ensure IE properly receives UTF-8 data, you must write "UTF-8" in uppercase, not lowercase or other forms!
Here is the complete code for each file:
sfxhr_utf.htm
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>AJAX Chinese - UTF-8 Version</title>
<script type="text/javascript" src="SF.HTTP.xhr_utf.js"></script>
</head>
<body>
<script type="text/javascript"></script>
</body>
</html>
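The inline script in the listing above is empty in this copy of the article. As a hedged sketch only, a call that exercises the UTF-8 path might look like the following. The option names come from the method comments in SF.HTTP.xhr_utf.js, and 'ajax_utf.php' and the 'test' parameter follow the listings in this article; the handler bodies and the lastResponse variable are assumptions made for illustration.

```javascript
// Hypothetical usage sketch; handler bodies are illustrative only.
var lastResponse = null;

var options = {
    url: 'ajax_utf.php',
    type: 'post',
    charset: 'utf-8',            // any value other than 'gb2312' takes the UTF-8 path
    params: { test: '北京' },
    success: function (xhr) {
        // xhr is the standard XMLHttpRequest object
        lastResponse = xhr.responseText;
    },
    fail: function (xhr) {
        lastResponse = 'request failed: ' + xhr.status;
    }
};

// Only issue the request where the library is actually loaded (i.e. in the page).
if (typeof SF !== 'undefined' && SF.HTTP) {
    SF.HTTP.xhr(options);
}
```

For the GB2312 version, the same call would presumably use 'ajax.php' as the URL and 'gb2312' as the charset.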
SF.HTTP.xhr_utf.js
var SF = {
    // HTTP request helpers
    HTTP: {
        /**
         * Ajax call method compatible with the GB2312 character set.
         * The workaround uses escape(), so submitted Chinese characters become %u9886%u5730, etc.
         * The backend then converts them to GB-encoded Chinese; see the unescape() function in ajax.php.
         * Input: a JSON-style options object
         * {
         *   'url': request URL
         *   'type': submission method, 'get' or 'post'; default is 'get'
         *   'charset': character set, 'gb2312' or UTF-8 (default)
         *   'params': parameters to submit, e.g. {var1:'北京', var2:'test'}
         *   'success': handler called on a successful response; receives the standard XMLHttpRequest object
         *   'fail': handler called when the request fails, usually to show a message on the page; optional
         *   'loading': function called while waiting for the response, usually to show a message on the page; optional
         * }
         */
        xhr: function(json) {
            // Read the input options and assign default values
            var url = json.url,
                method = json.type || 'get',
                params = json.params || {},
                onComplete = json.success,
                charset = json.charset || 'utf8',
                onFailure = json.fail,
                loading = json.loading;
            // Create the request object, falling back to ActiveX on old IE
            var getHTTPObject = function() {
                var xmlhttp = false;
                if (window.XMLHttpRequest) {
                    xmlhttp = new XMLHttpRequest();
                } else if (window.ActiveXObject) {
                    try {
                        xmlhttp = new ActiveXObject("Msxml2.XMLHTTP");
                    } catch (e) {
                        try {
                            xmlhttp = new ActiveXObject("Microsoft.XMLHTTP");
                        } catch (e) {
                            xmlhttp = false;
                        }
                    }
                }
                return xmlhttp;
            };
            if (loading) {
                loading();
            }
            // Build the query string from the parameters
            var query = '';
            for (var i in params) {
                // Skip properties inherited from the prototype chain
                if (!Object.prototype.hasOwnProperty.call(params, i)) {
                    continue;
                }
                var param = params[i];
                if ('gb2312' == charset) {
                    // GB2312 workaround: send %uXXXX sequences
                    param = escape(param);
                }
                // Final solution for the IE GET parameter transmission issue:
                // when using the UTF-8 character set, always apply encodeURIComponent
                else {
                    param = encodeURIComponent(param);
                }
                query += i + '=' + param + '&';
            }
            var XHR = getHTTPObject();
            //XHR.setRequestHeader("charset","gb2312");
            XHR.onreadystatechange = function() {
                if (XHR.readyState == 4) {
                    if (XHR.status == 200 || XHR.status == 304) {
                        if (onComplete) {
                            onComplete(XHR);
                        }
                    } else {
                        if (onFailure) {
                            onFailure(XHR);
                        }
                    }
                }
            };
            method = ('get' == method.toLowerCase()) ? 'get' : 'post';
            if ('post' == method) {
                XHR.open(method, url, true);
                XHR.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
                XHR.send(query);
            } else {
                // Append a random value so the browser does not serve a cached GET response
                url += '?' + query + 'random=' + Math.random();
                XHR.open(method, url, true);
                XHR.send(null);
            }
        }
    }
};
ajax_utf.php
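<?php
// Read the "test" parameter from GET or POST; fall back to an empty string.
$test = isset($_GET['test']) ? $_GET['test'] : (isset($_POST['test']) ? $_POST['test'] : '');
// To ensure IE properly receives UTF-8 data, you must use "UTF-8" in uppercase, not lowercase!
header("Content-Type: text/plain; charset=UTF-8");
echo $test;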
GB2312 Workaround Solution
Both frontend and backend files should be uniformly saved in ANSI encoding, which on a Simplified Chinese Windows system means the GB (GB2312/GBK) code page.
HTML file declaration:
<meta http-equiv="Content-Type" content="text/html; charset=gb2312" />
To prevent the Ajax object from sending data in its default UTF-8 encoding, the core workaround is to pass every parameter value through escape(). Chinese characters become %uXXXX sequences such as %u5317%u4EAC. Because these sequences are plain ASCII, the browser sends them unchanged instead of percent-encoding the characters as UTF-8 bytes, as it would for raw Chinese text. This is the following line in SF.HTTP.xhr.js:
param = escape(param);
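To make the difference concrete, the short sketch below compares escape() with encodeURIComponent() on the same value. It is an illustration only and uses nothing beyond the two built-in functions.

```javascript
var text = '北京';

// escape() emits one %uXXXX sequence per UTF-16 code unit.
var escaped = escape(text);

// encodeURIComponent() emits the percent-encoded UTF-8 bytes instead.
var utf8Encoded = encodeURIComponent(text);

// The escaped form is pure ASCII, so no further character-set conversion happens in transit.
var isAscii = /^[\x00-\x7F]*$/.test(escaped);

// Caveat: escape() leaves "+" untouched.
var plus = escape('1+1');

console.log(escaped);     // %u5317%u4EAC
console.log(utf8Encoded); // %E5%8C%97%E4%BA%AC
console.log(isAscii);     // true
console.log(plus);        // 1+1
```

Because escape() does not encode the plus sign, a literal "+" in a value would arrive as a space once PHP decodes the form data, so values that may contain "+" need extra handling with this workaround.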
Note that at this point, the request data received by the backend PHP program consists of %uXXXX escape sequences rather than bytes in any character set. The PHP program must convert them back to GB2312 data with its own unescape() function (defined in ajax.php below). The output must also be GB2312 encoded and declared before any output is sent:
header("Content-type: text/html; charset=gb2312");
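The parsing step that the server-side unescape() has to perform can be sketched in JavaScript as follows. This is only an illustration of the %uXXXX branch: decodePercentU is a hypothetical name, and the sketch stops at a Unicode string because JavaScript has no built-in GB2312 encoder, whereas the PHP version goes on to convert each code unit to GB2312 with iconv.

```javascript
// Hypothetical helper mirroring only the %uXXXX branch of the PHP unescape().
function decodePercentU(str) {
    // Each %uXXXX holds one UTF-16 code unit as four hex digits.
    return str.replace(/%u([0-9A-Fa-f]{4})/g, function (match, hex) {
        return String.fromCharCode(parseInt(hex, 16));
    });
}

var decoded = decodePercentU('%u5317%u4EAC test');
var untouched = decodePercentU('plain text');

console.log(decoded);   // 北京 test
console.log(untouched); // plain text
```

Input without %uXXXX sequences passes through unchanged, which matches the behavior described in the comment at the top of ajax.php.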
For usage instructions of the SF.HTTP.xhr() method, see the method comments and HTML examples.
In the example packages for the two character sets, SF.HTTP.xhr.js and SF.HTTP.xhr_utf.js have exactly the same content; the only difference is that SF.HTTP.xhr_utf.js is saved as UTF-8. The UTF-8 version, with all files uniformly encoded in UTF-8, is recommended, because IE requires referenced external files to be UTF-8 encoded as well.
Here is the complete code for each file:
sfxhr.htm
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=gb2312" />
<title>AJAX Chinese - GB2312 Version</title>
<script type="text/javascript" src="SF.HTTP.xhr.js"></script>
</head>
<body>
<script type="text/javascript"></script>
</body>
</html>
SF.HTTP.xhr.js
var SF = {
    // HTTP request helpers
    HTTP: {
        /**
         * Ajax call method compatible with the GB2312 character set.
         * The workaround uses escape(), so submitted Chinese characters become %u9886%u5730, etc.
         * The backend then converts them to GB-encoded Chinese; see the unescape() function in ajax.php.
         * Input: a JSON-style options object
         * {
         *   'url': request URL
         *   'type': submission method, 'get' or 'post'; default is 'get'
         *   'charset': character set, 'gb2312' or UTF-8 (default)
         *   'params': parameters to submit, e.g. {var1:'北京', var2:'test'}
         *   'success': handler called on a successful response; receives the standard XMLHttpRequest object
         *   'fail': handler called when the request fails, usually to show a message on the page; optional
         *   'loading': function called while waiting for the response, usually to show a message on the page; optional
         * }
         */
        xhr: function(json) {
            // Read the input options and assign default values
            var url = json.url,
                method = json.type || 'get',
                params = json.params || {},
                onComplete = json.success,
                charset = json.charset || 'utf8',
                onFailure = json.fail,
                loading = json.loading;
            // Create the request object, falling back to ActiveX on old IE
            var getHTTPObject = function() {
                var xmlhttp = false;
                if (window.XMLHttpRequest) {
                    xmlhttp = new XMLHttpRequest();
                } else if (window.ActiveXObject) {
                    try {
                        xmlhttp = new ActiveXObject("Msxml2.XMLHTTP");
                    } catch (e) {
                        try {
                            xmlhttp = new ActiveXObject("Microsoft.XMLHTTP");
                        } catch (e) {
                            xmlhttp = false;
                        }
                    }
                }
                return xmlhttp;
            };
            if (loading) {
                loading();
            }
            // Build the query string from the parameters
            var query = '';
            for (var i in params) {
                // Skip properties inherited from the prototype chain
                if (!Object.prototype.hasOwnProperty.call(params, i)) {
                    continue;
                }
                var param = params[i];
                if ('gb2312' == charset) {
                    // GB2312 workaround: send %uXXXX sequences
                    param = escape(param);
                }
                // Final solution for the IE GET parameter transmission issue:
                // when using the UTF-8 character set, always apply encodeURIComponent
                else {
                    param = encodeURIComponent(param);
                }
                query += i + '=' + param + '&';
            }
            var XHR = getHTTPObject();
            //XHR.setRequestHeader("charset","gb2312");
            XHR.onreadystatechange = function() {
                if (XHR.readyState == 4) {
                    if (XHR.status == 200 || XHR.status == 304) {
                        if (onComplete) {
                            onComplete(XHR);
                        }
                    } else {
                        if (onFailure) {
                            onFailure(XHR);
                        }
                    }
                }
            };
            method = ('get' == method.toLowerCase()) ? 'get' : 'post';
            if ('post' == method) {
                XHR.open(method, url, true);
                XHR.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
                XHR.send(query);
            } else {
                // Append a random value so the browser does not serve a cached GET response
                url += '?' + query + 'random=' + Math.random();
                XHR.open(method, url, true);
                XHR.send(null);
            }
        }
    }
};
ajax.php
<?php
// Converts %uXXXX sequences (and &#xXXXX; / &#NNNN; entities) to GB2312.
// If the input parameter is not a unicode format value, it is returned as is.
function unescape($str) {
    $str = rawurldecode($str);
    preg_match_all("/(?:%u.{4})|&#x.{4};|&#\d+;|.+/U", $str, $r);
    $ar = $r[0];
    foreach ($ar as $k => $v) {
        // pack() produces big-endian bytes here, so the byte order is named explicitly.
        if (substr($v, 0, 2) == "%u") {
            $ar[$k] = iconv("UCS-2BE", "GB2312", pack("H4", substr($v, -4)));
        } elseif (substr($v, 0, 3) == "&#x") {
            $ar[$k] = iconv("UCS-2BE", "GB2312", pack("H4", substr($v, 3, -1)));
        } elseif (substr($v, 0, 2) == "&#") {
            $ar[$k] = iconv("UCS-2BE", "GB2312", pack("n", substr($v, 2, -1)));
        }
    }
    return join("", $ar);
}
// Read the "test" parameter from GET or POST; fall back to an empty string.
$test = isset($_GET['test']) ? $_GET['test'] : (isset($_POST['test']) ? $_POST['test'] : '');
// To ensure the output is read in the intended character set, declare it explicitly.
// Here the output is GB2312 content.
header("Content-type: text/html; charset=gb2312");
echo unescape($test);