Problem Analysis
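In practice, however, I found that when HttpClient is called in the most basic way, it does not submit form parameters as UTF-8: non-ASCII values such as Chinese text arrive at the server garbled. The articles I found online did not get to the point, so I read through some of the Commons HttpClient 3.0.1 source code. The first stop was the generateRequestEntity() method in PostMethod: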
```java
/**
 * Generates a request entity from the post parameters, if present. Calls
 * {@link EntityEnclosingMethod#generateRequestBody()} if parameters have not been set.
 *
 * @since 3.0
 */
protected RequestEntity generateRequestEntity() {
    if (!this.params.isEmpty()) {
        // Use a ByteArrayRequestEntity instead of a StringRequestEntity.
        // This is to avoid potential encoding issues. Form url encoded strings
        // are ASCII by definition but the content type may not be. Treating the content
        // as bytes allows us to keep the current charset without worrying about how
        // this charset will effect the encoding of the form url encoded string.
        String content = EncodingUtil.formUrlEncode(getParameters(), getRequestCharSet());
        ByteArrayRequestEntity entity = new ByteArrayRequestEntity(
            EncodingUtil.getAsciiBytes(content),
            FORM_URL_ENCODED_CONTENT_TYPE
        );
        return entity;
    } else {
        return super.generateRequestEntity();
    }
}
```
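The charset matters because formUrlEncode() turns each non-ASCII character into percent-escaped bytes of the charset it is given. As a rough, self-contained illustration, the sketch below encodes the value "中文" with two charsets. It uses java.net.URLEncoder as a stand-in for EncodingUtil.formUrlEncode(), and the class FormCharsetDemo is hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical demo class. java.net.URLEncoder stands in for HttpClient's
// EncodingUtil.formUrlEncode() to show how the charset changes the encoded bytes.
class FormCharsetDemo {

    // Percent-encodes one form value using the named charset.
    static String encodeValue(String value, String charset) {
        try {
            return URLEncoder.encode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String value = "\u4E2D\u6587"; // "中文"
        // ISO-8859-1 cannot represent these characters; each becomes '?' (0x3F)
        System.out.println("ISO-8859-1: " + encodeValue(value, "ISO-8859-1")); // prints ISO-8859-1: %3F%3F
        // UTF-8 encodes each character as three bytes
        System.out.println("UTF-8: " + encodeValue(value, "UTF-8")); // prints UTF-8: %E4%B8%AD%E6%96%87
    }
}
```

In this sketch, ISO-8859-1 replaces both characters with "?" before they are escaped, so the original text is already lost on the client side and no server setting can bring it back.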
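So the HTTP request parameters added as NameValuePair objects are all eventually converted into a RequestEntity and sent to the HTTP server, and they are form-URL-encoded with whatever charset getRequestCharSet() returns. Next, in PostMethod's superclass EntityEnclosingMethod, I found the following code: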
```java
/**
 * Returns the request's charset. The charset is parsed from the request entity's
 * content type, unless the content type header has been set manually.
 *
 * @see RequestEntity#getContentType()
 *
 * @since 3.0
 */
public String getRequestCharSet() {
    if (getRequestHeader("Content-Type") == null) {
        // check the content type from request entity
        // We can't call getRequestEntity() since it will probably call
        // this method.
        if (this.requestEntity != null) {
            return getContentCharSet(
                new Header("Content-Type", requestEntity.getContentType()));
        } else {
            return super.getRequestCharSet();
        }
    } else {
        return super.getRequestCharSet();
    }
}
```
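Put together, the lookup order can be modelled roughly as follows. This is a simplified, hypothetical re-implementation (RequestCharsetResolver is not an HttpClient class) that assumes the library's default content charset of ISO-8859-1 as the fallback. It shows why a plain form POST, whose entity content type carries no charset parameter, ends up with the default rather than UTF-8.

```java
// Hypothetical, simplified model of the lookup order in
// EntityEnclosingMethod.getRequestCharSet(); not HttpClient's actual code.
class RequestCharsetResolver {

    // Assumed fallback, matching HttpClient 3.x's default content charset
    static final String DEFAULT_CHARSET = "ISO-8859-1";

    // Returns the charset parameter of a Content-Type value, or the fallback.
    static String charsetOf(String contentType, String fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String cs = p.substring(8).trim();
                // Strip optional surrounding quotes, e.g. charset="UTF-8"
                if (cs.length() >= 2 && cs.startsWith("\"") && cs.endsWith("\"")) {
                    cs = cs.substring(1, cs.length() - 1);
                }
                return cs.isEmpty() ? fallback : cs;
            }
        }
        return fallback;
    }

    // headerValue: a manually set Content-Type header, or null
    // entityContentType: content type of an already built request entity, or null
    static String requestCharSet(String headerValue, String entityContentType) {
        if (headerValue == null && entityContentType != null) {
            return charsetOf(entityContentType, DEFAULT_CHARSET);
        }
        // Corresponds to super.getRequestCharSet(): header charset, else the default
        return charsetOf(headerValue, DEFAULT_CHARSET);
    }

    public static void main(String[] args) {
        // Form POST from NameValuePairs: the entity type has no charset parameter
        System.out.println(requestCharSet(null, "application/x-www-form-urlencoded")); // prints ISO-8859-1
        // Content-Type header set by hand with an explicit charset
        System.out.println(requestCharSet("application/x-www-form-urlencoded; charset=UTF-8", null)); // prints UTF-8
    }
}
```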
Solution
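The two snippets above show how HttpClient derives the request's charset from "Content-Type", and how that charset is then applied when the submitted content is encoded. Because the form entity's content type (FORM_URL_ENCODED_CONTENT_TYPE) carries no charset parameter and no Content-Type header is set by default, the lookup falls through to the default charset, which is not UTF-8. Following this principle, all we need to do is override getRequestCharSet() and return the name of the charset we want; this fixes the garbled text when submitting POST requests in UTF-8 or any other non-default charset.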
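Overriding works because generateRequestEntity() calls getRequestCharSet() as an ordinary, overridable instance method, so the subclass's answer is what gets used when the body is encoded. The self-contained model below reproduces just that hook. FormPost and Utf8FormPost are hypothetical stand-ins for PostMethod and the UTF8PostMethod used in the test further on, and URLEncoder again replaces HttpClient's own encoder.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical model of PostMethod: the body encoder asks an overridable
// charset hook, just as generateRequestEntity() calls getRequestCharSet().
class FormPost {
    private final String name;
    private final String value;

    FormPost(String name, String value) {
        this.name = name;
        this.value = value;
    }

    // Plays the role of getRequestCharSet(): returns the assumed library default.
    String getRequestCharSet() {
        return "ISO-8859-1";
    }

    // Plays the role of generateRequestEntity(): encodes with the hook's charset.
    String buildBody() {
        try {
            String cs = getRequestCharSet();
            return URLEncoder.encode(name, cs) + "=" + URLEncoder.encode(value, cs);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String value = "\u4E2D\u6587"; // "中文"
        System.out.println(new FormPost("TEXT", value).buildBody());     // prints TEXT=%3F%3F
        System.out.println(new Utf8FormPost("TEXT", value).buildBody()); // prints TEXT=%E4%B8%AD%E6%96%87
    }
}

// Hypothetical counterpart of UTF8PostMethod: only the charset hook changes.
class Utf8FormPost extends FormPost {
    Utf8FormPost(String name, String value) {
        super(name, value);
    }

    @Override
    String getRequestCharSet() {
        return "UTF-8";
    }
}
```

The design keeps the encoding logic in one place and exposes a single hook, which is why a one-method subclass is enough.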
Testing
First, deploy a test page, test.jsp, under Tomcat's ROOT web app. The key part of the page is:
```jsp
<%@ page contentType="text/html;charset=UTF-8"%>
<%@ page session="false" %>
<%
request.setCharacterEncoding("UTF-8");
String val = request.getParameter("TEXT");
System.out.println(">>>> The result is " + val);
%>
```
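The call to request.setCharacterEncoding("UTF-8") before getParameter() matters because the Content-Type HttpClient sends for this form, application/x-www-form-urlencoded, names no charset, so the container would otherwise decode the body with its default (ISO-8859-1 under the Servlet specification). The sketch below mimics that decoding step with java.net.URLDecoder; ServerDecodeDemo is a hypothetical name, and the decoder is only a stand-in for Tomcat's own parameter parsing.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo: URLDecoder stands in for the servlet container's
// parsing of an application/x-www-form-urlencoded body.
class ServerDecodeDemo {

    // Decodes one percent-encoded form value using the named charset.
    static String decodeValue(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String posted = "%E4%B8%AD%E6%96%87"; // "中文" sent as UTF-8
        // Like request.setCharacterEncoding("UTF-8"): yields the two original characters
        String ok = decodeValue(posted, "UTF-8");
        // Container default ISO-8859-1: yields six Latin-1 characters (mojibake)
        String garbled = decodeValue(posted, "ISO-8859-1");
        System.out.println(ok.length() + " vs " + garbled.length()); // prints 2 vs 6
    }
}
```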
Next, write a test class. Its main code is as follows:
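```java
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.NameValuePair;
import org.apache.commons.httpclient.methods.PostMethod;

// The enclosing class name is illustrative; only main() and UTF8PostMethod are essential.
public class Utf8PostTest {

    public static void main(String[] args) throws Exception {
        String url = "http://localhost:8080/test.jsp";
        PostMethod postMethod = new UTF8PostMethod(url);
        // Fill in the value of each form field
        NameValuePair[] data = {
            new NameValuePair("TEXT", "中文")
        };
        // Put the form values into the PostMethod
        postMethod.setRequestBody(data);
        // Execute the PostMethod, always releasing the connection afterwards
        HttpClient httpClient = new HttpClient();
        try {
            httpClient.executeMethod(postMethod);
        } finally {
            postMethod.releaseConnection();
        }
    }

    // Nested subclass for UTF-8 support
    public static class UTF8PostMethod extends PostMethod {
        public UTF8PostMethod(String url) {
            super(url);
        }

        @Override
        public String getRequestCharSet() {
            // Always encode the form parameters as UTF-8
            return "UTF-8";
        }
    }
}
```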
Run this test program, and Tomcat's console output correctly shows ">>>> The result is 中文".