UTF-8 Encoding Problems with HttpClient POST

pengx 2009-09-15

Problem Analysis

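In practice, however, I found that HttpClient does not handle UTF-8 when it is called in the most basic way. The articles I found online did not get to the root of the problem, so I read through some of the commons-httpclient 3.0.1 source code. The first thing I found was the generateRequestEntity() method in PostMethod: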

    /**
     * Generates a request entity from the post parameters, if present.   Calls
     * {@link EntityEnclosingMethod#generateRequestBody()} if parameters have not been set.
     * 
     * @since 3.0
     */
    protected RequestEntity generateRequestEntity() {
        if (!this.params.isEmpty()) {
            // Use a ByteArrayRequestEntity instead of a StringRequestEntity.
            // This is to avoid potential encoding issues.   Form url encoded strings
            // are ASCII by definition but the content type may not be.   Treating the content
            // as bytes allows us to keep the current charset without worrying about how
            // this charset will effect the encoding of the form url encoded string.
            String content = EncodingUtil.formUrlEncode(getParameters(), getRequestCharSet());
            ByteArrayRequestEntity entity = new ByteArrayRequestEntity(
                EncodingUtil.getAsciiBytes(content),
                FORM_URL_ENCODED_CONTENT_TYPE
            );
            return entity;
        } else {
            return super.generateRequestEntity();
        }
    }

So the HTTP request parameters added through NameValuePair are ultimately converted into a RequestEntity and submitted to the HTTP server. Next, in EntityEnclosingMethod, the parent class of PostMethod, I found the following code:

    /**
     * Returns the request's charset.   The charset is parsed from the request entity's 
     * content type, unless the content type header has been set manually. 
     * 
     * @see RequestEntity#getContentType()
     * 
     * @since 3.0
     */
    public String getRequestCharSet() {
        if (getRequestHeader("Content-Type") == null) {
            // check the content type from request entity
            // We can't call getRequestEntity() since it will probably call
            // this method.
            if (this.requestEntity != null) {
                return getContentCharSet(
                    new Header("Content-Type", requestEntity.getContentType()));
            } else {
                return super.getRequestCharSet();
            }
        } else {
            return super.getRequestCharSet();
        }
    }
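
The lookup above can be illustrated with a small sketch of resolving a charset from a Content-Type value, with a fallback when none is declared. This is a simplified stand-in, not HttpClient's actual implementation: the class name CharsetLookup, the method charsetOf, and the naive parameter parsing are all assumptions made for illustration. The fallback value used in main reflects the ISO-8859-1 default content charset of HttpClient 3.x, which applies when the form content type "application/x-www-form-urlencoded" carries no charset parameter.

```java
public class CharsetLookup {
    // Hypothetical helper: returns the charset parameter of a Content-Type
    // value, or the given fallback when the value is null or declares none.
    public static String charsetOf(String contentType, String fallback) {
        if (contentType == null) {
            return fallback;
        }
        String key = "charset=";
        for (String part : contentType.split(";")) {
            String p = part.trim();
            // Case-insensitive match on the parameter name
            if (p.regionMatches(true, 0, key, 0, key.length())) {
                String value = p.substring(key.length()).trim();
                // Strip optional surrounding double quotes
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? fallback : value;
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        // No charset parameter: the fallback is used
        System.out.println(charsetOf("application/x-www-form-urlencoded", "ISO-8859-1"));
        // Explicit charset parameter: it wins over the fallback
        System.out.println(charsetOf("application/x-www-form-urlencoded; charset=UTF-8", "ISO-8859-1"));
    }
}
```

With no explicit charset on the request, the fallback is what ends up being passed to the form encoder, which is why a plain PostMethod does not produce UTF-8.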

Solution

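The two snippets above show how HttpClient derives the request encoding (charset) from "Content-Type", and how that encoding is then applied when the submitted content is encoded. Following this principle, all we need to do is override getRequestCharSet() so that it returns the name of the encoding (charset) we want. That fixes the garbled text seen when a POST request is submitted in UTF-8 or any other non-default encoding.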
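
To see why the charset matters for the encoded form body, here is a minimal sketch that uses the JDK's URLEncoder as a stand-in for HttpClient's EncodingUtil.formUrlEncode(). The class name FormEncodingDemo and its encode method are hypothetical, and the two encoders are not guaranteed to agree in every detail; the point is only how the chosen charset changes the bytes that get percent-encoded.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class FormEncodingDemo {
    // Form-URL-encodes a single value with the given charset (JDK URLEncoder).
    public static String encode(String value, String charset)
            throws UnsupportedEncodingException {
        return URLEncoder.encode(value, charset);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String text = "\u4e2d\u6587"; // the two Chinese characters used in the test below
        // UTF-8 can represent both characters: three bytes each
        System.out.println(encode(text, "UTF-8"));      // %E4%B8%AD%E6%96%87
        // ISO-8859-1 cannot, so each character is replaced by '?' (0x3F)
        System.out.println(encode(text, "ISO-8859-1")); // %3F%3F
    }
}
```

Once the characters have been replaced during encoding, the server has no way to recover them, whatever encoding it uses to decode the request.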

Testing

First, deploy a page named test.jsp under Tomcat's ROOT web application to act as the test page. The main code is:

<%@ page contentType="text/html;charset=UTF-8"%>
<%@ page session="false" %>
<%
request.setCharacterEncoding("UTF-8");
String val = request.getParameter("TEXT");
System.out.println(">>>> The result is " + val);
%>
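
The call to request.setCharacterEncoding("UTF-8") tells the container which charset to use when it decodes the form body. The effect can be sketched with the JDK's URLDecoder; this is only an approximation of what a servlet container does internally, and the class name FormDecodingDemo and its decode method are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class FormDecodingDemo {
    // Decodes a percent-encoded form value with the given charset (JDK URLDecoder).
    public static String decode(String encoded, String charset)
            throws UnsupportedEncodingException {
        return URLDecoder.decode(encoded, charset);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String body = "%E4%B8%AD%E6%96%87"; // the test value, form-encoded as UTF-8
        // Decoding with the charset that was used for encoding restores the text
        System.out.println(decode(body, "UTF-8").equals("\u4e2d\u6587"));
        // Decoding with ISO-8859-1 maps each of the six bytes to its own character
        System.out.println(decode(body, "ISO-8859-1").length());
    }
}
```

Client and server therefore have to agree: the client must encode the form in UTF-8, and the page must decode it as UTF-8.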

Next, write a test class. The main code is:

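    import org.apache.commons.httpclient.HttpClient;
    import org.apache.commons.httpclient.NameValuePair;
    import org.apache.commons.httpclient.methods.PostMethod;

    // The enclosing class name is arbitrary; the original shows only its members.
    public class UTF8PostTest {
        public static void main(String[] args) throws Exception {
            String url = "http://localhost:8080/test.jsp";
            PostMethod postMethod = new UTF8PostMethod(url);
            // Fill in the value of each form field
            NameValuePair[] data = {
                    new NameValuePair("TEXT", "中文"),
            };
            // Attach the form values to the POST method
            postMethod.setRequestBody(data);
            // Execute the POST and release the connection afterwards
            HttpClient httpClient = new HttpClient();
            try {
                httpClient.executeMethod(postMethod);
            } finally {
                postMethod.releaseConnection();
            }
        }

        // PostMethod subclass that forces UTF-8 as the request charset
        public static class UTF8PostMethod extends PostMethod {
            public UTF8PostMethod(String url) {
                super(url);
            }

            @Override
            public String getRequestCharSet() {
                return "UTF-8";
            }
        }
    }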

Run the test program. Tomcat's console output correctly prints ">>>> The result is 中文".
