Linux上中文乱码问题的解决办法

中文出现乱码，首先需要查看字符串编码和解码的方法是否一致，其次检查Linux的LANG环境变量中的编码是否匹配程序的编解码方法。

建议统一编码为UTF-8，对应的bash脚本：export LANG=zh_CN.UTF-8

编码是字符串转换为字节数组（二进制串），解码是逆过程， charset指定了其中转换/逆转换的方式（见字符编解码的故事）。就Java而言，用指定的charset（例如UTF-8）对字符串编码用byte[] dst = input_string.getBytes("UTF-8");实现，解码用“String dst = new String(input_byte_array, "UTF-8");”实现；

如果编码使用byte[] dst = input_string.getBytes()，则具体的编码方式依赖于操作系统，存在编码错误的风险，解码用String dst = new String(input_byte_array);也存在解码错误的风险，所以在编/解码过程中最好明确指定使用的charset。

下面的代码将字节数组转换为16进制字节码字符串，以便观察不同的charset如何影响生成的字节码。

public class MQReceiver { 
 public static void main(String[] args) { 
  byte[] buf = mq.recvTextMessage(); // receive msg from mq 
  if (buf != null) { 
   String hexRes = bytes2HexString(buf); 
   System.out.println(hexRes); 
   System.out.println(getChnBytes(hexRes)); 
   String bufstr = new String(buf,"UTF-8"); 
   System.out.println("Recv alarm: " + bufstr); 
  } else { 
   System.out.println("buf: " + buf); 
  } 
 }


 public static String bytes2HexString(byte[] b) { 
  String ret = ""; 
  for (int i = 0; i < b.length; i++) { 
   String hex = Integer.toHexString(b[i] & 0xFF); 
   if (hex.length() == 1) { 
    hex = '0' + hex; 
   } 
   ret += hex.toUpperCase(); 
  } 
  return ret; 
 } 
 public static String getChnBytes(String input) { 
  String sflag = " <FM_ALARM_MSG.AlarmText> "; 
  String eflag = "</FM_ALARM_MSG.AlarmText>"; 
  byte[] start = sflag.getBytes(); 
  byte[] end = eflag.getBytes(); 
  String ss = bytes2HexString(start); 
  String se = bytes2HexString(end); 
  int beginIndex = input.indexOf(ss) + ss.length(); 
  int endIndex = input.indexOf(se); 
  return input.substring(beginIndex, endIndex); 
 } 
}

其中getChnBytes方法从下面的字符数组中找到汉字，提取出来，以便于观察汉字部分编码的变化，

<FM_ALARM_MSG.EventTime>2012-06-20 14:39:51</FM_ALARM_MSG.EventTime> <FM_ALARM_MSG.AlarmText> 检查数据库连接产生异常</FM_ALARM_MSG.AlarmText><C_FP0></C_FP0>

例如UTF-8的“ 检查数据库连接产生异常”编码输出如下：

E6A380 
E69FA5 
E695B0 
E68DAE 
E5BA93 
E8BF9E 
E68EA5 
E4BAA7 
E7949F 
E5BC82 
E5B8B8

注：快典网可以在线查看汉字的各种编码。

Linux上中文乱码问题的解决办法

Published

Last Updated

Category

Tags

Contact