In Python 2.x, the default string is byte string, which means every byte is convert to a character. If you write a Unicode string, you have to write it as u'...'. On the contrary, in Python 3.x, the default string is Unicode string. If you want a byte string, you have to write it as b'...'.
In Python 3.3:
chn = '将帖子翻译为中\n'
with open('py3uni', 'w') as f:
f.write(chn)
print(chn.encode('utf-8'))
mygbk = chn.encode('gbk')
with open('py3gbk', 'wb') as f:
f.write(mygbk)
readgbk = open('py3gbk', encoding='gbk').read()
print(readgbk)
print(type(readgbk)) # <class 'str'>
In Python 2.7:
# -*- encoding: utf-8 -*-
import codecs
inputStr = u'将帖子翻译为中文2015年3月\n'
gbkFn = 'gbkFile'
utf8Fn = 'utf8File'
print('Print original Unicdoe string: ' + inputStr)
print('Print in UTF-8 encoding: ' + inputStr.encode('utf-8'))
with codecs.open(utf8Fn, 'w', 'utf-8') as f:
f.write(inputStr)
fromUTF8 = codecs.open(utf8Fn, encoding='utf-8').read()
with codecs.open(gbkFn, 'w', 'gbk') as f:
f.write(inputStr)
fromGBK = codecs.open(gbkFn, encoding='gbk').read()
print('Strings from different encodings are the same?')
print(fromUTF8 == fromGBK)
print("\nString type:")
print(type(fromGBK))
You can convert gbkFile to utf8File in shell with:
iconv -f gbk -t utf8 gbkFile > utf8File
.
You can also write strings to file in this way:
gbkStr = inputStr.encode('gbk')
with open('gbkFile', 'wb') as f:
f.write(gbkStr)
While it's not as concise as the previous method.