DarkMatter in Cyberspace
  • Home
  • Categories
  • Tags
  • Archives

Calculate String Length Containing Chinese Characters


I want to remove all lines with only 1 Chinese character. The origin file is very large, so I have to filter out all lines with 1 character. And remove them together.

Sort all lines in the file by their length is a easy way to achieve this. The content of file origin.txt is:

一十一 一十二 一十三 一十四 一十五 一十六 一十七 一十八 一十九 二十

Shell

awk '{print length, $0}' origin.txt | sort -n | cut -d " " -f2- | uniq > target.txt

Python

Node



Published

May 7, 2015

Last Updated

May 7, 2015

Category

Tech

Tags

  • chinese 8
  • length 1
  • node 4
  • python 136
  • unicode 10

Contact

  • Powered by Pelican. Theme: Elegant by Talha Mansoor