DarkMatter in Cyberspace
  • Home
  • Categories
  • Tags
  • Archives

几种常用语言处理文本的效率比较


读一个37264行,大小为81MB的文本文件f4.json,计算每行的单词数,然后打印出总单词数, Python用时0.16秒,Ruby用时1.16秒,Haskell用时13.4秒,分别差一个数量级。 下面是测试脚本和过程:

wordcount.py:

inp = 'f4.json'
counts = []
with open(inp) as f:
  for line in f:
    counts.append(len(line.split()))
print(sum(counts))

wordcount.rb:

inp = 'f4.json'
words = []
File.open(inp).each do |line|
  words.push(line.split.size)
end
puts words.reduce(0, :+)

WordCount.hs:

main :: IO ()
main = do
input <- readFile "f4.json"
print $ sum(countWords input)

countWords input = map (length.words) (lines input)

测试过程: ``` wc -l f4.json 37264 f4.json

time python wordcount.py 1103404 python wordsum.py 0.14s user 0.01s system 99% cpu 0.157 total

time ruby wordsum.rb 1103404 ruby wordsum.rb 1.13s user 0.02s system 99% cpu 1.158 total

time runhaskell Main.hs 1105752 runhaskell wordcount.hs 12.64s user 0.76s system 100% cpu 13.395 total



Published

Apr 15, 2016

Last Updated

Apr 15, 2016

Category

Tech

Tags

  • benchmark 4
  • haskell 11
  • python 136
  • ruby 9

Contact

  • Powered by Pelican. Theme: Elegant by Talha Mansoor