Chapter 2
If we provide two methods with the same name, the second one is retained because of the strict evaluation order.
在一个类中,如果定义了两个名称一样的方法,后一个会覆盖前一个, 下面是一个示例:
$ cat << EOF > test.py
class Test:
def ma(self):
print('m1')
def ma(self):
print('m2')
test_same_method = '''
>>> Test().ma()
m2
'''
__test__ = {
'Same_method': test_same_method
}
def test():
import doctest
doctest.testmod()
if __name__ == "__main__":
test()
EOF
$ python3 test.py
...
Test passed.
Chapter 3
doctest
doctest能够将函数的docstring作为测试依据,非常方便,
基本的方法是doctest.testmod()
,
testmod是test module的意思。
不用参数特别指定的情况下,testmod
测试当前模块中所有的函数,
以及在__test__
字典里定义的测试。
__test__
的结构是:{ 'test_name1': 'test_str1', 'test_name2': 'test_str2', ... }
.
其中test_str
是一段完整的测试docstring,参考第2章中的示例。
如果testmod()
中不带verbose
参数,只有在测试失败时才打印测试报告,
如果doctest.testmod(verbose=True)
,则不论成功失败都打印测试报告。
Python用python -i <fileanme.py>
把脚本加载到console中,以方便生成测试docstring.
IPython有个%doctest_mod
,方便进行doctest,具体Google 'ipython doctest mode'.
namedtuple
namedtuple 对象是一组key-value对,key是字符串, value可以是任何类型的对象(namedtuple的定义中并不指定value的类型), 所以既可以给namedtuple对象属性赋值tuple或者namedtuple形成一个完全immutalbe的对象, 也可以给namedtuple对象属性赋值list或者dict,形成一种“半可变”对象, 例如下面就是一个半可变对象:
>>> Pet = namedtuple('Pet', 'name age')
>>> kitty = Pet('kitty', 3)
>>> puppy = Pet('puppy', 5)
>>> People = namedtuple('People', 'name age pets')
>>> leo = People('Leo', 23, [kitty, puppy])
>>> leo
People(name='Leo', age=23, pets=[Pet(name='kitty', age=3), Pet(name='puppy', age=5)])
>>> leo.name
'Leo'
>>> leo.age = 33
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: can't set attribute
>>> leo.pets
[Pet(name='kitty', age=3), Pet(name='puppy', age=5)]
>>> leo.pets[1].name
'puppy'
>>> leo.pets.append('panda')
>>> leo.pets
[Pet(name='kitty', age=3), Pet(name='puppy', age=5), 'panda']
namedtuple对象leo的name和age是不可变的, pets属性则是可变的list,可以在上面append任意类型数据。
与dict相比,namedtuple的优势首先是immutable的,其次可以使用点号引用属性。
根据文档,声明namedtuple的key列表field_names
时,
既可以写成'name age'
,也可以写成['name', 'age']
,二者是等价的。
JavaScript ES6中的const
关键字能够防止变量被rebinding,
不能保证变量本身不变,例如给一个const对象添加属性是合法的:
const foo = {};
foo.bar = 42;
console.log(foo.bar); // 42
所以const只能保证简单变量如果数值、字符串不变(不能rebinding),无法保证对象不变 (但声明为const的对象在可读性性上仍然有优势)。 Python无法保证简单量不变(没有防止rebinding的机制), 但可以用namedtuple生成严格意义上的immutable object,二者各有千秋。
函数参数列表中的星号
怎样理解下面代码中的Pair(*row[n*2:n*2+2])
?
from collections import namedtuple
Pair = namedtuple("Pair", ("x", "y"))
def series(n, row_iter):
for row in row_iter:
yield Pair(*row[n*2:n*2+2])
首先看如何在调用函数时使用星号:
def myfunc(foo, bar):
print('foo: %s' % foo)
print('bar: %s' % bar)
pms = ['p1', 'p2']
test_same_method = '''
>>> myfunc(*pms)
foo: p1
bar: p2
'''
__test__ = {
'Same_method': test_same_method
}
def test():
import doctest
doctest.testmod()
if __name__ == "__main__":
test()
可知星号的作用是把作为输入参数的list展开成多个参数, 再被送给函数处理。
上面代码中row[n*2:n*2+2]
的值是一个长度为2的list,假设是[2.32 8.23]
,
那么Pair(*row[n*2:n*2+2])
实际上是Pair(2.32, 8.23)
。
Chapter 4
iterable vs list
iterable
是可迭代的对象,可以通过iter()
或者自己的__iter__()
方法转变为迭代器(iterator).
iterable
可以是有状态的,也可以是无状态的,例如字符串和list都是iterable,
但前者是无状态的(immutale),后者是mutable的。
iterator
是迭代器,是有状态的,它的内部指针记录了当前值,iterator
也是一种iterable
。
每次next(iterator1)
,就返回iterator1
的内部指针指向的当前元素,然后指针后移一位。
类似于FIFO的队列,而list的pop()
方法可以看作是LIFO的栈,
它们之间的关系是:
对iterable
使用iter()
函数,得到lazy的iterator
,
对iterator
使用list()
函数,得到eager的list,即:
iter(iterable) -> iterator (lazy)
list(iterator) -> list (eager)
而生成器(generator)是迭代器的一种,
生成器(generator)包含两种子类型:generator function, generator expression.
前者是包含yield
关键字的函数,后者是形如(x*x for x in range(3))
这样的表达式,
List comprehension, such as mylist = [x*x for x in range(3)]
is just a syntax sugar for
my_ge = (x*x for x in range(3))
mylist = list(my_ge)`
由于generator是一种iterator,所以可以用下面的方法对其取值:
In [1]: aa = (x*x for x in range(3))
In [2]: aa?
Type: generator
String form: <generator object <genexpr> at 0x7f226a1184c8>
Docstring: <no docstring>
In [3]: next(aa)
Out[3]: 0
In [4]: list(aa)
Out[4]: [1, 4]
In [5]: list(x*x for x in range(3))
Out[5]: [0, 1, 4]
In [6]: list((x*x for x in range(3)))
Out[6]: [0, 1, 4]
参考:
下面代码演示了二者的关系,
In [1]: s = 'cat'
In [2]: sit = s.__iter__()
In [3]: next(sit)
Out[3]: 'c'
In [4]: next(sit)
Out[4]: 'a'
In [5]: sit2 = iter(s)
In [6]: next(sit2)
Out[6]: 'c'
In [7]: next(sit2)
Out[7]: 'a'
In [8]: sit?
Type: str_iterator
String form: <str_iterator object at 0x7fdc48a19940>
Docstring: <no docstring>
In [9]: sit2?
Type: str_iterator
String form: <str_iterator object at 0x7fdc48a1add8>
Docstring: <no docstring>
下面是对P68代码的分析:
$ ipython
Python 3.4.3 (default, Nov 17 2016, 01:08:31)
In [1]: def legs(lat_lon_iter):
...: begin = next(lat_lon_iter)
...: print('begin: %s' % begin)
...: for end in lat_lon_iter:
...: print('end: %s' % end)
...: yield begin, end
...: print('begin, end after yield: %s, %s' % (begin, end))
...: begin = end
...: print('begin, end after begin=end: %s, %s' % (begin, end))
...:
In [2]: list(legs(x for x in range(3)))
begin: 0
end: 1
begin, end after yield: 0, 1
begin, end after begin=end: 1, 1
end: 2
begin, end after yield: 1, 2
begin, end after begin=end: 2, 2
Out[2]: [(0, 1), (1, 2)]
In [3]: list(legs(iter([0,1,2])))
begin: 0
end: 1
begin, end after yield: 0, 1
begin, end after begin=end: 1, 1
end: 2
begin, end after yield: 1, 2
begin, end after begin=end: 2, 2
Out[3]: [(0, 1), (1, 2)]
In [4]: list(legs([0,1,2]))
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-4-543b79d86b01> in <module>()
----> 1 list(legs([0,1,2]))
<ipython-input-1-53eedac44550> in legs(lat_lon_iter)
1 def legs(lat_lon_iter):
----> 2 begin = next(lat_lon_iter)
3 print('begin: %s' % begin)
4 for end in lat_lon_iter:
5 print('end: %s' % end)
TypeError: 'list' object is not an iterator
In [5]: myit = legs(x for x in range(3))
In [6]: next(myit)
begin: 0
end: 1
Out[6]: (0, 1)
In [7]: next(myit)
begin, end after yield: 0, 1
begin, end after begin=end: 1, 1
end: 2
Out[7]: (1, 2)
In [8]: next(myit)
begin, end after yield: 1, 2
begin, end after begin=end: 2, 2
---------------------------------------------------------------------------
StopIteration Traceback (most recent call last)
<ipython-input-8-410d07518a3f> in <module>()
----> 1 next(myit)
StopIteration:
In [9]:
Chapter 5
range object
P95. range
in Python 3 is lazy, immutable. It is not a generator,
and can be used multiple times.
See If range() is a generator in Python 3.3, why can I not call next() on a range? for details.
Chapter 7
关于 Spearman rank-order correlation 一节的计算公式和具体含义,参考 Spearman's rank correlation coefficient 的 Example 一节,给出了一个智商和看电视时间长度关系的例子很直观。
第1版第157页示例代码错误:
rank_x= tuple(ranker(pairs, key=lambda x:x[0] ))
和
rank_xy= (ranker(rank_x, key=lambda x:x[1] ))
中的ranker
函数应为rank_data
。
rank_data
函数分析
rank_data
函数的职责是做pattern matching:
-
用
if
分支语句将输入数据转为rerank
函数的输入格式交给rerank
函数处理; -
将
rerank
函数返回结果第1项(float
类型的rank排序值)插入到返回结果第2项 (Rank_Data
对象)的rank_seq
属性中。
rerank
函数分析
输入是由Rank_Data
组成的tuple,输出是一个iterator,
元素形如(2.5, Rank_Data(rank_seq=(), raw=(3, 1.2)))
,验证如下:
$ cd $FPP_CODE_SAMPLE/Chapter_7
$ ipython
In [4]: from ch07_ex4 import *
In [10]: key = lambda obj: obj
In [30]: pairs = ((2, 0.8), (3, 1.2), (5, 1.2), (7, 2.3), (11, 18))
In [31]: ranked = tuple(Rank_Data((),d) for d in pairs)
In [32]: ranked
Out[32]:
(Rank_Data(rank_seq=(), raw=(2, 0.8)),
Rank_Data(rank_seq=(), raw=(3, 1.2)),
Rank_Data(rank_seq=(), raw=(5, 1.2)),
Rank_Data(rank_seq=(), raw=(7, 2.3)),
Rank_Data(rank_seq=(), raw=(11, 18)))
In [33]: reRank = rerank(ranked, key)
In [34]: list(reRank)
Out[34]:
[(1.0, Rank_Data(rank_seq=(), raw=(2, 0.8))),
(2.0, Rank_Data(rank_seq=(), raw=(3, 1.2))),
(3.0, Rank_Data(rank_seq=(), raw=(5, 1.2))),
(4.0, Rank_Data(rank_seq=(), raw=(7, 2.3))),
(5.0, Rank_Data(rank_seq=(), raw=(11, 18)))]
In [36]: reRank_y = rerank(ranked, lambda x: x[1])
In [37]: list(reRank_y)
Out[37]:
[(1.0, Rank_Data(rank_seq=(), raw=(2, 0.8))),
(2.5, Rank_Data(rank_seq=(), raw=(3, 1.2))),
(2.5, Rank_Data(rank_seq=(), raw=(5, 1.2))),
(4.0, Rank_Data(rank_seq=(), raw=(7, 2.3))),
(5.0, Rank_Data(rank_seq=(), raw=(11, 18)))]
In [42]: list(reRank_y)
Out[42]: []
更详细的分析见rerank函数docstring.
Coroutine in Python
$ cat << EOF > coro.py
import itertools
def fib():
n, a, b = 0, 0, 1
while True:
yield b
a, b = b, a + b
n = n + 1
fg = fib()
f10 = itertools.islice(fg, 10)
print(list(f10))
$ python coro.py
[1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
Chapter 8
本章用到的函数示例: http://docspy3zh.readthedocs.io/en/latest/library/itertools.html#module-itertools
P169: 演示打开多个文件的源代码多出一行,在译文中被去掉:
readers = map(lambda f: csv.reader(f, delimiter='\t'), files)
因为这行代码在本页下面被提出来,上面不应该先写出来,
且一段代码中有相邻的两个readers
变量也不合理。
P170,groupby()
函数只能处理按key
排好序的序列,
如果没有排好序,groupby()
将返回错误结果,示例:
from itertools import groupby
things = [("animal", "bear"), ("plant", "cactus"), ("vehicle", "speed boat"),
("vehicle", "school bus"), ("animal", "duck")]
types = groupby(things, lambda x: x[0])
for key, group in types:
print('-------- a new type ---------')
for thing in group:
print('key: %s, thing: %s' % (key, thing))
sortedthings = sorted(things, key=lambda x: x[0])
sortedtypes = groupby(sortedthings, lambda x: x[0])
for key, group in sortedtypes:
print('-------- a sorted type ---------')
for thing in group:
print('key: %s, thing: %s' % (key, thing))
P175:
filterfalse = lambda pred, iterable:
filter(lambda x: not pred(x), iterable)
上面代码中,pred
和iterable
是输入参数,返回值是
filter(lambda x: not pred(x), iterable)
.
参考filter函数定义,
上面的返回值可以写为:
mypred = lambda x: not pred(x)
(item for item in iterable if mypred(item))
P178:
对take
的表述有误,function(0)
和function(1)
后面还有更多值,
见https://docs.python.org/2/library/itertools.html#recipes
consume()
函数用法比较特别,它类似于listobj.append()
函数,
最终结果不是通过返回值得到,而是直接修改输入参数的状态(例如上面的listobj对象):
myiter = iter(range(15))
consume(myiter, 7)
list(myiter)
而不是:
myiter = iter(range(15))
res = consume(myiter, 7)
list(res)
Chapter 9
像素数据格式
pixel_subset
是一个tuple,其中每个元素又是一个tuple,
包含两部分:坐标和颜色值(用RGB三元组表示)。
In [41]: pixel_subset
Out[41]:
(((0, 0), (92, 139, 195)),
((0, 1), (92, 139, 195)),
((0, 2), (92, 139, 195)),
((0, 3), (91, 138, 194)),
((0, 4), (91, 138, 194)),
((0, 5), (91, 138, 194)),
((0, 6), (91, 138, 194)),
((0, 7), (91, 138, 194)),
((0, 8), (92, 139, 195)),
((0, 9), (93, 140, 196)))
combinations()
vs combinations_with_replacement()
早晨翻译了一页FPP,为了对比 itertools
模块中 combinations()
和
combinations_with_replacement()
函数的区别
(首先参考了itertools doc),
在华硕笔记本上安装了 Python3 版本的 miniconda (choco install miniconda3
),
安装完后 conda
不会被加进 %PATH%
,而是在启动菜单里新建了一个快捷方式,
内容为 %windir%\system32\cmd.exe "/K" C:\ProgramData\Miniconda3\Scripts\activate.bat C:\ProgramData\Miniconda3
,
启动它进入的是 conda
的 root 环境,新建一个 fpp 环境,
用 conda
安装 python
和 ipython
就可以使用 IPython 了
(比安装完整的 Anaconda 更精简),下面是两个函数的对比:
In [1]: from itertools import combinations, combinations_with_replacement
In [3]: list(combinations(range(4), 2))
Out[3]: [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
In [4]: list(combinations(range(4), 3))
Out[4]: [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
In [5]: list(combinations_with_replacement(range(4), 2))
Out[5]:
[(0, 0), (0, 1), (0, 2), (0, 3), (1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3)]
make_color_map() 函数分析
colors = get_colors()
从"crayola.gpl"文件中加载颜色值,colors
是一个tuple,
元素形如:Color(rgb=(247, 83, 148), name='Violet Red')
best = (min((euclidean(rgb, c), rgb, c) for c in colors)
for rgb in product(bit3, bit3, bit3))
这里best
的含义是:对每一个给定的rgb
值,与colors
中的每种颜色(的RGB值)
逐一比较,求出其中距离最短的值。这里利用了min()
函数比较tuple时的取值特点:
min((x0, y0, z0), (x1, y1, z1), …, (xn, yn, zn))
返回的tuple是x值最小的那个元素,
而与y, z的大小无关,例如
»>> min((1, 20, 30), (1, 2, 3), (0.2, 1, 1))
(0.2, 1, 1)
然后利用for语句生成与每个RGB颜色最接近的Crayola颜色的值,例如下面的best
的一个
元素表达的是:对RGB颜色(0, 0, 0)
,与它最接近的Crayola颜色是名字为'Black',
RGB为(0, 0, 0)
的颜色:
»>> best.next()
(0.0, (0, 0, 0), Color(rgb=(0, 0, 0), name='Black'))
接下来以RGB颜色为key,以对应的Crayola颜色为value形成一个数据字典:
color_map = dict((b[1], b[2].rgb) for b in best)
,例如:
»>> color_map[(0, 64, 64)]
(26, 72, 118)
运行以上测试的方法为:python -m Chapter_9.ch09_ex1
.
permutations & combinations
>>> from itertools import permutations, combinations
>>> list(permutations(range(3)))
[(0, 1, 2), (0, 2, 1), (1, 0, 2), (1, 2, 0), (2, 0, 1), (2, 1, 0)]
>>> list(combinations(range(4), 2))
[(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
>>> list(enumerate((3, 7, 2, 1, -4, 5)))
[(0, 3), (1, 7), (2, 2), (3, 1), (4, -4), (5, 5)]
permutations()
返回序列的所有排列,combinations()
返回序列的所有组合,
enumerate()
返回一组二元tuple,其中第一个元素是序列值的下标,第二个元素是序列值。
'Generating all combinations' 一节代码分析
对应源代码为 Chapter_9/ch09_ex3.py.
s7, s43等数据源都是一维list,格式为
['', '2001', …, '2017', '', 'X', '1.2', …, '83', '', 'Y', '123', …, '93.6']
详见http://www.tylervigen.com/view_correlation?id=7
但实际上是一个3行(n + 1)列的表格,第一行是表头,第二行是变量X的值, 第三行是变量Y的值,每一列是一个具体的年份,例如:
| 2001 | 2002 | 2003 | 2017
---- | ---- | ---- | ---- | ---- X | 1.2 | 24.3 | 3.8 | 83 Y | 123 | 4.3 | 382 | 93.6
被column_data()
处理后变为如下形式:
>>> from pprint import pprint
>>> source = list(column_data(s7, s3890, s43))
>>> pprint(source)
[('year',
'Per capita consumption of cheese (US)Pounds (USDA)',
'Number of people who died by becoming tangled in their bedsheetsDeaths '
'(US) (CDC)',
'year',
'Per capita consumption of mozzarella cheese (US)Pounds (USDA)',
'Civil engineering doctorates awarded (US)Degrees awarded (National '
'Science Foundation)',
'year',
'US crude oil imports from VenezuelaMillions of barrels (Dept. of Energy)',
'Per capita consumption of high fructose corn syrup (US)Pounds (USDA)'),
('2000', '29.8', '327', '2000', '9.3', '480', '2000', '446', '62.6'),
('2001', '30.1', '456', '2001', '9.7', '501', '2001', '471', '62.5'),
('2002', '30.5', '509', '2002', '9.7', '540', '2002', '438', '62.8'),
('2003', '30.6', '497', '2003', '9.7', '552', '2003', '436', '60.9'),
('2004', '31.3', '596', '2004', '9.9', '547', '2004', '473', '59.8'),
('2005', '31.7', '573', '2005', '10.2', '622', '2005', '449', '59.1'),
('2006', '32.6', '661', '2006', '10.5', '655', '2006', '416', '58.2'),
('2007', '33.1', '741', '2007', '11', '701', '2007', '420', '56.1'),
('2008', '32.7', '809', '2008', '10.6', '712', '2008', '381', '53'),
('2009', '32.8', '717', '2009', '10.6', '708', '2009', '352', '50.1')]
也就是3个数据源纵向合并形成一个二维表。multi_corr(source)
函数计算source
中
任意两列的相关系数(其中3个重复的year
列彼此间的相关系数为1,被去掉),
计算结果格式为(元素中第3个值为相关系数):
[('X1', 'X2', 0.98), ('X3', 'X5', 0.96), … ]
.
完整代码:
>>> results = list(multi_corr(source))
>>> results[0]
('year', 'Per capita consumption of cheese (US)Pounds (USDA)', 0.9630127825658988)
>>> print( "{2: 4.2f}: {0} vs {1}".format(*results[0]) )
0.96: year vs Per capita consumption of cheese (US)Pounds (USDA)
product()
与 permutations()
的区别
>>> list(product(range(3), range(4)))
[(0, 0), (0, 1), (0, 2), (0, 3), (1, 0), (1, 1), (1, 2), (1, 3), (2, 0), (2, 1), (2, 2), (2, 3)]
>>> list(permutations(range(3), 2))
[(0, 1), (0, 2), (1, 0), (1, 2), (2, 0), (2, 1)]
可以看到product()
的参数是 n 个集合,permutations()
的参数是一个集合和一个长度值,
前者可以看作多个集合组成的多重for
循环,后者返回一个几个内部的所有排列。