The following code demonstrates how to decorate a scalar function (myadd) so that it becomes a "dataframe-applied" function (a function that can be used by DataFrame.apply() of pandas):
cat << EOF > demo.py
import pandas as pd


def dffunc(func):
    def wrapper(*args, **kwargs):
        # df.apply() passes the current row as the only positional argument
        print(args)
        print(type(args))
        row = args[0]
        print(type(row))
        params = {}
        for key in kwargs:
            # a value that matches a column name is replaced by the row value
            if kwargs[key] in row:
                params[key] = row[kwargs[key]]
            else:
                params[key] = kwargs[key]
        return func(**params)
    return wrapper


@dffunc
def myadd(x, y, z):
    return 10 * x + y - z


df = pd.DataFrame({'a': range(6),
                   'b': ['foo', 'bar'] * 3,
                   'c': range(12, 18)})
df['d'] = df.apply(myadd, axis=1, x='a', y='c', z=33)
print(df)
EOF
python demo.py
(a      0
b    foo
c     12
Name: 0, dtype: object,)
<class 'tuple'>
<class 'pandas.core.series.Series'>
(a      1
b    bar
c     13
Name: 1, dtype: object,)
<class 'tuple'>
<class 'pandas.core.series.Series'>
...
   a    b   c   d
0  0  foo  12 -21
1  1  bar  13 -10
2  2  foo  14   1
3  3  bar  15  12
4  4  foo  16  23
5  5  bar  17  34
Based on the output above, we see that when a function (myadd) is called by df.apply(), its only positional parameter is the current row of the dataframe, wrapped in the tuple args that holds all positional parameters. When args is printed in the wrapper function, we see (row,), where row is a pandas Series with the entries a: 0, b: foo, c: 12.
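The calling convention described above can be checked in isolation. The following is a minimal sketch, not part of demo.py: the function name spy and the list calls are hypothetical helpers introduced only for illustration. It records what df.apply() hands to the function on each call.

```python
import pandas as pd

calls = []  # hypothetical helper: collects (args, kwargs) of every call


def spy(*args, **kwargs):
    # Record exactly what df.apply() passes in, then return the
    # number of positional arguments received.
    calls.append((args, kwargs))
    return len(args)


df = pd.DataFrame({'a': range(3), 'c': range(12, 15)})
result = df.apply(spy, axis=1, x='a', z=33)

first_args, first_kwargs = calls[0]
print(type(first_args[0]))  # the row arrives as a pandas Series
print(first_kwargs)         # the extra keyword arguments are forwarded unchanged
```

With axis=1, each call should receive one Series (the row) as its single positional argument, while x='a' and z=33 arrive untouched in kwargs. That is why the wrapper in dffunc has to translate column names into row values itself.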
We can get all named parameters and their values from kwargs as a dictionary, manipulate each one as we like (the for loop in the function dffunc), and then run the original function (myadd) with the new parameters (func(**params)).
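As a possible refinement, the sketch below shows a variant of the decorator without the debug prints. The name dffunc_quiet is hypothetical, and two choices are assumptions of this sketch rather than part of the original code: functools.wraps is used to keep the metadata of the decorated function, and only string values are treated as candidate column names, so unhashable values such as lists can be passed through safely.

```python
import functools

import pandas as pd


def dffunc_quiet(func):
    """Hypothetical variant of dffunc: no prints, keeps func's metadata."""
    @functools.wraps(func)
    def wrapper(row, **kwargs):
        params = {}
        for key, value in kwargs.items():
            # Assumption of this sketch: only strings can name a column.
            if isinstance(value, str) and value in row.index:
                params[key] = row[value]
            else:
                params[key] = value
        return func(**params)
    return wrapper


@dffunc_quiet
def myadd(x, y, z):
    return 10 * x + y - z


df = pd.DataFrame({'a': range(6),
                   'b': ['foo', 'bar'] * 3,
                   'c': range(12, 18)})
df['d'] = df.apply(myadd, axis=1, x='a', y='c', z=33)
print(df)
```

As in the original dffunc, a string constant that happens to match a column name is replaced by the value from that column, so such constants cannot be passed literally.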