Apply a scalar function to a dataframe

The following code demonstrates how to use a decorator to turn a scalar function (myadd) into a "dataframe-applied" function, that is, a function that can be passed to pandas' DataFrame.apply():

cat << EOF > demo.py
import pandas as pd

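def dffunc(func):
    # Decorator: lets a scalar function be called once per row by df.apply()
    def wrapper(*args, **kwargs):
        # Debug output: show what df.apply() passes in
        print(args)
        print(type(args))
        row = args[0]  # the current row, a pandas Series
        print(type(row))
        params = {}
        for key in kwargs:
            if kwargs[key] in row:
                # The value is a column label: use that column's value in this row
                params[key] = row[kwargs[key]]
            else:
                # Otherwise pass the value through unchanged (e.g. z=33)
                params[key] = kwargs[key]
        return func(**params)
    return wrapper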

@dffunc
def myadd(x, y, z):
    return 10 * x + y - z

df = pd.DataFrame({'a': range(6),
                   'b': ['foo', 'bar'] * 3,
                   'c': range(12, 18)})
df['d'] = df.apply(myadd, axis=1, x='a', y='c', z=33)
print(df)
EOF

python demo.py
(a      0
b    foo
c     12
Name: 0, dtype: object,)
<class 'tuple'>
<class 'pandas.core.series.Series'>
(a      1
b    bar
c     13
Name: 1, dtype: object,)
<class 'tuple'>
<class 'pandas.core.series.Series'>
...

   a    b   c   d
0  0  foo  12 -21
1  1  bar  13 -10
2  2  foo  14   1
3  3  bar  15  12
4  4  foo  16  23
5  5  bar  17  34

From the output above, we see that when a function (myadd) is called by df.apply() with axis=1, its only positional argument is the current row of the dataframe, a pandas Series. Inside wrapper, that row arrives in the tuple args, which collects all positional arguments. Printing args therefore shows (row,), where the first row is a: 0, b: foo, c: 12.
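The calling convention can also be checked in isolation, without the decorator. The snippet below is only a minimal sketch: record and calls are hypothetical names introduced for illustration, and the small dataframe is made up. It stores whatever df.apply() hands to the function so the arguments can be inspected afterwards.

```python
import pandas as pd

calls = []  # hypothetical log of (args, kwargs) seen by the function

def record(*args, **kwargs):
    # Store the arguments exactly as df.apply() passed them
    calls.append((args, kwargs))
    return 0

df = pd.DataFrame({'a': range(3), 'c': range(12, 15)})
df.apply(record, axis=1, x='a', z=33)

args, kwargs = calls[0]           # arguments of the first call (first row)
print(len(args))                  # number of positional arguments
print(type(args[0]).__name__)     # type of the row object
print(kwargs)                     # keyword arguments forwarded by apply()
```

The extra keyword arguments given to df.apply() are forwarded unchanged to the function, which is what lets the wrapper interpret them later.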

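All keyword arguments and their values arrive in kwargs as a dictionary. We can manipulate each one as we like (the for loop in dffunc replaces a value that names a column with that column's value in the current row, and passes any other value through unchanged), and then call the original function (myadd) with the new parameters (func(**params)).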
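As a possible variation on the same idea, the sketch below drops the debug prints and adds functools.wraps so the decorated function keeps its name. It rests on two assumptions that are not in the original code: the decorator is renamed rowwise (a hypothetical name), and only string values are looked up as column labels, so unhashable values such as lists are passed through instead of raising an error.

```python
import functools
import pandas as pd

def rowwise(func):
    """Hypothetical variant of dffunc: map column-name kwargs to row values."""
    @functools.wraps(func)
    def wrapper(row, **kwargs):
        params = {}
        for key, value in kwargs.items():
            # Assumption of this sketch: only strings are treated as column labels
            if isinstance(value, str) and value in row.index:
                params[key] = row[value]
            else:
                params[key] = value
        return func(**params)
    return wrapper

@rowwise
def myadd(x, y, z):
    return 10 * x + y - z

df = pd.DataFrame({'a': range(6),
                   'b': ['foo', 'bar'] * 3,
                   'c': range(12, 18)})
df['d'] = df.apply(myadd, axis=1, x='a', y='c', z=33)
print(df)
```

Keyword names that DataFrame.apply() itself defines, such as axis, raw or args, are consumed by apply() and never reach the wrapped function, so the scalar function should avoid parameters with those names.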
