DarkMatter in Cyberspace

Apply a scalar function to a dataframe


The following code shows how to decorate a scalar function (myadd) so that it becomes a "dataframe-applied" function, i.e. one that can be passed to pandas' DataFrame.apply():

cat << EOF > demo.py
import pandas as pd

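def dffunc(func):
    # Decorator: adapts a scalar function so DataFrame.apply(axis=1) can call it.
    def wrapper(*args, **kwargs):
        # Debug output: show what apply() passes in positionally.
        print(args)
        print(type(args))
        row = args[0]  # the current row, as a pandas Series
        print(type(row))
        params = {}
        for key in kwargs:
            # A value that is a label in the row (a column name) is replaced
            # by that column's value; any other value is passed as a constant.
            if kwargs[key] in row:
                params[key] = row[kwargs[key]]
            else:
                params[key] = kwargs[key]
        return func(**params)
    return wrapper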

@dffunc
def myadd(x, y, z):
    return 10 * x + y - z

df = pd.DataFrame({'a': range(6),
                   'b': ['foo', 'bar'] * 3,
                   'c': range(12, 18)})
df['d'] = df.apply(myadd, axis=1, x='a', y='c', z=33)
print(df)
EOF

python demo.py
(a      0
b    foo
c     12
Name: 0, dtype: object,)
<class 'tuple'>
<class 'pandas.core.series.Series'>
(a      1
b    bar
c     13
Name: 1, dtype: object,)
<class 'tuple'>
<class 'pandas.core.series.Series'>
...

   a    b   c   d
0  0  foo  12 -21
1  1  bar  13 -10
2  2  foo  14   1
3  3  bar  15  12
4  4  foo  16  23
5  5  bar  17  34

From the output above, we see that when a function (myadd) is called by df.apply(), its only positional argument is the current row of the dataframe (a pandas Series, because axis=1). It arrives inside the tuple args, which collects all positional arguments. Printing args in the wrapper function therefore shows (row,), where the first row is a: 0, b: foo, c: 12.
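To check this calling convention in isolation, here is a minimal sketch. The names probe and calls are hypothetical and not part of the original demo; the function only records what df.apply() hands to it on each call.

```python
import pandas as pd

calls = []  # hypothetical log of (number of positional args, type of first arg, kwargs)

def probe(*args, **kwargs):
    # Record what DataFrame.apply passes in on each call.
    calls.append((len(args), type(args[0]).__name__, dict(kwargs)))
    return 0

df = pd.DataFrame({'a': range(3), 'b': ['foo', 'bar', 'baz']})
df.apply(probe, axis=1, x='a', z=33)

for call in calls:
    print(call)
```

Each recorded call should show one positional argument of type Series plus the keyword arguments x and z, forwarded unchanged. Note that keyword names DataFrame.apply() uses itself (such as axis, raw, result_type and args) are consumed by apply() and never reach the function.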

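All keyword arguments and their values are available in the dictionary kwargs. We can transform each one as we like: the for loop in dffunc replaces a value that names a column with that column's value in the current row and passes any other value through unchanged. We then call the original function (myadd) with the new arguments via func(**params).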
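Once the behaviour is understood, the debug prints can be dropped. The following is one possible cleaned-up variant, not the original author's code. It assumes column labels are strings, so only string values are looked up in the row, and it uses functools.wraps so the decorated function keeps its name and docstring.

```python
import functools

import pandas as pd

def dffunc(func):
    """Let a scalar function be used with DataFrame.apply(axis=1)."""
    @functools.wraps(func)
    def wrapper(row, **kwargs):
        params = {}
        for name, value in kwargs.items():
            # Assumption: column labels are strings. A string that names a
            # column is replaced by that column's value in the current row;
            # every other value is passed through as a constant.
            if isinstance(value, str) and value in row.index:
                params[name] = row[value]
            else:
                params[name] = value
        return func(**params)
    return wrapper

@dffunc
def myadd(x, y, z):
    return 10 * x + y - z

df = pd.DataFrame({'a': range(6),
                   'b': ['foo', 'bar'] * 3,
                   'c': range(12, 18)})
df['d'] = df.apply(myadd, axis=1, x='a', y='c', z=33)
print(df)
```

Restricting the lookup to strings is a design choice: it avoids a TypeError when a constant is unhashable (a list, for example) and prevents a numeric constant from being mistaken for a column label. The trade-off is that a string constant that happens to match a column name is still treated as a column reference.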



Published

Mar 18, 2020

Last Updated

Mar 18, 2020

Category

Tech

Tags

  • decorator
  • pandas
  • python
