DarkMatter in Cyberspace
  • Home
  • Categories
  • Tags
  • Archives

Estimator, Transformer and Pipeline in Machine Learning


Main concepts in Pipelines (in Spark Docs > Programming Guides > Machine Learning Library (MLlib) Guide > ML Pipelines) gives a good introduction to these 3 concepts.

A dataframe is converted to a model (which is a transformser) via the fit() method of an estimator. A dataframe is converted to another dataframe via the transform() method of a transformer. A pipeline is an estimator, which is consisted by many transformers and estimators.

The document provides an example of above concepts. A text processing pipeline built with 2 transformers (Tokenizer and HashingTF) and an estimator (LogisticRegression). The input of the fit() method of the pipeline is raw text in dataframe. The output is a model (transformer), which is used to predict new text.

As the documents mentioned, the pipeline and related concepts are mostly inspired by scikit-learn. So these concepts in scikit-learn have the same meanings.



Published

Jan 23, 2018

Last Updated

Jan 23, 2018

Category

Tech

Tags

  • estimator 1
  • machine learning 3
  • pipeline 1
  • transformer 1

Contact

  • Powered by Pelican. Theme: Elegant by Talha Mansoor