一行Python代码训练所有分类或回归模型-益强资讯全景

系统运维: 一行Python代码训练所有分类或回归模型
时间：2010-12-5 17:23:32 作者：系统运维来源：人工智能查看：评论：0
内容摘要：自动化机器学习(自动ML)是指自动化数据科学模型开发管道的组件。Automl减少数据科学家的工作量并加快工作流程。Automl可用于自动化各种流水线组件，包括数据理解，EDA，数据处理，模型训练，Qu
自动化机器学习(自动ML)是码训模型指自动化数据科学模型开发管道的组件。Automl减少数据科学家的练所类或工作量并加快工作流程。Automl可用于自动化各种流水线组件，有分包括数据理解，回归EDA，码训模型数据处理，练所类或模型训练，有分Quand参数调谐等。回归
对于端到端机器学习项目，码训模型每个管道组件的练所类或复杂性取决于项目。有各种自动启用源库，有分可加快每个管道组件。回归阅读本文知道8个自动列表库以自动化机器学习管道。码训模型
在本文中，练所类或我们将讨论如何使用开源Python库LazyPredict自动化模型训练过程。有分
什么是lazypredict?
LazyPredict是一个开源Python库，可自动化模型训练管道并加快工作流程。LazyPredict在分类数据集中约为30个分类模型，并列出了回归数据集的40个回归模型。
LazyPredict与训练有素的型号一起回到其性能指标，而无需编写太多代码。人们可以比较每个模型的源码库性能指标并调整最佳模型，以进一步提高性能。
安装：
leazepredict可以使用pypl库安装：
pip install lazypredict
安装后，可以导入库进行分类和回归模型的自动训练。
from lazypredict.Supervised import LazyRegressor, LazyClassifier
用法：
LazyPredict支持分类和回归问题，所以我会讨论两个任务的演示
波士顿住房(回归)和泰坦尼克号(分类)DataSet用于演示LazyPredict库。
分类任务：
LazyPredict的用法非常直观，类似于Scikit-learn。首先，为分类任务创建估计器LazyClassifier的实例。一个可以通过定制度量标准进行评估，默认情况下，每种型号将在准确性，ROC AUC分数，F1分数进行评估。
在继续进行LazyPredict模型训练之前，必须阅读数据集并处理它以使其适合训练。
import pandas as pd from sklearn.model_selection import train_test_split # Read the titanic dataset df_cls = pd.read_csv("titanic.csv") df_clsdf_cls = df_cls.drop([PassengerId,Name,Ticket, Cabin], axis=1) # Drop instances with null records df_clsdf_cls = df_cls.dropna() # feature processing df_cls[Sex] = df_cls[Sex].replace({ male:1, female:0}) df_cls[Embarked] = df_cls[Embarked].replace({ S:0, C:1, Q:2}) # Creating train test split y = df_cls[Survived] X = df_cls.drop(columns=[Survived], axis=1) # Call train test split on the data and capture the results X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, test_size=0.2)
在特征工程和将数据分成训练测试数据之后，我们可以使用LazyPredict进行模型训练。
# LazyClassifier Instance and fiting data cls= LazyClassifier(ignore_warnings=False, custom_metric=None) models, predictions = cls.fit(X_train, X_test, y_train, y_test)
回归任务：
类似于分类模型训练，LazyPredict附带了回归数据集的自动模型训练。实现类似于分类任务，在实例LazyRegressor中的更改。云南idc服务商
import pandas as pd from sklearn.model_selection import train_test_split # read the data column_names = [CRIM, ZN, INDUS, CHAS, NOX, RM, AGE, DIS, RAD, TAX, PTRATIO, B, LSTAT, MEDV] df_reg = pd.read_csv("housing.csv", header=None, delimiter=r"\s+", names=column_names) # Creating train test split y = df_reg[MEDV] X = df_reg.drop(columns=[MEDV], axis=1) # Call train_test_split on the data and capture the results X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, test_size=0.2) reg = LazyRegressor(ignore_warnings=False, custom_metric=None) models, predictions = reg.fit(X_train, X_test, y_train, y_test)
> (Image by Author), Performance metrics of 42 regression models for the Boston Housing dataset
观察上述性能指标，Adaboost分类器是分类任务的最佳性能模型，渐变增强的替换机策略模型是回归任务的最佳表现模型。
结论：
在本文中，我们已经讨论了LazyPredict库的实施，这些库可以在几行Python代码中训练大约70个分类和回归模型。它是一个非常方便的工具，因为它给出了模型执行的整体情况，并且可以比较每个模型的性能。
每个模型都训练，默认参数，因为它不执行HyperParameter调整。选择最佳执行模型后，开发人员可以调整模型以进一步提高性能。
谢谢你的阅读！
本文翻译自Christopher Tao的文章《Train all Classification or Regression models in one line of Python Code》。
香港云服务器
NVIDIA携手Booz AllenHamilton，以AI赋能的网络安全服务助力企业抵御风险
 融合极简——华为全新发布电力模块3.0