I'm trying to use StratifiedKFold to create train/test/validation splits for use in a non-sklearn machine learning workflow.
So the DataFrame needs to be split once, and the resulting subsets then need to stay fixed.
I'm trying to do it as follows, using .values because I'm passing pandas DataFrames:
skf = StratifiedKFold(n_splits=3, shuffle=False)
skf.get_n_splits(X, y)
for train_index, test_index, valid_index in skf.split(X.values, y.values):
    print("TRAIN:", train_index, "TEST:", test_index, "VALID:", valid_index)
    X_train, X_test, X_valid = X.values[train_index], X.values[test_index], X.values[valid_index]
    y_train, y_test, y_valid = y.values[train_index], y.values[test_index], y.values[valid_index]
This fails with:
ValueError: not enough values to unpack (expected 3, got 2).
I read through all of the sklearn docs and ran the example code, but I still don't understand how to use stratified k-fold splits outside of an sklearn cross-validation scenario.
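For context, each iteration of StratifiedKFold.split yields exactly two index arrays (train and test), which is why unpacking into three names raises the ValueError. One possible way to get three stratified subsets is to run a second stratified pass over the non-test rows. The sketch below is only an illustration under assumptions: the toy X and y, the fold counts (5 and 4), and the random_state values are all made up and are not from my actual data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedKFold

# Hypothetical toy data standing in for the X and y in the question.
rng = np.random.default_rng(0)
X = pd.DataFrame({"a": rng.normal(size=60), "b": rng.normal(size=60)})
y = pd.Series(np.repeat([0, 1, 2], 20), name="label")

# split() yields (train_indices, test_indices) pairs; take the first fold as the test set.
outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
rest_idx, test_idx = next(outer.split(X, y))

# Second stratified pass over the remaining rows to carve out a validation set.
# The inner split returns positions relative to rest_idx, so map them back.
inner = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)
tr_pos, va_pos = next(inner.split(X.iloc[rest_idx], y.iloc[rest_idx]))
train_idx, valid_idx = rest_idx[tr_pos], rest_idx[va_pos]

# Positional indexing keeps the results as DataFrames/Series.
X_train, X_valid, X_test = X.iloc[train_idx], X.iloc[valid_idx], X.iloc[test_idx]
y_train, y_valid, y_test = y.iloc[train_idx], y.iloc[valid_idx], y.iloc[test_idx]
```

Taking only the first fold from each splitter produces one fixed split rather than a cross-validation loop, and the three index arrays can be saved and reused so the subsets stay the same across runs.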
EDIT:
I also tried like this:
# Create train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, stratify=y)
# Create validation split from train split
X_train, X_valid, y_train, y_valid = train_test_split(X_train, y_train, test_size=0.05)
This seems to work, although I imagine the second split, which is not stratified, undermines the stratification.
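A possible variation on the two-step approach is to pass stratify to the second call as well, so the validation set is also drawn per class. This is a sketch under assumptions: the toy X and y and the random_state values are hypothetical. Note that the second test_size is relative to the remaining 90% of the rows, so 0.05 corresponds to roughly 4.5% of the original data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced toy data (150 rows of class 0, 50 rows of class 1).
X = pd.DataFrame({"a": np.arange(200), "b": np.arange(200) % 7})
y = pd.Series(np.repeat([0, 1], [150, 50]), name="label")

# Create the train/test split, stratified on the labels.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, stratify=y, random_state=0
)

# Create the validation split from the train split, stratified on the
# remaining labels so class proportions are preserved here too.
X_train, X_valid, y_train, y_valid = train_test_split(
    X_train, y_train, test_size=0.05, stratify=y_train, random_state=0
)
```

Fixing random_state makes the split reproducible, which matters when the subsets have to stay the same outside of sklearn.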
Asked by tw0000; translated from Stack Overflow.