Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - InvalidArgumentError: assertion failed: [Labels must be <= n_classes - 1] [Condition x <= y did not hold element-wise:] [x (head/losses/Cast:0) = ]


I've been trying to apply this ML linear model tutorial to my own dataset (https://www.tensorflow.org/tutorials/estimator/linear).
Language: Python 3.8.3
Libraries: TensorFlow 2.4.0
Numpy: 1.19.3
Pandas
Matplotlib
and the others:
import os
import sys

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import clear_output
from six.moves import urllib
import tensorflow.compat.v2.feature_column as fc
import tensorflow as tf

ss1517 is the name of my dataset. It is a CSV file with 4116 rows and 20 columns, and it has lots of NaN values (every column contains at least one NaN).

traindata = ss1517.iloc[0:2470, :]  # first 60% of the dataset is the training set
evaldata = ss1517.iloc[2470:4116, :]  # remaining 40% of the dataset is the eval set
ytrain = traindata.pop("AvgOfMajor N")
yeval = evaldata.pop("AvgOfMajor N")

CATEGORICAL_COLUMNS are the categorical columns in my dataset.
NUMERIC_COLUMNS are the numeric columns in my dataset.

CATEGORICAL_COLUMNS = ['Location-Name', 'Location-Code', 'Borough', 'Building-Name', 'Schools-in-Building', 'ENGroupA', 'RangeA']
NUMERIC_COLUMNS = ['Geographical-District-Code', 'Register', '#-Schools', 'Major-N', 'Oth-N', 'NoCrim-N', 'Prop-N', 'Vio-N', 'AvgOfOth-N', 'AvgOfNoCrim-N', 'AvgOfProp-N', 'AvgOfVio-N']

feature_columns = []
for feature_name in CATEGORICAL_COLUMNS:
  vocabulary = traindata[feature_name].unique()
  feature_columns.append(tf.feature_column.categorical_column_with_vocabulary_list(feature_name, vocabulary))
for feature_name in NUMERIC_COLUMNS:
  feature_columns.append(tf.feature_column.numeric_column(feature_name, dtype=tf.float32))

def make_input_fn(data_df, label_df, num_epochs=10, shuffle=True, batch_size=32):
  def input_function():# inner function, this will be returned.
    ds = tf.data.Dataset.from_tensor_slices((dict(data_df), label_df)) # Create tf.data.Dataset object with data and its label
    if shuffle:
      ds = ds.shuffle(1000) # randomize order of data
    ds = ds.batch(batch_size).repeat(num_epochs)
    return ds # return a batch of dataset
  return input_function # return the input_function

train_input_fn = make_input_fn(traindata, ytrain) 
eval_input_fn = make_input_fn(evaldata, yeval, num_epochs=1, shuffle=False) 
linear_est = tf.estimator.LinearClassifier(feature_columns=feature_columns)
linear_est.train(train_input_fn) #train
result = linear_est.evaluate(eval_input_fn) #get model metrics/stats by testing on testing data

clear_output() #clears console output
print(result["accuracy"]) #the result variable is simply dict of stats about our model

I get this error:
InvalidArgumentError: assertion failed: [Labels must be <= n_classes - 1] [Condition x <= y did not hold element-wise:] [x (head/losses/Cast:0) = ] [[0.28][0.28][1.69]...] [y (head/losses/check_label_range/Const:0) = ] [1]
when I run this cell:

linear_est.train(train_input_fn) #train
result = linear_est.evaluate(eval_input_fn) #get model metrics/stats by testing on testing data

clear_output() #clears console output
print(result["accuracy"]) #the result variable is simply dict

Note: I used fillna(method="bfill") and fillna(method="ffill") on my dataset (ss1517) to fill the NaN values.
How could I solve this error?

question from:https://stackoverflow.com/questions/65642746/invalidargumenterror-assertion-failed-labels-must-be-n-classes-1-condi


1 Answer


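TensorFlow expects class labels to be the integers 0 through n_classes - 1 (that is, range(0, n_classes)). LinearClassifier defaults to n_classes=2, so the upper bound in your error is 1 and a label value such as 1.69 fails the assertion.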
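
To see which labels would trip the assertion before training, a small check along these lines may help. It is only a sketch in plain Python that mirrors the range condition from the error message; the helper name labels_out_of_range is made up for illustration and is not part of TensorFlow, and the sample values are the ones printed in the error above.

```python
def labels_out_of_range(labels, n_classes=2):
    """Return the labels that fall outside [0, n_classes - 1].

    Hypothetical helper: it only mirrors the range check named in the
    error message and does not call TensorFlow.
    """
    upper = n_classes - 1
    return [label for label in labels if not 0 <= label <= upper]


# Sample label values taken from the error message in the question.
sample = [0.28, 0.28, 1.69]
bad = labels_out_of_range(sample, n_classes=2)
print(bad)  # [1.69]
```

An empty result means every label is within range. For more than two classes the labels additionally need to be integers, which this check does not test.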

If you want to keep the labels as they are, pass the label_vocabulary parameter in the classifier definition:

linear_est = tf.estimator.LinearClassifier(feature_columns=feature_columns, label_vocabulary={add_list_of_labels_here})

label_vocabulary:

A list of strings represents possible label values. If given, labels must be string type and have any value in label_vocabulary. If it is not given, that means labels are already encoded as integer or float within [0, 1] for n_classes=2 and encoded as integer values in {0, 1,..., n_classes-1} for n_classes>2 . Also there will be errors if vocabulary is not provided and labels are string.
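
As a rough illustration of what "already encoded" means, the sketch below maps raw labels to indices in range(n_classes) and also builds a string vocabulary of the kind label_vocabulary expects. It uses plain Python only; the helper names encode_labels and bin_labels and the bin edge 1.0 are assumptions made for this example, not anything from the question or from TensorFlow. Note that the labels in the error (0.28, 1.69) look like continuous values, so binning them into classes, or possibly switching to a regressor such as tf.estimator.LinearRegressor, may fit the data better than a vocabulary of raw values.

```python
from bisect import bisect_right


def encode_labels(labels):
    """Map each distinct label to an integer index in range(n_classes)."""
    vocabulary = sorted(set(labels))
    index_of = {label: i for i, label in enumerate(vocabulary)}
    return vocabulary, [index_of[label] for label in labels]


def bin_labels(values, edges):
    """Assign each continuous value to a bin index.

    `edges` must be sorted ascending; len(edges) + 1 classes result.
    """
    return [bisect_right(edges, value) for value in values]


# Sample label values taken from the error message in the question.
raw = [0.28, 0.28, 1.69]

vocabulary, indices = encode_labels(raw)
# label_vocabulary expects strings, so the labels would need converting too.
string_vocabulary = [str(value) for value in vocabulary]

# Hypothetical threshold: values below 1.0 become class 0, the rest class 1.
binned = bin_labels(raw, edges=[1.0])

print(string_vocabulary)  # ['0.28', '1.69']
print(indices)            # [0, 0, 1]
print(binned)             # [0, 0, 1]
```

With many distinct continuous values, the vocabulary approach produces one class per value, which is rarely what is wanted; binning keeps the number of classes under your control.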

