You are a little confused here; you speak about accuracy, while showing the formula for the loss.
The equation you show is indeed the cross-entropy loss formula for binary classification (or simply logistic loss).
`y[i]` are the labels, which are indeed either 0 or 1. `p[i]` are the predictions, usually interpreted as probabilities, which are real numbers in [0, 1] (without any rounding).
Now, for each `i`, only one term in the sum will survive: the first term vanishes when `y[i] = 0`, and similarly the second term vanishes when `y[i] = 1`.
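For reference, the formula under discussion is (assuming the standard form of the binary cross-entropy, with y_i and p_i corresponding to `y[i]` and `p[i]` above):

```latex
\text{loss} = -\frac{1}{n}\sum_{i=1}^{n}\Big[\, y_i \log(p_i) + (1 - y_i)\log(1 - p_i) \,\Big]
```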
Let's see some examples:
Suppose that `y[0] = 1`, while we have predicted `p[0] = 0.99` (i.e. a rather good prediction). The second term of the sum vanishes (since `1 - y[0] = 0`), while the first one becomes `log(0.99) ≈ -0.01`; so, the contribution of this sample prediction (`i = 0`) to the overall loss is ~0.01 (due to the minus sign in front of the sum).
Suppose now that the true label of the next sample is again 1, i.e. `y[1] = 1`, but here we have made a rather poor prediction of `p[1] = 0.1`; again, the second term vanishes, and the contribution of this prediction to the overall loss is now `-log(0.1) ≈ 2.3`, which is indeed greater than that of our first, good prediction, as we should intuitively expect.
As a final example, let's suppose that `y[2] = 0`, and we have made a perfect prediction here of `p[2] = 0`; hence, the first term vanishes, and the second term becomes

`(1 - y[2]) * log(1 - p[2]) = 1 * log(1) = 0`

i.e. we have zero contributed loss, again as intuitively expected, since we made a perfect prediction here for `i = 2`.
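The three contributions above can be checked numerically; here is a minimal sketch in plain Python (the small `eps` clip, which keeps `log()` away from an exact 0, is an assumption mirroring what frameworks typically do internally):

```python
import math

def bce_contribution(y, p, eps=1e-12):
    """Per-sample binary cross-entropy: -(y*log(p) + (1-y)*log(1-p)).

    p is clipped away from exact 0 and 1 so that log() is always defined.
    """
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

print(bce_contribution(1, 0.99))  # ~0.01 (good prediction, small loss)
print(bce_contribution(1, 0.1))   # ~2.3  (poor prediction, large loss)
print(bce_contribution(0, 0.0))   # ~0.0  (perfect prediction, no loss)
```

The mean of such per-sample contributions over all `n` samples is the reported loss.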
The logistic loss formula simply computes all these errors of the individual predictions, sums them, and divides by their number `n`.
Nevertheless, this is the loss (i.e. `scores[0]` in your snippet), and not the accuracy.
Loss and accuracy are different things; roughly speaking, the accuracy is what we are actually interested in from a business perspective, while the loss is the objective function that the learning algorithms (optimizers) are trying to minimize from a mathematical perspective.

Even more roughly speaking, you can think of the loss as the "translation" of the business objective (accuracy) into the mathematical domain, a translation which is necessary in classification problems (in regression ones, the loss and the business objective are usually the same, or at least can be the same in principle, e.g. the RMSE).
Will Keras automatically round our predictions to 0 or 1?
Actually yes: to compute the accuracy, we implicitly set a threshold on the predicted probabilities (usually 0.5 in binary classification, but this may differ in the case of highly imbalanced data); so, in `model.evaluate`, Keras actually converts our predictions to 1 if `p[i] > 0.5` and to 0 otherwise. Then, the accuracy is computed by simply counting the cases where `y_true == y_pred` (correct predictions) and dividing by the total number of samples, to give a number in [0, 1].
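The thresholding described above can be sketched in plain Python as follows (0.5 is the assumed default threshold; the labels and predictions reuse the three examples discussed earlier):

```python
def binary_accuracy(y_true, p_pred, threshold=0.5):
    # threshold the predicted probabilities into hard 0/1 predictions
    y_pred = [1 if p > threshold else 0 for p in p_pred]
    # count correct predictions and divide by the number of samples
    correct = sum(yt == yp for yt, yp in zip(y_true, y_pred))
    return correct / len(y_true)

y_true = [1, 1, 0]         # labels from the three examples above
p_pred = [0.99, 0.1, 0.0]  # the corresponding predicted probabilities
print(binary_accuracy(y_true, p_pred))  # 2 of 3 correct -> 0.666...
```

Notice how the poor prediction `p[1] = 0.1` costs a full misclassification here, whereas in the loss it contributed a graded penalty of ~2.3.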
So, to summarize:
- There is no rounding for the computation of the loss
- There is an implicit thresholding operation for the computation of the accuracy