python - One hot encoding from numpy

Question

Welcome To Ask or Share your Answers For Others

python - One hot encoding from numpy

asked Oct 7, 2021 in Technique[技术] by 深蓝 (71.8m points)

python - One hot encoding from numpy

I am trying to understand values output from an example python tutorial. The output doesent seem to be in any order that I can understand. The particular python lines are causing me trouble :

vocab_size = 13   #just to provide all variable values
m = 84 #just to provide all variable values
Y_one_hot = np.zeros((vocab_size, m))
Y_one_hot[Y.flatten(), np.arange(m)] = 1

The input Y.flatten() is evaluated as the following numpy-array :

  [ 8  9  7  4  9  7  8  4  8  7  8 12  4  8  9  8 12  7  8  9  7 12  7  2
  9  7  8  7  2  0  7  8 12  2  0  8  8 12  7  0  8  6 12  7  2  8  6  5
  7  2  0  6  5 10  2  0  8  5 10  1  0  8  6 10  1  3  8  6  5  1  3 11
  6  5 10  3 11  5 10  1 11 10  1  3]

np arrange is a tensor ranging from 0-83

np.arange(m)
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
 72 73 74 75 76 77 78 79 80 81 82 83]

Ok so the output that I am having trouble understanding from the new Y_one_hot is that I recieve a numpy array of size 13 (as expected) but I do not understand why the positions of the ones are located where they are located based on the Y.flatten() input for example here is the first array of the 13:

[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0
  0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0]

Could someone please explain how I got from that input value to that output array from that single line? It seems like the ones are in random positions and in some other arrays of the 13 the number of ones also seems to be random. Is this the intended behavior?

here is a full runnable example:

import numpy as np
import sys
import re



# turn Y into one hot encoding
Y =  np.array([ 8,  9,  7,  4 , 9,  7,  8,  4,  8,  7,  8, 12,  4,  8,  9,  8, 12,  7,  8,  9,  7, 12,  7,  2,
  9,  7,  8,  7,  2,  0,  7,  8, 12,  2,  0,  8,  8, 12,  7,  0,  8,  6, 12,  7,  2,  8,  6,  5,
  7,  2,  0,  6,  5, 10,  2,  0,  8,  5, 10,  1,  0,  8,  6, 10,  1,  3,  8,  6,  5,  1,  3, 11,
  6,  5, 10,  3, 11,  5, 10,  1, 11, 10,  1,  3])
m = 84
vocab_size = 13
Y_one_hot = np.zeros((vocab_size, m))
Y_one_hot[Y.flatten(), np.arange(m)] = 1
np.set_printoptions(threshold=sys.maxsize)
print(Y_one_hot.astype(int))

question from:https://stackoverflow.com/questions/65643248/one-hot-encoding-from-numpy

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Answer

深蓝 · Answer 1 · 2021-10-06T18:46:36+0000

The line Y_one_hot[Y.flatten(), np.arange(m)] = 1 is setting values of the array with lists of integer indices (Documented at Integer Array Indexing)

The arrays of indices are broadcast together, and the result for 1D arrays is essentially an efficient way to do this:

for i, j in zip(Y.flatten(), np.arange(m)):
    Y_one_hot[i, j] = 1

In words, each column of Y_one_hot corresponds to an entry of Y.flatten(), and has a single nonzero value in the row given by the entry.

It may be easier to see with a smaller array:

Y_onehot = np.zeros((2, 3), dtype=int)
Y = np.array([0, 1, 0])

Y_onehot[Y.flatten(), np.arange(3)] = 1

print(Y_onehot)
# [[1 0 1]
#  [0 1 0]]

Three entries map to three columns, and each column has a single nonzero entry in the row corresponding to the value.

Categories

python - One hot encoding from numpy

python - One hot encoding from numpy

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags