Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Best way to convert string to bytes in Python 3?

There appear to be two different ways to convert a string to bytes, as seen in the answers to "TypeError: 'str' does not support the buffer interface".

Which of these methods would be better or more Pythonic? Or is it just a matter of personal preference?

b = bytes(mystring, 'utf-8')

b = mystring.encode('utf-8')
asked by Mark Ransom (translated from Stack Overflow)

1 Answer

If you look at the docs for bytes, they point you to bytearray:

bytearray([source[, encoding[, errors]]])

Return a new array of bytes.

The bytearray type is a mutable sequence of integers in the range 0 <= x < 256. It has most of the usual methods of mutable sequences, described in Mutable Sequence Types, as well as most methods that the bytes type has, see Bytes and Byte Array Methods.

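As a rough illustration of the mutable/immutable distinction described above, the sketch below builds both types from the same text and tries to modify each. The variable names are made up for this example and are not from the original answer.

```python
# bytes is immutable; bytearray is its mutable counterpart.
immutable = bytes("abc", "utf-8")
mutable = bytearray("abc", "utf-8")

mutable[0] = ord("x")      # in-place item assignment works on bytearray
mutable.append(ord("d"))   # so do mutable-sequence methods such as append

try:
    immutable[0] = ord("x")  # bytes does not support item assignment
    bytes_is_mutable = True
except TypeError:
    bytes_is_mutable = False

# Both types share bytes-style methods such as upper().
shared = (immutable.upper(), mutable.upper())

print(mutable, bytes_is_mutable, shared)
```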

The optional source parameter can be used to initialize the array in a few different ways:

If it is a string, you must also give the encoding (and optionally, errors) parameters; bytearray() then converts the string to bytes using str.encode().

If it is an integer, the array will have that size and will be initialized with null bytes.

If it is an object conforming to the buffer interface, a read-only buffer of the object will be used to initialize the bytes array.

If it is an iterable, it must be an iterable of integers in the range 0 <= x < 256, which are used as the initial contents of the array.

Without an argument, an array of size 0 is created.

So bytes can do much more than just encode a string. It's Pythonic that it lets you call the constructor with any type of source parameter that makes sense.

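The source forms listed in the quoted documentation can be sketched with the bytes constructor, which accepts the same kinds of argument. This is a minimal illustration; the variable names are invented for the example.

```python
from_text = bytes("hi", "utf-8")        # string: an encoding is required
zeros = bytes(3)                        # integer: that many null bytes
from_buffer = bytes(bytearray(b"abc"))  # object supporting the buffer interface
from_ints = bytes([72, 105])            # iterable of integers in range(256)
empty = bytes()                         # no argument: zero-length result

try:
    bytes([256])                        # out-of-range integers are rejected
    out_of_range_ok = True
except ValueError:
    out_of_range_ok = False

print(from_text, zeros, from_buffer, from_ints, empty, out_of_range_ok)
```

Note that bytes(3) and bytes("3", "utf-8") produce very different results, which is one reason the constructor reads less clearly than an explicit encode call.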

For encoding a string, I think that some_string.encode(encoding) is more Pythonic than using the constructor, because it is the most self-documenting: "take this string and encode it with this encoding" is clearer than bytes(some_string, encoding), where there is no explicit verb.

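A small sketch comparing the two spellings on the same input. The sample text is arbitrary; the point is only that both calls return equal bytes, and that the constructor insists on an explicit encoding while encode() falls back to UTF-8.

```python
some_string = "na\u00efve caf\u00e9"  # "naïve café", written with escapes

via_method = some_string.encode("utf-8")
via_constructor = bytes(some_string, "utf-8")
same_result = via_method == via_constructor

# str.encode() defaults to UTF-8 in Python 3 when no encoding is given.
default_encoded = some_string.encode()

# The constructor has no such default for str input.
try:
    bytes(some_string)
    constructor_needs_encoding = False
except TypeError:
    constructor_needs_encoding = True

print(same_result, default_encoded, constructor_needs_encoding)
```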

Edit: I checked the Python source. If you pass a unicode string to bytes using CPython, it calls PyUnicode_AsEncodedString, which is the implementation of encode; so you're just skipping a level of indirection if you call encode yourself.

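If you want to see whether the extra level of indirection matters on your own machine, a timing sketch along these lines may help. The helper name, sample text, and iteration count are arbitrary choices for this example; results vary by interpreter version and hardware, and any difference is likely to be small.

```python
import timeit

text = "hello world" * 10  # arbitrary sample input


def time_call(stmt, number=100_000):
    """Return total seconds for running stmt `number` times."""
    return timeit.timeit(stmt, globals={"text": text}, number=number)


t_method = time_call("text.encode('utf-8')")
t_constructor = time_call("bytes(text, 'utf-8')")

print(f"str.encode:  {t_method:.4f} s")
print(f"bytes(...):  {t_constructor:.4f} s")
```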

Also, see Serdalis' comment: unicode_string.encode(encoding) is also more Pythonic because its inverse is byte_string.decode(encoding), and symmetry is nice.

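The encode/decode symmetry can be shown as a simple round trip. The sample string is arbitrary and the variable names follow the ones used in the answer.

```python
unicode_string = "h\u00e9llo"  # "héllo", written with an escape

byte_string = unicode_string.encode("utf-8")   # str -> bytes
round_tripped = byte_string.decode("utf-8")    # bytes -> str

# The constructor-style counterparts also exist, but read less symmetrically.
via_constructors = str(bytes(unicode_string, "utf-8"), "utf-8")

print(byte_string, round_tripped == unicode_string)
```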
