A string such as “Kodeclik Online Academy” can be encoded in Python as a sequence of bytes. For a simple string like this one, it is easy to see how the characters map to the underlying bytes: the first byte represents “K”, the second byte represents “o”, and so on.
There are two ways to convert a Python string into a sequence of bytes. The first approach is to call the “encode” method on the string. Alternatively, we can pass the string to the bytes function. Both approaches let you specify the encoding to be used. We will see how each of these approaches works.
Converting Python strings to bytes using the encode() method
A simple way to convert a Python string to bytes is as follows:
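A minimal version of such a program might look like the sketch below. It assumes the string is “Kodeclik Online Academy” and stores it in a variable called name; the variable name bytelist is likewise an illustrative choice.

```python
# Assumed example string (the one introduced at the top of this article)
name = "Kodeclik Online Academy"

# encode() returns a bytes object; "ascii" selects the encoding to use
bytelist = name.encode("ascii")

print(bytelist)  # b'Kodeclik Online Academy'
```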
Here we have created a string (called “name”) and then used the method “encode” to convert it into a bytes object (the argument “ascii” indicates the encoding to be used). Finally, we print the resulting bytelist. The output is b'Kodeclik Online Academy'.
Hmm. That is not so insightful. The “b” in front of the string says that what follows is a bytes object rather than a regular string. Python displays each byte that corresponds to a printable ASCII character as that character, which is why the output looks just like the original text. If you would like to peek into the individual bytes, we can update the program to:
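One way to write this, again assuming the string “Kodeclik Online Academy” in a variable called name, is to loop over the bytes object. Iterating over a bytes object yields integers rather than one-character strings.

```python
name = "Kodeclik Online Academy"
bytelist = name.encode("ascii")

# Iterating over a bytes object yields one int (0 to 255) per byte
for b in bytelist:
    print(b)
```

Indexing works the same way: bytelist[0] is the integer 75, the ASCII code for “K”.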
Now the output shows the individual byte values: 75 for “K”, 111 for “o”, 100 for “d”, and so on.
You can confirm that upper case letters have lower values than lower case letters in the ASCII encoding: the upper case letters “A” to “Z” are assigned codes 65 to 90, while the lower case letters “a” to “z” are assigned codes 97 to 122. You can also confirm that repeated letters are encoded with the same byte value (e.g., every “e” is represented by 101).
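Both observations can be checked directly. The sketch below assumes the same example string and uses the built-in ord() to get a character's code, plus bytes.count(), which accepts an integer byte value.

```python
name = "Kodeclik Online Academy"
bytelist = name.encode("ascii")

# ASCII puts "A".."Z" at 65..90 and "a".."z" at 97..122
print(ord("K"), ord("k"))   # 75 107
print(ord("k") - ord("K"))  # 32: each lower case letter is 32 above its upper case form

# Every "e" in the string becomes the same byte value, 101
print(bytelist.count(101))  # 3
```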
Another common encoding is “utf-8”, which can be viewed as a superset of ascii: it encodes every ascii character using exactly the same single-byte value, and goes beyond ascii to represent every Unicode character, including accented letters, symbols and emoji. You can update the program:
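A small sketch of the change, assuming the same example string: only the encoding argument differs, and comparing the two results shows that they match for pure-ASCII text.

```python
name = "Kodeclik Online Academy"

ascii_bytes = name.encode("ascii")
utf8_bytes = name.encode("utf-8")

print(utf8_bytes)                 # b'Kodeclik Online Academy'
print(utf8_bytes == ascii_bytes)  # True: ASCII-only text encodes identically
```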
You can see that the output is exactly the same (for this string).
One of the key differences between ascii and utf-8 is that in ascii every character is represented using exactly one byte, whereas in utf-8 a character may take one, two, three or four bytes. As a result, in ascii the third character is always the third byte, but in utf-8 the third character does not necessarily sit at the third byte. Python handles this for you, so for your purposes you can simply use the encode() method to inspect the byte representation.
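To see a multi-byte character in action, the sketch below uses a hypothetical example word, “café”, whose last letter “é” (U+00E9) is not in ascii. It is written with the escape \u00e9 so the exact character is unambiguous.

```python
# Hypothetical example string with one non-ASCII character, "é" (U+00E9)
word = "caf\u00e9"

encoded = word.encode("utf-8")
print(len(word))            # 4 characters
print(len(encoded))         # 5 bytes: "é" takes two bytes in UTF-8
print(list(encoded))        # [99, 97, 102, 195, 169]
print(word[3], encoded[3])  # é 195: the 4th character is not the 4th byte

# ascii has no code for "é", so encoding with it raises an error
try:
    word.encode("ascii")
except UnicodeEncodeError as err:
    print("ascii cannot encode:", err.object[err.start:err.end])
```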
Converting Python strings to bytes using the bytes() function
A second way to convert Python strings to bytes is to use the built-in bytes() function. Unlike encode(), which is a method called on the string, bytes() takes the string and the encoding as arguments:
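A minimal sketch, again assuming the example string is stored in a variable called name:

```python
name = "Kodeclik Online Academy"

# bytes() takes the string and the encoding name as arguments
bytelist = bytes(name, "ascii")

print(bytelist)                          # b'Kodeclik Online Academy'
print(bytelist == name.encode("ascii"))  # True
```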
The output is the same as with the encode() method.
Once again, you can change ‘ascii’ to ‘utf-8’ and explore that form of encoding.
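The sketch below compares the two approaches under utf-8 on a hypothetical non-ASCII word, “café”. It also shows one practical difference: encode() falls back to “utf-8” when no encoding is given, whereas bytes() raises a TypeError if you pass a string without an encoding.

```python
word = "caf\u00e9"  # hypothetical example with a non-ASCII character

# With "utf-8", bytes() and encode() produce identical results
print(bytes(word, "utf-8") == word.encode("utf-8"))  # True

# encode() defaults to "utf-8"; bytes() requires the encoding for a str
print(word.encode() == word.encode("utf-8"))  # True
try:
    bytes(word)
except TypeError as err:
    print("TypeError:", err)
```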
In summary, converting strings to bytes is very convenient in Python using either the encode method or the bytes function.
Kodeclik is an online coding academy for kids and teens to learn real world programming. Kids are introduced to coding in a fun and exciting way and are challenged to higher levels with engaging, high quality content.