Python String Objects - Part I

String Types in Python 3

In Python 3, there are 3 string types:

  • str is used for Unicode text (including ASCII)
  • bytes is used for binary data (including encoded text)
  • bytearray is a mutable variant of bytes that can be changed in place (bytes and bytearray aren’t really text strings; they’re sequences of small, 8-bit integers)
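
A minimal sketch of how the three types relate. The sample text and the variable names are illustrative assumptions, not part of the original notes:

```python
text = 'café'                  # str: Unicode text, 4 characters
data = text.encode('utf-8')    # bytes: the encoded form, 5 bytes (é takes 2)
buf = bytearray(data)          # bytearray: a mutable copy of those bytes

buf[0] = ord('C')              # in-place change; not allowed on str or bytes

print(len(text), len(data))    # 4 5
print(data[0])                 # 99, indexing bytes yields a small integer
print(buf.decode('utf-8'))     # Café
```

Indexing a bytes or bytearray object returns an integer rather than a one-character string, which is why they are better thought of as sequences of integers.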

Files work in two modes:

  • text mode, which represents content as str and handles Unicode encoding and decoding
  • binary mode, which deals in raw bytes and does no data translation
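
The two modes can be compared by writing a file as text and reading it back both ways. This is only a sketch: the scratch file name demo.txt and the UTF-8 encoding are assumptions chosen for the example.

```python
import os
import tempfile

# Hypothetical scratch file in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

with open(path, 'w', encoding='utf-8') as f:   # text mode: str is encoded on write
    f.write('naïve')

with open(path, 'rb') as f:                    # binary mode: raw bytes, no translation
    raw = f.read()

with open(path, 'r', encoding='utf-8') as f:   # text mode: bytes are decoded on read
    text = f.read()

print(raw)     # b'na\xc3\xafve'
print(text)    # naïve
```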

Python’s strings serve the same role as character arrays in languages such as C. Python has no distinct type for individual characters; a character is simply a string of length one. Technically, Python strings are categorized as immutable sequences.

String Literals

>>> 'a', 'b', 'c'   # Adding commas between strings returns a tuple
('a', 'b', 'c')
>>> 'a' 'b' 'c'     # Implicit concatenation
'abc'  

Escape Sequences

Backslashes introduce special character codings known as escape sequences. They are handy for embedding characters that cannot easily be typed on a keyboard, such as newlines and tabs.
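
A few common escapes, as a rough sketch. The dictionary name samples and its keys are made up for this illustration:

```python
# Each value uses one escape sequence; repr() shows it, len() counts it as one character
samples = {
    'newline': 'a\nb',
    'tab': 'a\tb',
    'backslash': 'a\\b',
    'quote': 'It\'s',
    'hex': '\x41',        # character with hex value 41, i.e. 'A'
    'unicode': '\u60ca',  # character with code point U+60CA
}

for name, value in samples.items():
    print(name, repr(value), len(value))
```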

The built-in len function returns the actual number of characters in a string; each escape sequence counts as a single character:

>>> s = 'a\nb\tc'
>>> len(s)
5  

Strings can contain non-printable characters:

>>> s='\x00\x00\x00'    # '\x00' stands for zero (null) character
>>> len(s)
3  
>>> print(s)

>>> type(s)
<class 'str'>  

Raw Strings Suppress Escapes

If the letter r (uppercase or lowercase) appears just before the opening quote of a string, it turns off the escape mechanism.

myfile = open(r'C:\new\text.dat', 'w')

Raw strings are also commonly used for regular expressions.
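
To see the difference, here is a small sketch comparing the same path written with and without the r prefix, followed by a regular expression written as a raw string. The sample input 'a1b22c333' is an assumption for the example:

```python
import re

plain = 'C:\new\text.dat'     # \n and \t are interpreted as newline and tab
raw = r'C:\new\text.dat'      # backslashes are kept literally

print(len(plain), len(raw))   # 13 15
print(repr(raw))              # 'C:\\new\\text.dat'

# In a regex, the raw string passes \d through to the re module unchanged
numbers = re.findall(r'\d+', 'a1b22c333')
print(numbers)                # ['1', '22', '333']
```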

Strings in Action

Basic Operations

>>> location = 'Shanghai'
>>> for c in location: print(c,end='-')
...
S-h-a-n-g-h-a-i->>> 

>>> 'S' in location   # check if specified character(s) in a string
True  
>>> 'an' in location
True  

Indexing and Slicing

Strings are ordered collections of characters, so we can fetch a character by its index (its offset from the start, beginning at 0).

Python also supports negative indexes, which count back from the end. Technically, a negative offset is added to the length of the string to derive a positive offset, so s[-1] is the same as s[len(s) - 1].

>>> name = 'Jalo'
>>> name[0:3], name[:3], name[:-1]
('Jal', 'Jal', 'Jal')
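
The negative-offset rule can be checked directly. A short sketch using the same name string:

```python
name = 'Jalo'

first, last = name[0], name[-1]
print(first, last)                       # J o

# A negative offset is added to len(name) to get the positive one
print(name[-1] == name[len(name) - 1])   # True

# The same applies to slice limits: -3 maps to 1 and -1 maps to 3
middle = name[-3:-1]
print(middle, name[1:3])                 # al al
```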

Extended slicing: The third limit and slice objects

Slice expressions also support an optional third index, used as a step: STRING[start:end:step]. The step defaults to 1.

>>> days = 'Sun Mon Tue Wed Thu Fri Sat'
>>> days[::4]
'SMTWTFS'  

If the step is negative, items are collected from right to left:

>>> say = 'Hello'
>>> say[::-1]       # -1 indicates that the slice should go from right to left
'olleH'  
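
The heading mentions slice objects: the three limits can also be packaged with the built-in slice function and passed as an index. A minimal sketch, with variable names chosen for illustration:

```python
days = 'Sun Mon Tue Wed Thu Fri Sat'

every_fourth = slice(None, None, 4)   # same as [::4]
backwards = slice(None, None, -1)     # same as [::-1]

initials = days[every_fourth]
print(initials)                       # SMTWTFS
print('Hello'[backwards])             # olleH

# With a negative step the start and end limits are reversed too
part = 'Hello'[3:0:-1]
print(part)                           # lle
```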

Conversion Function

String Conversion

  • int converts a string to a number
  • str converts a number to its string representation
  • repr also converts an object to a string, but returns it as a string of code that can be rerun to recreate the object
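
The difference between str and repr is easiest to see on a string that contains an escape. This is a sketch only; using ast.literal_eval to rebuild the object is one way to illustrate the round trip, not something the notes above prescribe:

```python
import ast

s = 'a\nb'

print(str(s))      # prints a and b on two separate lines
print(repr(s))     # 'a\nb' with quotes and the escape shown as code

# The repr form is valid Python source, so it can be evaluated back to the object
rebuilt = ast.literal_eval(repr(s))
print(rebuilt == s)           # True

total = int('42') + 1         # int parses the digits into a number
print(total)                  # 43
print(int('ff', 16))          # 255, int accepts an optional base
```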

>>> a = 99
>>> b = '1'
>>> a + int(b)        #convert b to integer 1, then perform addition
100  
>>> str(a) + b        #convert a to string '99', then perform string concatenation
'991'  

Character Conversion

  • ord returns the Unicode code point (an integer) of a single-character string
  • chr takes an integer code point and returns the corresponding character

>>> ord('惊')        # The code point: an identifying number in the Unicode character set
24778
>>> chr(24778)
'惊'  
>>> chr(24770)
'惂'  
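
Because ord and chr are inverses, they can be combined to do simple arithmetic on characters. A small sketch with made-up sample strings:

```python
codes = [ord(ch) for ch in 'Hi']      # one code point per character
print(codes)                          # [72, 105]

restored = ''.join(chr(n) for n in codes)
print(restored)                       # Hi

next_letter = chr(ord('a') + 1)       # step to the following code point
print(next_letter)                    # b

print(hex(ord('惊')))                 # 0x60ca, the same number written in hex
```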

Changing Strings

Strings are immutable sequences, which means you cannot change a string in place.

Technically, when you "change" a string, you are creating a new string object and rebinding the original variable name to it:

>>> name = 'Jalo'
>>> name = name + ' Wang'    # Creates a new str object and assigns it to the 'name' variable
>>> name
'Jalo Wang'
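
A short sketch of what immutability means in practice: item assignment fails, while building a new string leaves the old object untouched. The names original and changed are assumptions added for the illustration:

```python
name = 'Jalo'

try:
    name[0] = 'H'                 # in-place change is not allowed
except TypeError as exc:
    print('TypeError:', exc)

original = name                   # second reference to the same object
name = name + ' Wang'             # builds a new object and rebinds 'name'

print(original)                   # Jalo, the old object is unchanged
print(name)                       # Jalo Wang
print(original is name)           # False

changed = 'H' + original[1:]      # the usual way to "edit": slice and concatenate
print(changed)                    # Halo
```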