String Types in Python 3
In Python 3, there are 3 string types:
- str is used for Unicode text (including ASCII)
- bytes is used for binary data (including encoded text)
- bytearray is a mutable variant of bytes that can be changed in place (bytes and bytearray aren’t really text strings; they’re sequences of small, 8-bit integers)
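As a minimal sketch of how the three types relate (the variable names and the choice of UTF-8 are illustrative assumptions, not part of the original notes), encoding a str yields bytes, and a bytearray built from those bytes can be modified in place:

```python
text = 'héllo'                    # str: Unicode text
data = text.encode('utf-8')       # bytes: immutable sequence of 8-bit integers
buf = bytearray(data)             # bytearray: mutable copy of the same bytes

print(len(text), len(data))       # 5 6  ('é' takes two bytes in UTF-8)
print(data[0])                    # 104  (indexing bytes gives an integer)

buf[0] = ord('H')                 # in-place change is allowed on a bytearray
print(buf.decode('utf-8'))        # Héllo
```

The same assignment on data would raise a TypeError, because bytes, like str, is immutable.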
Files work in two modes:
- text, which represents content as str and handles Unicode encoding and decoding
- binary, which deals in raw bytes and does no data translation
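The difference between the two modes can be sketched as follows. This is only an illustration: the scratch file name demo.txt and the UTF-8 encoding are assumptions chosen for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.txt')   # hypothetical scratch file

    # Text mode: we write str, and Python encodes it with the given encoding.
    with open(path, 'w', encoding='utf-8', newline='\n') as f:
        f.write('naïve\n')

    # Text mode: reading decodes the stored bytes back into str.
    with open(path, 'r', encoding='utf-8') as f:
        as_text = f.read()

    # Binary mode: reading returns the raw bytes with no translation.
    with open(path, 'rb') as f:
        as_bytes = f.read()

print(type(as_text), repr(as_text))     # <class 'str'> 'naïve\n'
print(type(as_bytes), repr(as_bytes))   # <class 'bytes'> b'na\xc3\xafve\n'
```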
Python’s strings serve the same role as character arrays in languages such as C. Python has no distinct type for individual characters; a single character is simply a string of length 1. Technically, Python strings are categorized as immutable sequences.
>>> 'a', 'b', 'c'  # Adding commas between strings returns a tuple
('a', 'b', 'c')
>>> 'a' 'b' 'c'  # Implicit concatenation
'abc'
Backslashes introduce special character codings known as escape sequences, which are handy for embedding characters that are hard to type, such as newlines and tabs.
The built-in len function returns the actual number of characters in a string; each escape sequence counts as a single character:
>>> s = 'a\nb\tc'
>>> len(s)
5
Strings can contain non-printable characters:
>>> s = '\x00\x00\x00'  # '\x00' stands for the zero (null) character
>>> len(s)
3
>>> print(s)

>>> type(s)
<class 'str'>
Raw Strings Suppress Escapes
If the letter r (uppercase or lowercase) appears just before the opening quote of a string, it turns off the escape mechanism.
myfile = open(r'C:\new\text.dat', 'w')
Raw strings are also commonly used for regular expressions.
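A small sketch of both uses follows. The sample text 'room 101' and the pattern are assumptions made up for the example; re.search is the standard library call.

```python
import re

# Without the r prefix, '\n' becomes a newline and '\t' becomes a tab.
plain = 'C:\new\text.dat'
raw = r'C:\new\text.dat'
print(len(plain), len(raw))       # 13 15

# In a regular expression, r'\d+' passes the backslash through to the re module.
match = re.search(r'\d+', 'room 101')
print(match.group())              # 101
```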
Strings in Action
>>> location = 'Shanghai'
>>> for c in location: print(c, end='-')
...
S-h-a-n-g-h-a-i->>>
>>> 'S' in location  # check if specified character(s) in a string
True
>>> 'an' in location
True
Indexing and Slicing
Strings are ordered collections of characters, so we can fetch a character from a string by its index. Offsets start at 0.
Python also supports using a negative index value to fetch characters. Technically, a negative offset is added to the length of a string to derive a positive offset.
>>> name = 'Jalo'
>>> name[0:3], name[:3], name[:-1]
('Jal', 'Jal', 'Jal')
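To make the negative-offset rule concrete, here is a brief sketch (the variable names are illustrative) showing that index -1 and index len(name) - 1 refer to the same character:

```python
name = 'Jalo'

first = name[0]                   # offset 0 is the first character
last = name[-1]                   # negative offsets count from the end
same_last = name[len(name) - 1]   # -1 + len(name) == 3, the same position

print(first, last, same_last)     # J o o
```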
Extended slicing: The third limit and slice objects
Slice expressions also support an optional third index, used as a step:
>>> days = 'Sun Mon Tue Wed Thu Fri Sat'
>>> days[::4]
'SMTWTFS'
If you use a negative value for the step, the slice collects items from right to left:
>>> say = 'Hello'
>>> say[::-1]  # -1 indicates that the slice should go from right to left
'olleH'
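The slice objects mentioned in the heading can be sketched like this: the built-in slice type packages the same start, stop, and step limits into a reusable value. The names every_fourth and reverse are illustrative.

```python
days = 'Sun Mon Tue Wed Thu Fri Sat'

every_fourth = slice(None, None, 4)      # equivalent to [::4]
reverse = slice(None, None, -1)          # equivalent to [::-1]

print(days[every_fourth])                # SMTWTFS
print('Hello'[reverse])                  # olleH
print(days[every_fourth] == days[::4])   # True
```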
- int converts a string to a number
- str converts a number to its string representation
- repr also converts an object to a string, but returns it as a string of code that can be rerun to recreate the object
>>> a = 99
>>> b = '1'
>>> a + int(b)  # convert b to integer 1, then perform addition
100
>>> str(a) + b  # convert a to string '99', then perform string concatenation
'991'
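The difference between str and repr is easiest to see on a string that contains an escape. The following is a hedged sketch: the sample value is made up, and ast.literal_eval is used here as a safe way to evaluate the repr text back into an object.

```python
import ast

s = 'spam\n'
shown = repr(s)                   # text of the literal, quotes and escape included

print(str(s))                     # prints spam followed by a real newline
print(shown)                      # 'spam\n'

# For simple built-in values, evaluating the repr text recreates an equal object.
rebuilt = ast.literal_eval(shown)
print(rebuilt == s)               # True
print(repr(99), str(99))          # 99 99
```

For numbers the two functions usually give the same text, which is why the distinction only shows up on values such as strings.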
- ord returns the integer Unicode code point of a single-character string
- chr takes an integer code point and returns the corresponding character
>>> ord('惊')  # 24778 is an identifying number in the Unicode character set
24778
>>> chr(24778)
'惊'
>>> chr(24770)
'惂'
Strings are immutable sequences, which means you cannot change a string in place.
Technically, when you "change" a string, you are creating a new string object and rebinding the original variable name to it:
>>> name = 'Jalo'
>>> name = name + ' Wang'  # Creates a new str object and assigns it to the 'name' variable
>>> name
'Jalo Wang'
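A short sketch of what immutability means in practice follows. The extra variable original and the flag failed are illustrative additions that let us observe the old object after the name is rebound.

```python
name = 'Jalo'
original = name                   # second reference to the same object
failed = False

try:
    name[0] = 'H'                 # item assignment is not supported by str
except TypeError as exc:
    failed = True
    print('TypeError:', exc)

name = name + ' Wang'             # builds a new object and rebinds the name
print(name)                       # Jalo Wang
print(original)                   # Jalo
print(name is original)           # False
```

The original object is untouched; only the name now refers to a different string.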