Python - Strings

String (computer science):

In computer science, a string is any _finite_ sequence of characters (i.e., letters, numerals, symbols and punctuation marks). An important characteristic of each string is its length, which is the number of characters in it. The length can be any natural number (zero or any positive integer).

Python has a built-in string class named "str" with many useful features. String literals can be enclosed in either double or single quotes; many style guides favor single quotes. Backslash escapes work the usual way within both single- and double-quoted literals -- e.g. \n \' \". A double-quoted string literal can contain single quotes without any fuss (e.g. "I can't do it"), and likewise a single-quoted string can contain double quotes.
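
Here is a small sketch of the quoting rules above. The variable names are only for illustration.

```python
# Double quotes let you use an apostrophe without escaping it.
a = "I can't do it"

# Single quotes let you use double quotes without escaping them.
b = 'She said "spam"'

# The same text as a, written with single quotes and a backslash escape.
c = 'I can\'t do it'

print(a == c)                  # True
print('line one\nline two')    # \n starts a new line in the output
```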

A string literal can span multiple lines, but you must put a backslash \ at the end of each line to escape the newline. String literals inside triple quotes, """ or ''', can span multiple lines of text without backslashes and are commonly used for multi-line code commenting.
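
A minimal sketch of both multi-line forms, using made-up variable names:

```python
# A backslash at the end of the line continues the literal on the next line.
# The newline itself does not become part of the string.
one_line = 'spam and \
eggs'

# Triple quotes keep the line breaks that appear inside the literal.
three_lines = """spam
and
eggs"""

print(one_line)                  # spam and eggs
print(three_lines.count('\n'))   # 2
```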

Python strings cannot be changed after they are created. Such an object is called immutable. Since strings cannot be changed, we construct new strings as we go to represent computed values. For example, the expression ('spam' + 'eggs') takes in the 2 strings 'spam' and 'eggs' and builds a new string 'spameggs'.
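
To see immutability in action, here is a short sketch (the names s and t are just placeholders):

```python
s = 'spam'
t = s + 'eggs'      # builds a brand new string; s is left untouched
print(s)            # spam
print(t)            # spameggs

try:
    s[0] = 'S'      # strings do not support item assignment
except TypeError as err:
    print('TypeError:', err)
```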

Characters in a string can be accessed using the standard [ ] syntax, and like Java, C# and C++, Python uses zero-based indexing. Therefore, if the string s is 'spameggs', then s[1] is 'p' and s[3] is 'm'.

If the index is out of bounds for the string, Python raises an IndexError. The Python style is to halt the program if it can't tell what to do, rather than just make up a default value. The slice syntax also works to retrieve any substring from a string. The len(string) function returns the length of a string. The [ ] syntax and the len() function actually work on any sequence type -- strings, lists, etc.
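
A quick sketch of indexing, len() and the out-of-bounds error described above. The list named items is my own example, not something from a library.

```python
s = 'spameggs'
print(s[1], s[3])             # p m
print(len(s))                 # 8

# The same [ ] and len() work on a list.
items = ['spam', 'eggs', 'ham']
print(items[0], len(items))   # spam 3

try:
    s[20]                     # there is no index 20 in an 8-character string
except IndexError as err:
    print('IndexError:', err)
```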

Python tries to make its operations work consistently across different types.

Notice in the code below that variables are not pre-declared -- just assign to them and go.

In Python, variable assignment is as simple as:
(from the shell)

>>> x = 'Craig'
>>> y = 6
>>> print(x * y)
CraigCraigCraigCraigCraigCraig

Python keeps track of the type of every value and does not silently convert between unrelated types. Multiplying a string by an integer repeats the string, but adding an integer to a string will throw an error:

>>> x = 6
>>> y = 'This string'
>>> print(x + y)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'int' and 'str'
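
One way around the TypeError above is to convert explicitly with the built-in str() or int(), depending on which result you want. A small sketch with made-up values:

```python
x = 6
y = ' eggs'

# Convert the int to a string to build text.
print(str(x) + y)       # 6 eggs

# Convert a numeric string to an int to do arithmetic.
print(int('6') + x)     # 12
```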

You can concatenate two strings with the + operator.

>>> x = 'Craig'
>>> y = 'Is Awesome'
>>> print(x + ' ' + y)
Craig Is Awesome

String Methods

Here are some of the most common string methods.

s.lower(), s.upper()
Returns the lowercase or uppercase version of the string

s.strip()
Returns a string with whitespace removed from the start and end

s.isalpha(), s.isdigit(), s.isspace(), ...
Tests if all the string chars are in the various character classes

s.startswith('other'), s.endswith('other')
Tests if the string starts or ends with the given other string

s.find('other')
Searches for the given other string within s, and returns the first index where it begins or -1 if not found

s.replace('old', 'new')
Returns a string where all occurrences of 'old' have been replaced by 'new'

s.split('delim')
Returns a list of substrings separated by the given delimiter. The delimiter is not a regular expression, it's just text. 'aaa,bbb,ccc'.split(',') -> ['aaa', 'bbb', 'ccc']. As a convenient special case s.split() (with no arguments) splits on all whitespace chars.

s.join(list)
The opposite of split(): joins the elements in the given list together using the string as the delimiter. e.g. '---'.join(['aaa', 'bbb', 'ccc']) -> 'aaa---bbb---ccc'
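
Here is a short sketch that exercises the methods listed above (lower, upper, strip, isdigit, isalpha, startswith, find, replace, split and join). The sample text is my own.

```python
s = '  Hello, World  '
stripped = s.strip()                    # 'Hello, World'

print(stripped.lower())                 # hello, world
print(stripped.upper())                 # HELLO, WORLD
print(stripped.startswith('Hello'))     # True
print(stripped.find('World'))           # 7
print(stripped.find('xyz'))             # -1
print(stripped.replace('l', 'L'))       # HeLLo, WorLd
print('42'.isdigit(), 'abc'.isalpha())  # True True

parts = 'aaa,bbb,ccc'.split(',')
print(parts)                            # ['aaa', 'bbb', 'ccc']
print('---'.join(parts))                # aaa---bbb---ccc
```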

Python does not have a separate character type. Instead an expression like s[8] returns a string of length 1, containing the character.

With that string of length 1, the operators ==, <=, ... all work as you would expect, so you mostly don't need to know that Python does not have a separate char type.
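
A tiny sketch showing that a single character is just a string of length 1 (the string here is my own example):

```python
s = 'Monty Python!'
ch = s[8]

print(ch, type(ch).__name__, len(ch))   # t str 1
print(ch == 't')                        # True
print('a' <= ch <= 'z')                 # True, so ch is a lowercase ASCII letter
```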

String Slices

The slice syntax is a handy way to refer to sub-parts of sequences, typically strings and lists. The slice s[start:end] is the elements beginning at start and extending up to but not including end.

Suppose we have the following string...

s = 'Monty Python!'

The variable s now refers to the string 'Monty Python!', whose 13 characters have the indexes 0 through 12.

s[1:4] is 'ont'
Chars starting at index 1 and extending up to but not including index 4

s[1:] is 'onty Python!'
Omitting either index defaults to the start or end of the string

s[:] is 'Monty Python!'
A copy of the whole thing (this is the pythonic way to copy a sequence like a string or list)

s[1:100] is 'onty Python!'
An index that is too big is truncated down to the string length

The standard zero-based index numbers give quick access to characters near the start of the string. Python also uses negative numbers to give simple access to the characters at the end of the string:

s[-1] is the last char '!', s[-2] is 'n' the next-to-last char, etc.

Negative index numbers count back from the end of the string:

s[-1] is '!'
Last char (1st from the end)

s[-4] is 'h'
4th from the end

s[:-3] is 'Monty Pyth'
Going up to but not including the last 3 chars

s[-3:] is 'on!'
Starting with the 3rd char from the end and extending to the end of the string.

It is true of slices that for any index n, s[:n] + s[n:] == s. This works even for n negative or out of bounds. Or put another way, s[:n] and s[n:] always partition the string into two string parts, conserving all the characters. As we'll see in the list section later, slices work with lists too.
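
The partition rule is easy to check for yourself. This sketch tries a handful of values of n that I picked arbitrarily, including negative and out-of-bounds ones:

```python
s = 'Monty Python!'

for n in (0, 4, -3, 100, -100):
    head, tail = s[:n], s[n:]
    # The two halves always glue back together into the original string.
    assert head + tail == s
    print(n, repr(head), repr(tail))
```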

String %

Python has a printf()-like facility to assemble a string from a template and a set of values. The % operator takes a printf-type format string on the left (%d int, %s string, %f/%g floating point), and the matching values in a tuple on the right (a tuple is made of values separated by commas, typically grouped inside parentheses):

The % operator

# add parentheses to make the long line work; the values are strings, so use %s:
text = ("%s chemistry is the combination of %s and %s in living organisms." % ('Organic', 'Chemistry', 'Biology'))
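
Each format code has to match the type of its value, otherwise Python raises a TypeError. Here is a sketch that mixes %d, %s and %f, with values I made up for the example:

```python
name = 'spam'
count = 3
price = 2.5

# %d takes an int, %s takes a string, %.2f formats a float with 2 decimals.
line = '%d cans of %s cost %.2f dollars' % (count, name, count * price)
print(line)      # 3 cans of spam cost 7.50 dollars

# With a single value the tuple is optional.
greeting = 'Hello, %s!' % name
print(greeting)  # Hello, spam!
```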

i18n Strings (Unicode)

In Python 2, regular strings are not unicode, they are just plain bytes. To create a unicode string, use the 'u' prefix on the string literal (in Python 3 every str is already unicode and the prefix is optional):

>>> ustring = u'A unicode \u018e string \xf1'
>>> ustring
u'A unicode \u018e string \xf1'

In Python 2, a unicode string is a different type of object from a regular "str" string, but the unicode string is compatible (they share the common superclass "basestring"), and the various libraries such as regular expressions work correctly if passed a unicode string instead of a regular string.

To convert a unicode string to bytes with an encoding such as 'utf-8', call the ustring.encode('utf-8') method on the unicode string. Going the other direction, the unicode(s, encoding) function converts encoded plain bytes to a unicode string.
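
Here is a sketch of the round trip. Note the assumption: it is written for Python 3, where every str is already unicode and the bytes.decode() method takes the place of unicode(s, encoding).

```python
ustring = 'A unicode \u018e string \xf1'

# str -> bytes: the two non-ASCII characters take 2 bytes each in UTF-8.
data = ustring.encode('utf-8')
print(type(data).__name__, len(ustring), len(data))   # bytes 20 22

# bytes -> str: decoding with the same encoding gives back the original.
roundtrip = data.decode('utf-8')
print(roundtrip == ustring)                           # True
```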

If Statement

Python does not use { } to enclose blocks of code for if statements, loops, functions, etc.
Instead, Python uses the colon (:) and indentation to group statements. The boolean test for an if does not need to be in parentheses. An if statement can have both elif and else clauses.

Any value can be used as an if-test. The zero values all count as false: None, 0, empty string, empty list, empty dictionary. There is also a Boolean type with two values: True and False (converted to an int, these are 1 and 0). Python has the usual comparison operations: ==, !=, <, <=, >, >=.

The boolean operators are the spelled-out words and, or, not.
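
To make the truthiness rules concrete, here is a small sketch. The sample values and the names total and resp are chosen just for illustration.

```python
# The "zero" values count as false, everything else here counts as true.
falsy = [None, 0, '', [], {}]
truthy = ['spam', [0], 42]
print([bool(v) for v in falsy])    # [False, False, False, False, False]
print([bool(v) for v in truthy])   # [True, True, True]

# True and False convert to the ints 1 and 0.
print(int(True), int(False))       # 1 0

# and, or, not are written as words.
total = 250
resp = ''
print(total >= 200 and not resp)   # True
```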

Here's what the code might look like for a grocery store promotion based on the total grocery bill. Notice how each block of then/else statements starts with a colon (:) and the statements are grouped by their indentation:

# total is the grocery bill and resp is the customer's response
if total >= 200:
    print('Thank You.  You are entered to win a prize')
    if resp == 'wow' or total >= 300:
        print('You have won a $10 gift card')
    elif resp == 'cool' or total >= 250:
        print('You win a $5 gift card.')
    else:
        print('You win a free sub?')
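
If you want to run the promotion logic yourself, one option is to wrap it in a function. This is only a sketch: the function name promotion is hypothetical, it collects the messages in a list instead of printing them, and it assumes the free sub message belongs in a final else branch.

```python
def promotion(total, resp):
    """Return the messages for a given bill and response (hypothetical helper)."""
    messages = []
    if total >= 200:
        messages.append('Thank You.  You are entered to win a prize')
        if resp == 'wow' or total >= 300:
            messages.append('You have won a $10 gift card')
        elif resp == 'cool' or total >= 250:
            messages.append('You win a $5 gift card.')
        else:
            # Assumption: this message is the fallback case.
            messages.append('You win a free sub?')
    return messages


for total, resp in [(150, 'wow'), (210, 'meh'), (260, 'meh'), (210, 'wow')]:
    print(total, resp, promotion(total, resp))
```

A bill under 200 returns an empty list, since none of the branches run.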

My most common syntax mistake is forgetting the colon after the if statement.

Craig Derington

Secular Humanist, Libertarian, FOSS Evangelist building Cloud Apps developed on Red Hat Enterprise Linux and Ubuntu Server. My toolset includes Python, Celery, Flask, Django, MySQL, MongoDB and Git.
