Python is a well-established language, with the current version (version 3) released in 2008. Version 2 is currently the most widely used version of the language (and the version that is the subject of this tutorial), and it is installed by default on nearly all modern UNIX systems. Python is also available for OS X and Windows.
Python was first released in 1991. You can read about the history of Python at its Wikipedia page or by going to one of the many Python websites. The best books to learn about Python are Python Essential Reference (good for the absolute beginner - I used it to learn Python!) and Learning Python (also an excellent book which I've used as a reference and to help teach Python).
This is a short course that will provide you with a quick taste of Python. Please work through this course at your own pace. Python is best learned by using it, so please copy out and play with the examples provided, and also have a go at the exercises.
This course is a mirror of the Perl course, with exactly the same pages, examples and exercises. If you want to compare Perl with Python, then please click the Compare with Perl links.
You write Python using a simple text editor, like pico. Log on to a UNIX computer and use a text editor to open a file called script.py, e.g.
$ pico script.py
(note that the $ sign here indicates that the above is a command that you have to type in the shell - you don't need to type the $ sign itself - just type pico script.py)
Python scripts traditionally end in .py. This isn't a requirement, but it does make it easier to recognise the file.
Now type the following into the file;
print "Hello from Python!"
Save the file. You have just written a simple Python script! To run it, type
$ python script.py
This line uses the Python interpreter (called python) to read your python script and to follow the instructions that it finds. In this case you have told Python to print to the screen the line "Hello from Python!".
This was a simple script. Python is a language designed to help you write everything from small and simple scripts to large complete programs. In my opinion Python is one of the best prototyping languages, and the best language for writing programs that glue together or provide interfaces to other programs.
This script has introduced three of the basic building blocks of Python;
$ pico variables.py
Type into the script the following lines;
a = "Hello"
b = "from"
c = "Python!"
print "%s %s %s" % (a, b, c)
What do you think will be printed when you run this script? Run the script by typing;
$ python variables.py
Did you see what you expected? In this script we created three variables, a, b and c. The line a = "Hello" sets the variable a equal to the string Hello. b is set equal to the string from, while c is set equal to Python!.
The last line is interesting! The print command prints the string that follows it. In this case the string is equal to "%s %s %s". The %s symbols provide placeholders into which the values of variables can be placed. The variables in this case are a, b and c. These are supplied in the parentheses after the % sign that follows the string. The values of the variables are substituted in the order they appear in the parentheses. For example, create a new Python script called variables2.py,
$ pico variables2.py
and type the following;
a = 42
b = 3.14159265
c = "Spot the dog"
d = True
print "Print integers (whole numbers) like %d by typing percent d." % (a)
print "Print floating point numbers like %f by typing percent f." % (b)
print "Print strings like %s by typing percent s." % (c)
print "Print logical values using %d, %f or %s." % (d, d, d)
print "You can add as many in a line, e.g. %s, %f, %d" % (c, b, a)
print "You can control the width, e.g. %5d, or %0004d" % (a, a)
print "You can control the precision, e.g. %8.1f or %8.5f" % (b, b)
What do you think will be printed to the screen when you run this script?
Run this script (python variables2.py). Did you see what you expected?
Play with this script by changing the placeholders and see how that affects the output.
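As a rough sketch of what the width and precision placeholders produce, the same % formatting can be used to build strings that you can then inspect. The variable names below are illustrative only, and print is written with parentheses around a single string so that the sketch also runs on Python 3.

```python
a = 42
b = 3.14159265
d = True

# width: pad the integer on the left, with spaces or with zeros
padded = "%5d" % a            # "   42"
zero_padded = "%04d" % a      # "0042"

# precision: a total width of 8, with 1 or 5 digits after the decimal point
one_decimal = "%8.1f" % b     # "     3.1"
five_decimals = "%8.5f" % b   # " 3.14159"

# a logical value can be shown as a number or as text
as_number = "%d" % d          # "1"
as_text = "%s" % d            # "True"

print("[%s] [%s] [%s] [%s]" % (padded, zero_padded, one_decimal, five_decimals))
print("%s %s" % (as_number, as_text))
```

The square brackets in the first print are only there to make the padding spaces visible.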
A Python script is a file that contains instructions to the python interpreter, with one instruction per line, that are read one at a time from the top of the script to the bottom. You can, however, divert this flow using a loop. Open a new Python script loop.py and write this;
for i in range(1, 11):
    five_times_i = 5 * i
    print "5 times %d equals %d" % ( i, five_times_i )
What do you think will be printed to the screen? Run the script ($ python loop.py). Did you see what you expected?
This script has introduced a for loop. The loop has two parts;
Loops are very powerful. For example;
for i in range(0, 201, 2):
    print "%d" % i
prints all of the even numbers from 0 to 200.
for i in range(10, 0, -1):
    print "%d..." % i
print "We have lift off!"
prints out a count down.
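The numbers given to range in these loops are the start, the stop and (optionally) the step, and the stop value itself is never included. A small sketch that makes this visible by turning each range into a list (wrapping range in list() is only needed on Python 3, but is harmless on Python 2):

```python
# range(start, stop): counts from start up to, but not including, stop
one_to_ten = list(range(1, 11))

# range(start, stop, step): counts in steps of 2
evens = list(range(0, 11, 2))

# a negative step counts downwards; again the stop value (0) is not included
countdown = list(range(10, 0, -1))

print("%s" % one_to_ten)
print("%s" % evens)
print("%s" % countdown)
```

This is why range(1, 11) is needed to loop from 1 to 10, and range(0, 201, 2) to reach 200.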
for i in range(1, 4):
    for j in range(1,4):
        i_times_j = i * j
        print "%d" % i_times_j,
    print "\n",
prints out a 3*3 matrix where the element at (i,j) equals i times j.
Note in this last example that adding a comma on the end of the print line (print "%d" % i_times_j,) stops a return (newline) from being printed at the end of the line, and that you can explicitly print a newline by printing "\n" (e.g. print "\n",).
Note as well in the last example that you can nest loops (one loop can be inside another), but you must be careful with indentation to ensure that a line of code is in the loop you want. Try writing this;
for i in range(1, 4):
    for j in range(1,4):
        i_times_j = i * j
    print "%d" % i_times_j,
    print "\n",
The above code won't print the matrix correctly, as the line print "%d" % i_times_j, is indented only into the first loop (for i in range(1,4):), and is not part of the second loop (for j in range(1,4):) as is required for this example.
As you can see, indentation in Python is really important. Getting it wrong can dramatically change your script, and bugs caused by incorrect indentation can be very hard to find. While this is a weakness of Python, it is also a strength, as enforcing correct indentation helps make Python scripts easier to read and easier to maintain over long periods of time.
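One way to see the effect of indentation is to count how many times a line runs. The sketch below uses two illustrative counters (the names are my own) and moves the same counting line from the inner loop to the outer loop by changing nothing but its indentation.

```python
# The counting line is indented into the inner (j) loop,
# so it runs 3 x 3 = 9 times
count_inner = 0
for i in range(1, 4):
    for j in range(1, 4):
        i_times_j = i * j
        count_inner = count_inner + 1

# The counting line is indented one level less, so it belongs to the
# outer (i) loop and runs only 3 times, once after each inner loop ends
count_outer = 0
for i in range(1, 4):
    for j in range(1, 4):
        i_times_j = i * j
    count_outer = count_outer + 1

print("Inner loop line ran %d times" % count_inner)
print("Outer loop line ran %d times" % count_outer)
```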
Arguments are important for all programs. Arguments for programs have nothing to do with shouting, but are additional bits of information supplied to the program when it is run. Open a new Python script (pico arguments.py) and type this;
import sys
n_arguments = len(sys.argv)
for i in range(0, n_arguments):
    print "Argument %d equals %s" % ( i, sys.argv[i] )
Run this script by typing
$ python arguments.py here are some arguments
What do you see? Can you work out what happened?
In this case you passed four arguments to your script; here, are, some and arguments. The Python interpreter read those arguments and placed them, together with the name of the script, into a special variable called sys.argv that you can access from your script (the sys.argv variable comes from the module called sys. This is why we have to load (import) the sys module at the start of the script using the line import sys).
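To make the layout of sys.argv concrete, here is a small sketch that uses a stand-in variable (example_argv, a name invented for this illustration) holding what sys.argv would contain for the command above. It is written so that it also runs on Python 3.

```python
import sys

# A stand-in for sys.argv after running
#   python arguments.py here are some arguments
example_argv = ["arguments.py", "here", "are", "some", "arguments"]

n_arguments = len(example_argv)      # 5: the script name plus four arguments
script_name = example_argv[0]        # entry 0 is always the script itself
first_argument = example_argv[1]     # the arguments you typed start at entry 1

for i in range(0, n_arguments):
    print("Argument %d equals %s" % (i, example_argv[i]))

# The real values are in sys.argv, which is laid out in the same way
print("sys.argv currently holds %d entries" % len(sys.argv))
```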
Because there can be more than one argument, the sys.argv variable must be capable of holding more than one value. Arrays (which Python calls lists) are variables that can hold multiple values. An array is a collection of values that can be accessed by their index. Create a script array.py and write this;
my_array = [ ]
my_array.append( "cat" )
my_array.append( "dog" )
my_array.append( 261 )
print my_array
print my_array[0]
print my_array[1]
print my_array[2]
print "my_array contains %d items" % ( len(my_array) )
another_array = [ 1, 2, 3, "purple", 51.2 ]
print another_array[4]
two_dimensional_array = [ [1,2,3], [4,5,6], [7,8,9] ]
print two_dimensional_array[0][2]
for i in range(0, 3):
    for j in range(0,3):
        print "%d " % (two_dimensional_array[i][j]),
    print "\n",
Run this script ($ python array.py). Can you understand what has been printed and why?
The size of an array (the number of values it contains) can be found by typing size_of_array = len(array). You can access an individual value within the array using square brackets, e.g. array[0] is the first value in the array, array[1] is the second value etc. (Note that we start counting from zero - the first item is at array[0] not array[1]).
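Because counting starts from zero, the last item of an array sits at index len(array) - 1 rather than len(array). A minimal sketch, reusing the values from array.py (the variable names first and last are illustrative):

```python
my_array = ["cat", "dog", 261]

size_of_array = len(my_array)          # 3 items...
first = my_array[0]                    # ...at indices 0, 1 and 2
last = my_array[size_of_array - 1]     # so the last item is at index 2

print("my_array contains %d items" % size_of_array)
print("The first item is %s and the last item is %s" % (first, last))
```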
Exercise
Use the knowledge you've gained so far to write a Python script that can print out any times table. Call your script times_table.py, and have it read two arguments. The first argument should be the times table to print (e.g. the five times table) while the second should be the highest value of the times table to go up to. So
$ python times_table.py 5 12
should print the five times table from 1 times 5 to 12 times 5.
Note that the arguments are loaded into Python as strings. You will need to convert them to integers by using the lines like;
n = int( sys.argv[1] )
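A short sketch of why the conversion matters: multiplying a string repeats its text, while multiplying an integer does arithmetic. The variable text below stands in for sys.argv[1].

```python
text = "5"              # arguments always arrive as strings like this

repeated = text * 3     # multiplying a string repeats it: "555"

n = int(text)           # convert the string to an integer first...
product = n * 3         # ...and multiplication gives 15

print("As a string: %s" % repeated)
print("As an integer: %d" % product)
```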
Answer (don't peek at this unless you are stuck or until you have finished!)
As an extension, can you think of a way to use arrays to print out the times table using words rather than using numbers? To do this you will need to know that you can assign values to an array using the following syntax;
a = [ 1, 2, 3, 4, 5 ] b = [ "cat", "dog", "fish", "bird" ] c = [ "zero", "one", "two", "three" ]
Answer (don't peek at this unless you are stuck or until you have finished!)
import sys
t = int( sys.argv[1] )
n = int( sys.argv[2] )
print "This is the %d times table." % t
for i in range(1, n+1):
    t_times_i = t * i
    print "%d times %d equals %d" % (i, t, t_times_i)
import sys
t = int( sys.argv[1] )
n = int( sys.argv[2] )
numbers = [ "zero", "one", "two", "three", "four",
"five", "six", "seven", "eight", "nine",
"ten", "eleven", "twelve" ]
print "This is the %s times table." % numbers[t]
for i in range(1, n+1):
    t_times_i = t * i
    print "%s times %s equals %d" % ( numbers[i], numbers[t], t_times_i )
Loops provide a means to execute part of the script multiple times. Conditions provide the route to choose whether or not to execute part of a script. Open a new Python script (pico conditions.py) and type the following;
for i in range(1,11):
    if i < 5:
        print "%d is less than 5." % i
    elif i > 5:
        print "%d is greater than 5." % i
    else:
        print "%d is equal to 5." % i
This script loops i over all values from 1 to 10, and uses an if block to test each value of i. There are three sections to the if block;
If blocks can be used, for example, to correct input, e.g.
import sys
n = int( sys.argv[1] )
if n < 0:
    print >>sys.stderr,"We cannot process negative numbers!"
    sys.exit(-1)
(in this case we print to sys.stderr, which prints the string to the standard error stream, and then we use the sys.exit function to exit from the script with the return value -1)
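The print >>sys.stderr form only exists in Python 2. As a hedged alternative, sys.stderr.write works in both Python 2 and Python 3. In the sketch below n is a stand-in for int( sys.argv[1] ), and the exit code is only stored in a variable (exit_code, an illustrative name) rather than passed to sys.exit, so that the sketch runs to the end.

```python
import sys

n = -3   # a stand-in for int( sys.argv[1] )

if n < 0:
    # write does not add a newline for us, so we supply "\n" ourselves
    sys.stderr.write("We cannot process negative numbers!\n")
    exit_code = 1
else:
    exit_code = 0

# A real script would now call sys.exit(exit_code)
print("The script would exit with code %d" % exit_code)
```

Any non-zero exit code signals failure to the shell. On UNIX the code is reduced to the range 0 to 255, so a value of -1 is reported as 255.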
if blocks are very powerful. For example type and run the below script; (you may want to use copy-and-paste rather than typing it in by hand!)
import sys
n = int( sys.argv[1] )
if n < 0:
    print "%d is negative." % n
elif n > 100:
    print "%d is large and positive." % n
elif n == 10:
    for i in range(n, 0, -1):
        print "%d..." % i
    print "Blast off!"
elif n == 42:
    print "The answer to life, the universe and everything!"
else:
    print "What is %d?" % n
Can you work out what it does before you run it? Run it with some different arguments. Does it do what you expect?
Python is great at processing text and reading and writing files. Open a new Python script (pico files.py) and type the following lines.
import sys
filename = sys.argv[1]
FILE = open(filename, "r")
lines = FILE.readlines()
i = 0
for line in lines:
    i = i + 1
    print "%4d: %s" % ( i, line ),
Run this script by passing as an argument the path to any file, e.g.
$ python files.py ./files.py
What you should see is that Python has printed out every line of the file, with each line preceded by its line number. Let's go through each line of the script to see how Python has achieved this feat.
First we got the filename as the first argument to the script via the line filename = sys.argv[1]
The next step was to open the file. You open files using the open command. The part open(filename, "r") says to open the file whose path is the value of the variable filename and open the file for reading ("r"). This returns a filehandle which is assigned to the variable FILE. If the file does not exist, or is not readable then the script will exit with an error (have a try and see what the error looks like!)
In the next line, lines = FILE.readlines() we are asking Python to read all of the lines in the file from the filehandle FILE.
In the next line i = 0 we are just initialising the counter variable i so that it is equal to zero.
The next line (for line in lines:) is interesting. It is a for loop, but now it loops over each line contained in the array lines.
In the body of the loop, i = i + 1 just increments the count of how many lines have been read.
Then the line print "%4d: %s" % ( i, line ), prints the value of the counter and the value of the line. Note that we have had to finish this line with a comma to stop Python printing out an extra newline ("\n"), as there is already one newline character being printed from line.
An alternative way of achieving the same effect is to use a more traditional for loop to loop over the lines, e.g.
import sys
filename = sys.argv[1]
lines = open( filename, "r" ).readlines()
for i in range( 0, len(lines) ):
    print "%4d: %s" % ( i+1, lines[i] ),
(note that here we've called "readlines" directly on the returned filehandle from "open", and that we can get the size of the array of lines using len(lines))
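The trailing comma is needed because every string returned by readlines still ends with its newline character. The sketch below makes this visible. It writes its own small temporary file (using the standard tempfile module) so that it is self-contained, and it strips the newline with rstrip instead of relying on the Python 2 trailing comma, so it also runs on Python 3.

```python
import os
import tempfile

# Create a small example file so that the sketch does not need an argument
handle, filename = tempfile.mkstemp()
os.close(handle)
FILE = open(filename, "w")
FILE.write("first line\nsecond line\nthird line\n")
FILE.close()

# Read the lines back, as in files.py
FILE = open(filename, "r")
lines = FILE.readlines()
FILE.close()

numbered = []
i = 0
for line in lines:
    i = i + 1
    # each line still ends in "\n"; rstrip removes it (and any trailing spaces)
    numbered.append("%4d: %s" % (i, line.rstrip()))

for text in numbered:
    print(text)

# tidy up the temporary file
os.remove(filename)
```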
Exercise
head and tail are two useful UNIX programs that can be used to print out the first few, or last few lines of a file (this is useful if you are monitoring log files). Can you write a Python script that does the same thing?
For example
$ python head.py 5 filename
should print out the first five lines of a file, and
$ python tail.py 10 filename
should print out the last ten lines of a file.
Answer head.py and tail.py. (don't peek unless you are stuck or until you have finished!)
Can you go one better and write a body command, that prints the middle of a file? For example
$ python body.py 20 25 filename
prints lines 20 to 25 of a file. Can you write the code so that
$ python body.py 25 20 filename
would print lines 25 to 20 (so reversing the file)? (Hint - you can access the line at index i in the array of lines using line = lines[i])
Answer. (again, don't peek until you have finished!)
import sys
n = int( sys.argv[1] )
filename = sys.argv[2]
lines = open( filename, "r" ).readlines()
nlines = len( lines )
if n > nlines:
    n = nlines
for i in range(0, n):
    print lines[i],
import sys
n = int( sys.argv[1] )
lines = open( sys.argv[2], "r" ).readlines()
nlines = len(lines)
if n > nlines:
    n = nlines
for i in range(nlines - n, nlines):
    print lines[i],
import sys
start = int( sys.argv[1] )
end = int( sys.argv[2] )
lines = open( sys.argv[3], "r" ).readlines()
nlines = len(lines)
if start < 0:
    start = 0
if end >= nlines:
    end = nlines-1
for i in range(start, end+1):
    print lines[i],
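The body script above only counts upwards. One possible way to also handle the reversed case from the exercise (start greater than end) is to choose the step of the range from the order of the two numbers. This is a sketch under some assumptions: the array lines is a stand-in for the contents of a file, start and end are treated as array indices (counting from zero) as in the script above, and the selected lines are collected in an array rather than printed with the Python 2 trailing comma.

```python
# A stand-in for lines = open( sys.argv[3], "r" ).readlines()
lines = ["line 0\n", "line 1\n", "line 2\n", "line 3\n", "line 4\n"]

start = 3   # stand-ins for int( sys.argv[1] ) and int( sys.argv[2] )
end = 1

nlines = len(lines)
selected = []

if nlines > 0:
    # keep both indices inside the array
    if start < 0:
        start = 0
    if start >= nlines:
        start = nlines - 1
    if end < 0:
        end = 0
    if end >= nlines:
        end = nlines - 1

    # count upwards if start comes first, otherwise count downwards
    if start <= end:
        step = 1
    else:
        step = -1

    # range stops before its stop value, so go one step beyond end
    for i in range(start, end + step, step):
        selected.append(lines[i])

for line in selected:
    print(line.rstrip())
```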
Python is equally good at writing to files as it is at reading them. Open a new Python script (pico write_times_table.py) and type;
import sys
filename = sys.argv[1]
n = int( sys.argv[2] )
FILE = open( filename, "w" )
for i in range(1, 11):
    print >>FILE,"%d times %d equals %d" % ( i, n, i*n )
FILE.close()
Run this script by typing;
$ python write_times_table.py five.txt 5
This should result in the five times table being written to the file five.txt in the current directory.
The line FILE = open( filename, "w" ) opens the file whose path is the value of the variable filename and connects it to the filehandle FILE. This time, however, the file is opened using mode "w", so the file is opened for writing, not reading. If the file does not exist, then the file is created, and if it does exist, then the file is overwritten (so be careful not to overwrite any of your important files!).
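There are three different modes for opening files; "r" opens the file for reading, "w" opens the file for writing (overwriting any existing file), and "a" opens the file for appending, so that anything you write is added to the end of the existing file.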
To write to the file, supply the filehandle to the print command, as in the line print >>FILE, "%d times %d equals %d" % ( i, n, i*n ) from the script. The filehandle is placed between the print command and the string to be printed, together with ">>" and a comma.
Finally, when you have finished writing to a file you should close it using the close command. This ensures that what you have written is properly copied to disc (as it may up to this point be buffered in memory).
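The difference between overwriting and adding to a file can be seen in a small sketch. It uses Python's "a" (append) mode alongside "w", writes to a temporary file created with the standard tempfile module, and uses FILE.write rather than print >>FILE, because the latter only exists in Python 2. Note that write does not add a newline for you.

```python
import os
import tempfile

# A temporary file, so that none of your own files are touched
handle, filename = tempfile.mkstemp()
os.close(handle)

# "w" creates the file (or empties it if it already exists)
FILE = open(filename, "w")
FILE.write("first\n")
FILE.close()

# opening with "w" again throws away what was written before
FILE = open(filename, "w")
FILE.write("second\n")
FILE.close()

# "a" keeps the existing contents and adds to the end
FILE = open(filename, "a")
FILE.write("third\n")
FILE.close()

FILE = open(filename, "r")
contents = FILE.readlines()
FILE.close()

for line in contents:
    print(line.rstrip())

os.remove(filename)
```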
Filehandles allow you to refer to more than one file at a time. For example, we could modify the script that numbered each line of the file so that it wrote the numbered lines to another file. For example;
import sys
filename = sys.argv[1]
numbered_filename = "%s_numbered" % filename
RFILE = open( filename, "r" )
WFILE = open( numbered_filename, "w" )
lines = RFILE.readlines()
for i in range( 0, len(lines) ):
    # number the lines from 1, as in files.py
    print >>WFILE,"%4d: %s" % ( i+1, lines[i] ),
RFILE.close()
WFILE.close()
(note that numbered_filename = "%s_numbered" % filename uses the same syntax as print, except now the output is returned to a new string variable, rather than printed to the screen. So if filename contained the string file.txt, then numbered_filename would be set equal to file.txt_numbered)
Most files are arranged into words. It is very easy to split a line of text using Python into an array of words. Create a new Python script (pico words.py) and type the following;
import sys
lines = open( sys.argv[1], "r" ).readlines()
total_nwords = 0
for line in lines:
    words = line.split()
    nwords = len( words )
    total_nwords += nwords
print "The total number of words in the file %s equals %d" % \
      (sys.argv[1], total_nwords)
(note that total_nwords += nwords uses the += (increment) operator, that increments total_nwords by nwords. Also note that the backslash "\" allows us to split a single line of Python code across multiple lines of the script)
The new command in this script is split. This command is a function of a string, and splits the string into an array of strings. line.split() splits the string contained in the variable line, splitting the string wherever it sees whitespace (one or more spaces, tabs or newlines). You can split by whatever you wish, so line.split(":") would split the line using colons, while line.split("the") would split the line using the word the.
Because multiple values are returned by split, they are returned as an array. The number of words is given by the size of the array (len( words )), and the words can be accessed using square brackets (e.g. words[0] is the first word of the line).
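To illustrate how the choice of separator changes the result, here is a small sketch applied to one example string (the string and variable names are made up for this illustration). Note how split() with no argument copes with the double space and the newline, while split(" ") does not.

```python
line = "the cat  sat on:the mat\n"

# no argument: split on any run of whitespace, ignoring the final newline
by_whitespace = line.split()
# -> ['the', 'cat', 'sat', 'on:the', 'mat']

# a single space: the double space gives an empty string, and "\n" is kept
by_space = line.split(" ")
# -> ['the', 'cat', '', 'sat', 'on:the', 'mat\n']

# a colon: everything before and after the colon
by_colon = line.split(":")
# -> ['the cat  sat on', 'the mat\n']

# a word: the line starts with "the", so the first piece is empty
by_word = line.split("the")
# -> ['', ' cat  sat on:', ' mat\n']

print("%d words when split on whitespace" % len(by_whitespace))
print("The first word is %s" % by_whitespace[0])
```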
Sometimes you instead want to get an array containing all of the letters in the string. Fortunately Python strings already present themselves as an array of letters. For example, take a look at this script that counts the number of lines, words and letters in a file;
import sys
lines = open( sys.argv[1], "r" ).readlines()
total_nlines = len( lines )
total_nwords = 0
total_nletters = 0
for line in lines:
    total_nwords += len( line.split() )
    total_nletters += len( line )
print "%s contains %d lines, %d words and %d letters." % \
      ( sys.argv[1], total_nlines, total_nwords, total_nletters )
(note that the "\" backslash allows us to break one line of code across multiple lines in the script)
Exercises
Write a Python script that prints out the first word of the first five lines of an arbitrary file (Here is the answer).
Here is a comma-separated table of values;
Make,Insurance Class,Premium ($),Age (years)
Ferrari,10,2432.50,3
BMW,8,1231.10,1
VW,6,862.20,4
Fiat,4,591.10,2
Bugatti,15,4312.00,1
Copy this into a text file using pico
Write a Python script that turns this from a comma separated file with headings Make,Insurance Class, Premium ($),Age (years) into a space separated file with headings Make Premium($) Insurance_Class (answer).
(Hint. You may want to strip the newline characters from the end of each line of the file. You can do this by using the rstrip command, e.g. line.rstrip() which removes any extra spaces or newline characters from the end of line)
Write a Python script that will print out the mean average premium, the make of the oldest car in the list, and the makes of the car in the highest and lowest insurance groups (answer).
import sys
lines = open( sys.argv[1], "r" ).readlines()
nlines = 5
if nlines > len(lines):
    nlines = len(lines)
for i in range(0, nlines):
    words = lines[i].split()
    if len(words) > 0:
        print words[0]
import sys
lines = open( sys.argv[1], "r" ).readlines()
# print our own header
print "Make Premium($) Insurance_class"
# skip the header line (start at 1 rather than 0)
for i in range( 1, len(lines) ):
    line = lines[i].rstrip()
    words = line.split(",")
    if len(words) >= 4:
        # the columns are make, insurance class, premium and age
        print "%s %s %s" % (words[0], words[2], words[1])
import sys
lines = open( sys.argv[1], "r" ).readlines()
total_premium = 0
nmakes = 0
oldest_age = 0
highest_class = 0
lowest_class = 1000
for i in range( 1, len(lines) ):
    line = lines[i].rstrip()
    words = line.split(",")
    if len(words) >= 4:
        make = words[0]
        car_class = int( words[1] )
        premium = float( words[2] )
        # convert the age to a number so that it compares correctly
        age = int( words[3] )
        nmakes += 1
        total_premium += premium
        if age > oldest_age:
            oldest_make = make
            oldest_age = age
        if car_class > highest_class:
            highest_make = make
            highest_class = car_class
        if car_class < lowest_class:
            lowest_make = make
            lowest_class = car_class
avg_premium = total_premium / nmakes
print "The average premium is $%f." % avg_premium
print "The oldest make is %s." % oldest_make
print "The make in the lowest class is %s." % lowest_make
print "The make in the highest class is %s." % highest_make
Python is an excellent language to use when searching within files. Searching is very useful; for example, you could imagine using Python to search an output file to find the results of a calculation. Searching in Python is straightforward. Open a new Python script (pico search.py) and type the following;
import sys
import re
lines = open( sys.argv[1], "r" ).readlines()
for line in lines:
    if re.search( r"the", line ):
        print line,
This script will search a file and print out all of the lines that contain the text the (including lines where it appears inside a longer word, such as other). Try it out!
The key line of this script is if re.search( r"the", line ):. This is a condition that uses Python's regular expression pattern matching module, re, together with a match string, r"the". The match string is just like a normal string. The r in front of the string marks it as a raw string, which tells Python not to interpret backslashes as escape characters. While there are no backslashes in this search string, I prefer to add the r by default to search strings as this prevents confusion when using more complicated searches.
For example;
import re
#does the line contain a lowercase a?
re.search( r"a", line )
#does the line contain an uppercase A?
re.search( r"A", line )
#does the line contain the word "cat"
re.search( r"cat", line )
To make the search case-insensitive, you must add re.IGNORECASE after the string to search, e.g.
import re
#search for an upper case or lower case a
re.search( r"a", line, re.IGNORECASE )
#search for "cat", "CAT", "CaT", "caT" etc.
re.search( r"cat", line, re.IGNORECASE )
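re.search returns a match object when the text is found and None when it is not, which is why it can be used directly in an if condition. The sketch below turns each result into True or False so that it can be inspected. It also shows one case where the r prefix matters: \b is the regular expression code for a word boundary, but in a normal (non-raw) string "\b" would be read as a backspace character. The example strings are invented for this illustration.

```python
import re

# "the" is found even when it is only part of a longer word
inside_word = re.search(r"the", "Another cat") is not None         # True

# \b marks a word boundary, so this only matches "the" as a whole word
whole_word_miss = re.search(r"\bthe\b", "Another cat") is not None  # False
whole_word_hit = re.search(r"\bthe\b", "on the mat") is not None    # True

# searches are case-sensitive unless re.IGNORECASE is given
case_sensitive = re.search(r"cat", "A CAT") is not None                    # False
ignoring_case = re.search(r"cat", "A CAT", re.IGNORECASE) is not None      # True

print("%s %s %s" % (inside_word, whole_word_miss, whole_word_hit))
print("%s %s" % (case_sensitive, ignoring_case))
```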
The combination of search with split provides a powerful tool to help you process simulation output files. Imagine you have run a simulation that calculates the energy of a molecule. Let's imagine that the output file from the simulation looks something like this;
Starting program...
Loading molecule...
Initialising variables...
Starting the calculation - this could take a while!
Molecule energy = 2432.6 kcal mol-1
Calculation finished. Bye!
You can get the energy by searching for lines that contain Molecule energy =, and then using split to break this line into words. The value of the energy is the fourth word. Here is an example script that does just this;
import sys
import re
lines = open( sys.argv[1], "r" ).readlines()
for line in lines:
    if re.search( r"Molecule energy =", line ):
        words = line.split()
        energy = float( words[3] )
        print "The energy of the molecule is %f kcal mol-1" % energy
        break
Try copying this example output to a file (logfile.txt) and copying the above Python script (search_log.py) to see that this works. Or try to write a similar Python script that processes an output file from one of the programs that you use.
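If you would like to see what split produces for the energy line without creating any files, the sketch below holds the example output in an array of lines (a stand-in for what readlines would return). It also records whether an energy line was found at all, which the script above does not do. This is an illustration rather than part of the original script.

```python
import re

# A stand-in for lines = open( "logfile.txt", "r" ).readlines()
lines = ["Starting program...\n",
         "Loading molecule...\n",
         "Initialising variables...\n",
         "Starting the calculation - this could take a while!\n",
         "Molecule energy = 2432.6 kcal mol-1\n",
         "Calculation finished. Bye!\n"]

energy = None   # stays as None if no matching line is found
words = []

for line in lines:
    if re.search(r"Molecule energy =", line):
        # words is ['Molecule', 'energy', '=', '2432.6', 'kcal', 'mol-1']
        words = line.split()
        # the fourth word is at index 3, as counting starts from zero
        energy = float(words[3])
        break

if energy is None:
    print("No energy line was found")
else:
    print("The energy of the molecule is %f kcal mol-1" % energy)
```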
Python's text search is very flexible. For example, you can search for the contents of a variable, e.g.
import re
search_string = "the"
re.search( search_string, line )
This will match if line contains the value of search_string (namely the).
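One thing to be aware of when the search string comes from a variable is that some characters have a special meaning in regular expressions. For example, a full stop matches any single character. If you want the text to be matched exactly as written, the re module provides re.escape. A small sketch, with an example line invented for this illustration:

```python
import re

line = "The total is 3x14 units"
search_string = "3.14"

# as a pattern, the "." matches any character, so "3.14" matches "3x14"
as_pattern = re.search(search_string, line) is not None             # True

# re.escape makes every character match only itself
as_literal = re.search(re.escape(search_string), line) is not None  # False

print("Matched as a pattern: %s" % as_pattern)
print("Matched as literal text: %s" % as_literal)
```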
Exercise
grep is a useful UNIX program that lets you print out lines in a file that match some passed text, for example
$ grep the file.txt
will print out all of the lines that contain the word the.
Write a Python script (grep.py) that acts like grep. (Answer)
import sys
import re
search_string = sys.argv[1]
lines = open( sys.argv[2], "r" ).readlines()
for line in lines:
    if re.search( search_string, line ):
        print line,
As well as being excellent for search, Python is also great at doing search and replace. Create a new Python script (replace.py) and copy the following;
import sys
import re
lines = open( sys.argv[1], "r" ).readlines()
for line in lines:
    line = re.sub( r"the", "THE", line )
    print line,
This script reads in a file and prints out every line to the screen. However, before printing the line, it modifies it using the sub (substitute) function, re.sub( r"the", "THE", line ). This sub function is a lot like the search function, except that now there is an extra argument ("THE") that is the replacement string. This searches for the text in the first string r"the" and replaces it with the text in the second string "THE" (so in this case this replaces the with THE). Note that this replaces all occurrences of the with THE. You can optionally specify the maximum number of matches to replace, e.g.
import re
line = "Round the ragged rock the ragged rascal ran"
# replace only a maximum of 2 matches of "ra" with "RA"
line = re.sub( r"ra", "RA", line, 2 )
print line
The sub function performs a case-sensitive substitution. Case-insensitive substitution is a little more complex... For example;
import re
#replace all occurrences of "the", "The", "THe" etc. with "THE"
line = re.sub( re.compile( r"the", re.IGNORECASE ), "THE", line )
In this last case, we've had to use re.compile to compile a search expression that could perform a case-insensitive search for the.
Sometimes you may want to perform the substitution at a specific place in the line, e.g. only at the beginning of the line or only at the end. You can do this by adding either a caret (^) to the beginning of the search string, to force matching at the beginning, or a dollar sign ($) to the end of the search string, to force matching at the end, e.g.
import re
line = "is this the bliss that this is"
# replace all instances of "is" with "IS"
print re.sub( r"is", "IS", line )
# replace only the first two instances of "is" with "IS"
print re.sub( r"is", "IS", line, 2 )
# replace only an "is" at the beginning of the string
print re.sub( r"^is", "IS", line )
# replace only the "is" at the end of the string
print re.sub( r"is$", "IS", line )
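For reference, here is a sketch of the same substitutions with each result stored in a variable (the variable names are illustrative) and the expected result written alongside as a comment. The last line applies the case-insensitive form to a made-up example string. print is written with parentheses so that the sketch also runs on Python 3.

```python
import re

line = "is this the bliss that this is"

# every "is" is replaced, including those inside "this" and "bliss"
all_replaced = re.sub(r"is", "IS", line)
# -> "IS thIS the blISs that thIS IS"

# only the first two matches are replaced
first_two = re.sub(r"is", "IS", line, 2)
# -> "IS thIS the bliss that this is"

# ^ only allows a match at the very beginning of the string
at_start = re.sub(r"^is", "IS", line)
# -> "IS this the bliss that this is"

# $ only allows a match at the very end of the string
at_end = re.sub(r"is$", "IS", line)
# -> "is this the bliss that this IS"

# a compiled, case-insensitive search replaces "The" as well as "the"
any_case = re.sub(re.compile(r"the", re.IGNORECASE), "THE", "The cat and the dog")
# -> "THE cat and THE dog"

print(all_replaced)
print(first_two)
print(at_start)
print(at_end)
print(any_case)
```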
You can also use variables in the search and replace parts of the substitute string, e.g.
import re
search = "the"
replace = "THE"
#case-insensitive replace "the" with "THE"
line = re.sub( re.compile(search, re.IGNORECASE), replace, line )
Exercise
Use search and replace to update your grep.py script so that it not only prints matching lines, but it also highlights the matched string in the line (e.g. by adding asterisks around the word, or by capitalising the word).
(note, you can capitalise a string by writing line = line.upper(). Similarly, you can lower-case a string by writing line = line.lower())
Here is a possible answer.
import sys
import re
search_string = sys.argv[1]
replace_string = "**%s**" % ( search_string.upper() )
lines = open( sys.argv[2], "r" ).readlines()
for line in lines:
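    # only print the lines that match, with the match highlighted
    if re.search( search_string, line ):
        print re.sub( search_string, replace_string, line ),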
So far you've seen how you can use Python to process your output files. However, what makes Python a good glue language is its ability to actually run programs as well. There are several ways to run a program from your Python script. I'll only present a couple of ways here. Open a new Python script (system_run.py) and copy the following;
import sys
import os
directory = sys.argv[1]
os.system( "ls %s" % directory )
This is a simple script that just lists the contents of a directory. The key line is os.system("ls %s" % directory). The system command (part of the os module) is passed a string, and executes the value of that string in pretty much exactly the same way that the same text would have been executed if you had typed it yourself at the command line. The output of the command is printed to the screen.
Let's imagine that we have to run ten simulations to calculate the energy of ten different molecules, that are held in the files input1.mol to input10.mol. The energy is calculated using the program molnrg, which is passed the name of the file to process. Here is a simple script that can run all ten simulations, outputting the results to ten log files, called output1.log to output10.log.
import os
for i in range(1, 11):
    os.system( "molnrg input%d.mol > output%d.log" % (i, i) )
Wasn't that easier than running each simulation individually?
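Before running a loop like this for real, it can be useful to do a dry run that builds and prints the commands without executing them. The sketch below does this for the molnrg example (molnrg being the example program named above, so nothing is actually run). The array name commands is my own choice.

```python
commands = []

for i in range(1, 11):
    # build the command string exactly as it would be passed to os.system
    command = "molnrg input%d.mol > output%d.log" % (i, i)
    commands.append(command)

# print the commands so that they can be checked by eye
for command in commands:
    print(command)
```

Once the printed commands look right, each one could be passed to os.system( command ).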
os.system is good if you want to just run a program. However, there are times when you would like to process the output of the program within Python. To do this, you have to use os.popen. Open a new Python script (pico popen.py) and copy the following;
import sys
import os
directory = sys.argv[1]
files = os.popen( "ls %s" % directory, "r" ).readlines()
nfiles = len( files )
print "There are %d files in %s" % (nfiles, directory)
for i in range(0, nfiles):
    print "%d: %s" % ( i, files[i] ),
This script lists the contents of a directory, but first says how many files are in the directory, and then prints each one preceded by its number.
The key line here is files = os.popen( "ls %s" % directory, "r" ).readlines(). The command contained in the string ("ls %s" % directory) is executed, and its output is returned as a virtual filehandle. Like normal filehandles, you can get all of the lines by using the readlines function. Note that the newline (\n) character is left on the end of each output line. Use the rstrip() command if you want to remove the newline character, e.g. files[i].rstrip().
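A minimal sketch of reading a command's output and stripping the newlines, using the echo command so that the result does not depend on the contents of any directory. It assumes that a shell providing echo is available, as it is on UNIX systems. The variable names are illustrative.

```python
import os

# run a command that prints two lines, and read what it printed
output = os.popen("echo one && echo two", "r")
raw_lines = output.readlines()
output.close()

# each line still has its newline on the end, so strip it off
clean_lines = []
for raw_line in raw_lines:
    clean_lines.append(raw_line.rstrip())

print("The command printed %d lines" % len(clean_lines))
for i in range(0, len(clean_lines)):
    print("%d: %s" % (i, clean_lines[i]))
```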
Exercises
convert is a UNIX program that can convert an image from one file format to another (e.g. convert a JPEG file to a PNG). Write a Python script that can convert all of the JPEG files in a directory into PNG files.
(the command to convert file.jpg to file.png is convert file.jpg file.png)
Here's a possible answer.
import sys
import os
import re
directory = sys.argv[1]
jpeg_files = os.popen( "ls %s" % directory, "r" ).readlines()
for jpeg_file in jpeg_files:
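    jpeg_file = jpeg_file.rstrip()
    # only convert files whose names end in .jpg
    if re.search( r"\.jpg$", jpeg_file ):
        # ls prints bare file names, so add the directory back on
        jpeg_file = os.path.join( directory, jpeg_file )
        png_file = re.sub( r"jpg$", "png", jpeg_file )
        command = "convert %s %s" % (jpeg_file, png_file)
        print "Running '%s'..." % command
        os.system( command )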
Python can be used for all stages of controlling jobs submitted to a compute cluster. You've seen how you can write files using Python. This lets you use Python to write the command and input files for your programs. You have also seen how to run programs from within Python, so you can use Python to run the job using the newly-written input file. You can then process the output using split, search and replace. You could then use the processed output to write new input files to run more programs. In this way, Python can act as the glue that can stick a chain of programs together, with the output of one program being used to provide the input of the next program.
If this has whetted your appetite for Python then I really recommend that you get hold of a Python book (like Python Essential Reference and Learning Python). There are also hundreds of Python tutorials on the web (just perform a web search for python tutorial or python for beginners). The best way to learn Python though is to read other people's Python and copy it. Please feel free to copy, adapt and play with the examples in this workshop. They should hopefully provide the starting points for a range of simple tasks that you may wish to perform using Python.