Python is an excellent language to use when searching within files. Searching is very useful. For example, you could use Python to search an output file to find the results of a calculation. Searching in Python is straightforward. Open a new Python script search.py and type the following:
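    from __future__ import print_function
    import sys

    # the name of the file to search is the first command-line argument
    with open(sys.argv[1]) as f:
        for line in f:
            if "the" in line:
                # each line already ends with a newline, so do not add another
                print(line, end="")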
This script will search the file named on the command line and print out all of the lines that contain the word the. Try it out!
The key line of this script is if "the" in line:. This is a condition that uses Python’s in operator. It looks for the thing to the left of the in inside the thing to the right of the in. In this case it’s looking for the string "the" inside the string containing the line from the file.
a in b will always return True if a is found inside b, and False otherwise, e.g.
    # does the line contain a lowercase a?
    "a" in line

    # does the line contain an uppercase A?
    "A" in line

    # does the line contain the word "cat"
    "cat" in line
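As a small illustration of the tests above, here is a sketch that runs them on a made-up line of text (the line itself is an assumption, chosen only for the example). It also shows that in performs a case-sensitive substring test, so it matches text inside longer words too.

```python
# A made-up line of text, used only for illustration
line = "The cat sat on the other mat\n"

has_lower_a = "a" in line   # "cat", "sat" and "mat" all contain "a"
has_upper_a = "A" in line   # there is no uppercase "A" in the line
has_cat = "cat" in line     # the substring "cat" is present

# "in" tests for substrings, not whole words: "her" occurs inside "other"
has_her = "her" in line

print(has_lower_a, has_upper_a, has_cat, has_her)
```

If you only want to match whole words, a plain in test is not enough; that is one reason to reach for more flexible pattern matching.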
If you want to do anything more complicated than a simple “does this string contain this other string” test, you can use Python’s regular expression pattern matching module, re. This module provides a function called search() which performs a similar task to the in operator. Repeating the last example using the re module would look like:
    import re

    # does the line contain a lowercase a?
    re.search(r"a", line)

    # does the line contain an uppercase A?
    re.search(r"A", line)

    # does the line contain the word "cat"
    re.search(r"cat", line)
The search() function normally takes two arguments: the pattern you want to search for and the string you want to search within. The pattern is written just like a normal string. The r in front of the string tells Python to treat backslashes literally, rather than as the start of an escape sequence. While there are no backslashes in this search string, I prefer to add the r by default to search strings, as this prevents confusion when using more complicated searches.
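To make the behaviour of search() and the r prefix concrete, here is a minimal sketch. The sample line and the digit pattern are assumptions chosen for illustration. It relies on the fact that re.search returns a match object when the pattern is found and None when it is not, which is why the result can be used directly in an if test.

```python
import re

# A made-up line of text, used only for illustration
line = "Molecule energy = 2432.6 kcal mol-1\n"

# re.search returns a match object if the pattern is found, otherwise None
found = re.search(r"energy", line)
missing = re.search(r"entropy", line)
print(found is not None, missing is None)

# Without the r prefix "\t" is a single tab character; with it, the string
# holds two characters: a backslash followed by "t"
plain_length = len("\t")
raw_length = len(r"\t")
print(plain_length, raw_length)

# A pattern that needs backslashes: digits, a literal dot, then more digits
number = re.search(r"\d+\.\d+", line)
print(number.group())
```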
To make the search case-insensitive, pass re.IGNORECASE as a third argument, after the string to search, e.g.
    import re

    # search for an upper case or lower case a
    re.search(r"a", line, re.IGNORECASE)

    # search for "cat", "CAT", "CaT", "caT" etc.
    re.search(r"cat", line, re.IGNORECASE)
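The following sketch applies a case-insensitive search to a few made-up lines (the lines are assumptions, used only for illustration). For a plain substring, lower-casing the line before using in should give the same result, which the sketch shows alongside.

```python
import re

# Made-up lines, used only for illustration
lines = ["My CAT is asleep\n", "A caterpillar\n", "No felines here\n"]

# re.IGNORECASE is passed as the third (flags) argument of re.search
regex_matches = [l for l in lines if re.search(r"cat", l, re.IGNORECASE)]

# For a plain substring, lower-casing the line first gives the same result
plain_matches = [l for l in lines if "cat" in l.lower()]

for line in regex_matches:
    print(line, end="")
```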
The combination of search with split provides a powerful tool to help you process simulation output files. Imagine you have run a simulation that calculates the energy of a molecule. Let’s imagine that the output file from the simulation looks something like this:
    Starting program...
    Loading molecule...
    Initialising variables...
    Starting the calculation - this could take a while!
    Molecule energy = 2432.6 kcal mol-1
    Calculation finished. Bye!
You can get the energy by searching for lines that contain Molecule energy =, and then using split to break this line into words. The value of the energy is the fourth word (index 3, because Python counts from zero). Here is an example script that does just this:
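    import sys

    # the name of the log file is the first command-line argument
    with open(sys.argv[1]) as f:
        for line in f:
            if "Molecule energy =" in line:
                words = line.split()
                # the energy is the fourth word, i.e. index 3
                energy = float(words[3])
                print("The energy of the molecule is %f kcal mol-1" % energy)
                break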
Try copying the example output to a file (e.g. logfile.txt) and the above Python script to another file (e.g. search_log.py), then run the script on the log file to see that this works. Or try to write a similar Python script that processes an output file from one of the programs that you use.
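If you want to experiment without creating files first, the same search-then-split idea can be wrapped in a function that accepts any sequence of lines. This is only a sketch: the name find_energy is hypothetical, and the log text is the example output from above, held in a string.

```python
def find_energy(lines):
    """Return the first energy found in an iterable of lines, or None."""
    for line in lines:
        if "Molecule energy =" in line:
            words = line.split()
            # words is ["Molecule", "energy", "=", "2432.6", "kcal", "mol-1"]
            return float(words[3])
    # no line contained the search text
    return None


# The example output from above, held in a string instead of a file
log = """Starting program...
Loading molecule...
Initialising variables...
Starting the calculation - this could take a while!
Molecule energy = 2432.6 kcal mol-1
Calculation finished. Bye!
"""

energy = find_energy(log.splitlines())
print("The energy of the molecule is %f kcal mol-1" % energy)
```

Returning None when nothing matches lets the calling code decide what to do about a log file that has no energy line.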
Python’s text search is very flexible. For example, you can search for the contents of a variable, e.g.
    search_string = "the"

    if search_string in line:
        print(line)
This will match if line contains the value of search_string.
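Because the search text is just a variable, the same code can be reused for different searches. Here is a minimal sketch, assuming a hypothetical helper called matching_lines and some made-up lines of text.

```python
def matching_lines(lines, search_string):
    """Return the lines that contain search_string (case-sensitive)."""
    return [line for line in lines if search_string in line]


# Made-up lines, used only for illustration
text = ["the first line\n", "a second line\n", "another one\n"]

# The same function is reused with two different search strings
for word in ["the", "line"]:
    print(word, len(matching_lines(text, word)))
```

Note that searching for the also matches another, because in looks for substrings rather than whole words.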
grep is a useful UNIX program that lets you print out lines in a file that match some passed text, for example
grep the file.txt
will print out all of the lines in file.txt that contain the word the.
Write a Python script (grep.py) that acts like grep.