Python for Secret Agents

By: Steven F. Lott

Background briefing – math and numbers


We'll review the basics of Python programming before we start any of our more serious missions. If you already know a little Python, this should be a review. If you don't know any Python, this is just an overview and many details will be omitted.

If you've never done any programming before, this briefing may be a bit too brief. You might want to get a more in-depth tutorial. If you're completely new to programming, you might want to look over this page for additional tutorials: https://wiki.python.org/moin/BeginnersGuide/NonProgrammers. For more help to start with expert Python programming, go to http://www.packtpub.com/expert-python-programming/book.

The usual culprits

Python provides the usual mix of arithmetic and comparison operators. However, there are some important wrinkles and features. Rather than assuming you're aware of them, we'll review the details.

The conventional arithmetic operators are: +, -, *, /, //, %, and **. There are two variations on division: an exact division (/) and an integer division (//). You must choose whether you want an exact, floating-point result, or an integer result:

>>> 355/113
3.1415929203539825
>>> 355//113
3
>>> 355.0/113.0
3.1415929203539825
>>> 355.0//113.0
3.0

The exact division (/) produces a float result from two integers. The integer division produces an integer result. When we use float values, we expect exact division to produce float. Even with two floating-point values, the integer division produces a rounded-down floating-point result.

We have this extra division operator to avoid having to use wordy constructs such as int(a/b) or math.floor(a/b).
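One wrinkle is worth a quick sketch. The three spellings agree for positive values, but int() truncates toward zero while // and math.floor() round down. The values of a and b here are purely illustrative:

```python
import math

a, b = -7, 2  # illustrative values, not mission data

floor_div = a // b           # rounds toward negative infinity
floored = math.floor(a / b)  # same answer, wordier
truncated = int(a / b)       # truncates toward zero instead

print(floor_div, floored, truncated)
```

With negative values, only math.floor(a/b) matches a//b, so it pays to know which behavior we actually want.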

Beyond conventional arithmetic, there are some additional bit fiddling operators that are available: &, |, ^, >>, <<, and ~. These operators work on integers (the &, |, and ^ operators also work on sets). These are emphatically not Boolean operators; they don't work on the narrow domain of True and False. They work on the individual bits of an integer.

We'll use binary values with the 0b prefix to show what the operators do, as shown in the following code. We'll look at details of this 0b prefix later.

>>> bin(0b0101 & 0b0110)
'0b100'
>>> bin(0b0101 ^ 0b0110)
'0b11'
>>> bin(0b0101 | 0b0110)
'0b111'
>>> bin(~0b0101)
'-0b110'

The & operator does bitwise AND. The ^ operator does bitwise exclusive OR (XOR). The | operator does inclusive OR. The ~ operator is the complement of the bits. The result has many 1 bits and is shown as a negative number.
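The negative result from ~ can be puzzling. Here's a small sketch, assuming we only care about the lowest four bits, that uses a mask to show the flipped bits directly. The variable names are ours, for illustration only:

```python
value = 0b0101

complement = ~value                  # every bit flipped; the same as -value - 1
low_four_bits = complement & 0b1111  # the mask keeps only the lowest four bits

print(complement, bin(low_four_bits))
```

The mask throws away the endless run of 1 bits on the left, leaving the four bits we can read easily.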

The << and >> operators are for doing left and right shifts of the bits, as shown in the following code:

>>> bin( 0b110 << 4 )
'0b1100000'
>>> bin( 0b1100000 >> 3 )
'0b1100'

It may not be obvious, but shifting an integer left by b bits is like multiplying it by 2**b, except it may operate faster. Similarly, shifting right by b bits amounts to integer division (//) by 2**b.
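We can check the relationship between shifts and powers of two for ourselves. This is only a sketch using the values from the previous example, plus one odd number to show that a right shift discards the low-order bit:

```python
value = 0b110

left = value << 4       # same as value * 2**4
right = 0b1100000 >> 3  # same as 0b1100000 // 2**3
odd = 0b1101 >> 1       # 13 >> 1 drops the lowest bit, like 13 // 2

print(left == value * 2**4, right == 0b1100000 // 2**3, odd)
```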

We also have all of the usual comparison operators: <, <=, >, >=, ==, and !=.

In Python, we can combine comparison operators without including the AND operator:

>>> 7 <= 11 < 17
True
>>> 7 <= 11 and 11 < 17
True

This simplification really does implement our conventional mathematical understanding of how comparisons can be written. We don't need to say 7 <= 11 and 11 < 17.

There's another comparison operator that's used in some specialized situations: is. The is operator will appear, for now, to be the same as ==. Try it. 3 is 3 and 3 == 3 seem to do the same thing. Later, when we start using the None object, we'll see the most common use for the is operator. For more advanced Python programming, there's a need to distinguish between two references to the same object (is) and two objects which claim to have the same value (==).
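The difference between is and == shows up as soon as we have two separate objects with the same value. Here's a brief sketch using lists, which we haven't formally met yet; the names are hypothetical:

```python
first = [4, 7, 8]
second = [4, 7, 8]  # a separate object that happens to have the same value
alias = first       # a second reference to the very same object

same_value = first == second    # the values match
same_object = first is second   # but these are two distinct objects
alias_is_first = alias is first

missing = None
is_missing = missing is None    # the most common use of is

print(same_value, same_object, alias_is_first, is_missing)
```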

The ivory tower of numbers

Python gives us a variety of numbers, plus the ability to easily add new kinds of numbers. We'll focus on the built-in numbers here. Adding new kinds of numbers is the sort of thing that takes up whole chapters in more advanced books.

Python ranks the numbers into a kind of tower. At the top are numbers with the fewest features. Each subclass extends that number with more and more features. We'll look at the tower from the bottom up, starting with the integers that have the most features, and moving towards the complex numbers that have the fewest features. The following sections cover the various kinds of numbers we'll need to use.

Integer numbers

We can write integer values in base 10, 16, 8, or 2. Base 10 numbers don't need a prefix; the other bases use a simple two-character prefix, as shown in the following snippet:

48813
0xbead 
0b1011111010101101
0o137255

We also have functions that will convert numbers into handy strings in different bases. We can use the hex(), oct(), and bin() functions to see a value in base 16, 8, or 2.
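As a quick check, we can feed the base 10 value from the snippet above to each function. We've also included the built-in int() function with an explicit base to go the other way, from a string back to a number:

```python
value = 48813

as_hex = hex(value)
as_oct = oct(value)
as_bin = bin(value)
round_trip = int("bead", 16)  # parse a base 16 string back into an integer

print(as_hex, as_oct, as_bin, round_trip)
```

All four literals shown earlier turn out to be the same number.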

The question of integer size is common. Python integers don't have a maximum size. They're not artificially limited to 32 or 64 bits. Try this:

>>> 2**256
115792089237316195423570985008687907853269984665640564039457584007913129639936

Large numbers work. They may be a bit slow, but they work perfectly fine.

Rational numbers

Rational numbers are not commonly used. They must be imported from the standard library. We must import the fractions.Fraction class definition. It looks like this:

>>> from fractions import Fraction

Once we have the Fraction class defined, we can use it to create numbers. Let's say we were sent out to track down a missing device. Details of the device are strictly need-to-know. Since we're new agents, all that HQ will release to us is the overall size in square inches.

Here's an exact calculation of the area of a device we found. It is measured as 4⅞" multiplied by 2¼":

>>> length=4+Fraction("7/8")
>>> width=2+Fraction("1/4")
>>> length*width
Fraction(351, 32)

Okay, the area is 351/32, which is—what?—in real inches and fractions.

We can use Python's divmod() function to work this out. The divmod() function gives us a quotient and a remainder, as shown in the following code:

>>> divmod(351,32)
(10, 31)

The quotient is 10 and the remainder is 31, so the area is 10 31/32 square inches. The device is about 5 × 2, or 10 square inches, so the value seems to fit within our rough approximation. We can transmit that as the proper result. If we found the right device, we'll be instructed on what to do with it. Otherwise, we might have blown the assignment.
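If we need to do this more than once, we can pull the numerator and denominator straight out of the Fraction object rather than retyping them. This is a minimal sketch; the variable names and the output layout are our own choices:

```python
from fractions import Fraction

area = (4 + Fraction(7, 8)) * (2 + Fraction(1, 4))

# Split the improper fraction into whole inches and a leftover fraction
whole, remainder = divmod(area.numerator, area.denominator)
mixed = "{0} {1}/{2}".format(whole, remainder, area.denominator)

print(mixed, "square inches")
```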

Floating-point numbers

We can write floating-point values in common or scientific notation as follows:

3.1415926
6.22E12

The presence of a decimal point or an exponent distinguishes a float from an integer.

These are ordinary double-precision floating-point numbers. It's important to remember that floating-point values are only approximations. They usually have a 64-bit implementation.
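To see the approximation at work, here's a classic sketch. The math.isclose() function (available in Python 3.5 and later) is one reasonable way to compare float values; it's our suggestion here, not the only approach:

```python
import math
import sys

total = 0.1 + 0.2

exactly_equal = total == 0.3             # the tiny error makes this comparison fail
close_enough = math.isclose(total, 0.3)  # compares within a small tolerance
mantissa_bits = sys.float_info.mant_dig  # bits of precision on this platform

print(total, exactly_equal, close_enough, mantissa_bits)
```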

If you're using CPython, they're explicitly based on the C compiler that was shown in the sys.version startup message. We can also get information from the platform package as shown in the following code snippet:

>>> import platform
>>> platform.python_build()
('v3.3.4:7ff62415e426', 'Feb  9 2014 00:29:34')
>>> platform.python_compiler()
'GCC 4.2.1 (Apple Inc. build 5666) (dot 3)'

This tells us which compiler was used. That, in turn, can tell us what floating-point libraries were used. This may help determine which underlying mathematical libraries are in use.

Decimal numbers

We need to be careful with money. Words to live by: the accountants watching over spies are a tight-fisted bunch.

What's important is that floating-point numbers are an approximation. We can't rely on approximations when working with money. For currency, we need exact decimal values; nothing else will do. Decimal numbers are available from a standard library module. We'll import the decimal.Decimal class definition to work with currency. It looks like this:

>>> from decimal import Decimal

The informant we bribed to locate the device wants to be paid 50,000 Greek Drachma for the information on the missing device. When we submit our expenses, we'll need to include everything, including the cab fare (23.50 dollars) and the expensive lunch we had to buy her (12,900 GRD).

Why wouldn't the informant accept Dollars or Euros? We don't want to know, we just want their information. Recently, Greek Drachma were trading at 247.616 per dollar.

What's the exact budget for the information? In drachma and dollars?

First, we will define the currency conversion, exact to the mil (1/1000 of a dollar):

>>> conversion=Decimal("247.616")
>>> conversion
Decimal('247.616')

The tab for our lunch, converted from drachma to dollars, is calculated as follows:

>>> lunch=Decimal("12900")
>>> lunch/conversion
Decimal('52.09679503747738433703799431')

What? How is that mess going to satisfy the accountants?

All those digits are a consequence of exact division: we get a lot of decimal places of precision; not all of them are really relevant. We need to formalize the idea of rounding off the value so that the government accountants will be happy. The nearest penny will do. With a Decimal object, we'll use the quantize() method. The term quantize covers rounding up, rounding down, and truncating a given value. The decimal module offers a number of quantizing rules. The default rule is ROUND_HALF_EVEN: round to the nearest value; in the case of a tie, prefer the even value. The code looks as follows:

>>> penny=Decimal('.00')
>>> (lunch/conversion).quantize(penny)
Decimal('52.10')

That's much better. How much was the bribe we needed to pay?

>>> bribe=50000
>>> (bribe/conversion).quantize(penny)
Decimal('201.93')

Notice that the division involved an integer and a decimal. Python's definition of decimal will quietly create a new decimal number from the integer so that the math will be done using decimal objects.
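The quiet conversion applies to integers only. Here's a short sketch showing that a float won't be promoted the same way; Python raises a TypeError instead. The names from_int and from_float are ours:

```python
from decimal import Decimal

conversion = Decimal("247.616")

from_int = 50000 / conversion  # the integer is quietly promoted to Decimal

try:
    from_float = 50000.0 / conversion  # a float is an approximation; no promotion
except TypeError as error:
    from_float = None
    print("float and Decimal don't mix:", error)

print(from_int.quantize(Decimal(".00")))
```

This is a helpful refusal: it keeps approximate values from sneaking into our currency calculations.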

The cab driver charged us US Dollars. We don't need to do much of a conversion. So, we will add this amount to the final amount, as shown in the following code:

>>> cab=Decimal('23.50')

That gets us to the whole calculation: lunch plus bribe, converted, plus cab.

>>> ((lunch+bribe)/conversion).quantize(penny)+cab
Decimal('277.52')

Wait. We seem to be off by a penny. Why didn't we get 277.53 dollars as an answer?

Rounding. Each individual amount (52.10 and 201.93) had a fraction of a penny rounded up to the nearest penny. (The more detailed values were 52.097 and 201.926.) When we computed the sum of the drachma before converting, the single converted total (about 254.022) was rounded down, so it didn't include the two separately rounded-up fractions of a penny.

We have a very fine degree of control over this. There are a number of rounding schemes, and there are a number of ways to define when and how to round. Also, some algebra may be required to see how it all fits together.
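Here's a sketch of both kinds of control. The first part passes an explicit rounding rule to quantize() using a made-up tie value; the second part repeats our expense figures to show how the choice of when to round changes the answer:

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP, ROUND_DOWN

penny = Decimal(".00")
tie = Decimal("2.125")  # illustrative value, exactly halfway between 2.12 and 2.13

half_even = tie.quantize(penny, rounding=ROUND_HALF_EVEN)  # tie goes to the even digit
half_up = tie.quantize(penny, rounding=ROUND_HALF_UP)      # tie goes up
truncated = tie.quantize(penny, rounding=ROUND_DOWN)       # extra digits dropped

# When we round matters as much as how we round
conversion = Decimal("247.616")
lunch, bribe = Decimal("12900"), Decimal("50000")
round_each = (lunch / conversion).quantize(penny) + (bribe / conversion).quantize(penny)
round_once = ((lunch + bribe) / conversion).quantize(penny)

print(half_even, half_up, truncated, round_each, round_once)
```

Whichever scheme the accountants prefer, the important thing is to pick one and apply it consistently.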

Complex numbers

We also have complex numbers in Python. They're written with two parts: a real and an imaginary value, as shown in the following code:

>>> 2+3j
(2+3j)

If we mix complex values with most other kinds of numbers, the results will be complex. The exception is decimal numbers. But why would we be mixing engineering data and currency? If any mission involves scientific and engineering data, we have a way to deal with the complex values.
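A brief sketch of the mixing rules follows. The values are illustrative; the point is that integers and floats are absorbed into a complex result, while a Decimal value raises a TypeError:

```python
from decimal import Decimal

signal = 2 + 3j

scaled = signal * 2      # integer mixed with complex gives complex
shifted = signal + 0.5   # float mixed with complex gives complex
magnitude = abs(3 + 4j)  # the distance from zero, as a float

try:
    mixed = signal + Decimal("1.50")  # decimal values refuse to mix
except TypeError:
    mixed = None

print(scaled, shifted, magnitude, mixed)
```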

Outside the numbers

Python includes a variety of data types that aren't numbers. In the Handling text and strings section, we'll look at Python strings. We'll look at collections in Chapter 2, Acquiring Intelligence Data.

Boolean values, True and False, form their own little domain. We can extract a Boolean value from most objects using the bool() function. Here are some examples:

>>> bool(5)
True
>>> bool(0)
False
>>> bool('')
False
>>> bool(None)
False
>>> bool('word')
True

The general pattern is that most objects have a value True and a few exceptional objects have a value False. Empty collections, 0, and None have a value False. Boolean values have their own special operators: and, or, and not. These have an additional feature. Here's an example:

>>> True and 0
0
>>> False and 0
False

When we evaluated True and 0, both sides of the and operator were evaluated; the right-hand value was the result. But when we evaluated False and 0, only the left-hand side of and was evaluated. Since it was already False, there was no reason to evaluate the right-hand side.

The and and or operators are short-circuit operators. If the left side of and is False, that's sufficient and the right-hand side is ignored. If the left-hand side of or is True, that's sufficient and the right-hand side is ignored.
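We can watch the short-circuit happen by recording which operands actually get evaluated. The checked() function below is a hypothetical helper we made up for this sketch; it isn't part of Python:

```python
calls = []

def checked(value):
    """Record that this operand was evaluated, then return it unchanged."""
    calls.append(value)
    return value

first = checked(False) and checked(0)     # right-hand side is skipped
second = checked(True) or checked(0)      # right-hand side is skipped
third = checked(0) or checked("default")  # left side is false, so the right side is used

print(first, second, third, calls)
```

The third line shows a handy idiom: or can supply a fallback value when the left-hand side is empty or zero.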

Python's rules for evaluation follow mathematical practice closely. Arithmetic operations have the highest priority. Comparison operators have a lower priority than arithmetic operations. The logical operators have a very low priority. This means that a+2 > b/3 or c==15 will be done in phases: first the arithmetic, then the comparison, and finally the logic.

Arithmetic follows the usual mathematical rules. ** has a higher priority than *, /, //, or %. The + and - operators come next. When we write 2*3+4, the 2*3 operation must be performed first. The bit fiddling operators are even lower in priority. When you have a sequence of operations of the same priority (a+b+c), the computations are performed from left to right (the exception is **, which groups from right to left). Of course, if there's any doubt, it's sensible to use parentheses.
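Here's a sketch that spells out the phases with explicit parentheses so we can confirm that both forms agree. The values of a, b, and c are arbitrary:

```python
a, b, c = 1, 9, 15  # arbitrary values for illustration

phased = a + 2 > b / 3 or c == 15
explicit = ((a + 2) > (b / 3)) or (c == 15)  # the same thing, fully parenthesized

multiply_first = 2 * 3 + 4  # (2*3)+4
power_first = 2 * 3 ** 2    # 2*(3**2)
shift_last = 1 + 2 << 3     # (1+2)<<3, because shifts rank below addition

print(phased, explicit, multiply_first, power_first, shift_last)
```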

Assigning values to variables

We've been using the REPL feature of our Python toolset. In the long run, this isn't ideal. We'll be much happier writing scripts. The point behind using a computer for intelligence gathering is to automate data collection. Our scripts will require assignment to variables. They will also require explicit output and input.

We've shown the simple, obvious assignment statement in several examples previously. Note that we don't declare variables in Python. We simply assign values to variables. If the variable doesn't exist, it gets created. If the variable does exist, the previous value is replaced.

Let's look at some more sophisticated technology for creating and changing variables. We have multiple assignment statements. The following code will assign values to several variables at once:

>>> length, width = 2+Fraction(1,4), 4+Fraction(7,8)
>>> length
Fraction(9, 4)
>>> width
Fraction(39, 8)
>>> length >= width
False

We've set two variables, length and width. However, we also made a small mistake. The length isn't the larger value; we've switched the values of length and width. We can swap them very simply using a multiple assignment statement as follows:

>>> length, width = width, length
>>> length
Fraction(39, 8)
>>> width
Fraction(9, 4)

This works because the right-hand side is computed in its entirety. In this case, it's really simple. Then all of the values are broken down and assigned to the named variables. Clearly, the number of values on the right has to match the number of variables on the left or this won't work.
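What does "won't work" look like? Here's a small sketch with one value too many on the right-hand side. Python raises a ValueError, which we capture here only so that we can look at the message:

```python
try:
    length, width = 1, 2, 3  # three values, but only two variables
except ValueError as error:
    mismatch = str(error)

print("Assignment failed:", mismatch)
```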

We also have augmented assignment statements. These couple an arithmetic operator with the assignment statement. The following code uses +=, assignment augmented with addition, to compute a sum from various bits and pieces:

>>> total= 0
>>> total += (lunch/conversion).quantize(penny)
>>> total += (bribe/conversion).quantize(penny)
>>> total += cab
>>> total
Decimal('277.53')

We don't have to write total = total +.... Instead, we can simply write total += .... It's a nice clarification of what our intent is.

All of the arithmetic operators are available as augmented assignment statements. We might have a hard time finding a use for %= or **=, but the statements are part of the language.
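For what it's worth, here's a sketch of a few of the less common augmented assignments. The scenario (seconds on a stopwatch, the side of a square, a set of flag bits) is invented for illustration:

```python
seconds = 3725
minutes = seconds // 60
seconds %= 60     # keep only the seconds left over after whole minutes

side = 3
side **= 2        # replace the side with the area of the square

flags = 0b0101
flags |= 0b0010   # the bit fiddling operators have augmented forms, too

print(minutes, seconds, side, bin(flags))
```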

The idea of a nice clarification should lead to some additional thinking. For example, the variable named conversion is a perfectly opaque name. Secrecy for data is one thing: we'll look at ways to encrypt data. Obscurity through shabby processing of that data often leads to a nightmarish mess. Maybe we should have called it something that defines more clearly what it means. We'll revisit this problem of obscurity in some examples later on.

Writing scripts and seeing output

Most of our missions will involve gathering and analyzing data. We won't be creating a very sophisticated User Interface (UI). Python has tools for building websites and complex graphical user interfaces (GUIs). The complexity of those topics leads to entire books to cover GUI and web development.

We don't want to type each individual Python statement at the >>> prompt. That makes it easy to learn Python, but our goal is to create programs. In GNU/Linux parlance, our Python application programs can be called scripts. This is because Python programs fit the definition for a scripting language.

For our purposes, we'll focus on scripts that use the command-line interface (CLI). Everything we'll write will run in a simple terminal window. The advantage of this approach is speed and simplicity. We can add graphic user interfaces later. Or we can expand the essential core of a small script into a web service, once it works.

What is an application or a script? A script is simply a plain text file. We can use any text editor to create this file. A word processor is rarely a good idea, since word processors aren't good at producing plain text files.

If we're not working from the >>> REPL prompt, we'll need to explicitly display the output. We'll display output from a script using the print() function.

Here's a simple script we can use to produce a receipt for bribing (encouraging) our informant.

from decimal import Decimal

PENNY= Decimal('.00')

grd_usd= Decimal('247.616')
lunch_grd= Decimal('12900')
bribe_grd= 50000
cab_usd= Decimal('23.50')

lunch_usd= (lunch_grd/grd_usd).quantize(PENNY)
bribe_usd= (bribe_grd/grd_usd).quantize(PENNY)

print( "Lunch", lunch_grd, "GRD", lunch_usd, "USD" )
print( "Bribe", bribe_grd, "GRD", bribe_usd, "USD" )
print( "Cab", cab_usd, "USD" )
print( "Total", lunch_usd+bribe_usd+cab_usd, "USD" )

Let's break this script down so that we can follow it. Reading a script is a lot like putting a tail on an informant. We want to see where the script goes and what it does.

First, we imported the Decimal definition. This is essential for working with currency. We defined a value, PENNY, that we'll use to round off currency calculations to the nearest penny. We used a name in all caps to make this variable distinctive. It's not an ordinary variable; we should never see it on the left-hand side of an assignment statement again in the script.

We created the currency conversion factor, and named it grd_usd. That's a name that seems more meaningful than conversion in this context. Note that we also added a small suffix to our amount names. We used names such as lunch_grd, bribe_grd, and cab_usd to emphasize which currency is being used. This can help prevent head-scrambling problems.

Given the grd_usd conversion factor, we created two more variables, lunch_usd and bribe_usd, with the amounts converted to dollars and rounded to the nearest penny. If the accountants want to fiddle with the conversion factor—perhaps they can use a different bank than us spies—they can tweak the number and prepare a different receipt.

The final step was to use the print() function to write the receipt. We printed the three items we spent money on, showing the amounts in GRD and USD. We also computed the total. This will help the accountants to properly reimburse us for the mission.

We'll describe the output as primitive but acceptable. After all, they're only accountants. We'll look into pretty formatting separately.

Gathering user input

The simplest way to gather input is to copy and paste it into the script. That's what we did previously. We pasted the Greek Drachma conversion into the script: grd_usd= Decimal('247.616'). We could annotate this with a comment to help the accountants make any changes.

Additional comments come at the end of the line, after a # sign. They look like this:

grd_usd= Decimal('247.616') # Conversion from Mihalis Bank 5/15/14

This extra text is part of the application, but it doesn't actually do anything. It's a note to ourselves, our accountants, our handler, or the person who takes over our assignments when we disappear.

This kind of data line is easy to edit. But sometimes the people we work with want more flexibility. In that case, we can gather this value as input from a person. For this, we'll use the input() function.

We often break user input down into two steps like this:

        entry= input("GRD conversion: ")
        grd_usd= Decimal(entry)

The first line will write a prompt and wait for the user to enter the amount. The amount will be a string of characters, assigned to the variable entry. Python can't use the characters directly in arithmetic statements, so we need to explicitly convert them to a useful numeric type.

The second line will try to convert the user's input to a useful Decimal object. We have to emphasize the try part of this. If the user doesn't enter a string that represents a valid Decimal number, there will be a major crisis. Try it.

The crisis will look like this:

>>> entry= input("GRD conversion: ")
GRD conversion: 123.%$6
>>> grd_usd= Decimal(entry)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
decimal.InvalidOperation: [<class 'decimal.ConversionSyntax'>]

We entered 123.%$6, which isn't a valid number. Rather than this, enter a good number.

The bletch starting with Traceback indicates that Python raised an exception. A crisis in Python always results in an exception being raised. Python defines a variety of exceptions to make it possible for us to write scripts that deal with these kinds of crises.

Once we've seen how to deal with crises, we can look at string data and some simple clean-up steps that can make the user's life a little easier. We can't fix their mistakes, but we can handle a few common problems that stem from trying to type numbers on a keyboard.
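As a preview, here's one possible clean-up step. This is only a sketch: the clean_number() function is a hypothetical helper, and it assumes that commas are thousands separators, which isn't true in every country:

```python
from decimal import Decimal

def clean_number(text):
    """Strip surrounding whitespace and commas (assumed to be thousands separators)."""
    return text.strip().replace(",", "")

entry = " 12,900 \n"  # the kind of thing a hurried agent might type
amount = Decimal(clean_number(entry))

print(amount)
```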

Handling exceptions

An exception such as decimal.InvalidOperation is raised when the Decimal class can't parse the given string to create a valid Decimal object. What can we do with this exception?

We can ignore it. In that case, our application program crashes. It stops running and the agents using it are unhappy. Not really the best approach.

Here's the basic technique for catching an exception:

    import decimal
    from decimal import Decimal

    entry= input("GRD conversion: ")
    try:
        grd_usd= Decimal(entry)
    except decimal.InvalidOperation:
        print("Invalid: ", entry)

We've wrapped the Decimal() conversion and assignment in a try: statement. If every statement in the try: block works, the grd_usd variable will be set. If, on the other hand, a decimal.InvalidOperation exception is raised inside the try: block, the except clause will be processed. This writes a message and does not set the grd_usd variable.

We can handle an exception in a variety of ways. The most common kind of exception handling will clean up in the event of some failure. For example, a script that attempts to create a file might delete the useless file if an exception was raised. The problem hasn't been solved: the program still has to stop. But it can stop in a clean, pleasant way instead of a messy way.
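Here's a minimal sketch of the clean-up style of handler. The write_total() function, the filename, and the scratch directory are all assumptions made for this example:

```python
import os
import tempfile
from decimal import Decimal, InvalidOperation

def write_total(path, amount_text):
    """Write a one-line receipt; remove the partial file if the amount is bad."""
    try:
        with open(path, "w") as target:
            target.write("Total ")
            target.write(str(Decimal(amount_text)))
    except InvalidOperation:
        os.remove(path)  # clean up the useless file
        raise            # the problem isn't solved; let the caller decide what's next

# Hypothetical usage with a scratch directory
scratch = tempfile.mkdtemp()
receipt = os.path.join(scratch, "receipt.txt")
try:
    write_total(receipt, "123.%$6")
except InvalidOperation:
    print("No receipt written. File still exists?", os.path.exists(receipt))
```

The bare raise statement passes the exception along after the clean-up, so the program still stops, just without leaving debris behind.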

We can also handle an exception by computing an alternate answer. We might be gathering information from a variety of web services. If one doesn't respond in time, we'll get a timeout exception. In this case, we may try an alternate web service.
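The alternate-answer style might look like the following sketch. Both primary_rate() and backup_rate() are stand-ins we invented; a real script would call actual web services in their place:

```python
from decimal import Decimal

def primary_rate():
    """Stand-in for a slow web service; it always times out in this sketch."""
    raise TimeoutError("primary service did not respond")

def backup_rate():
    """Stand-in for an alternate service; returns the rate used in the text."""
    return Decimal("247.616")

try:
    grd_usd = primary_rate()
except TimeoutError:
    grd_usd = backup_rate()  # compute the answer a different way

print(grd_usd, "GRD = 1 USD")
```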

In another common exception-handling case, we may reset the state of the computation so that an action can be tried again. In this case, we'll wrap the exception handler in a loop that can repeatedly ask the user for input until they provide a valid number.

These choices aren't exclusive and some handlers can perform combinations of the previous exception handlers. We'll look at the third choice, trying again, in detail.

Looping and trying again

Here's a common recipe for getting input from the user:

import decimal
from decimal import Decimal

grd_usd= None
while grd_usd is None:
    entry= input("GRD conversion: ")
    try:
        grd_usd= Decimal(entry)
    except decimal.InvalidOperation:
        print("Invalid: ", entry)
print( grd_usd, "GRD = 1 USD" )

We'll add a tail to this and follow it around for a bit. The goal is to get a valid decimal value for our currency conversion, grd_usd. We'll initialize that variable as Python's special None object.

The while statement makes a formal declaration of our intent. We're going to execute the body of the while statement while the grd_usd variable remains set to None. Note that we're using the is operator to compare grd_usd to None. We're emphasizing a detail here: there's only one None object in Python and we're using that single instance. It's technically possible to tweak the definition of ==; we can't tweak the definition of is.

At the end of the while statement, grd_usd is None must be False; we can say grd_usd is not None. When we look at the body of the statement, we can see that only one statement sets grd_usd, so we're assured that it must be a valid Decimal object.

Within the body of the while statement, we've used our exception-handling recipe. First, we prompt and get some input, setting the entry variable. Then, inside the try statement, we attempt to convert the string to a Decimal value. If that conversion works, then grd_usd will have that Decimal object assigned. The object will not be None and the loop will terminate. Victory!

If the conversion of entry to a Decimal value fails, the exception will be raised. We'll print a message, and leave grd_usd alone. It will still have a value of None. The loop will continue until a valid value is entered.
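If we need this recipe in more than one place, we might wrap it in a function. This is a sketch, not the only way to do it: get_decimal() and its reader parameter are names we made up, and the reader parameter exists only so that we can try the loop with scripted answers instead of a keyboard:

```python
from decimal import Decimal, InvalidOperation

def get_decimal(prompt, reader=input):
    """Keep asking until the reader supplies a valid Decimal value."""
    value = None
    while value is None:
        entry = reader(prompt)
        try:
            value = Decimal(entry)
        except InvalidOperation:
            print("Invalid: ", entry)
    return value

# Scripted answers in place of a keyboard: one bad entry, then a good one
answers = iter(["123.%$6", "247.616"])
grd_usd = get_decimal("GRD conversion: ", reader=lambda prompt: next(answers))

print(grd_usd, "GRD = 1 USD")
```

In a real script, we'd simply call get_decimal("GRD conversion: ") and let the default input() function do the asking.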

Python has other kinds of loops; we'll get to them later in this chapter.