Exploring Groups and Codes
with Python

Part 4: Putting Theory into Practice

by
Kirby Urner
First Posted: March 22, 2001
Last Updated: July 16, 2001


Our Enigma-style encrypting machine will load a user-defined number of permutations p1, p2... Each permutation, when sent into a power-loop, defines a subgroup which periodically cycles back through the identity element. The length of this period is the order of the subgroup (and will divide the order of the group).

These so-called rotors may be randomly initialized or, if the user needs to duplicate the setup of another enigma object, may be initialized with specific substitution dictionaries.

The rotors are setup "odometer style" such that just as the left-most completes a complete power cycle, the one just to its right "clicks" one position, i.e. goes to a next higher power. All the rotors multiplied together give the substitution at any given moment -- and this keeps changing, because we drive the left-most rotor one click with each symbol to be enciphered.

A Secret Key System

The only reason this is a viable scheme is because multiplication is associative and because every permutation has an inverse. If we know the rotors that were used, and their order, we can run the machine in reverse against the ciphertext, retracing our steps, to regain the plaintext. So the rotors and their order become the information which needs to get to our intended recipient (and no one else). This is a classic example of a "secret key" system, which depends on two people having access to the same secret key.

Sometimes people work out a way to generate the same keys without transmitting anything. For example, there might be an algorithm which uses the xth story on the yth page of the New York Times for any given day, and somehow builds substitution dictionaries from that text. If our sender and receiver share this algorithm, and access to the New York Times, it's conceivable that they could build identical enciphering and deciphering machines on any given day (they also need to have the same values for x and y).

An "odometer" set to advance the rotors in synch with their respective periods (each defines a subgroup of a particular order i.e. size), is a special case of a mixed base number. You can think of an ordinary base 10 number as a series of rotors all set to modulo 10. When we add to this number, rotors going from 9 back to 0 trigger a "carry" to the next rotor, which clicks to a next position, and perhaps carries further to the left, and so on.

A "mixed base" number works the same way, except the modulus for each slot may be different. A good example is our time-keeping system, wherein the units are seconds, minutes, hours, days, and weeks such that 60 seconds = 1 minute, 60 minutes = 1 hour, 24 hours = 1 day, 7 days = 1 week. Adding seconds to the left-most digit will drive our "time piece" according to the modulus at each position.

In the last section, we used R-class elements as members of an Rgroup. These elements are defined with respect to a modulus in just the way we need. Our mixed base Number class will put R-class objects in successive odometer positions and keep track of the carries, such that adding to our number will drive it upward in just the way we need to keep our rotors rotating.

classNumber:

    def__init__(self,columns):
        self.columns = columns
        bases = []
        for i in self.columns:
            bases.append(i.n)
        self.bases = bases

Below is the initialization method for Enigma. It expects permutations as inputs, computes their respective orders, and uses this information to initialize a counter object, one of our mixed based Number-class objects.

classEnigma:

    def__init__(self,rotors):
        self.rotors=rotors
        columns = []
        for rotor in rotors:
            columns.append( R(0,rotor.ord()) )        
        self.counter = Number(columns)
        self.permute = P(mkdict([])) # initialize w/ identity dictionary

Number objects are fun to play with. For example, if you initialize all the slots to elements of the same modulus (e.g. 2), we can use the Number object as a base-converter:

>>> converter = Number( (R(0,2),R(0,2),R(0,2),R(0,2)) )
>>> converter
(0, 0, 0, 0)
>>> converter = converter + [0,0,0,13]  # use list to add
>>> converter   # returns 13 in base 2
[1, 1, 0, 1]
>>> 8 + 4 + 0 + 1     # confirm answer
13

Likewise, we can initialize a mixed base Number on the model of time-keeping, and convert a number of seconds into the corresponding number of weeks, days, hours and minutes (plus any seconds left over):

>>> converter = Number ( (R(0,52),R(0,7),R(0,24),R(0,60),R(0,60)) )
>>> converter = converter + [0,0,0,0,1000000]  # add a million seconds...
>>> converter     # = 1 week, 4 days, 13 hours, 46 minutes and 40 seconds
[1, 4, 13, 46, 40]
>>> 7*24*60*60 + 4*24*60*60 + 13*60*60 + 46*60 + 40  # confirm answer
1000000

Our Enigma-class contains encrypt and decrypt methods which expect lines of text. As encrypt visits each character, it multiplies the object's internal substitution dictionary by another power of itself.

If advancing the left-most counter digit makes a difference to any digits up the line, the appropriate permutations are likewise applied. The click method actually drives this process of advancing the rotors, multiplying with rotors on the left when encrypting, or multiplying with inverted rotors on the right when decrypting.

>>> enigma = Enigma(( P(), ))  #  create a one-rotor enigma object
>>> enigma                     #  rotor's repeat cycle = 20
Enigma-class object: rotors of order [20]
>>> enigma.rotors[0]           #  let's view the actual permutation
Permutation: [('Z', 'C', 'X', 'Y', 'K', 'V', 'E', 'S', 'A', 'D', 'J', 
'G', 'O', 'B', 'M', 'Q', 'P', 'N', 'T', 'F'), ('W', 'R', 'L', 'H')]
>>> enigma.encrypt("AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA")  # simple string
'DJGOBMQPNTFZCXYKVESADJGOBMQPNT'

Decrypting undoes encrypting, because e = p1 * p2...* pn * pn.inv() * ...p2.inv() * p1.inv(), where e is the identity permutation.

>>> enigma.reset()  # reinitialize at start of each decryption
>>> enigma.decrypt('DJGOBMQPNTFZCXYKVESADJGOBMQPNT')  # reversing ops
'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'

We haven't done anything with files so far, but Python makes it easy to read and write to disk. A full-blown encyption system will likely read long passages and convert them on the fly. Here is a short passage saved in sample.txt, which we will encrypt using an enigma object.

   Fourscore and seven years ago our fathers brought forth
    on this continent a new nation, conceived in liberty and
    dedicated to the proposition that all men are created
    equal. Now we are engaged in a great civil war, testing
    whether that nation or any nation so conceived and so
    dedicated can long endure. We are met on a great battlefield
    of that war. We have come to dedicate a portion of that field
    as a final resting-place for those who here gave their lives
    that that nation might live. It is altogether fitting and
    proper that we should do this. But in a larger sense, we
    cannot dedicate, we cannot consecrate, we cannot hallow this
    ground. The brave men, living and dead who struggled here
    have consecrated it far above our poor power to add or detract.
    The world will little note nor long remember what we say here,
    but it can never forget what they did here.

Notice that this passage contains some symbols in addition to the letters A-Z, such as puncutation and of course the space character. It's also a mix of upper and lower case letters, but to keep things simple, we will stick with the practice of treating all letters as uppercase. The _domain may be enhanced slightly however, such that some punctuation symbols enter into the mappings. Instead of stripping out all the punctuation, or leaving where it is, as a clue to would-be code breakers, we simply make these symbols (including the space character) part of our substitution dictionary. This mades the resulting ciphertext all that much harder to decode.

>>> ciphers._domain = list(uppercase+".,- ")  # A-Z + some punctuation
>>> p1,p2,p3 = P(),P(),P()  # generate three random rotors
>>> p1
Permutation: [('.', 'O', 'Q', 'W', 'R', '-', 'N', 'Z', 'I', 'P', ' ', 'V',
'M', ',', 'U', 'T', 'K', 'J', 'G', 'X', 'L', 'H', 'S', 
'B', 'Y', 'F', 'A', 'E', 'D', 'C')]
>>> p2
Permutation: [('.', 'H'), (',', 'Z', 'S', 'P', 'U', 'K', '-', 'Q', 'L',
'W', 'I', 'A', 'F'), (' ', 'T', 'J'), ('X', 'M', 'G', 'V'), ('Y', 'R', 
'D', 'C', 'B'), ('N', 'E', 'O')]
>>> p3
Permutation: [('.', 'J', 'L', 'U', 'F', 'R', 'C', 'K', 'W', 'G', 'E', 
'Y', 'D', 'H', 'P', ' ', 'N', 'Z', 'O', 'M', ',', 'S', 'T', 'V', 'Q', 
'B', 'I')]
>>> enigma = Enigma((p1,p2,p3))
>>> enigma   # the leftmost rotor rarely turns in this case
Enigma-class object: rotors of order [30, 780, 27]
>>> encipher("sample.txt","output.txt",enigma) # encipher the sample text
   R,CGIDBHMBAJBLGIHJWWRRALMPUFHWXTRXIUQEXWTMSXZXCB RK,YO.
   PLLBLANUGN- .ESA.XMYDAV.LNYVWSDWP.DZWIRMAFZOOBSSU,SFC,LC
   O -WC,MAIHFWKSKEQQJCVWXM,LUXXLXHBSH-GF MTTHSBIFFFLEKV
   VYRLDR.BQMYHN,LIQR,PXCYIGVRWWCFVJWCZ-WJSXWZ-CAXSTE,CVPZ
   KTBMC,JVEV-.WR-O,KZ KCWVSZXRVYQJMMGCYRQG.YXATFGJTGBIQ
   UBOETJZVCRQJZZQSM-SFG.H ZHWUAJKGC,VDIHF  KRHB B-ZVB.VGO,GSPK
   LNE,EBMYRBAIGZIW AL VGJUXJZCQLJQMPAWKHAZZHXIERJJS.IBIUQEPGCO-
   UTLU RB,MASIXNREBWJ-XMQ-OUCBILIOTDODOET.HG.PTNRIQ-Q,CPRIETYX
   NG,UXHX,KSJ,YZTMMI-R LVUIYDLCQMONL,HPXFYIDQK,GIOCV,VQ LJ 
   .KPR. NZNLIAPCZVZGNNDJ.PVSVUA L ZWA.RRCHTCSRH-.SRWEPCS W
   K-HUQFYUHYJ.-WLWPXQ ZV.O-JFBMLCF..VUMUZJNDRVAGUVBXJ RSXD-DM 
   DLRLR-LIUIQU.WK-UT.AIE. NF H.VKQVFGWKIMEMXNPH,IUNS-DAHP-H
   JBUJ,RTGVHFFALTPBFKJTAPQAGRGQRZJY  HSGHHA,N DBRCUKC-YAGB-NMUVK-
   IOHTHPGCYBTSO,V.,H UACGWBIWCBFXQSDCSWRUD LVM , NC.XYZQ,CS-TP-F
   YZEFCVMI,YYVXOFB NWFG LEXEHOKFKPKNQVNRFOYYN

Again, notice that the same letter does not always have the same other letter swapped in, as per the simple scheme of the clubhouse codes. Because the rotors keep changing the substitution dictionary, we have no way to look for common word patterns e.g. it does no good to find the most frequently occuring letter and guess it might really be E (the most-used letter in English).

The real Enigma machine was not designed exactly like our Enigma class. It's odometer was not mixed base nor geared to the order of the rotor's subgroup. However, the design principles are similar.

image
>>> decipher("output.txt","decrypted.txt",enigma)

    FOURSCORE AND SEVEN YEARS AGO OUR FATHERS BROUGHT FORTH
    ON THIS CONTINENT A NEW NATION, CONCEIVED IN LIBERTY AND
    DEDICATED TO THE PROPOSITION THAT ALL MEN ARE CREATED
    EQUAL. NOW WE ARE ENGAGED IN A GREAT CIVIL WAR, TESTING
    WHETHER THAT NATION OR ANY NATION SO CONCEIVED AND SO
    DEDICATED CAN LONG ENDURE. WE ARE MET ON A GREAT BATTLEFIELD
    OF THAT WAR. WE HAVE COME TO DEDICATE A PORTION OF THAT FIELD
    AS A FINAL RESTING-PLACE FOR THOSE WHO HERE GAVE THEIR LIVES
    THAT THAT NATION MIGHT LIVE. IT IS ALTOGETHER FITTING AND
    PROPER THAT WE SHOULD DO THIS. BUT IN A LARGER SENSE, WE
    CANNOT DEDICATE, WE CANNOT CONSECRATE, WE CANNOT HALLOW THIS
    GROUND. THE BRAVE MEN, LIVING AND DEAD WHO STRUGGLED HERE
    HAVE CONSECRATED IT FAR ABOVE OUR POOR POWER TO ADD OR DETRACT.
    THE WORLD WILL LITTLE NOTE NOR LONG REMEMBER WHAT WE SAY HERE,
    BUT IT CAN NEVER FORGET WHAT THEY DID HERE.

A Public Key System

The RSA algorithm, when fully developed and implemented, is actually quite sophisticated. A long message has to be carved up into smaller chunks and block-encoded, but with the chunks chained together and treated to endow them with necessary properties. Our purpose here is not to build a commerical product, but to understand the number-theoretic basis for it, which is a comparatively simple undertaking.

As we discussed in the last section, numbers relatively prime to n, when self-multiplied modulo n, will cycle through the identity element with a period equal to the totient of n, or phi(n). They may cycle through more frequently, with a period that divides phi(n), but the point is phi(n) or some multiple thereof will always work to return us to 1 mod n, and the very next self-multiplication, being against the identity element, takes us back to our original number! So any exponent equivalent to 1 mod phi(n) will return us back to the start, to m. Given c, the encrypted message, is already m to some power e, we need to find a d such that e*d = 1 mod phi(n).

Again, the basic strategy is one of exponentiation: we raise a message to a power, thereby encrypting it. But because we know how to complete the power-cycle, ending at an exponent of 1 mod phi(n), decryption is possible. Even if we know what the encrypting exponent is, (e.g. 3), and what the modulus is (public key n), without knowing the factors of n we will not have a way for computing e's inverse d, such that e*d = 1 mod phi(n) -- because we won't even know phi(n)! But with the prime factors of n in hand, we'll be able to get a value for d, and raising the encrypted message to the dth power, will be equivalent to raising the unencrypted message to the first power, which will return the message itself.

A few caveats are in order. Firstly, it's important that m < n to start, otherwise m mod n, which is the number we return to at the end of the day, will be equivalent to our message, modulo n, but will not be the message itself. So the numeric value of our message chunks needs to remain under the static numeric value of n (remember, n is the product of very large primes, so a message chunk is likely over 1K bits, or over 125 ASCII characters -- not impractical). Secondly, m^e needs to be larger than n, otherwise decryption is simply a matter of taking the eth root of c. In that case, we won't have a discrete logarithm problem at all, and it's the difficulty of solving this problem, as well as that of factoring large numbers, upon which the security of this algorithm depends. Finally, we have to make sure that our proposed e doesn't divide phi(n) evenly (without remainder), as if it does so, then no specific value of d, such that d*e mod phi(n) = 1, will be obtainable. If you have your heart set on a particular e, then keep trying out different n's until you've got one that works (i.e. have the software do this for you).

Factoring very large numbers is a difficult proposition, if those numbers were designed to have very few prime factors, say only two, and both of them very large (over a hundred digits apiece). No one has yet devised an algorithm which will "crack" such a number into its prime factors in any reasonable amount of time. This is what makes it safe to share N as a public key: we don't expect anyone to be able to factor it, thereby gaining access to the totient information and having a way to complete the power-cycle started by the encrypting exponent.

The discerning reader might ask why we cannot regain the message by reversing the powering m ** 3 mod n (assume we know e = 3). In other words, let c = m ** 3 mod n. This means nk + c = m ** 3, or m = (nk + c) ** (1/3), where k is some non-negative integer. k will vary depending on the original m, so if a message is broken down into fixed-length message blocks, then we will need to find k for each block. Since integers with integer cube roots are relatively rare, and further apart the higher our n, we would assume we had the right value for k any time a precise cube root for (nk + c) was found.

The above approach is called finding the discrete logarithm, and is as difficult to accomplish as factoring given current knowledge.

The first challenge for the RSA algorithm is to come up with large primes p and q, factors for the composite N (the shared, public key). How shall we determine whether an 80-digit number, chosen at random, is really prime or not? Trial-by-division requires testing prime factors up to the 2nd root of the candidate, and that's impractical here. Other tests, such as the Fermat Test, might be used, but it turns out that all we really need are factors that are very probably prime, and the Miller-Rabin Algorithm is just such a test for probable primehood. A number which survives this test, when its set to a high standard, is very likely prime, and that suits our needs insofar as RSA is concerned.

Some of the algorithms needed to set up an RSA demo are included in the primes.py module, used in my Numeracy + Computer Literacy series. You will need to have that available if you want to use the RSA methods included in ciphers.py. Below, we generate a public key modulus n, and a private factor d, the inverse of 3 mod phi(n).

>>> n,d = rsasetup()
Working...
Percent chance of being prime: 99.9999999999
Elapsed time: 0.267604217302 seconds
Working...
Percent chance of being prime: 99.9999999999
Elapsed time: 0.0669421210546 seconds
Working...
Percent chance of being prime: 99.9999999999
Elapsed time: 0.0665314537619 seconds
OK!
Working...
Percent chance of being prime: 99.9999999999
Elapsed time: 0.151537907106 seconds
Working...
Percent chance of being prime: 99.9999999999
Elapsed time: 0.102013107829 seconds
Working...
Percent chance of being prime: 99.9999999999
Elapsed time: 0.0940268861364 seconds
Working...
Percent chance of being prime: 99.9999999999
Elapsed time: 0.12831173838 seconds
Working...
Percent chance of being prime: 99.9999999999
Elapsed time: 0.0843250808766 seconds
OK!
>>> n
7079808081698252822563131744360695812933441314982530047888027726444644
70451211790779808067353611870844631247L
>>> d
4719872054465501881708754496240463875288960876572912006389314194060669
22604029391865930584361685445056166747L

Although the resulting n is certainly a long integer, it's not as long as a real RSA public key would likely be. I've settled for a shorter n for demonstration purposes, although Python is certainly capable of producing a longer n using the very same algorithms -- the computer just takes a few more seconds.

The value of d was extracted using what's called the extended version of Euclid's Algorithm for finding the gcd of two numbers -- also called the binary gcd algorithm. This algorithm returns the inverse of a mod b, such that a * inverse(a,b) = 1 mod b. In this case, b is phi(n) and a is our encryption exponent e (=3 in this implementation), so d corresponds to inverse(3,phi(n)). Here's a look at bingcd in action:

>>> e = 3
>>> totient = 123941983   # number of relative primes < some N
>>> bingcd(e,totient)    # what number * e mod totient = 1?
(82627989, -2, 1)
>>> (3*82627989) % 123941983  # checking...
1
>>> b = bigppr()         # generate some random 100-digit probable prime
Working...
Percent chance of being prime: 99.9999999999
Elapsed time: 1.22037747867 seconds
>>> b                    
537746793031253143913132035521159306880773720889164352707098986334
85552352540402796114645240686711639L
>>> inverse(e,b)          # compute the inverse d, so that e*d mod b = 1
358497862020835429275421357014106204587182480592776235138065990889
90368235026935197409763493791141093L
>>> (3*inverse(e,b)) % b  # checking...
1L
>>> e*inverse(e,b)/b      # b goes into e*inverse(e,b) 2x, w/ remainder 1
2L

You may recall that our R-class contains a method for computing the inverse of any R-object. However, this method depends on knowing the totient of the R-object's modulus. In this case, the modulus in question is phi(n), the totient of n. But we don't know the totient of that totient, i.e. we have no easy way to compute phi(phi(n)). This is why we need to resort to the binary gcd algorithm.

Below we raise a number of random integers to the 3rd power, then to the 3 * d power, and confirm that raising m ** 3 to the dth power essentially undoes the encrypting effects of raising m to the 3rd power. Again, this works because of the generalization proved by Euler: raising a member of an Rgroup to the totient power, or any multiple of the totient power, returns the identity element.

>>> checkrsa(d,n)  # check random m to see that m = m**(3*d) mod n
m mod n = 85 : m**3 mod n =     614125 : m**(3*d) mod n = 85
m mod n = 80 : m**3 mod n =     512000 : m**(3*d) mod n = 80
m mod n =  6 : m**3 mod n =        216 : m**(3*d) mod n =  6
m mod n = 26 : m**3 mod n =      17576 : m**(3*d) mod n = 26
m mod n = 37 : m**3 mod n =      50653 : m**(3*d) mod n = 37
m mod n = 57 : m**3 mod n =     185193 : m**(3*d) mod n = 57
m mod n = 91 : m**3 mod n =     753571 : m**(3*d) mod n = 91
m mod n = 14 : m**3 mod n =       2744 : m**(3*d) mod n = 14
m mod n = 26 : m**3 mod n =      17576 : m**(3*d) mod n = 26
m mod n = 54 : m**3 mod n =     157464 : m**(3*d) mod n = 54

def rsaencrypt(message,pubkey):
    """
    Using default enciphering exponent of 3,
    mod some pubkey = p*q (see rsasetup)
    """
    m = mknum(message) # string letter hex codes into long integer
    return pow(m,3,pubkey)

def rsadecrypt(c,privkey,n):
    """
    (3*d) = k*totient + 1 (k some integer>0), since
    d is the inverse of 3 mod totient.  

    Since c = m**3 mod pubkey as per rsaencrypt:
    c**d mod pubkey  =  m**(k*totient + 1) mod pubkey
                     =  m**(k*totient) * m mod pubkey
                     =  1**k * m mod pubkey (Euler's Theorem)
                     =  m mod pubkey
    """
    m = pow(c,privkey,n)
    return mkphrase(m) # convert long integer back to ASCII characters 
>>> c = rsaencrypt("Able was I ere I saw Elba",n)
>>> c
2479410906483657024660446624231176765858115584497063496232494084
48953241374598156100251118445532609845650325L
>>> rsadecrypt(c,d,n)
'Able was I ere I saw Elba'


For further reading:

Return to CP4E


oregon.gif - 8.3 K
Oregon Curriculum Network