The Cryptopals (née Matasano) Crypto Challenges, originally released in 2013, are a series of coding challenges in cryptography. They largely involve writing implementations of modern cryptosystems; then, more importantly, breaking them in well-understood but (at the time) rarely-implemented ways. The topics include symmetric- and public-key cryptosystems, digital signature schemes, cryptographic hashes, random number generators, timing attacks, and elliptic curves. All in all, the challenges are a lot of fun, and they ramp up from 'very easy' to 'somewhat challenging' to 'actually pretty difficult'.
This repository holds my Haskell solutions to all 66 challenges from all eight problem sets. I've put it online because:
- There aren't many complete solutions; most repositories trail off after solving the first couple of problem sets, and even the more complete ones usually only have the first six problem sets (all that were originally published);
- Very few solutions are online in Haskell.
My solutions are fairly heavily documented in Literate Haskell, written in Markdown to make them more easily readable on GitHub. I'm not claiming them as exemplary Haskell style, just a possible, fairly idiomatic way to implement the challenges.
There is a test for every Challenge, in the `test/` directory, which confirms that the cryptosystem or exploit works as expected.
Most of the Challenges have a module of their own, in the `src/` directory; this directory also contains functions and data structures useful in multiple Challenges, in their own modules.
There were originally six sets of eight problems each, sent by email, one set at a time on demand. Two more sets were eventually released, for a total of eight sets and 64 problems. By the time I did the challenges, several years after the original release, the first seven sets were available online at [https://cryptopals.com]. Set 8 was not available until it was re-released in 2018, with two bonus challenges, as part of a political fundraiser; it's now online at [https://toadstyle.org/cryptopals/].
- Convert hex to base64: The `Bytes` module introduces the `HasBytes` class, which describes anything that can be converted back and forth from a `ByteString`. The modules `Bytes.Hex` and `Bytes.Base64` contain instances for hex- and base64-encoded strings.
- Fixed XOR: Adds the `xorb` function, which performs a byte-by-byte XOR of two `ByteString`s, to the `Bytes` module.
- Single-byte XOR cipher: We can break a monoalphabetic XOR cipher by trying every possible key (there are 256 of them) and comparing the decrypted result of each against the expected statistical properties of the plaintext. In this case, we look just at the distribution of letter frequencies. `XORCipher` sets up and breaks a monoalphabetic XOR cipher; using the tools defined in the `Distribution` module, candidate plaintexts are compared against an approximate distribution of English text defined in `Distribution/English`. The `Util` module contains various small utility functions.
- Detect single-character XOR: Just try decrypting every line of the data file. Only one will have a decryption to anything like English text.
- Implement repeating-key XOR: Polyalphabetic XOR is carried out by the function `polyXOR` in `XORCipher`.
- Break repeating-key XOR: The function `breakPolyXOR` in `XORCipher` decrypts a polyalphabetic XOR cipher by splitting it into multiple monoalphabetic XOR ciphers. Splitting is done by `chunksOf`, defined in `Bytes`; some new functions were also added to `Util`.
- AES in ECB mode: We use AES primitives from the cryptonite package. The appropriate wrapper functions are defined in the `AES` module.
- Detect AES in ECB mode: I don't see how this could be done in general; I don't think any "normal" text will have lots of repeated 16-byte blocks. But one of the texts in the input file does, so (since ECB is a permutation on 16-byte blocks) it must be the text the Challenge is talking about. The function `countRepeats` in `Util` is the only added code.
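To make the single-byte XOR break above concrete, here is a minimal sketch using plain lists of `Word8` rather than the repository's `ByteString`-based modules. All of the names (`toBytes`, `repeatingXor`, `scoreEnglish`, `breakSingleXor`) are illustrative assumptions, and the scoring function is a deliberately crude stand-in for a real letter-frequency comparison: it only counts how many bytes decode to very common English characters.

```haskell
import Data.Bits (xor)
import Data.Char (chr, ord, toLower)
import Data.List (maximumBy)
import Data.Ord (comparing)
import Data.Word (Word8)

-- Illustrative helpers: convert between ASCII strings and raw bytes.
toBytes :: String -> [Word8]
toBytes = map (fromIntegral . ord)

fromBytes :: [Word8] -> String
fromBytes = map (chr . fromIntegral)

-- Repeating-key XOR: cycle the (non-empty) key along the message.
-- A one-byte key gives the monoalphabetic cipher.
repeatingXor :: [Word8] -> [Word8] -> [Word8]
repeatingXor key = zipWith xor (cycle key)

-- Crude English score: how many bytes are among the commonest characters?
scoreEnglish :: [Word8] -> Int
scoreEnglish = length . filter (isCommon . toLower . chr . fromIntegral)
  where isCommon c = c `elem` "etaoin shrdlu"

-- Try all 256 keys and keep the candidate that looks most like English.
breakSingleXor :: [Word8] -> (Word8, [Word8])
breakSingleXor ct =
  maximumBy (comparing (scoreEnglish . snd))
    [ (k, repeatingXor [k] ct) | k <- [minBound .. maxBound] ]
```

Breaking repeating-key XOR then amounts to transposing the ciphertext into one column per key byte and running `breakSingleXor` on each column.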
- Implement PKCS#7 padding: The function `padPKCS7` is defined in `Padding.PKCS7`.
- Implement CBC mode: Encrypting is a simple left scan, while decrypting is just a zip. `encryptCBC` and `decryptCBC` are defined in `AES`.
- An ECB/CBC detection oracle: Our first chosen-plaintext attack. We can tell if an encryption system is using ECB or CBC by encrypting a plaintext with many repeated blocks; an ECB-encrypted text will then have many repeated ciphertext blocks, while the text encrypted with CBC will not. This test is carried out by module `Challenge11`, using new functions from the modules `Random` and `BlockTools`.
- Byte-at-a-time ECB decryption (Simple): "Simple" in that we only have to find the unknown suffix to our chosen plaintext. `Challenge12` has the relevant code, with helper functions in `BlockTools`.
- ECB cut-and-paste: Break an encrypted user profile to gain admin privileges. This involves writing a profile creator, sanitizer, and validator, then a function which breaks it. Both are in `Challenge13`, supported by PKCS#7 validation in `Padding.PKCS7` and new functions from `BlockTools` and `Util`.
- Byte-at-a-time ECB decryption (Harder): "Harder" because we now have to deal with a prefix of unknown length. It's not much harder, though; it's pretty simple using our existing machinery to turn an infix oracle into something that our code from Challenge 12 can solve. The function to do that is in the module `Challenge14`.
- PKCS#7 padding validation: We already did this to solve Challenge 13, so there's nothing new here.
- CBC bitflipping attacks: Because in CBC mode we XOR each ciphertext block against the next plaintext block, any change to a ciphertext block will cause the equivalent XOR to the next plaintext block. We can use this to inject all sorts of nasty stuff into plaintext. The code is in `Challenge16`.
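The "left scan" and "zip" descriptions of CBC, together with PKCS#7 padding, can be sketched in a few lines. This is only an illustration under stated assumptions: blocks are plain `[Word8]` lists, the block cipher is passed in as an arbitrary function (so any toy permutation can stand in for AES), and the names `pkcs7Pad`, `pkcs7Unpad`, `cbcEncrypt`, and `cbcDecrypt` are hypothetical rather than the ones used in `AES` or `Padding.PKCS7`.

```haskell
import Data.Bits (xor)
import Data.Word (Word8)

type Block = [Word8]

-- PKCS#7: append n bytes of value n, where 1 <= n <= block size.
-- Assumes the block size is below 256.
pkcs7Pad :: Int -> [Word8] -> [Word8]
pkcs7Pad size msg = msg ++ replicate n (fromIntegral n)
  where n = size - length msg `mod` size

-- Validate and strip PKCS#7 padding; Nothing means the padding is invalid.
pkcs7Unpad :: [Word8] -> Maybe [Word8]
pkcs7Unpad [] = Nothing
pkcs7Unpad msg
  | n >= 1 && n <= length msg && all (== last msg) pad = Just body
  | otherwise = Nothing
  where
    n = fromIntegral (last msg)
    (body, pad) = splitAt (length msg - n) msg

xorBlock :: Block -> Block -> Block
xorBlock = zipWith xor

-- CBC encryption is a left scan: each output feeds into the next step.
cbcEncrypt :: (Block -> Block) -> Block -> [Block] -> [Block]
cbcEncrypt enc iv = drop 1 . scanl (\prev p -> enc (prev `xorBlock` p)) iv

-- CBC decryption is a zip of each ciphertext block with its predecessor.
cbcDecrypt :: (Block -> Block) -> Block -> [Block] -> [Block]
cbcDecrypt dec iv cs = zipWith (\prev c -> dec c `xorBlock` prev) (iv : cs) cs
```

The zip form also shows why bitflipping works: XORing a value into ciphertext block i changes `prev` for block i+1, so the same value is XORed into plaintext block i+1.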
- The CBC padding oracle: My favourite Challenge thus far. Because we know what bytes have to appear at the end of valid PKCS#7 padding, we can use a padding oracle, which only tells us whether a chosen cipher's padding is valid, to completely decrypt any message. The function `breakCBCPadding` is defined in module `Challenge17`.
- Implement CTR, the stream cipher mode: Encrypt the numbers 0, 1, 2... and use the resulting blocks as a keystream to XOR against the message. Encryption and decryption functions added to the `AES` module, with some utilities in `Util` and `Bytes.Integral`.
- Break fixed-nonce CTR mode using substitutions: AKA "Try to solve this manually so you see how much easier it is in Challenge 20 when we do it systematically!" I fiddled with this in ghci for a while, and decrypted most of the lines; but systematically is better! Thus:
- Break fixed-nonce CTR statistically: Fixed-nonce CTR is just a repeated keystream, i.e. a polyalphabetic XOR cipher. We can use pretty much the same machinery to break it.
- Implement the MT19937 Mersenne Twister RNG: The Mersenne Twister implementation is in module `MersenneTwister`.
- Crack a MT19937 seed: AKA "Why you shouldn't use the current time to seed your RNG". What code is needed is in `Challenge22`.
- Clone an MT19937 RNG from its output: `cloneMT` added to the `MersenneTwister` module. Most interesting to me is that you can untemper any 624 successive RNG outputs, stick them into a single block, and get a new MT generator that reproduces the exact output of the first, but does twisting at a different time.
- Create the MT19937 stream cipher and break it: For the first part of the Challenge, the keyspace is so small that we can just brute-force it. Is there some cleverer way to proceed, using the properties of MT specifically? Code in `Challenge24`.
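The untempering step behind cloning an MT19937 generator can be sketched as follows. The tempering constants are the standard MT19937 ones; the function names are illustrative assumptions and this is not the code of `cloneMT`. Each tempering step has the form `y = x XOR (shifted x AND mask)`, so `x` can be recovered by iterating that equation from `y` until all 32 bits have settled.

```haskell
import Data.Bits (shiftL, shiftR, xor, (.&.))
import Data.Word (Word32)

-- The MT19937 output tempering function.
temper :: Word32 -> Word32
temper y0 = y4
  where
    y1 = y0 `xor` (y0 `shiftR` 11)
    y2 = y1 `xor` ((y1 `shiftL` 7) .&. 0x9D2C5680)
    y3 = y2 `xor` ((y2 `shiftL` 15) .&. 0xEFC60000)
    y4 = y3 `xor` (y3 `shiftR` 18)

-- Invert y = x XOR (x >> s). The top s bits of y are already correct,
-- and every iteration fixes at least s more bits, so 32 rounds suffice.
undoRight :: Int -> Word32 -> Word32
undoRight s y = iterate (\x -> y `xor` (x `shiftR` s)) y !! 32

-- Invert y = x XOR ((x << s) AND m), working upward from the low bits.
undoLeft :: Int -> Word32 -> Word32 -> Word32
undoLeft s m y = iterate (\x -> y `xor` ((x `shiftL` s) .&. m)) y !! 32

-- Undo the four tempering steps in reverse order.
untemper :: Word32 -> Word32
untemper =
  undoRight 11 . undoLeft 7 0x9D2C5680 . undoLeft 15 0xEFC60000 . undoRight 18
```

Mapping `untemper` over 624 consecutive outputs gives a state block that can seed the clone.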
- Break "random access read/write" AES CTR: "Because K XOR 0 == K!" Code for both editing and breaking the cipher in `Challenge25`.
- CTR bitflipping: Even easier than the CBC variant. Code in `Challenge26`.
- Implement a SHA-1 keyed MAC: The first hash-authentication Challenge. SHA-1 hashes and MACs are defined in the module `Hash`.
- Break a SHA-1 keyed MAC using length extension: This attack is so easy that it's amazing that it's "very useful" in the real world. The attack is in `Challenge29`, and it needs the definition of SHA-1 padding from `Padding.Hash`.
- Break an MD4 keyed MAC using length extension: The same as Challenge 29, with MD4 instead of SHA-1. The attack is in `Challenge30`; we also have to add MD4-specific stuff to `Hash` and `Padding.Hash`.
- Implement and break HMAC-SHA1 with an artificial timing leak: This one has a lot of moving parts. Since we need to spin up a webserver for testing, there's a new test driver, `TimingTests`. The webserver validates requests authenticated by an HMAC (newly added to `Hash`); but the HMAC is compared using `insecure_compare` (defined in module `Timing`), which inserts an artificial, tunable pause between the comparison of each sequential byte. This means that there is a measurable difference between the comparison of two strings which match on the first n bytes and two which match on the first n+1 bytes. The timing attack (implemented in `Challenge31`) uses this difference to discover the valid MAC by repeated queries.
- Break HMAC-SHA1 with a slightly less artificial timing leak: The attack from Challenge 31 only works consistently with a per-byte delay down to about 30ms. By using multiple queries per candidate, and some simple statistics, we can get an attack that finds the MAC even with a delay four orders of magnitude smaller. I haven't tried running this with a delay below 5µs; it worked with that delay, but took almost all weekend and over a million queries. The updated attack is implemented in `Challenge32`.
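The length-extension attacks hinge on reproducing the hash's own padding. The sketch below computes SHA-1 style padding for a message of a given byte length, and shows the message an attacker would claim to have been MACed. The names `mdPadding` and `forgeMessage` are assumptions for illustration, not the API of `Padding.Hash`; MD4 uses the same layout but writes the length little-endian.

```haskell
import Data.Bits (shiftR)
import Data.Word (Word64, Word8)

-- SHA-1 padding for a message of the given length in bytes: a 0x80 byte,
-- zeros up to 56 mod 64, then the bit length as a big-endian 64-bit integer.
mdPadding :: Int -> [Word8]
mdPadding len = 0x80 : replicate zeros 0 ++ lengthBytes
  where
    zeros = (55 - len) `mod` 64
    bitLen = 8 * fromIntegral len :: Word64
    lengthBytes = [fromIntegral (bitLen `shiftR` s) | s <- [56, 48 .. 0]]

-- The forged message for a guessed key length: the original message, the
-- "glue" padding that the MAC computation appended to key ++ original,
-- and then whatever suffix the attacker wants to add.
forgeMessage :: Int -> [Word8] -> [Word8] -> [Word8]
forgeMessage keyLen original suffix =
  original ++ mdPadding (keyLen + length original) ++ suffix
```

The attacker then resumes hashing the suffix from the known MAC value, using a total length of key, original, glue, and suffix for the final padding.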
- Implement Diffie-Hellman: Halfway through the Challenges, we come to number-theoretic cryptography. One of the most-used modules from now on will be `Modulo`, which implements arithmetic on numbers modulo a large integer. Diffie-Hellman itself is described in the module `PublicKey.DiffieHellman`, with more general public key infrastructure defined in `PublicKey`.
- Implement a MITM key-fixing attack on Diffie-Hellman with parameter injection: We simulate a communication protocol with threads communicating via `MVar`s; this machinery is in `CommChannel`. The protocols for participants A and B, and the man-in-the-middle, are defined in `Challenge34`.
- Implement DH with negotiated groups, and break with malicious "g" parameters: More DH parameter meddling. The A and B participants are the ones from Challenge 34, while the new eavesdroppers are defined in `Challenge35`.
- Implement Secure Remote Password (SRP): The toy client and server implementations are in `Challenge36`.
- Break SRP with a zero key: The (hilariously simple) attacker is in `Challenge37`.
- Offline dictionary attack on simplified SRP: The updated, weaker SRP client and server, plus the impostor server, are defined in `Challenge38`.
- Implement RSA: RSA is in `PublicKey.RSA`.
- Implement an E=3 RSA Broadcast attack: The attack is in `Challenge40`, but it's basically just integer cube root composed with the Chinese remainder theorem. Both are in the module `Math`.
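As a rough illustration of "integer cube root composed with the Chinese remainder theorem", here is a self-contained sketch on plain `Integer`s. The names (`egcd`, `modInv`, `crt`, `icbrt`, `broadcastAttack`) are assumptions and need not match what `Math` exports. The input is a list of (ciphertext, modulus) pairs for the same message encrypted under three pairwise-coprime moduli with e = 3.

```haskell
-- Extended Euclid: egcd a b = (g, s, t) with s*a + t*b == g.
egcd :: Integer -> Integer -> (Integer, Integer, Integer)
egcd a 0 = (a, 1, 0)
egcd a b = (g, t, s - q * t)
  where
    (q, r) = a `divMod` b
    (g, s, t) = egcd b r

-- Modular inverse of a modulo n, assuming gcd a n == 1.
modInv :: Integer -> Integer -> Integer
modInv a n = s `mod` n
  where (_, s, _) = egcd a n

-- Chinese remainder theorem for (residue, modulus) pairs, assuming the
-- moduli are pairwise coprime.
crt :: [(Integer, Integer)] -> Integer
crt pairs =
  sum [r * m * modInv m n | (r, n) <- pairs, let m = total `div` n] `mod` total
  where total = product (map snd pairs)

-- Integer cube root by bisection: the largest x with x^3 <= n (n >= 0).
icbrt :: Integer -> Integer
icbrt n = go 0 (n + 1)
  where
    go lo hi
      | hi - lo <= 1 = lo
      | mid ^ (3 :: Int) <= n = go mid hi
      | otherwise = go lo mid
      where mid = (lo + hi) `div` 2

-- Since m^3 is smaller than the product of the three moduli, the CRT
-- result is m^3 itself (no modular wraparound), so a cube root recovers m.
broadcastAttack :: [(Integer, Integer)] -> Integer
broadcastAttack = icbrt . crt
```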
- Implement unpadded message recovery oracle: Since RSA encryption is such simple math, it's almost transparent to operations like multiplication, which lets us do stuff like recover unpadded messages. The code is in `Challenge41`.
- Bleichenbacher's e=3 RSA Attack: Such a cool one! It's so easy to forget that the message has to go at the end of the byte string, and you pay for it if you do. RSA signatures (including a valid verifier) are implemented in `PublicKey.RSA`; they use the PKCS#1 padding scheme, implemented in `Padding.PKCS1`. The broken validator and an attack against it are in `Challenge42`.
- DSA key recovery from nonce: Calling all of these different non-repeated throwaway values "nonces" seems really misleading. (And we'll see more "nonces" later on!) DSA is implemented in `PublicKey.DSA`; code to break the use of a bad "nonce" is in `Challenge43`.
- DSA nonce recovery from repeated nonce: Once again, repeat the nonce, give up the game. Code is in `Challenge44`.
- DSA parameter tampering: More fun with invalid parameters. Unfortunately for us, setting g to zero won't work; if g is zero, then any signature with r = 0 is valid, but the DSA specification specifically states that no signature is valid if r is zero. Setting g to p+1 works, though. Code is in `Challenge45`.
- RSA parity oracle: Really cute, but only a warmup for what's coming next. Code is in `Challenge46`.
- Bleichenbacher's PKCS 1.5 Padding Oracle (Complete Case): Probably my favourite Challenge in the entire series. (Though there are a couple of really good ones coming later too.) Relies on the same idea as Challenge 46 - that you can multiply 'through' RSA encoding - but works by finding factors we can multiply by to get valid PKCS#1 padding. When we find a factor that gives us a valid padding, we can limit the possible message to some finite union of intervals. Challenge 47 is to implement only part of the algorithm - basically everything except handling multiple intervals. It's actually not much harder to go all the way to the full algorithm, so only the complete attack is implemented here; intervals, including intersection and union operations, are implemented in module `Interval`, while the attack itself is in `Challenge48`.

  This is the first attack in a while that takes more than a second or so to run (at least on my PCs). As such, there's a new test driver. It prints out the known upper bound after each iteration, resulting in a slow convergence from gibberish to the actual message. The Challenges call it a "hollywood style" decryption, so the new driver is `Hollywood`.
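The "multiply through RSA" idea shared by the parity-oracle and padding-oracle attacks can be sketched with the parity oracle, which is the simpler of the two. This is one possible formulation and not necessarily how `Challenge46` does it: each doubling of the plaintext reveals one bit of floor(2^i * m / n), and after enough doublings the message is pinned to an interval shorter than 1. The names `modPow` and `parityAttack` are illustrative, and the oracle is any function that reports whether a ciphertext's decryption is odd.

```haskell
-- Modular exponentiation by repeated squaring (exponent >= 0).
modPow :: Integer -> Integer -> Integer -> Integer
modPow _ 0 n = 1 `mod` n
modPow b e n
  | even e = half * half `mod` n
  | otherwise = b * modPow b (e - 1) n `mod` n
  where half = modPow b (e `div` 2) n

-- Recover the plaintext of c under public key (e, n), given only an oracle
-- that says whether a ciphertext decrypts to an odd number.
parityAttack :: (Integer -> Bool) -> (Integer, Integer) -> Integer -> Integer
parityAttack isOdd (e, n) c = (a * n + 2 ^ k - 1) `div` 2 ^ k
  where
    -- Number of doublings: the smallest k with 2^k >= n.
    k = length (takeWhile (< n) (iterate (* 2) 1))
    -- Multiplying a ciphertext by 2^e doubles the underlying plaintext.
    two = modPow 2 e n
    doubled = take k (drop 1 (iterate (\x -> x * two `mod` n) c))
    -- An odd result means the doubling wrapped around the (odd) modulus;
    -- collecting these bits yields a = floor (2^k * m / n).
    a = foldl (\acc x -> 2 * acc + (if isOdd x then 1 else 0)) 0 doubled
```

With the textbook toy key n = 3233, e = 17, d = 2753, the oracle `\x -> odd (modPow x 2753 3233)` is enough to recover any message below n from its ciphertext.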
- CBC-MAC Message Forgery: Fairly straightforward. CBC-MAC sounds like a not-so-good idea, since you can just add in a single glue block of your choice to paste two hashes together. Implementation and attacks are in `Challenge49`.
- Hashing with CBC-MAC: More fun with CBC-MAC. The idea of backing up from the end of the cipher to find the output hash of the block is pretty cool. Code in `Challenge50`.
- Compression Ratio Side-Channel Attacks: Very occasionally the test for this will fail. I think it's because the random session ID contains part of a word that appears elsewhere in the header, and thus the ID doesn't compress as a block; I'm not sure how to deal with this. The attack is in `Challenge51`.
- Iterated Hash Function Multicollisions: Merkle-Damgard hashes are implemented in `Hash.MerkleDamgard`, and general hash collision generators are in `Hash.Collision`. The Challenge is addressed in `Challenge52`.
- Kelsey and Schneier's Expandable Messages: Implementation in `Challenge53`; in addition, a new function for finding collisions between different hash functions (or different IVs) has been added to `Hash.Collision`.
- Kelsey and Kohno's Nostradamus Attack: This one's cute; we build a massive binary tree of hash collisions so we can more easily collide from the hash of our faked blocks into one of the leaves and down to the final hash value. Implementation is in `Challenge54`.
- MD4 Collisions: This one's exhausting. Implementing MD4 is just the start. There's a big table of conditions on individual bit values during MD4 evaluation, and the paper is not clear on where these values come from. Ensuring the first hundred or so conditions isn't so bad (after you lose your eyesight debugging by squinting at bit patterns), but for the last few dozen you start trampling on your own work. Again, the paper's suggestion of "more precise modification" is lacking in any explanation, besides a single example. Have fun figuring a way around that!

  The final implementation is in `Challenge55`; it's the longest source file in this repository by far.
- RC4 Single-Byte Biases: More statistics! This one's pretty slow (something like half an hour), and there are more slow ones coming up, so there's a new test driver `SlowTests`. The implementation (and table of the observed biases of more than forty billion ciphers) is in `Challenge56`.
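The multicollision idea from Challenge 52 can be shown on a toy scale. The sketch below uses a made-up compression function with an 8-bit state and 16-bit blocks, so collisions are cheap to find by pigeonhole; everything here (`compress`, `mdHash`, `findCollision`, `multicollisions`) is a hypothetical stand-in rather than the code in `Hash.MerkleDamgard` or `Hash.Collision`, and padding is left out for brevity.

```haskell
import Data.Bits (shiftR, xor)
import Data.Word (Word16, Word8)

-- A toy, deliberately weak compression function (an assumption for
-- illustration only): 8-bit state, 16-bit message block.
compress :: Word8 -> Word16 -> Word8
compress h b = fromIntegral (x `xor` (x `shiftR` 8))
  where x = (fromIntegral h * 257 + b) * 0x9E37 + 0x1234 :: Word16

-- Merkle-Damgard iteration without padding: fold the blocks into the state.
mdHash :: Word8 -> [Word16] -> Word8
mdHash = foldl compress

-- Find two distinct blocks that collide from the given state. With only
-- 256 possible outputs, a collision must occur within 257 blocks.
findCollision :: Word8 -> (Word16, Word16, Word8)
findCollision h = go [] [0 ..]
  where
    go seen (b : bs) =
      case lookup out seen of
        Just b' -> (b', b, out)
        Nothing -> go ((out, b) : seen) bs
      where out = compress h b
    go _ [] = error "unreachable: pigeonhole guarantees a collision"

-- Chain collisions: each one starts from the state the previous one reached.
collisionChain :: Word8 -> [(Word16, Word16, Word8)]
collisionChain h = c : collisionChain h'
  where c@(_, _, h') = findCollision h

-- n chained collisions yield 2^n messages with the same hash, for the cost
-- of only n collision searches.
multicollisions :: Int -> Word8 -> [[Word16]]
multicollisions n h0 = mapM (\(a, b, _) -> [a, b]) (take n (collisionChain h0))
```

For example, `multicollisions 3 0x42` produces eight three-block messages that all hash to the same value from the initial state `0x42`.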
- Diffie-Hellman Revisited: Small Subgroup Confinement: There's a lot going on with this Set. Many of the attacks are general; they work in any group. We're only dealing with the multiplicative group modulo a prime right now, but elliptic curves are coming up soon! The group-agnostic attacks are located in module `GroupOps`; there's a special `newtype` in `Modulo` with the appropriate `Semigroup` and `Monoid` instances. The Challenge itself is dealt with in `Challenge57`.
- Pollard's Method for Catching Kangaroos: The kangaroo chase (and the underlying reason it works) is one of the coolest things I learned doing these Challenges. It's implemented (and, I hope, clearly explained) in `GroupOps`. The Challenge using it is in `Challenge58`.
- Elliptic Curve Diffie-Hellman and Invalid-Curve Attacks: On to elliptic curves! They're defined (in the Weierstrass formulation) in module `EllipticCurve`, while EC Diffie-Hellman is defined in `PublicKey.ECDiffieHellman`. Generating random EC points is done in `Random`, using a modular square root function from `Math`. Nothing needs to change in `GroupOps`, since those operations work on elements of any group! The Challenge itself is `Challenge59`.
- Single-Coordinate Ladders and Insecure Twists: An alternate elliptic-curve formulation, from Montgomery; the space is defined in `EllipticCurve`, the EC Diffie-Hellman variant in `PublicKey.ECDiffieHellman`. The attack is in `Challenge60`.

  Unfortunately, here we find a problem with working with elliptic curves. Combining two group elements in modular integers is just a matter of a single modular multiplication. Combining two group elements in elliptic curves requires a modular inversion, which needs something like O(log n) divisions. This makes a lot of attacks more than an order of magnitude slower in ECs than modular integers. In this Challenge, the problem is magnified because we have to convert from Montgomery to Weierstrass elliptic curves for the attack; but the transformation is not one-to-one, so we end up having to test up to four possibilities. When each kangaroo chase is thirty times slower than its modular-integer equivalent, this adds up. As a result, this Challenge finds itself in `SlowTests`.
- Duplicate-Signature Key Selection in ECDSA (and RSA): This one's neat, and points out how the name "digital signature" is sort of misleading; just because you can validate a signature doesn't mean that you were the one who created it in the first place! Elliptic curve DSA is implemented in `PublicKey.ECDSA`; the attack on RSA uses the full Pohlig-Hellman and Pollard rho algorithms, which are implemented in `GroupOps`; we also need to solve modular linear equations, implemented in `Math`. The attacks themselves are in `Challenge61`.
- Key-Recovery Attacks on ECDSA with Biased Nonces: An interesting Challenge that introduced me to lattice basis reduction. The Lenstra-Lenstra-Lovász algorithm is implemented in module `LLL`, while the attack using it is in `Challenge62`.
- Key-Recovery Attacks on GCM with Repeated Nonces: Our whirlwind tour takes us on to GCM! Playing with Galois fields here kind of makes me wish I'd taken advanced algebra back in school. GCM and GCM-MACs are implemented in module `GCM`. The attack on repeated nonces is in `Challenge63`, and it uses a simple polynomial type defined in module `Polynomial`.
- Key-Recovery Attacks on GCM with a Truncated MAC: This is a completely different kind of attack, relying on the fact that some blocks in the GCM polynomial are squared, taken to the fourth power, and so on; since these are linear operations in GF(2^128), we can reduce changing these blocks to a problem in linear algebra! This is probably the most complex attack in the series; I had to go over it several times piece by piece before I really understood what it was doing and how it worked. Linear algebra on bit matrices is implemented in module `BitMatrix`; the attack itself is in `Challenge64`. This (and the follow-up, Challenge 65) is another slow one, so it's tested by the driver `SlowTests`.
- Truncated-MAC GCM Revisited: Improving the Key-Recovery Attack via Ciphertext Length Extension: The first of two bonus Challenges is also an extension of Challenge 64. This time we get to mess (slightly) with the length block, which in practice speeds up our first forgery by a factor of two. Since most of the time taken is for that forgery, it's actually a huge improvement. The Challenge is implemented in `Challenge65`.
- Exploiting Implementation Errors in Diffie-Hellman: The second bonus Challenge, and the last in the series (so far!). The implementation is in `Challenge66`. It's a slow one again, tested by `SlowTests`.
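To illustrate why group-agnostic code carries over from modular integers to elliptic curves unchanged, here is a small sketch. It passes the group operation and identity explicitly instead of using `Semigroup` and `Monoid` instances as `GroupOps` does, and the names `gpow`, `Point`, `Curve`, and `addPoints` are assumptions made for the example. The curve arithmetic is the usual Weierstrass chord-and-tangent rule over a prime field, with coordinates assumed to be reduced modulo p.

```haskell
-- Square-and-multiply in any group, given its operation and identity
-- (exponent >= 0).
gpow :: (g -> g -> g) -> g -> g -> Integer -> g
gpow op identity x n
  | n == 0 = identity
  | even n = sq
  | otherwise = op x sq
  where
    half = gpow op identity x (n `div` 2)
    sq = op half half

-- Points on a Weierstrass curve y^2 = x^3 + a*x + b over GF(p).
data Point = Infinity | Point Integer Integer
  deriving (Eq, Show)

-- Only a and p are needed for point addition.
data Curve = Curve Integer Integer

addPoints :: Curve -> Point -> Point -> Point
addPoints _ Infinity q = q
addPoints _ q Infinity = q
addPoints (Curve a p) (Point x1 y1) (Point x2 y2)
  | x1 == x2 && (y1 + y2) `mod` p == 0 = Infinity -- a point plus its negative
  | otherwise = Point x3 y3
  where
    -- Slope of the tangent (doubling) or of the chord (distinct points).
    lam
      | x1 == x2 = (3 * x1 * x1 + a) * inv (2 * y1) `mod` p
      | otherwise = (y2 - y1) * inv (x2 - x1) `mod` p
    x3 = (lam * lam - x1 - x2) `mod` p
    y3 = (lam * (x1 - x3) - y1) `mod` p
    -- Modular inverse via Fermat's little theorem (p prime), computed
    -- with the same generic gpow, this time over multiplication mod p.
    inv v = gpow (\s t -> s * t `mod` p) 1 (v `mod` p) (p - 2)
```

The same `gpow` serves both groups: `gpow (\s t -> s * t `mod` 101) 1 2 100` is exponentiation modulo 101, while `gpow (addPoints (Curve 2 17)) Infinity (Point 5 1) 2` doubles the point (5, 1) on the small textbook curve y^2 = x^3 + 2x + 2 over GF(17), giving (6, 3). The inversion inside `addPoints` is also where the extra cost of elliptic-curve arithmetic comes from.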