Optinimize [RE]

September 22, 2023 14 minute read

Reversing program wrote in nim-lang that uses the bigints library for big integer math.

Challenge Description

Nim is good at bignum arithmetic.

optinimize.tar.gz

Playing around

If we run this program, we see that it prints out the flag character by character, but it seemingly stops after a few characters in. Later on, I learned that the flag generation routine is just too slow, so we’ll have to optimize it, unsurprisingly, given the challenge name.

$ ./main
SECCON{3b42

Investigation

Opening this up in Ghidra, we can already see functions like NimMain, NimMainInner, etc. These tells us that we are working with nim-lang. There are also many functions in the format <name>...bigints...bigints_u.... Note the <name> portion as I will call the functions just by that name.

Upon searching “nim lang bigints”, I found this library: nimlang/bigints. Looking at the BigInts type in src/bigints.nim, and refer to initBigInt in ghidra, we can create a bigint type with the correct fields. Right click the variable (param1) > “Auto Create Structure” > right click again > “Edit Data Type”.

There are multiple initBigInt functions, mine uses .._u208 but either one can be used to create the custom type.

The mappings should be easy to figure out from the code + knowledge from the library source code we found earlier.

Understanding the binary

If we look at main (NimMainModule), we can retype some variables with our new bigint type, and Ghidra will give us better decompilations. For example:

void NimMainModule(void)
{
  ...
  do {
    _DAT_00118270 = 0;
    ...
    k__main_u69 = 0;
    _DAT_00118268 = 0;
    ...

Retyping k__main_u69 to bigint will turn this part into

void NimMainModule(void)
{
  ...
  do {
    k__main_u69.is_neg = 0;
    ...
    k__main_u69.seq_len = 0;
    k__main_u69.seq = 0;
    ...

Essentially, I just went around and retyping variables that were passed into initBigInt. Ignoring the details, we can see that NimMainModule loops through ns__main_u18 array and takes each element (n__main_u68), passing it into Q().

If we go into Q, we can retype and tidy up the code, and we see it calls P().

Because understanding the program statically is too hard, I rewrote the logic in python:

ns = [ 0x4a, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x55, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x6f, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x79, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x95, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xae, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xbf, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xc7, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xd5, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x06, 0x03, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xc8, 0x1a, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xba, 0x24, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x3d, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x43, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x26, 0x56, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xd9, 0x6a, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x03, 0x71, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x1b, 0x90, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x03, 0x9e, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xb6, 0x5f, 0x1e, 0x00, 0x00, 0x00, 0x00, 0x00, 0x64, 0xf7, 0x26, 0x00, 0x00, 0x00, 0x00, 0x00, 0x9e, 0xbd, 0x30, 0x00, 0x00, 0x00, 0x00, 0x00, 0x78, 0x76, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x3b, 0x17, 0x5b, 0x00, 0x00, 0x00, 0x00, 0x00, 0xb1, 0xe3, 0x6f, 0x00, 0x00, 0x00, 0x00, 0x00, 0x25, 0xef, 0x78, 0x00, 0x00, 0x00, 0x00, 0x00, 0x5f, 0x8e, 0x85, 0x00, 0x00, 0x00, 0x00, 0x00, 0x39, 0xc6, 0x98, 0x00, 0x00, 0x00, 0x00, 0x00, 0xf6, 0x6a, 0xad, 0x00, 0x00, 0x00, 0x00, 0x00, 0x96, 0x00, 0x08, 0x01, 0x00, 0x00, 0x00, 0x00, 0xcd, 0x08, 0x8e, 0x01, 0x00, 0x00, 0x00, 0x00, 0x07, 0x61, 0xbb, 0x01, 0x00, 0x00, 0x00, 0x00, 0xf1, 0x0f, 0xf5, 0x01, 0x00, 0x00, 0x00, 0x00, 0x27, 0x63, 0x5c, 0x02, 0x00, 0x00, 0x00, 0x00, 0xb6, 0x71, 0xa9, 0x02, 0x00, 0x00, 0x00, 0x00, 0x93, 0x84, 0xd6, 0x02, 0x00, 0x00, 0x00, 0x00, 0xc0, 0xf0, 0x62, 0x03, 0x00, 0x00, 0x00, 0x00, 0xad, 0x8e, 0x78, 0x03, 0x00, 0x00, 0x00, 0x00, 0xed, 0xa8, 0xca, 0x03, 0x00, 0x00, 0x00, 0x00 ]
cs = [ 0x3c, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xf4, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x1a, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xd0, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x8a, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x17, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x7c, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x4c, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xdf, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x21, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xdf, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xb0, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x12, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xb8, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x4e, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xfa, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xd9, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x2d, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x66, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xfa, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xd4, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x95, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xf0, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x66, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x6d, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xce, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x69, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x7d, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x95, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xea, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xd9, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x0a, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xeb, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x27, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x63, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x75, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x11, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x37, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xd4, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 ] 
assert(len(ns) == len(cs))

# UTIL FUNCTION
def arrconcatbytes(arr, size):
    result = []
    for index in range(len(arr) // size):
        val = 0
        for i in range(size):
            val |= arr[size*index + i] << (8 * i)
        result.append(val)
    return result

ns = arrconcatbytes(ns, 8)
cs = arrconcatbytes(cs, 8)

print("ns[] = { %s }" % ", ".join(map(str, ns)))
print("cs[] = { %s }" % ", ".join(map(str, cs)))

def P(x):
    a = 3
    b = 0
    c = 2

    if x == 0: x = a
    elif x == 1: x = b
    elif x == 2: x = c
    else:
        # (guessed this part)
        while 2 < x:
            a, b, c = b, c, a + b
            x -= 1

        x = c

    return x

def Q(x):
    i, j = 0, 0
    while i < x:
        j += 1
        if P(j) % j == 0:
            i = i + 1
    return j

def main():
    # not exactly the same codewise, but logically equivalent
    for i in range(len(ns)):
        k = Q(ns[i]) % 0x100
        print(chr(cs[i] ^ k), end='')

Solution

After laying the code out in python (I kind of guessed the final part in the while loop cause that part of the code barely made sense to me), I verified that it matched the output of the original program. Since I wanted to speed up the execution, it is good to know what sequence P generates, then we might be able to use some properties to speed up the loop in Q.

After running P(i) for the first few values of i (3, 0, 2, 3, 2, 5, …), I searched on OEIS and found that it generates the Perrin numbers. Q is just such that Q(x) is the x-th number n > 0, such that the n+1th Pierrin number or P(n) (P(0) is the 1st, etc.) is divisible by n. Let’s look at an example to better illustrate: Q(4) gives 5 because 1 | P(1), 2 | P(2), 3 | P(3), 5 | P(5), so the 4th is number 5.

Quoted from wikipedia:

It has been proven that for all primes p, p divides P(p). However, the converse is not true: for some composite numbers n, n may still divide P(n). If n has this property, it is called a “Perrin pseudoprime”.

Since there are only a few pseudoprimes (up to a reasonably large number) already given in the wiki page, we just need a fast way to generate all the primes (up to a large enough number), insert the pseudoprimes in, and we have a precomputed list of numbers satisfying the above constraint. We will then have to only look up the array at index x, and the value is Q(x).

For prime number generation, I asked a friend for the code for wheel factorization (fast prime generation) written in C++.

Final Script

#include <bits/stdc++.h>
using namespace std;

// Wheel factorization (don't know how it works, don't care)
vector<long long> sieve(const long long N, const long long Q = 17, const long long L = 1 << 15) {
    static const long long rs[] = {1, 7, 11, 13, 17, 19, 23, 29};
    struct P {
        P(long long p) : p(p) {}
        long long p;
        long long pos[8];
    };
    auto approx_prime_count = [](const long long N) -> long long {
        return N > 60184 ? N / (log(N) - 1.1)
                         : max(1., N / (log(N) - 1.11)) + 1;
    };

    const long long v = sqrt(N), vv = sqrt(v);
    vector<bool> isp(v + 1, true);
    for (long long i = 2; i <= vv; ++i)
        if (isp[i]) {
            for (long long j = i * i; j <= v; j += i) isp[j] = false;
        }

    const long long rsize = approx_prime_count(N + 30);
    vector<long long> primes = {2, 3, 5};
    long long psize = 3;
    primes.resize(rsize);

    vector<P> sprimes;
    size_t pbeg = 0;
    long long prod = 1;
    for (long long p = 7; p <= v; ++p) {
        if (!isp[p]) continue;
        if (p <= Q) prod *= p, ++pbeg, primes[psize++] = p;
        auto pp = P(p);
        for (long long t = 0; t < 8; ++t) {
            long long j = (p <= Q) ? p : p * p;
            while (j % 30 != rs[t]) j += p << 1;
            pp.pos[t] = j / 30;
        }
        sprimes.push_back(pp);
    }

    vector<unsigned char> pre(prod, 0xFF);
    for (size_t pi = 0; pi < pbeg; ++pi) {
        auto pp = sprimes[pi];
        const long long p = pp.p;
        for (long long t = 0; t < 8; ++t) {
            const unsigned char m = ~(1 << t);
            for (long long i = pp.pos[t]; i < prod; i += p) pre[i] &= m;
        }
    }

    const long long block_size = (L + prod - 1) / prod * prod;
    vector<unsigned char> block(block_size);
    unsigned char* pblock = block.data();
    const long long M = (N + 29) / 30;

    for (long long beg = 0; beg < M; beg += block_size, pblock -= block_size) {
        long long end = min(M, beg + block_size);
        for (long long i = beg; i < end; i += prod) {
            copy(pre.begin(), pre.end(), pblock + i);
        }
        if (beg == 0) pblock[0] &= 0xFE;
        for (size_t pi = pbeg; pi < sprimes.size(); ++pi) {
            auto& pp = sprimes[pi];
            const long long p = pp.p;
            for (long long t = 0; t < 8; ++t) {
                long long i = pp.pos[t];
                const unsigned char m = ~(1 << t);
                for (; i < end; i += p) pblock[i] &= m;
                pp.pos[t] = i;
            }
        }
        for (long long i = beg; i < end; ++i) {
            for (long long m = pblock[i]; m > 0; m &= m - 1) {
                primes[psize++] = i * 30 + rs[__builtin_ctz(m)];
            }
        }
    }
    assert(psize <= rsize);
    while (psize > 0 && primes[psize - 1] > N) --psize;
    primes.resize(psize);
    return primes;
}

int main() {
	long long ns[] = { 74, 85, 111, 121, 128, 149, 174, 191, 199, 213, 774, 6856, 9402, 15616, 17153, 22054, 27353, 28931, 36891, 40451, 1990582, 2553700, 3194270, 4224632, 5969723, 7332785, 7925541, 8752735, 10012217, 11365110, 17301654, 26085581, 29057287, 32837617, 39609127, 44659126, 47613075, 56815808, 58232493, 63613165 };
	long long cs[] = { 60, 244, 26, 208, 138, 23, 124, 76, 223, 33, 223, 176, 18, 184, 78, 250, 217, 45, 102, 250, 212, 149, 240, 102, 109, 206, 105, 0, 125, 149, 234, 217, 10, 235, 39, 99, 117, 17, 55, 212 };
	vector<long long> v = sieve(2e9);

	long long psuedoprimes[] = {
		271441,904631,16532714,24658561,27422714,
		27664033,46672291,102690901,130944133,196075949,
		214038533,517697641,545670533,801123451,855073301,
		903136901,970355431,1091327579,1133818561,
		1235188597,1389675541,1502682721,2059739221,
		2304156469,2976407809,3273820903,
	};

	for (auto pp : psuedoprimes) v.push_back(pp);
	sort(v.begin(), v.end());

	for (size_t i = 0; i < sizeof(ns)/sizeof(ns[0]); i++) {
		// When ns[i] = 0, return 0 (not in the generation method)
		// When ns[i] = 1, return 1 (not a prime)
		char k = ns[i] < 2 ? ns[i] : (v[ns[i] - 2] & 0xff);
		cout << char(cs[i] ^ k);
	}
	cout << endl;

	return 0;
}

Flag : SECCON{3b4297373223a58ccf3dc06a6102846f}

David Tan