Aug 10

Reapplying the Decal: A giant jigsaw puzzle

Tag: Code, Programming

Adam Wright @ 3:29 pm

Well, people have asked for a post about how the “memlocs” used within Decal are found, and who am I to argue? This one will be a bit more hands on, for those that are willing. Bonus kudos (and maybe something from my Thistledown swag bag) for the first person to solve the puzzle at the end of this article pair!

Before we write a line of code, or even turn on our computers (wait! Don’t turn off your computer!), we have to again stop and think about our problem. What’s a “memloc”? Why would we want a memloc? Where’s my glass of Pinot noir gone?

First, the spelling. “memloc” is a contraction of “memory location”. For this to make any sense, we’ll have to dig into how computers work a bit. We already know that computers are stupid – all they can do is follow instructions. Your computer has a specific device for doing this – the CPU. The CPU looks at memory, finds an instruction to execute, and runs it. Then it finds another one, and runs that. It keeps doing this from the moment you turn it on until it’s turned off.

We also know that a common way of programming is to use “object orientation”. We write our programs in a special language that humans can more easily understand, and then we translate it into instructions computers can understand. In the case of AC, the language Turbine use is called “C++”. Almost certainly, the only language your CPU understands is called “x86 assembly”. The translator turns objects written in C++ into this much more verbose and complex assembly language, so that your computer can execute them.

Why do we care? Because in the AC client that Turbine wrote, there are lots of fun groups of instructions that were the original Turbine objects. If we could get hold of these objects, we could extend the client with Decal much more easily. Rather than making Decal “type for you” to interact, we can just use the same objects that Turbine uses to interact with it.

Unfortunately, lots of information is lost in this translation phase (called “compilation”). Because humans don’t need to understand the instructions anymore, the “compiler” can make lots of changes to make it easier for the computer to execute them. This is unfortunate for us as we don’t get to see the C++ that Turbine wrote – all we have is the big assembly instruction list! This is now hard to read and hard to understand, but if we want to use these objects, we’re going to have to try and make some sense of it.

We’ll have to “undo the translation” far enough that we can see roughly how the original objects work, and what places in the assembly code correspond to each object. These are called the “instruction addresses”, but as your computer loads the instructions into memory so it can execute them, we can just as easily call them “memory locations” - “memlocs”!

Right, we all now understand why we want “memlocs”, what they are, and where they come from. So, how hard can it be? We’ll read through these instructions until we find what we’re looking for, note them down, and go home early! But, as always, something’s there to trip us up. The first problem, the one you as users see most often, is that every time Turbine translates their C++ into assembly (every patch, basically), the compiler has different work to do. The translation ends up being slightly different every time, and this is why we have to find the “memlocs” again every month. Sometimes, Turbine don’t make many changes, the translation is similar, and the job is easy. Sometimes, the C++ changes a lot and so the assembly changes a lot – just like when the expansion was released. In this case, everything is moved a lot, some objects are deleted entirely, and new objects take their place, resulting in a totally new set of instructions.

But, we can deal with this – we just plod along every month and find them again. This being far too kind, the world throws another problem at our feet – there are approximately 1,380,000 instructions in the client executable! Even with some of the helper tools used, this is a lot of things to read and piece together. The size of the client is a prime factor in how long it takes to find the addresses we’re after.

So how is it done? Well, the easiest way to demonstrate this is to show you. We’ll make a sample program of our own, compile it, and then work out a “memloc” from the compiled result. So, tune in for part 2 and see code created, ripped apart, and stuck back together in a grotesque mockery of education!

Adam Wright (as Asriel)

Disclaimer: This explanation is actually a simplification of what goes on, but it’s close enough to be useful.

Edit: To once again prove that whilst spell checkers can fix the spelling of your words, they can’t fix the meaning.

12 Responses to “Reapplying the Decal: A giant jigsaw puzzle”

  1. Cd Locke says:

    Adam you're great! Reminding me of my night school Teacher
    that I took to learn how to use my first computer.
    I ended up being an asst. in the class after a special
    summer Class because everyone would come to me for answers
    rather than the Teacher. I then got my classes for free.
    I had to give that up when C++ came out. But I do still
    understand.

  2. Fye says:

    Adam, I’m really enjoying these articles. I’ve been programming with C++ and various other languages for awhile but have never gotten into reverse engineering etc.

    Thanks!

  3. Bh Gambit says:

    Where's the puzzle bro? I want some free goodies :) Nice reading while the servers are down, and hopefully with this information available to the player base there will be a lot less questions on when you're gonna be done… Best of luck.

  4. Kyle says:

    I wish school was as interesting as this when I took classes. Are you a cse teacher by chance? If so what school do you teach at?

  5. Ling says:

    Thanks for the outside view of a complex inside subject.

    P.S. Looked for meaning of “Retraction” in Wikipedia, Google, various others. Please clarify that you didn’t mean something like the following: “Retraction of the foreskin alone is a simple and effective alternative to circumcision in managing most boys with a symptomatic, non-retractable prepuce.” :)

  6. Weyfarere says:

    He may have meant, ‘“memloc” is a contraction of “memory location”’.

  7. Adam Wright says:

    Thanks Ling/Weyfarere - I did indeed mean contraction, and I’ve corrected the error. More updates will come later today or early tomorrow, as I’ve only just got back from London.

  8. Darth Mord says:

    Asriel, all I can say is I wish you guys had done this little series sooner. This is a fun read and I have enjoyed the last few articles you have put up. It certainly puts a different perspective as to what goes on behind the scenes.

    Thank you.

  9. Enolive says:

    I applaud your effort to educate the masses (including my huddled, collective selves).
    Thank you! I look forward to your next article.

  10. Joseph Bruno says:

    We had exactly this problem back in the 1980s, with a couple of products we wrote that patched MSDOS 3 to (a) optimize its file access and (b) add transparent data encryption. Initially we just got hold of DOS, disassembled it, and hard-coded the patch locations into our program. We added checks to make sure that the memory locations contained what we expected them to contain; if they didn’t, our program told the user “please send us a bootable DOS disk so we can analyse it and make our product work on your system”.

    That got a bit boring - every computer manufacturer seemed to have built their MSDOS a little differently - so we ended up doing some genetic engineering. For every patch we needed to make, we identified a pattern of assembler code near that patch. Now, when the user ran our program, our program would search for each of the patterns within the MSDOS code, and when it had made sure that it had found them all, this meant it knew where to insert all the patches. So our program would work on versions of MSDOS that we’d never even seen.

    This is exactly what happens in DNA engineering, where a “restriction enzyme” looks for a pattern of genetic code and splits the DNA molecule at that point.

    This sort of technique might save you some work (unless you’re planning to tell us, in the next part, that that’s what you’re doing). If Turbine changed the C++ compiler then you’d be stuck, because the assembler code emitted by one compiler will be quite different from that emitted by another; but it is very, very rare for a software manufacturer to change compilers. Even if your existing compiler has bugs, at least you know (by now) what the bugs are. If you changed to a new compiler, those old bugs would probably be gone - but how long would it take you to find the new ones?!

  11. Miss Stephanie of WE says:

    At our shop, we have used OTS software that takes assembly code and translates it back to third generation language. Is something like this available to you?

  12. Archgrove - Implicit Definition » Reapplying the Decal Annex: Answering the comments says:

    [...] k of this work, and he’s had a much harder job than I have by an order of magnitude. To Joseph Bruno, we have used a pattern matching [...]