Aug 12

Reapplying the Decal: Learning to use the cutting tools

Tag: Code, Programming

Adam Wright @ 2:13 pm

Welcome back! Last time, we defined what a “memloc” was and why we might want one, so if you’re new to this series, please read up on the previous articles from this list. Also, for the rest of these articles, you’ll need a so-called “hex editor”, which is basically a glorified text editor that can edit any file, not just text. I’ll recommend frHED to you, as it’s solid and free (though you can use whatever you like). Download it and install it, but don’t be scared by it. All will become clear.

Before we can find a “memloc” of our own, we’ll need something to dissect. As the AC client is far too complex for an introductory exercise, the best approach will be to make a sample program to work on. Let’s use the canonical “Hello world” example, whereby the words “Hello” and “World” are printed to the screen and the program waits for us to press enter.

I’ll write our “Hello world” program for us all in the same language Turbine use (C++) and I’ll use the same compiler that they do (MS Visual Studio 7.1). For those disappointed about not seeing the C++, well, that would rather spoil the exercise. We want to work in conditions that mirror what we go through with Decal, just with some of the extra complications removed. Also, I have a “Super Bonus Question” regarding this, so just hold it together for a while longer!

[Sound of keyboard and compilation]

Done! Here’s one “Hello World” client, ready for us to download. Grab it, extract it and run it to get the idea. I hope that at least some of you will trust that I’m not trying to break your machine, but if you’re worried then please just wait a few days and I’ll post the original source.

Right, we’ve now got everything we need to begin! However, being the conscientious developers that we are, before we gleefully jump in and start dismantling programs we need to make sure we understand what we’re dealing with. Hence the rest of this session will deal with how the big instruction list that makes up your program is stored on your machine. Don’t worry, that tasty little example program will still be waiting for us.

Let’s start with a program. What is a program? The answer you’ve probably come up with is “An exe file”, and that’s a damn good start. As far as our users are concerned, programs are exe files (exe for “Executable”). Inside an exe file is the big instruction list that makes up the program and the extra data it will need to run. The pressing question becomes “how is our instruction list stored?”

You know that you can store your images on your computer in lots of different file types. You’ve no doubt used bitmap files (.BMP), JPEG files (.JPG) and many others. These are file formats and, just like images, programs have file formats as well. Indeed, the format used by “.EXE” files is called “Portable Executable” (PE), and it defines where in the file we’ll find our instruction list and where we’ll find our data. Using a special tool, we can find out that the PE format says that, for our program, the instruction list comes first (just after some header information at the very start of the file), and the data follows straight after it.
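For the curious, here is a rough sketch of the kind of thing such a “special tool” does. It is written in Python purely for convenience (that is my choice, nothing to do with how the program itself was built), and the function name list_pe_sections is made up for illustration. It reads the PE headers and reports where in the file each section starts and how big it is. It does no real error handling, so treat it as a sketch rather than a proper tool.

```python
import struct


def list_pe_sections(data: bytes):
    """Return (name, file_offset, size_in_file) for each section of a PE file."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ/PE file")
    # The 4-byte value at offset 0x3C points at the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # The 20-byte COFF header follows the signature. We need the section
    # count (at +2) and the size of the optional header (at +16).
    coff = pe_offset + 4
    (num_sections,) = struct.unpack_from("<H", data, coff + 2)
    (opt_size,) = struct.unpack_from("<H", data, coff + 16)
    # The section table sits after the optional header; entries are 40 bytes.
    table = coff + 20 + opt_size
    sections = []
    for i in range(num_sections):
        entry = table + 40 * i
        name = data[entry:entry + 8].rstrip(b"\0").decode("ascii", "replace")
        # SizeOfRawData is at +16 and PointerToRawData at +20 in each entry.
        raw_size, raw_offset = struct.unpack_from("<II", data, entry + 16)
        sections.append((name, raw_offset, raw_size))
    return sections
```

To try it, read the file with open("helloWorld.exe", "rb").read() and pass the bytes in. With Microsoft’s compilers the instruction list typically lives in a section named “.text” and the data in sections such as “.rdata” and “.data”, but the exact names and sizes depend on how the program was built.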

Advanced bonus question: Given that Windows programs are only designed to run on Windows, what does the “Portable” in “Portable Executable” refer to?

What we expect to find in the instruction list for helloWorld.exe would seem to be obvious – it’ll just be a list of instructions telling our machine to print “Hello world”! But, don’t forget, in the translation from the C++ into the instruction list the compiler will have added some more instructions, things that whilst not directly printing “Hello world” are necessary for the computer to finish the job.

What our “data” is might be less obvious. In general, the data is everything the instructions need to complete their task. In this case our task is writing to the screen, so the instructions deal solely with that. What they actually print is up to us, and as such “Hello world” is our data.

We now know what the compiler put into our executable file. We’ve got a list of instructions at the beginning that will tell the computer to print something to the screen, as well as some extra instructions to help it along. This will be followed by what will actually be printed, in our data section (which might also contain some other useful data added by the compiler). Let’s check that the reality matches what we’ve learned, so open frHED and load into it the executable file “helloWorld.exe”. What it shows you might look scary, but don’t be put off – it’s actually really simple.

The left hand column shows you the position in the file of the line being shown. The middle column is the actual data in the file. Both these columns are displayed in hexadecimal, but don’t worry if you don’t know it: we’re only interested in the right hand column, which interprets the data for us as normal text. If we’re correct, somewhere near the bottom of this column will be the data section containing (at least) the words “Hello” and “World”, so scroll down and…yes! There they are! Reality matches our theory, and the universe makes sense. Fantastic!
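If you’d like to see that there is no magic in those three columns, here is a minimal sketch in Python that produces the same kind of display and hunts for a piece of text. The names hex_dump and find_text are made up for illustration, and the search assumes the text is stored as plain ASCII, which is an assumption about how the program keeps its strings.

```python
def hex_dump(data: bytes, start: int = 0, width: int = 16) -> str:
    """Format bytes like a hex editor does: position, hex bytes, then text."""
    pad = width * 3 - 1  # room for a full row of "xx " pairs
    lines = []
    for pos in range(0, len(data), width):
        chunk = data[pos:pos + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Anything that is not printable ASCII is shown as a dot.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{start + pos:08x}  {hex_part:<{pad}}  {text_part}")
    return "\n".join(lines)


def find_text(data: bytes, text: str) -> int:
    """Return the file position of the first ASCII match for text, or -1."""
    return data.find(text.encode("ascii"))
```

Reading helloWorld.exe into a bytes object, calling find_text on it with “Hello”, and then passing a slice around the returned position to hex_dump (with start set to where the slice begins) should show roughly what the hex editor shows at that point in the file.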

In summary, we’ve built our sample program, learned a little bit about how it’s stored and then checked this knowledge against the real world. This is an important cycle in scientific and semi-scientific disciplines. Learn, hypothesise, validate. Whenever learning something new, follow it often and you’ll find the subject seems far more alive than just reading a textbook.

Next time, I’ll talk a little bit about how your CPU actually executes the instruction list. Finally, part 4 will actually have us in there, hands dirty, hacking away changing what helloWorld.exe does without ever having seen the source code! I hope you can contain your excitement, because I’m having trouble!

Adam Wright (as Asriel).

PS: Yes, I know I said this would be a two part article, but the amount of back story needed to make sure everyone has a chance of playing along would have made this part way too long to digest. Sorry for my misestimate, and I hope no one’s too bothered. After this set of articles is done, I do have at least one more standalone article planned, targeting a specific problem we’ve had in updating Decal.

8 Responses to “Reapplying the Decal: Learning to use the cutting tools”

  1. Enolive says:

    It’s portable because the format is not architecture-specific.

    Thanks for the write-up! I look forward to more. :)

  2. Adam Wright says:

    Yikes, a fast and entirely correct answer. Being somewhat unprepared, I’ll just say well done, and elaborate later!

  3. James Bray says:

    Nice article.

    As a long time programmer that has never had the opportunity to delve much below C pointers, I’ve always been fascinated by the black art of disassembly.

    Looking forward to the rest of the series….

    James Bray

  4. Leanne says:

    This is fun stuff–I anxiously await the next installment!

  5. Tom says:

    You make complex topics seem very simple. Thanks for sharing your gift!

  6. Eugene says:

    I am still waiting for Sweet Mary’s comments…

  7. Ling says:

    The term “Portable Executable” was chosen because the intent was to have a common file format for all flavors of Windows, on all supported CPUs (http://msdn.microsoft.com/msdnmag/issues/02/02/PE/default.aspx).

  8. instigater says:

    HAIL Asriel thanks for setting this up its really a nice read good fortune to you and yours