Aug 12 2005

Reapplying the Decal: Learning to use the cutting tools

Tag: Code, Programming | Adam Wright @ 2:13 pm

Welcome back! Last time, we defined what a “memloc” was and why we might want one, so anyone new to this series, please read up on the previous articles from this list. Also, for the rest of these articles, you’ll need a so-called “hex editor” – basically a glorified text editor that can edit anything, not just text. I’ll recommend frHED to you, as it’s solid and free (though you can use whatever you like). Download it and install it, but don’t be scared by it. All will become clear.

Before we can find a “memloc” of our own, we’ll need something to dissect. As the AC client is far too complex for an introductory exercise, the best approach will be to make a sample program to work on. Let’s use the canonical “Hello world” example, whereby the words “Hello” and “World” are printed to the screen and the program waits for us to press enter.

I’ll write our “Hello world” program for us all in the same language Turbine use (C++) and I’ll use the same compiler that they do (MS Visual Studio 7.1). For those disappointed about not seeing the C++, well, that would rather spoil the exercise. We want to work in conditions that mirror what we go through with Decal, just with some of the extra complications removed. Also, I have a “Super Bonus Question” regarding this, so just hold it together for a while longer!

[Sound of keyboard and compilation]

Done! Here’s one “Hello World” client, ready for us to download. Grab it, extract it and run it to get the idea. I hope that at least some of you will trust that I’m not trying to break your machine, but if you’re worried then please just wait a few days and I’ll post the original source.

Right, we’ve now got everything we need to begin! However, being the conscientious developers that we are, before we gleefully jump in and start dismantling programs we need to make sure we understand what we’re dealing with. Hence the rest of this session will deal with how the big instruction list that makes up your program is stored on your machine. Don’t worry, that tasty little example program will still be waiting for us.

Let’s start with a program. What is a program? The answer you’ve probably come up with is “An exe file”, and that’s a damn good start. As far as our users are concerned, programs are exe files (exe for “Executable”). Inside an exe file is the big instruction list that makes up the program and the extra data it will need to run. The pressing question becomes “how is our instruction list stored”?

You know that you can store your images on your computer in lots of different file types. You’ve no doubt used bitmap files (.BMP), JPEG files (.JPG) and many others. These are file formats and, like images, programs have file formats as well. Indeed, the “.EXE” file format is called “Portable Executable” (PE), and it defines where in the file we’ll find our instruction list and where we’ll find our data. Using a special tool, we can find out that the PE format says that, for our program, the instruction list will be at the start of the program, and the data will follow straight after it.
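To make “the format defines where things are” concrete, here is a minimal Python sketch (not the special tool mentioned above, and not anything Decal uses) that reads the section table of a PE file. It relies only on the published PE layout: the DOS header begins with “MZ”, the 4 bytes at offset 0x3C point to the “PE” signature, a 20-byte file header follows, and after the optional header comes one 40-byte entry per section. The function name `pe_sections` is made up for illustration.

```python
import struct


def pe_sections(data: bytes) -> list[tuple[str, int, int]]:
    """Return (name, file offset, size on disk) for each section of a PE image."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ/PE file")
    # Offset 0x3C of the DOS header holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("PE signature not found")
    # File header: number of sections at +2, size of the optional header at +16.
    coff = pe_offset + 4
    (num_sections,) = struct.unpack_from("<H", data, coff + 2)
    (opt_size,) = struct.unpack_from("<H", data, coff + 16)
    table = coff + 20 + opt_size  # the section table follows the optional header
    sections = []
    for i in range(num_sections):
        entry = table + i * 40  # each section header is 40 bytes long
        name = data[entry:entry + 8].rstrip(b"\x00").decode("ascii", "replace")
        # SizeOfRawData is at +16 and PointerToRawData at +20 within the entry.
        raw_size, raw_offset = struct.unpack_from("<II", data, entry + 16)
        sections.append((name, raw_offset, raw_size))
    return sections
```

Assuming helloWorld.exe is in the current folder, `pe_sections(open("helloWorld.exe", "rb").read())` would list its sections. In executables built by Microsoft compilers the instructions conventionally live in a section named “.text”, with data in sections such as “.rdata” and “.data”.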

Advanced bonus question: Given that Windows programs are only designed to run on Windows, what does the “Portable” in “Portable Executable” refer to?

What we expect to find in the instruction list for helloWorld.exe would seem to be obvious – it’ll just be a list of instructions telling our machine to print “Hello world”! But, don’t forget, in the translation from the C++ into the instruction list the compiler will have added some more instructions, things that whilst not directly printing “Hello world” are necessary for the computer to finish the job.

What our “data” is might be less obvious. In general, the data is everything the instructions need to complete their task and as in this case our task is writing to the screen, the instructions will deal solely with that. What they actually print is up to us, and as such “Hello world” is our data.

We now know what the compiler put into our executable file. We’ve got a list of instructions at the beginning that will tell the computer to print something to the screen, as well as some extra instructions to help it along. This will be followed by what will actually be printed, in our data section (which might also contain some other useful data added by the compiler). Let’s check that the reality matches what we’ve learned, so open frHED and load into it the executable file “helloWorld.exe”. What it shows you might look scary, but don’t be put off – it’s actually really simple.

The left hand column shows you the position the line being shown takes in the file. The middle column is the actual data in the file. Both these columns are displayed in hexadecimal, but don’t worry if you don’t know it - we’re only interested in the right hand column, which interprets the data for us as normal text. If we’re correct, somewhere near the bottom of this column will be the data section containing (at least) the words “Hello” and “World”, so scroll down and…yes! There they are! Our new knowledge matches the theory, and the universe makes sense. Fantastic!
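For anyone curious how those three columns come about, here is a rough Python sketch of what a hex editor does when it displays a file. It is an illustration only, not how frHED is written; the sample bytes are invented stand-ins for the real file contents.

```python
def hex_dump(data: bytes, width: int = 16) -> list[str]:
    """Render bytes as 'position  hex values  text' rows, like a hex editor."""
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Printable ASCII is shown as-is; every other byte becomes a dot.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        rows.append(f"{offset:08x}  {hex_part.ljust(width * 3 - 1)}  {text_part}")
    return rows


# Invented stand-in for a file: a few instruction bytes, then the text data.
sample = bytes([0x55, 0x8B, 0xEC, 0x68, 0x00, 0x30, 0x40, 0x00]) + b"Hello\x00World\x00"
for row in hex_dump(sample):
    print(row)

# Searching the raw bytes gives the position of the text within the "file".
print(sample.find(b"Hello"))  # 8
```

The right-hand column is nothing more than each byte reinterpreted as a character, which is why the instruction bytes look like noise and the data section suddenly reads as words.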

In summary, we’ve built our sample program, learned a little bit about how it’s stored and then checked this knowledge against the real world. This is an important cycle in scientific and semi-scientific disciplines. Learn, hypothesise, validate. Whenever learning something new, follow it often and you’ll find the subject seems far more alive than just reading a textbook.

Next time, I’ll talk a little bit about how your CPU actually executes the instruction list. Finally, part 4 will actually have us in there, hands dirty, hacking away changing what helloWorld.exe does without ever having seen the source code! I hope you can contain your excitement, because I’m having trouble!

Adam Wright (as Asriel).

PS – Yes, I know I said this would be a two part article, but the amount of back story needed to make sure everyone has a chance of playing along would have made this part way too long to digest. Sorry for my misestimate, and I hope no one’s too bothered. After this set of articles is done, I do have at least one more standalone article planned, targeting a specific problem we’ve had in updating Decal.


Aug 11 2005

Reapplying the Decal Annex: Answering the comments

Tag: Code, Personal, Programming | Adam Wright @ 11:26 pm

My apologies for not getting part 5 of “Reapplying the Decal” up today, but I do have a (pathetic) excuse. I was in London for the majority of my time, and in what I had left for writing, I decided that one more part wouldn’t cut it for the “memlocs” section. Expect to see part 2 of a total 3 or 4 tomorrow!

To at least write something, I’ll answer a few of the comment questions. Anyone not interested in what the comments have said, feel free to tune out now and return for the article on Friday.

First of all, a myriad of thanks to those who’ve left supporting comments, and I’ll even spare a couple for those who don’t think this was particularly useful. Criticism is always useful, as long as one can extract the barbs it often comes with. I didn’t expect a lot of interest when I started writing these, but I’m glad that people found them interesting.

Regarding personal questions, I don’t normally like to talk about myself as I prefer to let my work speak for me. Nonetheless, for Kyle, no – I’m not a teacher nor have I ever formally taught. At the moment, I’m a mature student (over 21) majoring in Mathematics with a minor in Computer Science.

As for those wondering about a donations link, don’t expect to see one on my site anytime soon (I cannot and do not speak for the rest of the developers). I’m fortunate to be blessed with a comfortable life, and I do this purely because I find it interesting. On the remote chance that something here really provokes the need for a fiscal contribution, there are many good charities and groups that I’d much rather see the money go to. I’ll value anything given to them (or other good causes) much more than any money you could ever give me. Oh, and if you do donate because of something I’ve done, please send me an e-mail so that we might share the warm fuzzy feeling.

Now, the meaty technical questions and advice regarding memlocs. I should first say that I’m not overly involved in finding the function and object addresses at the moment – Hazridi has shouldered the bulk of this work, and he’s had a much harder job than I have by an order of magnitude.

To Joseph Bruno: we have used a pattern matching technique in the past to help locate functions changed only by address additions (and other minor alterations). At one point, someone (alas, I can’t remember who) wrote a tool that would find all the functions for us. But, as you say, this sort of device is vulnerable to failure when significant changes are made, or when the compiler is updated. To the best of my memory, we’ve lived through two compiler updates so far: VC6 to VC7, and VC7 to VC7.1. We might have also suffered an optimisation flag change (which is just as damaging); I’m sure one of the other developers will remember far better than I. The idea is certainly sound, and I’m sure Decal will use it again when the client is stable.
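For readers wondering what such pattern matching might look like, here is a small Python sketch of the general idea: search the instruction bytes for a known sequence, treating the bytes that hold addresses as wildcards. This is an illustration of the technique, not the Decal tool mentioned above; the function name, the pattern, and both “builds” are invented for the example.

```python
from typing import Optional, Sequence


def find_signature(code: bytes, pattern: Sequence[Optional[int]]) -> list[int]:
    """Return every offset where the pattern matches; None matches any byte."""
    hits = []
    for start in range(len(code) - len(pattern) + 1):
        if all(p is None or code[start + i] == p for i, p in enumerate(pattern)):
            hits.append(start)
    return hits


# Hypothetical function start: push ebp; mov ebp, esp; mov eax, [address].
# The four address bytes differ from build to build, so they are wildcards.
PATTERN = [0x55, 0x8B, 0xEC, 0xA1, None, None, None, None]

# Two invented builds: same instructions, shifted position, different address.
old_build = bytes([0x90, 0x55, 0x8B, 0xEC, 0xA1, 0x10, 0x20, 0x40, 0x00, 0xC3])
new_build = bytes([0x90, 0x90, 0x90, 0x55, 0x8B, 0xEC, 0xA1, 0x58, 0x31, 0x40, 0x00, 0xC3])

print(find_signature(old_build, PATTERN))  # [1]
print(find_signature(new_build, PATTERN))  # [3]
```

The weakness described above shows up directly: if a compiler update or a new optimisation flag changes the instructions themselves rather than just the addresses, the fixed bytes no longer line up and the search finds nothing.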

To Miss Stepahnie: it’s an intriguing idea, but I’ve yet to find any decompiler that will produce anything useful from compiled C++. We have no debug symbols, and compiler optimisations are used aggressively in the client, both contributing to a soup of code that, whilst reversible in theory, would produce C that wouldn’t be much better than the assembly itself. The one-to-many instruction mapping you get with 3g to 1g alone would be a big issue (we’d have to go back to the assembly to know exactly where, and in what state, we want to perform the call). However, I love being proven wrong, so if you know of one, I’m sure we’ll gladly look at it!


Aug 10 2005

Reapplying the Decal: A giant jigsaw puzzle

Tag: Code, Programming | Adam Wright @ 3:29 pm

Well, people have asked for a post about how the “memlocs” used within Decal are found, and who am I to argue? This one will be a bit more hands on, for those that are willing. Bonus kudos (and maybe something from my Thistledown swag bag) for the first person to solve the puzzle at the end of this article pair!

Before we write a line of code, or even turn on our computers (wait! Don’t turn off your computer!), we have to again stop and think about our problem. What’s a “memloc”? Why would we want a memloc? Where’s my glass of Pinot noir gone?

First, the spelling. “memloc” is a contraction of “memory location”. For this to make any sense, we’ll have to dig into how computers work a bit. We already know that computers are stupid – all they can do is follow instructions. Your computer has a specific device for doing this – the CPU. The CPU looks at memory, finds an instruction to execute, and runs it. Then it finds another one, and runs that. It keeps doing this from the moment you turn it on until it’s turned off.
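As a toy illustration of that find-and-run loop, here is a Python sketch of a pretend CPU. The five instructions (LOAD, ADD, PRINT, JUMP, HALT) are invented for this example and bear no resemblance to real x86; only the shape of the loop is the point.

```python
def run(memory: list[tuple[str, int]]) -> list[int]:
    """Toy CPU: fetch the instruction at the current address, run it, repeat."""
    accumulator = 0
    output = []
    address = 0  # which memory location holds the next instruction
    while True:
        opcode, operand = memory[address]  # find the instruction
        address += 1                       # by default, move on to the next one
        if opcode == "LOAD":
            accumulator = operand
        elif opcode == "ADD":
            accumulator += operand
        elif opcode == "PRINT":
            output.append(accumulator)
        elif opcode == "JUMP":
            address = operand              # carry on from a different location
        elif opcode == "HALT":
            return output
        else:
            raise ValueError(f"unknown instruction {opcode!r}")


program = [
    ("LOAD", 2),   # memory location 0
    ("ADD", 3),    # memory location 1
    ("PRINT", 0),  # memory location 2
    ("HALT", 0),   # memory location 3
]
print(run(program))  # [5]
```

Notice that each instruction is identified purely by where it sits in memory, which is exactly the idea a “memory location” builds on.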

We also know that a common way of programming is to use “object orientation”. We write our programs in a special language that humans can more easily understand, and then we translate it into instructions computers can understand. In the case of AC, the language Turbine use is called “C++”. Almost certainly, the only language your CPU understands is called “x86 assembly”. The translator turns objects written in C++ into this much more verbose and complex assembly language, so that your computer can execute them.

Why do we care? Because in the AC client that Turbine wrote, there are lots of fun groups of instructions that were the original Turbine objects. If we could get hold of these objects, we can extend the client with Decal much more easily. Rather than making Decal “type for you” to interact, we can just use the same objects that Turbine uses to interact with it.

Unfortunately, lots of information is lost in this translation phase (called “compilation”). Because humans don’t need to understand the instructions anymore the “compiler” can make lots of changes to make it easier for the computer to execute them. This is unfortunate for us as we don’t get to see the C++ that Turbine wrote – all we have is the big assembly instruction list! This is now hard to read and hard to understand, but if we want to use these objects, we’re going to have to try and make some sense of it.

We’ll have to “undo the translation” far enough that we can see roughly how the original objects work, and what places in the assembly code correspond to each object. These are called the “instruction addresses”, but as your computer loads the instructions into memory so it can execute them, we can just as easily call them “memory locations” - “memlocs”!

Right, we all now understand why we want “memlocs”, what they are, and where they come from. So, how hard can it be? We’ll read through these instructions until we find what we’re looking for, note them down, and go home early! But, as always, something’s there to trip us up. The first problem, the one you as users see most often, is that every time Turbine translates their C++ into assembly (every patch, basically), the compiler has different work to do. The translation ends up being slightly different every time, and this is why we have to find the “memlocs” again every month. Sometimes, Turbine don’t make many changes, the translation is similar, and the job is easy. Sometimes, the C++ changes a lot and so the assembly changes a lot – just like when the expansion was released. In this case, everything is moved a lot, some objects are deleted entirely, and new objects take their place, resulting in a totally new set of instructions.

But, we can deal with this – we just plod along every month and find them again. This being far too kind, the world throws another problem at our feet – there are approximately 1,380,000 instructions in the client executable! Even with some of the helper tools used, this is a lot of things to read and piece together. The size of the client is a prime factor in how long it takes to find the addresses we’re after.

So how is it done? Well, the easiest way to demonstrate this is to show you. We’ll make a sample program of our own, compile it, and then work out a “memloc” from the compiled result. So, tune in for part 2 and see code created, ripped apart, and stuck back together in a grotesque mockery of education!

Adam Wright (as Asriel)

Disclaimer: This explanation is actually a simplification of what goes on, but it’s close enough to be useful.

Edit: To once again prove that whilst spell checkers can fix the spelling of your words, they can’t fix the meaning.


Aug 10 2005

Reapplying the Decal: The quiz answers

Tag: Code, Programming | Adam Wright @ 2:51 pm

Well, despite embedding a couple of “bonus questions” into yesterday’s post, I didn’t receive any answers. I’ll have to make the next ones easier! Anyway, here are the answers.

Question 1: What has become common since DirectX 6 that means DirectX 9 doesn’t need good colour key support?

The answer I was specifically looking for was “hardware accelerated alpha blending”. I would also have taken “transparency” and partial credit would be given for “3D cards”.

Alpha blending is the composition of two images together so that one of them appears to be translucent on top of the other. You’ve all seen the effects – smoke that clouds your vision, see-through water etc. Whilst simple mathematically, these operations are pretty CPU intensive. Say we’re going to draw image A over image B, with image A slightly translucent. We have to read every pixel from image A, and blend the colour values with image B, to produce our final image. Even if you only run at 800×600 resolution, that’s 800*600 = 480000 pixels to read twice (once for each image), and the same number of actual blending operations – every frame!
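The arithmetic behind that is short enough to sketch in Python. This is the standard “source over destination” blend done per colour channel, written for clarity rather than speed; it is not taken from Decal or DirectX, and the function names are invented.

```python
def blend_pixel(src, dst, alpha):
    """Blend src over dst. alpha runs from 0 (invisible) to 255 (fully opaque)."""
    # Each channel is a weighted average of the two pixels, rounded to nearest.
    return tuple((s * alpha + d * (255 - alpha) + 127) // 255 for s, d in zip(src, dst))


def blend_image(image_a, image_b, alpha):
    """Blend every pixel of image A over the matching pixel of image B."""
    return [
        [blend_pixel(a, b, alpha) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(image_a, image_b)
    ]


red, blue = (255, 0, 0), (0, 0, 255)
print(blend_pixel(red, blue, 128))  # (128, 0, 127): roughly half of each
print(800 * 600)                    # 480000 blends per frame at 800x600
```

One multiply-and-add per channel looks cheap, but three channels times 480000 pixels, repeated every frame, is what made this painful for a CPU and a natural job for graphics hardware.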

Here’s another “Stuff you might want to know”™ aside.

When someone sends you an image via email, you might notice your paint program opening it as “24 bit”, and if you’re au fait with graphics, you might have even seen “32 bit”. These “bits” refer to the colours in each pixel of the image. Most of the time, 24 bit images are enough – 8 bits for red, 8 bits for green and 8 bits for blue produce all the colours your eye can resolve. So what’s the extra 8 for in 32 bit images? Why – alpha! These 8 bits say, “When you composite this image onto another one, here’s how translucent this pixel is”. These 8 bits are called the “alpha channel”.
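Here is a small Python sketch of how those 32 bits split into four 8-bit values. The order used here (alpha in the top 8 bits, then red, green, blue) is one common layout and is an assumption for this example; real image formats differ in the order they store the channels.

```python
def unpack_argb(pixel: int) -> tuple[int, int, int, int]:
    """Split a 32 bit pixel into (alpha, red, green, blue), 8 bits each."""
    alpha = (pixel >> 24) & 0xFF  # top 8 bits: how opaque the pixel is
    red = (pixel >> 16) & 0xFF
    green = (pixel >> 8) & 0xFF
    blue = pixel & 0xFF           # bottom 8 bits
    return alpha, red, green, blue


def pack_argb(alpha: int, red: int, green: int, blue: int) -> int:
    """Combine four 8-bit values back into one 32 bit pixel."""
    return (alpha << 24) | (red << 16) | (green << 8) | blue


# A half-transparent orange pixel.
print(unpack_argb(0x80FF8000))  # (128, 255, 128, 0)
```

A 24 bit image is the same thing without the top byte, which is why it has no way to say how see-through a pixel should be.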

The hardware of yore wasn’t really sophisticated enough to handle the operations required for alpha blending, but it did implement colour keying (as a concept, it turns up remarkably often). But today, with the advent of graphics cards that can do a bazillion blended polygons, having specific functionality for colour keying would be redundant. Programs are now expected to just provide an alpha channel for their images, and where they want to “see through” (in the way that colour keys provided), they just set the alpha bits to 100% transparent.

So, specific colour keying functionality in graphics APIs has largely gone the way of the dodo, to be replaced by something more flexible and useful. Vive la progress!

Question 2: Can you tell me why the final phase, which draws the plugin window and the control bar, doesn’t need to use the colour key drawing object?

The answer is basically “As the plugin window and control bar are rectangular, we don’t need to composite them in with colour keys – they cut right out without needing to be see through anywhere”.

This one required a bit of lateral thinking. We know from the first article that we’re now drawing straight onto the AC window – as such, we don’t need to draw onto a big “buffer” first, which will then be copied across. If we did need such a buffer, we’d still have to use colour keys (otherwise, the background of the buffer would completely cover the AC window). Additionally, because all the images we’re drawing are rectangular, and don’t have any see through portions, we don’t have to deal with colour keying for them either! So, we don’t need colour keying at all for the final phase of composition.

I hope that’s not too confusing!

As to further writing, I’ve had requests to make a post about how the memory locations (“memlocs”) are actually found. This is a pretty involved topic, but I’ll try and write something today; maybe I’ll even include a “find your own memloc” quiz! Wow, this just gets better and better, doesn’t it?


Aug 09 2005

Reapplying the Decal: Part 2

Tag: Code, Programming | Adam Wright @ 6:14 pm

Well, yesterday’s article seems to have gone down quite well. So, as promised, here’s part 2, wherein colour keying is restored, code is demonstrated, and more screenshots are taken! Before continuing, I’ll ask those of you who didn’t read yesterday’s instalment to head on over. Go on – we’ll wait.

Sigh. OK, for those of you who didn’t bother, here’s a summary anyway.

Last time, on Decal Trek: The Next Implementation, we had just finished recreating suitable objects that will allow us to draw onto the screen again. We’d tested our new objects, found some problems, and experimented with solutions to them. Eventually, we decided to cajole the old objects into accepting us. When we left off, we’d basically got everything working, except the colour keying.

The best way to solve a problem is to not have that problem at all. So, the first question is “why do we need colour keying at all”? Well, let’s explain how Decal puts things together – most UI rendering systems work in this way, so this is a N for the price of 1 education (with large N)!

The UI system is made up of lots of different components: buttons, text boxes, check controls and others. These are all individual objects. Let’s have a look at them, in an idealised form, one at a time.

Single controls example

Very nice. 3 controls all rendered onto a piece of paper, one at a time. This is pretty much how Decal works – we have lots of controls, and they’re drawn onto the “paper” of your AC window. But, Decal doesn’t look like a lot of controls all floating around on their own. The controls are organised into groups. Let’s do that – put our controls onto our tab page, just like in game.

Controls without a colour key

Hmm. Not bad. As we can see, the button control works fantastically – it blends in perfectly. This is an artefact of the way the drawing works – it only deals with copying rectangles around. There is no other reasonable way. Computers like squares, everything can always be contained within a square, and it’s bloody hard to find the shape of some text saying “Hello”! So, we draw squares round everything, cut them out, and stick them onto the tab box.

So, as the button control is a square, we could cut it out perfectly. However, the checkbox isn’t quite a square – it’s a circle and some text. When we cut it out and drew it into the tab box, the background came with it, and gave us that nasty white box. So, let’s make white the colour key. When we copy over the rectangle, we’ll copy over every bit of the colour except when it’s white.

Controls with a colour key

Great! We didn’t bring the white with us, and because none of the controls actually used white, it looks fine. Colour keys to the rescue! I hope this demonstrates their idea, need and functioning, so here’s a bonus question for astute readers: What has become common since DirectX 6 that means DirectX 9 doesn’t need good colour key support?
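The copy-everything-except-the-key rule can be sketched in a few lines of Python. This is an illustration of the idea only, not Decal’s drawing code: images are simple grids of single characters standing in for colours, and the name `blit` is chosen for the example.

```python
def blit(dest, src, left, top, colour_key=None):
    """Copy the src rectangle onto dest at (left, top), skipping the key colour."""
    for y, row in enumerate(src):
        for x, pixel in enumerate(row):
            if colour_key is not None and pixel == colour_key:
                continue  # leave whatever was already on the destination
            dest[top + y][left + x] = pixel


def show(image):
    return ["".join(row) for row in image]


# '.' is the tab page background, 'W' is white, 'o' and 't' are the checkbox.
checkbox = [list("WoWttt"), list("WWWWWW")]

plain = [list("........") for _ in range(3)]
blit(plain, checkbox, 1, 1)                  # no colour key: the white box comes along
print(show(plain))  # ['........', '.WoWttt.', '.WWWWWW.']

keyed = [list("........") for _ in range(3)]
blit(keyed, checkbox, 1, 1, colour_key="W")  # white is the colour key
print(show(keyed))  # ['........', '..o.ttt.', '........']
```

The catch hinted at above still applies: the trick only works because nothing in the control itself is drawn in the key colour.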

Having established the need for colour keys, let’s work out how to implement them in Decal again. We know we have our Decal image that we need to draw. As explained above, this image is put together out of lots of little images (the control objects) – this is called “composition”. We need to get colour keys working again for this composition phase.

Our main task yesterday reopened a route that allows us to use some of the old techniques that were locked off to us. We can now ask each object for a “DC” – a special object that the old fashioned drawing code uses. Because we can now get this DC, we can use some functions Microsoft provided in Windows that will largely do the colour keying for us. So, let’s change our drawing objects to use this DC during composition.

[Sound of keyboard]

Great! It seems to work! Except…oh dear. Once again, the ugly head of performance has been raised. It’s not as bad as yesterday, but it still gives a bit of a problem, especially for old computers. As many AC players are still on older machines, we’re going to have to look at this. Wouldn’t life be so much easier if everyone upgraded his or her machines bi-monthly like us developers?

Taking stock (programming involves a lot of stopping and thinking to make sure one’s taking the right road), we remember that the objects we used yesterday were drawing at a nice speed. Our new DC rendering is somewhat slow. We read over our results, and look at our image.

Decal renderer without a colour key

Hmmm. There isn’t really a lot of the colour key present. This suggests that, perhaps, the colour key isn’t used everywhere. In fact, all the controls that are square don’t seem to use it at all! What if, instead of using our new, perfect, drawing object everywhere, we could sometimes use the old, imperfect but fast one?

[Sound of keyboard]

Done! The drawing object can now choose which method it uses. That’s a good start – but the drawing object is concerned only with drawing. We’ll have to tell it what to use, and when. So, we’ll update all the image drawing, and some of the composition code, to only use the slow colour key method when it’s strictly necessary. We’ll analyse each image when it’s loaded to see if it needs a colour key, and configure the compositor never to use the colour key on the final phase (when it’s not needed). Bonus question 2: Can you tell me why the final phase, which draws the plugin window and the control bar, doesn’t need to use the colour key drawing object?
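In rough outline, the analyse-once, choose-per-draw approach might look like the Python sketch below. It is a simplified stand-in for the real change, not the actual Decal code: images are grids of single characters, and the names `needs_colour_key`, `load_image` and `draw` are invented for the example.

```python
def needs_colour_key(pixels, colour_key):
    """An image only needs keyed drawing if the key colour appears in it."""
    return any(pixel == colour_key for row in pixels for pixel in row)


def load_image(pixels, colour_key):
    # Analyse once at load time, so drawing never has to re-scan the image.
    return {"pixels": pixels, "keyed": needs_colour_key(pixels, colour_key)}


def draw(dest, image, left, top, colour_key, allow_key=True):
    """Draw the image and report which method was used."""
    pixels = image["pixels"]
    if not (allow_key and image["keyed"]):
        # Fast method: copy whole rows without looking at individual pixels.
        for y, row in enumerate(pixels):
            dest[top + y][left:left + len(row)] = row
        return "fast"
    # Slow method: check every pixel against the colour key.
    for y, row in enumerate(pixels):
        for x, pixel in enumerate(row):
            if pixel != colour_key:
                dest[top + y][left + x] = pixel
    return "keyed"


button = load_image([list("BBB")], colour_key="W")    # square, no white in it
checkbox = load_image([list("WoW")], colour_key="W")  # has a white background
page = [list(".......")]
print(draw(page, button, 0, 0, "W"))    # fast
print(draw(page, checkbox, 4, 0, "W"))  # keyed
print("".join(page[0]))                 # BBB..o.
```

The `allow_key` switch is the hypothetical equivalent of telling the compositor never to use the colour key on the final phase, whatever the image contains.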

Superb – the colour key rendering is only performed when required now! We’re done. Let’s look at the results.

Decal renderer, finished

That looks pretty damn good, and whilst not quite as fast as the old Decal, is fast enough for now. Code wise, today’s changes are longer than yesterday’s – this is often the way with programming. We got 99% of the way there with 2 lines, then it took dozens to get the last 1%. In fact, writing it took 2 passes – one to test the idea, which produced functional but messy code. Messy code will make it hard for the next person to get anywhere, so it was rewritten to be neater and more flexible for future users. For those interested in what the actual programming looks like, you can see the changes I made in this file.

This concludes this short article pair on fixing the renderer. If people find these interesting, I can certainly write up some more on other parts of the system. Throw me some ideas, and I’ll try and respond.

Adam Wright (as Asriel)

