2013-05-26

65c816: from assembly to C: a quest for die hard retroengineers!

Some things are impossible to accomplish

For the rest, it's just that you did not try hard enough.

a skeptic (sketchic?) Ray Arnold
This simple fact is behind my interest in 65c816 and the old games that were developed with it: challenge.

Although I had some Intel assembly experience, I clearly overestimated it - deep diving into 6502/65c816 assembly taught me (the hard way) how things work under the hood, and it's a fascinating digital-mechanical world.

On reverse engineering

Here be dragons! read more at Wikipedia and here, talking about ethics in computing.

What will happen when there is a law to regulate everything? You will cease to be free.

This information is provided for learning purposes, not for reverse engineering some company's algorithm to steal its precious industrial secrets.
So follow my advice and use reverse engineering for ethical purposes :)

Learning material

A good collection of reference material is essential to succeed. These are the sources I used most often:
In general, the other SNES resources at wikibooks are of good quality, so you can start from there.

As general advice: DO NOT use badly written manuals/guides (that is the norm nowadays), or anything that leaves you in doubt. Those doubts will accumulate even if you don't notice it, and you will waste your time in frustration.
When dealing with assembly you really need to reduce the number of variables you are dealing with! Make sure you have a grasp of each conditional branch (and the CPU flags it tests) and of how ROL, ADC and SBC work - tip: carry is not trivial! ;)
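
To make the carry tip concrete, here is a minimal C sketch of how ADC, SBC and ROL treat the carry flag. It assumes an 8-bit accumulator and binary (non-decimal) arithmetic; the Cpu8 struct and the function names are made up for illustration and model nothing beyond the accumulator and the carry.

```c
#include <stdint.h>

/* Hypothetical minimal CPU model: accumulator and carry flag only
 * (8-bit accumulator, binary mode; decimal mode is NOT modelled). */
typedef struct {
    uint8_t a;      /* accumulator */
    uint8_t carry;  /* carry flag, 0 or 1 */
} Cpu8;

/* ADC: A = A + operand + C. Carry is set when the sum does not fit in 8 bits. */
static void adc8(Cpu8 *cpu, uint8_t operand) {
    unsigned sum = (unsigned)cpu->a + operand + cpu->carry;
    cpu->carry = sum > 0xFF;
    cpu->a = (uint8_t)sum;
}

/* SBC: A = A - operand - (1 - C). Carry acts as an inverted borrow:
 * it stays set when no borrow occurred and is cleared on a borrow. */
static void sbc8(Cpu8 *cpu, uint8_t operand) {
    int diff = (int)cpu->a - operand - (1 - cpu->carry);
    cpu->carry = diff >= 0;
    cpu->a = (uint8_t)diff;
}

/* ROL: rotate left through carry. The old carry enters bit 0,
 * the old bit 7 becomes the new carry. */
static void rol8(Cpu8 *cpu) {
    uint8_t old_carry = cpu->carry;
    cpu->carry = (cpu->a >> 7) & 1;
    cpu->a = (uint8_t)((cpu->a << 1) | old_carry);
}
```

This is why code normally executes CLC before an addition and SEC before a subtraction: a stale carry silently changes the result by one.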

Other interesting pages


What I have learnt:

  • most past games ('90s era) were written directly in assembly (with macro aid, at best)
  • you had to be very creative to fit the limited resources of the consoles
  • dead code and mistakes are commonplace in these games
  • reverse-engineering without tools is hardly possible - I estimate that with double brain size I could probably have a better shot at it

Context limitation

We are human beings, thus we work best when we can focus on a few related abstractions rather than on a wide context with little correlation.

I am going to release a tool, asmdecor, that I developed as an aid for tool-assisted reverse engineering; hopefully it will be of help to other retroengineers as well.

Labels are your friends!

It doesn't matter whether they are correct - we need labels to reference abstractions. Absolute and relative memory addresses are not easy to remember.

NOTE: in the assembly world there is little difference between a label and a variable, since they both point to a memory location (although in different banks). For this reason I use the two terms interchangeably here.

I have come up with these rules to assign labels:
  • assign a label to the variables you are currently focusing on
  • remember to change the label once you have a better insight into its usage
  • if you did not succeed, remove the label - next time you will want to start from scratch instead of being conditioned by your previous (inconclusive) ideas
  • if some variable turns out to be described too generically, drop it
  • make smart use of "copyOf", "previous", "counter" etc.; basically you should favour relation-rich names
The tool I developed, asmdecor, helps with:
  • variables used only inside a specific procedure, prefixed as "tmp"
  • variables used only for writing, prefixed as "writeOut"
  • variables used only for reading, prefixed as "const"
  • read/write variables, prefixed with "var"
  • procedures and branching point recognition
All variables also carry a reference count in their name, to identify "high traffic"/"high value" buffers.
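
The prefix scheme above can be sketched in C as follows. This is only an illustration of the idea, not asmdecor's actual code: the VarUsage struct, the precedence of "tmp" over the other prefixes, and the prefix_address_refsN name format are all assumptions.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical usage statistics for one memory location. */
typedef struct {
    uint32_t address;       /* 24-bit address, bank included */
    unsigned reads;         /* number of instructions reading it */
    unsigned writes;        /* number of instructions writing it */
    bool single_procedure;  /* true if referenced from one procedure only */
} VarUsage;

/* Picks the prefix. Checking "tmp" first is an assumption of this sketch. */
static const char *label_prefix(const VarUsage *u) {
    if (u->single_procedure) return "tmp";
    if (u->reads == 0)       return "writeOut";
    if (u->writes == 0)      return "const";
    return "var";
}

/* Writes a label such as "var_7E0010_refs12" into out.
 * Returns the snprintf result (length the full label needs). */
static int make_label(const VarUsage *u, char *out, size_t out_size) {
    return snprintf(out, out_size, "%s_%06X_refs%u",
                    label_prefix(u), (unsigned)u->address,
                    u->reads + u->writes);
}
```

Keeping the reference count in the name means that a plain text search over the listing is enough to spot the busiest buffers.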

Tracing is your friend!

When trying to understand a piece of code you will find yourself countless times debugging/tracing (step by step) the code and comparing the values with your own version of the algorithm.

The more complex the piece of code you are trying to understand, the worse it will be - and you will quickly find out that you are doing it ineffectively.

But this was a quest for die hard retroengineers, remember? So I will share with you how I approach this problem - read on!

Vertical tracing


What I have found useful is a technique that I call "vertical tracing" (or "core sampling"), effective on loops and state machines (decompression etc).

This technique consists of creating two indexed stripes, one from your original 65c816 code (master stripe) and one from your reverse-engineered code (carbon copy stripe).
Each of these stripes is made up of multiple (key, value) tuples, created as follows:
  1. normally you want to sample at each iteration, so you will "tether" the loop by using the counter variable as the index of your stripe
  2. you define a "collection spot" for your loop, e.g. a specific variable in your code that you want to monitor in the master stripe
  3. create the corresponding collection spot in your code
  4. compare the master and carbon copy stripes
At step (4) it is important that you throw an exception or otherwise crash when your program is not matching the master stripe.
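
A minimal C sketch of steps (3) and (4), assuming the master stripe is already available as an array of 16-bit samples. The Stripe struct and the stripe_matches/stripe_collect names are hypothetical helpers, not part of any existing tool.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical cursor over the master stripe. */
typedef struct {
    const uint16_t *master;  /* samples taken from the original 65c816 code */
    size_t length;           /* number of samples in the master stripe */
    size_t next;             /* index of the next expected sample */
} Stripe;

/* Returns 1 and advances if value equals the next master sample,
 * 0 if it differs or if the master stripe is already exhausted. */
static int stripe_matches(Stripe *s, uint16_t value) {
    if (s->next >= s->length || s->master[s->next] != value)
        return 0;
    s->next++;
    return 1;
}

/* Collection spot for the carbon copy: call it once per loop iteration
 * with the monitored variable. Aborts at the first divergence, so the
 * debugger stops exactly where the two versions start to differ. */
static void stripe_collect(Stripe *s, uint16_t value) {
    if (!stripe_matches(s, value)) {
        /* Report a 1-based index, like the keys of a stripe. */
        fprintf(stderr, "stripe mismatch at sample %zu: got 0x%04X\n",
                s->next + 1, (unsigned)value);
        abort();
    }
}
```

In the reverse-engineered loop you would then place one `stripe_collect(&stripe, monitoredVariable);` call at the spot that corresponds to the collection spot in the original code.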

Example of stripe

1: 0x6300
2: 0x64C0
3: 0x6A00
4: 0x62B0
5: 0x710A
6: 0x712A

In a more compact form:
const char *masterStripe = "0x6300, 0x64C0, 0x6A00, 0x62B0, 0x710A, 0x712A";

By using the compact form you can string-compare it to your carbon copy stripe and thus easily verify behaviour from within your code.
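
As a sketch of that string comparison, the carbon copy stripe can be accumulated in the same textual format and checked with strcmp. The stripe_append helper is hypothetical; it assumes 16-bit samples printed as four uppercase hex digits separated by ", ".

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Appends one sample to buf using the "0x6300, 0x64C0" layout.
 * buf must already hold a NUL-terminated string (possibly empty).
 * Returns 0 on success, -1 if the sample did not fit in buf. */
static int stripe_append(char *buf, size_t buf_size, uint16_t value) {
    size_t used = strlen(buf);
    int n = snprintf(buf + used, buf_size - used, "%s0x%04X",
                     used ? ", " : "", (unsigned)value);
    return (n < 0 || (size_t)n >= buf_size - used) ? -1 : 0;
}
```

After the loop has finished, `strcmp(carbonCopy, masterStripe) == 0` tells you whether the two versions behaved identically at the collection spot.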

Note that you can also choose to collect the decision taken at a specific branch instead of the value of some variable.

How to generate the master stripe


The most convenient way is to create a trace of the unit of logic you want to reproduce, then use grep/bash scripts to extract the stripe.
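
If you would rather stay in C than use grep, the extraction can be sketched like this. The trace line layout is an assumption (a line that starts with the program counter and contains the accumulator as " A:" followed by hex digits); emulator trace formats differ, so adapt the parsing to the one you use. The address in the example is made up.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Extracts one sample from a trace line.
 * ASSUMED line format (hypothetical), e.g.:
 *   "$80/9A2C 8D 00 21 STA $2100 A:6300 X:0002 Y:0000"
 * If the line starts with pc_prefix (the collection spot address) and
 * contains " A:<hex>", stores the accumulator value and returns 1.
 * Returns 0 for every other line. */
static int sample_from_trace_line(const char *line, const char *pc_prefix,
                                  unsigned *value) {
    if (strncmp(line, pc_prefix, strlen(pc_prefix)) != 0)
        return 0;
    const char *field = strstr(line, " A:");
    if (field == NULL)
        return 0;
    char *end;
    unsigned long parsed = strtoul(field + 3, &end, 16);
    if (end == field + 3)  /* no hex digits after "A:" */
        return 0;
    *value = (unsigned)parsed;
    return 1;
}
```

Feeding every line of the trace file through this function (for example with fgets) and printing the matches gives you the master stripe for that collection spot.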

Summary: how to write C code from 65c816 assembly

No, there is no magic tool to convert from assembly to C*.

"Isn't tracing a bit like DNA extraction?",
the wise raptor asked.



Tracing, variable context heuristics and procedure identification are the key ingredients.
asmdecor will help you with the latter two, but you still have to analyze it with your mind (and that's the fun part ;) ).

Having said all of the above, I am left with just a few tips to give you:
  1. identify the procedures, and do not try to study too many procedures at once; ideally you should start with the minimum number of procedures needed
  2. put comments with the original addresses near the branch points; they will be helpful later for lookup when you are tracing or doing step-by-step debugging
  3. use original dumps, like sections of WRAM and ROM, to initialize your buffers
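
For tip (3), a minimal sketch of loading a raw dump into a buffer before running the reverse-engineered code. The load_dump helper and the idea of a plain binary dump file are assumptions; how you obtain the dump depends on your emulator.

```c
#include <stdint.h>
#include <stdio.h>

#define WRAM_SIZE 0x20000  /* SNES work RAM is 128 KiB */

/* Buffer standing in for the console's work RAM. */
static uint8_t wram[WRAM_SIZE];

/* Reads at most size bytes of a raw binary dump into buffer.
 * Returns the number of bytes read, or 0 if the file cannot be opened. */
static size_t load_dump(const char *path, uint8_t *buffer, size_t size) {
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return 0;
    size_t n = fread(buffer, 1, size, f);
    fclose(f);
    return n;
}
```

Calling `load_dump(path, wram, WRAM_SIZE)` with a dump taken right before the routine runs lets your C version start from the same state as the original, so the stripes are comparable from the very first sample.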
That's all folks, I hope this read was useful - send me your feedback!




* = It could be an interesting project; however, the generated code would be hardly readable and you would still need to understand it.
