Some things are impossible to accomplish. For the rest, it's just that you did not try hard enough.
a skeptic (sketchic?) Ray Arnold
Although I had some Intel assembly experience, I clearly overestimated it - deep diving into 6502/65c816 assembly taught me (the hard way) how things work under the hood, and it's a fascinating digital-mechanical world.
On reverse engineering
Here be dragons! Read more at Wikipedia and here, talking about ethics in computing.
What will happen when there is a law to regulate everything? You will cease to be free.
This information is provided for learning purposes, not to reverse engineer some company's algorithm and steal its precious industrial secrets.
So follow my advice and use reverse engineering for ethical purposes :)
Learning material
A good collection of reference material is essential to succeed. These are the sources I used most often:
- 6502 Instruction Set, this is technical and complete (you will not find wording like "I think", "maybe", "like the others" etc), many thanks to Cloudgen for putting this online
- SNES Hardware registers, good for reference
- Qwertie's SNES documentation, this is not of very good quality (and old!) but I've used it in the beginning
- Snes9x' cpu macros (cpumacro.h), when you are in doubt about how an instruction behaves, go read how it is emulated!
As general advice: DO NOT use badly written manuals/guides (which are the norm nowadays), or anything that leaves you in doubt. Those doubts will accumulate even if you don't notice it, and you will waste your time in frustration.
When dealing with assembly you really need to reduce the number of variables you are dealing with! Make sure you have a grasp of each conditional branch (vs CPU flags) and of how ROL, ADC and SBC work - tip: carry is not trivial! ;)
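To make the carry tip concrete, here is a minimal C sketch of how carry enters and leaves ROL, ADC and SBC. It assumes 8-bit accumulator mode and binary (non-decimal) arithmetic, and it ignores the N, V and Z flags; the names AluResult, adc8, sbc8 and rol8 are mine, not taken from any emulator.

```c
#include <assert.h>
#include <stdint.h>

/* New accumulator value plus the new carry flag (0 or 1). */
typedef struct {
    uint8_t value;
    uint8_t carry;
} AluResult;

/* ADC: A + M + C. Carry out is set when the sum does not fit in 8 bits. */
static AluResult adc8(uint8_t a, uint8_t m, uint8_t carry_in)
{
    assert(carry_in <= 1);
    unsigned sum = (unsigned)a + m + carry_in;
    AluResult r = { (uint8_t)sum, (uint8_t)(sum > 0xFF) };
    return r;
}

/* SBC: A - M - (1 - C). Carry means "no borrow": it stays set
 * when the subtraction did not go below zero. */
static AluResult sbc8(uint8_t a, uint8_t m, uint8_t carry_in)
{
    assert(carry_in <= 1);
    int diff = (int)a - (int)m - (1 - (int)carry_in);
    AluResult r = { (uint8_t)diff, (uint8_t)(diff >= 0) };
    return r;
}

/* ROL: shift left by one. The old carry enters bit 0 and the
 * old bit 7 becomes the new carry. */
static AluResult rol8(uint8_t a, uint8_t carry_in)
{
    assert(carry_in <= 1);
    AluResult r = { (uint8_t)((a << 1) | carry_in), (uint8_t)(a >> 7) };
    return r;
}
```

The part that tends to bite is SBC: a subtraction only behaves like plain `a - m` when carry was set beforehand (SEC), and a clear carry afterwards signals a borrow, the opposite of what Intel habits suggest.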
Other interesting pages
- Computer History and Emulation by Marat Fayzullin, check out his proposed instruction set features for the 65c820: an interesting perspective from a user of the 65c816 instruction set on a processor that never existed
- Easy 6502 by skilldrick, a well-made guide that I found later on
- tehskeen game tutorial, complete with zip files and sources (thanks to The Wayback Machine)
What I have learnt:
- most past games ('90s era) were written directly in assembly (with macro aid, at best)
- you had to be very creative to fit the limited resources of the consoles
- dead code and mistakes are commonplace in these games
- reverse engineering without tools is hardly possible - I estimate that with double the brain size I would probably have a better shot at it
Context limitation
We are human beings, thus we work best when we can focus on a few related abstractions rather than on a wide context with little correlation.
I am going to release a tool, asmdecor, that I developed as an aid for tool-assisted reverse engineering; hopefully it will be of help to other retroengineers as well.
Labels are your friends!
It doesn't matter whether they are correct - we need labels to reference abstractions. Absolute/relative memory addresses are not easy to remember.
NOTE: in the assembly world there is little difference between a label and a variable, since they both point to a memory location (although in different banks). Here I use them interchangeably for this reason.
I have come up with these rules to assign labels:
- assign a label to the variables you are currently focusing on
- remember to change the label once you have a better insight into its usage
- if you did not succeed, remove the label - next time you will want to start from scratch instead of being conditioned by your previous (inconclusive) ideas
- if some variable turns out to be too generically described, drop it
- make smart use of "copyOf", "previous", "counter" etc.; basically you should favour relation-rich names
- variables used only inside a specific procedure are prefixed with "tmp"
- variables used only for writing are prefixed with "writeOut"
- variables used only for reading are prefixed with "const"
- read/write variables are prefixed with "var"
- label procedures and branch points as you recognize them
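As an illustration of the rules above, a label table can be as simple as an array of (address, name) pairs. This is only a sketch: every address and name below is made up, and find_label and has_prefix are hypothetical helpers, not part of asmdecor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t    addr;  /* 24-bit address, bank in the top byte */
    const char *name;  /* working label, expected to change over time */
} Label;

/* Hypothetical labels following the prefix rules. */
static const Label labels[] = {
    { 0x7E0010, "tmpLoopCounter"       },
    { 0x7E0012, "varSrcOffset"         },
    { 0x7E0014, "varPreviousSrcOffset" },
    { 0x7E2000, "writeOutDecompressed" },
    { 0x80C000, "constLengthTable"     },
};

/* Returns the label assigned to addr, or NULL when there is none. */
static const char *find_label(uint32_t addr)
{
    for (size_t i = 0; i < sizeof labels / sizeof labels[0]; i++) {
        if (labels[i].addr == addr)
            return labels[i].name;
    }
    return NULL;
}

/* Returns 1 when name starts with a convention prefix such as "tmp". */
static int has_prefix(const char *name, const char *prefix)
{
    assert(name != NULL && prefix != NULL);
    return strncmp(name, prefix, strlen(prefix)) == 0;
}
```

Keeping the table in one place makes renaming cheap, which matters because most labels get rewritten (or dropped) as your understanding improves.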
Tracing is your friend!
When trying to understand a piece of code you will find yourself countless times debugging/tracing (step by step) the code and comparing the values with your own version of the algorithm.
The more complex the piece of code you are trying to understand, the worse it gets - and you will quickly find out that you are doing it ineffectively.
But this was a quest for die-hard retroengineers, remember? So I will share with you how I approach this problem - read on!
What I have found useful is a technique that I call "vertical tracing" (or "core sampling"), effective on loops and state machines (decompression etc).
This technique consists of creating two indexed stripes, one from your original 65c816 code (master stripe) and one from your reverse-engineered code (carbon copy stripe).
Each of these stripes is made up of multiple (key, value) tuples, created as follows:
- normally you want to sample at each iteration, so you will "tether" the loop by using the counter variable as the index of your stripe
- you define a "collection spot" for your loop, e.g. a specific variable in your code that you want to monitor in the master stripe
- create the corresponding collection spot in your code
- compare the master and carbon copy stripes
Example of a stripe:
1: 0x6300
In a more compact form:
const char *masterStripe = "0x6300, 0x64C0, 0x6A00, 0x62B0, 0x710A, 0x712A";
By using the compact form you can string-compare it to your carbon copy stripe and thus easily verify behaviour from within your code.
Note that you can also choose to collect the decision taken at a specific branch instead of the value of some variable.
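Here is a rough sketch of how the carbon copy side could look in C. It assumes 16-bit samples written in the same "0xNNNN, " layout as the compact form above; Stripe, stripe_init, stripe_collect and stripe_first_mismatch are names I made up for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define STRIPE_MAX 256

typedef struct {
    char   text[STRIPE_MAX]; /* compact form: "0x6300, 0x64C0, ..." */
    size_t len;
} Stripe;

static void stripe_init(Stripe *s)
{
    s->text[0] = '\0';
    s->len = 0;
}

/* Collection spot: append one sampled value to the stripe. */
static void stripe_collect(Stripe *s, uint16_t value)
{
    assert(s != NULL);
    int n = snprintf(s->text + s->len, STRIPE_MAX - s->len, "%s0x%04X",
                     s->len ? ", " : "", (unsigned)value);
    /* Fails when the stripe buffer is too small for another sample. */
    assert(n > 0 && (size_t)n < STRIPE_MAX - s->len);
    s->len += (size_t)n;
}

/* Returns the 0-based index of the first sample that differs (or is
 * missing), or -1 when both stripes are identical. Relies on every
 * sample taking 6 characters plus a 2-character separator. */
static int stripe_first_mismatch(const char *master, const char *copy)
{
    size_t p = 0;
    while (master[p] != '\0' && master[p] == copy[p])
        p++;
    if (master[p] == copy[p])
        return -1; /* both strings ended at the same position */
    return (int)((p + 2) / 8);
}
```

You would call stripe_collect once per iteration at the collection spot of your reverse-engineered loop, then compare the result against masterStripe. Knowing the index of the first diverging sample tells you at which iteration to set a breakpoint in the original code.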
How to generate the master stripe
The most convenient way is to create a trace of the unit of logic you want to reproduce, then use grep/bash scripts to extract the stripe.
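If you would rather stay in C than use grep, the extraction step can be sketched as a per-line filter. Note that the trace layout is an assumption here: I pretend each line starts with the program counter (e.g. "$80/8123") and contains the accumulator as " A:XXXX", which you will need to adapt to whatever your emulator actually prints. sample_accumulator is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* If line starts with pc_prefix (the collection spot) and has an
 * " A:" field, parse the hex value after it into *out.
 * Returns 1 on success, 0 when the line is not a sample.
 * The line layout is an assumed one, not a documented format. */
static int sample_accumulator(const char *line, const char *pc_prefix,
                              uint16_t *out)
{
    assert(line != NULL && pc_prefix != NULL && out != NULL);
    if (strncmp(line, pc_prefix, strlen(pc_prefix)) != 0)
        return 0;
    const char *field = strstr(line, " A:");
    if (field == NULL)
        return 0;
    unsigned value;
    if (sscanf(field + 3, "%4x", &value) != 1)
        return 0;
    *out = (uint16_t)value;
    return 1;
}
```

Feeding every line of the trace (read with fgets, for instance) through this function and printing each hit as "0x%04X" gives you the master stripe in the compact form.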
Summary: how to write C code from 65c816 assembly
No, there is no magic tool to convert from assembly to C*.
"Isn't tracing a bit like DNA extraction?", the wise raptor asked.
Tracing, variable context heuristics and procedure identification are the key ingredients.
asmdecor will help you with the latter two, but you still have to analyze it with your mind (and that's the fun part ;) ).
Having covered all of the above, I am left with just a few tips to give you:
- identify the procedures, and do not try to study too many procedures at once; ideally you should start with the minimum number of procedures that are needed
- put comments with the original addresses near the branch points; they will be helpful later for lookup when you are tracing or doing step-by-step debugging
- use original dumps to initialize your buffers, like sections of WRAM and ROM
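For the last tip, a minimal sketch of initializing a buffer from a dump could look like this. The file name is up to you and load_dump / wram_read16 are hypothetical helpers; the only hardware facts relied upon are that the SNES has 128 KiB of WRAM and that the 65c816 stores 16-bit values little-endian.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define WRAM_SIZE 0x20000 /* 128 KiB of work RAM */

static uint8_t wram[WRAM_SIZE];

/* Loads up to size bytes from a dump file into buf.
 * Returns the number of bytes read, or 0 if the file cannot be opened. */
static size_t load_dump(const char *path, uint8_t *buf, size_t size)
{
    assert(path != NULL && buf != NULL);
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return 0;
    size_t n = fread(buf, 1, size, f);
    fclose(f);
    return n;
}

/* Reads a 16-bit little-endian value at an offset into WRAM. */
static uint16_t wram_read16(uint32_t offset)
{
    assert(offset + 1 < WRAM_SIZE);
    return (uint16_t)(wram[offset] | (wram[offset + 1] << 8));
}
```

Starting from the same memory contents as the original code is what makes the master and carbon copy stripes comparable in the first place.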
* = It could be an interesting project; however, the generated code would be barely readable and you would still need to understand it