Progress!

I haven’t written anything here in a while, but I’ve made a lot of progress: we’re now successfully scrolling in the Nintendo logo, loaded from the game ROM. And we finally wrote some opcode functions that determine the target and source registers from bits of the opcode themselves. For example, we now have an operation::LD that we use 63 times, instead of needing to define 63 different functions.

Disassembler?

I was thinking about writing a disassembler, but I was confused.

If the entire ROM were nothing but opcodes, I could parse that, one at a time.

But if big chunks of it are random data, how can I tell that apart? If I start parsing those as instructions, I’ll probably be mis-aligned when I get back to actual instructions.

I guess it’s not solvable in the 100% general case, since programs can in theory (but probably only in the case of like demos in practice) use the same memory as both codes and data.

I imagine you can track known opcode addresses, starting with the entry point and then looking at any of its static jumps. but you can also do computed jumps to any address, and you can’t figure out every possible value for those.

Maybe you just statically analyze what you can (assumiung no data/code co-use), and then get some guidance from analyzing actual use during execution, and only use humans for the remaining disambiguation.

Maybe I should do some reading instead of just speculating.

I guess partial disassembly might not be too bad: you just include the unrecognized segments in the output as data sections, so you can preserve them while making (at least certain) modifications to the other code around them.

That actually sounds quite reasonable.

be okay… what exactly does that mean?

It means that your recompiled result will be identical IFF your modified code is exactly the same size as the input, so you don’t mess up any alignment. If you’re careless, you will mess up the alignment, but there are lots of classic assembly/ROM hacking techniques for making changes to code without messing up alignment. so, I might want to make sure that the recompiled code doesn’t move anything around – just aborting when you try to recompile, if you might have messed up alignment. every instruction might be prefixed with its address in the original compiled binary:

    0x0000:
        ADD B, $0F
    0x0002:
        ADD C, A
    0x0003:
    DATA 0x125413542652466476462426524

So if you replace ADD C, A with an instruction that requires more than one byte, it won’t be able to keep the data in its original 0x0003 address, because there won’t be enough space.

But First

I’ve also described a few more concrete objectives in GitHub issues:

Opcodes
https://github.com/jeremyBanks/0dmg/issues/1
Graphics debug pannes
https://github.com/jeremyBanks/0dmg/issues/2
Interrupts (writing this was eductional)
https://github.com/jeremyBanks/0dmg/issues/3
Blargg’s test suites
https://github.com/jeremyBanks/0dmg/issues/4
Command-line debugger capabilities
https://github.com/jeremyBanks/0dmg/issues/5

I’ll keep mulling over the idea of a disassembler and assembler, but might not start unless I’m more confident how I’d want to to integrate that code with the interpreter.