Development notes/Windows

DLL injection, debug loader, both, or what?

So... originally I wanted to refactor the whole functionality of thcrap into a single DLL, so that we are portable to any other patch that needs debug-level access (most prominently, this would be Rollcaster).

First problem is that we have to somehow replicate the breakpoint functionality without the Win32 debug API.

... which, already, seems to be impossible while debugging the game.

Solution: Force UnhandledExceptionFilter to always run the custom exception filter, even in debug mode.

@7c864065: e9 97 00 00 00 90 (for XP SP3's kernel32.dll)

But patching kernel32.dll's code in memory is like the most cruel hack ever, and I'd be surprised if virus scanners didn't scream at this. Also, it's OS- and, probably, even version-dependent.

And it doesn't work for Visual C++, anyway.

Is this even a valid test case?

... after all, we couldn't even debug some of the hacks with the old approach.

OK, Let's Do This The Hard Way.

As in: Instead of breakpoints, call a DLL function.

Step 1: Assemble CALL [thcrap.BreakpointHandler] at the breakpoint. (Using CALL instead of JMP pushes the "return address" on the stack - which can be used to derive the breakpoint address we need) (th13.exe test scenario: breakpoint on 47bda0, original code: e8f79e0000)

Step 2: Before doing anything else, push all registers on the stack. We don't want to accidentally overwrite our breakpoint parameters.

(Which means that BreakpointHandler needs to be fully coded in assembly - any C function does some register juggling of its own before even executing its first instruction)

-> (2012-12-01: Not true! See __declspec(naked)!)

__asm {
	pushfd
	pushad
}

Step 3: Breakpoint address (plus 5) is now in [esp+0x24]: pushfd took 4 bytes and pushad another 0x20, so the return address sits right above them.

__asm {
	sub dword ptr ss:[esp+0x24], 5
	mov eax, dword ptr ss:[esp+0x24]
}

Step 4: Build parameter array by reading the values saved in Step 2.

Step 5: Write original code back to breakpoint address. Don't forget to "VirtualUnProtect" the memory.

__asm {
	sub esp, 0x20	; scratch space: 0x1c bytes for mbi + 4 bytes for the old protection
	mov ecx, esp
	push 0x1c	; dwLength = sizeof(MEMORY_BASIC_INFORMATION)
	push ecx	; lpBuffer = &mbi
	push eax	; lpAddress = breakpoint address (from Step 3)
	call KERNEL32.VirtualQuery
	
	mov ebp, esp	; ebp = &mbi
	mov ecx, ebp
	add ecx, 0x1c
	push ecx 	; lpflOldProtect
	push 0x40 	; flNewProtect = PAGE_EXECUTE_READWRITE
	push dword ptr ss:[ebp+0xc]	; dwSize = mbi.RegionSize
	push dword ptr ss:[ebp]  	; lpAddress = mbi.BaseAddress
	call KERNEL32.VirtualProtect
	
	; ... then write stuff back.
	
	mov ecx, ebp	; ecx is not preserved across the call, so recompute it
	add ecx, 0x1c
	push ecx 	; lpflOldProtect
	push dword ptr ds:[ecx] 	; flNewProtect = *lpflOldProtect
	push dword ptr ss:[ebp+0xc]	; dwSize = mbi.RegionSize
	push dword ptr ss:[ebp]  	; lpAddress = mbi.BaseAddress
	call KERNEL32.VirtualProtect
	
	add esp, 0x20
}

Step 6: Restore registers and return.

__asm {
	popad
	popfd
	retn
}

... oh, wait, and how do we make this repeatable? WE CAN'T.

The obvious solution would be to build code caves to execute the code we've overwritten, and to then jump back. But we can't know the instruction boundaries...

... except if we manually determine them and add them as an additional parameter to each breakpoint.

But this also comes with harsh restrictions: no relative jumps or calls within 5 bytes of the breakpoint. This already fails for the first breakpoint I looked at (th13 file_size).

... OK, good, we can place it a few bytes earlier so that it would still work.

And with another feature to make the execution of the code cave optional, this might just work out. Going to test this tomorrow by fully changing everything to that system, and seeing how far I'll get with this. \o

Building the code cave approach

First thing I noticed while looking through what I wrote yesterday: Saving the x86 general purpose registers does not a full snapshot make. At the very least, we'll be needing the flags register.
And the FPU stack. (?)
And the MMX registers. (?)
... well, most of the stuff in the CONTEXT structure, really.

(Hindsight: SetUnhandledExceptionFilter wouldn't have worked anyway. th13 calls this function to set its own "breakpoint handler". This would overwrite our handler and thus remove all of our breakpoint functionality.) (... OK, good, we can overwrite this easily, but who knows how that may affect the game.)

... Eh, wait, not so fast.

Let's first refactor our current prototype to use the JSON files.

Ideally, it should work without any parameters - you just feed the EXE, it gets hashed, and then the correct injection values are selected.

And by doing this, you notice that you do want all the build hashes in one file, after all.

So, how would we call the patch when it's done?

Either:

	thcrap <EXE file> <zipfile> <zipfile> <zipfile> <zipfile> ...

Or:

	thcrap <setup.js>

Example

th13-en-de-nodanmaku.js

{
    "patchcentral_dir_cmt": "Directory which holds game_versions.js and the rest of the game support data. Defaults to the same directory as thcrap.exe.",
    "patchcentral_dir": "T:\\thcrap\\central",
    "exe": "T:\\13\\i_am_a_renamed_13th_2hu_gaem.exe",
    "patches": [
        {
            "archive": "T:\\thcrap\\central\\th13\\en.zip"
        },
        {
            "dir": "T:\\thcrap\\central\\th13\\de\\",
            "ignore-files": [
                "spellcards",
                "themedb"
            ]
        },
        {
            "archive": "T:\\thcrap\\central\\th13\\nodanmaku.zip"
        }
    ],
    "font": "Calibri"
}

It should be obvious what this configuration does:

Applies English, German and danmaku removal patches, in this order. Text appears in German where available, otherwise in English, and if that's not done either, in Japanese.
German translations for spell cards and music titles are ignored, leaving them in English.
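The lookup order described above could be sketched roughly like this. Everything here is a hypothetical in-memory model (`patch_t`, `stack_resolve`, and the assumption that `ignore-files` entries are matched by exact name); the real patcher would read archives and directories instead.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one entry in the "patches" array. */
typedef struct {
    const char *name;          /* e.g. "en", "de" */
    const char *const *files;  /* NULL-terminated list of files the patch provides */
    const char *const *ignore; /* NULL-terminated "ignore-files" list, or NULL */
} patch_t;

static int list_contains(const char *const *list, const char *fn)
{
    if (!list) {
        return 0;
    }
    for (; *list; list++) {
        if (strcmp(*list, fn) == 0) {
            return 1;
        }
    }
    return 0;
}

/* Walks the stack from the last patch to the first and returns the name of
 * the first patch that provides `fn` and does not ignore it.
 * NULL means "fall back to the original game data". */
const char *stack_resolve(const patch_t *stack, size_t count, const char *fn)
{
    while (count--) {
        const patch_t *p = &stack[count];
        if (list_contains(p->ignore, fn)) {
            continue; /* this patch opted out of the file */
        }
        if (list_contains(p->files, fn)) {
            return p->name;
        }
    }
    return NULL;
}
```

With an English patch followed by a German one that ignores `spellcards`, a dialogue file resolves to the German patch and `spellcards` resolves to the English one.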

On the command line, without an extra file:

	thcrap T:\13\i_am_a_renamed_13th_2hu_gaem.exe --font:Calibri 1:T:\thcrap\central\th13\en.zip 2:T:\thcrap\central\th13\de\ --ignore-files:spellcards,themedb 3:T:\thcrap\central\th13\nodanmaku.zip

A simpler one

	thcrap T:\09\th09.exe 1:T:\thcrap\central\th09\fr.zip

... setup.js is probably the preferred method ^_^ -> ... and, thus, the only one to be used for now. For the command line, I'd have to write a separate parser. JSON I can already easily evaluate, thanks to Jansson.

OK, now what.

Load setup file
Change directory to <patchcentral_dir>, if given
Hash EXE
Look up hash in game_versions.js
Load base + version-specific JSON support files
???
Run EXE
Inject thcrap.dll
Read the function pointers of all functions exported by thcrap.dll into a JSON hash table
Render and apply binary hacks
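The "hash EXE, look up hash" steps might look like the sketch below. The hash function is a placeholder (32-bit FNV-1a); these notes don't say which algorithm is actually used, and `version_t` is a made-up stand-in for whatever game_versions.js ends up holding.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder hash (32-bit FNV-1a). The real algorithm is an open choice. */
uint32_t exe_hash(const unsigned char *data, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 16777619u;
    }
    return h;
}

/* Hypothetical flattened form of one game_versions.js entry. */
typedef struct {
    uint32_t hash;
    const char *game;  /* e.g. "th13" */
    const char *build; /* e.g. "v1.00c" */
} version_t;

/* Returns the matching entry, or NULL for an unknown EXE. */
const version_t *version_lookup(const version_t *table, size_t count, uint32_t hash)
{
    for (size_t i = 0; i < count; i++) {
        if (table[i].hash == hash) {
            return &table[i];
        }
    }
    return NULL;
}
```

The entry found this way would then select the base and version-specific JSON support files.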

Now, that wasn't too difficult ^_^

The New Injection System

Right now, the patch code is separated into thcrap_loader.exe (DLL injection, processing of breakpoints) and thcrap.dll (Unicode compatibility hooks). This decision was necessary because we need a separate application to function as the debugger, as well as a command-line parameter pointing to the patch configuration.

As we're now rewriting the breakpoint system The Hard Way, the debugger part is no longer necessary. Instead, we can now aim for a maximum of compatibility with existing patches needing external loaders.

That leaves the parameter... which we could pass using named pipes~ (I've tried to use named pipes before, but never got them to work, as they require precise synchronization of calls in both programs. But now, I've got them to work. Hooray for increased programming skill.)

This way, we could move the entire patch code to thcrap.dll, while the loader only has to do the DLL injection and the named pipe handling.

Bu~t we could go a bit further.

thcrap_injector.dll

All this DLL does is to inject thcrap.dll into a given process and pass the parameter via named pipe.

With this DLL, any program can use thcrap with as little effort as possible. Just call LoadLibrary, GetProcAddress, and execute the function.

And here's the catch: This process is so simple that we could even write it in assembly into programs for which we don't have the source code! (... OK, good, we need to find a way to pass the patch configuration, but that should be no big deal. Especially for patches which use configuration files on their own.)

Patched netplay via Adonis and Rollcaster, here we come.

Why do we need a separate DLL?

Looks better and draws attention to that feature.
thcrap.dll contains the named pipe and injection code. Loading this DLL without injecting it into a Touhou game may have dangerous and undesired consequences...

... and after re-writing the injection code, that one argument was so weak that I decided against it and just put all of it into thcrap.dll.

thcrap_loader.exe

This gets reduced to... exactly the same code we would have to write into other patches to load the thcrap_injector. Only that we can write it in C.

And, of course, we need to launch the .exe. Yeah, maybe this is _a bit_ backwards, as we have to parse the patch configuration file anyway to find out which game to start and which process to inject the DLL into.

Codecaves, again

Good news. Turns out we only need 5 bytes for the ~~jump~~ call instead of 6 (... really, what was I thinking). That's a big difference!

To ease implementation, we hardcode every cave to ~~16~~ 32 bytes.

With the 5-byte return jump, this leaves up to 27 bytes for copied code (11 with the original 16-byte size).
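For reference, the 5-byte form is the relative CALL: opcode E8 followed by a 32-bit displacement counted from the end of the instruction. A minimal sketch of assembling it, and of recovering the breakpoint address from the return address the CALL pushes (both function names are made up for illustration):

```c
#include <stdint.h>

#define CALL_LEN 5 /* E8 + rel32 */

/* Writes "CALL target" as it would be assembled at address `at`. */
void emit_call_rel32(uint8_t out[CALL_LEN], uint32_t at, uint32_t target)
{
    uint32_t rel = target - (at + CALL_LEN); /* relative to the next instruction */
    out[0] = 0xE8;
    out[1] = (uint8_t)(rel);
    out[2] = (uint8_t)(rel >> 8);
    out[3] = (uint8_t)(rel >> 16);
    out[4] = (uint8_t)(rel >> 24);
}

/* The return address pushed by the CALL points just past it. */
uint32_t breakpoint_from_return(uint32_t ret)
{
    return ret - CALL_LEN;
}
```

As a cross-check against the th13 test scenario: the original bytes e8f79e0000 at 47bda0 are themselves such a CALL, to 47bda5 + 9ef7 = 485c9c.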

... alright, looking good OH SHIT, WE CAN'T EVEN INCLUDE RELATIVE CALLS AARRRGGHHH

... OK, good, most of the time, we only have one call at the beginning of the code cave. This means that we could add a hack to fix it...
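One way such a hack could work, as a sketch only: since the displacement of a relative CALL is counted from the instruction's own position, moving it into the cave means adding the distance between the old and the new location. The function name and the restriction to a single leading E8 are assumptions based on the case described above.

```c
#include <stddef.h>
#include <stdint.h>

/* If the code copied into the cave starts with CALL rel32 (E8), rewrites its
 * displacement so it still reaches the original target.
 * orig_addr: where the instruction used to live; cave_addr: where it is now.
 * Returns 1 if a call was fixed, 0 if there was nothing to do. */
int cave_fix_leading_call(uint8_t *cave, size_t len, uint32_t orig_addr, uint32_t cave_addr)
{
    if (len < 5 || cave[0] != 0xE8) {
        return 0;
    }
    uint32_t rel = (uint32_t)cave[1] | ((uint32_t)cave[2] << 8)
                 | ((uint32_t)cave[3] << 16) | ((uint32_t)cave[4] << 24);
    /* target = orig_addr + 5 + rel_old = cave_addr + 5 + rel_new */
    rel += orig_addr - cave_addr;
    cave[1] = (uint8_t)(rel);
    cave[2] = (uint8_t)(rel >> 8);
    cave[3] = (uint8_t)(rel >> 16);
    cave[4] = (uint8_t)(rel >> 24);
    return 1;
}
```

Calls or jumps further inside the copied slice are not handled here; those would still need the instruction boundaries.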

... ...

Alright! Breakpoints _basically_ work now.

Patch stacking

... anyway, done already.

Which means that we're only missing one feature:

on-the-fly JSON -> .msg patching

Problem: Prototype is written in Python, and integrating Python into C would add some 1-2 MB of code to the program.

Since I hate bloat like this, and the rest of the thcrap binaries are still under 40 KB, including the Python code directly is not an option for me.

That leaves 2 possibilities:

Compile the Python to C using Shedskin
Rewrite the complete code in C

Rewrite in C

Alright, let's look through everything that will make that particularly difficult or annoying:

Parameters

Those are stored as JSON objects anyway.

Number arrays could be quickly and painlessly converted to variable-size C arrays (God bless alloca)
Function pointers are stored using the existing DLL export/function pointer interface used for the binary hacks.
I decided against conditional deletion anyway (yup, no centered single-line assist dialog boxes in th11 for you), as it would make the code way more ugly. And guess what, that would have also been the only difficult thing to rebuild in C.

... Two days later, the rewrite was complete - and guess what, the new patcher is not that much more complex than the Python version, and it even feels a lot cleaner!

What happened in the meantime:

The patchcentral definition in the run configuration was removed. Now, this is simply treated as another patch.
I didn't use named pipes after all. Might go back to it once I add support for "run configuration stacking".

Images

This creates a slight problem as far as patch stacking is concerned. We can't just simply overlay transparent images on each other.

Patch stacking and minimizing redundancy, the cop-out

So, how about distinguishing between 8-bit alpha and 1-bit alpha?

8-bit alpha replacements are unstackable (is that a word?). If such a replacement is found for a specific source image, it is either overlaid (if the source image has no alpha channel) or fully replaces the original image otherwise.
1-bit alpha replacements are stackable. They only make sense if the source image has no alpha channel itself, though.

Unfortunately, there is no "25-bit" PNG format. This means that these 1-bit replacements can only have a maximum of 255 colors...

... except, of course, if there is (or if we can write) a function to analyze the actual alpha usage in an image regardless of its actual bit depth.
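Such an analysis function is not hard to write once the image has been decoded. A sketch, assuming the pixels are already available as 8-bit RGBA with alpha in the fourth byte (the decoding step and all names here are hypothetical):

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { ALPHA_NONE, ALPHA_1BIT, ALPHA_8BIT } alpha_usage_t;

/* Classifies how the alpha channel of `pixels` RGBA8888 pixels is used:
 * ALPHA_NONE: everything is opaque
 * ALPHA_1BIT: only fully opaque and fully transparent pixels
 * ALPHA_8BIT: at least one partially transparent pixel */
alpha_usage_t alpha_analyze(const uint8_t *rgba, size_t pixels)
{
    alpha_usage_t ret = ALPHA_NONE;
    for (size_t i = 0; i < pixels; i++) {
        uint8_t a = rgba[i * 4 + 3];
        if (a == 0x00) {
            ret = ALPHA_1BIT;
        } else if (a != 0xFF) {
            return ALPHA_8BIT; /* can't get any "worse", stop early */
        }
    }
    return ret;
}
```

This would let a 32-bit PNG count as a stackable 1-bit replacement without the palette limit.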

Patch stacking and minimizing redundancy, for real

... and then you see front/front00.png and realize that you need a better system anyway. Not only because of patch stacking, but also because of bloat - if we could remove the UI border and other content that probably doesn't require translation, we'd end up with a 140 KB file, less than 10% of the original file size.

PNG does not support a "second alpha channel" or something similar to that. So, what do?

Add a separate gray-scale mask image?

Probably simple to implement, yeah, but is bound to annoy image editors considerably.

Add a JSON file with a bunch of sprite rectangle definitions?

And who is going to write these files?

Replace on sprite level, splitting the original file into one replacement file per sprite?

I don't think the image editors will like this... and I don't really like bunches of small images either. Plus, I'd have to write a parser to split them like this.

Replace on sprite level, still with the original image layout, using sprite-local alpha analysis.

That sounds cool... and pretty much like the only "right" thing to do.

We iterate over all the sprite rectangles in an ANM and analyze the alpha usage of the replacement image at that rectangle. If all pixels are fully transparent, nothing is replaced. Otherwise, the entire replacement rectangle is taken.
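In code, the per-sprite decision could look roughly like this. It assumes both images are already decoded to RGBA8888 with identical dimensions, and that the sprite rectangles have been pulled out of the ANM elsewhere; `rect_t` and both function names are invented for the sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { size_t x, y, w, h; } rect_t; /* sprite rectangle, in pixels */

/* Returns 1 if every pixel of `r` in the image is fully transparent. */
static int sprite_is_blank(const uint8_t *rgba, size_t img_w, const rect_t *r)
{
    for (size_t row = r->y; row < r->y + r->h; row++) {
        const uint8_t *p = rgba + (row * img_w + r->x) * 4;
        for (size_t col = 0; col < r->w; col++) {
            if (p[col * 4 + 3] != 0) {
                return 0;
            }
        }
    }
    return 1;
}

/* Copies the rectangle `r` from the replacement image `rep` into `dst`,
 * unless the replacement is blank there. Both images are img_w pixels wide.
 * Returns 1 if the sprite was replaced, 0 if it was left alone. */
int sprite_patch(uint8_t *dst, const uint8_t *rep, size_t img_w, const rect_t *r)
{
    if (sprite_is_blank(rep, img_w, r)) {
        return 0;
    }
    for (size_t row = r->y; row < r->y + r->h; row++) {
        size_t offset = (row * img_w + r->x) * 4;
        memcpy(dst + offset, rep + offset, r->w * 4);
    }
    return 1;
}
```

Calling `sprite_patch` once per sprite rectangle leaves untranslated sprites untouched, so a replacement image only needs to contain the sprites that actually changed.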

Sounds like a bitch to implement though... and needs more ANM-specific code. Oh well.

Summary

In the end, we'll still need both of these features to appropriately deal with all graphics.

Fortunately, after looking over the images actually relevant for translation again, this is in fact a pretty advanced issue and we won't need it that urgently.

Roadmap

☑ Set up libpng and zlib

☑ Get familiar with the ANM format

☑ Write code to replace a THTX with a PNG (stack_game_file_resolve on the PNG name, for now). At this point, PNG handling code is local to thcrap_tsa.

At this point, we only patch replacement images with the same dimensions ~~(and possibly image format?)~~.

☐ Write an ANM sprite boundary drawer script to be run on every image before its upload onto the wiki. This will tell the editors how much space they exactly have.

☐ Ship wiki images on srv.thpatch.net

☐ New build

That's all we need for basic support. At this point, change to implementing the text layout engine and ending support for th10 and later (since they merely use a modified .msg format), and come back to images later if we have the time.

Hey, let's split off the relevant parts of `thanm` into a sub-project and use that in `thcrap_tsa`! One shared code base!

Well, too bad that we're coding in C and thanm uses a custom list structure, which, again, is defined in thtk's own utility functions. Since it wouldn't make sense to move these into the thanm sub-project since the rest of thtk depends on these too, that would make two sub-projects. Then, thtk has to be refactored to use these... um, and we're talking about how many lines of code again? And how much of thanm is actually relevant to us at this point?

Also, any change to the format (even in the scale we have already seen) requires a new thcrap_tsa build, and this is always rather... "expensive".

All in all, the programming effort saved pales in comparison to the maintenance work that this would cause.

But is this future-proof? Don't we also want to offer hi-res graphic patches?

Compare the ANM script of a random high-res th14 .anm with the ANM script of an equivalent file from a previous game. Notice something? That's right - all the sprite coordinates everywhere are different, taking the higher image resolution into account.

Automating all of these value replacements requires the patcher to have complete knowledge of the ANM format. Any volunteers willing to go that far? I certainly won't.

The way we'll eventually do hi-res ANM patching is by providing an .anm skeleton file (containing only the (manually adjusted) script) together with the PNGs the patcher will then patch into this skeleton file - just like th06 does it natively. (In fact, this is the only area where th06's engine is actually designed better than the rest of the games.)

So what do we do instead?

formats.js will contain structure offsets and byte sizes for the relevant header data.

Spell cards

ID-based lookup table, accessed via breakpoints. No reason to start parsing and patching ECLs here, and also ~~nicely works around~~ can include a different fix for the "in the result screen, spell cards appear in the language they were last encountered in" issue.

Translated names will be stored in...

game_dir/spells.js? Runs the very, very low risk of name collision with an original game file... if ZUN ever starts to introduce that himself. (Files in game_dir/ replace game data files with the same name.)
game_id.spells.js? Collides with the versioning scheme. Not too nice, either.
A single spells.js, with game_id sub-objects? That file's going to get comparatively large really quick. I do have the bandwidth, though... but it's also inconsistent with existing schemes.

However, given that a lot of people will be wanting to blacklist spell card translation for every language but English, a single spells.js file seems more straightforward than having to fumble around with wildcards.

And what about th08, th095 and th125 spell comments?

... ... ...

th08 is easy. The game always displays two lines of spell comments in the spell practice screen.

... oh, yeah, file format:

game_dir/spellcomments.js

{
    "0": {
        "comment1": [
            "line 1",
            "line 2"
        ],
        "comment2": [
            "line 1",
            "line 2"
        ],
        "owner": "Character"
    }
}

(yes, owner is shown on the screen)

This should nicely work for all three games. I don't think we need the verbosity to mention Aya and Hatate in th125, it should be self-explanatory.

The Result Screen

... of th14 sprintf's each line ("No. %d %s %d/%d") completely into one fixed-width buffer.

Which means that we

a) need the th13 English patch-style text layout engine first (to nicely align this even with proportional fonts) and

b) need to sprintf to our own buffer. I refuse to merely enlarge the game's buffer - just like with the rest of thcrap, we do not do the fixed-width char buffer thing at all, ever - unless terrible APIs force us to.
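Formatting into our own buffer can be done by asking the formatter for the required length first. A sketch, assuming a C99-conforming `vsnprintf` (older Visual C++ runtimes behave differently here) and a made-up helper name:

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Formats into a freshly allocated buffer of exactly the right size.
 * The caller owns the result and has to free() it. Returns NULL on error. */
char *format_alloc(const char *fmt, ...)
{
    va_list va, va2;
    char *ret = NULL;
    int len;

    va_start(va, fmt);
    va_copy(va2, va);
    len = vsnprintf(NULL, 0, fmt, va); /* first pass: measure only */
    va_end(va);
    if (len >= 0) {
        ret = malloc((size_t)len + 1);
        if (ret) {
            vsnprintf(ret, (size_t)len + 1, fmt, va2); /* second pass: write */
        }
    }
    va_end(va2);
    return ret;
}
```

For the th14 result screen line, this would be called as `format_alloc("No. %d %s %d/%d", ...)` with whatever values the game passes to its own sprintf.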

Implementation

Modifications to `thcrap_tsa`

thcrap_init_plugin loads and unloads spells.js globally

New breakpoints

spell_id: Reads the spell card number from a register

spell_lookup: Writes a const char* (thanks Jansson) to the translated string to some register

New standard parameters

cave_exec: Can be set to false to skip execution of the original slice of code we carved out for our breakpoint, if it would normally be executed. Required for th13, where the assembly code leaves no other option than to "replace" a certain long instruction with our breakpoint. Much nicer than having to add a separate binary hack to patch that location with NOPs :)

Sounds easy so far, so where are the problems?

As expected, the usual suspects for spell-card related problems don't employ a straightforward ID → name mapping inside the ECL:

Double Spoiler `(th125)`

Has no spell ID whatsoever in the ECL.

However, we can use the index into the global stage table for the same purpose. This index is calculated as

(level number - 1) * 10 + (stage number - 1)
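As a one-liner, with both numbers counted from 1 as in the formula (the function name is made up):

```c
/* Index into the global stage table, used in place of a spell ID. */
int th125_spell_id(int level, int stage)
{
    return (level - 1) * 10 + (stage - 1);
}
```

So level 1, stage 1 maps to index 0, and level 2, stage 3 maps to index 12.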

This requires two breakpoints though, one for regular gameplay and another one for replays.

Uwabami Breakers `(alcostg)`

Does have spell IDs, but they're not unique, AAARRRGGGHHH!

Probably a result of the game's rushed development schedule and the fact that these names aren't displayed on any other result screen.

I currently see no other solution than to ship a full set of fixed ECLs in base_tsa, just to get the IDs right...

... but wait a moment. Couldn't we just add a custom spell counter that increments on each invocation of ins_342? The stages can only be played sequentially anyway.

... except, of course, someone patches in direct stage access. So we would have to set the correct value at the beginning of each stage.

... nah. Bad idea.

Shoot the Bullet `(th095)`

What is this I don't even

All in all, this game requires four slightly different binary hacks to calculate a Double Spoiler-like spell ID everywhere a spell name can be rendered.

Phantasmagoria of Flower View `(th09)`

Stores spell card names in the SHT files... and only those for the two players are ever loaded. I don't see any way to get a unique ID anywhere near the text rendering code.

Even worse, the access code for these strings is duplicated for every permutation of player number and spell number.

The SHT files store the names as consecutive NUL-padded fixed-width strings. I would normally refuse to write a patch hook for this for the above-mentioned reasons... but well, a 64-byte limit isn't too bad.
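Writing a translated name into such a field could be sketched as follows. The assumption here is that the 64 bytes have to include at least one terminating zero byte; where the fields sit inside the SHT file is left out entirely.

```c
#include <stddef.h>
#include <string.h>

#define SHT_NAME_LEN 64 /* fixed field width, assumed to include the terminator */

/* Overwrites one name field with `name`, padding the rest with zero bytes.
 * Returns 0 on success, or -1 (leaving the field untouched) if the name
 * doesn't fit. */
int sht_name_write(char field[SHT_NAME_LEN], const char *name)
{
    size_t len = strlen(name);
    if (len >= SHT_NAME_LEN) {
        return -1;
    }
    memset(field, 0, SHT_NAME_LEN);
    memcpy(field, name, len);
    return 0;
}
```

Rejecting over-long names outright, rather than truncating, avoids cutting a multi-byte character in half.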

So unless I pull off a ridiculous "are we patching Tasofro games here" sane rewrite of the spell name access code via binary hacks, I'll go with SHT patching... gah.

Deduplication of spell card names in recent games

From th10 on, spell card names shared between difficulties aren't duplicated in the ECL anymore. Their different IDs for each difficulty they appear in are generated on-the-fly, depending on the ECL instruction used. Of course, we don't want translators to copy names, and this wouldn't be much of a problem - we simply move the breakpoint up to a position which has the real ID.

In one instance however (th10 Stage 5, first boss spell card), the Hard and Lunatic variants have different names, yet share the same ID in the ECL.

We solve this by having the spell card breakpoint handlers keep two spell IDs: the one directly from the ECL file (spell_id) and the real number in-game (spell_id_real). If spell_id_real doesn't resolve to any entry in the spell table, spell_id is taken instead.
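The fallback boils down to two lookups in a row. A sketch with a plain array standing in for the JSON spell table (`spell_t` and the function names are hypothetical):

```c
#include <stddef.h>

/* Hypothetical flattened spell table entry. */
typedef struct {
    int id;
    const char *name;
} spell_t;

static const char *spell_find(const spell_t *table, size_t count, int id)
{
    for (size_t i = 0; i < count; i++) {
        if (table[i].id == id) {
            return table[i].name;
        }
    }
    return NULL;
}

/* Tries the real in-game number first, then the ID from the ECL file.
 * Returns NULL if neither has a translation. */
const char *spell_lookup(const spell_t *table, size_t count, int spell_id, int spell_id_real)
{
    const char *ret = spell_find(table, count, spell_id_real);
    return ret ? ret : spell_find(table, count, spell_id);
}
```

Translators then only need separate entries under the real numbers where the names actually differ; everything else resolves through the shared ECL ID.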

"By the way, why don't you just use [Japanese text] => [target language] dictionaries for text lookup?"

"I can understand wanting to give people 2 lines of dialogue for every box, but for the rest?"

Because we (that is, the thcrap engine) do not yet have 100% of the global market share in Touhou translations, yet eventually want to achieve this goal. In view of the other existing English/Spanish/Russian/Korean/Chinese/??? patches, we can't require that everyone still keeps the Japanese original (with these strings around) - and people do assume compatibility with English patches.

Just in case someone thinks about suggesting this.

However... we could use a dictionary based on the string address to do...

Translation of hardcoded strings

The basics

Yep. We'll just do a lookup based on the string address on every invocation of our custom

TextOut
MessageBox
others?
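What each of these wrappers would do before calling the real function, as a sketch: compare the incoming pointer against a table of known string addresses and swap in the translation on a match. The table layout and names are assumptions; in practice the addresses would come from the game-specific JSON files.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry: address of a hardcoded string in the game's memory,
 * and the text to show instead. */
typedef struct {
    uintptr_t addr;
    const char *translation;
} strdef_t;

/* Returns the translation registered for the address of `in`,
 * or `in` itself if there is none. */
const char *string_lookup(const strdef_t *table, size_t count, const char *in)
{
    for (size_t i = 0; i < count; i++) {
        if (table[i].addr == (uintptr_t)in) {
            return table[i].translation;
        }
    }
    return in;
}
```

Since only the pointer is compared, this keeps working no matter which language another patch has already written into the string itself.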

Breakpoints

We'll also offer a new breakpoint to do this lookup at any place it might be necessary.

`sprintf`

In addition to that, we should provide a mechanism to handle sprintf calls (of which there are plenty in the games) ourselves. After I saw th06 and th07 crash on dialog lines that exceed 64 bytes, I really don't assume that the original buffer sizes are large enough anymore :)

(Also, <untrusted wiki user data>.)

But how do we store these? They need to be in the heap, so...

JSON object with the breakpoint address as key?: Seems nice and all, especially as far as garbage collection is concerned (setting a new string with the same key deletes the old one), but will utterly fail if the same call is looped.

JSON array, with one new element at each call?: Will hog more and more memory over time. It should hardly matter, but come on, that's just a bad design.

JSON array which is periodically pruned?: Define "periodically".

JSON object, but adding the loop variable to the breakpoint parameters so that we can append it to the key?: Perfect! ...unless I'm missing something here.
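A sketch of that last option, using a small fixed table instead of a Jansson object so that it stays self-contained. The key format ("address_loopindex"), the slot count and the function name are all made up; the point is only that setting the same key again frees the old string.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SLOTS 64

typedef struct {
    char key[24]; /* "<breakpoint address in hex>_<loop variable>" */
    char *str;    /* heap copy of the rendered string, NULL if slot is free */
} slot_t;

static slot_t slots[SLOTS];

/* Stores a copy of `str` under the key built from the breakpoint address and
 * the loop variable, replacing (and freeing) any previous string with that key.
 * Returns the stored copy, or NULL if the table is full or malloc() failed. */
const char *strbuf_set(uint32_t bp_addr, unsigned loop, const char *str)
{
    char key[24];
    slot_t *target = NULL;
    char *copy;

    snprintf(key, sizeof(key), "%lx_%u", (unsigned long)bp_addr, loop);
    for (size_t i = 0; i < SLOTS; i++) {
        if (slots[i].str && strcmp(slots[i].key, key) == 0) {
            target = &slots[i]; /* same key: reuse this slot */
            break;
        }
        if (!slots[i].str && !target) {
            target = &slots[i]; /* remember the first free slot */
        }
    }
    if (!target) {
        return NULL;
    }
    copy = malloc(strlen(str) + 1);
    if (!copy) {
        return NULL;
    }
    strcpy(copy, str);
    free(target->str); /* no-op for a free slot */
    target->str = copy;
    strcpy(target->key, key);
    return copy;
}
```

Memory use is then bounded by the number of distinct (breakpoint, loop index) pairs, and a looped call no longer overwrites the string from its previous iteration.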