How to patch a new Touhou game in a couple of hours

From Touhou Patch Center


Possible engine changes and their impact on our plans

The game is a 64-bit executable

After ZUN found the switch to enable SSE2 instructions, or rather probably just updated to Visual Studio 2012 during the development of Double Dealing Character, which generates SSE2 instructions by default (and which threw me off for 2 hours when preparing the binary hacks for the trial version), flipping the "64-bit switch" is the scariest compiler-related change that has yet to happen.

Portability to 64-bit wasn't exactly of interest when writing thcrap, due to its nature of being a just-in-time memory patcher targeted exclusively at some 32-bit games. However, adding 64-bit support has become a lot less scary as of 2017:

So we're actually pretty prepared in case this does happen after all. Bring it on.


Unlike SSE2, which is available in every CPU built after 2003, a 64-bit build is nothing you enable just because you can: the entire operating system, with all its APIs, needs to be 64-bit as well.

Thus, if we assume the data of Steam's monthly hardware survey to be an accurate representation of OS distribution among gamers, 6.69% (as of May 2017) of ZUN's potential audience would not be able to play the game if it were 64-bit.

Tangentially related: Will ZUN finally drop Windows XP support this time?

As of May 2017, Windows XP now has a market share of 0.88% in the Steam survey. The Unity hardware stats always show a higher number for XP, which is currently at 6.5%. The Unity number is probably the more significant one for us, since Unity engine games seem to be more prevalent in China and developing countries[citation needed], which we probably should care about, with Touhou Patch Center's international focus.

ZUN is still de-facto "supporting" XP as of Legacy of Lunatic Kingdom:

  • Although some game builds might not start on XP, in particular the initial trial and full versions (v0.01a and v1.00a) of Double Dealing Character, this has always been "fixed" in later corresponding updates.
  • Impossible Spell Card changed the default font for rendered text to Meiryo, the new standard font of Japanese on Windows introduced in Vista. However, it still includes a second code path that uses MS Gothic in case Meiryo isn't available (read: in case it's running on XP), complete with custom text placement settings to make up for the fact that Meiryo and MS Gothic are metric-incompatible.
  • And even if it is necessary to create an "XP patch" by hacking the MajorOperatingSystemVersion field in the PE header, the games so far have all worked fine after that. And if a game works on XP, people will use it on XP, no matter what the package says. (Heck, I will use it on my XP VM, whose boot→desktop time of 4 seconds from an SSD remains unmatched by later Windows versions. And it is probably still a more realistic testing environment for thcrap than ReactOS is.)
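For anyone who wants to look at that header field before hacking it, here is a minimal read-only sketch. It relies only on the documented PE layout (e_lfanew at 0x3C, a 4-byte signature, a 20-byte COFF header, and the OS version fields 40 bytes into the optional header). The function name is made up for this example, and which value to write back is deliberately left out.

```python
import struct

def os_version_field(pe):
    """Return (file offset, major, minor) of the OS version fields in a PE image."""
    if pe[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # e_lfanew: file offset of the "PE\0\0" signature.
    (e_lfanew,) = struct.unpack_from("<I", pe, 0x3C)
    if pe[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # 4-byte signature + 20-byte COFF header, then the optional header starts.
    optional_header = e_lfanew + 4 + 20
    # MajorOperatingSystemVersion sits 40 bytes into the optional header
    # (same offset for PE32 and PE32+); the minor version follows directly.
    offset = optional_header + 40
    major, minor = struct.unpack_from("<HH", pe, offset)
    return offset, major, minor
```

Feed it the raw bytes of the executable, e.g. `os_version_field(open("th16.exe", "rb").read())`, and patch the two bytes at the returned offset in a hex editor if needed.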

ZUN suddenly starts to think that .NET is a good idea

Yeah, right, and throw away all the work in his STG engine, which has been gradually evolving since 2002. (A lot of the bad coding practices you see in ZUN code, and that we have to fix as part of thcrap support, have been there since Embodiment of Scarlet Devil.)

(Not sure how FUCKED we'd actually be in this case, though.)


  • Tasofro-style segmented data file loading code (complete data file is never in one contiguous block of memory)
    • → We’re FUCKED
      • … okay, well, if it’s halfway sane (see DynaMarisa), we could do it and have the patch one day later
    • PC-98 games used to do this to some extent…


  • Vastly different message file format
    • → We’re FUCKED (for real this time)
    • Hasn’t happened since th06

  • Slightly different message file format (e.g. structure sizes have changed) or new opcodes relevant to us
    • → Update the .msg patcher and release a build. Should only cause a delay of a few hours at most.
    • The former hasn’t happened since th09, the latter not since th11


Plaintext vs. Images
To further speed up the progress, certain tasks can be done by non-developer volunteers. These are marked in green and prefixed with ND: (non-dev) below, so that you can easily Ctrl-F for it.
The only thing required for this is some sort of webfolder software that allows the quick sharing of files. Something like Dropbox.

Steps 1 and 2 are required for every official version of a game.

Step 0: Collect patch-relevant information about the game

This is not necessary for the patch itself, only to ease the work of the translators.
  • ND: Which characters are there, and where do they appear? (Bosses, midbosses, additional characters talking). Insert their Romaji names into MediaWiki:Thpatch-chars.
  • ND: Are there any dialog sequences different from the normal pre-boss and post-boss dialogs?
  • ND: We need a complete list of spell cards with their correct IDs to verify these later. Thus, get another person to play through the game on every difficulty. Just continue through all the way, characters don't matter, unless some come with exclusive spellcards. Then, give your score.dat to the developer.
  • ND: Do we have people with good Japanese knowledge wanting to help? If yes: Spend your time on transcribing images, not dialogue or spells or anything else. We're going to dump all plaintext in the course of this workflow, but we can't dump text from images.
(If you even believe that having transcribed kanji will help those translators whose kanji aptitude perhaps is not as high as yours. Seems not nearly as important as I thought it would be, from what I've gathered so far.)

General support

Step 0.5: Open a separate x64dbg instance with the previous game executable

Seriously, this new game should mostly just be a copy-pasta of the last one.

Step 1: Hash the game

  • ND: You will also receive the game executable in the shared folder. Extract the game icon using Resource Hacker or similar software, convert it to PNG, and upload it as File:Icon_th16.png.
  • ND: Follow some Western superplayers, and talk them into sharing their score file at regular intervals! Yatsuzume, for example, fully cleared Violet Detector in under 4 hours, which is still quick enough to be useful to me. Otherwise, I'll have to spend some time figuring out some cheats on my own.
  • sha256sum th??.exe
  • after this, people can already select the game in the configuration tool
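If sha256sum isn't at hand (say, on a stock Windows box), the same hash can be computed with a few lines of Python. This is just a sketch of an equivalent; the function name is made up here, and how the resulting hash gets registered with thcrap is not covered.

```python
import hashlib

def sha256_file(path, chunk_size=1 << 20):
    """Return the lowercase hex SHA-256 of a file, as sha256sum would print it."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large files don't have to fit into memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Usage would be something like `print(sha256_file("th16.exe"))`.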

Step 2: Get symbols for relevant libc functions

As of August 2018, Relyze is the only free-to-use reverse-engineering tool I know that actually comes with signatures for recent Visual Studio versions, and can therefore show function names for a statically linked Visual C++ runtime. Since the trial version forcibly closes with a "please register, kthx" message after a few minutes, let's get all the symbols we could possibly need right at the beginning:

Function     Needed for
_malloc      File breakpoints
_free        File breakpoints
_vsprintf    Safe sprintf
_strchr      Ruby
_atoi        Ruby
Step 3: Search for breakpoints

Make sure you have dat_dump set to true in your run configuration, otherwise thcrap won't actually dump anything from the game.
Also, actually run under thcrap, always.

  • String search for “Decode” should get us near the loading code
    • if not there anymore (new logging format or more aggressive compiler optimization?), trace back from ReadFile calls
  • Address which has file name and file size in some register at the same time
  • if not applicable, add separate file_name breakpoint
  • Function call shortly after file_size which returns the fully unpacked and decrypted file
    • if there is no such thing anymore, we’re FUCKED (this is exactly why we didn’t do Tasofro games ourselves in the beginning)
    • … except, of course, if the "function call" is merely inlined - see th06, th08 and th09
  • file_buffer: register that contains the address of the final file buffer
  • Don't forget stack_clear_size!!! (Number of things pushed to the function at this breakpoint * 4)
  • some place near the end of the function at the file_load breakpoint
  • should require no parameters on its own – if the function allocates a new buffer though (th08 and th09 do), specify that in file_buffer here

Step 4: Dump all data

With these breakpoints, we now have an on-the-fly data dumper, without having to know anything about the .dat format. And since we don't actually care about thdat, the enabler of static .dat patches… let's just write a small chunk of assembly to dump it all!

  • Set a debugging breakpoint on file_size
    • arc_load is this function. Confirm that, there should be lots of calls to it.
    • file_table can be found a bit later (the archive entry lookup is inlined below). It's a pointer to somewhere in the data segment, not the heap-allocated thing this pointer points to! Can be found near a place that looks like this:
00402E6F | B9 88 3D 50 00                     | mov ecx, <th15.v1.00b.file_table>          |
00402E74 | E8 77 95 06 00                     | call <th15.v1.00b.sub_46C3F0>              |
00402E79 | 6A 02                              | push 2                                     |
00402E7B | E8 00 4B 00 00                     | call <th15.v1.00b.sub_407980>              |
00402E80 | 8B C7                              | mov eax, edi                               |
00402E82 | 5F                                 | pop edi                                    |
00402E83 | 5E                                 | pop esi                                    |
00402E84 | 5B                                 | pop ebx                                    |
00402E85 | 8B E5                              | mov esp, ebp                               |
00402E87 | 5D                                 | pop ebp                                    |
00402E88 | C2 04 00                           | ret 4                                      |
  • Step out of this function to free up critical sections and stuff.

With these values, search a nice spot, adjust and paste the code somewhere, and jump to it.

Dump the entire game archive
Description: In this example, arc_load takes the file name in ECX and the target address for the file size in EDX.
Address: A nice place
8b35 00000000 | mov esi, dword ptr [file_table]  |
83ec 04       | sub esp, 4                       | ; allocate a local variable to store the file size
89e2          | mov edx, esp                     |
8b0e          | mov ecx, dword ptr ds:[esi]      |
85c9          | test ecx, ecx                    | ; end of list?
74 13         | je short +0x15                   |
31c0          | xor eax, eax                     |
50            | push eax                         |
e8 00000000   | call arc_load                    |
50            | push eax                         |
e8 00000000   | call _free                       |
83c6 10       | add esi, 0x10                    |
eb e5         | jmp short -0x19                  | ; back to mov edx, esp
cc            | int3                             | ; that's it, we're done
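The "adjust" part mostly means filling in the two e8 00000000 placeholders, which depend on where the code ends up. A small sketch of that arithmetic, plus a way to double-check where the two short jumps land. The function names are made up for this example; offsets for the short jumps are counted from the start of the pasted snippet.

```python
import struct

def call_rel32(call_addr, target):
    """Encode `call target` as an E8 instruction placed at call_addr."""
    # The displacement is relative to the end of the 5-byte instruction.
    disp = (target - (call_addr + 5)) & 0xFFFFFFFF
    return b"\xE8" + struct.pack("<I", disp)

def short_jump_target(jump_offset, rel8):
    """Offset a 2-byte short jump lands on, given its displacement byte."""
    # The displacement byte is a signed 8-bit value,
    # relative to the end of the 2-byte instruction.
    signed = rel8 - 0x100 if rel8 >= 0x80 else rel8
    return jump_offset + 2 + signed
```

As a sanity check against the th15 listing further up: `call_rel32(0x402E74, 0x46C3F0)` gives the bytes E8 77 95 06 00, matching the disassembly. In the dumper snippet, the je at offset 15 (byte 0x13) lands on offset 36, the int3, and the jmp at offset 34 (byte 0xE5) lands on offset 9, the mov edx, esp.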

If it didn't work, take a close look at the calling convention of arc_load, and adjust the code above accordingly. If everything went well, this also indicates no relevant ANM format changes. Ship the file breakpoints if that's the case, or update the ANM patcher in case thcrap crashed while trying to dump the sprite boundaries.

Step 5: Make the Music Room translatable

  • ND: You will receive musiccmt.txt. Turn it into a wiki music page, which should be easy enough to do manually.
  • Add the music titles to that one file on the server
  • Make sure to run processMessageChanges.php!

Step 6: Upload images

For the longest time, I was terribly scared of those… but once I did the implementation, it actually turned out to be the easiest thing to patch! Because it's also the one thing that requires the most effort to translate, we'll start with them, so that the image editors can immediately get to work.

So far, the ANM format only changed with Subterranean Animism, and has since then been constant. Script instructions have come and gone, yeah, but that's nothing we care about.

This means that, as soon as we have general dumping support, we'll also have image patching and sprite boundary dumping support. All that it now takes is a simple thanm x on every file.
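A rough sketch of that "thanm x on every file" loop, for those who would rather not type it out. The `thanm x <file>` invocation is taken from the sentence above and may differ between thanm versions; the function names, and the assumption that all .anm files sit directly in the dump directory, are mine.

```python
import subprocess
from pathlib import Path

def thanm_extract_commands(dump_dir):
    """Build one `thanm x <file>` command per .anm file in dump_dir."""
    return [["thanm", "x", str(path)]
            for path in sorted(Path(dump_dir).glob("*.anm"))]

def extract_all(dump_dir, out_dir):
    """Run the commands with out_dir as working directory (assumes thanm is on PATH)."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for cmd in thanm_extract_commands(Path(dump_dir).resolve()):
        subprocess.run(cmd, cwd=out_dir, check=True)
```

Building the command list separately from running it makes it easy to eyeball what would be executed first.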

  • If there is a volunteer, copy all images into the shared folder.
  • ND: Then, just look through the extracted images to see what can be translated, delete everything else, and upload those images to the wiki in the meantime.
  • Let that one script create an image page for the wiki and copy the files with the correct names:
cd C:\Users\Nmlgc\Desktop\Stream\thcrap\scripts
mkdir C:\Users\Nmlgc\Desktop\Stream\wiki
python -g th165 C:\Users\Nmlgc\Dropbox\newtouhou\dat -t C:\Users\Nmlgc\Desktop\Stream\wiki > C:\Users\Nmlgc\Dropbox\newtouhou\images.txt

, post that, and the rest is up to the image editors.

In-game dialogue

Step 7: Declare the font block

  • Place breakpoints on all calls to TextOutA.
  • Label the function that calls all of these, and D3DXLoadSurfaceFromMemory at the end, as draw_text.
  • Set a breakpoint at the font handle selection jump in draw_text()
  • Go through every possible font ID value by directly manipulating the register, and note down the address for each. Careful, cmp eax, 8 means that there are 9 cases! (This ninth font is used for trophies in Impossible Spell Card and Violet Detector.)
  • Do a good job. If there is a direct font_array[font_id * 4] mapping and only the last ID is font_array[-1] (because why wouldn't it be), spell it all out.

Step 8: Fix sprintf() buffer overflows for all four text output functions (left-/center-/right-aligned)

They have been there ever since Embodiment of Scarlet Devil, and we have to get rid of them before putting anything up for translation. Otherwise, translators can not only crash the game, but (in the case of spell cards) also possibly corrupt score.dat, just by inputting a sufficiently long spell name. And the original limits are very strict, especially when taking Greek or Cyrillic UTF-8 text into account.
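To see why Greek or Cyrillic text hits the limits so quickly: every such letter takes two bytes in UTF-8, so a translated string needs roughly twice the buffer space of its visible length. A tiny sketch; the buffer size of 16 is a made-up number for illustration, not an actual limit from any of the games.

```python
def utf8_size(text):
    """Number of bytes the text occupies in UTF-8, without the terminator."""
    return len(text.encode("utf-8"))

def fits(text, buffer_size):
    """Would text plus its terminating NUL fit into a char[buffer_size]?"""
    return utf8_size(text) + 1 <= buffer_size

HYPOTHETICAL_BUFFER = 16  # illustrative only, not a real limit from the games

english = "Fantasy Seal"      # 12 characters, 12 bytes
russian = "Печать фантазии"   # 15 characters, 29 bytes
```

With these inputs, `fits(english, HYPOTHETICAL_BUFFER)` holds, while `fits(russian, HYPOTHETICAL_BUFFER)` does not, even though both strings are of similar visible length.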

And yes, there are instances where the input is actually a format string - the Music Room or the Spell Practice menu come to mind.

And yes, all four, now. The alignment hacks will not make the sprintf_rep ones redundant, as they are usually placed at the beginning of the inlined strlen() function, after the pointer to the fixed-size char buffer was loaded into a register.

  • Enter the Music Room to find draw_ltext by looking at the stack when the breakpoint in draw_text is hit.
  • draw_ctext and draw_rtext are the other two.

The safe sprintf hacks have been pretty streamlined by now. As of Hidden Star in Four Seasons and its… interesting vsprintf() variety with another weird parameter after the format string, they have become even simpler:

call strings_vsprintf_msvcrt14   ; Yes, the stack is already laid out correctly for the function
mov  dword ptr ss:[x], eax       ; Save the pointer

That's it. If we're lucky, there already is a pre-made hack for the exact address of x in base_tsa's global.js. If not, add it. Again, if it's more complicated than that, you're doing something wrong.

If we're extremely unlucky and the executable is compiled without stack frames (the horror…), the hack usually is safe_sprintf_esp+<value before _vsprintf + 4>.

For sprintf_rep, a function can have more than one of these!

  • Also, set breakpoints on every call to draw_ltext.

(Also, check whether that is even necessary with this new game. ZUN has sprintf()d into fixed-size local char[] buffers ever since Embodiment of Scarlet Devil, but hey, one can dream.)

Step 9: Investigate the message format

The opcodes relevant to us probably haven’t changed (they haven’t since th11), but there might be some new instructions. Add them to thmsg if that is the case, then dump all .msg files to plaintext. Note that msg2wiki strips off the last extension from the filename, so pipe the results into %f.txt.

Step 10: Convert message dumps to wiki code

  • First, move any possible third dialog entry, which is most likely shown before entry 0 (pre-boss) and entry 1 (post-boss), to the top of each .txt file.
  • Dump all stages at once, separately for every character:
./msg2wiki.exe st01a.msg.txt st02a.msg.txt st03a.msg.txt > StoryA.txt
./msg2wiki.exe st01b.msg.txt st02b.msg.txt st03b.msg.txt > StoryB.txt
./msg2wiki.exe st01c.msg.txt st02c.msg.txt st03c.msg.txt > StoryC.txt
./msg2wiki.exe st01d.msg.txt st02d.msg.txt st03d.msg.txt > StoryD.txt
  • ND: Replace the tokens.
  • ND: Post on the wiki, and make available for translation.
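The four msg2wiki calls above follow an obvious pattern, so here is a sketch that generates them for any number of characters and stages. The file naming is copied from the commands above; the function names and the default of three stages (as in a trial version) are assumptions.

```python
import subprocess

def msg2wiki_commands(characters="abcd", stages=range(1, 4)):
    """Map each output file to the msg2wiki command line that produces it."""
    return {
        f"Story{c.upper()}.txt":
            ["./msg2wiki.exe"] + [f"st{s:02d}{c}.msg.txt" for s in stages]
        for c in characters
    }

def run_all(commands):
    """Run every command, redirecting its standard output into the target file."""
    for out_name, cmd in commands.items():
        with open(out_name, "wb") as out:
            subprocess.run(cmd, stdout=out, check=True)
```

For a full version, something like `run_all(msg2wiki_commands(stages=range(1, 7)))` would cover six stages, assuming the extra stages follow the same naming.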

Step 11: Music Room translation hacks

… It's been so long. Just look at how the last game did it. 🤷

Step 12: Additional small hacks

base_tsa: Textbox width

  • Play the game in Easy mode until the boss. Wait until one of the remaining draw_ltext breakpoints from earlier is hit.
  • Search for the inlined strlen a bit below.
  • Place the hack at the start of the strlen.
  • If a rewrite is necessary due to a compiler change:
push 0             ; font ID
push ecx           ; the register that contains the full string
call GetTextExtentForFontID
sub  eax, 0x1c     ; yes, this will fit the text exactly
jae  +2            ; ensure a "minimum width"…
xor  eax, eax      ; …of zero
  • (sub eax, 0x1c will appear in the original code a few instructions later. For consistency, we move it directly after the GetTextExtent call.)
  • Remove anything that writes to EAX afterwards. This should only include one rounding instruction (and eax, 0xFFFFFFF0).
  • Restore all XMM registers loaded above that may get clobbered by GetTextExtentForFontID(), and that are needed for the calculations below.
  • Second binary hack about 200 bytes below the first.
  • Continue to play until the first spell card.
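In case the intent of the sub / jae / xor sequence in the rewrite above isn't obvious: it is an unsigned subtraction that clamps at zero. A sketch of the same logic in Python, with the text extent passed in as a plain number, since GetTextExtentForFontID only exists inside thcrap:

```python
PADDING = 0x1C  # the constant from `sub eax, 0x1c`

def textbox_width(text_extent):
    """Mirror of: sub eax, 0x1c / jae +2 / xor eax, eax."""
    # jae is taken when the subtraction did not borrow, i.e. extent >= 0x1c;
    # otherwise the result is zeroed instead of wrapping around.
    return text_extent - PADDING if text_extent >= PADDING else 0
```

So an extent of 0x40 yields 0x24, and anything narrower than 0x1c yields 0 rather than a huge unsigned value.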

base_tsa: ascii_960.png

If ZUN still hasn't aligned the sprites correctly. (Compare with the last game.)

script_latin: ascii.png

base_tsa: Meiryo unlock

In the callback to EnumFontFamiliesExA, one could place meiryo_strcmp_remove on the final JNE instruction. Don't do that. script_latin may be able to work around this for a number of languages, but the ones that don't depend on script_latin will still be forced to work around Meiryo so that the translations look like they do in Double Dealing Character. This way, we're at least giving them the chance to change their locale in order to get the correct look.

script_latin: Meiryo removal

Place meiryo_disable on the EnumFontFamiliesExA instruction itself.

Endings (for full versions only, obviously)

Not different from in-game dialogue at all.

Step 13: Investigate the ending format

This ties with ANM for the most consistent Touhou data format. The new one hasn't changed since Mountain of Faith, so this step will most likely be a non-issue.

Step 14: Convert ending dumps to wiki code

See above.

Spell cards

Oh boy. Ever since Mountain of Faith, this is probably the biggest minefield in Touhou code as far as translation is concerned.

Step 15: Correctly align spell card names

  • Look for the functions. Normally, it's draw_rtext() (3) ← spell name processing function (2) ← ECL parser (1).
  • Do the alignment hacks. Shuffle the rest of the function around in such a way as to fit in our call to GetTextExtentForFont.

Step 16: Set up breakpoints

For spell name patching, we need up to four variables:

  • The spell number as given by the ECL file
    • Found in (1) near the call to (2)
  • The real spell number, including a difficulty offset
    • Found shortly after spell_id
  • A value between 0 and 3, indicating the difficulty level this spell appears in.
    • This is used in the result or Spell Practice menus where we only have spell_id_real and thus wouldn't be able to go back to the base ID of a particular spell.
  • The register to write the translated spell name to.
    • This breakpoint should be set in (2) near the call to (3). By deferring spell name fetching as long as possible, we don't have to fix all the buffer overflows in (2).
    • Also, keep in mind that cave_exec: false is a thing (although it shouldn't be necessary anymore with deferred fetching)
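How the first three values relate to each other can be sketched like this. It assumes the "difficulty offset" is exactly the 0 to 3 difficulty value, so that the real ID is the base ID plus that value; verify this against the game at hand before relying on it. The function name is made up.

```python
def base_spell_id(spell_id_real, rank):
    """Recover the base spell ID from the real ID and the difficulty value.

    Assumption: spell_id_real == base ID + rank, with rank in 0..3.
    """
    if not 0 <= rank <= 3:
        raise ValueError("rank must be between 0 and 3")
    return spell_id_real - rank
```

Under that assumption, a real ID of 14 seen on difficulty 2 would map back to base ID 12.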

While locating these breakpoints, assign labels to the "ECL parameter getter" functions according to the type of their return value.

Step 17: Investigate the ECL format

At this point, we again depend on Touhou Toolkit; not only for the complete list of spell names with their IDs, but also to create the replacement ECLs for the Skipgame patch.

And most likely, the ECL format has changed again, adding a few new opcodes (and other stuff we hopefully don't care about), so that simply specifying the last game will give "id ### was not found in the format table" errors.

To find new and changed opcodes:

  1. Set a breakpoint on the
    fprintf(stderr, "%s: id %d was not found in the format table\n", argv0, id);
    line in thecl10.c.
  2. Step out and find out the param_count by looking at the instr variable.
  3. Try all parameter types until the result makes sense. In modern Touhou, we only really have to differentiate between integers ('S') and floats ('f').
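"Until the result makes sense" usually boils down to looking at the same four bytes both ways. A minimal sketch, assuming 4-byte little-endian parameters; the function name is made up, and this is not part of thecl itself.

```python
import struct

def interpret_param(raw):
    """Return the same 4 bytes read as a signed integer ('S') and as a float ('f')."""
    (as_int,) = struct.unpack("<i", raw)
    (as_float,) = struct.unpack("<f", raw)
    return as_int, as_float
```

The bytes 00 00 80 3f come out as 1065353216 when read as an integer and as 1.0 when read as a float, so that parameter is almost certainly an 'f'. Conversely, a small integer like 60 reads as a float so tiny that it is practically zero, which points to 'S'.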

And yes, we do specify the correct types in this step. Sure, we can just add "S" everywhere and quickly get that thing to work. But we have the resources to do better, and it's not worth doing crappy work now and then expecting someone else to do the "real" thecl development work later.

Step 18: Convert spell names to wiki code

At least that step is pretty straightforward.

  • Grep spell card name instructions out of all files, do iconv -f shift-jis -t utf-8, do some sed magic to bring it into a simpler format, and sort it. A bash one-liner.
  • Look for duplicated spell IDs and set the correct number according to the difficulty, by looking at the flags near the instruction containing the spell name.
  • Run that corrected dump through
  • Do a bit of search-and-replace for the character names
  • … and post stuff on the wiki.
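For those without a bash at hand, the decode-and-sort part of the first step could look roughly like this in Python. The input format is purely hypothetical (one "ID, tab, name" line per spell, encoded in Shift-JIS); the actual grep and sed steps depend on what the ECL dump looks like and are not reproduced here.

```python
def convert_spell_dump(raw):
    """Decode a Shift-JIS dump of hypothetical "<id>\\t<name>" lines, sorted by ID."""
    entries = []
    # Decode everything first (the iconv step), then split into lines.
    for line in raw.decode("shift_jis").splitlines():
        if not line.strip():
            continue
        spell_id, name = line.split("\t", 1)
        entries.append((int(spell_id), name))
    # Sorting numerically keeps ID 10 after ID 9.
    return sorted(entries)
```

Duplicated IDs stay visible in the sorted output, which helps with the second step.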

Supplementary patches

Step 19: Enlarge the spell name sprite in text.anm to cover the entire screen

"Already? It's just card name cutoffs," you might say. However, this should take no longer than 10 minutes, since the fix will be made obvious from previous text.anm files. Especially if this game has Spell Practice, and we then even save the minute it takes to reach the Stage 1 mid-boss to test this.

Step 20: Instant ending support (for full versions only, obviously)

Yes, this takes precedence over Skipgame. By the time development has reached this point, all stages will likely have at least some sort of draft-quality translation, and both players and translators will have unlocked all stages in Practice Mode by this point. (Or we'll at least have a semi-complete score.dat by that point.) Therefore, it is more important right now to save all proofreaders the ~25 minutes it takes to clear the game and review the endings for cutoffs and other problems, than saving them ~3 minutes for each separate stage. (Also, Skipgame is a ton of work in comparison to instant_ending.)

Step 21: Skipgame support

This involves:

  • Deleting all Main* and *_at subroutines
  • Keeping the LogoEnemy
  • Shortening spell card times to 2 seconds, or more if that causes glitches
  • Deleting any midbosses with no spell cards on any difficulty
  • Using a Shift-JIS editor that won't destroy the original spell card names on save, so that score.dat will remain compatible with the unpatched game.

Keep the original distinction between st??.ecl, st??mbs.ecl, and st??bs.ecl. This allows Skipgame to be turned into a boss rush patch by simply ignoring the right files.
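A small sketch of how that file distinction can be used mechanically, e.g. by a script that decides which replacement ECLs go into which patch variant. The three patterns come from the sentence above; the category labels and function names are made up, and which group to leave out for a boss rush is up to whoever builds that patch.

```python
import re

# st??.ecl, st??mbs.ecl and st??bs.ecl, with ?? being any two characters.
_ECL_NAME = re.compile(r"^st(..)(mbs|bs)?\.ecl$")
_KINDS = {None: "stage", "mbs": "midboss", "bs": "boss"}

def classify_ecl(filename):
    """Return "stage", "midboss" or "boss", or None for unrelated files."""
    match = _ECL_NAME.match(filename)
    return _KINDS[match.group(2)] if match else None

def group_ecls(filenames):
    """Group file names by kind, dropping anything that isn't a stage ECL."""
    groups = {"stage": [], "midboss": [], "boss": []}
    for name in filenames:
        kind = classify_ecl(name)
        if kind is not None:
            groups[kind].append(name)
    return groups
```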

Step 22: Hardcoded strings, resolution dialog, custom.exe, and any other low-priority things

  • Position of the ruby_offset breakpoint should be a bit before a draw_ltext call, after _atoi.

Workflow for new full versions of games we already have trial support for

With an existing trial build, we already have the technical support worked out and it only needs the addresses and small other adjustments to work with the full version. Since the audience will be much larger, we need to be all the more careful here. Thus, we port all the technical support before doing anything else.

The (outdated) workflow for this is as follows (changes in bold):

  • Step 0: Collect patch-relevant information about the game
  • Step 1: Hash the game
  • Step 2: Search breakpoints.
  • Step 3: Port all existing base_tsa binary hacks and breakpoints to the new build (really, it's better to leave the game untranslated for 15-30 minutes than to risk buffer overflows with some language)
  • Step 4: Dump all data. Quickly check if any files differ (text.anm probably does) and whether everything ports over correctly
  • Step 5: Add binary hacks and breakpoints to base_tsa
  • Step 6: Upload images
  • Step 7: Investigate the message format
  • Step 8: Convert message dumps to wiki code
  • Step 9: Investigate the ending format
  • Step 10: Convert ending dumps to wiki code
  • Step 11: Investigate the ECL format
  • Step 12: Convert spell names to wiki code
  • Step 13: Skipgame support
  • Step 14: Go to sleep. All the important stuff is translatable now.
  • Step 15: Do the kludgy Music Room workaround thingy
  • Step 16: Leisurely pick out translatable hardcoded strings