Development notes/Server


Random thoughts

What about our existing Python code?

... That's barely 300 lines. Rewriting all of that in PHP will be easier (and faster) than setting up some means of Python↔PHP integration.

Who knows, the MediaWiki APIs may already come with some of that functionality!

Server-side flat file creation

PHP's open_basedir setting is exactly what we need as our first line of defense against possible directory traversal attacks!

However, this setting can only be in effect during a hook call. Once TPC's processing is done, this restriction needs to be removed again - otherwise, MediaWiki can't even open its own PHP files.

Verify: If our extension is split into multiple files too, would open_basedir keep them from being loaded? If so, we need to remove it immediately before the call and activate it afterwards. (Seems like a wrapper around function calls is necessary...)

→ Wrap all MediaWiki hooks into an exception-handling block in order to be able to reset open_basedir after any kind of error without lots of error checking

(2013-05-02) Turns out that open_basedir can't be cleared or loosened by any means once it has been set - at runtime, it can only be tightened further. Hooray. It's exactly this kind of PHP stupidity that kept me from doing this for so long.

OK, so this creates more problems than it can possibly solve. Alternatives?

  • basename - we don't need to specify subdirectories for patchable files anyway. Except for Tasofro games...
  • Replacing all slashes with underscores or the like.
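Both alternatives could look roughly like the sketch below. The function names and the example paths are made up for illustration and are not taken from the extension; it is only meant to show where each approach still needs an extra check.

```php
<?php
// Hypothetical helpers; the names are illustrative, not the extension's own.

// Option 1: keep only the last path component.
function tpc_name_basename(string $fn): ?string {
    // Treat backslashes as separators too; on non-Windows hosts,
    // basename() only splits on "/".
    $name = basename(str_replace('\\', '/', $fn));
    // basename() can still return "", "." or "..", none of which is a usable file name.
    return in_array($name, ['', '.', '..'], true) ? null : $name;
}

// Option 2: flatten the whole path into a single name, so that files in
// subdirectories (as needed for the Tasofro games) stay distinguishable.
function tpc_name_flatten(string $fn): ?string {
    $name = str_replace(['\\', '/'], '_', $fn);
    return in_array($name, ['', '.', '..'], true) ? null : $name;
}
```

Note that the second variant is not collision-free: "a/b_c" and "a_b/c" end up with the same flat name, so it only works if the patchable file names never clash that way.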

Make no assumptions about page layouts

It should be possible to have the different types of patch data (for example, version hashes, game information and binary hacks) on any number of pages.

But this means that we have to evaluate every format hook for every page we parse...

→ Each format hook loops through the array of template calls on a page...

→ Just munch the templates that were successfully parsed! :-)

(2013-05-08) ... What the hell? That was quite the misunderstanding about what hooks can do. So, let's do it properly, and give each template its own hook function. This function will only evaluate that one kind of template. State between templates can be kept inside the class.
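A minimal sketch of that "one function per template, state in the class" idea, without any actual MediaWiki hook registration. The class, its methods, the 'hash' parameter and the 'Infobox' template are all assumptions made up for this example; only {{Thcrap ver}} is a template name from these notes.

```php
<?php
// Hypothetical sketch; class, method and parameter names are illustrative.
class TPCTemplateScraper {
    /** @var callable[] template name => handler */
    private $handlers = [];
    /** @var array state shared between handlers while one page is parsed */
    public $state = [];

    // Register exactly one handler per kind of template.
    public function register(string $template, callable $handler): void {
        $this->handlers[$template] = $handler;
    }

    // Pass every template call on a page to its own handler.
    // Templates without a handler are skipped.
    // Returns the number of calls that had a handler.
    public function scrape(array $calls): int {
        $handled = 0;
        foreach ($calls as $call) {
            if (!isset($this->handlers[$call['name']])) {
                continue;
            }
            call_user_func($this->handlers[$call['name']], $this, $call['params']);
            $handled++;
        }
        return $handled;
    }
}

$scraper = new TPCTemplateScraper();
$scraper->register('Thcrap ver', function (TPCTemplateScraper $s, array $params) {
    // Each handler only knows its own template and appends to the shared state.
    $s->state['versions'][] = $params;
});
$handled = $scraper->scrape([
    ['name' => 'Thcrap ver', 'params' => ['hash' => 'abc']],
    ['name' => 'Infobox', 'params' => []],
]);
```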

Uh, wait, why don't we create a reverse mapping for the patch pages, as with the hashes?

After all, that's all we really need for scraping - a MW page → corresponding patch mapping, and not a "pages of patch" array.

The value for each page, however, should be an array too, to allow patches to share data with each other.

A~nd, since MySQL doesn't really have a straightforward way of storing arrays, we'll just store that as a CSV string.
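As a rough sketch, turning a "pages of patch" array into the reverse page → patches mapping with CSV values could look like this. The function names, patch names and page titles are assumptions for the example, and it assumes that patch names never contain a comma.

```php
<?php
// Hypothetical helpers; names and example data are illustrative.

// Turns [patch => [pages]] into [page => "patch1,patch2,..."].
function tpc_build_page_map(array $patchPages): array {
    $map = [];
    foreach ($patchPages as $patch => $pages) {
        foreach ($pages as $page) {
            $map[$page][] = $patch;
        }
    }
    foreach ($map as $page => $patches) {
        // Store the list as one CSV string, ready for a single text column.
        $map[$page] = implode(',', array_unique($patches));
    }
    return $map;
}

// Reads a stored CSV value back into an array of patch names.
function tpc_patches_of(string $csv): array {
    return $csv === '' ? [] : explode(',', $csv);
}

$map = tpc_build_page_map([
    'lang_en' => ['Th06/Spell cards', 'Th06/Music'],
    'lang_de' => ['Th06/Music'],
]);
```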

This mapping could also give us...

Breadcrumb navigation on pages that are part of a patch

In theory, this seems very easy to implement then. It won't be a high priority, though.

We need hooks, lots of them

I mean, when the base page of a page is deleted or moved, all the mappings need to be updated, too.

Same goes for update.php, which needs to reconstruct the mappings completely - i.e., recursively evaluate all top-level pages in the Patch namespace.
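What the move and delete hooks would have to do to the mapping can be sketched on a plain array. This assumes the mapping is keyed by page title with "/" separating subpages. The function names and example titles are made up, and the actual hook registration is left out.

```php
<?php
// Hypothetical helpers; names and example data are illustrative.

// True if $page is the base page itself or one of its subpages.
function tpc_is_under(string $page, string $base): bool {
    return $page === $base || strpos($page, $base . '/') === 0;
}

// Base page moved: rewrite the keys of the base page and all its subpages.
function tpc_map_move(array $map, string $old, string $new): array {
    $out = [];
    foreach ($map as $page => $patches) {
        $page = (string)$page; // numeric titles come back as int keys
        if (tpc_is_under($page, $old)) {
            $page = $new . substr($page, strlen($old));
        }
        $out[$page] = $patches;
    }
    return $out;
}

// Base page deleted: drop the base page and all its subpages.
function tpc_map_delete(array $map, string $base): array {
    foreach (array_keys($map) as $page) {
        if (tpc_is_under((string)$page, $base)) {
            unset($map[$page]);
        }
    }
    return $map;
}

$before = ['Th06' => 'lang_en', 'Th06/Music' => 'lang_en,lang_de', 'Th06b' => 'x'];
$moved = tpc_map_move($before, 'Th06', 'Th07');
$deleted = tpc_map_delete($before, 'Th06');
```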

And what about external sources?

That's an entirely different matter anyway. Since any polling code ($wgTPCExtPoll) in the extension would only run when users actually visit our site, it seems like a better idea to just create a standalone server process that accesses the MW API to update external data.

Or we get Touhou Wiki to run our extension :-)

2013-05-10T18:51:36  <Nmlgc> Or I write another, lightweight extension that sends some kind of signal to my server whenever a page is edited, and _then_ scrapes the page.
2013-05-10T18:51:57  <Nazeo> o:
2013-05-10T18:55:01  <Nmlgc> ... which would, in turn, require me to write another server thingy to listen for these signals.
2013-05-10T18:55:26  <Nazeo> ...is this something you want to do?
2013-05-10T18:55:47  <Nmlgc> Not... really.
2013-05-10T18:56:53  <Nazeo> Ack...
2013-05-10T18:57:32  <Nmlgc> And I don't think they would run this extension because 1) it's my first time writing PHP code and 2) flat files
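For the standalone polling idea, the request side would mostly be a matter of building a query against the external wiki's api.php. The sketch below only builds the URL. The function name and the example wiki address are assumptions, and fetching and parsing the response are left out.

```php
<?php
// Hypothetical sketch; the function name and the example URL are made up.

// Builds a MediaWiki API URL that asks for the latest revision text and
// timestamp of the given pages.
function tpc_ext_query_url(string $apiUrl, array $titles): string {
    $query = [
        'action' => 'query',
        'prop'   => 'revisions',
        'rvprop' => 'content|timestamp',
        'titles' => implode('|', $titles), // the API takes "|"-separated titles
        'format' => 'json',
    ];
    return $apiUrl . '?' . http_build_query($query);
}

$url = tpc_ext_query_url(
    'https://wiki.example.org/api.php',
    ['Th06/Spell cards', 'Th06/Music']
);
```

The returned timestamps could then be compared against the ones stored locally, so that only pages that actually changed are scraped again.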

Credits system

... not something to be implemented at this point, unfortunately.

Crucial features left to implement

☑ Template scraping
☑ Mirroring
☑ Binary hack parser
Patch map cache for non-existing pages → Sane rewrite of the page→patch mapping
☑ Version template parser
☑ Version templates
☑ Server-side stacking inside one patch
{{Thcrap ver}} switches (only second-level support for a small number of fields at the moment - do we need more? I don't think so)
☑ ... and a fix for the <fieldset> vs. <syntaxhighlight> issue
☑ Game info template parser
☑ Patch info template parser
Server template parser (now a "template" that generates a wikitable out of the server data in LocalSettings, hardcoded into the extension)
☑ Breakpoint template parser
☑ Breakpoint templates
☑ DialogTable parser (Python→PHP rewrite)
☑ Message format template parser
☑ Lists of all files included in a patch and their latest time stamp
☑ [[File:]] includes
☑ FileUpload hook
☑ Update, rebuilding and other hooks

(Note: With the basic template scraper in place, the specific parsers are mostly just ~30 SLOC of shuffling array elements around. The exception is DialogTable, which is a bit more complex.)

Time is running out~!

At this point, this is not going to work out without a strict schedule, so:

Friday, May 17: Finish server extension
Saturday, May 18: Deploy server extension to main wiki
Sunday, May 19: Start self-updating code
Monday, May 20: Finish self-updating code
Tuesday, May 21: Implement small features missing in the engine (relative paths, JSON font settings, plug-in system)
Wednesday, May 22: Write a really basic batch thingy for the users to configure their patch stack
Thursday, May 23: Update whereismynewtouhou.info and make first twitch.tv tests
Friday, May 24: <Buffer time - do optional features or revise older ones>

Viable features at this point:

  • Spell cards
Saturday, May 25

Things that will not be done by May 26

  • zipfile support - hooray for thousands of single-file HTTP requests for all the initial downloads
  • Any more advanced means of security on the wiki (Flagged Revisions) and server (stable/current branches) - important pages are going to be locked
  • released source code of the client-side patcher - I am still ashamed of this mess

Front-end improvements (Nazeo's department)

The actual code in any template page is never seen by the parser - which means that you can completely go nuts as far as the design of the Patch parser templates is concerned :-) MediaWiki:Common.css is not blocked for admins here, so feel free to store the CSS there.

These are just ideas:

  • Paint a "magic border" in a specific color around every template that is evaluated by the server
  • Clearly indicate when the current file context changes. This will provide greater transparency to hackers and translators - we could even link to these files on the server.