HodorCSE

Localization of software, while not trivial, is not a particularly novel problem. Where it gets more interesting is in resource-constrained systems, where display resolution limits how strings can be shown, and memory constraints may make it difficult to include multiple localized copies of any given string in a single binary. All of this comes on top of the usual (admittedly slight, in well-designed systems) difficulty of selecting a language at runtime while keeping the code reasonably readable.

This all comes to mind following discussion of providing translations of Doors CSE, a piece of software for the TI-84+ Color Silver Edition[1] that falls squarely into the "embedded software" category. The simple approach to localizing it (and the one taken in previous versions of Doors CS) is just to replace the hard-coded strings and rebuild.

As something of a joke, it was proposed to make additional "joke" translations, for languages such as Klingon or pirate. I proposed a Hodor translation, along the lines of the Hodor UI patch[2] for Android. After making that suggestion, I decided to exercise my skills a bit and actually make one.

Hodor (Implementation)[3]

Since I don't have access to the source code of Doors CSE, I had to modify the binary to rewrite the strings. Referring to the file format guide, we know that TI-8x applications are mostly Intel hex, with a short header. Additionally, I know that these applications are cryptographically signed, which means I will need to re-sign the application after making my changes.

Dumping contents

I installed the IntelHex module in a Python virtualenv to process the file into a format easier to modify, though I ended up not needing much capability from there. I simply used a hex editor to remove the header from the 8ck file (the first 0x4D bytes).
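If you'd rather not reach for a hex editor, the header removal is easy to script. Here's a minimal Python sketch, assuming (as above) that the header is exactly 0x4D bytes and that the Intel hex text begins immediately after it; strip_8ck_header is a made-up name for illustration.

```python
HEADER_LEN = 0x4D  # header size given in the text; assumed fixed for this file


def strip_8ck_header(raw: bytes) -> bytes:
    """Return the Intel hex text that follows the .8ck header."""
    if len(raw) < HEADER_LEN:
        raise ValueError("file is shorter than the .8ck header")
    body = raw[HEADER_LEN:]
    # Every Intel hex record starts with ':', so this is a cheap sanity check
    if not body.startswith(b":"):
        raise ValueError("expected Intel hex data right after the header")
    return body
```

Reading the .8ck in binary mode, passing the bytes through this function, and writing the result to a .hex file would reproduce the manual step.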

Simply trying to convert the 8ck payload to binary without further processing doesn't work in this case, because Doors CSE is a multipage application. On these calculators Flash applications are split into 16-kilobyte pages which get swapped into the memory bank at 0x4000. Thus the logical address of the beginning of each page is 0x4000, and programs that are not aware of the special delimiters used in the TI format (to delimit pages) handle this poorly. The raw hex file (after removing the 8ck header) looks like this:

:020000020000FC
:20400000800F00007B578012010F8021088031018048446F6F727343534580908081020382
:2040200022090002008070C39D40C39A6DC3236FC30E70C3106EC3CA7DC3FD7DC3677EC370
:20404000A97EC3FF7EC35D40C35D40C33D78C34E78C36A78C37778C35D40C3A851C9C940F3
:2040600001634001067001C36D00CA7D00BC6E00024900097A00E17200487500985800BDF8
[snip]
:020000020001FB
:20402000EF7D4721B98411AE84010900EDB0EFAA4AC302723A9B87B7CA4940FE01CA4340B9
:20404000C30272CD4F40C30272CDB540C30272EF67452100002275FE3EA03273FECD63405B


Lines 1 and 7 here are the TI-specific page markers, indicating the beginning of pages 0 and 1, respectively. The data lines following each marker contain 32 (0x20) bytes each, at addresses starting from 0x4000 (the "4000" field of each record). I extracted the data for each page into its own file with a text editor, minus the page delimiter. From there, I was able to use the hex2bin.py script provided with the IntelHex module to create two binary files, one for each page.
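Splitting the pages by hand works, but the same split can be sketched in plain Python without the IntelHex module. This assumes, based on the listing above, that each page begins with a type-02 record whose two data bytes give the page number, and that data records are addressed from 0x4000; split_pages is a hypothetical helper, not part of IntelHex.

```python
def split_pages(hex_text: str) -> dict:
    """Split TI multipage Intel hex text into {page_number: bytes}."""
    pages = {}
    current = None
    for line in hex_text.splitlines():
        line = line.strip()
        if not line.startswith(":"):
            continue  # skip blank lines and notes such as "[snip]"
        record = bytes.fromhex(line[1:])
        # All bytes of a record, checksum included, sum to 0 modulo 256
        if sum(record) & 0xFF:
            raise ValueError("bad checksum: " + line)
        count = record[0]
        address = int.from_bytes(record[1:3], "big")
        rtype = record[3]
        data = record[4:4 + count]
        if rtype == 0x02:  # TI page marker: data is the page number
            current = int.from_bytes(data, "big")
            pages[current] = bytearray()
        elif rtype == 0x00 and current is not None:  # data record
            offset = address - 0x4000  # pages are mapped at 0x4000
            page = pages[current]
            if len(page) < offset + count:
                # Pad any gap with 0xFF, the value of erased Flash
                page.extend(b"\xff" * (offset + count - len(page)))
            page[offset:offset + count] = data
        elif rtype == 0x01:  # end-of-file record
            break
    return {number: bytes(page) for number, page in pages.items()}
```

Filling gaps with 0xFF is an assumption on my part; it doesn't matter for string editing as long as the repacked file keeps the same addresses.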

Modifying strings

With two binary files, I was ready to modify some strings. The calculator's character set mostly coincides with ASCII, so I used the strings program packaged with GNU binutils to examine the strings in the image.
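For readers without binutils handy, the core of what strings does can be approximated in a few lines of Python: report every run of at least four printable ASCII characters (GNU strings' default minimum). This sketch also reports each run's offset, which is handy when jumping to it in a hex editor; printable_strings is an illustrative name, and characters where the calculator's set departs from ASCII will simply break a run.

```python
import re


def printable_strings(data: bytes, min_len: int = 4):
    """Yield (offset, text) for runs of printable ASCII at least min_len long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    for match in re.finditer(pattern, data):
        yield match.start(), match.group().decode("ascii")
```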

$ strings page00.bin
HDoorsCSE
##6M#60>
oJ:T
Uo&
dQ:T
[snip]
xImprove BASIC editor
Display clock
Enable lowercase
Always launch Doors CSE
Launch Doors CSE with PRGM]

With some knowledge of the strings in there, it was reasonably short work to find them with a hex editor (in this case I used HxD) and replace them with variants on the string "Hodor". I also found that page 1 of the application contains no meaningful strings, so I only needed to examine page 0. Some of the reported strings require care in modification, because they are names the system itself looks for. For example, "OFFSCRPT" appears in there, which I know from experience is the magic name that may be given to an AppVar to make the calculator execute its contents when turned off. Thus I did not modify that string, nor a few others (names of authors, URLs, etc.).

Repacking

I ran bin2hex.py to convert the modified page 0 binary back into hex, and pasted the contents of that file back into the whole-app hex file (replacing the original contents of page 0). From there, I had to re-sign the binary.[4] WikiTI points out how easy that process is, so I installed rabbitsign and went on my merry way:

$ rabbitsign -g -r -o HodorCSE.8ck HodorCSE.hex
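The bin2hex.py-and-paste step can likewise be sketched in Python. This assumes the layout seen in the hex listing earlier: a type-02 page marker, then 32-byte data records starting at 0x4000, each ending with the usual Intel hex checksum (the two's complement of the sum of the record's bytes). page_to_hex is a hypothetical helper, and its output would still need to go through rabbitsign.

```python
def page_to_hex(page_number: int, data: bytes, base: int = 0x4000, record_len: int = 0x20) -> list:
    """Render one page as TI-style Intel hex lines.

    The end-of-file record (":00000001FF") belongs once at the end of the
    whole application, so it is not emitted here.
    """
    def record(address: int, rtype: int, payload: bytes) -> str:
        body = bytes([len(payload)]) + address.to_bytes(2, "big") + bytes([rtype]) + payload
        checksum = (-sum(body)) & 0xFF  # makes the record's bytes sum to 0 mod 256
        return ":" + (body + bytes([checksum])).hex().upper()

    lines = [record(0, 0x02, page_number.to_bytes(2, "big"))]  # page marker
    for offset in range(0, len(data), record_len):
        lines.append(record(base + offset, 0x00, data[offset:offset + record_len]))
    return lines
```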


Testing

I loaded the app up in an emulator to give it a quick test, and was met by complete nonsense, as intended.

I'm providing the final modified 8ck here, for the amusement of my readers. I don't suggest that anybody use it seriously, not least because I didn't test it at all thoroughly to be sure I didn't inadvertently break something.

Extending the concept

It's relatively easy to extend this concept to the calculator's OS as well (and in fact similar string replacements have been done before) with the OS signing keys in hand. I lack the inclination to do so, but surely somebody else would be able to do something fun with it using the process I outlined here.

1. That name sounds stupider every time I write it out. Henceforth, it's just "the CSE."
2. The programmer of that one took it surprisingly far: even the code itself is Hodor-filled wherever that was feasible.
3. Hodor hodor hodor hodor. Hodor hodor hodor.
4. This signature doesn't identify the author, as you might assume. Once upon a time TI offered application authors the option to pay for a signing key associated with them personally, but that system never saw wide use. Nowadays everybody signs their applications with the public "freeware" keys, simply because the calculator requires every app to be signed with a key whose public half is stored on the calculator, and the freeware keys are preinstalled on all of them.

"A Sufficiently Smart Compiler"

On a bit of a lark today, I decided to see if I could get Spasm running in a web browser via Emscripten. I was successful, but found that something seemed to be optimizing out most of main() such that I had to hack in my own main function that performed the same critical functions and (for the sake of simplicity) hard-coded the relevant command-line options.

Looking into the problem a bit further, I observed that not all of main() was being removed; there was one critical line left in. The beginning of the function in source and the generated code were as follows.

C++ source:

int main (int argc, char **argv)
{
    int curr_arg = 1;
    bool case_sensitive = false;
    bool is_storage_initialized = false;

    use_colors = true;
    extern WORD user_attributes;
    user_attributes = save_console_attributes ();
    atexit (restore_console_attributes_at_exit);

    //if there aren't enough args, show info
    if (argc < 2) {

Generated Javascript (asm.js):

function _main($argc,$argv) {
    $argc = $argc | 0;
    $argv = $argv | 0;
HEAP8[4296] = 1;
__Z23save_console_attributesv() | 0;
return 0;
}

Spasm is known to work in general, but I also found it unlikely that LLVM's optimizer was simply miscompiling this code. Building with optimizations turned off generated correct code, so it was definitely the optimizer changing the behavior and not some silly bug in Emscripten. Looking a little deeper into the save_console_attributes function, we see the following code:

WORD save_console_attributes () {
#ifdef WIN32
    CONSOLE_SCREEN_BUFFER_INFO csbiScreenBufferInfo;
    GetConsoleScreenBufferInfo (GetStdHandle (STD_OUTPUT_HANDLE), &csbiScreenBufferInfo);
    return csbiScreenBufferInfo.wAttributes;
#endif
}

Since I'm not building for a Windows target (Emscripten's runtime environment resembles a Unix-like system), this was preprocessed down to an empty function body, even though the function is declared to return a WORD. Flowing off the end of a non-void function without returning a value smells like undefined behavior! Let's make this function return 0:

WORD save_console_attributes () {
#ifdef WIN32
    CONSOLE_SCREEN_BUFFER_INFO csbiScreenBufferInfo;
    GetConsoleScreenBufferInfo (GetStdHandle (STD_OUTPUT_HANDLE), &csbiScreenBufferInfo);
    return csbiScreenBufferInfo.wAttributes;
#else
    return 0;
#endif
}

With that single change, I now get useful code in main. Evidently LLVM's optimizer was smart enough to recognize that the call to that function invoked UB, and optimized out the rest of main.

Concluding

This issue nicely illustrates the dangers of a sufficiently smart compiler, where updates to your compiler might break code that appeared to work but was subtly broken all along. This is particularly a concern in C (and C++), where compilers tend to go to extreme lengths to optimize the generated code and there are many ways to inadvertently invoke undefined behavior.

Static analyzers are a big help in finding these issues. Looking more closely at the compiler output from building Spasm, I found it had emitted a warning about this function, as well as about several potential buffer overflows of the following form:

strncat(s, "/", sizeof(s));

This looks correct (s is a fixed-size array), but it is subtly broken. The third parameter of strncat is the maximum number of characters to append from the source string, not the size of the destination buffer, and strncat always writes a null terminator after whatever it appends. The safe bound is sizeof(s) - strlen(s) - 1; passing sizeof(s) lets the appended characters and the null terminator run past the end of s.
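To make the byte counting concrete, here's a small Python model of strncat's specified behavior on a fixed-size buffer: it appends at most n bytes from the source and then always writes a terminating NUL. strncat and safe_n here are hypothetical helpers that raise IndexError where C would silently write out of bounds (undefined behavior); this is an illustration, not Spasm code.

```python
def strncat(buf: bytearray, src: bytes, n: int) -> None:
    """Model C strncat() on a fixed-size, NUL-terminated buffer."""
    end = buf.index(0)       # strlen(dest): position of the current terminator
    piece = src[:n]          # at most n bytes from src (src contains no NUL)
    last = end + len(piece)  # index where the new terminator lands
    if last >= len(buf):
        raise IndexError("would write byte %d of a %d-byte buffer" % (last, len(buf)))
    buf[end:last] = piece
    buf[last] = 0


def safe_n(buf: bytearray) -> int:
    """Largest n that cannot overflow: sizeof(s) - strlen(s) - 1."""
    return len(buf) - buf.index(0) - 1
```

With an 8-byte buffer already holding a 7-character string, both n = sizeof(s) and n = sizeof(s) - 1 still try to write the terminator one byte past the end, because the single "/" is appended either way; only the remaining-space bound avoids the overflow.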

Appendix

The code for my work on this is up on Bitbucket and might be of interest to some readers. I fear that by working on this project I've inadvertently committed to becoming the future maintainer of Spasm, which I find to contain a significant amount of poor-quality code. Perhaps I'll have to write a replacement for Spasm in Rust, which I've been quite pleased with as a potential replacement for C, without the numerous pitfalls and rather more modern in its capabilities.

Reverse-engineering Ren'py packages

Some time ago (September 3, 2013, apparently), I had just finished reading Analogue: A Hate Story (which I highly recommend, by the way) and was particularly taken with the art. At that point my engineer's instincts kicked in, and it seemed reasonable to reverse-engineer the resource archives to extract the art for my own nefarious purposes.

Yeah, I really got into Analogue. That's all of the achievements.

A little examination of the game files revealed a convenient truth: it was built with Ren'Py, an open-source visual novel engine written in Python. Python is a language I'm quite familiar with, so the task promised to be well within my expertise.

Code

Long story short, I've built some rudimentary tools for working with compiled Ren'py data. You can get them from my repository on BitBucket. Technically inclined readers might also want to follow along in the code while reading.

Background

There are a large number of games designed with Ren'py. It's an easy tool to get started with and hack on, since the script language is fairly simple and because it's open-source, more sophisticated users are free to bend it to their will. A few examples of (in my opinion) high-quality things built with the engine:

Since visual novels tend to live or die on the combination of art and writing, the ability to examine the assets outside the game environment offers interesting possibilities.

Since it was handy, I started my experimentation with Analogue.

I've been working on a project that uses GStreamer to play back audio files in an automatically-determined order. My implementation uses a playbin, which is nice and easy to use. I had some issues getting it to continue playback on reaching the end of a file, though.

According to the documentation for the about-to-finish signal,

This signal is emitted when the current uri is about to finish. You can set the uri and suburi to make sure that playback continues.

This signal is emitted from the context of a GStreamer streaming thread.

Because I wanted to avoid blocking a streaming thread under the theory that doing so might interrupt playback (the logic in determining what to play next hits external resources so may take some time), my program simply forwarded that message out to be handled in the application's main thread by posting a message to the pipeline's bus.

Now, this approach appeared to work, except that it didn't start playing the next URI, and the pipeline never changed state; it was simply wedged. It turns out that you must assign to the uri property from within the signal handler itself (that is, on the same streaming thread); otherwise it has no effect.

Fortunately, blocking that streaming thread while waiting for data turns out not to be an issue (I determined this experimentally, by simply blocking the thread for a while before setting the uri).
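Assuming Python bindings (PyGObject), the approach that ended up working looks roughly like the following sketch: do the (possibly slow) lookup and set the uri directly in the about-to-finish handler, on the streaming thread. make_about_to_finish_handler and choose_next_uri are hypothetical names; the handler takes the playbin as an argument, so it can be exercised without GStreamer installed.

```python
def make_about_to_finish_handler(choose_next_uri):
    """Build an 'about-to-finish' callback for a playbin.

    choose_next_uri is a caller-supplied function that may block; it runs on
    the GStreamer streaming thread that emitted the signal, and the new URI
    is assigned on that same thread before the handler returns.
    """
    def on_about_to_finish(playbin):
        uri = choose_next_uri()
        if uri is not None:
            playbin.set_property("uri", uri)
    return on_about_to_finish
```

Wiring it up would then be playbin.connect("about-to-finish", make_about_to_finish_handler(pick_next)), with playbin created by Gst.ElementFactory.make("playbin", None) and pick_next standing in for your own selection logic.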

Newlib's git repository

I had quite a time finding it when I wanted to submit a patch to newlib, so here's a note: there is a git mirror of the canonical CVS repository for newlib, and all the patches I saw on the mailing list were based on it. Maybe somebody else looking for it will find this useful:

git clone git://sourceware.org/git/newlib.git

Matrioshka brains and IPv6: a thought experiment

Nich (one of my roommates) mentioned recently that discussion in his computer networking course this semester turned to IPv6 in a recent session, and we spent a short while coming up with interesting ways to consider the size of the IPv6 address pool.

Assuming $2^{128}$ available addresses (an overestimate, since some of them are reserved for certain uses and are not publicly routable), for example, there are more IPv6 addresses than there are (estimated) grains of sand on Earth by a factor of approximately $3 \times 10^{14}$ (Wolfram|Alpha says there are between $10^{20}$ and $10^{24}$ grains of sand on Earth).
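The arithmetic behind that factor is easy to check; here's a quick Python sketch using the figures above (the variable names are mine):

```python
ADDRESSES = 2 ** 128                     # upper bound on IPv6 addresses
SAND_LOW, SAND_HIGH = 10 ** 20, 10 ** 24  # grains of sand on Earth, per the estimate above

per_grain_high = ADDRESSES / SAND_HIGH   # addresses per grain, high sand estimate
per_grain_low = ADDRESSES / SAND_LOW     # addresses per grain, low sand estimate

print("2^128 = {:.3e}".format(ADDRESSES))
print("addresses per grain: {:.1e} to {:.1e}".format(per_grain_high, per_grain_low))
```

Against the low estimate of $10^{20}$ grains, the factor is closer to $3 \times 10^{18}$.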

A Matrioshka brain?

While Nich quickly lost interest in this diversion into math, I started venturing into cosmic scales to find numbers that compare to that very large address space. I eventually started attempting to do things with the total mass of the Solar System, at which point I made the connection to a Matrioshka brain.

"A what?" you might say. A Matrioshka brain is a megastructure composed of multiple nested Dyson spheres, themselves megastructures of orbiting solar-power satellites in density sufficient to capture most of a star's energy output. A Matrioshka brain uses the captured energy to power computation at an incredible scale, probably to run an uploaded version of something evolved from contemporary civilization (compared to a more classical use of powering a laser death ray or something). Random note: a civilization capable of building a Dyson sphere would be at least Type II on the Kardashev scale. I find Charlie Stross' novel Accelerando to be a particularly vivid example, beginning in a recognizable near-future sort of setting and eventually progressing into a Matrioshka brain-based civilization.

While the typical depiction of a Dyson sphere is a solid shell, it's much more practical to build a swarm of individual devices that together form a sort of soft shell. This is how it's approached in Accelerando, where the Solar System's non-Solar mass is converted into "computronium", effectively a Dyson swarm of processors with integrated thermal generators. By receiving energy from the sunward side and radiating waste heat to the next layer out, each layer can perform computation.

Let's calculate

Okay, we've gotten definitions out of the way. Now, what I was actually pondering: how does the number of routable IPv6 addresses compare to an estimate of the number of computing devices there might be in a Matrioshka brain? That is, would IPv6 be sufficient as a routing protocol for such a network, and how many devices might that be?

A silicon wafer used for manufacturing electronics, looking into the near future, has a diameter of 450 millimeters and a thickness of 925 micrometers (450mm wafers are not yet common, but mass-production processes for this size are being developed as the next standard). These wafers are effectively pure single crystals (that is, monocrystalline) of elemental silicon, which are processed to become semiconductor integrated circuits. Our first target, then, will be to determine the mass of an ideal 450mm wafer.

First, we'll need the volume of that wafer (since I was unable to find a precise number for a typical wafer's mass):
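Treating the wafer as an ideal cylinder of radius 225 mm and thickness 925 micrometers, the volume works out as follows (a reconstruction from the stated dimensions, not a measured figure):

$$V = \pi r^2 h = \pi \,(22.5\ \text{cm})^2\,(0.0925\ \text{cm}) \approx 147.1\ \text{cm}^3$$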

Given the wafer's volume, we then need to find its density in order to calculate its mass. I'm no chemist, but I know enough to be dangerous in this instance. A little bit of research reveals that silicon crystals have the same structure as diamond, which is known as diamond cubic. It looks something like this:

Now, this diagram is rather difficult to make sense of, and I struggled to find a way to estimate the number of atoms in a given volume from it. A little more searching, however, revealed a handy reference in a materials science textbook. The example I've linked here notes that there are 8 atoms per unit cell, which puts us in a useful position for further computation. Given that, the only remaining question is how large each unit cell is, which turns out to be given by the crystal's lattice constant.
According to the above reference, and supported by the same information from the ever-useful HyperPhysics, the lattice constant of silicon is 0.543 nanometers. With this knowledge in hand, we can compute the average volume per atom in a silicon crystal, since the crystal structure fits 8 atoms into a cube with sides 0.543 nanometers long.
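With 8 atoms per cubic unit cell of side $a = 0.543$ nm, the average volume per atom is one eighth of the cell volume:

$$V_{\text{atom}} = \frac{a^3}{8} = \frac{(0.543\ \text{nm})^3}{8} \approx 0.02001\ \text{nm}^3 = 2.001 \times 10^{-23}\ \text{cm}^3$$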

Now that we know the amount of space each atom (on average) takes up in this crystal, we can use the atomic mass of silicon to compute the density. Silicon's atomic mass is 28.0855 atomic mass units, or about $4.66371 \times 10^{-23}$ grams.
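Dividing the mass of one atom by the average volume it occupies (one eighth of the unit cell, $a^3/8$) gives the density:

$$\rho = \frac{m_{\text{Si}}}{a^3/8} = \frac{4.66371 \times 10^{-23}\ \text{g}}{(5.43 \times 10^{-8}\ \text{cm})^3 / 8} \approx 2.33\ \text{g/cm}^3$$

This agrees well with the commonly quoted density of crystalline silicon, about 2.33 g/cm³.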

Thus, we can easily compute the mass of a single wafer, given the volume we computed earlier.
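Putting the numbers together in Python as a sanity check (the constants are the ones quoted above; the variable names are my own):

```python
import math

DIAMETER_CM = 45.0               # 450 mm wafer
THICKNESS_CM = 925e-4            # 925 micrometers
LATTICE_CONSTANT_CM = 0.543e-7   # 0.543 nm
ATOMS_PER_CELL = 8               # diamond cubic
ATOM_MASS_G = 4.66371e-23        # 28.0855 u

# Ideal cylinder: pi * r^2 * h
volume_cm3 = math.pi * (DIAMETER_CM / 2) ** 2 * THICKNESS_CM
# Each cubic unit cell of side a holds 8 atoms
volume_per_atom_cm3 = LATTICE_CONSTANT_CM ** 3 / ATOMS_PER_CELL
density_g_cm3 = ATOM_MASS_G / volume_per_atom_cm3
wafer_mass_g = volume_cm3 * density_g_cm3

print(f"wafer volume: {volume_cm3:.1f} cm^3")
print(f"density:      {density_g_cm3:.3f} g/cm^3")
print(f"wafer mass:   {wafer_mass_g:.1f} g")
```

An ideal 450mm wafer comes out to roughly 343 grams.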

"Four"ier transform

Today's Saturday Morning Breakfast Cereal:

I liked the joke and am familiar enough with the math of working in unusual bases that I felt a need to implement a quick version of this in Python. Code follows.

#!/usr/bin/env python

def fourier(x, b):
    """Attempts to find a fourier version of x, working down from base b.

    Returns the fouriest base, or -1 if no base produces a 4."""
    mostFours = 0
    bestBase = -1

    for base in range(b, 1, -1):
        fours = 0
        t = x
        while t != 0:
            if (t % base) == 4:
                fours += 1
            t //= base

        # Prefer lower bases
        if fours >= mostFours:
            print(baseconvert(x, base) + "_{0}".format(base))
            mostFours = fours
            # Only a base containing at least one 4 counts; otherwise
            # bestBase stays -1 and x is reported as four-prime.
            if fours > 0:
                bestBase = base

    return bestBase

BASE_CHARS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
def baseconvert(x, base):
    s = ""
    while x != 0:
        s += BASE_CHARS[x % base]
        x //= base
    # Zero produces no digits in the loop above, so spell it out
    return ''.join(reversed(s)) or "0"

if __name__ == '__main__':
    from sys import argv, exit
    if len(argv) < 2:
        print("""Usage: {0} number

Computes the "four"ier transform of number, printing the optimizations to
reach the "fouriest" form.""".format(argv[0]))
        exit(1)

    x = int(argv[1])
    # Base 36 is the largest sensible base to use
    base = fourier(x, 36)

    if base == -1:
        print("{0} is four-prime!".format(x))

This is Python 3.x code, using explicit integer division (//). Since // and str.format are also available in Python 2.6 and later, it should run unchanged there as well. It only goes up to base 36, because I didn't want to deal with bases that are hard to represent in reasonable ways. Bases up to 64 are an option, but then I would have wanted to use MIME base 64, which puts the digits 0-9 at positions 52 through 61 and would be confusing to read. It could be adjusted with relative ease to support larger bases.

Running a few examples:

$ python fourier.py 624
HC_36
HT_35
IC_34
IU_33
JG_32
K4_31
143_23
1B4_20
440_12
4444_5
$ python fourier.py 65535
1EKF_36
1IHF_35
1MNH_34
1R5U_33
1VVV_32
2661_31
2COF_30
2JQO_29
2RGF_28
38O6_27
3IOF_26
44LA_25
B44F_18
14640_15
4044120_5

$ python fourier.py 3
3 is four-prime!

A few quirks: it prefers lower bases, so a base that only ties an earlier (higher) base in fouriness is still printed. I've decided to call values that have no representation containing a '4' character "four-prime", which is probably going to be a rare occurrence, but the program handles it okay.

Generalization of the algorithm is certainly possible, and basically requires changing the condition on line 14 to match different digit values. For example, a hypothetical "Three"ier transform would replace the '4' on line 14 with a '3'.
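Here's a sketch of that generalization with the digit as a parameter. digitiest and count_digit are hypothetical names; like fourier(), it prefers lower bases on ties, but it only returns the winner instead of printing each improvement, and it skips bases too small to contain the digit at all.

```python
def count_digit(x, base, digit):
    """Count how many times digit appears in x written in the given base."""
    count = 0
    while x:
        x, remainder = divmod(x, base)
        if remainder == digit:
            count += 1
    return count


def digitiest(x, digit=4, max_base=36):
    """Return (base, count) for the base where x contains digit most often.

    Ties go to the lower base. Only bases larger than digit can contain it,
    so the search stops at digit + 1. Returns (None, 0) if no base does.
    """
    best_base, best_count = None, 0
    for base in range(max_base, digit, -1):
        count = count_digit(x, base, digit)
        if count > 0 and count >= best_count:
            best_base, best_count = base, count
    return best_base, best_count
```

For example, digitiest(624) gives (5, 4), matching the 4444_5 result above, and digitiest(13, digit=3) gives (4, 1) for a "three"ier transform.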

There's a rather interesting discussion of the topic over on Reddit, as well as a few other implementations. (Thanks to Merth for pointing those out to me.)

Of Cable Modems and the Dumb Solution

I was studying in Japan last semester (which partly explains why I haven't posted anything interesting here in a while). That's a whole different set of things to blog about, which I'll get to at some point with any luck (maybe I'll just force myself to write one post per day for a bit, even though these things tend to take at least a few hours to write...).

Background

At any rate, back in Houghton I live with a few roommates in an apartment served by Charter internet service (which I believe is currently DOCSIS2). The performance tends to be quite good (it seems that the numbers that they quote for service speeds are guaranteed minimums, unlike most other ISPs), but I like to have complete control over my firewall and routing.

In the past, such freedom has been achieved through my trusty WRT54GL, but the 4-megabyte Flash chip in that device makes it hard to fit in a configuration that includes IPv6 support, which is increasingly important to me. As I had an Intel Atom-based board sitting around some time ago, I decided to turn that into a full-time router/firewall running pfSense. The power available with pfSense is probably overkill for my needs, but it ensures I'll be able to stay up to date and potentially do fancy things with my network configuration at some future date.

Returning to the matter at hand: the whole system was working just fine for a while, but I got a report from my roommates that the internet connection had stopped working, but came up okay with a bargain-basement consumer router (a Linksys/Cisco E900). From what information I was able to get from my roommates, it sounded like a hardware failure in the secondary network card, which is used for the WAN uplink (not exactly surprising, since it's a 100-megabit PCI Ethernet card I pulled out of something else some time ago).

Debugging

On my recent return to the apartment, one of my priorities was getting the pfSense system up and running again as the main router/firewall. While the E900 was performing fine, pfSense allows me to get a few additional things out of the connection. Most notably, Charter provide a 6rd relay for ISP-provided IPv6 (compared to something like the public IPv6 tunnel service available from Hurricane Electric), which is quite desirable to me.

After performing a basic test, the pfSense box did indeed fail to get a public IP address from Charter when put in place as the primary gateway. At that point, I decided to break out a network analyzer (Wireshark in this case) and see how the DHCP solicitations on the WAN interface differed between the E900 and my pfSense configuration. What follows is Wireshark's dissection of a single DHCP Discover message from each system.

Ethernet II, Src: Micro-St_60:86:0c (8c:89:a5:60:86:0c), Dst: Broadcast (ff:ff:ff:ff:ff:ff)
Source: Micro-St_60:86:0c (8c:89:a5:60:86:0c)
Type: IP (0x0800)
Internet Protocol Version 4, Src: 0.0.0.0 (0.0.0.0), Dst: 255.255.255.255 (255.255.255.255)
Version: 4
Differentiated Services Field: 0x10 (DSCP 0x04: Unknown DSCP; ECN: 0x00: Not-ECT (Not ECN-Capable Transport))
0001 00.. = Differentiated Services Codepoint: Unknown (0x04)
.... ..00 = Explicit Congestion Notification: Not-ECT (Not ECN-Capable Transport) (0x00)
Total Length: 328
Identification: 0x0000 (0)
Flags: 0x00
Fragment offset: 0
Time to live: 128
Protocol: UDP (17)
Source: 0.0.0.0 (0.0.0.0)
Destination: 255.255.255.255 (255.255.255.255)
User Datagram Protocol, Src Port: bootpc (68), Dst Port: bootps (67)
Source port: bootpc (68)
Destination port: bootps (67)
Length: 308
Checksum: 0x9918 [validation disabled]
Bootstrap Protocol
Message type: Boot Request (1)
Hardware type: Ethernet
Hops: 0
Transaction ID: 0x1a5f4329
Seconds elapsed: 0
.000 0000 0000 0000 = Reserved flags: 0x0000
Next server IP address: 0.0.0.0 (0.0.0.0)
Relay agent IP address: 0.0.0.0 (0.0.0.0)
Server host name not given
Boot file name not given
Option: (53) DHCP Message Type
Length: 1
DHCP: Discover (1)
Option: (12) Host Name
Length: 10
Host Name: Needlecast
Option: (55) Parameter Request List
Length: 4
Parameter Request List Item: (1) Subnet Mask
Parameter Request List Item: (3) Router
Parameter Request List Item: (15) Domain Name
Parameter Request List Item: (6) Domain Name Server
Option: (61) Client identifier
Length: 7
Hardware type: Ethernet
Option: (255) End
Option End: 255
Padding
pfSense 2.0.2

Ethernet II, Src: 3com_8a:b9:6b (00:50:da:8a:b9:6b), Dst: Broadcast (ff:ff:ff:ff:ff:ff)
Source: 3com_8a:b9:6b (00:50:da:8a:b9:6b)
Type: IP (0x0800)
Internet Protocol Version 4, Src: 0.0.0.0 (0.0.0.0), Dst: 255.255.255.255 (255.255.255.255)
Version: 4
Differentiated Services Field: 0x10 (DSCP 0x04: Unknown DSCP; ECN: 0x00: Not-ECT (Not ECN-Capable Transport))
0001 00.. = Differentiated Services Codepoint: Unknown (0x04)
.... ..00 = Explicit Congestion Notification: Not-ECT (Not ECN-Capable Transport) (0x00)
Total Length: 328
Identification: 0x0000 (0)
Flags: 0x00
Fragment offset: 0
Time to live: 16
Protocol: UDP (17)
Source: 0.0.0.0 (0.0.0.0)
Destination: 255.255.255.255 (255.255.255.255)
User Datagram Protocol, Src Port: bootpc (68), Dst Port: bootps (67)
Source port: bootpc (68)
Destination port: bootps (67)
Length: 308
Checksum: 0x3a68 [validation disabled]
Bootstrap Protocol
Message type: Boot Request (1)
Hardware type: Ethernet
Hops: 0
Transaction ID: 0x06303c2b
Seconds elapsed: 0
Bootp flags: 0x0000 (Unicast)
0... .... .... .... = Broadcast flag: Unicast
.000 0000 0000 0000 = Reserved flags: 0x0000
Next server IP address: 0.0.0.0 (0.0.0.0)
Relay agent IP address: 0.0.0.0 (0.0.0.0)
Server host name not given
Boot file name not given
Option: (53) DHCP Message Type
Length: 1
DHCP: Discover (1)
Option: (61) Client identifier
Length: 7
Hardware type: Ethernet
Option: (12) Host Name
Length: 7
Host Name: pfSense
Option: (55) Parameter Request List
Length: 8
Parameter Request List Item: (1) Subnet Mask
Parameter Request List Item: (2) Time Offset
Parameter Request List Item: (121) Classless Static Route
Parameter Request List Item: (3) Router
Parameter Request List Item: (15) Domain Name
Parameter Request List Item: (6) Domain Name Server
Parameter Request List Item: (12) Host Name
Option: (255) End
Option End: 255
Padding

(Apologies to anybody who finds the above ugly, but I only have so much patience for CSS while blogging.)

There are a few differences there, none of which seem really harmful. Given it was working without incident before, however, I guessed that maybe some upstream configuration had changed and become buggy. In particular, I thought that either the BOOTP broadcast flag (the "Bootp flags" field in the dissections) needed to be set for some reason, or the upstream DHCP server was choking on some of the parameters pfSense was requesting.
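The fields being compared map onto the fixed BOOTP header plus a list of DHCP options (RFC 2131 and RFC 2132). As an illustration, here's a Python sketch that builds a Discover payload resembling the E900's, with the broadcast flag (the top bit of the 16-bit flags field) optionally set. build_discover is a hypothetical helper for inspecting bytes, not something I used at the time, and it doesn't send anything.

```python
import struct

DHCP_MAGIC_COOKIE = b"\x63\x82\x53\x63"


def build_discover(mac: bytes, xid: int, hostname=None, params=(1, 3, 15, 6), broadcast=False) -> bytes:
    """Build a minimal DHCPDISCOVER payload (BOOTP request plus options)."""
    flags = 0x8000 if broadcast else 0x0000
    header = struct.pack(
        "!BBBBIHH4s4s4s4s16s64s128s",
        1,                        # op: BOOTREQUEST
        1,                        # htype: Ethernet
        6,                        # hlen: MAC address length
        0,                        # hops
        xid,                      # transaction ID
        0,                        # seconds elapsed
        flags,                    # bit 15 is the broadcast flag
        bytes(4), bytes(4), bytes(4), bytes(4),  # ciaddr, yiaddr, siaddr, giaddr
        mac.ljust(16, b"\x00"),   # chaddr
        bytes(64), bytes(128),    # sname, file: not given
    )
    options = bytearray(DHCP_MAGIC_COOKIE)
    options += bytes([53, 1, 1])  # DHCP Message Type: Discover
    if hostname:
        name = hostname.encode("ascii")
        options += bytes([12, len(name)]) + name           # Host Name
    options += bytes([55, len(params)]) + bytes(params)    # Parameter Request List
    options += bytes([61, 1 + len(mac), 1]) + mac          # Client identifier: type 1 + MAC
    options.append(255)                                    # End
    # Pad to the traditional 300-byte BOOTP minimum
    return (header + bytes(options)).ljust(300, b"\x00")
```

The 300-byte padding matches the UDP length of 308 seen in both dissections (8 bytes of UDP header plus 300 of payload).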

In an effort to pin down the problem, I manually made some DHCP requests with dhclient configured to match what I was seeing from the E900. The configuration I used with dhclient looked like this (where xl0 is the identifier BSD assigns to my WAN interface):

interface "xl0" {
send host-name "Needlecast";
send dhcp-client-identifier 1:8c:89:a5:60:86:0c;
}

This yielded packets that, when examined in Wireshark, only differed by some of the hardware addresses and the BOOTP broadcast flag. At that point I was rather stuck. Newer releases of dhclient support an option to force the broadcast flag in requests, but FreeBSD (which pfSense is derived from) does not provide a new enough version to have that option, and I didn't want to try building it myself. In addition, I know that my ISP doesn't lock connections to MAC addresses, so I shouldn't have to spoof the MAC address of the E900 (indeed, nothing needed to be changed when switching from pfSense to the E900, so the other direction shouldn't need anything special).

Since I was stuck, it was time to start doing things that seemed increasingly unlikely. One comment on the pfSense forum related to a similar issue mentioned that cable modems tend to be simple DOCSIS-to-Ethernet bridges, so there's some sort of binding to the client MAC address in the upstream DOCSIS machinery, which rebooting the modem should reset. So I hooked everything up normally, cycled power to the modem and booted up pfSense, and...

...it worked.

I had spent a few evenings working on the problem, and the fix was that simple. I was glad it was finally working so I could reconfigure internet-y goodness (QoS, DDNS updating, 6rd tunneling, VPN) on it, but there was certainly also frustration mixed in there.

Lessons

So what's the lesson? I suppose we might say that "you're never too knowledgeable to try rebooting it". It's common advice to less savvy users to "try rebooting it", but I think that's an oft-neglected solution when more technically-inclined individuals are working on a problem. On the other hand, maybe I've just learned some details about DOCSIS systems and the solution in this case happened to be rebooting.

<witty and relevant image goes here>

Better SSL

I updated the site's SSL certificate to no longer be self-signed. This means that if you use the site over HTTPS, you won't need to manually accept the certificate, since it is now signed by StartSSL. If you're interested in doing similar, Ars Technica have a decent walk through the process (though they target nginx for configuration, which may not be useful to those running other web servers).

For convenience, you can follow this link to switch to the HTTPS site.

Rationale

I was reading up on web frameworks available when programming in Haskell earlier today, and I liked the use of domain-specific languages (DSLs) within frameworks such as the routing syntax in Yesod. Compared to how routes are specified in Django (as a similar example that I'm already familiar with), the DSL is both easier to read (because it doesn't need to be valid code in the hosting language) and faster (since it ends up getting compiled into the application as properly executable code).

A pattern I find myself using rather often in Python projects is to have a small module (usually called config) that encapsulates an INI-style configuration file. It feels like an ugly solution though, since it generally just exports a ConfigParser instance. Combined with consideration of DSLs in Haskell, that got me thinking: what if there were an easier way that made INI configuration files act like Python source such that they could just be imported and have the contents of the file exposed as simple Python types (thus hiding some unnecessary complexity)?

Implementation

I was aware of Python's import hook mechanism, so I figured it would be a good way to approach this problem, and it turned out to be a good excuse to learn more about how import hooks work. The following code exposes INI-style configuration files as Python modules. I only tested it on Python 2.7; porting it to Python 3 would take more than renaming the import, since the module is spelled configparser there, SafeConfigParser became plain ConfigParser, and the imp module has since been removed entirely (in Python 3.12).

import ConfigParser, imp, os, sys

class INILoader(object):
    """PEP 302 finder and loader that exposes .ini files as modules."""

    def __init__(self, prefix):
        self.prefix = prefix

    def load_module(self, name):
        # PEP 302: reuse an existing module object so that reload() works
        module = sys.modules.get(name)
        if module is None:
            module = imp.new_module(name)

        if name == self.prefix:
            # 'from config import foo' gets config then config.foo,
            # so we need a dummy package.
            module.__package__ = name
            module.__path__ = []
            module.__file__ = __file__
        else:
            # Try to find a .ini file
            module.__package__, _, fname = name.rpartition('.')
            fname += '.ini'
            module.__file__ = fname
            if not os.path.isfile(fname):
                raise ImportError("Could not find a .ini file matching " + name)
            else:
                load_ini(fname, module)

        module.__loader__ = self
        sys.modules[name] = module
        return module

    def find_module(self, name, path=None):
        # Claim only the virtual package itself and modules inside it
        if name == self.prefix or name.startswith(self.prefix + '.'):
            return self
        else:
            return None

def load_ini(f, m):
    """Load ini-style file f into module m."""
    cp = ConfigParser.SafeConfigParser()
    cp.read(f)
    for section in cp.sections():
        setattr(m, section, dict(cp.items(section)))

def init(package='config'):
    """Install the ini import hook for the given virtual package name."""
    sys.meta_path.append(INILoader(package))

Most of this code should be fairly easy to follow. The magic of the import hook itself is all in the INILoader class, and exactly how that works is specified in PEP 302.
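The code above targets Python 2.7 and the PEP 302 find_module/load_module protocol, which modern Python 3 has moved away from (the import system stopped calling find_module() in 3.12, the same release that removed imp). A sketch of the same idea using the PEP 451 find_spec/exec_module protocol might look like this; INIFinder and its directory parameter are my own additions, and I haven't checked it against every corner of the original's behavior.

```python
import configparser
import importlib.abc
import importlib.machinery
import importlib.util
import os
import sys


class INIFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Expose <directory>/<name>.ini files as modules under a virtual package."""

    def __init__(self, prefix, directory="."):
        self.prefix = prefix
        self.directory = directory

    def find_spec(self, fullname, path=None, target=None):
        if fullname == self.prefix:
            # Empty package so that 'from config import foo' can resolve foo
            return importlib.machinery.ModuleSpec(fullname, self, is_package=True)
        if fullname.startswith(self.prefix + "."):
            fname = os.path.join(self.directory, fullname.rpartition(".")[2] + ".ini")
            if os.path.isfile(fname):
                return importlib.util.spec_from_file_location(fullname, fname, loader=self)
        return None  # let the import system raise ModuleNotFoundError

    def create_module(self, spec):
        return None  # use the default module object

    def exec_module(self, module):
        origin = module.__spec__.origin
        if origin is None:
            return  # the package itself has no contents
        cp = configparser.ConfigParser()
        cp.read(origin)
        for section in cp.sections():
            setattr(module, section, dict(cp.items(section)))


def init(package="config", directory="."):
    """Install the hook; returns the finder so it can be removed later."""
    finder = INIFinder(package, directory)
    sys.meta_path.append(finder)
    return finder
```

Because exec_module runs again on an existing module object, importlib.reload(foo) re-reads the file, and keeping the returned finder makes removing the hook a one-line sys.meta_path.remove(finder).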

Usage

So how do you use this? Basically, you must simply run init(), then any imports from the specified package (config by default) will be resolved from an .ini file rather than an actual Python module. Sections in a file are exposed as dictionaries under the module.

An example is much more informative than the preceding short description, so here's one. I put the code on my Python path as INIImport.py and created foo.ini with the following contents:

[cat]
sound=meow
[dog]
sound=woof
[cow]
sound=moo

It has three sections, each describing an animal. Now I load up a Python console and use it:

>>> import INIImport
>>> INIImport.init()
>>> from config import foo
>>> foo.cat
{'sound': 'meow'}
>>> foo.dog['sound']
'woof'

This has the same semantics as a normal Python module, so it can be reloaded or aliased just like any other module:

>>> import config.foo
>>> foo == config.foo
True
>>> reload(foo)
<module 'config.foo' from 'foo.ini'>

The ability to reload this module is particularly handy, because my normal configuration module approach doesn't provide an easy way to reload the file.

Improvements, Limitations

Some additional improvements come to mind if I were to release this experiment as production-quality code. Notably, additional path handling for finding .ini files would be useful, such as a path argument to init() supplying a set of directories to search. A way to remove the import hook it installs would also be good, and straightforward to implement. There's no way to get all the sections in the module, so it would also be useful to export the sections somehow, perhaps by having the module support the mapping protocol (so all the sections could be retrieved with module.items(), for example).

The main limitation of this scheme is that it has no way to determine the desired type of loaded configuration values, so everything is a string. This is a typical limitation when using the ConfigParser module, but compared to a more robust configuration scheme such as defining objects in a Python file (as Django does), this might be an annoying loss of expressiveness. The values can always be coerced to the required type when retrieving them, but that's a bit of unnecessary extra code in whatever uses the configuration.

It may also be useful to provide a way to write configuration back to a file when modifying a config module, but my simplistic implementation makes no attempt at such. Doing so would not be terribly difficult, just involving some wrapper objects to handle member assignment for sections and items, then providing a mechanism for saving the current values back to the original file.

Postlude

This made for an interesting experiment, and it should be a handy example for how to implement import hooks in Python. You may use this code freely within your own work, but I'd appreciate if you leave a note here that it was useful, and link back to this post.