Tag Archives: Software

Web history archival and WARC management

I’ve been a sort of ‘rogue archivist’ along the lines of the Archive Team for some time, but generally lack the combination of motivation and free time to directly take part in their activities. That said, I do sometimes go on bursts of archival since these things do concern me; it’s just a question of when I’ll get manic enough to be useful and latch onto an archival task as the one to do. An earlier public example is when I mirrored ticalc.org.


The historical record contains plenty of instances where people maintained copies of their communications or other documentation which has proven useful to study, and in the digital world the same is likely to be true. With the ability to cheaply store large amounts of data, it is also relatively easy to generate collections in the hope of their future utility.

Something I first played with back in 2014 was extracting lists of web pages to archive from web browser history. From a public perspective this may not be particularly interesting, but if maintained over a period of time this data could be interesting as a snapshot of a typical-in-some-fashion individual’s daily life, or for purposes I can’t foresee.

Today I’m going to write a little about how I collect this data and reduce the space requirements. The products of this work that are source code can be found on Bitbucket.

Continue reading Web history archival and WARC management

HodorCSE

Localization of software, while not trivial, is not a particularly novel problem. Where it gets more interesting is in resource-constrained systems, where your ability to display strings is limited by display resolution and memory limitations may make it difficult to include multiple localized copies of any given string in a single binary. All of this is then on top of the usual (admittedly slight in well-designed systems) difficulty in selecting a language at runtime and maintaining reasonably readable code.

This all comes to mind following discussion of providing translations of Doors CSE, a piece of software for the TI-84+ Color Silver Edition1 that falls squarely into the “embedded software” category. The simple approach (and the one taken in previous versions of Doors CS) to localizing it is just replacing the hard-coded strings and rebuilding.

As something of a joke, it was proposed to make additional “joke” translations, for languages such as Klingon or pirate. I proposed a Hodor translation, along the lines of the Hodor UI patch2 for Android. After making that suggestion, I decided to exercise my skills a bit and actually make one.

Hodor (Implementation)3

Since I don’t have access to the source code of Doors CSE, I had to modify the binary to rewrite the strings. Referring the to file format guide, we are aware that TI-8x applications are mostly Intel hex, with a short header. Additionally, I know that these applications are cryptographically signed which implies I will need to resign the application when I have made my changes.

Dumping contents

I installed the IntelHex module in a Python virtualenv to process the file into a format easier to modify, though I ended up not needing much capability from there. I simply used a hex editor to remove the header from the 8ck file (the first 0x4D bytes).

Simply trying to convert the 8ck payload to binary without further processing doesn’t work in this case, because Doors CSE is a multipage application. On these calculators Flash applications are split into 16-kilobyte pages which get swapped into the memory bank at 0x4000. Thus the logical address of the beginning of each page is 0x4000, and programs that are not aware of the special delimiters used in the TI format (to delimit pages) handle this poorly. The raw hex file (after removing the 8ck header) looks like this:

:020000020000FC
:20400000800F00007B578012010F8021088031018048446F6F727343534580908081020382
:2040200022090002008070C39D40C39A6DC3236FC30E70C3106EC3CA7DC3FD7DC3677EC370
:20404000A97EC3FF7EC35D40C35D40C33D78C34E78C36A78C37778C35D40C3A851C9C940F3
:2040600001634001067001C36D00CA7D00BC6E00024900097A00E17200487500985800BDF8
[snip]
:020000020001FB
:204000003A9987B7CA1940FE01CA3440FE02CAFB40FE03CA0541C3027221415D7E23666FAD
:20402000EF7D4721B98411AE84010900EDB0EFAA4AC302723A9B87B7CA4940FE01CA4340B9
:20404000C30272CD4F40C30272CDB540C30272EF67452100002275FE3EA03273FECD63405B

Lines 1 and 7 here are the TI-specific page markers, indicating the beginning of pages 0 and 1, respectively. The lines following each of those contain 32 (20 hex) bytes of data starting at address 0x40000 (4000). I extracted the data from each page out to its own file with a text editor, minus the page delimiter. From there, I was able to use the hex2bin.py script provided with the IntelHex module to create two binary files, one for each page.

Modifying strings

With two binary files, I was ready to modify some strings. The calculator’s character set mostly coincides with ASCII, so I used the strings program packaged with GNU binutils to examine the strings in the image.

$ strings page00.bin
HDoorsCSE
##6M#60>
oJ:T
        Uo&
dQ:T
[snip]
xImprove BASIC editor
Display clock
Enable lowercase
Always launch Doors CSE
Launch Doors CSE with
PRGM]

With some knowledge of the strings in there, it was reasonably short work to find them with a hex editor (in this case I used HxD) and replace them with variants on the string “Hodor”.

hodor-page00[1]
HxD helpfully highlights modified bytes in red.

I also found that page 1 of the application contains no meaningful strings, so I ended up only needing to examine page 0. Some of the reported strings require care in modification, because they refer to system-invariant strings. For example, “OFFSCRPT” appears in there, which I know from experience is the magic name which may be given to an AppVar to make the calculator execute its contents when turned off. Thus I did not modify that string, in addition to a few others (names of authors, URLs, etc).

Repacking

I ran bin2hex.py to convert the modified page 0 binary back into hex, and pasted the contents of that file back into the whole-app hex file (replacing the original contents of page 0). From there, I had to re-sign the binary.4 WikiTI points out how easy that process is, so I installed rabbitsign and went on my merry way:

$ rabbitsign -g -r -o HodorCSE.8ck HodorCSE.hex

Testing

I loaded the app up in an emulator to give it a quick test, and was met by complete nonsense, as intended.

hodor-jstified

I’m providing the final modified 8ck here, for the amusement of my readers. I don’t suggest that anybody use it seriously, not for the least reason that I didn’t test it at all thoroughly to be sure I didn’t inadvertently break something.

Extending the concept

It’s relatively easy to extend this concept to the calculator’s OS as well (and in fact similar string replacements have been done before) with the OS signing keys in hand. I lack the inclination to do so, but surely somebody else would be able to do something fun with it using the process I outlined here.

 


  1. That name sounds stupider every time I write it out. Henceforth, it’s just “the CSE.” 
  2. The programmer of that one took is surprisingly far, such that all of the code that feasibly can be is also Hodor-filled
  3. Hodor hodor hodor hodor. Hodor hodor hodor. 
  4. This signature doesn’t identify the author, as you might assume. Once upon a time TI provided the ability for application authors to pay some amount of money to get a signing key associated with them personally, but that system never saw wide use. Nowadays everybody signs their applications with the public “freeware” keys, just because the calculator requires that all apps be signed and the public keys must be stored on the calculator (of which the freeware keys are preinstalled on all of them). 

rtorrent scripting considered harmful

As best I can tell, whomever designed the scripting system for rtorrent did so in a manner contrived to make it as hard to use as possible.  It seems that = is the function application operator, and precedence is stated by using a few levels of distinct escaping. For example:

# Define a method 'tnadm_complete', which executes 'baz' if both 'foo' and 'bar' return true.
system.method.insert=tnadm_complete,simple,branch={and="foo=,bar=",baz=}

With somewhat more sane design, it might look more like this:

system.method.insert(tnadm_complete, simple, branch(and(foo(),bar()),baz()))

That still doesn’t help the data-type ambiguity problems (‘tnadm_complete’ is a string here, but not obviously so), but it’s a bit better in readability. I haven’t tested whether the escaping with {} can be nested, but I’m not confident that it can.

In any case, that’s just a short rant since I just spent about two hours wrapping my brain around it. Hopefully that work turns into some progress on a new project concept, otherwise it was mostly a waste. As far as the divergence meter goes, I’m currently debugging a lack of communication between my in-circuit programmer and the microcontroller.

Incidentally, the rtorrent community wiki is a rather incomplete but still useful reference for this sort of thing, while gi-torrent provides a reasonably-organized overview of the XMLRPC methods available (which appear to be what the scripting exposes), and the Arch wiki has a few interesting examples.

A few small projects

Going through some of my old projects this evening, I came across a couple little tools I wrote.  I’ve uploaded them here in the hope that others will find them useful.  They are the GCNClient GUI and RX BRR calculator.

I make no guarantees of the utility of these pieces of software, but they may be useful as examples in how to perform some task in the .NET framework (both are written in C# for .NET), or just for performing the very specific tasks which they are designed to perform.

Some Chronos Documentation

Moving on from my previous post (in which I muttered sullenly about brain-dead packaging of software for Linux), I began hacking on my Chronos proper tonight. Read on for some juicy tidbits.

Initial build

The first order of business was to set up a toolchain targeting MSP430. Since I’m running Arch on my primary development system, it was a simple matter to build gcc-msp430 from the AUR.

With that, I was ready to try compiling things. I assumed (correctly) that the provided firmware would not build on GCC without modification, but a little googling pointed me to OpenChronos, which effectively takes the stock firmware, makes it build with any of several compilers (TI’s compiler included with CCS, IAR’s, and GCC). Come to think of it, LLVM has an experimental MSP430 backend that might be interesting to try out.

One git checkout and an invocation of make later, and I was staring at a screenful of errors. “How auspicious,” I thought. The first part of the fix was easy– I simply needed msp430-libc for some of the more specialized functions that don’t map well into straight C- things like interrupt handling (which is in msp430-libc’s signal.h for some reason) or machine-specific delays.

The remaining compilation errors after grabbing libc were rather more troublesome, however. There were two main classes of problem.

  • Uses of types at some specific bit-width (such as uint16_t). These were easily resolved by strategic inclusion of stdint.h, but I’m not very happy with how I had to do it. Spraying header inclusions all over the source code is a poor way to fix things.
  • Large delay constants. There were two cases in the radio control code which adjusted the microcontroller’s voltage regulator, which then requires a rather long delay before the system can be considered stable again. The solution in code is simply to delay for as many as ~800000 clock cycles. Normally that wouldn’t be a problem, but some of the delay constants were larger than the input type to the __delay_cycles function could hold. My hacky solution was to split those into two calls of half the length, which seemed to work out OK.

After a while to figure out the compilation problems, I was able to build a firmware image. After the struggles I had with unpacking TI’s sample code and demo applications, it was fortunately painless to actually run them. I just ensured I had Tcl/Tk installed and ran the Chronos Control Center application. Putting the Chronos itself into WBSL (Wireless BootStrap Loader) mode and clicking a few times was easy, and I quickly got my new firmware image flashed onto the CC430.

Preparing for mods

Now that I had a known-working toolchain, it was time to get to work actually implementing some of the toy features I wanted to add. Since the single most interesting feature of the hardware is the radio (although the low-power capabilities of the MSP430 are quite shiny as well), I set out to see how I could communicate with the watch from my PC.

One of the USB dongles that comes packaged with the Chronos is a USB wireless access point, basically just a CC1111 (6801 core with USB and RF transciever). I understand that earlier revisions of the demo applications didn’t include source code for the software running on the CC1111, but the current release includes it. Some people had taken a bit of trouble to reverse-engineer the communications, but that alone isn’t very useful documentation. With that in mind, I set out to document for myself how to communicate with the RF access point and go through that to talk to the Chronos.

Setting up communications is easy, fortunately. The CC1111 is programmed to enumerate as a USB CDC, so one must only open the virtual serial port it creates with a 115200 bps baud rate with 8 data bits and 1 stop bit. (If that’s not terse enough for you: 115200 baud, 8n1.)

With virtual serial communications up, the upper-level protocol is rather easy- it consists of packets of at least 3 bytes each, where the first one is always 0xFF. Byte 2 provides a command ID, and byte 3 specifies the total packet size, including the overhead (so the minimum valid size is 3). Anything more in the message is interpreted based on the command ID.

Command IDs

There are a number of command IDs defined, but only a few that are of particular interest. In the hopes that somebody else will find it useful, I include my raw notes on the command bytes below.

As a little bit of context, the system can run on either of two different radio protocols. TI’s SimpliciTI is a protocol designed mainly for communication between low-power nodes in a network, while BlueRobin is a radio protocol developed by IAR Systems, notable with the Chronos because it allows communication with a heart rate monitor developed by BMi GmbH.

Command bytes:
    BM_GET_PRODUCT_ID
        Dumps 32-bit product ID into the usb buffer
    BM_GET_STATUS
        returns system_status (some file-scope var?)
== bluerobin
    BM_RESET
        Turns off bluerobin
    BM_START_BLUEROBIN
        Start bluerobin (set a flag, actually), stop simpliciti if that's going
    BM_SET_BLUEROBIN_ID
    BM_GET_BLUEROBIN_ID
    BM_SET_HEARTRATE
    BM_SET_SPEED
== simpliciti RX
    BM_START_SIMPLICITI
        Start simpliciti, stop bluerobin if that's going
    BM_GET_SIMPLICITIDATA
        Dump the 4 bytes from the simpliciti_data buffer to USB
        also mark simpliciti data as read
        If no pending data, usb_buffer[PACKET_BYTE_FIRST_DATA] = 0xFF
    BM_SYNC_START
        nop
    BM_SYNC_SEND_COMMAND
        copy packet from USB to simpliciti buffer and flag for tx ready
    BM_SYNC_GET_BUFFER_STATUS
        1-byte payload packet out, = var simpliciti_sync_buffer_status
    BM_SYNC_READ_BUFFER
        copy simpliciti_data buffer out to USB
    BM_STOP_SIMPLICITI
        Flag to turn off simpliciti

== WBSL
    BM_START_WBSL
        flag to start WBSL, turn off bluerobin/simpliciti if active
    BM_STOP_WBSL
        stop wbsl, turn off LED
    BM_GET_WBSL_STATUS
        copy back var wbsl_status
    BM_GET_PACKET_STATUS_WBSL
        copy back var wbsl_packet_flag or WBSL_ERROR if wbsl is off
    BM_GET_MAX_PAYLOAD_WBSL
        copy back max number of bytes allowed in wbsl payload
    BM_SEND_DATA_WBSL
        set wbsl_packet_flag to WBSL_PROCESSING_PACKET
        deocode packet and spew it to the 430
== self-test
    BM_INIT_TEST
    BM_NEXT_TEST
    BM_GET_TEST_RESULT
    BM_WRITE_BYTE
        write a byte to the access point's Flash memory
        first data byte is the value to write
        second and third are address, little-endian (2 is lsb, 3 => msb)
        must be in test mode (precede this with BM_INIT_TEST)

(Command and system status constants are defined in BM_API.h, FWIW.)

Knowing all the commands, it’s pretty easy to pull out the useful ones. BM_START_SIMPLICITI makes the access point switch into SimpliciTI mode, and sending BM_SYNC_START allows direct communication through the radio link with BM_SYNC_{SEND,READ}_BUFFER functions.

More.. later

This is as far as I’m going to go with this adventure for today, but there’s more to come in the coming days (hopefully, assuming my motivation holds out). This is just preliminary documentation– I’m hoping to create a more formal set of documents providing a whirlwind overview of how to get hacking on the Chronos, but I feel this is an excellent start.

How not to distribute software

I recently acquired a TI eZ430-Chronos watch/development platform. It’s a pretty fancy piece of kit just running the stock firmware, but I got it with hacking in mind, so of course that’s what I set out to do. Little did I know that TI’s packaging of some of the related tools is a good lesson in what not to do when packaging software for users of any system that isn’t Windows..

The first thing to do when working with a new platform is usually to try out the sample applications, and indeed in this case I did exactly that. TI helpfully provide a distribution of the PC-side software for communicating with the Chronos that runs on Linux, but things cannot be that easy. What follows is a loose transcript of my session to get slac388a unpacked so I could look at the provided code.

$ unzip slac388a.zip
$ ls
Chronos-Setup
$ chmod +x Chronos-Setup
$ ./Chronos-Setup
$

Oh, it did nothing. Maybe it segfaulted silently because it’s poorly written?

$ dmesg | tail
[snip]
[2591.111811] [drm] force priority to high
[2591.111811] [drm] force priority to high
$ file Chronos-Setup
Chronos-Setup: ELF 32-bit LSB executable, Intel 80386, version 1 (GNU/Linux), statically linked, stripped
$ gdb Chronos-Setup
GNU gdb (GDB) 7.3
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/tari/workspace/chronos-tests/Chronos-Setup...
warning: no loadable sections found in added symbol-file /home/tari/workspace/chronos-tests/Chronos-Setup
(no debugging symbols found)...done.
(gdb) r
Starting program: /home/tari/workspace/chronos-tests/Chronos-Setup
[Inferior 1 (process 9214) exited with code 0177]

Great. It runs and exits with code 127. How useful.</sarcasm>

A Windows-style installer, "InstallJammer Wizard". On Linux.
This is stupid.

I moved the program over to a 32-bit system, and of course it worked fine, although that revealed a stunningly brain-dead design decision. The image (to the right) says everything.

To recap, this was a Windows-style self-extracting installer packed in a zip archive upon initial download, designed to run on a 32-bit Linux system, which failed silently when run on a 64-bit system. I am simply stunned by the bad design.

Bonus tidbit: it unpacked an uninstaller in the directory of source code and compiled demo applications, as if whoever packaged it decided the users (remember, this is an embedded development demo board so it’s logical to assume the users are fairly tech-savvy) were too clueless to delete a single directory when the contents were no longer wanted. I think the only possible reaction is a hearty :facepalm:.

mkg3a

Casio’s FX-CG, or Prizm, is a rather interesting device, and the programmers over on Cemetech seem to have found it worthwhile to make the Prizm do their bidding in software.

The Prizm device itself is based around some sort of SuperH core, identified at times in the system software as a SH7305 a “SH7780 or thereabouts”. The 7780 is not an exact device, though, and it’s likely a licensed SH4 core in a Casio ASIC. Whatever the case, GCC targeted for sh and compiling without the FPU (-m4a-nofpu) and in big-endian mode (-mb) seems to work on the hardware provided.

Between Jonimus and myself (with input from other users on what configurations will work), we’ve assembled a GCC-based toolchain targeting the Prizm. Jon put together a cross-compiler for sh with some supporting scripts, while I contributed a linker script and runtime initialization routine (crt0), both of which were adapted from Kristaba’s work.

With that, we can build binaries targetting sh and linked such that they’ll run on the Prizm, but that alone isn’t very useful. Jon also created libfxcg, a library providing access to the syscalls on the platform. Finally, I created mkg3a, a tool to pack the raw binaries output by the linker into the g3a files accepted by the device.

Rumor has it the whole set of tools works. I haven’t been able to verify that myself since I don’t have a Prizm of my own, but it’s all out there. Tarballs of the whole package are over on Jon’s site, for anyone interested.

Pointless Linux Hacks

I nearly always find it interesting to muck about in someone else’s code, often to add simple features or to make it do something silly, and the Linux kernel is no exception to that. What follows is my own first adventure into patching Linux to do my evil bidding.

Aside from mucking about in code for fun, digging through public source code such as that provided by Linux can be very useful when developing something new.

A short story

I was doing nothing of particular importance yesterday afternoon when I was booting up my previously mentioned netbook. The machine usually runs on a straight framebuffer powered by KMS on i915 hardware, and my kernel is configured to show the famous Tux logo while booting.

Readers familiar with the logo behaviour might already see where I’m going with this, but the kernel typically displays one copy of the logo for each processor in the system (so a uniprocessor machine shows one tux, a quad-core shows four, etc..). As a bit of a joke, then, suggested a friend, why not patch my kernel to make it look like a much more powerful machine than it really is? Of course, that’s exactly what I did, and here’s the patch for Linux 2.6.38.

--- drivers/video/fbmem.c.orig	2011-04-14 07:26:34.865849376 -0400
+++ drivers/video/fbmem.c	2011-04-13 13:06:28.706011678 -0400
@@ -635,7 +635,7 @@
 	int y;

 	y = fb_show_logo_line(info, rotate, fb_logo.logo, 0,
-			      num_online_cpus());
+			      4 * num_online_cpus());
 	y = fb_show_extra_logos(info, y, rotate);

 	return y;

Quite simply, my netbook now pretends to have an eight-core processor (the Atom with SMT reports two logical cores) as far as the visual indications go while booting up.

Source-diving

Thus we come to source-diving, a term I’ve borrowed from the community of Nethack players to describe the process of searching for the location of a particular piece of code in some larger project.

Diving in someone else’s source is frequently useful, although I don’t have any specific examples of it in my own work at the moment. For an outside example, have a look at musca, which is a tiling window manager for X which was written from scratch but used ratpoison and dwm (two other X window managers) as models:

Musca’s code is actually written from scratch, but a lot of useful stuff was gleaned from reading the source code of those two excellent projects.

A personal recommendation for anyone seeking to go source-diving: become good friends with grep. In the case of my patch above, the process went something like this:

  • grep -R LOGO_LINUX linux-2.6.38/ to find all references to LOGO_LINUX in the source tree.
  • Examine the related files, find drivers/video/fbmem.c, which contains the logo display code.
  • Find the part which controls the number of logos to display by searching that file for ‘cpu’, assuming (correctly) that it must call some outside function to get the number of CPUs active in the system.
  • Patch line 638 (for great justice).

Next up in my source-diving adventures will be finding the code which controls what happens when the user presses control+alt+delete, in anticipation of sometime rewriting fb-hitler into a standalone kernel rather than a program running on top of Linux..

Of Links and Kana

I sometimes use Links on various computers when I can’t be bothered to deal with a full graphical environment and just want to look something up. Given I also try to ensure that this site renders in an acceptable manner in text-only mode, Links is indispensable at times.

Now imagine my surprise when I discovered that Links will try to transliterate Japanese kana (a general term for the scripts in which characters correspond to syllables, rather than more abstract ideas such as in kanji) to some extent.

Links romanizing some kana on this page
See page title at center-top.

In that shot, Links has translated the kana in my page’s header to a reasonable romanization- the pronounciation of those characters would be Tari, as in the beginnings of ‘tan’ and ‘return’. I don’t know if that was a recent feature (I’m currently running Links 2.3pre1), but it was a pleasant surprise to see it romanizing kana.

Obfuscation for Fun and Profit

One of the fun things to do with computer languages is abuse them. Confusing human readers of code can be pretty easy, but it takes a specially crafted program to be thoroughly incomprehensible to readers of the source code yet still be legal within the syntax of whatever language the program is written in.

Not dissimilar from building a well-obfuscated program is using esoteric languages and building quines. All of these things can be mind-bending but also provide excellent learning resources for some dark corners of language specification, as well as the occasional clever optimization.

Obfuscation

It’s not uncommon for malware source code to be pretty heavily obfuscated, but that’s nothing compared to properly obfuscated code. What follows is some publically-released Linux exploit code.

ver = wtfyourunhere_heee(krelease, kversion);
if(ver < 0)
    __yyy_tegdtfsrer("!!!  Un4bl3 t0 g3t r3l3as3 wh4t th3 fuq!n");
__gggdfstsgdt_dddex("$$$ K3rn3l r3l3as3: %sn", krelease);
if(argc != 1) {
   while( (ret = getopt(argc, argv, "siflc:k:o:")) > 0) {
      switch(ret) {
          case 'i':
              flags |= KERN_DIS_GGDHHDYQEEWR4432PPOI_LSM|KERN_DIS_DGDGHHYTTFSR34353_FOPS;
              useidt=1; // u have to use -i to force IDT Vector
              break;
          case 'f':
              flags |= KERN_DIS_GGDHHDYQEEWR4432PPOI_LSM|KERN_DIS_GGDYYTDFFACVFD_IDT;
              break;

It reads like gibberish, but examination of the numerous #define statements at beginning of that file and some find/replace action make quick work to deobfuscate the source. Beyond that, the sheer pointlessness of ‘1337 5p33k’ in status messages makes my respect for the author plummet, no matter how skilled they may be at creating exploits.

Let’s now consider an entry to the International Obfuscated C Code Contest (IOCCC) from 1986, submitted by Jim Hague:

#define    DIT (
#define DAH )
#define __DAH   ++
#define DITDAH  *
#define DAHDIT  for
#define DIT_DAH malloc
#define DAH_DIT gets
#define _DAHDIT char
_DAHDIT _DAH_[]="ETIANMSURWDKGOHVFaLaPJBXCYZQb54a3d2f16g7c8a90l?e'b.s;i,d:"
;main           DIT         DAH{_DAHDIT
DITDAH          _DIT,DITDAH     DAH_,DITDAH DIT_,
DITDAH          _DIT_,DITDAH        DIT_DAH DIT
DAH,DITDAH      DAH_DIT DIT     DAH;DAHDIT
DIT _DIT=DIT_DAH    DIT 81          DAH,DIT_=_DIT
__DAH;_DIT==DAH_DIT DIT _DIT        DAH;__DIT
DIT'n'DAH DAH      DAHDIT DIT      DAH_=_DIT;DITDAH
DAH_;__DIT      DIT         DITDAH
_DIT_?_DAH DIT      DITDAH          DIT_ DAH:'?'DAH,__DIT
DIT' 'DAH,DAH_ __DAH    DAH DAHDIT      DIT
DITDAH          DIT_=2,_DIT_=_DAH_; DITDAH _DIT_&&DIT
DITDAH _DIT_!=DIT   DITDAH DAH_>='a'?   DITDAH
DAH_&223:DITDAH     DAH_ DAH DAH;       DIT
DITDAH          DIT_ DAH __DAH,_DIT_    __DAH DAH
DITDAH DIT_+=       DIT DITDAH _DIT_>='a'?  DITDAH _DIT_-'a':0
DAH;}_DAH DIT DIT_  DAH{            __DIT DIT
DIT_>3?_DAH     DIT          DIT_>>1 DAH:''DAH;return
DIT_&1?'-':'.';}__DIT DIT           DIT_ DAH _DAHDIT
DIT_;{DIT void DAH write DIT            1,&DIT_,1 DAH;}

What does it do? I couldn’t say without spending a while examining the code. Between clever abuse of the C preprocessor to redefine important language constructs and use of only a few language elements, it’s very difficult to decipher that program. According to the author’s comments, it seems to convert ASCII text on standard input to Morse code.

Aside from (ab)using the preprocessor extensively, IOCCC entries frequently use heavily optimized algorithms which do clever manipulation of data in only a few statements. For a good waste of time, I suggest browsing the list of IOCCC winners. At the least, C experts can work through some pretty good brain teasers, and C learners might pick up some interesting tricks or learn something new while puzzling through the code.

So what? Obfuscating code intentionally is fun and makes for an interesting exercise.

Quines

Another interesting sort of program is a quine- a program that prints its own source code when run. Wikipedia has plenty of information on quines as well as a good breakdown on how to create one. My point in discussing quines, however, is simply to point out a fun abuse of the quine ‘rules’, as it were. Consider the following:

#!/bin/cat

On a UNIX or UNIX-like system, that single line is a quine, because it’s abusing the shebang. The shebang (‘#!’), when used in a plain-text file, indicates to the kernel when loading a file with intent to run it that the file is not itself executable, but should be interpreted.

The system then invokes the program given on the shebang line (in this case /bin/cat) and gives the name of the original file as an argument. Effectively, this makes the system do the following, assuming that line is in the file quine.sh:

$ /bin/cat quine.sh

As most UNIX users will know, cat takes all inputs and writes them back to output, and is useful for combining multiple files (invocation like cat file1 file2 > both) or just viewing the contents of a file as plain text on the terminal. Final result: cat prints the contents of quine.sh.

Is that an abuse of the quine rules? Possibly. Good for learning more about system internals? Most definitely.

Esoteric Languages

Finally in our consideration of mind-bending ways to (ab)use computer languages, we come to the general topic of esoteric languages. Put concisely, an esoteric language is one intended to be difficult to use or just be unusual in some way. Probably the most well-known one is brainfuck, which is.. aptly named, being Turing-complete but also nearly impossible to create anything useful with.

The Esoteric language site has a variety of such languages listed, few of which are of much use. However, the mostly arbitrary limitations imposed on programmers in such languages can make for very good logic puzzles and often require use of rarely-seen tricks to get anything useful done.

One of my personal favorites is Petrovich. More of a command interpreter than programming language, Petrovich does whatever it wants and must be trained to do the desired operations.