Category Archives: Software

Back to WordPress

After about a year of running a purely static site here, I finally decided it would be worthwhile to move the site backend back to WordPress.

I moved away from WordPress early this year primarily because I was dissatisfied with the theming situation.  While lightword is certainly a well-designed piece of software and markup, I wanted a system that would be easier to customize.  Because WordPress is written and configured in PHP (a language I don’t know and have little interest in learning), I decided it didn’t offer the easy customizability that I wanted in a web publishing platform, and made the switch to generating the site as a set of static pages with hyde.  I’ve now decided to make the switch back to WordPress, and the rest of this post outlines my thought process in doing so.

Archiving

One of the things that I am most concerned about in life is the preservation of information.  To me, destruction of information, no matter the content, is a deeply regrettable action.  Deliberate destruction of data is fortunately rare, but data is still lost too often, usually through simple neglect.  For example, [science fiction author] Charlie Stross, in a recent discussion, noted that the web site belonging to Robert Bradbury became inaccessible at some point after his death, and Mr. Stross was thus unable to find Bradbury’s original article on the subject of Matrioshka brains.

That comment led me to realize quickly that this great distributed repository of our age (the world wide web) is a frighteningly ethereal thing: what exists on a server at one moment may disappear without warning for reasons ranging from legal intervention (perhaps because some party asserts the information is illegal to distribute) to the death of the author (in which case one’s web hosting may be suspended due to unpaid bills).  Whatever the reason, it is impossible to guarantee that some piece of data will not be lost forever if the author’s copy disappears.

How can we preserve information on the web?  Historically, libraries have filled that role, and in that respect, things haven’t changed that much in the Internet age.  The Internet Archive is a nonprofit organization that works to be like a digital library, and they specifically note that huge swaths of cultural (and other) data might be lost to the depths of time if we do not take steps to preserve it now.  The Internet Archive’s wayback machine (which will probably be familiar to many readers who have needed to track down no-longer-online data) is a continually-updated archive of snapshots of the web.

The crawlers are fairly slow, but most pages are eventually found by the wayback machine, so the challenge of data preservation is greatly reduced for site owners: in most cases content only needs to be online for a short time (probably less than a year) before it is permanently archived.  Unfortunately, the wayback machine is useless for non-textual content, since it will only mirror web pages, and not images or other non-textual content.  The solution for preserving non-textual content is also rather easy, though: upload it to the Internet Archive.  It’s not automatic like the wayback machine, but the end result is the same.

Back to WordPress

This brings me back to my choice of using WordPress to host this web site, rather than a solution that I develop and maintain.  Quite simply, I decided that it is more important to get information I produce out in public so it can be disseminated and archived, rather than maintain fine-grained control over the presentation of the information.

While with Hyde I was able to easily control every aspect of the site design and layout, it also meant that I had to write much of the software to drive any additional features that might improve searchability or structure of the content.  When working with WordPress (or any out-of-the-box CMS really), however, I can concern myself with the things that are of real importance (the data), and let the presentation mostly take care of itself.

While Hyde put up barriers to disseminating information (the source being decoupled from presentation and requiring offline editing, for example), my new-old out-of-the-box CMS solution in WordPress makes it extremely easy to publicize information without getting tied up in details which are ultimately irrelevant.

Filtering oneself

With ease of putting information out in public comes the challenge of searching it.  I try to be selective about what I make public, partially because I tend to be somewhat introverted, but also in order to ensure that the information I generate and publicize is that which is of interest to people in the future (although it seems I was only doing the latter subconsciously prior to now).  There are platforms to fill with drivel and day-to-day artifacts of life, but a site like this is not one of them.  Twitter, Facebook, and numerous other ‘social’ web sites fill that niche admirably, but can never replace more carefully curated collections.

Ephemera

Preservation of ephemera is at the core of some of the large privacy concerns in today’s world.  Companies such as Facebook host huge amounts of arguably irrelevant content generated by their users, and mine the data to generate profiles for their users.  On its surface, this is an amazing piece of work, because these companies have effectively constructed automated systems to document the lives of everybody currently alive.  Let that sink in for a moment: Facebook is capable of generating a moderately detailed biography for each of this planet’s 7 billion people (provided they each were to provide Facebook with some basic data).

What would you do with a biography of someone distilled from advertising data (advertising data because that’s what Facebook exists to do- sell information about what you might like to buy to advertisers)?  I don’t know, but the future has a way of finding interesting ways to use existing data.  In some distant future, maybe a project might seek to reconstruct (even resurrect, by a way of thinking) everybody who ever lived.  There are innumerable possibilities for what might be done with the data (this goes for anything, not just biographical data like this), but it becomes impossible to use it if it gets destroyed.

The historical bane of all archives has been capacity.  With digital archives, this is a significantly smaller problem.  With multi-terabyte hard disks costing on the order of $0.10 per gigabyte and solid-state memory continuing to follow the pace of Moore’s law (although probably not for much longer), it is easier than ever to store huge amounts of information, dwarfing the largest collections of yesteryear.  As long as storage capacity continues to grow (we’ve only recently scratched the surface of using quantum phenomena (holography) for data storage, for example), the sheer amount of data generated by nearly any process is not a concern.

Back on topic

Returning from that digression, the point of switching this site back to a WordPress backend is to get data out to the public more reliably and faster, in order to preserve the information more permanently.  What finally pushed me back was a sudden realization that there’s nothing stopping me from customizing WordPress in a similar fashion to what I did on the Hyde-based site- it simply requires a bit of experience with the backend code.  While PHP is one language I tend to loathe, the immediate utility of a working system is more valuable than the potential utility of a system I need to program myself.

There’s another lesson I can derive from this experience, too: building a flexible system is good, but you should distribute it ready-to-go for a common use case.  Reducing the barrier to entry for a tool can make or break it, and tools that go unused are of no use.  Getting people using a new creation is the primary barrier to progress.

 

mkg3a

Casio’s FX-CG, or Prizm, is a rather interesting device, and the programmers over on Cemetech seem to have found it worthwhile to make the Prizm do their bidding in software.

The Prizm device itself is based around some sort of SuperH core, identified at times in the system software as an SH7305 or a “SH7780 or thereabouts”. The 7780 is not an exact match, though, and it’s likely a licensed SH4 core in a Casio ASIC. Whatever the case, GCC targeted for sh and compiling without the FPU (-m4a-nofpu) and in big-endian mode (-mb) seems to work on the hardware provided.

Between Jonimus and myself (with input from other users on what configurations will work), we’ve assembled a GCC-based toolchain targeting the Prizm. Jon put together a cross-compiler for sh with some supporting scripts, while I contributed a linker script and runtime initialization routine (crt0), both of which were adapted from Kristaba’s work.

With that, we can build binaries targeting sh and linked such that they’ll run on the Prizm, but that alone isn’t very useful. Jon also created libfxcg, a library providing access to the syscalls on the platform. Finally, I created mkg3a, a tool to pack the raw binaries output by the linker into the g3a files accepted by the device.

Rumor has it the whole set of tools works. I haven’t been able to verify that myself since I don’t have a Prizm of my own, but it’s all out there. Tarballs of the whole package are over on Jon’s site, for anyone interested.

Pointless Linux Hacks

I nearly always find it interesting to muck about in someone else’s code, often to add simple features or to make it do something silly, and the Linux kernel is no exception to that. What follows is my own first adventure into patching Linux to do my evil bidding.

Aside from mucking about in code for fun, digging through public source code such as that provided by Linux can be very useful when developing something new.

A short story

I was doing nothing of particular importance yesterday afternoon when I was booting up my previously mentioned netbook. The machine usually runs on a straight framebuffer powered by KMS on i915 hardware, and my kernel is configured to show the famous Tux logo while booting.

Readers familiar with the logo behaviour might already see where I’m going with this: the kernel typically displays one copy of the logo for each processor in the system (so a uniprocessor machine shows one Tux, a quad-core shows four, etc.). As a bit of a joke, a friend suggested, why not patch my kernel to make it look like a much more powerful machine than it really is? Of course, that’s exactly what I did, and here’s the patch for Linux 2.6.38.

--- drivers/video/fbmem.c.orig	2011-04-14 07:26:34.865849376 -0400
+++ drivers/video/fbmem.c	2011-04-13 13:06:28.706011678 -0400
@@ -635,7 +635,7 @@
 	int y;

 	y = fb_show_logo_line(info, rotate, fb_logo.logo, 0,
-			      num_online_cpus());
+			      4 * num_online_cpus());
 	y = fb_show_extra_logos(info, y, rotate);

 	return y;

Quite simply, my netbook now pretends to have an eight-core processor (the Atom with SMT reports two logical cores) as far as the visual indications go while booting up.

Source-diving

Thus we come to source-diving, a term I’ve borrowed from the community of Nethack players to describe the process of searching for the location of a particular piece of code in some larger project.

Diving in someone else’s source is frequently useful, although I don’t have any specific examples of it in my own work at the moment. For an outside example, have a look at musca, which is a tiling window manager for X which was written from scratch but used ratpoison and dwm (two other X window managers) as models:

Musca’s code is actually written from scratch, but a lot of useful stuff was gleaned from reading the source code of those two excellent projects.

A personal recommendation for anyone seeking to go source-diving: become good friends with grep. In the case of my patch above, the process went something like this:

  • grep -R LOGO_LINUX linux-2.6.38/ to find all references to LOGO_LINUX in the source tree.
  • Examine the related files, find drivers/video/fbmem.c, which contains the logo display code.
  • Find the part which controls the number of logos to display by searching that file for ‘cpu’, assuming (correctly) that it must call some outside function to get the number of CPUs active in the system.
  • Patch line 638 (for great justice).
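The first step above can be sketched in Python for readers without grep at hand. This is a rough stand-in for `grep -R`, not a replacement: it only does plain substring matching, and the helper names `grep_recursive` and `files_mentioning` are made up for this illustration. The `linux-2.6.38` path in the usage lines assumes an unpacked kernel tree in the current directory, as in the steps above.

```python
import os


def grep_recursive(root, needle):
    """Yield (path, line_number, line) for each line under root containing needle."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                # errors="replace" keeps binary or oddly encoded files
                # from aborting the scan with a decoding error.
                with open(path, encoding="utf-8", errors="replace") as handle:
                    for number, line in enumerate(handle, start=1):
                        if needle in line:
                            yield path, number, line.rstrip("\n")
            except OSError:
                continue  # unreadable file (permissions, broken symlink): skip it


def files_mentioning(root, needle):
    """Return the sorted list of files with at least one hit (like grep -Rl)."""
    return sorted({path for path, _number, _line in grep_recursive(root, needle)})


if __name__ == "__main__":
    # Assumption: a kernel source tree has been unpacked at ./linux-2.6.38
    for path, number, line in grep_recursive("linux-2.6.38", "LOGO_LINUX"):
        print(f"{path}:{number}:{line}")
```

Real grep is far faster and understands regular expressions, so this is only meant to show how little magic is involved in the search itself.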

Next up in my source-diving adventures will be finding the code which controls what happens when the user presses control+alt+delete, in anticipation of sometime rewriting fb-hitler into a standalone kernel rather than a program running on top of Linux.

Of Links and Kana

I sometimes use Links on various computers when I can’t be bothered to deal with a full graphical environment and just want to look something up. Given I also try to ensure that this site renders in an acceptable manner in text-only mode, Links is indispensable at times.

Now imagine my surprise when I discovered that Links will try to transliterate Japanese kana (a general term for the scripts in which characters correspond to syllables, rather than more abstract ideas such as in kanji) to some extent.

[Screenshot: Links romanizing some kana on this page. See page title at center-top.]

In that shot, Links has transliterated the kana in my page’s header to a reasonable romanization. The pronunciation of those characters would be Tari, as in the beginnings of ‘tan’ and ‘return’. I don’t know if that was a recent feature (I’m currently running Links 2.3pre1), but it was a pleasant surprise to see it romanizing kana.

Obfuscation for Fun and Profit

One of the fun things to do with computer languages is abuse them. Confusing human readers of code can be pretty easy, but it takes a specially crafted program to be thoroughly incomprehensible to readers of the source code yet still be legal within the syntax of whatever language the program is written in.

Not dissimilar from building a well-obfuscated program is using esoteric languages and building quines. All of these things can be mind-bending but also provide excellent learning resources for some dark corners of language specification, as well as the occasional clever optimization.

Obfuscation

It’s not uncommon for malware source code to be pretty heavily obfuscated, but that’s nothing compared to properly obfuscated code. What follows is some publicly released Linux exploit code.

ver = wtfyourunhere_heee(krelease, kversion);
if(ver < 0)
    __yyy_tegdtfsrer("!!!  Un4bl3 t0 g3t r3l3as3 wh4t th3 fuq!\n");
__gggdfstsgdt_dddex("$$$ K3rn3l r3l3as3: %s\n", krelease);
if(argc != 1) {
   while( (ret = getopt(argc, argv, "siflc:k:o:")) > 0) {
      switch(ret) {
          case 'i':
              flags |= KERN_DIS_GGDHHDYQEEWR4432PPOI_LSM|KERN_DIS_DGDGHHYTTFSR34353_FOPS;
              useidt=1; // u have to use -i to force IDT Vector
              break;
          case 'f':
              flags |= KERN_DIS_GGDHHDYQEEWR4432PPOI_LSM|KERN_DIS_GGDYYTDFFACVFD_IDT;
              break;

It reads like gibberish, but examination of the numerous #define statements at the beginning of that file and some find/replace action make quick work of deobfuscating the source. Beyond that, the sheer pointlessness of ‘1337 5p33k’ in status messages makes my respect for the author plummet, no matter how skilled they may be at creating exploits.
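The find/replace pass can be automated in a few lines. The sketch below is a minimal illustration under loud assumptions: the two #define lines in `SAMPLE` are invented for the example (the real file's definitions are not reproduced here, and `fatal` and `log_info` are hypothetical names), and it assumes each define has the form `#define OBFUSCATED readable`. If the real file maps names the other way around, the dictionary would need to be inverted.

```python
import re

# Matches simple object-like aliases: "#define NAME replacement"
DEFINE = re.compile(r"^\s*#\s*define\s+(\w+)\s+(\w+)\s*$")

# Hypothetical input: the #define lines are made up for illustration.
SAMPLE = """\
#define __yyy_tegdtfsrer fatal
#define __gggdfstsgdt_dddex log_info
ver = wtfyourunhere_heee(krelease, kversion);
if(ver < 0)
    __yyy_tegdtfsrer("!!!  Un4bl3 t0 g3t r3l3as3");
__gggdfstsgdt_dddex("$$$ K3rn3l r3l3as3: %s", krelease);
"""


def split_defines(source):
    """Separate alias definitions from the remaining source lines."""
    aliases, body = {}, []
    for line in source.splitlines():
        match = DEFINE.match(line)
        if match:
            aliases[match.group(1)] = match.group(2)
        else:
            body.append(line)
    return aliases, "\n".join(body)


def deobfuscate(source):
    """Replace every aliased identifier in the body with its readable name."""
    aliases, body = split_defines(source)
    if not aliases:
        return body
    # \b keeps the match on whole identifiers, so an alias that happens to be
    # a prefix of a longer name is left alone.
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, aliases)) + r")\b")
    return pattern.sub(lambda m: aliases[m.group(0)], body)


if __name__ == "__main__":
    print(deobfuscate(SAMPLE))
```

Names without a matching define, such as `wtfyourunhere_heee` in the sample, are left untouched; those still need a human to read the function and pick a sensible name.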

Let’s now consider an entry to the International Obfuscated C Code Contest (IOCCC) from 1986, submitted by Jim Hague:

#define    DIT (
#define DAH )
#define __DAH   ++
#define DITDAH  *
#define DAHDIT  for
#define DIT_DAH malloc
#define DAH_DIT gets
#define _DAHDIT char
_DAHDIT _DAH_[]="ETIANMSURWDKGOHVFaLaPJBXCYZQb54a3d2f16g7c8a90l?e'b.s;i,d:"
;main           DIT         DAH{_DAHDIT
DITDAH          _DIT,DITDAH     DAH_,DITDAH DIT_,
DITDAH          _DIT_,DITDAH        DIT_DAH DIT
DAH,DITDAH      DAH_DIT DIT     DAH;DAHDIT
DIT _DIT=DIT_DAH    DIT 81          DAH,DIT_=_DIT
__DAH;_DIT==DAH_DIT DIT _DIT        DAH;__DIT
DIT'\n'DAH DAH      DAHDIT DIT      DAH_=_DIT;DITDAH
DAH_;__DIT      DIT         DITDAH
_DIT_?_DAH DIT      DITDAH          DIT_ DAH:'?'DAH,__DIT
DIT' 'DAH,DAH_ __DAH    DAH DAHDIT      DIT
DITDAH          DIT_=2,_DIT_=_DAH_; DITDAH _DIT_&&DIT
DITDAH _DIT_!=DIT   DITDAH DAH_>='a'?   DITDAH
DAH_&223:DITDAH     DAH_ DAH DAH;       DIT
DITDAH          DIT_ DAH __DAH,_DIT_    __DAH DAH
DITDAH DIT_+=       DIT DITDAH _DIT_>='a'?  DITDAH _DIT_-'a':0
DAH;}_DAH DIT DIT_  DAH{            __DIT DIT
DIT_>3?_DAH     DIT          DIT_>>1 DAH:'\0'DAH;return
DIT_&1?'-':'.';}__DIT DIT           DIT_ DAH _DAHDIT
DIT_;{DIT void DAH write DIT            1,&DIT_,1 DAH;}

What does it do? I couldn’t say without spending a while examining the code. Between clever abuse of the C preprocessor to redefine important language constructs and use of only a few language elements, it’s very difficult to decipher that program. According to the author’s comments, it seems to convert ASCII text on standard input to Morse code.
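For contrast, here is what a deliberately boring version of the same job might look like. This is not a reconstruction of Hague's program or its data layout; it is a plain lookup-table sketch in Python, with the function name `to_morse` chosen for this example. The only detail borrowed from the entry is the `'?'` fallback for characters it does not know, which is visible in the source above. Treating a space in the input as just another unknown character is a simplification made here.

```python
# International Morse code for letters and digits.
MORSE = {
    "A": ".-",    "B": "-...",  "C": "-.-.",  "D": "-..",   "E": ".",
    "F": "..-.",  "G": "--.",   "H": "....",  "I": "..",    "J": ".---",
    "K": "-.-",   "L": ".-..",  "M": "--",    "N": "-.",    "O": "---",
    "P": ".--.",  "Q": "--.-",  "R": ".-.",   "S": "...",   "T": "-",
    "U": "..-",   "V": "...-",  "W": ".--",   "X": "-..-",  "Y": "-.--",
    "Z": "--..",
    "0": "-----", "1": ".----", "2": "..---", "3": "...--", "4": "....-",
    "5": ".....", "6": "-....", "7": "--...", "8": "---..", "9": "----.",
}


def to_morse(text):
    """Translate text to Morse, one space-separated code per input character.

    Letters are case-insensitive; anything outside the table becomes '?'.
    """
    return " ".join(MORSE.get(ch.upper(), "?") for ch in text)


if __name__ == "__main__":
    print(to_morse("sos"))
```

The readable version spends nearly all of its length on the table, which is a decent hint about where the obfuscated entry hides its cleverness: in how the table is packed.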

Aside from (ab)using the preprocessor extensively, IOCCC entries frequently use heavily optimized algorithms which do clever manipulation of data in only a few statements. For a good waste of time, I suggest browsing the list of IOCCC winners. At the least, C experts can work through some pretty good brain teasers, and C learners might pick up some interesting tricks or learn something new while puzzling through the code.

So what? Obfuscating code intentionally is fun and makes for an interesting exercise.

Quines

Another interesting sort of program is a quine, a program that prints its own source code when run. Wikipedia has plenty of information on quines as well as a good breakdown on how to create one. My point in discussing quines, however, is simply to point out a fun abuse of the quine ‘rules’, as it were. Consider the following:

#!/bin/cat

On a UNIX or UNIX-like system, that single line is a quine, because it’s abusing the shebang. The shebang (‘#!’), when used in a plain-text file, indicates to the kernel when loading a file with intent to run it that the file is not itself executable, but should be interpreted.

The system then invokes the program given on the shebang line (in this case /bin/cat) and gives the name of the original file as an argument. Effectively, this makes the system do the following, assuming that line is in the file quine.sh:

$ /bin/cat quine.sh

As most UNIX users will know, cat takes all inputs and writes them back to output, and is useful for combining multiple files (invocation like cat file1 file2 > both) or just viewing the contents of a file as plain text on the terminal. Final result: cat prints the contents of quine.sh.
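The rewriting step the kernel performs can be mimicked in user space, which makes the trick easy to poke at. The sketch below is an approximation, assuming Linux behaviour (the interpreter path, at most one optional argument, then the script path); other UNIX-like systems split the shebang line differently. The helper names `shebang_argv` and `run_like_kernel` are invented for this example.

```python
import subprocess


def shebang_argv(script_path):
    """Return the argv a Linux kernel would build for a '#!' script, or None."""
    with open(script_path, "rb") as handle:
        first_line = handle.readline()
    if not first_line.startswith(b"#!"):
        return None
    # Linux passes everything after the interpreter path as ONE optional
    # argument, so split at most once.
    parts = first_line[2:].decode().strip().split(None, 1)
    if not parts:
        return None
    return parts + [script_path]


def run_like_kernel(script_path):
    """Run the interpreter named on the shebang line and return its stdout."""
    argv = shebang_argv(script_path)
    if argv is None:
        raise ValueError("no shebang line in " + script_path)
    return subprocess.run(argv, capture_output=True, check=True).stdout


if __name__ == "__main__":
    # Write the one-line quine to a file and compare output with source.
    with open("quine.sh", "w") as handle:
        handle.write("#!/bin/cat\n")
    with open("quine.sh", "rb") as handle:
        source = handle.read()
    print(run_like_kernel("quine.sh") == source)
```

Because the helper launches /bin/cat directly, the file does not even need its executable bit set, which is a small reminder that the shebang is only consulted when the file itself is executed.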

Is that an abuse of the quine rules? Possibly. Good for learning more about system internals? Most definitely.

Esoteric Languages

Finally in our consideration of mind-bending ways to (ab)use computer languages, we come to the general topic of esoteric languages. Put concisely, an esoteric language is one intended to be difficult to use or just be unusual in some way. Probably the most well-known one is brainfuck, which is... aptly named, being Turing-complete but also nearly impossible to create anything useful with.
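To give a sense of how little there is to brainfuck, here is a minimal interpreter sketch in Python. It follows the commonly described conventions (eight commands, a tape of byte-sized cells, all other characters ignored), but the details are choices made for this example: a 30000-cell tape, cells that wrap at 256, and a zero stored when input runs out. Implementations differ on all three.

```python
def run_brainfuck(program, input_text=""):
    """Interpret a brainfuck program and return its output as a string."""
    # Pre-compute matching bracket positions so loops can jump directly.
    jumps, stack = {}, []
    for pos, op in enumerate(program):
        if op == "[":
            stack.append(pos)
        elif op == "]":
            if not stack:
                raise ValueError("unmatched ']'")
            start = stack.pop()
            jumps[start], jumps[pos] = pos, start
    if stack:
        raise ValueError("unmatched '['")

    tape = [0] * 30000          # assumption: fixed-size tape
    ptr = pc = 0
    pending = iter(input_text)
    out = []
    while pc < len(program):
        op = program[pc]
        if op == ">":
            ptr += 1
        elif op == "<":
            ptr -= 1
        elif op == "+":
            tape[ptr] = (tape[ptr] + 1) % 256
        elif op == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif op == ".":
            out.append(chr(tape[ptr]))
        elif op == ",":
            # assumption: end of input stores 0 in the current cell
            tape[ptr] = ord(next(pending, "\0")) % 256
        elif op == "[" and tape[ptr] == 0:
            pc = jumps[pc]      # skip the loop body
        elif op == "]" and tape[ptr] != 0:
            pc = jumps[pc]      # go back to the start of the loop
        pc += 1
    return "".join(out)


if __name__ == "__main__":
    # 8 * 8 + 1 = 65, the ASCII code for 'A'
    print(run_brainfuck("++++++++[>++++++++<-]>+."))
```

The interpreter is the easy part. Writing anything of substance in the language it interprets is where the name earns its keep.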

The Esoteric language site has a variety of such languages listed, few of which are of much use. However, the mostly arbitrary limitations imposed on programmers in such languages can make for very good logic puzzles and often require use of rarely-seen tricks to get anything useful done.

One of my personal favorites is Petrovich. More of a command interpreter than programming language, Petrovich does whatever it wants and must be trained to do the desired operations.