mkg3a

Casio’s FX-CG, or Prizm, is a rather interesting device, and the programmers over on Cemetech seem to have found it worthwhile to make the Prizm do their bidding in software.

The Prizm device itself is based around some sort of SuperH core, identified at times in the system software as an SH7305 or as a “SH7780 or thereabouts”. The 7780 is not an exact match, though, and it’s likely a licensed SH4 core in a Casio ASIC. Whatever the case, GCC targeting sh, compiling without the FPU (-m4a-nofpu) and in big-endian mode (-mb), seems to work on the hardware provided.

Between Jonimus and myself (with input from other users on what configurations will work), we’ve assembled a GCC-based toolchain targeting the Prizm. Jon put together a cross-compiler for sh with some supporting scripts, while I contributed a linker script and runtime initialization routine (crt0), both of which were adapted from Kristaba’s work.

With that, we can build binaries targeting sh and linked such that they’ll run on the Prizm, but that alone isn’t very useful. Jon also created libfxcg, a library providing access to the syscalls on the platform. Finally, I created mkg3a, a tool to pack the raw binaries output by the linker into the g3a files accepted by the device.

Rumor has it the whole set of tools works. I haven’t been able to verify that myself since I don’t have a Prizm of my own, but it’s all out there. Tarballs of the whole package are over on Jon’s site, for anyone interested.

Pointless Linux Hacks

I nearly always find it interesting to muck about in someone else’s code, often to add simple features or to make it do something silly, and the Linux kernel is no exception to that. What follows is my own first adventure into patching Linux to do my evil bidding.

Aside from mucking about in code for fun, digging through public source code such as that provided by Linux can be very useful when developing something new.

A short story

I was doing nothing of particular importance yesterday afternoon when I was booting up my previously mentioned netbook. The machine usually runs on a straight framebuffer powered by KMS on i915 hardware, and my kernel is configured to show the famous Tux logo while booting.

Readers familiar with the logo behaviour might already see where I’m going with this: the kernel typically displays one copy of the logo for each processor in the system (so a uniprocessor machine shows one Tux, a quad-core shows four, etc.). As a bit of a joke, a friend suggested, why not patch my kernel to make the machine look much more powerful than it really is? Of course, that’s exactly what I did, and here’s the patch for Linux 2.6.38.

--- drivers/video/fbmem.c.orig	2011-04-14 07:26:34.865849376 -0400
+++ drivers/video/fbmem.c	2011-04-13 13:06:28.706011678 -0400
@@ -635,7 +635,7 @@
 	int y;

 	y = fb_show_logo_line(info, rotate, fb_logo.logo, 0,
-			      num_online_cpus());
+			      4 * num_online_cpus());
 	y = fb_show_extra_logos(info, y, rotate);

 	return y;

Quite simply, my netbook now pretends to have an eight-core processor (the Atom with SMT reports two logical cores) as far as the visual indications go while booting up.

Source-diving

Thus we come to source-diving, a term I’ve borrowed from the community of Nethack players to describe the process of searching for the location of a particular piece of code in some larger project.

Diving into someone else’s source is frequently useful, although I don’t have any specific examples from my own work at the moment. For an outside example, have a look at musca, a tiling window manager for X that was written from scratch but used ratpoison and dwm (two other X window managers) as models:

Musca’s code is actually written from scratch, but a lot of useful stuff was gleaned from reading the source code of those two excellent projects.

A personal recommendation for anyone seeking to go source-diving: become good friends with grep. In the case of my patch above, the process went something like this:

  • grep -R LOGO_LINUX linux-2.6.38/ to find all references to LOGO_LINUX in the source tree.
  • Examine the related files, find drivers/video/fbmem.c, which contains the logo display code.
  • Find the part which controls the number of logos to display by searching that file for ‘cpu’, assuming (correctly) that it must call some outside function to get the number of CPUs active in the system.
  • Patch line 638 (for great justice).
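The first step above is just a recursive fixed-string search. As a rough sketch of the same idea in Python (the function name find_references is made up for illustration, and this is not how grep itself is implemented):

```python
import os


def find_references(root, needle):
    """Yield (path, line_number, line) for every line under root containing needle."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                # errors="replace" keeps binary or oddly-encoded files from aborting the walk
                with open(path, encoding="utf-8", errors="replace") as handle:
                    for number, line in enumerate(handle, start=1):
                        if needle in line:
                            yield path, number, line.rstrip("\n")
            except OSError:
                continue  # unreadable file: skip it and keep searching


if __name__ == "__main__":
    # Same search as the grep invocation in the list above
    for path, number, line in find_references("linux-2.6.38", "LOGO_LINUX"):
        print(f"{path}:{number}:{line}")
```

In practice grep is faster and already installed, so the sketch is mostly useful as a starting point when the search needs extra logic, such as filtering by file type.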

Next up in my source-diving adventures will be finding the code which controls what happens when the user presses control+alt+delete, in anticipation of sometime rewriting fb-hitler into a standalone kernel rather than a program running on top of Linux.

Of Links and Kana

I sometimes use Links on various computers when I can’t be bothered to deal with a full graphical environment and just want to look something up. Given I also try to ensure that this site renders in an acceptable manner in text-only mode, Links is indispensable at times.

Now imagine my surprise when I discovered that Links will try to transliterate Japanese kana (a general term for the scripts in which characters correspond to syllables, rather than more abstract ideas such as in kanji) to some extent.

Links romanizing some kana on this page
See page title at center-top.

In that shot, Links has transliterated the kana in my page’s header to a reasonable romanization- the pronunciation of those characters would be Tari, as in the beginnings of ‘tan’ and ‘return’. I don’t know if that is a recent feature (I’m currently running Links 2.3pre1), but it was a pleasant surprise to see it romanizing kana.
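How Links does this internally isn’t covered here, but conceptually a romanizer can be as simple as a lookup table. A minimal sketch in Python, assuming a table-driven approach and covering only the two syllables in question (the ROMAJI table and romanize function are illustrative names, not anything taken from Links):

```python
# Tiny illustrative table. A real transliterator needs the whole syllabary,
# plus rules for digraphs, long vowels and doubled consonants.
ROMAJI = {
    "た": "ta", "り": "ri",  # hiragana
    "タ": "ta", "リ": "ri",  # katakana
}


def romanize(text):
    """Replace each known kana with its romanization; pass anything else through."""
    return "".join(ROMAJI.get(char, char) for char in text)


print(romanize("たり"))  # tari
```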

Obfuscation for Fun and Profit

One of the fun things to do with computer languages is abuse them. Confusing human readers of code can be pretty easy, but it takes a specially crafted program to be thoroughly incomprehensible to readers of the source code yet still be legal within the syntax of whatever language the program is written in.

Not dissimilar from building a well-obfuscated program is using esoteric languages and building quines. All of these things can be mind-bending but also provide excellent learning resources for some dark corners of language specification, as well as the occasional clever optimization.

Obfuscation

It’s not uncommon for malware source code to be pretty heavily obfuscated, but that’s nothing compared to properly obfuscated code. What follows is some publicly released Linux exploit code.

ver = wtfyourunhere_heee(krelease, kversion);
if(ver < 0)
    __yyy_tegdtfsrer("!!!  Un4bl3 t0 g3t r3l3as3 wh4t th3 fuq!\n");
__gggdfstsgdt_dddex("$$$ K3rn3l r3l3as3: %s\n", krelease);
if(argc != 1) {
   while( (ret = getopt(argc, argv, "siflc:k:o:")) > 0) {
      switch(ret) {
          case 'i':
              flags |= KERN_DIS_GGDHHDYQEEWR4432PPOI_LSM|KERN_DIS_DGDGHHYTTFSR34353_FOPS;
              useidt=1; // u have to use -i to force IDT Vector
              break;
          case 'f':
              flags |= KERN_DIS_GGDHHDYQEEWR4432PPOI_LSM|KERN_DIS_GGDYYTDFFACVFD_IDT;
              break;

It reads like gibberish, but examining the numerous #define statements at the beginning of that file and applying some find/replace action makes quick work of deobfuscating the source. Beyond that, the sheer pointlessness of ‘1337 5p33k’ in status messages makes my respect for the author plummet, no matter how skilled they may be at creating exploits.
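The find/replace step is easy to automate. Here is a rough sketch in Python, assuming the obfuscation consists only of simple object-like #define lines. The two definitions in the sample are invented for illustration; the exploit’s real #define lines aren’t reproduced in this post.

```python
import re

# Matches "#define NAME replacement". Function-like macros such as
# "#define F(x) ..." deliberately fail to match and are left untouched.
DEFINE = re.compile(r"^\s*#\s*define\s+(\w+)\s+(.+?)\s*$")


def collect_defines(source):
    """Map each object-like macro name to its replacement text."""
    table = {}
    for line in source.splitlines():
        match = DEFINE.match(line)
        if match:
            table[match.group(1)] = match.group(2)
    return table


def expand(source, table):
    """Substitute whole-word macro names and drop the #define lines themselves."""
    if not table:
        return source
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, table)) + r")\b")
    kept = []
    for line in source.splitlines():
        if DEFINE.match(line):
            continue
        kept.append(pattern.sub(lambda m: table[m.group(1)], line))
    return "\n".join(kept)


# Hypothetical definitions, made up for this example
sample = """#define wtfyourunhere_heee get_kernel_version
#define __yyy_tegdtfsrer fatal
ver = wtfyourunhere_heee(krelease, kversion);
if(ver < 0)
    __yyy_tegdtfsrer("unable to get release");"""

print(expand(sample, collect_defines(sample)))
```

Unlike the real C preprocessor, this does a single pass, does not rescan its own output, and will happily substitute inside string literals, which is usually good enough for reading purposes.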

Let’s now consider an entry to the International Obfuscated C Code Contest (IOCCC) from 1986, submitted by Jim Hague:

#define    DIT (
#define DAH )
#define __DAH   ++
#define DITDAH  *
#define DAHDIT  for
#define DIT_DAH malloc
#define DAH_DIT gets
#define _DAHDIT char
_DAHDIT _DAH_[]="ETIANMSURWDKGOHVFaLaPJBXCYZQb54a3d2f16g7c8a90l?e'b.s;i,d:"
;main           DIT         DAH{_DAHDIT
DITDAH          _DIT,DITDAH     DAH_,DITDAH DIT_,
DITDAH          _DIT_,DITDAH        DIT_DAH DIT
DAH,DITDAH      DAH_DIT DIT     DAH;DAHDIT
DIT _DIT=DIT_DAH    DIT 81          DAH,DIT_=_DIT
__DAH;_DIT==DAH_DIT DIT _DIT        DAH;__DIT
DIT'\n'DAH DAH      DAHDIT DIT      DAH_=_DIT;DITDAH
DAH_;__DIT      DIT         DITDAH
_DIT_?_DAH DIT      DITDAH          DIT_ DAH:'?'DAH,__DIT
DIT' 'DAH,DAH_ __DAH    DAH DAHDIT      DIT
DITDAH          DIT_=2,_DIT_=_DAH_; DITDAH _DIT_&&DIT
DITDAH _DIT_!=DIT   DITDAH DAH_>='a'?   DITDAH
DAH_&223:DITDAH     DAH_ DAH DAH;       DIT
DITDAH          DIT_ DAH __DAH,_DIT_    __DAH DAH
DITDAH DIT_+=       DIT DITDAH _DIT_>='a'?  DITDAH _DIT_-'a':0
DAH;}_DAH DIT DIT_  DAH{            __DIT DIT
DIT_>3?_DAH     DIT          DIT_>>1 DAH:'\0'DAH;return
DIT_&1?'-':'.';}__DIT DIT           DIT_ DAH _DAHDIT
DIT_;{DIT void DAH write DIT            1,&DIT_,1 DAH;}

What does it do? I couldn’t say without spending a while examining the code. Between clever abuse of the C preprocessor to redefine important language constructs and use of only a few language elements, it’s very difficult to decipher that program. According to the author’s comments, it seems to convert ASCII text on standard input to Morse code.
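For contrast, here is what a deliberately unobfuscated text-to-Morse converter might look like. This is a plain table-driven sketch in Python, not a translation of Hague’s program. The ‘/’ word separator and the ‘?’ for unknown characters are choices made for this example.

```python
MORSE = {
    "A": ".-", "B": "-...", "C": "-.-.", "D": "-..", "E": ".", "F": "..-.",
    "G": "--.", "H": "....", "I": "..", "J": ".---", "K": "-.-", "L": ".-..",
    "M": "--", "N": "-.", "O": "---", "P": ".--.", "Q": "--.-", "R": ".-.",
    "S": "...", "T": "-", "U": "..-", "V": "...-", "W": ".--", "X": "-..-",
    "Y": "-.--", "Z": "--..",
    "0": "-----", "1": ".----", "2": "..---", "3": "...--", "4": "....-",
    "5": ".....", "6": "-....", "7": "--...", "8": "---..", "9": "----.",
}


def to_morse(text):
    """Encode text as Morse: spaces between letters, ' / ' between words."""
    words = text.upper().split()
    return " / ".join(
        " ".join(MORSE.get(char, "?") for char in word) for word in words
    )


print(to_morse("sos"))  # ... --- ...
```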

Aside from (ab)using the preprocessor extensively, IOCCC entries frequently use heavily optimized algorithms which do clever manipulation of data in only a few statements. For a good waste of time, I suggest browsing the list of IOCCC winners. At the least, C experts can work through some pretty good brain teasers, and C learners might pick up some interesting tricks or learn something new while puzzling through the code.

So what? Obfuscating code intentionally is fun and makes for an interesting exercise.

Quines

Another interesting sort of program is a quine- a program that prints its own source code when run. Wikipedia has plenty of information on quines as well as a good breakdown on how to create one. My point in discussing quines, however, is simply to point out a fun abuse of the quine ‘rules’, as it were. Consider the following:

#!/bin/cat

On a UNIX or UNIX-like system, that single line is a quine, because it’s abusing the shebang. The shebang (‘#!’), when used in a plain-text file, indicates to the kernel when loading a file with intent to run it that the file is not itself executable, but should be interpreted.

The system then invokes the program given on the shebang line (in this case /bin/cat) and gives the name of the original file as an argument. Effectively, this makes the system do the following, assuming that line is in the file quine.sh:

$ /bin/cat quine.sh

As most UNIX users will know, cat takes all inputs and writes them back to output, and is useful for combining multiple files (invocation like cat file1 file2 > both) or just viewing the contents of a file as plain text on the terminal. Final result: cat prints the contents of quine.sh.
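The argument-building step can be imitated in user space. The following Python sketch mimics, under simplifying assumptions, how Linux turns a shebang line into a command line. The function name shebang_argv is made up for this example, and the real kernel code handles more edge cases, such as length limits.

```python
import os
import tempfile


def shebang_argv(path):
    """Return the argv a shebang line would produce, or None if there is none."""
    with open(path, "rb") as handle:
        first = handle.readline()
    if not first.startswith(b"#!"):
        return None
    parts = first[2:].decode().strip().split(None, 1)
    if not parts:
        return None
    interpreter = parts[0]
    # Linux hands everything after the interpreter over as ONE optional argument
    optional = [parts[1]] if len(parts) > 1 else []
    return [interpreter, *optional, path]


with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "quine.sh")
    with open(script, "w") as handle:
        handle.write("#!/bin/cat\n")
    # Prints a two-element list: the interpreter, then the script's own path
    print(shebang_argv(script))
```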

Is that an abuse of the quine rules? Possibly. Good for learning more about system internals? Most definitely.

Esoteric Languages

Finally in our consideration of mind-bending ways to (ab)use computer languages, we come to the general topic of esoteric languages. Put concisely, an esoteric language is one intended to be difficult to use or just be unusual in some way. Probably the most well-known one is brainfuck, which is.. aptly named, being Turing-complete but also nearly impossible to create anything useful with.

The Esoteric language site has a variety of such languages listed, few of which are of much use. However, the mostly arbitrary limitations imposed on programmers in such languages can make for very good logic puzzles and often require use of rarely-seen tricks to get anything useful done.

One of my personal favorites is Petrovich. More of a command interpreter than programming language, Petrovich does whatever it wants and must be trained to do the desired operations.

Raptor Speech

In a fit of boredom this evening, I tried to see what the speech recognition in Windows 7 would give back when I made raptor noises into it. The result.. speaks pretty well for itself:

F and has and has a Hack it has A hack who know Her house Just how hot enough And who know how It has had To add up data at data to go out and It’s all of all Go ahead goal happened: how has a Staff headed to a

And if his own booth for th FFI have had for the hand-held her and who often have no

Btrfs

I recently converted the root filesystem on my netbook, a now rather old Acer Aspire One with an incredibly slow 1.8″ Flash SSD, from the ext3 I had been using for quite a while to the shiny new btrfs, which becomes more stable every time the Linux kernel gets updated. As I don’t keep any data of particular importance on there, I had no problem with running an experimental filesystem on it.

Not only was the conversion relatively painless, but the system now performs better than it ever did with ext3/4.

Conversion

Btrfs supports a nearly painless conversion from ext2/3/4 due to its flexible design. Because btrfs has almost no fixed locations for metadata on the disc, it is actually possible to allocate btrfs metadata inside the free space in an ext filesystem. Given that, all that’s required to convert a filesystem is to run btrfs-convert on it- the only requirement is that the filesystem not be mounted.

As the test subject of this experiment was just my netbook, this was easy, since I keep a rather simple partition layout on that machine. In fact, before the conversion, I had a single 8GB ext4 partition on the system’s rather pathetic SSD, and that was the extent of available storage. After backing up the contents of my home directory to another machine, I proceeded to decimate the contents of my home directory and drop the amount of storage in-use from about 6GB to more like 3GB, a healthy gain.

Linux kernel

To run a system on Btrfs, there must, of course, be support for it in the kernel. Because I customarily build my own kernels on my netbook, it was a simple matter of enabling Btrfs support and rebuilding my kernel image. Most distribution kernels probably won’t have such support enabled since the filesystem is still under rather heavy development, so it was fortunate that my setup made it so easy.

GRUB

The system under consideration runs GRUB 2, currently version 1.97, which has no native btrfs support. That’s a problem, as I was hoping to only have a single partition. With a little research, it was easy to find that no version of GRUB currently supports booting from btrfs, although there is an experimental patchset which provides basic btrfs support in a module. Unfortunately, to load a module, GRUB needs to be able to read the partition in which the module resides. If my /boot is on btrfs, that’s a bit troublesome. Thus, the only option is for me to create a separate partition for /boot, containing GRUB’s files and my Linux kernel image to boot, formatted with some other file system. The obvious choice was the tried-and-true ext3.

This presents a small problem, in that I need to resize my existing root partition to make room on the disc for a small /boot partition. Easily remedied, however, with application of the Ultimate Boot CD, which includes the wonderful Parted Magic. GParted, included in Parted Magic, made short work of resizing the existing partition and its filesystem, as well as moving that partition to the end of the disc, which eventually left me with a shiny new ext3 partition filling the first 64MB of the disc.

Repartitioning

After creating my new /boot partition, it was a simple matter of copying the contents of /boot on the old partition to the new one, adjusting the fstab, and changing my kernel command line in the GRUB config file to mount /dev/sda2 as root rather than sda1.

Move the contents of /boot:

$ mount /dev/sda1 /mnt/boot
$ cp -a /boot/. /mnt/boot/
$ rm -r /boot/*

Updated fstab:

/dev/sda1       /boot   ext3    defaults    0 1
/dev/sda2       /       btrfs   defaults    0 1

Finishing up

Finally, it was time to actually run btrfs-convert. I booted the system into the Arch Linux installer (mostly an arbitrary choice, since I had that image lying around) and installed the btrfs utilities package (btrfs-progs-unstable) in the live environment. Then it was a simple matter of running btrfs-convert on /dev/sda2 and waiting about 15 minutes, during which time the disc was being hit pretty hard. Finally, a reboot.

..following which the system failed to come back up, with GRUB complaining loudly about being unable to find its files. I booted the system from the Arch installer once again and ran grub-install on sda1 in order to reconfigure GRUB to handle the changed disc layout. With another reboot, everything was fine.

With my new file system in place, I took some time to tweak the mount options for the new partition. Btrfs is able to tune itself for solid-state devices, and will set those options automatically. From the Btrfs FAQ:

There are some optimizations for SSD drives, and you can enable them by mounting with -o ssd. As of 2.6.31-rc1, this mount option will be enabled if Btrfs is able to detect non-rotating storage.

However, there’s also a ssd_spread option:

Mount -o ssd_spread is more strict about finding a large unused region of the disk for new allocations, which tends to fragment the free space more over time. Mount -o ssd_spread is often faster on the less expensive SSD devices

That sounds exactly like my situation- a less expensive SSD device which is very slow when doing extensive writes to ext3/4. In addition to ssd_spread, I turned on the noatime option for the filesystem, which cuts down on writes at the expense of not recording access times for files and directories on the file system. As I’m seldom, if ever, concerned with access times, and especially so on my netbook, I lose nothing from such a change and gain (hopefully) increased performance.

Thus, my final (optimized) fstab line for the root filesystem:

/dev/sda2       /       btrfs   defaults,noatime,ssd_spread    0
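Note that the line above supplies only five fields. According to fstab(5), a missing fifth (dump) or sixth (fsck pass) field is treated as zero, so the root filesystem is simply not checked at boot. A small Python sketch of that parsing rule follows. FstabEntry and parse_fstab_line are names invented for this example, not part of any system library.

```python
from typing import NamedTuple


class FstabEntry(NamedTuple):
    device: str
    mount_point: str
    fs_type: str
    options: list
    dump: int
    fsck_pass: int


def parse_fstab_line(line):
    """Parse one fstab line; return None for blank lines and comments."""
    fields = line.split()
    if not fields or fields[0].startswith("#"):
        return None
    if len(fields) < 4:
        raise ValueError(f"too few fields: {line!r}")
    device, mount_point, fs_type, options = fields[:4]
    # Per fstab(5), the fifth and sixth fields default to zero when absent
    dump = int(fields[4]) if len(fields) > 4 else 0
    fsck_pass = int(fields[5]) if len(fields) > 5 else 0
    return FstabEntry(device, mount_point, fs_type, options.split(","), dump, fsck_pass)


entry = parse_fstab_line("/dev/sda2       /       btrfs   defaults,noatime,ssd_spread    0")
print(entry.options)    # ['defaults', 'noatime', 'ssd_spread']
print(entry.fsck_pass)  # 0
```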

Results

After running with the new setup for about a week and working on normal tasks with it, I can safely say that on my AA1 (the Aspire One), Btrfs with ssd_spread is significantly more responsive than ext4 ever was. Under ext4, the system would sometimes stop responding to input while Firefox, for example, was hitting the disc fairly hard.

With Btrfs, I no longer have any such problem- everything remains responsive even under fairly high I/O load (such as while Firefox is downloading data from Firefox Sync, or when I’m applying updates).

Monday Link Dump

It’s a Christmas miracle!  There’s a new post!  Or maybe not, but take what you can get.  Here are some fun links.

  • It’s hardly a secret that LEDs may also be used as rather poor photodiodes, but this paper from Mitsubishi Research Laboratories goes into great detail in how such properties may be exploited for short-range wireless communication with only a few parts on a microcontroller.
  • Boing Boing has a neat gallery of technology in use at the US Library of Congress to digitize collections.
  • A ridiculously nice panorama of the Milky Way as seen from the summit of Chimborazo, the highest peak in Ecuador.
  • I feel like the esoteric language Petrovich could be implemented amusingly with a genetic algorithm to come up with pseudo-random actions.
  • I take a bit more of an interest in computer graphics than in other things which I don’t consider my actual field of expertise, so neat things like the seam carving scheme for image resizing/retargeting are of particular interest, especially when they’re as clever as that one.
  • Rediscovered sketch2photo while browsing things related to seam carving, which is also worth checking out.
  • Knowing a bit of information theory is very very useful for anyone working with software, especially when data compression is concerned.  David MacKay’s book on information theory is an enlightening bit of work (although I have yet to get far into it) and you can’t beat free digital copies.
  • Okonomiyaki sounds tasty.  Will have to keep it in mind for sometime when I’m actually cooking.

That’s it for the links I’ve stockpiled here.  Some ideas on chording keyboards and image processing for personal amusement will hopefully materialize into a coherent blog post sometime soon.

With that, here’s an interesting bit of wisdom from the hacker community, the source of which I can’t recall:

The virtual adept does not own the information it creates, and thus
has no right or desire to profit from it. The virtual adept exists
purely to manifest the infinite potential of information in to
information itself, and to minimize the complexity of an
information request in a way that will benefit all conscious
entities. What is not information is not consequential to the
virtual adept, not money, not fame, not power.

Am I a hacker? No.
I am a student of virtuality.
I am the witch malloc,
I am the cult of the otherworld,
and I am the entropy.

I am Phantasmal Phantasmagoria,
and I am a virtual adept.

Oh, Hi

It’s been a while since I posted anything new, but there are currently three post drafts and another concept languishing.  I’ll get around to those sometime.  Here are some vector images to pass the time.

markov.py

This was a little for-fun project that I built: a Python module/script that can be used to semi-randomly generate words, based on Markov chains.

Background, implementation

I was inspired by recalling the story of the Automated Curse Generator, which seemed like something that would be interesting to implement for fun in my own time, as it did indeed turn out to be.  In short, the module examines input text and generates a graph with edges weighted based on character frequency, then traverses the graph to generate a word.

To generate the chains, the module builds a directed graph based on the seed text, in which each character is linked to every character known to follow it.  Each edge is weighted by the fraction of that character’s observed successors which the target character accounts for.  For example, the string “zezifadi r00lz dr” would generate the following graph, where the value of each edge is the probability of choosing that edge to leave the associated vertex:

Figure: Graphviz rendering of the graph for “zezifadi r00lz dr”.
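The graph construction described above can be sketched in a few lines of Python. This is a minimal reconstruction from the description, not the code from markov.py itself. It assumes the text is padded with a single space on each side so that ‘ ‘ marks both the start and end of a word, and it keeps every character, digits included.

```python
from collections import Counter, defaultdict


def build_graph(text):
    """Return {char: {next_char: probability}} for the given seed text."""
    # Assumption: normalise whitespace and pad with one space on each side
    padded = " " + " ".join(text.split()) + " "
    counts = defaultdict(Counter)
    for current, following in zip(padded, padded[1:]):
        counts[current][following] += 1
    graph = {}
    for char, followers in counts.items():
        total = sum(followers.values())
        graph[char] = {nxt: n / total for nxt, n in followers.items()}
    return graph


graph = build_graph("zezifadi r00lz dr")
print(graph["e"])  # {'z': 1.0}
print(graph["d"])  # {'i': 0.5, 'r': 0.5}
```

Here ‘e’ is only ever followed by ‘z’, so that edge has probability 1, while ‘d’ is followed once by ‘i’ and once by ‘r’, giving each edge a weight of one half.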

To generate a word, then, it can be as simple as starting at ‘ ‘ (the red node) and continuing to traverse the graph until another ‘ ‘ is encountered.  In reality, while that worked, it was awfully boring.  When seeded with some text in English, there was a disappointing number of short, boring (not to mention unpronounceable) words and far too few amusing longer ones.  Think ‘ad’ and ‘s’ rather than ‘throm’.

It was rather easy to generate more interesting words, however, by simply adding some word-length limits, defaulting to a minimum of 4 characters and a maximum of 12, tunable via arguments to the word generation method of the map.  Rather than blindly following edges, as long as the word generated is shorter than the minimum, any chaining result of ‘ ‘ will be ignored.  When maximum length is reached, the word will be immediately terminated provided the current character has any connection to blank space.  If not, generation continues until such a connection is found.
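Those length rules translate fairly directly into code. The following Python sketch is a reconstruction from the description rather than the actual markov.py source. The names build_counts and generate_word are illustrative, and returning a short word when a character can only be followed by a space is an assumption made here to guarantee termination.

```python
import random
from collections import Counter, defaultdict


def build_counts(text):
    """Count how often each character follows each other character."""
    padded = " " + " ".join(text.split()) + " "
    counts = defaultdict(Counter)
    for current, following in zip(padded, padded[1:]):
        counts[current][following] += 1
    return counts


def generate_word(counts, min_len=4, max_len=12, rng=random):
    """Walk the graph from ' ' until a word break is allowed and chosen."""
    if not any(char != " " for char in counts[" "]):
        raise ValueError("seed text contains no words")
    word, current = "", " "
    while True:
        followers = dict(counts[current])
        # At or past the maximum: stop as soon as a word break is reachable
        if len(word) >= max_len and " " in followers:
            return word
        if len(word) < min_len:
            followers.pop(" ", None)  # too short: ignore word breaks
            if not followers:
                return word  # assumption: dead end, accept the short word
        chars = list(followers)
        nxt = rng.choices(chars, weights=[followers[c] for c in chars])[0]
        if nxt == " ":
            return word
        word += nxt
        current = nxt


counts = build_counts("zezifadi r00lz dr")
print(generate_word(counts))
```

Passing a seeded random.Random instance as rng makes the output repeatable, which is handy when experimenting with different limits.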

What makes this so entertaining, I think, is its versatility.  Since word generation is based entirely on the character frequency statistics of the input text, it works for any language.  By extension, that means it could easily be made to generate whole phrases in $(East-Asian language of your choice) by feeding it ideographs rather than Latin characters (ばかです (yes, I’m aware this is actually Kana)), or just nonsense that pronounces a lot like Simlish by putting in some other Simlish nonsense.

The script

Having implemented word generation in the module, it was reasonably short work to wrap the whole thing in a script so it could be invoked from the command line for great lulz.  Something like the following does a decent job of providing amusement by generating a word every 15 seconds.  For more fun, pipe the output into a speech synthesizer.

Tari@Kerwin ~ $ while markov.py; do sleep 15; done

Of course, before anything can be generated, a graph must be generated, which can be done via the -s option on the script or by invoking the addString method of MarkovMap.  Quick example:

Tari@Kerwin ~ $ # Add the given string to the current graph, or to a new one.
Tari@Kerwin ~ $ markov.py -s"String to seed with" -ffoo.pkl
IO error on foo.pkl, creating new map
seeeeed
Tari@Kerwin ~ $ # Add some Delmore Schwartz to the map via stdin
Tari@Kerwin ~ $ markov.py -ffoo.pkl -s- << EOF
> (This is the school in which we learn...)
>What is the self amid this  blaze?
>What am I now that I was then
>Which I shall suffer and act  again,
>The theodicy I wrote in my high school days
>Restored all  life from infancy,
>The children shouting are bright as they run
>(This  is the school in which they learn...)
>Ravished entirely in their  passing play!
>(...that time is the fire in which they burn.)
>EOF
idagheam
Tari@Kerwin ~ $ # Generate a word from the default graph in file markov.pkl
Tari@Kerwin ~ $ markov.py
awaike
Tari@Kerwin ~ $

Easy enough.  I’ve found that a Maori seed (via Project Gutenberg) makes for some of the more easily pronounced words, but any language will (mostly) generate words that are pronounceable via that language’s pronunciation rules.

For seeding with non-Latin character sets, the script can take the -l or --lax option (the ‘strict’ keyword parameter to MarkovMap.addString()), which removes the restriction that only alphabetic characters are graphed.  The downside, then, is that everything in the input is mapped out, so you’re much more likely to get garbage out unless the input is carefully sanitized of punctuation and such (GIGO, after all).

Code

Enough talk, I’m sure you just want to pick apart my code and play with nonsense words at this point.  Download link is below.  I’m providing the code under the Simplified BSD License so you’re allowed to do nearly anything with it, I just ask that you credit me for it in some way if you reuse or redistribute it.

Download markov.py

Wednesday link dump

Because I have nothing better to do right now, it’s a good time to dump the interesting links that I’ve been accumulating.

Glowy.  Also radioactive.
Cherenkov glow in the Advanced Test Reactor
  • While radioactive hunks of matter are often portrayed as glowing with a green tinge, we all know that’s not actually true.. unless there’s Cherenkov Radiation involved, as in many nuclear reactors- that’s not green, though.
  • Google have (for now) won the suit against them by Viacom regarding copyrighted content being uploaded to YouTube, which is good news for everyone except maybe Viacom.  It’s still fun to read choice excerpts of correspondence involving all sorts of mudslinging in the case (warning: lots of curses).
  • OpenStreetMap is a neat project to create free maps, similar to Google Maps, Bing Maps, etc.  Cool stuff, and all the map data is Creative Commons, meaning it could be used for any number of shiny projects.
  • There might be life on Saturn’s moon, Titan, observations courtesy of the NASA/ESA/ASI Cassini mission, which has been bouncing around the Saturnian system since mid-2004 after launch way back in 1997.  It’s far from a sure thing, but it’s really exciting that predictions of how life might work on Titan have been supported by observation.
  • This study (PDF) of internet routing to previously unused blocks is quite interesting, especially the numerous SIP streams pointed at 1.1.1.1 (section 5.1).
  • The EFF (kind of like the ACLU of the internet, if you’re not familiar with them) recently put out the HTTPS Everywhere extension for Firefox.  When it’s this easy to lock down your web traffic, there’s no reason not to.  What’s your excuse?
  • Huge things are cool.  Want to feel tiny?  Go ask Wikipedia about the local supercluster, then consider how tiny everything humanity knows is, relative to that.  When you’re done scrabbling about in your own Total Perspective Vortex, consider epic timescales for extra kicks.  Yeah.. cosmology is awesome.

HUGE THINGS

..and that’s several weeks of accumulated cool-things.  Enjoy.