# Of Names and Localization

When I’m not thinking in or of computer languages, one of the things that I find consistently interesting is natural languages. As such, I’ll occasionally spend some time simply puzzling over bits of language (for which purpose Language Log is an excellent feed of topics).

As it happened, I spent some time today informing myself more on the fairly well-known conlangs Esperanto and Lojban. I find each of them interesting, although my usual pragmatic approach to things probably means I’ll never do any serious study of either.

The point of this rambling, however, is that an exercise in Lojban For Beginners challenges the reader to spell their name using the Lojban orthography. Jumping off from there, I endeavoured to see how my name might be translated to other languages.

Beginning with Lojban, I believe my name might be written as pitir.mar’aini (using the Latin orthography). For the uninitiated, the ‘.’ is a full stop, since Lojban doesn’t routinely capitalize (it mainly serves to denote unusual emphasis in pronunciation) nor is spacing a strictly enforced part of notation. Beyond that, the apostrophe is actually pronounced as an ‘h’ would be in English, and is considered a letter rather than a piece of punctuation. Everything else is fairly straightforward, just using rather different spelling conventions than might be seen in English.

Overall, I find that the Lojban orthography is pretty easy to get a hold of as a native English speaker. But what about some other languages? I have a bit of experience with Japanese, so I gave that one a try.

Coming up with a proper equivalent of my name with Japanese orthography is a bit of a kludge, since Japanese names are traditionally written with kanji, and so may also be considered to encode literal meanings[citation needed]. Choosing appropriate kanji for a translation of my name is far outside my expertise and it would sound completely different (and thus diverge from the point of this exercise), so I’ll settle for a katakana approximation: ピーター・マルハイニ. I use katakana here because it is traditionally used for words of foreign origin, which indeed my name is.

It’s an interesting challenge to transliterate from a European language (English here) into Japanese, since the Japanese syllabary is almost exclusively open (that is, the sounds end with vowels). In this case, I had to fudge my given name (Peter), since Japanese completely lacks the sound that ‘r’ provides in English- it becomes ‘PeTa’ instead, which I find to be acceptably close (the ーs denote long vowels).

In ‘Marheine’, the ‘rh’ construct is difficult, since it’s a consonant cluster which doesn’t fit into the aforementioned open syllables. For that, I fudged it with ‘ru’, which is (as far as I can deduce) a fairly common trick.

While I’m considering Japanese pronunciation, it’s worth mentioning the characters in the header on this web site (タリ). That’s a representation of my usual alias, Tari, in katakana.
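As a small illustration, the katakana spellings above can be pieced together one open syllable at a time. This is only a lookup over a hand-picked handful of kana; the `KATAKANA` table and the `to_katakana` helper are made up for this sketch and are nowhere near a real transliterator.

```python
# Tiny, hand-picked subset of the katakana table (illustration only).
KATAKANA = {
    "pi": "ピ", "ta": "タ", "ma": "マ", "ru": "ル",
    "ha": "ハ", "i": "イ", "ni": "ニ", "ri": "リ",
    "-": "ー",   # long-vowel mark
    ".": "・",   # separator between given name and family name
}


def to_katakana(chunks):
    """Join pre-split chunks (open syllables and marks) into katakana."""
    return "".join(KATAKANA[chunk] for chunk in chunks)


# 'Peter Marheine', already broken into open syllables by hand;
# the 'r' of the 'rh' cluster is fudged into 'ru'.
peter_marheine = ["pi", "-", "ta", "-", ".", "ma", "ru", "ha", "i", "ni"]

print(to_katakana(peter_marheine))  # ピーター・マルハイニ
print(to_katakana(["ta", "ri"]))    # タリ
```

The hard part, of course, is the hand-splitting into open syllables, which the sketch does not attempt.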

# Muse

I finally got around to implementing something (anything!) in the Muse section on this site. Hopefully more material will trickle in over the coming times as I move more ramblings, musings, and generally useful information there.

# Raptor Speech

In a fit of boredom this evening, I tried to see what the speech recognition in Windows 7 would give back when I made raptor noises into it. The result.. speaks pretty well for itself:

F and has and has a Hack it has A hack who know Her house Just how hot enough And who know how It has had To add up data at data to go out and It’s all of all Go ahead goal happened: how has a Staff headed to a

And if his own booth for th FFI have had for the hand-held her and who often have no

It’s a Christmas miracle!  There’s a new post!  Or maybe not, but take what you can get.  Here are some fun links.

• It’s hardly a secret that LEDs may also be used as rather poor photodiodes, but this paper from Mitsubishi Research Laboratories goes into great detail in how such properties may be exploited for short-range wireless communication with only a few parts on a microcontroller.
• Boing Boing has a neat gallery of technology in use at the US Library of Congress to digitize collections.
• A ridiculously nice panorama of the Milky Way as seen from the summit of Chimborazo, the highest peak in Ecuador.
• I feel like the esoteric language Petrovich could be implemented amusingly with a genetic algorithm to come up with pseudo-random actions.
• I take a bit more of an interest in computer graphics than in other things which I don’t consider my actual field of expertise, so neat things like the seam carving scheme for image resizing/retargeting are of particular interest, especially when they’re as clever as that one.
• Rediscovered sketch2photo while browsing things related to seam carving, which is also worth checking out.
• Knowing a bit of information theory is very very useful for anyone working with software, especially when data compression is concerned.  David MacKay’s book on information theory is an enlightening bit of work (although I have yet to get far into it) and you can’t beat free digital copies.
• Okonomiyaki sounds tasty.  Will have to keep it in mind for sometime when I’m actually cooking.

That’s it for the links I’ve stockpiled here.  Some ideas on chording keyboards and image processing for personal amusement will hopefully materialize into a coherent blog post sometime soon.

With that, here’s an interesting bit of wisdom from the hacker community whose source I can’t recall:

The virtual adept does not own the information it creates, and thus
has no right or desire to profit from it. The virtual adept exists
purely to manifest the infinite potential of information in to
information itself, and to minimize the complexity of an
information request in a way that will benefit all conscious
entities. What is not information is not consequential to the
virtual adept, not money, not fame, not power.

Am I a hacker? No.
I am a student of virtuality.
I am the witch malloc,
I am the cult of the otherworld,
and I am the entropy.

I am Phantasmal Phantasmagoria,
and I am a virtual adept.


# markov.py

This was a little for-fun project that I built: a Python module/script that can be used to semi-randomly generate words, based on Markov chains.

## Background, implementation

I was inspired by recalling the story of the Automated Curse Generator, which seemed like something that would be interesting to implement for fun in my own time, as it did indeed turn out to be.  In short, the module examines input text and generates a graph with edges weighted based on character frequency, then traverses the graph to generate a word.

To generate the chains, the module builds a directed graph based on the seed text, where each character is linked to every character known to follow it, with each edge weighted by the fraction of all following characters that the target character accounts for.  For example, the string “zezifadi r00lz dr” would generate the following graph, where the value of each edge is the probability of choosing that edge to leave the associated vertex:
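As a rough sketch of that construction (not the actual markov.py code; the `build_graph` name and the choice to look only at adjacent character pairs, without padding the string, are assumptions for illustration), the edge weights for the example string can be computed like so:

```python
from collections import Counter, defaultdict


def build_graph(text):
    """Map each character to {following character: probability}."""
    counts = defaultdict(Counter)
    # Count every adjacent (current, following) character pair.
    for current, following in zip(text, text[1:]):
        counts[current][following] += 1

    graph = {}
    for char, followers in counts.items():
        total = sum(followers.values())
        # Normalize the counts so the edges leaving each vertex sum to 1.
        graph[char] = {nxt: n / total for nxt, n in followers.items()}
    return graph


graph = build_graph("zezifadi r00lz dr")
print(graph["d"])  # {'i': 0.5, 'r': 0.5}
print(graph["r"])  # {'0': 1.0}
```

Here ‘d’ is followed once by ‘i’ and once by ‘r’, so each edge gets a weight of 0.5, while ‘r’ is only ever followed by ‘0’.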

To generate a word, then, it can be as simple as starting at ‘ ’ (the red node) and continuing to traverse the graph until another ‘ ’ is encountered.  In reality, while that worked, it was awfully boring.  When seeded with some text in English, there was a disappointing number of short, boring (not to mention unpronounceable) words and far too few amusing longer ones.  Think ‘ad’ and ‘s’ rather than ‘throm’.

It was rather easy to generate more interesting words, however, by simply adding some word-length limits, defaulting to a minimum of 4 characters and a maximum of 12, tunable via arguments to the word generation method of the map.  Rather than blindly following edges, as long as the word generated is shorter than the minimum, any chaining result of ‘ ’ will be ignored.  When maximum length is reached, the word will be immediately terminated provided the current character has any connection to blank space.  If not, generation continues until such a connection is found.
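A minimal sketch of that traversal follows. It is not the module's real code: `build_graph`, `generate_word`, and the parameter names are hypothetical, the seed is padded with spaces so every word can start and end, and a dead end (a character followed only by spaces while the word is still too short) simply ends the word early, which is an assumption of this sketch.

```python
import random
from collections import Counter, defaultdict


def build_graph(text):
    """Map each character to a Counter of the characters seen right after it."""
    graph = defaultdict(Counter)
    padded = " " + text + " "  # assumption: pad so words can start and end
    for current, following in zip(padded, padded[1:]):
        graph[current][following] += 1
    return graph


def generate_word(graph, min_len=4, max_len=12, rng=random):
    word = ""
    current = " "  # every word starts from the blank-space vertex
    while True:
        followers = dict(graph.get(current, {}))
        if len(word) >= max_len and " " in followers:
            break  # at the limit: stop as soon as a space is reachable
        if len(word) < min_len:
            followers.pop(" ", None)  # too short: ignore word-ending edges
        if not followers:
            break  # dead end (assumption: give up rather than loop forever)
        # Weighted random choice; raw counts work as weights directly.
        current = rng.choices(list(followers), weights=list(followers.values()))[0]
        if current == " ":
            break
        word += current
    return word


seed = "the quick brown fox jumps over the lazy dog"
print(generate_word(build_graph(seed)))
```

Note that a word can still run past the maximum when the current character never leads to a space, which matches the behaviour described above.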

What makes this so entertaining, I think, is its versatility.  Since word generation is based entirely on the character frequency statistics of the input text, it works for any language.  By extension, that means it could easily be made to generate whole phrases in $(East-Asian language of your choice) by feeding it ideographs rather than Latin characters (ばかです (yes, I’m aware this is actually Kana)), or just nonsense that pronounces a lot like Simlish by putting in some other Simlish nonsense.

## The script

Having implemented word generation in the module, it was reasonably short work to wrap the whole thing in a script so it could be invoked from the command line for great lulz.  Something like the following does a decent job of providing amusement by generating a word every 15 seconds.  For more fun, pipe the output into a speech synthesizer.

Tari@Kerwin ~ $ while markov.py; do sleep 15; done

Of course, before anything can be generated, a graph must be generated, which can be done via the -s option on the script or by invoking the addString method of MarkovMap.  Quick example:

Tari@Kerwin ~ $ # Add the given string to the current graph, or to a new one.
Tari@Kerwin ~ $ markov.py -s"String to seed with" -ffoo.pkl
IO error on foo.pkl, creating new map
seeeeed
Tari@Kerwin ~ $ # Add some Delmore Schwartz to the map via stdin
Tari@Kerwin ~ $ markov.py -ffoo.pkl -s- << EOF
> (This is the school in which we learn...)
>What is the self amid this  blaze?
>What am I now that I was then
>Which I shall suffer and act  again,
>The theodicy I wrote in my high school days
>Restored all  life from infancy,
>The children shouting are bright as they run
>(This  is the school in which they learn...)
>Ravished entirely in their  passing play!
>(...that time is the fire in which they burn.)
>EOF
idagheam
Tari@Kerwin ~ $ # Generate a word from the default graph in file markov.pkl
Tari@Kerwin ~ $ markov.py
awaike
Tari@Kerwin ~ $

Easy enough.  I’ve found that a Maori seed (via Project Gutenberg) makes for some of the more easily pronounced words, but any language will (mostly) generate words that are pronounceable via that language’s pronunciation rules.

For seeding with non-Latin character sets, the script can take the -l or --lax option (‘strict’ keyword parameter to MarkovMap.addString()), which lifts the restriction that only alphabetic characters are graphed.  The downside, then, is that everything in the input is mapped out, so you’re much more likely to get garbage out unless the input is carefully sanitized of punctuation and such (GIGO, after all).
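To make the difference concrete, here is a hypothetical filter in the same spirit. The `filter_seed` function is invented for illustration, and treating "alphabetic" as "ASCII letters plus spaces" is an assumption of this sketch, not something taken from the module itself.

```python
import string


def filter_seed(text, strict=True):
    """Return the characters of `text` that would end up in the graph."""
    if not strict:
        return text  # lax: everything is mapped, punctuation included
    # Assumption: strict mode keeps only ASCII letters and spaces.
    allowed = set(string.ascii_letters + " ")
    return "".join(ch for ch in text if ch in allowed)


print(filter_seed("r00lz, dr!"))                # rlz dr
print(filter_seed("r00lz, dr!", strict=False))  # r00lz, dr!
```

Under that assumption a kana seed like ばかです is filtered down to nothing in strict mode, which is why the lax option is needed for it.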

## Code

Enough talk, I’m sure you just want to pick apart my code and play with nonsense words at this point.  Download link is below.  I’m providing the code under the Simplified BSD License, so you’re allowed to do nearly anything with it; I just ask that you credit me for it in some way if you reuse or redistribute it.

Because I have nothing better to do right now, it’s a good time to dump the interesting links that I’ve been accumulating.

• While radioactive hunks of matter are often portrayed as glowing with a green tinge, we all know that’s not actually true.. unless there’s Cherenkov Radiation involved, as in many nuclear reactors- that’s not green, though.
• Google have (for now) won the suit against them by Viacom regarding copyrighted content being uploaded to YouTube, which is good news for everyone except maybe Viacom.  It’s still fun to read choice excerpts of correspondence involving all sorts of mudslinging in the case (warning: lots of curses).
• OpenStreetMap is a neat project to create free maps, similar to Google Maps, Bing Maps, etc.  Cool stuff, and all the map data is Creative Commons, meaning it could be used for any number of shiny projects.
• There might be life on Saturn’s moon, Titan, observations courtesy of the NASA/ESA/ASI Cassini mission, which has been bouncing around the Saturnian system since mid-2004 after launch way back in 1997.  It’s far from a sure thing, but it’s really exciting that predictions of how life might work on Titan have been supported by observation.
• This study (PDF) of internet routing to previously unused blocks is quite interesting, especially the numerous SIP streams pointed at 1.1.1.1 (section 5.1).
• The EFF (kind of like the ACLU of the internet, if you’re not familiar with them) recently put out the HTTPS Everywhere extension for Firefox.  When it’s this easy to lock down your web traffic, there’s no reason not to.  What’s your excuse?
• Huge things are cool.  Want to feel tiny?  Go ask Wikipedia about the local supercluster, then consider how tiny everything humanity knows is, relative to that.  When you’re done scrabbling about in your own Total Perspective Vortex, consider epic timescales for extra kicks.  Yeah.. cosmology is awesome.

..and that’s several weeks of accumulated cool-things.  Enjoy.

# PuTTYJL

After putting up with the lack of support for Windows 7’s jump lists in PuTTY for a while, I finally got tired enough of it to do something.  Nothing as cool as patching PuTTY to do them itself, but I wrote a wrapper which indexes the saved sessions, allowing the user to select which ones should be included in the list.

From the project page:

PuTTYJL is a wrapper and patch for PuTTY written in C# for .NET 3.5 and Windows 7, adding support for the new Jump Lists, allowing you to create jump list entries for saved sessions in the registry and optionally just launch the wrapper to start a default session in PuTTY.

Get it here.

# Hacking life

Today (er, yesterday) was a big day for science.  In the May 20th issue of Science, there was an interesting paper detailing how researchers at the J. Craig Venter Institute (yup, name means nothing to me there) successfully created a life-form containing entirely artificial DNA [abstract,PDF].  This is really exciting stuff.

As the authors of the paper note, sequencing genomes is nothing new, but there’s a gigantic leap between just knowing how something is made and being able to make it yourself.  Although this synthetic bacterium (a strain of Mycoplasma mycoides) has mostly stock genes from its natural counterpart and just over a million base pairs, Wired Science notes that our ability to manufacture chunks of DNA has grown by around 100x in the last five years.  If that exponential trend holds, we would be able to build a human genome from scratch (~3 billion base pairs) within ten years.  From here, where can we go?  Anywhere.
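As a back-of-the-envelope check on that estimate, assume capacity keeps multiplying by roughly 100 every five years. The numbers are the rough figures quoted above, and the variable names are only for illustration.

```python
import math

current_bp = 1_000_000     # roughly what was synthesized this time
target_bp = 3_000_000_000  # approximate size of the human genome
growth_factor = 100        # claimed growth per period
period_years = 5           # length of one period

# Number of 100x periods needed to cover the 3000x gap, then convert to years.
periods = math.log(target_bp / current_bp) / math.log(growth_factor)
years = periods * period_years

print(f"{years:.1f} years")  # 8.7 years
```

That lands comfortably inside the ten-year figure, provided the trend actually continues.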

Consider what living things do in nature.  Now take some of that variety and modify it a little to do something more useful.  Say, design an enzyme allowing a bacterium to break down oil from spills, and remove any other metabolic pathways.  You suddenly have a bacterium which eats oil spills, then the colony dies when the oil goes away.

Sure, something like that is a ways off; we don’t have anywhere near the necessary knowledge of the biochemistry involved in such a thing (or do we..?  I could be entirely wrong).  Proteins are amazingly complex molecules, and their assembly/folding is rather poorly understood at best.  However, give it a while, and we could begin to do radical things within the framework of living things.  Say, custom-designed viruses to patch our genomes.  Literally, life hacking.

This is simply incredible stuff, and it’s the first step toward the singularity, IMHO.  More thoughts on that in the coming days.

# Ubunchu?

Go read.  You can come back in a bit and be mildly confused.

..or don’t (if you did, good).

Ubunchu is.. interesting.  I’ve lately taken to ignoring most things that so much as mention Ubuntu (as my StumbleUpon history will attest to), but it’s good to know that there’s still plenty of sense, if you know where to look.  Not everyone is spewing nonsense from their nostrils, evidently.

## The truth is over there

There’s a pretty impressive amount of truth lurking in Ubunchu under the silliness and not-quite-OS-tan levels of moe.  Basically every opinion voiced in the manga is accurate, IMHO.  In my mind, there’s a place for each OS- each has its strengths, and certain weaknesses- what you get from any system is a combination of what it can offer well and what you put in.

Ubuntu, for example, is a very newbie-friendly Linux distro, and it’s very good at being free and (generally) easy to use.  Try to control it too much, and it might break on you- that’s just how the game works.

Arch, my Linux distro of choice, is quite different- it’s one that expects you, the user, to go poking around and configure things yourself.  Arch is great at being configured to exactly fit your needs, provided you’re willing to take the time to learn things for yourself (do eeeet, I say).

Windows is good at being itself- not free, but generally worth the price for those who know what they’re doing.  I’ll freely admit that Windows tends to be overkill and even something of a liability (what with malware and all) for uninformed users, but I feel that power users can get a lot out of Windows with excellent dev tools like Visual Studio, a huge multitude of games, and the multiple ways to do just about anything.  Say what you want about Microsoft, but their developer support is superb.

OSX?  Well, it’s good at being simple to use and not free.  I won’t say much on that since I generally make a point of ignoring Apple products, but I hear such systems are well-liked in the creative community.  My main complaint is that it generally only gives you one way to do whatever it is you might want to do, which is rather painful for someone like me, who likes to poke around in things.

## Bottom line

Operating systems bring a variety of benefits to the table, and it’s up to you, the user, to decide which one(s) fit your computer usage style best.  Now if you’ll excuse me, I’m off to SSH into my server to check on my CLI IRC client. ( ^‐^)_

Oh, and I want a sysadmin’s club.

# CPU Comparison Shopping

I’ve been slowly working towards putting together a new PC build to replace my current one, a Core 2 Duo-based system I built about three years ago, which is starting to show its age.  In the interest of comparison shopping, I put together a spreadsheet and some charts looking at the newer Intel (i5/i7) and AMD (Phenom X4/X6) processors.  Turns out that Intel’s Core i5-750 seems to be the best deal in processors for what I’m looking for in a system at the moment.

## Raw Data

Clock speeds are in MHz, TDP in Watts, and cost is price in USD at Newegg as of 5/3/2010.  Processors with SMT (hyperthreading) are noted in the Cores column.

| Manufacturer | Model | Cores | Clock (MHz) | TDP (W) | Cost (USD) |
|---|---|---|---|---|---|
| AMD | Phenom II X4 955 BE | 4 | 3200 | 125 | 159.99 |
| AMD | Phenom II X4 940 BE | 4 | 3000 | 125 | 161.99 |
| AMD | Phenom II X4 965 BE | 4 | 3400 | 125 | 180.99 |
| AMD | Phenom II X6 1090T | 6 | 3200 | 125 | 309.99 |
| Intel | Core i5-650 | 2 | 3200 | 73 | 184.99 |
| Intel | Core i5-661 | 2 | 3330 | 87 | 199.99 |
| Intel | Core i7-920 | 4 (SMT) | 2660 | 130 | 279.99 |
| Intel | Core i7-930 | 4 (SMT) | 2800 | 130 | 294.99 |
| Intel | Core i5-750 | 4 | 2660 | 95 | 199.99 |
| Intel | Core i7-860 | 4 (SMT) | 2800 | 95 | 279.99 |