Notes to self, 2008

2008-12-26 - nfo file / encoding / viewer / linux

Viewing NFO files with less(1) can be a pain when you have to look at lots of non-printable characters enclosed in angle brackets (e.g. <DB><DB>...). You can use the LESSCHARSET environment variable — set it to "dos" or "latin9" — for the particular session (e.g. LESSCHARSET=dos less abc.nfo), but that will only tell less that these high ascii characters are printable, not that you want them in UTF-8. (Your terminal character set is UTF-8, right? It is 2008 after all.) Result, you see question marks instead of angle-bracketed escapes.

The solution, use a wrapper that calls iconv. Save this as /usr/local/bin/nfoless and give it execute permissions.

charset=cp437 # default

usage() {
        echo "Usage: $0 [OPTIONS] FILE..." >&2
        echo "Options: one of -(cp)1(252), -(cp)4(37), -(cp)8(50), -(latin)9" >&2
        exit 1

# Check arguments, parse options
[ -z "$*" ] && usage
while getopts "1489" opt ; do
        if [ "$opt" = "1" ]; then charset=cp1252
        elif [ "$opt" = "4" ]; then charset=cp437
        elif [ "$opt" = "8" ]; then charset=cp850
        elif [ "$opt" = "9" ]; then charset=latin9
        else usage

# Loop over files and call less
for f in "$@" ; do
        iconv -f "$charset" -t "utf-8" < "$f" | less

2008-12-12 - tcpdump / application data offset / tcp-ipv4

I frequently look at tcpdump output to examine application traffic. Most of the time I'm not interested in the IP and TCP headers. So, how do I know where the application data starts?

12:20:35.401025 IP (tos 0x0, ttl 64, id 42955, offset 0, flags [DF], proto TCP
(6), length 59) walter-desktop.local.46565 >
P, cksum 0x523c (incorrect (-> 0x6980), 16:23(7) ack 1 win 46
<nop,nop,timestamp 86717116 1854650221>
  0x0000:  4500 003b a7cb 4000 4006 40e3 0a65 0a9c  E..;..@.@.@..e..
  0x0010:  5bc2 e14b b5e5 0050 8423 8f63 9756 bab9  [..K...P.#.c.V..
  0x0020:  8018 002e 523c 0000 0101 080a 052b 32bc  ....R<.......+2.
  0x0030:  6e8b b76d 666f 6f62 6172 0a              n..mfoobar.

When looking at tcpdump (libpcap) output as seen above — and your goal is to find the application data — you should check the high order nibble of byte 0x14 + 0xC == 0x20, the "Data offset".

As you can see in the example, byte 0x20 has value 0x80, the high order nibble is 8. Multiply that by 4 (DWORD size) and you get the offset from the beginning of the TCP header (0x20).

(The TCP header begins at 0x14 in this case, because the low order nibble of byte 0x00 is 5. And as we all know 0x5 * 0x4 == 0x14. (The high order nibble is the IP version, which is 4 for IPv4.))

Therefore, the "foobar\n" typed by me, starts at 0x14 + 0x20 == 34. And that is correct, as you can see that 666f 6f62 6172 0a starts there.

To recap:

  • Byte 0x00 == 0x4? for IPv4 traffic
  • TCP begins at: (packet[0x0] & 0xF) << 2
    E.g.: nibble 5 means 0x14, look for TCP header size at 0x14 + 0xC == 0x20
  • Data begins at: (packet[TCP+0xC] & 0xF0) >> 2 + TCP
    E.g.: nibble 8 means 0x20, look for data at 0x14 + 0x20 == 0x34

2008-12-05 - GNU as compatible Hello World

When looking for the assembler Hello World for Linux, GNU as compatible snippets did not turn op. Modifying some found nasm code did the trick.

.globl _start
    .type   _start, @function
    movl    $4, %eax    /* system call number (sys_write) */
    movl    $1, %ebx    /* file descriptor (stdout) */
    movl    $msg, %ecx  /* message to write */
    movl    len, %edx   /* message length */
    int     $0x80       /* call kernel */

    movl    $1, %eax    /* system call number (sys_exit) */
    xorl    %ebx, %ebx  /* return value success */
    int     $0x80       /* call kernel */

    .string "Hello World!\12"
    .long   13

Compile with as -o hello.o hello.s && ld -o hello hello.o.

2008-11-16 - vlan connection / vtun / single client

I keep forgetting what vtun(1) settings I need to get a single client connected to an internal network somewhere. So, once and for all, here it is:


default {
  keepalive yes;
  type tun;
client_10_20_30_40 {
  proto tcp; # test udp first, your mileage may vary
  passwd some_secret_password;
  up {
    program /sbin/ip "addr flush dev %d"; # clear old config
    program /sbin/ip "link set %d up mtu 1470 # lower mtu to avoid fragments
    program /sbin/ip "route add dev %d scope link";
    program /sbin/arp "-sD -i %d %d pub";
  down {
    program /sbin/arp "-dD -i %d";
    program /sbin/ip "route del dev %d scope link";
    program /sbin/ip "link set %d down mtu 1500";


# replace eth0 with your LAN interface
net.ipv4.conf.eth0.proxy_arp = 1


# replace eth0 with your LAN interface
iptables -A FORWARD -i tun+ -o eth0 -j ACCEPT
iptables -A FORWARD -i eth0 -o tun+ -j ACCEPT


default {
  keepalive yes;
  persist no;
client_10_20_30_40 {
  passwd some_secret_password;
  up {
    program /sbin/ip "addr flush dev %d"; # clear old config
    program /sbin/ip "addr add broadcast dev %d";
    program /sbin/ip "link set %d up mtu 1470"; # lower mtu to avoid fragments
    # <-- perhaps prepend nameservers to resolv.conf here
  down {
    # <-- remove the prepended nameservers from resolv.conf here
    program /sbin/ip "link set %d down mtu 1500";
    program /sbin/ip "addr flush dev %d";


2008-08-20 - windows notepad security

I don't know how this got to be, but Editing a batch file with notepad.exe is suddenly a security risk!

Editing a batch file is a security risk!

Now, what do you do? Click "Run" and risk that you actually run the file? (Which you do not always want.)

In my case, it did run notepad and let me edit the file. But do you feel lucky?

(Guilty OS: Windows Server 2003 Standard/Web(?) Edition Service Pack 2)

2008-08-15 - non-c++ like protected / php

If you have a C++ background, protected and private member variables and functions can do confusing things in PHP. Let's consult the documentation:

C++ (Bjarne's C++ Glossary)

public member - a member accessible to all users of a class.
protected member - a member accessible only from classes derived from its class.
private member - a member accessible only from its own class.

PHP (online manual, OOP5)

Public declared items can be accessed everywhere. Protected limits access to inherited and parent classes (and to the class that defines the item). Private limits visibility only to the class that defines the item.

C++ is a bit unclear there, because — according to my compiler — a base class can always call overridden (virtual) members, private or not. In PHP however, where all public and protected members in PHP behave like virtual members, private members behave like non-virtual members.

I.e. it is obvious that you cannot call a private base member from a subclass. But, it is not so obvious that a base class can call private members of a subclass when they're virtual. And here the destinction lies. In PHP, if the member is private, the base class calls the base class method. If the member is protected, the base class calls the overridden method. In C++ it depends on the virtualness whether the overridden method gets called.

Better explained in code; the following two code snippets behave the same. Observe how you need to not use the virtual keyword for PrivateBase::_f() C++ to get the same effect as in PHP.

#include <assert.h>
class PrivateBase {
        public: virtual int f() { return _f(); }
        private: int _f() { return 1; }
class PrivateDerived: public PrivateBase {
        private: int _f() { return 2; }
class ProtectedBase {
        public: virtual int f() { return _f(); }
        protected: virtual int _f() { return 1; }
class ProtectedDerived: public ProtectedBase {
        protected: virtual int _f() { return 2; }
int main() {
        PrivateDerived *objPriv = new PrivateDerived();
        ProtectedDerived *objProt = new ProtectedDerived();
        assert(objPriv->f() == 1);
        assert(objProt->f() == 2);
        return 0;
class PrivateBase {
        public function f() { return $this->_f(); }
        private function _f() { return 1; }
class PrivateDerived extends PrivateBase {
        private function _f() { return 2; }
class ProtectedBase {
        public function f() { return $this->_f(); }
        protected function _f() { return 1; }
class ProtectedDerived extends ProtectedBase {
        protected function _f() { return 2; }
$objPriv = new PrivateDerived();
$objProt = new ProtectedDerived();
assert('$objPriv->f() == 1');
assert('$objProt->f() == 2');

The C++ was tested with g++-4.1.2 and PHP with version 5.2.0.

P.S. Felix Geisendörfer is a moron.

2008-08-05 - mac os x / airport fails autoconnect

After an update, quite a while ago, on this particular PowerPC running Mac OS X 10.3.9, the Airport WiFi adapter failed to autoconnect to the preconfigured network. After startup, you had to connect to a network by hand. The same had to be done after the (automatic) sleep-mode.

While this wasn't a particularly hard or time consuming operation (two clicks), it wasn't like this before. And like always, it was fixable. But not through enabling/disabling stuff in the UI. The System Preferences / Network / Airport default-join network was configured to the correct (and only) network, but it used a password instead of a keyring. First after editing the XML — mmm.. nice.. XML — configuration files, I got it to work properly.

The edits:

--- /Library/Preferences/SystemConfiguration/preferences.plist.old      Tue Aug  5 11:08:13 2008
+++ /Library/Preferences/SystemConfiguration/preferences.plist  Tue Aug  5 11:19:55 2008
@@ -204,18 +204,10 @@
-                               <key>AuthPassword</key>
-                               <data>
-                               ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghi=
-                               </data>
-                               <key>AuthPasswordEncryption</key>
-                               <string>AirPort</string>
-                               <string>Preferred</string>
+                               <string>Automatic</string>
-                               <key>PreferredNetwork</key>
-                               <string>Hemma</string>

(If you can't read the above, you need to learn diff(1)/patch(1) syntax.)

After making the changes, I went back to the Airport default-join settings where it asked me if I wanted to allow the network manager to access to my keyring. A simple Yes-click made everything good again.

2008-07-14 - undisclosed recipients / syntax

Ever wonder what the colon and the semi-colon do in the undisclosed-recipients:; [sic] that gets added by MTA's when you send mail to only Bcc?

It's a special group construct, with zero members, as can be seen here in RFC 2822:

When it is desirable to treat several mailboxes as a single unit
(i.e., in a distribution list), the group construct can be used. The
group construct allows the sender to indicate a named group of
recipients. This is done by giving a display name for the group,
followed by a colon, followed by a comma separated list of any number
of mailboxes (including zero and one), and ending with a semicolon.
Because the list of mailboxes can be empty, using the group construct
is also a simple way to communicate to recipients that the message
was sent to one or more named sets of recipients, without actually
providing the individual mailbox address for each of those

Therefore mailing to my-bad-ass-group:myname@mysite.mytld; works, sending the mail to myname@mysite.mytld like it should.

2008-05-26 - windows pngcrush / png gamma

Photoshop still adds the gamma flag to their png files (nothing wrong with that) and Internet Explorer (IE) still parses this gamma flag (and you don't want that). I posted a solution for Linux earlier that uses a script that calls pngcrush to remove the gamma flag.

Here's a Visual Basic script (vbs) that does the same thing (except for single files only) for Windows (with an actual user interface.. sort of ;-) ).

dim strFile, objDialog
Set objDialog = createObject("UserAccounts.CommonDialog")
objDialog.filter = "All Folders|*.png"
''objDialog.InitialDir = "C:\" ''defaults to DESKTOP/HOMEDIR
if objDialog.showOpen = 0 then
end if
strFile = objDialog.FileName
set objDialog = nothing

dim objFso, objWshell, intRet
set objFso = createObject("Scripting.FileSystemObject")
objFso.moveFile strFile, strFile & ".tmp"
set objWshell = wscript.createObject("WScript.Shell")
intRet ="pngcrush.exe -quiet -rem gAMA  """ & strFile & ".tmp"" """ & strFile & """", 0, true)
if intRet = 0 then
    objFso.deleteFile strFile & ".tmp"
end if

Save the script as wjd_pngcrush.vbs in a folder somewhere and create a conventient shortcut to it. Then, download the latest pngcrush-win32 package from and place pngcrush.exe somewhere in your %PATH% (e.g. in C:\WINDOWS).

Now you can run the shortcut, select the png file you want to “fix” and click OK. The file will be replaced with a version which shows up like you want it to in IE.

2008-04-19 - email / recipient delimiter / plus / validation

Don't you just hate it when you have to cave to the ignorance of RFC-ignoring regular expression building moron programmers? Yes.. when your perfectly valid e-mail address is rejected for all the wrong reasons. That perfectly valid e-mail address with a plus-sign in it, used to delimit your username from the address extension.

I always enter as e-mail address whenever I have to sign up or fill out a form somewhere. This way I can keep track of where mail comes from and automatically sort it in my inbox.

Unfortunately, there are too many places where thick copy-paste-programming so-called programmers have scoured the net for some regular expression that only allows [a-z0-9._-] in the username part of the e-mail address, resulting in decent folk getting annoying “This e-mail address is invalid” lies thrown at them in bold red glyphs.

When I started using this recipient delimiter, I knew this was an issue. But now I have finally succumbed. Here, for your enjoyment, are the postfix(1) configuration settings to rewrite minuses (-) — that generally are accepted — to pluses (+):

  # You already had this enabled
  recipient_delimiter = +
  # You add this
  recipient_canonical_maps = regexp:/etc/postfix/recipient_canonical_maps

  # Replace recipient-delimited emails with minus to the (correct) plus.
  /^([^@-]+)-([^@]+)@(.+)$/       $1+$2@$3

postmap(1) the recipient_canonical_maps and reload postfix(1). Now you can use yourname-whatever@yourhost as well as yourname+whatever@yourhost. And no.. I'm not happy with this workaround :-(

Note that this has the side effect that your mailer accepts all mails to your domain that contain a minus at first and bounces it only later when it realises that it doesn't exist. Alternatively you could replace the first part between parentheses in the recipient_canonical_maps with only valid usernames. But then you have to update that everytime you add users and aliases.

Grmbl.. &^%#@$@.. grr.. mrrr..


2008-04-11 - surfnet / fasthub / tropico

Those people looking for the Surfnet Direct Connect hub, should use: All other addresses — like, and — are now incorrect and obsolete.

Happy DC'ing.

2008-03-17 - mssql server / choosing collation

SQL_Latin1_General_Cp1_CI_AI it will be.

The SQL collation name suffix tells us the following:

  • Code page 1, this really is code page 1252 though
  • Pref is not specified, so uppercase is not preferred
  • Case Insensitive ordering
  • Accent Insensitive as well

Accent insensitive is good for Dutch and languages that use no accents at all — e.g. English. For other languages, it is better that the accented characters are between the unaccented ones, than on some for a foreigner completely unknown location. (In Swedish 'å' comes after 'z' and before 'ä' and 'ö'.) Ignoring the accents is the safest bet when you're not targetting a uniform linguistic audience.

Case insensitive sorting is — yes, it goes against common Nerd sense — completely necessary for your users. They will smack you if you let 'albert' sort after 'Zachary'. Forget optional performance penalties, think of the time you'll save not having to explain your strange (to them) sort order.

All other MSSQL Server collations have a “sensible” name with a code page number. Somehow Cp1 got to be the name for the collation with code page 1252. We do not choose code page 850. Code page 850 has some of CP437's graphics characters to build a text-mode user interface. We don't need those characters. Furthermore, CP1252 is a superset of ISO-8859-1; which is good from a standards viewpoint. Also it has its characters logically arranged. With lowercase variants always 0x20 higher than their uppercase counterparts.

If you disagree with me, I'd love to hear your view on the matter.

2008-02-27 - windows 2003 does not honour hubris

So I'm not mission critical after all ;-)

C:\Documents and Settings\walter\Bureaublad>copy con: hubris.vbs
set myself = getObject("LDAP://CN=Walter Doekes,CN=Users,DC=studentennet,DC=local")
myself.put "isCriticalSystemObject", TRUE
        1 bestand(en) gekopieerd.

C:\Documents and Settings\walter\Bureaublad>cscript /nologo hubris.vbs
C:\Documents and Settings\walter\Bureaublad\hubris.vbs(3, 1) (null):
  De server kan de aanvraag niet verwerken.

2008-02-10 - mp3 ripping with ubuntu

On Ubuntu-gutsy (7.10) the default CD extractor/ripper is the Sound Juicer. It pops up with a simple player when you insert a CD, so it is the obvious choice. By default, however, it does not do MP3 encoding.

In Preferences -> Edit (Output Format) Profiles you get to Edit GNOME Audio Profiles, where the options "CD Quality, MP3" already exists. (Alternately, you could run gnome-audio-profiles-properties.) This uses gstreamer-plugins. For starters, you need the gstreamer lame plugin:

$ sudo apt-get install gstreamer0.10-lame
$ gst-inspect-0.10 | grep lame
lame:  lame: L.A.M.E. mp3 encoder

Now MP3-ripping works. But you want to be able to tweak the settings as well. gst-inspect will give you all possible options you can use in your Audio Profile:

$ gst-inspect-0.10 lame
                        Boolean. Default: true Current: true
  emphasis            : Emphasis
                        flags: readable, writable
                        Boolean. Default: true Current: false
  vbr-quality         : VBR Quality
                        flags: readable, writable
                        Enum "GstLameQuality" Current: 5, "5"
                           (0): 0                - 0 - Best
                           (1): 1                - 1

Lots of settings there. According to the web, for good audio you will want something like this:

quality=0 (best)
mode=1 (joint stereo)
vbr=4 (variable bitrate, new)
vbr-quality=2 (near best)

If you think you have a problem with Joint Stereo, you should read Joint Stereo: The Myths ... and The Realities.

Next, you'll want xingmux and id3v2mux for more generally accepted headers. Without them iPods and other MP3-players may show incorrect times. id3v2mux is already installed (gstreamer0.10-plugins-good, you can check it with gst-inspect-0.10 id3v2mux). xingmux is in the gstreamer0.10-plugins-bad package:

$ sudo apt-get install gstreamer0.10-plugins-bad

Combine the above into one GStreamer pipeline, and you get:

audio/x-raw-int,rate=44100,channels=2 ! lame name=enc quality=0 mode=1 vbr=4 vbr-min-bitrate=128 vbr-max-bitrate=256 vbr-quality=2 ! xingmux ! id3v2mux

Put that in the MP3 Output Format Audio Profile and you're set. Happy ripping!

2008-01-14 - sql hacks / second column as new row

Let's say you have a table city_tbl with a column city_name.

city_id (UNIQUE) | city_name (UNIQUE) | city_population
1                | New York           | 8.214.426
2                | Peking             | 17.430.000

If you want to add name aliases — Beijing for Peking for example — you would normally add a city_name_tbl table and move the contents of the city_name column there.

city_id | city_name (UNIQUE)
1       | New York
2       | Beijing
2       | Peking

There are a couple of minor drawbacks to this approach. First, that you lose the preferred name — you'll have to add a city_name_order column. Second, that it seems like overkill if you only have a handful of aliases in a table with thousands of records. And lastly, you might not have a “friendly” interface to these records and you want to keep your data easily editable with an SQL-client utility.

If these drawbacks apply, maybe you would add a city_alias column instead.

city_id (UNIQUE) | city_name (UNIQUE) | city_alias (NULL) | city_population
1                | New York           | NULL              | 8.214.426
2                | Peking             | Beijing           | 17.430.000

Now you've lost the ability to easily search by name and alias at the same time, unfortunately. If you're not careful and remember to put OR city_alias LIKE ... in every single WHERE-clause that concerns the city name.

That's where the hack comes in. Create — or use an already defined — helper table that provides a counter. It needs at least a two rows and the lowest number must be -1. It's quite possible that you already have a table that satisfies those constraints.

counter_number (UNIQUE)

Create the following view (this one is for MS-SQL, it will be a slight bit different for other SQL's).

CREATE VIEW dbo.city_vw
SELECT c.city_id, (CASE WHEN h.helper = -1 THEN c.city_name ELSE c.city_alias END) AS city_name, city_population
FROM city_tbl c, (SELECT TOP 2 counter_number AS helper FROM counter_tbl ORDER BY counter_number) h
WHERE h.helper = -1 OR c.city_alias IS NOT NULL

With this hack you can SELECT city_id, city_name FROM city_vw to get all possible names and aliases in city_name. The second city name column has become a new row in the view.

Just remember that this is Wrong (tm). Don't complain to me when your CPU utilization goes through the roof ;-)

Update 2008-08-15

It's funny how one can overlook the obvious. The UNION statement does exactly what's needed here. Forget this note exists :)