Escaping URLs

To ensure that URLs can be displayed regardless of the locale, each special character is replaced by a percent sign followed by its two-digit hexadecimal value. A space, for example, becomes %20.
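
As a minimal illustration of that mapping, the following sketch prints the escaped form of a few single characters. The helper name percent_encode_char is hypothetical and only serves this example; it assumes single-byte (ASCII) input.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Hypothetical helper: returns the percent-escaped form of a single byte,
# i.e. a literal '%' followed by the two-digit hexadecimal character code.
sub percent_encode_char {
    my ($char) = @_;
    return sprintf('%%%02X', ord($char));
}

for my $char (' ', '#', '%') {
    print "'$char' becomes ", percent_encode_char($char), "\n";
}
```

Run as is, this prints %20 for the space, %23 for the hash sign and %25 for the percent sign itself.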

The following two Perl scripts were created to encode and decode URLs.

Note: This procedure is defined in RFC 2396, "Uniform Resource Identifiers (URI): Generic Syntax", which has since been obsoleted by RFC 3986.

Escaping

#!/usr/bin/perl

use strict;
use warnings;
use English;

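my $url = $ARGV[0];
die "Usage: $PROGRAM_NAME URL\n" unless defined $url;

# walk through the URL one character at a time
for (my $i = 0; $i < length($url); ++$i) {
    my $char = substr($url, $i, 1);

    # substitute: treat '+' as a space, which is then escaped as %20 below
    if ($char eq '+') {
        $char = ' ';
    }

    # translate: escape everything outside the set of characters left as-is
    if ($char =~ m/^([^a-zA-Z0-9\-_\/.,:?&=])$/) {
        print '%' . unpack('H2', $1);

    } else {
        print $char;
    }
}
print "\n";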

De-escaping

#!/usr/bin/perl

use strict;
use warnings;
use English;

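my $url = $ARGV[0];
die "Usage: $PROGRAM_NAME URL\n" unless defined $url;

# replace each %XX sequence with the byte it encodes
# ('C' is an unsigned char, so values above 127 do not wrap)
$url =~ s/%([0-9A-Fa-f]{2})/pack('C', hex($1))/eg;

print $url . "\n";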
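
To check that escaping and de-escaping are inverses of each other, a round trip can be sketched as follows. The subroutine names escape_url and unescape_url are hypothetical wrappers around the logic of the two scripts, and example.com is a placeholder host. The '+' substitution is deliberately left out here, because it would turn a literal '+' into a space and make the round trip lossy.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Hypothetical wrapper: escape every byte outside the set of characters
# that the escaping script leaves untouched.
sub escape_url {
    my ($url) = @_;
    $url =~ s/([^a-zA-Z0-9\-_\/.,:?&=])/'%' . unpack('H2', $1)/eg;
    return $url;
}

# Hypothetical wrapper: turn each %XX sequence back into the byte it encodes.
sub unescape_url {
    my ($url) = @_;
    $url =~ s/%([0-9A-Fa-f]{2})/pack('C', hex($1))/eg;
    return $url;
}

my $original = 'http://example.com/a path/?q=50%';
my $escaped  = escape_url($original);

print "$escaped\n";
print unescape_url($escaped) eq $original ? "round trip ok\n" : "mismatch\n";
```

For this input the escaped form is http://example.com/a%20path/?q=50%25, and decoding it yields the original string again.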