Thursday, July 15, 2010

URLs and Japanese writing - Like Ike and Tina Turner: an untenable combination

ワワワ。東京三菱銀行。コム

This is what the internet must look like to those who were not born and raised on the Roman alphabet (which, judging from my analytics report, is a small percentage of readers). Those of us tutored in the ways of the ABCs are fortunate for two reasons: first, our compact 26-letter Roman alphabet is so compatible with the binary-centric world of computers; and second, the Internet was developed by ABCers.

For example, my name "Dan" in ASCII text converted to binary is

"010001000110000101101110"

But when converted to a katakana reading as ダン (katakana, by the way, is the Japanese writing system now primarily used for writing foreign words like my name) and written out as the ASCII-safe numeric codes "&#12480;&#12531;", the binary representation is

"00100110001000110011000100110010
00110100001110000011000000111011
00100110001000110011000100110010
00110101001100110011000100111011"

which is more than 5x the number of ones and zeros despite being only two characters. (Stored directly as UTF-8, ダン takes 48 bits, which is still double.)
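
For anyone who wants to check the ones and zeros above, here is a minimal Python sketch. It rests on one assumption of mine: that the katakana was first written out as ASCII-only numeric character references (&#12480;&#12531;) and then turned into bits, which is what the long string decodes to. The helper names to_bits and numeric_references are made up for this example, and the UTF-8 line is there only for comparison.

```python
def to_bits(data: bytes) -> str:
    """Return the bytes as a string of ones and zeros, 8 bits per byte."""
    return "".join(f"{byte:08b}" for byte in data)


def numeric_references(text: str) -> str:
    """Write each character as a decimal numeric reference, e.g. '&#12480;'."""
    return "".join(f"&#{ord(ch)};" for ch in text)


# Three ways of storing the name (the second is the assumed conversion).
name_ascii = "Dan".encode("ascii")
name_refs = numeric_references("ダン").encode("ascii")
name_utf8 = "ダン".encode("utf-8")

for label, data in [
    ("Dan as ASCII", name_ascii),
    ("ダン as numeric references", name_refs),
    ("ダン as UTF-8", name_utf8),
]:
    bits = to_bits(data)
    print(f"{label}: {len(bits)} bits")
    print(bits)
```

Each ASCII character costs 8 bits here because bytes are printed whole, leading zero included.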

(Serious diversion alert: This is probably an extremely boring, mildly technical and profoundly wrong explanation of what the previous example means - if you are already following, feel free to skip ahead to the ASCII smiley face.)

ASCII (American Standard Code for Information Interchange) is the name of the numerical code used by computers to understand numbers, letters and punctuation marks. Since "A" doesn't mean crap to a computer, it is converted to a code of zeros and ones. With a set of 52 letters (caps and lowercase), 10 digits, and a handful of punctuation marks, the grand total comes in at 95 printable characters, so the binary code for each can be short: 7 bits covers all of them. (Yes, since about 2007 UTF-8 has been the most common encoding on the web, but from the seven seconds I spent reading about it on Wikipedia, the two agree on the ASCII characters: UTF-8 stores those in a single byte and spends two to four bytes on everything else; go shout it from a mountain.)

Since ASCII doesn't have Kanji, Hiragana, Katakana, or the Cyrillic, Greek, Hebrew, Thai, or Korean alphabets (etc. ad nauseam), each of those characters gets a numerical code of its own (five digits for katakana, such as 12480 for ダ), which is then converted into binary. The crux of it all is that they use up loads more memory, since there are over 2,000 standard characters in Japanese and over 10,000 in Chinese.
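
As a rough illustration of the memory point, this Python sketch prints the numerical code (the Unicode code point) and the UTF-8 byte count for a few characters. The sample characters and the helper name describe are my own picks for the example, not anything from a particular tool.

```python
def describe(ch: str) -> tuple[int, int, bool]:
    """Return (code point, UTF-8 byte count, fits in 7-bit ASCII) for one character."""
    code_point = ord(ch)
    utf8_length = len(ch.encode("utf-8"))
    return code_point, utf8_length, code_point < 128


# One Latin letter, one accented letter, one katakana, one kanji, one hangul.
for ch in ["A", "é", "ダ", "東", "한"]:
    code_point, utf8_length, is_ascii = describe(ch)
    print(f"{ch}: code {code_point}, {utf8_length} byte(s) in UTF-8, ASCII: {is_ascii}")
```

Only the first character fits in ASCII; the rest need two or three bytes apiece in UTF-8.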

: )

Welcome back. All this goes to say that URLs are not necessarily the easiest things for non-ABC users to commit to memory. While we in the ABC world can see www.Honda.com, or (in the case of the opening salvo) www.Tokyo-MitsubishiBank.com, and remember it as a word, for others it is a matter of memorizing a string of characters that is meaningless at worst and begging to be misspelled at best. And we all know what Firefox, Chrome, and IE think about misspellings: Tethered Swimming (Simpsons fans, you're welcome).

So what does this mean for Japanese marketers eager to get customers to a website? It means "検索", or kensaku, which is Japanese for "search" (in the Google or Yahoo sense, that is).

Most TV or billboard ads don't give a URL. Instead, they show an image of a term entered into a search box with a kensaku button next to it, telling the viewer to go to the search engine of their liking and enter that term (you can of course search in any language you like) to find the webpage they are looking for.

Here's the kicker - this means your company must be on top of SEO (Search Engine Optimization*) like a casino on a card counter. You must know the Google search algorithm like a 12-year-old at a madrasa knows the Koran. Except it isn't published and it changes all the time (the algorithm that is, not the Koran). Imagine you make a billboard telling potential customers to search for Nakamura Construction, but then all of a sudden some ass-hat with the last name Nakamura murders the head of another famous construction company. Very likely, stories about Nakamura and his 9-iron-to-the-head antics will quickly outrank your website, sending your potential customers straight to some paparazzi orgy of innuendo, gory details, and questionable police photographs.

And you just paid for all that.

Next time you curse phonics, just think of that, and what your life would be like entering URLs like the one at the top.

*Enough with the intrusive asides - SEO is the modification of a webpage so that it ranks as high as possible in a search engine's results, ideally first. Ranking is generally influenced by textual content, page title, visitor traffic, external links, frequency of updates, and other factors.
