Commentary
10/20/2008

Time for Unicode?


The first programming languages used very restrictive character sets. FORTRAN used a character set of only 48 characters. The ASCII character set offers 128 characters, and languages like C took full advantage of it, finding an appropriate and intuitive use for most of them. Then things went into reverse as C tried to accommodate more restrictive character sets, first by standardizing trigraphs and later digraphs. For example, the trigraphs ??< and ??> represent { and }, and the digraphs <% and %> do the same.
These were treated with the enthusiasm one might reserve for a dead rat in a deli display case.
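To show what the digraphs look like in practice, here is a small C function written with them. This is an illustrative sketch, not code from the article; `sum_digraphs` and `LEN` are hypothetical names. Digraphs are ordinary tokens in C95 and later, so a modern C compiler should accept this as-is. Trigraphs are left out because GCC, for one, ignores them by default and converts them only in standard-conforming modes or with `-trigraphs`.

```c
/* Digraphs (C95 and later) are ordinary tokens:
   <% %> mean { },  <: :> mean [ ],  and %: means #.
   This compiles exactly as if written with the usual characters.
   LEN and sum_digraphs are hypothetical names for illustration. */
%:define LEN 3

int sum_digraphs(void)
<%
    int a<:LEN:> = <% 1, 2, 3 %>;   /* int a[LEN] = { 1, 2, 3 }; */
    int total = 0;
    for (int i = 0; i < LEN; i++)
        total += a<:i:>;            /* total += a[i]; */
    return total;                   /* 1 + 2 + 3 */
%>
```

The trailing comments give the conventional spelling of each digraph line, which is roughly how most readers end up decoding it anyway.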

With the D programming language, we continuously run up against the problem that ASCII has reached its expressivity limits. Trying to come up with a sensible character or character pair for a particular need is frustrating, as "all the good ones are taken" and unattractive ones like the C digraphs are what's left.

But then there's Unicode. Programming language minds, intellects vast and cool, regard this Unicode with envious eyes(!). There are plenty of characters that fit the bill nicely. There are the chevrons « and », which could serve as another set of brackets and relieve some of the ambiguity piled onto the overburdened ( ). There are the dot-product and cross-product characters · and ×, which would make lovely infix operator tokens for math libraries. Greek letters would be great for math variable names.
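For contrast, here is roughly what an ASCII-bound language offers instead. C cannot accept · or × as operator tokens, so dot and cross products become ordinary named functions. For Greek variable names, C99 and later do allow universal character names such as `\u03BB` (λ) in identifiers, but that is itself an escape-sequence workaround in the spirit of a digraph. This is a sketch for illustration only; `vec3`, `dot`, `cross`, and `scale` are hypothetical names, not from any library.

```c
/* Hypothetical 3-vector type, for illustration only. */
typedef struct { double x, y, z; } vec3;

/* Dot product: a Unicode-aware language might let this be written a · b. */
double dot(vec3 a, vec3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* Cross product: a Unicode-aware language might let this be written a × b. */
vec3 cross(vec3 a, vec3 b)
{
    vec3 r = { a.y * b.z - a.z * b.y,
               a.z * b.x - a.x * b.z,
               a.x * b.y - a.y * b.x };
    return r;
}

/* Scale v by lambda. \u03BB is the universal character name for the Greek
   letter lambda; C99 and later accept such names in identifiers, but the
   reader has to decode the escape to see the letter. */
vec3 scale(double \u03BB, vec3 v)
{
    vec3 r = { \u03BB * v.x, \u03BB * v.y, \u03BB * v.z };
    return r;
}
```

A call such as `dot(cross(a, b), c)` is what a math library would like to spell `(a × b) · c`.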

Alas, Unicode has a downside. Not all editors will display Unicode, and those that do make it hard to enter Unicode characters. A language designer might say, "That's OK; we'll just pick a digraph or trigraph for programmers who cannot edit Unicode source code." I think, though, that the C experience with trigraphs and digraphs shows this to be a failed path.
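On the compiler side, accepting the chevrons is a modest change at the lexer level. In UTF-8, « (U+00AB) and » (U+00BB) are the two-byte sequences C2 AB and C2 BB, so a lexer that treats them as brackets must examine two bytes where ( needs only one. The C fragment below is a minimal, hypothetical sketch of that step; `next_token` and the `TOK_*` names are invented for illustration and are not taken from the D compiler or any other real lexer. A digraph or trigraph fallback would add a second spelling for each of these tokens on top of this.

```c
/* Hypothetical token kinds, for illustration only. */
enum token { TOK_OTHER, TOK_LCHEVRON, TOK_RCHEVRON, TOK_LPAREN, TOK_RPAREN, TOK_END };

/* Classify the token starting at *p and advance *p past it.
   Input is assumed to be NUL-terminated UTF-8. Other non-ASCII
   characters are not decoded here; they fall through byte by byte
   as TOK_OTHER. */
enum token next_token(const unsigned char **p)
{
    const unsigned char *s = *p;

    if (s[0] == '\0')
        return TOK_END;

    /* U+00AB and U+00BB: lead byte 0xC2, then 0xAB or 0xBB. If s[0] is
       0xC2 at the very end of the string, s[1] is the NUL, so it is safe. */
    if (s[0] == 0xC2 && (s[1] == 0xAB || s[1] == 0xBB)) {
        *p = s + 2;
        return s[1] == 0xAB ? TOK_LCHEVRON : TOK_RCHEVRON;
    }

    /* Single-byte ASCII brackets need only one byte. */
    *p = s + 1;
    switch (s[0]) {
    case '(': return TOK_LPAREN;
    case ')': return TOK_RPAREN;
    default:  return TOK_OTHER;
    }
}
```

For example, the source text «a» yields TOK_LCHEVRON, TOK_OTHER, TOK_RCHEVRON, and then TOK_END. The harder problems are the ones the article raises: editors that cannot display or enter these characters.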

The D programming language has already driven stakes in the ground, saying it will not support 16-bit processors, processors that don't have 8-bit bytes, or processors with crippled, non-IEEE floating point. Is it time to drive in another stake and say the time for Unicode has come? Do your programming tools support Unicode source code?

What do you think?

 
