Commentary
10/20/2008
12:00 AM

Time for Unicode?


The first programming languages used a very restrictive character set. FORTRAN used a character set of only 48 characters. The ASCII character set offers 128 characters. Languages like C took full advantage of it, finding an appropriate and intuitive use for most of them. Then things went into reverse as C tried to accommodate more restrictive character sets by standardizing on trigraphs, and later on digraphs. For example, the trigraphs ??< and ??> represent { and }, and the digraphs <% and %> do the same. These were treated with the enthusiasm one might reserve for a dead rat in a deli display case.
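
To make the trigraph mechanism concrete, here is a minimal sketch of the substitution a C preprocessor performs in its first translation phase. It is written in Python purely for illustration; the names `TRIGRAPHS` and `expand_trigraphs` are made up for this example and do not come from any compiler. Digraphs are left out because they are alternative tokens rather than a textual replacement.

```python
import re

# The nine trigraph sequences defined by standard C and the
# characters they stand for.
TRIGRAPHS = {
    "??=": "#", "??(": "[", "??/": "\\", "??)": "]", "??'": "^",
    "??<": "{", "??!": "|", "??>": "}", "??-": "~",
}

# Two question marks followed by one of the nine selector characters.
_PATTERN = re.compile(r"\?\?[=(/)'<!>\-]")


def expand_trigraphs(source: str) -> str:
    """Replace every trigraph in `source`, scanning left to right.

    Hypothetical helper for illustration only. Like the real
    preprocessor phase, it replaces trigraphs everywhere, including
    inside string literals and comments.
    """
    return _PATTERN.sub(lambda m: TRIGRAPHS[m.group(0)], source)


print(expand_trigraphs("int main(void) ??< return 0; ??>"))
# prints: int main(void) { return 0; }
```

The fact that the replacement happens blindly, even inside string literals, is one reason trigraphs were so widely disliked.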

With the D programming language, we continually run up against the problem that ASCII has reached its expressivity limits. Trying to come up with a sensible character or character pair for a particular need is frustrating, as "all the good ones are taken" and unattractive ones like the C digraphs are what's left.

But then there's Unicode. Programming language minds, intellects vast and cool, regard this Unicode with envious eyes(!). There are plenty of characters that fit the bill nicely. There are the chevrons « and » which serve as another set of brackets to lighten the overburdened ambiguities of ( ). There are the dot-product and cross-product characters · and × which would make lovely infix operator tokens for math libraries. The Greek letters would be great for math variable names.
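
As a rough illustration of what this could look like, the sketch below uses Python 3 rather than D, because Python already accepts Greek letters in identifiers. It does not accept × as an operator token, which the `accepts` helper (a name invented for this example) demonstrates by asking the parser directly.

```python
import math

# Greek letters are legal identifier characters in Python 3.
π = math.pi
θ = π / 6
r = 2.0

# Polar to Cartesian conversion, written much as it would be on paper.
x = r * math.cos(θ)
y = r * math.sin(θ)


def accepts(source: str) -> bool:
    """Return True if Python's parser accepts `source` as an expression.

    Hypothetical helper for illustration; it only parses, it does not
    evaluate, so the names in `source` need not be defined.
    """
    try:
        compile(source, "<sketch>", "eval")
        return True
    except SyntaxError:
        return False


print(accepts("α + β"))  # prints: True
print(accepts("a × b"))  # prints: False
```

So Unicode identifiers are already practical in at least one mainstream language, while Unicode operator tokens such as × remain a language design decision.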

Alas, Unicode has a downside. Not all editors will display Unicode, and those that do make it hard to enter Unicode characters. A language designer might say, that's ok, we'll just pick a digraph or trigraph for those programmers who cannot edit Unicode source code. I think, though, that the C experience with trigraphs and digraphs shows this to be a failed path.

The D programming language has already driven stakes in the ground, saying it will not support 16-bit processors, processors that don't have 8-bit bytes, and processors with crippled, non-IEEE floating point. Is it time to drive another stake in and say the time for Unicode has come? Do your programming tools support Unicode source code?

What do you think?

 
