Time for Unicode? - InformationWeek
Commentary
10/20/2008
12:00 AM



The first programming languages used very restrictive character sets; FORTRAN's had only 48 characters. The ASCII character set offers 128 characters, and languages like C took full advantage of it, finding an appropriate and intuitive use for most of them. Then things went into reverse as C tried to accommodate more restrictive character sets, first by standardizing trigraphs and later digraphs. For example, the trigraphs ??< and ??> stand for { and }, and so do the digraphs <% and %>.
These were treated with the enthusiasm one might reserve for a dead rat in a deli display case.
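For readers who never had to deal with them, here is a minimal C sketch of what trigraph processing does. In the first phase of translation, each of the nine trigraphs (two question marks followed by one of = ( / ) ' < ! > -) is replaced by a single character, even inside string literals and comments. The names replace_trigraphs, trigraph_replacement and digraph_demo are hypothetical illustrations, not code from any real compiler. Digraphs work differently: they are alternative spellings of tokens, added to C in 1995, so the last declaration below is ordinary C.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the character that a trigraph stands for, given the character that
   follows the two question marks, or '\0' if there is no such trigraph. */
static char trigraph_replacement(char c)
{
    switch (c) {
    case '=':  return '#';
    case '(':  return '[';
    case '/':  return '\\';
    case ')':  return ']';
    case '\'': return '^';
    case '<':  return '{';
    case '!':  return '|';
    case '>':  return '}';
    case '-':  return '~';
    default:   return '\0';
    }
}

/* Hypothetical helper: copy src to out, replacing every trigraph with the
   character it represents, scanning left to right as translation phase 1 does.
   The output is never longer than the input, so out needs strlen(src) + 1 bytes. */
void replace_trigraphs(const char *src, char *out)
{
    assert(src != NULL && out != NULL);
    while (*src != '\0') {
        if (src[0] == '?' && src[1] == '?') {
            char r = trigraph_replacement(src[2]);
            if (r != '\0') {
                *out++ = r;   /* three source characters become one */
                src += 3;
                continue;
            }
        }
        *out++ = *src++;      /* ordinary character: copy as-is */
    }
    *out = '\0';
}

/* Digraphs are different: <: :> <% %> are alternative spellings of the tokens
   [ ] { }, recognized by the tokenizer rather than substituted as text, so
   this declaration is ordinary C (C95 and later). */
static const int digraph_demo<:3:> = <%1, 2, 3%>;
```

Given the string "\?\?=define X", replace_trigraphs produces "#define X". The \? escapes matter: written as a plain trigraph, the sequence would be rewritten by a compiler with trigraphs enabled before the function ever saw it, which is exactly the kind of surprise that made trigraphs unpopular.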

With the D programming language, we constantly run up against the problem that ASCII has reached its expressive limits. Trying to come up with a sensible character or character pair for a particular need is frustrating, because "all the good ones are taken" and what's left are unattractive ones like the C digraphs.

But then there's Unicode. Programming language minds, intellects vast and cool, regard this Unicode with envious eyes(!). There are plenty of characters that fit the bill nicely. There are the chevrons « and », which could serve as another set of brackets and relieve some of the ambiguities that overburden ( ). There are the dot-product and cross-product characters · and ×, which would make lovely infix operator tokens for math libraries. And the Greek letters would be great for math variable names.
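As a rough sketch of the compiler side, assuming the source text is UTF-8, the following C code decodes one code point and classifies it as one of the characters mentioned above. The token names (TOK_LCHEVRON, TOK_CROSS_PRODUCT and so on), the functions utf8_decode and classify_char, and the choice of the basic Greek alphabet (U+0391 to U+03A9 and U+03B1 to U+03C9) are hypothetical illustrations, not part of D or any other language.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical token kinds for the characters discussed above. */
enum token_kind {
    TOK_INVALID,        /* malformed UTF-8 */
    TOK_OTHER,          /* anything this sketch does not special-case */
    TOK_LCHEVRON,       /* « U+00AB, opening bracket */
    TOK_RCHEVRON,       /* » U+00BB, closing bracket */
    TOK_DOT_PRODUCT,    /* · U+00B7, infix operator */
    TOK_CROSS_PRODUCT,  /* × U+00D7, infix operator */
    TOK_GREEK_LETTER    /* basic Greek letter, usable in identifiers */
};

/* Decode one UTF-8 code point from s (len bytes available) into *cp.
   Returns the number of bytes used, or 0 if the sequence is malformed. */
static size_t utf8_decode(const unsigned char *s, size_t len, uint32_t *cp)
{
    size_t n;
    uint32_t lowest;   /* smallest code point allowed for this length */

    if (len == 0)
        return 0;
    if (s[0] < 0x80) {                      /* 1 byte: plain ASCII */
        *cp = s[0];
        return 1;
    } else if (s[0] >= 0xC2 && s[0] <= 0xDF) {
        n = 2; lowest = 0x80;    *cp = s[0] & 0x1F;
    } else if (s[0] >= 0xE0 && s[0] <= 0xEF) {
        n = 3; lowest = 0x800;   *cp = s[0] & 0x0F;
    } else if (s[0] >= 0xF0 && s[0] <= 0xF4) {
        n = 4; lowest = 0x10000; *cp = s[0] & 0x07;
    } else {
        return 0;                           /* stray continuation or invalid lead byte */
    }
    if (len < n)
        return 0;                           /* truncated sequence */
    for (size_t i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;                       /* expected a continuation byte */
        *cp = (*cp << 6) | (s[i] & 0x3F);
    }
    /* Reject overlong encodings, UTF-16 surrogates, and values past U+10FFFF. */
    if (*cp < lowest || (*cp >= 0xD800 && *cp <= 0xDFFF) || *cp > 0x10FFFF)
        return 0;
    return n;
}

/* Classify the character at the start of s (len bytes available) and store
   its length in bytes in *consumed; *consumed is 0 for malformed input. */
enum token_kind classify_char(const unsigned char *s, size_t len, size_t *consumed)
{
    uint32_t cp = 0;
    size_t n;

    assert(s != NULL && consumed != NULL);
    n = utf8_decode(s, len, &cp);
    *consumed = n;
    if (n == 0)
        return TOK_INVALID;

    switch (cp) {
    case 0x00AB: return TOK_LCHEVRON;
    case 0x00BB: return TOK_RCHEVRON;
    case 0x00B7: return TOK_DOT_PRODUCT;
    case 0x00D7: return TOK_CROSS_PRODUCT;
    default:     break;
    }
    if ((cp >= 0x0391 && cp <= 0x03A9) || (cp >= 0x03B1 && cp <= 0x03C9))
        return TOK_GREEK_LETTER;
    return TOK_OTHER;
}
```

Each of the four symbols takes two bytes in UTF-8 (« is C2 AB, × is C3 97), so a lexer has to consume a multibyte sequence as a single token. On malformed input this sketch returns TOK_INVALID with *consumed set to 0; a real lexer would report an error and resynchronize instead of looping on the same byte.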

Alas, Unicode has a downside. Not all editors will display Unicode, and even those that do make Unicode characters hard to enter. A language designer might say, "That's OK, we'll just provide a digraph or trigraph for programmers who cannot edit Unicode source code." I think, though, that the C experience with trigraphs and digraphs shows this to be a failed path.

The D programming language has already driven stakes in the ground, saying it will not support 16-bit processors, processors that don't have 8-bit bytes, or processors with crippled, non-IEEE floating point. Is it time to drive in another stake and declare that the time for Unicode has come? Do your programming tools support Unicode source code?
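Those existing stakes can be mirrored, roughly, as C11 compile-time checks. The sketch below is an approximation chosen for illustration, not how D itself enforces anything: the float.h macros can show that float and double have the IEEE 754 binary32 and binary64 shape, but they cannot prove that every operation rounds as IEEE requires, and "no 16-bit processors" is approximated as 32-bit or wider pointers and int. The helper double_bits is hypothetical.

```c
#include <assert.h>   /* static_assert (C11) */
#include <float.h>
#include <limits.h>
#include <stdint.h>
#include <string.h>

/* No processors without 8-bit bytes. */
static_assert(CHAR_BIT == 8, "bytes must be 8 bits");

/* No 16-bit processors: approximated here as 32-bit or wider pointers and int. */
static_assert(sizeof(void *) >= 4, "need at least a 32-bit address space");
static_assert(INT_MAX >= 2147483647, "need at least a 32-bit int");

/* No non-IEEE floating point: float and double must have the IEEE 754
   binary32 and binary64 parameters. */
static_assert(FLT_RADIX == 2, "floating point must be binary");
static_assert(FLT_MANT_DIG == 24 && FLT_MAX_EXP == 128, "float must be IEEE single");
static_assert(DBL_MANT_DIG == 53 && DBL_MAX_EXP == 1024, "double must be IEEE double");
static_assert(sizeof(double) == sizeof(uint64_t), "double must occupy 64 bits");

/* Hypothetical helper: return the raw bit pattern of a double. */
uint64_t double_bits(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);   /* well-defined way to reinterpret the bytes */
    return bits;
}
```

On a platform that violated any of these assumptions, compilation would stop at the first failing static_assert. Where they hold, double_bits(1.0) returns 0x3FF0000000000000, the IEEE binary64 encoding of 1.0.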

What do you think?
