Notes:
In the following I will mostly refer to texts written in Latin and Cyrillic scripts, as used in Europe; however, the problem described here may apply to other scripts as well, all over the world.
In this document, “writing a language correctly” means using all national characters of that language, including accented characters, where applicable.
In current alphabetical writing systems,
any of the letters A, А, Å or Ă is just another letter.
None of them is more special than another.
Sending an SMS text message over a GSM network has become a trivial practice for most of us. Although relatively cheap, sending an SMS text message still has a cost. The problem is that this cost is different, depending on the language used by the sender – even by the same sender, using the same device.
If your native language happens to be one from the “western” part of Europe, good for you: you can write your language correctly, without spelling or grammar restrictions. However, if your native language happens to be one from the “eastern” part of Europe, bad luck: if you want to write your language correctly, it costs you more.
The difference comes from the way the SMS part of the GSM standard was originally developed: the GSM character set only covers a few so-called Western languages. Writing a message correctly in any non-Western language may double the cost of that message, or more. The only way of keeping the SMS text message cost at the “normal” price in every situation when using the Latin script is to use exclusively non-accented characters, thus dropping every language-specific character from the text. This may lead to language crippling, for the sole reason of improper SMS technical protocol implementation and/or improper operator charging mechanism.
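To make the workaround described above concrete, here is a minimal Python sketch of what “using exclusively non-accented characters” amounts to. The function name `strip_accents` is my own, chosen for illustration; it is not taken from any phone or operator software, and real devices may do this differently (or leave it to the user).

```python
import unicodedata


def strip_accents(text):
    """Drop diacritics by decomposing each letter and discarding the marks.

    Illustrative only: letters with no Unicode decomposition (for example
    ø or ł) pass through unchanged, and Cyrillic text is not transliterated.
    """
    decomposed = unicodedata.normalize("NFD", text)
    # Combining marks have a non-zero combining class; keep everything else.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents("Țară și ștampilă"))
print(strip_accents("Čeština"))
```

The result is readable, but it is no longer the language as it should be written, which is exactly the crippling referred to above.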
A Short Message Service (SMS) text message is sent over a GSM network as a stream of 1120 bits of data. The 3GPP TS 23.038 technical specification, which defines the language-specific requirements for GSM, describes three methods to represent an alphabet:
When sending a message that includes any character (accented or not) that happens to fall into the default alphabet, everything is “normal”. In this mode, a single SMS text message can include 160 characters (1120 divided by 7).
The problem occurs when sending a message that includes one or more regional language characters that are outside the default alphabet. The presence of even one of these characters causes the whole message to be encoded in UCS-2 fixed-width encoding, i.e. two bytes for every character from the Unicode base plane. In this mode, a single SMS text message can include only 70 characters (1120 divided by 16).
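The arithmetic in the two paragraphs above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code of any actual phone: the function names are mine, the character tables are my transcription of the default alphabet and its extension table from 3GPP TS 23.038, and the 48-bit header assumed for messages split into several parts is not discussed in this article.

```python
import math

# GSM 7-bit default alphabet (basic table), transcribed by hand; the escape
# code itself is left out. Treat this transcription as an assumption.
GSM7_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Extension table: each of these costs two septets (escape + code).
GSM7_EXTENDED = set("^{}\\[~]|€\f")

PAYLOAD_BITS = 1120       # one SMS payload, as stated in the text
CONCAT_HEADER_BITS = 48   # assumption: header used when a text is split


def sms_units(text):
    """Return (encoding, units): septets for GSM-7, 16-bit units for UCS-2."""
    if all(ch in GSM7_BASIC or ch in GSM7_EXTENDED for ch in text):
        units = sum(2 if ch in GSM7_EXTENDED else 1 for ch in text)
        return "GSM-7", units
    # A single character outside the default alphabet switches everything.
    return "UCS-2", len(text.encode("utf-16-be")) // 2


def single_sms_capacity(encoding):
    """Characters that fit in one message: 1120 / 7 or 1120 / 16."""
    return PAYLOAD_BITS // (7 if encoding == "GSM-7" else 16)


def messages_needed(text):
    """Rough count of messages; ignores that a two-unit character
    cannot be split across two parts."""
    encoding, units = sms_units(text)
    bits = 7 if encoding == "GSM-7" else 16
    if units <= PAYLOAD_BITS // bits:
        return 1
    per_part = (PAYLOAD_BITS - CONCAT_HEADER_BITS) // bits
    return math.ceil(units / per_part)


plain = "a" * 100
accented = "ă" + "a" * 99
print(sms_units("Ñ and Å"), sms_units("Ă and Č")[0])
print(messages_needed(plain), messages_needed(accented))
```

Under these assumptions, a 100-character text written only with default-alphabet characters fits in one message, while the same length of text containing a single ă needs two.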
Note: an example of how an SMS-related PC application treats different characters of different languages can be seen at the end of this document.
This may lead to some bizarre counts and prices for messages containing the same number of characters, but different national character types. Here is an example:
As can be seen, attempting to respect the culture of one language increases the number of messages required and the final price, while attempting the same for another language does not. The actual number of messages and the resulting cost may vary, depending on the particular language, number of characters, characters used, etc.
I agree that in practice many users will never write accented characters in SMS text messages using their phone keyboard (especially on phones with a “traditional” keypad), but this discussion is for those who are willing to do that and are discouraged by the unjustified hassle. More than that, some may wish to send SMS text messages from their computer using the PC application that came with the mobile device, in which case writing a native language correctly may be simply natural.
A proper solution to this problem is a complex matter, but first of all there should be a desire to start looking for one. At first glance it seemed that some solutions might exist, but none truly resolves the discrimination problem raised here, except probably the last of the ones listed below:
The fundamental idea is that all languages of the world must be treated equally by any technology. It is simply absurd to consider an alphabet that begins with ABC to be more important than an alphabet that begins with АБВ, or to consider characters Å or Ñ to be more important than characters Ă or Č.
In the summer of 2008 I complained about this issue to the Commissioner for Multilingualism in the European Commission. After a while I received the answer which I quote below in italics.
Note: at that time, my report also included an issue strictly related to the Romanian language (the ș and ț issue); I will take the liberty to omit the answer related to that issue, as it is of no relevance for the scope of this article.
Dear Mr Secară,
I would like to thank you for drawing the attention of the Commission to annoyances you are confronted with the use of language specific accentuated characters on the GSM network. As you clearly describe it, the problem only occurs in specific cases, however the promotion of multilingualism is at the heart of the priorities of Commissioner Orban.
You have correctly identified the dual nature of the problem: one being technical with the inadequate handling of 2 specific characters for the Romanian language, the other being the charging mechanisms for SMS.
[...]
For the latter issue, charging plans are entirely under the responsibility of telecom operators. The charging plans for SMS are subject to competition pressures between operators, the result being that such costs are driven down. Commissioner Reding is also applying continuous pressure on mobile operators to reduce costs of SMS. Without entirely removing the annoyances you mentioned this would at least minimize their impact. On the other hand in our Annual Information Society Report 2008 we identified that information society developments in Romania were still at an early stage with the resulting benchmarking indicators being close to the bottom of the EU rankings. We are continuing our efforts to encourage Member States to reduce the gaps between high and low performers.
I would like to once again thank you for your detailed and accurate report.
With my best wishes for your efforts
Anne Bucher
Head of Unit INFSO-C1
"Lisbon Strategy and i2010"
Directorate General Information Society and Media
Office: BU25 01/131
European Commission
I thank Anne Bucher very much for the kind response, but I consider it not good enough.
It seems I was not very clear in expressing my idea to the European Commission, but I consider the SMS text message issue to be the responsibility of the mobile device manufacturers, not the mobile operators. It is the mobile device and the PC software accompanying it that make the difference between, for example, the Latin alphabet as used by (say) the French language and the Cyrillic alphabet as used by (say) the Bulgarian language. It is the device that accordingly changes the maximum number of characters per SMS text message in that session. It is the device that truly discriminates and in the end has a cultural and economic impact on the user.
I suppose that an electronic device must meet some criteria in order to be allowed for commercial distribution in the EU market; apart from not being toxic, not inflammable, etc., it should also not be culturally discriminatory. Why not?
At least in Europe, I expect discrimination arising from multilingualism to be regulated by the European Commission as well. I find it hard to believe (although not impossible) that all GSM operators from non-Western language countries will rush to change their charging mechanism just for cultural reasons.
On the other hand, I also believe that the issue described here is caused rather by a lack of interest in internationalization matters among those who originally set the GSM technical specifications (and is also the fault of those who later did too little to properly correct the technical problems or weaknesses).
Users should be able to choose freely how to treat their language, whether carelessly or respectfully, but neither choice should be constrained by poorly designed technology. Nowadays technology is advanced enough to cope successfully with any cultural demand.
Notes:
All snapshots were taken with the locale of the PC set to my language, Romanian.
I used the Motorola Phone Tools application for this example just because it has a very intuitive interface in relation to the subject discussed here.