An Agreeable Procrastination – and the blog of Niels Kühnel

Code, the universe and everything

The New Localization Framework in Umbraco 5


This is a primer on the new localization framework in Umbraco 5.

Microsoft did a very fine job with System.Globalization when it comes to formatting numbers and dates for different locales. The localization framework in Umbraco builds on this by adding support for grammatical differences between (spoken) languages, such as differences in plural forms and word order.

Broadly speaking, the framework consists of:

  • A replacement for resource strings that allows texts to be combined from a multitude of layered sources.
  • A superset of the string.Format syntax with a domain specific template language tailored for handling grammatical differences between languages.

The main objective is to separate grammatical logic from code and to maximize the length of text passages to be localized to give translators maximum context and flexibility. All this while minimizing the number of redundant texts.

Let’s look at a simple example to illustrate why you need this framework. Suppose you want to greet the user in some system with the number of new messages like “Welcome Fletcher. You have 5 new messages”. You quickly realize that this doesn’t work with only 1 message and take a simple approach with

string.Format("Welcome {0}. You have {1} new message(s)", name, count).

That works for English but it’s not suitable for localization because other languages may not support the “word + (plural ending)” form very well. Besides, you probably don’t want your fancy Web 2.0 site to print messages that look like something from a DOS command prompt (y/n?).

Instead you might solve this with

string.Format("Welcome {0}. You have {1} new {2}", name, count, count == 1 ? "message" : "messages")

or

string.Format(Get("Greeting"), name, count, count == 1 ? Get("MessageSingular") : Get("MessagePlural"))

(Assuming that you have a Get method to get resource strings)

That will work great for most Western languages. You may, however, have shut out the French, because French uses the singular form for zero too (they have “0 message”). And it will definitely not work for Slavic languages, because they have much more complex rules for plural forms. See http://translate.sourceforge.net/wiki/l10n/pluralforms for reference (e.g. one of the Polish cases is n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20)).
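
To make the differences concrete, here is a minimal C# sketch of three plural rules written as plain functions. The class and method names are made up for illustration and are not part of Umbraco. Each function returns an index into that language's list of plural forms, and the Polish rule is assumed to follow the gettext-style formula from the reference above.

```csharp
using System;

public static class PluralFormsSketch
{
    // English: form 0 (singular) only for exactly 1, form 1 (plural) otherwise.
    public static int English(int n)
    {
        return n != 1 ? 1 : 0;
    }

    // French: 0 and 1 both take form 0 (singular).
    public static int French(int n)
    {
        return n > 1 ? 1 : 0;
    }

    // Polish: three forms. Form 1 is the case quoted in the text above.
    public static int Polish(int n)
    {
        if (n == 1) return 0;
        if (n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20)) return 1;
        return 2;
    }

    public static void Main()
    {
        foreach (var n in new[] { 0, 1, 2, 5, 12, 22 })
        {
            Console.WriteLine("{0}: en={1} fr={2} pl={3}", n, English(n), French(n), Polish(n));
        }
    }
}
```

Note how 12 and 22 end in the same digit but land in different Polish forms. That is exactly the kind of rule you do not want scattered around application code as `count == 1 ? ... : ...`.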

Not supporting a lot of exotic languages may be okay for your intended audience but the approach still has some disadvantages:

  1. You have now included language specific grammatical logic in your code. It’s very annoying to fix bugs related to this and it may be a never ending story as new languages are targeted.
  2. You have three different texts to translate for the same message and it’s not very clear in which context the atomic texts “message” and “messages” belong.
  3. You need to explain what {0}, {1} and {2} represent in the text and it becomes ghastly if you have even more parameters. That may entice you to split the text to reduce the number of parameters but then the translator will lose some of the flexibility.

In Umbraco 5 you’ll write

Localize("Greeting", new { Username = name, MessageCount = count }).

From a developer’s perspective that’s perfect because the code above very clearly expresses “I want to have the greeting text here, and I’ll pass these named parameters that it needs”. Now, from a translator’s perspective this is also great because the framework’s entire pattern syntax is available for creating a translation without any compromises.

The English version for this would be

Welcome {Username}. You have #MessageCount{1: 1 new message | {#} new messages}

The first thing you’ll notice is that named parameters are used instead of numbers. In code, anonymous types are used to specify the parameters, and in the text it’s clear what the parameters represent. It’s not needed in this example, but you can use the normal format specifiers after the name, so {MessageCount:N2} would become “5.00”.
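
As a rough illustration of the idea (not the framework's actual implementation), named placeholders with optional format specifiers can be resolved against an anonymous type using reflection. The `NamedFormatSketch` class and its `Format` method are hypothetical, and the sketch only handles plain `{Name}` and `{Name:Spec}` placeholders, not the switch syntax.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class NamedFormatSketch
{
    // Matches {Name} or {Name:FormatSpec}.
    private static readonly Regex Placeholder = new Regex(@"\{(\w+)(?::([^}]+))?\}");

    public static string Format(string pattern, object parameters, CultureInfo culture)
    {
        return Placeholder.Replace(pattern, match =>
        {
            var property = parameters.GetType().GetProperty(match.Groups[1].Value);
            if (property == null) return match.Value; // unknown name: leave the placeholder as-is

            object value = property.GetValue(parameters, null);
            string spec = match.Groups[2].Success ? match.Groups[2].Value : null;

            // Numbers and dates honour the format specifier; other values are just converted.
            var formattable = value as IFormattable;
            return formattable != null
                ? formattable.ToString(spec, culture)
                : Convert.ToString(value, culture);
        });
    }

    public static void Main()
    {
        var text = Format(
            "Welcome {Username}. You have {MessageCount:N2} new messages",
            new { Username = "Fletcher", MessageCount = 5 },
            CultureInfo.InvariantCulture);
        Console.WriteLine(text);
    }
}
```

Passing the culture explicitly keeps the number formatting in the hands of System.Globalization, so the same pattern yields “5.00” or “5,00” depending on the locale.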

The second thing is the “switch” construct that allows you to use different texts for different counts. It has the syntax

#[ParameterName] { [Condition 1] : Text 1 | [Condition 2] : Text 2 | Text in other cases }.

Within the switch body the special parameter {#} means the value of the parameter being switched on.

This should open some opportunities as you, without changing the code in your application, could make more interesting texts by changing it to

You have #MessageCount{0: no new messages | 1: one new message | < 10: {#} new messages | a lot of messages!}

There’s a pretty extensive syntax for the switch conditions, and all the plural rules from the plural forms reference linked above are supported.
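
For comparison, this is roughly the logic that the “You have #MessageCount{...}” pattern above keeps out of your application. It is a hand-written C# sketch under the assumption that cases are tried in order and the first matching condition wins; the `SwitchSketch` class is hypothetical and does not parse the pattern syntax.

```csharp
using System;
using System.Globalization;

public static class SwitchSketch
{
    // Each case pairs a condition with its text; {#} stands for the switched value.
    private static readonly (Func<int, bool> Condition, string Text)[] Cases =
    {
        (n => n == 0, "no new messages"),
        (n => n == 1, "one new message"),
        (n => n < 10, "{#} new messages"),
    };

    // Text used when no condition matches.
    private const string Otherwise = "a lot of messages!";

    public static string Evaluate(int messageCount)
    {
        string number = messageCount.ToString(CultureInfo.InvariantCulture);
        foreach (var (condition, text) in Cases)
        {
            if (condition(messageCount))
            {
                return "You have " + text.Replace("{#}", number);
            }
        }
        return "You have " + Otherwise;
    }

    public static void Main()
    {
        foreach (var n in new[] { 0, 1, 5, 42 })
        {
            Console.WriteLine(Evaluate(n));
        }
    }
}
```

With the framework, this table lives in the translated text rather than in compiled code, so each language can carry its own set of cases.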

Now, you may argue that translators don’t like writing curly braces but the syntax is just what the framework expects. Feel free to make your own easier-to-understand intermediate format for the translator or even create a graphical editor. The point is that grammatical logic is effectively removed from your application’s code.

(By the way, the framework is not locked to this syntax. It’s just the default parser. The framework works on ASTs and you can create your own grammar and parse that into these ASTs instead.)

Now let’s add one final thing to the example. Say we want some HTML in the text as we want it to be

“Welcome <span class='user-name'>Fletcher</span>. You have <span>5 new messages</span>”

With the string.Format approach you may either have to chop the text into tiny pieces or accept that the translations include markup. Neither is desirable. The former approach creates an immense number of texts with very unclear purposes and the latter makes you tired if you decide to change the markup after the software has been translated to 20 languages.

With the Umbraco localization framework you can write

Welcome <NameFormat: {Username}>. You have <MessageFormat: #MessageCount{…} >

And then dynamically specify the markup with

Localize("Greeting", new { Username = name, MessageCount = count, NameFormat = "<span class='user-name'>{#}</span>", MessageFormat = "<span>{#}</span>" }).

This gives at least the advantages that 1) you don’t have to split the text so the translator still has full context and flexibility and 2) the translated text is not tied to HTML and you could, in principle, use it on other devices because it just contains “format markers”.

With the default settings the parameter values are HTML encoded but the format you specify is not, so little Bobby <XSS would be greeted with “Welcome <span class='user-name'>Bobby &lt;XSS</span>”.
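
The encoding behaviour described above can be sketched in a few lines of C#. This is only an illustration of the principle, using the standard System.Net.WebUtility.HtmlEncode method; the `FormatMarkerSketch` class and its `Apply` method are hypothetical and not part of the framework.

```csharp
using System;
using System.Net;

public static class FormatMarkerSketch
{
    // The format comes from the developer and is trusted, so it is inserted as-is.
    // The value comes from user data, so it is HTML encoded before insertion.
    public static string Apply(string format, string value)
    {
        return format.Replace("{#}", WebUtility.HtmlEncode(value));
    }

    public static void Main()
    {
        Console.WriteLine(Apply("<span class='user-name'>{#}</span>", "Bobby <XSS"));
    }
}
```

The markup from the format survives untouched while the angle bracket in the user name is neutralized.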

In later blog posts I’ll dive deeper into the syntax and its features, which include reusable templates, switching on timespans, Roman numerals and much more (you can see an up to date’ish specification of the grammar here), but my next post will be about

Text sources

My next blog post will be about how text sources are structured, the default XML format and how new sources can be implemented and embedded in assembly manifests. One of the main benefits over ordinary resource strings is that even if texts are embedded in assemblies, texts for other languages can be added from XML files, databases etc. These other sources can also replace/correct texts in existing languages. You’ll also see how texts can be arranged in namespaces to avoid clashes and how properties of MVC view models are automatically mapped to text keys without the use of attributes.
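
Until then, the layering idea can be pictured with a small C# sketch. It assumes, purely for illustration, that each source is a key-to-text dictionary and that sources added later take precedence; the framework's real source model may well differ, and `LayeredTextsSketch` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

public static class LayeredTextsSketch
{
    // Walks the layers from last to first so that later sources override earlier ones.
    // Returns null when no layer contains the key.
    public static string Lookup(IList<IDictionary<string, string>> layers, string key)
    {
        for (int i = layers.Count - 1; i >= 0; i--)
        {
            string text;
            if (layers[i].TryGetValue(key, out text)) return text;
        }
        return null;
    }

    public static void Main()
    {
        // Texts shipped inside an assembly, including a typo.
        var embedded = new Dictionary<string, string>
        {
            { "Greeting", "Welcom {Username}" },
            { "Farewell", "Goodbye {Username}" }
        };
        // A correction supplied later from another source, e.g. an XML file.
        var xmlFile = new Dictionary<string, string>
        {
            { "Greeting", "Welcome {Username}" }
        };

        var layers = new List<IDictionary<string, string>> { embedded, xmlFile };
        Console.WriteLine(Lookup(layers, "Greeting"));
        Console.WriteLine(Lookup(layers, "Farewell"));
    }
}
```

The corrected greeting wins without recompiling the assembly, while the untouched farewell still falls through to the embedded text.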

Even if you don’t expect your application to be translated to other languages you can still benefit from the framework as it greatly helps you maintain your texts without hacking your code.

Remember: “Language is vivid. Don’t let computer languages keep it down!”


Written by niels.kuhnel

May 12, 2011 at 3:04 am

Posted in Uncategorized

18 Responses


  1. Hi Niels

    Can you point me towards any documentation for the grammar?

    Thanks

    Kola

    June 29, 2011 at 7:46 am

    • Sure. It’s in the repo at umbraco.codeplex.com (Source/Libraries/Umbraco.Foundation/Localization/Parsing/Default Parser Syntax.txt)

      For convenience you can also just download here

      niels.kuhnel

      June 29, 2011 at 10:19 am

  2. Thanks!

    Kola

    June 29, 2011 at 1:18 pm

  3. Very nice! Best framework for such things that I’ve seen. But is it also possible to use this outside of Umbraco? I guess this would help for any application.

    Remy Blättler

    August 17, 2011 at 10:17 am

  4. Thanks 🙂 When Umbraco 5 is released, the “Umbraco Application Framework” (name uncertain) will be part of it, and that will include the localization framework and some other nifty features to be used in any application.

    niels.kuhnel

    August 17, 2011 at 10:35 am

  5. So it’s not part of the CTP yet? Any dates for the release already?

    Remy Blättler

    August 17, 2011 at 10:58 am

  6. Oh, yes it’s also part of the CTP. Forgot to mention that 🙂

    niels.kuhnel

    August 17, 2011 at 11:08 am

  7. I’ve been trying but to no avail. Supposedly it accesses the LocalizationEntries.xml currently at ~/App_Data/Umbraco, although the code at LocalizationWebconfig.cs seems to point to ~/App_Config by default. Neither location works for me, even after a call to LocalizationWebConfig.SetupDefaultManager just before calling Localize(). I am eagerly waiting for that next post; meanwhile, do you have a hint for me?

    • I’m honeymooning in Barbados and I’ll get back to you next week. Cool?

      niels.kuhnel

      January 19, 2012 at 3:16 am

    • All right. Umbraco 5 does in fact look in ~/App_Data/Umbraco/LocalizationEntries.xml but the texts are not used because they have a wrong “namespace” for the execution context.

      Just change the root tag to , and the texts will be used.

      I tried adding

      The number is {Number:N2}

      and then

      @using Umbraco.Framework.Localization
      bla.., bla.. bla..
      @("TestKey".Localize(parameters: new {Number = 23}))

      in a razor template, and that worked.

      Thanks for noting 🙂

      niels.kuhnel

      February 1, 2012 at 1:07 pm

  8. Niels, I really, REALLY like what you guys have done for the localization. I’m attempting to leverage it in a project I’m working on, but I don’t want the “extra” cruft that Umbraco.Framework needs, like JSON references, etc

    Any thoughts on slicing that out into its own self contained library, not dependent on 3rd party libs?

    Eric Newton

    February 23, 2012 at 5:07 pm

    • Thanks, I’m happy to hear you’re using it 🙂 You can use Umbraco.Framework standalone if you’re only using Localization (i.e. you don’t need other references). I’m currently preparing a post where I’m using a Google spreadsheet as text source and in that project I’m only referencing Umbraco.Framework.

      niels.kuhnel

      February 24, 2012 at 10:49 am

  9. hi Niels, just to clarify, does your example use the “LocalizationEntries.xml” file or the Dictionary section in the CMS. i’m not having any luck getting this working on the u5.1 nightly build. thanks

    andrew

    March 7, 2012 at 1:12 am

    • It uses “LocalizationEntries.xml”. What is missing from my previous comment is that you have to write Namespace="" in the root tag. I haven’t tested with 5.1 but does that work there too?
      I’m sorry that the code in the release needs clarification. We’ll resolve that in the “Go document what code does, and align code with documentation”-phase next 🙂

      niels.kuhnel

      March 7, 2012 at 2:21 am


