The quest for plural formatting in Python

15 Dec '15

The more technical post ‘Custom formatter for Python strings’ was inspired by this problem and dives deeper into string formatting internals. This post sticks to localisation (l10n) issues.

I was looking for a sane way to pluralise nouns inside strings. Initially, we were passing the correct noun directly to format, but this is messy (an extra if at every call site) and doesn’t translate. Plurals can be tricky to get right; you can’t always just add an ‘s’ (think ‘cherry’ and ‘cherries’, or ‘mouse’ and ‘mice’).

Also, people kept getting the logic wrong. The singular applies when count == 1; testing count > 1 for the plural is wrong, as it’s ‘zero apples’, not ‘zero apple’*. (* Not a guarantee. This issue will be back later with a vengeance.)
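To make the pitfall concrete, here is a minimal sketch of the hand-rolled approach described above. The helper names describe and describe_wrong are hypothetical, invented only for illustration; they are not from the original code base.

```python
def describe(count):
    # Hypothetical helper: pick the noun by hand, then pass it to format.
    # The singular applies only when count is exactly 1.
    noun = 'apple' if count == 1 else 'apples'
    return '{} {}'.format(count, noun)


def describe_wrong(count):
    # The common mistake: testing count > 1 sends zero down the singular branch.
    noun = 'apples' if count > 1 else 'apple'
    return '{} {}'.format(count, noun)


for n in (0, 1, 2):
    print(describe(n), '|', describe_wrong(n))
# 0 apples | 0 apple
# 1 apple | 1 apple
# 2 apples | 2 apples
```

Either way, the English nouns are baked into the code, which is why this approach doesn’t translate.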

Voilà, la première tentative:

import string

class PluralFormatter(string.Formatter):
    def format_field(self, value, format_spec):
        # Spec looks like 'plural,<singular>,<plural>'
        if format_spec.startswith('plural,'):
            words = format_spec.split(',')
            if value == 1:
                return words[1]
            return words[2]
        # Anything else is formatted as usual
        return super().format_field(value, format_spec)

This implementation is okay. It does handle as many plural nouns as you want in a string, not just one. And you could use a lazy version where you just specify the plural ‘s’, but (spoiler alert) you shouldn’t:

>>> fmt = PluralFormatter()
>>> msg = '{0} {0:plural,bottle,bottles} on the wall'
>>> for bottle_count in (99, 3, 2, 1, 0):
...     print(fmt.format(msg, bottle_count))
99 bottles on the wall
3 bottles on the wall
2 bottles on the wall
1 bottle on the wall
0 bottles on the wall
>>> fmt.format('apple{0:plural,,s} and pear{1:plural,,s}', 3, 5)
'apples and pears'

What else is there? Easily translatable, right? Just translate the whole format string! Genius…

EN: '{0} {0:plural,bottle,bottles} on the wall'
DE: '{0} {0:plural,Flasche,Flaschen} auf der Wand'

… until you get to languages where the rule isn’t count == 1. Take Russian, which picks one of three forms: (n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2).
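For readers who don’t think in nested C ternaries, the expression quoted above can be transcribed into Python roughly as follows. The function name russian_plural_index and the FORMS tuple are my own illustration (using ‘бутылка’, Russian for bottle), not part of any library.

```python
def russian_plural_index(n):
    # Direct transcription of the C-style expression quoted above.
    if n % 10 == 1 and n % 100 != 11:
        return 0  # 1, 21, 31, ... but not 11
    if 2 <= n % 10 <= 4 and (n % 100 < 10 or n % 100 >= 20):
        return 1  # 2-4, 22-24, ... but not 12-14
    return 2      # everything else, including 0 and 5-20


# Illustrative word forms for 'bottle', indexed by the rule above.
FORMS = ('бутылка', 'бутылки', 'бутылок')

for n in (1, 2, 5, 11, 21, 22):
    print(n, FORMS[russian_plural_index(n)])
# 1 бутылка
# 2 бутылки
# 5 бутылок
# 11 бутылок
# 21 бутылка
# 22 бутылки
```

A two-word ‘singular,plural’ spec plainly has no room for the third form.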

Language is hard, man.

There are, of course, existing solutions. But the ones I’ve seen feel lacking. Even gettext doesn’t get this quite right in my opinion. But then again, I can barely understand that documentation.

I think the formatter solution still has the advantage. It’s inherently cleaner than any gettext hack with two messages, and as it has more granular information to work with than most other solutions, it should be able to do better. The current downfall is the pluralisation logic. And while nouns can depend on context, the next thing I’d try is to simply translate the noun separately and on its own.

import string
from gettext import ngettext

class PluralFormatter(string.Formatter):
    def format_field(self, value, format_spec):
        if format_spec.startswith('plural,'):
            _, singular, plural = format_spec.split(',')
            # Let the translation catalogue pick the right form for value
            return ngettext(singular, plural, value)
        # Anything else is formatted as usual
        return super().format_field(value, format_spec)

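Note that this version doesn’t work with the lazy ‘s’: a bare suffix gives translators nothing meaningful to translate, and gettext reserves the empty msgid for the catalogue header. Otherwise we have a winner.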
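As a quick sanity check, here is a sketch of the gettext-backed formatter in use. It assumes no translation catalogue is installed for the default domain, in which case ngettext falls back to returning the singular when n is 1 and the plural otherwise; with a real catalogue bound, the output would follow that language’s plural rule instead.

```python
import string
from gettext import ngettext


class PluralFormatter(string.Formatter):
    def format_field(self, value, format_spec):
        if format_spec.startswith('plural,'):
            _, singular, plural = format_spec.split(',')
            # ngettext consults the active catalogue; with none installed
            # (the assumption here) it uses the n == 1 fallback.
            return ngettext(singular, plural, value)
        return super().format_field(value, format_spec)


fmt = PluralFormatter()
msg = '{0} {0:plural,bottle,bottles} on the wall'
lines = [fmt.format(msg, n) for n in (2, 1, 0)]
for line in lines:
    print(line)
```

Without a catalogue this behaves like the first version, so existing English strings keep working while translations are still being written.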

In practice, I bet we’ll find a case where it doesn’t work. But this way we’ve thought of the documentation/publication/translation team and international users. All in all, a good solution.

Python, language
