MessageFormat 2011q1

For ICU 4.8, we want to tackle some of the MessageFormat (and related) work that has piled up. For more ideas and to-do items, see the Message Formatting Redesign page and the common parent page.

    • Simple message pattern string parser as separate, public API

    • Use the message pattern parser in the implementation of MessageFormat, ChoiceFormat, PluralFormat, and SelectFormat.

    • We probably want to add an ICU version of ChoiceFormat in ICU4J.

    • Fix quoting and apostrophe handling.

    • Fix syntax and update docs. In particular, define where whitespace is allowed, which characters are allowed in argument names and plural form keys, the syntax of argument numbers and plural offsets (only ASCII digits?), etc.

    • Java: Probably need to do something about compatible serialization.

    • Should MessagePattern use PluralRules? Or replace it with a simple interface for mapping double->String?

      • No. We moved the dependency on PluralRules out of MessagePattern and into PluralFormat.

    • Is MessageFormat.format() supposed to be thread-safe?

      • No, documented as not being thread-safe.
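To make the "simple interface for mapping double->String" idea from the list above concrete, here is a minimal sketch. The names KeywordSelector, ENGLISH_LIKE and choose() are hypothetical and are not taken from the ICU API; the toy rule stands in for real plural rules. It only illustrates how a formatter can pick a sub-message without the pattern parser knowing anything about plural rules.

```java
import java.util.HashMap;
import java.util.Map;

final class PluralSelectSketch {
    /** Hypothetical "double -> keyword" mapping; an assumption, not the ICU API. */
    interface KeywordSelector {
        String select(double number);
    }

    /** Toy English-like rule: exactly 1 maps to "one", everything else to "other". */
    static final KeywordSelector ENGLISH_LIKE = number -> number == 1.0 ? "one" : "other";

    /** Picks the sub-message for the selected keyword, falling back to "other". */
    static String choose(Map<String, String> messages, KeywordSelector selector, double number) {
        String message = messages.get(selector.select(number));
        return message != null ? message : messages.get("other");
    }

    public static void main(String[] args) {
        Map<String, String> messages = new HashMap<>();
        messages.put("one", "1 file");
        messages.put("other", "many files");
        System.out.println(choose(messages, ENGLISH_LIKE, 1)); // prints: 1 file
        System.out.println(choose(messages, ENGLISH_LIKE, 5)); // prints: many files
    }
}
```

With this split, the parsed pattern only has to expose the keyword-to-message table, and the formatter supplies whatever selector fits its locale.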

Status

Implemented in ICU 4.8. For a summary of benefits and incompatible changes, see the ICU 4.8 download page. (That page refers back here and to subpages for details.)

Demo

See the MessagePatternDemo class and the MiniMessageFormatter it calls.

For output from the MessagePatternDemo see this subpage.

Docs

MessageFormat syntax currently in the API docs

MessageFormat in the User Guide

Proposals & discussions

See also the Questions & Decisions sub-page.

icu-design 2011-02-08 00:59: ICU4J API *pre*-proposal: MessagePattern class

icu-design 2010-08-19 23:24: need stronger form of MessageFormat.autoQuoteApostrophe()

icu-design 2010-08-24 00:26: ICU API proposal: MessageFormat with autoQuoteApostrophe() behavior by default

... branched here

... including on 2010-08-25 19:24:

We discussed this in the ICU team meeting today, and reached the following consensus:

    • Change the default parsing behavior but implement both variants.

    • In Java, the old behavior can be chosen via an ICUConfig class backed by a .properties file. (Essentially at build time.)

    • In C++, the old behavior can be chosen via a #define to uconfig.h. (Also at build time.)
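For readers unfamiliar with the apostrophe problem behind this decision, the following sketch shows the JDK-style quoting rules using java.text.MessageFormat. It assumes that the "old behavior" mentioned above corresponds to these JDK rules; it does not use ICU4J and does not demonstrate the new ICU default. The class name ApostropheDemo and the helper format() are made up for this illustration.

```java
import java.text.MessageFormat;

final class ApostropheDemo {
    /** Formats the arguments with a JDK MessageFormat built from the pattern. */
    static String format(String pattern, Object... args) {
        return new MessageFormat(pattern).format(args);
    }

    public static void main(String[] args) {
        // A single apostrophe starts quoted text, so it disappears
        // and {0} is treated as literal text.
        System.out.println(format("I don't see {0}", "it"));   // prints: I dont see {0}

        // A doubled apostrophe yields one literal apostrophe
        // and {0} is substituted as expected.
        System.out.println(format("I don''t see {0}", "it"));  // prints: I don't see it
    }
}
```

Translators frequently write the first form, which is why a more forgiving default was considered worth an incompatible change.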

icu-design 2010-08-27 00:28: modify MessageFormat.autoQuoteApostrophe()?

icu-design 2010-08-26 21:25: ICU PluralFormat: not possible to have # in fragment?

icu-design 2010-07-27 20:23: proposal: extend plural format syntax

setFormat() etc.

If we were to get rid of MessageFormat.setFormat() and its siblings (deprecate them and throw UnsupportedOperationException), we would need to fix or replace several internal uses. That proved non-trivial, especially because such MessageFormat objects are used not only for formatting (where a preformatted string could have been passed in rather than using a custom Format object) but also for parsing. We decided that we needed to implement setFormat() etc., and it was not too hard.

We did remove one feature that already did not work well: we dropped support for toPattern() when custom formats have been set, because there is no good way in general to construct a pattern string. See the following ideas for some discussion.
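As an illustration of what "custom formats" means here, the sketch below installs a custom Format object on a message argument. It uses the JDK's java.text.MessageFormat as a stand-in for the ICU class, on the assumption that the setFormatByArgumentIndex() call behaves analogously. UpperCaseFormat is a hypothetical format invented for this example.

```java
import java.text.FieldPosition;
import java.text.Format;
import java.text.MessageFormat;
import java.text.ParsePosition;
import java.util.Locale;

final class CustomFormatDemo {
    /** Hypothetical custom format: upper-cases the argument's string form. */
    static final class UpperCaseFormat extends Format {
        private static final long serialVersionUID = 1L;

        @Override
        public StringBuffer format(Object obj, StringBuffer toAppendTo, FieldPosition pos) {
            return toAppendTo.append(String.valueOf(obj).toUpperCase(Locale.ROOT));
        }

        @Override
        public Object parseObject(String source, ParsePosition pos) {
            // Parsing is out of scope for this sketch.
            throw new UnsupportedOperationException("parsing not implemented in this sketch");
        }
    }

    /** Formats a greeting whose argument 0 goes through the custom format. */
    static String greet(String name) {
        MessageFormat mf = new MessageFormat("Hello {0}!");
        mf.setFormatByArgumentIndex(0, new UpperCaseFormat());
        return mf.format(new Object[] { name });
    }

    public static void main(String[] args) {
        System.out.println(greet("world")); // prints: Hello WORLD!
    }
}
```

Nothing in UpperCaseFormat tells the message formatter which argument type or style text would recreate it, which is the general problem with producing a pattern string once such a format has been set.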

If we had to implement setFormat() etc. with toPattern() support, then here are some ideas:

    • Add methods to the base class (Java: com.ibm.icu.text.UFormat; C++: Format) to support MessageFormat.toPattern(). For example:

      • toPatternStyle() -- for the argument style (different from regular toPattern() in that it returns "medium" and similar if applicable)

      • toPatternType() -- for the argument type

    • Add a MessagePattern.ArgType CUSTOM or similar which triggers special processing.

    • We could almost certainly get away with requiring that only a NONE or CUSTOM argument can be replaced with a CUSTOM one. (But it might not matter.)
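The first idea in the list above could look roughly like the sketch below. None of this exists in ICU: the interface PatternDescribable and the helper argumentToPattern() are hypothetical, and only the method names toPatternType() and toPatternStyle() are taken from the list. It shows how a message formatter might rebuild the "{name,type,style}" text of one argument if every format could describe itself.

```java
final class ToPatternSketch {
    /** Hypothetical self-description interface; not part of any existing API. */
    interface PatternDescribable {
        /** Argument type, for example "number" or "date". */
        String toPatternType();

        /** Argument style, for example "medium"; null or empty if there is none. */
        String toPatternStyle();
    }

    /** Rebuilds the pattern text for one argument; a null format means a plain argument. */
    static String argumentToPattern(String argName, PatternDescribable format) {
        StringBuilder sb = new StringBuilder("{").append(argName);
        if (format != null) {
            sb.append(',').append(format.toPatternType());
            String style = format.toPatternStyle();
            if (style != null && !style.isEmpty()) {
                sb.append(',').append(style);
            }
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        PatternDescribable mediumDate = new PatternDescribable() {
            public String toPatternType() { return "date"; }
            public String toPatternStyle() { return "medium"; }
        };
        System.out.println(argumentToPattern("0", mediumDate)); // prints: {0,date,medium}
        System.out.println(argumentToPattern("name", null));    // prints: {name}
    }
}
```

A real implementation would also have to quote apostrophes and braces inside the style text, which this sketch ignores.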

Tickets

Umbrella ticket: #8319: MessageFormat 2011q1

#2322: Add string ids to Message Format (this got done in 2007, but review the API ideas here)

    • resolved as fixed; named arguments supported since ICU 3.8, ticket #5792

#5904: Implement PluralFormat#formatToCharacterIterator

    • added comment pointing to recent decision not to write an attribute for the plural #, reduced priority to zero

#6306: C wrapper for PluralRules and PluralFormat.

    • Peter (Apple) is adding a PluralRules C wrapper with ticket #8467; cross-referenced the tickets; we probably don't need a PluralFormat C wrapper (use umsg_xxx())

#6466: PluralFormat special case for "zero" (superseded by #7858)

    • marked as "wontfix" with cross-reference to ticket #7858

#6563: ICU4J MessageFormat allocates a big int[] on the first call

    • done

#6858: Use of PluralFormat is poorly documented

    • done by Claire; more doc updates by Markus in ticket #8319

#6881: Issue: PluralFormat interoperability with NumberFormat (MessageFormat)

    • rejected as "worksforme" (since this is an existing, working unit test) with explanation

#6985: PluralFormat should return error if any mismatch between pattern and locale.

    • rejected as "needsmoreinfo"

#7048: Named arguments feature in MessageFormat is not well documented

    • done

#7165: introduce MessageFormat base class: only string substitution

    • made unscheduled, with comment

#7181: autoQuoteApostrophe doesn't work in nested subformats

    • closed as "fixed" with comments

#7457: move LocaleDisplayNames class (C++) to the common library once we have MessageFormatBase there

    • postponed

#7510: Fix remaining problems in MessageFormat (getFormats(), setFormat(), setFormatByArgumentIndex(), setFormatByArgumentName(), getFormatsByArgumentIndex(), named vs. numeric arguments, ease of calling with named arguments)

    • responded, mostly done via tickets #8095 & #8319

#7575: selectFormat docs should move to userguide

    • done

#7618: Use of Simple MessageFormat alternative in ICU implementation in ICU4J

    • postponed

#7682: MessageFormat::format and argument inconsistency

    • rejected as "wontfix" (behavior is specified in JDK MessageFormat API docs)

#7691: add Format method to describe itself for MessageFormat::toPattern()

    • added comment pointing to recent decision not to return a pattern for custom formats, reduced priority to zero

#7858: Extend PluralFormat syntax with offset and explicit values

    • done

#7860: ideas for better PluralFormat/MessageFormat performance

    • more or less done

#7904: MessageFormat should allow and trim whitespace around argument number/name

    • done

#7905: MessageFormat.applyPattern() can fail on a Turkish system

    • done

#7917: PluralFormat replaces quoted # signs

    • done

#7918: ICU4J MessageFormat should not use java.text.ChoiceFormat

    • done

#7938: umsg.h documentation out of sync with class MessageFormat

    • done

#8095: MessageFormat class provide method to get argument names

    • TODO: proposed

#8106: ICU4J MessageFormat needs performance improvement

    • marked as "needsmoreinfo", might come back

#8191: getArgTypeList should have same behavior with named/numbered argument pattern.

    • rejected: getArgTypeList() can only work for numbered arguments

#8245: getFormatNames should get the name but not

    • misunderstanding, rejected as "worksforme"

#8325: bug in the MessageFormat::getFormat(const UnicodeString& formatName, UErrorCode& status)

    • done