  1. ICU-2006

Transliterator getSource, getTarget problems

    Details

    • Time Needed:
      Days

      Description

      getSourceSet and getTargetSet are incorrectly implemented. Whenever a string is
      affected, but not all of its constituent characters are, only the string should
      be added, not the constituents. While this cannot be exact, it should be much
      closer than it is now.

      The way to do this for rule-based (RB) transliterators, taking getSourceSet as
      the example, is to store elements as follows while walking through the rules:

      Step A
      
      Case 1
      ab <set> c > ...
         where <set> is some non-trivial UnicodeSet, or quantified element like a*
      
      To the result add each of "ab", <set>, "c".
      
      Case 2
      ab ($v1*) c > $1
      

      To the result add each of "ab", "c". Don't add $v1, since it is preserved in the
      output. Case 2 might be too hard initially; if so, do later.
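
      Step A above can be sketched roughly as follows. This is only an
      illustration, not ICU code: the Literal, CharSet and Segment classes and
      the collect_source_set function are hypothetical names made up for this
      sketch, and a plain Python set of strings stands in for a UnicodeSet that
      can hold both single characters and multi-character strings. It assumes
      each rule's left side has already been parsed into a list of elements.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set, Union


@dataclass(frozen=True)
class Literal:
    """Hypothetical model of a literal run on the left side, e.g. "ab"."""
    text: str


@dataclass(frozen=True)
class CharSet:
    """Hypothetical stand-in for a non-trivial set or a quantified element like a*."""
    members: FrozenSet[str]


@dataclass(frozen=True)
class Segment:
    """Hypothetical model of a captured segment such as ($v1*)."""
    members: FrozenSet[str]
    copied_to_output: bool  # True when the right side reproduces it, e.g. "> $1"


Element = Union[Literal, CharSet, Segment]


def collect_source_set(rules: List[List[Element]]) -> Set[str]:
    """Walk the left side of each rule and collect the affected source items."""
    result: Set[str] = set()
    for left_side in rules:
        for element in left_side:
            if isinstance(element, Literal):
                # Add the string as a whole, not its constituent characters.
                result.add(element.text)
            elif isinstance(element, CharSet):
                # Case 1: a set (or quantified element) contributes its members.
                result |= element.members
            elif isinstance(element, Segment):
                # Case 2: a segment copied unchanged to the output is skipped.
                if not element.copied_to_output:
                    result |= element.members
    return result


# Case 1: ab <set> c > ...   (here <set> is assumed to be [xyz])
CASE_1 = [Literal("ab"), CharSet(frozenset("xyz")), Literal("c")]

# Case 2: ab ($v1*) c > $1   (here $v1 is assumed to be [eo])
CASE_2 = [Literal("ab"), Segment(frozenset("eo"), copied_to_output=True), Literal("c")]
```

      With these assumed inputs, Case 1 contributes "ab", the members of <set>,
      and "c", while Case 2 contributes only "ab" and "c".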

            People

            • Assignee:
              mark.edward.davis Mark Davis
              Reporter:
              apibot TracBot
