Only comments that are publicly visible are listed below.
Such comments sometimes contain links to other items that are not publicly visible.
C018 | Na | Na | N | Dan Connolly
| -
| 4.4 | Example unclear
Decision: Not applicable. Rationale: We have classified this as 'not applicable' because you have not asked for any actual changes of the document.
Your problem with understanding the cz example may have been due to rendering issues, but we doubt that, because the
(non-combining) cedilla that we used in the draft to represent the combining cedilla is part of iso-8859-1, which was rendered well since the very first browsers. We suspect that you may have overlooked the cedilla, a little low hook after the z.
We will try to take your suggestion for a test into consideration for our next phase. Our response (sent 2003-12-11) -- Notification
|
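The distinction made in the C018 response, between the spacing cedilla that exists in iso-8859-1 and the combining cedilla it stood in for, can be checked with Python's standard `unicodedata` module. This is only an illustrative sketch: the code points U+00B8 and U+0327 are assumed to be the two characters the response refers to, and the helper `in_latin1` is a hypothetical name introduced here.

```python
import unicodedata

SPACING_CEDILLA = "\u00b8"    # assumed: the "(non-combining) cedilla"
COMBINING_CEDILLA = "\u0327"  # assumed: the "combining cedilla"


def in_latin1(ch: str) -> bool:
    """Return True if the character can be encoded in iso-8859-1."""
    try:
        ch.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


for ch in (SPACING_CEDILLA, COMBINING_CEDILLA):
    # Print code point, character name, canonical combining class,
    # and whether iso-8859-1 can represent the character.
    print(
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.combining(ch),
        in_latin1(ch),
    )
```

A combining class of zero marks the spacing character; the non-zero class marks the combining one, which iso-8859-1 cannot encode.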
C023 | E | A | S | Jeremy Carroll
| -
| 4.4 | 'normalization-sensitive' unclear
Decision: Accepted. Decision: Add examples (i) of operations which are not normalization-sensitive, and (ii) illustrating what we mean by inputs and outputs. Our response (sent 2003-05-01) -- Notification
|
C024 | Na | Na | S | Jeremy Carroll
| -
| 3.2 | Is UTF-7 a unicode encoding form?
Decision: Not applicable. We have classified this comment as 'not applicable', because it is only a question. Our answer is yes and no. UTF-7 can be considered a Unicode encoding form, or not. It is a Unicode encoding form to the extent that it encodes a sequence of Unicode characters. However, it does not map a character to an identifiable sequence of bytes, and has a number of other rather undesirable properties. It was designed for use in very special cases such as email, but has been widely replaced by UTF-8, and is no longer recommended for use, to the extent that we decided that the most adequate way to handle it in the Character Model was to ignore it completely.
|
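One way to read the C024 remark that UTF-7 "does not map a character to an identifiable sequence of bytes" is sketched below, using Python's built-in `utf-7` and `utf-8` codecs. The helper name `encodes_per_character` and the sample string are assumptions made for illustration; the check is whether encoding a string as a whole gives the same bytes as encoding each character separately.

```python
def encodes_per_character(text: str, encoding: str) -> bool:
    """True if the encoded text is just each character's own encoding, concatenated."""
    whole = text.encode(encoding)
    pieces = b"".join(ch.encode(encoding) for ch in text)
    return whole == pieces


sample = "\u00e9\u00e9"  # two copies of U+00E9, chosen arbitrarily

# UTF-8: every character has its own fixed byte sequence.
print("utf-8:", encodes_per_character(sample, "utf-8"))

# UTF-7: adjacent characters share one base64 run, so the bytes of a
# single character cannot be picked out of the encoded text.
print("utf-7:", encodes_per_character(sample, "utf-7"))

# UTF-7 also allows more than one byte sequence for the same character.
print(b"A".decode("utf-7") == b"+AEE-".decode("utf-7"))
```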
C025 | T | A | N | Olu Ibidunni
| -
| 3.1.5 | In 4th example, should 'o' be 'ö'?
|
C026 | E | A | N | Ian Jacobs
| -
| 3.1.3 | mapping between character codes and units of displayed text
See also the following comments: C096
Decision: Accepted. Decision: Change 'character codes' to 'characters'. Our response (sent 2003-02-17) -- Notification |
C027 | Na | Na | O | Joseph Reagle
| XML Sig WG
| Various | XML Sig WG comments
|
C028 | Na | Na | N | Jeremy Carroll
| RDF Core WG
| Various | Endorsement from RDF Core
Decision: Not applicable. Rationale: We thank you for your endorsement. We have classified this comment as 'not applicable' because it does not suggest or imply any changes. We would like to note that the Character Model is written so as to make clear that specifications do not have to follow all the requirements, just those that apply in their specific case. Our response (sent 2003-02-13) -- Notification |
C029 | Na | Na | N | Jeremy Carroll
| RDF Core WG
| 2 | breadth of scope
Our response (sent 2002-05-27) -- Re: breadth of scope Decision: Not applicable. Rationale: We have classified this comment as 'not applicable', because it is too general. Each CharMod requirement applies only where applicable. For example, if a specification doesn't deal with sorting, then requirements related to sorting do not apply. Also, specifications that don't deal with text (e.g. a bitmap format) would therefore not have any applicable requirements (except e.g. for textual comments and other metainformation embedded in the format). We would also like to point out that the term 'processing model' is interpreted very broadly here. Even if a specification does not have an explicitly defined processing model, it implicitly defines how to process (e.g. match) characters. As an example, RDF conforms to the processing model, on the level of the abstract syntax by virtue of the fact that the abstract syntax is expressed in Unicode, and on the level of RDF/XML by virtue of being based on XML. Our response (sent 2003-02-13) -- Notification |
C030 | E | N | N | Jeremy Carroll
| RDF Core WG
| 3.5 | non-universality of processing model
Our response (sent 2002-05-27) -- Re: non-universality of processing model Decision: Noted. Rationale: We have classified this comment as 'Noted', because it did not contain any suggestions for changes. However, in order to address the misunderstanding that we think
this comment exposes, we have added some text (just before
C014): "Also, while this document uses the term Reference <emph>Processing</emph>
Model and describes its properties in terms of processing, the model also
applies to specifications that do not explicitly define a processing model." We hope that this clarifies the situation for RDF: Even if there is
no processing model for RDF, on the level of text processing, RDF
conforms to the Charmod Reference Processing Model because of the
way the abstract syntax is defined in terms of Unicode characters
and because of the way XML is used. Our response (sent 2003-02-13) -- Notification |
C031 | S | P | N | Jeremy Carroll
| RDF Core WG
| 8 | no dependency on IRI draft
See also the following comments: C059 C170
Decision: Partially accepted. Rationale: Our plan is that the IRI ID, referenced in this section, will have been submitted for Proposed Standard by the time CharMod moves to the next stage. IRI equality is fully addressed in the latest IRI ID version. Our response (sent 2003-02-13) -- Notification |
C032 | Na | Na | O | Jeremy Carroll
| RDF Core WG
| Various | Overview of RDF Core feedback
-
Comment (received 2002-05-27) -- Overview of RDF Core feedback
The RDF Core WG has made feedback concerning the following sections of charmod: > 1. Introduction > 2. Conformance > 3.4 Strings > 3.5 Reference Processing Model > 4. Early Uniform Normalization > 6. String Identity Matching > 8. Character Encoding in URI References > 9. Referencing the Unicode Standard > A.2 Other References > C. Composing Characters > D. Resources for Normalization [...] RDF Core makes no comments on the other sections.
This comment lists the sections that have been commented on by the RDF Core WG. Please see the specific comments listed below. This comment has been split into the following comments: C028 C029 C030 C031 |
C033 | E | P | N | Ian Jacobs
| -
| 3.1.6 | Use of word 'byte'
Decision: Improve the sentence. Decision: Partially accepted. Rationale for 'Partially accepted': We think we can improve on the suggested wording. Our response (sent 2003-02-17) -- Notification |
C034 | S | A | S | Joseph Reagle
| XML Sig WG
| 3.6.3 | Private Use Code Points: Disagreement with our approach
|
C035 | S | A | N | Joseph Reagle
| XML Sig WG
| Various | 'All W3C specs must conform.'
Decision: Accepted. Rationale: We originally rejected this comment. Later, after extensive discussions, we were instructed by W3C Management that it is inappropriate for a W3C spec to directly enforce requirements on other specifications, and we have removed the relevant language. We have also been instructed to request a finding from the TAG corresponding to the text that we removed.
|
C036 | E | A | N | Joseph Reagle
| XML Sig WG
| 3.1.3 | Define 'logical order'
|
C037 | E | A | N | Joseph Reagle
| XML Sig WG
| 4.1.1 | Character 'í' hard to distinguish from 'i', particularly when
italicized
|
C038 | E | A | N | Jim Melton
| XML Query WG
| 2 | Conformance of new vs. old specs
See also the following comments: C051 C088 C089 C135
Decision: Accepted. You point out a clear inconsistency, which we fixed a while ago. We were later told that it is inappropriate for a W3C spec to directly enforce requirements on other specifications, and have removed the relevant language altogether. We have been instructed to request a finding from the TAG corresponding to the text that we removed. We will make sure that, if relevant, the inconsistency you pointed out will not reappear. |
C039 | E | A | N | Jim Melton
| XML Query WG
| 3.1.5 | Determining relevant language for sorting
|
C040 | E | A | N | Jim Melton
| XML Query WG
| 3.1.7 | How to avoid use of the term 'character'?
See also the following comments: C004 C138 C166
Decision: Accepted Decision: We'll add clarification. Our response (sent 2003-05-01) -- Notification |
C041 | S | A | N | Jim Melton
| XML Query WG
| 3.2 | Proprietary charset identifiers
See also the following comments: C139
Decision: Rejected. See our response (below) for our rationale. Our response (sent 2002-06-05) -- Re: proprietary charset identifiers [...] Please tell us, at your earliest convenience, whether you are
satisfied with our decision or not. If not, please provide
additional rationale. Decision: Accepted. Note: We've made the requested change and will ask the XML Query WG whether they have further concerns about section 3.6.2: [S] If the unique encoding approach is not taken, specifications SHOULD mandate the use of the IANA charset registry names [...] If they do have such concerns, they need to raise a separate comment. After some discussion (see the last message on this from Jim Melton)
we have decided to accept the comment as it was made. We have changed "... is identified by an IANA charset identifier." to "... is identified by a unique identifier, such as an IANA charset identifier." However, our exchange suggests that the XML Query WG may also not be okay with some of the wording in Section 3.6.2, which (among other things) says:
"[S] If the unique encoding approach is not taken, specifications SHOULD mandate the use of the IANA charset registry names [...]"; if this is the case, please indicate so as soon as possible, or we will have to assume that this is okay with you. |
C042 | S | A | | Jim Melton
| XML Query WG
| 4.4 | Discussion of subsequent items
|
C043 | S | A | | Jim Melton
| XML Query WG
| 4.4 | Objection to prohibition against receiver from normalizing text
|
C044 | S | A | | Jim Melton
| XML Query WG
| 4.4 | Prohibition against normalizing suspect text
|
C045 | S | A | | Jim Melton
| XML Query WG
| 4.4 | Prohibition against interim unnormalized states
|
C046 | Na | Na | O | David Fallside
| XMLP WG
| Various | XMLP WG response to Charmod LC#2
|
C047 | Na | Na | O | Tim Bray
| -
| Various | Comments on Character Model
|
C048 | Na | Na | O | Yin Leng Husband
| WSArch WG
| Various | WSArch WG review of Charmod LC #2
|
C049 | Na | Na | O | Norman Walsh
| TAG
| Various | TAG comments on Character Model for the World Wide Web 1.0
|
C050 | S | A | S | Cliff Schmidt
| Microsoft
| Various | CharMod restricts closed systems
|
C051 | E | A | S | Cliff Schmidt
| Microsoft
| 2 | Inconsistent/Redundant Requirements for W3C Spec Conformance
|
C052 | S | P | S | Cliff Schmidt
| Microsoft
| 2 | W3C Spec Conformance
|
C053 | E | P | S | Cliff Schmidt
| Microsoft
| 3.5 | Full Range of Unicode Code Points Not Allowed in XML
Decision: Partially accepted. Rationale for 'Partially accepted': It is up to each specification to provide the specific reason(s) for deviating from a 'SHOULD' requirement. We have however amended the text to read: 'Specifications SHOULD not arbitrarily exclude characters from the full range of Unicode code points from U+0000 to U+10FFFF inclusive;'. Our response (sent 2003-03-06) -- Notified and discussed during FTF mtg |
C054 | E | R | S | Cliff Schmidt
| Microsoft
| 4.2.3 | Definition of 'Fully-Normalized'
Decision: Partially accepted. Rationale: Checking reveals that we could go either way;
change record to partially accepted, but no change. Our response (sent 2003-03-06) -- Notified and discussed during FTF mtg |
C055 | S | R | S | Cliff Schmidt
| Microsoft
| 4.4 | Mandating NFC for All Web Content
Decision: Rejected. Note, however, the relaxation of the language in section 4.4. Rationale: 'Foreign systems' is undefined and undefinable. Our response (sent 2003-03-06) -- Notified and discussed during FTF mtg |
C056 | S | P | S | Cliff Schmidt
| Microsoft
| 4.4 | Text-Processors MUST Perform Normalization Checking
Decision: Partially accepted. Rationale: Try to add a note explaining that in
the base case,
only a one-character lookahead is needed. In the long term, try to move material about
'composing' characters to UAX 15. Our response (sent 2003-03-06) -- Notified and discussed during FTF mtg |
C057 | S | P | S | Cliff Schmidt
| Microsoft
| 4.4 | Content Producers and Proxies
|
C058 | E | R | S | Cliff Schmidt
| Microsoft
| 4.4 | Web Repositories
|
C059 | S | A | N | Cliff Schmidt
| Microsoft
| 8 | IRIs
See also the following comments: C031 C170
Decision: Accepted. Our plan is that the IRI ID, referenced in this section, will have been submitted for Proposed Standard by the time CharMod moves to the next stage. IRI equality is fully addressed in the latest IRI ID version.
|
C060 | Na | Na | N | David Fallside
| XMLP WG
| 2 | XML Protocol LC#2 review question on implementation testing
Decision: Not applicable Rationale: We have classified this comment as 'not applicable', because it is a question, not a comment leading to a potential change of the Character Model. The test suite should test for CharMod-related requirements in the
specification(s) being tested. The tests should conform to [C] requirements
(except where they are wrong on purpose). If the test collection includes
code, then that should also conform to [I] requirements.
|
C061 | E | Na | N | David Fallside
| XMLP WG
| 1.1 | 'All W3C specifications must conform to this document'
Decision: Rejected Rationale 'Rejected': This para states the general principle and refers to section 2 for details. The various requirements will come into force once the CharMod spec becomes a REC. New decision: Not applicable Rationale: We have classified this comment as 'not applicable' because we have been told that it is inappropriate for a W3C spec to directly enforce requirements on other specifications, and have removed the relevant language from section 2. We still define conformance to CharMod. We have been instructed to request a finding from the TAG corresponding to the text that we removed. So CharMod will be enforced by the fact of being a REC, coupled with an eventual TAG finding and ongoing reviews of relevant specs by the I18N WG.
|
C062 | Na | Na | C | David Fallside
| XMLP WG
| 4.4 | 'XML protocol need not normalize application payloads or check to insure that they are normalized'
|
C063 | Na | Na | C | David Fallside
| XMLP WG
| 4.4 | Is text to be
normalized when forwarded?
|
C064 | E | R | C | David Fallside
| XMLP WG
| 4.4 | Give the reason(s) for prohibition
against normalizing suspect text
|
C065 | Na | Na | C | David Fallside
| XMLP WG
| 4.4 | Does rejected un-normalized
text have to be
normalized before it is returned to sender?
|
C066 | Na | Na | | David Fallside
| XMLP WG
| 4.4 | Specifications that define a mechanism for producing a document SHOULD require that the final output be normalized
Q: Is a document text-based format? A: Yes. [We must clarify that we mean the textual parts of documents] Q: Is this requirement covered by
the earlier one? A: No. There may not be a spec for the output document, as in the case of plain text, or the spec may not (yet) require N11N.
|
C067 | S | P | S | Tim Bray
| -
| 3.1.5 | Collation
Decision: Partially accepted. Note: We'll change the 'MUST' to a 'SHOULD'. Our response (sent 2003-05-01) -- Notification
|
C068 | S | P | S | Tim Bray
| -
| 3.6 | Unique Character Encoding
See also the following comments: C114 Decision: Partially accepted. Rationale: We have added:
"[S] When basing a protocol, format, or API on a protocol, format, or API that already has rules for character encoding, specifications SHOULD use rather than change these rules." and have added XML as an example. As said elsewhere, we prefer not to have requirements specific to a particular format. Also, the 'authored by humans' part is not necessarily true; in general, humans care about the actual text and about the tools they use, not about encodings. Our response (sent 2004-01-16) -- Notification
|
C069 | E | P | S | Tim Bray
| -
| 3.6.2 | Admissibility of UTF-*
Decision: Partially accepted. Note: Covered by our edit resulting from C114 and your previous comment C068. Our response (sent 2004-01-16) -- Notification
|
C070 | S | N | S | Tim Bray
| -
| 4 | Early Uniform Normalization
Decision: Noted.
We agree with the sentiment. Refer to some
mails by Mark about cost of checking/normalizing. Doing normalization
really early (when data is input or converted) is usually
very cheap because it can be done by design (e.g. keyboards
with dead keys, conversion from a specific legacy encoding).
Decision: Noted. We agree that this is an important consideration. Please refer to some earlier mails by Mark Davis about cost of checking/normalizing. Doing normalization really early (when data is input or converted) is usually very cheap because it can be done by design (e.g. keyboards with dead keys, conversion from a specific legacy encoding). Normalization is indeed best run at i/o speed, but this should be human input speed rather than network i/o speed. A general normalization algorithm needs significantly more than 10KB footprint. But there is quite a wide range of possible tradeoffs between speed and footprint. We have added references to implementations and additional material in Appendix D, resources for Normalization. http://www.w3.org/International/Group/charmod-edit/#sec-n11n-resources
There is also an FAQ at http://www.unicode.org/faq/normalization.html.
An implementation that I (MD) did for just *checking* NFC came in under 50KB (in C). Mark reported 110KB for actual normalization to NFC (in Java). Our response (sent 2004-01-16) -- Notification |
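The difference between *checking* for NFC and actually normalizing to NFC, which the C070 response distinguishes when discussing cost, can be sketched with Python's standard `unicodedata` module. This is a minimal illustration, not the implementations referred to above; `unicodedata.is_normalized` requires Python 3.8 or later, and the function names `check_nfc` and `to_nfc` are assumptions introduced here.

```python
import unicodedata


def check_nfc(text: str) -> bool:
    """Only check whether text is already in NFC; the text is not modified."""
    return unicodedata.is_normalized("NFC", text)


def to_nfc(text: str) -> str:
    """Return the NFC form of text."""
    return unicodedata.normalize("NFC", text)


precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

print(check_nfc(precomposed))             # already NFC
print(check_nfc(decomposed))              # not NFC
print(to_nfc(decomposed) == precomposed)  # normalizing yields the precomposed form
```

Text produced in NFC from the start, for example by an input method, passes the check unchanged, which is the "normalization by design" case the response describes.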
C071 | E | R | D | Tim Bray
| -
| 6 | Bit-by-bit identity
Decision: Rejected. Rationale: What actually happens in the various programming languages we know is that they all require care to make sure that the encoding is really the same. There is no C function to automatically compare multibyte and wide-character representations, and so on. We think that it is much better to be too specific to make sure implementers don't forget anything, rather than too abstract. Our response (sent 2004-01-16) -- Notification
|
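The point made in the C071 response, that a byte-level comparison is only meaningful when both strings use the same encoding, can be illustrated as follows. The sketch assumes both inputs arrive as bytes with a known encoding label; `same_string` is a hypothetical helper, and it deliberately ignores normalization and the other matching steps of the Character Model.

```python
def same_string(a: bytes, a_encoding: str, b: bytes, b_encoding: str) -> bool:
    """Compare two encoded strings after decoding both to Unicode code points."""
    return a.decode(a_encoding) == b.decode(b_encoding)


text = "caf\u00e9"
as_utf8 = text.encode("utf-8")
as_utf16 = text.encode("utf-16-le")

# The raw bytes differ even though the text is the same ...
print(as_utf8 == as_utf16)

# ... so the comparison has to happen on a common representation.
print(same_string(as_utf8, "utf-8", as_utf16, "utf-16-le"))
```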
C072 | S | R | D | Tim Bray
| -
| 9 | Referencing Unicode
|
C073 | S | A | S | Tim Bray
| -
| 3.1.3 | [S] Protocols, data formats and APIs MUST store, interchange or
process text data in logical order
|
C074 | S | P | C | Tim Bray
| -
| 2 | [S] [I] and [C]
|
C075 | E | A | S | Tim Bray
| -
| 3.1.6 | Backward octets
|
C076 | E | R | S | Tim Bray
| -
| 3.7 | Absence of explicit end delimiters makes Charmod non-compliant
Decision: Rejected Rationale for 'Rejected': See below Our response (sent 2002-06-18) -- Re: Absence of explicit end delimiters makes Charmod non-compliant [...] The U+hhhh notation does not appear to have any delimiters, but
because U+hhhh is used as a word in free-flowing text, the usual
word delimiters (space, punctuation) function as delimiters.
Also, U+hhhh is not actually any escape syntax, because it
is not intended to stand in directly for a character, but to
talk about a character on a meta-level. In both aspects, this
is similar to character names (e.g. LATIN UPPER CASE LETTER A). Please tell us whether you are satisfied or not with this
decision at your earliest convenience.
|
C077 | E | P | S | Tim Bray
| -
| 4.2.2 | Bold legacy encoding
Decision: Partially accepted. Rationale for 'Partially accepted': This is not the first instance of the term. Decision: Add a link to the definition in section 1.2. Our response (sent 2003-05-01) -- Notification
|
C078 | E | A | S | Tim Bray
| -
| 4.2.2 | or the absence thereof
|
C079 | E | R | S | Tim Bray
| -
| 4.4 | '[C] In order to conform to this specification, all text content on the web MUST ...'
|
C080 | E | A | N | Yin Leng Husband
| WSArch WG
| 1.1 | 'must conform to these provisions'
|
C081 | E | A | N | Yin Leng Husband
| WSArch WG
| 1.2 | 'covers the widest possible range'
|
C082 | E | A | N | Yin Leng Husband
| WSArch WG
| 1.2 | 'a way of referencing characters independent of
the encoding of a resource'
|
C083 | E | A | N | Yin Leng Husband
| WSArch WG
| 1.2 | 'Unicode now serves as a common reference'
Decision: Accepted Decision: Make the sentence:
'Unicode now serves as a common reference for W3C specifications and
applications.'
more like the previous two sentences. Additionally, remove 'common reference'. Our response (sent 2003-02-17) -- Notification
|
C084 | E | R | S | Yin Leng Husband
| WSArch WG
| 1.2 | 'Use of control codes for various purposes'
|
C085 | E | A | N | Yin Leng Husband
| WSArch WG
| 1.2 | 'such properties'
|
C086 | E | A | N | Yin Leng Husband
| WSArch WG
| 2 | Inconsistent usage of term 'requirements'
Decision: Accepted Decision: Replace our 1st para with this para from RFC 2119: The key words 'MUST',
'MUST NOT', 'REQUIRED', 'SHALL', 'SHALL NOT', 'SHOULD', 'SHOULD NOT',
'RECOMMENDED', 'MAY', and 'OPTIONAL' in this document are to be
interpreted as described in RFC 2119. Our response (sent 2003-02-17) -- Notification
|
C087 | Na | Na | S | Yin Leng Husband
| WSArch WG
| 2 | How will conformance be enforced?
Decision: Not applicable Rationale: We have classified this as 'Not applicable', because you have asked questions rather than suggesting changes to the document. Our answers are as follows: Decision: Q: How will conformance be enforced?
A: Through the usual W3C Process. Q: Are the conformance requirements in this document testable for
violations? A: Because this is an architectural specification, it is not possible to test the requirements automatically. The conformance requirements are testable by human beings. For some specific [S], [I] and [C] it is possible to write automated tests for some of the requirements in some contexts (such as a specific specification).
|
C088 | S | A | S | Yin Leng Husband
| WSArch WG
| 2 | 'MUST conform' vs 'SHOULD be modified in order to conform'
See also the following comments: C038 C051 C089 C135 Decision: Accepted. You point out a clear inconsistency, which we fixed a while ago. We were later told that it is inappropriate for a W3C spec to directly enforce requirements on other specifications, and have removed the relevant language altogether. We have been instructed to request a finding from the TAG corresponding to the text that we removed. We will make sure that, if relevant, the inconsistency you pointed out will not reappear.
|
C089 | E | A | S | Yin Leng Husband
| WSArch WG
| 2 | How is 'the next version of that specification [to] be modified in order to conform'?
See also the following comments: C038 C051 C088 C135 Decision: Accepted. You point out a clear inconsistency, which we fixed a while ago. We were later told that it is inappropriate for a W3C spec to directly enforce requirements on other specifications, and have removed the relevant language altogether. We have been instructed to request a finding from the TAG corresponding to the text that we removed. We will make sure that, if relevant, the inconsistency you pointed out will not reappear.
|
C090 | E | A | N | Yin Leng Husband
| WSArch WG
| 2 | Way unclear
|
C091 | E | Na | S | Yin Leng Husband
| WSArch WG
| 3.1.1 | Would be helpful to define 'featural syllabary'
See also the following comments: C093 Decision: Not Applicable. Rationale for 'Not Applicable': We decided to simplify the text by removing definitions such as abugida, abjad, etc. Our response (sent 2003-02-17) -- Notification
|
C092 | E | P | S | Yin Leng Husband
| WSArch WG
| 3.1.1 | 'combines symbols for individual sounds of the language'
Decision: Partially accepted Rationale for 'Partially accepted': This
section (3.1.1) is introductory. We don't want to use the term 'phoneme' before the next section (3.1.2), where it is introduced. Decision: Change 'into square syllabic blocks'
to
'into square blocks, each of which represents a syllable'. Our response (sent 2003-02-17) -- Notification
|
C093 | E | Na | S | Yin Leng Husband
| WSArch WG
| 3.1.1 | 'Indic scripts are abugidas'
See also the following comments: C006 Decision: Not Applicable. Rationale for 'Not Applicable': We decided to simplify the text by removing definitions such as abugida, abjad, etc. Our response (sent 2003-02-17) -- Notification
|
C094 | E | Na | S | Yin Leng Husband
| WSArch WG
| 3.1.1 | 'Arabic script is an example of an abjad'
See also the following comments: C093 Decision: Not Applicable. Rationale for 'Not Applicable': We decided to simplify the text by removing definitions such as abugida, abjad, etc. Our response (sent 2003-02-17) -- Notification
|
C095 | E | A | N | Yin Leng Husband
| WSArch WG
| 3.1.1 | 'Usages'
|
C096 | E | A | N | Yin Leng Husband
| WSArch WG
| 3.1.3 | 'Characters' vs 'character codes'
|
C097 | E | A | N | Yin Leng Husband
| WSArch WG
| 3.1.3 | Logical order
|
C098 | E | A | N | Yin Leng Husband
| WSArch WG
| 3.1.5 | 'In Thai the sequence U+0E44 U+0E01 must be sorted as if
it was written U+0E01 U+0E44.'
|
C099 | E | Na | S | Yin Leng Husband
| WSArch WG
| 3.1.7 | 'Character' and 'text' are defined circularly
|
C100 | S | R | S | Yin Leng Husband
| WSArch WG
| 3.6.2 | UTF-8 or UTF-16 as a
default encoding form
|
C101 | Na | Na | O | Yin Leng Husband
| WSArch WG
| 3.6.2 | 'Specifications MUST NOT propose the use of heuristics to determine the encoding of data'
Comment (received 2002-05-31) -- WSArch WG review of Charmod LC #2 Character encoding identification, 9th paragraph, last
sentence
'[S] Specifications MUST NOT propose the use of heuristics
to determine the encoding of data.'
It would be helpful to either give examples of the
undesirable 'heuristics' or the reasons for banning 'use of heuristics'.
Would the absence of a BOM in UTF-8 encoding be considered use of heuristics
for identifying encoding? This comment has been split into the following comments: C133 C134
|
C102 | S | A | N | Yin Leng Husband
| WSArch WG
| 3.6.2 | 'On interfaces to other protocols, software SHOULD
support conversion'
|
C103 | S | A | N | Yin Leng Husband
| WSArch WG
| 3.6.2 | 'between Unicode encoding forms' or 'to Unicode encoding forms' or both?
|
C104 | E | A | S | Yin Leng Husband
| WSArch WG
| 3.7 | 'instances of the language' vs 'the language'
Decision: Not applicable Rationale for 'Not applicable': Languages don't have character encodings inherently associated with them. Language instances do. Decision: Accepted. We understand the concern. Action: Text has been reworded 'There is also a need, often satisfied by the same or similar mechanisms, to express characters not directly representable in the character encoding chosen for a particular document or program (an instance of the markup or programming language).' Our response (sent 2003-02-17) -- Notification Our response (sent 2003-05-07) -- RE: Your comments on the Character Model [C080-C086, C090-C100, C102-C105, C107-C111]
|
C105 | E | A | N | Yin Leng Husband
| WSArch WG
| 3.7 | 'a language's syntax, which is itself expressed as characters represented at the character encoding level'
|
C106 | E | A | S | Yin Leng Husband
| WSArch WG
| 3.7 | 'Escape syntaxes where the end is determined by a character
outside the set of characters admissible in the character escape itself
SHOULD be avoided'
Decision: Accepted We have replaced
"Escape syntaxes where the end is determined by a character outside the set of characters admissible in the character escape itself SHOULD be avoided." with "Escape syntaxes where the end is determined by any character outside the set of characters admissible in the character escape itself SHOULD be avoided." Although this change is minimal, it should now be clear that this refers to cases where almost any arbitrary character can terminate an escape. Strictly speaking, the ';' in the examples is part of the escape (part of the text that gets replaced), whereas in other cases the terminating character itself is not replaced (old octal notations often work that way).
|
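The concern behind the C106 wording can be sketched with two small expanders: one for an escape with an explicit end delimiter (the `&#x...;` form used in XML and HTML), and one for an escape that simply ends at the first non-hexadecimal character. The `\x` syntax and both function names are hypothetical, chosen only to show how an undelimited escape can swallow following text.

```python
import re


def expand_delimited(text: str) -> str:
    """Expand &#xHHHH; escapes; the ';' is part of the escape and ends it."""
    return re.sub(
        r"&#x([0-9A-Fa-f]+);",
        lambda m: chr(int(m.group(1), 16)),
        text,
    )


def expand_undelimited(text: str) -> str:
    """Expand hypothetical \\xHHHH escapes that end at any non-hex character."""
    return re.sub(
        r"\\x([0-9A-Fa-f]+)",
        lambda m: chr(int(m.group(1), 16)),
        text,
    )


# Intended text in both cases: U+00E9 followed by "cole".
print(expand_delimited("&#xe9;cole") == "\u00e9cole")

# Without a delimiter the "c" of "cole" is a valid hex digit, so the
# escape is read as U+0E9C and the rest of the word is damaged.
print(expand_undelimited("\\xe9cole") == "\u0e9cole")
```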
C107 | E | A | N | Yin Leng Husband
| WSArch WG
| 3.7 | 'Escaped characters SHOULD be acceptable wherever
unescaped characters are'
|
C108 | E | A | N | Yin Leng Husband
| WSArch WG
| 3.7 | 'escaped characters SHOULD be acceptable in
identifiers and comments'
|
C109 | E | A | N | Yin Leng Husband
| WSArch WG
| 4.2.3 | 'Many languages will benefit from defining more boundaries'
|
C110 | E | A | N | Yin Leng Husband
| WSArch WG
| 4.3.1 | Missing 'nor' clause
|
C111 | E | A | N | Yin Leng Husband
| WSArch WG
| 4.3.1 | 'not include-normalized' vs 'not Unicode-normalized'
|
C112 | Na | Na | O | Karl Dubost
| QA WG
| Various | QA Review for Charmod
|
C113 | Na | Na | O | Mark Scardina
| XSL WG
| Various | XSL WG Comments on Character Model WD
|
C114 | S | P | S | Norman Walsh
| TAG
| 3.6 | Specifications SHOULD NOT add rules for character
encoding beyond what is provided in XML
See also the following comments: C068 Decision: Partially accepted. Decision: Clarify our intent (When building on top of a pre-existing spec such as XML, this is a good enough reason to 'escape' the SHOULD). Additional comments: It is unclear whether the comment is trying to address
only XML, or is more general. It mentions XML several times, but is worded
as if it may also apply to things outside XML.
We think that having a single encoding can be beneficial in many cases,
and that XML on this point should not restrict things outside XML.
We think that for XML, the considerations given in the comment
(protocol vs. document) are important. We have added the following text: >>>>
[S] When basing a protocol, format, or API on a protocol, format, or API
that already has rules for character encoding, specifications SHOULD
use rather than change these rules. EXAMPLE: An XML-based format should use the existing XML rules for choosing
and determining the character encoding of external entities, rather than
invent new ones.
>>>>
Our response (sent 2004-01-16) -- Notification
|
C115 | Na | Na | O | Chris Lilley
| TAG
| Various | TAG comments on Character Model for the World Wide Web 1.0
|
C116 | E | R | N | Chris Lilley
| TAG
| Various | Numbered conformance requirements
Decision: We have classified this as editorial, and decided to reject it. Rationale: Changes to the document would cause major problems. We might at a later stage add numbers if we feel that the document is stable enough, but we don't want to commit to it. Our response (sent 2004-01-16) -- Notification Decision: Accepted. We have added visible ids to each conformance criterion that can be used to link directly to that criterion. These will be stable.
|
C117 | E | P | S | Chris Lilley
| TAG
| Various | The use, within the spec, of images of characters
Decision: Partially accepted. Rationale for 'Partially accepted': We have carefully reexamined the use of images, character numbers (U+...), character names, and actual characters, and made some corrections. We have based the choice of which mean(s) to use in each case on the amount of general support for the characters in question (Latin-1 being supported from the start of the Web, whereas Plane2 not yet being widely available anywhere), and on the importance of visual, logical, or numerical information for the point being made, and have tried to make sure that there are two or more means of representation where appropriate. We would like to point out that to some extent, we have to deal with a bootstrap problem. As an example, both the Unicode Standard and the SVG spec use bitmap images as a way to 'ground' one technology in another. Our response (sent 2004-01-16) -- Notification We have added an Appendix containing the text. You can link to the appropriate text by clicking on an image. This is explained in chapter 1.
|
C118 | S | P | S | Chris Lilley
| TAG
| Various | XML 1.0 and 1.1 are non conforming
Decision: Partially accepted. Decision: Attempt to clarify terminology such as 'conforming';
Improve text about code points in section 3.5. Rationale for 'Partially accepted': We have attempted to clarify terminology such as "conforming" (i.e. to indicate that preexisting technology only 'SHOULD' conform even when new technology 'MUST'; this is now to some extent obsolete, because the application of Charmod to other specs will not be defined by Charmod itself, but rather by a TAG finding (we hope)). We have improved text in various instances where we thought that there might be a problem. We never had the intention to make XML 1.0 or XML 1.1 non-conforming. We would be very glad to reexamine and fix any specific instance where you think that we (still) are saying that XML is not conforming if you can point out such specific instances to us. On the other hand, we wrote Charmod so that it applies not only to XML, but also to other, potentially new formats. We therefore tried to make sure to indicate best practice for such cases even if these might not always be exactly the same as what XML (to quite some extent for historical reasons) is doing. A typical example would be the use of both decimal and hexadecimal escape syntaxes in HTML and XML.
Our response (sent 2004-01-16) -- Notification
|
C119 | S | A | N | Chris Lilley
| TAG
| Various | Split the document in two
Original decision: Rejected. Rationale: The proposed approach would result in a lot more work, as all the chapters have been written
as part of a single document. Note also that other chapters (including 6 String Identity Matching and 7 String Indexing) would have to be dropped from such an early document, as both depend on Early Uniform Normalization. This would, effectively, leave only chapters 3 Characters and 9 Referencing the Unicode Standard and ISO/IEC 10646 (chapter 5 Compatibility and Formatting Characters consists mainly of a link to an external document). The comment overstates the urgency of getting chapters such as 3 Characters to REC status. New decision: Accepted. Rationale: We originally rejected this comment, but we have recently re-examined it, and we are putting together a plan for splitting the document. The sections on string indexing and on string matching depend to a certain extent on normalization, and so we are not completely sure of the final structure of the document at this stage. Our response (sent 2004-01-16) -- Notification Split is now effective.
|
C120 | S | P | S | Chris Lilley
| TAG
| 3.1.5 | Remove parts dealing with collation and sorting
Decision: Partially accepted. Rationale for 'Partially accepted': We have modified the normative statements (changing from 'MUST' to 'SHOULD' and some wording changes).
We disagree that the section on collation/sorting does not match the maturity of the other sections. In the context of Section 3.1, Perceptions of Characters, the fact that units of collation are different from other units, and the various issues, are important and well established. The text as well as the examples have been carefully chosen to show the range of phenomena. We do not see the need for a separate architectural document on collation and related issues; there are already an ISO standard and a Unicode Technical Standard, as well as many implementations, for user-oriented sorting/collation. Our response (sent 2004-01-16) -- Notification
|
C121 | Na | Na | O | Chris Lilley
| TAG
| 3.1.3 | Units of visual rendering
Comment (received 2002-05-27) -- Comments on charmod from Chris 'Logical selection looks like this:' There should be a requirement after that [S][I] Specifications of protocols and APIs that involve selection of
ranges MUST provide for contiguous logical selections. Having defined the terms 'logical selection mode' and 'visual selection
mode', please use them rather than the highly ambiguous 'discontiguous
selections' and 'contiguous selections', so in fact that should be [S][I] Specifications of protocols and APIs that involve selection of
ranges MUST provide for text selection in logical selection mode. Also, should there not be something about copying that selection and
pasting it somewhere else, that what you get is the logical selection? Similarly in the next part, I suggest rewording to remove the ambiguous
phrase: [S] Specifications of protocols and APIs that involve selection of
ranges SHOULD provide for text selection in logical selection mode, at
least to the extent necessary to support implementation of visual
selection on screen on top of those protocols and APIs. Its not clear that this is such a strong requirement and it complicates
processing, especially on handheld devices. Perhaps weaken to MAY? And
say what happens when this funky visual selection getc copied and pasted
- do you get a set of separate logical selections (if so how delimited)?
A single visually ordered selection (yuk)? Something else? Otherwise, the weaker requirement for contiguous visual selection is
likely to merely encourage the use of visual storage or the disposal of
logical storage once the visual result has been generated. Which would
lead to text copied from visualy contiguous (logically discontiguous)
selections being stored in visual order. Which is to be avoided. It would be a good idea to tie into WAI concerns by noting that
accessibility tools, which access the DOM, should be able to get at
logically ordered text and to know which parts are selected.
This comment has been split into the following comments: C174 C175 C176 C177 C178 C179
|
C122 | E | A | S | Chris Lilley
| TAG
| 3.5 | Specifications MUST be defined in terms of Unicode characters, not bytes or glyphs
Decision: Accepted. We have added a preliminary qualifying sentence:
"All specifications that involve processing of text MUST specify the processing of text according to the Reference Processing Model, namely:" Our response (sent 2004-01-16) -- Notification
|
C123 | E | A | S | Chris Lilley
| TAG
| 3.5 | Is XML non-conforming?
Decision: Accepted. Decision: Change section 2 to state that if a specification satisfies the conditions laid down in RFC 2119 ('[...] there [...] exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course.'), it shall be considered to be conforming. For general issues, see #C118. We have reworded the sentence in question; it now reads: "Specifications SHOULD not arbitrarily exclude characters from the full range of Unicode code points from U+0000 to
U+10FFFF inclusive; code points above U+10FFFF MUST NOT be allowed." We also have added a note:
"NOTE: Despite the prohibition against arbitrarily excluding characters, specifications will typically exclude Unicode ranges such as surrogate and non-character code points. On the other hand, it would be an example of an arbitrary decision to exclude characters above the Basic Multilingual Plane, or limit the characters to ASCII or Latin-1 repertoire."
We do not think that the exclusion of U+0000 in XML 1.1, or of the C0 range in XML 1.0, is arbitrary; it was done for very clear reasons. Our response (sent 2004-01-16) -- Notification
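The distinction drawn in the reworded sentence and its note can be illustrated with a minimal Python sketch. The function names are hypothetical and not taken from the Character Model; the sketch only assumes the well-known Unicode ranges for surrogates (U+D800 to U+DFFF) and non-character code points (U+FDD0 to U+FDEF, plus the last two code points of every plane).

```python
def is_unicode_code_point(cp: int) -> bool:
    """True for any value in the Unicode code space U+0000..U+10FFFF."""
    return 0x0000 <= cp <= 0x10FFFF


def is_surrogate(cp: int) -> bool:
    """Surrogate code points, typically excluded by specifications."""
    return 0xD800 <= cp <= 0xDFFF


def is_noncharacter(cp: int) -> bool:
    """U+FDD0..U+FDEF and the last two code points of each plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_admissible(cp: int) -> bool:
    """Hypothetical policy: accept the full range minus the typical exclusions.

    Values above U+10FFFF are always rejected; characters outside the
    Basic Multilingual Plane are NOT rejected, since excluding them
    would be an arbitrary restriction in the sense of the note above.
    """
    return (
        is_unicode_code_point(cp)
        and not is_surrogate(cp)
        and not is_noncharacter(cp)
    )
```

A format such as XML may add further, non-arbitrary exclusions (for example U+0000) on top of a check like this.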
|
C124 | Na | Na | O | Chris Lilley
| TAG
| 3.6.2 | Character encoding identification
Comment (received 2002-05-27) -- Comments on charmod from Chris '[S] If the unique encoding approach is not chosen, specifications MUST
designate at least one of the UTF-8 and UTF-16 encoding forms of Unicode
as admissible encodings and SHOULD choose at least one of UTF-8 or
UTF-16 as mandated encoding forms (encoding forms that MUST be supported
by implementations of the specification).' Does that mean that, for example, saying UTF-8 is allowed and UTF-16 is
disallowed and an encoding declaration is not required, is okay? Needs a little more on encodings that are a group of similar but not
identical encodings, for example shift-jis. 'Because of the layered Web architecture (e.g. formats used over
protocols), there may be multiple and at times conflicting information
about character encoding. [S] Specifications MUST define
conflict-resolution mechanisms (e.g. priorities) for cases where there
is multiple or conflicting information about character encoding.' Yes. Better though to not define such layering; the XML MIME RFC messed
this up by allowing the charset and the xml encoding declaration to
differ and for the former to take precedence; this requires 'save as' to
rewrite the XML otherwise it is no longer well formed.... better to
require any transcoders to leave XML alone or to know how to rewrite the
encoding declaration if they change the encoding. 'Certain encodings are more or less associated with certain languages
(e.g. Shift-JIS with Japanese); trying to support a given language or
set of customers may mean that certain encodings have to be supported.' The corollary should be clearly stated: do not assume that 'everyone'
supports a favored but non-mandated encoding 'every parser I know
supports Latin-1/Shift-JIS' is not true. This comment has been split into the following comments: C182 C183 C184 C185
|
C125 | S | P | S | Chris Lilley
| TAG
| 3.6.3 | 3.6.3 contradictory
Decision: Accepted. We agree with your concern about e.g. an svg glyph with an attribute unicode="︀". We have changed the text somewhat, please check. However, we would like to point out that this svg mechanism is not designed for agreement on private use characters, it is designed for rendering of characters in general. It can be used for *rendering* of private-use characters, which may be appropriate or necessary in some cases. It could also be misused to completely change the rendering of some text (in the case of Chinese or Japanese easily to an extent that would completely change the meaning of the visually appearing text). While the use for private use characters could be checked, the use for completely changing the rendering could obviously not be checked by an SVG implementation. Our response (sent 2004-01-16) -- Notification
|
C126 | E | Na | S | Chris Lilley
| TAG
| 3.7 | Should XML allow NCRs everywhere?
We have classified this as "Not applicable", because it was a question. Our answer is: Yes, in an ideal world, or if we ever got to redo XML, it would be preferable to allow NCRs e.g. in element and attribute names, because this leads to a more clearly layered encoding model. Indeed the I18N WG at one time was in contact with Jon Bosak and others (including members of the respective ISO committee) to investigate the possibility of such a change. As explained under #C118, this does not mean that XML is non-conformant, nor that it should be changed. But it is important to note this experience for any new formats. We would also like to note that CSS and Java do it this way. Our response (sent 2004-01-16) -- Notification
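As a small illustration of the layering argument, the following Python sketch writes every non-ASCII character as a hexadecimal numeric character reference that carries the Unicode code point of the character. The function name is hypothetical, and the sketch says nothing about where a given format actually permits such references.

```python
def to_ncr(text: str) -> str:
    """Replace each non-ASCII character by a hexadecimal NCR (&#xHHHH;).

    The number inside the reference is the Unicode code point of the
    character, independent of the encoding of the surrounding document.
    """
    return "".join(
        ch if ord(ch) < 0x80 else "&#x{:X};".format(ord(ch))
        for ch in text
    )
```

For example, a 'z' followed by U+0327 COMBINING CEDILLA becomes `z&#x327;`.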
|
C127 | S | R | N | Chris Lilley
| TAG
| 8 | Say that the IRI form is used in the document instance and the hexified URI form when it goes over the wire
Decision: Rejected. Rationale: We do not want to preclude the direct use of IRIs by wire protocols. Whether to use URIs or IRIs is defined by the wire protocol in question. HTTP currently specifies the use of URIs; a new version of HTTP (if ever needed) or some other protocol may use IRIs. Similar considerations apply to document formats: some document formats may allow IRIs in some 'slots', whereas others don't. Our response (sent 2004-01-16) -- Notification
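For protocols that do require URIs, the 'hexified' form mentioned in the comment can be sketched as follows. This is a simplified Python illustration, not the full conversion defined in the IRI specification: it assumes that each non-ASCII character is encoded in UTF-8 and each resulting byte is written as %HH, and it ignores special handling of internationalized host names. The function name is hypothetical.

```python
def iri_to_uri(iri: str) -> str:
    """Percent-encode the UTF-8 bytes of every non-ASCII character.

    ASCII characters, including existing %HH escapes, are left as they are.
    """
    out = []
    for ch in iri:
        if ord(ch) < 0x80:
            out.append(ch)
        else:
            out.extend("%{:02X}".format(b) for b in ch.encode("utf-8"))
    return "".join(out)
```

With this sketch, `http://example.org/résumé` becomes `http://example.org/r%C3%A9sum%C3%A9`.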
|
C128 | E | R | S | Chris Lilley
| TAG
| 9 | Referencing the Unicode Standard and ISO/IEC 10646
Decision: Rejected. Rationale: The current language is the result of careful deliberation and compromise. The situation is not as simple as you describe it. ISO 10646 and Unicode are each as good as the other at giving the "LATIN SMALL LETTER A" the semantics of 'latin small letter a'. Also, ISO 10646 actually contains a normative reference to Unicode's bidi algorithm, and to some other parts of Unicode.
Our response (sent 2004-01-16) -- Notification
|
C129 | E | R | N | Steven Pemberton
| HTML WG
| 3.6.3 | 'private agreements don't scale on the web'
Decision: Rejected. Rationale: We believe that private agreements indeed do not scale on the Web. The text already contains the explanation why this is so: "Code points from different private agreements may collide. Also a private agreement, and therefore the meaning of the code points, can quickly become lost." (slight editorial changes from the LC version) The collision problem already exists for two private agreements, and very quickly increases with the number of agreements. |
C130 | E | A | N | Steven Pemberton
| HTML WG
| 4.3.1 | Readability of tables
|
C131 | E | A | N | Steven Pemberton
| HTML WG
| Various | Spelling
|
C132 | E | P | N | Steven Pemberton
| HTML WG
| 3.3 | Give example of transcoding
|
C133 | E | A | S | Yin Leng Husband
| WSArch WG
| 3.6.2 | 'Specifications MUST NOT propose the use of heuristics to determine the encoding of data'
See also the following comments: C158 C169 Decision: Accepted. We have added explanatory text as follows: "Examples of heuristics include the use of statistical analysis of byte
(pattern) frequencies or character (pattern) frequencies. Heuristics are bad because they will not work consistently across different implementations. Well-defined instructions of how to unambiguously determine a character encoding, such as those given in XML 1.0 [XML 1.0], Appendix F, are not considered heuristics."
|
C134 | E | Na | S | Yin Leng Husband
| WSArch WG
| 3.6.2 | 'Specifications MUST NOT propose the use of heuristics to determine the encoding of data'
Decision: Not applicable. We have classified this comment as 'not applicable', because it is a question. Note: It depends on the context, e.g. in XML, if both a BOM and an encoding
declaration are absent, the entity must be encoded using UTF-8. We do not
consider this to amount to the use of heuristics, as the correct behavior
is fully specified and deterministic. In other contexts, the detection of
the absence of a BOM might be used as part of general 'sniffing', which we
would say amounts to the use of heuristics. But we do not know of such a case, and in general, UTF-8 should work even without a BOM.
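The difference between a fully specified rule and a heuristic can be made concrete with a short Python sketch. It is deliberately simplified and is not the complete procedure of XML 1.0 Appendix F: it only checks for a UTF-8 or UTF-16 BOM, then falls back to an encoding declaration that is assumed to have been parsed already (the hypothetical `declared` parameter), and finally to UTF-8. UTF-32 and external protocol information are not handled.

```python
import codecs
from typing import Optional


def determine_encoding(data: bytes, declared: Optional[str] = None) -> str:
    """Deterministic encoding selection: same input, same answer, everywhere.

    No statistical analysis of byte or character frequencies is involved.
    """
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith(codecs.BOM_UTF16_BE) or data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16"
    if declared is not None:
        # Encoding declaration found by the caller (assumption of this sketch).
        return declared
    # Neither BOM nor declaration: the entity is taken to be UTF-8.
    return "utf-8"
```

Because every branch is fixed in advance, two implementations of this rule cannot disagree on the same input, which is the property that heuristics lack.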
|
C135 | S | A | N | Mark Scardina
| XSL WG
| 2 | XSL WG Comments on Character Model WD
See also the following comments: C038 C051 C088 C089 Decision: Accepted. You point out a clear inconsistency, which we fixed a while ago. We have later been told that it is inappropriate for a W3C spec to directly enforce requirements on other specifications, and have removed the relevant language altogether. We have been instructed to request a finding from the TAG corresponding to the text that we removed. We will make sure that, if relevant, the inconsistency you pointed out will not reappear. Our response (sent 2003-02-13) -- Notification
|
C136 | E | A | N | Mark Scardina
| XSL WG
| 3.1.3 | XSL WG Comments on Character Model WD
Decision: Partially accepted. Rationale: As elsewhere in the
document, the intent is not to proscribe internals, only external
behavior. But this may include protocols (e.g. http), data formats
(e.g. XML,...), and APIs (e.g. the DOM), and it may include storage
(mostly data formats), interchange (mostly protocols), and processing
(mostly APIs). To make absolutely clear that the Character Model only addresses
observable behavior, we have changed the following sentence in the
introduction: "Where this specification contains a procedural description, it is
to be understood as a way to specify the desired external behavior.
Implementations can use other means of achieving the same results,
as long as observable behavior is not affected." to "Where this specification places requirements on processing, it is
to be understood as a way to specify the desired external behavior.
Implementations can use other means of achieving the same results,
as long as observable behavior is not affected." Our response (sent 2003-02-13) -- Notification
|
C137 | S | A | N | Mark Scardina
| XSL WG
| 3.1.5 | XSL WG Comments on Character Model WD
|
C138 | E | A | N | Mark Scardina
| XSL WG
| 3.1.7 | XSL WG Comments on Character Model WD
See also the following comments: C004 C040 C166 Decision: Accepted. Add clarification / examples. Our response (sent 2002-05-01) -- Notification
|
C139 | S | A | N | Mark Scardina
| XSL WG
| 3.2 | XSL WG Comments on Character Model WD
See also the following comments: C041 Decision: Accepted. We have accepted this comment and have changed
"... is identified by an IANA charset identifier." to
"... is identified by a unique identifier, such as an IANA charset identifier." in Section 3.2. We suspect that you might also have some problems with some of the wording in Section 3.6.2, which (among other things) says: "[S] If the unique encoding approach is not taken, specifications SHOULD mandate the use of the IANA charset registry names [...]"; if this is the case, please indicate so as soon as possible, or we will have to assume that this is okay with you. Our response (sent 2003-02-13) -- Notification
|
C140 | E | Na | W | Mark Scardina
| XSL WG
| 3.5 | XSL WG Comments on Character Model WD
Comment (received 2002-06-28) -- XSL WG Comments on Chairacter Model WD '[S] Specifications MAY allow use of any character encoding which can
be transcoded to Unicode for its text entities. [S] Specifications MAY choose to disallow or deprecate some encodings
and to make others mandatory. Independent of the actual encoding, the
specified behavior MUST be the same as if the processing happened as
follows:
The encoding of any text entity received by the application implementing
the specification MUST be determined and the text entity MUST be
interpreted as a sequence of Unicode characters - this MUST be
equivalent to transcoding the entity to some Unicode encoding form ,
adjusting any character encoding label if necessary, and receiving it in
that Unicode encoding form. All processing MUST take place on this
sequence of Unicode characters. If text is output by the application,
the sequence of Unicode characters MUST be encoded using an encoding
chosen among those allowed by the specification. [S] If a specification
is such that multiple text entities are involved (such as an XML
document referring to external parsed entities), it MAY choose to allow
these entities to be in different character encodings. In all cases, the
Reference Processing Model MUST be applied to all entities.'
It may be less confusing to have these requirements separated with
a clarifying sentence, breaking these out under a clarifying context.
Is this intent to forbid entity representation of non-Unicode
characters?
Our response (sent 2002-07-23) -- Re: XSL WG Comments on Chairacter Model WD Comment (received 2002-07-23) -- Re: XSL WG Comments on Chairacter Model WD Our response (sent 2002-07-24) -- Re: XSL WG Comments on Chairacter Model WD Comment (received 2002-07-25) -- Re: XSL WG Comments on Chairacter Model WD Comment (received 2002-07-30) -- Re: XSL WG Comments on Chairacter Model WD Our response (sent 2002-09-03) -- RE: Please clarify XSL WG comment (issue 146) on Character Model WD [...] Please note that we have asked you for clarification on three
of your comments [...] Comment (received 2002-09-10) -- Character Model Comments Clarifications Our response (sent 2002-09-24) -- Re: Character Model Comments Clarifications Comment (received 2002-09-24) -- RE: Character Model Comments Clarifications [...] As to the 'one paragraph' comment, my apologies as in my
'cut and pasting' of the WD for our discussion, the paragraphs got lost.
Thus the resulting comment. Our response (sent 2002-10-07) -- RE: Character Model Comments Clarifications Many thanks for the comment above. Unfortunately, this doesn't
really help us understanding your original comment. To make
progress on this issue, can I suggest that you, or somebody
else from the XSL WG, take the original comment
(e.g. at http://www.w3.org/International/Group/2002/charmod-lc/#C140),
and exchange the sentence
'It may be less confusing to have these
requirements separated with a clarifying sentence, breaking these out
under a clarifying context.'
with something more detailed, explaining which requirements
(i.e. some of those cited, all of those cited,...) where to break,
what to clarify in particular, and so on. Comment (received 2002-10-07) -- RE: Character Model Comments Clarifications Martin, the original comment is no longer relevant once the original
text was reviewed based upon your answer. Please close it.
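The Reference Processing Model quoted in the comment above can be sketched in a few lines of Python. This is an illustration under stated assumptions only: the function name is hypothetical, the encoding is assumed to have been determined already, and upper-casing stands in for whatever processing a specification actually defines.

```python
def process_entity(raw: bytes, source_encoding: str,
                   target_encoding: str = "utf-8") -> bytes:
    """Decode to Unicode characters, process, then encode for output."""
    # 1. Interpret the received entity as a sequence of Unicode characters.
    text = raw.decode(source_encoding)
    # 2. All processing operates on characters, never on the original bytes.
    #    (Upper-casing is only a placeholder operation.)
    processed = text.upper()
    # 3. Output uses one of the encodings allowed by the specification.
    return processed.encode(target_encoding)
```

An implementation is free to work directly on bytes internally, as long as the observable result is the same as that of these three steps.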
|
C141 | E | A | N | Mark Scardina
| XSL WG
| 3.7 | XSL WG Comments on Character Model WD
|
C142 | Na | Na | N | Mark Scardina
| XSL WG
| 3.7 | XSL WG Comments on Character Model WD
Decision: Not applicable. Rationale: We have classified this comment as "not applicable", because the comment is too general to give any idea of what is wrong with the document or what we should do to fix it. We note that we think that "Specifications MUST NOT invent a new escaping mechanism if an appropriate one already exists." leaves enough room for new escaping syntaxes should an appropriate one not yet exist.
Our response (sent 2003-02-13) -- Notification
|
C143 | E | R | C | Mark Scardina
| XSL WG
| 4.4 | XSL WG Comments on Character Model WD
|
C144 | Na | Na | O | Mark Scardina
| XSL WG
| 4.4 | XSL WG Comments on Character Model WD
Comment (received 2002-06-28) -- XSL WG Comments on Chairacter Model WD '[S] [I] A text-processing component that receives suspect text MUST
NOT perform any normalization-sensitive operations unless it has first
confirmed through inspection that the text is in normalized form, and
MUST NOT normalize the suspect text . Private agreements MAY, however,
be created within private systems which are not subject to these rules,
but any externally observable results MUST be the same as if the rules
had been obeyed.'
The exception for private agreements is crippled by the observable
results restriction thus when all is said and done any suspect text will
always remain.
Section 4.4 appears to require that XML be changed to disallow the
use of a composing character as the first character in an entity. This
change would be backwards incompatible. XSL WG specifications such as
XSLT and XPath must continue to work with all XML well-formed documents.
Since the contents of an XML text node are 'suspect text' (there
is nothing to prevent use of a composing character as the first
character in a text node), section 4.4 appears to be saying that XPath
must disallow operations such as substring() unless the text is
inspected and found to be normalized. We do not believe that users want
to pay the high cost of this feature.
This comment has been split into the following comments: C187 C188 C189
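To give a rough idea of the inspection discussed in this comment, the following Python sketch uses the standard `unicodedata` module. It is an approximation only: it tests for Unicode Normalization Form C and for a leading character with a non-zero canonical combining class, which is narrower than the Character Model's own definitions of 'normalized' and 'composing character'. Both function names are hypothetical.

```python
import unicodedata


def starts_with_combining(text: str) -> bool:
    """True if the first character has a non-zero canonical combining class."""
    return bool(text) and unicodedata.combining(text[0]) != 0


def is_nfc(text: str) -> bool:
    """True if the text is unchanged by normalization to NFC."""
    return unicodedata.normalize("NFC", text) == text


def passes_inspection(text: str) -> bool:
    """Approximate check before a normalization-sensitive operation."""
    return is_nfc(text) and not starts_with_combining(text)
```

The check is linear in the length of the text, which is the cost the comment refers to for operations such as substring().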
|
C145 | Na | R | C | Mark Scardina
| XSL WG
| 4.4 | XSL WG Comments on Character Model WD
|
C146 | E | R | C | Mark Scardina
| XSL WG
| 4.4 | XSL WG Comments on Character Model WD
|
C147 | E | A | | Mark Scardina
| XSL WG
| 4.4 | XSL WG Comments on Character Model WD
|
C148 | E | A | N | Mark Scardina
| XSL WG
| 8 | XSL WG Comments on Character Model WD
Decision: Accepted. Note that full guidance is given in the newest version of the IRI spec, which is referenced by Section 8. This explicitly does not require %HH escapes to be 'expanded', and therefore does not conflict with current implementation practice for namespace URIs. Our response (sent 2003-02-13) -- Notification
|
C150 | E | R | D | C. M. Sperberg-McQueen
| -
| Various | The term 'UCS' vs. the term 'Unicode'
Decision: Rejected. Rationale for 'Rejected': The word 'Unicode' is almost universally used in this sense, including
by Production 2 of the XML specification. Decision: Review all instances of the word 'Unicode', to ensure they are used consistently. Note: There are two problems with the use of the word 'Unicode' in the Note in section 3.3: a) we mean the Unicode Standard, b) Unicode is not an encoding. Our response (sent 2003-02-12) -- Notification |
C151 | E | P | D | C. M. Sperberg-McQueen
| -
| A.2 | ANSI X3.4 is missing
Decision: Partially accepted. Decision: Cite ISO 646 (International Reference Version), rather than ANSI X3.4, and link to it from the text. Rationale for 'Partially accepted': Where a national and an international standard define the same matter, use of the latter is preferable. Our response (sent 2003-02-12) -- Notification |
C152 | S | A | S | C. M. Sperberg-McQueen
| -
| 3.1.5 | Spanish 'ch' is not a letter sequence
|
C153 | E | A | S | C. M. Sperberg-McQueen
| -
| 3.1.5 | Counting languages
|
C154 | T | A | S | C. M. Sperberg-McQueen
| -
| 3.1.5 | For 'o' read 'ö'
|
C155 | S | A | N | C. M. Sperberg-McQueen
| -
| 3.1.5 | User control of collation, foreign matter
Decision: Accepted. We have replaced
"[S] [I] When sorting and searching in the context of a particular language, it MUST be possible to deal gracefully with strings being compared that contain Unicode characters not normally associated with that language." with "[S] [I] Specifications and implementations of sorting and searching
algorithms SHOULD accommodate all characters in the Unicode set." The change from 'MUST' to 'SHOULD' is not due to this comment, but due to other comments. Our response (sent 2003-02-13) -- Notification |
C156 | E | A | S | C. M. Sperberg-McQueen
| -
| 3.5 | Deriving specs from specs, building specs on specs
|
C157 | S | A | S | C. M. Sperberg-McQueen
| -
| 3.6 | Always reliable identification is a chimaera
|
C158 | E | A | N | C. M. Sperberg-McQueen
| -
| 3.6.2 | Heuristics
See also the following comments: C133 C169
Our response (sent 2002-07-11) -- Re: Heuristics Decision: Accepted. We have added explanatory text as follows: "Examples of heuristics include the use of statistical analysis of byte
(pattern) frequencies or character (pattern) frequencies. Heuristics are bad because they will not work consistently across different implementations. Well-defined instructions of how to unambiguously determine a character encoding, such as those given in XML 1.0 [XML 1.0], Appendix F, are not considered heuristics." Our response (sent 2003-02-13) -- Notification |
C159 | Na | Na | O | C. M. Sperberg-McQueen
| -
| 3.7 | fixed-length escapes
-
Comment (received 2002-07-12) -- fixed-length escapes
In contemplating the rule '[S] Escape syntax SHOULD either require
explicit end delimiters or mandate a fixed number of characters in
each character escape' I am uncertain whether you intend to outlaw the
kinds of escapes defined by section 6.3 of ISO 2022 or not. ISO 2022
defines some fixed-length and some variable-length escape sequences,
in which certain classes of characters are defined as final
characters. These final characters might be viewed as explicit end
delimiters, but they are not solely delimiters. They are part of the
escape sequence and cannot be disregarded in establishing the meaning
of the escape sequence. I don't think I have a strong preference for making escape sequences
of this kind legal or illegal here, but I think it probably needs
to be clearer whether they are legal or not. In the same rule, 'Escape syntaxes where the end is determined by a
character outside the set of characters admissible in the character
escape itself SHOULD be avoided' is a good provision, but at first
glance it seemed to be saying that the terminating semicolon of
entity and character references (which is 'a character outside the
set of characters admissible in the character escape itself') was
being deprecated. I think rephrasing might help, though I have not
been able to draft a better alternative.
Our response (sent 2002-07-12) -- Re: fixed-length escapes Comment (received 2002-07-13) -- Re: fixed-length escapes This comment has been split into the following comments: C180 C181 |
C160 | E | A | S | C. M. Sperberg-McQueen
| -
| 4.1.2 | For 'insure' read 'ensure'
|
C161 | E | A | N | Chris Haynes
| -
| 3.7 | Recommend Unicode for character escapes?
Our response (sent 2002-07-13) -- Re: Recommend Unicode for character escapes? I think this is a typical example of where it was just absolutely
obvious to us, but where it makes a lot of sense to say things
explicitly. I would actually prefer to change the 'should' to
a 'must':
... and MUST represent the Unicode code point of the character. Decision: Accepted. Our response (sent 2003-02-17) -- Notification |
C162 | Na | Na | N | Karl Dubost
| QA WG
| 2 | Conformance
Decision: Not applicable. Rationale: We have classified this comment as 'not applicable' because it does not make any suggestions re. changes of the specification. We have been told that it is inappropriate for a W3C spec to directly enforce requirements on other specifications, and have removed the relevant language from section 2. We still define conformance to CharMod. We have been instructed to request a finding from the TAG corresponding to the text that we removed. So CharMod will be enforced by the fact of being a REC, coupled with an eventual TAG finding and ongoing reviews of relevant specs by the I18N WG. As we understand, many of the requirements on other specs that the QA WG is looking at are much more procedural in nature, whereas the requirements in CharMod are more technical. Therefore, different considerations may apply to your work.
|
C163 | Na | Na | N | Karl Dubost
| QA WG
| 2 | Testable Assertions/Requirements
Our response (sent 2002-06-20) -- Re: QA Review for Charmod Binary tests are very difficult in many cases, or have to be worked
out individually for each spec (e.g. XML, CSS,...). Decision: Not applicable. Rationale: We have classified this as 'Not applicable', because you are just asking about our plans, not suggesting changes to the document. The Character Model is an architectural specification, and it is therefore difficult if not impossible to create binary tests. If we had an automatic test to see whether another specification conforms to the character model, that would indeed be great, but it is obvious that this is impossible. In some cases, tests can be worked out for individual specifications that conform to the character model (e.g. XML, CSS,...), but those would be part of the test suite for that spec. For some aspects of the character model, or some material we reference, there are already tests, e.g. for NFC. Regarding examples and techniques, the text already contains many examples where we found they are necessary to clarify the specification, and we have added more examples as a result of last call comments. We also expect that for passing CR, we will have to provide a list of other specifications that follow the various provisions in Charmod, and such a list will provide a wealth of examples.
|
C164 | E | R | S | Karl Dubost
| QA WG
| 3.1.2 | QA Review for Charmod
Our response (sent 2002-06-20) -- Re: QA Review for Charmod This would be wrong because there are also multiple phonemes - one character
and multiple phonemes - multiple characters. Excluding the one-to-one case
looked easier.
Decision: Rejected. Rationale for 'Rejected': See our earlier response above. Our response (sent 2003-05-01) -- Notification
|
C165 | E | A | N | Karl Dubost
| QA WG
| 3.1.3 | QA Review for Charmod
Our response (sent 2002-06-20) -- Re: QA Review for Charmod I think it can be [S/I/C]. Specifications don't store or interchange
data, but they specify how it's done.
Decision: Accepted. Our response (sent 2003-05-01) -- Notification
|
C166 | E | P | S | Karl Dubost
| QA WG
| 3.1.7 | QA Review for Charmod
|
C167 | E | R | S | Karl Dubost
| QA WG
| Various | QA Review for Charmod
|
C168 | S | A | N | C. M. Sperberg-McQueen
| XML Schema WG
| 3.6 | Reliability of character encoding identification
See also the following comments: C157
Decision: Accepted. We have removed the word 'always'. The intent is to require specifications to make it possible to reliably identify character encodings. Our response (sent 2003-02-13) -- Notification |
C169 | E | A | N | C. M. Sperberg-McQueen
| XML Schema WG
| 3.6.2 | Heuristics considered useful
See also the following comments: C133 C158
Decision: Accepted. We have accepted this comment. We have added explanatory text as follows: "Examples of heuristics include the use of statistical analysis of byte
(pattern) frequencies or character (pattern) frequencies. Heuristics are bad because they will not work consistently across different implementations. Well-defined instructions of how to unambiguously determine a character encoding, such as those given in XML 1.0 [XML 1.0], Appendix F, are not considered heuristics." Our response (sent 2003-02-13) -- Notification |
C170 | S | P | N | C. M. Sperberg-McQueen
| XML Schema WG
| 8 | Converting to RFC-2396-style URIs
See also the following comments: C031 C059
Decision: Partially accepted. Rationale: Our plan is that the IRI
Internet-Draft, referenced in this section, will have been submitted
for Proposed Standard by the time CharMod moves to the next stage (CR).
Conversion from IRIs to URIs is fully addressed in the IRI spec, and
is needed there, and should therefore not be duplicated in charmod. The reference to RFC 2718 is informative only. To make this clearer,
we have moved it out of the actual conformance criterion
(C060)
into a separate sentence reading "This is in accordance with Guidelines
for new URL Schemes [rfc2718] Section 2.2.5.". In any case, that
part of that section speaks about new schemes and things such as
XPointer, not about 'IRI slots' such as anyURI. Our response (sent 2003-02-13) -- Notification |
C171 | S | A | C | C. M. Sperberg-McQueen
| XML Schema WG
| 4 | Early uniform normalization
|
C172 | E | A | S | C. M. Sperberg-McQueen
| -
| 9 | The spelling of ISO's name
|
C173 | Na | Na | W | Rick Jelliffe
| -
| 3.7 | fixed-length escapes
Comment (received 2002-07-13) -- Re: fixed-length escapes [...] To 'escape' a character means to allow it to be used with
a different significance. So in C '\' is the delimiter character,
and '\\' is the escaped delimiter. In XML, the only escaping is provided by CDATA marked
sections (and perhaps by comments and PIs). In SGML, you
could optionally have a 'markup suppression' character that
acted as an escape too. An entity reference is not an 'escape'. To keep on calling
it an 'escape' loses a valuable distinction, and can only
promote confusion, because it lumps together references
and real escapes. Withdrawn.
|
C174 | E | R | S | Chris Lilley
| TAG
| 3.1.3 | Units of visual rendering
Decision: Rejected. Rationale for 'Rejected': We have rejected this comment. Distinct concepts require distinct terms. Logical selection can lead to both contiguous and discontiguous selections. Visual selection also can lead to both contiguous and discontiguous selections if you select from the end of a line to the start of the next. Logical/visual selection indicates the principle by which the program is working; contiguous/discontiguous selection indicates the visible results of this inner working. Our response (sent 2004-01-16) -- Notification
|
C175 | E | R | D | Chris Lilley
| TAG
| 3.1.3 | Units of visual rendering
Decision: Rejected. Rationale: We think that this is already covered by: [S] Protocols, data formats and APIs MUST store, interchange or process
text data in logical order. Our response (sent 2004-01-16) -- Notification
|
C176 | S | R | D | Chris Lilley
| TAG
| 3.1.3 | Units of visual rendering
Decision: Rejected. Rationale: We rejected this comment. What if the paste is into a bitmap editor? Also, if it's a visual selection, then copying/pasting should paste the characters selected in the visual selection, rather than those in a corresponding logical selection between the same end points.
Our response (sent 2004-01-16) -- Notification
|
C177 | E | R | S | Chris Lilley
| TAG
| 3.1.3 | Units of visual rendering
Decision: Rejected. Rationale: The original paragraph is about visual
selection, not logical. Visual selection requires discontiguous logical
ranges and the requirement is for protocols and APIs to provide the
latter. Our response (sent 2004-01-16) -- Notification
|
C178 | S | R | D | Chris Lilley
| TAG
| 3.1.3 | Units of visual rendering
Decision: Rejected. Rationale: First, visual storage and visual selection are independent of each other. We think it's important that protocols and APIs SHOULD support discontiguous logical ranges so that implementations
MAY implement visual selection if they wish. This is in particular relevant for technologies such as XPointer. We do not think that this will lead to the use of visual ordering inside the selection. In situations such as cut/paste without special support, the visual selection is usually copied as a sequence of segments, all internally in logical order. The sequence of segments and other things may be implementation-dependent, and in advanced applications, the overall result may depend on where the insertion is made. Our response (sent 2004-01-16) -- Notification
|
C179 | E | A | S | Chris Lilley
| TAG
| 3.1.3 | Units of visual rendering
Decision: Accepted. We have mentioned accessibility (and interoperability in general) just before "[S] Protocols, data formats and APIs MUST store, interchange or process text data in logical order."
Our response (sent 2004-01-16) -- Notification
|
C180 | E | R | N | C. M. Sperberg-McQueen
| -
| 3.7 | fixed-length escapes
Our response (sent 2002-07-12) -- Re: fixed-length escapes Decision: Rejected. Rationale: We do not think it is necessary to explicitly exclude this kind of escape sequence, because we do not think that anybody would actually want to use anything like this. There is an amazingly wide variety of escape sequence syntaxes, but we have never seen anything that even gets close. While completely distinguishing good and bad escape syntaxes has some appeal, we want to keep a certain practical touch to our document and want to keep it readable, and want to give the reader enough breathing room that they can actually think about the issues at hand (because they need to; the Character Model cannot just be applied mechanically).
Our response (sent 2003-02-13) -- Notification |
C181 | E | A | N | C. M. Sperberg-McQueen
| -
| 3.7 | fixed-length escapes
See also the following comments: C106
Decision: Accepted. We have replaced
"Escape syntaxes where the end is determined by a character outside the set of characters admissible in the character escape itself SHOULD be avoided." with "Escape syntaxes where the end is determined by any character outside the set of characters admissible in the character escape itself SHOULD be avoided." Although this change is minimal, it should now be clear that this refers to cases where almost any arbitrary character can terminate an escape. Strictly speaking, the ';' in the examples is part of the escape (part of the text that gets replaced), whereas in other cases the terminating character itself is not replaced (old octal notations often work that way). Our response (sent 2003-02-13) -- Notification |
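The concern behind the wording in C181 can be shown with a small Python sketch. It compares an escape with an explicit ';' terminator against a hypothetical open-ended syntax (a backslash, "x", then hex digits) that ends at the first non-hex character. Both function names and the open-ended syntax are assumptions made for this illustration and are not taken from the Character Model.

```python
import re


def expand_delimited(text: str) -> str:
    # Escapes of the form &#xHHHH; where the ';' is part of the escape
    # and marks its end explicitly.
    return re.sub(r"&#x([0-9A-Fa-f]+);",
                  lambda m: chr(int(m.group(1), 16)), text)


def expand_open_ended(text: str) -> str:
    # Hypothetical syntax: a backslash, "x", then hex digits; the escape
    # ends at the first character that is not a hex digit.
    return re.sub(r"\\x([0-9A-Fa-f]+)",
                  lambda m: chr(int(m.group(1), 16)), text)


# In both cases the author intends U+00E9 followed by the letters "at".
delimited = expand_delimited("&#xE9;at")
open_ended = expand_open_ended("\\xE9at")
```

With the explicit terminator the result is U+00E9 followed by "at". In the open-ended form the "a" is itself a hex digit, so it is absorbed into the escape, which yields U+0E9A followed by "t".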
C182 | Na | Na | N | Chris Lilley
| TAG
| 3.6.2 | Character encoding identification
Answer: Yes. Decision: We have classified this as "Not applicable", because it was a question. Our answer is "yes". This should be understood in light of our comments to C118. It is not meant to change the rules of specific existing formats or protocols, but to give guidance to new formats or protocols. Our response (sent 2004-01-16) -- Notification
|
C183 | E | P | N | Chris Lilley
| TAG
| 3.6.2 | Character encoding identification
Decision: Rejected. Rationale: We have rejected this comment, because this is already mentioned. But as a result of other editing, the relevant note is now in a very prominent position just after the opening paragraph. If you think this is not enough, please provide concrete suggestions on what you think is missing. Our response (sent 2004-01-16) -- Notification
|
C184 | Na | Na | N | Chris Lilley
| TAG
| 3.6.2 | Character encoding identification
Decision: Not applicable. Rationale: We decided to reject this comment, in the sense that we are not dealing with this issue in the current version. However, we will note this issue for a possible future version of the document. We would like to point out that we do not introduce layering, we just point out that it exists. On the specific point of RFC 3023, we can agree that some adjustments may be needed, but we think that this is for the IETF process to decide. Going as far as disallowing a charset parameter in a protocol does not seem appropriate, because it would restrict implementation and deployment too much. In general, saying something like "don't allow too many ways of specifying the character encoding" seems like a good idea, but it is too general to be helpful for actual specification designers, and providing more detailed advice and examples seems difficult at this point.
|
C185 | E | A | N | Chris Lilley
| TAG
| 3.6.2 | Character encoding identification
Decision: Accepted. We added a note.
|
C186 | S | A | N | Deborah Goldsmith
| Apple
| 4.4 | Apple comments on Character Model
|
C187 | S | P | C | Mark Scardina
| XSL WG
| 4.4 | XSL WG Comments on Character Model WD
|
C188 | Na | N | C | Mark Scardina
| XSL WG
| 4.4 | XSL WG Comments on Character Model WD
|
C189 | Na | R | C | Mark Scardina
| XSL WG
| 4.4 | XSL WG Comments on Character Model WD
|
C191 | S | A | N | Dan Chiba
| -
| 3.6.2 | 'x-' prefix on charset names
Our response (sent 2002-10-02) -- Re: 'x-' prefix on charset names What you are saying is that you are in a situation where you can't respect the SHOULD in the first sentence nor the SHOULD NOT in the second sentence, and it's unclear which one of them is stronger. I propose that we have a look at this in the WG and make clear that in such a case, using x- is better than not using x-.
Decision: Accepted. We added: "C023[S][I][C] If an unregistered character encoding is used, the convention of using 'x-' at the beginning of the name MUST be followed."
|
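The rule added for C191 can be sketched as a simple check in Python. The set `REGISTERED` is a tiny illustrative subset standing in for the real charset registry, and the function name `follows_x_convention` is hypothetical. Names are compared case-insensitively.

```python
# Illustrative subset only; a real check would consult the full registry.
REGISTERED = frozenset({"utf-8", "iso-8859-1", "shift_jis"})


def follows_x_convention(name: str, registered=REGISTERED) -> bool:
    """Return True if name is registered, or is unregistered but starts with 'x-'."""
    lowered = name.lower()
    if lowered in registered:
        return True
    return lowered.startswith("x-")
```

For example, `follows_x_convention("x-my-encoding")` is true, while `follows_x_convention("my-encoding")` is false because the name is neither in the registered set nor prefixed with 'x-'.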
Impact: red cell if still not assigned.
Decision: red cell if still not assigned.
Status: red cell if still not assigned.
Status: orange cell if Closed but needs moving to Notified.
Status: yellow cell if Notified but needs moving to one of Satisfied, Dissatisfied, or Withdrawn.