xmlformat bugs

The Ruby version is slower than the Perl version, and the difference is
most noticeable with larger documents. I haven't done any profiling to
determine which parts of the program account for the difference.

----------
Within normalized elements, the line-wrapping algorithm preserves inline
tags, but it doesn't take any internal line-breaks within those tags into
account in its line-length calculations.  Consider this element:

<para>
This is text with an inline <inline-element attr="This is
an attribute
value">element</inline-element> in the middle.
</para>

The opening <inline-element> tag is considered to have a length
equal to its total number of characters.  The line-wrapping algorithm
could be made more complex to take the line-breaks into account.
I haven't bothered, and may never bother.  If such a change is made, no
indenting should be applied that would change the attribute value.
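
To make the difference concrete, here is a minimal Python sketch (not
xmlformat's own code, which is Perl and Ruby) that compares the length the
wrapper is described as using with the widths the tag actually occupies on
each physical line. The function names flat_length and physical_widths are
hypothetical, chosen only for this illustration.

```python
# Illustration only: these helpers are hypothetical and are not part of
# xmlformat. They contrast two ways of measuring a tag that contains
# line breaks inside an attribute value.

def flat_length(token: str) -> int:
    """Length as described above: every character counts, newlines included."""
    return len(token)


def physical_widths(token: str) -> list[int]:
    """Width of each physical line that the token occupies."""
    return [len(part) for part in token.split("\n")]


# The opening tag from the example element above.
tag = '<inline-element attr="This is\nan attribute\nvalue">'

total = flat_length(tag)        # one number for the whole tag
widths = physical_widths(tag)   # one number per physical line

print(total, widths)
```

A wrapper that took line breaks into account would presumably care only
about the first and last of those widths, since they are the parts that
share a line with the surrounding text. Note that the attribute value itself
is left untouched here, in keeping with the constraint stated above.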

----------
Line-wrapping length calculations don't take into account the possibility
that text from a different element may occur on the same line if break
values are set to zero.  For example, with a wrap-length of 15, you could
end up with output like this:

<listitem><para>This is a line
of text.</para></listitem><listitem><para>This is a line
of text.</para></listitem>

The middle line has more than 15 characters of text.

This also shows that wrap-length doesn't take into account the length of
tags of enclosing blocks on the same line, which can also be considered a
bug.

Workaround: Set all of your break values to a value greater than zero.
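
The effect can be reproduced with a short Python sketch. It uses the
standard textwrap module as a stand-in for xmlformat's wrapping (an
assumption; the real algorithm differs in detail), wraps each paragraph in
isolation, and then joins the elements with no breaks between them. The
helper names wrap_para and text_width are hypothetical.

```python
import re
import textwrap

WRAP_LENGTH = 15
TAG = re.compile(r"<[^>]*>")


def wrap_para(text, width=WRAP_LENGTH):
    # Each paragraph is wrapped on its own, as if it began on a fresh line.
    return "\n".join(textwrap.wrap(text, width=width))


def text_width(line):
    # Count text characters only; tags are stripped before measuring.
    return len(TAG.sub("", line))


items = ["This is a line of text."] * 2

# Break values of zero: nothing separates one element from the next.
output = "".join(
    f"<listitem><para>{wrap_para(t)}</para></listitem>" for t in items
)

widths = [text_width(line) for line in output.split("\n")]
print(output)
print(widths)
```

Each paragraph respects the limit when measured alone, but the middle line
of the joined output carries the tail of the first paragraph and the head of
the second, so its text exceeds the wrap length.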

----------
Normalization converts runs of spaces to single spaces.  This means that
if you write two spaces after periods in text, normalization will
convert them to single spaces. Even if normalization didn't do that,
the two spaces would be lost if line-wrapping is enabled and they occur
at a line break.

I don't know if this is really a bug, but it's something to be aware of.
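
As a rough illustration of the space-collapsing behavior, here is a Python
sketch. It covers only runs of spaces, as described above; how xmlformat
treats other whitespace during normalization is not shown, and the function
name collapse_spaces is hypothetical.

```python
import re


def collapse_spaces(text):
    # Replace every run of two or more spaces with a single space.
    return re.sub(r" {2,}", " ", text)


before = "First sentence.  Second sentence."
after = collapse_spaces(before)
print(repr(after))
```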

----------
xmlformat doesn't recognize files in multi-byte encodings. In some cases,
you can work around this. For example, an editor might save a file in a
Unicode encoding even when the document contains only ASCII characters.
Re-save the file as ASCII (or in a single-byte encoding such as
ISO-8859-1) and it should work.
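
If re-saving from an editor is inconvenient, the conversion can be scripted.
The following Python sketch assumes the source file is UTF-16 (an
assumption; check what your editor actually wrote) and that it contains only
characters representable in the target encoding. The function name
resave_single_byte is hypothetical.

```python
from pathlib import Path


def resave_single_byte(src, dst, src_encoding="utf-16", dst_encoding="ascii"):
    # Decode the multi-byte input. Python's "utf-16" codec uses the
    # byte-order mark, if present, to pick the byte order.
    text = Path(src).read_bytes().decode(src_encoding)
    # Encoding raises UnicodeEncodeError if a character does not fit the
    # target encoding, so nothing is silently lost.
    Path(dst).write_bytes(text.encode(dst_encoding))
```

For example, resave_single_byte("doc-utf16.xml", "doc-ascii.xml") writes an
ASCII copy that can then be passed to xmlformat. If the document has an XML
declaration that names the old encoding, it needs to be updated by hand;
this sketch does not touch it.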