* ABOUT BUGS

Before reporting a bug, please check the list of known bugs
and the list of oft-reported non-bugs (below).

Bugs and comments may be sent to bonzini@gnu.org; please
include in the Subject: header the first line of the output of
``sed --version''.

Please do not send a bug report like this:

	[while building frobme-1.3.4] 
	$ configure 
	sed: file sedscr line 1: Unknown option to 's'

If sed doesn't configure your favorite package, take a few extra
minutes to identify the specific problem and make a stand-alone test
case.

A stand-alone test case includes all the data necessary to perform the
test, and the specific invocation of sed that causes the problem.  The
smaller a stand-alone test case is, the better.  A test case should
not involve something as far removed from sed as ``try to configure
frobme-1.3.4''.  Yes, that is in principle enough information to look
for the bug, but that is not a very practical prospect.
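
For illustration only, a stand-alone test case might look like the
sketch below.  The input and the script are hypothetical placeholders,
not a real bug; the point is that the data, the exact invocation and
the expected result all fit in a few lines.

```shell
# The first line of this output belongs in the Subject: header.
sed --version | head -n 1

# Everything needed to reproduce the problem: the input, the exact
# sed command, and the expected output next to the actual output.
actual=$(printf 'hello world\n' | sed 's/hello/goodbye/')
expected='goodbye world'
echo "expected: $expected"
echo "actual:   $actual"
```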



* NON-BUGS

`N' command on the last line

  Most versions of sed exit without printing anything when the `N'
  command is issued on the last line of a file.  GNU sed instead
  prints the pattern space before exiting, unless of course the `-n'
  command-line switch has been specified.  More information on the
  reason behind this choice can be found in the Info manual.
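
  A minimal sketch of the difference, assuming GNU sed is the `sed'
  on the PATH and POSIXLY_CORRECT is not set.  The input has an odd
  number of lines, so `N' finds no next line when it runs on line 3.

```shell
# Join lines in pairs; the third line has no partner.
n_out=$(printf '1\n2\n3\n' | sed 'N;s/\n/+/')
printf '%s\n' "$n_out"
# GNU sed prints "1+2" and then "3"; seds that exit silently
# would drop the final "3".
```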


regex syntax clashes (problems with backslashes)

  sed uses the POSIX basic regular expression syntax.  According to
  the standard, the meaning of some escape sequences is undefined in
  this syntax; notable in the case of GNU sed are `\|', `\+', `\?',
  `\`', `\'', `\<', `\>', `\b', `\B', `\w', and `\W'.

  As in all GNU programs that use POSIX basic regular expressions, sed
  interprets these escape sequences as meta-characters.  So, `x\+'
  matches one or more occurrences of `x', and `abc\|def' matches
  either `abc' or `def'.
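
  The two examples above can be tried directly; this sketch assumes
  GNU sed, since other seds may treat these escapes differently.

```shell
# `x\+' matches the whole run of x characters.
plus_out=$(echo 'xxxy' | sed 's/x\+/Z/')
echo "$plus_out"        # Zy

# `abc\|def' matches either alternative.
alt_out=$(echo 'abc def' | sed 's/abc\|def/Z/g')
echo "$alt_out"         # Z Z
```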

  This syntax may cause problems when running scripts written for other
  seds.  Some sed programs have been written with the assumption that
  `\|' and `\+' match the literal characters `|' and `+'.  Such scripts
  must be modified by removing the spurious backslashes if they are to
  be used with recent versions of sed (not only GNU sed).

  On the other hand, some scripts use `s|abc\|def||g' to remove occurrences
  of _either_ `abc' or `def'.  While this worked until sed 4.0.x, newer
  versions interpret this as removing the string `abc|def'.  This is
  again undefined behavior according to POSIX, but the newer
  interpretation is arguably more robust.  The older one, for example,
  required the regex matcher to parse `\/' as `/' in the common case
  of escaping a slash, which is itself undefined behavior.  The new
  behavior avoids relying on this, which is good because the regex
  matcher is only partially under our control.
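
  As a sketch of the newer behavior, assuming a GNU sed later than
  4.0.x: the same regex gives different results depending on whether
  `|' is also the delimiter of the `s' command.  The replacement `X'
  is used here only to make the matches visible.

```shell
# With `|' as the delimiter, `\|' is an escaped delimiter, so the
# regex is the literal string `abc|def'.
bar_out=$(echo 'abc def abc|def' | sed 's|abc\|def|X|g')
echo "$bar_out"         # abc def X

# With `/' as the delimiter, `\|' is the alternation operator.
slash_out=$(echo 'abc def abc|def' | sed 's/abc\|def/X/g')
echo "$slash_out"       # X X X|X
```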

  In addition, GNU sed supports several escape sequences (some of
  which take further characters as an argument) to insert
  non-printable characters in scripts (`\a', `\c', `\d', `\o', `\r',
  `\t', `\v', `\x').  These can cause similar problems with scripts
  written for other seds.
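
  Two of these escapes in use, as a sketch assuming GNU sed; a sed
  without these extensions may instead look for a literal `t' or `x'.

```shell
# `\t' stands for a tab character.
tab_out=$(printf 'a\tb\n' | sed 's/\t/,/')
echo "$tab_out"         # a,b

# `\x41' stands for the character with hexadecimal code 41, `A'.
hex_out=$(echo 'A' | sed 's/\x41/B/')
echo "$hex_out"         # B
```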


-i clobbers read-only files

  In short, `sed d -i' will let one delete the contents of
  a read-only file, and in general the `-i' option will let
  one clobber protected files.  This is not a bug, but rather a
  consequence of how the Unix filesystem works.

  The permissions on a file say what can happen to the data
  in that file, while the permissions on a directory say what can
  happen to the list of files in that directory.  `sed -i'
  never opens for writing a file that is already on disk; rather,
  it works on a temporary file that is finally renamed to the
  original name.  If you rename or delete files, you're actually
  modifying the contents of the directory, so the operation depends on
  the permissions of the directory, not of the file.  For this same
  reason, sed will not let one use `-i' on a writable file in a
  read-only directory (but unbelievably nobody reports that as a
  bug...).
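
  The effect can be reproduced in a scratch directory.  This is a
  sketch assuming GNU sed and the usual `mktemp' utility; the file
  name and its contents are made up for the example.

```shell
dir=$(mktemp -d)
printf 'precious data\n' > "$dir/file"
chmod 444 "$dir/file"       # read-only file, writable directory

# Succeeds: sed writes a temporary file and renames it over the
# original, which only needs write permission on the directory.
sed -i d "$dir/file"

size=$(wc -c < "$dir/file")
echo "bytes left: $size"
rm -rf "$dir"
```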


`0a' does not work (gives an error)

  There is no line 0.  0 is a special address that is only used to treat
  addresses like `0,/RE/' as active when the script starts.  If you
  write `1,/abc/d' and the first line includes the word `abc', then
  that match is ignored, because address ranges must span at least
  two lines (barring the end of the file).  What you probably wanted is
  to delete every line up to the first one including `abc', and this
  is obtained with `0,/abc/d'.
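
  A short sketch of the two forms side by side, assuming GNU sed
  (the `0,/RE/' address is a GNU extension).  The first input line
  already contains `abc'.

```shell
# `1,/abc/' ignores the match on line 1 and runs to the next one,
# so lines 1 through 3 are deleted.
one_out=$(printf 'abc\nfoo\nabc\nbar\n' | sed '1,/abc/d')
printf '%s\n' "$one_out"    # bar

# `0,/abc/' may end on line 1, so only that line is deleted.
zero_out=$(printf 'abc\nfoo\nabc\nbar\n' | sed '0,/abc/d')
printf '%s\n' "$zero_out"   # foo, abc, bar (one per line)
```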


`[a-z]' is case insensitive

  You are encountering problems with locales.  POSIX mandates that `[a-z]'
  uses the current locale's collation order -- in C parlance, that means
  strcoll(3) instead of strcmp(3).  Some locales have a case insensitive
  strcoll, others don't: one of those that have problems is Estonian.

  Another problem is that `[a-z]' tries to use collation symbols.  This
  only happens if you are on the GNU system, using GNU libc's regular
  expression matcher instead of compiling the one supplied with GNU sed.
  In a Danish locale, for example, the regular expression `^[a-z]$'
  matches the string `aa', because `aa' is a single collating symbol
  that comes after `a' and before `b'; `ll' behaves similarly in Spanish
  locales, as does `ij' in Dutch locales.

  To work around these problems, which may cause bugs in shell scripts,
  set the LC_ALL environment variable to `C', or set the locale on a
  more fine-grained basis with the other LC_* environment variables.
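
  A sketch of the workaround for a single invocation: setting LC_ALL
  on the command line affects only that sed process.  The sample
  input is made up for the example.

```shell
# In the C locale, `[a-z]' covers exactly the lowercase ASCII
# letters, so neither `B' nor the two-character `aa' matches.
c_out=$(printf 'a\nB\naa\n' | LC_ALL=C sed -n '/^[a-z]$/p')
printf '%s\n' "$c_out"      # a
```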