2021-11-08 - Indirect Strings (IST) API


1. Background
-------------

When parsing traffic, most of the standard C string functions are unusable
since they rely on a trailing zero. In addition, for the rare ones that support
a length, we have to constantly maintain both the pointer and the length. But
then, it's easy to end up with complex length and offset calculations all
over the place, making the code hard to read and bugs hard to avoid or spot.

IST addresses this by defining a structure made of exactly two word-sized
elements, which most C ABIs know how to pass in registers when it is used as a
function argument or as a function's return value. The functions are inlined to
leave the compiler as many opportunities as possible for optimization and
expression reduction, and as a result they are often inexpensive to use. It is
important however to keep in mind that all of these are designed for minimal
code size when dealing with short strings (e.g. parsing tokens in protocols),
and they are not optimal for processing large blocks.


2. API description
------------------

An IST is defined like this:

  struct ist {
          char  *ptr;  // pointer to the string's first byte
          size_t len;  // number of valid bytes starting from ptr
  };

A string is not set if its .ptr member is NULL. In this case .len carries no
meaning, and it is recommended to set it to zero.
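
The difference between an unset string and an empty one can be sketched as
follows. This is only an illustration: the struct, IST_NULL, ist2() and
isttest() below are simplified local stand-ins written so that the block
compiles on its own (the real definitions may differ in detail), and
demo_state() is a hypothetical helper that is not part of the API.

```c
#include <stddef.h>

/* Local stand-ins so that this sketch compiles on its own. They follow
 * the descriptions given in this document, not the actual header.
 */
struct ist {
    char  *ptr;
    size_t len;
};

#define IST_NULL ((struct ist){ .ptr = NULL, .len = 0 })

static inline struct ist ist2(const void *ptr, size_t len)
{
    return (struct ist){ .ptr = (char *)ptr, .len = len };
}

static inline int isttest(struct ist i)
{
    return i.ptr != NULL;
}

/* Hypothetical helper: tell an unset string from an empty one. */
static const char *demo_state(struct ist i)
{
    if (!isttest(i))
        return "unset";      /* .ptr is NULL, .len must not be relied on */
    return i.len ? "non-empty" : "empty";
}
```

With these stand-ins, demo_state(IST_NULL) reports "unset" while
demo_state(ist2("", 0)) reports "empty": an empty string still has a valid
pointer, only its length is zero.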

Declaring a function returning an IST:

  struct ist produce_ist(int ok)
  {
      return ok ? IST("OK") : IST("KO");
  }

Declaring a function consuming an IST:

  void say_ist(struct ist i)
  {
      write(1, istptr(i), istlen(i));
  }

Chaining the two:

  void say_ok(int ok)
  {
      say_ist(produce_ist(ok));
  }

Notes:
  - the arguments are passed by value, not by reference, so there's no need
    for any "const" in their declaration (except to catch coding mistakes).
    Pointers to ist may benefit from being marked "const" however.

  - similarly for the return value, there's no point in marking it "const" as
    this would protect the pointer and length, not the data.

  - use ist0() to append a trailing zero to a variable string for use with
    printf()'s "%s" format, or for use with functions that work on NUL-
    terminated strings, but be careful never to do this with constants, since
    it writes one byte past the end of the string.

  - the API provides a starting pointer and current length, but does not
    provide an allocated size. It remains up to the caller to know how large
    the allocated area is when adding data, though most functions make this
    easy.
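
The last two notes can be illustrated with a small sketch, assuming a
caller-owned writable buffer whose size is known. The struct, ist2() and
ist0() below are simplified local stand-ins so that the block compiles on its
own, and demo_to_cstr() is a hypothetical helper, not part of the API. It
shows the caller checking the allocated size itself before letting ist0()
write the trailing zero.

```c
#include <stddef.h>
#include <string.h>

/* Local stand-ins following the descriptions in this document. */
struct ist {
    char  *ptr;
    size_t len;
};

static inline struct ist ist2(const void *ptr, size_t len)
{
    return (struct ist){ .ptr = (char *)ptr, .len = len };
}

/* Writes a zero right after the last byte: the area must be writable
 * and at least len + 1 bytes large, which rules out constants.
 */
static inline char *ist0(struct ist i)
{
    i.ptr[i.len] = 0;
    return i.ptr;
}

/* Hypothetical helper: copy <src> into <buf> of <size> bytes and return
 * it as a NUL-terminated string, or NULL if the data plus the trailing
 * zero do not fit. The ist does not know <size>, so the caller checks.
 */
static char *demo_to_cstr(char *buf, size_t size, struct ist src)
{
    if (src.len >= size)
        return NULL;
    if (src.len)
        memcpy(buf, src.ptr, src.len);
    return ist0(ist2(buf, src.len));
}
```

For example, with "char buf[8]", demo_to_cstr(buf, sizeof(buf),
ist2("hello world", 5)) leaves "hello" in buf, ready for a "%s" format,
without ever writing to the constant source string.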

The following macros and functions are defined. Those whose name starts with
underscores require special care and must not be used without being certain
they are properly used (typically subject to buffer overflows if misused). Note
that most functions were added over time depending on instant needs, and some
are very close to each other. Many useful functions are still missing and would
deserve being added.

Below, arguments "i1", "i2" are all of type "ist". Arguments "s" are
NUL-terminated strings of type "char*", and "cs" are of type "const char *".
Arguments "c" are of type "char", and "n" are of type size_t.

  IST(cs):ist            make constant IST from a NUL-terminated const string
  IST_NULL:ist           return an unset IST = ist2(NULL,0)
  __istappend(i1,c):ist  append character <c> at the end of ist <i1>
  ist(s):ist             return an IST from a nul-terminated string
  ist0(i1):char*         write a \0 at the end of an IST, return the string
  ist2(cs,l):ist         return a variable IST from a const string and length
  ist2bin(s,i1):ist      copy IST into a buffer, return the result
  ist2bin_lc(s,i1):ist   like ist2bin() but turning to lower case
  ist2bin_uc(s,i1):ist   like ist2bin() but turning to upper case
  ist2str(s,i1):ist      copy IST into a buffer, add NUL and return the result
  ist2str_lc(s,i1):ist   like ist2str() but turning to lower case
  ist2str_uc(s,i1):ist   like ist2str() but turning to upper case
  ist_find(i1,c):ist     return first occurrence of char <c> in <i1>
  ist_find_ctl(i1):char* return pointer to first CTL char in <i1> or NULL
  ist_skip(i1,c):ist     return first occurrence of char not <c> in <i1>
  istadv(i1,n):ist       advance the string by <n> characters
  istalloc(n):ist        return allocated string of zero initial length
  istcat(d,s,n):ssize_t  copy <s> after <d> for <n> chars max, return len or -1
  istchr(i1,c):char*     return pointer to first occurrence of <c> in <i1>
  istclear(i1*):size_t   return previous size and set size to zero
  istcpy(d,s,n):ssize_t  copy <s> over <d> for <n> chars max, return len or -1
  istdiff(i1,i2):int     return the ordinal difference, like strcmp()
  istdup(i1):ist         allocate new ist and copy original one into it
  istend(i1):char*       return pointer to first character after the IST
  isteq(i1,i2):int       return non-zero if strings are equal
  isteqi(i1,i2):int      like isteq() but case-insensitive
  istfree(i1*)           free allocated <i1> or IST_NULL and set it to IST_NULL
  istissame(i1,i2):int   return true if pointers and lengths are equal
  istist(i1,i2):ist      return first occurrence of <i2> in <i1>
  istlen(i1):size_t      return the length of the IST (number of characters)
  istmatch(i1,i2):int    return non-zero if i1 starts like i2 (empty OK)
  istmatchi(i1,i2):int   like istmatch() but case insensitive
  istneq(i1,i2,n):int    like isteq() but limited to the first <n> chars
  istnext(i1):ist        return the IST advanced by one character
  istnmatch(i1,i2,n):int like istmatch() but limited to the first <n> chars
  istpad(s,i1):ist       copy IST into a buffer, add a NUL, return the result
  istptr(i1):char*       return the starting pointer of the IST
  istscat(d,s,n):ssize_t same as istcat() but always place a NUL at the end
  istscpy(d,s,n):ssize_t same as istcpy() but always place a NUL at the end
  istshift(i1*):char     return the first character and advance the IST by one
  istsplit(i1*,c):ist    return part before <c>, make ist start from <c>
  iststop(i1,c):ist      truncate ist before first occurrence of <c>
  isttest(i1):int        return true if ist is not NULL, false otherwise
  isttrim(i1,n):ist      return ist trimmed to no more than <n> characters
  istzero(i1,n):ist      trim to <n> chars, trailing zero included
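
As an illustration of how a few of these functions chain together, here is a
minimal sketch that splits a "key=value" token using iststop() and istadv().
The definitions below are simplified local stand-ins written from the one-line
descriptions above so that the block compiles on its own; their behaviour at
the edges (clamping in istadv(), unchanged result when the character is absent
in iststop()) is an assumption. demo_split_kv() is a hypothetical helper, not
part of the API.

```c
#include <stddef.h>
#include <string.h>

/* Simplified local stand-ins, not the actual header. */
struct ist {
    char  *ptr;
    size_t len;
};

#define IST(str) ((struct ist){ .ptr = str "", .len = sizeof(str "") - 1 })

static inline struct ist ist2(const void *ptr, size_t len)
{
    return (struct ist){ .ptr = (char *)ptr, .len = len };
}

/* Advance by <n> characters; clamping at the end is an assumption. */
static inline struct ist istadv(struct ist i, size_t n)
{
    if (n > i.len)
        n = i.len;
    return ist2(i.ptr + n, i.len - n);
}

/* Truncate before the first <c>; unchanged if <c> is absent (assumed). */
static inline struct ist iststop(struct ist i, char c)
{
    size_t len = 0;

    while (len < i.len && i.ptr[len] != c)
        len++;
    return ist2(i.ptr, len);
}

static inline int isteq(struct ist a, struct ist b)
{
    return a.len == b.len && (!a.len || memcmp(a.ptr, b.ptr, a.len) == 0);
}

/* Hypothetical helper: split "key=value" into its two parts without
 * copying anything. Returns 1 on success, 0 if there is no '='.
 */
static int demo_split_kv(struct ist line, struct ist *key, struct ist *val)
{
    *key = iststop(line, '=');
    if (key->len == line.len)
        return 0;                       /* no delimiter found */
    *val = istadv(line, key->len + 1);  /* skip the key and the '=' */
    return 1;
}
```

Both results point into the original data, so no buffer is allocated and no
trailing zero is needed; the only length arithmetic left is the "+ 1" that
skips the delimiter.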


3. Quick index by typical C construct or function
-------------------------------------------------

Some common C constructs may be adjusted to use ist instead. The mapping is not
always one-to-one, but usually the computations on the length part tend to
disappear in the refactoring, which allows function calls to be chained
directly. The entries below are hints to figure out which function to look for
in order to rewrite some common use cases.

  char*                  IST equivalent

  strchr()               istchr(), ist_find(), iststop()
  strstr()               istist()
  strcpy()               istcpy()
  strscpy()              istscpy()
  strlcpy()              istscpy()
  strcat()               istcat()
  strscat()              istscat()
  strlcat()              istscat()
  strcmp()               istdiff()
  strdup()               istdup()
  !strcmp()              isteq()
  !strncmp()             istneq(), istmatch(), istnmatch()
  !strcasecmp()          isteqi()
  !strncasecmp()         istneqi(), istmatchi()
  strtok()               istsplit()
  return NULL            return IST_NULL
  s = malloc()           s = istalloc()
  free(s); s = NULL      istfree(&s)
  p != NULL              isttest(p)
  c = *(p++)             c = istshift(&p)
  *(p++) = c             __istappend(p, c)
  p += n                 istadv(p, n)
  p + strlen(p)          istend(p)
  p[max] = 0             isttrim(p, max)
  p[max+1] = 0           istzero(p, max)
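
To give an idea of what such a rewrite may look like, the sketch below shows
the same prefix-skipping logic written once with plain C strings and once with
ist, using the "!strncmp()" and "p += n" rows of the table. The ist
definitions are simplified local stand-ins so that the block compiles on its
own (the real ones may differ in detail), and both demo_skip_scheme_*()
functions are hypothetical helpers.

```c
#include <stddef.h>
#include <string.h>

/* Simplified local stand-ins, not the actual header. */
struct ist {
    char  *ptr;
    size_t len;
};

#define IST(str) ((struct ist){ .ptr = str "", .len = sizeof(str "") - 1 })

static inline struct ist ist2(const void *ptr, size_t len)
{
    return (struct ist){ .ptr = (char *)ptr, .len = len };
}

/* Advance by <n> characters; clamping at the end is an assumption. */
static inline struct ist istadv(struct ist i, size_t n)
{
    if (n > i.len)
        n = i.len;
    return ist2(i.ptr + n, i.len - n);
}

static inline int isteq(struct ist a, struct ist b)
{
    return a.len == b.len && (!a.len || memcmp(a.ptr, b.ptr, a.len) == 0);
}

/* Non-zero if <i1> starts like <i2>; an empty <i2> always matches. */
static inline int istmatch(struct ist i1, struct ist i2)
{
    if (i1.len < i2.len)
        return 0;
    return !i2.len || memcmp(i1.ptr, i2.ptr, i2.len) == 0;
}

/* Plain C string version: the literal and its length (7) are both
 * maintained by hand, and <s> must be NUL-terminated.
 */
static const char *demo_skip_scheme_cstr(const char *s)
{
    if (!strncmp(s, "http://", 7))
        s += 7;
    return s;
}

/* IST version: the length travels with the string, so no constant has
 * to be kept in sync and <s> does not need a trailing zero.
 */
static struct ist demo_skip_scheme_ist(struct ist s)
{
    const struct ist scheme = IST("http://");

    if (istmatch(s, scheme))
        s = istadv(s, scheme.len);
    return s;
}
```

The ist version also works on a string taken from the middle of a buffer,
which the strncmp() version can only do safely if a zero is known to follow.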