From da76459dc21b5af2449af2d36eb95226cb186ce2 Mon Sep 17 00:00:00 2001 From: Daniel Baumann Date: Sun, 28 Apr 2024 11:35:11 +0200 Subject: Adding upstream version 2.6.12. Signed-off-by: Daniel Baumann --- doc/internals/api/filters.txt | 1186 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 1186 insertions(+) create mode 100644 doc/internals/api/filters.txt (limited to 'doc/internals/api/filters.txt') diff --git a/doc/internals/api/filters.txt b/doc/internals/api/filters.txt new file mode 100644 index 0000000..eee74cf --- /dev/null +++ b/doc/internals/api/filters.txt @@ -0,0 +1,1186 @@ + ----------------------------------------- + Filters Guide - version 2.5 + ( Last update: 2021-02-24 ) + ------------------------------------------ + Author : Christopher Faulet + Contact : christopher dot faulet at capflam dot org + + +ABSTRACT +-------- + +The filters support is a new feature of HAProxy 1.7. It is a way to extend +HAProxy without touching its core code and, in certain extent, without knowing +its internals. This feature will ease contributions, reducing impact of +changes. Another advantage will be to simplify HAProxy by replacing some parts +by filters. As we will see, and as an example, the HTTP compression is the first +feature moved in a filter. + +This document describes how to write a filter and what to keep in mind to do +so. It also talks about the known limits and the pitfalls to avoid. + +As said, filters are quite new for now. The API is not freezed and will be +updated/modified/improved/extended as needed. + + + +SUMMARY +------- + + 1. Filters introduction + 2. How to use filters + 3. How to write a new filter + 3.1. API Overview + 3.2. Defining the filter name and its configuration + 3.3. Managing the filter lifecycle + 3.3.1. Dealing with threads + 3.4. Handling the streams activity + 3.5. Analyzing the channels activity + 3.6. Filtering the data exchanged + 4. FAQ + + + +1. FILTERS INTRODUCTION +----------------------- + +First of all, to fully understand how filters work and how to create one, it is +best to know, at least from a distance, what is a proxy (frontend/backend), a +stream and a channel in HAProxy and how these entities are linked to each other. +doc/internals/entities.pdf is a good overview. + +Then, to support filters, many callbacks has been added to HAProxy at different +places, mainly around channel analyzers. Their purpose is to allow filters to +be involved in the data processing, from the stream creation/destruction to +the data forwarding. Depending of what it should do, a filter can implement all +or part of these callbacks. For now, existing callbacks are focused on +streams. But future improvements could enlarge filters scope. For instance, it +could be useful to handle events at the connection level. + +In HAProxy configuration file, a filter is declared in a proxy section, except +default. So the configuration corresponding to a filter declaration is attached +to a specific proxy, and will be shared by all its instances. it is opaque from +the HAProxy point of view, this is the filter responsibility to manage it. For +each filter declaration matches a uniq configuration. Several declarations of +the same filter in the same proxy will be handle as different filters by +HAProxy. + +A filter instance is represented by a partially opaque context (or a state) +attached to a stream and passed as arguments to callbacks. Through this context, +filter instances are stateful. Depending the filter is declared in a frontend or +a backend section, its instances will be created, respectively, when a stream is +created or when a backend is selected. Their behaviors will also be +different. Only instances of filters declared in a frontend section will be +aware of the creation and the destruction of the stream, and will take part in +the channels analyzing before the backend is defined. + +It is important to remember the configuration of a filter is shared by all its +instances, while the context of an instance is owned by a uniq stream. + +Filters are designed to be chained. It is possible to declare several filters in +the same proxy section. The declaration order is important because filters will +be called one after the other respecting this order. Frontend and backend +filters are also chained, frontend ones called first. Even if the filters +processing is serialized, each filter will bahave as it was alone (unless it was +developed to be aware of other filters). For all that, some constraints are +imposed to filters, especially when data exchanged between the client and the +server are processed. We will discuss again these constraints when we will tackle +the subject of writing a filter. + + + +2. HOW TO USE FILTERS +--------------------- + +To use a filter, the parameter 'filter' should be used, followed by the filter +name and, optionally, its configuration in the desired listen, frontend or +backend section. For instance : + + listen test + ... + filter trace name TST + ... + + +See doc/configuration.txt for a formal definition of the parameter 'filter'. +Note that additional parameters on the filter line must be parsed by the filter +itself. + +The list of available filters is reported by 'haproxy -vv' : + + $> haproxy -vv + HAProxy version 1.7-dev2-3a1d4a-33 2016/03/21 + Copyright 2000-2016 Willy Tarreau + + [...] + + Available filters : + [COMP] compression + [TRACE] trace + + +Multiple filter lines can be used in a proxy section to chain filters. Filters +will be called in the declaration order. + +Some filters can support implicit declarations in certain circumstances +(without the filter line). This is not recommended for new features but are +useful for existing ones moved in a filter, for backward compatibility +reasons. Implicit declarations are supported when there is only one filter used +on a proxy. When several filters are used, explicit declarations are mandatory. +The HTTP compression filter is one of these filters. Alone, using 'compression' +keywords is enough to use it. But when at least a second filter is used, a +filter line must be added. + + # filter line is optional + listen t1 + bind *:80 + compression algo gzip + compression offload + server srv x.x.x.x:80 + + # filter line is mandatory for the compression filter + listen t2 + bind *:81 + filter trace name T2 + filter compression + compression algo gzip + compression offload + server srv x.x.x.x:80 + + + + +3. HOW TO WRITE A NEW FILTER +---------------------------- + +To write a filter, there are 2 header files to explore : + + * include/haproxy/filters-t.h : This is the main header file, containing all + important structures to use. It represents the + filter API. + + * include/haproxy/filters.h : This header file contains helper functions that + may be used. It also contains the internal API + used by HAProxy to handle filters. + +To ease the filters integration, it is better to follow some conventions : + + * Use 'flt_' prefix to name the filter (e.g flt_http_comp or flt_trace). + + * Keep everything related to the filter in a same file. + +The filter 'trace' can be used as a template to write new filter. It is a good +start to see how filters really work. + +3.1 API OVERVIEW +---------------- + +Writing a filter can be summarized to write functions and attach them to the +existing callbacks. Available callbacks are listed in the following structure : + + struct flt_ops { + /* + * Callbacks to manage the filter lifecycle + */ + int (*init) (struct proxy *p, struct flt_conf *fconf); + void (*deinit) (struct proxy *p, struct flt_conf *fconf); + int (*check) (struct proxy *p, struct flt_conf *fconf); + int (*init_per_thread) (struct proxy *p, struct flt_conf *fconf); + void (*deinit_per_thread)(struct proxy *p, struct flt_conf *fconf); + + /* + * Stream callbacks + */ + int (*attach) (struct stream *s, struct filter *f); + int (*stream_start) (struct stream *s, struct filter *f); + int (*stream_set_backend)(struct stream *s, struct filter *f, struct proxy *be); + void (*stream_stop) (struct stream *s, struct filter *f); + void (*detach) (struct stream *s, struct filter *f); + void (*check_timeouts) (struct stream *s, struct filter *f); + + /* + * Channel callbacks + */ + int (*channel_start_analyze)(struct stream *s, struct filter *f, + struct channel *chn); + int (*channel_pre_analyze) (struct stream *s, struct filter *f, + struct channel *chn, + unsigned int an_bit); + int (*channel_post_analyze) (struct stream *s, struct filter *f, + struct channel *chn, + unsigned int an_bit); + int (*channel_end_analyze) (struct stream *s, struct filter *f, + struct channel *chn); + + /* + * HTTP callbacks + */ + int (*http_headers) (struct stream *s, struct filter *f, + struct http_msg *msg); + int (*http_payload) (struct stream *s, struct filter *f, + struct http_msg *msg, unsigned int offset, + unsigned int len); + int (*http_end) (struct stream *s, struct filter *f, + struct http_msg *msg); + + void (*http_reset) (struct stream *s, struct filter *f, + struct http_msg *msg); + void (*http_reply) (struct stream *s, struct filter *f, + short status, + const struct buffer *msg); + + /* + * TCP callbacks + */ + int (*tcp_payload) (struct stream *s, struct filter *f, + struct channel *chn, unsigned int offset, + unsigned int len); + }; + + +We will explain in following parts when these callbacks are called and what they +should do. + +Filters are declared in proxy sections. So each proxy have an ordered list of +filters, possibly empty if no filter is used. When the configuration of a proxy +is parsed, each filter line represents an entry in this list. In the structure +'proxy', the filters configurations are stored in the field 'filter_configs', +each one of type 'struct flt_conf *' : + + /* + * Structure representing the filter configuration, attached to a proxy and + * accessible from a filter when instantiated in a stream + */ + struct flt_conf { + const char *id; /* The filter id */ + struct flt_ops *ops; /* The filter callbacks */ + void *conf; /* The filter configuration */ + struct list list; /* Next filter for the same proxy */ + unsigned int flags; /* FLT_CFG_FL_* */ + }; + + * 'flt_conf.id' is an identifier, defined by the filter. It can be + NULL. HAProxy does not use this field. Filters can use it in log messages or + as a uniq identifier to check multiple declarations. It is the filter + responsibility to free it, if necessary. + + * 'flt_conf.conf' is opaque. It is the internal configuration of a filter, + generally allocated and filled by its parsing function (See § 3.2). It is + the filter responsibility to free it. + + * 'flt_conf.ops' references the callbacks implemented by the filter. This + field must be set during the parsing phase (See § 3.2) and can be refine + during the initialization phase (See § 3.3). If it is dynamically allocated, + it is the filter responsibility to free it. + + * 'flt_conf.flags' is a bitfield to specify the filter capabilities. For now, + only FLT_CFG_FL_HTX may be set when a filter is able to process HTX + streams. If not set, the filter is excluded from the HTTP filtering. + + +The filter configuration is global and shared by all its instances. A filter +instance is created in the context of a stream and attached to this stream. in +the structure 'stream', the field 'strm_flt' is the state of all filter +instances attached to a stream : + + /* + * Structure representing the "global" state of filters attached to a + * stream. + */ + struct strm_flt { + struct list filters; /* List of filters attached to a stream */ + struct filter *current[2]; /* From which filter resume processing, for a specific channel. + * This is used for resumable callbacks only, + * If NULL, we start from the first filter. + * 0: request channel, 1: response channel */ + unsigned short flags; /* STRM_FL_* */ + unsigned char nb_req_data_filters; /* Number of data filters registered on the request channel */ + unsigned char nb_rsp_data_filters; /* Number of data filters registered on the response channel */ + unsigned long long offset[2]; /* gloal offset of input data already filtered for a specific channel + * 0: request channel, 1: response channel */ + }; + + +Filter instances attached to a stream are stored in the field +'strm_flt.filters', each instance is of type 'struct filter *' : + + /* + * Structure representing a filter instance attached to a stream + * + * 2D-Array fields are used to store info per channel. The first index + * stands for the request channel, and the second one for the response + * channel. Especially, and are offsets representing amount of + * data that the filter are, respectively, parsed and forwarded on a + * channel. Filters can access these values using FLT_NXT and FLT_FWD + * macros. + */ + struct filter { + struct flt_conf *config; /* the filter's configuration */ + void *ctx; /* The filter context (opaque) */ + unsigned short flags; /* FLT_FL_* */ + unsigned long long offset[2]; /* Offset of input data already filtered for a specific channel + * 0: request channel, 1: response channel */ + unsigned int pre_analyzers; /* bit field indicating analyzers to + * pre-process */ + unsigned int post_analyzers; /* bit field indicating analyzers to + * post-process */ + struct list list; /* Next filter for the same proxy/stream */ + }; + + * 'filter.config' is the filter configuration previously described. All + instances of a filter share it. + + * 'filter.ctx' is an opaque context. It is managed by the filter, so it is its + responsibility to free it. + + * 'filter.pre_analyzers and 'filter.post_analyzers will be described later + (See § 3.5). + + * 'filter.offset' will be described later (See § 3.6). + + +3.2. DEFINING THE FILTER NAME AND ITS CONFIGURATION +--------------------------------------------------- + +During the filter development, the first thing to do is to add it in the +supported filters. To do so, its name must be registered as a valid keyword on +the filter line : + + /* Declare the filter parser for "my_filter" keyword */ + static struct flt_kw_list flt_kws = { "MY_FILTER_SCOPE", { }, { + { "my_filter", parse_my_filter_cfg, NULL /* private data */ }, + { NULL, NULL, NULL }, + } + }; + INITCALL1(STG_REGISTER, flt_register_keywords, &flt_kws); + + +Then the filter internal configuration must be defined. For instance : + + struct my_filter_config { + struct proxy *proxy; + char *name; + /* ... */ + }; + + +All callbacks implemented by the filter must then be declared. Here, a global +variable is used : + + struct flt_ops my_filter_ops { + .init = my_filter_init, + .deinit = my_filter_deinit, + .check = my_filter_config_check, + + /* ... */ + }; + + +Finally, the function to parse the filter configuration must be written, here +'parse_my_filter_cfg'. This function must parse all remaining keywords on the +filter line : + + /* Return -1 on error, else 0 */ + static int + parse_my_filter_cfg(char **args, int *cur_arg, struct proxy *px, + struct flt_conf *flt_conf, char **err, void *private) + { + struct my_filter_config *my_conf; + int pos = *cur_arg; + + /* Allocate the internal configuration used by the filter */ + my_conf = calloc(1, sizeof(*my_conf)); + if (!my_conf) { + memprintf(err, "%s : out of memory", args[*cur_arg]); + return -1; + } + my_conf->proxy = px; + + /* ... */ + + /* Parse all keywords supported by the filter and fill the internal + * configuration */ + pos++; /* Skip the filter name */ + while (*args[pos]) { + if (!strcmp(args[pos], "name")) { + if (!*args[pos + 1]) { + memprintf(err, "'%s' : '%s' option without value", + args[*cur_arg], args[pos]); + goto error; + } + my_conf->name = strdup(args[pos + 1]); + if (!my_conf->name) { + memprintf(err, "%s : out of memory", args[*cur_arg]); + goto error; + } + pos += 2; + } + + /* ... parse other keywords ... */ + } + *cur_arg = pos; + + /* Set callbacks supported by the filter */ + flt_conf->ops = &my_filter_ops; + + /* Last, save the internal configuration */ + flt_conf->conf = my_conf; + return 0; + + error: + if (my_conf->name) + free(my_conf->name); + free(my_conf); + return -1; + } + + +WARNING : In this parsing function, 'flt_conf->ops' must be initialized. All + arguments of the filter line must also be parsed. This is mandatory. + +In the previous example, the filter lne should be read as follows : + + filter my_filter name MY_NAME ... + + +Optionally, by implementing the 'flt_ops.check' callback, an extra set is added +to check the internal configuration of the filter after the parsing phase, when +the HAProxy configuration is fully defined. For instance : + + /* Check configuration of a trace filter for a specified proxy. + * Return 1 on error, else 0. */ + static int + my_filter_config_check(struct proxy *px, struct flt_conf *my_conf) + { + if (px->mode != PR_MODE_HTTP) { + Alert("The filter 'my_filter' cannot be used in non-HTTP mode.\n"); + return 1; + } + + /* ... */ + + return 0; + } + + + +3.3. MANAGING THE FILTER LIFECYCLE +---------------------------------- + +Once the configuration parsed and checked, filters are ready to by used. There +are two main callbacks to manage the filter lifecycle : + + * 'flt_ops.init' : It initializes the filter for a proxy. This callback may be + defined to finish the filter configuration. + + * 'flt_ops.deinit' : It cleans up what the parsing function and the init + callback have done. This callback is useful to release + memory allocated for the filter configuration. + +Here is an example : + + /* Initialize the filter. Returns -1 on error, else 0. */ + static int + my_filter_init(struct proxy *px, struct flt_conf *fconf) + { + struct my_filter_config *my_conf = fconf->conf; + + /* ... */ + + return 0; + } + + /* Free resources allocated by the trace filter. */ + static void + my_filter_deinit(struct proxy *px, struct flt_conf *fconf) + { + struct my_filter_config *my_conf = fconf->conf; + + if (my_conf) { + free(my_conf->name); + /* ... */ + free(my_conf); + } + fconf->conf = NULL; + } + + +3.3.1 DEALING WITH THREADS +-------------------------- + +When HAProxy is compiled with the threads support and started with more that one +thread (global.nbthread > 1), then it is possible to manage the filter per +thread with following callbacks : + + * 'flt_ops.init_per_thread': It initializes the filter for each thread. It + works the same way than 'flt_ops.init' but in the + context of a thread. This callback is called + after the thread creation. + + * 'flt_ops.deinit_per_thread': It cleans up what the init_per_thread callback + have done. It is called in the context of a + thread, before exiting it. + +It is the filter responsibility to deal with concurrency. check, init and deinit +callbacks are called on the main thread. All others are called on a "worker" +thread (not always the same). It is also the filter responsibility to know if +HAProxy is started with more than one thread. If it is started with one thread +(or compiled without the threads support), these callbacks will be silently +ignored (in this case, global.nbthread will be always equal to one). + + +3.4. HANDLING THE STREAMS ACTIVITY +----------------------------------- + +It may be interesting to handle streams activity. For now, there is three +callbacks that should define to do so : + + * 'flt_ops.stream_start' : It is called when a stream is started. This + callback can fail by returning a negative value. It + will be considered as a critical error by HAProxy + which disabled the listener for a short time. + + * 'flt_ops.stream_set_backend' : It is called when a backend is set for a + stream. This callbacks will be called for all + filters attached to a stream (frontend and + backend). Note this callback is not called if + the frontend and the backend are the same. + + * 'flt_ops.stream_stop' : It is called when a stream is stopped. This callback + always succeed. Anyway, it is too late to return an + error. + +For instance : + + /* Called when a stream is created. Returns -1 on error, else 0. */ + static int + my_filter_stream_start(struct stream *s, struct filter *filter) + { + struct my_filter_config *my_conf = FLT_CONF(filter); + + /* ... */ + + return 0; + } + + /* Called when a backend is set for a stream */ + static int + my_filter_stream_set_backend(struct stream *s, struct filter *filter, + struct proxy *be) + { + struct my_filter_config *my_conf = FLT_CONF(filter); + + /* ... */ + + return 0; + } + + /* Called when a stream is destroyed */ + static void + my_filter_stream_stop(struct stream *s, struct filter *filter) + { + struct my_filter_config *my_conf = FLT_CONF(filter); + + /* ... */ + } + + +WARNING : Handling the streams creation and destruction is only possible for + filters defined on proxies with the frontend capability. + +In addition, it is possible to handle creation and destruction of filter +instances using following callbacks: + + * 'flt_ops.attach' : It is called after a filter instance creation, when it is + attached to a stream. This happens when the stream is + started for filters defined on the stream's frontend and + when the backend is set for filters declared on the + stream's backend. It is possible to ignore the filter, if + needed, by returning 0. This could be useful to have + conditional filtering. + + * 'flt_ops.detach' : It is called when a filter instance is detached from a + stream, before its destruction. This happens when the + stream is stopped for filters defined on the stream's + frontend and when the analyze ends for filters defined on + the stream's backend. + +For instance : + + /* Called when a filter instance is created and attach to a stream */ + static int + my_filter_attach(struct stream *s, struct filter *filter) + { + struct my_filter_config *my_conf = FLT_CONF(filter); + + if (/* ... */) + return 0; /* Ignore the filter here */ + return 1; + } + + /* Called when a filter instance is detach from a stream, just before its + * destruction */ + static void + my_filter_detach(struct stream *s, struct filter *filter) + { + struct my_filter_config *my_conf = FLT_CONF(filter); + + /* ... */ + } + +Finally, it may be interesting to notify the filter when the stream is woken up +because of an expired timer. This could let a chance to check some internal +timeouts, if any. To do so the following callback must be used : + + * 'flt_opt.check_timeouts' : It is called when a stream is woken up because of + an expired timer. + +For instance : + + /* Called when a stream is woken up because of an expired timer */ + static void + my_filter_check_timeouts(struct stream *s, struct filter *filter) + { + struct my_filter_config *my_conf = FLT_CONF(filter); + + /* ... */ + } + + +3.5. ANALYZING THE CHANNELS ACTIVITY +------------------------------------ + +The main purpose of filters is to take part in the channels analyzing. To do so, +there is 2 callbacks, 'flt_ops.channel_pre_analyze' and +'flt_ops.channel_post_analyze', called respectively before and after each +analyzer attached to a channel, except analyzers responsible for the data +forwarding (TCP or HTTP). Concretely, on the request channel, these callbacks +could be called before following analyzers : + + * tcp_inspect_request (AN_REQ_INSPECT_FE and AN_REQ_INSPECT_BE) + * http_wait_for_request (AN_REQ_WAIT_HTTP) + * http_wait_for_request_body (AN_REQ_HTTP_BODY) + * http_process_req_common (AN_REQ_HTTP_PROCESS_FE) + * process_switching_rules (AN_REQ_SWITCHING_RULES) + * http_process_req_ common (AN_REQ_HTTP_PROCESS_BE) + * http_process_tarpit (AN_REQ_HTTP_TARPIT) + * process_server_rules (AN_REQ_SRV_RULES) + * http_process_request (AN_REQ_HTTP_INNER) + * tcp_persist_rdp_cookie (AN_REQ_PRST_RDP_COOKIE) + * process_sticking_rules (AN_REQ_STICKING_RULES) + +And on the response channel : + + * tcp_inspect_response (AN_RES_INSPECT) + * http_wait_for_response (AN_RES_WAIT_HTTP) + * process_store_rules (AN_RES_STORE_RULES) + * http_process_res_common (AN_RES_HTTP_PROCESS_BE) + +Unlike the other callbacks previously seen before, 'flt_ops.channel_pre_analyze' +can interrupt the stream processing. So a filter can decide to not execute the +analyzer that follows and wait the next iteration. If there are more than one +filter, following ones are skipped. On the next iteration, the filtering resumes +where it was stopped, i.e. on the filter that has previously stopped the +processing. So it is possible for a filter to stop the stream processing on a +specific analyzer for a while before continuing. Moreover, this callback can be +called many times for the same analyzer, until it finishes its processing. For +instance : + + /* Called before a processing happens on a given channel. + * Returns a negative value if an error occurs, 0 if it needs to wait, + * any other value otherwise. */ + static int + my_filter_chn_pre_analyze(struct stream *s, struct filter *filter, + struct channel *chn, unsigned an_bit) + { + struct my_filter_config *my_conf = FLT_CONF(filter); + + switch (an_bit) { + case AN_REQ_WAIT_HTTP: + if (/* wait that a condition is verified before continuing */) + return 0; + break; + /* ... * / + } + return 1; + } + + * 'an_bit' is the analyzer id. All analyzers are listed in + 'include/haproxy/channels-t.h'. + + * 'chn' is the channel on which the analyzing is done. It is possible to + determine if it is the request or the response channel by testing if + CF_ISRESP flag is set : + + │ ((chn->flags & CF_ISRESP) == CF_ISRESP) + + +In previous example, the stream processing is blocked before receipt of the HTTP +request until a condition is verified. + +'flt_ops.channel_post_analyze', for its part, is not resumable. It returns a +negative value if an error occurs, any other value otherwise. It is called when +a filterable analyzer finishes its processing, so once for the same analyzer. +For instance : + + /* Called after a processing happens on a given channel. + * Returns a negative value if an error occurs, any other + * value otherwise. */ + static int + my_filter_chn_post_analyze(struct stream *s, struct filter *filter, + struct channel *chn, unsigned an_bit) + { + struct my_filter_config *my_conf = FLT_CONF(filter); + struct http_msg *msg; + + switch (an_bit) { + case AN_REQ_WAIT_HTTP: + if (/* A test on received headers before any other treatment */) { + msg = ((chn->flags & CF_ISRESP) ? &s->txn->rsp : &s->txn->req); + txn->status = 400; + msg->msg_state = HTTP_MSG_ERROR; + http_reply_and_close(s, s->txn->status, http_error_message(s)); + return -1; /* This is an error ! */ + } + break; + /* ... * / + } + return 1; + } + + +Pre and post analyzer callbacks of a filter are not automatically called. They +must be regiesterd explicitly on analyzers, updating the value of +'filter.pre_analyzers' and 'filter.post_analyzers' bit fields. All analyzer bits +are listed in 'include/types/channels.h'. Here is an example : + + static int + my_filter_stream_start(struct stream *s, struct filter *filter) + { + /* ... * / + + /* Register the pre analyzer callback on all request and response + * analyzers */ + filter->pre_analyzers |= (AN_REQ_ALL | AN_RES_ALL) + + /* Register the post analyzer callback of only on AN_REQ_WAIT_HTTP and + * AN_RES_WAIT_HTTP analyzers */ + filter->post_analyzers |= (AN_REQ_WAIT_HTTP | AN_RES_WAIT_HTTP) + + /* ... * / + return 0; + } + + +To surround activity of a filter during the channel analyzing, two new analyzers +has been added : + + * 'flt_start_analyze' (AN_REQ/RES_FLT_START_FE/AN_REQ_RES_FLT_START_BE) : For + a specific filter, this analyzer is called before any call to the + 'channel_analyze' callback. From the filter point of view, it calls the + 'flt_ops.channel_start_analyze' callback. + + * 'flt_end_analyze' (AN_REQ/RES_FLT_END) : For a specific filter, this + analyzer is called when all other analyzers have finished their + processing. From the filter point of view, it calls the + 'flt_ops.channel_end_analyze' callback. + +These analyzers are called only once per streams. + +'flt_ops.channel_start_analyze' and 'flt_ops.channel_end_analyze' callbacks can +interrupt the stream processing, as 'flt_ops.channel_analyze'. Here is an +example : + + /* Called when analyze starts for a given channel + * Returns a negative value if an error occurs, 0 if it needs to wait, + * any other value otherwise. */ + static int + my_filter_chn_start_analyze(struct stream *s, struct filter *filter, + struct channel *chn) + { + struct my_filter_config *my_conf = FLT_CONF(filter); + + /* ... TODO ... */ + + return 1; + } + + /* Called when analyze ends for a given channel + * Returns a negative value if an error occurs, 0 if it needs to wait, + * any other value otherwise. */ + static int + my_filter_chn_end_analyze(struct stream *s, struct filter *filter, + struct channel *chn) + { + struct my_filter_config *my_conf = FLT_CONF(filter); + + /* ... TODO ... */ + + return 1; + } + + +Workflow on channels can be summarized as following : + + FE: Called for filters defined on the stream's frontend + BE: Called for filters defined on the stream's backend + + +------->---------+ + | | | + +----------------------+ | +----------------------+ + | flt_ops.attach (FE) | | | flt_ops.attach (BE) | + +----------------------+ | +----------------------+ + | | | + V | V + +--------------------------+ | +------------------------------------+ + | flt_ops.stream_start (FE)| | | flt_ops.stream_set_backend (FE+BE) | + +--------------------------+ | +------------------------------------+ + | | | + ... | ... + | | | + | ^ | + | --+ | | --+ + +------<----------+ | | +--------<--------+ | + | | | | | | | + V | | | V | | ++-------------------------------+ | | | +-------------------------------+ | | +| flt_start_analyze (FE) +-+ | | | flt_start_analyze (BE) +-+ | +|(flt_ops.channel_start_analyze)| | F | |(flt_ops.channel_start_analyze)| | ++---------------+---------------+ | R | +-------------------------------+ | + | | O | | | + +------<---------+ | N ^ +--------<-------+ | B + | | | T | | | | A ++---------------|------------+ | | E | +---------------|------------+ | | C +|+--------------V-------------+ | | N | |+--------------V-------------+ | | K +||+----------------------------+ | | D | ||+----------------------------+ | | E +|||flt_ops.channel_pre_analyze | | | | |||flt_ops.channel_pre_analyze | | | N +||| V | | | | ||| V | | | D +||| analyzer (FE) +-+ | | ||| analyzer (FE+BE) +-+ | ++|| V | | | +|| V | | + +|flt_ops.channel_post_analyze| | | +|flt_ops.channel_post_analyze| | + +----------------------------+ | | +----------------------------+ | + | --+ | | | + +------------>------------+ ... | + | | + [ data filtering (see below) ] | + | | + ... | + | | + +--------<--------+ | + | | | + V | | + +-------------------------------+ | | + | flt_end_analyze (FE+BE) +-+ | + | (flt_ops.channel_end_analyze) | | + +---------------+---------------+ | + | --+ + V + +----------------------+ + | flt_ops.detach (BE) | + +----------------------+ + | + V + +--------------------------+ + | flt_ops.stream_stop (FE) | + +--------------------------+ + | + V + +----------------------+ + | flt_ops.detach (FE) | + +----------------------+ + | + V + +By zooming on an analyzer box we have: + + ... + | + V + | + +-----------<-----------+ + | | + +-----------------+--------------------+ | + | | | | + | +--------<---------+ | | + | | | | | + | V | | | + | flt_ops.channel_pre_analyze ->-+ | ^ + | | | | + | | | | + | V | | + | analyzer --------->-----+--+ + | | | + | | | + | V | + | flt_ops.channel_post_analyze | + | | | + | | | + +-----------------+--------------------+ + | + V + ... + + + 3.6. FILTERING THE DATA EXCHANGED +----------------------------------- + +WARNING : To fully understand this part, it is important to be aware on how the + buffers work in HAProxy. For the HTTP part, it is also important to + understand how data are parsed and structured, and how the internal + representation, called HTX, works. See doc/internals/buffer-api.txt + and doc/internals/htx-api.txt for details. + +An extended feature of the filters is the data filtering. By default a filter +does not look into data exchanged between the client and the server because it +is expensive. Indeed, instead of forwarding data without any processing, each +byte need to be buffered. + +So, to enable the data filtering on a channel, at any time, in one of previous +callbacks, 'register_data_filter' function must be called. And conversely, to +disable it, 'unregister_data_filter' function must be called. For instance : + + my_filter_http_headers(struct stream *s, struct filter *filter, + struct http_msg *msg) + { + struct my_filter_config *my_conf = FLT_CONF(filter); + + /* 'chn' must be the request channel */ + if (!(msg->chn->flags & CF_ISRESP)) { + struct htx *htx; + struct ist hdr; + struct http_hdr_ctx ctx; + + htx = htxbuf(msg->chn->buf); + + /* Enable the data filtering for the request if 'X-Filter' header + * is set to 'true'. */ + hdr = ist("X-Filter); + ctx.blk = NULL; + if (http_find_header(htx, hdr, &ctx, 0) && + ctx.value.len >= 4 && memcmp(ctx.value.ptr, "true", 4) == 0) + register_data_filter(s, chn, filter); + } + + return 1; + } + +Here, the data filtering is enabled if the HTTP header 'X-Filter' is found and +set to 'true'. + +If several filters are declared, the evaluation order remains the same, +regardless the order of the registrations to the data filtering. Data +registrations must be performed before the data forwarding step. However, a +filter may be unregistered from the data filtering at any time. + +Depending on the stream type, TCP or HTTP, the way to handle data filtering is +different. HTTP data are structured while TCP data are raw. And there are more +callbacks for HTTP streams to fully handle all steps of an HTTP transaction. But +the main part is the same. The data filtering is performed in one callback, +called in loop on input data starting at a specific offset for a given +length. Data analyzed by a filter are considered as forwarded from its point of +view. Because filters are chained, a filter never analyzes more data than its +predecessors. Thus only data analyzed by the last filter are effectively +forwarded. This means, at any time, any filter may choose to not analyze all +available data (available from its point of view), blocking the data forwarding. + +Internally, filters own 2 offsets representing the number of bytes already +analyzed in the available input data, one per channel. There is also an offset +couple at the stream level, in the strm_flt object, representing the total +number of bytes already forwarded. These offsets may be retrieved and updated +using following macros : + + * FLT_OFF(flt, chn) + + * FLT_STRM_OFF(s, chn) + +where 'flt' is the 'struct filter' passed as argument in all callbacks, 's' the +filtered stream and 'chn' is the considered channel. However, there is no reason +for a filter to use these macros or take care of these offsets. + + +3.6.1 FILTERING DATA ON TCP STREAMS +----------------------------------- + +The TCP data filtering for TCP streams is the easy case, because HAProxy do not +parse these data. Data are stored in raw in the buffer. So there is only one +callback to consider: + + * 'flt_ops.tcp_payload : This callback is called when input data are + available. If not defined, all available data will be considered as analyzed + and forwarded from the filter point of view. + +This callback is called only if the filter is registered to analyze TCP +data. Here is an example : + + /* Returns a negative value if an error occurs, else the number of + * consumed bytes. */ + static int + my_filter_tcp_payload(struct stream *s, struct filter *filter, + struct channel *chn, unsigned int offset, + unsigned int len) + { + struct my_filter_config *my_conf = FLT_CONF(filter); + int ret = len; + + /* Do not parse more than 'my_conf->max_parse' bytes at a time */ + if (my_conf->max_parse != 0 && ret > my_conf->max_parse) + ret = my_conf->max_parse; + + /* if available data are not completely parsed, wake up the stream to + * be sure to not freeze it. The best is probably to set a + * chn->analyse_exp timer */ + if (ret != len) + task_wakeup(s->task, TASK_WOKEN_MSG); + return ret; + } + +But it is important to note that tunnelled data of an HTTP stream may also be +filtered via this callback. Tunnelled data are data exchange after an HTTP tunnel +is established between the client and the server, via an HTTP CONNECT or via a +protocol upgrade. In this case, the data are structured. Of course, to do so, +the filter must be able to parse HTX data and must have the FLT_CFG_FL_HTX flag +set. At any time, the IS_HTX_STRM() macros may be used on the stream to know if +it is an HTX stream or a TCP stream. + + +3.6.2 FILTERING DATA ON HTTP STREAMS +------------------------------------ + +The HTTP data filtering is a bit more complex because HAProxy data are +structutred and represented to an internal format, called HTX. So basically +there is the HTTP counterpart to the previous callback : + + * 'flt_ops.http_payload' : This callback is called when input data are + available. If not defined, all available data will be considered as analyzed + and forwarded for the filter. + +But the prototype for this callbacks is slightly different. Instead of having +the channel as parameter, we have the HTTP message (struct http_msg). This +callback is called only if the filter is registered to analyze TCP data. Here is +an example : + + /* Returns a negative value if an error occurs, else the number of + * consumed bytes. */ + static int + my_filter_http_payload(struct stream *s, struct filter *filter, + struct http_msg *msg, unsigned int offset, + unsigned int len) + { + struct my_filter_config *my_conf = FLT_CONF(filter); + struct htx *htx = htxbuf(&msg->chn->buf); + struct htx_ret htxret = htx_find_offset(htx, offset); + struct htx_blk *blk; + + blk = htxret.blk; + offset = htxret.ret; + for (; blk; blk = htx_get_next_blk(blk, htx)) { + enum htx_blk_type type = htx_get_blk_type(blk); + + if (type == HTX_BLK_UNUSED) + continue; + else if (type == HTX_BLK_DATA) { + /* filter data */ + } + else + break; + } + + return len; + } + +In addition, there are two others callbacks : + + * 'flt_ops.http_headers' : This callback is called just before the HTTP body + forwarding and after any processing on the request/response HTTP + headers. When defined, this callback is always called for HTTP streams + (i.e. without needs of a registration on data filtering). + Here is an example : + + + /* Returns a negative value if an error occurs, 0 if it needs to wait, + * any other value otherwise. */ + static int + my_filter_http_headers(struct stream *s, struct filter *filter, + struct http_msg *msg) + { + struct my_filter_config *my_conf = FLT_CONF(filter); + struct htx *htx = htxbuf(&msg->chn->buf); + struct htx_sl *sl = http_get_stline(htx); + int32_t pos; + + for (pos = htx_get_first(htx); pos != -1; pos = htx_get_next(htx, pos)) { + struct htx_blk *blk = htx_get_blk(htx, pos); + enum htx_blk_type type = htx_get_blk_type(blk); + struct ist n, v; + + if (type == HTX_BLK_EOH) + break; + if (type != HTX_BLK_HDR) + continue; + + n = htx_get_blk_name(htx, blk); + v = htx_get_blk_value(htx, blk); + /* Do something on the header name/value */ + } + + return 1; + } + + * 'flt_ops.http_end' : This callback is called when the whole HTTP message was + processed. It may interrupt the stream processing. So, it could be used to + synchronize the HTTP request with the HTTP response, for instance : + + /* Returns a negative value if an error occurs, 0 if it needs to wait, + * any other value otherwise. */ + static int + my_filter_http_end(struct stream *s, struct filter *filter, + struct http_msg *msg) + { + struct my_filter_ctx *my_ctx = filter->ctx; + + + if (!(msg->chn->flags & CF_ISRESP)) /* The request */ + my_ctx->end_of_req = 1; + else /* The response */ + my_ctx->end_of_rsp = 1; + + /* Both the request and the response are finished */ + if (my_ctx->end_of_req == 1 && my_ctx->end_of_rsp == 1) + return 1; + + /* Wait */ + return 0; + } + +Then, to finish, there are 2 informational callbacks : + + * 'flt_ops.http_reset' : This callback is called when an HTTP message is + reset. This happens either when a 1xx informational response is received, or + if we're retrying to send the request to the server after it failed. It + could be useful to reset the filter context before receiving the true + response. + By checking s->txn->status, it is possible to know why this callback is + called. If it's a 1xx, we're called because of an informational + message. Otherwise, it is a L7 retry. + + * 'flt_ops.http_reply' : This callback is called when, at any time, HAProxy + decides to stop the processing on a HTTP message and to send an internal + response to the client. This mainly happens when an error or a redirect + occurs. + + +3.6.3 REWRITING DATA +-------------------- + +The last part, and the trickiest one about the data filtering, is about the data +rewriting. For now, the filter API does not offer a lot of functions to handle +it. There are only functions to notify HAProxy that the data size has changed to +let it update internal state of filters. This is the developer responsibility to +update data itself, i.e. the buffer offsets, using following function : + + * 'flt_update_offsets()' : This function must be called when a filter alter + incoming data. It updates offsets of the stream and of all filters + preceding the calling one. Do not call this function when a filter change + the size of incoming data leads to an undefined behavior. + +A good example of filter changing the data size is the HTTP compression filter. -- cgit v1.2.3