Parsing can cause network requests to be performed, most directly
when a URI is given as an argument, as with
raptor_parser_parse_uri().
There may also be indirect requests, as with the
GRDDL parser, which retrieves further URIs depending on the results of
the initial parse. An application may want to prevent some of these
URIs from being fetched, or to filter them, and this can be done in
three ways.
RAPTOR_OPTION_NO_NET
The parser option
RAPTOR_OPTION_NO_NET
can be set with
raptor_parser_set_option()
and forbids all network requests. There is no customisation with
this approach; for that, see the URI filters in the following sections.
  rdf_parser = raptor_new_parser(world, "rdfxml");

  /* Disable internal network requests */
  raptor_parser_set_option(rdf_parser, RAPTOR_OPTION_NO_NET, NULL, 1);
raptor_www_set_uri_filter()
The function
raptor_www_set_uri_filter()
sets a filtering function that is called for every URI
retrieved by a WWW connection. Such a connection can be used for
parsing when it is operated by hand, passing the retrieved bytes
to the parser.
  void
  write_bytes_handler(raptor_www* www, void *user_data,
                      const void *ptr, size_t size, size_t nmemb)
  {
    raptor_parser* rdf_parser = (raptor_parser*)user_data;

    /* pass each retrieved block to the parser; more data may follow */
    raptor_parser_parse_chunk(rdf_parser, (unsigned char*)ptr, size * nmemb, 0);
  }

  int
  uri_filter(void* filter_user_data, raptor_uri* uri)
  {
    /* return non-0 to forbid the request */
    return 0;
  }

  int
  main(int argc, char *argv[])
  {
    ...
    rdf_parser = raptor_new_parser(world, "rdfxml");
    www = raptor_new_www(world);

    /* filter all URI requests */
    raptor_www_set_uri_filter(www, uri_filter, filter_user_data);

    /* make WWW write bytes to parser */
    raptor_www_set_write_bytes_handler(www, write_bytes_handler, rdf_parser);

    raptor_parser_parse_start(rdf_parser, uri);
    raptor_www_fetch(www, uri);

    /* tell the parser that we are done */
    raptor_parser_parse_chunk(rdf_parser, NULL, 0, 1);

    raptor_free_www(www);
    raptor_free_parser(rdf_parser);
    ...
  }
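The filter_user_data pointer lets a filter carry state between calls. As one possible use, the sketch below limits how many requests a connection may make. The fetch_budget structure and the fetch_budget_check() helper are hypothetical names invented for this illustration and are not part of raptor; the helper only implements the decision, following the same convention as the filter (0 allows the request, non-0 forbids it).

```c
#include <stddef.h>

/* Hypothetical state passed as filter_user_data (not a raptor type). */
typedef struct {
  int remaining;  /* requests still permitted */
  int denied;     /* requests refused so far */
} fetch_budget;

/* Returns 0 to allow one more request, non-0 to forbid it.
 * A NULL budget forbids everything. */
static int
fetch_budget_check(void* filter_user_data)
{
  fetch_budget* budget = (fetch_budget*)filter_user_data;

  if(!budget)
    return 1;

  if(budget->remaining <= 0) {
    budget->denied++;
    return 1;
  }

  budget->remaining--;
  return 0;
}

/*
 * Assumed wiring into a filter such as the one shown above:
 *
 *   int uri_filter(void* filter_user_data, raptor_uri* uri) {
 *     return fetch_budget_check(filter_user_data);
 *   }
 *
 *   fetch_budget budget = { 10, 0 };
 *   raptor_www_set_uri_filter(www, uri_filter, &budget);
 */
```

The budget object must outlive the WWW connection it is registered with, since the filter dereferences it on every request.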
raptor_parser_set_uri_filter()
The function
raptor_parser_set_uri_filter()
sets a filtering function that is called for every URI that
the parser sees. It operates on the internal raptor_www object
that the parser uses to retrieve URIs, in the same way as the
filter described in the previous section.
  int
  uri_filter(void* filter_user_data, raptor_uri* uri)
  {
    /* return non-0 to forbid the request */
    return 0;
  }

  rdf_parser = raptor_new_parser(world, "rdfxml");

  raptor_parser_set_uri_filter(rdf_parser, uri_filter, filter_user_data);

  /* parse content as normal */
  raptor_parser_parse_uri(rdf_parser, uri, base_uri);
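The body of the filter is left empty above. As a minimal sketch of one possible policy, the helper below permits only URIs that begin with a prefix from a fixed allow-list. The helper name uri_string_is_forbidden() and the example.org prefixes are assumptions made for illustration; the helper works on a plain C string so that it can be tried without the library, and a real filter would first obtain the string from the raptor_uri, for example with raptor_uri_as_string().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical allow-list: only URIs starting with one of these are fetched. */
static const char* const allowed_prefixes[] = {
  "http://example.org/",
  "https://example.org/"
};

/* Returns 0 to allow the request, non-0 to forbid it,
 * matching the return convention of the URI filter. */
static int
uri_string_is_forbidden(const char* uri_string)
{
  size_t count = sizeof(allowed_prefixes) / sizeof(allowed_prefixes[0]);
  size_t i;

  if(!uri_string)
    return 1;

  for(i = 0; i < count; i++) {
    size_t len = strlen(allowed_prefixes[i]);

    /* the prefix includes the trailing '/', so a host such as
     * example.org.other.test does not match */
    if(!strncmp(uri_string, allowed_prefixes[i], len))
      return 0;
  }

  return 1;
}

/*
 * Assumed use inside a filter:
 *
 *   int uri_filter(void* filter_user_data, raptor_uri* uri) {
 *     return uri_string_is_forbidden((const char*)raptor_uri_as_string(uri));
 *   }
 */
```

A prefix comparison is deliberately simple; a stricter policy would parse the URI and compare scheme and host separately.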
RAPTOR_OPTION_WWW_TIMEOUT
If the option
RAPTOR_OPTION_WWW_TIMEOUT
is set to a number greater than 0, it is used as the timeout in seconds
for retrieving URIs during parsing (primarily for GRDDL).
This uses
raptor_www_set_connection_timeout()
internally.
  rdf_parser = raptor_new_parser(world, "grddl");

  /* set internal URI retrieval maximum time to 5 seconds */
  raptor_parser_set_option(rdf_parser, RAPTOR_OPTION_WWW_TIMEOUT, NULL, 5);