Diffstat (limited to '')
-rw-r--r-- | samples/cernrules.txt | 640 |
1 files changed, 640 insertions, 0 deletions
diff --git a/samples/cernrules.txt b/samples/cernrules.txt new file mode 100644 index 0000000..6df8b30 --- /dev/null +++ b/samples/cernrules.txt @@ -0,0 +1,640 @@ +# This file contains examples and an explanation for the RULESFILE / RULE +# feature. +# +# Rules for Lynx are experimental. They provide a rudimentary capability +# for URL rejection and substitution based on string matching. +# Most users and most installations will not need this feature; it is here +# in case you find it useful. Note that this may change or go away in +# future releases of Lynx; if you find it useful, consider describing your +# use of it in a message to <lynx-dev@nongnu.org>. +# +# Syntax: +# ======= +# Summary of common forms: +# +# Fail URL1 +# Map URL1 URL2 [CONDITION] +# Pass URL1 [URL2] [CONDITION] +# Redirect URL1 URL2 [CONDITION] +# RedirectPerm URL1 URL2 [CONDITION] +# UseProxy URL1 PROXYURL [CONDITION] +# UseProxy URL1 "none" [CONDITION] +# +# Alert URL1 MESSAGE [CONDITION] +# AlwaysAlert URL1 MESSAGE [CONDITION] +# UserMsg URL1 MESSAGE [CONDITION] +# InfoMsg URL1 MESSAGE [CONDITION] +# Progress URL1 MESSAGE [CONDITION] +# +# As you may have guessed, comments are introduced by a '#' character. +# Rules have the general form +# Operator Operand1 [Operand2] [CONDITION] +# with words separated by whitespace. Words containing spaces can be quoted +# with "double quotes". Although normally this should not be +# necessary for URLs, it has to be used for MESSAGE Operands in Alert etc. +# See below for an explanation of the optional CONDITION. +# +# Recognized operators are +# +# Fail URL1 +# Reject access to this URL, stop processing further rules. +# +# Map URL1 URL2 +# Change the current URL to URL2, then continue processing. +# +# Pass URL1 [URL2] +# Accept this URL and stop processing further rules; if URL2 +# is given, apply this as the last mapping. +# See the next item for reasons why you generally don't want to "pass" +# a changed URL.
+# +# RedirectTemp URL1 URL2 +# RedirectPerm URL1 URL2 +# Redirect [STATUS] URL1 URL2 +# Stop processing further rules and redirect to URL2, just as if lynx had +# received an HTTP redirection with URL2 as the new location. This means that +# URL2 is subject to any applicable permission checking; if it passes, a new +# request will be issued (which may result in a new round of rules checking, +# with a new "current URL") or the new URL might be taken from the cache, and, +# after successful loading, lynx's idea of what the loaded document's URL is +# will be fully updated. All this does not happen if you just "pass" a changed +# URL (or let it fall through), so this is generally the preferred way for +# substituting URLs. +# If the RedirectPerm variant is used, or if the optional STATUS word is supplied and +# is either "permanent" or "301", act as if lynx had received a permanent +# redirection (with HTTP status 301). In most cases this will not make a +# noticeable difference. Lynx may cache the location in a special way for 301 +# redirections, so that the redirection is followed immediately the next time +# the same original URL is accessed, without re-checking of rules. Therefore +# the permanent variant should never be used if the desired outcome of rules +# processing depends on variable conditions (see CONDITIONS below) or on +# setting a special flag (see next item). +# +# PermitRedirection URL1 +# Mark following redirection as permitted, and continue processing. Some +# redirection locations are normally not allowed, because permitting them in a +# response from an arbitrary remote server would open a security hole, and +# others are not allowed if certain restriction options are in effect. Among +# redirection locations normally always forbidden are lynxprog: and lynxexec: +# schemes. With "default" anonymous restrictions in effect, many URL schemes +# are disallowed if the user would not be allowed to use them with 'g'oto.
+# This rule allows overriding the permission checking if rules processing ends +# with a Redirect (including the RedirectPerm or RedirectTemp forms). It is +# ignored otherwise; in particular, it does not influence acceptance if rules +# processing ends with a "Pass" and a real redirection is received in the +# subsequent HTTP request. If redirections are chained, it only applies to the +# redirection that ends the same rules cycle. Note that the new URL is still +# subject to other permission checks that are not specific to redirections; but +# using this rule may still weaken the expected effect of -anonymous, +# -validate, -realm, and other restriction options, including TRUSTED_EXEC and +# similar in lynx.cfg, so be careful where you redirect to if restrictions are +# important! +# +# UseProxy URL1 PROXYURL +# Stop processing further rules, and force access through the proxy given by +# PROXYURL. PROXYURL should have the same form as required for foo_proxy +# environment variables and lynx.cfg options, i.e., (unless you are trying to +# do something unusual) "http://some.proxy-server.dom:port/". This rule +# overrides any use of a proxy (or external gateway) that might otherwise apply +# because of environment variables or lynx.cfg options; it also overrides any +# "no_proxy" settings. +# +# UseProxy URL1 none +# Mark request as NOT using any proxy (or external gateway), and continue +# processing(!). For a request marked this way, any subsequent UseProxy +# rule with a PROXYURL will be ignored, and any use of a proxy (or external +# gateway) that might otherwise apply because of environment variables or +# lynx.cfg options will be overridden. Note that the marking will not +# survive a Redirect rule (since that will result, if successful, in a +# new request).
+# +# Alert URL1 MESSAGE +# AlwaysAlert URL1 MESSAGE +# UserMsg URL1 MESSAGE +# InfoMsg URL1 MESSAGE +# Progress URL1 MESSAGE +# These produce various kinds of statusline messages, differing in whether +# a pause is enforced and in its duration, immediately when the rule is +# applied. AlwaysAlert shows the message text even in non-interactive mode +# (-dump, -source, etc.). Rule processing continues after the message is +# shown. As usual, these rules only apply if URL1 matches. MESSAGE is +# the text to be displayed; it can contain one occurrence of "%s" which +# will be replaced by the current URL; literal '%' characters should be +# doubled as "%%". +# +# Rules are processed sequentially first to last for each request; a rule +# applies if the current URL matches URL1. The current URL is initially the +# URL for the resource the user is trying to access, but may change as the +# result of applied Map rules. Case-sensitive (!) string comparison is used; +# in addition, URL1 can contain one '*' which is interpreted as a wildcard +# matching 0 or more characters. So if for example +# "http://example.com/dir/doc.html" is requested, it would match any of +# the following: +# Pass http:* +# Pass http://example.com/*.html +# Pass http://example.com/* +# Pass http://example* +# Pass http://*/doc.html +# but not: +# Pass http://example/* +# Pass http://Example.COM/dir/doc.html +# Pass http://Example.COM/* +# +# If a URL2 is given and also contains a '*', that character will be +# replaced by whatever matched in URL1. Processing stops with the +# first matching "Fail" or "Pass" or when the end of the rules is reached. +# If the end is reached without a "Fail" or "Pass", the URL is allowed +# (equivalent to a final "Pass *"). +# +# The requested URL will have been transformed to Lynx's normal +# representation.
This means that local file resources should be +# expected in the form "file://localhost/<path using slash separators>", +# not in the machine's native representation for filenames. +# +# Anyone with experience configuring the venerable CERN httpd server will +# recognize some of the syntax - in fact, the code implementing rules goes +# back to a common ancestor. But note the differences: all URLs and URL- +# patterns here have to be given as absolute URLs, even for local files. +# (Absolute URLs don't imply proxying.) +# +# CONDITIONS +# ---------- +# All rules mentioned can be followed by an optional CONDITION, which can +# be used to further restrict when the rule should be applied (in addition +# to the match on URL1). A CONDITION takes one of the forms +# "if" CONDITIONFLAG +# "unless" CONDITIONFLAG +# and currently two condition flags are recognized: +# "userspecified" (or abbreviated "userspec") +# "redirected" +# To explain these, first some terms need to be defined. A "request" +# is... +# +# A user action (like following a link, or entering a 'g'oto URL) can either be +# rejected immediately (for example, because of restrictions in effect, or +# because of invalid input), or can generate a "request". For the purpose of +# this discussion, a "request" is the sequence of processing done by lynx, +# which might ultimately lead to an actual network request and loading and +# display of data; a request can also result in rejection (for example, some +# restrictions are checked at this stage), or in a redirection. A redirection +# in turn can be rejected (which makes the request fail), or can automatically +# generate a new request. A "request chain" is the sequence of one or more +# requests triggered by the same user event that are chained together by +# redirections. +# For each request, some URL schemes are handled (or rejected) specially, see +# Limitation 1 below, the others are passed to the generic access code. 
Rules +# processing occurs at the beginning of the generic access code, before a +# request is dispatched to the scheme-specific protocol module (but after +# checking whether the request can be satisfied by re-displaying an already +# cached document). +# With these definitions, the meaning of the possible CONDITIONFLAGS: +# +# if redirected +# The rule applies if the current request results from a redirection; +# whether that was a real HTTP redirection or one generated by a rule +# in the previous request makes no difference. In other words, the +# condition is true if the current request is not the first one in the +# request chain. +# +# if userspecified +# The rule applies if the initial URL of the request chain was specified +# by the user. Lynx marks a request as "user specified" for URLs that +# come from 'g'oto prompts, as well as for following links in a bookmark +# or Jump file and some other special (lynx-generated) pages that may +# contain URLs that were typed in by the user. +# Note that this is not a property of the request, but of the whole request +# chain (based on where the first request's URL came from). The current +# URL may differ from what the user typed +# - because of initial fixups, including conversion of Guess-URLs and file +# paths to full URLs, +# - because of Map rules applied, and/or +# - because of a previous redirection. +# So to make reasonably sure a suspicious or potentially dangerous URL has +# been entered by the user, i.e. is not a link or external redirection +# location that cannot be trusted, a combination of "userspecified" and +# "redirected" flags should be used, for example +# Fail URL1 unless userspecified +# Fail URL1 if redirected +# ... +# +# CAVEAT +# ====== +# First, to squash any false expectations, an example for what NOT TO DO. +# It might be expected that a rule like +# Fail file://localhost/etc/passwd # <- DON'T RELY ON THIS +# could be used to prevent access to the file "/etc/passwd". 
This might +# fool a naive user, but the more sophisticated user could still gain +# access, by experimenting with other forms like (@@@ untested) +# "file://<machine's domain name>/etc/passwd" or "/etc//passwd" +# or "/etc/p%61asswd" or "/etc/passwd?" or "/etc/passwd#X" and so on. +# There are many URL forms for accessing the same resource, and Lynx +# just doesn't guarantee that URLs for the same resource will look the +# same way. +# +# The same reservation applies to any attempts to block access to unwanted +# sites and so on. This isn't the right place for implementing it. +# (Lynx has a number of mechanisms documented elsewhere to restrict access, +# see the INSTALLATION file, lynx.cfg, lynx -help, lynx -restrictions.) +# +# Some more useful applications: +# +# 1. Disabling URLs by access scheme +# ---------------------------------- +# Fail gopher:* +# Fail finger:* +# Fail lynxcgi:* +# Fail LYNXIMGMAP:* +# This should work (but no guarantees) because Lynx canonicalizes +# the case of recognized access schemes and does not interpret +# %-escaping in the scheme part (@@@ always?) +# +# Note that for many access schemes Lynx already has mechanisms to +# restrict access (see lynx.cfg, -help, -restrictions, etc.), others +# have to be specifically enabled. Those mechanisms should be used +# in preference. +# Note especially Limitation 1 below. +# This can be used for the remaining cases, or in addition by the +# more paranoid. Note that disabling "file:*" will also make many +# of the special pages generated by lynx as temporary files (INFO, +# history, ...) inaccessible, on the other hand it doesn't prevent +# _writing_ of various temp files - probably not what you want. +# +# You could also direct access for a scheme to a brief text explaining +# why it's not available: +# Redirect news:* http://localhost/texts/newsserver-is-broken.html +# +# 2. 
Preventing accidental access +# ------------------------------- +# If there is a page or site you don't want to access for whatever +# reason (say there's a link to it that crashes Lynx [don't forget to +# report a bug], or if that starts sending you a 5 Mb file you don't +# want, or you just don't like the people...), you can prevent yourself +# from accidentally accessing it: +# Fail http://bad.site.com/* +# +# 3. Compressed files +# ------------------- +# You have downloaded a bunch of HTML documents, and compressed them +# to save space. Then you discover that links between the files don't +# work, because they all use the names of the uncompressed files. The +# following kind of rule will allow you to navigate, invisibly accessing +# the compressed files: +# Map file://localhost/somedir/*.html file://localhost/somedir/*.html.gz +# or, perhaps better: +# Redirect file://localhost/somedir/*.html file://localhost/somedir/*.html.gz +# +# 4. Use local copies +# ------------------- +# You have downloaded a tree of HTML documents, but there are many links +# between them that still point to the remote location. You want to access +# the local copies instead, after all that's why you downloaded them. You +# could start editing the HTML, but the following might be simpler: +# Map http://remote.com/docs/*.html file://localhost/home/me/docs/*.html +# Or even combine this with compressing the files: +# Map http://remote.com/docs/*.html file://localhost/home/me/docs/*.html.gz +# +# Again, replacing the "Map" with "Redirect" is probably better - it will +# allow you to see the _real_ location on the lynx INFO screen or in the +# HISTORY list, will avoid duplicates in the cache if the same document is +# loaded with two different URLs, and may allow you to 'e'dit the local +# from within lynx if you feel like it. +# +# 5. Broken links etc. 
+# -------------------- +# A user has moved from http://www.siteA.com/~jdoe to http://siteB.org/john, +# or http://www.provider.com/company/ has moved to their own server +# http://www.company.com, but there are still links to the old location +# all over the place; they now are broken or lead to a stupid "this page +# has moved, please update your bookmarks. Refresh in 5 seconds" page +# which you're tired of seeing. This will not fix your bookmarks, and +# it will let you see the outdated URLs for longer (Limitation 3 below), +# but for a quick fix: +# Redirect http://www.siteA.com/~jdoe/* http://siteB.org/john/* +# Redirect http://www.provider.com/company/* http://www.company.com/* +# +# You could use "Map" instead of "Redirect", but this would let you see the +# outdated URLs for longer and even bookmark them, and you are likely to +# create invalid links if not all documents from a site are mapped +# (Limitation 3). +# +# 6. DNS troubles +# --------------- +# A special case of broken links. If a site is inaccessible because the +# name cannot be resolved (your or their name server is broken, or the +# name registry once again made a mistake, or they really didn't pay in +# time...) but you still somehow know the address; or if name lookups are +# just too slow: +# Map http://www.somesite.com/* http://10.1.2.3/* +# (You could do the equivalent more cleanly by adding an entry to the hosts +# file, if you have access to it.) +# +# Or, if a name resolves to several addresses of which one is down, and the +# DNS hasn't caught up: +# Map http://www.w3.org/* http://www12.w3.org/* +# +# Note that this can break access to some name-based virtually hosted sites. +# +# In this case use of "Map" is probably preferred over "Redirect", as long +# as the URL on the left side contains the real and preferred hostname or +# the problem is only temporary. +# +# 7. 
Avoid redirections +# --------------------- +# Some sites have a habit to provide links that don't go to the destination +# directly but always force redirection via some intermediate URL. The +# delay imposed by this, especially for users with slower connections and +# for overloaded servers, can be avoided if the intermediate URLs always +# follow some simple pattern: we can then anticipate the redirect that will +# inevitably follow and generate it internally. For example, +# Redirect http://lwn.net/cgi-bin/vr/* http://* +# +# Warning: The page authors may not like this circumvention. Often the +# redirection is wanted by them to track access, sometimes in connection +# with cookies. Some sites may employ mechanisms that defeat the shortcut. +# It is your responsibility to decide whether use of this feature is +# acceptable. (But note that the same effect can be achieved anyway for +# any link by editing the URL, e.g. with the ELGOTO ('E') key in Lynx, so +# a shortcut like this does not create some new kind of intrusion.) +# +# 8. Detailed proxy selection +# --------------------------- +# Basic use for this one should be obvious, if you have a need for it. +# It simply allows selecting use (or non-use) of proxies on a more detailed +# level than the traditional <scheme>_proxy and no_proxy variables, as well +# as using different proxies for different sites. +# For example, to request access through an anonymizing proxy for all pages +# on a "suspicious" site: +# UseProxy http://suspicious.site/* http://anonymyzing.proxy.dom/ +# (as long as all URLs really have a matching form, not some alternative +# like <http://suspicious.site:80/> or <http://SuSpIcIoUs.site/>!) 
+# +# To access some site through a local squid proxy, running on the same host +# as lynx, except for some image types (say because you rarely access images +# with lynx anyway, and if you do, you don't want them cached by the proxy): +# UseProxy http://some.site/*.gif none +# UseProxy http://some.site/*.jpg none +# UseProxy http://some.site/* http://localhost:3128/ +# Note that order is important here. +# +# To exempt a local address from all proxying: +# UseProxy http://local.site/* none +# +# Note however that for some purposes the "no_proxy" setting may be better +# suited than "UseProxy ... none", because of its different matching logic +# (see comments in lynx.cfg). +# +# 9. Invent your own scheme +# ------------------------- +# Suppose you want to teach lynx to handle a completely new URL scheme. +# If what's required for the new scheme is already available in lynx in +# _some_ way, this may be possible with some inventive use of rules. +# As an example, let's assume you want to introduce a simple "man:" scheme +# for showing manual pages, so (for a Unix-like system, at least) "man:lynx" +# would display the same help information as the "man lynx" command and so +# on (we ignore section numbers etc. for simplicity here). +# First, since lynx doesn't know anything about a "man:" scheme, it will +# normally reject any such URLs at an early stage. However, a trick exists +# to bypass that hurdle: define a man_proxy environment variable *outside of +# lynx, before starting lynx* (it won't work in lynx.cfg), the actual value +# is unimportant and won't actually be used. For example, in your shell: +# export man_proxy=X +# +# If you already have some kind of HTTP-accessible man gateway available, +# the task then probably just amounts to transforming the URL into the right +# form. 
For one such gateway (in this case, a CGI script running on the +# local machine), the rule +# Redirect man:* http://localhost/cgi-bin/dwww?type=runman&location=*/ +# or, alternatively, +# UseProxy man:* none +# Map man:* http://localhost/cgi-bin/dwww?type=runman&location=*/ +# does it, for other setups the right-hand side just has to be modified +# appropriately. The "UseProxy" is to make sure the bogus man_proxy gets +# ignored. +# +# If no CGI-like access is available, you might want to invoke your system's +# man command directly for a man: URL. Here is some discussion of how this +# could be done, and why ultimately you may not want to do it; this is also +# an opportunity to show examples for how some of the rules and conditions +# can be used that haven't been discussed in detail elsewhere. +# Lynx provides the lynxexec: (and the similar lynxprog:) scheme for running +# (nearly) arbitrary commands locally. At the heart of employing it for +# man: would be a rule like this: +# Redirect man:* "lynxexec:/usr/bin/man *" +# (It is a peculiarity of this scheme that the literal space and quoting +# are necessary here. Also note that Map cannot be used here instead of +# Redirect, since lynxexec, as a special kind of URL, needs to be handled +# "early" in a request.) +# Of course, execution of arbitrary commands is a potentially dangerous +# thing. lynxexec has to be specifically enabled at compile time and in +# lynx.cfg (or with command line options), and there are various levels +# of control, too much to go into here. It is assumed in the following that +# lynxexec has been enabled to the degree necessary (allow /usr/bin/man +# execution) but hopefully not too much. +# What needs to be prevented is that allowing local execution of the man +# command might unintentionally open up unwanted execution of other commands, +# possibly by some trick that could be exploited. 
For example, redirecting +# man:* as above, the URL "man:lynx;rm -r *" could result in the command +# "man lynx;rm -r *" executed by the system, with obvious disastrous results. +# (This particular example won't actually work, for several reasons; but +# for the purpose of discussion let's assume it did, there may be similar +# ones that do.) +# Because of such dangers, redirection to a lynxexec: is normally never +# accepted by lynx. We need at least a PermitRedirection rule to override +# this protective limitation: +# PermitRedirection man:* +# Redirect man:* "lynxexec:/usr/bin/man *" +# But now we have potentially opened up local execution more than is +# acceptable via the man: scheme, so this needs to be examined. +# There are two aspects to security here: (1) restricting the user, and (2) +# protecting the user. The first could also be phrased as protecting the +# system from the user; the second as preventing lynx (and the system) from +# doing things the user doesn't really want. Aspect (1) is very important +# for setups providing anonymous guest accounts and similarly restricted +# environments. (Otherwise shell access is normally allowed, and trying to +# protect the system in lynx would be rather pointless.) As far as access +# to some URLs is concerned, the difference can be characterized in terms of +# which sources of URLs are trusted enough to allow access: for (1), only +# links occurring in a limited number of documents are trusted enough for +# some (or all) URLs, user input at 'g'oto prompts and the like is not (if +# not completely disabled). For (2) and assuming a user with normal shell +# privileges, the user may be trusted enough to accept any URL explicitly +# entered, but URLs from arbitrary external sources are not - someone might +# try to use them to trick the user (by following an innocent-looking link) +# or lynx (by following a redirection) into doing something undesirable. 
+# +# In the following we are concerned with (2); it is assumed that providers +# of anonymous accounts would not want to follow this path, and would have +# no need for additional schemes that imply local execution anyway. (For +# one thing, with the man example they would have to carefully check that +# users cannot break out of the man command to a local shell prompt.) +# +# Getting back to the example, it was already mentioned that lynx does not +# allow redirections to lynxexec. In fact this continues to be disallowed +# for real redirection received from HTTP servers. But we have introduced +# a new man: scheme, and the lynx code that does the redirection checking +# doesn't know anything about special considerations for man: URLs, so +# an external HTTP server might send a redirection message with "Location: +# man:<something>", which lynx would allow, and which would in turn be +# redirected by our rule to "lynxexec:/usr/bin/man <something>". Unless +# we are 100% sure that either this can never happen or that the lynxexec +# URL resulting from this can have no harmful effect, this needs to be +# prevented. It can be done by checking for the "redirected" condition, +# either by putting something like (the first line is of course optional) +# Alert man:* "Redirection to man: not allowed" if redirected +# Fail man:* if redirected +# somewhere before the Redirect rule, or, reversing the logic, by adding +# a condition to the redirection rules, i.e. they become +# PermitRedirection man:* unless redirected +# Redirect man:* "lynxexec:/usr/bin/man *" unless redirected +# (actually, putting the condition on either one of the rules would be +# sufficient). The second variant assumes that the attempted access to +# man: via redirection will ultimately fail because there is no other way +# to handle such URLs. +# +# The above should take care of rejecting man: URLs from redirections, but +# what about regular links in HTML (like <A HREF="man:...">)? 
As long as +# it can be assumed that the user will always inspect each and every link +# before following it, and never follow a link that can have a harmful effect, +# no further restrictions are necessary. But this is a very big assumption, +# unrealistic except perhaps in some single-user setups where the user is +# identical with the rule writer. So normally most links have to be +# regarded as suspect, and only URLs entered by the user can be accepted: +# Alert man:* "Redirection to man: not allowed" if redirected +# Fail man:* if redirected +# Alert man:* "Link to man: not allowed" unless userspecified +# Fail man:* unless userspecified +# +# With these restrictions we have limited the ways our new man: scheme can +# be used rather severely, to the point where its usefulness is questionable. +# In addition to 'g'oto prompts, it may work in Jump files; also, should +# links to man:<something> appear in HTML text, the user could retype them +# manually or use the ELGOTO ('E') command with some trivial editing (like +# adding a space) to "confirm" the URL. Even if the precautions outlined +# above are followed: THIS TEXT DOES NOT IMPLY ANY PROMISE THAT, BY FOLLOWING +# THE EXAMPLES, LYNX WILL BE SAFE. On the other hand, some of the precautions +# *may* not be necessary: it is possible that careful use of TRUSTED_EXEC +# options in lynx.cfg could offer enough protection while making the new +# scheme more useful. +# +# If all this seems a bit too scary, that's intentional; it should be noted +# that these considerations are not in general necessary for "harmless" URL +# schemes, but appropriate for this "extreme" example. One last remark +# regarding the hypothetical man scheme: instead of implementing it through +# "lynxexec:" or "lynxprog:", it would be somewhat safer to use "lynxcgi:" +# if it is supported.
A simple lynxcgi script would have to write +# the man page to stdout (either converted to text/html or as plain text, +# preceded by an appropriate Content-Type header line), and all necessary +# checking for special shell characters would be done within the script - +# lynx does not use the system() function to run the script. +# +# Other Limitations +# ================= +# First, see CAVEAT above. There are other limitations: +# +# 1. Applicable URL schemes +# ------------------------- +# Rules processing does not apply to all URL schemes. Some are +# handled differently from the generic access code, therefore rules +# for such URLs will never be "seen". This limitation applies at +# least to lynxexec:, lynxprog:, mailto:, LYNXHIST:, LYNXMESSAGES:, +# LYNXCFG:, and LYNXCOMPILEOPTS: URLs. You shouldn't be tempted +# to try to redirect most of these schemes anyway, but this also +# makes it impossible to disable them with "Fail" rules. +# +# Also, a scheme has to be known to Lynx in order to get as far as +# applying rules - you cannot just define your own new foobar: scheme +# and then map it to something here, but see Application 9, above, +# for a workaround. +# +# 2. No re-checking +# ----------------- +# When a URL is mapped to a different one, the new URL is not checked +# again for compliance with most restrictions established by -anonymous, +# -restrictions, lynx.cfg and so on. This can be regarded as a feature: +# it allows specific exceptions. Of course it means that users for +# whom any restrictions must be enforced cannot have write access to a +# personal rules file, but that should be obvious anyway! +# This limitation does not apply if "Redirect" is used; in that case +# the new URL will always be re-examined. +# +# 3.
Mappings are invisible +# ------------------------- +# Changing the URL with "Map" or "Pass" rules will in general not be +# visible to the user, because it happens at a late stage of processing +# a request (similar to directing a request through a proxy). One +# can think of two kinds of URL for every resource: a "Document URL" as +# the user sees it (on INFO page, history list, status line, etc.), and +# a "physical URL" used for the actual access. Rules change only the +# physical URL. This is different from the effect of HTTP redirection. +# Often this is bad, sometimes it may be desirable. +# +# Changing the URL can create broken links if a document has relative URLs, +# since they are taken to be relative to the "Document URL" (if no BASE tag +# is present) when the HTML is parsed. +# +# This limitation does not apply if "Redirect" is used - the new location +# will be visible to the user, and will be used by lynx for resolving +# relative URLs within the document. +# +# 4. Interaction with proxying +# ---------------------------- +# Rules processing is done after most other access checks, but before +# proxy (and gateway) settings are examined. A "Fail" rule works +# as expected, but when the URL has been mapped to a different one, +# the subsequent proxy checking can get confused. If it decides that +# access is through a proxy or gateway, it will generally use the +# original URL to construct the "physical" URL, effectively overriding +# the mapping rules. If the mapping is to a different access scheme +# or hostname, proxy checking could also be fooled to use a proxy when +# it shouldn't, to not use one when it should, or (if different proxies +# are used for different schemes) to use the wrong proxy. So "just +# don't do that"; in some cases setting the no_proxy variable will help. +# Example 3 happens to work nicely if there is a http_proxy but no +# ftp_proxy. 
+# +# This limitation does not come into play if a "UseProxy" rule is applied, +# in either of its two forms: with a PROXYURL, proxying is fully under +# the control of the rules author, and with "none", subsequent proxy +# and gateway checking is completely disabled. It is therefore a good +# idea to combine any "Map" and "Pass" rules that might result in passing +# the changed URL with explicit "UseProxy" rules, if the rules file is +# expected to be used together with proxying; or else always use "Redirect" +# instead of simple passing. +# +# 5. Case-sensitive matching +# -------------------------- +# The matching logic is generic string-based. It doesn't know anything +# about URL syntax, and so it cannot know in which parts of a URL case +# matters and where it doesn't. As a result, all comparisons are case- +# sensitive. If (a limited number of) case variations of a URL need +# to be dealt with, several rules can be used instead of one. +# In particular, this makes "UseProxy ... none" in some ways more limited +# than a no_proxy setting. +# +# 6. Redirection differences +# -------------------------- +# For some URLs lynx never checks after a request whether a redirection +# occurs; that makes the "Redirect" rule useless for such URLs (in addition +# to those mentioned under Limitation 1). Some of them are some gopher +# types, telnet: and similar in most situations, newspost: and similar, +# lynxcgi:, and some other private types. Trying to redirect these will +# make access fail. You probably don't want to change such URLs anyway, +# but if you feel you must, try using "Map" and "Pass" instead. +# +# The -noredir command line option only applies for real HTTP redirection +# responses; Redirect rules are still applied. Also for certain other +# command line options (-mime_header, -head) and command keys (HEAD) lynx +# shows the redirection message (or part of it) in case of a real HTTP +# redirection, instead of following the redirection.
Here, too, a Redirect +# rule remains effective (there is no redirection message to show, after all). +# +# 7. URLs required +# ---------------- +# Full absolute URLs (modulo possible "*" matching wildcards) are required +# in rules. Strings like "www.somewhere.com" or "/some/dir/some.file" or +# "www.somewhere.com/some/dir/some.file" are not URLs. Lynx may accept +# them as user input, as abbreviated forms for URLs; but by the time the +# rules get checked, those have been converted to full URLs, if they can +# be recognized. This also means that rules cannot influence which strings +# typed at a 'g'oto prompt are recognized as URLs - rules processing kicks +# in later.
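The single-wildcard matching and the Fail / Map / Pass flow described under "Syntax" can be illustrated with a small Python sketch. This is not Lynx code and is not part of the committed file: the names match_rule, substitute and apply_rules are hypothetical, CONDITIONs and the Redirect, UseProxy and message operators are left out, and splitting URL1 at its one '*' into a prefix and a suffix is only an assumption about how such matching could be done.

```python
from typing import List, Optional, Tuple

# A rule is (operator, URL1, URL2-or-None); hypothetical representation.
Rule = Tuple[str, str, Optional[str]]


def match_rule(pattern: str, url: str) -> Optional[str]:
    """Case-sensitive match of url against a pattern with at most one '*'.

    Returns the text covered by '*' ("" if the pattern has no wildcard),
    or None if the URL does not match.
    """
    if "*" not in pattern:
        return "" if pattern == url else None
    # Assumption: only the first '*' is treated as a wildcard.
    prefix, suffix = pattern.split("*", 1)
    if (len(url) >= len(prefix) + len(suffix)
            and url.startswith(prefix)
            and url.endswith(suffix)):
        return url[len(prefix):len(url) - len(suffix)]
    return None


def substitute(template: str, matched: str) -> str:
    """Replace the '*' in URL2 (if any) by whatever '*' matched in URL1."""
    return template.replace("*", matched, 1)


def apply_rules(rules: List[Rule], url: str) -> Optional[str]:
    """Process rules first to last; return the final URL, or None on Fail."""
    current = url
    for op, url1, url2 in rules:
        matched = match_rule(url1, current)
        if matched is None:
            continue  # rule does not apply to the current URL
        if op == "Fail":
            return None  # access rejected, stop processing
        if op == "Map" and url2 is not None:
            current = substitute(url2, matched)  # continue with the new URL
        elif op == "Pass":
            if url2 is not None:
                current = substitute(url2, matched)  # last mapping
            return current  # accepted, stop processing
    return current  # end of rules reached: equivalent to a final "Pass *"


if __name__ == "__main__":
    rules: List[Rule] = [
        ("Fail", "gopher:*", None),
        ("Map", "file://localhost/somedir/*.html",
         "file://localhost/somedir/*.html.gz"),
    ]
    print(apply_rules(rules, "file://localhost/somedir/index.html"))
    print(apply_rules(rules, "gopher://example.com/"))
```

With the example URL from the matching discussion, match_rule("http://*/doc.html", "http://example.com/dir/doc.html") yields the wildcard text, while a pattern such as "http://Example.COM/*" does not match because the comparison is case-sensitive.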