Diffstat
-rw-r--r-- doc/wget.info | 5075
1 file changed, 5075 insertions, 0 deletions
diff --git a/doc/wget.info b/doc/wget.info
new file mode 100644
index 0000000..93d759b
--- /dev/null
+++ b/doc/wget.info
@@ -0,0 +1,5075 @@
+This is wget.info, produced by makeinfo version 6.5 from wget.texi.
+
+This file documents the GNU Wget utility for downloading network data.
+
+ Copyright © 1996-2011, 2015, 2018 Free Software Foundation, Inc.
+
+ Permission is granted to copy, distribute and/or modify this document
+under the terms of the GNU Free Documentation License, Version 1.3 or
+any later version published by the Free Software Foundation; with no
+Invariant Sections, with no Front-Cover Texts, and with no Back-Cover
+Texts. A copy of the license is included in the section entitled “GNU
+Free Documentation License”.
+INFO-DIR-SECTION Network applications
+START-INFO-DIR-ENTRY
+* Wget: (wget). Non-interactive network downloader.
+END-INFO-DIR-ENTRY
+
+
+File: wget.info, Node: Top, Next: Overview, Prev: (dir), Up: (dir)
+
+Wget 1.20.1
+***********
+
+This file documents the GNU Wget utility for downloading network data.
+
+ Copyright © 1996-2011, 2015, 2018 Free Software Foundation, Inc.
+
+ Permission is granted to copy, distribute and/or modify this document
+under the terms of the GNU Free Documentation License, Version 1.3 or
+any later version published by the Free Software Foundation; with no
+Invariant Sections, with no Front-Cover Texts, and with no Back-Cover
+Texts. A copy of the license is included in the section entitled “GNU
+Free Documentation License”.
+
+* Menu:
+
+* Overview:: Features of Wget.
+* Invoking:: Wget command-line arguments.
+* Recursive Download:: Downloading interlinked pages.
+* Following Links:: The available methods of chasing links.
+* Time-Stamping:: Mirroring according to time-stamps.
+* Startup File:: Wget’s initialization file.
+* Examples:: Examples of usage.
+* Various:: The stuff that doesn’t fit anywhere else.
+* Appendices:: Some useful references.
+* Copying this manual:: You may give out copies of this manual.
+* Concept Index:: Topics covered by this manual.
+
+
+File: wget.info, Node: Overview, Next: Invoking, Prev: Top, Up: Top
+
+1 Overview
+**********
+
+GNU Wget is a free utility for non-interactive download of files from
+the Web. It supports HTTP, HTTPS, and FTP protocols, as well as
+retrieval through HTTP proxies.
+
+ This chapter is a partial overview of Wget’s features.
+
+ • Wget is non-interactive, meaning that it can work in the
+ background, while the user is not logged on. This allows you to
+ start a retrieval and disconnect from the system, letting Wget
+ finish the work. By contrast, most Web browsers require the
+ user’s constant presence, which can be a great hindrance when
+ transferring a lot of data.
+
+ • Wget can follow links in HTML, XHTML, and CSS pages, to create
+ local versions of remote web sites, fully recreating the directory
+ structure of the original site. This is sometimes referred to as
+ “recursive downloading.” While doing that, Wget respects the Robot
+ Exclusion Standard (‘/robots.txt’). Wget can be instructed to
+ convert the links in downloaded files to point at the local files,
+ for offline viewing.
+
+ • File name wildcard matching and recursive mirroring of directories
+ are available when retrieving via FTP. Wget can read the
+ time-stamp information given by both HTTP and FTP servers, and
+ store it locally. Thus Wget can see if the remote file has changed
+ since last retrieval, and automatically retrieve the new version if
+ it has. This makes Wget suitable for mirroring of FTP sites, as
+ well as home pages.
+
+ • Wget has been designed for robustness over slow or unstable network
+ connections; if a download fails due to a network problem, it will
+ keep retrying until the whole file has been retrieved. If the
+ server supports regetting, it will instruct the server to continue
+ the download from where it left off.
+
+ • Wget supports proxy servers, which can lighten the network load,
+ speed up retrieval and provide access behind firewalls. Wget uses
+ passive FTP by default; active FTP is available as an option.
+
+ • Wget supports IP version 6, the next generation of IP. IPv6 is
+ autodetected at compile-time, and can be disabled at either build
+ or run time. Binaries built with IPv6 support work well in both
+ IPv4-only and dual family environments.
+
+ • Built-in features offer mechanisms to tune which links you wish to
+ follow (*note Following Links::).
+
+ • The progress of individual downloads is traced using a progress
+ gauge. Interactive downloads are tracked using a
+ “thermometer”-style gauge, whereas non-interactive ones are traced
+ with dots, each dot representing a fixed amount of data received
+ (1KB by default). Either gauge can be customized to your
+ preferences.
+
+ • Most of the features are fully configurable, either through command
+ line options, or via the initialization file ‘.wgetrc’ (*note
+ Startup File::). Wget allows you to define “global” startup files
+ (‘/usr/local/etc/wgetrc’ by default) for site settings. You can
+ also specify the location of a startup file with the ‘--config’
+ option. To disable the reading of config files, use ‘--no-config’.
+ If both ‘--config’ and ‘--no-config’ are given, ‘--no-config’ is
+ ignored.
+
+ • Finally, GNU Wget is free software. This means that everyone may
+ use it, redistribute it and/or modify it under the terms of the GNU
+ General Public License, as published by the Free Software
+ Foundation (see the file ‘COPYING’ that came with GNU Wget, for
+ details).
+
+
+File: wget.info, Node: Invoking, Next: Recursive Download, Prev: Overview, Up: Top
+
+2 Invoking
+**********
+
+By default, Wget is very simple to invoke. The basic syntax is:
+
+ wget [OPTION]... [URL]...
+
+ Wget will simply download all the URLs specified on the command line.
+URL is a “Uniform Resource Locator”, as defined below.
+
+ However, you may wish to change some of the default parameters of
+Wget. You can do so in two ways: permanently, by adding the appropriate
+command to ‘.wgetrc’ (*note Startup File::), or for a single run, by
+specifying it on the command line.
+
+* Menu:
+
+* URL Format::
+* Option Syntax::
+* Basic Startup Options::
+* Logging and Input File Options::
+* Download Options::
+* Directory Options::
+* HTTP Options::
+* HTTPS (SSL/TLS) Options::
+* FTP Options::
+* Recursive Retrieval Options::
+* Recursive Accept/Reject Options::
+* Exit Status::
+
+
+File: wget.info, Node: URL Format, Next: Option Syntax, Prev: Invoking, Up: Invoking
+
+2.1 URL Format
+==============
+
+“URL” is an acronym for Uniform Resource Locator. A uniform resource
+locator is a compact string representation for a resource available via
+the Internet. Wget recognizes the URL syntax as per RFC1738. This is
+the most widely used form (square brackets denote optional parts):
+
+ http://host[:port]/directory/file
+ ftp://host[:port]/directory/file
+
+ You can also encode your username and password within a URL:
+
+ ftp://user:password@host/path
+ http://user:password@host/path
+
+ Either USER or PASSWORD, or both, may be left out. If you leave out
+either the HTTP username or password, no authentication will be sent.
+If you leave out the FTP username, ‘anonymous’ will be used. If you
+leave out the FTP password, your email address will be supplied as a
+default password.(1)
+
+ *Important Note*: if you specify a password-containing URL on the
+command line, the username and password will be plainly visible to all
+users on the system, by way of ‘ps’. On multi-user systems, this is a
+big security risk. To work around it, use ‘wget -i -’ and feed the URLs
+to Wget’s standard input, each on a separate line, terminated by ‘C-d’.
+
+ You can encode unsafe characters in a URL as ‘%xy’, ‘xy’ being the
+hexadecimal representation of the character’s ASCII value. Some common
+unsafe characters include ‘%’ (quoted as ‘%25’), ‘:’ (quoted as ‘%3A’),
+and ‘@’ (quoted as ‘%40’). Refer to RFC1738 for a comprehensive list of
+unsafe characters.
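+
+ As a rough illustration of the quoting rule above, the following
+shell function percent-encodes a string before it is placed in a URL.
+The name ‘urlencode’ is hypothetical and not part of Wget. The sketch
+assumes ASCII input and leaves only letters, digits, ‘-’, ‘.’, ‘_’ and
+‘~’ untouched (the “unreserved” set of RFC 3986), so it quotes more
+characters than strictly necessary.
+
```shell
# urlencode STRING
# Hypothetical helper (not part of Wget): print STRING with every
# character outside A-Z a-z 0-9 - . _ ~ replaced by %XY.
# The body runs in a subshell so that LC_ALL=C does not leak out.
urlencode() (
    LC_ALL=C
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s#?}          # everything after the first character
        c=${s%"$rest"}       # the first character itself
        case $c in
            [A-Za-z0-9._~-]) out=$out$c ;;
            *) out=$out$(printf '%%%02X' "'$c") ;;   # "'c" yields the character code
        esac
        s=$rest
    done
    printf '%s\n' "$out"
)
```
+
+ For example, ‘urlencode 'p@ss:w%rd'’ prints ‘p%40ss%3Aw%25rd’, which
+can then be used as the PASSWORD part of a URL.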
+
+ Wget also supports the ‘type’ feature for FTP URLs. By default, FTP
+documents are retrieved in the binary mode (type ‘i’), which means that
+they are downloaded unchanged. Another useful mode is the ‘a’ (“ASCII”)
+mode, which converts the line delimiters between the different operating
+systems, and is thus useful for text files. Here is an example:
+
+ ftp://host/directory/file;type=a
+
+ Two alternative variants of URL specification are also supported,
+because of historical (hysterical?) reasons and their widespread use.
+
+ FTP-only syntax (supported by ‘NcFTP’):
+ host:/dir/file
+
+ HTTP-only syntax (introduced by ‘Netscape’):
+ host[:port]/dir/file
+
+ These two alternative forms are deprecated, and may cease being
+supported in the future.
+
+ If you do not understand the difference between these notations, or
+do not know which one to use, just use the plain ordinary format you use
+with your favorite browser, like ‘Lynx’ or ‘Netscape’.
+
+ ---------- Footnotes ----------
+
+ (1) If you have a ‘.netrc’ file in your home directory, the password
+will also be searched for there.
+
+
+File: wget.info, Node: Option Syntax, Next: Basic Startup Options, Prev: URL Format, Up: Invoking
+
+2.2 Option Syntax
+=================
+
+Since Wget uses GNU getopt to process command-line arguments, every
+option has a long form along with the short one. Long options are more
+convenient to remember, but take time to type. You may freely mix
+different option styles, or specify options after the command-line
+arguments. Thus you may write:
+
+ wget -r --tries=10 http://fly.srk.fer.hr/ -o log
+
+ The space between the option accepting an argument and the argument
+may be omitted. Instead of ‘-o log’ you can write ‘-olog’.
+
+ You may put several options that do not require arguments together,
+like:
+
+ wget -drc URL
+
+ This is completely equivalent to:
+
+ wget -d -r -c URL
+
+ Since the options can be specified after the arguments, you may
+terminate them with ‘--’. So the following will try to download URL
+‘-x’, reporting failure to ‘log’:
+
+ wget -o log -- -x
+
+ The options that accept comma-separated lists all respect the
+convention that specifying an empty list clears its value. This can be
+useful to clear the ‘.wgetrc’ settings. For instance, if your ‘.wgetrc’
+sets ‘exclude_directories’ to ‘/cgi-bin’, the following example will
+first reset it, and then set it to exclude ‘/~nobody’ and ‘/~somebody’.
+You can also clear the lists in ‘.wgetrc’ (*note Wgetrc Syntax::).
+
+ wget -X '' -X /~nobody,/~somebody
+
+ Most options that do not accept arguments are “boolean” options, so
+named because their state can be captured with a yes-or-no (“boolean”)
+variable. For example, ‘--follow-ftp’ tells Wget to follow FTP links
+from HTML files and, on the other hand, ‘--no-glob’ tells it not to
+perform file globbing on FTP URLs. A boolean option is either
+“affirmative” or “negative” (beginning with ‘--no’). All such options
+share several properties.
+
+ Unless stated otherwise, it is assumed that the default behavior is
+the opposite of what the option accomplishes. For example, the
+documented existence of ‘--follow-ftp’ assumes that the default is to
+_not_ follow FTP links from HTML pages.
+
+ Affirmative options can be negated by prepending ‘--no-’ to the
+option name; negative options can be negated by omitting the ‘--no-’
+prefix. This might seem superfluous—if the default for an affirmative
+option is to not do something, then why provide a way to explicitly turn
+it off? But the startup file may in fact change the default. For
+instance, using ‘follow_ftp = on’ in ‘.wgetrc’ makes Wget _follow_ FTP
+links by default, and using ‘--no-follow-ftp’ is the only way to restore
+the factory default from the command line.
+
+
+File: wget.info, Node: Basic Startup Options, Next: Logging and Input File Options, Prev: Option Syntax, Up: Invoking
+
+2.3 Basic Startup Options
+=========================
+
+‘-V’
+‘--version’
+ Display the version of Wget.
+
+‘-h’
+‘--help’
+ Print a help message describing all of Wget’s command-line options.
+
+‘-b’
+‘--background’
+ Go to background immediately after startup. If no output file is
+ specified via the ‘-o’, output is redirected to ‘wget-log’.
+
+‘-e COMMAND’
+‘--execute COMMAND’
+ Execute COMMAND as if it were a part of ‘.wgetrc’ (*note Startup
+ File::). A command thus invoked will be executed _after_ the
+ commands in ‘.wgetrc’, thus taking precedence over them. If you
+ need to specify more than one wgetrc command, use multiple
+ instances of ‘-e’.
+
+
+File: wget.info, Node: Logging and Input File Options, Next: Download Options, Prev: Basic Startup Options, Up: Invoking
+
+2.4 Logging and Input File Options
+==================================
+
+‘-o LOGFILE’
+‘--output-file=LOGFILE’
+ Log all messages to LOGFILE. The messages are normally reported to
+ standard error.
+
+‘-a LOGFILE’
+‘--append-output=LOGFILE’
+ Append to LOGFILE. This is the same as ‘-o’, only it appends to
+ LOGFILE instead of overwriting the old log file. If LOGFILE does
+ not exist, a new file is created.
+
+‘-d’
+‘--debug’
+ Turn on debug output, meaning various information important to the
+ developers of Wget if it does not work properly. Your system
+ administrator may have chosen to compile Wget without debug
+ support, in which case ‘-d’ will not work. Please note that
+ compiling with debug support is always safe—Wget compiled with the
+ debug support will _not_ print any debug info unless requested with
+ ‘-d’. *Note Reporting Bugs::, for more information on how to use
+ ‘-d’ for sending bug reports.
+
+‘-q’
+‘--quiet’
+ Turn off Wget’s output.
+
+‘-v’
+‘--verbose’
+ Turn on verbose output, with all the available data. The default
+ output is verbose.
+
+‘-nv’
+‘--no-verbose’
+ Turn off verbose without being completely quiet (use ‘-q’ for
+ that), which means that error messages and basic information still
+ get printed.
+
+‘--report-speed=TYPE’
+ Output bandwidth as TYPE. The only accepted value is ‘bits’.
+
+‘-i FILE’
+‘--input-file=FILE’
+ Read URLs from a local or external FILE. If ‘-’ is specified as
+ FILE, URLs are read from the standard input. (Use ‘./-’ to read
+ from a file literally named ‘-’.)
+
+ If this function is used, no URLs need be present on the command
+ line. If there are URLs both on the command line and in an input
+ file, those on the command line will be the first ones to be
+ retrieved. If ‘--force-html’ is not specified, then FILE should
+ consist of a series of URLs, one per line.
+
+ However, if you specify ‘--force-html’, the document will be
+ regarded as ‘html’. In that case you may have problems with
+ relative links, which you can solve either by adding ‘<base
+ href="URL">’ to the documents or by specifying ‘--base=URL’ on the
+ command line.
+
+ If the FILE is an external one, the document will be automatically
+ treated as ‘html’ if the Content-Type matches ‘text/html’.
+ Furthermore, the FILE’s location will be implicitly used as base
+ href if none was specified.
+
+‘--input-metalink=FILE’
+ Download the files listed in the local Metalink FILE. Metalink
+ versions 3 and 4 are supported.
+
+‘--keep-badhash’
+ Keep downloaded Metalink files that have a bad hash. Wget appends
+ ‘.badhash’ to the name of each Metalink file with a checksum
+ mismatch, without overwriting existing files.
+
+‘--metalink-over-http’
+ Issues HTTP HEAD request instead of GET and extracts Metalink
+ metadata from response headers. Then it switches to Metalink
+ download. If no valid Metalink metadata is found, it falls back to
+ ordinary HTTP download. Enables ‘Content-Type:
+ application/metalink4+xml’ files download/processing.
+
+‘--metalink-index=NUMBER’
+ Set the Metalink ‘application/metalink4+xml’ metaurl ordinal
+ NUMBER. From 1 to the total number of “application/metalink4+xml”
+ available. Specify 0 or ‘inf’ to choose the first good one.
+ Metaurls, such as those from a ‘--metalink-over-http’, may have
+ been sorted by priority key’s value; keep this in mind to choose
+ the right NUMBER.
+
+‘--preferred-location’
+ Set the preferred location for Metalink resources. This has an
+ effect if multiple resources with the same priority are available.
+
+‘--xattr’
+ Enable use of file system’s extended attributes to save the
+ original URL and the Referer HTTP header value if used.
+
+ Be aware that the URL might contain private information like access
+ tokens or credentials.
+
+‘-F’
+‘--force-html’
+ When input is read from a file, force it to be treated as an HTML
+ file. This enables you to retrieve relative links from existing
+ HTML files on your local disk, by adding ‘<base href="URL">’ to
+ HTML, or using the ‘--base’ command-line option.
+
+‘-B URL’
+‘--base=URL’
+ Resolves relative links using URL as the point of reference, when
+ reading links from an HTML file specified via the
+ ‘-i’/‘--input-file’ option (together with ‘--force-html’, or when
+ the input file was fetched remotely from a server describing it as
+ HTML). This is equivalent to the presence of a ‘BASE’ tag in the
+ HTML input file, with URL as the value for the ‘href’ attribute.
+
+ For instance, if you specify ‘http://foo/bar/a.html’ for URL, and
+ Wget reads ‘../baz/b.html’ from the input file, it would be
+ resolved to ‘http://foo/baz/b.html’.
+
+‘--config=FILE’
+ Specify the location of a startup file you wish to use instead of
+ the default one(s). Use ‘--no-config’ to disable reading of config
+ files. If both ‘--config’ and ‘--no-config’ are given,
+ ‘--no-config’ is
+
+‘--rejected-log=LOGFILE’
+ Logs all URL rejections to LOGFILE as comma separated values. The
+ values include the reason of rejection, the URL and the parent URL
+ it was found in.
+
+
+File: wget.info, Node: Download Options, Next: Directory Options, Prev: Logging and Input File Options, Up: Invoking
+
+2.5 Download Options
+====================
+
+‘--bind-address=ADDRESS’
+ When making client TCP/IP connections, bind to ADDRESS on the local
+ machine. ADDRESS may be specified as a hostname or IP address.
+ This option can be useful if your machine is bound to multiple IPs.
+
+‘--bind-dns-address=ADDRESS’
+ [libcares only] This address overrides the route for DNS requests.
+ If you ever need to circumvent the standard settings from
+ /etc/resolv.conf, this option together with ‘--dns-servers’ is your
+ friend. ADDRESS must be specified either as IPv4 or IPv6 address.
+ Wget needs to be built with libcares for this option to be
+ available.
+
+‘--dns-servers=ADDRESSES’
+ [libcares only] The given address(es) override the standard
+ nameserver addresses, e.g. as configured in /etc/resolv.conf.
+ ADDRESSES may be specified either as IPv4 or IPv6 addresses,
+ comma-separated. Wget needs to be built with libcares for this
+ option to be available.
+
+‘-t NUMBER’
+‘--tries=NUMBER’
+ Set number of tries to NUMBER. Specify 0 or ‘inf’ for infinite
+ retrying. The default is to retry 20 times, with the exception of
+ fatal errors like “connection refused” or “not found” (404), which
+ are not retried.
+
+‘-O FILE’
+‘--output-document=FILE’
+ The documents will not be written to the appropriate files, but all
+ will be concatenated together and written to FILE. If ‘-’ is used
+ as FILE, documents will be printed to standard output, disabling
+ link conversion. (Use ‘./-’ to print to a file literally named
+ ‘-’.)
+
+ Use of ‘-O’ is _not_ intended to mean simply “use the name FILE
+ instead of the one in the URL;” rather, it is analogous to shell
+ redirection: ‘wget -O file http://foo’ is intended to work like
+ ‘wget -O - http://foo > file’; ‘file’ will be truncated
+ immediately, and _all_ downloaded content will be written there.
+
+ For this reason, ‘-N’ (for timestamp-checking) is not supported in
+ combination with ‘-O’: since FILE is always newly created, it will
+ always have a very new timestamp. A warning will be issued if this
+ combination is used.
+
+ Similarly, using ‘-r’ or ‘-p’ with ‘-O’ may not work as you expect:
+ Wget won’t just download the first file to FILE and then download
+ the rest to their normal names: _all_ downloaded content will be
+ placed in FILE. This was disabled in version 1.11, but has been
+ reinstated (with a warning) in 1.11.2, as there are some cases
+ where this behavior can actually have some use.
+
+ A combination with ‘-nc’ is only accepted if the given output file
+ does not exist.
+
+ Note that a combination with ‘-k’ is only permitted when
+ downloading a single document, as in that case it will just convert
+ all relative URIs to external ones; ‘-k’ makes no sense for
+ multiple URIs when they’re all being downloaded to a single file;
+ ‘-k’ can be used only when the output is a regular file.
+
+‘-nc’
+‘--no-clobber’
+ If a file is downloaded more than once in the same directory,
+ Wget’s behavior depends on a few options, including ‘-nc’. In
+ certain cases, the local file will be “clobbered”, or overwritten,
+ upon repeated download. In other cases it will be preserved.
+
+ When running Wget without ‘-N’, ‘-nc’, ‘-r’, or ‘-p’, downloading
+ the same file in the same directory will result in the original
+ copy of FILE being preserved and the second copy being named
+ ‘FILE.1’. If that file is downloaded yet again, the third copy
+ will be named ‘FILE.2’, and so on. (This is also the behavior with
+ ‘-nd’, even if ‘-r’ or ‘-p’ are in effect.) When ‘-nc’ is
+ specified, this behavior is suppressed, and Wget will refuse to
+ download newer copies of ‘FILE’. Therefore, “‘no-clobber’” is
+ actually a misnomer in this mode—it’s not clobbering that’s
+ prevented (as the numeric suffixes were already preventing
+ clobbering), but rather the multiple version saving that’s
+ prevented.
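+
+ The numbering scheme described above can be sketched in shell as
+ follows. The function name ‘next_free_name’ is hypothetical; this
+ only illustrates the naming rule and is not Wget’s own code.
+
```shell
# next_free_name FILE
# Hypothetical helper: print the name under which a repeated download
# of FILE would be saved when neither -N, -nc, -r nor -p is given:
# FILE itself if unused, otherwise the first unused FILE.N (N = 1, 2, ...).
next_free_name() {
    name=$1
    if [ ! -e "$name" ]; then
        printf '%s\n' "$name"
        return 0
    fi
    n=1
    while [ -e "$name.$n" ]; do
        n=$((n + 1))
    done
    printf '%s\n' "$name.$n"
}
```
+
+ With ‘-nc’, the equivalent decision is simply “if FILE exists, do
+ not download at all”, which is why no numbered copies appear.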
+
+ When running Wget with ‘-r’ or ‘-p’, but without ‘-N’, ‘-nd’, or
+ ‘-nc’, re-downloading a file will result in the new copy simply
+ overwriting the old. Adding ‘-nc’ will prevent this behavior,
+ instead causing the original version to be preserved and any newer
+ copies on the server to be ignored.
+
+ When running Wget with ‘-N’, with or without ‘-r’ or ‘-p’, the
+ decision as to whether or not to download a newer copy of a file
+ depends on the local and remote timestamp and size of the file
+ (*note Time-Stamping::). ‘-nc’ may not be specified at the same
+ time as ‘-N’.
+
+ A combination with ‘-O’/‘--output-document’ is only accepted if the
+ given output file does not exist.
+
+ Note that when ‘-nc’ is specified, files with the suffixes ‘.html’
+ or ‘.htm’ will be loaded from the local disk and parsed as if they
+ had been retrieved from the Web.
+
+‘--backups=BACKUPS’
+ Before (over)writing a file, back up an existing file by adding a
+ ‘.1’ suffix (‘_1’ on VMS) to the file name. Such backup files are
+ rotated to ‘.2’, ‘.3’, and so on, up to BACKUPS (and lost beyond
+ that).
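+
+ As a sketch of the rotation just described, assuming the ‘.N’
+ suffix form used outside VMS: the function name ‘rotate_backups’
+ and its two arguments are illustrative only, not Wget’s interface.
+
```shell
# rotate_backups FILE BACKUPS
# Illustrative only: shift FILE.1 -> FILE.2, ..., up to FILE.BACKUPS,
# then move FILE to FILE.1. The oldest backup is overwritten (lost).
rotate_backups() {
    file=$1
    max=$2
    [ -e "$file" ] || return 0      # nothing to back up
    [ "$max" -ge 1 ] || return 0    # backups disabled
    i=$max
    while [ "$i" -gt 1 ]; do
        prev=$((i - 1))
        if [ -e "$file.$prev" ]; then
            mv -f "$file.$prev" "$file.$i"
        fi
        i=$prev
    done
    mv -f "$file" "$file.1"
}
```
+
+ After three rotations with BACKUPS set to 2, only the two most
+ recent versions remain, as ‘FILE.1’ and ‘FILE.2’.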
+
+‘--no-netrc’
+ Do not try to obtain credentials from ‘.netrc’ file. By default
+ ‘.netrc’ file is searched for credentials in case none have been
+ passed on command line and authentication is required.
+
+‘-c’
+‘--continue’
+ Continue getting a partially-downloaded file. This is useful when
+ you want to finish up a download started by a previous instance of
+ Wget, or by another program. For instance:
+
+ wget -c ftp://sunsite.doc.ic.ac.uk/ls-lR.Z
+
+ If there is a file named ‘ls-lR.Z’ in the current directory, Wget
+ will assume that it is the first portion of the remote file, and
+ will ask the server to continue the retrieval from an offset equal
+ to the length of the local file.
+
+ Note that you don’t need to specify this option if you just want
+ the current invocation of Wget to retry downloading a file should
+ the connection be lost midway through. This is the default
+ behavior. ‘-c’ only affects resumption of downloads started
+ _prior_ to this invocation of Wget, and whose local files are still
+ sitting around.
+
+ Without ‘-c’, the previous example would just download the remote
+ file to ‘ls-lR.Z.1’, leaving the truncated ‘ls-lR.Z’ file alone.
+
+ If you use ‘-c’ on a non-empty file, and the server does not
+ support continued downloading, Wget will restart the download from
+ scratch and overwrite the existing file entirely.
+
+ Beginning with Wget 1.7, if you use ‘-c’ on a file which is of
+ equal size as the one on the server, Wget will refuse to download
+ the file and print an explanatory message. The same happens when
+ the file is smaller on the server than locally (presumably because
+ it was changed on the server since your last download
+ attempt)—because “continuing” is not meaningful, no download
+ occurs.
+
+ On the other side of the coin, while using ‘-c’, any file that’s
+ bigger on the server than locally will be considered an incomplete
+ download and only ‘(length(remote) - length(local))’ bytes will be
+ downloaded and tacked onto the end of the local file. This
+ behavior can be desirable in certain cases—for instance, you can
+ use ‘wget -c’ to download just the new portion that’s been appended
+ to a data collection or log file.
+
+ However, if the file is bigger on the server because it’s been
+ _changed_, as opposed to just _appended_ to, you’ll end up with a
+ garbled file. Wget has no way of verifying that the local file is
+ really a valid prefix of the remote file. You need to be
+ especially careful of this when using ‘-c’ in conjunction with
+ ‘-r’, since every file will be considered as an "incomplete
+ download" candidate.
+
+ Another instance where you’ll get a garbled file if you try to use
+ ‘-c’ is if you have a lame HTTP proxy that inserts a “transfer
+ interrupted” string into the local file. In the future a
+ “rollback” option may be added to deal with this case.
+
+ Note that ‘-c’ only works with FTP servers and with HTTP servers
+ that support the ‘Range’ header.
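+
+ The size comparison that ‘-c’ relies on can be summarized by the
+ following sketch. The function ‘continue_action’ and its output
+ words are hypothetical; like Wget, it looks only at the two sizes
+ and cannot tell whether the local file is a valid prefix.
+
```shell
# continue_action LOCAL_SIZE REMOTE_SIZE
# Hypothetical summary of the -c decision, based on sizes alone:
#   remote larger   -> "resume OFFSET REMAINING"
#   equal / smaller -> "skip" (continuing is not meaningful)
continue_action() {
    local_size=$1
    remote_size=$2
    if [ "$remote_size" -gt "$local_size" ]; then
        echo "resume $local_size $((remote_size - local_size))"
    else
        echo "skip"
    fi
}
```
+
+ For instance, a 400-byte local file and a 1000-byte remote file
+ give ‘resume 400 600’: the transfer restarts at offset 400 and the
+ remaining 600 bytes are appended.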
+
+‘--start-pos=OFFSET’
+ Start downloading at zero-based position OFFSET. Offset may be
+ expressed in bytes, kilobytes with the ‘k’ suffix, or megabytes
+ with the ‘m’ suffix, etc.
+
+ ‘--start-pos’ takes precedence over ‘--continue’. When
+ ‘--start-pos’ and ‘--continue’ are both specified, wget will emit a
+ warning and then proceed as if ‘--continue’ were absent.
+
+ Server support for continued download is required, otherwise
+ ‘--start-pos’ cannot help. See ‘-c’ for details.
+
+‘--progress=TYPE’
+ Select the type of the progress indicator you wish to use. Legal
+ indicators are “dot” and “bar”.
+
+ The “bar” indicator is used by default. It draws an ASCII progress
+ bar (a.k.a. “thermometer” display) indicating the status of
+ retrieval. If the output is not a TTY, the “dot” indicator will be
+ used by default.
+
+ Use ‘--progress=dot’ to switch to the “dot” display. It traces the
+ retrieval by printing dots on the screen, each dot representing a
+ fixed amount of downloaded data.
+
+ The progress TYPE can also take one or more parameters. The
+ parameters vary based on the TYPE selected. Parameters to TYPE are
+ passed by appending them to the type, separated by a colon (:), like
+ this: ‘--progress=TYPE:PARAMETER1:PARAMETER2’.
+
+ When using the dotted retrieval, you may set the “style” by
+ specifying the type as ‘dot:STYLE’. Different styles assign
+ different meaning to one dot. With the ‘default’ style each dot
+ represents 1K, there are ten dots in a cluster and 50 dots in a
+ line. The ‘binary’ style has a more “computer”-like orientation—8K
+ dots, 16-dots clusters and 48 dots per line (which makes for 384K
+ lines). The ‘mega’ style is suitable for downloading large
+ files—each dot represents 64K retrieved, there are eight dots in a
+ cluster, and 48 dots on each line (so each line contains 3M). If
+ ‘mega’ is not enough then you can use the ‘giga’ style—each dot
+ represents 1M retrieved, there are eight dots in a cluster, and 32
+ dots on each line (so each line contains 32M).
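+
+ The per-line totals quoted above follow from multiplying the dot
+ size by the number of dots per line. A small sketch, assuming that
+ 1K means 1024 bytes (‘dot_line_bytes’ is a hypothetical name):
+
```shell
# dot_line_bytes STYLE
# Print the number of bytes one full line of dots stands for,
# using the dot sizes and dots-per-line figures given in the text.
# Assumption: K = 1024 bytes, M = 1024 * 1024 bytes.
dot_line_bytes() {
    case $1 in
        default) dot=1024;    per_line=50 ;;
        binary)  dot=8192;    per_line=48 ;;
        mega)    dot=65536;   per_line=48 ;;
        giga)    dot=1048576; per_line=32 ;;
        *) echo "unknown style: $1" >&2; return 1 ;;
    esac
    echo $((dot * per_line))
}
```
+
+ Under that assumption, ‘dot_line_bytes binary’ prints 393216
+ (384K) and ‘dot_line_bytes mega’ prints 3145728 (3M).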
+
+ With ‘--progress=bar’, there are currently two possible parameters,
+ FORCE and NOSCROLL.
+
+ When the output is not a TTY, the progress bar always falls back to
+ “dot”, even if ‘--progress=bar’ was passed to Wget during
+ invocation. This behaviour can be overridden and the “bar” output
+ forced by using the “force” parameter as ‘--progress=bar:force’.
+
+ By default, the ‘bar’ style progress bar scrolls the name of the
+ file being downloaded from left to right if the filename exceeds
+ the maximum length allotted for its display. In
+ certain cases, such as with ‘--progress=bar:force’, one may not
+ want the scrolling filename in the progress bar. By passing the
+ “noscroll” parameter, Wget can be forced to display as much of the
+ filename as possible without scrolling through it.
+
+ Note that you can set the default style using the ‘progress’
+ command in ‘.wgetrc’. That setting may be overridden from the
+ command line. For example, to force the bar output without
+ scrolling, use ‘--progress=bar:force:noscroll’.
+
+‘--show-progress’
+ Force wget to display the progress bar in any verbosity.
+
+ By default, wget only displays the progress bar in verbose mode.
+ However, one may want wget to display the progress bar on screen
+ in conjunction with another verbosity mode, such as ‘--no-verbose’
+ or ‘--quiet’. This is often desirable when invoking wget to
+ download several small or large files. In such a case, wget can
+ simply be invoked with this option to get much cleaner output on
+ the screen.
+
+ This option will also force the progress bar to be printed to
+ ‘stderr’ when used alongside the ‘--output-file’ option.
+
+‘-N’
+‘--timestamping’
+ Turn on time-stamping. *Note Time-Stamping::, for details.
+
+‘--no-if-modified-since’
+ Do not send the If-Modified-Since header in ‘-N’ mode. Send a
+ preliminary HEAD request instead. This only has an effect in ‘-N’
+ mode.
+
+‘--no-use-server-timestamps’
+ Don’t set the local file’s timestamp by the one on the server.
+
+ By default, when a file is downloaded, its timestamps are set to
+ match those from the remote file. This allows the use of
+ ‘--timestamping’ on subsequent invocations of wget. However, it is
+ sometimes useful to base the local file’s timestamp on when it was
+ actually downloaded; for that purpose, the
+ ‘--no-use-server-timestamps’ option has been provided.
+
+‘-S’
+‘--server-response’
+ Print the headers sent by HTTP servers and responses sent by FTP
+ servers.
+
+‘--spider’
+ When invoked with this option, Wget will behave as a Web “spider”,
+ which means that it will not download the pages, just check that
+ they are there. For example, you can use Wget to check your
+ bookmarks:
+
+ wget --spider --force-html -i bookmarks.html
+
+ This feature needs much more work for Wget to get close to the
+ functionality of real web spiders.
+
+‘-T SECONDS’
+‘--timeout=SECONDS’
+ Set the network timeout to SECONDS seconds. This is equivalent to
+ specifying ‘--dns-timeout’, ‘--connect-timeout’, and
+ ‘--read-timeout’, all at the same time.
+
+ When interacting with the network, Wget can check for timeout and
+ abort the operation if it takes too long. This prevents anomalies
+ like hanging reads and infinite connects. The only timeout enabled
+ by default is a 900-second read timeout. Setting a timeout to 0
+ disables it altogether. Unless you know what you are doing, it is
+ best not to change the default timeout settings.
+
+ All timeout-related options accept decimal values, as well as
+ subsecond values. For example, ‘0.1’ seconds is a legal (though
+ unwise) choice of timeout. Subsecond timeouts are useful for
+ checking server response times or for testing network latency.
+
+‘--dns-timeout=SECONDS’
+ Set the DNS lookup timeout to SECONDS seconds. DNS lookups that
+ don’t complete within the specified time will fail. By default,
+ there is no timeout on DNS lookups, other than that implemented by
+ system libraries.
+
+‘--connect-timeout=SECONDS’
+ Set the connect timeout to SECONDS seconds. TCP connections that
+ take longer to establish will be aborted. By default, there is no
+ connect timeout, other than that implemented by system libraries.
+
+‘--read-timeout=SECONDS’
+ Set the read (and write) timeout to SECONDS seconds. The “time” of
+ this timeout refers to “idle time”: if, at any point in the
+ download, no data is received for more than the specified number of
+ seconds, reading fails and the download is restarted. This option
+ does not directly affect the duration of the entire download.
+
+ Of course, the remote server may choose to terminate the connection
+ sooner than this option requires. The default read timeout is 900
+ seconds.
+
+‘--limit-rate=AMOUNT’
+ Limit the download speed to AMOUNT bytes per second. AMOUNT may be
+ expressed in bytes, kilobytes with the ‘k’ suffix, or megabytes
+ with the ‘m’ suffix. For example, ‘--limit-rate=20k’ will limit
+ the retrieval rate to 20KB/s. This is useful when, for whatever
+ reason, you don’t want Wget to consume the entire available
+ bandwidth.
+
+ This option allows the use of decimal numbers, usually in
+ conjunction with power suffixes; for example, ‘--limit-rate=2.5k’
+ is a legal value.
+
+ Note that Wget implements the limiting by sleeping the appropriate
+ amount of time after a network read that took less time than
+ specified by the rate. Eventually this strategy causes the TCP
+ transfer to slow down to approximately the specified rate.
+ However, it may take some time for this balance to be achieved, so
+ don’t be surprised if limiting the rate doesn’t work well with very
+ small files.
+
+‘-w SECONDS’
+‘--wait=SECONDS’
+ Wait the specified number of seconds between the retrievals. Use
+ of this option is recommended, as it lightens the server load by
+ making the requests less frequent. Instead of seconds, the time
+ can be specified in minutes using the ‘m’ suffix, in hours using
+ the ‘h’ suffix, or in days using the ‘d’ suffix.
+
+ Specifying a large value for this option is useful if the network
+ or the destination host is down, so that Wget can wait long enough
+ to reasonably expect the network error to be fixed before the
+ retry. The waiting interval specified by this option is
+ influenced by ‘--random-wait’ (see below).
+
+‘--waitretry=SECONDS’
+ If you don’t want Wget to wait between _every_ retrieval, but only
+ between retries of failed downloads, you can use this option. Wget
+ will use “linear backoff”, waiting 1 second after the first failure
+ on a given file, then waiting 2 seconds after the second failure on
+ that file, up to the maximum number of SECONDS you specify.
+
+ By default, Wget will assume a value of 10 seconds.
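+
+ As a rough illustration of the linear backoff described above, the
+ following shell sketch prints the wait before each retry. The
+ function name ‘backoff_schedule’ and its two arguments are
+ hypothetical; this is not Wget’s own code.
+
```shell
# Hypothetical helper, not part of Wget.
# usage: backoff_schedule MAX_SECONDS RETRIES
# Prints the wait (in seconds) before each retry: 1, 2, 3, ...
# capped at MAX_SECONDS, as the --waitretry description states.
backoff_schedule() {
    max=$1
    retries=$2
    attempt=1
    while [ "$attempt" -le "$retries" ]; do
        if [ "$attempt" -lt "$max" ]; then
            wait_s=$attempt
        else
            wait_s=$max
        fi
        printf '%s\n' "$wait_s"
        attempt=$((attempt + 1))
    done
}
```
+
+ For example, ‘backoff_schedule 3 5’ prints 1, 2, 3, 3 and 3, one
+ value per line.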
+
+‘--random-wait’
+ Some web sites may perform log analysis to identify retrieval
+ programs such as Wget by looking for statistically significant
+ similarities in the time between requests. This option causes the
+ time between requests to vary between 0.5 * WAIT and 1.5 * WAIT
+ seconds, where WAIT was specified using the ‘--wait’ option, in
+ order to mask Wget’s presence from such analysis.
+
+ A 2001 article in a publication devoted to development on a popular
+ consumer platform provided code to perform this analysis on the
+ fly. Its author suggested blocking at the class C address level to
+ ensure automated retrieval programs were blocked despite changing
+ DHCP-supplied addresses.
+
+ The ‘--random-wait’ option was inspired by this ill-advised
+ recommendation to block many unrelated users from a web site due to
+ the actions of one.
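+
+ To make the interval concrete, this small sketch computes the
+ bounds between which the wait varies for a given ‘--wait’ value.
+ The function name ‘random_wait_range’ is hypothetical, and the
+ sketch only evaluates the 0.5 and 1.5 multipliers stated above; it
+ does not reproduce how Wget picks a value inside the range.
+
```shell
# Hypothetical helper, not part of Wget.
# usage: random_wait_range WAIT_SECONDS
# Prints the lower and upper bound of the randomized wait.
random_wait_range() {
    # LC_ALL=C keeps "." as the decimal separator regardless of locale.
    LC_ALL=C awk -v w="$1" 'BEGIN { printf "%.1f %.1f\n", 0.5 * w, 1.5 * w }'
}
```
+
+ With ‘--wait=10’, ‘random_wait_range 10’ prints ‘5.0 15.0’.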
+
+‘--no-proxy’
+ Don’t use proxies, even if the appropriate ‘*_proxy’ environment
+ variable is defined.
+
+ *Note Proxies::, for more information about the use of proxies with
+ Wget.
+
+‘-Q QUOTA’
+‘--quota=QUOTA’
+ Specify download quota for automatic retrievals. The value can be
+ specified in bytes (default), kilobytes (with ‘k’ suffix), or
+ megabytes (with ‘m’ suffix).
+
+ Note that quota will never affect downloading a single file. So if
+ you specify ‘wget -Q10k https://example.com/ls-lR.gz’, all of the
+ ‘ls-lR.gz’ will be downloaded. The same goes even when several
+ URLs are specified on the command-line. However, quota is
+ respected when retrieving either recursively, or from an input
+ file. Thus you may safely type ‘wget -Q2m -i sites’—download will
+ be aborted when the quota is exceeded.
+
+ Setting quota to 0 or to ‘inf’ removes the download quota.
+
+‘--no-dns-cache’
+ Turn off caching of DNS lookups. Normally, Wget remembers the IP
+ addresses it looked up from DNS so it doesn’t have to repeatedly
+ contact the DNS server for the same (typically small) set of hosts
+ it retrieves from. This cache exists in memory only; a new Wget
+ run will contact DNS again.
+
+ However, it has been reported that in some situations it is not
+ desirable to cache host names, even for the duration of a
+ short-running application like Wget. With this option Wget issues
+ a new DNS lookup (more precisely, a new call to ‘gethostbyname’ or
+ ‘getaddrinfo’) each time it makes a new connection. Please note
+ that this option will _not_ affect caching that might be performed
+ by the resolving library or by an external caching layer, such as
+ NSCD.
+
+ If you don’t understand exactly what this option does, you probably
+ won’t need it.
+
+‘--restrict-file-names=MODES’
+ Change which characters found in remote URLs must be escaped during
+ generation of local filenames. Characters that are “restricted” by
+ this option are escaped, i.e. replaced with ‘%HH’, where ‘HH’ is
+ the hexadecimal number that corresponds to the restricted
+ character. This option may also be used to force all alphabetical
+ cases to be either lower- or uppercase.
+
+ By default, Wget escapes the characters that are not valid or safe
+ as part of file names on your operating system, as well as control
+ characters that are typically unprintable. This option is useful
+ for changing these defaults, perhaps because you are downloading to
+ a non-native partition, or because you want to disable escaping of
+ the control characters, or you want to further restrict characters
+ to only those in the ASCII range of values.
+
+ The MODES are a comma-separated set of text values. The acceptable
+ values are ‘unix’, ‘windows’, ‘nocontrol’, ‘ascii’, ‘lowercase’,
+ and ‘uppercase’. The values ‘unix’ and ‘windows’ are mutually
+ exclusive (one will override the other), as are ‘lowercase’ and
+ ‘uppercase’. Those last are special cases, as they do not change
+ the set of characters that would be escaped, but rather force local
+ file paths to be converted either to lower- or uppercase.
+
+ When “unix” is specified, Wget escapes the character ‘/’ and the
+ control characters in the ranges 0–31 and 128–159. This is the
+ default on Unix-like operating systems.
+
+ When “windows” is given, Wget escapes the characters ‘\’, ‘|’, ‘/’,
+ ‘:’, ‘?’, ‘"’, ‘*’, ‘<’, ‘>’, and the control characters in the
+ ranges 0–31 and 128–159. In addition to this, Wget in Windows mode
+ uses ‘+’ instead of ‘:’ to separate host and port in local file
+ names, and uses ‘@’ instead of ‘?’ to separate the query portion of
+ the file name from the rest. Therefore, a URL that would be saved
+ as ‘www.xemacs.org:4300/search.pl?input=blah’ in Unix mode would be
+ saved as ‘www.xemacs.org+4300/search.pl@input=blah’ in Windows
+ mode. This mode is the default on Windows.
+
+ If you specify ‘nocontrol’, then the escaping of the control
+ characters is also switched off. This option may make sense when
+ you are downloading URLs whose names contain UTF-8 characters, on a
+ system which can save and display filenames in UTF-8 (some possible
+ byte values used in UTF-8 byte sequences fall in the range of
+ values designated by Wget as “controls”).
+
+ The ‘ascii’ mode is used to specify that any bytes whose values are
+ outside the range of ASCII characters (that is, greater than 127)
+ shall be escaped. This can be useful when saving filenames whose
+ encoding does not match the one used locally.
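+
+ The separator substitution in the Windows-mode example above can be
+ sketched with ‘sed’. This is a simplification under stated
+ assumptions: the function name ‘windows_mode_name’ is hypothetical,
+ only the first ‘:’ and the first ‘?’ are rewritten, and the ‘%HH’
+ escaping of the other restricted characters is not modelled.
+
```shell
# Hypothetical helper, not part of Wget.
# usage: windows_mode_name UNIX_MODE_NAME
# Rewrites the host/port separator ":" to "+" and the query
# separator "?" to "@", as described for Windows mode.
windows_mode_name() {
    printf '%s\n' "$1" | sed -e 's/:/+/' -e 's/?/@/'
}
```
+
+ ‘windows_mode_name 'www.xemacs.org:4300/search.pl?input=blah'’
+ prints ‘www.xemacs.org+4300/search.pl@input=blah’.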
+
+‘-4’
+‘--inet4-only’
+‘-6’
+‘--inet6-only’
+ Force connecting to IPv4 or IPv6 addresses. With ‘--inet4-only’ or
+ ‘-4’, Wget will only connect to IPv4 hosts, ignoring AAAA records
+ in DNS, and refusing to connect to IPv6 addresses specified in
+ URLs. Conversely, with ‘--inet6-only’ or ‘-6’, Wget will only
+ connect to IPv6 hosts and ignore A records and IPv4 addresses.
+
+ Neither option should be needed normally. By default, an
+ IPv6-aware Wget will use the address family specified by the host’s
+ DNS record. If the DNS responds with both IPv4 and IPv6 addresses,
+ Wget will try them in sequence until it finds one it can connect
+ to. (Also see ‘--prefer-family’ option described below.)
+
+ These options can be used to deliberately force the use of IPv4 or
+ IPv6 address families on dual family systems, usually to aid
+ debugging or to deal with broken network configuration. Only one
+ of ‘--inet6-only’ and ‘--inet4-only’ may be specified at the same
+ time. Neither option is available in Wget compiled without IPv6
+ support.
+
+‘--prefer-family=none/IPv4/IPv6’
+ When given a choice of several addresses, connect to the addresses
+ with specified address family first. The address order returned by
+ DNS is used without change by default.
+
+ This avoids spurious errors and connect attempts when accessing
+ hosts that resolve to both IPv6 and IPv4 addresses from IPv4
+ networks. For example, ‘www.kame.net’ resolves to
+ ‘2001:200:0:8002:203:47ff:fea5:3085’ and to ‘203.178.141.194’.
+ When the preferred family is ‘IPv4’, the IPv4 address is used
+ first; when the preferred family is ‘IPv6’, the IPv6 address is
+ used first; if the specified value is ‘none’, the address order
+ returned by DNS is used without change.
+
+ Unlike ‘-4’ and ‘-6’, this option doesn’t inhibit access to any
+ address family, it only changes the _order_ in which the addresses
+ are accessed. Also note that the reordering performed by this
+ option is “stable”—it doesn’t affect order of addresses of the same
+ family. That is, the relative order of all IPv4 addresses and of
+ all IPv6 addresses remains intact in all cases.
+
+‘--retry-connrefused’
+ Consider “connection refused” a transient error and try again.
+ Normally Wget gives up on a URL when it is unable to connect to the
+ site because failure to connect is taken as a sign that the server
+ is not running at all and that retries would not help. This option
+ is for mirroring unreliable sites whose servers tend to disappear
+ for short periods of time.
+
+‘--user=USER’
+‘--password=PASSWORD’
+ Specify the username USER and password PASSWORD for both FTP and
+ HTTP file retrieval. These parameters can be overridden using the
+ ‘--ftp-user’ and ‘--ftp-password’ options for FTP connections and
+ the ‘--http-user’ and ‘--http-password’ options for HTTP
+ connections.
+
+‘--ask-password’
+ Prompt for a password for each connection established. Cannot be
+ specified when ‘--password’ is being used, because they are
+ mutually exclusive.
+
+‘--use-askpass=COMMAND’
+ Prompt for a user and password using the specified command. If no
+ command is specified then the command in the environment variable
+ WGET_ASKPASS is used. If WGET_ASKPASS is not set then the command
+ in the environment variable SSH_ASKPASS is used.
+
+ You can set the default command for use-askpass in the ‘.wgetrc’.
+ That setting may be overridden from the command line.
+
+‘--no-iri’
+
+ Turn off internationalized URI (IRI) support. Use ‘--iri’ to turn
+ it on. IRI support is activated by default.
+
+ You can set the default state of IRI support using the ‘iri’
+ command in ‘.wgetrc’. That setting may be overridden from the
+ command line.
+
+‘--local-encoding=ENCODING’
+
+ Force Wget to use ENCODING as the default system encoding. That
+ affects how Wget converts URLs specified as arguments from locale
+ to UTF-8 for IRI support.
+
+ Wget uses the function ‘nl_langinfo()’ and then the ‘CHARSET’
+ environment variable to get the locale. If it fails, ASCII is
+ used.
+
+ You can set the default local encoding using the ‘local_encoding’
+ command in ‘.wgetrc’. That setting may be overridden from the
+ command line.
+
+‘--remote-encoding=ENCODING’
+
+ Force Wget to use ENCODING as the default remote server encoding.
+ That affects how Wget converts URIs found in files from remote
+ encoding to UTF-8 during a recursive fetch. This option is only
+ useful for IRI support, for the interpretation of non-ASCII
+ characters.
+
+ For HTTP, remote encoding can be found in HTTP ‘Content-Type’
+ header and in HTML ‘Content-Type http-equiv’ meta tag.
+
+ You can set the default encoding using the ‘remoteencoding’ command
+ in ‘.wgetrc’. That setting may be overridden from the command
+ line.
+
+‘--unlink’
+
+ Force Wget to unlink an existing file instead of clobbering it.
+ This option is useful when downloading into a directory that
+ contains hard links.
+
+
+File: wget.info, Node: Directory Options, Next: HTTP Options, Prev: Download Options, Up: Invoking
+
+2.6 Directory Options
+=====================
+
+‘-nd’
+‘--no-directories’
+ Do not create a hierarchy of directories when retrieving
+ recursively. With this option turned on, all files will get saved
+ to the current directory, without clobbering (if a name shows up
+ more than once, the filenames will get extensions ‘.n’).
+
+‘-x’
+‘--force-directories’
+ The opposite of ‘-nd’—create a hierarchy of directories, even if
+ one would not have been created otherwise. E.g. ‘wget -x
+ http://fly.srk.fer.hr/robots.txt’ will save the downloaded file to
+ ‘fly.srk.fer.hr/robots.txt’.
+
+‘-nH’
+‘--no-host-directories’
+ Disable generation of host-prefixed directories. By default,
+ invoking Wget with ‘-r http://fly.srk.fer.hr/’ will create a
+ structure of directories beginning with ‘fly.srk.fer.hr/’. This
+ option disables such behavior.
+
+‘--protocol-directories’
+ Use the protocol name as a directory component of local file names.
+ For example, with this option, ‘wget -r http://HOST’ will save to
+ ‘http/HOST/...’ rather than just to ‘HOST/...’.
+
+‘--cut-dirs=NUMBER’
+ Ignore NUMBER directory components. This is useful for getting a
+ fine-grained control over the directory where recursive retrieval
+ will be saved.
+
+ Take, for example, the directory at
+ ‘ftp://ftp.xemacs.org/pub/xemacs/’. If you retrieve it with ‘-r’,
+ it will be saved locally under ‘ftp.xemacs.org/pub/xemacs/’. While
+ the ‘-nH’ option can remove the ‘ftp.xemacs.org/’ part, you are
+ still stuck with ‘pub/xemacs’. This is where ‘--cut-dirs’ comes in
+ handy; it makes Wget not “see” NUMBER remote directory components.
+ Here are several examples of how ‘--cut-dirs’ option works.
+
+ No options -> ftp.xemacs.org/pub/xemacs/
+ -nH -> pub/xemacs/
+ -nH --cut-dirs=1 -> xemacs/
+ -nH --cut-dirs=2 -> .
+
+ --cut-dirs=1 -> ftp.xemacs.org/xemacs/
+ ...
+
+ If you just want to get rid of the directory structure, this option
+ is similar to a combination of ‘-nd’ and ‘-P’. However, unlike
+ ‘-nd’, ‘--cut-dirs’ does not flatten subdirectories: for instance,
+ with ‘-nH --cut-dirs=1’, a ‘beta/’ subdirectory will be placed in
+ ‘xemacs/beta’, as one would expect.
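+
+ The mapping in the table above can be reproduced with a short shell
+ sketch. The function name ‘local_dir’ and its argument order are
+ hypothetical; it only mimics the documented effect of ‘-nH’ and
+ ‘--cut-dirs’ on the directory name and is not Wget’s own logic.
+
```shell
# Hypothetical helper, not part of Wget.
# usage: local_dir HOST REMOTE_DIR CUT [nH]
# Prints the local directory a recursive retrieval would be saved
# under, following the --cut-dirs table.
local_dir() {
    host=$1
    path=${2#/}      # drop a leading slash
    path=${path%/}   # drop a trailing slash
    cut=$3
    no_host=${4:-}
    # Remove CUT leading directory components from the remote path.
    while [ "$cut" -gt 0 ] && [ -n "$path" ]; do
        case $path in
            */*) path=${path#*/} ;;
            *)   path= ;;
        esac
        cut=$((cut - 1))
    done
    if [ "$no_host" = "nH" ]; then
        result=$path
    else
        result=$host${path:+/$path}
    fi
    if [ -n "$result" ]; then
        printf '%s/\n' "$result"
    else
        printf '.\n'
    fi
}
```
+
+ For instance, ‘local_dir ftp.xemacs.org /pub/xemacs/ 1 nH’ prints
+ ‘xemacs/’, and the same call with a cut of 2 prints ‘.’.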
+
+‘-P PREFIX’
+‘--directory-prefix=PREFIX’
+ Set directory prefix to PREFIX. The “directory prefix” is the
+ directory where all other files and subdirectories will be saved
+ to, i.e. the top of the retrieval tree. The default is ‘.’ (the
+ current directory).
+
+
+File: wget.info, Node: HTTP Options, Next: HTTPS (SSL/TLS) Options, Prev: Directory Options, Up: Invoking
+
+2.7 HTTP Options
+================
+
+‘--default-page=NAME’
+ Use NAME as the default file name when it isn’t known (i.e., for
+ URLs that end in a slash), instead of ‘index.html’.
+
+‘-E’
+‘--adjust-extension’
+ If a file of type ‘application/xhtml+xml’ or ‘text/html’ is
+ downloaded and the URL does not end with the regexp
+ ‘\.[Hh][Tt][Mm][Ll]?’, this option will cause the suffix ‘.html’ to
+ be appended to the local filename. This is useful, for instance,
+ when you’re mirroring a remote site that uses ‘.asp’ pages, but you
+ want the mirrored pages to be viewable on your stock Apache server.
+ Another good use for this is when you’re downloading CGI-generated
+ materials. A URL like ‘http://site.com/article.cgi?25’ will be
+ saved as ‘article.cgi?25.html’.
+
+ Note that filenames changed in this way will be re-downloaded every
+ time you re-mirror a site, because Wget can’t tell that the local
+ ‘X.html’ file corresponds to remote URL ‘X’ (since it doesn’t yet
+ know that the URL produces output of type ‘text/html’ or
+ ‘application/xhtml+xml’).
+
+ As of version 1.12, Wget will also ensure that any downloaded files
+ of type ‘text/css’ end in the suffix ‘.css’, and the option was
+ renamed from ‘--html-extension’, to better reflect its new
+ behavior. The old option name is still acceptable, but should now
+ be considered deprecated.
+
+ As of version 1.19.2, Wget will also ensure that any downloaded
+ files with a ‘Content-Encoding’ of ‘br’, ‘compress’, ‘deflate’ or
+ ‘gzip’ end in the suffix ‘.br’, ‘.Z’, ‘.zlib’ and ‘.gz’
+ respectively.
+
+ At some point in the future, this option may well be expanded to
+ include suffixes for other types of content, including content
+ types that are not parsed by Wget.
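+
+ The HTML rule described at the start of this entry can be sketched
+ as follows. The function name ‘adjusted_name’ is hypothetical, and
+ the sketch covers only the ‘.html’ case, not the ‘.css’ or
+ ‘Content-Encoding’ suffixes mentioned above.
+
```shell
# Hypothetical helper, not part of Wget.
# usage: adjusted_name LOCAL_NAME CONTENT_TYPE
# Appends ".html" when the content is HTML or XHTML and the name
# does not already end in .htm or .html (case-insensitive).
adjusted_name() {
    name=$1
    ctype=$2
    case $ctype in
        text/html|application/xhtml+xml)
            if ! printf '%s\n' "$name" | grep -Eq '\.[Hh][Tt][Mm][Ll]?$'; then
                name=$name.html
            fi
            ;;
    esac
    printf '%s\n' "$name"
}
```
+
+ ‘adjusted_name 'article.cgi?25' text/html’ prints
+ ‘article.cgi?25.html’, while ‘adjusted_name index.html text/html’
+ leaves the name unchanged.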
+
+‘--http-user=USER’
+‘--http-password=PASSWORD’
+ Specify the username USER and password PASSWORD on an HTTP server.
+ According to the type of the challenge, Wget will encode them using
+ either the ‘basic’ (insecure), the ‘digest’, or the Windows ‘NTLM’
+ authentication scheme.
+
+ Another way to specify username and password is in the URL itself
+ (*note URL Format::). Either method reveals your password to
+ anyone who bothers to run ‘ps’. To prevent the passwords from
+ being seen, use ‘--use-askpass’ or store them in ‘.wgetrc’ or
+ ‘.netrc’, and make sure to protect those files from other users
+ with ‘chmod’. If the passwords are really important, do not leave
+ them lying in those files either—edit the files and delete them
+ after Wget has started the download.
+
+‘--no-http-keep-alive’
+ Turn off the “keep-alive” feature for HTTP downloads. Normally,
+ Wget asks the server to keep the connection open so that, when you
+ download more than one document from the same server, they get
+ transferred over the same TCP connection. This saves time and at
+ the same time reduces the load on the server.
+
+ This option is useful when, for some reason, persistent
+ (keep-alive) connections don’t work for you, for example due to a
+ server bug or due to the inability of server-side scripts to cope
+ with the connections.
+
+‘--no-cache’
+ Disable server-side cache. In this case, Wget will send the remote
+ server an appropriate directive (‘Pragma: no-cache’) to get the
+ file from the remote service, rather than returning the cached
+ version. This is especially useful for retrieving and flushing
+ out-of-date documents on proxy servers.
+
+ Caching is allowed by default.
+
+‘--no-cookies’
+ Disable the use of cookies. Cookies are a mechanism for
+ maintaining server-side state. The server sends the client a
+ cookie using the ‘Set-Cookie’ header, and the client responds with
+ the same cookie upon further requests. Since cookies allow the
+ server owners to keep track of visitors and for sites to exchange
+ this information, some consider them a breach of privacy. The
+ default is to use cookies; however, _storing_ cookies is not on by
+ default.
+
+‘--load-cookies FILE’
+ Load cookies from FILE before the first HTTP retrieval. FILE is a
+ textual file in the format originally used by Netscape’s
+ ‘cookies.txt’ file.
+
+ You will typically use this option when mirroring sites that
+ require that you be logged in to access some or all of their
+ content. The login process typically works by the web server
+ issuing an HTTP cookie upon receiving and verifying your
+ credentials. The cookie is then resent by the browser when
+ accessing that part of the site, and so proves your identity.
+
+ Mirroring such a site requires Wget to send the same cookies your
+ browser sends when communicating with the site. This is achieved
+ by ‘--load-cookies’—simply point Wget to the location of the
+ ‘cookies.txt’ file, and it will send the same cookies your browser
+ would send in the same situation. Different browsers keep textual
+ cookie files in different locations:
+
+ Netscape 4.x.
+ The cookies are in ‘~/.netscape/cookies.txt’.
+
+ Mozilla and Netscape 6.x.
+ Mozilla’s cookie file is also named ‘cookies.txt’, located
+ somewhere under ‘~/.mozilla’, in the directory of your
+ profile. The full path usually ends up looking somewhat like
+ ‘~/.mozilla/default/SOME-WEIRD-STRING/cookies.txt’.
+
+ Internet Explorer.
+ You can produce a cookie file Wget can use by using the File
+ menu, Import and Export, Export Cookies. This has been tested
+ with Internet Explorer 5; it is not guaranteed to work with
+ earlier versions.
+
+ Other browsers.
+ If you are using a different browser to create your cookies,
+ ‘--load-cookies’ will only work if you can locate or produce a
+ cookie file in the Netscape format that Wget expects.
+
+ If you cannot use ‘--load-cookies’, there might still be an
+ alternative. If your browser supports a “cookie manager”, you can
+ use it to view the cookies used when accessing the site you’re
+ mirroring. Write down the name and value of the cookie, and
+ manually instruct Wget to send those cookies, bypassing the
+ “official” cookie support:
+
+ wget --no-cookies --header "Cookie: NAME=VALUE"
+
+‘--save-cookies FILE’
+ Save cookies to FILE before exiting. This will not save cookies
+ that have expired or that have no expiry time (so-called “session
+ cookies”), but also see ‘--keep-session-cookies’.
+
+‘--keep-session-cookies’
+ When specified, causes ‘--save-cookies’ to also save session
+ cookies. Session cookies are normally not saved because they are
+ meant to be kept in memory and forgotten when you exit the browser.
+ Saving them is useful on sites that require you to log in or to
+ visit the home page before you can access some pages. With this
+ option, multiple Wget runs are considered a single browser session
+ as far as the site is concerned.
+
+ Since the cookie file format does not normally carry session
+ cookies, Wget marks them with an expiry timestamp of 0. Wget’s
+ ‘--load-cookies’ recognizes those as session cookies, but it might
+ confuse other browsers. Also note that cookies so loaded will be
+ treated as other session cookies, which means that if you want
+ ‘--save-cookies’ to preserve them again, you must use
+ ‘--keep-session-cookies’ again.
+
+‘--ignore-length’
+ Unfortunately, some HTTP servers (CGI programs, to be more precise)
+ send out bogus ‘Content-Length’ headers, which makes Wget go wild,
+ as it thinks not all the document was retrieved. You can spot this
+ syndrome if Wget retries getting the same document again and again,
+ each time claiming that the (otherwise normal) connection has
+ closed on the very same byte.
+
+ With this option, Wget will ignore the ‘Content-Length’ header—as
+ if it never existed.
+
+‘--header=HEADER-LINE’
+ Send HEADER-LINE along with the rest of the headers in each HTTP
+ request. The supplied header is sent as-is, which means it must
+ contain name and value separated by colon, and must not contain
+ newlines.
+
+ You may define more than one additional header by specifying
+ ‘--header’ more than once.
+
+ wget --header='Accept-Charset: iso-8859-2' \
+ --header='Accept-Language: hr' \
+ http://fly.srk.fer.hr/
+
+ Specification of an empty string as the header value will clear all
+ previous user-defined headers.
+
+ As of Wget 1.10, this option can be used to override headers
+ otherwise generated automatically. This example instructs Wget to
+ connect to localhost, but to specify ‘foo.bar’ in the ‘Host’
+ header:
+
+ wget --header="Host: foo.bar" http://localhost/
+
+ In versions of Wget prior to 1.10 such use of ‘--header’ caused
+ sending of duplicate headers.
+
+‘--compression=TYPE’
+ Choose the type of compression to be used. Legal values are
+ ‘auto’, ‘gzip’ and ‘none’.
+
+ If ‘auto’ or ‘gzip’ are specified, Wget asks the server to compress
+ the file using the gzip compression format. If the server
+ compresses the file and responds with the ‘Content-Encoding’ header
+ field set appropriately, the file will be decompressed
+ automatically.
+
+ If ‘none’ is specified, Wget will not ask the server to compress
+ the file and will not decompress any server responses. This is the
+ default.
+
+ Compression support is currently experimental. In case it is
+ turned on, please report any bugs to ‘bug-wget@gnu.org’.
+
+‘--max-redirect=NUMBER’
+ Specifies the maximum number of redirections to follow for a
+ resource. The default is 20, which is usually far more than
+ necessary. However, on those occasions where you want to allow
+ more (or fewer), this is the option to use.
+
+‘--proxy-user=USER’
+‘--proxy-password=PASSWORD’
+ Specify the username USER and password PASSWORD for authentication
+ on a proxy server. Wget will encode them using the ‘basic’
+ authentication scheme.
+
+ Security considerations similar to those with ‘--http-password’
+ pertain here as well.
+
+‘--referer=URL’
+ Include ‘Referer: URL’ header in HTTP request. Useful for
+ retrieving documents with server-side processing that assume they
+ are always being retrieved by interactive web browsers and only
+ come out properly when Referer is set to one of the pages that
+ point to them.
+
+‘--save-headers’
+ Save the headers sent by the HTTP server to the file, preceding the
+ actual contents, with an empty line as the separator.
+
+‘-U AGENT-STRING’
+‘--user-agent=AGENT-STRING’
+ Identify as AGENT-STRING to the HTTP server.
+
+ The HTTP protocol allows the clients to identify themselves using a
+ ‘User-Agent’ header field. This enables distinguishing the WWW
+ software, usually for statistical purposes or for tracing of
+ protocol violations. Wget normally identifies as ‘Wget/VERSION’,
+ VERSION being the current version number of Wget.
+
+ However, some sites have been known to impose the policy of
+ tailoring the output according to the ‘User-Agent’-supplied
+ information. While this is not such a bad idea in theory, it has
+ been abused by servers denying information to clients other than
+ (historically) Netscape or, more frequently, Microsoft Internet
+ Explorer. This option allows you to change the ‘User-Agent’ line
+ issued by Wget. Use of this option is discouraged, unless you
+ really know what you are doing.
+
+ Specifying empty user agent with ‘--user-agent=""’ instructs Wget
+ not to send the ‘User-Agent’ header in HTTP requests.
+
+‘--post-data=STRING’
+‘--post-file=FILE’
+ Use POST as the method for all HTTP requests and send the specified
+ data in the request body. ‘--post-data’ sends STRING as data,
+ whereas ‘--post-file’ sends the contents of FILE. Other than that,
+ they work in exactly the same way. In particular, they _both_
+ expect content of the form ‘key1=value1&key2=value2’, with
+ percent-encoding for special characters; the only difference is
+ that one expects its content as a command-line parameter and the
+ other accepts its content from a file. In particular,
+ ‘--post-file’ is _not_ for transmitting files as form attachments:
+ those must appear as ‘key=value’ data (with appropriate
+ percent-coding) just like everything else. Wget does not currently
+ support ‘multipart/form-data’ for transmitting POST data; only
+ ‘application/x-www-form-urlencoded’. Only one of ‘--post-data’ and
+ ‘--post-file’ should be specified.
+
+ Please note that Wget does not require the content to be of the
+ form ‘key1=value1&key2=value2’, and neither does it test for it.
+ Wget will simply transmit whatever data is provided to it. Most
+ servers, however, expect the POST data to be in the above format
+ when processing HTML Forms.
+
+ When sending a POST request using the ‘--post-file’ option, Wget
+ treats the file as a binary file and will send every character in
+ the POST request without stripping trailing newline or formfeed
+ characters. Any other control characters in the text will also be
+ sent as-is in the POST request.
+
+ Please be aware that Wget needs to know the size of the POST data
+ in advance. Therefore the argument to ‘--post-file’ must be a
+ regular file; specifying a FIFO or something like ‘/dev/stdin’
+ won’t work. It’s not quite clear how to work around this
+ limitation inherent in HTTP/1.0. Although HTTP/1.1 introduces
+ “chunked” transfer that doesn’t require knowing the request length
+ in advance, a client can’t use chunked unless it knows it’s talking
+ to an HTTP/1.1 server. And it can’t know that until it receives a
+ response, which in turn requires the request to have been completed
+ – a chicken-and-egg problem.
+
+ Note: As of version 1.15, if Wget is redirected after the POST
+ request is completed, its behaviour will depend on the response
+ code returned by the server. In case of a 301 Moved Permanently,
+ 302 Moved Temporarily or 307 Temporary Redirect, Wget will, in
+ accordance with RFC2616, continue to send a POST request. In case
+ a server wants the client to change the Request method upon
+ redirection, it should send a 303 See Other response code.
+
+ This example shows how to log in to a server using POST and then
+ proceed to download the desired pages, presumably only accessible
+ to authorized users:
+
+ # Log in to the server. This can be done only once.
+ wget --save-cookies cookies.txt \
+ --post-data 'user=foo&password=bar' \
+ http://example.com/auth.php
+
+ # Now grab the page or pages we care about.
+ wget --load-cookies cookies.txt \
+ -p http://example.com/interesting/article.php
+
+ If the server is using session cookies to track user
+ authentication, the above will not work because ‘--save-cookies’
+ will not save them (and neither will browsers) and the
+ ‘cookies.txt’ file will be empty. In that case use
+ ‘--keep-session-cookies’ along with ‘--save-cookies’ to force
+ saving of session cookies.
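+
+ A sketch of the same login flow with session cookies kept, wrapped
+ in a hypothetical shell function ‘login_and_fetch’. The URLs and
+ credentials are the placeholders from the example above, not real
+ endpoints.
+
```shell
# Hypothetical wrapper around the login example above.
# The first call saves all cookies, including session cookies;
# the second call sends them back when fetching the page.
login_and_fetch() {
    wget --save-cookies cookies.txt --keep-session-cookies \
         --post-data 'user=foo&password=bar' \
         http://example.com/auth.php &&
    wget --load-cookies cookies.txt \
         -p http://example.com/interesting/article.php
}
```
+
+ The second command runs only if the login request succeeds.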
+
+‘--method=HTTP-METHOD’
+ For the purpose of RESTful scripting, Wget allows sending of other
+ HTTP Methods without the need to explicitly set them using
+ ‘--header=HEADER-LINE’. Wget will use whatever string is passed to
+ it after ‘--method’ as the HTTP Method to the server.
+
+‘--body-data=DATA-STRING’
+‘--body-file=DATA-FILE’
+ Must be set when additional data needs to be sent to the server
+ along with the Method specified using ‘--method’. ‘--body-data’
+ sends DATA-STRING as data, whereas ‘--body-file’ sends the contents
+ of DATA-FILE. Other than that, they work in exactly the same way.
+
+ Currently, ‘--body-file’ is _not_ for transmitting files as a
+ whole. Wget does not currently support ‘multipart/form-data’ for
+ transmitting data; only ‘application/x-www-form-urlencoded’. In
+ the future, this may be changed so that Wget sends the
+ ‘--body-file’ as a complete file instead of sending its contents to
+ the server. Please be aware that Wget needs to know the contents
+ of the body data in advance, and hence the argument to ‘--body-file’
+ should be a regular file. See ‘--post-file’ for a more detailed
+ explanation. Only one of ‘--body-data’ and ‘--body-file’ should be
+ specified.
+
+ If Wget is redirected after the request is completed, Wget will
+ suspend the current method and send a GET request till the
+ redirection is completed. This is true for all redirection
+ response codes except 307 Temporary Redirect which is used to
+ explicitly specify that the request method should _not_ change.
+ Another exception is when the method is set to ‘POST’, in which
+ case the redirection rules specified under ‘--post-data’ are
+ followed.
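+
+ As a minimal sketch of ‘--method’ combined with ‘--body-data’, the
+ hypothetical function ‘put_form’ below sends a PUT request with
+ form-encoded data and writes the response to standard output. The
+ function name and the URL in the usage note are illustrative
+ assumptions.
+
```shell
# Hypothetical wrapper, not part of Wget.
# usage: put_form URL DATA
# Sends DATA (expected as key=value pairs) in the body of a PUT
# request and prints the server's response body.
put_form() {
    url=$1
    data=$2
    wget --method=PUT \
         --body-data="$data" \
         -O - "$url"
}
```
+
+ It could be called as ‘put_form http://example.com/items/1
+ 'name=widget'’.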
+
+‘--content-disposition’
+
+ If this is set to on, experimental (not fully-functional) support
+ for ‘Content-Disposition’ headers is enabled. This can currently
+ result in extra round-trips to the server for a ‘HEAD’ request, and
+ is known to suffer from a few bugs, which is why it is not
+ currently enabled by default.
+
+ This option is useful for some file-downloading CGI programs that
+ use ‘Content-Disposition’ headers to describe what the name of a
+ downloaded file should be.
+
+ When combined with ‘--metalink-over-http’ and
+ ‘--trust-server-names’, a ‘Content-Type: application/metalink4+xml’
+ file is named using the ‘Content-Disposition’ filename field, if
+ available.
+
+‘--content-on-error’
+
+ If this is set to on, Wget will not skip the content when the
+ server responds with an HTTP status code that indicates an error.
+
+‘--trust-server-names’
+
+ If this is set, on a redirect, the local file name will be based on
+ the redirection URL. By default the local file name is based on the
+ original URL. When doing recursive retrieving this can be helpful
+ because in many web sites redirected URLs correspond to an
+ underlying file structure, while link URLs do not.
+
+‘--auth-no-challenge’
+
+ If this option is given, Wget will send Basic HTTP authentication
+ information (plaintext username and password) for all requests,
+ just like Wget 1.10.2 and prior did by default.
+
+ Use of this option is not recommended, and is intended only to
+ support a few obscure servers, which never send HTTP
+ authentication challenges, but accept unsolicited auth info, say,
+ in addition to form-based authentication.
+
+‘--retry-on-host-error’
+ Consider host errors, such as “Temporary failure in name
+ resolution”, as non-fatal, transient errors.
+
+‘--retry-on-http-error=CODE[,CODE,...]’
+ Consider given HTTP response codes as non-fatal, transient errors.
+ Supply a comma-separated list of 3-digit HTTP response codes as
+ argument. Useful to work around special circumstances where
+ retries are required, but the server responds with an error code
+ normally not retried by Wget. Such errors might be 503 (Service
+ Unavailable) and 429 (Too Many Requests). Retries enabled by this
+ option are performed subject to the normal retry timing and retry
+ count limitations of Wget.
+
+ Using this option is intended to support special use cases only and
+ is generally not recommended, as it can force retries even in cases
+ where the server is actually trying to decrease its load. Please
+ use wisely and only if you know what you are doing.
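+
+ For instance, a hedged sketch (the URL is a placeholder and the
+ numbers are arbitrary) that retries up to five times when the
+ server answers 429 or 503, waiting up to ten seconds between
+ retries:
+
+ wget --tries=5 --waitretry=10 --retry-on-http-error=429,503 https://example.com/file.tar.gz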
+
+
+File: wget.info, Node: HTTPS (SSL/TLS) Options, Next: FTP Options, Prev: HTTP Options, Up: Invoking
+
+2.8 HTTPS (SSL/TLS) Options
+===========================
+
+To support encrypted HTTP (HTTPS) downloads, Wget must be compiled with
+an external SSL library. The current default is GnuTLS. In addition,
+Wget also supports HSTS (HTTP Strict Transport Security). If Wget is
+compiled without SSL support, none of these options are available.
+
+‘--secure-protocol=PROTOCOL’
+ Choose the secure protocol to be used. Legal values are ‘auto’,
+ ‘SSLv2’, ‘SSLv3’, ‘TLSv1’, ‘TLSv1_1’, ‘TLSv1_2’, ‘TLSv1_3’ and
+ ‘PFS’. If ‘auto’ is used, the SSL library is given the liberty of
+ choosing the appropriate protocol automatically, which is achieved
+ by sending a TLSv1 greeting. This is the default.
+
+ Specifying ‘SSLv2’, ‘SSLv3’, ‘TLSv1’, ‘TLSv1_1’, ‘TLSv1_2’ or
+ ‘TLSv1_3’ forces the use of the corresponding protocol. This is
+ useful when talking to old and buggy SSL server implementations
+ that make it hard for the underlying SSL library to choose the
+ correct protocol version. Fortunately, such servers are quite
+ rare.
+
+ Specifying ‘PFS’ enforces the use of the so-called Perfect Forward
+ Secrecy cipher suites. In short, PFS adds security by creating a
+ one-time key for each SSL connection. It has a bit more CPU impact
+ on client and server. Only ciphers known to be secure are used
+ (e.g. no MD4), together with the TLS protocol. This mode also
+ explicitly excludes non-PFS key exchange methods, such as RSA.
+
+‘--https-only’
+ When in recursive mode, only HTTPS links are followed.
+
+‘--ciphers’
+ Set the cipher list string. Typically this string sets the cipher
+ suites and other SSL/TLS options that the user wishes to be used,
+ in a set order of preference (GnuTLS calls it a ‘priority string’).
+ This string will be fed verbatim to the SSL/TLS engine (OpenSSL or
+ GnuTLS) and hence its format and syntax depend on that engine.
+ Wget will not process or manipulate it in any way. Refer to the
+ OpenSSL or GnuTLS documentation for more information.
+
+‘--no-check-certificate’
+ Don’t check the server certificate against the available
+ certificate authorities. Also don’t require the URL host name to
+ match the common name presented by the certificate.
+
+ As of Wget 1.10, the default is to verify the server’s certificate
+ against the recognized certificate authorities, breaking the SSL
+ handshake and aborting the download if the verification fails.
+ Although this provides more secure downloads, it does break
+ interoperability with some sites that worked with previous Wget
+ versions, particularly those using self-signed, expired, or
+ otherwise invalid certificates. This option forces an “insecure”
+ mode of operation that turns the certificate verification errors
+ into warnings and allows you to proceed.
+
+ If you encounter “certificate verification” errors or ones saying
+ that “common name doesn’t match requested host name”, you can use
+ this option to bypass the verification and proceed with the
+ download. _Only use this option if you are otherwise convinced of
+ the site’s authenticity, or if you really don’t care about the
+ validity of its certificate._ It is almost always a bad idea not
+ to check the certificates when transmitting confidential or
+ important data. For self-signed/internal certificates, you should
+ download the certificate and verify against that instead of forcing
+ this insecure mode. If you are really sure of not desiring any
+ certificate verification, you can specify
+ ‘--check-certificate=quiet’ to tell Wget not to print any warning
+ about invalid certificates, albeit in most cases this is the wrong
+ thing to do.
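+
+ A sketch of the safer alternative for an internal server, assuming
+ its certificate (or that of the CA which signed it) has already
+ been obtained through a trusted channel and saved as
+ ‘internal-ca.pem’ (a hypothetical file name and host):
+
+ wget --ca-certificate=internal-ca.pem https://internal.example.com/file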
+
+‘--certificate=FILE’
+ Use the client certificate stored in FILE. This is needed for
+ servers that are configured to require certificates from the
+ clients that connect to them. Normally a certificate is not
+ required and this switch is optional.
+
+‘--certificate-type=TYPE’
+ Specify the type of the client certificate. Legal values are ‘PEM’
+ (assumed by default) and ‘DER’, also known as ‘ASN1’.
+
+‘--private-key=FILE’
+ Read the private key from FILE. This allows you to provide the
+ private key in a file separate from the certificate.
+
+‘--private-key-type=TYPE’
+ Specify the type of the private key. Accepted values are ‘PEM’
+ (the default) and ‘DER’.
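+
+ For example, assuming a PEM client certificate and a separate PEM
+ private key stored in the hypothetical files ‘client.pem’ and
+ ‘client.key’, the certificate and key options can be combined as
+ follows (the URL is a placeholder):
+
+ wget --certificate=client.pem --private-key=client.key https://example.com/protected/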
+
+‘--ca-certificate=FILE’
+ Use FILE as the file with the bundle of certificate authorities
+ (“CA”) to verify the peers. The certificates must be in PEM
+ format.
+
+ Without this option Wget looks for CA certificates at the
+ system-specified locations, chosen at OpenSSL installation time.
+
+‘--ca-directory=DIRECTORY’
+ Specifies directory containing CA certificates in PEM format. Each
+ file contains one CA certificate, and the file name is based on a
+ hash value derived from the certificate. This is achieved by
+ processing a certificate directory with the ‘c_rehash’ utility
+ supplied with OpenSSL. Using ‘--ca-directory’ is more efficient
+ than ‘--ca-certificate’ when many certificates are installed
+ because it allows Wget to fetch certificates on demand.
+
+ Without this option Wget looks for CA certificates at the
+ system-specified locations, chosen at OpenSSL installation time.
+
+‘--crl-file=FILE’
+ Specifies a CRL file in FILE. This is needed for certificates that
+ have been revoked by the CAs.
+
+‘--pinnedpubkey=file/hashes’
+ Tells Wget to use the specified public key file (or hashes) to
+ verify the peer. This can be a path to a file which contains a
+ single public key in PEM or DER format, or any number of base64
+ encoded sha256 hashes preceded by “sha256//” and separated by “;”.
+
+ When negotiating a TLS or SSL connection, the server sends a
+ certificate indicating its identity. A public key is extracted
+ from this certificate and if it does not exactly match the public
+ key(s) provided to this option, wget will abort the connection
+ before sending or receiving any data.
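+
+ One possible way to obtain such a hash, assuming the OpenSSL
+ command-line tool is installed and the server certificate has been
+ saved as ‘server.pem’ (a hypothetical file name), is to hash the
+ DER-encoded public key and encode the digest in base64:
+
+ openssl x509 -in server.pem -pubkey -noout |
+ openssl pkey -pubin -outform der |
+ openssl dgst -sha256 -binary |
+ openssl enc -base64
+
+ The resulting string, shown here as the placeholder HASH, is then
+ passed to Wget (the URL is also a placeholder):
+
+ wget --pinnedpubkey='sha256//HASH' https://example.com/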
+
+‘--random-file=FILE’
+ [OpenSSL and LibreSSL only] Use FILE as the source of random data
+ for seeding the pseudo-random number generator on systems without
+ ‘/dev/urandom’.
+
+ On such systems the SSL library needs an external source of
+ randomness to initialize. Randomness may be provided by EGD (see
+ ‘--egd-file’ below) or read from an external source specified by
+ the user. If this option is not specified, Wget looks for random
+ data in ‘$RANDFILE’ or, if that is unset, in ‘$HOME/.rnd’.
+
+ If you’re getting the “Could not seed OpenSSL PRNG; disabling SSL.”
+ error, you should provide random data using some of the methods
+ described above.
+
+‘--egd-file=FILE’
+ [OpenSSL only] Use FILE as the EGD socket. EGD stands for “Entropy
+ Gathering Daemon”, a user-space program that collects data from
+ various unpredictable system sources and makes it available to
+ other programs that might need it. Encryption software, such as
+ the SSL library, needs sources of non-repeating randomness to seed
+ the random number generator used to produce cryptographically
+ strong keys.
+
+ OpenSSL allows the user to specify their own source of entropy
+ using the ‘RANDFILE’ environment variable. If this variable is
+ unset, or if the specified file does not produce enough randomness,
+ OpenSSL will read random data from the EGD socket specified using
+ this option.
+
+ If this option is not specified (and the equivalent startup command
+ is not used), EGD is never contacted. EGD is not needed on modern
+ Unix systems that support ‘/dev/urandom’.
+
+‘--no-hsts’
+ Wget supports HSTS (HTTP Strict Transport Security, RFC 6797) by
+ default. Use ‘--no-hsts’ to make Wget act as a non-HSTS-compliant
+ UA. As a consequence, Wget would ignore all the
+ ‘Strict-Transport-Security’ headers, and would not enforce any
+ existing HSTS policy.
+
+‘--hsts-file=FILE’
+ By default, Wget stores its HSTS database in ‘~/.wget-hsts’. You
+ can use ‘--hsts-file’ to override this. Wget will use the supplied
+ file as the HSTS database. Such a file must conform to the correct
+ HSTS database format used by Wget. If Wget cannot parse the
+ provided file, the behaviour is unspecified.
+
+ Wget’s HSTS database is a plain text file. Each line contains an
+ HSTS entry (i.e., a site that has issued a
+ ‘Strict-Transport-Security’ header and that therefore has specified
+ a concrete HSTS policy to be applied). Lines starting with a hash
+ sign (‘#’) are ignored by Wget. Please note that, in spite of this
+ convenient human-readability, hand-hacking the HSTS database is
+ generally not a good idea.
+
+ An HSTS entry line consists of several fields separated by one or
+ more whitespace characters:
+
+ ‘<hostname> SP [<port>] SP <include subdomains> SP <created> SP
+ <max-age>’
+
+ The HOSTNAME and PORT fields indicate the hostname and port to
+ which the given HSTS policy applies. The PORT field may be zero,
+ and in most cases it will be. That means that the port number
+ will not be taken into account when deciding whether such an HSTS
+ policy should be applied on a given request (only the hostname will
+ be evaluated). When PORT is nonzero, both the target hostname and
+ the port will be evaluated and the HSTS policy will only be applied
+ if both of them match.
+
+ This feature has been included for testing/development purposes
+ only. The Wget testsuite (in ‘testenv/’) creates HSTS databases
+ with explicit ports with the purpose of ensuring Wget’s correct
+ behaviour. Applying HSTS policies to ports other than the default
+ ones is discouraged by RFC 6797 (see Appendix B "Differences
+ between HSTS Policy and Same-Origin Policy"). Thus, this
+ functionality should not be used in production environments and
+ PORT will typically be zero.
+
+ The last three fields do what they are expected to. The field
+ INCLUDE_SUBDOMAINS can either be ‘1’ or ‘0’ and it signals whether
+ the subdomains of the target domain should be part of the given
+ HSTS policy as well. The CREATED and MAX-AGE fields hold the
+ timestamp of when the entry was created (first seen by Wget) and
+ the HSTS-defined value ‘max-age’, which states how long that HSTS
+ policy should remain active, measured in seconds elapsed since the
+ timestamp stored in CREATED. Once that time has passed, that HSTS
+ policy will no longer be valid and will eventually be removed from
+ the database.
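+
+ As an illustration only (the values are invented, CREATED is
+ assumed to be a Unix timestamp, and editing the database by hand
+ remains discouraged), an entry for a host whose policy covers
+ subdomains, applies to any port, was first seen on 2019-01-01
+ 00:00:00 UTC and is valid for 365 days could look like this:
+
+ example.com 0 1 1546300800 31536000
+
+ Under these assumptions the policy expires at 1546300800 + 31536000
+ = 1577836800, that is, 2020-01-01 00:00:00 UTC.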
+
+ If you supply your own HSTS database via ‘--hsts-file’, be aware
+ that Wget may modify the provided file if any change occurs between
+ the HSTS policies requested by the remote servers and those in the
+ file. When Wget exits, it effectively updates the HSTS database
+ by rewriting the database file with the new entries.
+
+ If the supplied file does not exist, Wget will create one. This
+ file will contain the new HSTS entries. If no HSTS entries were
+ generated (no ‘Strict-Transport-Security’ headers were sent by any
+ of the servers) then no file will be created, not even an empty
+ one. This behaviour applies to the default database file
+ (‘~/.wget-hsts’) as well: it will not be created until some server
+ enforces an HSTS policy.
+
+ Care is taken not to override possible changes made by other Wget
+ processes at the same time over the HSTS database. Before dumping
+ the updated HSTS entries on the file, Wget will re-read it and
+ merge the changes.
+
+ Using a custom HSTS database and/or modifying an existing one is
+ discouraged. For more information about the potential security
+ threats arising from such practice, see section 14 "Security
+ Considerations" of RFC 6797, especially section 14.9 "Creative
+ Manipulation of HSTS Policy Store".
+
+‘--warc-file=FILE’
+ Use FILE as the destination WARC file.
+
+‘--warc-header=STRING’
+ Use STRING as the warcinfo record.
+
+‘--warc-max-size=SIZE’
+ Set the maximum size of the WARC files to SIZE.
+
+‘--warc-cdx’
+ Write CDX index files.
+
+‘--warc-dedup=FILE’
+ Do not store records listed in this CDX file.
+
+‘--no-warc-compression’
+ Do not compress WARC files with GZIP.
+
+‘--no-warc-digests’
+ Do not calculate SHA1 digests.
+
+‘--no-warc-keep-log’
+ Do not store the log file in a WARC record.
+
+‘--warc-tempdir=DIR’
+ Specify the location for temporary files created by the WARC
+ writer.
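+
+ A minimal sketch combining these options (the URL and the base
+ name ‘example’ are placeholders): archive a shallow recursive
+ crawl into a WARC file and write a CDX index for it:
+
+ wget --warc-file=example --warc-cdx -r -l 1 https://example.com/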
+
+
+File: wget.info, Node: FTP Options, Next: Recursive Retrieval Options, Prev: HTTPS (SSL/TLS) Options, Up: Invoking
+
+2.9 FTP Options
+===============
+
+‘--ftp-user=USER’
+‘--ftp-password=PASSWORD’
+ Specify the username USER and password PASSWORD on an FTP server.
+ Without this, or the corresponding startup option, the password
+ defaults to ‘-wget@’, normally used for anonymous FTP.
+
+ Another way to specify username and password is in the URL itself
+ (*note URL Format::). Either method reveals your password to
+ anyone who bothers to run ‘ps’. To prevent the passwords from
+ being seen, store them in ‘.wgetrc’ or ‘.netrc’, and make sure to
+ protect those files from other users with ‘chmod’. If the
+ passwords are really important, do not leave them lying in those
+ files either—edit the files and delete them after Wget has started
+ the download.
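+
+ As a sketch, assuming the hypothetical host ‘ftp.example.com’, a
+ ‘.netrc’ entry and the matching permission change could look like
+ this (USER and PASSWORD are placeholders):
+
+ machine ftp.example.com login USER password PASSWORD
+
+ chmod 600 ~/.netrc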
+
+‘--no-remove-listing’
+ Don’t remove the temporary ‘.listing’ files generated by FTP
+ retrievals. Normally, these files contain the raw directory
+ listings received from FTP servers. Not removing them can be
+ useful for debugging purposes, or when you want to be able to
+ easily check on the contents of remote server directories (e.g. to
+ verify that a mirror you’re running is complete).
+
+ Note that even though Wget writes to a known filename for this
+ file, this is not a security hole in the scenario of a user making
+ ‘.listing’ a symbolic link to ‘/etc/passwd’ or something and asking
+ ‘root’ to run Wget in his or her directory. Depending on the
+ options used, either Wget will refuse to write to ‘.listing’,
+ making the globbing/recursion/time-stamping operation fail, or the
+ symbolic link will be deleted and replaced with the actual
+ ‘.listing’ file, or the listing will be written to a
+ ‘.listing.NUMBER’ file.
+
+ Even though this situation isn’t a problem, ‘root’ should
+ never run Wget in a non-trusted user’s directory. A user could do
+ something as simple as linking ‘index.html’ to ‘/etc/passwd’ and
+ asking ‘root’ to run Wget with ‘-N’ or ‘-r’ so the file will be
+ overwritten.
+
+‘--no-glob’
+ Turn off FTP globbing. Globbing refers to the use of shell-like
+ special characters (“wildcards”), like ‘*’, ‘?’, ‘[’ and ‘]’ to
+ retrieve more than one file from the same directory at once, like:
+
+ wget ftp://gnjilux.srk.fer.hr/*.msg
+
+ By default, globbing will be turned on if the URL contains a
+ globbing character. This option may be used to turn globbing on or
+ off permanently.
+
+ You may have to quote the URL to protect it from being expanded by
+ your shell. Globbing makes Wget look for a directory listing,
+ which is system-specific. This is why it currently works only with
+ Unix FTP servers (and the ones emulating Unix ‘ls’ output).
+
+‘--no-passive-ftp’
+ Disable the use of the “passive” FTP transfer mode. Passive FTP
+ mandates that the client connect to the server to establish the
+ data connection rather than the other way around.
+
+ If the machine is connected to the Internet directly, both passive
+ and active FTP should work equally well. Behind most firewall and
+ NAT configurations passive FTP has a better chance of working.
+ However, in some rare firewall configurations, active FTP actually
+ works when passive FTP doesn’t. If you suspect this to be the
+ case, use this option, or set ‘passive_ftp=off’ in your init file.
+
+‘--preserve-permissions’
+ Preserve remote file permissions instead of permissions set by
+ umask.
+
+‘--retr-symlinks’
+ By default, when retrieving FTP directories recursively and a
+ symbolic link is encountered, the symbolic link is traversed and
+ the pointed-to files are retrieved. Currently, Wget does not
+ traverse symbolic links to directories to download them
+ recursively, though this feature may be added in the future.
+
+ When ‘--retr-symlinks=no’ is specified, the linked-to file is not
+ downloaded. Instead, a matching symbolic link is created on the
+ local filesystem. The pointed-to file will not be retrieved unless
+ this recursive retrieval would have encountered it separately and
+ downloaded it anyway. This option poses a security risk where a
+ malicious FTP server may cause Wget to write to files outside of
+ the intended directories through a specially crafted ‘.listing’ file.
+
+ Note that when retrieving a file (not a directory) because it was
+ specified on the command-line, rather than because it was recursed
+ to, this option has no effect. Symbolic links are always traversed
+ in this case.
+
+2.10 FTPS Options
+=================
+
+‘--ftps-implicit’
+ This option tells Wget to use FTPS implicitly. Implicit FTPS
+ consists of initializing SSL/TLS from the very beginning of the
+ control connection. This option does not send an ‘AUTH TLS’
+ command: it assumes the server speaks FTPS and directly starts an
+ SSL/TLS connection. If the attempt is successful, the session
+ continues just like regular FTPS (‘PBSZ’ and ‘PROT’ are sent,
+ etc.). Implicit FTPS is no longer a requirement for FTPS
+ implementations, and thus many servers may not support it. If
+ ‘--ftps-implicit’ is passed and no explicit port number specified,
+ the default port for implicit FTPS, 990, will be used, instead of
+ the default port for the "normal" (explicit) FTPS which is the same
+ as that of FTP, 21.
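+
+ For example, a sketch assuming a hypothetical server that accepts
+ implicit FTPS on the default port 990:
+
+ wget --ftps-implicit ftps://ftp.example.com/pub/file.txt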
+
+‘--no-ftps-resume-ssl’
+ Do not resume the SSL/TLS session in the data channel. When
+ starting a data connection, Wget tries to resume the SSL/TLS
+ session previously started in the control connection. SSL/TLS
+ session resumption avoids performing an entirely new handshake by
+ reusing the SSL/TLS parameters of a previous session. Typically,
+ the FTPS servers want it that way, so Wget does this by default.
+ Under rare circumstances however, one might want to start an
+ entirely new SSL/TLS session in every data connection. This is
+ what ‘--no-ftps-resume-ssl’ is for.
+
+‘--ftps-clear-data-connection’
+ All the data connections will be in plain text. Only the control
+ connection will be under SSL/TLS. Wget will send a ‘PROT C’ command
+ to achieve this, which must be approved by the server.
+
+‘--ftps-fallback-to-ftp’
+ Fall back to FTP if FTPS is not supported by the target server.
+ For security reasons, this option is not asserted by default. The
+ default behaviour is to exit with an error. If a server does not
+ successfully reply to the initial ‘AUTH TLS’ command, or in the
+ case of implicit FTPS, if the initial SSL/TLS connection attempt is
+ rejected, it is considered that such server does not support FTPS.
+
+
+File: wget.info, Node: Recursive Retrieval Options, Next: Recursive Accept/Reject Options, Prev: FTP Options, Up: Invoking
+
+2.11 Recursive Retrieval Options
+================================
+
+‘-r’
+‘--recursive’
+ Turn on recursive retrieving. *Note Recursive Download::, for more
+ details. The default maximum depth is 5.
+
+‘-l DEPTH’
+‘--level=DEPTH’
+ Specify recursion maximum depth level DEPTH (*note Recursive
+ Download::).
+
+‘--delete-after’
+ This option tells Wget to delete every single file it downloads,
+ _after_ having done so. It is useful for pre-fetching popular
+ pages through a proxy, e.g.:
+
+ wget -r -nd --delete-after http://whatever.com/~popular/page/
+
+ The ‘-r’ option is to retrieve recursively, and ‘-nd’ to not create
+ directories.
+
+ Note that ‘--delete-after’ deletes files on the local machine. It
+ does not issue the ‘DELE’ command to remote FTP sites, for
+ instance. Also note that when ‘--delete-after’ is specified,
+ ‘--convert-links’ is ignored, so ‘.orig’ files are simply not
+ created in the first place.
+
+‘-k’
+‘--convert-links’
+ After the download is complete, convert the links in the document
+ to make them suitable for local viewing. This affects not only the
+ visible hyperlinks, but any part of the document that links to
+ external content, such as embedded images, links to style sheets,
+ hyperlinks to non-HTML content, etc.
+
+ Each link will be changed in one of the two ways:
+
+ • The links to files that have been downloaded by Wget will be
+ changed to refer to the file they point to as a relative link.
+
+ Example: if the downloaded file ‘/foo/doc.html’ links to
+ ‘/bar/img.gif’, also downloaded, then the link in ‘doc.html’
+ will be modified to point to ‘../bar/img.gif’. This kind of
+ transformation works reliably for arbitrary combinations of
+ directories.
+
+ • The links to files that have not been downloaded by Wget will
+ be changed to include host name and absolute path of the
+ location they point to.
+
+ Example: if the downloaded file ‘/foo/doc.html’ links to
+ ‘/bar/img.gif’ (or to ‘../bar/img.gif’), then the link in
+ ‘doc.html’ will be modified to point to
+ ‘http://HOSTNAME/bar/img.gif’.
+
+ Because of this, local browsing works reliably: if a linked file
+ was downloaded, the link will refer to its local name; if it was
+ not downloaded, the link will refer to its full Internet address
+ rather than presenting a broken link. The fact that the former
+ links are converted to relative links ensures that you can move the
+ downloaded hierarchy to another directory.
+
+ Note that only at the end of the download can Wget know which links
+ have been downloaded. Because of that, the work done by ‘-k’ will
+ be performed at the end of all the downloads.
+
+‘--convert-file-only’
+ This option converts only the filename part of the URLs, leaving
+ the rest of the URLs untouched. This filename part is sometimes
+ referred to as the "basename", although we avoid that term here in
+ order not to cause confusion.
+
+ It works particularly well in conjunction with
+ ‘--adjust-extension’, although this coupling is not enforced. It
+ proves useful to populate Internet caches with files downloaded
+ from different hosts.
+
+ Example: if some link points to ‘//foo.com/bar.cgi?xyz’ with
+ ‘--adjust-extension’ asserted and its local destination is intended
+ to be ‘./foo.com/bar.cgi?xyz.css’, then the link would be converted
+ to ‘//foo.com/bar.cgi?xyz.css’. Note that only the filename part
+ has been modified. The rest of the URL has been left untouched,
+ including the net path (‘//’) which would otherwise be processed by
+ Wget and converted to the effective scheme (i.e., ‘http://’).
+
+‘-K’
+‘--backup-converted’
+ When converting a file, back up the original version with a ‘.orig’
+ suffix. Affects the behavior of ‘-N’ (*note HTTP Time-Stamping
+ Internals::).
+
+‘-m’
+‘--mirror’
+ Turn on options suitable for mirroring. This option turns on
+ recursion and time-stamping, sets infinite recursion depth and
+ keeps FTP directory listings. It is currently equivalent to ‘-r -N
+ -l inf --no-remove-listing’.
+
+‘-p’
+‘--page-requisites’
+ This option causes Wget to download all the files that are
+ necessary to properly display a given HTML page. This includes
+ such things as inlined images, sounds, and referenced stylesheets.
+
+ Ordinarily, when downloading a single HTML page, any requisite
+ documents that may be needed to display it properly are not
+ downloaded. Using ‘-r’ together with ‘-l’ can help, but since Wget
+ does not ordinarily distinguish between external and inlined
+ documents, one is generally left with “leaf documents” that are
+ missing their requisites.
+
+ For instance, say document ‘1.html’ contains an ‘<IMG>’ tag
+ referencing ‘1.gif’ and an ‘<A>’ tag pointing to external document
+ ‘2.html’. Say that ‘2.html’ is similar but that its image is
+ ‘2.gif’ and it links to ‘3.html’. Say this continues up to some
+ arbitrarily high number.
+
+ If one executes the command:
+
+ wget -r -l 2 http://SITE/1.html
+
+ then ‘1.html’, ‘1.gif’, ‘2.html’, ‘2.gif’, and ‘3.html’ will be
+ downloaded. As you can see, ‘3.html’ is without its requisite
+ ‘3.gif’ because Wget is simply counting the number of hops (up to
+ 2) away from ‘1.html’ in order to determine where to stop the
+ recursion. However, with this command:
+
+ wget -r -l 2 -p http://SITE/1.html
+
+ all the above files _and_ ‘3.html’’s requisite ‘3.gif’ will be
+ downloaded. Similarly,
+
+ wget -r -l 1 -p http://SITE/1.html
+
+ will cause ‘1.html’, ‘1.gif’, ‘2.html’, and ‘2.gif’ to be
+ downloaded. One might think that:
+
+ wget -r -l 0 -p http://SITE/1.html
+
+ would download just ‘1.html’ and ‘1.gif’, but unfortunately this is
+ not the case, because ‘-l 0’ is equivalent to ‘-l inf’—that is,
+ infinite recursion. To download a single HTML page (or a handful
+ of them, all specified on the command-line or in a ‘-i’ URL input
+ file) and its (or their) requisites, simply leave off ‘-r’ and
+ ‘-l’:
+
+ wget -p http://SITE/1.html
+
+ Note that Wget will behave as if ‘-r’ had been specified, but only
+ that single page and its requisites will be downloaded. Links from
+ that page to external documents will not be followed. Actually, to
+ download a single page and all its requisites (even if they exist
+ on separate websites), and make sure the lot displays properly
+ locally, this author likes to use a few options in addition to
+ ‘-p’:
+
+ wget -E -H -k -K -p http://SITE/DOCUMENT
+
+ To finish off this topic, it’s worth knowing that Wget’s idea of an
+ external document link is any URL specified in an ‘<A>’ tag, an
+ ‘<AREA>’ tag, or a ‘<LINK>’ tag other than ‘<LINK
+ REL="stylesheet">’.
+
+‘--strict-comments’
+ Turn on strict parsing of HTML comments. The default is to
+ terminate comments at the first occurrence of ‘-->’.
+
+ According to specifications, HTML comments are expressed as SGML
+ “declarations”. Declaration is special markup that begins with
+ ‘<!’ and ends with ‘>’, such as ‘<!DOCTYPE ...>’, that may contain
+ comments between a pair of ‘--’ delimiters. HTML comments are
+ “empty declarations”, SGML declarations without any non-comment
+ text. Therefore, ‘<!--foo-->’ is a valid comment, and so is
+ ‘<!--one-- --two-->’, but ‘<!--1--2-->’ is not.
+
+ On the other hand, most HTML writers don’t perceive comments as
+ anything other than text delimited with ‘<!--’ and ‘-->’, which is
+ not quite the same. For example, something like ‘<!------------>’
+ works as a valid comment as long as the number of dashes is a
+ multiple of four (!). If not, the comment technically lasts until
+ the next ‘--’, which may be at the other end of the document.
+ Because of this, many popular browsers completely ignore the
+ specification and implement what users have come to expect:
+ comments delimited with ‘<!--’ and ‘-->’.
+
+ Until version 1.9, Wget interpreted comments strictly, which
+ resulted in missing links in many web pages that displayed fine in
+ browsers, but had the misfortune of containing non-compliant
+ comments. Beginning with version 1.9, Wget has joined the ranks of
+ clients that implement “naive” comments, terminating each comment
+ at the first occurrence of ‘-->’.
+
+ If, for whatever reason, you want strict comment parsing, use this
+ option to turn it on.
+
+
+File: wget.info, Node: Recursive Accept/Reject Options, Next: Exit Status, Prev: Recursive Retrieval Options, Up: Invoking
+
+2.12 Recursive Accept/Reject Options
+====================================
+
+‘-A ACCLIST --accept ACCLIST’
+‘-R REJLIST --reject REJLIST’
+ Specify comma-separated lists of file name suffixes or patterns to
+ accept or reject (*note Types of Files::). Note that if any of the
+ wildcard characters, ‘*’, ‘?’, ‘[’ or ‘]’, appear in an element of
+ ACCLIST or REJLIST, it will be treated as a pattern, rather than a
+ suffix. In this case, you have to enclose the pattern in quotes
+ to prevent your shell from expanding it, like in ‘-A "*.mp3"’ or
+ ‘-A '*.mp3'’.
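+
+ As a sketch (the URL is a placeholder), the following keeps only
+ files matching the quoted pattern while recursing one level below
+ the starting directory:
+
+ wget -r -l 1 -np -A '*.pdf' https://example.com/papers/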
+
+‘--accept-regex URLREGEX’
+‘--reject-regex URLREGEX’
+ Specify a regular expression to accept or reject the complete URL.
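+
+ For instance, a hedged sketch (placeholder URL; the query
+ parameter ‘sort’ is only an assumption about how the site builds
+ its links) that skips every URL containing ‘?sort=’:
+
+ wget -r --reject-regex '[?]sort=' https://example.com/files/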
+
+‘--regex-type REGEXTYPE’
+ Specify the regular expression type. Possible types are ‘posix’ or
+ ‘pcre’. Note that to be able to use ‘pcre’ type, wget has to be
+ compiled with libpcre support.
+
+‘-D DOMAIN-LIST’
+‘--domains=DOMAIN-LIST’
+ Set domains to be followed. DOMAIN-LIST is a comma-separated list
+ of domains. Note that it does _not_ turn on ‘-H’.
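+
+ Because ‘-D’ does not enable host spanning by itself, it is
+ typically combined with ‘-H’. A sketch with placeholder domains:
+
+ wget -r -H -D example.com https://www.example.com/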
+
+‘--exclude-domains DOMAIN-LIST’
+ Specify the domains that are _not_ to be followed (*note Spanning
+ Hosts::).
+
+‘--follow-ftp’
+ Follow FTP links from HTML documents. Without this option, Wget
+ will ignore all the FTP links.
+
+‘--follow-tags=LIST’
+ Wget has an internal table of HTML tag / attribute pairs that it
+ considers when looking for linked documents during a recursive
+ retrieval. If a user wants only a subset of those tags to be
+ considered, however, he or she should specify such tags in a
+ comma-separated LIST with this option.
+
+‘--ignore-tags=LIST’
+ This is the opposite of the ‘--follow-tags’ option. To skip
+ certain HTML tags when recursively looking for documents to
+ download, specify them in a comma-separated LIST.
+
+ In the past, this option was the best bet for downloading a single
+ page and its requisites, using a command-line like:
+
+ wget --ignore-tags=a,area -H -k -K -r http://SITE/DOCUMENT
+
+ However, the author of this option came across a page with tags
+ like ‘<LINK REL="home" HREF="/">’ and came to the realization that
+ specifying tags to ignore was not enough. One can’t just tell Wget
+ to ignore ‘<LINK>’, because then stylesheets will not be
+ downloaded. Now the best bet for downloading a single page and its
+ requisites is the dedicated ‘--page-requisites’ option.
+
+‘--ignore-case’
+ Ignore case when matching files and directories. This influences
+ the behavior of -R, -A, -I, and -X options, as well as globbing
+ implemented when downloading from FTP sites. For example, with
+ this option, ‘-A "*.txt"’ will match ‘file1.txt’, but also
+ ‘file2.TXT’, ‘file3.TxT’, and so on. The quotes in the example are
+ to prevent the shell from expanding the pattern.
+
+‘-H’
+‘--span-hosts’
+ Enable spanning across hosts when doing recursive retrieving (*note
+ Spanning Hosts::).
+
+‘-L’
+‘--relative’
+ Follow relative links only. Useful for retrieving a specific home
+ page without any distractions, not even those from the same hosts
+ (*note Relative Links::).
+
+‘-I LIST’
+‘--include-directories=LIST’
+ Specify a comma-separated list of directories you wish to follow
+ when downloading (*note Directory-Based Limits::). Elements of
+ LIST may contain wildcards.
+
+‘-X LIST’
+‘--exclude-directories=LIST’
+ Specify a comma-separated list of directories you wish to exclude
+ from download (*note Directory-Based Limits::). Elements of LIST
+ may contain wildcards.
+
+‘-np’
+‘--no-parent’
+ Do not ever ascend to the parent directory when retrieving
+ recursively. This is a useful option, since it guarantees that
+ only the files _below_ a certain hierarchy will be downloaded.
+ *Note Directory-Based Limits::, for more details.
+
+
+File: wget.info, Node: Exit Status, Prev: Recursive Accept/Reject Options, Up: Invoking
+
+2.13 Exit Status
+================
+
+Wget may return one of several error codes if it encounters problems.
+
+0
+ No problems occurred.
+
+1
+ Generic error code.
+
+2
+ Parse error—for instance, when parsing command-line options, the
+ ‘.wgetrc’ or ‘.netrc’...
+
+3
+ File I/O error.
+
+4
+ Network failure.
+
+5
+ SSL verification failure.
+
+6
+ Username/password authentication failure.
+
+7
+ Protocol errors.
+
+8
+ Server issued an error response.
+
+ With the exceptions of 0 and 1, the lower-numbered exit codes take
+precedence over higher-numbered ones, when multiple types of errors are
+encountered.
+
+ In versions of Wget prior to 1.12, Wget’s exit status tended to be
+unhelpful and inconsistent. Recursive downloads would virtually always
+return 0 (success), regardless of any issues encountered, and
+non-recursive fetches only returned the status corresponding to the most
+recently-attempted download.
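+
+ A script can branch on these codes. The following is a minimal
+sketch, not part of Wget: the function name ‘describe_wget_status’ is
+hypothetical, and the messages only restate the table above.
+
```shell
# Map a Wget exit status, as documented above, to a short description.
# describe_wget_status is a hypothetical helper, not something Wget provides.
describe_wget_status() {
    case "$1" in
        0) echo "no problems occurred" ;;
        1) echo "generic error" ;;
        2) echo "parse error" ;;
        3) echo "file I/O error" ;;
        4) echo "network failure" ;;
        5) echo "SSL verification failure" ;;
        6) echo "authentication failure" ;;
        7) echo "protocol error" ;;
        8) echo "server issued an error response" ;;
        *) echo "unknown status" ;;
    esac
}

# Typical use after a download attempt (the URL is a placeholder):
#   wget http://www.example.com/file
#   describe_wget_status "$?"
```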
+
+
+File: wget.info, Node: Recursive Download, Next: Following Links, Prev: Invoking, Up: Top
+
+3 Recursive Download
+********************
+
+GNU Wget is capable of traversing parts of the Web (or a single HTTP or
+FTP server), following links and directory structure. We refer to this
+as “recursive retrieval”, or “recursion”.
+
+ With HTTP URLs, Wget retrieves and parses the HTML or CSS from the
+given URL, retrieving the files the document refers to, through markup
+like ‘href’ or ‘src’, or CSS URI values specified using the ‘url()’
+functional notation. If the freshly downloaded file is also of type
+‘text/html’, ‘application/xhtml+xml’, or ‘text/css’, it will be parsed
+and followed further.
+
+ Recursive retrieval of HTTP and HTML/CSS content is “breadth-first”.
+This means that Wget first downloads the requested document, then the
+documents linked from that document, then the documents linked by them,
+and so on. In other words, Wget first downloads the documents at depth
+1, then those at depth 2, and so on until the specified maximum depth.
+
+ The maximum “depth” to which the retrieval may descend is specified
+with the ‘-l’ option. The default maximum depth is five layers.
+
+ When retrieving an FTP URL recursively, Wget will retrieve all the
+data from the given directory tree (including the subdirectories up to
+the specified depth) on the remote server, creating its mirror image
+locally. FTP retrieval is also limited by the ‘depth’ parameter.
+Unlike HTTP recursion, FTP recursion is performed depth-first.
+
+ By default, Wget will create a local directory tree, corresponding to
+the one found on the remote server.
+
+ Recursive retrieving can find a number of applications, the most
+important of which is mirroring. It is also useful for WWW
+presentations, and any other opportunities where slow network
+connections should be bypassed by storing the files locally.
+
+ You should be warned that recursive downloads can overload the remote
+servers. Because of that, many administrators frown upon them and may
+ban access from your site if they detect very fast downloads of big
+amounts of content. When downloading from Internet servers, consider
+using the ‘-w’ option to introduce a delay between accesses to the
+server. The download will take a while longer, but the server
+administrator will not be alarmed by your rudeness.
+
+ Of course, recursive download may cause problems on your machine. If
+left to run unchecked, it can easily fill up the disk. If downloading
+from a local network, it can also take bandwidth on the system, as well as
+consume memory and CPU.
+
+ Try to specify the criteria that match the kind of download you are
+trying to achieve. If you want to download only one page, use
+‘--page-requisites’ without any additional recursion. If you want to
+download things under one directory, use ‘-np’ to avoid downloading
+things from other directories. If you want to download all the files
+from one directory, use ‘-l 1’ to make sure the recursion depth never
+exceeds one. *Note Following Links::, for more information about this.
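+
+ The three cases above might be wrapped as shell functions. This is
+only a sketch: the function names and the ‘WGET’ variable (which lets a
+script substitute another program, such as ‘echo’, to preview the
+command line) are assumptions, not Wget features.
+
```shell
# Hypothetical wrappers for the three cases described above.
# Set WGET=echo to print the arguments instead of running Wget.

# One page plus the files needed to display it, no further recursion.
fetch_single_page() {
    "${WGET:-wget}" --page-requisites "$1"
}

# Everything below one directory, never ascending to the parent.
fetch_below_directory() {
    "${WGET:-wget}" -r -np "$1"
}

# Only the files linked directly from one directory, waiting one second
# between requests to go easy on the server.
fetch_directory_files() {
    "${WGET:-wget}" -r -l 1 -np -w 1 "$1"
}
```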
+
+ Recursive retrieval should be used with care. Don’t say you were not
+warned.
+
+
+File: wget.info, Node: Following Links, Next: Time-Stamping, Prev: Recursive Download, Up: Top
+
+4 Following Links
+*****************
+
+When retrieving recursively, one does not wish to retrieve loads of
+unnecessary data. Most of the time the users bear in mind exactly what
+they want to download, and want Wget to follow only specific links.
+
+ For example, if you wish to download the music archive from
+‘fly.srk.fer.hr’, you will not want to download all the home pages that
+happen to be referenced by an obscure part of the archive.
+
+ Wget possesses several mechanisms that allow you to fine-tune which
+links it will follow.
+
+* Menu:
+
+* Spanning Hosts:: (Un)limiting retrieval based on host name.
+* Types of Files:: Getting only certain files.
+* Directory-Based Limits:: Getting only certain directories.
+* Relative Links:: Follow relative links only.
+* FTP Links:: Following FTP links.
+
+
+File: wget.info, Node: Spanning Hosts, Next: Types of Files, Prev: Following Links, Up: Following Links
+
+4.1 Spanning Hosts
+==================
+
+Wget’s recursive retrieval normally refuses to visit hosts different
+than the one you specified on the command line. This is a reasonable
+default; without it, every retrieval would have the potential to turn
+your Wget into a small version of Google.
+
+ However, visiting different hosts, or “host spanning,” is sometimes a
+useful option. Maybe the images are served from a different server.
+Maybe you’re mirroring a site that consists of pages interlinked between
+three servers. Maybe the server has two equivalent names, and the HTML
+pages refer to both interchangeably.
+
+Span to any host—‘-H’
+
+ The ‘-H’ option turns on host spanning, thus allowing Wget’s
+ recursive run to visit any host referenced by a link. Unless
+ sufficient recursion-limiting criteria are applied, these
+ foreign hosts will typically link to yet more hosts, and so on
+ until Wget ends up sucking up much more data than you have
+ intended.
+
+Limit spanning to certain domains—‘-D’
+
+ The ‘-D’ option allows you to specify the domains that will be
+ followed, thus limiting the recursion only to the hosts that belong
+ to these domains. Obviously, this makes sense only in conjunction
+ with ‘-H’. A typical example would be downloading the contents of
+ ‘www.example.com’, but allowing downloads from
+ ‘images.example.com’, etc.:
+
+ wget -rH -Dexample.com http://www.example.com/
+
+ You can specify more than one address by separating them with a
+ comma, e.g. ‘-Ddomain1.com,domain2.com’.
+
+Keep download off certain domains—‘--exclude-domains’
+
+ If there are domains you want to exclude specifically, you can do
+ it with ‘--exclude-domains’, which accepts the same type of
+ arguments as ‘-D’, but will _exclude_ all the listed domains. For
+ example, if you want to download all the hosts from ‘foo.edu’
+ domain, with the exception of ‘sunsite.foo.edu’, you can do it like
+ this:
+
+ wget -rH -Dfoo.edu --exclude-domains sunsite.foo.edu \
+ http://www.foo.edu/
+
+
+File: wget.info, Node: Types of Files, Next: Directory-Based Limits, Prev: Spanning Hosts, Up: Following Links
+
+4.2 Types of Files
+==================
+
+When downloading material from the web, you will often want to restrict
+the retrieval to only certain file types. For example, if you are
+interested in downloading GIFs, you will not be overjoyed to get loads
+of PostScript documents, and vice versa.
+
+ Wget offers two options to deal with this problem. Each option
+description lists a short name, a long name, and the equivalent command
+in ‘.wgetrc’.
+
+‘-A ACCLIST’
+‘--accept ACCLIST’
+‘accept = ACCLIST’
+‘--accept-regex URLREGEX’
+‘accept-regex = URLREGEX’
+ The argument to ‘--accept’ option is a list of file suffixes or
+ patterns that Wget will download during recursive retrieval. A
+ suffix is the ending part of a file, and consists of “normal”
+ letters, e.g. ‘gif’ or ‘.jpg’. A matching pattern contains
+ shell-like wildcards, e.g. ‘books*’ or ‘zelazny*196[0-9]*’.
+
+ So, specifying ‘wget -A gif,jpg’ will make Wget download only the
+ files ending with ‘gif’ or ‘jpg’, i.e. GIFs and JPEGs. On the
+ other hand, ‘wget -A "zelazny*196[0-9]*"’ will download only files
+ beginning with ‘zelazny’ and containing numbers from 1960 to 1969
+ anywhere within. Look up the manual of your shell for a
+ description of how pattern matching works.
+
+ Of course, any number of suffixes and patterns can be combined into
+ a comma-separated list, and given as an argument to ‘-A’.
+
+ The argument to ‘--accept-regex’ option is a regular expression
+ which is matched against the complete URL.
+
+‘-R REJLIST’
+‘--reject REJLIST’
+‘reject = REJLIST’
+‘--reject-regex URLREGEX’
+‘reject-regex = URLREGEX’
+ The ‘--reject’ option works the same way as ‘--accept’, only its
+ logic is the reverse; Wget will download all files _except_ the
+ ones matching the suffixes (or patterns) in the list.
+
+ So, if you want to download a whole page except for the cumbersome
+ MPEGs and .AU files, you can use ‘wget -R mpg,mpeg,au’.
+ Analogously, to download all files except the ones beginning with
+ ‘bjork’, use ‘wget -R "bjork*"’. The quotes are to prevent
+ expansion by the shell.
+
+ The argument to ‘--reject-regex’ option is a regular expression which
+is matched against the complete URL.
+
+The ‘-A’ and ‘-R’ options may be combined to achieve even better
+fine-tuning of which files to retrieve. E.g. ‘wget -A "*zelazny*" -R
+.ps’ will download all the files having ‘zelazny’ as a part of their
+name, but _not_ the PostScript files.
+
+ Note that these two options do not affect the downloading of HTML
+files (as determined by a ‘.htm’ or ‘.html’ filename suffix). This
+behavior may not be desirable for all users, and may be changed for
+future versions of Wget.
+
+ Note, too, that query strings (strings at the end of a URL beginning
+with a question mark, ‘?’) are not included as part of the filename for
+accept/reject rules, even though these will actually contribute to the
+name chosen for the local file. It is expected that a future version of
+Wget will provide an option to allow matching against query strings.
+
+ Finally, it’s worth noting that the accept/reject lists are matched
+_twice_ against downloaded files: once against the URL’s filename
+portion, to determine if the file should be downloaded in the first
+place; then, after it has been accepted and successfully downloaded, the
+local file’s name is also checked against the accept/reject lists to see
+if it should be removed. The rationale was that, since ‘.htm’ and
+‘.html’ files are always downloaded regardless of accept/reject rules,
+they should be removed _after_ being downloaded and scanned for links,
+if they did match the accept/reject lists. However, this can lead to
+unexpected results, since the local filenames can differ from the
+original URL filenames in the following ways, all of which can change
+whether an accept/reject rule matches:
+
+ • If the local file already exists and ‘--no-directories’ was
+ specified, a numeric suffix will be appended to the original name.
+ • If ‘--adjust-extension’ was specified, the local filename might
+ have ‘.html’ appended to it. If Wget is invoked with ‘-E -A.php’,
+ a filename such as ‘index.php’ will be accepted, but upon
+ download will be named ‘index.php.html’, which no longer matches,
+ and so the file will be deleted.
+ • Query strings do not contribute to URL matching, but are included
+ in local filenames, and so _do_ contribute to filename matching.
+
+This behavior, too, is considered less-than-desirable, and may change in
+a future version of Wget.
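+
+ The ‘-E -A.php’ case can be traced with a rough model of suffix
+matching, in which a name matches a suffix when it ends with it. The
+helper ‘has_suffix’ below is hypothetical and does not reproduce Wget's
+actual implementation; it only shows why the two checks disagree.
+
```shell
# Succeed when NAME ($1) ends with SUFFIX ($2).
# has_suffix is an illustrative helper, not Wget code.
has_suffix() {
    case "$1" in
        *"$2") return 0 ;;
        *)     return 1 ;;
    esac
}

# First check: the URL's filename portion, before the download.
has_suffix "index.php" ".php" && echo "URL name accepted"

# Second check: the local name after --adjust-extension appended .html.
has_suffix "index.php.html" ".php" || echo "local name no longer matches"
```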
+
+
+File: wget.info, Node: Directory-Based Limits, Next: Relative Links, Prev: Types of Files, Up: Following Links
+
+4.3 Directory-Based Limits
+==========================
+
+Regardless of other link-following facilities, it is often useful to
+place the restriction of what files to retrieve based on the directories
+those files are placed in. There can be many reasons for this—the home
+pages may be organized in a reasonable directory structure; or some
+directories may contain useless information, e.g. ‘/cgi-bin’ or ‘/dev’
+directories.
+
+ Wget offers three different options to deal with this requirement.
+Each option description lists a short name, a long name, and the
+equivalent command in ‘.wgetrc’.
+
+‘-I LIST’
+‘--include LIST’
+‘include_directories = LIST’
+ ‘-I’ option accepts a comma-separated list of directories included
+ in the retrieval. Any other directories will simply be ignored.
+ The directories are absolute paths.
+
+ So, if you wish to download from ‘http://host/people/bozo/’
+ following only links to bozo’s colleagues in the ‘/people’
+ directory and the bogus scripts in ‘/cgi-bin’, you can specify:
+
+ wget -I /people,/cgi-bin http://host/people/bozo/
+
+‘-X LIST’
+‘--exclude LIST’
+‘exclude_directories = LIST’
+ ‘-X’ option is exactly the reverse of ‘-I’—this is a list of
+ directories _excluded_ from the download. E.g. if you do not want
+ Wget to download things from ‘/cgi-bin’ directory, specify ‘-X
+ /cgi-bin’ on the command line.
+
+ The same as with ‘-A’/‘-R’, these two options can be combined to
+ get a better fine-tuning of downloading subdirectories. E.g. if
+ you want to load all the files from ‘/pub’ hierarchy except for
+ ‘/pub/worthless’, specify ‘-I/pub -X/pub/worthless’.
+
+‘-np’
+‘--no-parent’
+‘no_parent = on’
+ The simplest, and often very useful way of limiting directories is
+ disallowing retrieval of the links that refer to the hierarchy
+ “above” the beginning directory, i.e. disallowing ascent to
+ the parent directory/directories.
+
+ The ‘--no-parent’ option (short ‘-np’) is useful in this case.
+ Using it guarantees that you will never leave the existing
+ hierarchy. Supposing you issue Wget with:
+
+ wget -r --no-parent http://somehost/~luzer/my-archive/
+
+ You may rest assured that none of the references to
+ ‘/~his-girls-homepage/’ or ‘/~luzer/all-my-mpegs/’ will be
+ followed. Only the archive you are interested in will be
+ downloaded. Essentially, ‘--no-parent’ is similar to
+ ‘-I/~luzer/my-archive’, only it handles redirections in a more
+ intelligent fashion.
+
+ *Note* that, for HTTP (and HTTPS), the trailing slash is very
+ important to ‘--no-parent’. HTTP has no concept of a
+ “directory”—Wget relies on you to indicate what’s a directory and
+ what isn’t. In ‘http://foo/bar/’, Wget will consider ‘bar’ to be a
+ directory, while in ‘http://foo/bar’ (no trailing slash), ‘bar’
+ will be considered a filename (so ‘--no-parent’ would be
+ meaningless, as its parent is ‘/’).
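+
+ The effect of the trailing slash can be seen by working out which
+ directory a URL's path falls in. The helper ‘url_directory’ below is
+ hypothetical and uses plain shell parameter expansion; it illustrates
+ the rule described above and is not how Wget is implemented. It
+ assumes a URL of the form ‘scheme://host/path’.
+
```shell
# Print the directory part of a URL's path, treating the last component
# as a file name unless the URL ends with a slash.
# url_directory is an illustrative helper, not Wget code.
url_directory() {
    _path="/${1#*://*/}"          # drop "scheme://host/", keep the path
    printf '%s\n' "${_path%/*}/"  # drop everything after the last slash
}

url_directory "http://foo/bar/"   # bar is a directory
url_directory "http://foo/bar"    # bar is a file name; its parent is the root
```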
+
+
+File: wget.info, Node: Relative Links, Next: FTP Links, Prev: Directory-Based Limits, Up: Following Links
+
+4.4 Relative Links
+==================
+
+When ‘-L’ is turned on, only the relative links are ever followed.
+Relative links are here defined as those that do not refer to the web
+server root. For example, these links are relative:
+
+ <a href="foo.gif">
+ <a href="foo/bar.gif">
+ <a href="../foo/bar.gif">
+
+ These links are not relative:
+
+ <a href="/foo.gif">
+ <a href="/foo/bar.gif">
+ <a href="http://www.example.com/foo/bar.gif">
+
+ Using this option guarantees that recursive retrieval will not span
+hosts, even without ‘-H’. In simple cases it also allows downloads to
+“just work” without having to convert links.
+
+ This option is probably not very useful and might be removed in a
+future release.
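+
+ The definition above can be restated as a small shell test. The
+helper name ‘is_relative_link’ is hypothetical; the sketch classifies
+the example links shown earlier and makes no claim about how Wget itself
+parses them.
+
```shell
# Succeed when HREF ($1) is relative in the sense used above: it neither
# starts at the server root nor names a full URL.
# is_relative_link is an illustrative helper, not Wget code.
is_relative_link() {
    case "$1" in
        /*|*://*) return 1 ;;
        *)        return 0 ;;
    esac
}

for href in foo.gif ../foo/bar.gif /foo.gif http://www.example.com/foo/bar.gif; do
    if is_relative_link "$href"; then
        echo "relative:     $href"
    else
        echo "not relative: $href"
    fi
done
```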
+
+
+File: wget.info, Node: FTP Links, Prev: Relative Links, Up: Following Links
+
+4.5 Following FTP Links
+=======================
+
+The rules for FTP are somewhat specific, as it is necessary for them to
+be. FTP links in HTML documents are often included for purposes of
+reference, and it is often inconvenient to download them by default.
+
+ To have FTP links followed from HTML documents, you need to specify
+the ‘--follow-ftp’ option. Having done that, FTP links will span hosts
+regardless of ‘-H’ setting. This is logical, as FTP links rarely point
+to the same host where the HTTP server resides. For similar reasons,
+the ‘-L’ option has no effect on such downloads. On the other hand,
+domain acceptance (‘-D’) and suffix rules (‘-A’ and ‘-R’) apply
+normally.
+
+ Also note that followed links to FTP directories will not be
+retrieved recursively further.
+
+
+File: wget.info, Node: Time-Stamping, Next: Startup File, Prev: Following Links, Up: Top
+
+5 Time-Stamping
+***************
+
+One of the most important aspects of mirroring information from the
+Internet is updating your archives.
+
+ Downloading the whole archive again and again, just to replace a few
+changed files is expensive, both in terms of wasted bandwidth and money,
+and the time to do the update. This is why all the mirroring tools
+offer the option of incremental updating.
+
+ Such an updating mechanism means that the remote server is scanned in
+search of “new” files. Only those new files will be downloaded in the
+place of the old ones.
+
+ A file is considered new if one of these two conditions is met:
+
+ 1. A file of that name does not already exist locally.
+
+ 2. A file of that name does exist, but the remote file was modified
+ more recently than the local file.
+
+ To implement this, the program needs to be aware of the time of last
+modification of both local and remote files. We call this information
+the “time-stamp” of a file.
+
+ The time-stamping in GNU Wget is turned on using ‘--timestamping’
+(‘-N’) option, or through ‘timestamping = on’ directive in ‘.wgetrc’.
+With this option, for each file it intends to download, Wget will check
+whether a local file of the same name exists. If it does, and the
+remote file is not newer, Wget will not download it.
+
+ If the local file does not exist, or the sizes of the files do not
+match, Wget will download the remote file no matter what the time-stamps
+say.
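+
+ The decision just described can be written down as a short function.
+This is a sketch under stated assumptions: the name ‘should_download’
+and its argument order are hypothetical, times are taken to be seconds
+since the epoch, and all values are passed in as plain numbers rather
+than read from real files or servers.
+
```shell
# Decide whether a remote file should be fetched under -N.
#   $1 = 1 if the local file exists, 0 otherwise
#   $2 = local modification time    $3 = local size in bytes
#   $4 = remote modification time   $5 = remote size in bytes
# should_download is an illustrative helper, not Wget code.
should_download() {
    [ "$1" -eq 0 ] && return 0      # no local copy
    [ "$3" -ne "$5" ] && return 0   # sizes differ, time-stamps ignored
    [ "$4" -gt "$2" ] && return 0   # remote file is newer
    return 1                        # local copy is up to date
}

# Same size, remote copy older than the local one: nothing to fetch.
should_download 1 200 10 100 10 || echo "skip"
```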
+
+* Menu:
+
+* Time-Stamping Usage::
+* HTTP Time-Stamping Internals::
+* FTP Time-Stamping Internals::
+
+
+File: wget.info, Node: Time-Stamping Usage, Next: HTTP Time-Stamping Internals, Prev: Time-Stamping, Up: Time-Stamping
+
+5.1 Time-Stamping Usage
+=======================
+
+The usage of time-stamping is simple. Say you would like to download a
+file so that it keeps its date of modification.
+
+ wget -S http://www.gnu.ai.mit.edu/
+
+ A simple ‘ls -l’ shows that the time stamp on the local file equals
+the value of the ‘Last-Modified’ header, as returned by the server. As
+you can see, the time-stamping info is preserved locally, even without
+‘-N’ (at least for HTTP).
+
+ Several days later, you would like Wget to check if the remote file
+has changed, and download it if it has.
+
+ wget -N http://www.gnu.ai.mit.edu/
+
+ Wget will ask the server for the last-modified date. If the local
+file has the same timestamp as the server, or a newer one, the remote
+file will not be re-fetched. However, if the remote file is more
+recent, Wget will proceed to fetch it.
+
+ The same goes for FTP. For example:
+
+ wget "ftp://ftp.ifi.uio.no/pub/emacs/gnus/*"
+
+ (The quotes around that URL are to prevent the shell from trying to
+interpret the ‘*’.)
+
+ After download, a local directory listing will show that the
+timestamps match those on the remote server. Reissuing the command with
+‘-N’ will make Wget re-fetch _only_ the files that have been modified
+since the last download.
+
+ If you wished to mirror the GNU archive every week, you would use a
+command like the following, weekly:
+
+ wget --timestamping -r ftp://ftp.gnu.org/pub/gnu/
+
+ Note that time-stamping will only work for files for which the server
+gives a timestamp. For HTTP, this depends on getting a ‘Last-Modified’
+header. For FTP, this depends on getting a directory listing with dates
+in a format that Wget can parse (*note FTP Time-Stamping Internals::).
+
+
+File: wget.info, Node: HTTP Time-Stamping Internals, Next: FTP Time-Stamping Internals, Prev: Time-Stamping Usage, Up: Time-Stamping
+
+5.2 HTTP Time-Stamping Internals
+================================
+
+Time-stamping in HTTP is implemented by checking the ‘Last-Modified’
+header. If you wish to retrieve the file ‘foo.html’ through HTTP, Wget
+will check whether ‘foo.html’ exists locally. If it doesn’t, ‘foo.html’
+will be retrieved unconditionally.
+
+ If the file does exist locally, Wget will first check its local
+time-stamp (similar to the way ‘ls -l’ checks it), and then send a
+‘HEAD’ request to the remote server, demanding the information on the
+remote file.
+
+ The ‘Last-Modified’ header is examined to find which file was
+modified more recently (which makes it “newer”). If the remote file is
+newer, it will be downloaded; if it is older, Wget will give up.(1)
+
+ When ‘--backup-converted’ (‘-K’) is specified in conjunction with
+‘-N’, server file ‘X’ is compared to local file ‘X.orig’, if extant,
+rather than being compared to local file ‘X’, which will always differ
+if it’s been converted by ‘--convert-links’ (‘-k’).
+
+ Arguably, HTTP time-stamping should be implemented using the
+‘If-Modified-Since’ request.
+
+ ---------- Footnotes ----------
+
+ (1) As an additional check, Wget will look at the ‘Content-Length’
+header, and compare the sizes; if they are not the same, the remote file
+will be downloaded no matter what the time-stamp says.
+
+
+File: wget.info, Node: FTP Time-Stamping Internals, Prev: HTTP Time-Stamping Internals, Up: Time-Stamping
+
+5.3 FTP Time-Stamping Internals
+===============================
+
+In theory, FTP time-stamping works much the same as HTTP, only FTP has
+no headers—time-stamps must be ferreted out of directory listings.
+
+ If an FTP download is recursive or uses globbing, Wget will use the
+FTP ‘LIST’ command to get a file listing for the directory containing
+the desired file(s). It will try to analyze the listing, treating it
+like Unix ‘ls -l’ output, extracting the time-stamps. The rest is
+exactly the same as for HTTP. Note that when retrieving individual
+files from an FTP server without using globbing or recursion, listing
+files will not be downloaded (and thus files will not be time-stamped)
+unless ‘-N’ is specified.
+
+ The assumption that every directory listing is a Unix-style listing may
+sound extremely constraining, but in practice it is not, as many
+non-Unix FTP servers use the Unixoid listing format because most (all?)
+of the clients understand it. Bear in mind that RFC959 defines no
+standard way to get a file list, let alone the time-stamps. We can only
+hope that a future standard will define this.
+
+ Another non-standard solution includes the use of the ‘MDTM’ command that
+is supported by some FTP servers (including the popular ‘wu-ftpd’),
+which returns the exact time of the specified file. Wget may support
+this command in the future.
+
+
+File: wget.info, Node: Startup File, Next: Examples, Prev: Time-Stamping, Up: Top
+
+6 Startup File
+**************
+
+Once you know how to change default settings of Wget through command
+line arguments, you may wish to make some of those settings permanent.
+You can do that in a convenient way by creating the Wget startup
+file—‘.wgetrc’.
+
+ Although ‘.wgetrc’ is the “main” initialization file, it is convenient
+to have a special facility for storing passwords. Thus Wget reads and
+interprets the contents of ‘$HOME/.netrc’, if it finds it. You can find
+‘.netrc’ format in your system manuals.
+
+ Wget reads ‘.wgetrc’ upon startup, recognizing a limited set of
+commands.
+
+* Menu:
+
+* Wgetrc Location:: Location of various wgetrc files.
+* Wgetrc Syntax:: Syntax of wgetrc.
+* Wgetrc Commands:: List of available commands.
+* Sample Wgetrc:: A wgetrc example.
+
+
+File: wget.info, Node: Wgetrc Location, Next: Wgetrc Syntax, Prev: Startup File, Up: Startup File
+
+6.1 Wgetrc Location
+===================
+
+When initializing, Wget will look for a “global” startup file,
+‘/usr/local/etc/wgetrc’ by default (or some prefix other than
+‘/usr/local’, if Wget was not installed there) and read commands from
+there, if it exists.
+
+ Then it will look for the user’s file. If the environment variable
+‘WGETRC’ is set, Wget will try to load that file. Failing that, no
+further attempts will be made.
+
+ If ‘WGETRC’ is not set, Wget will try to load ‘$HOME/.wgetrc’.
+
+ The fact that user’s settings are loaded after the system-wide ones
+means that in case of collision user’s wgetrc _overrides_ the
+system-wide wgetrc (in ‘/usr/local/etc/wgetrc’ by default). Fascist
+admins, away!
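+
+ The choice of user file can be summarized in a few lines of shell.
+The helper ‘user_wgetrc_path’ is hypothetical; it only prints the path
+that the rules above select and does not check that the file exists. It
+treats an empty ‘WGETRC’ the same as an unset one, which is an
+assumption of this sketch.
+
```shell
# Print the user startup file that would be consulted.
# user_wgetrc_path is an illustrative helper, not Wget code.
user_wgetrc_path() {
    if [ -n "${WGETRC:-}" ]; then
        printf '%s\n' "$WGETRC"          # WGETRC wins; no fallback is tried
    else
        printf '%s\n' "$HOME/.wgetrc"    # default location
    fi
}
```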
+
+
+File: wget.info, Node: Wgetrc Syntax, Next: Wgetrc Commands, Prev: Wgetrc Location, Up: Startup File
+
+6.2 Wgetrc Syntax
+=================
+
+The syntax of a wgetrc command is simple:
+
+ variable = value
+
+ The “variable” will also be called “command”. Valid “values” are
+different for different commands.
+
+ The commands are case-, underscore- and minus-insensitive. Thus
+‘DIr__PrefiX’, ‘DIr-PrefiX’ and ‘dirprefix’ are the same. Empty lines,
+lines beginning with ‘#’ and lines containing white-space only are
+discarded.
+
+ Commands that expect a comma-separated list will clear the list on an
+empty command. So, if you wish to reset the rejection list specified in
+global ‘wgetrc’, you can do it with:
+
+ reject =
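+
+ The insensitivity to case, underscores and minus signs can be
+pictured as a normalization step applied to each command name. The
+function below is a sketch of that idea using ‘tr’; the name
+‘normalize_wgetrc_command’ is hypothetical and this is not a description
+of Wget's parser.
+
```shell
# Lowercase a command name and drop every '_' and '-'.
# normalize_wgetrc_command is an illustrative helper, not Wget code.
normalize_wgetrc_command() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr -d '_-'
}

# All three spellings from the text reduce to the same name.
for name in DIr__PrefiX DIr-PrefiX dirprefix; do
    normalize_wgetrc_command "$name"
done
```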
+
+
+File: wget.info, Node: Wgetrc Commands, Next: Sample Wgetrc, Prev: Wgetrc Syntax, Up: Startup File
+
+6.3 Wgetrc Commands
+===================
+
+The complete set of commands is listed below. Legal values are listed
+after the ‘=’. Simple Boolean values can be set or unset using ‘on’ and
+‘off’ or ‘1’ and ‘0’.
+
+ Some commands take pseudo-arbitrary values. ADDRESS values can be
+hostnames or dotted-quad IP addresses. N can be any positive integer,
+or ‘inf’ for infinity, where appropriate. STRING values can be any
+non-empty string.
+
+ Most of these commands have direct command-line equivalents. Also,
+any wgetrc command can be specified on the command line using the
+‘--execute’ switch (*note Basic Startup Options::.)
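+
+ To experiment with these commands without touching ‘$HOME/.wgetrc’,
+a startup file can be written somewhere else and selected through
+‘WGETRC’. The sketch below is an illustration only: the function name
+‘write_sample_wgetrc’ and the chosen values are hypothetical, and the
+commands used are taken from the list that follows.
+
```shell
# Write a small startup file to the path given as $1.
# write_sample_wgetrc is an illustrative helper; the values are examples.
write_sample_wgetrc() {
    cat > "$1" <<'EOF'
# Lines beginning with '#' are discarded.
add_hostdir = off
backup_converted = on
connect_timeout = 30
reject = mpg,mpeg,au
EOF
}

# Possible use (the path and URL are placeholders):
#   write_sample_wgetrc /tmp/wgetrc.sample
#   WGETRC=/tmp/wgetrc.sample wget -r http://www.example.com/
```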
+
+accept/reject = STRING
+ Same as ‘-A’/‘-R’ (*note Types of Files::).
+
+add_hostdir = on/off
+ Enable/disable host-prefixed file names. ‘-nH’ disables it.
+
+ask_password = on/off
+ Prompt for a password for each connection established. Cannot be
+ specified when ‘--password’ is being used, because they are
+ mutually exclusive. Equivalent to ‘--ask-password’.
+
+auth_no_challenge = on/off
+ If this option is given, Wget will send Basic HTTP authentication
+ information (plaintext username and password) for all requests.
+ See ‘--auth-no-challenge’.
+
+background = on/off
+ Enable/disable going to background—the same as ‘-b’ (which enables
+ it).
+
+backup_converted = on/off
+ Enable/disable saving pre-converted files with the suffix
+ ‘.orig’—the same as ‘-K’ (which enables it).
+
+backups = NUMBER
+ Use up to NUMBER backups for a file. Backups are rotated by adding
+ an incremental counter that starts at ‘1’. The default is ‘0’.
+
+base = STRING
+ Consider relative URLs in input files (specified via the ‘input’
+ command or the ‘--input-file’/‘-i’ option, together with
+ ‘force_html’ or ‘--force-html’) as being relative to STRING—the
+ same as ‘--base=STRING’.
+
+bind_address = ADDRESS
+ Bind to ADDRESS, like ‘--bind-address=ADDRESS’.
+
+ca_certificate = FILE
+ Set the certificate authority bundle file to FILE. The same as
+ ‘--ca-certificate=FILE’.
+
+ca_directory = DIRECTORY
+ Set the directory used for certificate authorities. The same as
+ ‘--ca-directory=DIRECTORY’.
+
+cache = on/off
+ When set to off, disallow server-caching. See the ‘--no-cache’
+ option.
+
+certificate = FILE
+ Set the client certificate file name to FILE. The same as
+ ‘--certificate=FILE’.
+
+certificate_type = STRING
+ Specify the type of the client certificate, legal values being
+ ‘PEM’ (the default) and ‘DER’ (aka ASN1). The same as
+ ‘--certificate-type=STRING’.
+
+check_certificate = on/off
+ If this is set to off, the server certificate is not checked
+ against the specified certificate authorities. The default is “on”.
+ The same as ‘--check-certificate’.
+
+connect_timeout = N
+ Set the connect timeout—the same as ‘--connect-timeout’.
+
+content_disposition = on/off
+ Turn on recognition of the (non-standard) ‘Content-Disposition’
+ HTTP header—if set to ‘on’, the same as ‘--content-disposition’.
+
+trust_server_names = on/off
+ If set to on, construct the local file name from redirection URLs
+ rather than original URLs.
+
+continue = on/off
+ If set to on, force continuation of preexistent partially retrieved
+ files. See ‘-c’ before setting it.
+
+convert_links = on/off
+ Convert non-relative links locally. The same as ‘-k’.
+
+cookies = on/off
+ When set to off, disallow cookies. See the ‘--cookies’ option.
+
+cut_dirs = N
+ Ignore N remote directory components. Equivalent to
+ ‘--cut-dirs=N’.
+
+debug = on/off
+ Debug mode, same as ‘-d’.
+
+default_page = STRING
+ Default page name—the same as ‘--default-page=STRING’.
+
+delete_after = on/off
+ Delete after download—the same as ‘--delete-after’.
+
+dir_prefix = STRING
+ Top of directory tree—the same as ‘-P STRING’.
+
+dirstruct = on/off
+ Turning dirstruct on or off—the same as ‘-x’ or ‘-nd’,
+ respectively.
+
+dns_cache = on/off
+ Turn DNS caching on/off. Since DNS caching is on by default, this
+ option is normally used to turn it off and is equivalent to
+ ‘--no-dns-cache’.
+
+dns_timeout = N
+ Set the DNS timeout—the same as ‘--dns-timeout’.
+
+domains = STRING
+ Same as ‘-D’ (*note Spanning Hosts::).
+
+dot_bytes = N
+ Specify the number of bytes “contained” in a dot, as seen
+ throughout the retrieval (1024 by default). You can postfix the
+ value with ‘k’ or ‘m’, representing kilobytes and megabytes,
+ respectively. With dot settings you can tailor the dot retrieval
+ to suit your needs, or you can use the predefined “styles” (*note
+ Download Options::).
+
+dot_spacing = N
+ Specify the number of dots in a single cluster (10 by default).
+
+dots_in_line = N
+ Specify the number of dots that will be printed in each line
+ throughout the retrieval (50 by default).
+
+egd_file = FILE
+ Use FILE as the EGD socket file name. The same as
+ ‘--egd-file=FILE’.
+
+exclude_directories = STRING
+ Specify a comma-separated list of directories you wish to exclude
+ from download—the same as ‘-X STRING’ (*note Directory-Based
+ Limits::).
+
+exclude_domains = STRING
+ Same as ‘--exclude-domains=STRING’ (*note Spanning Hosts::).
+
+follow_ftp = on/off
+ Follow FTP links from HTML documents—the same as ‘--follow-ftp’.
+
+follow_tags = STRING
+ Only follow certain HTML tags when doing a recursive retrieval,
+ just like ‘--follow-tags=STRING’.
+
+force_html = on/off
+ If set to on, force the input filename to be regarded as an HTML
+ document—the same as ‘-F’.
+
+ftp_password = STRING
+ Set your FTP password to STRING. Without this setting, the
+ password defaults to ‘-wget@’, which is a useful default for
+ anonymous FTP access.
+
+ This command used to be named ‘passwd’ prior to Wget 1.10.
+
+ftp_proxy = STRING
+ Use STRING as FTP proxy, instead of the one specified in
+ environment.
+
+ftp_user = STRING
+ Set FTP user to STRING.
+
+ This command used to be named ‘login’ prior to Wget 1.10.
+
+glob = on/off
+ Turn globbing on/off—the same as ‘--glob’ and ‘--no-glob’.
+
+header = STRING
+ Define a header for HTTP downloads, like using ‘--header=STRING’.
+
+compression = STRING
+ Choose the compression type to be used. Legal values are ‘auto’
+ (the default), ‘gzip’, and ‘none’. The same as
+ ‘--compression=STRING’.
+
+adjust_extension = on/off
+ Add a ‘.html’ extension to ‘text/html’ or ‘application/xhtml+xml’
+ files that lack one, a ‘.css’ extension to ‘text/css’ files that
+ lack one, and a ‘.br’, ‘.Z’, ‘.zlib’ or ‘.gz’ extension to
+ compressed files. The same as ‘-E’. Previously named
+ ‘html_extension’ (still acceptable, but deprecated).
+
+http_keep_alive = on/off
+ Turn the keep-alive feature on or off (defaults to on). Turning it
+ off is equivalent to ‘--no-http-keep-alive’.
+
+http_password = STRING
+ Set HTTP password, equivalent to ‘--http-password=STRING’.
+
+http_proxy = STRING
+ Use STRING as HTTP proxy, instead of the one specified in
+ environment.
+
+http_user = STRING
+ Set HTTP user to STRING, equivalent to ‘--http-user=STRING’.
+
+https_only = on/off
+ When in recursive mode, only HTTPS links are followed (defaults to
+ off).
+
+https_proxy = STRING
+ Use STRING as HTTPS proxy, instead of the one specified in
+ environment.
+
+ignore_case = on/off
+ When set to on, match files and directories case insensitively; the
+ same as ‘--ignore-case’.
+
+ignore_length = on/off
+ When set to on, ignore ‘Content-Length’ header; the same as
+ ‘--ignore-length’.
+
+ignore_tags = STRING
+ Ignore certain HTML tags when doing a recursive retrieval, like
+ ‘--ignore-tags=STRING’.
+
+include_directories = STRING
+ Specify a comma-separated list of directories you wish to follow
+ when downloading—the same as ‘-I STRING’.
+
+iri = on/off
+ When set to on, enable internationalized URI (IRI) support; the
+ same as ‘--iri’.
+
+inet4_only = on/off
+ Force connecting to IPv4 addresses, off by default. You can put
+ this in the global init file to disable Wget’s attempts to resolve
+ and connect to IPv6 hosts. Available only if Wget was compiled
+ with IPv6 support. The same as ‘--inet4-only’ or ‘-4’.
+
+inet6_only = on/off
+ Force connecting to IPv6 addresses, off by default. Available only
+ if Wget was compiled with IPv6 support. The same as ‘--inet6-only’
+ or ‘-6’.
+
+input = FILE
+ Read the URLs from FILE, like ‘-i FILE’.
+
+keep_session_cookies = on/off
+ When specified, causes ‘save_cookies = on’ to also save session
+ cookies. See ‘--keep-session-cookies’.
+
+limit_rate = RATE
+ Limit the download speed to no more than RATE bytes per second.
+ The same as ‘--limit-rate=RATE’.
+
+load_cookies = FILE
+ Load cookies from FILE. See ‘--load-cookies FILE’.
+
+local_encoding = ENCODING
+ Force Wget to use ENCODING as the default system encoding. See
+ ‘--local-encoding’.
+
+logfile = FILE
+ Set logfile to FILE, the same as ‘-o FILE’.
+
+max_redirect = NUMBER
+ Specifies the maximum number of redirections to follow for a
+ resource. See ‘--max-redirect=NUMBER’.
+
+mirror = on/off
+ Turn mirroring on/off. The same as ‘-m’.
+
+netrc = on/off
+ Turn reading netrc on or off.
+
+no_clobber = on/off
+ Same as ‘-nc’.
+
+no_parent = on/off
+ Disallow retrieving outside the directory hierarchy, like
+ ‘--no-parent’ (*note Directory-Based Limits::).
+
+no_proxy = STRING
+ Use STRING as the comma-separated list of domains to avoid in proxy
+ loading, instead of the one specified in environment.
+
+output_document = FILE
+ Set the output filename—the same as ‘-O FILE’.
+
+page_requisites = on/off
+ Download all ancillary documents necessary for a single HTML page
+ to display properly—the same as ‘-p’.
+
+passive_ftp = on/off
+ Change setting of passive FTP, equivalent to the ‘--passive-ftp’
+ option.
+
+password = STRING
+ Specify password STRING for both FTP and HTTP file retrieval. This
+ command can be overridden using the ‘ftp_password’ and
+ ‘http_password’ commands for FTP and HTTP respectively.
+
+post_data = STRING
+ Use POST as the method for all HTTP requests and send STRING in the
+ request body. The same as ‘--post-data=STRING’.
+
+post_file = FILE
+ Use POST as the method for all HTTP requests and send the contents
+ of FILE in the request body. The same as ‘--post-file=FILE’.
+
+prefer_family = none/IPv4/IPv6
+ When given a choice of several addresses, connect to the addresses
+ with specified address family first. The address order returned by
+ DNS is used without change by default. The same as
+ ‘--prefer-family’, which see for a detailed discussion of why this
+ is useful.
+
+private_key = FILE
+ Set the private key file to FILE. The same as
+ ‘--private-key=FILE’.
+
+private_key_type = STRING
+ Specify the type of the private key, legal values being ‘PEM’ (the
+ default) and ‘DER’ (aka ASN1). The same as
+ ‘--private-key-type=STRING’.
+
+progress = STRING
+ Set the type of the progress indicator. Legal types are ‘dot’ and
+ ‘bar’. Equivalent to ‘--progress=STRING’.
+
+protocol_directories = on/off
+ When set, use the protocol name as a directory component of local
+ file names. The same as ‘--protocol-directories’.
+
+proxy_password = STRING
+ Set proxy authentication password to STRING, like
+ ‘--proxy-password=STRING’.
+
+proxy_user = STRING
+ Set proxy authentication user name to STRING, like
+ ‘--proxy-user=STRING’.
+
+quiet = on/off
+ Quiet mode—the same as ‘-q’.
+
+quota = QUOTA
+ Specify the download quota, which is useful to put in the global
+ ‘wgetrc’. When download quota is specified, Wget will stop
+ retrieving after the download sum has become greater than quota.
+ The quota can be specified in bytes (default), kbytes (‘k’ appended)
+ or mbytes (‘m’ appended). Thus ‘quota = 5m’ will set the quota to
+ 5 megabytes. Note that the user’s startup file overrides system
+ settings.
+
+random_file = FILE
+ Use FILE as a source of randomness on systems lacking
+ ‘/dev/random’.
+
+random_wait = on/off
+ Turn random between-request wait times on or off. The same as
+ ‘--random-wait’.
+
+read_timeout = N
+ Set the read (and write) timeout—the same as ‘--read-timeout=N’.
+
+reclevel = N
+ Recursion level (depth)—the same as ‘-l N’.
+
+recursive = on/off
+ Recursive on/off—the same as ‘-r’.
+
+referer = STRING
+ Set HTTP ‘Referer:’ header just like ‘--referer=STRING’. (Note
+ that it was the folks who wrote the HTTP spec who got the spelling
+ of “referrer” wrong.)
+
+relative_only = on/off
+ Follow only relative links—the same as ‘-L’ (*note Relative
+ Links::).
+
+remote_encoding = ENCODING
+ Force Wget to use ENCODING as the default remote server encoding.
+ See ‘--remote-encoding’.
+
+remove_listing = on/off
+ If set to on, remove FTP listings downloaded by Wget. Setting it
+ to off is the same as ‘--no-remove-listing’.
+
+restrict_file_names = unix/windows
+ Restrict the file names generated by Wget from URLs. See
+ ‘--restrict-file-names’ for a more detailed description.
+
+retr_symlinks = on/off
+ When set to on, retrieve symbolic links as if they were plain
+ files; the same as ‘--retr-symlinks’.
+
+retry_connrefused = on/off
+ When set to on, consider “connection refused” a transient error—the
+ same as ‘--retry-connrefused’.
+
+robots = on/off
+ Specify whether the norobots convention is respected by Wget, “on”
+ by default. This switch controls both the ‘/robots.txt’ and the
+ ‘nofollow’ aspect of the spec. *Note Robot Exclusion::, for more
+ details about this. Be sure you know what you are doing before
+ turning this off.
+
+save_cookies = FILE
+ Save cookies to FILE. The same as ‘--save-cookies FILE’.
+
+save_headers = on/off
+ Same as ‘--save-headers’.
+
+secure_protocol = STRING
+ Choose the secure protocol to be used. Legal values are ‘auto’
+ (the default), ‘SSLv2’, ‘SSLv3’, and ‘TLSv1’. The same as
+ ‘--secure-protocol=STRING’.
+
+server_response = on/off
+ Choose whether or not to print the HTTP and FTP server
+ responses—the same as ‘-S’.
+
+show_all_dns_entries = on/off
+ When a DNS name is resolved, show all the IP addresses, not just
+ the first three.
+
+span_hosts = on/off
+ Same as ‘-H’.
+
+spider = on/off
+ Same as ‘--spider’.
+
+strict_comments = on/off
+ Same as ‘--strict-comments’.
+
+timeout = N
+ Set all applicable timeout values to N, the same as ‘-T N’.
+
+timestamping = on/off
+ Turn timestamping on/off. The same as ‘-N’ (*note
+ Time-Stamping::).
+
+use_server_timestamps = on/off
+ If set to ‘off’, Wget won’t set the local file’s timestamp by the
+ one on the server (same as ‘--no-use-server-timestamps’).
+
+tries = N
+ Set number of retries per URL—the same as ‘-t N’.
+
+use_proxy = on/off
+ When set to off, don’t use proxy even when proxy-related
+ environment variables are set. In that case it is the same as
+ using ‘--no-proxy’.
+
+user = STRING
+ Specify username STRING for both FTP and HTTP file retrieval. This
+ command can be overridden using the ‘ftp_user’ and ‘http_user’
+ commands for FTP and HTTP respectively.
+
+user_agent = STRING
+ User agent identification sent to the HTTP Server—the same as
+ ‘--user-agent=STRING’.
+
+verbose = on/off
+ Turn verbose on/off—the same as ‘-v’/‘-nv’.
+
+wait = N
+ Wait N seconds between retrievals—the same as ‘-w N’.
+
+wait_retry = N
+ Wait up to N seconds between retries of failed retrievals only—the
+ same as ‘--waitretry=N’. Note that this is turned on by default in
+ the global ‘wgetrc’.
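+
+   As a rough sketch of how a few of the commands above combine (the
+values and the credentials are illustrative assumptions, not
+recommended settings), a user’s ‘.wgetrc’ might contain:
+
+     # Use the dot progress indicator with custom dot settings.
+     progress = dot
+     # Each dot stands for 8 kilobytes (the default is 1024 bytes).
+     dot_bytes = 8k
+     # 16 dots per cluster and 48 dots per line, so that one full
+     # line represents 48 * 8 = 384 kilobytes.
+     dot_spacing = 16
+     dots_in_line = 48
+
+     # Hypothetical credentials used for both FTP and HTTP...
+     user = jdoe
+     password = example-secret
+     # ...overridden for FTP only.
+     ftp_user = anonymous
+     ftp_password = jdoe@example.org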
+
+
+File: wget.info, Node: Sample Wgetrc, Prev: Wgetrc Commands, Up: Startup File
+
+6.4 Sample Wgetrc
+=================
+
+This is the sample initialization file, as given in the distribution.
+It is divided into two sections: one for global usage (suitable for the
+global startup file), and one for local usage (suitable for
+‘$HOME/.wgetrc’). Be careful about the things you change.
+
+ Note that almost all the lines are commented out. For a command to
+have any effect, you must remove the ‘#’ character at the beginning of
+its line.
+
+ ###
+ ### Sample Wget initialization file .wgetrc
+ ###
+
+ ## You can use this file to change the default behaviour of wget or to
+ ## avoid having to type many many command-line options. This file does
+ ## not contain a comprehensive list of commands -- look at the manual
+ ## to find out what you can put into this file. You can find this here:
+ ## $ info wget.info 'Startup File'
+ ## Or online here:
+ ## https://www.gnu.org/software/wget/manual/wget.html#Startup-File
+ ##
+ ## Wget initialization file can reside in /usr/local/etc/wgetrc
+ ## (global, for all users) or $HOME/.wgetrc (for a single user).
+ ##
+ ## To use the settings in this file, you will have to uncomment them,
+ ## as well as change them, in most cases, as the values on the
+ ## commented-out lines are the default values (e.g. "off").
+ ##
+ ## Commands are case-, underscore- and minus-insensitive.
+ ## For example ftp_proxy, ftp-proxy and ftpproxy are the same.
+
+
+ ##
+ ## Global settings (useful for setting up in /usr/local/etc/wgetrc).
+ ## Think well before you change them, since they may reduce wget's
+ ## functionality, and make it behave contrary to the documentation:
+ ##
+
+ # You can set retrieve quota for beginners by specifying a value
+ # optionally followed by 'K' (kilobytes) or 'M' (megabytes). The
+ # default quota is unlimited.
+ #quota = inf
+
+ # You can lower (or raise) the default number of retries when
+ # downloading a file (default is 20).
+ #tries = 20
+
+ # Lowering the maximum depth of the recursive retrieval is handy to
+ # prevent newbies from going too "deep" when they unwittingly start
+ # the recursive retrieval. The default is 5.
+ #reclevel = 5
+
+ # By default Wget uses "passive FTP" transfer where the client
+ # initiates the data connection to the server rather than the other
+ # way around. That is required on systems behind NAT where the client
+ # computer cannot be easily reached from the Internet. However, some
+ # firewall software explicitly supports active FTP and in fact has
+ # problems supporting passive transfer. If you are in such an
+ # environment, use "passive_ftp = off" to revert to active FTP.
+ #passive_ftp = off
+
+ # The "wait" command below makes Wget wait between every connection.
+ # If, instead, you want Wget to wait only between retries of failed
+ # downloads, set waitretry to maximum number of seconds to wait (Wget
+ # will use "linear backoff", waiting 1 second after the first failure
+ # on a file, 2 seconds after the second failure, etc. up to this max).
+ #waitretry = 10
+
+
+ ##
+ ## Local settings (for a user to set in his $HOME/.wgetrc). It is
+ ## *highly* undesirable to put these settings in the global file, since
+ ## they are potentially dangerous to "normal" users.
+ ##
+ ## Even when setting up your own ~/.wgetrc, you should know what you
+ ## are doing before doing so.
+ ##
+
+ # Set this to on to use timestamping by default:
+ #timestamping = off
+
+ # It is a good idea to make Wget send your email address in a `From:'
+ # header with your request (so that server administrators can contact
+ # you in case of errors). Wget does *not* send `From:' by default.
+ #header = From: Your Name <username@site.domain>
+
+ # You can set up other headers, like Accept-Language. Accept-Language
+ # is *not* sent by default.
+ #header = Accept-Language: en
+
+ # You can set the default proxies for Wget to use for http, https, and ftp.
+ # They will override the value in the environment.
+ #https_proxy = http://proxy.yoyodyne.com:18023/
+ #http_proxy = http://proxy.yoyodyne.com:18023/
+ #ftp_proxy = http://proxy.yoyodyne.com:18023/
+
+ # If you do not want to use proxy at all, set this to off.
+ #use_proxy = on
+
+ # You can customize the retrieval outlook. Valid options are default,
+ # binary, mega and micro.
+ #dot_style = default
+
+ # Setting this to off makes Wget not download /robots.txt. Be sure to
+ # know *exactly* what /robots.txt is and how it is used before changing
+ # the default!
+ #robots = on
+
+ # It can be useful to make Wget wait between connections. Set this to
+ # the number of seconds you want Wget to wait.
+ #wait = 0
+
+ # You can force creating directory structure, even if a single file is
+ # being retrieved, by setting this to on.
+ #dirstruct = off
+
+ # You can turn on recursive retrieving by default (don't do this if
+ # you are not sure you know what it means) by setting this to on.
+ #recursive = off
+
+ # To always back up file X as X.orig before converting its links (due
+ # to -k / --convert-links / convert_links = on having been specified),
+ # set this variable to on:
+ #backup_converted = off
+
+ # To have Wget follow FTP links from HTML files by default, set this
+ # to on:
+ #follow_ftp = off
+
+ # To try ipv6 addresses first:
+ #prefer-family = IPv6
+
+ # Set default IRI support state
+ #iri = off
+
+ # Force the default system encoding
+ #localencoding = UTF-8
+
+ # Force the default remote server encoding
+ #remoteencoding = UTF-8
+
+ # Turn on to prevent following non-HTTPS links when in recursive mode
+ #httpsonly = off
+
+ # Tune HTTPS security (auto, SSLv2, SSLv3, TLSv1, PFS)
+ #secureprotocol = auto
+
+
+File: wget.info, Node: Examples, Next: Various, Prev: Startup File, Up: Top
+
+7 Examples
+**********
+
+The examples are divided into three sections loosely based on their
+complexity.
+
+* Menu:
+
+* Simple Usage:: Simple, basic usage of the program.
+* Advanced Usage:: Advanced tips.
+* Very Advanced Usage:: The hairy stuff.
+
+
+File: wget.info, Node: Simple Usage, Next: Advanced Usage, Prev: Examples, Up: Examples
+
+7.1 Simple Usage
+================
+
+ • Say you want to download a URL. Just type:
+
+ wget http://fly.srk.fer.hr/
+
+ • But what will happen if the connection is slow, and the file is
+ lengthy? The connection will probably fail before the whole file
+ is retrieved, more than once. In this case, Wget will try getting
+ the file until it either gets the whole of it, or exceeds the
+ default number of retries (this being 20). It is easy to change
+ the number of tries to 45, to ensure that the whole file will
+ arrive safely:
+
+ wget --tries=45 http://fly.srk.fer.hr/jpg/flyweb.jpg
+
+ • Now let’s leave Wget to work in the background, and write its
+ progress to log file ‘log’. It is tiring to type ‘--tries’, so we
+ shall use ‘-t’.
+
+ wget -t 45 -o log http://fly.srk.fer.hr/jpg/flyweb.jpg &
+
+ The ampersand at the end of the line makes sure that Wget works in
+ the background. To unlimit the number of retries, use ‘-t inf’.
+
+ • The usage of FTP is as simple. Wget will take care of login and
+ password.
+
+ wget ftp://gnjilux.srk.fer.hr/welcome.msg
+
+ • If you specify a directory, Wget will retrieve the directory
+ listing, parse it and convert it to HTML. Try:
+
+ wget ftp://ftp.gnu.org/pub/gnu/
+ links index.html
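+
+   If you use the same retry count and log file regularly, the
+corresponding startup file commands can stand in for the options. A
+minimal sketch, assuming the values from the examples above:
+
+     # Same as -t 45
+     tries = 45
+     # Same as -o log
+     logfile = log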
+
+
+File: wget.info, Node: Advanced Usage, Next: Very Advanced Usage, Prev: Simple Usage, Up: Examples
+
+7.2 Advanced Usage
+==================
+
+ • You have a file that contains the URLs you want to download? Use
+ the ‘-i’ switch:
+
+ wget -i FILE
+
+ If you specify ‘-’ as file name, the URLs will be read from
+ standard input.
+
+ • Create a five levels deep mirror image of the GNU web site, with
+ the same directory structure the original has, with only one try
+ per document, saving the log of the activities to ‘gnulog’:
+
+ wget -r -t1 https://www.gnu.org/ -o gnulog
+
+ • The same as the above, but convert the links in the downloaded
+ files to point to local files, so you can view the documents
+ off-line:
+
+ wget --convert-links -r -t1 https://www.gnu.org/ -o gnulog
+
+ • Retrieve only one HTML page, but make sure that all the elements
+ needed for the page to be displayed, such as inline images and
+ external style sheets, are also downloaded. Also make sure the
+ downloaded page references the downloaded links.
+
+ wget -p --convert-links http://www.example.com/dir/page.html
+
+ The HTML page will be saved to ‘www.example.com/dir/page.html’, and
+ the images, stylesheets, etc., somewhere under ‘www.example.com/’,
+ depending on where they were on the remote server.
+
+ • The same as the above, but without the ‘www.example.com/’
+ directory. In fact, I don’t want to have all those random server
+ directories anyway—just save _all_ those files under a ‘download/’
+ subdirectory of the current directory.
+
+ wget -p --convert-links -nH -nd -Pdownload \
+ http://www.example.com/dir/page.html
+
+ • Retrieve the index.html of ‘www.lycos.com’, showing the original
+ server headers:
+
+ wget -S http://www.lycos.com/
+
+ • Save the server headers with the file, perhaps for post-processing.
+
+ wget --save-headers http://www.lycos.com/
+ more index.html
+
+ • Retrieve the first two levels of ‘wuarchive.wustl.edu’, saving them
+ to ‘/tmp’.
+
+ wget -r -l2 -P/tmp ftp://wuarchive.wustl.edu/
+
+ • You want to download all the GIFs from a directory on an HTTP
+ server. You tried ‘wget http://www.example.com/dir/*.gif’, but
+ that didn’t work because HTTP retrieval does not support globbing.
+ In that case, use:
+
+ wget -r -l1 --no-parent -A.gif http://www.example.com/dir/
+
+ More verbose, but the effect is the same. ‘-r -l1’ means to
+ retrieve recursively (*note Recursive Download::), with maximum
+ depth of 1. ‘--no-parent’ means that references to the parent
+ directory are ignored (*note Directory-Based Limits::), and
+ ‘-A.gif’ means to download only the GIF files. ‘-A "*.gif"’ would
+ have worked too.
+
+ • Suppose you were in the middle of downloading, when Wget was
+ interrupted. Now you do not want to clobber the files already
+ present. It would be:
+
+ wget -nc -r https://www.gnu.org/
+
+ • If you want to encode your own username and password to HTTP or
+ FTP, use the appropriate URL syntax (*note URL Format::).
+
+ wget ftp://hniksic:mypassword@unix.example.com/.emacs
+
+ Note, however, that this usage is not advisable on multi-user
+ systems because it reveals your password to anyone who looks at the
+ output of ‘ps’.
+
+ • You would like the output documents to go to standard output
+ instead of to files?
+
+ wget -O - http://jagor.srce.hr/ http://www.srce.hr/
+
+ You can also combine the two options and make pipelines to retrieve
+ the documents from remote hotlists:
+
+ wget -O - http://cool.list.com/ | wget --force-html -i -
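+
+   The options from the page-requisites examples above can also be
+expressed as startup file commands. This is only a sketch: it covers
+‘-p’, ‘--convert-links’, ‘-nd’ and ‘-P’, and the ‘download’ prefix is
+the one assumed in that example.
+
+     # Same as -p
+     page_requisites = on
+     # Same as -k / --convert-links
+     convert_links = on
+     # Same as -nd: do not create a directory hierarchy
+     dirstruct = off
+     # Same as -Pdownload
+     dir_prefix = download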
+
+
+File: wget.info, Node: Very Advanced Usage, Prev: Advanced Usage, Up: Examples
+
+7.3 Very Advanced Usage
+=======================
+
+ • If you wish Wget to keep a mirror of a page (or FTP
+ subdirectories), use ‘--mirror’ (‘-m’), which is the shorthand for
+ ‘-r -l inf -N’. You can put Wget in the crontab file asking it to
+ recheck a site each Sunday:
+
+ crontab
+ 0 0 * * 0 wget --mirror https://www.gnu.org/ -o /home/me/weeklog
+
+ • In addition to the above, you want the links to be converted for
+ local viewing. But, after having read this manual, you know that
+ link conversion doesn’t play well with timestamping, so you also
+ want Wget to back up the original HTML files before the conversion.
+ Wget invocation would look like this:
+
+ wget --mirror --convert-links --backup-converted \
+ https://www.gnu.org/ -o /home/me/weeklog
+
+ • But you’ve also noticed that local viewing doesn’t work all that
+ well when HTML files are saved under extensions other than ‘.html’,
+ perhaps because they were served as ‘index.cgi’. So you’d like
+ Wget to rename all the files served with content-type ‘text/html’
+ or ‘application/xhtml+xml’ to ‘NAME.html’.
+
+ wget --mirror --convert-links --backup-converted \
+ --html-extension -o /home/me/weeklog \
+ https://www.gnu.org/
+
+ Or, with less typing:
+
+ wget -m -k -K -E https://www.gnu.org/ -o /home/me/weeklog
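+
+   As a sketch, the short options in the last command correspond to the
+following startup file commands. Note that placing them in ‘.wgetrc’
+would make every invocation of Wget mirror, so this is an illustration
+of the mapping rather than a recommended setup:
+
+     # -m
+     mirror = on
+     # -k
+     convert_links = on
+     # -K
+     backup_converted = on
+     # -E
+     adjust_extension = on
+     # -o /home/me/weeklog
+     logfile = /home/me/weeklog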
+
+
+File: wget.info, Node: Various, Next: Appendices, Prev: Examples, Up: Top
+
+8 Various
+*********
+
+This chapter contains all the stuff that could not fit anywhere else.
+
+* Menu:
+
+* Proxies:: Support for proxy servers.
+* Distribution:: Getting the latest version.
+* Web Site:: GNU Wget’s presence on the World Wide Web.
+* Mailing Lists:: Wget mailing list for announcements and discussion.
+* Internet Relay Chat:: Wget’s presence on IRC.
+* Reporting Bugs:: How and where to report bugs.
+* Portability:: The systems Wget works on.
+* Signals:: Signal-handling performed by Wget.
+
+
+File: wget.info, Node: Proxies, Next: Distribution, Prev: Various, Up: Various
+
+8.1 Proxies
+===========
+
+“Proxies” are special-purpose HTTP servers designed to transfer data
+from remote servers to local clients. One typical use of proxies is
+lightening network load for users behind a slow connection. This is
+achieved by channeling all HTTP and FTP requests through the proxy which
+caches the transferred data. When a cached resource is requested again,
+the proxy returns the data from its cache. Another use for proxies is
+in companies that separate (for security reasons) their internal
+networks from the rest of the Internet. In order to obtain information
+from the Web, their users connect and retrieve remote data using an
+authorized proxy.
+
+ Wget supports proxies for both HTTP and FTP retrievals. The standard
+way to specify proxy location, which Wget recognizes, is using the
+following environment variables:
+
+‘http_proxy’
+‘https_proxy’
+ If set, the ‘http_proxy’ and ‘https_proxy’ variables should contain
+ the URLs of the proxies for HTTP and HTTPS connections
+ respectively.
+
+‘ftp_proxy’
+ This variable should contain the URL of the proxy for FTP
+ connections. It is quite common that ‘http_proxy’ and ‘ftp_proxy’
+ are set to the same URL.
+
+‘no_proxy’
+ This variable should contain a comma-separated list of domain
+ extensions proxy should _not_ be used for. For instance, if the
+ value of ‘no_proxy’ is ‘.mit.edu’, proxy will not be used to
+ retrieve documents from MIT.
+
+ In addition to the environment variables, proxy location and settings
+may be specified from within Wget itself.
+
+‘--no-proxy’
+‘use_proxy = on/off’
+ This option and the corresponding command may be used to suppress
+ the use of proxy, even if the appropriate environment variables are
+ set.
+
+‘http_proxy = URL’
+‘https_proxy = URL’
+‘ftp_proxy = URL’
+‘no_proxy = STRING’
+ These startup file variables allow you to override the proxy
+ settings specified by the environment.
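+
+   For instance, a startup file might override the environment like
+this (the proxy host, port and domain are placeholders, not real
+servers):
+
+     http_proxy = http://proxy.example.com:8001/
+     https_proxy = http://proxy.example.com:8001/
+     ftp_proxy = http://proxy.example.com:8001/
+     # Do not use the proxy for hosts under example.com.
+     no_proxy = .example.com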
+
+ Some proxy servers require authorization to enable you to use them.
+The authorization consists of “username” and “password”, which must be
+sent by Wget. As with HTTP authorization, several authentication
+schemes exist. For proxy authorization only the ‘Basic’ authentication
+scheme is currently implemented.
+
+ You may specify your username and password either through the proxy
+URL or through the command-line options. Assuming that the company’s
+proxy is located at ‘proxy.company.com’ at port 8001, a proxy URL
+location containing authorization data might look like this:
+
+ http://hniksic:mypassword@proxy.company.com:8001/
+
+ Alternatively, you may use the ‘--proxy-user’ and ‘--proxy-password’
+options, and the equivalent ‘.wgetrc’ settings ‘proxy_user’ and
+‘proxy_password’, to set the proxy username and password.
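+
+   A minimal sketch of the second approach in ‘.wgetrc’, reusing the
+hypothetical proxy and credentials from the URL above:
+
+     http_proxy = http://proxy.company.com:8001/
+     proxy_user = hniksic
+     proxy_password = mypassword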
+
+
+File: wget.info, Node: Distribution, Next: Web Site, Prev: Proxies, Up: Various
+
+8.2 Distribution
+================
+
+Like all GNU utilities, the latest version of Wget can be found at the
+master GNU archive site ftp.gnu.org, and its mirrors. For example, Wget
+1.20.1 can be found at
+<https://ftp.gnu.org/pub/gnu/wget/wget-1.20.1.tar.gz>
+
+
+File: wget.info, Node: Web Site, Next: Mailing Lists, Prev: Distribution, Up: Various
+
+8.3 Web Site
+============
+
+The official web site for GNU Wget is at
+<https://www.gnu.org/software/wget/>. However, most useful information
+resides at “The Wget Wgiki”, <http://wget.addictivecode.org/>.
+
+
+File: wget.info, Node: Mailing Lists, Next: Internet Relay Chat, Prev: Web Site, Up: Various
+
+8.4 Mailing Lists
+=================
+
+Primary List
+------------
+
+The primary mailing list for discussion, bug reports, or questions about
+GNU Wget is at <bug-wget@gnu.org>. To subscribe, send an email to
+<bug-wget-join@gnu.org>, or visit
+<https://lists.gnu.org/mailman/listinfo/bug-wget>.
+
+ You do not need to subscribe to send a message to the list; however,
+please note that unsubscribed messages are moderated, and may take a
+while before they hit the list—*usually around a day*. If you want your
+message to show up immediately, please subscribe to the list before
+posting. Archives for the list may be found at
+<https://lists.gnu.org/archive/html/bug-wget/>.
+
+ An NNTP/Usenettish gateway is also available via Gmane
+(http://gmane.org/about.php). You can see the Gmane archives at
+<http://news.gmane.org/gmane.comp.web.wget.general>. Note that the
+Gmane archives conveniently include messages from both the current list,
+and the previous one. Messages also show up in the Gmane archives
+sooner than they do at <https://lists.gnu.org>.
+
+Obsolete Lists
+--------------
+
+Previously, the mailing list <wget@sunsite.dk> was used as the main
+discussion list, and another list, <wget-patches@sunsite.dk> was used
+for submitting and discussing patches to GNU Wget.
+
+ Messages from <wget@sunsite.dk> are archived at
+ <https://www.mail-archive.com/wget%40sunsite.dk/> and at
+ <http://news.gmane.org/gmane.comp.web.wget.general> (which also
+ continues to archive the current list, <bug-wget@gnu.org>).
+
+ Messages from <wget-patches@sunsite.dk> are archived at
+ <http://news.gmane.org/gmane.comp.web.wget.patches>.
+
+
+File: wget.info, Node: Internet Relay Chat, Next: Reporting Bugs, Prev: Mailing Lists, Up: Various
+
+8.5 Internet Relay Chat
+=======================
+
+In addition to the mailing lists, we also have a support channel set up
+via IRC at ‘irc.freenode.org’, ‘#wget’. Come check it out!
+
+
+File: wget.info, Node: Reporting Bugs, Next: Portability, Prev: Internet Relay Chat, Up: Various
+
+8.6 Reporting Bugs
+==================
+
+You are welcome to submit bug reports via the GNU Wget bug tracker (see
+<https://savannah.gnu.org/bugs/?func=additem&group=wget>) or to our
+mailing list <bug-wget@gnu.org>.
+
+ Visit <https://lists.gnu.org/mailman/listinfo/bug-wget> to get more
+info (how to subscribe, list archives, ...).
+
+ Before actually submitting a bug report, please try to follow a few
+simple guidelines.
+
+ 1. Please try to ascertain that the behavior you see really is a bug.
+ If Wget crashes, it’s a bug. If Wget does not behave as
+ documented, it’s a bug. If things work strangely, but you are not
+ sure about the way they are supposed to work, it might well be a
+ bug, but you might want to double-check the documentation and the
+ mailing lists (*note Mailing Lists::).
+
+ 2. Try to repeat the bug in as simple circumstances as possible. E.g.
+ if Wget crashes while downloading ‘wget -rl0 -kKE -t5 --no-proxy
+ http://example.com -o /tmp/log’, you should try to see if the crash
+ is repeatable, and if it will occur with a simpler set of options.
+ You might even try to start the download at the page where the
+ crash occurred to see if that page somehow triggered the crash.
+
+ Also, while I will probably be interested to know the contents of
+ your ‘.wgetrc’ file, just dumping it into the debug message is
+ probably a bad idea. Instead, you should first try to see if the
+ bug repeats with ‘.wgetrc’ moved out of the way. Only if it turns
+ out that ‘.wgetrc’ settings affect the bug, mail me the relevant
+ parts of the file.
+
+ 3. Please start Wget with ‘-d’ option and send us the resulting output
+ (or relevant parts thereof). If Wget was compiled without debug
+ support, recompile it—it is _much_ easier to trace bugs with debug
+ support on.
+
+ Note: please make sure to remove any potentially sensitive
+ information from the debug log before sending it to the bug
+ address. The ‘-d’ won’t go out of its way to collect sensitive
+ information, but the log _will_ contain a fairly complete
+ transcript of Wget’s communication with the server, which may
+ include passwords and pieces of downloaded data. Since the bug
+ address is publicly archived, you may assume that all bug reports
+ are visible to the public.
+
+ 4. If Wget has crashed, try to run it in a debugger, e.g. ‘gdb `which
+ wget` core’ and type ‘where’ to get the backtrace. This may not
+ work if the system administrator has disabled core files, but it is
+ safe to try.
+
+
+File: wget.info, Node: Portability, Next: Signals, Prev: Reporting Bugs, Up: Various
+
+8.7 Portability
+===============
+
+Like all GNU software, Wget works on the GNU system. However, since it
+uses GNU Autoconf for building and configuring, and mostly avoids using
+“special” features of any particular Unix, it should compile (and work)
+on all common Unix flavors.
+
+ Various Wget versions have been compiled and tested under many kinds
+of Unix systems, including GNU/Linux, Solaris, SunOS 4.x, Mac OS X, OSF
+(aka Digital Unix or Tru64), Ultrix, *BSD, IRIX, AIX, and others. Some
+of those systems are no longer in widespread use and may not be able to
+support recent versions of Wget. If Wget fails to compile on your
+system, we would like to know about it.
+
+ Thanks to kind contributors, this version of Wget compiles and works
+on 32-bit Microsoft Windows platforms. It has been compiled
+successfully using MS Visual C++ 6.0, Watcom, Borland C, and GCC
+compilers. Naturally, it lacks some of the features available on
+Unix, but it should work as a substitute for people stuck with Windows.
+Note that Windows-specific portions of Wget are not guaranteed to be
+supported in the future, although this has been the case in practice for
+many years now. All questions and problems in Windows usage should be
+reported to the Wget mailing list at <wget@sunsite.dk> where the volunteers
+who maintain the Windows-related features might look at them.
+
+ Support for building on MS-DOS via DJGPP has been contributed by
+Gisle Vanem; a port to VMS is maintained by Steven Schweda, and is
+available at <https://antinode.info/dec/sw/wget.html>.
+
+
+File: wget.info, Node: Signals, Prev: Portability, Up: Various
+
+8.8 Signals
+===========
+
+Since Wget is meant for background work, it catches the hangup signal
+(‘SIGHUP’) instead of exiting. If the output was on standard output,
+it will be redirected to a file named ‘wget-log’. Otherwise, ‘SIGHUP’
+is ignored. This is convenient when you wish to redirect the output of
+Wget after having started it.
+
+ $ wget http://www.gnus.org/dist/gnus.tar.gz &
+ ...
+ $ kill -HUP %%
+ SIGHUP received, redirecting output to `wget-log'.
+
+ Other than that, Wget will not try to interfere with signals in any
+way. ‘C-c’, ‘kill -TERM’ and ‘kill -KILL’ should kill it alike.
+
+
+File: wget.info, Node: Appendices, Next: Copying this manual, Prev: Various, Up: Top
+
+9 Appendices
+************
+
+This chapter contains some references I consider useful.
+
+* Menu:
+
+* Robot Exclusion:: Wget’s support for RES.
+* Security Considerations:: Security with Wget.
+* Contributors:: People who helped.
+
+
+File: wget.info, Node: Robot Exclusion, Next: Security Considerations, Prev: Appendices, Up: Appendices
+
+9.1 Robot Exclusion
+===================
+
+It is extremely easy to make Wget wander aimlessly around a web site,
+sucking up all the available data in the process. ‘wget -r SITE’, and
+you’re set. Great? Not for the server admin.
+
+ As long as Wget is only retrieving static pages, and doing it at a
+reasonable rate (see the ‘--wait’ option), there’s not much of a
+problem. The trouble is that Wget can’t tell the difference between the
+smallest static page and the most demanding CGI. A site I know has a
+section handled by a CGI Perl script that converts Info files to HTML on
+the fly. The script is slow, but works well enough for human users
+viewing an occasional Info file. However, when someone’s recursive Wget
+download stumbles upon the index page that links to all the Info files
+through the script, the system is brought to its knees without providing
+anything useful to the user. (The conversion of Info files could be
+done locally, and Info documentation for all installed GNU software on
+a system is available from the ‘info’ command.)
+
+ To avoid this kind of accident, as well as to preserve privacy for
+documents that need to be protected from well-behaved robots, the
+concept of “robot exclusion” was invented. The idea is that the server
+administrators and document authors can specify which portions of the
+site they wish to protect from robots and which they will open to them.
+
+ The most popular mechanism, and the de facto standard supported by
+all the major robots, is the “Robots Exclusion Standard” (RES) written
+by Martijn Koster et al. in 1994. It specifies the format of a text
+file containing directives that instruct the robots which URL paths to
+avoid. To be found by the robots, the specifications must be placed in
+‘/robots.txt’ in the server root, which the robots are expected to
+download and parse.
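+
+ As an illustration only (the directory names below are hypothetical),
+a ‘/robots.txt’ that asks all robots to stay out of two parts of a site
+might look like this:
+
+ User-agent: *
+ Disallow: /cgi-bin/
+ Disallow: /tmp/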
+
+ Although Wget is not a web robot in the strictest sense of the word,
+it can download large parts of a site without the user having to
+request each individual page. Because of that, Wget honors RES when
+downloading recursively. For instance, when you issue:
+
+ wget -r http://www.example.com/
+
+ First the index of ‘www.example.com’ will be downloaded. If Wget
+finds that it wants to download more documents from that server, it will
+request ‘http://www.example.com/robots.txt’ and, if found, use it for
+further downloads. ‘robots.txt’ is loaded only once per server.
+
+ Until version 1.8, Wget supported the first version of the standard,
+written by Martijn Koster in 1994 and available at
+<http://www.robotstxt.org/orig.html>. As of version 1.8, Wget has
+supported the additional directives specified in the internet draft
+‘<draft-koster-robots-00.txt>’ titled “A Method for Web Robots Control”.
+The draft, which as far as I know never made it to an RFC, is available
+at <http://www.robotstxt.org/norobots-rfc.txt>.
+
+ This manual no longer includes the text of the Robot Exclusion
+Standard.
+
+ The second, lesser-known mechanism enables the author of an individual
+document to specify whether they want the links from the file to be
+followed by a robot. This is achieved using the ‘META’ tag, like this:
+
+ <meta name="robots" content="nofollow">
+
+ This is explained in some detail at
+<http://www.robotstxt.org/meta.html>. Wget supports this method of
+robot exclusion in addition to the usual ‘/robots.txt’ exclusion.
+
+ If you know what you are doing and really really wish to turn off the
+robot exclusion, set the ‘robots’ variable to ‘off’ in your ‘.wgetrc’.
+You can achieve the same effect from the command line using the ‘-e’
+switch, e.g. ‘wget -e robots=off URL...’.
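+
+ For reference, a minimal sketch of the ‘.wgetrc’ form of that setting,
+assuming the usual ‘VARIABLE = VALUE’ syntax of the startup file:
+
+ robots = off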
+
+
+File: wget.info, Node: Security Considerations, Next: Contributors, Prev: Robot Exclusion, Up: Appendices
+
+9.2 Security Considerations
+===========================
+
+When using Wget, you must be aware that it sends unencrypted passwords
+through the network, which may present a security problem. Here are the
+main issues, and some solutions.
+
+ 1. The passwords on the command line are visible using ‘ps’. The best
+ way around it is to use ‘wget -i -’ and feed the URLs to Wget’s
+ standard input, each on a separate line, terminated by ‘C-d’.
+ Another workaround is to use ‘.netrc’ to store passwords; however,
+ storing unencrypted passwords is also considered a security risk.
+
+ 2. With the insecure “basic” authentication scheme, unencrypted
+ passwords are transmitted through the network routers and gateways.
+
+ 3. The FTP passwords are also in no way encrypted. There is no good
+ solution for this at the moment.
+
+ 4. Although the “normal” output of Wget tries to hide the passwords,
+ debugging logs show them, in all forms. This problem is avoided by
+ being careful when you send debug logs (yes, even when you send
+ them to me).
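+
+ A sketch of the first workaround in item 1, using a hypothetical host
+ and placeholder credentials. The URL is typed on standard input and
+ the input is ended with ‘C-d’, so the password never appears among
+ the command-line arguments shown by ‘ps’:
+
+ $ wget -i -
+ ftp://USER:PASSWORD@ftp.example.com/pub/file.tar.gz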
+
+
+File: wget.info, Node: Contributors, Prev: Security Considerations, Up: Appendices
+
+9.3 Contributors
+================
+
+GNU Wget was written by Hrvoje Nikšić <hniksic@xemacs.org>.
+
+ However, the development of Wget could never have gone as far as it
+has, were it not for the help of many people, either with bug reports,
+feature proposals, patches, or letters saying “Thanks!”.
+
+ Special thanks go to the following people (in no particular order):
+
+ • Dan Harkless—contributed a lot of code and documentation of
+ extremely high quality, as well as the ‘--page-requisites’ and
+ related options. He was the principal maintainer for some time and
+ released Wget 1.6.
+
+ • Ian Abbott—contributed bug fixes, Windows-related fixes, and
+ provided a prototype implementation of the breadth-first recursive
+ download. Co-maintained Wget during the 1.8 release cycle.
+
+ • The dotsrc.org crew, in particular Karsten Thygesen—donated system
+ resources such as the mailing list, web space, FTP space, and
+ version control repositories, along with a lot of time to make
+ these actually work. Christian Reiniger was of invaluable help
+ with setting up Subversion.
+
+ • Heiko Herold—provided high-quality Windows builds and contributed
+ bug and build reports for many years.
+
+ • Shawn McHorse—bug reports and patches.
+
+ • Kaveh R. Ghazi—on-the-fly ‘ansi2knr’-ization. Lots of portability
+ fixes.
+
+ • Gordon Matzigkeit—‘.netrc’ support.
+
+ • Zlatko Čalušić, Tomislav Vujec and Dražen Kačar—feature suggestions
+ and “philosophical” discussions.
+
+ • Darko Budor—initial port to Windows.
+
+ • Antonio Rosella—help and suggestions, plus the initial Italian
+ translation.
+
+ • Tomislav Petrović, Mario Mikočević—many bug reports and
+ suggestions.
+
+ • François Pinard—many thorough bug reports and discussions.
+
+ • Karl Eichwalder—lots of help with internationalization, Makefile
+ layout and many other things.
+
+ • Junio Hamano—donated support for Opie and HTTP ‘Digest’
+ authentication.
+
+ • Mauro Tortonesi—improved IPv6 support, adding support for dual
+ family systems. Refactored and enhanced FTP IPv6 code. Maintained
+ GNU Wget from 2004–2007.
+
+ • Christopher G. Lewis—maintenance of the Windows version of GNU
+ Wget.
+
+ • Gisle Vanem—many helpful patches and improvements, especially for
+ Windows and MS-DOS support.
+
+ • Ralf Wildenhues—contributed patches to convert Wget to use Automake
+ as part of its build process, and various bugfixes.
+
+ • Steven Schubiger—Many helpful patches, bugfixes and improvements.
+ Notably, conversion of Wget to use the Gnulib quotes and quoteargs
+ modules, and the addition of password prompts at the console, via
+ the Gnulib getpasswd-gnu module.
+
+ • Ted Mielczarek—donated support for CSS.
+
+ • Saint Xavier—Support for IRIs (RFC 3987).
+
+ • People who provided donations for development—including Brian
+ Gough.
+
+ The following people have provided patches, bug/build reports, useful
+suggestions, beta testing services, fan mail and all the other things
+that make maintenance so much fun:
+
+ Tim Adam, Adrian Aichner, Martin Baehr, Dieter Baron, Roger Beeman,
+Dan Berger, T. Bharath, Christian Biere, Paul Bludov, Daniel Bodea, Mark
+Boyns, John Burden, Julien Buty, Wanderlei Cavassin, Gilles Cedoc, Tim
+Charron, Noel Cragg, Kristijan Čonkaš, John Daily, Andreas Damm, Ahmon
+Dancy, Andrew Davison, Bertrand Demiddelaer, Alexander Dergachev, Andrew
+Deryabin, Ulrich Drepper, Marc Duponcheel, Damir Džeko, Alan Eldridge,
+Hans-Andreas Engel, Aleksandar Erkalović, Andy Eskilsson, João Ferreira,
+Christian Fraenkel, David Fritz, Mike Frysinger, Charles C. Fu,
+FUJISHIMA Satsuki, Masashi Fujita, Howard Gayle, Marcel Gerrits, Lemble
+Gregory, Hans Grobler, Alain Guibert, Mathieu Guillaume, Aaron Hawley,
+Jochen Hein, Karl Heuer, Madhusudan Hosaagrahara, HIROSE Masaaki, Ulf
+Harnhammar, Gregor Hoffleit, Erik Magnus Hulthen, Richard Huveneers,
+Jonas Jensen, Larry Jones, Simon Josefsson, Mario Jurić, Hack Kampbjørn,
+Const Kaplinsky, Goran Kezunović, Igor Khristophorov, Robert Kleine,
+KOJIMA Haime, Fila Kolodny, Alexander Kourakos, Martin Kraemer, Sami
+Krank, Jay Krell, Σίμος Ξενιτέλλης (Simos KSenitellis), Christian
+Lackas, Hrvoje Lacko, Daniel S. Lewart, Nicolás Lichtmeier, Dave Love,
+Alexander V. Lukyanov, Thomas Lußnig, Andre Majorel, Aurelien Marchand,
+Matthew J. Mellon, Jordan Mendelson, Ted Mielczarek, Robert Millan, Lin
+Zhe Min, Jan Minar, Tim Mooney, Keith Moore, Adam D. Moss, Simon Munton,
+Charlie Negyesi, R. K. Owen, Jim Paris, Kenny Parnell, Leonid Petrov,
+Simone Piunno, Andrew Pollock, Steve Pothier, Jan Přikryl, Marin Purgar,
+Csaba Ráduly, Keith Refson, Bill Richardson, Tyler Riddle, Tobias
+Ringstrom, Jochen Roderburg, Juan José Rodríguez, Maciej W. Rozycki,
+Edward J. Sabol, Heinz Salzmann, Robert Schmidt, Nicolas Schodet, Benno
+Schulenberg, Andreas Schwab, Steven M. Schweda, Chris Seawood, Pranab
+Shenoy, Dennis Smit, Toomas Soome, Tage Stabell-Kulo, Philip Stadermann,
+Daniel Stenberg, Sven Sternberger, Markus Strasser, John Summerfield,
+Szakacsits Szabolcs, Mike Thomas, Philipp Thomas, Mauro Tortonesi, Dave
+Turner, Gisle Vanem, Rabin Vincent, Russell Vincent, Željko Vrba,
+Charles G Waldman, Douglas E. Wegscheid, Ralf Wildenhues, Joshua David
+Williams, Benjamin Wolsey, Saint Xavier, YAMAZAKI Makoto, Jasmin Zainul,
+Bojan Ždrnja, Kristijan Zimmer, Xin Zou.
+
+ Apologies to all who I accidentally left out, and many thanks to all
+the subscribers of the Wget mailing list.
+
+
+File: wget.info, Node: Copying this manual, Next: Concept Index, Prev: Appendices, Up: Top
+
+Appendix A Copying this manual
+******************************
+
+* Menu:
+
+* GNU Free Documentation License:: License for copying this manual.
+
+
+File: wget.info, Node: GNU Free Documentation License, Prev: Copying this manual, Up: Copying this manual
+
+A.1 GNU Free Documentation License
+==================================
+
+ Version 1.3, 3 November 2008
+
+ Copyright © 2000-2002, 2007-2008, 2015, 2018 Free Software
+ Foundation, Inc.
+ <http://fsf.org/>
+
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+
+ 0. PREAMBLE
+
+ The purpose of this License is to make a manual, textbook, or other
+ functional and useful document “free” in the sense of freedom: to
+ assure everyone the effective freedom to copy and redistribute it,
+ with or without modifying it, either commercially or
+ noncommercially. Secondarily, this License preserves for the
+ author and publisher a way to get credit for their work, while not
+ being considered responsible for modifications made by others.
+
+ This License is a kind of “copyleft”, which means that derivative
+ works of the document must themselves be free in the same sense.
+ It complements the GNU General Public License, which is a copyleft
+ license designed for free software.
+
+ We have designed this License in order to use it for manuals for
+ free software, because free software needs free documentation: a
+ free program should come with manuals providing the same freedoms
+ that the software does. But this License is not limited to
+ software manuals; it can be used for any textual work, regardless
+ of subject matter or whether it is published as a printed book. We
+ recommend this License principally for works whose purpose is
+ instruction or reference.
+
+ 1. APPLICABILITY AND DEFINITIONS
+
+ This License applies to any manual or other work, in any medium,
+ that contains a notice placed by the copyright holder saying it can
+ be distributed under the terms of this License. Such a notice
+ grants a world-wide, royalty-free license, unlimited in duration,
+ to use that work under the conditions stated herein. The
+ “Document”, below, refers to any such manual or work. Any member
+ of the public is a licensee, and is addressed as “you”. You accept
+ the license if you copy, modify or distribute the work in a way
+ requiring permission under copyright law.
+
+ A “Modified Version” of the Document means any work containing the
+ Document or a portion of it, either copied verbatim, or with
+ modifications and/or translated into another language.
+
+ A “Secondary Section” is a named appendix or a front-matter section
+ of the Document that deals exclusively with the relationship of the
+ publishers or authors of the Document to the Document’s overall
+ subject (or to related matters) and contains nothing that could
+ fall directly within that overall subject. (Thus, if the Document
+ is in part a textbook of mathematics, a Secondary Section may not
+ explain any mathematics.) The relationship could be a matter of
+ historical connection with the subject or with related matters, or
+ of legal, commercial, philosophical, ethical or political position
+ regarding them.
+
+ The “Invariant Sections” are certain Secondary Sections whose
+ titles are designated, as being those of Invariant Sections, in the
+ notice that says that the Document is released under this License.
+ If a section does not fit the above definition of Secondary then it
+ is not allowed to be designated as Invariant. The Document may
+ contain zero Invariant Sections. If the Document does not identify
+ any Invariant Sections then there are none.
+
+ The “Cover Texts” are certain short passages of text that are
+ listed, as Front-Cover Texts or Back-Cover Texts, in the notice
+ that says that the Document is released under this License. A
+ Front-Cover Text may be at most 5 words, and a Back-Cover Text may
+ be at most 25 words.
+
+ A “Transparent” copy of the Document means a machine-readable copy,
+ represented in a format whose specification is available to the
+ general public, that is suitable for revising the document
+ straightforwardly with generic text editors or (for images composed
+ of pixels) generic paint programs or (for drawings) some widely
+ available drawing editor, and that is suitable for input to text
+ formatters or for automatic translation to a variety of formats
+ suitable for input to text formatters. A copy made in an otherwise
+ Transparent file format whose markup, or absence of markup, has
+ been arranged to thwart or discourage subsequent modification by
+ readers is not Transparent. An image format is not Transparent if
+ used for any substantial amount of text. A copy that is not
+ “Transparent” is called “Opaque”.
+
+ Examples of suitable formats for Transparent copies include plain
+ ASCII without markup, Texinfo input format, LaTeX input format,
+ SGML or XML using a publicly available DTD, and standard-conforming
+ simple HTML, PostScript or PDF designed for human modification.
+ Examples of transparent image formats include PNG, XCF and JPG.
+ Opaque formats include proprietary formats that can be read and
+ edited only by proprietary word processors, SGML or XML for which
+ the DTD and/or processing tools are not generally available, and
+ the machine-generated HTML, PostScript or PDF produced by some word
+ processors for output purposes only.
+
+ The “Title Page” means, for a printed book, the title page itself,
+ plus such following pages as are needed to hold, legibly, the
+ material this License requires to appear in the title page. For
+ works in formats which do not have any title page as such, “Title
+ Page” means the text near the most prominent appearance of the
+ work’s title, preceding the beginning of the body of the text.
+
+ The “publisher” means any person or entity that distributes copies
+ of the Document to the public.
+
+ A section “Entitled XYZ” means a named subunit of the Document
+ whose title either is precisely XYZ or contains XYZ in parentheses
+ following text that translates XYZ in another language. (Here XYZ
+ stands for a specific section name mentioned below, such as
+ “Acknowledgements”, “Dedications”, “Endorsements”, or “History”.)
+ To “Preserve the Title” of such a section when you modify the
+ Document means that it remains a section “Entitled XYZ” according
+ to this definition.
+
+ The Document may include Warranty Disclaimers next to the notice
+ which states that this License applies to the Document. These
+ Warranty Disclaimers are considered to be included by reference in
+ this License, but only as regards disclaiming warranties: any other
+ implication that these Warranty Disclaimers may have is void and
+ has no effect on the meaning of this License.
+
+ 2. VERBATIM COPYING
+
+ You may copy and distribute the Document in any medium, either
+ commercially or noncommercially, provided that this License, the
+ copyright notices, and the license notice saying this License
+ applies to the Document are reproduced in all copies, and that you
+ add no other conditions whatsoever to those of this License. You
+ may not use technical measures to obstruct or control the reading
+ or further copying of the copies you make or distribute. However,
+ you may accept compensation in exchange for copies. If you
+ distribute a large enough number of copies you must also follow the
+ conditions in section 3.
+
+ You may also lend copies, under the same conditions stated above,
+ and you may publicly display copies.
+
+ 3. COPYING IN QUANTITY
+
+ If you publish printed copies (or copies in media that commonly
+ have printed covers) of the Document, numbering more than 100, and
+ the Document’s license notice requires Cover Texts, you must
+ enclose the copies in covers that carry, clearly and legibly, all
+ these Cover Texts: Front-Cover Texts on the front cover, and
+ Back-Cover Texts on the back cover. Both covers must also clearly
+ and legibly identify you as the publisher of these copies. The
+ front cover must present the full title with all words of the title
+ equally prominent and visible. You may add other material on the
+ covers in addition. Copying with changes limited to the covers, as
+ long as they preserve the title of the Document and satisfy these
+ conditions, can be treated as verbatim copying in other respects.
+
+ If the required texts for either cover are too voluminous to fit
+ legibly, you should put the first ones listed (as many as fit
+ reasonably) on the actual cover, and continue the rest onto
+ adjacent pages.
+
+ If you publish or distribute Opaque copies of the Document
+ numbering more than 100, you must either include a machine-readable
+ Transparent copy along with each Opaque copy, or state in or with
+ each Opaque copy a computer-network location from which the general
+ network-using public has access to download using public-standard
+ network protocols a complete Transparent copy of the Document, free
+ of added material. If you use the latter option, you must take
+ reasonably prudent steps, when you begin distribution of Opaque
+ copies in quantity, to ensure that this Transparent copy will
+ remain thus accessible at the stated location until at least one
+ year after the last time you distribute an Opaque copy (directly or
+ through your agents or retailers) of that edition to the public.
+
+ It is requested, but not required, that you contact the authors of
+ the Document well before redistributing any large number of copies,
+ to give them a chance to provide you with an updated version of the
+ Document.
+
+ 4. MODIFICATIONS
+
+ You may copy and distribute a Modified Version of the Document
+ under the conditions of sections 2 and 3 above, provided that you
+ release the Modified Version under precisely this License, with the
+ Modified Version filling the role of the Document, thus licensing
+ distribution and modification of the Modified Version to whoever
+ possesses a copy of it. In addition, you must do these things in
+ the Modified Version:
+
+ A. Use in the Title Page (and on the covers, if any) a title
+ distinct from that of the Document, and from those of previous
+ versions (which should, if there were any, be listed in the
+ History section of the Document). You may use the same title
+ as a previous version if the original publisher of that
+ version gives permission.
+
+ B. List on the Title Page, as authors, one or more persons or
+ entities responsible for authorship of the modifications in
+ the Modified Version, together with at least five of the
+ principal authors of the Document (all of its principal
+ authors, if it has fewer than five), unless they release you
+ from this requirement.
+
+ C. State on the Title page the name of the publisher of the
+ Modified Version, as the publisher.
+
+ D. Preserve all the copyright notices of the Document.
+
+ E. Add an appropriate copyright notice for your modifications
+ adjacent to the other copyright notices.
+
+ F. Include, immediately after the copyright notices, a license
+ notice giving the public permission to use the Modified
+ Version under the terms of this License, in the form shown in
+ the Addendum below.
+
+ G. Preserve in that license notice the full lists of Invariant
+ Sections and required Cover Texts given in the Document’s
+ license notice.
+
+ H. Include an unaltered copy of this License.
+
+ I. Preserve the section Entitled “History”, Preserve its Title,
+ and add to it an item stating at least the title, year, new
+ authors, and publisher of the Modified Version as given on the
+ Title Page. If there is no section Entitled “History” in the
+ Document, create one stating the title, year, authors, and
+ publisher of the Document as given on its Title Page, then add
+ an item describing the Modified Version as stated in the
+ previous sentence.
+
+ J. Preserve the network location, if any, given in the Document
+ for public access to a Transparent copy of the Document, and
+ likewise the network locations given in the Document for
+ previous versions it was based on. These may be placed in the
+ “History” section. You may omit a network location for a work
+ that was published at least four years before the Document
+ itself, or if the original publisher of the version it refers
+ to gives permission.
+
+ K. For any section Entitled “Acknowledgements” or “Dedications”,
+ Preserve the Title of the section, and preserve in the section
+ all the substance and tone of each of the contributor
+ acknowledgements and/or dedications given therein.
+
+ L. Preserve all the Invariant Sections of the Document, unaltered
+ in their text and in their titles. Section numbers or the
+ equivalent are not considered part of the section titles.
+
+ M. Delete any section Entitled “Endorsements”. Such a section
+ may not be included in the Modified Version.
+
+ N. Do not retitle any existing section to be Entitled
+ “Endorsements” or to conflict in title with any Invariant
+ Section.
+
+ O. Preserve any Warranty Disclaimers.
+
+ If the Modified Version includes new front-matter sections or
+ appendices that qualify as Secondary Sections and contain no
+ material copied from the Document, you may at your option designate
+ some or all of these sections as invariant. To do this, add their
+ titles to the list of Invariant Sections in the Modified Version’s
+ license notice. These titles must be distinct from any other
+ section titles.
+
+ You may add a section Entitled “Endorsements”, provided it contains
+ nothing but endorsements of your Modified Version by various
+ parties—for example, statements of peer review or that the text has
+ been approved by an organization as the authoritative definition of
+ a standard.
+
+ You may add a passage of up to five words as a Front-Cover Text,
+ and a passage of up to 25 words as a Back-Cover Text, to the end of
+ the list of Cover Texts in the Modified Version. Only one passage
+ of Front-Cover Text and one of Back-Cover Text may be added by (or
+ through arrangements made by) any one entity. If the Document
+ already includes a cover text for the same cover, previously added
+ by you or by arrangement made by the same entity you are acting on
+ behalf of, you may not add another; but you may replace the old
+ one, on explicit permission from the previous publisher that added
+ the old one.
+
+ The author(s) and publisher(s) of the Document do not by this
+ License give permission to use their names for publicity for or to
+ assert or imply endorsement of any Modified Version.
+
+ 5. COMBINING DOCUMENTS
+
+ You may combine the Document with other documents released under
+ this License, under the terms defined in section 4 above for
+ modified versions, provided that you include in the combination all
+ of the Invariant Sections of all of the original documents,
+ unmodified, and list them all as Invariant Sections of your
+ combined work in its license notice, and that you preserve all
+ their Warranty Disclaimers.
+
+ The combined work need only contain one copy of this License, and
+ multiple identical Invariant Sections may be replaced with a single
+ copy. If there are multiple Invariant Sections with the same name
+ but different contents, make the title of each such section unique
+ by adding at the end of it, in parentheses, the name of the
+ original author or publisher of that section if known, or else a
+ unique number. Make the same adjustment to the section titles in
+ the list of Invariant Sections in the license notice of the
+ combined work.
+
+ In the combination, you must combine any sections Entitled
+ “History” in the various original documents, forming one section
+ Entitled “History”; likewise combine any sections Entitled
+ “Acknowledgements”, and any sections Entitled “Dedications”. You
+ must delete all sections Entitled “Endorsements.”
+
+ 6. COLLECTIONS OF DOCUMENTS
+
+ You may make a collection consisting of the Document and other
+ documents released under this License, and replace the individual
+ copies of this License in the various documents with a single copy
+ that is included in the collection, provided that you follow the
+ rules of this License for verbatim copying of each of the documents
+ in all other respects.
+
+ You may extract a single document from such a collection, and
+ distribute it individually under this License, provided you insert
+ a copy of this License into the extracted document, and follow this
+ License in all other respects regarding verbatim copying of that
+ document.
+
+ 7. AGGREGATION WITH INDEPENDENT WORKS
+
+ A compilation of the Document or its derivatives with other
+ separate and independent documents or works, in or on a volume of a
+ storage or distribution medium, is called an “aggregate” if the
+ copyright resulting from the compilation is not used to limit the
+ legal rights of the compilation’s users beyond what the individual
+ works permit. When the Document is included in an aggregate, this
+ License does not apply to the other works in the aggregate which
+ are not themselves derivative works of the Document.
+
+ If the Cover Text requirement of section 3 is applicable to these
+ copies of the Document, then if the Document is less than one half
+ of the entire aggregate, the Document’s Cover Texts may be placed
+ on covers that bracket the Document within the aggregate, or the
+ electronic equivalent of covers if the Document is in electronic
+ form. Otherwise they must appear on printed covers that bracket
+ the whole aggregate.
+
+ 8. TRANSLATION
+
+ Translation is considered a kind of modification, so you may
+ distribute translations of the Document under the terms of section
+ 4. Replacing Invariant Sections with translations requires special
+ permission from their copyright holders, but you may include
+ translations of some or all Invariant Sections in addition to the
+ original versions of these Invariant Sections. You may include a
+ translation of this License, and all the license notices in the
+ Document, and any Warranty Disclaimers, provided that you also
+ include the original English version of this License and the
+ original versions of those notices and disclaimers. In case of a
+ disagreement between the translation and the original version of
+ this License or a notice or disclaimer, the original version will
+ prevail.
+
+ If a section in the Document is Entitled “Acknowledgements”,
+ “Dedications”, or “History”, the requirement (section 4) to
+ Preserve its Title (section 1) will typically require changing the
+ actual title.
+
+ 9. TERMINATION
+
+ You may not copy, modify, sublicense, or distribute the Document
+ except as expressly provided under this License. Any attempt
+ otherwise to copy, modify, sublicense, or distribute it is void,
+ and will automatically terminate your rights under this License.
+
+ However, if you cease all violation of this License, then your
+ license from a particular copyright holder is reinstated (a)
+ provisionally, unless and until the copyright holder explicitly and
+ finally terminates your license, and (b) permanently, if the
+ copyright holder fails to notify you of the violation by some
+ reasonable means prior to 60 days after the cessation.
+
+ Moreover, your license from a particular copyright holder is
+ reinstated permanently if the copyright holder notifies you of the
+ violation by some reasonable means, this is the first time you have
+ received notice of violation of this License (for any work) from
+ that copyright holder, and you cure the violation prior to 30 days
+ after your receipt of the notice.
+
+ Termination of your rights under this section does not terminate
+ the licenses of parties who have received copies or rights from you
+ under this License. If your rights have been terminated and not
+ permanently reinstated, receipt of a copy of some or all of the
+ same material does not give you any rights to use it.
+
+ 10. FUTURE REVISIONS OF THIS LICENSE
+
+ The Free Software Foundation may publish new, revised versions of
+ the GNU Free Documentation License from time to time. Such new
+ versions will be similar in spirit to the present version, but may
+ differ in detail to address new problems or concerns. See
+ <http://www.gnu.org/copyleft/>.
+
+ Each version of the License is given a distinguishing version
+ number. If the Document specifies that a particular numbered
+ version of this License “or any later version” applies to it, you
+ have the option of following the terms and conditions either of
+ that specified version or of any later version that has been
+ published (not as a draft) by the Free Software Foundation. If the
+ Document does not specify a version number of this License, you may
+ choose any version ever published (not as a draft) by the Free
+ Software Foundation. If the Document specifies that a proxy can
+ decide which future versions of this License can be used, that
+ proxy’s public statement of acceptance of a version permanently
+ authorizes you to choose that version for the Document.
+
+ 11. RELICENSING
+
+ “Massive Multiauthor Collaboration Site” (or “MMC Site”) means any
+ World Wide Web server that publishes copyrightable works and also
+ provides prominent facilities for anybody to edit those works. A
+ public wiki that anybody can edit is an example of such a server.
+ A “Massive Multiauthor Collaboration” (or “MMC”) contained in the
+ site means any set of copyrightable works thus published on the MMC
+ site.
+
+ “CC-BY-SA” means the Creative Commons Attribution-Share Alike 3.0
+ license published by Creative Commons Corporation, a not-for-profit
+ corporation with a principal place of business in San Francisco,
+ California, as well as future copyleft versions of that license
+ published by that same organization.
+
+ “Incorporate” means to publish or republish a Document, in whole or
+ in part, as part of another Document.
+
+ An MMC is “eligible for relicensing” if it is licensed under this
+ License, and if all works that were first published under this
+ License somewhere other than this MMC, and subsequently
+ incorporated in whole or in part into the MMC, (1) had no cover
+ texts or invariant sections, and (2) were thus incorporated prior
+ to November 1, 2008.
+
+ The operator of an MMC Site may republish an MMC contained in the
+ site under CC-BY-SA on the same site at any time before August 1,
+ 2009, provided the MMC is eligible for relicensing.
+
+ADDENDUM: How to use this License for your documents
+====================================================
+
+To use this License in a document you have written, include a copy of
+the License in the document and put the following copyright and license
+notices just after the title page:
+
+ Copyright (C) YEAR YOUR NAME.
+ Permission is granted to copy, distribute and/or modify this document
+ under the terms of the GNU Free Documentation License, Version 1.3
+ or any later version published by the Free Software Foundation;
+ with no Invariant Sections, no Front-Cover Texts, and no Back-Cover
+ Texts. A copy of the license is included in the section entitled ``GNU
+ Free Documentation License''.
+
+ If you have Invariant Sections, Front-Cover Texts and Back-Cover
+Texts, replace the “with...Texts.” line with this:
+
+ with the Invariant Sections being LIST THEIR TITLES, with
+ the Front-Cover Texts being LIST, and with the Back-Cover Texts
+ being LIST.
+
+ If you have Invariant Sections without Cover Texts, or some other
+combination of the three, merge those two alternatives to suit the
+situation.
+
+ If your document contains nontrivial examples of program code, we
+recommend releasing these examples in parallel under your choice of free
+software license, such as the GNU General Public License, to permit
+their use in free software.
+
+
+File: wget.info, Node: Concept Index, Prev: Copying this manual, Up: Top
+
+Concept Index
+*************
+
+
+* Menu:
+
+* #wget: Internet Relay Chat. (line 6)
+* .css extension: HTTP Options. (line 10)
+* .html extension: HTTP Options. (line 10)
+* .listing files, removing: FTP Options. (line 21)
+* .netrc: Startup File. (line 6)
+* .wgetrc: Startup File. (line 6)
+* accept directories: Directory-Based Limits.
+ (line 17)
+* accept suffixes: Types of Files. (line 15)
+* accept wildcards: Types of Files. (line 15)
+* append to log: Logging and Input File Options.
+ (line 11)
+* arguments: Invoking. (line 6)
+* authentication: Download Options. (line 535)
+* authentication <1>: HTTP Options. (line 43)
+* authentication <2>: HTTP Options. (line 393)
+* authentication credentials: Download Options. (line 113)
+* backing up converted files: Recursive Retrieval Options.
+ (line 90)
+* backing up files: Download Options. (line 107)
+* bandwidth, limit: Download Options. (line 330)
+* base for relative links in input file: Logging and Input File Options.
+ (line 111)
+* bind address: Download Options. (line 6)
+* bind DNS address: Download Options. (line 11)
+* bug reports: Reporting Bugs. (line 6)
+* bugs: Reporting Bugs. (line 6)
+* cache: HTTP Options. (line 71)
+* caching of DNS lookups: Download Options. (line 414)
+* case fold: Recursive Accept/Reject Options.
+ (line 62)
+* client DNS address: Download Options. (line 11)
+* client IP address: Download Options. (line 6)
+* clobbering, file: Download Options. (line 68)
+* command line: Invoking. (line 6)
+* comments, HTML: Recursive Retrieval Options.
+ (line 168)
+* connect timeout: Download Options. (line 314)
+* Content On Error: HTTP Options. (line 380)
+* Content-Disposition: HTTP Options. (line 363)
+* Content-Encoding, choose: HTTP Options. (line 197)
+* Content-Length, ignore: HTTP Options. (line 160)
+* continue retrieval: Download Options. (line 118)
+* continue retrieval <1>: Download Options. (line 177)
+* contributors: Contributors. (line 6)
+* conversion of links: Recursive Retrieval Options.
+ (line 32)
+* cookies: HTTP Options. (line 80)
+* cookies, loading: HTTP Options. (line 90)
+* cookies, saving: HTTP Options. (line 138)
+* cookies, session: HTTP Options. (line 143)
+* cut directories: Directory Options. (line 32)
+* debug: Logging and Input File Options.
+ (line 17)
+* default page name: HTTP Options. (line 6)
+* delete after retrieval: Recursive Retrieval Options.
+ (line 16)
+* directories: Directory-Based Limits.
+ (line 6)
+* directories, exclude: Directory-Based Limits.
+ (line 30)
+* directories, include: Directory-Based Limits.
+ (line 17)
+* directory limits: Directory-Based Limits.
+ (line 6)
+* directory prefix: Directory Options. (line 59)
+* DNS cache: Download Options. (line 414)
+* DNS IP address, client, DNS: Download Options. (line 11)
+* DNS IP address, client, DNS <1>: Download Options. (line 19)
+* DNS server: Download Options. (line 19)
+* DNS timeout: Download Options. (line 308)
+* dot style: Download Options. (line 189)
+* downloading multiple times: Download Options. (line 68)
+* EGD: HTTPS (SSL/TLS) Options.
+ (line 142)
+* entropy, specifying source of: HTTPS (SSL/TLS) Options.
+ (line 127)
+* examples: Examples. (line 6)
+* exclude directories: Directory-Based Limits.
+ (line 30)
+* execute wgetrc command: Basic Startup Options.
+ (line 19)
+* FDL, GNU Free Documentation License: GNU Free Documentation License.
+ (line 6)
+* features: Overview. (line 6)
+* file names, restrict: Download Options. (line 433)
+* file permissions: FTP Options. (line 73)
+* filling proxy cache: Recursive Retrieval Options.
+ (line 16)
+* follow FTP links: Recursive Accept/Reject Options.
+ (line 34)
+* following ftp links: FTP Links. (line 6)
+* following links: Following Links. (line 6)
+* force html: Logging and Input File Options.
+ (line 104)
+* ftp authentication: FTP Options. (line 6)
+* ftp password: FTP Options. (line 6)
+* ftp time-stamping: FTP Time-Stamping Internals.
+ (line 6)
+* ftp user: FTP Options. (line 6)
+* globbing, toggle: FTP Options. (line 45)
+* hangup: Signals. (line 6)
+* header, add: HTTP Options. (line 171)
+* hosts, spanning: Spanning Hosts. (line 6)
+* HSTS: HTTPS (SSL/TLS) Options.
+ (line 161)
+* HTML comments: Recursive Retrieval Options.
+ (line 168)
+* http password: HTTP Options. (line 43)
+* http referer: HTTP Options. (line 229)
+* http time-stamping: HTTP Time-Stamping Internals.
+ (line 6)
+* http user: HTTP Options. (line 43)
+* idn support: Download Options. (line 557)
+* ignore case: Recursive Accept/Reject Options.
+ (line 62)
+* ignore length: HTTP Options. (line 160)
+* include directories: Directory-Based Limits.
+ (line 17)
+* incomplete downloads: Download Options. (line 118)
+* incomplete downloads <1>: Download Options. (line 177)
+* incremental updating: Time-Stamping. (line 6)
+* index.html: HTTP Options. (line 6)
+* input-file: Logging and Input File Options.
+ (line 46)
+* input-metalink: Logging and Input File Options.
+ (line 69)
+* Internet Relay Chat: Internet Relay Chat. (line 6)
+* invoking: Invoking. (line 6)
+* IP address, client: Download Options. (line 6)
+* IPv6: Download Options. (line 483)
+* IRC: Internet Relay Chat. (line 6)
+* iri support: Download Options. (line 557)
+* Keep-Alive, turning off: HTTP Options. (line 59)
+* keep-badhash: Logging and Input File Options.
+ (line 73)
+* latest version: Distribution. (line 6)
+* limit bandwidth: Download Options. (line 330)
+* link conversion: Recursive Retrieval Options.
+ (line 32)
+* links: Following Links. (line 6)
+* list: Mailing Lists. (line 5)
+* loading cookies: HTTP Options. (line 90)
+* local encoding: Download Options. (line 566)
+* location of wgetrc: Wgetrc Location. (line 6)
+* log file: Logging and Input File Options.
+ (line 6)
+* mailing list: Mailing Lists. (line 6)
+* metalink-index: Logging and Input File Options.
+ (line 85)
+* metalink-over-http: Logging and Input File Options.
+ (line 78)
+* mirroring: Very Advanced Usage. (line 6)
+* no parent: Directory-Based Limits.
+ (line 43)
+* no-clobber: Download Options. (line 68)
+* nohup: Invoking. (line 6)
+* number of tries: Download Options. (line 26)
+* offset: Download Options. (line 177)
+* operating systems: Portability. (line 6)
+* option syntax: Option Syntax. (line 6)
+* Other HTTP Methods: HTTP Options. (line 330)
+* output file: Logging and Input File Options.
+ (line 6)
+* overview: Overview. (line 6)
+* page requisites: Recursive Retrieval Options.
+ (line 103)
+* passive ftp: FTP Options. (line 61)
+* password: Download Options. (line 535)
+* pause: Download Options. (line 350)
+* Persistent Connections, disabling: HTTP Options. (line 59)
+* portability: Portability. (line 6)
+* POST: HTTP Options. (line 262)
+* preferred-location: Logging and Input File Options.
+ (line 93)
+* progress indicator: Download Options. (line 189)
+* proxies: Proxies. (line 6)
+* proxy: Download Options. (line 391)
+* proxy <1>: HTTP Options. (line 71)
+* proxy authentication: HTTP Options. (line 220)
+* proxy filling: Recursive Retrieval Options.
+ (line 16)
+* proxy password: HTTP Options. (line 220)
+* proxy user: HTTP Options. (line 220)
+* quiet: Logging and Input File Options.
+ (line 28)
+* quota: Download Options. (line 398)
+* random wait: Download Options. (line 373)
+* randomness, specifying source of: HTTPS (SSL/TLS) Options.
+ (line 127)
+* rate, limit: Download Options. (line 330)
+* read timeout: Download Options. (line 319)
+* recursion: Recursive Download. (line 6)
+* recursive download: Recursive Download. (line 6)
+* redirect: HTTP Options. (line 214)
+* redirecting output: Advanced Usage. (line 89)
+* referer, http: HTTP Options. (line 229)
+* reject directories: Directory-Based Limits.
+ (line 30)
+* reject suffixes: Types of Files. (line 39)
+* reject wildcards: Types of Files. (line 39)
+* relative links: Relative Links. (line 6)
+* remote encoding: Download Options. (line 580)
+* reporting bugs: Reporting Bugs. (line 6)
+* required images, downloading: Recursive Retrieval Options.
+ (line 103)
+* resume download: Download Options. (line 118)
+* resume download <1>: Download Options. (line 177)
+* retries: Download Options. (line 26)
+* retries, waiting between: Download Options. (line 364)
+* retrieving: Recursive Download. (line 6)
+* robot exclusion: Robot Exclusion. (line 6)
+* robots.txt: Robot Exclusion. (line 6)
+* sample wgetrc: Sample Wgetrc. (line 6)
+* saving cookies: HTTP Options. (line 138)
+* security: Security Considerations.
+ (line 6)
+* server maintenance: Robot Exclusion. (line 6)
+* server response, print: Download Options. (line 274)
+* server response, save: HTTP Options. (line 236)
+* session cookies: HTTP Options. (line 143)
+* signal handling: Signals. (line 6)
+* spanning hosts: Spanning Hosts. (line 6)
+* specify config: Logging and Input File Options.
+ (line 124)
+* spider: Download Options. (line 279)
+* SSL: HTTPS (SSL/TLS) Options.
+ (line 6)
+* SSL certificate: HTTPS (SSL/TLS) Options.
+ (line 73)
+* SSL certificate authority: HTTPS (SSL/TLS) Options.
+ (line 99)
+* SSL certificate type, specify: HTTPS (SSL/TLS) Options.
+ (line 79)
+* SSL certificate, check: HTTPS (SSL/TLS) Options.
+ (line 44)
+* SSL CRL, certificate revocation list: HTTPS (SSL/TLS) Options.
+ (line 111)
+* SSL protocol, choose: HTTPS (SSL/TLS) Options.
+ (line 11)
+* SSL Public Key Pin: HTTPS (SSL/TLS) Options.
+ (line 115)
+* start position: Download Options. (line 177)
+* startup: Startup File. (line 6)
+* startup file: Startup File. (line 6)
+* suffixes, accept: Types of Files. (line 15)
+* suffixes, reject: Types of Files. (line 39)
+* symbolic links, retrieving: FTP Options. (line 77)
+* syntax of options: Option Syntax. (line 6)
+* syntax of wgetrc: Wgetrc Syntax. (line 6)
+* tag-based recursive pruning: Recursive Accept/Reject Options.
+ (line 38)
+* time-stamping: Time-Stamping. (line 6)
+* time-stamping usage: Time-Stamping Usage. (line 6)
+* timeout: Download Options. (line 290)
+* timeout, connect: Download Options. (line 314)
+* timeout, DNS: Download Options. (line 308)
+* timeout, read: Download Options. (line 319)
+* timestamping: Time-Stamping. (line 6)
+* tries: Download Options. (line 26)
+* Trust server names: HTTP Options. (line 385)
+* types of files: Types of Files. (line 6)
+* unlink: Download Options. (line 595)
+* updating the archives: Time-Stamping. (line 6)
+* URL: URL Format. (line 6)
+* URL syntax: URL Format. (line 6)
+* usage, time-stamping: Time-Stamping Usage. (line 6)
+* user: Download Options. (line 535)
+* user-agent: HTTP Options. (line 240)
+* various: Various. (line 6)
+* verbose: Logging and Input File Options.
+ (line 32)
+* wait: Download Options. (line 350)
+* wait, random: Download Options. (line 373)
+* waiting between retries: Download Options. (line 364)
+* WARC: HTTPS (SSL/TLS) Options.
+ (line 240)
+* web site: Web Site. (line 6)
+* Wget as spider: Download Options. (line 279)
+* wgetrc: Startup File. (line 6)
+* wgetrc commands: Wgetrc Commands. (line 6)
+* wgetrc location: Wgetrc Location. (line 6)
+* wgetrc syntax: Wgetrc Syntax. (line 6)
+* wildcards, accept: Types of Files. (line 15)
+* wildcards, reject: Types of Files. (line 39)
+* Windows file names: Download Options. (line 433)
+* xattr: Logging and Input File Options.
+ (line 97)
+
+
+
+Tag Table:
+Node: Top744
+Node: Overview2076
+Node: Invoking5768
+Node: URL Format6628
+Ref: URL Format-Footnote-19307
+Node: Option Syntax9413
+Node: Basic Startup Options12191
+Node: Logging and Input File Options13049
+Node: Download Options18678
+Node: Directory Options48244
+Node: HTTP Options51095
+Node: HTTPS (SSL/TLS) Options71747
+Node: FTP Options84906
+Node: Recursive Retrieval Options91968
+Node: Recursive Accept/Reject Options101225
+Node: Exit Status105430
+Node: Recursive Download106465
+Node: Following Links109704
+Node: Spanning Hosts110670
+Node: Types of Files112939
+Node: Directory-Based Limits117833
+Node: Relative Links121100
+Node: FTP Links121950
+Node: Time-Stamping122841
+Node: Time-Stamping Usage124513
+Node: HTTP Time-Stamping Internals126385
+Ref: HTTP Time-Stamping Internals-Footnote-1127733
+Node: FTP Time-Stamping Internals127936
+Node: Startup File129423
+Node: Wgetrc Location130363
+Node: Wgetrc Syntax131217
+Node: Wgetrc Commands131982
+Node: Sample Wgetrc148575
+Node: Examples154603
+Node: Simple Usage154964
+Node: Advanced Usage156413
+Node: Very Advanced Usage160229
+Node: Various161773
+Node: Proxies162482
+Node: Distribution165439
+Node: Web Site165783
+Node: Mailing Lists166083
+Node: Internet Relay Chat167820
+Node: Reporting Bugs168115
+Node: Portability170843
+Node: Signals172490
+Node: Appendices173197
+Node: Robot Exclusion173545
+Node: Security Considerations177407
+Node: Contributors178617
+Node: Copying this manual184347
+Node: GNU Free Documentation License184587
+Node: Concept Index209939
+
+End Tag Table
+
+
+Local Variables:
+coding: utf-8
+End: