author    | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-15 19:43:11 +0000
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-15 19:43:11 +0000
commit    | fc22b3d6507c6745911b9dfcc68f1e665ae13dbc (patch)
tree      | ce1e3bce06471410239a6f41282e328770aa404a /upstream/opensuse-tumbleweed/man1/wget.1
parent    | Initial commit. (diff)
Adding upstream version 4.22.0. (tag: upstream/4.22.0)
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'upstream/opensuse-tumbleweed/man1/wget.1')
-rw-r--r-- | upstream/opensuse-tumbleweed/man1/wget.1 | 2372 |
1 files changed, 2372 insertions, 0 deletions
diff --git a/upstream/opensuse-tumbleweed/man1/wget.1 b/upstream/opensuse-tumbleweed/man1/wget.1 new file mode 100644 index 00000000..e0b6dc5a --- /dev/null +++ b/upstream/opensuse-tumbleweed/man1/wget.1 @@ -0,0 +1,2372 @@ +.\" -*- mode: troff; coding: utf-8 -*- +.\" Automatically generated by Pod::Man 5.01 (Pod::Simple 3.43) +.\" +.\" Standard preamble: +.\" ======================================================================== +.de Sp \" Vertical space (when we can't use .PP) +.if t .sp .5v +.if n .sp +.. +.de Vb \" Begin verbatim text +.ft CW +.nf +.ne \\$1 +.. +.de Ve \" End verbatim text +.ft R +.fi +.. +.\" \*(C` and \*(C' are quotes in nroff, nothing in troff, for use with C<>. +.ie n \{\ +. ds C` "" +. ds C' "" +'br\} +.el\{\ +. ds C` +. ds C' +'br\} +.\" +.\" Escape single quotes in literal strings from groff's Unicode transform. +.ie \n(.g .ds Aq \(aq +.el .ds Aq ' +.\" +.\" If the F register is >0, we'll generate index entries on stderr for +.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index +.\" entries marked with X<> in POD. Of course, you'll have to process the +.\" output yourself in some meaningful fashion. +.\" +.\" Avoid warning from groff about undefined register 'F'. +.de IX +.. +.nr rF 0 +.if \n(.g .if rF .nr rF 1 +.if (\n(rF:(\n(.g==0)) \{\ +. if \nF \{\ +. de IX +. tm Index:\\$1\t\\n%\t"\\$2" +.. +. if !\nF==2 \{\ +. nr % 0 +. nr F 2 +. \} +. \} +.\} +.rr rF +.\" ======================================================================== +.\" +.IX Title "WGET 1" +.TH WGET 1 2024-02-20 "GNU Wget 1.21.4" "GNU Wget" +.\" For nroff, turn off justification. Always turn off hyphenation; it makes +.\" way too many mistakes in technical documents. +.if n .ad l +.nh +.SH NAME +Wget \- The non\-interactive network downloader. +.SH SYNOPSIS +.IX Header "SYNOPSIS" +wget [\fIoption\fR]... [\fIURL\fR]... +.SH DESCRIPTION +.IX Header "DESCRIPTION" +GNU Wget is a free utility for non-interactive download of files from +the Web. 
It supports HTTP, HTTPS, and FTP protocols, as +well as retrieval through HTTP proxies. +.PP +Wget is non-interactive, meaning that it can work in the background, +while the user is not logged on. This allows you to start a retrieval +and disconnect from the system, letting Wget finish the work. By +contrast, most Web browsers require the user's constant presence, +which can be a great hindrance when transferring a lot of data. +.PP +Wget can follow links in HTML, XHTML, and CSS pages, to +create local versions of remote web sites, fully recreating the +directory structure of the original site. This is sometimes referred to +as "recursive downloading." While doing that, Wget respects the Robot +Exclusion Standard (\fI/robots.txt\fR). Wget can be instructed to +convert the links in downloaded files to point at the local files, for +offline viewing. +.PP +Wget has been designed for robustness over slow or unstable network +connections; if a download fails due to a network problem, it will +keep retrying until the whole file has been retrieved. If the server +supports resuming ("regetting"), it will instruct the server to continue the +download from where it left off. +.SH OPTIONS +.IX Header "OPTIONS" +.SS "Option Syntax" +.IX Subsection "Option Syntax" +Since Wget uses GNU getopt to process command-line arguments, every +option has a long form along with the short one. Long options are +more convenient to remember, but take time to type. You may freely +mix different option styles, or specify options after the command-line +arguments. Thus you may write: +.PP +.Vb 1 +\& wget \-r \-\-tries=10 http://fly.srk.fer.hr/ \-o log +.Ve +.PP +The space between the option accepting an argument and the argument may +be omitted. Instead of \fB\-o log\fR you can write \fB\-olog\fR.
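The getopt conventions described above (long and short forms, fused option arguments) can be sketched with the shell's own getopts. This is an illustration of the parsing style only, not Wget's actual parser; the two option letters are just the ones from the example.

```shell
# Mimic wget's "-o log" vs "-olog" equivalence with POSIX getopts
# (illustration only; wget itself uses GNU getopt).
parse_demo() {
  OPTIND=1
  logfile="" tries=""
  while getopts "o:t:" opt "$@"; do
    case $opt in
      o) logfile=$OPTARG ;;   # -o log  or  -olog
      t) tries=$OPTARG ;;     # -t 10   or  -t10
    esac
  done
}
parse_demo -o log -t 10; echo "$logfile $tries"   # → log 10
parse_demo -olog -t10;   echo "$logfile $tries"   # → log 10
```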
+.PP +You may put several options that do not require arguments together, +like: +.PP +.Vb 1 +\& wget \-drc <URL> +.Ve +.PP +This is completely equivalent to: +.PP +.Vb 1 +\& wget \-d \-r \-c <URL> +.Ve +.PP +Since the options can be specified after the arguments, you may +terminate them with \fB\-\-\fR. So the following will try to download +URL \fB\-x\fR, reporting failure to \fIlog\fR: +.PP +.Vb 1 +\& wget \-o log \-\- \-x +.Ve +.PP +The options that accept comma-separated lists all respect the convention +that specifying an empty list clears its value. This can be useful to +clear the \fI.wgetrc\fR settings. For instance, if your \fI.wgetrc\fR +sets \f(CW\*(C`exclude_directories\*(C'\fR to \fI/cgi\-bin\fR, the following +example will first reset it, and then set it to exclude \fI/~nobody\fR +and \fI/~somebody\fR. You can also clear the lists in \fI.wgetrc\fR. +.PP +.Vb 1 +\& wget \-X "" \-X /~nobody,/~somebody +.Ve +.PP +Most options that do not accept arguments are \fIboolean\fR options, +so named because their state can be captured with a yes-or-no +("boolean") variable. For example, \fB\-\-follow\-ftp\fR tells Wget +to follow FTP links from HTML files and, on the other hand, +\&\fB\-\-no\-glob\fR tells it not to perform file globbing on FTP URLs. A +boolean option is either \fIaffirmative\fR or \fInegative\fR +(beginning with \fB\-\-no\fR). All such options share several +properties. +.PP +Unless stated otherwise, it is assumed that the default behavior is +the opposite of what the option accomplishes. For example, the +documented existence of \fB\-\-follow\-ftp\fR assumes that the default +is to \fInot\fR follow FTP links from HTML pages. +.PP +Affirmative options can be negated by prepending the \fB\-\-no\-\fR to +the option name; negative options can be negated by omitting the +\&\fB\-\-no\-\fR prefix. This might seem superfluous\-\-\-if the default for +an affirmative option is to not do something, then why provide a way +to explicitly turn it off? 
But the startup file may in fact change +the default. For instance, using \f(CW\*(C`follow_ftp = on\*(C'\fR in +\&\fI.wgetrc\fR makes Wget \fIfollow\fR FTP links by default, and +using \fB\-\-no\-follow\-ftp\fR is the only way to restore the factory +default from the command line. +.SS "Basic Startup Options" +.IX Subsection "Basic Startup Options" +.IP \fB\-V\fR 4 +.IX Item "-V" +.PD 0 +.IP \fB\-\-version\fR 4 +.IX Item "--version" +.PD +Display the version of Wget. +.IP \fB\-h\fR 4 +.IX Item "-h" +.PD 0 +.IP \fB\-\-help\fR 4 +.IX Item "--help" +.PD +Print a help message describing all of Wget's command-line options. +.IP \fB\-b\fR 4 +.IX Item "-b" +.PD 0 +.IP \fB\-\-background\fR 4 +.IX Item "--background" +.PD +Go to background immediately after startup. If no output file is +specified via the \fB\-o\fR option, output is redirected to \fIwget-log\fR. +.IP "\fB\-e\fR \fIcommand\fR" 4 +.IX Item "-e command" +.PD 0 +.IP "\fB\-\-execute\fR \fIcommand\fR" 4 +.IX Item "--execute command" +.PD +Execute \fIcommand\fR as if it were a part of \fI.wgetrc\fR. A command thus invoked will be executed +\&\fIafter\fR the commands in \fI.wgetrc\fR, and therefore takes precedence over +them. If you need to specify more than one wgetrc command, use multiple +instances of \fB\-e\fR. +.SS "Logging and Input File Options" +.IX Subsection "Logging and Input File Options" +.IP "\fB\-o\fR \fIlogfile\fR" 4 +.IX Item "-o logfile" +.PD 0 +.IP \fB\-\-output\-file=\fR\fIlogfile\fR 4 +.IX Item "--output-file=logfile" +.PD +Log all messages to \fIlogfile\fR. The messages are normally reported +to standard error. +.IP "\fB\-a\fR \fIlogfile\fR" 4 +.IX Item "-a logfile" +.PD 0 +.IP \fB\-\-append\-output=\fR\fIlogfile\fR 4 +.IX Item "--append-output=logfile" +.PD +Append to \fIlogfile\fR. This is the same as \fB\-o\fR, only it appends +to \fIlogfile\fR instead of overwriting the old log file. If +\&\fIlogfile\fR does not exist, a new file is created.
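The distinction between \-o (overwrite) and \-a (append) mirrors the shell's > and >> redirections. A minimal sketch of the analogy, using a throwaway log file name; wget itself is not invoked here:

```shell
# -o truncates the log like '>', -a appends like '>>'.
echo "first run"  >  wget-demo.log   # analogous to: wget -o wget-demo.log URL
echo "second run" >> wget-demo.log   # analogous to: wget -a wget-demo.log URL
wc -l < wget-demo.log                # both lines survive the append
```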
+.IP \fB\-d\fR 4 +.IX Item "-d" +.PD 0 +.IP \fB\-\-debug\fR 4 +.IX Item "--debug" +.PD +Turn on debug output, meaning various information important to the +developers of Wget if it does not work properly. Your system +administrator may have chosen to compile Wget without debug support, in +which case \fB\-d\fR will not work. Please note that compiling with +debug support is always safe\-\-\-Wget compiled with debug support will +\&\fInot\fR print any debug info unless requested with \fB\-d\fR. +.IP \fB\-q\fR 4 +.IX Item "-q" +.PD 0 +.IP \fB\-\-quiet\fR 4 +.IX Item "--quiet" +.PD +Turn off Wget's output. +.IP \fB\-v\fR 4 +.IX Item "-v" +.PD 0 +.IP \fB\-\-verbose\fR 4 +.IX Item "--verbose" +.PD +Turn on verbose output, with all the available data. The default output +is verbose. +.IP \fB\-nv\fR 4 +.IX Item "-nv" +.PD 0 +.IP \fB\-\-no\-verbose\fR 4 +.IX Item "--no-verbose" +.PD +Turn off verbose output without being completely quiet (use \fB\-q\fR for +that), which means that error messages and basic information still get +printed. +.IP \fB\-\-report\-speed=\fR\fItype\fR 4 +.IX Item "--report-speed=type" +Output bandwidth as \fItype\fR. The only accepted value is \fBbits\fR. +.IP "\fB\-i\fR \fIfile\fR" 4 +.IX Item "-i file" +.PD 0 +.IP \fB\-\-input\-file=\fR\fIfile\fR 4 +.IX Item "--input-file=file" +.PD +Read URLs from a local or external \fIfile\fR. If \fB\-\fR is +specified as \fIfile\fR, URLs are read from the standard input. +(Use \fB./\-\fR to read from a file literally named \fB\-\fR.) +.Sp +If this option is used, no URLs need be present on the command +line. If there are URLs both on the command line and in an input +file, those on the command line will be the first ones to be +retrieved. If \fB\-\-force\-html\fR is not specified, then \fIfile\fR +should consist of a series of URLs, one per line. +.Sp +However, if you specify \fB\-\-force\-html\fR, the document will be +regarded as \fBhtml\fR.
In that case you may have problems +with relative links, which you can solve either by adding \f(CW\*(C`<base +href="\fR\f(CIurl\fR\f(CW">\*(C'\fR to the documents or by specifying +\&\fB\-\-base=\fR\fIurl\fR on the command line. +.Sp +If the \fIfile\fR is an external one, the document will be automatically +treated as \fBhtml\fR if the Content-Type matches \fBtext/html\fR. +Furthermore, the \fIfile\fR's location will be implicitly used as base +href if none was specified. +.IP \fB\-\-input\-metalink=\fR\fIfile\fR 4 +.IX Item "--input-metalink=file" +Downloads files covered in the local Metalink \fIfile\fR. Metalink versions 3 +and 4 are supported. +.IP \fB\-\-keep\-badhash\fR 4 +.IX Item "--keep-badhash" +Keeps downloaded Metalink files that have a bad hash. It appends .badhash +to the name of Metalink files with a checksum mismatch, without +overwriting existing files. +.IP \fB\-\-metalink\-over\-http\fR 4 +.IX Item "--metalink-over-http" +Issues an HTTP HEAD request instead of GET and extracts Metalink metadata +from the response headers. Then it switches to Metalink download. +If no valid Metalink metadata is found, it falls back to ordinary HTTP download. +Enables download and processing of \fBContent-Type: application/metalink4+xml\fR files. +.IP \fB\-\-metalink\-index=\fR\fInumber\fR 4 +.IX Item "--metalink-index=number" +Set the Metalink \fBapplication/metalink4+xml\fR metaurl ordinal +NUMBER, from 1 to the total number of "application/metalink4+xml" +metaurls available. Specify 0 or \fBinf\fR to choose the first good one. +Metaurls, such as those from \fB\-\-metalink\-over\-http\fR, may have +been sorted by the priority key's value; keep this in mind to choose the +right NUMBER. +.IP \fB\-\-preferred\-location\fR 4 +.IX Item "--preferred-location" +Set the preferred location for Metalink resources. This has effect if multiple +resources with the same priority are available.
+.IP \fB\-\-xattr\fR 4 +.IX Item "--xattr" +Enable use of the file system's extended attributes to save the +original URL and the Referer HTTP header value if used. +.Sp +Be aware that the URL might contain private information like +access tokens or credentials. +.IP \fB\-F\fR 4 +.IX Item "-F" +.PD 0 +.IP \fB\-\-force\-html\fR 4 +.IX Item "--force-html" +.PD +When input is read from a file, force it to be treated as an HTML +file. This enables you to retrieve relative links from existing +HTML files on your local disk, by adding \f(CW\*(C`<base +href="\fR\f(CIurl\fR\f(CW">\*(C'\fR to HTML, or using the \fB\-\-base\fR command-line +option. +.IP "\fB\-B\fR \fIURL\fR" 4 +.IX Item "-B URL" +.PD 0 +.IP \fB\-\-base=\fR\fIURL\fR 4 +.IX Item "--base=URL" +.PD +Resolves relative links using \fIURL\fR as the point of reference, +when reading links from an HTML file specified via the +\&\fB\-i\fR/\fB\-\-input\-file\fR option (together with +\&\fB\-\-force\-html\fR, or when the input file was fetched remotely from +a server describing it as HTML). This is equivalent to the +presence of a \f(CW\*(C`BASE\*(C'\fR tag in the HTML input file, with +\&\fIURL\fR as the value for the \f(CW\*(C`href\*(C'\fR attribute. +.Sp +For instance, if you specify \fBhttp://foo/bar/a.html\fR for +\&\fIURL\fR, and Wget reads \fB../baz/b.html\fR from the input file, it +would be resolved to \fBhttp://foo/baz/b.html\fR. +.IP \fB\-\-config=\fR\fIFILE\fR 4 +.IX Item "--config=FILE" +Specify the location of a startup file you wish to use instead of the +default one(s). Use \-\-no\-config to disable reading of config files. +If both \-\-config and \-\-no\-config are given, \-\-no\-config is ignored. +.IP \fB\-\-rejected\-log=\fR\fIlogfile\fR 4 +.IX Item "--rejected-log=logfile" +Logs all URL rejections to \fIlogfile\fR as comma-separated values. The values +include the reason for rejection, the URL, and the parent URL in which it was found.
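The \-\-base example above (resolving ../baz/b.html against http://foo/bar/a.html) can be traced with a toy resolver. This sketch handles only leading "../" segments; real RFC 3986 resolution covers many more cases:

```shell
# Toy base-URL resolver: drop the base's file name, then pop one
# directory per leading "../" in the relative link.
resolve() {
  base_dir=${1%/*}     # http://foo/bar/a.html -> http://foo/bar
  rel=$2
  while [ "${rel#../}" != "$rel" ]; do
    base_dir=${base_dir%/*}   # pop one directory
    rel=${rel#../}            # strip one "../"
  done
  echo "$base_dir/$rel"
}
resolve http://foo/bar/a.html ../baz/b.html   # → http://foo/baz/b.html
```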
+.SS "Download Options" +.IX Subsection "Download Options" +.IP \fB\-\-bind\-address=\fR\fIADDRESS\fR 4 +.IX Item "--bind-address=ADDRESS" +When making client TCP/IP connections, bind to \fIADDRESS\fR on +the local machine. \fIADDRESS\fR may be specified as a hostname or IP +address. This option can be useful if your machine is bound to multiple +IPs. +.IP \fB\-\-bind\-dns\-address=\fR\fIADDRESS\fR 4 +.IX Item "--bind-dns-address=ADDRESS" +[libcares only] +This address overrides the route for DNS requests. If you ever need to +circumvent the standard settings from /etc/resolv.conf, this option together +with \fB\-\-dns\-servers\fR is your friend. +\&\fIADDRESS\fR must be specified either as IPv4 or IPv6 address. +Wget needs to be built with libcares for this option to be available. +.IP \fB\-\-dns\-servers=\fR\fIADDRESSES\fR 4 +.IX Item "--dns-servers=ADDRESSES" +[libcares only] +The given address(es) override the standard nameserver +addresses, e.g. as configured in /etc/resolv.conf. +\&\fIADDRESSES\fR may be specified either as IPv4 or IPv6 addresses, +comma-separated. +Wget needs to be built with libcares for this option to be available. +.IP "\fB\-t\fR \fInumber\fR" 4 +.IX Item "-t number" +.PD 0 +.IP \fB\-\-tries=\fR\fInumber\fR 4 +.IX Item "--tries=number" +.PD +Set number of tries to \fInumber\fR. Specify 0 or \fBinf\fR for +infinite retrying. The default is to retry 20 times, with the exception +of fatal errors like "connection refused" or "not found" (404), +which are not retried. +.IP "\fB\-O\fR \fIfile\fR" 4 +.IX Item "-O file" +.PD 0 +.IP \fB\-\-output\-document=\fR\fIfile\fR 4 +.IX Item "--output-document=file" +.PD +The documents will not be written to the appropriate files, but all +will be concatenated together and written to \fIfile\fR. If \fB\-\fR +is used as \fIfile\fR, documents will be printed to standard output, +disabling link conversion. (Use \fB./\-\fR to print to a file +literally named \fB\-\fR.) 
+.Sp +Use of \fB\-O\fR is \fInot\fR intended to mean simply "use the name +\&\fIfile\fR instead of the one in the URL;" rather, it is +analogous to shell redirection: +\&\fBwget \-O file http://foo\fR is intended to work like +\&\fBwget \-O \- http://foo > file\fR; \fIfile\fR will be truncated +immediately, and \fIall\fR downloaded content will be written there. +.Sp +For this reason, \fB\-N\fR (for timestamp-checking) is not supported +in combination with \fB\-O\fR: since \fIfile\fR is always newly +created, it will always have a very new timestamp. A warning will be +issued if this combination is used. +.Sp +Similarly, using \fB\-r\fR or \fB\-p\fR with \fB\-O\fR may not work as +you expect: Wget won't just download the first file to \fIfile\fR and +then download the rest to their normal names: \fIall\fR downloaded +content will be placed in \fIfile\fR. This was disabled in version +1.11, but has been reinstated (with a warning) in 1.11.2, as there are +some cases where this behavior can actually have some use. +.Sp +A combination with \fB\-nc\fR is only accepted if the given output +file does not exist. +.Sp +Note that a combination with \fB\-k\fR is only permitted when +downloading a single document, as in that case it will just convert +all relative URIs to external ones; \fB\-k\fR makes no sense for +multiple URIs when they're all being downloaded to a single file; +\&\fB\-k\fR can be used only when the output is a regular file. +.IP \fB\-nc\fR 4 +.IX Item "-nc" +.PD 0 +.IP \fB\-\-no\-clobber\fR 4 +.IX Item "--no-clobber" +.PD +If a file is downloaded more than once in the same directory, Wget's +behavior depends on a few options, including \fB\-nc\fR. In certain +cases, the local file will be \fIclobbered\fR, or overwritten, upon +repeated download. In other cases it will be preserved. 
+.Sp +When running Wget without \fB\-N\fR, \fB\-nc\fR, \fB\-r\fR, or +\&\fB\-p\fR, downloading the same file in the same directory will result +in the original copy of \fIfile\fR being preserved and the second copy +being named \fIfile\fR\fB.1\fR. If that file is downloaded yet +again, the third copy will be named \fIfile\fR\fB.2\fR, and so on. +(This is also the behavior with \fB\-nd\fR, even if \fB\-r\fR or +\&\fB\-p\fR are in effect.) When \fB\-nc\fR is specified, this behavior +is suppressed, and Wget will refuse to download newer copies of +\&\fIfile\fR. Therefore, "\f(CW\*(C`no\-clobber\*(C'\fR" is actually a +misnomer in this mode\-\-\-it's not clobbering that's prevented (as the +numeric suffixes were already preventing clobbering), but rather the +multiple version saving that's prevented. +.Sp +When running Wget with \fB\-r\fR or \fB\-p\fR, but without \fB\-N\fR, +\&\fB\-nd\fR, or \fB\-nc\fR, re-downloading a file will result in the +new copy simply overwriting the old. Adding \fB\-nc\fR will prevent +this behavior, instead causing the original version to be preserved +and any newer copies on the server to be ignored. +.Sp +When running Wget with \fB\-N\fR, with or without \fB\-r\fR or +\&\fB\-p\fR, the decision as to whether or not to download a newer copy +of a file depends on the local and remote timestamp and size of the +file. \fB\-nc\fR may not be specified at the +same time as \fB\-N\fR. +.Sp +A combination with \fB\-O\fR/\fB\-\-output\-document\fR is only accepted +if the given output file does not exist. +.Sp +Note that when \fB\-nc\fR is specified, files with the suffixes +\&\fB.html\fR or \fB.htm\fR will be loaded from the local disk and +parsed as if they had been retrieved from the Web. +.IP \fB\-\-backups=\fR\fIbackups\fR 4 +.IX Item "--backups=backups" +Before (over)writing a file, back up an existing file by adding a +\&\fB.1\fR suffix (\fB_1\fR on VMS) to the file name. 
Such backup +files are rotated to \fB.2\fR, \fB.3\fR, and so on, up to +\&\fIbackups\fR (and lost beyond that). +.IP \fB\-\-no\-netrc\fR 4 +.IX Item "--no-netrc" +Do not try to obtain credentials from the \fI.netrc\fR file. By default, +the \fI.netrc\fR file is searched for credentials if none have been +passed on the command line and authentication is required. +.IP \fB\-c\fR 4 +.IX Item "-c" +.PD 0 +.IP \fB\-\-continue\fR 4 +.IX Item "--continue" +.PD +Continue getting a partially-downloaded file. This is useful when you +want to finish up a download started by a previous instance of Wget, or +by another program. For instance: +.Sp +.Vb 1 +\& wget \-c ftp://sunsite.doc.ic.ac.uk/ls\-lR.Z +.Ve +.Sp +If there is a file named \fIls\-lR.Z\fR in the current directory, Wget +will assume that it is the first portion of the remote file, and will +ask the server to continue the retrieval from an offset equal to the +length of the local file. +.Sp +Note that you don't need to specify this option if you just want the +current invocation of Wget to retry downloading a file should the +connection be lost midway through. This is the default behavior. +\&\fB\-c\fR only affects resumption of downloads started \fIprior\fR to +this invocation of Wget, and whose local files are still sitting around. +.Sp +Without \fB\-c\fR, the previous example would just download the remote +file to \fIls\-lR.Z.1\fR, leaving the truncated \fIls\-lR.Z\fR file +alone. +.Sp +If you use \fB\-c\fR on a non-empty file, and the server does not support +continued downloading, Wget will restart the download from scratch and overwrite +the existing file entirely. +.Sp +Beginning with Wget 1.7, if you use \fB\-c\fR on a file which is the same +size as the one on the server, Wget will refuse to download the +file and print an explanatory message.
The same happens when the file +is smaller on the server than locally (presumably because it was changed +on the server since your last download attempt)\-\-\-because "continuing" +is not meaningful, no download occurs. +.Sp +On the other side of the coin, while using \fB\-c\fR, any file that's +bigger on the server than locally will be considered an incomplete +download and only \f(CW\*(C`(length(remote) \- length(local))\*(C'\fR bytes will be +downloaded and tacked onto the end of the local file. This behavior can +be desirable in certain cases\-\-\-for instance, you can use \fBwget \-c\fR +to download just the new portion that's been appended to a data +collection or log file. +.Sp +However, if the file is bigger on the server because it's been +\&\fIchanged\fR, as opposed to just \fIappended\fR to, you'll end up +with a garbled file. Wget has no way of verifying that the local file +is really a valid prefix of the remote file. You need to be especially +careful of this when using \fB\-c\fR in conjunction with \fB\-r\fR, +since every file will be considered an "incomplete download" candidate. +.Sp +Another instance where you'll get a garbled file if you try to use +\&\fB\-c\fR is if you have a lame HTTP proxy that inserts a +"transfer interrupted" string into the local file. In the future a +"rollback" option may be added to deal with this case. +.Sp +Note that \fB\-c\fR only works with FTP servers and with HTTP +servers that support the \f(CW\*(C`Range\*(C'\fR header. +.IP \fB\-\-start\-pos=\fR\fIOFFSET\fR 4 +.IX Item "--start-pos=OFFSET" +Start downloading at zero-based position \fIOFFSET\fR. Offset may be expressed +in bytes, kilobytes with the `k' suffix, or megabytes with the `m' suffix, etc. +.Sp +\&\fB\-\-start\-pos\fR takes precedence over \fB\-\-continue\fR. When +\&\fB\-\-start\-pos\fR and \fB\-\-continue\fR are both specified, wget will emit a +warning and then proceed as if \fB\-\-continue\fR were absent.
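What \-c (and \-\-start\-pos) ask of the server can be simulated locally: fetch only the bytes past the length of the partial file and append them. Two local files stand in for the remote file and the truncated download; no network and no wget invocation are involved:

```shell
# Local simulation of resumption: append bytes from offset
# length(local) onward, as a Range request would.
printf 'HELLO WORLD' > remote.dat   # full "remote" file, 11 bytes
printf 'HELLO'       > local.dat    # truncated earlier download
offset=$(wc -c < local.dat)         # wget would send: Range: bytes=5-
tail -c +$((offset + 1)) remote.dat >> local.dat
cmp -s remote.dat local.dat && echo resumed   # → resumed
```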
+.Sp +Server support for continued download is required, otherwise \fB\-\-start\-pos\fR +cannot help. See \fB\-c\fR for details. +.IP \fB\-\-progress=\fR\fItype\fR 4 +.IX Item "--progress=type" +Select the type of the progress indicator you wish to use. Legal +indicators are "dot" and "bar". +.Sp +The "bar" indicator is used by default. It draws an ASCII progress +bar graphic (a.k.a. a "thermometer" display) indicating the status of +retrieval. If the output is not a TTY, the "dot" indicator will be used by +default. +.Sp +Use \fB\-\-progress=dot\fR to switch to the "dot" display. It traces +the retrieval by printing dots on the screen, each dot representing a +fixed amount of downloaded data. +.Sp +The progress \fItype\fR can also take one or more parameters. The parameters +vary based on the \fItype\fR selected. Parameters to \fItype\fR are passed by +appending them to the type separated by a colon (:) like this: +\&\fB\-\-progress=\fR\fItype\fR\fB:\fR\fIparameter1\fR\fB:\fR\fIparameter2\fR. +.Sp +When using the dotted retrieval, you may set the \fIstyle\fR by +specifying the type as \fBdot:\fR\fIstyle\fR. Different styles assign +different meaning to one dot. With the \f(CW\*(C`default\*(C'\fR style each dot +represents 1K, there are ten dots in a cluster and 50 dots in a line. +The \f(CW\*(C`binary\*(C'\fR style has a more "computer"\-like orientation\-\-\-8K +dots, 16\-dot clusters and 48 dots per line (which makes for 384K +per line). The \f(CW\*(C`mega\*(C'\fR style is suitable for downloading large +files\-\-\-each dot represents 64K retrieved, there are eight dots in a +cluster, and 48 dots on each line (so each line contains 3M). +If \f(CW\*(C`mega\*(C'\fR is not enough then you can use the \f(CW\*(C`giga\*(C'\fR +style\-\-\-each dot represents 1M retrieved, there are eight dots in a +cluster, and 32 dots on each line (so each line contains 32M). +.Sp +With \fB\-\-progress=bar\fR, there are currently two possible parameters, +\&\fIforce\fR and \fInoscroll\fR.
+.Sp +When the output is not a TTY, the progress bar always falls back to "dot", +even if \fB\-\-progress=bar\fR was passed to Wget during invocation. This +behaviour can be overridden and the "bar" output forced by using the "force" +parameter as \fB\-\-progress=bar:force\fR. +.Sp +By default, the \fBbar\fR style progress bar scrolls the name of the file +being downloaded from left to right if the filename exceeds the maximum +length allotted for its display. In certain cases, such as with +\&\fB\-\-progress=bar:force\fR, one may not want the scrolling filename in the +progress bar. By passing the "noscroll" parameter, Wget can be forced to +display as much of the filename as possible without scrolling through it. +.Sp +Note that you can set the default style using the \f(CW\*(C`progress\*(C'\fR +command in \fI.wgetrc\fR. That setting may be overridden from the +command line. For example, to force the bar output without scrolling, +use \fB\-\-progress=bar:force:noscroll\fR. +.IP \fB\-\-show\-progress\fR 4 +.IX Item "--show-progress" +Force wget to display the progress bar at any verbosity level. +.Sp +By default, wget only displays the progress bar in verbose mode. One may, +however, want wget to display the progress bar on screen in conjunction with +any other verbosity modes like \fB\-\-no\-verbose\fR or \fB\-\-quiet\fR. This +is often a desired property when invoking wget to download several small or large +files. In such a case, wget could simply be invoked with this parameter to get +a much cleaner output on the screen. +.Sp +This option will also force the progress bar to be printed to \fIstderr\fR when +used alongside the \fB\-\-output\-file\fR option. +.IP \fB\-N\fR 4 +.IX Item "-N" +.PD 0 +.IP \fB\-\-timestamping\fR 4 +.IX Item "--timestamping" +.PD +Turn on time-stamping. +.IP \fB\-\-no\-if\-modified\-since\fR 4 +.IX Item "--no-if-modified-since" +Do not send the If-Modified-Since header in \fB\-N\fR mode. Send a preliminary HEAD +request instead.
This has effect only in \fB\-N\fR mode. +.IP \fB\-\-no\-use\-server\-timestamps\fR 4 +.IX Item "--no-use-server-timestamps" +Don't set the local file's timestamp to the one on the server. +.Sp +By default, when a file is downloaded, its timestamps are set to +match those from the remote file. This allows the use of +\&\fB\-\-timestamping\fR on subsequent invocations of wget. However, it +is sometimes useful to base the local file's timestamp on when it was +actually downloaded; for that purpose, the +\&\fB\-\-no\-use\-server\-timestamps\fR option has been provided. +.IP \fB\-S\fR 4 +.IX Item "-S" +.PD 0 +.IP \fB\-\-server\-response\fR 4 +.IX Item "--server-response" +.PD +Print the headers sent by HTTP servers and responses sent by +FTP servers. +.IP \fB\-\-spider\fR 4 +.IX Item "--spider" +When invoked with this option, Wget will behave as a Web \fIspider\fR, +which means that it will not download the pages, just check that they +are there. For example, you can use Wget to check your bookmarks: +.Sp +.Vb 1 +\& wget \-\-spider \-\-force\-html \-i bookmarks.html +.Ve +.Sp +This feature needs much more work for Wget to get close to the +functionality of real web spiders. +.IP "\fB\-T seconds\fR" 4 +.IX Item "-T seconds" +.PD 0 +.IP \fB\-\-timeout=\fR\fIseconds\fR 4 +.IX Item "--timeout=seconds" +.PD +Set the network timeout to \fIseconds\fR seconds. This is equivalent +to specifying \fB\-\-dns\-timeout\fR, \fB\-\-connect\-timeout\fR, and +\&\fB\-\-read\-timeout\fR, all at the same time. +.Sp +When interacting with the network, Wget can check for timeouts and +abort the operation if it takes too long. This prevents anomalies +like hanging reads and infinite connects. The only timeout enabled by +default is a 900\-second read timeout. Setting a timeout to 0 disables +it altogether. Unless you know what you are doing, it is best not to +change the default timeout settings. +.Sp +All timeout-related options accept decimal values, as well as +subsecond values.
For example, \fB0.1\fR seconds is a legal (though +unwise) choice of timeout. Subsecond timeouts are useful for checking +server response times or for testing network latency. +.IP \fB\-\-dns\-timeout=\fR\fIseconds\fR 4 +.IX Item "--dns-timeout=seconds" +Set the DNS lookup timeout to \fIseconds\fR seconds. DNS lookups that +don't complete within the specified time will fail. By default, there +is no timeout on DNS lookups, other than that implemented by system +libraries. +.IP \fB\-\-connect\-timeout=\fR\fIseconds\fR 4 +.IX Item "--connect-timeout=seconds" +Set the connect timeout to \fIseconds\fR seconds. TCP connections that +take longer to establish will be aborted. By default, there is no +connect timeout, other than that implemented by system libraries. +.IP \fB\-\-read\-timeout=\fR\fIseconds\fR 4 +.IX Item "--read-timeout=seconds" +Set the read (and write) timeout to \fIseconds\fR seconds. The +"time" of this timeout refers to \fIidle time\fR: if, at any point in +the download, no data is received for more than the specified number +of seconds, reading fails and the download is restarted. This option +does not directly affect the duration of the entire download. +.Sp +Of course, the remote server may choose to terminate the connection +sooner than this option requires. The default read timeout is 900 +seconds. +.IP \fB\-\-limit\-rate=\fR\fIamount\fR 4 +.IX Item "--limit-rate=amount" +Limit the download speed to \fIamount\fR bytes per second. Amount may +be expressed in bytes, kilobytes with the \fBk\fR suffix, or megabytes +with the \fBm\fR suffix. For example, \fB\-\-limit\-rate=20k\fR will +limit the retrieval rate to 20KB/s. This is useful when, for whatever +reason, you don't want Wget to consume the entire available bandwidth. +.Sp +This option allows the use of decimal numbers, usually in conjunction +with power suffixes; for example, \fB\-\-limit\-rate=2.5k\fR is a legal +value. 
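The suffix arithmetic for rates like \-\-limit\-rate=2.5k works out as follows; a small helper (awk, since plain shell arithmetic is integer-only) shows the multipliers, assuming the k and m suffixes mean 1024 and 1024*1024 as described:

```shell
# Convert a rate string with an optional k/m suffix to bytes per second.
rate_bytes() {
  awk -v r="$1" 'BEGIN {
    n = r + 0                       # numeric prefix (stops at the suffix)
    if (r ~ /[kK]$/)      n *= 1024
    else if (r ~ /[mM]$/) n *= 1024 * 1024
    print n
  }'
}
rate_bytes 20k    # → 20480
rate_bytes 2.5k   # → 2560
```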
+.Sp +Note that Wget implements the limiting by sleeping the appropriate +amount of time after a network read that took less time than specified +by the rate. Eventually this strategy causes the TCP transfer to slow +down to approximately the specified rate. However, it may take some +time for this balance to be achieved, so don't be surprised if limiting +the rate doesn't work well with very small files. +.IP "\fB\-w\fR \fIseconds\fR" 4 +.IX Item "-w seconds" +.PD 0 +.IP \fB\-\-wait=\fR\fIseconds\fR 4 +.IX Item "--wait=seconds" +.PD +Wait the specified number of seconds between retrievals. Use of +this option is recommended, as it lightens the server load by making the +requests less frequent. Instead of in seconds, the time can be +specified in minutes using the \f(CW\*(C`m\*(C'\fR suffix, in hours using the \f(CW\*(C`h\*(C'\fR +suffix, or in days using the \f(CW\*(C`d\*(C'\fR suffix. +.Sp +Specifying a large value for this option is useful if the network or the +destination host is down, so that Wget can wait long enough to +reasonably expect the network error to be fixed before the retry. The +waiting interval specified by this option is influenced by +\&\f(CW\*(C`\-\-random\-wait\*(C'\fR (see below). +.IP \fB\-\-waitretry=\fR\fIseconds\fR 4 +.IX Item "--waitretry=seconds" +If you don't want Wget to wait between \fIevery\fR retrieval, but only +between retries of failed downloads, you can use this option. Wget will +use \fIlinear backoff\fR, waiting 1 second after the first failure on a +given file, then waiting 2 seconds after the second failure on that +file, up to the maximum number of \fIseconds\fR you specify. +.Sp +By default, Wget will assume a value of 10 seconds. +.IP \fB\-\-random\-wait\fR 4 +.IX Item "--random-wait" +Some web sites may perform log analysis to identify retrieval programs +such as Wget by looking for statistically significant similarities in +the time between requests.
This option causes the time between requests +to vary between 0.5 and 1.5 * \fIwait\fR seconds, where \fIwait\fR was +specified using the \fB\-\-wait\fR option, in order to mask Wget's +presence from such analysis. +.Sp +A 2001 article in a publication devoted to development on a popular +consumer platform provided code to perform this analysis on the fly. +Its author suggested blocking at the class C address level to ensure +automated retrieval programs were blocked despite changing DHCP-supplied +addresses. +.Sp +The \fB\-\-random\-wait\fR option was inspired by this ill-advised +recommendation to block many unrelated users from a web site due to the +actions of one. +.IP \fB\-\-no\-proxy\fR 4 +.IX Item "--no-proxy" +Don't use proxies, even if the appropriate \f(CW*_proxy\fR environment +variable is defined. +.IP "\fB\-Q\fR \fIquota\fR" 4 +.IX Item "-Q quota" +.PD 0 +.IP \fB\-\-quota=\fR\fIquota\fR 4 +.IX Item "--quota=quota" +.PD +Specify download quota for automatic retrievals. The value can be +specified in bytes (default), kilobytes (with \fBk\fR suffix), or +megabytes (with \fBm\fR suffix). +.Sp +Note that quota will never affect downloading a single file. So if you +specify \fBwget \-Q10k https://example.com/ls\-lR.gz\fR, all of the +\&\fIls\-lR.gz\fR will be downloaded. The same goes even when several +URLs are specified on the command-line. The quota is checked only +at the end of each downloaded file, so it will never result in a partially +downloaded file. Thus you may safely type \fBwget \-Q2m \-i sites\fR\-\-\-download +will be aborted after the file that exhausts the quota is completely +downloaded. +.Sp +Setting quota to 0 or to \fBinf\fR unlimits the download quota. +.IP \fB\-\-no\-dns\-cache\fR 4 +.IX Item "--no-dns-cache" +Turn off caching of DNS lookups. Normally, Wget remembers the IP +addresses it looked up from DNS so it doesn't have to repeatedly +contact the DNS server for the same (typically small) set of hosts it +retrieves from. 
This cache exists in memory only; a new Wget run will +contact DNS again. +.Sp +However, it has been reported that in some situations it is not +desirable to cache host names, even for the duration of a +short-running application like Wget. With this option Wget issues a +new DNS lookup (more precisely, a new call to \f(CW\*(C`gethostbyname\*(C'\fR or +\&\f(CW\*(C`getaddrinfo\*(C'\fR) each time it makes a new connection. Please note +that this option will \fInot\fR affect caching that might be +performed by the resolving library or by an external caching layer, +such as NSCD. +.Sp +If you don't understand exactly what this option does, you probably +won't need it. +.IP \fB\-\-restrict\-file\-names=\fR\fImodes\fR 4 +.IX Item "--restrict-file-names=modes" +Change which characters found in remote URLs must be escaped during +generation of local filenames. Characters that are \fIrestricted\fR +by this option are escaped, i.e. replaced with \fR\f(CB%HH\fR\fB\fR, where +\&\fBHH\fR is the hexadecimal number that corresponds to the restricted +character. This option may also be used to force all alphabetical +cases to be either lower\- or uppercase. +.Sp +By default, Wget escapes the characters that are not valid or safe as +part of file names on your operating system, as well as control +characters that are typically unprintable. This option is useful for +changing these defaults, perhaps because you are downloading to a +non-native partition, or because you want to disable escaping of the +control characters, or you want to further restrict characters to only +those in the ASCII range of values. +.Sp +The \fImodes\fR are a comma-separated set of text values. The +acceptable values are \fBunix\fR, \fBwindows\fR, \fBnocontrol\fR, +\&\fBascii\fR, \fBlowercase\fR, and \fBuppercase\fR. The values +\&\fBunix\fR and \fBwindows\fR are mutually exclusive (one will +override the other), as are \fBlowercase\fR and +\&\fBuppercase\fR. 
Those last are special cases, as they do not change +the set of characters that would be escaped, but rather force local +file paths to be converted either to lower\- or uppercase. +.Sp +When "unix" is specified, Wget escapes the character \fB/\fR and +the control characters in the ranges 0\-\-31 and 128\-\-159. This is the +default on Unix-like operating systems. +.Sp +When "windows" is given, Wget escapes the characters \fB\e\fR, +\&\fB|\fR, \fB/\fR, \fB:\fR, \fB?\fR, \fB"\fR, \fB*\fR, \fB<\fR, +\&\fB>\fR, and the control characters in the ranges 0\-\-31 and 128\-\-159. +In addition to this, Wget in Windows mode uses \fB+\fR instead of +\&\fB:\fR to separate host and port in local file names, and uses +\&\fB@\fR instead of \fB?\fR to separate the query portion of the file +name from the rest. Therefore, a URL that would be saved as +\&\fBwww.xemacs.org:4300/search.pl?input=blah\fR in Unix mode would be +saved as \fBwww.xemacs.org+4300/search.pl@input=blah\fR in Windows +mode. This mode is the default on Windows. +.Sp +If you specify \fBnocontrol\fR, then the escaping of the control +characters is also switched off. This option may make sense +when you are downloading URLs whose names contain UTF\-8 characters, on +a system which can save and display filenames in UTF\-8 (some possible +byte values used in UTF\-8 byte sequences fall in the range of values +designated by Wget as "controls"). +.Sp +The \fBascii\fR mode is used to specify that any bytes whose values +are outside the range of ASCII characters (that is, greater than +127) shall be escaped. This can be useful when saving filenames +whose encoding does not match the one used locally. +.IP \fB\-4\fR 4 +.IX Item "-4" +.PD 0 +.IP \fB\-\-inet4\-only\fR 4 +.IX Item "--inet4-only" +.IP \fB\-6\fR 4 +.IX Item "-6" +.IP \fB\-\-inet6\-only\fR 4 +.IX Item "--inet6-only" +.PD +Force connecting to IPv4 or IPv6 addresses. 
With \fB\-\-inet4\-only\fR
+or \fB\-4\fR, Wget will only connect to IPv4 hosts, ignoring AAAA
+records in DNS, and refusing to connect to IPv6 addresses specified in
+URLs. Conversely, with \fB\-\-inet6\-only\fR or \fB\-6\fR, Wget will
+only connect to IPv6 hosts and ignore A records and IPv4 addresses.
+.Sp
+Neither option should be needed normally. By default, an IPv6\-aware
+Wget will use the address family specified by the host's DNS record.
+If the DNS responds with both IPv4 and IPv6 addresses, Wget will try
+them in sequence until it finds one it can connect to. (Also see the
+\&\f(CW\*(C`\-\-prefer\-family\*(C'\fR option described below.)
+.Sp
+These options can be used to deliberately force the use of IPv4 or
+IPv6 address families on dual-family systems, usually to aid debugging
+or to deal with a broken network configuration. Only one of
+\&\fB\-\-inet6\-only\fR and \fB\-\-inet4\-only\fR may be specified at the
+same time. Neither option is available in Wget compiled without IPv6
+support.
+.IP \fB\-\-prefer\-family=none/IPv4/IPv6\fR 4
+.IX Item "--prefer-family=none/IPv4/IPv6"
+When given a choice of several addresses, connect to the addresses
+with the specified address family first. The address order returned by
+DNS is used without change by default.
+.Sp
+This avoids spurious errors and connect attempts when accessing hosts
+that resolve to both IPv6 and IPv4 addresses from IPv4 networks. For
+example, \fBwww.kame.net\fR resolves to
+\&\fB2001:200:0:8002:203:47ff:fea5:3085\fR and to
+\&\fB203.178.141.194\fR. When the preferred family is \f(CW\*(C`IPv4\*(C'\fR, the
+IPv4 address is used first; when the preferred family is \f(CW\*(C`IPv6\*(C'\fR,
+the IPv6 address is used first; if the specified value is \f(CW\*(C`none\*(C'\fR,
+the address order returned by DNS is used without change.
+.Sp
+Unlike \fB\-4\fR and \fB\-6\fR, this option doesn't inhibit access to
+any address family; it only changes the \fIorder\fR in which the
+addresses are accessed.
Also note that the reordering performed by +this option is \fIstable\fR\-\-\-it doesn't affect order of addresses of +the same family. That is, the relative order of all IPv4 addresses +and of all IPv6 addresses remains intact in all cases. +.IP \fB\-\-retry\-connrefused\fR 4 +.IX Item "--retry-connrefused" +Consider "connection refused" a transient error and try again. +Normally Wget gives up on a URL when it is unable to connect to the +site because failure to connect is taken as a sign that the server is +not running at all and that retries would not help. This option is +for mirroring unreliable sites whose servers tend to disappear for +short periods of time. +.IP \fB\-\-user=\fR\fIuser\fR 4 +.IX Item "--user=user" +.PD 0 +.IP \fB\-\-password=\fR\fIpassword\fR 4 +.IX Item "--password=password" +.PD +Specify the username \fIuser\fR and password \fIpassword\fR for both +FTP and HTTP file retrieval. These parameters can be overridden +using the \fB\-\-ftp\-user\fR and \fB\-\-ftp\-password\fR options for +FTP connections and the \fB\-\-http\-user\fR and \fB\-\-http\-password\fR +options for HTTP connections. +.IP \fB\-\-ask\-password\fR 4 +.IX Item "--ask-password" +Prompt for a password for each connection established. Cannot be specified +when \fB\-\-password\fR is being used, because they are mutually exclusive. +.IP \fB\-\-use\-askpass=\fR\fIcommand\fR 4 +.IX Item "--use-askpass=command" +Prompt for a user and password using the specified command. If no command is +specified then the command in the environment variable WGET_ASKPASS is used. +If WGET_ASKPASS is not set then the command in the environment variable +SSH_ASKPASS is used. +.Sp +You can set the default command for use-askpass in the \fI.wgetrc\fR. That +setting may be overridden from the command line. +.IP \fB\-\-no\-iri\fR 4 +.IX Item "--no-iri" +Turn off internationalized URI (IRI) support. Use \fB\-\-iri\fR to +turn it on. IRI support is activated by default. 
+
.Sp
+You can set the default state of IRI support using the \f(CW\*(C`iri\*(C'\fR
+command in \fI.wgetrc\fR. That setting may be overridden from the
+command line.
+.IP \fB\-\-local\-encoding=\fR\fIencoding\fR 4
+.IX Item "--local-encoding=encoding"
+Force Wget to use \fIencoding\fR as the default system encoding. That affects
+how Wget converts URLs specified as arguments from locale to UTF\-8 for
+IRI support.
+.Sp
+Wget uses the function \f(CWnl_langinfo()\fR and then the \f(CW\*(C`CHARSET\*(C'\fR
+environment variable to get the locale. If that fails, ASCII is used.
+.Sp
+You can set the default local encoding using the \f(CW\*(C`local_encoding\*(C'\fR
+command in \fI.wgetrc\fR. That setting may be overridden from the
+command line.
+.IP \fB\-\-remote\-encoding=\fR\fIencoding\fR 4
+.IX Item "--remote-encoding=encoding"
+Force Wget to use \fIencoding\fR as the default remote server encoding.
+That affects how Wget converts URIs found in files from remote encoding
+to UTF\-8 during a recursive fetch. This option is only useful for
+IRI support, for the interpretation of non-ASCII characters.
+.Sp
+For HTTP, the remote encoding can be found in the HTTP \f(CW\*(C`Content\-Type\*(C'\fR
+header and in the HTML \f(CW\*(C`Content\-Type http\-equiv\*(C'\fR meta tag.
+.Sp
+You can set the default encoding using the \f(CW\*(C`remoteencoding\*(C'\fR
+command in \fI.wgetrc\fR. That setting may be overridden from the
+command line.
+.IP \fB\-\-unlink\fR 4
+.IX Item "--unlink"
+Force Wget to unlink the file instead of clobbering the existing file.
+This option is useful when downloading to a directory with hardlinks.
+.SS "Directory Options"
+.IX Subsection "Directory Options"
+.IP \fB\-nd\fR 4
+.IX Item "-nd"
+.PD 0
+.IP \fB\-\-no\-directories\fR 4
+.IX Item "--no-directories"
+.PD
+Do not create a hierarchy of directories when retrieving recursively.
+With this option turned on, all files will get saved to the current +directory, without clobbering (if a name shows up more than once, the +filenames will get extensions \fB.n\fR). +.IP \fB\-x\fR 4 +.IX Item "-x" +.PD 0 +.IP \fB\-\-force\-directories\fR 4 +.IX Item "--force-directories" +.PD +The opposite of \fB\-nd\fR\-\-\-create a hierarchy of directories, even if +one would not have been created otherwise. E.g. \fBwget \-x +http://fly.srk.fer.hr/robots.txt\fR will save the downloaded file to +\&\fIfly.srk.fer.hr/robots.txt\fR. +.IP \fB\-nH\fR 4 +.IX Item "-nH" +.PD 0 +.IP \fB\-\-no\-host\-directories\fR 4 +.IX Item "--no-host-directories" +.PD +Disable generation of host-prefixed directories. By default, invoking +Wget with \fB\-r http://fly.srk.fer.hr/\fR will create a structure of +directories beginning with \fIfly.srk.fer.hr/\fR. This option disables +such behavior. +.IP \fB\-\-protocol\-directories\fR 4 +.IX Item "--protocol-directories" +Use the protocol name as a directory component of local file names. For +example, with this option, \fBwget \-r http://\fR\fIhost\fR will save to +\&\fBhttp/\fR\fIhost\fR\fB/...\fR rather than just to \fIhost\fR\fB/...\fR. +.IP \fB\-\-cut\-dirs=\fR\fInumber\fR 4 +.IX Item "--cut-dirs=number" +Ignore \fInumber\fR directory components. This is useful for getting a +fine-grained control over the directory where recursive retrieval will +be saved. +.Sp +Take, for example, the directory at +\&\fBftp://ftp.xemacs.org/pub/xemacs/\fR. If you retrieve it with +\&\fB\-r\fR, it will be saved locally under +\&\fIftp.xemacs.org/pub/xemacs/\fR. While the \fB\-nH\fR option can +remove the \fIftp.xemacs.org/\fR part, you are still stuck with +\&\fIpub/xemacs\fR. This is where \fB\-\-cut\-dirs\fR comes in handy; it +makes Wget not "see" \fInumber\fR remote directory components. Here +are several examples of how \fB\-\-cut\-dirs\fR option works. 
+.Sp +.Vb 4 +\& No options \-> ftp.xemacs.org/pub/xemacs/ +\& \-nH \-> pub/xemacs/ +\& \-nH \-\-cut\-dirs=1 \-> xemacs/ +\& \-nH \-\-cut\-dirs=2 \-> . +\& +\& \-\-cut\-dirs=1 \-> ftp.xemacs.org/xemacs/ +\& ... +.Ve +.Sp +If you just want to get rid of the directory structure, this option is +similar to a combination of \fB\-nd\fR and \fB\-P\fR. However, unlike +\&\fB\-nd\fR, \fB\-\-cut\-dirs\fR does not lose with subdirectories\-\-\-for +instance, with \fB\-nH \-\-cut\-dirs=1\fR, a \fIbeta/\fR subdirectory will +be placed to \fIxemacs/beta\fR, as one would expect. +.IP "\fB\-P\fR \fIprefix\fR" 4 +.IX Item "-P prefix" +.PD 0 +.IP \fB\-\-directory\-prefix=\fR\fIprefix\fR 4 +.IX Item "--directory-prefix=prefix" +.PD +Set directory prefix to \fIprefix\fR. The \fIdirectory prefix\fR is the +directory where all other files and subdirectories will be saved to, +i.e. the top of the retrieval tree. The default is \fB.\fR (the +current directory). +.SS "HTTP Options" +.IX Subsection "HTTP Options" +.IP \fB\-\-default\-page=\fR\fIname\fR 4 +.IX Item "--default-page=name" +Use \fIname\fR as the default file name when it isn't known (i.e., for +URLs that end in a slash), instead of \fIindex.html\fR. +.IP \fB\-E\fR 4 +.IX Item "-E" +.PD 0 +.IP \fB\-\-adjust\-extension\fR 4 +.IX Item "--adjust-extension" +.PD +If a file of type \fBapplication/xhtml+xml\fR or \fBtext/html\fR is +downloaded and the URL does not end with the regexp +\&\fB\e.[Hh][Tt][Mm][Ll]?\fR, this option will cause the suffix \fB.html\fR +to be appended to the local filename. This is useful, for instance, when +you're mirroring a remote site that uses \fB.asp\fR pages, but you want +the mirrored pages to be viewable on your stock Apache server. Another +good use for this is when you're downloading CGI-generated materials. A URL +like \fBhttp://site.com/article.cgi?25\fR will be saved as +\&\fIarticle.cgi?25.html\fR. 
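+.Sp
+For example, reusing the placeholder URL above:
+.Sp
+.Vb 1
+\&        wget \-E \*(Aqhttp://site.com/article.cgi?25\*(Aq
+.Ve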
+
.Sp
+Note that filenames changed in this way will be re-downloaded every time
+you re-mirror a site, because Wget can't tell that the local
+\&\fIX.html\fR file corresponds to remote URL \fIX\fR (since
+it doesn't yet know that the URL produces output of type
+\&\fBtext/html\fR or \fBapplication/xhtml+xml\fR).
+.Sp
+As of version 1.12, Wget will also ensure that any downloaded files of
+type \fBtext/css\fR end in the suffix \fB.css\fR, and the option was
+renamed from \fB\-\-html\-extension\fR, to better reflect its new
+behavior. The old option name is still acceptable, but should now be
+considered deprecated.
+.Sp
+As of version 1.19.2, Wget will also ensure that any downloaded files with
+a \f(CW\*(C`Content\-Encoding\*(C'\fR of \fBbr\fR, \fBcompress\fR, \fBdeflate\fR
+or \fBgzip\fR end in the suffix \fB.br\fR, \fB.Z\fR, \fB.zlib\fR
+and \fB.gz\fR respectively.
+.Sp
+At some point in the future, this option may well be expanded to
+include suffixes for other types of content, including content types
+that are not parsed by Wget.
+.IP \fB\-\-http\-user=\fR\fIuser\fR 4
+.IX Item "--http-user=user"
+.PD 0
+.IP \fB\-\-http\-password=\fR\fIpassword\fR 4
+.IX Item "--http-password=password"
+.PD
+Specify the username \fIuser\fR and password \fIpassword\fR on an
+HTTP server. According to the type of the challenge, Wget will
+encode them using either the \f(CW\*(C`basic\*(C'\fR (insecure),
+the \f(CW\*(C`digest\*(C'\fR, or the Windows \f(CW\*(C`NTLM\*(C'\fR authentication scheme.
+.Sp
+Another way to specify username and password is in the URL itself. Either method reveals your password to anyone who
+bothers to run \f(CW\*(C`ps\*(C'\fR. To prevent the passwords from being seen,
+use \fB\-\-use\-askpass\fR or store them in \fI.wgetrc\fR or \fI.netrc\fR,
+and make sure to protect those files from other users with \f(CW\*(C`chmod\*(C'\fR.
If +the passwords are really important, do not leave them lying in those files +either\-\-\-edit the files and delete them after Wget has started the download. +.IP \fB\-\-no\-http\-keep\-alive\fR 4 +.IX Item "--no-http-keep-alive" +Turn off the "keep-alive" feature for HTTP downloads. Normally, Wget +asks the server to keep the connection open so that, when you download +more than one document from the same server, they get transferred over +the same TCP connection. This saves time and at the same time reduces +the load on the server. +.Sp +This option is useful when, for some reason, persistent (keep-alive) +connections don't work for you, for example due to a server bug or due +to the inability of server-side scripts to cope with the connections. +.IP \fB\-\-no\-cache\fR 4 +.IX Item "--no-cache" +Disable server-side cache. In this case, Wget will send the remote +server appropriate directives (\fBCache-Control: no-cache\fR and +\&\fBPragma: no-cache\fR) to get the file from the remote service, +rather than returning the cached version. This is especially useful +for retrieving and flushing out-of-date documents on proxy servers. +.Sp +Caching is allowed by default. +.IP \fB\-\-no\-cookies\fR 4 +.IX Item "--no-cookies" +Disable the use of cookies. Cookies are a mechanism for maintaining +server-side state. The server sends the client a cookie using the +\&\f(CW\*(C`Set\-Cookie\*(C'\fR header, and the client responds with the same cookie +upon further requests. Since cookies allow the server owners to keep +track of visitors and for sites to exchange this information, some +consider them a breach of privacy. The default is to use cookies; +however, \fIstoring\fR cookies is not on by default. +.IP "\fB\-\-load\-cookies\fR \fIfile\fR" 4 +.IX Item "--load-cookies file" +Load cookies from \fIfile\fR before the first HTTP retrieval. +\&\fIfile\fR is a textual file in the format originally used by Netscape's +\&\fIcookies.txt\fR file. 
+.Sp +You will typically use this option when mirroring sites that require +that you be logged in to access some or all of their content. The login +process typically works by the web server issuing an HTTP cookie +upon receiving and verifying your credentials. The cookie is then +resent by the browser when accessing that part of the site, and so +proves your identity. +.Sp +Mirroring such a site requires Wget to send the same cookies your +browser sends when communicating with the site. This is achieved by +\&\fB\-\-load\-cookies\fR\-\-\-simply point Wget to the location of the +\&\fIcookies.txt\fR file, and it will send the same cookies your browser +would send in the same situation. Different browsers keep textual +cookie files in different locations: +.RS 4 +.ie n .IP """Netscape 4.x.""" 4 +.el .IP "\f(CWNetscape 4.x.\fR" 4 +.IX Item "Netscape 4.x." +The cookies are in \fI~/.netscape/cookies.txt\fR. +.ie n .IP """Mozilla and Netscape 6.x.""" 4 +.el .IP "\f(CWMozilla and Netscape 6.x.\fR" 4 +.IX Item "Mozilla and Netscape 6.x." +Mozilla's cookie file is also named \fIcookies.txt\fR, located +somewhere under \fI~/.mozilla\fR, in the directory of your profile. +The full path usually ends up looking somewhat like +\&\fI~/.mozilla/default/some-weird-string/cookies.txt\fR. +.ie n .IP """Internet Explorer.""" 4 +.el .IP "\f(CWInternet Explorer.\fR" 4 +.IX Item "Internet Explorer." +You can produce a cookie file Wget can use by using the File menu, +Import and Export, Export Cookies. This has been tested with Internet +Explorer 5; it is not guaranteed to work with earlier versions. +.ie n .IP """Other browsers.""" 4 +.el .IP "\f(CWOther browsers.\fR" 4 +.IX Item "Other browsers." +If you are using a different browser to create your cookies, +\&\fB\-\-load\-cookies\fR will only work if you can locate or produce a +cookie file in the Netscape format that Wget expects. +.RE +.RS 4 +.Sp +If you cannot use \fB\-\-load\-cookies\fR, there might still be an +alternative. 
If your browser supports a "cookie manager", you can use +it to view the cookies used when accessing the site you're mirroring. +Write down the name and value of the cookie, and manually instruct Wget +to send those cookies, bypassing the "official" cookie support: +.Sp +.Vb 1 +\& wget \-\-no\-cookies \-\-header "Cookie: <name>=<value>" +.Ve +.RE +.IP "\fB\-\-save\-cookies\fR \fIfile\fR" 4 +.IX Item "--save-cookies file" +Save cookies to \fIfile\fR before exiting. This will not save cookies +that have expired or that have no expiry time (so-called "session +cookies"), but also see \fB\-\-keep\-session\-cookies\fR. +.IP \fB\-\-keep\-session\-cookies\fR 4 +.IX Item "--keep-session-cookies" +When specified, causes \fB\-\-save\-cookies\fR to also save session +cookies. Session cookies are normally not saved because they are +meant to be kept in memory and forgotten when you exit the browser. +Saving them is useful on sites that require you to log in or to visit +the home page before you can access some pages. With this option, +multiple Wget runs are considered a single browser session as far as +the site is concerned. +.Sp +Since the cookie file format does not normally carry session cookies, +Wget marks them with an expiry timestamp of 0. Wget's +\&\fB\-\-load\-cookies\fR recognizes those as session cookies, but it might +confuse other browsers. Also note that cookies so loaded will be +treated as other session cookies, which means that if you want +\&\fB\-\-save\-cookies\fR to preserve them again, you must use +\&\fB\-\-keep\-session\-cookies\fR again. +.IP \fB\-\-ignore\-length\fR 4 +.IX Item "--ignore-length" +Unfortunately, some HTTP servers (CGI programs, to be more +precise) send out bogus \f(CW\*(C`Content\-Length\*(C'\fR headers, which makes Wget +go wild, as it thinks not all the document was retrieved. 
You can spot +this syndrome if Wget retries getting the same document again and again, +each time claiming that the (otherwise normal) connection has closed on +the very same byte. +.Sp +With this option, Wget will ignore the \f(CW\*(C`Content\-Length\*(C'\fR header\-\-\-as +if it never existed. +.IP \fB\-\-header=\fR\fIheader-line\fR 4 +.IX Item "--header=header-line" +Send \fIheader-line\fR along with the rest of the headers in each +HTTP request. The supplied header is sent as-is, which means it +must contain name and value separated by colon, and must not contain +newlines. +.Sp +You may define more than one additional header by specifying +\&\fB\-\-header\fR more than once. +.Sp +.Vb 3 +\& wget \-\-header=\*(AqAccept\-Charset: iso\-8859\-2\*(Aq \e +\& \-\-header=\*(AqAccept\-Language: hr\*(Aq \e +\& http://fly.srk.fer.hr/ +.Ve +.Sp +Specification of an empty string as the header value will clear all +previous user-defined headers. +.Sp +As of Wget 1.10, this option can be used to override headers otherwise +generated automatically. This example instructs Wget to connect to +localhost, but to specify \fBfoo.bar\fR in the \f(CW\*(C`Host\*(C'\fR header: +.Sp +.Vb 1 +\& wget \-\-header="Host: foo.bar" http://localhost/ +.Ve +.Sp +In versions of Wget prior to 1.10 such use of \fB\-\-header\fR caused +sending of duplicate headers. +.IP \fB\-\-compression=\fR\fItype\fR 4 +.IX Item "--compression=type" +Choose the type of compression to be used. Legal values are +\&\fBauto\fR, \fBgzip\fR and \fBnone\fR. +.Sp +If \fBauto\fR or \fBgzip\fR are specified, Wget asks the server to +compress the file using the gzip compression format. If the server +compresses the file and responds with the \f(CW\*(C`Content\-Encoding\*(C'\fR +header field set appropriately, the file will be decompressed +automatically. +.Sp +If \fBnone\fR is specified, wget will not ask the server to compress +the file and will not decompress any server responses. This is the default. 
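+.Sp
+For example, to request a gzip\-compressed transfer that is decompressed
+automatically on receipt (the URL is a placeholder):
+.Sp
+.Vb 1
+\&        wget \-\-compression=gzip https://example.com/page.html
+.Ve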
+.Sp +Compression support is currently experimental. In case it is turned on, +please report any bugs to \f(CW\*(C`bug\-wget@gnu.org\*(C'\fR. +.IP \fB\-\-max\-redirect=\fR\fInumber\fR 4 +.IX Item "--max-redirect=number" +Specifies the maximum number of redirections to follow for a resource. +The default is 20, which is usually far more than necessary. However, on +those occasions where you want to allow more (or fewer), this is the +option to use. +.IP \fB\-\-proxy\-user=\fR\fIuser\fR 4 +.IX Item "--proxy-user=user" +.PD 0 +.IP \fB\-\-proxy\-password=\fR\fIpassword\fR 4 +.IX Item "--proxy-password=password" +.PD +Specify the username \fIuser\fR and password \fIpassword\fR for +authentication on a proxy server. Wget will encode them using the +\&\f(CW\*(C`basic\*(C'\fR authentication scheme. +.Sp +Security considerations similar to those with \fB\-\-http\-password\fR +pertain here as well. +.IP \fB\-\-referer=\fR\fIurl\fR 4 +.IX Item "--referer=url" +Include `Referer: \fIurl\fR' header in HTTP request. Useful for +retrieving documents with server-side processing that assume they are +always being retrieved by interactive web browsers and only come out +properly when Referer is set to one of the pages that point to them. +.IP \fB\-\-save\-headers\fR 4 +.IX Item "--save-headers" +Save the headers sent by the HTTP server to the file, preceding the +actual contents, with an empty line as the separator. +.IP "\fB\-U\fR \fIagent-string\fR" 4 +.IX Item "-U agent-string" +.PD 0 +.IP \fB\-\-user\-agent=\fR\fIagent-string\fR 4 +.IX Item "--user-agent=agent-string" +.PD +Identify as \fIagent-string\fR to the HTTP server. +.Sp +The HTTP protocol allows the clients to identify themselves using a +\&\f(CW\*(C`User\-Agent\*(C'\fR header field. This enables distinguishing the +WWW software, usually for statistical purposes or for tracing of +protocol violations. Wget normally identifies as +\&\fBWget/\fR\fIversion\fR, \fIversion\fR being the current version +number of Wget. 
+.Sp +However, some sites have been known to impose the policy of tailoring +the output according to the \f(CW\*(C`User\-Agent\*(C'\fR\-supplied information. +While this is not such a bad idea in theory, it has been abused by +servers denying information to clients other than (historically) +Netscape or, more frequently, Microsoft Internet Explorer. This +option allows you to change the \f(CW\*(C`User\-Agent\*(C'\fR line issued by Wget. +Use of this option is discouraged, unless you really know what you are +doing. +.Sp +Specifying empty user agent with \fB\-\-user\-agent=""\fR instructs Wget +not to send the \f(CW\*(C`User\-Agent\*(C'\fR header in HTTP requests. +.IP \fB\-\-post\-data=\fR\fIstring\fR 4 +.IX Item "--post-data=string" +.PD 0 +.IP \fB\-\-post\-file=\fR\fIfile\fR 4 +.IX Item "--post-file=file" +.PD +Use POST as the method for all HTTP requests and send the specified +data in the request body. \fB\-\-post\-data\fR sends \fIstring\fR as +data, whereas \fB\-\-post\-file\fR sends the contents of \fIfile\fR. +Other than that, they work in exactly the same way. In particular, +they \fIboth\fR expect content of the form \f(CW\*(C`key1=value1&key2=value2\*(C'\fR, +with percent-encoding for special characters; the only difference is +that one expects its content as a command-line parameter and the other +accepts its content from a file. In particular, \fB\-\-post\-file\fR is +\&\fInot\fR for transmitting files as form attachments: those must +appear as \f(CW\*(C`key=value\*(C'\fR data (with appropriate percent-coding) just +like everything else. Wget does not currently support +\&\f(CW\*(C`multipart/form\-data\*(C'\fR for transmitting POST data; only +\&\f(CW\*(C`application/x\-www\-form\-urlencoded\*(C'\fR. Only one of +\&\fB\-\-post\-data\fR and \fB\-\-post\-file\fR should be specified. +.Sp +Please note that wget does not require the content to be of the form +\&\f(CW\*(C`key1=value1&key2=value2\*(C'\fR, and neither does it test for it. 
Wget will +simply transmit whatever data is provided to it. Most servers however expect +the POST data to be in the above format when processing HTML Forms. +.Sp +When sending a POST request using the \fB\-\-post\-file\fR option, Wget treats +the file as a binary file and will send every character in the POST request +without stripping trailing newline or formfeed characters. Any other control +characters in the text will also be sent as-is in the POST request. +.Sp +Please be aware that Wget needs to know the size of the POST data in +advance. Therefore the argument to \f(CW\*(C`\-\-post\-file\*(C'\fR must be a regular +file; specifying a FIFO or something like \fI/dev/stdin\fR won't work. +It's not quite clear how to work around this limitation inherent in +HTTP/1.0. Although HTTP/1.1 introduces \fIchunked\fR transfer that +doesn't require knowing the request length in advance, a client can't +use chunked unless it knows it's talking to an HTTP/1.1 server. And it +can't know that until it receives a response, which in turn requires the +request to have been completed \-\- a chicken-and-egg problem. +.Sp +Note: As of version 1.15 if Wget is redirected after the POST request is +completed, its behaviour will depend on the response code returned by the +server. In case of a 301 Moved Permanently, 302 Moved Temporarily or +307 Temporary Redirect, Wget will, in accordance with RFC2616, continue +to send a POST request. +In case a server wants the client to change the Request method upon +redirection, it should send a 303 See Other response code. +.Sp +This example shows how to log in to a server using POST and then proceed to +download the desired pages, presumably only accessible to authorized +users: +.Sp +.Vb 4 +\& # Log in to the server. This can be done only once. +\& wget \-\-save\-cookies cookies.txt \e +\& \-\-post\-data \*(Aquser=foo&password=bar\*(Aq \e +\& http://example.com/auth.php +\& +\& # Now grab the page or pages we care about. 
+\& wget \-\-load\-cookies cookies.txt \e +\& \-p http://example.com/interesting/article.php +.Ve +.Sp +If the server is using session cookies to track user authentication, +the above will not work because \fB\-\-save\-cookies\fR will not save +them (and neither will browsers) and the \fIcookies.txt\fR file will +be empty. In that case use \fB\-\-keep\-session\-cookies\fR along with +\&\fB\-\-save\-cookies\fR to force saving of session cookies. +.IP \fB\-\-method=\fR\fIHTTP-Method\fR 4 +.IX Item "--method=HTTP-Method" +For the purpose of RESTful scripting, Wget allows sending of other HTTP Methods +without the need to explicitly set them using \fB\-\-header=Header\-Line\fR. +Wget will use whatever string is passed to it after \fB\-\-method\fR as the HTTP +Method to the server. +.IP \fB\-\-body\-data=\fR\fIData-String\fR 4 +.IX Item "--body-data=Data-String" +.PD 0 +.IP \fB\-\-body\-file=\fR\fIData-File\fR 4 +.IX Item "--body-file=Data-File" +.PD +Must be set when additional data needs to be sent to the server along with the +Method specified using \fB\-\-method\fR. \fB\-\-body\-data\fR sends \fIstring\fR as +data, whereas \fB\-\-body\-file\fR sends the contents of \fIfile\fR. Other than that, +they work in exactly the same way. +.Sp +Currently, \fB\-\-body\-file\fR is \fInot\fR for transmitting files as a whole. +Wget does not currently support \f(CW\*(C`multipart/form\-data\*(C'\fR for transmitting data; +only \f(CW\*(C`application/x\-www\-form\-urlencoded\*(C'\fR. In the future, this may be changed +so that wget sends the \fB\-\-body\-file\fR as a complete file instead of sending its +contents to the server. Please be aware that Wget needs to know the contents of +BODY Data in advance, and hence the argument to \fB\-\-body\-file\fR should be a +regular file. See \fB\-\-post\-file\fR for a more detailed explanation. +Only one of \fB\-\-body\-data\fR and \fB\-\-body\-file\fR should be specified. 
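+.Sp
+For example, a PUT request carrying a JSON body might look like this (the
+URL and file name are placeholders):
+.Sp
+.Vb 3
+\&        wget \-\-method=PUT \-\-body\-file=data.json \e
+\&             \-\-header=\*(AqContent\-Type: application/json\*(Aq \e
+\&             https://example.com/api/resource
+.Ve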
+
.Sp
+If Wget is redirected after the request is completed, Wget will
+suspend the current method and send a GET request until the redirection
+is completed. This is true for all redirection response codes except
+307 Temporary Redirect, which is used to explicitly specify that the
+request method should \fInot\fR change. Another exception is when
+the method is set to \f(CW\*(C`POST\*(C'\fR, in which case the redirection rules
+specified under \fB\-\-post\-data\fR are followed.
+.IP \fB\-\-content\-disposition\fR 4
+.IX Item "--content-disposition"
+If this is set to on, experimental (not fully-functional) support for
+\&\f(CW\*(C`Content\-Disposition\*(C'\fR headers is enabled. This can currently result in
+extra round-trips to the server for a \f(CW\*(C`HEAD\*(C'\fR request, and is known
+to suffer from a few bugs, which is why it is not currently enabled by default.
+.Sp
+This option is useful for some file-downloading CGI programs that use
+\&\f(CW\*(C`Content\-Disposition\*(C'\fR headers to describe what the name of a
+downloaded file should be.
+.Sp
+When combined with \fB\-\-metalink\-over\-http\fR and \fB\-\-trust\-server\-names\fR,
+a \fBContent-Type: application/metalink4+xml\fR file is named using the
+\&\f(CW\*(C`Content\-Disposition\*(C'\fR filename field, if available.
+.IP \fB\-\-content\-on\-error\fR 4
+.IX Item "--content-on-error"
+If this is set to on, wget will not skip the content when the server responds
+with an HTTP status code that indicates an error.
+.IP \fB\-\-trust\-server\-names\fR 4
+.IX Item "--trust-server-names"
+If this is set, on a redirect, the local file name will be based
+on the redirection URL. By default the local file name is based on
+the original URL. When doing recursive retrieval this can be helpful
+because in many web sites redirected URLs correspond to an underlying
+file structure, while link URLs do not.
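+.Sp
+For example (the URL is a placeholder), to name the local file after the
+final redirected URL rather than the original request URL:
+.Sp
+.Vb 1
+\& wget \-\-trust\-server\-names http://example.com/latest/download
+.Ve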
+
.IP \fB\-\-auth\-no\-challenge\fR 4
+.IX Item "--auth-no-challenge"
+If this option is given, Wget will send Basic HTTP authentication
+information (plaintext username and password) for all requests, just
+like Wget 1.10.2 and prior did by default.
+.Sp
+Use of this option is not recommended, and is intended only to support
+a few obscure servers that never send HTTP authentication
+challenges, but accept unsolicited auth info, say, in addition to
+form-based authentication.
+.IP \fB\-\-retry\-on\-host\-error\fR 4
+.IX Item "--retry-on-host-error"
+Consider host errors, such as "Temporary failure in name resolution",
+as non-fatal, transient errors.
+.IP \fB\-\-retry\-on\-http\-error=\fR\fIcode[,code,...]\fR 4
+.IX Item "--retry-on-http-error=code[,code,...]"
+Consider the given HTTP response codes as non-fatal, transient errors.
+Supply a comma-separated list of 3\-digit HTTP response codes as
+argument. Useful to work around special circumstances where retries
+are required, but the server responds with an error code normally not
+retried by Wget. Such errors might be 503 (Service Unavailable) and
+429 (Too Many Requests). Retries enabled by this option are performed
+subject to the normal retry timing and retry count limitations of
+Wget.
+.Sp
+This option is intended to support special use cases only and is
+generally not recommended, as it can force retries even in cases where
+the server is actually trying to decrease its load. Please use it wisely
+and only if you know what you are doing.
+.SS "HTTPS (SSL/TLS) Options"
+.IX Subsection "HTTPS (SSL/TLS) Options"
+To support encrypted HTTP (HTTPS) downloads, Wget must be compiled
+with an external SSL library. The current default is GnuTLS.
+In addition, Wget also supports HSTS (HTTP Strict Transport Security).
+If Wget is compiled without SSL support, none of these options are available.
+
.IP \fB\-\-secure\-protocol=\fR\fIprotocol\fR 4
+.IX Item "--secure-protocol=protocol"
+Choose the secure protocol to be used. Legal values are \fBauto\fR,
+\&\fBSSLv2\fR, \fBSSLv3\fR, \fBTLSv1\fR, \fBTLSv1_1\fR, \fBTLSv1_2\fR,
+\&\fBTLSv1_3\fR and \fBPFS\fR. If \fBauto\fR is used, the SSL library is
+given the liberty of choosing the appropriate protocol automatically, which is
+achieved by sending a TLSv1 greeting. This is the default.
+.Sp
+Specifying \fBSSLv2\fR, \fBSSLv3\fR, \fBTLSv1\fR, \fBTLSv1_1\fR,
+\&\fBTLSv1_2\fR or \fBTLSv1_3\fR forces the use of the corresponding
+protocol. This is useful when talking to old and buggy SSL server
+implementations that make it hard for the underlying SSL library to choose
+the correct protocol version. Fortunately, such servers are quite rare.
+.Sp
+Specifying \fBPFS\fR enforces the use of the so-called Perfect Forward
+Secrecy cipher suites. In short, PFS adds security by creating a one-time
+key for each SSL connection. It has a bit more CPU impact on client and server.
+Ciphers known to be secure (e.g. no MD4) and the TLS protocol are used. This mode
+also explicitly excludes non-PFS key exchange methods, such as RSA.
+.IP \fB\-\-https\-only\fR 4
+.IX Item "--https-only"
+When in recursive mode, only HTTPS links are followed.
+.IP \fB\-\-ciphers\fR 4
+.IX Item "--ciphers"
+Set the cipher list string. Typically this string sets the
+cipher suites and other SSL/TLS options that the user wishes to be used, in a
+set order of preference (GnuTLS calls it 'priority string'). This string
+will be fed verbatim to the SSL/TLS engine (OpenSSL or GnuTLS) and hence
+its format and syntax depend on that engine. Wget will not process or manipulate it
+in any way. Refer to the OpenSSL or GnuTLS documentation for more information.
+.IP \fB\-\-no\-check\-certificate\fR 4
+.IX Item "--no-check-certificate"
+Don't check the server certificate against the available certificate
+authorities.
Also don't require the URL host name to match the common
+name presented by the certificate.
+.Sp
+As of Wget 1.10, the default is to verify the server's certificate
+against the recognized certificate authorities, breaking the SSL
+handshake and aborting the download if the verification fails.
+Although this provides more secure downloads, it does break
+interoperability with some sites that worked with previous Wget
+versions, particularly those using self-signed, expired, or otherwise
+invalid certificates. This option forces an "insecure" mode of
+operation that turns the certificate verification errors into warnings
+and allows you to proceed.
+.Sp
+If you encounter "certificate verification" errors or ones saying
+that "common name doesn't match requested host name", you can use
+this option to bypass the verification and proceed with the download.
+\&\fIOnly use this option if you are otherwise convinced of the
+site's authenticity, or if you really don't care about the validity of
+its certificate.\fR It is almost always a bad idea not to check the
+certificates when transmitting confidential or important data.
+For self\-signed/internal certificates, you should download the certificate
+and verify against that instead of forcing this insecure mode.
+If you are certain that you do not want any certificate verification, you
+can specify \-\-check\-certificate=quiet to tell Wget not to print any
+warning about invalid certificates, although in most cases this is the
+wrong thing to do.
+.IP \fB\-\-certificate=\fR\fIfile\fR 4
+.IX Item "--certificate=file"
+Use the client certificate stored in \fIfile\fR. This is needed for
+servers that are configured to require certificates from the clients
+that connect to them. Normally a certificate is not required and this
+switch is optional.
+.IP \fB\-\-certificate\-type=\fR\fItype\fR 4
+.IX Item "--certificate-type=type"
+Specify the type of the client certificate.
Legal values are
+\&\fBPEM\fR (assumed by default) and \fBDER\fR, also known as
+\&\fBASN1\fR.
+.IP \fB\-\-private\-key=\fR\fIfile\fR 4
+.IX Item "--private-key=file"
+Read the private key from \fIfile\fR. This allows you to provide the
+private key in a file separate from the certificate.
+.IP \fB\-\-private\-key\-type=\fR\fItype\fR 4
+.IX Item "--private-key-type=type"
+Specify the type of the private key. Accepted values are \fBPEM\fR
+(the default) and \fBDER\fR.
+.IP \fB\-\-ca\-certificate=\fR\fIfile\fR 4
+.IX Item "--ca-certificate=file"
+Use \fIfile\fR as the file with the bundle of certificate authorities
+("CA") to verify the peers. The certificates must be in PEM format.
+.Sp
+Without this option Wget looks for CA certificates at the
+system-specified locations, chosen at OpenSSL installation time.
+.IP \fB\-\-ca\-directory=\fR\fIdirectory\fR 4
+.IX Item "--ca-directory=directory"
+Specifies a directory containing CA certificates in PEM format. Each
+file contains one CA certificate, and the file name is based on a hash
+value derived from the certificate. This is achieved by processing a
+certificate directory with the \f(CW\*(C`c_rehash\*(C'\fR utility supplied with
+OpenSSL. Using \fB\-\-ca\-directory\fR is more efficient than
+\&\fB\-\-ca\-certificate\fR when many certificates are installed because
+it allows Wget to fetch certificates on demand.
+.Sp
+Without this option Wget looks for CA certificates at the
+system-specified locations, chosen at OpenSSL installation time.
+.IP \fB\-\-crl\-file=\fR\fIfile\fR 4
+.IX Item "--crl-file=file"
+Specifies a CRL file in \fIfile\fR. This is needed for certificates
+that have been revoked by the CAs.
+.IP \fB\-\-pinnedpubkey=file/hashes\fR 4
+.IX Item "--pinnedpubkey=file/hashes"
+Tells wget to use the specified public key file (or hashes) to verify the peer.
+
This can be a path to a file which contains a single public key in PEM or DER
+format, or any number of base64-encoded SHA256 hashes preceded by "sha256//"
+and separated by ";".
+.Sp
+When negotiating a TLS or SSL connection, the server sends a certificate
+indicating its identity. A public key is extracted from this certificate and if
+it does not exactly match the public key(s) provided to this option, wget will
+abort the connection before sending or receiving any data.
+.IP \fB\-\-random\-file=\fR\fIfile\fR 4
+.IX Item "--random-file=file"
+[OpenSSL and LibreSSL only]
+Use \fIfile\fR as the source of random data for seeding the
+pseudo-random number generator on systems without \fI/dev/urandom\fR.
+.Sp
+On such systems the SSL library needs an external source of randomness
+to initialize. Randomness may be provided by EGD (see
+\&\fB\-\-egd\-file\fR below) or read from an external source specified by
+the user. If this option is not specified, Wget looks for random data
+in \f(CW$RANDFILE\fR or, if that is unset, in \fR\f(CI$HOME\fR\fI/.rnd\fR.
+.Sp
+If you're getting the "Could not seed OpenSSL PRNG; disabling SSL."
+error, you should provide random data using some of the methods
+described above.
+.IP \fB\-\-egd\-file=\fR\fIfile\fR 4
+.IX Item "--egd-file=file"
+[OpenSSL only]
+Use \fIfile\fR as the EGD socket. EGD stands for \fIEntropy
+Gathering Daemon\fR, a user-space program that collects data from
+various unpredictable system sources and makes it available to other
+programs that might need it. Encryption software, such as the SSL
+library, needs sources of non-repeating randomness to seed the random
+number generator used to produce cryptographically strong keys.
+.Sp
+OpenSSL allows the user to specify their own source of entropy using the
+\&\f(CW\*(C`RAND_FILE\*(C'\fR environment variable.
If this variable is unset, or
+if the specified file does not produce enough randomness, OpenSSL will
+read random data from the EGD socket specified using this option.
+.Sp
+If this option is not specified (and the equivalent startup command is
+not used), EGD is never contacted. EGD is not needed on modern Unix
+systems that support \fI/dev/urandom\fR.
+.IP \fB\-\-no\-hsts\fR 4
+.IX Item "--no-hsts"
+Wget supports HSTS (HTTP Strict Transport Security, RFC 6797) by default.
+Use \fB\-\-no\-hsts\fR to make Wget act as a non-HSTS-compliant UA. As a
+consequence, Wget would ignore all the \f(CW\*(C`Strict\-Transport\-Security\*(C'\fR
+headers, and would not enforce any existing HSTS policy.
+.IP \fB\-\-hsts\-file=\fR\fIfile\fR 4
+.IX Item "--hsts-file=file"
+By default, Wget stores its HSTS database in \fI~/.wget\-hsts\fR.
+You can use \fB\-\-hsts\-file\fR to override this. Wget will use
+the supplied file as the HSTS database. Such a file must conform to the
+correct HSTS database format used by Wget. If Wget cannot parse the provided
+file, the behaviour is unspecified.
+.Sp
+Wget's HSTS database is a plain text file. Each line contains an HSTS entry
+(i.e. a site that has issued a \f(CW\*(C`Strict\-Transport\-Security\*(C'\fR header and that
+therefore has specified a concrete HSTS policy to be applied). Lines starting with
+a hash (\f(CW\*(C`#\*(C'\fR) are ignored by Wget. Please note that in spite of this convenient
+human readability, hand-hacking the HSTS database is generally not a good idea.
+.Sp
+An HSTS entry line consists of several fields separated by one or more whitespace
+characters:
+.Sp
+\&\f(CW\*(C`<hostname> SP [<port>] SP <include subdomains> SP <created> SP <max\-age>\*(C'\fR
+.Sp
+The \fIhostname\fR and \fIport\fR fields indicate the hostname and port to which
+the given HSTS policy applies. The \fIport\fR field may be zero, and in
+most cases it will be.
That means that the port number will not be taken into account
+when deciding whether such an HSTS policy should be applied on a given request (only
+the hostname will be evaluated). When \fIport\fR is different from zero, both the
+target hostname and the port will be evaluated and the HSTS policy will only be applied
+if both of them match. This feature has been included for testing/development purposes only.
+The Wget testsuite (in \fItestenv/\fR) creates HSTS databases with explicit ports
+with the purpose of ensuring Wget's correct behaviour. Applying HSTS policies to ports
+other than the default ones is discouraged by RFC 6797 (see Appendix B "Differences
+between HSTS Policy and Same-Origin Policy"). Thus, this functionality should not be used
+in production environments and \fIport\fR will typically be zero. The last three fields
+are as follows. The field \fIinclude_subdomains\fR can either be \f(CW1\fR
+or \f(CW0\fR and it signals whether the subdomains of the target domain should be
+part of the given HSTS policy as well. The \fIcreated\fR and \fImax-age\fR fields
+hold the timestamp values of when such an entry was created (first seen by Wget) and the
+HSTS-defined value 'max\-age', which states how long that HSTS policy should remain active,
+measured in seconds elapsed since the timestamp stored in \fIcreated\fR. Once that time
+has passed, that HSTS policy will no longer be valid and will eventually be removed
+from the database.
+.Sp
+If you supply your own HSTS database via \fB\-\-hsts\-file\fR, be aware that Wget
+may modify the provided file if any change occurs between the HSTS policies
+requested by the remote servers and those in the file. When Wget exits,
+it effectively updates the HSTS database by rewriting the database file with the new entries.
+.Sp
+If the supplied file does not exist, Wget will create one. This file will contain the new HSTS
+entries.
If no HSTS entries were generated (no \f(CW\*(C`Strict\-Transport\-Security\*(C'\fR headers
+were sent by any of the servers) then no file will be created, not even an empty one. This
+behaviour applies to the default database file (\fI~/.wget\-hsts\fR) as well: it will not be
+created until some server enforces an HSTS policy.
+.Sp
+Care is taken not to overwrite possible changes made by other Wget processes at
+the same time over the HSTS database. Before dumping the updated HSTS entries
+to the file, Wget will re-read it and merge the changes.
+.Sp
+Using a custom HSTS database and/or modifying an existing one is discouraged.
+For more information about the potential security threats arising from such a practice,
+see section 14 "Security Considerations" of RFC 6797, especially section 14.9
+"Creative Manipulation of HSTS Policy Store".
+.IP \fB\-\-warc\-file=\fR\fIfile\fR 4
+.IX Item "--warc-file=file"
+Use \fIfile\fR as the destination WARC file.
+.IP \fB\-\-warc\-header=\fR\fIstring\fR 4
+.IX Item "--warc-header=string"
+Insert \fIstring\fR into the warcinfo record.
+.IP \fB\-\-warc\-max\-size=\fR\fIsize\fR 4
+.IX Item "--warc-max-size=size"
+Set the maximum size of the WARC files to \fIsize\fR.
+.IP \fB\-\-warc\-cdx\fR 4
+.IX Item "--warc-cdx"
+Write CDX index files.
+.IP \fB\-\-warc\-dedup=\fR\fIfile\fR 4
+.IX Item "--warc-dedup=file"
+Do not store records listed in this CDX file.
+.IP \fB\-\-no\-warc\-compression\fR 4
+.IX Item "--no-warc-compression"
+Do not compress WARC files with GZIP.
+.IP \fB\-\-no\-warc\-digests\fR 4
+.IX Item "--no-warc-digests"
+Do not calculate SHA1 digests.
+.IP \fB\-\-no\-warc\-keep\-log\fR 4
+.IX Item "--no-warc-keep-log"
+Do not store the log file in a WARC record.
+.IP \fB\-\-warc\-tempdir=\fR\fIdir\fR 4
+.IX Item "--warc-tempdir=dir"
+Specify the location for temporary files created by the WARC writer.
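+.Sp
+As a hypothetical illustration (the URL and file name prefix are placeholders),
+several of the WARC options above can be combined to archive a page along with
+a CDX index:
+.Sp
+.Vb 2
+\& wget \-\-warc\-file=example \-\-warc\-cdx \e
+\&      \-\-no\-warc\-compression http://example.com/
+.Ve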
+.SS "FTP Options" +.IX Subsection "FTP Options" +.IP \fB\-\-ftp\-user=\fR\fIuser\fR 4 +.IX Item "--ftp-user=user" +.PD 0 +.IP \fB\-\-ftp\-password=\fR\fIpassword\fR 4 +.IX Item "--ftp-password=password" +.PD +Specify the username \fIuser\fR and password \fIpassword\fR on an +FTP server. Without this, or the corresponding startup option, +the password defaults to \fB\-wget@\fR, normally used for anonymous +FTP. +.Sp +Another way to specify username and password is in the URL itself. Either method reveals your password to anyone who +bothers to run \f(CW\*(C`ps\*(C'\fR. To prevent the passwords from being seen, +store them in \fI.wgetrc\fR or \fI.netrc\fR, and make sure to protect +those files from other users with \f(CW\*(C`chmod\*(C'\fR. If the passwords are +really important, do not leave them lying in those files either\-\-\-edit +the files and delete them after Wget has started the download. +.IP \fB\-\-no\-remove\-listing\fR 4 +.IX Item "--no-remove-listing" +Don't remove the temporary \fI.listing\fR files generated by FTP +retrievals. Normally, these files contain the raw directory listings +received from FTP servers. Not removing them can be useful for +debugging purposes, or when you want to be able to easily check on the +contents of remote server directories (e.g. to verify that a mirror +you're running is complete). +.Sp +Note that even though Wget writes to a known filename for this file, +this is not a security hole in the scenario of a user making +\&\fI.listing\fR a symbolic link to \fI/etc/passwd\fR or something and +asking \f(CW\*(C`root\*(C'\fR to run Wget in his or her directory. Depending on +the options used, either Wget will refuse to write to \fI.listing\fR, +making the globbing/recursion/time\-stamping operation fail, or the +symbolic link will be deleted and replaced with the actual +\&\fI.listing\fR file, or the listing will be written to a +\&\fI.listing.number\fR file. 
+
.Sp
+Even though this situation isn't a problem, \f(CW\*(C`root\*(C'\fR should
+never run Wget in a non-trusted user's directory. A user could do
+something as simple as linking \fIindex.html\fR to \fI/etc/passwd\fR
+and asking \f(CW\*(C`root\*(C'\fR to run Wget with \fB\-N\fR or \fB\-r\fR so the file
+will be overwritten.
+.IP \fB\-\-no\-glob\fR 4
+.IX Item "--no-glob"
+Turn off FTP globbing. Globbing refers to the use of shell-like
+special characters (\fIwildcards\fR), like \fB*\fR, \fB?\fR, \fB[\fR
+and \fB]\fR to retrieve more than one file from the same directory at
+once, like:
+.Sp
+.Vb 1
+\& wget ftp://gnjilux.srk.fer.hr/*.msg
+.Ve
+.Sp
+By default, globbing will be turned on if the URL contains a
+globbing character. This option may be used to turn globbing on or off
+permanently.
+.Sp
+You may have to quote the URL to protect it from being expanded by
+your shell. Globbing makes Wget look for a directory listing, which is
+system-specific. This is why it currently works only with Unix FTP
+servers (and the ones emulating Unix \f(CW\*(C`ls\*(C'\fR output).
+.IP \fB\-\-no\-passive\-ftp\fR 4
+.IX Item "--no-passive-ftp"
+Disable the use of the \fIpassive\fR FTP transfer mode. Passive FTP
+mandates that the client connect to the server to establish the data
+connection rather than the other way around.
+.Sp
+If the machine is connected to the Internet directly, both passive and
+active FTP should work equally well. Behind most firewall and NAT
+configurations passive FTP has a better chance of working. However,
+in some rare firewall configurations, active FTP actually works when
+passive FTP doesn't. If you suspect this to be the case, use this
+option, or set \f(CW\*(C`passive_ftp=off\*(C'\fR in your init file.
+.IP \fB\-\-preserve\-permissions\fR 4
+.IX Item "--preserve-permissions"
+Preserve remote file permissions instead of permissions set by umask.
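+.Sp
+For example (the host and path are placeholders), to retrieve a directory
+recursively while keeping the remote permission bits:
+.Sp
+.Vb 1
+\& wget \-r \-\-preserve\-permissions ftp://ftp.example.com/pub/
+.Ve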
+
.IP \fB\-\-retr\-symlinks\fR 4
+.IX Item "--retr-symlinks"
+By default, when retrieving FTP directories recursively and a symbolic link
+is encountered, the symbolic link is traversed and the pointed-to files are
+retrieved. Currently, Wget does not traverse symbolic links to directories to
+download them recursively, though this feature may be added in the future.
+.Sp
+When \fB\-\-retr\-symlinks=no\fR is specified, the linked-to file is not
+downloaded. Instead, a matching symbolic link is created on the local
+file system. The pointed-to file will not be retrieved unless this recursive
+retrieval would have encountered it separately and downloaded it anyway. This
+option poses a security risk where a malicious FTP server may cause Wget to
+write to files outside of the intended directories through a specially crafted
+\&.LISTING file.
+.Sp
+Note that when retrieving a file (not a directory) because it was
+specified on the command-line, rather than because it was recursed to,
+this option has no effect. Symbolic links are always traversed in this
+case.
+.SS "FTPS Options"
+.IX Subsection "FTPS Options"
+.IP \fB\-\-ftps\-implicit\fR 4
+.IX Item "--ftps-implicit"
+This option tells Wget to use FTPS implicitly. Implicit FTPS consists of initializing
+SSL/TLS from the very beginning of the control connection. This option does not send
+an \f(CW\*(C`AUTH TLS\*(C'\fR command: it assumes the server speaks FTPS and directly starts an
+SSL/TLS connection. If the attempt is successful, the session continues just like
+regular FTPS (\f(CW\*(C`PBSZ\*(C'\fR and \f(CW\*(C`PROT\*(C'\fR are sent, etc.).
+Implicit FTPS is no longer a requirement for FTPS implementations, and thus
+many servers may not support it. If \fB\-\-ftps\-implicit\fR is passed and no explicit
+port number is specified, the default port for implicit FTPS, 990, will be used, instead
+of the default port for the "normal" (explicit) FTPS which is the same as that of FTP,
+21.
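+.Sp
+For instance (the host name is a placeholder), to retrieve a file over
+implicit FTPS on the default port 990:
+.Sp
+.Vb 1
+\& wget \-\-ftps\-implicit ftps://ftps.example.com/file.txt
+.Ve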
+
.IP \fB\-\-no\-ftps\-resume\-ssl\fR 4
+.IX Item "--no-ftps-resume-ssl"
+Do not resume the SSL/TLS session in the data channel. When starting a data connection,
+Wget tries to resume the SSL/TLS session previously started in the control connection.
+SSL/TLS session resumption avoids performing an entirely new handshake by reusing
+the SSL/TLS parameters of a previous session. Typically, FTPS servers want it that way,
+so Wget does this by default. Under rare circumstances however, one might want to
+start an entirely new SSL/TLS session in every data connection.
+This is what \fB\-\-no\-ftps\-resume\-ssl\fR is for.
+.IP \fB\-\-ftps\-clear\-data\-connection\fR 4
+.IX Item "--ftps-clear-data-connection"
+All the data connections will be in plain text. Only the control connection will be
+under SSL/TLS. Wget will send a \f(CW\*(C`PROT C\*(C'\fR command to achieve this, which must be
+approved by the server.
+.IP \fB\-\-ftps\-fallback\-to\-ftp\fR 4
+.IX Item "--ftps-fallback-to-ftp"
+Fall back to FTP if FTPS is not supported by the target server. For security reasons,
+this option is not asserted by default. The default behaviour is to exit with an error.
+If a server does not successfully reply to the initial \f(CW\*(C`AUTH TLS\*(C'\fR command, or in the
+case of implicit FTPS, if the initial SSL/TLS connection attempt is rejected, such a
+server is considered not to support FTPS.
+.SS "Recursive Retrieval Options"
+.IX Subsection "Recursive Retrieval Options"
+.IP \fB\-r\fR 4
+.IX Item "-r"
+.PD 0
+.IP \fB\-\-recursive\fR 4
+.IX Item "--recursive"
+.PD
+Turn on recursive retrieving. The default maximum depth is 5.
+.IP "\fB\-l\fR \fIdepth\fR" 4
+.IX Item "-l depth"
+.PD 0
+.IP \fB\-\-level=\fR\fIdepth\fR 4
+.IX Item "--level=depth"
+.PD
+Set the maximum number of subdirectories that Wget will recurse into to \fIdepth\fR.
+
In order to prevent one from accidentally downloading very large websites when using recursion,
+this is limited to a depth of 5 by default, i.e., it will traverse at most 5 directories deep
+starting from the provided URL.
+Set \fB\-l 0\fR or \fB\-l inf\fR for infinite recursion depth.
+.Sp
+.Vb 1
+\& wget \-r \-l 0 http://<site>/1.html
+.Ve
+.Sp
+Ideally, one would expect this to download just \fI1.html\fR,
+but unfortunately this is not the case, because \fB\-l 0\fR is equivalent to
+\&\fB\-l inf\fR\-\-\-that is, infinite recursion. To download a single HTML
+page (or a handful of them), specify them all on the command line and leave off \fB\-r\fR
+and \fB\-l\fR. To download the essential items to view a single HTML page, see \fBpage requisites\fR.
+.IP \fB\-\-delete\-after\fR 4
+.IX Item "--delete-after"
+This option tells Wget to delete every single file it downloads,
+\&\fIafter\fR having done so. It is useful for pre-fetching popular
+pages through a proxy, e.g.:
+.Sp
+.Vb 1
+\& wget \-r \-nd \-\-delete\-after http://whatever.com/~popular/page/
+.Ve
+.Sp
+The \fB\-r\fR option is to retrieve recursively, and \fB\-nd\fR to not
+create directories.
+.Sp
+Note that \fB\-\-delete\-after\fR deletes files on the local machine. It
+does not issue the \fBDELE\fR command to remote FTP sites, for
+instance. Also note that when \fB\-\-delete\-after\fR is specified,
+\&\fB\-\-convert\-links\fR is ignored, so \fB.orig\fR files are simply not
+created in the first place.
+.IP \fB\-k\fR 4
+.IX Item "-k"
+.PD 0
+.IP \fB\-\-convert\-links\fR 4
+.IX Item "--convert-links"
+.PD
+After the download is complete, convert the links in the document to
+make them suitable for local viewing. This affects not only the visible
+hyperlinks, but any part of the document that links to external content,
+such as embedded images, links to style sheets, hyperlinks to non-HTML
+content, etc.
+
.Sp
+Each link will be changed in one of two ways:
+.RS 4
+.IP \(bu 4
+The links to files that have been downloaded by Wget will be changed to
+refer to the file they point to as a relative link.
+.Sp
+Example: if the downloaded file \fI/foo/doc.html\fR links to
+\&\fI/bar/img.gif\fR, also downloaded, then the link in \fIdoc.html\fR
+will be modified to point to \fB../bar/img.gif\fR. This kind of
+transformation works reliably for arbitrary combinations of directories.
+.IP \(bu 4
+The links to files that have not been downloaded by Wget will be changed
+to include host name and absolute path of the location they point to.
+.Sp
+Example: if the downloaded file \fI/foo/doc.html\fR links to
+\&\fI/bar/img.gif\fR (or to \fI../bar/img.gif\fR), then the link in
+\&\fIdoc.html\fR will be modified to point to
+\&\fIhttp://hostname/bar/img.gif\fR.
+.RE
+.RS 4
+.Sp
+Because of this, local browsing works reliably: if a linked file was
+downloaded, the link will refer to its local name; if it was not
+downloaded, the link will refer to its full Internet address rather than
+presenting a broken link. The fact that the former links are converted
+to relative links ensures that you can move the downloaded hierarchy to
+another directory.
+.Sp
+Note that only at the end of the download can Wget know which links have
+been downloaded. Because of that, the work done by \fB\-k\fR will be
+performed at the end of all the downloads.
+.RE
+.IP \fB\-\-convert\-file\-only\fR 4
+.IX Item "--convert-file-only"
+This option converts only the filename part of the URLs, leaving the rest
+of the URLs untouched. This filename part is sometimes referred to as the
+"basename", although we avoid that term here in order not to cause confusion.
+.Sp
+It works particularly well in conjunction with \fB\-\-adjust\-extension\fR, although
+this coupling is not enforced. It proves useful to populate Internet caches
+with files downloaded from different hosts.
+.Sp +Example: if some link points to \fI//foo.com/bar.cgi?xyz\fR with +\&\fB\-\-adjust\-extension\fR asserted and its local destination is intended to be +\&\fI./foo.com/bar.cgi?xyz.css\fR, then the link would be converted to +\&\fI//foo.com/bar.cgi?xyz.css\fR. Note that only the filename part has been +modified. The rest of the URL has been left untouched, including the net path +(\f(CW\*(C`//\*(C'\fR) which would otherwise be processed by Wget and converted to the +effective scheme (ie. \f(CW\*(C`http://\*(C'\fR). +.IP \fB\-K\fR 4 +.IX Item "-K" +.PD 0 +.IP \fB\-\-backup\-converted\fR 4 +.IX Item "--backup-converted" +.PD +When converting a file, back up the original version with a \fB.orig\fR +suffix. Affects the behavior of \fB\-N\fR. +.IP \fB\-m\fR 4 +.IX Item "-m" +.PD 0 +.IP \fB\-\-mirror\fR 4 +.IX Item "--mirror" +.PD +Turn on options suitable for mirroring. This option turns on recursion +and time-stamping, sets infinite recursion depth and keeps FTP +directory listings. It is currently equivalent to +\&\fB\-r \-N \-l inf \-\-no\-remove\-listing\fR. +.IP \fB\-p\fR 4 +.IX Item "-p" +.PD 0 +.IP \fB\-\-page\-requisites\fR 4 +.IX Item "--page-requisites" +.PD +This option causes Wget to download all the files that are necessary to +properly display a given HTML page. This includes such things as +inlined images, sounds, and referenced stylesheets. +.Sp +Ordinarily, when downloading a single HTML page, any requisite documents +that may be needed to display it properly are not downloaded. Using +\&\fB\-r\fR together with \fB\-l\fR can help, but since Wget does not +ordinarily distinguish between external and inlined documents, one is +generally left with "leaf documents" that are missing their +requisites. +.Sp +For instance, say document \fI1.html\fR contains an \f(CW\*(C`<IMG>\*(C'\fR tag +referencing \fI1.gif\fR and an \f(CW\*(C`<A>\*(C'\fR tag pointing to external +document \fI2.html\fR. 
Say that \fI2.html\fR is similar but that its +image is \fI2.gif\fR and it links to \fI3.html\fR. Say this +continues up to some arbitrarily high number. +.Sp +If one executes the command: +.Sp +.Vb 1 +\& wget \-r \-l 2 http://<site>/1.html +.Ve +.Sp +then \fI1.html\fR, \fI1.gif\fR, \fI2.html\fR, \fI2.gif\fR, and +\&\fI3.html\fR will be downloaded. As you can see, \fI3.html\fR is +without its requisite \fI3.gif\fR because Wget is simply counting the +number of hops (up to 2) away from \fI1.html\fR in order to determine +where to stop the recursion. However, with this command: +.Sp +.Vb 1 +\& wget \-r \-l 2 \-p http://<site>/1.html +.Ve +.Sp +all the above files \fIand\fR \fI3.html\fR's requisite \fI3.gif\fR +will be downloaded. Similarly, +.Sp +.Vb 1 +\& wget \-r \-l 1 \-p http://<site>/1.html +.Ve +.Sp +will cause \fI1.html\fR, \fI1.gif\fR, \fI2.html\fR, and \fI2.gif\fR +to be downloaded. One might think that: +.Sp +.Vb 1 +\& wget \-r \-l 0 \-p http://<site>/1.html +.Ve +.Sp +would download just \fI1.html\fR and \fI1.gif\fR, but unfortunately +this is not the case, because \fB\-l 0\fR is equivalent to +\&\fB\-l inf\fR\-\-\-that is, infinite recursion. To download a single HTML +page (or a handful of them, all specified on the command-line or in a +\&\fB\-i\fR URL input file) and its (or their) requisites, simply leave off +\&\fB\-r\fR and \fB\-l\fR: +.Sp +.Vb 1 +\& wget \-p http://<site>/1.html +.Ve +.Sp +Note that Wget will behave as if \fB\-r\fR had been specified, but only +that single page and its requisites will be downloaded. Links from that +page to external documents will not be followed. 
Actually, to download +a single page and all its requisites (even if they exist on separate +websites), and make sure the lot displays properly locally, this author +likes to use a few options in addition to \fB\-p\fR: +.Sp +.Vb 1 +\& wget \-E \-H \-k \-K \-p http://<site>/<document> +.Ve +.Sp +To finish off this topic, it's worth knowing that Wget's idea of an +external document link is any URL specified in an \f(CW\*(C`<A>\*(C'\fR tag, an +\&\f(CW\*(C`<AREA>\*(C'\fR tag, or a \f(CW\*(C`<LINK>\*(C'\fR tag other than \f(CW\*(C`<LINK +REL="stylesheet">\*(C'\fR. +.IP \fB\-\-strict\-comments\fR 4 +.IX Item "--strict-comments" +Turn on strict parsing of HTML comments. The default is to terminate +comments at the first occurrence of \fB\-\->\fR. +.Sp +According to specifications, HTML comments are expressed as SGML +\&\fIdeclarations\fR. Declaration is special markup that begins with +\&\fB<!\fR and ends with \fB>\fR, such as \fB<!DOCTYPE ...>\fR, that +may contain comments between a pair of \fB\-\-\fR delimiters. HTML +comments are "empty declarations", SGML declarations without any +non-comment text. Therefore, \fB<!\-\-foo\-\->\fR is a valid comment, and +so is \fB<!\-\-one\-\- \-\-two\-\->\fR, but \fB<!\-\-1\-\-2\-\->\fR is not. +.Sp +On the other hand, most HTML writers don't perceive comments as anything +other than text delimited with \fB<!\-\-\fR and \fB\-\->\fR, which is not +quite the same. For example, something like \fB<!\-\-\-\-\-\-\-\-\-\-\-\->\fR +works as a valid comment as long as the number of dashes is a multiple +of four (!). If not, the comment technically lasts until the next +\&\fB\-\-\fR, which may be at the other end of the document. Because of +this, many popular browsers completely ignore the specification and +implement what users have come to expect: comments delimited with +\&\fB<!\-\-\fR and \fB\-\->\fR. 
+
.Sp
+Until version 1.9, Wget interpreted comments strictly, which resulted in
+missing links in many web pages that displayed fine in browsers, but had
+the misfortune of containing non-compliant comments. Beginning with
+version 1.9, Wget has joined the ranks of clients that implement
+"naive" comments, terminating each comment at the first occurrence of
+\&\fB\-\->\fR.
+.Sp
+If, for whatever reason, you want strict comment parsing, use this
+option to turn it on.
+.SS "Recursive Accept/Reject Options"
+.IX Subsection "Recursive Accept/Reject Options"
+.IP "\fB\-A\fR \fIacclist\fR \fB\-\-accept\fR \fIacclist\fR" 4
+.IX Item "-A acclist --accept acclist"
+.PD 0
+.IP "\fB\-R\fR \fIrejlist\fR \fB\-\-reject\fR \fIrejlist\fR" 4
+.IX Item "-R rejlist --reject rejlist"
+.PD
+Specify comma-separated lists of file name suffixes or patterns to
+accept or reject. Note that if
+any of the wildcard characters, \fB*\fR, \fB?\fR, \fB[\fR or
+\&\fB]\fR, appear in an element of \fIacclist\fR or \fIrejlist\fR,
+it will be treated as a pattern, rather than a suffix.
+In this case, you have to enclose the pattern in quotes to prevent
+your shell from expanding it, as in \fB\-A "*.mp3"\fR or \fB\-A '*.mp3'\fR.
+.IP "\fB\-\-accept\-regex\fR \fIurlregex\fR" 4
+.IX Item "--accept-regex urlregex"
+.PD 0
+.IP "\fB\-\-reject\-regex\fR \fIurlregex\fR" 4
+.IX Item "--reject-regex urlregex"
+.PD
+Specify a regular expression to accept or reject the complete URL.
+.IP "\fB\-\-regex\-type\fR \fIregextype\fR" 4
+.IX Item "--regex-type regextype"
+Specify the regular expression type. Possible types are \fBposix\fR or
+\&\fBpcre\fR. Note that to be able to use the \fBpcre\fR type, wget has to be
+compiled with libpcre support.
+.IP "\fB\-D\fR \fIdomain-list\fR" 4
+.IX Item "-D domain-list"
+.PD 0
+.IP \fB\-\-domains=\fR\fIdomain-list\fR 4
+.IX Item "--domains=domain-list"
+.PD
+Set domains to be followed. \fIdomain-list\fR is a comma-separated list
+of domains. 
Note that it does \fInot\fR turn on \fB\-H\fR.
+.IP "\fB\-\-exclude\-domains\fR \fIdomain-list\fR" 4
+.IX Item "--exclude-domains domain-list"
+Specify the domains that are \fInot\fR to be followed.
+.IP \fB\-\-follow\-ftp\fR 4
+.IX Item "--follow-ftp"
+Follow FTP links from HTML documents. Without this option,
+Wget will ignore all the FTP links.
+.IP \fB\-\-follow\-tags=\fR\fIlist\fR 4
+.IX Item "--follow-tags=list"
+Wget has an internal table of HTML tag / attribute pairs that it
+considers when looking for linked documents during a recursive
+retrieval. If a user wants only a subset of those tags to be
+considered, however, he or she should specify such tags in a
+comma-separated \fIlist\fR with this option.
+.IP \fB\-\-ignore\-tags=\fR\fIlist\fR 4
+.IX Item "--ignore-tags=list"
+This is the opposite of the \fB\-\-follow\-tags\fR option. To skip
+certain HTML tags when recursively looking for documents to download,
+specify them in a comma-separated \fIlist\fR.
+.Sp
+In the past, this option was the best bet for downloading a single page
+and its requisites, using a command-line like:
+.Sp
+.Vb 1
+\& wget \-\-ignore\-tags=a,area \-H \-k \-K \-r http://<site>/<document>
+.Ve
+.Sp
+However, the author of this option came across a page with tags like
+\&\f(CW\*(C`<LINK REL="home" HREF="/">\*(C'\fR and came to the realization that
+specifying tags to ignore was not enough. One can't just tell Wget to
+ignore \f(CW\*(C`<LINK>\*(C'\fR, because then stylesheets will not be downloaded.
+Now the best bet for downloading a single page and its requisites is the
+dedicated \fB\-\-page\-requisites\fR option.
+.IP \fB\-\-ignore\-case\fR 4
+.IX Item "--ignore-case"
+Ignore case when matching files and directories. This influences the
+behavior of the \-R, \-A, \-I, and \-X options, as well as globbing
+implemented when downloading from FTP sites. 
For example, with this +option, \fB\-A "*.txt"\fR will match \fBfile1.txt\fR, but also +\&\fBfile2.TXT\fR, \fBfile3.TxT\fR, and so on. +The quotes in the example are to prevent the shell from expanding the +pattern. +.IP \fB\-H\fR 4 +.IX Item "-H" +.PD 0 +.IP \fB\-\-span\-hosts\fR 4 +.IX Item "--span-hosts" +.PD +Enable spanning across hosts when doing recursive retrieving. +.IP \fB\-L\fR 4 +.IX Item "-L" +.PD 0 +.IP \fB\-\-relative\fR 4 +.IX Item "--relative" +.PD +Follow relative links only. Useful for retrieving a specific home page +without any distractions, not even those from the same hosts. +.IP "\fB\-I\fR \fIlist\fR" 4 +.IX Item "-I list" +.PD 0 +.IP \fB\-\-include\-directories=\fR\fIlist\fR 4 +.IX Item "--include-directories=list" +.PD +Specify a comma-separated list of directories you wish to follow when +downloading. Elements +of \fIlist\fR may contain wildcards. +.IP "\fB\-X\fR \fIlist\fR" 4 +.IX Item "-X list" +.PD 0 +.IP \fB\-\-exclude\-directories=\fR\fIlist\fR 4 +.IX Item "--exclude-directories=list" +.PD +Specify a comma-separated list of directories you wish to exclude from +download. Elements of +\&\fIlist\fR may contain wildcards. +.IP \fB\-np\fR 4 +.IX Item "-np" +.PD 0 +.IP \fB\-\-no\-parent\fR 4 +.IX Item "--no-parent" +.PD +Do not ever ascend to the parent directory when retrieving recursively. +This is a useful option, since it guarantees that only the files +\&\fIbelow\fR a certain hierarchy will be downloaded. +.SH ENVIRONMENT +.IX Header "ENVIRONMENT" +Wget supports proxies for both HTTP and FTP retrievals. The +standard way to specify proxy location, which Wget recognizes, is using +the following environment variables: +.IP \fBhttp_proxy\fR 4 +.IX Item "http_proxy" +.PD 0 +.IP \fBhttps_proxy\fR 4 +.IX Item "https_proxy" +.PD +If set, the \fBhttp_proxy\fR and \fBhttps_proxy\fR variables should +contain the URLs of the proxies for HTTP and HTTPS +connections respectively. 
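For example, in a POSIX shell these variables might be set before invoking Wget; the proxy host, port, and domains below are placeholders, not real services:

```shell
# Route Wget's HTTP and HTTPS requests through a proxy.
# proxy.example.com:3128 is a hypothetical host, not a real service.
export http_proxy="http://proxy.example.com:3128/"
export https_proxy="http://proxy.example.com:3128/"
# Do not use the proxy for these domains.
export no_proxy=".example.com,localhost"
# Any wget run in this shell now picks these settings up, e.g.:
#   wget https://www.gnu.org/
printf '%s\n' "$http_proxy"
```

The same settings can also be given in a wgetrc file or overridden on the command line.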
+
.IP \fBftp_proxy\fR 4
+.IX Item "ftp_proxy"
+This variable should contain the URL of the proxy for FTP
+connections. It is quite common that \fBhttp_proxy\fR and
+\&\fBftp_proxy\fR are set to the same URL.
+.IP \fBno_proxy\fR 4
+.IX Item "no_proxy"
+This variable should contain a comma-separated list of domain extensions
+for which the proxy should \fInot\fR be used. For instance, if the value of
+\&\fBno_proxy\fR is \fB.mit.edu\fR, the proxy will not be used to retrieve
+documents from MIT.
+.SH "EXIT STATUS"
+.IX Header "EXIT STATUS"
+Wget may return one of several error codes if it encounters problems.
+.ie n .IP 0 4
+.el .IP \f(CW0\fR 4
+.IX Item "0"
+No problems occurred.
+.ie n .IP 1 4
+.el .IP \f(CW1\fR 4
+.IX Item "1"
+Generic error code.
+.ie n .IP 2 4
+.el .IP \f(CW2\fR 4
+.IX Item "2"
+Parse error\-\-\-for instance, when parsing command-line options, the
+\&\fB.wgetrc\fR or \fB.netrc\fR...
+.ie n .IP 3 4
+.el .IP \f(CW3\fR 4
+.IX Item "3"
+File I/O error.
+.ie n .IP 4 4
+.el .IP \f(CW4\fR 4
+.IX Item "4"
+Network failure.
+.ie n .IP 5 4
+.el .IP \f(CW5\fR 4
+.IX Item "5"
+SSL verification failure.
+.ie n .IP 6 4
+.el .IP \f(CW6\fR 4
+.IX Item "6"
+Username/password authentication failure.
+.ie n .IP 7 4
+.el .IP \f(CW7\fR 4
+.IX Item "7"
+Protocol errors.
+.ie n .IP 8 4
+.el .IP \f(CW8\fR 4
+.IX Item "8"
+Server issued an error response.
+.PP
+With the exceptions of 0 and 1, the lower-numbered exit codes take
+precedence over higher-numbered ones when multiple types of errors
+are encountered.
+.PP
+In versions of Wget prior to 1.12, Wget's exit status tended to be
+unhelpful and inconsistent. Recursive downloads would virtually always
+return 0 (success), regardless of any issues encountered, and
+non-recursive fetches only returned the status corresponding to the
+most recently-attempted download.
+.SH FILES
+.IX Header "FILES"
+.IP \fB/usr/local/etc/wgetrc\fR 4
+.IX Item "/usr/local/etc/wgetrc"
+Default location of the \fIglobal\fR startup file. 
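A startup file consists of simple `command = value` lines; as an illustrative sketch (the settings and the proxy host are examples, not defaults), a wgetrc might contain:

```text
# Illustrative wgetrc fragment -- the values shown are examples only.
tries = 3
timeout = 60
# Send HTTP retrievals through a hypothetical local proxy.
http_proxy = http://proxy.example.com:3128/
```

See the GNU Info entry for wget for the full list of wgetrc commands.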
+
.IP \fB.wgetrc\fR 4
+.IX Item ".wgetrc"
+User startup file.
+.SH BUGS
+.IX Header "BUGS"
+You are welcome to submit bug reports via the GNU Wget bug tracker (see
+<\fBhttps://savannah.gnu.org/bugs/?func=additem&group=wget\fR>) or to our
+mailing list <\fBbug\-wget@gnu.org\fR>.
+.PP
+Visit <\fBhttps://lists.gnu.org/mailman/listinfo/bug\-wget\fR> to
+get more info (how to subscribe, list archives, ...).
+.PP
+Before actually submitting a bug report, please try to follow a few
+simple guidelines.
+.IP 1. 4
+Please try to ascertain that the behavior you see really is a bug. If
+Wget crashes, it's a bug. If Wget does not behave as documented,
+it's a bug. If things work strangely, but you are not sure about the way
+they are supposed to work, it might well be a bug, but you might want to
+double-check the documentation and the mailing lists.
+.IP 2. 4
+Try to repeat the bug in as simple circumstances as possible. E.g. if
+Wget crashes while running \fBwget \-rl0 \-kKE \-t5 \-\-no\-proxy
+http://example.com \-o /tmp/log\fR, you should try to see if the crash is
+repeatable, and if it will occur with a simpler set of options. You might
+even try to start the download at the page where the crash occurred to
+see if that page somehow triggered the crash.
+.Sp
+Also, while I will probably be interested to know the contents of your
+\&\fI.wgetrc\fR file, just dumping it into the debug message is probably
+a bad idea. Instead, you should first try to see if the bug repeats
+with \fI.wgetrc\fR moved out of the way. Only if it turns out that
+\&\fI.wgetrc\fR settings affect the bug, mail me the relevant parts of
+the file.
+.IP 3. 4
+Please start Wget with the \fB\-d\fR option and send us the resulting
+output (or relevant parts thereof). If Wget was compiled without
+debug support, recompile it\-\-\-it is \fImuch\fR easier to trace bugs
+with debug support on. 
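As a sketch of this step, the debug output can be sent to a file with \-d and \-o; the URL below is a placeholder for whatever page triggers the bug, so the command is printed rather than executed:

```shell
# Placeholder for the URL that triggers the bug -- not a real address.
url='http://<site>/1.html'
logfile=/tmp/wget-debug.log
# -d enables debug output; -o sends all log messages to the file.
cmd="wget -d -o $logfile $url"
# Shown rather than run, since the URL above is a placeholder.
printf '%s\n' "$cmd"
```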
+
.Sp
+Note: please make sure to remove any potentially sensitive information
+from the debug log before sending it to the bug address. The
+\&\f(CW\*(C`\-d\*(C'\fR option won't go out of its way to collect sensitive information,
+but the log \fIwill\fR contain a fairly complete transcript of Wget's
+communication with the server, which may include passwords and pieces
+of downloaded data. Since the bug address is publicly archived, you
+may assume that all bug reports are visible to the public.
+.IP 4. 4
+If Wget has crashed, try to run it in a debugger, e.g. \f(CW\*(C`gdb \`which
+wget\` core\*(C'\fR and type \f(CW\*(C`where\*(C'\fR to get the backtrace. This may not
+work if the system administrator has disabled core files, but it is
+safe to try.
+.SH "SEE ALSO"
+.IX Header "SEE ALSO"
+This is \fBnot\fR the complete manual for GNU Wget.
+For more complete information, including more detailed explanations of
+some of the options, and a number of commands available
+for use with \fI.wgetrc\fR files and the \fB\-e\fR option, see the GNU
+Info entry for \fIwget\fR.
+.PP
+Also see \fBwget2\fR\|(1), the updated version of GNU Wget with even better
+support for recursive downloading and modern protocols like HTTP/2.
+.SH AUTHOR
+.IX Header "AUTHOR"
+Originally written by Hrvoje Nikšić <hniksic@xemacs.org>.
+Currently maintained by Darshit Shah <darnir@gnu.org> and
+Tim Rühsen <tim.ruehsen@gmx.de>.
+.SH COPYRIGHT
+.IX Header "COPYRIGHT"
+Copyright (c) 1996\-\-2011, 2015, 2018\-\-2023 Free Software
+Foundation, Inc.
+.PP
+Permission is granted to copy, distribute and/or modify this document
+under the terms of the GNU Free Documentation License, Version 1.3 or
+any later version published by the Free Software Foundation; with no
+Invariant Sections, with no Front-Cover Texts, and with no Back-Cover
+Texts. A copy of the license is included in the section entitled
+"GNU Free Documentation License".