Diffstat (limited to '')
-rw-r--r-- | doc/wget.texi | 4691 |
1 files changed, 4691 insertions, 0 deletions
diff --git a/doc/wget.texi b/doc/wget.texi new file mode 100644 index 0000000..3c24de2 --- /dev/null +++ b/doc/wget.texi @@ -0,0 +1,4691 @@ +\input texinfo @c -*-texinfo-*- + +@c %**start of header +@setfilename wget.info +@documentencoding UTF-8 +@include version.texi +@settitle GNU Wget @value{VERSION} Manual +@c Disable the monstrous rectangles beside overfull hbox-es. +@finalout +@c Use `odd' to print double-sided. +@setchapternewpage on +@c %**end of header + +@iftex +@c Remove this if you don't use A4 paper. +@afourpaper +@end iftex + +@c Title for man page. The weird way texi2pod.pl is written requires +@c the preceding @set. +@set Wget Wget +@c man title Wget The non-interactive network downloader. + +@dircategory Network applications +@direntry +* Wget: (wget). Non-interactive network downloader. +@end direntry + +@copying +This file documents the GNU Wget utility for downloading network +data. + +@c man begin COPYRIGHT +Copyright @copyright{} 1996--2011, 2015, 2018--2023 Free Software +Foundation, Inc. + +@iftex +Permission is granted to make and distribute verbatim copies of +this manual provided the copyright notice and this permission notice +are preserved on all copies. +@end iftex + +@ignore +Permission is granted to process this file through TeX and print the +results, provided the printed document carries a copying permission +notice identical to this one except for the removal of this paragraph +(this paragraph not being relevant to the printed manual). +@end ignore +Permission is granted to copy, distribute and/or modify this document +under the terms of the GNU Free Documentation License, Version 1.3 or +any later version published by the Free Software Foundation; with no +Invariant Sections, with no Front-Cover Texts, and with no Back-Cover +Texts. A copy of the license is included in the section entitled +``GNU Free Documentation License''. 
+@c man end +@end copying + +@titlepage +@title GNU Wget @value{VERSION} +@subtitle The non-interactive download utility +@subtitle Updated for Wget @value{VERSION}, @value{UPDATED} +@author by Hrvoje Nikšić and others + +@ignore +@c man begin AUTHOR +Originally written by Hrvoje Nikšić <hniksic@xemacs.org>. +Currently maintained by Darshit Shah <darnir@gnu.org> and +Tim Rühsen <tim.ruehsen@gmx.de>. +@c man end +@c man begin SEEALSO +This is @strong{not} the complete manual for GNU Wget. +For more complete information, including more detailed explanations of +some of the options, and a number of commands available +for use with @file{.wgetrc} files and the @samp{-e} option, see the GNU +Info entry for @file{wget}. + +Also see wget2(1), the updated version of GNU Wget with even better +support for recursive downloading and modern protocols like HTTP/2. +@c man end +@end ignore + +@page +@vskip 0pt plus 1filll +@insertcopying +@end titlepage + +@contents + +@ifnottex +@node Top, Overview, (dir), (dir) +@top Wget @value{VERSION} + +@insertcopying +@end ifnottex + +@menu +* Overview:: Features of Wget. +* Invoking:: Wget command-line arguments. +* Recursive Download:: Downloading interlinked pages. +* Following Links:: The available methods of chasing links. +* Time-Stamping:: Mirroring according to time-stamps. +* Startup File:: Wget's initialization file. +* Examples:: Examples of usage. +* Various:: The stuff that doesn't fit anywhere else. +* Appendices:: Some useful references. +* Copying this manual:: You may give out copies of this manual. +* Concept Index:: Topics covered by this manual. +@end menu + +@node Overview, Invoking, Top, Top +@chapter Overview +@cindex overview +@cindex features + +@c man begin DESCRIPTION +GNU Wget is a free utility for non-interactive download of files from +the Web. It supports @sc{http}, @sc{https}, and @sc{ftp} protocols, as +well as retrieval through @sc{http} proxies. 
+ +@c man end +This chapter is a partial overview of Wget's features. + +@itemize @bullet +@item +@c man begin DESCRIPTION +Wget is non-interactive, meaning that it can work in the background, +while the user is not logged on. This allows you to start a retrieval +and disconnect from the system, letting Wget finish the work. By +contrast, most Web browsers require the user's constant presence, +which can be a great hindrance when transferring a lot of data. +@c man end + +@item +@ignore +@c man begin DESCRIPTION + +@c man end +@end ignore +@c man begin DESCRIPTION +Wget can follow links in @sc{html}, @sc{xhtml}, and @sc{css} pages, to +create local versions of remote web sites, fully recreating the +directory structure of the original site. This is sometimes referred to +as ``recursive downloading.'' While doing that, Wget respects the Robot +Exclusion Standard (@file{/robots.txt}). Wget can be instructed to +convert the links in downloaded files to point at the local files, for +offline viewing. +@c man end + +@item +File name wildcard matching and recursive mirroring of directories are +available when retrieving via @sc{ftp}. Wget can read the time-stamp +information given by both @sc{http} and @sc{ftp} servers, and store it +locally. Thus Wget can see if the remote file has changed since last +retrieval, and automatically retrieve the new version if it has. This +makes Wget suitable for mirroring of @sc{ftp} sites, as well as home +pages. + +@item +@ignore +@c man begin DESCRIPTION + +@c man end +@end ignore +@c man begin DESCRIPTION +Wget has been designed for robustness over slow or unstable network +connections; if a download fails due to a network problem, it will +keep retrying until the whole file has been retrieved. If the server +supports regetting, it will instruct the server to continue the +download from where it left off. 
+@c man end + +@item +Wget supports proxy servers, which can lighten the network load, speed +up retrieval and provide access behind firewalls. Wget uses the passive +@sc{ftp} downloading by default, active @sc{ftp} being an option. + +@item +Wget supports IP version 6, the next generation of IP. IPv6 is +autodetected at compile-time, and can be disabled at either build or +run time. Binaries built with IPv6 support work well in both +IPv4-only and dual family environments. + +@item +Built-in features offer mechanisms to tune which links you wish to follow +(@pxref{Following Links}). + +@item +The progress of individual downloads is traced using a progress gauge. +Interactive downloads are tracked using a ``thermometer''-style gauge, +whereas non-interactive ones are traced with dots, each dot +representing a fixed amount of data received (1KB by default). Either +gauge can be customized to your preferences. + +@item +Most of the features are fully configurable, either through command line +options, or via the initialization file @file{.wgetrc} (@pxref{Startup +File}). Wget allows you to define @dfn{global} startup files +(@file{/usr/local/etc/wgetrc} by default) for site settings. You can also +specify the location of a startup file with the --config option. +To disable the reading of config files, use --no-config. +If both --config and --no-config are given, --no-config is ignored. + + +@ignore +@c man begin FILES +@table @samp +@item /usr/local/etc/wgetrc +Default location of the @dfn{global} startup file. + +@item .wgetrc +User startup file. +@end table +@c man end +@end ignore + +@item +Finally, GNU Wget is free software. This means that everyone may use +it, redistribute it and/or modify it under the terms of the GNU General +Public License, as published by the Free Software Foundation (see the +file @file{COPYING} that came with GNU Wget, for details). 
+@end itemize + +@node Invoking, Recursive Download, Overview, Top +@chapter Invoking +@cindex invoking +@cindex command line +@cindex arguments +@cindex nohup + +By default, Wget is very simple to invoke. The basic syntax is: + +@example +@c man begin SYNOPSIS +wget [@var{option}]@dots{} [@var{URL}]@dots{} +@c man end +@end example + +Wget will simply download all the @sc{url}s specified on the command +line. @var{URL} is a @dfn{Uniform Resource Locator}, as defined below. + +However, you may wish to change some of the default parameters of +Wget. You can do it two ways: permanently, adding the appropriate +command to @file{.wgetrc} (@pxref{Startup File}), or specifying it on +the command line. + +@menu +* URL Format:: +* Option Syntax:: +* Basic Startup Options:: +* Logging and Input File Options:: +* Download Options:: +* Directory Options:: +* HTTP Options:: +* HTTPS (SSL/TLS) Options:: +* FTP Options:: +* Recursive Retrieval Options:: +* Recursive Accept/Reject Options:: +* Exit Status:: +@end menu + +@node URL Format, Option Syntax, Invoking, Invoking +@section URL Format +@cindex URL +@cindex URL syntax + +@dfn{URL} is an acronym for Uniform Resource Locator. A uniform +resource locator is a compact string representation for a resource +available via the Internet. Wget recognizes the @sc{url} syntax as per +@sc{rfc1738}. This is the most widely used form (square brackets denote +optional parts): + +@example +http://host[:port]/directory/file +ftp://host[:port]/directory/file +@end example + +You can also encode your username and password within a @sc{url}: + +@example +ftp://user:password@@host/path +http://user:password@@host/path +@end example + +Either @var{user} or @var{password}, or both, may be left out. If you +leave out either the @sc{http} username or password, no authentication +will be sent. If you leave out the @sc{ftp} username, @samp{anonymous} +will be used. 
If you leave out the @sc{ftp} password, your email +address will be supplied as a default password.@footnote{If you have a +@file{.netrc} file in your home directory, password will also be +searched for there.} + +@strong{Important Note}: if you specify a password-containing @sc{url} +on the command line, the username and password will be plainly visible +to all users on the system, by way of @code{ps}. On multi-user systems, +this is a big security risk. To work around it, use @code{wget -i -} +and feed the @sc{url}s to Wget's standard input, each on a separate +line, terminated by @kbd{C-d}. + +You can encode unsafe characters in a @sc{url} as @samp{%xy}, @code{xy} +being the hexadecimal representation of the character's @sc{ascii} +value. Some common unsafe characters include @samp{%} (quoted as +@samp{%25}), @samp{:} (quoted as @samp{%3A}), and @samp{@@} (quoted as +@samp{%40}). Refer to @sc{rfc1738} for a comprehensive list of unsafe +characters. + +Wget also supports the @code{type} feature for @sc{ftp} @sc{url}s. By +default, @sc{ftp} documents are retrieved in the binary mode (type +@samp{i}), which means that they are downloaded unchanged. Another +useful mode is the @samp{a} (@dfn{ASCII}) mode, which converts the line +delimiters between the different operating systems, and is thus useful +for text files. Here is an example: + +@example +ftp://host/directory/file;type=a +@end example + +Two alternative variants of @sc{url} specification are also supported, +because of historical (hysterical?) reasons and their widespread use. + +@sc{ftp}-only syntax (supported by @code{NcFTP}): +@example +host:/dir/file +@end example + +@sc{http}-only syntax (introduced by @code{Netscape}): +@example +host[:port]/dir/file +@end example + +These two alternative forms are deprecated, and may cease being +supported in the future. 
+ +If you do not understand the difference between these notations, or do +not know which one to use, just use the plain ordinary format you use +with your favorite browser, like @code{Lynx} or @code{Netscape}. + +@c man begin OPTIONS + +@node Option Syntax, Basic Startup Options, URL Format, Invoking +@section Option Syntax +@cindex option syntax +@cindex syntax of options + +Since Wget uses GNU getopt to process command-line arguments, every +option has a long form along with the short one. Long options are +more convenient to remember, but take time to type. You may freely +mix different option styles, or specify options after the command-line +arguments. Thus you may write: + +@example +wget -r --tries=10 http://fly.srk.fer.hr/ -o log +@end example + +The space between the option accepting an argument and the argument may +be omitted. Instead of @samp{-o log} you can write @samp{-olog}. + +You may put several options that do not require arguments together, +like: + +@example +wget -drc @var{URL} +@end example + +This is completely equivalent to: + +@example +wget -d -r -c @var{URL} +@end example + +Since the options can be specified after the arguments, you may +terminate them with @samp{--}. So the following will try to download +@sc{url} @samp{-x}, reporting failure to @file{log}: + +@example +wget -o log -- -x +@end example + +The options that accept comma-separated lists all respect the convention +that specifying an empty list clears its value. This can be useful to +clear the @file{.wgetrc} settings. For instance, if your @file{.wgetrc} +sets @code{exclude_directories} to @file{/cgi-bin}, the following +example will first reset it, and then set it to exclude @file{/~nobody} +and @file{/~somebody}. You can also clear the lists in @file{.wgetrc} +(@pxref{Wgetrc Syntax}). 
+ +@example +wget -X "" -X /~nobody,/~somebody +@end example + +Most options that do not accept arguments are @dfn{boolean} options, +so named because their state can be captured with a yes-or-no +(``boolean'') variable. For example, @samp{--follow-ftp} tells Wget +to follow FTP links from HTML files and, on the other hand, +@samp{--no-glob} tells it not to perform file globbing on FTP URLs. A +boolean option is either @dfn{affirmative} or @dfn{negative} +(beginning with @samp{--no}). All such options share several +properties. + +Unless stated otherwise, it is assumed that the default behavior is +the opposite of what the option accomplishes. For example, the +documented existence of @samp{--follow-ftp} assumes that the default +is to @emph{not} follow FTP links from HTML pages. + +Affirmative options can be negated by prepending the @samp{--no-} to +the option name; negative options can be negated by omitting the +@samp{--no-} prefix. This might seem superfluous---if the default for +an affirmative option is to not do something, then why provide a way +to explicitly turn it off? But the startup file may in fact change +the default. For instance, using @code{follow_ftp = on} in +@file{.wgetrc} makes Wget @emph{follow} FTP links by default, and +using @samp{--no-follow-ftp} is the only way to restore the factory +default from the command line. + +@node Basic Startup Options, Logging and Input File Options, Option Syntax, Invoking +@section Basic Startup Options + +@table @samp +@item -V +@itemx --version +Display the version of Wget. + +@item -h +@itemx --help +Print a help message describing all of Wget's command-line options. + +@item -b +@itemx --background +Go to background immediately after startup. If no output file is +specified via the @samp{-o}, output is redirected to @file{wget-log}. 
+ +@cindex execute wgetrc command +@item -e @var{command} +@itemx --execute @var{command} +Execute @var{command} as if it were a part of @file{.wgetrc} +(@pxref{Startup File}). A command thus invoked will be executed +@emph{after} the commands in @file{.wgetrc}, thus taking precedence over +them. If you need to specify more than one wgetrc command, use multiple +instances of @samp{-e}. + +@end table + +@node Logging and Input File Options, Download Options, Basic Startup Options, Invoking +@section Logging and Input File Options + +@table @samp +@cindex output file +@cindex log file +@item -o @var{logfile} +@itemx --output-file=@var{logfile} +Log all messages to @var{logfile}. The messages are normally reported +to standard error. + +@cindex append to log +@item -a @var{logfile} +@itemx --append-output=@var{logfile} +Append to @var{logfile}. This is the same as @samp{-o}, only it appends +to @var{logfile} instead of overwriting the old log file. If +@var{logfile} does not exist, a new file is created. + +@cindex debug +@item -d +@itemx --debug +Turn on debug output, meaning various information important to the +developers of Wget if it does not work properly. Your system +administrator may have chosen to compile Wget without debug support, in +which case @samp{-d} will not work. Please note that compiling with +debug support is always safe---Wget compiled with the debug support will +@emph{not} print any debug info unless requested with @samp{-d}. +@xref{Reporting Bugs}, for more information on how to use @samp{-d} for +sending bug reports. + +@cindex quiet +@item -q +@itemx --quiet +Turn off Wget's output. + +@cindex verbose +@item -v +@itemx --verbose +Turn on verbose output, with all the available data. The default output +is verbose. + +@item -nv +@itemx --no-verbose +Turn off verbose without being completely quiet (use @samp{-q} for +that), which means that error messages and basic information still get +printed. 
+ +@item --report-speed=@var{type} +Output bandwidth as @var{type}. The only accepted value is @samp{bits}. + +@cindex input-file +@item -i @var{file} +@itemx --input-file=@var{file} +Read @sc{url}s from a local or external @var{file}. If @samp{-} is +specified as @var{file}, @sc{url}s are read from the standard input. +(Use @samp{./-} to read from a file literally named @samp{-}.) + +If this function is used, no @sc{url}s need be present on the command +line. If there are @sc{url}s both on the command line and in an input +file, those on the command line will be the first ones to be +retrieved. If @samp{--force-html} is not specified, then @var{file} +should consist of a series of URLs, one per line. + +However, if you specify @samp{--force-html}, the document will be +regarded as @samp{html}. In that case you may have problems with +relative links, which you can solve either by adding @code{<base +href="@var{url}">} to the documents or by specifying +@samp{--base=@var{url}} on the command line. + +If the @var{file} is an external one, the document will be automatically +treated as @samp{html} if the Content-Type matches @samp{text/html}. +Furthermore, the @var{file}'s location will be implicitly used as base +href if none was specified. + +@cindex input-metalink +@item --input-metalink=@var{file} +Downloads files covered in local Metalink @var{file}. Metalink versions 3 +and 4 are supported. + +@cindex keep-badhash +@item --keep-badhash +Keeps downloaded Metalink's files with a bad hash. It appends .badhash +to the name of Metalink's files which have a checksum mismatch, except +without overwriting existing files. + +@cindex metalink-over-http +@item --metalink-over-http +Issues HTTP HEAD request instead of GET and extracts Metalink metadata +from response headers. Then it switches to Metalink download. +If no valid Metalink metadata is found, it falls back to ordinary HTTP download. 
+Enables @samp{Content-Type: application/metalink4+xml} files download/processing. + +@cindex metalink-index +@item --metalink-index=@var{number} +Set the Metalink @samp{application/metalink4+xml} metaurl ordinal +NUMBER. From 1 to the total number of ``application/metalink4+xml'' +available. Specify 0 or @samp{inf} to choose the first good one. +Metaurls, such as those from a @samp{--metalink-over-http}, may have +been sorted by priority key's value; keep this in mind to choose the +right NUMBER. + +@cindex preferred-location +@item --preferred-location +Set preferred location for Metalink resources. This has effect if multiple +resources with same priority are available. + +@cindex xattr +@item --xattr +Enable use of file system's extended attributes to save the +original URL and the Referer HTTP header value if used. + +Be aware that the URL might contain private information like +access tokens or credentials. + + +@cindex force html +@item -F +@itemx --force-html +When input is read from a file, force it to be treated as an @sc{html} +file. This enables you to retrieve relative links from existing +@sc{html} files on your local disk, by adding @code{<base +href="@var{url}">} to @sc{html}, or using the @samp{--base} command-line +option. + +@cindex base for relative links in input file +@item -B @var{URL} +@itemx --base=@var{URL} +Resolves relative links using @var{URL} as the point of reference, +when reading links from an HTML file specified via the +@samp{-i}/@samp{--input-file} option (together with +@samp{--force-html}, or when the input file was fetched remotely from +a server describing it as @sc{html}). This is equivalent to the +presence of a @code{BASE} tag in the @sc{html} input file, with +@var{URL} as the value for the @code{href} attribute. + +For instance, if you specify @samp{http://foo/bar/a.html} for +@var{URL}, and Wget reads @samp{../baz/b.html} from the input file, it +would be resolved to @samp{http://foo/baz/b.html}. 
+ +@cindex specify config +@item --config=@var{FILE} +Specify the location of a startup file you wish to use instead of the +default one(s). Use --no-config to disable reading of config files. +If both --config and --no-config are given, --no-config is ignored. + + +@item --rejected-log=@var{logfile} +Logs all URL rejections to @var{logfile} as comma separated values. The values +include the reason of rejection, the URL and the parent URL it was found in. + +@end table + +@node Download Options, Directory Options, Logging and Input File Options, Invoking +@section Download Options + +@table @samp +@cindex bind address +@cindex client IP address +@cindex IP address, client +@item --bind-address=@var{ADDRESS} +When making client TCP/IP connections, bind to @var{ADDRESS} on +the local machine. @var{ADDRESS} may be specified as a hostname or IP +address. This option can be useful if your machine is bound to multiple +IPs. + +@cindex bind DNS address +@cindex client DNS address +@cindex DNS IP address, client, DNS +@item --bind-dns-address=@var{ADDRESS} +[libcares only] +This address overrides the route for DNS requests. If you ever need to +circumvent the standard settings from /etc/resolv.conf, this option together +with @samp{--dns-servers} is your friend. +@var{ADDRESS} must be specified either as IPv4 or IPv6 address. +Wget needs to be built with libcares for this option to be available. + +@cindex DNS server +@cindex DNS IP address, client, DNS +@item --dns-servers=@var{ADDRESSES} +[libcares only] +The given address(es) override the standard nameserver +addresses, e.g. as configured in /etc/resolv.conf. +@var{ADDRESSES} may be specified either as IPv4 or IPv6 addresses, +comma-separated. +Wget needs to be built with libcares for this option to be available. + +@cindex retries +@cindex tries +@cindex number of tries +@item -t @var{number} +@itemx --tries=@var{number} +Set number of tries to @var{number}. Specify 0 or @samp{inf} for +infinite retrying. 
The default is to retry 20 times, with the exception +of fatal errors like ``connection refused'' or ``not found'' (404), +which are not retried. + +@item -O @var{file} +@itemx --output-document=@var{file} +The documents will not be written to the appropriate files, but all +will be concatenated together and written to @var{file}. If @samp{-} +is used as @var{file}, documents will be printed to standard output, +disabling link conversion. (Use @samp{./-} to print to a file +literally named @samp{-}.) + +Use of @samp{-O} is @emph{not} intended to mean simply ``use the name +@var{file} instead of the one in the URL;'' rather, it is +analogous to shell redirection: +@samp{wget -O file http://foo} is intended to work like +@samp{wget -O - http://foo > file}; @file{file} will be truncated +immediately, and @emph{all} downloaded content will be written there. + +For this reason, @samp{-N} (for timestamp-checking) is not supported +in combination with @samp{-O}: since @var{file} is always newly +created, it will always have a very new timestamp. A warning will be +issued if this combination is used. + +Similarly, using @samp{-r} or @samp{-p} with @samp{-O} may not work as +you expect: Wget won't just download the first file to @var{file} and +then download the rest to their normal names: @emph{all} downloaded +content will be placed in @var{file}. This was disabled in version +1.11, but has been reinstated (with a warning) in 1.11.2, as there are +some cases where this behavior can actually have some use. + +A combination with @samp{-nc} is only accepted if the given output +file does not exist. + +Note that a combination with @samp{-k} is only permitted when +downloading a single document, as in that case it will just convert +all relative URIs to external ones; @samp{-k} makes no sense for +multiple URIs when they're all being downloaded to a single file; +@samp{-k} can be used only when the output is a regular file. 
+ +@cindex clobbering, file +@cindex downloading multiple times +@cindex no-clobber +@item -nc +@itemx --no-clobber +If a file is downloaded more than once in the same directory, Wget's +behavior depends on a few options, including @samp{-nc}. In certain +cases, the local file will be @dfn{clobbered}, or overwritten, upon +repeated download. In other cases it will be preserved. + +When running Wget without @samp{-N}, @samp{-nc}, @samp{-r}, or +@samp{-p}, downloading the same file in the same directory will result +in the original copy of @var{file} being preserved and the second copy +being named @samp{@var{file}.1}. If that file is downloaded yet +again, the third copy will be named @samp{@var{file}.2}, and so on. +(This is also the behavior with @samp{-nd}, even if @samp{-r} or +@samp{-p} are in effect.) When @samp{-nc} is specified, this behavior +is suppressed, and Wget will refuse to download newer copies of +@samp{@var{file}}. Therefore, ``@code{no-clobber}'' is actually a +misnomer in this mode---it's not clobbering that's prevented (as the +numeric suffixes were already preventing clobbering), but rather the +multiple version saving that's prevented. + +When running Wget with @samp{-r} or @samp{-p}, but without @samp{-N}, +@samp{-nd}, or @samp{-nc}, re-downloading a file will result in the +new copy simply overwriting the old. Adding @samp{-nc} will prevent +this behavior, instead causing the original version to be preserved +and any newer copies on the server to be ignored. + +When running Wget with @samp{-N}, with or without @samp{-r} or +@samp{-p}, the decision as to whether or not to download a newer copy +of a file depends on the local and remote timestamp and size of the +file (@pxref{Time-Stamping}). @samp{-nc} may not be specified at the +same time as @samp{-N}. + +A combination with @samp{-O}/@samp{--output-document} is only accepted +if the given output file does not exist. 
+ +Note that when @samp{-nc} is specified, files with the suffixes +@samp{.html} or @samp{.htm} will be loaded from the local disk and +parsed as if they had been retrieved from the Web. + +@cindex backing up files +@item --backups=@var{backups} +Before (over)writing a file, back up an existing file by adding a +@samp{.1} suffix (@samp{_1} on VMS) to the file name. Such backup +files are rotated to @samp{.2}, @samp{.3}, and so on, up to +@var{backups} (and lost beyond that). + +@cindex authentication credentials +@item --no-netrc +Do not try to obtain credentials from @file{.netrc} file. By default +@file{.netrc} file is searched for credentials in case none have been +passed on command line and authentication is required. + +@cindex continue retrieval +@cindex incomplete downloads +@cindex resume download +@item -c +@itemx --continue +Continue getting a partially-downloaded file. This is useful when you +want to finish up a download started by a previous instance of Wget, or +by another program. For instance: + +@example +wget -c ftp://sunsite.doc.ic.ac.uk/ls-lR.Z +@end example + +If there is a file named @file{ls-lR.Z} in the current directory, Wget +will assume that it is the first portion of the remote file, and will +ask the server to continue the retrieval from an offset equal to the +length of the local file. + +Note that you don't need to specify this option if you just want the +current invocation of Wget to retry downloading a file should the +connection be lost midway through. This is the default behavior. +@samp{-c} only affects resumption of downloads started @emph{prior} to +this invocation of Wget, and whose local files are still sitting around. + +Without @samp{-c}, the previous example would just download the remote +file to @file{ls-lR.Z.1}, leaving the truncated @file{ls-lR.Z} file +alone. 
+ +If you use @samp{-c} on a non-empty file, and the server does not support +continued downloading, Wget will restart the download from scratch and overwrite +the existing file entirely. + +Beginning with Wget 1.7, if you use @samp{-c} on a file which is of +equal size as the one on the server, Wget will refuse to download the +file and print an explanatory message. The same happens when the file +is smaller on the server than locally (presumably because it was changed +on the server since your last download attempt)---because ``continuing'' +is not meaningful, no download occurs. + +On the other side of the coin, while using @samp{-c}, any file that's +bigger on the server than locally will be considered an incomplete +download and only @code{(length(remote) - length(local))} bytes will be +downloaded and tacked onto the end of the local file. This behavior can +be desirable in certain cases---for instance, you can use @samp{wget -c} +to download just the new portion that's been appended to a data +collection or log file. + +However, if the file is bigger on the server because it's been +@emph{changed}, as opposed to just @emph{appended} to, you'll end up +with a garbled file. Wget has no way of verifying that the local file +is really a valid prefix of the remote file. You need to be especially +careful of this when using @samp{-c} in conjunction with @samp{-r}, +since every file will be considered as an ``incomplete download'' candidate. + +Another instance where you'll get a garbled file if you try to use +@samp{-c} is if you have a lame @sc{http} proxy that inserts a +``transfer interrupted'' string into the local file. In the future a +``rollback'' option may be added to deal with this case. + +Note that @samp{-c} only works with @sc{ftp} servers and with @sc{http} +servers that support the @code{Range} header. 
+ +@cindex offset +@cindex continue retrieval +@cindex incomplete downloads +@cindex resume download +@cindex start position +@item --start-pos=@var{OFFSET} +Start downloading at zero-based position @var{OFFSET}. Offset may be expressed +in bytes, kilobytes with the `k' suffix, or megabytes with the `m' suffix, etc. + +@samp{--start-pos} has higher precedence over @samp{--continue}. When +@samp{--start-pos} and @samp{--continue} are both specified, wget will emit a +warning then proceed as if @samp{--continue} was absent. + +Server support for continued download is required, otherwise @samp{--start-pos} +cannot help. See @samp{-c} for details. + +@cindex progress indicator +@cindex dot style +@item --progress=@var{type} +Select the type of the progress indicator you wish to use. Legal +indicators are ``dot'' and ``bar''. + +The ``bar'' indicator is used by default. It draws an @sc{ascii} progress +bar graphics (a.k.a. ``thermometer'' display) indicating the status of +retrieval. If the output is not a TTY, the ``dot'' bar will be used by +default. + +Use @samp{--progress=dot} to switch to the ``dot'' display. It traces +the retrieval by printing dots on the screen, each dot representing a +fixed amount of downloaded data. + +The progress @var{type} can also take one or more parameters. The parameters +vary based on the @var{type} selected. Parameters to @var{type} are passed by +appending them to the type separated by a colon (:) like this: +@samp{--progress=@var{type}:@var{parameter1}:@var{parameter2}}. + +When using the dotted retrieval, you may set the @dfn{style} by +specifying the type as @samp{dot:@var{style}}. Different styles assign +different meaning to one dot. With the @code{default} style each dot +represents 1K, there are ten dots in a cluster and 50 dots in a line. +The @code{binary} style has a more ``computer''-like orientation---8K +dots, 16-dot clusters and 48 dots per line (which makes for 384K +lines). 
The @code{mega} style is suitable for downloading large +files---each dot represents 64K retrieved, there are eight dots in a +cluster, and 48 dots on each line (so each line contains 3M). +If @code{mega} is not enough then you can use the @code{giga} +style---each dot represents 1M retrieved, there are eight dots in a +cluster, and 32 dots on each line (so each line contains 32M). + +With @samp{--progress=bar}, there are currently two possible parameters, +@var{force} and @var{noscroll}. + +When the output is not a TTY, the progress bar always falls back to ``dot'', +even if @samp{--progress=bar} was passed to Wget during invocation. This +behaviour can be overridden and the ``bar'' output forced by using the ``force'' +parameter as @samp{--progress=bar:force}. + +By default, the @samp{bar} style progress bar scrolls the name of the file from +left to right for the file being downloaded if the filename exceeds the maximum +length allotted for its display. In certain cases, such as with +@samp{--progress=bar:force}, one may not want the scrolling filename in the +progress bar. By passing the ``noscroll'' parameter, Wget can be forced to +display as much of the filename as possible without scrolling through it. + +Note that you can set the default style using the @code{progress} +command in @file{.wgetrc}. That setting may be overridden from the +command line. For example, to force the bar output without scrolling, +use @samp{--progress=bar:force:noscroll}. + +@item --show-progress +Force wget to display the progress bar in any verbosity. + +By default, wget only displays the progress bar in verbose mode. One may, +however, want wget to display the progress bar on screen in conjunction with +any other verbosity modes like @samp{--no-verbose} or @samp{--quiet}. This +is often a desired property when invoking wget to download several small/large +files. In such a case, wget could simply be invoked with this parameter to get +a much cleaner output on the screen.
+ +This option will also force the progress bar to be printed to @file{stderr} when +used alongside the @samp{--output-file} option. + +@item -N +@itemx --timestamping +Turn on time-stamping. @xref{Time-Stamping}, for details. + +@item --no-if-modified-since +Do not send the @code{If-Modified-Since} header in @samp{-N} mode. Send a preliminary HEAD +request instead. This only has an effect in @samp{-N} mode. + +@item --no-use-server-timestamps +Don't set the local file's timestamp by the one on the server. + +By default, when a file is downloaded, its timestamps are set to +match those from the remote file. This allows the use of +@samp{--timestamping} on subsequent invocations of wget. However, it +is sometimes useful to base the local file's timestamp on when it was +actually downloaded; for that purpose, the +@samp{--no-use-server-timestamps} option has been provided. + +@cindex server response, print +@item -S +@itemx --server-response +Print the headers sent by @sc{http} servers and responses sent by +@sc{ftp} servers. + +@cindex Wget as spider +@cindex spider +@item --spider +When invoked with this option, Wget will behave as a Web @dfn{spider}, +which means that it will not download the pages, just check that they +are there. For example, you can use Wget to check your bookmarks: + +@example +wget --spider --force-html -i bookmarks.html +@end example + +This feature needs much more work for Wget to get close to the +functionality of real web spiders. + +@cindex timeout +@item -T @var{seconds} +@itemx --timeout=@var{seconds} +Set the network timeout to @var{seconds} seconds. This is equivalent +to specifying @samp{--dns-timeout}, @samp{--connect-timeout}, and +@samp{--read-timeout}, all at the same time. + +When interacting with the network, Wget can check for timeout and +abort the operation if it takes too long. This prevents anomalies +like hanging reads and infinite connects. The only timeout enabled by +default is a 900-second read timeout.
Setting a timeout to 0 disables +it altogether. Unless you know what you are doing, it is best not to +change the default timeout settings. + +All timeout-related options accept decimal values, as well as +subsecond values. For example, @samp{0.1} seconds is a legal (though +unwise) choice of timeout. Subsecond timeouts are useful for checking +server response times or for testing network latency. + +@cindex DNS timeout +@cindex timeout, DNS +@item --dns-timeout=@var{seconds} +Set the DNS lookup timeout to @var{seconds} seconds. DNS lookups that +don't complete within the specified time will fail. By default, there +is no timeout on DNS lookups, other than that implemented by system +libraries. + +@cindex connect timeout +@cindex timeout, connect +@item --connect-timeout=@var{seconds} +Set the connect timeout to @var{seconds} seconds. TCP connections that +take longer to establish will be aborted. By default, there is no +connect timeout, other than that implemented by system libraries. + +@cindex read timeout +@cindex timeout, read +@item --read-timeout=@var{seconds} +Set the read (and write) timeout to @var{seconds} seconds. The +``time'' of this timeout refers to @dfn{idle time}: if, at any point in +the download, no data is received for more than the specified number +of seconds, reading fails and the download is restarted. This option +does not directly affect the duration of the entire download. + +Of course, the remote server may choose to terminate the connection +sooner than this option requires. The default read timeout is 900 +seconds. + +@cindex bandwidth, limit +@cindex rate, limit +@cindex limit bandwidth +@item --limit-rate=@var{amount} +Limit the download speed to @var{amount} bytes per second. Amount may +be expressed in bytes, kilobytes with the @samp{k} suffix, or megabytes +with the @samp{m} suffix. For example, @samp{--limit-rate=20k} will +limit the retrieval rate to 20KB/s. 
This is useful when, for whatever +reason, you don't want Wget to consume the entire available bandwidth. + +This option allows the use of decimal numbers, usually in conjunction +with power suffixes; for example, @samp{--limit-rate=2.5k} is a legal +value. + +Note that Wget implements the limiting by sleeping the appropriate +amount of time after a network read that took less time than specified +by the rate. Eventually this strategy causes the TCP transfer to slow +down to approximately the specified rate. However, it may take some +time for this balance to be achieved, so don't be surprised if limiting +the rate doesn't work well with very small files. + +@cindex pause +@cindex wait +@item -w @var{seconds} +@itemx --wait=@var{seconds} +Wait the specified number of seconds between the retrievals. Use of +this option is recommended, as it lightens the server load by making the +requests less frequent. Instead of in seconds, the time can be +specified in minutes using the @code{m} suffix, in hours using @code{h} +suffix, or in days using @code{d} suffix. + +Specifying a large value for this option is useful if the network or the +destination host is down, so that Wget can wait long enough to +reasonably expect the network error to be fixed before the retry. The +waiting interval specified by this option is influenced by +@samp{--random-wait}, described below. + +@cindex retries, waiting between +@cindex waiting between retries +@item --waitretry=@var{seconds} +If you don't want Wget to wait between @emph{every} retrieval, but only +between retries of failed downloads, you can use this option. Wget will +use @dfn{linear backoff}, waiting 1 second after the first failure on a +given file, then waiting 2 seconds after the second failure on that +file, up to the maximum number of @var{seconds} you specify. + +By default, Wget will assume a value of 10 seconds.
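As a rough illustration of the linear backoff used by @samp{--waitretry}, the sketch below lists the pauses that would follow a run of consecutive failures on one file. The name @code{waitretry_delays} is hypothetical, and whole seconds are assumed for simplicity.

```python
def waitretry_delays(failures, max_seconds=10):
    """Pause (in seconds) after each of `failures` consecutive failures.

    Linear backoff: 1 second, then 2, and so on, capped at max_seconds.
    The default of 10 matches the default stated for --waitretry.
    """
    return [min(n, max_seconds) for n in range(1, failures + 1)]
```

With the default cap, twelve failures in a row would pause for 1, 2, and so on up to 10 seconds, then 10 and 10 again.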
+ +@cindex wait, random +@cindex random wait +@item --random-wait +Some web sites may perform log analysis to identify retrieval programs +such as Wget by looking for statistically significant similarities in +the time between requests. This option causes the time between requests +to vary between 0.5 and 1.5 * @var{wait} seconds, where @var{wait} was +specified using the @samp{--wait} option, in order to mask Wget's +presence from such analysis. + +A 2001 article in a publication devoted to development on a popular +consumer platform provided code to perform this analysis on the fly. +Its author suggested blocking at the class C address level to ensure +automated retrieval programs were blocked despite changing DHCP-supplied +addresses. + +The @samp{--random-wait} option was inspired by this ill-advised +recommendation to block many unrelated users from a web site due to the +actions of one. + +@cindex proxy +@item --no-proxy +Don't use proxies, even if the appropriate @code{*_proxy} environment +variable is defined. + +@c man end +@xref{Proxies}, for more information about the use of proxies with +Wget. +@c man begin OPTIONS + +@cindex quota +@item -Q @var{quota} +@itemx --quota=@var{quota} +Specify download quota for automatic retrievals. The value can be +specified in bytes (default), kilobytes (with @samp{k} suffix), or +megabytes (with @samp{m} suffix). + +Note that quota will never affect downloading a single file. So if you +specify @samp{wget -Q10k https://example.com/ls-lR.gz}, all of the +@file{ls-lR.gz} will be downloaded. The same goes even when several +@sc{url}s are specified on the command-line. The quota is checked only +at the end of each downloaded file, so it will never result in a partially +downloaded file. Thus you may safely type @samp{wget -Q2m -i sites}---download +will be aborted after the file that exhausts the quota is completely +downloaded. + +Setting quota to 0 or to @samp{inf} unlimits the download quota. 
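Because the quota is checked only after a file has been fully retrieved, the total downloaded can exceed the quota by up to one file. The sketch below models that for a list of file sizes, as with @samp{wget -Q2m -i sites}. The function name @code{files_fetched} is hypothetical, and the comparison used (stop once the running total is strictly greater than the quota) is an assumption.

```python
def files_fetched(sizes, quota):
    """Return the sizes of the files that would be downloaded in full.

    quota == 0 means "no quota". The check happens only after each
    complete file, so no file is ever cut short.
    """
    fetched = []
    total = 0
    for size in sizes:
        fetched.append(size)  # a started file is always completed
        total += size
        if quota and total > quota:
            break  # quota exhausted: do not start the next file
    return fetched
```

With three 600-byte files and a quota of 1000 bytes, the first two files are retrieved (1200 bytes in total) and the third is never started.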
+ +@cindex DNS cache +@cindex caching of DNS lookups +@item --no-dns-cache +Turn off caching of DNS lookups. Normally, Wget remembers the IP +addresses it looked up from DNS so it doesn't have to repeatedly +contact the DNS server for the same (typically small) set of hosts it +retrieves from. This cache exists in memory only; a new Wget run will +contact DNS again. + +However, it has been reported that in some situations it is not +desirable to cache host names, even for the duration of a +short-running application like Wget. With this option Wget issues a +new DNS lookup (more precisely, a new call to @code{gethostbyname} or +@code{getaddrinfo}) each time it makes a new connection. Please note +that this option will @emph{not} affect caching that might be +performed by the resolving library or by an external caching layer, +such as NSCD. + +If you don't understand exactly what this option does, you probably +won't need it. + +@cindex file names, restrict +@cindex Windows file names +@item --restrict-file-names=@var{modes} +Change which characters found in remote URLs must be escaped during +generation of local filenames. Characters that are @dfn{restricted} +by this option are escaped, i.e. replaced with @samp{%HH}, where +@samp{HH} is the hexadecimal number that corresponds to the restricted +character. This option may also be used to force all alphabetical +cases to be either lower- or uppercase. + +By default, Wget escapes the characters that are not valid or safe as +part of file names on your operating system, as well as control +characters that are typically unprintable. This option is useful for +changing these defaults, perhaps because you are downloading to a +non-native partition, or because you want to disable escaping of the +control characters, or you want to further restrict characters to only +those in the @sc{ascii} range of values. + +The @var{modes} are a comma-separated set of text values. 
The +acceptable values are @samp{unix}, @samp{windows}, @samp{nocontrol}, +@samp{ascii}, @samp{lowercase}, and @samp{uppercase}. The values +@samp{unix} and @samp{windows} are mutually exclusive (one will +override the other), as are @samp{lowercase} and +@samp{uppercase}. Those last are special cases, as they do not change +the set of characters that would be escaped, but rather force local +file paths to be converted either to lower- or uppercase. + +When ``unix'' is specified, Wget escapes the character @samp{/} and +the control characters in the ranges 0--31 and 128--159. This is the +default on Unix-like operating systems. + +When ``windows'' is given, Wget escapes the characters @samp{\}, +@samp{|}, @samp{/}, @samp{:}, @samp{?}, @samp{"}, @samp{*}, @samp{<}, +@samp{>}, and the control characters in the ranges 0--31 and 128--159. +In addition to this, Wget in Windows mode uses @samp{+} instead of +@samp{:} to separate host and port in local file names, and uses +@samp{@@} instead of @samp{?} to separate the query portion of the file +name from the rest. Therefore, a URL that would be saved as +@samp{www.xemacs.org:4300/search.pl?input=blah} in Unix mode would be +saved as @samp{www.xemacs.org+4300/search.pl@@input=blah} in Windows +mode. This mode is the default on Windows. + +If you specify @samp{nocontrol}, then the escaping of the control +characters is also switched off. This option may make sense +when you are downloading URLs whose names contain UTF-8 characters, on +a system which can save and display filenames in UTF-8 (some possible +byte values used in UTF-8 byte sequences fall in the range of values +designated by Wget as ``controls''). + +The @samp{ascii} mode is used to specify that any bytes whose values +are outside the range of @sc{ascii} characters (that is, greater than +127) shall be escaped. This can be useful when saving filenames +whose encoding does not match the one used locally. 
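The @samp{%HH} escaping for the @samp{unix} and @samp{windows} modes can be sketched as follows. This is an illustration built from the character lists above, not Wget's implementation: @code{escape_component} is a hypothetical name, uppercase hexadecimal digits are assumed, and the Windows-mode substitutions of @samp{+} and @samp{@@} for the port and query separators are left out.

```python
# Control characters in the ranges 0--31 and 128--159.
CONTROL = set(range(0, 32)) | set(range(128, 160))
UNIX = {ord("/")} | CONTROL
WINDOWS = {ord(c) for c in '\\|/:?"*<>'} | CONTROL


def escape_component(name, mode="unix"):
    """Escape restricted bytes in one file name component as %HH."""
    restricted = WINDOWS if mode == "windows" else UNIX
    out = bytearray()
    for byte in name:  # iterating over bytes yields integers
        if byte in restricted:
            out += b"%%%02X" % byte
        else:
            out.append(byte)
    return bytes(out)
```

Under these assumptions @code{escape_component(b"a:b?c", "windows")} gives @code{b"a%3Ab%3Fc"}, while the same name passes through unchanged in @samp{unix} mode.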
+ +@cindex IPv6 +@item -4 +@itemx --inet4-only +@itemx -6 +@itemx --inet6-only +Force connecting to IPv4 or IPv6 addresses. With @samp{--inet4-only} +or @samp{-4}, Wget will only connect to IPv4 hosts, ignoring AAAA +records in DNS, and refusing to connect to IPv6 addresses specified in +URLs. Conversely, with @samp{--inet6-only} or @samp{-6}, Wget will +only connect to IPv6 hosts and ignore A records and IPv4 addresses. + +Neither option should normally be needed. By default, an IPv6-aware +Wget will use the address family specified by the host's DNS record. +If the DNS responds with both IPv4 and IPv6 addresses, Wget will try +them in sequence until it finds one it can connect to. (Also see the +@samp{--prefer-family} option described below.) + +These options can be used to deliberately force the use of IPv4 or +IPv6 address families on dual family systems, usually to aid debugging +or to deal with broken network configuration. Only one of +@samp{--inet6-only} and @samp{--inet4-only} may be specified at the +same time. Neither option is available in Wget compiled without IPv6 +support. + +@item --prefer-family=none/IPv4/IPv6 +When given a choice of several addresses, connect to the addresses +with specified address family first. The address order returned by +DNS is used without change by default. + +This avoids spurious errors and connect attempts when accessing hosts +that resolve to both IPv6 and IPv4 addresses from IPv4 networks. For +example, @samp{www.kame.net} resolves to +@samp{2001:200:0:8002:203:47ff:fea5:3085} and to +@samp{203.178.141.194}. When the preferred family is @code{IPv4}, the +IPv4 address is used first; when the preferred family is @code{IPv6}, +the IPv6 address is used first; if the specified value is @code{none}, +the address order returned by DNS is used without change. + +Unlike @samp{-4} and @samp{-6}, this option doesn't inhibit access to +any address family, it only changes the @emph{order} in which the +addresses are accessed.
Also note that the reordering performed by +this option is @dfn{stable}---it doesn't affect order of addresses of +the same family. That is, the relative order of all IPv4 addresses +and of all IPv6 addresses remains intact in all cases. + +@item --retry-connrefused +Consider ``connection refused'' a transient error and try again. +Normally Wget gives up on a URL when it is unable to connect to the +site because failure to connect is taken as a sign that the server is +not running at all and that retries would not help. This option is +for mirroring unreliable sites whose servers tend to disappear for +short periods of time. + +@cindex user +@cindex password +@cindex authentication +@item --user=@var{user} +@itemx --password=@var{password} +Specify the username @var{user} and password @var{password} for both +@sc{ftp} and @sc{http} file retrieval. These parameters can be overridden +using the @samp{--ftp-user} and @samp{--ftp-password} options for +@sc{ftp} connections and the @samp{--http-user} and @samp{--http-password} +options for @sc{http} connections. + +@item --ask-password +Prompt for a password for each connection established. Cannot be specified +when @samp{--password} is being used, because they are mutually exclusive. + +@item --use-askpass=@var{command} +Prompt for a user and password using the specified command. If no command is +specified then the command in the environment variable WGET_ASKPASS is used. +If WGET_ASKPASS is not set then the command in the environment variable +SSH_ASKPASS is used. + +You can set the default command for use-askpass in the @file{.wgetrc}. That +setting may be overridden from the command line. + +@cindex iri support +@cindex idn support +@item --no-iri + +Turn off internationalized URI (IRI) support. Use @samp{--iri} to +turn it on. IRI support is activated by default. + +You can set the default state of IRI support using the @code{iri} +command in @file{.wgetrc}. That setting may be overridden from the +command line. 
+ +@cindex local encoding +@item --local-encoding=@var{encoding} + +Force Wget to use @var{encoding} as the default system encoding. That affects +how Wget converts URLs specified as arguments from locale to @sc{utf-8} for +IRI support. + +Wget uses the function @code{nl_langinfo()} and then the @code{CHARSET} +environment variable to get the locale. If it fails, @sc{ascii} is used. + +You can set the default local encoding using the @code{local_encoding} +command in @file{.wgetrc}. That setting may be overridden from the +command line. + +@cindex remote encoding +@item --remote-encoding=@var{encoding} + +Force Wget to use @var{encoding} as the default remote server encoding. +That affects how Wget converts URIs found in files from remote encoding +to @sc{utf-8} during a recursive fetch. This option is only useful for +IRI support, for the interpretation of non-@sc{ascii} characters. + +For HTTP, remote encoding can be found in HTTP @code{Content-Type} +header and in HTML @code{Content-Type http-equiv} meta tag. + +You can set the default encoding using the @code{remoteencoding} +command in @file{.wgetrc}. That setting may be overridden from the +command line. + +@cindex unlink +@item --unlink + +Force Wget to unlink an existing file instead of clobbering it. This +option is useful for downloading to a directory with hard links. + +@end table + +@node Directory Options, HTTP Options, Download Options, Invoking +@section Directory Options + +@table @samp +@item -nd +@itemx --no-directories +Do not create a hierarchy of directories when retrieving recursively. +With this option turned on, all files will get saved to the current +directory, without clobbering (if a name shows up more than once, the +filenames will get extensions @samp{.n}). + +@item -x +@itemx --force-directories +The opposite of @samp{-nd}---create a hierarchy of directories, even if +one would not have been created otherwise. E.g.
@samp{wget -x +http://fly.srk.fer.hr/robots.txt} will save the downloaded file to +@file{fly.srk.fer.hr/robots.txt}. + +@item -nH +@itemx --no-host-directories +Disable generation of host-prefixed directories. By default, invoking +Wget with @samp{-r http://fly.srk.fer.hr/} will create a structure of +directories beginning with @file{fly.srk.fer.hr/}. This option disables +such behavior. + +@item --protocol-directories +Use the protocol name as a directory component of local file names. For +example, with this option, @samp{wget -r http://@var{host}} will save to +@samp{http/@var{host}/...} rather than just to @samp{@var{host}/...}. + +@cindex cut directories +@item --cut-dirs=@var{number} +Ignore @var{number} directory components. This is useful for getting a +fine-grained control over the directory where recursive retrieval will +be saved. + +Take, for example, the directory at +@samp{ftp://ftp.xemacs.org/pub/xemacs/}. If you retrieve it with +@samp{-r}, it will be saved locally under +@file{ftp.xemacs.org/pub/xemacs/}. While the @samp{-nH} option can +remove the @file{ftp.xemacs.org/} part, you are still stuck with +@file{pub/xemacs}. This is where @samp{--cut-dirs} comes in handy; it +makes Wget not ``see'' @var{number} remote directory components. Here +are several examples of how the @samp{--cut-dirs} option works. + +@example +@group +No options -> ftp.xemacs.org/pub/xemacs/ +-nH -> pub/xemacs/ +-nH --cut-dirs=1 -> xemacs/ +-nH --cut-dirs=2 -> . + +--cut-dirs=1 -> ftp.xemacs.org/xemacs/ +... +@end group +@end example + +If you just want to get rid of the directory structure, this option is +similar to a combination of @samp{-nd} and @samp{-P}. However, unlike +@samp{-nd}, @samp{--cut-dirs} does not lose subdirectories---for +instance, with @samp{-nH --cut-dirs=1}, a @file{beta/} subdirectory will +be placed in @file{xemacs/beta}, as one would expect.
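The interaction of @samp{-nH} and @samp{--cut-dirs} in the table above can be reproduced with a short sketch. The function @code{local_dir} and its parameters are hypothetical; it only models how the local directory name is derived and ignores @samp{-P} and @samp{--protocol-directories}.

```python
def local_dir(host, remote_dir, no_host=False, cut_dirs=0):
    """Local directory for a remote directory, as in the table above.

    no_host models -nH; cut_dirs models --cut-dirs=NUMBER.
    """
    # Split the remote path and drop the first cut_dirs components.
    parts = [p for p in remote_dir.split("/") if p][cut_dirs:]
    if not no_host:
        parts.insert(0, host)
    return "/".join(parts) + "/" if parts else "."
```

For example, @code{local_dir("ftp.xemacs.org", "pub/xemacs", no_host=True, cut_dirs=1)} returns @code{"xemacs/"}, matching the @samp{-nH --cut-dirs=1} row.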
+ +@cindex directory prefix +@item -P @var{prefix} +@itemx --directory-prefix=@var{prefix} +Set directory prefix to @var{prefix}. The @dfn{directory prefix} is the +directory where all other files and subdirectories will be saved to, +i.e. the top of the retrieval tree. The default is @samp{.} (the +current directory). +@end table + +@node HTTP Options, HTTPS (SSL/TLS) Options, Directory Options, Invoking +@section HTTP Options + +@table @samp +@cindex default page name +@cindex index.html +@item --default-page=@var{name} +Use @var{name} as the default file name when it isn't known (i.e., for +URLs that end in a slash), instead of @file{index.html}. + +@cindex .html extension +@cindex .css extension +@item -E +@itemx --adjust-extension +If a file of type @samp{application/xhtml+xml} or @samp{text/html} is +downloaded and the URL does not end with the regexp +@samp{\.[Hh][Tt][Mm][Ll]?}, this option will cause the suffix @samp{.html} +to be appended to the local filename. This is useful, for instance, when +you're mirroring a remote site that uses @samp{.asp} pages, but you want +the mirrored pages to be viewable on your stock Apache server. Another +good use for this is when you're downloading CGI-generated materials. A URL +like @samp{http://site.com/article.cgi?25} will be saved as +@file{article.cgi?25.html}. + +Note that filenames changed in this way will be re-downloaded every time +you re-mirror a site, because Wget can't tell that the local +@file{@var{X}.html} file corresponds to remote URL @samp{@var{X}} (since +it doesn't yet know that the URL produces output of type +@samp{text/html} or @samp{application/xhtml+xml}). + +As of version 1.12, Wget will also ensure that any downloaded files of +type @samp{text/css} end in the suffix @samp{.css}, and the option was +renamed from @samp{--html-extension}, to better reflect its new +behavior. The old option name is still acceptable, but should now be +considered deprecated.
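The suffix rule applied by @samp{-E} to HTML and CSS content can be approximated as below. This is a sketch of the rule as worded above, not Wget's code: @code{adjusted_name} is a hypothetical helper, the regexp is anchored at the end of the name, and the case-insensitive test for an existing @samp{.css} suffix is an assumption.

```python
import re

# The regexp from the text, anchored at the end of the name.
HTML_SUFFIX = re.compile(r"\.[Hh][Tt][Mm][Ll]?$")
HTML_TYPES = ("text/html", "application/xhtml+xml")


def adjusted_name(local_name, content_type):
    """Return the local file name after a hypothetical -E adjustment."""
    if content_type in HTML_TYPES and not HTML_SUFFIX.search(local_name):
        return local_name + ".html"
    if content_type == "text/css" and not local_name.lower().endswith(".css"):
        return local_name + ".css"
    return local_name
```

Under these assumptions @code{adjusted_name("article.cgi?25", "text/html")} returns @code{"article.cgi?25.html"}, as in the example above, while @file{index.htm} is left alone.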
+ +As of version 1.19.2, Wget will also ensure that any downloaded files with +a @code{Content-Encoding} of @samp{br}, @samp{compress}, @samp{deflate} +or @samp{gzip} end in the suffix @samp{.br}, @samp{.Z}, @samp{.zlib} +and @samp{.gz} respectively. + +At some point in the future, this option may well be expanded to +include suffixes for other types of content, including content types +that are not parsed by Wget. + +@cindex http user +@cindex http password +@cindex authentication +@item --http-user=@var{user} +@itemx --http-password=@var{password} +Specify the username @var{user} and password @var{password} on an +@sc{http} server. According to the type of the challenge, Wget will +encode them using either the @code{basic} (insecure), +the @code{digest}, or the Windows @code{NTLM} authentication scheme. + +Another way to specify username and password is in the @sc{url} itself +(@pxref{URL Format}). Either method reveals your password to anyone who +bothers to run @code{ps}. To prevent the passwords from being seen, +use @samp{--use-askpass} or store them in @file{.wgetrc} or @file{.netrc}, +and make sure to protect those files from other users with @code{chmod}. If +the passwords are really important, do not leave them lying in those files +either---edit the files and delete them after Wget has started the download. + +@iftex +@xref{Security Considerations}, for more information about security +issues with Wget. +@end iftex + +@cindex Keep-Alive, turning off +@cindex Persistent Connections, disabling +@item --no-http-keep-alive +Turn off the ``keep-alive'' feature for HTTP downloads. Normally, Wget +asks the server to keep the connection open so that, when you download +more than one document from the same server, they get transferred over +the same TCP connection. This saves time and at the same time reduces +the load on the server.
+ +This option is useful when, for some reason, persistent (keep-alive) +connections don't work for you, for example due to a server bug or due +to the inability of server-side scripts to cope with the connections. + +@cindex proxy +@cindex cache +@item --no-cache +Disable server-side cache. In this case, Wget will send the remote +server appropriate directives (@samp{Cache-Control: no-cache} and +@samp{Pragma: no-cache}) to get the file from the remote service, +rather than returning the cached version. This is especially useful +for retrieving and flushing out-of-date documents on proxy servers. + +Caching is allowed by default. + +@cindex cookies +@item --no-cookies +Disable the use of cookies. Cookies are a mechanism for maintaining +server-side state. The server sends the client a cookie using the +@code{Set-Cookie} header, and the client responds with the same cookie +upon further requests. Since cookies allow the server owners to keep +track of visitors and for sites to exchange this information, some +consider them a breach of privacy. The default is to use cookies; +however, @emph{storing} cookies is not on by default. + +@cindex loading cookies +@cindex cookies, loading +@item --load-cookies @var{file} +Load cookies from @var{file} before the first HTTP retrieval. +@var{file} is a textual file in the format originally used by Netscape's +@file{cookies.txt} file. + +You will typically use this option when mirroring sites that require +that you be logged in to access some or all of their content. The login +process typically works by the web server issuing an @sc{http} cookie +upon receiving and verifying your credentials. The cookie is then +resent by the browser when accessing that part of the site, and so +proves your identity. + +Mirroring such a site requires Wget to send the same cookies your +browser sends when communicating with the site. 
This is achieved by +@samp{--load-cookies}---simply point Wget to the location of the +@file{cookies.txt} file, and it will send the same cookies your browser +would send in the same situation. Different browsers keep textual +cookie files in different locations: + +@table @asis +@item Netscape 4.x. +The cookies are in @file{~/.netscape/cookies.txt}. + +@item Mozilla and Netscape 6.x. +Mozilla's cookie file is also named @file{cookies.txt}, located +somewhere under @file{~/.mozilla}, in the directory of your profile. +The full path usually ends up looking somewhat like +@file{~/.mozilla/default/@var{some-weird-string}/cookies.txt}. + +@item Internet Explorer. +You can produce a cookie file Wget can use by using the File menu, +Import and Export, Export Cookies. This has been tested with Internet +Explorer 5; it is not guaranteed to work with earlier versions. + +@item Other browsers. +If you are using a different browser to create your cookies, +@samp{--load-cookies} will only work if you can locate or produce a +cookie file in the Netscape format that Wget expects. +@end table + +If you cannot use @samp{--load-cookies}, there might still be an +alternative. If your browser supports a ``cookie manager'', you can use +it to view the cookies used when accessing the site you're mirroring. +Write down the name and value of the cookie, and manually instruct Wget +to send those cookies, bypassing the ``official'' cookie support: + +@example +wget --no-cookies --header "Cookie: @var{name}=@var{value}" +@end example + +@cindex saving cookies +@cindex cookies, saving +@item --save-cookies @var{file} +Save cookies to @var{file} before exiting. This will not save cookies +that have expired or that have no expiry time (so-called ``session +cookies''), but also see @samp{--keep-session-cookies}. + +@cindex cookies, session +@cindex session cookies +@item --keep-session-cookies +When specified, causes @samp{--save-cookies} to also save session +cookies. 
Session cookies are normally not saved because they are +meant to be kept in memory and forgotten when you exit the browser. +Saving them is useful on sites that require you to log in or to visit +the home page before you can access some pages. With this option, +multiple Wget runs are considered a single browser session as far as +the site is concerned. + +Since the cookie file format does not normally carry session cookies, +Wget marks them with an expiry timestamp of 0. Wget's +@samp{--load-cookies} recognizes those as session cookies, but it might +confuse other browsers. Also note that cookies so loaded will be +treated as other session cookies, which means that if you want +@samp{--save-cookies} to preserve them again, you must use +@samp{--keep-session-cookies} again. + +@cindex Content-Length, ignore +@cindex ignore length +@item --ignore-length +Unfortunately, some @sc{http} servers (@sc{cgi} programs, to be more +precise) send out bogus @code{Content-Length} headers, which makes Wget +go wild, as it thinks not all the document was retrieved. You can spot +this syndrome if Wget retries getting the same document again and again, +each time claiming that the (otherwise normal) connection has closed on +the very same byte. + +With this option, Wget will ignore the @code{Content-Length} header---as +if it never existed. + +@cindex header, add +@item --header=@var{header-line} +Send @var{header-line} along with the rest of the headers in each +@sc{http} request. The supplied header is sent as-is, which means it +must contain name and value separated by colon, and must not contain +newlines. + +You may define more than one additional header by specifying +@samp{--header} more than once. + +@example +@group +wget --header='Accept-Charset: iso-8859-2' \ + --header='Accept-Language: hr' \ + http://fly.srk.fer.hr/ +@end group +@end example + +Specification of an empty string as the header value will clear all +previous user-defined headers. 
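The rules stated above for @samp{--header} (a name and value separated by a colon, no newlines, and an empty string clearing earlier headers) can be modelled as follows. The function @code{effective_headers} is hypothetical, and the exact validation Wget performs may differ from this sketch.

```python
def effective_headers(header_options):
    """Return the user-defined headers left after a run of --header options.

    header_options is the list of --header arguments in command-line order.
    """
    headers = []
    for line in header_options:
        if line == "":
            headers.clear()  # an empty string discards earlier headers
            continue
        name, colon, _value = line.partition(":")
        if not colon or not name or "\n" in line:
            raise ValueError("invalid header: %r" % line)
        headers.append(line)
    return headers
```

For instance, the option sequence @code{["Accept-Language: hr", "", "Accept-Charset: iso-8859-2"]} leaves only the @code{Accept-Charset} header in effect.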
+ +As of Wget 1.10, this option can be used to override headers otherwise +generated automatically. This example instructs Wget to connect to +localhost, but to specify @samp{foo.bar} in the @code{Host} header: + +@example +wget --header="Host: foo.bar" http://localhost/ +@end example + +In versions of Wget prior to 1.10 such use of @samp{--header} caused +sending of duplicate headers. + +@cindex Content-Encoding, choose +@item --compression=@var{type} +Choose the type of compression to be used. Legal values are +@samp{auto}, @samp{gzip} and @samp{none}. + +If @samp{auto} or @samp{gzip} is specified, Wget asks the server to +compress the file using the gzip compression format. If the server +compresses the file and responds with the @code{Content-Encoding} +header field set appropriately, the file will be decompressed +automatically. + +If @samp{none} is specified, Wget will not ask the server to compress +the file and will not decompress any server responses. This is the default. + +Compression support is currently experimental. In case it is turned on, +please report any bugs to @code{bug-wget@@gnu.org}. + +@cindex redirect +@item --max-redirect=@var{number} +Specifies the maximum number of redirections to follow for a resource. +The default is 20, which is usually far more than necessary. However, on +those occasions where you want to allow more (or fewer), this is the +option to use. + +@cindex proxy user +@cindex proxy password +@cindex proxy authentication +@item --proxy-user=@var{user} +@itemx --proxy-password=@var{password} +Specify the username @var{user} and password @var{password} for +authentication on a proxy server. Wget will encode them using the +@code{basic} authentication scheme. + +Security considerations similar to those with @samp{--http-password} +pertain here as well. + +@cindex http referer +@cindex referer, http +@item --referer=@var{url} +Include `Referer: @var{url}' header in HTTP request.
Useful for +retrieving documents with server-side processing that assume they are +always being retrieved by interactive web browsers and only come out +properly when Referer is set to one of the pages that point to them. + +@cindex server response, save +@item --save-headers +Save the headers sent by the @sc{http} server to the file, preceding the +actual contents, with an empty line as the separator. + +@cindex user-agent +@item -U @var{agent-string} +@itemx --user-agent=@var{agent-string} +Identify as @var{agent-string} to the @sc{http} server. + +The @sc{http} protocol allows the clients to identify themselves using a +@code{User-Agent} header field. This enables distinguishing the +@sc{www} software, usually for statistical purposes or for tracing of +protocol violations. Wget normally identifies as +@samp{Wget/@var{version}}, @var{version} being the current version +number of Wget. + +However, some sites have been known to impose the policy of tailoring +the output according to the @code{User-Agent}-supplied information. +While this is not such a bad idea in theory, it has been abused by +servers denying information to clients other than (historically) +Netscape or, more frequently, Microsoft Internet Explorer. This +option allows you to change the @code{User-Agent} line issued by Wget. +Use of this option is discouraged, unless you really know what you are +doing. + +Specifying empty user agent with @samp{--user-agent=""} instructs Wget +not to send the @code{User-Agent} header in @sc{http} requests. + +@cindex POST +@item --post-data=@var{string} +@itemx --post-file=@var{file} +Use POST as the method for all HTTP requests and send the specified +data in the request body. @samp{--post-data} sends @var{string} as +data, whereas @samp{--post-file} sends the contents of @var{file}. +Other than that, they work in exactly the same way. 
In particular, +they @emph{both} expect content of the form @code{key1=value1&key2=value2}, +with percent-encoding for special characters; the only difference is +that one expects its content as a command-line parameter and the other +accepts its content from a file. In particular, @samp{--post-file} is +@emph{not} for transmitting files as form attachments: those must +appear as @code{key=value} data (with appropriate percent-coding) just +like everything else. Wget does not currently support +@code{multipart/form-data} for transmitting POST data; only +@code{application/x-www-form-urlencoded}. Only one of +@samp{--post-data} and @samp{--post-file} should be specified. + +Please note that wget does not require the content to be of the form +@code{key1=value1&key2=value2}, and neither does it test for it. Wget will +simply transmit whatever data is provided to it. Most servers however expect +the POST data to be in the above format when processing HTML Forms. + +When sending a POST request using the @samp{--post-file} option, Wget treats +the file as a binary file and will send every character in the POST request +without stripping trailing newline or formfeed characters. Any other control +characters in the text will also be sent as-is in the POST request. + +Please be aware that Wget needs to know the size of the POST data in +advance. Therefore the argument to @code{--post-file} must be a regular +file; specifying a FIFO or something like @file{/dev/stdin} won't work. +It's not quite clear how to work around this limitation inherent in +HTTP/1.0. Although HTTP/1.1 introduces @dfn{chunked} transfer that +doesn't require knowing the request length in advance, a client can't +use chunked unless it knows it's talking to an HTTP/1.1 server. And it +can't know that until it receives a response, which in turn requires the +request to have been completed -- a chicken-and-egg problem. 
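When the POST data comes from a pipe, one way to live with this limitation is to buffer it in a regular file first. The sketch below assumes a @code{mktemp} utility is available; the helper name @code{buffer_stdin}, the command @code{generate_form_data} and the URL are hypothetical.

```shell
# Hypothetical helper: copy standard input into a temporary regular
# file and print that file's name, so its size is known up front.
buffer_stdin() {
    tmp=$(mktemp) || return 1
    cat > "$tmp" || return 1
    printf '%s\n' "$tmp"
}

# Usage (placeholder command and URL, not run here):
#   body=$(generate_form_data | buffer_stdin)
#   wget --post-file="$body" http://example.com/submit.php
#   rm -f "$body"
```

The temporary file is not removed by the helper, so the caller should delete it once Wget has finished.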
+ +Note: As of version 1.15 if Wget is redirected after the POST request is +completed, its behaviour will depend on the response code returned by the +server. In case of a 301 Moved Permanently, 302 Moved Temporarily or +307 Temporary Redirect, Wget will, in accordance with RFC2616, continue +to send a POST request. +In case a server wants the client to change the Request method upon +redirection, it should send a 303 See Other response code. + +This example shows how to log in to a server using POST and then proceed to +download the desired pages, presumably only accessible to authorized +users: + +@example +@group +# @r{Log in to the server. This can be done only once.} +wget --save-cookies cookies.txt \ + --post-data 'user=foo&password=bar' \ + http://example.com/auth.php + +# @r{Now grab the page or pages we care about.} +wget --load-cookies cookies.txt \ + -p http://example.com/interesting/article.php +@end group +@end example + +If the server is using session cookies to track user authentication, +the above will not work because @samp{--save-cookies} will not save +them (and neither will browsers) and the @file{cookies.txt} file will +be empty. In that case use @samp{--keep-session-cookies} along with +@samp{--save-cookies} to force saving of session cookies. + +@cindex Other HTTP Methods +@item --method=@var{HTTP-Method} +For the purpose of RESTful scripting, Wget allows sending of other HTTP Methods +without the need to explicitly set them using @samp{--header=Header-Line}. +Wget will use whatever string is passed to it after @samp{--method} as the HTTP +Method to the server. + +@item --body-data=@var{Data-String} +@itemx --body-file=@var{Data-File} +Must be set when additional data needs to be sent to the server along with the +Method specified using @samp{--method}. @samp{--body-data} sends @var{string} as +data, whereas @samp{--body-file} sends the contents of @var{file}. Other than that, +they work in exactly the same way. 
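As an illustration of @samp{--method} together with @samp{--body-data}, here is a minimal sketch of two wrappers for a REST-style service. The helper names @code{put_form} and @code{delete_resource} and any URL passed to them are assumptions; the body is form-encoded data.

```shell
# Hypothetical helpers around --method; resource URLs are placeholders.

# Update a resource with PUT, sending form-encoded data in the body.
# The response is written to standard output (-O -).
put_form() {
    url=$1
    data=$2
    wget --method=PUT --body-data="$data" -O - "$url"
}

# Remove a resource with DELETE; no body is needed.
delete_resource() {
    wget --method=DELETE -O - "$1"
}

# Usage (not run here):
#   put_form http://example.com/items/1 'name=foo'
#   delete_resource http://example.com/items/1
```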
+ +Currently, @samp{--body-file} is @emph{not} for transmitting files as a whole. +Wget does not currently support @code{multipart/form-data} for transmitting data; +only @code{application/x-www-form-urlencoded}. In the future, this may be changed +so that wget sends the @samp{--body-file} as a complete file instead of sending its +contents to the server. Please be aware that Wget needs to know the contents of +BODY Data in advance, and hence the argument to @samp{--body-file} should be a +regular file. See @samp{--post-file} for a more detailed explanation. +Only one of @samp{--body-data} and @samp{--body-file} should be specified. + +If Wget is redirected after the request is completed, Wget will +suspend the current method and send a GET request till the redirection +is completed. This is true for all redirection response codes except +307 Temporary Redirect which is used to explicitly specify that the +request method should @emph{not} change. Another exception is when +the method is set to @code{POST}, in which case the redirection rules +specified under @samp{--post-data} are followed. + +@cindex Content-Disposition +@item --content-disposition + +If this is set to on, experimental (not fully-functional) support for +@code{Content-Disposition} headers is enabled. This can currently result in +extra round-trips to the server for a @code{HEAD} request, and is known +to suffer from a few bugs, which is why it is not currently enabled by default. + +This option is useful for some file-downloading CGI programs that use +@code{Content-Disposition} headers to describe what the name of a +downloaded file should be. + +When combined with @samp{--metalink-over-http} and @samp{--trust-server-names}, +a @samp{Content-Type: application/metalink4+xml} file is named using the +@code{Content-Disposition} filename field, if available. 
+ +@cindex Content On Error +@item --content-on-error + +If this is set to on, wget will not skip the content when the server responds +with a http status code that indicates error. + +@cindex Trust server names +@item --trust-server-names + +If this is set, on a redirect, the local file name will be based +on the redirection URL. By default the local file name is based on +the original URL. When doing recursive retrieving this can be helpful +because in many web sites redirected URLs correspond to an underlying +file structure, while link URLs do not. + +@cindex authentication +@item --auth-no-challenge + +If this option is given, Wget will send Basic HTTP authentication +information (plaintext username and password) for all requests, just +like Wget 1.10.2 and prior did by default. + +Use of this option is not recommended, and is intended only to support +some few obscure servers, which never send HTTP authentication +challenges, but accept unsolicited auth info, say, in addition to +form-based authentication. + +@item --retry-on-host-error +Consider host errors, such as ``Temporary failure in name resolution'', +as non-fatal, transient errors. + +@item --retry-on-http-error=@var{code[,code,...]} +Consider given HTTP response codes as non-fatal, transient errors. +Supply a comma-separated list of 3-digit HTTP response codes as +argument. Useful to work around special circumstances where retries +are required, but the server responds with an error code normally not +retried by Wget. Such errors might be 503 (Service Unavailable) and +429 (Too Many Requests). Retries enabled by this option are performed +subject to the normal retry timing and retry count limitations of +Wget. + +Using this option is intended to support special use cases only and is +generally not recommended, as it can force retries even in cases where +the server is actually trying to decrease its load. Please use wisely +and only if you know what you are doing. 
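A minimal sketch of how this option might be combined with the usual retry controls follows. The helper name @code{fetch_with_retries}, the chosen values and the URL are assumptions for illustration only; suitable limits depend on the server.

```shell
# Hypothetical helper: treat 429 and 503 as transient, but bound the
# effort with --tries and back off between attempts with --waitretry.
fetch_with_retries() {
    wget --tries=5 --waitretry=10 \
         --retry-on-http-error=429,503 \
         "$@"
}

# Usage (placeholder URL, not run here):
#   fetch_with_retries http://example.com/data.json
```

Keeping @samp{--tries} small and @samp{--waitretry} nonzero limits the extra load placed on a server that is already struggling.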
+ +@end table + +@node HTTPS (SSL/TLS) Options, FTP Options, HTTP Options, Invoking +@section HTTPS (SSL/TLS) Options + +@cindex SSL +To support encrypted HTTP (HTTPS) downloads, Wget must be compiled +with an external SSL library. The current default is GnuTLS. +In addition, Wget also supports HSTS (HTTP Strict Transport Security). +If Wget is compiled without SSL support, none of these options are available. + +@table @samp +@cindex SSL protocol, choose +@item --secure-protocol=@var{protocol} +Choose the secure protocol to be used. Legal values are @samp{auto}, +@samp{SSLv2}, @samp{SSLv3}, @samp{TLSv1}, @samp{TLSv1_1}, @samp{TLSv1_2}, +@samp{TLSv1_3} and @samp{PFS}. If @samp{auto} is used, the SSL library is +given the liberty of choosing the appropriate protocol automatically, which is +achieved by sending a TLSv1 greeting. This is the default. + +Specifying @samp{SSLv2}, @samp{SSLv3}, @samp{TLSv1}, @samp{TLSv1_1}, +@samp{TLSv1_2} or @samp{TLSv1_3} forces the use of the corresponding +protocol. This is useful when talking to old and buggy SSL server +implementations that make it hard for the underlying SSL library to choose +the correct protocol version. Fortunately, such servers are quite rare. + +Specifying @samp{PFS} enforces the use of the so-called Perfect Forward +Secrecy cipher suites. In short, PFS adds security by creating a one-time +key for each SSL connection. It costs a bit more CPU time on both client and server. +Wget uses ciphers known to be secure (e.g. no MD4) and the TLS protocol. This mode +also explicitly excludes non-PFS key exchange methods, such as RSA. + +@item --https-only +When in recursive mode, only HTTPS links are followed. + +@item --ciphers +Set the cipher list string. Typically this string sets the +cipher suites and other SSL/TLS options that the user wants to be used, in a +set order of preference (GnuTLS calls it 'priority string'). 
This string +will be fed verbatim to the SSL/TLS engine (OpenSSL or GnuTLS) and hence +its format and syntax is dependent on that. Wget will not process or manipulate it +in any way. Refer to the OpenSSL or GnuTLS documentation for more information. + +@cindex SSL certificate, check +@item --no-check-certificate +Don't check the server certificate against the available certificate +authorities. Also don't require the URL host name to match the common +name presented by the certificate. + +As of Wget 1.10, the default is to verify the server's certificate +against the recognized certificate authorities, breaking the SSL +handshake and aborting the download if the verification fails. +Although this provides more secure downloads, it does break +interoperability with some sites that worked with previous Wget +versions, particularly those using self-signed, expired, or otherwise +invalid certificates. This option forces an ``insecure'' mode of +operation that turns the certificate verification errors into warnings +and allows you to proceed. + +If you encounter ``certificate verification'' errors or ones saying +that ``common name doesn't match requested host name'', you can use +this option to bypass the verification and proceed with the download. +@emph{Only use this option if you are otherwise convinced of the +site's authenticity, or if you really don't care about the validity of +its certificate.} It is almost always a bad idea not to check the +certificates when transmitting confidential or important data. +For self-signed/internal certificates, you should download the certificate +and verify against that instead of forcing this insecure mode. +If you are really sure of not desiring any certificate verification, you +can specify --check-certificate=quiet to tell wget to not print any +warning about invalid certificates, albeit in most cases this is the +wrong thing to do. 
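For the self-signed or internal case, verifying against a locally stored certificate can be sketched as follows. The helper name @code{fetch_with_local_ca}, the file name and the host are hypothetical; the certificate file is assumed to be in PEM format and to have been obtained through a channel you trust.

```shell
# Hypothetical helper: verify the server against one locally stored
# certificate (PEM) instead of switching verification off.
fetch_with_local_ca() {
    ca_file=$1
    shift
    wget --ca-certificate="$ca_file" "$@"
}

# Usage (placeholder file and host, not run here):
#   fetch_with_local_ca ./internal-ca.pem https://intranet.example/report.html
```

Unlike the insecure mode, this still aborts the download if the server presents a certificate that does not chain to the one in the file.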
+ +@cindex SSL certificate +@item --certificate=@var{file} +Use the client certificate stored in @var{file}. This is needed for +servers that are configured to require certificates from the clients +that connect to them. Normally a certificate is not required and this +switch is optional. + +@cindex SSL certificate type, specify +@item --certificate-type=@var{type} +Specify the type of the client certificate. Legal values are +@samp{PEM} (assumed by default) and @samp{DER}, also known as +@samp{ASN1}. + +@item --private-key=@var{file} +Read the private key from @var{file}. This allows you to provide the +private key in a file separate from the certificate. + +@item --private-key-type=@var{type} +Specify the type of the private key. Accepted values are @samp{PEM} +(the default) and @samp{DER}. + +@item --ca-certificate=@var{file} +Use @var{file} as the file with the bundle of certificate authorities +(``CA'') to verify the peers. The certificates must be in PEM format. + +Without this option Wget looks for CA certificates at the +system-specified locations, chosen at OpenSSL installation time. + +@cindex SSL certificate authority +@item --ca-directory=@var{directory} +Specifies the directory containing CA certificates in PEM format. Each +file contains one CA certificate, and the file name is based on a hash +value derived from the certificate. This is achieved by processing a +certificate directory with the @code{c_rehash} utility supplied with +OpenSSL. Using @samp{--ca-directory} is more efficient than +@samp{--ca-certificate} when many certificates are installed because +it allows Wget to fetch certificates on demand. + +Without this option Wget looks for CA certificates at the +system-specified locations, chosen at OpenSSL installation time. + +@cindex SSL CRL, certificate revocation list +@item --crl-file=@var{file} +Specifies a CRL file in @var{file}. This is needed to detect certificates +that have been revoked by the CAs. 
+ +@cindex SSL Public Key Pin +@item --pinnedpubkey=file/hashes +Tells Wget to use the specified public key file (or hashes) to verify the peer. +This can be a path to a file which contains a single public key in PEM or DER +format, or any number of base64-encoded SHA-256 hashes preceded by ``sha256//'' +and separated by ``;''. + +When negotiating a TLS or SSL connection, the server sends a certificate +indicating its identity. A public key is extracted from this certificate and if +it does not exactly match the public key(s) provided to this option, Wget will +abort the connection before sending or receiving any data. + +@cindex entropy, specifying source of +@cindex randomness, specifying source of +@item --random-file=@var{file} +[OpenSSL and LibreSSL only] +Use @var{file} as the source of random data for seeding the +pseudo-random number generator on systems without @file{/dev/urandom}. + +On such systems the SSL library needs an external source of randomness +to initialize. Randomness may be provided by EGD (see +@samp{--egd-file} below) or read from an external source specified by +the user. If this option is not specified, Wget looks for random data +in @code{$RANDFILE} or, if that is unset, in @file{$HOME/.rnd}. + +If you're getting the ``Could not seed OpenSSL PRNG; disabling SSL.'' +error, you should provide random data using one of the methods +described above. + +@cindex EGD +@item --egd-file=@var{file} +[OpenSSL only] +Use @var{file} as the EGD socket. EGD stands for @dfn{Entropy +Gathering Daemon}, a user-space program that collects data from +various unpredictable system sources and makes it available to other +programs that might need it. Encryption software, such as the SSL +library, needs sources of non-repeating randomness to seed the random +number generator used to produce cryptographically strong keys. + +OpenSSL allows users to specify their own source of entropy using the +@code{RANDFILE} environment variable. 
If this variable is unset, or +if the specified file does not produce enough randomness, OpenSSL will +read random data from the EGD socket specified using this option. + +If this option is not specified (and the equivalent startup command is +not used), EGD is never contacted. EGD is not needed on modern Unix +systems that support @file{/dev/urandom}. + +@cindex HSTS +@item --no-hsts +Wget supports HSTS (HTTP Strict Transport Security, RFC 6797) by default. +Use @samp{--no-hsts} to make Wget act as a non-HSTS-compliant UA. As a +consequence, Wget will ignore all the @code{Strict-Transport-Security} +headers, and will not enforce any existing HSTS policy. + +@item --hsts-file=@var{file} +By default, Wget stores its HSTS database in @file{~/.wget-hsts}. +You can use @samp{--hsts-file} to override this. Wget will use +the supplied file as the HSTS database. Such a file must conform to the +HSTS database format used by Wget. If Wget cannot parse the provided +file, the behaviour is unspecified. + +Wget's HSTS database is a plain text file. Each line contains an HSTS entry +(i.e., a site that has issued a @code{Strict-Transport-Security} header and that +therefore has specified a concrete HSTS policy to be applied). Lines starting with +a hash sign (@code{#}) are ignored by Wget. Please note that in spite of this convenient +human readability, hand-hacking the HSTS database is generally not a good idea. + +An HSTS entry line consists of several fields separated by one or more whitespace characters: + +@code{<hostname> SP [<port>] SP <include subdomains> SP <created> SP <max-age>} + +The @var{hostname} and @var{port} fields indicate the hostname and port to which +the given HSTS policy applies. The @var{port} field may be zero, and in most +cases it will be. That means that the port number will not be taken into account +when deciding whether such an HSTS policy should be applied on a given request (only +the hostname will be evaluated). 
When @var{port} is nonzero, both the +target hostname and the port will be evaluated and the HSTS policy will only be applied +if both of them match. This feature has been included for testing/development purposes only. +The Wget testsuite (in @file{testenv/}) creates HSTS databases with explicit ports +with the purpose of ensuring Wget's correct behaviour. Applying HSTS policies to ports +other than the default ones is discouraged by RFC 6797 (see Appendix B "Differences +between HSTS Policy and Same-Origin Policy"). Thus, this functionality should not be used +in production environments and @var{port} will typically be zero. The last three fields +do what they are expected to. The field @var{include_subdomains} can be either @code{1} +or @code{0} and it signals whether the subdomains of the target domain should be +part of the given HSTS policy as well. The @var{created} and @var{max-age} fields +hold the timestamp of when the entry was created (first seen by Wget) and the +HSTS-defined value 'max-age', which states how long that HSTS policy should remain active, +measured in seconds elapsed since the timestamp stored in @var{created}. Once that time +has passed, that HSTS policy will no longer be valid and will eventually be removed +from the database. + +If you supply your own HSTS database via @samp{--hsts-file}, be aware that Wget +may modify the provided file if the HSTS policies requested by the remote +servers differ from those in the file. When Wget exits, +it effectively updates the HSTS database by rewriting the database file with the new entries. + +If the supplied file does not exist, Wget will create one. This file will contain the new HSTS +entries. If no HSTS entries were generated (no @code{Strict-Transport-Security} headers +were sent by any of the servers) then no file will be created, not even an empty one. 
This +behaviour applies to the default database file (@file{~/.wget-hsts}) as well: it will not be +created until some server enforces an HSTS policy. + +Care is taken not to overwrite changes made to the HSTS database by other Wget +processes running at the same time. Before writing the updated HSTS entries +to the file, Wget will re-read it and merge the changes. + +Using a custom HSTS database and/or modifying an existing one is discouraged. +For more information about the potential security threats arising from such practice, +see section 14 "Security Considerations" of RFC 6797, especially section 14.9 +"Creative Manipulation of HSTS Policy Store". +@end table + +@cindex WARC +@table @samp +@item --warc-file=@var{file} +Use @var{file} as the destination WARC file. + +@item --warc-header=@var{string} +Insert @var{string} into the warcinfo record. + +@item --warc-max-size=@var{size} +Set the maximum size of the WARC files to @var{size}. + +@item --warc-cdx +Write CDX index files. + +@item --warc-dedup=@var{file} +Do not store records listed in this CDX file. + +@item --no-warc-compression +Do not compress WARC files with GZIP. + +@item --no-warc-digests +Do not calculate SHA1 digests. + +@item --no-warc-keep-log +Do not store the log file in a WARC record. + +@item --warc-tempdir=@var{dir} +Specify the location for temporary files created by the WARC writer. +@end table + +@node FTP Options, Recursive Retrieval Options, HTTPS (SSL/TLS) Options, Invoking +@section FTP Options + +@table @samp +@cindex ftp user +@cindex ftp password +@cindex ftp authentication +@item --ftp-user=@var{user} +@itemx --ftp-password=@var{password} +Specify the username @var{user} and password @var{password} on an +@sc{ftp} server. Without this, or the corresponding startup option, +the password defaults to @samp{-wget@@}, normally used for anonymous +FTP. + +Another way to specify username and password is in the @sc{url} itself +(@pxref{URL Format}). 
Either method reveals your password to anyone who +bothers to run @code{ps}. To prevent the passwords from being seen, +store them in @file{.wgetrc} or @file{.netrc}, and make sure to protect +those files from other users with @code{chmod}. If the passwords are +really important, do not leave them lying in those files either---edit +the files and delete them after Wget has started the download. + +@iftex +@xref{Security Considerations}, for more information about security +issues with Wget. +@end iftex + +@cindex .listing files, removing +@item --no-remove-listing +Don't remove the temporary @file{.listing} files generated by @sc{ftp} +retrievals. Normally, these files contain the raw directory listings +received from @sc{ftp} servers. Not removing them can be useful for +debugging purposes, or when you want to be able to easily check on the +contents of remote server directories (e.g. to verify that a mirror +you're running is complete). + +Note that even though Wget writes to a known filename for this file, +this is not a security hole in the scenario of a user making +@file{.listing} a symbolic link to @file{/etc/passwd} or something and +asking @code{root} to run Wget in his or her directory. Depending on +the options used, either Wget will refuse to write to @file{.listing}, +making the globbing/recursion/time-stamping operation fail, or the +symbolic link will be deleted and replaced with the actual +@file{.listing} file, or the listing will be written to a +@file{.listing.@var{number}} file. + +Even though this situation isn't a problem, @code{root} should +never run Wget in a non-trusted user's directory. A user could do +something as simple as linking @file{index.html} to @file{/etc/passwd} +and asking @code{root} to run Wget with @samp{-N} or @samp{-r} so the file +will be overwritten. + +@cindex globbing, toggle +@item --no-glob +Turn off @sc{ftp} globbing. 
Globbing refers to the use of shell-like +special characters (@dfn{wildcards}), like @samp{*}, @samp{?}, @samp{[} +and @samp{]} to retrieve more than one file from the same directory at +once, like: + +@example +wget ftp://gnjilux.srk.fer.hr/*.msg +@end example + +By default, globbing will be turned on if the @sc{url} contains a +globbing character. This option may be used to turn globbing on or off +permanently. + +You may have to quote the @sc{url} to protect it from being expanded by +your shell. Globbing makes Wget look for a directory listing, which is +system-specific. This is why it currently works only with Unix @sc{ftp} +servers (and the ones emulating Unix @code{ls} output). + +@cindex passive ftp +@item --no-passive-ftp +Disable the use of the @dfn{passive} FTP transfer mode. Passive FTP +mandates that the client connect to the server to establish the data +connection rather than the other way around. + +If the machine is connected to the Internet directly, both passive and +active FTP should work equally well. Behind most firewall and NAT +configurations passive FTP has a better chance of working. However, +in some rare firewall configurations, active FTP actually works when +passive FTP doesn't. If you suspect this to be the case, use this +option, or set @code{passive_ftp=off} in your init file. + +@cindex file permissions +@item --preserve-permissions +Preserve remote file permissions instead of permissions set by umask. + +@cindex symbolic links, retrieving +@item --retr-symlinks +By default, when retrieving @sc{ftp} directories recursively and a symbolic link +is encountered, the symbolic link is traversed and the pointed-to files are +retrieved. Currently, Wget does not traverse symbolic links to directories to +download them recursively, though this feature may be added in the future. + +When @samp{--retr-symlinks=no} is specified, the linked-to file is not +downloaded. Instead, a matching symbolic link is created on the local +file system. 
The pointed-to file will not be retrieved unless this recursive +retrieval would have encountered it separately and downloaded it anyway. This +option poses a security risk where a malicious FTP Server may cause Wget to +write to files outside of the intended directories through a specially crafted +@sc{.listing} file. + +Note that when retrieving a file (not a directory) because it was +specified on the command-line, rather than because it was recursed to, +this option has no effect. Symbolic links are always traversed in this +case. +@end table + +@section FTPS Options + +@table @samp +@item --ftps-implicit +This option tells Wget to use FTPS implicitly. Implicit FTPS consists of initializing +SSL/TLS from the very beginning of the control connection. This option does not send +an @code{AUTH TLS} command: it assumes the server speaks FTPS and directly starts an +SSL/TLS connection. If the attempt is successful, the session continues just like +regular FTPS (@code{PBSZ} and @code{PROT} are sent, etc.). +Implicit FTPS is no longer a requirement for FTPS implementations, and thus +many servers may not support it. If @samp{--ftps-implicit} is passed and no explicit +port number specified, the default port for implicit FTPS, 990, will be used, instead +of the default port for the "normal" (explicit) FTPS which is the same as that of FTP, +21. + +@item --no-ftps-resume-ssl +Do not resume the SSL/TLS session in the data channel. When starting a data connection, +Wget tries to resume the SSL/TLS session previously started in the control connection. +SSL/TLS session resumption avoids performing an entirely new handshake by reusing +the SSL/TLS parameters of a previous session. Typically, the FTPS servers want it that way, +so Wget does this by default. Under rare circumstances however, one might want to +start an entirely new SSL/TLS session in every data connection. +This is what @samp{--no-ftps-resume-ssl} is for. 
+ +@item --ftps-clear-data-connection +All the data connections will be in plain text. Only the control connection will be +under SSL/TLS. Wget will send a @code{PROT C} command to achieve this, which must be +approved by the server. + +@item --ftps-fallback-to-ftp +Fall back to FTP if FTPS is not supported by the target server. For security reasons, +this option is not asserted by default. The default behaviour is to exit with an error. +If a server does not successfully reply to the initial @code{AUTH TLS} command, or in the +case of implicit FTPS, if the initial SSL/TLS connection attempt is rejected, it is +considered that the server does not support FTPS. +@end table + +@node Recursive Retrieval Options, Recursive Accept/Reject Options, FTP Options, Invoking +@section Recursive Retrieval Options + +@table @samp +@item -r +@itemx --recursive +Turn on recursive retrieving. @xref{Recursive Download}, for more +details. The default maximum depth is 5. + +@item -l @var{depth} +@itemx --level=@var{depth} +Set the maximum number of subdirectories that Wget will recurse into to @var{depth}. +In order to prevent one from accidentally downloading very large websites when using recursion, +this is limited to a depth of 5 by default, i.e., it will traverse at most 5 directories deep +starting from the provided URL. +Set @samp{-l 0} or @samp{-l inf} for infinite recursion depth. + +@example +wget -r -l 0 http://@var{site}/1.html +@end example + +Ideally, one would expect this to download just @file{1.html}, +but unfortunately this is not the case, because @samp{-l 0} is equivalent to +@samp{-l inf}---that is, infinite recursion. To download a single @sc{html} +page (or a handful of them), specify them all on the command line and leave out @samp{-r} +and @samp{-l}. To download the essential items to view a single @sc{html} page, see @samp{--page-requisites}. 
+ +@cindex proxy filling +@cindex delete after retrieval +@cindex filling proxy cache +@item --delete-after +This option tells Wget to delete every single file it downloads, +@emph{after} having done so. It is useful for pre-fetching popular +pages through a proxy, e.g.: + +@example +wget -r -nd --delete-after http://whatever.com/~popular/page/ +@end example + +The @samp{-r} option is to retrieve recursively, and @samp{-nd} to not +create directories. + +Note that @samp{--delete-after} deletes files on the local machine. It +does not issue the @samp{DELE} command to remote FTP sites, for +instance. Also note that when @samp{--delete-after} is specified, +@samp{--convert-links} is ignored, so @samp{.orig} files are simply not +created in the first place. + +@cindex conversion of links +@cindex link conversion +@item -k +@itemx --convert-links +After the download is complete, convert the links in the document to +make them suitable for local viewing. This affects not only the visible +hyperlinks, but any part of the document that links to external content, +such as embedded images, links to style sheets, hyperlinks to non-@sc{html} +content, etc. + +Each link will be changed in one of the two ways: + +@itemize @bullet +@item +The links to files that have been downloaded by Wget will be changed to +refer to the file they point to as a relative link. + +Example: if the downloaded file @file{/foo/doc.html} links to +@file{/bar/img.gif}, also downloaded, then the link in @file{doc.html} +will be modified to point to @samp{../bar/img.gif}. This kind of +transformation works reliably for arbitrary combinations of directories. + +@item +The links to files that have not been downloaded by Wget will be changed +to include host name and absolute path of the location they point to. 
+ +Example: if the downloaded file @file{/foo/doc.html} links to +@file{/bar/img.gif} (or to @file{../bar/img.gif}), then the link in +@file{doc.html} will be modified to point to +@file{http://@var{hostname}/bar/img.gif}. +@end itemize + +Because of this, local browsing works reliably: if a linked file was +downloaded, the link will refer to its local name; if it was not +downloaded, the link will refer to its full Internet address rather than +presenting a broken link. The fact that the former links are converted +to relative links ensures that you can move the downloaded hierarchy to +another directory. + +Note that only at the end of the download can Wget know which links have +been downloaded. Because of that, the work done by @samp{-k} will be +performed at the end of all the downloads. + +@item --convert-file-only +This option converts only the filename part of the URLs, leaving the rest +of the URLs untouched. This filename part is sometimes referred to as the +"basename", although we avoid that term here in order not to cause confusion. + +It works particularly well in conjunction with @samp{--adjust-extension}, although +this coupling is not enforced. It proves useful to populate Internet caches +with files downloaded from different hosts. + +Example: if some link points to @file{//foo.com/bar.cgi?xyz} with +@samp{--adjust-extension} asserted and its local destination is intended to be +@file{./foo.com/bar.cgi?xyz.css}, then the link would be converted to +@file{//foo.com/bar.cgi?xyz.css}. Note that only the filename part has been +modified. The rest of the URL has been left untouched, including the net path +(@code{//}) which would otherwise be processed by Wget and converted to the +effective scheme (ie. @code{http://}). + +@cindex backing up converted files +@item -K +@itemx --backup-converted +When converting a file, back up the original version with a @samp{.orig} +suffix. Affects the behavior of @samp{-N} (@pxref{HTTP Time-Stamping +Internals}). 
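The two rewriting rules that @samp{-k} applies, described earlier in this table, can be sketched in Python. This is only an illustrative model of the documented behavior, not Wget's implementation: the function name @code{convert_link}, the set of downloaded paths, and the host name @samp{hostname} are assumptions made for the example.

```python
import posixpath
from urllib.parse import urljoin, urlsplit


def convert_link(doc_url, href, downloaded_paths):
    """Return the link text a converted copy of doc_url would carry (sketch)."""
    # Resolve the link against the document it appears in.
    target = urljoin(doc_url, href)
    doc, tgt = urlsplit(doc_url), urlsplit(target)

    # Rule 1: the target was downloaded from the same host, so refer to it
    # with a path relative to the directory that holds the document.
    if tgt.netloc == doc.netloc and tgt.path in downloaded_paths:
        return posixpath.relpath(tgt.path, start=posixpath.dirname(doc.path))

    # Rule 2: the target was not downloaded, so use its full address.
    return target


# Hypothetical download set: doc.html and img.gif were both fetched.
downloaded = {"/foo/doc.html", "/bar/img.gif"}
print(convert_link("http://hostname/foo/doc.html", "/bar/img.gif", downloaded))
print(convert_link("http://hostname/foo/doc.html", "../bar/other.gif", downloaded))
```

The first call yields a relative link because the target is in the downloaded set; the second yields an absolute URL because it is not. Relative links are what allow the downloaded hierarchy to be moved to another directory.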
+ +@item -m +@itemx --mirror +Turn on options suitable for mirroring. This option turns on recursion +and time-stamping, sets infinite recursion depth and keeps @sc{ftp} +directory listings. It is currently equivalent to +@samp{-r -N -l inf --no-remove-listing}. + +@cindex page requisites +@cindex required images, downloading +@item -p +@itemx --page-requisites +This option causes Wget to download all the files that are necessary to +properly display a given @sc{html} page. This includes such things as +inlined images, sounds, and referenced stylesheets. + +Ordinarily, when downloading a single @sc{html} page, any requisite documents +that may be needed to display it properly are not downloaded. Using +@samp{-r} together with @samp{-l} can help, but since Wget does not +ordinarily distinguish between external and inlined documents, one is +generally left with ``leaf documents'' that are missing their +requisites. + +For instance, say document @file{1.html} contains an @code{<IMG>} tag +referencing @file{1.gif} and an @code{<A>} tag pointing to external +document @file{2.html}. Say that @file{2.html} is similar but that its +image is @file{2.gif} and it links to @file{3.html}. Say this +continues up to some arbitrarily high number. + +If one executes the command: + +@example +wget -r -l 2 http://@var{site}/1.html +@end example + +then @file{1.html}, @file{1.gif}, @file{2.html}, @file{2.gif}, and +@file{3.html} will be downloaded. As you can see, @file{3.html} is +without its requisite @file{3.gif} because Wget is simply counting the +number of hops (up to 2) away from @file{1.html} in order to determine +where to stop the recursion. However, with this command: + +@example +wget -r -l 2 -p http://@var{site}/1.html +@end example + +all the above files @emph{and} @file{3.html}'s requisite @file{3.gif} +will be downloaded. 
Similarly, + +@example +wget -r -l 1 -p http://@var{site}/1.html +@end example + +will cause @file{1.html}, @file{1.gif}, @file{2.html}, and @file{2.gif} +to be downloaded. One might think that: + +@example +wget -r -l 0 -p http://@var{site}/1.html +@end example + +would download just @file{1.html} and @file{1.gif}, but unfortunately +this is not the case, because @samp{-l 0} is equivalent to +@samp{-l inf}---that is, infinite recursion. To download a single @sc{html} +page (or a handful of them, all specified on the command-line or in a +@samp{-i} @sc{url} input file) and its (or their) requisites, simply leave off +@samp{-r} and @samp{-l}: + +@example +wget -p http://@var{site}/1.html +@end example + +Note that Wget will behave as if @samp{-r} had been specified, but only +that single page and its requisites will be downloaded. Links from that +page to external documents will not be followed. Actually, to download +a single page and all its requisites (even if they exist on separate +websites), and make sure the lot displays properly locally, this author +likes to use a few options in addition to @samp{-p}: + +@example +wget -E -H -k -K -p http://@var{site}/@var{document} +@end example + +To finish off this topic, it's worth knowing that Wget's idea of an +external document link is any URL specified in an @code{<A>} tag, an +@code{<AREA>} tag, or a @code{<LINK>} tag other than @code{<LINK +REL="stylesheet">}. + +@cindex @sc{html} comments +@cindex comments, @sc{html} +@item --strict-comments +Turn on strict parsing of @sc{html} comments. The default is to terminate +comments at the first occurrence of @samp{-->}. + +According to specifications, @sc{html} comments are expressed as @sc{sgml} +@dfn{declarations}. Declaration is special markup that begins with +@samp{<!} and ends with @samp{>}, such as @samp{<!DOCTYPE ...>}, that +may contain comments between a pair of @samp{--} delimiters. 
@sc{html} +comments are ``empty declarations'', @sc{sgml} declarations without any +non-comment text. Therefore, @samp{<!--foo-->} is a valid comment, and +so is @samp{<!--one-- --two-->}, but @samp{<!--1--2-->} is not. + +On the other hand, most @sc{html} writers don't perceive comments as anything +other than text delimited with @samp{<!--} and @samp{-->}, which is not +quite the same. For example, something like @samp{<!------------>} +works as a valid comment as long as the number of dashes is a multiple +of four (!). If not, the comment technically lasts until the next +@samp{--}, which may be at the other end of the document. Because of +this, many popular browsers completely ignore the specification and +implement what users have come to expect: comments delimited with +@samp{<!--} and @samp{-->}. + +Until version 1.9, Wget interpreted comments strictly, which resulted in +missing links in many web pages that displayed fine in browsers, but had +the misfortune of containing non-compliant comments. Beginning with +version 1.9, Wget has joined the ranks of clients that implement +``naive'' comments, terminating each comment at the first occurrence of +@samp{-->}. + +If, for whatever reason, you want strict comment parsing, use this +option to turn it on. +@end table + +@node Recursive Accept/Reject Options, Exit Status, Recursive Retrieval Options, Invoking +@section Recursive Accept/Reject Options + +@table @samp +@item -A @var{acclist} --accept @var{acclist} +@itemx -R @var{rejlist} --reject @var{rejlist} +Specify comma-separated lists of file name suffixes or patterns to +accept or reject (@pxref{Types of Files}). Note that if +any of the wildcard characters, @samp{*}, @samp{?}, @samp{[} or +@samp{]}, appear in an element of @var{acclist} or @var{rejlist}, +it will be treated as a pattern, rather than a suffix.
+In this case, you have to enclose the pattern in quotes to prevent +your shell from expanding it, as in @samp{-A "*.mp3"} or @samp{-A '*.mp3'}. + +@item --accept-regex @var{urlregex} +@itemx --reject-regex @var{urlregex} +Specify a regular expression to accept or reject the complete URL. + +@item --regex-type @var{regextype} +Specify the regular expression type. Possible types are @samp{posix} or +@samp{pcre}. Note that to be able to use @samp{pcre} type, Wget has to be +compiled with libpcre support. + +@item -D @var{domain-list} +@itemx --domains=@var{domain-list} +Set domains to be followed. @var{domain-list} is a comma-separated list +of domains. Note that it does @emph{not} turn on @samp{-H}. + +@item --exclude-domains @var{domain-list} +Specify the domains that are @emph{not} to be followed +(@pxref{Spanning Hosts}). + +@cindex follow FTP links +@item --follow-ftp +Follow @sc{ftp} links from @sc{html} documents. Without this option, +Wget will ignore all the @sc{ftp} links. + +@cindex tag-based recursive pruning +@item --follow-tags=@var{list} +Wget has an internal table of @sc{html} tag / attribute pairs that it +considers when looking for linked documents during a recursive +retrieval. If a user wants only a subset of those tags to be +considered, however, he or she should specify such tags in a +comma-separated @var{list} with this option. + +@item --ignore-tags=@var{list} +This is the opposite of the @samp{--follow-tags} option. To skip +certain @sc{html} tags when recursively looking for documents to download, +specify them in a comma-separated @var{list}.
+ +In the past, this option was the best bet for downloading a single page +and its requisites, using a command-line like: + +@example +wget --ignore-tags=a,area -H -k -K -r http://@var{site}/@var{document} +@end example + +However, the author of this option came across a page with tags like +@code{<LINK REL="home" HREF="/">} and came to the realization that +specifying tags to ignore was not enough. One can't just tell Wget to +ignore @code{<LINK>}, because then stylesheets will not be downloaded. +Now the best bet for downloading a single page and its requisites is the +dedicated @samp{--page-requisites} option. + +@cindex case fold +@cindex ignore case +@item --ignore-case +Ignore case when matching files and directories. This influences the +behavior of the @samp{-R}, @samp{-A}, @samp{-I}, and @samp{-X} options, as well as globbing +implemented when downloading from FTP sites. For example, with this +option, @samp{-A "*.txt"} will match @samp{file1.txt}, but also +@samp{file2.TXT}, @samp{file3.TxT}, and so on. +The quotes in the example are to prevent the shell from expanding the +pattern. + +@item -H +@itemx --span-hosts +Enable spanning across hosts when doing recursive retrieving +(@pxref{Spanning Hosts}). + +@item -L +@itemx --relative +Follow relative links only. Useful for retrieving a specific home page +without any distractions, not even those from the same hosts +(@pxref{Relative Links}). + +@item -I @var{list} +@itemx --include-directories=@var{list} +Specify a comma-separated list of directories you wish to follow when +downloading (@pxref{Directory-Based Limits}). Elements +of @var{list} may contain wildcards. + +@item -X @var{list} +@itemx --exclude-directories=@var{list} +Specify a comma-separated list of directories you wish to exclude from +download (@pxref{Directory-Based Limits}). Elements of +@var{list} may contain wildcards. + +@item -np +@itemx --no-parent +Do not ever ascend to the parent directory when retrieving recursively.
+This is a useful option, since it guarantees that only the files +@emph{below} a certain hierarchy will be downloaded. +@xref{Directory-Based Limits}, for more details. +@end table + +@c man end + +@node Exit Status, , Recursive Accept/Reject Options, Invoking +@section Exit Status + +@c man begin EXITSTATUS + +Wget may return one of several error codes if it encounters problems. + + +@table @asis +@item 0 +No problems occurred. + +@item 1 +Generic error code. + +@item 2 +Parse error---for instance, when parsing command-line options, the +@samp{.wgetrc} or @samp{.netrc}... + +@item 3 +File I/O error. + +@item 4 +Network failure. + +@item 5 +SSL verification failure. + +@item 6 +Username/password authentication failure. + +@item 7 +Protocol errors. + +@item 8 +Server issued an error response. +@end table + + +With the exception of 0 and 1, the lower-numbered exit codes take +precedence over higher-numbered ones when multiple types of errors +are encountered. + +In versions of Wget prior to 1.12, Wget's exit status tended to be +unhelpful and inconsistent. Recursive downloads would virtually always +return 0 (success), regardless of any issues encountered, and +non-recursive fetches only returned the status corresponding to the +most recently attempted download. + +@c man end + +@node Recursive Download, Following Links, Invoking, Top +@chapter Recursive Download +@cindex recursion +@cindex retrieving +@cindex recursive download + +GNU Wget is capable of traversing parts of the Web (or a single +@sc{http} or @sc{ftp} server), following links and directory structure. +We refer to this as @dfn{recursive retrieval}, or @dfn{recursion}. + +With @sc{http} @sc{url}s, Wget retrieves and parses the @sc{html} or +@sc{css} from the given @sc{url}, retrieving the files the document +refers to, through markup like @code{href} or @code{src}, or @sc{css} +@sc{uri} values specified using the @samp{url()} functional notation.
+If the freshly downloaded file is also of type @code{text/html}, +@code{application/xhtml+xml}, or @code{text/css}, it will be parsed +and followed further. + +Recursive retrieval of @sc{http} and @sc{html}/@sc{css} content is +@dfn{breadth-first}. This means that Wget first downloads the requested +document, then the documents linked from that document, then the +documents linked by them, and so on. In other words, Wget first +downloads the documents at depth 1, then those at depth 2, and so on +until the specified maximum depth. + +The maximum @dfn{depth} to which the retrieval may descend is specified +with the @samp{-l} option. The default maximum depth is five layers. + +When retrieving an @sc{ftp} @sc{url} recursively, Wget will retrieve all +the data from the given directory tree (including the subdirectories up +to the specified depth) on the remote server, creating its mirror image +locally. @sc{ftp} retrieval is also limited by the @code{depth} +parameter. Unlike @sc{http} recursion, @sc{ftp} recursion is performed +depth-first. + +By default, Wget will create a local directory tree, corresponding to +the one found on the remote server. + +Recursive retrieving can find a number of applications, the most +important of which is mirroring. It is also useful for @sc{www} +presentations, and any other opportunities where slow network +connections should be bypassed by storing the files locally. + +You should be warned that recursive downloads can overload the remote +servers. Because of that, many administrators frown upon them and may +ban access from your site if they detect very fast downloads of big +amounts of content. When downloading from Internet servers, consider +using the @samp{-w} option to introduce a delay between accesses to the +server. The download will take a while longer, but the server +administrator will not be alarmed by your rudeness. + +Of course, recursive download may cause problems on your machine. 
If +left to run unchecked, it can easily fill up the disk. If downloading +from a local network, it can also take bandwidth on the system, as well as +consume memory and CPU. + +Try to specify the criteria that match the kind of download you are +trying to achieve. If you want to download only one page, use +@samp{--page-requisites} without any additional recursion. If you want +to download things under one directory, use @samp{-np} to avoid +downloading things from other directories. If you want to download all +the files from one directory, use @samp{-l 1} to make sure the recursion +depth never exceeds one. @xref{Following Links}, for more information +about this. + +Recursive retrieval should be used with care. Don't say you were not +warned. + +@node Following Links, Time-Stamping, Recursive Download, Top +@chapter Following Links +@cindex links +@cindex following links + +When retrieving recursively, one does not wish to retrieve loads of +unnecessary data. Most of the time users have in mind exactly what +they want to download, and want Wget to follow only specific links. + +For example, if you wish to download the music archive from +@samp{fly.srk.fer.hr}, you will not want to download all the home pages +that happen to be referenced by an obscure part of the archive. + +Wget possesses several mechanisms that allow you to fine-tune which +links it will follow. + +@menu +* Spanning Hosts:: (Un)limiting retrieval based on host name. +* Types of Files:: Getting only certain files. +* Directory-Based Limits:: Getting only certain directories. +* Relative Links:: Follow relative links only. +* FTP Links:: Following FTP links. +@end menu + +@node Spanning Hosts, Types of Files, Following Links, Following Links +@section Spanning Hosts +@cindex spanning hosts +@cindex hosts, spanning + +Wget's recursive retrieval normally refuses to visit hosts different +than the one you specified on the command line.
This is a reasonable +default; without it, every retrieval would have the potential to turn +your Wget into a small version of Google. + +However, visiting different hosts, or @dfn{host spanning}, is sometimes +a useful option. Maybe the images are served from a different server. +Maybe you're mirroring a site that consists of pages interlinked between +three servers. Maybe the server has two equivalent names, and the @sc{html} +pages refer to both interchangeably. + +@table @asis +@item Span to any host---@samp{-H} + +The @samp{-H} option turns on host spanning, thus allowing Wget's +recursive run to visit any host referenced by a link. Unless sufficient +recursion-limiting criteria (such as a depth limit) are applied, these foreign hosts will +typically link to yet more hosts, and so on until Wget ends up sucking +up much more data than you intended. + +@item Limit spanning to certain domains---@samp{-D} + +The @samp{-D} option allows you to specify the domains that will be +followed, thus limiting the recursion only to the hosts that belong to +these domains. Obviously, this makes sense only in conjunction with +@samp{-H}. A typical example would be downloading the contents of +@samp{www.example.com}, but allowing downloads from +@samp{images.example.com}, etc.: + +@example +wget -rH -Dexample.com http://www.example.com/ +@end example + +You can specify more than one address by separating them with a comma, +e.g. @samp{-Ddomain1.com,domain2.com}. + +@item Keep download off certain domains---@samp{--exclude-domains} + +If there are domains you want to exclude specifically, you can do it +with @samp{--exclude-domains}, which accepts the same type of arguments +as @samp{-D}, but will @emph{exclude} all the listed domains.
For +example, if you want to download all the hosts from @samp{foo.edu} +domain, with the exception of @samp{sunsite.foo.edu}, you can do it like +this: + +@example +wget -rH -Dfoo.edu --exclude-domains sunsite.foo.edu \ + http://www.foo.edu/ +@end example + +@end table + +@node Types of Files, Directory-Based Limits, Spanning Hosts, Following Links +@section Types of Files +@cindex types of files + +When downloading material from the web, you will often want to restrict +the retrieval to only certain file types. For example, if you are +interested in downloading @sc{gif}s, you will not be overjoyed to get +loads of PostScript documents, and vice versa. + +Wget offers two options to deal with this problem. Each option +description lists a short name, a long name, and the equivalent command +in @file{.wgetrc}. + +@cindex accept wildcards +@cindex accept suffixes +@cindex wildcards, accept +@cindex suffixes, accept +@table @samp +@item -A @var{acclist} +@itemx --accept @var{acclist} +@itemx accept = @var{acclist} +@itemx --accept-regex @var{urlregex} +@itemx accept-regex = @var{urlregex} +The argument to @samp{--accept} option is a list of file suffixes or +patterns that Wget will download during recursive retrieval. A suffix +is the ending part of a file, and consists of ``normal'' letters, +e.g. @samp{gif} or @samp{.jpg}. A matching pattern contains shell-like +wildcards, e.g. @samp{books*} or @samp{zelazny*196[0-9]*}. + +So, specifying @samp{wget -A gif,jpg} will make Wget download only the +files ending with @samp{gif} or @samp{jpg}, i.e. @sc{gif}s and +@sc{jpeg}s. On the other hand, @samp{wget -A "zelazny*196[0-9]*"} will +download only files beginning with @samp{zelazny} and containing numbers +from 1960 to 1969 anywhere within. Look up the manual of your shell for +a description of how pattern matching works. + +Of course, any number of suffixes and patterns can be combined into a +comma-separated list, and given as an argument to @samp{-A}. 
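The distinction between suffixes and patterns in an accept list can be sketched in Python. This is a rough model of the rule as this manual states it, not Wget's code: the helper names are invented for the example, and Python's @code{fnmatch} module stands in for whatever matcher Wget uses, so edge cases may differ.

```python
from fnmatch import fnmatchcase

# Characters that turn a list element into a pattern instead of a suffix.
WILDCARDS = "*?[]"


def element_matches(filename, element):
    """Match one accept-list element against a file name (sketch)."""
    if any(ch in element for ch in WILDCARDS):
        # Pattern: shell-like wildcard match against the whole name.
        return fnmatchcase(filename, element)
    # Suffix: the file name merely has to end with the element.
    return filename.endswith(element)


def accepted(filename, acclist):
    """True if any element of the comma-separated list matches."""
    return any(element_matches(filename, e) for e in acclist.split(","))


print(accepted("photo.gif", "gif,jpg"))
print(accepted("paper.ps", "gif,jpg"))
print(accepted("zelazny-1965-lord.txt", "zelazny*196[0-9]*"))
```

A reject list would use the same matching and simply invert the decision.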
+ +The argument to the @samp{--accept-regex} option is a regular expression which +is matched against the complete URL. + +@cindex reject wildcards +@cindex reject suffixes +@cindex wildcards, reject +@cindex suffixes, reject +@item -R @var{rejlist} +@itemx --reject @var{rejlist} +@itemx reject = @var{rejlist} +@itemx --reject-regex @var{urlregex} +@itemx reject-regex = @var{urlregex} +The @samp{--reject} option works the same way as @samp{--accept}, only +its logic is the reverse; Wget will download all files @emph{except} the +ones matching the suffixes (or patterns) in the list. + +So, if you want to download a whole page except for the cumbersome +@sc{mpeg}s and @sc{.au} files, you can use @samp{wget -R mpg,mpeg,au}. +Analogously, to download all files except the ones beginning with +@samp{bjork}, use @samp{wget -R "bjork*"}. The quotes are to prevent +expansion by the shell. +@end table + +The argument to the @samp{--reject-regex} option is likewise a regular expression which +is matched against the complete URL. + +@noindent +The @samp{-A} and @samp{-R} options may be combined to achieve even +better fine-tuning of which files to retrieve. E.g. @samp{wget -A +"*zelazny*" -R .ps} will download all the files having @samp{zelazny} as +a part of their name, but @emph{not} the PostScript files. + +Note that these two options do not affect the downloading of @sc{html} +files (as determined by a @samp{.htm} or @samp{.html} filename +suffix). This behavior may not be desirable for all users, and may be +changed in future versions of Wget. + +Note, too, that query strings (strings at the end of a URL beginning +with a question mark, @samp{?}) are not included as part of the +filename for accept/reject rules, even though these will actually +contribute to the name chosen for the local file. It is expected that +a future version of Wget will provide an option to allow matching +against query strings.
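The effect of leaving the query string out of the matched name can be illustrated with a short Python sketch. The helper @code{url_filename} and the example URL are assumptions made for illustration, and the local name @samp{index.php?page=2} is only a plausible example of a name that includes the query string.

```python
import posixpath
from urllib.parse import urlsplit


def url_filename(url):
    """Filename portion of a URL, with the query string left out (sketch)."""
    return posixpath.basename(urlsplit(url).path)


url = "http://example.com/scripts/index.php?page=2"

# Name used for accept/reject matching on the URL: no query string.
print(url_filename(url))
print(url_filename(url).endswith(".php"))

# A local name that keeps the query string no longer ends in ".php".
print("index.php?page=2".endswith(".php"))
```

The same suffix rule therefore gives different answers for the URL and for a local file whose name keeps the query string.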
+ +Finally, it's worth noting that the accept/reject lists are matched +@emph{twice} against downloaded files: once against the URL's filename +portion, to determine if the file should be downloaded in the first +place; then, after it has been accepted and successfully downloaded, +the local file's name is also checked against the accept/reject lists +to see if it should be removed. The rationale was that, since +@samp{.htm} and @samp{.html} files are always downloaded regardless of +accept/reject rules, they should be removed @emph{after} being +downloaded and scanned for links, if they did match the accept/reject +lists. However, this can lead to unexpected results, since the local +filenames can differ from the original URL filenames in the following +ways, all of which can change whether an accept/reject rule matches: + +@itemize @bullet +@item +If the local file already exists and @samp{--no-directories} was +specified, a numeric suffix will be appended to the original name. +@item +If @samp{--adjust-extension} was specified, the local filename might have +@samp{.html} appended to it. If Wget is invoked with @samp{-E -A.php}, +a filename such as @samp{index.php} will be accepted, but upon +download will be named @samp{index.php.html}, which no longer matches, +and so the file will be deleted. +@item +Query strings do not contribute to URL matching, but are included in +local filenames, and so @emph{do} contribute to filename matching. +@end itemize + +@noindent +This behavior, too, is considered less than desirable, and may change +in a future version of Wget. + +@node Directory-Based Limits, Relative Links, Types of Files, Following Links +@section Directory-Based Limits +@cindex directories +@cindex directory limits + +Regardless of other link-following facilities, it is often useful to +restrict which files are retrieved based on the directories +those files are placed in.
There can be many reasons for this---the +home pages may be organized in a reasonable directory structure; or some +directories may contain useless information, e.g. @file{/cgi-bin} or +@file{/dev} directories. + +Wget offers three different options to deal with this requirement. Each +option description lists a short name, a long name, and the equivalent +command in @file{.wgetrc}. + +@cindex directories, include +@cindex include directories +@cindex accept directories +@table @samp +@item -I @var{list} +@itemx --include @var{list} +@itemx include_directories = @var{list} +The @samp{-I} option accepts a comma-separated list of directories included +in the retrieval. Any other directories will simply be ignored. The +directories are absolute paths. + +So, if you wish to download from @samp{http://host/people/bozo/} +following only links to bozo's colleagues in the @file{/people} +directory and the bogus scripts in @file{/cgi-bin}, you can specify: + +@example +wget -I /people,/cgi-bin http://host/people/bozo/ +@end example + +@cindex directories, exclude +@cindex exclude directories +@cindex reject directories +@item -X @var{list} +@itemx --exclude @var{list} +@itemx exclude_directories = @var{list} +The @samp{-X} option is exactly the reverse of @samp{-I}---this is a list of +directories @emph{excluded} from the download. E.g. if you do not want +Wget to download things from the @file{/cgi-bin} directory, specify @samp{-X +/cgi-bin} on the command line. + +As with @samp{-A}/@samp{-R}, these two options can be combined +to fine-tune which subdirectories are downloaded. E.g. if you +want to load all the files from the @file{/pub} hierarchy except for +@file{/pub/worthless}, specify @samp{-I/pub -X/pub/worthless}. + +@cindex no parent +@item -np +@itemx --no-parent +@itemx no_parent = on +The simplest, and often very useful, way of limiting directories is +disallowing retrieval of the links that refer to the hierarchy +@dfn{above} the beginning directory, i.e.
disallowing ascent to the +parent directory/directories. + +The @samp{--no-parent} option (short @samp{-np}) is useful in this case. +Using it guarantees that you will never leave the existing hierarchy. +Suppose you invoke Wget with: + +@example +wget -r --no-parent http://somehost/~luzer/my-archive/ +@end example + +You may then rest assured that none of the references to +@file{/~his-girls-homepage/} or @file{/~luzer/all-my-mpegs/} will be +followed. Only the archive you are interested in will be downloaded. +Essentially, @samp{--no-parent} is similar to +@samp{-I/~luzer/my-archive}, only it handles redirections in a more +intelligent fashion. + +@strong{Note} that, for HTTP (and HTTPS), the trailing slash is very +important to @samp{--no-parent}. HTTP has no concept of a ``directory''---Wget +relies on you to indicate what's a directory and what isn't. In +@samp{http://foo/bar/}, Wget will consider @samp{bar} to be a +directory, while in @samp{http://foo/bar} (no trailing slash), +@samp{bar} will be considered a filename (so @samp{--no-parent} would be +meaningless, as its parent is @samp{/}). +@end table + +@node Relative Links, FTP Links, Directory-Based Limits, Following Links +@section Relative Links +@cindex relative links + +When @samp{-L} is turned on, only the relative links are ever followed. +Relative links are defined here as those that do not refer to the web +server root. For example, these links are relative: + +@example +<a href="foo.gif"> +<a href="foo/bar.gif"> +<a href="../foo/bar.gif"> +@end example + +These links are not relative: + +@example +<a href="/foo.gif"> +<a href="/foo/bar.gif"> +<a href="http://www.example.com/foo/bar.gif"> +@end example + +Using this option guarantees that recursive retrieval will not span +hosts, even without @samp{-H}. In simple cases it also allows downloads +to ``just work'' without having to convert links. + +This option is probably not very useful and might be removed in a future +release.
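The definition of a relative link used in this section can be expressed as a small Python predicate. It is a sketch of the stated definition only, checked against the six example links above; the function name @code{is_relative_link} is an assumption, and Wget's own test may treat unusual inputs differently.

```python
from urllib.parse import urlsplit


def is_relative_link(href):
    """True if href names neither a host nor the web server root (sketch)."""
    parts = urlsplit(href)
    # A scheme or a host makes the link absolute; a leading slash makes it
    # refer to the server root.
    return not parts.scheme and not parts.netloc and not parts.path.startswith("/")


for href in ("foo.gif", "foo/bar.gif", "../foo/bar.gif",
             "/foo.gif", "/foo/bar.gif",
             "http://www.example.com/foo/bar.gif"):
    print(href, is_relative_link(href))
```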
+ +@node FTP Links, , Relative Links, Following Links +@section Following FTP Links +@cindex following ftp links + +The rules for @sc{ftp} are somewhat specific, as it is necessary for +them to be. @sc{ftp} links in @sc{html} documents are often included +for purposes of reference, and it is often inconvenient to download them +by default. + +To have @sc{ftp} links followed from @sc{html} documents, you need to +specify the @samp{--follow-ftp} option. Once that is done, @sc{ftp} +links will span hosts regardless of the @samp{-H} setting. This is logical, +as @sc{ftp} links rarely point to the same host where the @sc{http} +server resides. For similar reasons, the @samp{-L} option has no +effect on such downloads. On the other hand, domain acceptance +(@samp{-D}) and suffix rules (@samp{-A} and @samp{-R}) apply normally. + +Also note that followed links to @sc{ftp} directories will not be +retrieved recursively further. + +@node Time-Stamping, Startup File, Following Links, Top +@chapter Time-Stamping +@cindex time-stamping +@cindex timestamping +@cindex updating the archives +@cindex incremental updating + +One of the most important aspects of mirroring information from the +Internet is updating your archives. + +Downloading the whole archive again and again, just to replace a few +changed files, is expensive, both in terms of wasted bandwidth and money, +and the time to do the update. This is why all the mirroring tools +offer the option of incremental updating. + +Such an updating mechanism means that the remote server is scanned in +search of @dfn{new} files. Only those new files will be downloaded in +the place of the old ones. + +A file is considered new if one of these two conditions is met: + +@enumerate +@item +A file of that name does not already exist locally. + +@item +A file of that name does exist, but the remote file was modified more +recently than the local file.
+@end enumerate + +To implement this, the program needs to be aware of the time of last +modification of both local and remote files. We call this information the +@dfn{time-stamp} of a file. + +The time-stamping in GNU Wget is turned on using @samp{--timestamping} +(@samp{-N}) option, or through @code{timestamping = on} directive in +@file{.wgetrc}. With this option, for each file it intends to download, +Wget will check whether a local file of the same name exists. If it +does, and the remote file is not newer, Wget will not download it. + +If the local file does not exist, or the sizes of the files do not +match, Wget will download the remote file no matter what the time-stamps +say. + +@menu +* Time-Stamping Usage:: +* HTTP Time-Stamping Internals:: +* FTP Time-Stamping Internals:: +@end menu + +@node Time-Stamping Usage, HTTP Time-Stamping Internals, Time-Stamping, Time-Stamping +@section Time-Stamping Usage +@cindex time-stamping usage +@cindex usage, time-stamping + +The usage of time-stamping is simple. Say you would like to download a +file so that it keeps its date of modification. + +@example +wget -S http://www.gnu.ai.mit.edu/ +@end example + +A simple @code{ls -l} shows that the timestamp on the local file equals +the state of the @code{Last-Modified} header, as returned by the server. +As you can see, the time-stamping info is preserved locally, even +without @samp{-N} (at least for @sc{http}). + +Several days later, you would like Wget to check if the remote file has +changed, and download it if it has. + +@example +wget -N http://www.gnu.ai.mit.edu/ +@end example + +Wget will ask the server for the last-modified date. If the local file +has the same timestamp as the server, or a newer one, the remote file +will not be re-fetched. However, if the remote file is more recent, +Wget will proceed to fetch it. + +The same goes for @sc{ftp}. 
For example: + +@example +wget "ftp://ftp.ifi.uio.no/pub/emacs/gnus/*" +@end example + +(The quotes around that URL are to prevent the shell from trying to +interpret the @samp{*}.) + +After download, a local directory listing will show that the timestamps +match those on the remote server. Reissuing the command with @samp{-N} +will make Wget re-fetch @emph{only} the files that have been modified +since the last download. + +If you wished to mirror the GNU archive every week, you would use a +command like the following, weekly: + +@example +wget --timestamping -r ftp://ftp.gnu.org/pub/gnu/ +@end example + +Note that time-stamping will only work for files for which the server +gives a timestamp. For @sc{http}, this depends on getting a +@code{Last-Modified} header. For @sc{ftp}, this depends on getting a +directory listing with dates in a format that Wget can parse +(@pxref{FTP Time-Stamping Internals}). + +@node HTTP Time-Stamping Internals, FTP Time-Stamping Internals, Time-Stamping Usage, Time-Stamping +@section HTTP Time-Stamping Internals +@cindex http time-stamping + +Time-stamping in @sc{http} is implemented by checking the +@code{Last-Modified} header. If you wish to retrieve the file +@file{foo.html} through @sc{http}, Wget will check whether +@file{foo.html} exists locally. If it doesn't, @file{foo.html} will be +retrieved unconditionally. + +If the file does exist locally, Wget will first check its local +time-stamp (similar to the way @code{ls -l} checks it), and then send a +@code{HEAD} request to the remote server, requesting information on +the remote file. + +The @code{Last-Modified} header is examined to find which file was +modified more recently (which makes it ``newer''). 
If the remote file +is newer, it will be downloaded; if it is older, Wget will give +up.@footnote{As an additional check, Wget will look at the +@code{Content-Length} header, and compare the sizes; if they are not the +same, the remote file will be downloaded no matter what the time-stamp +says.} + +When @samp{--backup-converted} (@samp{-K}) is specified in conjunction +with @samp{-N}, server file @samp{@var{X}} is compared to local file +@samp{@var{X}.orig}, if extant, rather than being compared to local file +@samp{@var{X}}, which will always differ if it's been converted by +@samp{--convert-links} (@samp{-k}). + +Arguably, @sc{http} time-stamping should be implemented using the +@code{If-Modified-Since} request. + +@node FTP Time-Stamping Internals, , HTTP Time-Stamping Internals, Time-Stamping +@section FTP Time-Stamping Internals +@cindex ftp time-stamping + +In theory, @sc{ftp} time-stamping works much the same as @sc{http}, only +@sc{ftp} has no headers---time-stamps must be ferreted out of directory +listings. + +If an @sc{ftp} download is recursive or uses globbing, Wget will use the +@sc{ftp} @code{LIST} command to get a file listing for the directory +containing the desired file(s). It will try to analyze the listing, +treating it like Unix @code{ls -l} output, extracting the time-stamps. +The rest is exactly the same as for @sc{http}. Note that when +retrieving individual files from an @sc{ftp} server without using +globbing or recursion, listing files will not be downloaded (and thus +files will not be time-stamped) unless @samp{-N} is specified. + +The assumption that every directory listing is a Unix-style listing may +sound extremely constraining, but in practice it is not, as many +non-Unix @sc{ftp} servers use the Unixoid listing format because most +(all?) of the clients understand it. Bear in mind that @sc{rfc959} +defines no standard way to get a file list, let alone the time-stamps. +We can only hope that a future standard will define this. 
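The listing-based approach described above can be illustrated with a small shell sketch. This is not Wget's parser, only a minimal illustration under stated assumptions: the input is a Unix-style @code{ls -l} line with the date in fields six to eight and the name in field nine. The helper name @code{listing_stamp} and the sample listing line are invented for this example.

```shell
#!/bin/sh
# Illustrative only: Wget's real listing parser is more thorough.
# listing_stamp is a hypothetical helper that reads Unix-style "ls -l"
# lines on standard input and prints "MONTH DAY TIME-OR-YEAR NAME".
listing_stamp() {
    # Fields 6-8 hold the time-stamp, field 9 the file name. Lines with
    # fewer than nine fields (such as "total 12") are skipped, and names
    # containing spaces are not handled by this sketch.
    awk 'NF >= 9 { print $6, $7, $8, $9 }'
}

# A made-up line of the kind an FTP LIST command commonly returns.
printf '%s\n' '-rw-r--r--   1 ftp  ftp   1048576 Jan 12  2023 wget.tar.gz' |
    listing_stamp
```

For the sample line, the helper prints the three date fields followed by the file name. A listing in any other layout yields nothing useful, which is why time-stamping over @sc{ftp} depends on the server producing a format that can be parsed.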
+ +Another non-standard solution includes the use of the @code{MDTM} command +that is supported by some @sc{ftp} servers (including the popular +@code{wu-ftpd}), which returns the exact time of the specified file. +Wget may support this command in the future. + +@node Startup File, Examples, Time-Stamping, Top +@chapter Startup File +@cindex startup file +@cindex wgetrc +@cindex .wgetrc +@cindex startup +@cindex .netrc + +Once you know how to change default settings of Wget through command +line arguments, you may wish to make some of those settings permanent. +You can do that in a convenient way by creating the Wget startup +file---@file{.wgetrc}. + +Although @file{.wgetrc} is the ``main'' initialization file, it is +convenient to have a special facility for storing passwords. Thus Wget +reads and interprets the contents of @file{$HOME/.netrc}, if it finds +it. You can find the @file{.netrc} format in your system manuals. + +Wget reads @file{.wgetrc} upon startup, recognizing a limited set of +commands. + +@menu +* Wgetrc Location:: Location of various wgetrc files. +* Wgetrc Syntax:: Syntax of wgetrc. +* Wgetrc Commands:: List of available commands. +* Sample Wgetrc:: A wgetrc example. +@end menu + +@node Wgetrc Location, Wgetrc Syntax, Startup File, Startup File +@section Wgetrc Location +@cindex wgetrc location +@cindex location of wgetrc + +When initializing, Wget will look for a @dfn{global} startup file, +@file{/usr/local/etc/wgetrc} by default (or some prefix other than +@file{/usr/local}, if Wget was not installed there) and read commands +from there, if it exists. + +Then it will look for the user's file. If the environment variable +@code{WGETRC} is set, Wget will try to load that file. Failing that, no +further attempts will be made. + +If @code{WGETRC} is not set, Wget will try to load @file{$HOME/.wgetrc}. 
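The lookup order for the user's file can be summarized in a short shell sketch. It mirrors the prose above and is not taken from Wget's source. The function name @code{user_wgetrc} is hypothetical, and treating an empty @code{WGETRC} as unset is an assumption of the sketch.

```shell
#!/bin/sh
# Sketch of the lookup order described above; not Wget's actual code.
# user_wgetrc is a hypothetical helper that prints the user startup file
# that would be chosen, or nothing if none applies.
user_wgetrc() {
    if [ -n "${WGETRC:-}" ]; then
        # WGETRC is set (non-empty is this sketch's assumption): only that
        # file is considered, and there is no fallback to $HOME/.wgetrc.
        if [ -r "$WGETRC" ]; then
            printf '%s\n' "$WGETRC"
        fi
    elif [ -r "${HOME:-}/.wgetrc" ]; then
        # WGETRC is not set: fall back to the per-user default.
        printf '%s\n' "${HOME:-}/.wgetrc"
    fi
    return 0
}
```

Calling @code{user_wgetrc} with @code{WGETRC} pointing at a missing file prints nothing, even when @file{$HOME/.wgetrc} exists, which matches the ``no further attempts'' rule.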
+ +The fact that the user's settings are loaded after the system-wide ones +means that in case of a collision the user's wgetrc @emph{overrides} the +system-wide wgetrc (in @file{/usr/local/etc/wgetrc} by default). +Fascist admins, away! + +@node Wgetrc Syntax, Wgetrc Commands, Wgetrc Location, Startup File +@section Wgetrc Syntax +@cindex wgetrc syntax +@cindex syntax of wgetrc + +The syntax of a wgetrc command is simple: + +@example +variable = value +@end example + +The @dfn{variable} will also be called @dfn{command}. Valid +@dfn{values} are different for different commands. + +The commands are case-, underscore- and minus-insensitive. Thus +@samp{DIr__PrefiX}, @samp{DIr-PrefiX} and @samp{dirprefix} are the same. +Empty lines, lines beginning with @samp{#} and lines containing white-space +only are discarded. + +Commands that expect a comma-separated list will clear the list on an +empty command. So, if you wish to reset the rejection list specified in +global @file{wgetrc}, you can do it with: + +@example +reject = +@end example + +@node Wgetrc Commands, Sample Wgetrc, Wgetrc Syntax, Startup File +@section Wgetrc Commands +@cindex wgetrc commands + +The complete set of commands is listed below. Legal values are listed +after the @samp{=}. Simple Boolean values can be set or unset using +@samp{on} and @samp{off} or @samp{1} and @samp{0}. + +Some commands take pseudo-arbitrary values. @var{address} values can be +hostnames or dotted-quad IP addresses. @var{n} can be any positive +integer, or @samp{inf} for infinity, where appropriate. @var{string} +values can be any non-empty string. + +Most of these commands have direct command-line equivalents. Also, any +wgetrc command can be specified on the command line using the +@samp{--execute} switch (@pxref{Basic Startup Options}). + +@table @asis +@item accept/reject = @var{string} +Same as @samp{-A}/@samp{-R} (@pxref{Types of Files}). + +@item add_hostdir = on/off +Enable/disable host-prefixed file names. 
@samp{-nH} disables it. + +@item ask_password = on/off +Prompt for a password for each connection established. Cannot be specified +when @samp{--password} is being used, because they are mutually +exclusive. Equivalent to @samp{--ask-password}. + +@item auth_no_challenge = on/off +If this option is given, Wget will send Basic HTTP authentication +information (plaintext username and password) for all requests. See +@samp{--auth-no-challenge}. + +@item background = on/off +Enable/disable going to background---the same as @samp{-b} (which +enables it). + +@item backup_converted = on/off +Enable/disable saving pre-converted files with the suffix +@samp{.orig}---the same as @samp{-K} (which enables it). + +@item backups = @var{number} +Use up to @var{number} backups for a file. Backups are rotated by +adding an incremental counter that starts at @samp{1}. The default is +@samp{0}. + +@item base = @var{string} +Consider relative @sc{url}s in input files (specified via the +@samp{input} command or the @samp{--input-file}/@samp{-i} option, +together with @samp{force_html} or @samp{--force-html}) +as being relative to @var{string}---the same as @samp{--base=@var{string}}. + +@item bind_address = @var{address} +Bind to @var{address}, like the @samp{--bind-address=@var{address}}. + +@item ca_certificate = @var{file} +Set the certificate authority bundle file to @var{file}. The same +as @samp{--ca-certificate=@var{file}}. + +@item ca_directory = @var{directory} +Set the directory used for certificate authorities. The same as +@samp{--ca-directory=@var{directory}}. + +@item cache = on/off +When set to off, disallow server-caching. See the @samp{--no-cache} +option. + +@item certificate = @var{file} +Set the client certificate file name to @var{file}. The same as +@samp{--certificate=@var{file}}. + +@item certificate_type = @var{string} +Specify the type of the client certificate, legal values being +@samp{PEM} (the default) and @samp{DER} (aka ASN1). 
The same as +@samp{--certificate-type=@var{string}}. + +@item check_certificate = on/off +If this is set to off, the server certificate is not checked against +the specified certificate authorities. The default is ``on''. The same as +@samp{--check-certificate}. + +@item connect_timeout = @var{n} +Set the connect timeout---the same as @samp{--connect-timeout}. + +@item content_disposition = on/off +Turn on recognition of the (non-standard) @samp{Content-Disposition} +HTTP header---if set to @samp{on}, the same as @samp{--content-disposition}. + +@item trust_server_names = on/off +If set to on, construct the local file name from redirection URLs +rather than original URLs. + +@item continue = on/off +If set to on, force continuation of preexistent partially retrieved +files. See @samp{-c} before setting it. + +@item convert_links = on/off +Convert non-relative links locally. The same as @samp{-k}. + +@item cookies = on/off +When set to off, disallow cookies. See the @samp{--cookies} option. + +@item cut_dirs = @var{n} +Ignore @var{n} remote directory components. Equivalent to +@samp{--cut-dirs=@var{n}}. + +@item debug = on/off +Debug mode, same as @samp{-d}. + +@item default_page = @var{string} +Default page name---the same as @samp{--default-page=@var{string}}. + +@item delete_after = on/off +Delete after download---the same as @samp{--delete-after}. + +@item dir_prefix = @var{string} +Top of directory tree---the same as @samp{-P @var{string}}. + +@item dirstruct = on/off +Turning dirstruct on or off---the same as @samp{-x} or @samp{-nd}, +respectively. + +@item dns_cache = on/off +Turn DNS caching on/off. Since DNS caching is on by default, this +option is normally used to turn it off and is equivalent to +@samp{--no-dns-cache}. + +@item dns_timeout = @var{n} +Set the DNS timeout---the same as @samp{--dns-timeout}. + +@item domains = @var{string} +Same as @samp{-D} (@pxref{Spanning Hosts}). 
+ +@item dot_bytes = @var{n} +Specify the number of bytes ``contained'' in a dot, as seen throughout +the retrieval (1024 by default). You can postfix the value with +@samp{k} or @samp{m}, representing kilobytes and megabytes, +respectively. With dot settings you can tailor the dot retrieval to +suit your needs, or you can use the predefined @dfn{styles} +(@pxref{Download Options}). + +@item dot_spacing = @var{n} +Specify the number of dots in a single cluster (10 by default). + +@item dots_in_line = @var{n} +Specify the number of dots that will be printed in each line throughout +the retrieval (50 by default). + +@item egd_file = @var{file} +Use @var{file} as the EGD socket file name. The same as +@samp{--egd-file=@var{file}}. + +@item exclude_directories = @var{string} +Specify a comma-separated list of directories you wish to exclude from +download---the same as @samp{-X @var{string}} (@pxref{Directory-Based +Limits}). + +@item exclude_domains = @var{string} +Same as @samp{--exclude-domains=@var{string}} (@pxref{Spanning +Hosts}). + +@item follow_ftp = on/off +Follow @sc{ftp} links from @sc{html} documents---the same as +@samp{--follow-ftp}. + +@item follow_tags = @var{string} +Only follow certain @sc{html} tags when doing a recursive retrieval, +just like @samp{--follow-tags=@var{string}}. + +@item force_html = on/off +If set to on, force the input filename to be regarded as an @sc{html} +document---the same as @samp{-F}. + +@item ftp_password = @var{string} +Set your @sc{ftp} password to @var{string}. Without this setting, the +password defaults to @samp{-wget@@}, which is a useful default for +anonymous @sc{ftp} access. + +This command used to be named @code{passwd} prior to Wget 1.10. + +@item ftp_proxy = @var{string} +Use @var{string} as @sc{ftp} proxy, instead of the one specified in +environment. + +@item ftp_user = @var{string} +Set @sc{ftp} user to @var{string}. + +This command used to be named @code{login} prior to Wget 1.10. 
+ +@item glob = on/off +Turn globbing on/off---the same as @samp{--glob} and @samp{--no-glob}. + +@item header = @var{string} +Define a header for HTTP downloads, like using +@samp{--header=@var{string}}. + +@item compression = @var{string} +Choose the compression type to be used. Legal values are @samp{auto} +(the default), @samp{gzip}, and @samp{none}. The same as +@samp{--compression=@var{string}}. + +@item adjust_extension = on/off +Add a @samp{.html} extension to @samp{text/html} or +@samp{application/xhtml+xml} files that lack one, a @samp{.css} +extension to @samp{text/css} files that lack one, and a @samp{.br}, +@samp{.Z}, @samp{.zlib} or @samp{.gz} to compressed files like +@samp{-E}. Previously named @samp{html_extension} (still acceptable, +but deprecated). + +@item http_keep_alive = on/off +Turn the keep-alive feature on or off (defaults to on). Turning it +off is equivalent to @samp{--no-http-keep-alive}. + +@item http_password = @var{string} +Set @sc{http} password, equivalent to +@samp{--http-password=@var{string}}. + +@item http_proxy = @var{string} +Use @var{string} as @sc{http} proxy, instead of the one specified in +environment. + +@item http_user = @var{string} +Set @sc{http} user to @var{string}, equivalent to +@samp{--http-user=@var{string}}. + +@item https_only = on/off +When in recursive mode, only HTTPS links are followed (defaults to off). + +@item https_proxy = @var{string} +Use @var{string} as @sc{https} proxy, instead of the one specified in +environment. + +@item ignore_case = on/off +When set to on, match files and directories case insensitively; the +same as @samp{--ignore-case}. + +@item ignore_length = on/off +When set to on, ignore @code{Content-Length} header; the same as +@samp{--ignore-length}. + +@item ignore_tags = @var{string} +Ignore certain @sc{html} tags when doing a recursive retrieval, like +@samp{--ignore-tags=@var{string}}. 
+ +@item include_directories = @var{string} +Specify a comma-separated list of directories you wish to follow when +downloading---the same as @samp{-I @var{string}}. + +@item iri = on/off +When set to on, enable internationalized URI (IRI) support; the same as +@samp{--iri}. + +@item inet4_only = on/off +Force connecting to IPv4 addresses, off by default. You can put this +in the global init file to disable Wget's attempts to resolve and +connect to IPv6 hosts. Available only if Wget was compiled with IPv6 +support. The same as @samp{--inet4-only} or @samp{-4}. + +@item inet6_only = on/off +Force connecting to IPv6 addresses, off by default. Available only if +Wget was compiled with IPv6 support. The same as @samp{--inet6-only} +or @samp{-6}. + +@item input = @var{file} +Read the @sc{url}s from @var{file}, like @samp{-i @var{file}}. + +@item keep_session_cookies = on/off +When specified, causes @samp{save_cookies = on} to also save session +cookies. See @samp{--keep-session-cookies}. + +@item limit_rate = @var{rate} +Limit the download speed to no more than @var{rate} bytes per second. +The same as @samp{--limit-rate=@var{rate}}. + +@item load_cookies = @var{file} +Load cookies from @var{file}. See @samp{--load-cookies @var{file}}. + +@item local_encoding = @var{encoding} +Force Wget to use @var{encoding} as the default system encoding. See +@samp{--local-encoding}. + +@item logfile = @var{file} +Set logfile to @var{file}, the same as @samp{-o @var{file}}. + +@item max_redirect = @var{number} +Specifies the maximum number of redirections to follow for a resource. +See @samp{--max-redirect=@var{number}}. + +@item mirror = on/off +Turn mirroring on/off. The same as @samp{-m}. + +@item netrc = on/off +Turn reading netrc on or off. + +@item no_clobber = on/off +Same as @samp{-nc}. + +@item no_parent = on/off +Disallow retrieving outside the directory hierarchy, like +@samp{--no-parent} (@pxref{Directory-Based Limits}). 
+ +@item no_proxy = @var{string} +Use @var{string} as the comma-separated list of domains to avoid in +proxy loading, instead of the one specified in environment. + +@item output_document = @var{file} +Set the output filename---the same as @samp{-O @var{file}}. + +@item page_requisites = on/off +Download all ancillary documents necessary for a single @sc{html} page to +display properly---the same as @samp{-p}. + +@item passive_ftp = on/off +Change setting of passive @sc{ftp}, equivalent to the +@samp{--passive-ftp} option. + +@item password = @var{string} +Specify password @var{string} for both @sc{ftp} and @sc{http} file retrieval. +This command can be overridden using the @samp{ftp_password} and +@samp{http_password} commands for @sc{ftp} and @sc{http} respectively. + +@item post_data = @var{string} +Use POST as the method for all HTTP requests and send @var{string} in +the request body. The same as @samp{--post-data=@var{string}}. + +@item post_file = @var{file} +Use POST as the method for all HTTP requests and send the contents of +@var{file} in the request body. The same as +@samp{--post-file=@var{file}}. + +@item prefer_family = none/IPv4/IPv6 +When given a choice of several addresses, connect to the addresses +with specified address family first. The address order returned by +DNS is used without change by default. The same as @samp{--prefer-family}, +which see for a detailed discussion of why this is useful. + +@item private_key = @var{file} +Set the private key file to @var{file}. The same as +@samp{--private-key=@var{file}}. + +@item private_key_type = @var{string} +Specify the type of the private key, legal values being @samp{PEM} +(the default) and @samp{DER} (aka ASN1). The same as +@samp{--private-key-type=@var{string}}. + +@item progress = @var{string} +Set the type of the progress indicator. Legal types are @samp{dot} +and @samp{bar}. Equivalent to @samp{--progress=@var{string}}. 
+ +@item protocol_directories = on/off +When set, use the protocol name as a directory component of local file +names. The same as @samp{--protocol-directories}. + +@item proxy_password = @var{string} +Set proxy authentication password to @var{string}, like +@samp{--proxy-password=@var{string}}. + +@item proxy_user = @var{string} +Set proxy authentication user name to @var{string}, like +@samp{--proxy-user=@var{string}}. + +@item quiet = on/off +Quiet mode---the same as @samp{-q}. + +@item quota = @var{quota} +Specify the download quota, which is useful to put in the global +@file{wgetrc}. When download quota is specified, Wget will stop +retrieving after the download sum has become greater than quota. The +quota can be specified in bytes (default), kbytes (@samp{k} appended) or +mbytes (@samp{m} appended). Thus @samp{quota = 5m} will set the quota +to 5 megabytes. Note that the user's startup file overrides system +settings. + +@item random_file = @var{file} +Use @var{file} as a source of randomness on systems lacking +@file{/dev/random}. + +@item random_wait = on/off +Turn random between-request wait times on or off. The same as +@samp{--random-wait}. + +@item read_timeout = @var{n} +Set the read (and write) timeout---the same as +@samp{--read-timeout=@var{n}}. + +@item reclevel = @var{n} +Recursion level (depth)---the same as @samp{-l @var{n}}. + +@item recursive = on/off +Recursive on/off---the same as @samp{-r}. + +@item referer = @var{string} +Set HTTP @samp{Referer:} header just like +@samp{--referer=@var{string}}. (Note that it was the folks who wrote +the @sc{http} spec who got the spelling of ``referrer'' wrong.) + +@item relative_only = on/off +Follow only relative links---the same as @samp{-L} (@pxref{Relative +Links}). + +@item remote_encoding = @var{encoding} +Force Wget to use @var{encoding} as the default remote server encoding. +See @samp{--remote-encoding}. + +@item remove_listing = on/off +If set to on, remove @sc{ftp} listings downloaded by Wget. 
Setting it +to off is the same as @samp{--no-remove-listing}. + +@item restrict_file_names = unix/windows +Restrict the file names generated by Wget from URLs. See +@samp{--restrict-file-names} for a more detailed description. + +@item retr_symlinks = on/off +When set to on, retrieve symbolic links as if they were plain files; the +same as @samp{--retr-symlinks}. + +@item retry_connrefused = on/off +When set to on, consider ``connection refused'' a transient +error---the same as @samp{--retry-connrefused}. + +@item robots = on/off +Specify whether the norobots convention is respected by Wget, ``on'' by +default. This switch controls both the @file{/robots.txt} and the +@samp{nofollow} aspect of the spec. @xref{Robot Exclusion}, for more +details about this. Be sure you know what you are doing before turning +this off. + +@item save_cookies = @var{file} +Save cookies to @var{file}. The same as @samp{--save-cookies +@var{file}}. + +@item save_headers = on/off +Same as @samp{--save-headers}. + +@item secure_protocol = @var{string} +Choose the secure protocol to be used. Legal values are @samp{auto} +(the default), @samp{SSLv2}, @samp{SSLv3}, and @samp{TLSv1}. The same +as @samp{--secure-protocol=@var{string}}. + +@item server_response = on/off +Choose whether or not to print the @sc{http} and @sc{ftp} server +responses---the same as @samp{-S}. + +@item show_all_dns_entries = on/off +When a DNS name is resolved, show all the IP addresses, not just the first +three. + +@item span_hosts = on/off +Same as @samp{-H}. + +@item spider = on/off +Same as @samp{--spider}. + +@item strict_comments = on/off +Same as @samp{--strict-comments}. + +@item timeout = @var{n} +Set all applicable timeout values to @var{n}, the same as @samp{-T +@var{n}}. + +@item timestamping = on/off +Turn timestamping on/off. The same as @samp{-N} (@pxref{Time-Stamping}). 
+ +@item use_server_timestamps = on/off +If set to @samp{off}, Wget won't set the local file's timestamp by the +one on the server (same as @samp{--no-use-server-timestamps}). + +@item tries = @var{n} +Set number of retries per @sc{url}---the same as @samp{-t @var{n}}. + +@item use_proxy = on/off +When set to off, don't use proxy even when proxy-related environment +variables are set. In that case it is the same as using +@samp{--no-proxy}. + +@item user = @var{string} +Specify username @var{string} for both @sc{ftp} and @sc{http} file retrieval. +This command can be overridden using the @samp{ftp_user} and +@samp{http_user} commands for @sc{ftp} and @sc{http} respectively. + +@item user_agent = @var{string} +User agent identification sent to the HTTP Server---the same as +@samp{--user-agent=@var{string}}. + +@item verbose = on/off +Turn verbose on/off---the same as @samp{-v}/@samp{-nv}. + +@item wait = @var{n} +Wait @var{n} seconds between retrievals---the same as @samp{-w +@var{n}}. + +@item wait_retry = @var{n} +Wait up to @var{n} seconds between retries of failed retrievals +only---the same as @samp{--waitretry=@var{n}}. Note that this is +turned on by default in the global @file{wgetrc}. +@end table + +@node Sample Wgetrc, , Wgetrc Commands, Startup File +@section Sample Wgetrc +@cindex sample wgetrc + +This is the sample initialization file, as given in the distribution. +It is divided into two sections---one for global usage (suitable for global +startup file), and one for local usage (suitable for +@file{$HOME/.wgetrc}). Be careful about the things you change. + +Note that almost all the lines are commented out. For a command to have +any effect, you must remove the @samp{#} character at the beginning of +its line. 
+ +@example +@include sample.wgetrc.munged_for_texi_inclusion +@end example + +@node Examples, Various, Startup File, Top +@chapter Examples +@cindex examples + +@c man begin EXAMPLES +The examples are divided into three sections loosely based on their +complexity. + +@menu +* Simple Usage:: Simple, basic usage of the program. +* Advanced Usage:: Advanced tips. +* Very Advanced Usage:: The hairy stuff. +@end menu + +@node Simple Usage, Advanced Usage, Examples, Examples +@section Simple Usage + +@itemize @bullet +@item +Say you want to download a @sc{url}. Just type: + +@example +wget http://fly.srk.fer.hr/ +@end example + +@item +But what will happen if the connection is slow, and the file is lengthy? +The connection will probably fail before the whole file is retrieved, +more than once. In this case, Wget will try getting the file until it +either gets the whole of it, or exceeds the default number of retries +(this being 20). It is easy to change the number of tries to 45, to +ensure that the whole file will arrive safely: + +@example +wget --tries=45 http://fly.srk.fer.hr/jpg/flyweb.jpg +@end example + +@item +Now let's leave Wget to work in the background, and write its progress +to log file @file{log}. It is tiring to type @samp{--tries}, so we +shall use @samp{-t}. + +@example +wget -t 45 -o log http://fly.srk.fer.hr/jpg/flyweb.jpg & +@end example + +The ampersand at the end of the line makes sure that Wget works in the +background. To unlimit the number of retries, use @samp{-t inf}. + +@item +The usage of @sc{ftp} is just as simple. Wget will take care of login and +password. + +@example +wget ftp://gnjilux.srk.fer.hr/welcome.msg +@end example + +@item +If you specify a directory, Wget will retrieve the directory listing, +parse it and convert it to @sc{html}. 
Try: + +@example +wget ftp://ftp.gnu.org/pub/gnu/ +links index.html +@end example +@end itemize + +@node Advanced Usage, Very Advanced Usage, Simple Usage, Examples +@section Advanced Usage + +@itemize @bullet +@item +You have a file that contains the URLs you want to download? Use the +@samp{-i} switch: + +@example +wget -i @var{file} +@end example + +If you specify @samp{-} as file name, the @sc{url}s will be read from +standard input. + +@item +Create a five levels deep mirror image of the GNU web site, with the +same directory structure the original has, with only one try per +document, saving the log of the activities to @file{gnulog}: + +@example +wget -r -t 1 https://www.gnu.org/ -o gnulog +@end example + +@item +The same as the above, but convert the links in the downloaded files to +point to local files, so you can view the documents off-line: + +@example +wget --convert-links -r -t 1 https://www.gnu.org/ -o gnulog +@end example + +@item +Retrieve only one @sc{html} page, but make sure that all the elements needed +for the page to be displayed, such as inline images and external style +sheets, are also downloaded. Also make sure the downloaded page +references the downloaded links. + +@example +wget -p --convert-links http://www.example.com/dir/page.html +@end example + +The @sc{html} page will be saved to @file{www.example.com/dir/page.html}, and +the images, stylesheets, etc., somewhere under @file{www.example.com/}, +depending on where they were on the remote server. + +@item +The same as the above, but without the @file{www.example.com/} directory. +In fact, I don't want to have all those random server directories +anyway---just save @emph{all} those files under a @file{download/} +subdirectory of the current directory. 
+ +@example +wget -p --convert-links -nH -nd -Pdownload \ + http://www.example.com/dir/page.html +@end example + +@item +Retrieve the index.html of @samp{www.lycos.com}, showing the original +server headers: + +@example +wget -S http://www.lycos.com/ +@end example + +@item +Save the server headers with the file, perhaps for post-processing. + +@example +wget --save-headers http://www.lycos.com/ +more index.html +@end example + +@item +Retrieve the first two levels of @samp{wuarchive.wustl.edu}, saving them +to @file{/tmp}. + +@example +wget -r -l2 -P/tmp ftp://wuarchive.wustl.edu/ +@end example + +@item +You want to download all the @sc{gif}s from a directory on an @sc{http} +server. You tried @samp{wget http://www.example.com/dir/*.gif}, but that +didn't work because @sc{http} retrieval does not support globbing. In +that case, use: + +@example +wget -r -l1 --no-parent -A.gif http://www.example.com/dir/ +@end example + +More verbose, but the effect is the same. @samp{-r -l1} means to +retrieve recursively (@pxref{Recursive Download}), with maximum depth +of 1. @samp{--no-parent} means that references to the parent directory +are ignored (@pxref{Directory-Based Limits}), and @samp{-A.gif} means to +download only the @sc{gif} files. @samp{-A "*.gif"} would have worked +too. + +@item +Suppose you were in the middle of downloading, when Wget was +interrupted. Now you do not want to clobber the files already present. +It would be: + +@example +wget -nc -r https://www.gnu.org/ +@end example + +@item +If you want to encode your own username and password to @sc{http} or +@sc{ftp}, use the appropriate @sc{url} syntax (@pxref{URL Format}). + +@example +wget ftp://hniksic:mypassword@@unix.example.com/.emacs +@end example + +Note, however, that this usage is not advisable on multi-user systems +because it reveals your password to anyone who looks at the output of +@code{ps}. 
+ +@cindex redirecting output +@item +You would like the output documents to go to standard output instead of +to files? + +@example +wget -O - http://jagor.srce.hr/ http://www.srce.hr/ +@end example + +You can also combine the two options and make pipelines to retrieve the +documents from remote hotlists: + +@example +wget -O - http://cool.list.com/ | wget --force-html -i - +@end example +@end itemize + +@node Very Advanced Usage, , Advanced Usage, Examples +@section Very Advanced Usage + +@cindex mirroring +@itemize @bullet +@item +If you wish Wget to keep a mirror of a page (or @sc{ftp} +subdirectories), use @samp{--mirror} (@samp{-m}), which is the shorthand +for @samp{-r -l inf -N}. You can put Wget in the crontab file asking it +to recheck a site each Sunday: + +@example +crontab +0 0 * * 0 wget --mirror https://www.gnu.org/ -o /home/me/weeklog +@end example + +@item +In addition to the above, you want the links to be converted for local +viewing. But, after having read this manual, you know that link +conversion doesn't play well with timestamping, so you also want Wget to +back up the original @sc{html} files before the conversion. Wget invocation +would look like this: + +@example +wget --mirror --convert-links --backup-converted \ + https://www.gnu.org/ -o /home/me/weeklog +@end example + +@item +But you've also noticed that local viewing doesn't work all that well +when @sc{html} files are saved under extensions other than @samp{.html}, +perhaps because they were served as @file{index.cgi}. So you'd like +Wget to rename all the files served with content-type @samp{text/html} +or @samp{application/xhtml+xml} to @file{@var{name}.html}. 
+ +@example +wget --mirror --convert-links --backup-converted \ + --adjust-extension -o /home/me/weeklog \ + https://www.gnu.org/ +@end example + +Or, with less typing: + +@example +wget -m -k -K -E https://www.gnu.org/ -o /home/me/weeklog +@end example +@end itemize +@c man end + +@node Various, Appendices, Examples, Top +@chapter Various +@cindex various + +This chapter contains all the stuff that could not fit anywhere else. + +@menu +* Proxies:: Support for proxy servers. +* Distribution:: Getting the latest version. +* Web Site:: GNU Wget's presence on the World Wide Web. +* Mailing Lists:: Wget mailing list for announcements and discussion. +* Internet Relay Chat:: Wget's presence on IRC. +* Reporting Bugs:: How and where to report bugs. +* Portability:: The systems Wget works on. +* Signals:: Signal-handling performed by Wget. +@end menu + +@node Proxies, Distribution, Various, Various +@section Proxies +@cindex proxies + +@dfn{Proxies} are special-purpose @sc{http} servers designed to transfer +data from remote servers to local clients. One typical use of proxies +is lightening network load for users behind a slow connection. This is +achieved by channeling all @sc{http} and @sc{ftp} requests through the +proxy which caches the transferred data. When a cached resource is +requested again, proxy will return the data from cache. Another use for +proxies is for companies that separate (for security reasons) their +internal networks from the rest of Internet. In order to obtain +information from the Web, their users connect and retrieve remote data +using an authorized proxy. + +@c man begin ENVIRONMENT +Wget supports proxies for both @sc{http} and @sc{ftp} retrievals. 
The +standard way to specify proxy location, which Wget recognizes, is using +the following environment variables: + +@table @env +@item http_proxy +@itemx https_proxy +If set, the @env{http_proxy} and @env{https_proxy} variables should +contain the @sc{url}s of the proxies for @sc{http} and @sc{https} +connections respectively. + +@item ftp_proxy +This variable should contain the @sc{url} of the proxy for @sc{ftp} +connections. It is quite common that @env{http_proxy} and +@env{ftp_proxy} are set to the same @sc{url}. + +@item no_proxy +This variable should contain a comma-separated list of domain extensions +proxy should @emph{not} be used for. For instance, if the value of +@env{no_proxy} is @samp{.mit.edu}, proxy will not be used to retrieve +documents from MIT. +@end table +@c man end + +In addition to the environment variables, proxy location and settings +may be specified from within Wget itself. + +@table @samp +@item --no-proxy +@itemx proxy = on/off +This option and the corresponding command may be used to suppress the +use of proxy, even if the appropriate environment variables are set. + +@item http_proxy = @var{URL} +@itemx https_proxy = @var{URL} +@itemx ftp_proxy = @var{URL} +@itemx no_proxy = @var{string} +These startup file variables allow you to override the proxy settings +specified by the environment. +@end table + +Some proxy servers require authorization to enable you to use them. The +authorization consists of @dfn{username} and @dfn{password}, which must +be sent by Wget. As with @sc{http} authorization, several +authentication schemes exist. For proxy authorization only the +@code{Basic} authentication scheme is currently implemented. + +You may specify your username and password either through the proxy +@sc{url} or through the command-line options. 
Assuming that the +company's proxy is located at @samp{proxy.company.com} at port 8001, a +proxy @sc{url} location containing authorization data might look like +this: + +@example +http://hniksic:mypassword@@proxy.company.com:8001/ +@end example + +Alternatively, you may use the @samp{--proxy-user} and +@samp{--proxy-password} options, and the equivalent @file{.wgetrc} +settings @code{proxy_user} and @code{proxy_password} to set the proxy +username and password. + +@node Distribution, Web Site, Proxies, Various +@section Distribution +@cindex latest version + +Like all GNU utilities, the latest version of Wget can be found at the +master GNU archive site ftp.gnu.org, and its mirrors. For example, +Wget @value{VERSION} can be found at +@url{https://ftp.gnu.org/pub/gnu/wget/wget-@value{VERSION}.tar.gz}. + +@node Web Site, Mailing Lists, Distribution, Various +@section Web Site +@cindex web site + +The official web site for GNU Wget is at +@url{https://www.gnu.org/software/wget/}. However, most useful +information resides at ``The Wget Wgiki'', +@url{http://wget.addictivecode.org/}. + +@node Mailing Lists, Internet Relay Chat, Web Site, Various +@section Mailing Lists +@cindex mailing list +@cindex list + +@unnumberedsubsec Primary List + +The primary mailing list for discussion, bug reports, or questions +about GNU Wget is at @email{bug-wget@@gnu.org}. To subscribe, send an +email to @email{bug-wget-join@@gnu.org}, or visit +@url{https://lists.gnu.org/mailman/listinfo/bug-wget}. + +You do not need to subscribe to send a message to the list; however, +please note that messages from non-subscribers are moderated, and may take a +while before they hit the list---@strong{usually around a day}. If +you want your message to show up immediately, please subscribe to the +list before posting. Archives for the list may be found at +@url{https://lists.gnu.org/archive/html/bug-wget/}. + +An NNTP/Usenettish gateway is also available via +@uref{http://gmane.org/about.php,Gmane}.
You can see the Gmane +archives at +@url{http://news.gmane.org/gmane.comp.web.wget.general}. Note that the +Gmane archives conveniently include messages from both the current +list and the previous one. Messages also show up in the Gmane +archives sooner than they do at @url{https://lists.gnu.org}. + +@unnumberedsubsec Obsolete Lists + +Previously, the mailing list @email{wget@@sunsite.dk} was used as the +main discussion list, and another list, +@email{wget-patches@@sunsite.dk}, was used for submitting and +discussing patches to GNU Wget. + +Messages from @email{wget@@sunsite.dk} are archived at +@itemize @tie{} +@item +@url{https://www.mail-archive.com/wget%40sunsite.dk/} and at +@item +@url{http://news.gmane.org/gmane.comp.web.wget.general} (which also +continues to archive the current list, @email{bug-wget@@gnu.org}). +@end itemize + +Messages from @email{wget-patches@@sunsite.dk} are archived at +@itemize @tie{} +@item +@url{http://news.gmane.org/gmane.comp.web.wget.patches}. +@end itemize + +@node Internet Relay Chat, Reporting Bugs, Mailing Lists, Various +@section Internet Relay Chat +@cindex Internet Relay Chat +@cindex IRC +@cindex #wget + +In addition to the mailing lists, we also have a support channel set up +via IRC at @code{irc.freenode.org}, @code{#wget}. Come check it out! + +@node Reporting Bugs, Portability, Internet Relay Chat, Various +@section Reporting Bugs +@cindex bugs +@cindex reporting bugs +@cindex bug reports + +@c man begin BUGS +You are welcome to submit bug reports via the GNU Wget bug tracker (see +@url{https://savannah.gnu.org/bugs/?func=additem&group=wget}) or to our +mailing list @email{bug-wget@@gnu.org}. + +Visit @url{https://lists.gnu.org/mailman/listinfo/bug-wget} to +get more info (how to subscribe, list archives, ...). + +Before actually submitting a bug report, please try to follow a few +simple guidelines. + +@enumerate +@item +Please try to ascertain that the behavior you see really is a bug. If +Wget crashes, it's a bug.
If Wget does not behave as documented, +it's a bug. If things work strangely, but you are not sure about the way +they are supposed to work, it might well be a bug, but you might want to +double-check the documentation and the mailing lists (@pxref{Mailing +Lists}). + +@item +Try to repeat the bug in as simple circumstances as possible. E.g. if +Wget crashes while running @samp{wget -rl0 -kKE -t5 --no-proxy +http://example.com -o /tmp/log}, you should try to see if the crash is +repeatable, and if it will occur with a simpler set of options. You might +even try to start the download at the page where the crash occurred to +see if that page somehow triggered the crash. + +Also, while I will probably be interested to know the contents of your +@file{.wgetrc} file, just dumping it into the debug message is probably +a bad idea. Instead, you should first try to see if the bug repeats +with @file{.wgetrc} moved out of the way. Only if it turns out that +@file{.wgetrc} settings affect the bug, mail me the relevant parts of +the file. + +@item +Please start Wget with the @samp{-d} option and send us the resulting +output (or relevant parts thereof). If Wget was compiled without +debug support, recompile it---it is @emph{much} easier to trace bugs +with debug support on. + +Note: please make sure to remove any potentially sensitive information +from the debug log before sending it to the bug address. The +@code{-d} option won't go out of its way to collect sensitive information, +but the log @emph{will} contain a fairly complete transcript of Wget's +communication with the server, which may include passwords and pieces +of downloaded data. Since the bug address is publicly archived, you +may assume that all bug reports are visible to the public. + +@item +If Wget has crashed, try to run it in a debugger, e.g. @code{gdb `which +wget` core} and type @code{where} to get the backtrace. This may not +work if the system administrator has disabled core files, but it is +safe to try.
+@end enumerate +@c man end + +@node Portability, Signals, Reporting Bugs, Various +@section Portability +@cindex portability +@cindex operating systems + +Like all GNU software, Wget works on the GNU system. However, since it +uses GNU Autoconf for building and configuring, and mostly avoids using +``special'' features of any particular Unix, it should compile (and +work) on all common Unix flavors. + +Various Wget versions have been compiled and tested under many kinds of +Unix systems, including GNU/Linux, Solaris, SunOS 4.x, Mac OS X, OSF +(aka Digital Unix or Tru64), Ultrix, *BSD, IRIX, AIX, and others. Some +of those systems are no longer in widespread use and may not be able to +support recent versions of Wget. If Wget fails to compile on your +system, we would like to know about it. + +Thanks to kind contributors, this version of Wget compiles and works +on 32-bit Microsoft Windows platforms. It has been compiled +successfully using MS Visual C++ 6.0, Watcom, Borland C, and GCC +compilers. Naturally, it is crippled of some features available on +Unix, but it should work as a substitute for people stuck with +Windows. Note that Windows-specific portions of Wget are not +guaranteed to be supported in the future, although this has been the +case in practice for many years now. All questions and problems in +Windows usage should be reported to Wget mailing list at +@email{wget@@sunsite.dk} where the volunteers who maintain the +Windows-related features might look at them. + +Support for building on MS-DOS via DJGPP has been contributed by Gisle +Vanem; a port to VMS is maintained by Steven Schweda, and is available +at @url{https://antinode.info/dec/sw/wget.html}. + +@node Signals, , Portability, Various +@section Signals +@cindex signal handling +@cindex hangup + +Since the purpose of Wget is background work, it catches the hangup +signal (@code{SIGHUP}) and ignores it. 
If the output was on standard +output, it will be redirected to a file named @file{wget-log}. +Otherwise, @code{SIGHUP} is ignored. This is convenient when you wish +to redirect the output of Wget after having started it. + +@example +$ wget http://www.gnus.org/dist/gnus.tar.gz & +... +$ kill -HUP %% +SIGHUP received, redirecting output to `wget-log'. +@end example + +Other than that, Wget will not try to interfere with signals in any way. +@kbd{C-c}, @code{kill -TERM} and @code{kill -KILL} should kill it alike. + +@node Appendices, Copying this manual, Various, Top +@chapter Appendices + +This chapter contains some references I consider useful. + +@menu +* Robot Exclusion:: Wget's support for RES. +* Security Considerations:: Security with Wget. +* Contributors:: People who helped. +@end menu + +@node Robot Exclusion, Security Considerations, Appendices, Appendices +@section Robot Exclusion +@cindex robot exclusion +@cindex robots.txt +@cindex server maintenance + +It is extremely easy to make Wget wander aimlessly around a web site, +sucking all the available data in progress. @samp{wget -r @var{site}}, +and you're set. Great? Not for the server admin. + +As long as Wget is only retrieving static pages, and doing it at a +reasonable rate (see the @samp{--wait} option), there's not much of a +problem. The trouble is that Wget can't tell the difference between the +smallest static page and the most demanding CGI. A site I know has a +section handled by a CGI Perl script that converts Info files to @sc{html} on +the fly. The script is slow, but works well enough for human users +viewing an occasional Info file. 
However, when someone's recursive Wget +download stumbles upon the index page that links to all the Info files +through the script, the system is brought to its knees without providing +anything useful to the user. (This task of converting Info files could be +done locally, and access to Info documentation for all installed GNU +software on a system is available from the @code{info} command.) + +To avoid this kind of accident, as well as to preserve privacy for +documents that need to be protected from well-behaved robots, the +concept of @dfn{robot exclusion} was invented. The idea is that +the server administrators and document authors can specify which +portions of the site they wish to protect from robots and those +to which they will permit access. + +The most popular mechanism, and the @i{de facto} standard supported by +all the major robots, is the ``Robots Exclusion Standard'' (RES) written +by Martijn Koster et al. in 1994. It specifies the format of a text +file containing directives that instruct the robots which URL paths to +avoid. To be found by the robots, the specifications must be placed in +@file{/robots.txt} in the server root, which the robots are expected to +download and parse. + +Although Wget is not a web robot in the strictest sense of the word, it +can download large parts of the site without requiring the user to +request each individual page. Because of that, Wget honors RES when +downloading recursively. For instance, when you issue: + +@example +wget -r http://www.example.com/ +@end example + +First the index of @samp{www.example.com} will be downloaded. If Wget +finds that it wants to download more documents from that server, it will +request @samp{http://www.example.com/robots.txt} and, if found, use it +for further downloads. @file{robots.txt} is loaded only once per +server.
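+
+As a rough illustration of the format (the paths below are made up for
+this example and are not taken from any real site), a
+@file{/robots.txt} that asks all robots to stay out of two directories
+might look like this:
+
+@example
+User-agent: *
+Disallow: /cgi-bin/
+Disallow: /info2html/
+@end example
+
+Assuming Wget retrieved such a file during a recursive download, it
+would be expected to skip @sc{url}s on that server whose paths begin
+with one of the listed prefixes.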
+ +Until version 1.8, Wget supported the first version of the standard, +written by Martijn Koster in 1994 and available at +@url{http://www.robotstxt.org/orig.html}. As of version 1.8, +Wget has supported the additional directives specified in the internet +draft @samp{<draft-koster-robots-00.txt>} titled ``A Method for Web +Robots Control''. The draft, which as far as I know has never made it to +an @sc{rfc}, is available at +@url{http://www.robotstxt.org/norobots-rfc.txt}. + +This manual no longer includes the text of the Robot Exclusion Standard. + +The second, lesser-known mechanism enables the author of an individual +document to specify whether they want the links from the file to be +followed by a robot. This is achieved using the @code{META} tag, like +this: + +@example +<meta name="robots" content="nofollow"> +@end example + +This is explained in some detail at +@url{http://www.robotstxt.org/meta.html}. Wget supports this +method of robot exclusion in addition to the usual @file{/robots.txt} +exclusion. + +If you know what you are doing and really, really wish to turn off the +robot exclusion, set the @code{robots} variable to @samp{off} in your +@file{.wgetrc}. You can achieve the same effect from the command line +using the @code{-e} switch, e.g. @samp{wget -e robots=off @var{url}...}. + +@node Security Considerations, Contributors, Robot Exclusion, Appendices +@section Security Considerations +@cindex security + +When using Wget, you must be aware that it sends unencrypted passwords +through the network, which may present a security problem. Here are the +main issues and some solutions. + +@enumerate +@item +The passwords on the command line are visible using @code{ps}. The best +way around it is to use @code{wget -i -} and feed the @sc{url}s to +Wget's standard input, each on a separate line, terminated by @kbd{C-d}. +Another workaround is to use @file{.netrc} to store passwords; however, +storing unencrypted passwords is also considered a security risk.
+ +@item +When the insecure @dfn{basic} authentication scheme is used, unencrypted +passwords are transmitted through the network routers and gateways. + +@item +The @sc{ftp} passwords are also in no way encrypted. There is no good +solution for this at the moment. + +@item +Although the ``normal'' output of Wget tries to hide the passwords, +debugging logs show them, in all forms. This problem is avoided by +being careful when you send debug logs (yes, even when you send them to +me). +@end enumerate + +@node Contributors, , Security Considerations, Appendices +@section Contributors +@cindex contributors + +GNU Wget was written by Hrvoje Nikšić @email{hniksic@@xemacs.org}. + +However, the development of Wget could never have gone as far as it has, were +it not for the help of many people, either with bug reports, feature proposals, +patches, or letters saying ``Thanks!''. + +Special thanks go to the following people (in no particular order): + +@itemize @bullet +@item Dan Harkless---contributed a lot of code and documentation of +extremely high quality, as well as the @code{--page-requisites} and +related options. He was the principal maintainer for some time and +released Wget 1.6. + +@item Ian Abbott---contributed bug fixes, Windows-related fixes, and +provided a prototype implementation of the breadth-first recursive +download. Co-maintained Wget during the 1.8 release cycle. + +@item +The dotsrc.org crew, in particular Karsten Thygesen---donated system +resources such as the mailing list, web space, @sc{ftp} space, and +version control repositories, along with a lot of time to make these +actually work. Christian Reiniger was of invaluable help with setting +up Subversion. + +@item +Heiko Herold---provided high-quality Windows builds and contributed +bug and build reports for many years. + +@item +Shawn McHorse---bug reports and patches. + +@item +Kaveh R. Ghazi---on-the-fly @code{ansi2knr}-ization. Lots of +portability fixes.
+ +@item +Gordon Matzigkeit---@file{.netrc} support. + +@item +Zlatko Čalušić, Tomislav Vujec and Dražen +Kačar---feature suggestions and ``philosophical'' discussions. + +@item +Darko Budor---initial port to Windows. + +@item +Antonio Rosella---help and suggestions, plus the initial Italian +translation. + +@item +Tomislav Petrović, Mario Mikočević---many bug reports and +suggestions. + +@item +François Pinard---many thorough bug reports and discussions. + +@item +Karl Eichwalder---lots of help with internationalization, Makefile +layout and many other things. + +@item +Junio Hamano---donated support for Opie and @sc{http} @code{Digest} +authentication. + +@item +Mauro Tortonesi---improved IPv6 support, adding support for dual +family systems. Refactored and enhanced FTP IPv6 code. Maintained GNU +Wget from 2004 to 2007. + +@item +Christopher G.@: Lewis---maintenance of the Windows version of GNU Wget. + +@item +Gisle Vanem---many helpful patches and improvements, especially for +Windows and MS-DOS support. + +@item +Ralf Wildenhues---contributed patches to convert Wget to use Automake as +part of its build process, and various bugfixes. + +@item +Steven Schubiger---Many helpful patches, bugfixes and improvements. +Notably, conversion of Wget to use the Gnulib quotes and quoteargs +modules, and the addition of password prompts at the console, via the +Gnulib getpasswd-gnu module. + +@item +Ted Mielczarek---donated support for CSS. + +@item +Saint Xavier---Support for IRIs (RFC 3987). + +@item +Tim Rühsen---Loads of helpful patches, especially fuzzing support and +Continuous Integration. Maintainer since 2014. + +@item +Darshit Shah---Many helpful patches. Community support on various platforms. +Maintainer since 2014. + +@item +People who provided donations for development---including Brian Gough.
+@end itemize + +The following people have provided patches, bug/build reports, useful +suggestions, beta testing services, fan mail and all the other things +that make maintenance so much fun: + +Tim Adam, +Adrian Aichner, +Martin Baehr, +Dieter Baron, +Roger Beeman, +Dan Berger, +T.@: Bharath, +Christian Biere, +Paul Bludov, +Daniel Bodea, +Mark Boyns, +John Burden, +Julien Buty, +Wanderlei Cavassin, +Gilles Cedoc, +Tim Charron, +Noel Cragg, +Kristijan Čonkaš, +John Daily, +Andreas Damm, +Ahmon Dancy, +Andrew Davison, +Bertrand Demiddelaer, +Alexander Dergachev, +Andrew Deryabin, +Ulrich Drepper, +Marc Duponcheel, +Damir Džeko, +Alan Eldridge, +Hans-Andreas Engel, +Aleksandar Erkalović, +Andy Eskilsson, +João Ferreira, +Christian Fraenkel, +David Fritz, +Mike Frysinger, +Charles C.@: Fu, +FUJISHIMA Satsuki, +Masashi Fujita, +Howard Gayle, +Marcel Gerrits, +Lemble Gregory, +Hans Grobler, +Alain Guibert, +Mathieu Guillaume, +Aaron Hawley, +Jochen Hein, +Karl Heuer, +Madhusudan Hosaagrahara, +HIROSE Masaaki, +Ulf Harnhammar, +Gregor Hoffleit, +Erik Magnus Hulthen, +Richard Huveneers, +Jonas Jensen, +Larry Jones, +Simon Josefsson, +Mario Jurić, +Hack Kampbjørn, +Const Kaplinsky, +Goran Kezunović, +Igor Khristophorov, +Robert Kleine, +KOJIMA Haime, +Fila Kolodny, +Alexander Kourakos, +Martin Kraemer, +Sami Krank, +Jay Krell, +Σίμος Ξενιτέλλης (Simos KSenitellis), +Christian Lackas, +Hrvoje Lacko, +Daniel S.@: Lewart, +Nicolás Lichtmeier, +Dave Love, +Alexander V.@: Lukyanov, +Thomas Lußnig, +Andre Majorel, +Aurelien Marchand, +Matthew J.@: Mellon, +Jordan Mendelson, +Ted Mielczarek, +Robert Millan, +Lin Zhe Min, +Jan Minar, +Tim Mooney, +Keith Moore, +Adam D.@: Moss, +Simon Munton, +Charlie Negyesi, +R.@: K.@: Owen, +Jim Paris, +Kenny Parnell, +Leonid Petrov, +Simone Piunno, +Andrew Pollock, +Steve Pothier, +Jan Přikryl, +Marin Purgar, +Csaba Ráduly, +Keith Refson, +Bill Richardson, +Tyler Riddle, +Tobias Ringstrom, +Jochen Roderburg, +Juan José Rodríguez, +Maciej 
W.@: Rozycki, +Edward J.@: Sabol, +Heinz Salzmann, +Robert Schmidt, +Nicolas Schodet, +Benno Schulenberg, +Andreas Schwab, +Steven M.@: Schweda, +Chris Seawood, +Pranab Shenoy, +Dennis Smit, +Toomas Soome, +Tage Stabell-Kulo, +Philip Stadermann, +Daniel Stenberg, +Sven Sternberger, +Markus Strasser, +John Summerfield, +Szakacsits Szabolcs, +Mike Thomas, +Philipp Thomas, +Mauro Tortonesi, +Dave Turner, +Gisle Vanem, +Rabin Vincent, +Russell Vincent, +Željko Vrba, +Charles G Waldman, +Douglas E.@: Wegscheid, +Ralf Wildenhues, +Joshua David Williams, +Benjamin Wolsey, +Saint Xavier, +YAMAZAKI Makoto, +Jasmin Zainul, +Bojan Ždrnja, +Kristijan Zimmer, +Xin Zou. + +Apologies to all whom I accidentally left out, and many thanks to all the +subscribers of the Wget mailing list. + +@node Copying this manual, Concept Index, Appendices, Top +@appendix Copying this manual + +@menu +* GNU Free Documentation License:: License for copying this manual. +@end menu + +@node GNU Free Documentation License, , Copying this manual, Copying this manual +@appendixsec GNU Free Documentation License +@cindex FDL, GNU Free Documentation License + +@include fdl.texi + + +@node Concept Index, , Copying this manual, Top +@unnumbered Concept Index +@printindex cp + +@contents + +@bye