Diffstat (limited to 'upstream/fedora-40/man1/wget2.1')
-rw-r--r-- | upstream/fedora-40/man1/wget2.1 | 2726 |
1 files changed, 2726 insertions, 0 deletions
diff --git a/upstream/fedora-40/man1/wget2.1 b/upstream/fedora-40/man1/wget2.1
new file mode 100644
index 00000000..29a4bea5
--- /dev/null
+++ b/upstream/fedora-40/man1/wget2.1
@@ -0,0 +1,2726 @@
+.\" Automatically generated by Pandoc 3.1.3
+.\"
+.\" Define V font for inline verbatim, using C font in formats
+.\" that render this, and otherwise B font.
+.ie "\f[CB]x\f[]"x" \{\
+. ftr V B
+. ftr VI BI
+. ftr VB B
+. ftr VBI BI
+.\}
+.el \{\
+. ftr V CR
+. ftr VI CI
+. ftr VB CB
+. ftr VBI CBI
+.\}
+.TH "WGET2" "1" "" "GNU Wget2 User Manual" "GNU Wget2 2.1.0"
+.hy
+.SH Name
+.PP
+Wget2 - a recursive metalink/file/website downloader.
+.SH Synopsis
+.PP
+\f[V]wget2 [options]... [URL]...\f[R]
+.SH Description
+.PP
+GNU Wget2 is a free utility for non-interactive download of files from
+the Web.
+It supports HTTP and HTTPS protocols, as well as retrieval through
+HTTP(S) proxies.
+.PP
+Wget2 is non-interactive, meaning that it can work in the background,
+while the user is not logged on.
+This allows you to start a retrieval and disconnect from the system,
+letting Wget2 finish the work.
+By contrast, most Web browsers require the user\[cq]s constant
+presence, which can be a great hindrance when transferring a lot of
+data.
+.PP
+Wget2 can follow links in HTML, XHTML, CSS, RSS, Atom and sitemap files
+to create local versions of remote web sites, fully recreating the
+directory structure of the original site.
+This is sometimes referred to as \f[I]recursive downloading\f[R].
+While doing that, Wget2 respects the Robot Exclusion Standard
+(\f[I]/robots.txt\f[R]).
+Wget2 can be instructed to convert the links in downloaded files to
+point at the local files, for offline viewing.
+.PP
+Wget2 has been designed for robustness over slow or unstable network
+connections; if a download fails due to a network problem, it will keep
+retrying until the whole file has been retrieved.
+If the server supports partial downloads, it may continue the download
+from where it left off.
+.SH Options
+.SS Option Syntax
+.PP
+Every option has a long form and sometimes also a short one.
+Long options are more convenient to remember, but take time to type.
+You may freely mix different option styles.
+Thus you may write:
+.IP
+.nf
+\f[C]
+ wget2 -r --tries=10 https://example.com/ -o log
+\f[R]
+.fi
+.PP
+The space between the option accepting an argument and the argument
+may be omitted.
+Instead of \f[V]-o log\f[R] you can write \f[V]-olog\f[R].
+.PP
+You may put several options that do not require arguments together,
+like:
+.IP
+.nf
+\f[C]
+ wget2 -drc <URL>
+\f[R]
+.fi
+.PP
+This is equivalent to:
+.IP
+.nf
+\f[C]
+ wget2 -d -r -c <URL>
+\f[R]
+.fi
+.PP
+Since the options can be specified after the arguments, you may
+terminate them with \f[V]--\f[R].
+So the following will try to download URL \f[V]-x\f[R], reporting
+failure to \f[V]log\f[R]:
+.IP
+.nf
+\f[C]
+ wget2 -o log -- -x
+\f[R]
+.fi
+.PP
+The options that accept comma-separated lists all respect the
+convention that prepending \f[V]--no-\f[R] clears its value.
+This can be useful to clear the \f[V].wget2rc\f[R] settings.
+For instance, if your \f[V].wget2rc\f[R] sets
+\f[V]exclude-directories\f[R] to \f[V]/cgi-bin\f[R], the following
+example will first reset it, and then set it to exclude \f[V]/priv\f[R]
+and \f[V]/trash\f[R].
+You can also clear the lists in \f[V].wget2rc\f[R].
+.IP
+.nf
+\f[C]
+ wget2 --no-exclude-directories -X /priv,/trash
+\f[R]
+.fi
+.PP
+Most options that do not accept arguments are boolean options, so
+named because their state can be captured with a yes-or-no
+(\[lq]boolean\[rq]) variable.
+A boolean option is either affirmative or negative (beginning with
+\f[V]--no-\f[R]).
+All such options share several properties.
+.PP
+Affirmative options can be negated by prepending \f[V]--no-\f[R] to
+the option name; negative options can be negated by omitting the
+\f[V]--no-\f[R] prefix.
+This might seem superfluous - if the default for an affirmative option
+is to not do something, then why provide a way to explicitly turn it
+off?
+But the startup file may in fact change the default.
+For instance, using \f[V]timestamping = on\f[R] in \f[V].wget2rc\f[R]
+makes Wget2 download updated files only.
+Using \f[V]--no-timestamping\f[R] is the only way to restore the
+factory default from the command line.
+.SS Basic Startup Options
+.SS \f[V]-V\f[R], \f[V]--version\f[R]
+.PP
+Display the version of Wget2.
+.SS \f[V]-h\f[R], \f[V]--help\f[R]
+.PP
+Print a help message describing all of Wget2\[cq]s command-line
+options.
+.SS \f[V]-b\f[R], \f[V]--background\f[R]
+.PP
+Go to background immediately after startup.
+If no output file is specified via the \f[V]-o\f[R] option, output is
+redirected to \f[V]wget-log\f[R].
+.SS \f[V]-e\f[R], \f[V]--execute=command\f[R]
+.PP
+Execute command as if it were a part of \f[V].wget2rc\f[R].
+A command thus invoked will be executed after the commands in
+\f[V].wget2rc\f[R], thus taking precedence over them.
+If you need to specify more than one wget2rc command, use multiple
+instances of \f[V]-e\f[R].
+.SS \f[V]--hyperlink\f[R]
+.PP
+Hyperlink names of downloaded files so that they can be opened from
+the terminal by clicking on them.
+Only a few terminal emulators currently support hyperlinks.
+Enable this option if you know your terminal supports hyperlinks.
+.SS Logging and Input File Options
+.SS \f[V]-o\f[R], \f[V]--output-file=logfile\f[R]
+.PP
+Log all messages to \f[V]logfile\f[R].
+The messages are normally reported to standard error.
+.SS \f[V]-a\f[R], \f[V]--append-output=logfile\f[R]
+.PP
+Append to \f[V]logfile\f[R].
+This is the same as \f[V]-o\f[R], only it appends to \f[V]logfile\f[R]
+instead of overwriting the old log file.
+If \f[V]logfile\f[R] does not exist, a new file is created.
+.SS \f[V]-d\f[R], \f[V]--debug\f[R]
+.PP
+Turn on debug output, meaning various information important to the
+developers of Wget2 if it does not work properly.
+Your system administrator may have chosen to compile Wget2 without
+debug support, in which case \f[V]-d\f[R] will not work.
+Please note that compiling with debug support is always safe; Wget2
+compiled with debug support will not print any debug info unless
+requested with \f[V]-d\f[R].
+.SS \f[V]-q\f[R], \f[V]--quiet\f[R]
+.PP
+Turn off Wget2\[cq]s output.
+.SS \f[V]-v\f[R], \f[V]--verbose\f[R]
+.PP
+Turn on verbose output, with all the available data.
+The default output is verbose.
+.SS \f[V]-nv\f[R], \f[V]--no-verbose\f[R]
+.PP
+Turn off verbose output without being completely quiet (use
+\f[V]-q\f[R] for that), which means that error messages and basic
+information still get printed.
+.SS \f[V]--report-speed=type\f[R]
+.PP
+Output bandwidth as \f[V]type\f[R].
+The only accepted values are \f[V]bytes\f[R] (which is set by default)
+and \f[V]bits\f[R].
+This option only works if \f[V]--progress=bar\f[R] is also set.
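+.PP
+For example, to watch throughput in bits per second while fetching a
+single file (the URL is illustrative):
+.IP
+.nf
+\f[C]
+ wget2 --progress=bar --report-speed=bits https://example.com/file.iso
+\f[R]
+.fi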
+.SS \f[V]-i\f[R], \f[V]--input-file=file\f[R]
+.PP
+Read URLs from a local or external file.
+If \f[V]-\f[R] is specified as file, URLs are read from the standard
+input.
+Use \f[V]./-\f[R] to read from a file literally named \f[V]-\f[R].
+.PP
+If this function is used, no URLs need be present on the command line.
+If there are URLs both on the command line and in an input file, those
+on the command line will be the first ones to be retrieved.
+\f[V]file\f[R] is expected to contain one URL per line, unless one of
+the \f[V]--force-\f[R] options specifies a different format.
+.PP
+If you specify \f[V]--force-html\f[R], the document will be regarded as
+HTML.
+In that case you may have problems with relative links, which you can
+solve either by adding \f[V]<base href=\[dq]url\[dq]>\f[R] to the
+documents or by specifying \f[V]--base=url\f[R] on the command line.
+.PP
+If you specify \f[V]--force-css\f[R], the document will be regarded as
+CSS.
+.PP
+If you specify \f[V]--force-sitemap\f[R], the document will be regarded
+as an XML sitemap.
+.PP
+If you specify \f[V]--force-atom\f[R], the document will be regarded as
+an Atom feed.
+.PP
+If you specify \f[V]--force-rss\f[R], the document will be regarded as
+an RSS feed.
+.PP
+If you specify \f[V]--force-metalink\f[R], the document will be
+regarded as a Metalink description.
+.PP
+If you have problems with relative links, you should use
+\f[V]--base=url\f[R] on the command line.
+.SS \f[V]-F\f[R], \f[V]--force-html\f[R]
+.PP
+When input is read from a file, force it to be treated as an HTML
+file.
+This enables you to retrieve relative links from existing HTML files
+on your local disk, by adding \f[V]<base href=\[dq]url\[dq]>\f[R] to
+the HTML, or using the \f[V]--base\f[R] command-line option.
+.SS \f[V]--force-css\f[R]
+.PP
+Read and parse the input file as CSS.
+This enables you to retrieve links from existing CSS files on your
+local disk.
+You will need \f[V]--base\f[R] to handle relative links correctly.
+.SS \f[V]--force-sitemap\f[R]
+.PP
+Read and parse the input file as sitemap XML.
+This enables you to retrieve links from existing sitemap files on your
+local disk.
+You will need \f[V]--base\f[R] to handle relative links correctly.
+.SS \f[V]--force-atom\f[R]
+.PP
+Read and parse the input file as Atom Feed XML.
+This enables you to retrieve links from existing Atom feed files on
+your local disk.
+You will need \f[V]--base\f[R] to handle relative links correctly.
+.SS \f[V]--force-rss\f[R]
+.PP
+Read and parse the input file as RSS Feed XML.
+This enables you to retrieve links from existing RSS feed files on
+your local disk.
+You will need \f[V]--base\f[R] to handle relative links correctly.
+.SS \f[V]--force-metalink\f[R]
+.PP
+Read and parse the input file as Metalink.
+This enables you to retrieve links from existing Metalink files on
+your local disk.
+You will need \f[V]--base\f[R] to handle relative links correctly.
+.SS \f[V]-B\f[R], \f[V]--base=URL\f[R]
+.PP
+Resolves relative links using URL as the point of reference, when
+reading links from an HTML file specified via the
+\f[V]-i\f[R]/\f[V]--input-file\f[R] option (together with a
+\f[V]--force\f[R]\&...
+option, or when the input file was fetched remotely from a server
+describing it as HTML, CSS, Atom or RSS).
+This is equivalent to the presence of a \[lq]BASE\[rq] tag in the HTML
+input file, with URL as the value for the \[lq]href\[rq] attribute.
+.PP
+For instance, if you specify \f[V]https://example.com/bar/a.html\f[R]
+for URL, and Wget2 reads \f[V]../baz/b.html\f[R] from the input file,
+it would be resolved to \f[V]https://example.com/baz/b.html\f[R].
+.SS \f[V]--config=FILE\f[R]
+.PP
+Specify the location of configuration files you wish to use.
+If you specify more than one file, either by using a comma-separated
+list or several \f[V]--config\f[R] options, these files are read in
+left-to-right order.
+The files given in \f[V]$SYSTEM_WGET2RC\f[R] and (\f[V]$WGET2RC\f[R]
+or \f[V]\[ti]/.wget2rc\f[R]) are read in that order and then the
+user-provided config file(s).
+If set, \f[V]$WGET2RC\f[R] replaces \f[V]\[ti]/.wget2rc\f[R].
+.PP
+\f[V]--no-config\f[R] empties the internal list of config files.
+So if you want to prevent reading any config files, give
+\f[V]--no-config\f[R] on the command line.
+.PP
+\f[V]--no-config\f[R] followed by \f[V]--config=file\f[R] just reads
+\f[V]file\f[R] and skips reading the default config files.
+.PP
+Wget2 will attempt to tilde-expand filenames written in the
+configuration file on supported platforms.
+To use a file that starts with the literal character `\[ti]', use
+\[lq]./\[ti]\[rq] or an absolute path.
+.SS \f[V]--rejected-log=logfile\f[R] [Not implemented yet]
+.PP
+Logs all URL rejections to logfile as comma-separated values.
+The values include the reason for rejection, the URL and the parent
+URL it was found in.
+.SS \f[V]--local-db\f[R]
+.PP
+Enables reading/writing to local database files (default: on).
+.PP
+These are the files for \f[V]--hsts\f[R], \f[V]--hpkp\f[R],
+\f[V]--ocsp\f[R], etc.
+.PP
+With \f[V]--no-local-db\f[R] you can switch reading/writing off,
+e.g.\ useful for testing.
+.PP
+This option does not influence the reading of config files.
+.SS \f[V]--stats-dns=[FORMAT:]FILE\f[R]
+.PP
+Save DNS stats in format \f[V]FORMAT\f[R], in file \f[V]FILE\f[R].
+.PP
+\f[V]FORMAT\f[R] can be \f[V]human\f[R] or \f[V]csv\f[R].
+\f[V]-\f[R] is shorthand for \f[V]stdout\f[R] and \f[V]h\f[R] is
+shorthand for \f[V]human\f[R].
+.PP
+The CSV output format is
+.PP
+Hostname,IP,Port,Duration
+.IP
+.nf
+\f[C]
+\[ga]Duration\[ga] is given in milliseconds.
+\f[R]
+.fi
+.SS \f[V]--stats-tls=[FORMAT:]FILE\f[R]
+.PP
+Save TLS stats in format \f[V]FORMAT\f[R], in file \f[V]FILE\f[R].
+.PP
+\f[V]FORMAT\f[R] can be \f[V]human\f[R] or \f[V]csv\f[R].
+\f[V]-\f[R] is shorthand for \f[V]stdout\f[R] and \f[V]h\f[R] is
+shorthand for \f[V]human\f[R].
+.PP
+The CSV output format is
+.PP
+Hostname,TLSVersion,FalseStart,TFO,Resumed,ALPN,HTTPVersion,Certificates,Duration
+.IP
+.nf
+\f[C]
+\[ga]TLSVersion\[ga] can be 1,2,3,4,5 for SSL3, TLS1.0, TLS1.1, TLS1.2 and TLS1.3. -1 means \[aq]None\[aq].
+
+\[ga]FalseStart\[ga] whether the connection used TLS False Start. -1 if not applicable.
+
+\[ga]TFO\[ga] whether the connection used TCP Fast Open. -1 if TFO was disabled.
+
+\[ga]Resumed\[ga] whether the TLS session was resumed or not.
+
+\[ga]ALPN\[ga] is the ALPN negotiation string.
+
+\[ga]HTTPVersion\[ga] is 0 for HTTP 1.1 and 1 for HTTP 2.0.
+
+\[ga]Certificates\[ga] is the size of the server\[aq]s certificate chain.
+
+\[ga]Duration\[ga] is given in milliseconds.
+\f[R]
+.fi
+.SS \f[V]--stats-ocsp=[FORMAT:]FILE\f[R]
+.PP
+Save OCSP stats in format \f[V]FORMAT\f[R], in file \f[V]FILE\f[R].
+.PP
+\f[V]FORMAT\f[R] can be \f[V]human\f[R] or \f[V]csv\f[R].
+\f[V]-\f[R] is shorthand for \f[V]stdout\f[R] and \f[V]h\f[R] is
+shorthand for \f[V]human\f[R].
+.PP
+The CSV output format is
+.PP
+Hostname,Stapling,Valid,Revoked,Ignored
+.IP
+.nf
+\f[C]
+\[ga]Stapling\[ga] whether an OCSP response was stapled or not.
+
+\[ga]Valid\[ga] how many server certificates were valid regarding OCSP.
+
+\[ga]Revoked\[ga] how many server certificates were revoked regarding OCSP.
+
+\[ga]Ignored\[ga] how many server certificates were ignored or had missing OCSP responses.
+\f[R]
+.fi
+.SS \f[V]--stats-server=[FORMAT:]FILE\f[R]
+.PP
+Save Server stats in format \f[V]FORMAT\f[R], in file \f[V]FILE\f[R].
+.PP
+\f[V]FORMAT\f[R] can be \f[V]human\f[R] or \f[V]csv\f[R].
+\f[V]-\f[R] is shorthand for \f[V]stdout\f[R] and \f[V]h\f[R] is
+shorthand for \f[V]human\f[R].
+.PP
+The CSV output format is
+.PP
+Hostname,IP,Scheme,HPKP,NewHPKP,HSTS,CSP
+.IP
+.nf
+\f[C]
+\[ga]Scheme\[ga] 0,1,2 mean \[ga]None\[ga], \[ga]http\[ga], \[ga]https\[ga].
+
+\[ga]HPKP\[ga] values 0,1,2,3 mean \[aq]No HPKP\[aq], \[aq]HPKP matched\[aq], \[aq]HPKP doesn\[aq]t match\[aq], \[aq]HPKP error\[aq].
+
+\[ga]NewHPKP\[ga] whether server sent HPKP (Public-Key-Pins) header.
+
+\[ga]HSTS\[ga] whether server sent HSTS (Strict-Transport-Security) header.
+
+\[ga]CSP\[ga] whether server sent CSP (Content-Security-Policy) header.
+\f[R]
+.fi
+.SS \f[V]--stats-site=[FORMAT:]FILE\f[R]
+.PP
+Save Site stats in format \f[V]FORMAT\f[R], in file \f[V]FILE\f[R].
+.PP
+\f[V]FORMAT\f[R] can be \f[V]human\f[R] or \f[V]csv\f[R].
+\f[V]-\f[R] is shorthand for \f[V]stdout\f[R] and \f[V]h\f[R] is
+shorthand for \f[V]human\f[R].
+.PP
+The CSV output format is
+.PP
+ID,ParentID,URL,Status,Link,Method,Size,SizeDecompressed,TransferTime,ResponseTime,Encoding,Verification
+.IP
+.nf
+\f[C]
+\[ga]ID\[ga] unique ID for a stats record.
+
+\[ga]ParentID\[ga] ID of the parent document, relevant for \[ga]--recursive\[ga] mode.
+
+\[ga]URL\[ga] URL of the document.
+
+\[ga]Status\[ga] HTTP response code or 0 if not applicable.
+
+\[ga]Link\[ga] 1 means \[aq]direct link\[aq], 0 means \[aq]redirection link\[aq].
+
+\[ga]Method\[ga] 1,2,3 mean GET, HEAD, POST request type.
+
+\[ga]Size\[ga] size of downloaded body (theoretical value for HEAD requests).
+
+\[ga]SizeDecompressed\[ga] size of decompressed body (0 for HEAD requests).
+
+\[ga]TransferTime\[ga] ms between start of request and completed download.
+
+\[ga]ResponseTime\[ga] ms between start of request and first response packet.
+
+\[ga]Encoding\[ga] 0,1,2,3,4,5,6,7 mean server side compression was \[aq]identity\[aq], \[aq]gzip\[aq], \[aq]deflate\[aq], \[aq]lzma/xz\[aq], \[aq]bzip2\[aq], \[aq]brotli\[aq], \[aq]zstd\[aq], \[aq]lzip\[aq]
+
+\[ga]Verification\[ga] PGP verification status. 0,1,2,3,4 mean \[aq]none\[aq], \[aq]valid\[aq], \[aq]invalid\[aq], \[aq]bad\[aq], \[aq]missing\[aq].
+\f[R]
+.fi
+.SS Download Options
+.SS \f[V]--bind-address=ADDRESS\f[R]
+.PP
+When making client TCP/IP connections, bind to ADDRESS on the local
+machine.
+ADDRESS may be specified as a hostname or IP address.
+This option can be useful if your machine is bound to multiple IPs.
+.SS \f[V]--bind-interface=INTERFACE\f[R]
+.PP
+When making client TCP/IP connections, bind to INTERFACE on the local
+machine.
+INTERFACE may be specified as the name for a Network Interface.
+This option can be useful if your machine has multiple Network
+Interfaces.
+However, the option works only when wget2 is run with elevated
+privileges (On GNU/Linux: root / sudo or
+\f[V]sudo setcap cap_net_raw+ep <path to wget|wget2>\f[R]).
+.SS \f[V]-t\f[R], \f[V]--tries=number\f[R]
+.PP
+Set number of tries to number.
+Specify 0 or inf for infinite retrying.
+The default is to retry 20 times, with the exception of fatal errors
+like \[lq]connection refused\[rq] or \[lq]not found\[rq] (404), which
+are not retried.
+.SS \f[V]--retry-on-http-error=list\f[R]
+.PP
+Specify a comma-separated list of HTTP codes on which Wget2 will retry
+the download.
+The elements of the list may contain wildcards.
+If an HTTP code starts with the character `!' it won\[cq]t be
+retried.
+This is useful when trying to download something with exceptions.
+For example, retry every failed download if error code is not 404:
+.IP
+.nf
+\f[C]
+ wget2 --retry-on-http-error=*,\[rs]!404 https://example.com/
+\f[R]
+.fi
+.PP
+Please keep in mind that \[lq]200\[rq] is the only forbidden code.
+If it is included on the status list, Wget2 will ignore it.
+The maximum number of download attempts is given by the
+\f[V]--tries\f[R] option.
+.SS \f[V]-O\f[R], \f[V]--output-document=file\f[R]
+.PP
+The documents will not be written to the appropriate files, but all
+will be concatenated together and written to file.
+If \f[V]-\f[R] is used as file, documents will be printed to standard
+output, disabling link conversion.
+Use \f[V]./-\f[R] to print to a file literally named \f[V]-\f[R].
+To not get Wget2 status messages mixed with file content, use
+\f[V]-q\f[R] in combination with \f[V]-O-\f[R] (This is different from
+how Wget 1.x behaves).
+.PP
+Using \f[V]-r\f[R] or \f[V]-p\f[R] with \f[V]-O\f[R] may not work as
+you expect: Wget2 won\[cq]t just download the first file to file and
+then download the rest to their normal names: all downloaded content
+will be placed in file.
+.PP
+A combination with \f[V]-nc\f[R] is only accepted if the given output
+file does not exist.
+.PP
+When used along with the \f[V]-c\f[R] option, Wget2 will attempt to
+continue downloading the file whose name is passed to the option,
+irrespective of whether the actual file already exists on disk or not.
+This allows users to download a file with a temporary name alongside
+the actual file.
+.PP
+Note that a combination with \f[V]-k\f[R] is only permitted when
+downloading a single document, as in that case it will just convert
+all relative URIs to external ones; \f[V]-k\f[R] makes no sense for
+multiple URIs when they\[cq]re all being downloaded to a single file;
+\f[V]-k\f[R] can be used only when the output is a regular file.
+.PP
+Compatibility-Note: Wget 1.x used to treat \f[V]-O\f[R] as analogous
+to shell redirection.
+Wget2 does not handle the option similarly.
+Hence, the file will not always be newly created.
+The file\[cq]s timestamps will not be affected unless it is actually
+written to.
+As a result, both \f[V]-c\f[R] and \f[V]-N\f[R] options are now
+supported in conjunction with this option.
+.SS \f[V]-nc\f[R], \f[V]--no-clobber\f[R]
+.PP
+If a file is downloaded more than once in the same directory,
+Wget2\[cq]s behavior depends on a few options, including \f[V]-nc\f[R].
+In certain cases, the local file will be clobbered, or overwritten,
+upon repeated download.
+In other cases it will be preserved.
+.PP
+When running Wget2 without \f[V]-N\f[R], \f[V]-nc\f[R], \f[V]-r\f[R],
+or \f[V]-p\f[R], downloading the same file in the same directory will
+result in the original copy of file being preserved and the second
+copy being named file.1.
+If that file is downloaded yet again, the third copy will be named
+file.2, and so on.
+(This is also the behavior with \f[V]-nd\f[R], even if \f[V]-r\f[R] or
+\f[V]-p\f[R] are in effect.)
+Use \f[V]--keep-extension\f[R] to use an alternative file naming
+pattern.
+.PP
+When \f[V]-nc\f[R] is specified, this behavior is suppressed, and
+Wget2 will refuse to download newer copies of file.
+Therefore, \[lq]no-clobber\[rq] is actually a misnomer in this mode -
+it\[cq]s not clobbering that\[cq]s prevented (as the numeric suffixes
+were already preventing clobbering), but rather the multiple version
+saving that\[cq]s prevented.
+.PP
+When running Wget2 with \f[V]-r\f[R] or \f[V]-p\f[R], but without
+\f[V]-N\f[R], \f[V]-nd\f[R], or \f[V]-nc\f[R], re-downloading a file
+will result in the new copy simply overwriting the old.
+Adding \f[V]-nc\f[R] will prevent this behavior, instead causing the
+original version to be preserved and any newer copies on the server to
+be ignored.
+.PP
+When running Wget2 with \f[V]-N\f[R], with or without \f[V]-r\f[R] or
+\f[V]-p\f[R], the decision as to whether or not to download a newer
+copy of a file depends on the local and remote timestamp and size of
+the file.
+\f[V]-nc\f[R] may not be specified at the same time as \f[V]-N\f[R].
+.PP
+A combination with \f[V]-O\f[R]/\f[V]--output-document\f[R] is only
+accepted if the given output file does not exist.
+.PP
+Note that when \f[V]-nc\f[R] is specified, files with the suffixes
+.html or .htm will be loaded from the local disk and parsed as if they
+had been retrieved from the Web.
+.SS \f[V]--backups=backups\f[R]
+.PP
+Before (over)writing a file, back up an existing file by adding a .1
+suffix to the file name.
+Such backup files are rotated to .2, .3, and so on, up to
+\f[V]backups\f[R] (and lost beyond that).
+.SS \f[V]-c\f[R], \f[V]--continue\f[R]
+.PP
+Continue getting a partially-downloaded file.
+This is useful when you want to finish up a download started by a
+previous instance of Wget2, or by another program.
+For instance:
+.IP
+.nf
+\f[C]
+ wget2 -c https://example.com/tarball.gz
+\f[R]
+.fi
+.PP
+If there is a file named \f[V]tarball.gz\f[R] in the current
+directory, Wget2 will assume that it is the first portion of the
+remote file, and will ask the server to continue the retrieval from an
+offset equal to the length of the local file.
+.PP
+Note that you don\[cq]t need to specify this option if you just want
+the current invocation of Wget2 to retry downloading a file should the
+connection be lost midway through.
+This is the default behavior.
+\f[V]-c\f[R] only affects resumption of downloads started prior to
+this invocation of Wget2, and whose local files are still sitting
+around.
+.PP
+Without \f[V]-c\f[R], the previous example would just download the
+remote file to \f[V]tarball.gz.1\f[R], leaving the truncated
+\f[V]tarball.gz\f[R] file alone.
+.PP
+If you use \f[V]-c\f[R] on a non-empty file, and it turns out that the
+server does not support continued downloading, Wget2 will refuse to
+start the download from scratch, which would effectively ruin existing
+contents.
+If you really want the download to start from scratch, remove the
+file.
+.PP
+If you use \f[V]-c\f[R] on a file which is of equal size as the one on
+the server, Wget2 will refuse to download the file and print an
+explanatory message.
+The same happens when the file is smaller on the server than locally
+(presumably because it was changed on the server since your last
+download attempt).
+Because \[lq]continuing\[rq] is not meaningful, no download occurs.
+.PP
+On the other side of the coin, while using \f[V]-c\f[R], any file
+that\[cq]s bigger on the server than locally will be considered an
+incomplete download and only \[lq](length(remote) - length(local))\[rq]
+bytes will be downloaded and tacked onto the end of the local file.
+This behavior can be desirable in certain cases.
+For instance, you can use \f[V]wget2 -c\f[R] to download just the new
+portion that\[cq]s been appended to a data collection or log file.
+.PP
+However, if the file is bigger on the server because it\[cq]s been
+changed, as opposed to just appended to, you\[cq]ll end up with a
+garbled file.
+Wget2 has no way of verifying that the local file is really a valid
+prefix of the remote file.
+You need to be especially careful of this when using \f[V]-c\f[R] in
+conjunction with \f[V]-r\f[R], since every file will be considered as
+an \[lq]incomplete download\[rq] candidate.
+.PP
+Another instance where you\[cq]ll get a garbled file if you try to use
+\f[V]-c\f[R] is if you have a lame HTTP proxy that inserts a
+\[lq]transfer interrupted\[rq] string into the local file.
+In the future a \[lq]rollback\[rq] option may be added to deal with
+this case.
+.PP
+Note that \f[V]-c\f[R] only works with HTTP servers that support the
+\[lq]Range\[rq] header.
+.SS \f[V]--start-pos=OFFSET\f[R]
+.PP
+Start downloading at zero-based position \f[V]OFFSET\f[R].
+Offset may be expressed in bytes, kilobytes with the \f[V]k\f[R]
+suffix, or megabytes with the \f[V]m\f[R] suffix, etc.
+.PP
+\f[V]--start-pos\f[R] takes precedence over \f[V]--continue\f[R].
+When \f[V]--start-pos\f[R] and \f[V]--continue\f[R] are both
+specified, Wget2 will emit a warning, then proceed as if
+\f[V]--continue\f[R] was absent.
+.PP
+Server support for continued download is required, otherwise
+\f[V]--start-pos\f[R] cannot help.
+See \f[V]-c\f[R] for details.
+.SS \f[V]--progress=type\f[R]
+.PP
+Select the type of the progress indicator you wish to use.
+Supported indicator types are \f[V]none\f[R] and \f[V]bar\f[R].
+.PP
+Type \f[V]bar\f[R] draws an ASCII progress bar graphic
+(a.k.a.\ \[lq]thermometer\[rq] display) indicating the status of
+retrieval.
+.PP
+If the output is a TTY, \f[V]bar\f[R] is the default.
+Otherwise, the progress bar will be switched off, except when using
+\f[V]--force-progress\f[R].
+.PP
+The type `dot' is currently not supported, but it won\[cq]t trigger an
+error, so existing wget command lines don\[cq]t break.
+.PP
+The parameterized types \f[V]bar:force\f[R] and
+\f[V]bar:force:noscroll\f[R] will add the effect of
+\f[V]--force-progress\f[R].
+These are accepted for better wget compatibility.
+.SS \f[V]--force-progress\f[R]
+.PP
+Force Wget2 to display the progress bar in any verbosity.
+.PP
+By default, Wget2 only displays the progress bar in verbose mode.
+One may, however, want Wget2 to display the progress bar on screen in
+conjunction with any other verbosity modes like \f[V]--no-verbose\f[R]
+or \f[V]--quiet\f[R].
+This is often a desired property when invoking Wget2 to download
+several small/large files.
+In such a case, Wget2 could simply be invoked with this parameter to
+get a much cleaner output on the screen.
+.PP
+This option will also force the progress bar to be printed to stderr
+when used alongside the \f[V]--output-file\f[R] option.
+.SS \f[V]-N\f[R], \f[V]--timestamping\f[R]
+.PP
+Turn on time-stamping.
+.SS \f[V]--no-if-modified-since\f[R]
+.PP
+Do not send the If-Modified-Since header in \f[V]-N\f[R] mode.
+Send a preliminary HEAD request instead.
+This option only has an effect in \f[V]-N\f[R] mode.
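+.PP
+For instance, to fetch a file only if it has changed on the server,
+probing with a HEAD request instead of the If-Modified-Since header
+(the URL is illustrative):
+.IP
+.nf
+\f[C]
+ wget2 -N --no-if-modified-since https://example.com/data.csv
+\f[R]
+.fi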
+.SS \f[V]--no-use-server-timestamps\f[R]
+.PP
+Don\[cq]t set the local file\[cq]s timestamp from the one on the
+server.
+.PP
+By default, when a file is downloaded, its timestamps are set to match
+those from the remote file.
+This allows the use of \f[V]--timestamping\f[R] on subsequent
+invocations of Wget2.
+However, it is sometimes useful to base the local file\[cq]s timestamp
+on when it was actually downloaded; for that purpose, the
+\f[V]--no-use-server-timestamps\f[R] option has been provided.
+.SS \f[V]-S\f[R], \f[V]--server-response\f[R]
+.PP
+Print the response headers sent by HTTP servers.
+.SS \f[V]--spider\f[R]
+.PP
+When invoked with this option, Wget2 will behave as a Web spider,
+which means that it will not download the pages, just check that they
+are there.
+For example, you can use Wget2 to check your bookmarks:
+.IP
+.nf
+\f[C]
+ wget2 --spider --force-html -i bookmarks.html
+\f[R]
+.fi
+.PP
+This feature needs much more work for Wget2 to get close to the
+functionality of real web spiders.
+.SS \f[V]-T seconds\f[R], \f[V]--timeout=seconds\f[R]
+.PP
+Set the network timeout to seconds seconds.
+This is equivalent to specifying \f[V]--dns-timeout\f[R],
+\f[V]--connect-timeout\f[R], and \f[V]--read-timeout\f[R], all at the
+same time.
+.PP
+When interacting with the network, Wget2 can check for timeout and
+abort the operation if it takes too long.
+This prevents anomalies like hanging reads and infinite connects.
+The only timeout enabled by default is a 900-second read timeout.
+Setting a timeout to 0 disables it altogether.
+Unless you know what you are doing, it is best not to change the
+default timeout settings.
+.PP
+All timeout-related options accept decimal values, as well as
+subsecond values.
+For example, 0.1 seconds is a legal (though unwise) choice of timeout.
+Subsecond timeouts are useful for checking server response times or
+for testing network latency.
+.SS \f[V]--dns-timeout=seconds\f[R]
+.PP
+Set the DNS lookup timeout to seconds seconds.
+DNS lookups that don\[cq]t complete within the specified time will
+fail.
+By default, there is no timeout on DNS lookups, other than that
+implemented by system libraries.
+.SS \f[V]--connect-timeout=seconds\f[R]
+.PP
+Set the connect timeout to seconds seconds.
+TCP connections that take longer to establish will be aborted.
+By default, there is no connect timeout, other than that implemented
+by system libraries.
+.SS \f[V]--read-timeout=seconds\f[R]
+.PP
+Set the read (and write) timeout to seconds seconds.
+The \[lq]time\[rq] of this timeout refers to idle time: if, at any
+point in the download, no data is received for more than the specified
+number of seconds, reading fails and the download is restarted.
+This option does not directly affect the duration of the entire
+download.
+.PP
+Of course, the remote server may choose to terminate the connection
+sooner than this option requires.
+The default read timeout is 900 seconds.
+.SS \f[V]--limit-rate=amount\f[R]
+.PP
+Limit the download speed to amount bytes per second.
+Amount may be expressed in bytes, kilobytes with the k suffix, or
+megabytes with the m suffix.
+For example, \f[V]--limit-rate=20k\f[R] will limit the retrieval rate
+to 20KB/s.
+This is useful when, for whatever reason, you don\[cq]t want Wget2 to
+consume the entire available bandwidth.
+.PP
+This option allows the use of decimal numbers, usually in conjunction
+with power suffixes; for example, \f[V]--limit-rate=2.5k\f[R] is a
+legal value.
+.PP
+Note that Wget2 implements the limiting by sleeping the appropriate
+amount of time after a network read that took less time than specified
+by the rate.
+Eventually this strategy causes the TCP transfer to slow down to
+approximately the specified rate.
+However, it may take some time for this balance to be achieved, so
+don\[cq]t be surprised if limiting the rate doesn\[cq]t work well with
+very small files.
+.SS \f[V]-w seconds\f[R], \f[V]--wait=seconds\f[R]
+.PP
+Wait the specified number of seconds between the retrievals.
+Use of this option is recommended, as it lightens the server load by
+making the requests less frequent.
+Instead of in seconds, the time can be specified in minutes using the
+\[lq]m\[rq] suffix, in hours using the \[lq]h\[rq] suffix, or in days
+using the \[lq]d\[rq] suffix.
+.PP
+Specifying a large value for this option is useful if the network or
+the destination host is down, so that Wget2 can wait long enough to
+reasonably expect the network error to be fixed before the retry.
+The waiting interval specified by this function is influenced by
+\f[V]--random-wait\f[R] (see below).
+.SS \f[V]--waitretry=seconds\f[R]
+.PP
+If you don\[cq]t want Wget2 to wait between every retrieval, but only
+between retries of failed downloads, you can use this option.
+Wget2 will use linear backoff, waiting 1 second after the first
+failure on a given file, then waiting 2 seconds after the second
+failure on that file, up to the maximum number of seconds you specify.
+.PP
+By default, Wget2 will assume a value of 10 seconds.
+.SS \f[V]--random-wait\f[R]
+.PP
+Some web sites may perform log analysis to identify retrieval programs
+such as Wget2 by looking for statistically significant similarities in
+the time between requests.
+This option causes the time between requests to vary between 0.5 and
+1.5 * wait seconds, where wait was specified using the \f[V]--wait\f[R]
+option, in order to mask Wget2\[cq]s presence from such analysis.
+.PP
+A 2001 article in a publication devoted to development on a popular
+consumer platform provided code to perform this analysis on the fly.
+Its author suggested blocking at the class C address level to ensure
+automated retrieval programs were blocked despite changing
+DHCP-supplied addresses.
+.PP
+The \f[V]--random-wait\f[R] option was inspired by this ill-advised
+recommendation to block many unrelated users from a web site due to
+the actions of one.
+.SS \f[V]--no-proxy[=exceptions]\f[R]
+.PP
+If no argument is given, we try to stay backward compatible with
+Wget 1.x and don\[cq]t use proxies, even if the appropriate *_proxy
+environment variable is defined.
+.PP
+If a comma-separated list of exceptions (domains/IPs) is given, these
+exceptions are accessed without using a proxy.
+It overrides the `no_proxy' environment variable.
+.SS \f[V]-Q quota\f[R], \f[V]--quota=quota\f[R]
+.PP
+Specify download quota for automatic retrievals.
+The value can be specified in bytes (default), kilobytes (with k
+suffix), or megabytes (with m suffix).
+.PP
+Note that quota will never affect downloading a single file.
+So if you specify
+.IP
+.nf
+\f[C]
+ wget2 -Q10k https://example.com/bigfile.gz
+\f[R]
+.fi
+.PP
+all of the \f[V]bigfile.gz\f[R] will be downloaded.
+The same goes even when several URLs are specified on the
+command-line.
+However, quota is respected when retrieving either recursively, or
+from an input file.
+Thus you may safely type
+.IP
+.nf
+\f[C]
+ wget2 -Q2m -i sites
+\f[R]
+.fi
+.PP
+The download will be aborted when the quota is exceeded.
+.PP
+Setting quota to \f[V]0\f[R] or to \f[V]inf\f[R] unlimits the download
+quota.
+.SS \f[V]--restrict-file-names=modes\f[R]
+.PP
+Change which characters found in remote URLs must be escaped during
+generation of local filenames.
+Characters that are restricted by this option are escaped,
+i.e.\ replaced with %HH, where HH is the hexadecimal number that
+corresponds to the restricted character.
+This option may also be used to force all alphabetical cases to be
+either lower- or uppercase.
+.PP
+By default, Wget2 escapes the characters that are not valid or safe as
+part of file names on your operating system, as well as control
+characters that are typically unprintable.
+This option is useful for changing these defaults, perhaps because you
+are downloading to a non-native partition, or because you want to
+disable escaping of the control characters, or you want to further
+restrict characters to only those in the ASCII range of values.
+.PP
+The modes are a comma-separated set of text values.
+The acceptable values are unix, windows, nocontrol, ascii, lowercase,
+and uppercase.
+The values unix and windows are mutually exclusive (one will override
+the other), as are lowercase and uppercase.
+Those last are special cases, as they do not change the set of
+characters that would be escaped, but rather force local file paths to
+be converted either to lower- or uppercase.
+.PP
+When \[lq]unix\[rq] is specified, Wget2 escapes the character / and
+the control characters in the ranges 0\[en]31 and 128\[en]159.
+This is the default on Unix-like operating systems.
+.PP
+When \[lq]windows\[rq] is given, Wget2 escapes the characters \[rs],
+|, /, :, ?, \[dq], *, <, >, and the control characters in the ranges
+0\[en]31 and 128\[en]159.
+In addition to this, Wget2 in Windows mode uses + instead of : to
+separate host and port in local file names, and uses \[at] instead of
+? to separate the query portion of the file name from the rest.
+Therefore, a URL that would be saved as
+\f[V]www.xemacs.org:4300/search.pl?input=blah\f[R] in Unix mode would
+be saved as \f[V]www.xemacs.org+4300/search.pl\[at]input=blah\f[R] in
+Windows mode.
+This mode is the default on Windows.
+.PP
+If you specify nocontrol, then the escaping of the control characters
+is also switched off.
+This option may make sense when you are downloading URLs whose names
+contain UTF-8 characters, on a system which can save and display
+filenames in UTF-8 (some possible byte values used in UTF-8 byte
+sequences fall in the range of values designated by Wget2 as
+\[lq]controls\[rq]).
+.PP
+The ascii mode is used to specify that any bytes whose values are
+outside the range of ASCII characters (that is, greater than 127)
+shall be escaped.
+This can be useful when saving filenames whose encoding does not match
+the one used locally.
+.SS \f[V]-4\f[R], \f[V]--inet4-only\f[R], \f[V]-6\f[R], \f[V]--inet6-only\f[R]
+.PP
+Force connecting to IPv4 or IPv6 addresses.
+With \f[V]--inet4-only\f[R] or \f[V]-4\f[R], Wget2 will only connect
+to IPv4 hosts, ignoring AAAA records in DNS, and refusing to connect
+to IPv6 addresses specified in URLs.
+Conversely, with \f[V]--inet6-only\f[R] or \f[V]-6\f[R], Wget2 will
+only connect to IPv6 hosts and ignore A records and IPv4 addresses.
+.PP
+Neither option should normally be needed.
+By default, an IPv6-aware Wget2 will use the address family specified
+by the host\[cq]s DNS record.
+If the DNS responds with both IPv4 and IPv6 addresses, Wget2 will try
+them in sequence until it finds one it can connect to.
+(Also see the \f[V]--prefer-family\f[R] option described below.)
+.PP
+These options can be used to deliberately force the use of IPv4 or
+IPv6 address families on dual family systems, usually to aid debugging
+or to deal with broken network configuration.
+Only one of \f[V]--inet6-only\f[R] and \f[V]--inet4-only\f[R] may be
+specified at the same time.
+Neither option is available in Wget2 compiled without IPv6 support.
+.SS \f[V]--prefer-family=none/IPv4/IPv6\f[R]
+.PP
+When given a choice of several addresses, connect to the addresses
+with the specified address family first.
+The address order returned by DNS is used without change by default.
+.PP
+This avoids spurious errors and connect attempts when accessing hosts
+that resolve to both IPv6 and IPv4 addresses from IPv4 networks.
+For example, www.kame.net resolves to
+2001:200:0:8002:203:47ff:fea5:3085 and to 203.178.141.194.
+When the preferred family is \[lq]IPv4\[rq], the IPv4 address is used
+first; when the preferred family is \[lq]IPv6\[rq], the IPv6 address
+is used first; if the specified value is \[lq]none\[rq], the address
+order returned by DNS is used without change.
+.PP
+Unlike -4 and -6, this option doesn\[cq]t inhibit access to any
+address family; it only changes the order in which the addresses are
+accessed.
+Also note that the reordering performed by this option is stable.
+It doesn\[cq]t affect order of addresses of the same family.
+That is, the relative order of all IPv4 addresses and of all IPv6
+addresses remains intact in all cases.
+.SS \f[V]--tcp-fastopen\f[R]
+.PP
+Enable support for TCP Fast Open (TFO) (default: on).
+.PP
+TFO reduces connection latency by one round trip on \[lq]hot\[rq]
+connections (2nd+ connection to the same host in a certain amount of
+time).
+.PP
+Currently this works on recent Linux and macOS kernels, on HTTP and
+HTTPS.
+.SS \f[V]--dns-cache-preload=file\f[R]
+.PP
+Load a list of IP / Name tuples into the DNS cache.
+.PP
+The format of \f[V]file\f[R] is like \f[V]/etc/hosts\f[R]: IP-address
+whitespace Name
+.PP
+This saves domain name lookup time, which is a bottleneck in some use
+cases.
+Also, the use of HOSTALIASES (which is not portable) can be mimicked
+by this option.
+.SS \f[V]--dns-cache\f[R]
+.PP
+Enable DNS caching (default: on).
+.PP
+Normally, Wget2 remembers the IP addresses it looked up from DNS so it
+doesn\[cq]t have to repeatedly contact the DNS server for the same
+(typically small) set of hosts it retrieves from.
+This cache exists in memory only; a new Wget2 run will contact DNS
+again.
+.PP
+However, it has been reported that in some situations it is not
+desirable to cache host names, even for the duration of a
+short-running application like Wget2.
+With \f[V]--no-dns-cache\f[R] Wget2 issues a new DNS lookup (more
+precisely, a new call to \[lq]gethostbyname\[rq] or
+\[lq]getaddrinfo\[rq]) each time it makes a new connection.
+Please note that this option will not affect caching that might be
+performed by the resolving library or by an external caching layer,
+such as NSCD.
+.SS \f[V]--retry-connrefused\f[R]
+.PP
+Consider \[lq]connection refused\[rq] a transient error and try again.
+Normally Wget2 gives up on a URL when it is unable to connect to the
+site because failure to connect is taken as a sign that the server is
+not running at all and that retries would not help.
+This option is for mirroring unreliable sites whose servers tend to
+disappear for short periods of time.
+.SS \f[V]--user=user\f[R], \f[V]--password=password\f[R]
+.PP
+Specify the username user and password password for HTTP file
+retrieval.
+This overrides the lookup of credentials in the .netrc file
+(\f[V]--netrc\f[R] is enabled by default).
+These parameters can be overridden using the \f[V]--http-user\f[R] and
+\f[V]--http-password\f[R] options for HTTP(S) connections.
+.PP
+If neither \f[V]--http-proxy-user\f[R] nor
+\f[V]--http-proxy-password\f[R] is given, these settings are also used
+for proxy authentication.
+.SS \f[V]--ask-password\f[R]
+.PP
+Prompt for a password on the command line.
+Overrides the password set by \f[V]--password\f[R] (if any).
+.SS \f[V]--use-askpass=command\f[R]
+.PP
+Prompt for a user and password using the specified command.
+Overrides the user and/or password set by
+\f[V]--user\f[R]/\f[V]--password\f[R] (if any).
+.SS \f[V]--no-iri\f[R]
+.PP
+Turn off internationalized URI (IRI) support.
+Use \f[V]--iri\f[R] to turn it on.
+IRI support is activated by default.
+.PP
+You can set the default state of IRI support using the \[lq]iri\[rq]
+command in \f[V].wget2rc\f[R].
+That setting may be overridden from the command line.
+.SS \f[V]--local-encoding=encoding\f[R]
+.PP
+Force Wget2 to use encoding as the default system encoding.
+That affects how Wget2 converts URLs specified as arguments from
+locale to UTF-8 for IRI support.
+.PP
+Wget2 uses the function \[lq]nl_langinfo()\[rq] and then the
+\[lq]CHARSET\[rq] environment variable to get the locale.
+If it fails, ASCII is used.
+.SS \f[V]--remote-encoding=encoding\f[R]
+.PP
+Force Wget2 to use encoding as the default remote server encoding.
+That affects how Wget2 converts URIs found in files from remote
+encoding to UTF-8 during a recursive fetch.
+This option is only useful for IRI support, for the interpretation of
+non-ASCII characters.
+.PP
+For HTTP, the remote encoding can be found in the HTTP
+\[lq]Content-Type\[rq] header and in the HTML
+\[lq]Content-Type http-equiv\[rq] meta tag.
+.SS \f[V]--input-encoding=encoding\f[R]
+.PP
+Use the specified encoding for the URLs read from
+\f[V]--input-file\f[R].
+The default is the local encoding.
+.SS \f[V]--unlink\f[R]
+.PP
+Force Wget2 to unlink a file instead of clobbering an existing file.
+This option is useful for downloading to a directory with hardlinks.
+.SS \f[V]--cut-url-get-vars\f[R]
+.PP
+Remove HTTP GET variables from URLs.
+For example, \[lq]main.css?v=123\[rq] will be changed to
+\[lq]main.css\[rq].
+Be aware that this may have unintended side effects, for example
+\[lq]image.php?name=sun\[rq] will be changed to \[lq]image.php\[rq].
+The cutting happens before adding the URL to the download queue.
+.SS \f[V]--cut-file-get-vars\f[R]
+.PP
+Remove HTTP GET variables from filenames.
+For example, \[lq]main.css?v=123\[rq] will be changed to
+\[lq]main.css\[rq].
+.PP
+Be aware that this may have unintended side effects, for example
+\[lq]image.php?name=sun\[rq] will be changed to \[lq]image.php\[rq].
+The cutting happens when saving the file, after downloading.
+.PP
+File names obtained from a \[lq]Content-Disposition\[rq] header are
+not affected by this setting (see \f[V]--content-disposition\f[R]),
+which can be a solution for this problem.
+.PP
+When \f[V]--trust-server-names\f[R] is used, the redirection URL is
+affected by this setting.
+.SS \f[V]--chunk-size=size\f[R]
+.PP
+Download large files in multithreaded chunks.
+This switch specifies the size of the chunks, given in bytes if no
+other byte-multiple unit is specified.
+By default it is set to 0 (off).
+.SS \f[V]--max-threads=number\f[R]
+.PP
+Specifies the maximum number of concurrent download threads for a
+resource.
+The default is 5, but if you want to allow more or fewer, this is the
+option to use.
+.SS \f[V]-s\f[R], \f[V]--verify-sig[=fail|no-fail]\f[R]
+.PP
+Enable PGP signature verification (when not prefixed with
+\f[V]no-\f[R]).
+When enabled, Wget2 will attempt to download and verify PGP signatures
+against their corresponding files.
+Any file downloaded that has a content type beginning with
+\f[V]application/\f[R] will cause Wget2 to request the signature for
+that file.
+.PP
+The name of the signature file is computed by appending the extension
+to the full path of the file that was just downloaded.
+The extension used is defined by the \f[V]--signature-extensions\f[R]
+option.
+If the content type for the signature request is
+\f[V]application/pgp-signature\f[R], Wget2 will attempt to verify the
+signature against the original file.
+By default, if a signature file cannot be found (i.e.\ the request for
+it gets a 404 status code) Wget2 will exit with an error code.
+.PP
+This behavior can be tuned using the following arguments:
+.IP \[bu] 2
+\f[V]fail\f[R]: This is the default, meaning that this is the value
+when you supply the flag without an argument.
+Indicates that missing signature files will cause Wget2 to exit with
+an error code.
+.IP \[bu] 2
+\f[V]no-fail\f[R]: This value allows missing signature files.
+A 404 message will still be issued, but the program will exit normally
+(assuming no unrelated errors).
+.PP
+Additionally, \f[V]--no-verify-sig\f[R] disables signature checking
+altogether.
+\f[V]--no-verify-sig\f[R] does not allow any arguments.
+.SS \f[V]--signature-extensions\f[R]
+.PP
+Specify the file extensions for signature files, without the leading
+\[lq].\[rq].
+You may specify multiple extensions as a comma-separated list.
+All the provided extensions will be tried simultaneously when looking
+for the signature file.
+The default is \[lq]sig\[rq].
+.SS \f[V]--gnupg-homedir\f[R]
+.PP
+Specifies the GnuPG home directory to use when verifying PGP
+signatures on downloaded files.
+The default for this is your system\[cq]s default home directory.
+.SS \f[V]--verify-save-failed\f[R]
+.PP
+Instructs Wget2 to keep files that don\[cq]t pass PGP signature
+validation.
+The default is to delete files that fail validation.
+.SS \f[V]--xattr\f[R]
+.PP
+Saves document metadata as \[lq]user POSIX Extended Attributes\[rq]
+(default: on).
+This feature only works if the file system supports it.
+More info at https://freedesktop.org/wiki/CommonExtendedAttributes.
+.PP
+Wget2 currently sets
+.IP \[bu] 2
+user.xdg.origin.url
+.IP \[bu] 2
+user.xdg.referrer.url
+.IP \[bu] 2
+user.mime_type
+.IP \[bu] 2
+user.charset
+.PP
+To display the extended attributes of a file (Linux):
+\f[V]getfattr -d <file>\f[R]
+.SS \f[V]--metalink\f[R]
+.PP
+Follow/process metalink URLs without saving them (default: on).
+.PP
+Metalink files describe downloads including mirrors, files, checksums,
+and signatures.
+This allows chunked downloads, automatically taking the nearest
+mirrors, preferring the fastest mirrors, and checking the download for
+integrity.
+.SS \f[V]--fsync-policy\f[R]
+.PP
+Enables disk syncing after each write (default: off).
+.SS \f[V]--http2-request-window=number\f[R]
+.PP
+Set the maximum number of parallel streams per HTTP/2 connection
+(default: 30).
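+.PP
+For example, to halve the number of parallel HTTP/2 streams during a
+recursive download (the value and URL are illustrative):
+.IP
+.nf
+\f[C]
+ wget2 --http2-request-window=15 -r https://example.com/
+\f[R]
+.fi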
+.SS \f[V]--keep-extension\f[R]
+.PP
+This option changes the behavior for creating a unique filename if a
+file already exists.
+.PP
+The standard (default) pattern for file names is
+\f[V]<filename>.<N>\f[R], the new pattern is
+\f[V]<basename>_<N>.<ext>\f[R].
+.PP
+The idea is to use such files without renaming when their use depends
+on the extension, as on Windows.
+.PP
+This option does not change the behavior of \f[V]--backups\f[R].
+.SS Directory Options
+.SS \f[V]-nd\f[R], \f[V]--no-directories\f[R]
+.PP
+Do not create a hierarchy of directories when retrieving recursively.
+With this option turned on, all files will get saved to the current
+directory, without clobbering (if a name shows up more than once, the
+filenames will get extensions .n).
+.SS \f[V]-x\f[R], \f[V]--force-directories\f[R]
+.PP
+The opposite of \f[V]-nd\f[R]: create a hierarchy of directories, even
+if one would not have been created otherwise.
+E.g.
+\f[V]wget2 -x https://example.com/robots.txt\f[R] will save the
+downloaded file to \f[V]example.com/robots.txt\f[R].
+.SS \f[V]-nH\f[R], \f[V]--no-host-directories\f[R]
+.PP
+Disable generation of host-prefixed directories.
+By default, invoking Wget2 with \f[V]-r https://example.com/\f[R] will
+create a structure of directories beginning with
+\f[V]example.com/\f[R].
+This option disables such behavior.
+.SS \f[V]--protocol-directories\f[R]
+.PP
+Use the protocol name as a directory component of local file names.
+For example, with this option, \f[V]wget2 -r https://example.com\f[R]
+will save to \f[V]https/example.com/...\f[R] rather than just to
+\f[V]example.com/...\f[R].
+.SS \f[V]--cut-dirs=number\f[R]
+.PP
+Ignore a number of directory components.
+This is useful for getting a fine-grained control over the directory
+where recursive retrieval will be saved.
+.PP
+Take, for example, the directory at https://example.com/pub/sub/.
+If you retrieve it with \f[V]-r\f[R], it will be saved locally under
+\f[V]example.com/pub/sub/\f[R].
+While the \f[V]-nH\f[R] option can remove the \f[V]example.com/\f[R]
+part, you are still stuck with \f[V]pub/sub/\f[R].
+This is where \f[V]--cut-dirs\f[R] comes in handy; it makes Wget2 not
+\[lq]see\[rq] a number of remote directory components.
+Here are several examples of how the \f[V]--cut-dirs\f[R] option
+works:
+.IP
+.nf
+\f[C]
+No options        -> example.com/pub/sub/
+--cut-dirs=1      -> example.com/sub/
+--cut-dirs=2      -> example.com/
+-nH               -> pub/sub/
+-nH --cut-dirs=1  -> sub/
+-nH --cut-dirs=2  -> .
+\f[R]
+.fi
+.PP
+If you just want to get rid of the directory structure, this option is
+similar to a combination of \f[V]-nd\f[R] and \f[V]-P\f[R].
+However, unlike \f[V]-nd\f[R], \f[V]--cut-dirs\f[R] does not lose
+subdirectories.
+For instance, with \f[V]-nH --cut-dirs=1\f[R], a \f[V]beta/\f[R]
+subdirectory will be placed in \f[V]sub/beta/\f[R], as one would
+expect.
+.SS \f[V]-P prefix\f[R], \f[V]--directory-prefix=prefix\f[R]
+.PP
+Set directory prefix to prefix.
+The directory prefix is the directory where all other files and
+subdirectories will be saved to, i.e.\ the top of the retrieval tree.
+The default is \f[V].\f[R], the current directory.
+If the directory \f[V]prefix\f[R] doesn\[cq]t exist, it will be
+created.
+.SS HTTP Options
+.SS \f[V]--default-page=name\f[R]
+.PP
+Use name as the default file name when it isn\[cq]t known (i.e., for
+URLs that end in a slash), instead of \f[V]index.html\f[R].
+.SS \f[V]--default-http-port=port\f[R]
+.PP
+Set the default port for HTTP URLs (default: 80).
+.PP
+This is mainly for testing purposes.
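+.PP
+For example, when testing against a local server listening on a
+non-standard port, one might let plain HTTP URLs default to port 8080
+(host and port are illustrative):
+.IP
+.nf
+\f[C]
+ wget2 --default-http-port=8080 http://localhost/index.html
+\f[R]
+.fi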
+.SS \f[V]--default-https-port=port\f[R]
+.PP
+Set the default port for HTTPS URLs (default: 443).
+.PP
+This is mainly for testing purposes.
+.SS \f[V]-E\f[R], \f[V]--adjust-extension\f[R]
+.PP
+If a file of type \f[V]application/xhtml+xml\f[R] or
+\f[V]text/html\f[R] is downloaded and the URL does not end with the
+regexp \f[V]\[rs].[Hh][Tt][Mm][Ll]?\f[R], this option will cause the
+suffix \f[V].html\f[R] to be appended to the local filename.
+This is useful, for instance, when you\[cq]re mirroring a remote site
+that uses .asp pages, but you want the mirrored pages to be viewable
+on your stock Apache server.
+Another good use for this is when you\[cq]re downloading CGI-generated
+materials.
+A URL like \f[V]https://example.com/article.cgi?25\f[R] will be saved
+as \f[V]article.cgi?25.html\f[R].
+.PP
+Note that filenames changed in this way will be re-downloaded every
+time you re-mirror a site, because Wget2 can\[cq]t tell that the local
+\f[V]X.html\f[R] file corresponds to remote URL X (since it
+doesn\[cq]t yet know that the URL produces output of type
+\f[V]text/html\f[R] or \f[V]application/xhtml+xml\f[R]).
+.PP
+Wget2 will also ensure that any downloaded files of type
+\f[V]text/css\f[R] end in the suffix \f[V].css\f[R].
+.PP
+At some point in the future, this option may well be expanded to
+include suffixes for other types of content, including content types
+that are not parsed by Wget2.
+.SS \f[V]--http-user=user\f[R], \f[V]--http-password=password\f[R]
+.PP
+Specify the user and password for HTTP authentication.
+According to the type of the challenge, Wget2 will encode them using
+either the \[lq]basic\[rq] (insecure), the \[lq]digest\[rq], or the
+Windows \[lq]NTLM\[rq] authentication scheme.
+.PP
+If possible, put your credentials into \f[V]\[ti]/.netrc\f[R] (see
+also \f[V]--netrc\f[R] and \f[V]--netrc-file\f[R] options) or into
+\f[V].wget2rc\f[R].
+This is far more secure than using the command line, which can be seen
+by any other user.
+If the passwords are really important, do not leave them lying in
+those files either.
+Edit the files and delete them after Wget2 has started the download.
+.PP
+In \f[V]\[ti]/.netrc\f[R] passwords may be double-quoted to allow
+spaces.
+Also, escape characters with a backslash if needed.
+A backslash in a password always needs to be escaped, so use
+\f[V]\[rs]\[rs]\f[R] instead of a single \f[V]\[rs]\f[R].
+.PP
+Also see \f[V]--use-askpass\f[R] and \f[V]--ask-password\f[R] for an
+interactive method to provide your password.
+.SS \f[V]--http-proxy-user=user\f[R], \f[V]--http-proxy-password=password\f[R]
+.PP
+Specify the user and password for HTTP proxy authentication.
+See \f[V]--http-user\f[R] for details.
+.SS \f[V]--http-proxy=proxies\f[R]
+.PP
+Set comma-separated list of HTTP proxies.
+The environment variable `http_proxy' will be overridden.
+.PP
+Exceptions can be set via the environment variable `no_proxy' or via
+\f[V]--no-proxy\f[R].
+.SS \f[V]--https-proxy=proxies\f[R]
+.PP
+Set comma-separated list of HTTPS proxies.
+The environment variable `https_proxy' will be overridden.
+.PP
+Exceptions can be set via the environment variable `no_proxy' or via
+\f[V]--no-proxy\f[R].
+.SS \f[V]--no-http-keep-alive\f[R]
+.PP
+Turn off the \[lq]keep-alive\[rq] feature for HTTP(S) downloads.
+Normally, Wget2 asks the server to keep the connection open so that,
+when you download more than one document from the same server, they
+get transferred over the same TCP connection.
+This saves time and reduces the load on the server.
+.PP
+This option is useful when, for some reason, persistent (keep-alive)
+connections don\[cq]t work for you, for example due to a server bug or
+due to the inability of server-side scripts to cope with the
+connections.
+.SS \f[V]--no-cache\f[R]
+.PP
+Disable server-side cache.
+In this case, Wget2 will send the remote server appropriate directives
+(Cache-Control: no-cache and Pragma: no-cache) to get the file from
+the remote service, rather than returning the cached version.
+This is especially useful for retrieving and flushing out-of-date
+documents on proxy servers.
+.PP
+Caching is allowed by default.
+.SS \f[V]--no-cookies\f[R]
+.PP
+Disable the use of cookies.
+Cookies are a mechanism for maintaining server-side state.
+The server sends the client a cookie using the \[lq]Set-Cookie\[rq]
+header, and the client responds with the same cookie upon further
+requests.
+Since cookies allow the server owners to keep track of visitors and
+for sites to exchange this information, some consider them a breach of
+privacy.
+The default is to use cookies; however, storing cookies is not on by
+default.
+.SS \f[V]--load-cookies file\f[R]
+.PP
+Load cookies from \f[V]file\f[R] before the first HTTP(S) retrieval.
+file is a textual file in the format originally used by Netscape\[cq]s
+cookies.txt file.
+.PP
+You will typically use this option when mirroring sites that require
+that you be logged in to access some or all of their content.
+The login process typically works by the web server issuing an HTTP
+cookie upon receiving and verifying your credentials.
+The cookie is then resent by the browser when accessing that part of
+the site, and so proves your identity.
+.PP
+Mirroring such a site requires Wget2 to send the same cookies your
+browser sends when communicating with the site.
+This is achieved by \f[V]--load-cookies\f[R]: simply point Wget2 to
+the location of the cookies.txt file, and it will send the same
+cookies your browser would send in the same situation.
+Different browsers keep textual cookie files in different locations:
+.PP
+\[lq]Netscape 4.x.\[rq] The cookies are in \[ti]/.netscape/cookies.txt.
+.PP
+\[lq]Mozilla and Netscape 6.x.\[rq] Mozilla\[cq]s cookie file is also
+named cookies.txt, located somewhere under \[ti]/.mozilla, in the
+directory of your profile.
+The full path usually ends up looking somewhat like
+\[ti]/.mozilla/default/some-weird-string/cookies.txt.
+.PP
+\[lq]Internet Explorer.\[rq] You can produce a cookie file Wget2 can
+use by using the File menu, Import and Export, Export Cookies.
+This has been tested with Internet Explorer 5; it is not guaranteed to
+work with earlier versions.
+.PP
+\[lq]Other browsers.\[rq] If you are using a different browser to
+create your cookies, \f[V]--load-cookies\f[R] will only work if you
+can locate or produce a cookie file in the Netscape format that Wget2
+expects.
+.PP
+If you cannot use \f[V]--load-cookies\f[R], there might still be an
+alternative.
+If your browser supports a \[lq]cookie manager\[rq], you can use it to
+view the cookies used when accessing the site you\[cq]re mirroring.
+Write down the name and value of the cookie, and manually instruct
+Wget2 to send those cookies, bypassing the \[lq]official\[rq] cookie
+support:
+.IP
+.nf
+\f[C]
+ wget2 --no-cookies --header \[dq]Cookie: <name>=<value>\[dq]
+\f[R]
+.fi
+.SS \f[V]--save-cookies file\f[R]
+.PP
+Save cookies to \f[V]file\f[R] before exiting.
+This will not save cookies that have expired or that have no expiry time
+(so-called \[lq]session cookies\[rq]), but also see
+\f[V]--keep-session-cookies\f[R].
+.SS \f[V]--keep-session-cookies\f[R]
+.PP
+When specified, causes \f[V]--save-cookies\f[R] to also save session
+cookies.
+Session cookies are normally not saved because they are meant to be kept
+in memory and forgotten when you exit the browser.
+Saving them is useful on sites that require you to log in or to visit
+the home page before you can access some pages.
+With this option, multiple Wget2 runs are considered a single browser
+session as far as the site is concerned.
+.PP
+Since the cookie file format does not normally carry session cookies,
+Wget2 marks them with an expiry timestamp of 0.
+Wget2\[cq]s \f[V]--load-cookies\f[R] recognizes those as session
+cookies, but it might confuse other browsers.
+Also note that cookies so loaded will be treated as other session
+cookies, which means that if you want \f[V]--save-cookies\f[R] to
+preserve them again, you must use \f[V]--keep-session-cookies\f[R]
+again.
+.SS \f[V]--cookie-suffixes=file\f[R]
+.PP
+Load the public suffixes used for cookie checking from the given file.
+.PP
+Normally, the underlying libpsl loads this data from a system file or it
+has the data built in.
+In some cases you might want to load an updated PSL, e.g.\ from
+https://publicsuffix.org/list/public_suffix_list.dat.
+.PP
+The PSL makes it possible to prevent the setting of
+\[lq]super-cookies\[rq], which lead to cookie privacy leakage.
+More details can be found on https://publicsuffix.org/.
+.SS \f[V]--ignore-length\f[R]
+.PP
+Unfortunately, some HTTP servers (CGI programs, to be more precise) send
+out bogus \[lq]Content-Length\[rq] headers, which makes Wget2 go wild,
+as it thinks not all of the document was retrieved.
+You can spot this syndrome if Wget2 retries getting the same document
+again and again, each time claiming that the (otherwise normal)
+connection has closed on the very same byte.
+.PP
+With this option, Wget2 will ignore the \[lq]Content-Length\[rq] header
+as if it never existed.
+.SS \f[V]--header=header-line\f[R]
+.PP
+Send header-line along with the rest of the headers in each HTTP
+request.
+The supplied header is sent as-is, which means it must contain name and
+value separated by a colon, and must not contain newlines.
+.PP
+You may define more than one additional header by specifying
+\f[V]--header\f[R] more than once.
+.IP
+.nf
+\f[C]
+    wget2 --header=\[aq]Accept-Charset: iso-8859-2\[aq] \[rs]
+        --header=\[aq]Accept-Language: hr\[aq] \[rs]
+        https://example.com/
+\f[R]
+.fi
+.PP
+Specification of an empty string as the header value will clear all
+previous user-defined headers.
+.PP
+This option can be used to override headers otherwise generated
+automatically.
+This example instructs Wget2 to connect to localhost, but to specify
+\f[V]example.com\f[R] in the \[lq]Host\[rq] header:
+.IP
+.nf
+\f[C]
+    wget2 --header=\[dq]Host: example.com\[dq] http://localhost/
+\f[R]
+.fi
+.SS \f[V]--max-redirect=number\f[R]
+.PP
+Specifies the maximum number of redirections to follow for a resource.
+The default is 20, which is usually far more than necessary.
+However, on those occasions where you want to allow more (or fewer),
+this is the option to use.
+.SS \f[V]--proxy-user=user\f[R], \f[V]--proxy-password=password\f[R] [Not implemented, use \f[V]--http-proxy-password\f[R]]
+.PP
+Specify the username user and password password for authentication on a
+proxy server.
+Wget2 will encode them using the \[lq]basic\[rq] authentication scheme.
+.PP
+Security considerations similar to those with \f[V]--http-password\f[R]
+pertain here as well.
+.SS \f[V]--referer=url\f[R]
+.PP
+Include the \[ga]Referer: url\[cq] header in the HTTP request.
+Useful for retrieving documents with server-side processing that assume
+they are always being retrieved by interactive web browsers and only
+come out properly when Referer is set to one of the pages that point to
+them.
+.SS \f[V]--save-headers\f[R]
+.PP
+Save the headers sent by the HTTP server to the file, preceding the
+actual contents, with an empty line as the separator.
+.SS \f[V]-U agent-string\f[R], \f[V]--user-agent=agent-string\f[R]
+.PP
+Identify as agent-string to the HTTP server.
+.PP
+The HTTP protocol allows the clients to identify themselves using a
+\[lq]User-Agent\[rq] header field.
+This enables distinguishing the WWW software, usually for statistical
+purposes or for tracing of protocol violations.
+Wget normally identifies as Wget/version, version being the current
+version number of Wget.
+.PP
+However, some sites have been known to impose the policy of tailoring
+the output according to the \[lq]User-Agent\[rq]-supplied information.
+While this is not such a bad idea in theory, it has been abused by
+servers denying information to clients other than (historically)
+Netscape or, more frequently, Microsoft Internet Explorer.
+This option allows you to change the \[lq]User-Agent\[rq] line issued by
+Wget2.
+Use of this option is discouraged, unless you really know what you are
+doing.
+.PP
+Specifying an empty user agent with \f[V]--user-agent=\[dq]\[dq]\f[R]
+instructs Wget2 not to send the \[lq]User-Agent\[rq] header in HTTP
+requests.
+.SS \f[V]--post-data=string\f[R], \f[V]--post-file=file\f[R]
+.PP
+Use POST as the method for all HTTP requests and send the specified data
+in the request body.
+\f[V]--post-data\f[R] sends string as data, whereas
+\f[V]--post-file\f[R] sends the contents of file.
+Other than that, they work in exactly the same way.
+In particular, they both expect content of the form
+\[lq]key1=value1&key2=value2\[rq], with percent-encoding for special
+characters; the only difference is that one expects its content as a
+command-line parameter and the other accepts its content from a file.
+In particular, \f[V]--post-file\f[R] is not for transmitting files as
+form attachments: those must appear as \[lq]key=value\[rq] data (with
+appropriate percent-encoding) just like everything else.
+Wget2 does not currently support \[lq]multipart/form-data\[rq] for
+transmitting POST data; only
+\[lq]application/x-www-form-urlencoded\[rq].
+Only one of \f[V]--post-data\f[R] and \f[V]--post-file\f[R] should be
+specified.
+.PP
+Please note that wget2 does not require the content to be of the form
+\[lq]key1=value1&key2=value2\[rq], and neither does it test for it.
+Wget2 will simply transmit whatever data is provided to it.
+Most servers however expect the POST data to be in the above format when
+processing HTML Forms.
+.PP
+When sending a POST request using the \f[V]--post-file\f[R] option,
+Wget2 treats the file as a binary file and will send every character in
+the POST request without stripping trailing newline or formfeed
+characters.
+Any other control characters in the text will also be sent as-is in the
+POST request.
+.PP
+Please be aware that Wget2 needs to know the size of the POST data in
+advance.
+Therefore the argument to \f[V]--post-file\f[R] must be a regular file;
+specifying a FIFO or something like /dev/stdin won\[cq]t work.
+It\[cq]s not quite clear how to work around this limitation inherent in
+HTTP/1.0.
+Although HTTP/1.1 introduces chunked transfer that doesn\[cq]t require
+knowing the request length in advance, a client can\[cq]t use chunked
+unless it knows it\[cq]s talking to an HTTP/1.1 server.
+And it can\[cq]t know that until it receives a response, which in turn
+requires the request to have been completed \[en] a chicken-and-egg
+problem.
+.PP
+If Wget2 is redirected after the POST request is completed, its
+behaviour depends on the response code returned by the server.
+In case of a 301 Moved Permanently, 302 Moved Temporarily or 307
+Temporary Redirect, Wget2 will, in accordance with RFC 2616, continue to
+send a POST request.
+In case a server wants the client to change the request method upon
+redirection, it should send a 303 See Other response code.
+.PP
+This example shows how to log in to a server using POST and then proceed
+to download the desired pages, presumably only accessible to authorized
+users:
+.IP
+.nf
+\f[C]
+    # Log in to the server.  This can be done only once.
+    wget2 --save-cookies cookies.txt \[rs]
+        --post-data \[aq]user=foo&password=bar\[aq] \[rs]
+        http://example.com/auth.php
+
+    # Now grab the page or pages we care about.
+    wget2 --load-cookies cookies.txt \[rs]
+        -p http://example.com/interesting/article.php
+\f[R]
+.fi
+.PP
+If the server is using session cookies to track user authentication, the
+above will not work because \f[V]--save-cookies\f[R] will not save them
+(and neither will browsers) and the cookies.txt file will be empty.
+In that case use \f[V]--keep-session-cookies\f[R] along with
+\f[V]--save-cookies\f[R] to force saving of session cookies.
+.SS \f[V]--method=HTTP-Method\f[R]
+.PP
+For the purpose of RESTful scripting, Wget2 allows sending of other HTTP
+Methods without the need to explicitly set them using
+\f[V]--header=Header-Line\f[R].
+Wget2 will use whatever string is passed to it after \f[V]--method\f[R]
+as the HTTP Method to the server.
+.SS \f[V]--body-data=Data-String\f[R], \f[V]--body-file=Data-File\f[R]
+.PP
+Must be set when additional data needs to be sent to the server along
+with the Method specified using \f[V]--method\f[R].
+\f[V]--body-data\f[R] sends string as data, whereas
+\f[V]--body-file\f[R] sends the contents of file.
+Other than that, they work in exactly the same way.
+.PP
+Currently, \f[V]--body-file\f[R] is not for transmitting files as a
+whole.
+Wget2 does not currently support \[lq]multipart/form-data\[rq] for
+transmitting data; only \[lq]application/x-www-form-urlencoded\[rq].
+In the future, this may be changed so that wget2 sends the
+\f[V]--body-file\f[R] as a complete file instead of sending its contents
+to the server.
+Please be aware that Wget2 needs to know the size of the body data in
+advance, and hence the argument to \f[V]--body-file\f[R] should be a
+regular file.
+See \f[V]--post-file\f[R] for a more detailed explanation.
+Only one of \f[V]--body-data\f[R] and \f[V]--body-file\f[R] should be
+specified.
+.PP
+If Wget2 is redirected after the request is completed, Wget2 will
+suspend the current method and send a GET request until the redirection
+is completed.
+This is true for all redirection response codes except 307 Temporary
+Redirect which is used to explicitly specify that the request method
+should not change.
+Another exception is when the method is set to \[lq]POST\[rq], in which
+case the redirection rules specified under \f[V]--post-data\f[R] are
+followed.
+.SS \f[V]--content-disposition\f[R]
+.PP
+If this is set to on, experimental (not fully-functional) support for
+\[lq]Content-Disposition\[rq] headers is enabled.
+This can currently result in extra round-trips to the server for a
+\[lq]HEAD\[rq] request, and is known to suffer from a few bugs, which is
+why it is not currently enabled by default.
+.PP
+This option is useful for some file-downloading CGI programs that use
+\[lq]Content-Disposition\[rq] headers to describe what the name of a
+downloaded file should be.
+.SS \f[V]--content-on-error\f[R]
+.PP
+If this is set to on, wget2 will not skip the content when the server
+responds with an HTTP status code that indicates an error.
+.SS \f[V]--save-content-on\f[R]
+.PP
+This takes a comma-separated list of HTTP status codes to save the
+content for.
+.PP
+You can use \[cq]*\[cq] for ANY.
+An exclamation mark (!)
+in front of a code means `exception'.
+.PP
+Example 1: \f[V]--save-content-on=\[dq]*,!404\[dq]\f[R] would save the
+content on any HTTP status, except for 404.
+.PP
+Example 2: \f[V]--save-content-on=404\f[R] would save the content only
+on HTTP status 404.
+.PP
+The older \f[V]--content-on-error\f[R] behaves like
+\f[V]--save-content-on=*\f[R].
+.SS \f[V]--trust-server-names\f[R]
+.PP
+If this is set to on, on a redirect the last component of the
+redirection URL will be used as the local file name.
+By default, the last component of the original URL is used.
+.SS \f[V]--auth-no-challenge\f[R]
+.PP
+If this option is given, Wget2 will send Basic HTTP authentication
+information (plaintext username and password) for all requests.
+.PP
+Use of this option is not recommended, and is intended only to support
+a few obscure servers, which never send HTTP authentication
+challenges, but accept unsolicited auth info, say, in addition to
+form-based authentication.
+.SS \f[V]--compression=TYPE\f[R]
+.PP
+If this TYPE (\f[V]identity\f[R], \f[V]gzip\f[R], \f[V]deflate\f[R],
+\f[V]xz\f[R], \f[V]lzma\f[R], \f[V]br\f[R], \f[V]bzip2\f[R],
+\f[V]zstd\f[R], \f[V]lzip\f[R] or any combination of them) is given,
+Wget2 will set the \[lq]Accept-Encoding\[rq] header accordingly.
+\f[V]--no-compression\f[R] means no \[lq]Accept-Encoding\[rq] header at
+all.
+To set \[lq]Accept-Encoding\[rq] to a custom value, use
+\f[V]--no-compression\f[R] in combination with
+\f[V]--header=\[dq]Accept-Encoding: xxx\[dq]\f[R].
+.PP
+Compatibility-Note: the \f[V]none\f[R] type in Wget 1.X has the same
+meaning as the \f[V]identity\f[R] type in Wget2.
+.SS \f[V]--download-attr=[strippath|usepath]\f[R]
+.PP
+The \f[V]download\f[R] HTML5 attribute may specify (or better: suggest)
+a file name for the \f[V]href\f[R] URL in \f[V]a\f[R] and \f[V]area\f[R]
+tags.
+This option tells Wget2 to make use of this file name when saving.
+The value `strippath' strips the path from the file name.
+This is the default.
+.PP
+The value `usepath' takes the file name as is, including the directory
+path.
+This is very dangerous and we can\[cq]t stress enough not to use it on
+untrusted input or servers!
+Only use this if you really trust the input or the server.
+.SS HTTPS (SSL/TLS) Options
+.PP
+To support encrypted HTTP (HTTPS) downloads, Wget2 must be compiled with
+an external SSL library.
+The current default is GnuTLS.
+In addition, Wget2 also supports HSTS (HTTP Strict Transport Security).
+If Wget2 is compiled without SSL support, none of these options are
+available.
+.SS \f[V]--secure-protocol=protocol\f[R]
+.PP
+Choose the secure protocol to be used (default: \f[V]auto\f[R]).
+.PP
+Legal values are \f[V]auto\f[R], \f[V]SSLv3\f[R], \f[V]TLSv1\f[R],
+\f[V]TLSv1_1\f[R], \f[V]TLSv1_2\f[R], \f[V]TLSv1_3\f[R] and
+\f[V]PFS\f[R].
+.PP
+If \f[V]auto\f[R] is used, the TLS library\[cq]s default is used.
+.PP
+Specifying \f[V]SSLv3\f[R] forces the use of SSLv3.
+This is useful when talking to old and buggy SSL server implementations
+that make it hard for the underlying TLS library to choose the correct
+protocol version.
+.PP
+Specifying \f[V]PFS\f[R] enforces the use of the so-called Perfect
+Forward Secrecy cipher suites.
+In short, PFS adds security by creating a one-time key for each TLS
+connection.
+It has a bit more CPU impact on client and server.
+We use only ciphers known to be secure (e.g.\ no MD4) and the TLS
+protocol.
+.PP
+\f[V]TLSv1\f[R] enables TLS1.0 or higher.
+\f[V]TLSv1_1\f[R] enables TLS1.1 or higher.
+\f[V]TLSv1_2\f[R] enables TLS1.2 or higher.
+\f[V]TLSv1_3\f[R] enables TLS1.3 or higher.
+.PP
+Any other protocol string is directly given to the TLS library,
+currently GnuTLS, as a \[lq]priority\[rq] or \[lq]cipher\[rq] string.
+This is for users who know what they are doing.
+.SS \f[V]--https-only\f[R]
+.PP
+When in recursive mode, only HTTPS links are followed.
+.SS \f[V]--no-check-certificate\f[R]
+.PP
+Don\[cq]t check the server certificate against the available certificate
+authorities.
+Also don\[cq]t require the URL host name to match the common name
+presented by the certificate.
+.PP
+The default is to verify the server\[cq]s certificate against the
+recognized certificate authorities, breaking the SSL handshake and
+aborting the download if the verification fails.
+Although this provides more secure downloads, it does break
+interoperability with some sites that worked with previous Wget
+versions, particularly those using self-signed, expired, or otherwise
+invalid certificates.
+This option forces an \[lq]insecure\[rq] mode of operation that turns
+the certificate verification errors into warnings and allows you to
+proceed.
+.PP
+If you encounter \[lq]certificate verification\[rq] errors or ones
+saying that \[lq]common name doesn\[cq]t match requested host name\[rq],
+you can use this option to bypass the verification and proceed with the
+download.
+Only use this option if you are otherwise convinced of the site\[cq]s
+authenticity, or if you really don\[cq]t care about the validity of its
+certificate.
+It is almost always a bad idea not to check the certificates when
+transmitting confidential or important data.
+For self-signed/internal certificates, you should download the
+certificate and verify against that instead of forcing this insecure
+mode.
+If you are really sure of not desiring any certificate verification, you
+can specify \f[V]--check-certificate=quiet\f[R] to tell Wget2 to not
+print any warning about invalid certificates, albeit in most cases this
+is the wrong thing to do.
+.SS \f[V]--certificate=file\f[R]
+.PP
+Use the client certificate stored in file.
+This is needed for servers that are configured to require certificates
+from the clients that connect to them.
+Normally a certificate is not required and this switch is optional.
+.SS \f[V]--certificate-type=type\f[R]
+.PP
+Specify the type of the client certificate.
+Legal values are PEM (assumed by default) and DER, also known as ASN.1.
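+.PP
+As an illustration, a PEM client certificate with its key kept in a
+separate file might be passed like this (both file names here are
+placeholders, not defaults):
+.IP
+.nf
+\f[C]
+    wget2 --certificate=client.pem --private-key=client.key \[rs]
+        https://example.com/protected/
+\f[R]
+.fi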
+.SS \f[V]--private-key=file\f[R]
+.PP
+Read the private key from file.
+This allows you to provide the private key in a file separate from the
+certificate.
+.SS \f[V]--private-key-type=type\f[R]
+.PP
+Specify the type of the private key.
+Accepted values are PEM (the default) and DER.
+.SS \f[V]--ca-certificate=file\f[R]
+.PP
+Use file as the file with the bundle of certificate authorities
+(\[lq]CA\[rq]) to verify the peers.
+The certificates must be in PEM format.
+.PP
+Without this option Wget2 looks for CA certificates at the
+system-specified locations, chosen at OpenSSL installation time.
+.SS \f[V]--ca-directory=directory\f[R]
+.PP
+Specifies the directory containing CA certificates in PEM format.
+Each file contains one CA certificate, and the file name is based on a
+hash value derived from the certificate.
+This is achieved by processing a certificate directory with the
+\[lq]c_rehash\[rq] utility supplied with OpenSSL.
+Using \f[V]--ca-directory\f[R] is more efficient than
+\f[V]--ca-certificate\f[R] when many certificates are installed because
+it allows Wget2 to fetch certificates on demand.
+.PP
+Without this option Wget2 looks for CA certificates at the
+system-specified locations, chosen at OpenSSL installation time.
+.SS \f[V]--crl-file=file\f[R]
+.PP
+Specifies a CRL file in file.
+This is needed for certificates that have been revoked by the CAs.
+.SS \f[V]--random-file=file\f[R]
+.PP
+[OpenSSL and LibreSSL only] Use file as the source of random data for
+seeding the pseudo-random number generator on systems without
+/dev/urandom.
+.PP
+On such systems the SSL library needs an external source of randomness
+to initialize.
+Randomness may be provided by EGD (see \f[V]--egd-file\f[R] below) or
+read from an external source specified by the user.
+If this option is not specified, Wget2 looks for random data in
+$RANDFILE or, if that is unset, in $HOME/.rnd.
+.PP
+If you\[cq]re getting the \[lq]Could not seed OpenSSL PRNG; disabling
+SSL.\[rq] error, you should provide random data using some of the
+methods described above.
+.SS \f[V]--egd-file=file\f[R]
+.PP
+[OpenSSL only] Use file as the EGD socket.
+EGD stands for Entropy Gathering Daemon, a user-space program that
+collects data from various unpredictable system sources and makes it
+available to other programs that might need it.
+Encryption software, such as the SSL library, needs sources of
+non-repeating randomness to seed the random number generator used to
+produce cryptographically strong keys.
+.PP
+OpenSSL allows the user to specify their own source of entropy using the
+\[lq]RAND_FILE\[rq] environment variable.
+If this variable is unset, or if the specified file does not produce
+enough randomness, OpenSSL will read random data from the EGD socket
+specified using this option.
+.PP
+If this option is not specified (and the equivalent startup command is
+not used), EGD is never contacted.
+EGD is not needed on modern Unix systems that support /dev/urandom.
+.SS \f[V]--hsts\f[R]
+.PP
+Wget2 supports HSTS (HTTP Strict Transport Security, RFC 6797) by
+default.
+Use \f[V]--no-hsts\f[R] to make Wget2 act as a non-HSTS-compliant UA.
+As a consequence, Wget2 would ignore all the
+\[lq]Strict-Transport-Security\[rq] headers, and would not enforce any
+existing HSTS policy.
+.SS \f[V]--hsts-file=file\f[R]
+.PP
+By default, Wget2 stores its HSTS data in
+\f[V]$XDG_DATA_HOME/wget/.wget-hsts\f[R] or, if XDG_DATA_HOME is not
+set, in \f[V]\[ti]/.local/wget/.wget-hsts\f[R].
+You can use \f[V]--hsts-file\f[R] to override this.
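+.PP
+For instance, to keep the HSTS data in a non-default location (the path
+below is only an example):
+.IP
+.nf
+\f[C]
+    wget2 --hsts-file=/var/tmp/my-hsts.db https://example.com/
+\f[R]
+.fi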
+.PP
+Wget2 will use the supplied file as the HSTS database.
+Such file must conform to the correct HSTS database format used by Wget.
+If Wget2 cannot parse the provided file, the behaviour is unspecified.
+.PP
+To disable persistent storage use \f[V]--no-hsts-file\f[R].
+.PP
+Wget2\[cq]s HSTS database is a plain text file.
+Each line contains an HSTS entry (i.e.
+a site that has issued a \[lq]Strict-Transport-Security\[rq] header and
+that therefore has specified a concrete HSTS policy to be applied).
+Lines starting with a hash (\[lq]#\[rq]) are ignored by Wget2.
+Please note that despite this convenient human-readable format,
+hand-hacking the HSTS database is generally not a good idea.
+.PP
+An HSTS entry line consists of several fields separated by one or more
+whitespace characters:
+.IP
+.nf
+\f[C]
+    <hostname> SP [<port>] SP <include subdomains> SP <created> SP <max-age>
+\f[R]
+.fi
+.PP
+The hostname and port fields indicate the hostname and port to which the
+given HSTS policy applies.
+The port field may be zero, and in most cases it will be.
+That means that the port number will not be taken into account when
+deciding whether such HSTS policy should be applied on a given request
+(only the hostname will be evaluated).
+When the port is different from zero, both the target hostname and the
+port will be evaluated and the HSTS policy will only be applied if both
+of them match.
+This feature has been included for testing/development purposes only.
+The Wget2 testsuite (in testenv/) creates HSTS databases with explicit
+ports with the purpose of ensuring Wget2\[cq]s correct behaviour.
+Applying HSTS policies to ports other than the default ones is
+discouraged by RFC 6797 (see Appendix B \[lq]Differences between HSTS
+Policy and Same-Origin Policy\[rq]).
+Thus, this functionality should not be used in production environments
+and port will typically be zero.
+.PP
+The last three fields do what they are expected to.
+The field include_subdomains can either be 1 or 0 and it signals whether
+the subdomains of the target domain should be part of the given HSTS
+policy as well.
+The created and max-age fields hold the timestamp values of when such
+entry was created (first seen by Wget2) and the HSTS-defined value
+`max-age', which states how long that HSTS policy should remain active,
+measured in seconds elapsed since the timestamp stored in created.
+Once that time has passed, that HSTS policy will no longer be valid and
+will eventually be removed from the database.
+.PP
+If you supply your own HSTS database via \f[V]--hsts-file\f[R], be aware
+that Wget2 may modify the provided file if any change occurs between the
+HSTS policies requested by the remote servers and those in the file.
+When Wget2 exits, it effectively updates the HSTS database by rewriting
+the database file with the new entries.
+.PP
+If the supplied file does not exist, Wget2 will create one.
+This file will contain the new HSTS entries.
+If no HSTS entries were generated (no
+\[lq]Strict-Transport-Security\[rq] headers were sent by any of the
+servers) then no file will be created, not even an empty one.
+This behaviour applies to the default database file (\[ti]/.wget-hsts)
+as well: it will not be created until some server enforces an HSTS
+policy.
+.PP
+Care is taken not to override possible changes made by other Wget2
+processes at the same time over the HSTS database.
+Before dumping the updated HSTS entries on the file, Wget2 will re-read
+it and merge the changes.
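+.PP
+As an illustration of the format described above, an entry stating that
+\f[V]example.com\f[R] and its subdomains should be reached via HTTPS for
+one year might look like this (all values are made up):
+.IP
+.nf
+\f[C]
+    example.com 0 1 1612088495 31536000
+\f[R]
+.fi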
+.PP
+Using a custom HSTS database and/or modifying an existing one is
+discouraged.
+For more information about the potential security threats arising from
+such practice, see section 14 \[lq]Security Considerations\[rq] of RFC
+6797, especially section 14.9 \[lq]Creative Manipulation of HSTS Policy
+Store\[rq].
+.SS \f[V]--hsts-preload\f[R]
+.PP
+Enable loading of an HSTS Preload List as supported by libhsts.
+(default: on, if built with libhsts).
+.SS \f[V]--hsts-preload-file=file\f[R]
+.PP
+If built with libhsts, Wget2 uses the HSTS data provided by the
+distribution.
+If there is no such support by the distribution or if you want to load
+your own file, use this option.
+.PP
+The data file must be in DAFSA format as generated by libhsts\[cq] tool
+\f[V]hsts-make-dafsa\f[R].
+.SS \f[V]--hpkp\f[R]
+.PP
+Enable HTTP Public Key Pinning (HPKP) (default: on).
+.PP
+This is a Trust On First Use (TOFU) mechanism to add another security
+layer to HTTPS (RFC 7469).
+.PP
+The certificate key data of a previously established TLS session will be
+compared with the current data.
+If the two don\[cq]t match, the connection will be terminated.
+.SS \f[V]--hpkp-file=file\f[R]
+.PP
+By default, Wget2 stores its HPKP data in
+\f[V]$XDG_DATA_HOME/wget/.wget-hpkp\f[R] or, if XDG_DATA_HOME is not
+set, in \f[V]\[ti]/.local/wget/.wget-hpkp\f[R].
+You can use \f[V]--hpkp-file\f[R] to override this.
+.PP
+Wget2 will use the supplied file as the HPKP database.
+Such file must conform to the correct HPKP database format used by Wget.
+If Wget2 cannot parse the provided file, the behaviour is unspecified.
+.PP
+To disable persistent storage use \f[V]--no-hpkp-file\f[R].
+.SS \f[V]--tls-resume\f[R]
+.PP
+Enable TLS Session Resumption, which is disabled by default.
+.PP
+For TLS Session Resumption the session data of a previously established
+TLS session is needed.
+.PP
+There are several security flaws related to TLS 1.2 session resumption
+which are explained in detail at:
+https://web.archive.org/web/20171103231804/https://blog.filippo.io/we-need-to-talk-about-session-tickets/
+.SS \f[V]--tls-session-file=file\f[R]
+.PP
+By default, Wget2 stores its TLS Session data in
+\f[V]$XDG_DATA_HOME/wget/.wget-session\f[R] or, if XDG_DATA_HOME is not
+set, in \f[V]\[ti]/.local/wget/.wget-session\f[R].
+You can use \f[V]--tls-session-file\f[R] to override this.
+.PP
+Wget2 will use the supplied file as the TLS Session database.
+Such file must conform to the correct TLS Session database format used
+by Wget.
+If Wget2 cannot parse the provided file, the behaviour is unspecified.
+.PP
+To disable persistent storage use \f[V]--no-tls-session-file\f[R].
+.SS \f[V]--tls-false-start\f[R]
+.PP
+Enable TLS False Start (default: on).
+.PP
+This reduces the TLS negotiation by one round trip and thus speeds up
+HTTPS connections.
+.PP
+More details at https://tools.ietf.org/html/rfc7918.
+.SS \f[V]--check-hostname\f[R]
+.PP
+Enable TLS SNI verification (default: on).
+.SS \f[V]--ocsp\f[R]
+.PP
+Enable OCSP server access to check the possible revocation of the HTTPS
+server certificate(s) (default: on).
+.PP
+This procedure is pretty slow (connect to server, HTTP request,
+response) and thus we support OCSP stapling (the server sends the OCSP
+response within the TLS handshake) and persistent OCSP caching.
+.SS \f[V]--ocsp-date\f[R]
+.PP
+Check if the OCSP response is too old.
+(default: on)
+.SS \f[V]--ocsp-nonce\f[R]
+.PP
+Allow nonce checking when verifying the OCSP response.
+(default: on)
+.SS \f[V]--ocsp-server\f[R]
+.PP
+Set the OCSP server address (default: the OCSP server given in the
+certificate).
+.SS \f[V]--ocsp-stapling\f[R]
+.PP
+Enable support for OCSP stapling (default: on).
+.SS \f[V]--ocsp-file=file\f[R]
+.PP
+By default, Wget2 stores its OCSP data in
+\f[V]$XDG_DATA_HOME/wget/.wget-ocsp\f[R] or, if XDG_DATA_HOME is not
+set, in \f[V]\[ti]/.local/wget/.wget-ocsp\f[R].
+You can use \f[V]--ocsp-file\f[R] to override this.
+.PP
+Wget2 will use the supplied file as the OCSP database.
+Such file must conform to the correct OCSP database format used by Wget.
+If Wget2 cannot parse the provided file, the behaviour is unspecified.
+.PP
+To disable persistent OCSP caching use \f[V]--no-ocsp-file\f[R].
+.SS \f[V]--dane\f[R] (experimental)
+.PP
+Enable DANE certificate verification (default: off).
+.PP
+In case the server verification fails due to missing CA certificates
+(e.g.\ empty certification pool), this option enables checking the TLSA
+DNS entries via DANE.
+.PP
+You should have DNSSEC set up to avoid MITM attacks.
+Also, the destination host\[cq]s DNS entries need to be set up for DANE.
+.PP
+Warning: This option or its behavior may change or may be removed
+without further notice.
+.SS \f[V]--http2\f[R]
+.PP
+Enable the HTTP/2 protocol (default: on).
+.PP
+Wget2 requests HTTP/2 via ALPN.
+If available it is preferred over HTTP/1.1.
+Up to 30 streams are used in parallel within a single connection.
+.SS \f[V]--http2-only\f[R]
+.PP
+Insist on using HTTP/2 and fail with an error if a server doesn\[cq]t
+accept it.
+This is mainly for testing.
+.SS \f[V]--https-enforce=mode\f[R]
+.PP
+Sets how to deal with URLs that are not explicitly HTTPS (where the
+scheme isn\[cq]t https://) (default: none).
+.SS mode=none
+.PP
+Use HTTP for URLs without a scheme.
+In recursive operation the scheme of the parent document is taken as the
+default.
+.SS mode=soft
+.PP
+Try HTTPS first when the scheme is HTTP or not given.
+On failure fall back to HTTP.
+.SS mode=hard
+.PP
+Only use HTTPS, whether an HTTP scheme is given or not.
+Do not fall back to HTTP.
+.SS Recursive Retrieval Options
+.SS \f[V]-r\f[R], \f[V]--recursive\f[R]
+.PP
+Turn on recursive retrieving.
+The default maximum depth is 5.
+.SS \f[V]-l depth\f[R], \f[V]--level=depth\f[R]
+.PP
+Specify the maximum recursion depth \f[V]depth\f[R].
+.SS \f[V]--delete-after\f[R]
+.PP
+This option tells Wget2 to delete every single file it downloads, after
+having done so.
+It is useful for pre-fetching popular pages through a proxy, e.g.:
+.IP
+.nf
+\f[C]
+    wget2 -r -nd --delete-after https://example.com/\[ti]popular/page/
+\f[R]
+.fi
+.PP
+The \f[V]-r\f[R] option is to retrieve recursively, and \f[V]-nd\f[R] to
+not create directories.
+.PP
+Note that when \f[V]--delete-after\f[R] is specified,
+\f[V]--convert-links\f[R] is ignored, so .orig files are simply not
+created in the first place.
+.SS \f[V]-k\f[R], \f[V]--convert-links\f[R]
+.PP
+After the download is complete, convert the links in the document to
+make them suitable for local viewing.
+This affects not only the visible hyperlinks, but any part of the
+document that links to external content, such as embedded images, links
+to style sheets, hyperlinks to non-HTML content, etc.
+.PP
+Each link will be changed in one of two ways:
+.IP "1." 3
+The links to files that have been downloaded by Wget2 will be changed to
+refer to the file they point to as a relative link.
+.RS 4
+.PP
+Example: if the downloaded file /foo/doc.html links to /bar/img.gif,
+also downloaded, then the link in doc.html will be modified to point to
+\&../bar/img.gif.
+This kind of transformation works reliably for arbitrary combinations of
+directories.
+.RE
+.IP "2." 3
+The links to files that have not been downloaded by Wget2 will be
+changed to include the host name and absolute path of the location they
+point to.
+.RS 4
+.PP
+Example: if the downloaded file /foo/doc.html links to /bar/img.gif (or
+to ../bar/img.gif), then the link in doc.html will be modified to point
+to \f[V]https://example.com/bar/img.gif\f[R].
+.RE
+.PP
+Because of this, local browsing works reliably: if a linked file was
+downloaded, the link will refer to its local name; if it was not
+downloaded, the link will refer to its full Internet address rather than
+presenting a broken link.
+The fact that the former links are converted to relative links ensures
+that you can move the downloaded hierarchy to another directory.
+.PP
+Note that only at the end of the download can Wget2 know which links
+have been downloaded.
+Because of that, the work done by \f[V]-k\f[R] will be performed at the
+end of all the downloads.
+.SS \f[V]--convert-file-only\f[R]
+.PP
+This option converts only the filename part of the URLs, leaving the
+rest of the URLs untouched.
+This filename part is sometimes referred to as the \[lq]basename\[rq],
+although we avoid that term here in order not to cause confusion.
+.PP
+It works particularly well in conjunction with
+\f[V]--adjust-extension\f[R], although this coupling is not enforced.
+It proves useful to populate Internet caches with files downloaded from
+different hosts.
+.PP
+Example: if some link points to //foo.com/bar.cgi?xyz with
+\f[V]--adjust-extension\f[R] asserted and its local destination is
+intended to be ./foo.com/bar.cgi?xyz.css, then the link would be
+converted to //foo.com/bar.cgi?xyz.css.
+Note that only the filename part has been modified.
+The rest of the URL has been left untouched, including the net path
+(\[lq]//\[rq]) which would otherwise be processed by Wget2 and converted
+to the effective scheme (i.e.
+\[lq]https://\[rq]).
+.SS \f[V]-K\f[R], \f[V]--backup-converted\f[R]
+.PP
+When converting a file, back up the original version with a .orig
+suffix.
+Affects the behavior of \f[V]-N\f[R].
+.SS \f[V]-m\f[R], \f[V]--mirror\f[R]
+.PP
+Turn on options suitable for mirroring.
+This option turns on recursion and time-stamping and sets infinite
+recursion depth.
+It is currently equivalent to \f[V]-r -N -l inf\f[R].
+.SS \f[V]-p\f[R], \f[V]--page-requisites\f[R]
+.PP
+This option causes Wget2 to download all the files that are necessary to
+properly display a given HTML page.
+This includes such things as inlined images, sounds, and referenced
+stylesheets.
+.PP
+Ordinarily, when downloading a single HTML page, any requisite documents
+that may be needed to display it properly are not downloaded.
+Using \f[V]-r\f[R] together with \f[V]-l\f[R] can help, but since Wget2
+does not ordinarily distinguish between external and inlined documents,
+one is generally left with \[lq]leaf documents\[rq] that are missing
+their requisites.
+.PP
+For instance, say document \f[V]1.html\f[R] contains an \f[V]<IMG>\f[R]
+tag referencing \f[V]1.gif\f[R] and an \f[V]<A>\f[R] tag pointing to
+external document \f[V]2.html\f[R].
+Say that \f[V]2.html\f[R] is similar but that its image is
+\f[V]2.gif\f[R] and it links to \f[V]3.html\f[R].
+Say this continues up to some arbitrarily high number.
+.PP
+If one executes the command:
+.IP
+.nf
+\f[C]
+    wget2 -r -l 2 https://<site>/1.html
+\f[R]
+.fi
+.PP
+then 1.html, 1.gif, 2.html, 2.gif, and 3.html will be downloaded.
+As you can see, 3.html is without its requisite 3.gif because Wget2 is
+simply counting the number of hops (up to 2) away from 1.html in order
+to determine where to stop the recursion.
+However, with this command:
+.IP
+.nf
+\f[C]
+    wget2 -r -l 2 -p https://<site>/1.html
+\f[R]
+.fi
+.PP
+all the above files and 3.html\[cq]s requisite 3.gif will be downloaded.
+Similarly,
+.IP
+.nf
+\f[C]
+    wget2 -r -l 1 -p https://<site>/1.html
+\f[R]
+.fi
+.PP
+will cause 1.html, 1.gif, 2.html, and 2.gif to be downloaded.
+One might think that:
+.IP
+.nf
+\f[C]
+    wget2 -r -l 0 -p https://<site>/1.html
+\f[R]
+.fi
+.PP
+would download just 1.html and 1.gif, but unfortunately this is not the
+case, because \f[V]-l 0\f[R] is equivalent to \f[V]-l inf\f[R], that is,
+infinite recursion.
+To download a single HTML page (or a handful of them, all specified on
+the command-line or in a \f[V]-i\f[R] URL input file) and its (or their)
+requisites, simply leave off \f[V]-r\f[R] and \f[V]-l\f[R]:
+.IP
+.nf
+\f[C]
+    wget2 -p https://<site>/1.html
+\f[R]
+.fi
+.PP
+Note that Wget2 will behave as if \f[V]-r\f[R] had been specified, but
+only that single page and its requisites will be downloaded.
+Links from that page to external documents will not be followed.
+Actually, to download a single page and all its requisites (even if they
+exist on separate websites), and make sure the lot displays properly
+locally, this author likes to use a few options in addition to
+\f[V]-p\f[R]:
+.IP
+.nf
+\f[C]
+    wget2 -E -H -k -K -p https://<site>/<document>
+\f[R]
+.fi
+.PP
+To finish off this topic, it\[cq]s worth knowing that Wget2\[cq]s idea
+of an external document link is any URL specified in an \f[V]<A>\f[R]
+tag, an \f[V]<AREA>\f[R] tag, or a \f[V]<LINK>\f[R] tag other than
+\f[V]<LINK REL=\[dq]stylesheet\[dq]>\f[R].
+.SS \f[V]--strict-comments\f[R]
+.PP
+Obsolete option for compatibility with Wget1.x.
+Wget2 always terminates comments at the first occurrence of
+\f[V]-->\f[R], as popular browsers do.
+.SS \f[V]--robots\f[R]
+.PP
+Enable the Robots Exclusion Standard (default: on).
+.PP
+For each visited domain, follow rules specified in
+\f[V]/robots.txt\f[R].
+You should respect the domain owner\[cq]s rules and turn this off only
+for very good reasons.
+.PP
+Whether enabled or disabled, the \f[V]robots.txt\f[R] file is downloaded
+and scanned for sitemaps.
+These are lists of pages/files available for download that are not
+necessarily reachable by recursive scanning.
+.PP
+This behavior can be switched off with \f[V]--no-follow-sitemaps\f[R].
+.SS Recursive Accept/Reject Options
+.SS \f[V]-A acclist\f[R], \f[V]--accept=acclist\f[R], \f[V]-R rejlist\f[R], \f[V]--reject=rejlist\f[R]
+.PP
+Specify comma-separated lists of file name suffixes or patterns to
+accept or reject.
+Note that if any of the wildcard characters, \f[V]*, ?, [, ]\f[R],
+appear in an element of acclist or rejlist, it will be treated as a
+pattern, rather than a suffix.
+In this case, you have to enclose the pattern in quotes to prevent
+your shell from expanding it, like in \f[V]-A \[dq]*.mp3\[dq]\f[R] or
+\f[V]-A \[aq]*.mp3\[aq]\f[R].
+.SS \f[V]--accept-regex=urlregex\f[R], \f[V]--reject-regex=urlregex\f[R]
+.PP
+Specify a regular expression to accept or reject file names.
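+.PP
+For example, to skip all URLs that contain a query string during a
+recursive download, one might write (the pattern is only an
+illustration):
+.IP
+.nf
+\f[C]
+    wget2 -r --reject-regex=\[dq].*\[rs]?.*\[dq] https://example.com/
+\f[R]
+.fi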
+.SS \f[V]--regex-type=regextype\f[R]
+.PP
+Specify the regular expression type.
+Possible types are posix or pcre.
+Note that to be able to use the pcre type, wget2 has to be compiled with
+libpcre support.
+.SS \f[V]--filter-urls\f[R]
+.PP
+Apply the accept and reject filters on the URL before starting a
+download.
+.SS \f[V]-D domain-list\f[R], \f[V]--domains=domain-list\f[R]
+.PP
+Set the domains to be followed.
+domain-list is a comma-separated list of domains.
+Note that it does not turn on \f[V]-H\f[R].
+.SS \f[V]--exclude-domains=domain-list\f[R]
+.PP
+Specify the domains that are not to be followed.
+.SS \f[V]--follow-sitemaps\f[R]
+.PP
+Parse the sitemaps found in \f[V]robots.txt\f[R] and follow their links
+(default: on).
+.PP
+This option is on for recursive downloads whether you specify
+\f[V]--robots\f[R] or \f[V]--no-robots\f[R].
+Following the URLs found in sitemaps can be switched off with
+\f[V]--no-follow-sitemaps\f[R].
+.SS \f[V]--follow-tags=list\f[R]
+.PP
+Wget2 has an internal table of HTML tag / attribute pairs that it
+considers when looking for linked documents during a recursive
+retrieval.
+If a user wants only a subset of those tags to be considered, however,
+he or she should specify such tags in a comma-separated list with
+this option.
+.SS \f[V]--ignore-tags=list\f[R]
+.PP
+This is the opposite of the \f[V]--follow-tags\f[R] option.
+To skip certain HTML tags when recursively looking for documents to
+download, specify them in a comma-separated list.
+.PP
+In the past, this option was the best bet for downloading a single page
+and its requisites, using a command-line like:
+.IP
+.nf
+\f[C]
+    wget2 --ignore-tags=a,area -H -k -K -r https://<site>/<document>
+\f[R]
+.fi
+.PP
+However, the author of this option came across a page with tags like
+\f[V]<LINK REL=\[dq]home\[dq] HREF=\[dq]/\[dq]>\f[R] and came to the
+realization that specifying tags to ignore was not enough.
+One can\[cq]t just tell Wget2 to ignore \f[V]<LINK>\f[R], because then
+stylesheets will not be downloaded.
+Now the best bet for downloading a single page and its requisites is the
+dedicated \f[V]--page-requisites\f[R] option.
+.SS \f[V]--ignore-case\f[R]
+.PP
+Ignore case when matching files and directories.
+This influences the behavior of the \f[V]-R\f[R], \f[V]-A\f[R],
+\f[V]-I\f[R], and \f[V]-X\f[R] options.
+For example, with this option, \f[V]-A\f[R] \[lq]*.txt\[rq] will match
+file1.txt, but also file2.TXT, file3.TxT, and so on.
+The quotes in the example are to prevent the shell from expanding the
+pattern.
+.SS \f[V]-H\f[R], \f[V]--span-hosts\f[R]
+.PP
+Enable spanning across hosts when doing recursive retrieving.
+.SS \f[V]-L\f[R], \f[V]--relative\f[R] [Not implemented yet]
+.PP
+Follow relative links only.
+Useful for retrieving a specific home page without any distractions, not
+even those from the same hosts.
+.SS \f[V]-I list\f[R], \f[V]--include-directories=list\f[R]
+.PP
+Specify a comma-separated list of directories you wish to follow when
+downloading.
+Elements of the list may contain wildcards.
+.IP
+.nf
+\f[C]
+    wget2 -r https://webpage.domain --include-directories=*/pub/*/
+\f[R]
+.fi
+.PP
+Please keep in mind that \f[V]*/pub/*/\f[R] is the same as
+\f[V]/*/pub/*/\f[R] and that it matches directories, not strings.
+This means that \f[V]*/pub\f[R] doesn\[cq]t affect files contained at
+e.g.\ \f[V]/directory/something/pub\f[R] but \f[V]/pub/*\f[R] matches
+every subdir of \f[V]/pub\f[R].
+.SS \f[V]-X list\f[R], \f[V]--exclude-directories=list\f[R]
+.PP
+Specify a comma-separated list of directories you wish to exclude from
+download.
+Elements of the list may contain wildcards.
+.IP
+.nf
+\f[C]
+    wget2 -r https://gnu.org --exclude-directories=/software
+\f[R]
+.fi
+.SS \f[V]-I\f[R] / \f[V]-X\f[R] combinations
+.PP
+Please be aware that this combination of flags behaves slightly
+differently than in Wget1.x.
+.PP
+If \f[V]-I\f[R] is given first, the default is `exclude all'.
+If \f[V]-X\f[R] is given first, the default is `include all'.
+.PP
+Multiple \f[V]-I\f[R]/\f[V]-X\f[R] options are processed `first to
+last'.
+The last match is relevant.
+.IP
+.nf
+\f[C]
+    Example: \[ga]-I /pub -X /pub/trash\[ga] would download all from /pub/ except from /pub/trash.
+    Example: \[ga]-X /pub -I /pub/important\[ga] would download all except from /pub where only /pub/important would be downloaded.
+\f[R]
+.fi
+.PP
+To reset the list (e.g.\ to ignore \f[V]-I\f[R]/\f[V]-X\f[R] from
+\f[V].wget2rc\f[R] files) use \f[V]--no-include-directories\f[R] or
+\f[V]--no-exclude-directories\f[R].
+.SS \f[V]-np\f[R], \f[V]--no-parent\f[R]
+.PP
+Do not ever ascend to the parent directory when retrieving recursively.
+This is a useful option, since it guarantees that only the files below a
+certain hierarchy will be downloaded.
+.SS \f[V]--filter-mime-type=list\f[R]
+.PP
+Specify a comma-separated list of MIME types that will be downloaded.
+Elements of list may contain wildcards.
+If a MIME type starts with the character `!' it won\[cq]t be downloaded.
+This is useful when trying to download something with exceptions.
+If the server doesn\[cq]t specify the MIME type of a file it will be
+considered as `application/octet-stream'.
+For example, download everything except images:
+.IP
+.nf
+\f[C]
+    wget2 -r https://<site>/<document> --filter-mime-type=*,\[rs]!image/*
+\f[R]
+.fi
+.PP
+It is also useful to download files that are compatible with an
+application of your system.
+For instance, download every file that is compatible with LibreOffice
+Writer from a website using the recursive mode:
+.IP
+.nf
+\f[C]
+    wget2 -r https://<site>/<document> --filter-mime-type=$(sed -r \[aq]/\[ha]MimeType=/!d;s/\[ha]MimeType=//;s/;/,/g\[aq] /usr/share/applications/libreoffice-writer.desktop)
+\f[R]
+.fi
+.SS Plugin Options
+.SS \f[V]--list-plugins\f[R]
+.PP
+Print a list of all available plugins and exit.
+.SS \f[V]--local-plugin=file\f[R]
+.PP
+Load \f[V]file\f[R] as a plugin.
+.SS \f[V]--plugin=name\f[R]
+.PP
+Load a plugin with a given \f[V]name\f[R] from the configured plugin
+directories.
+.SS \f[V]--plugin-dirs=directories\f[R]
+.PP
+Set the plugin directories.
+\f[V]directories\f[R] is a comma-separated list of directories.
+.SS \f[V]--plugin-help\f[R]
+.PP
+Print the help messages from all loaded plugins.
+.SS \f[V]--plugin-opt=option\f[R]
+.PP
+Set a plugin-specific command line option.
+.PP
+\f[V]option\f[R] is in the format
+\f[V]<plugin_name>.<option>[=value]\f[R].
+.SH Environment
+.PP
+Wget2 supports proxies for both HTTP and HTTPS retrievals.
+The standard way to specify the proxy location, which Wget2 recognizes,
+is using the following environment variables:
+.PP
+\f[V]http_proxy\f[R]
+.PP
+\f[V]https_proxy\f[R]
+.PP
+If set, the \f[V]http_proxy\f[R] and \f[V]https_proxy\f[R] variables
+should contain the URLs of the proxies for HTTP and HTTPS connections
+respectively.
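+.PP
+For example, a single download could be routed through a proxy by
+setting the variable just for that invocation (the proxy URL here is a
+placeholder):
+.IP
+.nf
+\f[C]
+    https_proxy=http://proxy.example.com:3128/ wget2 https://example.com/
+\f[R]
+.fi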
+.PP
+\f[V]no_proxy\f[R]
+.PP
+This variable should contain a comma-separated list of domain extensions
+\f[V]proxy\f[R] should not be used for.
+For instance, if the value of \f[V]no_proxy\f[R] is
+\f[V].example.com\f[R], \f[V]proxy\f[R] will not be used to retrieve
+documents from \f[V]*.example.com\f[R].
+.SH Exit Status
+.PP
+Wget2 may return one of several error codes if it encounters problems.
+.IP
+.nf
+\f[C]
+    0   No problems occurred.
+
+    1   Generic error code.
+
+    2   Parse error. For instance, when parsing command-line options, the .wget2rc or .netrc...
+
+    3   File I/O error.
+
+    4   Network failure.
+
+    5   SSL verification failure.
+
+    6   Username/password authentication failure.
+
+    7   Protocol errors.
+
+    8   Server issued an error response.
+
+    9   Public key missing from keyring.
+
+    10  Signature verification failed.
+\f[R]
+.fi
+.PP
+With the exceptions of 0 and 1, the lower-numbered exit codes take
+precedence over higher-numbered ones, when multiple types of errors are
+encountered.
+.SH Startup File
+.PP
+Sometimes you may wish to permanently change the default behaviour of
+GNU Wget2.
+There is a better way to do this than setting an alias in your shell.
+GNU Wget2 allows you to set all options permanently through its startup
+file, \f[V].wget2rc\f[R].
+.PP
+While \f[V].wget2rc\f[R] is the \f[I]main\f[R] initialization file used
+by GNU Wget2, it is not a good idea to store passwords in this file.
+This is because the startup file may be publicly readable or backed up
+in version control.
+This is why Wget2 also reads the contents of \f[V]$HOME/.netrc\f[R] when
+required.
+.PP
+The \f[V].wget2rc\f[R] file follows a very similar syntax to the
+\f[V].wgetrc\f[R] that is read by GNU Wget.
+It varies in only those places where the command line options vary
+between Wget1.x and Wget2.
+.SS Wget2rc Location
+.PP
+When initializing, Wget2 will attempt to read the \[lq]global\[rq]
+startup file, which is located at `/usr/local/etc/wget2rc' by default
+(or some prefix other than `/usr/local', if Wget2 was not installed
+there).
+The global startup file is useful for system administrators to enforce a
+default policy, such as setting the path to the certificate store,
+preloading an HSTS list, etc.
+.PP
+Then, Wget2 will look for the user\[cq]s initialization file.
+If the user has passed the \f[V]--config\f[R] command line option, Wget2
+will try to load the file that it points to.
+If the file does not exist, or if it cannot be read, Wget2 will make no
+further attempts to read any initialization files.
+.PP
+If the environment variable \f[V]WGET2RC\f[R] is set, Wget2 will try to
+load the file at this location.
+If the file does not exist, or if it cannot be read, Wget2 will make no
+further attempts to read an initialization file.
+.PP
+If \f[V]--config\f[R] is not passed and \f[V]WGET2RC\f[R] is not set,
+Wget2 will attempt to load the user\[cq]s initialization file from a
+location as defined by the XDG Base Directory Specification.
+It will read the first, and only the first file it finds from the
+following locations:
+.IP "1." 3
+\f[V]$XDG_CONFIG_HOME/wget/wget2rc\f[R]
+.IP "2." 3
+\f[V]$HOME/.config/wget/wget2rc\f[R]
+.IP "3." 3
+\f[V]$HOME/.wget2rc\f[R]
+.PP
+Having an initialization file at \f[V]$HOME/.wget2rc\f[R] is deprecated.
+If a file is found there, Wget2 will print a warning about it.
+Support for reading from this file will be removed in the future.
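+.PP
+As a minimal illustration, a user\[cq]s \f[V]wget2rc\f[R] might contain
+lines like the following, using the long option names described above
+(the settings shown are merely examples, not recommendations):
+.IP
+.nf
+\f[C]
+    # ~/.config/wget/wget2rc
+    max-redirect = 10
+    follow-sitemaps = off
+\f[R]
+.fi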
+.PP
+The fact that the user\[cq]s settings are loaded after the system-wide
+ones means that in case of a collision, the user\[cq]s wget2rc
+\f[I]overrides\f[R] the global wget2rc.
+.SH Bugs
+.PP
+You are welcome to submit bug reports via the GNU Wget2 bug
+tracker (https://gitlab.com/gnuwget/wget2/issues).
+.PP
+Before actually submitting a bug report, please try to follow a few
+simple guidelines.
+.IP "1." 3
+Please try to ascertain that the behavior you see really is a bug.
+If Wget2 crashes, it\[cq]s a bug.
+If Wget2 does not behave as documented, it\[cq]s a bug.
+If things work strange, but you are not sure about the way they are
+supposed to work, it might well be a bug, but you might want to
+double-check the documentation and the mailing lists.
+.IP "2." 3
+Try to repeat the bug in as simple circumstances as possible.
+E.g.
+if Wget2 crashes while downloading
+\f[V]wget2 -rl0 -kKE -t5 --no-proxy https://example.com -o /tmp/log\f[R],
+you should try to see if the crash is repeatable, and if it will occur
+with a simpler set of options.
+You might even try to start the download at the page where the crash
+occurred to see if that page somehow triggered the crash.
+.PP
+Also, while I will probably be interested to know the contents of your
+\f[V].wget2rc\f[R] file, just dumping it into the debug message is
+probably a bad idea.
+Instead, you should first try to see if the bug repeats with
+\f[V].wget2rc\f[R] moved out of the way.
+Only if it turns out that \f[V].wget2rc\f[R] settings affect the bug,
+mail me the relevant parts of the file.
+.IP "3." 3
+Please start Wget2 with the \f[V]-d\f[R] option and send us the
+resulting output (or relevant parts thereof).
+If Wget2 was compiled without debug support, recompile it.
+It is much easier to trace bugs with debug support on.
+.PP
+Note: please make sure to remove any potentially sensitive information
+from the debug log before sending it to the bug address.
+The \f[V]-d\f[R] option won\[cq]t go out of its way to collect sensitive
+information, but the log will contain a fairly complete transcript of
+Wget2\[cq]s communication with the server, which may include passwords
+and pieces of downloaded data.
+Since the bug address is publicly archived, you may assume that all bug
+reports are visible to the public.
+.IP "4." 3
+If Wget2 has crashed, try to run it in a debugger,
+e.g.\ \f[V]gdb \[ga]which wget2\[ga] core\f[R] and type \[lq]where\[rq]
+to get the backtrace.
+This may not work if the system administrator has disabled core files,
+but it is safe to try.
+.SH Author
+.PP
+Wget2 written by Tim Rühsen <tim.ruehsen@gmx.de>
+.PP
+Wget 1.x originally written by Hrvoje Nikšić <hniksic@xemacs.org>
+.SH Copyright
+.PP
+Copyright (C) 2012-2015 Tim Rühsen
+.PP
+Copyright (C) 2015-2023 Free Software Foundation, Inc.
+.PP
+Permission is granted to copy, distribute and/or modify this document
+under the terms of the GNU Free Documentation License, Version 1.3 or
+any later version published by the Free Software Foundation; with no
+Invariant Sections, with no Front-Cover Texts, and with no Back-Cover
+Texts.
+A copy of the license is included in the section entitled \[lq]GNU Free
+Documentation License\[rq].