Diffstat (limited to 'Documentation/gitdiffcore.txt')
-rw-r--r-- | Documentation/gitdiffcore.txt | 333 |
1 files changed, 333 insertions, 0 deletions
diff --git a/Documentation/gitdiffcore.txt b/Documentation/gitdiffcore.txt
new file mode 100644
index 0000000..0d57f86
--- /dev/null
+++ b/Documentation/gitdiffcore.txt
@@ -0,0 +1,333 @@
+gitdiffcore(7)
+==============
+
+NAME
+----
+gitdiffcore - Tweaking diff output
+
+SYNOPSIS
+--------
+[verse]
+'git diff' *
+
+DESCRIPTION
+-----------
+
+The diff commands 'git diff-index', 'git diff-files', and 'git diff-tree'
+can be told to manipulate differences they find in
+unconventional ways before showing 'diff' output. The manipulation
+is collectively called "diffcore transformation". This short note
+describes what they are and how to use them to produce 'diff' output
+that is easier to understand than the conventional kind.
+
+
+The chain of operation
+----------------------
+
+The 'git diff-{asterisk}' family works by first comparing two sets of
+files:
+
+ - 'git diff-index' compares contents of a "tree" object and the
+   working directory (when `--cached` flag is not used) or a
+   "tree" object and the index file (when `--cached` flag is
+   used);
+
+ - 'git diff-files' compares contents of the index file and the
+   working directory;
+
+ - 'git diff-tree' compares contents of two "tree" objects;
+
+In all of these cases, the commands themselves first optionally limit
+the two sets of files by any pathspecs given on their command-lines,
+and compare corresponding paths in the two resulting sets of files.
+
+The pathspecs are used to limit the world diff operates in. They remove
+the filepairs outside the specified sets of pathnames. E.g. If the
+input set of filepairs included:
+
+------------------------------------------------
+:100644 100644 bcd1234... 0123456... M junkfile
+------------------------------------------------
+
+but the command invocation was `git diff-files myfile`, then the
+junkfile entry would be removed from the list because only "myfile"
+is under consideration.
+
+The result of comparison is passed from these commands to what is
+internally called "diffcore", in a format similar to what is output
+when the -p option is not used. E.g.
+
+------------------------------------------------
+in-place edit  :100644 100644 bcd1234... 0123456... M file0
+create         :000000 100644 0000000... 1234567... A file4
+delete         :100644 000000 1234567... 0000000... D file5
+unmerged       :000000 000000 0000000... 0000000... U file6
+------------------------------------------------
+
+The diffcore mechanism is fed a list of such comparison results
+(each of which is called "filepair", although at this point each
+of them talks about a single file), and transforms such a list
+into another list. There are currently 6 such transformations:
+
+- diffcore-break
+- diffcore-rename
+- diffcore-merge-broken
+- diffcore-pickaxe
+- diffcore-order
+- diffcore-rotate
+
+These are applied in sequence. The set of filepairs 'git diff-{asterisk}'
+commands find are used as the input to diffcore-break, and
+the output from diffcore-break is used as the input to the
+next transformation. The final result is then passed to the
+output routine and generates either diff-raw format (see Output
+format sections of the manual for 'git diff-{asterisk}' commands) or
+diff-patch format.
+
+
+diffcore-break: For Splitting Up Complete Rewrites
+--------------------------------------------------
+
+The first transformation in the chain is diffcore-break, and is
+controlled by the -B option to the 'git diff-{asterisk}' commands. This is
+used to detect a filepair that represents "complete rewrite" and
+break such filepair into two filepairs that represent delete and
+create. E.g. If the input contained this filepair:
+
+------------------------------------------------
+:100644 100644 bcd1234... 0123456... M file0
+------------------------------------------------
+
+and if it detects that the file "file0" is completely rewritten,
+it changes it to:
+
+------------------------------------------------
+:100644 000000 bcd1234... 0000000... D file0
+:000000 100644 0000000... 0123456... A file0
+------------------------------------------------
+
+For the purpose of breaking a filepair, diffcore-break examines
+the extent of changes between the contents of the files before
+and after modification (i.e. the contents that have "bcd1234..."
+and "0123456..." as their SHA-1 content ID, in the above
+example). The amount of deletion of original contents and
+insertion of new material are added together, and if it exceeds
+the "break score", the filepair is broken into two. The break
+score defaults to 50% of the size of the smaller of the original
+and the result (i.e. if the edit shrinks the file, the size of
+the result is used; if the edit lengthens the file, the size of
+the original is used), and can be customized by giving a number
+after "-B" option (e.g. "-B75" to tell it to use 75%).
+
+
+diffcore-rename: For Detecting Renames and Copies
+-------------------------------------------------
+
+This transformation is used to detect renames and copies, and is
+controlled by the -M option (to detect renames) and the -C option
+(to detect copies as well) to the 'git diff-{asterisk}' commands. If the
+input contained these filepairs:
+
+------------------------------------------------
+:100644 000000 0123456... 0000000... D fileX
+:000000 100644 0000000... 0123456... A file0
+------------------------------------------------
+
+and the contents of the deleted file fileX are similar enough to
+the contents of the created file file0, then rename detection
+merges these filepairs and creates:
+
+------------------------------------------------
+:100644 100644 0123456... 0123456... R100 fileX file0
+------------------------------------------------
+
+When the "-C" option is used, the original contents of modified files,
+and deleted files (and also unmodified files, if the
+"--find-copies-harder" option is used) are considered as candidates
+of the source files in rename/copy operation. If the input were like
+these filepairs, that talk about a modified file fileY and a newly
+created file file0:
+
+------------------------------------------------
+:100644 100644 0123456... 1234567... M fileY
+:000000 100644 0000000... bcd3456... A file0
+------------------------------------------------
+
+the original contents of fileY and the resulting contents of
+file0 are compared, and if they are similar enough, they are
+changed to:
+
+------------------------------------------------
+:100644 100644 0123456... 1234567... M fileY
+:100644 100644 0123456... bcd3456... C100 fileY file0
+------------------------------------------------
+
+In both rename and copy detection, the same "extent of changes"
+algorithm used in diffcore-break is used to determine if two
+files are "similar enough", and can be customized to use
+a similarity score different from the default of 50% by giving a
+number after the "-M" or "-C" option (e.g. "-M8" to tell it to use
+8/10 = 80%).
+
+Note that when rename detection is on but both copy and break
+detection are off, rename detection adds a preliminary step that first
+checks if files are moved across directories while keeping their
+filename the same. If there is a file added to a directory whose
+contents are sufficiently similar to a file with the same name that got
+deleted from a different directory, it will mark them as renames and
+exclude them from the later quadratic step (the one that pairwise
+compares all unmatched files to find the "best" matches, determined by
+the highest content similarity). So, for example, if a deleted
+docs/ext.txt and an added docs/config/ext.txt are similar enough, they
+will be marked as a rename and prevent an added docs/ext.md that may
+be even more similar to the deleted docs/ext.txt from being considered
+as the rename destination in the later step. For this reason, the
+preliminary "match same filename" step uses a slightly higher threshold to
+mark a file pair as a rename and stop considering other candidates for
+better matches. At most, one comparison is done per file in this
+preliminary pass; so if there are several remaining ext.txt files
+throughout the directory hierarchy after exact rename detection, this
+preliminary step may be skipped for those files.
+
+Note. When the "-C" option is used with `--find-copies-harder`
+option, 'git diff-{asterisk}' commands feed unmodified filepairs to
+the diffcore mechanism as well as modified ones. This lets the copy
+detector consider unmodified files as copy source candidates at
+the expense of making it slower. Without `--find-copies-harder`,
+'git diff-{asterisk}' commands can detect copies only if the file that was
+copied happened to have been modified in the same changeset.
+
+
+diffcore-merge-broken: For Putting Complete Rewrites Back Together
+------------------------------------------------------------------
+
+This transformation is used to merge filepairs broken by
+diffcore-break, and not transformed into rename/copy by
+diffcore-rename, back into a single modification. This always
+runs when diffcore-break is used.
+
+For the purpose of merging broken filepairs back, it uses a
+different "extent of changes" computation from the ones used by
+diffcore-break and diffcore-rename. It counts only the deletion
+from the original, and does not count insertion. If you removed
+only 10 lines from a 100-line document, even if you added 910
+new lines to make a new 1000-line document, you did not do a
+complete rewrite. diffcore-break breaks such a case in order to
+help diffcore-rename to consider such filepairs as candidates for
+rename/copy detection, but if filepairs broken that way were not
+matched with other filepairs to create rename/copy, then this
+transformation merges them back into the original
+"modification".
+
+The "extent of changes" parameter can be tweaked from the
+default 80% (that is, unless more than 80% of the original
+material is deleted, the broken pairs are merged back into a
+single modification) by giving a second number to -B option,
+like these:
+
+* -B50/60 (give 50% "break score" to diffcore-break, use 60%
+  for diffcore-merge-broken).
+
+* -B/60 (the same as above, since diffcore-break defaults to 50%).
+
+Note that an earlier implementation left a broken pair as separate
+creation and deletion patches. This was an unnecessary hack and
+the latest implementation always merges all the broken pairs
+back into modifications, but the resulting patch output is
+formatted differently for easier review in case of such
+a complete rewrite by showing the entire contents of old version
+prefixed with '-', followed by the entire contents of new
+version prefixed with '+'.
+
+
+diffcore-pickaxe: For Detecting Addition/Deletion of Specified String
+---------------------------------------------------------------------
+
+This transformation limits the set of filepairs to those that change
+specified strings between the preimage and the postimage in a certain
+way. -S<block of text> and -G<regular expression> options are used to
+specify different ways these strings are sought.
+
+"-S<block of text>" detects filepairs whose preimage and postimage
+have a different number of occurrences of the specified block of text.
+By definition, it will not detect in-file moves. Also, when a
+changeset moves a file wholesale without affecting the interesting
+string, diffcore-rename kicks in as usual, and `-S` omits the filepair
+(since the number of occurrences of that string didn't change in that
+rename-detected filepair). When used with `--pickaxe-regex`, treat
+the <block of text> as an extended POSIX regular expression to match,
+instead of a literal string.
+
+"-G<regular expression>" (mnemonic: grep) detects filepairs whose
+textual diff has an added or a deleted line that matches the given
+regular expression. This means that it will detect in-file (or what
+rename-detection considers the same file) moves, which is noise. The
+implementation runs diff twice and greps, and this can be quite
+expensive. To speed things up, binary files without textconv filters
+will be ignored.
+
+When `-S` or `-G` are used without `--pickaxe-all`, only filepairs
+that match their respective criterion are kept in the output. When
+`--pickaxe-all` is used, if even one filepair matches their respective
+criterion in a changeset, the entire changeset is kept. This behavior
+is designed to make reviewing changes in the context of the whole
+changeset easier.
+
+diffcore-order: For Sorting the Output Based on Filenames
+---------------------------------------------------------
+
+This is used to reorder the filepairs according to the user's
+(or project's) taste, and is controlled by the -O option to the
+'git diff-{asterisk}' commands.
+
+This takes a text file each of whose lines is a shell glob
+pattern. Filepairs that match a glob pattern on an earlier line
+in the file are output before ones that match a later line, and
+filepairs that do not match any glob pattern are output last.
+
+As an example, a typical orderfile for the core Git probably
+would look like this:
+
+------------------------------------------------
+README
+Makefile
+Documentation
+*.h
+*.c
+t
+------------------------------------------------
+
+diffcore-rotate: For Changing At Which Path Output Starts
+---------------------------------------------------------
+
+This transformation takes one pathname, and rotates the set of
+filepairs so that the filepair for the given pathname comes first,
+optionally discarding the paths that come before it. This is used
+to implement the `--skip-to` and the `--rotate-to` options. It is
+an error when the specified pathname is not in the set of filepairs,
+but it is not useful to error out when used with "git log" family of
+commands, because it is unreasonable to expect that a given path
+would be modified by each and every commit shown by the "git log"
+command. For this reason, when used with "git log", the filepair
+that sorts the same as, or the first one that sorts after, the given
+pathname is where the output starts.
+
+Use of this transformation combined with diffcore-order will produce
+unexpected results, as the input to this transformation is likely
+not sorted when diffcore-order is in effect.
+
+
+SEE ALSO
+--------
+linkgit:git-diff[1],
+linkgit:git-diff-files[1],
+linkgit:git-diff-index[1],
+linkgit:git-diff-tree[1],
+linkgit:git-format-patch[1],
+linkgit:git-log[1],
+linkgit:gitglossary[7],
+link:user-manual.html[The Git User's Manual]
+
+GIT
+---
+Part of the linkgit:git[1] suite
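
As a reading aid for the diffcore-order section of the patch above, here is a rough Python sketch of the ordering rule it describes: paths matching an earlier orderfile line come first, and paths matching no line come last. This is not Git's implementation. The function names `order_rank` and `apply_orderfile` are hypothetical, Python's `fnmatch` stands in for Git's own glob matcher, and the sketch assumes a pattern may match either the full path or any leading directory of it (which is what entries such as `Documentation` and `t` in the example orderfile suggest).

```python
from fnmatch import fnmatchcase


def order_rank(path, patterns):
    """Return the index of the first pattern matching path, or len(patterns)."""
    parts = path.split("/")
    # Assumption: a pattern is tried against each leading directory of the
    # path as well as against the full path itself.
    prefixes = ["/".join(parts[:i]) for i in range(1, len(parts) + 1)]
    for rank, pattern in enumerate(patterns):
        if any(fnmatchcase(prefix, pattern) for prefix in prefixes):
            return rank
    # No pattern matched: sort after everything that did match.
    return len(patterns)


def apply_orderfile(paths, patterns):
    """Reorder paths by the rank of the first orderfile pattern they match."""
    # sorted() is stable, so paths with the same rank keep their input order.
    return sorted(paths, key=lambda p: order_rank(p, patterns))


# The example orderfile from the documentation, one pattern per line.
ORDERFILE = ["README", "Makefile", "Documentation", "*.h", "*.c", "t"]

# Hypothetical input paths, chosen only to exercise each pattern.
PATHS = [
    "t/t4000-diff-format.sh",
    "diff.c",
    "Documentation/gitdiffcore.txt",
    "cache.h",
    "Makefile",
    "po/de.po",
]

if __name__ == "__main__":
    for path in apply_orderfile(PATHS, ORDERFILE):
        print(path)
```

Using a stable sort keyed on the pattern index keeps the sketch short; whether Git preserves the incoming order among paths that match the same pattern is not stated in the text above, so treat that detail as an assumption too.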