summaryrefslogtreecommitdiffstats
path: root/vendor/imara-diff/README.md
diff options
context:
space:
mode:
authorDaniel Baumann <daniel.baumann@progress-linux.org>2024-05-04 12:41:41 +0000
committerDaniel Baumann <daniel.baumann@progress-linux.org>2024-05-04 12:41:41 +0000
commit10ee2acdd26a7f1298c6f6d6b7af9b469fe29b87 (patch)
treebdffd5d80c26cf4a7a518281a204be1ace85b4c1 /vendor/imara-diff/README.md
parentReleasing progress-linux version 1.70.0+dfsg1-9~progress7.99u1. (diff)
downloadrustc-10ee2acdd26a7f1298c6f6d6b7af9b469fe29b87.tar.xz
rustc-10ee2acdd26a7f1298c6f6d6b7af9b469fe29b87.zip
Merging upstream version 1.70.0+dfsg2.
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'vendor/imara-diff/README.md')
-rw-r--r--vendor/imara-diff/README.md107
1 files changed, 107 insertions, 0 deletions
diff --git a/vendor/imara-diff/README.md b/vendor/imara-diff/README.md
new file mode 100644
index 000000000..f652d478b
--- /dev/null
+++ b/vendor/imara-diff/README.md
@@ -0,0 +1,107 @@
+# imara-diff
+
+[![crates.io](https://img.shields.io/crates/v/imara-diff?style=flat-square)](https://crates.io/crates/imara-diff)
+[![crates.io](https://img.shields.io/docsrs/imara-diff?style=flat-square)](https://docs.rs/imara-diff/latest/imara_diff/)
+![crates.io](https://img.shields.io/crates/l/imara-diff?style=flat-square)
+
+Imara-diff is a solid (imara in swahili) diff library for rust.
+Solid refers to the fact that imara-diff provides very good runtime performance even
+in pathologic cases so that your application never appears to freeze while waiting on a diff.
+The performance improvements are achieved using battle tested heuristics used in gnu-diff and git
+that are known to perform well while still providing good results.
+
+Imara-diff is also designed to be flexible so that it can be used with arbitrary collections and
+not just lists and strings and even allows reusing large parts of the computation when
+comparing the same file to multiple different files.
+
+Imara-diff provides two diff algorithms:
+
+* The linear-space variant of the well known [Myers algorithm](http://www.xmailserver.org/diff2.pdf)
+* The **Histogram** algorithm which variant of the patience diff algorithm.
+
+Myers algorithm has been enhanced with preprocessing and multiple heuristics to ensure fast runtime in pathological
+cases to avoid quadratic time complexity and closely matches the behavior of gnu-diff and git.
+The histogram algorithm was originally ported from git but has been heavily optimized.
+The **Histogram algorithm outperforms Myers algorithm** by 10% - 100% across a **wide variety of workloads**.
+
+## Limitations
+
+Even with the optimizations in this crate, performing a large diff without any tokenization (like character diff for a string) does not perform well.
+To work around this problem a diff of the entire file with large tokens (like lines for a string) can be performed first.
+The `Sink` implementation can then perform fine-grained diff on changed regions.
+Note that this fine-grained diff should not be performed for pure insertions, pure deletions and very large changes.
+
+In an effort to improve performance, `imara-diff` makes heavy use of pointer compression.
+That means that it can only support files with at most `2^31 - 2` tokens.
+This should be rarely an issue in practice for textual diffs, because most (large) real-world files
+have an average line-length of at least 8.
+That means that this limitation only becomes a problem for files above 16GB while performing line-diffs.
+
+## Benchmarks
+
+The most used diffing libraries in the rust ecosystem are [similar](https://crates.io/crates/similar) and [dissimilar](https://crates.io/crates/similar).
+The fastest diff implementation both of these offer is a simple implementation of Myers algorithm
+without preprocessing or additional heuristics.
+As these implementations are very similar only `similar` was included in the benchmark.
+
+To provide a benchmark to reflects real-world workloads, the git history of different open source projects were used.
+For each repo two (fairly different) tags were chosen.
+A tree diff is performed with [gitoxide](https://github.com/Byron/gitoxide) and the pairs of files that should be saved are stored in memory.
+The diffs collected using this method are often fairly large, because the repositories are compared over a large span of time.
+Therefore, the tree diff of the last 30 commit before the tag (equivalent of `git diff TAG^ TAG`, `git diff TAG^^ TAG^^`) were also used to also include smaller diffs.
+
+The benchmark measures the runtime of performing a **line diff** between the collected files.
+As a measure of complexity for each change `(M + N) D` was used where `M` and `N` are the lengths of the two compared files
+and `D` is the length of the edit script required to transform these files into each other (determined with Myers algorithm).
+This complexity measure is used to divide the changes into 10 badges.
+The time to compute the line diffs in each badge was benchmarked.
+
+The plots below show the runtime for each **average** complexity (runtime is normalized by the number of diffs).
+Note that these plots are shown in logarithmic scale due to the large runtime of `similar` for complex diffs.
+Furthermore, to better highlight the performance of the Histogram algorithm, the speedup of the Histogram algorithm
+compared to the Myers algorithm is shown separately.
+
+* [Linux](#Linux)
+* [Rust](#Rust)
+* [VSCode](#VSCode)
+* [Helix](#Helix)
+
+# Linux
+
+The sourcecode of the linux kernel.
+
+- **Repo** - https://kernel.org
+- **Tags** - `v5.7` and `v6.0`
+
+<img src='plots/linux_comparison.svg' width="700">
+<img src='plots/linux_speedup.svg' width="700">
+
+# Rust
+
+The sourcecode of the rust compiler, standard library and various related tooling.
+
+- **Repo** - https://github.com/rust-lang/rust
+- **Tags** - `1.50.0` and `1.64.0`
+
+<img src='plots/rust_comparison.svg' width="700">
+<img src='plots/rust_speedup.svg' width="700">
+
+# VScode
+
+The sourcecode of the vscode editor.
+
+- **Repo** - https://github.com/microsoft/vscode
+- **Tags** - `1.41.0` and `1.72.2`
+
+<img src='plots/vscode_comparison.svg' width="700">
+<img src='plots/vscode_speedup.svg' width="700">
+
+# Helix
+
+The sourcecode of the helix editor.
+
+- **Repo** - https://github.com/helix-editor/helix
+- **Tags** - `v0.5.0` and `22.08.1`
+
+<img src='plots/helix_comparison.svg' width="700">
+<img src='plots/helix_speedup.svg' width="700">