Diffstat (limited to 'src/arrow/docs/source/developers/computeir.rst')
-rw-r--r-- src/arrow/docs/source/developers/computeir.rst | 59
1 files changed, 59 insertions, 0 deletions
diff --git a/src/arrow/docs/source/developers/computeir.rst b/src/arrow/docs/source/developers/computeir.rst
new file mode 100644
index 000000000..9ebe1d5af
--- /dev/null
+++ b/src/arrow/docs/source/developers/computeir.rst
@@ -0,0 +1,59 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements. See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership. The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License. You may obtain a copy of the License at
+
+.. http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied. See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+**********************************************
+Arrow Compute IR (Intermediate Representation)
+**********************************************
+
+In the same way that the Arrow format provides a powerful tool
+for communicating data, Compute IR is intended to provide a
+consistent format for representing analytical operations against
+that data. As an Arrow-native expression of computation, it includes
+information such as explicit types and schemas and Arrow-formatted
+literal data. It is also optimized for low runtime overhead in both
+serialization and deserialization.
+
+Built-in definitions are included to enable representation of
+relational algebraic operations: the contents of a "logical query plan".
+Compute IR also has first-class support for representing operations
+which are not members of a minimal relational algebra, including
+implementation and optimization details: the contents of a "physical
+query plan". This approach is modeled on `MLIR`_ (Multi-Level
+Intermediate Representation), a system which has been successful in
+domains comparable in complexity to the representation of analytic operations.
+To borrow terms from that project, there are two mutations of interest:
+
+* Replacement of representations with semantically equivalent representations
+  which will yield better performance for consumers (an optimization pass).
+* Replacement of abstract or generic representations with more specific
+  and potentially consumer-specific representations (a lowering pass).
+  This modification corresponds to the translation of a logical plan
+  to a physical plan.
+
+Allowing representation of physical plans (and plans which are between
+logical and physical) in Compute IR enables systems to define incremental
+optimization and lowering passes which operate on and produce valid
+Compute IR. This in turn enables communication, manipulation, and inspection
+at every stage of lowering and optimization by the same tools
+used for logical-plan-equivalent IR. This is especially useful for systems
+where such passes may depend on information available only on an individual
+node of a distributed consumer (for example, statistics unique to that node's
+local data) or may not be universal to all backends in a heterogeneous
+consumer (for example, which optimizations nodes are capable of for
+non-equi joins).
+
+.. _MLIR: https://mlir.llvm.org
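
The distinction between an optimization pass and a lowering pass described in the added document can be illustrated with a small toy sketch. This is not Compute IR's actual format or API: the node classes (`Scan`, `Filter`, `Join`, `HashJoin`) and the functions `merge_filters`, `lower_joins`, and `render` are hypothetical names invented for illustration. The point it tries to show is that both passes consume and produce the same representation, so a single inspection tool works at every stage.

```python
from dataclasses import dataclass
from typing import Tuple, Union


# Hypothetical toy IR; these classes are NOT part of Arrow or Compute IR.
@dataclass(frozen=True)
class Scan:
    table: str


@dataclass(frozen=True)
class Filter:
    input: "Node"
    predicates: Tuple[str, ...]


@dataclass(frozen=True)
class Join:  # abstract, "logical" join
    left: "Node"
    right: "Node"
    on: str


@dataclass(frozen=True)
class HashJoin:  # specific, "physical" join
    build: "Node"
    probe: "Node"
    on: str


Node = Union[Scan, Filter, Join, HashJoin]


def merge_filters(node: Node) -> Node:
    """Optimization pass: collapse directly nested filters into one."""
    if isinstance(node, Filter):
        child = merge_filters(node.input)
        if isinstance(child, Filter):
            return Filter(child.input, child.predicates + node.predicates)
        return Filter(child, node.predicates)
    if isinstance(node, Join):
        return Join(merge_filters(node.left), merge_filters(node.right), node.on)
    if isinstance(node, HashJoin):
        return HashJoin(merge_filters(node.build), merge_filters(node.probe), node.on)
    return node


def lower_joins(node: Node) -> Node:
    """Lowering pass: replace each generic Join with a HashJoin."""
    if isinstance(node, Join):
        # Assumption for illustration: the right input is the build side.
        return HashJoin(lower_joins(node.right), lower_joins(node.left), node.on)
    if isinstance(node, Filter):
        return Filter(lower_joins(node.input), node.predicates)
    if isinstance(node, HashJoin):
        return HashJoin(lower_joins(node.build), lower_joins(node.probe), node.on)
    return node


def render(node: Node, depth: int = 0) -> str:
    """One inspection tool that accepts logical, physical, or mixed plans."""
    pad = "  " * depth
    if isinstance(node, Scan):
        return f"{pad}Scan({node.table})"
    if isinstance(node, Filter):
        head = f"{pad}Filter({' AND '.join(node.predicates)})"
        return head + "\n" + render(node.input, depth + 1)
    if isinstance(node, Join):
        name, children = "Join", (node.left, node.right)
    else:
        name, children = "HashJoin", (node.build, node.probe)
    body = "\n".join(render(child, depth + 1) for child in children)
    return f"{pad}{name}({node.on})\n{body}"


logical = Join(
    Filter(Filter(Scan("orders"), ("total > 10",)), ("year = 2021",)),
    Scan("customers"),
    "customer_id",
)
optimized = merge_filters(logical)  # still a logical plan, but cheaper
physical = lower_joins(optimized)   # same IR, now consumer-specific

for stage in (logical, optimized, physical):
    print(render(stage), end="\n\n")
```

Because each pass returns a plan in the same representation it received, the passes can be applied incrementally and in any order, and `render` never needs to know which stage it is looking at.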