diff options
author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-21 11:54:28 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-21 11:54:28 +0000 |
commit | e6918187568dbd01842d8d1d2c808ce16a894239 (patch) | |
tree | 64f88b554b444a49f656b6c656111a145cbbaa28 /src/arrow/docs/source/developers/contributing.rst | |
parent | Initial commit. (diff) | |
download | ceph-e6918187568dbd01842d8d1d2c808ce16a894239.tar.xz ceph-e6918187568dbd01842d8d1d2c808ce16a894239.zip |
Adding upstream version 18.2.2.upstream/18.2.2
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'src/arrow/docs/source/developers/contributing.rst')
-rw-r--r-- | src/arrow/docs/source/developers/contributing.rst | 362 |
1 files changed, 362 insertions, 0 deletions
diff --git a/src/arrow/docs/source/developers/contributing.rst b/src/arrow/docs/source/developers/contributing.rst new file mode 100644 index 000000000..9b81a6ff1 --- /dev/null +++ b/src/arrow/docs/source/developers/contributing.rst @@ -0,0 +1,362 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +.. or more contributor license agreements. See the NOTICE file +.. distributed with this work for additional information +.. regarding copyright ownership. The ASF licenses this file +.. to you under the Apache License, Version 2.0 (the +.. "License"); you may not use this file except in compliance +.. with the License. You may obtain a copy of the License at + +.. http://www.apache.org/licenses/LICENSE-2.0 + +.. Unless required by applicable law or agreed to in writing, +.. software distributed under the License is distributed on an +.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +.. KIND, either express or implied. See the License for the +.. specific language governing permissions and limitations +.. under the License. + +.. _contributing: + +**************************** +Contributing to Apache Arrow +**************************** + +Thanks for your interest in the Apache Arrow project. Arrow is a large project +and may seem overwhelming when you're first getting involved. +Contributing code is great, but that's probably not the first place to start. +There are lots of ways to make valuable contributions to the project and +community. + +This page provides some orientation for how to get involved. It also offers +some recommendations on how to get best results when engaging with the +community. + +Code of Conduct +=============== + +All participation in the Apache Arrow project is governed by the ASF's +`Code of Conduct <https://www.apache.org/foundation/policies/conduct.html>`_. + +Join the mailing lists +====================== + +A good first step to getting involved in the Arrow project is to join the +mailing lists and participate in discussions where you can. +Projects in The Apache Software Foundation ("the ASF") use public, archived +mailing lists to create a public record of each project's development +activities and decision-making process. +While lacking the immediacy of chat or other forms of communication, +the mailing lists give participants the opportunity to slow down and be +thoughtful in their responses, and they help developers who are spread across +many timezones to participate more equally. + +See `the community page <https://arrow.apache.org/community/>`_ for links to +subscribe to the mailing lists and to view archives. + +Report bugs and propose features +================================ + +Using the software and sharing your experience is a very helpful contribution +itself. Those who actively develop Arrow need feedback from users on what +works and what doesn't. Alerting us to unexpected behavior and missing features, +even if you can't solve the problems yourself, help us understand and prioritize +work to improve the libraries. + +We use `JIRA <https://issues.apache.org/jira/projects/ARROW/issues>`_ +to manage our development "todo" list and to maintain changelogs for releases. +In addition, the project's `Confluence site <https://cwiki.apache.org/confluence/display/ARROW>`_ +has some useful higher-level views of the JIRA issues. + +To create a JIRA issue, you'll need to have an account on the ASF JIRA, which +you can `sign yourself up for <https://issues.apache.org/jira/secure/Signup!default.jspa>`_. +The JIRA server hosts bugs and issues for multiple Apache projects. The JIRA +project name for Arrow is "ARROW". + +You don't need any special permissions on JIRA to be able to create issues. +Once you are more involved in the project and want to do more on JIRA, such as +assign yourself an issue, you will need "Contributor" permissions on the +Apache Arrow JIRA. To get this role, ask on the mailing list for a project +maintainer's help. + +Tips for using JIRA ++++++++++++++++++++ + +Before you create a new issue, we recommend you first +`search <https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20resolution%20%3D%20Unresolved>`_ +among existing Arrow issues. + +When reporting a new issue, follow these conventions to help make sure the +right people see it: + +* Use the **Component** field to indicate the area of the project that your + issue pertains to (for example "Python" or "C++"). +* Also prefix the issue title with the component name in brackets, for example + ``[Python] issue name`` ; this helps when navigating lists of open issues, + and it also makes our changelogs more readable. Most prefixes are exactly the + same as the **Component** name, with the following exceptions: + + * **Component:** Continuous Integration β **Summary prefix:** [CI] + * **Component:** Developer Tools β **Summary prefix:** [Dev] + * **Component:** Documentation β **Summary prefix:** [Docs] + +* If you're reporting something that used to work in a previous version + but doesn't work in the current release, you can add the "Affects version" + field. For feature requests and other proposals, "Affects version" isn't + appropriate. + +Project maintainers may later tweak formatting and labels to help improve their +visibility. They may add a "Fix version" to indicate that they're considering +it for inclusion in the next release, though adding that tag is not a +commitment that it will be done in the next release. + +Tips for successful bug reports ++++++++++++++++++++++++++++++++ + +No one likes having bugs in their software, and in an ideal world, all bugs +would get fixed as soon as they were reported. However, time and attention are +finite, especially in an open-source project where most contributors are +participating in their spare time. All contributors in Apache projects are +volunteers and act as individuals, even if they are contributing to the project +as part of their job responsibilities. + +In order for your bug to get prompt +attention, there are things you can do to make it easier for contributors to +reproduce and fix it. +When you're reporting a bug, please help us understand the issue by providing, +to the best of your ability, + +* Clear, minimal steps to reproduce the issue, with as few non-Arrow + dependencies as possible. If there's a problem on reading a file, try to + provide as small of an example file as possible, or code to create one. + If your bug report says "it crashes trying to read my file, but I can't + share it with you," it's really hard for us to debug. +* Any relevant operating system, language, and library version information +* If it isn't obvious, clearly state the expected behavior and what actually + happened. + +If a developer can't get a failing unit test, they won't be able to know that +the issue has been identified, and they won't know when it has been fixed. +Try to anticipate the questions you might be asked by someone working to +understand the issue and provide those supporting details up front. + +Other resources: + +* `Mozilla's bug-reporting guidelines <https://developer.mozilla.org/en-US/docs/Mozilla/QA/Bug_writing_guidelines>`_ +* `Reprex do's and don'ts <https://reprex.tidyverse.org/articles/reprex-dos-and-donts.html>`_ + +Improve documentation +===================== + +A great way to contribute to the project is to improve documentation. If you +found some docs to be incomplete or inaccurate, share your hard-earned knowledge +with the rest of the community. + +Documentation improvements are also a great way to gain some experience with +our submission and review process, discussed below, without requiring a lot +of local development environment setup. In fact, many documentation-only changes +can be made directly in the GitHub web interface by clicking the "edit" button. +This will handle making a fork and a pull request for you. + +Contribute code +=============== + +Code contributions, or "patches", are delivered in the form of GitHub pull +requests against the `github.com/apache/arrow +<https://github.com/apache/arrow>`_ repository. + +Before starting ++++++++++++++++ + +You'll first need to select a JIRA issue to work on. Perhaps you're working on +one you reported yourself. Otherwise, if you're looking for something, +`search <https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20resolution%20%3D%20Unresolved>`_ +the open issues. Anything that's not in the "In Progress" state is fair game, +even if it is "Assigned" to someone, particularly if it has not been +recently updated. When in doubt, comment on the issue asking if they mind +if you try to put together a pull request; interpret no response to mean that +you're free to proceed. + +Please do ask questions, either on the JIRA itself or on the dev mailing list, +if you have doubts about where to begin or what approach to take. +This is particularly a good idea if this is your first code contribution, +so you can get some sense of what the core developers in this part of the +project think a good solution looks like. For best results, ask specific, +direct questions, such as: + +* Do you think $PROPOSED_APPROACH is the right one? +* In which file(s) should I be looking to make changes? +* Is there anything related in the codebase I can look at to learn? + +If you ask these questions and do not get an answer, it is OK to ask again. + +Pull request and review ++++++++++++++++++++++++ + +To contribute a patch: + +* Submit the patch as a GitHub pull request against the master branch. For a + tutorial, see the GitHub guides on `forking a repo <https://help.github.com/en/articles/fork-a-repo>`_ + and `sending a pull request <https://help.github.com/en/articles/creating-a-pull-request-from-a-fork>`_. + So that your pull request syncs with the JIRA issue, prefix your pull request + name with the JIRA issue id (ex: + `ARROW-767: [C++] Filesystem abstraction <https://github.com/apache/arrow/pull/4225>`_). +* Give the pull request a clear, brief description: when the pull request is + merged, this will be retained in the extended commit message. +* Make sure that your code passes the unit tests. You can find instructions how + to run the unit tests for each Arrow component in its respective README file. + +Core developers and others with a stake in the part of the project your change +affects will review, request changes, and hopefully indicate their approval +in the end. To make the review process smooth for everyone, try to + +* Break your work into small, single-purpose patches if possible. Itβs much + harder to merge in a large change with a lot of disjoint features, and + particularly if you're new to the project, smaller changes are much easier + for maintainers to accept. +* Add new unit tests for your code. +* Follow the style guides for the part(s) of the project you're modifying. + Some languages (C++ and Python, for example) run a lint check in + continuous integration. For all languages, see their respective developer + documentation and READMEs for style guidance. In general, try to make it look + as if the codebase has a single author, and emulate any conventions you see, + whether or not they are officially documented or checked. + +When tests are passing and the pull request has been approved by the interested +parties, a `committer <https://arrow.apache.org/committers/>`_ +will merge the pull request. This is done with a +command-line utility that does a squash merge, so all of your commits will be +registered as a single commit to the master branch; this simplifies the +connection between JIRA issues and commits, makes it easier to bisect +history to identify where changes were introduced, and helps us be able to +cherry-pick individual patches onto a maintenance branch. + +A side effect of this way of +merging is that your pull request will appear in the GitHub interface to have +been "closed without merge". Do not be alarmed: if you look at the bottom, you +will see a message that says ``@user closed this in $COMMIT``. In the commit +message of that commit, the merge tool adds the pull request description, a +link back to the pull request, and attribution to the contributor and any +co-authors. + +Local git conventions ++++++++++++++++++++++ + +If you are tracking the Arrow source repository locally, here are some tips +for using ``git``. + +All Arrow contributors work off of their personal fork of ``apache/arrow`` +and submit pull requests "upstream". Once you've cloned your fork of Arrow, +be sure to:: + + $ git remote add upstream https://github.com/apache/arrow + +to set the "upstream" repository. + +You are encouraged to develop on branches, rather than your own "master" branch, +and it helps to keep your fork's master branch synced with ``upstream/master``. + +To start a new branch, pull the latest from upstream first:: + + $ git fetch upstream + $ git checkout master + $ git pull --ff-only upstream master + $ git checkout -b $BRANCH + +It does not matter what you call your branch. Some people like to use the JIRA +number as branch name, others use descriptive names. + +Once you have a branch going, you should sync with ``upstream/master`` +regularly, as many commits are merged to master every day. +It is recommended to use ``git rebase`` rather than ``git merge``. +To sync your local copy of a branch, you may do the following:: + + $ git pull upstream $BRANCH --rebase + +This will rebase your local commits on top of the tip of ``upstream/$BRANCH``. In case +there are conflicts, and your local commit history has multiple commits, you may +simplify the conflict resolution process by squashing your local commits into a single +commit. Preserving the commit history isn't as important because when your +feature branch is merged upstream, a squash happens automatically. If you choose this +route, you can abort the rebase with:: + + $ git rebase --abort + +Following which, the local commits can be squashed interactively by running:: + + $ git rebase --interactive ORIG_HEAD~n + +Where ``n`` is the number of commits you have in your local branch. After the squash, +you can try the merge again, and this time conflict resolution should be relatively +straightforward. + +If you set the following in your repo's ``.git/config``, the ``--rebase`` option can be +omitted from the ``git pull`` command, as it is implied by default. :: + + [pull] + rebase = true + +Once you have an updated local copy, you can push to your remote repo. Note, since your +remote repo still holds the old history, you would need to do a force push. :: + + $ git push --force origin branch + +*Note about force pushing to a branch that is being reviewed:* if you want reviewers to +look at your updates, please ensure you comment on the PR on GitHub as simply force +pushing does not trigger a notification in the GitHub user interface. + +Also, once you have a pull request up, be sure you pull from ``origin`` +before rebasing and force-pushing. Arrow maintainers can push commits directly +to your branch, which they sometimes do to help move a pull request along. +In addition, the GitHub PR "suggestion" feature can also add commits to +your branch, so it is possible that your local copy of your branch is missing +some additions. + +.. include:: experimental_repos.rst + +Guidance for specific features +============================== + +From time to time the community has discussions on specific types of features +and improvements that they expect to support. This section outlines decisions +that have been made in this regard. + +Endianness +++++++++++ + +The Arrow format allows setting endianness. Due to the popularity of +little endian architectures most of implementation assume little endian by +default. There has been some effort to support big endian platforms as well. +Based on a `mailing-list discussion +<https://mail-archives.apache.org/mod_mbox/arrow-dev/202009.mbox/%3cCAK7Z5T--HHhr9Dy43PYhD6m-XoU4qoGwQVLwZsG-kOxXjPTyZA@mail.gmail.com%3e>`__, +the requirements for a new platform are: + +1. A robust (non-flaky, returning results in a reasonable time) Continuous + Integration setup. +2. Benchmarks for performance critical parts of the code to demonstrate + no regression. + +Furthermore, for big-endian support, there are two levels that an +implementation can support: + +1. Native endianness (all Arrow communication happens with processes of the + same endianness). This includes ancillary functionality such as reading + and writing various file formats, such as Parquet. +2. Cross endian support (implementations will do byte reordering when + appropriate for :ref:`IPC <format-ipc>` and :ref:`Flight <flight-rpc>` + messages). + +The decision on what level to support is based on maintainers' preferences for +complexity and technical risk. In general all implementations should be open +to native endianness support (provided the CI and performance requirements +are met). Cross endianness support is a question for individual maintainers. + +The current implementations aiming for cross endian support are: + +1. C++ + +Implementations that do not intend to implement cross endian support: + +1. Java + +For other libraries, a discussion to gather consensus on the mailing-list +should be had before submitting PRs. |