Diffstat (limited to 'src/jaegertracing/thrift/doc')
24 files changed, 3784 insertions, 0 deletions
diff --git a/src/jaegertracing/thrift/doc/ReleaseManagement.md b/src/jaegertracing/thrift/doc/ReleaseManagement.md new file mode 100644 index 000000000..362fd8315 --- /dev/null +++ b/src/jaegertracing/thrift/doc/ReleaseManagement.md @@ -0,0 +1,419 @@ +# Apache Thrift Release Management + +Instructions for preparing and distributing a release of Apache Thrift are fairly complex. These procedures are documented here, and we're working to automate as much of this as possible. There are few projects like ours that integrate with 28 programming languages. Given the extreme number of package management systems that Apache Thrift integrates with (compared to perhaps any), part of the burden of releasing Apache Thrift is to manually package and upload some of these [language-specific packages](http://apache.thrift.org/libraries). + +It is important to note here that Apache Thrift is designed for version interoperability, so one can use a version 0.7.0 client with a 0.12.0 server. A particular version number does not make any guarantees as to the features available in any given language. See the [Language Feature Matrix](https://github.com/apache/thrift/blob/master/LANGUAGES.md) to learn more. + +## Concepts + +### Versioning + +Apache Thrift and the vast majority of package management systems out there conform to the [SemVer 2.0](https://semver.org/spec/v2.0.0.html) version numbering specification. Apache Thrift uses the following versioning rules: + +- *major* is currently always zero; +- *minor* is increased for each release cycle; +- *patch* is increased for patch builds between release cycles to address critical defect, security, or packaging issues + +Further, if there are only packaging changes for a single third-party distribution point to correct an issue, the major.minor.patch may remain the same while adding a suffix compatible with that distribution point, for example "0.12.0.1" for nuget, or "0.12.0-1" for maven. 
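The versioning rules above can be sketched in a few lines of Python. This is only an illustration, not part of the Thrift tooling: `ThriftVersion`, `parse_version` and `same_release` are hypothetical names, and the pattern assumes the packaging suffix is always a plain number joined with `.` (nuget style) or `-` (maven style), as in the two examples given.

```python
import re
from typing import NamedTuple, Optional


class ThriftVersion(NamedTuple):
    """Hypothetical container for a parsed version string."""
    major: int
    minor: int
    patch: int
    packaging: Optional[int]  # distribution-specific suffix, e.g. 1 in "0.12.0.1"


# major.minor.patch, optionally followed by ".N" or "-N" (assumed suffix forms)
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:[.-](\d+))?$")


def parse_version(text: str) -> ThriftVersion:
    """Split a version string into its numeric parts."""
    match = _VERSION_RE.match(text)
    if match is None:
        raise ValueError(f"not a recognized version string: {text!r}")
    major, minor, patch, packaging = match.groups()
    return ThriftVersion(
        int(major),
        int(minor),
        int(patch),
        int(packaging) if packaging is not None else None,
    )


def same_release(a: str, b: str) -> bool:
    """True when two package versions share the same major.minor.patch."""
    return parse_version(a)[:3] == parse_version(b)[:3]
```

Under these assumptions, `same_release("0.12.0.1", "0.12.0-1")` treats the nuget and maven repackagings as belonging to the same Apache Thrift release, which is the intent of the suffix rule.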
+ +#### External Package Patches + +It is common to have language-specific critical defects or packaging errors that need to be resolved between releases of Apache Thrift. The project handles these on a case-by-case basis for languages that have their own [package management systems](http://apache.thrift.org/libraries). When a language-specific patch is made, the patch level of the distribution pushed to the external package manager is bumped. + + As such, there may be cases between Apache Thrift releases where there are (for example) a `0.12.1` and `0.12.2` version of a Haskell Hackage package, and perhaps also a `0.12.3` version of a dlang dub package. You will not find a tag or an official project release in these cases; however, the code changes will be reflected in the release branch and in master. In these cases we would not release a version of Apache Thrift nor would we refresh all the external language packages. + +#### Version in the master branch + +The master branch will always contain the next anticipated release version. When a release cycle begins, a branch is cut from master. The release branch will already have all of the correct versions, and therefore release branches can be easily merged back into master. (This was not true of releases before 0.12.0). + +### Code Repository + +The authoritative repository for Apache Thrift is stored in [GitHub](https://github.com/apache/thrift). It is mirrored by [GitBox](https://gitbox.apache.org/repos/asf?p=thrift.git). + +### Branches + +All code (submitted via pull request or direct push) is committed to the `master` branch. Until version 1.0 of Apache Thrift each release branch was named `<version>`, for example in version `0.12.0` there is a branch named the same. For version 1.0 and beyond, releases will have a branch named `release/<version>`. + +### Tags + +Up to version `0.12.0` each release of Apache Thrift was tagged with a `<version>` tag. 
Starting with the `0.12.0` release, each release of Apache Thrift will be tagged with a `v<version>` tag to satisfy external package management tools (such as ones for dlang and golang). For example the tag of version `0.12.0` is `v0.12.0`. + +## Release Procedures + +### Release Schedule + +Apache Thrift has no official release schedule, however the project aims to release at least twice per year. + +A complete release cycle will take about 1 week to complete, if things go well, with half of that time waiting for a vote. + +### Release Manager + +Before a release cycle begins, someone must nominate themselves on the development mailing list as the release manager for that release. In order to be a release manager you must meet the following criteria: + +1. You are a [member](http://people.apache.org/phonebook.html?pmc=thrift) of the Apache PMC group. +1. Your profile at https://id.apache.org/ is valid and contains a PGP key. If it does not, see the [Apache OpenPGP Instructions](https://www.apache.org/dev/openpgp.html). If your PGP private key creation seems to hang indefinitely while creating entropy, try these fixes: + - Generate disk I/O with: `dd if=/dev/sda of=/dev/zero` + - Install the `rng-tools` package. +1. Your PGP key is visible in the [Apache Committer Keys](http://people.apache.org/keys/committer/) for code signing. This list is updated periodically from your Apache ID (see previous step). +1. You have read and agree with the contents of the [ASF Release Distribution Policy](https://www.apache.org/dev/release-distribution.html). +1. You have access and the ability to use subversion. All distribution artifacts are released through a subversion commit. +1. You can build in the Linux Docker Container, and you have Visual Studio 2017. +1. You have sufficient time to complete a release distribution. + +### Release Candidate + +All Apache Thrift releases go through a 72-hour final release candidate voting procedure. 
Votes from members of the Apache Thrift PMC are binding, and all others are non-binding. For these examples, the `master` branch is at version 1.0.0 and that is the next release. + +1. Scrub the Apache Jira backlog. There are several things to do: + + 1. [Open Issues without a Component](https://issues.apache.org/jira/issues/?filter=-1&jql=project%20%3D%20THRIFT%20and%20status%20!%3D%20Closed%20and%20component%20is%20empty) - make sure everything has an assigned component, as the release notes are grouped together by language. + + 1. [Open Issues with a Fix Version](https://issues.apache.org/jira/issues/?filter=-1&jql=project%20%3D%20THRIFT%20and%20status%20in%20(OPEN%2C%20%27IN%20PROGRESS%27%2C%20REOPENED)%20and%20fixVersion%20is%20not%20empty) - these will be issues that someone placed a fixVersion on in Jira, but have not been resolved or closed yet. They are likely stale somehow. Resolutions for these issues include resolving or closing the issue in Jira, or simply removing the fixVersion if the issue hasn't been fixed. + + 1. [Open Blocking Issues](https://issues.apache.org/jira/issues/?filter=-1&jql=project%20%3D%20THRIFT%20and%20priority%20in%20(blocker)%20and%20status%20not%20in%20(closed)%20order%20by%20component%20ASC) - blocking issues should block a release. Scrub the list to see if they are really blocking the release, and if not change their priority. + + 1. [Open Critical Issues](https://issues.apache.org/jira/issues/?filter=-1&jql=project%20%3D%20THRIFT%20and%20priority%20in%20(critical)%20and%20status%20not%20in%20(closed)%20and%20type%20not%20in%20(%22wish%22)%20order%20by%20component%20ASC) - this list will end up in the known critical issues list in the changes file. Scrub it to make sure everything is actually critical. + + It is healthy to scrub these periodically, whether or not you are making a new release. + +1. Check that the version number in the `master` branch matches the version number of the upcoming release. 
To check the `master` branch version, run: + + ```bash + thrift$ grep AC_INIT configure.ac | cut -d'[' -f3 | cut -d']' -f1 + 1.0.0 + ``` + + If it does not match (this should be extremely rare), you need to submit a pull request setting the `master` branch to the desired version of the upcoming release. In the following example, we prepare to commit a branch where the version number is changed from `1.0.0` to `1.1.0`: + + ```bash + thrift$ git checkout -b fix-version-for-release + thrift$ build/veralign.sh 1.0.0 1.1.0 + # check to see if any of the manually modified files needs changes + thrift$ git push ... # make a pull request + ``` + +1. Create a release branch for the release, in this example `1.0.0`: + + ```bash + thrift$ git checkout master + thrift$ git pull + thrift$ git checkout -b "release/1.0.0" + thrift$ git push + ``` + + Now there is a `release/1.0.0` branch in GitHub for Apache Thrift. + + By creating a release branch we allow work to continue on the `master` branch for the next release while we finalize this one. Note that `release/1.0.0` and `master` in this example are now identical, and therefore it is possible to merge the release branch back into `master` at the end of the release! + +1. Modify these files manually, inserting the release into them at the appropriate location. Follow existing patterns in each file: + - `doap.rdf` + - `debian/changelog` + +1. Generate the content for `CHANGES.md` - this is one of the most time-consuming parts of the release cycle. It is a lot of work, but the result is well worth it to the consumers of Apache Thrift: + + 1. Find all [Issues Fixed but not Closed in 1.0.0](https://issues.apache.org/jira/issues/?filter=-1&jql=project%20%3D%20thrift%20and%20fixVersion%20%3D%201.0.0%20and%20status%20!%3D%20closed) (adjust the version in the link to suit your needs). + + 1. Export the list of issues to a CSV (Current Fields) and open in Excel (or a similar spreadsheet). + + 1. 
Hide all columns except for the issue id (i.e. THRIFT-nnnn), the component (first one), and the summary. + + 1. Sort by component ascending and then by id ascending. + + 1. Create a fourth column that will contain the contents of each line that goes into the release notes. Once you have the formula working in one cell paste it into the other rows to populate them. Use a formula to get the column to look like this: + + ```vcol + Issue Component Summary RelNote + THRIFT-123 C++ - Library Drop C++03 [THRIFT-123](https://issues.apache.org/jira/browse/THRIFT-123) - Drop C++03 + ``` + + For example, if the row above was row "B" in Excel it would look something like: + + ```text + =CONCAT("[", B1, "](", + "https://issues.apache.org/jira/browse/", + B1, ") - ", B3) + ``` + + 1. Create a level 3 section in `CHANGES.md` under the release for each component and copy the items from the RelNote column into the changes file. + + 1. Find all [Open Critical Issues](https://issues.apache.org/jira/issues/?filter=-1&jql=project%20%3D%20THRIFT%20and%20priority%20in%20(critical)%20and%20status%20not%20in%20(closed)%20and%20type%20not%20in%20(%22wish%22)%20order%20by%20component%20ASC) and add them to `CHANGES.md` in the list of known critical issues for the release. + +1. Commit all changes to the release branch. + +1. Generate the source tarball. + + 1. On a linux system get a clean copy of the release branch, for example: + + ```bash + ~$ git clone -b "release/1.0.0" git@github.com:apache/thrift.git thrift-1.0.0-src + ``` + + 1. In the clean copy of the release branch, start a docker build container and run `make dist`: + + ```code + ~$ cd thrift-1.0.0-src + ~/thrift-1.0.0-src$ docker run -v $(pwd):/thrift/src:rw \ + -it thrift/thrift-build:ubuntu-bionic /bin/bash + root@8b4101188aa2:/thrift/src# ./bootstrap.sh && ./configure && make dist + ``` + + The result will be a file named `thrift-1.0.0.tar.gz`. Check the size and make sure it is roughly 4MB. 
It could get larger over time, but it shouldn't jump by orders of magnitude. Once satisfied you can exit the docker container with `exit`. + + 1. Generate signatures and checksums for the tarball: + + ```bash + gpg --armor --output thrift-1.0.0.tar.gz.asc --detach-sig thrift-1.0.0.tar.gz + md5sum thrift-1.0.0.tar.gz > thrift-1.0.0.tar.gz.md5 + sha1sum thrift-1.0.0.tar.gz > thrift-1.0.0.tar.gz.sha1 + sha256sum thrift-1.0.0.tar.gz > thrift-1.0.0.tar.gz.sha256 + ``` + +1. Generate the Windows Thrift Compiler. This is a statically linked compiler that is portable and folks find it useful to be able to download one, especially if they are using third-party distributed runtime libraries for interpreted languages on Windows. There are two ways to generate this: + + - Using a Development VM + + 1. On a Windows machine with Visual Studio, pull down the source code and check out the release branch. + 1. Open an x64 Native Tools Command Prompt for VS 2017 and create an out-of-tree build directory. + 1. Install the latest version of cmake. + 1. Install chocolatey and install winflexbison with chocolatey. + 1. Run cmake to generate an out-of-tree build environment: + ```cmd + C:\build> cmake ..\thrift -DBISON_EXECUTABLE=c:\ProgramData\chocolatey\lib\winflexbison\tools\win_bison.exe -DFLEX_EXECUTABLE=c:\ProgramData\chocolatey\lib\winflexbison\tools\win_flex.exe -DWITH_MT=ON -DWITH_SHARED_LIB=OFF -DWITH_CPP=OFF -DWITH_JAVA=OFF -DWITH_HASKELL=OFF -DWITH_PYTHON=OFF -DWITH_C_GLIB=OFF -DBUILD_TESTING=OFF -DBUILD_TUTORIALS=OFF -DBUILD_COMPILER=ON + C:\build> cmake --build . --config Release + ``` + + - Using [Docker for Windows](../build/docker/msvc2017/README.md), follow the instructions for building the compiler. + - In both cases: + 1. Verify the executable only depends on kernel32.dll using [depends.exe](http://www.dependencywalker.com/). + 1. 
Copy the executable `thrift.exe` to your linux system where the signed tarball lives and rename it to `thrift-1.0.0.exe` (substitute the correct version, of course). + 1. Sign the executable the same way you signed the tarball. + +1. Upload the release artifacts to the Apache Dist/Dev site. This requires subversion: + + ```bash + ~$ mkdir -p dist/dev + ~$ cd dist/dev + ~/dist/dev$ svn co "https://dist.apache.org/repos/dist/dev/thrift" thrift + ~/dist/dev$ cd thrift + ``` + + Copy the tarball, windows compiler executable, and 8 additional signing files into a new directory for the release: + + ``` bash + ~/dist/dev/thrift$ mkdir 1.0.0-rc0 + # copy the files into the directory + ~/dist/dev/thrift$ svn add 1.0.0-rc0 + ``` + + The layout of the files should match the [current release](https://www.apache.org/dist/thrift/). Once done, add the release candidate and check it in: + + ```bash + ~/dist/dev/thrift$ svn status + # verify everything is correct + ~/dist/dev/thrift$ svn commit -m "Apache Thrift 1.0.0-rc0 in dist dev" \ + --username <apache-username> --password <apache-password> + ``` + +1. Verify the release candidate artifacts are available at: + + [https://dist.apache.org/repos/dist/dev/thrift/](https://dist.apache.org/repos/dist/dev/thrift/) + +1. 
Send a voting announcement message to `dev@thrift.apache.org` following this template as a guide: + + ```code + To: dev@thrift.apache.org + Subject: [VOTE] Apache Thrift 1.0.0-rc0 release candidate + --- + All, + + I propose that we accept the following release candidate as the official Apache Thrift 1.0.0 release: + + https://dist.apache.org/repos/dist/dev/thrift/1.0.0-rc0/thrift-1.0.0-rc0.tar.gz + + The release candidate was created from the release/1.0.0 branch and can be cloned using: + + git clone -b release/1.0.0 https://github.com/apache/thrift.git + + The release candidates GPG signature can be found at: + https://dist.apache.org/repos/dist/dev/thrift/1.0.0-rc0/thrift-1.0.0-rc0.tar.gz.asc + + The release candidates checksums are: + md5: + sha1: + sha256: + + + A prebuilt statically-linked Windows compiler is available at: + https://dist.apache.org/repos/dist/dev/thrift/1.0.0-rc0/thrift-1.0.0-rc0.exe + + Prebuilt statically-linked Windows compiler GPG signature: + https://dist.apache.org/repos/dist/dev/thrift/1.0.0-rc0/thrift-1.0.0-rc0.exe.asc + + Prebuilt statically-linked Windows compiler checksums are: + md5: + sha1: + sha256: + + + The source tree as ZIP file to be published via Github releases: + https://dist.apache.org/repos/dist/dev/thrift/1.0.0-rc0/thrift-1.0.0-rc0.zip + + ZIP source tree GPG signature: + https://dist.apache.org/repos/dist/dev/thrift/1.0.0-rc0/thrift-1.0.0-rc0.zip.asc + + ZIP source tree checksums are: + md5: + sha1: + sha256: + + The CHANGES list for this release is available at: + https://github.com/apache/thrift/blob/release/1.0.0/CHANGES.md + + + Please download, verify sig/sum, install and test the libraries and languages of your choice. + + This vote will close in 72 hours on 2019-07-06 21:00 UTC + + [ ] +1 Release this as Apache Thrift 1.0.0 + [ ] +0 + [ ] -1 Do not release this as Apache Thrift 1.0.0 because... + ``` + +1. 
If any issues are brought up with the release candidate, you will need to package another and reset the voting clock. + +Voting on the development mailing list provides additional benefits (wisdom from [Christopher Tubbs](https://issues.apache.org/jira/browse/THRIFT-4506?focusedCommentId=16791902&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16791902)): +- It creates a public record for the vote, +- It allows for participation/evaluation from our wider user audience (more diversity in evaluators improves quality), and +- It provides more entry points for potential future committers/PMC members to earn merit through participation. + +### Official Release + +1. Send a message to `dev@thrift.apache.org` with the voting results. Use this template as a guide: + + ```code + To: dev@thrift.apache.org + Subject: [VOTE][RESULT] Release Apache Thrift 1.0.0 + --- + All, + + Including my own vote of +1 we have N binding +1 and no -1. + The vote for the Apache Thrift 1.0.0 release is ***successful***. + Thank you to all who helped test and verify. + ``` + +1. Use svn to check out the release part of thrift (similar to dev) and copy the files over from dev, matching the previous release structure: + + ```bash + ~$ mkdir -p dist/release + ~$ cd dist/release + ~/dist/release$ svn co "https://dist.apache.org/repos/dist/release/thrift" thrift + ~/dist/release$ cd thrift + ~/dist/release/thrift$ mkdir 1.0.0 + ~/dist/release/thrift$ cp -p ../../dev/thrift/1.0.0-rc0/* 1.0.0/ + ~/dist/release/thrift$ svn add 1.0.0 + ~/dist/release/thrift$ svn status + # verify everything is correct + ~/dist/release/thrift$ svn commit -m "Apache Thrift 1.0.0 official release" \ + --username <apache-username> --password <apache-password> + ``` + + **NOTE** Once you check in, you need to wait about a day for all the mirrors to update. You cannot send the announcement email or update the web site until the mirrors are updated. + +1. Create and push a tag for the release, for example "v1.0.0". 
+ + **NOTE:** All new releases must have the "v" prefix to satisfy third-party package managers (dlang dub, golang, etc.) + + **NOTE:** You **should** [sign the release tag](https://git-scm.com/book/en/v2/Git-Tools-Signing-Your-Work). Since you already have a GPG signing key for publishing the Apache Release, you want to [upload that key to your GitHub account](https://help.github.com/en/articles/adding-a-new-gpg-key-to-your-github-account). Once the key is known by GitHub you can sign the tag. + + ```bash + ~/thrift$ # make sure you are on the release branch + ~/thrift$ git checkout release/1.0.0 + ~/thrift$ git pull + ~/thrift$ git tag -s v1.0.0 -m "Version 1.0.0" + ~/thrift$ git push --tags + ``` + + **NOTE:** If you get the error "gpg failed to sign the data" when tagging, try this fix: "export GPG_TTY=$(tty)" + +1. Create a new release from the [GitHub Tags Page](https://github.com/apache/thrift/tags). Attach the statically built Windows thrift compiler as a binary here. + +1. Merge the release branch into master. This ensures all changes made to fix up the release are in master. + + ```bash + ~/thrift$ git checkout master + ~/thrift$ git pull + ~/thrift$ git merge release/1.0.0 + ``` + + The merge of 1.0.0 into master should proceed as a fast-forward, since the 1.0.0 release branch was cut from master. If there are discrepancies the best thing to do is resolve them and then submit a pull request. This pull request must be *MERGED* and not *REBASED* after the CI build is successful. You may want to do this yourself and mark the pull request as `[DO NOT MERGE]`. + +1. Update the ASF CMS content for thrift to include the new release. Note over time we will retire this in favor of including all documentation in the GitHub repository. The page with the variables that are important like the current release or distribution links is in trunk/lib/path.pm in the ASF CMS for thrift. + + 1. Go to the [ASF CMS for Thrift](https://cms.apache.org/thrift/). + 1. Get a working copy. + 1. 
On the top right, click on `trunk`. + 1. Navigate into `lib`. + 1. Open `path.pm`. + 1. Edit + 1. Change `current_release` and `current_release_date` to reflect the correct information. + 1. Submit + 1. Commit + 1. Submit + 1. Follow Staging Build until it completes. + 1. Open the Staged site. + 1. Ensure the download links work. + 1. Publish Site. + +1. Make an announcement on the dev@ and user@ mailing lists of the release. There's no template to follow, but you can point folks to the official web site at https://thrift.apache.org, and to the GitHub site at https://github.com/apache/thrift. + +### Post-Release + +1. Visit https://reporter.apache.org/addrelease.html?thrift and register the release. You will get an automated reminder as the one who committed into dist. This informs the Apache Board of Directors of releases through project reports. + +1. Create a local branch to bump the release number to the next anticipated release: + + ```bash + ~/thrift$ git checkout -b bump-master + ~/thrift$ build/veralign.sh 1.0.0 1.1.0 + ``` + + The veralign script will set the version number in all of the language packaging files and headers. You do not need to worry about the manually modified files at this time. You should however ensure everything is correct by looking at the diff. + +1. Create a pull request to advance master to the next anticipated release. + +1. In Apache Jira, select all tickets where the fix version is the release and the status is not closed ([example](https://issues.apache.org/jira/issues/?jql=project%20%3D%20THRIFT%20AND%20fixVersion%20%3D%201.0%20%20and%20status%20!%3D%20Closed)) and use the bulk editing tool to close them. +1. **FIXME** Ask someone with admin access to Apache Jira to change the fixVersion in question from unreleased to released, for example: + https://issues.apache.org/jira/browse/THRIFT-4686 + +1. 
Ensure that the [Jira release page](https://issues.apache.org/jira/projects/THRIFT?selectedItem=com.atlassian.jira.jira-projects-plugin%3Arelease-page&status=unreleased) for the version has the same number of issues in the version as issues done, and that there are no issues in progress and no issues to do, and no warnings. Finally, mark it as released and set the date of the release. + +* [Report any CVEs](https://apache.org/security/committers.html) that were fixed. You can email `security@apache.org` if you are not sure if there are any CVEs to report. + +#### Third Party Package Managers + +See https://thrift.apache.org/lib/ for the current status of each external package manager's distribution. Information below is from the 0.12.0 release: + + > This section needs to be updated with detailed instructions for each language, or pointers to the README.md files in each language directory with detailed release instructions for the given package management system. + +* [dart] Releasing this requires a google account. + * You will need to install the same version of dart that is used in the docker image. + * Go into lib/dart and run "pub publish --dry-run" and resolve any warnings. + * Run "pub publish" and go through the google account authorization to allow it. +* [dlang] Within a day, the dlang dub site https://code.dlang.org/packages/apache-thrift?tab=info + should pick up the release based on the tag. No action needed. +* [haskell] https://hackage.haskell.org/package/thrift + https://jira.apache.org/jira/browse/THRIFT-4698 +* [npmjs] @jfarrell is the only one who can do this right now. + https://issues.apache.org/jira/browse/THRIFT-4688 +* [perl] A submission to CPAN is necessary (normally jeking3 does this): + * Checkout the release branch or tag on a linux system. + * Fire up the docker build container. + * Run "make clean" and remove any gen-perl directories. + * Inside `lib/perl` run the script `build-cpan-dist.sh`. + * Upload the resulting package. 
If there's a mistake that needs to be corrected, + increase the suffix. (_1, _2, ...) and upload another. You cannot replace a release on CPAN. +* [php] @jfarrell, @bufferoverflow, @jeking3 are the only ones who can do this right now. + * Once the release is tagged, one just has to hit the "Update" button to pick it up. +* [pypi] @jfarrell is the only one who can do this right now. + https://issues.apache.org/jira/browse/THRIFT-4687 +* [rust] Any thrift project committer is allowed to upload a new crate. + +If you have any questions email `dev@thrift.apache.org`. diff --git a/src/jaegertracing/thrift/doc/coding_standards.md b/src/jaegertracing/thrift/doc/coding_standards.md new file mode 100644 index 000000000..308100ab0 --- /dev/null +++ b/src/jaegertracing/thrift/doc/coding_standards.md @@ -0,0 +1,48 @@ +# Thrift Coding Standards + + Any fool can write code that a computer can understand. + Good programmers write code that humans can understand. + -- Martin Fowler, 1999 + +The purpose of this document is to make everyone's life easier. + +It's easier when you read good, well formatted, with clearly defined purpose, code. +But the only way to read clean code is to write such. + +This document can help achieve that, but keep in mind that +those are not silver-bullet, fix-all-at-once rules. Just think about readability while writing code. +Write code like you would have to read it in ten years from now. + +## General Coding Standards + +Thrift has some history. Not all existing code follows those rules. +But we want to improve over time. +When making small change / bugfix - like single line fix - do *not* refactor whole function. +That disturbs code repository history. +Whenever adding something new and / or making bigger refactoring + - follow those rules as strictly as you can. + +When in doubt - contact other developers (using dev@ mailing list or IRC). +Code review is the best way to improve readability. 
+ +### Basics + * Use spaces not tabs + * Use only ASCII characters in file and directory names + * Commit to repository using Unix-style line endings (LF) + On Windows: + git config core.autocrlf true + * Maximum line width - 100 characters + * If not specified otherwise in language specific standard - use 2 spaces as indent/tab + +### Comments + * Each file has to start with comment containing [Apache License](http://www.apache.org/licenses/LICENSE-2.0) + * Public API of library should be documented, preferably using format native for language specific documentation generation tools (Javadoc, Doxygen etc.) + * Other comments are discouraged - comments are lies. When one has to make comment it means one failed to write readable code. Instead of "I should write a comment here" think "I should clean it up" + * Do not leave "TODO/FIXME" comments - file [Jira](http://issues.apache.org/jira/browse/THRIFT) issue instead + +### Naming + Finding proper names is the most important and most difficult task in software development. + +## Language Specific Coding Standards + +For detailed information see `lib/LANG/coding_standards.md` diff --git a/src/jaegertracing/thrift/doc/committers.md b/src/jaegertracing/thrift/doc/committers.md new file mode 100644 index 000000000..2326711d9 --- /dev/null +++ b/src/jaegertracing/thrift/doc/committers.md @@ -0,0 +1,53 @@ +## Process used by committers to review and submit patches + +1. Make sure that there is an issue for the patch(s) you are about to commit in our [Jira issue tracker](http://issues.apache.org/jira/browse/THRIFT) + +1. Check out the latest version of the source code + + * git clone https://github.com/apache/thrift.git thrift + +1. Apply the patch + + * curl https://issues.apache.org/jira/... |git apply --ignore-space-change + + or + + * curl https://github.com/<GitHub User>/thrift/commit/<Commit ID>.patch |git apply --ignore-space-change + + +1. 
Inspect the applied patch to ensure that all [Legal aspects on Submission of Contributions (Patches)](http://www.apache.org/licenses/LICENSE-2.0.html#contributions) are met + +1. Run the necessary unit tests and cross language test cases to verify the patch + +1. Commit the patch + + git config user.name "Your Name" + git config user.email "YourApacheID@apache.org" + git add -A + git commit + +1. The commit message should be in the format: + + THRIFT-####:<Jira description> + Client: <component> + Patch: <Name of person contributing the patch> + + Description of what was fixed or addressed. + + If this is a GitHub pull request then add the comment below to automatically close the GitHub request, + where #NNNN is the PR number: + + This closes #NNNN + + +1. Double check the patch committed and that nothing was missed then push the patch + + git status + git show HEAD + git push origin master + + +1. Resolve the jira issue and set the following for the changelog + + * Component the patch is for + * fixVersion to the current version on master diff --git a/src/jaegertracing/thrift/doc/images/cgrn.png b/src/jaegertracing/thrift/doc/images/cgrn.png Binary files differ new file mode 100644 index 000000000..dc0964e0d --- /dev/null +++ b/src/jaegertracing/thrift/doc/images/cgrn.png diff --git a/src/jaegertracing/thrift/doc/images/cred.png b/src/jaegertracing/thrift/doc/images/cred.png Binary files differ new file mode 100644 index 000000000..086a5fbe9 --- /dev/null +++ b/src/jaegertracing/thrift/doc/images/cred.png diff --git a/src/jaegertracing/thrift/doc/images/credfull.png b/src/jaegertracing/thrift/doc/images/credfull.png Binary files differ new file mode 100644 index 000000000..ff66404ff --- /dev/null +++ b/src/jaegertracing/thrift/doc/images/credfull.png diff --git a/src/jaegertracing/thrift/doc/images/cyel.png b/src/jaegertracing/thrift/doc/images/cyel.png Binary files differ new file mode 100644 index 000000000..7c1dfc767 --- /dev/null +++ 
b/src/jaegertracing/thrift/doc/images/cyel.png diff --git a/src/jaegertracing/thrift/doc/images/thrift-layers.png b/src/jaegertracing/thrift/doc/images/thrift-layers.png Binary files differ new file mode 100644 index 000000000..c1accf409 --- /dev/null +++ b/src/jaegertracing/thrift/doc/images/thrift-layers.png diff --git a/src/jaegertracing/thrift/doc/install/README.md b/src/jaegertracing/thrift/doc/install/README.md new file mode 100644 index 000000000..071a5d64d --- /dev/null +++ b/src/jaegertracing/thrift/doc/install/README.md @@ -0,0 +1,43 @@ + +## Basic requirements +* A relatively POSIX-compliant *NIX system + * Cygwin or MinGW can be used on Windows (but there are better options, see below) +* g++ 4.2 +* boost 1.56.0 +* Runtime libraries for lex and yacc might be needed for the compiler. + +## Requirements for building from source +* GNU build tools: + * autoconf 2.65 + * automake 1.13 + * libtool 1.5.24 +* pkg-config autoconf macros (pkg.m4) +* lex and yacc (developed primarily with flex and bison) +* libssl-dev + +## Requirements for building the compiler from source on Windows +* Visual Studio C++ +* Flex and Bison (e.g. 
the WinFlexBison package) + +## Language requirements +These are only required if you choose to build the libraries for the given language + +* C++ + * Boost 1.56.0 + * libevent (optional, to build the nonblocking server) + * zlib (optional) +* Java + * Java 1.8 + * Gradle +* C#: Mono 1.2.4 (and pkg-config to detect it) or Visual Studio 2005+ +* Python 2.6 (including header files for extension modules) +* PHP 5.0 (optionally including header files for extension modules) +* Ruby 1.8 + * bundler gem +* Erlang R12 (R11 works but not recommended) +* Perl 5 + * Bit::Vector + * Class::Accessor +* Haxe 3.1.3 +* Go 1.4 +* Delphi 2010 diff --git a/src/jaegertracing/thrift/doc/install/centos.md b/src/jaegertracing/thrift/doc/install/centos.md new file mode 100644 index 000000000..18282a3a1 --- /dev/null +++ b/src/jaegertracing/thrift/doc/install/centos.md @@ -0,0 +1,75 @@ +# Building Apache Thrift on CentOS 6.5 + +Starting with a minimal installation, the following steps are required to build Apache Thrift on Centos 6.5. This example builds from source, using the current development master branch. These instructions should also work with Apache Thrift releases beginning with 0.9.2. + +## Update the System + + sudo yum -y update + +## Install the Platform Development Tools + + sudo yum -y groupinstall "Development Tools" + +## Upgrade autoconf/automake/bison + + sudo yum install -y wget + +### Upgrade autoconf + + wget http://ftp.gnu.org/gnu/autoconf/autoconf-2.69.tar.gz + tar xvf autoconf-2.69.tar.gz + cd autoconf-2.69 + ./configure --prefix=/usr + make + sudo make install + cd .. + +### Upgrade automake + + wget http://ftp.gnu.org/gnu/automake/automake-1.14.tar.gz + tar xvf automake-1.14.tar.gz + cd automake-1.14 + ./configure --prefix=/usr + make + sudo make install + cd .. + +### Upgrade bison + + wget http://ftp.gnu.org/gnu/bison/bison-2.5.1.tar.gz + tar xvf bison-2.5.1.tar.gz + cd bison-2.5.1 + ./configure --prefix=/usr + make + sudo make install + cd .. 
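The upgrades above are needed when the stock tools are older than the minimums listed in the install README (autoconf 2.65, automake 1.13, libtool 1.5.24, boost 1.56.0). As a rough sketch, a dotted-version comparison can tell whether an installed tool already qualifies. `MINIMUMS`, `version_key` and `meets_minimum` are hypothetical names for illustration, and the helper assumes purely numeric, dot-separated version strings.

```python
from typing import Tuple

# Minimum versions copied from the install README; the dict itself is illustrative.
MINIMUMS = {
    "autoconf": "2.65",
    "automake": "1.13",
    "libtool": "1.5.24",
    "boost": "1.56.0",
}


def version_key(version: str) -> Tuple[int, ...]:
    """Turn '1.5.24' into (1, 5, 24), padding short versions with zeros."""
    parts = [int(part) for part in version.split(".")]
    parts += [0] * (3 - len(parts))  # so "2.65" compares like "2.65.0"
    return tuple(parts)


def meets_minimum(tool: str, installed: str) -> bool:
    """True when the installed version is at least the documented minimum."""
    return version_key(installed) >= version_key(MINIMUMS[tool])
```

Comparing integer tuples rather than raw strings avoids the classic pitfall where `"1.9"` sorts after `"1.13"`. The installed version string would still have to be obtained separately, for example from the first line of a tool's `--version` output.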
+ +## Add Optional C++ Language Library Dependencies + +All languages require the Apache Thrift IDL Compiler and at this point everything needed to make the IDL Compiler is installed (if you only need the compiler you can skip to the Build step). + +If you will be developing Apache Thrift clients/servers in C++ you will also need additional packages to support the C++ shared library build. + +### Install C++ Lib Dependencies + + sudo yum -y install libevent-devel zlib-devel openssl-devel + +### Upgrade Boost >= 1.56 + + wget http://sourceforge.net/projects/boost/files/boost/1.56.0/boost_1_56_0.tar.gz + tar xvf boost_1_56_0.tar.gz + cd boost_1_56_0 + ./bootstrap.sh + sudo ./b2 install + +## Build and Install the Apache Thrift IDL Compiler + + git clone https://github.com/apache/thrift.git + cd thrift + ./bootstrap.sh + ./configure --with-lua=no + make + sudo make install + +This will build the compiler (thrift/compiler/cpp/thrift --version) and any language libraries supported. The make install step installs the compiler on the path: /usr/local/bin/thrift +You can use the ./configure --enable-libs=no switch to build the Apache Thrift IDL Compiler only without lib builds. To run tests use "make check". diff --git a/src/jaegertracing/thrift/doc/install/debian.md b/src/jaegertracing/thrift/doc/install/debian.md new file mode 100644 index 000000000..2ccc37da7 --- /dev/null +++ b/src/jaegertracing/thrift/doc/install/debian.md @@ -0,0 +1,62 @@ +## Debian/Ubuntu install +The following command will install tools and libraries required to build and install the Apache Thrift compiler and C++ libraries on a Debian/Ubuntu Linux based system. 
+ + sudo apt-get install automake bison flex g++ git libboost-all-dev libevent-dev libssl-dev libtool make pkg-config + +Debian 7/Ubuntu 12 users need to manually install a more recent version of automake and (for C++ library and test support) boost: + + wget http://ftp.debian.org/debian/pool/main/a/automake-1.15/automake_1.15-3_all.deb + sudo dpkg -i automake_1.15-3_all.deb + + wget http://sourceforge.net/projects/boost/files/boost/1.60.0/boost_1_60_0.tar.gz && tar xvf boost_1_60_0.tar.gz + cd boost_1_60_0 + ./bootstrap.sh + sudo ./b2 install + +## Optional packages + +If you would like to build Apache Thrift libraries for other programming languages you may need to install additional packages. The following languages require the specified additional packages: + + * Java + * packages: gradle + * You will also need Java JDK v1.8 or higher. Type **javac** to see a list of available packages, pick the one you prefer, and **apt-get install** it (e.g. default-jdk). + * Ruby + * ruby-full ruby-dev ruby-rspec rake rubygems bundler + * Python + * python-all python-all-dev python-all-dbg + * Perl + * libbit-vector-perl libclass-accessor-class-perl + * PHP + * php5-dev php5-cli phpunit + * C_glib + * libglib2.0-dev + * Erlang + * erlang-base erlang-eunit erlang-dev rebar + * Csharp + * mono-gmcs mono-devel libmono-system-web2.0-cil nunit nunit-console + * Haskell + * ghc cabal-install libghc-binary-dev libghc-network-dev libghc-http-dev + * Thrift Compiler for Windows + * mingw-w64 mingw-w64-x86-64-dev nsis + * Rust + * rustc cargo + * Haxe + * haxe + * Lua + * lua5.3 liblua5.3-dev + * NodeJs + * nodejs npm + * dotnetcore + * https://www.microsoft.com/net/learn/get-started/linuxubuntu + * d-lang + * curl -fsS https://dlang.org/install.sh | bash -s dmd + * dart & pub + * https://www.dartlang.org/install/linux + * https://www.dartlang.org/tools/pub/installing + + +## Additional reading + +For more information on the requirements see: [Apache Thrift 
Requirements](/docs/install) + +For more information on building and installing Thrift see: [Building from source](/docs/BuildingFromSource) diff --git a/src/jaegertracing/thrift/doc/install/os_x.md b/src/jaegertracing/thrift/doc/install/os_x.md new file mode 100644 index 000000000..2d99ef4a3 --- /dev/null +++ b/src/jaegertracing/thrift/doc/install/os_x.md @@ -0,0 +1,27 @@ +## OS X Setup +The following commands install all the required tools and libraries to build and install the Apache Thrift compiler on an OS X based system. + +### Install Boost +Download the Boost library from [boost.org](http://www.boost.org), untar, and compile with + + ./bootstrap.sh + sudo ./b2 threading=multi address-model=64 variant=release stage install + +### Install libevent +Download [libevent](http://monkey.org/~provos/libevent), untar, and compile with + + ./configure --prefix=/usr/local + make + sudo make install + +### Building Apache Thrift +Download the latest version of [Apache Thrift](/download), untar, and compile with + + ./configure --prefix=/usr/local/ --with-boost=/usr/local --with-libevent=/usr/local + +## Additional reading + +For more information on the requirements see: [Apache Thrift Requirements](/docs/install) + +For more information on building and installing Thrift see: [Building from source](/docs/BuildingFromSource) + diff --git a/src/jaegertracing/thrift/doc/install/windows.md b/src/jaegertracing/thrift/doc/install/windows.md new file mode 100644 index 000000000..8618934f8 --- /dev/null +++ b/src/jaegertracing/thrift/doc/install/windows.md @@ -0,0 +1,186 @@ +## Windows Setup + +The Thrift environment consists of two main parts: the Thrift compiler EXE and the language-dependent libraries. Most of these libraries will require some kind of build and/or installation. For the Thrift compiler utility, however, there are a number of different alternatives. 
+ +The first of these alternatives is to download the **pre-built Thrift Compiler EXE** and build only the needed libraries from source, following one of the "Setup from source" methods outlined below. + +The other two options build the Thrift compiler from source. The recommended way to do this is the **Visual Studio C++ build project**. Alternatively, the Thrift compiler can also be built in a **Cygwin** or **MinGW** build environment; however, this method is less convenient, more time-consuming, and requires much more manual effort. + + +## Prebuilt Thrift compiler + +The Windows Thrift compiler is available as a prebuilt EXE [here](/download). Note that there is no installation tool; this EXE file *is* already the Thrift compiler utility. Download the file and put it in a suitable location of your choice. + +Now pick one of the "Build and install target libraries" sections below to continue. + + +## Setup from source via Visual Studio C++ (recommended) + +### Requirements + +Thrift's compiler is written in C++ and designed to be portable, but there are some system requirements. Thrift's runtime libraries are written in various languages, which are also required for the particular language interface. + + * Visual Studio C++, any recent version should do + * Flex and Bison, e.g. the WinFlexBison package + * [Apache Thrift Requirements](/docs/install) + +### Build and install the compiler + +After all requirements are in place, use the `compiler/cpp/compiler.vcxproj` build project to build the Thrift compiler. Copy the resulting EXE file to a location of your choice. + +### Build and install target libraries + +A few of the target language libraries, such as C++ and C#, also provide Visual Studio project files. These are located in the `lib/<language>/` folders. + +Most of the language packages must be built and installed manually using build tools better suited to those languages. 
Typical examples are Java, Ruby, Delphi, or PHP. Look for the `README.md` file in the `lib/<language>/` folder for more details on how to build and install each language's library package. + + +## Setup from source via Cygwin + +### Requirements + +Thrift's compiler is written in C++ and designed to be portable, but there are some system requirements. Thrift's runtime libraries are written in various languages, which are also required for the particular language interface. + + * Cygwin or MinGW + * [Apache Thrift Requirements](/docs/install) + +### Installing from source + +If you are building for the first time out of the source repository, you will need to generate the configure scripts. (This is not necessary if you downloaded a tarball.) From the top directory, do: + + ./bootstrap.sh + +Once the configure scripts are generated, Thrift can be configured. From the top directory, do: + + export CXXFLAGS="-D PTHREAD_MUTEX_RECURSIVE_NP=PTHREAD_MUTEX_RECURSIVE" + ./configure + +Setting the CXXFLAGS environment variable works around compile errors with PTHREAD_MUTEX_RECURSIVE_NP being undeclared, by replacing it with the newer, portable PTHREAD_MUTEX_RECURSIVE. (Tested on cygwin 20100320, Thrift r760184, latest pthread.) + +**Optional:** You **may not** be able to make from the root Thrift directory due to errors (see below to resolve). To make the compiler only, change to the compiler directory before running make: + + cd compiler/cpp + +Now make the Thrift compiler (and the runtime libraries, if make is run from the Thrift root directory): + + make + make install + +### Build and install target libraries + +Some language packages must be installed manually using build tools better suited to those languages. Typical examples are Java, Ruby, or PHP. Look for the README file in the `lib/<language>/` folder for more details on the installation of each language library package. + +### Possible issues with Cygwin install + +See also Possible issues with MinGW install. 
+ +#### Syntax error in ./configure + +The following error occurs for some users when running ./configure: + + ./configure: line 21183: syntax error near unexpected token `MONO,' + ./configure: line 21183: ` PKG_CHECK_MODULES(MONO, mono >= 1.2.6, have_mono=yes, have_mono=no)' + +To resolve this, you'll need to find your pkg.m4 file (installed by the pkg-config package) and copy it to the thrift/aclocal directory. From the top-level thrift directory, you can copy the file by running + + cp /usr/share/aclocal/pkg.m4 aclocal + +Finally, re-run ./bootstrap.sh and ./configure. (Note that pkg.m4 is provided by the pkg-config tool. If your /usr/share/aclocal directory doesn't contain the pkg.m4 file, you may not have pkg-config installed.) + +#### Installing perl runtime libraries + +Sometimes, chmod fails with an error while the perl libraries are being installed. + +A workaround is to avoid installing the perl libraries if they are not needed. + +If you don't need perl, run configure with --without-perl. + +If you need perl, and are happy to manually install it, replace the contents of thrift/lib/perl/Makefile with the following, after building thrift: + + TODO + +#### Linking to installed C++ runtime libraries + +Sometimes, the installed libthrift.a will not link using g++, with linker errors about missing vtables and exceptions for Thrift classes. + +A workaround is to link the compiled object files directly from your Thrift build, corresponding to the missing classes. + +This can be implemented in a Makefile using the following lines: + + THRIFT_O=<path to>/thrift/lib/cpp + LTHRIFT=$(THRIFT_O)/Thrift.o $(THRIFT_O)/TSocket.o $(THRIFT_O)/TBinaryProtocol.o $(THRIFT_O)/TBufferTransports.o + +Then link using $(LTHRIFT) instead of -lthrift. + + TODO - diagnose issue further + +#### C++ runtime segfault with cygwin 1.7.5-1, g++-4.3.4, fork() and throw + +If your thrift C++ programs segfault on throw after fork()ing, compile them with g++-3. 
+ +The issue and patch are described on the Cygwin mailing list at http://cygwin.com/ml/cygwin/2010-05/msg00203.html + +This issue should be fixed in Cygwin versions after 1.7.5-1, or g++ 4.5.0. + +## Setup from source via MinGW + +### Requirements + +To compile the Thrift generator and runtime libraries (untested) without the cygwin.dll dependency you need to install MinGW (www.mingw.org). + + * MinGW + * [Apache Thrift Requirements](/docs/install) + +In addition, you need to add the following entry to your Windows PATH variable: + + C:\MINGW\BIN + +Next, open compiler/cpp/Makefile.am and add the following line to thrift_CXXFLAGS: + + -DMINGW -mno-cygwin -lfl + +Run bootstrap.sh: + + ./bootstrap.sh + +Make sure you have java in your $PATH variable; if not, run the following (adjust the path if necessary): + + export PATH=$PATH:"/cygdrive/c/program files/java/jre1.8.0_191/bin" + +Run configure, using CXXFLAGS to work around an issue with an old pthreads define (untested on MinGW, works on Cygwin): + + export CXXFLAGS="-D PTHREAD_MUTEX_RECURSIVE_NP=PTHREAD_MUTEX_RECURSIVE" + ./configure + +**Optional:** To make the compiler only, change to the compiler directory before running make: + + cd compiler/cpp + +Run make: + + mingw32-make.exe + +### Possible issues with MinGW install + +See also Possible issues with Cygwin install, including the discussion about PTHREAD_MUTEX_RECURSIVE_NP. 
+ +#### yywrap is not found + +Make sure you add -lfl to your cxxflags in the Makefile; also try adding -Lc:/cygwin/libs + +#### boost is not found + +Try changing the include directory to use the Windows path from C:, like this: edit compiler/cpp/Makefile, look for the declaration of BOOST_CPPFLAGS, and change that line to + + BOOST_CPPFLAGS = -Ic:/cygwin/usr/include/boost-1_53_0 + +#### realpath is not found + +Add -DMINGW -mno-cygwin to the CXXDEFS variable in the Makefile + +## Additional reading + +For more information on the requirements see: [Apache Thrift Requirements](/docs/install) + +For more information on building and installing Thrift see: [Building from source](/docs/BuildingFromSource) + diff --git a/src/jaegertracing/thrift/doc/licenses/lgpl-2.1.txt b/src/jaegertracing/thrift/doc/licenses/lgpl-2.1.txt new file mode 100644 index 000000000..5ab7695ab --- /dev/null +++ b/src/jaegertracing/thrift/doc/licenses/lgpl-2.1.txt @@ -0,0 +1,504 @@ + GNU LESSER GENERAL PUBLIC LICENSE + Version 2.1, February 1999 + + Copyright (C) 1991, 1999 Free Software Foundation, Inc. + 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + Everyone is permitted to copy and distribute verbatim copies + of this license document, but changing it is not allowed. + +[This is the first released version of the Lesser GPL. It also counts + as the successor of the GNU Library Public License, version 2, hence + the version number 2.1.] + + Preamble + + The licenses for most software are designed to take away your +freedom to share and change it. By contrast, the GNU General Public +Licenses are intended to guarantee your freedom to share and change +free software--to make sure the software is free for all its users. + + This license, the Lesser General Public License, applies to some +specially designated software packages--typically libraries--of the +Free Software Foundation and other authors who decide to use it. 
You +can use it too, but we suggest you first think carefully about whether +this license or the ordinary General Public License is the better +strategy to use in any particular case, based on the explanations below. + + When we speak of free software, we are referring to freedom of use, +not price. Our General Public Licenses are designed to make sure that +you have the freedom to distribute copies of free software (and charge +for this service if you wish); that you receive source code or can get +it if you want it; that you can change the software and use pieces of +it in new free programs; and that you are informed that you can do +these things. + + To protect your rights, we need to make restrictions that forbid +distributors to deny you these rights or to ask you to surrender these +rights. These restrictions translate to certain responsibilities for +you if you distribute copies of the library or if you modify it. + + For example, if you distribute copies of the library, whether gratis +or for a fee, you must give the recipients all the rights that we gave +you. You must make sure that they, too, receive or can get the source +code. If you link other code with the library, you must provide +complete object files to the recipients, so that they can relink them +with the library after making changes to the library and recompiling +it. And you must show them these terms so they know their rights. + + We protect your rights with a two-step method: (1) we copyright the +library, and (2) we offer you this license, which gives you legal +permission to copy, distribute and/or modify the library. + + To protect each distributor, we want to make it very clear that +there is no warranty for the free library. Also, if the library is +modified by someone else and passed on, the recipients should know +that what they have is not the original version, so that the original +author's reputation will not be affected by problems that might be +introduced by others. 
+ + Finally, software patents pose a constant threat to the existence of +any free program. We wish to make sure that a company cannot +effectively restrict the users of a free program by obtaining a +restrictive license from a patent holder. Therefore, we insist that +any patent license obtained for a version of the library must be +consistent with the full freedom of use specified in this license. + + Most GNU software, including some libraries, is covered by the +ordinary GNU General Public License. This license, the GNU Lesser +General Public License, applies to certain designated libraries, and +is quite different from the ordinary General Public License. We use +this license for certain libraries in order to permit linking those +libraries into non-free programs. + + When a program is linked with a library, whether statically or using +a shared library, the combination of the two is legally speaking a +combined work, a derivative of the original library. The ordinary +General Public License therefore permits such linking only if the +entire combination fits its criteria of freedom. The Lesser General +Public License permits more lax criteria for linking other code with +the library. + + We call this license the "Lesser" General Public License because it +does Less to protect the user's freedom than the ordinary General +Public License. It also provides other free software developers Less +of an advantage over competing non-free programs. These disadvantages +are the reason we use the ordinary General Public License for many +libraries. However, the Lesser license provides advantages in certain +special circumstances. + + For example, on rare occasions, there may be a special need to +encourage the widest possible use of a certain library, so that it becomes +a de-facto standard. To achieve this, non-free programs must be +allowed to use the library. A more frequent case is that a free +library does the same job as widely used non-free libraries. 
In this +case, there is little to gain by limiting the free library to free +software only, so we use the Lesser General Public License. + + In other cases, permission to use a particular library in non-free +programs enables a greater number of people to use a large body of +free software. For example, permission to use the GNU C Library in +non-free programs enables many more people to use the whole GNU +operating system, as well as its variant, the GNU/Linux operating +system. + + Although the Lesser General Public License is Less protective of the +users' freedom, it does ensure that the user of a program that is +linked with the Library has the freedom and the wherewithal to run +that program using a modified version of the Library. + + The precise terms and conditions for copying, distribution and +modification follow. Pay close attention to the difference between a +"work based on the library" and a "work that uses the library". The +former contains code derived from the library, whereas the latter must +be combined with the library in order to run. + + GNU LESSER GENERAL PUBLIC LICENSE + TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION + + 0. This License Agreement applies to any software library or other +program which contains a notice placed by the copyright holder or +other authorized party saying it may be distributed under the terms of +this Lesser General Public License (also called "this License"). +Each licensee is addressed as "you". + + A "library" means a collection of software functions and/or data +prepared so as to be conveniently linked with application programs +(which use some of those functions and data) to form executables. + + The "Library", below, refers to any such software library or work +which has been distributed under these terms. 
A "work based on the +Library" means either the Library or any derivative work under +copyright law: that is to say, a work containing the Library or a +portion of it, either verbatim or with modifications and/or translated +straightforwardly into another language. (Hereinafter, translation is +included without limitation in the term "modification".) + + "Source code" for a work means the preferred form of the work for +making modifications to it. For a library, complete source code means +all the source code for all modules it contains, plus any associated +interface definition files, plus the scripts used to control compilation +and installation of the library. + + Activities other than copying, distribution and modification are not +covered by this License; they are outside its scope. The act of +running a program using the Library is not restricted, and output from +such a program is covered only if its contents constitute a work based +on the Library (independent of the use of the Library in a tool for +writing it). Whether that is true depends on what the Library does +and what the program that uses the Library does. + + 1. You may copy and distribute verbatim copies of the Library's +complete source code as you receive it, in any medium, provided that +you conspicuously and appropriately publish on each copy an +appropriate copyright notice and disclaimer of warranty; keep intact +all the notices that refer to this License and to the absence of any +warranty; and distribute a copy of this License along with the +Library. + + You may charge a fee for the physical act of transferring a copy, +and you may at your option offer warranty protection in exchange for a +fee. + + 2. 
You may modify your copy or copies of the Library or any portion +of it, thus forming a work based on the Library, and copy and +distribute such modifications or work under the terms of Section 1 +above, provided that you also meet all of these conditions: + + a) The modified work must itself be a software library. + + b) You must cause the files modified to carry prominent notices + stating that you changed the files and the date of any change. + + c) You must cause the whole of the work to be licensed at no + charge to all third parties under the terms of this License. + + d) If a facility in the modified Library refers to a function or a + table of data to be supplied by an application program that uses + the facility, other than as an argument passed when the facility + is invoked, then you must make a good faith effort to ensure that, + in the event an application does not supply such function or + table, the facility still operates, and performs whatever part of + its purpose remains meaningful. + + (For example, a function in a library to compute square roots has + a purpose that is entirely well-defined independent of the + application. Therefore, Subsection 2d requires that any + application-supplied function or table used by this function must + be optional: if the application does not supply it, the square + root function must still compute square roots.) + +These requirements apply to the modified work as a whole. If +identifiable sections of that work are not derived from the Library, +and can be reasonably considered independent and separate works in +themselves, then this License, and its terms, do not apply to those +sections when you distribute them as separate works. 
But when you +distribute the same sections as part of a whole which is a work based +on the Library, the distribution of the whole must be on the terms of +this License, whose permissions for other licensees extend to the +entire whole, and thus to each and every part regardless of who wrote +it. + +Thus, it is not the intent of this section to claim rights or contest +your rights to work written entirely by you; rather, the intent is to +exercise the right to control the distribution of derivative or +collective works based on the Library. + +In addition, mere aggregation of another work not based on the Library +with the Library (or with a work based on the Library) on a volume of +a storage or distribution medium does not bring the other work under +the scope of this License. + + 3. You may opt to apply the terms of the ordinary GNU General Public +License instead of this License to a given copy of the Library. To do +this, you must alter all the notices that refer to this License, so +that they refer to the ordinary GNU General Public License, version 2, +instead of to this License. (If a newer version than version 2 of the +ordinary GNU General Public License has appeared, then you can specify +that version instead if you wish.) Do not make any other change in +these notices. + + Once this change is made in a given copy, it is irreversible for +that copy, so the ordinary GNU General Public License applies to all +subsequent copies and derivative works made from that copy. + + This option is useful when you wish to copy part of the code of +the Library into a program that is not a library. + + 4. 
You may copy and distribute the Library (or a portion or +derivative of it, under Section 2) in object code or executable form +under the terms of Sections 1 and 2 above provided that you accompany +it with the complete corresponding machine-readable source code, which +must be distributed under the terms of Sections 1 and 2 above on a +medium customarily used for software interchange. + + If distribution of object code is made by offering access to copy +from a designated place, then offering equivalent access to copy the +source code from the same place satisfies the requirement to +distribute the source code, even though third parties are not +compelled to copy the source along with the object code. + + 5. A program that contains no derivative of any portion of the +Library, but is designed to work with the Library by being compiled or +linked with it, is called a "work that uses the Library". Such a +work, in isolation, is not a derivative work of the Library, and +therefore falls outside the scope of this License. + + However, linking a "work that uses the Library" with the Library +creates an executable that is a derivative of the Library (because it +contains portions of the Library), rather than a "work that uses the +library". The executable is therefore covered by this License. +Section 6 states terms for distribution of such executables. + + When a "work that uses the Library" uses material from a header file +that is part of the Library, the object code for the work may be a +derivative work of the Library even though the source code is not. +Whether this is true is especially significant if the work can be +linked without the Library, or if the work is itself a library. The +threshold for this to be true is not precisely defined by law. 
+ + If such an object file uses only numerical parameters, data +structure layouts and accessors, and small macros and small inline +functions (ten lines or less in length), then the use of the object +file is unrestricted, regardless of whether it is legally a derivative +work. (Executables containing this object code plus portions of the +Library will still fall under Section 6.) + + Otherwise, if the work is a derivative of the Library, you may +distribute the object code for the work under the terms of Section 6. +Any executables containing that work also fall under Section 6, +whether or not they are linked directly with the Library itself. + + 6. As an exception to the Sections above, you may also combine or +link a "work that uses the Library" with the Library to produce a +work containing portions of the Library, and distribute that work +under terms of your choice, provided that the terms permit +modification of the work for the customer's own use and reverse +engineering for debugging such modifications. + + You must give prominent notice with each copy of the work that the +Library is used in it and that the Library and its use are covered by +this License. You must supply a copy of this License. If the work +during execution displays copyright notices, you must include the +copyright notice for the Library among them, as well as a reference +directing the user to the copy of this License. Also, you must do one +of these things: + + a) Accompany the work with the complete corresponding + machine-readable source code for the Library including whatever + changes were used in the work (which must be distributed under + Sections 1 and 2 above); and, if the work is an executable linked + with the Library, with the complete machine-readable "work that + uses the Library", as object code and/or source code, so that the + user can modify the Library and then relink to produce a modified + executable containing the modified Library. 
(It is understood + that the user who changes the contents of definitions files in the + Library will not necessarily be able to recompile the application + to use the modified definitions.) + + b) Use a suitable shared library mechanism for linking with the + Library. A suitable mechanism is one that (1) uses at run time a + copy of the library already present on the user's computer system, + rather than copying library functions into the executable, and (2) + will operate properly with a modified version of the library, if + the user installs one, as long as the modified version is + interface-compatible with the version that the work was made with. + + c) Accompany the work with a written offer, valid for at + least three years, to give the same user the materials + specified in Subsection 6a, above, for a charge no more + than the cost of performing this distribution. + + d) If distribution of the work is made by offering access to copy + from a designated place, offer equivalent access to copy the above + specified materials from the same place. + + e) Verify that the user has already received a copy of these + materials or that you have already sent this user a copy. + + For an executable, the required form of the "work that uses the +Library" must include any data and utility programs needed for +reproducing the executable from it. However, as a special exception, +the materials to be distributed need not include anything that is +normally distributed (in either source or binary form) with the major +components (compiler, kernel, and so on) of the operating system on +which the executable runs, unless that component itself accompanies +the executable. + + It may happen that this requirement contradicts the license +restrictions of other proprietary libraries that do not normally +accompany the operating system. Such a contradiction means you cannot +use both them and the Library together in an executable that you +distribute. + + 7. 
You may place library facilities that are a work based on the +Library side-by-side in a single library together with other library +facilities not covered by this License, and distribute such a combined +library, provided that the separate distribution of the work based on +the Library and of the other library facilities is otherwise +permitted, and provided that you do these two things: + + a) Accompany the combined library with a copy of the same work + based on the Library, uncombined with any other library + facilities. This must be distributed under the terms of the + Sections above. + + b) Give prominent notice with the combined library of the fact + that part of it is a work based on the Library, and explaining + where to find the accompanying uncombined form of the same work. + + 8. You may not copy, modify, sublicense, link with, or distribute +the Library except as expressly provided under this License. Any +attempt otherwise to copy, modify, sublicense, link with, or +distribute the Library is void, and will automatically terminate your +rights under this License. However, parties who have received copies, +or rights, from you under this License will not have their licenses +terminated so long as such parties remain in full compliance. + + 9. You are not required to accept this License, since you have not +signed it. However, nothing else grants you permission to modify or +distribute the Library or its derivative works. These actions are +prohibited by law if you do not accept this License. Therefore, by +modifying or distributing the Library (or any work based on the +Library), you indicate your acceptance of this License to do so, and +all its terms and conditions for copying, distributing or modifying +the Library or works based on it. + + 10. 
Each time you redistribute the Library (or any work based on the +Library), the recipient automatically receives a license from the +original licensor to copy, distribute, link with or modify the Library +subject to these terms and conditions. You may not impose any further +restrictions on the recipients' exercise of the rights granted herein. +You are not responsible for enforcing compliance by third parties with +this License. + + 11. If, as a consequence of a court judgment or allegation of patent +infringement or for any other reason (not limited to patent issues), +conditions are imposed on you (whether by court order, agreement or +otherwise) that contradict the conditions of this License, they do not +excuse you from the conditions of this License. If you cannot +distribute so as to satisfy simultaneously your obligations under this +License and any other pertinent obligations, then as a consequence you +may not distribute the Library at all. For example, if a patent +license would not permit royalty-free redistribution of the Library by +all those who receive copies directly or indirectly through you, then +the only way you could satisfy both it and this License would be to +refrain entirely from distribution of the Library. + +If any portion of this section is held invalid or unenforceable under any +particular circumstance, the balance of the section is intended to apply, +and the section as a whole is intended to apply in other circumstances. + +It is not the purpose of this section to induce you to infringe any +patents or other property right claims or to contest validity of any +such claims; this section has the sole purpose of protecting the +integrity of the free software distribution system which is +implemented by public license practices. 
Many people have made +generous contributions to the wide range of software distributed +through that system in reliance on consistent application of that +system; it is up to the author/donor to decide if he or she is willing +to distribute software through any other system and a licensee cannot +impose that choice. + +This section is intended to make thoroughly clear what is believed to +be a consequence of the rest of this License. + + 12. If the distribution and/or use of the Library is restricted in +certain countries either by patents or by copyrighted interfaces, the +original copyright holder who places the Library under this License may add +an explicit geographical distribution limitation excluding those countries, +so that distribution is permitted only in or among countries not thus +excluded. In such case, this License incorporates the limitation as if +written in the body of this License. + + 13. The Free Software Foundation may publish revised and/or new +versions of the Lesser General Public License from time to time. +Such new versions will be similar in spirit to the present version, +but may differ in detail to address new problems or concerns. + +Each version is given a distinguishing version number. If the Library +specifies a version number of this License which applies to it and +"any later version", you have the option of following the terms and +conditions either of that version or of any later version published by +the Free Software Foundation. If the Library does not specify a +license version number, you may choose any version ever published by +the Free Software Foundation. + + 14. If you wish to incorporate parts of the Library into other free +programs whose distribution conditions are incompatible with these, +write to the author to ask for permission. For software which is +copyrighted by the Free Software Foundation, write to the Free +Software Foundation; we sometimes make exceptions for this. 
Our +decision will be guided by the two goals of preserving the free status +of all derivatives of our free software and of promoting the sharing +and reuse of software generally. + + NO WARRANTY + + 15. BECAUSE THE LIBRARY IS LICENSED FREE OF CHARGE, THERE IS NO +WARRANTY FOR THE LIBRARY, TO THE EXTENT PERMITTED BY APPLICABLE LAW. +EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR +OTHER PARTIES PROVIDE THE LIBRARY "AS IS" WITHOUT WARRANTY OF ANY +KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR +PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE +LIBRARY IS WITH YOU. SHOULD THE LIBRARY PROVE DEFECTIVE, YOU ASSUME +THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. + + 16. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN +WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY +AND/OR REDISTRIBUTE THE LIBRARY AS PERMITTED ABOVE, BE LIABLE TO YOU +FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR +CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE +LIBRARY (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING +RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A +FAILURE OF THE LIBRARY TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF +SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH +DAMAGES. + + END OF TERMS AND CONDITIONS + + How to Apply These Terms to Your New Libraries + + If you develop a new library, and you want it to be of the greatest +possible use to the public, we recommend making it free software that +everyone can redistribute and change. You can do so by permitting +redistribution under these terms (or, alternatively, under the terms of the +ordinary General Public License). + + To apply these terms, attach the following notices to the library. 
It is +safest to attach them to the start of each source file to most effectively +convey the exclusion of warranty; and each file should have at least the +"copyright" line and a pointer to where the full notice is found. + + <one line to give the library's name and a brief idea of what it does.> + Copyright (C) <year> <name of author> + + This library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + This library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with this library; if not, write to the Free Software + Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + +Also add information on how to contact you by electronic and paper mail. + +You should also get your employer (if you work as a programmer) or your +school, if any, to sign a "copyright disclaimer" for the library, if +necessary. Here is a sample; alter the names: + + Yoyodyne, Inc., hereby disclaims all copyright interest in the + library `Frob' (a library for tweaking knobs) written by James Random Hacker. + + <signature of Ty Coon>, 1 April 1990 + Ty Coon, President of Vice + +That's all there is to it! + + diff --git a/src/jaegertracing/thrift/doc/licenses/otp-base-license.txt b/src/jaegertracing/thrift/doc/licenses/otp-base-license.txt new file mode 100644 index 000000000..8ee29920a --- /dev/null +++ b/src/jaegertracing/thrift/doc/licenses/otp-base-license.txt @@ -0,0 +1,20 @@ +Tue Oct 24 12:28:44 CDT 2006 + +Copyright (c) <2006> <Martin J. 
Logan, Erlware> + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software (OTP Base, fslib, G.A.S) and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights to +use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies +of the Software, and to permit persons to whom the Software is furnished to do +so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, +INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A +PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT +HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION +OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE +OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. diff --git a/src/jaegertracing/thrift/doc/specs/HeaderFormat.md b/src/jaegertracing/thrift/doc/specs/HeaderFormat.md new file mode 100644 index 000000000..42ec7ae38 --- /dev/null +++ b/src/jaegertracing/thrift/doc/specs/HeaderFormat.md @@ -0,0 +1,82 @@ +<link href="http://kevinburke.bitbucket.org/markdowncss/markdown.css" rel="stylesheet"></link> + +Header format for the THeader.h +=============================== + + 0 1 2 3 4 5 6 7 8 9 a b c d e f 0 1 2 3 4 5 6 7 8 9 a b c d e f + +----------------------------------------------------------------+ + | 0| LENGTH | + +----------------------------------------------------------------+ + | 0| HEADER MAGIC | FLAGS | + +----------------------------------------------------------------+ + | SEQUENCE NUMBER | + +----------------------------------------------------------------+ + | 0| Header Size(/32) | ... 
+ +--------------------------------- + + Header is of variable size: + (and starts at offset 14) + + +----------------------------------------------------------------+ + | PROTOCOL ID (varint) | NUM TRANSFORMS (varint) | + +----------------------------------------------------------------+ + | TRANSFORM 0 ID (varint) | TRANSFORM 0 DATA ... + +----------------------------------------------------------------+ + | ... ... | + +----------------------------------------------------------------+ + | INFO 0 ID (varint) | INFO 0 DATA ... + +----------------------------------------------------------------+ + | ... ... | + +----------------------------------------------------------------+ + | | + | PAYLOAD | + | | + +----------------------------------------------------------------+ + +The `LENGTH` field is 32 bits, and counts the remaining bytes in the +packet, NOT including the length field. The header size field is 16 +bits, and defines the size of the remaining header, NOT including the +`HEADER MAGIC`, `FLAGS`, `SEQUENCE NUMBER` and header size fields. The +header size field is in bytes/4. + +The transform IDs are varints. The data for each transform is +defined by the transform ID in the code - no size is given in the +header. If a transform ID is specified from a client and the server +doesn't know about the transform ID, an error MUST be returned as we +don't know how to transform the data. + +Conversely, data in the info headers is ignorable. This should only +be things like timestamps, debugging, tracing, etc. Using the header +size you should be able to skip this data and read the payload safely +if you don't know the info ID. + +Infos should be listed in oldest-supported to newest-supported order, so that +if we read an info ID we don't support, none of the remaining info +IDs will be supported either, and we can safely skip to the payload. + +Info IDs and transform IDs should share the same ID space.
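As a rough illustration of the layout described above, the following Python sketch splits one complete frame into its fixed fields, the variable-size header bytes, and the payload. It assumes network (big endian) byte order for the fixed fields, which this document does not state explicitly. The name `split_theader_frame` and the dictionary keys are hypothetical and are not part of any Thrift library.

```python
import struct

# Fixed part of the frame: LENGTH (32 bits), HEADER MAGIC (16), FLAGS (16),
# SEQUENCE NUMBER (32, signed), header size (16).
# Big endian byte order is an assumption made for this sketch.
FIXED_FORMAT = ">IHHiH"
FIXED_SIZE = struct.calcsize(FIXED_FORMAT)  # 14: the offset where the header starts


def split_theader_frame(frame: bytes) -> dict:
    """Split one complete frame into fixed fields, header bytes and payload."""
    length, magic, flags, seq, header_words = struct.unpack_from(FIXED_FORMAT, frame, 0)
    header_end = FIXED_SIZE + header_words * 4  # the header size field is in bytes/4
    frame_end = 4 + length                      # LENGTH does not count itself
    if header_end > frame_end or frame_end > len(frame):
        raise ValueError("truncated or inconsistent frame")
    return {
        "magic": magic,
        "flags": flags,
        "sequence_number": seq,
        "header": frame[FIXED_SIZE:header_end],  # protocol ID, transforms, infos, padding
        "payload": frame[header_end:frame_end],
    }
```

Because the header size is known up front, a reader that does not understand the info entries can still jump straight to `payload`, which is the skipping behaviour described above. Decoding the varints inside `header` is left out of this sketch.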
+ +### PADDING: + +Header will be padded out to the next 4-byte boundary with `0x00`. + +Max frame size is `0x3FFFFFFF`, which is slightly less than `HTTP_MAGIC`. +This allows us to distinguish between different (older) transports. + +### Transform IDs: + + ZLIB_TRANSFORM 0x01 - No data for this. Use zlib to (de)compress the + data. + + HMAC_TRANSFORM 0x02 - Variable amount of mac data. One byte to specify + size. Mac data is appended at the end of the packet. + SNAPPY_TRANSFORM 0x03 - No data for this. Use snappy to (de)compress the + data. + + +### Info IDs: + + INFO_KEYVALUE 0x01 - varint32 number of headers. + - key/value pairs of varstrings (varint16 length plus + no-trailing-null string). + diff --git a/src/jaegertracing/thrift/doc/specs/SequenceNumbers.md b/src/jaegertracing/thrift/doc/specs/SequenceNumbers.md new file mode 100644 index 000000000..fef3fcff1 --- /dev/null +++ b/src/jaegertracing/thrift/doc/specs/SequenceNumbers.md @@ -0,0 +1,23 @@ +# Sequence Number # + +Apache Thrift built sequence numbers into every protocol exchange to allow +for clients that may submit multiple outstanding requests on a single transport +connection. This is typically done by asynchronous clients. + +The following rules apply to sequence numbers: + +1. A sequence number is a signed 32-bit integer. Negative values are allowed. +1. Sequence numbers `MUST` be unique across all outstanding requests on a + given transport connection. There is no requirement for unique numbers + between different transport connections even if they are from the same client. +1. A server `MUST` reply to a client with the same sequence number that was + used in the request. This includes any exception-based reply. +1. A client `MAY` use sequence numbers if it needs them for proper operation. +1. A client `SHOULD` set the sequence number to zero if it does not rely + on them. +1.
Wrapped protocols (such as THeaderProtocol) `SHOULD` use the same sequence + number on the wrapping as is used on the payload protocol. + +Servers will not inspect or make any logic choices based on the sequence number +sent by the client. The server's only job is to process the request and reply +with the same sequence number. diff --git a/src/jaegertracing/thrift/doc/specs/idl.md b/src/jaegertracing/thrift/doc/specs/idl.md new file mode 100644 index 000000000..9439bee4e --- /dev/null +++ b/src/jaegertracing/thrift/doc/specs/idl.md @@ -0,0 +1,250 @@ +## Thrift interface description language + +For Thrift version 0.13.0. + +The Thrift interface definition language (IDL) allows for the definition of [Thrift Types](/docs/types). A Thrift IDL file is processed by the Thrift code generator to produce code for the various target languages to support the defined structs and services in the IDL file. + +## Description + +Here is a description of the Thrift IDL. + +## Document + +Every Thrift document contains 0 or more headers followed by 0 or more definitions. + + [1] Document ::= Header* Definition* + +## Header + +A header is either a Thrift include, a C++ include, or a namespace declaration. + + [2] Header ::= Include | CppInclude | Namespace + +### Thrift Include + +An include makes all the symbols from another file visible (with a prefix) and adds corresponding include statements into the code generated for this Thrift document. + + [3] Include ::= 'include' Literal + +### C++ Include + +A C++ include adds a custom C++ include to the output of the C++ code generator for this Thrift document. + + [4] CppInclude ::= 'cpp_include' Literal + +### Namespace + +A namespace declares which namespaces/package/module/etc. the type definitions in this file will be declared in for the target languages. The namespace scope indicates which language the namespace applies to; a scope of '*' indicates that the namespace applies to all target languages. 
+ + [5] Namespace ::= ( 'namespace' ( NamespaceScope Identifier ) ) + + [6] NamespaceScope ::= '*' | 'c_glib' | 'cpp' | 'csharp' | 'delphi' | 'go' | 'java' | 'js' | 'lua' | 'netcore' | 'perl' | 'php' | 'py' | 'py.twisted' | 'rb' | 'st' | 'xsd' + +## Definition + + [7] Definition ::= Const | Typedef | Enum | Senum | Struct | Union | Exception | Service + +### Const + + [8] Const ::= 'const' FieldType Identifier '=' ConstValue ListSeparator? + +### Typedef + +A typedef creates an alternate name for a type. + + [9] Typedef ::= 'typedef' DefinitionType Identifier + +### Enum + +An enum creates an enumerated type, with named values. If no constant value is supplied, the value is either 0 for the first element, or one greater than the preceding value for any subsequent element. Any constant value that is supplied must be non-negative. + + [10] Enum ::= 'enum' Identifier '{' (Identifier ('=' IntConstant)? ListSeparator?)* '}' + +### Senum + +Senum (and Slist) are now deprecated and should both be replaced with String. + + [11] Senum ::= 'senum' Identifier '{' (Literal ListSeparator?)* '}' + +### Struct + +Structs are the fundamental compositional type in Thrift. The name of each field must be unique within the struct. + + [12] Struct ::= 'struct' Identifier 'xsd_all'? '{' Field* '}' + +N.B.: The `xsd_all` keyword has some purpose internal to Facebook but serves no purpose in Thrift itself. Use of this feature is strongly discouraged + +### Union + +Unions are similar to structs, except that they provide a means to transport exactly one field of a possible set of fields, just like union {} in C++. Consequently, union members are implicitly considered optional (see requiredness). + + [13] Union ::= 'union' Identifier 'xsd_all'? '{' Field* '}' + +N.B.: The `xsd_all` keyword has some purpose internal to Facebook but serves no purpose in Thrift itself. 
Use of this feature is strongly discouraged. + +### Exception + +Exceptions are similar to structs except that they are intended to integrate with the native exception handling mechanisms in the target languages. The name of each field must be unique within the exception. + + [14] Exception ::= 'exception' Identifier '{' Field* '}' + +### Service + +A service provides the interface for a set of functionality provided by a Thrift server. The interface is simply a list of functions. A service can extend another service, which simply means that it provides the functions of the extended service in addition to its own. + + [15] Service ::= 'service' Identifier ( 'extends' Identifier )? '{' Function* '}' + +## Field + + [16] Field ::= FieldID? FieldReq? FieldType Identifier ('=' ConstValue)? XsdFieldOptions ListSeparator? + +### Field ID + + [17] FieldID ::= IntConstant ':' + +### Field Requiredness + +There are two explicit requiredness values, and a third one that is applied implicitly if neither *required* nor *optional* is given: *default* requiredness. + + [18] FieldReq ::= 'required' | 'optional' + +The general rules for requiredness are as follows: + +#### required + +- Write: Required fields are always written and are expected to be set. +- Read: Required fields are always read and are expected to be contained in the input stream. +- Default values: are always written + +If a required field is missing during read, the expected behaviour is to indicate an unsuccessful read operation to the caller, e.g. by throwing an exception or returning an error. + +Because of this behaviour, required fields drastically limit the options with regard to soft versioning. Because they must be present on read, the fields cannot be deprecated. If a required field were removed (or changed to optional), the data would no longer be compatible between versions.
+ +#### optional + +- Write: Optional fields are only written when they are set +- Read: Optional fields may, or may not be part of the input stream. +- Default values: are written when the isset flag is set + +Most language implementations use the recommended practice of so-called "isset" flags to indicate whether a particular optional field is set or not. Only fields with this flag set are written, and conversely the flag is only set when a field value has been read from the input stream. + +#### default requiredness (implicit) + +- Write: In theory, the fields are always written. There are some exceptions to that rule, see below. +- Read: Like optional, the field may, or may not be part of the input stream. +- Default values: may not be written (see next section) + +Default requiredness is a good starting point. The desired behaviour is a mix of optional and required, hence the internal name "opt-in, req-out". Although in theory these fields are supposed to be written ("req-out"), in reality unset fields are not always written. This is especially the case, when the field contains a <null> value, which by definition cannot be transported through thrift. The only way to achieve this is by not writing that field at all, and that's what most languages do. + +#### Semantics of Default Values + +There are ongoing discussions about that topic, see JIRA for details. Not all implementations treat default values in the very same way, but the current status quo is more or less that default fields are typically set at initialization time. Therefore, a value that equals the default may not be written, because the read end will set the value implicitly. On the other hand, an implementation is free to write the default value anyways, as there is no hard restriction that prevents this. + +The major point to keep in mind here is the fact, that any unwritten default value implicitly becomes part of the interface version. If that default is changed, the interface changes. 
If, in contrast, the default value is written into the output data, the default in the IDL can change at any time without affecting serialized data. + +### XSD Options + +N.B.: These have some internal purpose at Facebook but serve no current purpose in Thrift. Use of these options is strongly discouraged. + + [19] XsdFieldOptions ::= 'xsd_optional'? 'xsd_nillable'? XsdAttrs? + + [20] XsdAttrs ::= 'xsd_attrs' '{' Field* '}' + +## Functions + + [21] Function ::= 'oneway'? FunctionType Identifier '(' Field* ')' Throws? ListSeparator? + + [22] FunctionType ::= FieldType | 'void' + + [23] Throws ::= 'throws' '(' Field* ')' + +## Types + + [24] FieldType ::= Identifier | BaseType | ContainerType + + [25] DefinitionType ::= BaseType | ContainerType + + [26] BaseType ::= 'bool' | 'byte' | 'i8' | 'i16' | 'i32' | 'i64' | 'double' | 'string' | 'binary' | 'slist' + + [27] ContainerType ::= MapType | SetType | ListType + + [28] MapType ::= 'map' CppType? '<' FieldType ',' FieldType '>' + + [29] SetType ::= 'set' CppType? '<' FieldType '>' + + [30] ListType ::= 'list' '<' FieldType '>' CppType? + + [31] CppType ::= 'cpp_type' Literal + +## Constant Values + + [32] ConstValue ::= IntConstant | DoubleConstant | Literal | Identifier | ConstList | ConstMap + + [33] IntConstant ::= ('+' | '-')? Digit+ + + [34] DoubleConstant ::= ('+' | '-')? Digit* ('.' Digit+)? ( ('E' | 'e') IntConstant )? + + [35] ConstList ::= '[' (ConstValue ListSeparator?)* ']' + + [36] ConstMap ::= '{' (ConstValue ':' ConstValue ListSeparator?)* '}' + +## Basic Definitions + +### Literal + + [37] Literal ::= ('"' [^"]* '"') | ("'" [^']* "'") + +### Identifier + + [38] Identifier ::= ( Letter | '_' ) ( Letter | Digit | '.' | '_' )* + + [39] STIdentifier ::= ( Letter | '_' ) ( Letter | Digit | '.' 
| '_' | '-' )* + +### List Separator + + [40] ListSeparator ::= ',' | ';' + +### Letters and Digits + + [41] Letter ::= ['A'-'Z'] | ['a'-'z'] + + [42] Digit ::= ['0'-'9'] + +## Examples + +Here are some examples of Thrift definitions, using the Thrift IDL: + + * [ThriftTest.thrift][] used by the Thrift TestFramework + * Thrift [tutorial][] + * Facebook's [fb303.thrift][] + * [Apache Cassandra's][] Thrift IDL: [cassandra.thrift][] + * [Evernote API][] + + [ThriftTest.thrift]: https://raw.githubusercontent.com/apache/thrift/master/test/ThriftTest.thrift + [tutorial]: /tutorial/ + [fb303.thrift]: https://raw.githubusercontent.com/apache/thrift/master/contrib/fb303/if/fb303.thrift + [Apache Cassandra's]: http://cassandra.apache.org/ + [cassandra.thrift]: http://svn.apache.org/viewvc/cassandra/trunk/interface/cassandra.thrift?view=co + [Evernote API]: http://www.evernote.com/about/developer/api/ + +## To Do/Questions + +Initialization of Base Types for all Languages? + + * Do all Languages initialize them to 0, bool=false and string=""? or null, undefined? + +Why does position of `CppType` vary between `SetType` and `ListType`? + + * std::set does sort the elements automatically, that's the design. see [Thrift Types](/docs/types) or the [C++ std:set reference][] for further details + * The question is, how other languages are doing that? What about custom objects, do they have a Compare function the set the order correctly? + + [C++ std:set reference]: http://www.cplusplus.com/reference/stl/set/ + +Why can't `DefinitionType` be the same as `FieldType` (i.e. include `Identifier`)? + +Examine the `smalltalk.prefix` and `smalltalk.category` status (esp `smalltalk.category`, which takes `STIdentifier` as its argument)... + +What to do about `ListSeparator`? Do we really want to be as lax as we currently are? + +Should `Field*` really be `Field+` in `Struct`, `Enum`, etc.? 
+ diff --git a/src/jaegertracing/thrift/doc/specs/thrift-binary-protocol.md b/src/jaegertracing/thrift/doc/specs/thrift-binary-protocol.md new file mode 100644 index 000000000..a85268517 --- /dev/null +++ b/src/jaegertracing/thrift/doc/specs/thrift-binary-protocol.md @@ -0,0 +1,254 @@ +Thrift Binary protocol encoding +=============================== + +<!-- +-------------------------------------------------------------------- + +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. + +-------------------------------------------------------------------- +--> + +This document describes the wire encoding for RPC using the older Thrift *binary protocol*. + +The information here is _mostly_ based on the Java implementation in the Apache thrift library (versions 0.9.1 and +0.9.3). Other implementations, however, should behave the same. + +For background on Thrift see the [Thrift whitepaper (pdf)](https://thrift.apache.org/static/files/thrift-20070401.pdf).
+ +# Contents + +* Binary protocol + * Base types + * Message + * Struct + * List and Set + * Map +* BNF notation used in this document + +# Binary protocol + +## Base types + +### Integer encoding + +In the _binary protocol_ integers are encoded with the most significant byte first (big endian byte order, aka network +order). An `int8` needs 1 byte, an `int16` 2, an `int32` 4 and an `int64` needs 8 bytes. + +The CPP version has the option to use the binary protocol with little endian order. Little endian gives a small but +noticeable performance boost because contemporary CPUs use little endian when storing integers to RAM. + +### Enum encoding + +The generated code encodes `Enum`s by taking the ordinal value and then encoding that as an int32. + +### Binary encoding + +Binary is sent as follows: + +``` +Binary protocol, binary data, 4+ bytes: ++--------+--------+--------+--------+--------+...+--------+ +| byte length | bytes | ++--------+--------+--------+--------+--------+...+--------+ +``` + +Where: + +* `byte length` is the length of the byte array, a signed 32 bit integer encoded in network (big endian) order (must be >= 0). +* `bytes` are the bytes of the byte array. + +### String encoding + +*String*s are first encoded to UTF-8, and then sent as binary. + +### Double encoding + +Values of type `double` are first converted to an int64 according to the IEEE 754 floating-point "double format" bit +layout. Most run-times provide a library to make this conversion. Both the binary protocol and the compact protocol then +encode the int64 in 8 bytes in big endian order. + +### Boolean encoding + +Values of `bool` type are first converted to an int8. True is converted to `1`, false to `0`.
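The base type encodings above can be sketched with Python's standard `struct` module. This is a minimal illustration, not the code of any Thrift library; all function names are hypothetical.

```python
import struct


def encode_i8(value: int) -> bytes:
    return struct.pack(">b", value)   # 1 byte, signed


def encode_i16(value: int) -> bytes:
    return struct.pack(">h", value)   # 2 bytes, big endian


def encode_i32(value: int) -> bytes:
    return struct.pack(">i", value)   # 4 bytes, big endian


def encode_i64(value: int) -> bytes:
    return struct.pack(">q", value)   # 8 bytes, big endian


def encode_binary(data: bytes) -> bytes:
    # Signed 32 bit byte length, followed by the bytes themselves.
    return encode_i32(len(data)) + data


def encode_string(text: str) -> bytes:
    # Strings are encoded to UTF-8 first, then sent as binary.
    return encode_binary(text.encode("utf-8"))


def encode_double(value: float) -> bytes:
    # ">d" writes the IEEE 754 double bit layout in big endian order, which is
    # the same 8 bytes as reinterpreting the double as an int64 and encoding that.
    return struct.pack(">d", value)


def encode_bool(value: bool) -> bytes:
    return encode_i8(1 if value else 0)
```

For example, `encode_string("ping")` yields the four length bytes `00 00 00 04` followed by the ASCII bytes of `ping`.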
+ +## Message + +A `Message` can be encoded in two different ways: + +``` +Binary protocol Message, strict encoding, 12+ bytes: ++--------+--------+--------+--------+--------+--------+--------+--------+--------+...+--------+--------+--------+--------+--------+ +|1vvvvvvv|vvvvvvvv|unused |00000mmm| name length | name | seq id | ++--------+--------+--------+--------+--------+--------+--------+--------+--------+...+--------+--------+--------+--------+--------+ +``` + +Where: + +* `vvvvvvvvvvvvvvv` is the version, an unsigned 15 bit number fixed to `1` (in binary: `000 0000 0000 0001`). + The leading bit is `1`. +* `unused` is an ignored byte. +* `mmm` is the message type, an unsigned 3 bit integer. The 5 leading bits must be `0` as some clients (checked for + java in 0.9.1) take the whole byte. +* `name length` is the byte length of the name field, a signed 32 bit integer encoded in network (big endian) order (must be >= 0). +* `name` is the method name, a UTF-8 encoded string. +* `seq id` is the sequence id, a signed 32 bit integer encoded in network (big endian) order. + +The second, older encoding (aka non-strict) is: + +``` +Binary protocol Message, old encoding, 9+ bytes: ++--------+--------+--------+--------+--------+...+--------+--------+--------+--------+--------+--------+ +| name length | name |00000mmm| seq id | ++--------+--------+--------+--------+--------+...+--------+--------+--------+--------+--------+--------+ +``` + +Where `name length`, `name`, `mmm`, `seq id` are as above. + +Because `name length` must be positive (therefore the first bit is always `0`), the first bit allows the receiver to see +whether the strict format or the old format is used. Therefore a server and client using the different variants of the +binary protocol can transparently talk with each other. However, when strict mode is enforced, the old format is +rejected. 
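The two message layouts above can be sketched in Python as follows. This is an illustrative sketch only; the helper names are hypothetical and do not come from a Thrift library. The example uses message type `1`, which is _Call_ in the list of message types in this section.

```python
import struct


def encode_message_strict(name: str, message_type: int, seq_id: int) -> bytes:
    """Strict encoding: version word, unused byte, type, name length, name, seq id."""
    raw_name = name.encode("utf-8")
    # 0x8001 is the leading 1 bit followed by the 15 bit version number 1.
    head = struct.pack(">HBBi", 0x8001, 0, message_type & 0x07, len(raw_name))
    return head + raw_name + struct.pack(">i", seq_id)


def encode_message_old(name: str, message_type: int, seq_id: int) -> bytes:
    """Old (non-strict) encoding: name length, name, type, seq id."""
    raw_name = name.encode("utf-8")
    return (struct.pack(">i", len(raw_name)) + raw_name
            + struct.pack(">Bi", message_type & 0x07, seq_id))


def is_strict_message(data: bytes) -> bool:
    # The first bit is 1 for the strict format; in the old format it is the
    # sign bit of a non-negative name length and therefore 0.
    return bool(data[0] & 0x80)
```

A receiver that accepts both variants can call `is_strict_message` on the first byte and then pick the matching decoder, while a receiver in strict mode would reject any message for which it returns `False`.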
+ +Message types are encoded with the following values: + +* _Call_: 1 +* _Reply_: 2 +* _Exception_: 3 +* _Oneway_: 4 + +## Struct + +A *Struct* is a sequence of zero or more fields, followed by a stop field. Each field starts with a field header and +is followed by the encoded field value. The encoding can be summarized by the following BNF: + +``` +struct ::= ( field-header field-value )* stop-field +field-header ::= field-type field-id +``` + +Because each field header contains the field-id (as defined by the Thrift IDL file), the fields can be encoded in any +order. Thrift's type system is not extensible; you can only encode the primitive types and structs. Therefore it is also +possible to handle unknown fields while decoding; these are simply ignored. While decoding, the field type can be used to +determine how to decode the field value. + +Note that the field name is not encoded so field renames in the IDL do not affect forward and backward compatibility. + +The default Java implementation (Apache Thrift 0.9.1) has undefined behavior when it tries to decode a field that has +a different field-type than what is expected. Theoretically this could be detected at the cost of some additional checking. +Other implementations may perform this check and then either ignore the field, or return a protocol exception. + +A *Union* is encoded exactly the same as a struct with the additional restriction that at most 1 field may be encoded. + +An *Exception* is encoded exactly the same as a struct. + +### Struct encoding + +In the binary protocol field headers and the stop field are encoded as follows: + +``` +Binary protocol field header and field value: ++--------+--------+--------+--------+...+--------+ +|tttttttt| field id | field value | ++--------+--------+--------+--------+...+--------+ + +Binary protocol stop field: ++--------+ +|00000000| ++--------+ +``` + +Where: + +* `tttttttt` the field-type, a signed 8 bit integer.
+* `field id` the field-id, a signed 16 bit integer in big endian order. +* `field-value` the encoded field value. + +The following field-types are used: + +* `BOOL`, encoded as `2` +* `BYTE`, encoded as `3` +* `DOUBLE`, encoded as `4` +* `I16`, encoded as `6` +* `I32`, encoded as `8` +* `I64`, encoded as `10` +* `STRING`, used for binary and string fields, encoded as `11` +* `STRUCT`, used for structs and union fields, encoded as `12` +* `MAP`, encoded as `13` +* `SET`, encoded as `14` +* `LIST`, encoded as `15` + +## List and Set + +List and sets are encoded the same: a header indicating the size and the element-type of the elements, followed by the +encoded elements. + +``` +Binary protocol list (5+ bytes) and elements: ++--------+--------+--------+--------+--------+--------+...+--------+ +|tttttttt| size | elements | ++--------+--------+--------+--------+--------+--------+...+--------+ +``` + +Where: + +* `tttttttt` is the element-type, encoded as an int8 +* `size` is the size, encoded as an int32, positive values only +* `elements` the element values + +The element-type values are the same as field-types. The full list is included in the struct section above. + +The maximum list/set size is configurable. By default there is no limit (meaning the limit is the maximum int32 value: +2147483647). + +## Map + +Maps are encoded with a header indicating the size, the element-type of the keys and the element-type of the elements, +followed by the encoded elements. 
The encoding follows this BNF: + +``` +map ::= key-element-type value-element-type size ( key value )* +``` + +``` +Binary protocol map (6+ bytes) and key value pairs: ++--------+--------+--------+--------+--------+--------+--------+...+--------+ +|kkkkkkkk|vvvvvvvv| size | key value pairs | ++--------+--------+--------+--------+--------+--------+--------+...+--------+ +``` + +Where: + +* `kkkkkkkk` is the key element-type, encoded as an int8 +* `vvvvvvvv` is the value element-type, encoded as an int8 +* `size` is the size of the map, encoded as an int32, positive values only +* `key value pairs` are the encoded keys and values + +The element-type values are the same as field-types. The full list is included in the struct section above. + +The maximum map size is configurable. By default there is no limit (meaning the limit is the maximum int32 value: +2147483647). + +# BNF notation used in this document + +The following BNF notation is used: + +* a plus `+` appended to an item represents repetition; the item is repeated 1 or more times +* a star `*` appended to an item represents optional repetition; the item is repeated 0 or more times +* a pipe `|` between items represents choice, the first matching item is selected +* parenthesis `(` and `)` are used for grouping multiple items diff --git a/src/jaegertracing/thrift/doc/specs/thrift-compact-protocol.md b/src/jaegertracing/thrift/doc/specs/thrift-compact-protocol.md new file mode 100644 index 000000000..02467dd19 --- /dev/null +++ b/src/jaegertracing/thrift/doc/specs/thrift-compact-protocol.md @@ -0,0 +1,294 @@ +Thrift Compact protocol encoding +================================ + +<!-- +-------------------------------------------------------------------- + +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. 
The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. + +-------------------------------------------------------------------- +--> + +This document describes the wire encoding for RPC using the Thrift *compact protocol*. + +The information here is _mostly_ based on the Java implementation in the Apache thrift library (version 0.9.1) and +[THRIFT-110 A more compact format](https://issues.apache.org/jira/browse/THRIFT-110). Other implementations, however, +should behave the same. + +For background on Thrift see the [Thrift whitepaper (pdf)](https://thrift.apache.org/static/files/thrift-20070401.pdf). + +# Contents + +* Compact protocol + * Base types + * Message + * Struct + * List and Set + * Map +* BNF notation used in this document + +# Compact protocol + +## Base types + +### Integer encoding + +The _compact protocol_ uses multiple encodings for ints: the _zigzag int_, and the _var int_. + +Values of type `int32` and `int64` are first transformed to a *zigzag int*. A zigzag int folds positive and negative +numbers into the positive number space. When we read 0, 1, 2, 3 or 4 from the wire, this is translated to 0, -1, 1, +-2 or 2 respectively. 
Here are the (Scala) formulas to convert from int32/int64 to a zigzag int and back: + +```scala +def intToZigZag(n: Int): Int = (n << 1) ^ (n >> 31) +def zigzagToInt(n: Int): Int = (n >>> 1) ^ - (n & 1) +def longToZigZag(n: Long): Long = (n << 1) ^ (n >> 63) +def zigzagToLong(n: Long): Long = (n >>> 1) ^ - (n & 1) +``` + +The zigzag int is then encoded as a *var int*. Var ints take 1 to 5 bytes (int32) or 1 to 10 bytes (int64). The most +significant bit of each byte indicates if more bytes follow. The concatenation of the least significant 7 bits from each +byte forms the number, where the first byte has the least significant bits (so the 7-bit groups are in little endian order). + +Var ints are sometimes used directly inside the compact protocol to represent positive numbers. + +To encode an `int16` as zigzag int, it is first converted to an `int32` and then encoded as such. The type `int8` simply +uses a single byte as in the binary protocol. + +### Enum encoding + +The generated code encodes `Enum`s by taking the ordinal value and then encoding that as an int32. + +### Binary encoding + +Binary is sent as follows: + +``` +Compact protocol, binary data, 1+ bytes: ++--------+...+--------+--------+...+--------+ +| byte length | bytes | ++--------+...+--------+--------+...+--------+ +``` + +Where: + +* `byte length` is the length of the byte array, using var int encoding (must be >= 0). +* `bytes` are the bytes of the byte array. + +### String encoding + +*String*s are first encoded to UTF-8, and then sent as binary. + +### Double encoding + +Values of type `double` are first converted to an int64 according to the IEEE 754 floating-point "double format" bit +layout. Most run-times provide a library to make this conversion. The binary protocol then encodes the int64 in 8 bytes +in big endian order, whereas the Java compact protocol implementation writes the 8 bytes in little endian order. 
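
As a rough illustration of the integer encoding described in the 'Integer encoding' section above, the following Python sketch folds a signed value into a zigzag int and then writes it as a var int. The function names are hypothetical and not part of any Thrift library, and the byte order (least significant 7-bit group first) is an assumption based on how the Java `TCompactProtocol` writes var ints.

```python
from typing import Tuple


def int_to_zigzag(n: int, bits: int = 32) -> int:
    """Fold a signed integer into the non-negative range: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    mask = (1 << bits) - 1
    return ((n << 1) ^ (n >> (bits - 1))) & mask


def zigzag_to_int(z: int) -> int:
    """Inverse of int_to_zigzag."""
    return (z >> 1) ^ -(z & 1)


def write_varint(value: int) -> bytes:
    """Encode a non-negative integer in 7-bit groups, least significant group first.

    The high bit of every byte except the last is set to signal that more bytes follow.
    """
    if value < 0:
        raise ValueError("var ints encode non-negative values only")
    out = bytearray()
    while (value & ~0x7F) != 0:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def read_varint(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode a var int starting at offset; return (value, offset of the next byte)."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return result, offset
        shift += 7
```

For example, an `int32` field value of `-1` would be written as `write_varint(int_to_zigzag(-1))`, a single byte. For `int64` values the same functions apply with `bits=64`.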
+ +### Boolean encoding + +Booleans are encoded differently depending on whether the value is a field value (in a struct) or an element value (in a set, +list or map). Field values are encoded directly in the field header. Element values of type `bool` are sent as an int8; +true as `1` and false as `0`. + +## Message + +A `Message` on the wire looks as follows: + +``` +Compact protocol Message (4+ bytes): ++--------+--------+--------+...+--------+--------+...+--------+--------+...+--------+ +|pppppppp|mmmvvvvv| seq id | name length | name | ++--------+--------+--------+...+--------+--------+...+--------+--------+...+--------+ +``` + +Where: + +* `pppppppp` is the protocol id, fixed to `1000 0010`, 0x82. +* `mmm` is the message type, an unsigned 3 bit integer. +* `vvvvv` is the version, an unsigned 5 bit integer, fixed to `00001`. +* `seq id` is the sequence id, a signed 32 bit integer encoded as a var int. +* `name length` is the byte length of the name field, a signed 32 bit integer encoded as a var int (must be >= 0). +* `name` is the method name to invoke, a UTF-8 encoded string. + +Message types are encoded with the following values: + +* _Call_: 1 +* _Reply_: 2 +* _Exception_: 3 +* _Oneway_: 4 + +## Struct + +A *Struct* is a sequence of zero or more fields, followed by a stop field. Each field starts with a field header and +is followed by the encoded field value. The encoding can be summarized by the following BNF: + +``` +struct ::= ( field-header field-value )* stop-field +field-header ::= field-type field-id +``` + +Because each field header contains the field-id (as defined by the Thrift IDL file), the fields can be encoded in any +order. Thrift's type system is not extensible; you can only encode the primitive types and structs. Therefore it is also +possible to handle unknown fields while decoding; these are simply ignored. While decoding, the field type can be used to +determine how to decode the field value. 
+ +Note that the field name is not encoded so field renames in the IDL do not affect forward and backward compatibility. + +The default Java implementation (Apache Thrift 0.9.1) has undefined behavior when it tries to decode a field that has +a different field-type than expected. Theoretically this could be detected at the cost of some additional checking. +Other implementations may perform this check and then either ignore the field, or return a protocol exception. + +A *Union* is encoded exactly the same as a struct with the additional restriction that at most 1 field may be encoded. + +An *Exception* is encoded exactly the same as a struct. + +### Struct encoding + +``` +Compact protocol field header (short form) and field value: ++--------+--------+...+--------+ +|ddddtttt| field value | ++--------+--------+...+--------+ + +Compact protocol field header (2 to 4 bytes, long form) and field value: ++--------+--------+...+--------+--------+...+--------+ +|0000tttt| field id | field value | ++--------+--------+...+--------+--------+...+--------+ + +Compact protocol stop field: ++--------+ +|00000000| ++--------+ +``` + +Where: + +* `dddd` is the field id delta, an unsigned 4 bit integer, strictly positive. +* `tttt` is the field-type id, an unsigned 4 bit integer. +* `field id` is the field id, a signed 16 bit integer encoded as zigzag int. +* `field-value` is the encoded field value. + +The field id delta can be computed by `current-field-id - previous-field-id`, or just `current-field-id` if this is the +first field of the struct. The short form should be used when the field id delta is in the range 1 - 15 (inclusive). 
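
The choice between the short and long field header forms can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the library's implementation: `encode_field_header` and its helpers are hypothetical names, the field type is passed in as an opaque 4 bit code, and the var int is assumed to be written least significant 7-bit group first.

```python
STOP_FIELD = b"\x00"  # a struct ends with a single zero byte


def _zigzag32(n: int) -> int:
    """Zigzag-fold a signed value as an int32 (an int16 is widened to int32 first)."""
    return ((n << 1) ^ (n >> 31)) & 0xFFFFFFFF


def _varint(value: int) -> bytes:
    """Var int encoding, least significant 7-bit group first (assumed byte order)."""
    out = bytearray()
    while (value & ~0x7F) != 0:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def encode_field_header(field_type: int, field_id: int, previous_field_id: int = 0) -> bytes:
    """Return the compact field header for one field.

    previous_field_id is 0 for the first field of a struct, so the delta is
    then simply the field id itself.
    """
    if not 0 < field_type <= 0x0F:
        raise ValueError("field type must fit in 4 bits and must not be 0 (stop)")
    delta = field_id - previous_field_id
    if 1 <= delta <= 15:
        # Short form: delta in the high nibble, type in the low nibble.
        return bytes([(delta << 4) | field_type])
    # Long form: zero delta nibble, followed by the zigzag var int field id.
    return bytes([field_type]) + _varint(_zigzag32(field_id))
```

A writer would call this once per field, passing the id of the field written just before it, and append `STOP_FIELD` after the last field value.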
+ +The following field-types can be encoded: + +* `BOOLEAN_TRUE`, encoded as `1` +* `BOOLEAN_FALSE`, encoded as `2` +* `BYTE`, encoded as `3` +* `I16`, encoded as `4` +* `I32`, encoded as `5` +* `I64`, encoded as `6` +* `DOUBLE`, encoded as `7` +* `BINARY`, used for binary and string fields, encoded as `8` +* `LIST`, encoded as `9` +* `SET`, encoded as `10` +* `MAP`, encoded as `11` +* `STRUCT`, used for both structs and union fields, encoded as `12` + +Note that because there are 2 specific field types for the boolean values, the encoding of a boolean field value has no +length (0 bytes). + +## List and Set + +List and sets are encoded the same: a header indicating the size and the element-type of the elements, followed by the +encoded elements. + +``` +Compact protocol list header (1 byte, short form) and elements: ++--------+--------+...+--------+ +|sssstttt| elements | ++--------+--------+...+--------+ + +Compact protocol list header (2+ bytes, long form) and elements: ++--------+--------+...+--------+--------+...+--------+ +|1111tttt| size | elements | ++--------+--------+...+--------+--------+...+--------+ +``` + +Where: + +* `ssss` is the size, 4 bit unsigned int, values `0` - `14` +* `tttt` is the element-type, a 4 bit unsigned int +* `size` is the size, a var int (int32), positive values `15` or higher +* `elements` are the encoded elements + +The short form should be used when the length is in the range 0 - 14 (inclusive). + +The following element-types are used (note that these are _different_ from the field-types): + +* `BOOL`, encoded as `2` +* `BYTE`, encoded as `3` +* `DOUBLE`, encoded as `4` +* `I16`, encoded as `6` +* `I32`, encoded as `8` +* `I64`, encoded as `10` +* `STRING`, used for binary and string fields, encoded as `11` +* `STRUCT`, used for structs and union fields, encoded as `12` +* `MAP`, encoded as `13` +* `SET`, encoded as `14` +* `LIST`, encoded as `15` + + +The maximum list/set size is configurable. 
By default there is no limit (meaning the limit is the maximum int32 value: +2147483647). + +## Map + +Maps are encoded with a header indicating the size, the type of the keys and the element-type of the elements, followed +by the encoded elements. The encoding follows this BNF: + +``` +map ::= empty-map | non-empty-map +empty-map ::= `0` +non-empty-map ::= size key-element-type value-element-type (key value)+ +``` + +``` +Compact protocol map header (1 byte, empty map): ++--------+ +|00000000| ++--------+ + +Compact protocol map header (2+ bytes, non empty map) and key value pairs: ++--------+...+--------+--------+--------+...+--------+ +| size |kkkkvvvv| key value pairs | ++--------+...+--------+--------+--------+...+--------+ +``` + +Where: + +* `size` is the size, a var int (int32), strictly positive values +* `kkkk` is the key element-type, a 4 bit unsigned int +* `vvvv` is the value element-type, a 4 bit unsigned int +* `key value pairs` are the encoded keys and values + +The element-types are the same as for lists. The full list is included in the 'List and set' section. + +The maximum map size is configurable. By default there is no limit (meaning the limit is the maximum int32 value: +2147483647). 
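
The list/set and map headers described above can be sketched in Python as follows. The function names are hypothetical, the element types are passed in as opaque 4 bit codes, and the var int is assumed to be written least significant 7-bit group first; treat this as an illustration rather than the reference implementation.

```python
def write_varint(value: int) -> bytes:
    """Var int encoding, least significant 7-bit group first (assumed byte order)."""
    if value < 0:
        raise ValueError("sizes must be non-negative")
    out = bytearray()
    while (value & ~0x7F) != 0:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def encode_list_header(element_type: int, size: int) -> bytes:
    """Header for a list or set: short form for sizes 0 - 14, long form otherwise."""
    if not 0 <= element_type <= 0x0F:
        raise ValueError("element type must fit in 4 bits")
    if size < 0:
        raise ValueError("size must be non-negative")
    if size <= 14:
        return bytes([(size << 4) | element_type])
    # 1111 in the size nibble marks the long form; the real size follows as a var int.
    return bytes([0xF0 | element_type]) + write_varint(size)


def encode_map_header(key_type: int, value_type: int, size: int) -> bytes:
    """Header for a map: a single zero byte when empty, else size followed by both types."""
    if not (0 <= key_type <= 0x0F and 0 <= value_type <= 0x0F):
        raise ValueError("element types must fit in 4 bits")
    if size == 0:
        return b"\x00"
    return write_varint(size) + bytes([(key_type << 4) | value_type])
```

Note that an empty map carries no type information at all, which is why a reader has to check the size before attempting to read the type byte.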
+ +# BNF notation used in this document + +The following BNF notation is used: + +* a plus `+` appended to an item represents repetition; the item is repeated 1 or more times +* a star `*` appended to an item represents optional repetition; the item is repeated 0 or more times +* a pipe `|` between items represents choice, the first matching item is selected +* parenthesis `(` and `)` are used for grouping multiple items diff --git a/src/jaegertracing/thrift/doc/specs/thrift-protocol-spec.md b/src/jaegertracing/thrift/doc/specs/thrift-protocol-spec.md new file mode 100644 index 000000000..0c1a61cb2 --- /dev/null +++ b/src/jaegertracing/thrift/doc/specs/thrift-protocol-spec.md @@ -0,0 +1,101 @@ +Thrift Protocol Structure +==================================================================== + +Last Modified: 2007-Jun-29 + +<!-- +-------------------------------------------------------------------- + +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. + +-------------------------------------------------------------------- +--> + +This document describes the structure of the Thrift protocol +without specifying the encoding. 
Thus, the order of elements +could in some cases be rearranged depending upon the TProtocol +implementation, but this document specifies the minimum required +structure. There are some "dumb" terminals like STRING and INT +that take the place of an actual encoding specification. + +The key point to notice is that ALL messages are just one wrapped +`<struct>`. Depending upon the message type, the `<struct>` can be +interpreted as the argument list to a function, the return value +of a function, or an exception. + +-------------------------------------------------------------------- + +``` + <message> ::= <message-begin> <struct> <message-end> + + <message-begin> ::= <method-name> <message-type> <message-seqid> + + <method-name> ::= STRING + + <message-type> ::= T_CALL | T_REPLY | T_EXCEPTION | T_ONEWAY + + <message-seqid> ::= I32 + + <struct> ::= <struct-begin> <field>* <field-stop> <struct-end> + + <struct-begin> ::= <struct-name> + + <struct-name> ::= STRING + + <field-stop> ::= T_STOP + + <field> ::= <field-begin> <field-data> <field-end> + + <field-begin> ::= <field-name> <field-type> <field-id> + + <field-name> ::= STRING + + <field-type> ::= T_BOOL | T_BYTE | T_I8 | T_I16 | T_I32 | T_I64 | T_DOUBLE + | T_STRING | T_BINARY | T_STRUCT | T_MAP | T_SET | T_LIST + + <field-id> ::= I16 + + <field-data> ::= I8 | I16 | I32 | I64 | DOUBLE | STRING | BINARY + | <struct> | <map> | <list> | <set> + + <map> ::= <map-begin> <field-datum>* <map-end> + + <map-begin> ::= <map-key-type> <map-value-type> <map-size> + + <map-key-type> ::= <field-type> + +<map-value-type> ::= <field-type> + + <map-size> ::= I32 + + <list> ::= <list-begin> <field-data>* <list-end> + + <list-begin> ::= <list-elem-type> <list-size> + +<list-elem-type> ::= <field-type> + + <list-size> ::= I32 + + <set> ::= <set-begin> <field-data>* <set-end> + + <set-begin> ::= <set-elem-type> <set-size> + + <set-elem-type> ::= <field-type> + + <set-size> ::= I32 +``` +diff --git 
a/src/jaegertracing/thrift/doc/specs/thrift-rpc.md b/src/jaegertracing/thrift/doc/specs/thrift-rpc.md new file mode 100644 index 000000000..d45c06f30 --- /dev/null +++ b/src/jaegertracing/thrift/doc/specs/thrift-rpc.md @@ -0,0 +1,178 @@ +Thrift Remote Procedure Call +============================ + +<!-- +-------------------------------------------------------------------- + +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. + +-------------------------------------------------------------------- +--> + +This document describes the high-level message exchange between the Thrift RPC client and server. +See [thrift-binary-protocol.md] and [thrift-compact-protocol.md] for a description of how the exchanges are encoded on +the wire. + +In addition, this document compares the binary protocol with the compact protocol. Finally, it describes the framed vs. +unframed transport. + +The information here is _mostly_ based on the Java implementation in the Apache thrift library (version 0.9.1 and +0.9.3). Other implementations, however, should behave the same. + +For background on Thrift see the [Thrift whitepaper (pdf)](https://thrift.apache.org/static/files/thrift-20070401.pdf). 
+ +# Contents + +* Thrift Message exchange for Remote Procedure Call + * Message + * Request struct + * Response struct +* Protocol considerations + * Comparing binary and compact protocol + * Compatibility + * Framed vs unframed transport + +# Thrift Remote Procedure Call Message exchange + +Both the binary protocol and the compact protocol assume a transport layer that exposes a bi-directional byte stream, +for example a TCP socket. Both use the following exchange: + +1. Client sends a `Message` (type `Call` or `Oneway`). The TMessage contains some metadata and the name of the method + to invoke. +2. Client sends method arguments (a struct defined by the generated code). +3. Server sends a `Message` (type `Reply` or `Exception`) to start the response. +4. Server sends a struct containing the method result or exception. + +The pattern is a simple half duplex protocol where the parties alternate in sending a `Message` followed by a struct. +Both are described below. + +Although the standard Apache Thrift Java clients do not support pipelining (sending multiple requests without waiting +for a response), the standard Apache Thrift Java servers do support it. + +## Message + +A *Message* contains: + +* _Name_, a string (can be empty). +* _Message type_, one of `Call`, `Reply`, `Exception` and `Oneway`. +* _Sequence id_, a signed int32 integer. + +The *sequence id* is a simple message id assigned by the client. The server will use the same sequence id in the +message of the response. The client uses this number to detect out of order responses. Each client has an int32 field +which is increased for each message. The sequence id simply wraps around when it overflows. + +The *name* indicates the service method name to invoke. The server copies the name in the response message. + +When the *multiplexed protocol* is used, the name contains the service name, a colon `:` and the method name. 
The +multiplexed protocol is not compatible with other protocols. + +The *message type* indicates what kind of message is sent. Clients send requests with TMessages of type `Call` or +`Oneway` (step 1 in the protocol exchange). Servers send responses with messages of type `Exception` or `Reply` (step +3). + +Type `Reply` is used when the service method completes normally. That is, it returns a value or it throws one of the +exceptions defined in the Thrift IDL file. + +Type `Exception` is used for other exceptions. That is: when the service method throws an exception that is not declared +in the Thrift IDL file, or some other part of the Thrift stack throws an exception, for example when the server could +not encode or decode a message or struct. + +In the Java implementation (0.9.3) there is different behavior for the synchronous and asynchronous server. In the async +server all exceptions are sent as a `TApplicationException` (see 'Response struct' below). In the synchronous Java +implementation only (undeclared) exceptions that extend `TException` are sent as a `TApplicationException`. Unchecked +exceptions lead to an immediate close of the connection. + +Type `Oneway` is only used starting from Apache Thrift 0.9.3. Earlier versions do _not_ send TMessages of type `Oneway`, +even for service methods defined with the `oneway` modifier. + +When a client sends a request with type `Oneway`, the server must _not_ send a response (steps 3 and 4 are skipped). Note +that the Thrift IDL enforces a return type of `void` and does not allow exceptions for oneway services. + +## Request struct + +The struct that follows the message of type `Call` or `Oneway` contains the arguments of the service method. The +argument ids correspond to the field ids. The name of the struct is the name of the method with `_args` appended. +For methods without arguments a struct is sent without fields. 
+ +## Response struct + +The struct that follows a message of type `Reply` is a struct in which exactly 1 of the following fields is encoded: + +* A field with name `success` and id `0`, used in case the method completed normally. +* An exception field, name and id are as defined in the `throws` clause in the Thrift IDL's service method definition. + +When the message is of type `Exception` the struct is encoded as if it was declared by the following IDL: + +``` +exception TApplicationException { + 1: string message, + 2: i32 type +} +``` + +The following exception types are defined in the Java implementation (0.9.3): + +* _unknown_: 0, used in case the type from the peer is unknown. +* _unknown method_: 1, used in case the method requested by the client is unknown to the server. +* _invalid message type_: 2, no usage was found. +* _wrong method name_: 3, no usage was found. +* _bad sequence id_: 4, used internally by the client to indicate a wrong sequence id in the response. +* _missing result_: 5, used internally by the client to indicate a response without any field (neither result nor exception). +* _internal error_: 6, used when the server throws an exception that is not declared in the Thrift IDL file. +* _protocol error_: 7, used when something goes wrong during decoding. For example when a list is too long or a required + field is missing. +* _invalid transform_: 8, no usage was found. +* _invalid protocol_: 9, no usage was found. +* _unsupported client type_: 10, no usage was found. + +# Protocol considerations + +## Comparing binary and compact protocol + +The binary protocol is fairly simple and therefore easy to process. The compact protocol needs fewer bytes to send the +same data at the cost of additional processing. As bandwidth is usually the bottleneck, the compact protocol is almost +always slightly faster. 
+ +## Compatibility + +A server could automatically determine whether a client speaks the binary protocol or the compact protocol by +inspecting the first byte. If the value is `1000 0001` or `0000 0000` (assuming a name shorter than ±16 MB) it is the +binary protocol. When the value is `1000 0010` the client is speaking the compact protocol. + +## Framed vs. unframed transport + +The first Thrift binary wire format was unframed. This means that information is sent out in a single stream of bytes. +With unframed transport the (generated) processors will read directly from the socket (though Apache Thrift does try to +grab all available bytes from the socket in a buffer when it can). + +Later, Thrift introduced the framed transport. + +With framed transport the full request and response (the TMessage and the following struct) are first written to a +buffer. Then when the struct is complete (transport method `flush` is hijacked for this), the length of the buffer is +written to the socket first, followed by the buffered bytes. The combination is called a _frame_. On the receiver side +the complete frame is first read in a buffer before the message is passed to a processor. + +The length prefix is a 4 byte signed int, sent in network (big endian) order. +The following must be true: `0` <= length <= `16384000` (16M). + +Framed transport was introduced to ease the implementation of async processors. An async processor is only invoked when +all data is received. Unfortunately, framed transport is not ideal for large messages as the entire frame stays in +memory until the message has been processed. In addition, the Java implementation merges the incoming data to a single, +growing byte array. Every time the byte array is full it needs to be copied to a new larger byte array. + +Framed and unframed transports are not compatible with each other. 
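
The framing described above (a 4 byte big endian length prefix followed by the buffered bytes) can be sketched in Python as follows. The names `write_frame`, `read_frame` and `MAX_FRAME_SIZE` are hypothetical; the limit is the value quoted in the text, and a real transport would read from a socket rather than from an in-memory buffer.

```python
import struct
from typing import Tuple

MAX_FRAME_SIZE = 16384000  # upper bound on the frame length quoted above


def write_frame(payload: bytes) -> bytes:
    """Prefix an already serialized message and struct with its length."""
    if len(payload) > MAX_FRAME_SIZE:
        raise ValueError("frame too large")
    # ">i" packs a signed 32 bit integer in network (big endian) order.
    return struct.pack(">i", len(payload)) + payload


def read_frame(buffer: bytes) -> Tuple[bytes, bytes]:
    """Split one complete frame off the front of buffer.

    Returns (payload, remaining bytes). Raises ValueError when the buffer does
    not yet hold a complete frame or the length prefix is out of range.
    """
    if len(buffer) < 4:
        raise ValueError("incomplete length prefix")
    (length,) = struct.unpack(">i", buffer[:4])
    if not 0 <= length <= MAX_FRAME_SIZE:
        raise ValueError("invalid frame length")
    end = 4 + length
    if len(buffer) < end:
        raise ValueError("incomplete frame")
    return buffer[4:end], buffer[end:]
```

Because the receiver only hands the payload to a processor once `read_frame` succeeds, the whole frame is held in memory, which is the cost for large messages noted above.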
diff --git a/src/jaegertracing/thrift/doc/specs/thrift-sasl-spec.txt b/src/jaegertracing/thrift/doc/specs/thrift-sasl-spec.txt new file mode 100644 index 000000000..02cf79e93 --- /dev/null +++ b/src/jaegertracing/thrift/doc/specs/thrift-sasl-spec.txt @@ -0,0 +1,108 @@ +A Thrift SASL message shall be a byte array of the following form: + +| 1-byte status code | 4-byte payload length | variable-length payload | + +The length fields shall be interpreted as integers, with the high byte sent +first. This indicates the length of the field immediately following it, not +including the status code or the length bytes. + +The possible status codes are: + +0x01 - START - Hello, let's go on a date. +0x02 - OK - Everything's been going alright so far, let's see each other again. +0x03 - BAD - I understand what you're saying. I really do. I just don't like it. We have to break up. +0x04 - ERROR - We can't go on like this. It's like you're speaking another language. +0x05 - COMPLETE - Will you marry me? + +The Thrift SASL communication will proceed as follows: + +1. The client is configured at instantiation of the transport with a single +underlying SASL security mechanism that it supports. + +2. The server is configured with a mapping of underlying security mechanism +name -> mechanism options. + +3. At connection time, the client will initiate communication by sending the +server a START message. The payload of this message will be the name of the +underlying security mechanism that the client would like to use. +This mechanism name shall be 1-20 characters in length, and follow the +specifications for SASL mechanism names specified in RFC 2222. + +4. The server receives this message and, if the mechanism name provided is +among the set of mechanisms this server transport is configured to accept, +appropriate initialization of the underlying security mechanism may take place. 
+If the mechanism name is not one which the server is configured to support, the +server shall return the BAD byte, followed by a 4-byte, potentially zero-value +message length, followed by the potentially zero-length payload which may be a +status code or message indicating failure. No further communication may take +place via this transport. If the mechanism name is one which the server +supports, then proceed to step 5. + +5. Following the START message, the client must send another message containing +the "initial response" of the chosen SASL implementation. The client may send +this message piggy-backed on the "START" message of step 3. The message type +of this message must be either "OK" or "COMPLETE", depending on whether the +SASL implementation indicates that this side of the authentication has been +satisfied. + +6. The server then provides the byte array of the payload received to its +underlying security mechanism. A challenge is generated by the underlying +security mechanism on the server, and this is used as the payload for a message +sent to the client. This message shall consist of an OK byte, followed by the +non-zero message length word, followed by the payload. + +7. The client receives this message from the server and passes the payload to +its underlying security mechanism to generate a response. The client then sends +the server an OK byte, followed by the non-zero-value length of the response, +followed by the bytes of the response as the payload. + +8. Steps 6 and 7 are repeated until both security mechanisms are satisfied with +the challenge/response exchange. When either side has completed its security +protocol, its next message shall be the COMPLETE byte, followed by a 4-byte +potentially zero-value length word, followed by a potentially zero-length +payload. This payload will be empty except for those underlying security +mechanisms which provide additional data with success. 
+ +If at any point in time either side is able to interpret the challenge or +response sent by the other, but is dissatisfied with the contents thereof, this +side should send the other a BAD byte, followed by a 4-byte potentially +zero-value length word, followed by an optional, potentially zero-length +message encoded in UTF-8 indicating failure. This message should be passed to +the protocol above the thrift transport by whatever mechanism is appropriate +and idiomatic for the particular language these thrift bindings are for. + +If at any point in time either side fails to interpret the challenge or +response sent by the other, this side should send the other an ERROR byte, +followed by a 4-byte potentially zero-value length word, followed by an +optional, potentially zero-length message encoded in UTF-8. This message should +be passed to the protocol above the thrift transport by whatever mechanism is +appropriate and idiomatic for the particular language these thrift bindings are +for. + +If step 8 completes successfully, then the communication is considered +authenticated and subsequent communication may commence. + +If step 8 fails to complete successfully, then no further communication may +take place via this transport. + +9. All writes to the underlying transport must be prefixed by the 4-byte length +of the payload data, followed by the payload. All reads from this transport +should read the 4-byte length word, then read the full quantity of bytes +specified by this length word. + +If no SASL QOP (quality of protection) is negotiated during steps 6 and 7, then +all subsequent writes to/reads from this transport are written/read unaltered, +save for the length prefix, to the underlying transport. + +If a SASL QOP is negotiated, then this must be used by the Thrift transport for +all subsequent communication. 
This is done by wrapping subsequent writes to the +transport using the underlying security mechanism, and unwrapping subsequent +reads from the underlying transport. Note that in this case, the length prefix +of the write to the underlying transport is the length of the data after it has +been wrapped by the underlying security mechanism. Note that the complete +message must be read before giving this data to the underlying security +mechanism for unwrapping. + +If at any point in time reading of a message fails either because of a +malformed length word or failure to unwrap by the underlying security +mechanism, then all further communication on this transport must cease. diff --git a/src/jaegertracing/thrift/doc/specs/thrift.tex b/src/jaegertracing/thrift/doc/specs/thrift.tex new file mode 100644 index 000000000..a706fcbbc --- /dev/null +++ b/src/jaegertracing/thrift/doc/specs/thrift.tex @@ -0,0 +1,1057 @@ +%----------------------------------------------------------------------------- +% +% Thrift whitepaper +% +% Name: thrift.tex +% +% Authors: Mark Slee (mcslee@facebook.com) +% +% Created: 05 March 2007 +% +% You will need a copy of sigplanconf.cls to format this document. +% It is available at <http://www.sigplan.org/authorInformation.htm>. +% +%----------------------------------------------------------------------------- + + +\documentclass[nocopyrightspace,blockstyle]{sigplanconf} + +\usepackage{amssymb} +\usepackage{amsfonts} +\usepackage{amsmath} +\usepackage{url} + +\begin{document} + +% \conferenceinfo{WXYZ '05}{date, City.} +% \copyrightyear{2007} +% \copyrightdata{[to be supplied]} + +% \titlebanner{banner above paper title} % These are ignored unless +% \preprintfooter{short description of paper} % 'preprint' option specified. 
+ +\title{Thrift: Scalable Cross-Language Services Implementation} +\subtitle{} + +\authorinfo{Mark Slee, Aditya Agarwal and Marc Kwiatkowski} + {Facebook, 156 University Ave, Palo Alto, CA} + {\{mcslee,aditya,marc\}@facebook.com} + +\maketitle + +\begin{abstract} +Thrift is a software library and set of code-generation tools developed at +Facebook to expedite development and implementation of efficient and scalable +backend services. Its primary goal is to enable efficient and reliable +communication across programming languages by abstracting the portions of each +language that tend to require the most customization into a common library +that is implemented in each language. Specifically, Thrift allows developers to +define datatypes and service interfaces in a single language-neutral file +and generate all the necessary code to build RPC clients and servers. + +This paper details the motivations and design choices we made in Thrift, as +well as some of the more interesting implementation details. It is not +intended to be taken as research, but rather it is an exposition on what we did +and why. +\end{abstract} + +% \category{D.3.3}{Programming Languages}{Language constructs and features} + +%\terms +%Languages, serialization, remote procedure call + +%\keywords +%Data description language, interface definition language, remote procedure call + +\section{Introduction} +As Facebook's traffic and network structure have scaled, the resource +demands of many operations on the site (i.e. search, +ad selection and delivery, event logging) have presented technical requirements +drastically outside the scope of the LAMP framework. In our implementation of +these services, various programming languages have been selected to +optimize for the right combination of performance, ease and speed of +development, availability of existing libraries, etc. 
By and large, +Facebook's engineering culture has tended towards choosing the best +tools and implementations available over standardizing on any one +programming language and begrudgingly accepting its inherent limitations. + +Given this design choice, we were presented with the challenge of building +a transparent, high-performance bridge across many programming languages. +We found that most available solutions were either too limited, did not offer +sufficient datatype freedom, or suffered from subpar performance. +\footnote{See Appendix A for a discussion of alternative systems.} + +The solution that we have implemented combines a language-neutral software +stack implemented across numerous programming languages and an associated code +generation engine that transforms a simple interface and data definition +language into client and server remote procedure call libraries. +Choosing static code generation over a dynamic system allows us to create +validated code that can be run without the need for +any advanced introspective run-time type checking. It is also designed to +be as simple as possible for the developer, who can typically define all +the necessary data structures and interfaces for a complex service in a single +short file. + +Surprised that a robust open solution to these relatively common problems +did not yet exist, we committed early on to making the Thrift implementation +open source. + +In evaluating the challenges of cross-language interaction in a networked +environment, some key components were identified: + +\textit{Types.} A common type system must exist across programming languages +without requiring that the application developer use custom Thrift datatypes +or write their own serialization code. That is, +a C++ programmer should be able to transparently exchange a strongly typed +STL map for a dynamic Python dictionary. Neither +programmer should be forced to write any code below the application layer +to achieve this. 
Section 2 details the Thrift type system. + +\textit{Transport.} Each language must have a common interface to +bidirectional raw data transport. The specifics of how a given +transport is implemented should not matter to the service developer. +The same application code should be able to run against TCP stream sockets, +raw data in memory, or files on disk. Section 3 details the Thrift Transport +layer. + +\textit{Protocol.} Datatypes must have some way of using the Transport +layer to encode and decode themselves. Again, the application +developer need not be concerned by this layer. Whether the service uses +an XML or binary protocol is immaterial to the application code. +All that matters is that the data can be read and written in a consistent, +deterministic manner. Section 4 details the Thrift Protocol layer. + +\textit{Versioning.} For robust services, the involved datatypes must +provide a mechanism for versioning themselves. Specifically, +it should be possible to add or remove fields in an object or alter the +argument list of a function without any interruption in service (or, +worse yet, nasty segmentation faults). Section 5 details Thrift's versioning +system. + +\textit{Processors.} Finally, we generate code capable of processing data +streams to accomplish remote procedure calls. Section 6 details the generated +code and TProcessor paradigm. + +Section 7 discusses implementation details, and Section 8 describes +our conclusions. + +\section{Types} + +The goal of the Thrift type system is to enable programmers to develop using +completely natively defined types, no matter what programming language they +use. By design, the Thrift type system does not introduce any special dynamic +types or wrapper objects. It also does not require that the developer write +any code for object serialization or transport.
The Thrift IDL (Interface +Definition Language) file is +logically a way for developers to annotate their data structures with the +minimal amount of extra information necessary to tell a code generator +how to safely transport the objects across languages. + +\subsection{Base Types} + +The type system rests upon a few base types. In considering which types to +support, we aimed for clarity and simplicity over abundance, focusing +on the key types available in all programming languages, omitting any +niche types available only in specific languages. + +The base types supported by Thrift are: +\begin{itemize} +\item \texttt{bool} A boolean value, true or false +\item \texttt{byte} A signed byte +\item \texttt{i16} A 16-bit signed integer +\item \texttt{i32} A 32-bit signed integer +\item \texttt{i64} A 64-bit signed integer +\item \texttt{double} A 64-bit floating point number +\item \texttt{string} An encoding-agnostic text or binary string +\item \texttt{binary} A byte array representation for blobs +\end{itemize} + +Of particular note is the absence of unsigned integer types. Because these +types have no direct translation to native primitive types in many languages, +the advantages they afford are lost. Further, there is no way to prevent the +application developer in a language like Python from assigning a negative value +to an integer variable, leading to unpredictable behavior. From a design +standpoint, we observed that unsigned integers were very rarely, if ever, used +for arithmetic purposes, but in practice were much more often used as keys or +identifiers. In this case, the sign is irrelevant. Signed integers serve this +same purpose and can be safely cast to their unsigned counterparts (most +commonly in C++) when absolutely necessary. + +\subsection{Structs} + +A Thrift struct defines a common object to be used across languages. A struct +is essentially equivalent to a class in object oriented programming +languages. 
A struct has a set of strongly typed fields, each with a unique +name identifier. The basic syntax for defining a Thrift struct looks very +similar to a C struct definition. Fields may be annotated with an integer field +identifier (unique to the scope of that struct) and optional default values. +Field identifiers will be automatically assigned if omitted, though they are +strongly encouraged for versioning reasons discussed later. + +\subsection{Containers} + +Thrift containers are strongly typed containers that map to the most commonly +used containers in common programming languages. They are annotated using +the C++ template (or Java Generics) style. There are three types available: +\begin{itemize} +\item \texttt{list<type>} An ordered list of elements. Translates directly into +an STL \texttt{vector}, Java \texttt{ArrayList}, or native array in scripting languages. May +contain duplicates. +\item \texttt{set<type>} An unordered set of unique elements. Translates into +an STL \texttt{set}, Java \texttt{HashSet}, \texttt{set} in Python, or native +dictionary in PHP/Ruby. +\item \texttt{map<type1,type2>} A map of strictly unique keys to values. +Translates into an STL \texttt{map}, Java \texttt{HashMap}, PHP associative +array, or Python/Ruby dictionary. +\end{itemize} + +While defaults are provided, the type mappings are not explicitly fixed. Custom +code generator directives have been added to substitute custom types in +destination languages (e.g. +\texttt{hash\_map} or Google's sparse hash map can be used in C++). The +only requirement is that the custom types support all the necessary iteration +primitives. Container elements may be of any valid Thrift type, including other +containers or structs.
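As a rough illustration of the default Python mappings listed above (list to \texttt{list}, set to \texttt{set}, map to dictionary), the following sketch checks native Python values against Thrift-style type descriptors, including nested containers. The tuple-based descriptor format and the \texttt{conforms} helper are assumptions made for this example only; they are not part of the Thrift library or of the code it generates, and integer ranges are deliberately not checked.

```python
# Hypothetical type descriptors, invented for this sketch: a base type is a
# string such as "i32"; a container is a tuple such as ("list", elem),
# ("set", elem) or ("map", key, value). None of this is Thrift API.
BASE_TYPES = {
    "bool": bool,
    "byte": int,
    "i16": int,
    "i32": int,
    "i64": int,
    "double": float,
    "string": str,
    "binary": bytes,
}


def conforms(value, ttype):
    """Return True if a native Python value matches the descriptor."""
    if isinstance(ttype, str):
        expected = BASE_TYPES[ttype]
        # bool is a subclass of int in Python, so reject it for integer types.
        if expected is int and isinstance(value, bool):
            return False
        return isinstance(value, expected)
    kind = ttype[0]
    if kind == "list":
        return isinstance(value, list) and all(
            conforms(v, ttype[1]) for v in value
        )
    if kind == "set":
        return isinstance(value, (set, frozenset)) and all(
            conforms(v, ttype[1]) for v in value
        )
    if kind == "map":
        return isinstance(value, dict) and all(
            conforms(k, ttype[1]) and conforms(v, ttype[2])
            for k, v in value.items()
        )
    raise ValueError("unknown container kind: %r" % (kind,))


# A map<i32, list<string>> value expressed with plain Python containers.
nested_ok = conforms({1: ["a", "b"]}, ("map", "i32", ("list", "string")))
```

Because containers may hold other containers, the check recurses in the same way a generated reader or writer would have to.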
+ +\begin{verbatim} +struct Example { + 1:i32 number=10, + 2:i64 bigNumber, + 3:double decimals, + 4:string name="thrifty" +}\end{verbatim} + +In the target language, each definition generates a type with two methods, +\texttt{read} and \texttt{write}, which perform serialization and transport +of the objects using a Thrift TProtocol object. + +\subsection{Exceptions} + +Exceptions are syntactically and functionally equivalent to structs except +that they are declared using the \texttt{exception} keyword instead of the +\texttt{struct} keyword. + +The generated objects inherit from an exception base class as appropriate +in each target programming language, in order to seamlessly +integrate with native exception handling in any given +language. Again, the design emphasis is on making the code familiar to the +application developer. + +\subsection{Services} + +Services are defined using Thrift types. Definition of a service is +semantically equivalent to defining an interface (or a pure virtual abstract +class) in object oriented +programming. The Thrift compiler generates fully functional client and +server stubs that implement the interface. Services are defined as follows: + +\begin{verbatim} +service <name> { + <returntype> <name>(<arguments>) + [throws (<exceptions>)] + ... +}\end{verbatim} + +An example: + +\begin{verbatim} +service StringCache { + void set(1:i32 key, 2:string value), + string get(1:i32 key) throws (1:KeyNotFound knf), + void delete(1:i32 key) +} +\end{verbatim} + +Note that \texttt{void} is a valid type for a function return, in addition to +all other defined Thrift types. Additionally, an \texttt{async} modifier +keyword may be added to a \texttt{void} function, which will generate code that does +not wait for a response from the server. Note that a pure \texttt{void} +function will return a response to the client which guarantees that the +operation has completed on the server side. 
With \texttt{async} method calls +the client will only be guaranteed that the request succeeded at the +transport layer. (In many transport scenarios this is inherently unreliable +due to the Byzantine Generals' Problem. Therefore, application developers +should take care only to use the async optimization in cases where dropped +method calls are acceptable or the transport is known to be reliable.) + +Also of note is the fact that argument lists and exception lists for functions +are implemented as Thrift structs. All three constructs are identical in both +notation and behavior. + +\section{Transport} + +The transport layer is used by the generated code to facilitate data transfer. + +\subsection{Interface} + +A key design choice in the implementation of Thrift was to decouple the +transport layer from the code generation layer. Though Thrift is typically +used on top of the TCP/IP stack with streaming sockets as the base layer of +communication, there was no compelling reason to build that constraint into +the system. The performance tradeoff incurred by an abstracted I/O layer +(roughly one virtual method lookup / function call per operation) was +immaterial compared to the cost of actual I/O operations (typically invoking +system calls). + +Fundamentally, generated Thrift code only needs to know how to read and +write data. The origin and destination of the data are irrelevant; it may be a +socket, a segment of shared memory, or a file on the local disk. 
The Thrift +transport interface supports the following methods: + +\begin{itemize} +\item \texttt{open} Opens the transport +\item \texttt{close} Closes the transport +\item \texttt{isOpen} Indicates whether the transport is open +\item \texttt{read} Reads from the transport +\item \texttt{write} Writes to the transport +\item \texttt{flush} Forces any pending writes +\end{itemize} + +There are a few additional methods not documented here which are used to aid +in batching reads and optionally signaling the completion of a read or +write operation from the generated code. + +In addition to the above +\texttt{TTransport} interface, there is a\\ +\texttt{TServerTransport} interface +used to accept or create primitive transport objects. Its interface is as +follows: + +\begin{itemize} +\item \texttt{open} Opens the transport +\item \texttt{listen} Begins listening for connections +\item \texttt{accept} Returns a new client transport +\item \texttt{close} Closes the transport +\end{itemize} + +\subsection{Implementation} + +The transport interface is designed for simple implementation in any +programming language. New transport mechanisms can be easily defined as needed +by application developers. + +\subsubsection{TSocket} + +The \texttt{TSocket} class is implemented across all target languages. It +provides a common, simple interface to a TCP/IP stream socket. + +\subsubsection{TFileTransport} + +The \texttt{TFileTransport} is an abstraction of an on-disk file to a data +stream. It can be used to write out a set of incoming Thrift requests to a file +on disk. The on-disk data can then be replayed from the log, either for +post-processing or for reproduction and/or simulation of past events. + +\subsubsection{Utilities} + +The Transport interface is designed to support easy extension using common +OOP techniques, such as composition. 
Some simple utilities include the +\texttt{TBufferedTransport}, which buffers the writes and reads on an +underlying transport, the \texttt{TFramedTransport}, which transmits data with frame +size headers for chunking optimization or nonblocking operation, and the +\texttt{TMemoryBuffer}, which allows reading and writing directly from the heap +or stack memory owned by the process. + +\section{Protocol} + +A second major abstraction in Thrift is the separation of data structure from +transport representation. Thrift enforces a certain messaging structure when +transporting data, but it is agnostic to the protocol encoding in use. That is, +it does not matter whether data is encoded as XML, human-readable ASCII, or a +dense binary format as long as the data supports a fixed set of operations +that allow it to be deterministically read and written by generated code. + +\subsection{Interface} + +The Thrift Protocol interface is very straightforward. It fundamentally +supports two things: 1) bidirectional sequenced messaging, and +2) encoding of base types, containers, and structs. 
+ +\begin{verbatim} +writeMessageBegin(name, type, seq) +writeMessageEnd() +writeStructBegin(name) +writeStructEnd() +writeFieldBegin(name, type, id) +writeFieldEnd() +writeFieldStop() +writeMapBegin(ktype, vtype, size) +writeMapEnd() +writeListBegin(etype, size) +writeListEnd() +writeSetBegin(etype, size) +writeSetEnd() +writeBool(bool) +writeByte(byte) +writeI16(i16) +writeI32(i32) +writeI64(i64) +writeDouble(double) +writeString(string) + +name, type, seq = readMessageBegin() + readMessageEnd() +name = readStructBegin() + readStructEnd() +name, type, id = readFieldBegin() + readFieldEnd() +k, v, size = readMapBegin() + readMapEnd() +etype, size = readListBegin() + readListEnd() +etype, size = readSetBegin() + readSetEnd() +bool = readBool() +byte = readByte() +i16 = readI16() +i32 = readI32() +i64 = readI64() +double = readDouble() +string = readString() +\end{verbatim} + +Note that every \texttt{write} function has exactly one \texttt{read} counterpart, with +the exception of \texttt{writeFieldStop()}. This is a special method +that signals the end of a struct. The procedure for reading a struct is to +\texttt{readFieldBegin()} until the stop field is encountered, and then to +\texttt{readStructEnd()}. The +generated code relies upon this call sequence to ensure that everything written by +a protocol encoder can be read by a matching protocol decoder. Further note +that this set of functions is by design more robust than necessary. +For example, \texttt{writeStructEnd()} is not strictly necessary, as the end of +a struct may be implied by the stop field. This method is a convenience for +verbose protocols in which it is cleaner to separate these calls (e.g. a closing +\texttt{</struct>} tag in XML). + +\subsection{Structure} + +Thrift structures are designed to support encoding into a streaming +protocol. The implementation should never need to frame or compute the +entire data length of a structure prior to encoding it. 
This is critical to +performance in many scenarios. Consider a long list of relatively large +strings. If the protocol interface required reading or writing a list to be an +atomic operation, then the implementation would need to perform a linear pass over the +entire list before encoding any data. However, if the list can be written +as iteration is performed, the corresponding read may begin in parallel, +theoretically offering an end-to-end speedup of $(kN - C)$, where $N$ is the size +of the list, $k$ is the cost factor associated with serializing a single +element, and $C$ is a fixed offset for the delay between data being written +and becoming available to read. + +Similarly, structs do not encode their data lengths a priori. Instead, they are +encoded as a sequence of fields, with each field having a type specifier and a +unique field identifier. Note that the inclusion of type specifiers allows +the protocol to be safely parsed and decoded without any generated code +or access to the original IDL file. Structs are terminated by a field header +with a special \texttt{STOP} type. Because all the basic types can be read +deterministically, all structs (even those containing other structs) can be +read deterministically. The Thrift protocol is self-delimiting without any +framing and regardless of the encoding format. + +In situations where streaming is unnecessary or framing is advantageous, it +can be very simply added into the transport layer, using the +\texttt{TFramedTransport} abstraction. + +\subsection{Implementation} + +Facebook has implemented and deployed a space-efficient binary protocol which +is used by most backend services. Essentially, it writes all data +in a flat binary format. Integer types are converted to network byte order, +strings are prepended with their byte length, and all message and field headers +are written using the primitive integer serialization constructs.
String names +for fields are omitted - when using generated code, field identifiers are +sufficient. + +We decided against some extreme storage optimizations (i.e. packing +small integers into ASCII or using a 7-bit continuation format) for the sake +of simplicity and clarity in the code. These alterations can easily be made +if and when we encounter a performance-critical use case that demands them. + +\section{Versioning} + +Thrift is robust in the face of versioning and data definition changes. This +is critical to enable staged rollouts of changes to deployed services. The +system must be able to support reading of old data from log files, as well as +requests from out-of-date clients to new servers, and vice versa. + +\subsection{Field Identifiers} + +Versioning in Thrift is implemented via field identifiers. The field header +for every member of a struct in Thrift is encoded with a unique field +identifier. The combination of this field identifier and its type specifier +is used to uniquely identify the field. The Thrift definition language +supports automatic assignment of field identifiers, but it is good +programming practice to always explicitly specify field identifiers. +Identifiers are specified as follows: + +\begin{verbatim} +struct Example { + 1:i32 number=10, + 2:i64 bigNumber, + 3:double decimals, + 4:string name="thrifty" +}\end{verbatim} + +To avoid conflicts between manually and automatically assigned identifiers, +fields with identifiers omitted are assigned identifiers +decrementing from -1, and the language only supports the manual assignment of +positive identifiers. + +When data is being deserialized, the generated code can use these identifiers +to properly identify the field and determine whether it aligns with a field in +its definition file. If a field identifier is not recognized, the generated +code can use the type specifier to skip the unknown field without any error. 
+Again, this is possible due to the fact that all datatypes are +self-delimiting. + +Field identifiers can (and should) also be specified in function argument +lists. In fact, argument lists are not only represented as structs on the +backend, but actually share the same code in the compiler frontend. This +allows for version-safe modification of method parameters. + +\begin{verbatim} +service StringCache { + void set(1:i32 key, 2:string value), + string get(1:i32 key) throws (1:KeyNotFound knf), + void delete(1:i32 key) +} +\end{verbatim} + +The syntax for specifying field identifiers was chosen to echo their structure. +Structs can be thought of as a dictionary where the identifiers are keys, and +the values are strongly-typed named fields. + +Field identifiers internally use the \texttt{i16} Thrift type. Note, however, +that the \texttt{TProtocol} abstraction may encode identifiers in any format. + +\subsection{Isset} + +When an unexpected field is encountered, it can be safely ignored and +discarded. When an expected field is not found, there must be some way to +signal to the developer that it was not present. This is implemented via an +inner \texttt{isset} structure inside the defined objects. (Isset functionality +is implicit with a \texttt{null} value in PHP, \texttt{None} in Python +and \texttt{nil} in Ruby.) Essentially, +the inner \texttt{isset} object of each Thrift struct contains a boolean value +for each field which denotes whether or not that field is present in the +struct. When a reader receives a struct, it should check for a field being set +before operating directly on it.
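The reader-side behaviour described in this section can be sketched in a few lines of Python: fields are decoded until a stop marker, unknown field identifiers are stepped over using only their type tag, and an isset map records which expected fields actually arrived. This is a minimal sketch under stated assumptions, not the Thrift binary protocol: the type tag values, the three-byte field header layout, and the helper names \texttt{write\_field}, \texttt{read\_value} and \texttt{read\_struct} are all invented for illustration.

```python
import struct

# Type tags chosen for this sketch only; not claimed to match any real protocol.
T_STOP, T_I32, T_I64, T_STRING = 0, 1, 2, 3


def write_field(buf, ftype, fid, value):
    """Append one field: 1-byte type tag, 2-byte signed id, then the value."""
    buf.extend(struct.pack("!bh", ftype, fid))
    if ftype == T_I32:
        buf.extend(struct.pack("!i", value))
    elif ftype == T_I64:
        buf.extend(struct.pack("!q", value))
    elif ftype == T_STRING:
        data = value.encode("utf-8")
        buf.extend(struct.pack("!i", len(data)) + data)
    else:
        raise ValueError("unknown type tag: %d" % ftype)


def read_value(data, pos, ftype):
    """Decode one value of the given type; return (value, next_position)."""
    if ftype == T_I32:
        return struct.unpack_from("!i", data, pos)[0], pos + 4
    if ftype == T_I64:
        return struct.unpack_from("!q", data, pos)[0], pos + 8
    if ftype == T_STRING:
        (length,) = struct.unpack_from("!i", data, pos)
        start = pos + 4
        return bytes(data[start:start + length]).decode("utf-8"), start + length
    raise ValueError("unknown type tag: %d" % ftype)


def read_struct(data, known_fields, defaults):
    """Decode fields until STOP.

    known_fields maps field id -> (name, type tag).
    Returns (values, isset), both keyed by field name.
    """
    values = dict(defaults)
    isset = {name: False for name, _ in known_fields.values()}
    pos = 0
    while True:
        (ftype,) = struct.unpack_from("!b", data, pos)
        pos += 1
        if ftype == T_STOP:
            break
        (fid,) = struct.unpack_from("!h", data, pos)
        pos += 2
        value, pos = read_value(data, pos, ftype)
        spec = known_fields.get(fid)
        if spec is not None and spec[1] == ftype:
            values[spec[0]] = value
            isset[spec[0]] = True
        # Otherwise the field is unknown (or its type changed). The type tag
        # was enough to step over it, so the value is simply dropped.
    return values, isset


# A newer writer sends field 5, which the older reader below does not know.
buf = bytearray()
write_field(buf, T_I32, 1, 42)
write_field(buf, T_STRING, 5, "new")
buf.append(T_STOP)

values, isset = read_struct(
    bytes(buf),
    {1: ("number", T_I32), 4: ("name", T_STRING)},
    {"number": 10, "name": "thrifty"},
)
```

Here the reader keeps its default for \texttt{name} and can tell from the isset map that the value was never sent, while field 5 is skipped without error.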
+ +\begin{verbatim} +class Example { + public: + Example() : + number(10), + bigNumber(0), + decimals(0), + name("thrifty") {} + + int32_t number; + int64_t bigNumber; + double decimals; + std::string name; + + struct __isset { + __isset() : + number(false), + bigNumber(false), + decimals(false), + name(false) {} + bool number; + bool bigNumber; + bool decimals; + bool name; + } __isset; +... +} +\end{verbatim} + +\subsection{Case Analysis} + +There are four cases in which version mismatches may occur. + +\begin{enumerate} +\item \textit{Added field, old client, new server.} In this case, the old +client does not send the new field. The new server recognizes that the field +is not set, and implements default behavior for out-of-date requests. +\item \textit{Removed field, old client, new server.} In this case, the old +client sends the removed field. The new server simply ignores it. +\item \textit{Added field, new client, old server.} The new client sends a +field that the old server does not recognize. The old server simply ignores +it and processes as normal. +\item \textit{Removed field, new client, old server.} This is the most +dangerous case, as the old server is unlikely to have suitable default +behavior implemented for the missing field. It is recommended that in this +situation the new server be rolled out prior to the new clients. +\end{enumerate} + +\subsection{Protocol/Transport Versioning} +The \texttt{TProtocol} abstractions are also designed to give protocol +implementations the freedom to version themselves in whatever manner they +see fit. Specifically, any protocol implementation is free to send whatever +it likes in the \texttt{writeMessageBegin()} call. It is entirely up to the +implementor how to handle versioning at the protocol level. The key point is +that protocol encoding changes are safely isolated from interface definition +version changes. + +Note that the exact same is true of the \texttt{TTransport} interface. 
For +example, if we wished to add some new checksumming or error detection to the +\texttt{TFileTransport}, we could simply add a version header into the +data it writes to the file in such a way that it would still accept old +log files without the given header. + +\section{RPC Implementation} + +\subsection{TProcessor} + +The last core interface in the Thrift design is the \texttt{TProcessor}, +perhaps the most simple of the constructs. The interface is as follows: + +\begin{verbatim} +interface TProcessor { + bool process(TProtocol in, TProtocol out) + throws TException +} +\end{verbatim} + +The key design idea here is that the complex systems we build can fundamentally +be broken down into agents or services that operate on inputs and outputs. In +most cases, there is actually just one input and output (an RPC client) that +needs handling. + +\subsection{Generated Code} + +When a service is defined, we generate a +\texttt{TProcessor} instance capable of handling RPC requests to that service, +using a few helpers. The fundamental structure (illustrated in pseudo-C++) is +as follows: + +\begin{verbatim} +Service.thrift + => Service.cpp + interface ServiceIf + class ServiceClient : virtual ServiceIf + TProtocol in + TProtocol out + class ServiceProcessor : TProcessor + ServiceIf handler + +ServiceHandler.cpp + class ServiceHandler : virtual ServiceIf + +TServer.cpp + TServer(TProcessor processor, + TServerTransport transport, + TTransportFactory tfactory, + TProtocolFactory pfactory) + serve() +\end{verbatim} + +From the Thrift definition file, we generate the virtual service interface. +A client class is generated, which implements the interface and +uses two \texttt{TProtocol} instances to perform the I/O operations. The +generated processor implements the \texttt{TProcessor} interface. 
The generated +code has all the logic to handle RPC invocations via the \texttt{process()} +call, and takes as a parameter an instance of the service interface, as +implemented by the application developer. + +The user provides an implementation of the application interface in separate, +non-generated source code. + +\subsection{TServer} + +Finally, the Thrift core libraries provide a \texttt{TServer} abstraction. +The \texttt{TServer} object generally works as follows. + +\begin{itemize} +\item Use the \texttt{TServerTransport} to get a \texttt{TTransport} +\item Use the \texttt{TTransportFactory} to optionally convert the primitive +transport into a suitable application transport (typically the +\texttt{TBufferedTransportFactory} is used here) +\item Use the \texttt{TProtocolFactory} to create an input and output protocol +for the \texttt{TTransport} +\item Invoke the \texttt{process()} method of the \texttt{TProcessor} object +\end{itemize} + +The layers are appropriately separated such that the server code needs to know +nothing about any of the transports, encodings, or applications in play. The +server encapsulates the logic around connection handling, threading, etc. +while the processor deals with RPC. The only code written by the application +developer lives in the definitional Thrift file and the interface +implementation. + +Facebook has deployed multiple \texttt{TServer} implementations, including +the single-threaded \texttt{TSimpleServer}, thread-per-connection +\texttt{TThreadedServer}, and thread-pooling \texttt{TThreadPoolServer}. + +The \texttt{TProcessor} interface is very general by design. There is no +requirement that a \texttt{TServer} take a generated \texttt{TProcessor} +object. Thrift allows the application developer to easily write any type of +server that operates on \texttt{TProtocol} objects (for instance, a server +could simply stream a certain type of object without any actual RPC method +invocation). 
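The division of labour between server, processor and handler described above can be sketched as follows. This is an illustrative Python model, not the Thrift runtime: the in-memory transport, the tuple-based protocol, and every class name here are stand-ins invented for the example, and returning \texttt{None} from \texttt{accept()} to stop the server is a sketch-only convention.

```python
from collections import deque


class MemoryTransport:
    """Stand-in for a client transport: queued requests in, replies out."""

    def __init__(self, requests):
        self.incoming = deque(requests)
        self.outgoing = []


class ListServerTransport:
    """Stand-in for a server transport that hands out prepared connections."""

    def __init__(self, transports):
        self._pending = deque(transports)

    def accept(self):
        # None means "no more clients" in this sketch.
        return self._pending.popleft() if self._pending else None


class TupleProtocol:
    """Stand-in protocol: each message is a (method_name, payload) tuple."""

    def __init__(self, transport):
        self._transport = transport

    def read_message(self):
        incoming = self._transport.incoming
        return incoming.popleft() if incoming else None

    def write_message(self, name, payload):
        self._transport.outgoing.append((name, payload))


class StringCacheHandler:
    """Application code: the only part a service author would write."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data[key]


class StringCacheProcessor:
    """Dispatches named calls to the handler through a method map."""

    def __init__(self, handler):
        self._process_map = {
            "set": lambda args: handler.set(*args),
            "get": lambda args: handler.get(*args),
        }

    def process(self, iprot, oprot):
        message = iprot.read_message()
        if message is None:
            return False  # nothing left to read on this connection
        name, args = message
        try:
            result = self._process_map[name](args)
        except KeyError:
            # Covers both an unknown method name and a missing cache key.
            oprot.write_message(name, ("exception", "KeyNotFound"))
        else:
            oprot.write_message(name, ("ok", result))
        return True


class SimpleServer:
    """Single-threaded loop: accept, wrap, then process until exhausted."""

    def __init__(self, processor, server_transport, tfactory, pfactory):
        self._processor = processor
        self._server_transport = server_transport
        self._tfactory = tfactory
        self._pfactory = pfactory

    def serve(self):
        while True:
            transport = self._server_transport.accept()
            if transport is None:
                return
            transport = self._tfactory(transport)
            iprot = self._pfactory(transport)
            oprot = self._pfactory(transport)
            while self._processor.process(iprot, oprot):
                pass


client = MemoryTransport([("set", (1, "one")), ("get", (1,)), ("get", (2,))])
server = SimpleServer(
    StringCacheProcessor(StringCacheHandler()),
    ListServerTransport([client]),
    lambda t: t,  # identity transport factory
    TupleProtocol,
)
server.serve()
```

Note that \texttt{SimpleServer} never touches the handler or the message contents; swapping in a different processor or protocol factory would not change the serving loop.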
+ +\section{Implementation Details} +\subsection{Target Languages} +Thrift currently supports five target languages: C++, Java, Python, Ruby, and +PHP. At Facebook, we have deployed servers predominantly in C++, Java, and +Python. Thrift services implemented in PHP have also been embedded into the +Apache web server, providing transparent backend access to many of our +frontend constructs using a \texttt{THttpClient} implementation of the +\texttt{TTransport} interface. + +Though Thrift was explicitly designed to be much more efficient and robust +than typical web technologies, as we were designing our XML-based REST web +services API we noticed that Thrift could be easily used to define our +service interface. Though we do not currently employ SOAP envelopes (in the +authors' opinions there is already far too much repetitive enterprise Java +software to do that sort of thing), we were able to quickly extend Thrift to +generate XML Schema Definition files for our service, as well as a framework +for versioning different implementations of our web service. Though public +web services are admittedly tangential to Thrift's core use case and design, +Thrift facilitated rapid iteration and affords us the ability to quickly +migrate our entire XML-based web service onto a higher performance system +should the need arise. + +\subsection{Generated Structs} +We made a conscious decision to make our generated structs as transparent as +possible. All fields are publicly accessible; there are no \texttt{set()} and +\texttt{get()} methods. Similarly, use of the \texttt{isset} object is not +enforced. We do not include any \texttt{FieldNotSetException} construct. +Developers have the option to use these fields to write more robust code, but +the system is robust to the developer ignoring the \texttt{isset} construct +entirely and will provide suitable default behavior in all cases. + +This choice was motivated by the desire to ease application development. 
Our stated +goal is not to make developers learn a rich new library in their language of +choice, but rather to generate code that allows them to work with the constructs +that are most familiar in each language. + +We also made the \texttt{read()} and \texttt{write()} methods of the generated +objects public so that the objects can be used outside of the context +of RPC clients and servers. Thrift is a useful tool simply for generating +objects that are easily serializable across programming languages. + +\subsection{RPC Method Identification} +Method calls in RPC are implemented by sending the method name as a string. One +issue with this approach is that longer method names require more bandwidth. +We experimented with using fixed-size hashes to identify methods, but in the +end concluded that the savings were not worth the headaches incurred. Reliably +dealing with conflicts across versions of an interface definition file is +impossible without a meta-storage system (i.e. to generate non-conflicting +hashes for the current version of a file, we would have to know about all +conflicts that ever existed in any previous version of the file). + +We wanted to avoid too many unnecessary string comparisons upon +method invocation. To deal with this, we generate maps from strings to function +pointers, so that invocation is effectively accomplished via a constant-time +hash lookup in the common case. This requires the use of a couple of interesting +code constructs. Because Java does not have function pointers, process +functions are all private member classes implementing a common interface. + +\begin{verbatim} +private class ping implements ProcessFunction { + public void process(int seqid, + TProtocol iprot, + TProtocol oprot) + throws TException + { ...} +} + +HashMap<String,ProcessFunction> processMap_ = + new HashMap<String,ProcessFunction>(); +\end{verbatim} + +In C++, we use a relatively esoteric language construct: member function +pointers.
+ +\begin{verbatim} +std::map<std::string, + void (ExampleServiceProcessor::*)(int32_t, + facebook::thrift::protocol::TProtocol*, + facebook::thrift::protocol::TProtocol*)> + processMap_; +\end{verbatim} + +Using these techniques, the cost of string processing is minimized, and we +reap the benefit of being able to easily debug corrupt or misunderstood data by +inspecting it for known string method names. + +\subsection{Servers and Multithreading} +Thrift services require basic multithreading to handle simultaneous +requests from multiple clients. For the Python and Java implementations of +Thrift server logic, the standard threading libraries distributed with the +languages provide adequate support. For the C++ implementation, no standard multithread runtime +library exists. Specifically, robust, lightweight, and portable +thread manager and timer class implementations do not exist. We investigated +existing implementations, namely \texttt{boost::thread}, +\texttt{boost::threadpool}, \texttt{ACE\_Thread\_Manager} and +\texttt{ACE\_Timer}. + +While \texttt{boost::threads}\cite{boost.threads} provides clean, +lightweight and robust implementations of multi-thread primitives (mutexes, +conditions, threads) it does not provide a thread manager or timer +implementation. + +\texttt{boost::threadpool}\cite{boost.threadpool} also looked promising but +was not far enough along for our purposes. We wanted to limit the dependency on +third-party libraries as much as possible. Because\\ +\texttt{boost::threadpool} is +not a pure template library and requires runtime libraries and because it is +not yet part of the official Boost distribution we felt it was not ready for +use in Thrift. As \texttt{boost::threadpool} evolves and especially if it is +added to the Boost distribution we may reconsider our decision to not use it. + +ACE has both a thread manager and timer class in addition to multi-thread +primitives. The biggest problem with ACE is that it is ACE. 
Unlike Boost, ACE +API quality is poor. Everything in ACE has large numbers of dependencies on +everything else in ACE, forcing developers to throw out standard +classes, such as STL collections, in favor of ACE's homebrewed implementations. In +addition, unlike Boost, ACE implementations demonstrate little understanding +of the power and pitfalls of C++ programming and take no advantage of modern +templating techniques to ensure compile-time safety and reasonable compiler +error messages. For all these reasons, ACE was rejected. Instead, we chose +to implement our own library, described in the following sections. + +\subsection{Thread Primitives} + +The Thrift thread libraries are implemented in the namespace\\ +\texttt{facebook::thrift::concurrency} and have three components: +\begin{itemize} +\item primitives +\item thread pool manager +\item timer manager +\end{itemize} + +As mentioned above, we were hesitant to introduce any additional dependencies +into Thrift. We decided to use \texttt{boost::shared\_ptr} because it is so +useful for multithreaded applications, it requires no link-time or +runtime libraries (i.e. it is a pure template library) and it is due +to become part of the C++0x standard. + +We implement standard \texttt{Mutex} and \texttt{Condition} classes, and a + \texttt{Monitor} class. The latter is simply a combination of a mutex and +condition variable and is analogous to the \texttt{Monitor} implementation provided for +the Java \texttt{Object} class. This is also sometimes referred to as a barrier. We +provide a \texttt{Synchronized} guard class to allow Java-like synchronized blocks. +This is just a bit of syntactic sugar, but, like its Java counterpart, clearly +delimits critical sections of code. Unlike its Java counterpart, we still +have the ability to programmatically lock, unlock, block, and signal monitors.
+ +\begin{verbatim} +void run() { + {Synchronized s(manager->monitor); + if (manager->state == TimerManager::STARTING) { + manager->state = TimerManager::STARTED; + manager->monitor.notifyAll(); + } + } +} +\end{verbatim} + +We again borrowed from Java the distinction between a thread and a runnable +class. A \texttt{Thread} is the actual schedulable object. The +\texttt{Runnable} is the logic to execute within the thread. +The \texttt{Thread} implementation deals with all the platform-specific thread +creation and destruction issues, while the \texttt{Runnable} implementation deals +with the application-specific per-thread logic. The benefit of this approach +is that developers can easily subclass the Runnable class without pulling in +platform-specific super-classes. + +\subsection{Thread, Runnable, and shared\_ptr} +We use \texttt{boost::shared\_ptr} throughout the \texttt{ThreadManager} and +\texttt{TimerManager} implementations to guarantee cleanup of dead objects that can +be accessed by multiple threads. For \texttt{Thread} class implementations, +\texttt{boost::shared\_ptr} usage requires particular attention to make sure +\texttt{Thread} objects are neither leaked nor dereferenced prematurely while +creating and shutting down threads. + +Thread creation requires calling into a C library. (In our case the POSIX +thread library, \texttt{libpthread}, but the same would be true for WIN32 threads). +Typically, the OS makes few, if any, guarantees about when \texttt{ThreadMain}, a C thread's entry-point function, will be called. Therefore, it is +possible that our thread create call, +\texttt{ThreadFactory::newThread()} could return to the caller +well before that time. To ensure that the returned \texttt{Thread} object is not +prematurely cleaned up if the caller gives up its reference prior to the +\texttt{ThreadMain} call, the \texttt{Thread} object makes a weak reference to +itself in its \texttt{start} method. 
+ +With the weak reference in hand, the \texttt{ThreadMain} function can attempt to get +a strong reference before entering the \texttt{Runnable::run} method of the +\texttt{Runnable} object bound to the \texttt{Thread}. If no strong references to the +thread are obtained between exiting \texttt{Thread::start} and entering \texttt{ThreadMain}, the weak reference returns \texttt{null} and the function +exits immediately. + +The need for the \texttt{Thread} to make a weak reference to itself has a +significant impact on the API. Since references are managed through the +\texttt{boost::shared\_ptr} templates, the \texttt{Thread} object must have a reference +to itself wrapped by the same \texttt{boost::shared\_ptr} envelope that is returned +to the caller. This necessitated the use of the factory pattern. +\texttt{ThreadFactory} creates the raw \texttt{Thread} object and a +\texttt{boost::shared\_ptr} wrapper, and calls a private helper method of the class +implementing the \texttt{Thread} interface (in this case, \texttt{PosixThread::weakRef}) + to allow it to make a weak reference to itself through the + \texttt{boost::shared\_ptr} envelope. + +\texttt{Thread} and \texttt{Runnable} objects reference each other. A \texttt{Runnable} +object may need to know about the thread in which it is executing, and a \texttt{Thread}, obviously, +needs to know what \texttt{Runnable} object it is hosting. This interdependency is +further complicated because the lifecycle of each object is independent of the +other. An application may create a set of \texttt{Runnable} objects to be reused in different threads, or it may create and forget a \texttt{Runnable} object +once a thread has been created and started for it. + +The \texttt{Thread} class takes a \texttt{boost::shared\_ptr} reference to the hosted +\texttt{Runnable} object in its constructor, while the \texttt{Runnable} class has an +explicit \texttt{thread} method to allow explicit binding of the hosted thread.
+\texttt{ThreadFactory::newThread} binds the objects to each other. + +\subsection{ThreadManager} + +\texttt{ThreadManager} creates a pool of worker threads and +allows applications to schedule tasks for execution as free worker threads +become available. The \texttt{ThreadManager} does not implement dynamic +thread pool resizing, but provides primitives so that applications can add +and remove threads based on load. This approach was chosen because +implementing load metrics and thread pool size is very application +specific. For example, some applications may want to adjust pool size based +on a running average of work arrival rates that are measured via polled +samples. Others may simply wish to react immediately to work-queue +depth high and low water marks. Rather than trying to create a complex +API abstract enough to capture these different approaches, we +simply leave it up to the particular application and provide the +primitives to enact the desired policy and sample current status. + +\subsection{TimerManager} + +\texttt{TimerManager} allows applications to schedule + \texttt{Runnable} objects for execution at some point in the future. Its specific task +is to allow applications to sample \texttt{ThreadManager} load at regular +intervals and make changes to the thread pool size based on application policy. +Of course, it can be used to generate any number of timer or alarm events. + +The default implementation of \texttt{TimerManager} uses a single thread to +execute expired \texttt{Runnable} objects. Thus, if a timer operation needs to +do a large amount of work and especially if it needs to do blocking I/O, +that should be done in a separate thread. + +\subsection{Nonblocking Operation} +Though the Thrift transport interfaces map more directly to a blocking I/O +model, we have implemented a high-performance \texttt{TNonBlockingServer} +in C++ based on \texttt{libevent} and the \texttt{TFramedTransport}.
We +implemented this by moving all I/O into one tight event loop using a +state machine. Essentially, the event loop reads framed requests into +\texttt{TMemoryBuffer} objects. Once entire requests are ready, they are +dispatched to the \texttt{TProcessor} object, which can read directly from +the data in memory. + +\subsection{Compiler} +The Thrift compiler is implemented in C++ using standard \texttt{lex}/\texttt{yacc} +lexing and parsing. Though it could have been implemented with fewer +lines of code in another language (e.g. Python Lex-Yacc (PLY) or \texttt{ocamlyacc}), using C++ +forces explicit definition of the language constructs. Strongly typing the +parse tree elements (debatably) makes the code more approachable for new +developers. + +Code generation is done using two passes. The first pass looks only for +include files and type definitions. Type definitions are not checked during +this phase, since they may depend upon include files. All included files +are sequentially scanned in a first pass. Once the include tree has been +resolved, a second pass over all files is taken that inserts type definitions +into the parse tree and raises an error on any undefined types. The program is +then generated against the parse tree. + +Due to inherent complexities and potential for circular dependencies, +we explicitly disallow forward declaration. Two Thrift structs cannot +each contain an instance of the other. (Since we do not allow \texttt{null} +struct instances in the generated C++ code, this would actually be impossible.) + +\subsection{TFileTransport} +The \texttt{TFileTransport} logs Thrift requests/structs by +framing incoming data with its length and writing it out to disk. +Using a framed on-disk format allows for better error checking and +helps with the processing of a finite number of discrete events. The\\ +\texttt{TFileWriterTransport} uses a system of swapping in-memory buffers +to ensure good performance while logging large amounts of data.
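The length-prefixed framing mentioned above can be illustrated with a small sketch. The 4-byte big-endian length header and the helper names \texttt{writeFrame} and \texttt{readFrame} are assumptions made for this example only; the exact on-disk layout used by \texttt{TFileTransport} is not specified here. The point is that a reader can detect a truncated record from the header alone, which is what makes framed logs easier to check.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Appends one frame to `out`: an assumed 4-byte big-endian length header
// followed by the payload bytes.
void writeFrame(std::vector<std::uint8_t>& out, const std::string& payload) {
  const std::uint32_t n = static_cast<std::uint32_t>(payload.size());
  out.push_back(static_cast<std::uint8_t>((n >> 24) & 0xFF));
  out.push_back(static_cast<std::uint8_t>((n >> 16) & 0xFF));
  out.push_back(static_cast<std::uint8_t>((n >> 8) & 0xFF));
  out.push_back(static_cast<std::uint8_t>(n & 0xFF));
  out.insert(out.end(), payload.begin(), payload.end());
}

// Reads the frame that starts at `pos`. On success, stores the payload,
// advances `pos` past the frame, and returns true. Returns false and leaves
// `pos` unchanged if the header or the payload is truncated.
bool readFrame(const std::vector<std::uint8_t>& in, std::size_t& pos,
               std::string& payload) {
  if (pos > in.size() || in.size() - pos < 4) {
    return false;  // not enough bytes for the length header
  }
  const std::uint32_t n = (static_cast<std::uint32_t>(in[pos]) << 24) |
                          (static_cast<std::uint32_t>(in[pos + 1]) << 16) |
                          (static_cast<std::uint32_t>(in[pos + 2]) << 8) |
                          static_cast<std::uint32_t>(in[pos + 3]);
  if (in.size() - pos - 4 < n) {
    return false;  // header promises more bytes than are present
  }
  const std::size_t begin = pos + 4;
  payload.assign(in.begin() + static_cast<std::ptrdiff_t>(begin),
                 in.begin() + static_cast<std::ptrdiff_t>(begin + n));
  pos = begin + n;
  return true;
}
```

Writing two payloads with \texttt{writeFrame} and then calling \texttt{readFrame} repeatedly from position zero returns them in order; a buffer cut off in the middle of a record makes \texttt{readFrame} return false instead of yielding a partial message.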
+A Thrift log file is split up into chunks of a specified size; logged messages +are not allowed to cross chunk boundaries. A message that would cross a chunk +boundary will cause padding to be added until the end of the chunk and the +first byte of the message is aligned to the beginning of the next chunk. +Partitioning the file into chunks makes it possible to read and interpret data +from a particular point in the file. + +\section{Facebook Thrift Services} +Thrift has been employed in a large number of applications at Facebook, including +search, logging, mobile, ads and the developer platform. Two specific usages are discussed below. + +\subsection{Search} +Thrift is used as the underlying protocol and transport layer for the Facebook Search service. +The multi-language code generation is well suited for search because it allows for application +development in an efficient server-side language (C++) and allows the Facebook PHP-based web application +to make calls to the search service using Thrift PHP libraries. There is also a large +variety of search stats, deployment and testing functionality that is built on top +of generated Python code. Additionally, the Thrift log file format is +used as a redo log for providing real-time search index updates. Thrift has allowed the +search team to leverage each language for its strengths and to develop code at a rapid pace. + +\subsection{Logging} +The Thrift \texttt{TFileTransport} functionality is used for structured logging. Each +service function definition along with its parameters can be considered to be +a structured log entry identified by the function name. This log can then be used for +a variety of purposes, including inline and offline processing, stats aggregation and as a redo log. + +\section{Conclusions} +Thrift has enabled Facebook to build scalable backend +services efficiently by enabling engineers to divide and conquer.
Application +developers can focus on application code without worrying about the +sockets layer. We avoid duplicated work by writing buffering and I/O logic +in one place, rather than interspersing it in each application. + +Thrift has been employed in a wide variety of applications at Facebook, +including search, logging, mobile, ads, and the developer platform. We have +found that the marginal performance cost incurred by an extra layer of +software abstraction is far eclipsed by the gains in developer efficiency and +systems reliability. + +\appendix + +\section{Similar Systems} +The following are software systems similar to Thrift. Each is (very!) briefly +described: + +\begin{itemize} +\item \textit{SOAP.} XML-based. Designed for web services via HTTP, excessive +XML parsing overhead. +\item \textit{CORBA.} Relatively comprehensive, debatably overdesigned and +heavyweight. Comparably cumbersome software installation. +\item \textit{COM.} Embraced mainly in Windows client software. Not an entirely +open solution. +\item \textit{Pillar.} Lightweight and high-performance, but missing versioning +and abstraction. +\item \textit{Protocol Buffers.} Closed-source, owned by Google. Described in +Sawzall paper. +\end{itemize} + +\acks + +Many thanks for feedback on Thrift (and extreme trial by fire) are due to +Martin Smith, Karl Voskuil and Yishan Wong. + +Thrift is a successor to Pillar, a similar system developed +by Adam D'Angelo, first while at Caltech and continued later at Facebook. +Thrift simply would not have happened without Adam's insights. + +\begin{thebibliography}{} + +\bibitem{boost.threads} +Kempf, William, +``Boost.Threads'', +\url{http://www.boost.org/doc/html/threads.html} + +\bibitem{boost.threadpool} +Henkel, Philipp, +``threadpool'', +\url{http://threadpool.sourceforge.net} + +\end{thebibliography} + +\end{document} |