# Description of CI build configuration

## Variables needed by travis

- GITHUB_TOKEN - GitHub token with push access to the repository
- DOCKER_USERNAME - Username (netdatabot) with write access to the Docker Hub repository
- DOCKER_PASS - Password for the Docker Hub account
- encrypted_8daf19481253_key - Key needed by openssl to decrypt the GCS credentials file
- encrypted_8daf19481253_iv - IV needed by openssl to decrypt the GCS credentials file
- COVERITY_SCAN_TOKEN - Token to allow Coverity scan analysis uploads
- SLACK_USERNAME - Required for the Slack notifications triggered by the Travis pipeline
- SLACK_CHANNEL - The channel Travis will post messages to
- SLACK_NOTIFY_WEBHOOK_URL - The incoming webhook URL as provided by the Slack integration. Visit the Apps integration page in Slack to generate the required hook
- SLACK_BOT_NAME - The name your bot will appear with on Slack
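The two `encrypted_8daf19481253_*` variables are the hex key and IV that Travis's standard file-encryption scheme feeds to `openssl aes-256-cbc`. A self-contained sketch of the round trip (the key, IV, and filenames here are made-up placeholders, not the real secrets):

```shell
#!/bin/sh
# Sketch only: the key/IV below are placeholders, NOT the repository secrets
# (those live in the encrypted_8daf19481253_* Travis variables).
set -e
workdir="$(mktemp -d)"
key="0011223344556677889900112233445566778899001122334455667788990011"  # 64 hex chars = 256-bit key
iv="00112233445566778899001122334455"                                   # 32 hex chars = 128-bit IV

echo '{"type": "service_account"}' > "$workdir/credentials.json"

# Encrypt (roughly what `travis encrypt-file` does up front) ...
openssl aes-256-cbc -e -K "$key" -iv "$iv" \
    -in "$workdir/credentials.json" -out "$workdir/credentials.json.enc"

# ... and decrypt, which is what the pipeline does at build time.
openssl aes-256-cbc -d -K "$key" -iv "$iv" \
    -in "$workdir/credentials.json.enc" -out "$workdir/decrypted.json"
```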

## CI workflow details
Our CI pipeline is designed to help us identify and mitigate risks at all stages of implementation.
To accommodate this need, we use [Travis CI](http://www.travis-ci.com) as our CI/CD tool.
Our main areas of concern are:
1) Only push code that works. That means failing fast, so that we can improve before we reach the public

2) Reduce the time to market to a minimum, by streamlining the release process.
   That means a lot of testing, a lot of consistency checks, a lot of validations

3) Consistency of generated artifacts. We should not allow broken software to reach the public.
   When this happens, it's embarrassing and we struggle to eliminate it.

4) We are an innovative company, so we love to automate :)


Having said that, here's a brief introduction to Netdata's improved CI/CD pipeline with Travis.
Our CI/CD lifecycle contains three different execution entry points:
1) A user opens a pull request to netdata/master: Travis will run a pipeline on the branch under that PR
2) A merge or commit happens on netdata/master. This will trigger Travis to run, but we have two distinct cases in this scenario:
   a) A user merges a pull request to netdata/master: Travis will run on master, after the merge.
   b) A user runs a commit/merge with a special keyword (mentioned later).
      This triggers a release for either minor, major or release candidate versions, depending on the keyword
3) A scheduled job runs on master once per day: Travis will run on master at the scheduled interval
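The keyword mechanism in 2b can be sketched as a simple match on the commit message. The bracketed keywords below are illustrative assumptions, not necessarily the exact ones the pipeline looks for:

```shell
#!/bin/sh
# Hypothetical sketch: derive the release type from the latest commit message.
# The bracketed keywords are illustrative assumptions.
release_type_for() {
    case "$1" in
        *"[netdata major release]"*)     echo "major" ;;
        *"[netdata minor release]"*)     echo "minor" ;;
        *"[netdata release candidate]"*) echo "rc" ;;
        *)                               echo "none" ;;
    esac
}

# On Travis the message would come from: git log -1 --format=%B
release_type_for "Prepare v2.0 [netdata major release]"   # prints "major"
```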

To accommodate all three entry points, our CI/CD workflow has a set of steps that run on all of them.
Once all these steps are successful, our pipeline executes another subset of steps for entry points 2 and 3.
In Travis terms the "steps" are "stages", and within each stage we execute a set of activities called "jobs".

### Always run: Stages that run on all three execution entry points

#### Code quality, linting, syntax, code style
At this early stage we iterate through a set of basic quality-control checks:
- Shell checking: Run linters over our various Bash scripts
- Checksum validators: Run validators to ensure our installers and documentation are in sync
- Dashboard validator: We provide a pre-generated dashboard.js script file, and we validate that it is up to date
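The checksum-validation idea can be sketched in a few lines of POSIX shell; the file names below are stand-ins for the real kickstart script and the checksum recorded in the documentation:

```shell
#!/bin/sh
# Hypothetical sketch of the checksum validator: the real job verifies that
# the checksum published in the docs still matches the kickstart script
# being shipped. File names here are stand-ins.
set -e
workdir="$(mktemp -d)"
printf '#!/bin/sh\necho netdata kickstart\n' > "$workdir/kickstart.sh"

# Pretend the docs recorded this checksum when the script was last updated.
sha256sum "$workdir/kickstart.sh" | cut -d' ' -f1 > "$workdir/documented.sha256"

# The validator recomputes the checksum and compares it against the docs.
actual="$(sha256sum "$workdir/kickstart.sh" | cut -d' ' -f1)"
if [ "$actual" = "$(cat "$workdir/documented.sha256")" ]; then
    result="in sync"
else
    result="out of sync"
fi
```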

#### Build process
At this stage, basically, we build :-)
We do a baseline check of our build artifacts to guarantee they are not broken.
Briefly, our activities include:
- Verify that the Docker image builds successfully
- Run the standard netdata installer, to make sure we build & run properly
- Do the same through 'make dist', as this is our stable channel for our kickstart files
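As a rough illustration, a build stage along these lines could be declared in `.travis.yml`. The job names and script commands are assumptions for the sketch, not the repository's exact configuration:

```yaml
# Hypothetical sketch of the build stage; names and commands are assumptions.
jobs:
  include:
    - stage: Build process
      name: Verify docker build
      script: docker build -t netdata/netdata:test .
    - stage: Build process
      name: Standard installer
      script: ./netdata-installer.sh --dont-wait --dont-start-it
    - stage: Build process
      name: Build via make dist
      script: ./autogen.sh && ./configure && make dist
```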

#### Artifacts validation
At this point we know our software builds; now we go through a set of checks to guarantee
that our product meets certain expectations. At the current stage, we are focusing on basic capabilities
like installing on different distributions and running the full install-run-update-install lifecycle.
We are still enriching this with more and more use cases, to get us closer to full stability of our software.
Briefly, we currently evaluate the following activities:
- Basic software unit testing
- Non-containerized build and install on Ubuntu 14.04
- Non-containerized build and install on Ubuntu 18.04
- Running the full netdata lifecycle (install, update, uninstall) on Ubuntu 18.04
- Build and install on CentOS 6
- Build and install on CentOS 7
(More to come)
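The lifecycle check boils down to running the three phases in order and failing fast if any phase breaks. A minimal sketch, where the three functions are stand-ins for the real installer, updater, and uninstaller invocations:

```shell
#!/bin/sh
# Hypothetical sketch of the install-update-uninstall lifecycle check.
# The functions are stand-ins for the real invocations, roughly:
#   ./netdata-installer.sh --dont-wait   (install)
#   netdata-updater.sh                   (update)
#   netdata-uninstaller.sh --yes         (uninstall)
set -e
install_netdata()   { echo "installed"; }
update_netdata()    { echo "updated"; }
uninstall_netdata() { echo "uninstalled"; }

lifecycle=""
for step in install_netdata update_netdata uninstall_netdata; do
    out="$("$step")"                      # set -e aborts on the first failure
    lifecycle="${lifecycle}${lifecycle:+ }${out}"
done
echo "$lifecycle"   # prints "installed updated uninstalled"
```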

### Nightly operations: Stages that run daily under cron
The nightly stages cover the daily activities that produce our nightly latest releases.
We also maintain a couple of cron jobs that run during the night to provide us with deeper insights,
for example Coverity scanning or extended kickstart checksum checks.

#### Nightly operations
At this stage we run the scheduled jobs: the nightly changelog generator, Coverity scans,
the labeler for our issues, and the extended kickstart files checksum validations.

#### Nightly release
During this stage we build and publish the latest Docker images, prepare the nightly artifacts,
and deploy those artifacts to our Google Cloud storage provider.
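A hedged sketch of what the nightly deploy step could look like in `.travis.yml`, using Travis's GCS deployment provider. The variable names, bucket name, and directory are assumptions for illustration:

```yaml
# Hypothetical sketch; credentials variables, bucket, and paths are assumptions.
deploy:
  provider: gcs
  access_key_id: "$GCS_ACCESS_KEY_ID"      # hypothetical interoperable HMAC key
  secret_access_key: "$GCS_SECRET_KEY"
  bucket: netdata-nightlies                # hypothetical bucket name
  local_dir: artifacts
  skip_cleanup: true
  on:
    branch: master
    condition: $TRAVIS_EVENT_TYPE = cron   # nightly cron runs only
```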


### Publishing
Publishing is responsible for executing the major/minor/patch releases and is separated
into two stages: the packaging preparation process and publishing itself.

#### Packaging for release
During packaging we prepare the release changelog information and run the labeler.

#### Publish for release
This is the most complex part of the release process. At this stage we generate and publish the Docker images,
prepare the release artifacts, and get the release draft ready.