Summary of the Gerrit User Summit & Hackathon 2019 in Sunnyvale
High-performance Summit in numbers
The Gerrit User Summit 2019 has ended with the best set of achievements in the 11-year history of the Gerrit open-source project:
- Two dates and locations in a 12-month period: Gothenburg (Sweden) and Sunnyvale (California).
- Four Gerrit releases delivered: v2.15.16, v2.16.11, v3.0.2, v3.1.0.
- 127 people registered across the two locations, 87 attended on-site (roughly 70% turnout) and 38 followed the event remotely at different times using the live streaming coverage provided by GerritForge.
- 373 changes merged (204 in Gothenburg, 169 in Sunnyvale).
- 32 developers attended the Hackathons, 8 of whom had never contributed or attended an event before.
- Gerrit v3.1.0 released as the highest-performing version of Gerrit, with over 2x Git and REST-API performance compared to v3.0.x.
- 22 talks presented across Gothenburg and Sunnyvale, with 6 new speakers who had never presented at the Summit before.
The performance of the Summit is yet more evidence of the continuous growth of the community and of the increased synergies with the JGit, OpenStack/Zuul and Tuleap open-source projects.
Sunnyvale Hackathon summary
Gerrit v3.1.0 preparation, load testing and release
During the Hackathon, David Pursehouse worked on the release of Gerrit v3.1.0, with the help and support of all the other developers at the hackathon.
Following the experience of the previous releases, this year the major focus was on the stability, end-to-end and load testing of the release. Matthias Sohn (SAP), Fabio Ponciroli (GerritForge) and Antonio Barone (GerritForge) worked on improving the Gerrit E2E test suite to perform A/B testing of Gerrit v3.0 vs. v3.1.
GerritForge upgraded the GerritHub.io multi-site setup early on, keeping one data-centre (Canada) on v3.0 and upgrading the second data-centre (Germany) to v3.1. GerritHub.io was thus the target of the Gerrit v3.1 validation tests, which completed successfully and showed a 2x performance improvement between the two releases.
NOTE: The E2E tests for Gerrit are based on the Gatling open-source framework, with the Git protocol support implemented by GerritForge.
Support for large repositories in Gerrit
Luca Milanesio (GerritForge), Matthias Sohn (SAP) and Martin Fick (Qualcomm) discussed the issues associated with very large repositories:
- JVM heap utilisation and associated GC cycles
  Luca shared the problems and investigations associated with the long stop-the-world (STW) GC pauses observed when running Git operations on large repositories. Such operations need to create a large in-memory packfile and thus require the JVM to allocate a very large contiguous area of memory. That allocation could, in some cases, trigger a STW GC cycle that makes the Gerrit server unavailable for a few seconds.
- Git in-memory cache of Packfiles and BLOBs
  Matthias shared his experience at SAP in dealing with large repositories. The JVM heap allocated is huge, up to 500 GBytes. A big part of the heap is dedicated to the in-memory packfile cache, which should avoid the continuous allocation and release of large areas of memory. However, it appears that, even though the cache is still needed, the JVM at times releases part of it, which may cause the continuous memory allocation and release that triggers STW GC cycles.
- Quotas support for expensive operations
  Martin proposed a change to the Gerrit quotas to block or delay incoming operations in the execution queue. It would make it possible to identify operations that could potentially trigger a STW GC and reschedule them for a later time. Whilst this would not completely solve the problem, it would give the Gerrit instance some “breathing space” to recover heap before serving the expensive operations.
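The idea of delaying expensive operations can be illustrated with a minimal Python sketch. It is not Gerrit's quota implementation: the `Operation` class, the `estimated_heap_mb` cost estimate and the `schedule` function are hypothetical names introduced only for this example, and it assumes the cost of each operation can be estimated up front.

```python
from dataclasses import dataclass


@dataclass
class Operation:
    name: str
    estimated_heap_mb: int  # hypothetical cost estimate, not a real Gerrit metric


def schedule(operations, free_heap_mb):
    """Split queued operations into those served now and those deferred.

    An operation is deferred when its estimated heap need does not fit
    into the heap that is still free, so it can be retried after the
    heap has recovered.
    """
    run_now, deferred = [], []
    for op in operations:
        if op.estimated_heap_mb > free_heap_mb:
            deferred.append(op)  # too expensive right now, reschedule later
        else:
            run_now.append(op)
            free_heap_mb -= op.estimated_heap_mb  # reserve heap for this op
    return run_now, deferred


queue = [
    Operation("fetch small repo", 100),
    Operation("clone huge repo", 4000),
    Operation("push change", 200),
]
run_now, deferred = schedule(queue, free_heap_mb=1000)
```

With 1000 MB of free heap in this made-up scenario, the two cheap operations are served immediately while the large clone waits, which is the “breathing space” described above.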
Review and merge of the ref-table support in JGit and Gerrit
Han-Wen Nienhuys (Google) and Matthias Sohn (SAP) worked on the final review and submission of the JGit implementation of reftable, which was initially designed by Shawn Pearce but never applied to the open-source code-base. Han-Wen redesigned the feature to make it compatible with the filesystem-based implementation of JGit.
The “implement FileReftableDatabase” change has been merged into JGit and later included in Gerrit v3.1.2.
The Git reftable is an alternative storage format for keeping the list of Git refs on the filesystem. The formats currently implemented in Git are loose refs and packed refs, neither of which scales for repositories with a large number of refs (e.g. 500k or more).
With regard to the reftable performance, the following table is worth a thousand words:
format | cache | scan | by name | by SHA-1 |
---|---|---|---|---|
packed-refs | cold | 402 ms | 409,660.1 usec | 412,535.8 usec |
packed-refs | hot | | 6,844.6 usec | 20,110.1 usec |
reftable | cold | 112.0 ms | 33.9 usec | 323.2 usec |
reftable | hot | | 20.2 usec | 320.8 usec |
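To give an intuition for why by-name lookups can stay cheap with a very large number of refs, here is a minimal Python sketch of a sorted, block-indexed layout. It is only an illustration of the general technique, not the actual reftable on-disk format: `build_blocks`, `lookup` and the block size are hypothetical choices made for this example.

```python
from bisect import bisect_right


def build_blocks(refs, block_size):
    """Sort (name, sha) pairs and split them into fixed-size blocks.

    Returns a small index holding the first ref name of every block,
    plus the blocks themselves.
    """
    ordered = sorted(refs)
    blocks = [ordered[i:i + block_size] for i in range(0, len(ordered), block_size)]
    index = [block[0][0] for block in blocks]
    return index, blocks


def lookup(index, blocks, name):
    """Find the SHA of a ref by name while reading a single block."""
    pos = bisect_right(index, name) - 1  # last block starting at or before name
    if pos < 0:
        return None
    for ref_name, sha in blocks[pos]:
        if ref_name == name:
            return sha
    return None


# 1,000 made-up refs with fake 40-character hex ids
refs = [(f"refs/changes/{i:04d}/1", f"{i:040x}") for i in range(1, 1001)]
index, blocks = build_blocks(refs, block_size=64)
sha = lookup(index, blocks, "refs/changes/0042/1")
```

Only the small index and one block are touched per lookup, instead of the whole list of refs.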
Merge of the two forks of the high-availability plugin
The high-availability plugin was started in 2016 by Ericsson with the aim of allowing an active failover of their Gerrit master setup. Over the years, the plugin has received many contributions from different companies, including CollabNet, SAP and GerritForge.
Starting from 2018, GerritForge began to fork the plugin because of the need to have urgent fixes merged, fixes that also made their way into the mainstream repository. However, as we all know, forking is easy but merging is a lot more complicated and painful: the fork continued for over two years, with duplication of effort and a disparity of fix levels between the two forks.
Marco Miller (Ericsson), David Ostrovsky and Luca Milanesio (GerritForge) worked hard to merge the two forks and align them in terms of functionality and fixes. After the hackathon, over the following few weeks, GerritForge’s fork was successfully merged into the main repository.
The only active version of the high-availability plugin is now the mainstream repository. David Ostrovsky and Luca Milanesio have been officially granted the role of maintainers, together with the current Ericsson and CollabNet members.
Multi-site plugin decoupled from Kafka and Zookeeper
The multi-site plugin was originally released in April 2019 and was fully based on the Kafka/Zookeeper infrastructure for the alignment of indexes, caches and events across sites.
During the hackathon, Marcin Czech (GerritForge) worked on abstracting the Kafka/Zookeeper layer out of the multi-site plugin. That allows Gerrit multi-site to be deployed in the future on a different infrastructure, possibly more cloud-native and integrated with the services of the major cloud providers.
The Kafka broker interface has been placed in the kafka events plugin, which was previously used only for stream events and is now also used for indexing and cache consistency.
With regard to Zookeeper, an initial request to include generic support for a global-refdb was presented but then abandoned because of its unanimous rejection by the Gerrit community.
While waiting for a different solution to be presented, the support for Zookeeper has been moved to a GerritForge-owned repository on GitHub.
Summit summary
The talks have been mainly centred on the new features introduced in Gerrit v3.1:
- The porting of PolyGerrit to Polymer 2 and the new development team in Germany
- Performance improvements in v3.1
- Support for Git protocol v2
Some of the talks presented in Gothenburg have been replayed in Sunnyvale as well, with the addition of brand-new talks about the new features and developments completed in the past three months.
This year the Summit was hosted in the new home of GerritForge in the USA, downtown Sunnyvale, at The Satellite in the historic Del Monte building.
What’s new in Gerrit v3.0/v3.1
David Ostrovsky, Luca Milanesio (GerritForge) and Patrick Hiesel (Google) presented the new features and improvements introduced in Gerrit v3.0/v3.1, two closely related versions.
Gerrit v3.0 and v3.1 include 1,589 and 1,443 commits respectively, which together make over 3k changes compared to the latest v2.16.x releases. The Gerrit major release number was incremented because of the breaking changes introduced:
- Removal of the GWT UI
- Removal of ReviewDb (deprecated from v2.16)
- Removal of pushes to refs/drafts/* and refs/changes/*
New and noteworthy features include:
- Re-introduction of Git protocol v2
- Significant speed-up of the Gerrit frontend and backend, showing up to 2x performance improvement (Gatling automated tests)
Any upgrade to Gerrit v3.0/v3.1 requires a stop at v2.16 to convert the changes from ReviewDb to NoteDb.
Road-map and migration path to Gerrit v3
Luca Milanesio (GerritForge) presented a deep-dive into the high-level process of migrating Gerrit from old releases to the latest v3.1.
Migrating is always difficult, and Gerrit migrations before the advent of NoteDb were always cursed by the schema upgrades needed by ReviewDb. However, migrating to the latest version is not optional: it must be planned and executed systematically.
Luca classified Gerrit migrations in four quadrants, based on their version distance and installation size.
1. Trivial
   Small upgrade step (e.g. v2.15 to v2.16) for a small-sized Gerrit setup. It is typically resolved by a war upgrade and a Gerrit restart.
2. Complex
   Small upgrade step for a large-scale Gerrit setup. It typically requires more coordination and communication with the teams about the planning and execution of the cutover plan. The outage window needs to be tested and reduced to a minimum.
3. Risky
   Big upgrade step (e.g. v2.11 to v3.1) for a small-sized Gerrit setup. The big gap between releases introduces functional differences and gaps in the different features (e.g. draft changes migrated to WIP/Private).
4. Ultrahazardous
   Big upgrade step (e.g. v2.11 to v3.1) for a large-scale Gerrit setup. The big functional gap, combined with a large setup involving potentially hundreds or thousands of people, may lead to a very delicate and hazardous upgrade.
Luca gave an overview of how to plan and execute migrations of type 1 (Trivial), 2 (Complex) and 3 (Risky), and advised avoiding type 4 (Ultrahazardous) migrations, as they may lead to expensive and unnecessary risks.
Any upgrade of type 4 can be broken down into a series of type 2 upgrades, which lowers the risk and increases the confidence in, and understanding of, the new Gerrit features.
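The four quadrants and the idea of splitting one big upgrade into smaller hops can be sketched in a few lines of Python. This is only an illustration of the classification described in the talk: the function names are hypothetical, and the intermediate stops in the example path are assumptions made for the sake of the example, not a recommended upgrade plan.

```python
def classify_migration(big_version_gap, large_setup):
    """Map version distance and installation size to one of the four quadrants."""
    quadrants = {
        (False, False): "Trivial",
        (False, True): "Complex",
        (True, False): "Risky",
        (True, True): "Ultrahazardous",
    }
    return quadrants[(bool(big_version_gap), bool(large_setup))]


def split_into_steps(release_path):
    """Turn an ordered list of releases into consecutive (from, to) upgrade steps."""
    return list(zip(release_path, release_path[1:]))


# One big jump on a large installation is the case to avoid ...
one_big_jump = classify_migration(big_version_gap=True, large_setup=True)

# ... whereas each smaller hop on the same installation is a type 2 upgrade.
steps = split_into_steps(["v2.11", "v2.14", "v2.16", "v3.1"])  # illustrative stops
per_step = [classify_migration(False, True) for _ in steps]
```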
Gareth Bowles (Apple) presented his experience of managing Gerrit and automating each phase of its lifecycle using Ansible. Apple’s installation has 1k projects with over 670k patch-sets and is used by 800+ people worldwide.
Cesare San Martino (GerritForge) explained how the adoption of Gerrit high-availability plugin and architecture can help in lowering the risks associated with migrations and reduce the outage window to a minimum if not even to zero in certain cases.
Gerrit Q&A with the maintainers
For the very first time the Q&A was a global event, allowing people on-site in Sunnyvale and remote attendees following the stream around the globe to interact and put questions directly to the Gerrit maintainers.
The questions were wide-ranging and covered multiple topics:
- Status of the Gerrit plugins
- Onboarding of new contributors to the Gerrit project
- New organisation of the Gerrit Open-Source community with ESC and CMs
- Gerrit vs. GitHub vs. GitLab: competition or integration
- Pull-request workflow for Gerrit
What’s cooking in JGit
Ivan Frade and Han-Wen Nienhuys (Google) presented the innovative features coming in the forthcoming versions of JGit. This is the first time since the last GitTogether in 2011 that core Git contributors have taken part in an event with a mixed Git/Gerrit audience.
Ivan presented what’s new on the JGit server side, which is the backend that serves the Git protocol for the Chromium and Android Open-Source projects.
The new features introduced in JGit from v5.2/3/4/5 and master are focussed on:
- Exposing the server options mechanism, made possible by the introduction of Git protocol v2. That enabled valuable features like Git-protocol level tracing from a server-side perspective.
- Consistency on demand and update indexes, which allow Git servers on multiple sites to pass a consistency version token and detect when commits have been replicated to remote servers and are thus ready to be fetched.
- Reachability checker optimisation, which reduces the execution time and CPU utilisation on large repositories when validating the “WANT SHA1” commands received from the Git client.
- Sideband-all, which means that at any point in the communication the client and server can pass parallel information via the normal Git protocol client/server communication. That allows new use-cases like packfile off-loading, a new capability that tells the client about a series of mirrors from which the packfiles can be fetched concurrently.
- Local reftables, presented by Han-Wen, are an innovative storage format that allows repositories to scale to millions of refs without significantly impacting the access time on the filesystem, while reducing lock contention in case of concurrent updates.
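To make the reachability check more concrete, here is a naive Python baseline of what validating the wanted commits involves: walking the commit graph from the advertised ref tips and checking that every wanted SHA-1 is found. It is not JGit's optimised checker; the `reachable_wants` function and the toy commit graph are hypothetical and only show the kind of work the optimisation aims to reduce.

```python
from collections import deque


def reachable_wants(parents, tips, wants):
    """Return the subset of wants that is reachable from the given tips.

    parents maps a commit id to the list of its parent commit ids.
    The walk stops early as soon as every wanted commit has been found.
    """
    remaining = set(wants)
    seen = set()
    queue = deque(tips)
    while queue and remaining:
        commit = queue.popleft()
        if commit in seen:
            continue
        seen.add(commit)
        remaining.discard(commit)
        queue.extend(parents.get(commit, []))
    return set(wants) - remaining


# Toy history: a <- b <- c <- d (advertised tip), plus x on a hidden branch
history = {"d": ["c"], "c": ["b"], "b": ["a"], "a": [], "x": ["a"]}
allowed = reachable_wants(history, tips=["d"], wants=["b", "x"])
```

In this toy history the client may fetch `b`, which is an ancestor of the advertised tip, but not `x`, which lives on a branch that was never advertised. On a repository with millions of commits, a naive walk like this one is exactly what becomes expensive.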
New developments and team structure in the PolyGerrit Team
Google’s Gerrit frontend team has been successfully re-staffed with four new hires over the summer. Today it consists of Ben, Dhruv, Dmitrii, Milutin, Ole, Tao - all working from the Munich office alongside Google’s backend team.
For the 3.1 release the frontend infrastructure has been changed to use Polymer 2 instead of Polymer 1, which among other things means that all UI components are encapsulated using the Shadow DOM. The team’s focus is on further infrastructure projects (Polymer 3, stronger typing, npm, content-security-policy, …), performance, checks and a new feature for tracking whose turn it is for all your code reviews.
Status of the Gerrit Code-Review Analytics for the Android open-source project
David Ostrovsky and Luca Milanesio (GerritForge) presented the work done to extend the Gerrit DevOps Analytics Open-Source platform (GDA) to cover also the use-case of the Android Open-Source Project (AOSP).
The GDA platform collects information about Git commits, reviews and logs and correlates them to build dashboards of KPIs that are relevant to the people involved with the project.
Luca described the challenges of applying the platform to AOSP:
- Mirroring of AOSP repositories to GerritHub.io in order to minimise network traffic during the Big-Data processing.
- Scaling up the current performance of the analytics extractors and ELT so that AOSP branches are resolved quickly and without impact on the JVM utilisation.
- David presented the challenge of parsing foreign NoteDb change-data using the Gerrit internal API.
What’s new in the Bazel tool-chain for Gerrit
David Ostrovsky (GerritForge) presented the progress in adopting the latest Bazel versions in the build tool-chain of Gerrit and its plugins.
The Gerrit build process is complex, and involves Java, JavaScript, 160+ dependencies, 150+ plugins built in two modes (standalone and in-tree). All of that needs to be orchestrated, automated and executed in a fast, correct and reproducible way.
Gerrit started as a Maven build project (until v2.7), later moved to Buck (v2.8-v2.13) and eventually adopted Bazel (v2.14 onwards). Bazel is the industry standard for large, distributed and fast builds executed on build servers. It is used by large companies around the globe, including Spotify, Uber, Stripe, nVidia, Volvo and many others.
David explained how the overall build process works in Gerrit and highlighted the versions where the build is actively supported by the community (v2.16 onwards). Bazel builds are orchestrated by the Gerrit CI, initially created by GerritForge and now actively supported by the whole Gerrit community.
Last but not least, David explained some tips and tricks on how to perform integration tests in Gerrit using the TestContainers library, which makes it possible to automatically test and validate more complex scenarios like ElasticSearch indexes and the Gerrit multi-site plugin.
Racy JGit
Matthias Sohn (SAP) presented the history of how time is used in JGit and the struggle to improve the resiliency to the git racy-reads problem.
It all started with bug #544199, reported by Luca Milanesio (GerritForge), which was later fixed by adjusting the way packfile cache consistency is checked against the filesystem.
The fix opened up the Pandora’s box of the 2.5s hard-coded resolution used in JGit for dealing with racy reads. The additional checks made to ensure that a packfile has not changed after being cached in memory led to bug #546891, a performance regression. JGit historically used a hard-coded resolution of 2.5s because the FAT filesystem stores timestamps with a 2s resolution (the extra 0.5s is a safety margin); FAT can still be found on some Windows systems running Eclipse.
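The racy-read problem itself can be shown with a small Python sketch. It is a simplified model rather than JGit's actual code: the function names are hypothetical, times are plain seconds, and the 2.5s value is simply the historical hard-coded resolution mentioned above.

```python
def is_racy(last_modified, last_read, fs_resolution):
    """True when the file was read too soon after its modification time.

    A later write within the same timestamp tick would leave the
    modification time unchanged, so the cached copy cannot be trusted.
    """
    return last_read - last_modified <= fs_resolution


def is_unmodified(cached_mtime, cached_read_time, current_mtime, fs_resolution):
    """Decide whether a cached file can be reused without re-reading it."""
    if current_mtime != cached_mtime:
        return False  # the timestamp changed: definitely modified
    # Same timestamp: only trustworthy if the snapshot was not racy.
    return not is_racy(cached_mtime, cached_read_time, fs_resolution)


COARSE = 2.5  # seconds, the historical hard-coded resolution
FINE = 0.001  # seconds, an assumed resolution for a modern filesystem

# A packfile written at t=10.0 and cached at t=10.5 ...
trusted_coarse = is_unmodified(10.0, 10.5, 10.0, COARSE)
trusted_fine = is_unmodified(10.0, 10.5, 10.0, FINE)
```

With the coarse resolution the cached packfile has to be re-checked, while with a finer measured resolution it can be reused. This is why detecting the real filesystem resolution matters for performance.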
Matthias and the other folks at SAP have been working hard to improve the way the filesystem resolution is detected and make all of that available in JGit transparently, without having to configure anything special in Gerrit or JGit.
The problem was not easy to resolve as the complex combination of JVM versions, OSes and filesystems created a series of conditions that could have made the calculation of the resolution a lot harder than initially thought.
The challenge was eventually completed after seven months of work by six different authors (Chris, Han-Wen, Luca, Matthias, Marc, Thomas) and 82 commits across 22 different service releases. JGit with the racy-reads problem resolved and optimised is now included in all the latest active versions of Gerrit.
OSSUM with Gerrit
Miikka Andersson from CollabNet gave a presentation about the company’s latest product initiative: ossum. Ossum is a developer-focused SaaS solution for software engineering needs with a strong focus on planning, version control, and Continuous Integration.
Gerrit is one of ossum’s key components, on top of which the entire version control service was built. The decision to choose Gerrit for the backend wasn’t a coincidence: CollabNet has a long history with Gerrit, and ossum is already the company’s second product providing a Gerrit-powered Git service.
In his presentation, Miikka went through some of the key factors contributing to the decision to choose Gerrit to be used as Git backend for the new product initiative. In addition to that, some of the key takeaways and lessons learnt from earlier Gerrit-based product initiatives were shared with the audience.
Revert submission
Gal Paikin (paiking@) from Google showed a new feature he was working on. In this presentation he described RevertSubmission, a new endpoint that allows reverting multiple changes simultaneously. This endpoint is meant to ease the workflow of the many engineers who submit many changes together.
Gerrit metrics and dashboards
We all know metrics are important for monitoring the status of our systems, so that we realise Gerrit is not working before our users tell us.
Gerrit logs are an undervalued gold mine of metrics. In this presentation Fabio Ponciroli (aka Ponch, GerritForge) showed his five favourite metrics, which help in the daily job of a Gerrit admin.
Stress your Gerrit with Gatling
Fabio Ponciroli (GerritForge), aka Ponch, showed the work on implementing a consistent end-to-end test scenario for Gerrit by leveraging the Gatling tool.
Testing Gerrit involves invoking the REST API, simulating the PolyGerrit UI, and also using the Git/HTTP and Git/SSH protocols. Gatling, however, does not support the Git protocol out-of-the-box. Ponch introduced the gatling-git project, which extends Gatling to include the Git protocol.
The definition of end-to-end tests is further simplified by using the Gatling “feeders”. Those are sample data in JSON format, which can also be generated from existing Gerrit production logs.
Ponch then showcased, to Luca’s surprise, a real use-case of running load tests against GerritHub.io, which generated the expected spike of incoming traffic.
A post about the topic, with the video of the presentation, has been published on the GerritForge blog.
Feedback and proposals of improvements for the next Summits
For the very first time, the Q&A with the maintainers was held with a vast audience, including the people from Europe in Gothenburg, the users in Silicon Valley and the remote attendees using the GerritForge live streaming.
The audience was very active and asked many questions related to the Gerrit release management (can Gerrit have a more stable and well-defined release plan?), to the plugins lifecycle management and to the new processes introduced in the community, like the design-driven contribution.
The recurring question about the “competition” between Gerrit, GitLab and GitHub also came back, with different feedback from various users and companies. There are also people still happily using IBM ClearCase! That means there isn’t a gold standard or a single platform that resolves all the use-cases.
The main reason people and companies have adopted Gerrit is the need for scalability and managing a large number of users across different sites across the globe.
Another thing that emerged again is how to smooth the learning curve for new adopters of Gerrit Code Review, possibly by letting them contribute using a branch or pull-request review model, in conjunction with the typical change-based code review.
All the discussions and hints were captured in the Gerrit Code Review Issue Tracker, associated with the label retrospective for easier discovery and tracking.
Thank you again to all the attendees of the Gerrit User Summit 2019 at Volvo (Sweden) and GerritForge Inc. (California). We look forward to another exciting year of innovation and development of the Gerrit Code Review platform and community.
Luca Milanesio (Gerrit Maintainer, Release Manager, ESC member) with contributions and reviews by David Pursehouse (CollabNet), Fabio Ponciroli (GerritForge), Matthias Sohn (SAP), David Ostrovsky, Gal Paikin (Google), Douglas Luedtke (Garmin), Nasser Grainawi (Qualcomm).