Splunk : Jenkins, OpenTelemetry, Observability

January 06, 2022 at 12:08 pm

By Jeremy Hicks January 06, 2022

If you're like most organizations, you're leveraging Jenkins for all sorts of things: deployment pipelines, automated API tests, even glorified cron jobs, to name just a few.

How Do You Gain Insight Into These Various Types of Pipelines?

Build Logs: Good for auditing and troubleshooting, but difficult to use for long-term metric trends.
Time Series Metrics: Great for establishing the health of Jenkins instances and identifying issues over longer time periods. Less suited to gathering data on specific jobs or job steps, because per-job and per-step dimensions drive up cardinality.
Tracing/APM: A less common approach that provides detailed waterfall charts of individual runs and their steps, so you can check outcomes and per-step data at a glance. Think of it as build-log detail combined with time series visualization.
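To make the "waterfall" idea concrete, here is a minimal sketch of how a trace of one pipeline run (a root span with child spans for each step) turns into a waterfall. The `Span` record and its fields are hypothetical simplifications, not the format the OpenTelemetry Jenkins plugin actually exports; in practice Splunk APM draws this chart for you.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """Hypothetical, simplified span record; real exported spans carry many more attributes."""
    span_id: str
    parent_id: Optional[str]  # None marks the root span (the whole pipeline run)
    name: str
    start: float  # seconds
    end: float    # seconds


def waterfall(spans: list[Span], width: int = 40) -> list[str]:
    """Render spans as indented text bars, each child under its parent, siblings ordered by start time."""
    root = next(s for s in spans if s.parent_id is None)
    total = root.end - root.start
    children: dict[str, list[Span]] = {}
    for s in spans:
        if s.parent_id is not None:
            children.setdefault(s.parent_id, []).append(s)

    lines: list[str] = []

    def visit(span: Span, depth: int) -> None:
        # Scale the span's offset and length to the chart width (at least one character wide).
        offset = round((span.start - root.start) / total * width)
        length = max(1, round((span.end - span.start) / total * width))
        bar = " " * offset + "#" * length
        label = ("  " * depth + span.name).ljust(20)
        lines.append(f"{label}|{bar.ljust(width)}| {span.end - span.start:.1f}s")
        for child in sorted(children.get(span.span_id, []), key=lambda c: c.start):
            visit(child, depth + 1)

    visit(root, 0)
    return lines
```

Printing each line of `waterfall(spans)` for a run with checkout, build, and test steps shows at a glance which step dominated the run, which is exactly the view a time series of aggregate durations cannot give you.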

Tracing and APM for Jenkins recently became much more straightforward with the advent of the OpenTelemetry project and the OpenTelemetry Jenkins plugin (maintained by Cyrille Le Clerc). Once it is configured, a single click can take you from your Jenkins job to a detailed waterfall chart of the entire pipeline run!

Why Would I Want APM Data from Jenkins?

By combining OpenTelemetry (OTel), Jenkins, and Splunk APM, you can use the granularity of distributed tracing to uncover specifics of your Jenkins usage that were previously difficult to see, while keeping full control of your data.

Are build queue times getting excruciatingly long? Quickly identify the builds and steps that are holding up Jenkins for unusual amounts of time.
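One simple way to define "unusual amounts of time" is to compare each step against its own typical duration. The sketch below flags any (run, step) whose duration exceeds a multiple of that step's median across runs. The input tuples and the 3x-median threshold are illustrative assumptions, not how Splunk APM detects outliers.

```python
from collections import defaultdict
from statistics import median


def slow_steps(
    durations: list[tuple[str, str, float]], factor: float = 3.0
) -> list[tuple[str, str, float]]:
    """Flag (run_id, step_name, seconds) entries slower than `factor` times that step's median.

    The 3x-median threshold is an illustrative heuristic, not Splunk APM's outlier logic.
    """
    by_step: dict[str, list[float]] = defaultdict(list)
    for _run, step, secs in durations:
        by_step[step].append(secs)
    medians = {step: median(vals) for step, vals in by_step.items()}
    return [(run, step, secs) for run, step, secs in durations if secs > factor * medians[step]]
```

Comparing against a per-step median rather than a global one keeps a normally long step (a full build) from drowning out an abnormally long short one (a checkout that suddenly takes a minute).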

Noticing a slow increase in the time it takes to run pipelines across your organization? Send your Jenkins APM data through Splunk Log Observer to emit time series metrics for every step, and easily visualize increases (or decreases) in time spent on each step across all jobs, even after your Jenkins data has aged out of APM.
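Conceptually, turning per-span step data into long-lived time series means rolling individual durations up into one value per step per time bucket. The sketch below does that in plain Python with daily buckets and a mean; the record shape is a hypothetical assumption, and Log Observer's own metric rules are configured in the product rather than written like this.

```python
from collections import defaultdict
from datetime import datetime, timezone


def daily_step_means(records: list[tuple[float, str, float]]) -> dict[tuple[str, str], float]:
    """Roll (epoch_seconds, step_name, duration_seconds) records up into a mean duration per UTC day and step."""
    sums: dict[tuple[str, str], float] = defaultdict(float)
    counts: dict[tuple[str, str], int] = defaultdict(int)
    for ts, step, secs in records:
        day = datetime.fromtimestamp(ts, tz=timezone.utc).date().isoformat()
        sums[(day, step)] += secs
        counts[(day, step)] += 1
    return {key: sums[key] / counts[key] for key in sums}
```

Because each (day, step) pair keeps only one number, the aggregate stays cheap to retain long after the individual traces have expired.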

Are calls to external services taking longer than average? Perhaps git checkout is slower than usual, or a given API's responses have slowed down over time. Splunk APM's Tag Spotlight can help visualize lengthy calls to external services in your pipeline with P50, P90, and P99 values.
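For readers less familiar with P50/P90/P99, the sketch below computes them per external call using the nearest-rank definition. The input shape (call name mapped to a list of durations) is an assumption for illustration, and Splunk APM may compute percentiles with a different (for example, approximate) method.

```python
import math


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(calls: dict[str, list[float]]) -> dict[str, dict[str, float]]:
    """P50/P90/P99 latency for each external call name, e.g. 'git checkout'."""
    return {name: {f"P{p}": percentile(vals, p) for p in (50, 90, 99)} for name, vals in calls.items()}
```

A P99 that climbs while the P50 stays flat usually points to occasional slow calls (a congested Git server, a throttled API) rather than a uniformly slower dependency.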

Want to know when another team's builds are happening that may impact your service? Set up a detector on their deployments and have an event marker show up on your dashboards, so you can quickly establish whether their deployment has impacted your service's performance.
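The question an event marker answers is essentially "which deployments landed shortly before this change in performance?" The sketch below expresses that correlation step directly. The `Deployment` record, the team and service names, and the 30-minute default window are hypothetical choices for illustration; in Splunk Observability Cloud the detector and event overlay do this visually.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    """Hypothetical deployment event, e.g. derived from a Jenkins pipeline's root span."""
    team: str
    service: str
    finished_at: float  # epoch seconds when the deployment run finished


def suspects(deployments: list[Deployment], anomaly_at: float, window_s: float = 1800.0) -> list[Deployment]:
    """Deployments that finished within `window_s` seconds before an anomaly, most recent first."""
    hits = [d for d in deployments if anomaly_at - window_s <= d.finished_at <= anomaly_at]
    return sorted(hits, key=lambda d: d.finished_at, reverse=True)
```

Sorting most recent first reflects the usual triage order: the deployment closest in time to the regression is typically the first one worth checking.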

APM (or distributed tracing for those historically inclined) is a powerful tool for understanding interactions over the entire lifespan of a given process; in this case, Jenkins deployments. Not only does it give you a nifty waterfall chart of where time was spent in each step of a Jenkins deployment, but it also provides additional data to aggregate with more common time series metrics and traditional build logs. Various parts of your organization may benefit from Jenkins trace data in unexpected ways:

IT Operations / Support Analysts: As a member (or Head) of the IT Operations team, I want up-to-date build information on important services so that I can notify software teams if recent deployments are causing, or are related to, a service interruption.
DevOps / SRE: As a member (or Head) of a DevOps or SRE team, I need to ensure services are healthy and, if they are not, quickly track down the cause. Giving stakeholders visibility into issues caused by deployments of applications and infrastructure will help them improve their software development and deployment practices, improving overall mean time to detect (MTTD).
Software Developers: As a member (or Head) of a Software Development team, I want to know, before customers are impacted and without jumping between different tools and UIs, whether a recent deployment of my own software or of an upstream service is causing an issue.
CI/CD: As a member (or Head) of a team in charge of Continuous Integration / Continuous Delivery solutions, I want to understand why and where our CI/CD pipelines are slowing down, and how best to address any issues, so we can quickly improve the CI/CD services provided to DevOps, SRE, and Software Development teams.

With Jenkins trace data, Splunk APM can address these concerns quickly and in one place, without tool sprawl. There is no need to jump between separate tools and interfaces for Jenkins, logging, and monitoring data to understand what's really going on.

Setup: How to Hit the Ground Running

To get set up, check out the GitHub repository for OpenTelemetry Collector configuration examples, documentation, and two Splunk Observability Cloud dashboard exports to get you started. Armed with these artifacts and an OpenTelemetry Collector, you'll quickly be able to provide more detailed Jenkins insights for IT Operations, CI/CD teams, and DevOps professionals.

Figure 1-1. Get detailed Jenkins pipeline metrics with Jenkins APM data

Out-of-the-Box Dashboards

The GitHub repository linked in this post includes two dashboards meant to help you understand specific Jenkins pipelines as well as overall Jenkins health. They can be used as-is or as a starting point for building your own, more detailed deployment dashboards.

The GitHub repository also includes instructions and SignalFlow for setting up a detector that notifies you of failed deployments. This sort of detector is useful not only for knowing when your own deployments have issues, but also for knowing when an upstream service you depend on is having a problem due to a failed (or successful) deployment. Exposing these types of events on your dashboards provides more context with less tool sprawl.

How do you get these insights today, and how much effort does it require?

Figure 1-2. Overall Jenkins Health: observe valuable Jenkins agent, build queuing, and even detailed step metrics (with Log Observer) at a glance.

The Future

OpenTelemetry, APM, and Infrastructure Monitoring are crucial, and until now separate, tools for understanding your services. With their powers combined in one tool, you can more quickly establish the effects of deployments, understand Jenkins performance, and notify teams of issues with their own or other services related to software builds and releases. But the future is even brighter! These additional insights into Jenkins can help unlock metrics for better understanding the larger impact of DevOps within your organization.

Jenkins and DORA: Best DevOps Friends

DevOps Research and Assessment (DORA) metrics address a fundamental set of concerns when measuring DevOps activity and performance. The four key DORA metrics, each of which may benefit from or require additional Jenkins context, are:

Deployment Frequency: How often are you deploying your code with Jenkins? Chances are that, with a bit of effort, you can already dig up and report on this data. But imagine a dashboard per org, team, or service showing this number in a single chart, built from APM or Log Observer data emitted by Jenkins.
Change Failure Rate: Going hand in hand with Deployment Frequency is Change Failure Rate. Much like the familiar monitoring metric Error Rate, it can be visualized quickly from your Jenkins APM data at various levels of organizational complexity. This metric can be invaluable for determining and prioritizing DevOps work related to improving how you deliver and deploy software.
Mean Lead Time for Changes: Knowing when you're deploying is a crucial element of establishing the overall development time required to get a change from ticket inception to final deployment into production. Using your new Jenkins data in Observability Cloud, along with additional signals from other software such as Jira and GitHub, you're well on your way to establishing a trackable flow from ticket, to development, and on to deployment.
Time to Recovery: The final piece of the DORA puzzle, directly related to its observability-focused cousin, Mean Time To Recovery (MTTR). Understanding Time to Recovery requires Jenkins metrics that show when a deployment went out with a breaking change and when the fix was finally deployed to production.
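Once deployment events are available as data, the four metrics above reduce to simple arithmetic. The sketch below computes them from a list of deployment records under several assumptions that are ours, not the post's: each record carries a commit timestamp (for example, pulled from GitHub), lead time is measured from commit rather than from ticket creation, a `failed` flag marks deployments that introduced a production failure, and recovery ends at the next healthy deployment of the same service.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass(frozen=True)
class Deploy:
    """Hypothetical deployment record assembled from Jenkins trace data plus commit metadata."""
    service: str
    committed_at: float  # epoch seconds of the change's commit (assumed source: GitHub)
    deployed_at: float   # epoch seconds when the Jenkins deployment finished
    failed: bool         # True if this deployment introduced a production failure


def dora(deploys: list[Deploy], period_days: float) -> dict[str, Optional[float]]:
    if not deploys:
        raise ValueError("need at least one deployment")
    ordered = sorted(deploys, key=lambda d: d.deployed_at)

    # Time to recovery: from the first breaking deployment of a service until its next healthy one.
    outage_start: dict[str, float] = {}
    recoveries: list[float] = []
    for d in ordered:
        if d.failed:
            outage_start.setdefault(d.service, d.deployed_at)
        elif d.service in outage_start:
            recoveries.append(d.deployed_at - outage_start.pop(d.service))

    return {
        "deployments_per_day": len(ordered) / period_days,
        "change_failure_rate": sum(d.failed for d in ordered) / len(ordered),
        "mean_lead_time_hours": mean(d.deployed_at - d.committed_at for d in ordered) / 3600,
        # None when no failure has been recovered from yet in this window.
        "mean_time_to_recovery_hours": mean(recoveries) / 3600 if recoveries else None,
    }
```

Keeping the earliest breaking deployment per service (via `setdefault`) means two failed deployments in a row count as one outage, measured from when the service first broke.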

Next Steps

Armed with your new Jenkins metrics and APM data, get out there and scrutinize pipelines, evaluate deployments, and generally push your DevOps Magic™ to the limit!

Want to quickly start understanding your Jenkins deployment? You can sign up to start a free trial of the Splunk Observability Cloud suite of products today!

This blog post was authored by Jeremy Hicks, Observability Field Solutions Engineer at Splunk, with special thanks to Doug Erkkila, Adam Schalock, Todd DeCapua, Tom Martin, Marie Duran, and Joel Schoenberg at Splunk.

Disclaimer

Splunk Inc. published this content on 06 January 2022 and is solely responsible for the information contained therein. Distributed by Public, unedited and unaltered, on 06 January 2022 17:07:04 UTC.
