Observability is the ability to infer a system’s state from output data. Or is it the sum of logs, metrics and traces? Well, both actually. The former is a qualitative and abstract expression, while the latter is more quantitative and specific.
Both are definitions people seem to subscribe to, according to the DevOps Pulse 2020 survey. DevOps Pulse is an annual survey conducted by logz.io, a company active in the observability space. As in 2019, more than 1,000 people took the survey, which offers interesting insights into the state of DevOps.
As we did in 2019, we connected with a logz.io executive to discuss the survey's findings. Unlike the 2019 edition, DevOps Pulse 2020 was released towards the end of 2020 rather than at the beginning of 2021. The main reason is that the release was timed to coincide with a logz.io product announcement about tracing, which logz.io sees as the next stage in observability and which is also the survey's main finding.
Observability’s maturity curve
ZDNet connected with Jonah Kowall, logz.io CTO, to discuss DevOps Pulse 2020, observability at large, and tracing in particular. Kowall, who is relatively new at logz.io but experienced in the observability space, said what attracted him to the company was the dynamic of its open source offering.
As we have seen before, logz.io offers open source observability frameworks, such as Elastic's ELK stack and Grafana, as a service. It’s a well-known pattern: rather than running open source software themselves, many organizations find it makes sense to have a third party run it for them.
When discussing “what do we talk about when we talk about observability”, we noted that about 1 in 3 survey respondents said it’s “using logs, metrics and traces”, while another 1 in 3 said it’s “the measure of how well a system’s state can be inferred from output data”. These are different ways of saying the same thing, noted Kowall, but neither focuses on why you’re doing it.
The reason is not only to detect and diagnose problems, but also to get an understanding of how your digital business is functioning. There’s a maturity curve of sorts that you see around observability, Kowall went on to add. Companies that don’t have observability often don’t see their problems until users complain — the reactive mode.
Slightly more advanced companies start asking: “Now that we see a problem, how do we fix it? Who is the best person to fix it?” Then the most advanced users, a very small percentage, use that data to drive their business decisions: “How do I infer or observe my business based on the data coming from my applications and infrastructure?”
This pointed us towards two more interesting findings in DevOps Pulse 2020. One, respondents seemed to be uncertain as to what it is they do exactly, or whose job observability actually is. This is consistent with the findings in 2019 — not much has changed. Kowall thinks a lot of this is about names, or what people think of themselves.
Some teams used to call this job operations; they may have renamed themselves DevOps, but have changed little otherwise. The real sign of DevOps in action, Kowall went on to add, is adopting new ways of building and releasing software: continuous deployment and continuous change. When this happens, it really is DevOps, a shared responsibility that requires shared data and shared roles between the Dev and Ops teams.
Two, about 70 percent of respondents said they’re using between two and four observability tools. Trying to figure out what this could possibly mean, we also looked at how people responded when asked to share what types of observability tools they are using.
The vast majority of respondents use tools for log management and analysis, as well as infrastructure monitoring (about 90 and 80 percent, respectively). About 1 in 3 use application performance monitoring tools, and 1 in 4 use distributed tracing tools. In other words, people use best-of-breed tools, which is another reason why open source makes sense in observability, too.
Open source observability tools are in the mix for about 80 percent of respondents, most of whom rely on open source for the bulk of their operations. The reasons they give are typical of open source users: ease of integration, community, avoiding vendor lock-in, and lower cost of ownership.
Distributed tracing as a service with Jaeger
When it comes to integration of disparate tools for observability, Kowall highlighted OpenTelemetry. The idea behind OpenTelemetry, a CNCF (Cloud Native Computing Foundation) project which is second to Kubernetes in popularity, is to create a standards-based way to collect data and send it to various tools.
Browsing through the OpenTelemetry project, one can find support from vendors such as Microsoft Azure and Google, in addition to a who’s who of vendors active in the observability space. OpenTelemetry has good traction and is closing in on a first release, which will likely happen in 2020, said Kowall:
“This shows that the industry, or at least the leaders in the industry, are moving to this open way of sharing information. And the proprietary nature of tools from before is becoming less accepted by the general community”.
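OpenTelemetry's standards-based approach is easiest to see in context propagation. Spans emitted by different services and tools can only be stitched into one trace if every hop passes along a trace identifier in an agreed format, and the default propagator in most OpenTelemetry SDKs is the W3C Trace Context `traceparent` header. Here is a minimal, stdlib-only sketch of generating, parsing and forwarding that header. It assumes only version `00` of the format, ignores the companion `tracestate` header, and uses hypothetical function names that are not part of any OpenTelemetry API.

```python
import re
import secrets
from typing import Optional

# traceparent = version "-" trace-id "-" parent-id "-" trace-flags (lowercase hex only)
_TRACEPARENT = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def _random_hex(nbytes: int) -> str:
    """Random lowercase hex id; the spec treats an all-zero id as invalid."""
    while True:
        value = secrets.token_hex(nbytes)
        if value.strip("0"):
            return value


def new_traceparent(sampled: bool = True) -> str:
    """Start a new trace at the edge of the system (version 00 only)."""
    flags = "01" if sampled else "00"
    return f"00-{_random_hex(16)}-{_random_hex(8)}-{flags}"


def parse_traceparent(header: str) -> Optional[dict]:
    """Return the trace context, or None if the header is malformed or invalid."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # This sketch accepts only version 00 and rejects all-zero ids.
    if version != "00" or not trace_id.strip("0") or not parent_id.strip("0"):
        return None
    return {"trace_id": trace_id, "parent_id": parent_id, "sampled": (int(flags, 16) & 1) == 1}


def child_traceparent(incoming: str) -> str:
    """Header to forward downstream: same trace id, fresh id for this hop's span."""
    ctx = parse_traceparent(incoming)
    if ctx is None:  # no usable context: start a new trace instead
        return new_traceparent()
    flags = "01" if ctx["sampled"] else "00"
    return f"00-{ctx['trace_id']}-{_random_hex(8)}-{flags}"
```

A service receiving `00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01` would forward a header with the same trace id and a new parent id. Because the trace id is shared, a tracing backend can group the spans from every hop under a single trace.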
Coming back to the maturity curve in observability, Kowall noted that tracing is less popular because it’s harder to implement. But that will change over time thanks to open standards and open source projects, he went on to add. Cloud native is so important for observability because many cloud native stack technologies integrate with tracing out of the box.
When Kubernetes is deployed, for example, some type of service mesh or proxy system is also likely to be deployed. Many of those systems are built on an open source tool called Envoy, which integrates with tracing systems. As these cloud native tools continue to get deployed, it becomes easier to collect that data.
Today, however, implementing tracing is not very easy. In some languages, like Java, there are very easy ways to do that in an automated manner that require no code changes, but that’s not the case everywhere.
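In languages without automatic instrumentation, tracing means adding code around each operation you want to see. The following sketch is a hypothetical, stdlib-only miniature tracer: `Span`, `start_span` and `FINISHED` are invented for illustration and are not the OpenTelemetry API. It shows what manual instrumentation amounts to. Each unit of work is wrapped in a span that records its timing and links to its parent through a shared trace id.

```python
import contextvars
import secrets
import time
from contextlib import contextmanager
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    name: str
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    start: float
    end: float = 0.0


FINISHED: List[Span] = []  # stand-in for an exporter that would ship spans to a backend
_current_span = contextvars.ContextVar("current_span", default=None)


@contextmanager
def start_span(name: str):
    """Open a span as a child of the current one, or as the root of a new trace."""
    parent = _current_span.get()
    span = Span(
        name=name,
        trace_id=parent.trace_id if parent else secrets.token_hex(16),
        span_id=secrets.token_hex(8),
        parent_id=parent.span_id if parent else None,
        start=time.monotonic(),
    )
    token = _current_span.set(span)
    try:
        yield span
    finally:
        span.end = time.monotonic()
        _current_span.reset(token)  # restore the parent as the current span
        FINISHED.append(span)


# These are the "code changes": every operation worth seeing is wrapped explicitly.
def load_user(user_id: int) -> dict:
    with start_span("db.load_user"):
        return {"id": user_id}


def handle_request(user_id: int) -> dict:
    with start_span("GET /users/{id}"):
        return load_user(user_id)
```

Calling `handle_request(42)` leaves two spans in `FINISHED`. The database span comes first because it closes first. Both spans share one trace id, and the database span's `parent_id` points at the request span. Automatic instrumentation, as available for Java, injects the equivalent of these wrappers without touching application code.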
What logz.io is announcing today, to capitalize on this potential, is the general availability of Jaeger as a service. Jaeger and Zipkin are the most popular open source tools for tracing. logz.io has been working on Jaeger, hardening its code with bug fixes and new features, and adding machine learning on top of the data it collects to make it easier to troubleshoot and isolate problems.
Jaeger was started at Uber, which donated it to the CNCF, and Red Hat is another big contributor. Kowall said logz.io has become very involved in the project over the last few months, and that the company also plans to improve the user interface and usability, with releases scheduled for early 2021.
Kowall also mentioned a new dependency mapping view that logz.io has contributed back to Jaeger, as well as an integration with Kibana that allows users to move from traces to logs. The latter may also be contributed to open source Jaeger at some point.
Last but not least, logz.io is also announcing a Prometheus as a service offering in private beta. It integrates natively with Prometheus and lets people use Prometheus in a standard way, integrated with the ELK stack and with Grafana.
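A service dependency map of this kind is derived from trace data. One simple way to build it is to count every parent/child link that crosses a service boundary. This sketch assumes each span carries its own id, its parent's id, and the name of the service that emitted it. `SpanRecord` and `dependency_edges` are hypothetical names, and this is not a description of how Jaeger or logz.io compute their view.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple, Optional, Tuple


class SpanRecord(NamedTuple):
    span_id: str
    parent_id: Optional[str]
    service: str


def dependency_edges(spans: Iterable[SpanRecord]) -> Dict[Tuple[str, str], int]:
    """Count calls between services, keyed by (caller service, callee service)."""
    spans = list(spans)
    service_of = {s.span_id: s.service for s in spans}
    edges: Counter = Counter()
    for s in spans:
        # Spans whose parent is missing from the data set are simply skipped.
        parent_service = service_of.get(s.parent_id) if s.parent_id else None
        # Only links that cross a service boundary become edges in the map.
        if parent_service is not None and parent_service != s.service:
            edges[(parent_service, s.service)] += 1
    return dict(edges)


trace = [
    SpanRecord("a", None, "frontend"),
    SpanRecord("b", "a", "orders"),
    SpanRecord("c", "b", "orders"),  # internal span, no edge
    SpanRecord("d", "b", "payments"),
    SpanRecord("e", "a", "orders"),
]
print(dependency_edges(trace))
# {('frontend', 'orders'): 2, ('orders', 'payments'): 1}
```

Aggregated over many traces, these edge counts give the arrows and call volumes of a dependency graph. The same span ids are what a trace-to-logs link would use to find related log lines.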
Capitalizing on open source
logz.io’s strategy is typical of companies capitalizing on open source projects: harden them and offer them as a service in the cloud, lowering the bar for adopting and using them. Kowall mentioned the hidden cost of running open source, something we have highlighted as well, and claimed logz.io makes it much more economical for users to run observability frameworks.
This is rather hard to quantify, but it is an interesting offering. In any case, to the best of our knowledge, there is no distributed tracing as a service offering in the market at this point. DevOps Pulse respondents indicated that getting tracing up and running is in their plans for the coming years. So this looks like an investment that could pay off for logz.io.