DevOps and CI/CD principles revolutionized software development. Teams that want to deliver high-quality software frequently and reliably rely on these best practices. We use version control, write automated tests, and automatically deliver code from initial development to production to meet the demands of today’s world, where agility and speed have become critical competitive advantages.
These best practices are possible because we control the source code. If we couldn’t work with the code, we couldn’t make incremental changes to respond to the growing expectations and demands of our end users. But when we think of our analytics platforms, we’re not really able to access and manage the code we create when we build analytics with them.
So the obvious question is, if managing the underlying code and taking advantage of software development best practices have taken software development to a new level, why not apply the same techniques to analytics as well?
What is Analytics as Code?
Analytics as Code is the management of analytics using human- and machine-readable configuration files. This means that our analytics solutions (connectors, semantic layer, dashboards, metrics, visualizations, user management, and other analytical objects) are expressed as manageable code. And this code should be treated exactly the same way as any other application source code.
The configuration files that define our analytics must be integrated into our version control systems to track, review, and monitor changes. With CI/CD platforms and testing tools, we can automate the integration and testing stages and deploy analytics to end-users faster, with higher quality and lower error rates.
This allows us to quickly innovate and experiment with new insights and make them available to end-users at an ever-increasing rate. We can minimize the time and effort required to turn requirements into solutions and improve and reuse them throughout the organization.
From a manual process to an easy-to-manage, reusable piece of code
Traditionally, all parts of our analytics are defined using the graphical user interface of the analytics platform. But because of a lack of openness and flexibility, the underlying code that we manually generate by clicking and dragging and dropping cannot be exported and managed outside the solution. This “what happens on the platform stays on the platform” approach has begun to limit analytics creation, management, and deployment. And it’s no longer a scalable solution as we strive to respond to today’s fast-paced world of analytics.
Analytics as Code is based on modern analytics tools that support the import and export of all underlying metadata — in a declarative format — and provide open APIs to automate the ongoing delivery process. When we can export human-readable configuration files from our entire analytics solution, we can use both the platform interface and our favorite IDEs to manage the code and leverage best software practices. As a result, analytics becomes an easy-to-manage, reusable piece of code.
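As a rough sketch of what managing an export in our own tooling could look like, the Python below takes an already-exported declarative document and writes one file per object, so that version control diffs stay small and readable. The layout shape (top-level keys such as `metrics` and `visualizations`, each holding a list of objects with an `id`) and the function name `split_layout` are assumptions made for illustration; they are not the actual export format of any particular platform.

```python
import json
import tempfile
from pathlib import Path


def split_layout(layout: dict, target_dir: Path) -> list[Path]:
    """Write each object of an exported layout to <target_dir>/<kind>/<id>.json."""
    written = []
    for kind, objects in sorted(layout.items()):
        kind_dir = target_dir / kind
        kind_dir.mkdir(parents=True, exist_ok=True)
        for obj in objects:
            path = kind_dir / f"{obj['id']}.json"
            # Sorted keys and fixed indentation keep diffs stable between exports.
            text = json.dumps(obj, indent=2, sort_keys=True) + "\n"
            path.write_text(text, encoding="utf-8")
            written.append(path)
    return written


# Hypothetical export; a real one would come from the platform's open API.
example_layout = {
    "metrics": [{"id": "total_sales", "title": "Total Sales"}],
    "visualizations": [
        {"id": "total_sales_by_year", "type": "column", "metric": "total_sales"}
    ],
}

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for path in split_layout(example_layout, Path(tmp)):
            print(path.relative_to(tmp))
```

Keeping one object per file is a design choice rather than a requirement: it lets a review focus on the single metric or dashboard that changed instead of one large document.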
Examples of Analytics as Code configuration files
Below is a metric configuration file exported from an example analytics platform, GoodData.CN Community Edition, via open APIs. As we can see, the Total Sales configuration is both human- and machine-readable:
Metric config file
In this exported configuration file for the Total Sales by Year visualization, we use the Total Sales metric created above (a reference to the metric in line 17) and slice it by year (line 37). The visualization type, a column chart, is specified in line 51.
Visualization config file
Once we have a visualization, we can create a dashboard around it. In the following dashboard configuration file, we specify the layout and visualization (a reference to the created visualization in line 39).
Dashboard config file
Below we see what the created dashboard looks like on the platform. If we make any changes to the configuration files above — e.g., update the metric, change the type of visualization, or add a new visualization to the dashboard — we can import the files back to the platform, and the solution will update accordingly.
If you are interested, the complete config file of this simple example (data connector, physical data model, logical data model, users and user groups, and all previously displayed objects) can be found here: Configuration file.
Advantages of Analytics as Code
Analytics as Code makes managing and deploying analytics more efficient by dividing analytics into reusable code snippets and utilizing the same principles we use to scale up our other software. Here are some of the benefits that Analytics as Code offers:
Versioning
When we use configuration files, we can version the entire analytics solution and each object in it. Thus, all parts of our analytics are subject to source control, just like any other code.
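Version control already diffs the files line by line, but a small helper can also summarize a change at the object level for reviewers. The following is a minimal sketch, assuming each object is a dictionary with a unique `id`; the function name `diff_objects` and the sample metrics are hypothetical.

```python
def diff_objects(old: list[dict], new: list[dict]) -> dict[str, list[str]]:
    """Compare two versions of a list of objects keyed by their 'id'."""
    old_by_id = {obj["id"]: obj for obj in old}
    new_by_id = {obj["id"]: obj for obj in new}
    shared_ids = old_by_id.keys() & new_by_id.keys()
    return {
        "added": sorted(new_by_id.keys() - old_by_id.keys()),
        "removed": sorted(old_by_id.keys() - new_by_id.keys()),
        "changed": sorted(i for i in shared_ids if old_by_id[i] != new_by_id[i]),
    }


# Hypothetical metric definitions from two commits.
previous_metrics = [
    {"id": "total_sales", "format": "#,##0"},
    {"id": "order_count", "format": "#,##0"},
]
current_metrics = [
    {"id": "total_sales", "format": "$#,##0.00"},
    {"id": "avg_order_value", "format": "$#,##0.00"},
]

if __name__ == "__main__":
    print(diff_objects(previous_metrics, current_metrics))
```

A summary like this could be posted on a pull request so that reviewers see at a glance which metrics were added, removed, or modified.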
CI/CD and Collaboration
Our data engineers and analysts can work simultaneously with different parts of the solution — semantic layer, metrics, dashboards, or anything else — and write automated tests to ensure that the logic we use works as it should. They don’t have to worry about breaking the work of others when they push updated versions into production.
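One kind of automated test that fits this workflow is a reference check: before a change is merged, a CI job can confirm that every visualization points at a metric that exists and every dashboard points at a visualization that exists. The sketch below assumes a simplified layout in which visualizations list metric ids under `metrics` and dashboards list visualization ids under `visualizations`; these field names and the function `find_dangling_references` are illustrative, not a real platform schema.

```python
def find_dangling_references(layout: dict) -> list[str]:
    """Return a message for every reference to an object that is not defined."""
    metric_ids = {m["id"] for m in layout.get("metrics", [])}
    visualization_ids = {v["id"] for v in layout.get("visualizations", [])}
    problems = []

    for vis in layout.get("visualizations", []):
        for metric_id in vis.get("metrics", []):
            if metric_id not in metric_ids:
                problems.append(
                    f"visualization '{vis['id']}' references unknown metric '{metric_id}'"
                )

    for dash in layout.get("dashboards", []):
        for vis_id in dash.get("visualizations", []):
            if vis_id not in visualization_ids:
                problems.append(
                    f"dashboard '{dash['id']}' references unknown visualization '{vis_id}'"
                )

    return problems


# Hypothetical layout in which the dashboard points at a visualization
# that was renamed without updating the dashboard.
broken_layout = {
    "metrics": [{"id": "total_sales"}],
    "visualizations": [{"id": "total_sales_by_year", "metrics": ["total_sales"]}],
    "dashboards": [{"id": "sales_overview", "visualizations": ["sales_by_year"]}],
}

if __name__ == "__main__":
    for problem in find_dangling_references(broken_layout):
        print(problem)
```

In a CI pipeline, a non-empty result would fail the build, so a broken reference is caught in review rather than in production.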
Reusability
We can divide our analytics into modular code components, so our analytical objects become reusable code snippets that can be shared among teams. There is no need to re-create visualizations or metrics for different use cases, as we can reuse existing configuration files.
Consistency
Because the configuration files serve as a single source of truth, Analytics as Code ensures consistency across the organization. It ensures that everything works the way we want it to every time we deploy or update our analytics.
Speed and Quality
We can make incremental changes to the code and quickly deploy updated analytics versions. Working in small increments improves quality as well as speed, because smaller changes are much easier to test. And to complete the process, we can quickly gather feedback on changes and respond to them immediately.
Automation
Declarative configuration files, along with open APIs, allow us to automate tedious manual tasks such as provisioning and deprovisioning tenants and creating dashboards, metrics, and visualizations. They also make it possible to programmatically change the configuration of our analytics solution.
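To make the tenant provisioning idea concrete, here is a minimal sketch that derives one configuration per tenant from a shared template. The template structure, the `workspace` key, and the function `render_tenant_config` are assumptions for illustration; pushing the result to a platform would go through that platform's own API, which is not shown here.

```python
import copy


def render_tenant_config(template: dict, tenant_id: str, tenant_name: str) -> dict:
    """Return a copy of the template customized for a single tenant."""
    config = copy.deepcopy(template)  # never mutate the shared template
    config["workspace"] = {"id": tenant_id, "name": tenant_name}
    for dashboard in config.get("dashboards", []):
        dashboard["title"] = f"{dashboard['title']} ({tenant_name})"
    return config


# Hypothetical shared template and tenant list.
template = {
    "metrics": [{"id": "total_sales", "title": "Total Sales"}],
    "dashboards": [{"id": "sales_overview", "title": "Sales Overview"}],
}
tenants = {"acme": "Acme Corp", "globex": "Globex"}

if __name__ == "__main__":
    for tenant_id, tenant_name in tenants.items():
        config = render_tenant_config(template, tenant_id, tenant_name)
        print(config["workspace"]["id"], config["dashboards"][0]["title"])
```

Because every tenant is rendered from the same template, a fix to a metric in the template reaches all tenants on the next deployment.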
Summary
The concept of Analytics as Code is simple; we should treat our analytics in the same way as any other software. This approach complements the functionalities offered by our analytics platforms and helps us move out of the current situation where we are at the mercy of platforms in terms of how we build and manage our analytics.
It’s time to turn our analytics into an easy-to-manage, reusable piece of code while leveraging software development best practices. By doing so, we can scale our analytics like modern applications and deliver data into people’s hands faster and more reliably, so they can use it for what it is intended for: making better decisions.