Retrieval of Performance Metrics from Azure Data Factory and Integration with New Relic
When running pipelines in a production environment, it is important to collect daily performance metrics, such as execution duration and latency, to detect anomalies and fix them promptly. Not long ago, this involved manually retrieving execution information for each activity in a pipeline's workflow and logging it in a spreadsheet or text document for every pipeline monitored daily. Luckily, we no longer face this constraint thanks to Microsoft Azure's Management API, which can be called from simple Java, Python, or C# .NET code to access the variety of resources offered on the Azure cloud, including Azure Data Factory (ADF), which hosts and runs pipelines for all types of environments.
This article explains how to retrieve performance metrics from ADF and integrate them with New Relic to display trends graphically, as shown in the architecture diagram in Figure 1.
The following sections walk through this process step by step:
· Setting up an app registration and creating a client secret on Azure Active Directory
· Authenticating requests to obtain information on Azure resources using the AAD app
· Obtaining pipeline run IDs and activity run metrics from ADF
· Importing metrics to New Relic via Insights API to display trends graphically on a dashboard
Setting up an app registration and creating a client secret on Azure Active Directory
To start reading ADF pipeline performance metrics into our program, a couple of preliminary steps are required to authenticate and authorize requests to Microsoft Login and the Azure Management API (the endpoint that provides access to the majority of your Azure resources on the cloud). To perform the authentication, an Azure Active Directory (AAD) app registration should be created and granted the User.Read API permission on Microsoft Graph. This is the API that reads user access information and allows the app to retrieve data from the user's Azure resources, such as pipeline performance metrics on ADF, since it grants the registered app the same read permissions as the user who created and owns it. Once these steps have been completed, a client secret must be created for the registered app, to be used in the authentication process along with the client ID. Figures 2–4 illustrate this process.
Authenticating requests to obtain information on Azure resources using the AAD app
In the actual programming script, a REST call to Microsoft Login must be made to retrieve a bearer token for authenticating requests to the Azure Management API. This is a POST request that should submit the following details to establish an authenticated connection with Azure: grant type, client ID, client secret, and resource. The grant type specifies the type of authorization workflow being performed under OAuth 2.0, a standard, widely used protocol for authorization; since an AAD app is used to perform the authentication, it should be set to client_credentials. The client ID and client secret can be obtained directly from the registered app's information in Azure AD, as shown in Figure 4 above. Last but not least, the resource we are trying to authenticate requests to, which is the Azure Management API, should be specified using its URL endpoint: https://management.azure.com/. Once the POST request has been submitted, a bearer token will be returned in the response body (formatted as JSON unless otherwise specified). This token will be used to authenticate all subsequent requests to the Azure Management API.
Figures 5 and 6 represent the client credential details on the registered app and the request and response from performing the authentication process to Microsoft Login via Postman, respectively.
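For readers who prefer code over Postman, here is a minimal Python sketch of the same token request; the tenant ID, client ID, and client secret below are placeholders for your own app registration's values:

```python
# Minimal sketch: exchange AAD app credentials for a bearer token.
import requests

TENANT_ID = "<your-tenant-id>"          # Directory (tenant) ID from the app's overview page
CLIENT_ID = "<your-client-id>"          # Application (client) ID
CLIENT_SECRET = "<your-client-secret>"  # created under Certificates & secrets

token_url = f"https://login.microsoftonline.com/{TENANT_ID}/oauth2/token"
payload = {
    "grant_type": "client_credentials",           # authenticating as the AAD app
    "client_id": CLIENT_ID,
    "client_secret": CLIENT_SECRET,
    "resource": "https://management.azure.com/",  # the API we want a token for
}

response = requests.post(token_url, data=payload)
response.raise_for_status()
bearer_token = response.json()["access_token"]    # reused on all subsequent calls
```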
The next steps are performed in two parts:
· Retrieving the run IDs corresponding to each pipeline run in the data factory we’re reading information from
· Retrieving details about the activities performed in each pipeline run and their durations, using the run IDs obtained above.
Figure 7 shows this two-part process graphically for clarity.
Obtaining pipeline run IDs and activity run metrics from ADF
To fulfill part one, a REST call is made to the Azure Management API using the bearer token obtained previously. The endpoint is specific to the subscription ID, resource group name, provider we want to read information from (Microsoft.DataFactory in this case), and the particular factory name, plus a query string parameter, and it returns a list of pipeline run details. Figure 8 below shows how to format this request in Postman. Issuing this POST request yields a response body containing a JSON array (again, unless specified otherwise in the original request). Looping through this body programmatically provides high-level details on each pipeline run, including its total execution duration and its run ID. Figure 9 shows the entire response body obtained for the request referenced in Figure 8.

Using the obtained run IDs, another set of REST calls is made to the Azure Management API, still specifying the subscription ID, resource group name, provider, and factory name, but now also including the pipeline run ID and the query string parameter, to get a list of activity run details for that pipeline. This request provides another JSON array response body, which can be parsed to retrieve the name, start time, and end time of each activity in the chosen pipeline's workflow. Figure 10 shows the response details after making another request using a run ID obtained in the previous step.
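As a rough illustration, the two-part process might look like this in Python, reusing the bearer_token from the authentication sketch above; the subscription, resource group, and factory names are placeholders:

```python
# Sketch of both query calls against the Azure Management API.
import requests
from datetime import datetime, timedelta, timezone

SUBSCRIPTION_ID = "<subscription-id>"
RESOURCE_GROUP = "<resource-group-name>"
FACTORY_NAME = "<data-factory-name>"
API_VERSION = "2018-06-01"  # the ADF REST API version at the time of writing

base_url = (
    f"https://management.azure.com/subscriptions/{SUBSCRIPTION_ID}"
    f"/resourceGroups/{RESOURCE_GROUP}"
    f"/providers/Microsoft.DataFactory/factories/{FACTORY_NAME}"
)
headers = {"Authorization": f"Bearer {bearer_token}"}

# Both query endpoints expect a time window in the request body.
now = datetime.now(timezone.utc)
window = {
    "lastUpdatedAfter": (now - timedelta(days=1)).isoformat(),
    "lastUpdatedBefore": now.isoformat(),
}

# Part one: list pipeline runs, including each run's ID and total duration.
runs = requests.post(
    f"{base_url}/queryPipelineRuns?api-version={API_VERSION}",
    headers=headers, json=window,
).json()["value"]

# Part two: for each run ID, list the activity runs in that pipeline's workflow.
activity_runs = {}
for run in runs:
    activity_runs[run["runId"]] = requests.post(
        f"{base_url}/pipelineruns/{run['runId']}/queryActivityruns"
        f"?api-version={API_VERSION}",
        headers=headers, json=window,
    ).json()["value"]
```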
From here, any spreadsheet or text document provider can be used to represent the activity run information in tabular form for each pipeline.
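For instance, a short continuation of the sketch above could dump the parsed results to a CSV file with Python's standard library (the column names here are purely illustrative):

```python
# Write one row per activity run, grouped under its parent pipeline run.
import csv

with open("pipeline_activity_runs.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["pipelineName", "activityName", "start", "end", "durationInMs"])
    for run in runs:                              # from queryPipelineRuns above
        for act in activity_runs[run["runId"]]:   # from queryActivityruns above
            writer.writerow([run["pipelineName"], act["activityName"],
                             act["activityRunStart"], act["activityRunEnd"],
                             act["durationInMs"]])
```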
Importing metrics to New Relic via Insights API to display trends graphically on a dashboard
While displaying metrics in a file may suffice for reporting and documentation purposes, most of the time we want to visualize the trends in our data to determine whether processes are running as specified and to track anomalies. Many telemetry and observability platforms are emerging to meet this demand, with New Relic One ranked fourth on the list in [1]. New Relic One monitors various types of performance data from web, mobile, and server-based applications. Its comprehensive dashboard system allows trends in the collected telemetry to be represented alongside a listing of logs from the apps producing the data. For a pipeline performance monitoring system like the one detailed in the sections above, representing total execution duration trends for the pipelines on a daily basis helps determine the causes behind potential slowdowns or delays on certain days. This could resolve latency issues caused by running multiple platform-specific jobs at once, or surface more substantial issues like not provisioning enough compute to run the additional jobs required for the total pipeline execution.
In order to import ADF pipeline total execution duration data into New Relic One and represent it visually on a dashboard, each pipeline run must be logged as an event to New Relic Insights. This is done by issuing yet another REST call, to the New Relic Insights API, with a gzip-compressed byte array of the event data in JSON format. Figure 11 shows the format in which this example data is submitted to the Insights API. Making this POST call creates an event in the Insights Data Explorer, which can then be added to a dashboard generated under New Relic's Dashboards options. Figures 12 and 13 show how to execute the request in Postman, with the headers required to authenticate to the New Relic Insights API and the request body to upload. Figure 14 shows the response body received from submitting the above request. Charting any event, simple or complex, involves executing an NRQL query, New Relic's version of SQL, to select the event data, format it as desired, and display it over a particular time frame. Figures 15 and 16 show pipeline performance monitoring data logged to a New Relic dashboard, displaying the trends over the week (limited to seven days for clarity and ease of spotting anomalies).
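A minimal Python sketch of this event upload is shown below; the account ID, insert key, and the AdfPipelineRun event type are placeholder values chosen for illustration, not prescribed names:

```python
# Sketch: log one pipeline run as a custom event via the Insights insert API.
import gzip
import json
import requests

ACCOUNT_ID = "<new-relic-account-id>"
INSERT_KEY = "<insights-insert-key>"

events = [{
    "eventType": "AdfPipelineRun",       # hypothetical custom event type
    "pipelineName": "DailyLoadPipeline", # example values from the ADF calls above
    "durationInMs": 532000,
    "runId": "<pipeline-run-id>",
}]

payload = gzip.compress(json.dumps(events).encode("utf-8"))
response = requests.post(
    f"https://insights-collector.newrelic.com/v1/accounts/{ACCOUNT_ID}/events",
    headers={
        "Content-Type": "application/json",
        "Content-Encoding": "gzip",      # tells Insights the body is compressed
        "X-Insert-Key": INSERT_KEY,
    },
    data=payload,
)
print(response.status_code, response.text)  # {"success": true} on acceptance
```

On the dashboard side, an NRQL query along the lines of SELECT average(durationInMs) FROM AdfPipelineRun FACET pipelineName TIMESERIES SINCE 1 week ago (again using the hypothetical event type above) would chart the daily duration trend per pipeline.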
On a final note…
Being able to work flexibly with large volumes of data is a basic requirement in any digitally enabled enterprise. Microsoft Azure offers a variety of tools and APIs to facilitate this, the Azure Management API detailed here being one of them. Integrating data with dashboards to represent trends has long been standard practice for better visual clarity and information conveyance. Thanks to providers such as New Relic, Datadog, Azure Monitor, and others, this practice is being revolutionized with richer features for representing and analyzing telemetry. This article provides a strategy for using these tools to gather pipeline run metrics, a common use case in enterprises running workflows in various environments, not just production.
Sources:
1. https://sematext.com/blog/cloud-monitoring-tools/#toc-4-new-relic-5