Jan 18, 2022
CDQ Custom Dashboard Utilizing IICS REST API
Vivek Singh, Solutions Architect
© Informatica. Proprietary and Confidential.
Housekeeping Tips
• Today’s webinar is scheduled for 1 hour
• The session will include a webcast, and your questions will be answered live at the end of the presentation
• All dial-in participants will be muted to enable the speakers to present without interruption
• Questions can be submitted to "All Panelists" via the Q&A option, and we will respond at the end of the presentation
• The webinar is being recorded and will be available on our INFASupport YouTube channel and Success Portal, where you can download the slide deck for the presentation. The link to the recording will be emailed as well.
• Please take time to complete the post-webinar survey and provide your feedback and suggestions for upcoming topics.
Feature Rich Success Portal
• Product Learning Paths and Weekly Expert Sessions
• Bootstrap trial and POC Customers
• Informatica Concierge
• Enriched Customer Onboarding experience
• Tailored training and content recommendations
More Information
Success Portal: https://success.informatica.com
Communities & Support: https://network.informatica.com
Documentation: https://docs.informatica.com
University: https://www.informatica.com/in/services-and-training/informatica-university.html
Safe Harbor
The information being provided today is for informational purposes only. The
development, release, and timing of any Informatica product or functionality
described today remain at the sole discretion of Informatica and should not be
relied upon in making a purchasing decision.
Statements made today are based on currently available information, which is
subject to change. Such statements should not be relied upon as a
representation, warranty or commitment to deliver specific products or
functionality in the future.
CDQ Custom Dashboard Utilizing IICS REST API
Vivek Singh, Solutions Architect
Sachin Jain, Principal Customer Success Technologist
Agenda
• Overview
• Requirement
• Design
• Solution Scope and Technical Details
• Demo
• Q&A
Overview
The Cloud Data Quality (CDQ) Reporting Dashboard Template provides a framework to capture reporting metrics for data quality issues by extracting profile details from the CDQ profile warehouse. It also demonstrates how to visualize the profiling data in a business intelligence tool.
The template includes a Python script and a sample Power BI report with dashboards illustrating DQ metrics. It also includes the sample data files on which the provided Power BI report is built, so you can view the reports without having the system up and running and generating metrics.
The sample report is built using Power BI, but any reporting tool can be used to design a similar DQ dashboard.
Requirement
Managing the quality of data within an enterprise requires precise feedback on the data quality of every data movement. An automated data quality report and dashboard solution can take activities that are already in place (DQ validation rules, data profiling) and combine them to generate periodic data quality reports and dashboards, giving immediate metric feedback on the current and trending state of data quality.
CDQ provides an out-of-the-box profiling solution to identify data anomalies, apply validation rules, and present the results visually. These visuals are dataset-specific, and business users might want a holistic, summarized view of the DQ anomalies identified across the organization or department.
Building a custom DQ reporting dashboard from the CDQ profiling data helps users achieve this objective. A summarized dashboard lets users monitor enterprise-wide data quality issues from a single window, resulting in improved DQ monitoring and feedback processes.
Design
Process Flow (diagram): Python Script -> request -> IICS (Profiling API, FRS API) -> API response -> Python Script -> CSV files -> Dashboard
Profiling API
- Profile Definition & Details
- Column Profiling Result
- Top N Value Frequency
- Profile Rule Details
- Run Stats
FRS API
- Profile & Rule Additional Metadata
Solution Scope
Prerequisites:
DQ rules are already created in CDQ and attached to the profiles
Profiles have been executed successfully in CDP
A user account that can call the REST APIs and has access to the CDQ/CDP assets
Python installed with the following libraries: requests and pandas (third-party), plus the standard-library modules json, csv, os, sys, datetime, shutil, and glob
Covered in this solution framework:
Python script to call the APIs and capture the responses in CSV files
Use of the generated CSV files to build a reporting dashboard
Automating CSV file generation:
Users can automate the Python script execution using a cron job (on Linux), Task Scheduler (on Windows), or even CDI.
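Before scheduling the script, it can help to confirm that the prerequisite libraries are importable on the machine that will run it. The check below is a minimal sketch, not part of the downloadable package; it only asks the current interpreter whether each module named in the prerequisites can be found.

```python
import importlib.util

# Modules named in the prerequisites. Only requests and pandas are
# third-party; the rest ship with the Python standard library.
REQUIRED_MODULES = ["requests", "json", "pandas", "csv", "os", "sys",
                    "datetime", "shutil", "glob"]


def missing_modules(names):
    """Return the module names the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Running `print(missing_modules(REQUIRED_MODULES))` with the same interpreter the scheduler will use should print an empty list; any names it shows (typically requests or pandas) need to be installed, for example with pip.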
Profiling REST API
The following IICS REST APIs are used:
Login API
Logs in to IICS and returns a session ID. The session ID expires after 30 minutes of inactivity and is passed as a header parameter in all subsequent REST API calls.
https://dm-{POD_region}.informaticacloud.com/ma/api/v2/user/login
Profiling API
Get the list of profiles
https://{POD}-dqprofile.dm-{POD_region}.informaticacloud.com/profiling-service/api/v1/profile
Get details of each profile, such as profilable fields, tagged rules, sampling option, filter-enabled flag, created by, created date, etc.
https://{POD}-dqprofile.dm-{POD_region}.informaticacloud.com/profiling-service/api/v1/profile/{id}
Get column profiling details by column ID (the column profile result as seen on the Profile Result page in IICS Data Profiling)
https://{POD}-dqprofile.dm-{POD_region}.informaticacloud.com/metric-store/api/v1/odata/Profiles('{profileId}')/Columns('{columnId}')
Get the top N value frequencies by column ID
https://{POD}-dqprofile.dm-{POD_region}.informaticacloud.com/metric-store/api/v1/odata/Profiles('{profileId}')/Columns('{columnId}')/ValueFrequencies
List all profile runs for a profile ID, with details such as job execution status, start/end time, time taken, memory consumed, etc.
https://{POD}-dqprofile.dm-{POD_region}.informaticacloud.com/profiling-service/api/v1/runDetail?profileId={profileId}
FRS API
Get additional metadata details, such as profile and rule project, folder name, rule dimension name, etc.
https://{POD}.dm-{POD_region}.informaticacloud.com/frs/v1/Documents('{in_frs_id}')?$expand=userInfo
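These calls can be sketched with the Python standard library. This is an illustrative sketch, not the packaged script: the prerequisites list requests, but urllib.request is used here so the example runs without third-party packages. The URL templates are copied from this slide, and the login body and the icSessionId response field follow the IICS v2 login API. The header name used to pass the session ID to the profiling and FRS services (IDS-SESSION-ID below) is an assumption to verify against the Cloud Data Profiling REST API documentation.

```python
import json
import urllib.request

# URL templates copied from the slide; placeholders are filled by str.format.
DQ_BASE = "https://{POD}-dqprofile.dm-{POD_region}.informaticacloud.com"
ENDPOINTS = {
    "login": "https://dm-{POD_region}.informaticacloud.com/ma/api/v2/user/login",
    "profiles": DQ_BASE + "/profiling-service/api/v1/profile",
    "profile_detail": DQ_BASE + "/profiling-service/api/v1/profile/{id}",
    "column_result": DQ_BASE + "/metric-store/api/v1/odata/"
                     "Profiles('{profileId}')/Columns('{columnId}')",
    "value_frequencies": DQ_BASE + "/metric-store/api/v1/odata/"
                         "Profiles('{profileId}')/Columns('{columnId}')/ValueFrequencies",
    "run_detail": DQ_BASE + "/profiling-service/api/v1/runDetail?profileId={profileId}",
    "frs_document": "https://{POD}.dm-{POD_region}.informaticacloud.com"
                    "/frs/v1/Documents('{in_frs_id}')?$expand=userInfo",
}

JSON_HEADERS = {"Content-Type": "application/json", "Accept": "application/json"}
# ASSUMPTION: header name the profiling and FRS services read the session ID from.
SESSION_HEADER = "IDS-SESSION-ID"


def endpoint(name, **params):
    """Fill one URL template; raises KeyError if a placeholder value is missing."""
    return ENDPOINTS[name].format(**params)


def session_headers(session_id):
    """Headers for every call made after login."""
    return {**JSON_HEADERS, SESSION_HEADER: session_id}


def _call(url, headers, body=None, timeout=60):
    """POST body as JSON when given, otherwise GET; return the decoded JSON reply."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    method = "GET" if body is None else "POST"
    request = urllib.request.Request(url, data=data, headers=headers, method=method)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def login(pod_region, username, password):
    """Log in through the v2 login API and return the icSessionId value."""
    body = {"@type": "login", "username": username, "password": password}
    reply = _call(endpoint("login", POD_region=pod_region), JSON_HEADERS, body)
    return reply["icSessionId"]


def get_json(url, session_id):
    """GET one profiling or FRS URL with the session ID attached."""
    return _call(url, session_headers(session_id))
```

For example, `endpoint("run_detail", POD="pod1", POD_region="us", profileId="42")` fills the runDetail template, where pod1 and us stand in for real POD values. Because the session ID expires after 30 minutes of inactivity, a long-running script may need to call `login()` again.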
Python Script
The Python script performs the following steps:
1. If the specified folder does not exist, create it; otherwise, archive any CSV files already present in it
2. Call the CDQ Profiling REST APIs in the sequence below and write the parsed responses to the respective CSV files
3. Delete the temporary files generated during the process
REST API (refer to the previous slide): Output CSV file(s)
- IICS Login API: N/A
- Get list of profiles API: N/A
- Get details of each profile: Profile_Metadata_Details.csv, Profile_Rule_Output_Fields.csv
- Get Column Profiling Details By Column Id: Column_Profiling_Result.csv
- Get Top N value frequencies by column id: Top_Value_Frequency_Data.csv
- Lists all the profile runs for a profile ID: Profile_Execution_Stats.csv
- Get Additional Metadata Details: Profile_Metadata_Details.csv
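The three steps and the API-to-file mapping can be sketched as follows. This is a hedged illustration rather than the packaged script: the archive/<timestamp> sub-folder layout, the temp_* naming for temporary files, and any column names passed to append_rows are hypothetical choices.

```python
import csv
import glob
import os
import shutil
from datetime import datetime

# API-to-file mapping from the table; login and list-profiles write no file.
OUTPUT_FILES = {
    "profile_detail": ["Profile_Metadata_Details.csv", "Profile_Rule_Output_Fields.csv"],
    "column_result": ["Column_Profiling_Result.csv"],
    "value_frequencies": ["Top_Value_Frequency_Data.csv"],
    "run_detail": ["Profile_Execution_Stats.csv"],
    "frs_document": ["Profile_Metadata_Details.csv"],
}


def prepare_output_folder(csv_file_path):
    """Step 1: create the folder, or move existing CSV files into a timestamped
    archive sub-folder. Returns the archive path, or None if nothing was archived."""
    if not os.path.isdir(csv_file_path):
        os.makedirs(csv_file_path)
        return None
    existing = glob.glob(os.path.join(csv_file_path, "*.csv"))
    if not existing:
        return None
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S_%f")
    archive_dir = os.path.join(csv_file_path, "archive", stamp)  # hypothetical layout
    os.makedirs(archive_dir)
    for path in existing:
        shutil.move(path, archive_dir)
    return archive_dir


def append_rows(csv_path, rows, fieldnames):
    """Step 2: append parsed response rows (dicts) to csv_path, writing the header
    only when the file is new; keys not listed in fieldnames are ignored."""
    write_header = not os.path.exists(csv_path)
    with open(csv_path, "a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames, extrasaction="ignore")
        if write_header:
            writer.writeheader()
        writer.writerows(rows)


def remove_temp_files(csv_file_path, pattern="temp_*"):
    """Step 3: delete regular files matching a hypothetical temp_* naming convention."""
    for path in glob.glob(os.path.join(csv_file_path, pattern)):
        if os.path.isfile(path):
            os.remove(path)
```

The sketch appends rather than overwrites because the script loops over many profiles and columns, each contributing rows to the same output file.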
Reporting Dashboard
A sample reporting dashboard has been created on top of the generated CSV files.
1. In this solution, Power BI is used to create the reporting dashboard, but any reporting tool that can read CSV files can be used
2. In Power BI, a data model was created so that filters apply across the whole report
3. The report contains two dashboards:
DQ Dashboard - Shows DQ anomalies such as uniqueness, completeness, validity, and accuracy check results
DQ Asset Analytics - Shows statistics on DQ assets, such as the number of profiles broken down by tagged rules and by sampling/filter settings, execution stats, and rule count distribution
4. Users can also drill down from the summarized graphs to view the underlying data in tabular format
In the data model diagram:
Solid lines represent active relationships
Dotted lines represent inactive relationships
temp_Profile_Metadata_Details_distinct_profiles is a derived dataset referencing Profile_Metadata_Details
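Reporting tools other than Power BI may not offer derived datasets. In that case, a one-row-per-profile table equivalent to temp_Profile_Metadata_Details_distinct_profiles could be produced before loading the data. A minimal sketch with the csv module, assuming a hypothetical profile_id column in Profile_Metadata_Details.csv (check the actual header of the generated file):

```python
import csv


def distinct_profiles(src_path, dst_path, key="profile_id"):
    """Copy the first row for each distinct key value from src_path to dst_path.

    key is a hypothetical column name; returns the number of distinct values written.
    """
    seen = set()
    with open(src_path, newline="", encoding="utf-8") as src, \
            open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if row[key] not in seen:
                seen.add(row[key])
                writer.writerow(row)
    return len(seen)
```

Keeping the first occurrence mirrors a simple "remove duplicates" step; if profile-level attributes can differ between rows, a different rule for picking the surviving row may be needed.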
How can I use this solution?
Once the prerequisites listed on the Solution Scope slide are met, perform the following steps:
1. Download the package (the link is on the Reference slide)
2. Update the following lines in the Python script
POD_region = "<YOUR IICS POD REGION NAME>"
username = "<YOUR IICS USERNAME>"
password = "<YOUR IICS PASSWORD>"
csv_file_path = "<COMPLETE PATH WHERE CSV FILES WILL BE GENERATED>"
Optionally, you can also update the history_rec_count variable. The default value is 10, which means the report can show trends for the last 10 executions.
3. Run the Python script. On successful execution, the following five files are created in the csv_file_path location:
Profile_Metadata_Details.csv, Profile_Rule_Output_Fields.csv, Column_Profiling_Result.csv, Top_Value_Frequency_Data.csv, Profile_Execution_Stats.csv
4. Open the report in Power BI Desktop and update the Power BI datasets to point to the correct file location
5. Refresh the data for each dataset in the report
Once you have tested the solution, you can automate the Python script execution (refer to the next slide) and the data refresh in the report (for example, using a gateway in Power BI).
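After step 3, a quick check can confirm that all five files landed in csv_file_path before the Power BI refresh. The sketch below also shows one way a history_rec_count limit could be applied to run records. The startTime field name is a hypothetical placeholder, and the packaged script may trim history differently.

```python
import os

# The five files a successful run is expected to produce.
EXPECTED_FILES = [
    "Profile_Metadata_Details.csv",
    "Profile_Rule_Output_Fields.csv",
    "Column_Profiling_Result.csv",
    "Top_Value_Frequency_Data.csv",
    "Profile_Execution_Stats.csv",
]


def missing_outputs(csv_file_path):
    """Return the expected CSV files that are absent or empty in csv_file_path."""
    missing = []
    for name in EXPECTED_FILES:
        path = os.path.join(csv_file_path, name)
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            missing.append(name)
    return missing


def keep_latest_runs(runs, history_rec_count=10, time_key="startTime"):
    """Keep the newest history_rec_count run records (newest first).

    time_key is a hypothetical field name; its values are assumed to be
    ISO-8601 strings, which sort chronologically as plain text.
    """
    return sorted(runs, key=lambda run: run[time_key], reverse=True)[:history_rec_count]
```

An empty list from `missing_outputs(csv_file_path)` means all five files exist and are non-empty, so the report refresh can proceed.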
Python Script Execution Automation - Sample
Users can automate the Python script execution to get updated data files at a regular interval. A few sample approaches:
Using a cron job on a Linux machine
Sample crontab entry to run the Python script at the start of every hour:
0 * * * * /usr/bin/python3 /home/user/IICS_CDQ_Dashboard_Dataset.py >> /home/user/python_script_log.log 2>&1
Using Task Scheduler on a Windows machine
Using CDI
Reference
Getting Started with Cloud Data Profiling REST API
- https://docs.informatica.com/data-governance-and-quality-cloud/cloud-data-profiling/h2l/1547-getting-started-with-cloud-data-profiling-rest-api/getting-started-with-cloud-data-profiling-rest-api.html
CDQ/CDP custom reporting dashboard framework artifacts
- https://github.com/vks9907/CDQ_Custom_Dashboard
Demo
Q & A
Thanks