Project Summary

Designing the curation experience for an AI-driven data lineage tool

Data Lineage is a critical module in Informatica's cloud data governance and privacy product. This project aimed to investigate and imagine how AI can be leveraged to predict data connections within the data lineage, particularly in cases where the data itself is not readily available.
My Role
  • Stakeholder interviews & discussions
  • Documenting the ask, business goals, and user goals
  • Storyboarding use case & scenario
  • Concept development
  • Low- and high-fidelity wireframes
  • Prototyping and client walkthroughs
Position

UX AI Intern

Company

Informatica

Duration

12 Weeks (Jun - Aug 2022)

Mentors

Ranjeet Tayi, Jill Blue Lin

What is data lineage?

Data lineage is the process of understanding, recording, and visualizing data as it flows from data sources to consumption. It shows where the data originated, how it has changed, and its ultimate destination within the data pipeline.

What is it used for?

Apart from helping organizations keep a clear record of data’s movements and transformations, it is also used for:

Assurance of Data Integrity

To make sure the data elements in your report are trustworthy.

Data Governance

To check data stores for personal user information and ensure data privacy law compliance.

Impact Analysis

To analyze the impact of changes you make upstream or downstream in your data system.
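
As a rough illustration of impact analysis, lineage can be modeled as a directed graph and walked downstream from the asset that changed. The sketch below is only an assumption about how such a graph might look: the asset names, the `LINEAGE` dictionary, and the `downstream_impact` helper are hypothetical and are not taken from Informatica's product.

```python
from collections import deque

# Hypothetical lineage graph: each key is a data asset, each value lists
# the assets that consume it directly (edges point downstream).
LINEAGE = {
    "crm.customers": ["staging.customers"],
    "staging.customers": ["warehouse.dim_customer"],
    "warehouse.dim_customer": ["reports.revenue"],
    "erp.invoices": ["reports.revenue"],
    "reports.revenue": [],
}


def downstream_impact(graph, changed_asset):
    """Return every asset reachable downstream of changed_asset, in BFS order."""
    seen = set()
    order = []
    queue = deque(graph.get(changed_asset, []))
    while queue:
        asset = queue.popleft()
        if asset in seen:
            continue
        seen.add(asset)
        order.append(asset)
        queue.extend(graph.get(asset, []))
    return order


print(downstream_impact(LINEAGE, "crm.customers"))
```

Reversing the edge direction would give the upstream view, which answers where a report's data originated.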

PROBLEM

Unparseable code creates gaps in the data flow diagram

The automated data lineage process creates connection assignments by parsing code logic. When the code logic or a parser is unavailable, the result is a gap in the data flow diagram, especially in legacy systems without custom data lineage scanners.

User roles affected by the problem:

Data Catalog Admin
Responsible for keeping data source documentation up to date within the organization.
⬇️
Missing Lineage Requests Lead to Data Silos
Data Steward
Plans data management strategies and standards for optimal data usage.
⬇️
Manual Lineage Curation is Cumbersome
Business Analyst
Creates and presents analysis reports based on which critical business decisions are made.
⬇️
Unable to Verify Credibility of Report
How might we facilitate data lineage documentation in these cases through the power of machine learning?
SOLUTION
Use Case 1: Create Job for Curation
As a Data Catalog Admin, I want to set up a collection of sources and targets for the CLAIRE machine learning engine to efficiently curate the inferred data lineage.
FEATURE DETAIL

Streamlined Project Creation and Assignment

Reduce the hassle of handling documentation and assignment in third-party apps by creating and assigning projects in-app.

Use Case 2: Accurate Lineage Curation
As a Data Steward, I want to curate an accurate inferred lineage map for business analysts to make data decisions for the organization.
FEATURE DETAIL

Job Summary for Planning

A clear job summary with recommendations and their breakdown makes project planning and management easier.

FEATURE DETAIL

Granularity for precise decision making

Drill down from dataset-level to column-level recommendations to make decisions at different lineage levels.

FEATURE DETAIL

ML Recommendations to aid decision making

The recommendation card provides prediction parameters and a confidence score to supplement decision making and build trust with the user. Acting on a recommendation refines the ML model.
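
To make the idea of a recommendation card concrete, here is a minimal sketch of the data it might carry and of how a steward's decision could be turned into feedback for the model. Everything in it is an assumption for illustration: the `Recommendation` fields, the `review` helper, the example column names, and the shape of the feedback record are hypothetical and do not describe CLAIRE's actual interface.

```python
from dataclasses import dataclass, field


@dataclass
class Recommendation:
    """One inferred link between a source and a target column (hypothetical shape)."""
    source: str
    target: str
    confidence: float  # model score between 0 and 1
    parameters: dict = field(default_factory=dict)  # signals behind the prediction
    decision: str = "pending"  # "pending", "accepted", or "rejected"


def review(rec, accept):
    """Record the steward's decision and return a labeled example for retraining."""
    rec.decision = "accepted" if accept else "rejected"
    return {"source": rec.source, "target": rec.target, "label": accept}


rec = Recommendation(
    source="partner_feed.patient_id",
    target="warehouse.patient.id",
    confidence=0.87,
    parameters={"name_similarity": 0.92, "type_match": True},
)
feedback = review(rec, accept=True)
print(rec.decision, feedback)
```

Keeping the prediction parameters next to the score is what lets the card explain why a link was suggested, rather than showing a bare number.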

FEATURE DETAIL

Toggle Lineage views

Toggling the lineage view from map to list enables quick search and comparison of recommendations for accurate decision making.

FEATURE DETAIL

Collaborate through the Comments Panel

The comments panel provides a space to discuss and consult with other data stewards about a recommendation decision within context.

Use Case 3: Data Lineage Transparency
As a Business Analyst, I want a transparent view of which parts of the data lineage are inferred versus derived so that I can make robust business decisions.
FEATURE DETAIL

Toggle between Inferred and Derived Lineage

The inferred lineage toggle indicates that part of the lineage is ML-based inferred lineage curated by the data steward.
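
One way to picture the toggle is as a filter over lineage edges that each carry an origin label. The sketch below assumes that switching the toggle off hides inferred edges; that behavior, the `EDGES` list, and the `visible_edges` helper are hypothetical and only illustrate the distinction between derived and inferred lineage.

```python
# Hypothetical edge list: "derived" edges come from parsed code, "inferred"
# edges are ML predictions that a data steward has accepted.
EDGES = [
    {"source": "crm.customers", "target": "staging.customers", "origin": "derived"},
    {"source": "partner.claims", "target": "staging.claims", "origin": "inferred"},
    {"source": "staging.claims", "target": "reports.revenue", "origin": "derived"},
]


def visible_edges(edges, show_inferred):
    """Return the edges to draw; inferred edges are hidden when the toggle is off."""
    if show_inferred:
        return list(edges)
    return [edge for edge in edges if edge["origin"] == "derived"]


for edge in visible_edges(EDGES, show_inferred=False):
    print(edge["source"], "->", edge["target"])
```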

DESIGN PROCESS

4D Design Process

This project used the 4D Design Process, a diverging and converging approach consisting of the following phases:

Define: Goal
Discover: Need
Design: Solution
Validate and Iterate

AI Engine - CLAIRE as Part of Cloud Offerings

At the start of the project, I met with stakeholders to understand the business goals of the project. The following business goals allowed me to define my own design goals:

Stakeholder Interviews for Understanding Data Systems

Due to time and budget constraints, direct user engagement was not possible during the concept development phase. Instead, I relied on the project manager's and fellow designers' expertise to understand the user. I conducted three stakeholder interviews with the following research goals:

I needed to learn -

Plotting the Curation Experience

Scenario: NYC Health+ hospitals use Informatica's data lineage tool to manage patient data. They also receive data from external partners, but because of data sharing restrictions they do not have the transformation logic or path for that data. They need a way to view their end-to-end data lineage to verify the information in their revenue report.

Key Insights
Lineage Curation

Machine learning approach is hard to debug

⬇️

Share prediction details with users to build trust

Lineage Transparency

Data Lineage is used for critical business decisions

⬇️

Distinguish derived vs. inferred to show information reliability

Collaboration

Data systems too vast for one person to know entirely

⬇️

Inferred lineage needs to be a collaborative tool

User Roles in Inferred Lineage Curation

Findings from stakeholder interviews and secondary research were distilled into three role-based user personas: Data Catalog Administrator, Data Steward, and Business Analyst.

Task Flows

Understanding AI-Powered Data Lineage Through Storyboarding

To validate my understanding of the scenario, I created a storyboard that served as a tangible representation, facilitating communication and shaping the project's direction.

Concept Exploration

With the scenario and use cases in place, I then explored ideas for live collaboration, recommendation display, CLAIRE representation, and differentiation of derived and inferred lineage.

Final Design Component Specifications

Garnering Customer Feedback

Because time and budget constraints ruled out user interviews at the start of the project, I used customer validation calls to gain more insight into the lineage systems and their users. I spoke with two clients. In the first half of each call I examined their existing data systems and lineage in their live environment, then I shared my ideas and prototype concepts and collected feedback.

To garner feedback on my concepts, I met with data stewards from two clients: Elevance Health (an insurance provider) and Thrivent (a not-for-profit financial organization). They gave the following feedback:

REFLECTION
Learnings
🌃
A Mock-up's Worth a Thousand Words
Low-fidelity concept mock-ups were the best way to explain initial ideas, design requirements, feasibility, and timelines to cross-functional stakeholders.
📝
Define the Scenario
Use case scenarios need to be as detailed as possible to design robust workflows, because every industry can have different data management and governance practices.
🤖
Human Discretion for AI Probability
Because AI is based on probabilities instead of definitive answers, human discretion becomes crucial for training and refining the AI system, which makes traditional approaches like bulk actions less applicable.

Thanks for stopping by!

Find something you like? Contact me at awaneemjoshi@gmail.com
