IBM Cloud Pak for Data - Multicloud Data Integration and Data Governance:
IBM Cloud Pak for Data is a cloud-native solution that enables you to put your data to work quickly and efficiently. This post walks through the features listed below and shares the practical experience I gained while working with them, step by step:
Multicloud Data Integration with DataStage:
IBM DataStage on IBM Cloud Pak for Data is a modernized data integration solution to collect and deliver trusted data anywhere, at any scale and complexity, on and across multi-cloud and hybrid cloud environments.
Cloud Pak for Data, the cloud-native insight platform built on the Red Hat OpenShift container orchestration platform, integrates the tools needed to collect, organize, and analyze data within a data fabric architecture. A data fabric is an architecture that facilitates the end-to-end integration of various data pipelines and cloud environments through intelligent and automated systems.
A data fabric dynamically and intelligently orchestrates data across a distributed landscape to create a network of instantly available information for data consumers. IBM Cloud Pak for Data can be deployed on-premises, as a service on the IBM Cloud, or on any vendor’s cloud.
Source of the DataStage diagram above: IBM Documentation
Prerequisites: A DataStage instance must be provisioned to perform the required tasks.
Below are the tasks performed on DataStage:
Prerequisites:
Below are the tasks performed on DataStage for Multicloud Data Integration:
DataStage AVI (Address Verification Interface):
IBM’s QualityStage Address Verification Interface (AVI) provides comprehensive address parsing, standardization, validation, geocoding, and reverse geocoding, available in selected packages against reference files for over 245 countries and territories.
AVI’s focus is to help solve challenges with location data across the enterprise, specifically address, geocode, and reverse geocode data attributes. Data quality and master data management (MDM) have never been more critical as a foundation for any digital-minded business intent on cost and operational efficiency.
Quality addresses matter because they help avoid negative customer experiences, support fraud prevention, reduce the cost of undelivered and returned mail, and maintain key customer demographic data attributes.
Source of the above diagram: IBM Documentation
Prerequisites:
Below are the tasks performed on the DataStage AVI feature:
Watson Knowledge Catalog:
IBM Watson Knowledge Catalog (WKC) on Cloud Pak for Data powers intelligent, self-service discovery of data, models, and more, activating them for artificial intelligence, machine learning, and deep learning. With WKC, users can access, curate, and share data, knowledge assets, and their relationships wherever they reside.
The WKC features below were exercised and tested.
Prerequisites:
Below are the Tasks performed on Watson Knowledge Catalog:
Watson Knowledge Catalog – Data Privacy:
Here I have learned:
Prerequisites:
Below are the Tasks performed on Watson Knowledge Catalog:
Conclusion: IBM Cloud Pak for Data is a robust cloud data, analytics, and AI platform that provides a cost-effective, powerful multicloud data integration and data governance solution.
IBM Cloud Pak for Data - Data Science MLOPS:
I have been learning, exploring, and working with IBM MLOPS for Data Science, and I want to share what I have learned about IBM’s cloud services and how they are integrated under one umbrella named IBM Cloud Pak for Data.
First, let’s understand what IBM Cloud Pak for Data is.
IBM Cloud Pak for Data is a cloud-native solution that enables you to put your data to work quickly and efficiently.
Your enterprise has lots of data. You need to use your data to generate meaningful insights that can help you avoid problems and reach your goals.
But your data is useless if you can’t trust it or access it. Cloud Pak for Data lets you do both by enabling you to connect to your data, govern it, find it, and use it for analysis. Cloud Pak for Data also enables all of your data users to collaborate from a single, unified interface that supports many services that are designed to work together.
Cloud Pak for Data fosters productivity by enabling users to find existing data or to request access to data. With modern tools that facilitate analytics and remove barriers to collaboration, users can spend less time finding data and more time using it effectively.
And with Cloud Pak for Data, your IT department doesn’t need to deploy multiple applications on disparate systems and then try to figure out how to get them to connect.
Data Science MLOPS POC:
Before you begin an IBM Data Science MLOPS POC, a few prerequisites need to be in place:
The Data Science MLOPS POC focused on the main capabilities and strengths of Watson Studio and related products. The three main themes of the POC were:
IBM MLOPS Flow
Source of Diagram: IBM Documentation
I learned about the MLOPS phases and how to approach a Data Science POC:
To build a Data Science POC, we perform the following activities/tasks:
Data Access: This covers Discovery, Ingestion, and Preparation
Watson Studio - Open Source and Git with cpdctl for an automated CI/CD deployment process: This covers Development and Deployment
a. In CPD, created a new Deployment Space -> Online -> selected the Customer Data Predict notebook -> executed it and saved the model using the WML object.
b. From the Projects - Assets view -> located the model -> promoted the model by selecting the Deployment Space.
c. From the Deployment Space -> selected the model and deployed it by clicking the Deploy button.
d. In a similar way, created a Deployment Space for Batch -> created a job -> selected the customer data CSV as the source and executed it.
e. This is how the automatic deployment of the model to a deployment space is done.
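Once the model from step c is deployed online, it is called with a JSON scoring payload. The sketch below shows one way to build that payload from customer CSV rows, assuming the `input_data` / `fields` / `values` layout that Watson Machine Learning scoring requests use. The helper name `build_scoring_payload`, the column names, and the sample rows are hypothetical and only there for illustration; the actual Customer Data Predict model may expect different features.

```python
import csv
import io
import json


def _coerce(value):
    """Convert numeric-looking CSV strings to int or float; leave the rest as text."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    return value


def build_scoring_payload(csv_text, feature_columns):
    """Build a scoring payload of the form {"input_data": [{"fields": ..., "values": ...}]}.

    Only the columns named in feature_columns are sent, in that order.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    values = [[_coerce(row[col]) for col in feature_columns] for row in reader]
    return {"input_data": [{"fields": list(feature_columns), "values": values}]}


# Hypothetical customer rows; these column names are assumptions, not from the POC.
sample_csv = "customer_id,age,plan\n101,34,basic\n102,51,premium\n"
payload = build_scoring_payload(sample_csv, ["age", "plan"])
print(json.dumps(payload, indent=2))
```

The same dictionary can then be posted to the deployment's scoring endpoint or handed to the WML client; that call is left out here because it needs credentials and a live deployment.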
Monitoring and Governance – IBM Watson OpenScale is used for Monitoring the model in terms of Fairness, Quality, Drift, and other details.
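To make the fairness part of monitoring more concrete, here is a small sketch of a disparate impact ratio, which is the kind of measure a fairness monitor reports: the rate of favorable outcomes for a monitored group divided by the rate for a reference group. This is only the arithmetic, not the OpenScale API, and the group data, the `approved` label, and the function names are made up for illustration.

```python
def favorable_rate(predictions, favorable_label):
    """Fraction of predictions in a group that equal the favorable label."""
    if not predictions:
        raise ValueError("group has no predictions")
    favorable = sum(1 for p in predictions if p == favorable_label)
    return favorable / len(predictions)


def disparate_impact(monitored, reference, favorable_label):
    """Ratio of the monitored group's favorable rate to the reference group's."""
    reference_rate = favorable_rate(reference, favorable_label)
    if reference_rate == 0:
        raise ValueError("reference group has no favorable outcomes")
    return favorable_rate(monitored, favorable_label) / reference_rate


# Hypothetical model outputs for two groups of customers.
monitored_group = ["approved", "denied", "denied", "approved", "denied"]
reference_group = ["approved", "approved", "denied", "approved", "approved"]

score = disparate_impact(monitored_group, reference_group, "approved")
print(f"disparate impact: {score:.2f}")
```

A ratio close to 1 means both groups receive favorable outcomes at a similar rate; the further it drops below 1, the more the monitored group is disadvantaged. The alert threshold is something you choose when configuring the monitor.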
Conclusion: IBM Cloud Pak for Data is a powerful Cloud Data, Analytics, and AI platform solution that provides end-users quick governed data access, increased productivity, and cost savings.
Note: Some of the diagrams and details are taken from IBM (ibm.com/docs and other reference materials).
If you are interested in exploring and learning IBM Cloud Pak for Data and its services, please go through the tutorial below:
Announcing hands-on tutorials for the IBM data fabric use cases