Eric Walk, Author at Perficient Blogs
https://blogs.perficient.com/author/ewalk/

Is it really DeepSeek FTW?
https://blogs.perficient.com/2025/01/30/is-it-really-deepseek-ftw/
Thu, 30 Jan 2025

So, DeepSeek just dropped their latest AI models, and while it’s exciting, there are some cautions to consider. Because of the US export controls around advanced hardware, DeepSeek has been operating under a set of unique constraints that have forced them to get creative in their approach. This creativity seems to have yielded real progress in reducing the amount of hardware required for training high-end models in reasonable timeframes and for inferencing off those same models. If reality bears out the claims, this could be a sea change in the monetary and environmental costs of training and hosting LLMs.

In addition to the increased efficiency, DeepSeek’s R1 model is continuing to push forward the innovation curve around reasoning models. Models that follow this emerging chain-of-thought paradigm, explaining their thinking first and then summarizing it into an answer, deliver a step change in response quality. Especially when paired with RAG and a library of tools or actions in an agentic framework, baking this pattern into the models themselves instead of including it in the prompt is a serious innovation. We’re going to see even more open-source model vendors follow OpenAI and DeepSeek in this.

Key Considerations

One of the key factors in considering the adoption of DeepSeek models will be the data residency requirements for your business. For now, self-managed private hosting is the only option for maintaining full US, EU, or UK data residency with these new DeepSeek models (the most common needs for our clients). The same export restrictions limiting the hardware available to DeepSeek have also prevented OpenAI from offering their full services with comprehensive Chinese data residency, which makes DeepSeek a compelling offering for businesses needing an option within China. It remains to be seen whether the hyperscalers or other providers will offer DeepSeek models on their platforms (before I managed to get this published, Microsoft made a move and is offering DeepSeek-R1 in Azure AI Foundry). The good news is that the models are highly efficient, and self-hosting is feasible and not overly expensive for inferencing with these models. The downside is managing provisioned capacity when workloads are uneven, which is why pay-per-token models are often the most cost-efficient.
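
For teams weighing the self-hosting route, here is a minimal sketch of what inferencing could look like with one of the distilled R1 variants. It assumes the Hugging Face transformers library and the deepseek-ai/DeepSeek-R1-Distill-Qwen-7B checkpoint; pick whichever distillation fits your hardware and licensing review, and treat this as a starting point rather than a production deployment.

# Minimal self-hosted inference sketch, not a production deployment.
# Assumes: pip install transformers torch accelerate, plus a GPU sized for the model.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Summarize the trade-offs of self-hosting an LLM."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# R1-style models emit their chain of thought before the final answer,
# so leave generous headroom in max_new_tokens.
outputs = model.generate(inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))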

We expect these new models, and the reduced prices associated with them, to put serious downward pressure on per-token costs for other models hosted by the hyperscalers. We’ll be paying particular attention to Microsoft as it continues to diversify its offerings beyond OpenAI, especially with its decision to make DeepSeek-R1 available. We also expect US-based firms to replicate DeepSeek’s successes, especially given that Hugging Face has already started work on its Open R1 project to take the research behind DeepSeek’s announcements and make it fully open source.

What to Do Now

This is a definite leap forward toward what we have long said is the destination: more, smaller models targeted at specific use cases. For now, we advise our clients to apply a healthy dose of “wait and see.” As has been the case for the last three years, this technology is evolving rapidly, and we expect further developments in the near future from other vendors. Our perpetual reminder to our clients is that security and privacy always outweigh marginal cost savings in the long run.

The comprehensive FAQ from Stratechery is a great resource for more information.

Unlocking Specialized AI: IBM’s InstructLab and the Future of Fine-Tuned Models — IBM Think 2024
https://blogs.perficient.com/2024/05/28/unlocking-specialized-ai-ibms-instructlab-and-the-future-of-fine-tuned-models-ibm-think-2024/
Tue, 28 May 2024

I’ve been reflecting on my experience last week at IBM Think. As ever, it feels good to get back to my roots and see familiar faces and platforms. What struck me, though, was the unfamiliar. Seeing AWS, Microsoft, Salesforce, Adobe, SAP, and Oracle all manning booths at IBM’s big show was jarring; that’s almost unheard of. It’s a testament to my current rallying cry: prioritize making a diversity of platforms work better together by letting data flow in all directions with minimal effort. I see many partners focusing on this by supporting a diversity of data integration patterns, such as zero-copy and zero-ETL (a recurring theme; thank you, Salesforce). In this environment of radical collaboration, I think something really compelling might’ve gotten lost: a little open-source project they launched called InstructLab.

IBM spent a lot of time talking about how now is the time to SCALE your investments in AI and get out of the lab and into production. At the same time, there was a focus on fit-for-purpose AI: using the smallest, leanest model possible to achieve the goal you set.

Think Big. Start Small. Move Fast.

I always come back to one of our favorite mantras: Think Big. Start Small. Move Fast. What that means here is that we have an opportunity to thread the needle. It’s not about going from the lab to enterprise-wide rollouts in one move; it’s about identifying the right, most valuable use cases and building tailored, highly effective solutions for them. You get lots of fast little wins that way. Instead of hoping for a general 10% productivity gain across the board, you’re getting a 70+% productivity gain on specific, measurable tasks.

This is where we get back to InstructLab, a model-agnostic open-source AI project created to enhance LLMs. We’ve seen over and over that general-purpose LLMs perform well for general-purpose tasks, but ask them to do something specialized and you get intern-in-their-first-week results. The idea of InstructLab is to track a taxonomy of knowledge and task domains, choose a foundation model that’s trained on the most relevant branches of that taxonomy, then add domain-specific tuning with a machine-amplified training data set. This opens the door to effective fine-tuning. We’ve been advising against fine-tuning because most enterprises just don’t have enough data to move the needle and make the necessary infrastructure spend for model retraining worth it. With the InstructLab approach, we can, as we so often do in AI, borrow an idea from biology: amplification. We use an adversarial approach to amplify a not-big-enough training set by adding synthetic entries that follow the patterns in the sample.
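
To make the amplification idea concrete, here is a simplified sketch of the pattern in Python. This illustrates the concept only, not InstructLab’s actual pipeline; the teacher parameter is a stand-in for whatever LLM call you use to generate synthetic samples.

# Conceptual sketch of training-set amplification: a teacher model expands a
# small seed set of Q&A pairs into synthetic variants that follow its patterns.
# InstructLab's real pipeline adds taxonomy awareness and critique/filtering
# of the generated samples; this only shows the core idea.
from typing import Callable, List, Tuple

def amplify(
    seeds: List[Tuple[str, str]],     # (question, answer) seed pairs
    teacher: Callable[[str], str],    # any LLM call: prompt in, text out
    variants_per_seed: int = 10,
) -> List[str]:
    synthetic = []
    for question, answer in seeds:
        for _ in range(variants_per_seed):
            prompt = (
                "Write one new question and answer in the same style and "
                f"domain as this example.\nQ: {question}\nA: {answer}"
            )
            synthetic.append(teacher(prompt))
    # In practice a critic model scores and filters these before training.
    return synthetic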

The cool thing here is that, because IBM chose the Apache 2.0 license for everything they’ve open sourced, including Granite, it’s now possible to use InstructLab to train new models with Granite models as foundations and then decide whether to keep them private or open source them and share them with the world. This could be the start of a new ecosystem of trustable open-source models trained for very specific tasks that meet the demands of our favorite mantra.

Move Faster Today

Whether your business is just starting its AI journey or seeking to enhance its current efforts, partnering with the right service provider makes all the difference. With a team of over 300 AI professionals, Perficient has extensive knowledge and skills across various AI domains. Learn more about how Perficient can help your organization harness the power of emerging technologies. Contact us today.

ELT IS DEAD. LONG LIVE ZERO COPY.
https://blogs.perficient.com/2024/04/29/elt-is-dead-long-live-zero-copy/
Mon, 29 Apr 2024

Imagine a world where we could skip Extract and Load entirely and do our data Transformations connected directly to the sources, no matter what data platform we use.

Salesforce has taken significant steps over the last two years with Data Cloud to streamline how you get data in and out of their platform, and we’re excited to see other vendors follow their lead. Today they went to the next level by announcing their more comprehensive Zero Copy Partner Network.

Using industry standards like Apache Iceberg as the base layer makes it easy for ALL data ecosystems to interoperate with Salesforce. We can finally make progress toward the dream of every master data manager: a world where the golden record is constructed directly from the actual source of truth, without relying on copies.
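
As a sketch of what that interoperability can look like in practice, here is how an Iceberg table shared through a REST catalog might be queried in place with the pyiceberg library. The catalog endpoint, credentials, and table name below are hypothetical placeholders, not Salesforce’s actual configuration.

# Minimal sketch: reading a shared Iceberg table in place, no extract, no load.
# Endpoint, token, and table identifiers below are hypothetical.
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "shared",                                  # local name for the catalog
    uri="https://example.com/iceberg/rest",    # hypothetical REST catalog endpoint
    token="YOUR_ACCESS_TOKEN",
)

table = catalog.load_table("crm.unified_customer")  # hypothetical shared table

# Filter pushdown happens at scan time; only matching data is read.
df = table.scan(row_filter="region = 'EMEA'").to_pandas()
print(df.head())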

This is also a massive step forward for our clients as they mature into real DataOps and continue beyond to full site reliability engineering operational patterns for their data estates. Fewer copies of data mean increased pipeline reliability, data trustability, and data velocity.

This new model is especially important for our clients who choose a heterogeneous ecosystem combining tools from many partners (say, Adobe for DXP and marketing automation, and Salesforce for sales and service). Such clients struggle to build consistent predictive models that can power all of those platforms, so their customers end up getting different personalization from different channels. When we can bring all the data together in the Lakehouse faster and more simply, it becomes possible to build one model that can be consumed by every platform. This efficiency is critical to the practicality of adopting AI at scale.

Perficient is unique in our depth and history with Data + Intelligence, and in our diversity of partners. Salesforce’s “better together” approach aligns precisely with our normal way of working. If you use Snowflake, Redshift, Synapse, Databricks, or BigQuery, we have the right experience to help you make better decisions faster with Salesforce Data Cloud.

IBM Launches Watsonx: A New Platform for Foundation Models and Generative AI
https://blogs.perficient.com/2023/05/10/ibm-launches-watsonx-a-new-platform-for-foundation-models-and-generative-ai/
Wed, 10 May 2023

This week, IBM announced a new AI and data platform, Watsonx, at its annual Think conference. Watsonx is designed to help enterprises scale and accelerate the impact of the most advanced AI with trusted data. Watsonx will offer a studio, a data store, and a governance toolkit for building, tuning, and deploying foundation models and generative AI across any cloud environment.

What are Foundation Models and Generative AI?

Foundation models are large-scale AI models that can learn from massive amounts of data and perform multiple tasks across different domains. They can be adapted and fine-tuned for specific use cases with less data and computational resources than training a model from scratch. Generative AI is a branch of AI that can create new content or data, such as text, images, code, or music, based on existing data or inputs.

What are the benefits of Watsonx?

Watsonx aims to make AI more accessible, scalable, and trustworthy for enterprises. With Watsonx, clients will be able to:

  • Access a variety of foundation models and open-source models curated and trained by IBM for different purposes, such as natural language understanding, code generation, chemical synthesis, or climate change modeling.
  • Use a data store to gather and cleanse training and tuning data from various sources and formats.
  • Use a governance toolkit to monitor, explain, secure, and audit their AI models and data throughout the lifecycle.
  • Train and deploy custom AI capabilities across their entire business with speed and confidence.
  • Collaborate with IBM Consulting and Hugging Face, an open-source AI software development hub, to leverage the best of enterprise-grade and community-driven AI.

Most exciting is that this is completely new work out of IBM Research and, when it becomes available in July, is meant to offer options for SaaS on AWS or deployment anywhere on OpenShift.


What Makes the New Generative AI and Foundation Model-infused Watson Products Unique to the Market?

We’ve been anticipating the prospect of using generative models to automate menial tasks and accelerate complex ones, and this announcement holds great potential. IBM also announced new Watson products that will leverage foundation models and generative AI to address various business challenges, such as:

  • Watson Code: A product that can automatically generate code from natural language specifications or existing code snippets.
  • Watson AIOps: A product that can automate IT operations and incident management using natural language processing and anomaly detection.
  • Watson Digital Labor: A product that can augment human workers with intelligent automation and conversational agents.
  • Watson Security: A product that can enhance cybersecurity with threat detection and response using natural language understanding and anomaly detection.
  • Watson Sustainability: A product that can help measure, track, manage, and report on cloud carbon emissions using an AI-powered dashboard.

However, what makes IBM unique is that this platform will let you tune the model using your own code repository, critically, without exposing that proprietary information to a shared model, meaning the data never needs to leave your secure network. The benefit of tuning, in this case, is getting suggestions that match your code standards and your proprietary libraries and methods. The ability to tune foundation models while maintaining an appropriate risk and security posture continues IBM’s emphasis on highly regulated industries and will be revolutionary.

How can I get started with Watsonx?

We’re looking forward to getting access to the tech preview and trying it out ourselves. This is a significant addition to our repertoire of generative AI tools and one more option for making this technology practical for the real world across a large variety of industries. Watsonx is expected to be released later this year, and you can sign up for early access at watsonx.ai. You can also join the IBM Think conference to learn more about Watsonx and other IBM innovations at ibm.com/think.

Transform Your Business with Amazon DataZone
https://blogs.perficient.com/2023/02/13/transform-your-business-with-amazon-datazone/
Mon, 13 Feb 2023

Amazon recently released a new data tool called DataZone, which allows companies to share, search, and discover data at scale across organizational boundaries. It offers many features such as the ability to search for published data, request access, collaborate with teams through data assets, manage and monitor data assets across projects, access analytics with a personalized view for data assets through a web-based application or API, and manage and govern data access in accordance with your organization’s security regulations from a single place.

DataZone may be helpful for IT leaders because it enables them to empower their business users to make data-driven decisions and easily access data both within and outside their organization. With DataZone, users can search for and access data they need quickly and easily while also ensuring the necessary governance and access control. Additionally, DataZone makes it easier to discover, prepare, transform, analyze, and visualize data with its web-based application.
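
As a rough sketch of what programmatic access might look like, here is a search against a DataZone domain using boto3. The domain identifier is a placeholder, and since DataZone is new, operation and parameter names here are our best understanding of the API; check the current SDK documentation before relying on them.

# Hedged sketch: searching the DataZone catalog for published data assets.
# Assumes a recent boto3 with DataZone support; identifiers are placeholders.
import boto3

datazone = boto3.client("datazone", region_name="us-east-1")

response = datazone.search_listings(
    domainIdentifier="dzd_example123",  # placeholder domain ID
    searchText="customer orders",
    maxResults=10,
)

for item in response.get("items", []):
    print(item)  # each item describes a published, discoverable data asset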

Implementation of DataZone can vary depending on the organization and its existing governance policies. If your data governance is already in place, implementation of DataZone may take only a few months. However, if governance needs to be established and implemented, it will take much longer and require significant organizational changes.

While it may seem obvious, DataZone is not a magic solution to all your data problems. Simply having a tool is not enough. Deciding to move forward with any data marketplace solution requires a shared responsibility model and governance across multiple channels and teams. We’ve seen many companies fail to get full use out of data marketplaces due to lack of adoption by the business.

Ultimately, DataZone can be an invaluable tool for IT leaders looking to empower their business to access data quickly and easily within and outside their organization while adhering to necessary governance and access control policies. With the help of the automated data harvesters, stewards, and AI, DataZone makes data not just accessible but also available, allowing businesses to make use of it when making decisions.

With our “VP of IT’s Guide to Transforming Your Business,” IT leaders can gain the insights they need to successfully implement the latest data-driven solutions, such as DataZone. Download it for free today to get the answers you need to unlock the full potential of your data investments and drive your business forward with data-driven decisions.

5 Commonly Asked Questions About Intrinsic Bias in AI/ML Models in Healthcare
https://blogs.perficient.com/2022/07/19/5-commonly-asked-questions-about-intrinsic-bias-in-ai-ml-models-in-healthcare/
Tue, 19 Jul 2022

Healthcare organizations play a key role in offering access to care, motivating skilled workers, and acting as social safety nets in their communities. They, along with life sciences organizations, serve on the front lines of addressing health equity.

With a decade of experience in data content and knowledge, specializing in document processing, AI solutions, and natural language solutions, I strive to apply my technical and industry expertise to the top-of-mind issue of diversity, equity, and inclusion in healthcare.

Here are five questions that I hear commonly in my line of work:

1. What is the digital divide, and how does it impact healthcare consumers?

There are still too many people in this country who don’t have reliable access to computing devices and the internet in their homes. If we think back to the beginning of the pandemic, we can see this in sharp relief. The number one impediment to the shift to virtual school was that kids didn’t have devices or reliable internet at home.

We also saw quite clearly that the divide is disproportionately impacting low income people in disadvantaged neighborhoods.

The problem is both affordability and access.

The result, through a healthcare lens, is that people without reliable access to the internet have less access to information they can use to manage their health.

They are less able to find a doctor who’s a good fit for them. Their access to information about their insurance policy and what is covered is more restricted. They are less able to access telehealth services and see a provider from home.

All this compounds because we’re using digital and internet-connected tools to improve healthcare and outcomes for patients. But ultimately, the digital divide means we’re achieving marginal gains for the populations with the best outcomes already and not getting significant gains from the populations that need support the most.

2. How can organizations maintain an ethical stance while using AI/ML in healthcare?

Focus on intrinsic bias, the subconscious stereotypes that affect the way individuals make decisions. People have intrinsic biases picked up from their environment that require conscious acknowledgement and attention. Machine learning models also pick up these biases. This happens because models are trained on data about historical human decisions, so the human biases come through (and can even be amplified). It’s critical to understand where a model comes from, how it was trained, and why it was created before using it.

Ethical use of AI/ML in healthcare requires careful attention to detail and, often, human review of machine decisions in order to build trust.

3. How can HCOs manage inherent bias in data? Is it possible to eliminate it?

At this point, we’re working to manage bias, not eliminate it. This is most critical for training machine learning models and correctly interpreting the results. We generally recommend using appropriate tools to help detect bias in model predictions and to use those detections to drive retraining and repredicting.

Here are some of the simplest tools in our arsenal; a sketch of this flip test follows the list:

  • Flip the offending parameter (e.g., the race or gender value) and run the prediction again.
  • Determine whether the model would have made a different prediction if the person were, say, white and male.
  • Use that additional data point to advise a human on their decision.
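
Here is a minimal sketch of that counterfactual flip test, assuming a fitted scikit-learn-style classifier; the column name and alternative value are illustrative.

# Change only the protected attribute and see whether the prediction changes.
# model is any fitted classifier with a predict method.
import pandas as pd

def flip_test(model, record: pd.DataFrame, column: str, alternative) -> dict:
    original = model.predict(record)[0]
    flipped = record.copy()
    flipped[column] = alternative           # e.g., swap the race or gender value
    counterfactual = model.predict(flipped)[0]
    return {
        "original": original,
        "counterfactual": counterfactual,
        "changed": original != counterfactual,  # flag to surface to a human reviewer
    }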

For healthcare in particular, the human in the loop is critically important. There are some cases where membership in a protected class changes a prediction because it acts as a proxy for a key genetic factor (man or woman, white or Black). The computer can easily correct for bias when reviewing a loan application; however, when evaluating heart attack risk, there are specific health factors that can be predicted by race or gender.

4. Why is it important to educate data scientists in this area?

Data scientists need to be aware of potential issues and omit protected class information from model training sets whenever possible. This is very difficult to do in healthcare, because that information can be used to predict outcomes.

The data scientist needs to understand the likelihood that there will be a problem and be trained to recognize problematic patterns. This is also why it’s very important for data scientists to have some understanding of the medical or scientific domain about which they’re building a model.

They need to understand the context of the data they’re using and the predictions they’re making in order to judge whether a protected class driving an outcome is expected or unexpected.

5. What tools are available to identify bias in AI/ML models, and how can an organization choose the right tool?

Tools like IBM OpenScale, Amazon SageMaker Clarify, Google’s What-If Tool, and Microsoft Fairlearn are a great starting point for detecting bias in models during training, and some can do so at runtime (including the ability to make corrections or identify changes in model behavior over time). Tools that enable both bias detection and model explainability and observability are critical to bringing AI/ML into live clinical and non-clinical healthcare settings.
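
As an example, a quick demographic-parity check with Fairlearn might look like this; the arrays are toy placeholders standing in for your evaluation set.

# Sketch: measuring bias in predictions with Fairlearn's metrics.
# Requires: pip install fairlearn scikit-learn. Inputs are toy placeholders.
import numpy as np
from fairlearn.metrics import MetricFrame, demographic_parity_difference
from sklearn.metrics import recall_score

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])        # ground-truth labels
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])        # model predictions
sensitive = np.array(["A", "A", "B", "B", "A", "B", "A", "B"])  # protected class

dpd = demographic_parity_difference(y_true, y_pred, sensitive_features=sensitive)
print(f"Demographic parity difference: {dpd:.3f}")  # 0.0 means parity

# Per-group recall highlights which group the model under-serves.
frame = MetricFrame(metrics=recall_score, y_true=y_true, y_pred=y_pred,
                    sensitive_features=sensitive)
print(frame.by_group)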

EXPLORE NOW: Diversity, Equity & Inclusion (DE&I) in Healthcare

Healthcare Leaders Turn to Us

Perficient is dedicated to enabling organizations to elevate diversity, equity, and inclusion within their companies. Our healthcare practice is comprised of experts who understand the unique challenges facing the industry. The 10 largest health systems and 10 largest health insurers in the U.S. have counted on us to support their end-to-end digital success. Modern Healthcare has also recognized us as the fourth largest healthcare IT consulting firm.

We bring pragmatic, strategically-grounded know-how to our clients’ initiatives. And our work gets attention – not only by industry groups that recognize and award our work but also by top technology partners that know our teams will reliably deliver complex, game-changing implementations. Most importantly, our clients demonstrate their trust in us by partnering with us again and again. We are incredibly proud of our 90% repeat business rate because it represents the trust and collaborative culture that we work so hard to build every day within our teams and with every client.

With more than 20 years of experience in the healthcare industry, Perficient is a trusted, end-to-end, global digital consultancy. Contact us to learn how we can help you plan and implement a successful DE&I initiative for your organization.

Don’t Panic: Log4Shell
https://blogs.perficient.com/2021/12/15/dont-panic-log4shell/
Wed, 15 Dec 2021

2021-12-29: Updates related to CVE-2021-44832

2021-12-20: Updates related to CVE-2021-45105

We, like many of you, have spent our weekend and the start of this week triaging the risks posed by CVE-2021-44228 (aka Log4Shell).

As we’ve been checking and patching our internal systems, products and client systems, some colleagues and I have made a few observations we wanted to share.

First things first, though. If you are concerned about the risks posed by this log4j vulnerability, we recommend making sure that you’ve patched vendor products according to vendor instructions and that all your systems are updated to use log4j 2.17.1 or newer. Resources from other trusted sources and partners are also worth consulting as the situation evolves.

A couple other technical notes…

  1. UPDATED 2021-12-29: There are several situations where the fixes in 2.15.0, 2.16.0, and 2.17.0 are incomplete, as described in CVE-2021-45046, CVE-2021-45105, and CVE-2021-44832; we recommend using 2.17.1 wherever possible. It was previously thought that CVE-2021-45046 was medium-to-low risk, but it has since been upgraded to critical. CVE-2021-45105 is also a high-risk issue and should be patched as soon as possible. We therefore recommend upgrading to 2.17.1 even if you had previously upgraded to 2.15.0, 2.16.0, or 2.17.0. This is a dynamic situation, and this guidance may change again as more thorough testing continues.
  2. log4j 1.2.x has a similar but more limited vulnerability, and no patch will be available, as that version of log4j is past end of life. The CVSS 3.0 score for CVE-2021-4104 is still pending, but we expect the risk level to be lower given the limited code paths that appear to be affected; some uncertainty remains. More information should be available soon, but the TL;DR is that you should be okay if you don’t use JMSAppenders, though this may yet evolve.
  3. Remember that log4j may not be a direct dependency of your code; it may be a dependency of a dependency or deeper down the dependency tree. It’s important to use your dependency management tool (e.g., Maven or Gradle) to check the full tree for references to log4j (see the example commands after this list).
  4. The vulnerability itself is fairly straightforward and shockingly easy to exploit. It uses the magic of JNDI lookups, a feature in Java that allows you to transparently interact with a variety of directory and naming services. In this case, the exploit uses LDAP (Lightweight Directory Access Protocol) and a little trickery to get the JNDI lookup to pull what it thinks is an object from an LDAP server, via a redirect to an HTTPS endpoint, but which is really code that will be executed on the vulnerable server. The risk is that any place in an application where a user can pass in textual input that is logged as-is is susceptible to having arbitrary code passed to it and executed by the JNDI lookup embedded in that text.
  5. NEW 2021-12-29: Per CVE-2021-44832, the same issue can be exploited, although with more difficulty (the attacker must first get access to the log4j configuration file), when using 2.17.0 with the JDBCAppender (i.e., writing logs to a database). If you’re not using this functionality, there should be no risk related to this new vulnerability. Unlike the JMSAppender noted for the log4j 1.2.x risk, however, the JDBCAppender is widely used in log4j 2.x. This problem is interesting because it’s an additional layer of JNDI lookup capability that exists to allow log4j to look up the connection information for the intended database from the local application context or, possibly, a remote service. This use of JNDI lookup is commonly employed to avoid hard-coding database connection information in code and configuration files (as well as to centralize the storage of credentials). The fixed jar simply changes the default behavior and allows the prior behavior to be enabled with a setting. This is the sort of change that may require more rework if the JDBCAppender needs to be used with JNDI lookups to make it safe.
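
On the dependency-tree point above (item 3), the check is a one-liner in most build tools. For example, with Maven and Gradle (exact flags vary by tool version, so treat these as starting points):

mvn dependency:tree -Dincludes=org.apache.logging.log4j

gradle dependencyInsight --dependency log4j-core --configuration runtimeClasspath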

Now on to the interesting part. As we’ve been discussing this internally, we realized a few things about the risk posed by the log4shell vulnerability:

  • There is an industry-wide tendency to shrug off these kinds of risks as “only a problem for internet-facing systems.” There are two key ways that this thinking is flawed. First, perfectly preventing a malicious actor from accessing your network is impossible. Not only are firewalls not perfect, but social engineering and angry employees can circumvent them entirely. Second, this sort of vulnerability can be exploited by cascading an offending string from system to system via service calls down the chain until it finds one that is vulnerable.
  • Every software vendor is taking a different approach. Even the ones that agree that swapping out the jars is the right move don’t agree on how they want to deliver that to customers. We find that it is important to take the time to assess each vendor’s approach and follow their instructions carefully to ensure you don’t create more problems than you solve. That said, patience is required to allow vendors the time to not only identify which products are vulnerable and release a fix, but also to test that fix and ensure it doesn’t cause problematic regressions.
  • It is not at all surprising how widespread this is, but it is surprising that it was found because of Minecraft. It turns out that Minecraft is written in Java and uses log4j2. It appears that Minecraft servers were logging, by default, all the messages typed into the chat. This meant that anyone could put a malicious JNDI lookup into the chat on a Minecraft multiplayer server and exploit this vulnerability. Here’s a great video showing how it works, including how to get as far as getting a remote shell on a Minecraft server.

If you need help either to work through remediations or devise a strategy to optimize response next time (because there certainly will be a next time), as your trusted partner and advisor, we’re ready, willing, and able to help.

Just don’t forget your towel.

Divorcing Virtual Paper
https://blogs.perficient.com/2021/10/18/divorcing-virtual-paper/
Mon, 18 Oct 2021

I wanted to expand a little on one of the core themes of a perspective I just published, Demise of the Document, the idea of Virtual Paper.

It’s a concept that haunts me even as I’m drafting this post using Microsoft Word…


A colleague did point out to me the irony of a think-piece titled Demise of the Document, lamenting our ongoing attachment to Virtual Paper, being distributed as Virtual Paper (a PDF).

Everywhere we look we’re greeted with this metaphor of a sheet of paper that never has and never will exist. There’s certainly something comforting about seeing the familiar layout and knowing that, if I really wanted, I could print this out confident of the layout. At the same time, it’s starting to feel insidious. It’s infecting the way we think about digital content, restricting our ability to be creative in the way we understand and present our ideas to the world. But why do we still base our choices around those assumptions?

In Demise of the Document I didn’t spend much time talking about the future of authoring, and that was intentional. I think there’s too much uncertainty about where it will go and how it will evolve. I am excited, though, by the ideas present in Microsoft’s Fluid Framework; I even wrote an early draft of Demise of the Document using it. I found, though, that it’s not quite ready for everything I need. I switched my writing back to MS Word not to get the layout right, but to have track changes and comments in the way we’re all accustomed to. So why do we still fall back on the metaphor of Virtual Paper time and again? Because we don’t yet have access to truly natively digital tools for producing and consuming content that provide us with the lifestyle to which we’ve become accustomed. Our divorce from Virtual Paper will be long and messy, but it is inevitable.

IBM Think 2021 Key Takeaways
https://blogs.perficient.com/2021/05/14/ibm-think-2021-key-takeaways/
Fri, 14 May 2021

IBM Think 2021 wrapped up earlier this week. While it was virtual again this year, there was certainly no shortage of valuable information and key takeaways.

One of the biggest themes was automation. IBM is looking to accelerate automation in ways that are directly visible to business users and has introduced Watson Orchestrate along with new Cloud Pak for Data capabilities. Additionally, application modernization appears set to remain a major story for the next decade or more.

Here are our big takeaways from IBM Think this year.

Automation was Everywhere at IBM Think

IBM announced Watson Orchestrate and two new capabilities for Cloud Pak for Data that will play into automation.

Watson Orchestrate

Watson Orchestrate will bring together best-in-class natural language processing with IBM’s integration and business automation platforms to enable conversational interactions for users to get their work done efficiently.

Imagine, as a salesperson, talking to a chatbot and asking it to create an opportunity in your CRM for a particular client and to generate an associated quote from the standard price list. Then imagine the chatbot automatically suggesting other products based on a propensity model. Imagine being able to build all this with low-code tools and to deploy it anywhere with the power of OpenShift.

Cloud Pak for Data – Intelligent Data Fabric and Palantir

Cloud Pak for Data has two new capabilities, Intelligent Data Fabric and Palantir, that add automation to the tedious process of establishing data governance and lineage.

In the Intelligent Data Fabric, AutoSQL enables users to write one SQL statement that can intelligently query any data source (whether a data lake, the built-in data warehouse, an external data warehouse, or some other database). It uses smart caching and AI-powered query translation to push query execution down to the remote systems and efficiently return a complete result. AutoCatalog and AutoPrivacy sit on top, enabling the platform to automatically catalog, classify, and understand the data sources connected to the system and to use AI to apply privacy rules that dynamically mask sensitive data.

Palantir can consume the data catalog entities as business objects, understand and represent the relationships between those objects, and apply Watson ML predictive models to them. Altogether, this accelerates the ability to build AI-powered customer 360, propensity dashboards, and other kinds of fully integrated enterprise knowledge applications. IBM is taking a stand against Informatica, Databricks, and Snowflake with these new offerings and the ongoing work to modernize its core data platform offerings (Db2 and DataStage).

Application Modernization is Still Key

Application modernization will continue to be a major story for the next decade or more. There is a renewed focus on full edge-to-core modernization using an open-core platform based around OpenShift and Edge Application Manager (Kubernetes and Open Horizon). The combined force of these platforms is enabling IBM to modernize its own product offerings at a breakneck pace. IBM also continues to divest where the investment in modernization is not sustainable and is adopting open-source replacements where it makes sense. IBM’s new vision of the future brings its best-in-class enterprise software together with open-source innovation via the Cloud Pak offerings.

More Business Automation

We were fortunate enough to be invited to share our business automation and content management expertise at IBM Think this year along with our partner IBM and our customer PayPal. If you haven’t checked it out yet, catch the on-demand video session here (registration required).

The session covers how we’re helping organizations enhance their FileNet content environments with IBM Cloud Pak for Business Automation to automate the use of enterprise content with easy to use tools and intelligent services. Check it out!

Delivering top line growth: A journey to unlocking siloed data with automation

On-Demand Session #1893

 

Cloud Pak and IBM Automation Document Processing: It’s more than AI and ML
https://blogs.perficient.com/2020/11/04/ibm-automation-document-processing/
Wed, 04 Nov 2020

As our clients continue their digital transformation journeys, challenges with traditional document capture solutions are coming to the forefront. Managing and configuring classification sample sets and extraction rules in layout-driven, legacy solutions is time-consuming and expensive. Modern digital businesses require systems that can continue to provide accurate results as forms and businesses evolve. Existing solutions cannot adapt automatically to minor changes, requiring application development or engineering intervention for reconfiguration.

Machine learning and AI to the rescue, right?

IBM Automation Document Processing

Most AI solutions are either narrowly targeted to one kind of document (such as ID cards, invoices, or shipping labels) or require significant development effort to wire together the models, the repository, and the user interface. IBM is leading the way with a new fully integrated, deploy anywhere, configuration-driven solution.

The new document processing capability for the IBM Cloud Pak for Automation is a new way of thinking about document capture and data verification.

Imagine a bank receiving and storing bundles of documents related to the opening of a loan. A loan package will often contain a scan of the signed note itself, ID cards, income verification documents, several disclosure forms and supporting documentation like pay stubs, bank statements and the like. These documents might arrive as a single PDF, a disorganized pile of images, pieces of physical paper, or a mix of the above. The documents need to be captured, classified, and indexed with relevant data from each document, then rendezvous with a workflow or case already in progress. At Perficient, we have been helping our customers solve this kind of problem for decades, but it often presented a few specific challenges.

  1. Document classification
    Historically, it has been difficult, time-consuming, and expensive to achieve the vision of accurate automated classification. Traditional approaches required fixed form layout, fixed keywords, barcodes/patch codes or general text matching. These kinds of tools are extremely sensitive to changing document sources and structures. Thinking about that loan package, tax returns and W2s are standard forms with standard layouts that change rarely.
    A traditional capture solution might be trained to recognize the layout; however, even a minor change year to year could break that classification model. With the machine learning models embedded in IBM Automation Document Processing, your classification model will be more resilient to minor changes in form layout. Not only is it pretrained on common document types, the new product also makes it easy for a business user to extend the training with custom sets of samples.
  2. Data extraction
    While text recognition technologies have continued to evolve and improve, traditional capture technologies have not progressed in their ability to make sense of that text. The continuing dependence on structured form extraction and assumptions about each document’s layout causes problems similar to the classification example above. Certain document types, like the note itself in our loan example, may have inconsistent layouts, and supporting documents like bank statements, payroll statements, and disclosures will vary significantly within a single loan package. With these kinds of documents, we may not know where a piece of information will appear, or even whether it is present; we just want to extract the right information if we do find it. IBM’s new tools come with pretrained deep learning models designed to easily find hundreds of common key-value pairs. In the model setup wizard, a business analyst can add more fields and, with a small set of samples, train the system to extract those as well.

So, I hear you saying “This still sounds like machine learning (ML) and AI to the rescue! Why is this different?”

A Cloud Pak for Automation Integrated Solution

IBM has integrated this solution into the Cloud Pak for Automation platform via the Business Automation Studio low-code designer. With IBM Automation Document Processing, you have a simple, business-analyst-friendly wizard that walks through training the classification and extraction models. The wizard enables the user to set up data validation rules (dates, phone numbers, confidence levels) and simultaneously maps that configuration into new or existing document classes and property templates in an IBM FileNet Content Manager repository. These low-code tools enable that same business analyst to design, test, and deploy an intuitive user experience for validating classification and extraction results in real time. This integration also means that these capabilities can be used directly from a Case or Workflow solution.

Getting back to our example…

A potential borrower goes to the bank’s website and indicates their intent to apply for a loan. They fill out some information, attach a few documents, and a case is started. IBM Automation Document Processing classifies and extracts metadata from the documents, the decision service compares the information provided in the form to the data in the attachments and either automatically rejects the loan, sends a request to the applicant to supply more information or routes the case to a loan officer for further review.

IBM Automation Document Processing capabilities provide the system the necessary data to:

  • Automatically check that the applicant entered their income correctly on the application.
  • Validate that the requested loan amount is under the absolute limit for that income level.
  • Immediately check the credit report of the applicant.
  • Automatically request more information from the applicant

…and quite a bit more, all before a human is asked to review anything.

Once the loan is approved, the system can also determine if the executed note:

  • Is tagged to the correct loan number.
  • Has all the right boxes checked.

Each of these tasks can be completed without human intervention.

Delivering On The Unmet Promise

As you can see, IBM’s integrated and comprehensive approach to applying AI-led automation to document processing tasks is essential to meeting the requirements of dynamic, information-intensive use cases. Because it is coupled to an industry-leading content services and digital process automation platform, IBM Automation Document Processing enables more rapid delivery of information to automated business processes. These capabilities deliver on the unmet promise of document capture solutions in a modern and readily deployable package.

Kick Start your Digital Transformation with IBM Cloud Pak for Automation
https://blogs.perficient.com/2020/04/16/kick-start-your-digital-transformation-with-ibm-cloud-pak-for-automation/
Thu, 16 Apr 2020

This is the third installment in our series on IBM Cloud Paks. Read the first post and the second post.

With IBM’s acquisition of Red Hat last summer, there has been a monumental shift underway at Big Blue. IBM is working rapidly to modernize products up and down its software portfolio. Some of the biggest changes have been in the Workflow (BPM) and Content (ECM) spaces, now collectively dubbed Digital Automation.

The new IBM Cloud Pak model is, at its core, a new way of bundling and buying IBM software. In this model, you can trade up your existing parts and keep all your current bundled entitlements (combining several products to a single part number in many cases) and gain access to new bundled entitlement for Red Hat OpenShift. In addition, there are certain new capabilities and features only available in the Cloud Pak licensing model, and this is especially important to note for the Automation space.

Cloud Pak for Automation includes the legacy core of the Automation platform:

  1. FileNet Content Manager
  2. Operational Decision Manager
  3. Business Automation Workflow (the merged Case and BPM offering)
  4. Datacap
  5. Enterprise Records
  6. Content Collector

It also includes several new capabilities, only available with the Cloud Pak:

  1. Content Manager and Operational Decision Manager Kubernetes/OpenShift Deployment
  2. Business Automation Content Analyzer
  3. Automation Digital Worker
  4. Automation Workstream Services
  5. Business Automation Insight
  6. Business Automation Studio
  7. Business Automation Application Designer

These new capabilities enable new pathways for low-code application design and work management.

Ibm Cloud Pak For Automation

Introducing Digital Worker

Digital Worker is a new conceptual framework for automating work and a technical framework for integrating a variety of intelligent services to carry out those goals. You start at the business level, defining your real workers and their real job activities. From there, the technical teams can choose which tasks to try to automate. Digital Worker can hook into existing workflows (or any API) or automate new workflows from scratch.

For instance, Digital Worker can act as a member of a team in a BPM process application and try to process work assigned to that team in an automated way. It can do this using intelligent tools like Business Automation Content Analyzer to automatically extract data from documents, use any of the many Watson AI services to understand context, and use Operational Decision Manager to confirm that guardrails have not been violated. Once it finishes processing, it can either automatically process the work or release it back to the queue for manual intervention. The Digital Worker thus becomes another member of the team.

Machine learning in Business Automation Content Analyzer

Business Automation Content Analyzer (mentioned above) allows for quick and easy configuration-driven data extraction from documents. The built-in machine learning model is designed to take in an ontology that defines various document classifications and fields for key-value pair extraction. You can supply samples of each class to the model to enhance the classification training or you can train a new model from scratch. The browser-based interface is designed to allow an analyst to configure and test the models and ontologies. It is designed to run as a microservice. Documents are posted to a secure REST endpoint and results are returned asynchronously. This service is available to deploy anywhere as part of the Cloud Pak for Automation and as a stand-alone Cloud API.
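
As an illustration of that post-and-poll pattern, a client interaction might look roughly like the sketch below. The URL paths, field names, and response shape here are hypothetical placeholders, not the documented Content Analyzer API.

# Hedged sketch of an asynchronous document-analysis call.
# Endpoint paths, parameters, and response fields are hypothetical.
import time
import requests

BASE = "https://example.com/content-analyzer"  # placeholder service URL
HEADERS = {"apikey": "YOUR_API_KEY"}           # placeholder auth

# Post the document for classification and extraction.
with open("loan-package.pdf", "rb") as f:
    job = requests.post(f"{BASE}/analyzers", headers=HEADERS,
                        files={"file": f}).json()

# Results come back asynchronously, so poll for completion.
while True:
    status = requests.get(f"{BASE}/analyzers/{job['id']}", headers=HEADERS).json()
    if status["state"] in ("completed", "failed"):
        break
    time.sleep(2)

print(status.get("keyValuePairs"))  # extracted fields, if any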

New capability for non-technical users

Workstream Services is a new capability to allow non-technical users to design new process flows for simple tasks in a no-code way while enabling full governance of what is in production. It supports the concept of Checklist, Form, and Approval tasks within a workstream. Workstream definitions all have a lifecycle, from draft to test to published and finally, archived. An approval gate can be placed at the publish phase to allow only authorized users to publish new workstream definitions, allowing other users to simply make a publish request.

Imagine a business user who is trying to get a handle on their work. They receive hundreds of requests per week via email for updates to a primary inventory because the inventory system does not have native change control processes. Workstream would enable that user to define their own process, creating a form for people to complete and supply all the information required, an approval step for his manager, followed by a checklist to remind himself of the steps to complete the work. Because Workstreams provides this all in a no-code way, directly from Business Automation Navigator, this user can easily do it all himself, submit his work for review by the IT team to ensure it meets enterprise standards, and get it published to start him on a path to automation. Later, if called for, a Digital Worker can be created to help take more load off this worker’s plate.

Benefits of these new capabilities

The big benefit of these new capabilities is that there is no need to leave behind your current investment to take advantage of new options. The new capabilities, where appropriate, integrate fully with the legacy core platform. Documents all live in FileNet with its robust and proven security models. Existing BPM and Case workloads will continue to be supported and can be enhanced with the new Digital Workers. Some of the legacy core can even be shifted to OpenShift or whatever certified Kubernetes platform you prefer, in your datacenter or the cloud or both. Part of this new flexibility is that Cloud Paks are licensed per Virtual Processor Core (VPC) – when you buy a Cloud Pak you get a pool of VPCs to allocate to the various components of the Pak, and as your needs change, you can transfer your entitlement dynamically.

IBM is working rapidly to enhance and improve these offerings and even we are struggling to keep up with their pace of change. This is both new and refreshing from IBM, but it does call for caution from our clients. Perficient’s experience with IBM’s legacy core products and as one of the first partners to bring a client live with the new Kubernetes support for FileNet Content Manager makes us the ideal partner to help you on your journey.

Elastic{on} Wrapup
https://blogs.perficient.com/2018/03/01/8398/
Fri, 02 Mar 2018

Today my colleague Eric Immermann, Director of our Search practice, and I had the pleasure of presenting at Elastic{on}. Eric demonstrated our exciting new Nero solution accelerator and discussed Google Search Appliance sunset planning. Later in the day, I spoke about the ways we can use Elasticsearch to empower business users with their data-lake-resident data. It was a fun way to close out the conference!

After my talk, I listened in on a presentation by a few of Elastic’s support engineers. It reminded me how critical it is for large enterprises to have proper support for the products they use, and how positive our clients’ experiences have been with Elastic’s support team. Elastic believes in a consultative approach to support and assigns a dedicated support engineer to each account. This enables their team to become familiar with your use cases and personnel, accelerating support interactions and preempting problems.

That’s all for us from Elastic{on} 2018, until next year…
