Perficient’s Unique Industry Focus Continues to Capture Recognition in the Utilities Space

Severe unpredictable weather, unprecedented power draw from cutting-edge technologies, and intensifying governmental and regulatory pressures have created a turbulent landscape for power and utilities providers. There is now significant concern regarding grid reliability, from resiliency to keeping up with energy consumption. Coupled with aging information technology systems and shifting consumer expectations, many utilities companies are scrambling to keep up with the fast-paced changes impacting their operations and growth.

At Perficient, we help these organizations not only pick up the pace but gain a competitive advantage. Our experts are well-versed in the many challenges facing the energy and utilities industry due to years of client-side experience, first-party research, memberships at industry associations, and being trusted advisors at many leading utilities companies. Whether it’s grid and IT modernization or grappling with the evolution of utilities customer experience, we have been at the forefront of strategy development for award-winning digital transformations. Because of this expertise, Perficient was recently recognized in a leading global technology research and advisory firm’s report featuring notable digital solutions consultancies partnering with power and utilities companies across North America and Europe.

An Unexpected Twist on Industry-First Approach

Perficient’s experts dive deep into the specifics of each industry’s trends, drivers, and disruptors to ensure their advisory and solutions are tailored and relevant, but more than ever, organizations are being challenged to look outside of their industry and capitalize on solutions from other markets. The power and utilities industry is the perfect example.

Electrification Ripple Effect

Few factors affect multiple industries as significantly as the adoption of, and governmental policies surrounding, electric vehicles. At first glance, this sweeping development has the most impact on the automotive industry. It did not take long, however, for most to realize just how much electric vehicle adoption would rely on the capabilities of utilities providers.

Automobile companies’ transition to electric vehicles and society’s adoption of them – though slow-moving – are causing additional stress on the grid. Mandates to cease producing traditional internal combustion engine vehicles are further augmenting this pressure, compounded by consumer demand for more accessible and faster charging. For automotive companies to answer these demands, they now look to the utilities providers who must make it all possible.

Perficient’s deep expertise in automotive and utilities is helping organizations in both industries to improve their capabilities and offerings in the electric vehicle charging space and collaborate to scale rapid charging, improve electric vehicle ownership and charging experience, and ensure resiliency in the face of increased energy demand.

Transformation From Rate Payer to Customer, Provider to Advisor

The automotive industry is not the only market introducing new challenges for utilities providers. Utilities companies have experienced larger-than-average debt collection deficits, as indicated in their 10-K financial reports. At the same time, consumers are coming to expect from their utilities providers the experiences that other industries, such as retail and financial services, are innovating. Historically, utilities customers have been thought of as rate payers, but stellar customer experiences delivered through mobile apps, loyalty and rewards programs, immersive virtual experiences, and intuitive virtual assistants have challenged the industry to rethink its interface with customers.

To tackle this change, energy companies are looking outside their industry, such as to big box retail stores and successful financial institutions, for answers. Many are adopting a digital experience mindset to develop major applications, customer-facing portals, and consumer-first programs. In response, Perficient approaches collaboration with these organizations with cross-industry solutions.

We were able to replicate and apply a compassion-based debt collection strategy within utilities, greatly assisting companies whose collections were suffering. By reaching out to late accounts in humane ways, such as text messages, these utilities companies saw greater success in securing timely payments and collections.

“The shift towards digital-first utilities requires cross-industry solutioning. This sector looks to Perficient for our ability to digitally transform and share solutions across our entire industry portfolio.”– Sean McGrath, Principal, Energy and Utilities

Our team of experts is constantly working towards the next innovative solution for the problems of tomorrow. Learn more about our energy and utilities industry expertise.

 

Perficient Recognized as Oil and Gas Industry Provider Transforming Leading Companies

In the face of electrification, evolving consumer behavior and expectations, sustainability initiatives, regulatory pressures, and geopolitical volatility, oil and gas companies are being challenged to shift their approach and innovate to stay competitive. While there’s a continued focus on the digital experience for customers, especially in the downstream sector, companies are also pressured to address ESG policies and reporting from production through the transport and sale of their products. Developing plans to utilize emerging technologies with data-driven approaches remains integral; however, companies are executing on all of this while weaving through one merger and acquisition after another.

We are excited to announce that Perficient was recently recognized by a leading global technology research and advisory firm’s report highlighting notable oil and gas industry consultancies in the U.S. and Europe. Perficient experts have worked closely with organizations within the industry to overcome challenges and gain a competitive advantage with digital transformation.

“A key differentiator for Perficient is approaching each challenge with a deep understanding of the oil and gas industry while also tapping into innovative solutions that have secured real results in other industries.” – John Latham, GM, Houston

By keeping a pulse on the ever-changing trends and pain points within the industry, maintaining cutting-edge capabilities in technology, and conducting first-party research to inform strategy, we deliver results-driven solutions that our partners are seeking.

Data Analytics and App Development for Improved Worker Safety

As in many industries, oil and gas companies are not immune to siloed and inaccessible data. We help these organizations access, consolidate, and manage that information easily. We’ve completed numerous app development projects, such as shift handover applications, and have integrated many worker safety programs, including a system that monitors gas within trucks without the need to open lids and send personnel onto dangerous catwalks.

Streamlined Transitions Throughout Mergers and Acquisitions

Over the years, we’ve helped oil and gas companies navigate the growing number of mergers and acquisitions in the industry. When one company acquires another, they want system integration as quickly as possible. Post-merger integration, supply chain and logistics, supplier management, and standardizing systems across processes are playbooks we’ve written for not just oil and gas, but every industry we’ve worked in. Further, the abundance of data that occurs due to mergers is something we expertly handle to prevent further siloing.

Cross-Industry Solutions in Oil and Gas

Oil and gas companies are stretching beyond their role as service providers to act as retailers and manufacturers. They are beginning to delve into solutions like loyalty programs and hiring executives from Target and other big box retail environments. Gas stations are now mini supermarkets striving to increase foot traffic and the size of customer baskets.  Further, all eyes are on the automotive industry as energy companies are attempting to predict the demand for gasoline and what it would look like to provide electric vehicle charging stations.

Our work across industries has made us a trusted partner and resource for these organizations hoping to build on strategies and insights from other markets. Our inclusion in this report reflects the countless hours devoted to our partnerships and understanding the work that matters to them so that we deliver real results.

Learn more about Perficient’s energy and utilities expertise.

 

 

 

Quantum Computing and Cybersecurity: Preparing for a Quantum-Safe Future

Quantum computing is rapidly transitioning from theory to reality, using the principles of quantum mechanics to achieve computational power far beyond traditional computers. Imagine upgrading from a bicycle to a spaceship—quantum computers can solve complex problems at extraordinary speeds. However, this leap in computing power poses significant challenges, particularly for cybersecurity, which forms the backbone of data protection in our digital world.

The Quantum Revolution and its Impact on CyberSecurity

Today’s cybersecurity relies heavily on encryption, converting data into secret codes to protect sensitive information like passwords, financial data, and emails. Modern encryption rests on complex mathematical problems that even the fastest supercomputers would take thousands of years to solve; cryptography operates on the assumption that classical computers cannot break these codes. Quantum computers could change this model. With their immense power, they may be able to crack these algorithms in hours or even minutes. This possibility is alarming, as it could make current encryption techniques obsolete, putting businesses, governments, and individuals at risk.

The Risks for Businesses and Organizations

Quantum computing introduces vulnerabilities that could disrupt how organizations secure their data. Once quantum computers mature, bad actors and cyber criminals could exploit the following key risks:

  1. Fraudulent Authentication: Attackers could bypass secure systems and gain unauthorized access to applications, databases, and networks.
  2. Forgery of Digital Signatures: This could enable hackers to forge digital signatures, tamper with records, and compromise the integrity of blockchain assets, audits, and identities.
  3. Harvest-Now, Decrypt-Later Attacks: Hackers might steal encrypted data today, store it, and wait until quantum computers mature to decrypt it. This approach poses long-term threats to sensitive data.

Solutions to Achieve Quantum Safety

Organizations must act proactively to safeguard their systems against quantum threats. Here’s a three-step approach recommended by experts in the field:

1. Discover

  • Identify all cryptographic elements in your systems, including libraries, methods, and artifacts in source and object code.
  • Map dependencies to create a unified inventory of cryptographic assets.
  • Establish a single source of truth for cryptography within your organization.

2. Observe

  • Develop a complete inventory of cryptographic assets from both a network and application perspective.
  • Analyze key exchange mechanisms like TLS and SSL to understand current vulnerabilities.
  • Prioritize assets based on compliance requirements and risk levels.

3. Transform

  • Transition to quantum-safe algorithms and encryption protocols.
  • Implement new quantum-resistant certificates

Alongside these steps, we need to make sure we follow a process that achieves crypto-agility. Crypto-agility means reducing the burden on development and operational environments so that moving from old algorithms to new ones does not disrupt existing systems and applications. In short, crypto-agility can be delivered as a set of service capabilities, spanning encryption, key lifecycle management, and certificate management, that are quantum safe. Whenever a business application needs new encryption, a new certificate, or a new key, it can simply make an API call.
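To make the idea concrete, here is a minimal, purely illustrative Java sketch of such an abstraction. The interface, the AES-GCM implementation, and the notion of a future quantum-safe implementation are assumptions for this example, not a specific vendor or library API:

```java
import java.nio.ByteBuffer;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Application code depends only on this interface, so the algorithm behind it
// can be swapped (for example, for a quantum-safe scheme) without code changes.
interface EncryptionService {
    byte[] encrypt(byte[] plaintext) throws Exception;
    byte[] decrypt(byte[] ciphertextWithIv) throws Exception;
}

// Classical implementation backed by AES-GCM. A hypothetical
// QuantumSafeEncryptionService could later implement the same interface.
class AesGcmEncryptionService implements EncryptionService {
    private static final int IV_BYTES = 12;
    private static final int TAG_BITS = 128;
    private final SecretKey key;
    private final SecureRandom random = new SecureRandom();

    AesGcmEncryptionService() throws Exception {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(256);
        this.key = generator.generateKey();
    }

    public byte[] encrypt(byte[] plaintext) throws Exception {
        byte[] iv = new byte[IV_BYTES];
        random.nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ciphertext = cipher.doFinal(plaintext);
        // Prepend the IV so decrypt() can recover it later.
        return ByteBuffer.allocate(iv.length + ciphertext.length).put(iv).put(ciphertext).array();
    }

    public byte[] decrypt(byte[] ciphertextWithIv) throws Exception {
        ByteBuffer buffer = ByteBuffer.wrap(ciphertextWithIv);
        byte[] iv = new byte[IV_BYTES];
        buffer.get(iv);
        byte[] ciphertext = new byte[buffer.remaining()];
        buffer.get(ciphertext);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        return cipher.doFinal(ciphertext);
    }
}
```

Code that calls only EncryptionService would not need to change when the AES-backed implementation is replaced, which is the essence of crypto-agility.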

The Role of Technology Leaders in Quantum Safety

Leading technology companies are making strides to address quantum challenges:

  • IBM: Developing advanced quantum systems and promoting quantum-safe encryption.
  • Google: Advancing quantum computing through its Quantum AI division, with applications in cryptography and beyond.
  • Microsoft: Offering access to quantum resources via its Azure Quantum platform, focusing on securing systems against future threats.
  • Intel and Honeywell: Investing in quantum hardware and research collaborations to tackle cybersecurity challenges.
  • Startups: Companies like Rigetti Computing and Post-Quantum are innovating quantum-resistant encryption solutions.

What Can Be Done Today?

  1. Adopt Quantum-Safe Algorithms: Start transitioning to post-quantum cryptography to future-proof your systems.
  2. Raise Awareness and Invest in Research: Educate stakeholders about quantum computing risks and benefits while fostering innovation in quantum-safe technologies.
  3. Collaborate Across Sectors: Governments, businesses, and tech leaders must work together to develop secure, quantum-resilient systems.

Conclusion

Quantum computing holds incredible promise but also presents unmatched risks, particularly to cybersecurity. While quantum computers won’t break the internet overnight, organizations must act now to prepare for this transformative technology. By adopting quantum-safe practices and embracing innovation, we can secure our digital future in the face of quantum challenges.

The risk of using String objects in Java

If you are a Java programmer, you may have been following an insecure practice without knowing it. We all know (or should know) that it is not safe to store unencrypted passwords in the database, because that might compromise the protection of data at rest. But that is not the only issue: if at any point our code holds an unencrypted password or other sensitive data in a String variable, even temporarily, there could be a risk.

Why is there a risk?

String objects were not created to store passwords; they were designed to optimize space in our programs. String objects in Java are “immutable,” which means that after you create a String object and assign it some value, you cannot remove or modify that value. I know you might be thinking that this is not true, because you can assign “Hello World” to a given String object and in the following line assign it “Goodbye, cruel world,” and that is technically correct. The problem is that the “Hello World” you created first keeps living in the String pool even if you can no longer see it.

What is the String pool?

Java uses a special memory area called the String pool to store String literals. When you create a String literal, Java checks the String pool first to see if an identical String already exists. If it does, Java reuses the reference to the existing String, saving memory. This means that if you create 25,000 String objects and all of them have the value “Michael Jackson,” only one String literal is stored in memory and all the variables point to the same one, optimizing the space in memory.
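As a tiny illustration of this behavior (the variable names and values below are only for demonstration), two identical literals resolve to the same pooled instance, while an explicitly constructed String creates a separate heap object:

```java
public class PoolDemo {
    public static void main(String[] args) {
        String a = "Michael Jackson";
        String b = "Michael Jackson";
        System.out.println(a == b);                              // true: both point to the pooled literal
        System.out.println(a == new String("Michael Jackson"));  // false: a separate heap object
    }
}
```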

Ok, the object is in the String pool, where is the risk?

The String Object will remain in memory for some time before being deleted by the garbage collector. If an attacker has access to the content of the memory, they could obtain the password stored there.

Let’s see a basic example of this. The following code creates a String object and assigns it a secret password: “¿This is a secret password”. Then that same object is overwritten three times, and the Instances Inspector of the debugger will help us locate String objects starting with the character “¿”.

Example 1 Code:

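The original code screenshot is not reproduced here, so the following is a reconstruction of what Example 1 might look like, based on the description above; the overwrite values are assumptions:

```java
public class Example1 {
    public static void main(String[] args) {
        String a = "¿This is a secret password"; // secret stored in a String literal
        a = "first overwrite";
        a = "second overwrite";
        a = "third overwrite";
        a = null; // the reference is cleared here...
        // ...but every previous literal, including the password, still lives in
        // the String pool, where a memory dump could reveal it.
    }
}
```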

Example 1 Debugger:


As you can notice in the image, when the debugger reaches line 8, even after having changed the value of the String variable “a” three times and set it to null at the end, all the previous values remain in memory, including our “¿This is a secret password”.

 

Got it. So just avoiding String variables will solve the problem, right?

It is not that simple. Let us consider a second example. Now we are smarter, and we are going to use a char array to store the password instead of a String, to avoid having it saved in the String pool. In addition, rather than having the secret password as a literal in the code, it will be kept unencrypted in a text file; storing it unencrypted is not recommended, but we will do it for this example. A BufferedReader will support reading the contents of the file.

Unfortunately, as you will see, the password still ends up in the String pool.

Example 2 Code:

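Again, the original screenshot is not reproduced here; this reconstruction of Example 2 is based on the description, with the file name "password.txt" assumed for illustration:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Arrays;

public class Example2 {
    public static void main(String[] args) throws IOException {
        char[] password;
        try (BufferedReader reader = new BufferedReader(new FileReader("password.txt"))) {
            // readLine() returns a String, so the secret still lands in the String pool
            // even though we immediately convert it to a char array.
            password = reader.readLine().toCharArray();
        }
        // ... use the password ...
        Arrays.fill(password, '0'); // wiping the array does not remove the pooled String
    }
}
```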

Example 2 Debugger:

 


This case is even more puzzling because the code never explicitly creates a String object. The problem is that BufferedReader.readLine() temporarily returns a String object, so the content with the unencrypted password will remain in the String pool.

What can I do to solve this problem?

In this last example, the unencrypted password is still stored in a text file, and we again use a BufferedReader to read the contents of the file. But instead of using BufferedReader.readLine(), which returns a String, we use BufferedReader.read(), which stores the content of the file in a char array. As the debugger’s screenshot shows, this time the file’s contents are not present in the String pool.

Example 3 Code:

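As before, this is a reconstruction of Example 3 based on the description (the file name and buffer size are assumptions); read() fills a char array directly, so no String is ever created for the secret:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Arrays;

public class Example3 {
    public static void main(String[] args) throws IOException {
        char[] buffer = new char[256];
        int length;
        try (BufferedReader reader = new BufferedReader(new FileReader("password.txt"))) {
            length = reader.read(buffer); // number of characters actually read
        }
        // ... use buffer[0 .. length - 1] as the password ...
        Arrays.fill(buffer, '0'); // overwrite the secret once it is no longer needed
    }
}
```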

Example 3 Debugger:


In summary

To solve this problem, consider following the principles listed below:

  1. Do not create String literals with confidential information in your code.
  2. Do not store confidential information in String objects. You can use other types of Objects to store this information such as the classic char array. After processing the data make sure to overwrite the char array with zeros or some random chars, just to confuse attackers.
  3. Avoid calling methods that will return the confidential information as String, even if you will not save that into a variable.
  4. Consider applying an additional security layer by encrypting confidential information. The SealedObject in Java is a great alternative to achieve this (see the sketch below). A SealedObject is a Java object in which you can store sensitive data: you provide a secret key, and the object is encrypted and serialized. This is useful if you want to transmit it and ensure the content remains unexposed. Afterward, you can decrypt it using the same secret key. Just one piece of advice: after decrypting it, please do not store the result in a String object.
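A minimal sketch of that last point, assuming AES with its default transformation purely for brevity (a real application should choose the cipher mode and key management deliberately):

```java
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SealedObject;
import javax.crypto.SecretKey;

public class SealedObjectExample {
    public static void main(String[] args) throws Exception {
        // Secret built as a char array instead of a String literal (per principle 1).
        char[] secret = {'s', '3', 'c', 'r', '3', 't'};

        SecretKey key = KeyGenerator.getInstance("AES").generateKey();
        Cipher cipher = Cipher.getInstance("AES");
        cipher.init(Cipher.ENCRYPT_MODE, key);

        // The char array is encrypted and serialized inside the SealedObject.
        SealedObject sealed = new SealedObject(secret, cipher);

        // Later, with the same key, recover the content without going through a String.
        char[] recovered = (char[]) sealed.getObject(key);
        Arrays.fill(recovered, '0'); // wipe it when done
        Arrays.fill(secret, '0');
    }
}
```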

Perficient Named in Forrester’s App Modernization and Multicloud Managed Services Landscape, Q4 2024

As new technologies become available within the digital space, businesses must adapt quickly by modernizing their legacy systems and harnessing the power of the cloud to stay competitive. Forrester’s 2024 report recognizes 42 notable providers, and we’re proud to announce that Perficient is among them.

We believe our inclusion in Forrester’s Application Modernization and Multicloud Managed Services Landscape, Q4 2024 reflects our commitment to evolving enterprise applications and managing multicloud environments to enhance customer experiences and drive growth in a complex digital world.

With the demand for digital transformation growing rapidly, this landscape provides valuable insights into what businesses can expect from service providers, how different companies compare, and the options available based on provider size and market focus.

Application Modernization and Multicloud Managed Services

Forrester defines application modernization and multicloud managed services as:

“Services that offer technical and professional support to perform application and system assessments, ongoing application multicloud management, application modernization, development services for application replacements, and application retirement.”

According to the report,

“Cloud leaders and sourcing professionals implement application modernization and multicloud managed services to:

  • Deliver superior customer experiences.
  • Gain access to technical and transformational skills and capabilities.
  • Reduce costs associated with legacy technologies and systems.”

By focusing on application modernization and multicloud management, Perficient empowers businesses to deliver superior customer experiences through agile technologies that boost user satisfaction. We provide clients with access to cutting-edge technical and transformational skills, allowing them to stay ahead of industry trends. Our solutions are uniquely tailored to reduce costs associated with maintaining legacy systems, helping businesses optimize their IT budgets while focusing on growth.

Focus Areas for Modernization and Multicloud Management

Perficient has honed its expertise in several key areas that are critical for organizations looking to modernize their applications and manage multicloud environments effectively. As part of the report, Forrester asked each provider included in the Landscape to select the top business scenarios for which clients choose them, and from there determined the extended business scenarios that highlight differentiation among the providers. Of those extended application modernization and multicloud services business scenarios, Perficient self-reported three key scenarios for which clients work with us:

  • Infrastructure Modernization: We help clients transform their IT infrastructure to be more flexible, scalable, and efficient, supporting the rapid demands of modern applications.
  • Cloud-Native Development Execution: Our cloud-native approach enables new applications to leverage cloud environments, maximizing performance and agility.
  • Cloud Infrastructure “Run”: We provide ongoing support for cloud infrastructure, keeping applications and systems optimized, secure, and scalable.

Delivering Value Through Innovation

Perficient is listed among large consultancies with an industry focus in financial services, healthcare, and the manufacturing/production of consumer products. Additionally, our geographic presence in North America, Latin America, and the Asia-Pacific region was noted.

We believe that Perficient’s inclusion in Forrester’s report serves as another milestone in our mission to drive digital innovation for our clients across industries. We are proud to be recognized among notable providers and look forward to continuing to empower our clients to transform their digital landscapes with confidence. For more information on how Perficient can help your business with application modernization and multicloud managed services, contact us today.

Download the Forrester report, The Application Modernization And Multicloud Managed Services Landscape, Q4 2024, to learn more (link to report available to Forrester subscribers and for purchase).

Success Story: Enhancing Member Engagement with Marketing Cloud Personalization

Introduction 

In today’s digital age, personalization is key to engaging and retaining members. A prominent labor union representing a large number of educators in a major metropolitan public school system recognized this need and partnered with Perficient to implement a scalable Salesforce Marketing Cloud Personalization solution. This collaboration aimed to drive member engagement, increase portal adoption, and improve navigation for its members. 

About the Union 

This union represents a substantial number of teachers, classroom paraprofessionals, and various other school-based titles, including school secretaries, school counselors, and occupational and physical therapists. Additionally, the union represents family childcare providers, nurses, and employees at various private educational institutions and some charter schools. The union also has a significant number of retired members. 

The union’s central headquarters is located in a major city, with additional offices to assist members with certification, licensing, salaries, grievances, and pensions. Founded in the 1960s, the union is part of a larger national federation of teachers, a state-wide teachers’ organization, and is affiliated with major labor councils. 

The Challenge 

The union faced the challenge of effectively engaging its members and ensuring they were aware of the benefits and resources available to them. The goal was to create a more personalized and user-friendly experience on their member portal, which would lead to higher engagement and satisfaction. 

The Solution 

Perficient stepped in to provide a comprehensive solution that included: 

  1. Scalable Marketing Cloud Personalization Foundation: Establishing a robust foundation for ongoing personalization efforts. 
  2. High-Value Use Cases: Implementing three high-value use cases focused on educating site visitors about the benefits available through the union and the resources on the Member Hub.
  3. Knowledge Transfer Sessions: Enabling the union team to create additional personalization campaigns through detailed knowledge transfer sessions. 

What is Marketing Cloud Personalization? 

Marketing Cloud Personalization provides real-time, scalable, cross-channel personalization and AI to complement Marketing Cloud Engagement’s robust customer data, audience segmentation, and engagement platform. It uses tailored interactions with customers and prospects to increase loyalty, engagement, and conversions, delivering more relevant experiences across the customer journey. 

Unified Profiles 

Personalization helps understand each visitor by building a centralized individual profile from different data sources. This profile includes preferences and affinities, providing a visual representation of all data about a single visitor. This information helps decide how and when to best interact with them on their preferred channels. Profiles can also be rolled up to the account level to view relationships among visitor behaviors associated with the same account. 

The Results 

Through the personalized website experiences, the union saw a significant increase in member engagement, portal adoption, and website conversions. The scalable personalization foundation allowed them to continuously optimize and expand their targeted campaigns, ensuring ongoing improvements and relevance. 

The union’s Marketing Director praised the consultancy’s exceptional work, stating: 

“Perficient is one of the strongest partners I have ever worked with on strategy and implementation. The team was amazing from start to finish. Our product is live and running. I immediately secured the team to continue working on other projects so we wouldn’t lose these great resources.” 

This seamless partnership and effective personalization solution enabled the union to better serve its members and achieve its key engagement objectives. By leveraging Marketing Cloud Personalization, the union not only enhanced the user experience but also empowered its team to sustain and grow these efforts independently. 

The partnership between the union and Perficient showcases the power of personalized digital experiences in driving member engagement and satisfaction.  

About Our Salesforce Team

We are a Salesforce Summit Partner with more than two decades of experience delivering digital solutions in the manufacturing, automotive, healthcare, financial services, and high-tech industries. Our team has deep expertise in all Salesforce Clouds and products, artificial intelligence, DevOps, and specialized domains to help you reap the benefits of implementing Salesforce solutions.  

Want to learn more? Schedule some time with us  to explore Marketing Cloud Personalization! 

Custom Weather Forecast Model Using ML.NET

Nowadays, AI is a crucial field, with frameworks like ML.NET that can be used to build amazing applications alongside the pre-built models from cloud providers. It’s important to learn how these services work behind the scenes, how to create custom models, and how your application can interact with AI frameworks beyond just cloud providers or the source of the AI services.

How can I use ML Net?

ML.NET can be used with Visual Studio 2019 or later (any edition) or with Visual Studio Code, but it only works on a Windows OS. Its prerequisites are:

  • Visual Studio 2022 or Visual Studio 2019.
  • .NET Core 3.1 SDK or later.


Image 1: Visual Studio installer, Installation Details contains the ML Net Model builder


Image 2: Visual Studio Context Menu

After adding the ML Net component to your project, you can see a wizard that allows you to set up your model as you need (Image 3).


Image 3: ML NET Wizard

Application Overview

The application starts with the weather Groups, every item contains a temperature range, a button to search the Historical data, and a forecast prediction (Image 4).


Image 4: Weather forecast main page.

The source of those groups is a table named Weather with the attributes:

  • Id: primary key
  • Description: that is the group description, you can see it as the title of the cards in image 4
  • MinRange: Minimal temperature belongs to the group.
  • MaxRange: Maximum temperature to belongs to the group.

The “History” button shows a table with all the historical data, paginated. The historical data contains the date in yyyy-mm-dd format, the temperature, and whether the day was cloudy (Image 5).

 


Image 5: Weather forecast historical page.

The predict option allows users to generate their own prediction using ML.NET through an API endpoint; the input data is the number of days from today for which the user wants a prediction and whether that day will be cloudy (Image 6).


Image 6: Prediction page

The API result is the date, the group, and the probability that the date will belong to that group; it also shows a table with the probability for every group.

Model

In the real world, there are lots of variables to keep in mind if you want to implement a Weather Forecast prediction app, such as wind speed, temperature, the season, humidity, if it was cloudy, etc.(2)

The scope of this approach is to see how ML.NET can handle a custom model; therefore, a simple custom model was created based on the temperature, the season, and whether the day is cloudy. The model uses the weather as a grouping of different forecasts, so the custom training model was designed as follows (Image 7):

  • Weather (Id): Every group has an ID, so the label to predict is the Weather ID.
  • Date: the date feature related to the weather observation.
  • IsCloudy: a Boolean feature that indicates the relationship between the weather and clouds.
  • Season (Id): a feature that indicates the relationship between the weather and the season (every season has an ID).


Image 7: Training data section from ML Net wizard

You can get the data from files or SQL Server databases; in this case, the data was collected from a view in SQL Server.

Project Architecture Overview

The weather forecast solution has two sites, a front-end and a back-end, and the data is stored in a SQL Server database (Image 8). With this overall approach, the system was designed to separate the responsibilities of the business logic, the data, and the user experience.


Image 8: Sites and database

Front-end

You can find the app repository on GitHub using the following URL: https://github.com/joseflorezr/trainingangularversion

The front-end repository contains an Angular 18 solution, which uses Angular Material to help improve the user experience and routing for navigation. The solution contains the following components (Image 9):

  • Forecast-header: The top component of the page, it shows the title with its style.
  • Forecast-prediction: Contains the form for weather predictions and shows the results.
  • Forecast results: Contains the historical data.
  • Weather: Shows the groups of weather forecasts
  • Services: Connects to the API to get weather, forecasts, and predictions
  • Model: interfaces that map with the API


Image 9: Front-end components

Back-end

You can find the app repository on GitHub using the following URL: https://github.com/joseflorezr/WebApiMlNetWeatherForecast.


Image 10: Back End components

The API solution contains  the following projects:

  • TestWebAPi: Web API with the endpoints, contains 3 controllers, Weather, forecast, and WeatherForecast. WeatherForecast is an abstract class with the logger and the use case reference injection.
  • Business: Contains the classes that contain the business logic, based on the Use Case approach(4)
  • Model: It is the abstraction of the domain objects like Weather, Forecast, Season, and predicted forecast
  • Data: This library contains 2 parts:
    • The integration at the data level, the context with Entity Framework to get to the database.
    • The integration with ML.NET: after it is added to the solution, several support files are scaffolded with the same name but different descriptions; in this case, the file is MLForecastModel:
      • mbconfig: contains the wizard that helps to change the settings.
      • consumption: a partial class that allows interaction with the model.
      • evaluate: a partial class that allows calculation of the metrics.
      • mlnet: this file contains the knowledge base; it is important to ship this file with the API.
      • training: adds the training methods that support the creation of the file.

Database Project(3)

The data model abstracts the concepts of Weather and Season as master entities with their descriptions, while Forecast is the historical table that contains the information for a specific date (one row per day): the temperature, the season ID, and the weather ID.

Visual Studio includes a database project type that allows developers to create, modify, and deploy databases, and to run scripts after deployment. To create the ML.NET model, a view named WeatherForecast was used because it is easier to connect to the ML.NET wizard. Image 11 shows the relationship between the tables.


Image 11: Database diagram

Database projects can be deployed using the SQL Schema Compare tool, and a post-build script loads the data into the database model. For this app, a script was executed simulating forecast data from 1900-01-01 to 2024-06-04. The script uses random data, so the results will be different every time you populate the forecast table.

The WeatherForecast view consolidates the data used by ML.NET to create the model.

API Project

The API project exposes endpoints that support getting the groups (Weather Controller), getting the historical Forecast data (Forecast Controller), and predict (Forecast Controller)


Image 12:  Web API Swagger

Note: The ML.NET model file must be added as a resource of the API because, when the API uses the prediction functionality, the MLForecastModel class looks for the file at a specific path (which can be changed).


Image 13: ML Net file location

Model Project

Contains the DTOs that can be transferred to the front-end. Basically, the Weather entity has the group description and the temperature ranges, the Season contains the description and the starting and ending months, and the Forecast has the temperature, the date, whether the day was cloudy, and an ID. PredictedForecast inherits from Forecast, adding the score and the weather description (Image 14).


Image 14: Entities

Basically, ML.NET creates the MLForecastModel class, which contains the methods to use the prediction model (the result differs depending on the chosen scenario). In general terms, the idea is to send an Input object (defined by ML.NET) and receive results as follows:

  • For a single object, use the Predict method, it will return the score for the predicted label.
  • If you want to get the labels, use the GetLabels method, it will return all the labels as an IEnumerable.
  • If you want to evaluate all labels, PredictAllLabels is the method, it will return a sorted IEnumerable with key-value pairs (label and score)
  • If you want to map an unlabeled result, use the GetSortedScoresWithLabels, it will return a sorted IEnumerable with key-value pairs (label and score)

The PredictAsync method (Image 15) creates the input object, starting with the user input (ID, days, cloudy); it computes the projected date by adding the days and then finds the season ID based on the month (GetSeason method). After the input object is complete, the chosen method is PredictAllLabels. In this case, the label is a Weather ID, so the description for every given label had to be fetched from the database.


Image 15: PredictAsync Implementation

Summary

  • You can use ML NET to create your own Machine Learning models and use them as part of your API solution.
  • There are multiple options (scenarios) to choose from according to your needs.
  • Models can be created using diverse sources, such as Database objects, or files.

References

  1. https://learn.microsoft.com/en-us/dotnet/machine-learning/how-does-mldotnet-work
  2. https://content.meteoblue.com/en/research-education/specifications/weather-variables
  3. https://visualstudio.microsoft.com/vs/features/ssdt/
  4. https://medium.com/@pooja0403keshri/clean-architecture-a-comprehensive-guide-with-c-676beed5bdbb
  5. https://learn.microsoft.com/en-us/aspnet/core/fundamentals/dependency-injection?view=aspnetcore-8.0

 

 

Perficient Interviewed by Forrester: Steps to Develop A Manufacturing Operations Management Vision

Customers are demanding a personalized experience, and manufacturing is no different. One-size-fits-all manufacturing is now becoming a thing of the past. Today, we are seeing a decline in the mass production of identical products in exchange for personalization and niche products. As consumers intensify their expectations for on-demand personalization and delivery within days of purchase, marketplaces that offer wide ranges of options without sacrificing price or availability, like Amazon, have been capitalizing on this shift in customer behavior.

To achieve low costs while offering high availability and personalized products, manufacturers must alter their operations to be flexible, focusing on creating a wider range of products that share common underlying components or production processes. Perficient is committed to advising manufacturers as they face these new challenges outlined in Forrester’s recent report, “Key Steps to Develop Your Manufacturing Operations Management Vision.”

Our Manufacturing Operations Management Capabilities

Forrester interviewed several service providers and manufacturers to gain a holistic understanding of the current state of manufacturing operations management (MOM). The report outlines the importance of achieving variety while maintaining low cost and high availability to thrive in this new landscape. Ultimately, Forrester concluded that exploiting economies of scope – where the unit price of a product decreases as the variety increases – is the solution.

According to Forrester, “To thrive in fragmented and restless markets, manufacturers must compete on economies of scope, sharing fixed costs between multiple product or asset variants to deliver the innovation, choice, and personalization that will win, serve, and retain customers.” Further, Forrester stated, “… to compete with innovation, manufacturers must modernize their supply chain and manufacturing operations to manage — at scale and velocity — the digital thread that links design, manufacture, and the ongoing operation and maintenance of assets… and boost manufacturing execution system (MES) interoperability with enterprise resource planning (ERP) scheduling solutions…”

We believe Perficient is uniquely poised as an end-to-end digital consultancy with deep industry expertise to partner with manufacturing brands embarking on this journey. Our established practices, from supply chain to commerce, offer a customized strategy that meets the organization in its current state and execution that secures long-term success. Our supply chain experts routinely help organizations in sales and operations planning, strategic sourcing and spend control, procure to pay, inventory and materials management, operations continuation, risk management, end-to-end supply chain visibility, and more.

Finally, our partnerships with the other technology providers listed in Forrester’s report, such as Oracle and SAP, further equip us with the expertise needed to help brands navigate their modernization.

Perficient’s Manufacturing Industry Expertise

Forrester interviewed leaders from Perficient’s manufacturing and supply chain teams while researching this report. Kevin Espinosa, manufacturing industry lead at Perficient, remarked:

“We believe our participation as a company interviewed for the report on Manufacturing Operations Management is a testament to how critical our industry expertise is for manufacturing companies working to transform their operations and meet their customers needs.”

Perficient is excited to continue to share thought leadership and perspective on emerging trends in manufacturing operations management. For more information, download “Key Steps to Develop Your Manufacturing Operations Management Vision,” (available for purchase or to Forrester subscribers) or contact our manufacturing and supply chain experts today.

Perficient Recognized as a Major Player in IDC MarketScape for Cloud Professional Services

Navigating the complexities of cloud technology requires an exceptional partner. We are thrilled to announce that Perficient has been named a Major Player in the IDC MarketScape: Worldwide Cloud Professional Services 2024 Vendor Assessment (Doc #US51406224, June 2024).

What Does This Inclusion Mean for Perficient?

“We’re honored to be recognized as a Major Player in this IDC MarketScape Report, a distinction we believe highlights our holistic approach to cloud strategy and our implementation expertise,” said Glenn Kline, Perficient’s Area Vice President of Product Development Operations. “We combine our Envision Framework, migration and modernization expertise, and our strong network of partnerships with leading cloud providers to drive measurable business outcomes for our customers. Our Agile-ready global team enables businesses to think big, start small, and act fast so they can scale their cloud ecosystem over time and deliver on the outcomes promised by cloud computing.”

According to the IDC MarketScape, businesses should “consider Perficient if [they] are looking for a midsized cloud services provider that can combine client intimacy with industrial-strength capabilities in technology transformation and experience design and build.” Additionally, our global managed services group has created comprehensive accelerators such as the App Modernization IQ, Cloud FinOps IQ, and Green Impact IQ, serving as effective tools for guiding clients in cloud operations strategies.

What Does This Mean for Our Clients?

We believe this inclusion reaffirms Perficient as a trusted partner in cloud transformation. Perficient Cloud, our comprehensive suite of six solution areas, serves as a roadmap to navigate the evolving landscape of cloud technology. These areas focus on delivering critical business and technology capabilities, with agnostic offers and accelerators tailored to meet the unique needs of each client. Our Agile-ready global team enables businesses to think big, start small, and act fast, allowing scalable cloud ecosystems that maximize investment. Our focus areas include:

  • Technology Modernization: Enhancing performance and efficiency through updated infrastructure.
  • Product Differentiation: Creating innovative product offerings that stand out.
  • Customer Engagement: Improving interactions and experiences with personalized, data-driven approaches.
  • Data & AI Enablement: Driving insights and innovation with advanced analytics and AI.
  • Automation & Operational Agility: Boosting efficiency with automation solutions.
  • Sustainable Practices: Promoting responsible and impactful cloud strategies.

Join Us on Our Cloud Journey

We believe our inclusion in the IDC MarketScape report highlights our commitment to helping businesses navigate the complexities of cloud transformation. We are dedicated to delivering top-tier cloud solutions that drive growth and innovation.

To learn more about Perficient’s cloud professional services, download the IDC MarketScape: Worldwide Cloud Professional Services 2024 Vendor Assessment report available to IDC subscribers and for purchase. You can also read our News Release for more details on this recognition.

 

Retrieve Your Application Data Using AWS ElastiCache

AWS ElastiCache is a service that improves web application performance by retrieving information from fast, managed, in-memory caches.

What is Caching?

Caching is the process of storing data in a cache, a temporary storage area. Caches are optimized for fast retrieval, with the trade-off that the data is not durable.

The cache is used for read purposes, so your application can access frequently used data promptly.

ElastiCache supports the following two popular open-source in-memory caching engines:

  • Memcached: A high-performance, distributed memory object caching system well-suited for use cases where simple key-value storage and retrieval are required.
  • Redis: An open-source, in-memory key-value store that supports various data structures such as strings, hashes, lists, sets, and more. Redis is often used for caching, session management, real-time analytics, and messaging.

Which Caching Engine is Best?

Redis has more advanced features than Memcached. It is a data structure server that stores data in a key-value format so it can be served quickly, and it supports replication, clustering, and configurable persistence. It is recommended if you want a highly scalable data store shared by multiple processes, applications, or servers, or simply as a caching layer.

On the other hand, Memcached is an in-memory key-value store for small chunks of data produced by database calls, API calls, or page rendering.

Memcached is typically used to speed up dynamic web applications.

Both caching engines have their own uses depending on your requirements. Here, we are going to use Redis.

Architecture Diagram of  AWS ElastiCache


According to the architecture diagram, whenever a user generates a read request, the information is first searched for in ElastiCache. If the data is not available in the cache, the request is served from the database.

If the requested data is present in the cache, the reply is very quick; otherwise, the database is responsible for serving the request, which increases latency.
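A minimal sketch of this read path (often called the cache-aside pattern) using the open-source Jedis client for Redis is shown below; the endpoint, key naming, TTL, and loadFromDatabase() are illustrative assumptions, not part of the ElastiCache API:

```java
import redis.clients.jedis.Jedis;

public class ProductCache {
    // Replace with your cluster's primary endpoint; the client must run inside the VPC.
    private final Jedis jedis = new Jedis("master.cluster-example.use2.cache.amazonaws.com", 6379);

    public String getProduct(String productId) {
        String cacheKey = "product:" + productId;
        String cached = jedis.get(cacheKey);
        if (cached != null) {
            return cached;                           // cache hit: served from ElastiCache
        }
        String fromDb = loadFromDatabase(productId); // cache miss: fall back to the database
        jedis.setex(cacheKey, 300, fromDb);          // keep it for 5 minutes for the next reader
        return fromDb;
    }

    private String loadFromDatabase(String productId) {
        // Placeholder for the real database lookup.
        return "product-" + productId;
    }
}
```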

Why We Need AWS ElastiCache

  • Performance Improvement: ElastiCache stores frequently used data from the database, which helps to improve the Application’s performance.
  • Scalability: ElastiCache can quickly help set up, manage, and scale distributed in-memory cache in the cloud.
  • High Availability and Reliability: ElastiCache supports multi-AZ functionality, which means if one AZ is unavailable, ElastiCache continues to serve data in the other AZ. ElastiCache supports replication and provides automatic failover, in which if the master node fails, one of the read replicas promotes itself as a master node. This is particularly crucial for critical applications that require constant uptime.
  • Cost-Effectiveness: With ElastiCache, There is no upfront cost or long-term commitment. You just pay a simple monthly charge for each Redis node you use. By offloading traffic from databases to cache layers, ElastiCache helps reduce the workload on your databases.
  • Security: ElastiCache comes with various security features, including encryption in transit and at rest, identity and access management (IAM) policies, and integration with Amazon Virtual Private Cloud (VPC), helping to protect your cached data.
  • Compatibility: ElastiCache is compatible with a variety of popular frameworks and libraries, so it is easy to integrate with existing applications.

Use Cases of AWS ElastiCache

  • Chat Application
  • Media Streaming
  • Session store
  • Gaming Leaderboards
  • Real-time analytics
  • Caching

Deployment of  AWS ElastiCache using CloudFormation Template

Let’s deploy AWS ElastiCache (Redis) using an IaC tool, AWS CloudFormation.

Step 1: Create a Stack in AWS CloudFormation and upload a Template file.

Template file: This template file contains the CloudFormation code that deploys AWS ElastiCache.

Note: Repository link to download the template file: https://github.com/prafulitankar/AWS-Elasticache


Step 2: Enter the stack name and parameter values. Here, we have provided the CloudFormation stack name (Elasticache-01) and the parameter values, which define the configuration of the AWS ElastiCache cluster.



Step 3: Once we’re done with the parameter values, let’s configure the remaining stack options and provide the tags and permissions for the cluster.


Step 4: Configure the stack failure options. Here we have two choices:

  • Preserve successfully provisioned resources: When the stack fails, it preserves all the resources that were successfully created.
  • Delete all newly created resources: If the stack fails, it rolls back, keeping all the resources that existed before the deployment and deleting every resource created by this stack during the rollback.


Once we Submit all the necessary information, the CloudFormation stack will start creating the AWS ElastiCache Cluster.

Now, our AWS ElastiCache Cluster is available.


How to Access AWS ElastiCache

  • The AWS ElastiCache cluster must be deployed in a VPC.
  • Port 6379 must be allowed in the security group from the source IP from which we access the ElastiCache cluster.
  • To access the cluster, use the primary endpoint (master.cluster-test-001.flihgf.use2.cache.amazon.com:6379), as in the connection sketch below.
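As a quick connectivity check, here is a small sketch using the open-source Jedis client against the primary endpoint shown above; it assumes the client runs inside the VPC and that the security group allows inbound traffic on port 6379:

```java
import redis.clients.jedis.Jedis;

public class ElastiCacheSmokeTest {
    public static void main(String[] args) {
        // Endpoint copied from the example above; substitute your cluster's primary endpoint.
        try (Jedis jedis = new Jedis("master.cluster-test-001.flihgf.use2.cache.amazon.com", 6379)) {
            jedis.set("healthcheck", "ok");
            System.out.println("Read back: " + jedis.get("healthcheck"));
        }
    }
}
```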

By caching data in AWS ElastiCache, we can speed up application performance in a way that is cost-effective, secure, and highly available, while reducing overhead and latency on the database.

Part 1: An Overview of the PDFBox Library

Apache PDFBox is a versatile open-source library designed to work with PDF documents. It is widely used in various Java applications to create, modify, extract, and print PDF documents. In this part, we will provide a theoretical overview of the PDFBox library, highlighting its key features, components, and typical use cases.

Key Features of PDFBox

  1. PDF Creation

PDFBox allows developers to create new PDF documents programmatically. You can add text, images, and other graphical elements to the pages of a PDF.

  2. PDF Modification

With PDFBox, you can modify existing PDF documents. This includes adding or removing pages, altering the content of existing pages, and adding annotations or form fields.

  3. Text Extraction

Text extraction is one of PDFBox’s most powerful capabilities. This is especially helpful for converting PDFs to other formats, such as HTML or plain text, or for indexing and searching PDF content.

  4. Image Extraction

PDFBox provides functionality to extract images from PDF documents. This is useful when validating images within PDFs or reusing images in other applications.

  5. Form Handling

PDFBox supports interactive PDF forms (AcroForms). You can create new forms, fill existing forms, and extract data from filled forms.

  6. PDF Rendering

PDFBox includes rendering capabilities, allowing you to convert PDF pages to images. This is useful for displaying PDF content in applications that do not natively support PDF viewing.

  7. Encryption and Decryption

PDFBox supports PDF document encryption and decryption. You can secure your PDFs with passwords and manage user permissions for viewing, printing, and editing.

Components of PDFBox

  1. PDDocument

The PDDocument class represents an in-memory PDF document. It is the starting point for most PDF operations in PDFBox.

  2. PDPage

The PDPage class represents a single page in a PDF document. You can add content to a page, extract content from a page, and manipulate the page layout.

  3. PDPageContentStream

The PDPageContentStream class is used to write content to a PDPage, including text, images, and graphical elements.

  4. PDFTextStripper

The PDFTextStripper class is used for text extraction. It processes a PDDocument and extracts text content from it.

  5. PDFRenderer

The PDFRenderer class is used to render PDF pages into images. This is useful for displaying PDF pages in applications or for generating thumbnails.

  6. PDImageXObject

The PDImageXObject class represents an image within a PDF document. You can use it to extract or add new images to a PDF.

  7. PDAcroForm

The PDAcroForm class represents the interactive form fields in a PDF. It allows you to manipulate form data programmatically.

Typical Use Cases for PDFBox

  1. Generating Reports

Businesses often need to generate dynamic reports in PDF format. PDFBox can be used to create customized reports with text, tables, images, and charts.

  2. Archiving Documents

PDFBox is useful for archiving documents in a standardized format. It can convert various document types into PDFs and manage large collections of PDF documents.

  3. Content Extraction and Indexing

PDFBox is frequently used for extracting text and metadata from PDFs for indexing and search purposes. This is valuable for building searchable archives and databases.

  4. Form Processing

Many applications require the handling of PDF forms. PDFBox can create, fill, and read form data, making it ideal for automating form processing tasks.

  5. PDF Security

With PDFBox, you can add security features to your PDF documents. This includes encrypting sensitive information and managing access permissions.

  6. Displaying PDFs

PDFBox’s rendering capabilities make it suitable for applications that need to display PDF content as images, such as in a thumbnail preview or a custom PDF viewer.

Conclusion

The extensive functionality offered by Apache PDFBox makes working with PDF documents easier. Whether you want to create, edit, extract content from, or secure PDF files, PDFBox has the tools to get the job done quickly. Because it integrates naturally with Java, it is a great option for developers who need to handle PDF documents within their applications.

By understanding PDFBox’s features and components, you can get the most out of it in your projects and help ensure that PDF-related tasks are completed quickly and efficiently.

]]>
https://blogs.perficient.com/2024/06/25/part-1-an-overview-of-the-pdfbox-library/feed/ 0 364863
The Quest for Spark Performance Optimization: A Data Engineer’s Journey https://blogs.perficient.com/2024/06/18/the-quest-for-spark-performance-optimization-a-data-engineers-journey/ https://blogs.perficient.com/2024/06/18/the-quest-for-spark-performance-optimization-a-data-engineers-journey/#respond Tue, 18 Jun 2024 13:43:04 +0000 https://blogs.perficient.com/?p=364402

In the bustling city of Tech Ville, where data flows like rivers and companies thrive on insights, there lived a dedicated data engineer named Tara. With over five years of experience under her belt, Tara had navigated the vast ocean of data engineering, constantly learning and evolving with the ever-changing tides.
One crisp morning, Tara was called into a meeting with the analytics team at the company she worked for. The team had been facing significant delays in processing their massive datasets, which was hampering their ability to generate timely insights. Tara’s mission was clear: optimize the performance of their Apache Spark jobs to ensure faster and more efficient data processing.
The Analysis
Tara began her quest by diving deep into the existing Spark jobs. She knew that to optimize performance, she first needed to understand where the bottlenecks were. She started with the following steps:
1. Reviewing Spark UI: Tara meticulously analyzed the Spark UI for the running jobs, focusing on stages and tasks that were taking the longest time to execute. She noticed that certain stages had tasks with high execution times and frequent shuffling.

2. Examining Cluster Resources: She checked the cluster’s resource utilization. The CPU and memory usage graphs indicated that some of the executor nodes were underutilized while others were overwhelmed, suggesting an imbalance in resource allocation.

The Optimization Strategy
Armed with this knowledge, Tara formulated a multi-faceted optimization strategy:

1. Data Serialization: She decided to switch from the default Java serialization to Kryo serialization, which is faster and more efficient.
from pyspark import SparkConf  # import needed to build the configuration
conf = SparkConf().set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")

2. Tuning Parallelism: Tara adjusted the level of parallelism to better match the cluster’s resources. By setting `spark.default.parallelism` and `spark.sql.shuffle.partitions` to a higher value, she aimed to reduce the duration of shuffle operations.
conf = conf.set("spark.default.parallelism", "200")
conf = conf.set("spark.sql.shuffle.partitions", "200")
3. Optimizing Joins: She optimized the join operations by leveraging broadcast joins for smaller datasets. This reduced the amount of data shuffled across the network.
from pyspark.sql.functions import broadcast  # import needed for the broadcast hint
small_df = spark.read.parquet("hdfs://path/to/small_dataset")
large_df = spark.read.parquet("hdfs://path/to/large_dataset")
small_df_broadcast = broadcast(small_df)
result_df = large_df.join(small_df_broadcast, "join_key")

4. Caching and Persisting: Tara identified frequently accessed DataFrames and cached them to avoid redundant computations.
df = spark.read.parquet("hdfs://path/to/important_dataset").cache()
df.count()  # trigger an action so the cached data is materialized
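A related option, not part of Tara’s original change set, is to persist DataFrames that are too large for memory alone with an explicit storage level; a small illustrative sketch (the path is a placeholder):

from pyspark import StorageLevel

big_df = spark.read.parquet("hdfs://path/to/large_intermediate_dataset")
big_df.persist(StorageLevel.MEMORY_AND_DISK)  # spill partitions to disk when memory is tight
big_df.count()  # action that materializes the persisted data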

5. Resource Allocation: She reconfigured the cluster’s resource allocation, ensuring a more balanced distribution of CPU and memory resources across executor nodes.
conf = conf.set("spark.executor.memory", "4g")
conf = conf.set("spark.executor.cores", "2")
conf = conf.set("spark.executor.instances", "10")
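Putting the pieces together, the individual settings above would typically be collected into a single SparkConf and used to build the session. A condensed sketch of how that might look (the application name is illustrative):

from pyspark import SparkConf
from pyspark.sql import SparkSession

conf = (
    SparkConf()
    .setAppName("optimized-pipeline")  # illustrative name
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .set("spark.default.parallelism", "200")
    .set("spark.sql.shuffle.partitions", "200")
    .set("spark.executor.memory", "4g")
    .set("spark.executor.cores", "2")
    .set("spark.executor.instances", "10")
)

spark = SparkSession.builder.config(conf=conf).getOrCreate()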

The Implementation
With the optimizations planned, Tara implemented the changes and closely monitored their impact. She kicked off a series of test runs, carefully comparing the performance metrics before and after the optimizations. The results were promising:
– The overall job execution time reduced by 40%.
– The resource utilization across the cluster was more balanced.
– The shuffle read and write times decreased significantly.
– The stability of the jobs improved, with fewer retries and failures.
The Victory
Tara presented the results to the analytics team and the management. The improvements not only sped up their data processing pipelines but also enabled the team to run more complex analyses without worrying about performance bottlenecks. The insights were now delivered faster, enabling better decision-making and driving the company’s growth.
The Continuous Journey
While Tara had achieved a significant milestone, she knew that the world of data engineering is ever-evolving. She remained committed to learning and adapting, ready to tackle new challenges and optimize further as the data landscape continued to grow.
And so, in the vibrant city of Tech Ville, Tara’s journey as a data engineer continued, navigating the vast ocean of data with skill, knowledge, and an unquenchable thirst for improvement.

]]>
https://blogs.perficient.com/2024/06/18/the-quest-for-spark-performance-optimization-a-data-engineers-journey/feed/ 0 364402