Case Studies Articles / Blogs / Perficient | Expert Digital Insights
https://blogs.perficient.com/category/research-studies/case-studies/

Creators in Coding, Copycats in Class: The Double-Edged Sword of Artificial Intelligence
https://blogs.perficient.com/2025/12/03/creators-in-coding-copycats-in-class-the-double-edged-sword-of-artificial-intelligence/ | Thu, 04 Dec 2025

“Powerful technologies require equally powerful ethical guidance.” (Bostrom, N. Superintelligence: Paths, Dangers, Strategies. Oxford University Press, 2014).

The ethics of using artificial intelligence depend on how we apply its capabilities: either to enhance learning or to fall into irresponsible practices that compromise academic integrity. In this blog, I share reflections, experiences, and insights about the impact of AI in our environment, analyzing its role as a creative tool in the hands of developers and as a challenge within the academic context.

Between industry and the classroom

As a Senior Developer, my professional trajectory has led me to delve deeply into the fascinating discipline of software architecture. Currently, I work as a Backend Developer specializing in Microsoft technologies, facing the daily challenges of building robust, scalable, and well-structured systems in the business world.

Alongside my role in the industry, I am privileged to serve as a university professor, teaching four courses. Three of them are fundamental parts of the software development lifecycle: Software Analysis and Design, Software Architecture, and Programming Techniques. This dual perspective, as both a professional and a teacher, has allowed me to observe the rapid changes that technology is generating both in daily development practice and in the education of future engineers.

Exploring AI as an Accelerator in Software Development

One of the greatest challenges for those studying the software development lifecycle is transforming ideas and diagrams into functional, well-structured projects. I always encourage my students to use Artificial Intelligence as a tool for acceleration, not as a substitute.

For example, in the Software Analysis and Design course, we demonstrate how a BPMN 2.0 process diagram can serve as a starting point for modeling a system. We also work with class diagrams that reflect compositions and various design patterns. AI can intervene in this process in several ways:

  • Code Generation from Models: With AI-based tools, it’s possible to automatically turn a well-built class diagram into the source code foundation needed to start a project, respecting the relationships and patterns defined during modeling.
  • Rapid Project Architecture Setup: Using AI assistants, we can streamline the initial setup of a project by selecting the technology stack, creating folder structures, base files, and configurations according to best practices.
  • Early Validation and Correction: AI can suggest improvements to proposed models, detect inconsistencies, foresee integration issues, and help adapt the design context even before coding begins.

This approach allows students to dedicate more time to understanding the logic behind each component and design principle, instead of spending hours on repetitive setup and basic coding tasks. The conscious and critical use of artificial intelligence strengthens their learning, provides them with more time to innovate, and helps prepare them for real-world industry challenges.

But Not Everything Is Perfect: The Challenges in Programming Techniques

However, not everything is as positive as it seems. In “Programming Techniques,” a course that represents students’ first real contact with application development, the impact of AI is different compared to more advanced subjects. In the past, the repetitive process of writing code, such as creating a simple constructor public Person(), a method public void printFullName(), or practicing encapsulation in Java with methods like public void setName(String name) and public String getName(), kept the fundamental programming concepts fresh and clear while coding.
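As a reference, here is a minimal Java sketch of the kind of class students would write by hand; the Person class and its field are illustrative, not taken from any specific course material:

public class Person {
    // Private field: state is hidden behind accessors (encapsulation)
    private String name;

    // Simple no-argument constructor
    public Person() {
    }

    // Setter: controlled write access to the field
    public void setName(String name) {
        this.name = name;
    }

    // Getter: controlled read access to the field
    public String getName() {
        return name;
    }

    // Prints the full name to standard output
    public void printFullName() {
        System.out.println(name);
    }
}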

This repetition was not just mechanical; it reinforced their understanding of concepts like object construction, data encapsulation, and procedural logic. It also played a crucial role in developing a solid foundation that made it easier to understand more complex topics, such as design patterns, in future courses.

Nowadays, with the widespread availability and use of AI-based tools and code generators, students tend to skip these fundamental steps. Instead of internalizing these concepts through practice, they quickly generate code snippets without fully understanding their structure or purpose. As a result, the pillars of programming—such as abstraction, encapsulation, inheritance, and polymorphism—are not deeply absorbed, which can lead to confusion and mistakes later on.

Although AI offers the promise of accelerating development and reducing manual labor, it is important to remember that certain repetition and manual coding are essential for establishing a solid understanding of fundamental principles. Without this foundation, it becomes difficult for students to recognize bad practices, avoid common errors, and truly appreciate the architecture and design of robust software systems.

Reflection and Ethical Challenges in Using AI

Recently, I explained the concept of reflection in microservices to my Software Architecture students. To illustrate this, I used the following example: when implementing the Abstract Factory design pattern within a microservices architecture, the Reflection technique can be used to dynamically instantiate concrete classes at runtime. This allows the factory to decide which object to create based on external parameters, such as a message type or specific configuration received from another service. I consider this concept fundamental if we aim to design an architecture suitable for business models that require this level of flexibility.
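To make the idea concrete, here is a minimal, hedged Java sketch of a factory that uses reflection to instantiate a concrete class at runtime based on an external parameter. The class names and the type-to-class map are illustrative assumptions, not the exact code used in class:

import java.util.Map;

public class ReflectiveNotificationFactory {
    // Hypothetical mapping from a message type (e.g., received from another service)
    // to the fully qualified name of the concrete class to instantiate.
    private static final Map<String, String> TYPE_TO_CLASS = Map.of(
            "email", "com.example.notifications.EmailNotification",
            "sms", "com.example.notifications.SmsNotification");

    public Object create(String messageType) throws ReflectiveOperationException {
        String className = TYPE_TO_CLASS.get(messageType);
        if (className == null) {
            throw new IllegalArgumentException("Unknown message type: " + messageType);
        }
        // Reflection: load the class and invoke its no-argument constructor at runtime
        Class<?> clazz = Class.forName(className);
        return clazz.getDeclaredConstructor().newInstance();
    }
}

The factory decides which object to create from data it receives at runtime, which is exactly the flexibility described above.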

However, during a classroom exercise where I provided a base code, I asked the students to correct an error that I had deliberately injected. The error consisted of an additional parameter in a constructor—a detail that did not cause compilation failures, but at runtime, it caused 2 out of 5 microservices that consumed the abstract factory via reflection to fail. From their perspective, this exercise may have seemed unnecessary, which led many to ask AI to fix the error.

As expected, the AI efficiently eliminated the error but overlooked a fundamental acceptance criterion: that parameter was necessary for the correct functioning of the solution. The task was not to remove the parameter but to add it in the Factory classes where it was missing. Out of 36 students, only 3 were able to explain and justify the changes they made. The rest did not even know what modifications the AI had implemented.

This experience highlights the double-edged nature of artificial intelligence in learning: it can provide quick solutions, but if the context or the criteria behind a problem are not understood, the correction can be superficial and jeopardize both the quality and the deep understanding of the code.

I haven’t limited this exercise to architecture examples alone. I have also conducted mock interviews, asking about basic programming concepts. Surprisingly, even among final-year students who are already doing their internships, the success rate is alarmingly low: approximately 65% to 70% of the questions are answered incorrectly, which would automatically disqualify them in a real technical interview.

Conclusion

Artificial intelligence has become increasingly integrated into academia, yet its use does not always reflect a genuine desire to learn. For many students, AI has turned into a tool for simply getting through academic commitments, rather than an ally that fosters knowledge, creativity, and critical thinking. This trend presents clear risks: a loss of deep understanding, unreflective automation of tasks, and a lack of internalization of fundamental concepts—all crucial for professional growth in technological fields.

Various authors have analyzed the impact of AI on educational processes and emphasize the importance of promoting its ethical and constructive use. As Luckin et al. (2016) suggest, the key lies in integrating artificial intelligence as support for skill development rather than as a shortcut to avoid intellectual effort. Similarly, Selwyn (2019) explores the ethical and pedagogical challenges that arise when technology becomes a quick fix instead of a resource for deep learning.

References:

  • Bostrom, N. (2014). Superintelligence: Paths, Dangers, Strategies. Oxford University Press.
  • Luckin, R., Holmes, W., Griffiths, M., & Forcier, L. B. (2016). Intelligence Unleashed: An Argument for AI in Education. Pearson.
  • Selwyn, N. (2019). Should Robots Replace Teachers? AI and the Future of Education. Polity Press.

Driving Measurable Impact: Rochester Regional Health Earns Dual Industry Honors
https://blogs.perficient.com/2025/11/11/driving-measurable-impact-rochester-regional-health-access-to-care/ | Tue, 11 Nov 2025

Healthcare leaders face a critical mandate: deliver seamless, patient-centered experiences while boosting efficiency and measurable outcomes. Lasting transformation happens when strategy, data, technology, and experience converge—and Rochester Regional Health’s recent recognition proves what’s possible.

We’re proud to share that our work with Rochester Regional Health earned two 2025 eHealthcare Leadership Awards and the Sitecore Digital Impact Award for Business Impact, underscoring the power of strategic digital investments in healthcare.

Why This Matters for Healthcare Leaders

Patients expect frictionless access to care, personalized experiences, and real-time engagement. Our recent Access to Care research outlines how these priorities drive competitive advantage for healthcare organizations. More than 50% of respondents who encountered friction when scheduling an appointment took their care elsewhere. That’s not just lost revenue—it’s lost continuity, lost data, and lost trust. To deliver on consumers’ expectations, leaders need a unified digital strategy that connects systems, streamlines workflows, and gives consumers simple, reliable ways to find and schedule care.

Rochester Regional Health and Perficient embraced this challenge, consolidating dozens of disparate websites into one seamless experience and implementing a mobile-first design that mirrors the simplicity of modern commerce. The results speak volumes.


Sitecore Digital Impact Awards 2025

Business Impact

Sitecore Digital Impact Award: Business Impact | Recognized for removing friction and focusing on experience, this award demonstrates how digital transformation accelerates growth and improves care access. Sitecore shares, “The Business Impact winners remind us that digital transformation only matters when it delivers real results for people and the business. Rochester Regional Health and Perficient turned 24 disconnected websites into one seamless experience, helping patients get to the care they need faster.… These stories show what happens when great brands remove friction, focus on experience, and grow because of it.” | Learn more about this award


eHealthcare Leadership Award 2025 Winner: Best Mobile Experience

eHealthcare Leadership Award, Gold, Healthcare System | A mobile-first redesign delivers intuitive navigation, regional personalization, and real-time appointment scheduling, boosting accessibility, engagement, and conversions. This award recognizes the best examples of healthcare mobile experience, whether via installed app or mobile website via a browser. Judges evaluated usability, design, branding, quality of content, clarity of purpose and consumer ratings. | Learn more about this award


eHealthcare Leadership Award 2025 Winner

Best Use of Artificial Intelligence in Healthcare Marketing

eHealthcare Leadership Award, Distinction, Healthcare System | Rochester Regional’s new site offers smart search, dynamic filters, and real-time booking, making it easy for patients and their caregivers to discover and schedule care that best supports patient needs. It drove a 26% boost in appointment scheduling and $79K+ in monthly call center savings. This category awarded the successful application of AI and Machine Learning (ML) to achieve marketing goals, including customer acquisition and retention, online content personalization, digital experience, understanding user intent, physician search, call center optimization, and more. | Learn more about this award


What This Signals for 2026

The next phase of digital priorities will focus on scalable personalization, AI-driven operational efficiency, and connected ecosystems that extend beyond the hospital walls. Leaders are investing in platforms that integrate clinical, financial, and consumer data to deliver proactive care and predictive insights. Digital-first models, intelligent scheduling, and automation will become standard. Organizations that build flexible, cloud-based architectures now and leverage AI for personalization and resource optimization position themselves to improve access, reduce costs, and strengthen patient loyalty in a competitive market.

Explore the full case study to see how Rochester Regional Health partnered with Perficient to make this vision a reality.

Reimagine Access to Care with Confidence

These awards validate the impact of our approach and reinforce the urgency of digital innovation as a strategic imperative for healthcare leaders.

More importantly, this recognition reflects what we’re hearing across the industry: the need to prioritize consumer-centric transformation is accelerating. Leaders are looking for solutions that improve access, personalize engagement, and deliver measurable outcomes for both patients and the business.

From insight to impact, our healthcare expertise equips leaders to modernize, personalize, and scale care. We drive resilient, AI-powered transformation to shape the experiences and engagement of health care consumers, streamline operations, and improve the cost, quality, and equity of care.

  • Business Transformation: Activate strategy for transformative outcomes and health experiences.
  • Modernization: Maximize technology to drive health innovation, efficiency, and interoperability.
  • Data + Analytics: Power enterprise agility and accelerate healthcare insights.
  • Consumer Experience: Connect, ease, and elevate impactful health journeys.

We are trusted by leading technology partners and mentioned by analysts, and Modern Healthcare consistently ranks us as one of the largest healthcare consulting firms.

Discover why we’ve been trusted by the 10 largest health systems and the 10 largest health insurers in the U.S. Explore our healthcare expertise and contact us to learn more.

House Price Predictor – An MLOps Learning Project Using Azure DevOps
https://blogs.perficient.com/2025/08/06/house-price-predictor-an-mlops-learning-project-using-azure-devops/ | Wed, 06 Aug 2025

Machine Learning (ML) is no longer limited to research labs — it’s actively driving decisions in real estate, finance, healthcare, and more. But deploying and managing ML models in production is a different ballgame. That’s where MLOps comes in.

In this blog, we’ll walk through a practical MLOps learning project — building a House Price Predictor using Azure DevOps as the CI/CD backbone. We’ll explore the evolution from DevOps to MLOps, understand the model development lifecycle, and see how to automate and manage it effectively.

What is MLOps?

MLOps (Machine Learning Operations) is the discipline of combining Machine Learning, DevOps, and Data Engineering to streamline the end-to-end ML lifecycle.

It aims to:

  • Automate training, testing, and deployment of models
  • Enable reproducibility and version control for data and models
  • Support continuous integration and delivery (CI/CD) for ML workflows
  • Monitor model performance in production

MLOps ensures that your model doesn’t just work in Jupyter notebooks but continues to deliver accurate predictions in production environments over time.

From DevOps to MLOps: The Evolution

DevOps revolutionized software engineering by integrating development and operations through automation, CI/CD, and infrastructure as code (IaC). However, ML projects add new complexity:

Aspect | Traditional DevOps | MLOps
Artifact | Source code | Code + data + models
Version Control | Git | Git + data versioning (e.g., DVC)
Testing | Unit & integration tests | Data validation + model validation
Deployment | Web services, APIs | ML models, pipelines, batch jobs
Monitoring | Logs, uptime, errors | Model drift, data drift, accuracy decay

So, MLOps builds on DevOps but extends it with data-centric workflows, experimentation tracking, and model governance.

House Price Prediction: Project Overview

Our goal is to build an ML model that predicts house prices based on input features like square footage, number of bedrooms, location, etc. This learning project is structured to follow MLOps best practices, using Azure DevOps pipelines for automation.

 Project Structure

house-price-predictor/
├── configs/               # Model configurations stored in YAML format
├── data/                  # Contains both raw and processed data files
├── deployment/
│    └── mlflow/           # Docker Compose files to set up MLflow tracking
├── models/                # Saved model artifacts and preprocessing objects
├── notebooks/             # Jupyter notebooks for exploratory analysis and prototyping
├── src/
│    ├── data/             # Scripts for data preparation and transformation
│    ├── features/         # Logic for generating and engineering features
│    ├── models/           # Code for model building, training, and validation
├── k8s/
│    ├── deployment.yaml        # Kubernetes specs to deploy the Streamlit frontend
│    └── fast_model.yaml        # Kubernetes specs to deploy the FastAPI model service
├── requirements.txt       # List of required Python packages

 Setting Up Your Development Environment

Before getting started, make sure the tools used in this walkthrough are installed on your machine: Git, Python 3.11, UV, and Docker (or Podman).

 Preparing Your Environment

  • Fork this repo on GitHub to your personal or organization account.
  • Clone your forked repository
# Replace 'xxxxxx' with your GitHub username or organization
git clone https://github.com/xxxxxx/house-price-predictor.git
cd house-price-predictor
  • Create a virtual environment using UV:
uv venv --python python3.11
source .venv/bin/activate
  • Install the required Python packages:
uv pip install -r requirements.txt

 Configure MLflow for Experiment Tracking

To enable experiment and model run tracking with MLflow:

cd deployment/mlflow
docker compose -f mlflow-docker-compose.yml up -d
docker compose ps

 Using Podman Instead of Docker?

podman compose -f mlflow-docker-compose.yml up -d
podman compose ps

Access the MLflow UI. Once running, open your browser and navigate to http://localhost:5555

Model Workflow

 Step 1: Data Processing

Perform cleaning and preprocessing on the raw housing dataset:

python src/data/run_processing.py   --input data/raw/house_data.csv   --output data/processed/cleaned_house_data.csv

 Step 2: Feature Engineering

Perform data transformations and feature generation:

python src/features/engineer.py   --input data/processed/cleaned_house_data.csv   --output data/processed/featured_house_data.csv   --preprocessor models/trained/preprocessor.pkl

 Step 3: Modeling & Experimentation

Train the model and track all metrics using MLflow:

python src/models/train_model.py   --config configs/model_config.yaml   --data data/processed/featured_house_data.csv   --models-dir models   --mlflow-tracking-uri http://localhost:5555

Step 4: Building FastAPI and Streamlit

The source code for both applications — the FastAPI backend and the Streamlit frontend — is already available in the src/api and streamlit_app directories, respectively. To build and launch these applications:

  • Add a Dockerfile in the src/api directory to containerize the FastAPI service.
  • Add a Dockerfile inside streamlit_app/ to package the Streamlit interface.
  • Create a docker-compose.yaml file at the project root to orchestrate both containers.
    Make sure to set the environment variable API_URL=http://fastapi:8000 for the Streamlit app to connect to the FastAPI backend.

Once both services are up and running, you can access the Streamlit web UI in your browser to make predictions.

You can also test the prediction API directly by sending requests to the FastAPI endpoint.

curl -X POST "http://localhost:8000/predict" \
  -H "Content-Type: application/json" \
  -d '{
    "sqft": 1500,
    "bedrooms": 3,
    "bathrooms": 2,
    "location": "suburban",
    "year_built": 2000,
    "condition": "fair"
  }'

Be sure to replace http://localhost:8000/predict with the actual endpoint based on where it’s running.
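The project itself is Python-based, but any client can consume this endpoint. As a side note, here is a minimal Java 11+ HttpClient sketch that sends the same payload as the curl command above; the URL and field values simply mirror that example and are not part of the original project:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class PredictClient {
    public static void main(String[] args) throws Exception {
        // Same JSON body as the curl example above
        String payload = "{\"sqft\": 1500, \"bedrooms\": 3, \"bathrooms\": 2, "
                + "\"location\": \"suburban\", \"year_built\": 2000, \"condition\": \"fair\"}";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8000/predict"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(payload))
                .build();

        // Send the request and print the prediction returned by FastAPI
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}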

At this stage, your project is running locally. Now it’s time to implement the same workflow using Azure DevOps.

Prerequisites for Implementing This Approach in Azure DevOps

To implement a similar MLOps pipeline using Azure DevOps, the following prerequisites must be in place:

  1. Azure Service Connection (Workload Identity-based)
    • Create a Workload Identity Service Connection in Azure DevOps.
    • Assign it Contributor access to the target Azure subscription or resource group.
    • This enables secure and passwordless access to Azure resources from the pipeline.
  2. Azure Kubernetes Service (AKS) Cluster
    • Provision an AKS cluster to serve as the deployment environment for your ML application.
    • Ensure the service connection has sufficient permissions (e.g., Azure Kubernetes Service Cluster User RBAC role) to interact with the cluster.

Start by cloning the existing GitHub repository into your Azure Repos. Inside the repository, you’ll find the azure-pipeline.yaml file, which defines the Azure DevOps CI/CD pipeline consisting of the following four stages:

  1. Data Processing Stage – Handles data cleaning and preparation.
  2. Model Training Stage – Trains the machine learning model and logs experiments.
  3. Build and Publish Stage – Builds Docker images and publishes them to the container registry.
  4. Deploy to AKS Stage – Deploys the application components to Azure Kubernetes Service (AKS).

This pipeline automates the end-to-end ML workflow from raw data to production deployment.

The CI/CD pipeline is already defined in the existing YAML file and is configured to run manually based on the parameters specified at runtime.

This pipeline is manually triggered (no automatic trigger on commits or pull requests) and supports the conditional execution of specific stages using parameters.

It consists of four stages, each representing a step in the MLOps lifecycle:

  1. Data Processing Stage

Condition: Runs if run_all or run_data_processing is set to true.

What it does:

  • Check out the code.
  • Sets up Python 3.11.13 and installs dependencies.
  • Runs scripts to:
    • Clean and preprocess the raw dataset.
    • Perform feature engineering.
  • Publishes the processed data and the trained preprocessor as pipeline artifacts
  2. Model Training Stage

Depends on: DataProcessing
Condition: Runs if run_all or run_model_training is set to true.

What it does:

  • Downloads the processed data artifact.
  • Spins up an MLflow server using Docker.
  • Waits for MLflow to be ready.
  • Trains the machine learning model using the processed data.
  • Logs the training results to MLflow.
  • Publishes the trained model as a pipeline artifact.
  • Stops and removes the temporary MLflow container.
  3. Build and Publish Stage

Depends on: ModelTraining
Condition: Runs if run_all or run_build_and_publish is set to true.

What it does:

  • Downloads trained model and preprocessor artifacts.
  • Builds Docker images for:
    • FastAPI (model API)
    • Streamlit (frontend)
  • Tags both images using the current commit hash and the latest tag.
  • Runs and tests both containers locally (verifies /health and web access).
  • Pushes the tested Docker images to Docker Hub using credentials stored in the pipeline.
  4. Deploy to AKS Stage

Depends on: BuildAndPublish
Condition: Runs only if the previous stages succeed.

What it does:

  • Uses the Azure CLI to:
    • Set the AKS cluster context (make sure to update the cluster name).
    • Update Kubernetes deployment YAML files with the new Docker image tags.
    • Apply the updated deployment configurations to the AKS cluster using kubectl.

Now, the next step is to set up the Kubernetes deployment and service configuration for both components of the application:

  • Streamlit App: This serves as the frontend interface for users.
  • FastAPI App: This functions as the backend, handling API requests from the Streamlit frontend and returning model predictions.

Both deployment and service YAML files for these components are already present in the k8s/ folder and will be used for deploying to Azure Kubernetes Service (AKS).

This k8s/deployment.yaml file sets up a Streamlit app on Kubernetes with two key components:

  • Deployment: Runs 2 replicas of the Streamlit app using a Docker image. It exposes port 8501 and sets the API_URL environment variable to connect with the FastAPI backend.
  • Service: Creates a LoadBalancer service that exposes the app on port 80, making it accessible externally.

In short, it deploys the Streamlit frontend and makes it publicly accessible while connecting it to the FastAPI backend for predictions.

This k8s/fastapi_model.yaml file deploys the FastAPI backend for the house price prediction app:

  • It creates a Deployment named house-price-api with 2 replicas running the FastAPI app on port 8000.
  • A LoadBalancer Service named house-price-api-service exposes the app externally on port 8000, allowing other services (like Streamlit) or users to access the API.

In short, it runs the backend API in Kubernetes and makes it accessible for predictions.

Now it’s time for the final run to verify the deployment on the AKS cluster. Trigger the pipeline by selecting the run_all parameter.

Run All Image

 

After the pipeline completes successfully, all four stages and their corresponding jobs will be executed, confirming that the application has been successfully deployed to the AKS cluster.

 

Mlops Stages

Mlops Jobs

 

Now, log in to the Azure portal and retrieve the external IP address of the Streamlit app service. Once accessed in your browser, you’ll see the House Price Prediction Streamlit application up and running.

 

Aks Ips

 

Mlops Page

 

Now, go ahead and perform model inference by selecting the appropriate parameter values and clicking on “Predict Price” to see how the model generates the prediction.

 

Mlops Predict

Conclusion

In this blog, we explored the fundamentals of MLOps and how it bridges the gap between machine learning development and scalable, production-ready deployment. We walked through a complete MLOps workflow—from data processing and feature engineering to model training, packaging, and deployment—using modern tools like FastAPI, Streamlit, and MLflow.

Using Azure DevOps, we implemented a robust CI/CD pipeline to automate each step of the ML lifecycle. Finally, we deployed the complete House Price Predictor application on an Azure Kubernetes Service (AKS) cluster, enabling a user-friendly frontend (Streamlit) to interact seamlessly with a predictive backend (FastAPI).

This end-to-end project not only showcases how MLOps principles can be applied in real-world scenarios but also provides a strong foundation for deploying scalable and maintainable ML solutions in production.

Your Phone is the New Target: Mobile Malware Trends in 2025
https://blogs.perficient.com/2025/04/08/mobile-malware-trends-in-2025/ | Tue, 08 Apr 2025

As smartphones have become integral to our daily lives, the rise in mobile malware attacks is alarming. Recent reports indicate a significant escalation in these threats, both globally and, in particular, in India.

Global Surge in Mobile Malware Attacks

The Zscaler ThreatLabz 2024 Mobile, IoT, and OT Threat Report analyzed over 20 billion mobile threat transactions between June 2023 and May 2024. The findings reveal a 29% increase in banking malware attacks and a staggering 111% rise in mobile spyware incidents. These statistics underscore the growing sophistication and frequency of cyber threats targeting mobile devices.

India’s Alarming Position

India has emerged as the top global target for mobile malware attacks, accounting for 28% of the total, surpassing the United States (27.3%) and Canada (15.9%). This marks a significant jump from its previous third-place ranking, highlighting the urgent need for robust cybersecurity measures in the country.
Within the Asia-Pacific region, India dominates with a staggering 66.5% share of mobile malware attacks. This surge is accompanied by a sharp increase in sophisticated phishing campaigns targeting users of leading private Indian banks, such as HDFC, ICICI, and Axis. These attacks often involve fake banking websites that closely mimic legitimate ones, deceiving users into disclosing sensitive information.

*Source: NDTV – India Tops Global List for Mobile Malware Attacks

*Source: New Indian Express – Banking Systems Particularly Vulnerable

Case Study: The Rise of Crocodilus Malware

A recent example of advanced mobile malware is “Crocodilus,” a banking trojan identified by cybersecurity firm Threat Fabric. Initially targeting users in Spain and Turkey, Crocodilus is anticipated to expand globally. This malware employs overlay attacks to steal sensitive information, including banking credentials and cryptocurrency wallet keys. Once installed, it leverages Android’s Accessibility Service to gain control over device functions, enabling fraudulent transactions and unauthorized access to financial assets.

*Source: Meristation/AS – New Mobile Hijacking Malware Discovered

The Indian Scenario: A Closer Look

The India Cyber Threat Report 2025 highlights that there have been 369.01 million malware detections across an 8.44 million-strong installation base in India. Trojans account for 43.38% of these detections, followed by Infectors (34.23%) and Worms (8.43%). Notably, Telangana, Tamil Nadu, and Delhi are the top three regions affected by malware attacks, with the healthcare, hospitality, and BFSI (Banking, Financial Services, and Insurance) sectors being the most targeted.

*Source: Times of India – India Records 369 Million Malware Detections

Protecting Yourself Against Mobile Malware

Given the escalating threats, it’s important for individuals and organizations to adopt strict security measures:

  • Stay Updated: Regularly update your device’s operating system and applications to patch security vulnerabilities.

  • Download Wisely: Only install apps from official app stores and verify their authenticity.

  • Be Cautious with Permissions: Review app permissions carefully and avoid granting unnecessary access.

  • Use Security Software: Install reputable mobile security solutions (Bitdefender, Norton, Kaspersky, McAfee, etc.) to detect and prevent malware infections.

  • Educate Yourself: Stay informed about the latest phishing tactics and be cautious with unsolicited messages or emails.

Conclusion

As mobile devices continue to be integral to our personal and professional lives, ensuring their security is a priority. By adopting proactive measures and staying vigilant, we can mitigate the risks posed by evolving mobile malware threats.

The risk of using String objects in Java
https://blogs.perficient.com/2024/10/25/the-risk-of-using-string-objects-in-java/ | Fri, 25 Oct 2024

If you are a Java programmer, you may have been engaging in an insecure practice without knowing it. We all know (or should know) that it is not safe to store unencrypted passwords in the database, because that might compromise the protection of data at rest. But that is not the only issue: if at any point in our code an unencrypted password or other sensitive data is stored in a String variable, even temporarily, there could be a risk.

Why is there a risk?

String objects were not created to store passwords; they were designed to optimize space in our program. String objects in Java are immutable, which means that after you create a String object and assign it a value, you cannot remove or modify that value. I know you might be thinking that this is not true, because you can assign “Hello World” to a given String variable and in the following line assign it “Goodbye, cruel world”, and that is technically correct. The problem is that the “Hello World” you created first is going to keep living in the String pool even if you cannot see it.

What is the String pool?

Java uses a special memory area called the String pool to store String literals. When you create a String literal, Java checks the String pool first to see if an identical String already exists. If it does, Java will reuse the reference to the existing String, saving memory. This means that if you create 25,000 String objects and all of them have the value “Michael Jackson”, only one String literal will be stored in memory and all variables will point to the same one, optimizing the space in memory.
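A quick way to see this interning behavior (a minimal sketch, not from the original post) is to compare two identical literals by reference:

public class StringPoolDemo {
    public static void main(String[] args) {
        String a = "Michael Jackson";
        String b = "Michael Jackson";
        // Both variables point to the same pooled literal,
        // so the reference comparison prints true.
        System.out.println(a == b);          // true

        // A String built at runtime is a different object unless interned.
        String c = new String("Michael Jackson");
        System.out.println(a == c);          // false
        System.out.println(a == c.intern()); // true
    }
}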

Ok, the object is in the String pool, where is the risk?

The String Object will remain in memory for some time before being deleted by the garbage collector. If an attacker has access to the content of the memory, they could obtain the password stored there.

Let’s see a basic example of this. The following code creates a String object and assigns it a secret password: “¿This is a secret password”. Then that same variable is overwritten three times, and the Instances Inspector of the debugger will help us locate String objects starting with the character “¿”.

Example 1 Code:

Code1
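Since the example code is shown only as an image, here is a minimal Java sketch of what it describes; the variable name and the exact overwrite values are assumptions:

public class Example1 {
    public static void main(String[] args) {
        String a = "¿This is a secret password"; // the sensitive value
        a = "first overwrite";                   // the literal above stays in the pool
        a = "second overwrite";
        a = "third overwrite";
        a = null;                                // the variable is cleared...
        // ...but every literal created above still lives in the String pool,
        // including the secret password, until the JVM reclaims it.
        System.out.println("Set a breakpoint here and inspect String instances");
    }
}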

Example 1 Debugger:

Example1

 

As you can see in the image, when the debugger reaches line 8, even after the value of the String variable “a” has been changed three times and set to null at the end, all previous values remain in memory, including our “¿This is a secret password”.

 

Got it. Just avoiding creating String variables will solve the problem, right?

It is not that simple. Let us consider a second example. Now we are smarter, and we are going to use a char array to store the password instead of a String, to avoid having it saved in the String pool. In addition, rather than having the secret password as a literal in the code, it will be stored unencrypted in a text file (which, by the way, is not recommended, but we will do it for this example). A BufferedReader will be used to read the contents of the file.

Unfortunately, as you will see, the password still ends up living in memory as a String object.

Example 2 Code:

Code2
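Again, the code is shown as an image; the following is a minimal sketch of the pattern it describes, with the file name as an assumption:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class Example2 {
    public static void main(String[] args) throws IOException {
        char[] password;
        try (BufferedReader reader = new BufferedReader(new FileReader("secret.txt"))) {
            // readLine() returns a String, so even though we immediately convert it
            // to a char array, the String holding the unencrypted password has
            // already been created and sits in memory until garbage collection.
            password = reader.readLine().toCharArray();
        }
        System.out.println("Password length: " + password.length);
    }
}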

Example 2 Debugger:

 

Example2 Debugger

This case is even more puzzling because a String object was never explicitly created in the code. The problem is that BufferedReader.readLine() temporarily returns a String object, and its content, the unencrypted password, will remain in memory until the garbage collector removes it.

What can I do to solve this problem?

In this last example we still have the unencrypted password stored in a text file, and we still use a BufferedReader to read the contents of the file. But instead of using the method BufferedReader.readLine(), which returns a String, we use the method BufferedReader.read(), which stores the content of the file directly in a char array. As seen in the debugger’s screenshot, this time the file’s contents do not appear among the String instances in memory.

Example 3 Code:

Code3
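Here is a minimal sketch of the approach the third example describes; the file name and buffer size are assumptions:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Arrays;

public class Example3 {
    public static void main(String[] args) throws IOException {
        char[] buffer = new char[64];
        int length;
        try (BufferedReader reader = new BufferedReader(new FileReader("secret.txt"))) {
            // read() fills the char array directly; no intermediate String is created.
            length = reader.read(buffer);
        }
        // ... use the password (buffer[0..length-1]) ...
        // Overwrite the array as soon as it is no longer needed.
        Arrays.fill(buffer, '\0');
        System.out.println("Read " + length + " characters");
    }
}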

Example 3 Debugger:

Example3 Debugger

In summary

To solve this problem, consider following the principles listed below:

  1. Do not create String literals with confidential information in your code.
  2. Do not store confidential information in String objects. You can use other types of Objects to store this information such as the classic char array. After processing the data make sure to overwrite the char array with zeros or some random chars, just to confuse attackers.
  3. Avoid calling methods that will return the confidential information as String, even if you will not save that into a variable.
  4. Consider applying an additional security layer by encrypting confidential information. The SealedObject in Java is a great alternative to achieve this. The SealedObject is a Java Object where you can store sensitive data, you provide a secret key, and the Object is encrypted and serialized. This is useful if you want to transmit it and ensure the content remains unexposed. Afterward, you can decrypt it using the same secret key. Just one piece of advice, after decrypting it, please do not store it on a String object.
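As a rough illustration of the SealedObject idea mentioned in point 4 (a minimal sketch, not production-ready key management), sensitive data can be wrapped and encrypted like this:

import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SealedObject;
import javax.crypto.SecretKey;

public class SealedObjectDemo {
    public static void main(String[] args) throws Exception {
        // Generate a secret AES key (in practice, load it from a secure key store)
        SecretKey key = KeyGenerator.getInstance("AES").generateKey();

        // Encrypt and serialize the sensitive value inside a SealedObject
        Cipher encryptCipher = Cipher.getInstance("AES");
        encryptCipher.init(Cipher.ENCRYPT_MODE, key);
        char[] secret = {'s', '3', 'c', 'r', '3', 't'};
        SealedObject sealedSecret = new SealedObject(secret, encryptCipher);

        // Later, decrypt it with the same key; avoid converting the result to a String
        char[] recovered = (char[]) sealedSecret.getObject(key);
        System.out.println("Recovered " + recovered.length + " characters");
    }
}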
Success Story: Enhancing Member Engagement with Marketing Cloud Personalization
https://blogs.perficient.com/2024/10/22/success-story-enhancing-member-engagement-with-marketing-cloud-personalization/ | Tue, 22 Oct 2024

Introduction 

In today’s digital age, personalization is key to engaging and retaining members. A prominent labor union representing a large number of educators in a major metropolitan public school system recognized this need and partnered with Perficient to implement a scalable Salesforce Marketing Cloud Personalization solution. This collaboration aimed to drive member engagement, increase portal adoption, and improve navigation for its members. 

About the Union 

This union represents a substantial number of teachers, classroom paraprofessionals, and various other school-based titles, including school secretaries, school counselors, and occupational and physical therapists. Additionally, the union represents family childcare providers, nurses, and employees at various private educational institutions and some charter schools. The union also has a significant number of retired members. 

The union’s central headquarters is located in a major city, with additional offices to assist members with certification, licensing, salaries, grievances, and pensions. Founded in the 1960s, the union is part of a larger national federation of teachers, a state-wide teachers’ organization, and is affiliated with major labor councils. 

The Challenge 

The union faced the challenge of effectively engaging its members and ensuring they were aware of the benefits and resources available to them. The goal was to create a more personalized and user-friendly experience on their member portal, which would lead to higher engagement and satisfaction. 

The Solution 

Perficient stepped in to provide a comprehensive solution that included: 

  1. Scalable Marketing Cloud Personalization Foundation: Establishing a robust foundation for ongoing personalization efforts. 
  2. High-Value Use Cases: Implementing three high-value use cases focused on educating site visitors about the benefits available through the union and the resources on the Member Hub.
  3. Knowledge Transfer Sessions: Enabling the union team to create additional personalization campaigns through detailed knowledge transfer sessions. 

What is Marketing Cloud Personalization? 

Marketing Cloud Personalization provides real-time, scalable, cross-channel personalization and AI to complement Marketing Cloud Engagement’s robust customer data, audience segmentation, and engagement platform. It uses tailored interactions with customers and prospects to increase loyalty, engagement, and conversions, delivering more relevant experiences across the customer journey. 

Unified Profiles 

Personalization helps understand each visitor by building a centralized individual profile from different data sources. This profile includes preferences and affinities, providing a visual representation of all data about a single visitor. This information helps decide how and when to best interact with them on their preferred channels. Profiles can also be rolled up to the account level to view relationships among visitor behaviors associated with the same account. 

The Results 

Through the personalized website experiences, the union saw a significant increase in member engagement, portal adoption, and website conversions. The scalable personalization foundation allowed them to continuously optimize and expand their targeted campaigns, ensuring ongoing improvements and relevance. 

The union’s Marketing Director praised the consultancy’s exceptional work, stating: 

“Perficient is one of the strongest partners I have ever worked with on strategy and implementation. The team was amazing from start to finish. Our product is live and running. I immediately secured the team to continue working on other projects so we wouldn’t lose these great resources.” 

This seamless partnership and effective personalization solution enabled the union to better serve its members and achieve its key engagement objectives. By leveraging Marketing Cloud Personalization, the union not only enhanced the user experience but also empowered its team to sustain and grow these efforts independently. 

The partnership between the union and Perficient showcases the power of personalized digital experiences in driving member engagement and satisfaction.  

About Our Salesforce Team

We are a Salesforce Summit Partner with more than two decades of experience delivering digital solutions in the manufacturing, automotive, healthcare, financial services, and high-tech industries. Our team has deep expertise in all Salesforce Clouds and products, artificial intelligence, DevOps, and specialized domains to help you reap the benefits of implementing Salesforce solutions.  

Want to learn more? Schedule some time with us  to explore Marketing Cloud Personalization! 

Custom Weather Forecast Model Using ML Net
https://blogs.perficient.com/2024/09/10/custom-weather-forecast-model-using-ml-net/ | Tue, 10 Sep 2024

Nowadays, AI is a crucial field, and frameworks like ML.NET can be used to build amazing applications beyond the pre-built models offered by cloud providers. It’s important to learn how these services work behind the scenes, how to create custom models, and how your application can interact with AI frameworks beyond just cloud providers or other external sources of AI services.

How can I use ML Net?

ML.NET can be used with Visual Studio 2019 or later, or with Visual Studio Code, but it only works on a Windows OS. Its prerequisites are:

  • Visual Studio 2022 or Visual Studio 2019.
  • .NET Core 3.1 SDK or later.

ML Net 1

Image 1: Visual Studio installer, Installation Details contains the ML Net Model builder

ML Net 2

Image 2: Visual Studio Context Menu

After adding the ML Net component to your project, you can see a wizard that allows you to set up your model as you need (Image 3).

ML Net 3

Image 3: ML NET Wizard

Application Overview

The application starts with the weather groups; every item contains a temperature range, a button to search the historical data, and a forecast prediction (Image 4).

ML Net 4

Image 4: Weather forecast main page.

The source of those groups is a table named Weather with the attributes:

  • Id: primary key
  • Description: the group description; you can see it as the title of the cards in Image 4.
  • MinRange: the minimum temperature that belongs to the group.
  • MaxRange: the maximum temperature that belongs to the group.

The “History” button shows a table with all the historical data, paginated. The historical data contains the date in yyyy-mm-dd format, the temperature, and whether the day was cloudy (Image 5).

 

ML Net 5

Image 5: Weather forecast historical page.

The Predict option allows users to generate their own prediction using ML.NET through an API endpoint; the input data is the number of days from today for which the user wants a prediction and whether the day will be cloudy (Image 6).

Image6

Image 6: Prediction page

The API result is the date, the group, and the percentage probability that the date will belong to that group; it also shows a table with the percentage probability of every group.

Model

In the real world, there are lots of variables to keep in mind if you want to implement a weather forecast prediction app, such as wind speed, temperature, the season, humidity, whether it was cloudy, etc. (2)

The scope of this approach is to see how ML.NET can handle a custom model; therefore, a simple custom model was created based on the temperature, the season, and whether the day is cloudy. The model uses the weather as a grouping of different forecasts, and the custom training model was designed as follows (Image 7):

  • Weather (Id): every group has an ID, so the label to predict is the ID.
  • Date: the date feature related to the weather.
  • IsCloudy: a Boolean feature that indicates the relationship between weather and clouds.
  • Season (Id): a feature that indicates the relationship between weather and season (every season has an ID).

Image7

Image 7: Training data section from ML Net wizard

You can get the data from files or SQL Server databases; in this case, the data was collected from a view in SQL Server.

Project Architecture Overview

The weather forecast solution has two sites, a front end and a back end, and the data is stored in a SQL Server database (Image 8). With this overall approach, the system was designed to separate the responsibilities of the business logic, the data, and the user experience.

Image8

Image 8: Sites and database

Front-end

You can find the app repository on GitHub using the following URL: https://github.com/joseflorezr/trainingangularversion

The front-end repository contains an Angular 18 solution, which uses Angular Material to help improve the user experience and routing for navigation. The solution contains the following components (Image 9):

  • Forecast-header: The top component of the page, it shows the title with its style.
  • Forecast-prediction: Contains the form for weather predictions and shows the results.
  • Forecast results: Contains the historical data.
  • Weather: Shows the groups of weather forecasts
  • Services: Connects to the API to get weather, forecasts, and predictions
  • Model: interfaces that map with the API

Image9

Image 9: Front-end components

Back-end

You can find the app repository on GitHub using the following URL: https://github.com/joseflorezr/WebApiMlNetWeatherForecast.

Image10

Image 10: Back End components

The API solution contains the following projects:

  • TestWebAPi: the Web API with the endpoints; it contains 3 controllers: Weather, Forecast, and WeatherForecast. WeatherForecast is an abstract class with the logger and the use case reference injected.
  • Business: contains the classes with the business logic, based on the Use Case approach (4).
  • Model: the abstraction of the domain objects like Weather, Forecast, Season, and PredictedForecast.
  • Data: this library contains 2 parts:
    • The integration at the data level: the Entity Framework context used to reach the database.
    • The integration with ML.NET: after it is added to the solution, several support files are scaffolded with the same name but different suffixes; in this case, the file is MLForecastModel:
      • mbconfig: contains the wizard that helps to change the settings.
      • consumption: a partial class that allows interaction with the model.
      • evaluate: a partial class that allows calculating the metrics.
      • mlnet: this file contains the knowledge base (the trained model); it is important to share the file at the API level.
      • training: adds the training methods that support the creation of the file.

Database Project(3)

The data model abstracts Weather and Season as master entities with their descriptions, while Forecast is the historical table that contains the observation for a specific date (one row per day): the temperature, the season ID, and the weather ID.

Visual Studio includes a database project type that allows developers to create, modify, and deploy databases, and it can run scripts after deployment. To create the ML.NET model, a view named WeatherForecast was used because it is easier to connect to the ML.NET wizard. Image 11 shows the relationship between the tables.

Image11

Image 11: Database diagram

Database projects can be deployed using the SQL schema compare tool, and a post-build script creates the data for the database model. For this app, a script was executed simulating forecast data from 1900-01-01 to 2024-06-04. The script uses random data, so the results will be different every time you populate the Forecast table.

The WeatherForecast view consolidates the data used by ML.NET to create the model.

API Project

The API project exposes endpoints for getting the groups (Weather controller), getting the historical forecast data (Forecast controller), and predicting (Forecast controller).

Image12

Image 12:  Web API Swagger

Note: The ML.NET model file must be added as a resource of the API because, when the API uses the prediction functionality, the MLForecastModel class looks for the file at a specific path (which can be changed).

 Image13

Image 13: ML Net file location

Model Project

This project contains the DTOs that can be transferred to the front end. Basically, the Weather entity has the group description and the temperature ranges; the Season contains the description of the starting and ending months; the Forecast has the temperature, the date, whether the day was cloudy, and an ID; and PredictedForecast inherits from Forecast, adding the score and the weather description (Image 14).

Image14

Image 14: Entities

Basically, ML.NET creates the MLForecastModel class, which contains the methods to use the prediction model (the result differs for the chosen scenario). In general terms, the idea is to send an Input object (defined by ML.NET) and receive results as follows:

  • For a single object, use the Predict method; it will return the score for the predicted label.
  • If you want to get the labels, use the GetLabels method; it will return all the labels as an IEnumerable.
  • If you want to evaluate all labels, use the PredictAllLabels method; it will return a sorted IEnumerable with key-value pairs (label and score).
  • If you want to map an unlabeled result, use GetSortedScoresWithLabels; it will return a sorted IEnumerable with key-value pairs (label and score).

The PredictAsync method (Image 15) creates the input object, starting with the user input (ID, days, cloudy): it computes the projected date by adding the days and then finds the season ID based on the month (GetSeason method). After the input object is complete, the chosen method is PredictAllLabels. In this case, the label is a weather ID, so the description had to be fetched from the database for every returned label.

Image15

Image 15: PredictAsync Implementation

Summary

  • You can use ML NET to create your own Machine Learning models and use them as part of your API solution.
  • There are multiple options (scenarios) to choose from according to your needs.
  • Models can be created using diverse sources, such as database objects or files.

References

  1. https://learn.microsoft.com/en-us/dotnet/machine-learning/how-does-mldotnet-work
  2. https://content.meteoblue.com/en/research-education/specifications/weather-variables
  3. https://visualstudio.microsoft.com/vs/features/ssdt/
  4. https://medium.com/@pooja0403keshri/clean-architecture-a-comprehensive-guide-with-c-676beed5bdbb
  5. https://learn.microsoft.com/en-us/aspnet/core/fundamentals/dependency-injection?view=aspnetcore-8.0

 

 

Retrieve Your Application Data Using AWS ElastiCache
https://blogs.perficient.com/2024/06/26/retrieve-your-application-data-using-aws-elasticache/ | Wed, 26 Jun 2024

AWS ElastiCache is a service that improves web application performance by retrieving information from fast, managed in-memory caches.

What is Caching?

Caching is the process of storing data in a cache, a temporary storage area. Caches are optimized for fast retrieval, with the trade-off that the data is not durable.

The cache is used for read purposes, so your application can access frequently needed data promptly.

ElastiCache supports the following two popular open-source in-memory caching engines:

  • Memcached: A high-performance, distributed memory object caching system well-suited for use cases where simple key-value storage and retrieval are required.
  • Redis: An open-source, in-memory key-value store that supports various data structures such as strings, hashes, lists, sets, and more. Redis is often used for caching, session management, real-time analytics, and messaging.

Which Caching Engine is Best?

Redis has more advanced features than Memcached. It is a data structure server that stores data in a key-value format so it can be served quickly. It allows replication, clustering, and configurable persistence. It is recommended if you want a highly scalable data store shared by multiple processes, applications, or servers, or simply a caching layer.

On the other hand, Memcached is an in-memory key-value store for small chunks of data fetched from a database, API calls, or page rendering.

Memcached is used to speed up dynamic web applications.

Both caching engines have their own use cases depending on your requirements. Here, we are going to use Redis.

Architecture Diagram of AWS ElastiCache

Architecture

According to the architecture diagram, whenever a read request is generated by the user, the information is first searched for in ElastiCache. If the data is not available in the cache, the request is served from the database.

If the requested data is present in the cache, the reply is very quick; otherwise, the database is responsible for serving the request, which increases latency.
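This read path is the classic cache-aside pattern. The following is a minimal Java sketch of it using the Jedis Redis client; the endpoint host, key prefix, TTL, and loadFromDatabase helper are illustrative assumptions, not part of the original article:

import redis.clients.jedis.Jedis;

public class ProductCache {
    private static final int TTL_SECONDS = 300; // keep cached entries for 5 minutes
    private final Jedis jedis = new Jedis("my-elasticache-endpoint", 6379);

    public String getProduct(String productId) {
        String key = "product:" + productId;

        // 1. Look in ElastiCache (Redis) first
        String cached = jedis.get(key);
        if (cached != null) {
            return cached; // cache hit: served quickly, no database round trip
        }

        // 2. Cache miss: fall back to the database
        String fromDb = loadFromDatabase(productId);

        // 3. Store the result in the cache with a TTL for the next request
        jedis.setex(key, TTL_SECONDS, fromDb);
        return fromDb;
    }

    // Hypothetical database lookup, standing in for a real DAO/repository call
    private String loadFromDatabase(String productId) {
        return "product-data-for-" + productId;
    }
}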

Why We Need AWS ElastiCache

  • Performance Improvement: ElastiCache keeps frequently used data from the database in memory, which helps improve the application’s performance.
  • Scalability: ElastiCache makes it easy to set up, manage, and scale a distributed in-memory cache in the cloud.
  • High Availability and Reliability: ElastiCache supports multi-AZ functionality, which means that if one AZ is unavailable, ElastiCache continues to serve data from another AZ. It supports replication and provides automatic failover, in which, if the primary node fails, one of the read replicas is promoted to primary. This is particularly crucial for critical applications that require constant uptime.
  • Cost-Effectiveness: With ElastiCache, there is no upfront cost or long-term commitment; you pay a simple monthly charge for each Redis node you use. By offloading traffic from databases to the cache layer, ElastiCache also helps reduce the load on your databases.
  • Security: ElastiCache comes with various security features, including encryption in transit and at rest, identity and access management (IAM) policies, and integration with Amazon Virtual Private Cloud (VPC), helping to protect your cached data.
  • Compatibility: ElastiCache is compatible with a variety of popular frameworks and libraries, making it easy to integrate with existing applications.

Use Cases of AWS ElastiCache

  • Chat applications
  • Media streaming
  • Session store
  • Gaming leaderboards
  • Real-time analytics
  • Caching

Deployment of  AWS ElastiCache using CloudFormation Template

Let’s deploy AWS ElastiCache (Redis) using an IaC tool, AWS CloudFormation.

Step 1: Create a stack in AWS CloudFormation and upload the template file.

Template file: this template contains the CloudFormation code that will deploy AWS ElastiCache.

Note: Repository link to download the template file: https://github.com/prafulitankar/AWS-Elasticache

Img 1
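As a side note, if you prefer to script the same kind of deployment instead of (or in addition to) CloudFormation, the AWS SDK exposes an equivalent API call. The sketch below uses Python and boto3; the replication group name, node type, subnet group, and security group IDs are placeholders, not values taken from the template linked above.

import boto3

elasticache = boto3.client("elasticache", region_name="us-east-1")  # region is illustrative

# Create a small Redis replication group with one replica and automatic failover
elasticache.create_replication_group(
    ReplicationGroupId="elasticache-01",                       # placeholder name
    ReplicationGroupDescription="Redis cache for the application",
    Engine="redis",
    CacheNodeType="cache.t3.micro",                            # choose a node type that fits your workload
    NumNodeGroups=1,
    ReplicasPerNodeGroup=1,
    AutomaticFailoverEnabled=True,
    CacheSubnetGroupName="my-cache-subnet-group",              # placeholder subnet group
    SecurityGroupIds=["sg-0123456789abcdef0"],                 # placeholder security group
)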

Step 2: Enter the stack name and parameter values. Here, we have provided the CloudFormation stack name (Elasticache-01) and the parameter values that define the configuration of the AWS ElastiCache cluster.

Img 2

Img 3

Step 3: Once we’re done with the parameter values, configure the stack options below: provide the tags and permissions for the cluster.

Img 4

Step 4: Configure the stack failure options. Here we have two choices:

  • Preserve successfully provisioned resources: when the stack fails, all resources that were successfully created are preserved.
  • Delete all newly created resources: if the stack fails, it is rolled back, keeping the resources that existed before this deployment and deleting every resource created during it.

Img 5

Once we submit all the necessary information, the CloudFormation stack starts creating the AWS ElastiCache cluster.

Now, our AWS ElastiCache Cluster is available.

Img 8

How to Access AWS ElastiCache

  • The AWS ElastiCache cluster must be deployed in a VPC.
  • Port 6379 must be allowed in the security group from the source IP from which you access the ElastiCache cluster.
  • To access the cluster, you need its primary endpoint (for example, master.cluster-test-001.flihgf.use2.cache.amazonaws.com:6379); a quick connectivity check is sketched below.
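The following is a minimal connectivity check, assuming the redis-py client is run from an instance inside the same VPC; whether ssl is needed depends on how in-transit encryption was configured for your cluster.

import redis

r = redis.Redis(
    host="master.cluster-test-001.flihgf.use2.cache.amazonaws.com",  # primary endpoint
    port=6379,   # must be open in the security group
    ssl=True,    # adjust to match your in-transit encryption setting
)
print(r.ping())                                # True if the cluster is reachable
r.set("greeting", "hello from ElastiCache")
print(r.get("greeting"))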

By using AWS ElastiCache, we can speed up application performance by caching data in memory. It is cost-effective, secure, and highly available, and it reduces the load and latency on the database.

]]>
https://blogs.perficient.com/2024/06/26/retrieve-your-application-data-using-aws-elasticache/feed/ 0 364838
Part 1: An Overview of the PDFBox Library https://blogs.perficient.com/2024/06/25/part-1-an-overview-of-the-pdfbox-library/ https://blogs.perficient.com/2024/06/25/part-1-an-overview-of-the-pdfbox-library/#respond Wed, 26 Jun 2024 04:27:14 +0000 https://blogs.perficient.com/?p=364863

Apache PDFBox is a versatile open-source library designed to work with PDF documents. It is widely used in various Java applications to create, modify, extract, and print PDF documents. In this part, we will provide a theoretical overview of the PDFBox library, highlighting its key features, components, and typical use cases.

Key Features of PDFBox

  1. PDF Creation

PDFBox allows developers to create new PDF documents programmatically. You can add text, images, and other graphical elements to the pages of a PDF.

  2. PDF Modification

With PDFBox, you can modify existing PDF documents. This includes adding or removing pages, altering the content of existing pages, and adding annotations or form fields.

  3. Text Extraction

Text extraction is among PDFBox’s most powerful capabilities. It is especially helpful for converting PDFs to other formats, such as HTML or plain text, or for indexing and searching PDF content.

  4. Image Extraction

PDFBox provides functionality to extract images from PDF documents. This is useful when validating images within PDFs or reusing images in other applications.

  5. Form Handling

PDFBox supports interactive PDF forms (AcroForms). You can create new forms, fill existing forms, and extract data from filled forms.

  6. PDF Rendering

PDFBox includes rendering capabilities, allowing you to convert PDF pages to images. This is useful for displaying PDF content in applications that do not natively support PDF viewing.

  7. Encryption and Decryption

PDFBox supports PDF document encryption and decryption. You can secure your PDFs with passwords and manage user permissions for viewing, printing, and editing.

Components of PDFBox

  1. PDDocument

The PDDocument class represents an in-memory PDF document. It is the starting point for most PDF operations in PDFBox.

  2. PDPage

The PDPage class represents a single page in a PDF document. You can add content to a page, extract content from a page, and manipulate the page layout.

  3. PDPageContentStream

The PDPageContentStream class is used to write content to a PDPage, including text, images, and graphical elements.

  4. PDFTextStripper

The PDFTextStripper class is used for text extraction. It processes a PDDocument and extracts text content from it.

  5. PDFRenderer

The PDFRenderer class is used to render PDF pages into images. This is useful for displaying PDF pages in applications or for generating thumbnails.

  6. PDImageXObject

The PDImageXObject class represents an image within a PDF document. You can use it to extract or add new images to a PDF.

  7. PDAcroForm

The PDAcroForm class represents the interactive form fields in a PDF. It allows you to manipulate form data programmatically.

Typical Use Cases for PDFBox

  1. Generating Reports

Businesses often need to generate dynamic reports in PDF format. PDFBox can be used to create customized reports with text, tables, images, and charts.

  2. Archiving Documents

PDFBox is useful for archiving documents in a standardized format. It can convert various document types into PDFs and manage large collections of PDF documents.

  3. Content Extraction and Indexing

PDFBox is frequently used for extracting text and metadata from PDFs for indexing and search purposes. This is valuable for building searchable archives and databases.

  4. Form Processing

Many applications require the handling of PDF forms. PDFBox can create, fill, and read form data, making it ideal for automating form processing tasks.

  5. PDF Security

With PDFBox, you can add security features to your PDF documents. This includes encrypting sensitive information and managing access permissions.

  6. Displaying PDFs

PDFBox’s rendering capabilities make it suitable for applications that need to display PDF content as images, such as in a thumbnail preview or a custom PDF viewer.

Conclusion

The extensive functionality offered by Apache PDFBox makes working with PDF documents easier. Whether you want to create, edit, extract, or secure PDF files, PDFBox has the tools to get the job done. Because of its Java integration, it’s a great option for developers who want to handle PDF documents inside their applications.

By understanding PDFBox’s features and components, you can get the most out of it for your projects and ensure that PDF-related tasks are handled quickly and efficiently.

]]>
https://blogs.perficient.com/2024/06/25/part-1-an-overview-of-the-pdfbox-library/feed/ 0 364863
The Quest for Spark Performance Optimization: A Data Engineer’s Journey https://blogs.perficient.com/2024/06/18/the-quest-for-spark-performance-optimization-a-data-engineers-journey/ https://blogs.perficient.com/2024/06/18/the-quest-for-spark-performance-optimization-a-data-engineers-journey/#respond Tue, 18 Jun 2024 13:43:04 +0000 https://blogs.perficient.com/?p=364402

In the bustling city of Tech Ville, where data flows like rivers and companies thrive on insights, there lived a dedicated data engineer named Tara. With over five years of experience under her belt, Tara had navigated the vast ocean of data engineering, constantly learning and evolving with the ever-changing tides.
One crisp morning, Tara was called into a meeting with the analytics team at the company she worked for. The team had been facing significant delays in processing their massive datasets, which was hampering their ability to generate timely insights. Tara’s mission was clear: optimize the performance of their Apache Spark jobs to ensure faster and more efficient data processing.
The Analysis
Tara began her quest by diving deep into the existing Spark jobs. She knew that to optimize performance, she first needed to understand where the bottlenecks were. She started with the following steps:
1. Reviewing Spark UI: Tara meticulously analyzed the Spark UI for the running jobs, focusing on the stages and tasks that took the longest to execute. She noticed that certain stages had tasks with high execution times and frequent shuffling.

Monitoring Spark with the web interface | DataStax Enterprise | DataStax  Docs
2. Examining Cluster Resources: She checked the cluster’s resource utilization. The CPU and memory usage graphs indicated that some executor nodes were underutilized while others were overwhelmed, suggesting an imbalance in resource allocation.

                                           Apache Spark Cluster Manager: YARN, Mesos and Standalone - TechVidvan
The Optimization Strategy
Armed with this knowledge, Tara formulated a multi-faceted optimization strategy:

1. Data Serialization: She decided to switch from the default Java serialization to Kryo serialization, which is faster and more efficient.
from pyspark import SparkConf

conf = SparkConf().set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")

PySpark tuning: data serialization
2. Tuning Parallelism: Tara adjusted the level of parallelism to better match the cluster’s resources. By setting `spark.default.parallelism` and `spark.sql.shuffle.partitions` to higher values, she aimed to reduce the duration of shuffle operations.
conf = conf.set("spark.default.parallelism", "200")
conf = conf.set("spark.sql.shuffle.partitions", "200")
3. Optimizing Joins: She optimized the join operations by leveraging broadcast joins for smaller datasets. This reduced the amount of data shuffled across the network.
from pyspark.sql.functions import broadcast

small_df = spark.read.parquet("hdfs://path/to/small_dataset")
large_df = spark.read.parquet("hdfs://path/to/large_dataset")
small_df_broadcast = broadcast(small_df)
result_df = large_df.join(small_df_broadcast, "join_key")

Hadoop, Spark, Hive and Programming: Broadcast Join in Spark
4. Caching and Persisting: Tara identified frequently accessed DataFrames and cached them to avoid redundant computations.
df = spark.read.parquet("hdfs://path/to/important_dataset").cache()
df.count()  # trigger an action so the cache is materialized

Caching In Spark
5. Resource Allocation: She reconfigured the cluster’s resource allocation, ensuring a more balanced distribution of CPU and memory resources across executor nodes.
conf = conf.set("spark.executor.memory", "4g")
conf = conf.set("spark.executor.cores", "2")
conf = conf.set("spark.executor.instances", "10")
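Tying these snippets together: the conf object built up above only takes effect once it is attached to the Spark application, for example when the session is created (or via spark-submit). A minimal sketch, assuming the conf from the previous steps and an illustrative application name:

from pyspark.sql import SparkSession

# Attach the accumulated configuration to the session; executor-level settings
# such as memory, cores, and instances apply when the application starts.
spark = SparkSession.builder \
    .appName("optimized-pipeline") \
    .config(conf=conf) \
    .getOrCreate()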

The Implementation
With the optimizations planned, Tara implemented the changes and closely monitored their impact. She kicked off a series of test runs, carefully comparing the performance metrics before and after the optimizations. The results were promising:
– The overall job execution time reduced by 40%.
– The resource utilization across the cluster was more balanced.
– The shuffle read and write times decreased significantly.
– The stability of the jobs improved, with fewer retries and failures.
The Victory
Tara presented the results to the analytics team and the management. The improvements not only sped up their data processing pipelines but also enabled the team to run more complex analyses without worrying about performance bottlenecks. The insights were now delivered faster, enabling better decision-making, and driving the company’s growth.
The Continuous Journey
While Tara had achieved a significant milestone, she knew that the world of data engineering is ever evolving. She remained committed to learning and adapting, ready to tackle new challenges and optimize further as the data landscape continued to grow.
And so, in the vibrant city of Tech Ville, Tara’s journey as a data engineer continued, navigating the vast ocean of data with skill, knowledge, and an unquenchable thirst for improvement.

]]>
https://blogs.perficient.com/2024/06/18/the-quest-for-spark-performance-optimization-a-data-engineers-journey/feed/ 0 364402
Unleash the Power of Your CloudFront Logs: Analytics with AWS Athena https://blogs.perficient.com/2024/05/22/unleash-the-power-of-your-cloudfront-logs-analytics-with-aws-athena/ https://blogs.perficient.com/2024/05/22/unleash-the-power-of-your-cloudfront-logs-analytics-with-aws-athena/#comments Wed, 22 May 2024 06:48:07 +0000 https://blogs.perficient.com/?p=362976

CloudFront, Amazon’s Content Delivery Network (CDN), accelerates website performance by delivering content from geographically distributed edge locations. But how do you understand how users interact with your content and optimize CloudFront’s performance? The answer lies in CloudFront access logs, and a powerful tool called AWS Athena can help you unlock valuable insights from them. In this blog post, we’ll explore how you can leverage Amazon Athena to simplify log analysis for your CloudFront CDN service.

Why Analyze CloudFront Logs?

CloudFront delivers data, videos, applications, and APIs to customers globally with low latency and high transfer speeds. However, managing and analyzing the logs generated by CloudFront can be challenging due to their sheer volume and complexity.

These logs contain valuable information such as request details, response status codes, and latency metrics, which can help you gain insights into your application’s performance, user behavior, and security incidents. Analyzing this data manually or using traditional methods like log parsing scripts can be time-consuming and inefficient.

By analyzing these logs, you gain a deeper understanding of:

  • User behaviour and access patterns: Identify popular content, user traffic patterns, and potential areas for improvement.
  • Content popularity and resource usage: See which resources are accessed most frequently and optimize caching strategies.
  • CDN performance metrics: Measure CloudFront’s effectiveness by analyzing hit rates, latency, and potential bottlenecks.
  • Potential issues: Investigate spikes in errors, identify regions with slow response times, and proactively address issues.

Introducing AWS Athena: Your CloudFront Log Analysis Hero

Amazon Athena is a serverless query service that allows you to analyze data stored in Amazon S3 using standard SQL. Here’s why Athena is perfect for CloudFront logs:

  • Cost-Effective: You only pay for the queries you run, making it a budget-friendly solution.
  • Serverless: No infrastructure to manage – Athena takes care of everything.
  • Familiar Interface: Use standard SQL queries, eliminating the need to learn complex new languages.

Architecture:

Architecture diagram

Getting Started with Athena and CloudFront Logs

To begin using Amazon Athena for CloudFront log analysis, follow these steps:

1. Enable Logging in Amazon CloudFront

If you haven’t already done so, enable logging for your CloudFront distribution. This will start capturing detailed access logs for all requests made to your content.

2. Store Logs in Amazon S3

Configure CloudFront to store access logs in a designated Amazon S3 bucket. Ensure that you have the necessary permissions to access this bucket from Amazon Athena.

3. Create an Athena Table

Create an external table in Amazon Athena, specifying the schema that matches the structure of your CloudFront log files.

Below is a sample query used to create the table:

CREATE EXTERNAL TABLE IF NOT EXISTS cloudfront_logs (
  `date` STRING,
  time STRING,
  location STRING,
  bytes BIGINT,
  request_ip STRING,
  method STRING,
  host STRING,
  uri STRING,
  status INT,
  referrer STRING,
  user_agent STRING,
  query_string STRING,
  cookie STRING,
  result_type STRING,
  request_id STRING,
  host_header STRING,
  request_protocol STRING,
  request_bytes BIGINT,
  time_taken FLOAT,
  xforwarded_for STRING,
  ssl_protocol STRING,
  ssl_cipher STRING,
  response_result_type STRING,
  http_version STRING,
  fle_encrypted_fields STRING,
  fle_status STRING,
  unique_id STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' ESCAPED BY '\\' LINES TERMINATED BY '\n'
LOCATION 'paste your s3 URI here';

Click on the run button!

Query

Extracting Insights with Athena Queries

Now comes the fun part – using Athena to answer your questions about CloudFront performance. Here are some sample queries to get you going:

Total Requests

Find the total number of requests served by CloudFront for a specific date range.

SQL

SELECT COUNT(*) AS total_requests
FROM cloudfront_logs
WHERE "date" BETWEEN '2023-12-01' AND '2023-12-31';

 

Most Requested Resources

Identify the top 10 most requested URLs from your CloudFront distribution. This query will give you a list of the top 10 most requested URLs along with their corresponding request counts. You can use this information to identify popular content and analyze user behavior on your CloudFront distribution.

SQL

SELECT uri, COUNT(*) AS request_count
FROM cloudfront_logs
GROUP BY uri
ORDER BY request_count DESC
LIMIT 10;

Traffic by Region

Analyze traffic patterns by user location.

This query selects the location field from your CloudFront logs (the CloudFront edge location that served the request, which approximates the user’s geographic region) and counts the number of requests for each location, grouping the results and ordering them in descending order by request count. The result is a breakdown of traffic by region, which you can use to optimize content delivery, allocate resources, and tailor your services to geographic demand.

SQL

SELECT location, COUNT(*) AS request_count
FROM cloudfront_logs
GROUP BY location
ORDER BY request_count DESC;

 

Average Response Time

Calculate the average response time for CloudFront requests. Executing this query will give you the average response time for all requests served by your CloudFront distribution. You can use this metric to monitor the performance of your CDN and identify any potential performance bottlenecks.

SQL

SELECT AVG(time_taken) AS average_response_time
FROM cloudfront_logs;

 

Number of Requests According to Status

The query below provides a breakdown of the number of requests for each HTTP status code returned by CloudFront, helping you identify patterns or anomalies in your CDN’s behavior.

SQL

SELECT status, COUNT(*) AS count
FROM cloudfront_logs
GROUP BY status
ORDER BY count DESC;

Athena empowers you to create even more complex queries involving joins, aggregations, and filtering to uncover deeper insights from your CloudFront logs.
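If you want to run these queries outside the console, for example from a scheduled job, the same SQL can be submitted through the Athena API. Below is a minimal sketch using Python and boto3; the region, database name, and results bucket are illustrative placeholders rather than values from this post.

import time

import boto3

athena = boto3.client("athena", region_name="us-east-1")  # region is illustrative

# Submit one of the queries shown above
response = athena.start_query_execution(
    QueryString="SELECT status, COUNT(*) AS count FROM cloudfront_logs GROUP BY status",
    QueryExecutionContext={"Database": "default"},  # database holding the table (assumption)
    ResultConfiguration={"OutputLocation": "s3://my-athena-results-bucket/"},  # placeholder bucket
)
query_id = response["QueryExecutionId"]

# Wait for the query to finish, then print the result rows
state = "QUEUED"
while state in ("QUEUED", "RUNNING"):
    time.sleep(1)
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]

if state == "SUCCEEDED":
    for row in athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]:
        print([col.get("VarCharValue") for col in row["Data"]])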

Optimizing CloudFront with Log Analysis

By analyzing CloudFront logs, you can identify areas for improvement:

  • Resource Optimization: Resources with consistently high latency or low hit rates might benefit from being cached at more edge locations.
  • Geographic Targeting: Regions with high traffic volume might warrant additional edge locations to enhance user experience.

Conclusion

AWS Athena and CloudFront access logs form a powerful duo for unlocking valuable insights into user behavior and CDN performance. With Athena’s cost-effective and user-friendly approach, you can gain a deeper understanding of your content delivery and make data-driven decisions to optimize your CloudFront deployment.

Ready to Unleash the Power of Your Logs?

Get started with AWS Athena today and unlock the hidden potential within your CloudFront logs. With its intuitive interface and serverless architecture, Athena empowers you to transform data into actionable insights for a faster, more performant CDN experience.

]]>
https://blogs.perficient.com/2024/05/22/unleash-the-power-of-your-cloudfront-logs-analytics-with-aws-athena/feed/ 1 362976
IICS Micro and Macro Services https://blogs.perficient.com/2024/04/26/iics-micro-and-macro-services/ https://blogs.perficient.com/2024/04/26/iics-micro-and-macro-services/#respond Fri, 26 Apr 2024 13:34:44 +0000 https://blogs.perficient.com/?p=362102

 

Macros in IICS

 

Informatica IICS: An expression macro is a useful technique for creating complex or repetitive expressions in mappings. It makes it possible to perform computations over various fields or constants by defining a collection of related expressions, so that the same computation can be applied to several input fields.

Steps to Use Macros:

  1. Log in to your Informatica Cloud account and open the Data Integration service.
    Iics 1
  2. Now, create a new mapping: click New in the navigation pane, select Mapping, and click Create.
    Iics 2
  3. Select source and target Objects in Source and Target Transformations.
    Iics 3
  4. Now, Create an Expression Transformation in IICS mapping between Source and Target.
    Iics 4
  5. Click the “+” icon in Expression Transformation to create an input macro field. Then, choose “Input_Macro_Field” as the field type, as shown below.
    Iics 5
  6. After generating the input macros, configure the port according to the requirements (that is, whether we wish to apply the same logic or condition to all fields or just a few specific fields), as indicated below.
    Iics 6
  7. Create an additional field in the same way as before, but this time choose “Output_Macro_Field” as the field type for the output macro, choose the data type, and set the precision to “Max” in order to avoid data truncation, as shown below.
    Iics 7
  8. Configure your macro expression in the output macro.
  9. For example, we had to apply the LTRIM and RTRIM functions and set all blank values to null. However, attempting to validate this expression resulted in the error: ‘This expression cannot be validated because it uses macro input fields’. So avoid clicking the Validate button.
    Iics 8
  10. Navigating to the target, you will see an additional incoming field from the expression, “%Input_Macro%_out”, as shown below.
    Iics 9
  11. In Target Transformation, choose ‘Completely Parameterized‘ under field mapping, and then create a new parameter as indicated below.
    Iics 10
  12. Now save the mapping.
  13. Create a Mapping Configuration Task (MCT), select the runtime environment, and click Next.
  14. Map all the fields with the suffix “_out” so that the expression logic defined in the expression macros is applied, as shown below.
    Iics 11
  15. Click “Finish” and run the MCT to complete the mapping.

]]>
https://blogs.perficient.com/2024/04/26/iics-micro-and-macro-services/feed/ 0 362102