David Callaghan, Solutions Architect

As a solutions architect with Perficient, I bring twenty years of development experience and I'm currently hands-on with Hadoop/Spark, blockchain and cloud, coding in Java, Scala and Go. I'm certified in and work extensively with Hadoop, Cassandra, Spark, AWS, MongoDB and Pentaho. Most recently, I've been bringing integrated blockchain (particularly Hyperledger and Ethereum) and big data solutions to the cloud with an emphasis on integrating Modern Data products such as HBase, Cassandra and Neo4j as the off-blockchain repository.

Blogs from this Author

Protect PII with anonymized datasets for Data Scientists with differential privacy

Businesses and organizations now hold more personal information than ever before. Storing large amounts of structured and unstructured data may be useful in a variety of ways, such as reporting and analytics, but it might expose PII that is linked to the data being analyzed. As organizations are increasingly under pressure to comply with data privacy […]

Beyond Encryption: Protect sensitive data using k-anonymity

Businesses and organizations now hold more personal information than ever before. Storing a lot of data may be useful in a variety of ways, such as reporting and analytics, but it might expose PII that is linked to the data being analyzed. When data is being transmitted or stored, encryption is useful for protecting it, whereas […]

Real-time Retail with Databricks Lakehouse Accelerators

Databricks has announced Lakehouse for Retail, a collection of more than twenty free, open-source Retail Solution Accelerators. Solution accelerators are tools that help companies construct solutions for their data and AI problems. They can be used to show the feasibility of a prototype and then the business can use that as support for […]

Best practices for securing Snowflake

Understanding best practices for securing Snowflake and having a concrete implementation plan is a critical Day Zero deliverable. Snowflake is a secure, cloud-based data warehouse. There is no hardware, and virtually no software, to select, install, configure, or maintain. Snowflake takes care of ongoing maintenance and […]

HIPAA compliance with Redshift

At Perficient, our Data Solutions team has worked closely with our Healthcare division to implement Redshift for HIPAA and HITECH compliance. Snowflake offers healthcare organizations a secure data warehouse environment with many HIPAA compliance features. Perficient’s implementation team includes Snowflake and health industry subject matter experts. We’ll take a look at Snowflake’s benefits for healthcare providers […]

HIPAA compliance with Snowflake

At Perficient, our Data Solutions team has worked closely with our Healthcare division to implement Snowflake for HIPAA and HITECH compliance. Snowflake offers healthcare organizations a secure data warehouse environment with many HIPAA compliance features. Perficient’s implementation team includes Snowflake and health industry subject matter experts. We’ll take a look at Snowflake’s benefits for healthcare providers […]

It’s good that Spark Security is turned off by default

Security in Spark is OFF by default, which means you are fully responsible for security from Day One. Spark supports a variety of deployment types, each with its own set of security levels. Not all deployment types are safe in every scenario, and none is secure by default. Take the time to analyze your situation, […]

Four tips to solve harder data science problems with Jupyter Notebooks

Jupyter notebooks are versatile tools that data scientists can use for a variety of purposes. In this article, we will explore four ways that Jupyter notebooks can be used to improve your data science workflow. We will discuss how Jupyter notebooks can be used to learn new programming languages, document your code, debug code, and […]

8 Ways Data Scientists Can Optimize Their Parquet Queries

Some data formats are columnar, meaning they store information by column rather than by row. They are popular because they serve certain types of queries more easily than row-based formats. Parquet supports parallel query processing, meaning it can split up your data into several files in order to read in multiple processors at […]

Finding the right balance of nOps

There is a proliferation of acronyms with the Ops suffix for the software architect to choose from. It’s reasonable to question whether so many are necessary. All of these are, at the core, targeted expressions of a foundational business management methodology. The end goal will be continuous improvement in some business critical metric. […]

Adopting a Risk-Based Strategy for Data

Ransomware attacks have been in the news lately, possibly because of the 225% increase in total losses from ransomware in the United States alone in 2020. An increase in sophistication by attackers is a major factor, and many of these ransomware attacks were enabled at least in part by insider negligence. As the level of […]

Deep Dive into Databricks Tempo for Time Series Analytics

Time-series data has typically been fit imperfectly into whatever database we were using at the time for other tasks. Time series databases (TSDBs) are now coming to market. TSDBs are optimized to store and retrieve associated pairs of times and values. A TSDB’s architecture focuses on time-stamp data storage and the compression, summarization and life-cycle management […]
