David Callaghan, Solutions Architect

As a solutions architect with Perficient, I bring twenty years of development experience, and I'm currently hands-on with Hadoop/Spark, blockchain and cloud, coding in Java, Scala and Go. I'm certified in and work extensively with Hadoop, Cassandra, Spark, AWS, MongoDB and Pentaho. Most recently, I've been bringing integrated blockchain (particularly Hyperledger and Ethereum) and big data solutions to the cloud, with an emphasis on integrating modern data products such as HBase, Cassandra and Neo4j as the off-blockchain repository.


Blogs from this Author


DataOps with IBM

DataOps seeks to deliver high quality data fast in the same way that DevOps delivers high quality code fast. The names are similar; the goals are similar; the implementation is very different. Code quality can be measured using similar tools across multiple projects. Data quality is a mission-critical, enterprise-wide effort. The effort has consistently proven […]

Trust models in distributed ledgers

Consensus, getting distributed processes to agree on a single value, is a fundamental problem in computer science. Distributed processing is difficult. In fact, there are logical proofs that show pretty conclusively that there won’t be a single perfect algorithm for handling consensus in an asynchronous system made of imperfect nodes. As long as there is […]

Understanding Performance in Blockchain Systems

Blockchain is an example of a distributed ledger system and, as such, shares the same performance concerns as any other distributed system. In order to measure the performance of a distributed system with an acceptable degree of accuracy, it's best to simplify as many of the variables under our control as possible. The size of the […]


Take advantage of windows in your Spark data science pipeline

Windows can perform calculations across a certain time frame around the current record in your Spark data science pipeline. Windows are SQL functions that allow you to access data before and after the current record to perform calculations. They can be broken down into ranking functions, analytic functions and aggregate functions. Spark provides the […]
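To make those three families concrete without a cluster, here is a minimal plain-Python sketch of what a window computes over `PARTITION BY user ORDER BY day`. It is an illustration of the semantics, not Spark code: the sample rows and the `apply_window`/`rank_desc` helpers are hypothetical. In PySpark the same columns would come from `rank()` over a window ordered by amount descending, `lag("amount", 1)` over `Window.partitionBy("user").orderBy("day")`, and `sum("amount")` over that spec with `.rowsBetween(-1, 0)`.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows: (user, day, amount)
ROWS = [
    ("alice", 1, 10.0),
    ("alice", 2, 30.0),
    ("alice", 3, 30.0),
    ("bob", 1, 5.0),
    ("bob", 2, 15.0),
]


def rank_desc(values):
    """Ranking function: like SQL rank() ordered by value descending.

    A row's rank is 1 + the number of rows in the partition with a strictly
    larger value, so ties share a rank and the next rank leaves a gap.
    """
    return [1 + sum(1 for other in values if other > v) for v in values]


def apply_window(rows):
    """Emulate window functions partitioned by user and ordered by day.

    Returns (user, day, amount, rank, prev_amount, rolling_sum) per row, where
      - rank is the ranking function over amount (descending),
      - prev_amount is the analytic lag(amount, 1), None for the first row,
      - rolling_sum is an aggregate over the frame [previous row, current row].
    """
    out = []
    # Sorting makes each partition contiguous and applies the ORDER BY day.
    ordered = sorted(rows, key=itemgetter(0, 1))
    for _user, group in groupby(ordered, key=itemgetter(0)):
        part = list(group)
        amounts = [r[2] for r in part]
        ranks = rank_desc(amounts)
        for i, (user, day, amount) in enumerate(part):
            prev = amounts[i - 1] if i > 0 else None
            frame = amounts[max(0, i - 1): i + 1]  # rows between -1 and 0
            out.append((user, day, amount, ranks[i], prev, sum(frame)))
    return out


if __name__ == "__main__":
    for row in apply_window(ROWS):
        print(row)
```

Note that the window never collapses rows the way a `GROUP BY` would. Every input row comes back with extra columns computed from its neighbours in the same partition.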


Bringing Informatica Intelligent Cloud Service into your Release Management Pipeline

Informatica Intelligent Cloud Services (IICS) now offers a free command line utility that can be used to integrate your ETL jobs into most enterprise release management pipelines. It’s called the Asset Management command line interface (CLI). Version two now allows you to extract an IICS job into a single compressed file. Moving a single standalone […]

Scale your data science practice formally

Frequently, the “crawl, walk, run, fly” metaphor is used when describing the path to implementing a scalable data science practice. There are a lot of problems with this concept, not the least of which is the fact there is already motion involved. People are already doing BI work, often complex work enabling high value results. […]


Tune the dials to optimize your Spark machine learning pipeline

Tuning Spark for your machine learning pipeline can be a complex and time-consuming process. Storage and compute play different roles for your Spark cluster in different stages of your machine learning pipeline. Spark defaults are never the right way to go. It makes more sense to know what settings are most effective at […]


Big Data Bootcamp by the Beach: Getting Started Smart

In the first post in this series, I talked about giving a Big Data Bootcamp in the Dominican Republic to a large group of very smart students. In this post, I’ll go over the basic tools and techniques that I think are most relevant in the job market. These are basic tools that most are […]


Big Data Bootcamp by the Beach: An introduction

This is a little story about "nothing ventured, nothing gained." One day, I got a LinkedIn message asking if I would like to teach a Big Data Bootcamp at an event for the Universidad Abierta Para Adultos in Santiago de Caballeros, República Dominicana. Luis didn't know me; he just saw my profile and saw that I've been […]

Respect Driven Development

Respect Driven Development is not an attempt to add to the Agile alphabet soup of X Driven Development like TDD, BDD, DDD or FDD. I consider it to be more of an attitude than a process. The idea started to germinate the more I worked on Big Data projects, which have not always turned out to […]

How to Make Puerto Rico Your New Crypto Home

Is Puerto Rico a haven for crypto-currency? Is Act 20 right for my business? Do I have to move? What are the tax benefits? What does it take to qualify for tax benefits? What is the process? Identifying the steps to becoming a bona fide resident of Puerto Rico as an individual and/or a company is […]

5 Steps to Modernize Your Mainframe with Pair Programming

It seems like once I stopped mainframe coding in COBOL and RPG and moved to Java, I have been continuously involved in mainframe retirement projects. There may be no bigger and slower moving pending disaster than the state of the corporate mainframe. The mainframe itself stands in stark contrast; they are better and faster than […]
