Understanding best practices for securing Snowflake and having a concrete implementation plan is a critical Day Zero deliverable. Snowflake is a secure, cloud-based data warehouse: there are no hardware or software components to select, install, configure, or maintain. Snowflake takes care of ongoing maintenance and […]
David Callaghan – Solutions Architect
As a solutions architect with Perficient, I bring twenty years of development experience and I'm currently hands-on with Hadoop/Spark, blockchain, and cloud, coding in Java, Scala, and Go. I'm certified in and work extensively with Hadoop, Cassandra, Spark, AWS, MongoDB and Pentaho. Most recently, I've been bringing integrated blockchain (particularly Hyperledger and Ethereum) and big data solutions to the cloud, with an emphasis on integrating modern data products such as HBase, Cassandra and Neo4J as the off-blockchain repository.
Connect with David
Blogs from this Author
HIPAA compliance with Redshift
At Perficient, our Data Solutions team has worked closely with our Healthcare division to implement Redshift for HIPAA and HITECH compliance. Redshift offers healthcare organizations a secure data warehouse environment with many HIPAA compliance features. Perficient’s implementation team includes Redshift and health industry subject matter experts. We’ll take a look at Redshift’s benefits for healthcare providers […]
HIPAA compliance with Snowflake
At Perficient, our Data Solutions team has worked closely with our Healthcare division to implement Snowflake for HIPAA and HITECH compliance. Snowflake offers healthcare organizations a secure data warehouse environment with many HIPAA compliance features. Perficient’s implementation team includes Snowflake and health industry subject matter experts. We’ll take a look at Snowflake’s benefits for healthcare providers […]
It’s good that Spark Security is turned off by default
Security in Spark is OFF by default, which means you are fully responsible for security from Day One. Spark supports a variety of deployment types, each with its own set of security levels. Not all deployment types are safe in every scenario, and none is secure by default. Take the time to analyze your situation, […]
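Because nothing is secured out of the box, hardening a cluster starts with explicitly turning these features on. The sketch below shows the kinds of `spark-defaults.conf` properties involved; the property names come from Spark's security configuration, but the values are placeholders, not recommendations for any particular deployment.

```properties
# Illustrative hardening entries for spark-defaults.conf.
# Values are placeholders; review Spark's security docs for your deployment type.
spark.authenticate              true
spark.authenticate.secret       <shared-secret>
spark.network.crypto.enabled    true
spark.io.encryption.enabled     true
spark.ssl.enabled               true
```

Note that each deployment type (standalone, YARN, Kubernetes) distributes secrets differently, which is part of why no single default configuration can be secure everywhere.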
Four tips to solve harder data science problems with Jupyter Notebooks
Jupyter notebooks are versatile tools that data scientists can use for a variety of purposes. In this article, we will explore four ways that Jupyter notebooks can be used to improve your data science workflow. We will discuss how Jupyter notebooks can be used to learn new programming languages, document your code, debug code, and […]
8 Ways Data Scientists Can Optimize Their Parquet Queries
Some data formats are columnar, meaning they store information in columns rather than rows. They are popular because certain types of queries run more efficiently against them than against row-based formats. Parquet supports parallel query processing: it can split your data across several files so it can be read by multiple processors at […]
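The columnar advantage can be illustrated with a toy sketch in plain Python (the data and names here are hypothetical, just to show the access pattern; Parquet's actual on-disk layout adds row groups, encoding, and compression on top of this idea):

```python
# Toy illustration of row-based vs. columnar layout.
# A columnar layout lets a query touch only the columns it needs.

rows = [  # row-based: each record stored together
    {"user": "a", "bytes": 10, "region": "us"},
    {"user": "b", "bytes": 30, "region": "eu"},
    {"user": "c", "bytes": 20, "region": "us"},
]

columns = {  # columnar: each field stored contiguously
    "user": ["a", "b", "c"],
    "bytes": [10, 30, 20],
    "region": ["us", "eu", "us"],
}

# Summing one column in a row store scans every full record...
total_row_store = sum(r["bytes"] for r in rows)

# ...while a column store reads just the one list it needs.
total_col_store = sum(columns["bytes"])

print(total_row_store, total_col_store)  # 60 60
```

Both layouts give the same answer, but the columnar scan never deserializes the `user` or `region` fields, which is why analytic aggregations over a few columns of a wide table are so much cheaper in Parquet.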
Finding the right balance of nOps
There is a proliferation of acronyms with the Ops suffix for the software architect to choose from. It’s reasonable to question whether so many are needed. All of these are, at their core, targeted expressions of foundational business management methodology. The end goal is continuous improvement in some business-critical metric. […]
Adopting a Risk-Based Strategy for Data
Ransomware attacks have been in the news lately, possibly because of the 225% increase in total losses from ransomware in the United States alone in 2020. An increase in sophistication by attackers is a major factor, and many of these ransomware attacks were enabled at least in part by insider negligence. As the level of […]
Deep Dive into Databricks Tempo for Time Series Analytics
Time-series data has typically been fit imperfectly into whatever database we were using at the time for other tasks. Now time-series databases (TSDBs) are coming to market. TSDBs are optimized to store and retrieve associated pairs of times and values. A TSDB’s architecture focuses on time-stamped data storage and the compression, summarization and life-cycle management […]
Koalas are better than Pandas (on Spark)
I help companies build out, manage and hopefully get value from large data stores. Or at least, I try. In order to get value from these petabyte-scale datastores, I need the data scientists to be able to easily apply their statistical and domain knowledge. There’s one fundamental problem: large datasets are always distributed and data […]
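The appeal of Koalas (folded into Spark as `pyspark.pandas` since Spark 3.2) is that data scientists keep the pandas API while execution moves to the cluster. A minimal sketch, runnable here with plain pandas; on a Spark cluster the same code distributes after swapping only the import:

```python
import pandas as pd
# On Spark, the equivalent distributed version differs only in the import:
#   import pyspark.pandas as pd

# Tiny stand-in dataset; in practice this would be a petabyte-scale table.
df = pd.DataFrame({"store": ["a", "a", "b"], "sales": [100, 200, 50]})

# Familiar pandas groupby/aggregate; under pyspark.pandas this becomes
# a distributed Spark job instead of a single-machine computation.
totals = df.groupby("store")["sales"].sum().to_dict()
print(totals)  # {'a': 300, 'b': 50}
```

The one-line swap is the whole point: statistical and domain logic written against the pandas API doesn't have to be rewritten in Spark's DataFrame dialect to run on distributed data.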
DataOps with IBM
DataOps seeks to deliver high quality data fast in the same way that DevOps delivers high quality code fast. The names are similar; the goals are similar; the implementation is very different. Code quality can be measured using similar tools across multiple projects. Data quality is a mission-critical, enterprise-wide effort. The effort has consistently proven […]
Trust models in distributed ledgers
Consensus, getting distributed processes to agree on a single value, is a fundamental problem in computer science. Distributed processing is difficult. In fact, there are logical proofs that show pretty conclusively that there won’t be a single perfect algorithm for handling consensus in an asynchronous system made of imperfect nodes. As long as there is […]
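To make "agreeing on a single value" concrete, here is a toy single-round majority vote in plain Python. This is illustrative only: real consensus protocols such as Paxos or Raft must also survive message loss, delays, and node failures, which is exactly where the impossibility results for asynchronous systems bite.

```python
from collections import Counter

def majority_value(votes):
    """Return the value a strict majority of nodes proposed, or None.

    A None result models the case where the group cannot decide,
    which real protocols must resolve with further rounds.
    """
    value, count = Counter(votes).most_common(1)[0]
    return value if count > len(votes) / 2 else None

print(majority_value(["commit", "commit", "abort"]))  # commit
print(majority_value(["commit", "abort"]))            # None
```

Even this trivial version shows the core difficulty: with an even split (or with votes that never arrive), no decision is reached, and handling that gracefully is what separates a toy from a trust model.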