Posts Tagged ‘Spark’

Hadoop Ecosystem Components

The Hadoop ecosystem is a platform or suite that provides various services to solve big data problems. It includes Apache projects as well as various commercial tools and solutions. The four major elements of Hadoop are HDFS, MapReduce, YARN, and Hadoop Common. Hadoop is a framework that enables the processing of large data sets which […]

Deep Dive into Databricks Tempo for Time Series Analytics

Time-series data has typically been fit imperfectly into whatever database we were using at the time for other tasks. There are time series databases (TSDBs) coming to market. TSDBs are optimized to store and retrieve associated pairs of times and values. A TSDB’s architecture focuses on time-stamped data storage and the compression, summarization, and life-cycle management […]

Koalas are better than Pandas (on Spark)

I help companies build out, manage, and hopefully get value from large data stores. Or at least, I try. In order to get value from these petabyte-scale data stores, I need the data scientists to be able to easily apply their statistical and domain knowledge. There’s one fundamental problem: large datasets are always distributed and data […]

Key Components/Calculations for Spark Memory Management

Different organizations have different needs for cluster memory management. For that reason, there is no single set of recommendations for resource allocation. Instead, the allocation can be calculated from the available cluster resources. In this blog post, I will discuss best practices for YARN resource management with the optimum distribution of memory, executors, and cores for […]

Tune the dials to optimize your Spark machine learning pipeline

Tuning Spark for your machine learning pipeline can be a complex and time-consuming process. Storage and compute play different roles for your Spark cluster at different stages of your machine learning pipeline. Spark defaults are never the right way to go. It makes more sense to know what settings are most effective at […]

5 Oracle Analytics Trends to Watch Out for Starting Now

“Catching up” is the term that came to mind when I used to check out what’s new with Oracle Analytics in previous years. This year, however, I can frankly say I was impressed with what I saw at Oracle Open World last week. The rules of the analytics platform game have changed tremendously. This is after […]

Spark as ETL

Introduction: In general, the ETL (Extraction, Transformation, and Loading) process is implemented through ETL tools such as Datastage, Informatica, AbInitio, SSIS, and Talend to load data into the data warehouse. The same process can also be accomplished programmatically, with a framework such as Apache Spark, to load the data into the database. Let’s see how it […]

Top 5 Lessons of Day 1 at Hadoop Summit #HS16SJ

Perficient is at the Hadoop Summit in San Jose, CA, and we’re tracking the best of the conference. Here are the top 5 lessons from day 1: Apache Atlas for managing your business catalog is almost ready for prime time! It is not, however, ready to be a full-fledged Records Management solution (no policy management, […]

Big Data and You: DataOps

Welcome to “Big Data and You (the enterprise IT leader),” the Enterprise Content Intelligence group’s demystification of “Big Data.” The often-missing piece of the infrastructure-as-code movement emerging from the DevOps space is what we think of as DataOps. Big Data technologies are uniquely poised to fill this gap because they […]

The Year in Review | Top 10 EIS Posts of 2015

It’s been a busy year in the Enterprise Information Systems space. With over 75 posts this year, our in-house experts found themselves face to face with big changes and an abundance of great information to share. We sifted through that content and present to you the Top 10 EIS posts of 2015. Ten | […]

Time Well Spent in 2015

The end of 2015 is fast approaching, with December looming just a week away. For most people, December is packed with the hustle and bustle of last-minute gift shopping, or end-of-year projections and budgets for 2016. Often in the sway of all this activity, many are so focused on the approaching New Year that they […]

SparkR for Data Scientists

Although the title Data Scientist is not mentioned as often as other IT job titles, it has been in the IT world for a while and is becoming more important with the popularity of the Internet and eCommerce. What kind of skills should a data scientist have? It could be a long list, but I […]
