Hadoop Articles - Perficient Blogs

Posts Tagged ‘Hadoop’

How to Load Log Data into HDFS using Flume

Data acquisition is a very important part of building a big data ecosystem. It lets you extract data from various sources, such as files, databases, streams, and web pages. If you are just setting up a local environment rather than working in a real business scenario, you can handle data acquisition by making use […]

2 Choices for Big Data Analysis on AWS: Amazon EMR or Hadoop on EC2

What are the key differentiators when choosing a Hadoop distribution for Big Data analysis on AWS? We have two choices: Amazon EMR or a third-party Hadoop distribution (e.g., core Apache Hadoop, Cloudera, MapR). Yes, cost is important. But aside from cost, other things to consider include ease of operation, control, management, performance, and features. 1. Cost […]

The Year in Review | Top 10 EIS Posts of 2015

It’s been a busy year in the Enterprise Information Systems space. With over 75 posts this year, our in-house experts found themselves face to face with big changes and an abundance of great information to share. We sifted through that content and present to you the Top 10 EIS posts of 2015.   Ten | […]

Time Well Spent in 2015

The end of 2015 is fast approaching, with December looming just a week away. For most people, December is packed with the hustle and bustle of last-minute gift shopping, or end-of-year projections and budgets for 2016. Often, amid all this activity, many are so focused on the approaching New Year that they […]

Dorothy in the Land of Big Data

Big Data is one of the enabling technologies that let companies digitally transform their operations, their customer interactions, or both. However, the open source world can be complicated, especially in the red-hot Big Data arena. There are a myriad of technologies; some compete with one another, others overlap, some are complementary, and worst of all, […]

Hadoop, Spark, Cassandra, Oh My!

Previously, I reviewed why Spark will not by itself replace Hadoop, but Spark combined with other data storage and resource management technologies creates other options for managing Big Data. Today we will investigate how an enterprise should proceed in this new, “Hadoop is not the only option” world. Hadoop, Spark, Cassandra, Oh My! Open source Hadoop and […]

IBM’s Spark Investment is Evidence Big Data is Dead

Right after I posted my blog on Spark and Hadoop, I came across this article. IBM made a big announcement that it is putting its weight behind Spark, committing more than 3,500 developers and programmers to help move Spark forward. This, combined with significant support from the Big 3 Hadoop distributors (Hortonworks, Cloudera, […]

Webinar: Big Data & Microsoft, Key to Your Digital Transformation

Companies undergoing digital transformation are creating organizational change through technologies and systems that enable them to work in ways that are in sync with the evolution of consumer demands and the state of today’s marketplace. In addition, companies are relying more and more on data to help make business decisions. And when it comes […]

Analytical Talent Gap

As companies embark on digital transformation leveraging Big Data, key concerns and challenges are amplified, especially in the near term, before the supply of technology and talent adjusts to the demand. In the earlier post, Big Data Challenges, the top 3 concerns were: identifying the business value/monetizing the Big Data; setting up the […]

Big Data Changes Everything – Has Your Governance Changed?

A few years ago, Big Data/Hadoop systems were generally a side project, used either for storing bulk data or for analytics. But now, as companies have pursued a data unification strategy leveraging the Next Generation Data Architecture, Big Data and Hadoop systems are becoming a strategic necessity in the modern enterprise. Big Data and Hadoop are technologies […]

Hadoop’s Ever-Increasing Role

With the advent of Splice Machine and the release of Hive 0.14, we are seeing Hadoop’s role in the data center continue to grow. Both of these technologies support limited transactions against data stored in HDFS. Now, I would not suggest moving your mission-critical ERP systems to Hive or Splice Machine, but the support of […]

Defining Big Data Prototypes – part 2

In part 1 of this series, we discussed some of the most common assumptions associated with Big Data Proof of Concept (POC) projects. Today, we’re going to begin exploring the next stage in Big Data POC definition: “The What.” The ‘What’ for Big Data has gotten much more complicated in recent years, and now […]

One Cluster To Rule Them All!

In the Hadoop space, we have a number of terms for the Hadoop File System used for data management. Data Lake is probably the most popular. I have heard it called a Data Refinery, as well as some other not-so-mentionable names. The one that has stuck with me is the Data […]

The Modern Data Warehouse Will Augment Hadoop

The data warehouse has been a part of the EIM vernacular for nearly 20 years. The vision of a single source of truth and a single repository for reporting and analysis are two objectives that have resulted in a never-ending journey. The data warehouse has never had enough data and the quality required for […]

Data Staging and Hadoop

Traditionally, our information architectures have included a number of staging or intermediate data storage areas and systems. These have taken different forms over the years: publish directories on source systems, staging areas in data warehouses, data vaults, or, most commonly, data file hubs. In general, these data file staging solutions have suffered from two […]

Get R Running over YARN-based MapReduce

Among mathematical and statistical languages and tools such as SAS, SPSS, and MATLAB, R is a pretty good tool that provides the environment and essential packages for statistical computing and graphics. It is free, and it offers an open environment and the means for users to develop custom packages. In addition to […]

A little stuffed animal called Hadoop

Doug Cutting – Hadoop creator – is reported to have explained how the name for his Big Data technology came about: “The name my kid gave a stuffed yellow elephant. Short, relatively easy to spell and pronounce, meaningless, and not used elsewhere: those are my naming criteria.” The term, of course, evolved over time and […]

Webinar Recap: The Modern Data Warehouse – A Hybrid Story

Last week, we held a webinar, The Modern Data Warehouse – A Hybrid Story. As the world of data evolves ever more quickly, it transforms the industry and creates a need for new approaches to business intelligence. Data warehousing technology that worked well for years, serving its purpose to manage and understand business-driven data, […]

SAP HANA and Hadoop – complementary or competitive?

In my last blog post, we learned about SAP HANA… or, as I called it, “a database on steroids”. Here is what former SAP CTO and Executive Board member Vishal Sikka told InformationWeek: “Hana is a full, ACID-compliant database, and not just a cache or accelerator. All the operations happen in memory, but every transaction […]

Data Warehouse Role in Big Data

Last year, the Data Warehouse was on the endangered species list. A number of Hive solutions were being marketed as Data Warehouse killers. However, this message has been muted this year, as evidenced by some developing trends. First, all of the mega-vendors have announced technologies to access data in Hadoop. Oracle and IBM […]