So, the already convoluted Open Source Hadoop ecosystem just got a little more complicated with Kudu joining the Elephant at #StrataHadoop. Advocates of Fast Analytics on Fast Data at Scale also just got more excited about the potential for fast writes, fast updates, fast reads, fast everything – all with Kudu! Cloudera’s Kudu is designed to fill major gaps in Hadoop’s storage layer, especially with regard to Fast Analytics, but it is not meant to replace or disrupt (just yet!) HBase or HDFS. Instead, Kudu is meant to complement those storage engines and run alongside them, because some applications may get more benefit out of HDFS or HBase.
Before the official release of this news, VentureBeat speculated about Kudu’s possible implications for the Big Data industry. It “could present a new threat to data warehouses from Teradata and IBM’s PureData … It may also be used as a highly scalable in-memory database that can handle massively parallel processing (MPP) workloads, not unlike HP’s Vertica and VoltDB.”
Whatever the long-term implications of Kudu, the above scenarios are not going to play out any time soon. Maturity is still what most enterprises crave in this rather diverse Open Source ecosystem, and Kudu, despite all its excitement, has a long way to go on that front.
Recovery of data in Essbase is sometimes confusing and time consuming depending on the type and frequency of backups previously taken. In this article, I’ll give you an idea of how you can back up your Essbase data for the purpose of restoring a slice of the database.
Scenario: I am running a daily export of level 0 data from my Essbase database. One of my Planners called and said he submitted this month’s data to last month’s point of view, thus overwriting a Forecast version that was still valid.
Recovery Solution 1: My first option is to import my level 0 data export and notify the Planners that they need to re-submit any data entered since that last backup. The problem with this scenario is that not everyone who entered data may be available to re-enter it.
Recovery Solution 2: Create a copy of the database, load the last level 0 backup, export the point of view affected by the accidental submission, then load that into my production database. This solution takes a bit longer to build because it involves the creation of a new Essbase application and database as well as calculation script development; however, we can surgically specify the data to restore based on the point of view of the process that broke it to begin with.
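For the surgical export in Solution 2, a calculation script using DATAEXPORT can restrict the extract to the affected point of view. Here is a minimal sketch, assuming the overwritten slice was a Forecast version for a specific month; the member names and file path are hypothetical and would need to match your outline:

```
/* Export only the affected POV from the restored database copy */
SET DATAEXPORTOPTIONS
{
  DataExportLevel "LEVEL0";
};
FIX ("Forecast", "FY15", "Jul")
  DATAEXPORT "File" "," "/backups/forecast_slice.txt" "#MI";
ENDFIX
```

The resulting file can then be loaded into the production database to restore just that slice.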
In 220.127.116.11 and later, Essbase performs a backup of the Essbase.sec file every 300 seconds and retains a default of two backups. This means that at any given time, you have two backups of the Essbase security file that are at most 10 minutes old.
In my experience, corruption to the Essbase security file is not identified within a 10-minute period, so as a matter of habit, I set the following in the Essbase configuration file (essbase.cfg):
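A sketch of what I mean, using the essbase.cfg settings that control security file backups (SECFILEBACKUPINTERVAL and NUMBEROFSECFILEBACKUPS); the specific values shown are my own choices, sized so that 10 backups at 8,640-second intervals cover a full day:

```
SECFILEBACKUPINTERVAL 8640
NUMBEROFSECFILEBACKUPS 10
```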
The interval is in seconds. This ensures that at any given time, I have 10 backups spanning across a 24-hour period. I will determine the Essbase security file is corrupted easily within a few hours of symptoms occurring and can roll it back several hours if necessary.
This is a generic error, and it may occur due to any of the following reasons:
To correct the IE timeout issues, back up the Windows registry on the client machine, then perform the following:
RESTRUCTURETHREADS is a setting you can add to your Essbase.cfg file to enable parallel restructuring of BSO databases. The proper syntax, taken from the Essbase Technical Reference, is:
RESTRUCTURETHREADS [ appname [ dbname] ] n
Application and database are optional parameters; if omitted, the setting applies to all BSO databases on the server. n specifies the number of threads to use for the parallel restructure.
I have seen tremendous improvements to dense restructures on systems with the horsepower to support this. I usually set the number of threads to half the total number of processor cores on the Essbase server. For example, if an Essbase server has four quad-core CPUs, I will set this to 8. I leave off the application and database when I have control over when the restructures will occur.
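Following that sizing rule, the server-wide setting for the sixteen-core example above would be a single line in essbase.cfg (leaving off the application and database so it applies everywhere):

```
RESTRUCTURETHREADS 8
```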
This error is often adjacent to the error:

“[Date & Time]: Starting Cube Create/Refresh…
[Date & Time]: Cannot restructure. There are other active users on database [%s]
[Date & Time]: An Exception occurred during Application deployment.:
The Hyperion Essbase operation failed with an error code: 1013101.”
The obvious cause is that someone has an exclusive lock on the BSO database, preventing Essbase from processing Planning’s request. The typical remedy is to wait it out; once the existing lock is released, we can attempt the refresh again.
The realization of competitive advantage has changed significantly over time. In the early days of IT automation, custom-built systems that handled the transactions of accounting or sales were a tremendous competitive advantage. Over time, building custom systems that handled all the transactions of a business provided competitive advantages to the leading companies who were able to perform the functions of their business more consistently, more completely, more accurately, and eventually more quickly.
These transactional systems eventually became Enterprise Resource Planning (ERP) systems. The first businesses that implemented these enterprise-wide systems were able to better manage and control the transactions that flowed through their business. Over time, order systems pointed to the need to track inventory as a series of transactions into and out of a warehouse. HR systems pointed to the need to treat hiring as transactions that began a series of other processes such as benefits, payroll, department assignments and access to secure systems. Inventory systems pointed to the need to build procurement systems that extended the enterprise into the systems of suppliers and vendors.
All of these systems provided a competitive advantage to the business, but as this trend reached its climax with the advent of ERP systems that provide 90% to 95% of the standard (out-of-the-box) best practices within each business department, the competitive advantage these systems once brought to companies simply became the baseline every company needs to stay in business.
The Oracle Learning Library is pretty vast. One recent addition is a video titled, “Exporting Financial Management Data into Essbase Using Data Management.” Here’s a URL that “shortcuts” to the chase: https://apexapps.oracle.com/pls/apex/f?p=44785:2:::NO:2,RIR,CIR:P2_TAGS:Hyperion
For a very well done primer on Hyperion Financial Management, check out this video: http://download.oracle.com/technology/products/hfm/demos/hfm1112overview/HFM_LessonIndex.htm. The video is geared towards someone new to HFM and covers high level topics such as Creating Applications and Loading and Entering Data.
I recently heard a question about the DRM Web API. Here’s a link to an Oracle by Example (OBE), which describes how to configure and deploy it: http://www.oracle.com/webfolder/technetwork/tutorials/obe/hyp/DRM11.1.2-WebServicesAPI/index.htm
“Not all readers are leaders, but all leaders are readers.” Harry S. Truman
Yesterday, Facebook announced a News Feed algorithm change that will show members more videos similar to ones they “expand to full-screen, un-mute or opt to watch in HD”, even if they don’t Like, share or comment. Those same indicators will tell Facebook that a video is enjoyable so the Feed shows it to more people.
Now that is Big Data on viewership, leveraging insights from viewer behavior patterns, and it could definitely give Facebook an edge on YouTube, the leading internet video giant.
Unlike Facebook, YouTube is not a feed. Many of Facebook’s 1 billion+ users come daily, sometimes even hourly, to view the latest from their friends. As a matter of fact, every clip you see is essentially a recommendation from someone you are connected to, and since the Feed is so popular, Facebook knows you will at some point see these videos. All it has to do is figure out what you want to see! That, in turn, is getting a lot easier with big data-driven personalization engines.
Facebook can process highly specific data on viewer behavior to learn what each individual member wants to see. How many seconds did people watch the video for? Was it visually stimulating? Multiply that by 4 billion videos per day in an IoT world where Facebook knows that a video popular with some people will probably be interesting to people similar to them based on all the biographical and behavior data it has.
So, now all of a sudden, it becomes quite clear why and how data-driven personalization could lead to a Feed full of the exact videos you may want to see.
These and other similar enhancements will be made over a period of time, and it remains to be seen whether Facebook is trying to usher us into an era of diminished Tube popularity or simply making the Feed more fun for its members.
Data modeling in Cassandra is a little tricky and requires a combination of science and art. Think of a Cassandra column family as a map of a map: an outer map keyed by a row key and an inner map keyed by a column key. Both maps are sorted. To maximize Cassandra’s capabilities and ease long-term maintenance, it’s better to analyze, understand, and follow certain high-level rules while implementing Cassandra.
A few things to consider while implementing Cassandra:
In Cassandra, you have clusters and nodes; you want to make sure that during writes, data is written evenly across all cluster nodes. Rows are spread around the cluster based on a hash of the partition key, which is the first element of the PRIMARY KEY. To increase read efficiency, make sure that data is read from as few nodes as possible.
A Cassandra stress test below, with the consistency level set to ALL and then ONE, shows why it’s better to read from as few nodes as possible.
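A sketch of how such a comparison can be run with the cassandra-stress tool that ships with Cassandra; this requires a running cluster, and the node address here is a placeholder:

```
# Seed some data, then read it back at two consistency levels
cassandra-stress write n=100000 -node 127.0.0.1
cassandra-stress read n=100000 cl=ALL -node 127.0.0.1
cassandra-stress read n=100000 cl=ONE -node 127.0.0.1
```

At cl=ALL every replica must respond before the read returns; at cl=ONE a single replica suffices, which is why reading from fewer nodes shows lower latency.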
Isolate clusters by functional area and criticality. Use cases with similar criticality from the same functional area can share a cluster while residing in different keyspaces (databases). Determine your queries first, and build the model based on those queries: think through the query patterns up front and design the column families ahead of time. Another reason to follow this rule is that, unlike a relational database, Cassandra does not make it easy to tune for or introduce new query patterns later. In other words, you can’t just add a complex SQL statement (T-SQL, PL/SQL, etc.) or a secondary index to Cassandra because of its highly distributed nature.
At a high level, below are some of the things you need to do when determining your query patterns:
Create your queries to read from one partition. Keep in mind that your data is replicated to multiple nodes, so you can create individual queries that read from different partitions. When a query reads from multiple nodes, it has to go to each individual node to get the data, and this takes time; when it gets the data from one node, it saves time.
An example would be the CREATE TABLE statements below:
CREATE TABLE users_by_email (
  name VARCHAR,
  dob TIMESTAMP,
  email VARCHAR,
  join_date TIMESTAMP,
  PRIMARY KEY (email)
);

CREATE TABLE users_by_join_date (
  name VARCHAR,
  dob TIMESTAMP,
  email VARCHAR,
  join_date TIMESTAMP,
  PRIMARY KEY (join_date, email)
);
The statements above create tables that enable you to read from one partition; in users_by_email, each user gets their own partition.
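Given the users_by_email table, a single-partition read might look like this (the email value is hypothetical):

```
SELECT name, join_date
FROM users_by_email
WHERE email = 'jane@example.com';
```

Because email is the partition key, Cassandra can route this query straight to the replicas holding exactly one partition.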
If you are trying to fit a group into a partition, you can use a compound PRIMARY KEY, as in this example:
CREATE TABLE groups (
  groupname text,
  username text,
  email text,
  join_date int,
  PRIMARY KEY (groupname, username)
);
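With groupname as the partition key and username as the clustering column, all members of a group live in one partition, so the whole group can be fetched with a single-partition query (the group name is hypothetical):

```
SELECT username, email
FROM groups
WHERE groupname = 'engineering';
```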