

ELT vs ETL: Data Warehousing

Introduction:

The cost of any Data Management (DM) or Business Intelligence (BI) project depends on the architecture chosen for it. ETL is the traditional method that has been followed for many years, but given the technologies and toolsets now available, a newer approach can be implemented: ELT may challenge the established ETL method and prove to be better. The choice needs to be made based on the objectives set for the project and by weighing the strengths and weaknesses of both approaches.

What is ELT?

Data is “extracted” from the data sources into the “Staging Database” using a data extraction tool. Integrity and business-rule checks are applied, and the data is then loaded into the warehouse. The warehouse database performs all the transformations to reshape the data into its target output format.

The extract and load process is isolated from the transformation process, and no third-party tool is required for the transformation.
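The following Python sketch illustrates this split, assuming generic DB-API connections; the table names (orders, stg_orders, fact_daily_sales) and columns are hypothetical placeholders rather than part of any specific tool.

# Minimal ELT sketch: extract and load raw data, then let the
# warehouse engine perform the transformation in SQL.
# source_conn / warehouse_conn are assumed DB-API connections.

def extract_and_load(source_conn, warehouse_conn):
    # Extract: pull rows from the source as-is, with no reshaping in this step.
    src = source_conn.cursor()
    src.execute("SELECT order_id, customer_id, amount, order_date FROM orders")
    rows = src.fetchall()

    # Load: copy the raw rows straight into a staging table in the warehouse.
    tgt = warehouse_conn.cursor()
    tgt.executemany(
        "INSERT INTO stg_orders (order_id, customer_id, amount, order_date) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    warehouse_conn.commit()

def transform_in_warehouse(warehouse_conn):
    # Transform: the warehouse engine itself reshapes and aggregates the staged data.
    cur = warehouse_conn.cursor()
    cur.execute(
        "INSERT INTO fact_daily_sales (order_date, customer_id, total_amount) "
        "SELECT order_date, customer_id, SUM(amount) "
        "FROM stg_orders GROUP BY order_date, customer_id"
    )
    warehouse_conn.commit()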

Reasons to Choose ELT:

Flexibility for enhancements / Future requirements:

In ELT, the transformation process does not depend on the extract and load process. The data extracted and loaded is not limited to the current scope of requirements: data that could be required for future use can also be extracted and loaded, even though it is not used in the current transformation processing.

Reduced Risk:

As there is less dependency between the various stages, development can be isolated and the design implemented independently for each stage. This reduces risk and supports better maintenance and management.

No additional hardware / Skill set required:

The tools provided by the target database engine can be used to build the warehouse. The ELT process runs on the same hardware as the database engine hosting the warehouse, using the hardware already deployed.

As the functionality provided by the database engine is used, there is no need to invest in learning new tools, which further reduces the cost and risk of the development process.

Reasons Not to Choose ELT:

Against the Norm:

ELT is a newer approach to data warehouse design and development, and hence requires a change in mentality and design approach compared with the traditional method currently in use.

Availability of Tools:

As ELT is a newer approach, there are not many tools on the market that are as powerful as Teradata at performing transformations.

 

What is ETL?

Data is “extracted” from the data sources using a data extraction tool. It is then transformed using a series of transformation routines. This transformation process is largely dictated by the data format of the output. Data quality and integrity checking is performed as part of the transformation process in the data staging area. Finally, once the data is in the target format, it is then loaded into the data warehouse ready for presentation/reporting.

The process is often designed from the end backwards, in that the required output is designed first. This determines exactly what data is required from the source. The routines designed and developed to implement the process are written specifically to achieve the desired output, and only the data required for that output is included in the extraction process.

Business rules that define how aggregations are achieved, and the relationships between the various entities in both the source and the target, are designed into and coded into the routines that implement the ETL process. This approach leads to tight dependencies between the routines at each stage of the process.
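A minimal Python sketch of this flow follows, assuming generic DB-API connections; the table and column names, and the daily-sales aggregation used as the business rule, are hypothetical placeholders.

from collections import defaultdict

def etl(source_conn, warehouse_conn):
    # Extract: read only the columns the designed output requires.
    src = source_conn.cursor()
    src.execute("SELECT order_date, customer_id, amount FROM orders")

    # Transform: apply the business rule (here, a daily total per customer)
    # inside the ETL routine, outside the target database.
    totals = defaultdict(float)
    for order_date, customer_id, amount in src.fetchall():
        totals[(order_date, customer_id)] += amount

    # Load: write rows that are already in the target format.
    tgt = warehouse_conn.cursor()
    tgt.executemany(
        "INSERT INTO fact_daily_sales (order_date, customer_id, total_amount) "
        "VALUES (?, ?, ?)",
        [(d, c, t) for (d, c), t in totals.items()],
    )
    warehouse_conn.commit()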

 

Reasons to choose ETL:

Development Time:

Designing from the output backwards ensures that only data relevant to the solution is extracted and processed, potentially reducing development, extraction, and processing overhead, and therefore time.

Availability of Tools:

Many tools that implement ETL are available, giving the developer the opportunity to choose the one most appropriate for them.

Administration Overhead:

Since only data relevant to the solution is extracted, there is less data to manage and store. Hence the administration overhead is reduced.

Target database is not capable of performing transformations:

When the target database chosen to hold the data is not powerful enough to process the transformations, ETL is the safer choice. This is unlike an engine such as Teradata, which is extremely fast and efficient at ELT (Extract, Load, Transform) processing to aggregate and re-save data.

Faster Transformation processing:

The ETL tool transforms the data as it reads it from the source database, which can be faster than transformations performed in the target database, since the latter involve reading the loaded data back from disk and writing it out again.

ETL tools also allow parallel execution, which requires building a pipeline and splitting the data accordingly, thereby improving performance.
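As an illustration only, the sketch below splits extracted records into partitions and transforms them in separate worker processes; the record layout and the surcharge-style transformation are assumptions, not taken from any particular ETL tool.

from multiprocessing import Pool

def transform_partition(rows):
    # Apply the transformation rule to one partition of records
    # (here, an assumed surcharge applied to each amount).
    return [(order_id, amount * 1.2) for order_id, amount in rows]

def parallel_transform(rows, workers=4):
    # Split the extracted rows into roughly equal partitions.
    size = max(1, len(rows) // workers)
    partitions = [rows[i:i + size] for i in range(0, len(rows), size)]

    # Transform the partitions in parallel, then recombine them for loading.
    with Pool(workers) as pool:
        transformed = pool.map(transform_partition, partitions)
    return [row for part in transformed for row in part]

if __name__ == "__main__":
    sample = [(i, 100.0) for i in range(8)]
    print(parallel_transform(sample))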

Reasons not to choose ETL:

Less Flexibility for enhancements / Future requirements:

The data extracted from the source is entirely driven by the current output. If a new requirement arises in the future that needs data not included in the original design, that data has to be added to the ETL routines. Because of the tight dependencies between the routines, this often leads to a fundamental re-design and re-development, increasing the cost, time, and effort involved.

Additional Hardware/Tools and Skill set required:

Third-party tools are required to implement the ETL process, since the target system does not perform the transformations; this increases cost and risk.

Most third-party tools use their own engine to implement the ETL process. Regardless of the size of the solution, this can necessitate investment in additional hardware to run the tool's ETL engine.

The use of third-party tools to implement ETL processes requires learning new scripting languages and processes.

 

Summary

ELT makes sense when the target is a high-end data engine, such as a data appliance, Hadoop cluster, or cloud installation.

The ETL approach can provide drastically better performance in certain scenarios, provided a proper ETL pipeline is designed that allows for parallel execution. The training and development costs of ETL also need to be weighed against the need for better performance. If the target system is not powerful enough for ELT, then ETL may be more economical.
