Data entry isn’t something most would put on a list of BI tools. But here’s my thought: I have yet to see a BI system of any size that doesn’t have some need for data entry of some kind, whether maintaining a small reference table or pulling the system’s configuration levers or correcting erroneous inbound data of some kind.
Usually most of these get handled by a DBA running a SQL script and voila! But, there’s all kinds of issues with that from a system security and stability standpoint. There’s no record of the change (excepting database logs that may or may not be parsable), business and quality rules are bypassed possibly leading to a corrupt state, and the DBA has been granted write level access to the production system (which I understand is OK in many places). On the upside, it’s cheap, flexible, and isn’t another tool to maintain.
But what if there was an easy, generalized way to make the necessary edits without bypassing the rules and check and audit trails?
The solution to this issue depends largely on the overall integration architecture deployed. If your architecture includes an ODS as the primary integration point, the answer may be a tool that either writes directly to the ODS or one that writes to the data interfaces surrounding the ODS (if they exist). In either case, the tool would only need to respect whatever standards are in place for ODS writes, probably updating a “last updated by” field or something.
That’s nice, but most data warehouses don’t benefit from an integrated ODS as the single or even primary source of data. So data would need to be written directly to the warehouse somehow. This means:
- Respecting SLAs that expect the system to be static and available (no transaction locks) during operating periods.
- Respecting model standards such as Type 2 dimension consistency, surrogate key generation, and foreign key links.
- Securing write access to the generally read-only data warehouse.
Basically, this sounds like a bad idea. But so does a DBA issuing an un-auditable UPDATE statement.
Unfortunately, there’s no silver bullet on this one. Without building a whole parallel interface for data warehouse writes the best advice I can give is to allow a limited audience to make updates to the staging tables that will load the warehouse. This respects the SLAs, doesn’t break the load processes, and ensures consistency is maintained. It doesn’t address the security issue, but this can probably be handled for a small group of eligible editors through table level security.
I’m interested in more comments on this idea, so please drop a comment below if you’ve seen or can think of a more elegant solution here.