Hi Data Virtuality
It would be very helpful in terms of execution performance and data structuring to be able to group distinct data sources that physically exist on a single RDBMS instance. That means that a DV-source is still mapped to a single source DB, but that there would be some kind of grouping parameter that would indicate to the DV optimizer that the parts of a SQL query that use tables from (different) sources, but have the same "grouping id" can be pushed down in a single SQL query to the given RDBMS, rather than copying the data to DV-memory and processing it there.
We have 2 use cases for this that I can think of:
- We often use data as source that are coming from two different databases on the same db instance. E. g. if there are two tables X.x and Y.y, where X and Y are two different databases on the same RDBMS, DV would copy the tables into memory and join them in DV. It would be better though if the query could be pushed down to the RDBMS in its entirety.
- We envision to stage data from different RDBMSs and process them on that stage-RDBMS-instance subsequently. If we want to reflect db / schema / table names as is in the sources (instead of using prefixes to guarantee uniqueness of table names), we have to map source DB to DV-source 1:1. The trouble with that is that even though all the staging DBs would be on the same RDBMS instance, and therefore the most efficient way to process queries using data across all those tables would be to push down to that RDBMS instance, DV would still extract everything that is beyond a single DV-source, to join in memory.
Please sign in to leave a comment.