As of versions 2.4.26 and 3.0.5 the Excel connector only supports a connection to a single Excel file. The Excel connector will eventually support connections to multiple Excel files with a single connection. In the meantime, we suggest using the following solutions.
Method 1
The solution consists of automating the creation of multiple connections, one connection to each Excel file. The Excel files are in the following folder structure.
/mnt/hgfs/vm_host_tmp_folder/excel-sample-files/
├── Book1.xlsx
├── Book2.xlsx
├── Book3.xlsx
├── Book4.xlsx
└── subfolder
├── cities.xlsx
└── customers.xlsx
The first step is to create a file connection to the folder where the files are located.
The connection can also be created through code:
/* Create connection */
call SYSADMIN.createConnection(name => 'excel_folder', jbossCliTemplateName => 'ufile', connectionOrResourceAdapterProperties => 'ParentDirectory=/mnt/hgfs/vm_host_tmp_folder/excel-sample-files/,decompressCompressedFiles=false', encryptedProperties => '');;
/* Create data source */
call SYSADMIN.createDatasource(name => 'excel_folder', translator => 'ufile', modelProperties => 'importer.useFullSchemaName=false', translatorProperties => '', encryptedModelProperties => '', encryptedTranslatorProperties => '');;
Once the connection has been created, you can query the folder to list the files.
call "excel_folder.listFiles"("pathAndPattern" => '*.xlsx');;
path |
name |
lastModifiedTime |
lastAccessTime |
creationTime |
isRegularFile |
isDirectory |
isSymbolicLink |
isOther |
size |
fileKey |
/mnt/hgfs/vm_host_tmp_folder/excel-sample-files |
Book1.xlsx |
2022-07-18 13:31 |
2022-07-18 13:31 |
2022-07-18 13:31 |
TRUE |
FALSE |
FALSE |
FALSE |
75380 |
-1183557981 |
/mnt/hgfs/vm_host_tmp_folder/excel-sample-files |
Book2.xlsx |
2022-07-18 13:29 |
2022-07-18 13:29 |
2022-07-18 13:29 |
TRUE |
FALSE |
FALSE |
FALSE |
21332 |
-1157001664 |
/mnt/hgfs/vm_host_tmp_folder/excel-sample-files |
Book3.xlsx |
2022-07-18 13:23 |
2022-07-18 13:30 |
2022-07-18 13:23 |
TRUE |
FALSE |
FALSE |
FALSE |
117472 |
-1128078879 |
/mnt/hgfs/vm_host_tmp_folder/excel-sample-files |
Book4.xlsx |
2022-07-18 13:30 |
2022-07-18 13:30 |
2022-07-18 13:30 |
TRUE |
FALSE |
FALSE |
FALSE |
9045 |
-1097344866 |
The file connector does not recursively enter subfolders. To list the contents of the subfolder you can use the following syntax:
call "excel_folder.listFiles"("pathAndPattern" => '\subfolder\*.xlsx');;
path |
name |
lastModifiedTime |
lastAccessTime |
creationTime |
isRegularFile |
isDirectory |
isSymbolicLink |
isOther |
size |
fileKey |
/mnt/hgfs/vm_host_tmp_folder/excel-sample-files/subfolder | cities.xlsx | 19:08.0 | 27:11.0 | 19:08.0 | TRUE | FALSE | FALSE | FALSE | 12747 | <OBJECT> |
/mnt/hgfs/vm_host_tmp_folder/excel-sample-files/subfolder | customers.xlsx | 21:41.0 | 31:55.0 | 21:41.0 | TRUE | FALSE | FALSE | FALSE | 10639 | <OBJECT> |
The next step is to create a stored procedure to loop over the list of files and create the CONNECTION and DATASOURCE.
create procedure views.createExcelDatasources(
connection_name_prefix string,
path_and_pattern string
) as
begin
declare string connection_name;
-- Get the list of Excel files located in the folder. Then loop of the results using a cursor.
loop on (select x.path, x.name from (call "excel_folder.listFiles"("pathAndPattern" => path_and_pattern)) x) as cur
begin
-- Create the new connection name.
connection_name = connection_name_prefix || replace(cur.name, '.', '_');
-- Using the path and file name, create the CONNECTION and DATASOURCE
call SYSADMIN.createConnection(
name => connection_name,
jbossCliTemplateName => 'excel',
connectionOrResourceAdapterProperties => 'ParentDirectory=' || cur.path,
encryptedProperties => ''
) without return;
call SYSADMIN.createDatasource(
name => connection_name,
translator => 'excel',
modelProperties => 'importer.useFullSchemaName=false,importer.headerRowNumber=1,importer.ExcelFileName="' || cur.name || '"',
translatorProperties => '',
encryptedModelProperties => '',
encryptedTranslatorProperties => ''
) without return;
end
end;;
Once the stored procedure has been created we pass in a prefix and execute the stored procedure using the following call:
call views.createExcelDatasources("connection_name_prefix" => 'excel_file__', "path_and_pattern" => '*.xlsx');;
Once the stored procedure completes, refresh the list of data sources.
The new data sources will become available after the refresh, and you can query the contents of the various Excel files.
The files in the subfolder can be added by calling the stored procedure with the following parameters:
call views.createExcelDatasources("connection_name_prefix" => 'excel_file__', "path_and_pattern" => '\subfolder\*.xlsx');;
Method 2
Method 2 does not persist multiple connections, instead it creates a single connection to a file and imports the data. Then deletes the connection, and repeats the process for each Excel file.
This method does require that each Excel file have the same structure and schema, else the insert will fail. It is possible to modify this code to insert each Excel file into it's own table, but that is not covered in this article.
create procedure views.bulkImportExcelFiles(
connection_name string,
path_and_pattern string,
destination_schema string,
destination_table string
) as
begin
declare string destination_schema_table = '"' || destination_schema || '.' || destination_table || '"';
declare string sql_template_insert_into = 'insert into "<<dest_schema>>"."<<dest_table>>" select * from "<<src_schema>>"."<<src_table>>"';
declare string sql_template_select_into = 'select * into "<<dest_schema>>"."<<dest_table>>" from "<<src_schema>>"."<<src_table>>"';
declare string sql_stmt;
-- Get the list of Excel files located in the folder. Then loop of the results using a cursor.
loop on (select x.path, x.name from (call "excel_folder.listFiles"("pathAndPattern" => path_and_pattern)) x) as curExcelFiles
begin
-- If the connection name exists, delete it.
if (exists(SELECT * FROM "SYSADMIN.Connections" as c where connection_name in ("c.fullName", "c.name")))
begin
call "SYSADMIN.removeDataSource"("name" => connection_name) without return;
call "SYSADMIN.removeConnection"("name" => connection_name) without return;
end
-- Using the path and file name, create the CONNECTION and DATASOURCE
call SYSADMIN.createConnection(
name => connection_name,
jbossCliTemplateName => 'excel',
connectionOrResourceAdapterProperties => 'ParentDirectory=' || curExcelFiles.path,
encryptedProperties => ''
) without return;
call SYSADMIN.createDatasource(
name => connection_name,
translator => 'excel',
modelProperties => 'importer.useFullSchemaName=false,importer.headerRowNumber=1,importer.ExcelFileName="' || curExcelFiles.name || '"',
translatorProperties => '',
encryptedModelProperties => '',
encryptedTranslatorProperties => ''
) without return;
-- Create the SQL code to import the Excel file contents into the table
-- Note: An Excel file may contain multiple sheets. Iterate over each one and import the data.
loop on (SELECT "Name" FROM "SYS.Tables" where SchemaName = connection_name) as curExcelSheets
begin
if ((select * from (call "UTILS.tableExists"("tableName" => destination_schema_table)) x) = True)
begin
-- If the target table **does exists**, simply insert the data into the table.
sql_stmt = replace(replace(replace(replace(sql_template_insert_into,
'<<dest_schema>>', destination_schema),
'<<dest_table>>', destination_table),
'<<src_schema>>', connection_name),
'<<src_table>>', curExcelSheets.Name);
end
else
begin
-- If the target table does **NOT** exists, use a select into to create the table
sql_stmt = replace(replace(replace(replace(sql_template_select_into,
'<<dest_schema>>', destination_schema),
'<<dest_table>>', destination_table),
'<<src_schema>>', connection_name),
'<<src_table>>', curExcelSheets.Name);
end
execute (sql_stmt) without return;
end
end
end;;
Comments
0 comments
Please sign in to leave a comment.