McCarthy, Suzanne (2021) Reusing dynamic data marts for query management in an on-demand ETL architecture. PhD thesis, Dublin City University.
Abstract
Data analysts often have a requirement to integrate an in-house data warehouse with external datasets, especially web-based datasets. Doing so can give them important insights into their performance relative to competitors and to their industry in general on a global scale, and can support sales predictions, providing important decision support services. The quality of these insights depends on the quality of the data imported into the analysis dataset. There is a wealth of data freely available from government sources online but little unity between data sources, leading to a requirement for a data processing layer wherein various types of quality issues and heterogeneities can be resolved. Traditionally, this is achieved with an Extract-Transform-Load (ETL) series of processes which are performed on all of the available data, in advance, in a batch process typically run outside of business hours. While this is recognized as a powerful knowledge-based support, it is very expensive to build and maintain, and is very costly to update when new data sources become available. On-demand ETL offers a solution in that data is only acquired when needed and new sources can be added as they come online. However, this form of dynamic ETL is very difficult to deliver. In this research dissertation, we explore the possibilities of creating dynamic data marts which can be created using non-warehouse data to support the inclusion of new sources. We then examine how these dynamic structures can be used for query fulfillment and how they can support an overall on-demand query mechanism. At each step of the research and development, we employ a robust validation using a real-world data warehouse from the agricultural domain with selected Agri web sources to test the dynamic elements of the proposed architecture.
Metadata
Item Type: | Thesis (PhD) |
---|---|
Date of Award: | March 2021 |
Refereed: | No |
Supervisor(s): | Roantree, Mark and McCarren, Andrew |
Uncontrolled Keywords: | data warehousing; data modelling; data integration |
Subjects: | Computer Science > Information storage and retrieval systems |
DCU Faculties and Centres: | DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; Research Initiatives and Centres > INSIGHT Centre for Data Analytics |
Use License: | This item is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 3.0 License. |
ID Code: | 25228 |
Deposited On: | 11 Mar 2021 10:55 by Suzanne Mc Carthy. Last Modified 11 Mar 2021 10:55 |
Documents
Full text available as: PDF (3MB)