Yang, Qishan, Ge, Mouzhi and Helfert, Markus ORCID: 0000-0001-6546-6408 (2018) Data quality problems in TPC-DI based data integration processes. In: 19th International Conference, ICEIS 2017, 26-29 Apr 2017, Porto, Portugal. ISBN 978-3-319-93374-0
Abstract
Many data-driven organisations need to integrate data from multiple, distributed and heterogeneous resources for advanced data analysis. A data integration system is an essential component for collecting data into a data warehouse or other data analytics systems. Various data integration systems are available, either built in-house or provided by vendors. Hence, an organisation needs to compare and benchmark them when choosing one that meets its requirements. TPC-DI was recently proposed as the first industrial benchmark for evaluating data integration systems. When using this benchmark, we found some typical data quality problems in the TPC-DI data source, such as multi-meaning attributes and inconsistent data schemas, which could delay the data integration process or even cause it to fail. This paper explains the processes of this benchmark and summarises the typical data quality problems identified in the TPC-DI data source. Furthermore, in order to prevent data quality problems and manage data quality proactively, we propose a set of practical guidelines for researchers and practitioners conducting data quality management when using the TPC-DI benchmark.
Metadata
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Event Type: | Conference |
Refereed: | Yes |
Uncontrolled Keywords: | Data quality; Data integration; TPC-DI Benchmark; ETL |
Subjects: | Computer Science > Information technology; Computer Science > Information storage and retrieval systems |
DCU Faculties and Centres: | DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; Research Initiatives and Centres > INSIGHT Centre for Data Analytics; Research Initiatives and Centres > ADAPT |
Published in: | Hammoudi, Slimane, Śmiałek, Michał, Camp, Olivier and Filipe, Joaquim (eds.) Enterprise Information Systems. Lecture Notes in Business Information Processing (LNBIP) 321. Springer. ISBN 978-3-319-93374-0 |
Publisher: | Springer |
Official URL: | https://doi.org/10.1007/978-3-319-93375-7_4 |
Copyright Information: | © 2018 Springer. The original publication is available at www.springerlink.com |
Use License: | This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License. |
Funders: | Science Foundation Ireland grant SFI/12/RC/2289 |
ID Code: | 22315 |
Deposited On: | 26 Jun 2018 12:51 by Qishan Yang. Last Modified 13 Mar 2019 14:41 |
Documents
Full text available as:
PDF (759kB)