Discovering the most important data quality dimensions in health big data using latent semantic analysis

Juddoo, Suraj and George, Carlisle (2018) Discovering the most important data quality dimensions in health big data using latent semantic analysis. In: 2018 International Conference on Advances in Big Data, Computing and Data Communication Systems (icABCD), 06-07 Aug 2018, Durban, South Africa.

PDF - Final accepted version (with author's formatting)


Big data quality is an emerging field. Many authors now agree that data quality remains highly relevant, even for big data applications. However, there is a lack of frameworks or guidelines on how to carry out big data quality initiatives. The starting point of any data quality work is to determine the properties of data quality, termed ‘data quality dimensions’ (DQDs). Even these dimensions lack rigorous definitions in the existing literature. This research aims to contribute towards identifying the most important DQDs for big data in the health industry. It continues previous work which, using relevant literature, identified five DQDs (accuracy, completeness, consistency, reliability and timeliness) as the most important in health datasets. The previous work used a research method based on human judgement, known as an inner hermeneutic cycle (IHC). To remove the potential bias arising from human judgement, this study used the same set of literature but applied latent semantic analysis (LSA), a statistical method for extracting knowledge from a set of documents. The LSA results showed that accuracy and completeness were the only DQDs classed as most important in health big data by both IHC and LSA.
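
For readers unfamiliar with LSA, the general technique can be sketched as follows: build a term-document count matrix, take a truncated singular value decomposition, and compare terms in the resulting low-rank space. This is only an illustrative sketch and not the authors' procedure. The toy corpus, the choice of k = 2, and the idea of scoring each DQD by the length of its latent term vector are all assumptions made here for illustration.

```python
import numpy as np

# Hypothetical toy corpus; the paper's actual literature set is not reproduced here.
docs = [
    "accuracy and completeness of patient records",
    "completeness and timeliness of health data",
    "accuracy of clinical data and consistency checks",
    "reliability and accuracy of sensor data",
]
# The five DQDs named in the abstract, used as the terms to score.
dqds = ["accuracy", "completeness", "consistency", "reliability", "timeliness"]


def term_document_matrix(docs):
    """Return (vocab, X) where X[i, j] counts term i in document j."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    X = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.split():
            X[index[w], j] += 1.0
    return vocab, X


def lsa_term_vectors(X, k):
    """Project terms into a k-dimensional latent space via truncated SVD."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k]  # each row is one term's latent vector


vocab, X = term_document_matrix(docs)
term_vecs = lsa_term_vectors(X, k=2)  # k=2 is an arbitrary choice for the toy data

# Assumed scoring rule: a term's weight is the norm of its latent vector.
scores = {t: float(np.linalg.norm(term_vecs[vocab.index(t)])) for t in dqds}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

On a real corpus, the counts would typically be reweighted (for example with TF-IDF) and k chosen empirically; neither detail is stated in the abstract.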

Item Type: Conference or Workshop Item (Paper)
Research Areas: A. > School of Science and Technology > Computer Science > Aspects of Law and Ethics Related to Technology group
Item ID: 25560
Notes on copyright: © 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Depositing User: Carlisle George
Date Deposited: 09 Nov 2018 12:29
Last Modified: 06 Apr 2019 23:47
