Large-scale Data Integration Using Graph Probabilistic Dependencies (GPDs)
Authors
Zada, Muhammad Sadiq Hassan
Yuan, Bo
Anjum, Ashiq
Azad, Muhammad Ajmal
Khan, Wajahat Ali
Reiff-Marganiec, Stephan
Issue Date
2020-12-28
Abstract
The diversity and proliferation of knowledge bases have made data integration one of the key challenges in the data science domain. The imperfect representation of entities, particularly in graphs, poses additional challenges for data integration. Graph dependencies (GDs) have been investigated in existing studies for the integration and maintenance of data quality on graphs. However, the majority of graphs contain many duplicates with high diversity. Consequently, the existence of dependencies over these graphs becomes highly uncertain. In this paper, we propose graph probabilistic dependencies (GPDs), a novel class of dependencies for graphs, to address the issue of uncertainty over these large-scale graphs. GPDs can provide a probabilistic explanation for dealing with uncertainty while discovering dependencies over graphs. Furthermore, a case study is provided to verify the correctness of the data integration process based on GPDs. Preliminary results demonstrate the effectiveness of GPDs in reducing redundancies and inconsistencies over the benchmark datasets.
Citation
Zada, M.S.H., Yuan, B., Anjum, A., Azad, M.A., Khan, W.A. and Reiff-Marganiec, S. (2020). ‘Large-scale Data Integration Using Graph Probabilistic Dependencies (GPDs)’. IEEE/ACM International Conference on Big Data Computing, Applications and Technologies, Leicester, 7-10 December. New York: IEEE, pp. 27-36.
Publisher
IEEE
DOI
10.1109/bdcat50828.2020.00028
Additional Links
https://ieeexplore.ieee.org/abstract/document/9302543
Type
Meetings and Proceedings
Language
en
ISBN
9780738123967
The following license files are associated with this item:
- Creative Commons
Except where otherwise noted, this item's license is described as Attribution-NonCommercial-ShareAlike 4.0 International