• Big data analytics in healthcare: A cloud based framework for generating insights

      Anjum, Ashiq; Aizad, Sanna; Arshad, Bilal; Subhani, Moeez; Davies-Tagg, Dominic; Abdullah, Tariq; Antonopoulos, Nikolaos; University of Derby (Springer, 2017)
      With exabytes of data being generated from genome sequencing, a whole new science behind genomic big data has emerged. As technology improves, the cost of sequencing a human genome has fallen considerably, increasing the number of genomes being sequenced. Existing frameworks and techniques cannot handle such huge amounts of genomic data together with a vast variety of clinical data. The data has to be stored efficiently in a warehouse, and a number of things have to be taken into account in doing so. Firstly, the genome data has to be integrated effectively and correctly with clinical data. The other data sources and their formats have to be identified; the required data is then extracted from these sources (such as clinical datasets) and integrated with the genome data. The main challenge here is handling the integration complexity, as a large number of datasets are being integrated with huge amounts of genome data. Secondly, since the data is captured individually by clinicians and scientists at disparate locations, data consistency becomes a challenge. Checks have to be put in place to make sure the data remains consistent from start to finish as it passes through the warehouse. Thirdly, to carry this out effectively, the data infrastructure has to be organised correctly. How frequently the data is accessed plays a crucial role here: data in frequent use will be handled differently from data that is rarely used. Lastly, efficient browsing mechanisms have to be put in place to allow the data to be retrieved quickly. The data is then iteratively analysed to obtain meaningful insights. The challenge here is to perform the analysis very quickly. Cloud computing plays an important role, as it is used to provide scalability.
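
The first three concerns in the abstract (integration, consistency checks, and access-frequency-based handling) can be illustrated with a small sketch. This is not the framework described in the paper: the `Record` type, the field names, the SHA-256 checksum scheme, and the `hot_threshold` value are all assumptions made up for illustration, and a real warehouse would operate on far larger datasets with distributed storage.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical record type; the paper does not specify a schema."""
    patient_id: str
    payload: dict   # illustrative fields, e.g. {"variant": "rs1"}
    checksum: str   # digest assumed to be computed where the data is captured


def digest(payload: dict) -> str:
    """Deterministic SHA-256 digest of a JSON-serialisable payload."""
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def make_record(patient_id: str, payload: dict) -> Record:
    """Create a record and stamp it with its checksum at capture time."""
    return Record(patient_id, payload, digest(payload))


def is_consistent(record: Record) -> bool:
    """Consistency check: the payload must still match its stored checksum."""
    return digest(record.payload) == record.checksum


def integrate(genomic: list, clinical: list) -> dict:
    """Join genomic and clinical records on patient_id.

    Records that fail the consistency check are left out of the result.
    """
    clinical_by_id = {r.patient_id: r for r in clinical if is_consistent(r)}
    merged = {}
    for g in genomic:
        c = clinical_by_id.get(g.patient_id)
        if c is not None and is_consistent(g):
            merged[g.patient_id] = {"genomic": g.payload, "clinical": c.payload}
    return merged


def storage_tier(access_count: int, hot_threshold: int = 10) -> str:
    """Assumed policy: frequently accessed data goes to a 'hot' tier."""
    return "hot" if access_count >= hot_threshold else "cold"
```

For example, `integrate` would pair a genomic record and a clinical record that share a `patient_id` and silently skip any record whose payload no longer matches its checksum; whether to skip, quarantine, or reject such records is a design choice the abstract leaves open.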