Optimizing the Research of DNA Sequences in a NoSQL Document Database: A Preliminary Study

Celesti A.; Galletta A.; Fazio M.; Villari M.
2019-01-01

Abstract

The study of DNA sequences has become indispensable for basic biological research and for numerous applied fields, such as comparative genomics, evolutionary biology, pangenomics, genetics of disease, regulation of gene expression, oncology and many others, all supported by bioinformatics. In the era of Cloud computing, federating the Cloud systems of different genetics research organisations paves the way towards a new era of data sharing and new mashup services and applications. However, due to the huge amount of genomics data (genomics Big Data) that has to be managed, a parallel distributed NoSQL DataBase Management System (DBMS) approach becomes fundamental. Specifically, due to the textual nature of genomics data, a NoSQL DBMS appears to be the most suitable solution. In this paper, considering the whole human genome, we present a preliminary study that compares a NoSQL DBMS, MongoDB, with an SQL database solution, MySQL, for searching DNA sequences. Moreover, in order to optimize the search for genomic codes, we adopt hash functions that map nucleotide sequences of arbitrary size onto data of a fixed, smaller size. Experiments show that MongoDB, besides simplifying the management of genomics data, provides better performance.
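The hashing idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the choice of BLAKE2b, the 8-byte digest, and the helper names `sequence_key`, `build_index` and `find` are all assumptions made for illustration, and a plain dictionary stands in for a database index on the hash field.

```python
import hashlib


def normalize(seq: str) -> str:
    """Strip whitespace and upper-case so 'acgt' and 'ACGT' compare equal."""
    return seq.strip().upper()


def sequence_key(seq: str, digest_bytes: int = 8) -> str:
    """Map a nucleotide sequence of any length to a fixed-size hex key.

    BLAKE2b and the 8-byte digest are illustrative assumptions,
    not the hash function used in the paper.
    """
    data = normalize(seq).encode("ascii")
    return hashlib.blake2b(data, digest_size=digest_bytes).hexdigest()


def build_index(records):
    """Group (record_id, sequence) pairs by hash key.

    The dict is a stand-in for a database index on the hash field.
    """
    index = {}
    for record_id, seq in records:
        index.setdefault(sequence_key(seq), []).append((record_id, normalize(seq)))
    return index


def find(index, query):
    """Look up by hash, then compare full sequences to rule out collisions."""
    wanted = normalize(query)
    candidates = index.get(sequence_key(query), [])
    return [record_id for record_id, seq in candidates if seq == wanted]


records = [("r1", "ACGT"), ("r2", "acgt"), ("r3", "TTGACA" * 1000)]
index = build_index(records)
matches = find(index, "ACGT")
```

Whatever the sequence length, the lookup key has the same size, so the comparison done by the index is on a short fixed-length value; the final full-sequence check guards against hash collisions.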
Year: 2019
ISBN: 978-1-7281-2999-0

Use this identifier to cite or link to this document: https://hdl.handle.net/11570/3150654