Toward a Function-as-a-Service Framework for Genomic Analysis
Tricomi, Giuseppe (first author; Writing – Original Draft Preparation)
Giosa, Domenico (second author; Validation)
Merlino, Giovanni (Writing – Review & Editing)
Romeo, Orazio (penultimate author; Writing – Review & Editing)
Longo, Francesco (last author; Supervision)
2020-01-01
Abstract
Nowadays, the study of nucleic acids (DNA/RNA) has become a digital science, thanks to the advent of modern massively parallel sequencing technologies, better known by the acronym NGS (next-generation sequencing), and to the vast amount of genetic data easily accessible from public databases. Due to the quantity and complexity of such data, processing it requires strong computer science knowledge and skills. This background includes programming and scripting languages, command-line interfaces, and low-level data management tools, which are not always part of the toolbox of molecular biologists and geneticists. The need to adapt to entirely new IT tools and workflows slows down even experienced researchers, so dedicated and customizable GUIs would be preferable. In this paper, we tackle this issue by proposing a preliminary architecture for a framework that provides the following benefits: i) it supports the definition phase of the post-NGS analysis process (commonly called pipeline definition) via a graphical dashboard designed with Node-RED; ii) it automatically deploys the workflows on a cluster of computational resources according to the Function-as-a-Service paradigm, i.e., treating each step of the pipeline as a function to be executed within Linux-based containers pre-configured with all the necessary dependencies; iii) it runs these containers while automatically taking care of resource load balancing. Finally, the framework is designed to keep human feedback in the loop through a smart notification system that allows the researcher to monitor the workflows and make any decision needed for their continuation.
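
As a rough illustration of points ii) and iii), the sketch below models a pipeline as a set of steps, each bound to a container image, and places every step on the least-loaded node of a cluster. It is not the framework's actual implementation: the `Step` and `Node` classes, the step names, the image names, and the "fewest assigned containers" policy are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One pipeline stage, treated as a function packaged in a container image."""
    name: str
    image: str  # hypothetical container image name
    depends_on: list[str] = field(default_factory=list)


@dataclass
class Node:
    """A cluster node with a simple count of containers assigned to it."""
    name: str
    running: int = 0


def topological_order(steps: list[Step]) -> list[Step]:
    """Order steps so that each one comes after the steps it depends on."""
    by_name = {s.name: s for s in steps}
    ordered: list[Step] = []
    visiting: set[str] = set()
    done: set[str] = set()

    def visit(step: Step) -> None:
        if step.name in done:
            return
        if step.name in visiting:
            raise ValueError(f"cycle detected at step {step.name!r}")
        visiting.add(step.name)
        for dep in step.depends_on:
            visit(by_name[dep])
        visiting.discard(step.name)
        done.add(step.name)
        ordered.append(step)

    for s in steps:
        visit(s)
    return ordered


def schedule(steps: list[Step], nodes: list[Node]) -> dict[str, str]:
    """Assign each step to the node with the fewest assigned containers."""
    placement: dict[str, str] = {}
    for step in topological_order(steps):
        target = min(nodes, key=lambda n: n.running)  # assumed balancing policy
        target.running += 1
        placement[step.name] = target.name
    return placement


# Illustrative post-NGS pipeline; step and image names are hypothetical.
pipeline = [
    Step("variant_calling", "example/caller", depends_on=["alignment"]),
    Step("quality_control", "example/qc"),
    Step("alignment", "example/aligner", depends_on=["quality_control"]),
]
cluster = [Node("node-a"), Node("node-b")]
placement = schedule(pipeline, cluster)
```

A real deployment would replace the counter in `Node` with live resource metrics and would actually launch the containers; the sketch only shows how dependency ordering and placement could be kept separate.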