A Dynamic MPI-Based Memory-Efficient Framework for Longest Common Subsequence Computation on Massive DNA Sequence

Singh, Shubham Kumar and T, Dharanya and N, Aarthi and S, Nagadevi (2025) A Dynamic MPI-Based Memory-Efficient Framework for Longest Common Subsequence Computation on Massive DNA Sequence. International Journal of Innovative Science and Research Technology, 10 (5): 25may1297. pp. 971-977. ISSN 2456-2165

Full text: IJISRT25MAY1297.pdf - Published Version (546kB)

Official URL: https://doi.org/10.38124/ijisrt%2F25may1297

Abstract

The rapid expansion of genomic datasets, particularly those encompassing entire human chromosomes, presents formidable computational challenges for conventional sequence alignment techniques. Within this area, the Longest Common Subsequence (LCS) problem plays a fundamental role in comparative genomics and DNA sequence alignment. However, its time and space complexity, typically quadratic in the sequence lengths, renders traditional sequential and statically parallelized implementations insufficient for large-scale genomic analysis. In response to this limitation, we propose a dynamic, memory-efficient parallel architecture based on the Message Passing Interface (MPI), specifically optimized for large DNA sequence comparisons. Our approach introduces a master-worker model that distributes the computational workload dynamically at runtime. By dividing the input sequences into smaller, manageable segments and assigning these chunks to worker processes on demand, the system ensures effective load balancing across processing units. The architecture uses a space-optimized dynamic programming technique in which only two rows of the LCS matrix are stored at any time, significantly reducing memory consumption without sacrificing correctness. To evaluate the scalability and performance of our method, we conducted extensive experiments using complete human chromosome datasets on an MPI cluster with eight processes. The results indicate that, while the dynamic strategy introduces moderate communication overhead, it consistently outperforms static distribution methods in terms of scalability, adaptability to heterogeneous environments, and memory efficiency. Notably, the proposed solution maintains stability and performance even as sequence sizes grow, making it suitable for deployment in high-performance computing (HPC) environments and cloud-based bioinformatics platforms.
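
The two-row technique mentioned in the abstract can be illustrated with a minimal single-process sketch. This is not the authors' implementation: the function name `lcs_length` is hypothetical, the MPI master-worker distribution is omitted entirely, and the sketch assumes only the LCS length is needed, since recovering the subsequence itself requires additional bookkeeping.

```python
def lcs_length(a: str, b: str) -> int:
    """Return the LCS length of a and b, keeping only two DP rows in memory."""
    # Make b the shorter sequence so each row has min(len(a), len(b)) + 1 cells.
    if len(b) > len(a):
        a, b = b, a

    prev = [0] * (len(b) + 1)  # DP row for the previous character of a
    curr = [0] * (len(b) + 1)  # DP row being filled for the current character

    for ch_a in a:
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Matching characters extend the LCS of both prefixes by one.
                curr[j] = prev[j - 1] + 1
            else:
                # Otherwise carry over the better of dropping one character.
                curr[j] = max(prev[j], curr[j - 1])
        # Reuse the old row as scratch space; index 0 is never written, so it stays 0.
        prev, curr = curr, prev

    return prev[len(b)]


if __name__ == "__main__":
    print(lcs_length("AGGTAB", "GXTXAYB"))
```

Memory use here grows with the length of the shorter sequence rather than with the product of both lengths, which is the saving the abstract attributes to storing two rows. The running time remains quadratic.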

Item Type:	Article
Subjects:	T Technology > T Technology (General)
Divisions:	Faculty of Engineering, Science and Mathematics > School of Electronics and Computer Science
Depositing User:	Editor IJISRT Publication
Date Deposited:	02 Jun 2025 12:03
Last Modified:	02 Jun 2025 12:03
URI:	https://eprint.ijisrt.org/id/eprint/1049
