EDS: An Effective DNA-Based File Storage System for Practical Archiving and Retrieval of Medical MRI Data

abdul-rasool/EDS-Effective-DNA-Storage-System

EDS: Effective DNA Storage System

An Effective DNA-Based File Storage System for Practical Archiving and Retrieval of Medical MRI Data

What is EDS?

EDS is an effective DNA storage approach for archiving medical data. It incorporates (i) a novel fraction strategy that addresses the crucial problem of rotating encoding in order to control data loss and DNA sequencing costs; (ii) a novel rule-based quaternary transcoding that satisfies bio-constraints and ensures reliable mapping; and (iii) a new indexing method that simplifies random search and access. The approach's effectiveness has been validated through computer simulation and biological experiments, confirming its practicality.
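To give a feel for what quaternary transcoding means, the sketch below maps binary data to the four nucleotides and back. It is a minimal illustration only: the fixed 2-bit table and the function names `bytes_to_dna`, `dna_to_bytes` and `gc_content` are assumptions made for this example and are not the rule-based scheme implemented in EDS.py, which additionally enforces bio-constraints.

```python
# Illustrative only: a fixed 2-bit-per-base table, NOT the rule-based
# transcoding used by EDS. All names here are hypothetical.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def bytes_to_dna(data: bytes) -> str:
    """Map every pair of bits to one nucleotide."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_bytes(seq: str) -> bytes:
    """Invert bytes_to_dna: four nucleotides become one byte."""
    bits = "".join(BASE_TO_BITS[base] for base in seq)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def gc_content(seq: str) -> float:
    """Fraction of G and C bases, one of the usual bio-constraints."""
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


if __name__ == "__main__":
    strand = bytes_to_dna(b"MRI")
    print(strand, gc_content(strand), dna_to_bytes(strand))
```

A fixed table like this cannot guarantee balanced GC content or avoid long runs of the same base, which is the kind of problem a rule-based mapping is meant to address.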

Installation

Step-by-step installation is as follows:

Tools and environment

Install the Python IDE PyCharm from https://www.jetbrains.com/pycharm/download/?section=windows

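The code uses the following Python modules. All of them except Blast are part of the Python standard library and need no separate installation:

  • codecs
  • math
  • struct
  • os
  • random
  • binascii
  • Blast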
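Before running the scripts, it can help to confirm that the environment is complete. The sketch below tries to import the modules listed above and looks for a BLAST executable. It rests on an assumption: that "Blast" refers to the NCBI BLAST+ command-line tools (so that `makeblastdb` would be on the PATH) rather than to a Python package. The helper name `missing_modules` is hypothetical.

```python
import importlib
import shutil

# Module names from the list above, lower-cased as Python expects.
STDLIB_MODULES = ["codecs", "math", "struct", "os", "random", "binascii"]


def missing_modules(names):
    """Return the subset of names that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("Missing modules:", missing_modules(STDLIB_MODULES))
    # Assumption: BLAST is provided by the NCBI BLAST+ command-line tools.
    print("makeblastdb found at:", shutil.which("makeblastdb"))
```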

Experimental steps

Adapt the existing code to your requirements, or run it as provided.

Readers can follow the steps in the video demonstration available among the repository files.

ENCODING

  1. Open EDS.py
  2. The default settings encode the image files (16 chunks of MRI). Users can change the input file path and the output results path at img_dir = './image/' and result_dir = './imageResults/', respectively.
  3. To encode a non-image file, the user must first enable the 'main' function by removing the # from line 633 and disable the other call by inserting a # at the start of line 632.
  4. When the code is run for image file encoding, the following output is printed to the terminal:
  • Original binary segment
  • Max GC
  • Min GC
  • Total GC
  • Max length
  • Min length
  • Average length
  • Total sequences
  • Density
  • Time
  • Maximum file size
  • Adding sequences from FASTA; added x sequences in x seconds.
  5. The folder 'imageResults' contains 16 subfolders, one for each of the 16 chunks. Each subfolder holds the primers and DNA sequences generated by the code for that chunk. For experimental convenience, we merged the sequences of all chunks into the 'result.dna' file in the 'imageResults' folder (merged sequences can be distinguished by their primers). The result.dna file was then converted into an xlsx file and sent out for gene synthesis.
  6. When the code is run for non-image file encoding (such as a report), after the pdf_dir path is set, the code writes the results.dna file to the 'reportResults' folder.
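The statistics printed during encoding can be reproduced for any set of sequences with a few lines of Python. The sketch below is an approximation under stated assumptions, not the code in EDS.py: "Total GC" is taken to mean the GC fraction over all nucleotides, and "Density" is taken to mean payload bits per nucleotide. The function names `gc_content` and `summarize` are hypothetical.

```python
def gc_content(seq):
    """Fraction of G and C bases in one sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def summarize(sequences, payload_bits):
    """Compute per-run statistics for a non-empty list of DNA sequences.

    Assumptions: total_gc is the GC fraction over all nucleotides, and
    density is payload bits divided by the total number of nucleotides.
    """
    gcs = [gc_content(s) for s in sequences]
    lengths = [len(s) for s in sequences]
    total_nt = sum(lengths)
    gc_bases = sum(s.count("G") + s.count("C") for s in sequences)
    return {
        "max_gc": max(gcs),
        "min_gc": min(gcs),
        "total_gc": gc_bases / total_nt,
        "max_length": max(lengths),
        "min_length": min(lengths),
        "average_length": total_nt / len(sequences),
        "total_sequences": len(sequences),
        "density": payload_bits / total_nt,
    }


if __name__ == "__main__":
    # Toy sequences, not real EDS output.
    for key, value in summarize(["ACGT", "GGCC", "AATT"], payload_bits=24).items():
        print(f"{key}: {value}")
```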

The resulting xlsx files were sent to DNA synthesis companies. The synthesized DNA and genes were sequenced by another company, and we later received the DNA sequences with multiple results. These DNA sequences were decoded to access the required chunks and different files.
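Because the merged sequences differ in their primers, the reads belonging to one chunk can be selected by primer before decoding. The sketch below shows the idea on toy data. It assumes that each sequence starts with its primer and that the primers are known in advance; the real layout of result.dna and of the sequencing output may differ, and the function name `group_by_primer` is hypothetical.

```python
from collections import defaultdict


def group_by_primer(sequences, primers):
    """Group sequences by the primer they start with.

    The primer is stripped from each grouped sequence. Sequences that
    match no known primer are collected under the key None.
    """
    groups = defaultdict(list)
    for seq in sequences:
        for primer in primers:
            if seq.startswith(primer):
                groups[primer].append(seq[len(primer):])
                break
        else:
            groups[None].append(seq)
    return dict(groups)


if __name__ == "__main__":
    # Toy primers and reads, far shorter than real ones.
    primers = ["ACGT", "TTGA"]
    reads = ["ACGTAAAA", "TTGACCCC", "ACGTGGGG", "CCCCCCCC"]
    print(group_by_primer(reads, primers))
```

Exact prefix matching is the simplest choice; real sequencing reads contain errors, so a practical pipeline would likely need approximate matching.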

DECODING

  1. Open decode_one.py
  2. Select the 'main' function call for image or non-image files on lines 226 and 227.
  3. Provide the input_path of the file to be decoded.
  4. The decoded results are written back to the original folders.

The manuscript presents analyses of DNA and binary file recovery, running time, memory utilization, satisfaction of the GC and RC constraints, and biological validation; readers are referred to the main draft and the supplementary file for details.

License

EDS is licensed under the GNU General Public License; for more information, read the LICENSE file or refer to:

http://www.gnu.org/licenses/

Citation

A related paper (https://onlinelibrary.wiley.com/doi/10.1002/smtd.202301585) is published in the journal Small Methods (impact factor 12.4).

Cite:

A. Rasool, J. Hong, Z. Hong, Y. Li, C. Zou, H. Chen, Q. Qu, Y. Wang, Q. Jiang, X. Huang, J. Dai, An Effective DNA-Based File Storage System for Practical Archiving and Retrieval of Medical MRI Data. Small Methods 2024, 2301585. https://doi.org/10.1002/smtd.202301585
