An Effective DNA-Based File Storage System for Practical Archiving and Retrieval of Medical MRI Data
EDS proposes an effective DNA storage (EDS) approach for archiving medical data. The EDS approach incorporates (i) a novel fraction strategy for handling the crucial problem of rotating encoding to control data loss and DNA sequencing costs; (ii) a novel rule-based quaternary transcoding to satisfy bio-constraints and ensure reliable mapping; and (iii) a new indexing method to simplify random search and access. The approach's effectiveness is validated through computer simulation and biological experiments, confirming its practicality.
Step-by-step installation is as follows:
Install Python IDE, PyCharm from here https://www.jetbrains.com/pycharm/download/?section=windows,
Install following Python packages
- Codecs
- Math
- Struct
- Os, random
- Binascii
- Blast
- Struct
Update the existing system according to requirements or run.
Readers can easily follow the steps outlined in our video demonstration available in the files above.
- Open EDS.py
- The default settings are for encoding the image files (16 chunks of MRI). Users can change the input file path and output results path at img_dir = './image/' and result_dir = './imageResults/', respectively.
- If the user wants to encode a non-image file, the first user has to turn the 'main' function on by removing # from line 633 and turning off the function by inserting # in front of line 632.
- Suppose the user runs the code for image file encoding; the following output can be found in the terminal;
- Original binary segment
- Max GC
- Min GC
- Total GC
- Max length
- Min length
- Average length
- Total sequences
- Density
- Time
- Maximum file size
- Adding sequences from FASTA; added x sequences in x seconds.
- The folder 'imageResults' has 16 subfolders of 16 corresponding chunks. Each sub-image has primers and DNA sequences generated by the code. For experimental convince, we have merged all the chunks images in 'result.dna' file in the 'imageResults' folder. (merged sequences can be differentiated by the primer difference) The result.dna was converted into an xlsx file to send out for gene synthesis.
- Suppose the user runs the code for non-image file encoding (i.e., report); after setting the pdf_dir path, the code will provide the results.dna file in the 'reportResults' folder.
The resulting xlsx files were sent out to DNA synthesis companies. The synthesized DNA and gene were sequenced from another company, and later, we received the DNA sequences with multiple results. These DNA sequences were decoded to access the required chunks and different files.
- Open decode_one.py
- Select the 'main' function for image and non-image files on lines 226 and 227.
- Provide the input_path of a file which is being decoded.
- The decoded results will be generated back to the original folders.
In the manuscript, we have offered various analyses on DNA and binary file recovery, running time, memory utilization, GC and RC constraints satisfactions, and biological validation, for which readers are referred to the main draft and supplementary file.
EDS is licensed under the GNU General Public License; for more information, read the LICENSE file or refer to:
A related paper (https://onlinelibrary.wiley.com/doi/10.1002/smtd.202301585) is published in Small Method journal (12.4 impact factor).
Cite:
A. Rasool, J. Hong, Z. Hong, Y. Li, C. Zou, H. Chen, Q. Qu, Y. Wang, Q. Jiang, X. Huang, J. Dai, An Effective DNA-Based File Storage System for Practical Archiving and Retrieval of Medical MRI Data. Small Methods 2024, 2301585. https://doi.org/10.1002/smtd.202301585