Proof-of-concept implementation for supporting multiple pack repositories #126

Closed
wants to merge 6 commits into from

zhubonan
Collaborator

@zhubonan zhubonan commented Nov 8, 2021

This PR allows the packs to be stored in additional locations (pack repositories) other than the main object store folder.

The main reason is to allow packs to be moved to a different file system (slower but more spacious) to free up space on the main file system (usually an SSD) as the demand for storage grows.

Packing loose objects and writing directly to the packs will only take place on the main storage to avoid any performance impact.

One exception is repacking, which is done directly inside the pack repository (on the same filesystem as the pack file being repacked). This means there is no longer a single path for the temporary pack used for repacking (previously pack_id=-1), as its path on the file system now depends on the id of the pack being repacked.
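To illustrate the path lookup described above, here is a minimal sketch and not the code in this PR. The function names, the `.repack` suffix, and the convention that a pack file is named after its integer id are assumptions made for the example.

```python
from pathlib import Path
from typing import Optional, Sequence


def find_pack_path(
    pack_id: int, main_dir: Path, extra_dirs: Sequence[Path]
) -> Optional[Path]:
    """Return the existing file for ``pack_id``, checking the main folder first.

    Hypothetical helper: assumes each pack is a file named after its id.
    """
    for folder in (main_dir, *extra_dirs):
        candidate = folder / str(pack_id)
        if candidate.is_file():
            return candidate
    return None


def repack_path_for(
    pack_id: int, main_dir: Path, extra_dirs: Sequence[Path]
) -> Path:
    """Return a temporary repack path in the same folder as the pack itself.

    Keeping the temporary file next to the pack keeps both on the same
    filesystem. The ``.repack`` suffix is an assumption for illustration.
    """
    pack_path = find_pack_path(pack_id, main_dir, extra_dirs)
    if pack_path is None:
        raise FileNotFoundError(f"pack {pack_id} not found in any pack folder")
    return pack_path.with_name(f"{pack_path.name}.repack")
```

With this layout, the repack file for a pack stored in an additional folder is created in that same folder rather than in the main one.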

The user is responsible for moving the packs, ideally when the repository is fully "offline". Online operation is still possible, but requires a sequence of copy-rename-unlink operations.
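The copy-rename-unlink sequence could look roughly like the sketch below. This is an illustration under assumptions, not the procedure implemented in this PR: `relocate_pack` and the `.incoming` suffix are hypothetical names, and the sketch does not coordinate with concurrent writers.

```python
import os
import shutil
from pathlib import Path


def relocate_pack(src: Path, dest_dir: Path) -> Path:
    """Move a pack file to ``dest_dir`` via copy, rename, then unlink.

    Hypothetical helper for illustration only.
    """
    dest = dest_dir / src.name
    tmp = dest_dir / f"{src.name}.incoming"  # assumed temporary name
    if dest.exists():
        raise FileExistsError(f"{dest} already exists")

    # 1. Copy under a temporary name, so a partially written file is never
    #    visible under the final pack name.
    with open(src, "rb") as fin, open(tmp, "wb") as fout:
        shutil.copyfileobj(fin, fout)
        fout.flush()
        os.fsync(fout.fileno())

    # 2. Rename within the destination folder; tmp and dest share a
    #    filesystem, so the complete file appears in one step.
    os.replace(tmp, dest)

    # 3. Unlink the original only after the copy is in place.
    src.unlink()
    return dest
```

Between steps 2 and 3 the pack exists in both locations, so a reader that looks in either folder finds a complete file.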

Todos:

  • Allow getting the size of each pack repository
  • More comprehensive tests
    • Test for repacking
    • Test for online concurrent pack relocation
  • CLI for displaying information of pack repositories
  • CLI for moving packs while the repository is "online" (being accessed)
  • CLI for checking the integrity, e.g. whether there are "missing" packs.
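As a rough idea of what the size and integrity items above might involve, the following sketch sums pack sizes per folder and reports expected pack ids that are found in no folder. All names are hypothetical, packs are assumed to be files named after their integer id, and the list of expected ids is assumed to be supplied by the caller.

```python
from pathlib import Path
from typing import Dict, Iterable, List, Sequence


def _pack_files(folder: Path) -> List[Path]:
    """List files in ``folder`` whose name is an integer pack id (assumed layout)."""
    return [p for p in folder.iterdir() if p.is_file() and p.name.isdigit()]


def pack_sizes_by_repository(folders: Sequence[Path]) -> Dict[Path, int]:
    """Return the total size in bytes of the pack files in each folder."""
    return {
        folder: sum(p.stat().st_size for p in _pack_files(folder))
        for folder in folders
    }


def missing_packs(expected_ids: Iterable[int], folders: Sequence[Path]) -> List[int]:
    """Return the expected pack ids that are present in none of the folders."""
    present = {int(p.name) for folder in folders for p in _pack_files(folder)}
    return sorted(set(expected_ids) - present)
```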

Since a pack can sit in an additional folder, it no longer makes sense
to use a single fixed path for the repack file. The path to the
repack file must be obtained by giving the id of the pack being repacked.
This ensures that the repack file sits on the same filesystem as the
pack that is being repacked, and allows safe handling at the end using
a hard link.
@zhubonan zhubonan changed the title from "Proof-of-concept implementation for support multiple pack repositories" to "Proof-of-concept implementation for supporting multiple pack repositories" Nov 8, 2021
codecov bot commented Nov 8, 2021

Codecov Report

Merging #126 (721df95) into develop (7a09ea2) will decrease coverage by 0.73%.
The diff coverage is 84.00%.

@@             Coverage Diff             @@
##           develop     #126      +/-   ##
===========================================
- Coverage    99.52%   98.79%   -0.74%     
===========================================
  Files            8        8              
  Lines         1675     1736      +61     
===========================================
+ Hits          1667     1715      +48     
- Misses           8       21      +13     
Impacted Files Coverage Δ
disk_objectstore/container.py 97.98% <84.00%> (-1.42%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 7a09ea2...721df95.

@zhubonan
Collaborator Author

Closing since this is superseded by #133.

@zhubonan zhubonan closed this Jan 11, 2022