Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TRUNK-6300 Add Storage Service #4920

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

rkorytkowski
Copy link
Member

@rkorytkowski rkorytkowski commented Feb 12, 2025

Description of what I changed

Issue I worked on

see https://issues.openmrs.org/browse/TRUNK-6300

Checklist: I completed these to help reviewers :)

  • My IDE is configured to follow the code style of this project.

    No? Unsure? -> configure your IDE, format the code and add the changes with git add . && git commit --amend

  • I have added tests to cover my changes. (If you refactored
    existing code that was well tested you do not have to add tests)

    No? -> write tests and add them to this commit git add . && git commit --amend

  • I ran mvn clean package right before creating this pull request and
    added all formatting changes to my commit.

    No? -> execute above command

  • All new and existing tests passed.

    No? -> figure out why and add the fix to your commit. It is your responsibility to make sure your code works.

  • My pull request is based on the latest changes of the master branch.

    No? Unsure? -> execute command git pull --rebase upstream master

* <p>
* Data objects are stored under generated or provided unique keys. Data and keys are immutable.
*
* @since 2.8.0, 2.7.2, 2.6.16, 2.5.15
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we backport the interface and implementation to all versions used by o3, o2, bahmni, ozone... It is to be able to easily add support in modules. We will only modify the master branch to use the new service in api. Does this list of versions cover what is used by implementations these days?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the use-case for backporting this? I kind of imagine that we'll need to re-write stuff in various modules and that's going to be a lot simpler to just say "this works with 2.8.0" rather than "this works with 2.8.0+, 2.7.3+, 2.6.16+, and 2.5.15+".

@rkorytkowski
Copy link
Member Author

Please do not merge until I refactor complex obs to use this service in https://openmrs.atlassian.net/jira/software/c/projects/TRUNK/issues/TRUNK-6304

@rkorytkowski rkorytkowski marked this pull request as draft February 12, 2025 09:39
@rkorytkowski
Copy link
Member Author

Also I need to add key encoding so it doesn't include special characters. It might be needed to add mimetype support as well as it's used in https://github.com/openmrs/openmrs-core/blob/master/api/src/main/java/org/openmrs/obs/handler/AbstractHandler.java#L117

* @return stream of keys
* @throws IOException IO error
*/
Stream<String> getKeys(String moduleId, String prefix) throws IOException;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the use case for this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh i see! 😊

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So would a key be a folder name? For instance lucene as we currently have in the application data directory?

Copy link
Member Author

@rkorytkowski rkorytkowski Feb 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Key points to an actual file not a directory, but a key can include a directory e.g. lucene/data.dat, lucene/data0.dat, etc.

If you want to search for all files in lucene you would call the method with "lucene/". If you want to list all files starting with data in lucene dir you would do "lucene/data".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also give an example of how the prefix would differ from the key?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I call it prefix, because it matches any keys starting with the prefix e.g. having 'abc/ab', 'abc/abc', 'ab', 'abc/abcd', 'abc/ac' and the prefix 'abc/' we would get back keys 'abc/ab', 'abc/abc', 'abc/abcd' but not 'ab' and 'abc/ac'.

Copy link
Member

@ibacher ibacher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @rkorytkowski! A few questions / comments.

* <p>
* Data objects are stored under generated or provided unique keys. Data and keys are immutable.
*
* @since 2.8.0, 2.7.2, 2.6.16, 2.5.15
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the use-case for backporting this? I kind of imagine that we'll need to re-write stuff in various modules and that's going to be a lot simpler to just say "this works with 2.8.0" rather than "this works with 2.8.0+, 2.7.3+, 2.6.16+, and 2.5.15+".


private final String storageDir;

private final DateFormat dateFormat = new SimpleDateFormat("yyyy/MM-dd/HH-mm/yyyy-MM-dd-HH-mm-ss-SSS-");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For any new code, we should probably use DateTimeFormatter (SimpleDateFormat is rather infamously not thread-safe).


public Path getPath(final String key) {
Path path = Paths.get(key);
if (path.isAbsolute()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the purposes of this service, it would seem safer to just enforce the rule that everything is interpreted relative to the storageDir.

private Path newKey(String moduleId, String keySuffix) {
if (keySuffix == null) {
Date date = new Date();
keySuffix = dateFormat.format(date) + UUID.randomUUID().toString().substring(0, 8);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be better to use Common Lang's RandomStringUtils#nextAlphanumeric(8)


@Override
public String saveTempData(InputStream inputStream) throws IOException {
Path tempFile = Files.createTempFile("openmrs-temp", ".tmp");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In light of my comment about not allowing absolute paths:

Suggested change
Path tempFile = Files.createTempFile("openmrs-temp", ".tmp");
Path tempFile = Files.createTempFile(Paths.get(storageDir, ".tmp"), "openmrs-temp", ".tmp");


/**
* Saves the given InputStream as a temporary file. Temporary files are kept, if possible, in local
* storage for fast IO. They are available only for the duration of a request and for the single
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the "request" that this is tied to?

dirs.add(moduleId);
storagePath = Paths.get(storageDir, "modules", moduleId);
} else {
storagePath = Paths.get(storageDir);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If moduleId is null, should we filter the modules directory from the returned results? Or else are we creating the contract that every storage service should have a top-level key called modules that stores the module-specific data?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants