Ecarton/cumulus 3751 s3 task #3910

etcart · 2025-01-28T21:22:38Z

Summary: CUMULUS-3751, just the S3 copy part
Addresses CUMULUS-3751: Move granules across collections

Changes

A task for copying granules' S3 files from one collection to another
A workflow and integration test for this S3 file copy, intended to be expanded to cover the rest of the workflow as those pieces are ready

PR Checklist

Update CHANGELOG
Unit tests
Ad-hoc testing - Deploy changes and test manually
Integration tests

…ecarton/CUMULUS-3751-as-separate-task

…umulus into CUMULUS-3757-move-granule


… AL2023 (#3870)
* Updated user-data for compatibility with Amazon Linux 2023 AMI
* use amazon 2023 ami
* install lvm2
* fix task-reaper
* add more log for debug
* update changelog
* test backward compatibility image_id_ecs_amz2
* change back to /ngap/amis/image_id_ecs_al2023_x86
* update ecs user data use IMDSv2
* update fakeprovider IMDSv2 template aws cli v2
* update fake-provider
---------
Co-authored-by: mikedorfman

adding cbanh CI stack

…ests

* Update migration to remove vacuum statements that can time out in a Lambda env
* Remove additional vacuum statements
* Add CL

Jkovarik

Adding questions from cmr-utils! Thanks for your patience.

Jkovarik · 2025-02-06T17:17:05Z

packages/aws-client/src/S3.ts

+    chunkSize?: number
+  }
+): Promise<DeleteObjectCommandOutput> => {
+  const {


Consider: Should we have a success-path test that exercises this method directly, apart from the wrapper implementation?
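A success-path test of this kind might inject a fake delete function so no AWS client is needed. The sketch below is hypothetical. Only `chunkSize` appears in the hunk. The helper name `deleteKeysInChunks`, the `DeleteFn` callback, and the batching behavior are assumptions for illustration, not the actual `S3.ts` method.

```typescript
// Hypothetical delete callback; the real S3.ts method issues S3 delete calls.
type DeleteFn = (key: string) => Promise<void>;

// Hypothetical helper: deletes keys in sequential batches of `chunkSize`,
// issuing the deletes within each batch concurrently.
async function deleteKeysInChunks(
  keys: string[],
  deleteFn: DeleteFn,
  chunkSize = 10
): Promise<number> {
  if (chunkSize < 1) throw new RangeError('chunkSize must be at least 1');
  let deleted = 0;
  for (let i = 0; i < keys.length; i += chunkSize) {
    const batch = keys.slice(i, i + chunkSize);
    await Promise.all(batch.map((key) => deleteFn(key)));
    deleted += batch.length;
  }
  return deleted;
}

// Success-path test body: an in-memory fake records every key it is asked to delete.
async function successPathCheck(): Promise<string[]> {
  const calls: string[] = [];
  const fakeDelete: DeleteFn = async (key) => {
    calls.push(key);
  };
  const count = await deleteKeysInChunks(['a', 'b', 'c', 'd', 'e'], fakeDelete, 2);
  if (count !== 5) throw new Error(`expected 5 deletes, got ${count}`);
  return calls;
}
```

Running the core logic against a fake like this tests the success path independently of whatever wrapper calls it.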

Jkovarik · 2025-02-06T20:18:44Z

packages/cmrjs/src/cmr-utils.js

@@ -560,7 +565,6 @@ function constructRelatedUrls({
  files,
  distEndpoint,
  bucketTypes,
-  s3CredsEndpoint = 's3credentials',
  cmrGranuleUrlType = 'both',


Given s3CredsEndpoint isn't used anywhere else in this file, we should probably just define it in the function scope rather than create a top-level variable.
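Scoping the constant to the one function that needs it could look roughly like this. `buildS3CredentialsUrl` is a hypothetical, simplified stand-in for the URL-building part of `constructRelatedUrls`, and the trailing-slash handling is an assumption.

```typescript
// Hypothetical, simplified stand-in for the s3credentials URL built in
// constructRelatedUrls.
function buildS3CredentialsUrl(distEndpoint: string): string {
  // Scoped to the only function that uses it, instead of a module-level
  // variable or an overridable default parameter.
  const s3CredsEndpoint = 's3credentials';
  // Assumption: tolerate endpoints configured with or without a trailing slash.
  const base = distEndpoint.endsWith('/') ? distEndpoint : `${distEndpoint}/`;
  return `${base}${s3CredsEndpoint}`;
}
```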

Jkovarik · 2025-02-06T20:49:58Z

packages/cmrjs/src/cmr-utils.js

-  const useDirectS3Type = shouldUseDirectS3Type(metadataObject);
-
-  const newURLs = constructRelatedUrls({
+  updateUMMGMetadataObject({


It's probably not great for maintenance/clarity that this method mutates metadataObject; can we refactor it to not do that?

That is, callers should get the updated object from updateUMMGMetadataObject's return value, rather than having the method change an object defined in the parent's scope. I'd expect a clone, a spread assignment, or something similar in the child method.
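A non-mutating version might return the updated object instead. This is a minimal sketch with hypothetical, trimmed-down types (`UmmgMetadata`, `RelatedUrl`) and a hypothetical `updateRelatedUrls` function. It only replaces `RelatedUrls` rather than merging URLs the way the real code may, and it is not the cmr-utils implementation.

```typescript
// Hypothetical, trimmed-down UMM-G shapes for illustration only.
interface RelatedUrl {
  URL: string;
  Type: string;
}

interface UmmgMetadata {
  GranuleUR: string;
  RelatedUrls?: RelatedUrl[];
}

// Returns a new metadata object; the caller's object is never modified.
// The spread is shallow, so untouched nested fields are shared, which is
// safe as long as nothing downstream mutates them either.
function updateRelatedUrls(metadata: UmmgMetadata, newUrls: RelatedUrl[]): UmmgMetadata {
  return { ...metadata, RelatedUrls: newUrls.map((url) => ({ ...url })) };
}

const original: UmmgMetadata = {
  GranuleUR: 'g1',
  RelatedUrls: [{ URL: 's3://old-bucket/g1.h5', Type: 'GET DATA' }],
};
const updated = updateRelatedUrls(original, [{ URL: 's3://new-bucket/g1.h5', Type: 'GET DATA' }]);
```

The caller then writes `const updated = ...`, which makes the data flow visible at the call site.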

Jkovarik · 2025-02-06T20:50:21Z

packages/cmrjs/src/cmr-utils.js

@@ -749,7 +782,7 @@ async function updateUMMGMetadata({
 * @param {string} cmrConfig.certificate - Launchpad certificate
 * @param {string} cmrConfig.username - EDL username
 * @param {string} cmrConfig.passwordSecretName - CMR password secret name
- * @returns {Promise<Object>} object to create CMR instance - contains the
+ * @returns {Promise<CMRConstructorParams>} object to create CMR instance - contains the


Jkovarik · 2025-02-06T20:56:15Z

packages/cmrjs/src/cmr-utils.js

@@ -691,6 +695,43 @@ function shouldUseDirectS3Type(metadataObject) {
  return false;
 }

+/**
+ *
+ * @param {Object} params


Can we define these params a bit better re: typing? Not asking for metadataObject, but bucketTypes/BucketMap/files might be reasonable given this is exported.
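Typed params for this exported function could look roughly like the following. All names here (`BucketTypes`, `DistributionBucketMap`, `CmrFile`, `UpdateUmmgParams`, `filesMissingBucketType`) are hypothetical. The map shapes (bucket name to bucket type, bucket name to distribution path) are assumptions to check against how cmr-utils actually uses them.

```typescript
// Hypothetical type names; the real cmr-utils parameters may differ.
// Assumed: bucket name -> bucket type (e.g. 'protected', 'public').
type BucketTypes = Record<string, string>;
// Assumed: bucket name -> distribution path prefix.
type DistributionBucketMap = Record<string, string>;

interface CmrFile {
  bucket: string;
  key: string;
  type?: string;
}

interface UpdateUmmgParams {
  metadataObject: object; // intentionally left loose, per the comment above
  files: CmrFile[];
  distEndpoint: string;
  bucketTypes: BucketTypes;
  distributionBucketMap: DistributionBucketMap;
}

// Example consumer: lists files whose bucket has no entry in bucketTypes.
function filesMissingBucketType(params: UpdateUmmgParams): string[] {
  return params.files
    .filter((f) => !Object.prototype.hasOwnProperty.call(params.bucketTypes, f.bucket))
    .map((f) => `${f.bucket}/${f.key}`);
}
```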

Jkovarik · 2025-02-06T21:01:44Z

packages/cmrjs/src/cmr-utils.js

@@ -848,31 +881,24 @@ function buildMergedEchoURLObject(URLlist = [], originalURLlist = [], removedURL
 }

 /**
- * After files are moved, creates new online access URLs and then updates
- * the S3 ECHO10 CMR XML file with this information.
 *


Edit: The docstring looks good, but it should probably have header (summary) text.

Jkovarik · 2025-02-06T21:14:38Z

packages/cmrjs/src/cmr-utils.js

+ * @param {string} attributePath - subObject path to seek
+ * @returns {string | null}
+*/
+const findCollectionAttributePath = (cmrObject, attributePath) => {


Is there a reason we need to recurse through the entire object? Is the collection not at a well-defined key in the CMR spec?

Also, if it isn't at a well-defined location in the spec, are we certain there can't be duplicates, etc.?
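One way to make the duplicate question concrete is to collect every matching path rather than stopping at the first one. The sketch below uses a hypothetical `findAllAttributePaths` helper. It is not the PR's `findCollectionAttributePath`, and the example object is made up.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Collects every path at which `attribute` appears as an object key, so a
// caller can detect duplicates instead of silently taking the first match.
function findAllAttributePaths(node: Json, attribute: string, prefix: string[] = []): string[][] {
  if (node === null || typeof node !== 'object') return [];
  const entries: [string, Json][] = Array.isArray(node)
    ? node.map((value, index): [string, Json] => [String(index), value])
    : Object.entries(node);
  const paths: string[][] = [];
  for (const [key, value] of entries) {
    const path = [...prefix, key];
    if (!Array.isArray(node) && key === attribute) paths.push(path);
    paths.push(...findAllAttributePaths(value, attribute, path));
  }
  return paths;
}

// Made-up ECHO10-like object with a second `Collection` key nested elsewhere.
const echo10Like: Json = {
  Granule: {
    Collection: { ShortName: 'a', VersionId: '1' },
    Other: [{ Collection: { ShortName: 'x' } }],
  },
};
const found = findAllAttributePaths(echo10Like, 'Collection').map((p) => p.join('.'));
```

Here `found` has two entries, `Granule.Collection` and `Granule.Other.0.Collection`. A first-match search would return only one of them, without signaling that the other exists.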

Jkovarik · 2025-02-06T21:16:16Z

packages/cmrjs/src/cmr-utils.js

+) => {
+  const backupPath = defaultPath || identifierPath;
+  const fullPath = findCollectionAttributePath(cmrObject, identifierPath) || backupPath;
+  set(cmrObject, fullPath, value);


We should probably avoid mutation here, or consider not abstracting this?

I'm less concerned about the implementation as it stands than about the fact that nothing (convention, naming, etc.) would make someone think twice before importing it elsewhere at some point in the future.
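A non-mutating alternative to `set(cmrObject, fullPath, value)` could copy only the objects along the path. The sketch below shows this with a hypothetical `setImmutable` helper. It is not lodash and not the PR's code. It treats numeric path segments as array indices when the current node is an array.

```typescript
// Returns a copy of `node` with `value` written at `path`. Only containers on
// the path are copied; sibling branches are shared with the original.
function setImmutable(node: unknown, path: (string | number)[], value: unknown): unknown {
  if (path.length === 0) return value;
  const [head, ...rest] = path;
  if (Array.isArray(node)) {
    const copy = [...node];
    const index = Number(head);
    copy[index] = setImmutable(copy[index], rest, value);
    return copy;
  }
  // Missing or non-object intermediate nodes are replaced with a fresh object.
  const obj: Record<string, unknown> =
    node !== null && typeof node === 'object' ? (node as Record<string, unknown>) : {};
  return { ...obj, [head]: setImmutable(obj[head], rest, value) };
}
```

Because the result is returned rather than written in place, importing the helper elsewhere carries no hidden side effects.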

Jkovarik · 2025-02-06T21:21:44Z

packages/cmrjs/tests/cmr-utils/test-cmr-utils.js

+  t.is(updated.Granule.WhyThisAttribute[1].CollectionReference.Version, 'b');
+});
+
+test('updateCmrFileCollections updates umm when missing', (t) => {


So this adds a Collection.Shortname to the user metadata if it doesn't already exist?

Can we document/elaborate/discuss why that choice was made?

Jkovarik · 2025-02-06T21:22:32Z

packages/cmrjs/tests/cmr-utils/test-cmr-utils.js

+  t.is(updated.CollectionReference.Version, 'b');
+});
+
+test('updateCmrFileCollections updates umm at non-standard locations', (t) => {


What makes this a non-standard location? Is it possible for it to be defined in both places? Same question for UMM as well.

Jkovarik · 2025-02-06T22:54:20Z

tasks/change-granule-collection-s3/schemas/output.json

+        }
+      }
+    },
+    "oldGranules": {


I think there's still an open question re: this output with respect to what we need downstream, but that can probably be modified as part of that work if needed. Commenting mostly as a bookmark to have that conversation.

Jkovarik

Review still in progress, further commentary

Jkovarik · 2025-02-06T22:57:01Z

tasks/change-granule-collection-s3/src/update_cmr_file_collection.ts

+import { ApiFile, ApiGranuleRecord } from '@cumulus/types';
+import { AssertionError } from 'assert';
+
+export type ValidApiFile = {


Nit: Is this a legit type for the types package? Should it be?

Jkovarik · 2025-02-06T22:59:57Z

tasks/change-granule-collection-s3/src/update_cmr_file_collection.ts

+} & ApiFile;
+
+export type ValidGranuleRecord = {
+  files: Omit<ValidApiFile, 'granuleId'>[]


Nit: This also raises the question of whether there should be a type that excludes granuleId, given that we're doing this sort of omission both in types/granules and here.

Consider making this a defined type to DRY up the Omits in the task.
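A named alias might replace the repeated `Omit<ValidApiFile, 'granuleId'>`. In the sketch below, `ApiFile` is a trimmed stand-in for the `@cumulus/types` definition with assumed fields. The alias name `ValidGranuleFile` and the `fileKeys` helper are hypothetical.

```typescript
// Trimmed stand-in for the @cumulus/types ApiFile (assumed fields only).
type ApiFile = { granuleId: string; bucket?: string; key?: string; fileName?: string };

// As in the task: bucket and key are guaranteed present.
type ValidApiFile = { bucket: string; key: string } & ApiFile;

// Hypothetical named alias replacing each inline Omit<ValidApiFile, 'granuleId'>.
type ValidGranuleFile = Omit<ValidApiFile, 'granuleId'>;

type ValidGranuleRecord = { granuleId: string; files: ValidGranuleFile[] };

// Example consumer: signatures now read in terms of the alias.
function fileKeys(granule: ValidGranuleRecord): string[] {
  return granule.files.map((file) => `${file.bucket}/${file.key}`);
}
```

If the alias lived in the types package, `types/granules` and this task could share it.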

Jkovarik · 2025-02-06T23:03:10Z

tasks/change-granule-collection-s3/src/update_cmr_file_collection.ts

+  return true;
+}
+
+export function apiGranuleRecordIsValid(granule: ApiGranuleRecord): granule is ValidGranuleRecord {


Nit: So we're saying that granules must have files to be valid for the methods using this type. I'm saving this thought because I haven't completed the review, but what if a collection being moved contains some granules that don't have files? Is that a valid case? I added the nit tag because I'm unconvinced this is a useful edge case in practice.
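To make the edge case explicit, the caller could partition granules instead of treating all of them the same way. This is a sketch of one possible policy. The types, `isValidGranule`, and `partitionGranules` are hypothetical, since the body of `apiGranuleRecordIsValid` is not shown in the hunk. Note that under this check, `files: []` passes, because `every` on an empty array is true, while a missing `files` fails.

```typescript
// Hypothetical, trimmed-down granule shapes.
type Granule = { granuleId: string; files?: { bucket?: string; key?: string }[] };
type ValidGranule = { granuleId: string; files: { bucket: string; key: string }[] };

// One possible policy: files must be present, and each must have bucket and key.
function isValidGranule(granule: Granule): granule is ValidGranule {
  return (
    Array.isArray(granule.files) &&
    granule.files.every((f) => typeof f.bucket === 'string' && typeof f.key === 'string')
  );
}

// Splits granules into those the task can move and the IDs it would skip.
function partitionGranules(granules: Granule[]): { movable: ValidGranule[]; skipped: string[] } {
  const movable: ValidGranule[] = [];
  const skipped: string[] = [];
  for (const granule of granules) {
    if (isValidGranule(granule)) movable.push(granule);
    else skipped.push(granule.granuleId);
  }
  return { movable, skipped };
}
```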

Jkovarik · 2025-02-06T23:10:45Z

tasks/change-granule-collection-s3/src/update_cmr_file_collection.ts

+export const uploadCMRFile = async (cmrFile: Omit<ValidApiFile, 'granuleId'>, cmrObject: { Granule?: object }) => {
+  let cmrFileString;
+  if (isUMMGFilename(cmrFile.fileName || cmrFile.key)) {
+    cmrFileString = JSON.stringify(cmrObject, undefined, 2);


Why '2' here?

Throughout our code, when we upload JSON, we upload it as a prettified string with 2-space indentation.
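The third argument to `JSON.stringify` sets the indentation width. The sample object below is made up, but it shows the resulting layout, and the pretty form parses back to the same data.

```typescript
const cmrObject = { GranuleUR: 'g1', CollectionReference: { ShortName: 'a', Version: '1' } };

// `undefined` means no replacer; 2 means two spaces per nesting level.
const pretty = JSON.stringify(cmrObject, undefined, 2);
console.log(pretty);
// {
//   "GranuleUR": "g1",
//   "CollectionReference": {
//     "ShortName": "a",
//     "Version": "1"
//   }
// }
```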

Jkovarik · 2025-02-07T12:49:17Z

tasks/change-granule-collection-s3/src/update_cmr_file_collection.ts

+    cmrFileString = JSON.stringify(cmrObject, undefined, 2);
+  } else {
+    // our xml stringify function packages the metadata in "Granule",
+    // resulting in possible nested Granule object


Can you talk me through this a bit? I see generateEcho10XMLString putting values in Object.Granule, but is `{ Granule?: object }` the typing in the method because it could be UMMG or Echo10XML?

(Also: direct unit coverage here might help document the edge case?)
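The edge case described in the code comment (the XML stringifier wraps its input in `Granule`, so an object that already has a `Granule` key would nest twice) could be isolated and unit-tested on its own. The sketch below assumes a hypothetical `unwrapGranule` helper that would be applied only on the ECHO10 XML branch. It is not the PR's implementation.

```typescript
type CmrObject = { Granule?: object } & Record<string, unknown>;

// Hypothetical helper for the ECHO10 XML branch: if the parsed object is
// already wrapped in `Granule`, return the inner object, so a stringifier
// that adds its own `Granule` wrapper does not emit <Granule><Granule>.
function unwrapGranule(cmrObject: CmrObject): object {
  return cmrObject.Granule ?? cmrObject;
}
```

Tests for the wrapped and unwrapped inputs document the edge case directly, independent of S3 upload.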

Jkovarik · 2025-02-07T12:50:01Z