🚨🚨 Source Mailchimp: Migrate to Low code #35281

artem1205 · 2024-02-14T15:50:07Z

What

Resolve https://github.com/airbytehq/airbyte-internal-issues/issues/2919

How

migrate to Low-Code

🚨 User Impact 🚨

breaking change for nested state (parent child incremental)
fix segment_members primary key: ["id", "segment_id"]
fix list_members primary key: ["id", "list_id"]
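
For context on the nested state mentioned above: the filter code reviewed later in this PR reads state as a list of per-partition entries under a `states` key, each with a `partition` and a `cursor`. Below is a minimal sketch of that lookup, assuming that shape. The helper `find_partition_cursor`, the `last_changed` cursor field, and the sample list ids are hypothetical and not taken from the connector.

```python
from typing import Any, Mapping, Optional


def find_partition_cursor(
    stream_state: Mapping[str, Any],
    partition_id: str,
    cursor_field: str,
) -> Optional[str]:
    """Return the saved cursor value for one parent partition, or None."""
    for entry in stream_state.get("states", []):
        if entry["partition"]["id"] == partition_id:
            return entry["cursor"].get(cursor_field)
    return None


# Hypothetical per-partition state: one entry per parent list id.
state = {
    "states": [
        {"partition": {"id": "list_a"}, "cursor": {"last_changed": "2024-01-01T00:00:00+00:00"}},
        {"partition": {"id": "list_b"}, "cursor": {"last_changed": "2024-02-01T00:00:00+00:00"}},
    ]
}
```

A state saved in any other layout would return None from this lookup, which is one way to picture why a change of state format is breaking for existing syncs.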

Pre-merge Actions

Updating a connector

Community member or Airbyter

Grant edit access to maintainers (instructions)
Unit & integration tests added

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

Create a non-forked branch based on this PR and test the below items on it
Build is successful
If new credentials are required for use in CI, add them to GSM. Instructions.

vercel · 2024-02-14T15:50:14Z

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Ignored Deployment

Name	Status	Preview	Comments	Updated (UTC)
airbyte-docs	⬜️ Ignored (Inspect)	Visit Preview		Mar 29, 2024 11:13am

github-actions · 2024-02-14T15:50:29Z

Before Merging a Connector Pull Request

Wow! What a great pull request you have here! 🎉

To merge this PR, ensure the following has been done/considered for each connector added or updated:

PR name follows PR naming conventions
Breaking changes are considered. If a Breaking Change is being introduced, ensure an Airbyte engineer has created a Breaking Change Plan.
Connector version has been incremented in the Dockerfile and metadata.yaml according to our Semantic Versioning for Connectors guidelines
You've updated the connector's metadata.yaml file with any other relevant changes, including a breakingChanges entry for major version bumps. See metadata.yaml docs
Secrets in the connector's spec are annotated with airbyte_secret
All documentation files are up to date. (README.md, bootstrap.md, docs.md, etc...)
Changelog updated in docs/integrations/<source or destination>/<name>.md with an entry for the new version. See changelog example
Migration guide updated in docs/integrations/<source or destination>/<name>-migrations.md with an entry for the new version, if the version is a breaking change. See migration guide example
If set, you've ensured the icon is present in the platform-internal repo. (Docs)

If the checklist is complete, but the CI check is failing,

Check for hidden checklists in your PR description
Toggle the github label checklist-action-run on/off to re-run the checklist CI.

natikgadzhi · 2024-02-14T18:32:04Z

airbyte-integrations/connectors/source-mailchimp/source_mailchimp/components.py

+class MailChimpRecordFilter(RecordFilter):
+    """
+    Filter applied on a list of Records.
+    """
+
+    def filter_records(
+        self,
+        records: List[Mapping[str, Any]],
+        stream_state: StreamState,
+        stream_slice: Optional[StreamSlice] = None,
+        next_page_token: Optional[Mapping[str, Any]] = None,
+    ) -> List[Mapping[str, Any]]:
+        current_state = [x for x in stream_state.get("states", []) if x["partition"]["id"] == stream_slice.partition["id"]]
+        # TODO: REF what to do if no start_date mentioned (see manifest)
+        #  implement the same logic
+        start_date = self.config.get("start_date", (pendulum.now() - pendulum.duration(days=700)).to_iso8601_string())
+        if current_state and start_date:
+            filter_value = max(start_date, current_state[0]["cursor"][self.parameters["cursor_field"]])
+            return [record for record in records if record[self.parameters["cursor_field"]] > filter_value]
+        return records
+
+
+class MailChimpRecordFilterEmailActivity(RecordFilter):
+    def filter_records(
+        self,
+        records: List[Mapping[str, Any]],
+        stream_state: StreamState,
+        stream_slice: Optional[StreamSlice] = None,
+        next_page_token: Optional[Mapping[str, Any]] = None,
+    ) -> List[Mapping[str, Any]]:
+
+        return [{**record, **activity_item} for record in records for activity_item in record.pop("activity", [])]


Question so I learn: it looks like this filter takes each record dict and returns a separate row for each activity in that record.

If there are no activities at all, the record will be filtered out. If there are multiple activities on a single record (if that is possible), it will return one row for each activity item found.

How wrong am I?

If that's correct, then this thing not only filters records but also transforms them and generates them, right?

You're generally right.

This code could be written as follows:

data = response_json.get(self.data_field, [])
for item in data:
    for activity_item in item.pop("activity", []):
        yield {**item, **activity_item}

If that's correct, then this thing not only filters records but also transforms them and generates them, right?

Yep.
The standard DpathExtractor replaces the first step in the code above, data = response_json.get(self.data_field, []),
and this custom class filters and extracts activities.
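
To make the behaviour discussed above concrete, here is a small sketch of the same fan-out on sample data. It is not the connector's code: `explode_activities`, the `email_id` and `action` fields, and the sample records are hypothetical, and unlike the original it copies each record instead of calling `pop` on it.

```python
from typing import Any, Dict, List


def explode_activities(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Emit one row per (record, activity) pair; records without activities vanish."""
    rows: List[Dict[str, Any]] = []
    for record in records:
        # Copy the parent fields so the input record is not mutated.
        parent = {k: v for k, v in record.items() if k != "activity"}
        for activity_item in record.get("activity", []):
            rows.append({**parent, **activity_item})
    return rows


sample = [
    {"email_id": "a", "activity": [{"action": "open"}, {"action": "click"}]},
    {"email_id": "b", "activity": []},  # empty list: dropped
    {"email_id": "c"},  # no "activity" key: dropped
]
rows = explode_activities(sample)
```

Three input records produce two output rows here, both for `email_id` "a", which illustrates the point that the component filters and generates records at the same time.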

why is this defined as being a filter? Like Natik mentioned, I would expect this to be a custom transformation followed by a filter that checks if there were activities.

Separating the two concerns allows us to extract some logic that can be generalized.

Instead of creating a custom filter, we should introduce a flatten_field transformation which takes a nested object and brings it to the root

The transformation:

{ "key": 123, "activities": {"nested": "value", "another_nested": "value2"} }

to

{ "key": 123, "nested": "value", "another_nested": "value2" }

would then be described as

transformations:
  type: FlattenField
  field_pointers:
    - activities

Here is a tentative schema definition:

FlattenFields:
  title: Flatten Fields
  type: object
  required:
    - type
    - field_pointers
  properties:
    type:
      type: string
      enum: [FlattenFields]
    field_pointers:
      title: Field Paths
      type: array
      items:
        items:
          type: string
      examples:
        - ["activities"]
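
As a rough illustration of what the proposed transformation would do to a single record, here is a sketch in plain Python. `flatten_field` is a hypothetical helper, not a CDK component, and how key collisions should be resolved is an assumption made here for the example.

```python
from typing import Any, Dict


def flatten_field(record: Dict[str, Any], field: str) -> None:
    """Move the keys of record[field] to the root of record, in place."""
    nested = record.get(field)
    if isinstance(nested, dict):
        del record[field]
        # Assumption: nested keys overwrite root keys when names collide.
        record.update(nested)


record = {"key": 123, "activities": {"nested": "value", "another_nested": "value2"}}
flatten_field(record, "activities")

# A list-valued field is left untouched by this sketch.
list_record = {"key": 123, "activities": [{"nested": "value"}]}
flatten_field(list_record, "activities")
```

This works only when the nested value is an object. A list-valued field needs one output record per element, which an in-place helper like this cannot produce.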

Hmm, I can agree that this is more about transformation rather than filtering, but

{ "key": 123, "activities": [ {"nested": "value"}, {"another_nested": "value2"}] }

should be transformed into 2 records, because activities is a list:

[ { "key": 123, "nested": "value" }, { "key": 123, "another_nested": "value2" } ]

This means we will end up with more records than the original list had, while the current implementation does not expect the number of records to change:

airbyte/airbyte-cdk/python/airbyte_cdk/sources/declarative/extractors/record_selector.py

Lines 88 to 97 in 4061f08

def _transform(
    self,
    records: List[Mapping[str, Any]],
    stream_state: StreamState,
    stream_slice: Optional[StreamSlice] = None,
) -> None:
    for record in records:
        for transformation in self.transformations:
            # record has type Mapping[str, Any], but Record expected
            transformation.transform(record, config=self.config, stream_state=stream_state, stream_slice=stream_slice)  # type: ignore
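
The excerpt above mutates each record in place and returns None, so the list always keeps its length. The sketch below contrasts that with a step that returns an iterable and can therefore emit zero, one, or many records per input. All names (`apply_in_place`, `apply_fan_out`, `add_flag`, `split_activities`) are hypothetical and only illustrate the interface difference; this is not a proposed CDK API.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

InPlaceStep = Callable[[Dict[str, Any]], None]
FanOutStep = Callable[[Dict[str, Any]], Iterable[Dict[str, Any]]]


def apply_in_place(records: List[Dict[str, Any]], step: InPlaceStep) -> None:
    """Same shape as the excerpt: each record is mutated, the count never changes."""
    for record in records:
        step(record)


def apply_fan_out(records: Iterable[Dict[str, Any]], step: FanOutStep) -> Iterator[Dict[str, Any]]:
    """Each input record may yield any number of output records."""
    for record in records:
        yield from step(record)


def add_flag(record: Dict[str, Any]) -> None:
    record["seen"] = True


def split_activities(record: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    parent = {k: v for k, v in record.items() if k != "activities"}
    for item in record.get("activities", []):
        yield {**parent, **item}


records = [{"key": 123, "activities": [{"nested": "value"}, {"another_nested": "value2"}]}]
expanded = list(apply_fan_out(records, split_activities))
apply_in_place(records, add_flag)
```

With the in-place step `records` still holds one item afterwards, while `expanded` holds two.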

got it! I did misunderstand.

Can you make a proposal for how you'd want to use this feature? When considering new features or custom components, it's useful to have a description, examples, and a desired interface (when known) to help us understand the use case.

Then we can implement and make any required changes to the record transform interface

Sure, I'll get back with all the proposals as soon as this PR is in review; I'm still working on it.


girarda · 2024-02-14T20:20:12Z

airbyte-integrations/connectors/source-mailchimp/source_mailchimp/components.py

+
+
+@dataclass
+class MailChimpRequester(HttpRequester):


Can you describe why this is needed? It's difficult to judge whether it should truly be custom behavior without stating the desired behavior and why it cannot be supported natively by the CDK.

So, Mailchimp defines a custom API structure:

There are a few ways to find your data center. It’s the first part of the URL you see in the API keys section of your account; if the URL is https://us6.mailchimp.com/account/api/, then the data center subdomain is us6. It’s also appended to your API key in the form key-dc; if your API key is 0123456789abcdef0123456789abcde-us6, then the data center subdomain is us6. And finally, if you’re connecting via OAuth 2, you can find the data center associated with the token via the OAuth Metadata endpoint; for more information, see the OAuth guide.

The only reason for this custom component is to get the data center prefix for the OAuth authenticator.

Why is this a custom requester instead of something like a config migrator?

girarda · 2024-02-14T20:21:25Z

airbyte-integrations/connectors/source-mailchimp/unit_tests/test_source.py

-    ],
-    ids=["API Key Invalid", "Forbidden", "Unknown Error"],
-)
-def test_check_connection_error(requests_mock, config, data_center, response, expected_message):


have we considered the tradeoffs between deleting the tests and updating the fixture to run them in the new implementation?

girarda · 2024-02-14T20:26:45Z

airbyte-integrations/connectors/source-mailchimp/source_mailchimp/components.py

+                data_center = self.config["credentials"]["apikey"].split("-").pop()
+            else:
+                data_center = self.get_oauth_data_center(self.config["credentials"]["access_token"])
+            self.config["data_center"] = data_center


why are we modifying the config? As far as I know, this is an implementation detail. Why not remove the url_base field from the manifest?

Mainly for visibility and not overriding the get_url_base method from base class

I would agree with @girarda that it would be better to pass this as a parameter of __init__ else we create an object that is stateful. That being said, I don't know how all of this fits in the initialization process of the custom components. The other solution I would propose is to not use custom components and use only interpolation
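
One way to picture the stateless alternative raised above is to derive the data center once and return a new config instead of writing into `self.config`. The sketch below covers only the API key case described in the quoted Mailchimp docs; the OAuth metadata lookup is left out. The function names are hypothetical, and the config layout (`credentials.apikey`, `data_center`) is assumed from the diff excerpt.

```python
from typing import Any, Dict, Mapping


def data_center_from_api_key(api_key: str) -> str:
    """Mailchimp API keys end in '-<dc>', for example '...-us6'."""
    key, sep, data_center = api_key.rpartition("-")
    if not sep or not key or not data_center:
        raise ValueError("API key does not contain a data center suffix")
    return data_center


def with_data_center(config: Mapping[str, Any]) -> Dict[str, Any]:
    """Return a copy of config with data_center set; the input is left untouched."""
    data_center = data_center_from_api_key(config["credentials"]["apikey"])
    return {**config, "data_center": data_center}


config = {"credentials": {"apikey": "0123456789abcdef0123456789abcde-us6"}}
migrated = with_data_center(config)
```

Because the original mapping is never modified, the component that consumes `migrated` does not need to carry hidden state, which is the concern raised in the review.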

maxi297

Thanks for addressing all the comments and making sure there is as little impact as possible for the user!

maxi297 · 2024-03-18T13:24:06Z

airbyte-integrations/connectors/source-mailchimp/source_mailchimp/config_migrations.py

+        Args:
+        - migrated_config (Mapping[str, Any]): The migrated configuration.
+        """
+        cls.message_repository.emit_message(create_connector_config_control_message(migrated_config))


This whole method can be replaced by print(create_connector_config_control_message(migrated_config).json(exclude_unset=True)) and we can remove the message_repository from this class


NAjustin · 2024-03-22T19:29:44Z

@artem1205 I found this PR while investigating an issue with the current version of the Mailchimp connector, and noticed that this version replicates the same error, so I wanted to flag it for you while I work on testing a PR for the main connector.

The issue is that the members object currently has primary_key set to id, which is incorrect:

In Mailchimp, members are associated with lists. This means there will be one member record per list they are on.
members.id is a hash of the member's email address, which is obviously non-unique across lists
This means that if you do an incremental sync with the current connector (or this low-code version), you will only get one record per email.

You can see this issue here in your feature branch.

Either of the following keys would be suitable:

contact_id, which is a non-email-specific identifier for the member within the list (so is unique across email+list)
web_id is an integer and is also unique to the list+email combination
id and list_id could also be used as a compound key with the same effect

Mailchimp also discourages the use of id in their docs (see the members.contact_id response object), stating:

As Mailchimp evolves beyond email, you may eventually have contacts without email addresses. While the id is the MD5 hash of their email address, this contact_id is agnostic of contact’s inclusion of an email address.

This has been a longstanding issue in the Mailchimp source, but only affects accounts with the same contacts on multiple lists which may be why it's flown under the radar.
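
The collision described in this comment can be reproduced with a few lines of Python. This is an illustrative sketch only: `dedupe` is a hypothetical stand-in for what a destination does when it deduplicates on the primary key, and the sample members are made up. The id is computed as the MD5 hash of the lowercased email, as the Mailchimp docs describe.

```python
import hashlib
from typing import Any, Dict, Iterable, List, Tuple


def member_id(email: str) -> str:
    """MD5 hash of the lowercased email address."""
    return hashlib.md5(email.lower().encode("utf-8")).hexdigest()


def dedupe(records: Iterable[Dict[str, Any]], primary_key: List[str]) -> Dict[Tuple[Any, ...], Dict[str, Any]]:
    """Keep the last record seen for each primary key value."""
    latest: Dict[Tuple[Any, ...], Dict[str, Any]] = {}
    for record in records:
        latest[tuple(record[field] for field in primary_key)] = record
    return latest


# The same contact on two different lists.
members = [
    {"id": member_id("Jane@example.com"), "list_id": "list_a", "status": "subscribed"},
    {"id": member_id("jane@example.com"), "list_id": "list_b", "status": "unsubscribed"},
]

by_id = dedupe(members, ["id"])
by_id_and_list = dedupe(members, ["id", "list_id"])
```

Keyed on `id` alone, the two memberships collapse into one record; keyed on `id` plus `list_id`, both survive.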


artem1205 · 2024-03-26T11:49:38Z

@NAjustin, thank you for pointing that out!
A composite primary key will be used: ["id", "list_id"]


brianjlai

small reminder to re-run poetry lock before merging to use the latest version of airbyte-cdk. Thanks!

…ilchimp-low-code-35064 # Conflicts: # airbyte-integrations/connectors/source-mailchimp/metadata.yaml # airbyte-integrations/connectors/source-mailchimp/poetry.lock # airbyte-integrations/connectors/source-mailchimp/pyproject.toml # docs/integrations/sources/mailchimp.md


artem1205 added 3 commits February 14, 2024 14:51

Airbyte CDK: add CustomRecordFilter

c9d43a6

Airbyte CDK: add interpolation for RequestOptions

83736f2

Source Mailchimp: migrate to Low-Code

3a96f34

octavia-squidington-iii added area/connectors Connector related issues CDK Connector Development Kit labels Feb 14, 2024

octavia-squidington-iii added the connectors/source/mailchimp label Feb 14, 2024

artem1205 self-assigned this Feb 14, 2024

natikgadzhi reviewed Feb 14, 2024


artem1205 added 5 commits February 14, 2024 19:54

Source Mailchimp: bump base image

9d79879

Source Mailchimp: remove unit tests

200d282

Source Mailchimp: add docstring

9b76729

Source Mailchimp: add segment_members transformation

b8173f1

Source Mailchimp: add tags transformation

568d83c

girarda reviewed Feb 14, 2024


girarda reviewed Feb 14, 2024


artem1205 added 12 commits February 14, 2024 21:45

Source Mailchimp: use SelectiveAuthenticator

7eafce1

Airbyte CDK: add filter to RemoveFields

7419188

Source Mailchimp: add transformation

321ea38

Source Mailchimp: remove duplicating test

1b4ca32

Source Mailchimp: ref MailChimpRecordExtractorEmailActivity

90b9606

Source Mailchimp: add test

24b1511

Source Mailchimp: add unit tests

0cb2273

Source Mailchimp: add test for components

2c25d95

Source Mailchimp: add integration tests

536c018

Source Mailchimp: ref manifest.yaml

ccd2305

Source Mailchimp: add unit tests

b08c4b7

Source Mailchimp: ref

d8ef2fc

maxi297 approved these changes Mar 18, 2024


artem1205 added 2 commits March 18, 2024 14:45

Source MailChimp: remove message_repository

f703e00

Signed-off-by: Artem Inzhyyants <[email protected]>

Source MailChimp: update docs

82526f7

Signed-off-by: Artem Inzhyyants <[email protected]>

vercel bot deployed to Preview March 18, 2024 16:20 View deployment

artem1205 added 3 commits March 18, 2024 17:27

Merge remote-tracking branch 'origin/master' into artem1205/source-ma…

dbe6cf2

…ilchimp-low-code-35064

Source MailChimp: format

f087be1

Signed-off-by: Artem Inzhyyants <[email protected]>

Source MailChimp: format

6e92e8e

Signed-off-by: Artem Inzhyyants <[email protected]>

lazebnyi approved these changes Mar 20, 2024


NAjustin mentioned this pull request Mar 22, 2024

🐛 Source MailChimp: Fix incorrect primary key on list_members stream #36405

Closed

artem1205 added 3 commits March 26, 2024 12:28

Merge remote-tracking branch 'origin/master' into artem1205/source-ma…

cc9299a

…ilchimp-low-code-35064

Source MailChimp: update primary key

0124f5e

Signed-off-by: Artem Inzhyyants <[email protected]>

Source MailChimp: update poetry

ba51d7f

Signed-off-by: Artem Inzhyyants <[email protected]>

Source MailChimp: update poetry

bd79818

Signed-off-by: Artem Inzhyyants <[email protected]>

vercel bot deployed to Preview March 26, 2024 11:55 View deployment

artem1205 added 5 commits March 26, 2024 13:30

Source MailChimp: add LegacyToPerPartitionStateMigration

2550ba7

Signed-off-by: Artem Inzhyyants <[email protected]>

Source Mailchimp: fix state migration

c6953ab

Signed-off-by: Artem Inzhyyants <[email protected]>

Source MailChimp: update acceptance test

38dd9a4

Signed-off-by: Artem Inzhyyants <[email protected]>

Source MailChimp: update acceptance test

08be185

Signed-off-by: Artem Inzhyyants <[email protected]>

Source MailChimp: update docs

f903771

Signed-off-by: Artem Inzhyyants <[email protected]>

vercel bot deployed to Preview March 27, 2024 18:23 View deployment

Source MailChimp: fix formatting

c556708

Signed-off-by: Artem Inzhyyants <[email protected]>

brianjlai reviewed Mar 28, 2024


artem1205 added 2 commits March 29, 2024 12:12

Source Mailchimp: bump CDK version

ffd2ca3

Signed-off-by: Artem Inzhyyants <[email protected]>

artem1205 merged commit 02add5b into master Apr 1, 2024
29 of 30 checks passed

artem1205 deleted the artem1205/source-mailchimp-low-code-35064 branch April 1, 2024 13:30

lazebnyi added the low-code-migration This connector has been migrated to the low-code CDK label Apr 3, 2024

nurikk pushed a commit to nurikk/airbyte that referenced this pull request Apr 4, 2024

🚨🚨 Source Mailchimp: Migrate to Low code (airbytehq#35281)

05c1ba1

Signed-off-by: Artem Inzhyyants <[email protected]>
