🚨🚨 Source Mailchimp: Migrate to Low code #35281
Before Merging a Connector Pull Request
Wow! What a great pull request you have here! 🎉 To merge this PR, ensure the following has been done/considered for each connector added or updated:
If the checklist is complete, but the CI check is failing,
class MailChimpRecordFilter(RecordFilter):
    """
    Filter applied on a list of Records.
    """

    def filter_records(
        self,
        records: List[Mapping[str, Any]],
        stream_state: StreamState,
        stream_slice: Optional[StreamSlice] = None,
        next_page_token: Optional[Mapping[str, Any]] = None,
    ) -> List[Mapping[str, Any]]:
        current_state = [x for x in stream_state.get("states", []) if x["partition"]["id"] == stream_slice.partition["id"]]
        # TODO: REF what to do if no start_date mentioned (see manifest)
        #  implement the same logic
        start_date = self.config.get("start_date", (pendulum.now() - pendulum.duration(days=700)).to_iso8601_string())
        if current_state and start_date:
            filter_value = max(start_date, current_state[0]["cursor"][self.parameters["cursor_field"]])
            return [record for record in records if record[self.parameters["cursor_field"]] > filter_value]
        return records

class MailChimpRecordFilterEmailActivity(RecordFilter):
    def filter_records(
        self,
        records: List[Mapping[str, Any]],
        stream_state: StreamState,
        stream_slice: Optional[StreamSlice] = None,
        next_page_token: Optional[Mapping[str, Any]] = None,
    ) -> List[Mapping[str, Any]]:

        return [{**record, **activity_item} for record in records for activity_item in record.pop("activity", [])]
Question so I learn: it looks like this filter takes a list of records and returns a separate row for each record and each activity in that record.
If there are no activities at all, the record will be filtered out. If there are multiple activities on a single record (if that is possible), it will return one row for each activity item found.
How wrong am I?
If that's correct, then this thing both filters records, but also transforms them and generates them, right?
You're generally right. This code may be written as follows:
data = response_json.get(self.data_field, [])
for item in data:
    for activity_item in item.pop("activity", []):
        yield {**item, **activity_item}
"If that's correct, then this thing both filters records, but also transforms them and generates them, right?"
Yep.
The standard DpathExtractor replaces the first line of the code above, data = response_json.get(self.data_field, []), and this custom class filters and extracts the activities.
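To make the behaviour discussed above concrete, here is a small sketch that runs the same comprehension on sample data. The function name flatten_activities and the sample field names (email_id, action) are illustrative assumptions, not taken from the connector.

```python
from typing import Any, Dict, List


def flatten_activities(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Same expression as the filter in the PR: one row per (record, activity) pair."""
    # Note: pop() removes "activity" from each input record as a side effect.
    return [{**record, **activity_item} for record in records for activity_item in record.pop("activity", [])]


# Illustrative sample data (field names are assumptions).
records = [
    {"email_id": "a", "activity": [{"action": "open"}, {"action": "click"}]},
    {"email_id": "b", "activity": []},  # no activities: produces no rows
    {"email_id": "c"},                  # missing key: produces no rows
]

rows = flatten_activities(records)
```

With this input, the first record yields two rows and the other two yield none, which matches the reading that the component filters, transforms, and generates records at once. The input records also lose their "activity" key, because pop mutates them.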
Why is this defined as a filter? Like Natik mentioned, I would expect this to be a custom transformation followed by a filter that checks whether there were activities.
Separating the two concerns allows us to extract some logic that can be generalized.
Instead of creating a custom filter, we should introduce a flatten_field transformation, which takes a nested object and brings it to the root.
The transformation:
{
  "key": 123,
  "activities": {"nested": "value", "another_nested": "value2"}
}
to
{
  "key": 123,
  "nested": "value",
  "another_nested": "value2"
}
would then be described as
transformations:
  - type: FlattenFields
    field_pointers:
      - ["activities"]
Here is a tentative schema definition:
FlattenFields:
  title: Flatten Fields
  type: object
  required:
    - type
    - field_pointers
  properties:
    type:
      type: string
      enum: [FlattenFields]
    field_pointers:
      title: Field Paths
      type: array
      items:
        items:
          type: string
      examples:
        - ["activities"]
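The proposed transformation could be sketched in plain Python roughly as follows. This is only an illustration of the intended behaviour under stated assumptions: the function name flatten_field, its signature, and the choice to let nested keys overwrite root keys on collision are hypothetical, and it does not use the CDK's transformation interface.

```python
import copy
from typing import Any, Dict, List


def flatten_field(record: Dict[str, Any], field_pointer: List[str]) -> Dict[str, Any]:
    """Return a copy of record with the object at field_pointer merged into the root."""
    result = copy.deepcopy(record)
    if not field_pointer:
        return result

    # Walk down to the parent of the targeted field.
    parent: Any = result
    for key in field_pointer[:-1]:
        parent = parent.get(key) if isinstance(parent, dict) else None
        if not isinstance(parent, dict):
            return result  # path does not exist: leave the record unchanged

    nested = parent.get(field_pointer[-1])
    if not isinstance(nested, dict):
        return result  # only nested objects are flattened in this sketch

    del parent[field_pointer[-1]]
    # Assumption: on a key collision, the nested value wins.
    result.update(nested)
    return result


flattened = flatten_field(
    {"key": 123, "activities": {"nested": "value", "another_nested": "value2"}},
    ["activities"],
)
```

Returning a copy keeps the sketch free of side effects; an in-place variant would fit an interface that mutates records instead.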
Hmm, I can agree that this is more about transformation rather than filtering, but
{
  "key": 123,
  "activities": [
    {"nested": "value"},
    {"another_nested": "value2"}
  ]
}
should be transformed to 2 records, as activities is a list:
[
  {
    "key": 123,
    "nested": "value"
  },
  {
    "key": 123,
    "another_nested": "value2"
  }
]
This means that we will have more records than in the original list, while the current implementation does not expect the number of records to change:
airbyte/airbyte-cdk/python/airbyte_cdk/sources/declarative/extractors/record_selector.py
Lines 88 to 97 in 4061f08
def _transform(
    self,
    records: List[Mapping[str, Any]],
    stream_state: StreamState,
    stream_slice: Optional[StreamSlice] = None,
) -> None:
    for record in records:
        for transformation in self.transformations:
            # record has type Mapping[str, Any], but Record expected
            transformation.transform(record, config=self.config, stream_state=stream_state, stream_slice=stream_slice)  # type: ignore
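The mismatch can be illustrated with a minimal sketch that contrasts a one-to-one, in-place step with a one-to-many step. The names add_source_in_place and expand_list_field are hypothetical and are not part of the CDK; the point is only that a step returning None cannot change the length of the list it iterates over.

```python
from typing import Any, Dict, Iterator, List


def add_source_in_place(record: Dict[str, Any]) -> None:
    """One-to-one step: mutates the record and returns nothing, as the quoted loop expects."""
    record["source"] = "mailchimp"


def expand_list_field(record: Dict[str, Any], list_field: str) -> Iterator[Dict[str, Any]]:
    """Hypothetical one-to-many step: yields one record per element of list_field."""
    parent = {key: value for key, value in record.items() if key != list_field}
    for item in record.get(list_field, []):
        yield {**parent, **item}


records: List[Dict[str, Any]] = [
    {"key": 123, "activities": [{"nested": "value"}, {"another_nested": "value2"}]}
]

# In-place transformations can change fields, but the list keeps its length.
for record in records:
    add_source_in_place(record)

# A one-to-many step has to build a new list, because the record count changes.
expanded = [row for record in records for row in expand_list_field(record, "activities")]
```

Supporting the second shape would mean the caller consumes an iterator or a new list from each transformation rather than relying on mutation.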
got it! I did misunderstand.
Can you make a proposal for how you'd want to use this feature? When considering new features or custom components, it's useful to have a description, examples, and a desired interface (when known) to help us understand the use case.
Then we can implement and make any required changes to the record transform interface
Sure, I'll get back with proposals as soon as this PR is in review; I'm still working on it.
airbyte-integrations/connectors/source-mailchimp/source_mailchimp/components.py
@dataclass
class MailChimpRequester(HttpRequester):
Can you describe why this is needed? It's difficult to judge whether it should truly be custom behavior without stating the desired behavior and why it cannot be supported natively by the CDK.
So, Mailchimp declares a custom API structure:
There are a few ways to find your data center. It’s the first part of the URL you see in the API keys section of your account; if the URL is https://us6.mailchimp.com/account/api/, then the data center subdomain is us6. It’s also appended to your API key in the form key-dc; if your API key is 0123456789abcdef0123456789abcde-us6, then the data center subdomain is us6. And finally, if you’re connecting via OAuth 2, you can find the data center associated with the token via the OAuth Metadata endpoint; for more information, see the OAuth guide.
The only reason for this custom component is to get the data center prefix for the OAuth authenticator.
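For the API key case, the key-dc rule quoted above can be sketched as follows. The helper names are hypothetical, and the URL template is an assumption about the Marketing API root rather than something stated in this thread; the OAuth lookup is left out because it requires a network call.

```python
def data_center_from_api_key(api_key: str) -> str:
    """Extract the data center suffix from a key in the form key-dc."""
    _, separator, data_center = api_key.rpartition("-")
    if not separator or not data_center:
        raise ValueError("API key does not contain a data center suffix")
    return data_center


def url_base_for(data_center: str) -> str:
    """Build the API root for a data center (URL template is an assumption)."""
    return f"https://{data_center}.api.mailchimp.com/3.0/"


# Example key taken from the documentation excerpt above.
data_center = data_center_from_api_key("0123456789abcdef0123456789abcde-us6")
url_base = url_base_for(data_center)
```

Raising on a key without a suffix surfaces a malformed credential early instead of producing a request URL with the whole key as the subdomain.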
Why is this a custom requester instead of something like a config migrator?
    ],
    ids=["API Key Invalid", "Forbidden", "Unknown Error"],
)
def test_check_connection_error(requests_mock, config, data_center, response, expected_message):
have we considered the tradeoffs between deleting the tests and updating the fixture to run them in the new implementation?
    data_center = self.config["credentials"]["apikey"].split("-").pop()
else:
    data_center = self.get_oauth_data_center(self.config["credentials"]["access_token"])
self.config["data_center"] = data_center
why are we modifying the config? As far as I know, this is an implementation detail. Why not remove the url_base field from the manifest?
Mainly for visibility, and to avoid overriding the get_url_base method from the base class.
I would agree with @girarda that it would be better to pass this as a parameter of __init__; otherwise we create an object that is stateful. That being said, I don't know how all of this fits into the initialization process of the custom components. The other solution I would propose is to not use custom components and use only interpolation.
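One way to avoid the stateful mutation, sketched under assumptions: resolve the data center once and return a new config mapping instead of writing into the existing one. The function names, the oauth_lookup callable, and the check for an "apikey" entry under "credentials" are hypothetical; the hunk above does not show the actual condition.

```python
from typing import Any, Callable, Dict, Mapping


def resolve_data_center(config: Mapping[str, Any], oauth_lookup: Callable[[str], str]) -> str:
    """Pick the data center from the API key suffix, or ask the OAuth lookup."""
    credentials = config["credentials"]
    # Assumption: the presence of "apikey" selects the API key branch.
    if "apikey" in credentials:
        return credentials["apikey"].split("-").pop()
    return oauth_lookup(credentials["access_token"])


def with_data_center(config: Mapping[str, Any], oauth_lookup: Callable[[str], str]) -> Dict[str, Any]:
    """Return a new config that includes data_center; the input is left untouched."""
    return {**config, "data_center": resolve_data_center(config, oauth_lookup)}


original = {"credentials": {"apikey": "0123456789abcdef0123456789abcde-us6"}}
# The lambda stands in for a real metadata request in this sketch.
updated = with_data_center(original, oauth_lookup=lambda token: "us1")
```

Passing the lookup in as a callable keeps the function testable without network access, and the caller decides where the resolved value is stored.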
Thanks for addressing all the comments and making sure there is as little impact as possible for the user!
        Args:
        - migrated_config (Mapping[str, Any]): The migrated configuration.
        """
        cls.message_repository.emit_message(create_connector_config_control_message(migrated_config))
This whole method can be replaced by print(create_connector_config_control_message(migrated_config).json(exclude_unset=True)), and we can remove the message_repository from this class.
Signed-off-by: Artem Inzhyyants <[email protected]>
…ilchimp-low-code-35064
@artem1205 I found this PR while investigating an issue with the current version of the Mailchimp connector, and noticed that this version replicates the same error, so I wanted to flag it for you while I work on testing a PR for the main connector. The issue is that the
You can see this issue here in your feature branch. Either of the following keys would be suitable:
Mailchimp also discourages the use of
This has been a longstanding issue in the Mailchimp source, but it only affects accounts with the same contacts on multiple lists, which may be why it's flown under the radar.
@NAjustin, thank you for pointing that out!
Small reminder to re-run poetry lock before merging to use the latest version of airbyte-cdk. Thanks!
…ilchimp-low-code-35064
# Conflicts:
# airbyte-integrations/connectors/source-mailchimp/metadata.yaml
# airbyte-integrations/connectors/source-mailchimp/poetry.lock
# airbyte-integrations/connectors/source-mailchimp/pyproject.toml
# docs/integrations/sources/mailchimp.md
What
Resolve https://github.com/airbytehq/airbyte-internal-issues/issues/2919
How
migrate to Low-Code
Recommended reading order
airbyte-integrations/connectors/source-mailchimp/source_mailchimp/manifest.yaml
airbyte-integrations/connectors/source-mailchimp/source_mailchimp/components.py
🚨 User Impact 🚨
Pre-merge Actions
Updating a connector
Community member or Airbyter
Airbyter
If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.