-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🚨🚨 Source Mailchimp: Migrate to Low code #35281
Changes from 3 commits
c9d43a6
83736f2
3a96f34
9d79879
200d282
9b76729
b8173f1
568d83c
7eafce1
7419188
321ea38
1b4ca32
90b9606
24b1511
0cb2273
2c25d95
536c018
ccd2305
b08c4b7
d8ef2fc
b2ccb38
0d62d00
9953738
8cd2760
7534433
97a8f96
0e9cdf1
6594d4a
55563a8
89098c3
d020651
8641b82
faf5f79
7837b76
e6fc6d9
0fbd2ae
591e852
425be4a
1e369f9
9a71be0
f4c9591
8f69b55
0fbcb0f
ef3e0ec
f8c19f9
23d305e
b89ba8d
8831889
e38201a
20a83a1
966f90b
2aeb0f4
944666b
cfdc451
c54ea4e
057e519
01b2213
e1be9a7
b6d84d3
7d5eb7f
7732a44
77369ee
bd956e7
0dcc86c
d992ec3
f703e00
82526f7
dbe6cf2
f087be1
6e92e8e
cc9299a
0124f5e
ba51d7f
bd79818
2550ba7
c6953ab
38dd9a4
08be185
f903771
c556708
df2ee5b
ffd2ca3
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
Original file line number | Diff line number | Diff line change | ||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
@@ -0,0 +1,117 @@ | ||||||||||||||||||||||
# Copyright (c) 2023 Airbyte, Inc., all rights reserved. | ||||||||||||||||||||||
|
||||||||||||||||||||||
from dataclasses import dataclass | ||||||||||||||||||||||
from typing import Any, List, Mapping, Optional | ||||||||||||||||||||||
|
||||||||||||||||||||||
import pendulum | ||||||||||||||||||||||
import requests | ||||||||||||||||||||||
from airbyte_cdk.sources.declarative.auth.declarative_authenticator import DeclarativeAuthenticator | ||||||||||||||||||||||
from airbyte_cdk.sources.declarative.auth.token import BasicHttpAuthenticator, BearerAuthenticator | ||||||||||||||||||||||
from airbyte_cdk.sources.declarative.extractors.record_filter import RecordFilter | ||||||||||||||||||||||
from airbyte_cdk.sources.declarative.requesters.http_requester import HttpRequester | ||||||||||||||||||||||
from airbyte_cdk.sources.declarative.requesters.request_options.interpolated_request_options_provider import ( | ||||||||||||||||||||||
InterpolatedRequestOptionsProvider, | ||||||||||||||||||||||
RequestInput, | ||||||||||||||||||||||
) | ||||||||||||||||||||||
from airbyte_cdk.sources.declarative.types import StreamSlice, StreamState | ||||||||||||||||||||||
|
||||||||||||||||||||||
|
||||||||||||||||||||||
@dataclass | ||||||||||||||||||||||
class MailChimpRequester(HttpRequester): | ||||||||||||||||||||||
|
||||||||||||||||||||||
request_body_json: Optional[RequestInput] = None | ||||||||||||||||||||||
request_headers: Optional[RequestInput] = None | ||||||||||||||||||||||
request_parameters: Optional[RequestInput] = None | ||||||||||||||||||||||
request_body_data: Optional[RequestInput] = None | ||||||||||||||||||||||
|
||||||||||||||||||||||
def __post_init__(self, parameters: Mapping[str, Any]) -> None: | ||||||||||||||||||||||
|
||||||||||||||||||||||
self.request_options_provider = InterpolatedRequestOptionsProvider( | ||||||||||||||||||||||
request_body_data=self.request_body_data, | ||||||||||||||||||||||
request_body_json=self.request_body_json, | ||||||||||||||||||||||
request_headers=self.request_headers, | ||||||||||||||||||||||
request_parameters=self.request_parameters, | ||||||||||||||||||||||
config=self.config, | ||||||||||||||||||||||
parameters=parameters or {}, | ||||||||||||||||||||||
) | ||||||||||||||||||||||
super().__post_init__(parameters) | ||||||||||||||||||||||
|
||||||||||||||||||||||
def get_url_base(self) -> str: | ||||||||||||||||||||||
self.get_data_center_location() | ||||||||||||||||||||||
return super().get_url_base() | ||||||||||||||||||||||
|
||||||||||||||||||||||
def get_data_center_location(self): | ||||||||||||||||||||||
if not self.config.get("data_center"): | ||||||||||||||||||||||
if isinstance(self.authenticator, BasicHttpAuthenticator): | ||||||||||||||||||||||
data_center = self.config["credentials"]["apikey"].split("-").pop() | ||||||||||||||||||||||
else: | ||||||||||||||||||||||
data_center = self.get_oauth_data_center(self.config["credentials"]["access_token"]) | ||||||||||||||||||||||
self.config["data_center"] = data_center | ||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. why are we modifying the config? As far as I know, this is an implementation detail. Why not remove the url_base field from the manifest? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Mainly for There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would agree with @girarda that it would be better to pass this as a parameter of |
||||||||||||||||||||||
|
||||||||||||||||||||||
@staticmethod | ||||||||||||||||||||||
def get_oauth_data_center(access_token: str) -> str: | ||||||||||||||||||||||
""" | ||||||||||||||||||||||
Every Mailchimp API request must be sent to a specific data center. | ||||||||||||||||||||||
The data center is already embedded in API keys, but not OAuth access tokens. | ||||||||||||||||||||||
This method retrieves the data center for OAuth credentials. | ||||||||||||||||||||||
""" | ||||||||||||||||||||||
try: | ||||||||||||||||||||||
response = requests.get( | ||||||||||||||||||||||
"https://login.mailchimp.com/oauth2/metadata", headers={"Authorization": "OAuth {}".format(access_token)} | ||||||||||||||||||||||
) | ||||||||||||||||||||||
|
||||||||||||||||||||||
# Requests to this endpoint will return a 200 status code even if the access token is invalid. | ||||||||||||||||||||||
error = response.json().get("error") | ||||||||||||||||||||||
if error == "invalid_token": | ||||||||||||||||||||||
raise ValueError("The access token you provided was invalid. Please check your credentials and try again.") | ||||||||||||||||||||||
return response.json()["dc"] | ||||||||||||||||||||||
|
||||||||||||||||||||||
# Handle any other exceptions that may occur. | ||||||||||||||||||||||
except Exception as e: | ||||||||||||||||||||||
raise Exception(f"An error occured while retrieving the data center for your account. \n {repr(e)}") | ||||||||||||||||||||||
|
||||||||||||||||||||||
|
||||||||||||||||||||||
class MailChimpAuthenticator(DeclarativeAuthenticator): | ||||||||||||||||||||||
artem1205 marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||||||||||||||
config: Mapping[str, Any] | ||||||||||||||||||||||
bearer: BearerAuthenticator | ||||||||||||||||||||||
basic: BasicHttpAuthenticator | ||||||||||||||||||||||
|
||||||||||||||||||||||
def __new__(cls, bearer, basic, config, *args, **kwargs): | ||||||||||||||||||||||
if config.get("credentials", {}).get("auth_type") == "oauth2.0": | ||||||||||||||||||||||
return bearer | ||||||||||||||||||||||
else: | ||||||||||||||||||||||
return basic | ||||||||||||||||||||||
|
||||||||||||||||||||||
|
||||||||||||||||||||||
class MailChimpRecordFilter(RecordFilter): | ||||||||||||||||||||||
""" | ||||||||||||||||||||||
Filter applied on a list of Records. | ||||||||||||||||||||||
""" | ||||||||||||||||||||||
|
||||||||||||||||||||||
def filter_records( | ||||||||||||||||||||||
self, | ||||||||||||||||||||||
records: List[Mapping[str, Any]], | ||||||||||||||||||||||
stream_state: StreamState, | ||||||||||||||||||||||
stream_slice: Optional[StreamSlice] = None, | ||||||||||||||||||||||
next_page_token: Optional[Mapping[str, Any]] = None, | ||||||||||||||||||||||
) -> List[Mapping[str, Any]]: | ||||||||||||||||||||||
current_state = [x for x in stream_state.get("states", []) if x["partition"]["id"] == stream_slice.partition["id"]] | ||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Note to @girarda: semi-incremental needs to work with per partition states |
||||||||||||||||||||||
# TODO: REF what to do if no start_date mentioned (see manifest) | ||||||||||||||||||||||
maxi297 marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||||||||||||||
# implement the same logic | ||||||||||||||||||||||
start_date = self.config.get("start_date", (pendulum.now() - pendulum.duration(days=700)).to_iso8601_string()) | ||||||||||||||||||||||
if current_state and start_date: | ||||||||||||||||||||||
filter_value = max(start_date, current_state[0]["cursor"][self.parameters["cursor_field"]]) | ||||||||||||||||||||||
return [record for record in records if record[self.parameters["cursor_field"]] > filter_value] | ||||||||||||||||||||||
return records | ||||||||||||||||||||||
|
||||||||||||||||||||||
|
||||||||||||||||||||||
class MailChimpRecordFilterEmailActivity(RecordFilter): | ||||||||||||||||||||||
def filter_records( | ||||||||||||||||||||||
self, | ||||||||||||||||||||||
records: List[Mapping[str, Any]], | ||||||||||||||||||||||
stream_state: StreamState, | ||||||||||||||||||||||
stream_slice: Optional[StreamSlice] = None, | ||||||||||||||||||||||
next_page_token: Optional[Mapping[str, Any]] = None, | ||||||||||||||||||||||
) -> List[Mapping[str, Any]]: | ||||||||||||||||||||||
|
||||||||||||||||||||||
return [{**record, **activity_item} for record in records for activity_item in record.pop("activity", [])] | ||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Question so I learn: looks like this filter takes a dict of record and returns a separate row for each record and each activity in that record. If there are no activities at all, the record will be filtered out. If there are multiple activities on a single record (if that is possible), that will return multiple row for each found activity item. How wrong am I? If that's correct, than this thing both filters records, but also transforms them and generates them, right? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You're generally right: this code may be written as follows: data = response_json.get(self.data_field, [])
for item in data:
for activity_item in item.pop("activity", []):
yield {**item, **activity_item}
Yep. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. why is this defined as being a filter? Like Natik mentioned, I would expect this to be a custom transformation followed by a filter that checks if there were activities. Separating the two concerns allows us to extract some logic that can be generalized. Instead of creating a custom filter, we should introduce a flatten_field transformation which takes a nested object and brings it to the root The transformation:
would then be described as
Here is a tentative schema definition:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hmm, I can agree that this is more about {
"key": 123,
"activities": [
{"nested": "value"},
{"another_nested": "value2"}]
}
should be transformed to 2 records as [
{
"key": 123,
"nested": "value"
},
{
"key": 123,
"another_nested": "value2"
}
] it means that we will have more records than original list, while current implementation does not expect the number of records to change: airbyte/airbyte-cdk/python/airbyte_cdk/sources/declarative/extractors/record_selector.py Lines 88 to 97 in 4061f08
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. got it! I did misunderstand. Can you make a proposal for how you'd want to use this feature? When considering new features or custom components, it's useful to have a description, examples, and a desired interface (when known) to help us understand the use case. Then we can implement and make any required changes to the record transform interface There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sure, I'll get back with all propositions as soon as this PR will be in review, I'm still working on it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you describe why this is needed? it's difficult to judge whether is should truly be custom behavior without stating the desired behavior and why it cannot be supported natively by the CDK
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, mailchimp declares custom api structure
The only reason for this custom component is to get the
data center
prefix for Oauth authenticator.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this a custom requester instead of something like a config migrator?