[improve][pulsar-io] Added support for generic record and raw JSON string schemas to CassandraSink #16179

Open
wants to merge 7 commits into
base: master

david-streamlio
Contributor

Motivation

The current implementation of the Cassandra Sink connector supports only a single schema type (key, string), which limits its usefulness in production. I modified the code so that it can support any schema type in Cassandra.

Modifications

Added classes that interrogate the database to determine the schema type at runtime. I also added a framework that extracts the values from the supported incoming schema types (GenericRecord and String) using the table metadata.
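As a rough illustration of the metadata-driven extraction described above, the sketch below orders a record's values by the table's column list and builds a matching parameterized insert. It is not the PR's code: `ColumnValueExtractor`, `valuesInColumnOrder`, and `insertStatement` are hypothetical names, and a plain `Map` stands in for both a GenericRecord's fields and a parsed JSON string.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: these names are assumptions, not classes from the PR.
final class ColumnValueExtractor {

    private ColumnValueExtractor() {
    }

    /**
     * Returns the record's values in the order of the table's columns.
     * A column with no matching field yields null; extra fields are ignored.
     */
    static List<Object> valuesInColumnOrder(List<String> columnNames,
                                            Map<String, Object> recordFields) {
        List<Object> values = new ArrayList<>(columnNames.size());
        for (String column : columnNames) {
            values.add(recordFields.get(column));
        }
        return values;
    }

    /** Builds a parameterized insert with one "?" placeholder per column. */
    static String insertStatement(String table, List<String> columnNames) {
        String columns = String.join(", ", columnNames);
        String placeholders =
                String.join(", ", Collections.nCopies(columnNames.size(), "?"));
        return "INSERT INTO " + table + " (" + columns + ") VALUES (" + placeholders + ")";
    }
}
```

Driving both the statement and the value order from the same column list keeps the two in step when the table definition changes, which is presumably the point of reading the metadata at runtime.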

Verifying this change

  • [x] Make sure that the change passes the CI checks.

(Please pick either of the following options)

This change added tests and can be verified as follows:

Added integration tests for testing against a Cassandra database

Does this pull request potentially affect one of the following parts:

If yes was chosen, please highlight the changes

  • Dependencies (does it add or upgrade a dependency): (yes)
  • The public API: (no)
  • The schema: (yes)
  • The default values of configurations: (no)
  • The wire protocol: (no)
  • The rest endpoints: (no)
  • The admin cli options: (no)
  • Anything that affects deployment: (no)

Documentation

Check the box below or label this PR directly.

Need to update docs?

  • [x] doc-required
    (Your PR needs to update docs and you will update later)

@github-actions github-actions bot added the doc-required label (Your PR changes impact docs and you will update later.) Jun 22, 2022
@david-streamlio
Contributor Author

@tspannhw can you review and upvote?

@github-actions

The PR had no activity for 30 days; marked with the Stale label.

@github-actions github-actions bot added the Stale label Aug 21, 2022
@david-streamlio
Contributor Author

/pulsarbot run-failure-checks

@david-streamlio
Contributor Author

/pulsarbot ready-to-test

Contributor

@eolivelli eolivelli left a comment

Great work, thanks

I left some comments, please take a look

<dependency>
<groupId>org.apache.pulsar</groupId>
<artifactId>pulsar-functions-local-runner-original</artifactId>
<version>2.11.0-SNAPSHOT</version>
Contributor

Please use ${project.version} instead of a hardcoded version.

public class CassandraConnector {

private Cluster cluster;
private Session session;
Contributor

We have to handle concurrent access to these fields properly.
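One possible way to address this, sketched under assumptions: guard the lazily opened handle and its closed flag with a single lock so that opening and closing cannot interleave. `GuardedConnection` is a hypothetical stand-in, the type parameter `T` plays the role of a driver handle such as the session, and the opener and closer callbacks are placeholders; none of this is the PR's actual fix.

```java
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical sketch: T stands in for a driver handle (for example a session).
final class GuardedConnection<T> {

    private final Supplier<T> opener;
    private final Consumer<T> closer;
    private T resource;      // guarded by "this"
    private boolean closed;  // guarded by "this"

    GuardedConnection(Supplier<T> opener, Consumer<T> closer) {
        this.opener = opener;
        this.closer = closer;
    }

    /** Opens the resource on first use; later callers get the same instance. */
    synchronized T get() {
        if (closed) {
            throw new IllegalStateException("connection already closed");
        }
        if (resource == null) {
            resource = opener.get();
        }
        return resource;
    }

    /** Closes the resource at most once; further calls are no-ops. */
    synchronized void close() {
        closed = true;
        if (resource != null) {
            T toClose = resource;
            resource = null;
            closer.accept(toClose);
        }
    }
}
```

Synchronizing both methods on the same monitor is the simplest option; whether it is adequate depends on how often the sink calls `get()` on its hot path.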


public StringRecordWrapper(String jsonString) {
super(jsonString);
valuesMap = new Gson().fromJson(jsonString,
Contributor

I think that we can cache this "new Gson()" instance in a static field


package org.apache.pulsar.io.cassandra.util;

import com.datastax.driver.core.*;
Contributor

please do not use star imports

}

private static final void sendData() throws InterruptedException {
TimeUnit.SECONDS.sleep(10);
Contributor

Do we need to sleep for a fixed amount of time?

Contributor Author

The only reason I sleep for 10 seconds is so that I can easily watch the data as it is published to the system. The delay gives me time to run a query against Cassandra and see the new records.
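If the wait were ever needed for automated verification rather than manual inspection, a bounded poll is one common alternative to a fixed sleep. The helper below is a hypothetical sketch using only the JDK; `Poll` and `waitUntil` are assumed names and are not part of the PR.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical test helper: re-checks a condition instead of sleeping a fixed time.
final class Poll {

    private Poll() {
    }

    /**
     * Re-checks the condition every pollInterval until it holds or the timeout elapses.
     * Returns true if the condition was observed to hold, false on timeout or interrupt.
     */
    static boolean waitUntil(BooleanSupplier condition, Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false;
            }
            // Sleep for the poll interval, but never past the deadline and never less than 1 ms.
            long sleepMs = Math.max(1L, Math.min(pollInterval.toMillis(), remainingNanos / 1_000_000L));
            try {
                Thread.sleep(sleepMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

A test could then call something like `Poll.waitUntil(() -> countRows() >= expected, Duration.ofSeconds(10), Duration.ofMillis(200))`, where `countRows()` is a placeholder for a query against Cassandra. The test finishes as soon as the rows appear and still fails within a bounded time if they never do.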

@codecov-commenter

codecov-commenter commented Oct 13, 2022

Codecov Report

Merging #16179 (21cedba) into master (1b5722d) will decrease coverage by 0.04%.
The diff coverage is n/a.

@@             Coverage Diff              @@
##             master   #16179      +/-   ##
============================================
- Coverage     46.34%   46.29%   -0.05%     
- Complexity    10394    10420      +26     
============================================
  Files           703      703              
  Lines         68838    68858      +20     
  Branches       7379     7383       +4     
============================================
- Hits          31905    31880      -25     
- Misses        33324    33375      +51     
+ Partials       3609     3603       -6     
| Flag | Coverage Δ |
|---|---|
| unittests | 46.29% <ø> (-0.05%) ⬇️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
|---|---|
| ...java/org/apache/pulsar/proxy/stats/TopicStats.java | 58.82% <0.00%> (-41.18%) ⬇️ |
| ...lsar/broker/loadbalance/impl/ThresholdShedder.java | 3.27% <0.00%> (-27.87%) ⬇️ |
| .../apache/pulsar/broker/loadbalance/LoadManager.java | 61.11% <0.00%> (-16.67%) ⬇️ |
| ...rg/apache/pulsar/broker/lookup/v1/TopicLookup.java | 60.00% <0.00%> (-13.34%) ⬇️ |
| ...roker/service/persistent/MessageDeduplication.java | 43.23% <0.00%> (-10.49%) ⬇️ |
| ...org/apache/pulsar/broker/loadbalance/LoadData.java | 58.33% <0.00%> (-8.34%) ⬇️ |
| ...he/pulsar/client/impl/PartitionedProducerImpl.java | 30.34% <0.00%> (-5.13%) ⬇️ |
| .../apache/pulsar/client/impl/BatchMessageIdImpl.java | 67.50% <0.00%> (-4.73%) ⬇️ |
| ...tent/PersistentDispatcherSingleActiveConsumer.java | 55.17% <0.00%> (-4.71%) ⬇️ |
| ...pulsar/broker/service/PulsarCommandSenderImpl.java | 73.84% <0.00%> (-4.62%) ⬇️ |
| ... and 64 more | |

@david-streamlio
Contributor Author

/pulsarbot run-failure-checks

@david-streamlio
Contributor Author

@eolivelli I have made the requested changes. Can you PTAL when you get a chance? Thanks!

@david-streamlio
Contributor Author

@eolivelli, can you please take a look at this when you get the chance? Thanks again!

@david-streamlio
Contributor Author

@eolivelli Can I please get some feedback on the changes I made in response to your initial feedback? Thanks again for the review, I really appreciate it.

@tisonkun tisonkun requested a review from eolivelli December 11, 2022 01:10
Signed-off-by: tison <[email protected]>
Member

@tisonkun tisonkun left a comment

@david-streamlio Please fix the compilation failure. It seems that you use JUnit, while Pulsar uses TestNG as the test platform.

I've pushed a merge commit and a license header fix to your remote branch, so remember to git pull before working on it.

BTW, please try to expand all star imports.

@david-streamlio
Contributor Author

/pulsarbot run-failure-checks

@david-streamlio
Contributor Author

@tisonkun @eolivelli I would appreciate another review when you have the time.

Contributor

@eolivelli eolivelli left a comment

+1

@Technoboy- Technoboy- added this to the 3.2.0 milestone Jul 31, 2023
@Technoboy- Technoboy- modified the milestones: 3.2.0, 3.3.0 Dec 22, 2023
@coderzc coderzc modified the milestones: 3.3.0, 3.4.0 May 8, 2024
@lhotari lhotari modified the milestones: 4.0.0, 4.1.0 Oct 14, 2024
Labels
area/function, doc-required (Your PR changes impact docs and you will update later.), ready-to-test, release/4.0.4
8 participants