
snowplow-collector-scala does not use IAM Role for Service accounts in a container #186

Open
brettcave opened this issue Nov 10, 2021 · 7 comments



brettcave commented Nov 10, 2021

When using snowplow stream collector scala (version 2.4.1) in kubernetes in AWS, the authentication does not work as expected.

I have tried the following steps to get it working:

  1. Configure IRSA - https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html - the OIDC provider is set up, the IAM role + policy are created, and a ServiceAccount is created.
  2. Test that IRSA is working - I have swapped out the snowplow/scala-stream-collector-kinesis for my own container image that has AWS CLI installed, and am able to validate that I am assuming the role correctly (aws sts get-caller-identity)
  3. Set up kinesis and add a configmap definition.
  4. Deploy the collector
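
For reference, a minimal sketch of the Kubernetes objects involved in steps 1 and 4 (all names, the namespace, and the role ARN here are placeholders, not taken from this issue):

```yaml
# ServiceAccount annotated with the IAM role to assume via IRSA
apiVersion: v1
kind: ServiceAccount
metadata:
  name: snowplow-collector
  namespace: snowplow
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::<account_id>:role/my-sp-collector-role
---
# The pod spec must reference the ServiceAccount by name, otherwise the
# EKS webhook never injects AWS_ROLE_ARN / AWS_WEB_IDENTITY_TOKEN_FILE.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: snowplow-collector
  namespace: snowplow
spec:
  selector:
    matchLabels: { app: snowplow-collector }
  template:
    metadata:
      labels: { app: snowplow-collector }
    spec:
      serviceAccountName: snowplow-collector
      containers:
        - name: collector
          image: snowplow/scala-stream-collector-kinesis:2.4.1
```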

After deploying the collector, I see the following errors:

com.amazonaws.services.kinesis.model.AmazonKinesisException: User: arn:aws:sts::REDACTED:assumed-role/eks-node-group-role/i-INSTANCEID is not authorized to perform: kinesis:DescribeStream on resource: arn:aws:kinesis:<region>:<account_id>:stream/<good_stream> because no identity-based policy allows the kinesis:DescribeStream action (Service: AmazonKinesis; Status Code: 400; Error Code: AccessDeniedException;)

So I can see that the role being assumed by the service is the IAM Instance Profile of the underlying node (which is restricted), and not the IAM Role for Service account.

I have tried variations on the aws snippet in the configmap:

          aws {
              accessKey = default
              secretKey = default
          }

and

          aws {
              accessKey = iam
              secretKey = iam
          }

However, based on https://github.com/snowplow/stream-collector/blob/master/kinesis/src/main/scala/com.snowplowanalytics.snowplow.collectors.scalastream/sinks/KinesisSink.scala#L432, I believe the first is the one that should be used, as it triggers the DefaultAWSCredentialsProviderChain(), and with the SDK being a version that supports IRSA, it should pick up the Web Identity token at a higher priority (3) than the EC2 instance profile (6). https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/credentials.html

But for some reason it is not being picked up, and I'm not sure why.

I have also tried adding the following to the pod security context, as I have seen issues with accessing tokens before:

 fsGroup: 1000

However, I don't think this is related, because the container I used for testing was able to access the token and assume the IRSA role by default, and not the IAM instance profile of the node.

edit

from the scala collector container:

$ printenv | grep AWS
...
AWS_ROLE_ARN=arn:aws:iam:::role/my-sp-collector-role
AWS_WEB_IDENTITY_TOKEN_FILE=/var/run/secrets/eks.amazonaws.com/serviceaccount/token
$ whoami
daemon
$ ls -lh /var/run/secrets/eks.amazonaws.com/serviceaccount/..data/token  # where the token above softlinks to
-rw-r----- 1 root daemon 1.1K Nov 10 03:44 /run/secrets/eks.amazonaws.com/serviceaccount/..data/token
$ cat /run/secrets/eks.amazonaws.com/serviceaccount/token 
<valid token is shown>

I have tried some variations on the security contexts, e.g. to map the token group

podSecurityContext:
  fsGroup: 1
securityContext:
  runAsGroup: 1
  runAsUser: 1

The only thing I can think of is that the volume mount for the token/secret completes after the service starts; I'm not sure if this is possible, or what the best way to validate it would be.


caleb15 commented Apr 12, 2022

Same issue with the snowplow/scala-stream-collector-kinesis:2.5.0 Docker image. I'm getting the error regardless of whether I specify iam or default. Maybe Snowplow is just using an old version of the AWS SDK? This should be fixed in version 2.10.11 of the Java SDK v2, but I'm not sure which version they are using.

The only thing i can think of possibly is that the volume mount for the token / secret completes after the service starts

I don't think that's the cause. I tested by deploying the container with an overridden entrypoint:

          command: ["sleep"]
          args: ['3600']

I exec'd into the pod, made sure the env vars were set properly, and manually executed the command, yet I got the same issue.


caleb15 commented Apr 12, 2022

Did some more investigation and found out that this issue should have been fixed already by #170. I double-checked and confirmed that the sts jar file is in the docker image and it's specified in the classpath too. I also confirmed it has the necessary SDK version (1.12.128). It's really weird that this issue is still happening, @istreeter @mmathias01 any ideas?

Logs:
daemon@snowplow-7978d84d7b-gjn7m:/opt/snowplow$ ls lib
com.amazonaws.aws-java-sdk-core-1.12.128.jar			     com.snowplowanalytics.snowplow-badrows_2.12-2.1.1.jar		     javax.annotation.javax.annotation-api-1.3.2.jar
com.amazonaws.aws-java-sdk-kinesis-1.12.128.jar			     com.snowplowanalytics.snowplow-scala-analytics-sdk_2.12-2.1.0.jar	     joda-time.joda-time-2.10.13.jar
com.amazonaws.aws-java-sdk-sqs-1.12.128.jar			     com.snowplowanalytics.snowplow-scala-tracker-core_2.12-1.0.0.jar	     org.apache.httpcomponents.httpclient-4.5.13.jar
com.amazonaws.aws-java-sdk-sts-1.12.128.jar			     com.snowplowanalytics.snowplow-scala-tracker-emitter-id_2.12-1.0.0.jar  org.apache.httpcomponents.httpcore-4.4.13.jar
com.amazonaws.jmespath-java-1.12.128.jar			     com.snowplowanalytics.snowplow-stream-collector-core-2.5.0.jar	     org.apache.thrift.libthrift-0.15.0.jar
com.chuusai.shapeless_2.12-2.3.7.jar				     com.snowplowanalytics.snowplow-stream-collector-kinesis-2.5.0.jar	     org.reactivestreams.reactive-streams-1.0.3.jar
com.fasterxml.jackson.core.jackson-annotations-2.12.3.jar	     com.typesafe.akka.akka-actor_2.12-2.6.16.jar			     org.scalaj.scalaj-http_2.12-2.4.2.jar
com.fasterxml.jackson.core.jackson-core-2.12.3.jar		     com.typesafe.akka.akka-http_2.12-10.2.7.jar			     org.scala-lang.modules.scala-java8-compat_2.12-0.8.0.jar
com.fasterxml.jackson.core.jackson-databind-2.12.3.jar		     com.typesafe.akka.akka-http-core_2.12-10.2.7.jar			     org.scala-lang.modules.scala-parser-combinators_2.12-1.1.2.jar
com.fasterxml.jackson.dataformat.jackson-dataformat-cbor-2.12.3.jar  com.typesafe.akka.akka-parsing_2.12-10.2.7.jar			     org.scala-lang.scala-library-2.12.10.jar
com.github.pureconfig.pureconfig_2.12-0.15.0.jar		     com.typesafe.akka.akka-slf4j_2.12-2.6.16.jar			     org.slf4j.log4j-over-slf4j-1.7.32.jar
com.github.pureconfig.pureconfig-core_2.12-0.15.0.jar		     com.typesafe.akka.akka-stream_2.12-2.6.16.jar			     org.slf4j.slf4j-api-1.7.32.jar
com.github.pureconfig.pureconfig-generic_2.12-0.15.0.jar	     com.typesafe.config-1.4.1.jar					     org.slf4j.slf4j-simple-1.7.32.jar
com.github.pureconfig.pureconfig-generic-base_2.12-0.15.0.jar	     com.typesafe.ssl-config-core_2.12-0.4.2.jar			     org.typelevel.cats-core_2.12-2.6.1.jar
com.github.scopt.scopt_2.12-4.0.1.jar				     io.circe.circe-core_2.12-0.14.1.jar				     org.typelevel.cats-effect_2.12-2.2.0.jar
commons-codec.commons-codec-1.15.jar				     io.circe.circe-generic_2.12-0.14.1.jar				     org.typelevel.cats-kernel_2.12-2.6.1.jar
commons-logging.commons-logging-1.2.jar				     io.circe.circe-jawn_2.12-0.13.0.jar				     org.typelevel.jawn-parser_2.12-1.0.0.jar
com.snowplowanalytics.collector-payload-1-0.0.0.jar		     io.circe.circe-numbers_2.12-0.14.1.jar				     org.typelevel.simulacrum-scalafix-annotations_2.12-0.5.4.jar
com.snowplowanalytics.iglu-core_2.12-1.0.0.jar			     io.circe.circe-parser_2.12-0.13.0.jar				     software.amazon.ion.ion-java-1.0.2.jar
com.snowplowanalytics.iglu-core-circe_2.12-1.0.0.jar		     io.prometheus.simpleclient-0.9.0.jar
com.snowplowanalytics.iglu-scala-client_2.12-1.1.1.jar		     io.prometheus.simpleclient_common-0.9.0.jar

daemon@snowplow-7978d84d7b-gjn7m:/opt/snowplow$ bin/snowplow-stream-collector  --config /etc/conf/collector-conf -v
# Executing command line:
/opt/java/openjdk/bin/java
-cp
/opt/snowplow/lib/com.snowplowanalytics.snowplow-stream-collector-kinesis-2.5.0.jar:/opt/snowplow/lib/com.snowplowanalytics.snowplow-stream-collector-core-2.5.0.jar:/opt/snowplow/lib/org.scala-lang.scala-library-2.12.10.jar:/opt/snowplow/lib/org.apache.thrift.libthrift-0.15.0.jar:/opt/snowplow/lib/joda-time.joda-time-2.10.13.jar:/opt/snowplow/lib/org.slf4j.slf4j-simple-1.7.32.jar:/opt/snowplow/lib/org.slf4j.log4j-over-slf4j-1.7.32.jar:/opt/snowplow/lib/com.typesafe.config-1.4.1.jar:/opt/snowplow/lib/io.prometheus.simpleclient-0.9.0.jar:/opt/snowplow/lib/io.prometheus.simpleclient_common-0.9.0.jar:/opt/snowplow/lib/com.github.scopt.scopt_2.12-4.0.1.jar:/opt/snowplow/lib/com.typesafe.akka.akka-stream_2.12-2.6.16.jar:/opt/snowplow/lib/com.typesafe.akka.akka-http_2.12-10.2.7.jar:/opt/snowplow/lib/com.typesafe.akka.akka-slf4j_2.12-2.6.16.jar:/opt/snowplow/lib/com.snowplowanalytics.snowplow-badrows_2.12-2.1.1.jar:/opt/snowplow/lib/com.snowplowanalytics.collector-payload-1-0.0.0.jar:/opt/snowplow/lib/com.github.pureconfig.pureconfig_2.12-0.15.0.jar:/opt/snowplow/lib/com.snowplowanalytics.snowplow-scala-tracker-core_2.12-1.0.0.jar:/opt/snowplow/lib/com.snowplowanalytics.snowplow-scala-tracker-emitter-id_2.12-1.0.0.jar:/opt/snowplow/lib/com.amazonaws.aws-java-sdk-kinesis-1.12.128.jar:/opt/snowplow/lib/com.amazonaws.aws-java-sdk-sts-1.12.128.jar:/opt/snowplow/lib/com.fasterxml.jackson.dataformat.jackson-dataformat-cbor-2.12.3.jar:/opt/snowplow/lib/com.amazonaws.aws-java-sdk-sqs-1.12.128.jar:/opt/snowplow/lib/org.slf4j.slf4j-api-1.7.32.jar:/opt/snowplow/lib/org.apache.httpcomponents.httpclient-4.5.13.jar:/opt/snowplow/lib/org.apache.httpcomponents.httpcore-4.4.13.jar:/opt/snowplow/lib/javax.annotation.javax.annotation-api-1.3.2.jar:/opt/snowplow/lib/com.typesafe.akka.akka-actor_2.12-2.6.16.jar:/opt/snowplow/lib/org.reactivestreams.reactive-streams-1.0.3.jar:/opt/snowplow/lib/com.typesafe.ssl-config-core_2.12-0.4.2.jar:/opt/snowplow/lib/com.typesafe.akka.akka-http-core_2.12-10.
2.7.jar:/opt/snowplow/lib/org.typelevel.cats-core_2.12-2.6.1.jar:/opt/snowplow/lib/io.circe.circe-generic_2.12-0.14.1.jar:/opt/snowplow/lib/com.snowplowanalytics.iglu-scala-client_2.12-1.1.1.jar:/opt/snowplow/lib/com.snowplowanalytics.snowplow-scala-analytics-sdk_2.12-2.1.0.jar:/opt/snowplow/lib/com.github.pureconfig.pureconfig-core_2.12-0.15.0.jar:/opt/snowplow/lib/com.github.pureconfig.pureconfig-generic_2.12-0.15.0.jar:/opt/snowplow/lib/com.snowplowanalytics.iglu-core_2.12-1.0.0.jar:/opt/snowplow/lib/io.circe.circe-parser_2.12-0.13.0.jar:/opt/snowplow/lib/com.snowplowanalytics.iglu-core-circe_2.12-1.0.0.jar:/opt/snowplow/lib/org.typelevel.cats-effect_2.12-2.2.0.jar:/opt/snowplow/lib/org.scalaj.scalaj-http_2.12-2.4.2.jar:/opt/snowplow/lib/com.amazonaws.aws-java-sdk-core-1.12.128.jar:/opt/snowplow/lib/com.amazonaws.jmespath-java-1.12.128.jar:/opt/snowplow/lib/com.fasterxml.jackson.core.jackson-databind-2.12.3.jar:/opt/snowplow/lib/com.fasterxml.jackson.core.jackson-core-2.12.3.jar:/opt/snowplow/lib/commons-logging.commons-logging-1.2.jar:/opt/snowplow/lib/commons-codec.commons-codec-1.15.jar:/opt/snowplow/lib/org.scala-lang.modules.scala-java8-compat_2.12-0.8.0.jar:/opt/snowplow/lib/org.scala-lang.modules.scala-parser-combinators_2.12-1.1.2.jar:/opt/snowplow/lib/com.typesafe.akka.akka-parsing_2.12-10.2.7.jar:/opt/snowplow/lib/org.typelevel.cats-kernel_2.12-2.6.1.jar:/opt/snowplow/lib/org.typelevel.simulacrum-scalafix-annotations_2.12-0.5.4.jar:/opt/snowplow/lib/io.circe.circe-core_2.12-0.14.1.jar:/opt/snowplow/lib/com.chuusai.shapeless_2.12-2.3.7.jar:/opt/snowplow/lib/com.github.pureconfig.pureconfig-generic-base_2.12-0.15.0.jar:/opt/snowplow/lib/io.circe.circe-jawn_2.12-0.13.0.jar:/opt/snowplow/lib/software.amazon.ion.ion-java-1.0.2.jar:/opt/snowplow/lib/com.fasterxml.jackson.core.jackson-annotations-2.12.3.jar:/opt/snowplow/lib/io.circe.circe-numbers_2.12-0.14.1.jar:/opt/snowplow/lib/org.typelevel.jawn-parser_2.12-1.0.0.jar
com.snowplowanalytics.snowplow.collectors.scalastream.KinesisCollector
--config
/etc/conf/collector-conf

[main] INFO com.snowplowanalytics.snowplow.collectors.scalastream.KinesisCollector$ - Creating thread pool of size 10
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.fasterxml.jackson.databind.util.ClassUtil (file:/opt/snowplow/lib/com.fasterxml.jackson.core.jackson-databind-2.12.3.jar) to field java.lang.Throwable.cause
WARNING: Please consider reporting this to the maintainers of com.fasterxml.jackson.databind.util.ClassUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
[main] ERROR com.snowplowanalytics.snowplow.collectors.scalastream.sinks.KinesisSink$ - Error checking if stream good-snow exists
com.amazonaws.services.kinesis.model.AmazonKinesisException: User: arn:aws:sts::<accountIDCensored>:assumed-role/15FiveEKSNode/i-<instanceIDCensored> is not authorized to perform: kinesis:DescribeStream on resource: arn:aws:kinesis:us-east-1:<accountIDCensored>:stream/good-snow because no identity-based policy allows the kinesis:DescribeStream action (Service: AmazonKinesis; Status Code: 400; Error Code: AccessDeniedException; Request ID: eeed99b9-7575-6dba-b43d-92ac6118feef; Proxy: null)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1862)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1415)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1384)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1154)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:811)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:779)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:753)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:713)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:695)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:559)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:539)
	at com.amazonaws.services.kinesis.AmazonKinesisClient.doInvoke(AmazonKinesisClient.java:2980)
	at com.amazonaws.services.kinesis.AmazonKinesisClient.invoke(AmazonKinesisClient.java:2947)
	at com.amazonaws.services.kinesis.AmazonKinesisClient.invoke(AmazonKinesisClient.java:2936)
	at com.amazonaws.services.kinesis.AmazonKinesisClient.executeDescribeStream(AmazonKinesisClient.java:898)
	at com.amazonaws.services.kinesis.AmazonKinesisClient.describeStream(AmazonKinesisClient.java:867)
	at com.amazonaws.services.kinesis.AmazonKinesisClient.describeStream(AmazonKinesisClient.java:910)
	at com.snowplowanalytics.snowplow.collectors.scalastream.sinks.KinesisSink$.$anonfun$streamExists$1(KinesisSink.scala:525)
	at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
	at scala.util.Try$.apply(Try.scala:213)
	at com.snowplowanalytics.snowplow.collectors.scalastream.sinks.KinesisSink$.streamExists(KinesisSink.scala:524)
	at com.snowplowanalytics.snowplow.collectors.scalastream.sinks.KinesisSink$.runChecks(KinesisSink.scala:486)
	at com.snowplowanalytics.snowplow.collectors.scalastream.sinks.KinesisSink$.$anonfun$createAndInitialize$3(KinesisSink.scala:399)
	at scala.util.Either.map(Either.scala:353)
	at com.snowplowanalytics.snowplow.collectors.scalastream.sinks.KinesisSink$.$anonfun$createAndInitialize$2(KinesisSink.scala:398)
	at scala.util.Either.flatMap(Either.scala:341)
	at com.snowplowanalytics.snowplow.collectors.scalastream.sinks.KinesisSink$.$anonfun$createAndInitialize$1(KinesisSink.scala:397)
	at scala.util.Either.flatMap(Either.scala:341)
	at com.snowplowanalytics.snowplow.collectors.scalastream.sinks.KinesisSink$.createAndInitialize(KinesisSink.scala:396)
	at com.snowplowanalytics.snowplow.collectors.scalastream.KinesisCollector$.$anonfun$main$2(KinesisCollector.scala:49)
	at scala.util.Either.flatMap(Either.scala:341)
	at com.snowplowanalytics.snowplow.collectors.scalastream.KinesisCollector$.main(KinesisCollector.scala:33)
	at com.snowplowanalytics.snowplow.collectors.scalastream.KinesisCollector.main(KinesisCollector.scala)
[main] ERROR com.snowplowanalytics.snowplow.collectors.scalastream.sinks.KinesisSink$ - SQS buffer is not configured.
[main] WARN com.snowplowanalytics.snowplow.collectors.scalastream.sinks.KinesisSink - No SQS buffer for surge protection set up (consider setting a SQS Buffer in config.hocon).
[main] ERROR com.snowplowanalytics.snowplow.collectors.scalastream.sinks.KinesisSink$ - Error checking if stream bad-snow exists
com.amazonaws.services.kinesis.model.AmazonKinesisException: User: arn:aws:sts::<accountIDCensored>:assumed-role/15FiveEKSNode/i-<instanceIDCensored> is not authorized to perform: kinesis:DescribeStream on resource: arn:aws:kinesis:us-east-1:<accountIDCensored>:stream/bad-snow because no identity-based policy allows the kinesis:DescribeStream action (Service: AmazonKinesis; Status Code: 400; Error Code: AccessDeniedException; Request ID: efe39e37-2344-50e2-b533-952217d662b2; Proxy: null)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1862)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1415)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1384)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1154)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:811)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:779)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:753)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:713)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:695)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:559)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:539)
	at com.amazonaws.services.kinesis.AmazonKinesisClient.doInvoke(AmazonKinesisClient.java:2980)
	at com.amazonaws.services.kinesis.AmazonKinesisClient.invoke(AmazonKinesisClient.java:2947)
	at com.amazonaws.services.kinesis.AmazonKinesisClient.invoke(AmazonKinesisClient.java:2936)
	at com.amazonaws.services.kinesis.AmazonKinesisClient.executeDescribeStream(AmazonKinesisClient.java:898)
	at com.amazonaws.services.kinesis.AmazonKinesisClient.describeStream(AmazonKinesisClient.java:867)
	at com.amazonaws.services.kinesis.AmazonKinesisClient.describeStream(AmazonKinesisClient.java:910)
	at com.snowplowanalytics.snowplow.collectors.scalastream.sinks.KinesisSink$.$anonfun$streamExists$1(KinesisSink.scala:525)
	at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
	at scala.util.Try$.apply(Try.scala:213)
	at com.snowplowanalytics.snowplow.collectors.scalastream.sinks.KinesisSink$.streamExists(KinesisSink.scala:524)
	at com.snowplowanalytics.snowplow.collectors.scalastream.sinks.KinesisSink$.runChecks(KinesisSink.scala:486)
	at com.snowplowanalytics.snowplow.collectors.scalastream.sinks.KinesisSink$.$anonfun$createAndInitialize$3(KinesisSink.scala:399)
	at scala.util.Either.map(Either.scala:353)
	at com.snowplowanalytics.snowplow.collectors.scalastream.sinks.KinesisSink$.$anonfun$createAndInitialize$2(KinesisSink.scala:398)
	at scala.util.Either.flatMap(Either.scala:341)
	at com.snowplowanalytics.snowplow.collectors.scalastream.sinks.KinesisSink$.$anonfun$createAndInitialize$1(KinesisSink.scala:397)
	at scala.util.Either.flatMap(Either.scala:341)
	at com.snowplowanalytics.snowplow.collectors.scalastream.sinks.KinesisSink$.createAndInitialize(KinesisSink.scala:396)
	at com.snowplowanalytics.snowplow.collectors.scalastream.KinesisCollector$.$anonfun$main$3(KinesisCollector.scala:51)
	at scala.util.Either.flatMap(Either.scala:341)
	at com.snowplowanalytics.snowplow.collectors.scalastream.KinesisCollector$.$anonfun$main$2(KinesisCollector.scala:43)
	at scala.util.Either.flatMap(Either.scala:341)
	at com.snowplowanalytics.snowplow.collectors.scalastream.KinesisCollector$.main(KinesisCollector.scala:33)
	at com.snowplowanalytics.snowplow.collectors.scalastream.KinesisCollector.main(KinesisCollector.scala)
[main] ERROR com.snowplowanalytics.snowplow.collectors.scalastream.sinks.KinesisSink$ - SQS buffer is not configured.
[main] WARN com.snowplowanalytics.snowplow.collectors.scalastream.sinks.KinesisSink - No SQS buffer for surge protection set up (consider setting a SQS Buffer in config.hocon).
[scala-stream-collector-akka.actor.default-dispatcher-5] INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started
[main] INFO com.snowplowanalytics.snowplow.collectors.scalastream.telemetry.TelemetryAkkaService - Telemetry enabled
[scala-stream-collector-akka.actor.default-dispatcher-5] INFO com.snowplowanalytics.snowplow.collectors.scalastream.KinesisCollector$ - REST interface bound to /0.0.0.0:8000
^C[Thread-0] WARN com.snowplowanalytics.snowplow.collectors.scalastream.KinesisCollector$ - Received shutdown signal
[Thread-0] WARN com.snowplowanalytics.snowplow.collectors.scalastream.KinesisCollector$ - Sleeping for 10 seconds
[Thread-0] WARN com.snowplowanalytics.snowplow.collectors.scalastream.KinesisCollector$ - Initiating http server termination
[Thread-0] WARN com.snowplowanalytics.snowplow.collectors.scalastream.KinesisCollector$ - Server terminated
[scala-stream-collector-akka.actor.default-dispatcher-13] WARN com.snowplowanalytics.snowplow.collectors.scalastream.KinesisCollector$ - Initiating bad sink shutdown
[scala-stream-collector-akka.actor.default-dispatcher-12] WARN com.snowplowanalytics.snowplow.collectors.scalastream.KinesisCollector$ - Initiating good sink shutdown
[scala-stream-collector-akka.actor.default-dispatcher-12] WARN com.snowplowanalytics.snowplow.collectors.scalastream.KinesisCollector$ - Completed good sink shutdown
[scala-stream-collector-akka.actor.default-dispatcher-13] WARN com.snowplowanalytics.snowplow.collectors.scalastream.KinesisCollector$ - Completed bad sink shutdown

@jbeemster
Member

Hi @caleb15, I have been testing this out today, and with:

aws {
        accessKey = default
        secretKey = default
      }

... and an OIDC IAM role ARN attached to the pod service account, I could successfully write out to the target stream. If you use the iam inputs it indeed does not work, but the DefaultAWSCredentialsProviderChain with v2.5.0 does work as far as I can tell.

Are you certain that the ServiceAccount you have attached to the service is correctly configured and attached?


caleb15 commented Apr 13, 2022

Weird, this time default works. I could've sworn I tested it before and default didn't work :|

Sorry about that, thanks for testing!


caleb15 commented May 26, 2022

Just ran into this issue again when recreating the pod even though I have it set to default. Same issue with snowplow kinesis enrichment. I'll try making my own pod with sudo rights and awscli so I can look further into it.


caleb15 commented May 26, 2022

Never mind: it turns out I got the error because the trust relationship in the IAM role referenced the serviceaccount's old name. I had renamed the serviceaccount but didn't realize I would also need to update the IAM role, because I thought all you needed was the ARN reference. You need to make sure both that the role ARN annotation on the service account is correct and that the service account name matches the name in the role's trust relationship.

Relevant: https://stackoverflow.com/questions/66405794/not-authorized-to-perform-stsassumerolewithwebidentity-403
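
For anyone else hitting this, a sketch of what the role's trust relationship needs to look like (the OIDC provider ID, account, region, namespace, and serviceaccount name are all placeholders); the `sub` condition is where a renamed serviceaccount silently breaks things:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::<account_id>:oidc-provider/oidc.eks.<region>.amazonaws.com/id/<OIDC_ID>"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.<region>.amazonaws.com/id/<OIDC_ID>:sub": "system:serviceaccount:<namespace>:<serviceaccount-name>"
        }
      }
    }
  ]
}
```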

Some more debugging tips: When your IAM role is working you should be able to do aws sts get-caller-identity and get something like the following:

{
    "UserId": "<censored>:botocore-session-<censored>",
    "Account": "<censored>",
    "Arn": "arn:aws:sts::<censored>:assumed-role/<role-name>/botocore-session-<censored>"
}

You should also make sure the serviceAccount name on the pod matches the service account you configured:

kubectl get pod/<podname> -o yaml | grep serviceAccount

Note the container doesn't have root privileges, so you can't install the awscli. I made my own container with root privileges and used that instead. However, I realized there's a far easier way: just set runAsUser in the securityContext to 0 (root). That way you can install whatever packages you need to debug.
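
In case it helps, the debug override described above would look something like this in the container spec (for debugging only; don't run the collector as root in production). The sleep entrypoint from the earlier comment keeps the container alive while you poke around:

```yaml
# Temporary debug settings: run as root so packages can be installed,
# and keep the container alive instead of starting the collector.
securityContext:
  runAsUser: 0
command: ["sleep"]
args: ["3600"]
```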


kalupa commented Oct 23, 2023

Is this still an active issue? I was looking for info about k8s and the Scala collector and stumbled upon this, and it's unclear why the issue is still open.
