-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: emit metrics from CRT HTTP engine #1017
base: main
Are you sure you want to change the base?
Changes from 2 commits
bdb0814
2174a83
0357d85
1b970a4
24278a6
c36f9fe
d27f858
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
{ | ||
"id": "e7c3c7ab-749e-4371-8b25-42ea76aa870d", | ||
"type": "feature", | ||
"description": "Emit metrics from CRT HTTP engine", | ||
"issues": [ | ||
"https://github.com/awslabs/smithy-kotlin/issues/893" | ||
] | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -13,9 +13,12 @@ import aws.smithy.kotlin.runtime.crt.SdkDefaultIO | |
import aws.smithy.kotlin.runtime.http.HttpErrorCode | ||
import aws.smithy.kotlin.runtime.http.HttpException | ||
import aws.smithy.kotlin.runtime.http.engine.ProxyConfig | ||
import aws.smithy.kotlin.runtime.http.engine.internal.HttpClientMetrics | ||
import aws.smithy.kotlin.runtime.http.request.HttpRequest | ||
import aws.smithy.kotlin.runtime.io.Closeable | ||
import aws.smithy.kotlin.runtime.net.TlsVersion | ||
import aws.smithy.kotlin.runtime.telemetry.metrics.measureSeconds | ||
import kotlinx.atomicfu.atomic | ||
import kotlinx.coroutines.TimeoutCancellationException | ||
import kotlinx.coroutines.sync.Mutex | ||
import kotlinx.coroutines.sync.Semaphore | ||
|
@@ -27,8 +30,10 @@ import aws.smithy.kotlin.runtime.net.TlsVersion as SdkTlsVersion | |
|
||
internal class ConnectionManager( | ||
private val config: CrtHttpEngineConfig, | ||
private val metrics: HttpClientMetrics, | ||
) : Closeable { | ||
private val leases = Semaphore(config.maxConnections.toInt()) | ||
private val pending = atomic(0L) | ||
|
||
private val crtTlsContext: TlsContext = TlsContextOptionsBuilder() | ||
.apply { | ||
|
@@ -61,16 +66,22 @@ internal class ConnectionManager( | |
val manager = getManagerForUri(request.uri, proxyConfig) | ||
var leaseAcquired = false | ||
|
||
metrics.queuedRequests = pending.incrementAndGet() | ||
|
||
return try { | ||
// wait for an actual connection | ||
val conn = withTimeout(config.connectionAcquireTimeout) { | ||
// get a permit to acquire a connection (limits overall connections since managers are per/host) | ||
leases.acquire() | ||
leaseAcquired = true | ||
manager.acquireConnection() | ||
} | ||
metrics.requestsQueuedDuration.measureSeconds { | ||
// wait for an actual connection | ||
val conn = withTimeout(config.connectionAcquireTimeout) { | ||
// get a permit to acquire a connection (limits overall connections since managers are per/host) | ||
leases.acquire() | ||
leaseAcquired = true | ||
metrics.connectionAcquireDuration.measureSeconds { | ||
manager.acquireConnection() | ||
} | ||
} | ||
|
||
LeasedConnection(conn) | ||
LeasedConnection(conn) | ||
} | ||
} catch (ex: Exception) { | ||
if (leaseAcquired) { | ||
leases.release() | ||
|
@@ -82,8 +93,18 @@ internal class ConnectionManager( | |
} | ||
|
||
throw httpEx | ||
} finally { | ||
metrics.queuedRequests = pending.decrementAndGet() | ||
emitConnections() | ||
} | ||
} | ||
|
||
private fun emitConnections() { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit / naming: There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Since the class already has the word "connection" in it, renaming to simply |
||
val idleConnections = leases.availablePermits.toLong() | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. correctness: this semaphore isn't tied in anyway to actual connections in CRT, you could have 100 max connections configured but that doesn't mean you have 100 idle connections already connected waiting to be used. I'm thinking this would have to come from the actual CRT connection manager. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Oh jeez, good point. Yes, this is clearly flawed. That brings up an interesting wrinkle, though. We don't have just one CRT connection manager, we have one for each host. Does the Smithy metric for acquired connections equal the sum of acquired connections across all underlying CRT connection managers? Moreover, we have a CRT config param called There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The fact CRT uses connection manager per host is an implementation detail, the config setting is at the engine level. This is what the The reason I allowed each connection manager to be configured with There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I see this as an area that may cause confusion later, especially if users begin correlating the idle connections metric with connections reported by the OS or JVM. I don't have a better suggestion though given how the CRT connection manager works. I'll switch the implementation to calculate the idle connections metric as There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Oh and interrogating the number of active connections from the CRT connection managers requires another revision of aws-crt-kotlin: awslabs/aws-crt-kotlin#88 |
||
metrics.idleConnections = idleConnections | ||
metrics.acquiredConnections = config.maxConnections.toLong() - idleConnections | ||
} | ||
|
||
private suspend fun getManagerForUri(uri: Uri, proxyConfig: ProxyConfig): HttpClientConnectionManager = mutex.withLock { | ||
connManagers.getOrPut(uri.authority) { | ||
val connOpts = options.apply { | ||
|
@@ -105,6 +126,7 @@ internal class ConnectionManager( | |
HttpClientConnectionManager(connOpts) | ||
} | ||
} | ||
|
||
override fun close() { | ||
connManagers.forEach { entry -> entry.value.close() } | ||
crtTlsContext.close() | ||
|
@@ -117,6 +139,7 @@ internal class ConnectionManager( | |
delegate.close() | ||
} finally { | ||
leases.release() | ||
emitConnections() | ||
} | ||
} | ||
} | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -10,16 +10,27 @@ import aws.sdk.kotlin.crt.io.Buffer | |
import aws.smithy.kotlin.runtime.http.* | ||
import aws.smithy.kotlin.runtime.http.HeadersBuilder | ||
import aws.smithy.kotlin.runtime.http.HttpException | ||
import aws.smithy.kotlin.runtime.http.engine.EngineAttributes | ||
import aws.smithy.kotlin.runtime.http.engine.crt.io.reportingTo | ||
import aws.smithy.kotlin.runtime.http.engine.internal.HttpClientMetrics | ||
import aws.smithy.kotlin.runtime.http.response.HttpResponse | ||
import aws.smithy.kotlin.runtime.http.response.copy | ||
import aws.smithy.kotlin.runtime.io.SdkBuffer | ||
import aws.smithy.kotlin.runtime.io.SdkByteChannel | ||
import aws.smithy.kotlin.runtime.io.SdkByteReadChannel | ||
import aws.smithy.kotlin.runtime.io.source | ||
import aws.smithy.kotlin.runtime.operation.ExecutionContext | ||
import aws.smithy.kotlin.runtime.telemetry.logging.logger | ||
import aws.smithy.kotlin.runtime.telemetry.metrics.recordSeconds | ||
import aws.smithy.kotlin.runtime.time.Instant | ||
import aws.smithy.kotlin.runtime.time.fromEpochNanoseconds | ||
import aws.smithy.kotlin.runtime.util.derivedName | ||
import kotlinx.atomicfu.locks.reentrantLock | ||
import kotlinx.atomicfu.locks.withLock | ||
import kotlinx.coroutines.* | ||
import kotlinx.coroutines.DelicateCoroutinesApi | ||
import kotlinx.coroutines.GlobalScope | ||
import kotlinx.coroutines.channels.Channel | ||
import kotlinx.coroutines.launch | ||
import kotlin.coroutines.CoroutineContext | ||
|
||
/** | ||
|
@@ -30,6 +41,8 @@ import kotlin.coroutines.CoroutineContext | |
internal class SdkStreamResponseHandler( | ||
private val conn: HttpClientConnection, | ||
private val callContext: CoroutineContext, | ||
private val execContext: ExecutionContext, | ||
private val clientMetrics: HttpClientMetrics, | ||
) : HttpStreamResponseHandler { | ||
// TODO - need to cancel the stream when the body is closed from the caller side early. | ||
// There is no great way to do that currently without either (1) closing the connection or (2) throwing an | ||
|
@@ -57,6 +70,10 @@ internal class SdkStreamResponseHandler( | |
|
||
private var streamCompleted = false | ||
|
||
init { | ||
clientMetrics.incrementInflightRequests() | ||
} | ||
|
||
/** | ||
* Called by the response read channel as data is consumed | ||
* @param size the number of bytes consumed | ||
|
@@ -184,7 +201,30 @@ internal class SdkStreamResponseHandler( | |
} | ||
|
||
internal suspend fun waitForResponse(): HttpResponse = | ||
responseReady.receive() | ||
responseReady.receive().wrapBody() | ||
|
||
private fun HttpResponse.wrapBody(): HttpResponse { | ||
val wrappedBody = when (val originalBody = body) { | ||
is HttpBody.Empty -> return this // Don't need an object copy since we're not wrapping the body | ||
is HttpBody.Bytes -> | ||
originalBody | ||
.bytes() | ||
.source() | ||
.reportingTo(clientMetrics.bytesReceived) | ||
.toHttpBody(originalBody.contentLength) | ||
is HttpBody.SourceContent -> | ||
originalBody | ||
.readFrom() | ||
.reportingTo(clientMetrics.bytesReceived) | ||
.toHttpBody(originalBody.contentLength) | ||
is HttpBody.ChannelContent -> | ||
originalBody | ||
.readFrom() | ||
.reportingTo(clientMetrics.bytesReceived) | ||
.toHttpBody(originalBody.contentLength) | ||
} | ||
return copy(body = wrappedBody) | ||
} | ||
|
||
/** | ||
* Invoked only after the consumer is finished with the response and it is safe to cleanup resources | ||
|
@@ -197,6 +237,8 @@ internal class SdkStreamResponseHandler( | |
// and more data is pending arrival). It can also happen if the coroutine for this request is cancelled | ||
// before onResponseComplete fires. | ||
lock.withLock { | ||
clientMetrics.decrementInflightRequests() | ||
|
||
val forceClose = !streamCompleted | ||
|
||
if (forceClose) { | ||
|
@@ -210,4 +252,19 @@ internal class SdkStreamResponseHandler( | |
conn.close() | ||
} | ||
} | ||
|
||
override fun onMetrics(stream: HttpStream, metrics: HttpStreamMetrics) { | ||
val sendEnd = positiveInstantOrNull(metrics.sendEndTimestampNs) | ||
val receiveStart = positiveInstantOrNull(metrics.receiveStartTimestampNs) | ||
|
||
if (sendEnd != null && receiveStart != null) { | ||
val ttfb = receiveStart - sendEnd | ||
if (ttfb.isPositive()) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. question: have you seen any instances where this is negative? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not directly but event streams may begin receiving traffic before sending has concluded since the communication is bidirectional and duplexed. It seemed prudent in that situation to not report TTFB. |
||
clientMetrics.timeToFirstByteDuration.recordSeconds(ttfb) | ||
execContext[EngineAttributes.TimeToFirstByte] = ttfb | ||
} | ||
} | ||
} | ||
} | ||
|
||
private fun positiveInstantOrNull(ns: Long): Instant? = if (ns > 0) Instant.fromEpochNanoseconds(ns) else null |
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. comment: These new body types are missing tests |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
/* | ||
* Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. | ||
* SPDX-License-Identifier: Apache-2.0 | ||
*/ | ||
package aws.smithy.kotlin.runtime.http.engine.crt.io | ||
|
||
import aws.smithy.kotlin.runtime.io.SdkBuffer | ||
import aws.smithy.kotlin.runtime.io.SdkByteReadChannel | ||
import aws.smithy.kotlin.runtime.telemetry.metrics.MonotonicCounter | ||
|
||
private class ReportingByteReadChannel( | ||
val delegate: SdkByteReadChannel, | ||
val metric: MonotonicCounter, | ||
) : SdkByteReadChannel by delegate { | ||
override suspend fun read(sink: SdkBuffer, limit: Long): Long = delegate.read(sink, limit).also { | ||
if (it > 0) metric.add(it) | ||
} | ||
} | ||
|
||
internal fun SdkByteReadChannel.reportingTo(metric: MonotonicCounter): SdkByteReadChannel = | ||
ReportingByteReadChannel(this, metric) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
/* | ||
* Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. | ||
* SPDX-License-Identifier: Apache-2.0 | ||
*/ | ||
package aws.smithy.kotlin.runtime.http.engine.crt.io | ||
|
||
import aws.smithy.kotlin.runtime.io.SdkBuffer | ||
import aws.smithy.kotlin.runtime.io.SdkSource | ||
import aws.smithy.kotlin.runtime.telemetry.metrics.MonotonicCounter | ||
|
||
private class ReportingSource(val delegate: SdkSource, val metric: MonotonicCounter) : SdkSource by delegate { | ||
override fun read(sink: SdkBuffer, limit: Long): Long = delegate.read(sink, limit).also { | ||
if (it > 0) metric.add(it) | ||
} | ||
} | ||
|
||
internal fun SdkSource.reportingTo(metric: MonotonicCounter): SdkSource = ReportingSource(this, metric) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
correctness: If we are trying to acquire a connection then it's not queued right? I think queued would be before we hit the requestLimiter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, if "queued" means "before we attempt to acquire a connection" then I'm guessing that the
requestsQueuedDuration
measurement below is also wrong. I'll move it too.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had to look at the definitions I gave them again and I think this would be inline with what it says
If we are to the point of acquiring a connection we are "actively processing" the request. I can see where the definition could be interpreted differently though as in "I have all the resources needed at this point to execute the request" but I think establishing a connection (or waiting on one to be available) is part of overall request processing. Curious if others disagree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This still doesn't look correct, I don't think
queuedRequests
is calculated in connection manager.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I think I was confusing the semaphore inside
ConnectionManager
with the one insideCrtHttpEngine
. Switching.