-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HTTP client: total/read timeout may not be respected #9547
Comments
I'm not sure if it's a deliberate decision, but I don't think the timeout applies to .read(). So, the server may be responding to the request quickly and then hanging during the rest of data transfer. |
Documentation does say: The maximal number of seconds for the whole operation including connection establishment, request sending and response reading. Which would suggest that the timeout should be carried into the .read() method. I think at present it only applies to reading the start (maybe 2KB) of response which is completed in the .request() call. |
Hey Sam! Not only documentation, the source code of request method claims the same. https://github.com/aio-libs/aiohttp/blob/master/aiohttp/client.py#L514
And I think the timer works well ... usually. No matter what, I'm not able to reproduce the problem in mock client/server. When I stream data from server to client, the client raises TimeoutError after Would it be possible under some circumstances that the client blocks asyncio event loop? |
I may have missed something, but it seems like the timer is created within .request() and not passed anywhere relevant. I didn't see any timer in the .read() method, or the underlying StreamReader. Should be possible to create a test simulating this, if you want to give it a go, you could probably take a test from (e.g. from test_client_functional.py or test_web_functional.py) and create a handler which yields data with an asyncio.sleep() call in between to force a delay. Assert that the client code completes the .request() call and then assert a timeout is raised on the .read() call.
Probably more likely something else in your application could block the event loop. But, seems highly unlikely you'd have something blocking for 2 hours and not realise... The json.loads() call is obviously blocking, but to take 2 hours to decode would surely require a response that would likely be bigger than the machine's RAM... |
Thanks, Sam! Streams have timer enabled, too. https://github.com/aio-libs/aiohttp/blob/master/aiohttp/streams.py#L345 I've spent many hours analyzing aiohttp code and couldn't find any issue. Hence I have implemented full TraceConfig a will wait for another incident. |
Right, I missed that. I think that is working correctly then. I can only imagine something is blocking your application then... |
Describe the bug
I've been using aiohttp with FastAPI server without any major problems, however, several times a week it happens that the client fails to respect request timeout.
This snippet is 100% accurate to what I have in the production code.
Most of the time, the communication works perfectly. Having client and server machines in the same rack, req/response takes milliseconds. However, there seems to be a (nightly) period of (possible?) maintenance. In that case I can see in logs that response took well over 2 hours.
There are dozens of requests from a single process directed to the server at the same time. The requests do succeed, however, third of them finishes under second, while the rest takes minutes or hours to complete. From the server logs and actions I know the request arrived to FastAPI server immediately, while the response takes the whole time.
The question is, why the client doesn't respect 60s total timeout that I set?
I've tested timeouts on a mock server and client and it indeed did work. Is it possible that under some circumstances the client receives data slowly and fails to respect the timeout?
Under which case is the timeout not respected? I.e. is there a way to set total timeout that hits no matter what? (I mean except for anyio cancelling groups).
To Reproduce
I wasn't able to reproduce the problem in controlled environment, however, this is a repeating problem in a period of many weeks and like caused some maintenance on the machine.
Expected behavior
To be able to set timeout that always hits no matter what.
Logs/tracebacks
Python Version
aiohttp Version
multidict Version
propcache Version
yarl Version
OS
Linux, RHEL 7, x86_64
Related component
Client
Additional context
No response
Code of Conduct
The text was updated successfully, but these errors were encountered: