Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PostgREST does not log 503 errors to stderr #3841

Open
laurenceisla opened this issue Jan 6, 2025 · 5 comments · May be fixed by #3896
Open

PostgREST does not log 503 errors to stderr #3841

laurenceisla opened this issue Jan 6, 2025 · 5 comments · May be fixed by #3896
Assignees
Labels

Comments

@laurenceisla
Copy link
Member

laurenceisla commented Jan 6, 2025

Environment

  • PostgreSQL version: 15.6
  • PostgREST version: 12.2.3
  • Operating system: ubuntu

Description of issue

PostgREST logged the following (requests queries redacted):

127.0.0.1 - service_role [06/Jan/2025:17:44:02 +0000] "GET /rpc/<an_rpc> HTTP/1.1" 503 - "" "node"
06/Jan/2025:17:44:03 +0000: Successfully connected to PostgreSQL 15.6 ...
06/Jan/2025:17:44:03 +0000: Config reloaded
06/Jan/2025:17:44:03 +0000: Schema cache queried in 35.5 milliseconds
06/Jan/2025:17:44:03 +0000: Schema cache loaded ...
127.0.0.1 - service_role [06/Jan/2025:17:44:02 +0000] "GET /rpc/<another_rpc> HTTP/1.1" 503 - "" "node"
06/Jan/2025:17:44:03 +0000: Successfully connected to PostgreSQL 15.6 ...
06/Jan/2025:17:44:03 +0000: Config reloaded
06/Jan/2025:17:44:04 +0000: Schema cache queried in 37.9 milliseconds
06/Jan/2025:17:44:04 +0000: Schema cache loaded ...

As seen above it does not log the 503 error (when it should) and retries connection immediately. For instance, the 500 errors are logged correctly:

06/Jan/2025:17:44:59 +0000: {"code":"57014","details":null,"hint":null,"message":"canceling statement due to statement timeout"}
127.0.0.1 - service_role [06/Jan/2025:17:44:59 +0000] "GET /rpc/<an_rpc> HTTP/1.1" 500 - "" "node"

Now, checking the PostgreSQL logs, it shows that it's an error related to. In particular, it shows these two errors:

message:"canceling statement due to conflict with recovery"
detail:"User query might have needed to see row versions that must be removed."
message:"terminating connection due to conflict with recovery"
detail:"User query might have needed to see row versions that must be removed."
hint:"In a moment you should be able to reconnect to the database and repeat your command."

Not sure if PostgREST should try to reconnect or not in these cases, since a solution is to set max_standby_archive_delay and max_standby_streaming_delay in PG config to avoid the error.

@steve-chavez steve-chavez added the enhancement a feature, ready for implementation label Jan 15, 2025
@laurenceisla laurenceisla changed the title Read replicas do not log 503 errors to stderr PostgREST does not log 503 errors to stderr Jan 26, 2025
@laurenceisla
Copy link
Member Author

This not only happens on read replicas but also on connections to a single primary db.

Not sure if PostgREST should try to reconnect or not in these cases

I missed this issue #3673, this is tracked there.


The solution seems simple enough. It would need to also log for ClientError here:

SQL.ServerError{} ->
when (Error.status (Error.PgError False err) >= HTTP.status500) $
observer $ QueryErrorCodeHighObs err
SQL.SessionUsageError (SQL.QueryError _ _ (SQL.ClientError _)) ->
pure ()
)

Something like:

 SQL.SessionUsageError (SQL.QueryError _ _ (SQL.ClientError _)) -> 
   when (Error.status (Error.PgError False err) >= HTTP.status500) $ 
     observer $ QueryErrorCodeHighObs err
 ) 

I don't remember if there's a reason why we don't log on ClientError. According to Hasql docs this is most likely due to connection errors, so I don't think it'd be a problem.

@taimoorzaeem taimoorzaeem added the difficulty: beginner Pure Haskell task label Jan 30, 2025
@taimoorzaeem taimoorzaeem self-assigned this Jan 31, 2025
@taimoorzaeem
Copy link
Collaborator

taimoorzaeem commented Feb 1, 2025

error related to conflicts between the replica and the primary db

@laurenceisla Hm, I can't see this. Any ideas on how to reproduce this error in our tests using test/io/replica.sql and test/io/test_replica.py?

@laurenceisla
Copy link
Member Author

Yeah, I don't think that's gonna be easy or possible to reproduce in replicas. But this also happens for regular connections too, so you should test in test_io.py instead.

One way to trigger a 503 is to call the /rpc/terminate_pgrst function, it's defined in the IO test fixtures.

@taimoorzaeem taimoorzaeem linked a pull request Feb 3, 2025 that will close this issue
@steve-chavez steve-chavez added bug and removed enhancement a feature, ready for implementation bug labels Feb 4, 2025
@steve-chavez
Copy link
Member

steve-chavez commented Feb 4, 2025

This looks like a bug fix to me, as the "crit" level should document all critical error messages as documented.


Sidenote: Just noticed that we don't have a "bug" label anymore, and there's a "Type: bug" field on github. It looks like all our "bug" issues have been migrated to that type. That was confusing.

@wolfgangwalther
Copy link
Member

It looks like all our "bug" issues have been migrated to that type.

👋

That was me ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

Successfully merging a pull request may close this issue.

4 participants