Optimise `details` handling for GraphQL #3134

brucebolt · 2025-02-11T17:00:13Z

This makes a number of performance improvements to processing of details for GraphQL queries. When we did profiling, we found DetailsPresenter is home to some of the most invoked and slowest methods within our own code when serving GraphQL responses.

The change in response time is only marginal for the document types we have already migrated to GraphQL, but has potential for larger improvements when we migrate documents containing a large amount of data in details, particularly those with extensive nesting.

Change 1: Only parse details that are requested

At the moment, we are putting the entire contents of the details field through the DetailsPresenter, which is sub-optimal as we are parsing values that clients are not requesting.

By using lookaheads, we can determine which fields from details have been requested and only parse those.

In a previous prototype, we had only transformed the govspeak in the body field. However, some schemas permit mixed govspeak/HTML content in other details fields, or even nested within other fields. We therefore need to recursively parse all items within details, as is already done using the DetailsPresenter, to ensure no raw govspeak is presented to the client.

Note: the code here could've been simpler (i.e. not duplicate the object, just slice the details hash) but the ContentEmbedPresenter requires a full Edition object.
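As a rough illustration of Change 1, here is a minimal sketch in plain Ruby. It assumes the requested field names have already been extracted (in graphql-ruby these would typically come from a lookahead, for example via `extras: [:lookahead]`). The `present` and `present_requested_details` methods are hypothetical, simplified stand-ins and not the real DetailsPresenter, which this sketch does not attempt to reproduce.

```ruby
# Hypothetical, simplified stand-in for the presenter: recursively walks a
# value and converts any govspeak content hash it finds.
def present(value)
  case value
  when Hash
    if value[:content_type] == "text/govspeak"
      # Placeholder conversion, for illustration only.
      { content_type: "text/html", content: "<p>#{value[:content]}</p>" }
    else
      value.transform_values { |v| present(v) }
    end
  when Array
    value.map { |v| present(v) }
  else
    value
  end
end

# Only the requested top-level keys are parsed; everything else is skipped.
def present_requested_details(details, requested_fields)
  details.slice(*requested_fields).transform_values { |v| present(v) }
end

details = {
  body: [{ content_type: "text/govspeak", content: "Hello" }],
  change_history: [{ note: "First published" }],
}

# requested_fields is assumed to come from the query's lookahead.
result = present_requested_details(details, [:body])
```

Because the recursion still runs over everything beneath a requested key, nested govspeak is converted as before; the saving comes only from skipping keys the client did not ask for.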

Change 2: Do not loop through details unless embed links exist

At the moment, we are looping through each item in details to look for any embedded content references. This is sub-optimal: if there is nothing to embed in the document, we are searching through content for a tag that we know won't be there.

We should skip doing this if there is nothing that could possibly be embedded, which we know by looking in the document's links.
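The guard described in Change 2 could look something like the sketch below. The `:embed` link key, the `{{embed:...}}` tag pattern and both method names are assumptions made for illustration, not the actual implementation.

```ruby
# Assumed tag format, for illustration only.
EMBED_PATTERN = /\{\{embed:[^}]+\}\}/

# Recursively collects embed tags from every string within details.
def embed_references(details)
  case details
  when Hash then details.values.flat_map { |v| embed_references(v) }
  when Array then details.flat_map { |v| embed_references(v) }
  when String then details.scan(EMBED_PATTERN)
  else []
  end
end

# Skips the recursive scan entirely when the document has no embed links.
def embedded_content_references(details, links)
  return [] if links.fetch(:embed, []).empty?

  embed_references(details)
end

with_links = embedded_content_references(
  { body: "See {{embed:contact:123}}" },
  { embed: ["123"] },
)
without_links = embedded_content_references({ body: "No embeds here" }, {})
```

The early return means documents without embed links never pay for the walk through details.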

Change 3: Reduce lines of code executed in `DetailsPresenter`

This presenter currently runs some of the same code in several of the methods that determine the type of content in a field, and looks for content types even when we know there won't be any present.

Therefore refactoring this presenter to execute as little code as possible in order to determine the correct outcome for the content type given.

Change 4: Remove useless memoization

None of the places where the details method is called re-compute the response, so the memoization is pointless.

Removing it here, to avoid the unnecessary compute needed to check whether the instance variable has already been set.
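To make Change 4 concrete, the sketch below contrasts the two shapes. `EditionSketch` and its `present` helper are hypothetical names used only for illustration.

```ruby
class EditionSketch
  def initialize(raw_details)
    @raw_details = raw_details
  end

  # Before: memoized, so every call first checks the instance variable.
  def memoized_details
    @memoized_details ||= present(@raw_details)
  end

  # After: computed directly. Equivalent when called once per instance.
  def details
    present(@raw_details)
  end

  private

  # Placeholder transformation standing in for the real presentation work.
  def present(raw)
    raw.transform_keys(&:to_s)
  end
end

edition = EditionSketch.new({ body: "Hello" })
```

Both methods return the same value; the memoized form only pays off if the same instance is asked for details more than once.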

Trello card

yndajas

Looks good - no particularly significant comments

spec/graphql/types/edition_type_spec.rb

app/graphql/types/edition_type.rb

yndajas · 2025-02-18T10:52:19Z

spec/presenters/details_presenter_spec.rb

@@ -108,25 +108,6 @@
      it { is_expected.to match(expected_result) }
    end

-    context "when we're passed hashes rather than arrays" do


question: is there no code change required to restrict this from happening? (Given the tests pass, presumably it works, even if it goes against the ADR?)

I can't see any schemas with this format currently in use, nor any suggestion we've ever used a hash that isn't inside an array.

I was more meaning do we need to make any code changes in the presenter itself to restrict this from being introduced?

I see. I don't think the presenter would be the right place to restrict this, since the content has already been accepted by Publishing API at this point. Looking in WellFormedContentTypesValidator and the associated tests, there doesn't seem to be a case where we ever intended for well-formed content types to not be wrapped in an array.

I've added an additional check into that validator and a test to ensure we don't get content like this published in the future.
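For anyone following along, the kind of check described could be sketched roughly as below. This is not the actual WellFormedContentTypesValidator code; the method names and the returned path format are hypothetical.

```ruby
# A hash with both keys is treated as a content-type entry.
def bare_content_hash?(value)
  value.is_a?(Hash) && value.key?(:content_type) && value.key?(:content)
end

# Returns the paths of any content-type hashes not wrapped in an array.
def unwrapped_content_paths(value, path = "details", inside_array: false)
  case value
  when Hash
    return [path] if bare_content_hash?(value) && !inside_array

    value.flat_map { |k, v| unwrapped_content_paths(v, "#{path}.#{k}") }
  when Array
    value.each_with_index.flat_map do |v, i|
      unwrapped_content_paths(v, "#{path}[#{i}]", inside_array: true)
    end
  else
    []
  end
end

invalid = unwrapped_content_paths(
  { body: { content_type: "text/govspeak", content: "Hello" } },
)
valid = unwrapped_content_paths(
  { body: [{ content_type: "text/govspeak", content: "Hello" }] },
)
```

A validator built on this would reject content whenever the returned list is non-empty.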