-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimise details
handling for GraphQL
#3134
Conversation
1d713b1
to
e939446
Compare
details
handling for GraphQL
d10f907
to
71c2777
Compare
71c2777
to
69ceda5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good - no particularly significant comments
@@ -108,25 +108,6 @@ | |||
it { is_expected.to match(expected_result) } | |||
end | |||
|
|||
context "when we're passed hashes rather than arrays" do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: is there no code change required to restrict this from happening? (Given the tests pass, presumably it works, even if it goes against the ADR?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't see any schemas with this format currently in use, nor any suggestion we've ever used a hash that isn't inside an array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was more meaning do we need to make any code changes in the presenter itself to restrict this from being introduced?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I don't think the presenter would be the right place to restrict this, since the content has already been accepted by Publishing API at this point. Looking in WellFormedContentTypesValidator
and the associated tests, there doesn't seem to be a case where we ever intended for well-formed content types to not be wrapped in an array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added an additional check into that validator and a test to ensure we don't get content like this published in the future.
if obj.is_a?(Array) && obj.all?(Hash) && (parsed_obj = parsed_content(obj)) | ||
parsed_obj |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: what's the reason for putting the assignment in the conditional expression rather than directly in the return value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a possibility parsed_content
can return nil
, in which case we'll then proceed onto the next elsif
.
I've done the assignment in the conditional expression so we don't need to evaluate parsed_content
unless both the first two conditions are true.
@details ||= | ||
begin | ||
updated = content_embed(content_item_details).presence || content_item_details | ||
updated = recursively_transform_govspeak(updated) | ||
updated[:change_history] = change_history if change_history.present? | ||
updated | ||
end | ||
updated = content_embed(content_item_details).presence || content_item_details | ||
updated = recursively_transform_govspeak(updated) | ||
updated[:change_history] = change_history if change_history.present? | ||
updated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: did you just do a search of the codebase to work out that nowhere is calling this method twice? Is the check for an existing instance variable non-negligible? Given this is a public method it feels plausible that future code could call it twice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. DetailsPresenter
is only being used in three places and the details
method is immediately called (and only once) in each place. I can't really think of any reason why we'd need to call it multiple times in this application.
Memoization probably uses only minimal compute here and won't have any significant effect on the overall response time, but does seem a little pointless.
1701e5c
to
fd49232
Compare
In the `EditionType` specs, we are currently not requesting any fields from `details`. This works at the moment, since the `EditionType` processes the entire content of the `details` hash before filtering out those selected by the client. In a future commit, we will only process those the client has selected. Therefore updating these tests to specify which fields should be obtained from `details`. The resolution context has been removed from the `includes` line, since that version of `run_graphql_field` does not support `lookahead` as an argument [1]. 1: https://github.com/rmosolgo/graphql-ruby/blob/ec47e903e619923715de225c36fafeb28620a91e/lib/graphql/testing/helpers.rb#L125
At the moment, we are putting the entire contents of the `details` field through the `DetailsPresenter`, which is sub-optimal as we are parsing values that clients are not requesting. By using lookaheads, we can determine which fields from `details` have been requested and only parse those. In a previous prototype, we had only transformed the govspeak in the `body` field. However the schemas can permit mixed govspeak/HTML content in any details field. We therefore need to parse all items within `details`, as is already done using the `DetailsPresenter`. This also allows removal of the optimisations for `change_history` added in 2caf00e since we can now filter the details to only include that field when requested.
At the moment, we are looping through each item in `details` to look for any embedded content references. This is sub-optimal as we are searching through content for a tag that we know won't be there, if there is nothing to embed in the document. We should skip doing this if there is nothing that could possibly be embedded, which we know by looking in the document's links.
There is no schema that permits this format for specifying the content is govspeak (all schemas require this in an array), so the test can be removed. This behaviour is confired by ADR-003 [1] and the definition of `multiple_content_types` in the shared Publishing API base schema definition. Removing this test (and support for multi-part content to not be included in an array) allows us to simplify the `DetailsPresenter`, as we will only ever be dealing with strings or arrays of hashes. Also updating a validator to ensure no content with this incorrect syntax can be published in the future. 1: https://github.com/alphagov/publishing-api/blob/main/docs/arch/adr-003-representation-for-multiple-content-types.md
This presenter currently performs some of the same code in different methods that determine the type of content in the field, and looks for content types even when we know there won't be any present. Therefore refactoring this presenter to perform as little code execution as possible to order to determine the correct outcome for the content type given. This performance improvement is particularly important for GraphQL, as we will be converting govspeak to HTML at render-time.
Everywhere the `details` method is being called is not re-computing the response, therefore the memoization is pointless. Removing it here, to avoid the unnecessary compute needed to check whether the instance variable has already been set.
fd49232
to
a89216a
Compare
This makes a number of performance improvements to processing of
details
for GraphQL queries. When we did profiling, we foundDetailsPresenter
is home to some of the most invoked and slowest methods within our own code when serving GraphQL responses.The change in response time is only marginal for the document types we have already migrated to GraphQL, but has potential for larger improvements when we migrate documents containing a large amount of data in
details
, particularly those with extensive nesting.Change 1: Only parse details that are requested
At the moment, we are putting the entire contents of the
details
field through theDetailsPresenter
, which is sub-optimal as we are parsing values that clients are not requesting.By using lookaheads, we can determine which fields from
details
have been requested and only parse those.In a previous prototype, we had only transformed the govspeak in the
body
field. However some schemas permit mixed govspeak/HTML content in other details fields, or even nested within other fields. We therefore need to recursively parse all items withindetails
, as is already done using theDetailsPresenter
, to ensure no raw govspeak to presented to the client.Note: the code here could've been simpler (i.e. not duplicate the object, just slice the details hash) but the
ContentEmbedPresenter
requires a fullEdition
object.Change 2: Do not loop through details unless embed links exist
At the moment, we are looping through each item in
details
to look for any embedded content references. This is sub-optimal as we are searching through content for a tag that we know won't be there, if there is nothing to embed in the document.We should skip doing this if there is nothing that could possibly be embedded, which we know by looking in the document's links.
Change 3: Reduce lines of code executed in
DetailsPresenter
This presenter currently performs some of the same code in different methods that determine the type of content in the field, and looks for content types even when we know there won't be any present.
Therefore refactoring this presenter to perform as little code execution as possible to order to determine the correct outcome for the content type given.
Change 4: Remove useless memoization
Everywhere the
details
method is being called is not re-computing the response, therefore the memoization is pointless.Removing it here, to avoid the unnecessary compute needed to check whether the instance variable has already been set.
Trello card