Iterative Include SQL Simplification #4699

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

LTA-Thinking (Collaborator)

Description

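Simplifies the SQL query generated when a search contains many iterative includes (_include:iterate). Include CTEs that follow the same reference search parameter from the same resource type are merged into a single CTE whose EXISTS clause unions the source CTEs, instead of emitting a separate CTE for each source.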

Related issues

Addresses User Story 130995

Testing

Compared the results returned before and after the simplification on a known data set.

FHIR Team Checklist

  • Update the title of the PR to be succinct and less than 65 characters
  • Add a milestone to the PR for the sprint in which it is merged (e.g. add S47)
  • Tag the PR with the type of update: Bug, Build, Dependencies, Enhancement, New-Feature or Documentation
  • Tag the PR with Open source, Azure API for FHIR (CosmosDB or common code) or Azure Healthcare APIs (SQL or common code) to specify where this change is intended to be released.
  • Tag the PR with Schema Version backward compatible or Schema Version backward incompatible or Schema Version unchanged if this adds or updates Sql script which is/is not backward compatible with the code.
  • CI is green before merge
  • Review squash-merge requirements

Semver Change (docs)

Patch|Skip|Feature|Breaking (reason)

LTA-Thinking added the labels Enhancement (Enhancement on existing functionality) and Azure Healthcare APIs (Label denotes that the issue or PR is relevant to the FHIR service in the Azure Healthcare APIs) on Oct 24, 2024.
LTA-Thinking added this to the S152 milestone on Oct 24, 2024.
LTA-Thinking requested a review from a team as a code owner on October 24, 2024 19:28.

LTA-Thinking (Collaborator, Author):

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

Comment on lines +72 to +76
catch (Exception ex)
{
logger.LogWarning(ex, "Exception combining iterative includes.");
return; // Use the unmodified string
}

Code scanning / CodeQL check notice (Note): Generic catch clause.

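One possible way to address the generic catch clause notice is to catch only the exceptions the regex work is expected to throw. The sketch below assumes that the only anticipated failures are a match timeout and an invalid pattern built at runtime. The helper SafeRewrite is hypothetical and does not come from the PR. The Action<Exception> parameter stands in for logger.LogWarning so the sketch has no logging dependency.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not from the PR): applies one regex rewrite to generated SQL and
// falls back to the unmodified text if the regex work itself fails.
// logWarning stands in for logger.LogWarning so the sketch has no logging dependency.
string SafeRewrite(string sql, string pattern, string replacement, Action<Exception> logWarning)
{
    try
    {
        // A match timeout bounds the cost of backtracking on very large SQL strings.
        return Regex.Replace(sql, pattern, replacement, RegexOptions.None, TimeSpan.FromSeconds(1));
    }
    catch (RegexMatchTimeoutException ex)
    {
        logWarning(ex);
        return sql; // Use the unmodified string
    }
    catch (ArgumentException ex)
    {
        // Raised when a pattern assembled at runtime (e.g. from a template) is not a valid regex.
        logWarning(ex);
        return sql; // Use the unmodified string
    }
}
```

With this approach, unexpected exception types still propagate instead of being swallowed. Whether that is the right trade-off depends on how much the simplifier is trusted to fail safely.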
Comment on lines +498 to +505
if ((tempSortValue as DateTime?) != null)
{
sortValue = (tempSortValue as DateTime?).Value.ToString("o");
}
else
{
sortValue = tempSortValue.ToString();
}

Code scanning / CodeQL check notice (Note): Missed ternary opportunity. Both branches of this 'if' statement write to the same variable - consider using '?' to express intent better.

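The flagged if/else can be collapsed into a single conditional expression. The sketch below wraps the branch in a hypothetical FormatSortValue function so it can run on its own. It uses a type pattern, which is true exactly when tempSortValue as DateTime? would be non-null, so the value is no longer cast twice.

```csharp
using System;

// Stand-alone version of the flagged branch (FormatSortValue is a hypothetical name).
// "is DateTime dateTime" matches a boxed DateTime, the same case in which
// "(tempSortValue as DateTime?) != null" is true, and binds the unboxed value once.
string FormatSortValue(object tempSortValue) =>
    tempSortValue is DateTime dateTime
        ? dateTime.ToString("o")   // round-trip (ISO 8601) format, as in the original branch
        : tempSortValue.ToString();
```

Inside the PR's method, the equivalent would be a single assignment: sortValue = tempSortValue is DateTime dateTime ? dateTime.ToString("o") : tempSortValue.ToString();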
@@ -15,6 +15,161 @@ namespace Microsoft.Health.Fhir.SqlServer.Features.Search
{
internal static class SqlCommandSimplifier
{
private static readonly Regex FindCteMatch = new Regex(",cte(\\d+) AS\\s*\\r\\n\\s*\\(\\s*\\r\\n\\s*SELECT DISTINCT refTarget.ResourceTypeId AS T1, refTarget.ResourceSurrogateId AS Sid1, 0 AS IsMatch\\s*\\r\\n\\s*FROM dbo.ReferenceSearchParam refSource\\s*\\r\\n\\s*JOIN dbo.Resource refTarget ON refSource.ReferenceResourceTypeId = refTarget.ResourceTypeId AND refSource.ReferenceResourceId = refTarget.ResourceId\\s*\\r\\n\\s*WHERE refSource.SearchParamId = (\\d*)\\s*\\r\\n\\s*AND refTarget.IsHistory = 0\\s*\\r\\n\\s*AND refTarget.IsDeleted = 0\\s*\\r\\n\\s*AND refSource.ResourceTypeId IN \\((\\d*)\\)\\s*\\r\\n\\s*AND EXISTS \\(SELECT \\* FROM cte(\\d+) WHERE refSource.ResourceTypeId = T1 AND refSource.ResourceSurrogateId = Sid1");

private const string RemoveCteMatchBase = "(\\s*,cte<CteNumber> AS\\s*\\r\\n\\s*\\(\\s*\\r\\n\\s*SELECT DISTINCT refTarget.ResourceTypeId AS T1, refTarget.ResourceSurrogateId AS Sid1, 0 AS IsMatch\\s*\\r\\n\\s*FROM dbo.ReferenceSearchParam refSource\\s*\\r\\n\\s*JOIN dbo.Resource refTarget ON refSource.ReferenceResourceTypeId = refTarget.ResourceTypeId AND refSource.ReferenceResourceId = refTarget.ResourceId\\s*\\r\\n\\s*WHERE refSource.SearchParamId = <SearchParamId>\\s*\\r\\n\\s*AND refTarget.IsHistory = 0\\s*\\r\\n\\s*AND refTarget.IsDeleted = 0\\s*\\r\\n\\s*AND refSource.ResourceTypeId IN \\(<ResourceTypeId>\\)\\s*\\r\\n\\s*AND EXISTS \\(SELECT \\* FROM cte<SourceCte> WHERE refSource.ResourceTypeId = T1 AND refSource.ResourceSurrogateId = Sid1.*\\r\\n\\s*\\)\\s*\\r\\n\\s*,cte<CteNextNumber> AS\\s*\\r\\n\\s*\\(\\s*\\r\\n\\s*SELECT DISTINCT .*T1, Sid1, IsMatch, .* AS IsPartial\\s*\\r\\n\\s*FROM cte<CteNumber>\\s*\\r\\n\\s*\\))";
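In FindCteMatch, the capture groups are: group 1 is the include CTE's number, group 2 is the SearchParamId, group 3 is the source ResourceTypeId, and group 4 is the number of the CTE it reads from. Two matches with the same groups 2 and 3 are merge candidates. The sketch below shows that grouping step. It uses a deliberately simplified, single-line pattern in place of the real one, which also matches the JOIN, the IsHistory/IsDeleted filters, and CRLF line breaks. The names includeCte and FindMergeCandidates are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Simplified stand-in for FindCteMatch (assumption: single-line SQL, single spaces).
// Groups: 1 = CTE number, 2 = SearchParamId, 3 = ResourceTypeId, 4 = source CTE number.
var includeCte = new Regex(
    @",cte(\d+) AS \(.*?WHERE refSource\.SearchParamId = (\d+) .*?refSource\.ResourceTypeId IN \((\d+)\) AND EXISTS \(SELECT \* FROM cte(\d+) ",
    RegexOptions.Singleline);

// Returns groups of include CTEs that follow the same reference parameter from the same
// resource type; each inner list holds (CTE number, source CTE number) pairs in query order.
List<List<(int Cte, int SourceCte)>> FindMergeCandidates(string sql) =>
    includeCte.Matches(sql)
        .Select(m => (
            Key: (SearchParamId: m.Groups[2].Value, ResourceTypeId: m.Groups[3].Value),
            Cte: int.Parse(m.Groups[1].Value),
            SourceCte: int.Parse(m.Groups[4].Value)))
        .GroupBy(x => x.Key)
        .Where(g => g.Count() > 1)
        .Select(g => g.Select(x => (x.Cte, x.SourceCte)).ToList())
        .ToList();
```

Applied to a condensed version of the example query further down, this would pair cte8 (reading cte3) with cte10 (reading cte7), since both follow SearchParamId 470 from ResourceTypeId 44.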

Review comment (Member):

Are we sure this is simpler than fixing the actual generator? Should this be a rewriter instead?

Reply (Collaborator, Author):

I can take another look. Changing the generator looked more complicated.
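For comparison with the regex post-processing, here is a rough idea of what the reviewer's "fix the generator" alternative might look like. The include steps are grouped before any SQL is emitted, and one EXISTS predicate is built per (SearchParamId, ResourceTypeId) pair. This is a hypothetical sketch, not the FHIR server's generator API. BuildExistsPredicate and PlanIncludes are invented names. The sketch does not model the TOP (@p2)/IsPartial cap that cte9 and cte13 apply in the original query, and it does not rewrite downstream CTEs that read from a merged one. In the example, cte12 read cte9, and after the merge only cte14, reading cte11, remains.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical generator-side helper: one EXISTS predicate that accepts rows from any of
// the given source CTEs, in the same shape as cte10 in the "after" output of the example.
string BuildExistsPredicate(IEnumerable<string> sourceCtes) =>
    "EXISTS (" + string.Join(" UNION ", sourceCtes.Select(cte =>
        $"SELECT * FROM {cte} WHERE refSource.ResourceTypeId = T1 AND refSource.ResourceSurrogateId = Sid1")) + ")";

// Groups include steps by (SearchParamId, ResourceTypeId) so that each pair produces a
// single include CTE instead of one CTE per source CTE.
Dictionary<(int SearchParamId, int ResourceTypeId), string> PlanIncludes(
    IEnumerable<(int SearchParamId, int ResourceTypeId, string SourceCte)> steps) =>
    steps.GroupBy(s => (s.SearchParamId, s.ResourceTypeId))
         .ToDictionary(g => g.Key, g => BuildExistsPredicate(g.Select(s => s.SourceCte).Distinct()));
```

The appeal of this shape is that the merge decision is made on structured data rather than on the exact whitespace of the emitted SQL, which the RemoveCteMatchBase pattern depends on.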

LTA-Thinking (Collaborator, Author):

For the query:
GET https://{{hostname}}/DiagnosticReport?_include=DiagnosticReport:encounter&_include:iterate=Location:organization&_include:iterate=DiagnosticReport:based-on&_include:iterate=Encounter:location&_include:iterate=ServiceRequest:encounter

Original:

      DECLARE @p0 int = 11
      DECLARE @p1 int = 11
      DECLARE @p2 int = 100
      DECLARE @p3 int = 100
      DECLARE @p4 int = 100
      DECLARE @p5 int = 100

      SET STATISTICS IO ON;
      SET STATISTICS TIME ON;

      DECLARE @FilteredData AS TABLE (T1 smallint, Sid1 bigint, IsMatch bit, IsPartial bit, Row int)
      ;WITH
      cte0 AS
      (
          SELECT ResourceTypeId AS T1, ResourceSurrogateId AS Sid1
          FROM dbo.Resource
          WHERE IsHistory = 0
              AND IsDeleted = 0
              AND ResourceTypeId = 40
      )
      ,cte1 AS
      (
          SELECT row_number() OVER (ORDER BY T1 ASC, Sid1 ASC) AS Row, *
          FROM
          (
              SELECT DISTINCT TOP (@p0) T1, Sid1, 1 AS IsMatch, 0 AS IsPartial
              FROM cte0
              ORDER BY T1 ASC, Sid1 ASC
          ) t
      )
      INSERT INTO @FilteredData SELECT T1, Sid1, IsMatch, IsPartial, Row FROM cte1
      ;WITH cte1 AS (SELECT * FROM @FilteredData)
      ,cte2 AS
      (
          SELECT DISTINCT refTarget.ResourceTypeId AS T1, refTarget.ResourceSurrogateId AS Sid1, 0 AS IsMatch
          FROM dbo.ReferenceSearchParam refSource
               JOIN dbo.Resource refTarget ON refSource.ReferenceResourceTypeId = refTarget.ResourceTypeId AND refSource.ReferenceResourceId = refTarget.ResourceId
          WHERE refSource.SearchParamId = 204
              AND refTarget.IsHistory = 0
              AND refTarget.IsDeleted = 0
              AND refSource.ResourceTypeId IN (40)
              AND EXISTS (SELECT * FROM cte1 WHERE refSource.ResourceTypeId = T1 AND refSource.ResourceSurrogateId = Sid1 AND Row < @p1)
      )
      ,cte3 AS
      (
          SELECT DISTINCT T1, Sid1, IsMatch, 0 AS IsPartial
          FROM cte2
      )
      ,cte4 AS
      (
          SELECT DISTINCT refTarget.ResourceTypeId AS T1, refTarget.ResourceSurrogateId AS Sid1, 0 AS IsMatch
          FROM dbo.ReferenceSearchParam refSource
               JOIN dbo.Resource refTarget ON refSource.ReferenceResourceTypeId = refTarget.ResourceTypeId AND refSource.ReferenceResourceId = refTarget.ResourceId
          WHERE refSource.SearchParamId = 404
              AND refTarget.IsHistory = 0
              AND refTarget.IsDeleted = 0
              AND refSource.ResourceTypeId IN (40)
              AND EXISTS (SELECT * FROM cte1 WHERE refSource.ResourceTypeId = T1 AND refSource.ResourceSurrogateId = Sid1)
      )
      ,cte5 AS
      (
          SELECT DISTINCT T1, Sid1, IsMatch, 0 AS IsPartial
          FROM cte4
      )
      ,cte6 AS
      (
          SELECT DISTINCT refTarget.ResourceTypeId AS T1, refTarget.ResourceSurrogateId AS Sid1, 0 AS IsMatch
          FROM dbo.ReferenceSearchParam refSource
               JOIN dbo.Resource refTarget ON refSource.ReferenceResourceTypeId = refTarget.ResourceTypeId AND refSource.ReferenceResourceId = refTarget.ResourceId
          WHERE refSource.SearchParamId = 204
              AND refTarget.IsHistory = 0
              AND refTarget.IsDeleted = 0
              AND refSource.ResourceTypeId IN (124)
              AND EXISTS (SELECT * FROM cte5 WHERE refSource.ResourceTypeId = T1 AND refSource.ResourceSurrogateId = Sid1)
      )
      ,cte7 AS
      (
          SELECT DISTINCT T1, Sid1, IsMatch, 0 AS IsPartial
          FROM cte6
      )
      ,cte8 AS
      (
          SELECT DISTINCT refTarget.ResourceTypeId AS T1, refTarget.ResourceSurrogateId AS Sid1, 0 AS IsMatch
          FROM dbo.ReferenceSearchParam refSource
               JOIN dbo.Resource refTarget ON refSource.ReferenceResourceTypeId = refTarget.ResourceTypeId AND refSource.ReferenceResourceId = refTarget.ResourceId
          WHERE refSource.SearchParamId = 470
              AND refTarget.IsHistory = 0
              AND refTarget.IsDeleted = 0
              AND refSource.ResourceTypeId IN (44)
              AND EXISTS (SELECT * FROM cte3 WHERE refSource.ResourceTypeId = T1 AND refSource.ResourceSurrogateId = Sid1)
      )
      ,cte9 AS
      (
          SELECT DISTINCT TOP (@p2) T1, Sid1, IsMatch, CASE WHEN count_big(*) over() > @p3 THEN 1 ELSE 0 END AS IsPartial
          FROM cte8
      )
      ,cte10 AS
      (
          SELECT DISTINCT refTarget.ResourceTypeId AS T1, refTarget.ResourceSurrogateId AS Sid1, 0 AS IsMatch
          FROM dbo.ReferenceSearchParam refSource
               JOIN dbo.Resource refTarget ON refSource.ReferenceResourceTypeId = refTarget.ResourceTypeId AND refSource.ReferenceResourceId = refTarget.ResourceId
          WHERE refSource.SearchParamId = 470
              AND refTarget.IsHistory = 0
              AND refTarget.IsDeleted = 0
              AND refSource.ResourceTypeId IN (44)
              AND EXISTS (SELECT * FROM cte7 WHERE refSource.ResourceTypeId = T1 AND refSource.ResourceSurrogateId = Sid1)
      )
      ,cte11 AS
      (
          SELECT DISTINCT T1, Sid1, IsMatch, 0 AS IsPartial
          FROM cte10
      )
      ,cte12 AS
      (
          SELECT DISTINCT refTarget.ResourceTypeId AS T1, refTarget.ResourceSurrogateId AS Sid1, 0 AS IsMatch
          FROM dbo.ReferenceSearchParam refSource
               JOIN dbo.Resource refTarget ON refSource.ReferenceResourceTypeId = refTarget.ResourceTypeId AND refSource.ReferenceResourceId = refTarget.ResourceId
          WHERE refSource.SearchParamId = 770
              AND refTarget.IsHistory = 0
              AND refTarget.IsDeleted = 0
              AND refSource.ResourceTypeId IN (71)
              AND EXISTS (SELECT * FROM cte9 WHERE refSource.ResourceTypeId = T1 AND refSource.ResourceSurrogateId = Sid1)
      )
      ,cte13 AS
      (
          SELECT DISTINCT TOP (@p4) T1, Sid1, IsMatch, CASE WHEN count_big(*) over() > @p5 THEN 1 ELSE 0 END AS IsPartial
          FROM cte12
      )
      ,cte14 AS
      (
          SELECT DISTINCT refTarget.ResourceTypeId AS T1, refTarget.ResourceSurrogateId AS Sid1, 0 AS IsMatch
          FROM dbo.ReferenceSearchParam refSource
               JOIN dbo.Resource refTarget ON refSource.ReferenceResourceTypeId = refTarget.ResourceTypeId AND refSource.ReferenceResourceId = refTarget.ResourceId
          WHERE refSource.SearchParamId = 770
              AND refTarget.IsHistory = 0
              AND refTarget.IsDeleted = 0
              AND refSource.ResourceTypeId IN (71)
              AND EXISTS (SELECT * FROM cte11 WHERE refSource.ResourceTypeId = T1 AND refSource.ResourceSurrogateId = Sid1)
      )
      ,cte15 AS
      (
          SELECT DISTINCT T1, Sid1, IsMatch, 0 AS IsPartial
          FROM cte14
      )
      ,cte16 AS
      (
          SELECT T1, Sid1, IsMatch, IsPartial
          FROM cte1
          UNION ALL
          SELECT T1, Sid1, IsMatch, IsPartial
          FROM cte3 WHERE NOT EXISTS (SELECT * FROM cte1 WHERE cte1.Sid1 = cte3.Sid1 AND cte1.T1 = cte3.T1)
          UNION ALL
          SELECT T1, Sid1, IsMatch, IsPartial
          FROM cte5 WHERE NOT EXISTS (SELECT * FROM cte1 WHERE cte1.Sid1 = cte5.Sid1 AND cte1.T1 = cte5.T1)
          UNION ALL
          SELECT T1, Sid1, IsMatch, IsPartial
          FROM cte7 WHERE NOT EXISTS (SELECT * FROM cte1 WHERE cte1.Sid1 = cte7.Sid1 AND cte1.T1 = cte7.T1)
          UNION ALL
          SELECT T1, Sid1, IsMatch, IsPartial
          FROM cte9 WHERE NOT EXISTS (SELECT * FROM cte1 WHERE cte1.Sid1 = cte9.Sid1 AND cte1.T1 = cte9.T1)
          UNION ALL
          SELECT T1, Sid1, IsMatch, IsPartial
          FROM cte11 WHERE NOT EXISTS (SELECT * FROM cte1 WHERE cte1.Sid1 = cte11.Sid1 AND cte1.T1 = cte11.T1)
          UNION ALL
          SELECT T1, Sid1, IsMatch, IsPartial
          FROM cte13 WHERE NOT EXISTS (SELECT * FROM cte1 WHERE cte1.Sid1 = cte13.Sid1 AND cte1.T1 = cte13.T1)
          UNION ALL
          SELECT T1, Sid1, IsMatch, IsPartial
          FROM cte15 WHERE NOT EXISTS (SELECT * FROM cte1 WHERE cte1.Sid1 = cte15.Sid1 AND cte1.T1 = cte15.T1)
      )
      SELECT DISTINCT r.ResourceTypeId, r.ResourceId, r.Version, r.IsDeleted, r.ResourceSurrogateId, r.RequestMethod, CAST(IsMatch AS bit) AS IsMatch, CAST(IsPartial AS bit) AS IsPartial, r.IsRawResourceMetaSet, r.SearchParamHash, r.RawResource
      FROM dbo.Resource r
           JOIN cte16 ON r.ResourceTypeId = cte16.T1 AND r.ResourceSurrogateId = cte16.Sid1
      WHERE IsHistory = 0
          AND IsDeleted = 0
      ORDER BY IsMatch DESC, r.ResourceTypeId ASC, r.ResourceSurrogateId ASC

After the change:

      DECLARE @p0 int = 11
      DECLARE @p1 int = 11
      DECLARE @p2 int = 100
      DECLARE @p3 int = 100
      DECLARE @p4 int = 100
      DECLARE @p5 int = 100

      SET STATISTICS IO ON;
      SET STATISTICS TIME ON;

      DECLARE @FilteredData AS TABLE (T1 smallint, Sid1 bigint, IsMatch bit, IsPartial bit, Row int)
      ;WITH
      cte0 AS
      (
          SELECT ResourceTypeId AS T1, ResourceSurrogateId AS Sid1
          FROM dbo.Resource
          WHERE IsHistory = 0
              AND IsDeleted = 0
              AND ResourceTypeId = 40
      )
      ,cte1 AS
      (
          SELECT row_number() OVER (ORDER BY T1 ASC, Sid1 ASC) AS Row, *
          FROM
          (
              SELECT DISTINCT TOP (@p0) T1, Sid1, 1 AS IsMatch, 0 AS IsPartial
              FROM cte0
              ORDER BY T1 ASC, Sid1 ASC
          ) t
      )
      INSERT INTO @FilteredData SELECT T1, Sid1, IsMatch, IsPartial, Row FROM cte1
      ;WITH cte1 AS (SELECT * FROM @FilteredData)
      ,cte2 AS
      (
          SELECT DISTINCT refTarget.ResourceTypeId AS T1, refTarget.ResourceSurrogateId AS Sid1, 0 AS IsMatch
          FROM dbo.ReferenceSearchParam refSource
               JOIN dbo.Resource refTarget ON refSource.ReferenceResourceTypeId = refTarget.ResourceTypeId AND refSource.ReferenceResourceId = refTarget.ResourceId
          WHERE refSource.SearchParamId = 204
              AND refTarget.IsHistory = 0
              AND refTarget.IsDeleted = 0
              AND refSource.ResourceTypeId IN (40)
              AND EXISTS (SELECT * FROM cte1 WHERE refSource.ResourceTypeId = T1 AND refSource.ResourceSurrogateId = Sid1 AND Row < @p1)
      )
      ,cte3 AS
      (
          SELECT DISTINCT T1, Sid1, IsMatch, 0 AS IsPartial
          FROM cte2
      )
      ,cte4 AS
      (
          SELECT DISTINCT refTarget.ResourceTypeId AS T1, refTarget.ResourceSurrogateId AS Sid1, 0 AS IsMatch
          FROM dbo.ReferenceSearchParam refSource
               JOIN dbo.Resource refTarget ON refSource.ReferenceResourceTypeId = refTarget.ResourceTypeId AND refSource.ReferenceResourceId = refTarget.ResourceId
          WHERE refSource.SearchParamId = 404
              AND refTarget.IsHistory = 0
              AND refTarget.IsDeleted = 0
              AND refSource.ResourceTypeId IN (40)
              AND EXISTS (SELECT * FROM cte1 WHERE refSource.ResourceTypeId = T1 AND refSource.ResourceSurrogateId = Sid1)
      )
      ,cte5 AS
      (
          SELECT DISTINCT T1, Sid1, IsMatch, 0 AS IsPartial
          FROM cte4
      )
      ,cte6 AS
      (
          SELECT DISTINCT refTarget.ResourceTypeId AS T1, refTarget.ResourceSurrogateId AS Sid1, 0 AS IsMatch
          FROM dbo.ReferenceSearchParam refSource
               JOIN dbo.Resource refTarget ON refSource.ReferenceResourceTypeId = refTarget.ResourceTypeId AND refSource.ReferenceResourceId = refTarget.ResourceId
          WHERE refSource.SearchParamId = 204
              AND refTarget.IsHistory = 0
              AND refTarget.IsDeleted = 0
              AND refSource.ResourceTypeId IN (124)
              AND EXISTS (SELECT * FROM cte5 WHERE refSource.ResourceTypeId = T1 AND refSource.ResourceSurrogateId = Sid1)
      )
      ,cte7 AS
      (
          SELECT DISTINCT T1, Sid1, IsMatch, 0 AS IsPartial
          FROM cte6
      )
      ,cte10 AS
      (
          SELECT DISTINCT refTarget.ResourceTypeId AS T1, refTarget.ResourceSurrogateId AS Sid1, 0 AS IsMatch
          FROM dbo.ReferenceSearchParam refSource
               JOIN dbo.Resource refTarget ON refSource.ReferenceResourceTypeId = refTarget.ResourceTypeId AND refSource.ReferenceResourceId = refTarget.ResourceId
          WHERE refSource.SearchParamId = 470
              AND refTarget.IsHistory = 0
              AND refTarget.IsDeleted = 0
              AND refSource.ResourceTypeId IN (44)
              AND EXISTS (SELECT * FROM cte7 WHERE refSource.ResourceTypeId = T1 AND refSource.ResourceSurrogateId = Sid1 UNION SELECT * FROM cte3 WHERE refSource.ResourceTypeId = T1 AND refSource.ResourceSurrogateId = Sid1)
      )
      ,cte11 AS
      (
          SELECT DISTINCT T1, Sid1, IsMatch, 0 AS IsPartial
          FROM cte10
      )
      ,cte14 AS
      (
          SELECT DISTINCT refTarget.ResourceTypeId AS T1, refTarget.ResourceSurrogateId AS Sid1, 0 AS IsMatch
          FROM dbo.ReferenceSearchParam refSource
               JOIN dbo.Resource refTarget ON refSource.ReferenceResourceTypeId = refTarget.ResourceTypeId AND refSource.ReferenceResourceId = refTarget.ResourceId
          WHERE refSource.SearchParamId = 770
              AND refTarget.IsHistory = 0
              AND refTarget.IsDeleted = 0
              AND refSource.ResourceTypeId IN (71)
              AND EXISTS (SELECT * FROM cte11 WHERE refSource.ResourceTypeId = T1 AND refSource.ResourceSurrogateId = Sid1)
      )
      ,cte15 AS
      (
          SELECT DISTINCT T1, Sid1, IsMatch, 0 AS IsPartial
          FROM cte14
      )
      ,cte16 AS
      (
          SELECT T1, Sid1, IsMatch, IsPartial
          FROM cte1
          UNION ALL
          SELECT T1, Sid1, IsMatch, IsPartial
          FROM cte3 WHERE NOT EXISTS (SELECT * FROM cte1 WHERE cte1.Sid1 = cte3.Sid1 AND cte1.T1 = cte3.T1)
          UNION ALL
          SELECT T1, Sid1, IsMatch, IsPartial
          FROM cte5 WHERE NOT EXISTS (SELECT * FROM cte1 WHERE cte1.Sid1 = cte5.Sid1 AND cte1.T1 = cte5.T1)
          UNION ALL
          SELECT T1, Sid1, IsMatch, IsPartial
          FROM cte7 WHERE NOT EXISTS (SELECT * FROM cte1 WHERE cte1.Sid1 = cte7.Sid1 AND cte1.T1 = cte7.T1)
          UNION ALL
          SELECT T1, Sid1, IsMatch, IsPartial
          FROM cte11 WHERE NOT EXISTS (SELECT * FROM cte1 WHERE cte1.Sid1 = cte11.Sid1 AND cte1.T1 = cte11.T1)
          UNION ALL
          SELECT T1, Sid1, IsMatch, IsPartial
          FROM cte15 WHERE NOT EXISTS (SELECT * FROM cte1 WHERE cte1.Sid1 = cte15.Sid1 AND cte1.T1 = cte15.T1)
      )
      SELECT DISTINCT r.ResourceTypeId, r.ResourceId, r.Version, r.IsDeleted, r.ResourceSurrogateId, r.RequestMethod, CAST(IsMatch AS bit) AS IsMatch, CAST(IsPartial AS bit) AS IsPartial, r.IsRawResourceMetaSet, r.SearchParamHash, r.RawResource
      FROM dbo.Resource r
           JOIN cte16 ON r.ResourceTypeId = cte16.T1 AND r.ResourceSurrogateId = cte16.Sid1
      WHERE IsHistory = 0
          AND IsDeleted = 0
      ORDER BY IsMatch DESC, r.ResourceTypeId ASC, r.ResourceSurrogateId ASC
