Required lines to display in table which is not in data #32

Open
NNaikp opened this issue Dec 13, 2021 · 3 comments

Comments

@NNaikp

NNaikp commented Dec 13, 2021

We often want to, or are required to, display all possible CRF responses in the output table even though not all of them are reported in the trial. This is the case for, e.g., adjudication tables:

image

It happens every now and then that you only have a single or a few of these events, but still want to display all possible responses.
Previously the programmers just hardcoded it somewhere in the program. There are multiple issues with this:

  • Handling of events that are not there now, but might appear in the future.
  • If the output is parallel programmed/double programmed, what should then be kept in the underlying output datasets?
  • Lengthy text outputs with almost no underlying data make all programmers tired.

We have added an input to our TFL package that takes the required lines and places them. I wonder if some of the packages out there already have such a solution? Or maybe your company has done something better? 😃

(Excerpt from one of our vignettes)

[5 images: vignette excerpt]
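
For readers who cannot see the vignette images, the general "required lines" idea can be pictured independently of any package: keep the rows the output must show in a small data frame and join the observed counts onto it. The sketch below uses base R only. The helper name `fill_required_lines`, the column names, and the response values are all made up for illustration; this is not the API of the TFL package mentioned above.

```r
# Hypothetical helper: every required row appears in the result,
# with a count of 0 where nothing was observed.
fill_required_lines <- function(required, observed, by = "response") {
  out <- merge(required, observed, by = by, all.x = TRUE, sort = FALSE)
  out$n[is.na(out$n)] <- 0L
  # merge() puts unmatched rows last, so restore the order given in `required`
  out[match(required[[by]], out[[by]]), , drop = FALSE]
}

# Invented example values, not taken from any real CRF
required <- data.frame(
  response = c("Confirmed", "Not confirmed", "Not assessable"),
  stringsAsFactors = FALSE
)
observed <- data.frame(response = "Confirmed", n = 2L, stringsAsFactors = FALSE)

res <- fill_required_lines(required, observed)
print(res, row.names = FALSE)
```

Note that in this sketch an observed response that is missing from `required` would be silently dropped, so a real implementation would probably want to warn about that case.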

@gmbecker
Collaborator

gmbecker commented Jan 6, 2022

Hi @NNaikp,

Speaking only for the approach you'd take with rtables: the typical table you'd make to display this information is slightly different (see below), but the key points are that a) you can have distinct "analyses" that operate on different variables, and b) it's straightforward to write the analysis functions so that factor levels, rather than observed values, dictate how many rows show up in an analysis block.

We'll use a slightly modified version of ex_adsl provided by rtables:

library(rtables)

adsl2 <- ex_adsl
## a factor with declared levels but no observed values at all
adsl2$smoker <- factor(NA, levels = c("10 cigarettes", ">10 cigarettes"))
adsl2$age_grp <- cut(adsl2$AGE, c(18, 65, 75, 1000),
                     labels = c("18 <= to < 65",
                                "65 <= to < 75",
                                "Elderly >= 75"))

## make one of the factor levels of SEX variable empty
adsl2 <- subset(adsl2, SEX != "UNDIFFERENTIATED")

We then write an analysis function, which we will use for all the variables. If we didn't want the percentages, we would not need to specify an analysis function at all. Analysis functions are passed the column observation count as .N_col and can use it during cell content generation; we use it here to compute the percentages.

## helper that omits the pct entirely if the count is 0
count_pct <- function(x, .N_col, ...) {
    if( x == 0 ) {
        rcell(0, format = "xx")
    } else {
        rcell(c(x, x/.N_col), format = "xx (xx.x%)")
    }
}

## analysis function: table factor then apply above to get our cell values
tab_w_pct <- function(x, .N_col, ...) {
    tab <- as.list(table(x))
    lapply(tab, count_pct, .N_col = .N_col)
}
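
The reason every level gets a row is plain base R behaviour: `table()` on a factor counts each declared level, including levels with no observations. A small sketch without rtables, using made-up values (the vector `sex` and the denominator `n_col` are illustrative only, and the string formatting merely imitates the "xx (xx.x%)" cell format):

```r
# Four declared levels, only two of them observed
sex <- factor(c("F", "M", "F"), levels = c("F", "M", "U", "UNDIFFERENTIATED"))

counts <- as.list(table(sex))  # one named element per level, zeros included
n_col <- length(sex)           # stand-in for the column count

# Same branching as count_pct: bare 0 for empty levels, count plus percent otherwise
fmt <- vapply(counts, function(n) {
  if (n == 0) "0" else sprintf("%d (%.1f%%)", n, 100 * n / n_col)
}, character(1))

print(fmt)
```

If `sex` were a character vector instead of a factor, `table()` would only return the observed values and the empty rows would disappear.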

With that done the layout simply analyzes each of the desired variables:

lyt <- basic_table(show_colcounts = TRUE) %>%
    split_cols_by("ARM") %>%
    analyze("SEX", tab_w_pct, var_labels = "Gender") %>%
    analyze("smoker", tab_w_pct) %>%
    analyze("age_grp", tab_w_pct)


build_table(lyt, adsl2)

Which gives us

                      A: Drug X      B: Placebo    C: Combination
                       (N=133)        (N=134)         (N=130)    
—————————————————————————————————————————————————————————————————
Gender                                                           
  F                   79 (59.4%)     77 (57.5%)      66 (50.8%)  
  M                   51 (38.3%)     55 (41.0%)      60 (46.2%)  
  U                    3 (2.3%)       2 (1.5%)        4 (3.1%)   
  UNDIFFERENTIATED        0              0               0       
smoker                                                           
  10 cigarettes           0              0               0       
  >10 cigarettes          0              0               0       
age_grp                                                          
  18 <= to < 65      133 (100.0%)   134 (100.0%)    129 (99.2%)  
  65 <= to < 75           0              0            1 (0.8%)   
  Elderly >= 75           0              0               0       

Now as I noted this table is slightly different, as the patient counts are in the column header area rather than as a separate row.

To get something more exactly like the displayed table we could do:

lyt2 <- basic_table() %>%
    split_cols_by("ARM") %>%
    analyze("USUBJID", function(x) in_rows("Number of Patients" = length(unique(x))),
            show_labels = "hidden") %>%
    analyze("SEX", tab_w_pct, var_labels = "Gender") %>%
    analyze("smoker", tab_w_pct) %>%
    analyze("age_grp", tab_w_pct)


build_table(lyt2, adsl2)
                       A: Drug X      B: Placebo    C: Combination
—————————————————————————————————————————————————————————————————
Number of Patients       133            134             130      
Gender                                                           
  F                   79 (59.4%)     77 (57.5%)      66 (50.8%)  
  M                   51 (38.3%)     55 (41.0%)      60 (46.2%)  
  U                    3 (2.3%)       2 (1.5%)        4 (3.1%)   
  UNDIFFERENTIATED        0              0               0       
smoker                                                           
  10 cigarettes           0              0               0       
  >10 cigarettes          0              0               0       
age_grp                                                          
  18 <= to < 65      133 (100.0%)   134 (100.0%)    129 (99.2%)  
  65 <= to < 75           0              0            1 (0.8%)   
  Elderly >= 75           0              0               0       

Note that the percentages are still calculated from the column counts, not from the values in the "Number of Patients" row. Access to previously calculated row counts/data is possible only for the subtable structures an analysis is nested within.
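
As a quick check of which denominator is in play, the percentages in the "F" row can be reproduced by hand from the column counts (133, 134, 130) and the cell counts printed in the table. This is only arithmetic on the printed numbers, not a call into rtables:

```r
# Column counts and "F" cell counts copied from the table output
n_col <- c("A: Drug X" = 133, "B: Placebo" = 134, "C: Combination" = 130)
n_f   <- c(79, 77, 66)

pct <- round(100 * n_f / n_col, 1)
print(pct)
```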

If we wanted the total patient counts to be reprinted as context after pagination (which might make sense or very much not, depending on the goals), we would want that to be what we call a "group summary row" or more technically a "content row", which we would do like so:

lyt3 <- basic_table() %>%
    split_cols_by("ARM") %>%
    summarize_row_groups("USUBJID", label_fstr = "Number of Patients", format = "xx") %>%
    analyze("SEX", tab_w_pct, var_labels = "Gender", indent_mod = -1) %>%
    analyze("smoker", tab_w_pct, indent_mod = -1) %>%
    analyze("age_grp", tab_w_pct, indent_mod = -1)

tab <- build_table(lyt3, adsl2)

paginate_table(tab, lpp = 10)

Which gives us:

[[1]]
                     A: Drug X    B: Placebo   C: Combination
—————————————————————————————————————————————————————————————
Number of Patients      133          134            130      
Gender                                                       
  F                  79 (59.4%)   77 (57.5%)     66 (50.8%)  
  M                  51 (38.3%)   55 (41.0%)     60 (46.2%)  
  U                   3 (2.3%)     2 (1.5%)       4 (3.1%)   
  UNDIFFERENTIATED       0            0              0       

[[2]]
                      A: Drug X      B: Placebo    C: Combination
—————————————————————————————————————————————————————————————————
Number of Patients       133            134             130      
  smoker                                                         
    10 cigarettes         0              0               0       
    >10 cigarettes        0              0               0       
  age_grp                                                        
    18 <= to < 65    133 (100.0%)   134 (100.0%)    129 (99.2%)  
    65 <= to < 75         0              0            1 (0.8%)   
    Elderly >= 75         0              0               0       

@elimillera
Member

Hey @NNaikp
Giving the Tplyr approach here. The idea is similar to the above. Tplyr uses all factor levels when displaying counts, including levels with no observations:

library(Tplyr)
library(dplyr)  # mutate(), select(), starts_with()

## declare all possible levels, including "Unknown", which is never observed
adsl <- safetyData::adam_adsl %>%
  mutate(
    AGEGR1 = factor(AGEGR1, c("<65", "65-80", ">80", "Unknown")),
    BMIBLGR1 = factor(BMIBLGR1, c("<25", "25-<30", ">=30", "Unknown"))
  )

t <- tplyr_table(adsl, TRT01A) %>%
  add_layer(
    group_count("Number of Subjects")
  ) %>%
  add_layer(
    group_count(AGEGR1, by = "AGE")
  )

t %>%
  build() %>%
  apply_row_masks(row_breaks = TRUE) %>%
  select(-starts_with("ord")) %>%
  add_column_headers(
    ## header string kept on one logical line so no newline ends up in a header cell
    paste0("| | Placebo (N=**Placebo**) | Xanomeline High (N=**Xanomeline High Dose**) | ",
           "Xanomeline Low (N=**Xanomeline Low Dose**)"),
    header_n(t)
  )
# A tibble: 8 × 5
  row_label1           row_label2 var1_Placebo     `var1_Xanomeline High Dose` `var1_Xanomeline Low D…
  <chr>                <chr>      <chr>            <chr>                       <chr>                  
1 ""                   ""         "Placebo (N=86)" "Xanomeline High (N=84)"    "Xanomeline Low (N=84)"
2 "Number of Subjects" ""         "86 (100.0%)"    "84 (100.0%)"               "84 (100.0%)"          
3 ""                   ""         ""               ""                          ""                     
4 "AGE"                "<65"      "14 ( 16.3%)"    "11 ( 13.1%)"               " 8 (  9.5%)"          
5 ""                   "65-80"    "42 ( 48.8%)"    "55 ( 65.5%)"               "47 ( 56.0%)"          
6 ""                   ">80"      "30 ( 34.9%)"    "18 ( 21.4%)"               "29 ( 34.5%)"          
7 ""                   "Unknown"  " 0 (  0.0%)"    " 0 (  0.0%)"               " 0 (  0.0%)"          
8 ""                   ""         ""               ""                          ""              

Tplyr can also include event counts alongside total count rows, which looks to be one of the requirements of the table above.

@NNaikp
Author

NNaikp commented Feb 5, 2022

Thanks @gmbecker and @elimillera for the elaborate replies!
And thanks for bringing up the use of factors when deriving summary statistics. We have primarily been using factors created from the col/colN pairs in ADaM to handle sorting so we didn't really think of this. I can see now that bringing codelists into play and creating factors based on that also makes a lot of sense 👍
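
The codelist idea could look roughly like this in base R. The `codelist` data frame and its column names (`code`, `decode`) are invented for illustration and do not come from a specific ADaM dataset or codelist standard; the point is only that the levels come from the codelist, ordered by the numeric code, rather than from the observed data:

```r
# Hypothetical codelist: numeric code gives the sort order, decode gives the label
codelist <- data.frame(
  code   = c(3, 1, 2),
  decode = c(">80", "<65", "65-80"),
  stringsAsFactors = FALSE
)

# Levels are every decode in the codelist, sorted by code
levs <- codelist$decode[order(codelist$code)]

# Observed values happen to contain no ">80" subjects
agegr <- factor(c("65-80", "<65", "65-80"), levels = levs)

tab <- table(agegr)
print(tab)
```

The resulting factor handles both concerns at once: sorting follows the numeric code, and the unobserved ">80" level still shows up with a zero count.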
