Define Raw Data Ranges directly or via (…) #10

playmobilmeister · 2023-06-06T10:14:00Z

My first use involves 3 types of data in each sheet that I need to record

symbol	denotes
➟	The transformation That is required in my case
A➟B	A in Excel Sheet will transform to B in the final Data Frame

~~non-calculated single-cell case ➟ column names~~
Raw data table¹ ➟ column name
Combine input from all Files to a Sheet Name column and a File Name column

I assume that if you would have wanted to handle the latter 2 tasks, you would have create additional functions that utilize 📦TidyXL:: or something similar ?

¹ This is analogous to capturing the actual time-sheet entries in the published example

Describe the solution you'd like
I would like to be able to define not only column names as in the example's {{ column-name }} , but also to be able to define raw data ranges. In the image below I give an example (from real life) where the top part of the form includes some fields that are already captured well by xlcutter::, and the bottom part of the form includes the raw observations. These raw observations are the essential data, but to each of their lines the columns such as {{ Reservoir_Name }} must be added,as are the names of the file and sheet where they came from. In addition, I'll need to add a skip rows = argument (As many people made changes to the forms, but for now I can live without it.

Additional context
Form example

The text was updated successfully, but these errors were encountered:

Bisaloo · 2023-06-09T09:06:56Z

Thanks for opening this issue!

My main question about your use case is how you expect the data from different excel files to be consolidated into a single file.

The assumption of this package is that each excel file will end up in a single row. It would probably be technical possible (although not super straightforward) to allow ranges but I'm concerned about the conceptual & design implications. With the change you are proposing, a row could either correspond to an entire file, or an element of this file.

As a consequence, I'm not inclined to do this change. However, I think you're pointing out an interesting use case that should be addressed. Fortunately, xlcutter in its current state can already deal with it with a short data wrangling step.

Let's take again the example from the README but pass it a modified template (available in this GitHub repo):

It is then possible to get the output you're asking for with the following code:

library(xlcutter)

data_files <- list.files(
  system.file("example", "timesheet", package = "xlcutter"),
  pattern = "\\.xlsx$",
  full.names = TRUE
)

download.file(
  "https://github.com/Bisaloo/xlcutter/blob/30f6a8b1ac2a4dbb88d8742d197cd528c022117b/inst/example/expanded_timesheet_template.xlsx?raw=TRUE",
  "expanded_timesheet_template.xlsx",
  mode = "wb"
)

out <- xlsx_cutter(
  data_files,
  "expanded_timesheet_template.xlsx"
)

head(out)
#>   employee_firstname contract_hours employee_lastname realised_hours
#> 1               Leon             35              Bedu          29.00
#> 2               Paul             35            Dupont          35.00
#> 3           Marianne             35            Lebrun          36.25
#>   manager_firstname manager_lastname period_start period_end hours_mon
#> 1              <NA>           Dubois   2022-01-03 2022-01-07      7.25
#> 2             Lydia           Dubois   2022-01-03 2022-01-07      7.00
#> 3             Lydia           Dubois   2022-01-03 2022-01-07      7.25
#>   hours_tue hours_wed hours_thu hours_fri
#> 1      7.25      7.25      7.25        NA
#> 2      7.00      7.00      7.00      7.00
#> 3      7.25      7.25      7.25      7.25

library(tidyr)

outp <- pivot_longer(
  out,
  starts_with("hours"),
  names_to = "day",
  values_to = "hours"
)

head(outp)
#> # A tibble: 6 × 10
#>   employee_firstname contract_hours employee_lastname realised_hours
#>   <chr>                       <int> <chr>                      <dbl>
#> 1 Leon                           35 Bedu                          29
#> 2 Leon                           35 Bedu                          29
#> 3 Leon                           35 Bedu                          29
#> 4 Leon                           35 Bedu                          29
#> 5 Leon                           35 Bedu                          29
#> 6 Paul                           35 Dupont                        35
#> # ℹ 6 more variables: manager_firstname <chr>, manager_lastname <chr>,
#> #   period_start <chr>, period_end <chr>, day <chr>, hours <dbl>
head(outp[, c("employee_lastname", "day", "hours")])
#> # A tibble: 6 × 3
#>   employee_lastname day       hours
#>   <chr>             <chr>     <dbl>
#> 1 Bedu              hours_mon  7.25
#> 2 Bedu              hours_tue  7.25
#> 3 Bedu              hours_wed  7.25
#> 4 Bedu              hours_thu  7.25
#> 5 Bedu              hours_fri NA   
#> 6 Dupont            hours_mon  7

^{Created on 2023-06-09 with reprex v2.0.2}

Does this address your use case or did I misunderstand?

If it addresses your use case, I will add a section to the vignette to detail this advanced use case
If I misunderstood, please provide an example of what you expect as output

playmobilmeister · 2023-06-19T08:54:28Z

It's a fine suggetions which should work in many situations. However, it does not address my situation. I'll provide a detailed explanation soon, but the gist of it is that the number of observations is variable (from 0 to 100, let's say (would be line 13:112 in your example).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Define Raw Data Ranges directly or via (…) #10

Define Raw Data Ranges directly or via (…) #10

playmobilmeister commented Jun 6, 2023

Bisaloo commented Jun 9, 2023

playmobilmeister commented Jun 19, 2023

Define Raw Data Ranges directly or via (…) #10

Define Raw Data Ranges directly or via (…) #10

Comments

playmobilmeister commented Jun 6, 2023

Bisaloo commented Jun 9, 2023

playmobilmeister commented Jun 19, 2023