Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define Raw Data Ranges directly or via (…) #10

Open
1 of 3 tasks
playmobilmeister opened this issue Jun 6, 2023 · 2 comments
Open
1 of 3 tasks

Define Raw Data Ranges directly or via (…) #10

playmobilmeister opened this issue Jun 6, 2023 · 2 comments

Comments

@playmobilmeister
Copy link

My first use involves 3 types of data in each sheet that I need to record

symbol denotes
The transformation That is required in my case
A➟B A in Excel Sheet will transform to B in the final Data Frame
  • non-calculated single-cell case ➟ column names
  • Raw data table¹ ➟ column name
  • Combine input from all Files to a Sheet Name column and a File Name column

I assume that if you would have wanted to handle the latter 2 tasks, you would have create additional functions that utilize 📦TidyXL:: or something similar ?

¹ This is analogous to capturing the actual time-sheet entries in the published example

Describe the solution you'd like
I would like to be able to define not only column names as in the example's {{ column-name }} , but also to be able to define raw data ranges. In the image below I give an example (from real life) where the top part of the form includes some fields that are already captured well by xlcutter::, and the bottom part of the form includes the raw observations. These raw observations are the essential data, but to each of their lines the columns such as {{ Reservoir_Name }} must be added,as are the names of the file and sheet where they came from. In addition, I'll need to add a skip rows = argument (As many people made changes to the forms, but for now I can live without it.

Additional context
Form example
Form example

@Bisaloo
Copy link
Owner

Bisaloo commented Jun 9, 2023

Thanks for opening this issue!

My main question about your use case is how you expect the data from different excel files to be consolidated into a single file.

The assumption of this package is that each excel file will end up in a single row. It would probably be technical possible (although not super straightforward) to allow ranges but I'm concerned about the conceptual & design implications. With the change you are proposing, a row could either correspond to an entire file, or an element of this file.

As a consequence, I'm not inclined to do this change. However, I think you're pointing out an interesting use case that should be addressed. Fortunately, xlcutter in its current state can already deal with it with a short data wrangling step.

Let's take again the example from the README but pass it a modified template (available in this GitHub repo):

image

It is then possible to get the output you're asking for with the following code:

library(xlcutter)

data_files <- list.files(
  system.file("example", "timesheet", package = "xlcutter"),
  pattern = "\\.xlsx$",
  full.names = TRUE
)

download.file(
  "https://github.com/Bisaloo/xlcutter/blob/30f6a8b1ac2a4dbb88d8742d197cd528c022117b/inst/example/expanded_timesheet_template.xlsx?raw=TRUE",
  "expanded_timesheet_template.xlsx",
  mode = "wb"
)

out <- xlsx_cutter(
  data_files,
  "expanded_timesheet_template.xlsx"
)

head(out)
#>   employee_firstname contract_hours employee_lastname realised_hours
#> 1               Leon             35              Bedu          29.00
#> 2               Paul             35            Dupont          35.00
#> 3           Marianne             35            Lebrun          36.25
#>   manager_firstname manager_lastname period_start period_end hours_mon
#> 1              <NA>           Dubois   2022-01-03 2022-01-07      7.25
#> 2             Lydia           Dubois   2022-01-03 2022-01-07      7.00
#> 3             Lydia           Dubois   2022-01-03 2022-01-07      7.25
#>   hours_tue hours_wed hours_thu hours_fri
#> 1      7.25      7.25      7.25        NA
#> 2      7.00      7.00      7.00      7.00
#> 3      7.25      7.25      7.25      7.25

library(tidyr)

outp <- pivot_longer(
  out,
  starts_with("hours"),
  names_to = "day",
  values_to = "hours"
)

head(outp)
#> # A tibble: 6 × 10
#>   employee_firstname contract_hours employee_lastname realised_hours
#>   <chr>                       <int> <chr>                      <dbl>
#> 1 Leon                           35 Bedu                          29
#> 2 Leon                           35 Bedu                          29
#> 3 Leon                           35 Bedu                          29
#> 4 Leon                           35 Bedu                          29
#> 5 Leon                           35 Bedu                          29
#> 6 Paul                           35 Dupont                        35
#> # ℹ 6 more variables: manager_firstname <chr>, manager_lastname <chr>,
#> #   period_start <chr>, period_end <chr>, day <chr>, hours <dbl>
head(outp[, c("employee_lastname", "day", "hours")])
#> # A tibble: 6 × 3
#>   employee_lastname day       hours
#>   <chr>             <chr>     <dbl>
#> 1 Bedu              hours_mon  7.25
#> 2 Bedu              hours_tue  7.25
#> 3 Bedu              hours_wed  7.25
#> 4 Bedu              hours_thu  7.25
#> 5 Bedu              hours_fri NA   
#> 6 Dupont            hours_mon  7

Created on 2023-06-09 with reprex v2.0.2

Does this address your use case or did I misunderstand?

  • If it addresses your use case, I will add a section to the vignette to detail this advanced use case
  • If I misunderstood, please provide an example of what you expect as output

@playmobilmeister
Copy link
Author

It's a fine suggetions which should work in many situations. However, it does not address my situation. I'll provide a detailed explanation soon, but the gist of it is that the number of observations is variable (from 0 to 100, let's say (would be line 13:112 in your example).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants