-
-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Define Raw Data Ranges directly or via (…) #10
Comments
Thanks for opening this issue! My main question about your use case is how you expect the data from different excel files to be consolidated into a single file. The assumption of this package is that each excel file will end up in a single row. It would probably be technical possible (although not super straightforward) to allow ranges but I'm concerned about the conceptual & design implications. With the change you are proposing, a row could either correspond to an entire file, or an element of this file. As a consequence, I'm not inclined to do this change. However, I think you're pointing out an interesting use case that should be addressed. Fortunately, xlcutter in its current state can already deal with it with a short data wrangling step. Let's take again the example from the It is then possible to get the output you're asking for with the following code: library(xlcutter)
data_files <- list.files(
system.file("example", "timesheet", package = "xlcutter"),
pattern = "\\.xlsx$",
full.names = TRUE
)
download.file(
"https://github.com/Bisaloo/xlcutter/blob/30f6a8b1ac2a4dbb88d8742d197cd528c022117b/inst/example/expanded_timesheet_template.xlsx?raw=TRUE",
"expanded_timesheet_template.xlsx",
mode = "wb"
)
out <- xlsx_cutter(
data_files,
"expanded_timesheet_template.xlsx"
)
head(out)
#> employee_firstname contract_hours employee_lastname realised_hours
#> 1 Leon 35 Bedu 29.00
#> 2 Paul 35 Dupont 35.00
#> 3 Marianne 35 Lebrun 36.25
#> manager_firstname manager_lastname period_start period_end hours_mon
#> 1 <NA> Dubois 2022-01-03 2022-01-07 7.25
#> 2 Lydia Dubois 2022-01-03 2022-01-07 7.00
#> 3 Lydia Dubois 2022-01-03 2022-01-07 7.25
#> hours_tue hours_wed hours_thu hours_fri
#> 1 7.25 7.25 7.25 NA
#> 2 7.00 7.00 7.00 7.00
#> 3 7.25 7.25 7.25 7.25
library(tidyr)
outp <- pivot_longer(
out,
starts_with("hours"),
names_to = "day",
values_to = "hours"
)
head(outp)
#> # A tibble: 6 × 10
#> employee_firstname contract_hours employee_lastname realised_hours
#> <chr> <int> <chr> <dbl>
#> 1 Leon 35 Bedu 29
#> 2 Leon 35 Bedu 29
#> 3 Leon 35 Bedu 29
#> 4 Leon 35 Bedu 29
#> 5 Leon 35 Bedu 29
#> 6 Paul 35 Dupont 35
#> # ℹ 6 more variables: manager_firstname <chr>, manager_lastname <chr>,
#> # period_start <chr>, period_end <chr>, day <chr>, hours <dbl>
head(outp[, c("employee_lastname", "day", "hours")])
#> # A tibble: 6 × 3
#> employee_lastname day hours
#> <chr> <chr> <dbl>
#> 1 Bedu hours_mon 7.25
#> 2 Bedu hours_tue 7.25
#> 3 Bedu hours_wed 7.25
#> 4 Bedu hours_thu 7.25
#> 5 Bedu hours_fri NA
#> 6 Dupont hours_mon 7 Created on 2023-06-09 with reprex v2.0.2 Does this address your use case or did I misunderstand?
|
It's a fine suggetions which should work in many situations. However, it does not address my situation. I'll provide a detailed explanation soon, but the gist of it is that the number of observations is variable (from 0 to 100, let's say (would be line 13:112 in your example). |
My first use involves 3 types of data in each sheet that I need to record
non-calculated single-cell case ➟ column namesSheet Name
column and aFile Name
columnI assume that if you would have wanted to handle the latter 2 tasks, you would have create additional functions that utilize 📦
TidyXL::
or something similar ?¹ This is analogous to capturing the actual time-sheet entries in the published example
Describe the solution you'd like
I would like to be able to define not only column names as in the example's
{{ column-name }}
, but also to be able to define raw data ranges. In the image below I give an example (from real life) where the top part of the form includes some fields that are already captured well byxlcutter::
, and the bottom part of the form includes the raw observations. These raw observations are the essential data, but to each of their lines the columns such as{{ Reservoir_Name }}
must be added,as are the names of the file and sheet where they came from. In addition, I'll need to add askip rows =
argument (As many people made changes to the forms, but for now I can live without it.Additional context
![Form example](https://private-user-images.githubusercontent.com/33288162/243658858-902f9e27-e55d-44e8-9c32-1fa7ea7b5a11.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkzNzUyNjgsIm5iZiI6MTczOTM3NDk2OCwicGF0aCI6Ii8zMzI4ODE2Mi8yNDM2NTg4NTgtOTAyZjllMjctZTU1ZC00NGU4LTljMzItMWZhN2VhN2I1YTExLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjEyVDE1NDI0OFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTBmNGM0MmYzZDU3OGRkMWRiNGFhZTUyMTc2ZDNlMDhlYzI3MmJiYzQxMTdiYzg2ZmM4YWQ3ZWU4YTE4YmVkNGYmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.8mhK4WFMHzivEmA9QXt3ZlTcdsieWLJOVgSLhff3xbo)
Form example
The text was updated successfully, but these errors were encountered: