Implement batch processing #143
Do you think this code could live in ellmer itself? Do you have any sense of what the UI might look like? |
I was thinking that in a batch scenario you still want to be able to do all the same things that you do in a regular chat, but you just want to do many of them at once. So maybe an interface like this would work:

library(ellmer)

chat <- chat_openai("You reply succinctly")
chat$register_tool(tool(function() Sys.Date(), "Return the current date"))

chat$chat_batch(list(
  "What's the date today?",
  "What's the date tomorrow?",
  "What's the date yesterday?",
  "What's the date next week?",
  "What's the date last week?"
))

I think the main challenge is what chat_batch() would return.

Other thoughts:
|
After revisiting this, I do think my original ideas were ambitious, and I would like to see this code in the ellmer package if possible. I revised my original post to simplify the concept. I think your interface idea is a good start. As for the returned object, would it be possible to return not only a list of the text answers but also a nested list with each chat object for diagnostics? Although that might be a beefy object if you have big batches (maybe only return the chat object if there was an issue?). |
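To make that concrete, here is a minimal sketch of what such a return value might look like; the texts/chats field names are hypothetical, not ellmer API:

# Hypothetical shape of a batch result: plain text answers for every prompt,
# plus the full chat object kept only for items that had an issue (NULL otherwise).
batch_result <- list(
  texts = list("2025-02-17", "2025-02-18", NA_character_),  # NA where the call failed
  chats = list(NULL, NULL, "<Chat object retained for the failed item>")
)
batch_result$texts[[1]]
#> [1] "2025-02-17"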
@t-emery - Adding Teal because I know he has experience using ellmer for batch processing. |
FWIW, my use case was batched structured data extraction: I was reading text data from a tibble and doing structured data extraction (x 18,000). In my initial attempts, I found I was using far more tokens than made sense given the inputs. Eventually I found the workable answer was to clear the turns after each call; then everything worked as expected. I'm new to using LLM APIs, so I say this with great humility: I might be missing something simple. I haven't had time to think about this deeply yet, and I haven't figured out what part of this is a documentation issue (writing up best practices for running a lot of data) versus a feature issue. Here are the relevant functions. The key fix was clearing the turns before each extraction (the chat$set_turns(list()) call below):
library(dplyr)
library(purrr)
library(tidyr)

# 1. Core Classification Function ----
classify_single_project <- function(text, chat, type_spec) {
  result <- purrr::safely(
    ~ chat$extract_data(text, type = type_spec),
    otherwise = NULL,
    quiet = TRUE
  )()
  tibble::tibble(
    success = is.null(result$error),
    error_message = if (is.null(result$error)) NA_character_ else as.character(result$error),
    classification = list(result$result)
  )
}
# 2. Process a chunk of data ----
# 3. Modify process_chunk to handle provider types
process_chunk <- function(chunk, chat, type_spec, provider_type = NULL) {
  # Use the provider_type parameter to determine which classification function to use
  # (guard against the NULL default, which would otherwise error in `if ()`)
  classify_fn <- if (!is.null(provider_type) && provider_type == "deepseek") {
    classify_single_project_deepseek
  } else {
    classify_single_project
  }
  chunk |>
    mutate(
      classification_result = map(
        combined_text,
        ~ {
          # Clear all prior turns so we start fresh for *each* record
          chat$set_turns(list())
          classify_fn(.x, chat, type_spec)
        }
      )
    ) |>
    unnest(classification_result) |>
    mutate(
      primary_class = map_chr(classification, ~ .x$classification$primary %||% NA_character_),
      confidence = map_chr(classification, ~ .x$classification$confidence %||% NA_character_),
      project_type = map_chr(classification, ~ .x$classification$project_type %||% NA_character_),
      justification = map_chr(classification, ~ .x$justification %||% NA_character_),
      evidence = map_chr(classification, ~ .x$evidence %||% NA_character_)
    ) |>
    select(-classification)
}
|
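For context, here is a hedged sketch of how these helpers might be driven over a full dataset. The projects tibble, its contents, and the chunk size of 100 are illustrative assumptions, not taken from the original post, and a configured chat and type_spec are assumed to exist as elsewhere in this post:

# Assumed driver: split the data into chunks and classify chunk by chunk,
# so a failure partway through doesn't lose all prior work.
projects <- tibble::tibble(
  combined_text = c("Solar mini-grid for rural clinics", "Feeder road rehabilitation")
)

chunks <- projects |>
  mutate(chunk_id = (row_number() - 1) %/% 100) |>
  group_split(chunk_id)

results <- map(chunks, \(chunk) process_chunk(chunk, chat, type_spec)) |>
  list_rbind()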
@t-emery a slightly better way to ensure that the chats are standalone is to call chat$clone() for each one. |
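For example, the inner map() in process_chunk() above could clone the configured chat per record instead of clearing turns; a small sketch of that pattern:

# Inside process_chunk(): give each record its own standalone copy of the
# configured chat rather than mutating (or clearing) the shared one.
classification_result <- map(
  chunk$combined_text,
  \(text) classify_fn(text, chat$clone(), type_spec)
)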
Beyond retries, it would be great to handle interrupted batches and resume when you re-call the function. For example:

library(ellmer)

chat <- chat_openai("You reply succinctly")
chat$register_tool(tool(
  function() Sys.Date(),
  "Return the current date"
))

prompts <- list(
  "What's the date today?",
  "What's the date tomorrow?",
  "What's the date yesterday?",
  "What's the date next week?",
  "What's the date last week?"
)

# Initial processing (interrupted and returns a partial object)
responses <- chat$chat_batch(prompts)

# Resume processing
responses <- chat$chat_batch(prompts, responses)

Where chat_batch() might be something like:

chat_batch <- function(prompts, last_chat = NULL) {
  if (!is.null(last_chat) && !last_chat$complete) {
    chat$process_batch(prompts, start_at = last_chat$last_index + 1)
  } else {
    chat$process_batch(prompts)
  }
}
|
@dylanpieper I think it's a bit better to return a richer object that lets you resume. Something like this maybe:

batched_chat <- chat$chat_batch(prompts)
batched_chat$process()
# ctrl + c
batched_chat$process() # resumes where the work left off

That object would also be the way you get either rich chat objects or simple text responses:

batched_chat$texts()
batched_chat$chats()

For the batch (not parallel) case, where it might take up to 24 hours, the function would also need to serialise something to disk, so if your R process completely dies, you can resume it in a fresh session:

batched_chat <- resume(some_path)

(All of these names are up for discussion, I just brain-dumped quickly.) |
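As a rough, non-ellmer illustration of that design, the resumable object could be little more than the prompts, the completed chats, and a path it checkpoints to. A minimal closure-based sketch, with all names hypothetical, reusing the chat and prompts from the example above, and assuming chat$chat() returns the response text when echo = FALSE (as in the timing example later in this thread):

# Sketch only: a resumable batch object with process()/texts()/chats(),
# checkpointing to disk after every prompt so a dead R session can pick up
# where it left off by pointing at the same path.
new_batched_chat <- function(chat, prompts, path = tempfile(fileext = ".rds")) {
  n <- length(prompts)
  state <- list(prompts = prompts, texts = vector("list", n), chats = vector("list", n))
  if (file.exists(path)) state <- readRDS(path)  # resume from an earlier checkpoint

  process <- function() {
    for (i in seq_len(n)) {
      if (!is.null(state$chats[[i]])) next       # already answered; skip on resume
      fresh <- chat$clone()                      # standalone chat per prompt
      state$texts[[i]] <<- fresh$chat(state$prompts[[i]], echo = FALSE)
      state$chats[[i]] <<- fresh
      saveRDS(state, path)                       # checkpoint after each prompt
    }
    invisible(path)
  }

  list(
    process = process,
    texts   = function() state$texts,
    chats   = function() state$chats
  )
}

batched_chat <- new_batched_chat(chat, prompts)
batched_chat$process()   # ctrl + c part-way through...
batched_chat$process()   # ...and this resumes where the work left off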
The naming seems good and the functional OOP style is elegant for sure. I can start tinkering with defining an S7 class and writing functions for the sequential batch case. I need to get hands-on and wrap my head around the ellmer architecture. I feel less confident about the error handling needs, which is likely where I'll need more help. |
I'm working on parallel requests, which I think should be superior to sequential (especially since you'll be able to opt in to sequential by using just a single "parallel" request). I probably won't get this finished for a week or two since we have a company get-together and I have to think through how to resolve the current limitations with httr2. |
@hadley Can you explain the reasoning for defaulting to parallel? I was thinking that the only advantage of parallel is for multi-model or multi-provider batches, where you are seen as a new requester per API call. What I've read about single-model parallelism is that it is often slower because of the provider's per-model rate limits. |
Looking at the OpenAI rate limits suggests that parallelism is going to be a big improvement for most use cases. (Plus I'd rather solve this problem once, in a way that also benefits the design of httr2.) |
Just an FYI, naming-wise: I thought I'd mention that there is something on both OpenAI and Anthropic called batch processing, which is more of an overnight job-submission type of option:
Just wanted to alert you in case you wanted to distinguish this as parallel_chats() or something along those lines to avoid confusion. |
The tutorial states: “The most programmatic way to chat is to create the chat object inside a function. By doing so, live streaming is automatically suppressed, and |
@Marwolaeth why are you concerned with creating multiple chat objects? |
I wasn't sure whether it would introduce significant overhead. So the final answer is here: creating multiple chats does not degrade performance and should be preferred. |
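For instance, a minimal sketch of the create-the-chat-inside-a-function pattern described in the tutorial, reusing the prompts list from earlier in the thread and assuming $chat() returns the response as a string in that mode:

# Each call builds its own chat, so calls are independent of one another and
# live streaming is suppressed (per the tutorial quoted above).
ask <- function(prompt) {
  chat <- chat_openai(system_prompt = "Reply concisely, one sentence.")
  chat$chat(prompt)
}

answers <- lapply(prompts, ask)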
I packaged my interim batch processing solution (sequential) until this is complete. https://github.com/dylanpieper/hellmer |
I added parallel processing to my package via future/furrr and it's indeed a big improvement. I do see the logic in doing it right the first time now using httr2 (at least as a core ellmer feature). I'm not sure how well my retry handling will hold up in the wild. I urge people to try it out though. It's much better than any other solution I've encountered thus far. https://github.com/dylanpieper/hellmer |
Hi everyone! I came across this issue and this comment in particular, and I would like to add my thoughts (as an intensive user of batch LLM calls). The approach you are currently discussing seems to rely on parallel calls rather than genuine "batch" calls. With a proper batch feature (similar to OpenAI's), you can have your calls run without keeping your session active or even your system switched on; there's a substantially reduced rate for bulk processing (i.e., batch_price = standard_price / 2), and also a guarantee that your job will finish within the next 24 h (in my experience, often a lot less!). This can be especially helpful when dealing with large workloads (e.g., ~10^4+ calls). For example, how does a "parallel" chat_batch() approach enhance or offer advantages in the following scenario (and readily generalize to structured output, function calls, data-interpolated prompts, and all the other great ellmer features)?

library(targets)
tar_dir({
  tar_script({
    library(targets)
    library(tarchetypes)
    library(crew)
    ellmer_controller <- crew_controller_local(
      name = "ellmer's crew controller",
      workers = 2
    )
    tar_option_set(
      controller = ellmer_controller,
      packages = c("ellmer", "dplyr")
    )
    list(
      tar_target(
        chatMaster,
        chat_openai(system_prompt = "Reply concisely, one sentence.")
      ),
      tar_target(
        prompts,
        c(
          "What is the meaning of life?",
          "What is the best way to learn R?",
          "What is the best way to learn Tidyverse?"
        )
      ),
      tar_target(
        chatResponses,
        {
          tibble(
            chat = list(chatMaster$clone()),
            response = chat[[1]]$chat(prompts),
            tokens_input = chat[[1]]$tokens()[3, "input"],
            tokens_output = chat[[1]]$tokens()[3, "output"]
          )
        },
        pattern = map(prompts)
      )
    )
  })
  tar_make()
  tar_read(chatResponses) |> print()
  tar_crew()
})
#> ▶ dispatched target chatMaster
#> ▶ dispatched target prompts
#> ● completed target prompts [0.125 seconds, 118 bytes]
#> ● completed target chatMaster [0.172 seconds, 8.91 kilobytes]
#> ▶ dispatched branch chatResponses_8aae0534d8dc1db8
#> ▶ dispatched branch chatResponses_05413695e06a1f47
#> ● completed branch chatResponses_8aae0534d8dc1db8 [1.078 seconds, 13.392 kilobytes]
#> ▶ dispatched branch chatResponses_eec392bfe0cfa1d2
#> ● completed branch chatResponses_05413695e06a1f47 [1.125 seconds, 13.464 kilobytes]
#> ● completed branch chatResponses_eec392bfe0cfa1d2 [0.938 seconds, 13.469 kilobytes]
#> ● completed pattern chatResponses
#> ▶ ended pipeline [4.578 seconds]
#> # A tibble: 3 × 4
#> chat response tokens_input tokens_output
#> <list> <chr> <dbl> <dbl>
#> 1 <Chat> The meaning of life is subjective and can v… 26 20
#> 2 <Chat> The best way to learn R is to combine hands… 28 40
#> 3 <Chat> The best way to learn Tidyverse is by follo… 30 39
#> # A tibble: 2 × 4
#> controller worker seconds targets
#> <chr> <chr> <dbl> <int>
#> 1 ellmer's crew controller 1 2.30 3
#> 2 ellmer's crew controller 2 1.36 2

Created on 2025-02-17 with reprex v2.1.1

Session info

sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.4.2 (2024-10-31 ucrt)
#> os Windows 11 x64 (build 26100)
#> system x86_64, mingw32
#> ui RTerm
#> language (EN)
#> collate English_United States.utf8
#> ctype English_United States.utf8
#> tz Europe/Rome
#> date 2025-02-17
#> pandoc 3.2 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> backports 1.5.0 2024-05-23 [1] CRAN (R 4.4.0)
#> base64url 1.4 2018-05-14 [1] CRAN (R 4.4.1)
#> callr 3.7.6 2024-03-25 [1] CRAN (R 4.4.1)
#> cli 3.6.3 2024-06-21 [1] CRAN (R 4.4.1)
#> codetools 0.2-20 2024-03-31 [2] CRAN (R 4.4.2)
#> coro 1.1.0 2024-11-05 [1] CRAN (R 4.4.2)
#> data.table 1.16.4 2024-12-06 [1] CRAN (R 4.4.2)
#> digest 0.6.37 2024-08-19 [1] CRAN (R 4.4.1)
#> ellmer 0.1.1 2025-02-07 [1] CRAN (R 4.4.2)
#> evaluate 1.0.1 2024-10-10 [1] CRAN (R 4.4.1)
#> fansi 1.0.6 2023-12-08 [1] CRAN (R 4.4.1)
#> fastmap 1.2.0 2024-05-15 [1] CRAN (R 4.4.1)
#> fs 1.6.5 2024-10-30 [1] RSPM
#> glue 1.8.0 2024-09-30 [1] CRAN (R 4.4.1)
#> htmltools 0.5.8.1 2024-04-04 [1] CRAN (R 4.4.1)
#> httr2 1.1.0 2025-01-18 [1] CRAN (R 4.4.2)
#> igraph 2.1.1 2024-10-19 [1] CRAN (R 4.4.2)
#> knitr 1.49 2024-11-08 [1] RSPM
#> lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.4.1)
#> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.4.1)
#> pillar 1.9.0 2023-03-22 [1] CRAN (R 4.4.1)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.4.1)
#> processx 3.8.4 2024-03-16 [1] CRAN (R 4.4.1)
#> ps 1.8.1 2024-10-28 [1] RSPM (R 4.4.0)
#> R6 2.5.1 2021-08-19 [1] CRAN (R 4.4.1)
#> rappdirs 0.3.3 2021-01-31 [1] CRAN (R 4.4.1)
#> reprex 2.1.1 2024-07-06 [1] CRAN (R 4.4.1)
#> rlang 1.1.4 2024-06-04 [1] CRAN (R 4.4.1)
#> rmarkdown 2.29 2024-11-04 [1] RSPM
#> rstudioapi 0.17.1 2024-10-22 [1] CRAN (R 4.4.2)
#> S7 0.2.0 2024-11-07 [1] CRAN (R 4.4.2)
#> secretbase 1.0.3 2024-10-02 [1] CRAN (R 4.4.1)
#> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.4.1)
#> targets * 1.9.1 2024-12-04 [1] CRAN (R 4.4.2)
#> tibble 3.2.1 2023-03-20 [1] CRAN (R 4.4.1)
#> tidyselect 1.2.1 2024-03-11 [1] CRAN (R 4.4.1)
#> utf8 1.2.4 2023-10-22 [1] CRAN (R 4.4.1)
#> vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.4.1)
#> withr 3.0.2 2024-10-28 [1] CRAN (R 4.4.2)
#> xfun 0.49 2024-10-31 [1] RSPM
#> yaml 2.3.10 2024-07-26 [1] CRAN (R 4.4.1)
#>
#> [1] C:/Users/corra/AppData/Local/R/win-library/4.4
#> [2] C:/Program Files/R/R-4.4.2/library
#>
#> ────────────────────────────────────────────────────────────────────────────── |
@CorradoLanera This doesn't really address the ellmer implementation, but you can interact with batch APIs of several providers using tidyllm. |
@CorradoLanera yes, I understand the difference between parallel and batch calls. The advantage of ellmer providing tools for parallel requests is that it's going to be much more efficient to do everything in the same process, rather than having to spin up multiple processes. It also makes it much easier to implement throttling. |
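To illustrate the single-process-plus-throttling idea, here is a rough httr2-level sketch (not ellmer or any planned API). The endpoint, model, and rate are placeholders, it reuses the prompts list from earlier, and it assumes an httr2 version recent enough that req_perform_parallel() honours req_throttle():

library(httr2)

# One request per prompt, all sharing one throttle against the provider's
# rate limit, performed concurrently inside a single R process.
reqs <- lapply(prompts, function(p) {
  request("https://api.openai.com/v1/chat/completions") |>
    req_auth_bearer_token(Sys.getenv("OPENAI_API_KEY")) |>
    req_body_json(list(
      model = "gpt-4o-mini",                              # placeholder model
      messages = list(list(role = "user", content = p))
    )) |>
    req_throttle(rate = 30 / 60)                          # e.g. 30 requests per minute
})

resps <- req_perform_parallel(reqs)
texts <- sapply(resps, function(r) resp_body_json(r)$choices[[1]]$message$content)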
@dylanpieper we'll see how much faster doing it in httr2 is, but I would hope it would be more like 5-10x faster. |
Yeah, performance looks pretty good:

library(purrr)

prompts <- list(
  "What is the meaning of life?",
  "What is the best way to learn R?",
  "What is the best way to learn Tidyverse?",
  "What colours are in the rainbow?",
  "What's funny about legs?",
  "What's the capital of the United States?",
  "Where is the moon?"
)

chat <- chat_openai(system_prompt = "Reply concisely, one sentence.")
system.time(chats <- chat$chat_parallel(prompts))
#>    user  system elapsed
#>   0.039   0.003   1.248

chat <- chat_openai(system_prompt = "Reply concisely, one sentence.")
system.time(chats <- map(prompts, \(prompt) chat$clone()$chat(prompt, echo = FALSE)))
#>    user  system elapsed
#>   0.770   0.017   5.602
|
@dylanpieper as of a couple of days ago, most of those limitations have been lifted. I implemented something similar for interrupts too. |
Implement batch processing in ellmer to process lists of inputs through a single prompt. Batch processing enables running multiple inputs to get multiple responses while maintaining safety and the ability to leverage ellmer's other features.
Basic Features (likely in scope):
Ambitious Features / Extensions (likely out of scope):
(something like inter-rater reliability but for LLMs)