load_custom_hf_dataset not handling the text_feature argument properly #1087

chimezie · 2024-11-03T22:56:23Z

If you use an hf_dataset configuration such as the following:

hf_dataset:
  name: "Open-Orca/OpenOrca"
  train_split: "train[:90%]"
  valid_split: "train[-10%:]"
  text_feature: "response"

This is supposed to work the same as the (local) text data format, but it runs into the following code:

        if prompt_feature and completion_feature:
            return CompletionsDataset(ds, tokenizer, prompt_feature, completion_feature)
        elif text_feature:
            return Dataset(train_ds, text_key=text_feature)
        else:
            raise ValueError(
                "Specify either a prompt and completion feature or a text "
                "feature for the Hugging Face dataset."
            )

which errors out with a NameError because train_ds is not defined. The prompt/completion branch uses ds, so ds appears to be the intended variable.
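A minimal sketch of the likely fix, assuming the loaded split is held in `ds` as in the prompt/completion branch. The `Dataset` and `CompletionsDataset` classes below are simplified stand-ins, and `wrap_hf_split` is a hypothetical helper name used only for illustration; neither is the actual mlx-examples code.

```python
# Simplified stand-ins for the real Dataset / CompletionsDataset classes.
# They keep only what is needed to exercise the branch logic.
class Dataset:
    def __init__(self, data, text_key="text"):
        self._data = data
        self.text_key = text_key

    def __getitem__(self, idx):
        return self._data[idx][self.text_key]

    def __len__(self):
        return len(self._data)


class CompletionsDataset:
    def __init__(self, data, tokenizer, prompt_key, completion_key):
        self._data = data
        self.tokenizer = tokenizer
        self.prompt_key = prompt_key
        self.completion_key = completion_key


def wrap_hf_split(ds, tokenizer, text_feature=None,
                  prompt_feature=None, completion_feature=None):
    """Hypothetical helper mirroring the branch quoted in this issue."""
    if prompt_feature and completion_feature:
        return CompletionsDataset(ds, tokenizer, prompt_feature, completion_feature)
    elif text_feature:
        # Use the split that was actually loaded (ds), not the undefined train_ds.
        return Dataset(ds, text_key=text_feature)
    else:
        raise ValueError(
            "Specify either a prompt and completion feature or a text "
            "feature for the Hugging Face dataset."
        )


# A list of dicts stands in for a loaded Hugging Face split.
rows = [{"question": "2+2?", "response": "4"}]
text_ds = wrap_hf_split(rows, tokenizer=None, text_feature="response")
print(text_ds[0])  # prints: 4
```

With only `text_feature` set, as in the configuration above, the call takes the `elif` branch and returns a `Dataset` keyed on "response" instead of raising.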

chimezie added a commit to chimezie/mlx-examples that referenced this issue Nov 4, 2024

Generalize HF datasets to a collection of HF datasets via datasets…

9df7bbb

…, adds support for custom chat HF datasets (ml-explore#1088), and fixes (ml-explore#1087)

chimezie mentioned this issue Nov 4, 2024

Generalize HF datasets to a collection of HF datasets via hf_datasets #1090

Closed
