partners: Fixed the procedure of initializing pad_token_id #29500

Merged
merged 1 commit on Feb 4, 2025

Conversation

tishizaki
Contributor

Example code is as follows:

from langchain_huggingface.llms import HuggingFacePipeline

hf = HuggingFacePipeline.from_model_id(
    model_id="meta-llama/Llama-3.2-3B-Instruct",
    task="text-generation",
    pipeline_kwargs={"max_new_tokens": 10},
)

from langchain_core.prompts import PromptTemplate

template = """Question: {question}

Answer: Let's think step by step."""
prompt = PromptTemplate.from_template(template)

chain = prompt | hf

question = "What is electroencephalography?"

print(chain.invoke({"question": question}))

Add a check for pad_token_id and eos_token_id in the model config.
This appears to be the same bug as the HuggingFace TGI bug.

In addition, fix lint errors in the tests.
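The fallback order implied by this change can be sketched as follows. This is a minimal illustration, not the merged implementation: `resolve_pad_token_id` is a hypothetical helper, and the order (tokenizer pad token, then tokenizer eos token, then the model config's `eos_token_id`, which transformers may store as an int or a list of ints) is inferred from the values used in the PR's test. Taking the first element of a list-valued `eos_token_id` is likewise an assumption.

```python
from types import SimpleNamespace


def resolve_pad_token_id(tokenizer, model_config):
    """Hypothetical helper sketching the fallback order this fix suggests:
    tokenizer pad token -> tokenizer eos token -> model config eos token
    (which transformers may store as an int or a list of ints)."""
    if getattr(tokenizer, "pad_token_id", None) is not None:
        return tokenizer.pad_token_id
    if getattr(tokenizer, "eos_token_id", None) is not None:
        return tokenizer.eos_token_id
    eos = getattr(model_config, "eos_token_id", None)
    if isinstance(eos, (list, tuple)):
        # Assumption: pick the first configured eos token when given a list.
        return eos[0] if eos else None
    return eos


# Stand-in objects mirroring the mocks in the PR's test.
tok = SimpleNamespace(pad_token_id=None, eos_token_id=128009)
cfg = SimpleNamespace(pad_token_id=None, eos_token_id=[128001, 128008, 128009])
print(resolve_pad_token_id(tok, cfg))  # 128009: falls back to the tokenizer's eos token
```

With these inputs the tokenizer's own eos token wins, matching the `pad_token_id == 128009` expectation in the test below; the config list is only consulted when the tokenizer carries neither token.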
@tishizaki
Contributor Author

@ccurme I tried to add a test program, but I didn't know how to reference the
tokenizer's member variables after calling HuggingFacePipeline.from_model_id().
Sorry, the test program is incomplete.

The following is the program I'm currently considering adding to libs/partners/huggingface/tests/unit_tests/test_huggingface_pipeline.py.
For now, it is hard-coded with parameters for Llama-3.2-3B-Instruct.

from unittest.mock import MagicMock, patch

from langchain_huggingface import HuggingFacePipeline


@patch("transformers.AutoTokenizer.from_pretrained")
@patch("transformers.AutoModelForCausalLM.from_pretrained")
@patch("transformers.pipeline")
def test_initialization_with_from_model_id_pad_token_id(
    mock_pipeline: MagicMock, mock_model: MagicMock, mock_tokenizer: MagicMock
) -> None:
    """Test that from_model_id initializes pad_token_id from the model config."""

    mock_tokenizer.return_value = MagicMock(pad_token=None, eos_token_id=128009)
    mock_model.return_value = MagicMock()
    mock_model.return_value.config.pad_token_id = None
    mock_model.return_value.config.eos_token_id = [128001, 128008, 128009]

    mock_pipe = MagicMock()
    mock_pipe.task = "text-generation"
    mock_pipe.model = mock_model.return_value
    mock_pipeline.return_value = mock_pipe

    llm = HuggingFacePipeline.from_model_id(
        model_id="mock-model-id",
        task="text-generation",
    )
    assert llm is not None

    # Assumes from_model_id sets pad_token_id on the tokenizer it created.
    assert mock_tokenizer.return_value.pad_token_id == 128009

@ccurme ccurme merged commit aeb42dc into langchain-ai:master Feb 4, 2025
19 checks passed