feat(tests): add support for running EarlyAI-generated tests #2595

Open · wants to merge 4 commits into `main`
125 changes: 125 additions & 0 deletions EarlyAI_README.md
@@ -0,0 +1,125 @@
# EarlyAI Test Integration in OpenLLmetry

## Executive Summary

This document outlines the integration of **EarlyAI-generated tests** into the OpenLLmetry monorepo. These tests improve test coverage and ensure instrumentation correctness while keeping the existing test flow intact.

Here is a summary of the tests generated for the `utils` folder of the following projects:

| Project | Total Tests | Passed | Failed |
| ------------------------------------------- | ----------- | ------- | ------ |
| **opentelemetry-instrumentation-anthropic** | 48 | 47 | 1 |
| **opentelemetry-instrumentation-haystack** | 20 | 20 | 0 |
| **opentelemetry-instrumentation-pinecone** | 18 | 17 | 1 |
| **opentelemetry-instrumentation-groq** | 29 | 29 | 0 |
| **Total** | **115** | **113** | **2** |

## Failure Details

### opentelemetry-instrumentation-pinecone

**TestSetSpanAttribute.test_set_attribute_with_none_name_and_valid_value** failed.

- **Assertion failed:** Expected `set_attribute` to not be called, but it was called once.
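
For context, here is a minimal sketch of what this expectation amounts to; the import path and helper signature are assumptions for illustration, not copied from the generated test file:

```python
# Hypothetical reconstruction of the failing expectation (import path and
# signature are assumed, not taken from the actual generated test).
from unittest.mock import Mock

from opentelemetry.instrumentation.pinecone.utils import set_span_attribute


def test_set_attribute_with_none_name_and_valid_value():
    span = Mock()

    # A None attribute name is treated as "nothing to set" ...
    set_span_attribute(span, None, "some-value")

    # ... so the underlying span API should never be touched. The reported
    # failure means set_attribute was in fact called once.
    span.set_attribute.assert_not_called()
```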

### opentelemetry-instrumentation-anthropic

**TestSharedMetricsAttributes.test_shared_metrics_attributes_with_none_response** failed.

- **Assertion failed:** Expected a structured response, but `None` was returned.
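
A hedged sketch of the expectation behind this failure (the exact assertion in the generated test may differ):

```python
# Hypothetical reconstruction of the failing expectation; the real generated
# test's assertion may differ.
from opentelemetry.instrumentation.anthropic.utils import shared_metrics_attributes


def test_shared_metrics_attributes_with_none_response():
    attributes = shared_metrics_attributes(None)

    # The test expects a dict of common metric attributes even when the
    # response is None; the failure indicates the helper returned None instead.
    assert isinstance(attributes, dict)
```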

## Key Additions

### 1. Test Configuration

- Updated **nx.json** to define `test:early` as a target for running EarlyAI tests across projects.
- Updated **package.json** to include scripts for running EarlyAI tests.
- Added a global **pytest.ini** file to manage test markers and configurations centrally.

### 2. Test Execution Support

- Tests can be executed across the **entire monorepo** or **per project**.
- EarlyAI tests are displayed in the **Early** VS Code extension.

## How to Run EarlyAI Tests

### Run All EarlyAI Tests Across All Projects

```bash
npm run test:early
```

This command runs all EarlyAI tests across the monorepo.

### Run EarlyAI Tests for a Specific Project

```bash
nx run <project-name>:test:early
```

Replace `<project-name>` with the relevant project (e.g., `opentelemetry-instrumentation-openai`).

---

## Technical Changes

### 1. Updated `nx.json`

We added a **global target** for EarlyAI test execution:

```json
"test:early": {
"executor": "@nxlv/python:run-commands",
"options": {
"command": ". .venv/Scripts/activate && poetry run pytest source/test_early_utils/",
"cwd": "{projectRoot}"
}
}
```

### 2. Updated `package.json`

Added a global script for running EarlyAI tests:

```json
"scripts": {
"test:early": "nx run-many --target=test:early"
}
```

### 3. Added a Global `pytest.ini`

Instead of managing individual `pytest.ini` files per project, we added a **global pytest.ini**:

```ini
[pytest]
markers =
    describe: Custom marker for test groups
    happy_path: Tests the 'happy path' of a function
    edge_case: Tests edge cases of a function
```
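
For reference, a minimal sketch of how the generated tests use these markers (class and test names below are illustrative, not taken from a specific generated file):

```python
# Illustrative only: demonstrates the marker conventions registered above.
import pytest


@pytest.mark.describe("set_span_attribute")  # groups related tests
class TestSetSpanAttribute:

    @pytest.mark.happy_path
    def test_sets_attribute_for_valid_name_and_value(self):
        ...

    @pytest.mark.edge_case
    def test_skips_none_value(self):
        ...
```

Registering the markers centrally also allows marker-based selection, e.g. `poetry run pytest -m edge_case`.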

### 4. Added `test:early` Target in Each Project

Each project where EarlyAI tests were added includes the following target in its `project.json`:

```json
"test:early": {
"executor": "@nxlv/python:run-commands",
"outputs": [
"{workspaceRoot}/reports/packages/opentelemetry-instrumentation-anthropic/unittests/early",
"{workspaceRoot}/coverage/packages/opentelemetry-instrumentation-anthropic/early"
],
"options": {
"command": "poetry run pytest opentelemetry/instrumentation/anthropic/test_early_utils/",
"cwd": "packages/opentelemetry-instrumentation-anthropic"
}
}
```

(Each project follows a similar structure, replacing **anthropic** with the respective project name.)

[EarlyAI for VS Code](vscode:extension/Early-AI.EarlyAI)

[EarlyAI for Cursor](cursor:extension/Early-AI.EarlyAI)
16 changes: 15 additions & 1 deletion nx.json
@@ -1,5 +1,19 @@
{
"extends": "nx/presets/npm.json",
"$schema": "./node_modules/nx/schemas/nx-schema.json",
"plugins": ["@nxlv/python"]
"plugins": ["@nxlv/python"],
"projects": {
"default": {
"root": ".",
"targets": {
"test:early": {
"executor": "@nxlv/python:run-commands",
"options": {
"command": ". .venv/Scripts/activate && poetry run pytest source/test_early_utils/",
"cwd": "{projectRoot}"
}
}
}
}
}
}
4 changes: 3 additions & 1 deletion package.json
@@ -2,7 +2,9 @@
"name": "openllmetry",
"version": "0.0.0",
"license": "MIT",
"scripts": {},
"scripts": {
"test:early": "nx run-many --target=test:early"
},
"private": true,
"devDependencies": {
"@nxlv/python": "^20.2.0",
@@ -4,39 +4,30 @@
import logging
import os
import time
from typing import Callable, Collection, Dict, Any, Optional
from typing_extensions import Coroutine
from typing import Any, Callable, Collection, Dict, Optional

from anthropic._streaming import AsyncStream, Stream
from opentelemetry import context as context_api
from opentelemetry.instrumentation.anthropic.config import Config
from opentelemetry.instrumentation.anthropic.streaming import (
abuild_from_streaming_response,
build_from_streaming_response,
)
abuild_from_streaming_response, build_from_streaming_response)
from opentelemetry.instrumentation.anthropic.utils import (
acount_prompt_tokens_from_request,
dont_throw,
error_metrics_attributes,
count_prompt_tokens_from_request,
run_async,
set_span_attribute,
shared_metrics_attributes,
should_send_prompts,
)
acount_prompt_tokens_from_request, count_prompt_tokens_from_request,
dont_throw, error_metrics_attributes, run_async, set_span_attribute,
shared_metrics_attributes, should_send_prompts)
from opentelemetry.instrumentation.anthropic.version import __version__
from opentelemetry.instrumentation.instrumentor import BaseInstrumentor
from opentelemetry.instrumentation.utils import _SUPPRESS_INSTRUMENTATION_KEY, unwrap
from opentelemetry.instrumentation.utils import (_SUPPRESS_INSTRUMENTATION_KEY,
unwrap)
from opentelemetry.metrics import Counter, Histogram, Meter, get_meter
from opentelemetry.semconv._incubating.attributes.gen_ai_attributes import GEN_AI_RESPONSE_ID
from opentelemetry.semconv._incubating.attributes.gen_ai_attributes import \
GEN_AI_RESPONSE_ID
from opentelemetry.semconv_ai import (
SUPPRESS_LANGUAGE_MODEL_INSTRUMENTATION_KEY,
LLMRequestTypeValues,
SpanAttributes,
Meters,
)
SUPPRESS_LANGUAGE_MODEL_INSTRUMENTATION_KEY, LLMRequestTypeValues, Meters,
SpanAttributes)
from opentelemetry.trace import SpanKind, Tracer, get_tracer
from opentelemetry.trace.status import Status, StatusCode
from typing_extensions import Coroutine
from wrapt import wrap_function_wrapper

logger = logging.getLogger(__name__)
@@ -1,4 +1,5 @@
from typing import Callable, Optional

from typing_extensions import Coroutine


@@ -3,15 +3,11 @@

from opentelemetry.instrumentation.anthropic.config import Config
from opentelemetry.instrumentation.anthropic.utils import (
dont_throw,
error_metrics_attributes,
count_prompt_tokens_from_request,
set_span_attribute,
shared_metrics_attributes,
should_send_prompts,
)
count_prompt_tokens_from_request, dont_throw, error_metrics_attributes,
set_span_attribute, shared_metrics_attributes, should_send_prompts)
from opentelemetry.metrics import Counter, Histogram
from opentelemetry.semconv._incubating.attributes.gen_ai_attributes import GEN_AI_RESPONSE_ID
from opentelemetry.semconv._incubating.attributes.gen_ai_attributes import \
GEN_AI_RESPONSE_ID
from opentelemetry.semconv_ai import SpanAttributes
from opentelemetry.trace.status import Status, StatusCode

@@ -0,0 +1,78 @@
import logging
from unittest.mock import Mock, patch

import pytest
from opentelemetry.instrumentation.anthropic.utils import dont_throw

# Mock Config to control the behavior of exception_logger


class MockConfig:
exception_logger = None

# Patch the Config used in the module with our MockConfig


@pytest.fixture(autouse=True)
def patch_config():
with patch('opentelemetry.instrumentation.anthropic.utils.Config', MockConfig):
yield

# Describe block for _handle_exception related tests


@pytest.mark.describe("_handle_exception")
class TestHandleException:

@pytest.mark.happy_path
def test_sync_function_no_exception(self):
"""Test that a synchronous function runs without exceptions."""
@dont_throw
def no_exception_func():
return "success"

assert no_exception_func() == "success"

@pytest.mark.happy_path
@pytest.mark.asyncio
async def test_async_function_no_exception(self):
"""Test that an asynchronous function runs without exceptions."""
@dont_throw
async def no_exception_func():
return "success"

assert await no_exception_func() == "success"

@pytest.mark.edge_case
def test_sync_function_with_exception(self, caplog):
"""Test that a synchronous function logs an exception without raising it."""
@dont_throw
def exception_func():
raise ValueError("Test exception")

with caplog.at_level(logging.DEBUG):
exception_func()
assert "OpenLLMetry failed to trace in exception_func, error:" in caplog.text

@pytest.mark.edge_case
@pytest.mark.asyncio
async def test_async_function_with_exception(self, caplog):
"""Test that an asynchronous function logs an exception without raising it."""
@dont_throw
async def exception_func():
raise ValueError("Test exception")

with caplog.at_level(logging.DEBUG):
await exception_func()
assert "OpenLLMetry failed to trace in exception_func, error:" in caplog.text

@pytest.mark.edge_case
def test_no_exception_logger(self):
"""Test that no error occurs if exception_logger is None."""
MockConfig.exception_logger = None

@dont_throw
def exception_func():
raise ValueError("Test exception")

exception_func() # Should not raise any error
@@ -0,0 +1,54 @@
import asyncio
from unittest.mock import AsyncMock

import pytest
# Assuming the function is imported from the module
from opentelemetry.instrumentation.anthropic.utils import \
acount_prompt_tokens_from_request


@pytest.mark.describe("acount_prompt_tokens_from_request")
class TestAcountPromptTokensFromRequest:

@pytest.mark.happy_path
@pytest.mark.asyncio
async def test_single_prompt(self):
"""Test with a single prompt string to ensure correct token counting."""
anthropic = AsyncMock()
anthropic.count_tokens = AsyncMock(return_value=5)
request = {"prompt": "This is a test prompt."}

result = await acount_prompt_tokens_from_request(anthropic, request)

assert result == 5
anthropic.count_tokens.assert_awaited_once_with("This is a test prompt.")

@pytest.mark.edge_case
@pytest.mark.asyncio
async def test_no_prompt_or_messages(self):
"""Test with no prompt or messages to ensure zero tokens are counted."""
anthropic = AsyncMock()
request = {}

result = await acount_prompt_tokens_from_request(anthropic, request)

assert result == 0
anthropic.count_tokens.assert_not_awaited()

@pytest.mark.edge_case
@pytest.mark.asyncio
async def test_message_with_non_string_content(self):
"""Test with message content that is not a string to ensure it is ignored."""
anthropic = AsyncMock()
anthropic.count_tokens = AsyncMock(return_value=0)
request = {
"messages": [
{"content": 12345}, # Non-string content
{"content": None} # None content
]
}

result = await acount_prompt_tokens_from_request(anthropic, request)

assert result == 0
anthropic.count_tokens.assert_not_awaited()