Merge branch 'release/0.8.0'
stumpylog committed Dec 17, 2024
2 parents 888fe76 + a316855 commit aabe6b2
Showing 10 changed files with 78 additions and 35 deletions.
31 changes: 20 additions & 11 deletions .github/workflows/ci.yml
@@ -25,10 +25,10 @@ jobs:
-
uses: actions/checkout@v4
-
name: Set up Python 3.10
name: Set up Python 3.11
uses: actions/setup-python@v5
with:
python-version: '3.10'
python-version: '3.11'
cache: 'pip'
-
name: Install Hatch
@@ -58,7 +58,7 @@ jobs:
strategy:
fail-fast: false
matrix:
python-version: ['3.8', '3.9', '3.10', '3.11', '3.12', '3.13', 'pypy3.8', 'pypy3.9', 'pypy3.10']
python-version: [ '3.9', '3.10', '3.11', '3.12', '3.13', 'pypy3.9', 'pypy3.10']

steps:
-
@@ -70,6 +70,10 @@ jobs:
with:
python-version: ${{ matrix.python-version }}
cache: 'pip'
-
name: Pull Docker images
run: |
docker compose --file tests/docker/docker-compose.ci-test.yml pull
-
name: Install Hatch
run: |
@@ -83,15 +87,20 @@ jobs:
-
name: Run tests
run: |
hatch test --cover --python ${{ matrix.python-version }}
ls -ahl .
hatch test --cover --junitxml=junit.xml -o junit_family=legacy --python ${{ matrix.python-version }}
-
name: Upload coverage to Codecov
if: matrix.python-version == '3.10'
uses: codecov/codecov-action@v4
if: matrix.python-version == '3.11'
uses: codecov/codecov-action@v5
with:
token: ${{ secrets.CODECOV_TOKEN }}
-
name: Upload test results to Codecov
if: ${{ !cancelled() }}
uses: codecov/test-results-action@v1
with:
# not required for public repos, but intermittently fails otherwise
token: ${{ secrets.CODECOV_TOKEN }}
flags: python-${{ matrix.python-version }}

build:
name: Build
@@ -104,10 +113,10 @@ jobs:
-
uses: actions/checkout@v4
-
name: Set up Python 3.10
name: Set up Python 3.11
uses: actions/setup-python@v5
with:
python-version: '3.10'
python-version: '3.11'
cache: 'pip'
-
name: Install Hatch
@@ -180,4 +189,4 @@ jobs:
path: dist
-
name: Publish build to PyPI
uses: pypa/gh-action-pypi-publish@v1.10.2
uses: pypa/gh-action-pypi-publish@v1.12.2
6 changes: 3 additions & 3 deletions .pre-commit-config.yaml
@@ -28,7 +28,7 @@ repos:
- id: detect-private-key
# See https://github.com/prettier/prettier/issues/15742 for the fork reason
- repo: https://github.com/rbubley/mirrors-prettier
rev: "v3.3.3"
rev: "v3.4.2"
hooks:
- id: prettier
types_or:
@@ -41,13 +41,13 @@ repos:
- id: codespell
# Python hooks
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: 'v0.6.9'
rev: 'v0.8.2'
hooks:
# Run the linter.
- id: ruff
# Run the formatter.
- id: ruff-format
- repo: https://github.com/tox-dev/pyproject-fmt
rev: "2.2.4"
rev: "v2.5.0"
hooks:
- id: pyproject-fmt
20 changes: 20 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,26 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.8.0] - 2024-12-17

### Breaking Change

- Dropped support for Python 3.8 ([#36](https://github.com/stumpylog/tika-client/pull/36))

### Fixed

- Tests failed when run with Tika v3 ([#28](https://github.com/stumpylog/tika-client/pull/28))
- Relaxed version restriction on `httpx`

### Changed

- Bump pypa/gh-action-pypi-publish from 1.10.2 to 1.12.2 (by [@dependabot](https://github.com/apps/dependabot) in [#33](https://github.com/stumpylog/tika-client/pull/33))
- Bump codecov/codecov-action from 4 to 5 (by [@dependabot](https://github.com/apps/dependabot) in [#32](https://github.com/stumpylog/tika-client/pull/32))

### Added

- Integrated Codecov test analytics ([#34](https://github.com/stumpylog/tika-client/pull/34))

## [0.7.0] - 2024-10-09

### Added
6 changes: 3 additions & 3 deletions README.md
@@ -6,7 +6,7 @@

---

**Table of Contents**
## Table of Contents

- [Features](#features)
- [Installation](#installation)
@@ -17,7 +17,7 @@
## Features

- Simplified: No need to worry about XML or JSON responses, downloading a Tika jar file or Python 2
- Support for Tika 2+ only
- Support for Tika 2+ only (including Tika v3, which didn't change the API)
- Based on the modern [httpx](https://github.com/encode/httpx) library
- Full support for type hinting
- Nearly full test coverage run against an actual Tika server for multiple Python and PyPy versions
@@ -39,7 +39,7 @@ from tika_client import TikaClient
test_file = Path("sample.docx")


with TikaClient("http://localhost:9998") as client
with TikaClient("http://localhost:9998") as client:

# Extract a document's metadata
metadata = client.metadata.from_file(test_file)
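
The `with` line fix above matters because a compound statement without its trailing colon does not parse, so the README example could not be copied and run as written. A minimal sketch of the difference, using the built-in `open` as a stand-in for `TikaClient` so that nothing needs to be installed; the helper name `is_valid` is hypothetical, and the snippets are only compiled, never executed:

```python
# Old and new forms of the statement, with `open` standing in for TikaClient.
broken = 'with open("sample.docx") as handle\n    pass\n'
fixed = 'with open("sample.docx") as handle:\n    pass\n'


def is_valid(source: str) -> bool:
    """Return True if the source compiles; the code is never run."""
    try:
        compile(source, "<readme>", "exec")
    except SyntaxError:
        return False
    return True


print("without colon:", is_valid(broken))
print("with colon:", is_valid(fixed))
```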
24 changes: 10 additions & 14 deletions pyproject.toml
@@ -16,7 +16,7 @@ license = "MPL-2.0"
authors = [
{ name = "Trenton H", email = "[email protected]" },
]
requires-python = ">=3.8"
requires-python = ">=3.9"
classifiers = [
"Development Status :: 4 - Beta",
"Environment :: Web Environment",
@@ -25,7 +25,6 @@ classifiers = [
"Operating System :: OS Independent",
"Programming Language :: Python",
"Programming Language :: Python :: 3 :: Only",
"Programming Language :: Python :: 3.8",
"Programming Language :: Python :: 3.9",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
@@ -36,8 +35,7 @@ classifiers = [
]
dynamic = [ "version" ]
dependencies = [
"httpx~=0.24; python_version<'3.9'",
"httpx~=0.27; python_version>='3.9'",
"httpx>=0.27",
"typing-extensions; python_version<'3.11'",
]

@@ -64,7 +62,7 @@ installer = "uv"

[tool.hatch.envs.hatch-static-analysis]
# https://hatch.pypa.io/latest/config/internal/static-analysis/
dependencies = [ "ruff ~= 0.6" ]
dependencies = [ "ruff ~= 0.8" ]
config-path = "none"

[tool.hatch.envs.hatch-test]
@@ -74,17 +72,15 @@ randomize = true
dependencies = [
"coverage-enable-subprocess == 1.0",
"coverage[toml] ~= 7.6",
"pytest < 8.0; python_version < '3.9'",
"pytest ~= 8.3; python_version >= '3.9'",
"pytest ~= 8.3",
"pytest-mock ~= 3.14",
"pytest-randomly ~= 3.15",
"pytest-rerunfailures ~= 14.0",
"pytest-rerunfailures ~= 15.0",
"pytest-xdist[psutil] ~= 3.6",
]
extra-dependencies = [
"pytest-sugar",
"pytest-httpx == 0.30.0; python_version >= '3.9'",
"pytest-httpx ~= 0.22; python_version < '3.9'",
"pytest-httpx ~= 0.33",
"python-magic",
"pytest-docker ~= 3.1",
]
@@ -109,15 +105,15 @@ cov-report = [
]

[[tool.hatch.envs.hatch-test.matrix]]
python = [ "3.8", "3.9", "3.10", "3.11", "3.12", "3.13", "pypy3.8", "pypy3.9", "pypy3.10" ]
python = [ "3.9", "3.10", "3.11", "3.12", "3.13", "pypy3.9", "pypy3.10" ]

#
# Custom Environments
#
[tool.hatch.envs.typing]
detached = true
dependencies = [
"mypy ~= 1.11",
"mypy ~= 1.13",
"httpx",
]

@@ -144,7 +140,7 @@ update = [ "pre-commit autoupdate" ]
#

[tool.ruff]
target-version = "py38"
target-version = "py39"
line-length = 120

# https://docs.astral.sh/ruff/settings/
@@ -239,7 +235,7 @@ lint.isort.known-first-party = [ "tika_client" ]
max_supported_python = "3.13"

[tool.pytest.ini_options]
minversion = "7.0"
minversion = "8.0"
testpaths = [ "tests" ]

[tool.coverage.run]
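
The dependency change in this file replaces the compatible-release pins (`httpx~=0.24`, `httpx~=0.27`) with a plain lower bound (`httpx>=0.27`). As a rough illustration of what that relaxes, the sketch below hand-rolls both comparisons for plain `X.Y.Z` version strings. The helper names `parse`, `compatible_release`, and `at_least` are hypothetical, and real installers apply the full PEP 440 rules (pre-releases, zero padding, epochs), which this deliberately ignores.

```python
def parse(version: str) -> tuple[int, ...]:
    """Turn a plain dotted version such as "0.27.2" into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def compatible_release(candidate: str, spec: str) -> bool:
    """Simplified `~=spec`: not older than `spec`, with all but its last component fixed."""
    cand, base = parse(candidate), parse(spec)
    prefix = base[:-1]
    return cand >= base and cand[: len(prefix)] == prefix


def at_least(candidate: str, spec: str) -> bool:
    """Simplified `>=spec`: any version that is not older than `spec`."""
    return parse(candidate) >= parse(spec)


# Compare how the old-style and new-style specifiers treat a few versions.
for version in ("0.24.1", "0.27.0", "0.28.1", "1.0.0"):
    print(version, compatible_release(version, "0.27"), at_least(version, "0.27"))
```

Under this simplified model, `~=0.27` admits a 0.28.x release but rejects a future 1.0.0, whereas `>=0.27` accepts both.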
2 changes: 1 addition & 1 deletion src/tika_client/__about__.py
@@ -1,4 +1,4 @@
# SPDX-FileCopyrightText: 2023-present Trenton H <[email protected]>
#
# SPDX-License-Identifier: MPL-2.0
__version__ = "0.7.0"
__version__ = "0.8.0"
2 changes: 1 addition & 1 deletion src/tika_client/__init__.py
@@ -7,4 +7,4 @@
from tika_client.data_models import TikaKey
from tika_client.data_models import XmpKey

__all__ = ["TikaClient", "TikaKey", "XmpKey", "DublinCoreKey"]
__all__ = ["DublinCoreKey", "TikaClient", "TikaKey", "XmpKey"]
18 changes: 18 additions & 0 deletions src/tika_client/data_models.py
@@ -25,13 +25,26 @@


class TikaKey(str, Enum):
"""
Based on
- https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=235835139#MetadataOverview-TikaProcess
- https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=235835139#MetadataOverview-TikaGeneral
"""

Parsers = "X-TIKA:Parsed-By"
Parser_Full = "X-TIKA:Parsed-By-Full-Set"
Parse_Time = "X-TIKA:parse_time_millis"
ContentType = "Content-Type"
ContentLength = "Content-Length"
Content = "X-TIKA:content"


class DublinCoreKey(str, Enum):
"""
Based on:
- https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=235835139#MetadataOverview-DublinCore
"""

Creator = "dc:creator"
Created = "dcterms:created"
Modified = "dcterms:modified"
@@ -49,6 +62,11 @@ class DublinCoreKey(str, Enum):


class XmpKey(str, Enum):
"""
Based on:
- https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=235835139#MetadataOverview-XMP(eXtensibleMetadataPlatform)
"""

About = "xmp:About"
Created = "xmp:CreateDate"
NumPages = "xmpTPg:NPages"
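
Because these key classes mix `str` into `Enum`, each member compares and hashes like its string value, so it can index a dictionary that has plain string keys. A minimal sketch of that behaviour, assuming a trimmed copy of `TikaKey` with two members taken from the diff and a hypothetical `metadata` dictionary standing in for a parsed response:

```python
from enum import Enum


class TikaKey(str, Enum):
    # Two members copied from the diff; the real class defines more.
    ContentType = "Content-Type"
    Content = "X-TIKA:content"


# Hypothetical metadata, shaped like a parsed JSON response with string keys.
metadata = {"Content-Type": "text/html; charset=UTF-8", "X-TIKA:content": "Hello world!"}

# A member and its raw string value find the same dictionary entry.
print(metadata[TikaKey.ContentType])
print(metadata["Content-Type"])

# Looking a member up by value returns the existing member.
print(TikaKey("X-TIKA:content") is TikaKey.Content)
```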
2 changes: 1 addition & 1 deletion tests/conftest.py
@@ -1,6 +1,6 @@
import logging
from collections.abc import Generator
from pathlib import Path
from typing import Generator

import pytest
from pytest_docker.plugin import Services
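
The import swap is possible because `collections.abc.Generator` can be subscripted at runtime from Python 3.9 onward, which lines up with this release dropping 3.8. A minimal sketch of a generator annotated that way, with pytest left out so the block stays self-contained; the function name `temporary_values` and the `events` list are hypothetical:

```python
from collections.abc import Generator


def temporary_values(events: list[str]) -> Generator[int, None, None]:
    """Yield one value between a setup step and a teardown step."""
    events.append("setup")
    try:
        yield 42
    finally:
        # Runs once the generator is exhausted or closed, much like fixture teardown.
        events.append("teardown")


events: list[str] = []
values = list(temporary_values(events))
print(values, events)
```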
2 changes: 1 addition & 1 deletion tests/test_resource_tika.py
@@ -146,7 +146,7 @@ def test_html_document_from_string_buffer(self, tika_client: TikaClient, sample_
resp = tika_client.tika.as_text.from_buffer(buffer)

assert resp.type == "text/html; charset=UTF-8"
assert resp.parsers == ["org.apache.tika.parser.DefaultParser", "org.apache.tika.parser.html.HtmlParser"]
assert resp.parsers == ["org.apache.tika.parser.DefaultParser", "org.apache.tika.parser.html.JSoupParser"]
assert "Hello world! This is HTML5 content in a file for" in resp.data["X-TIKA:content"]
assert resp.data["dc:title"] == "This Is A Test"
assert resp.data["description"] == "A sample HTML file"
