diff --git a/docs/src/dictionary/en-custom.txt b/docs/src/dictionary/en-custom.txt index d4ca484b..cc9c6111 100644 --- a/docs/src/dictionary/en-custom.txt +++ b/docs/src/dictionary/en-custom.txt @@ -49,6 +49,7 @@ multiline namespace namespaces newline +parser's parsers pre prerelease diff --git a/docs/src/markdown/about/changelog.md b/docs/src/markdown/about/changelog.md index 5dcbf722..76d9ead7 100644 --- a/docs/src/markdown/about/changelog.md +++ b/docs/src/markdown/about/changelog.md @@ -7,7 +7,7 @@ for which the element under consideration applies. - **FIX**: HTML pseudo-classes will check that all key elements checked are in the XHTML namespace (HTML parsers that do not provide namespaces will assume the XHTML namespace). -- **FIX**: Ensure that all pseudo-classes names are case insensitive and allow CSS escapes. +- **FIX**: Ensure that all pseudo-class names are case insensitive and allow CSS escapes. ## 1.9.0 diff --git a/docs/src/markdown/faq.md b/docs/src/markdown/faq.md new file mode 100644 index 00000000..252e8572 --- /dev/null +++ b/docs/src/markdown/faq.md @@ -0,0 +1,41 @@ +# Frequently Asked Questions + +## Why do selectors not work the same in Beautiful Soup 4.7+? + +Soup Sieve is the official CSS selector library in Beautiful Soup 4.7+, and this switch introduces a +number of changes that break some of the expected behaviors that existed in versions prior to 4.7. + +In short, Soup Sieve follows the CSS specifications fairly closely, and this broke a number of non-standard behaviors. +These non-standard behaviors were not allowed according to the CSS specifications. Soup Sieve has no intention of +bringing back these behaviors. + +For more details on specific changes, including why a given change is considered an improvement or why a behavior is simply a +feature that Soup Sieve cannot/will not support, see [Beautiful Soup Differences](./differences.md). + +## How does `iframe` handling work? 
+ +In web browsers, CSS selectors do not usually select content inside an `iframe` element if the selector is called on an +element outside of the `iframe`. Each HTML document is encapsulated, and CSS selectors are generally prevented from +leaking across this `iframe` boundary. + +In its current iteration, Soup Sieve is not aware of the origin of the documents in the `iframe`, and Soup Sieve will +not prevent selectors from crossing these boundaries. Soup Sieve is not used to style documents, but to scrape +documents. For this reason, it seems more helpful to allow selector combinators to cross these boundaries. + +Soup Sieve isn't entirely unaware of `iframe` elements, though. Some pseudo-classes were found to behave in unexpected +ways without awareness of `iframes`, and this was fixed in 1.9.1. Pseudo-classes such +as [`:default`](./selectors.md#:default), [`:indeterminate`](./selectors.md#:indeterminate), [`:dir()`]( +./selectors.md#:dir), [`:lang()`](./selectors.md#:lang), [`:root`](./selectors.md#:root), and [`:contains()`]( +./selectors.md#:contains) were given awareness of `iframes` to ensure they behave properly and return the expected +elements. This doesn't mean that `select` won't return elements in `iframes`, but it won't allow something like +`:default` to select a `button` in an `iframe` whose parent `form` is outside the `iframe`. Put another way, a default +`button` will be evaluated in the context of the document it is in. + +With all of this said, if your selectors have issues with `iframes`, it is most likely because `iframes` are handled +differently by different parsers. `html.parser` will usually parse the content of `iframe` elements as it sees it. The +`lxml` parser will often remove the `html` and `body` tags of an `iframe` HTML document. `lxml-xml` will simply ignore +the `iframe` content in an XHTML document. Finally, `html5lib` will HTML-escape the content of an `iframe`, making traversal impossible. 
+ +In short, Soup Sieve will return elements from all documents, even those inside `iframes`, but certain pseudo-classes may +take into consideration the context of the document they are in. Even so, a parser's handling of `iframes` may make +their content difficult to work with if it doesn't parse that content as HTML elements or if it alters its structure. diff --git a/mkdocs.yml b/mkdocs.yml index b73dfe5c..0d03420f 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -27,6 +27,7 @@ nav: - Soup Sieve: index.md - API: api.md - CSS Selectors: selectors.md + - F.A.Q.: faq.md - Beautiful Soup Differences: differences.md - About: - Contributing & Support: about/contributing.md diff --git a/soupsieve/__meta__.py b/soupsieve/__meta__.py index b34bd201..18f2ea08 100644 --- a/soupsieve/__meta__.py +++ b/soupsieve/__meta__.py @@ -186,5 +186,5 @@ def parse_version(ver, pre=False): return Version(major, minor, micro, release, pre, post, dev) -__version_info__ = Version(1, 9, 1, ".dev") +__version_info__ = Version(1, 9, 1, "final") __version__ = __version_info__._get_canonical() diff --git a/tests/test_level4/test_nth_child.py b/tests/test_level4/test_nth_child.py index 719d26b8..d944a86e 100644 --- a/tests/test_level4/test_nth_child.py +++ b/tests/test_level4/test_nth_child.py @@ -41,6 +41,13 @@ def test_nth_child_of_s_complex(self): flags=util.HTML ) + self.assert_selector( + self.MARKUP, + ":nth-child(2n + 1 OF :is(p, span).test)", + ['2', '6', '10'], + flags=util.HTML ) + class TestNthChildQuirks(TestNthChild): """Test `nth` child selectors with quirks."""
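The `iframe` rules described in the new FAQ can be illustrated with a toy model in plain Python. This is a minimal sketch, not Soup Sieve's implementation or API: the `Node` class and the `descendants`, `document_root`, `is_root`, and `owning_form` helpers are hypothetical names invented for this illustration. It assumes a simplified tree in which an `iframe`'s parsed document sits directly beneath the `iframe` element (roughly what the FAQ says `html.parser` tends to produce) and only mimics the documented behavior: descendant traversal crosses `iframe` boundaries, while document-scoped checks in the spirit of `:root` and `:default` stop at them.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass(eq=False)  # eq=False: compare nodes by identity, not by (recursive) field values
class Node:
    """Hypothetical toy element; not Beautiful Soup's Tag."""

    tag: str
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list)

    def append(self, tag: str) -> "Node":
        """Create a child element, attach it, and return it."""
        child = Node(tag, parent=self)
        self.children.append(child)
        return child


def descendants(node: Node) -> Iterator[Node]:
    """Depth-first descendants; iframe boundaries are NOT respected (like combinators)."""
    for child in node.children:
        yield child
        yield from descendants(child)


def document_root(node: Node) -> Node:
    """Climb ancestors, stopping at the top of the tree or directly below an iframe."""
    while node.parent is not None and node.parent.tag != "iframe":
        node = node.parent
    return node


def is_root(node: Node) -> bool:
    """A :root-style check scoped to the document the node belongs to."""
    return document_root(node) is node


def owning_form(node: Node) -> Optional[Node]:
    """Nearest ancestor form in the node's own document (a :default-style scope)."""
    current = node.parent
    while current is not None and current.tag != "iframe":
        if current.tag == "form":
            return current
        current = current.parent
    return None


# Hypothetical markup: <html><body><form><iframe><html><body><button>
outer_html = Node("html")
form = outer_html.append("body").append("form")
inner_html = form.append("iframe").append("html")
button = inner_html.append("body").append("button")

print(button in list(descendants(form)))        # True: a "form button" style match crosses the iframe
print(owning_form(button))                       # None: the outer form is not in the button's document
print(is_root(outer_html), is_root(inner_html))  # True True: each document has its own root
```

Modeling the boundary as "stop climbing when the parent is an `iframe`" keeps traversal and scoping as separate concerns, which mirrors the FAQ's distinction between combinators (unrestricted) and the listed pseudo-classes (evaluated within the element's own document).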