-
-
Notifications
You must be signed in to change notification settings - Fork 40
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
* Prep for 1.9.1 release Fix changelog typo. Add F.A.Q. detailing iframe behavior. Bump version. * Fix spelling * Add test for :nth-child 'of' case sensitivity
- Loading branch information
1 parent
c7929c7
commit ce5a65a
Showing
6 changed files
with
52 additions
and
2 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -49,6 +49,7 @@ multiline | |
namespace | ||
namespaces | ||
newline | ||
parser's | ||
parsers | ||
pre | ||
prerelease | ||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,41 @@ | ||
# Frequent Asked Questions | ||
|
||
## Why do selectors not work the same in Beautiful Soup 4.7+? | ||
|
||
Soup Sieve is the official CSS selector library in Beautiful Soup 4.7+, and with this change, Soup Sieve introduces a | ||
number of changes that break some of the expected behaviors that existed in versions prior to 4.7. | ||
|
||
In short, Soup Sieve follows the CSS specifications fairly close, and this broke a number of non-standard behaviors. | ||
These non-standard behaviors were not allowed according to the CSS specifications. Soup Sieve has no intentions of | ||
bringing back these behaviors. | ||
|
||
For more details on specific changes, and the reasoning why a specific change is considered a good change, or simply a | ||
feature that Soup Sieve cannot/will not support, see [Beautiful Soup Differences](./differences.md). | ||
|
||
## How does `iframe` handling work? | ||
|
||
In web browsers, CSS selectors do not usually select content inside an `iframe` element if the selector is called on an | ||
element outside of the `iframe`. Each HTML document is usually encapsulated and CSS selector leakage across this | ||
`iframe` boundary is usually prevented. | ||
|
||
In it's current iteration, Soup Sieve is not aware of the origin of the documents in the `iframe`, and Soup Sieve will | ||
not prevent selectors from crossing these boundaries. Soup Sieve is not used to style documents, but to scrape | ||
documents. For this reason, it seems to be more helpful to allow selector combinators to cross these boundaries. | ||
|
||
Soup Sieve isn't entirely unaware of `iframe` elements though. In Soup Sieve 1.9.1, it was noticed that some | ||
pseudo-classes behaved in unexpected ways without awareness to `iframes`, this was fixed in 1.9.1. Pseudo-classes such | ||
as [`:default`](./selectors.md#:default), [`:indeterminate`](./selectors.md#:indeterminate), [`:dir()`]( | ||
./selectors.md#:dir), [`:lang()`](./selectors.md#:lang), [`:root`](./selectors.md#:root), and [`:contains()`]( | ||
./selectors.md#:contains) where given awareness of `iframes` to ensure they behaved properly and returned the expected | ||
elements. This doesn't mean that `select` won't return elements in `iframes`, but it won't allow something like | ||
`:default` to select a `button` in an `iframe` whose parent `form` is outside the `iframe`. Or better put, a default | ||
`button` will be evaluated in the context of the document it is in. | ||
|
||
With all of this said, if your selectors have issues with `iframes`, it is most likely because `iframes` are handled | ||
differently by different parsers. `html.parser` will usually parse `iframe` elements as it sees them. `lxml` parser will | ||
often remove `html` and `body` tags of an `iframe` HTML document. `lxml-xml` will simply ignore the content in a XHTML | ||
document. And `html5lib` will HTML escape the content of an `iframe` making traversal impossible. | ||
|
||
In short, Soup Sieve will return elements from all documents, even `iframes`. But certain pseudo-classes may take into | ||
consideration the context of the document they are in. But even with all of this, a parser's handling of `iframes` may | ||
make handling its content difficult if it doesn't parse it as HTML elements, or augments its structure. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters