Support ShadowDOM #555
base: master
Conversation
Thanks for your PR 🚀
Can you provide a test and a way to enable/disable the behavior via the API?
We probably won't use this feature on our side, so we most likely want it to be disabled by default.
If you prefer, I can take over this PR, but it will probably take longer.
src/lib/browser/Page.ts
```ts
const rootAttr = [...document.documentElement.attributes]
  .map(({ name, value }) => `${name}="${value}"`)
  .join(' ');
const innerContent = (document.documentElement as any).getInnerHTML();
```
Shouldn't it need `{ includeShadowRoots: true }`?
Open-mode shadow DOM content can be obtained without passing this parameter. Per the docs: "In order to preserve encapsulation semantics, any closed shadow roots within an element will not be serialized by default."
The default behavior seems to be what we want.
@bodinsamuel Would love to have shadow DOM support for DocSearch since my docs site uses web components. If shadow DOM is not supported by default, is it possible to enable this feature in the Crawler config? Glad you can take over this PR, I don't have much time to perfect it. Thank you for your team's work.
Ah, it's for DocSearch, I wasn't aware. In that case we might want to use it indeed ahah
I have several websites using web components:
I used to use a fork of docsearch-scraper.
baacb1e to b724543
I'm not sure how DocSearch retrieves information from the HTML string. If we use a parser to analyze the HTML and then query through the DOM API, we need to remove the
Update: it seems that the Crawler is using Cheerio, so there is no need to remove the
Use `getInnerHTML`
```ts
@@ -271,7 +271,23 @@ export class BrowserPage {
    return await promiseWithTimeout(
      (async (): Promise<string | null> => {
        const start = Date.now();
        const content = await this.#ref?.content();
```
@bodinsamuel Now using the standard method.
Use `getInnerHTML` (https://web.dev/declarative-shadow-dom/)