`'str' object has no attribute '__name__'` error on some xpath filters #2318

dgtlmoon · 2024-04-18T16:31:03Z

All versions?

Using this shared watch: https://changedetection.io/share/QtZ-94DW41sa

I get a 'str' object has no attribute '__name__' error. I tried different lxml library versions, but that made no difference.

The URL is https://www.depinte.be/werken and the filter is:

//div[1]/div[1]/div[1]/div[1]/div[2]/div[1]/div[1]/div[1]/div[1]/div[1]

It seems to come from here, changedetection.io/changedetectionio/html_tools.py, line 128 in e110b3e:

    r = elementpath.select(tree, xpath_filter.strip(), namespaces={'re': 'http://exslt.org/regular-expressions'}, parser=XPath3Parser)

It is likely elementpath-related.


dgtlmoon · 2024-04-18T16:39:16Z

Tried the latest elementpath (4.4.0), same result.

dgtlmoon · 2024-04-18T16:49:27Z

The error comes from elementpath. I tried different versions, same outcome.

dgtlmoon · 2024-04-18T17:22:58Z

These are the pip package versions in my custom 45.13 container.

Are you saying you can't reproduce the issue?

Constantin1489 · 2024-04-18T17:31:21Z

I can reproduce the problem. But it is quite weird.
With "Playwright Chromium/Javascript via 'ws://127.0.0.1:3000/?stealth=1&--disable-web-security=true'", elementpath works
With "Basic fast Plaintext/HTTP Client", 'str' object has no attribute '__name__'

?????

dgtlmoon · 2024-04-18T17:35:14Z

Then you need to compare the HTML from the Chrome JS-rendered version with the HTML fetched using curl.
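A minimal sketch of that comparison with Python's standard difflib, assuming you have already saved both snapshots as strings (for example the output of curl and the page source captured from the browser). The diff_html name and the sample strings are made up for illustration:

```python
from __future__ import annotations

import difflib


def diff_html(curl_html: str, rendered_html: str) -> list[str]:
    """Return unified-diff lines between two HTML snapshots of the same page."""
    return list(
        difflib.unified_diff(
            curl_html.splitlines(),
            rendered_html.splitlines(),
            fromfile="curl",
            tofile="chrome",
            lineterm="",  # splitlines() already removed the newlines
        )
    )


if __name__ == "__main__":
    # Made-up snapshots: the plain fetch has a stray <a> after </html>,
    # the rendered DOM has it inside <body>.
    plain = '<html><body></body></html>\n<a href="/x">robots</a>'
    rendered = '<html><body><a href="/x">robots</a></body></html>'
    for line in diff_html(plain, rendered):
        print(line)
```

Markup that only exists on one side, such as content after </html> that a browser's HTML5 parser typically moves into <body>, shows up as -/+ lines.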

Constantin1489 · 2024-04-29T08:02:27Z

Hi,
I believe the bug originates in libxml2. See also https://gitlab.gnome.org/GNOME/libxml2/-/issues/716

Constantin1489 · 2024-05-02T11:51:07Z

I found a solution, but I need time to make sure it is correct.

ezalenski · 2024-05-07T01:40:55Z

I took a look at this just to try and brush up on my pdb skills.

The issue here is that lxml believes the HTML from that site is invalid. elementpath.select() assumes it is operating on a non-empty tree and does not handle that case correctly (this is where the exception comes from). One improvement changedetection.io could make here is to check parser.error_log for errors, perhaps only for empty trees, since I'm not sure how noisy error_log is or how often it is non-empty.

Here's where I attached the pdb:

Constantin1489 · 2024-05-07T15:55:33Z

@ezalenski try `python -m pdb -c 'b elementpath/tree_builders.py:229'` and then `p [e for e in elem.itersiblings()]` in pdb. That is the problem. See also https://gitlab.gnome.org/GNOME/libxml2/-/issues/716

Also, please take a look at my test in the PR.

amirt01 · 2024-05-13T03:07:13Z

I encountered the same issue. I'm working around it for now by using XPath 1.0, i.e. by prepending xpath1: to the XPath rule.

Constantin1489 · 2024-05-13T04:59:00Z

Hi @amirt01, if you could provide an example URL, I would be thankful!

amirt01 · 2024-05-13T06:27:01Z

Certainly @Constantin1489! I use changedetection.io to monitor company job sites like those hosted on Lever. I ran into this issue when filtering for the posting names: //*[contains(@data-qa, 'posting-name')]. I was able to remedy this by changing this filter to: xpath1://*[contains(@data-qa, 'posting-name')].

Here is an arbitrary example using Kinsta:
Here is a link to the broken watch config.
Here is a link to the fixed* watch config.

Constantin1489 · 2024-05-13T06:37:23Z

@amirt01 Thank you! The case you reported will be fixed by #2351.

leiless · 2024-05-20T10:31:36Z

I also came across this issue; it's reproducible on my machine.
ChangeDetection version is v0.45.22

The CSS/JSONPath/JQ/XPath filter is something like //*[@id="Foobar"]/div[1].

I'm working around it for now by using XPath 1.0, prepending xpath1: to the XPath rule, just as @amirt01 did.
So it's something like xpath1://*[@id="Foobar"]/div[1]
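For illustration, a hypothetical helper showing how a rule with the xpath1: prefix can be routed to a different engine. The function name and the engine labels are assumptions, and changedetection.io's actual dispatch code may look different; the point is only that the prefixed rule presumably never reaches elementpath's XPath3Parser, which is where the exception is raised:

```python
from __future__ import annotations

XPATH1_PREFIX = "xpath1:"


def split_xpath_rule(rule: str) -> tuple[str, str]:
    """Split a filter rule into (engine, expression).

    Hypothetical: "xpath1:" rules would go to an XPath 1.0 engine
    (e.g. lxml's own xpath()), everything else to an XPath 3.x engine
    (e.g. elementpath with XPath3Parser).
    """
    rule = rule.strip()
    if rule.startswith(XPATH1_PREFIX):
        return "xpath1", rule[len(XPATH1_PREFIX):]
    return "xpath3", rule
```

With this split, the expression text is identical in both cases; only the engine that evaluates it changes.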

Constantin1489 · 2024-05-26T10:17:45Z

@leiless could you run this command, replacing the URL with yours?

URL='https://jobs.lever.co/kinsta/'
curl $URL | xmllint --html - --debug 2> /dev/null | grep 'ELEMENT html'

leiless · 2024-05-27T02:04:57Z

@Constantin1489, here is the ELEMENT html line:

$ curl -fsSL $URL | xmllint --html - --debug 2> /dev/null | grep 'ELEMENT html' -C10
HTML DOCUMENT
encoding=utf-8
URL=-
standalone=true
  DTD(html)
  ELEMENT html
    ATTRIBUTE xmlns
      TEXT
        content=http://www.w3.org/1999/xhtml
    TEXT
      content=
    ELEMENT head
      ELEMENT meta
        ATTRIBUTE http-equiv
          TEXT
            content=Content-Type

Constantin1489 · 2024-05-27T05:02:36Z

Could you please run the command without -C10?

leiless · 2024-05-27T05:57:26Z

$ curl -fsSL $URL | xmllint --html - --debug 2> /dev/null | grep 'ELEMENT html'
  ELEMENT html

Constantin1489 · 2024-05-27T06:03:09Z

Yes, that is the problem I solved with the PR.
@leiless why did you edit your comment? That is exactly the bug.

screenshot

Constantin1489 · 2024-05-27T06:12:05Z

Anyway, if there is something like an iframe, the nested ELEMENT html line doesn't have the same indentation.
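That indentation rule makes the grep check easy to script. A minimal sketch, assuming the text printed by `xmllint --html --debug` has been captured as a string (the function name is hypothetical): it counts only "ELEMENT html" lines indented by exactly two spaces, i.e. direct children of the document node.

```python
def count_root_html_elements(debug_output: str) -> int:
    """Count document-level 'ELEMENT html' nodes in `xmllint --html --debug` output.

    Children of the document node are printed with exactly two spaces of
    indentation; an html element nested deeper (e.g. inside an iframe) has
    more indentation and is deliberately not counted.
    """
    return sum(
        1 for line in debug_output.splitlines() if line.rstrip() == "  ELEMENT html"
    )
```

Feed it the captured output of the curl | xmllint pipeline (via subprocess or a saved file); a result greater than 1 means the parser produced sibling root elements.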

leiless · 2024-05-27T06:42:00Z

> Yes, that is the problem I solved with the PR. @leiless why did you edit your comment? That is exactly the bug.

screenshot

Sometimes I get this:

  ELEMENT html
  ELEMENT html

But usually it's:

  ELEMENT html

Isn't that weird?

dgtlmoon · 2024-05-27T08:04:40Z

@leiless can you include the URL?

Constantin1489 · 2024-05-27T08:08:01Z

I'm not an expert. This is just my explanation, and it may be wrong in places.

This is what the XPath 1.0 spec says (https://www.w3.org/TR/1999/REC-xpath-19991116/#root-node):

> The root node is the root of the tree. A root node does not occur except as the root of the tree. The element node for the document element is a child of the root node. The root node also has as children processing instruction and comment nodes for processing instructions and comments that occur in the prolog and after the end of the document element.

This is similar to how the DOM and the XDM model a document.

The point is that the element node for the document element is a child of the root node (root node != root element node).
If we take this definition literally, XPath 1.0 cannot handle our case. (But you know, everybody uses lxml and libxml2, and everybody benefits from them.)

When we send a document to xmllint, we can expect the repaired HTML document to have exactly one html element (the root element node).

The DOM is also important (https://www.w3.org/2008/08/cleantheweb/libxml):

> When reading the document on the Web (likely to be invalid) and creating the DOM tree, clients have to recover for syntax errors. HTML 5 Parsing algorithm describes precisely how to recover from erroneous syntax.

So when we think of HTML that cannot be fixed any further (my own term; I also call it "complete HTML"), it will have the same structure as the DOM.

So when the HTML 4 spec says "the html element is optional", it is saying something like "I know you will open this HTML document in a browser; I will make lemon juice out of it," and the browser creates the html element to fix the document.

And XPath 1.0, 2.0, 3.0 and 3.1 all expect exactly one root element node.
That is why, when the document you receive has more than one root element, people on Stack Overflow say:

divide it into multiple documents with a regex, OR
wrap it: <root> + the document you receive + </root>
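The wrapping trick is easy to try with the standard library's ElementTree (an XML parser, used here only for illustration; it is not what changedetection.io uses). A second top-level element makes a plain parse fail, while wrapping the text in a synthetic <root> gives one parent with both elements as children. The fragment and helper names are made up, and the wrap only works when the text has no DOCTYPE or XML declaration, since those cannot appear inside an element:

```python
import xml.etree.ElementTree as ET

# Made-up fragment: a complete <html> element followed by a stray <a> element.
FRAGMENT = '<html><body/></html>\n<a href="/robots">robots</a>'


def parses_directly(markup: str) -> bool:
    """Return True if the markup parses as a single XML document."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        # expat reports "junk after document element" for a second root element
        return False
    return True


def parse_wrapped(markup: str) -> ET.Element:
    """Wrap the markup in a synthetic <root> so every top-level element has one parent."""
    return ET.fromstring("<root>" + markup + "</root>")
```

Here parses_directly(FRAGMENT) is False, while parse_wrapped(FRAGMENT) returns a <root> whose children are html and a.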

But the method I chose is to create a new_root object, add the multiple element nodes to it as children, and set a fragment option. It does not re-parse the document, because in this case I have no legitimate basis for forcing it down to a single html root element node.
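The idea of a new parent over the existing nodes, without re-parsing, can also be sketched with ElementTree. This only illustrates the idea: PR #2351 works on lxml/elementpath objects, and the helper name, the fragment-root tag and the stand-in elements below are hypothetical:

```python
import xml.etree.ElementTree as ET
from typing import Iterable


def make_synthetic_root(
    top_level: Iterable[ET.Element], tag: str = "fragment-root"
) -> ET.Element:
    """Attach already-parsed top-level elements to a new, artificial parent."""
    root = ET.Element(tag)
    root.extend(list(top_level))  # no re-parsing: the same element objects are reused
    return root


# Stand-ins for two sibling root elements that a recovering HTML parser produced.
html_elem = ET.fromstring("<html><body><div id='ArticleList1'/></body></html>")
robots_link = ET.fromstring("<a href='/twaf_abc/twaf_abc.html'>robots</a>")

root = make_synthetic_root([html_elem, robots_link])
```

Queries such as root.find(".//div[@id='ArticleList1']") then see both former roots. One difference to keep in mind: an lxml element can only have one parent, so appending it to a new root moves it, whereas ElementTree lets the same element sit under several parents.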

So why does this happen? (Again, this is just my explanation and it may be wrong in places, but it eases my pain.) libxml2 describes its HTMLparser as an "interface for an HTML 4.0 non-verifying parser" (https://gnome.pages.gitlab.gnome.org/libxml2/devhelp/libxml2-HTMLparser.html). If you follow the link, you can see what an HTML element actually is there: XML elements and XML nodes. lxml is the same. (EDIT ADD: https://gnome.pages.gitlab.gnome.org/libxml2/devhelp/libxml2-HTMLparser.html#htmlNodePtr)

That is why XPath 2.0 through 3.1 work when only one html root element exists: internally, the nodes are XML nodes.

So, BTW, why does libxml2 have an HTML 4.0 non-verifying parser at all?
Maybe because at that time there were browser wars and multiple parsing rules, so many web developers sent HTML with wrong syntax. Perhaps that led to the development of the XHTML and HTML5 specs. (But you know, everybody uses lxml and libxml2, and everybody benefits from them.)

I think it is what it is.

Also, I have already reported this issue, so you don't have to.

leiless · 2024-05-27T12:35:59Z

> @leiless can you include the URL?

https://www.pdrcfw.com/OurNews.aspx

XPath Filters: //*[@id="ArticleList1"]/div[1]

Constantin1489 · 2024-05-27T13:02:10Z

I got the result, but when I tried to investigate further, I got blocked?

screenshot

Constantin1489 · 2024-05-27T13:13:33Z

BTW, my private changedetection build has an HTML source API, and it shows this at the end of the page:

</body>
</html>
<a href="/twaf_abc/twaf_abc.html" style="display:none">robots</a>

This is the problem.

html source code


<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head id="Head1"><meta http-equiv="Content-Type" content="text/html; charset=utf-8" /><meta content="keywords" name="浦东人才服务网,本站动态 " /><title>
	浦东人才服务网
</title><link href="html/css/base.css" rel="stylesheet" />
    <script src="/html/js/jquery-14B.min.js"></script>

</head>
<body>
    <form name="form1" method="post" action="./OurNews.aspx" id="form1">
<div>
<input type="hidden" name="__EVENTTARGET" id="__EVENTTARGET" value="" />
<input type="hidden" name="__EVENTARGUMENT" id="__EVENTARGUMENT" value="" />
<input type="hidden" name="__LASTFOCUS" id="__LASTFOCUS" value="" />
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="5cYKOKOOhN0W1gP3c88uQTQVi/lOR1kK21r41rSM4fCaIjMOQeGBE/HfQYEi66clZwliUggasZvPtqt4L29UJwZW6opyTArBYHeLzlwomDNA7BVS/WUoE2+ZqOdL1F8JI+ZTjrspixbn5WScFmk+8i7B+vOFZDRdqeGbcF3QYfCRKmynyw4zWJHSmBC7mUB5MjKTPBt19EfKq3LOjndQhqonc9OvjcuONtIUn/tPn6poSYFNADteTgn6pSDqVhybSdmx8zU9wh9v/Li5LmhUXii1z8yUZ3GoXSM02c281lPrhksAWQrtpwf1hD/B+MFhtyfIzIrF4X5ouBbr+evSqencJ7XKDcfLU2rGKkK/HaEMZf2z60IQfZQ9cuAAcSLA8vy9tp/mZgyt5WLVcFx3tIFdr3LOP03ixkbN4uK7lmY4DeSo32jAKxcyhgcofQozBzjPH7pmx8bbl32GWXbSm5c+2bYO04zmPcOgO5VMRO8zK4dypB9LPvtg7KUyJp/eQZ59lnQBiCXNzO627uSeG4/5k1h53xvtfqdUSpJf2tmtN1BhG+OEZefmD5fZG+o9huy2B8s4sAGlmobQikiO7bXxnG3iHAonTPiCa1CqyIA22DApJjbQjIj8Srh/OeFNV/1biS1hVht69eK6NE2tlbirXCvzTPKORmGQUSyu4S724WuoBZGriZL/1wMNSkm35lrduI81wUd/aA7RK4aF+d74cL0v+Dky0UiabB7qgMGAh3Uf6+hEnKG8Y1r/gvIkpAZIjDSljB3YL7C5fEJPYhdpoCts9Z4y3lvg98CtdhgROsgyu9xqHMPWFTeExZj3z6HlsMxMVcwcXU1KoQKh33KvFEp17ehsgYRHvLg7YSMajtwKh/veNHGlh0PSps0PVYRdIfdkA72hIc6bwaNPOvJsvknREFkILm2QDuxRityU7PVTeFoY2ILPd3+D3pfCCmwuS/WKdGy0EEkDqN8waA3F9jXVqj6ay8yh5VvY7+Go0KZdy//UUp2zeyP8rTgAyeuDDvt0pWGCq+FEzcHFQxIt5zjuFb02HJlBbHirpWAkLOWK0w0VEe5SmnijxYuhAG/1gj5t2cQKowAuXUBqT25BfCZyG2PlabeYrj282b06MWelLAP0Ga61qYNsPUeykhJLKlNEKAI/sSVehRIxY3+Wg8v6v/LKaSOjXHNPj6D+bm/n+JuxnIp0kRj1PBuWvGRTgyvJr1cH8m4DOjIa16vo2/2xSGT54GQbKyL6TMkCux5z06Y7Z2qx4eiTVi/Tqnlp6kO4MaMBIFFOuoTKLmyF+APt6CUnv2eyzaGMWIxhWubBmdE+N8JrqOTUlPj0lNfXFXDI9FQiyITmopR6/pyLr/VXlaKdg7T1jZ129/Vq6pCDPWWucp/qr6bGY1IE/DG+0ETeb1Vx0vuQLUhtpaTvQuI5nM7daa+R1Wchx19m1B7r8GGhACEVlqfdxtRbi6rAlyENOs6q5PH7VxZfdeIjMkLpiFawkRFwLb6Pi/5zaUr3YwzXN06GrAsBh+YuNM1xO6nIVDGx3zdDNUgzAnTM2KTzZfTTUQIx5GhfBwu7Kz9DolseINJ8RmfrJKnBBRVHEXMx47yIKXwSDsTSFdkWe+p3lGKV0x/qf8/G5WFUZRvk1HNCasUAGaOo0hkNGewl4kO9/djqS8ddJZVTdZTxfaQwUbEAiEvn0DrwTqEAEvo6nWsqByb9H/P0nyqcQagjnCjK4rd8ahzZFsMNichoaC7MFH76S+5wC7IqKYL8Xc2K+LekVsPq6m+iZXgvDB24rcYVetfDxdBgavM+8tH9k26VcAT+LhNS4WqyVMBLdrIbnw+UEX+UsxIHqgTfOuHPCja+wg3uTXN+0VVsn1vm5mXD3YB1RhirEjEPhdwMSpR2MSQhUkil4FmQXjmevxpcjnz8RFLNkC25
8+siU3eaAOXdYQXN/8oFaUY6ZAcgnBRiMCzT0zFPbESTacZ6Psr6qyMfytgrloTLalbicuwBrxJmTJWIvisclo+snoJyvgU9d+lf38Y/Ld3MUIHaoYd14s1WtIUWNgrZorHB4E+raUQOxFJ3+2E8/yrWoKht1f0Dqe2VV0qAMNwh1tfuF1oy70zJtgeM5hNdxf06RfXcvEdwd+30p3E3q62dyjmhn5qzQbz2JKe4fmAPUMqZiuzzcj+96GyMhgcLBc9/P+DJqxozHIuJmzmQ3KYhBoziMOdIicNUv6cxEOUH+TrTWzKafA/i347/ouz2676VHtJgDMwpy/QessaNjNWI1RgbwjhMFL/r1cNT+7v2YcwvSsyYs/d7CsU3MkKvsjW+EUQjHzPvHyZySzRLtDUWUQP/350gV9YHrcOxvbtO8eT5OHLqan1V7niA5Yw6c2oCaUs0K8Cwlw3VqH7J2PNeyxmKyMPdVU0lEzDWtCNc5yEH163ysP+cCTtGnH+BVkwcVbbDHiKJXzHiiEdZ4kcGLOwW4fFFGzJ9izLHbbxkYK1WcubtBi/NRsaF0EQNxa+TtJKynyfK5IWInwKJ4ZqSw4R0f8nUiyROQcx2HUjll60clhJraeehQVGnhlkVWuLtuHi0FvN5aEpDU3rsDG+cZduzA7lVHOP6DHs37MsE2rj2EADDbMKPu4VnePBzpq8rvW0fk5leTkfqQH9EaOwAAFKgUWNnrF7VqG7cOAFHRGjfonusdkNkCUC6X+obEdp0jdn6vTwhcUNpDhizNrLl7ZKKO34odDq5hXDYD0OvkTWFiWKhpbykZ31kdHBUKaZ5oQisGyNaQ6+cGaYFIhd5bCMy6GQjfQnbPhPFvFBrfYr4O/wEQd24mMqaexR6d6BaQ/aa2A6fQhcGxBCgzv96n9cgDLBKnTaqCY0GarmUj63S/tsvhZtX4WLAXXhefaD86hGx4rlhDNhda4QyJuU0j9FQgsiQDmevfiCl23U1oKAvxYVFDRyeHqDx/bAdzgVXwkd2apiaHDiVm1plv80Uh/wXsaGgpfShV2xwSDYQCeggYw51jR2iveJVzWxbXKvE7GySyDx35WfkUGVvY1svH8D6j6xLGeDjUTBuZyo2Eufy9pQnn8fmfgVT2Y/p9RaUCfcoU6zaHU7Lsx3xrEPlPzEjLUpF2nHSgt/eaAPOfGnul2HUj2LyeJBT3GFCHnJ1eYgLZetEaqFz9twttWoP1daXzCMfrkod1/myv1vo+OL2HG4RXsUina72zzpIlhErwmoas/Q=" />
</div>

<script type="text/javascript">
//<![CDATA[
var theForm = document.forms['form1'];
if (!theForm) {
    theForm = document.form1;
}
function __doPostBack(eventTarget, eventArgument) {
    if (!theForm.onsubmit || (theForm.onsubmit() != false)) {
        theForm.__EVENTTARGET.value = eventTarget;
        theForm.__EVENTARGUMENT.value = eventArgument;
        theForm.submit();
    }
}
//]]>
</script>


<div>

	<input type="hidden" name="__VIEWSTATEGENERATOR" id="__VIEWSTATEGENERATOR" value="FF60E101" />
	<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="DHfxOK9Grhk5o01Yfy+LTxBzkE12A9nVEeiZkUE0MkkR7o3Xz/5hYn7pwDYNQ9frVVp+13T89d498sshdws7g+/jt15UXB18cFyurNqV+Uykkf5Eud945Ci2Qqagv7kMc5Ik2O2wql0TkwskZNwoBvd184Eu5eJv7Njx91E8P+jPTFSpQW+6OE3ixAX/DjQq0FIN+BqeFbxFz83RHDjjPOkTpEmPYwM1JpjO2fg6HLk0uT60FPpBPHy2+APYz/hg9nc1XwjHis4dGZyzcPeXHuV3Kgi9L2Vk+rmxH83JMPhSDvR5q3WZWLdgik0hqiC4IKIu5H5LSIzHEUnPLzqhQx53oSY0gU51vTiZO4MWXaaMo/61ag/BQBylRli9WGb0hhcUYQs2v5vBvkgPWOcNHVI6Jlz7BOQM47D//c0fEV4rJ95xM2rm2/jcHbThgaMYt20bIcjkR1C64COYeQSGkVchQWBiy5nUtZt7qPzyVXB2Tw9/ct5W2x0hUzny4nAXgadfmQmlSnG3qFnjCzk41SBG70Bb60F42aqF8soobSxN6qf1qrEQyJVNKfSfWj+/Zevyu1ok3jCuRRhsbxux2JS13WGlTPMeLVfZgA9xBzFRSjdjTBoL01OCAd3tShZwxXykcQyDMZwv5K5GZB5bnHnCkq1uKjLtCDkv9Sm03dd/HbxyvWHuUKkVugk3Ip+mPAK2p3YBBTvXKO4VT4R/XMmbZqne6+87/ge34aex5KfY32c29n/gF8sOuAKmC6MHHYHs+1VhmXDBEHjY03CQaDwnOMvtOkp9YqML2+j37IRQqKVhRyaQOunwX/RUrJF1Wk1QlMbs3KcPuQ83yY9lJLPnbPYJv0LDXVEE15r6+10QG/XBcQ/VC75WpOfgM/AX3f4bJyNF3T+Tlf5fNW+npAgkEcizORImbsX/jRQsLDmHKjhIEh5VYrHM8e5WcH5bB/NNr8cHlzzwk0Zfo6H3XOz+KzX7icjyNZVW9z9XTTIRyAwFB86ecUYWzvL1j50R+Bv3Me5PIewqEw3EjG8G9I2Kwwwbbud5u0gvhhD6deVtv9f8Y/SQ3P5b07fpgrxk9mYmPqXRM8ISNaMloesUZiuDWUVzxORzfrA01pV7aHJMvWU16HL4vv7iR2xDAXgX+2MJT4xRpFAo72KbmC7dGcAOlYtiW3TYo/NSFBQpwuZMS8BtRhu65kmxGeLPMnPRHaEZpYekjhBCpdynp4ttYFsTZI97VWUHiwXpZhSx7n5wA3hhG6Ifm3bDeqjPZPIcUGa6p+uFHYUncvSATRNiyMdHymmOaU3MxVQsPMxKAxUpDnAq3o6xMIUpkMrJV7Qn7KpW7ebvCBpEtw0giN13KreFkNs=" />
</div>
<script src="/html/js/head.js"></script>

<div style="height:30px;width:960px;margin-top:30px;">
    <div style="margin-left:560px;width:550px;height:22px;">
        <div class="head_menu_top" itemref="/QuickQuery.aspx?querytype=xueli">便民查询</div>
       
        <div class="head_menu_top_vl"></div>
        <div class="head_menu_top" itemref="/LineStatus.aspx">窗口受理动态</div> 
        <div class="head_menu_top_vl"></div>
        <div class="head_menu_top" itemref="/Download.aspx">表格下载</div> 
        <div style="width:115px;float:left;height:22px;margin-left:10px;"><input id="head_search_txt" type="text" style="width:110px;height:18px;font-size:9pt;border:solid 1px #e7e5e5" /></div>
        <div style="width:17px;float:left;height:22px"><img src="/html/img/pdrcfwsearch.jpg" alt="" id="head_search_btn" style="cursor:pointer;" /></div>
    </div>
</div>
<div class="head_div">
    <div style="width:136px;height:80px;float:left;overflow:hidden"><a href="/"><img src="/html/img/pdrcfwlogo_r.jpg" style="border-width:0px"/></a></div>
    <div style="width:824px;height:80px;float:right;overflow:hidden">
        <div style="width:824px;height:54px;overflow:hidden;background:url('/html/img/new_pdrcfwmenutop.jpg') no-repeat;">
            <ul>
                <li style="width:23px;height:54px"></li>
                <li class="head_btn" itemref="/ServiceIndex.aspx?ServiceType=210&PageType=1"></li>
                <li class="head_space"></li>
                <li class="head_btn" itemref="/ServiceIndex.aspx?ServiceType=F10&PageType=1"></li>
                <li class="head_space"></li>
                <li class="head_btn" itemref="/ServiceIndex.aspx?ServiceType=310&PageType=1"></li>
                <li class="head_space"></li>
                <li class="head_btn" itemref="/ServiceIndex.aspx?ServiceType=110&PageType=1"></li>
                <li class="head_space"></li>
                <li class="head_btn" itemref="/ServiceIndex.aspx?ServiceType=E10&PageType=1"></li>
                <li class="head_space"></li>
                <li class="head_btn" itemref="/ServiceIndex.aspx?ServiceType=530&PageType=1"></li>
                <li class="head_space"></li>
                <li class="head_btn" itemref="/LiuChuangYuan.aspx"></li>
                <li class="head_space"></li>
                <li class="head_btn" itemref="/ServiceIndex.aspx?ServiceType=810&PageType=1"></li>
            </ul>
        </div>
        <div style="width:824px;height:26px;overflow:hidden;background:url('/html/img/new_pdrcfwmenuK.jpg') no-repeat">
            <ul>
                <li style="width:23px;height:26px"></li>
                <li class="head_menu_btn" itemref="/ServiceIndex.aspx?ServiceType=410&PageType=1">分居调沪</li>
                <li class="head_menu_btn" itemref="/ServiceIndex.aspx?ServiceType=910&PageType=1">职称评审</li>
                <li class="head_menu_btn" itemref="/ServiceIndex.aspx?ServiceType=A10&PageType=1">事业单位聘用登记</li>
                <li class="head_menu_btn" itemref="/ServiceIndex.aspx?ServiceType=G10&PageType=1">海外人才居住证</li>
                <li class="head_menu_btn" itemref="/ServiceIndex.aspx?ServiceType=630&PageType=1">留学生创业资助</li>
                <li class="head_menu_btn" itemref="/ServiceIndex.aspx?ServiceType=710&PageType=1">人才项目申报</li>
                <li class="head_menu_btn" itemref="/OurNews.aspx">动态发布</li>
                <li class="head_menu_btn" itemref="/ServiceCases.aspx">案例精选</li>
                <li class="head_menu_btn" itemref="/PublicityList.aspx">公示名单</li>
            </ul>
        </div>
    </div>
</div>
<div style="height:40px"></div>
    <div style="width:960px">
       <div>
            <div class="div_left">
               <div style="width:185px;height:45px;line-height:45px;color:white;background:url('/html/img/pdrcfwtitle.jpg') no-repeat;text-align:center;font-size:19px;font-family:SimHei;font-weight:bold">动态发布</div>
                <div>
                                            <div class="div_dot_r"><a href="OurNews.aspx?ServiceType=000">
人才服务资讯</a></div>                        <div class="div_dot_r"><a href="OurNews.aspx?ServiceType=100">
居住证积分</a></div>                        <div class="div_dot_r"><a href="OurNews.aspx?ServiceType=200">
直接落户</a></div>                        <div class="div_dot_r"><a href="OurNews.aspx?ServiceType=300">
居住证转户籍</a></div>                        <div class="div_dot_r"><a href="OurNews.aspx?ServiceType=600">
留学生创业资助</a></div>                        <div class="div_dot_r"><a href="OurNews.aspx?ServiceType=800">
博士后</a></div>                        <div class="div_dot_r"><a href="OurNews.aspx?ServiceType=900">
职称评审</a></div>                        <div class="div_dot_r"><a href="OurNews.aspx?ServiceType=E00">
外国人来华工作许可</a></div>                        <div class="div_dot_r"><a href="OurNews.aspx?ServiceType=F00">
留学回国人员落户</a></div>                        <div class="div_dot_r"><a href="OurNews.aspx?ServiceType=L00">
张江留创园</a></div>
                </div>
                <div style="margin-top:50px">
                    <img src="/html/img/wx.jpg" />
                </div>
            </div>
            <div class="div_right">
                <div style="background:url('/html/img/pdrcfwgncd.jpg') no-repeat;width:729px;height:39px;overflow:hidden;margin-bottom:6px;line-height:39px;">
                    <a href="OurNews.aspx" style="font-size:18px;color:#d7281e;margin-left:24px;font-weight:bold;">动态发布</a>
                </div>
                <div style="width:726px;height:6px;background:url('/html/img/pdrcfwlisttop.jpg') no-repeat;overflow:hidden"></div>
                <div style="width:724px;border-left:solid 1px #d2ccca;border-right:solid 1px #d2ccca;overflow:hidden" id="div_content">
                    <div id="list" style="padding-left:20px;padding-top:20px">
                    <div id="ArticleList1">
	<div style="width:680px;margin:5px auto;line-height:150%">
		<div style="width:680px;height:160px;border-bottom:solid 1px #cccccc;overflow:hidden;padding-top:15px;"><div style="width:180px;float:left;overflow:hidden;text-align:center;"><a href='OurNews.aspx?ServiceType=900&id=2307' title='关于浦东新区开展2024年度本市正高级经济师推荐材料受理工作的通知' target='_blank'><img src="/files/pdrcfw/tongzhi2106.jpg" style="width:152px;height:108px;border-width:0px;"/></a></div><div style="width:500px;float:left;overflow:hidden"><a href='OurNews.aspx?ServiceType=900&id=2307' title='关于浦东新区开展2024年度本市正高级经济师推荐材料受理工作的通知' target='_blank'><span style="font-size:12pt;font-family:黑体 宋体;font-weight:900;">关于浦东新区开展2024年度本市正高级经济师推荐材料受理工作的通知</span></a><div style="font-size:10pt;color:gray">&nbsp;[2024-4-12]</div><div style="font-size:10pt;line-height:150%">关于浦东新区开展2024年度本市正高级经济师推荐材料受理工作的通知</div></div></div><div style="width:680px;height:160px;border-bottom:solid 1px #cccccc;overflow:hidden;padding-top:15px;"><div style="width:180px;float:left;overflow:hidden;text-align:center;"><a href='OurNews.aspx?ServiceType=all&id=2275' title='积分、落户码上测' target='_blank'><img src="/files/pdrcfw/浦东人才服务小程序码235s.jpg" style="width:152px;height:108px;border-width:0px;"/></a></div><div style="width:500px;float:left;overflow:hidden"><a href='OurNews.aspx?ServiceType=all&id=2275' title='积分、落户码上测' target='_blank'><span style="font-size:12pt;font-family:黑体 宋体;font-weight:900;">积分、落户码上测</span></a><div style="font-size:10pt;color:gray">&nbsp;[2023-11-16]</div><div style="font-size:10pt;line-height:150%">积分、落户码上测</div></div></div><div style="width:680px;height:160px;border-bottom:solid 1px #cccccc;overflow:hidden;padding-top:15px;"><div style="width:180px;float:left;overflow:hidden;text-align:center;"><a href='OurNews.aspx?ServiceType=all&id=2273' title='“聚焦产业促校企牵手，释放博士后创新活力”博士后创新创业能力发展暨企业技术需求交流会顺利举行' target='_blank'><img src="/files/pdrcfw/交流会1s.png" style="width:152px;height:108px;border-width:0px;"/></a></div><div style="width:500px;float:left;overflow:hidden"><a 
href='OurNews.aspx?ServiceType=all&id=2273' title='“聚焦产业促校企牵手，释放博士后创新活力”博士后创新创业能力发展暨企业技术需求交流会顺利举行' target='_blank'><span style="font-size:12pt;font-family:黑体 宋体;font-weight:900;">“聚焦产业促校企牵手，释放博士后创新活力”博士后创新创业能力发展暨企业技术需求交流会顺利举行</span></a><div style="font-size:10pt;color:gray">&nbsp;[2023-11-1]</div><div style="font-size:10pt;line-height:150%">“聚焦产业促校企牵手，释放博士后创新活力”博士后创新创业能力发展暨企业技术需求交流会顺利举行</div></div></div><div style="width:680px;height:160px;border-bottom:solid 1px #cccccc;overflow:hidden;padding-top:15px;"><div style="width:180px;float:left;overflow:hidden;text-align:center;"><a href='OurNews.aspx?ServiceType=all&id=2268' title='2023“留·在上海 全球留学人员创新创业大赛”北美赛道半决赛成功举办' target='_blank'><img src="/files/pdrcfw/关注2023s.png" style="width:152px;height:108px;border-width:0px;"/></a></div><div style="width:500px;float:left;overflow:hidden"><a href='OurNews.aspx?ServiceType=all&id=2268' title='2023“留·在上海 全球留学人员创新创业大赛”北美赛道半决赛成功举办' target='_blank'><span style="font-size:12pt;font-family:黑体 宋体;font-weight:900;">2023“留·在上海 全球留学人员创新创业大赛”北美赛道半决赛成功举办</span></a><div style="font-size:10pt;color:gray">&nbsp;[2023-10-25]</div><div style="font-size:10pt;line-height:150%">2023“留·在上海 全球留学人员创新创业大赛”北美赛道半决赛成功举办</div></div></div><div style="width:680px;height:160px;border-bottom:solid 1px #cccccc;overflow:hidden;padding-top:15px;"><div style="width:180px;float:left;overflow:hidden;text-align:center;"><a href='OurNews.aspx?ServiceType=all&id=2269' title='2023最具潜力的海归创业团队大赛报名即将截止' target='_blank'><img src="/files/pdrcfw/海归创业团队s.jpg" style="width:152px;height:108px;border-width:0px;"/></a></div><div style="width:500px;float:left;overflow:hidden"><a href='OurNews.aspx?ServiceType=all&id=2269' title='2023最具潜力的海归创业团队大赛报名即将截止' target='_blank'><span style="font-size:12pt;font-family:黑体 宋体;font-weight:900;">2023最具潜力的海归创业团队大赛报名即将截止</span></a><div style="font-size:10pt;color:gray">&nbsp;[2023-10-12]</div><div style="font-size:10pt;line-height:150%">2023最具潜力的海归创业团队大赛报名即将截止</div></div></div>
	</div><div style="font-size:9pt;text-align:center;margin-top:20px;">
		第1页&nbsp; 共51页&nbsp;总计253条&nbsp;<a disabled="disabled">【上页】</a><a href="javascript:__doPostBack(&#39;ctl14&#39;,&#39;&#39;)">【下页】</a>&nbsp;至<select name="ctl16" onchange="javascript:setTimeout(&#39;__doPostBack(\&#39;ctl16\&#39;,\&#39;\&#39;)&#39;, 0)">
			<option selected="selected" value="1">1</option>
			<option value="2">2</option>
			<option value="3">3</option>
			<option value="4">4</option>
			<option value="5">5</option>
			<option value="6">6</option>
			<option value="7">7</option>
			<option value="8">8</option>
			<option value="9">9</option>
			<option value="10">10</option>
			<option value="11">11</option>
			<option value="12">12</option>
			<option value="13">13</option>
			<option value="14">14</option>
			<option value="15">15</option>
			<option value="16">16</option>
			<option value="17">17</option>
			<option value="18">18</option>
			<option value="19">19</option>
			<option value="20">20</option>
			<option value="21">21</option>
			<option value="22">22</option>
			<option value="23">23</option>
			<option value="24">24</option>
			<option value="25">25</option>
			<option value="26">26</option>
			<option value="27">27</option>
			<option value="28">28</option>
			<option value="29">29</option>
			<option value="30">30</option>
			<option value="31">31</option>
			<option value="32">32</option>
			<option value="33">33</option>
			<option value="34">34</option>
			<option value="35">35</option>
			<option value="36">36</option>
			<option value="37">37</option>
			<option value="38">38</option>
			<option value="39">39</option>
			<option value="40">40</option>
			<option value="41">41</option>
			<option value="42">42</option>
			<option value="43">43</option>
			<option value="44">44</option>
			<option value="45">45</option>
			<option value="46">46</option>
			<option value="47">47</option>
			<option value="48">48</option>
			<option value="49">49</option>
			<option value="50">50</option>
			<option value="51">51</option>

		</select>&nbsp;每页<select name="ctl18" onchange="javascript:setTimeout(&#39;__doPostBack(\&#39;ctl18\&#39;,\&#39;\&#39;)&#39;, 0)">
			<option selected="selected" value="5">5</option>
			<option value="20">20</option>
			<option value="30">30</option>
			<option value="50">50</option>

		</select>
	</div>
</div>
                     

                    </div>
                </div>
                <div style="width:726px;height:6px;background:url('/html/img/pdrcfwlistbottom.jpg') no-repeat;overflow:hidden"></div>
            </div>
        </div>

    </div>
    
<div style="clear:both;height:30px;overflow:hidden"></div>
<div style="width:960px;background-color:#3f548a;height:96px;overflow:hidden;">
    <div style="float:left;width:480px;margin-left:15px;padding-top:15px;color:white;line-height:130%;font-size:14px;height:70px;">
        <a href="ShowDetail.aspx?id=12" style="color:white">关于我们</a> ┊ <a href="ShowDetail.aspx?id=13" style="color:white">联络我们</a> ┊ <a href="FriendSiteList.aspx" style="color:white">友情链接</a> ┊ <a href="#" style="color:white" onClick="javascript:window.external.AddFavorite('http://www.pdrcfw.com/',document.title)">加入收藏夹</a> 　　<br />
        联系电话：021-58603333　投诉电话：021-58602628；<a href="AdviceWeb.aspx?ServiceType=ZZZ" style="color:white">我来说两句</a><br />
        © 2015-现在 上海市浦东新区人才服务中心 版权所有<br />
        <a target="_blank" href="https://beian.miit.gov.cn/" style="display:inline-block;text-decoration:none;height:20px;line-height:20px;"><p style="float:left;height:20px;line-height:20px;margin: 0px 0px 0px 0px; color:#939393;">沪ICP备11006370号-3</p></a>　
        <a target="_blank" href="http://www.beian.gov.cn/portal/registerSystemInfo?recordcode=31011502000473" style="display:inline-block;text-decoration:none;height:20px;line-height:20px;border:0px;"><img src="/html/img/gongan.png" style="float:left;border:0px"/><p style="float:left;height:20px;line-height:20px;margin: 0px 0px 0px 5px; color:#939393;">沪公网安备 31011502000473号</p></a><br />
    </div>
    <div style="width:310px;height:79px;float:left;padding-top:9px;margin-left:30px">
        <img border="0" src="/html/img/qcodeclear.jpg" title="欢迎关注微信公众号【浦东新区人才服务中心】"/>
        <img border="0" src="/html/img/allminipro.jpg" title="积分、落户码上测，微信扫码获取" style="margin-left:5px" />
    </div>
    <div style="width:80px;height:80px;float:left;padding-top:10px;">
        <script type="text/javascript">document.write(unescape("%3Cspan id='_ideConac' %3E%3C/span%3E%3Cscript src='https://dcs.conac.cn/js/02/032/0000/60563371/CA020320000605633710002.js' type='text/javascript'%3E%3C/script%3E"));</script>
    </div>
</div>

<span></span>
<script language="javascript" type="text/javascript" > var  sUserAgent = navigator.userAgent;var scr = screen.width +  "*" + screen.height;var reUrl=  escape(this.document.referrer);var url = escape(window.location.pathname);var traurl = "/reach.aspx";function traffic(){traurl = traurl + "?CurrentPage=OurNews&scr=" + screen.width + "*" + screen.height + "&reUrl=" + reUrl + "&lurl="+ url + "&sUserAgent=" + sUserAgent;if(document.images){(new Image()).src = traurl ;} else {document.write('<img border="1" name="trImg" width="1" height="1" src="'+traurl +'">');}return true;}traffic();</script></form>
</body>
</html>
<a href="/twaf_abc/twaf_abc.html" style="display:none">robots</a>

Running my command on it shows two html root elements.

ADD:

Here is a small HTML snippet that reproduces it:

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head id="Head1"><meta http-equiv="Content-Type" content="text/html; charset=utf-8" /><meta content="keywords" name="浦东人才服务网,本站动态 " /><title>
	浦东人才服务网
</title><link href="html/css/base.css" rel="stylesheet" />
    <script src="/html/js/jquery-14B.min.js"></script>

</head>
<body>
</body>
</html>
<a href="/twaf_abc/twaf_abc.html" style="display:none">robots</a>

leiless · 2024-05-27T15:10:51Z

Yeah, definitely it's because the HTML source code is malformed (code after the final </html>).
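Since the breakage comes from markup after the final </html>, a cheap heuristic could flag such pages before any XPath runs. A minimal sketch, assuming a hypothetical trailing_after_html helper (not part of changedetection.io); it can be fooled by a literal </html> inside a script string or comment:

```python
import re

# Closing html tag, case-insensitive, allowing whitespace before ">".
_CLOSE_HTML = re.compile(r"</html\s*>", re.IGNORECASE)


def trailing_after_html(markup: str) -> str:
    """Return the non-whitespace content after the last </html> tag, or ''."""
    matches = list(_CLOSE_HTML.finditer(markup))
    if not matches:
        return ""
    return markup[matches[-1].end():].strip()
```

On the minimal reproduction above, this returns the stray robots <a> element; a non-empty result is a hint that the parser may produce sibling root elements.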

Constantin1489 · 2024-05-28T05:14:42Z

@leiless I tried again today, and now it shows:

"Your access has been identified as an attack and logged"

So this may be why you sometimes get only one html root element.

Constantin1489 · 2024-05-28T05:19:46Z

So the non-blocked version (the HTML source code in my earlier comment on #2318) shows that the other root element is a sibling of the root element.


leiless · 2024-05-28T06:11:39Z

@Constantin1489 Great analysis! I wonder whether this bug will be fixed in the next release?

Constantin1489 · 2024-05-28T06:25:21Z

I submitted my PR. However, the maintainer needs time to ensure the PR is the solution.

dgtlmoon · 2024-05-28T09:17:12Z

https://www.pdrcfw.com/OurNews.aspx

has a correct <html> opening tag

but then...

</body>
</html>
<a href="/twaf_abc/twaf_abc.html" style="display:none">robots</a>

dgtlmoon added the triage label Apr 18, 2024

dgtlmoon self-assigned this Apr 18, 2024

dgtlmoon mentioned this issue Apr 18, 2024

module 'lxml.etree' has no attribute '_ElementStringResult' error since v0.45.18 #2312

Closed

dgtlmoon added bug Something isn't working Change detection algorithms filters and removed triage labels Apr 18, 2024


Constantin1489 linked a pull request May 7, 2024 that will close this issue

XPath3.1: mimic handling of multiple root element nodes #2351

Open
