testing fails on unix where python is not available (but only python3) #276

Open
kloczek opened this issue May 11, 2022 · 27 comments · Fixed by #302
Labels
Bug Bugs which need fixing Hacktoberfest

Comments

@kloczek

kloczek commented May 11, 2022

I'm trying to package your module as an rpm package, so I'm using the typical PEP 517 based build, install and test cycle used when building packages from a non-root account.

  • python3 -sBm build -w --no-isolation
  • because I'm calling build with --no-isolation, only locally installed modules are used throughout the whole process
  • install .whl file in </install/prefix>
  • run pytest with PYTHONPATH pointing to sitearch and sitelib inside </install/prefix>

Here is pytest output:

+ PYTHONPATH=/home/tkloczko/rpmbuild/BUILDROOT/python-pypandoc-1.8-2.fc35.x86_64/usr/lib64/python3.8/site-packages:/home/tkloczko/rpmbuild/BUILDROOT/python-pypandoc-1.8-2.fc35.x86_64/usr/lib/python3.8/site-packages
+ /usr/bin/pytest -ra
=========================================================================== test session starts ============================================================================
platform linux -- Python 3.8.13, pytest-7.1.2, pluggy-1.0.0
rootdir: /home/tkloczko/rpmbuild/BUILD/pypandoc-1.8
collected 0 items

========================================================================== no tests ran in 0.01s ===========================================================================
@JessicaTegner
Owner

Totally honest here, I don't know much about rpm packages.

How do you get the new release? Through pip, right?
Do you clone the repo, or use the official release through pypi?

@kloczek
Author

kloczek commented May 11, 2022

That issue has nothing to do with rpm.
You can reproduce it using the procedure I've described.
Just please run pytest.

@JessicaTegner
Owner

Ohh. The reason is that examples, documentation files and tests have been removed from the PyPI release as of version 1.8, because of some conflicting names when installing.

@kloczek
Author

kloczek commented May 11, 2022

I'm not using the PyPI sdist but the tarball autogenerated from the git tag: https://github.com/NicklasTegner/pypandoc/archive/refs/tags/v1.8.tar.gz

@JessicaTegner
Owner

It's because of how the package is built from the pyproject.toml file.
In 1.8 we removed the tests and other files from the build.

I would suggest downloading and extracting the tar.gz, then running the tests, and lastly creating the wheel.

@kloczek
Author

kloczek commented May 11, 2022

I see tests.py inside the tarball.

@JessicaTegner
Owner

Yes, but they aren't included when you build: when the whl gets produced, they are left out.
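
One way to check what actually ends up in a built wheel is to list its members, since a wheel is a zip archive. A minimal sketch; the helper names `wheel_members` and `wheel_ships_file` are hypothetical and not part of pypandoc or of any packaging tool:

```python
import zipfile


def wheel_members(wheel_path):
    """Return the file names stored in a wheel (a wheel is a zip archive)."""
    with zipfile.ZipFile(wheel_path) as archive:
        return archive.namelist()


def wheel_ships_file(wheel_path, filename):
    """True if any member of the wheel has `filename` as its last path component."""
    return any(
        name.rsplit("/", 1)[-1] == filename
        for name in wheel_members(wheel_path)
    )
```

Calling `wheel_ships_file(path_to_whl, "tests.py")` on the wheel produced by the build step would show whether the tests were packaged.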

@kloczek
Author

kloczek commented May 12, 2022

You can check what is inside the tarball autogenerated from the git tag: https://github.com/NicklasTegner/pypandoc/tree/v1.8

@JessicaTegner
Owner

I know, and they are in the tarball, but my guess is that when you run the build command they aren't included, just like when I run python setup.py sdist, because of a change in version 1.8.

My suggestion would be to run pytest before building.

@kloczek
Author

kloczek commented Sep 28, 2022

Just tested 1.9 and it looks like two new units are failing:

+ PYTHONPATH=/home/tkloczko/rpmbuild/BUILDROOT/python-pypandoc-1.9-2.fc35.x86_64/usr/lib64/python3.8/site-packages:/home/tkloczko/rpmbuild/BUILDROOT/python-pypandoc-1.9-2.fc35.x86_64/usr/lib/python3.8/site-packages
+ /usr/bin/pytest -ra tests.py --deselect tests.py::TestPypandoc::test_pdf_conversion
=========================================================================== test session starts ============================================================================
platform linux -- Python 3.8.14, pytest-7.1.3, pluggy-1.0.0
rootdir: /home/tkloczko/rpmbuild/BUILD/pypandoc-1.9
collected 40 items / 1 deselected / 39 selected

tests.py ..........................FF...........                                                                                                                     [100%]

================================================================================= FAILURES =================================================================================
_____________________________________________________________ TestPypandoc.test_conversion_with_mixed_filters ______________________________________________________________

self = <tests.TestPypandoc testMethod=test_conversion_with_mixed_filters>

    def test_conversion_with_mixed_filters(self):
        markdown_source = "-0-"

        lua = """\
        function Para(elem)
            return pandoc.Para(elem.content .. {{"{0}-"}})
        end
        """
        lua = textwrap.dedent(lua)

        python = """\
        #!/usr/bin/env python

        from pandocfilters import toJSONFilter, Para, Str

        def func(key, value, format, meta):
            if key == "Para":
                return Para(value + [Str("{0}-")])

        if __name__ == "__main__":
            toJSONFilter(func)

        """
        python = textwrap.dedent(python)

        with closed_tempfile(".lua", lua.format(1)) as temp1, closed_tempfile(".py", python.format(2)) as temp2:
            with closed_tempfile(".lua", lua.format(3)) as temp3, closed_tempfile(".py", python.format(4)) as temp4:
>               output = pypandoc.convert_text(
                    markdown_source, to="html", format="md", outputfile=None, filters=[temp1, temp2, temp3, temp4]
                ).strip()

tests.py:381:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pypandoc/__init__.py:93: in convert_text
    return _convert_input(source, format, 'string', to, extra_args=extra_args,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

source = b'-0-', format = 'markdown', input_type = 'string', to = 'html', extra_args = (), outputfile = None
filters = ['/tmp/tmpsu9lufkd.lua', '/tmp/tmpgux03sxh.py', '/tmp/tmp96de3ep5.lua', '/tmp/tmphc5bl3mo.py'], verify_format = True, sandbox = True, cworkdir = None

    def _convert_input(source, format, input_type, to, extra_args=(),
                       outputfile=None, filters=None, verify_format=True,
                       sandbox=True, cworkdir=None):

        _check_log_handler()
        _ensure_pandoc_path()

        if verify_format:
            format, to = _validate_formats(format, to, outputfile)
        else:
            format = normalize_format(format)
            to = normalize_format(to)

        string_input = input_type == 'string'
        if not string_input:
            if isinstance(source, str):
                input_file = [source]
            else:
                input_file = source
        else:
            input_file = []
        args = [__pandoc_path, '--from=' + format]

        args.append('--to=' + to)

        args += input_file

        if outputfile:
            args.append("--output=" + str(outputfile))

        if sandbox:
            if ensure_pandoc_minimal_version(2,15): # sandbox was introduced in pandoc 2.15, so only add if we are using 2.15 or above.
                args.append("--sandbox")

        args.extend(extra_args)

        # adds the proper filter syntax for each item in the filters list
        if filters is not None:
            if isinstance(filters, string_types):
                filters = filters.split()
            f = ['--lua-filter=' + x if x.endswith(".lua") else '--filter=' + x for x in filters]
            args.extend(f)

        # To get access to pandoc-citeproc when we use a included copy of pandoc,
        # we need to add the pypandoc/files dir to the PATH
        new_env = os.environ.copy()
        files_path = os.path.join(os.path.dirname(os.path.realpath(__file__)), "files")
        new_env["PATH"] = new_env.get("PATH", "") + os.pathsep + files_path
        creation_flag = 0x08000000 if sys.platform == "win32" else 0 # set creation flag to not open pandoc in new console on windows

        old_wd = os.getcwd()
        if cworkdir and old_wd != cworkdir:
            os.chdir(cworkdir)

        p = subprocess.Popen(
            args,
            stdin=subprocess.PIPE if string_input else None,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            env=new_env,
            creationflags=creation_flag)

        if cworkdir is not None:
            os.chdir(old_wd)

        # something else than 'None' indicates that the process already terminated
        if not (p.returncode is None):
            raise RuntimeError(
                'Pandoc died with exitcode "%s" before receiving input: %s' % (p.returncode,
                                                                               p.stderr.read())
            )

        if string_input:
            try:
                source = cast_bytes(source, encoding='utf-8')
            except (UnicodeDecodeError, UnicodeEncodeError):
                # assume that it is already a utf-8 encoded string
                pass
        try:
            stdout, stderr = p.communicate(source if string_input else None)
        except OSError:
            # this is happening only on Py2.6 when pandoc dies before reading all
            # the input. We treat that the same as when we exit with an error...
            raise RuntimeError('Pandoc died with exitcode "%s" during conversion.' % (p.returncode))

        try:
            stdout = stdout.decode('utf-8')
        except UnicodeDecodeError:
            # this shouldn't happen: pandoc more or less guarantees that the output is utf-8!
            raise RuntimeError('Pandoc output was not utf-8.')

        try:
            stderr = stderr.decode('utf-8')
        except UnicodeDecodeError:
            # this shouldn't happen: pandoc more or less guarantees that the output is utf-8!
            raise RuntimeError('Pandoc output was not utf-8.')

        # check that pandoc returned successfully
        if p.returncode != 0:
>           raise RuntimeError(
                'Pandoc died with exitcode "%s" during conversion: %s' % (p.returncode, stderr)
            )
E           RuntimeError: Pandoc died with exitcode "83" during conversion: Error running filter /tmp/tmpgux03sxh.py:
E           Could not find executable python

pypandoc/__init__.py:418: RuntimeError
--------------------------------------------------------------------------- Captured stdout call ---------------------------------------------------------------------------
/home/tkloczko
_____________________________________________________________ TestPypandoc.test_conversion_with_python_filter ______________________________________________________________

self = <tests.TestPypandoc testMethod=test_conversion_with_python_filter>

    def test_conversion_with_python_filter(self):
        markdown_source = "**Here comes the content.**"
        python_source = '''\
        #!/usr/bin/env python

        """
        Pandoc filter to convert all regular text to uppercase.
        Code, link URLs, etc. are not affected.
        """

        from pandocfilters import toJSONFilter, Str

        def caps(key, value, format, meta):
            if key == 'Str':
                return Str(value.upper())

        if __name__ == "__main__":
            toJSONFilter(caps)
        '''
        python_source = textwrap.dedent(python_source)
        with closed_tempfile(".py", python_source) as tempfile:
>           output = pypandoc.convert_text(
                markdown_source, to='html', format='md', outputfile=None, filters=tempfile
            ).strip()

tests.py:332:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pypandoc/__init__.py:93: in convert_text
    return _convert_input(source, format, 'string', to, extra_args=extra_args,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

source = b'**Here comes the content.**', format = 'markdown', input_type = 'string', to = 'html', extra_args = (), outputfile = None, filters = ['/tmp/tmpk2dzvkz1.py']
verify_format = True, sandbox = True, cworkdir = None

    def _convert_input(source, format, input_type, to, extra_args=(),
                       outputfile=None, filters=None, verify_format=True,
                       sandbox=True, cworkdir=None):

        _check_log_handler()
        _ensure_pandoc_path()

        if verify_format:
            format, to = _validate_formats(format, to, outputfile)
        else:
            format = normalize_format(format)
            to = normalize_format(to)

        string_input = input_type == 'string'
        if not string_input:
            if isinstance(source, str):
                input_file = [source]
            else:
                input_file = source
        else:
            input_file = []
        args = [__pandoc_path, '--from=' + format]

        args.append('--to=' + to)

        args += input_file

        if outputfile:
            args.append("--output=" + str(outputfile))

        if sandbox:
            if ensure_pandoc_minimal_version(2,15): # sandbox was introduced in pandoc 2.15, so only add if we are using 2.15 or above.
                args.append("--sandbox")

        args.extend(extra_args)

        # adds the proper filter syntax for each item in the filters list
        if filters is not None:
            if isinstance(filters, string_types):
                filters = filters.split()
            f = ['--lua-filter=' + x if x.endswith(".lua") else '--filter=' + x for x in filters]
            args.extend(f)

        # To get access to pandoc-citeproc when we use a included copy of pandoc,
        # we need to add the pypandoc/files dir to the PATH
        new_env = os.environ.copy()
        files_path = os.path.join(os.path.dirname(os.path.realpath(__file__)), "files")
        new_env["PATH"] = new_env.get("PATH", "") + os.pathsep + files_path
        creation_flag = 0x08000000 if sys.platform == "win32" else 0 # set creation flag to not open pandoc in new console on windows

        old_wd = os.getcwd()
        if cworkdir and old_wd != cworkdir:
            os.chdir(cworkdir)

        p = subprocess.Popen(
            args,
            stdin=subprocess.PIPE if string_input else None,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            env=new_env,
            creationflags=creation_flag)

        if cworkdir is not None:
            os.chdir(old_wd)

        # something else than 'None' indicates that the process already terminated
        if not (p.returncode is None):
            raise RuntimeError(
                'Pandoc died with exitcode "%s" before receiving input: %s' % (p.returncode,
                                                                               p.stderr.read())
            )

        if string_input:
            try:
                source = cast_bytes(source, encoding='utf-8')
            except (UnicodeDecodeError, UnicodeEncodeError):
                # assume that it is already a utf-8 encoded string
                pass
        try:
            stdout, stderr = p.communicate(source if string_input else None)
        except OSError:
            # this is happening only on Py2.6 when pandoc dies before reading all
            # the input. We treat that the same as when we exit with an error...
            raise RuntimeError('Pandoc died with exitcode "%s" during conversion.' % (p.returncode))

        try:
            stdout = stdout.decode('utf-8')
        except UnicodeDecodeError:
            # this shouldn't happen: pandoc more or less guarantees that the output is utf-8!
            raise RuntimeError('Pandoc output was not utf-8.')

        try:
            stderr = stderr.decode('utf-8')
        except UnicodeDecodeError:
            # this shouldn't happen: pandoc more or less guarantees that the output is utf-8!
            raise RuntimeError('Pandoc output was not utf-8.')

        # check that pandoc returned successfully
        if p.returncode != 0:
>           raise RuntimeError(
                'Pandoc died with exitcode "%s" during conversion: %s' % (p.returncode, stderr)
            )
E           RuntimeError: Pandoc died with exitcode "83" during conversion: Error running filter /tmp/tmpk2dzvkz1.py:
E           Could not find executable python

pypandoc/__init__.py:418: RuntimeError
--------------------------------------------------------------------------- Captured stdout call ---------------------------------------------------------------------------
/home/tkloczko
============================================================================= warnings summary =============================================================================
pypandoc/pandoc_download.py:62
  /home/tkloczko/rpmbuild/BUILD/pypandoc-1.9/pypandoc/pandoc_download.py:62: DeprecationWarning: invalid escape sequence \.
    regex = re.compile(r"/jgm/pandoc/releases/download/.*(?:"+processor_architecture+"|x86|mac).*\.(?:msi|deb|pkg)")

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
========================================================================= short test summary info ==========================================================================
FAILED tests.py::TestPypandoc::test_conversion_with_mixed_filters - RuntimeError: Pandoc died with exitcode "83" during conversion: Error running filter /tmp/tmpgux03sxh...
FAILED tests.py::TestPypandoc::test_conversion_with_python_filter - RuntimeError: Pandoc died with exitcode "83" during conversion: Error running filter /tmp/tmpk2dzvkz1...
========================================================== 2 failed, 37 passed, 1 deselected, 1 warning in 4.90s ===========================================================
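
The DeprecationWarning in the summary above comes from the last string literal on that line: only the first literal carries the `r` prefix, so `\.` in the final piece is treated as an ordinary (and invalid) string escape. A minimal sketch of one way to silence it, assuming nothing else about pandoc_download.py; the function name `release_asset_regex` is made up for illustration:

```python
import re


def release_asset_regex(processor_architecture):
    """Same pattern as in the warning, with every literal piece a raw string."""
    return re.compile(
        r"/jgm/pandoc/releases/download/.*(?:"
        + processor_architecture
        + r"|x86|mac).*\.(?:msi|deb|pkg)"
    )
```

The compiled pattern is unchanged, because Python currently keeps the backslash of an unrecognized escape; the raw prefix only makes that intent explicit and removes the warning.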

@kloczek
Author

kloczek commented Sep 28, 2022

> I know, and they are in the tarball, but my guess is that when you run the build command they aren't included, just like when I run python setup.py sdist, because of a change in version 1.8.
>
> My suggestion would be to run pytest before building.

In a typical rpm package build, the test suite is always executed after the build and install steps.

@JessicaTegner
Owner

So for your errors, it seems that in both instances it can't find the "python" executable when trying to use a python filter.
What would you say is the best solution? Trying "python3" before "python", since we actually want py3, or the other way around, where we fall back to "python3" only if the regular "python" executable can't be found?
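
The first of these two options could look roughly like the sketch below. It is an illustration only: `find_python` is a hypothetical helper, and it does not describe how pypandoc or pandoc actually pick an interpreter.

```python
import shutil
import sys


def find_python(candidates=("python3", "python"), which=shutil.which):
    """Return the path of the first candidate name found on PATH.

    Falls back to the interpreter running this code (sys.executable)
    when none of the names resolves. `which` is injectable for testing.
    """
    for name in candidates:
        path = which(name)
        if path:
            return path
    return sys.executable
```

Swapping the order of `candidates` gives the second option; the fallback to `sys.executable` covers systems where neither name is on PATH.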

@kloczek
Author

kloczek commented Sep 29, 2022

> So for your errors, it seems that in both instances, it can't find the "python" executable when trying to use a python filter.
> What do you say would be the best solutions?

Instead of hardcoding the "python" executable name, use sys.executable.

@JessicaTegner
Owner

> Instead of hardcoding the "python" executable name, use sys.executable.

We are not hardcoding the name per se. The error comes from the shebang lines in the python filters we write during testing.

@JessicaTegner JessicaTegner added Bug Bugs which need fixing Hacktoberfest labels Sep 29, 2022
@kloczek
Author

kloczek commented Sep 29, 2022

> We are not hardcoding the name per se. The error comes from the shebang lines when we test with a python filter.

Then instead of hardcoding the python executable in the shebang line, execute the script as a parameter to sys.executable.
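
The general idea of passing a script to sys.executable can be sketched as below. This is a hedged illustration, not the change that went into the tests: `run_with_current_interpreter` is a hypothetical helper, and in the failing tests it is pandoc, not the test code, that launches the filter, so the sketch only applies where the caller controls the launch.

```python
import subprocess
import sys


def run_with_current_interpreter(script_path, input_bytes=b""):
    """Run `script_path` with the interpreter executing this code.

    The shebang line of the script is never consulted, because the
    interpreter is given explicitly as the first argv element.
    """
    return subprocess.run(
        [sys.executable, script_path],
        input=input_bytes,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=False,
    )
```

The returned `CompletedProcess` exposes `returncode`, `stdout` and `stderr`, so a caller can report failures much like the RuntimeError seen in the tracebacks.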

@JessicaTegner JessicaTegner changed the title from "1.8: pytest cannot find any units" to "testing fails on unix where python is not available (but only python3)" Sep 30, 2022
JessicaTegner added a commit that referenced this issue Oct 1, 2022
…ixes #276 (#302)

* Updated tests cases that write python scripts, to use the python interpreter that the tests was run with

* fixed test cases
@kloczek
Author

kloczek commented Oct 1, 2022

I've added three commits to my build procedure:

Patch:          %{VCS}/commit/b5565358.patch#/%{name}-Updated-readme-with-correct-batches.patch
Patch:          %{VCS}/commit/3e7676dd.patch#/%{name}-Fixed-sort-files-before-processing-292-301.patch
Patch:          %{VCS}/commit/b2738b45.patch#/%{name}-Fixes-test-cases-that-uses-python-while-only-python3.patch

and it looks like the issue is still present:

tests.py ..........................FF........... [100%]

================================================================================= FAILURES =================================================================================
_____________________________________________________________ TestPypandoc.test_conversion_with_mixed_filters ______________________________________________________________

self = <tests.TestPypandoc testMethod=test_conversion_with_mixed_filters>

def test_conversion_with_mixed_filters(self):
    markdown_source = "-0-"

    lua = """\
    function Para(elem)
        return pandoc.Para(elem.content .. {{"{0}-"}})
    end
    """
    lua = textwrap.dedent(lua)

    python = """\
    #!{0}

    from pandocfilters import toJSONFilter, Para, Str

    def func(key, value, format, meta):
        if key == "Para":
            return Para(value + [Str("{0}-")])

    if __name__ == "__main__":
        toJSONFilter(func)

    """
    python = textwrap.dedent(python)
    python.format(sys.executable)

    with closed_tempfile(".lua", lua.format(1)) as temp1, closed_tempfile(".py", python.format(2)) as temp2:
        with closed_tempfile(".lua", lua.format(3)) as temp3, closed_tempfile(".py", python.format(4)) as temp4:
          output = pypandoc.convert_text(
                markdown_source, to="html", format="md", outputfile=None, filters=[temp1, temp2, temp3, temp4]
            ).strip()

tests.py:384:


pypandoc/__init__.py:93: in convert_text
return _convert_input(source, format, 'string', to, extra_args=extra_args,


source = b'-0-', format = 'markdown', input_type = 'string', to = 'html', extra_args = (), outputfile = None
filters = ['/tmp/tmpbv7qc_dw.lua', '/tmp/tmpi8dvhe90.py', '/tmp/tmpf8udfv4c.lua', '/tmp/tmpb_8hsrr8.py'], verify_format = True, sandbox = True, cworkdir = None

def _convert_input(source, format, input_type, to, extra_args=(),
                   outputfile=None, filters=None, verify_format=True,
                   sandbox=True, cworkdir=None):

    _check_log_handler()
    _ensure_pandoc_path()

    if verify_format:
        format, to = _validate_formats(format, to, outputfile)
    else:
        format = normalize_format(format)
        to = normalize_format(to)

    string_input = input_type == 'string'
    if not string_input:
        if isinstance(source, str):
            input_file = [source]
        else:
            input_file = source
    else:
        input_file = []

    input_file = sorted(input_file)
    args = [__pandoc_path, '--from=' + format]

    args.append('--to=' + to)

    args += input_file

    if outputfile:
        args.append("--output=" + str(outputfile))

    if sandbox:
        if ensure_pandoc_minimal_version(2,15): # sandbox was introduced in pandoc 2.15, so only add if we are using 2.15 or above.
            args.append("--sandbox")

    args.extend(extra_args)

    # adds the proper filter syntax for each item in the filters list
    if filters is not None:
        if isinstance(filters, string_types):
            filters = filters.split()
        f = ['--lua-filter=' + x if x.endswith(".lua") else '--filter=' + x for x in filters]
        args.extend(f)

    # To get access to pandoc-citeproc when we use a included copy of pandoc,
    # we need to add the pypandoc/files dir to the PATH
    new_env = os.environ.copy()
    files_path = os.path.join(os.path.dirname(os.path.realpath(__file__)), "files")
    new_env["PATH"] = new_env.get("PATH", "") + os.pathsep + files_path
    creation_flag = 0x08000000 if sys.platform == "win32" else 0 # set creation flag to not open pandoc in new console on windows

    old_wd = os.getcwd()
    if cworkdir and old_wd != cworkdir:
        os.chdir(cworkdir)

    p = subprocess.Popen(
        args,
        stdin=subprocess.PIPE if string_input else None,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        env=new_env,
        creationflags=creation_flag)

    if cworkdir is not None:
        os.chdir(old_wd)

    # something else than 'None' indicates that the process already terminated
    if not (p.returncode is None):
        raise RuntimeError(
            'Pandoc died with exitcode "%s" before receiving input: %s' % (p.returncode,
                                                                           p.stderr.read())
        )

    if string_input:
        try:
            source = cast_bytes(source, encoding='utf-8')
        except (UnicodeDecodeError, UnicodeEncodeError):
            # assume that it is already a utf-8 encoded string
            pass
    try:
        stdout, stderr = p.communicate(source if string_input else None)
    except OSError:
        # this is happening only on Py2.6 when pandoc dies before reading all
        # the input. We treat that the same as when we exit with an error...
        raise RuntimeError('Pandoc died with exitcode "%s" during conversion.' % (p.returncode))

    try:
        stdout = stdout.decode('utf-8')
    except UnicodeDecodeError:
        # this shouldn't happen: pandoc more or less guarantees that the output is utf-8!
        raise RuntimeError('Pandoc output was not utf-8.')

    try:
        stderr = stderr.decode('utf-8')
    except UnicodeDecodeError:
        # this shouldn't happen: pandoc more or less guarantees that the output is utf-8!
        raise RuntimeError('Pandoc output was not utf-8.')

    # check that pandoc returned successfully
    if p.returncode != 0:
      raise RuntimeError(
            'Pandoc died with exitcode "%s" during conversion: %s' % (p.returncode, stderr)
        )

E RuntimeError: Pandoc died with exitcode "83" during conversion: Error running filter /tmp/tmpi8dvhe90.py:
E Could not find executable python

pypandoc/__init__.py:420: RuntimeError
--------------------------------------------------------------------------- Captured stdout call ---------------------------------------------------------------------------
/home/tkloczko
_____________________________________________________________ TestPypandoc.test_conversion_with_python_filter ______________________________________________________________

self = <tests.TestPypandoc testMethod=test_conversion_with_python_filter>

def test_conversion_with_python_filter(self):
    markdown_source = "**Here comes the content.**"
    python_source = '''\
    #!{0}

    """
    Pandoc filter to convert all regular text to uppercase.
    Code, link URLs, etc. are not affected.
    """

    from pandocfilters import toJSONFilter, Str

    def caps(key, value, format, meta):
        if key == 'Str':
            return Str(value.upper())

    if __name__ == "__main__":
        toJSONFilter(caps)
    '''
    python_source = textwrap.dedent(python_source)
    python_source.format(sys.executable)

    with closed_tempfile(".py", python_source) as tempfile:
      output = pypandoc.convert_text(
            markdown_source, to='html', format='md', outputfile=None, filters=tempfile
        ).strip()

tests.py:334:


pypandoc/__init__.py:93: in convert_text
return _convert_input(source, format, 'string', to, extra_args=extra_args,


source = b'**Here comes the content.**', format = 'markdown', input_type = 'string', to = 'html', extra_args = (), outputfile = None, filters = ['/tmp/tmp2o_r5jt_.py']
verify_format = True, sandbox = True, cworkdir = None

def _convert_input(source, format, input_type, to, extra_args=(),
                   outputfile=None, filters=None, verify_format=True,
                   sandbox=True, cworkdir=None):

    _check_log_handler()
    _ensure_pandoc_path()

    if verify_format:
        format, to = _validate_formats(format, to, outputfile)
    else:
        format = normalize_format(format)
        to = normalize_format(to)

    string_input = input_type == 'string'
    if not string_input:
        if isinstance(source, str):
            input_file = [source]
        else:
            input_file = source
    else:
        input_file = []

    input_file = sorted(input_file)
    args = [__pandoc_path, '--from=' + format]

    args.append('--to=' + to)

    args += input_file

    if outputfile:
        args.append("--output=" + str(outputfile))

    if sandbox:
        if ensure_pandoc_minimal_version(2,15): # sandbox was introduced in pandoc 2.15, so only add if we are using 2.15 or above.
            args.append("--sandbox")

    args.extend(extra_args)

    # adds the proper filter syntax for each item in the filters list
    if filters is not None:
        if isinstance(filters, string_types):
            filters = filters.split()
        f = ['--lua-filter=' + x if x.endswith(".lua") else '--filter=' + x for x in filters]
        args.extend(f)

    # To get access to pandoc-citeproc when we use a included copy of pandoc,
    # we need to add the pypandoc/files dir to the PATH
    new_env = os.environ.copy()
    files_path = os.path.join(os.path.dirname(os.path.realpath(__file__)), "files")
    new_env["PATH"] = new_env.get("PATH", "") + os.pathsep + files_path
    creation_flag = 0x08000000 if sys.platform == "win32" else 0 # set creation flag to not open pandoc in new console on windows

    old_wd = os.getcwd()
    if cworkdir and old_wd != cworkdir:
        os.chdir(cworkdir)

    p = subprocess.Popen(
        args,
        stdin=subprocess.PIPE if string_input else None,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        env=new_env,
        creationflags=creation_flag)

    if cworkdir is not None:
        os.chdir(old_wd)

    # something else than 'None' indicates that the process already terminated
    if not (p.returncode is None):
        raise RuntimeError(
            'Pandoc died with exitcode "%s" before receiving input: %s' % (p.returncode,
                                                                           p.stderr.read())
        )

    if string_input:
        try:
            source = cast_bytes(source, encoding='utf-8')
        except (UnicodeDecodeError, UnicodeEncodeError):
            # assume that it is already a utf-8 encoded string
            pass
    try:
        stdout, stderr = p.communicate(source if string_input else None)
    except OSError:
        # this is happening only on Py2.6 when pandoc dies before reading all
        # the input. We treat that the same as when we exit with an error...
        raise RuntimeError('Pandoc died with exitcode "%s" during conversion.' % (p.returncode))

    try:
        stdout = stdout.decode('utf-8')
    except UnicodeDecodeError:
        # this shouldn't happen: pandoc more or less guarantees that the output is utf-8!
        raise RuntimeError('Pandoc output was not utf-8.')

    try:
        stderr = stderr.decode('utf-8')
    except UnicodeDecodeError:
        # this shouldn't happen: pandoc more or less guarantees that the output is utf-8!
        raise RuntimeError('Pandoc output was not utf-8.')

    # check that pandoc returned successfully
    if p.returncode != 0:
>       raise RuntimeError(
            'Pandoc died with exitcode "%s" during conversion: %s' % (p.returncode, stderr)
        )

E       RuntimeError: Pandoc died with exitcode "83" during conversion: Error running filter /tmp/tmp2o_r5jt_.py:
E       Could not find executable python

pypandoc/__init__.py:420: RuntimeError
--------------------------------------------------------------------------- Captured stdout call ---------------------------------------------------------------------------
/home/tkloczko
============================================================================= warnings summary =============================================================================
pypandoc/pandoc_download.py:62
  /home/tkloczko/rpmbuild/BUILD/pypandoc-1.9/pypandoc/pandoc_download.py:62: DeprecationWarning: invalid escape sequence \.
    regex = re.compile(r"/jgm/pandoc/releases/download/.*(?:"+processor_architecture+"|x86|mac).*\.(?:msi|deb|pkg)")

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
========================================================================= short test summary info ==========================================================================
FAILED tests.py::TestPypandoc::test_conversion_with_mixed_filters - RuntimeError: Pandoc died with exitcode "83" during conversion: Error running filter /tmp/tmpi8dvhe90...
FAILED tests.py::TestPypandoc::test_conversion_with_python_filter - RuntimeError: Pandoc died with exitcode "83" during conversion: Error running filter /tmp/tmp2o_r5jt_...
========================================================== 2 failed, 37 passed, 1 deselected, 1 warning in 4.93s ===========================================================

</details>

@kloczek
Author

kloczek commented Oct 1, 2022

Additionally, after deselecting the failing units with --deselect, I still see a warning:

+ PYTHONPATH=/home/tkloczko/rpmbuild/BUILDROOT/python-pypandoc-1.9-2.fc35.x86_64/usr/lib64/python3.8/site-packages:/home/tkloczko/rpmbuild/BUILDROOT/python-pypandoc-1.9-2.fc35.x86_64/usr/lib/python3.8/site-packages
+ /usr/bin/pytest -ra tests.py --deselect tests.py::TestPypandoc::test_pdf_conversion --deselect tests.py::TestPypandoc::test_conversion_with_mixed_filters --deselect tests.py::TestPypandoc::test_conversion_with_python_filter
=========================================================================== test session starts ============================================================================
platform linux -- Python 3.8.14, pytest-7.1.3, pluggy-1.0.0
rootdir: /home/tkloczko/rpmbuild/BUILD/pypandoc-1.9
collected 40 items / 3 deselected / 37 selected

tests.py .....................................                                                                                                                       [100%]

============================================================================= warnings summary =============================================================================
pypandoc/pandoc_download.py:62
  /home/tkloczko/rpmbuild/BUILD/pypandoc-1.9/pypandoc/pandoc_download.py:62: DeprecationWarning: invalid escape sequence \.
    regex = re.compile(r"/jgm/pandoc/releases/download/.*(?:"+processor_architecture+"|x86|mac).*\.(?:msi|deb|pkg)")

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=============================================================== 37 passed, 3 deselected, 1 warning in 4.33s ================================================================
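
The warning above most likely comes from the second string literal in that `re.compile` call: only the first literal carries the `r` prefix, so the `\.` in the second one is parsed as an ordinary (and invalid) escape sequence. A minimal sketch of one possible fix, assuming `processor_architecture` is a plain string such as `amd64`; the helper name `build_release_regex` and the sample paths are hypothetical and only serve to exercise the pattern:

```python
import re


def build_release_regex(processor_architecture):
    # Both literals are raw strings, so "\." reaches the regex engine
    # unchanged and Python has no invalid escape sequence to warn about.
    return re.compile(
        r"/jgm/pandoc/releases/download/.*(?:"
        + processor_architecture
        + r"|x86|mac).*\.(?:msi|deb|pkg)"
    )


regex = build_release_regex("amd64")

# Hypothetical download paths, not taken from the issue.
deb_path = "/jgm/pandoc/releases/download/2.19.2/pandoc-2.19.2-1-amd64.deb"
tar_path = "/jgm/pandoc/releases/download/2.19.2/pandoc-2.19.2-linux-amd64.tar.gz"

print(bool(regex.search(deb_path)))
print(bool(regex.search(tar_path)))
```

The pattern itself is unchanged from the line quoted in the warning; only the missing `r` prefix is added.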

JessicaTegner added a commit that referenced this issue Oct 1, 2022
@JessicaTegner
Owner

@kloczek can you try the latest development snapshot? That should fix the failing test cases.

@kloczek
Author

kloczek commented Oct 1, 2022

After replacing the last patch with:

Patch:          %{VCS}/commit/b5565358.patch#/%{name}-Updated-readme-with-correct-batches.patch
Patch:          %{VCS}/commit/3e7676dd.patch#/%{name}-Fixed-sort-files-before-processing-292-301.patch
Patch:          https://github.com/JessicaTegner/pypandoc/commit/b2738b45.patch#/%{name}-Fixes-test-cases-that-uses-python-while-only-python3-is-available.patch

pytest still fails:

+ cd pypandoc-1.9
+ /usr/bin/chmod -Rf a+rX,u+w,g-w,o-w .
+ /usr/bin/cat /home/tkloczko/rpmbuild/SOURCES/python-pypandoc-Updated-readme-with-correct-batches.patch
+ /usr/bin/patch -p1 -s --fuzz=0 --no-backup-if-mismatch -f
+ /usr/bin/cat /home/tkloczko/rpmbuild/SOURCES/python-pypandoc-Fixed-sort-files-before-processing-292-301.patch
+ /usr/bin/patch -p1 -s --fuzz=0 --no-backup-if-mismatch -f
+ /usr/bin/cat /home/tkloczko/rpmbuild/SOURCES/python-pypandoc-Fixes-test-cases-that-uses-python-while-only-python3-is-available.patch
+ /usr/bin/patch -p1 -s --fuzz=0 --no-backup-if-mismatch -f

[..]

+ PYTHONPATH=/home/tkloczko/rpmbuild/BUILDROOT/python-pypandoc-1.9-2.fc35.x86_64/usr/lib64/python3.8/site-packages:/home/tkloczko/rpmbuild/BUILDROOT/python-pypandoc-1.9-2.fc35.x86_64/usr/lib/python3.8/site-packages
+ /usr/bin/pytest -ra tests.py --deselect tests.py::TestPypandoc::test_pdf_conversion
=========================================================================== test session starts ============================================================================
platform linux -- Python 3.8.14, pytest-7.1.3, pluggy-1.0.0
rootdir: /home/tkloczko/rpmbuild/BUILD/pypandoc-1.9
collected 40 items / 1 deselected / 39 selected

tests.py ..........................FF...........                                                                                                                     [100%]

================================================================================= FAILURES =================================================================================
_____________________________________________________________ TestPypandoc.test_conversion_with_mixed_filters ______________________________________________________________

self = <tests.TestPypandoc testMethod=test_conversion_with_mixed_filters>

    def test_conversion_with_mixed_filters(self):
        markdown_source = "-0-"

        lua = """\
        function Para(elem)
            return pandoc.Para(elem.content .. {{"{0}-"}})
        end
        """
        lua = textwrap.dedent(lua)

        python = """\
        #!{0}

        from pandocfilters import toJSONFilter, Para, Str

        def func(key, value, format, meta):
            if key == "Para":
                return Para(value + [Str("{0}-")])

        if __name__ == "__main__":
            toJSONFilter(func)

        """
        python = textwrap.dedent(python)
        python.format(sys.executable)

        with closed_tempfile(".lua", lua.format(1)) as temp1, closed_tempfile(".py", python.format(2)) as temp2:
            with closed_tempfile(".lua", lua.format(3)) as temp3, closed_tempfile(".py", python.format(4)) as temp4:
>               output = pypandoc.convert_text(
                    markdown_source, to="html", format="md", outputfile=None, filters=[temp1, temp2, temp3, temp4]
                ).strip()

tests.py:384:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pypandoc/__init__.py:93: in convert_text
    return _convert_input(source, format, 'string', to, extra_args=extra_args,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

source = b'-0-', format = 'markdown', input_type = 'string', to = 'html', extra_args = (), outputfile = None
filters = ['/tmp/tmpwr8zvwco.lua', '/tmp/tmpiljf6dd6.py', '/tmp/tmpdu_fqeof.lua', '/tmp/tmpwj3gm4v6.py'], verify_format = True, sandbox = True, cworkdir = None

    def _convert_input(source, format, input_type, to, extra_args=(),
                       outputfile=None, filters=None, verify_format=True,
                       sandbox=True, cworkdir=None):

        _check_log_handler()
        _ensure_pandoc_path()

        if verify_format:
            format, to = _validate_formats(format, to, outputfile)
        else:
            format = normalize_format(format)
            to = normalize_format(to)

        string_input = input_type == 'string'
        if not string_input:
            if isinstance(source, str):
                input_file = [source]
            else:
                input_file = source
        else:
            input_file = []

        input_file = sorted(input_file)
        args = [__pandoc_path, '--from=' + format]

        args.append('--to=' + to)

        args += input_file

        if outputfile:
            args.append("--output=" + str(outputfile))

        if sandbox:
            if ensure_pandoc_minimal_version(2,15): # sandbox was introduced in pandoc 2.15, so only add if we are using 2.15 or above.
                args.append("--sandbox")

        args.extend(extra_args)

        # adds the proper filter syntax for each item in the filters list
        if filters is not None:
            if isinstance(filters, string_types):
                filters = filters.split()
            f = ['--lua-filter=' + x if x.endswith(".lua") else '--filter=' + x for x in filters]
            args.extend(f)

        # To get access to pandoc-citeproc when we use a included copy of pandoc,
        # we need to add the pypandoc/files dir to the PATH
        new_env = os.environ.copy()
        files_path = os.path.join(os.path.dirname(os.path.realpath(__file__)), "files")
        new_env["PATH"] = new_env.get("PATH", "") + os.pathsep + files_path
        creation_flag = 0x08000000 if sys.platform == "win32" else 0 # set creation flag to not open pandoc in new console on windows

        old_wd = os.getcwd()
        if cworkdir and old_wd != cworkdir:
            os.chdir(cworkdir)

        p = subprocess.Popen(
            args,
            stdin=subprocess.PIPE if string_input else None,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            env=new_env,
            creationflags=creation_flag)

        if cworkdir is not None:
            os.chdir(old_wd)

        # something else than 'None' indicates that the process already terminated
        if not (p.returncode is None):
            raise RuntimeError(
                'Pandoc died with exitcode "%s" before receiving input: %s' % (p.returncode,
                                                                               p.stderr.read())
            )

        if string_input:
            try:
                source = cast_bytes(source, encoding='utf-8')
            except (UnicodeDecodeError, UnicodeEncodeError):
                # assume that it is already a utf-8 encoded string
                pass
        try:
            stdout, stderr = p.communicate(source if string_input else None)
        except OSError:
            # this is happening only on Py2.6 when pandoc dies before reading all
            # the input. We treat that the same as when we exit with an error...
            raise RuntimeError('Pandoc died with exitcode "%s" during conversion.' % (p.returncode))

        try:
            stdout = stdout.decode('utf-8')
        except UnicodeDecodeError:
            # this shouldn't happen: pandoc more or less guarantees that the output is utf-8!
            raise RuntimeError('Pandoc output was not utf-8.')

        try:
            stderr = stderr.decode('utf-8')
        except UnicodeDecodeError:
            # this shouldn't happen: pandoc more or less guarantees that the output is utf-8!
            raise RuntimeError('Pandoc output was not utf-8.')

        # check that pandoc returned successfully
        if p.returncode != 0:
>           raise RuntimeError(
                'Pandoc died with exitcode "%s" during conversion: %s' % (p.returncode, stderr)
            )
E           RuntimeError: Pandoc died with exitcode "83" during conversion: Error running filter /tmp/tmpiljf6dd6.py:
E           Could not find executable python

pypandoc/__init__.py:420: RuntimeError
--------------------------------------------------------------------------- Captured stdout call ---------------------------------------------------------------------------
/home/tkloczko
_____________________________________________________________ TestPypandoc.test_conversion_with_python_filter ______________________________________________________________

self = <tests.TestPypandoc testMethod=test_conversion_with_python_filter>

    def test_conversion_with_python_filter(self):
        markdown_source = "**Here comes the content.**"
        python_source = '''\
        #!{0}

        """
        Pandoc filter to convert all regular text to uppercase.
        Code, link URLs, etc. are not affected.
        """

        from pandocfilters import toJSONFilter, Str

        def caps(key, value, format, meta):
            if key == 'Str':
                return Str(value.upper())

        if __name__ == "__main__":
            toJSONFilter(caps)
        '''
        python_source = textwrap.dedent(python_source)
        python_source.format(sys.executable)

        with closed_tempfile(".py", python_source) as tempfile:
>           output = pypandoc.convert_text(
                markdown_source, to='html', format='md', outputfile=None, filters=tempfile
            ).strip()

tests.py:334:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pypandoc/__init__.py:93: in convert_text
    return _convert_input(source, format, 'string', to, extra_args=extra_args,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

source = b'**Here comes the content.**', format = 'markdown', input_type = 'string', to = 'html', extra_args = (), outputfile = None, filters = ['/tmp/tmpd450lw4k.py']
verify_format = True, sandbox = True, cworkdir = None

    def _convert_input(source, format, input_type, to, extra_args=(),
                       outputfile=None, filters=None, verify_format=True,
                       sandbox=True, cworkdir=None):

        _check_log_handler()
        _ensure_pandoc_path()

        if verify_format:
            format, to = _validate_formats(format, to, outputfile)
        else:
            format = normalize_format(format)
            to = normalize_format(to)

        string_input = input_type == 'string'
        if not string_input:
            if isinstance(source, str):
                input_file = [source]
            else:
                input_file = source
        else:
            input_file = []

        input_file = sorted(input_file)
        args = [__pandoc_path, '--from=' + format]

        args.append('--to=' + to)

        args += input_file

        if outputfile:
            args.append("--output=" + str(outputfile))

        if sandbox:
            if ensure_pandoc_minimal_version(2,15): # sandbox was introduced in pandoc 2.15, so only add if we are using 2.15 or above.
                args.append("--sandbox")

        args.extend(extra_args)

        # adds the proper filter syntax for each item in the filters list
        if filters is not None:
            if isinstance(filters, string_types):
                filters = filters.split()
            f = ['--lua-filter=' + x if x.endswith(".lua") else '--filter=' + x for x in filters]
            args.extend(f)

        # To get access to pandoc-citeproc when we use a included copy of pandoc,
        # we need to add the pypandoc/files dir to the PATH
        new_env = os.environ.copy()
        files_path = os.path.join(os.path.dirname(os.path.realpath(__file__)), "files")
        new_env["PATH"] = new_env.get("PATH", "") + os.pathsep + files_path
        creation_flag = 0x08000000 if sys.platform == "win32" else 0 # set creation flag to not open pandoc in new console on windows

        old_wd = os.getcwd()
        if cworkdir and old_wd != cworkdir:
            os.chdir(cworkdir)

        p = subprocess.Popen(
            args,
            stdin=subprocess.PIPE if string_input else None,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            env=new_env,
            creationflags=creation_flag)

        if cworkdir is not None:
            os.chdir(old_wd)

        # something else than 'None' indicates that the process already terminated
        if not (p.returncode is None):
            raise RuntimeError(
                'Pandoc died with exitcode "%s" before receiving input: %s' % (p.returncode,
                                                                               p.stderr.read())
            )

        if string_input:
            try:
                source = cast_bytes(source, encoding='utf-8')
            except (UnicodeDecodeError, UnicodeEncodeError):
                # assume that it is already a utf-8 encoded string
                pass
        try:
            stdout, stderr = p.communicate(source if string_input else None)
        except OSError:
            # this is happening only on Py2.6 when pandoc dies before reading all
            # the input. We treat that the same as when we exit with an error...
            raise RuntimeError('Pandoc died with exitcode "%s" during conversion.' % (p.returncode))

        try:
            stdout = stdout.decode('utf-8')
        except UnicodeDecodeError:
            # this shouldn't happen: pandoc more or less guarantees that the output is utf-8!
            raise RuntimeError('Pandoc output was not utf-8.')

        try:
            stderr = stderr.decode('utf-8')
        except UnicodeDecodeError:
            # this shouldn't happen: pandoc more or less guarantees that the output is utf-8!
            raise RuntimeError('Pandoc output was not utf-8.')

        # check that pandoc returned successfully
        if p.returncode != 0:
>           raise RuntimeError(
                'Pandoc died with exitcode "%s" during conversion: %s' % (p.returncode, stderr)
            )
E           RuntimeError: Pandoc died with exitcode "83" during conversion: Error running filter /tmp/tmpd450lw4k.py:
E           Could not find executable python

pypandoc/__init__.py:420: RuntimeError
--------------------------------------------------------------------------- Captured stdout call ---------------------------------------------------------------------------
/home/tkloczko
============================================================================= warnings summary =============================================================================
pypandoc/pandoc_download.py:62
  /home/tkloczko/rpmbuild/BUILD/pypandoc-1.9/pypandoc/pandoc_download.py:62: DeprecationWarning: invalid escape sequence \.
    regex = re.compile(r"/jgm/pandoc/releases/download/.*(?:"+processor_architecture+"|x86|mac).*\.(?:msi|deb|pkg)")

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
========================================================================= short test summary info ==========================================================================
FAILED tests.py::TestPypandoc::test_conversion_with_mixed_filters - RuntimeError: Pandoc died with exitcode "83" during conversion: Error running filter /tmp/tmpiljf6dd6...
FAILED tests.py::TestPypandoc::test_conversion_with_python_filter - RuntimeError: Pandoc died with exitcode "83" during conversion: Error running filter /tmp/tmpd450lw4k...
========================================================== 2 failed, 37 passed, 1 deselected, 1 warning in 4.89s ===========================================================

@JessicaTegner
Owner

Well, we are using sys.executable to run the Python tests now, so I don't see how it could fail...
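
One possible explanation is visible in the test source printed in the traceback above: `str.format` returns a new string instead of modifying the original, and the tests call `python_source.format(sys.executable)` without keeping the result. A minimal sketch of the difference, where the `template` and `formatted` names are illustrative and not taken from `tests.py`:

```python
import sys

# Illustrative stand-in for the filter source used in the tests.
template = "#!{0}\nprint('filter body')\n"

# Fall back to a generic name if the interpreter path is unavailable.
interpreter = sys.executable or "python3"

# str.format returns a new string; calling it without using the
# result leaves the placeholder in place.
template.format(interpreter)
print(template.splitlines()[0])

# Keeping the return value gives the intended shebang line.
formatted = template.format(interpreter)
print(formatted.splitlines()[0])
```

Even with the shebang filled in, my understanding of pandoc's filter handling is that the shebang only matters when the filter file is executable; for a non-executable `.py` filter pandoc guesses the interpreter from the extension and looks up `python`, which would match the "Could not find executable python" message.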

@JessicaTegner JessicaTegner reopened this Oct 2, 2022
@kloczek
Author

kloczek commented Mar 5, 2023

Just retested 1.11 and I still see three units failing:

+ PYTHONPATH=/home/tkloczko/rpmbuild/BUILDROOT/python-pypandoc-1.11-2.fc35.x86_64/usr/lib64/python3.8/site-packages:/home/tkloczko/rpmbuild/BUILDROOT/python-pypandoc-1.11-2.fc35.x86_64/usr/lib/python3.8/site-packages
+ /usr/bin/pytest -ra -m 'not network' tests.py --deselect tests.py::TestPypandoc::test_pdf_conversion
==================================================================================== test session starts ====================================================================================
platform linux -- Python 3.8.16, pytest-7.2.2, pluggy-1.0.0
rootdir: /home/tkloczko/rpmbuild/BUILD/pypandoc-1.11
collected 41 items / 1 deselected / 40 selected

tests.py .......................F...FF...........                                                                                                                                     [100%]

========================================================================================= FAILURES ==========================================================================================
_______________________________________________________________________ TestPypandoc.test_conversion_with_data_files ________________________________________________________________________

self = <tests.TestPypandoc testMethod=test_conversion_with_data_files>

        def test_conversion_with_data_files(self):
            # remove our test.docx file from our test_data dir if it already exosts
            test_data_dir = os.path.join(os.path.dirname(__file__), 'test_data')
            test_docx_file = os.path.join(test_data_dir, 'test.docx')
            if os.path.exists(test_docx_file):
                os.remove(test_docx_file)
>           result = pypandoc.convert_file(
        os.path.join(test_data_dir, 'index.html'),
        to='docx',
        format='html',
        outputfile=test_docx_file,
        sandbox=True,
    )

tests.py:240:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pypandoc/__init__.py:168: in convert_file
    return _convert_input(discovered_source_files, format, 'path', to, extra_args=extra_args,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

source = '/home/tkloczko/rpmbuild/BUILD/pypandoc-1.11/test_data/index.html', format = 'html', input_type = 'path', to = 'docx', extra_args = ()
outputfile = '/home/tkloczko/rpmbuild/BUILD/pypandoc-1.11/test_data/test.docx', filters = None, verify_format = True, sandbox = True, cworkdir = None

    def _convert_input(source, format, input_type, to, extra_args=(),
                       outputfile=None, filters=None, verify_format=True,
                       sandbox=False, cworkdir=None):

        _check_log_handler()

        logger.debug("Ensuring pandoc path...")
        _ensure_pandoc_path()

        if verify_format:
            logger.debug("Verifying format...")
            format, to = _validate_formats(format, to, outputfile)
        else:
            format = normalize_format(format)
            to = normalize_format(to)

        logger.debug("Identifying input type...")
        string_input = input_type == 'string'
        if not string_input:
            if isinstance(source, str):
                input_file = [source]
            else:
                input_file = source
        else:
            input_file = []

        input_file = sorted(input_file)
        args = [__pandoc_path, '--from=' + format]

        args.append('--to=' + to)

        args += input_file

        if outputfile:
            args.append("--output=" + str(outputfile))

        if sandbox:
            if ensure_pandoc_minimal_version(2,15): # sandbox was introduced in pandoc 2.15, so only add if we are using 2.15 or above.
                logger.debug("Adding sandbox argument...")
                args.append("--sandbox")
            else:
                logger.warning("Sandbox argument was used, but pandoc version is too low. Ignoring argument.")

        args.extend(extra_args)

        # adds the proper filter syntax for each item in the filters list
        if filters is not None:
            if isinstance(filters, string_types):
                filters = filters.split()
            f = ['--lua-filter=' + x if x.endswith(".lua") else '--filter=' + x for x in filters]
            args.extend(f)

        # To get access to pandoc-citeproc when we use a included copy of pandoc,
        # we need to add the pypandoc/files dir to the PATH
        new_env = os.environ.copy()
        files_path = os.path.join(os.path.dirname(os.path.realpath(__file__)), "files")
        new_env["PATH"] = new_env.get("PATH", "") + os.pathsep + files_path
        creation_flag = 0x08000000 if sys.platform == "win32" else 0 # set creation flag to not open pandoc in new console on windows

        old_wd = os.getcwd()
        if cworkdir and old_wd != cworkdir:
            os.chdir(cworkdir)

        logger.debug("Running pandoc...")
        p = subprocess.Popen(
            args,
            stdin=subprocess.PIPE if string_input else None,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            env=new_env,
            creationflags=creation_flag)

        if cworkdir is not None:
            os.chdir(old_wd)

        # something else than 'None' indicates that the process already terminated
        if not (p.returncode is None):
            raise RuntimeError(
                'Pandoc died with exitcode "%s" before receiving input: %s' % (p.returncode,
                                                                               p.stderr.read())
            )

        if string_input:
            try:
                source = cast_bytes(source, encoding='utf-8')
            except (UnicodeDecodeError, UnicodeEncodeError):
                # assume that it is already a utf-8 encoded string
                pass
        try:
            stdout, stderr = p.communicate(source if string_input else None)
        except OSError:
            # this is happening only on Py2.6 when pandoc dies before reading all
            # the input. We treat that the same as when we exit with an error...
            raise RuntimeError('Pandoc died with exitcode "%s" during conversion.' % (p.returncode))

        try:
            stdout = stdout.decode('utf-8')
        except UnicodeDecodeError:
            # this shouldn't happen: pandoc more or less guarantees that the output is utf-8!
            raise RuntimeError('Pandoc output was not utf-8.')

        try:
            stderr = stderr.decode('utf-8')
        except UnicodeDecodeError:
            # this shouldn't happen: pandoc more or less guarantees that the output is utf-8!
            raise RuntimeError('Pandoc output was not utf-8.')

        # check that pandoc returned successfully
        if p.returncode != 0:
>           raise RuntimeError(
                'Pandoc died with exitcode "%s" during conversion: %s' % (p.returncode, stderr)
            )
E           RuntimeError: Pandoc died with exitcode "97" during conversion: Could not find data file data/data/docx/[Content_Types].xml

pypandoc/__init__.py:426: RuntimeError
----------------------------------------------------------------------------------- Captured stdout call ------------------------------------------------------------------------------------
/home/tkloczko
______________________________________________________________________ TestPypandoc.test_conversion_with_mixed_filters ______________________________________________________________________

self = <tests.TestPypandoc testMethod=test_conversion_with_mixed_filters>

    def test_conversion_with_mixed_filters(self):
        markdown_source = "-0-"

        lua = """\
        function Para(elem)
            return pandoc.Para(elem.content .. {{"{0}-"}})
        end
        """
        lua = textwrap.dedent(lua)

        python = """\
        #!{0}

        from pandocfilters import toJSONFilter, Para, Str

        def func(key, value, format, meta):
            if key == "Para":
                return Para(value + [Str("{0}-")])

        if __name__ == "__main__":
            toJSONFilter(func)

        """
        python = textwrap.dedent(python)
        python.format(sys.executable)

        with closed_tempfile(".lua", lua.format(1)) as temp1, closed_tempfile(".py", python.format(2)) as temp2:
            with closed_tempfile(".lua", lua.format(3)) as temp3, closed_tempfile(".py", python.format(4)) as temp4:
>               output = pypandoc.convert_text(
                    markdown_source, to="html", format="md", outputfile=None, filters=[temp1, temp2, temp3, temp4]
                ).strip()

tests.py:403:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pypandoc/__init__.py:91: in convert_text
    return _convert_input(source, format, 'string', to, extra_args=extra_args,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

source = b'-0-', format = 'markdown', input_type = 'string', to = 'html', extra_args = (), outputfile = None
filters = ['/tmp/tmpdgw_df6w.lua', '/tmp/tmpbl813ywg.py', '/tmp/tmp85dsiv3y.lua', '/tmp/tmp7j1t2jod.py'], verify_format = True, sandbox = False, cworkdir = None

    def _convert_input(source, format, input_type, to, extra_args=(),
                       outputfile=None, filters=None, verify_format=True,
                       sandbox=False, cworkdir=None):

        _check_log_handler()

        logger.debug("Ensuring pandoc path...")
        _ensure_pandoc_path()

        if verify_format:
            logger.debug("Verifying format...")
            format, to = _validate_formats(format, to, outputfile)
        else:
            format = normalize_format(format)
            to = normalize_format(to)

        logger.debug("Identifying input type...")
        string_input = input_type == 'string'
        if not string_input:
            if isinstance(source, str):
                input_file = [source]
            else:
                input_file = source
        else:
            input_file = []

        input_file = sorted(input_file)
        args = [__pandoc_path, '--from=' + format]

        args.append('--to=' + to)

        args += input_file

        if outputfile:
            args.append("--output=" + str(outputfile))

        if sandbox:
            if ensure_pandoc_minimal_version(2,15): # sandbox was introduced in pandoc 2.15, so only add if we are using 2.15 or above.
                logger.debug("Adding sandbox argument...")
                args.append("--sandbox")
            else:
                logger.warning("Sandbox argument was used, but pandoc version is too low. Ignoring argument.")

        args.extend(extra_args)

        # adds the proper filter syntax for each item in the filters list
        if filters is not None:
            if isinstance(filters, string_types):
                filters = filters.split()
            f = ['--lua-filter=' + x if x.endswith(".lua") else '--filter=' + x for x in filters]
            args.extend(f)

        # To get access to pandoc-citeproc when we use a included copy of pandoc,
        # we need to add the pypandoc/files dir to the PATH
        new_env = os.environ.copy()
        files_path = os.path.join(os.path.dirname(os.path.realpath(__file__)), "files")
        new_env["PATH"] = new_env.get("PATH", "") + os.pathsep + files_path
        creation_flag = 0x08000000 if sys.platform == "win32" else 0 # set creation flag to not open pandoc in new console on windows

        old_wd = os.getcwd()
        if cworkdir and old_wd != cworkdir:
            os.chdir(cworkdir)

        logger.debug("Running pandoc...")
        p = subprocess.Popen(
            args,
            stdin=subprocess.PIPE if string_input else None,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            env=new_env,
            creationflags=creation_flag)

        if cworkdir is not None:
            os.chdir(old_wd)

        # something else than 'None' indicates that the process already terminated
        if not (p.returncode is None):
            raise RuntimeError(
                'Pandoc died with exitcode "%s" before receiving input: %s' % (p.returncode,
                                                                               p.stderr.read())
            )

        if string_input:
            try:
                source = cast_bytes(source, encoding='utf-8')
            except (UnicodeDecodeError, UnicodeEncodeError):
                # assume that it is already a utf-8 encoded string
                pass
        try:
            stdout, stderr = p.communicate(source if string_input else None)
        except OSError:
            # this is happening only on Py2.6 when pandoc dies before reading all
            # the input. We treat that the same as when we exit with an error...
            raise RuntimeError('Pandoc died with exitcode "%s" during conversion.' % (p.returncode))

        try:
            stdout = stdout.decode('utf-8')
        except UnicodeDecodeError:
            # this shouldn't happen: pandoc more or less guarantees that the output is utf-8!
            raise RuntimeError('Pandoc output was not utf-8.')

        try:
            stderr = stderr.decode('utf-8')
        except UnicodeDecodeError:
            # this shouldn't happen: pandoc more or less guarantees that the output is utf-8!
            raise RuntimeError('Pandoc output was not utf-8.')

        # check that pandoc returned successfully
        if p.returncode != 0:
>           raise RuntimeError(
                'Pandoc died with exitcode "%s" during conversion: %s' % (p.returncode, stderr)
            )
E           RuntimeError: Pandoc died with exitcode "83" during conversion: Error running filter /tmp/tmpbl813ywg.py:
E           Could not find executable python

pypandoc/__init__.py:426: RuntimeError
----------------------------------------------------------------------------------- Captured stdout call ------------------------------------------------------------------------------------
/home/tkloczko
______________________________________________________________________ TestPypandoc.test_conversion_with_python_filter ______________________________________________________________________

self = <tests.TestPypandoc testMethod=test_conversion_with_python_filter>

    def test_conversion_with_python_filter(self):
        markdown_source = "**Here comes the content.**"
        python_source = '''\
        #!{0}

        """
        Pandoc filter to convert all regular text to uppercase.
        Code, link URLs, etc. are not affected.
        """

        from pandocfilters import toJSONFilter, Str

        def caps(key, value, format, meta):
            if key == 'Str':
                return Str(value.upper())

        if __name__ == "__main__":
            toJSONFilter(caps)
        '''
        python_source = textwrap.dedent(python_source)
        python_source.format(sys.executable)

        with closed_tempfile(".py", python_source) as tempfile:
>           output = pypandoc.convert_text(
                markdown_source, to='html', format='md', outputfile=None, filters=tempfile
            ).strip()

tests.py:353:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pypandoc/__init__.py:91: in convert_text
    return _convert_input(source, format, 'string', to, extra_args=extra_args,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

source = b'**Here comes the content.**', format = 'markdown', input_type = 'string', to = 'html', extra_args = (), outputfile = None, filters = ['/tmp/tmpcwer2aku.py'], verify_format = True
sandbox = False, cworkdir = None

    def _convert_input(source, format, input_type, to, extra_args=(),
                       outputfile=None, filters=None, verify_format=True,
                       sandbox=False, cworkdir=None):

        _check_log_handler()

        logger.debug("Ensuring pandoc path...")
        _ensure_pandoc_path()

        if verify_format:
            logger.debug("Verifying format...")
            format, to = _validate_formats(format, to, outputfile)
        else:
            format = normalize_format(format)
            to = normalize_format(to)

        logger.debug("Identifying input type...")
        string_input = input_type == 'string'
        if not string_input:
            if isinstance(source, str):
                input_file = [source]
            else:
                input_file = source
        else:
            input_file = []

        input_file = sorted(input_file)
        args = [__pandoc_path, '--from=' + format]

        args.append('--to=' + to)

        args += input_file

        if outputfile:
            args.append("--output=" + str(outputfile))

        if sandbox:
            if ensure_pandoc_minimal_version(2,15): # sandbox was introduced in pandoc 2.15, so only add if we are using 2.15 or above.
                logger.debug("Adding sandbox argument...")
                args.append("--sandbox")
            else:
                logger.warning("Sandbox argument was used, but pandoc version is too low. Ignoring argument.")

        args.extend(extra_args)

        # adds the proper filter syntax for each item in the filters list
        if filters is not None:
            if isinstance(filters, string_types):
                filters = filters.split()
            f = ['--lua-filter=' + x if x.endswith(".lua") else '--filter=' + x for x in filters]
            args.extend(f)

        # To get access to pandoc-citeproc when we use a included copy of pandoc,
        # we need to add the pypandoc/files dir to the PATH
        new_env = os.environ.copy()
        files_path = os.path.join(os.path.dirname(os.path.realpath(__file__)), "files")
        new_env["PATH"] = new_env.get("PATH", "") + os.pathsep + files_path
        creation_flag = 0x08000000 if sys.platform == "win32" else 0 # set creation flag to not open pandoc in new console on windows

        old_wd = os.getcwd()
        if cworkdir and old_wd != cworkdir:
            os.chdir(cworkdir)

        logger.debug("Running pandoc...")
        p = subprocess.Popen(
            args,
            stdin=subprocess.PIPE if string_input else None,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            env=new_env,
            creationflags=creation_flag)

        if cworkdir is not None:
            os.chdir(old_wd)

        # something else than 'None' indicates that the process already terminated
        if not (p.returncode is None):
            raise RuntimeError(
                'Pandoc died with exitcode "%s" before receiving input: %s' % (p.returncode,
                                                                               p.stderr.read())
            )

        if string_input:
            try:
                source = cast_bytes(source, encoding='utf-8')
            except (UnicodeDecodeError, UnicodeEncodeError):
                # assume that it is already a utf-8 encoded string
                pass
        try:
            stdout, stderr = p.communicate(source if string_input else None)
        except OSError:
            # this is happening only on Py2.6 when pandoc dies before reading all
            # the input. We treat that the same as when we exit with an error...
            raise RuntimeError('Pandoc died with exitcode "%s" during conversion.' % (p.returncode))

        try:
            stdout = stdout.decode('utf-8')
        except UnicodeDecodeError:
            # this shouldn't happen: pandoc more or less guarantees that the output is utf-8!
            raise RuntimeError('Pandoc output was not utf-8.')

        try:
            stderr = stderr.decode('utf-8')
        except UnicodeDecodeError:
            # this shouldn't happen: pandoc more or less guarantees that the output is utf-8!
            raise RuntimeError('Pandoc output was not utf-8.')

        # check that pandoc returned successfully
        if p.returncode != 0:
>           raise RuntimeError(
                'Pandoc died with exitcode "%s" during conversion: %s' % (p.returncode, stderr)
            )
E           RuntimeError: Pandoc died with exitcode "83" during conversion: Error running filter /tmp/tmpcwer2aku.py:
E           Could not find executable python

pypandoc/__init__.py:426: RuntimeError
----------------------------------------------------------------------------------- Captured stdout call ------------------------------------------------------------------------------------
/home/tkloczko
===================================================================================== warnings summary ======================================================================================
pypandoc/pandoc_download.py:61
  /home/tkloczko/rpmbuild/BUILD/pypandoc-1.11/pypandoc/pandoc_download.py:61: DeprecationWarning: invalid escape sequence \.
    regex = re.compile(r"/jgm/pandoc/releases/download/.*(?:"+processor_architecture+"|x86|mac).*\.(?:msi|deb|pkg)")

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================================================================================== short test summary info ==================================================================================
FAILED tests.py::TestPypandoc::test_conversion_with_data_files - RuntimeError: Pandoc died with exitcode "97" during conversion: Could not find data file data/data/docx/[Content_Types].xml
FAILED tests.py::TestPypandoc::test_conversion_with_mixed_filters - RuntimeError: Pandoc died with exitcode "83" during conversion: Error running filter /tmp/tmpbl813ywg.py:
FAILED tests.py::TestPypandoc::test_conversion_with_python_filter - RuntimeError: Pandoc died with exitcode "83" during conversion: Error running filter /tmp/tmpcwer2aku.py:
=================================================================== 3 failed, 37 passed, 1 deselected, 1 warning in 6.46s ===================================================================

@JessicaTegner
Owner

@kloczek Regarding the last two failures:

        python_source = '''\
        #!{0}

        """
        Pandoc filter to convert all regular text to uppercase.
        Code, link URLs, etc. are not affected.
        """

        from pandocfilters import toJSONFilter, Str

        def caps(key, value, format, meta):
            if key == 'Str':
                return Str(value.upper())

        if __name__ == "__main__":
            toJSONFilter(caps)
        '''
        python_source = textwrap.dedent(python_source)
        python_source.format(sys.executable)

We set the shebang line using "sys.executable", so the only reason it couldn't run the Python filters would be that "sys.executable" is either incorrect or still set to regular python (somehow).

Can you try running something like the following, to check what the sys.executable is set to when running our tests?

import sys

print(sys.executable)

As for the first one, about the data files: that's an error in the test case, where "sandbox" is explicitly set to True even though the default is now False. That should be easy enough to fix by omitting the sandbox parameter altogether.
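
One thing worth checking in the quoted test snippet, offered as a hedged sketch rather than a confirmed diagnosis: `str.format` returns a new string and leaves the original untouched, so a bare `python_source.format(sys.executable)` call would keep the `#!{0}` placeholder in the filter file. The `template` name below is only illustrative and is not taken from the test suite.

```python
import sys
import textwrap

# Illustrative template, modelled on the filter source in the test.
template = textwrap.dedent('''\
    #!{0}
    print("filter body")
''')

# Calling format() without assigning the result discards it.
template.format(sys.executable)
unassigned_first_line = template.splitlines()[0]

# Assigning the result substitutes the interpreter path into the shebang.
rendered = template.format(sys.executable)
rendered_first_line = rendered.splitlines()[0]

print(unassigned_first_line)  # still the raw placeholder
print(rendered_first_line)    # shebang with the current interpreter path
```

If the first printed line is still the placeholder, the shebang in the generated filter never pointed at "sys.executable" in the first place.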

@kloczek
Author

kloczek commented Mar 7, 2023

[tkloczko@pers-jacek SPECS]$ python3
Python 3.8.16 (default, Jan 30 2023, 13:00:00)
[GCC 13.0.1 20230127 (Red Hat 13.0.1-0)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> print(sys.executable)
/usr/bin/python3
>>>

@JessicaTegner
Owner

@kloczek can you test master now? After the work in PR #328, the Python ones should be fixed.
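
For anyone reproducing this locally, here is a minimal sketch of what a Python filter file needs on Unix for its shebang to be used. It assumes pandoc runs an executable filter directly and otherwise looks up an interpreter by file extension, which would match the "Could not find executable python" message. The `filter_source` and `filter_path` names are hypothetical stand-ins, not the test suite's own helpers.

```python
import os
import stat
import sys
import tempfile

# Hypothetical stand-in for the test's filter source, with the shebang filled in.
filter_source = "#!{0}\nprint('filter body')\n".format(sys.executable)

# Write the filter to a temporary .py file and close it before use.
fd, filter_path = tempfile.mkstemp(suffix=".py")
with os.fdopen(fd, "w") as handle:
    handle.write(filter_source)

# Mark the file as executable so the shebang line can take effect.
os.chmod(filter_path, 0o755)
mode = stat.S_IMODE(os.stat(filter_path).st_mode)
owner_can_execute = bool(mode & stat.S_IXUSR)

print(filter_path, oct(mode), owner_can_execute)
os.remove(filter_path)
```

This only checks the permission bits and the shebang text; it does not invoke pandoc.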

@jayvdb
Contributor

jayvdb commented May 8, 2023

This issue should be able to be closed now. ping @kloczek

@kloczek
Author

kloczek commented Apr 11, 2024

Hmm... I just retested 1.13 and pytest still fails in 3 units 🤔

Here is pytest output:
+ PYTHONPATH=/home/tkloczko/rpmbuild/BUILDROOT/python-pypandoc-1.13-4.fc37.x86_64/usr/lib64/python3.10/site-packages:/home/tkloczko/rpmbuild/BUILDROOT/python-pypandoc-1.13-4.fc37.x86_64/usr/lib/python3.10/site-packages
+ /usr/bin/pytest -ra -m 'not network' tests.py --deselect tests.py::TestPypandoc::test_pdf_conversion
==================================================================================== test session starts ====================================================================================
platform linux -- Python 3.10.14, pytest-8.1.1, pluggy-1.4.0
rootdir: /home/tkloczko/rpmbuild/BUILD/pypandoc-1.13
configfile: pyproject.toml
collected 41 items / 1 deselected / 40 selected

tests.py .......................F...FF...........                                                                                                                                     [100%]

========================================================================================= FAILURES ==========================================================================================
_______________________________________________________________________ TestPypandoc.test_conversion_with_data_files ________________________________________________________________________

self = <tests.TestPypandoc testMethod=test_conversion_with_data_files>

    def test_conversion_with_data_files(self):
        # remove our test.docx file from our test_data dir if it already exosts
        test_data_dir = os.path.join(os.path.dirname(__file__), 'test_data')
        test_docx_file = os.path.join(test_data_dir, 'test.docx')
        if os.path.exists(test_docx_file):
            os.remove(test_docx_file)
>       result = pypandoc.convert_file(
          os.path.join(test_data_dir, 'index.html'),
          to='docx',
          format='html',
          outputfile=test_docx_file,
          sandbox=True,
        )

tests.py:240:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pypandoc/__init__.py:200: in convert_file
    return _convert_input(discovered_source_files, format, 'path', to, extra_args=extra_args,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

source = '/home/tkloczko/rpmbuild/BUILD/pypandoc-1.13/test_data/index.html', format = 'html', input_type = 'path', to = 'docx', extra_args = ()
outputfile = '/home/tkloczko/rpmbuild/BUILD/pypandoc-1.13/test_data/test.docx', filters = None, verify_format = True, sandbox = True
cworkdir = '/home/tkloczko/rpmbuild/BUILD/pypandoc-1.13'

    def _convert_input(source, format, input_type, to, extra_args=(),
                       outputfile=None, filters=None, verify_format=True,
                       sandbox=False, cworkdir=None):

        _check_log_handler()

        logger.debug("Ensuring pandoc path...")
        _ensure_pandoc_path()

        if verify_format:
            logger.debug("Verifying format...")
            format, to = _validate_formats(format, to, outputfile)
        else:
            format = normalize_format(format)
            to = normalize_format(to)

        logger.debug("Identifying input type...")
        string_input = input_type == 'string'
        if not string_input:
            if isinstance(source, str):
                input_file = [source]
            else:
                input_file = source
        else:
            input_file = []

        input_file = sorted(input_file)
        args = [__pandoc_path, '--from=' + format]

        args.append('--to=' + to)

        args += input_file

        if outputfile:
            args.append("--output=" + str(outputfile))

        if sandbox:
            if ensure_pandoc_minimal_version(2,15): # sandbox was introduced in pandoc 2.15, so only add if we are using 2.15 or above.
                logger.debug("Adding sandbox argument...")
                args.append("--sandbox")
            else:
                logger.warning("Sandbox argument was used, but pandoc version is too low. Ignoring argument.")

        args.extend(extra_args)

        # adds the proper filter syntax for each item in the filters list
        if filters is not None:
            if isinstance(filters, string_types):
                filters = filters.split()
            f = ['--lua-filter=' + x if x.endswith(".lua") else '--filter=' + x for x in filters]
            args.extend(f)

        # To get access to pandoc-citeproc when we use a included copy of pandoc,
        # we need to add the pypandoc/files dir to the PATH
        new_env = os.environ.copy()
        files_path = os.path.join(os.path.dirname(os.path.realpath(__file__)), "files")
        new_env["PATH"] = new_env.get("PATH", "") + os.pathsep + files_path
        creation_flag = 0x08000000 if sys.platform == "win32" else 0 # set creation flag to not open pandoc in new console on windows

        old_wd = os.getcwd()
        if cworkdir and old_wd != cworkdir:
            os.chdir(cworkdir)

        logger.debug("Running pandoc...")
        p = subprocess.Popen(
            args,
            stdin=subprocess.PIPE if string_input else None,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            env=new_env,
            creationflags=creation_flag)

        if cworkdir is not None:
            os.chdir(old_wd)

        # something else than 'None' indicates that the process already terminated
        if not (p.returncode is None):
            raise RuntimeError(
                'Pandoc died with exitcode "%s" before receiving input: %s' % (p.returncode,
                                                                               p.stderr.read())
            )

        if string_input:
            try:
                source = cast_bytes(source, encoding='utf-8')
            except (UnicodeDecodeError, UnicodeEncodeError):
                # assume that it is already a utf-8 encoded string
                pass
        try:
            stdout, stderr = p.communicate(source if string_input else None)
        except OSError:
            # this is happening only on Py2.6 when pandoc dies before reading all
            # the input. We treat that the same as when we exit with an error...
            raise RuntimeError('Pandoc died with exitcode "%s" during conversion.' % (p.returncode))

        try:
            if not (to in ["odt", "docx", "epub", "epub3", "pdf"] and outputfile == "-"):
                stdout = stdout.decode('utf-8')
        except UnicodeDecodeError:
            # this shouldn't happen: pandoc more or less guarantees that the output is utf-8!
            raise RuntimeError('Pandoc output was not utf-8.')

        try:
            stderr = stderr.decode('utf-8')
        except UnicodeDecodeError:
            # this shouldn't happen: pandoc more or less guarantees that the output is utf-8!
            raise RuntimeError('Pandoc output was not utf-8.')

        # check that pandoc returned successfully
        if p.returncode != 0:
>           raise RuntimeError(
                'Pandoc died with exitcode "%s" during conversion: %s' % (p.returncode, stderr)
            )
E           RuntimeError: Pandoc died with exitcode "97" during conversion: Could not find data file data/data/docx/[Content_Types].xml

pypandoc/__init__.py:467: RuntimeError
----------------------------------------------------------------------------------- Captured stdout call ------------------------------------------------------------------------------------
/home/tkloczko
______________________________________________________________________ TestPypandoc.test_conversion_with_mixed_filters ______________________________________________________________________

self = <tests.TestPypandoc testMethod=test_conversion_with_mixed_filters>

    def test_conversion_with_mixed_filters(self):
        markdown_source = "-0-"

        lua = """\
        function Para(elem)
            return pandoc.Para(elem.content .. {{"{0}-"}})
        end
        """
        lua = textwrap.dedent(lua)

        python = """\
        #!{0}

        from pandocfilters import toJSONFilter, Para, Str

        def func(key, value, format, meta):
            if key == "Para":
                return Para(value + [Str("{{0}}-")])

        if __name__ == "__main__":
            toJSONFilter(func)

        """
        python = textwrap.dedent(python)
        python = python.format(sys.executable)

        with closed_tempfile(".lua", lua.format(1)) as temp1, closed_tempfile(".py", python.format(2)) as temp2:
            os.chmod(temp2, 0o755)

            with closed_tempfile(".lua", lua.format(3)) as temp3, closed_tempfile(".py", python.format(4)) as temp4:
                os.chmod(temp4, 0o755)

>               output = pypandoc.convert_text(
                    markdown_source, to="html", format="md", outputfile=None, filters=[temp1, temp2, temp3, temp4]
                ).strip()

tests.py:408:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pypandoc/__init__.py:92: in convert_text
    return _convert_input(source, format, 'string', to, extra_args=extra_args,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

source = b'-0-', format = 'markdown', input_type = 'string', to = 'html', extra_args = (), outputfile = None
filters = ['/tmp/tmpjpqbkha4.lua', '/tmp/tmpslzo_5p7.py', '/tmp/tmpv3iwkln3.lua', '/tmp/tmppf4ooinv.py'], verify_format = True, sandbox = False, cworkdir = None

    def _convert_input(source, format, input_type, to, extra_args=(),
                       outputfile=None, filters=None, verify_format=True,
                       sandbox=False, cworkdir=None):

        _check_log_handler()

        logger.debug("Ensuring pandoc path...")
        _ensure_pandoc_path()

        if verify_format:
            logger.debug("Verifying format...")
            format, to = _validate_formats(format, to, outputfile)
        else:
            format = normalize_format(format)
            to = normalize_format(to)

        logger.debug("Identifying input type...")
        string_input = input_type == 'string'
        if not string_input:
            if isinstance(source, str):
                input_file = [source]
            else:
                input_file = source
        else:
            input_file = []

        input_file = sorted(input_file)
        args = [__pandoc_path, '--from=' + format]

        args.append('--to=' + to)

        args += input_file

        if outputfile:
            args.append("--output=" + str(outputfile))

        if sandbox:
            if ensure_pandoc_minimal_version(2,15): # sandbox was introduced in pandoc 2.15, so only add if we are using 2.15 or above.
                logger.debug("Adding sandbox argument...")
                args.append("--sandbox")
            else:
                logger.warning("Sandbox argument was used, but pandoc version is too low. Ignoring argument.")

        args.extend(extra_args)

        # adds the proper filter syntax for each item in the filters list
        if filters is not None:
            if isinstance(filters, string_types):
                filters = filters.split()
            f = ['--lua-filter=' + x if x.endswith(".lua") else '--filter=' + x for x in filters]
            args.extend(f)

        # To get access to pandoc-citeproc when we use a included copy of pandoc,
        # we need to add the pypandoc/files dir to the PATH
        new_env = os.environ.copy()
        files_path = os.path.join(os.path.dirname(os.path.realpath(__file__)), "files")
        new_env["PATH"] = new_env.get("PATH", "") + os.pathsep + files_path
        creation_flag = 0x08000000 if sys.platform == "win32" else 0 # set creation flag to not open pandoc in new console on windows

        old_wd = os.getcwd()
        if cworkdir and old_wd != cworkdir:
            os.chdir(cworkdir)

        logger.debug("Running pandoc...")
        p = subprocess.Popen(
            args,
            stdin=subprocess.PIPE if string_input else None,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            env=new_env,
            creationflags=creation_flag)

        if cworkdir is not None:
            os.chdir(old_wd)

        # something else than 'None' indicates that the process already terminated
        if not (p.returncode is None):
            raise RuntimeError(
                'Pandoc died with exitcode "%s" before receiving input: %s' % (p.returncode,
                                                                               p.stderr.read())
            )

        if string_input:
            try:
                source = cast_bytes(source, encoding='utf-8')
            except (UnicodeDecodeError, UnicodeEncodeError):
                # assume that it is already a utf-8 encoded string
                pass
        try:
            stdout, stderr = p.communicate(source if string_input else None)
        except OSError:
            # this is happening only on Py2.6 when pandoc dies before reading all
            # the input. We treat that the same as when we exit with an error...
            raise RuntimeError('Pandoc died with exitcode "%s" during conversion.' % (p.returncode))

        try:
            if not (to in ["odt", "docx", "epub", "epub3", "pdf"] and outputfile == "-"):
                stdout = stdout.decode('utf-8')
        except UnicodeDecodeError:
            # this shouldn't happen: pandoc more or less guarantees that the output is utf-8!
            raise RuntimeError('Pandoc output was not utf-8.')

        try:
            stderr = stderr.decode('utf-8')
        except UnicodeDecodeError:
            # this shouldn't happen: pandoc more or less guarantees that the output is utf-8!
            raise RuntimeError('Pandoc output was not utf-8.')

        # check that pandoc returned successfully
        if p.returncode != 0:
>           raise RuntimeError(
                'Pandoc died with exitcode "%s" during conversion: %s' % (p.returncode, stderr)
            )
E           RuntimeError: Pandoc died with exitcode "83" during conversion: Traceback (most recent call last):
E             File "/tmp/tmpslzo_5p7.py", line 3, in <module>
E               from pandocfilters import toJSONFilter, Para, Str
E           ModuleNotFoundError: No module named 'pandocfilters'
E           Error running filter /tmp/tmpslzo_5p7.py:
E           Filter returned error status 1

pypandoc/__init__.py:467: RuntimeError
----------------------------------------------------------------------------------- Captured stdout call ------------------------------------------------------------------------------------
/home/tkloczko
______________________________________________________________________ TestPypandoc.test_conversion_with_python_filter ______________________________________________________________________

self = <tests.TestPypandoc testMethod=test_conversion_with_python_filter>

    def test_conversion_with_python_filter(self):
        markdown_source = "**Here comes the content.**"
        python_source = '''\
        #!{0}

        """
        Pandoc filter to convert all regular text to uppercase.
        Code, link URLs, etc. are not affected.
        """

        from pandocfilters import toJSONFilter, Str

        def caps(key, value, format, meta):
            if key == 'Str':
                return Str(value.upper())

        if __name__ == "__main__":
            toJSONFilter(caps)
        '''
        python_source = textwrap.dedent(python_source)
        python_source = python_source.format(sys.executable)

        with closed_tempfile(".py", python_source) as tempfile:
            os.chmod(tempfile, 0o755)
>           output = pypandoc.convert_text(
                markdown_source, to='html', format='md', outputfile=None, filters=tempfile
            ).strip()

tests.py:354:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pypandoc/__init__.py:92: in convert_text
    return _convert_input(source, format, 'string', to, extra_args=extra_args,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

source = b'**Here comes the content.**', format = 'markdown', input_type = 'string', to = 'html', extra_args = (), outputfile = None, filters = ['/tmp/tmp3f9k2vwi.py'], verify_format = True
sandbox = False, cworkdir = None

    def _convert_input(source, format, input_type, to, extra_args=(),
                       outputfile=None, filters=None, verify_format=True,
                       sandbox=False, cworkdir=None):

        _check_log_handler()

        logger.debug("Ensuring pandoc path...")
        _ensure_pandoc_path()

        if verify_format:
            logger.debug("Verifying format...")
            format, to = _validate_formats(format, to, outputfile)
        else:
            format = normalize_format(format)
            to = normalize_format(to)

        logger.debug("Identifying input type...")
        string_input = input_type == 'string'
        if not string_input:
            if isinstance(source, str):
                input_file = [source]
            else:
                input_file = source
        else:
            input_file = []

        input_file = sorted(input_file)
        args = [__pandoc_path, '--from=' + format]

        args.append('--to=' + to)

        args += input_file

        if outputfile:
            args.append("--output=" + str(outputfile))

        if sandbox:
            if ensure_pandoc_minimal_version(2,15): # sandbox was introduced in pandoc 2.15, so only add if we are using 2.15 or above.
                logger.debug("Adding sandbox argument...")
                args.append("--sandbox")
            else:
                logger.warning("Sandbox argument was used, but pandoc version is too low. Ignoring argument.")

        args.extend(extra_args)

        # adds the proper filter syntax for each item in the filters list
        if filters is not None:
            if isinstance(filters, string_types):
                filters = filters.split()
            f = ['--lua-filter=' + x if x.endswith(".lua") else '--filter=' + x for x in filters]
            args.extend(f)

        # To get access to pandoc-citeproc when we use a included copy of pandoc,
        # we need to add the pypandoc/files dir to the PATH
        new_env = os.environ.copy()
        files_path = os.path.join(os.path.dirname(os.path.realpath(__file__)), "files")
        new_env["PATH"] = new_env.get("PATH", "") + os.pathsep + files_path
        creation_flag = 0x08000000 if sys.platform == "win32" else 0 # set creation flag to not open pandoc in new console on windows

        old_wd = os.getcwd()
        if cworkdir and old_wd != cworkdir:
            os.chdir(cworkdir)

        logger.debug("Running pandoc...")
        p = subprocess.Popen(
            args,
            stdin=subprocess.PIPE if string_input else None,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            env=new_env,
            creationflags=creation_flag)

        if cworkdir is not None:
            os.chdir(old_wd)

        # something else than 'None' indicates that the process already terminated
        if not (p.returncode is None):
            raise RuntimeError(
                'Pandoc died with exitcode "%s" before receiving input: %s' % (p.returncode,
                                                                               p.stderr.read())
            )

        if string_input:
            try:
                source = cast_bytes(source, encoding='utf-8')
            except (UnicodeDecodeError, UnicodeEncodeError):
                # assume that it is already a utf-8 encoded string
                pass
        try:
            stdout, stderr = p.communicate(source if string_input else None)
        except OSError:
            # this is happening only on Py2.6 when pandoc dies before reading all
            # the input. We treat that the same as when we exit with an error...
            raise RuntimeError('Pandoc died with exitcode "%s" during conversion.' % (p.returncode))

        try:
            if not (to in ["odt", "docx", "epub", "epub3", "pdf"] and outputfile == "-"):
                stdout = stdout.decode('utf-8')
        except UnicodeDecodeError:
            # this shouldn't happen: pandoc more or less guarantees that the output is utf-8!
            raise RuntimeError('Pandoc output was not utf-8.')

        try:
            stderr = stderr.decode('utf-8')
        except UnicodeDecodeError:
            # this shouldn't happen: pandoc more or less guarantees that the output is utf-8!
            raise RuntimeError('Pandoc output was not utf-8.')

        # check that pandoc returned successfully
        if p.returncode != 0:
>           raise RuntimeError(
                'Pandoc died with exitcode "%s" during conversion: %s' % (p.returncode, stderr)
            )
E           RuntimeError: Pandoc died with exitcode "83" during conversion: Traceback (most recent call last):
E             File "/tmp/tmp3f9k2vwi.py", line 8, in <module>
E               from pandocfilters import toJSONFilter, Str
E           ModuleNotFoundError: No module named 'pandocfilters'
E           Error running filter /tmp/tmp3f9k2vwi.py:
E           Filter returned error status 1

pypandoc/__init__.py:467: RuntimeError
----------------------------------------------------------------------------------- Captured stdout call ------------------------------------------------------------------------------------
/home/tkloczko
===================================================================================== warnings summary ======================================================================================
pypandoc/pandoc_download.py:61
  /home/tkloczko/rpmbuild/BUILD/pypandoc-1.13/pypandoc/pandoc_download.py:61: DeprecationWarning: invalid escape sequence '\.'
    regex = re.compile(r"/jgm/pandoc/releases/download/.*(?:"+processor_architecture+"|x86|mac).*\.(?:msi|deb|pkg)")

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================================================================================== short test summary info ==================================================================================
FAILED tests.py::TestPypandoc::test_conversion_with_data_files - RuntimeError: Pandoc died with exitcode "97" during conversion: Could not find data file data/data/docx/[Content_Types].xml
FAILED tests.py::TestPypandoc::test_conversion_with_mixed_filters - RuntimeError: Pandoc died with exitcode "83" during conversion: Traceback (most recent call last):
FAILED tests.py::TestPypandoc::test_conversion_with_python_filter - RuntimeError: Pandoc died with exitcode "83" during conversion: Traceback (most recent call last):
=================================================================== 3 failed, 37 passed, 1 deselected, 1 warning in 4.19s ===================================================================

@JessicaTegner
Owner

@kloczek Can you try the following?

  • Run from source, using a checkout of the git repo.
  • Run poetry install and make sure it installs pandocfilters.
  • Then run the tests with poetry run python tests.py.
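
To double-check that pandocfilters is actually visible before running the tests, something like the following could help. This is only a minimal sketch: the helper name missing_modules is made up for illustration and is not part of pypandoc. It also only checks the interpreter that runs the script, which may not be the same one pandoc picks up when it executes a Python filter.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that this interpreter cannot find.

    Hypothetical helper for illustration; not part of pypandoc.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # pandocfilters is the module the failing filter tests try to import.
    missing = missing_modules(["pandocfilters"])
    if missing:
        print("Not importable from %s: %s" % (sys.executable, ", ".join(missing)))
    else:
        print("pandocfilters is importable from %s" % sys.executable)
```

If this reports pandocfilters as missing, the two filter tests would be expected to fail in the same way as in the output above.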
