Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

No agent #1245

Merged
merged 41 commits into from
Jan 20, 2025
Merged

No agent #1245

merged 41 commits into from
Jan 20, 2025

Conversation

Thrameos
Copy link
Contributor

@Thrameos Thrameos commented Nov 27, 2024

This resolves the issued with the agent. But I would like to fix the non-ascii path issue at the same time. So additional work is still required.

Comments and reviews welcome.

Reference #1246
Reference #1242
Reference NationalSecurityAgency/ghidra#7141
Reference py5coding/py5generator#563

This PR only drops the agent requirement and cleans up the class loader issues for non-ascii paths. PR #1246 deals with the broken JVMs on Windows.

jpype/_core.py Dismissed Show dismissed Hide dismissed
Copy link

codecov bot commented Nov 27, 2024

Codecov Report

Attention: Patch coverage is 60.46512% with 51 lines in your changes missing coverage. Please review.

Project coverage is 86.89%. Comparing base (29cce31) to head (e80100b).
Report is 42 commits behind head on master.

Files with missing lines Patch % Lines
jpype/_core.py 55.07% 31 Missing ⚠️
jpype/dbapi2.py 68.18% 7 Missing ⚠️
native/common/jp_context.cpp 61.53% 4 Missing and 1 partial ⚠️
native/python/pyjp_module.cpp 50.00% 2 Missing and 2 partials ⚠️
jpype/imports.py 66.66% 2 Missing ⚠️
jpype/_classpath.py 50.00% 1 Missing ⚠️
jpype/_jproxy.py 50.00% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1245      +/-   ##
==========================================
- Coverage   87.19%   86.89%   -0.31%     
==========================================
  Files         113      113              
  Lines       10296    10336      +40     
  Branches     4051     4065      +14     
==========================================
+ Hits         8978     8981       +3     
- Misses        726      760      +34     
- Partials      592      595       +3     
Files with missing lines Coverage Δ
jpype/_jclass.py 99.24% <ø> (ø)
jpype/_jvmfinder.py 61.53% <100.00%> (ø)
jpype/protocol.py 98.71% <100.00%> (ø)
native/common/include/jp_context.h 70.27% <ø> (ø)
native/common/jp_classloader.cpp 88.88% <100.00%> (+1.38%) ⬆️
native/common/jp_javaframe.cpp 89.69% <100.00%> (ø)
jpype/_classpath.py 80.95% <50.00%> (ø)
jpype/_jproxy.py 97.87% <50.00%> (ø)
jpype/imports.py 78.57% <66.66%> (ø)
native/python/pyjp_module.cpp 81.91% <50.00%> (-0.75%) ⬇️
... and 3 more

@Thrameos
Copy link
Contributor Author

@tachyonicClock I am taking another crack at the non-ascii paths. To deal with them I am setting up a temporary directory and just copying the required jar file in place. I will then try to bootstrap the rest.

Thus far everything but services are working.

@tachyonicClock
Copy link
Contributor

tachyonicClock commented Nov 27, 2024

I believe using a temporary directory is likely the most effective workaround.

Linking vs. Copying: If possible, we should try to hard link the libraries rather than copying them. This will require a fallback for people linking between disk drives, since you cannot hard-link across storage devices. Since you can symbolic link directories across drives we could try that but it might get messy. Symbolic linking or hard-linking will add close to zero startup time.

Uniform Behavior: Branching on java_class_path.isascii() creates two different behaviors. It would reduce the possibility of bugs to simply perform the workaround for all paths, regardless of whether they need it. This way, we can catch bugs in the workaround.

Custom Loaders: For people who need custom loaders or things on the class path at startup, we could expose a more flexible startJVM function:

startJVM(..., classpath: List[Path], system_classpath: List[Path])
  • classpath: The post-startup Unicode-compatible approach to adding files to the classpath.
  • system_classpath: The approach that adds files to the JVM classpath options (-Djava.class.path). This should perform the temporary directory hack for all system class paths. This way, people can have custom class loaders. The support library can be added to this list.
  • A downside is that it would break backwards compatibility.

@astrelsky
Copy link
Contributor

astrelsky commented Nov 27, 2024

The previous fix that included the agent was mis-behaving because dlls from a different jdk vendor were being loaded with the specified jvm.dll. Why this issue never appeared in the past, I'm not so sure. It was basically a dll search path issue.

Also, symlinks on windows requires admin.

@Thrameos
Copy link
Contributor Author

@tachyonicClock I agree that uniform behavior would be best. I would make everything go through our system class loader if I through that we could get away with it. However, there is a warning that is making me nervous...

OpenJDK 64-Bit Server VM warning: Archived non-system classes are disabled because the java.system.class.loader property is specified (value = "org.jpype.classloader.DynamicClassLoader"). To use archived non-system classes, this property must not be set

I have no idea what a Archived non-system class would be or what they are used for.

NationalSecurityAgency/ghidra#2400

As to linking that is really only a unix thing. For now I will copy. I may be able to unlink it after starting but there are risks with that. More experiments needed.

I copied my whole jpype install under a ./unicode_à😎 directory and trying to get it all working there. Still a bunch of edge cases. But if we can get all the tests running there than we should be set.

@Thrameos
Copy link
Contributor Author

Still running into edge cases here.

Down to two issues.

  1. A directory of classes is being dealt with differently than a set of jars. This is breaking some of our test cases. I am not sure how the system class loader was dealing with these before.
  2. The SQL tests are broken. Unfortunately, this may mean that our service test was inadequate as JDB are using the service loaded.

Copy link
Contributor Author

@Thrameos Thrameos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still a work in progress but getting closer.

Comment on lines 43 to 80

private static URL[] launch()
{
String cp = System.getProperty("jpype.class.path");
if (cp == null)
return new URL[0];

ArrayList<URL> path = new ArrayList<>();
int off = 0, next;
do
{
next = cp.indexOf(File.pathSeparator, off);
String element = (next == -1)
? cp.substring(off)
: cp.substring(off, next);
if (!element.isEmpty())
{
try
{
URL url = Paths.get(element).toUri().toURL();
if (url != null)
path.add(url);
} catch (MalformedURLException ex)
{
System.err.println("Malformed url "+ element);
} catch (IOException ex)
{
System.err.println("Unable to open "+ element);
}
}
off = next + 1;
} while (next != -1);

System.out.println("jpype.class.path " + cp);
System.clearProperty("jpype.class.path");
System.setProperty("java.class.path", cp);
return path.toArray(new URL[0]);
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tachyonicClock This section allows for us to load the non-ascii path just like a normal one. Thus removing the need for late load logic.

@Thrameos
Copy link
Contributor Author

I understand the two issues. The key problem is that callerSensitive picks up the class loader for the class which was loaded not the one called from Python. So as org.jpype is loaded from the system classloader it gets the broken ApplicationClassLoader rather than our system specified class loader.

As usually there are clear gaps in the Java api which are driving this. Only in the case that ApplicationClassLoader was use can this work perfectly.

I can make dbapi2 work if I force the caller sensitive logic off, but I am not sure if that works in all cases.

@Thrameos
Copy link
Contributor Author

Thrameos commented Nov 28, 2024

So close but still no cigar on this one. Each fix has its own downsides and each is thwarted by something deep in the Java API. I am guessing next attack will be to make a separate org.jpype.cl jar file with just the dynamic class loader. The issue all stem from our caller sensitive grabbing the broken application loader. Thus I need to fool it into thinking we are calling something loaded from the DynamicClassLoader level. Other than splitting the jars not much chance making such flip.

@astrelsky
Copy link
Contributor

I believe using a temporary directory is likely the most effective workaround.

Linking vs. Copying: If possible, we should try to hard link the libraries rather than copying them. This will require a fallback for people linking between disk drives, since you cannot hard-link across storage devices. Since you can symbolic link directories across drives we could try that but it might get messy. Symbolic linking or hard-linking will add close to zero startup time.

Uniform Behavior: Branching on java_class_path.isascii() creates two different behaviors. It would reduce the possibility of bugs to simply perform the workaround for all paths, regardless of whether they need it. This way, we can catch bugs in the workaround.

Custom Loaders: For people who need custom loaders or things on the class path at startup, we could expose a more flexible startJVM function:

startJVM(..., classpath: List[Path], system_classpath: List[Path])
  • classpath: The post-startup Unicode-compatible approach to adding files to the classpath.
  • system_classpath: The approach that adds files to the JVM classpath options (-Djava.class.path). This should perform the temporary directory hack for all system class paths. This way, people can have custom class loaders. The support library can be added to this list.
  • A downside is that it would break backwards compatibility.

Unfortunately, using a temporary directory will cause it to always fail for anyone using windows with non ascii characters in their username.

@Thrameos
Copy link
Contributor Author

There is one unresolved issue which I don't know how to test in the matrix. If the Python shared library is on a non-ascii path, then I get this very inscrutable error.

  File "C:\Users\nelson85\Documents\devel\open\unicode_à😎\jpype\test.py", line 4, in <module>
    jpype.startJVM()
  File "C:\Users\nelson85\Documents\devel\open\unicode_à😎\jpype\jpype\_core.py", line 345, in startJVM
    _jpype.startup(jvmpath, tuple(jvm_args + extra_jvm_args),
SystemError: java.lang.UnsatisfiedLinkError: C:\Users\nelson85\Documents\devel\open\unicode_à??\jpype\_jpype.cp312-win_amd64.pyd: Can't find dependent libraries

@Thrameos
Copy link
Contributor Author

Love when they have an error for ages but they won't fix it because it would break compatibility with being broken.

https://bugs.openjdk.org/browse/JDK-8195129

@Thrameos
Copy link
Contributor Author

Thrameos commented Nov 29, 2024

I don't believe it is possible to fix the issue of the dll located on a non-ascii path on windows. The problem stems from this code in Java.

// Loads .dll/.so and
// in case of error it checks if .dll/.so was built for the
// same architecture as Hotspot is running on
void * os::dll_load(const char *name, char *ebuf, int ebuflen) {
  log_info(os)("attempting shared library load of %s", name);

  void * result = LoadLibrary(name);  <== 
  if (result != NULL) {
    Events::log(NULL, "Loaded shared library %s", name);
    // Recalculate pdb search path if a DLL was loaded successfully.
   ...

They are calling the ascii only version of LoadLibrary. Apparently the correct solution is to convert to wide characters first than call LoadLibraryW. Unfortunately, this is above my level to deal with.

The only work around I see:

  1. Figure out given the current code page the appropriate set of bytes that will make LoadLibrary work.
  2. Instead of doing load library, manually install each of our entry points using JNI_RegisterNatives.
  3. Copy the dll some where and relink, but that will fail because we already have the native library loaded.

The second is possible, but would require a special script just to maintain it. The first I don't know how to start with because I am not sure how windows code pages even appear in Java.

@Thrameos
Copy link
Contributor Author

@tachyonicClock I found the solution to ascii characters on the path, though it is horrible. Apparently when you convert the path name to Short it will give you a path which works regardless. I hacked that into the process.

The last part is intractable. Whenever you run JPype on a Windows machine with a non-ascii path for the install directory, it is going to need to make a temporary copy of the jar. BUT it is also going to hold that file handle open forever with no way of deleting it from within the process that I have found. I put in a good 4 hours trying different work arounds. I have no clue why Java keeps the file open on Windows and not on Linux.

We are technically working in that all of the functionality is present. We need help from OpenJDK to implement better support for this problem. I have done all that is feasible. Though perhaps someone else as a creative idea.

@Thrameos
Copy link
Contributor Author

@marscher This is ready for a review, but contains enough black magic that a horcrux may have been created. There were at least 5 separate bugs in the JVM that I was trying to dodge.

  1. System.load() does not work on windows with non-ascii path.
  2. SystemClassLoader must be entirely replaced to run with non-ascii path.
  3. JDBC consults the broken copy of ClassLoaders.ApplicationClassLoader without a modified call stack.
  4. Class.forName consults the broken copy of ClassLoaders.ApplicationClassLoader without a modified call stack.
  5. Java does not closed down Jar handles when it is destroyed.

@astrelsky
Copy link
Contributor

  1. Java does not closed down Jar handles when it is destroyed.

https://docs.oracle.com/javase/8/docs/api/java/net/URLClassLoader.html#close--

You might just need to override close to close all the other ones and/or manually close it yourself before shutting down.

@Thrameos
Copy link
Contributor Author

The issue isn't the ones that we load. The issue is that on the system class loader itself. It is loaded by init level 3 and we will need to delete it at the end. Because it is the second item loaded in the system (before we have that ability to call a register native or insert class) it occupies a special place. I will look if perhaps there is a way to trash it on the shutdown sequence, but I suspect that it may be aone and done start given its location.

@Thrameos
Copy link
Contributor Author

Thrameos commented Dec 2, 2024

@marscher I am hoping to get 1.5.2 out in the next few weeks. Will you have time to review or should I try to find others to assist?

@ryanmkurtz
Copy link
Contributor

@marscher I am hoping to get 1.5.2 out in the next few weeks. Will you have time to review or should I try to find others to assist?

I was getting ready to upgrade Ghidra to JPype 1.5.1 when the mismatching java versions problem popped up. We plan on code freezing by the end of the month, so I'm debating if I should hold off for 1.5.2 and keep us at 1.5.0 which means we won't support Python 3.13, or upgrade to 1.5.1 knowing that 1.5.2 might not be out in time. If i do the latter, i'd like to at least document the known limitations, like the JDK on the PATH needing to match the version that gets passed to JPype. Maybe this choice will be made moot if 1.5.2 comes out relatively soon.

@Thrameos
Copy link
Contributor Author

Thrameos commented Dec 3, 2024

Assuming the code is ready I will be releasing this weekend or next. We don't like leaving broken versions out for long.

I do think documention of the weird platforms out there may still be necessary. Given nice, complete,well installed versions are on the test system. And users often have broken, incomplete, mismatched arch versions. We have no way to know that 1.5.2 won't also be a dud unless I give up on the non-ascii jar path. Even then once I try to support modules, once again we may produce another dud. Thus unless you are shipping all the pieces there will always be risk. That doesn't mean we won't do our best to support the majority of installations, but we are only human.

A number of things are broken in openJDK and have been for 10 years or more. Much of that came about because things were broken in windows and the Java developers threw up their hands. Thus we are fighting on a very shaky foundation.

@ryanmkurtz
Copy link
Contributor

Sounds good. We also see every possible combination of JDK's and platforms, so Ghidra will likely be a good real-world testing ground.

@Thrameos
Copy link
Contributor Author

Thrameos commented Dec 3, 2024

@ryanmkurtz Do you have an opinion on dropping 1.8? We have been holding out for a long time, but much of what I want to do going forward is going to be module based. Thus I have a need to drop it.

@ryanmkurtz
Copy link
Contributor

ryanmkurtz commented Dec 3, 2024

We have been switching to the latest LTS release about 6-12 months after it's released, so we are currently on JDK 21. Dropping 1.8 def. won't affect us or our users.

@Thrameos
Copy link
Contributor Author

Thrameos commented Dec 9, 2024

@marscher Any chance you can start the reviews this week?

@Thrameos
Copy link
Contributor Author

@astrelsky would you mind reviewing this code instead? I can't leave a defective version out there and have to start the ball on a replacement. But if I don't cut a new release soon I may end up in another busy period at work with little time to address issues.

@astrelsky
Copy link
Contributor

@astrelsky would you mind reviewing this code instead? I can't leave a defective version out there and have to start the ball on a replacement. But if I don't cut a new release soon I may end up in another busy period at work with little time to address issues.

I can. I probably won't get to it until tomorrow night.

Copy link
Contributor

@astrelsky astrelsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll see if I can break it tomorrow afternoon, as I'm burnt out for the night, but this looks good to me.

I forgot to check and am on my phone now. Is the case where the copied jar to the temp directory already exists handled? This could occur with multiprocessing before jvm startup maybe. I'm not sure if the temp path name used is randomized.

jpype/dbapi2.py Outdated Show resolved Hide resolved
Comment on lines 186 to 188
char shortPathBuffer[MAX_PATH];
GetShortPathName(path, shortPathBuffer, MAX_PATH);
return shortPathBuffer;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This had me do a double take for a moment because it appears to be returning an array on the stack even though it is not. Not that there is any better alternative.

According to the documentation for GetShortPathName, this can fail. Some file systems don't support the short name. Additionally, the documentation says that this can succeed but still return the long name instead, which is fun.

Is the UNICODE macro undefined or explicitly not set somewhere so it doesn't pick the wide char version?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue here was that GetShortPathName ate the non-ascii characters which allows java to load the library properly. This was the only pattern that I found to work consistently. UNICODE doesn't help the problem as the issue is that Java doesn't accept non-ascii anywhere in the path. It is better to let it mangle to "a1/b2/org.jpype.lib" which somehow allows it to function properly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. GetShortPathName is a macro though. If UNICODE is defined it will expand to GetShortPathNameW otherwise GetShortPathNameA.

I understand why this is being done. My point is that it can fail. As long as it fails in a way that we'll know what happened when the user reports a problem, then it's fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good. I let it fall through to the old behavior.

Comment on lines +193 to +194
if (dladdr((void*)getShared, &info))
return info.dli_fname;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems fine, I can't think of anything that would cause it to break.

// is holding the file open and there is no apparent method to close it
// so that this can succeed
if (jarTmpPath != "")
remove(jarTmpPath.c_str());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to use URLClassLoader.close on the SystemClassLoader. Once closed it will be unable to load anymore classes which have not yet been loaded. Since we are shutting down here, it will be fine, you just need to call close before shutdownJVM (obviously). This won't have any effect on standard Java classes since they are in the builtin and platform class loaders.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I called "close" on the JPypeClassLoader which is the system classloader. Unfortunately, that didn't change anything.

PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\nelson85\\AppData\\Local\\Temp\\tmpfrr4g750'

The problem is at the ApplicationClassLoader which is not a URL classloader.

    private static class AppClassLoader extends BuiltinClassLoader {
        static {
            if (!ClassLoader.registerAsParallelCapable())
                throw new InternalError();
        }

        AppClassLoader(BuiltinClassLoader parent, URLClassPath ucp) {
            super("app", parent, ucp);
        }

        @Override
        protected Class<?> loadClass(String cn, boolean resolve)
            throws ClassNotFoundException
        {
            // for compatibility reasons, say where restricted package list has
            // been updated to list API packages in the unnamed module.
            @SuppressWarnings("removal")
            SecurityManager sm = System.getSecurityManager();
            if (sm != null) {
                int i = cn.lastIndexOf('.');
                if (i != -1) {
                    sm.checkPackageAccess(cn.substring(0, i));
                }
            }

            return super.loadClass(cn, resolve);
        }

        @Override
        protected PermissionCollection getPermissions(CodeSource cs) {
            PermissionCollection perms = super.getPermissions(cs);
            perms.add(new RuntimePermission("exitVM"));
            return perms;
        }

        /**
         * Called by the VM to support dynamic additions to the class path
         *
         * @see java.lang.instrument.Instrumentation#appendToSystemClassLoaderSearch
         */
        void appendToClassPathForInstrumentation(String path) {
            appendClassPath(path);
        }

        /**
         * Called by the VM to support define package for AppCDS
         */
        protected Package defineOrCheckPackage(String pn, Manifest man, URL url) {
            return super.defineOrCheckPackage(pn, man, url);
        }

        /**
         * Called by the VM, during -Xshare:dump
         */
        private void resetArchivedStates() {
            setClassPath(null);
        }
    }

I can't find a close method in any of the source code or documentation. I would hope there would be a way to close or at least remove the lock.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah for some reason I was thinking that this was for our system ClassLoader. Nevermind.

setupext/build_ext.py Outdated Show resolved Hide resolved
jvmargs = _removeClassPath(jvmargs)
late_load = False
elif has_classloader:
# Code for 1.6 release when we add module support
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is getting very complicated. I contemplated it a few times with the previous changes but it may be a good idea to split it up more when adding the module path handling later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you see a way to clear this up or can we wait for the module path patch first?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just wait for the module patch.

@astrelsky
Copy link
Contributor

astrelsky commented Dec 19, 2024

I broke it where I did not expect to

Disregard that, it looks like I had an issue where a free-threaded build was laying around and loaded instead.
Why the hell is pip using the free threaded by default but the default python is normal. They really need to fix this.

@astrelsky
Copy link
Contributor

Ok good news I can't break it and #1242 is not necessary.

@Thrameos
Copy link
Contributor Author

Thrameos commented Dec 20, 2024

Okay I will work on making changes today and see if we can get a release candidate ready. I may need to ask others to test. Our testing matrix really can't cover all the deployments out there. Especially those partial jvms and international installs.

@@ -49,7 +49,8 @@
# 3.8 onward
from typing import Protocol, runtime_checkable
from typing import SupportsIndex, SupportsFloat # lgtm [py/unused-import]
from typing import Sequence, Mapping, Set, Callable # lgtm [py/unused-import]
# lgtm [py/unused-import]
from typing import Sequence, Mapping, Set, Callable

Check notice

Code scanning / CodeQL

Unused import Note

Import of 'Callable' is not used.
@Thrameos Thrameos merged commit c421810 into jpype-project:master Jan 20, 2025
26 of 29 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants