Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewire python<->java logging to be generally friendlier to python #2763

Merged
merged 5 commits into from
Sep 1, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@
import java.nio.ByteBuffer;
import java.util.Objects;

class LogBufferOutputStream extends OutputStream {
public class LogBufferOutputStream extends OutputStream {

private final LogBuffer sink;
private final LogLevel level;
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,49 @@
package io.deephaven.integrations.python;

import org.jetbrains.annotations.NotNull;
import org.jpy.PyObject;

import java.io.IOException;
import java.io.OutputStream;
import java.util.function.Supplier;

/**
* Simple output stream that redirects all writes to a python io.TextIOBase type.
*/
public class PyLogOutputStream extends OutputStream {
private final Supplier<PyObject> rawIoBaseSupplier;

public PyLogOutputStream(Supplier<PyObject> rawIoBaseSupplier) {
this.rawIoBaseSupplier = rawIoBaseSupplier;
}

@Override
public void write(int i) throws IOException {
write(new byte[] {(byte) i});
}

@Override
public void write(@NotNull byte[] b) throws IOException {
// TODO (deephaven#2793) switch to modern method overloads when jpy#87 is fixed
rawIoBaseSupplier.get().callMethod("write", new String(b));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need / want to account for Charset here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, it would be better if we could call org.jpy.PyObject#call(java.lang.Class<T>, java.lang.String, java.lang.Class<?>[], java.lang.Object[]) - this allows us to be more explicit w/ the parameter and return types.

As it stands now, all of the calls to callMethod return PyObject, which I think is extra garbage we shouldn't need to create.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that raises a question - why can jpy convert cleanly from py bytes to java byte[], but not vice versa? That's the real answer here, since it is already encoded (and we can't cleanly ask for the underlying encoding... since jpy might break it).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That said, why be more explicit with the param types? Both call into PyLib.callAndReturnValue, but the variant that takes lots of Class<?>s does java-side type checks, rather than just passing the string instance - and there is no return type to be concerned with.

(Also, .call() vs .callMethod() has semantics that technically we don't want here, we actually want callMethod. However, as far as I can tell, jpy just throws away that extra flag with a "todo why isn't this used?" sort of note)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm mainly concerned about creating extraneous PyObjects - "write" will return a PyObject that is a python int.

try (final PyObject ignored = rawIoBaseSupplier.get().callMethod("write", new String(b))) {
}

vs

rawIoBaseSupplier.get().call(int.class, "write", String.class, new String(b));

IMO, the second is much better.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, these are not the same - call sets methodCall=false and its docs clarify that this is invoking a python callable object, not a method. All overrides do this.

In contrast, callMethod sets methodCall=true, and is intended for calling methods. We don't have (but arguably should?) an overload of callMethod like call does to support each arg type. I'd support going so far as to add overloads that return primitives or void, to avoid that extra cost you describe.

In this case, we want to call a method, we are not interacting with a callable object.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the jpy implementation does not actually use that boolean internally - some sort of historic wart before my time.

Copy link
Member Author

@niloc132 niloc132 Aug 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - that's what I said above - in which case we should either figure out what issue might be there or historically have been there, or change the API. I'm just going off of the current stated contract of JPy, which is that this isn't to be used for method calls. But at the end of the day, we're using the wrong method that happens to do the right thing, for now, probably, to save on creating one object (that will either be called approx once per jvm, in the case of close(), or for write and maybe flush, are likely to be jitted to be inlined and have escap analysis handle anyway)?

I've you insist, I'll write the wrong code that currently does what we want but not what we're asking for, but this is an improvement to make to jpy and then improve our usage of it, so I'm objecting to deliberately doing the wrong thing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've filed jpy-consortium/jpy#87.

org.jpy.PyObject#call(java.lang.Class<T>, java.lang.String, java.lang.Class<A0>, A0) doesn't expose the notion of "methodCall", or have any javadoc (my fault). I think your argument falls down a little bit though, b/c if the implementation of that method instead used true instead of false, the outcome would be exactly the same, and is arguably an implementation detail.

I do like to consider myself against unnecessary pre-optimization. With respect to PyObject though, it's not the allocation itself I'm usually worried about, but the implications that extraneous PyObjects may cause (PyObjectReferences growth, unnecessarily keeping python references alive). And specifically as a glue layer between streams, there's potential for this to be heavily used. In general, if there is a way to accomplish the same outcome but without creating a PyObject, that's the direction I'll lean.

All that said - I've argued my case, and can let the code stand as-is if that's still your preference.

}

@Override
public void write(@NotNull byte[] b, int off, int len) throws IOException {
byte[] buffer = new byte[len];
System.arraycopy(b, off, buffer, 0, len);
write(buffer);
}

@Override
public void flush() throws IOException {
// TODO (deephaven#2793) switch to modern method overloads when jpy#87 is fixed
rawIoBaseSupplier.get().callMethod("flush");
}

@Override
public void close() throws IOException {
// TODO (deephaven#2793) switch to modern method overloads when jpy#87 is fixed
rawIoBaseSupplier.get().callMethod("close");
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,6 @@
import io.deephaven.base.verify.Assert;
import io.deephaven.configuration.Configuration;
import io.deephaven.engine.exceptions.CancellationException;
import io.deephaven.engine.context.QueryLibrary;
import io.deephaven.engine.context.QueryScope;
import io.deephaven.engine.updategraph.UpdateGraphProcessor;
import io.deephaven.engine.util.AbstractScriptSession;
Expand Down Expand Up @@ -93,11 +92,6 @@ public PythonDeephavenSession(
}
this.scriptFinder = new ScriptFinder(DEFAULT_SCRIPT_PATH);

/*
* We redirect the standard Python sys.stdout and sys.stderr streams to our log object.
*/
PythonLogAdapter.interceptOutputStreams(evaluator);

publishInitial();

/*
Expand Down

This file was deleted.

9 changes: 8 additions & 1 deletion engine/table/python/core/deephaven_jpy_init.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,10 +2,17 @@

import jpy
import os
import sys
from deephaven_internal.stream import TeeStream

# Set stdin to /dev/null to prevent functions (like help()) that attempt to read from stdin from hanging the worker.
# Set stdin to /dev/null to prevent functions (like help()) that attempt to read from stdin from hanging python
# execution from within Java.
os.dup2(os.open("/dev/null", os.O_RDONLY), 0)

jpy.VerboseExceptions.enabled = True
# If you want jpy to tell you about all that it is doing, change this
# jpy.diag.flags = jpy.diag.F_ALL

j_sys = jpy.get_type('java.lang.System')
sys.stdout = TeeStream.redirect(sys.stdout, j_sys.out)
sys.stderr = TeeStream.redirect(sys.stderr, j_sys.err)
9 changes: 8 additions & 1 deletion py/embedded-server/deephaven_server/server.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
#

from typing import Dict, List, Optional
import sys

from .start_jvm import start_jvm

Expand Down Expand Up @@ -39,9 +40,15 @@ def __init__(self, host: Optional[str] = None, port: Optional[int] = None, jvm_a
# it is now safe to import jpy
import jpy

# Create a wrapped java server that we can reference to talk to the platform
# Create a python-wrapped java server that we can reference to talk to the platform
self.j_server = jpy.get_type('io.deephaven.python.server.EmbeddedServer')(host, port, dh_args)

# Obtain references to the deephaven logbuffer and redirect stdout/stderr to it. Note that we should not import
# this until after jpy has started.
from deephaven_internal.stream import TeeStream
sys.stdout = TeeStream.split(sys.stdout, self.j_server.getStdout())
sys.stderr = TeeStream.split(sys.stderr, self.j_server.getStderr())

# Keep a reference to the server so we know it is running
Server.instance = self

Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,49 @@
package io.deephaven.python.server;

import dagger.Module;
import dagger.Provides;
import dagger.multibindings.ElementsIntoSet;
import io.deephaven.base.system.StandardStreamState;
import io.deephaven.internal.log.InitSink;
import io.deephaven.internal.log.LoggerFactory;
import io.deephaven.io.log.LogSink;
import io.deephaven.io.logger.LogBuffer;
import io.deephaven.io.logger.LogBufferGlobal;
import io.deephaven.server.log.LogModule;

import javax.inject.Singleton;
import java.util.Collections;
import java.util.ServiceLoader;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

@Module
public interface EmbeddedPyLogModule {

@Provides
static LogBuffer providesLogBuffer() {
return LogBufferGlobal.getInstance().orElseThrow(() -> new RuntimeException("No global LogBuffer found"));
}

@Provides
static LogSink providesLogSink() {
// In contract, this should be a singleton - see LogInit#checkLogSinkIsSingleton()
return LoggerFactory.getLogger(LogModule.class).getSink();
}

@Provides
@ElementsIntoSet
static Set<InitSink> providesLoggerSinkSetups() {
return StreamSupport
.stream(ServiceLoader.load(InitSink.class).spliterator(), false)
.collect(Collectors.toSet());
}

@Provides
@Singleton
static StandardStreamState providesStandardStreamState() {
return new StandardStreamState(Collections.emptySet());
}

}
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,10 @@
import dagger.Component;
import io.deephaven.configuration.Configuration;
import io.deephaven.engine.util.ScriptSession;
import io.deephaven.integrations.python.PyLogOutputStream;
import io.deephaven.io.log.LogLevel;
import io.deephaven.io.logger.LogBuffer;
import io.deephaven.io.logger.LogBufferOutputStream;
import io.deephaven.server.console.groovy.GroovyConsoleSessionModule;
import io.deephaven.server.console.python.PythonConsoleSessionModule;
import io.deephaven.server.console.python.PythonGlobalScopeModule;
Expand All @@ -21,18 +25,22 @@
import io.deephaven.server.runner.DeephavenApiServerModule;
import io.deephaven.server.runner.Main;
import io.deephaven.server.util.Scheduler;
import org.jpy.PyModule;
import org.jpy.PyObject;

import javax.annotation.Nullable;
import javax.inject.Inject;
import javax.inject.Provider;
import javax.inject.Singleton;
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintStream;

public class EmbeddedServer {
@Singleton
@Component(modules = {
DeephavenApiServerModule.class,
EmbeddedPyLogModule.class,
DeephavenApiConfigModule.class,
PythonGlobalScopeModule.class,
HealthCheckModule.class,
Expand Down Expand Up @@ -60,7 +68,19 @@ interface Builder extends DeephavenApiServerComponent.Builder<PythonServerCompon
@Inject
Provider<ScriptSession> scriptSession;

// // this is a nice idea, but won't work, since this is the same instance that we had to disable via sysprop
// @Inject
// StreamToLogBuffer logBuffer;

@Inject
LogBuffer logBuffer;

public EmbeddedServer(String host, Integer port, PyObject dict) throws IOException {
// Redirect System.out and err to the python equivelents, in case python has (or will) redirected them.
PyModule sys = PyModule.importModule("sys");
System.setOut(new PrintStream(new PyLogOutputStream(() -> sys.getAttribute("stdout"))));
System.setErr(new PrintStream(new PyLogOutputStream(() -> sys.getAttribute("stderr"))));

final Configuration config = Main.init(new String[0], EmbeddedServer.class);
final Builder builder = JettyConfig.buildFromConfig(config);
if (host != null) {
Expand Down Expand Up @@ -113,4 +133,18 @@ private void checkGlobals(ScriptSession scriptSession, @Nullable ScriptSession.S
public int getPort() {
return server.server().getPort();
}

/**
* Provide a way for Python to "tee" log output into the Deephaven log buffers.
*/
public OutputStream getStdout() {
return new LogBufferOutputStream(logBuffer, LogLevel.STDOUT, 256, 1 << 19);
}

/**
* Provide a way for Python to "tee" log output into the Deephaven log buffers.
*/
public OutputStream getStderr() {
return new LogBufferOutputStream(logBuffer, LogLevel.STDERR, 256, 1 << 19);
}
}
Loading