Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC: Clarify some details about foreign blobs #1224

Closed
wants to merge 1 commit into from

Conversation

kamahen
Copy link
Member

@kamahen kamahen commented Jan 26, 2024

I've also put some comments into pl-atom.c, to help me understand the code. You might wish to remove them. :)

There's one logic change: I added a check for the "name" field being non-null before calling the release() callback. This might be unnecessary, but it accords with a documentation change I made.

I tried changing the code in lookupBlob() for how the hash is computed, but that caused a crash when generating home/boot.prc ... that item is marked as a "TODO", which you might also want to remove.

@kamahen
Copy link
Member Author

kamahen commented Jan 27, 2024

I've reverted my changes to pl-atom.c (there's a newer version in master)

@JanWielemaker
Copy link
Member

Thanks. Did quite a bit of rewriting. Please check and ask if you do not understand. As usual, you got some details wrong, but this points me at the possible misconceptions and allows me to clarify 😄

@kamahen
Copy link
Member Author

kamahen commented Feb 2, 2024

"Note that this returns \const{NULL} when called by the atom garbage collector and PL_free_blob() was used on the blob."
Isn't it better for Prolog to check that Atom->name is non-null before calling the release() function? I can't think of anything that the release() function can do if the pointer is null. (I looked through the code and found one place where the pointer might be null and added a check before calling `release().)

In my documentation change for PL_close_foreign_frame(), I wanted to make clear that not all new terms are deleted when the frame is closed (unlike PL_discard_foreign_frame() or PL_rewind_foreign_frame()) - that is, only strictly local new terms that aren't unified with something "outside" the frame are discarded. This is obvious if you understand how the Warren Machine (or similar) works, but I don't think the "average" programmer would find it obvious (I had to think a bit about it when I was going over the API for PlFrame; perhaps I'm more prone to misinterpretation than the average programmer).

BTW, I think that "callback" is standard English usage, not "call back". (Except when "call" is a verb)

I changed line 1972 from

if ( !PL_unify(t1, t2) )

to

if ( !PL_unify(t1, t2) || !PL_unify(t3, t4) )

because I wanted the example to show the first unification succeeding and the second unification failing. If a single unification fails, there's no need to wrap things in PL_open_foreign_frame() (correct?).

@kamahen
Copy link
Member Author

kamahen commented Feb 2, 2024

Oops ... I mis-read the example code for PL_open_foreign_frame(). I'll try to rewrite the text a bit (in a separate PR) so that it's clearer, at least to me.

@JanWielemaker
Copy link
Member

Isn't it better for Prolog to check that Atom->name is non-null before calling the release() function? I can't think of anything that the release() function can do if the pointer is null.

I don't know. Possibly you still want the release callback to be notified if the atom is gone? I don't see much reason for this check in the core. They are surely two different actions: releasing the payload and releasing the atom. They are also easily enough to distinguish, so we do not need two callback handlers. If you want to hide that in the C++ wrapper, fine.

I wanted to make clear that not all new terms are deleted when the frame is closed (unlike PL_discard_foreign_frame() or PL_rewind_foreign_frame()) - that is, only strictly local new terms that aren't unified with something "outside" the frame are discarded.

I tried to rephrase the text. No term is deleted, only the references. This is true regardless of whether or not there are other references to the term. Indeed, the discard variation deletes the terms, but this is completely regardless of the references. Lifetime of terms is basically unrelated to the lifetime of term references except that term references to a term ensure it is not subject to GC.

callback

I'll try to remember ....

If the text is still unclear, please try to improve it. I think the current text is correct (but apparently hard to understand 😢 ) Scoping and lifetime of Prolog data is hard to understand for users of imperative languages ...

@kamahen
Copy link
Member Author

kamahen commented Mar 3, 2024

Can a release() callback do anything when PL_free_blob() returns NULL? All the information it needs is in the blob -- the atom_t value isn't useful for anything AFIACT.

Anyway, I'll add a test for NULL to the C++ callback.

@JanWielemaker
Copy link
Member

Can a release() callback do anything when PL_free_blob() returns NULL?

Not entirely clear. Possibly it wants to remove the handle from some table? Note that PL_free_blob() does not reclaim the handle, only the content.

@kamahen
Copy link
Member Author

kamahen commented Mar 4, 2024

By "remove the handle from some table", I suppose you're thinking of the table of name→atom_t in rocksdb? (actually, it's PlAtom→PlAtom, but same idea)

In that situation, release() will have already been called from PL_free_blob(), which is where the handle should be removed; the call to release() has no way of telling whether it's being called from PL_free_blob() or from the GC; and no expectation that it'll be called a second time. In addition, if the table has some kind of reference count (e.g., the blob in rocksdb has a count of the iterators that are using it), the call to PL_free_blob() would decrement the count; a subsequent call from GC would cause the reference count to go negative.

@JanWielemaker
Copy link
Member

Ok. Pushed ee3d08e to never call the release function if the data is already reclaimed. Note that PL_get_blob(), etc. still must be aware that the data may be NULL.

@kamahen
Copy link
Member Author

kamahen commented Mar 5, 2024

Why not also test for NULL before calling release() from PL_free_blob()? That would guarantee PL_get_blob() never returns NULL in the release() callback.

Do we also need to add a test for NULL to compare atoms(), write_stream_ref()? - currently they'll crash if PL_free_blob() has been called.

@JanWielemaker
Copy link
Member

Agree. Pushed 0bd41c3. Never considered one could call this twice 😄

@kamahen
Copy link
Member Author

kamahen commented Mar 6, 2024

Previously, I changed SWI-cpp2.h to contain this (cast() calls PL_blob_data()):

  static int release(atom_t a) noexcept
  { auto data = cast(PlAtom(a));
    if ( !data ) // PL_free_blob() has been used.
      return true;
    ...

I noticed that you removed the sentence from foreign.doc that says that PL_blob_data() can return NULL.
So, I'll leave the test for NULL but change the comment.

For other callbacks (write(), compare(), save()), the C++ code has an assert() that PL_blob_data() doesn't return NULL.

From looking at the code, it think that acquire(), write(), and compare() can never get PL_blob_data() returning NULL, so the assert() is harmless (and compiled out unless -DCMAKE_BUILD_TYPE=Debug).
save() isn't clear to me - I'll add a check and return false.

Does this seem correct?

@JanWielemaker
Copy link
Member

I noticed that you removed the sentence from foreign.doc that says that PL_blob_data() can return NULL.

I'm not aware of that. Surely the docs for PL_blob_data() have not changed according to git blame. And yes, it can return NULL as the atom handle continues to exist after PL_free_blob().

For other callbacks (write(), compare(), save()), the C++ code has an assert() that PL_blob_data() doesn't return NULL.

That is not correct. As the "atom" still exists from Prolog's point of view any of these operations may still happen. Of course, there is no data left to inspect, so you can only do something generic. For example:

1 ?- py_call(sys:path, Path, [py_object(true)]).
Path = <py_list>(0x7fad8b96d480).

For `acquire()` I think you are right.   Well, I'm not 100% sure about race conditions.    The `acquire()` is a direct callback from the call that creates the atom.   What however if the creating thread stalls just before the `acquire()` completes and some other thread picks up the handle and calls PL_free_blob() on it?   It is a bit theoretical as the only way to find this blob-being-created is through `blob/2`.    If we consider this a real issue it might be better to ensure `blob/2` does not return blobs who's creation is not complete.    I doubt we need to consider this a real issue.   Enumerating atoms and blobs is for debugging purposes only.   
2 ?- py_free($Path).
Path = <py_FREED>(0x0).

In this case we want user intervention because Python objects are written as <py_>(Ptr).

@kamahen
Copy link
Member Author

kamahen commented Mar 7, 2024

ee3d08e line 2717 removes the warning that PL_blob_data() can return NULL in release() and moves it to line 2975.
Perhaps the initial warning should be reinstated? (it doesn't hurt to say some things twice)

I'm wondering what the right behaviour is when PL_blob_data() returns NULL. assert() isn't good, because it's optimized out of non-debug builds.

  • acquire() has no way of signalling an error
  • compare() has no way of signalling an error
  • write() could just output <blob_name>(NULL), I suppose
  • save() can signal an error

(Why do I care about this? I've noticed that occasionally some build unit tests fail (and not just for the HTTP tests), but haven't followed up ... there might be some non-deterministic bugs in the code. Anything that suppresses error checks such as assert() could have some subtle problem ...)

=== (Repeating Jan's comment because the formatting was messed up) ===

That is not correct. As the "atom" still exists from Prolog's point of view any of these operations may still happen. Of course, there is no data left to inspect, so you can only do something generic. For example:

1 ?- py_call(sys:path, Path, [py_object(true)]).
Path = <py_list>(0x7fad8b96d480).

For acquire() I think you are right. Well, I'm not 100% sure about race conditions. The acquire() is a direct callback from the call that creates the atom. What however if the creating thread stalls just before the acquire() completes and some other thread picks up the handle and calls PL_free_blob() on it? It is a bit theoretical as the only way to find this blob-being-created is through blob/2. If we consider this a real issue it might be better to ensure blob/2 does not return blobs whose creation is not complete. I doubt we need to consider this a real issue. Enumerating atoms and blobs is for debugging purposes only.

2 ?- py_free($Path).
Path = <py_FREED>(0x0).

In this case we want user intervention because Python objects are written as <py_>(Ptr).

=== (End Jan's comment) ===

@JanWielemaker
Copy link
Member

Perhaps the initial warning should be reinstated? (it doesn't hurt to say some things twice)

True. Just, this text is already so complicated. Just be more explicit in PL_blob_data(), etc. may be better.

As is, release() can not have a NULL data. I think that also holds for acquire(), so an assert there is ok. All the others should just deal with it. compare can compare all things without data as first/last and for two without data on the handle. save() should probably also do something sensible. Although the "what" may be context dependent.

some build unit tests fail

At the moment I'm only aware of http_proxy and mqi. The first is most likely a port allocation issue and the latter is some sort of timing issue. Then there are a couple of tests that are inherently unsafe because they rely on timing that may go wrong on a heavily loaded system. I think most merely print a warning rather than actually failing, but I may have missed a few. I'm pretty sure that the issues are mostly with concurrency, some with the design of the tests, others with the core.

@kamahen
Copy link
Member Author

kamahen commented Mar 10, 2024

I've created an issue for "some build unit tests fail": SWI-Prolog/packages-cpp#73

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants