Asynchronous conference bridge operation #3928
Since issue 2 is already handled, I assume it's no longer a problem (i.e. it no longer causes backward compatibility issues)? That only leaves issue 1. I understand it will be quite a lot of work to change all existing pjmedia_port(s) to have their own pool, but since it's necessary for this feature, I vote to continue. And if 1 & 2 are already resolved, the change in pjsua_app.c is no longer necessary, correct? (i.e. the app will not need to change anything).
The backward compatibility issues cannot be completely avoided, e.g:
Yes for
Regarding issue 3, I don't think app is allowed to call The important thing here seems to be that
Here is a sample scenario. App using PJSUA creates its own pool factory (instead of using PJSUA's) for some reason, and it uses the pool/factory to create a PJMEDIA port. It has to destroy the pool factory before shutting down PJSUA. Re: lib (PJSUA?) restart,
Yes,
I see.
Ah, right. It's called upon
…port check in video conference (to avoid vconf put/get frame from just-added-yet-unsynced ports).
…nto async-aud-conf
port->info.name.ptr));
pj_assert(!"Port using group lock should implement on_destroy()!");
why are we removing the assertion and error?
The purpose of checking port.on_destroy is a best-effort attempt to "detect" whether the port has its own pool, in order to avoid premature destroy, so I guess it should not cause a hard failure?
Some conditions may leave port.on_destroy unimplemented even though the port has its own pool, e.g.:
- The port has its own destructor API, e.g. the stream port has pjmedia_stream_destroy(); it has its own pool but does not implement port.on_destroy.
- An old custom/app port implementation that uses a group lock but does not publish it to port.grp_lock (the field was added relatively recently) may implement destroy via the group lock instead of port.on_destroy.
@@ -93,6 +93,7 @@ typedef struct vconf_port
unsigned idx; /**< Port index. */
pj_str_t name; /**< Port name. */
pjmedia_port *port; /**< Video port. */
pj_bool_t is_new; /**< Port newly added? */
I'm still not quite clear with the "new port" check. Since it also applies to video, is it considered a bug fix for the current code?
Yes, it is a kind of bug fix (for some reason, for video it causes less/no harm).
Currently, adding a port to the conf needs to be done synchronously to avoid premature destroy (if async, the conf may end up adding/accessing a destroyed port). However, we cannot increment/change conf->port_cnt there, as it is accessed by conf get_frame() without mutex protection, e.g.:
pjproject/pjmedia/src/pjmedia/conference.c, line 2387 in ed32a91:
for (i=0, ci=0; i<conf->max_ports && ci < conf->port_cnt; ++i) {
So, the "new port" check is introduced to be able to increment conf->port_cnt safely.
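As a rough illustration of the "new port" check described above, here is a minimal single-threaded sketch. All names (sketch_conf, sketch_port, sketch_add_port, sketch_sync_ops, sketch_get_frame) are hypothetical stand-ins and not the actual conference.c code, and the real threading is left out. The assumption is that the port is registered and port_cnt is incremented synchronously, while the frame loop skips the port until the bridge has synced it.

```c
#include <assert.h>

#define SKETCH_MAX_PORTS 8

/* Hypothetical, simplified stand-in for a conference port slot. */
typedef struct sketch_port {
    int      in_use;   /* Slot occupied?                            */
    int      is_new;   /* Added, but not yet synced by the bridge?  */
    unsigned frames;   /* Number of frames processed for this port. */
} sketch_port;

/* Hypothetical, simplified stand-in for the conference bridge. */
typedef struct sketch_conf {
    unsigned    max_ports;
    unsigned    port_cnt;
    sketch_port ports[SKETCH_MAX_PORTS];
} sketch_conf;

/* App side: register the port synchronously and mark it as new. */
static int sketch_add_port(sketch_conf *conf)
{
    unsigned i;
    for (i = 0; i < conf->max_ports; ++i) {
        if (!conf->ports[i].in_use) {
            conf->ports[i].in_use = 1;
            conf->ports[i].is_new = 1;
            conf->ports[i].frames = 0;
            ++conf->port_cnt;
            return (int)i;
        }
    }
    return -1; /* No free slot. */
}

/* Bridge side: process queued "add" operations, clearing the flag. */
static void sketch_sync_ops(sketch_conf *conf)
{
    unsigned i;
    for (i = 0; i < conf->max_ports; ++i) {
        if (conf->ports[i].in_use)
            conf->ports[i].is_new = 0;
    }
}

/* Bridge side: the frame loop counts new ports but does not touch them. */
static unsigned sketch_get_frame(sketch_conf *conf)
{
    unsigned i, ci, processed = 0;
    for (i = 0, ci = 0; i < conf->max_ports && ci < conf->port_cnt; ++i) {
        sketch_port *p = &conf->ports[i];
        if (!p->in_use)
            continue;
        ++ci;
        if (p->is_new)
            continue; /* Just added, not yet synced: skip put/get frame. */
        ++p->frames;
        ++processed;
    }
    return processed;
}
```

In this sketch, a port added between two clock ticks is counted by the loop bound but produces no frame until sketch_sync_ops() has run.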
There's an issue when testing this.
I did the pre-release test and this seems to fail the double-hold scenario. Only video seems to be affected (video will not fully recover).
See PR #4164
Thanks for fixing it! Also, I noticed these additional two lines in pjsua app during shutdown:
I suppose this is unavoidable due to issue 1 in the PR description above, correct?
Unfortunately no. It is a new bug, see #4166.
Hi, everybody! I would like to leave feedback here. However, we are encountering several unexpected behavior changes and issues. Some of them are outlined below:
First, thanks for the feedback! Re: performance improvement note, I believe so, especially for such a condition. Will update the desc.
Thanks for the bug report & the fix.
You're right, will try to fix for both video & audio conf.
Ideally app must check & update its ports to have their own pool (btw, the PJMEDIA tonegen and almost all PJMEDIA ports have been updated). It is not supposed to be a recommendation, just a sample of a possible workaround.
Re: deadlock risk, please check #4243.
Real world applications may use pjsip as one of many libraries and may not be written only in C/C++. The application manages resources outside the pjsip logic and may not be able to use the pool API to allocate memory, but in some cases the closing of application resources must be synchronized with the destruction of conference bridge ports. For example, an application may implement a special media port proxy to play audio data stored in a BLOB field of a database. With an asynchronous switch, the application cannot close the database connection when performing a disconnect, nor when calling remove port, because the conference bridge may try to receive the next frame of data after that. The application must know the exact moment when to release resources, close the database connection, etc. This moment is the "grp_lock handler call".
You're right, so far I missed the idea and your suggestion that the group lock destroy handler can be used as a proper solution to avoid premature pool destroy! Updated the PR desc. Btw, app can access the port's group lock directly and use the group lock APIs to add/remove a handler. Anyway, added helper APIs in #4244. I cannot add you as a reviewer somehow, but feel free to review. Thanks again!
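To make the "release resources in the group lock handler" idea above concrete, here is a minimal sketch of a reference-counted lock with a destroy handler. It does not use the PJLIB API: sketch_grp_lock, sketch_add_handler, sketch_add_ref, sketch_dec_ref, sketch_db and sketch_close_db are all hypothetical names, and locking itself is omitted. It only models the assumption that the handler runs when the last reference is dropped, which is the moment the app may safely close its database connection.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical destroy callback type. */
typedef void (*sketch_destroy_cb)(void *member);

/* Hypothetical, simplified stand-in for a group lock: only the
 * reference counter and a single destroy handler are modelled. */
typedef struct sketch_grp_lock {
    int               ref_cnt;
    sketch_destroy_cb on_destroy;
    void             *member;
} sketch_grp_lock;

/* Register the handler to run when the last reference is released. */
static void sketch_add_handler(sketch_grp_lock *gl, void *member,
                               sketch_destroy_cb cb)
{
    gl->member = member;
    gl->on_destroy = cb;
}

static void sketch_add_ref(sketch_grp_lock *gl)
{
    ++gl->ref_cnt;
}

/* Drop one reference; invoke the handler when the counter hits zero. */
static void sketch_dec_ref(sketch_grp_lock *gl)
{
    if (--gl->ref_cnt == 0 && gl->on_destroy)
        gl->on_destroy(gl->member);
}

/* Hypothetical application resource, e.g. a database connection. */
typedef struct sketch_db {
    int is_open;
} sketch_db;

/* Destroy handler: the only place where the app closes the resource. */
static void sketch_close_db(void *member)
{
    ((sketch_db *)member)->is_open = 0;
}
```

In this model the app and the bridge each hold one reference. The app dropping its reference right after requesting port removal does not close the database; that happens only once the bridge drops the last reference.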
In #3183, the video conference bridge adopted asynchronous port operations to handle non-uniform lock ordering issues. This PR integrates the same mechanism into the audio conference bridge.
Besides avoiding deadlock, this change may in general improve performance a little, as audio data processing in the conference bridge is now performed without holding a mutex. It gives a serious performance improvement on a heavily loaded system where the high-priority conference bridge thread does not leave time for another thread to hold the conference mutex.
There are some pending tasks (e.g: integrate this into audio switchboard).
Potential backward compatibility issues (major):
Pool release immediately after port removal, consider this code:
Note that this issue affects ports that do not use their own pool (i.e. ports that allocate their instance and all internal state from the supplied memory pool), so releasing the pool immediately after removing the port from the conference bridge may cause a crash, as port removal is now asynchronous and the port may still be accessed by the conference bridge.
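The timing problem can be modelled with a small sketch. This is not the PJMEDIA API: sketch_bridge, sketch_remove_port, sketch_clock_tick and sketch_safe_to_release_pool are hypothetical names, and the assumption is simply that a removal request is queued and only takes effect on the bridge's next clock tick.

```c
#include <assert.h>

/* Hypothetical, simplified model of a bridge holding one port. */
typedef struct sketch_bridge {
    int port_active;    /* 1 while the bridge may still access the port. */
    int pending_remove; /* 1 while a removal request is queued.          */
} sketch_bridge;

/* App side: asynchronous removal only queues the request. */
static void sketch_remove_port(sketch_bridge *b)
{
    b->pending_remove = 1;
}

/* Bridge side: queued operations are applied on the next clock tick. */
static void sketch_clock_tick(sketch_bridge *b)
{
    if (b->pending_remove) {
        b->port_active = 0;
        b->pending_remove = 0;
    }
}

/* The pool backing the port may only go away once the bridge is done. */
static int sketch_safe_to_release_pool(const sketch_bridge *b)
{
    return !b->port_active;
}
```

Right after sketch_remove_port() returns, sketch_safe_to_release_pool() still reports 0, which is the window in which releasing a shared pool would leave the bridge with a dangling port.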
Now, all PJMEDIA ports (except Bidirectional & Stereo) have been updated to use their own pools, so they should not be affected. If your application uses any custom PJMEDIA ports, the alternatives are:
- Add a group lock destroy handler using pjmedia_port_add_destroy_handler(port, ...) or pj_grp_lock_add_handler(port->grp_lock, ...) after adding the port to conference.
- Add a delay using pj_thread_sleep(). Calculating the delay needed is quite tricky. For example, a 100 ms delay should be sufficient for most/normal cases, but may not be for a heavily loaded system.
PJMEDIA Stream pattern.
PJMEDIA Stream exports a PJMEDIA port interface which can be queried using pjmedia_stream_get_port(). Unfortunately the port does not implement pjmedia_port_destroy(); it is destroyed using pjmedia_stream_destroy() instead, which will release the memory pool. Similar to the issue above, if pjmedia_stream_destroy() is invoked immediately after pjmedia_conf_remove_port(), it may cause a crash.
Now PJMEDIA Stream has a group lock, so pjmedia_stream_destroy() will just decrement the reference counter; the real destroy will be done when the reference counter reaches zero.
Premature caching pool destroy.
When all PJMEDIA ports have their own pools, there is still a possibility that the application uses its own pool factory for creating the ports and destroys the pool factory (e.g. by invoking pj_caching_pool_destroy()) prematurely, before the actual port removal is done.
For example, in a library shutdown scenario, the conference bridge clock has been stopped so actual port removal can only be done later by pjmedia_conf_destroy()/pjsua_destroy(), but in between, the application may destroy its pool factory, which may cause a crash. A possible solution is to delay the pool factory destroy to PJLIB deinitialization via pj_atexit().
… get/put_frame() callbacks until the removal completes.