Universal async framework #36
Just one remark: I don't see any way we could reduce everything to a single lambda, because some of the lambdas need to run in different execution contexts; there are three separate execution contexts here.
UPDATE: I reduced the lambdas; if anyone has any ideas on how to simplify this further, feel free to comment.
I am actually hesitating between these two:

```cpp
GDAL_ASYNCABLE_MAIN(CPLErr) =
  [gdal_band, async_lock, x, y, w, h, data, buffer_w, buffer_h, type, pixel_space, line_space]() {
    uv_mutex_lock(async_lock);
    CPLErr err = gdal_band->RasterIO(GF_Write, x, y, w, h, data, buffer_w, buffer_h, type, pixel_space, line_space);
    uv_mutex_unlock(async_lock);
    if (err != CE_None) throw CPLGetLastErrorMsg();
    return err;
  };
GDAL_ASYNCABLE_RVAL(CPLErr) = [](CPLErr err, GDAL_ASYNCABLE_OBJS o) { return o[0]; };
```

and this one:

```cpp
GDAL_ASYNCABLE_DO(gdal_band, async_lock, x, y, w, h, data, buffer_w, buffer_h, type, pixel_space, line_space) {
  uv_mutex_lock(async_lock);
  CPLErr err = gdal_band->RasterIO(GF_Read, x, y, w, h, data, buffer_w, buffer_h, type, pixel_space, line_space);
  uv_mutex_unlock(async_lock);
  if (err != CE_None) throw CPLGetLastErrorMsg();
  ret(persistent[0]);
};
```

I think the second one is better?

UPDATE: this doesn't work in all cases
No, I am unable to make all the cases work with only one lambda; if anyone has any ideas, I am listening.
@mmomtchev I like it and think the code ultimately comes out to be pretty clean without reducing to a single lambda.
Yes, I think I will leave it like this, unless someone comes up with a brilliant idea.
@mmomtchev My primary use case is https://www.npmjs.com/package/verrazzano, so I'm excited to start benchmarking and see how much of a difference this makes.
I tried asyncing this:

```diff
diff --git a/src/collections/layer_features.cpp b/src/collections/layer_features.cpp
index 852f61d3..83b805a1 100644
--- a/src/collections/layer_features.cpp
+++ b/src/collections/layer_features.cpp
@@ -18,6 +18,7 @@ void LayerFeatures::Initialize(Local<Object> target) {
   Nan::SetPrototypeMethod(lcons, "count", count);
   Nan::SetPrototypeMethod(lcons, "add", add);
   Nan::SetPrototypeMethod(lcons, "get", get);
+  Nan::SetPrototypeMethod(lcons, "getAsync", getAsync);
   Nan::SetPrototypeMethod(lcons, "set", set);
   Nan::SetPrototypeMethod(lcons, "first", first);
   Nan::SetPrototypeMethod(lcons, "next", next);
@@ -91,7 +92,7 @@ NAN_METHOD(LayerFeatures::toString) {
  * @param {Integer} id The feature ID of the feature to read.
  * @return {gdal.Feature}
  */
-NAN_METHOD(LayerFeatures::get) {
+GDAL_ASYNCABLE_DEFINE(LayerFeatures::get) {
   Nan::HandleScope scope;
 
   Local<Object> parent =
@@ -104,9 +105,17 @@ NAN_METHOD(LayerFeatures::get) {
   int feature_id;
   NODE_ARG_INT(0, "feature id", feature_id);
 
-  OGRFeature *feature = layer->get()->GetFeature(feature_id);
-
-  info.GetReturnValue().Set(Feature::New(feature));
+  OGRLayer *gdal_layer = layer->get();
+  uv_mutex_t *async_lock = layer->async_lock;
+  GDAL_ASYNCABLE_PERSIST(parent);
+  GDAL_ASYNCABLE_MAIN(OGRFeature*) = [async_lock, gdal_layer, feature_id]() {
+    uv_mutex_lock(async_lock);
+    OGRFeature *feature = gdal_layer->GetFeature(feature_id);
+    uv_mutex_unlock(async_lock);
+    return feature;
+  };
+  GDAL_ASYNCABLE_RVAL(OGRFeature*) = [](OGRFeature *feature, GDAL_ASYNCABLE_OBJS) { return Feature::New(feature); };
+  GDAL_ASYNCABLE_EXECUTE(1, OGRFeature*);
 }
 
 /**
```
Then I tried these two simple gists (they are really simple, they are 90% instrumentation code): a sync case and an async case.
So, no direct performance gain; this is impossible as long as the operations on the same dataset run sequentially. Hopefully a future version of GDAL will allow this: they have been talking about it and they even have an RFC about removing the big dataset lock.
See the per-get duration: it really explodes in the async case, because the GDAL stubs are running in parallel, but they are waiting in line for the async_lock.
The dataset is a 25MB GeoJSON with all the European administrative borders (from my weather site https://www.meteo.guru).
@mmomtchev Yeah, having a second thread is great for the case of having this run on a webserver (we do). If I'm understanding you correctly though, multiple datasets still share a single secondary thread, so parsing multiple files at the same time will still block with each other, or does each dataset receive its own thread in GDAL? I think the library I linked will see a major speedup if we're able to thread pool within a dataset (probably the RFC you're referring to?) if the coordinate transformation (
I added a modified benchmark to the same gist that opens 4 datasets on the same file to be able to read with 4 threads. One must pay 4 times the
There is also one severe problem with this approach that is general in Node and cannot be easily solved: the maximum number of threads is limited. I will start pushing the first async vector functions over the weekend.
@mmomtchev I think we should definitely document this and recommend raising it.
@contra multiple datasets should be completely independent, aside from the thread limit discussed above.
Another solution is to always
I think it is stalled (it is quite an undertaking), but I think that they are still considering options.
For everyone who is interested, a big chunk of the async vector API has reached a usable state and is available at https://github.com/mmomtchev/node-gdal-async
@contra, this will be a huge PR (>5000 lines) with lots of new code for the synchronous API too.
@mmomtchev Yeah, I think we can push/prebuild it as a beta release, then move it to a major bump once it's been tested in the ecosystem for a week. I'll use it in our production ETL system and put it through its paces.
@contra If this is going to be a 3.0, then I probably should also rewrite the Geometry classes and convert them; I have been reassured about this. N-API should also come before the switch to GitHub releases, as it will impact the release process: after the switch there will be a single binary per platform that will be compatible with all Node versions.
@mmomtchev I'm fine doing multiple major releases instead of putting it all into a 3.0; up to you how you think it should be ordered and released though.
I ran the
I have started working on a universal async call framework so that all the bindings could easily be converted to async with minimal code modifications.
Here is the current version:
The NAN define on the function method is changed as follows: from `NAN_METHOD(Driver::open)` it becomes `GDAL_ASYNCABLE_DEFINE(Driver::open)`. This will automatically create two methods, `Driver::open`, the sync version, and `Driver::openAsync`, the async version, plus a third hidden method, `Driver::open_do`, which will contain all the code and will be called with `async=true|false`.
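As a rough illustration of what the define produces (a sketch only, assuming the three methods are declared on the class; the actual macro may look different):

```cpp
#include <nan.h>

// Sketch: GDAL_ASYNCABLE_DEFINE(Driver::open) generates the two public entry
// points plus the hidden worker method described above.

// the synchronous entry point, exposed to JS as driver.open()
NAN_METHOD(Driver::open) { Driver::open_do(info, false); }

// the asynchronous entry point, exposed to JS as driver.openAsync()
NAN_METHOD(Driver::openAsync) { Driver::open_do(info, true); }

// the hidden method that contains all the code and is called with async=true|false
void Driver::open_do(const Nan::FunctionCallbackInfo<v8::Value> &info, bool async) {
  // ... argument parsing, the GDAL_ASYNCABLE_* lambdas and GDAL_ASYNCABLE_EXECUTE go here ...
}
```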
Then only the end part of the function has to be modified.
Calling GDAL is encapsulated in an asyncable lambda (no automatic variables).
The return value generation is encapsulated in a second lambda.
And finally everything is automagically executed, either synchronously or asynchronously (the numeric argument of `GDAL_ASYNCABLE_EXECUTE` is the position of the callback argument).
If the function needs to protect some objects from the GC, a persist interface is provided.
The RVAL lambda can access the persisted objects; the sync/async transformations are automatic.
Here is the full example for the bottom of `Driver::create`, before (only sync) and after (sync and async).
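A rough sketch of both variants, following the LayerFeatures::get pattern above; the variable names (`raw`, `filename`, `x_size`, `y_size`, `n_bands`, `type`, `options`), `Dataset::New` and the callback position passed to `GDAL_ASYNCABLE_EXECUTE` are assumptions, not the actual code:

```cpp
// Before (only sync): a plain blocking call to GDALDriver::Create
GDALDataset *ds = raw->Create(filename.c_str(), x_size, y_size, n_bands, type, options);
if (ds == nullptr) {
  Nan::ThrowError(CPLGetLastErrorMsg());
  return;
}
info.GetReturnValue().Set(Dataset::New(ds));
```

```cpp
// After (sync and async): the GDAL call moves into the asyncable lambda,
// the JS object construction moves into the RVAL lambda
GDAL_ASYNCABLE_MAIN(GDALDataset *) = [raw, filename, x_size, y_size, n_bands, type, options]() {
  GDALDataset *ds = raw->Create(filename.c_str(), x_size, y_size, n_bands, type, options);
  if (ds == nullptr) throw CPLGetLastErrorMsg();
  return ds;
};
GDAL_ASYNCABLE_RVAL(GDALDataset *) = [](GDALDataset *ds, GDAL_ASYNCABLE_OBJS) { return Dataset::New(ds); };
// 6 would be the position of the callback in createAsync(filename, x_size, y_size, n_bands, type, options, callback)
GDAL_ASYNCABLE_EXECUTE(6, GDALDataset *);
```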
Here is the full example for the bottom of `RasterBandPixels::read`, before (only sync) and after (sync and async).
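The "after" version is essentially the `GDAL_ASYNCABLE_MAIN`/`GDAL_ASYNCABLE_RVAL` pair with the `GF_Read` call quoted earlier in this thread, plus `GDAL_ASYNCABLE_PERSIST` for the buffer object and a final `GDAL_ASYNCABLE_EXECUTE`. For contrast, a sketch of the "before" (only sync) ending; `obj`, the TypedArray holding `data`, is an assumed name:

```cpp
// Before (only sync): a plain blocking RasterIO call, then return the buffer to JS
CPLErr err = gdal_band->RasterIO(GF_Read, x, y, w, h, data, buffer_w, buffer_h, type, pixel_space, line_space);
if (err != CE_None) {
  Nan::ThrowError(CPLGetLastErrorMsg());
  return;
}
info.GetReturnValue().Set(obj);
```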
Any comments, suggestions or volunteers?