  ________
 /\ sa    \
/  \  ku   \
\  /    ra /
 \/_______/
An extension of R native serialization using the ‘refhook’ system for custom serialization and unserialization of non-system reference objects.
This package was requested at a meeting of the R Consortium Marshalling and Serialization Working Group held at useR!2024 in Salzburg, Austria. It is designed to further discussion around a common framework for marshalling in R.
It extracts the functionality embedded within the nanonext and mirai async frameworks for use in other contexts.
Some R objects by their nature cannot be serialized, such as those accessed via an external pointer.
Using the arrow package as an example:
library(arrow, warn.conflicts = FALSE)
x <- list(as_arrow_table(iris), as_arrow_table(mtcars))
unserialize(serialize(x, NULL))
#> [[1]]
#> Table
#> Error: Invalid <Table>, external pointer to null
In such cases, sakura::serial_config() can be used to create custom serialization configurations, specifying functions that hook into R’s native serialization mechanism for reference objects (‘refhooks’).
cfg <- sakura::serial_config(
  class = "ArrowTabular",
  sfunc = arrow::write_to_raw,
  ufunc = function(x) arrow::read_ipc_stream(x, as_data_frame = FALSE)
)
This configuration can then be supplied as the ‘hook’ argument for sakura::serialize() and sakura::unserialize().
sakura::unserialize(sakura::serialize(x, cfg), cfg)
#> [[1]]
#> Table
#> 150 rows x 5 columns
#> $Sepal.Length <double>
#> $Sepal.Width <double>
#> $Petal.Length <double>
#> $Petal.Width <double>
#> $Species <dictionary<values=string, indices=int8>>
#>
#> See $metadata for additional Schema metadata
#>
#> [[2]]
#> Table
#> 32 rows x 11 columns
#> $mpg <double>
#> $cyl <double>
#> $disp <double>
#> $hp <double>
#> $drat <double>
#> $wt <double>
#> $qsec <double>
#> $vs <double>
#> $am <double>
#> $gear <double>
#> $carb <double>
#>
#> See $metadata for additional Schema metadata
This time, the arrow tables are handled seamlessly.
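For readers unfamiliar with refhooks, the following is a minimal sketch of the underlying idea using only base R's `refhook` argument to `serialize()` and `unserialize()`. It is not how sakura is implemented. The `counter` class and the helpers `make_counter()`, `out_hook()` and `in_hook()` are hypothetical names introduced purely for illustration. In base R, the outbound hook is called for each reference object (such as an environment or external pointer) and returns either a character vector to store in its place, or NULL to fall back to default handling; the inbound hook receives that character vector and rebuilds the object.

```r
# Hypothetical reference object: an environment tagged with class "counter"
make_counter <- function(value) {
  e <- new.env()
  e$value <- value
  class(e) <- "counter"
  e
}

# Outbound hook: called for each reference object encountered.
# Return a character vector to store in place of the object,
# or NULL to let R serialize the object in the default way.
out_hook <- function(x) {
  if (inherits(x, "counter")) as.character(x$value) else NULL
}

# Inbound hook: receives the stored character vector and rebuilds the object
in_hook <- function(chr) make_counter(as.numeric(chr))

x <- list(make_counter(42), 1:3)
bytes <- serialize(x, NULL, refhook = out_hook)
y <- unserialize(bytes, refhook = in_hook)

# The counter is reconstructed by the hook; the integer vector is untouched
stopifnot(inherits(y[[1]], "counter"), y[[1]]$value == 42)
stopifnot(identical(y[[2]], 1:3))
```

Base R hooks exchange character vectors, whereas the `sfunc` shown above (`arrow::write_to_raw`) produces a raw vector, so this sketch illustrates the hook concept only and not the sakura interface.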
Some serialization functions are vectorized; in this case, the configuration should be created specifying vec = TRUE. Using torch as an example:
library(torch)
x <- list(torch_rand(5L), runif(5L))
unserialize(serialize(x, NULL))
#> [[1]]
#> torch_tensor
#> Error in (function (self) : external pointer is not valid
Base R serialization above fails, but sakura serialization succeeds:
cfg <- sakura::serial_config(
  class = "torch_tensor",
  sfunc = torch::torch_serialize,
  ufunc = torch::torch_load,
  vec = TRUE
)
sakura::unserialize(sakura::serialize(x, cfg), cfg)
#> [[1]]
#> torch_tensor
#> 0.7018
#> 0.4835
#> 0.8654
#> 0.2325
#> 0.1030
#> [ CPUFloatType{5} ]
#>
#> [[2]]
#> [1] 0.453503897 0.862080483 0.565105654 0.009682012 0.224152206
We would like to thank in particular:
- R Core for providing the interface to the R serialization mechanism.
- Luke Tierney and Mike Cheng for their meticulous efforts in documenting the serialization interface.
- Daniel Falbel for discussion around an efficient solution to serialization and transmission of torch tensors.
The current development version is available from R-universe:
install.packages("sakura", repos = "https://shikokuchuo.r-universe.dev")
Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.