Commit 75e9587 (parent a410fee): adding documentation

andresy committed Oct 12, 2015

Showing 1 changed file with 212 additions and 51 deletions: README.md (263 changes).
Data structures which do not rely on the Lua memory allocator, and are not
limited by the Lua garbage collector.

Under the hood, this is a LuaJIT FFI interface to [klib](http://attractivechaos.github.io/klib).
Only C types can be stored: supported types are currently numbers, strings,
the data structures themselves (see [nesting](#tds.nesting); e.g. a Hash can
contain another Hash or a Vec), and torch tensors and storages. All data
structures can store heterogeneous objects, and support
[torch serialization](#tds.serialization).

It is easy to extend the support to [other C types](#tds.extend).

Note that `tds` currently relies on FFI, and works with both
[LuaJIT](http://www.luajit.org) and [Lua 5.2](http://www.lua.org), provided
the latter is installed with
[luaffi](https://github.com/facebook/luaffifb). The dependency on FFI will
be removed in the future.

## d = tds.Hash() ##

Creates a hash table which implements the Lua operators `[key]`, `#`
and `pairs()`, in a way very similar to Lua tables.

A hash can contain any element (either as key or value) supported by `tds`.

### d[key] = value ###

Store the given (`key`, `value`) pair in the hash table. If `value` is
`nil`, remove the `key` if it exists.

### d[key] ###

Returns the `value` at the given `key`, or `nil` if the `key` does not exist in the hash table.

### #d ###

Returns the number of key-value pairs in the hash table. Note that this differs from Lua tables, for which `#`
returns the number of elements stored at consecutive integer indices starting from 1.

### pairs(d) ###

Returns an iterator over the hash table `d`. The iterator returns a
key-value pair at each step, or `nil` when the end is reached. Typical
usage:
```lua
for k, v in pairs(d) do
  -- <do something>
end
```
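A minimal sketch putting the hash operations above together, assuming `tds` is installed. Key names and values are arbitrary; the iteration order of `pairs()` is not specified, so the loop may print the pairs in any order.

```lua
local tds = require 'tds'

local h = tds.Hash()

-- set: keys and values may be strings or numbers
h.foo = "bar"
h["scm-1"] = "git"
h[1] = "hey"
h.count = 1234

-- get
print(h.foo)      -- bar
print(h["scm-1"]) -- git
print(h.count)    -- 1234
print(h.missing)  -- nil (the key does not exist)

-- #h counts all key-value pairs, not only integer keys
print(#h)         -- 4

-- iterate over all pairs (order not specified)
for k, v in pairs(h) do
  print(k, v)
end

-- assigning nil removes the key
h.foo = nil
print(h.foo)      -- nil
print(#h)         -- 3
```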

## d = tds.Vec(...) ##

Creates a vector of elements indexed by numbers starting from 1. If arguments are passed at construction, the vector
will be filled with these arguments.

A vector can contain any element (as value) supported by `tds`, as well as the `nil` value.

### d[index] = value ###

Store the given `value` at the given `index` (which must be a positive
number). If the index is larger than the current size of the vector, the
vector will be automatically resized. `value` may be `nil`.

### d[index] ###

Returns the `value` at the given `index` or `nil` if it does not exist.

### #d ###

Returns the current size of the vector (note that it includes `nil` values, which are not treated as holes!).
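A short sketch of the indexing rules described above, assuming `tds` is installed. It relies on the documented behaviour that `nil` is a valid element and that `#d` counts `nil` slots. The exact size after an out-of-range write is an assumption (growth up to the written index), so the sketch only reads back individual slots.

```lua
local tds = require 'tds'

-- the constructor fills the vector with its arguments, nil included
local v = tds.Vec("a", nil, "c")
print(#v)    -- 3 (the nil slot is counted, not treated as a hole)
print(v[1])  -- a
print(v[2])  -- nil

-- assigning past the end resizes the vector automatically
v[5] = "e"
print(v[5])  -- e
print(v[4])  -- nil (slot created by the automatic resize)
```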

### d:resize(size) ###

Resize the current vector to the given size. If the size is larger than the current size, the new slots are filled with `nil` values.

### d:insert([index], value) ###

Insert `value` in the vector at position `index`, shifting up all elements
at or above `index`. If `index` is not provided, append the element at the
end of the vector.

### d:remove([index]) ###

Remove the element at position `index`, shifting down all elements above
`index`. If `index` is not provided, remove the last element of the vector.
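The resizing operations above can be sketched as follows, assuming `tds` is installed. The comments track the vector contents step by step. Shrinking with `d:resize()` is assumed to drop the trailing elements. The return values of `insert` and `remove` are not documented here, so the sketch does not use them.

```lua
local tds = require 'tds'

local v = tds.Vec(10, 20, 30)

-- insert at position 2: 20 and 30 shift up
v:insert(2, 15)   -- 10, 15, 20, 30
-- insert without an index appends at the end
v:insert(40)      -- 10, 15, 20, 30, 40

-- remove position 1: everything above shifts down
v:remove(1)       -- 15, 20, 30, 40
-- remove without an index drops the last element
v:remove()        -- 15, 20, 30

-- growing pads with nil; shrinking keeps the first elements
v:resize(5)       -- 15, 20, 30, nil, nil
v:resize(2)       -- 15, 20
print(#v, v[1], v[2])
```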

### ipairs(d) ###

Returns an iterator over the vector `d`. The iterator returns an index-value
pair at each step, or `nil` when the end is reached. Typical usage:
```lua
for i, v in ipairs(d) do
  -- <do something>
end
```

### pairs(d) ###

Alias for `ipairs(d)`.

<a name="tds.serialization"/>
## Serialization ##

All `tds` data structures support torch serialization. Example:

```lua
tds = require 'tds'
require 'torch'

-- create a vector containing heterogeneous data
d = tds.Vec(4, 5, torch.rand(3), nil, "hello world")

-- serialize in a buffer
f = torch.MemoryFile("rw")
f:writeObject(d)

-- unserialize
f:seek(1)
print(f:readObject())
```

The example will output:
```
tds.Vec[5]{
1 : 4
2 : 5
3 : 0.1665
0.8750
0.7525
[torch.DoubleTensor of size 3]
4 : nil
5 : hello world
}
```
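The same mechanism also works with torch's high-level file helpers `torch.save` and `torch.load`, which write to and read from disk. A sketch, assuming torch and `tds` are installed. The file name `dump.bin` and the keys are arbitrary.

```lua
local tds = require 'tds'
require 'torch'

local h = tds.Hash()
h.label = "layer1"
h.weights = torch.randn(3, 2)

-- write the hash (and the tensor it holds) to disk
torch.save("dump.bin", h)

-- read it back into a new tds.Hash
local clone = torch.load("dump.bin")
print(#clone)        -- 2
print(clone.label)   -- layer1

-- clean up the temporary file
os.remove("dump.bin")
```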

<a name="tds.nesting"/>
## Nesting ##

Nesting is supported in `tds`. However, __reference loops are prohibited__, and
will lead to leaks if used.

Example:

```lua

tds = require 'tds'
require 'torch'

-- create a vector containing heterogeneous data
d = tds.Vec(4, 5, torch.rand(3), tds.Hash(), "hello world")

-- fill up the hash table:
d[4].foo = "bar"
d[4][6] = torch.rand(3)
d[4].stuff = tds.Vec("how", "are", "you", "doing")

print(d)
```

This example will output:
```
tds.Vec[5]{
1 : 4
2 : 5
3 : 0.1958
0.5663
0.2777
[torch.DoubleTensor of size 3]
4 : tds.Hash[3]{
foo : bar
6 : 0.0105
0.7496
0.5241
[torch.DoubleTensor of size 3]
stuff : tds.Vec[4]{
1 : how
2 : are
3 : you
4 : doing
}
}
5 : hello world
}
```

<a name="tds.extend"/>
## Extending to other C types ##

`tds` provides a way to extend to your own C types using the submodule
`tds.elem`:

```lua
local elem = require 'tds.elem'
```

### elem.type(obj) ###

`tds` type checking is achieved using this function, which you can override
for your own purposes. If torch is detected, `tds` sets `elem.type` to
`torch.typename`, so in general (if you are using torch!) you do not need to
worry about this part.

### elem.addctype(ttype, free_p, setfunc, getfunc) ###

Add a new C type into `tds`:
- `ttype` must be the typename understood by the current `elem.type()` function.
- `free_p` is a C FFI pointer to a destructor of the C object.
- `setfunc(luaobj)` takes a __lua object__ and returns an FFI C pointer to this object, as well as an FFI function `free_p` to free this object.
- `getfunc(cpointer)` takes a __C FFI pointer__ and returns the corresponding lua object.

One must be careful to handle reference counting and garbage collection properly in `setfunc()` and `getfunc()`:
- `setfunc()` converts a lua object into a C pointer which will be
stored in the data structure: the reference count on this object must
be increased. When the element is removed from the data structure, `tds`
will call the given `free_p()` function.
- `getfunc()` converts a C pointer and pushes it into the lua memory space:
one must again increase the reference count on this object, and make
sure lua will garbage collect it properly (e.g. with `ffi.gc`).

Here is a typical example showing how support for `tds.Hash` elements is implemented:

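```lua
-- `ffi` is assumed to be the FFI module and `C` the loaded tds C library
-- exposing the tds_hash_* functions
elem.addctype(
  'tds.Hash',
  C.tds_hash_free,
  -- setfunc: lua object -> C pointer; retain it, since tds now holds a reference
  function(lelem)
    C.tds_hash_retain(lelem)
    return lelem, C.tds_hash_free
  end,
  -- getfunc: C pointer -> lua object; retain it and let the GC release it
  function(lelem_p)
    local lelem = ffi.cast('tds_hash&', lelem_p)
    C.tds_hash_retain(lelem)
    ffi.gc(lelem, C.tds_hash_free)
    return lelem
  end
)
```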
