docs(arrowjs): Update Arrow docs and release notes (#2778)
ibgreen authored Nov 9, 2023
1 parent 4184949 commit 21ed934
Showing 31 changed files with 342 additions and 334 deletions.
2 changes: 1 addition & 1 deletion docs/README.mdx
@@ -6,7 +6,7 @@
<br />
<br />

This documentation describes loaders.gl **v4.0**. See our [**release notes**](./whats-new) to learn what is new.
This documentation describes loaders.gl **v4.0**. See our [**release notes**](/docs/whats-new) to learn what is new.

Docs for older versions are available on github:
**[v3.3](https://github.com/visgl/loaders.gl/blob/3.3-release/docs/README.md)**,
87 changes: 87 additions & 0 deletions docs/arrowjs/api-reference/builder.md
@@ -0,0 +1,87 @@
# Builders


The `makeBuilder()` function creates a `Builder` instance that is set up to build
a columnar vector of the supplied `DataType`.

A `Builder` is responsible for writing arbitrary JavaScript values
to ArrayBuffers and/or child Builders according to the Arrow specification
for each DataType, creating or resizing the underlying ArrayBuffers as necessary.

The `Builder` for each Arrow `DataType` handles converting and appending
values for a given `DataType`.

Once created, `Builder` instances support both appending values to the end
of the `Builder`, and random-access writes to specific indices
(`builder.append(value)` is a convenience method for
`builder.set(builder.length, value)`). Appending or setting values beyond the
Builder's current length may cause the builder to grow its underlying buffers
or child Builders (if applicable) to accommodate the new values.

After enough values have been written to a `Builder`, `builder.flush()`
will commit the values to the underlying ArrayBuffers (or child Builders). The
internal Builder state will be reset, and an instance of `Data<T>` is returned.
Alternatively, `builder.toVector()` will flush the `Builder` and return
an instance of `Vector<T>` instead.
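
To make the append, set and flush semantics above more concrete, here is a toy model in plain TypeScript. It is only a sketch and not the Arrow JS implementation: it does not use the `apache-arrow` package, and the class name `ToyInt32Builder` and its fields are hypothetical names chosen for illustration.

```ts
// Toy model only: illustrates the append/set/flush semantics described above.
// ToyInt32Builder is a hypothetical name, not part of apache-arrow.
class ToyInt32Builder {
  length = 0;
  private values = new Int32Array(4);
  private valid = new Uint8Array(1); // validity bitmap, one bit per value (LSB first)
  private nullValues: ReadonlyArray<unknown>;

  constructor(nullValues: ReadonlyArray<unknown> = [null]) {
    this.nullValues = nullValues;
  }

  // append(value) is shorthand for set(length, value)
  append(value: number | null): this {
    return this.set(this.length, value);
  }

  // Random-access write; grows the buffers when index is past the capacity.
  set(index: number, value: number | null): this {
    this.reserve(index + 1);
    if (this.nullValues.indexOf(value) !== -1) {
      this.valid[index >> 3] &= ~(1 << (index & 7)); // clear bit: null
      this.values[index] = 0;
    } else {
      this.valid[index >> 3] |= 1 << (index & 7); // set bit: valid
      this.values[index] = value as number;
    }
    this.length = Math.max(this.length, index + 1);
    return this;
  }

  // Hand back the buffers written so far and reset the internal state.
  flush(): { values: Int32Array; nullBitmap: Uint8Array; length: number } {
    const result = {
      values: this.values.slice(0, this.length),
      nullBitmap: this.valid.slice(0, (this.length + 7) >> 3),
      length: this.length,
    };
    this.values = new Int32Array(4);
    this.valid = new Uint8Array(1);
    this.length = 0;
    return result;
  }

  // Grow the value buffer and the bitmap so they can hold `capacity` values.
  private reserve(capacity: number): void {
    if (capacity > this.values.length) {
      const grown = new Int32Array(Math.max(capacity, this.values.length * 2));
      grown.set(this.values);
      this.values = grown;
    }
    const bytes = (capacity + 7) >> 3;
    if (bytes > this.valid.length) {
      const grown = new Uint8Array(bytes);
      grown.set(this.valid);
      this.valid = grown;
    }
  }
}

const toy = new ToyInt32Builder();
// In this toy model, indices 2..4 are never written, so their validity bits stay 0.
toy.append(1).append(null).set(5, 7);
const flushed = toy.flush(); // length 6; the toy builder is empty again afterwards
```

The real `Builder` classes handle many more cases (nested types, variable-length values, dictionaries), so treat this purely as a mental model for why `set()` past the current length can trigger a buffer resize.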

When there are no more values to write, use `builder.finish()` to
finalize the `Builder`. This does not reset the internal state, so it is
necessary to call `builder.flush()` or `toVector()` one last time
if there are still values queued to be flushed.

Note: calling `builder.finish()` is required when using a `DictionaryBuilder`,
because this is when it flushes the values that have been enqueued in its internal
dictionary's `Builder`, and creates the `dictionaryVector` for the `Dictionary` `DataType`.


## Usage

Creating a utf8 array

```ts
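import { makeBuilder, Utf8 } from 'apache-arrow';

// Treat both null and the string 'n/a' as null sentinels
const utf8Builder = makeBuilder({
  type: new Utf8(),
  nullValues: [null, 'n/a']
});

utf8Builder
  .append('hello')
  .append('n/a')
  .append('world')
  .append(null);

// finish() finalizes the builder; toVector() flushes it into a Vector
const utf8Vector = utf8Builder.finish().toVector();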

console.log(utf8Vector.toJSON());
// > ["hello", null, "world", null]
```

## makeBuilder

```ts
function makeBuilder(options: BuilderOptions): Builder;
```

```ts
type BuilderOptions<T extends DataType = any, TNull = any> = {
  type: T;
  nullValues?: TNull[] | ReadonlyArray<TNull> | null;
  children?: { [key: string]: BuilderOptions } | BuilderOptions[];
};
```

- `type` - the data type of the column. This can be an arbitrarily nested data type with children (`List`, `Struct` etc).
- `nullValues?` - The JavaScript values that will be treated as null values.
- `children?` - `BuilderOptions` for any nested columns.

- `type T` - The `DataType` of this `Builder`.
- `type TNull` - The type(s) of values which will be considered null-value sentinels.


## Builder

`makeBuilder()` returns a `Builder`, the base class for the various Arrow JS builder subclasses that
construct Arrow Vectors from JavaScript values.

23 changes: 0 additions & 23 deletions docs/arrowjs/api-reference/data-frame.md

This file was deleted.

33 changes: 0 additions & 33 deletions docs/arrowjs/api-reference/predicates.md

This file was deleted.

File renamed without changes.
@@ -29,9 +29,6 @@ for (const value of column) {

## Inheritance

Column extends [`Chunked`](/docs/arrowjs/api-reference/chunked)


## Fields

In addition to fields inherited from `Chunked`, `Column` also defines
71 changes: 43 additions & 28 deletions docs/arrowjs/api-reference/vector.md
Original file line number Diff line number Diff line change
@@ -1,43 +1,49 @@
# Vector
# Vectors

> This documentation reflects Arrow JS v4.0. Needs to be updated for the new Arrow API in v9.0 +.
A `Vector` is an Array-like data structure. Use `makeVector` and `vectorFromArray` to create vectors.

### makeVector


### vectorFromArray


### Vector

Also referred to as `BaseVector`. An abstract base class for vector types.

* Can support a null map
* ...
* TBD

## Inheritance
## Fields

### `type: DataType`

## Fields
The Arrow `DataType` that describes the elements in this Vector.

### data: `Data<T>` (readonly)
### `data: Data<T> (readonly)`

The underlying Data instance for this Vector.

### numChildren: number (readonly)
### `numChildren: number (readonly)`

The number of logical Vector children. Only applicable if the DataType of the Vector is one of the nested types (List, FixedSizeList, Struct, or Map).

### type : T

The DataType that describes the elements in the Vector

### typeId : T['typeId']
### `typeId: T['typeId']`

The `typeId` enum value of the `type` instance

### length : number
### `length: number`

Number of elements in the `Vector`

### offset : number
### `offset: number`

Offset to the first element in the underlying data.

### stride : number
### `stride: number`

Stride between successive elements in the underlying data.

@@ -49,59 +55,68 @@ The number of elements in the underlying data buffer that constitute a single lo
- For `FixedSizeList` types, the stride is the `listSize` property of the `FixedSizeList` instance.
- For `FixedSizeBinary` types, the stride is the `byteWidth` property of the `FixedSizeBinary` instance.
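
As a rough illustration of how the stride maps a logical index to positions in the values buffer, the sketch below uses a hypothetical helper, `logicalValueRange`, which is not part of Arrow JS. It ignores `vector.offset` for simplicity.

```ts
// Hypothetical helper (not an Arrow JS API): physical [start, end) range in the
// values buffer occupied by one logical element, given the stride.
function logicalValueRange(index: number, stride: number): [number, number] {
  return [index * stride, (index + 1) * stride];
}

// Example: a FixedSizeBinary-like buffer with byteWidth 4, i.e. stride 4.
const rawBytes = new Uint8Array([0, 1, 2, 3, 10, 11, 12, 13, 20, 21, 22, 23]);
const [rangeStart, rangeEnd] = logicalValueRange(1, 4);
// Zero-copy view of the second logical value (bytes 4..7)
const secondValue = rawBytes.subarray(rangeStart, rangeEnd);
```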

### nullCount : Number
### `nullCount: number`

Number of `null` values in this `Vector` instance (`null` values require a null map to be present).

### VectorName : String
### `VectorName: string`

Returns the name of the Vector

### ArrayType : TypedArrayConstructor | ArrayConstructor
### `ArrayType: TypedArrayConstructor | ArrayConstructor`

Returns the constructor of the underlying typed array for the values buffer as determined by this Vector's DataType.

### values : T['TArray']
### `values: T['TArray']`

Returns the underlying data buffer of the Vector, if applicable.

### typeIds : Int8Array | null
### `typeIds: Int8Array | null`

Returns the underlying typeIds buffer, if the Vector DataType is Union.

### nullBitmap : Uint8Array | null
### `nullBitmap: Uint8Array | null`

Returns the underlying validity bitmap buffer, if applicable.

Note: Since the validity bitmap is a Uint8Array of bits, it is _not_ sliced when you call `vector.slice()`. Instead, the `vector.offset` property is updated on the returned Vector. Therefore, you must factor `vector.offset` into the bit position if you wish to slice or read the null positions manually. See the implementation of `BaseVector.isValid()` for an example of how this is done.
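
The offset handling described in the note can be sketched as follows. This is an illustration under stated assumptions rather than the actual `isValid()` implementation: `isValidAt` is a hypothetical helper, and it relies only on the Arrow convention that validity bits are numbered from the least significant bit, with a set bit meaning "valid".

```ts
// Sketch only: reads one validity bit, factoring in the vector's offset.
// isValidAt is a hypothetical helper, not the Arrow JS implementation.
function isValidAt(nullBitmap: Uint8Array | null, offset: number, index: number): boolean {
  if (nullBitmap === null) return true; // no bitmap: every slot is valid
  const bit = offset + index; // bit position in the unsliced bitmap
  return (nullBitmap[bit >> 3] & (1 << (bit & 7))) !== 0; // LSB-first bit order
}

// Bits 0, 2 and 3 are set: slots 0, 2 and 3 are valid, slot 1 is null.
const validityBitmap = new Uint8Array([0b00001101]);
// Models a vector sliced from index 1: the bitmap is unchanged, only the offset moves.
const slicedOffset = 1;
const validity = [0, 1, 2].map((i) => isValidAt(validityBitmap, slicedOffset, i));
// validity is [false, true, true]
```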

### valueOffsets : Int32Array | null
### `valueOffsets: Int32Array | null`

Returns the underlying valueOffsets buffer, if applicable. Only the List, Utf8, Binary, and DenseUnion DataTypes will have valueOffsets.
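
For variable-length types, the offsets buffer holds one more entry than there are values, and value `i` occupies the range `[valueOffsets[i], valueOffsets[i + 1])` of the data buffer. The sketch below shows this for a Utf8-like layout; `readUtf8Like` is a hypothetical helper (not an Arrow JS API) and handles ASCII only, for brevity.

```ts
// Sketch of the variable-length layout (ASCII only, for brevity).
// readUtf8Like is a hypothetical helper, not an Arrow JS API.
function readUtf8Like(valueOffsets: Int32Array, values: Uint8Array, index: number): string {
  const start = valueOffsets[index];
  const end = valueOffsets[index + 1]; // the offsets buffer has length + 1 entries
  let out = '';
  for (let i = start; i < end; i++) out += String.fromCharCode(values[i]);
  return out;
}

// "hello" and "world" stored back to back, with an empty string in between.
const utf8Bytes = new Uint8Array([104, 101, 108, 108, 111, 119, 111, 114, 108, 100]);
const utf8Offsets = new Int32Array([0, 5, 5, 10]);
const decoded = [0, 1, 2].map((i) => readUtf8Like(utf8Offsets, utf8Bytes, i));
// decoded is ["hello", "", "world"]
```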

## Methods

### clone(data: `Data<R>`, children): `Vector<R>`
### `clone(data: Data<R>, children): Vector<R>`

Returns a clone of the current Vector, using the supplied Data and optional children for the new clone. Does not copy any underlying buffers.

### concat(...others: `Vector<T>[]`)
### `concat(...others: Vector<T>[])`

Returns a `Chunked` vector that concatenates this Vector with the supplied other Vectors. Other Vectors must be the same type as this Vector.


### slice(begin?: number, end?: number)
### `slice(begin?: number, end?: number)`

Returns a zero-copy slice of this Vector. The begin and end arguments are handled the same way as JS' `Array.prototype.slice`; they are clamped between 0 and `vector.length` and wrap around when negative, e.g. `slice(-1, 5)` or `slice(5, -1)`
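
The clamping and wrap-around rules can be sketched in a few lines. `resolveSliceBounds` is a hypothetical helper written to mirror `Array.prototype.slice`; it is not taken from the Arrow JS source.

```ts
// Sketch only: resolves begin/end the way Array.prototype.slice does.
// resolveSliceBounds is a hypothetical helper, not an Arrow JS API.
function resolveSliceBounds(length: number, begin = 0, end = length): [number, number] {
  // Negative indices count from the end; results are clamped to [0, length].
  const clamp = (n: number) => Math.min(Math.max(n < 0 ? length + n : n, 0), length);
  const from = clamp(begin);
  const to = Math.max(clamp(end), from); // an inverted range is empty
  return [from, to];
}

const boundsA = resolveSliceBounds(10, -1, 5); // [9, 9]: begin wraps to 9, so the range is empty
const boundsB = resolveSliceBounds(10, 5, -1); // [5, 9]: end wraps to 9
```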

### isValid(index: number): boolean
### `isValid()`

```ts
vector.isValid(index: number): boolean
```

Returns `true` if the supplied index is valid in the underlying validity bitmap.


Returns whether the supplied index is valid in the underlying validity bitmap.
### `getChildAt()`

### getChildAt`<R extends DataType = any>`(index: number): `Vector<R>` | null
```ts
vector.getChildAt<R extends DataType = any>(index: number): Vector<R> | null
```
Returns the inner Vector child if the DataType is one of the nested types (Map or Struct).
Returns the inner Vector child if the DataType is one of the nested types such as Map or Struct.
### toJSON(): any
### `toJSON()`
Returns a dense JS Array of the Vector values, with null sentinels in-place.
19 changes: 7 additions & 12 deletions docs/arrowjs/arrow-sidebar.json
@@ -24,30 +24,25 @@
"type": "category",
"label": "Developer Guide",
"items": [
"arrowjs/developer-guide/big-ints",
"arrowjs/developer-guide/converting-data",
"arrowjs/developer-guide/data-frame-operations",
"arrowjs/developer-guide/data-sources",
"arrowjs/developer-guide/data-types",
"arrowjs/developer-guide/memory-management",
"arrowjs/developer-guide/predicates",
"arrowjs/developer-guide/reading-and-writing",
"arrowjs/developer-guide/schemas",
"arrowjs/developer-guide/tables",
"arrowjs/developer-guide/typescript"
"arrowjs/developer-guide/builders",
"arrowjs/developer-guide/converting-data",
"arrowjs/developer-guide/memory-management",
"arrowjs/developer-guide/big-ints",
"arrowjs/developer-guide/data-sources",
"arrowjs/developer-guide/reading-and-writing"
]
},
{
"type": "category",
"label": "API Reference",
"items": [
"arrowjs/api-reference/README",
"arrowjs/api-reference/chunked",
"arrowjs/api-reference/column",
"arrowjs/api-reference/data-frame",
"arrowjs/api-reference/data",
"arrowjs/api-reference/dictionary",
"arrowjs/api-reference/field",
"arrowjs/api-reference/predicates",
"arrowjs/api-reference/record-batch-reader",
"arrowjs/api-reference/record-batch-writer",
"arrowjs/api-reference/record-batch",
30 changes: 30 additions & 0 deletions docs/arrowjs/developer-guide/builders.md
@@ -0,0 +1,30 @@
# Building columns and tables

Many JavaScript applications may only need to load and iterate over the data in existing Apache Arrow files created outside of JavaScript.

However a JS application may also want to create its own Arrow tables from scratch.

For this situation, Apache Arrow JS provides the `makeBuilder()` function that returns `Builder` instances that can be used to build columns of specific data types.

However, creating arrow-compatible binary data columns for complex, potentially nullable data types can be quite tricky.

```ts
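import { makeBuilder, Utf8 } from 'apache-arrow';

// Treat both null and the string 'n/a' as null sentinels
const utf8Builder = makeBuilder({
  type: new Utf8(),
  nullValues: [null, 'n/a']
});

utf8Builder
  .append('hello')
  .append('n/a')
  .append('world')
  .append(null);

// finish() finalizes the builder; toVector() flushes it into a Vector
const utf8Vector = utf8Builder.finish().toVector();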

console.log(utf8Vector.toJSON());
// > ["hello", null, "world", null]
```
