
Commit

feat(zip): cd header and zip64 info generation implemented (#2792)
dariaterekhova-actionengine authored Nov 21, 2023
1 parent 7f0bbb1 commit d1cf361
Showing 75 changed files with 863 additions and 336 deletions.
14 changes: 14 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,20 @@

## v4.0

### v4.0.4
- feat(arrow): GeoArrowLoader (#2796)
- fix(draco): revert --format=esm for the worker (#2795)
- feat(arrow): Support GeoJSON output from GeoArrowLoader (#2794)
- chore: refactor geospatial example to use hooks (#2793)
- chore(arrow): triangulateOnWorker plumbing (#2788)
- feat(website): add GeoParquet datasets (#2786)
- fix(tile-converter): 'finalizing conversion' added (#2787)
- feat(GeoArrow): getBinaryGeometriesFromArrow enhancement (#2785)
- chore(Arrow): add test cases for multipolygon with holes (#2782)
- chore(worker-utils): worker simplification
- chore: Add license headers (#2784)
- chore: Upgrade to docusaurus 3 (#2757)

### v4.0.3

- feat(tile-converter): estimation of time remaining (#2774)
6 changes: 3 additions & 3 deletions docs/arrowjs/README.md
@@ -2,17 +2,17 @@

## Why does loaders.gl provide an Arrow JS API Reference?

> Perhaps this documentation could at some point be contributed back to the Apache Arrow project, but so far this has not happened.
> The idea is that this documentation should at some point be contributed back to the Apache Arrow project/repository, but so far this has not happened.
loaders.gl is designed to output parsed tables and meshes in binary columnar format (whenever the parsed data structure allows). Binary columnar tables are a compact and efficient representation that is easy to work with analytically in JavaScript and to seamlessly upload to GPUs (via e.g. WebGL or WebGPU) for ultra-performance rendering and computation.

While loaders.gl can load data into binary columnar tables, it only provides limited support for working with binary tables. The intention is that the application should be able to use complementary libraries like Apache Arrow JS.

While the Apache Arrow JS library itself is excellent, the [reference documentation for the Apache Arrow JavaScript bindings](https://arrow.apache.org/docs/js/) is unfortunately rather thin. It can therefore be challenging to get up to speed on the Arrow JS API, which is why this documentation is provided in loaders.gl.
While the Apache Arrow JS library itself is excellent, the [reference documentation for the JavaScript bindings](https://arrow.apache.org/docs/js/) is unfortunately rather thin. It can therefore be challenging to get up to speed on the Arrow JS API, which is why this documentation is provided in loaders.gl.

## About Apache Arrow JS

The Apache Arrow JavaScript API is designed to help applications tap into the full power of working with binary columnar data in the Apache Arrow format. Arrow JS has a rich set of classes that supports use cases such as batched loading and writing, as well as performing data frame operations on Arrow encoded data, including applying filters, iterating over tables, etc.
The Apache Arrow JavaScript API is designed to help applications work with binary columnar data in the Apache Arrow format. Arrow JS offers a core set of classes that supports use cases such as batched loading and writing, column and row access, schemas, etc.

## Getting Started

2 changes: 2 additions & 0 deletions docs/docs-sidebar.json
@@ -68,6 +68,8 @@
"label": "Formats",
"items": [
"formats/README",
"modules/arrow/formats/arrow",
"modules/arrow/formats/geoarrow",
"modules/bson/formats/bson",
"modules/csv/formats/csv",
"modules/pcd/formats/pcd",
Binary file added docs/images/logos/apache-logo.png
4 changes: 3 additions & 1 deletion docs/modules/arrow/README.md
@@ -1,8 +1,10 @@
# Overview

![arrow-logo](./images/apache-arrow-small.png)
 
![apache-logo](../../images/logos/apache-logo.png)

The `@loaders.gl/arrow` module handles [Apache Arrow](https://arrow.apache.org/), an emerging standard for large in-memory columnar data.
The `@loaders.gl/arrow` module provides support for the [Apache Arrow](/docs/modules/arrow/formats/arrow) and [GeoArrow](/docs/modules/arrow/formats/geoarrow) formats.

## Installation

18 changes: 17 additions & 1 deletion docs/modules/arrow/formats/arrow.md
@@ -1,5 +1,21 @@
# Apache Arrow

![arrow-logo](../images/apache-arrow-small.png)
 
![apache-logo](../../../images/logos/apache-logo.png)

- *[`@loaders.gl/arrow`](/docs/modules/arrow)* - loaders.gl implementation
- *[Apache Arrow](https://arrow.apache.org/)* - A specification for large in-memory columnar data.
- *[ArrowJS](https://arrow.apache.org/docs/js)* - official documentation on ArrowJS API.
- *[ArrowJS](/docs/arrowjs)* - loaders.gl documentation on ArrowJS API.

The Apache Arrow project specifies a language-independent binary columnar memory format. It enables zero-copy shared memory, streaming messaging, and interprocess communication, and is supported by many programming languages and data libraries.

The Apache Arrow specification supports encoding vectors and table-like containers of flat and nested data.

The Arrow spec is performance-optimized to eliminate memory copies and aligns columnar data in memory to minimize cache misses and take advantage of the latest SIMD (single instruction, multiple data) and GPU operations on modern processors.

Apache Arrow is emerging as a de-facto standard for large in-memory columnar data (Spark, Pandas, Drill, ...).

By standardizing on a common binary interchange format, big data systems can reduce the costs and friction associated with cross-system communication.

For more information, see [ArrowJS](/docs/arrowjs) documentation.
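
To make the columnar layout described above concrete, here is a minimal sketch in plain TypeScript. It does not use the Arrow JS API; the `ColumnarPoints` type and the helper names are hypothetical and exist only for illustration. Each column is one contiguous typed array, so scanning a column reads only that column's memory.

```typescript
// Hypothetical illustration only: this is NOT the Arrow JS API.
type RowPoint = {x: number; y: number; id: number};

// One contiguous typed array per column, plus a shared row count.
type ColumnarPoints = {
  length: number;
  x: Float64Array;
  y: Float64Array;
  id: Int32Array;
};

/** Convert an array of row objects into one typed array per column. */
function toColumnar(rows: RowPoint[]): ColumnarPoints {
  const table: ColumnarPoints = {
    length: rows.length,
    x: new Float64Array(rows.length),
    y: new Float64Array(rows.length),
    id: new Int32Array(rows.length)
  };
  rows.forEach((row, i) => {
    table.x[i] = row.x;
    table.y[i] = row.y;
    table.id[i] = row.id;
  });
  return table;
}

/** Sum a single column; only that column's buffer is read. */
function sumColumn(column: Float64Array): number {
  let sum = 0;
  for (let i = 0; i < column.length; i++) {
    sum += column[i];
  }
  return sum;
}

const columnarPoints = toColumnar([
  {x: 1, y: 10, id: 7},
  {x: 2, y: 20, id: 8},
  {x: 3, y: 30, id: 9}
]);
const totalX = sumColumn(columnarPoints.x);
```

The typed arrays in such a layout can be handed to a worker or uploaded to a GPU buffer without per-row conversion, which is the property the format description emphasizes.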
43 changes: 43 additions & 0 deletions docs/modules/arrow/formats/geoarrow.md
@@ -0,0 +1,43 @@
# GeoArrow

![arrow-logo](../images/apache-arrow-small.png)
 
![apache-logo](../../../images/logos/apache-logo.png)

- *[`@loaders.gl/arrow`](/docs/modules/arrow)* - loaders.gl implementation
- *[GeoArrow Specification](https://github.com/geoarrow/geoarrow)*
- *[Apache Arrow](https://arrow.apache.org/)* - A specification for large in-memory columnar data.
- *[ArrowJS](/docs/arrowjs)* - loaders.gl documentation on ArrowJS API.

## Overview

GeoArrow is a specification for storing geospatial data in the Apache Arrow memory layout. It ensures geospatial tools can interoperate and leverage the growing Apache Arrow ecosystem.

GeoArrow enables each row in an Arrow table to represent a feature as defined by the OGC Simple Feature Access standard (i.e. Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, and GeometryCollection).

Aside from geometry, simple features can also have standard Arrow columns that provide additional non-spatial attributes for the feature.

Geospatial tabular data is data where one or more columns contain feature geometries and the remaining columns define feature attributes. The GeoArrow specification defines how such vector features (geometries) can be stored in Arrow (and Arrow-compatible) data structures.
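
As a rough sketch of what such a layout looks like, GeoArrow's native LineString encoding is a list of points: one flat coordinate buffer plus an offsets buffer marking where each row's points begin and end. The code below assumes interleaved XY coordinates and uses hypothetical names (`LineStringColumn`, `getLineString`); it is plain TypeScript, not the Arrow JS or loaders.gl API.

```typescript
// Hypothetical, simplified model of a GeoArrow linestring column (interleaved XY).
type LineStringColumn = {
  /** Points offsets[i] up to (not including) offsets[i + 1] belong to row i */
  offsets: Int32Array;
  /** Flat coordinate buffer: x0, y0, x1, y1, ... */
  coords: Float64Array;
};

type LineStringGeometry = {type: 'LineString'; coordinates: number[][]};

/** Materialize one row of the column as a GeoJSON-style LineString geometry. */
function getLineString(column: LineStringColumn, row: number): LineStringGeometry {
  const start = column.offsets[row];
  const end = column.offsets[row + 1];
  const coordinates: number[][] = [];
  for (let point = start; point < end; point++) {
    coordinates.push([column.coords[2 * point], column.coords[2 * point + 1]]);
  }
  return {type: 'LineString', coordinates};
}

// Two linestrings: row 0 has 2 points, row 1 has 3 points.
const lineColumn: LineStringColumn = {
  offsets: new Int32Array([0, 2, 5]),
  coords: new Float64Array([0, 0, 1, 1, 10, 10, 11, 12, 13, 14])
};
const secondLine = getLineString(lineColumn, 1);
```

Polygons and multi-geometries nest the same idea with additional levels of offsets.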

Note that GeoArrow is not a separate format from Apache Arrow; rather, the GeoArrow specification simply describes additional conventions for metadata and layout of geospatial data. This means that a valid GeoArrow file is always a valid Arrow file. This is done through [Arrow extension type](https://arrow.apache.org/docs/format/Columnar.html#extension-types) definitions that ensure type-level metadata (e.g., CRS) is propagated when used in Arrow implementations.
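
Arrow records extension types in a field's custom metadata under the reserved keys `ARROW:extension:name` and `ARROW:extension:metadata`. The sketch below assumes the field metadata is available as a `Map<string, string>` and that the extension metadata is a JSON string (for example carrying a `crs` entry). The helper name `getGeoArrowEncoding` is hypothetical and is not part of loaders.gl.

```typescript
// Encoding names as listed in the Geometry Types table of this document.
const GEOARROW_ENCODINGS = [
  'geoarrow.multipolygon',
  'geoarrow.polygon',
  'geoarrow.multipoint',
  'geoarrow.point',
  'geoarrow.multilinestring',
  'geoarrow.linestring',
  'geoarrow.wkb',
  'geoarrow.wkt'
] as const;

type GeoArrowEncodingName = (typeof GEOARROW_ENCODINGS)[number];

type GeoArrowFieldInfo = {
  encoding: GeoArrowEncodingName;
  extensionMetadata: unknown;
};

/** Hypothetical helper: read the GeoArrow extension type from field metadata. */
function getGeoArrowEncoding(fieldMetadata: Map<string, string>): GeoArrowFieldInfo | null {
  const name = fieldMetadata.get('ARROW:extension:name')?.toLowerCase();
  const encoding = GEOARROW_ENCODINGS.find((candidate) => candidate === name);
  if (!encoding) {
    return null; // Not a GeoArrow column
  }
  const raw = fieldMetadata.get('ARROW:extension:metadata');
  let extensionMetadata: unknown = null;
  if (raw) {
    try {
      extensionMetadata = JSON.parse(raw);
    } catch {
      extensionMetadata = null; // Assumption: ignore metadata that is not valid JSON
    }
  }
  return {encoding, extensionMetadata};
}

const geometryFieldInfo = getGeoArrowEncoding(
  new Map([
    ['ARROW:extension:name', 'geoarrow.point'],
    ['ARROW:extension:metadata', '{"crs":"EPSG:4326"}']
  ])
);
```

Because the information lives in ordinary Arrow field metadata, a reader that does not know GeoArrow still sees a valid Arrow column.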


## Relationship with GeoParquet

The [GeoParquet specification](https://github.com/opengeospatial/geoparquet) is closely related to GeoArrow. Notable differences:

- GeoParquet is a file-level metadata specification
- GeoArrow is a field-level metadata and memory layout specification

## Geometry Types

| Geometry type              | Read | Write | Description |
| -------------------------- | ---- | ----- | ----------- |
| `geoarrow.multipolygon`    |      |       |             |
| `geoarrow.polygon`         |      |       |             |
| `geoarrow.multipoint`      |      |       |             |
| `geoarrow.point`           |      |       |             |
| `geoarrow.multilinestring` |      |       |             |
| `geoarrow.linestring`      |      |       |             |
| `geoarrow.wkb`             |      |       |             |
| `geoarrow.wkt`             |      |       |             |
6 changes: 6 additions & 0 deletions examples/website/geospatial/app.tsx
@@ -98,6 +98,12 @@ export default function App(props: AppProps) {
if (props.format) {
// Move the preferred format examples to the "top"
examples = {[props.format]: EXAMPLES[props.format], ...EXAMPLES};
// Remove any test-only examples (keys ending in 'Test')
for (const key of Object.keys(examples)) {
if (key.endsWith('Test')) {
delete examples[key];
}
}
}

const selectedLoader = props.format || INITIAL_LOADER_NAME;
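
The reorder-then-filter step in this hunk could also be written without mutating the object after it is built. The following is only a sketch with a hypothetical helper name (`withoutTestExamples`) and stand-in data; it is not the code in the commit.

```typescript
type ExampleMap = Record<string, unknown>;

/**
 * Hypothetical helper: move the preferred format to the front,
 * then drop every key that ends in 'Test'.
 */
function withoutTestExamples(examples: ExampleMap, preferred?: string): ExampleMap {
  // String keys keep insertion order, so listing the preferred key first moves it to the top.
  const ordered: ExampleMap =
    preferred && preferred in examples
      ? {[preferred]: examples[preferred], ...examples}
      : {...examples};

  const result: ExampleMap = {};
  for (const key of Object.keys(ordered)) {
    if (!key.endsWith('Test')) {
      result[key] = ordered[key];
    }
  }
  return result;
}

// Stand-in data for illustration only.
const demoExamples: ExampleMap = {Arrow: 1, ArrowTest: 2, GeoJSON: 3};
const visibleExamples = withoutTestExamples(demoExamples, 'GeoJSON');
```

Unlike the version in the hunk, this sketch filters test keys whether or not a preferred format is given; that is a design choice of the sketch, not a statement about the intended behavior.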
2 changes: 1 addition & 1 deletion lerna.json
@@ -1,6 +1,6 @@
{
"lerna": "2.9.1",
"version": "4.0.3",
"version": "4.0.4",
"command": {
"publish": {},
"bootstrap": {}
14 changes: 7 additions & 7 deletions modules/3d-tiles/package.json
@@ -1,6 +1,6 @@
{
"name": "@loaders.gl/3d-tiles",
"version": "4.0.3",
"version": "4.0.4",
"description": "3D Tiles, an open standard for streaming massive heterogeneous 3D geospatial datasets.",
"license": "MIT",
"type": "module",
@@ -42,12 +42,12 @@
"build-bundle": "ocular-bundle ./src/index.ts"
},
"dependencies": {
"@loaders.gl/draco": "4.0.3",
"@loaders.gl/gltf": "4.0.3",
"@loaders.gl/loader-utils": "4.0.3",
"@loaders.gl/math": "4.0.3",
"@loaders.gl/tiles": "4.0.3",
"@loaders.gl/zip": "4.0.3",
"@loaders.gl/draco": "4.0.4",
"@loaders.gl/gltf": "4.0.4",
"@loaders.gl/loader-utils": "4.0.4",
"@loaders.gl/math": "4.0.4",
"@loaders.gl/tiles": "4.0.4",
"@loaders.gl/zip": "4.0.4",
"@math.gl/core": "^4.0.0",
"@math.gl/geospatial": "^4.0.0",
"@probe.gl/log": "^4.0.4",
8 changes: 4 additions & 4 deletions modules/arrow/package.json
@@ -1,6 +1,6 @@
{
"name": "@loaders.gl/arrow",
"version": "4.0.3",
"version": "4.0.4",
"description": "Simple columnar table loader for the Apache Arrow format",
"license": "MIT",
"type": "module",
@@ -47,9 +47,9 @@
"build-worker2": "esbuild src/workers/arrow-worker.ts --bundle --outfile=dist/arrow-worker.js --platform=browser --external:{stream}"
},
"dependencies": {
"@loaders.gl/gis": "4.0.3",
"@loaders.gl/loader-utils": "4.0.3",
"@loaders.gl/schema": "4.0.3",
"@loaders.gl/gis": "4.0.4",
"@loaders.gl/loader-utils": "4.0.4",
"@loaders.gl/schema": "4.0.4",
"@math.gl/polygon": "4.0.0",
"apache-arrow": "^13.0.0"
},
33 changes: 25 additions & 8 deletions modules/arrow/src/arrow-loader.ts
@@ -1,26 +1,29 @@
// loaders.gl, MIT license
// Copyright (c) vis.gl contributors

import type {Loader, LoaderOptions} from '@loaders.gl/loader-utils';
import type {Loader, LoaderWithParser, LoaderOptions} from '@loaders.gl/loader-utils';
import type {
ArrayRowTable,
ArrowTableBatch,
ColumnarTable,
ObjectRowTable
} from '@loaders.gl/schema';
import type {ArrowTable} from './lib/arrow-table';
import {parseArrowSync} from './parsers/parse-arrow-sync';
import {parseArrowInBatches} from './parsers/parse-arrow-in-batches';

// __VERSION__ is injected by babel-plugin-version-inline
// @ts-ignore TS2304: Cannot find name '__VERSION__'.
const VERSION = typeof __VERSION__ !== 'undefined' ? __VERSION__ : 'latest';

export type ArrowLoaderOptions = LoaderOptions & {
arrow?: {
shape:
| 'arrow-table'
| 'columnar-table'
| 'array-row-table'
| 'object-row-table'
| 'geojson-table';
shape: 'arrow-table' | 'columnar-table' | 'array-row-table' | 'object-row-table';
};
};

/** ArrowJS table loader */
export const ArrowLoader: Loader<ArrowTable, never, ArrowLoaderOptions> = {
export const ArrowWorkerLoader: Loader<ArrowTable, never, ArrowLoaderOptions> = {
name: 'Apache Arrow',
id: 'arrow',
module: 'arrow',
@@ -41,3 +44,17 @@ export const ArrowLoader: Loader<ArrowTable, never, ArrowLoaderOptions> = {
}
}
};

/** ArrowJS table loader */
export const ArrowLoader: LoaderWithParser<
ArrowTable | ColumnarTable | ObjectRowTable | ArrayRowTable,
ArrowTableBatch,
ArrowLoaderOptions
> = {
...ArrowWorkerLoader,
parse: async (arraybuffer: ArrayBuffer, options?: ArrowLoaderOptions) =>
parseArrowSync(arraybuffer, options?.arrow),
parseSync: (arraybuffer: ArrayBuffer, options?: ArrowLoaderOptions) =>
parseArrowSync(arraybuffer, options?.arrow),
parseInBatches: parseArrowInBatches
};
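
The split in this file, a metadata-only worker loader that the full loader spreads and extends with parser functions, can be sketched in isolation. The types below are simplified stand-ins for the `Loader` and `LoaderWithParser` types from `@loaders.gl/loader-utils`, and every name in the sketch (including the `worker` field and the byte-count "format") is hypothetical.

```typescript
// Simplified stand-ins for the loaders.gl Loader / LoaderWithParser types (assumption).
type SketchLoader = {
  name: string;
  id: string;
  extensions: string[];
  worker: boolean;
};

type SketchLoaderWithParser<DataT> = SketchLoader & {
  parseSync: (arrayBuffer: ArrayBuffer) => DataT;
  parse: (arrayBuffer: ArrayBuffer) => Promise<DataT>;
};

// Metadata-only loader: cheap to import because it references no parsing code.
const ByteCountWorkerLoader: SketchLoader = {
  name: 'Byte count (demo)',
  id: 'byte-count',
  extensions: ['bin'],
  worker: true
};

type ByteCount = {byteLength: number};

function parseByteCountSync(arrayBuffer: ArrayBuffer): ByteCount {
  return {byteLength: arrayBuffer.byteLength};
}

// Full loader: reuses the metadata via object spread and adds the parsers.
const ByteCountLoader: SketchLoaderWithParser<ByteCount> = {
  ...ByteCountWorkerLoader,
  parseSync: parseByteCountSync,
  parse: async (arrayBuffer) => parseByteCountSync(arrayBuffer)
};

const parsedByteCount = ByteCountLoader.parseSync(new ArrayBuffer(8));
```

Keeping the metadata object separate means code that only needs to identify a format does not have to bundle the parser.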
55 changes: 55 additions & 0 deletions modules/arrow/src/geoarrow-loader.ts
@@ -0,0 +1,55 @@
// loaders.gl, MIT license
// Copyright (c) vis.gl contributors

import type {Loader, LoaderWithParser, LoaderOptions} from '@loaders.gl/loader-utils';
import {ArrowWorkerLoader} from './arrow-loader';
import type {GeoJSONTable, GeoJSONTableBatch, BinaryGeometry} from '@loaders.gl/schema';
import type {ArrowTable, ArrowTableBatch} from './lib/arrow-table';
import {parseGeoArrowSync} from './parsers/parse-geoarrow-sync';
import {parseGeoArrowInBatches} from './parsers/parse-geoarrow-in-batches';

// __VERSION__ is injected by babel-plugin-version-inline
// @ts-ignore TS2304: Cannot find name '__VERSION__'.
const VERSION = typeof __VERSION__ !== 'undefined' ? __VERSION__ : 'latest';

export type GeoArrowLoaderOptions = LoaderOptions & {
arrow?: {
shape: 'arrow-table' | 'binary-geometry';
};
};

/** ArrowJS table loader */
export const GeoArrowWorkerLoader: Loader<
ArrowTable | BinaryGeometry,
never,
GeoArrowLoaderOptions
> = {
...ArrowWorkerLoader,
options: {
arrow: {
shape: 'arrow-table'
}
}
};

/**
* GeoArrowLoader loads an Apache Arrow table, parses GeoArrow type extension data
* to convert it to a GeoJSON table or a BinaryGeometry
*/
export const GeoArrowLoader: LoaderWithParser<
ArrowTable | GeoJSONTable, // | BinaryGeometry,
ArrowTableBatch | GeoJSONTableBatch, // | BinaryGeometry,
GeoArrowLoaderOptions
> = {
...ArrowWorkerLoader,
options: {
arrow: {
shape: 'arrow-table'
}
},
parse: async (arraybuffer: ArrayBuffer, options?: GeoArrowLoaderOptions) =>
parseGeoArrowSync(arraybuffer, options?.arrow),
parseSync: (arraybuffer: ArrayBuffer, options?: GeoArrowLoaderOptions) =>
parseGeoArrowSync(arraybuffer, options?.arrow),
parseInBatches: parseGeoArrowInBatches
};
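
Both loaders in this commit select their output through an `arrow.shape` option. Note that the option type in this diff lists `'arrow-table' | 'binary-geometry'`, while the loader's declared return type mentions `GeoJSONTable`. The sketch below therefore uses illustrative shape names and hypothetical types only, to show one way a parser might dispatch on the option with a default and an exhaustiveness check.

```typescript
// Illustrative shape names and result types; not the loaders.gl definitions.
type SketchShape = 'arrow-table' | 'geojson-table';
type SketchOptions = {arrow?: {shape?: SketchShape}};

type SketchArrowTable = {shape: 'arrow-table'; rows: number};
type SketchGeoJSONTable = {
  shape: 'geojson-table';
  type: 'FeatureCollection';
  features: unknown[];
};

const DEFAULT_SHAPE: SketchShape = 'arrow-table';

/** Hypothetical parser: picks the output representation from options.arrow.shape. */
function parseWithShape(
  rowCount: number,
  options?: SketchOptions
): SketchArrowTable | SketchGeoJSONTable {
  const shape = options?.arrow?.shape ?? DEFAULT_SHAPE;
  switch (shape) {
    case 'arrow-table':
      return {shape, rows: rowCount};
    case 'geojson-table':
      return {shape, type: 'FeatureCollection', features: new Array(rowCount).fill(null)};
    default: {
      // Compile-time check that every shape is handled above.
      const unhandled: never = shape;
      throw new Error(`Unknown shape ${String(unhandled)}`);
    }
  }
}

const defaultResult = parseWithShape(3);
const geojsonResult = parseWithShape(2, {arrow: {shape: 'geojson-table'}});
```

The `never` assignment makes the compiler flag any shape that is added to the union but not handled in the switch.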
41 changes: 41 additions & 0 deletions modules/arrow/src/geoarrow-writer.ts
@@ -0,0 +1,41 @@
// import type {} from '@loaders.gl/loader-utils';

import type {WriterWithEncoder, WriterOptions} from '@loaders.gl/loader-utils';
import {GeoJSONTable, BinaryGeometry} from '@loaders.gl/schema';
import {encodeGeoArrowSync} from './lib/encode-geoarrow';

// __VERSION__ is injected by babel-plugin-version-inline
// @ts-ignore TS2304: Cannot find name '__VERSION__'.
const VERSION = typeof __VERSION__ !== 'undefined' ? __VERSION__ : 'latest';

type ArrowWriterOptions = WriterOptions & {
arrow?: {};
};

/** Apache Arrow writer */
export const GeoArrowWriter: WriterWithEncoder<
GeoJSONTable | BinaryGeometry,
never,
ArrowWriterOptions
> = {
name: 'Apache Arrow',
id: 'arrow',
module: 'arrow',
version: VERSION,
extensions: ['arrow', 'feather'],
mimeTypes: [
'application/vnd.apache.arrow.file',
'application/vnd.apache.arrow.stream',
'application/octet-stream'
],
binary: true,
options: {},
encode: async function encodeArrow(data, options?): Promise<ArrayBuffer> {
// @ts-expect-error
return encodeGeoArrowSync(data);
},
encodeSync(data, options?) {
// @ts-expect-error
return encodeGeoArrowSync(data);
}
};
8 changes: 6 additions & 2 deletions modules/arrow/src/geoarrow/convert-geoarrow-to-geojson.ts
@@ -15,8 +15,8 @@ import {
import type {GeoArrowEncoding} from '@loaders.gl/gis';

type RawArrowFeature = {
data: arrow.Vector;
encoding?: GeoArrowEncoding;
data: any;
};

/**
@@ -30,7 +30,7 @@ type RawArrowFeature = {
* @returns Feature or null
*/
export function parseGeometryFromArrow(rawData: RawArrowFeature): Feature | null {
const encoding = rawData.encoding?.toLowerCase();
const encoding = rawData.encoding?.toLowerCase() as typeof rawData.encoding;
const data = rawData.data;
if (!encoding || !data) {
return null;
@@ -57,6 +57,10 @@ export function parseGeometryFromArrow(rawData: RawArrowFeature): Feature | null
case 'geoarrow.linestring':
geometry = arrowLineStringToFeature(data);
break;
case 'geoarrow.wkb':
throw Error(`GeoArrow encoding not supported ${encoding}`);
case 'geoarrow.wkt':
throw Error(`GeoArrow encoding not supported ${encoding}`);
default: {
throw Error(`GeoArrow encoding not supported ${encoding}`);
}
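
An alternative to a long `switch` over encodings is a lookup table keyed by encoding name, where recognized but unsupported encodings are listed explicitly. This is only a sketch: the converter functions, type names, and the reduced set of encodings are hypothetical and do not reproduce the converters in this module.

```typescript
type SketchEncoding = 'geoarrow.point' | 'geoarrow.linestring' | 'geoarrow.wkb' | 'geoarrow.wkt';
type SketchGeometry = {type: 'Point' | 'LineString'; coordinates: unknown};
type SketchConverter = (data: unknown) => SketchGeometry;

// null marks encodings that are recognized but not supported in this sketch.
const SKETCH_CONVERTERS: Record<SketchEncoding, SketchConverter | null> = {
  'geoarrow.point': (data) => ({type: 'Point', coordinates: data}),
  'geoarrow.linestring': (data) => ({type: 'LineString', coordinates: data}),
  'geoarrow.wkb': null,
  'geoarrow.wkt': null
};

/** Returns null when encoding or data is missing; throws for unsupported encodings. */
function convertWithTable(encoding: string | undefined, data: unknown): SketchGeometry | null {
  const key = encoding?.toLowerCase();
  if (!key || data === undefined || data === null) {
    return null;
  }
  const converter = Object.prototype.hasOwnProperty.call(SKETCH_CONVERTERS, key)
    ? SKETCH_CONVERTERS[key as SketchEncoding]
    : null;
  if (!converter) {
    throw new Error(`GeoArrow encoding not supported ${key}`);
  }
  return converter(data);
}

const pointGeometry = convertWithTable('GeoArrow.Point', [1, 2]);
```

The `Record<SketchEncoding, ...>` type forces every encoding in the union to appear in the table, which serves the same purpose as the explicit `geoarrow.wkb` and `geoarrow.wkt` cases in the hunk.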