From 7eef07bf72599f3cacc09f331f4704787f7c85a9 Mon Sep 17 00:00:00 2001 From: tb06904 <141412860+tb06904@users.noreply.github.com> Date: Tue, 5 Nov 2024 17:01:55 +0000 Subject: [PATCH] Gh-534: Additional updates for federated POC (#539) * basic page layout * configuration options page * finish the configuration page * Add access control docs * operation handling info * add additional information * Apply suggestions from code review Co-authored-by: cn337131 <141730190+cn337131@users.noreply.github.com> Co-authored-by: p29876 <165825455+p29876@users.noreply.github.com> * address comments * update federated docs * Apply suggestions from code review Co-authored-by: cn337131 <141730190+cn337131@users.noreply.github.com> * address comments --------- Co-authored-by: cn337131 <141730190+cn337131@users.noreply.github.com> Co-authored-by: p29876 <165825455+p29876@users.noreply.github.com> Co-authored-by: wb36499 <166839644+wb36499@users.noreply.github.com> --- .../gaffer-stores/federated-store.md | 3 + .../simple-federated/additional-info.md | 80 ++++++++++++++----- .../simple-federated/configuration.md | 12 +++ 3 files changed, 74 insertions(+), 21 deletions(-) diff --git a/docs/administration-guide/gaffer-stores/federated-store.md b/docs/administration-guide/gaffer-stores/federated-store.md index 59095d373b..fa0cb987d9 100644 --- a/docs/administration-guide/gaffer-stores/federated-store.md +++ b/docs/administration-guide/gaffer-stores/federated-store.md @@ -1,5 +1,8 @@ # Federated Store +!!! warning + The current version of the federated store is deprecated; it will be replaced by the [simple federated store](./simple-federated/configuration.md#) in v2.4.0. + The Federated Store is a Gaffer store which forwards operations to a collection of sub-graphs and returns a single response as though a single graph were queried. 
## Introduction diff --git a/docs/administration-guide/gaffer-stores/simple-federated/additional-info.md b/docs/administration-guide/gaffer-stores/simple-federated/additional-info.md index 90bf8c6895..909d6c6568 100644 --- a/docs/administration-guide/gaffer-stores/simple-federated/additional-info.md +++ b/docs/administration-guide/gaffer-stores/simple-federated/additional-info.md @@ -16,21 +16,23 @@ operation. These can be used to do things like pick graphs or control the merging, a full list of the available options are outlined in the following table: -| Option | Description | -| --- | --- | -| `federated.graphIds` | List of graph IDs to submit the operation to, formatted as a comma separated string e.g. `"graph1,graph2"` | -| `federated.excludedGraphIds` | List of graph IDs to exclude from the query. If this is set any graph IDs on a `federated.graphIds` option are ignored and instead, all graphs are executed on except the ones specified e.g. `"graph1,graph2"` | -| `federated.aggregateElements` | Should the element aggregator be used when merging element results. | -| `federated.forwardChain` | Should the whole operation chain be sent to the sub graph or not. If set to `false` each operation will inside the chain will be sent separately, so merging from each graph will happen after each operation instead of at the end of the chain. This will be inherently slower if turned off so is `true` by default. | +| Option | Default | Description | +| --- | --- | --- | +| `federated.graphIds` | None | List of graph IDs to submit the operation to, formatted as a comma separated string e.g. `"graph1,graph2"` | +| `federated.excludedGraphIds` | None | List of graph IDs to exclude from the query. If this is set any graph IDs on a `federated.graphIds` option are ignored and instead, all graphs are executed on except the ones specified e.g. 
`"graph1,graph2"` | +| `federated.aggregateElements` | See store properties | Should the element aggregator be used when merging element results. | +| `federated.useDefaultGraphIds` | None | Explicitly specifies that the default Graph IDs from the store.properties file should be used. By default, if no graph ID options are specified the default graph IDs will still be used where applicable. However, specifying this on an operation chain means the whole chain will be sent to the sub graph, and so merging from each graph will happen at the end of the chain instead of after each operation, which should improve performance. | +| `federated.separateResults` | `false` | A boolean option to specify if the results from each graph should be kept separate. If set, this will return a map where each key is a graph ID and each value is that graph's result. | +| `federated.skipGraphOnFail` | `false` | A boolean option to specify if the operation should continue even if it fails on one or more of the sub graphs. | Along with the options above, all merge classes can be overridden per query using the same property key as you would via the store properties. Please see the table [here](./configuration.md#store-properties) for more information. If you wish to submit different operations to different graphs in the same query -you can do this using the `federate.forwardChain` option. By setting this to -false on the outer operation chain the options on the operations inside it will -be honoured. An example of this can be seen below: +you can do this by omitting any graph ID options on the outer operation chain. +You can then specify the graph IDs on the individual operations in the chain +instead. An example of this can be seen below: !!! note This will turn off any merging of the results at the end of the chain, the @@ -44,9 +46,6 @@ be honoured. 
An example of this can be seen below: ```json { "class": "OperationChain", - "options": { - "federated.forwardChain": false - }, "operations": [ { "class": "GetElements", @@ -77,21 +76,60 @@ graphs that have been added to the store. This means all features available to normal caches are also available to the graph storage, allowing the sharing and persisting of graphs between instances. -The federated store will use the default cache service to store graphs in. It -will also add a standard suffix meaning if you want to share graphs you will -need to set this to something other than the graph ID (see [here](../store-guide.md#cache-service)). +The federated store will use the default cache service to store graphs in. It will +also store graphs in a cache named `"federatedGraphCache_"` followed by the graph +ID of the federated store. You may wish to change this to have common storage +of graphs between stores using the `gaffer.store.federated.graphCache.name` +store property. + +### Named Operations and Views + +Named Operations and Views can be added to different caches if specified. By +passing graph IDs in the add operation (e.g. `AddNamedOperation`) you can make +the Named Operation or View specific to the graph(s) you specified. However, +this will mean if you try to run it on another graph it will not be available. + +If you do not specify any graph IDs in the add operation, any Named +Operations/Views will instead be added to the federated store's cache. By doing +this anything Named will be resolved before forwarding to sub graphs meaning in +essence it is available to all sub graphs. + +!!! 
example "" + === "Add to a sub graph" + ```java + final AddNamedOperation addNamedOp = new AddNamedOperation.Builder() + .option(FederatedOperationHandler.OPT_GRAPH_IDS, "subGraph") + .name("NamedOperation") + .operationChain(new OperationChain.Builder() + .first(new GetAllElements()) + .build()) + .build(); + ``` + + === "Add to a federated store" + ```java + final AddNamedOperation addNamedOp = new AddNamedOperation.Builder() + .name("NamedOperation") + .operationChain(new OperationChain.Builder() + .first(new GetAllElements()) + .build()) + .build(); + ``` ## Schema Compatibility -When querying multiple graphs, the federated store will attempt to merge each graph's schema together. This means the schemas will need to be -compatible in order to query across them. Generally you will need to ensure -any shared groups can be merged correctly, a few examples of criteria to -consider are: +When querying multiple graphs, the federated store will attempt to merge each +graph's schema together. This means the schemas will need to be compatible in +order to query across them. Generally you will need to ensure any shared groups +can be merged correctly; a few examples of criteria to consider are: - Any properties in a shared group defined in both schemas need to have the same type and aggregation function. -- Any visibility properties need to be compatible or they will be removed from the - schema. +- If the visibility property has been defined differently in each schema it will + be removed from the merged schema. This does not affect the actual visibility + of the data as that will still be applied at the sub graph level. - Groups with different properties in each schema will be merged so the group has all the properties in the merged schema. - Any groupBy definitions need to be compatible or will be removed. +- If the vertex serialiser has been defined differently in each schema it will + be removed from the merged schema. 
diff --git a/docs/administration-guide/gaffer-stores/simple-federated/configuration.md b/docs/administration-guide/gaffer-stores/simple-federated/configuration.md index 3bd4dfc370..b958c267d0 100644 --- a/docs/administration-guide/gaffer-stores/simple-federated/configuration.md +++ b/docs/administration-guide/gaffer-stores/simple-federated/configuration.md @@ -43,6 +43,7 @@ specific to a federated store and their usage. | `gaffer.store.federated.default.graphIds` | `""` | The list of default graph IDs for if a user does not specify what graph(s) to run their query on. Takes a comma separated list of graph IDs e.g. `"graphID1,graphID2"` | | `gaffer.store.federated.allowPublicGraphs` | `true` | Are graphs with public access allowed to be added to this store. | | `gaffer.store.federated.default.aggregateElements` | `false` | Should queries aggregate returned Gaffer elements together using the binary operator for merging elements. False by default as it can be slower meaning results are just chained into one big list. | +| `gaffer.store.federated.graphCache.name` | `"federatedGraphCache_"` | The name of the cache that the federated store will store its graphs in. This allows sharing of graphs between different federated stores if the cache name is the same (and same default implementation). | | `gaffer.store.federated.merge.number.class` | `uk.gov.gchq.koryphe.impl.binaryoperator.Sum` | Default binary operator for merging [`Number`](https://docs.oracle.com/javase/8/docs/api/java/lang/Number.html) results (e.g. from a `Count` operation) from multiple graphs. | | `gaffer.store.federated.merge.string.class` | `uk.gov.gchq.koryphe.impl.binaryoperator.StringConcat` | Default binary operator for merging [`String`](https://docs.oracle.com/javase/8/docs/api/java/lang/String.html) results from multiple graphs. 
| | `gaffer.store.federated.merge.boolean.class` | `uk.gov.gchq.koryphe.impl.binaryoperator.And` | Default binary operator for merging [`Boolean`](https://docs.oracle.com/javase/8/docs/api/java/lang/Boolean.html) results from multiple graphs. | @@ -64,6 +65,11 @@ satisfy Java's [`BinaryOperator`](https://docs.oracle.com/javase/8/docs/api/java interface, you can then specify it using the property key for the data type you wish to use it for. +!!! note + Please note you currently can't choose a merge operator for operations that + return an `Iterable` type; they will always just be chained together (an + iterable of `Element`s is the exception, please see below). + ### The Default Element Merge Operator The default operator used to merge Gaffer elements is unique compared to the @@ -95,6 +101,12 @@ to the individual graph results, this means two results separately will satisfy the `View` but once aggregated they may not. - If you wish to write or use your own operator for merging elements the class must extend the [`ElementAggregateOperator`](https://github.com/gchq/Gaffer/blob/develop/store-implementation/simple-federated-store/src/main/java/uk/gov/gchq/gaffer/federated/simple/merge/operator/ElementAggregateOperator.java). +- If you have chosen in the schema to use a time sensitive aggregation function + (e.g. [`First`](../../../reference/binary-operators-guide/koryphe-operators.md#first)) + for a property that is in multiple sub graphs, you may end up with duplicates + in the result as the aggregator does not know which sub graph is first or + last. This means you may get duplicates of the same vertex but with different + properties in the result. ## Adding and Removing Graphs