Don't use term vectors #3

dsmiley · 2018-05-16T15:10:44Z

Lucene term vectors are a bit heavy for the way this plugin uses them. And why encode/decode the vector ordinal numbers as terms at all? Instead I propose the following:

Add a new special text field that has payloads enabled and no term vectors. This field will only ever index one nominal term, say the empty string or the single letter 'X' -- it doesn't matter which. Each vector ordinal (0 through 5, or however long the vector is) becomes a term position of this term for a document, and the payload at that position encodes the number as a 4-byte float. The home page of this plugin shows the numbers as dense, but this approach (and the term vector one) could easily be sparse. This would be somewhat slower than a custom BinaryDocValues (another implementation path), but it leverages Lucene more and is less custom, for whatever benefit that is (e.g. easier debuggability).
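To make the layout concrete, here is a rough sketch in plain Java with no Lucene dependency. The names VectorPayloadSketch, Posting, toPostings and dot are made up for illustration and are not part of the plugin or of Lucene. It only models what would be indexed: one nominal term, one position per vector ordinal, and a 4-byte float payload per position, with zero components skipped to show the sparse case. In a real implementation a TokenStream would emit these, and I believe Lucene's PayloadHelper.encodeFloat/decodeFloat could do the byte conversion.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the proposed index layout; not actual Lucene code.
class VectorPayloadSketch {
    static final String TERM = "X"; // the single nominal term

    // One occurrence of TERM: position = vector ordinal, payload = 4-byte float.
    static final class Posting {
        final String term;
        final int position;
        final byte[] payload;

        Posting(String term, int position, byte[] payload) {
            this.term = term;
            this.position = position;
            this.payload = payload;
        }
    }

    // Encode a float as 4 big-endian bytes.
    static byte[] encodeFloat(float value) {
        return ByteBuffer.allocate(4).putFloat(value).array();
    }

    static float decodeFloat(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getFloat();
    }

    // Turn a vector into postings, skipping zeros (the sparse case).
    static List<Posting> toPostings(float[] vector) {
        List<Posting> postings = new ArrayList<>();
        for (int ordinal = 0; ordinal < vector.length; ordinal++) {
            if (vector[ordinal] == 0f) {
                continue; // nothing indexed at this position
            }
            postings.add(new Posting(TERM, ordinal, encodeFloat(vector[ordinal])));
        }
        return postings;
    }

    // Dot product of the indexed vector with a query vector, reading payloads by position.
    static float dot(List<Posting> postings, float[] query) {
        float sum = 0f;
        for (Posting p : postings) {
            if (p.position < query.length) {
                sum += decodeFloat(p.payload) * query[p.position];
            }
        }
        return sum;
    }
}
```

Scoring then amounts to walking the positions of that one term for a document and decoding each payload, which is what dot() imitates here.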

Ideally a FieldType would be added to encapsulate the implementation details of analysis. It could even be used to query without adding any other top-level classes or plugins, since a FieldType works with most query parsers, including the default/standard/lucene one, and you can do some neat things this way, e.g. q=vecField:"0.1,4.75,0.3,1.2,0.7,4.0" (taken from the example).
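As a small illustration of the query side, the quoted text in that example would have to be turned back into floats somewhere inside such a FieldType. The sketch below is plain Java and the class and method names (VectorQueryTextSketch, parse) are hypothetical. It does not show how a Solr FieldType would actually hook this in, only the parsing step.

```java
// Hypothetical helper; a real FieldType would call something like this
// when building the query for a vector field.
class VectorQueryTextSketch {
    // Parse "0.1,4.75,0.3" into {0.1f, 4.75f, 0.3f}; the array index is the vector ordinal.
    static float[] parse(String text) {
        String[] parts = text.split(",");
        float[] vector = new float[parts.length];
        for (int i = 0; i < parts.length; i++) {
            vector[i] = Float.parseFloat(parts[i].trim());
        }
        return vector;
    }
}
```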
