
Don't use term vectors #3

Open
dsmiley opened this issue May 16, 2018 · 0 comments


dsmiley commented May 16, 2018

Lucene term vectors are rather heavyweight for the way this plugin uses them. And why encode/decode the vector ordinals as terms at all? Instead, I propose the following:

Add a new special text field that has payloads enabled and no term vectors. This field only ever indexes one nominal term, say the empty string or the single letter 'X'; which one doesn't matter. Each vector ordinal (0 through 5, or however long the vector is) becomes a term position of this term for a document, and the payload at that position encodes the number as a 4-byte float. The home page of this plugin shows the vectors as dense, but this approach (like the term vector one) could easily support sparse vectors, since ordinals without a value are simply not indexed. This would be somewhat slower than a custom BinaryDocValues field (another implementation path), but it leverages Lucene more and is less custom, for whatever benefit that is (e.g. easier debuggability).
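To make the proposed layout concrete, here is a minimal, dependency-free sketch of what the analysis chain would emit. It assumes the nominal term is "X", that a token's position increment is the gap from the previous ordinal (so the first token at ordinal 0 has increment 1, as in Lucene), and that the float is stored as 4 big-endian bytes. The class, record, and method names are hypothetical; in a real plugin this logic would live in a Lucene TokenStream setting the CharTermAttribute, PositionIncrementAttribute, and PayloadAttribute, and Lucene's own payload helper utilities could replace the ByteBuffer calls used here.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical model of the proposed scheme: one nominal term indexed once per
 * vector ordinal, where the position is the ordinal and the 4-byte payload is
 * the float value. Plain Java objects stand in for Lucene token attributes.
 */
public class PayloadVectorSketch {
    static final String NOMINAL_TERM = "X"; // any single term would do

    /** One emitted token: term text, position increment, and payload bytes. */
    record Token(String term, int positionIncrement, byte[] payload) {}

    /** Encodes a float as a 4-byte big-endian payload. */
    static byte[] encodeFloat(float value) {
        return ByteBuffer.allocate(Float.BYTES).putFloat(value).array();
    }

    /** Decodes a 4-byte big-endian payload back into a float. */
    static float decodeFloat(byte[] payload) {
        return ByteBuffer.wrap(payload).getFloat();
    }

    /**
     * Sparse form: ordinals must be strictly increasing. Each increment is the
     * gap from the previous ordinal, so missing ordinals produce no token.
     */
    static List<Token> tokens(int[] ordinals, float[] values) {
        if (ordinals.length != values.length) {
            throw new IllegalArgumentException("ordinals and values differ in length");
        }
        List<Token> out = new ArrayList<>();
        int prev = -1; // like Lucene: the first increment of 1 lands on position 0
        for (int i = 0; i < ordinals.length; i++) {
            if (ordinals[i] <= prev) {
                throw new IllegalArgumentException("ordinals must be strictly increasing");
            }
            out.add(new Token(NOMINAL_TERM, ordinals[i] - prev, encodeFloat(values[i])));
            prev = ordinals[i];
        }
        return out;
    }

    /** Dense form: every ordinal 0..n-1 gets a token with increment 1. */
    static List<Token> tokens(float[] dense) {
        int[] ords = new int[dense.length];
        for (int i = 0; i < ords.length; i++) {
            ords[i] = i;
        }
        return tokens(ords, dense);
    }

    /** Replays the increments to recover ordinal -> value; absent ordinals stay 0. */
    static float[] decode(List<Token> tokens, int dimension) {
        float[] v = new float[dimension];
        int pos = -1;
        for (Token t : tokens) {
            pos += t.positionIncrement();
            v[pos] = decodeFloat(t.payload());
        }
        return v;
    }

    public static void main(String[] args) {
        float[] vec = {0.1f, 4.75f, 0.3f, 1.2f, 0.7f, 4.0f};
        List<Token> toks = tokens(vec);
        System.out.println(toks.size() + " tokens");
        System.out.println(Arrays.toString(decode(toks, vec.length)));
    }
}
```

The sparse overload shows why the approach needs no extra machinery for sparse vectors: a gap in ordinals is just a larger position increment, and the decoder fills the skipped ordinals with zero (an assumption about how absent values should be treated).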

Ideally a FieldType would be added to encapsulate the analysis implementation details. It could even be used for querying without adding any other top-level classes or plugins, since a FieldType works with most query parsers, including the default standard/lucene one, and that enables some neat things, e.g. q=vecField:"0.1,4.75,0.3,1.2,0.7,4.0" (taken from the example).
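A FieldType handling that query would first have to turn the raw text into a vector. The sketch below shows one way to do that for the comma-separated syntax in the example; the class and method names are hypothetical, and in Solr something like this might be called from the field type's getFieldQuery override, with the resulting query and scoring left out here.

```java
import java.util.Arrays;

/**
 * Hypothetical parser a vector FieldType could use for text such as
 * "0.1,4.75,0.3,1.2,0.7,4.0". Position in the list is the vector ordinal.
 */
public class VectorTextParser {
    static float[] parse(String text) {
        if (text == null || text.isBlank()) {
            throw new IllegalArgumentException("empty vector");
        }
        // limit -1 keeps trailing empty strings, so "1,2," is rejected below
        String[] parts = text.split(",", -1);
        float[] v = new float[parts.length];
        for (int i = 0; i < parts.length; i++) {
            String p = parts[i].trim();
            if (p.isEmpty()) {
                throw new IllegalArgumentException("missing value at ordinal " + i);
            }
            float f = Float.parseFloat(p); // NumberFormatException on malformed input
            if (!Float.isFinite(f)) {
                throw new IllegalArgumentException("non-finite value at ordinal " + i);
            }
            v[i] = f;
        }
        return v;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("0.1,4.75,0.3,1.2,0.7,4.0")));
    }
}
```

Because the same parser could serve both indexing and querying, the field value at index time and the query text would share one syntax, which is part of the appeal of keeping everything inside a FieldType.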
