This repository has been archived by the owner on Oct 19, 2020. It is now read-only.

handle documents with different fields #16

Open
kaem2111 opened this issue May 23, 2018 · 1 comment

kaem2111 commented May 23, 2018

This is an enhancement proposal.

When retrieving documents with different fields from an Elasticsearch index (e.g. index="metricbeat-*" query="*"), the first document determines the column names of the whole table. The content of later documents with other fields is not shown, because there is no corresponding column name.

The following modification inserts an additional first document containing all fields of all documents (with a _time value < 0 so that it can be filtered out later). The header fields are determined depending on the scan option:

  1. if scan=false, the columns are collected by looping through the full hits list
  2. if scan=true, the columns are extracted from an esclient.indices.get_field_mapping call
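
The two strategies above can be sketched without Splunk or a running Elasticsearch cluster. This is only an illustration under stated assumptions: `flatten`, `fields_from_hits` and `fields_from_mapping` are hypothetical helper names, the sample data is invented, and the mapping dictionary merely imitates the index / "mappings" / type / field nesting that the proposed code iterates over. Whether the plugin's own `_parse_hit` flattens nested fields into dotted names in the same way is also an assumption.

```python
def flatten(doc, prefix=""):
    """Flatten nested dicts into dotted field names (hypothetical helper)."""
    flat = {}
    for key, value in doc.items():
        name = prefix + key
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat


def fields_from_hits(hits):
    """scan=false case: union of field names over all hits already in memory."""
    names = set()
    for hit in hits:
        names.update(flatten(hit["_source"]))
    return sorted(names)


def fields_from_mapping(mapping_response):
    """scan=true case: field names from a get_field_mapping-shaped dict."""
    names = set()
    for index_info in mapping_response.values():
        for type_fields in index_info["mappings"].values():
            for field in type_fields:
                # Skip the ".keyword" sub-fields, as the proposal does.
                if not field.endswith(".keyword"):
                    names.add(field)
    return sorted(names)


# Invented sample data: two hits with different fields.
hits = [
    {"_source": {"beat": {"name": "host1"}, "system": {"load": {"1": 0.5}}}},
    {"_source": {"beat": {"name": "host1"}, "system": {"cpu": {"cores": 4}}}},
]
# Invented dict imitating the nesting the proposal loops over.
mapping_response = {
    "metricbeat-2018.05.23": {
        "mappings": {
            "doc": {
                "beat.name": {},
                "beat.name.keyword": {},
                "system.load.1": {},
            }
        }
    }
}

print(fields_from_hits(hits))
print(fields_from_mapping(mapping_response))
```

Collecting from the hits needs no extra request but only sees fields that occur in the returned documents, while the mapping lookup can list fields that no returned document actually contains.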

You can additionally control the display order of the columns with the fields parameter, e.g. fields="beat.name,system.load.*,beat.*" will show _time and beat.name first, then all system.load fields, and after that the remaining beat fields (without beat.name, of course).
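
The ordering rule can be illustrated with a small standalone sketch. `order_columns` is a hypothetical name and the field list is invented. Note that this sketch matches wildcards with `fnmatch.fnmatchcase` rather than the `re.match(m0.replace('*', '.*'), fld)` call used in the proposal, because `fnmatchcase` treats the dots literally and matches the whole name; it is a swapped-in technique, not the proposed code.

```python
from collections import OrderedDict
from fnmatch import fnmatchcase


def order_columns(available, patterns):
    """Order field names by the first pattern that matches them.

    Each field appears once, at the position of the first matching pattern;
    fields that match no pattern are left out.
    """
    head = OrderedDict()
    head["_time"] = ""  # _time always comes first
    for pattern in patterns or ["*"]:
        for field in sorted(available):
            if field not in head and fnmatchcase(field, pattern):
                head[field] = ""
    return list(head)


# Invented field names for illustration.
available = [
    "beat.version",
    "system.load.5",
    "beat.name",
    "system.load.1",
    "beat.hostname",
]
patterns = "beat.name,system.load.*,beat.*".split(",")
print(order_columns(available, patterns))
# ['_time', 'beat.name', 'system.load.1', 'system.load.5', 'beat.hostname', 'beat.version']
```

An insertion-ordered dictionary keeps the "first match wins" behaviour simple: a field already placed by an earlier pattern is skipped by later ones.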

Unfortunately I am not familiar with pull requests or GitHub development, so here is a code proposal (modify it as you like):

# KAEM BEGIN extension to get column names via get_field_mapping
#       if self.scan:  # does not work: scan is a string, so it is always truthy
        if self.scan in ["true", "True", 1]:
            head = OrderedDict()
            head["_time"] = -2
            f0 = config[KEY_CONFIG_FIELDS] or ['*']
            res = esclient.indices.get_field_mapping(index=config[KEY_CONFIG_INDEX], fields=f0)
            for nx in res:
                for ty in res[nx]["mappings"]:
                    for m0 in f0:
                        for fld in sorted(res[nx]["mappings"][ty]):
                            if fld in head: continue
                            if fld.endswith(".keyword"): continue
                            if re.match(m0.replace('*', '.*'), fld): head[fld]=""
            yield head
#KAEM END

            # Execute search
            res = helpers.scan(esclient, 
            ....
        else:
            res = esclient.search(index=config[KEY_CONFIG_INDEX],
                                  size=config[KEY_CONFIG_LIMIT],
                                  _source_include=config[KEY_CONFIG_FIELDS],
                                  doc_type=config[KEY_CONFIG_SOURCE_TYPE],
                                  body=body)

# KAEM BEGIN extension to get column names via hits scanning
            head = OrderedDict()
            head["_time"] = -1
            head0 = {}
            f0 = config[KEY_CONFIG_FIELDS] or ['*']
            for hit in res['hits']['hits']:
                for fld in self._parse_hit(config, hit): head0[fld] = ""
            for m0 in f0:
                for fld in sorted(head0):
                    if fld in head: continue
                    if re.match(m0.replace('*', '.*'), fld): head[fld] = head0[fld]
            head["_time"] = -1  # setup again, because overwritten by hits in meantime
            yield head
#KAEM END
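
The commented-out `if self.scan:` line in the proposal points at a general pitfall: an option that arrives as a string is truthy even when it reads "false". A minimal sketch of a more tolerant conversion follows; `parse_bool` is a hypothetical helper, not part of the plugin, and the set of accepted spellings is an assumption.

```python
def parse_bool(value):
    """Interpret a search-command option as a boolean.

    Any non-empty string is truthy in Python, so "false" must be compared
    explicitly instead of being tested with `if value:`.
    """
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() in ("true", "t", "1", "yes", "y")


print(bool("false"))        # True, which is why `if self.scan:` misfires
print(parse_bool("false"))  # False
print(parse_bool("True"))   # True
print(parse_bool(1))        # True
```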

brunotm (Owner) commented May 26, 2018

Hi @kaem2111,

I wasn't aware of this issue. I'll look into testing and adding your changes.

Thanks for tracking this :)
