handle documents with different fields #16

kaem2111 · 2018-05-23T18:13:09Z

This is an enhancement proposal.

When retrieving documents with different fields from an Elasticsearch index (e.g. index="metricbeat-*" query="*"), the first document determines the column names of the whole table. The contents of later documents with other fields are not shown, because there is no corresponding column name.

The following modification emits an additional first document containing all fields of all documents (with a _time value < 0 so that it can be filtered out later). The header fields are determined depending on the scan option:

if scan=false, the columns are collected by looping through the full hits list
if scan=true, the columns are extracted from an esclient.indices.get_field_mapping call
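
The scan=false case can be illustrated without an Elasticsearch connection. This is only a minimal sketch of the idea, not the app's code: `build_header` and the sample hits are hypothetical, and the hits are assumed to be already flattened into dot-separated field names.

```python
from collections import OrderedDict


def build_header(hits, sentinel=-1):
    """Return a synthetic header row naming every field seen in any hit.

    The negative _time value marks the row as synthetic so that it can
    be filtered out again further down the pipeline.
    """
    header = OrderedDict()
    header["_time"] = sentinel
    for hit in hits:
        for field in hit:
            if field != "_time":
                # Keep the first-seen order and an empty placeholder value.
                header.setdefault(field, "")
    return header


# Hypothetical sample data: the two documents share only some fields.
hits = [
    {"_time": 1527099189, "beat.name": "host1", "system.load.1": 0.5},
    {"_time": 1527099190, "beat.name": "host1", "system.cpu.user.pct": 0.12},
]

header = build_header(hits)
rows = [header] + hits  # the header row goes first, so it defines the columns

# Later on, the synthetic row is dropped again by its negative _time.
real_rows = [row for row in rows if row["_time"] >= 0]
```

Because the header row comes first and carries the union of all field names, a consumer that takes its columns from the first row sees every field.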

You can additionally control the display order of the columns with the fields parameter, e.g. fields="beat.name,system.load.*,beat.*" shows _time and beat.name first, then all system.load fields, and after that the remaining beat fields (without beat.name, of course).
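
The ordering rule can be sketched as follows. Note that this is a hypothetical helper (`order_columns`), and it uses `fnmatch.fnmatchcase` for the wildcard matching rather than the `re.match` approach in the proposal below, so treat it as an illustration under that assumption.

```python
from fnmatch import fnmatchcase


def order_columns(columns, patterns):
    """Order columns by the first pattern they match; _time always leads.

    Columns that match no pattern are left out.
    """
    ordered = ["_time"]
    for pattern in patterns:
        for col in sorted(columns):
            if col not in ordered and fnmatchcase(col, pattern):
                ordered.append(col)
    return ordered


# Hypothetical column names, deliberately out of order.
columns = [
    "beat.version", "system.load.5", "beat.name",
    "system.load.1", "_time", "beat.hostname",
]
patterns = "beat.name,system.load.*,beat.*".split(",")

print(order_columns(columns, patterns))
# → ['_time', 'beat.name', 'system.load.1', 'system.load.5', 'beat.hostname', 'beat.version']
```

Unlike `re.match(pattern.replace('*', '.*'), name)`, `fnmatchcase` treats the dots in a field name literally and has to match the whole name, so a pattern such as beat.name cannot accidentally match a longer field that merely starts with it.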

Unfortunately I am not familiar with pull requests/GitHub development, so here is a code proposal (modify it as you like):

# KAEM BEGIN extension to get column names via get_field_mapping
#       if self.scan:  # does not work: self.scan is a string, so it is always truthy
        if self.scan in ["true", "True", 1]: 
            head = OrderedDict()
            head["_time"] = -2
            f0 = config[KEY_CONFIG_FIELDS] or ['*']
            res = esclient.indices.get_field_mapping(index=config[KEY_CONFIG_INDEX], fields=f0)
            for nx in res:
                for ty in res[nx]["mappings"]:
                    for m0 in f0:
                        for fld in sorted(res[nx]["mappings"][ty]):
                            if fld in head: continue
                            if fld.endswith(".keyword"): continue
                            if re.match(m0.replace('*', '.*'), fld): head[fld]=""
            yield head
#KAEM END

            # Execute search
            res = helpers.scan(esclient, 
            ....
        else:
            res = esclient.search(index=config[KEY_CONFIG_INDEX],
                                  size=config[KEY_CONFIG_LIMIT],
                                  _source_include=config[KEY_CONFIG_FIELDS],
                                  doc_type=config[KEY_CONFIG_SOURCE_TYPE],
                                  body=body)

# KAEM BEGIN extension to get column names via hits scanning
            head = OrderedDict()
            head["_time"] = -1
            head0 = {}
            f0 = config[KEY_CONFIG_FIELDS] or ['*']
            for hit in res['hits']['hits']:
                for fld in self._parse_hit(config, hit): head0[fld] = ""
            for m0 in f0:
                for fld in sorted(head0):
                    if fld in head: continue
                    if re.match(m0.replace('*', '.*'), fld): head[fld] = head0[fld]
            head["_time"] = -1  # setup again, because overwritten by hits in meantime
            yield head
#KAEM END
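
As the comment on the scan check notes, the option arrives as a string, and any non-empty string is truthy in Python. One way to make the check explicit is a small parsing helper. The name `parse_bool` and the set of accepted spellings are assumptions for illustration, not part of the app:

```python
def parse_bool(value):
    """Interpret an option value such as "true", "False" or 1 as a boolean."""
    if isinstance(value, bool):
        return value
    # Normalise strings and numbers alike before comparing.
    return str(value).strip().lower() in ("true", "1")


# A plain truth test gets this wrong: the string "false" is non-empty.
naive = bool("false")        # True
parsed = parse_bool("false")  # False
```

With such a helper the condition could read `if parse_bool(self.scan):` instead of listing the accepted values inline.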


brunotm · 2018-05-26T13:54:39Z

Hi @kaem2111,

I wasn't aware of this issue. I'll look into testing and adding your changes.

Thanks for tracking this :)
