Tabdata commands

csv2td

NAME

csv2td - Transform CSV to tabular data format.

DESCRIPTION

Read CSV data on STDIN. Output tabular data to STDOUT.

OPTIONS

Any option which Text::CSV(3pm) takes. See Text::CSV->known_attributes for an extensive list. Example:

csv2td --sep=';' --blank-is-undef=0 --binary

becomes:

Text::CSV->new({sep=>";", blank_is_undef=>0, binary=>1})
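
A complete invocation might look like this (data.csv and data.td are hypothetical file names for a semicolon-separated input and the tabdata output):

csv2td --sep=';' < data.csv > data.td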

F. A. Q.

Why is there no td2csv?

Why would you go back to ugly CSV when you have nice shiny Tabdata?

SEE ALSO

csv2(1), mrkv2td(1)

kvpairs2td

NAME

kvpairs2td - Transform lines of key-value pairs to tabular data stream

OPTIONS

  • -i, --ignore-non-existing-columns

    Do not fail when a new field is encountered after the first record.

  • -w, --warn-non-existing-columns

    Only warn when a new field is encountered after the first record; do not fail.

  • -c, --column COLUMN

    Indicate that there will be a column by the name COLUMN. This is useful if the first record does not have COLUMN. This option is repeatable.

  • -r, --restcolumn NAME

    Name of the column where the part of the input line which is not key-value pairs will be put. Default is _REST.

  • -u, --unknown-to-rest

    Put unknown (non-existing) fields in the "rest" column (see -r option).
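
EXAMPLE

A minimal sketch, assuming whitespace-separated key=value pairs on each input line (the keys and values here are made up):

printf 'user=alice uid=1000\nuser=bob uid=1001\n' | kvpairs2td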

SEE ALSO

td2mrkv(1), td2kvpairs(1)

mrkv2td

NAME

mrkv2td - Transform multi-record key-value (MRKV) stream to tabular data format.

DESCRIPTION

As the tabular data format presents field names at the start of the transmission, mrkv2td(1) infers them only from the first record, so there is no need to buffer the whole dataset to find all fields; it's usual for all records to have all fields anyway.

OPTIONS

  • -s, --separator REGEXP

    Regexp which separates the field name from the cell data in the MRKV stream. Default is TAB (\t).

  • -g, --multiline-glue STRING
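
EXAMPLE

A round trip through td2mrkv(1) and back should reproduce the original stream (a sketch, assuming the default TAB separator and no field content that needs the multiline glue):

getent passwd | tr : "\t" | td-add-headers USER PW UID GID GECOS HOME SHELL | td2mrkv | mrkv2td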

SEE ALSO

td2mrkv(1)

td2html

NAME

td2html - Transform tabular data stream into an HTML table.

SYNOPSIS

td2html

DESCRIPTION

Takes a tabular data stream on STDIN and outputs an HTML table enclosed in <table>...</table> tags.
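
EXAMPLE

For instance, to render a directory listing as an HTML table fragment (listing.html is a hypothetical output file):

ls -l | td-trans-ls | td2html > listing.html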

td2kvpairs

NAME

td2kvpairs - Transform tabular data into key-value pairs

OPTIONS

  • -r, --prefix-field NAME

    Put this field's content before the list of key-value pairs. Default is _REST. The prefix and the key-value pairs are separated by a space character, if there is any prefix.
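
EXAMPLE

A sketch of turning tabular data back into key-value pairs (the column names follow the td2mrkv(1) example elsewhere on this page):

getent passwd | tr : "\t" | td-add-headers USER PW UID GID GECOS HOME SHELL | td-select USER UID SHELL | td2kvpairs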

SEE ALSO

td2mrkv(1), kvpairs2td(1)

td2mrkv

NAME

td2mrkv - Transform tabular data into multi-record key-value (MRKV) format.

OPTIONS

  • -s, --separator STR

    String to separate field name from content. Default is TAB (\t).

EXAMPLE

getent passwd | tr : "\t" | td-add-headers USER PW UID GID GECOS HOME SHELL | td-select +ALL -PW | td2mrkv

SEE ALSO

mrkv2td(1), td2html(1)

td-add-headers

NAME

td-add-headers - Add headers to the tabular data stream and pass through the rows.

SYNOPSIS

td-add-headers COLNAME_1 COLNAME_2 ...

DESCRIPTION

Add a header row to the tabular data stream. Header names will be the ones specified in the command-line arguments, assigned to the columns from the left one by one.

If there are more fields in the first data row, then additional columns will be added with names like "COL4", "COL5", etc., derived from the column's index number counting from 1. This may be prevented by the --no-extra-columns option.

OPTIONS

  • -x, --extra-columns

    Give a name also to those columns which are not given a name in the command parameters.

  • -X, --no-extra-columns

    Do not add more columns than specified in the command parameters.

EXAMPLE

who | td-trans | td-add-headers USER TTY DATE TIME COMMENT

td-alter

NAME

td-alter - Add new columns and fields to tabular data stream, and modify value of existing fields.

USAGE

td-alter COLUMN=EXPR [COLUMN=EXPR [COLUMN=EXPR [...]]]

DESCRIPTION

On each data row, sets the field in COLUMN to the value resulting from the EXPR Perl expression.

In EXPR, you may refer to other fields by $F{NAME}, where NAME is the column name, or by $F[INDEX], where INDEX is the 0-based column index number. Furthermore, you may refer to uppercase alphanumeric field names simply by the bareword COLUMN, enclosed in parentheses like (COLUMN) to avoid parsing ambiguity in Perl. This is possible because these column names are set up as subroutines internally.

The topic variable ($_) is initially set to the current value of COLUMN in EXPR. So, for example, N='-$_' makes the field N the negative of itself.

You can create new columns simply by referring to a COLUMN name that does not exist yet. You can refer to an earlier defined COLUMN in subsequent EXPR expressions.

EXAMPLES

Add new columns: TYPE and IS_BIGFILE. IS_BIGFILE depends on the previously defined TYPE field.

ls -l | td-trans-ls | td-alter TYPE='substr MODE,0,1' IS_BIGFILE='SIZE>10000000 && TYPE ne "d" ? "yes" : "no"'

Strip sub-seconds and timezone from DATETIME field:

TIME_STYLE=full-iso ls -l | td-trans-ls | td-alter DATETIME='s/\..*//; $_'

OPTIONS

  • -H, --no-header

    do not show headers

  • -h, --header

    show headers (default)

REFERENCES

"Alter" in td-alter comes from SQL. td-alter(1) can change the "table" column layout. But contrary to SQL's ALTER TABLE, td-alter(1) can modify the records too, so akin to SQL UPDATE as well.

td-collapse

NAME

td-collapse - Collapse multiple tabular data records with equivalent keys into one.

SYNOPSIS

td-collapse [OPTIONS]

DESCRIPTION

It goes row-by-row on a sorted tabular data stream, and if the first (key) cell of 2 or more subsequent rows is the same, collapses them into one row. This is done by joining the corresponding cells' data from each row into one cell, effectively keeping every column's data in the same column.

If you want to group by another column, not the first one, first reorder the columns with td-select(1), e.g. td-select KEYCOLUMN +REST.

OPTIONS

  • -g, --glue STR

    Delimiter character or string between joined cell data. Default is space.

  • -u, --distribute-unique-field FIELD

    Take the FIELD column's cells from the first collapsed group and multiply all other columns as many times as there are rows in this group, so that each cell goes under a new column corresponding to that cell's original row. The FIELD column's cells need to be unique within each group.

    If an unexpected value is found while processing the 2nd row group onwards, i.e. a value which was not present in the first group, it won't be distributed into a new column (since the header has already been sent) but is left in the original column, just as if the -u option were not in effect. See "pause" and "resume" in the example below.

    Example:

      ID | EVENT  | TIME  | STATUS
      15 | start  | 10:00 |
      15 | end    | 10:05 | ok
      16 | start  | 11:00 |
      16 | end    | 11:06 | err
      16 | pause  | 11:04 |
      16 | resume | 11:05 |
      
      td-collapse -u EVENT
      
      COUNT | ID | EVENT        | TIME        | TIME_start | TIME_end | STATUS | STATUS_start | STATUS_end
      2     | 15 |              |             | 10:00      | 10:05    |        |              | ok
      4     | 16 | pause resume | 11:04 11:05 | 11:00      | 11:06    |        |              | err
    
  • -s, --distributed-column-name-separator STR

    When generating new columns as described at the -u option, join the original column name with each of the unique field's values by the string STR. See the example at the -u option description. Default is underscore (_).

EXAMPLES

This pipeline shows which users are using each of the configured default shells, grouped by shell path.

# get the list of users
getent passwd |\

# transform into tabular data stream
tr : "\t" |\
td-add-headers USER X UID GID GECOS HOME SHELL |\

# put the shell in the first column, and sort, then collapse
td-select SHELL USER | td-keepheader sort | td-collapse -g ' ' |\

# change header name "USER" to "USERS"
td-alter USERS=USER | td-select +ALL -USER

Output:

| COUNT | SHELL             | USERS                                        |
| 4     | /bin/bash         | user1 user2 nova root                        |
| 5     | /bin/false        | fetchmail hplip sddm speech-dispatcher sstpc |
| 1     | /bin/sync         | sync                                         |
| 1     | /sbin/rebootlogon | reboot                                       |
| 6     | /usr/sbin/nologin | _apt avahi avahi-autoipd backup bin daemon   |

CAVEATS

Input data has to be sorted first.

Group key is always the first input column.

If a row in the input data has more cells than the number of columns, those are ignored.

SEE ALSO

td-expand(1) is a kind of an inverse to td-collapse(1).

REFERENCES

td-collapse(1) roughly translates to SELECT COUNT(*) + GROUP_CONCAT() + GROUP BY in SQL.

td-disamb-headers

NAME

td-disamb-headers - Disambiguate headers in tabular data

DESCRIPTION

Change column names in the input tabular data stream by appending a sequential number to duplicated column names. The first occurrence is kept as-is. If a particular column name already ends with an integer, it gets incremented.

EXAMPLE

echo "PID     PID     PID2    PID2    USER    CMD" | td-disamb-headers

Output:

PID   PID3    PID2    PID4    USER    CMD

td-expand

NAME

td-expand - Generate multiple rows from each one row in a Tabular data stream.

SYNOPSIS

td-expand [-f FIELD] [-s SEPARATOR]

DESCRIPTION

It goes row-by-row, splits the given FIELD at the SEPARATOR chars, creates as many output rows as there are parts FIELD is split into, fills the FIELD column in each row with one of the parts, and fills all other columns in all resulting rows with the corresponding column's data from the input.

More illustratively:

| SHELL       | USERS         |
| /bin/bash   | user1 user2   |
| /bin/dash   | user3 user4   |
| /bin/sh     | root          |

td-expand -f USERS -s ' ' | td-alter USER=USERS | td-select +ALL -USERS

| SHELL       | USER          |
| /bin/bash   | user1         |
| /bin/bash   | user2         |
| /bin/dash   | user3         |
| /bin/dash   | user4         |
| /bin/sh     | root          |

OPTIONS

  • -f, --field FIELD

    Which field to break up. Default is the first one.

  • -s, --separator PATTERN

    Regexp pattern to split FIELD at. Default is space.

SEE ALSO

td-collapse(1) is a kind of inverse to td-expand(1).

td-filter

NAME

td-filter - Show only those records from the input tabular data stream which match the conditions.

USAGE

td-filter [OPTIONS] [--] COLUMN OPERATOR R-VALUE [[or] COLUMN OPERATOR R-VALUE [[or] ...]]

td-filter [OPTIONS] --perl EXPR

DESCRIPTION

Pass through those records which match at least one of the conditions (inclusive OR). A condition consists of a triplet of COLUMN, OPERATOR, and R-VALUE. You may put together conditions conjunctively (AND) by chaining multiple td-filter(1) commands by shell pipes. Example:

td-filter NAME eq john NAME eq jacob | td-filter AGE -gt 18

This gives the records with either john or jacob, and all of them will be above 18.

The optional word "or" between triplets makes your code more explicit.

In the second form, td-filter(1) evaluates the Perl expression and passes through records only if the result is truthy in Perl (non-zero, non-empty string, etc.). Each field's value is in @F by index and in %F by column name. You can implement more complex conditions this way.
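
For example, with the Perl expression form (the AGE and NAME columns here are illustrative):

td-filter --perl '$F{AGE} > 18 and $F{NAME} =~ /^jo/'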

OPTIONS

  • -H, --no-header

    do not show headers

  • -h, --header

    show headers (default)

  • -i, --ignore-non-existing-columns

    do not treat non-existing (missing or typo) column names as failure

  • -w, --warn-non-existing-columns

    only show warning on non-existing (missing or typo) column names, but don't fail

  • -N, --no-fail-non-numeric

    do not fail when a non-numeric r-value is given to a numeric operator

  • -W, --no-warn-non-numeric

    do not show warning when a non-numeric r-value is given to a numeric operator

OPERATORS

These operators are supported; the semantics are the same as in Perl, see perlop(1).

== != <= >= < > =~ !~ eq ne gt lt

For your convenience, so you don't have to bother with escaping, you may also use these operators as alternatives to the canonical ones above:

  • is

  • = (single equal sign)

    string equality (eq)

  • is not

    string inequality (ne)

  • -eq

    numeric equality (==)

  • -ne

    numeric inequality (!=)

  • <>

    numeric inequality (!=)

  • -gt

    numeric greater than (>)

  • -ge

    numeric greater or equal (>=)

  • -lt

    numeric less than (<)

  • -le

    numeric less or equal (<=)

  • match

  • matches

    regexp match (=~)

  • does not match

  • do not match

  • not match

    negated regexp match (!~)

Other operators:

  • is [not] one of

  • is [not] any of

    R-VALUE is split into pieces by commas (,) and equality to at least one of them is required. Equality to none of them is required if the operator is negated.

  • contains [whole word]

    Substring match. Plural form "contain" is also accepted. Optional whole word is a literal part of the operator.

  • contains [one | any] [whole word] of

    Similar to is one of, but substring match is checked instead of full string equality. Plural form "contain" is also accepted. Optional whole word is a literal part of the operator.

  • ends with

  • starts with

    Plural forms are also accepted.

Operators may be preceded by not, does not, or do not to negate their effect.
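
For instance, using some of the verbose operators (the NAME and SIZE columns are those produced by td-trans-ls(1); the suffix and threshold are arbitrary):

ls -l | td-trans-ls | td-filter NAME ends with .txt or SIZE -gt 1000000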

CAVEATS

If there is no COLUMN column in the input data, it is silently considered empty. td-filter(1) does not need R-VALUE to be quoted or escaped, although your shell may require it.

REFERENCES

td-filter(1) is analogous to SQL WHERE.

td-gnuplot

NAME

td-gnuplot - Graph tabular data using gnuplot(1)

USAGE

td-gnuplot [OPTIONS]

DESCRIPTION

Invoke gnuplot(1) to graph the data represented in Tabular data format on STDIN. The first column is the X axis, the rest of the columns are data lines.

Default is to output an ascii-art chart to the terminal ("dumb" output in gnuplot).

td-gnuplot guesses the data format from the column names. If the 0th column matches "date" or "time" (case-insensitively), the X axis will be a time axis. If the 0th column matches "time", a unix epoch timestamp is assumed. Otherwise specify the date/time format in use with e.g. the --timefmt=%Y-%m-%d option.

Plot data read from STDIN is buffered in a temp file (provided by File::Temp->new(TMPDIR=>1) and immediately unlinked, so no leftover file remains), because gnuplot(1) needs to seek in it when plotting more than 1 data series.

OPTIONS

  • -i

    Output an image (PNG) to STDOUT, instead of drawing to the terminal.

  • -d

    Let gnuplot(1) decide the output medium, instead of drawing to the terminal.

  • --SETTING

  • --SETTING=VALUE

    Set any gnuplot setting, optionally setting its value to VALUE. SETTING is a setting name as used in set ... gnuplot commands, except with spaces replaced by dashes. VALUE is always passed to gnuplot enclosed in double quotes. Examples:

      --format-x="%Y %b"
      --xtics-rotate-by=-90
      --style-data-lines
    

    Gnuplot equivalent command:

      set format x "%Y %b"
      set xtics rotate by "-90"
      set style data lines
    
  • -e COMMAND

    Pass arbitrary gnuplot commands to gnuplot. This option may be repeated. These are passed to gnuplot(1) on the command line (via its -e option) after td-gnuplot(1)'s own sequence of gnuplot setup commands and after the --SETTING settings are applied, so you can override them.
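
EXAMPLE

A sketch of plotting a time series to a PNG file (mem.td is a hypothetical tabdata file whose first column is named TIME and holds unix epoch timestamps):

td-gnuplot -i --style-data-lines < mem.td > mem.png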

td-keepheader

NAME

td-keepheader - Plug a non header-aware program in the tabular-data processing pipeline

USAGE

td-keepheader [--] COMMAND [ARGS]

EXAMPLE

ls -l | td-trans-ls | td-select NAME +REST | td-keepheader sort | tabularize

td-lpstat

NAME

td-lpstat - lpstat(1) wrapper to output printers status in Tabular Data format

td-ls

NAME

td-ls - ls(1)-like file list but more machine-parseable

SYNOPSIS

td-ls [OPTIONS] [PATHS] [-- FIND-OPTIONS]

OPTIONS, ls(1)-compatible

  • -A, --almost-all
  • -g
  • -G, --no-group
  • -i, --inode
  • -l (implied)
  • -n, --numeric-uid-gid
  • -o
  • --time=[atime, access, use, ctime, status, birth, creation, mtime, modification]
  • -R, --recursive
  • -U (implied, pipe to sort(1) if you want)

OPTIONS, not ls(1)-compatible

  • --devnum

  • -H, --no-header

  • --no-symlink-target

  • --add-field FIELD-NAME

    Add extra fields by name. See the valid field names with the --help-field-names option. May be given multiple times.

  • --add-field-macro FORMAT

    Add extra fields by a find(1)-style format specification. For valid FORMATs, see the -printf section in find(1). May be given multiple times. Putting \\0 (backslash-zero) in FORMAT screws up the output; don't do that.

  • --help-field-names

    Show valid field names to be used for --add-field option.

DESCRIPTION

Columns are similar to good old ls(1): PERMS (symbolic representation), LINKS, USERNAME (USERID if the -n option is given), GROUPNAME (GROUPID if the -n option is given), SIZE (in bytes), a time field which is either ATIME, CTIME, or by default MTIME (in full-iso format), BASENAME (or RELPATH in --recursive mode), and SYMLINKTARGET (unless the --no-symlink-target option is given).

Column names are a bit different from those td-trans-ls(1) produces, but this is intentional, because the fields of these 2 tools have slightly different meanings. td-trans-ls(1) is less smart, because it just transforms ls(1)'s output and does not always know exactly what is in the input; while td-ls(1) itself controls what data goes to the output.

No color support.
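
EXAMPLE

A sketch combining td-ls(1) with other tabdata tools (the path and size threshold are arbitrary):

td-ls --recursive /etc | td-filter SIZE -gt 100000 | td-select RELPATH SIZE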

FORMAT

Output format is tabular data: a table, in which fields are delimited by TAB and records by newline (LF).

Meta chars may occur in some fields (path, filename, symlink target, etc.); these are escaped in this (perl-compatible) way:

| Raw char  | Substituted to |
|-----------|----------------|
| ESC       | \e             |
| TAB       | \t             |
| LF        | \n             |
| CR        | \r             |
| Backslash | \\             |

Other control chars (charcode below 32 in ASCII) including NUL, vertical-tab, and form-feed are left as-is.

ENVIRONMENT

  • TIME_STYLE

    TIME_STYLE is ignored, as is the --time-style option. Date-time is always shown in %F %T %z strftime(3) format (equivalent to TIME_STYLE=full-iso). It's simply the most superior.

SEE ALSO

td-select(1), td-filter(1), td-trans-ls(1)

td-pivot

NAME

td-pivot - Switch columns for rows in tabular data

SYNOPSIS

td-pivot

CAVEAT

Must read and buffer the whole STDIN before outputting any data, so it is impractical on large data.

td-ps

td-rename

NAME

td-rename - Rename tabular data columns

USAGE

td-rename OLDNAME NEWNAME [OLDNAME NEWNAME [OLDNAME NEWNAME [...]]]

EXAMPLE

conntrack -L | sd '^(\S+)\s+(\S+)\s+(\S+)' 'protoname=$1 protonum=$2 timeout=$3' | kvpairs2td | td-rename _REST FLAGS

SEE ALSO

Not to be confused with rename.td(1), which renames files, not columns.

td-select

NAME

td-select - Show only the specified columns from the input tabular data stream.

USAGE

td-select [OPTIONS] [--] [-]COLUMN [[-]COLUMN [...]]

OPTIONS

  • -H, --no-header

    do not show headers

  • -h, --header

    show headers (default)

  • -i, --ignore-non-existing-columns

    do not treat non-existing (missing or typo) column names as failure

  • -w, --warn-non-existing-columns

    only show warning on non-existing (missing or typo) column names, but don't fail

  • --strict-columns

    warn and fail on non-existing (missing or typo) column names given in the parameters, even if prefixed with a hyphen, i.e. when the user wants to remove the named column from the output.

DESCRIPTION

COLUMN is either a column name, or one of these special keywords:

  • +ALL

    all columns

  • +REST

    the rest of the columns not yet given in the parameter list

COLUMN is optionally prefixed with minus (-), in which case the given column will not be shown, i.e. it is removed from the shown columns.

So if you want to show all columns except one or two:

td-select +ALL -PASSWD

If you want to put a given column (say "KEY") in the first place and leave the others intact:

td-select KEY +REST

EXAMPLE

ls -l | td-trans-ls | td-select -- NAME +REST -INODE -LINKS -MAJOR -MINOR

REFERENCES

"Select" in td-select comes from SQL. Similarly to SQL, td-select(1) is to choose some of the columns and return them in the given order.

td-sort

NAME

td-sort - Sort tabular data by the columns given by name

USAGE

td-sort OPTIONS

OPTIONS

All those accepted by sort(1), except that you don't need to refer to columns by ordinal number; you may refer to them by name.

  • -k, --key=KEYDEF

    sort(1) defines KEYDEF as F[.C][OPTS][,F[.C][OPTS]], where F is the (1-based) field number. However, with td-sort(1) you may refer to fields by name. But since F no longer consists only of digits but is an arbitrary string, it may be ambiguous where the name ends. So you may enclose it in round/square/curly/angle brackets. Choose the kind which does not occur in the column name.

    You don't even need to type -k, because a lone COLUMN-NAME is interpreted as "-k F", where F is the corresponding field number.
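
EXAMPLE

A couple of sketches, assuming a SIZE column as produced by td-trans-ls(1):

ls -l | td-trans-ls | td-sort SIZE

ls -l | td-trans-ls | td-sort -k '[SIZE]rn'

The second form sorts numerically in reverse; the brackets mark where the column name ends so the rn sort options can follow.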

REFERENCES

td-sort(1) is analogous to SQL ORDER BY.

td-trans

NAME

td-trans - Transform whitespace-delimited lines into TAB-delimited lines, ignoring surrounding whitespace.

OPTIONS

  • -m, --max-columns NUM

    Maximum number of columns. The NUM-th column may contain any whitespace. By default it's the number of fields in the header (first line).

td-trans-fixcol

NAME

td-trans-fixcol - Transform a table-looking text, aligned to fixed columns by spaces, into tabular data.

DESCRIPTION

The first line is the header, consisting of the column names. Each field's text must start in the same terminal column as the column name.

OPTIONS

  • -m, --min-column-spacing NUM

    Minimum spacing between columns. Default is 2. This allows column names in the input data to contain single spaces.

EXAMPLE

arp -n | td-trans-fixcol

td-trans-group

td-trans-gshadow

td-trans-ls

NAME

td-trans-ls - Transform ls(1) output into a fixed number of TAB-delimited columns.

USAGE

ls -l | td-trans-ls

DETAILS

Supported ls(1) options which affect its output format:

  • --human-readable
  • --inode
  • --recursive
  • --time-style={iso,long-iso,full-iso}

Unsupported options:

  • --author
  • -g
  • -o
  • --time-style=locale

td-trans-mount

NAME

td-trans-mount - Transform mount(1) output to tabular data stream.

DESCRIPTION

Supported mount(1) options which affect output format:

  • -l (show labels)

EXAMPLES

mount | td-trans-mount

mount -l | td-trans-mount

td-trans-passwd

td-trans-shadow

vcf2td

NAME

vcf2td - Transform VCF to tabular data format.

OPTIONS

  • -c, --column COLUMN

    Indicate that there will be a column by the name COLUMN. Useful if the first record does not contain all fields which otherwise occur in the data stream. By default, vcf2td(1) recognizes the fields which are in the first record of the VCF input and does not read ahead more records before sending the header. This option is repeatable.

  • -i, --ignore-non-existing-columns

    Don't fail and don't warn when encountering new field names.

    The tabular data format declares all of the field names in the column header, so it can not introduce new columns later on in the data stream (unless some records were buffered, which is currently not done). However, in VCF each record may have fields different from the first record. That's why vcf2td(1) fails by default if it encounters a field it can not convert to tabular.

  • -w, --warn-non-existing-columns

    Only warn on new fields, but don't fail.

  • -g, --multivalue-glue STR

    A string to glue repeated fields' values together when the repeated fields are handled by uniting their content into one tabdata column. Default is newline.

    Note: even though newline is the default glue, if you want to be explicit about it (or want to set another glue STR which is often expressed by some backslash sequence), vcf2td -g "\n" ... probably won't work as one may expect (depending on one's shell), because the shell passes the 2-character string "backslash" + "n" instead of a string consisting of just 1 newline char. So, in bash, put it as vcf2td -g $'\n' ....
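
EXAMPLE

A sketch, assuming the vCard fields FN, EMAIL, and TEL occur in the input (contacts.vcf is a hypothetical file):

vcf2td -c EMAIL -c TEL < contacts.vcf | td-select FN EMAIL TEL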

COMMON vCard FIELDS

  • N

    N holds a contact's name, with its parts separated by semicolons (;). vcf2td(1) simplifies the N field by removing excess semicolons. If you need one or more name parts precisely, request the N.family, N.given, N.middle, N.prefixes fields with the -c option, but this name-partitioning scheme is not very useful internationally; use the FN (full name) field for persons' names as much as you can.