Skip to content

Latest commit

 

History

History
602 lines (497 loc) · 14.4 KB

README.md

File metadata and controls

602 lines (497 loc) · 14.4 KB

go-tablib: format-agnostic tabular dataset library

MIT License Go Documentation Go Report Card Build Status

Go-Tablib is a format-agnostic tabular dataset library, written in Go. This is a port of the famous Python's tablib by Kenneth Reitz with some new features.

Export formats supported:

  • JSON (Sets + Books)
  • YAML (Sets + Books)
  • XLSX (Sets + Books)
  • XML (Sets + Books)
  • TSV (Sets)
  • CSV (Sets)
  • ASCII + Markdown (Sets)
  • MySQL (Sets)
  • Postgres (Sets)

Loading formats supported:

  • JSON (Sets + Books)
  • YAML (Sets + Books)
  • XML (Sets)
  • CSV (Sets)
  • TSV (Sets)

Overview

tablib.Dataset

A Dataset is a table of tabular data. It must have a header row. Datasets can be exported to JSON, YAML, CSV, TSV, and XML. They can be filtered, sorted and validated against constraint on columns.

tablib.Databook

A Databook is a set of Datasets. The most common form of a Databook is an Excel file with multiple spreadsheets. Databooks can be exported to JSON, YAML and XML.

tablib.Exportable

An exportable is a struct that holds a buffer representing the Databook or Dataset after it has been formated to any of the supported export formats. At this point the Datbook or Dataset cannot be modified anymore, but it can be returned as a string, a []byte or written to a io.Writer or a file.

Usage

Creates a dataset and populate it:

ds := NewDataset([]string{"firstName", "lastName"})

Add new rows:

ds.Append([]interface{}{"John", "Adams"})
ds.AppendValues("George", "Washington")

Add new columns:

ds.AppendColumn("age", []interface{}{90, 67})
ds.AppendColumnValues("sex", "male", "male")

Add a dynamic column, by passing a function which has access to the current row, and must return a value:

func lastNameLen(row []interface{}) interface{} {
	return len(row[1].(string))
}
ds.AppendDynamicColumn("lastName length", lastNameLen)
ds.CSV()
// >>
// firstName, lastName, age, sex, lastName length
// John, Adams, 90, male, 5
// George, Washington, 67, male, 10

Delete rows:

ds.DeleteRow(1) // starts at 0

Delete columns:

ds.DeleteColumn("sex")

Get a row or multiple rows:

row, _ := ds.Row(0)
fmt.Println(row["firstName"]) // George

rows, _ := ds.Rows(0, 1)
fmt.Println(rows[0]["firstName"]) // George
fmt.Println(rows[1]["firstName"]) // Thomas

Slice a Dataset:

newDs, _ := ds.Slice(1, 5) // returns a fresh Dataset with rows [1..5[

Filtering

You can add tags to rows by using a specific Dataset method. This allows you to filter your Dataset later. This can be useful to separate rows of data based on arbitrary criteria (e.g. origin) that you don’t want to include in your Dataset.

ds := NewDataset([]string{"Maker", "Model"})
ds.AppendTagged([]interface{}{"Porsche", "911"}, "fast", "luxury")
ds.AppendTagged([]interface{}{"Skoda", "Octavia"}, "family")
ds.AppendTagged([]interface{}{"Ferrari", "458"}, "fast", "luxury")
ds.AppendValues("Citroen", "Picasso")
ds.AppendValues("Bentley", "Continental")
ds.Tag(4, "luxury") // Bentley
ds.AppendValuesTagged("Aston Martin", "DB9", /* these are tags */ "fast", "luxury")

Filtering the Dataset is possible by calling Filter(column):

luxuryCars, err := ds.Filter("luxury").CSV()
fmt.Println(luxuryCars)
// >>>
// Maker,Model
// Porsche,911
// Ferrari,458
// Bentley,Continental
// Aston Martin,DB9
fastCars, err := ds.Filter("fast").CSV()
fmt.Println(fastCars)
// >>>
// Maker,Model
// Porsche,911
// Ferrari,458
// Aston Martin,DB9

Tags at a specific row can be retrieved by calling Dataset.Tags(index int)

Sorting

Datasets can be sorted by a specific column.

ds := NewDataset([]string{"Maker", "Model", "Year"})
ds.AppendValues("Porsche", "991", 2012)
ds.AppendValues("Skoda", "Octavia", 2011)
ds.AppendValues("Ferrari", "458", 2009)
ds.AppendValues("Citroen", "Picasso II", 2013)
ds.AppendValues("Bentley", "Continental GT", 2003)

sorted, err := ds.Sort("Year").CSV()
fmt.Println(sorted)
// >>
// Maker, Model, Year
// Bentley, Continental GT, 2003
// Ferrari, 458, 2009
// Skoda, Octavia, 2011
// Porsche, 991, 2012
// Citroen, Picasso II, 2013

Constraining

Datasets can have columns constrained by functions and further checked if valid.

ds := NewDataset([]string{"Maker", "Model", "Year"})
ds.AppendValues("Porsche", "991", 2012)
ds.AppendValues("Skoda", "Octavia", 2011)
ds.AppendValues("Ferrari", "458", 2009)
ds.AppendValues("Citroen", "Picasso II", 2013)
ds.AppendValues("Bentley", "Continental GT", 2003)

ds.ConstrainColumn("Year", func(val interface{}) bool { return val.(int) > 2008 })
ds.ValidFailFast() // false
if !ds.Valid() { // validate the whole dataset, errors are retrieved in Dataset.ValidationErrors
	ds.ValidationErrors[0] // Row: 4, Column: 2
}

A Dataset with constrained columns can be filtered to keep only the rows satisfying the constraints.

valid := ds.ValidSubset().Tabular("simple") // Cars after 2008
fmt.Println(valid)

Will output:

------------  ---------------  ---------
      Maker            Model       Year
------------  ---------------  ---------
    Porsche              991       2012

      Skoda          Octavia       2011

    Ferrari              458       2009

    Citroen       Picasso II       2013
------------  ---------------  ---------
invalid := ds.InvalidSubset().Tabular("simple") // Cars before 2008
fmt.Println(invalid)

Will output:

------------  -------------------  ---------
      Maker                Model       Year
------------  -------------------  ---------
    Bentley       Continental GT       2003
------------  -------------------  ---------

Loading

JSON

ds, _ := LoadJSON([]byte(`[
  {"age":90,"firstName":"John","lastName":"Adams"},
  {"age":67,"firstName":"George","lastName":"Washington"},
  {"age":83,"firstName":"Henry","lastName":"Ford"}
]`))

YAML

ds, _ := LoadYAML([]byte(`- age: 90
  firstName: John
  lastName: Adams
- age: 67
  firstName: George
  lastName: Washington
- age: 83
  firstName: Henry
  lastName: Ford`))

Exports

Exportable

Any of the following export format returns an *Exportable which means you can use:

  • Bytes() to get the content as a byte array
  • String() to get the content as a string
  • WriteTo(io.Writer) to write the content to an io.Writer
  • WriteFile(filename string, perm os.FileMode) to write to a file

It avoids unnecessary conversion between string and []byte to output/write/whatever. Thanks to @figlief for the proposition.

JSON

json, _ := ds.JSON()
fmt.Println(json)

Will output:

[{"age":90,"firstName":"John","lastName":"Adams"},{"age":67,"firstName":"George","lastName":"Washington"},{"age":83,"firstName":"Henry","lastName":"Ford"}]

XML

xml, _ := ds.XML()
fmt.Println(xml)

Will ouput:

<dataset>
 <row>
   <age>90</age>
   <firstName>John</firstName>
   <lastName>Adams</lastName>
 </row>  <row>
   <age>67</age>
   <firstName>George</firstName>
   <lastName>Washington</lastName>
 </row>  <row>
   <age>83</age>
   <firstName>Henry</firstName>
   <lastName>Ford</lastName>
 </row>
</dataset>

CSV

csv, _ := ds.CSV()
fmt.Println(csv)

Will ouput:

firstName,lastName,age
John,Adams,90
George,Washington,67
Henry,Ford,83

TSV

tsv, _ := ds.TSV()
fmt.Println(tsv)

Will ouput:

firstName lastName  age
John  Adams  90
George  Washington  67
Henry Ford 83

YAML

yaml, _ := ds.YAML()
fmt.Println(yaml)

Will ouput:

- age: 90
  firstName: John
  lastName: Adams
- age: 67
  firstName: George
  lastName: Washington
- age: 83
  firstName: Henry
  lastName: Ford

HTML

html := ds.HTML()
fmt.Println(html)

Will output:

<table class="table table-striped">
	<thead>
		<tr>
			<th>firstName</th>
			<th>lastName</th>
			<th>age</th>
		</tr>
	</thead>
	<tbody>
		<tr>
			<td>George</td>
			<td>Washington</td>
			<td>90</td>
		</tr>
		<tr>
			<td>Henry</td>
			<td>Ford</td>
			<td>67</td>
		</tr>
		<tr>
			<td>Foo</td>
			<td>Bar</td>
			<td>83</td>
		</tr>
	</tbody>
</table>

XLSX

xlsx, _ := ds.XLSX()
fmt.Println(xlsx)
// >>>
// binary content
xlsx.WriteTo(...)

ASCII

Grid format

ascii := ds.Tabular("grid" /* tablib.TabularGrid */)
fmt.Println(ascii)

Will output:

+--------------+---------------+--------+
|    firstName |      lastName |    age |
+==============+===============+========+
|       George |    Washington |     90 |
+--------------+---------------+--------+
|        Henry |          Ford |     67 |
+--------------+---------------+--------+
|          Foo |           Bar |     83 |
+--------------+---------------+--------+

Simple format

ascii := ds.Tabular("simple" /* tablib.TabularSimple */)
fmt.Println(ascii)

Will output:

--------------  ---------------  --------
    firstName         lastName       age
--------------  ---------------  --------
       George       Washington        90

        Henry             Ford        67

          Foo              Bar        83
--------------  ---------------  --------

Condensed format

ascii := ds.Tabular("condensed" /* tablib.TabularCondensed */)
fmt.Println(ascii)

Similar to simple but with less line feed:

--------------  ---------------  --------
    firstName         lastName       age
--------------  ---------------  --------
       George       Washington        90
        Henry             Ford        67
          Foo              Bar        83
--------------  ---------------  --------

Markdown

Markdown tables are similar to the Tabular condensed format, except that they have pipe characters separating columns.

mkd := ds.Markdown() // or
mkd := ds.Tabular("markdown" /* tablib.TabularMarkdown */)
fmt.Println(mkd)

Will output:

|     firstName   |       lastName    |    gpa  |
| --------------  | ---------------   | ------- |
|          John   |          Adams    |     90  |
|        George   |     Washington    |     67  |
|        Thomas   |      Jefferson    |     50  |

Which equals to the following when rendered as HTML:

firstName lastName gpa
John Adams 90
George Washington 67
Thomas Jefferson 50

MySQL

sql := ds.MySQL()
fmt.Println(sql)

Will output:

CREATE TABLE IF NOT EXISTS presidents
(
	id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
	firstName VARCHAR(9),
	lastName VARCHAR(8),
	gpa DOUBLE
);

INSERT INTO presidents VALUES(1, 'Jacques', 'Chirac', 88);
INSERT INTO presidents VALUES(2, 'Nicolas', 'Sarkozy', 98);
INSERT INTO presidents VALUES(3, 'François', 'Hollande', 34);

COMMIT;

Numeric (uint, int, float, ...) are stored as DOUBLE, strings as VARCHAR with width set to the length of the longest string in the column, and time.Times are stored as TIMESTAMP.

Postgres

sql := ds.Postgres()
fmt.Println(sql)

Will output:

CREATE TABLE IF NOT EXISTS presidents
(
	id SERIAL PRIMARY KEY,
	firstName TEXT,
	lastName TEXT,
	gpa NUMERIC
);

INSERT INTO presidents VALUES(1, 'Jacques', 'Chirac', 88);
INSERT INTO presidents VALUES(2, 'Nicolas', 'Sarkozy', 98);
INSERT INTO presidents VALUES(3, 'François', 'Hollande', 34);

COMMIT;

Numerics (uint, int, float, ...) are stored as NUMERIC, strings as TEXT and time.Times are stored as TIMESTAMP.

Databooks

This is an example of how to use Databooks.

db := NewDatabook()
// or loading a JSON content
db, err := LoadDatabookJSON([]byte(`...`))
// or a YAML content
db, err := LoadDatabookYAML([]byte(`...`))

// a dataset of presidents
presidents, _ := LoadJSON([]byte(`[
  {"Age":90,"First name":"John","Last name":"Adams"},
  {"Age":67,"First name":"George","Last name":"Washington"},
  {"Age":83,"First name":"Henry","Last name":"Ford"}
]`))

// a dataset of cars
cars := NewDataset([]string{"Maker", "Model", "Year"})
cars.AppendValues("Porsche", "991", 2012)
cars.AppendValues("Skoda", "Octavia", 2011)
cars.AppendValues("Ferrari", "458", 2009)
cars.AppendValues("Citroen", "Picasso II", 2013)
cars.AppendValues("Bentley", "Continental GT", 2003)

// add the sheets to the Databook
db.AddSheet("Cars", cars.Sort("Year"))
db.AddSheet("Presidents", presidents.SortReverse("Age"))

fmt.Println(db.JSON())

Will output the following JSON representation of the Databook:

[
  {
    "title": "Cars",
    "data": [
      {"Maker":"Bentley","Model":"Continental GT","Year":2003},
      {"Maker":"Ferrari","Model":"458","Year":2009},
      {"Maker":"Skoda","Model":"Octavia","Year":2011},
      {"Maker":"Porsche","Model":"991","Year":2012},
      {"Maker":"Citroen","Model":"Picasso II","Year":2013}
    ]
  },
  {
    "title": "Presidents",
    "data": [
      {"Age":90,"First name":"John","Last name":"Adams"},
      {"Age":83,"First name":"Henry","Last name":"Ford"},
      {"Age":67,"First name":"George","Last name":"Washington"}
    ]
  }
]

Installation

go get github.com/agrison/go-tablib

For those wanting the v1 version where export methods returned a string and not an Exportable:

go get gopkg.in/agrison/go-tablib.v1

TODO

  • Loading in more formats
  • Support more formats: DBF, XLS, LATEX, ...

Contribute

It is a work in progress, so it may exist some bugs and edge cases not covered by the test suite.

But we're on Github and this is Open Source, pull requests are more than welcomed, come and have some fun :)

Acknowledgement

Thanks to kennethreitz for the first implementation in Python, github.com/bndr/gotabulate, github.com/clbanning/mxj, github.com/tealeg/xlsx, gopkg.in/yaml.v2