Feanor is an artisan of CSV files. It can generate complex CSV files or file bundles for examples and tests.

Bakuriu/feanor-csv

Feanor CSV

Note: Feanor is currently in development. All releases prior to 1.0.0 should be considered alpha releases, and both the command line interface and the library API may change significantly between releases. Release 1.0.0 will provide a stable API and a stable command line interface for the 1.x series.

Usage

$ feanor --help
usage: feanor [-h] [--no-header] [-L LIBRARY] [-D DEFINE]
              [-C GLOBAL_CONFIGURATION] [-r RANDOM_MODULE] [-s RANDOM_SEED]
              [--version] (-n N | -b N | --stream-mode STREAM_MODE)
              {expr,cmdline} ...

optional arguments:
  -h, --help            show this help message and exit
  --no-header           Do not add header to the output.
  -L LIBRARY, --library LIBRARY
                        The library to use.
  -D DEFINE, --define DEFINE
                        Type alias definitions for producers.
  -C GLOBAL_CONFIGURATION, --global-configuration GLOBAL_CONFIGURATION
                        The global configuration for producers.
  -r RANDOM_MODULE, --random-module RANDOM_MODULE
                        The random module to be used to generate random data.
  -s RANDOM_SEED, --random-seed RANDOM_SEED
                        The random seed to use for this run.
  --version             show program's version number and exit
  -n N, --num-rows N    The number of rows of the produced CSV
  -b N, --num-bytes N   The approximate number of bytes of the produced CSV
  --stream-mode STREAM_MODE

Schema definition:
  {expr,cmdline}        Commands to define a CSV schema.

Checking the version:

$ feanor --version
feanor 0.6.0-alpha

Producer types

Each producer is assigned a "producer type", which describes how its values are generated. The syntax of a producer type is the following:

% <TYPE_NAME>[ : <PRODUCER_NAME>] [ CONFIG ]

Here TYPE_NAME and PRODUCER_NAME must match the regular expression \w+, and CONFIG is a Python dict literal.
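As a rough illustration of this grammar, the following Python sketch splits a producer type string into its three parts. The function name parse_producer_type and its regular expression are hypothetical, not Feanor's actual parser; ast.literal_eval is used because CONFIG is described as a Python dict literal.

```python
import ast
import re

# Hypothetical regex for "% <TYPE_NAME>[ : <PRODUCER_NAME>] [ CONFIG ]".
# The whitespace tolerance and error handling are assumptions.
PRODUCER_TYPE_RE = re.compile(
    r"^%\s*(?P<type>\w+)\s*(?::\s*(?P<producer>\w+))?\s*(?P<config>\{.*\})?\s*$",
    re.DOTALL,
)


def parse_producer_type(text):
    """Return (type_name, producer_name or None, config dict) for a producer type."""
    match = PRODUCER_TYPE_RE.match(text)
    if match is None:
        raise ValueError(f"invalid producer type: {text!r}")
    config_src = match.group("config")
    # CONFIG is a Python dict literal, so ast.literal_eval can parse it safely.
    config = ast.literal_eval(config_src) if config_src else {}
    if not isinstance(config, dict):
        raise ValueError(f"CONFIG must be a dict literal: {config_src!r}")
    return match.group("type"), match.group("producer"), config


print(parse_producer_type('%int{"min": 10}'))  # ('int', None, {'min': 10})
```

With a producer name, parse_producer_type("%int:myint{'min': 1}") would return ('int', 'myint', {'min': 1}); myint is just an illustrative name.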

For example, the built-in int producer type can be used in the following ways:

  • %int or %int{}: default configuration
  • %int{"min": 10}: generate only numbers greater than or equal to 10 (the bound is inclusive).
  • %int{"max": 10}: generate only numbers less than or equal to 10 (the bound is inclusive).
  • %int{"min": 10, "max":1000}: generate numbers between 10 and 1000 (both inclusive).
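The inclusive bounds behave like Python's random.Random.randint, which can also return both endpoints. This is a hypothetical sketch: int_producer is not part of Feanor, and DEFAULT_MIN/DEFAULT_MAX are placeholders rather than the real %int defaults.

```python
import random

# Placeholder defaults (assumption): the real %int defaults are not documented here.
DEFAULT_MIN = 0
DEFAULT_MAX = 1_000_000


def int_producer(config, rng):
    """Hypothetical %int producer honoring inclusive "min"/"max" keys."""
    low = config.get("min", DEFAULT_MIN)
    high = config.get("max", DEFAULT_MAX)
    # randint(low, high) may return low or high themselves: both bounds are inclusive.
    return rng.randint(low, high)


rng = random.Random(0)
samples = [int_producer({"min": 10, "max": 12}, rng) for _ in range(1000)]
print(sorted(set(samples)))
```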

In all these cases PRODUCER_NAME is omitted. Specifying PRODUCER_NAME tells Feanor to generate the values with the PRODUCER_NAME producer while treating the expression as if its type were TYPE_NAME.

This is useful when you want to provide a new producer for an existing type without also having to redefine that type's compatibility rules in the library.
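One way to picture the TYPE_NAME/PRODUCER_NAME split is a registry in which producers and type-compatibility rules are looked up separately. Everything below (PRODUCERS, COMPATIBLE_WITH, resolve and the even_int producer) is a hypothetical illustration, not Feanor's library API.

```python
import random

# Hypothetical producer registry: name -> function generating one value.
PRODUCERS = {
    "int": lambda rng: rng.randint(0, 100),
    "even_int": lambda rng: 2 * rng.randint(0, 50),  # a new producer for the int type
}

# Hypothetical compatibility rules, keyed by *type* name only
# (for example, which types may be combined with the + operator).
COMPATIBLE_WITH = {"int": {"int", "float"}}


def resolve(type_name, producer_name=None):
    """Generate with PRODUCER_NAME (default: TYPE_NAME) but keep TYPE_NAME as the type."""
    producer = PRODUCERS[producer_name or type_name]
    return type_name, producer


# Like %int:even_int: checked as an int, but values come from even_int.
type_name, producer = resolve("int", "even_int")
value = producer(random.Random(0))
```

Because the rules are keyed by type, adding even_int needed no new entry in COMPATIBLE_WITH.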

Type definitions

It is possible to define "aliases" for types with a given configuration by using the -D or --define option. The value of this option should be a string of the form:

( <PRODUCER_NAME> := % <PRODUCER_NAME> [ CONFIG ] )+

Multiple definitions are separated by either a newline (\n) or a semicolon (;).
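A minimal sketch of how such a definition string could be split into aliases, assuming definitions are separated only by newlines or semicolons as stated above. parse_definitions is a hypothetical helper; this naive split would also break on a ; inside a quoted CONFIG string.

```python
import re

# Hypothetical pattern for one "<NAME> := %<TYPE>[CONFIG]" definition.
DEFINITION_RE = re.compile(r"^\s*(\w+)\s*:=\s*(%.+?)\s*$", re.DOTALL)


def parse_definitions(text):
    """Map each alias name to its producer type string."""
    aliases = {}
    # Caveat (assumption): a ";" inside a CONFIG string literal would be split too.
    for chunk in re.split(r"[\n;]", text):
        if not chunk.strip():
            continue  # tolerate empty pieces, e.g. a trailing ";"
        match = DEFINITION_RE.match(chunk)
        if match is None:
            raise ValueError(f"invalid definition: {chunk!r}")
        name, producer_type = match.groups()
        aliases[name] = producer_type
    return aliases


print(parse_definitions("perc := %float{'min': 0, 'max': 100}"))
# {'perc': "%float{'min': 0, 'max': 100}"}
```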

For example, to define a perc producer that yields floating-point numbers in the range 0-100, you can use:

-D "perc := %float{'min': 0, 'max': 100}"

You can then use %perc as if it were a built-in type:

$ feanor -s 0 -n 5  -D "perc := %float{'min': 0, 'max': 100}" cmdline -c a %perc
a
84.4421851525048
75.79544029403024
42.0571580830845
25.891675029296334
51.12747213686085

Feanor DSL Expressions

Values are defined with a simple DSL that lets you combine multiple producers in different ways and express complex logic for your data generation.

Producer definitions

A producer definition is simply a producer type, following the syntax %<NAME>[CONFIG] explained above.

Assignments

You can assign a name to a certain expression with the syntax (<expr>)=<NAME>.

References

You can refer to the value of a named expression with the syntax @<NAME>.
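The following sketch shows how assignments and references can work together, under the assumption that names are bound once per row: (<expr>)=<NAME> stores the value it generates, and @<NAME> reads that same value back instead of drawing a new one. The helpers assign, ref, random_int and generate_row are hypothetical.

```python
import random


def assign(name, expr):
    """Hypothetical (<expr>)=<NAME>: evaluate expr and bind its value to name."""
    def evaluate(env, rng):
        value = expr(env, rng)
        env[name] = value
        return value
    return evaluate


def ref(name):
    """Hypothetical @<NAME>: return the value already bound to name in this row."""
    return lambda env, rng: env[name]


def random_int(env, rng):
    """Stand-in for %int with a placeholder range."""
    return rng.randint(0, 1_000_000)


def generate_row(exprs, rng):
    env = {}  # assumption: bindings are scoped to a single row
    return [expr(env, rng) for expr in exprs]


rng = random.Random(0)
row = generate_row([assign("a", random_int), assign("b", random_int), ref("a")], rng)
```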

Concatenation

You can concatenate multiple values using the syntax <expr_1> . <expr_2> or <expr_1> · <expr_2>.
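Judging from the expr examples later in this README, where '%int·%int' fills the two columns a,b, concatenation places the values of its operands side by side in the same row. A hypothetical sketch of that behavior, with operands modeled as zero-argument callables:

```python
def as_tuple(value):
    """Wrap a single value in a tuple; leave an already concatenated row untouched."""
    return value if isinstance(value, tuple) else (value,)


def concat(left, right):
    """Hypothetical `<expr_1> · <expr_2>`: left's values followed by right's."""
    return lambda: as_tuple(left()) + as_tuple(right())


row_expr = concat(concat(lambda: 1, lambda: 2), lambda: 3)
print(row_expr())  # (1, 2, 3)
```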

Choice

The choice operator | defines an expression that randomly takes the value of one of two expressions, using the syntax <expr_1> | <expr_2>.

By default such an expression takes the value of expr_1 50% of the time and the value of expr_2 the rest of the time. You can specify the probabilities with the syntax expr_1 <0.3|0.7> expr_2: in this case the expression evaluates to expr_1 only 30% of the time and to expr_2 the remaining 70% of the time. You may omit one of the two numbers, so expr_1 <0.3|> expr_2 is equivalent to the previous expression.

If the left and right weights add up to less than 1, the remaining probability is the chance that the value is empty. For example, expr_1 <0.25|0.25> expr_2 defines an expression that evaluates to expr_1 25% of the time, to expr_2 25% of the time, and to None (i.e. empty) the remaining 50% of the time.
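These probabilities can be reproduced with a single uniform draw in [0, 1): draws below the left weight select expr_1, draws below the sum of both weights select expr_2, and anything else yields None. This is a sketch of the semantics described above, not Feanor's code; the choice helper and its argument names are hypothetical, and filling an omitted weight with 1 minus the other follows the <0.3|> example.

```python
import random


def choice(left, right, left_weight=None, right_weight=None):
    """Hypothetical `expr_1 <left_weight|right_weight> expr_2`."""
    if left_weight is None and right_weight is None:
        left_weight = right_weight = 0.5  # plain `expr_1 | expr_2`
    elif left_weight is None:
        left_weight = 1 - right_weight  # `<|q>`
    elif right_weight is None:
        right_weight = 1 - left_weight  # `<p|>`
    # Small tolerance so that float sums like 0.3 + 0.7 are accepted.
    if left_weight < 0 or right_weight < 0 or left_weight + right_weight > 1 + 1e-9:
        raise ValueError("weights must be non-negative and add up to at most 1")

    def evaluate(rng):
        u = rng.random()  # uniform draw in [0, 1)
        if u < left_weight:
            return left(rng)
        if u < left_weight + right_weight:
            return right(rng)
        return None  # leftover probability mass: empty value
    return evaluate


rng = random.Random(0)
expr = choice(lambda r: "x", lambda r: "y", 0.25, 0.25)
results = [expr(rng) for _ in range(10_000)]
print(results.count("x"), results.count("y"), results.count(None))
```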

Merge

You can define an expression that can merge values of two different expressions using the + operator.

For example %int + %float is an expression that evaluates to the sum of a random integer and a random float.
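For numeric operands, the merge can be sketched as plain addition of the two values generated for the same row. This assumes + behaves like Python's + on numbers; how Feanor merges other types, or an empty (None) operand, is not covered here. merge, int_expr and float_expr are hypothetical names.

```python
import random


def merge(left, right):
    """Hypothetical `<expr_1> + <expr_2>` for numeric producers: add the two values."""
    return lambda rng: left(rng) + right(rng)


def int_expr(rng):
    return rng.randint(0, 10)  # stand-in for %int (placeholder range)


def float_expr(rng):
    return rng.uniform(0.0, 1.0)  # stand-in for %float (placeholder range)


value = merge(int_expr, float_expr)(random.Random(0))
print(type(value).__name__)  # float: int + float promotes to float
```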

Examples

NOTE: the following examples all specify the option -s 0 solely to make their output reproducible. Typical uses of Feanor do not need a random seed; in fact, fixing one often defeats the purpose of the tool.
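Why a fixed seed makes the output reproducible can be shown with Python's standard random module: two generators created with the same seed produce the same sequence. Whether -s feeds the selected random module in exactly this way is an assumption, and draw is a hypothetical helper.

```python
import random


def draw(seed, n=5):
    """Hypothetical helper: n integers from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.randint(0, 1_000_000) for _ in range(n)]


# Same seed, same sequence: this is what makes `-s 0` runs repeatable.
print(draw(0) == draw(0))  # True
```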

Using the cmdline subcommand

Generate 10 rows with random integers:

$ feanor -s 0 -n 10 cmdline -c a '%int' -c b '%int'
a,b
885440,403958
794772,933488
441001,42450
271493,536110
509532,424604
962838,821872
870163,318046
499748,375441
611720,934973
952225,229053
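For comparison, here is a rough standard-library equivalent of this command, written with Python's csv module. It is only a sketch: generate_csv is hypothetical, the 0 to 1,000,000 range stands in for %int as a placeholder, and its values will not match Feanor's output for the same seed.

```python
import csv
import io
import random


def generate_csv(columns, num_rows, rng, header=True):
    """Hypothetical stand-in for `feanor -n N cmdline -c a '%int' -c b '%int'`."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    if header:  # header=False plays the role of --no-header
        writer.writerow(columns)
    for _ in range(num_rows):
        # Placeholder range for %int (assumption).
        writer.writerow([rng.randint(0, 1_000_000) for _ in columns])
    return out.getvalue()


print(generate_csv(["a", "b"], 10, random.Random(0)), end="")
```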

Generate about 1 kilobyte of rows with 2 random integers each and write the result to /tmp/out.csv:

$ feanor -s 0 -b 1024 cmdline -c a '%int' -c b '%int'  /tmp/out.csv
$ head /tmp/out.csv 
a,b
885440,403958
794772,933488
441001,42450
271493,536110
509532,424604
962838,821872
870163,318046
499748,375441
611720,934973
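Since rows are written whole, a byte budget can only be met approximately. The sketch below shows one plausible strategy, assuming generation stops at the first row that reaches the budget (Feanor may instead stop just before it); generate_until_bytes and the integer range are hypothetical.

```python
import random


def generate_until_bytes(num_bytes, rng, header="a,b\n"):
    """Hypothetical -b/--num-bytes: append whole rows until the size budget is reached."""
    chunks = [header]
    size = len(header.encode("utf-8"))
    while size < num_bytes:
        # Placeholder range for %int (assumption).
        row = f"{rng.randint(0, 1_000_000)},{rng.randint(0, 1_000_000)}\n"
        chunks.append(row)
        size += len(row.encode("utf-8"))
    return "".join(chunks)


text = generate_until_bytes(1024, random.Random(0))
print(len(text.encode("utf-8")))  # at least 1024, overshooting by less than one row
```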

Generate 10 rows with random integers, the first column between 0 and 10, the second column between 0 and 1000:

$ feanor -s 0 -n 10 cmdline -c a '%int{"min":0, "max":10}' -c b '%int{"min": 0, "max":1000}'
a,b
6,776
6,41
4,988
8,497
6,940
4,991
7,366
9,913
3,516
2,288

Generate 10 rows with random integers and their sum:

$ feanor -s 0 -n 10 cmdline -c a '%int' -c b '%int' -c c '@a+@b'
a,b,c
885440,403958,1289398
794772,933488,1728260
441001,42450,483451
271493,536110,807603
509532,424604,934136
962838,821872,1784710
870163,318046,1188209
499748,375441,875189
611720,934973,1546693
952225,229053,1181278
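Because Feanor is meant to produce data for tests, it can be handy to check such output programmatically. This sketch, using only Python's csv module, confirms that c equals a + b on every row of the sample output above; check_sum_column is a hypothetical helper name.

```python
import csv
import io

# Output of the `-c c '@a+@b'` example, copied verbatim.
SAMPLE = """\
a,b,c
885440,403958,1289398
794772,933488,1728260
441001,42450,483451
271493,536110,807603
509532,424604,934136
962838,821872,1784710
870163,318046,1188209
499748,375441,875189
611720,934973,1546693
952225,229053,1181278
"""


def check_sum_column(text):
    """Hypothetical check: True if c == a + b on every data row of the CSV text."""
    rows = csv.DictReader(io.StringIO(text))
    return all(int(row["a"]) + int(row["b"]) == int(row["c"]) for row in rows)


print(check_sum_column(SAMPLE))  # True
```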

Using the expr subcommand

Generate 10 rows with random integers:

$ feanor -s 0 -n 10 expr -c a,b '%int·%int'
a,b
885440,403958
794772,933488
441001,42450
271493,536110
509532,424604
962838,821872
870163,318046
499748,375441
611720,934973
952225,229053

Generate about 1 kilobyte of rows with 2 random integers each and write the result to /tmp/out.csv:

$ feanor -s 0 -b 1024 expr -c a,b /tmp/out.csv '%int·%int'
$ head /tmp/out.csv 
a,b
885440,403958
794772,933488
441001,42450
271493,536110
509532,424604
962838,821872
870163,318046
499748,375441
611720,934973

Generate 10 rows with random integers, the first column between 0 and 10, the second column between 0 and 1000:

$ feanor -s 0 -n 10 expr -c a,b '%int{"min":0, "max":10}·%int{"min": 0, "max":1000}'
a,b
6,776
6,41
4,988
8,497
6,940
4,991
7,366
9,913
3,516
2,288

Generate 10 rows with random integers and their sum:

$ feanor -s 0 -n 10 expr -c a,b,c '(%int)=a·(%int)=b·(@a+@b)'
a,b,c
885440,403958,1289398
794772,933488,1728260
441001,42450,483451
271493,536110,807603
509532,424604,934136
962838,821872,1784710
870163,318046,1188209
499748,375441,875189
611720,934973,1546693
952225,229053,1181278

or, equivalently, using a let expression:

$ feanor -s 0 -n 10 expr -c a,b,c 'let a:=%int b:=%int in @a·@b·(@a+@b)'
a,b,c
885440,403958,1289398
794772,933488,1728260
441001,42450,483451
271493,536110,807603
509532,424604,934136
962838,821872,1784710
870163,318046,1188209
499748,375441,875189
611720,934973,1546693
952225,229053,1181278
