Bit packing #162

Open: wants to merge 8 commits into base: master
16 changes: 9 additions & 7 deletions README.md
@@ -44,7 +44,7 @@ For IID tests use the Makefile to compile the program:

Then you can run the program with

-./ea_iid [-i|-c] [-a|-t] [-v] [-l <index>,<samples>] <file_name> [bits_per_symbol]
+./ea_iid [-i|-c] [-a|-t] [-v] [-l <index>,<samples>] [-p] <file_name> [bits_per_symbol]

You may specify either `-i` or `-c`, and either `-a` or `-t`. These correspond to the following:

@@ -55,30 +55,32 @@ You may specify either `-i` or `-c`, and either `-a` or `-t`. These correspond t
* Note: When testing binary data, no `H_bitstring` assessment is produced, so the `-a` and `-t` options produce the same results for the initial assessment of binary data.
* `-l`: Reads (at most) `samples` data samples after indexing into the file by `index*samples` bytes.
* `-v`: Optional verbosity flag for more output. Can be used multiple times.
-* bits_per_symbol are the number of bits per symbol. Each symbol is expected to fit within a single byte.
+* `-p`: Optional flag to denote that input data is contiguous bit-packed with the first sample occupying the most significant bits of the first byte and subsequent samples being in lower significant bits. Use of the `bits_per_symbol` parameter is required to ensure correct unpacking. Currently not compatible with `-l` parameter.
+* `bits_per_symbol` are the number of bits per symbol. Each symbol is expected to fit within a single byte unless `-p` is used.
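
For illustration, the packing order described for `-p` (first sample in the most significant bits of each byte, later samples in lower bits) can be sketched as follows. This is a minimal sketch, not the PR's implementation: `unpack_msb_first` is a hypothetical helper name, and it assumes symbol widths of 1, 2, 4 or 8 bits, matching the restriction stated in the usage text.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not taken from the PR): expands MSB-first bit-packed
// bytes into one symbol per output byte.
std::vector<std::uint8_t> unpack_msb_first(const std::vector<std::uint8_t>& packed,
                                           unsigned bits_per_symbol) {
    // Only widths that divide 8 keep every sample inside a single byte.
    if (bits_per_symbol != 1 && bits_per_symbol != 2 &&
        bits_per_symbol != 4 && bits_per_symbol != 8) {
        throw std::invalid_argument("bits_per_symbol must be 1, 2, 4 or 8");
    }
    assert(8 % bits_per_symbol == 0);

    const unsigned per_byte = 8 / bits_per_symbol;       // samples per input byte
    const unsigned mask = (1u << bits_per_symbol) - 1u;  // low-bit mask for one sample

    std::vector<std::uint8_t> symbols;
    symbols.reserve(packed.size() * per_byte);
    for (std::uint8_t byte : packed) {
        for (unsigned k = 0; k < per_byte; ++k) {
            // Sample k sits bits_per_symbol * (k + 1) bits below the top of the byte.
            const unsigned shift = 8 - bits_per_symbol * (k + 1);
            symbols.push_back(static_cast<std::uint8_t>((byte >> shift) & mask));
        }
    }
    return symbols;
}
```

With `bits_per_symbol` set to 2, the byte `0xB4` (binary `10 11 01 00`) unpacks to the four symbols 2, 3, 1, 0.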

To run the non-IID tests, use the Makefile to compile:

make non_iid

Running this works the same way. This looks like

-./ea_non_iid [-i|-c] [-a|-t] [-v] [-l <index>,<samples> ] <file_name> [bits_per_symbol]
+./ea_non_iid [-i|-c] [-a|-t] [-v] [-l <index>,<samples> ] [-p] <file_name> [bits_per_symbol]

To run the restart testing, use the Makefile to compile:

make restart

Running this is similar.

-./ea_restart [-i|-n] [-v] <file_name> [bits_per_symbol] <H_I>
+./ea_restart [-i|-n] [-v] [-p] <file_name> [bits_per_symbol] <H_I>

The file should be in the "row dataset" format described in SP800-90B Section 3.1.4.1.

* `-i`: Indicates IID data.
* `-n`: Indicates non-IID data.
* `-v`: Optional verbosity flag for more output. Can be used multiple times.
-* bits_per_symbol are the number of bits per symbol. Each symbol is expected to fit within a single byte.
+* `-p`: Optional flag to denote that input data is contiguous bit-packed with the first sample occupying the most significant bits of the first byte and subsequent samples being in lower significant bits. Use of the `bits_per_symbol` parameter is required to ensure correct unpacking.
+* `bits_per_symbol` are the number of bits per symbol. Each symbol is expected to fit within a single byte unless `-p` is used.
* `H_I` is the assessed entropy.

To calculate the entropy reduction due to conditioning, use the Makefile to compile:
@@ -98,8 +100,8 @@ or
* `n_in`: The number of bits entering the conditioning step per output.
* `n_out`: The number of bits per conditioning step output.
* `nw`: The narrowest width of the conditioning step.
-* `h_in`: The amount of entropy entering the conditioning step per output. Must be less than n_in.
-* `h'`: The entropy estimate per bit of conditioned sequential dataset (only for '-n' option).
+* `h_in`: The amount of entropy entering the conditioning step per output. Must be less than `n_in`.
+* `h'`: The entropy estimate per bit of conditioned sequential dataset (only for `-n` option).

## Make

38 changes: 32 additions & 6 deletions cpp/iid_main.cpp
@@ -10,16 +10,18 @@


[[ noreturn ]] void print_usage(){
-printf("Usage is: ea_iid [-i|-c] [-a|-t] [-v] [-l <index>,<samples> ] <file_name> [bits_per_symbol]\n\n");
+printf("Usage is: ea_iid [-i|-c] [-a|-t] [-v] [-l <index>,<samples> ] [-p] <file_name> [bits_per_symbol]\n\n");
printf("\t <file_name>: Must be relative path to a binary file with at least 1 million entries (samples).\n");
printf("\t [bits_per_symbol]: Must be between 1-8, inclusive. By default this value is inferred from the data.\n");
printf("\t [-i|-c]: '-i' for initial entropy estimate, '-c' for conditioned sequential dataset entropy estimate. The initial entropy estimate is the default.\n");
printf("\t [-a|-t]: '-a' produces the 'H_bitstring' assessment using all read bits, '-t' truncates the bitstring used to produce the `H_bitstring` assessment to %d bits. Test all data by default.\n", MIN_SIZE);
printf("\t Note: When testing binary data, no `H_bitstring` assessment is produced, so the `-a` and `-t` options produce the same results for the initial assessment of binary data.\n");
printf("\t -v: Optional verbosity flag for more output. Can be used multiple times.\n");
printf("\t -l <index>,<samples>\tRead the <index> substring of length <samples>.\n");
-printf("\n");
-printf("\t Samples are assumed to be packed into 8-bit values, where the least significant 'bits_per_symbol'\n");
+printf("\t -p: Optional flag to denote that input data is contiguous bit-packed with the first sample occupying the most significant bits of the first byte and subsequent samples being in lower significant bits. Use of the 'bits_per_symbol' parameter is required to ensure correct unpacking. Currently not compatible with -l parameter.\n");
+printf("\t Currently only packed samples of 1, 2, 4 or 8 bits are permitted to ensure appropriate sample alignment to byte boundaries.\n");
+printf("\n");
+printf("\t Samples are assumed to be packed into 8-bit values unless -p is given, where the least significant 'bits_per_symbol'\n");
printf("\t bits constitute the symbol.\n");
printf("\n");
printf("\t -i: Initial Entropy Estimate (Section 3.1.3)\n");
@@ -47,18 +49,33 @@ int main(int argc, char* argv[]){
int verbose = 0;
double rawmean, median;
char* file_path;
-data_t data;
+data_t data = {0};
int opt;
unsigned long subsetIndex = ULONG_MAX;
unsigned long subsetSize = 0;
unsigned long long inint;
char *nextOption;
+bool bit_packed = false;

data.word_size = 0;
initial_entropy = true;
all_bits = true;

-while ((opt = getopt(argc, argv, "icatvl:")) != -1) {
+/* This particular test harness makes use of non-determinism in the permutation test sequence.
+* In order to perform any kind of comparisons for regression testing, we need to enable deterministic
+* behaviour. This means we need to disable seeding from /dev/urandom.
+* There are several ways in which this could be accomplished. The best way would be to permit
+* run-time changes to the seed source (eg. like faketime, but for /dev/urandom access). However, there
+* are significant limitations to that kind of mocking for device nodes. Therefore, we will use
+* environment variables to alter behaviour. However, we want to make sure that someone didn't leave
+* an environment variable enabled when they didn't mean to.
+*/
+if (getenv("__SP80090B_MOCKSEED__") != NULL) {
+printf("*** Environment variable '__SP80090B_MOCKSEED__' detected. Test harness is operating in deterministic test mode. Make sure this is expected. ***\n");
+}
+
+
+while ((opt = getopt(argc, argv, "icatvl:p")) != -1) {
switch(opt) {
case 'i':
initial_entropy = true;
@@ -72,6 +89,9 @@ int main(int argc, char* argv[]){
case 't':
all_bits = false;
break;
+case 'p':
+bit_packed = true;
+break;
case 'v':
verbose++;
break;
@@ -116,11 +136,17 @@ int main(int argc, char* argv[]){
}
}

+if(bit_packed && !data.word_size) {
+/* word_size was not given as a param and it should be for bit-packing */
+printf("When using bit-packed input, you must provide the bits-per-symbol parameter.\n");
+print_usage();
+}
+
if(verbose > 0){
printf("Opening file: '%s'\n", file_path);
}

-if(!read_file_subset(file_path, &data, subsetIndex, subsetSize)){
+if(!read_file_subset(file_path, &data, bit_packed, subsetIndex, subsetSize)){
printf("Error reading file.\n");
print_usage();
}
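
Note on the deterministic test mode: the hunk above only shows the warning printed when `__SP80090B_MOCKSEED__` is set; the code that actually replaces the seed source is not part of the visible diff. As a rough sketch of the idea in the comment (switching between a fixed seed and a non-deterministic one based on an environment variable), something like the following could be used. `choose_seed` and its `fixed_seed` parameter are hypothetical names introduced here for illustration and are not taken from the PR.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <random>

// Hypothetical helper (not part of the PR): returns fixed_seed when the
// environment variable env_name is set, otherwise a non-deterministic seed.
std::uint64_t choose_seed(const char* env_name, std::uint64_t fixed_seed) {
    assert(env_name != nullptr);
    if (std::getenv(env_name) != nullptr) {
        // Deterministic mode: repeated runs can be compared in regression tests.
        return fixed_seed;
    }
    // Normal mode: draw 64 bits from the standard library's entropy source.
    std::random_device rd;
    const std::uint64_t hi = rd();
    const std::uint64_t lo = rd();
    return (hi << 32) | (lo & 0xFFFFFFFFull);
}
```

Printing a prominent warning when the variable is detected, as the PR does, guards against a leftover environment setting silently making a real assessment deterministic.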
22 changes: 17 additions & 5 deletions cpp/non_iid_main.cpp
@@ -14,16 +14,18 @@


[[ noreturn ]] void print_usage() {
-printf("Usage is: ea_non_iid [-i|-c] [-a|-t] [-v] [-l <index>,<samples> ] <file_name> [bits_per_symbol]\n\n");
+printf("Usage is: ea_non_iid [-i|-c] [-a|-t] [-v] [-l <index>,<samples> ] [-p] <file_name> [bits_per_symbol]\n\n");
printf("\t <file_name>: Must be relative path to a binary file with at least 1 million entries (samples).\n");
printf("\t [bits_per_symbol]: Must be between 1-8, inclusive. By default this value is inferred from the data.\n");
printf("\t [-i|-c]: '-i' for initial entropy estimate, '-c' for conditioned sequential dataset entropy estimate. The initial entropy estimate is the default.\n");
printf("\t [-a|-t]: '-a' produces the 'H_bitstring' assessment using all read bits, '-t' truncates the bitstring used to produce the `H_bitstring` assessment to %d bits. Test all data by default.\n", MIN_SIZE);
printf("\t Note: When testing binary data, no `H_bitstring` assessment is produced, so the `-a` and `-t` options produce the same results for the initial assessment of binary data.\n");
printf("\t -v: Optional verbosity flag for more output. Can be used multiple times.\n");
printf("\t -l <index>,<samples>\tRead the <index> substring of length <samples>.\n");
+printf("\t -p: Optional flag to denote that input data is contiguous bit-packed with the first sample occupying the most significant bits of the first byte and subsequent samples being in lower significant bits. Use of the 'bits_per_symbol' parameter is required to ensure correct unpacking. Currently not compatible with -l parameter.\n");
+printf("\t Currently only packed samples of 1, 2, 4 or 8 bits are permitted to ensure appropriate sample alignment to byte boundaries.\n");
printf("\n");
-printf("\t Samples are assumed to be packed into 8-bit values, where the least significant 'bits_per_symbol'\n");
+printf("\t Samples are assumed to be packed into 8-bit values unless -p is given, where the least significant 'bits_per_symbol'\n");
printf("\t bits constitute the symbol.\n");
printf("\n");
printf("\t -i: Initial Entropy Estimate (Section 3.1.3)\n");
@@ -50,21 +52,22 @@ int main(int argc, char* argv[]){
int verbose = 0;
char *file_path;
double H_original, H_bitstring, ret_min_entropy;
-data_t data;
+data_t data = {0};
int opt;
double bin_t_tuple_res = -1.0, bin_lrs_res = -1.0;
double t_tuple_res = -1.0, lrs_res = -1.0;
unsigned long subsetIndex=ULONG_MAX;
unsigned long subsetSize=0;
unsigned long long inint;
char *nextOption;
+bool bit_packed = false;

data.word_size = 0;

initial_entropy = true;
all_bits = true;

-while ((opt = getopt(argc, argv, "icatvl:")) != -1) {
+while ((opt = getopt(argc, argv, "icatvl:p")) != -1) {
switch(opt) {
case 'i':
initial_entropy = true;
@@ -81,6 +84,9 @@ int main(int argc, char* argv[]){
case 'v':
verbose++;
break;
+case 'p':
+bit_packed = true;
+break;
case 'l':
inint = strtoull(optarg, &nextOption, 0);
if((inint > ULONG_MAX) || (errno == EINVAL) || (nextOption == NULL) || (*nextOption != ',')) {
@@ -123,9 +129,15 @@ int main(int argc, char* argv[]){
}
}

+if(bit_packed && !data.word_size) {
+/* word_size was not given as a param and it should be for bit-packing */
+printf("When using bit-packed input, you must provide the bits-per-symbol parameter.\n");
+print_usage();
+}
+
if(verbose>0) printf("Opening file: '%s'\n", file_path);

-if(!read_file_subset(file_path, &data, subsetIndex, subsetSize)){
+if(!read_file_subset(file_path, &data, bit_packed, subsetIndex, subsetSize)){
printf("Error reading file.\n");
print_usage();
}
25 changes: 19 additions & 6 deletions cpp/restart_main.cpp
@@ -15,16 +15,18 @@
#define SIMULATION_ROUNDS 5000000

[[ noreturn ]] void print_usage(){
-printf("Usage is: ea_restart [-i|-n] [-v] <file_name> [bits_per_symbol] <H_I>\n\n");
+printf("Usage is: ea_restart [-i|-n] [-p] [-v] <file_name> [bits_per_symbol] <H_I>\n\n");
printf("\t <file_name>: Must be relative path to a binary file with at least 1 million entries (samples),\n");
printf("\t and in the \"row dataset\" format described in SP800-90B Section 3.1.4.1.\n");
printf("\t [bits_per_symbol]: Must be between 1-8, inclusive.\n");
printf("\t <H_I>: Initial entropy estimate.\n");
printf("\t [-i|-n]: '-i' for IID data, '-n' for non-IID data. Non-IID is the default.\n");
printf("\t -v: Optional verbosity flag for more output.\n");
+printf("\t -p: Optional flag to denote that input data is contiguous bit-packed with the first sample occupying the most significant bits of the first byte and subsequent samples being in lower significant bits. Use of the 'bits_per_symbol' parameter is required to ensure correct unpacking.\n");
+printf("\t Currently only packed samples of 1, 2, 4 or 8 bits are permitted to ensure appropriate sample alignment to byte boundaries.\n");
printf("\n");
-printf("\t Restart samples are assumed to be packed into 8-bit values, where the rightmost 'bits_per_symbol'\n");
-printf("\t bits constitute the sample.\n");
+printf("\t Restart samples are assumed to be packed into 8-bit values unless -p is used, where the\n");
+printf("\t rightmost 'bits_per_symbol' bits constitute the sample.\n");
printf("\n");
printf("\t This program performs restart testing as described in Restart Tests (Section 3.1.4). The data\n");
printf("\t consists of 1000 restarts, each with 1000 samples. The data is converted to rows and columns\n");
@@ -131,13 +133,15 @@ int main(int argc, char* argv[]){
long i, j, X_i, X_r, X_c, X_max;
double H_I, H_r, H_c, alpha, ret_min_entropy;
byte *rdata, *cdata;
-data_t data;
+data_t data = {0};
int opt;

iid = false;
data.word_size = 0;

-while ((opt = getopt(argc, argv, "inv")) != -1) {
+bool bit_packed = false;
+
+while ((opt = getopt(argc, argv, "invp")) != -1) {
switch(opt) {
case 'i':
iid = true;
@@ -148,6 +152,9 @@ int main(int argc, char* argv[]){
case 'v':
verbose++;
break;
+case 'p':
+bit_packed = true;
+break;
default:
print_usage();
}
@@ -179,6 +186,12 @@ int main(int argc, char* argv[]){
argc--;
}

+if(bit_packed && !data.word_size) {
+/* word_size was not given as a param and it should be for bit-packing */
+printf("When using bit-packed input, you must provide the bits-per-symbol parameter.\n");
+print_usage();
+}
+
// get H_I
H_I = atof(argv[0]);
if(H_I < 0){
@@ -188,7 +201,7 @@ int main(int argc, char* argv[]){

if(verbose > 0) printf("Opening file: '%s'\n", file_path);

-if(!read_file(file_path, &data)){
+if(!read_file(file_path, &data, bit_packed)){
printf("Error reading file.\n");
print_usage();
}
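
Note on the restart data layout: the usage text for `ea_restart` says the input is in "row dataset" format and is converted to rows and columns. A minimal sketch of that conversion is shown below, assuming the unpacked samples are stored restart by restart (row-major) with one symbol per byte. `row_to_column` is a hypothetical helper name, not a function from this repository.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper: given n_rows * n_cols samples stored row by row,
// returns the same samples stored column by column (a matrix transpose).
std::vector<std::uint8_t> row_to_column(const std::vector<std::uint8_t>& rows,
                                        std::size_t n_rows, std::size_t n_cols) {
    if (rows.size() != n_rows * n_cols) {
        throw std::invalid_argument("sample count does not match n_rows * n_cols");
    }
    std::vector<std::uint8_t> cols(rows.size());
    for (std::size_t i = 0; i < n_rows; ++i) {
        for (std::size_t j = 0; j < n_cols; ++j) {
            // Element (i, j) of the row dataset becomes element (j, i) of the column dataset.
            cols[j * n_rows + i] = rows[i * n_cols + j];
        }
    }
    assert(cols.size() == rows.size());
    return cols;
}
```

For the restart test described in the usage text, `n_rows` and `n_cols` would both be 1000. With `-p`, the bit-packed input would need to be unpacked to one symbol per byte before a step like this.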
Binary file added cpp/selftest/pack-samples/packed-1bit.bin
Binary file not shown.
Binary file added cpp/selftest/pack-samples/packed-2bit.bin
Binary file not shown.
Binary file added cpp/selftest/pack-samples/packed-4bit.bin
Binary file not shown.
Binary file added cpp/selftest/pack-samples/packed-8bit.bin
Binary file not shown.
Binary file added cpp/selftest/pack-samples/unpacked-1bit.bin
Binary file not shown.
Binary file added cpp/selftest/pack-samples/unpacked-2bit.bin
Binary file not shown.
Binary file added cpp/selftest/pack-samples/unpacked-4bit.bin
Binary file not shown.
Binary file added cpp/selftest/pack-samples/unpacked-8bit.bin
Binary file not shown.