Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invalid unit name with install_unit() does not give an error #289

Open
billdenney opened this issue Oct 13, 2021 · 9 comments
Open

Invalid unit name with install_unit() does not give an error #289

billdenney opened this issue Oct 13, 2021 · 9 comments
Labels

Comments

@billdenney
Copy link
Contributor

I'm working with clinical laboratory data where a unit is defined for hemoglobin called "g%" (gram-percent) which is equal to 1 g/dL. While trying to create a new unit for it, I guessed that the unit name would not work because of the percent sign, but it didn't give an error when trying to install the unit; it only gave an error when trying to use the unit.

It would be helpful if install_unit() did a check to confirm that the unit was successfully installed before returning.

library(units)
#> udunits database from C:/Users/Bill Denney/Documents/R/win-library/4.1/units/share/udunits/udunits2.xml
install_unit(symbol="g_hemoglobin")
install_unit(symbol="mol_hemoglobin", def="100/0.006206 g_hemoglobin")
install_unit(symbol="g%_hemoglobin", def="1 g_hemoglobin/dL")
foo <- set_units(10, "g_hemoglobin/dL", mode="standard")
set_units(foo, "mmol_hemoglobin/L", mode="standard")
#> 6.206 [mmol_hemoglobin/L]
set_units(foo, "g%_hemoglobin", mode="standard")
#> Error: In 'g%_hemoglobin', 'g%_hemoglobin' is not recognized by udunits.
#> 
#> See a table of valid unit symbols and names with valid_udunits().
#> Custom user-defined units can be added with install_unit().
#> 
#> See a table of valid unit prefixes with valid_udunits_prefixes().
#> Prefixes will automatically work with any user-defined unit.

Created on 2021-10-13 by the reprex package (v2.0.1)

@billdenney
Copy link
Contributor Author

To be clear, I don't think that the units library should necessarily support "g%_hemoglobin". The request is just that an error is generated.

@Enchufa2
Copy link
Member

Agree. The thing is... the udunits2 call succeeds, but then the unit is not available. I don't know what happens internally, but I suppose it has something to do with the fact that the percentage symbol is part of the grammar. So this issue should be raised upstream.

@Enchufa2
Copy link
Member

Actually, it succeeds because it is permitted and it works:

#include <stdio.h>
#include <udunits2/udunits2.h>

int main() {
    ut_set_error_message_handler((ut_error_message_handler) ut_ignore);
    ut_system *sys = ut_read_xml(NULL);
    ut_encoding enc = UT_UTF8;
    ut_set_error_message_handler((ut_error_message_handler) vprintf);
    
    // install g_hemoglobin
    ut_unit *g_hemoglobin = ut_new_base_unit(sys);
    ut_map_symbol_to_unit("g_hemoglobin", enc, g_hemoglobin);
    ut_map_unit_to_symbol(g_hemoglobin, "g_hemoglobin", enc);

    // install g%_hemoglobin
    ut_unit *gpc_hemoglobin = ut_parse(sys, "1 g_hemoglobin/dL", enc);
    ut_map_symbol_to_unit("g%_hemoglobin", enc, gpc_hemoglobin);
    ut_map_unit_to_symbol(gpc_hemoglobin, "g%_hemoglobin", enc);

    // conversion
    char *from = "g_hemoglobin/dL", to[128];
    ut_unit *from_u = ut_parse(sys, from, enc);
    cv_converter *cv = ut_get_converter(from_u, gpc_hemoglobin);
    ut_format(gpc_hemoglobin, to, sizeof(to), enc);
    double x = 10;
    printf("From: %f %s\n", x, from);
    printf("To  : %f %s\n", cv_convert_double(cv, x), to);
    
    cv_free(cv);
    ut_free(from_u);
    ut_free(g_hemoglobin);
    ut_free(gpc_hemoglobin);
    ut_free_system(sys);
    return 0;
}

Save this as test.c and then

$ gcc test.c -l udunits2 && ./a.out
From: 10.000000 g_hemoglobin/dL
To  : 10.000000 g%_hemoglobin

So this means that we are doing something wrong with that percentage symbol somewhere.

@Enchufa2
Copy link
Member

However:

#include <stdio.h>
#include <udunits2/udunits2.h>

int main() {
    ut_set_error_message_handler((ut_error_message_handler) ut_ignore);
    ut_system *sys = ut_read_xml(NULL);
    ut_encoding enc = UT_UTF8;
    ut_set_error_message_handler((ut_error_message_handler) vprintf);
    
    // install g_hemoglobin
    ut_unit *g_hemoglobin = ut_new_base_unit(sys);
    ut_map_symbol_to_unit("g_hemoglobin", enc, g_hemoglobin);
    ut_map_unit_to_symbol(g_hemoglobin, "g_hemoglobin", enc);

    // install g%_hemoglobin
    ut_unit *gpc_hemoglobin = ut_parse(sys, "1 g_hemoglobin/dL", enc);
    ut_map_symbol_to_unit("g%_hemoglobin", enc, gpc_hemoglobin);
    ut_map_unit_to_symbol(gpc_hemoglobin, "g%_hemoglobin", enc);

    // conversion works
    char *from = "g_hemoglobin/dL", to[128];
    ut_unit *from_u = ut_parse(sys, from, enc);
    cv_converter *cv = ut_get_converter(from_u, gpc_hemoglobin);
    ut_format(gpc_hemoglobin, to, sizeof(to), enc);
    double x = 10;
    printf("From: %f %s\n", x, from);
    printf("To  : %f %s\n", cv_convert_double(cv, x), to);

    // parsing doesn't work
    ut_unit *u = ut_parse(sys, to, enc);
    if (!u) printf("NULL unit!\n");
    
    cv_free(cv);
    ut_free(from_u);
    ut_free(g_hemoglobin);
    ut_free(gpc_hemoglobin);
    ut_free_system(sys);
    return 0;
}
$ gcc test.c -l udunits2 && ./a.out
From: 10.000000 g_hemoglobin/dL
To  : 10.000000 g%_hemoglobin
NULL unit!

@billdenney
Copy link
Contributor Author

Thanks for reporting it upstream!

@billdenney
Copy link
Contributor Author

Since there doesn't appear to be movement upstream, would it be reasonable to add some form of test here to see if it works? Perhaps within install_units() add a test like set_units(1, new_unit_name), see if it succeeds, and if not raise an informative error to the user?

@Enchufa2
Copy link
Member

Enchufa2 commented Feb 3, 2022

The thing is that, as shown in the reprex, installation and even conversion works. It's parsing after the unit was installed what has a bug and doesn't work, and it's only for these cases with a percentage in the name. So not a fan of adding this test just for a bug in an edge case which in turn succeeded (and the database of units is already "polluted" with an unparseable identifier).

We will just not allow percentage characters in symbols/names until this issue is properly solved upstream (implemented in c9e958f):

units::install_unit(symbol="g%_hemoglobin", def="1 g_hemoglobin/dL")
#> Error in units::install_unit(symbol = "g%_hemoglobin", def = "1 g_hemoglobin/dL"): 
#>   Symbols/names with a percentage character '%' are not allowed (see https://github.com/r-quantities/units/issues/289)

@billdenney
Copy link
Contributor Author

Unfortunately, the UDUNITS2 grammar is more complex. As I read it, it allows a percent sign by itself but not a percent sign as part of a longer units string (https://www.unidata.ucar.edu/software/udunits/udunits-2.1.24/udunits2lib.html#Grammar). So, this would break the percent ("%") unit.

I'm guessing that it does not make sense for you to do a full grammar check before installing the unit. So, this may be a better thing to wait until fixed upstream.

@Enchufa2
Copy link
Member

Enchufa2 commented Feb 3, 2022

No, this doesn't break anything. We are just disabling any new user-provided symbol or name with a percentage sign in it. The existing percent unit is safely loaded from the system XML.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants