Correlation
The #valid_correlation test in HXL schemas lets you check whether a value in one column always corresponds to the same value in another column. The test is one-way: it checks that each value in the columns matched by the tag patterns in #valid_correlation always corresponds to the same value in the column matched by the tag pattern in #valid_tag, but not the other way around. A couple of examples will help make this clear.
Consider the following data:
#adm1+name | #adm1+code |
---|---|
Coast Province | XX01 |
Cast Province | XX01 |
Coast Province | X01 |
Coast Province | XX01 |
Coast Province | XX01 |
You could use the following rule to test for correlation from the first column to the second:
{
"#valid_tag":"#adm1+name",
"#valid_correlation":"#adm1+code",
"#description":"Province name does not correspond with P-code"
}
With this rule, the HXL validation engine would detect the error "Cast Province" => "XX01" (and suggest the correction "Coast Province"), but would not detect the error "Coast Province" => "X01".
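The one-way check described above can be sketched in plain Python. This is only an illustration, not the HXL validation engine's actual code: the function name `find_correlation_errors` is hypothetical, and picking the most common value as the suggested correction is an assumption about how the suggestion is chosen.

```python
from collections import Counter, defaultdict

def find_correlation_errors(rows, tag_col, correlation_col):
    """Hypothetical helper: for each value in correlation_col, flag tag_col
    values that differ from the most common one seen with it (assumption)."""
    seen = defaultdict(Counter)
    for row in rows:
        seen[row[correlation_col]][row[tag_col]] += 1

    errors = []
    for row_number, row in enumerate(rows, start=1):
        expected = seen[row[correlation_col]].most_common(1)[0][0]
        if row[tag_col] != expected:
            # (data row number, offending value, suggested correction)
            errors.append((row_number, row[tag_col], expected))
    return errors

# The sample data from the table above
rows = [
    {"#adm1+name": "Coast Province", "#adm1+code": "XX01"},
    {"#adm1+name": "Cast Province", "#adm1+code": "XX01"},
    {"#adm1+name": "Coast Province", "#adm1+code": "X01"},
    {"#adm1+name": "Coast Province", "#adm1+code": "XX01"},
    {"#adm1+name": "Coast Province", "#adm1+code": "XX01"},
]

# "#valid_tag" is the name column, "#valid_correlation" is the code column
name_errors = find_correlation_errors(rows, "#adm1+name", "#adm1+code")
print(name_errors)  # [(2, 'Cast Province', 'Coast Province')]
```

The row with "X01" is not flagged because "X01" appears with only one name, so from the point of view of this rule there is nothing inconsistent about it.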
The following rule tries the correlation the other way:
{
"#valid_tag":"#adm1+code",
"#valid_correlation":"#adm1+name",
"#description":"Province P-code does not correspond with name"
}
With this rule, the HXL validation engine would detect the error "X01" => "Coast Province" (and suggest the correction "XX01"), but would not detect the error "XX01" => "Cast Province".
When you require a two-way correspondence, you need to add two reciprocal correlation rules:
[
{
"#valid_tag":"#adm1+name",
"#valid_correlation":"#adm1+code",
"#description":"Province name does not correspond with P-code"
},
{
"#valid_tag":"#adm1+code",
"#valid_correlation":"#adm1+name",
"#description":"Province P-code does not correspond with name"
}
]
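To see how the two reciprocal rules complement each other, the sketch below loads the rule list as JSON and applies each rule in turn to the sample data. It is a simplified model rather than the real validator: `apply_correlation_rule` is a hypothetical function, and it assumes that the most common value for each key is treated as the correct one.

```python
import json
from collections import Counter, defaultdict

RULES_JSON = """
[
  {"#valid_tag": "#adm1+name", "#valid_correlation": "#adm1+code",
   "#description": "Province name does not correspond with P-code"},
  {"#valid_tag": "#adm1+code", "#valid_correlation": "#adm1+name",
   "#description": "Province P-code does not correspond with name"}
]
"""

def apply_correlation_rule(rows, rule):
    """Hypothetical one-way check driven by a single schema rule."""
    tag, key = rule["#valid_tag"], rule["#valid_correlation"]
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[key]][row[tag]] += 1
    errors = []
    for row in rows:
        # Assumption: the majority value for this key is the expected one
        expected = counts[row[key]].most_common(1)[0][0]
        if row[tag] != expected:
            errors.append((rule["#description"], row[tag], expected))
    return errors

header = ("#adm1+name", "#adm1+code")
data = [
    ("Coast Province", "XX01"),
    ("Cast Province", "XX01"),
    ("Coast Province", "X01"),
    ("Coast Province", "XX01"),
    ("Coast Province", "XX01"),
]
rows = [dict(zip(header, values)) for values in data]

all_errors = [
    error
    for rule in json.loads(RULES_JSON)
    for error in apply_correlation_rule(rows, rule)
]
for description, value, suggestion in all_errors:
    print(f"{description}: {value!r} (suggested: {suggestion!r})")
```

Under these assumptions, the first rule reports "Cast Province" and the second reports "X01", so together they catch both errors in the sample data.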
This extra complexity is needed because correlations aren't always reciprocal. For example, "Prefecture A" and "Prefecture B" might both be admin2 areas inside the same province. Consider the following:
#adm1+name | #adm2+name |
---|---|
Coast Province | Prefecture A |
XXX | Prefecture A |
Coast Province | Prefecture B |
Coast Province | Prefecture B |
Coast Province | Prefecture A |
Both "Prefecture A" and "Prefecture B" are acceptable values for #adm2+name when #adm1+name is "Coast Province" (the province can have multiple prefectures), but only "Coast Province" is an acceptable value for #adm1+name when #adm2+name is "Prefecture A" (the prefecture is always in the same province). The following rule checks this one-way correspondence:
{
"#valid_tag":"#adm1+name",
"#valid_correlation":"#adm2+name",
"#description":"Wrong province for prefecture"
}
It will correctly identify "XXX" as an error (and suggest "Coast Province" as a correction), but will not complain that both "Prefecture A" and "Prefecture B" are in the same province.
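A quick way to see why only one direction makes sense here is to collect the distinct values that appear on each side of the relationship. The sketch below does this for the sample data; `distinct_values_per_key` is a hypothetical helper written for this illustration and is not part of any HXL library.

```python
from collections import defaultdict

def distinct_values_per_key(pairs):
    """Hypothetical helper: map each key to the set of values seen with it."""
    groups = defaultdict(set)
    for key, value in pairs:
        groups[key].add(value)
    return dict(groups)

# (#adm1+name, #adm2+name) pairs from the table above
rows = [
    ("Coast Province", "Prefecture A"),
    ("XXX", "Prefecture A"),
    ("Coast Province", "Prefecture B"),
    ("Coast Province", "Prefecture B"),
    ("Coast Province", "Prefecture A"),
]

# Direction tested by the rule: each prefecture should have exactly one province
provinces_by_prefecture = distinct_values_per_key(
    (adm2, adm1) for adm1, adm2 in rows
)

# Reverse direction: a province may legitimately contain several prefectures
prefectures_by_province = distinct_values_per_key(
    (adm1, adm2) for adm1, adm2 in rows
)

print(provinces_by_prefecture)
print(prefectures_by_province)
```

"Prefecture A" maps to two provinces, which points to a data error, while "Coast Province" mapping to two prefectures is expected. A reciprocal rule would therefore report valid rows as errors.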
Learn more about the HXL standard at http://hxlstandard.org