Adding Support for Tag Directives #35

jess-sol · 2016-09-26T17:05:00Z

Hey all,

I'm working on a project that requires support for parsing Yaml documents with shorthand tag directives. I'm interested in looking into adding this support for yaml_rust, instead of writing my own Yaml parser. That said I don't have much experience with the yaml-rust project, or writing parsers in general.

So I wanted to start a discussion on what this feature should look like, how it should function, and how it may be implemented.


chyh1990 · 2016-09-28T08:53:58Z

This library currently supports several standard tag directives, e.g. !!str. But this feature is seldom used; can you give me some typical use cases? And how do other YAML libraries support this feature (for example, in Python or Ruby)?

jess-sol · 2016-09-28T15:23:56Z

Sure, I need to parse a Yaml CloudFormation template, which may or may not have local tags in it. I need to replace some of those tags (and their values) with new values; but some tags I just need to leave alone, so they'll be the same when I dump the Yaml. Here's a really simple example from one of my CloudFormation files:

WebInstance:
    Type: AWS::EC2::Instance
    CreationPolicy:
        ResourceSignal:
            Region: !Ref AWS::Region
            Timeout: !GetAtt ["a", "b", "c"]

I looked at how Ruby and Python handle custom tags. Tags are used to express the type of the next node in the Yaml document. Ruby and Python both expose constructor functions that let the user register a function to be called when a specific tag is encountered; the result of this function is then used as the value of the tagged node.

In Ruby:

require 'yaml'
YAML.add_domain_type '', 'Ref' do |type,val| { "Reference": val } end
thing = YAML.load_file('../../small-test.yaml')
puts thing.inspect

Which returns

{"WebInstance"=>{"Type"=>"AWS::EC2::Instance", "CreationPolicy"=>{"ResourceSignal"=>{"Region"=>{:Reference=>"AWS::Region"}, "Timeout"=>["a", "b", "c"]}}}}

There are actually four different methods in Ruby's Yaml module: add_builtin_type, add_domain_type, add_private_type, add_ruby_type Docs. They all do pretty much the same thing though, the major difference is the prefix of the tag (if it's a global tag from yaml.org, a custom domain, a private tag, or a global tag from ruby.yaml.org, respectively).

Ruby's implementation seems a bit wonky, they don't explicitly allow you to add local tags (aka !Ref), so instead you must use add_domain_type without a domain.

In Python:

import yaml
from pprint import pprint

class CFReference:
    def __init__(self, reference):
        self.reference = reference

class CFGetAtt:
    def __init__(self, attribute):
        self.attribute = attribute

def getatt_constructor(loader, node):
    return CFGetAtt(loader.construct_sequence(node))

def ref_constructor(loader, node):
    return CFReference(loader.construct_scalar(node))

yaml.add_constructor('!Ref', ref_constructor)
yaml.add_constructor('!GetAtt', getatt_constructor)

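# PyYAML 6 requires an explicit Loader; add_constructor() without a Loader argument also registers on FullLoader.
with open('../../small-test.yaml', 'r') as fh:
    pprint(yaml.load(fh, Loader=yaml.FullLoader))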

Which returns

{'WebInstance': {'CreationPolicy': {'ResourceSignal': {'Region': <__main__.CFReference instance at 0x10b4426c8>,'Timeout': <__main__.CFGetAtt instance at 0x10b442710>}},'Type': 'AWS::EC2::Instance'}}

Python Docs

Python also has support for representers, functions that take an instance of a class and return its Yaml equivalent for serialization. I'm not 100% sure how Ruby does this, perhaps by calling to_yaml on objects.

Unlike Ruby, Python will fail if it encounters a tag it can't construct. The Yaml spec says this about unresolved tags:

If a document contains unresolved tags, the YAML processor is unable to compose a complete representation graph. In such a case, the YAML processor may compose a partial representation, based on each node’s kind and allowing for non-specific tags.

A few of the features I think would be useful:

- Generalized tag constructors
  Python allows you to specify a prefix, and all tags starting with that prefix get passed to a specific constructor. This would be really useful for me, especially if that prefix can be blank (aka a catchall constructor).
  Alternatively, instead of registering constructors, this library exposes a single function that gets called for all unrecognized tags.
- Tags are dumped correctly
  This is really important for my use case, because I don't want to do anything with some tags. It'd be really nice if serialize(deserialize(yaml)) == yaml with respect to tagging. Or at least make this achievable (even if I have to translate tagged values to structs during deserialization, and when dumping those structs explicitly state what tag they should have when they're serialized).
- Unresolved tags
  I don't think unresolved tags should be an issue for this library. Unlike serde_yaml, this library doesn't deserialize to structs; so unresolved tags should be much less of an issue. Maybe unresolved tags get a special struct that derefs to the value (this way the tag is still available). This way serde_yaml could then fail if it encounters an unresolved tag, but this library wouldn't need to.
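One way the wish list above could fit together, as a rough sketch and not a proposal for yaml-rust's actual API: constructors are registered by tag prefix (an empty prefix acts as the catch-all), and a tag that nobody claims is kept in a wrapper that derefs to the underlying value, so it can still be written back out. `Value`, `Tagged`, and `Registry` are hypothetical names made up for this example; none of them exist in yaml-rust.

```rust
use std::ops::Deref;

/// Minimal stand-in for a YAML DOM value (hypothetical, not yaml-rust's `Yaml`).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Str(String),
    Seq(Vec<Value>),
    /// A node whose tag no constructor claimed; the tag is kept for dumping.
    Tagged(Box<Tagged>),
}

/// Wrapper that remembers the tag but derefs to the inner value.
#[derive(Debug, Clone, PartialEq)]
pub struct Tagged {
    pub tag: String,
    pub value: Value,
}

impl Deref for Tagged {
    type Target = Value;
    fn deref(&self) -> &Value {
        &self.value
    }
}

type Constructor = Box<dyn Fn(&str, Value) -> Value>;

/// Constructors keyed by tag prefix; an empty prefix matches every tag.
#[derive(Default)]
pub struct Registry {
    by_prefix: Vec<(String, Constructor)>,
}

impl Registry {
    pub fn register<F>(&mut self, prefix: &str, f: F)
    where
        F: Fn(&str, Value) -> Value + 'static,
    {
        self.by_prefix.push((prefix.to_string(), Box::new(f)));
    }

    /// Uses the longest matching prefix; wraps the value in `Tagged` if none match.
    pub fn construct(&self, tag: &str, value: Value) -> Value {
        let best = self
            .by_prefix
            .iter()
            .filter(|(p, _)| tag.starts_with(p.as_str()))
            .max_by_key(|(p, _)| p.len());
        match best {
            Some((_, f)) => f(tag, value),
            None => Value::Tagged(Box::new(Tagged { tag: tag.to_string(), value })),
        }
    }
}
```

With a constructor registered for `!Ref` only, `construct("!GetAtt", ...)` returns `Value::Tagged` with the tag intact, which is the information a dumper would need to emit the node unchanged.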

DeltaEvo · 2016-09-28T19:42:23Z

I need this functionality too. Currently I copy-paste the YamlLoader struct and add this code to the Scalar arm of the match in on_event.

It's a tag to include one YAML tree in another.

Example:

hello: !include 'hello.yaml'

The code

                if style == TScalarStyle::SingleQuoted {
                    if let Some(TokenType::Tag(ref handle, ref suffix)) = *tag {
                        if *handle == "!" && *suffix == "include" {
                            let mut content = String::new();
                            let node = match File::open(Path::new(self.root.as_str()).join(v)){
                                Ok(mut f) => {
                                    let _ = f.read_to_string(&mut content);
                                    match YamlLoader::load_from_str(content.as_str() , self.root.clone()) {
                                        Ok(mut docs) => docs.pop().unwrap(),
                                        Err(_) => Yaml::BadValue
                                    }
                                }
                                Err(_) => Yaml::BadValue
                            };
                            self.insert_new_node((node , aid));
                            return;
                        }
                    }
                }

But it would be great to have an API for this
(and, more generally, an API to customise YamlLoader)

jess-sol · 2016-09-28T19:56:48Z

That's funny @DeltaEvolution, that was one of the use cases I wanted to add with tags, file inclusion. That's a decent example you have there for it, in the meantime.

DeltaEvo · 2016-09-28T20:05:42Z

@ioben ;)
@chyh1990 If you want more examples from other libraries, I have the !include tag working with snakeyaml in Java

DeltaEvo · 2016-09-29T19:46:48Z

I have created a very simple implementation here

A parser using this API looks like this:

    struct HelloParser;

    impl YamlScalarParser for HelloParser {
        fn parse_scalar(&self, tag: &TokenType, value: &String) -> Option<Yaml> {
            if let TokenType::Tag(ref handle, ref suffix) = *tag {
                if *handle == "!" && *suffix == "hello" {
                    return Some(Yaml::String("Hello ".to_string() + value))
                }
            }
            None
        }
    }

So you can parse not only tags but all types of scalar values

flyx · 2016-10-01T15:01:24Z

Hi folks!

Not sure how I ended up here, but since it has been asked how other libraries use this feature, I think I can contribute some thoughts. I am the author of NimYAML, which employs tags by default. Since most other YAML implementations are written for languages with dynamic typing, we might benefit from sharing ideas here.

First, a word on terminology: A tag directive is a prefix before the YAML document; what you are talking about is tag handles:

%YAML 1.2
%TAG !n! tag:yaml.org,2002: # << this is a tag directive
---
!n! tagged scalar # << this is a tag handle

Tag handles are used to explicitly denote the type of a YAML node. The type of a YAML node without a tag is deduced from its content and its ancestors in the same document. Tag handles can be employed for type safety. For example, in NimYAML, I can serialize a sequence of integer values like this (I hope this code is readable for people not knowing Nim; @[] is a sequence constructor):

import yaml.serialization
echo dump(@[1, 2, 3])

Which will generate:

%YAML 1.2
---
!nim:system:seq(nim:system:int64) [1, 2, 3]

When loading this YAML file with NimYAML, it will check whether it is loaded into a sequence type that matches the tag of the value and raise an error if it doesn't. That way, type errors can be discovered early. Note that the numbers inside the sequence do not need tags because their tag is implicitly given by an ancestor (the sequence).
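For reference, the handle expansion behind the `%TAG` example at the top of this comment can be sketched roughly as below. It follows the YAML 1.2 defaults (`!` stays a local tag, `!!` maps to `tag:yaml.org,2002:`, named handles must be declared), skips percent-decoding of the suffix, and uses a hypothetical `resolve_tag` function with a plain map of declared handles. It is not how yaml-rust or NimYAML implement this.

```rust
use std::collections::HashMap;

/// Expands a shorthand tag such as `!n!str` into a full tag, using the
/// prefixes declared by `%TAG` directives (handle -> prefix).
/// Returns `None` for input that is not a tag or uses an undeclared named handle.
pub fn resolve_tag(directives: &HashMap<String, String>, shorthand: &str) -> Option<String> {
    // Verbatim tags (`!<...>`) are already complete.
    if let Some(inner) = shorthand.strip_prefix("!<").and_then(|s| s.strip_suffix('>')) {
        return Some(inner.to_string());
    }
    // Split into handle and suffix:
    // `!n!str` -> (`!n!`, `str`), `!!str` -> (`!!`, `str`), `!Ref` -> (`!`, `Ref`).
    let rest = shorthand.strip_prefix('!')?;
    let (handle, suffix) = match rest.find('!') {
        Some(i) => (&shorthand[..i + 2], &rest[i + 1..]),
        None => ("!", rest),
    };
    let prefix = match directives.get(handle) {
        Some(p) => p.as_str(),
        None => match handle {
            "!" => "!",                   // primary handle: local tag by default
            "!!" => "tag:yaml.org,2002:", // secondary handle default
            _ => return None,             // named handles must be declared
        },
    };
    Some(format!("{}{}", prefix, suffix))
}
```

Under these assumptions, a document that declares `%TAG !n! tag:yaml.org,2002:` resolves `!n!str` and `!!str` to the same full tag, while `!Ref` stays a local tag unless the `!` handle itself is redefined.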

Now I understand that yaml-rust does not load YAML into native types, but rather into a DOM. This corresponds to the Representation (Node Graph) as defined in the spec. As you can see in the spec, tags are part of that Node Graph, so there is nothing yaml-rust itself is required to do with them if its API barrier is between the Node Graph and native data structures. Its only responsibility should be to collect the tags present in the YAML input and make them accessible in its DOM. NimYAML also has a DOM API, but its use is discouraged because serializing to and from native types is more convenient and automatically includes type checks.

A use-case which might be relevant for yaml-rust is tag handles in sequences or mappings with heterogeneous value types. An example from NimYAML documentation:

%YAML 1.2
---
- this is a string
- 42
- false
- !!str 23
- !nim:demo:Person {name: Trillian}
- !!null

To process this sequence, the type of each value must be evaluated. For scalar values, this could be done by guessing the type (this is a string -> string, 42 -> integer). But if you have multiple complex types that can occur inside this sequence, it can be very impractical (or downright impossible) to infer their type from the value. Therefore, using tags here is very helpful. Since I do not want to clutter this conversation with code too much, I will simply link to the NimYAML examples to show how this data is loaded with NimYAML.

So, to sum up: My advice would be to simply make tags queryable from your DOM interface and let the caller decide what to do with them. They are very relevant for implementing automatic (de-)serialization of native types, and perhaps you want to add that some day to yaml-rust.
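A minimal sketch of what "tags queryable from the DOM" might look like, assuming a hypothetical `Node`/`Kind` pair in place of yaml-rust's `Yaml` enum (which, per this discussion, does not keep tags). Each node stores the tag it was written with, and the caller walks the tree and decides what to do with it.

```rust
/// Hypothetical DOM node that records the explicit tag from the input, if any.
#[derive(Debug, Clone, PartialEq)]
pub struct Node {
    pub tag: Option<String>,
    pub kind: Kind,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Kind {
    Scalar(String),
    Sequence(Vec<Node>),
    Mapping(Vec<(Node, Node)>),
}

impl Node {
    /// Convenience constructor for scalar nodes.
    pub fn scalar(tag: Option<&str>, text: &str) -> Node {
        Node {
            tag: tag.map(str::to_string),
            kind: Kind::Scalar(text.to_string()),
        }
    }

    /// Depth-first list of every explicit tag in the tree, in document order.
    pub fn tags(&self) -> Vec<&str> {
        let mut out = Vec::new();
        self.collect_tags(&mut out);
        out
    }

    fn collect_tags<'a>(&'a self, out: &mut Vec<&'a str>) {
        if let Some(t) = &self.tag {
            out.push(t.as_str());
        }
        match &self.kind {
            Kind::Scalar(_) => {}
            Kind::Sequence(items) => {
                for item in items {
                    item.collect_tags(out);
                }
            }
            Kind::Mapping(pairs) => {
                for (key, value) in pairs {
                    key.collect_tags(out);
                    value.collect_tags(out);
                }
            }
        }
    }
}
```

Because the tag is plain data on the node, the library itself never has to interpret it; a caller that wants typed values can dispatch on `node.tag`, and a dumper could write it back out unchanged.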

Cheers!

chyh1990 · 2016-10-10T01:50:49Z

Thanks for the detailed discussion!

Here are my opinions:

1. Handling tags with hooks (the same way as in Python or Ruby) is easy; maybe it can be added in the next major release. @ioben @DeltaEvolution
2. Unfortunately, yaml-rust does not yet implement a full DOM containing all the metadata in the original text file (e.g. tags, marks, etc.). This makes serialize(deserialize(yaml)) == yaml impossible. The serializer is preliminary; we need contributions for that. @ioben
3. Tag directives for standard types are already supported by yaml-rust (as is type guessing), but for a strongly typed language like Rust, maybe automatic type casting is not easy? @flyx

@DeltaEvolution Can you generalize your patch and submit a PR?

flyx · 2016-10-10T07:08:29Z

@chyh1990 For strongly typed languages, the root type must be known at compile time, and the only possible check is whether the YAML has the correct tag on its root element; that is correct. However, if you have

enum Message {
    Quit,
    ChangeColor(i32, i32, i32),
    Move { x: i32, y: i32 },
    Write(String),
}

(forgive me, I do not really know Rust and just copy things from the documentation), you can use Vec<Message> as root type, which would make it possible to deserialize this to a native type:

- !Quit  # no value
- !ChangeColor [1, 2, 3] # sequence as child
- !Move {x: 23, y: 42} # mapping as child
- !Write some string # scalar as child

If you have exactly one value for each enum item, and all these values have different types, you can match the tag (or the guessed type) against the value types instead of using the enum items as tags. This makes it possible to deserialize complex heterogeneous structures. You do not lose type safety and do not do any type casting.
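To make that concrete, here is a rough sketch of matching local tags against the enum variants. `Content`, `int`, `field`, and `to_message` are hypothetical stand-ins for whatever an already-parsed node would look like. No YAML parsing happens here, and scalars stay strings until the tag says how to read them.

```rust
#[derive(Debug, PartialEq)]
pub enum Message {
    Quit,
    ChangeColor(i32, i32, i32),
    Move { x: i32, y: i32 },
    Write(String),
}

/// Hypothetical untyped content of an already-parsed, tagged node.
#[derive(Debug, Clone)]
pub enum Content {
    Empty,
    Scalar(String),
    Sequence(Vec<String>),
    Mapping(Vec<(String, String)>),
}

fn int(s: &str) -> Result<i32, String> {
    s.parse::<i32>().map_err(|e| format!("bad integer {:?}: {}", s, e))
}

/// Looks up `key` in a mapping and parses its value as an integer.
fn field(pairs: &[(String, String)], key: &str) -> Result<i32, String> {
    let (_, v) = pairs
        .iter()
        .find(|(k, _)| k.as_str() == key)
        .ok_or_else(|| format!("missing key {:?}", key))?;
    int(v)
}

/// Picks the enum variant from the tag, then checks that the content fits it.
pub fn to_message(tag: &str, content: &Content) -> Result<Message, String> {
    match (tag, content) {
        ("!Quit", Content::Empty) => Ok(Message::Quit),
        ("!ChangeColor", Content::Sequence(v)) if v.len() == 3 => {
            Ok(Message::ChangeColor(int(&v[0])?, int(&v[1])?, int(&v[2])?))
        }
        ("!Move", Content::Mapping(pairs)) => Ok(Message::Move {
            x: field(pairs, "x")?,
            y: field(pairs, "y")?,
        }),
        ("!Write", Content::Scalar(s)) => Ok(Message::Write(s.clone())),
        _ => Err(format!("tag {} does not match content {:?}", tag, content)),
    }
}
```

A mismatch between tag and content (say, `!Move` on a scalar) is reported as an error and never coerced, which is the type-safety point being made: the tag selects the variant and nothing is cast.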

chyh1990 · 2016-11-04T08:27:13Z

@DeltaEvolution proposed a new API for tag parsing (see #37). It looks OK to me. Any opinions on the PR? @flyx @ioben

trans · 2016-11-16T19:31:08Z

I don't know Rust so forgive me that I can't give more Rust-oriented advice. But I know YAML quite well. What is often overlooked in implementations, besides the intermediate representation graph that @flyx mentions, is tag schemas. I am not sure a single implementation of YAML currently in the wild actually handles this correctly. A complete implementation would allow alternate schemas to be defined in order to fully control the use of tags in the processes of loading and dumping, and would of course include the JSON and Core schemas out of the box.

I am presently working on adding this support to Crystal, which is built on top of libyaml. When I am done I'll drop a line here if anyone wants to take a look for reference sake.
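As a very small illustration of swappable schemas (not taken from any existing library): the sketch below defines a hypothetical `Schema` trait and two truncated implementations that cover only the null and boolean rules of the YAML 1.2 JSON and Core schemas. Numbers and strings are deliberately left out, so `None` just means "not classified by this sketch".

```rust
/// What a plain (untagged) scalar resolved to. Only null and bool are modelled.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Resolved {
    Null,
    Bool(bool),
}

/// A schema decides how untagged plain scalars are resolved.
pub trait Schema {
    /// Returns `None` for anything this sketch does not classify.
    fn resolve_plain(&self, text: &str) -> Option<Resolved>;
}

pub struct JsonSchema;
pub struct CoreSchema;

impl Schema for JsonSchema {
    fn resolve_plain(&self, text: &str) -> Option<Resolved> {
        // JSON schema: lowercase spellings only.
        match text {
            "null" => Some(Resolved::Null),
            "true" => Some(Resolved::Bool(true)),
            "false" => Some(Resolved::Bool(false)),
            _ => None,
        }
    }
}

impl Schema for CoreSchema {
    fn resolve_plain(&self, text: &str) -> Option<Resolved> {
        // Core schema: extra spellings, plus `~` and the empty scalar for null.
        match text {
            "" | "~" | "null" | "Null" | "NULL" => Some(Resolved::Null),
            "true" | "True" | "TRUE" => Some(Resolved::Bool(true)),
            "false" | "False" | "FALSE" => Some(Resolved::Bool(false)),
            _ => None,
        }
    }
}

/// A loader step parameterised over the schema, so callers can swap it.
pub fn resolve_all(schema: &dyn Schema, scalars: &[&str]) -> Vec<Option<Resolved>> {
    scalars.iter().map(|s| schema.resolve_plain(s)).collect()
}
```

The same input gives different results depending on the schema passed in: `~` and `True` resolve under `CoreSchema` but not under `JsonSchema`, which is the kind of control over loading that a pluggable schema would give.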

flyx · 2016-11-16T20:29:00Z

@trans I don't think that it's overlooked. The whole idea of having programming language independent tags never really took off, because YAML is being used more for serialization/deserialization within the same application and for configuration. In popular YAML implementations, you can specify custom tags in addition to those supported out of the box, and the spec advises that other schemas are based on the core schema, so I don't really see how that is not correct.

softprops · 2020-06-05T06:56:10Z

What's the status of this? I just surveyed the Rust YAML ecosystem, and this crate looks to be the most complete, despite missing features like this that would make it complete or at least on par with YAML tooling in other language ecosystems.

Have others been able to work around this yet? I also need to parse CloudFormation YAML templates that leverage several !Foo tags: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/intrinsic-function-reference.html

theblazehen · 2020-06-05T10:28:36Z

yaml-rust/tests/specexamples.rs.inc

Lines 55 to 73 in 9f7d848

    
// TODO: 2.19 - 2.22 schema tags

const EX2_23 : &'static str =
    "---\nnot-date: !!str 2002-04-28\n\npicture: !!binary |\n R0lGODlhDAAMAIQAAP//9/X\n 17unp5WZmZgAAAOfn515eXv\n Pz7Y6OjuDg4J+fn5OTk6enp\n 56enmleECcgggoBADs=\n\napplication specific tag: !something |\n The semantics of the tag\n above may be different for\n different documents.";

const EX2_24 : &'static str =
    "%TAG ! tag:clarkevans.com,2002:\n--- !shape\n  # Use the ! handle for presenting\n  # tag:clarkevans.com,2002:circle\n- !circle\n  center: &ORIGIN {x: 73, y: 129}\n  radius: 7\n- !line\n  start: *ORIGIN\n  finish: { x: 89, y: 102 }\n- !label\n  start: *ORIGIN\n  color: 0xFFEEBB\n  text: Pretty vector drawing.";

const EX2_25 : &'static str =
    "# Sets are represented as a\n# Mapping where each key is\n# associated with a null value\n--- !!set\n? Mark McGwire\n? Sammy Sosa\n? Ken Griffey";

const EX2_26 : &'static str =
    "# Ordered maps are represented as\n# A sequence of mappings, with\n# each mapping having one key\n--- !!omap\n- Mark McGwire: 65\n- Sammy Sosa: 63\n- Ken Griffey: 58";

const EX2_27 : &'static str =
    "--- !<tag:clarkevans.com,2002:invoice>\ninvoice: 34843\ndate   : 2001-01-23\nbill-to: &id001\n    given  : Chris\n    family : Dumars\n    address:\n        lines: |\n            458 Walkman Dr.\n            Suite #292\n        city    : Royal Oak\n        state   : MI\n        postal  : 48046\nship-to: *id001\nproduct:\n    - sku         : BL394D\n      quantity    : 4\n      description : Basketball\n      price       : 450.00\n    - sku         : BL4438H\n      quantity    : 1\n      description : Super Hoop\n      price       : 2392.00\ntax  : 251.42\ntotal: 4443.52\ncomments:\n    Late afternoon is best.\n    Backup contact is Nancy\n    Billsmer @ 338-4338.";

const EX2_28 : &'static str =
    "---\nTime: 2001-11-23 15:01:42 -5\nUser: ed\nWarning:\n  This is an error message\n  for the log file\n---\nTime: 2001-11-23 15:02:31 -5\nUser: ed\nWarning:\n  A slightly different error\n  message.\n---\nDate: 2001-11-23 15:03:17 -5\nUser: ed\nFatal:\n  Unknown variable \"bar\"\nStack:\n  - file: TopClass.py\n    line: 23\n    code: |\n      x = MoreObject(\"345\\n\")\n  - file: MoreClass.py\n    line: 58\n    code: |-\n      foo = bar";

Seems like it's in the tests at least, and I saw a few mentions in the code. Hopefully it'll be released soon?

softprops · 2020-06-05T21:21:32Z

Is there an interface on master for registering tags?

davvid · 2024-01-29T02:57:39Z

FWIW I've merged #135 (a rebased version of #37) into my fork: https://github.com/davvid/yaml-rust

DeltaEvo linked a pull request Oct 10, 2016 that will close this issue

Scalar parser #37

Open

zuowenjian mentioned this issue Apr 17, 2018

parse rigger-ng yaml config file xcodecraft/rigger#6

Open

dchakrav-github mentioned this issue Nov 2, 2021

Allowing tags to be propagated for other events as well #180

Open

mrgrain mentioned this issue Jul 11, 2024

Tag handle support Ethiraric/yaml-rust2#34

Closed
