Implement load_from_bytes #156

Open
wants to merge 2 commits into
base: master

@mkmik mkmik commented May 5, 2020

Closes #155

Also helps in some cases with #142, when the BOM is at the beginning of the file (common),
but not in the corner case where the BOM is at the start of a document which is not the first one.

CC @glyn
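
For readers following along, the BOM handling described above can be sketched with the standard library alone. This is an illustrative sketch, not the PR's code: the `Bom` enum and the `detect_bom` and `decode_bytes` functions are hypothetical names, UTF-32 is ignored, input without a BOM is assumed to be UTF-8, and only a BOM at the very start of the input is considered (so it shares the limitation noted above for later documents in a stream).

```rust
/// Byte order marks this sketch recognises (hypothetical type, not from the PR).
#[derive(Debug, PartialEq)]
enum Bom {
    Utf8,
    Utf16Le,
    Utf16Be,
    Absent,
}

/// Inspect the first bytes and return the detected BOM and its length in bytes.
fn detect_bom(bytes: &[u8]) -> (Bom, usize) {
    match bytes {
        [0xEF, 0xBB, 0xBF, ..] => (Bom::Utf8, 3),
        [0xFF, 0xFE, ..] => (Bom::Utf16Le, 2),
        [0xFE, 0xFF, ..] => (Bom::Utf16Be, 2),
        _ => (Bom::Absent, 0),
    }
}

/// Strip a leading BOM and decode the rest strictly (any invalid input is an error).
/// Assumption: input without a BOM is treated as UTF-8.
fn decode_bytes(bytes: &[u8]) -> Result<String, String> {
    let (bom, bom_len) = detect_bom(bytes);
    let body = &bytes[bom_len..];
    match bom {
        Bom::Utf8 | Bom::Absent => {
            String::from_utf8(body.to_vec()).map_err(|e| e.to_string())
        }
        Bom::Utf16Le | Bom::Utf16Be => {
            if body.len() % 2 != 0 {
                return Err("UTF-16 input has an odd number of bytes".to_string());
            }
            let little_endian = bom == Bom::Utf16Le;
            // Reassemble 16-bit code units in the byte order the BOM announced.
            let units: Vec<u16> = body
                .chunks_exact(2)
                .map(|pair| {
                    let raw = [pair[0], pair[1]];
                    if little_endian {
                        u16::from_le_bytes(raw)
                    } else {
                        u16::from_be_bytes(raw)
                    }
                })
                .collect();
            String::from_utf16(&units).map_err(|e| e.to_string())
        }
    }
}
```

The resulting `String` could then be handed to a string-based loader; a BOM that appears at the start of a later document in the same stream would still reach the parser untouched.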

@mkmik
Author

mkmik commented May 5, 2020

Blocked on #139

src/yaml.rs Outdated
// detect_utf16_endianness.
let (res, _) = encoding::types::decode(
&buffer,
encoding::DecoderTrap::Replace,

@mkmik I'd like to understand why this decoder trap is sufficient in all cases. I was expecting it to be necessary to allow the user to choose.

Author

good point; probably strict is a better default and probably the user would want to be able to override this.
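
To make the difference between the two trap behaviours concrete, here is a small std-only analogy. It does not use the `encoding` crate: the `Trap` enum and `decode_utf8` function are hypothetical stand-ins, and the sketch only covers UTF-8. Strict decoding rejects malformed input, while replacing decoding silently substitutes U+FFFD.

```rust
/// Hypothetical stand-in for the two error-handling modes under discussion.
enum Trap {
    Strict,
    Replace,
}

/// Decode UTF-8 bytes, either failing on malformed input or replacing it.
fn decode_utf8(bytes: &[u8], trap: Trap) -> Result<String, std::str::Utf8Error> {
    match trap {
        // Strict: the first invalid sequence aborts decoding with an error.
        Trap::Strict => std::str::from_utf8(bytes).map(str::to_owned),
        // Replace: invalid sequences become U+FFFD and decoding never fails.
        Trap::Replace => Ok(String::from_utf8_lossy(bytes).into_owned()),
    }
}
```

With the replacing mode a corrupted file parses without complaint and the damage only shows up later in the data, which is one argument for making strict the default.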

Author

(changed to Strict)

@glyn what's the sanest way to design an API that would let a user change the decoder error trapping behaviour?

a. Should we make it a mandatory parameter?
b. An Option<>?
c. Two methods, one with defaults and one with some config structure that includes the encoding trap?
d. A "decoder" struct with a "decode" method and a builder pattern to set options:

Decoder::read(file).encoding_trap(mytrap).decode()

?

I quite like (d) as it gives a clean API when you only want the default (strict) behaviour. I could live with (a) or (c). I don't like (b) as it requires an ugly None parameter for the default behaviour.
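
The four options can be compared side by side in a rough sketch. Everything here is hypothetical and std-only: `Trap`, `Config`, `Decoder` and the `load_*` function names are made up for illustration, and decoding is reduced to UTF-8 so that the call sites are the only thing that differs.

```rust
#[derive(Clone, Copy)]
enum Trap {
    Strict,
    Replace,
}

type Decoded = Result<String, std::str::Utf8Error>;

/// Shared helper so that only the API shape differs between the options.
fn decode(bytes: &[u8], trap: Trap) -> Decoded {
    match trap {
        Trap::Strict => std::str::from_utf8(bytes).map(str::to_owned),
        Trap::Replace => Ok(String::from_utf8_lossy(bytes).into_owned()),
    }
}

// (a) Mandatory parameter: every caller has to pick a trap.
fn load_with_trap(bytes: &[u8], trap: Trap) -> Decoded {
    decode(bytes, trap)
}

// (b) Option<>: the default is spelled `None` at every call site.
fn load_opt(bytes: &[u8], trap: Option<Trap>) -> Decoded {
    decode(bytes, trap.unwrap_or(Trap::Strict))
}

// (c) Two functions: one with defaults, one taking a config structure.
struct Config {
    trap: Trap,
}

fn load(bytes: &[u8]) -> Decoded {
    load_with_config(bytes, &Config { trap: Trap::Strict })
}

fn load_with_config(bytes: &[u8], config: &Config) -> Decoded {
    decode(bytes, config.trap)
}

// (d) Builder: defaults need no extra argument, options are opt-in.
struct Decoder<'a> {
    bytes: &'a [u8],
    trap: Trap,
}

impl<'a> Decoder<'a> {
    fn read(bytes: &'a [u8]) -> Self {
        Decoder { bytes, trap: Trap::Strict }
    }

    fn encoding_trap(mut self, trap: Trap) -> Self {
        self.trap = trap;
        self
    }

    fn decode(self) -> Decoded {
        decode(self.bytes, self.trap)
    }
}
```

At the call site, the default case reads `load_opt(bytes, None)` for (b) but simply `Decoder::read(bytes).decode()` for (d), which is the difference in ergonomics being weighed here.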

@chyh1990
Owner

chyh1990 commented Jun 1, 2020

Since this MR introduces a new crate dependency, and users can easily implement this helper method in their own code, I'm not sure whether we should provide this in the yaml crate.

@glyn

glyn commented Jun 1, 2020

Since this MR introduces a new crate dependency, and users can easily implement this helper method in their own code, I'm not sure whether we should provide this in the yaml crate.

I think it would be worth including this to ensure consistency of behaviour between users. Also, providing it here makes it more likely that users will consider handling an initial BOM in the first place (rather than finding/fixing a bug).

@mkmik
Author

mkmik commented Jun 1, 2020

Yes; more explicitly: it's likely that if this library doesn't do it, the users of this library won't do it either, and thus their applications effectively won't be compliant with the YAML specification.

The reason I think that's likely is that, in my experience, most people either are not aware of this detail of the spec or don't think UTF-16 or UTF-8 with BOMs matters at all (e.g. "I haven't seen any UTF-16 in years, so clearly nobody is using it").

I think adding a small crate dependency is a small price to pay for ensuring that the ecosystem built around this library follows the spec.

@chyh1990
Owner

chyh1990 commented Jul 2, 2020

pls rebase this MR to fix CI

Closes chyh1990#155

Also helps in some cases with chyh1990#142, when the BOM is at the beginning of the file (common),
but not in the corner case where the BOM is at the start of a document which is not the first one.
@mkmik
Author

mkmik commented Jul 30, 2020

@glyn I implemented (d), PTAL

@glyn
Copy link

glyn commented Jul 30, 2020

@glyn I implemented (d), PTAL

Looks reasonable. One downside of this approach now becomes apparent. In:

let mut d = Decoder::read(file);
let mut e = d.encoding_trap(mytrap);
let f = d.decode();
...

d.encoding_trap(mytrap) mutates d and so f is computed with mytrap, which is surprising if you don't pay attention to the signatures. This is unusual behaviour for the builder pattern. But it's unavoidable if we go with option (d).

@mkmik
Author

mkmik commented Jul 30, 2020

@glyn

let mut d = Decoder::read(file);
let mut e = d.encoding_trap(mytrap);
let f = d.decode();

I'm not a Rust expert, but that's not how this builder pattern works. I followed https://doc.rust-lang.org/1.0.0/style/ownership/builders.html#consuming-builders, which doesn't use references.

This means that I cannot call d.decode() on a value if I have already called d.encoding_trap() on it:

error[E0382]: use of moved value: `d`
   --> src/yaml.rs:881:19
    |
879 |         let mut d = YamlDecoder::read(s as &[u8]);
    |             ----- move occurs because `d` has type `yaml::YamlDecoder<&[u8]>`, which does not implement the `Copy` trait
880 |         d.encoding_trap(encoding::DecoderTrap::Ignore);
    |         - value moved here
881 |         let out = d.decode().unwrap();
    |                   ^ value used here after move

using mut self initially was confusing to me too; I initially wrote it as:

  pub fn encoding_trap(self, trap: encoding::types::DecoderTrap) -> YamlDecoder<T> {
    let mut new = self;
    new.trap = trap;
    new
  }

but then I saw the official documentation example with mut self in the fn signature, and assumed it was idiomatic Rust.
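
For illustration, the consuming-builder shape discussed here can be reduced to a minimal sketch. This is not the PR's code: the `Trap` enum, the `encoding_trap_rebind` method and the `chained`/`rebound` helpers are hypothetical, and `decode` only reads the bytes and reports which trap was configured. It shows that `mut self` and `let mut new = self;` are two spellings of the same move, and that after a consuming call the result has to be chained or rebound.

```rust
use std::io::Read;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Trap {
    Strict,
    Ignore,
}

struct YamlDecoder<T: Read> {
    source: T,
    trap: Trap,
}

impl<T: Read> YamlDecoder<T> {
    fn read(source: T) -> Self {
        YamlDecoder { source, trap: Trap::Strict }
    }

    /// `mut self` takes ownership and makes the local binding mutable.
    fn encoding_trap(mut self, trap: Trap) -> Self {
        self.trap = trap;
        self
    }

    /// Same behaviour, with the move into a mutable binding written out.
    fn encoding_trap_rebind(self, trap: Trap) -> Self {
        let mut new = self;
        new.trap = trap;
        new
    }

    /// Stand-in for real decoding: returns the raw bytes and the trap in effect.
    fn decode(mut self) -> std::io::Result<(Vec<u8>, Trap)> {
        let mut buf = Vec::new();
        self.source.read_to_end(&mut buf)?;
        Ok((buf, self.trap))
    }
}

/// Chained use: each call consumes the previous value.
fn chained(input: &[u8]) -> Trap {
    YamlDecoder::read(input)
        .encoding_trap(Trap::Ignore)
        .decode()
        .unwrap()
        .1
}

/// Step-by-step use: the moved-from `d` must be rebound to the returned value.
/// Calling `d.decode()` on the original binding instead would be rejected
/// with E0382 (use of moved value), as in the compiler output above.
fn rebound(input: &[u8]) -> Trap {
    let d = YamlDecoder::read(input);
    let d = d.encoding_trap_rebind(Trap::Ignore);
    d.decode().unwrap().1
}
```

Nothing is copied in either spelling; the reader inside the decoder is moved along with the rest of the struct.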

@mkmik
Author

mkmik commented Jul 30, 2020

CI fails because of a transient error

error: caused by: failed to make network request
error: caused by: error sending request for url (https://static.rust-lang.org/dist/channel-rust-stable.toml.sha256): error trying to connect: dns error: No such host is known. (os error 11001)
error: caused by: error trying to connect: dns error: No such host is known. (os error 11001)
error: caused by: dns error: No such host is known. (os error 11001)
error: caused by: No such host is known. (os error 11001)
Command exited with code 1

amending and force pushing to trigger a new CI run

@glyn

glyn commented Jul 30, 2020

@mkmik mut self is indeed idiomatic Rust. It's good the compiler protects us from the issue to some extent. I'm still a bit concerned that encoding_trap() both mutates self and returns self. Perhaps it would be cleaner to give up on chaining and just have it mutate self but not return anything.

  pub fn encoding_trap(self, trap: encoding::types::DecoderTrap) -> YamlDecoder<T> {
    let mut new = self;
    new.trap = trap;
    new
  }

would have issues because self can't necessarily be copied. In particular, the std::io::Read value can't be copied and can only be used once.

@mkmik
Author

mkmik commented Jul 30, 2020

Sorry, I don't understand. This code works:

  pub fn encoding_trap(self, trap: encoding::types::DecoderTrap) -> YamlDecoder<T> {
    let mut new = self;
    new.trap = trap;
    new
  }

If I understand correctly, it's not copying self; it's transferring ownership to 'new', which is mutable, so I can now modify the trap field and return the modified value. The caller of encoding_trap can no longer access the value it called encoding_trap on, because the value's ownership has been moved.

(Sorry if I'm missing the point; I'm unfamiliar with Rust in particular, although I have some familiarity with type systems and compiler internals, so I almost kinda sorta grasp how these things work, but mostly I have no idea about what's "idiomatic".)

@glyn

glyn commented Jul 30, 2020

@mkmik I'm just getting back into Rust after a long break, so apologies for my lack of understanding. I'm not sure about:

  pub fn encoding_trap(self, trap: encoding::types::DecoderTrap) -> YamlDecoder<T> {
    let mut new = self;
    new.trap = trap;
    new
  }

You're quite right that self is moved to new, but I don't particularly like that encoding_trap renders self subsequently unusable. If I find out more, I'll post here. But anyway, let's put that approach to bed either way.

src/yaml.rs Outdated
}
}

pub fn encoding_trap(mut self, trap: encoding::types::DecoderTrap) -> YamlDecoder<T> {

@mkmik @17cupsofcoffee suggested that something like the following might be better for a builder:

Suggested change
- pub fn encoding_trap(mut self, trap: encoding::types::DecoderTrap) -> YamlDecoder<T> {
+ pub fn encoding_trap(&mut self, trap: encoding::types::DecoderTrap) -> &mut Self {

Author

Thanks; PTAL

Looks reasonable, thanks. I think it's now up to the project maintainers to decide whether they like the style.
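
As a point of comparison, a non-consuming builder along the lines of the suggestion might look like the sketch below. It is an assumption-laden illustration rather than the code under review: `Trap` is a hypothetical stand-in for the `encoding` crate's trap type, decoding is reduced to UTF-8 via the standard library, and `stepwise` is a made-up helper.

```rust
use std::io::Read;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Trap {
    Strict,
    Replace,
}

struct YamlDecoder<T: Read> {
    source: T,
    trap: Trap,
}

impl<T: Read> YamlDecoder<T> {
    fn read(source: T) -> Self {
        YamlDecoder { source, trap: Trap::Strict }
    }

    /// Mutates the decoder in place and hands back the same reference,
    /// so calls can be chained without giving up ownership.
    fn encoding_trap(&mut self, trap: Trap) -> &mut Self {
        self.trap = trap;
        self
    }

    /// Reads everything from the source and decodes it as UTF-8.
    fn decode(&mut self) -> Result<String, String> {
        let mut buf = Vec::new();
        self.source.read_to_end(&mut buf).map_err(|e| e.to_string())?;
        match self.trap {
            Trap::Strict => String::from_utf8(buf).map_err(|e| e.to_string()),
            Trap::Replace => Ok(String::from_utf8_lossy(&buf).into_owned()),
        }
    }
}

/// Step-by-step use: the setter's return value can be ignored because
/// `d` itself was modified.
fn stepwise(input: &[u8]) -> Result<String, String> {
    let mut d = YamlDecoder::read(input);
    d.encoding_trap(Trap::Replace);
    d.decode()
}
```

With this shape both the chained form and the step-by-step form compile, and the in-place mutation is visible in the `&mut self` signature.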

@Diegovsky

Is this still being worked on?

@glyn

glyn commented Jun 8, 2021

@SenseTime-Cloud @dtolnay Is anyone looking to merge PRs these days?

@dtolnay
Collaborator

dtolnay commented Aug 1, 2022

Not me — this crate is no longer used by serde_yaml.

@davvid

davvid commented Jan 29, 2024

FWIW, I've merged this (and various other PRs) into my fork: https://github.com/davvid/yaml-rust/

Successfully merging this pull request may close these issues.

Parse from raw bytes