Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make "transform" public and more reusable. #7

Closed
wants to merge 1 commit into from

Conversation

progval
Copy link

@progval progval commented Jul 5, 2018

Hi,

I was looking for a way to split words in a similar way to heck, in a crate that already uses heck.

The obvious way would be to use snake_case and split on underscores, but I thought it would be more useful to make transform reusable to directly produce a vector (and other collections), so here it is.

I understand it does not fit the intended use case of heck, so I will understand if you refuse this PR.

Thanks!

@withoutboats
Copy link
Owner

Hey, thanks for this PR! I've actually been interested recently in revisiting this library in a number of ways and prepping a 1.0 release, and this was one of the things I definitely considered (#6 is a prior request for the same feature that I missed and never responded to).

Its also interesting that you made transform generic. I also considered this, and the problem is that you still take the input data as &str. I'm not sure how useful being generic over the output actually is if you aren't also generic over the input. I also prefer to keep the ToOwned relationship between input and output, making the signature more like:

fn transform<T, F, G>(...) -> T::Owned where
    T: UnicodeSegmentation + ToOwned,
    T::Owned: Default,
    ...

Unfortunately, it seems really unlikely that any type other than str and String could ever meet these bounds.

So for now, I don't think I want to do make this function more generic, unless there's a really clear concrete use case for, as you say, producing a vector instead of a String.

But I'm happy with the idea of exposing this function!

@progval
Copy link
Author

progval commented Jul 5, 2018

I also prefer to keep the ToOwned relationship between input and output

Why?

unless there's a really clear concrete use case for, as you say, producing a vector instead of a String

I'm writing a code generator that takes an XML Schema as input. The problem is I have to output a lot more structs and enums than there are names in the schema, so I'm trying various heuristics to give them nice developer-friendly names.
The one I was thinking of here is, given an enum whose variants have types FooBarSimpleType, BazSimpleType, and QuxQuuxSimpleType, a longest-suffix/longest-substring search would guess the name SimpleType for the enum. And word splitting would avoid accidents like rSimpleTime if all prefixes end with an r.

(But I'm not sure it would actually use it in my code, I have yet to make a dataset of existing XML schemas to find patterns in the names they use for their elements.)

@withoutboats
Copy link
Owner

Interesting. So it seems like you actually want to visit all of the "words" as we've defined them for this purpose. Probably a more correct API for your use case would be some sort of iterator over substrings, rather than boxing all users into the transform API which is specifically designed for reformatting those substrings.

However, I'd also warn you that since you're doing string comparisons, you may need to worry about unicode normalization once your data is not ASCII only - for example, if the common suffix were Naïve, there are multiple ways that could be encoded in unicode which should be considered equivalent. This might get very hairy.

@withoutboats
Copy link
Owner

Why?

This is a good question, and I'm not sure - its really just a hunch to be honest.

I guess because this library has been focused on transformations for various user-facing display purposes. For display purposes, you'd never want to go from a str to a vector, instead you might possibly want to go from your custom str type to your custom String type.

I'm worried about taking on responsibilities to suit purposes outside of this display-oriented scope, because then it might grow more and more features and be more complicated.

I want to expose the helpers in lib.rs so that users can write their own ToShoutyKebabCase or whatever they need, but I worry about the implications of directly and explicitly supporting use cases like yours (like that we may need unicode normalization).

I feel more comfortable with the iterator API I mentioned in my previous comment, though implementing it may be unpleasant, because that leaves the collecting aspect of it to you, and not in this library.

@jplatte
Copy link
Collaborator

jplatte commented Aug 15, 2023

This PR is very old and has conflicts, so I'll close it. If you are still interested in this, please let's discuss how to expose it (which seems like it was the contentious bit of this PR) in #6 first.

@jplatte jplatte closed this Aug 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants