New Feature: IPA input/output for words #4

Open
SaphireLattice opened this issue Apr 18, 2022 · 5 comments
@SaphireLattice
Owner

Somewhat related to #1; this could probably be done in bulk with it. Also relevant is #3, which would allow assigning custom representations and showing some of the ones already in use (IPA, mine, and the one used by the audio decoding project).

@SaphireLattice SaphireLattice self-assigned this Apr 18, 2022
@Erquint

Erquint commented Jul 31, 2023

IPA is a given.

I've spent the last day burying myself in the phonetic construction of the language to get very confident with it before I enable the spoiled mode, and I couldn't help but notice that the developers surely must have strictly adhered to the American English IPA transcriptions given on Wiktionary. Much to the detriment of the conlang, I should add, since it ends up inheriting that redundant, overabundant vowel inventory. I'm convinced they had an IPA-to-cubescript translator coded up in development in addition to manual input. I did spot some clearly mistaken transcriptions made by the developers, due to either a bad IPA transcription source or human error. For example, bc9f /oʊt/ is used to spell "out" on manual page 16, where it should have been a88c /aʊt/ as used in the rest of the manual.

And of course spelling phonemes using Latin characters with English-speaker assumptions has always been horrid, given how starkly disjointed phonemes are from glyphs in the English language. There is never a way to escape ambiguity.

@Erquint

Erquint commented Jul 31, 2023

Since I recently enabled the spoiled mode, I'm noticing many issues with vowel transcriptions the webapp currently displays. Maybe I'm wrong with my assumptions, but it seems to me that it misses a lot of interstitial vowels. See for yourself: I'm attaching my rough notes.

image
I'm sure I got something wrong too.

  • There are two /əɹ/s. The bottom one should actually be /ɔɹ/, even though there is already another /ɔɹ/.
    The bottom one actually seems bogus. No idea where it came from.
  • That /æɹ/ is rather /ɛ(ə)ɹ/ or even /ɛɚ/, but is probably sufficient to represent as /ɛɹ/.
    Even if it still feels to me like /æɹ/ kinda encompasses an approximation of those options — I'm afraid that might be my bias.
  • The top /aɹ/ should be /ɑɹ/.
  • The /ei/ should be /eɪ/.
  • The /ə/ at the very top is basically /ʌ/ which is very prone to degrade into schwa in context, especially since cubescript doesn't denote stress/accent.

Here's an updated version with those corrections.
image
Hopefully this helps.

I just double-checked all of the vowels, and there are a lot of discrepancies with what the webapp currently presents using the unreliable medium of Englatin.

@Erquint

Erquint commented Aug 1, 2023

A snippet with minimal modifications to your source.

const phonemes = {
    out: [
        { mask: 0b0001_0000_0001_0011, text: "æ" },
        { mask: 0b0001_0000_0001_0001, text: "ɔ" },
        { mask: 0b0000_1100_0000_0000, text: "ɪ" },
        { mask: 0b0001_1100_0001_0000, text: "ɛ" },
        { mask: 0b0001_0100_0001_0000, text: "ʊ" },
        { mask: 0b0000_0000_0000_0011, text: "ʌ" },

        { mask: 0b0001_1100_0001_0001, text: "i" },
        { mask: 0b0001_0100_0001_0011, text: "u" },
        { mask: 0b0001_1100_0001_0010, text: "əɹ" },
        { mask: 0b0001_1000_0001_0011, text: "ɔɹ" },
        { mask: 0b0000_1100_0000_0011, text: "ɑɹ" },
        { mask: 0b0001_1000_0001_0001, text: "ɪɹ" },

        { mask: 0b0000_0000_0000_0001, text: "eɪ" },
        { mask: 0b0000_0000_0000_0010, text: "aɪ" },
        { mask: 0b0000_0100_0000_0000, text: "ɔɪ" },
        { mask: 0b0000_1000_0000_0000, text: "aʊ" },
        { mask: 0b0001_1100_0001_0011, text: "oʊ" },
        { mask: 0b0001_1000_0001_0000, text: "ɛɹ" },
    ],
    in: [
        { mask: 0b0000_0011_0000_0000, text: "m" },
        { mask: 0b0000_0011_0000_0100, text: "n" },
        { mask: 0b0010_0011_1010_1100, text: "ŋ" },
        { mask: 0b0010_0000_1000_1000, text: "p" },
        { mask: 0b0000_0010_1010_0000, text: "b" },
        { mask: 0b0010_0000_1000_1100, text: "t" },

        { mask: 0b0000_0011_1010_0000, text: "d" },
        { mask: 0b0000_0010_1010_1000, text: "k" },
        { mask: 0b0010_0010_1000_1000, text: "g" },
        { mask: 0b0000_0001_1010_0000, text: "d͡ʒ" },
        { mask: 0b0010_0000_1000_0100, text: "t͡ʃ" },
        { mask: 0b0010_0001_1000_1000, text: "f" },

        { mask: 0b0000_0010_1010_0100, text: "v" },
        { mask: 0b0010_0000_1010_1100, text: "ð" },
        { mask: 0b0010_0011_1010_0000, text: "θ" },
        { mask: 0b0010_0001_1010_1000, text: "s" },
        { mask: 0b0010_0010_1010_0100, text: "z" },
        { mask: 0b0010_0011_1000_1100, text: "ʃ" },

        { mask: 0b0000_0011_1010_1100, text: "ʒ" },
        { mask: 0b0010_0010_1010_0000, text: "h" },
        { mask: 0b0010_0000_1010_1000, text: "ɹ" },
        { mask: 0b0010_0000_1010_0100, text: "j" },
        { mask: 0b0000_0000_0000_1100, text: "w" },
        { mask: 0b0010_0000_1010_0000, text: "l" },
    ],
};

Looks like you were missing /ɔɪ/ ("-?-") and didn't quite distinguish /ð/ ("?th?") from /θ/.

In the PR, the entries are reordered and the spaces and underscores in the phonetic output are omitted.
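
As a rough illustration of how a table like this could be used, here is a minimal sketch of a lookup. It assumes that the four-digit hex codes quoted earlier in the thread (bc9f, a88c) are the same 16-bit values these masks apply to, and that a glyph is decoded by exact match on its vowel bits and its consonant bits separately. The helper names `unionOf` and `decodeGlyph` are hypothetical, and only a subset of the table is repeated to keep it short; this is not necessarily how the webapp does it.

```javascript
// Subset of the table above, just enough for the two example glyphs.
const phonemes = {
    out: [
        { mask: 0b0000_0000_0000_0001, text: "eɪ" },
        { mask: 0b0000_1000_0000_0000, text: "aʊ" },
        { mask: 0b0001_1100_0001_0011, text: "oʊ" }, // sets every vowel bit
    ],
    in: [
        { mask: 0b0000_0011_0000_0000, text: "m" },
        { mask: 0b0010_0000_1000_1100, text: "t" },
        { mask: 0b0010_0011_1010_1100, text: "ŋ" }, // sets every consonant bit
    ],
};

// OR all masks of one group together to get the bits that group may use.
const unionOf = (entries) => entries.reduce((bits, e) => bits | e.mask, 0);
const VOWEL_BITS = unionOf(phonemes.out);
const CONSONANT_BITS = unionOf(phonemes.in);

// Hypothetical helper: exact-match lookup of the vowel and consonant parts.
function decodeGlyph(glyph) {
    const find = (entries, bits) =>
        entries.find((e) => e.mask === bits)?.text ?? null;
    return {
        vowel: find(phonemes.out, glyph & VOWEL_BITS),
        consonant: find(phonemes.in, glyph & CONSONANT_BITS),
        // Bits that belong to neither group; their meaning is not covered here.
        other: glyph & ~(VOWEL_BITS | CONSONANT_BITS) & 0xffff,
    };
}

console.log(decodeGlyph(0xbc9f)); // the "out" glyph from manual page 16
console.log(decodeGlyph(0xa88c)); // the "out" glyph from the rest of the manual
```

With these masks, both glyphs resolve to the consonant "t" and differ only in the vowel ("oʊ" versus "aʊ"), which matches the transcriptions given for them earlier in the thread.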

@Erquint

Erquint commented Aug 2, 2023

If this PR is merged or you reimplement it as an option, there's a bonus feature you could go for.

You know how sometimes you can be trying to read the proper phonemes in the proper order and yet they just refuse to click into a recognized word in your head because of the perceived artificiality? Then you have to make yourself speak them out loud and try your best to listen to your own voice and interpret it as if you weren't thinking of the written phonemes in your head…

What could help is somebody reading it out loud for you at a press of a button.
In case you weren't aware — many if not the majority of TTS engines are based on IPA and can be prompted with it directly using some SSML markup.
I usually use this sort of prompt: <phoneme alphabet="ipa" ph="ˈʐvat͡ɕkə" /> [IPA for bubblegum in Russian.]

Might be worth trying to embed some random WASM TTS or something of the sort to add such a feature.
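
For what it's worth, building that kind of prompt from a decoded word is simple. Below is a minimal sketch, assuming the target engine accepts standard SSML with the `phoneme` element; `escapeXml` and `ipaToSsml` are hypothetical helper names, and whether a given engine actually honours the `ph` attribute depends on the engine.

```javascript
// Escape characters that are special inside XML attributes and text.
function escapeXml(s) {
    return s
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;");
}

// Hypothetical helper: wrap an IPA string in an SSML <phoneme> element.
// `fallback` is the text an engine may read if it ignores the `ph` attribute.
function ipaToSsml(ipa, fallback = "") {
    return `<speak><phoneme alphabet="ipa" ph="${escapeXml(ipa)}">` +
        `${escapeXml(fallback)}</phoneme></speak>`;
}

console.log(ipaToSsml("aʊt", "out"));
// <speak><phoneme alphabet="ipa" ph="aʊt">out</phoneme></speak>
```

The resulting string would then be handed to whichever TTS engine ends up embedded.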

@Erquint

Erquint commented Aug 6, 2023

Made an attempt to spatially group this mess of vowels.
Cubescript vowel map

I'm not entirely happy with this notional map, but there are conflicting criteria for how I'd set it up.

Made many attempts to find some rhyme and reason to their construction, but for the most part it seems pretty arbitrary. The only notable traits are:

  • A one-off stroke makes a vowel ending in "ɪ".
    • Except for one that ends in "ʊ" instead.
  • A one-stroke gap makes a vowel ending in "ɹ".
    • Three of which seem to be constructed from "oʊ" with a gap made somewhere, but "ɛɹ" and "ɪɹ" are not.

The direction of strokes and gaps sadly doesn't seem to follow any inheritable system.
I can't seem to systematize them any further than that.
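
The first trait can be cross-checked against the vowel masks from the earlier snippet. The sketch below groups the vowels by the number of set bits in their mask; `popcount` and `byBitCount` are names made up for this illustration. Note the assumption that a set bit corresponds to a drawn segment: the table itself does not say whether two bits might be drawn as a single stroke, so raw bit counts are not necessarily stroke counts.

```javascript
// Vowel masks copied from the snippet earlier in the thread.
const vowels = [
    { mask: 0b0001_0000_0001_0011, text: "æ" },
    { mask: 0b0001_0000_0001_0001, text: "ɔ" },
    { mask: 0b0000_1100_0000_0000, text: "ɪ" },
    { mask: 0b0001_1100_0001_0000, text: "ɛ" },
    { mask: 0b0001_0100_0001_0000, text: "ʊ" },
    { mask: 0b0000_0000_0000_0011, text: "ʌ" },
    { mask: 0b0001_1100_0001_0001, text: "i" },
    { mask: 0b0001_0100_0001_0011, text: "u" },
    { mask: 0b0001_1100_0001_0010, text: "əɹ" },
    { mask: 0b0001_1000_0001_0011, text: "ɔɹ" },
    { mask: 0b0000_1100_0000_0011, text: "ɑɹ" },
    { mask: 0b0001_1000_0001_0001, text: "ɪɹ" },
    { mask: 0b0000_0000_0000_0001, text: "eɪ" },
    { mask: 0b0000_0000_0000_0010, text: "aɪ" },
    { mask: 0b0000_0100_0000_0000, text: "ɔɪ" },
    { mask: 0b0000_1000_0000_0000, text: "aʊ" },
    { mask: 0b0001_1100_0001_0011, text: "oʊ" },
    { mask: 0b0001_1000_0001_0000, text: "ɛɹ" },
];

// Count the set bits in a mask by clearing the lowest set bit each round.
function popcount(mask) {
    let n = 0;
    for (let m = mask; m !== 0; m &= m - 1) n++;
    return n;
}

// Group vowel texts by how many bits their mask sets, keeping table order.
const byBitCount = new Map();
for (const { mask, text } of vowels) {
    const n = popcount(mask);
    byBitCount.set(n, [...(byBitCount.get(n) ?? []), text]);
}

console.log(byBitCount.get(1)); // [ 'eɪ', 'aɪ', 'ɔɪ', 'aʊ' ]
```

The single-bit group is exactly the three vowels ending in "ɪ" plus the one ending in "ʊ", which agrees with the first bullet above.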
