Markov loves those dots! #11
Comments
Good catch. My usage of RiTA is currently very crude. I generate a random number of sentences (1-3, off the top of my head). RiTA returns these as an array of Strings, and I append them all together adding a ' Anyway, I think what we're seeing here happens for two reasons:
So what I'll do for this is:
Additionally, I will attempt some simple sanitization of the RiTA input to keep things sane. The better the input, the better the output. Do you have any examples of what you mean by "Twitch adds a space when we do so", or other examples of what you're talking about here? (So I can write unit tests.)
That was the most obvious case that came to mind. Being pedantic, I backspace after an auto-complete to add my period or comma straight after the name, but most people probably don't. :) If I could get my hands on a chat log I could check how often it actually happens, although I'm starting to suspect that all the superfluous periods we saw from the bot may have come from the addition you mention above. If you're already massaging incoming text from the chat, it could be prudent to strip whitespace between characters and postfix punctuation early on. I'm rusty on PCREs, but something like this, assuming plain-text input:
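As a rough illustration of stripping whitespace before postfix punctuation, here is a minimal Python sketch. It is not the pattern from the comment above and not the bot's actual code: the function name `tighten_punctuation` and the choice of punctuation characters are assumptions made for the example. The lookahead only accepts punctuation that is followed by whitespace or the end of the line, so emoticons such as `:)` are left alone.

```python
import re

# Whitespace directly before trailing punctuation, e.g. "SomeUser ." or "bob ,".
# The punctuation set is an assumption; adjust it to match the chat input.
# The lookahead requires whitespace or end-of-text after the punctuation,
# which keeps things like ":)" or " .5" untouched.
_SPACE_BEFORE_PUNCT = re.compile(r"\s+([.,!?;:]+)(?=\s|$)")


def tighten_punctuation(text: str) -> str:
    """Remove whitespace between a word and the punctuation that follows it."""
    return _SPACE_BEFORE_PUNCT.sub(r"\1", text)
```

Cases like `"thanks SomeUser ."` and `"hi bob , how are you ?"` could double as unit tests for the tab-completion scenario.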
(Hopefully I'm not speaking out of turn here.) Yesterday in the stream, one thing HoomanBot did quite frequently was use single periods as words, in sequences:
I'd bet it's because hoomans themselves use those when they end a proposition or sentence with a fellow hooman's nickname which was tab-completed: Twitch adds a space when we do so. Therefore, it might make sense to filter out or disallow single punctuation characters from becoming "words". (And I mean just punctuation; I can see "=" and such being useful.)
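The filtering idea could look something like the following Python sketch. It assumes the chat text has already been split on whitespace into tokens before it reaches the Markov model; the name `drop_punctuation_tokens` and the punctuation set are hypothetical, and the sketch drops any token made up entirely of punctuation (so `..` goes as well as `.`), which is slightly broader than single characters.

```python
# Characters treated as "just punctuation". This set is an assumption;
# symbols such as "=" are deliberately left out so they can still be words.
PUNCTUATION_ONLY = set(".,!?;:")


def drop_punctuation_tokens(tokens):
    """Keep only tokens that contain at least one non-punctuation character.

    Tokens consisting solely of punctuation (and empty tokens) are dropped.
    """
    return [t for t in tokens if any(ch not in PUNCTUATION_ONLY for ch in t)]
```

A word with trailing punctuation, such as `ok...`, is kept because it still contains letters.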