Text-based matching #61

Open
tcoopman opened this issue Dec 10, 2023 · 12 comments
Labels
enhancement New feature or request

Comments

@tcoopman

Basically this, but with auto_assert: assert capture_log(fn -> Logger.error(msg) end) =~ msg
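For context, here is a minimal sketch of the plain ExUnit test that the request starts from. The module and test names are made up for illustration; only capture_log/1, Logger.error/1 and =~ come from the request itself.

```elixir
ExUnit.start()

defmodule CaptureLogExampleTest do
  use ExUnit.Case

  import ExUnit.CaptureLog
  require Logger

  test "the error message shows up in the captured log" do
    msg = "some error message"

    # capture_log/1 returns everything logged inside the function as a string;
    # =~ then checks that msg occurs somewhere in that string.
    assert capture_log(fn -> Logger.error(msg) end) =~ msg
  end
end
```

The assertion only cares about the substring, which is why the timestamp and escape codes in the captured log never get in the way.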

@zachallaun
Owner

zachallaun commented May 12, 2024

The crux of this is somehow integrating with the text-match operator =~.

I think that auto_assert_capture_log is too specific. If support is added for text matching, I'd like it to be usable for any string comparison.

One option would be to allow a <~ operator to be used with auto_assert. Whereas =~ works as text =~ substring or regex, <~ would be substring or regex <~ text.

Here's a quick worked example. The goal is to test that "some error message" occurs in the logs.

# initial state
auto_assert capture_log(fn -> Logger.error("some error message") end)

# after first run
auto_assert "\e[31m\n12:50:27.941 [error] some error message\n\e[0m" <-
              capture_log(fn -> Logger.error("some error message") end)

# manually edit to use <~ and match less of the message
auto_assert "[error] some error message" <~
              capture_log(fn -> Logger.error("some error message") end)

The biggest issue with this is that it's not obvious what to do if the value no longer matches. Possibly the best we can do is replace it with the entire captured log again and require you to rewrite it.

# message changed
auto_assert "[error] some error message" <~
              capture_log(fn -> Logger.error("some error MESSAGE") end)

# after running
auto_assert "\e[31m\n12:50:27.941 [error] some error MESSAGE\n\e[0m" <~
              capture_log(fn -> Logger.error("some error MESSAGE") end)

# still have to manually edit to remove timestamp and only assert what you care about
auto_assert "[error] some error MESSAGE" <~
              capture_log(fn -> Logger.error("some error MESSAGE") end)

There are some heuristics that could be used to guess which part of the string you care about and suggest a more intelligent alternative. I might be able to leverage String.myers_difference/2 to find better suggestions, e.g. if the myers_difference "pattern" is [:ins, :eq, ..., :eq, :ins], that means that something in the middle of the asserted text changed, and that's probably the bit you care about. Might also be able to use regular expressions to omit common prefixes/suffixes like escape sequences and timestamps.
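A rough sketch of what that diff looks like for the example above. String.myers_difference/2 is the standard library function; how its result would be turned into a suggestion is left open here, and the variable names are only illustrative.

```elixir
old_pattern = "[error] some error message"
new_value = "\e[31m\n12:50:27.941 [error] some error MESSAGE\n\e[0m"

# A keyword list of {:eq | :ins | :del, text} pairs describing how to
# turn old_pattern into new_value.
diff = String.myers_difference(old_pattern, new_value)

# Just the sequence of operations, i.e. the "pattern" of the diff.
ops = Keyword.keys(diff)

# The text that both strings share, in order.
shared = diff |> Keyword.get_values(:eq) |> Enum.join()

IO.inspect(ops, label: "ops")
IO.inspect(shared, label: "shared")
```

Leading and trailing :ins entries correspond to content that surrounds the old pattern in the new value, which is the kind of signal the heuristic would look at.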

Given that, perhaps any string value that starts and ends with likely-ignorable formatting content could result in a pattern suggestion using <~. For instance:

defp example do
  "\e[31m\n12:50:27.941 [error] some error MESSAGE\n\e[0m"
end

test "example/0" do
  auto_assert example()

  # running yields the following two suggestions
  auto_assert "[error] some error MESSAGE" <~ example()
  auto_assert "\e[31m\n12:50:27.941 [error] some error MESSAGE\n\e[0m" <- example()
end

@zachallaun
Owner

@tcoopman I have a somewhat minimal version of this implemented in the text-match-operator branch. If you have time to try it out, I'd really appreciate any feedback!

# dep
{:mneme, github: "zachallaun/mneme", ref: "text-match-operator"}

image

@zachallaun zachallaun changed the title feature: auto_assert_capture_log Text-based matching May 14, 2024
@zachallaun zachallaun added the enhancement New feature or request label May 14, 2024
@tcoopman
Author

I'll try to look at it tomorrow, I'll keep you posted

@tcoopman
Author

Some feedback:

  1. when you match the full string you switch from <~ to <-. I didn't notice that at first.
  2. on the full match you also start using """, but """ don't seem to work with <~

image

So that was weird / unexpected for me.

For the rest it feels nice, maybe adding regexes could be useful, but on the other hand I'm not sure it's worth the extra value.

@zachallaun
Owner

> Some feedback:
>
> 1. when you match the full string you switch from `<~` to `<-`. I didn't notice that at first.

I agree that the difference between the two operators is subtle.

An alternative that, after some reflection, I think I like more is to introduce "matchers" that can go on the left-hand side of <- and that change the behavior of the match. Concretely:

auto_assert text("bar") <- "foo bar baz"

This is also consistent with how I'm planning to handle file snapshots (#72).
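In plain Elixir terms, the matcher form would presumably check the same thing as =~ with the operands flipped. A minimal sketch, where TextMatcherSketch and matches?/2 are hypothetical names and not part of Mneme:

```elixir
defmodule TextMatcherSketch do
  # Hypothetical stand-in for the check behind `text(pattern) <- value`:
  # pattern may be a substring or a regex, value is the string under test.
  def matches?(pattern, value) when is_binary(value) do
    value =~ pattern
  end
end

true = TextMatcherSketch.matches?("bar", "foo bar baz")
true = TextMatcherSketch.matches?(~r/ba[rz]/, "foo bar baz")
false = TextMatcherSketch.matches?("qux", "foo bar baz")
```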

> 2. on the full match you also start using `"""`, but `"""` don't seem to work with `<~`

I think the issue here is that the """ string you're using is actually equivalent to "multiple workshops found for\n", but that newline isn't present at that point in the text being matched, so the match fails and Mneme regenerates the result. Try adding a \ to the end of the line, which suppresses the newline:

auto_assert """
            multiple workshops found for\
            """
            <~ """
            [error] multiple workshops found for .....
            """
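The heredoc behaviour can be checked in isolation. This sketch uses only standard Elixir; the variable names are illustrative, and the logged text is the elided example from above.

```elixir
with_newline = """
multiple workshops found for
"""

without_newline = """
multiple workshops found for\
"""

# A heredoc keeps the newline before the closing delimiter...
true = with_newline == "multiple workshops found for\n"

# ...unless the line ends in a backslash, which suppresses it.
true = without_newline == "multiple workshops found for"

logged = "[error] multiple workshops found for .....\n"

# Only the version without the trailing newline is a substring of the log.
false = logged =~ with_newline
true = logged =~ without_newline
```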

> For the rest it feels nice, maybe adding regexes could be useful, but on the other hand I'm not sure it's worth the extra value.

Regexes do currently work, but you have to add them yourself and Mneme doesn't generate them. I don't plan to add regex generation -- that seems like a can of worms that I don't want to open.

@zachallaun
Owner

I'm not sure yet whether text/1 is the right name, but I'm about to push a change that removes substring <~ expr in favor of text(substring) <- expr.

image

@tcoopman
Author

Is it intentional that the text matcher is not used for exact matches?

@zachallaun
Owner

Yes, that's intentional (for now). The idea is that if you're doing an exact match, you want to know if anything in the value changes. If Mneme generated text() for exact string matches and the string later changed because something was prepended or appended to it, the test case would still succeed.
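To make the trade-off concrete, here is a small sketch with made-up values, using plain == and =~ as stand-ins for the exact and text-based matches:

```elixir
old_value = "foo"
new_value = "foo (deprecated)"

# A substring check still passes after something is appended...
true = new_value =~ old_value

# ...while an exact comparison notices the change.
false = new_value == old_value
```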

The current "rules" for when text() is generated by Mneme are:

  • The expression evaluates to a string
  • The string has "ignorable content" at the beginning or end, where ignorable content is currently things like whitespace, dates/timestamps, and terminal escape characters
  • After stripping "ignorable content", the remaining content is a single line
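These rules could be sketched roughly as follows. This is not Mneme's implementation: IgnorableSketch, strip/1 and suggest_text?/1 are hypothetical names, and the regexes are a guess at what "ignorable content" covers (whitespace, time-of-day stamps, and ANSI color codes only).

```elixir
defmodule IgnorableSketch do
  # Removes ignorable content from the start and end of a string.
  def strip(string) when is_binary(string) do
    # Leading run of whitespace, ANSI color codes, or HH:MM:SS(.mmm) stamps.
    leading = ~r/\A(?:\s|\x1b\[[0-9;]*m|\d{2}:\d{2}:\d{2}(?:\.\d+)?)+/
    # Trailing run of whitespace or ANSI color codes.
    trailing = ~r/(?:\s|\x1b\[[0-9;]*m)+\z/

    string
    |> String.replace(leading, "")
    |> String.replace(trailing, "")
  end

  # True when the string has ignorable content at either end and the
  # remaining content fits on a single line.
  def suggest_text?(string) when is_binary(string) do
    stripped = strip(string)
    stripped != string and not String.contains?(stripped, "\n")
  end
end

value = "\e[31m\n12:50:27.941 [error] some error MESSAGE\n\e[0m"

IO.inspect(IgnorableSketch.strip(value), label: "stripped")
IO.inspect(IgnorableSketch.suggest_text?(value), label: "suggest text()?")
```

A string with nothing ignorable at either end, such as "foo", fails the second rule, so it would keep its exact match.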

At least that last one is likely to change because there's nothing that fundamentally prevents multi-line """ strings inside text(), but there might be some additional restrictions like no ignorable content in the middle of the text. For instance, in this case, there's no good way to "strip" the escape characters from the middle of the text:

image

In these cases, what you'd likely want to do instead is split the captured log on newlines and then do regular assertions, like:

logged = capture_log(...) |> String.split("\n")

assert Enum.any?(logged, &(&1 =~ "this is a warning"))

@zachallaun
Owner

Though, if you really want to stay in Mneme-land, we could theoretically introduce a new contains() as well that asserts some pattern is present in an enumerable, so the above could be:

auto_assert contains(text("this is a warning")) <- capture_log(...) |> String.split("\n")
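In plain Elixir, the check that contains(text(...)) would stand for is roughly the Enum.any?/2 call from the earlier comment. A sketch with made-up log output, since the real capture_log call is elided above:

```elixir
# Hypothetical captured output: two log entries, each wrapped in its own
# escape codes, so the escapes end up in the middle of the full text.
captured =
  "\e[33m\n12:50:27.941 [warning] this is a warning\n\e[0m" <>
    "\e[31m\n12:50:27.942 [error] this is an error\n\e[0m"

logged = String.split(captured, "\n")

# At least one line contains the text we care about.
true = Enum.any?(logged, &(&1 =~ "this is a warning"))

# No line contains text that was never logged.
false = Enum.any?(logged, &(&1 =~ "this is a notice"))
```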

@tcoopman
Author

What about: exact_match, matches, substring_match, text_match?

To be clear, dropping text() when you have an exact match is fine for me as well, but you'll need to document it :-)

@zachallaun
Owner

I could get behind substring_match as a better and more obvious/explicit name than text, for sure!

I don't know about exact_match or matches since they would be a no-op, i.e. the following would be exactly the same:

auto_assert exact_match("foo") <- "foo"
auto_assert "foo" <- "foo"

Agreed that it should be documented! Right now there are some docs about generated patterns, but they're not comprehensive. I should write up a guide about how and why patterns are generated/changed that I can link to from various places.

@zachallaun
Owner

Wrote a new guide on pattern generation and selection to replace the small section it had in the overview. This will be a good place to add documentation for this feature.
