Advanced matching

The problem

When answering a text exo with a long solution, small formatting or spacing differences often cause false negatives. The teacher can add more variants to increase the chance that a correct answer is accepted, but combining several variant choices inside a long text generates a lot of slightly different possibilities. As a student, it's very frustrating to have a correct answer counted as incorrect so often, and it makes the feedback ambiguous.

Let's look at a typical exo where just adding variants would be a nightmare:

exo Array of booleans
Declare a C++ `array` named `values` of 4 boolean values with the first 2 ones at true.
sol 
- array<bool, 4> values = {true, true};
- array<bool, 4> values = {true, true, false, false};
- array<bool, 4> values = {true, true, false, false};
- array<bool, 4> values = {true,true,false,false};
- array<bool, 4> values{true, true};
- array<bool, 4> values{true, true, false, false};
- array<bool, 4> values{true, true, false, false};
- std::array<bool, 4> values{true, true, false, false};
...

We could keep doubling the number of variants: with or without the `std::` prefix, with or without a trailing comma, ... The problem is that all spaces here are optional: the code compiles without them, so omitting them should not be judged wrong. Worse, when the student types a double space somewhere instead of a single one, you guessed it, that's sadly another false negative...

So the real way to verify this particular solution is a regular expression (regex)! Regexes obviously require some practice, but I assume all IT teachers already know how to use them or can learn them easily.

A regex that accepts a very broad range of solutions is:

(std::)?array<bool, *4> *values *=? *{true, *true(, *false){0,2}};

If you are not familiar with regex:

- `*` matches the previous char/group 0 to n times (used here to support 0, 1 or more spaces)
- `?` matches the previous char/group 0 or 1 time (the group `std::` inside the parentheses is optional, same for the `=`)
- `{0,2}` matches the previous char/group 0 to 2 times (3 possibilities are matched: `{true, true}`, `{true, true, false}` and `{true, true, false, false}`)
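
To see what this regex accepts and rejects, here is a minimal Python sketch. It assumes the whole answer must match the regex (`re.fullmatch`). The `is_correct` helper is hypothetical, and Python's `re` is only used for illustration: it says nothing about the engine Delibay will actually use.

```python
import re

# The regex from above. In Python's `re`, a `{` that does not start a valid
# quantifier (like `{0,2}`) is treated as a literal brace.
PATTERN = re.compile(
    r"(std::)?array<bool, *4> *values *=? *{true, *true(, *false){0,2}};"
)


def is_correct(answer: str) -> bool:
    """Hypothetical check: the whole answer must match the regex."""
    return PATTERN.fullmatch(answer) is not None


accepted = [
    "array<bool, 4> values = {true, true};",
    "array<bool,4> values{true,true,false,false};",
    "std::array<bool, 4>  values  =  {true,  true, false};",
]
rejected = [
    "array<bool, 4> values = {true, false};",    # wrong values
    "array<bool, 4> values = { true , true };",  # spaces the regex does not allow
]

for answer in accepted + rejected:
    print(is_correct(answer), answer)
```

The last rejected answer is valid C++, which shows that even this regex does not cover every optional space.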

To use regexes in Delibay transcripts, the smart and easy way is to wrap them in slashes (`/regexhere/`), as regexes are commonly written, to indicate that this is not normal text. We could then use them everywhere: under `sol`, in the solution cell of a table, mixed with text, ... anywhere raw solution text can be written.

TODO: add examples with /regex/ in multiple places
TODO: talk about regex engine and flags
TODO: edge case with /* comment */ that is NOT a regex!
TODO: what about the non-regex solution shown as solution? Could we extract it from the regex?
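
One possible way to tell a slash-wrapped regex from plain text, including the `/* comment */` edge case, is sketched below in Python. The `parse_solution` function and its rule are assumptions, not a decided design. The rule relies on the fact that a regex cannot start with the `*` quantifier, so treating `/* ... */` as plain text does not exclude any valid regex.

```python
from typing import Tuple


def parse_solution(raw: str) -> Tuple[str, str]:
    """Hypothetical parser: return ("regex", pattern) or ("text", raw)."""
    value = raw.strip()
    # A regex is wrapped in slashes and has at least one char in between.
    wrapped = len(value) > 2 and value.startswith("/") and value.endswith("/")
    # Assumed rule: `/* ... */` is a C-style comment, never a regex.
    looks_like_comment = value.startswith("/*") and value.endswith("*/")
    if wrapped and not looks_like_comment:
        return ("regex", value[1:-1])
    return ("text", raw)


print(parse_solution("/(a|b|c){4}/"))
print(parse_solution("/* comment */"))
print(parse_solution("array<bool, 4> values = {true, true};"))
```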

Filters system

The concept of filters is another strategy to go beyond the limitations of regexes or the complexity of writing them. Basically, **a filter is a way to transform the student's answer before comparing it to the solution**. In the previous example, extra spaces could be removed before we check the answer. "Removing extra spaces" could be a filter with some parameters.

Let's define a few filters, what they do and how to use them:

| Name | Description | Usage | Use case |
|---|---|---|---|
| Remover | Remove all chars given in a list | `[rm:*]` | `[rm: ]`: Remove all spaces of `[ 4, 5, 6 , 1]` before checking it against `[4,5,6,1]` |
| Sort | Sort the list created by given separators | `[sort:*]` | `[sort: ]`: Make sure that `1 5 3 2` contains all elements of the list `1 2 3 5` (it splits the string on the separator into an array of substrings and sorts it) |
| Lowercase | Transform to lowercase | `[lower]` | `[lower]`: Make sure `Latex`, `LATEX` or `latex` are judged as correct against the solution `latex`. The solution does not need to be in lowercase. |
| Uppercase | Transform to uppercase | `[upper]` | `[upper]`: Same usage as `[lower]` |
??

Since these filters affect how the solution is checked, we add them just after the `sol` prefix.
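
The four filters from the table can be sketched in a few lines of Python. The function names are hypothetical. The `matches` helper assumes that filters are applied in the given order to both the answer and the solution, which is one of the open questions listed in the TODOs.

```python
def remove_chars(value: str, chars: str) -> str:
    """Remover: drop every character listed in `chars`."""
    return "".join(c for c in value if c not in chars)


def sort_parts(value: str, separator: str) -> str:
    """Sort: split on `separator`, sort the parts as strings, join them back."""
    return separator.join(sorted(value.split(separator)))


def lower(value: str) -> str:
    return value.lower()


def upper(value: str) -> str:
    return value.upper()


def matches(answer: str, solution: str, *filters) -> bool:
    """Assumption: each filter is applied, in order, to answer AND solution."""
    for apply_filter in filters:
        answer, solution = apply_filter(answer), apply_filter(solution)
    return answer == solution


# The use cases from the table
print(matches("[ 4, 5, 6 , 1]", "[4,5,6,1]", lambda v: remove_chars(v, " ")))
print(matches("1 5 3 2", "1 2 3 5", lambda v: sort_parts(v, " ")))
print(matches("LATEX", "Latex", lower))
```

Note that this sketch sorts the parts as strings, so `10` comes before `2`. Whether a numeric sort is needed is left open.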

TODO: are there other important filters, or changes to make to remover and sort?
TODO: do filters have an application order?
TODO: is value trimming applied before filters?
TODO: is the solution passed through filters? Is there a check to catch tricky errors like [rm: ] with a solution containing a space: "hey there"?

Combinations of regex and filters

The regex for the `values` array above is pretty heavy to read with all these ` *`, and the spacing problem has yet another edge case: `array<bool, 4> values = { true , true, false , false } ;`. I know this is an extreme case, because putting spaces there makes little sense, but extra spaces that don't break compilation should ideally be accepted. Instead of adding more ` *` everywhere in the regex, we can combine filters and regexes to make the regex easier to write:

With a filter that removes spaces, this answer

array<bool, 4> values = {   true  , true, false     , false } ;

would be transformed to

array<bool,4>values={true,true,false,false};

and the regex would be

(std::)?array<bool,4>values=?{true,true(,false){0,2}};

In the transcript file, we would define it this way:

exo Array of booleans
Declare a C++ `array` named `values` of 4 boolean values with the first 2 ones at true.
sol [rm: ] /(std::)?array<bool,4>values=?{true,true(,false){0,2}};/

...
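
The combination can be sketched in Python as follows: the space-removing filter runs first, then the simplified regex is matched against the whole transformed answer. The `check` helper is hypothetical and Python's `re` is only used for illustration.

```python
import re

# Simplified regex: no ` *` needed since spaces are removed beforehand.
PATTERN = re.compile(r"(std::)?array<bool,4>values=?{true,true(,false){0,2}};")


def check(answer: str) -> bool:
    """Hypothetical check: remove all spaces, then match the whole answer."""
    stripped = answer.replace(" ", "")  # the space-removing filter
    return PATTERN.fullmatch(stripped) is not None


print(check("array<bool, 4> values = {   true  , true, false     , false } ;"))
print(check("std::array<bool,4> values{true, true};"))
print(check("array<bool, 4> values = {false, true};"))
```

Because every space is removed, this sketch also accepts answers with spaces inside identifiers or keywords.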

TODO: how to handle the situation where the extra spaces are actually wrong? e.g. `arr ay<bool, 4> val ues = {tr ue, true, false, false};`

Exact multiline code matching

TODO: is it really advanced matching or just a multiline mode in text exo?

In some cases a textual solution spans multiple lines and only one correct solution is possible. It can then be autocorrected, with a few details to consider.

Here is an example:

exo Ternary to if
Transform this ternary operator into the if/else version.
~~~cpp
return age >= 18 ? "Adult" : "Minor";
~~~
sol 
~~~cpp
if (age >= 18) {
	return "Adult";
} else {
	return "Minor";
}
~~~

TODO: should we mention the language or leave it after the backticks ```cpp ?? Fix below or above.
TODO: should we add a [multiline] prefix?
TODO: should we add a [trimline] filter? Or [trim_lines] or [trim_line]?

Another example:

exo ASCII art
Execute this in your head and give the output of this program:
~~~cpp
//some code here
~~~
sol 
    *
   ***
  *****
 *******
*********
    *
    *
    *
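
Some of the details to consider for such multiline solutions can be sketched in Python. This sketch assumes that line endings, trailing whitespace and empty lines at the start or end are ignored, while leading whitespace stays significant (it matters for the ASCII art above). The `normalize` and `same_multiline` helpers are hypothetical, not a decided behavior.

```python
from typing import List


def normalize(text: str) -> List[str]:
    """Split into lines, ignoring line-ending style and trailing whitespace."""
    lines = [line.rstrip() for line in text.replace("\r\n", "\n").split("\n")]
    # Drop empty lines at the very start and end only
    while lines and lines[0] == "":
        lines.pop(0)
    while lines and lines[-1] == "":
        lines.pop()
    return lines


def same_multiline(answer: str, solution: str) -> bool:
    """Assumed rule: answers are equal if their normalized lines are equal."""
    return normalize(answer) == normalize(solution)


solution = "  *\n ***\n*****"
print(same_multiline("  *  \r\n ***\r\n*****\n", solution))  # trailing noise only
print(same_multiline("*\n***\n*****", solution))             # indentation lost
```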

Regex support

Regexes should be supported anywhere a textual answer can be given, just by putting slashes around them. Like this:

sol /(a|b|c){4}/

Regex support is actually harder than it seems, for security reasons. As both the regex and the content are provided by users, they are inherently unsafe and make the system vulnerable to ReDoS attacks.
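
To illustrate the risk, here is a small Python sketch using a classic vulnerable pattern with nested quantifiers. Python's backtracking `re` engine is used only as an example: the engine Delibay will use is not decided here, and non-backtracking engines such as RE2 do not show this behavior. The input sizes are kept small on purpose so the script finishes quickly.

```python
import re
import time

# Nested quantifiers: on a failing input, a backtracking engine tries
# exponentially many ways to split the run of "a" before giving up.
EVIL = re.compile(r"(a+)+$")


def time_failed_match(n: int) -> float:
    """Return the seconds spent failing to match n "a" chars followed by "!"."""
    text = "a" * n + "!"
    start = time.perf_counter()
    result = EVIL.match(text)
    elapsed = time.perf_counter() - start
    assert result is None  # the trailing "!" makes a match impossible
    return elapsed


for n in (10, 14, 18):
    print(n, f"{time_failed_match(n):.4f}s")
```

With a backtracking engine, each extra `a` roughly doubles the work, so a short answer of a few dozen characters can already block the matching for a very long time.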

Some interesting resources

https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
https://blog.bitsrc.io/threats-of-using-regular-expressions-in-javascript-28ddccf5224c
https://www.npmjs.com/package/safe-regex
https://www.npmjs.com/package/re2

What happens in case of ReDoS:
