
Advanced matching

The problem

When answering text exos that expect long answers, formatting or spacing mistakes often cause false negatives. The teacher can add more variants to increase the chance that a correct answer is accepted, but in a long text the combinations of variant choices generate a lot of slightly different possibilities. As a student, it is very frustrating to have a correct answer counted as incorrect so often, and it makes the feedback ambiguous.

Let's look at a typical exo where just adding variants would be a nightmare:

## Exo: Array of booleans
Declare a C++ `array` named `values` of 4 boolean values with the first 2 ones at true.
Solution:
- array<bool, 4> values = {true, true};
- array<bool, 4> values = {true, true, false, false};
- array<bool, 4> values = {true, true, false, false};
- array<bool, 4> values = {true,true,false,false};
- array<bool, 4> values{true, true};
- array<bool, 4> values{true, true, false, false};
- array<bool, 4> values{true, true, false, false};
- std::array<bool, 4> values{true, true, false, false};
...

We could keep doubling the number of variants: with or without the std:: prefix, with or without the ending comma, ... The problem is that all spaces here are optional: the code compiles without them, so an answer without them should not be judged wrong. Worse, when the student types a double space somewhere instead of a single one, well, you guessed it, that's sadly another false negative...

So the real solution to verify this particular answer is a regular expression (regex)! Regexes obviously require some practice, but I assume all teachers in IT know how to use them or can learn them easily.

So, the regex that supports a very broad range of solutions is:

(std::)?array<bool, *4> *values *=? *{true, *true(, *false){0,2}};

If you are not familiar with regex:

  • * matches 0 to n times the last char/group (this is used here to support 0, 1 or more spaces)
  • ? matches 0 to 1 time the last char/group (the group std:: inside the parentheses is optional, same for the =)
  • {0,2} matches 0 to 2 times the last char/group (the 3 possibilities matched are: {true, true}, {true, true, false} and {true, true, false, false})
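
To get a feel for what the regex above accepts, here is a small Python sketch that runs it against a few answers. It is only an illustration: the `is_accepted` helper is hypothetical, Python's `re` module stands in for whatever regex engine Delibay ends up using, and the literal braces are escaped so the pattern stays valid in stricter engines. It also assumes that the whole answer must match, not just a part of it.

```python
import re

# The regex from above; braces that are meant literally are escaped.
PATTERN = re.compile(
    r"(std::)?array<bool, *4> *values *=? *\{true, *true(, *false){0,2}\};"
)


def is_accepted(answer: str) -> bool:
    """Return True if the whole answer matches the pattern (hypothetical helper)."""
    return PATTERN.fullmatch(answer) is not None


answers = [
    "array<bool, 4> values = {true, true};",
    "std::array<bool,4> values{true,true,false,false};",
    "array<bool,  4>  values  =  {true,  true,  false};",
    "array<bool, 4> values = { true, true };",  # space after "{" is not covered
]
for answer in answers:
    print(is_accepted(answer), answer)
```

The first three answers are accepted. The last one is rejected because the pattern has no optional space right after the opening brace, which shows how quickly the ` *` additions pile up.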

To use regexes in Delibay transcripts, the smart and easy way is to wrap the regex in slashes (/regexhere/), as regexes are commonly written, to indicate that this is not normal text. We could then use them everywhere: under Solution:, in the solution cell of a table, mixed with text, ... anywhere raw solution text can be written.

  • TODO: add examples with /regex/ in multiple places
  • TODO: talk about regex engine and flags
  • TODO: edge case with /* comment */ that is NOT a regex!
  • TODO: what about the non-regex solution shown as the solution? Could we extract it from the regex?

Filters system

The concept of filters is another strategy to go beyond regex limitations or regex writing complexity. Basically, **a filter is a way to transform the student's answer before comparing it to the solution**. In the previous example, extra spaces could be removed before we check the answer. "Removing extra spaces" could be a filter with some parameters.

Let's define a few filters, what they do and how to use them:

| Name | Description | Usage | Use case |
|---|---|---|---|
| Remover | Remove all chars given in a list | [rm:*] | [rm: ]: Remove all spaces of [ 4, 5, 6 , 1] before checking it against [4,5,6,1] |
| Sort | Sort the list created by the given separators (it splits the string into an array of substrings and sorts it) | [sort:*] | [sort: ]: Make sure that 1 5 3 2 contains all elements of the list 1 2 3 5 |
| Lowercase | Transform to lowercase | [lower] | [lower]: Make sure Latex, LATEX or latex are judged as correct against the solution latex. The solution does not need to be in lowercase. |
| Uppercase | Transform to uppercase | [upper] | [upper]: Same usage as [lower] |
| ?? | | | |

As these filters affect how the answer is compared to the solution, we add them just after the Solution: prefix.

  • TODO: are there other important filters, or changes to make to remover and sort?
  • TODO: do filters have an application order?
  • TODO: is value trimming applied before filters?
  • TODO: is the solution passed through filters too? Is there a check to catch tricky errors like [rm: ] with a solution containing a space: "hey there"
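
The filters listed above could be sketched as plain string transformations. This is a rough Python illustration, not the actual implementation: all function names are hypothetical, and it assumes that filters are applied in the order they are listed and that both the answer and the solution go through them (two of the questions that are still open above).

```python
from typing import Callable, List

Filter = Callable[[str], str]


def remover(chars: str) -> Filter:
    """[rm:<chars>]: remove every listed character from the text."""
    return lambda text: "".join(c for c in text if c not in chars)


def sort_by(separator: str) -> Filter:
    """[sort:<sep>]: split on the separator, sort the parts, join them back."""
    return lambda text: separator.join(sorted(text.split(separator)))


def lower() -> Filter:
    """[lower]: transform to lowercase."""
    return str.lower


def upper() -> Filter:
    """[upper]: transform to uppercase."""
    return str.upper


def matches(answer: str, solution: str, filters: List[Filter]) -> bool:
    """Apply each filter in order to both texts, then compare them exactly."""
    for apply in filters:
        answer, solution = apply(answer), apply(solution)
    return answer == solution


print(matches("[ 4, 5, 6 , 1]", "[4,5,6,1]", [remover(" ")]))
print(matches("1 5 3 2", "1 2 3 5", [sort_by(" ")]))
print(matches("LATEX", "Latex", [lower()]))
```

All three calls print `True`. Note that this sort compares the parts as strings, so `10` would come before `2`. Whether numeric sorting is needed is another point this sketch leaves open.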

Combinations of regex and filters

The regex in the above example with the values array is pretty heavy to read with all these ` *`, and the space challenge has another edge case: array<bool, 4> values = { true , true, false , false } ;. I know this is an extreme case, because putting those spaces there makes no sense, but extra spaces that don't break compilation should ideally be supported. Instead of adding more ` *` everywhere in the regex, we can combine filters and regexes to make the regex easier to write:

With a filter that removes spaces, this answer

array<bool, 4> values = {   true  , true, false     , false } ;

would be transformed to

array<bool,4>values={true,true,false,false};

and the regex would be

(std::)?array<bool,4>values=?{true,true(,false){0,2}};

In the transcript file, we would define it this way:

## Exo: Array of booleans
Declare a C++ `array` named `values` of 4 boolean values with the first 2 ones at true.
Solution: [rm: ] /(std::)?array<bool,4>values=?{true,true(,false){0,2}};/

...

TODO: how to fix the situation where the extra spaces are actually wrong? For example: arr ay<bool, 4> val ues = {tr ue, true, false, false};
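
The filter and regex combination can be tried out with a few lines of Python. This is a sketch under assumptions: `check` is a hypothetical helper, `str.replace` stands in for the space-removing filter, Python's `re` stands in for the real regex engine, and literal braces are escaped for portability.

```python
import re

# Space-free regex from above, with the literal braces escaped.
PATTERN = re.compile(r"(std::)?array<bool,4>values=?\{true,true(,false){0,2}\};")


def check(answer: str) -> bool:
    """Remove all spaces, then require a full match (hypothetical helper)."""
    stripped = answer.replace(" ", "")  # what a space-removing filter would do
    return PATTERN.fullmatch(stripped) is not None


print(check("array<bool, 4> values = {   true  , true, false     , false } ;"))
print(check("arr ay<bool, 4> val ues = {tr ue, true, false, false};"))
print(check("array<bool, 4> value = {true, true};"))
```

The first answer is accepted and the last one (wrong variable name) is rejected. The second one is accepted too, which reproduces the open question: once every space is removed, spaces placed inside identifiers can no longer be detected.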

Exact multiline code matching

TODO: is it really advanced matching or just a multiline mode in text exo?

There are some cases where a textual solution is multiline and only one correct solution is possible. It can then be autocorrected, with some details to consider.

Here is an example:

## Exo: Ternary to if
Transform this ternary operator with the if/else version.
~~~cpp
return age >= 18 ? "Adult" : "Minor";
~~~
Solution:
~~~cpp
if (age >= 18) {
	return "Adult";
} else {
	return "Minor";
}
~~~

  • TODO: should we mention the language or leave it after the backticks (```cpp)? Fix below or above.
  • TODO: should we add a [multiline] prefix?
  • TODO: should we add a [trimline] filter? Or [trim_lines] or [trim_line]?

Another example:

## Exo: ASCII art
Execute this in your head and give the output of this program:
~~~cpp
//some code here
~~~
Solution:
    *
   ***
  *****
 *******
*********
    *
    *
    *
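
One of the details to consider is which whitespace differences should be ignored. The Python sketch below is one possible approach, not a decided design: `normalize` and `same_text` are hypothetical names, and the `trim_leading` switch only imitates what a line-trimming filter might do. It always ignores line-ending style, trailing spaces and surrounding blank lines, and only ignores indentation when asked to.

```python
from typing import List


def normalize(text: str, trim_leading: bool = False) -> List[str]:
    """Split into lines and drop whitespace that should not matter."""
    lines = text.replace("\r\n", "\n").split("\n")
    lines = [line.strip() if trim_leading else line.rstrip() for line in lines]
    # Drop blank lines at the very start and at the very end only.
    while lines and lines[0] == "":
        lines.pop(0)
    while lines and lines[-1] == "":
        lines.pop()
    return lines


def same_text(answer: str, solution: str, trim_leading: bool = False) -> bool:
    """Compare two multiline texts line by line after normalization."""
    return normalize(answer, trim_leading) == normalize(solution, trim_leading)


code_solution = 'if (age >= 18) {\n\treturn "Adult";\n} else {\n\treturn "Minor";\n}'
code_answer = 'if (age >= 18) {\r\n    return "Adult";  \r\n} else {\r\n    return "Minor";\r\n}\r\n'
print(same_text(code_answer, code_solution))                     # indentation differs
print(same_text(code_answer, code_solution, trim_leading=True))  # indentation ignored

tree = "    *\n   ***"
flat = "*\n***"
print(same_text(flat, tree))                     # leading spaces matter here
print(same_text(flat, tree, trim_leading=True))  # wrongly accepted if trimmed
```

Trimming both ends of each line helps for the code exo (tabs versus spaces) but would accept a flattened drawing in the ASCII art exo, so it probably has to be opt-in per exo.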

Regex support

Regexes should be supported anywhere a textual answer can be given, just by putting slashes around them. Like this:

Solution: /(a|b|c){4}/
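
A parser for such a solution value could look like the following Python sketch. Everything in it is an assumption made for illustration: `make_checker` is a hypothetical name, Python's `re` stands in for the real engine, a full match is required, and the rule "a body starting with `*` is plain text" is just one possible way to keep `/* comment */` from being read as a regex.

```python
import re
from typing import Callable


def make_checker(raw_solution: str) -> Callable[[str], bool]:
    """Build a function that checks an answer against a raw solution value."""
    raw = raw_solution.strip()
    is_wrapped = len(raw) >= 2 and raw.startswith("/") and raw.endswith("/")
    body = raw[1:-1]
    # Assumed rule: "/* ... */" looks like a code comment, so keep it as text.
    if is_wrapped and body and not body.startswith("*"):
        pattern = re.compile(body)
        return lambda answer: pattern.fullmatch(answer) is not None
    # Anything else is compared as normal text.
    return lambda answer: answer == raw


check = make_checker("/(a|b|c){4}/")
print(check("abca"), check("abcd"), check("abc"))

comment = make_checker("/* comment */")
print(comment("/* comment */"))
```

With this rule, `/(a|b|c){4}/` accepts `abca` and rejects `abcd` and `abc`, while `/* comment */` is compared literally.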

Regex support is actually harder than it seems, for security reasons. As both the regex and the content are provided by users, they are inherently unsafe and expose the matching to ReDoS attacks.

Some interesting resources

  • https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
  • https://blog.bitsrc.io/threats-of-using-regular-expressions-in-javascript-28ddccf5224c
  • https://www.npmjs.com/package/safe-regex
  • https://www.npmjs.com/package/re2

What happens in case of ReDoS:

  • TBD