Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expand monarch functionality to allow state access within rules #183463

Merged

Conversation

jeremy-rifkin
Copy link
Contributor

@jeremy-rifkin jeremy-rifkin commented May 25, 2023

I think monarch is not powerful enough to tokenize constructs such as C++ raw string literals in the general case. This is an attempt to expand monarch functionality so it can.

Monaco tokenization of C++ raw strings is broken when the d-char-sequence  contains double quotes microsoft/monaco-editor#3128

image

This is because https://github.com/microsoft/monaco-editor/blob/main/src/basic-languages/cpp/cpp.ts looks for the end of a raw string literal by searching for /(.*)(\))(?:([^ ()\\\t"]*))(\")/ and then checks $3==$S2 to see if the end has in fact been found, but only compares what is between a closing parenthesis and the first double quote.

As far as I can tell, there is no way to solve this currently in monarch, without doing something hacky like hard-coding for all d-char-sequences of 16 or fewer characters, for example (the C++ standard does specify this limit). Maybe there is a way that I haven't thought of, in which case this PR isn't needed. Most languages do not seem to place such limitations on the delimiter sequences, e.g. PHP and C#. Rust tokenization can be improved as well with the tools provided by the PR.


This PR expands monarch functionality by allowing $S2 to be used within rule regexes, so the fix for microsoft/monaco-editor#3128 can be something along the lines of

        root: [
            // C++ 11 Raw String
            [/@encoding?R\"(?:([^ ()\\\t]*))\(/, { token: 'string.raw.begin', next: '@raw.$1' }],
            ...
        ],
        raw: [
+           [/.*\)$S2\"/, 'string.raw', '@pop']
-           [
-               /(.*)(\))(?:([^ ()\\\t"]*))(\")/,
-               {
-                   cases: {
-                       '$3==$S2': [
-                           'string.raw',
-                           'string.raw.end',
-                           'string.raw.end',
-                           { token: 'string.raw.end', next: '@pop' }
-                       ],
-                       '@default': ['string.raw', 'string.raw', 'string.raw', 'string.raw']
-                   }
-               }
-           ],
            [/.*/, 'string.raw']
        ],

I have not fully tested this PR yet.

@jeremy-rifkin
Copy link
Contributor Author

@microsoft-github-policy-service agree

@jeremy-rifkin
Copy link
Contributor Author

The automated test cases and unit tests pass. I could use some guidance on how to test the changes in conjunction with monaco / a test monarch grammar.

@alexdima alexdima added this to the March 2024 milestone Mar 19, 2024
Copy link
Member

@alexdima alexdima left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@alexdima alexdima enabled auto-merge (squash) March 19, 2024 15:50
@alexdima alexdima merged commit 20deb62 into microsoft:main Mar 19, 2024
6 checks passed
@microsoft microsoft locked and limited conversation to collaborators Jun 11, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants