Page MenuHomePhabricator

Enable clicking on wikilinks and external links in syntax-highlighted code
Closed, ResolvedPublicFeature

Assigned To
Authored By
SD0001
Jun 21 2024, 5:42 PM
Referenced Files
None
Tokens
"Love" token, awarded by Dringsim."Love" token, awarded by Sunpriat2."Love" token, awarded by Pcoombe."Love" token, awarded by MusikAnimal."Love" token, awarded by thcipriani."Love" token, awarded by Rexogamer."Love" token, awarded by Jack_who_built_the_house."Love" token, awarded by Pppery."Love" token, awarded by Nemoralis."Love" token, awarded by stjn."Love" token, awarded by Awesome_Aasim.

Description

Feature summary (what you would like to be able to do and where):
External links and [[wikilinks]], and perhaps even {{template links}} in code should result in clickable links.

Default gadget on Wiktionary CodeLinks.js provides this functionality.

Benefits (why should this be implemented?):
Makes it easier to navigate to the linked pages.

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Change #1048541 merged by jenkins-bot:

[mediawiki/extensions/SyntaxHighlight_GeSHi@master] Enable clicking on wikilinks and external links in highlighted code

https://gerrit.wikimedia.org/r/1048541

Would’ve been a bit more consistent to make {{Tl}} the whole link, too. Though I understand it might’ve been just replicating how CodeLinks does it.

Additionally, it might be good to use mw.Title instead of mw.util.getUrl since currently the script outputs uncapitalised links for syntax like {{tl}} or [[lowercase]]:
https://patchdemo-legacy.wmcloud.org/wikis/66b1838f1a/wiki/MediaWiki:Gadget-abc.js

Change #1048543 merged by jenkins-bot:

[mediawiki/extensions/Scribunto@master] Rename SyntaxHighlight RL module

https://gerrit.wikimedia.org/r/1048543

Change #1071889 had a related patch set uploaded (by SD0001; author: SD0001):

[mediawiki/extensions/SyntaxHighlight_GeSHi@master] Link page references in Scribunto modules

https://gerrit.wikimedia.org/r/1071889

Using JavaScript for this in MediaWiki core makes no sense because this is completely static content (on Wiktionary it's implemented this way purely by necessity). Also, it doesn't seem like it checks whether or not the link is a redlink. If I were implementing this as a gadget, I would put all of the links into the Parse API and use that (so one API call per page — not too bad) to guarantee that the HTML is exactly the same as it would be on a regular page.

Using JavaScript for this in MediaWiki core makes no sense

It's not ideal, yes, but it's an improvement over the status quo of no links. The links are just optional enhancements so using JavaScript is acceptable. Doing it in the backend is more difficult.

Also, it doesn't seem like it checks whether or not the link is a redlink. If I were implementing this as a gadget, I would put all of the links into the Parse API and use that (so one API call per page — not too bad)

Hacks like that are fine for user scripts or gadgets, but in MediaWiki itself we shouldn't use the parsing API on page views. Parsing is expensive.

For Tech News, timing-wise, is this ready for inclusion in Monday's edition (or should it wait another week, for the latest patch to be merged)?

Content-wise: What should the entry say?
My best guess (please clarify or confirm) is something like:

Readers can now click on wikilinks and external links that are written within syntaxhighlight tags. Thanks to SD0001 for these improvements. [1]

Found an issue here, about namespaced includes:

/* {{Projet:JavaScript/Script|FlecheHaut}} */

Notice the include has an explicit namespace, instead of the implicit "Template:" ("Modèle:" in French).

There is a link on the Projet:JavaScript/Script part, that points to Modèle:Projet:JavaScript/Script, but it should be to Projet:JavaScript/Script.

Noticed a wrong highlight here:

-- s is a string starting with '[[' and ending with ']]'. It does not contain any other ']]' strings.

The [[' and ending with ']] part is linked, pointing to ' and ending with '.

Change #1072879 had a related patch set uploaded (by SD0001; author: SD0001):

[mediawiki/extensions/SyntaxHighlight_GeSHi@master] Fix bugs in link generation

https://gerrit.wikimedia.org/r/1072879

There is a link on the Projet:JavaScript/Script part, that points to Modèle:Projet:JavaScript/Script, but it should be to Projet:JavaScript/Script.

Will be fixed by the above patch.

The [[' and ending with ']] part is linked, pointing to ' and ending with '.

That's a valid page title, so not much can be done about it.

This feature fails for jsdoc for objects as well:

@param {{ key1: valueType1, key2: valueType2 }} paramName

is linked to Template:key1: valueType1, key2: valueType2 even though it's unwanted. This is a valid template name but it can be checked for jsdoc keywords presence (in fact, I did it in my script for links in source code).

generally, my script is more functional, as it links /wiki/, /w/ and protocol-relative links as well, also detecting links inside strings, in importScript and mw.loader.load invocations and strips some params like action=raw&ctype=... which in turn is useful for dependence tracking in user scripts

Can we actually get it so that template links are not automatically clickable but by an opt in in User Preferences in Appearance? I like the concept, but I do not like the way it is implemented, especially on Lua pages.

This feature fails for jsdoc for objects as well:

@param {{ key1: valueType1, key2: valueType2 }} paramName

is linked to Template:key1: valueType1, key2: valueType2 even though it's unwanted. This is a valid template name but it can be checked for jsdoc keywords presence (in fact, I did it in my script for links in source code).

I would propose a more generalized heuristic - just skip the link if the "page name" contains multiple punctuation/special characters, as that makes it quite less likely that a link was intended. It will take care of the [[' and ending with ']] case as well.

Can we actually get it so that template links are not automatically clickable but by an opt in in User Preferences in Appearance? I like the concept, but I do not like the way it is implemented, especially on Lua pages.

Can you be specific about the issues you see? Just making it a user preference is seldom a good idea.

Can we actually get it so that template links are not automatically clickable but by an opt in in User Preferences in Appearance? I like the concept, but I do not like the way it is implemented, especially on Lua pages.

Can you be specific about the issues you see? Just making it a user preference is seldom a good idea.

I recognize that, but this is a major change to syntaxhighlight. Not everyone wants to see this the way it is. For example, the following code is linked like this:
generate {{some template ...|...}}. Unless if we explicitly avoid linking anything in nowiki, which might actually be a better solution.

I would propose a more generalized heuristic - just skip the link of the "page name" contains multiple punctuation/special characters, as that makes it quite less likely that a link was intended. It will take care of the [[' and ending with ']] case as well.

I would favor keeping a simple code, over trying to add some heuristics that would improve some detections, but probably worsen some others.

These tags are in comment lines, thus their content is really free. There is no parsing that would enforce/provide some semantics.
I suppose we'll have to accept that, and live with the few erroneous detections. And having all these helpful links is worth having a few wrong ones.

Though, we could fix some detections on a case-by-case basis. But again, would it be really worth it?

generate {{some template ...|...}}. Unless if we explicitly avoid linking anything in nowiki, which might actually be a better solution.

<nowiki> is used in JS/Lua code to stop the page from ghost-including other pages/templates unnecessarily, like in https://ru.wikipedia.org/wiki/MediaWiki:Gadget-iwrm.js so that can’t be a general solution.

I think the original suggestion of ignoring JSDoc lines is a bit better.

Would’ve been a bit more consistent to make {{Tl}} the whole link, too. Though I understand it might’ve been just replicating how CodeLinks does it.

Or to make [[link]] less link-y and behave like {{Tl}}?. I honestly think that's better, but not blocker to this enhancement

Or to make [[link]] less link-y and behave like {{Tl}}?. I honestly think that's better, but not blocker to this enhancement

Making link area smaller is generally never a good idea, so I don't think this is a good idea either. I wasn’t suggesting it to be a blocker.


@SD0001, I spotted a bug: for some reason protocol-relative links get weird href that makes them point to a wrong location:
https://ru.wikipedia.org/w/index.php?title=MediaWiki:Gadget-logo.css&oldid=135341064

Or to make [[link]] less link-y and behave like {{Tl}}?. I honestly think that's better, but not blocker to this enhancement

Making link area smaller is generally never a good idea, so I don't think this is a good idea either. I wasn’t suggesting it to be a blocker.

How is that making it smaller? In wikitext do you see the [[]] as part of the link? Does that make the links there 'smaller'?.

In wikitext you do not have [[]] displayed at all. This is not wikitext (and it shouldn’t). It would be smaller in terms of clickable area.

So you do not see the contradiction in what you're saying? Fine

To avoid a few false positives, we might do /\[\[(?!['"]|[ \xA0]*»).../

i.e. [[ but only if not followed by ', "", », as a way to avoid '[[', "[[", « [[ »

That would also remove linking on the technically valid [[' and ending with ']]… but indeed, technically valid only… there are not much page titles that start with a ', even less ones relevant to JavaScript code…

On the other hand, we should still link things like ''[[Foobar]]''


Edit: There are more quotation marks to support, see Quotation mark#Summary table. Adding (almost) all of the closing marks, and factorizing the [ \xA0]* both to simplify and to support more cases (in particular, see the Ido language), the result would be:

/\[\[(?![ \xA0]*["'”“’‘«»‹›]).../

Edit 2: Though, there would be false positives with this expanded version, for instance with [[« in Russian, see comments below. To address this issue, we would need a proper version of my stricter suggestion here.

«» are legitimate symbols in some circumstances (and in languages like Russian), as are "" etc. There is no reason to assume per se that someone doesn’t want to link to a page with quotation marks.

As I said, a page title starting by such character is very unlikely.

@SD0001, I spotted a bug: for some reason protocol-relative links get weird href that makes them point to a wrong location:
https://ru.wikipedia.org/w/index.php?title=MediaWiki:Gadget-logo.css&oldid=135341064

I quickly figured out something like:
/(?:\bhttps?:\/\/|(?<!:)\/\/|\bwww\d{0,3}[.]|\b[a-z0-9.-]+[.][a-z]{2,4}\/).../i

(current regex can be found here)

Beware of the \b boundary: contrarily to the other branches of the alternation, we should not put it before //.

By the way, in the whole pattern, the outermost parentheses are useless and could be removed: \b((?:...)(?:...)+(?:...))

You may not have noticed that I'm specifically looking for a closing ».

The only valid page titles that my code would reject are those that start with ', " or ». Quite unlikely, isn't it? (see: Special:PrefixIndex/', Special:PrefixIndex/", Special:PrefixIndex/»)

(at worse, my code would make miss some expected, but exotic link… Very unlikely, and nevertheless, one missing link won't harm. On the other hand, we would avoid false positive on all the '[['… I think is could be worth it.)

(for fun) A stricter version of:

/\[\[(?!['"]|[ \xA0]*»).../

To unneededly make extra sure to avoid only '[[', "[[", « [[ », would be:

/(?:(?<!')\[\[(?!')|(?<!")\[\[(?!")|(?<!«[ \xA0]*)\[\[(?![ \xA0]*»)).../ (edit: nope, see note at end)

I don't recommend the latter.

(edit: you can forget it even more: non-fixed width lookbehinds are not supported…)

For Tech News, timing-wise, is this ready for inclusion in Monday's edition (or should it wait another week, for the latest patch to be merged)?

Content-wise: What should the entry say?
My best guess (please clarify or confirm) is something like:

Readers can now click on wikilinks and external links that are written within syntaxhighlight tags. Thanks to SD0001 for these improvements. [1]

It's fine to include as the main patch has been merged. I'd phrase it as: Wikilinks and external links used within <syntaxhighlight> tags and on code pages (JavaScript, CSS, Scribunto and Sanitized CSS) now result in clickable links. Uses of the {{template syntax}} are also linked to the template page. .

This new behavior appears to have some false positives and undesired conversion of plain, unlinked text into links. As such, it should probably be opt-in via text inside the syntaxhighlight tag rather than on by default.

If opt-in is not possible, how do we turn off this new linking for a given set of tags?

It would be much more helpful to others if you actually provide links to false positives.

It would be much more helpful to others if you actually provide links to false positives.

There is one example given above. Here it is on a page: https://en.wikipedia.org/w/index.php?title=User:Jonesey95/sandbox&oldid=1247494681

I was expecting real-world examples, not theoretical ones. I agree that this feature can be theoretically restricted to code pages, though: wikitext pages (or at least articles) are probably better off without this enhancement since for no-JS users, it would make it impossible for them to use links added in this way.

Change #1072879 merged by jenkins-bot:

[mediawiki/extensions/SyntaxHighlight_GeSHi@master] Fix bugs in link generation

https://gerrit.wikimedia.org/r/1072879

Another erroneous link, on an external URL, see here: https://fr.wikipedia.org/w/index.php?title=MediaWiki:Common.js&oldid=218883767#L-500

Text: www.mediawiki.org/wiki/Snippets/Direct_imagelinks_to_Commons (URL starting with www., without protocol)

  • Actual link: https://fr.wikipedia.org/w/www.mediawiki.org/wiki/Snippets/Direct_imagelinks_to_Commons
  • Expected link: https://www.mediawiki.org/wiki/Snippets/Direct_imagelinks_to_Commons

The URL is correctly detected, but because it is assigned to link.href without protocol, it is handled as a relative path instead of a full URL.

https://en.wikipedia.org/wiki/User:Shrikarsan/shri.js has a link that goes to a different location than expected. The link is:

//ha.wikipedia.org/w/index.php?title=User:Shrikarsan/shri.js/load.js&action=raw&ctype=text/javascript

The link target is:

https://en.wikipedia.org/wiki/User:Shrikarsan/ha.wikipedia.org/w/index.php?title=User:Shrikarsan/shri.js/load.js&action=raw&ctype=text/javascript

Change #1075850 had a related patch set uploaded (by Ammarpad; author: Ammarpad):

[mediawiki/extensions/SyntaxHighlight_GeSHi@master] code-link: Fix link for URLs without protocol

https://gerrit.wikimedia.org/r/1075850

https://en.wikipedia.org/wiki/User:Shrikarsan/shri.js has a link that goes to a different location than expected. The link is:

//ha.wikipedia.org/w/index.php?title=User:Shrikarsan/shri.js/load.js&action=raw&ctype=text/javascript

The link target is:

https://en.wikipedia.org/wiki/User:Shrikarsan/ha.wikipedia.org/w/index.php?title=User:Shrikarsan/shri.js/load.js&action=raw&ctype=text/javascript

Note the URLMatch regex doesn't match protocol-relative URLs currently, and in your case, the link is not applied on //ha.wikipedia.org/..., but on ha.wikipedia.org/... (the leading "//" is not linked).

Therefore, gerrit 1075850, in its current state, would fail if URLMatch were later modified to match protocol-relative URLs (the produced link target would be something like https:////ha.wikipedia.org/...)

This needs to be an opt-in (and when this becomes stable, opt-out) feature before this is actually finalized.

I am not opposed to the idea of having clickable wikilinks in code. What I am opposed to is not being able to turn it off where it is problematic. Also, red links are not handled correctly; these should be red in code rather than blue.

No, it doesn’t need to be opt-in, and the remaining issues can be resolved in due course. @SD0001 hasn’t shown that they are going to be unresponsive to them, so I don't get the point of hastily asking for what is actually a bigger time investment to make this a configurable option. Time spent on that would be much more productively spent on resolving the issues raised by users. The point about red links was raised above already.

The red link issue could be also overcome by making the links inherit the font color from the proper token (as most code editors do). This would be my preferred option but I'm not going to die for it.

Change #1075850 merged by jenkins-bot:

[mediawiki/extensions/SyntaxHighlight_GeSHi@master] ode-link: Fix link for URLs without protocol

https://gerrit.wikimedia.org/r/1075850

Change #1071889 merged by jenkins-bot:

[mediawiki/extensions/SyntaxHighlight_GeSHi@master] Link page references in Scribunto modules

https://gerrit.wikimedia.org/r/1071889

One more thing: For interwiki links, if the project is not on Wikimedia, it results in "bad title". This should be fixed as well.

See https://en.wikipedia.org/wiki/Module:Docbunto, a module I imported from Fandom. The interwiki links are not followed, resulting in "bad title", etc.

Change #1079584 had a related patch set uploaded (by Ammarpad; author: Ammarpad):

[mediawiki/extensions/SyntaxHighlight_GeSHi@master] pypgments.links.js: Handle empty wikilinks and pipe trick

https://gerrit.wikimedia.org/r/1079584

I noticed two other issues:

  • When there is an anchor (e.g. [[Foo#Bar]]), the whole text is linked as expected, but the anchor is missing from the link target.
  • When Live preview is enabled, the links are created only on the first preview. Subsequent previews don't have links.

Change #1079617 had a related patch set uploaded (by SD0001; author: SD0001):

[mediawiki/extensions/SyntaxHighlight_GeSHi@master] links: show links in live previews and preserve fragments in links

https://gerrit.wikimedia.org/r/1079617

Change #1079584 merged by jenkins-bot:

[mediawiki/extensions/SyntaxHighlight_GeSHi@master] pygments.links.js: Handle empty wikilinks and pipe trick

https://gerrit.wikimedia.org/r/1079584

I think Idc554269ee52a05660fa41f065a2b3c73e2e1b9b broke the extension, because either it doesn't define the remoteExtPath or, it redefined the localbasepath as the remoteExtPath. (not sure how inheritance of these vars works when specifying it from the hook)

You can verify this by running with ?debug=true, and then the JS file cannot be found and served by resourceloader.

[Error] Failed to load resource: the server responded with a status of 404 (Not Found) (pygments.linenumbers.js, line 0)
[Error] Failed to load resource: the server responded with a status of 404 (Not Found) (pygments.links.js, line 0)
[Error] Failed to load resource: the server responded with a status of 404 (Not Found) (pygments.copy.js, line 0)
[Error] Failed to load resource: the server responded with a status of 404 (Not Found) (pygments.links.scribunto.js, line 0)

A fix for this should get backported to REL1_43 branch

Change #1079617 merged by jenkins-bot:

[mediawiki/extensions/SyntaxHighlight_GeSHi@master] links: show links in live previews and preserve fragments in links

https://gerrit.wikimedia.org/r/1079617

I think Idc554269ee52a05660fa41f065a2b3c73e2e1b9b broke the extension [in debug mode], because either it doesn't define the remoteExtPath or, it redefined the localbasepath as the remoteExtPath.

FTR, this was fixed in T378878.

SD0001 claimed this task.

Closing this now. Thanks everyone for the feedback and bug reports. Please file any remaining issues as separate tickets.