For all Wikimedia sites, add the following rewrite:
```
RewriteRule /data/(.*)/(.*)$ /wiki/Special:PageData/$1/$2 [QSA]
```
See T161527: RFC: Canonical data URLs for machine readable page content for the rationale.
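For illustration only, the mapping this rewrite performs (including the `[QSA]` query-string append) can be sketched in Python. The function name is made up for the example; the regex and target are taken verbatim from the rule above:

```python
import re

def rewrite_data_url(path, query=""):
    """Sketch of: RewriteRule /data/(.*)/(.*)$ /wiki/Special:PageData/$1/$2 [QSA]

    Maps a /data/<slot>/<title> request to the Special:PageData entry
    point; [QSA] means the original query string is appended to the target.
    """
    m = re.match(r"/data/(.*)/(.*)$", path)
    if m is None:
        return path  # rule does not apply; request passes through unchanged
    target = "/wiki/Special:PageData/%s/%s" % (m.group(1), m.group(2))
    return target + ("?" + query if query else "")

print(rewrite_data_url("/data/main/Albert_Einstein"))
# /wiki/Special:PageData/main/Albert_Einstein
```

Note that the first `(.*)` is greedy, so for titles containing slashes (e.g. `Data:Bundestagswahl2017/wahlkreis46.map`) only the last path segment ends up in `$2` under this naive reading; the actual Apache semantics should be verified.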
| Status | Subtype | Assigned | Task |
|---|---|---|---|
| Open | | None | T91505 [Epic] Adding new datatypes to Wikidata (tracking) |
| Resolved | | Jonas | T57549 [Story] Add a new datatype for geoshapes |
| Resolved | | Ladsgroup | T160535 [Task] Provide RDF mapping for geoshape data type |
| Open | | None | T176764 Use the new /data/ path for canonical wikibase entity data URIs |
| Open | | None | T163921 [Epic] Implement canonical data URLs for machine readable page content |
| Open | | None | T163922 Create a URL rewrite to handle the /data/ path for canonical URLs for machine readable page content |
| Resolved | | Ladsgroup | T163923 Create Special:PageData as a canonical entry point for machine readable page data |
The patch is merged but not deployed, so I think we should wait a little. But given how the RFC is specified and how this is implemented here, I think we need to change the code base to accept the slot too (even though it ignores it for now). Let me clarify: right now, Special:PageData/foo/bar goes to page foo and ignores bar; it should go to page bar and ignore foo.
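To make the intended semantics concrete, here is a hypothetical sketch (names are illustrative, not the actual MediaWiki code): the Special:PageData subpage is parsed as `<slot>/<title>`, so the second component is the page to look up and the first is the slot, which may be ignored for now:

```python
def parse_pagedata_subpage(subpage):
    """Split a Special:PageData subpage into (slot, title).

    'main/Albert_Einstein' -> slot 'main', title 'Albert_Einstein'.
    Only the first slash separates the slot, so titles containing
    slashes stay intact. The slot is accepted even if ignored;
    the title is the page that is actually resolved.
    """
    slot, sep, title = subpage.partition("/")
    if not sep:
        # no slash: the whole subpage is the title, assume the default slot
        return "main", subpage
    return slot, title

print(parse_pagedata_subpage("main/Data:Bundestagswahl2017/wahlkreis46.map"))
# ('main', 'Data:Bundestagswahl2017/wahlkreis46.map')
```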
@Ladsgroup I agree that the rewrite should only be done once Special:PageData is live.
The fix for the foo/bar problem is now also merged. Thanks for noticing, I completely missed that, even though I wrote the spec :)
Change 360887 had a related patch set uploaded (by Ladsgroup; owner: Amir Sarabadani):
[operations/puppet@production] Add /data/ Redirect for commons
Change 360891 had a related patch set uploaded (by Ladsgroup; owner: Amir Sarabadani):
[operations/puppet@production] Add /data/ url redirect in beta cluster (Wikipedia only)
Mentioned in SAL (#wikimedia-releng) [2017-06-22T19:02:18Z] <Amir1> cherry-picking gerrit:360891/1 (T163922)
I cherry-picked the patch in beta cluster and it works just fine: https://en.wikipedia.beta.wmflabs.org/data/main/Albert_Einstein
Change 360891 merged by Filippo Giunchedi:
[operations/puppet@production] Add /data/ url redirect in beta cluster (Wikipedia only)
@Dereckson you marked this ticket as blocked on the ops boards - but I don't see what it's blocked on. How do we move forward?
Is this just blocked on the question of HTTP 301 vs. 303, which is still open on Gerrit? Or is there something else? We should really get this redirect in place, we’ve already been exposing /data/ URLs in RDF exports and the query service for a while now.
There are other comments from yours truly in the last review besides the 303 vs. 301 part, namely about maintaining the status quo of how the redirect is configured; on the status code I can be convinced with a good enough argument, but I haven't yet seen a reply.
@akosiaris I replied on the patch. Basically: 301, 302, 303 are all wrong. Pick one and give us a redirect :)
Using an absolute target URL for consistency seems like a good idea, even if it's not necessary.
As there is no argument for 303, we only have to choose between 301 and 302, and the main question for that is whether the target URL is stable.
If the redirect is stable and will always point to the same resource at the same URL, we can use a 301 (permanent). If not, it should be a 302.
*sigh* 301 it is, then. I wrote some more on the patch. Let's just ignore the pesky RFC ;)
Change 360887 merged by Alexandros Kosiaris:
[operations/puppet@production] Add /data/ Redirect for commons
Change 380774 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[operations/puppet@production] Fix /data/ redirect for commons
Change 380774 merged by Elukey:
[operations/puppet@production] Fix /data/ redirect for commons
I see the fix got merged, but it doesn't seem to be live yet.
In general, this raises the question of testing this kind of patch. Do we have an environment where this would be possible, in particular also for people that don't have shell access to production servers?
It seems to be live on some servers and not yet on others. (And also, Varnish caches the redirects.) I’m running this command:
```
until ! curl -s -I "https://commons.wikimedia.org/data/main/Data:Bundestagswahl2017/wahlkreis46.map?breakCache=$RANDOM" | grep -qF RW_PROTO; do
  sleep 60s
done
notify-send 'redirect fixed (at least when not cached)'
```
and it occasionally sends out notifications already.
Also I think some are behind varnish, e.g. this works fine: https://commons.wikimedia.org/data/main/Data:Amsterdam_Districts.map
All the appservers are now returning the correct version of the redirect; I think that some of them still show up broken due to caching.
```
elukey@tin:~$ apache-fast-test broken pybal
testing 1 urls on 247 servers, totalling 247 requests
spawning threads...............................
https://commons.wikimedia.org/data/main/Data:Bundestagswahl2017/wahlkreis46.map
 * 301 Moved Permanently https://commons.wikimedia.org/wiki/Special:PageData/main/Data:Bundestagswahl2017/wahlkreis46.map
```
(broken was my config to test the Data:Bundestagswahl2017 url)
I believe this is only a matter of cleaning up URLs that show up garbled; @ema just did it for https://commons.wikimedia.org/data/main/Data:Bundestagswahl2017/wahlkreis46.map via https://wikitech.wikimedia.org/wiki/Multicast_HTCP_purging#One-off_purge
Is there any way to find out which URLs are garbled? Can we look for RW_PROTO in all the cached redirects, or something like that?
There is a way (https://wikitech.wikimedia.org/wiki/Varnish#One-off_purges_.28bans.29), but it is going to be risky if we don't get the purge pattern right, since we might risk cutting too many objects from the cache. I would avoid it if possible (and wait for the normal cache expiry), but we can discuss another approach with the Traffic team if you want.
If the TTL isn’t too long (I saw a cap of 1 day in the puppet config, is that correct?), then normal expiry is probably enough.
It doesn't work like that. How long a response can be cached is determined by the headers the response sets. http://book.varnish-software.com/3.0/HTTP.html is a pretty interesting read if you've never read it before. It's also a rabbit hole (albeit not a very big one ;-)). In the absence of those headers (like in this case, where only the Age header was set), it's not easy to deduce when the page is going to be removed from all existing caches (some of which we don't really control, like the browser cache).
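As a rough illustration of why the remaining lifetime is hard to pin down, here is a simplified freshness check in the spirit of RFC 7234 (ignoring Expires, heuristics, and shared-cache directives; the function is a sketch, not how Varnish actually computes TTLs):

```python
import re

def remaining_freshness(headers):
    """Return the remaining freshness lifetime in seconds, or None.

    Simplified RFC 7234 logic: remaining = max-age - Age. If the
    response carries no explicit max-age (as with the redirect
    discussed here, which only set Age), caches fall back to
    heuristics and the expiry time cannot be deduced from the
    response alone.
    """
    cc = headers.get("Cache-Control", "")
    m = re.search(r"max-age=(\d+)", cc)
    if m is None:
        return None  # no explicit lifetime: heuristic caching, unknowable here
    age = int(headers.get("Age", "0"))
    return max(0, int(m.group(1)) - age)

print(remaining_freshness({"Cache-Control": "max-age=86400", "Age": "600"}))  # 85800
print(remaining_freshness({"Age": "600"}))  # None
```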
Anyway, I've purged the caches in order to resolve this faster instead of waiting it out. For the interested, the commands were (in that sequence):
```
varnishadm ban "obj.status == 301 && req.http.host ~ commons.wikimedia.org"
varnishadm -n frontend ban "obj.status == 301 && req.http.host ~ commons.wikimedia.org"
```
Do remember to force refresh to test it as your browser probably has the result cached as well.
Actually, as per the RFC, this is for all Wikimedia wikis. It's independent of Wikibase/Wikidata. Wikidata just happens to be the driving use case.
Change 382163 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[operations/puppet@production] Change /data/ redirect to Special:Pagedata
Change 382163 abandoned by Lucas Werkmeister (WMDE):
Change /data/ redirect to Special:Pagedata
Reason:
Abandoning in favor of https://gerrit.wikimedia.org/r/#/c/382172, which makes PageData the proper title.