MediaWiki::outputResponsePayload seemingly causes net::ERR_HTTP2_PROTOCOL_ERROR 200 and compression issues in 1.35
Closed, Resolved (Public)

Description

I found this:
https://github.com/wikimedia/mediawiki/commit/4f11b614544be8cb6198fbbef36e90206ed311bf#diff-6a25b7d123e94e4ae53bd62ecb2ebac4d94d85090177605742bcfda53d3c786cR1000-R1003
https://gerrit.wikimedia.org/r/c/mediawiki/core/+/538362
T206283: Failed deferred updates should be queued as jobs if possible (Deadlock from LinksUpdate in WikiPage::updateCategoryCounts)

And commenting out the following lines:

			ini_set( 'zlib.output_compression', 0 );
			if ( function_exists( 'apache_setenv' ) ) {
				apache_setenv( 'no-gzip', '1' );
			}

Leads to pages once again loading for me.
So I guess this is a regression, or some set of things interacting badly, introduced around this part of the code.

It seems this has surfaced in a variety of places.

Original description:

When attempting to access a MediaWiki path in Chrome that sets a session cookie like https://thegoodplace.wmflabs.org/wiki/Special:CreateAccount the request fails with the following error:

net::ERR_HTTP2_PROTOCOL_ERROR 200

Accessing other pages like https://thegoodplace.wmflabs.org/wiki/Mars works fine.

This is only a problem in Chrome; it does not happen in Firefox or Safari. It also does not seem to happen locally or in production. The only place it seems to happen is on a Cloud VPS.

Here is a captured network log that can be browsed by uploading it to https://netlog-viewer.appspot.com

For affected users, until a fix is found, there's a workaround: add this to LocalSettings.php: $wgDisableOutputCompression = true;. This effectively disables output compression on the MediaWiki side. Depending on your setup, you can enable compression at the webserver layer (Apache, nginx) instead. If this doesn't work or causes other problems, comment out the following line in the includes/MediaWiki.php file: $response->header( 'Content-Encoding: identity' );
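As a minimal sketch, the first workaround would look like this in LocalSettings.php. The placement at the end of the file is an assumption, and later comments on this task report that the setting does not help on every host.

```php
<?php
// LocalSettings.php (fragment; assumed to go near the end of the file)

// Workaround described in this task: stop MediaWiki from handling
// output compression itself. Compression can still be enabled at the
// webserver layer if the setup allows it.
$wgDisableOutputCompression = true;
```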

Event Timeline

FWIW, on FactGrid the query service updater failed after the MediaWiki 1.35 upgrade until I applied the equivalent of 0653fd0da802ac9cacbd745664165d345cb98b2c locally.

org.wikidata.query.rdf.tool.exception.RetryableException: Error fetching RDF for https://database.factgrid.de/wiki/Special:EntityData/Q175990.ttl?flavor=dump&nocache=1602481346169
	at org.wikidata.query.rdf.tool.wikibase.WikibaseRepository.collectStatementsFromUrl(WikibaseRepository.java:399)
	at org.wikidata.query.rdf.tool.wikibase.WikibaseRepository.fetchRdfForEntity(WikibaseRepository.java:457)
	at org.wikidata.query.rdf.tool.wikibase.WikibaseRepository.fetchRdfForEntity(WikibaseRepository.java:433)
	at org.wikidata.query.rdf.tool.Updater.handleChange(Updater.java:362)
	at org.wikidata.query.rdf.tool.Updater.lambda$handleChanges$0(Updater.java:236)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.http.client.ClientProtocolException: null
	at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:186)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:107)
	at org.wikidata.query.rdf.tool.wikibase.WikibaseRepository.collectStatementsFromUrl(WikibaseRepository.java:369)
	... 8 common frames omitted
Caused by: org.apache.http.HttpException: Unsupported Content-Coding: none
	at org.apache.http.client.protocol.ResponseContentEncoding.process(ResponseContentEncoding.java:130)
	at org.apache.http.protocol.ImmutableHttpProcessor.process(ImmutableHttpProcessor.java:141)
	at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:189)
	at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:88)
	at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
	at org.apache.http.impl.execchain.ServiceUnavailableRetryExec.execute(ServiceUnavailableRetryExec.java:84)
	at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
	... 11 common frames omitted

I then saw that @Platonides had already found and fixed this at T258877, and merged and then later backported the fix since I hadn’t heard of any problems with it.

Seemingly https://gerrit.wikimedia.org/r/c/mediawiki/core/+/641776 which is now on REL1_35 (for the last week or so) does fix the issue that I was having with an UptimeCheck as described in T235554#6628446 when setting $wgDisableOutputCompression = true;

However, having $wgDisableOutputCompression = false; still leads to some unknown odd things happening: the UptimeCheck just outputs Responded with 'Request Exception' in 30,000 ms. for me (with no way of seeing more details, as far as I can tell).

@LucasWerkmeister what is the value of $wgDisableOutputCompression on FactGrid?

So $wgDisableOutputCompression = true; seems to be a good workaround for me on the current version of REL1_35, but I think there is still an issue here.

@LucasWerkmeister what is the value of $wgDisableOutputCompression on FactGrid?

It’s false.

I had a user who couldn't access a MediaWiki 1.35 wiki on iOS 10.3.3, with an error message saying "cannot open page because the network connection was lost". I commented-out $response->header( 'Content-Length: ' . ob_get_length() ); in MediaWiki.php and that fixed it for them.

Our wiki (shared hosting) was updated to MediaWiki 1.35.1 (PHP 7.4.13) this evening and the workaround to use $wgDisableOutputCompression = true no longer seems to work. When included in LocalSettings.php it was causing all output to look like random symbols, so I had to comment it out. So I'm back to having iPhone and iPad users running iOS 14.x hanging indefinitely. The MediaWiki 1.35.1 Release Notes indicate T258877 was fixed in this release.

Most frustratingly, once again I cannot reproduce this problem on my localhost testbed: MediaWiki 1.35.1 on CentOS 8, Apache/2.4.37 (centos), PHP 7.4.13. I can leave $wgDisableOutputCompression = true and everything works properly, including the Apple devices running iOS 14.x.

There are reports of users fixing this by commenting out the line $response->header( 'Content-Encoding: identity' );

See this topic

With the observation being that, before commenting out that line, there is both a Content-Encoding header with the value identity AND one with the value gzip.

And I quote from the RFC

identity: The default (identity) encoding; the use of no transformation whatsoever. This content-coding is used only in the Accept-Encoding header, and SHOULD NOT be used in the Content-Encoding header.
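To check a response for the combination described above (an identity value alongside another coding such as gzip), a small helper along these lines could be used. This is only a sketch: the function names are hypothetical, and the input is assumed to be a list of raw header lines such as the array returned by PHP's get_headers().

```php
<?php
/**
 * Collect every Content-Encoding value from a list of raw header lines.
 * (Hypothetical helper, not part of MediaWiki.)
 *
 * @param string[] $headerLines Raw lines such as "Content-Encoding: gzip"
 * @return string[] Lower-cased content-coding values, in order of appearance
 */
function contentEncodings( array $headerLines ): array {
	$prefix = 'Content-Encoding:';
	$values = [];
	foreach ( $headerLines as $line ) {
		// Header names are case-insensitive, so match without regard to case.
		if ( stripos( $line, $prefix ) !== 0 ) {
			continue;
		}
		// One header line may carry a comma-separated list of codings.
		foreach ( explode( ',', substr( $line, strlen( $prefix ) ) ) as $coding ) {
			$coding = strtolower( trim( $coding ) );
			if ( $coding !== '' ) {
				$values[] = $coding;
			}
		}
	}
	return $values;
}

/**
 * True when "identity" is sent together with at least one other coding.
 */
function hasConflictingEncodings( array $headerLines ): bool {
	$values = array_unique( contentEncodings( $headerLines ) );
	return in_array( 'identity', $values, true ) && count( $values ) > 1;
}

// Made-up sample input for illustration, not captured from an affected wiki.
$sample = [ 'HTTP/1.1 200 OK', 'Content-Encoding: identity', 'content-encoding: gzip' ];
var_dump( hasConflictingEncodings( $sample ) );
```

Note that a server will usually only send gzip when the request carries a matching Accept-Encoding header, so the headers need to be captured from a request that advertises it.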

So that's definitely @Platonides' commit rMW113c0d2ce14e (Change invalid 'Content-Encoding: none' header) for T258877 (MediaWiki sets invalid Content-Encoding: none); I think this was mentioned above too.

If none is invalid, we definitely don't want to just revert it...

Invalid-but-not-creating-errors > invalid-and-creating-errors.

(Content-Encoding: identity has created errors, too.)

Do you mean Content-Encoding: none?

Indeed, yes. I got confused, sorry ^^

Change 643550 abandoned by Addshore:
[mediawiki/core@REL1_35] Revert things needed to revert for T235554

Reason:

https://gerrit.wikimedia.org/r/643550

The suggested workaround of setting $wgDisableOutputCompression = true; only works for MediaWiki 1.35.0. It doesn't work with MediaWiki 1.35.1, see https://www.mediawiki.org/w/index.php?title=Topic:Vwp6kb2w63zpuvud&topic_showPostId=vzu5i6tkgkcfdv7m#flow-post-vzu5i6tkgkcfdv7m

@Peculiar_Investor I don't think that's an issue of 1.35.0 vs 1.35.1 but just that $wgDisableOutputCompression = true; doesn't work with the invisible caching which your hosting does.

@Platonides. $wgDisableOutputCompression = true; was the workaround solution with 1.35.0. Changes in 1.35.1 rendered that workaround unusable as many have posted on Project:Support desk. Hence my comment to be careful with the suggested workaround.

I remain hopeful that the root cause of these problems will soon be found and resolved, since REL1_35 is the current LTS version of MediaWiki.

Sorry @Peculiar_Investor, you are right in that there was a change in 1.35.1, I was thinking this was included in 1.35.0
The related change in 1.35.0 vs 1.35.1 was that Content-Encoding: none was changed into Content-Encoding: identity (T258877).

This seems to have completely confused some (I'd dare to say, broken) setups which were working when we were outputting the invalid value 'none' (it probably got ignored) but now break with the value "identity", which they understand (and which mismatches the content they receive).

@Platonides if you review T267619, which was closed as a duplicate, I did some rather extensive testing with all MediaWiki versions from 1.31 to current master, which clearly identified that changes in REL1_35 introduced problems in the communication layers/stack. Running MediaWiki 1.34 didn't show these problems on the same server, with the same wiki database and configuration.

It clearly doesn't impact everyone, just some unidentified combination of webserver configuration and implementation. In addition to the other Content-* headers mentioned, based on all the evidence I've gathered and reported, the problem seems to be related to changes in how Content-Length is handled. IIRC I was seeing examples where there was a mismatch between the Content-Length value and what was actually sent.
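A mismatch of that kind could be checked with something like the following sketch. The function names are hypothetical, and it assumes the body is available exactly as transferred (still compressed, if a content coding was applied), since Content-Length counts the bytes on the wire.

```php
<?php
/**
 * Return the declared Content-Length from raw header lines,
 * or null when no such header is present.
 * (Hypothetical helper, not part of MediaWiki.)
 */
function declaredContentLength( array $headerLines ): ?int {
	foreach ( $headerLines as $line ) {
		if ( preg_match( '/^Content-Length:\s*(\d+)\s*$/i', $line, $m ) ) {
			return (int)$m[1];
		}
	}
	return null;
}

/**
 * Difference between the bytes actually received and the declared length.
 * 0 means the two agree; null means there was no Content-Length header.
 */
function contentLengthMismatch( array $headerLines, string $rawBody ): ?int {
	$declared = declaredContentLength( $headerLines );
	if ( $declared === null ) {
		return null;
	}
	// strlen() counts bytes, which is what Content-Length describes.
	return strlen( $rawBody ) - $declared;
}
```

A negative result would mean fewer bytes arrived than were announced, which is the sort of condition a strict client may treat as a protocol error or a lost connection.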

Change 661452 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/core@master] [DNM] Various cleanups to output flushing

https://gerrit.wikimedia.org/r/661452

I wonder what output_buffering value is being used in php.ini here. If I set it to off (not my distro default), I can trigger the MW_SETUP_CALLBACK use of ob_start()/OutputHandler::handle(). I definitely see some mismatch between the logic of OutputHandler::handle and MediaWiki::outputResponsePayload.

@aaron My wiki (1.35.1) is suffering from this issue. It runs on Bluehost shared hosting, so I have some control of settings, but not everything. I checked via phpinfo.php (PHP 7.4.14) and output_buffering shows 'no value' and zlib.output_compression = 'On'. I'm more than willing and able to investigate/change settings or any other debugging that might help aid in resolving this. I previously did a side-by-side test on the same server using 1.34 and it ran flawlessly, so something in 1.35 and the server setup are not 'playing nice' and have significantly degraded responsiveness and the user experience.
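For anyone who wants to report the same settings without reading through a full phpinfo() page, a short script along these lines prints just the values discussed here. It is a sketch: the function name is hypothetical, and the script has to run through the web server (not the CLI) for the values to reflect the wiki's environment.

```php
<?php
/**
 * Gather the PHP settings discussed in this task.
 * (Hypothetical helper, not part of MediaWiki.)
 *
 * @return array Setting name => current value, or null if the setting is unknown
 */
function outputRelatedSettings(): array {
	// The server API in use, e.g. "apache2handler" or "cgi-fcgi".
	$settings = [ 'sapi' => PHP_SAPI ];
	foreach ( [ 'output_buffering', 'zlib.output_compression' ] as $name ) {
		// ini_get() returns false when the setting does not exist,
		// for example when the zlib extension is not loaded.
		$value = ini_get( $name );
		$settings[$name] = $value === false ? null : $value;
	}
	return $settings;
}

foreach ( outputRelatedSettings() as $name => $value ) {
	echo $name, ' = ', var_export( $value, true ), "\n";
}
```

Run standalone, this shows the configured values; inside a MediaWiki request the result can differ, because the code quoted in the description changes zlib.output_compression with ini_set().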

I recently installed MediaWiki 1.35.1 and it did show the mess above in Chrome. However, Microsoft Edge, which is based on Chromium, doesn't show the garbled mess. Commenting out the line in MediaWiki.php doesn't affect the Edge browser (obviously, because it works on Chrome too).

Hope this helps in a way.

This seems to have completely confused some (I'd dare to say, broken) setups which were working when we were outputting the invalid value 'none' (it probably got ignored) but now break with the value "identity", which they understand (and which mismatches the content they receive).

Is there anything specific that might be broken in a setup likely to cause this that I can go read about?

Same issue testing 1.35.0 but only after having config'd short URLs, removed short URL config problem disappeared. Dreamhost is bugging us to upgrade--unhelpful, bc unclear if a good idea. Am an average MW user (not programmer) knowing just enough to install/upgrade, basic config, find/fix simple issues. Chrome, FF and iOS safari affected, DH supposedly checked their config and stated was a MW config problem (wouldn't say what--unhelpful). Figured out via support desk (Bawolff) to set $wgDisableOutputCompression = true; (had already set $wgUseGzip = false; and disabled CF), this worked on 1.35.0 but didn't upgrade our prod site (1.33).

Just tried 1.35.1 (same host, same db, same config, all of which worked fine for earlier versions of MW), no short URLs this time, same result: $wgDisableOutputCompression = true; causes random symbols, with or without $wgUseGzip = false;. Dreamhost says they aren't doing hidden caching on VPS per their docs other than opcache (off, no difference), and they point folks to Cloudflare (off, which makes no difference). With https, gzip is auto-disabled on DH VPS. I have almost the same setup as Peculiar Investor with regard to zlib:

I checked via phpinfo.php (PHP 7.4.1[2]) and output_buffering shows 'no value' and zlib.output_compression = 'On'

Short URLs on or off no difference, Chrome and iOS Safari affected. Weirdly was able to get Chrome to stop displaying console error by toggling on zlib.output_compression (refreshed chrome page) and then toggled back off in php.ini and reloaded again. Now Chrome sort of works on some devices (though performance is badly degraded), no console errors, iOS Safari (14.2) still complaining: "cannot parse response."

If there are suspected broken server config items that could cause this, it would be really helpful to have any idea of what to check/change/test/disable, or what to ask DH to do (they are usually willing to test stuff).
Also, can there be mention of T235554 on Installing MediaWiki and Installation Error?--Apologies for my own ignorance in asking, no idea what's usually done in such cases.

Would MediaWiki folks advise average folks to use MW 1.35.0 on prod sites with $wgDisableOutputCompression = true; ? Would you advise folks stay on 1.31-1.34?

Thanks

My Dreamhost (DH) VPS info:

Debian 9
Apache 2.4.25
PHP 7.4.12 (cgi-fcgi)
MySQL 5.7.28-log
ICU 57.1

Change 674196 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/core@master] Avoid HTTP protocol errors with apache2handler when there are deferred updates

https://gerrit.wikimedia.org/r/674196

Change 674196 merged by jenkins-bot:
[mediawiki/core@master] Avoid HTTP protocol errors when fastcgi_finish_request() is unavailable

https://gerrit.wikimedia.org/r/674196

Change 674673 had a related patch set uploaded (by Reedy; author: Aaron Schulz):
[mediawiki/core@REL1_35] Avoid HTTP protocol errors when fastcgi_finish_request() is unavailable

https://gerrit.wikimedia.org/r/674673

Change 674673 merged by jenkins-bot:
[mediawiki/core@REL1_35] Avoid HTTP protocol errors when fastcgi_finish_request() is unavailable

https://gerrit.wikimedia.org/r/674673

Change 675218 had a related patch set uploaded (by Aaron Schulz; author: Aaron Schulz):
[mediawiki/core@master] Move logDataPageOutputOnly() call to outputResponsePayload()

https://gerrit.wikimedia.org/r/675218

Change 675218 merged by jenkins-bot:
[mediawiki/core@master] Move logDataPageOutputOnly() call to outputResponsePayload()

https://gerrit.wikimedia.org/r/675218

Change 675379 had a related patch set uploaded (by Reedy; author: Aaron Schulz):
[mediawiki/core@REL1_35] Move logDataPageOutputOnly() call to outputResponsePayload()

https://gerrit.wikimedia.org/r/675379

Change 675379 merged by jenkins-bot:
[mediawiki/core@REL1_35] Move logDataPageOutputOnly() call to outputResponsePayload()

https://gerrit.wikimedia.org/r/675379

Closing given the updates to git master and REL1_35

Change 676577 had a related patch set uploaded (by Reedy; author: Reedy):

[mediawiki/core@REL1_35] RELEASE-NOTES-1.35: Remove T235554

https://gerrit.wikimedia.org/r/676577

Change 676577 merged by jenkins-bot:

[mediawiki/core@REL1_35] RELEASE-NOTES-1.35: Remove T235554

https://gerrit.wikimedia.org/r/676577

Other suspects to audit if we are going to try again at having a hard requirement for MediaWiki.php to "see" all output in an explicit return value:

  • TimedMediaHandler iframes.
  • RawAction printing (action=raw).
  • Anything else calling OutputPage::disable.
  • Anything else using print or echo outside CLI-only code.

Realistically this can't and shouldn't happen in REL1_35, so I'd recommend we take a simpler approach that more closely resembles the status quo we know to have worked before.

The headers_sent() checks should handle those, though maybe something is checked in the wrong place.

The only simpler fix is to just ignore DEFER_SET_LENGTH_AND_FLUSH.

Change 676693 had a related patch set uploaded (by Aaron Schulz; author: Aaron Schulz):

[mediawiki/core@master] Disable DEFER_SET_LENGTH_AND_FLUSH headers to avoid HTTP errors

https://gerrit.wikimedia.org/r/676693

Change 676956 had a related patch set uploaded (by Krinkle; author: Aaron Schulz):

[mediawiki/core@REL1_35] Disable DEFER_SET_LENGTH_AND_FLUSH headers to avoid HTTP errors

https://gerrit.wikimedia.org/r/676956

Change 676693 merged by jenkins-bot:

[mediawiki/core@master] Disable DEFER_SET_LENGTH_AND_FLUSH headers to avoid HTTP errors

https://gerrit.wikimedia.org/r/676693

Change 676956 merged by jenkins-bot:

[mediawiki/core@REL1_35] Disable DEFER_SET_LENGTH_AND_FLUSH headers to avoid HTTP errors

https://gerrit.wikimedia.org/r/676956

Change 699182 had a related patch set uploaded (by Reedy; author: Aaron Schulz):

[mediawiki/core@REL1_36] Disable DEFER_SET_LENGTH_AND_FLUSH headers to avoid HTTP errors

https://gerrit.wikimedia.org/r/699182

I've just noticed this didn't land in 1.36...

Change 699182 merged by jenkins-bot:

[mediawiki/core@REL1_36] Disable DEFER_SET_LENGTH_AND_FLUSH headers to avoid HTTP errors

https://gerrit.wikimedia.org/r/699182

Change 661452 abandoned by Aaron Schulz:

[mediawiki/core@master] Various fixes and cleanups to output flushing and post-send logic

Reason:

https://gerrit.wikimedia.org/r/661452