
Define technical requirements for new file types (on Wikimedia Commons)
Closed, ResolvedPublic

Description

Every once in a while new file types get proposed on Commons (see for example https://commons.wikimedia.org/wiki/Commons:Village_pump#Stereoscopic_image_formats_support_.28.JPS_and_.MPO.29). We should document the technical requirements for a new file type. Think, for example, about the ability to render thumbnails and about security requirements. Having these written down makes it easier to evaluate new requests.

(if we already happen to have this somewhere, please paste the link and close this ticket)

Event Timeline

Multichill raised the priority of this task from to Needs Triage.
Multichill updated the task description.
Multichill subscribed.

If the file is a normal image type (not a video, audio file, or 3D model), and it's supported by ImageMagick, then it's normally pretty easy to add support.

Otherwise they more or less have to be evaluated on a case by case basis, but here are some general guidelines:
*There exists some program to convert the file to either a JPG or PNG file (if the file type is something you wouldn't want to support by converting to a still JPG or PNG [e.g. 3D model, video, rich interactive media], then things get significantly more complicated, and should be evaluated on a case by case basis)
*There is a method of extracting the height and width of the file.
*Politically, there need to be open source programs that can read and write the file. It should be possible to read/write the file without violating anyone's software patents. Preferably there should be an open specification of the format (unclear how important an open spec is. I imagine our community is probably OK with something if it was thoroughly reverse engineered)

Security wise:
*Format should probably not allow arbitrary code execution (e.g. macros, executable files, files that could contain executables [e.g. zip archives])
*Format should not look like something that a browser would interpret in a negative way (uploading arbitrary HTML files is a no)
*The converter program should be "safe" (what that means probably depends on circumstance, but as a start I would say: doesn't execute arbitrary code, doesn't access arbitrary network resources. Ideally it has reasonable memory and CPU usage, but we also have things to kill converter programs that take too many resources)
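As an illustration of "killing converter programs that take too many resources", here is a minimal Python sketch that runs an external converter under CPU, memory and wall-clock limits. It is not how MediaWiki does this; the function name `run_converter` and the default limits are assumptions made up for the example, and it relies on the POSIX-only `resource` module.

```python
import resource
import subprocess
import sys


def run_converter(argv, cpu_seconds=10, memory_bytes=512 * 1024 * 1024,
                  wall_seconds=30):
    """Run an external converter with hard resource limits (POSIX only).

    The limits are illustrative defaults, not values used by any real
    deployment.
    """
    def apply_limits():
        # Runs in the child process just before exec().
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
        resource.setrlimit(resource.RLIMIT_AS, (memory_bytes, memory_bytes))

    # timeout= raises subprocess.TimeoutExpired and kills the child if it
    # exceeds the wall-clock budget.
    return subprocess.run(argv, preexec_fn=apply_limits,
                          timeout=wall_seconds, capture_output=True,
                          check=False)


if __name__ == "__main__":
    # Stand-in for a real converter: a child Python process.
    result = run_converter([sys.executable, "-c", "print('ok')"])
    print(result.returncode, result.stdout)
```

A converter that exceeds the CPU limit is terminated by the kernel, and one that tries to allocate past the address-space limit fails its allocation, so in both cases the caller sees a non-zero return code.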

That's off the top of my head. @csteipp might have further thoughts on security things

Security wise:
*Format should probably not allow arbitrary code execution (e.g. macros, executable files, files that could contain executables [e.g. zip archives])

This is a good general rule.

*Format should not look like something that a browser would interpret in a negative way (uploading arbitrary HTML files is a no)

This is probably the most important one. It shouldn't be possible for the file to be content-sniffed as HTML, nor should it be possible for the file to be interpreted as Flash/Java/Silverlight, which have their origin defined as the domain they are downloaded from.
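To make the content-sniffing concern concrete, here is a rough Python sketch of an upload-time check that rejects files whose first bytes contain HTML-looking markup. The marker list, the 1024-byte window and the name `looks_like_html` are assumptions for illustration only; they do not reproduce the sniffing rules of any particular browser or of MediaWiki.

```python
# Illustrative markers only; real sniffing rules are more involved.
HTML_MARKERS = (
    b"<!doctype html", b"<html", b"<head", b"<body",
    b"<script", b"<iframe", b"<a href", b"<img",
)


def looks_like_html(data: bytes, window: int = 1024) -> bool:
    """Return True if the start of the file contains HTML-like markup."""
    head = data[:window].lower()
    return any(marker in head for marker in HTML_MARKERS)


if __name__ == "__main__":
    # A file with a valid GIF signature can still carry markup right after it.
    print(looks_like_html(b"GIF89a<SCRIPT>alert(1)</SCRIPT>"))
    print(looks_like_html(b"\x89PNG\r\n\x1a\n" + b"\x00" * 32))
```

The point of the second example is that a valid magic number is not enough on its own: the check has to look at what a browser might see, not only at what the format claims to be.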

*The converter program should be "safe" (what that means probably depends on circumstance, but as a start I would say: doesn't execute arbitrary code, doesn't access arbitrary network resources. Ideally it has reasonable memory and CPU usage, but we also have things to kill converter programs that take too many resources)

I'd consider safe something that:

  • Doesn't pull in network or local resources (or allows us to disable the parsing of those, like we do for XML-based formats)
  • Has a FOSS, maintained conversion utility, preferably already packaged in Debian
  • Is supported by a security team, or has undergone some amount of security testing, and gets security patches.
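For XML-based formats, "not pulling in network or local resources" usually means refusing entity declarations and external references before the file reaches a converter. The sketch below shows one way to do that with Python's standard `xml.parsers.expat`; the names `ForbiddenXml` and `reject_entities` are hypothetical, and this is deliberately stricter than necessary (it rejects every entity declaration, internal ones included).

```python
from xml.parsers import expat


class ForbiddenXml(ValueError):
    """Raised when the document declares or references entities."""


def reject_entities(data: bytes) -> None:
    """Parse the XML and raise ForbiddenXml on any entity declaration
    or external entity reference; return None if the document is clean."""
    parser = expat.ParserCreate()

    def on_entity_decl(name, is_parameter, value, base, system_id,
                       public_id, notation_name):
        raise ForbiddenXml(f"entity declaration not allowed: {name}")

    def on_external_ref(context, base, system_id, public_id):
        raise ForbiddenXml(f"external reference not allowed: {system_id}")

    parser.EntityDeclHandler = on_entity_decl
    parser.ExternalEntityRefHandler = on_external_ref
    parser.Parse(data, True)


if __name__ == "__main__":
    reject_entities(b'<svg xmlns="http://www.w3.org/2000/svg"/>')
    try:
        reject_entities(
            b'<!DOCTYPE svg [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>'
            b"<svg>&xxe;</svg>"
        )
    except ForbiddenXml as err:
        print("rejected:", err)
```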

That looks like a decent starting list. I'll add any more I think of.

I had started drafting c:Help:Support of new file formats for this (based on, indeed, a post by @Bawolff ;-)

See also T116544 / T19012 -- it's important to be able to include source files. Currently we have tons of things like maps, charts, diagrams, and animated diagrams uploaded as derivative files (PNGs, JPGs, animated GIFs) with no way to reproduce or modify them.

Dear colleagues! We want to upload a collection of astronomical observations consisting of 10 000 sky shots (photos). We need the FITS format: https://en.wikipedia.org/wiki/FITS. How can we do this for Wikimedia Commons?

@Niklitov we currently don't support this file format.

What you need for support is roughly:

  1. Determine if it is likely to be allowed at WMF
    1. The format should be free from patents
    2. The format should not allow execution of code when downloaded and opened on clients' computers (if it does, you will need to scan/sanitize the uploads from within MediaWiki)
    3. Determine the image area size of the files. Anything over 12MP requires special support to keep thumbnailing of very large images performant.
  2. Add support for the MIME type to MediaWiki. See:
    1. includes/libs/mime/mime.info (MIME to type matching)
    2. includes/libs/mime/mime.types (MIME to file extension matching)
    3. includes/libs/mime/MimeAnalyzer.php (magic number detection)
  3. Create an ImageHandler
    1. Read the metadata of the file (minimally: width, height)
    2. Implement thumbnailing support (relatively easy since it seems to be supported by ImageMagick)
    3. See other image handlers in the /includes/media subdirectory
  4. Register the ImageHandler in includes/media/MediaHandlerFactory.php
  5. Build a Thumbor module to support FITS (because that is what is actually used by WMF production). @Gilles might be able to help.
  6. Add the file extension to the list of allowed file extensions for upload (a WMF configuration change, possibly requiring community consensus)
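To give a feel for the magic-number detection and metadata-reading steps above, here is a small Python sketch for FITS. A real handler would be written in PHP inside MediaWiki; this only illustrates the logic. It assumes the standard FITS layout (2880-byte header blocks of 80-character cards, a leading `SIMPLE` card, `NAXIS1`/`NAXIS2` giving the two image axes) and treats `NAXIS1` as the width. The function names and the exact 12,000,000-pixel cutoff for "12MP" are assumptions for the example.

```python
import io

CARD_SIZE = 80            # one header "card" is 80 ASCII characters
BLOCK_SIZE = 2880         # headers are padded to 2880-byte blocks
LARGE_IMAGE_PIXELS = 12_000_000  # assumed reading of the "12MP" threshold


def is_fits(head: bytes) -> bool:
    """Magic-number check: a FITS file starts with the SIMPLE card."""
    return head[:10] == b"SIMPLE  = "


def read_fits_dimensions(stream):
    """Return (width, height) from the primary header of a FITS stream."""
    header = {}
    first_block = True
    while True:
        block = stream.read(BLOCK_SIZE)
        if len(block) < BLOCK_SIZE:
            raise ValueError("truncated FITS header")
        if first_block and not is_fits(block):
            raise ValueError("missing SIMPLE card; not a FITS file")
        first_block = False
        for offset in range(0, BLOCK_SIZE, CARD_SIZE):
            card = block[offset:offset + CARD_SIZE].decode("ascii", "replace")
            keyword = card[:8].rstrip()
            if keyword == "END":
                if int(header.get("NAXIS", "0")) < 2:
                    raise ValueError("no 2-D image in the primary header")
                return int(header["NAXIS1"]), int(header["NAXIS2"])
            if card[8:10] == "= ":
                # Drop any trailing "/ comment" part of the card.
                header[keyword] = card[10:].split("/", 1)[0].strip()


def needs_large_image_support(width: int, height: int) -> bool:
    """True if the image area exceeds the assumed 12MP threshold."""
    return width * height > LARGE_IMAGE_PIXELS


if __name__ == "__main__":
    def card(key, value):
        return f"{key:<8}= {value:>20}".ljust(CARD_SIZE)

    cards = [card("SIMPLE", "T"), card("BITPIX", "16"), card("NAXIS", "2"),
             card("NAXIS1", "640"), card("NAXIS2", "480"),
             "END".ljust(CARD_SIZE)]
    data = "".join(cards).ljust(BLOCK_SIZE).encode("ascii")
    width, height = read_fits_dimensions(io.BytesIO(data))
    print(width, height, needs_large_image_support(width, height))
```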

FITS spec info: https://www.loc.gov/preservation/digital/formats/fdd/fdd000317.shtml and http://www.cv.nrao.edu/fits/

I also note that it might be unwise to allow uploading FITS files that don't use image compression. I mean, it's nice that it is an archive format, but uncompressed images are a huge waste of disk space that few people will appreciate.

Lastly, you can always choose to convert everything to PNG and just upload the files as PNGs (probably easier to implement).
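A rough idea of what such a conversion involves, as a Python sketch using only the standard library: it takes pixel values that have already been read from the source file (in practice a FITS library would do that part) and writes an 8-bit grayscale PNG. The linear min/max scaling and the function name `grayscale_png` are assumptions for the example; astronomical data often needs a more careful stretch, and scaling down to 8 bits loses information.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _chunk(kind: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, CRC over type+data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def grayscale_png(rows) -> bytes:
    """Encode a 2-D list of numbers as an 8-bit grayscale PNG.

    Values are scaled linearly so the minimum maps to 0 and the
    maximum to 255.
    """
    height = len(rows)
    width = len(rows[0])
    lo = min(min(row) for row in rows)
    hi = max(max(row) for row in rows)
    span = (hi - lo) or 1  # avoid dividing by zero on flat images

    raw = bytearray()
    for row in rows:
        raw.append(0)  # filter type 0 (none) for each scanline
        raw.extend(int((value - lo) * 255 / span) for value in row)

    # IHDR: width, height, bit depth 8, color type 0 (grayscale),
    # compression 0, filter 0, no interlace.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    return (PNG_SIGNATURE + _chunk(b"IHDR", ihdr)
            + _chunk(b"IDAT", zlib.compress(bytes(raw)))
            + _chunk(b"IEND", b""))


if __name__ == "__main__":
    png = grayscale_png([[0, 1000], [2000, 4000]])
    with open("example.png", "wb") as handle:
        handle.write(png)
```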

@TheDJ Thank you very much for your answer! I informed the Zvenigorod Observatory engineers (I asked them to join our team). They can help us make this work together and answer all questions. They are also ready to release the code for reading these .fits images to the public (GNU) for Wikimedia Commons. If this format becomes available, we will get unique photos of telescopes.

Jdforrester-WMF assigned this task to TheDJ.
Jdforrester-WMF subscribed.

https://www.mediawiki.org/wiki/Manual:Adding_support_for_new_filetypes is a good start. Let's call this Resolved, though of course documentation of living systems is forever a work in progress.