Provide external file (mmaped) representation #108

permeakra · 2020-12-08T03:33:00Z

I have two use cases in mind. The first is (limited) persistence, allowing access to raw data. The second is working with datasets that exceed memory size by several orders of magnitude. In both cases it might be desirable to allow several arrays in a single file. Interaction with madvise could probably be of use as well.

lehins · 2020-12-08T03:46:07Z

You can already do it with a bit of wrapper code. There is an mmap package that lets you get ahold of a ForeignPtr to an mmaped file with something like mmapFileForeignPtr in: https://hackage.haskell.org/package/mmap-0.5.9/docs/System-IO-MMap.html I haven't tried this package myself, but if it worked 7 years ago, I don't see why it shouldn't work now ;)

There is no need for a special representation: once you have a ForeignPtr in hand, you can wrap it into an S massiv array with something like unsafeMArrayFromForeignPtr0
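A minimal sketch of the wrapper described above, assuming the mmap package's mmapFileForeignPtr returns a (pointer, offset, size in bytes) triple and that massiv's Data.Massiv.Array.Unsafe exports unsafeArrayFromForeignPtr0 (the immutable sibling of unsafeMArrayFromForeignPtr0) taking a computation strategy, a ForeignPtr and a flat size. The helper name mmapArray is hypothetical, and the code has not been checked against every massiv version:

```haskell
{-# LANGUAGE ScopedTypeVariables #-}
module Main where

import qualified Data.Massiv.Array as A
import Data.Massiv.Array.Unsafe (unsafeArrayFromForeignPtr0)
import Foreign.ForeignPtr (plusForeignPtr)
import Foreign.Storable (Storable, sizeOf)
import System.Environment (getArgs)
import System.IO.MMap (Mode (ReadOnly), mmapFileForeignPtr)

-- | Hypothetical helper (not part of massiv or mmap): map a whole file
-- read-only and view its bytes as a flat Storable array of elements 'e'.
-- Trailing bytes that do not fill a whole element are ignored.
mmapArray :: forall e. Storable e => FilePath -> IO (A.Array A.S A.Ix1 e)
mmapArray path = do
  -- Assumption: the result is (pointer, offset to the data in bytes, size in bytes).
  (fptr, offset, byteSize) <- mmapFileForeignPtr path ReadOnly Nothing
  let n = byteSize `div` sizeOf (undefined :: e)
  -- plusForeignPtr advances by bytes and keeps the finalizer that unmaps the file.
  pure $ unsafeArrayFromForeignPtr0 A.Seq (fptr `plusForeignPtr` offset) (A.Sz n)

main :: IO ()
main = do
  [path] <- getArgs
  arr <- mmapArray path :: IO (A.Array A.S A.Ix1 Double)
  -- Pages are faulted in lazily by the OS as the fold touches them.
  print (A.size arr, A.sum arr)
```

The mapping is read-only here, so the immutable array is only as stable as the underlying file: if another process modifies the file, the array contents change underneath it, which is why the constructor lives in the Unsafe module.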

There is no point in providing this sort of functionality in massiv directly, because mmap is very much OS specific and I wanna stay OS agnostic as much as possible. However, I might consider a helper package that does just this: massiv-mmap or something.

Let me know how it goes if you do figure it out, or hit me up on gitter if you get stuck: https://gitter.im/haskell-massiv/Lobby

I'll keep this ticket open in case I find time to experiment with it and create such a package in the future.

permeakra · 2020-12-08T06:14:37Z

Thanks for the reply.

> There is no point in providing this sort of functionality in massiv directly because mmap is very much OS specific and I wanna stay OS agnostic as much as possible. However I might consider a helper package that does just this massiv-mmap or something.

If you expect massiv to be used in numerics code (personally, I consider it my best bet for the project I'm currently planning), you should keep in mind that such code often has to deal with data sets exceeding available RAM by orders of magnitude. If we construct a massiv array representing such data the way you described, its access patterns would have a different cost model from those of purely in-memory arrays. In addition, specialized prefetch calls are available for mmaped files. Given that, it makes sense to have specialized algorithms for an mmaped representation.

lehins · 2020-12-08T12:24:18Z

> If you expect massiv to be used in numerics code, you should keep in mind that such code often has to deal with data sets exceeding available RAM by orders of magnitude.

@permeakra I didn't say this functionality isn't useful. I said that it should not be implemented in the massiv package. It should instead be done in a separate package that integrates with the massiv interface. The difference is subtle, but very important. I am all for making massiv able to handle huge data. It is, however, not yet on my priority list.

> If we construct a massiv array representing such data the way you described, its access patterns would have a different cost model from those of purely in-memory arrays. In addition, specialized prefetch calls are available for mmaped files. Given that, it makes sense to have specialized algorithms for an mmaped representation.

It makes sense to have a new representation to account for different usage patterns, I certainly agree with that. But one way or another it will have to be a representation that is a wrapper around ForeignPtr, so the approach I described is a required step in implementing this.

This is how I would implement this representation:

    newtype instance Array MM ix e = MMapArray (Array S ix e)
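To make the declaration above self-contained, here is a small sketch that adds the pieces it implies. The marker type MM and the helper toStorable are assumptions for illustration; a usable representation would also need instances of massiv's array classes (whose exact names and superclasses differ between massiv versions), and those are deliberately left out:

```haskell
{-# LANGUAGE TypeFamilies #-}
module MMapRepr where

import Data.Massiv.Array (Array, S)

-- | Marker type selecting the memory-mapped representation (hypothetical).
data MM = MM

-- | A memory-mapped array is a plain Storable array underneath,
-- distinguished only at the type level.
newtype instance Array MM ix e = MMapArray (Array S ix e)

-- | Forget the marker and reuse every existing S-array algorithm
-- (hypothetical helper).
toStorable :: Array MM ix e -> Array S ix e
toStorable (MMapArray arr) = arr
```

Because the wrapper is a newtype, it costs nothing at runtime. Its purpose is to let a separate package give MM its own instances, for example loading strategies tuned to page-sized sequential access, while toStorable remains available as a fallback.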
