Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

file: allow open/2 to work on directories #2212

Merged
merged 1 commit into from
May 23, 2019

Conversation

andrenth
Copy link
Contributor

This is useful mainly to ensure that a new file has been persisted to disk by calling file:sync/1 or file:datasync/1 on file's parent directory.

Note: I haven't been able to test on Windows, though I believe the changes are correct according to the docs for CreateFileW.

@CLAassistant
Copy link

CLAassistant commented Apr 18, 2019

CLA assistant check
All committers have signed the CLA.

@jhogberg jhogberg self-assigned this Apr 23, 2019
@jhogberg jhogberg added the team:VM Assigned to OTP team VM label Apr 23, 2019
@jhogberg
Copy link
Contributor

Thanks for the PR! I think it looks great aside from making it possible to open directories without the directory option, which can't be done on Windows.

@andrenth
Copy link
Contributor Author

Hi

In this case doesn’t the Windows system call return an error, like on Unix when you use the directory flag to open a regular file, which returns enotdir?

Or do you mean I should check and return einval instead of relying on the syscall error?

@jhogberg
Copy link
Contributor

Sorry about the confusing wording. The problem is that this PR makes it possible to open a directory without specifying the directory option, which works fine on Linux but fails on Windows as the system call won't allow it.

I'd like the behavior to be consistent across platforms, and I think it would be best to disallow opening directories without the directory option as it's the current behavior, but I'm open to other ideas as long as they're consistent.

@llelf
Copy link
Contributor

llelf commented Apr 23, 2019

to ensure that a new file has been persisted to disk by calling file:sync/1 or file:datasync/1 on file's parent directory.

This behaviour is not specified anywhere, is it?

@andrenth
Copy link
Contributor Author

@jhogberg, it makes sense to be consistent; I should be able to work on it tomorrow. Do you have a preference regarding adding a new commit to the PR or rebasing to keep a single commit?

@llelf, I’m not sure if it’s standardized, but it’s documented on a number of articles, e.g. https://lwn.net/Articles/457667/ and http://blog.httrack.com/blog/2013/11/15/everything-you-always-wanted-to-know-about-fsync/

Also, a descriptor-based API (e.g. fstat vs stay) is also useful to prevent some rare but possible race conditions on file access. I intend to work on allowing some functions in the file module to be able to take io_device()s if this PR is merged.

@jhogberg
Copy link
Contributor

@jhogberg, it makes sense to be consistent; I should be able to work on it tomorrow. Do you have a preference regarding adding a new commit to the PR or rebasing to keep a single commit?

We'd prefer rebasing in this case as it's just a fixup of a commit in the same PR, but feel free to keep them separate until it's time to merge it.

Also, a descriptor-based API (e.g. fstat vs stay) is also useful to prevent some rare but possible race conditions on file access. I intend to work on allowing some functions in the file module to be able to take io_device()s if this PR is merged.

I believe it will be difficult to make this both consistent and portable. We've gone through great pains to make sure all file functions behave the same way on all platforms, often down to simulating the same error codes, and I can't see a nice way to make (for example) renameat work the same way on Linux and Windows. Even if we stick to relatively easy functions like fstat, their results could diverge from their path-based variants because the Windows port may have some quirks-compatibility hacks that require the path itself.

Working around these issues isn't impossible, but we've been toying with the idea of adding a new module (fs) that doesn't bend over backwards to provide the exact same behavior on all platforms, and I think this is a good opportunity to get started on that.

@andrenth
Copy link
Contributor Author

andrenth commented Apr 24, 2019

I added a new commit to make it easier for you to review what's changed and then I'll rebase if you think it's ready.

Regarding the fd-based APIs I was thinking of just allowing some already existing functions to also work over I/O devices, such as fstat, and indeed I saw the compatibility workarounds that you mentioned, particularly the get_drive_number() function (I think a version that works on handles could be written with GetFinalPathNameByHandleW, if I understood the docs correctly), and is_executable_file() which just checks the file extension (i.e. .exe, .bat, etc) -- I don't know if there's a way around for that one.

At first I tried to implement these patches as separate module but I'd like to write NIFs that could operate on the file_descriptor record that the file module returns, and I'm not sure it's possible to share NIF resources across different modules like that. I ended up having to copy a lot of code from the file module NIFs, so I decided to give up this idea and go directly for the PR instead.

My plan was indeed to have the *at() functions available from Erlang at some point, though I'm not sure if Windows even has equivalent system calls, so your idea of having a separate module is interesting. Is there a policy for platform-dependent modules in OTP, as in, are those allowed?

@jhogberg
Copy link
Contributor

I added a new commit to make it easier for you to review what's changed and then I'll rebase if you think it's ready.

Great, I think it looks good.

It just occurred to me that not all platforms have O_DIRECTORY. Can you add a fallback that fails when trying to open a file with the directory option?

(It can also be hidden behind various feature-test macros but those are often platform dependent. I can look into that later on.)

Regarding the fd-based APIs I was thinking of just allowing some already existing functions to also work over I/O devices, such as fstat, and indeed I saw the compatibility workarounds that you mentioned, particularly the get_drive_number() function (I think a version that works on handles could be written with GetFinalPathNameByHandleW, if I understood the docs correctly), and is_executable_file() which just checks the file extension (i.e. .exe, .bat, etc) -- I don't know if there's a way around for that one.

Fudging it with GetFinalPathNameByHandleW would work fine, but it could give different results from the path-based variant since it follows links.

I don't think anyone cares in this specific case, but we're bound to collide with other annoying corner cases in other functions.

At first I tried to implement these patches as separate module but I'd like to write NIFs that could operate on the file_descriptor record that the file module returns, and I'm not sure it's possible to share NIF resources across different modules like that. I ended up having to copy a lot of code from the file module NIFs, so I decided to give up this idea and go directly for the PR instead.

The idea is to use the same NIF but expose different interfaces to it. There will be some duplication of code but most of it can be shared.

My plan was indeed to have the *at() functions available from Erlang at some point, though I'm not sure if Windows even has equivalent system calls, so your idea of having a separate module is interesting. Is there a policy for platform-dependent modules in OTP, as in, are those allowed?

We don't do platform-specific modules, what I have in mind is to {error, enotsup} if there's no equivalent and document the behavior on a per-platform basis when there is. Users that want Unix-ish behavior everywhere should use the file module, this is intended to be a supplement and not a replacement.

The current idea is that the module should only support raw files without fancy options like compressed, it should bypass the file server for general operations like renaming, and things like file permissions and metadata should be handled with two open-ended get/set functions to avoid the current mess.

We haven't done much beyond toying with the idea however, and I'm not opposed to adding descriptor-based functions to the file module, but I think it's worth looking further into this before adding more stuff to an already cluttered interface.

@andrenth
Copy link
Contributor Author

John,

It just occurred to me that not all platforms have O_DIRECTORY. Can you add a fallback that fails when trying to open a file with the directory option?

(It can also be hidden behind various feature-test macros but those are often platform dependent. I can look into that later on.)

I've added a commit to do that. I used #ifndef O_DIRECTORY directly because there was already a similar #ifndef O_SYNC in the code.

There's a bit of repetition in side the if (fd != -1) block with the EFILE_MODE_* checks but I think this is better than multiple nested ifs. I also repeated the open_file_is_dir() call instead of saving its result in a variable because I think of the EFILE_MODE_SKIP_TYPE_CHECK as saying "I know what I'm doing, let me save a syscall", but I can refactor if you prefer.

@jhogberg
Copy link
Contributor

I don't mind the repetition, it looks good to me aside from needing to close the file before returning the error.

@andrenth
Copy link
Contributor Author

I added the close calls (also on windows) and took the liberty to rebase.

@jhogberg
Copy link
Contributor

Thanks, it looks good to me!

We're a bit too close to include it in the OTP 22 release, so I'll add it too our nightly tests soon after that.

@andrenth
Copy link
Contributor Author

Awesome, thanks!

@andrenth
Copy link
Contributor Author

andrenth commented May 9, 2019

@jhogberg It took me a few weeks, but I finally managed to get a windows VM working for development, and I found a bug in the handling of the file open modes. The directory mode flag would never be handled in the original code, which I fixed by reordering the if/else if sequence.

@jhogberg jhogberg added the testing currently being tested, tag is used by OTP internal CI label May 14, 2019
@jhogberg
Copy link
Contributor

OTP 22's out so I've added this to our nightly tests. While looking through the PR again I noticed I'd missed an unrelated change in the compress_errors testcase, and that the mode() type in the file module needs to include the directory option, but other than that I think it looks ready to merge once the tests have run. :)

This is useful mainly to ensure that a new file has been persisted
to disk by calling file:sync/1 or file:datasync/1 on file's parent
directory.
@andrenth
Copy link
Contributor Author

Done. I added directory inline in the Modes spec in open/2 and path_open/3, becase mode() is used in other functions where directory wouldn't make sense, like copy/2.

@rickard-green rickard-green added testing currently being tested, tag is used by OTP internal CI and removed testing currently being tested, tag is used by OTP internal CI labels May 20, 2019
@jhogberg jhogberg merged commit 59a813c into erlang:master May 23, 2019
@jhogberg
Copy link
Contributor

Merged, thanks again for the PR!

@andrenth
Copy link
Contributor Author

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
team:VM Assigned to OTP team VM testing currently being tested, tag is used by OTP internal CI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants