sparse registry: determine UX and URL handling #10964

Open
ehuss opened this issue Aug 10, 2022 · 3 comments
Labels
A-sparse-registry Area: http sparse registries S-needs-mentor Status: Issue or feature is accepted, but needs a team member to commit to helping and reviewing.

Comments

@ehuss
Contributor

ehuss commented Aug 10, 2022

This is an open-ended issue about ultimately determining the UX for how users opt in to sparse registries, and how that interacts with URLs, Package IDs, Cargo.lock files, config files, etc.

The approach for crates.io may differ from the one for third-party registries. However, I think it would be good to consider them together so that there is a coherent story.

  • It is assumed the git index and the sparse index cannot be served from the same URL.
  • The URL is stored in Cargo.lock files and Cargo.toml in .crate files for alt-registries. What is the migration story for alt-registries that want to support sparse registries? What if they want to offer both git and sparse at the same time?
  • Cargo should give users the ability to choose whether to use sparse or git index.
    • Users may be able to opt in to a specific registry protocol using source replacement in the short term, but it would be nice if there was something simpler.
  • Auto-detection is desired, but Cargo will need to know about both URLs. However, auto-detection may be very difficult to implement well.

Some roughly thought-out proposals:

Proposal 1 — Alt-registry config

Add a new registries.*.sparse key which is the URL for a sparse index. Allow both index and sparse to be specified at the same time. Cargo will use the index URL in Cargo.lock and will generally treat the two URLs as interchangeable.

[registries.my-registry]
index = "https://example.com/crates/index.git"
sparse = "https://example.com/crates/sparse/"

If only sparse is listed, then Cargo will only use the sparse index, and the sparse URL will be used in Cargo.lock and PackageIDs and published Cargo.toml files (alt-registries).
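
To make the consequence for lock files concrete, here is a rough sketch of how a Cargo.lock entry might look under this proposal. The package name and version are hypothetical, and the `registry+` and `sparse+` prefixes follow the forms used elsewhere in this thread; treat the exact strings as an assumption, not settled behavior.

```toml
# Both `index` and `sparse` configured: the git index URL is what gets recorded.
[[package]]
name = "some-crate"   # hypothetical package
version = "1.0.0"
source = "registry+https://example.com/crates/index.git"

# Only `sparse` configured: the sparse URL would be recorded instead, e.g.
# source = "sparse+https://example.com/crates/sparse/"
```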

Proposal 2 — Easy toggle

Add a "preferred" setting which indicates the preferred mechanism for speaking to a registry. It might look something like this:

[registries.crates-io]
preferred = "sparse"

Via environment variables, this would look like CARGO_REGISTRIES_CRATES_IO_PREFERRED=sparse.

(Bikeshed name welcome.)

This setting can be used for crates.io or any other registry.

Proposal 3 — Source replacement

This option works today (AFAIK), but is a little clumsy.

[registries.foo]
index = "https://example.com/crates/index.git"

[source.foo]
registry = "https://example.com/crates/index.git"
replace-with = "foo-sparse"

[source.foo-sparse]
registry = "sparse+https://example.com/crates/sparse/"

Proposal 4 — Migration story

An organization using an alt registry may want to transition to using sparse indexes. There are a few options:

  1. They could serve both git and sparse, and have all users update their config as suggested in Proposal 1 above. They could keep both indexes alive (similar to how crates.io will work).
  2. They could temporarily serve both. At some point, they will switch to sparse-only after all users have updated their configs as suggested in Proposal 1. However, they will need to keep the config with the old git URL to handle Cargo.lock and .crate files.
  3. They could do a hard-switch. They could drop their git server, and change everyone's config to only know about sparse (as suggested in Proposal 5), and update all Cargo.lock files to use the new URL, and rebuild all .crate files to the new URL. (This sounds almost too painful to suggest.)

Proposal 5 — New registries

New registries may want to only support sparse indexes. In that case, they should have a simple config:

[registries.foo]
sparse = "https://example.com/crates/index/"

or a different idea would be:

[registries.foo]
index = "sparse+https://example.com/crates/index/"

All URLs in Cargo.lock and other places will use the sparse URL.

@ehuss ehuss added the A-sparse-registry Area: http sparse registries label Aug 10, 2022
@Eh2406
Contributor

Eh2406 commented Aug 15, 2022

So my opinion is that we should not recommend that registries plan to have two indexes with the same content in the long run. crates.io is going to do that, but it's a pretty large maintenance burden and I don't think it's worth it for anyone else. New registries should be built with sparse indexes, and just not support older Cargos.

So the question is what to do with existing registries that want to transition from git indexes to sparse indexes. What should they serve, where, and for how long?

If they're using an existing registry as a source replacement for crates.io, then their source URL does not appear in any files. If they have any private packages not on crates.io, this usage pattern is not recommended by our documentation, but I believe it's quite common. These registries can provide the same files at both a git and a sparse URL for some transition period. Nothing changes depending on whether the user has the git or the sparse URL configured. Once no one is using the git index, it can be shut down. This is as easy a migration as anyone could ask for.
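
As a sketch of that case, a user's config for a crates.io mirror might look like the following. The source name company-mirror and both URLs are hypothetical. Only the registry line differs between a git client and a sparse client, and because the mirror URL lives only in local config, nothing in Cargo.lock or in .crate files refers to it.

```toml
# .cargo/config.toml
[source.crates-io]
replace-with = "company-mirror"

[source.company-mirror]
# Sparse clients point at the HTTP index:
registry = "sparse+https://mirror.example.com/index/"
# Clients still on git would instead use:
# registry = "https://mirror.example.com/index.git"
```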

But what if the existing registry is not intended to be used as a source replacement for crates.io? Then, if custom package A depends on custom package B, the index file and .crate file will have the old URL. The biggest problem is that the first time we hear about registry+https://my-git-registry.company.com may be when looking at the index files for B, so we may not even have a configuration for that URL that we can use to look up the new sparse URL associated with it. (cc @arlosi this is a problem you pointed out in rfcs/3139, do I have the structure of this problem correct? Also can we confirm the index URL ends up in the .crate file?)

If the organization mostly builds recent versions of its software, then it can have both registries and treat them as different. New packages/versions are published to the sparse registry at sparse+https://my-sparse-registry.company.com, and older packages stay at registry+https://my-git-registry.company.com. (The Semver Trick can be used to make the types from dependencies in one index compatible with the types from the other.) When all recent builds only use dependencies from the sparse registry, the git registry can be shut down. An unpleasant migration, but not a big deal. If it's too costly, then stay on git indexes.
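
A rough sketch of what this split could look like on the user side. The registry names company-git and company-sparse and the crate names are hypothetical, and the `sparse+` prefix in `index` follows the second form in Proposal 5 above.

```toml
# --- .cargo/config.toml ---
[registries.company-git]
index = "https://my-git-registry.company.com"

[registries.company-sparse]
index = "sparse+https://my-sparse-registry.company.com/"

# --- Cargo.toml of a crate that still needs one older dependency ---
[dependencies]
old-lib = { version = "1.2", registry = "company-git" }      # only on the git index
new-lib = { version = "2.0", registry = "company-sparse" }   # published after the switch
```

Once no recent build needs anything from company-git, that entry and the git server can go away.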

If the organization needs to regularly build historical versions of its software, but really wants to transition to sparse indexes... I don't have a real solution, but I'm also doubtful there are that many users in this category.

@arlosi
Contributor

arlosi commented Aug 16, 2022

The biggest problem is that the first time we hear about registry+https://my-git-registry.company.com may be when looking at the index files for B

Yes, that is the problem I was describing. Cargo considers the registry URL to be part of the package identity and includes that information in the .crate. The user might not have any local configuration for the registry where the dependency comes from.

Here's an example of what's included in the .crate file:

Cargo.toml

[dependencies.internal-crate-name]
version = "0.3"
registry-index = "sparse+https://index.example.com"

Cargo.lock

[[package]]
name = "internal-crate-name"
version = "0.3.0"
source = "sparse+https://index.example.com"
checksum = "0f2106293c2889292ded40b7014f292b5ad25dc2b04048690c3db5e03a990333"

@ehuss
Contributor Author

ehuss commented Aug 31, 2022

To follow up on this, the current proposal is available here: https://hackmd.io/@rust-cargo-team/B13O52Zko

We plan to add the registries.crates-io.protocol config setting as a mechanism to switch between git and http for crates.io only.
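
For reference, a minimal sketch of that setting in a config file. The value sparse is the spelling used in Proposal 2 above, so treat it as an assumption; this comment describes the choice as git versus http, and the final value names may differ.

```toml
# .cargo/config.toml
[registries.crates-io]
protocol = "sparse"   # or "git" to keep using the git index
```

Following the environment variable pattern shown in Proposal 2, the equivalent would presumably be CARGO_REGISTRIES_CRATES_IO_PROTOCOL=sparse.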

In the future (some unspecified time), we may proceed with the second part of the proposal to add a canonical setting to config.json to provide a mechanism for alt-registries to migrate to http.

The "New registry" and "Without compatibility" options will be available to alt-registries from the start, but obviously those options are not convenient.

4 participants