Middle-ground vendoring option using a local registry (#46)
Summary:
The problem is a Goldilocks one.

1. `vendor = false` is quick & easy to manage deps & buckify, but pretty bad day to day.
    - Doesn't work offline as buck2 issues HEAD requests constantly
    - Terrible DX annoyance with "too many open files" errors due to buck trying to download 1000 crates at once. The standard start to your day looks like "run buck2 build about 15 times until by random chance the scheduler manages to get past those errors".
    - Those crates get downloaded again, and again, and again
    - `reindeer buckify` takes 2 seconds or so. Pretty convenient.

2. `[vendor] ...` is slow to manage deps & buckify.
    - Neat for small projects
    - Also probably neat for Meta with y'all's funky EdenFS etc.
    - But middle ground is bad
        - Middle ground here means a `vendor` directory of 1000 crates, 1.2 GB, 50k source files, mostly dupes of the windows crates, which can't be pinned to one single version.
        - `reindeer vendor` takes 35 seconds
        - `reindeer buckify` takes 20 seconds
        - `git status` takes 150ms
        - The vendor folder wrecks git performance simply by existing.
    - Build experience is perfect, works offline, etc.

I think we need a solution for the middle ground:

- `vendor = "local-registry"`, using https://github.com/dhovart/cargo-local-registry
- `reindeer vendor` ultimately just writes a bunch of .crate files (gzipped source tarballs) into vendor/
- The .crate files are stored in git, or via git-lfs if you like. Suddenly `windows-0.48.0.crate` is just a blob, and your diffs are much simpler when you modify deps.
- Some buck2 rule to extract them. (There is no prelude rule that can do this with `strip_prefix` and `sub_targets` support, but prelude's `extract_archive` could probably have that added.) A config sketch follows this list.
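
To make that concrete, here is a minimal reindeer.toml sketch. The `vendor` value is the one proposed above; the `[buck] extract_archive` key is the rule-name knob this change adds (defaulting to `extract_archive`):

```toml
# Minimal sketch, not a drop-in config.
vendor = "local-registry"    # rather than `vendor = false` or a [vendor] table

[buck]
# Name of the Buck rule used to unpack each vendored .crate file.
# The prelude's extract_archive would still need strip_prefix / sub_targets support.
extract_archive = "extract_archive"
```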

Outcomes:
- Offline works (~~although it doesn't handle cargo `package = { git = "..." }` deps yet~~).
- `reindeer vendor` and `reindeer buckify` both take 2 seconds
- `git status` takes 20ms
- Buck builds are a compromise, but a pretty great one. Buck still has to extract the tarballs when you want to build things, but at least it won't ever actually extract `windows-0.48.0.crate` on Linux, and you only pay for what you build.
- The DX annoyance factor during builds is back to zero. No more "too many open files" errors.
- DX annoyance when updating deps is acceptable.

Problems:
- Relies on https://github.com/dhovart/cargo-local-registry being installed. Note, however, that it is a single-file binary. I think if you rewrote it without the dependency on the `cargo` crate it would be maybe a 2-file crate, and we could use it as a library.
- I think storing the local registry's `index` folder (nested `ab/ ac/ ah/ ...` folders) might be a little annoying if you're making concurrent edits on different branches. But you can always regenerate it. A rough sketch of the on-disk layout follows.
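
For illustration, the vendored directory under this scheme ends up looking roughly like the sketch below. Crate names and versions other than `windows-0.48.0` are hypothetical, and the two-level `wi/nd/` nesting is the crates.io index prefix scheme that cargo-local-registry reuses:

```text
vendor/
├── index/
│   ├── se/rd/serde           # newline-delimited JSON, one line per synced version
│   └── wi/nd/windows
├── serde-1.0.0.crate         # gzipped source tarballs, one per locked version
└── windows-0.48.0.crate
```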

Pull Request resolved: #46

Reviewed By: zertosh

Differential Revision: D67925711

Pulled By: dtolnay

fbshipit-source-id: 2203d939cf4381c722abf6ea88e3acaf86a30c24
cormacrelf authored and facebook-github-bot committed Jan 8, 2025
1 parent 018b70d commit 8f90c1f
Showing 8 changed files with 260 additions and 75 deletions.
38 changes: 38 additions & 0 deletions src/buck.rs
@@ -448,6 +448,38 @@ impl Serialize for HttpArchive
}
}

#[derive(Debug)]
pub struct ExtractArchive {
pub name: Name,
pub src: BuckPath,
pub strip_prefix: String,
pub sub_targets: BTreeSet<BuckPath>,
pub visibility: Visibility,
pub sort_key: Name,
}

impl Serialize for ExtractArchive {
fn serialize<S: Serializer>(&self, ser: S) -> Result<S::Ok, S::Error> {
let Self {
name,
src,
strip_prefix,
sub_targets,
visibility,
sort_key: _,
} = self;
let mut map = ser.serialize_map(None)?;
map.serialize_entry("name", name)?;
map.serialize_entry("src", src)?;
map.serialize_entry("strip_prefix", strip_prefix)?;
if !sub_targets.is_empty() {
map.serialize_entry("sub_targets", sub_targets)?;
}
map.serialize_entry("visibility", visibility)?;
map.end()
}
}

#[derive(Debug)]
pub struct GitFetch {
pub name: Name,
@@ -1030,6 +1062,7 @@ impl Serialize for PrebuiltCxxLibrary
pub enum Rule {
Alias(Alias),
Filegroup(Filegroup),
ExtractArchive(ExtractArchive),
HttpArchive(HttpArchive),
GitFetch(GitFetch),
Binary(RustBinary),
@@ -1071,6 +1104,7 @@ fn rule_sort_key(rule: &Rule) -> impl Ord + '_ {
// Make the alias rule come before the actual rule. Note that aliases
// emitted by reindeer are always to a target within the same package.
Rule::Alias(Alias { actual, .. }) => RuleSortKey::Other(actual, 0),
Rule::ExtractArchive(ExtractArchive { sort_key, .. }) => RuleSortKey::Other(sort_key, 1),
Rule::HttpArchive(HttpArchive { sort_key, .. }) => RuleSortKey::Other(sort_key, 1),
Rule::GitFetch(GitFetch { name, .. }) => RuleSortKey::GitFetch(name),
Rule::Filegroup(_)
@@ -1096,6 +1130,7 @@ impl Rule {
Rule::Alias(Alias { name, .. })
| Rule::Filegroup(Filegroup { name, .. })
| Rule::HttpArchive(HttpArchive { name, .. })
| Rule::ExtractArchive(ExtractArchive { name, .. })
| Rule::GitFetch(GitFetch { name, .. })
| Rule::Binary(RustBinary {
common:
@@ -1148,6 +1183,9 @@ impl Rule {
Rule::Filegroup(filegroup) => {
FunctionCall::new(&config.filegroup, filegroup).serialize(Serializer)
}
Rule::ExtractArchive(compressed_crate) => {
FunctionCall::new(&config.extract_archive, compressed_crate).serialize(Serializer)
}
Rule::HttpArchive(http_archive) => {
FunctionCall::new(&config.http_archive, http_archive).serialize(Serializer)
}
94 changes: 72 additions & 22 deletions src/buckify.rs
@@ -33,6 +33,7 @@ use crate::buck;
use crate::buck::Alias;
use crate::buck::BuckPath;
use crate::buck::Common;
use crate::buck::ExtractArchive;
use crate::buck::Filegroup;
use crate::buck::GitFetch;
use crate::buck::HttpArchive;
@@ -56,6 +57,7 @@ use crate::cargo::Source;
use crate::cargo::TargetReq;
use crate::collection::SetOrMap;
use crate::config::Config;
use crate::config::VendorConfig;
use crate::fixups::ExportSources;
use crate::fixups::Fixups;
use crate::glob::Globs;
@@ -219,7 +221,7 @@ fn generate_rules<'scope>(
for rule in rules {
let _ = rule_tx.send(Ok(rule));
}
if context.config.vendor.is_none() {
if !matches!(context.config.vendor, VendorConfig::Source(_)) {
deps.push((pkg, TargetReq::Sources));
}
}
@@ -258,10 +260,18 @@ fn generate_nonvendored_sources_archive<'scope>(

match &lockfile_package.source {
Source::Local => Ok(None),
Source::CratesIo => generate_http_archive(context, pkg, lockfile_package).map(Some),
Source::CratesIo => match context.config.vendor {
VendorConfig::Off => generate_http_archive(context, pkg, lockfile_package).map(Some),
VendorConfig::LocalRegistry => generate_extract_archive(pkg).map(Some),
VendorConfig::Source(_) => unreachable!(),
},
Source::Git {
repo, commit_hash, ..
} => generate_git_fetch(repo, commit_hash).map(Some),
} => match context.config.vendor {
VendorConfig::Off => generate_git_fetch(repo, commit_hash).map(Some),
VendorConfig::LocalRegistry => generate_extract_archive(pkg).map(Some),
VendorConfig::Source(_) => unreachable!(),
},
Source::Unrecognized(_) => {
bail!(
"`vendor = false` mode is supported only with exclusively crates.io and https git dependencies. \"{}\" {} is coming from some other source",
Expand All @@ -272,6 +282,20 @@ fn generate_nonvendored_sources_archive<'scope>(
}
}

fn generate_extract_archive(pkg: &Manifest) -> anyhow::Result<Rule> {
Ok(Rule::ExtractArchive(ExtractArchive {
name: Name(format!("{}-{}.crate", pkg.name, pkg.version)),
src: BuckPath(PathBuf::from(format!(
"vendor/{}-{}.crate",
pkg.name, pkg.version,
))),
strip_prefix: format!("{}-{}", pkg.name, pkg.version),
sub_targets: BTreeSet::new(), // populated later after all fixups are constructed
visibility: Visibility::Private,
sort_key: Name(format!("{}-{}", pkg.name, pkg.version)),
}))
}

fn generate_http_archive<'scope>(
context: &'scope RuleContext<'scope>,
pkg: &'scope Manifest,
@@ -413,22 +437,25 @@ fn generate_target_rules<'scope>(
log::debug!("pkg {} target {} fixups {:#?}", pkg, tgt.name, fixups);

let manifest_dir = pkg.manifest_dir();
let mapped_manifest_dir =
if context.config.vendor.is_some() || matches!(pkg.source, Source::Local) {
relative_path(&paths.third_party_dir, manifest_dir)
} else if let Source::Git { repo, .. } = &pkg.source {
let git_fetch = short_name_for_git_repo(repo)?;
let repository_root = find_repository_root(manifest_dir)?;
let path_within_repo = relative_path(repository_root, manifest_dir);
PathBuf::from(git_fetch).join(path_within_repo)
} else {
PathBuf::from(format!("{}-{}.crate", pkg.name, pkg.version))
};
let mapped_manifest_dir = if matches!(config.vendor, VendorConfig::Source(_))
|| matches!(pkg.source, Source::Local)
{
relative_path(&paths.third_party_dir, manifest_dir)
} else if let VendorConfig::LocalRegistry = config.vendor {
PathBuf::from(format!("{}-{}.crate", pkg.name, pkg.version))
} else if let Source::Git { repo, .. } = &pkg.source {
let git_fetch = short_name_for_git_repo(repo)?;
let repository_root = find_repository_root(manifest_dir)?;
let path_within_repo = relative_path(repository_root, manifest_dir);
PathBuf::from(git_fetch).join(path_within_repo)
} else {
PathBuf::from(format!("{}-{}.crate", pkg.name, pkg.version))
};
let crate_root = mapped_manifest_dir.join(relative_path(manifest_dir, &tgt.src_path));
let edition = tgt.edition.unwrap_or(pkg.edition);

let mut licenses = BTreeSet::new();
if config.vendor.is_none() {
if !matches!(config.vendor, VendorConfig::Source(_)) {
// The `licenses` attribute takes `attrs.source()` which is the file
// containing the custom license text. For `vendor = false` mode, we
// don't have such a file on disk, and we don't have a Buck label either
@@ -458,7 +485,8 @@
// filename, or a list of globs.
// If we're configured to get precise sources and we're using 2018+ edition source, then
// parse the crate to see what files are actually used.
let mut srcs = if (config.vendor.is_some() || matches!(pkg.source, Source::Local))
let mut srcs = if (matches!(config.vendor, VendorConfig::Source(_))
|| matches!(pkg.source, Source::Local))
&& fixups.precise_srcs()
&& edition >= Edition::Rust2018
{
@@ -504,7 +532,7 @@
)
.context("rustc_flags")?;

if config.vendor.is_some() || matches!(pkg.source, Source::Local) {
if matches!(config.vendor, VendorConfig::Source(_)) || matches!(pkg.source, Source::Local) {
unzip_platform(
config,
&mut base,
Expand All @@ -516,6 +544,10 @@ fn generate_target_rules<'scope>(
fixups.compute_srcs(srcs)?,
)
.context("srcs")?;
} else if let VendorConfig::LocalRegistry = config.vendor {
let extract_archive_target = format!(":{}-{}.crate", pkg.name, pkg.version);
base.srcs
.insert(BuckPath(PathBuf::from(extract_archive_target)));
} else if let Source::Git { repo, .. } = &pkg.source {
let short_name = short_name_for_git_repo(repo)?;
let git_fetch_target = format!(":{}.git", short_name);
@@ -887,7 +919,9 @@
// For non-disk sources (i.e. non-vendor mode git_fetch and
// http_archive), `srcs` and `exclude` are ignored because
// we can't look at the files to match globs.
let srcs = if config.vendor.is_some() || matches!(pkg.source, Source::Local) {
let srcs = if matches!(config.vendor, VendorConfig::Source(_))
|| matches!(pkg.source, Source::Local)
{
// e.g. {"src/lib.rs": "vendor/foo-1.0.0/src/lib.rs"}
let mut globs = Globs::new(srcs, exclude).context("export sources")?;
let srcs = globs
@@ -901,6 +935,14 @@
globs.check_all_globs_used()?;
}
srcs
} else if let VendorConfig::LocalRegistry = config.vendor {
// e.g. {":foo-1.0.0.git": "foo-1.0.0"}
let extract_archive_target = format!(":{}-{}.crate", pkg.name, pkg.version);
[(
BuckPath(mapped_manifest_dir.clone()),
SubtargetOrPath::Path(BuckPath(PathBuf::from(extract_archive_target))),
)]
.into()
} else if let Source::Git { repo, .. } = &pkg.source {
// e.g. {":foo-123.git": "foo-123"}
let short_name = short_name_for_git_repo(repo)?;
@@ -1002,7 +1044,7 @@ fn buckify_for_universe(

// Fill in all http_archive rules with all the sub_targets which got
// mentioned by fixups.
if config.vendor.is_none() {
if !matches!(config.vendor, VendorConfig::Source(_)) {
let mut need_subtargets = HashMap::<Name, BTreeSet<BuckPath>>::new();
let mut insert = |subtarget_or_path: &SubtargetOrPath| {
if let SubtargetOrPath::Subtarget(subtarget) = subtarget_or_path {
@@ -1043,10 +1085,18 @@
rules = rules
.into_iter()
.map(|mut rule| {
if let Rule::HttpArchive(rule) = &mut rule {
if let Some(need_subtargets) = need_subtargets.remove(&rule.name) {
rule.sub_targets = need_subtargets;
match &mut rule {
Rule::HttpArchive(rule) => {
if let Some(need_subtargets) = need_subtargets.remove(&rule.name) {
rule.sub_targets = need_subtargets;
}
}
Rule::ExtractArchive(rule) => {
if let Some(need_subtargets) = need_subtargets.remove(&rule.name) {
rule.sub_targets = need_subtargets;
}
}
_ => {}
}
rule
})
3 changes: 2 additions & 1 deletion src/cargo.rs
@@ -30,6 +30,7 @@ use serde::Deserializer;
use serde::Serialize;

use crate::config::Config;
use crate::config::VendorConfig;
use crate::lockfile::Lockfile;
use crate::platform::PlatformExpr;
use crate::Args;
@@ -60,7 +61,7 @@ pub fn cargo_get_lockfile_and_metadata(

let cargo_home;
let lockfile;
if config.vendor.is_none() {
if !matches!(config.vendor, VendorConfig::Source(_)) {
cargo_home = None;

// Whether or not there is a Cargo.lock already, do not read it yet.
53 changes: 39 additions & 14 deletions src/config.rs
@@ -77,11 +77,8 @@ pub struct Config {
#[serde(default)]
pub buck: BuckConfig,

#[serde(
default = "default_vendor_config",
deserialize_with = "deserialize_vendor_config"
)]
pub vendor: Option<VendorConfig>,
#[serde(default, deserialize_with = "deserialize_vendor_config")]
pub vendor: VendorConfig,

#[serde(default = "default_platforms")]
pub platform: HashMap<PlatformName, PlatformConfig>,
@@ -128,6 +125,8 @@ pub struct BuckConfig {
/// Rule name for http_archive
#[serde(default)]
pub http_archive: StringWithDefault<MustBe!("http_archive")>,
#[serde(default)]
pub extract_archive: StringWithDefault<MustBe!("extract_archive")>,
/// Rule name for git_fetch
#[serde(default)]
pub git_fetch: StringWithDefault<MustBe!("git_fetch")>,
@@ -150,9 +149,17 @@ pub struct BuckConfig {
pub buildscript_genrule: StringWithDefault<MustBe!("buildscript_run")>,
}

#[derive(Debug, Clone, Deserialize)]
#[serde(deny_unknown_fields)]
pub enum VendorConfig {
Off,
LocalRegistry,
Source(VendorSourceConfig),
}

#[derive(Debug, Default, Clone, Deserialize)]
#[serde(deny_unknown_fields)]
pub struct VendorConfig {
pub struct VendorSourceConfig {
/// List of .gitignore files to use to filter checksum files, relative to
/// this config file.
#[serde(default)]
@@ -162,6 +169,12 @@ pub struct VendorConfig {
pub checksum_exclude: HashSet<String>,
}

impl Default for VendorConfig {
fn default() -> Self {
VendorConfig::Source(Default::default())
}
}

#[derive(Clone)]
pub struct StringWithDefault<T> {
pub value: String,
@@ -236,10 +249,6 @@ impl<T> From<String> for StringWithDefault<T> {
}
}

fn default_vendor_config() -> Option<VendorConfig> {
Some(VendorConfig::default())
}

fn default_platforms() -> HashMap<PlatformName, PlatformConfig> {
const DEFAULT_PLATFORMS_TOML: &str = include_str!("default_platforms.toml");

@@ -259,14 +268,14 @@ fn default_universes() -> BTreeMap<UniverseName, UniverseConfig> {
map
}

fn deserialize_vendor_config<'de, D>(deserializer: D) -> Result<Option<VendorConfig>, D::Error>
fn deserialize_vendor_config<'de, D>(deserializer: D) -> Result<VendorConfig, D::Error>
where
D: Deserializer<'de>,
{
struct VendorConfigVisitor;

impl<'de> Visitor<'de> for VendorConfigVisitor {
type Value = Option<VendorConfig>;
type Value = VendorConfig;

fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
formatter.write_str("[vendor] section, or `vendor = false`")
@@ -278,14 +287,30 @@ where
{
// `vendor = true`: default configuration with vendoring.
// `vendor = false`: do not vendor.
Ok(value.then(VendorConfig::default))
Ok(if value {
VendorConfig::default()
} else {
VendorConfig::Off
})
}

fn visit_str<E>(self, v: &str) -> Result<Self::Value, E>
where
E: serde::de::Error,
{
if v == "local-registry" {
Ok(VendorConfig::LocalRegistry)
} else {
Err(E::custom("unknown vendor type"))
}
}

fn visit_map<M>(self, map: M) -> Result<Self::Value, M::Error>
where
M: MapAccess<'de>,
{
VendorConfig::deserialize(MapAccessDeserializer::new(map)).map(Some)
VendorSourceConfig::deserialize(MapAccessDeserializer::new(map))
.map(VendorConfig::Source)
}
}
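
For reference, a sketch of the reindeer.toml forms this visitor ends up accepting; the mapping below is inferred from the code above, not quoted from documentation:

```toml
vendor = false               # -> VendorConfig::Off
#vendor = true               # -> VendorConfig::Source (same as the default)
#vendor = "local-registry"   # -> VendorConfig::LocalRegistry
#[vendor]                    # -> VendorConfig::Source, with checksum_exclude etc.
```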

Expand Down