docs: hello-world-small example size update #50

Merged
merged 1 commit into from
Oct 10, 2024

@polarathene polarathene commented Oct 5, 2024

Summary

  • Rust nightly (1.83) now builds these targets (from a glibc host) at 13-20KB smaller.
  • Delta reduced from the ~13KB difference to a 6KB difference.
  • When built with LLD, musl is smaller by a further 5KB, roughly on par with eyra at a ~300 byte delta (nightly fluctuations observed).
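
As a quick check of the figures in this summary, the following sketch recomputes the deltas from the byte counts reported by `du --bytes` in the reproduction steps of this comment. The constant and function names are illustrative only and are not part of the PR.

```rust
// Byte counts as reported by `du --bytes` in this comment.
const MUSL: u64 = 30_288; // musl, default linker
const MUSL_LLD: u64 = 24_984; // musl with `-C link-arg=-fuse-ld=lld`
const EYRA: u64 = 24_616; // eyra
const EYRA_LATER: u64 = 24_696; // eyra, on a nightly two days later

/// Absolute size difference in bytes between two builds.
fn delta(a: u64, b: u64) -> u64 {
    a.abs_diff(b)
}

fn main() {
    // The "6KB difference" between musl and eyra.
    println!("musl vs eyra:               {} bytes", delta(MUSL, EYRA));
    // The "additionally smaller by 5KB" from switching musl to LLD.
    println!("musl vs musl + LLD:         {} bytes", delta(MUSL, MUSL_LLD));
    // The "~300 bytes delta", which fluctuates between nightlies.
    println!("musl + LLD vs eyra:         {} bytes", delta(MUSL_LLD, EYRA));
    println!("musl + LLD vs eyra (later): {} bytes", delta(MUSL_LLD, EYRA_LATER));
}
```

The last two lines bracket the "~300 bytes" figure, which matches the note about nightly fluctuations.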

Reproduction

# Docker with Fedora 41 for the base reproduction environment:
$ docker run --rm -it --workdir /example fedora:41

# Prep environment:
$ dnf install -y gcc rustup nano
$ rustup-init -y --profile minimal --default-toolchain nightly --target x86_64-unknown-linux-gnu x86_64-unknown-linux-musl --component rust-src
$ . "$HOME/.cargo/env"

# Create basic hello world example:
$ cargo init
# Add the release profile:
# https://github.com/sunfishcode/eyra/blob/v0.17.0/example-crates/hello-world-small/Cargo.toml#L10-L16
$ nano Cargo.toml

musl (25KB)

$ RUSTFLAGS="-Z location-detail=none -C relocation-model=static -C target-feature=+crt-static" cargo +nightly build -Z build-std=std,panic_abort -Z build-std-features=panic_immediate_abort --target x86_64-unknown-linux-musl --release

$ du --bytes target/x86_64-unknown-linux-musl/release/example
30288   target/x86_64-unknown-linux-musl/release/example

# NOTE:
# 24,984 bytes with `-C link-arg=-fuse-ld=lld`
# 389,744 bytes with LLD and without `-Z build-std` args

musl + zig (26.4KB)

$ dnf install -y zig
$ cargo install cargo-zigbuild

# Only differs by replacing the `build` sub-command with `zigbuild`:
$ RUSTFLAGS="-Z location-detail=none -C relocation-model=static -C target-feature=+crt-static" cargo +nightly zigbuild -Z build-std=std,panic_abort -Z build-std-features=panic_immediate_abort --target x86_64-unknown-linux-musl --release

$ du --bytes target/x86_64-unknown-linux-musl/release/example
26424   target/x86_64-unknown-linux-musl/release/example

# NOTE:
# Zig does not presently support static glibc builds, nor is it compatible with Eyra due to duplicate `_start` from Zig
# 351,728 bytes without `-Z build-std` args (Zig uses LLD by default).

glibc (834KB)

# glibc static libs are needed for gnu target to link statically:
$ dnf -y install glibc-static

# Same command as before, only adjusted `--target`
$ RUSTFLAGS="-Z location-detail=none -C relocation-model=static -C target-feature=+crt-static" cargo +nightly build -Z build-std=std,panic_abort -Z build-std-features=panic_immediate_abort --target x86_64-unknown-linux-gnu --release

$ du --bytes target/x86_64-unknown-linux-gnu/release/example
834224  target/x86_64-unknown-linux-gnu/release/example

# NOTE:
# No size difference when linking with LLD (slightly larger when linking with mold, as per usual)
# 1,121,168 bytes without `-Z build-std` args

eyra (24.7KB)

$ cargo add eyra --no-default-features
# `moreutils` provides the `sponge` command (or you could just edit via nano):
$ dnf -y install moreutils
# Add this line to the top of `src/main.rs`
$ echo 'extern crate eyra;' | cat - src/main.rs | sponge src/main.rs

# NOTE: Only differs by prepending `-C link-arg=-nostartfiles`
$ RUSTFLAGS="-C link-arg=-nostartfiles -Z location-detail=none -C relocation-model=static -C target-feature=+crt-static" cargo +nightly build -Z build-std=std,panic_abort -Z build-std-features=panic_immediate_abort --target x86_64-unknown-linux-gnu --release

$ du --bytes target/x86_64-unknown-linux-gnu/release/example
24616   target/x86_64-unknown-linux-gnu/release/example

# NOTE:
# No size difference when linking with LLD.
# Increased to 24,696 bytes on nightly 2 days later.
# 388,384 bytes without `-Z build-std` args
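
To put the NOTE figures from the four sections above side by side, this sketch computes how much of each binary the `-Z build-std` args remove. The byte counts are copied from this comment (the musl row uses the two LLD figures so that both sides use the same linker). The table layout and the `percent_saved` helper are illustrative assumptions.

```rust
/// (label, bytes with the `-Z build-std` args, bytes without them),
/// as reported in the sections above.
const SIZES: [(&str, u64, u64); 4] = [
    ("musl + LLD", 24_984, 389_744),
    ("musl + zig", 26_424, 351_728),
    ("glibc", 834_224, 1_121_168),
    ("eyra", 24_616, 388_384),
];

/// Percentage of the binary removed by the `-Z build-std` args, rounded down.
fn percent_saved(with: u64, without: u64) -> u64 {
    (without - with) * 100 / without
}

fn main() {
    for (label, with, without) in SIZES {
        println!(
            "{label:<12} {without:>9} -> {with:>7} bytes ({}% smaller)",
            percent_saved(with, without)
        );
    }
}
```

The static glibc build benefits far less than the others, since most of its size comes from glibc itself rather than from the Rust standard library.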

nostd reference

For additional reference, here are results for the nostd example that was added at a later date:

  • Presently builds with eyra to 5,592 bytes 😎 (with the same Cargo.toml release profile used below)

    • When built without the -Z build-std args, the size increases to 10,144 bytes, or 7,608 bytes when adding -C linker-plugin-lto -C linker=clang -C link-arg=-flto=full -C link-arg=-fuse-ld=mold (note: eyra fails to build if -C link-arg=-fuse-ld=lld is used with -C linker-plugin-lto).
    • Nightly will be required to build for a while.
  • For the musl target (without eyra):

    • It builds to 13,464 bytes, but requires -C link-arg=-lc to build successfully. Adding -C link-arg=-fuse-ld=lld reduces this down to 3,776 bytes.
    • When built with Zig the size is 3,200 bytes, or 2,928 bytes after stripping some ELF strings (-C link-arg=-lc not required, Zig also defaults the linker to LLD already):
      objcopy \
        --remove-section=.comment \
        --remove-section=.note.gnu.build-id \
        target/x86_64-unknown-linux-musl/release/example
  • Static glibc gnu target (without eyra):

    • Initially this appeared to build smaller at 4,760 bytes (also requiring -C link-arg=-lc), but that binary segfaults and is unexpectedly dynamically linked against glibc. Without +crt-static it builds successfully as a dynamically linked binary at 4,440 bytes.
    • This target requires -C link-arg=/usr/lib64/libc.a -C link-arg=/usr/lib/gcc/x86_64-redhat-linux/14/libgcc_eh.a (at least on Fedora 41 with the glibc-static package) to successfully build and weighs in at 692,976 bytes with the -Z build-std flags (otherwise 64 bytes larger at 693,040 bytes).
    • Possibly similar to the musl limitation, except the linker change barely decreases the size. Tried -C linker-plugin-lto -C linker=clang -C link-arg=-flto=full for cross-language LTO, but no improvement.

@polarathene
Contributor Author

polarathene commented Oct 6, 2024

Reference

Unified example for the -gnu (glibc & eyra) + -musl targets for nostd:

src/main.rs:

#![no_std]
#![no_main]

// This approach to include eyra specific lines seems acceptable since Eyra only works with nightly:
// https://github.com/sunfishcode/c-ward/issues/144
#![cfg_attr(feature = "eyra", feature(cfg_match, lang_items), allow(internal_features))]
cfg_match! {
    cfg(feature = "eyra") => {
      extern crate eyra;

      #[global_allocator]
      static GLOBAL_ALLOCATOR: rustix_dlmalloc::GlobalDlmalloc = rustix_dlmalloc::GlobalDlmalloc;

      #[lang = "eh_personality"]
      extern "C" fn eh_personality() {}
    }
}

// NOTE: This differs from the official example,
// Provides visible feedback to stdout at the expense of extra size:
#[no_mangle]
pub extern "C" fn main() -> isize {
    const HELLO: &'static str = "Hello, world!\n";
    unsafe { write(1, HELLO.as_ptr() as *const i8, HELLO.len()) };
    0
}

extern "C" {
    fn write(fd: i32, buf: *const i8, count: usize) -> isize;
}

#[panic_handler]
fn panic(_info: &core::panic::PanicInfo) -> ! { loop {} }

build.rs:

fn main() {
  let target_is_musl = std::env::var("CARGO_CFG_TARGET_ENV")
    .is_ok_and(|v| v == "musl");
  let target_is_glibc = std::env::var("CARGO_CFG_TARGET_ENV")
    .is_ok_and(|v| v == "gnu");

  // Pass `-nostartfiles` to the linker, when Eyra is enabled.
  if cfg!(feature = "eyra") {
    println!("cargo:rustc-link-arg=-nostartfiles");
  } else {
    // NOTE: Not required when building with `cargo zigbuild`:
    if target_is_musl {
      println!("cargo:rustc-link-arg=-lc");
    }

    // NOTE: Absolute paths specific to Fedora 41 used here.
    // Not providing the static libraries will dynamically link libc and segfault at runtime.
    if target_is_glibc {
      println!("cargo:rustc-link-arg=/usr/lib64/libc.a");
      println!("cargo:rustc-link-arg=/usr/lib/gcc/x86_64-redhat-linux/14/libgcc_eh.a");
    }
  }
}
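
The branching in this build script can also be read as a small pure function, which makes it easy to see which linker arguments each configuration receives. This is only a sketch: `link_args` and its parameters are hypothetical names, with `target_env` standing in for `CARGO_CFG_TARGET_ENV` and `eyra_enabled` for `cfg!(feature = "eyra")`. The paths are the Fedora 41 ones from the script above.

```rust
/// Returns the linker arguments that the build script above would emit.
/// `target_env` mirrors `CARGO_CFG_TARGET_ENV`; `eyra_enabled` mirrors
/// `cfg!(feature = "eyra")`. Both parameter names are illustrative.
fn link_args(target_env: &str, eyra_enabled: bool) -> Vec<&'static str> {
    if eyra_enabled {
        // Eyra provides its own startup code.
        return vec!["-nostartfiles"];
    }
    match target_env {
        // Not required when building with `cargo zigbuild`.
        "musl" => vec!["-lc"],
        // Absolute paths specific to Fedora 41 with `glibc-static`.
        "gnu" => vec![
            "/usr/lib64/libc.a",
            "/usr/lib/gcc/x86_64-redhat-linux/14/libgcc_eh.a",
        ],
        _ => Vec::new(),
    }
}

fn main() {
    // An unset variable yields an empty string, so no arguments are emitted.
    let target_env = std::env::var("CARGO_CFG_TARGET_ENV").unwrap_or_default();
    for arg in link_args(&target_env, cfg!(feature = "eyra")) {
        println!("cargo:rustc-link-arg={arg}");
    }
}
```

Keeping the decision separate from the `println!` calls also means the logic can be unit tested without invoking Cargo.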

Cargo.toml:

[package]
name = "example"
version = "0.1.0"
edition = "2021"

[dependencies]
eyra = { version = "0.17.0", default-features = false, optional = true }
rustix-dlmalloc = { version = "0.1.0", features = ["global"], optional = true }

[features]
eyra = ["dep:eyra", "dep:rustix-dlmalloc"]

[profile.release]
lto = true
codegen-units = 1
panic = "abort"
opt-level = "z"
strip = true

rust-toolchain.toml: (optional, pins nightly version used)

[toolchain]
profile = "minimal"
channel = "nightly-2024-10-04"
components = ["rust-src"]
targets = ["x86_64-unknown-linux-gnu", "x86_64-unknown-linux-musl"]

glibc (692,976 B)

# NOTE: The `-Z build-std*` args aren't doing much for the above `nostd` focused example:
# LLD as the linker benefits musl notably in size (~14KB => ~4KB), gnu targets minimally.
RUSTFLAGS="-Z location-detail=none -C link-arg=-fuse-ld=lld -C relocation-model=static -C target-feature=+crt-static" \
  cargo +nightly build --target x86_64-unknown-linux-gnu --release \
    -Z build-std=std,panic_abort \
    -Z build-std-features=panic_immediate_abort

eyra (5,712 B)

# NOTE: The `-Z build-std*` args aren't doing much for the above `nostd` focused example:
# LLD as the linker benefits musl notably in size (~14KB => ~4KB), gnu targets minimally.
RUSTFLAGS="-Z location-detail=none -C link-arg=-fuse-ld=lld -C relocation-model=static -C target-feature=+crt-static" \
  cargo +nightly build --target x86_64-unknown-linux-gnu --release \
    -Z build-std=std,panic_abort \
    -Z build-std-features=panic_immediate_abort \
    --features eyra

musl (3,952 B)

# LLD as the linker benefits musl notably in size (~14KB => ~4KB), gnu targets minimally.
RUSTFLAGS="-Z location-detail=none -C link-arg=-fuse-ld=lld -C relocation-model=static -C target-feature=+crt-static" \
  cargo +nightly build --target x86_64-unknown-linux-musl --release \
    -Z build-std=std,panic_abort \
    -Z build-std-features=panic_immediate_abort

@sunfishcode sunfishcode merged commit 5a96c77 into sunfishcode:main Oct 10, 2024
5 checks passed
@sunfishcode
Owner

Thanks!

Also, if you're using #![no_std] and #![no_main], you may also be interested in using origin directly, which can produce even smaller binaries.

@polarathene
Contributor Author

No worries!

This was just from having some time spare to go over my prior notes on the topic and do a revision / summary over my original issue ( #27 ).

I was a bit surprised by some of the musl insights, especially that switching the linker to lld gave a notable improvement.


> Also, if you're using #![no_std] and #![no_main], you may also be interested in using origin directly, which can produce even smaller binaries.

I hit a bit of a snag: not long after the docs PR here was merged, nightly releases began failing due to a change that affected unwinding 0.2.2. While unwinding 0.2.3 resolved that, in some builds with Eyra I encountered a new failure that did not occur with unwinding 0.2.2.

Unfortunately, due to semver resolution (despite not changing my Eyra version), the unwinding crate resolves to 0.2.3, which fails on the nightly version pinned in my rust-toolchain.toml. So presently I need to either pin unwinding 0.2.2 (in my Cargo.lock, or via an addition to my Cargo.toml) along with the nightly pin, or run my patched unwinding 0.2.3, which apparently is not a valid fix (it would break builds for others).

Just adding that context for anyone that lands here and attempts to reproduce the examples without a Cargo.lock 😅
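
For readers unfamiliar with why the version moves without a lockfile, here is a simplified sketch of Cargo's default (caret) requirement matching. It assumes a dependency declares a requirement along the lines of `unwinding = "0.2.2"`; the actual requirement string used by Eyra is not shown in this thread, so treat that as a hypothetical. Pre-release versions are ignored, and `Version` and `caret_matches` are illustrative names, not a real API.

```rust
/// A `major.minor.patch` version triple (pre-release tags are ignored here).
type Version = (u64, u64, u64);

/// Simplified check of whether `candidate` satisfies the caret requirement
/// `req`, following the rules documented for Cargo:
/// `^1.2.3` allows `>=1.2.3, <2.0.0`, `^0.2.3` allows `>=0.2.3, <0.3.0`,
/// and `^0.0.3` allows only `0.0.3`.
fn caret_matches(req: Version, candidate: Version) -> bool {
    if candidate < req {
        return false;
    }
    match req {
        // 0.0.z: only the exact same version is compatible.
        (0, 0, _) => candidate == req,
        // 0.y.z: the minor version must stay the same.
        (0, minor, _) => candidate.0 == 0 && candidate.1 == minor,
        // x.y.z with x > 0: the major version must stay the same.
        (major, _, _) => candidate.0 == major,
    }
}

fn main() {
    // A fresh resolve may pick 0.2.3 for a `0.2.2` requirement.
    println!("0.2.3 satisfies ^0.2.2: {}", caret_matches((0, 2, 2), (0, 2, 3)));
    // A requirement of `0.2.3` rules out going back to 0.2.2.
    println!("0.2.2 satisfies ^0.2.3: {}", caret_matches((0, 2, 3), (0, 2, 2)));
}
```

Without a Cargo.lock, Cargo generally selects the newest version that satisfies the requirement, which is why an exact pin (a lockfile entry or an `=0.2.2` requirement) is needed to stay on the older release.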

@polarathene
Contributor Author

As for size: yes, I got 352 bytes with your origin example IIRC, while a similar "Hello World" with rustix directly was 344 bytes. At that extreme, the -gnu / -musl target was irrelevant, presumably due to no libc usage?

I had come across this blog article prior to trying your origin example (they managed 640 bytes), where at the end their "Hello World" string version was 888 bytes.

The article (at least at the time I read it) was lacking information to reproduce their final result but after I saw the syscall usage to avoid libc I immediately thought rustix might let me do that without having to think about managing syscalls directly (I had not used rustix, but it was a nice reason to try it).

Origin Examples (352 & 504 bytes)

After that I got around to looking at the origin examples and realized you did roughly the same (but with proper error handling), and I got that down to 504 bytes:

# The nightly `-Z build-std` flags trim off almost 500 more bytes:
RUSTFLAGS='-C link-arg=-Wl,--build-id=none,--omagic,-z,nognustack -C link-arg=-fuse-ld=lld -C relocation-model=static -C target-feature=+crt-static -C link-arg=-nostartfiles' \
  cargo build --release --target x86_64-unknown-linux-gnu \
  -Z build-std=core,panic_abort \
  -Z build-std-features=panic_immediate_abort

# 720 bytes down to 504 bytes (nightly `.comment` content is larger than stable toolchain due to version info):
objcopy -R .comment target/x86_64-unknown-linux-gnu/release/example

NOTE: That was with origin = 0.23.0, since the current 0.23.1 release sets unwinding = 0.2.3 as the minimum, preventing me from using cargo update unwinding --precise 0.2.2.

With my rustix attempt, I do remember being slowed down a bit when working out how to approach the exit() call: I found it in the rustix source, but the docs didn't cover it.

Off-topic: Eyra docs.rs are failing to build, last successful docs publish was 0.16.0.

Hello World examples - 344 bytes + 456 bytes

For anyone interested in reproducing this, I'll share it, but at this point it doesn't help evaluate Origin or Eyra, as the example is now so simple that there is no overhead left for them to reduce:

#![no_std]
#![no_main]

#[no_mangle]
pub extern "C" fn _start() -> ! {
  exit(); // +8 bytes to size vs using `loop {}`
}

fn exit() -> ! { unsafe { rustix::runtime::exit_thread(42) } }

#[panic_handler]
fn panic(_info: &core::panic::PanicInfo) -> ! { loop {} }

Cargo.toml:

[package]
name = "example"
version = "0.0.0"
edition = "2021"

[dependencies]
rustix = { version = "0.38.37", default-features = false, features = ["runtime"] }

[profile.release]
lto = true
panic = "abort"
opt-level = "z"
strip = true

# Current stable Rust (1.81.0):
$ RUSTFLAGS='-C link-arg=-Wl,--build-id=none,--nmagic,-z,nognustack -C link-arg=-fuse-ld=lld -C relocation-model=static -C target-feature=+crt-static -C link-arg=-nostartfiles' \
  cargo build --release --target x86_64-unknown-linux-gnu

# Remove some extra weight:
$ objcopy -R .comment target/x86_64-unknown-linux-gnu/release/example

# Only 344 bytes:
$ du --bytes target/x86_64-unknown-linux-gnu/release/example
344     target/x86_64-unknown-linux-gnu/release/example

$ ldd target/x86_64-unknown-linux-gnu/release/example
        not a dynamic executable

# It works:
$ target/x86_64-unknown-linux-gnu/release/example
$ echo $?
42

For a little more functionality, add the stdio feature to the rustix dep, and in src/main.rs update _start() to call this function before exit():

#[inline(always)]
fn hello_world() {
  rustix::io::write(
    unsafe { rustix::stdio::stdout() },
    "Hello, world!\n".as_bytes()
  ).unwrap();
}

This will have some extra content we can trim away via other flags (if min size was the goal, these aren't always advised of course):

# Additional linker arg `--no-eh-frame-hdr`:
$ RUSTFLAGS='-C link-arg=-Wl,--build-id=none,--nmagic,-z,nognustack,--no-eh-frame-hdr -C link-arg=-fuse-ld=lld -C relocation-model=static -C target-feature=+crt-static -C link-arg=-nostartfiles' \
  cargo build --release --target x86_64-unknown-linux-gnu

# Also remove `.eh_frame`:
# NOTE: `--build-id=none` above is more optimal vs `-R .note.gnu.build-id` post-build:
$ objcopy -R .comment -R .eh_frame target/x86_64-unknown-linux-gnu/release/example

# Only 584 bytes:
$ du --bytes target/x86_64-unknown-linux-gnu/release/example
584     target/x86_64-unknown-linux-gnu/release/example

$ ldd target/x86_64-unknown-linux-gnu/release/example
        not a dynamic executable

$ target/x86_64-unknown-linux-gnu/release/example
Hello, world!

Alternatively, --no-eh-frame-hdr and objcopy -R .eh_frame are not needed if you build with -Z build-std=core -Z build-std-features=panic_immediate_abort. In that case, also swapping --nmagic for --omagic results in 456 bytes (this provides no improvement for the original 344-byte version).
