thousands separator for the --size=bytes option would be very useful #533

Open
peter-joo opened this issue Jun 28, 2021 · 11 comments
Labels
kind/enhancement Enhancement on current feature

Comments

@peter-joo

  • OS: Linux 5.12.9-1-MANJARO x86_64 GNU/Linux
  • lsd --version: lsd 0.20.1
  • echo $TERM: xterm-256color
  • echo $LS_COLORS: rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:.tar=01;31:.tgz=01;31:.arc=01;31:.arj=01;31:.taz=01;31:.lha=01;31:.lz4=01;31:.lzh=01;31:.lzma=01;31:.tlz=01;31:.txz=01;31:.tzo=01;31:.t7z=01;31:.zip=01;31:.z=01;31:.dz=01;31:.gz=01;31:.lrz=01;31:.lz=01;31:.lzo=01;31:.xz=01;31:.zst=01;31:.tzst=01;31:.bz2=01;31:.bz=01;31:.tbz=01;31:.tbz2=01;31:.tz=01;31:.deb=01;31:.rpm=01;31:.jar=01;31:.war=01;31:.ear=01;31:.sar=01;31:.rar=01;31:.alz=01;31:.ace=01;31:.zoo=01;31:.cpio=01;31:.7z=01;31:.rz=01;31:.cab=01;31:.wim=01;31:.swm=01;31:.dwm=01;31:.esd=01;31:.jpg=01;35:.jpeg=01;35:.mjpg=01;35:.mjpeg=01;35:.gif=01;35:.bmp=01;35:.pbm=01;35:.pgm=01;35:.ppm=01;35:.tga=01;35:.xbm=01;35:.xpm=01;35:.tif=01;35:.tiff=01;35:.png=01;35:.svg=01;35:.svgz=01;35:.mng=01;35:.pcx=01;35:.mov=01;35:.mpg=01;35:.mpeg=01;35:.m2v=01;35:.mkv=01;35:.webm=01;35:.webp=01;35:.ogm=01;35:.mp4=01;35:.m4v=01;35:.mp4v=01;35:.vob=01;35:.qt=01;35:.nuv=01;35:.wmv=01;35:.asf=01;35:.rm=01;35:.rmvb=01;35:.flc=01;35:.avi=01;35:.fli=01;35:.flv=01;35:.gl=01;35:.dl=01;35:.xcf=01;35:.xwd=01;35:.yuv=01;35:.cgm=01;35:.emf=01;35:.ogv=01;35:.ogx=01;35:.aac=00;36:.au=00;36:.flac=00;36:.m4a=00;36:.mid=00;36:.midi=00;36:.mka=00;36:.mp3=00;36:.mpc=00;36:.ogg=00;36:.ra=00;36:.wav=00;36:.oga=00;36:.opus=00;36:.spx=00;36:.xspf=00;36:

Expected behavior:

It is very hard to quickly interpret/recognize the real file/directory sizes when the --size=bytes option is given:

>lsd --size=bytes --sort=size --reverse
.rw-r--r-- p p          1  Mon Jun 28 19:12:04 2021  file_1.dat
.rw-r--r-- p p         12  Mon Jun 28 19:12:04 2021  file_2.dat
.rw-r--r-- p p        123  Mon Jun 28 19:12:04 2021  file_3.dat
.rw-r--r-- p p       1234  Mon Jun 28 19:12:04 2021  file_4.dat
.rw-r--r-- p p      12345  Mon Jun 28 19:12:04 2021  file_5.dat
.rw-r--r-- p p     123456  Mon Jun 28 19:12:04 2021  file_6.dat
.rw-r--r-- p p    1234567  Mon Jun 28 19:12:04 2021  file_7.dat
.rw-r--r-- p p   12345678  Mon Jun 28 19:12:04 2021  file_8.dat
.rw-r--r-- p p  123456789  Mon Jun 28 19:12:04 2021  file_9.dat
.rw-r--r-- p p 1234567890  Mon Jun 28 19:12:06 2021  file_10.dat

However, exa ( https://github.com/ogham/exa ), a similar tool, includes thousands separators by default:

>exa --bytes --long --sort=size
.rw-r--r--             1 p 28 Jun 19:12 file_1.dat
.rw-r--r--            12 p 28 Jun 19:12 file_2.dat
.rw-r--r--           123 p 28 Jun 19:12 file_3.dat
.rw-r--r--         1,234 p 28 Jun 19:12 file_4.dat
.rw-r--r--        12,345 p 28 Jun 19:12 file_5.dat
.rw-r--r--       123,456 p 28 Jun 19:12 file_6.dat
.rw-r--r--     1,234,567 p 28 Jun 19:12 file_7.dat
.rw-r--r--    12,345,678 p 28 Jun 19:12 file_8.dat
.rw-r--r--   123,456,789 p 28 Jun 19:12 file_9.dat
.rw-r--r-- 1,234,567,890 p 28 Jun 19:12 file_10.dat

Actual behavior

Extra cognitive load without those thousands separators :(

@meain
Member

meain commented Jun 29, 2021

This might not be a good idea. This will cause issues for people who might be using lsd in a script and grepping for the size part. I don't think breaking compatibility with gnu ls here would be a good idea.
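
To illustrate the compatibility concern, here is a minimal sketch of how a script-like consumer might read the size column. The helper name `parse_size_field` and the assumption that the size is the fourth whitespace-separated field (as in the listing in the first comment) are hypothetical and only serve the example.

```rust
/// Take the fourth whitespace-separated field of a long-format line
/// (assumed to be the size column) and parse it as a plain integer,
/// the way an awk/grep-style script effectively does.
fn parse_size_field(line: &str) -> Option<u64> {
    // `u64::from_str` rejects anything but ASCII digits (and a leading `+`),
    // so a grouped value such as "1,234" yields `None`.
    line.split_whitespace().nth(3)?.parse().ok()
}
```

With the plain output, `parse_size_field(".rw-r--r-- p p 1234 Mon Jun 28 19:12:04 2021 file_4.dat")` yields `Some(1234)`. The same line with `1,234` in the size column yields `None`, which is the kind of breakage described above.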

@peter-joo
Author

Well, I really wanted to describe what to achieve, not how to achieve it.

I also agree: in a previous ticket, someone used awk to parse lsd's output, and the parsing failed because of spaces (or other separators): #254 (comment)

But there is a very easy way out, which solves all aspects of the problem:
- do not (ever) add thousands separators when the --size=bytes option is used
- only add thousands separators when a new suboption is used, e.g. --size=bytes_with_thousands_separator

I hope that clears it up :)
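
The opt-in idea above could look roughly like the following sketch. The `SizeMode` enum, its variant names, and both function names are hypothetical and are not taken from lsd's source; the point is only that the existing bytes mode stays untouched while a separate mode adds grouping.

```rust
/// Insert `sep` between every group of three digits, counting from the right.
fn group_thousands(n: u64, sep: char) -> String {
    let digits = n.to_string();
    let mut out = String::with_capacity(digits.len() + digits.len() / 3);
    for (i, ch) in digits.chars().enumerate() {
        // A separator goes before this digit when the number of digits
        // still to be written (including this one) is a multiple of three.
        if i > 0 && (digits.len() - i) % 3 == 0 {
            out.push(sep);
        }
        out.push(ch);
    }
    out
}

/// Hypothetical size modes mirroring the proposal in this thread.
enum SizeMode {
    Bytes,
    BytesWithSeparators,
}

/// Render a byte count; only the opt-in mode adds separators.
fn render_size(bytes: u64, mode: &SizeMode) -> String {
    match mode {
        SizeMode::Bytes => bytes.to_string(),
        SizeMode::BytesWithSeparators => group_thousands(bytes, ','),
    }
}
```

For example, `render_size(1234567890, &SizeMode::BytesWithSeparators)` returns `"1,234,567,890"`, while `SizeMode::Bytes` keeps returning `"1234567890"`, so existing scripts would see no change.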

@meain
Copy link
Member

meain commented Jun 29, 2021

Just wondering what a good option name would be? 🤔 bytes_with_thousands_separator is a bit too long. Or maybe even a separate option like --num-separators, which someone can set to on, off, or auto, where auto disables the separators if we detect a pipe?
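
A rough sketch of the on/off/auto idea, assuming "detect a pipe" means "stdout is not a terminal". The `NumSeparators` type and the function names are hypothetical. `std::io::IsTerminal` is part of the standard library (stable since Rust 1.70).

```rust
use std::io::IsTerminal;
use std::str::FromStr;

/// Hypothetical values for a `--num-separators` option.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum NumSeparators {
    On,
    Off,
    Auto,
}

impl FromStr for NumSeparators {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "on" => Ok(Self::On),
            "off" => Ok(Self::Off),
            "auto" => Ok(Self::Auto),
            other => Err(format!("invalid value for --num-separators: {other}")),
        }
    }
}

/// Pure decision logic, kept separate so it can be tested without a TTY.
fn should_group(mode: NumSeparators, stdout_is_terminal: bool) -> bool {
    match mode {
        NumSeparators::On => true,
        NumSeparators::Off => false,
        // In auto mode, piped or redirected output stays machine-friendly.
        NumSeparators::Auto => stdout_is_terminal,
    }
}

/// Convenience wrapper that checks the real stdout.
fn should_group_for_stdout(mode: NumSeparators) -> bool {
    should_group(mode, std::io::stdout().is_terminal())
}
```

Keeping the terminal check out of `should_group` is a design choice for this sketch only: it makes the auto behaviour easy to unit-test.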

@peter-joo
Author

It is perfectly up to you and up to the project owners, other contributors, etc. how to do it.

Even --size=fancy_bytes would work for me :)

@zwpaper
Member

zwpaper commented Jun 30, 2021

I would vote for a separate flag, --num-separators, as we could apply the separator to B, MB, and GB values, and even the UNIX timestamp might be a candidate.

@meain
Member

meain commented Jul 1, 2021

Not sure if it will be useful for MB/GB etc., as the value rolls over to the next unit at around a thousand. As for the UNIX timestamp, I don't think a comma in a timestamp looks natural. Nobody really reads a timestamp.
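
The point about units can be seen in a small sketch. It assumes 1024-based steps and the unit labels shown, which may not match lsd's actual formatting. The function name `scale_to_unit` is hypothetical.

```rust
/// Scale a byte count into the largest unit that keeps the value below 1024
/// (assumption: binary, 1024-based steps).
fn scale_to_unit(bytes: u64) -> (f64, &'static str) {
    const UNITS: [&str; 5] = ["B", "KB", "MB", "GB", "TB"];
    let mut value = bytes as f64;
    let mut idx = 0;
    // Move to the next unit as soon as the value reaches 1024,
    // stopping at the last unit in the table.
    while value >= 1024.0 && idx < UNITS.len() - 1 {
        value /= 1024.0;
        idx += 1;
    }
    (value, UNITS[idx])
}
```

Below the last unit in the table, the scaled value never reaches 1024, so there are at most four integer digits and a separator would rarely appear.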

@zwpaper
Member

zwpaper commented Jul 1, 2021

Oh, my bad, I did not notice that there is no MB or GB option for size.

Also, it feels a little awkward to be the only one who reads timestamps 😅.

But as the --num-separators option would only affect the byte size, an option for --size might be reasonable.

@merkrafter

Localization might have to be considered here as well, as some countries use dots for separating thousands. Not sure if that's a real problem though.

zwpaper added the Hacktoberfest and kind/enhancement (Enhancement on current feature) labels Oct 7, 2021
@areq212
Contributor

areq212 commented Oct 15, 2021

Hi, I was thinking about this issue and I have two questions:

  1. System-specific localization - the num_format library could provide us with system-specific formatting. Unfortunately, on Windows it requires Clang. Is that a problem? Could the Windows build be adjusted to deal with that?
  2. Flags discussion - personally I lean towards adding an option to the --size flag, named bytes_with_separators. Are there any objections?

@meain
Member

meain commented Oct 15, 2021

The solution you bring up actually sounds pretty good. Also, the word "thousands" does not make sense anyway. I forgot that in my country we actually separate by hundreds after the first set 😂. bytes-with-separator seems to be a good flag name.

That said, I am not a big fan of adding Clang as a dependency, and just for Windows at that. None of the maintainers, as far as I know, use Windows, and adding more brittleness to that platform is probably going to make things worse.
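
The locale points raised in this thread (a dot instead of a comma, and grouping "by hundreds after the first set") can be sketched without any external dependency. The `Grouping` struct and `group_digits` function are hypothetical and use only the standard library. This is not how num_format or any system locale API works internally.

```rust
/// Hypothetical description of a digit-grouping convention.
struct Grouping {
    separator: char,
    /// Size of the rightmost group (3 in the conventions discussed here).
    first: usize,
    /// Size of every group to the left of the first one.
    rest: usize,
}

/// Split the decimal digits of `n` into groups from the right and join them.
fn group_digits(n: u64, g: &Grouping) -> String {
    let digits: Vec<char> = n.to_string().chars().collect();
    let mut groups: Vec<String> = Vec::new();
    let mut end = digits.len();
    let mut size = g.first.max(1); // guard against a zero group size
    while end > 0 {
        let start = end.saturating_sub(size);
        groups.push(digits[start..end].iter().collect());
        end = start;
        size = g.rest.max(1);
    }
    // Groups were collected right to left, so restore reading order.
    groups.reverse();
    let sep = g.separator.to_string();
    groups.join(sep.as_str())
}
```

With `Grouping { separator: ',', first: 3, rest: 2 }`, the value 1234567890 is rendered as `1,23,45,67,890`. With `Grouping { separator: '.', first: 3, rest: 3 }` it becomes `1.234.567.890`.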

areq212 pushed a commit to areq212/lsd that referenced this issue Oct 15, 2021
…ery useful | Added support for thousand separated bytes
areq212 pushed a commit to areq212/lsd that referenced this issue Nov 4, 2021
…ery useful | Use system setting to determine formatting - only on unix
areq212 pushed a commit to areq212/lsd that referenced this issue Nov 4, 2021
…ery useful | Potential fix for unix musl
@fgimian

fgimian commented Jan 9, 2025

I've submitted a fresh PR adding this functionality in #1112 😄


6 participants