Symbolic links (git-annex files) are ignored #627
Comments
Interesting bug! Thank you for reporting this.
I ran into this issue as well using datalad. Fixing it would be greatly appreciated! In the context of a git repository, my proposal for a fix would be to differentiate between two types of symlinks:
I am not sure how this would fit in with other VCSs, though. Maybe, if there is no VCS, we should consider symlinks to files in the current directory and its subdirectories to be of type 1, and symlinks pointing to something "external", i.e. outside of the current directory, to be of type 2. This assumes that the current directory can be considered a "project root". I could take a stab at implementing this, but I would like some feedback on this proposal first.
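To make the no-VCS fallback concrete, here is a minimal sketch of how the two types could be told apart. The function name `symlink_type` and its signature are hypothetical and not part of reuse; the sketch only compares resolved paths and assumes the given directory is the project root.

```python
import os
from pathlib import Path


def symlink_type(link: Path, root: Path) -> int:
    """Return 1 if `link` points inside `root`, 2 if it points outside.

    Hypothetical helper for illustration; the target need not exist.
    """
    # realpath() resolves symlinks without requiring the target to exist,
    # so broken links can still be classified by where they point.
    resolved_root = Path(os.path.realpath(root))
    target = Path(os.path.realpath(link))
    try:
        target.relative_to(resolved_root)
    except ValueError:
        return 2  # "external": outside of the assumed project root
    return 1  # inside the current directory or one of its subdirectories
```

Resolving both the root and the link keeps the comparison stable when the project root itself is reached through a symlink.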
I'm not familiar enough with git-annex to have a good input on this. I think your proposal makes sense, even barring Git specifics, so:
But I'm a little uncertain. 'Outside the project' sounds a little out-of-scope for REUSE. I guess the question is: is data tracked by git-annex part of the project? If yes, let's lint the data. If no, let's not. If 'it depends', then let's pick one behaviour as default and add a flag to toggle the behaviour. I err on the side of 'ignore all symlinks' as default behaviour, because it's heaps easier to document, and abides by the principle of least astonishment.
In a typical git-annex project you would have a number of "annex'ed" files. These files are simply symlinks tracked by git, which point to somewhere in
Since the symlinks are simply placeholders for the actual data, the data should definitely be considered part of the project. But because the symlinks might be "broken", I don't think we should
In the terminology of the REUSE spec, I think we should consider a symlink to another file under the project root (which is not ignored by the VCS) to be the same "Covered File" as its target. Therefore we can ignore this symlink, since its target will be lint'ed. This would keep the behaviour expected in #202.
But if the symlink points outside of the project (e.g. into .git, to an ignored file, or really outside the project root), we should consider the symlink itself to be a "Covered File". In that case we have to provide a *.license file next to the symlink or specify license information in .reuse/dep5.
Not resolving symlinks has two advantages:
From a high-level point of view, the linked-to files aren't really outside of the project. In the case of git-annex they are simply distributed in a more manageable/efficient way, but are still part of the repository. A symlink might also point to a shared location (maybe a network drive), and linking instead of copying is simply a storage optimization.
I don't think we should add a flag for this. Making the result of
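As an illustration of the "Covered File" rule proposed in this comment, the following rough sketch decides whether a symlink should carry its own licensing information and whether a sidecar file is present. Both function names are hypothetical and not part of reuse. The sketch assumes the sidecar is named after the symlink with ".license" appended, and it deliberately leaves out the VCS-ignore check and the .reuse/dep5 lookup.

```python
import os
from pathlib import Path


def is_covered_symlink(link: Path, root: Path) -> bool:
    """True if the symlink itself should be treated as a Covered File.

    Hypothetical helper; VCS-ignored targets are NOT checked here.
    """
    resolved_root = Path(os.path.realpath(root))
    target = Path(os.path.realpath(link))  # works for broken links too
    try:
        relative = target.relative_to(resolved_root)
    except ValueError:
        return True  # target is really outside the project root
    # Targets inside .git (e.g. annexed content) are never linted
    # themselves, so the symlink has to carry the information.
    return relative.parts[:1] == (".git",)


def has_license_sidecar(link: Path) -> bool:
    """Check for a `<name>.license` file next to the symlink (assumed naming)."""
    return link.with_name(link.name + ".license").is_file()
```

Only the link's path is resolved; the content behind it is never read, so a broken annex link can still be checked.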
git-annex is a tool to manage big files in Git repositories. In science it is used by the Datalad community to manage datasets. Git-annex works by managing symbolic links in the Git work tree which point to the actual file content in .git/annex/objects. Now, reuse seems to ignore such symbolic links.
Steps to reproduce
The first issue is that reuse lint doesn’t complain about annexed files not having a license.
This becomes worse if I do assign a license, but then reuse lint fails because of an “unused” license. This leads to failed CI pipelines. So to continue the above:
Reuse version: 1.0.0
git-annex version: 10.20221103
Desired behavior
reuse follows symlinks, at least if they are annexed files, which means that the symlink points to something in .git/annex/objects.
I see that issue #202 discussed the topic of symlinks. I suggest revisiting the issue for git-annex.
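For illustration, one way to recognise an annexed file along these lines is to read the link target without following it and look for the .git/annex/objects path components. This is only a heuristic sketch; `is_annexed` is a hypothetical name, not an existing reuse or git-annex function.

```python
import os
from pathlib import Path

ANNEX_OBJECTS = (".git", "annex", "objects")


def is_annexed(path: Path) -> bool:
    """Heuristic: a symlink whose target path contains .git/annex/objects."""
    if not path.is_symlink():
        return False
    # readlink() returns the stored target, so broken links work as well.
    parts = Path(os.readlink(path)).parts
    n = len(ANNEX_OBJECTS)
    return any(
        parts[i:i + n] == ANNEX_OBJECTS
        for i in range(len(parts) - n + 1)
    )
```

Because the check looks at path components anywhere in the target, it also matches links in subdirectories whose targets start with one or more `..` segments.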