Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cygwin directories not being cleaned #1880

Closed
patacongo opened this issue Sep 23, 2020 · 20 comments · Fixed by #1901
Closed

Cygwin directories not being cleaned #1880

patacongo opened this issue Sep 23, 2020 · 20 comments · Fixed by #1901
Assignees
Labels
blocker Release Blocker Type: Bug Something isn't working Type: Regression

Comments

@patacongo
Copy link
Contributor

patacongo commented Sep 23, 2020

Under Cygwin, the following directories are not being removed by the distclean target:

arch/<arch>/include/board
arch/<arch>/include/chip

This was noted in apache/nuttx-testing#39

I have several old board and and chip directories left over from unsuccessful distclean operations on different architectures. These two lines in arch/arm/src/Makefile (and other arch Makefiles) are not apparently not working under Cygwin:

    $(Q) $(DIRUNLINK) include/arch/board
    $(Q) $(DIRUNLINK) include/arch/chip

Cygwin supports is own style of symbolic links and these are used EXCEPT for the case where a Windows native toolchain is used such as the popular (https://developer.arm.com/tools-and-software/open-source-software/developer-tools/gnu-toolchain/gnu-rm).

Those are Window native tools and cannot follow non-standard Cygwin symbolic links so in this case, the tools/dirlinks.sh script does a full directory copy and the corresponding unlink script does a directory removal.

Without knowing more, it looks like the DIRUNLINK is failing to remove the Cygwin symbolic link or the copied directory.

@patacongo patacongo added blocker Release Blocker Type: Bug Something isn't working Type: Regression labels Sep 23, 2020
@patacongo
Copy link
Contributor Author

patacongo commented Sep 23, 2020

I don't think this bug has been introduced recently. I have not build the ez80 (under arch/z80) in many months.

I will do this as part of the review for the 10.0.0 release.

@patacongo
Copy link
Contributor Author

I hate the close button being associated with the comment window!!!!!!!!!!

@patacongo
Copy link
Contributor Author

patacongo commented Sep 23, 2020

The stranded directories are NOT the directory copies, but Cygwin symbolic links:

$ ls -ld arch/arm/include/chip
lrwxrwxrwx 1 spuda spuda 83 Aug 30 10:09 arch/arm/include/chip -> /cygdrive/d/Spuda/Documents/projects/nuttx/master/nuttx_fork/arch/arm/include/stm32

When I do

make distclean
tools/configures.sh -c stm32f4discovery:nsh
make context

I get the expected copied directories (because I am using the Arm Embedded toolchain and CONFIG_CYGWIN_WINTOOL=y):

$ ls -ld include/arch
drwxrwxr-x+ 1 spuda spuda 0 Sep 23 08:43 include/arch

$ ls -ld include/arch/board
drwxrwxr-x+ 1 spuda spuda 0 Sep 23 08:43 include/arch/board

$ ls -ld include/arch/chip
ddrwxrwxr-x+ 1 spuda spuda 0 Sep 23 08:43 include/arch/chip

Bit I also get Cygwin Symbolic links:

$ ls -ld arch/arm/include/chip
lrwxrwxrwx 1 spuda spuda 83 Sep 23 08:42 arch/arm/include/chip -> /cygdrive/d/Spuda/Documents/projects/nuttx/master/nuttx_fork/arch/arm/include/stm32

$ ls -ld arch/arm/include/board
lrwxrwxrwx 1 spuda spuda 102 Sep 23 08:42 arch/arm/include/board -> /cygdrive/d/Spuda/Documents/projects/nuttx/master/nuttx_fork/boards/arm/stm32/stm32f4discovery/include

That should not be happening and exonerates the DIRUNLINK calls implicated in the Issue description. These symbolic links are being created erroneously sometime in the context target.

Doing 'make clean_context' does not remove any of those. That is an error. 'make clean_context' should undo everything that was done my 'make context'. Apparently, somewhere along the way that critical part of the clean_context target got moved to distclean, breaking the symmetry of the make targets.

When I do 'make distclean' the unexpected Cygwin symbolic links remain:

$ find . -name chip
./arch/arm/include/chip

$ find . -name board
./arch/arm/include/board
./boards/arm/stm32/common/board

@patacongo patacongo removed the blocker Release Blocker label Sep 23, 2020
@patacongo
Copy link
Contributor Author

patacongo commented Sep 23, 2020

Aha... these bad directories are being created by configure.sh.

$ make distclean (and verify that there are no chip or board directories)
$ tools/configure.sh -c stm32f4discovery:nsh
$ find . -name board
./arch/arm/include/board
./arch/arm/src/board
./boards/arm/stm32/common/board
$ find . -name chip
./arch/arm/include/chip
./arch/arm/src/chip

@v01d @btashton This is a consequence of several changes to the Makefiles recently:

  • The olddefconfig target used to invoke the clean_context target to remove all residuals from the configuration
  • The symbolic links are (probably) created because CONFIG_CYGWIN_WINTOOL has not been selected yet in the configuration (it was initially a Linux configuration). The .config included by the Makefile is the original defconfig file and does not yet have the derived settings like CONFIG_CYGWIN_WINTOOL.=y. So the Makefile runs without sufficient information.
  • The logic that removes these incorrect links was removed from its correct location in the clean_context target to the distclean target.

A possible fix would be to restore the 'make clean_context' and all reconfigurations to eliminate this garbage from running the Makefile with an incomplete configuration. The old, fully working logic in Makefile.unix was:

do_olddefconfig: dirlinks apps_preconfig
	$(Q) APPSDIR=${CONFIG_APPS_DIR} kconfig-conf --olddefconfig Kconfig

olddefconfig: do_olddefconfig clean_context

Where clean_context removed the bad settings left by the 'make olddefconfig' setup. Restoring this code would not fully work, however. You would also have to restore the correct behavior or clean_context since the key clean up was more to the distclean target.

These are the commits that broke the Cygwin build: b5dc837 and c709ed2 Parts of those changes need to be reverted.

I will never understand why people intentionally break full working, well tested code with no real justification.

@xiaoxiang781216 Can you discuss this with @liuguo09 and @anchao? It is their commits together that break the Cygwin build. I am hesitant to restore this necessary logic sense I cannot know their intent with these changes.

@patacongo
Copy link
Contributor Author

@btashton @hartmannathan Should this be marked a blocker? I had it tagged that way, but decided not.

@hartmannathan
Copy link
Contributor

hartmannathan commented Sep 23, 2020 via email

@btashton
Copy link
Contributor

My inclination is for it to be a blocker because I cannot enable windows CI without it and I would like to see that pass prior to the releases.

@hartmannathan
Copy link
Contributor

My inclination is for it to be a blocker because I cannot enable windows CI without it and I would like to see that pass prior to the releases.

Okay in that case, make it a blocker. With the Windows CI working, we will find out about breakage much more quickly. That will be a good thing.

@patacongo
Copy link
Contributor Author

patacongo commented Sep 23, 2020

I don't think it needs to be a blocker, but maybe Good First Issue?

Way to complex and risky for a first issue. I don't even know how to solve it without coordination with Xiaomi guys. @xiaoxiang781216 I think we need to wait and at least get a recommendation (and preferably a draft PR). If you create the draft PR, I will check it out.

@patacongo patacongo added the blocker Release Blocker label Sep 23, 2020
@patacongo
Copy link
Contributor Author

patacongo commented Sep 23, 2020

There could also be other issues in the created context. Some configurations do much more than just setup links. They may, for example, auto-generate files based on the current configuration.

When a defconfig is included by the Makefile as the .config, it is an incomplete and invalid configuration. This could potentially cause other more serious problems than just some extra bogus soft links. It is just not a good behavior.

That is why clean_context was invoked before. To eliminate any potentially bad context created during 'make olddefconfig' or 'make memconfig'.

@patacongo
Copy link
Contributor Author

One solution that I would NOT recommend is modify the distclean target to clean more symbolic links. That is a bandaid and not a really solution to the generic problem that a defconfig is not a valid .config file and can cause problems when used as one in the Makefile.

@xiaoxiang781216
Copy link
Contributor

@patacongo we are looking this problem.

@patacongo
Copy link
Contributor Author

patacongo commented Sep 24, 2020

There are multiple causes of the Cygwin CI build breakage. This commit is also a factor: bd65688 Although the other cited commits leave the system with a bad context setup, that would not have been noticed prior to this commit which changes the WINTOOL variable to the CONFIG_CYGWIN_WINTOOL Kconfig setting. Previously WINTOOL would have been defined; bue CONFIG_CYGWIN_WINTOOL will not be defined (since it is the default and will not be in the defconfig).

That is three commits that break the Cygwin CI build. But b5dc837 is the real culprit because it allows bad context setups into the user build. That could introduce wide-reaching, difficult to analyze problems.

@patacongo
Copy link
Contributor Author

patacongo commented Sep 24, 2020

But b5dc837 is the real culprit because it allows bad context setups into the user build. That could introduce wide-reaching, difficult to analyze problems.

Actually I think I am mistaken about this: that commit does not do the full context. It only does dirlinks as far as I can tell. If that is the case, then things are not so bad. Perhaps we just need to separate out a clean_links target to get rid of those links, just as clean_context was called previously.

@liuguo09
Copy link
Contributor

@patacongo @btashton @xiaoxiang781216 It seems cygwin build using copydir.sh as DIRLINK in tools/Config.mk. In cygwin case, there are two copies: include/arch/chip and arch/arm/include/chip. And it finally only clean include/arch/chip. So could we switch to use link.sh as DIRLINK for cygwin build to keep one copy firstly? I have tested using link.sh locally and distclean verify ok. As to copydir.sh case, it may need separate out a clean_links target as @patacongo suggests.

@patacongo
Copy link
Contributor Author

@patacongo @btashton @xiaoxiang781216 It seems cygwin build using copydir.sh as DIRLINK in tools/Config.mk. In cygwin case, there are two copies: include/arch/chip and arch/arm/include/chip. And it finally only clean include/arch/chip. So could we switch to use link.sh as DIRLINK for cygwin build to keep one copy firstly? I have tested using link.sh locally and distclean verify ok. As to copydir.sh case, it may need separate out a clean_links target as @patacongo suggests.

The Cygwin build does NOT use copydir.sh unless CONFIG_CYGWIN_WINTOOL is selected. That is the problem.

  • Then Make runs during configuration, it includes .config. But that is not complete configuration file. It is a defconfig file.
  • CONFIG_CYGWIN_WINTOOL is the default setting for the selected toolchain so it is NOT selected in the .config file
  • As a result, the dirlinks target uses symbolic links to create the links (WRONG)
  • After configuring, CONFIG_CYGWIN_WINTOOL will be be defined in the expanded, complete .config file
  • The next time Make runs, copydir.sh re-creates all of links with directory copy copies EXCEPT for the two links at arch/arm/include/chip and arch/arm/include/board. Those are NOT replaced because the directory is copied into include/arch/board and include/arch/chip (the symbolic links are probably copied into include/arch as well (ALSO WRONG)
  • When 'make distclean' removes the links, the bad links at include/arch/board and include/arch/chip are not removed because the link target is include/arch/chip and include/arch/board. Hence, those two symbolic links are stranded and the CI build fails.

@patacongo
Copy link
Contributor Author

patacongo commented Sep 24, 2020

This is a consequence of:

  • Replaceingthe WINTOOL definition with the CONFIG_CYGWIN_WINTOOL configuration. WINTOOL was always defined previously.
  • Removing the olddefconfig dependency on clean_context which removed the context (that removed too much, it is only necessary to remove the bad links)
  • And, related, removing the logic that removes the links from the clean_context target.

@patacongo
Copy link
Contributor Author

The MYS2 build is similar in that it must copy directories since MSYS2 does not support any kind of symbolic linking. However, it does not depend on CONFIG_CYGWIN_WINTOOL so should not suffer from this issue.

After you decide on the correct change and before releasing 10.0.0, I will verify the MSYS2 build as well.

@liuguo09
Copy link
Contributor

Thanks for your explanation, I'm working on it and make PR ASAP once fixed.

@liuguo09
Copy link
Contributor

@patacongo @btashton I have made a draft PR #1901 Could you help to review and verify it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blocker Release Blocker Type: Bug Something isn't working Type: Regression
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants