chunked file finder #8966

Closed
staabm opened this issue Jan 8, 2025 · 7 comments

@staabm
Contributor

staabm commented Jan 8, 2025

Feature Request

When changing rules in big codebases, I can see in Blackfire profiles that a huge portion of the time Rector takes is spent in the file finder (not the actual code modifications).

I wonder whether we could chunk the file finding and, e.g., start worker processes for the first chunk of files while we are still scanning the filesystem for more files.

see [image: grafik]

e.g. we could try to turn the FileCacheStorage->load() method into a generator that yields results in chunks (or file by file?) instead of returning everything at once
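
The general idea of starting work on the first chunk while discovery is still running can be illustrated at the shell level. This is only a sketch under stated assumptions, not how Rector finds or processes files: `find_in_chunks` is a hypothetical helper, the inline worker is a placeholder that just reports its chunk, and `-print0`, `-0` and `-P` are GNU/BSD extensions rather than POSIX.

```shell
#!/bin/sh
# find_in_chunks DIR CHUNK_SIZE JOBS  (hypothetical helper, for illustration only)
# find streams *.php paths into a pipe; xargs launches a worker each time it has
# read CHUNK_SIZE paths, running up to JOBS workers at once. Workers can therefore
# start before the directory walk has finished.
# The inline worker only prints its chunk; it stands in for real per-chunk work.
find_in_chunks() {
    find "$1" -type f -name '*.php' -print0 |
        xargs -0 -n "$2" -P "$3" sh -c 'echo "chunk of $# file(s): $*"' worker
}
```

For example, `find_in_chunks src 50 4` would hand out chunks of 50 files to at most 4 concurrent workers; the directory name and both numbers are arbitrary.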

@staabm staabm added the feature label Jan 8, 2025
@TomasVotruba
Member

Thanks for investigation 👍

Without much thinking, I think the PHPStan resource loader needs all files beforehand, which would be a blocker for this.

How does PHPStan handle file loading?

What is the time spent on these ~700 files?

If you find any way to improve this part, I'm all ears.

@staabm
Contributor Author

staabm commented Jan 8, 2025

What is the time spent on these ~700 files?

On my MacBook (M4 Pro) it takes ~1 minute (87% of the overall time).

I will play a bit with the idea.

@TomasVotruba
Member

That's a really huge number.
We have a project with 5,000 files and Rector starts working in under 5 seconds. I'm on Ubuntu.

@samsonasik
Member

Just guessing: it is probably related to some PHP extension on your macOS interfering with TCP connections, like IMAP, which tries to look up a connection every time a parallel process is created. I wrote a blog post about it:

https://samsonasik.wordpress.com/2023/09/30/handle-slow-php-cli-with-imap-extension-on-macos/
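
One way to check whether a particular extension is loaded is to search the output of `php -m`. The helper below is a minimal sketch; `has_module` is a hypothetical name, and it only inspects a module list passed on stdin, so it says nothing about whether the extension is actually the cause of a slowdown.

```shell
#!/bin/sh
# has_module NAME  (hypothetical helper)
# Succeeds if NAME appears as a whole line on stdin, ignoring case.
# -F treats NAME as a fixed string, -x requires a full-line match.
has_module() {
    grep -qixF -- "$1"
}

# Intended usage (requires the php binary on PATH):
#   php -m | has_module imap && echo "imap is loaded"
```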

@staabm
Contributor Author

staabm commented Jan 9, 2025

Thanks for the input.

my modules:

[PHP Modules]
bcmath
bz2
calendar
Core
ctype
curl
date
dba
dom
exif
FFI
fileinfo
filter
ftp
gd
gettext
gmp
hash
iconv
intl
json
ldap
libxml
mbstring
mysqli
mysqlnd
odbc
openssl
pcntl
pcre
PDO
pdo_dblib
pdo_mysql
PDO_ODBC
pdo_pgsql
pdo_sqlite
pgsql
Phar
posix
pspell
random
readline
Reflection
session
shmop
SimpleXML
soap
sockets
sodium
SPL
sqlite3
standard
sysvmsg
sysvsem
sysvshm
tidy
tokenizer
xml
xmlreader
xmlwriter
xsl
Zend OPcache
zip
zlib

[Zend Modules]
Zend OPcache

I have a feeling it's somehow related to antivirus or security software intercepting filesystem access.
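
A rough way to test that hypothesis is to time a plain directory walk outside of Rector: if merely listing the PHP files is already slow, something at the filesystem level is a plausible suspect. This is a sketch only; `count_php_files` is a hypothetical helper and `src` in the usage comment is an assumed directory name.

```shell
#!/bin/sh
# count_php_files DIR  (hypothetical helper)
# Walks DIR recursively and prints the number of regular files ending in .php.
# Paths containing newlines would be miscounted; that is ignored here.
count_php_files() {
    find "$1" -type f -name '*.php' | wc -l | tr -d ' '
}

# Intended usage in bash or zsh, where `time` can wrap a shell function:
#   time count_php_files src
```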

@samsonasik
Member

It is possibly down to PHPStan container creation on the first execution. Try running Rector 2 times:

# the first run may spend time saving the PHPStan container
time vendor/bin/rector process --clear-cache

# the second run uses the already existing PHPStan container
time vendor/bin/rector process --clear-cache

and see if there is a time difference.

On the first execution, the system may try to verify the integrity or permissions of the files through some software/antivirus.

@samsonasik
Member

samsonasik commented Jan 10, 2025

I am closing this. In parallel mode, each worker only processes the chunk of files passed to it; the files are already collected up front and split into per-job chunks, so the file verification only happens in the very first process.

I am using an M1 Mac mini with only 8 GB RAM, and it runs immediately for many files.

So if it takes long at startup, it is probably related to some extension or software that verifies files on your Mac.
