Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid security issues of subprocess shell #6498

Merged
merged 10 commits into from
Sep 11, 2024
Merged

Conversation

tjruwase
Copy link
Contributor

@tjruwase tjruwase commented Sep 5, 2024

Avoid security issues of shell=True in subprocess

hostname_cmd = ["hostname -I"]
result = subprocess.check_output(hostname_cmd, shell=True)
import shlex
hostname_cmd = shlex.split("hostname -I")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may not need shlex.split when the command has no placeholder for injection. The same goes for some other fixes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For hostname -I, using socket.gethostname and socket.gethostbyname_ex can be safer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tohtana, thanks for the feedback. I will remove the shlex.split() here. However, I am not getting the same output if I use socket.gethostbyname_ex(socket.gethostname())

Copy link
Contributor Author

@tjruwase tjruwase Sep 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think the shlex.split() is still need to pass args as a list since we are removing shell=True. Alternatively, I could manually construct the list. However, I think I will keep the shlex.split() to future-proof for arg changes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this code? This is a general proposal to make the code robust against malicious modifications of system commands. But I don't think this is crucial because it won't be a typical attack that can harm our users. We can just keep hostname -I if this doesn't work.

>>> import socket
>>> hostname = socket.gethostname()
>>> ip_addresses = socket.gethostbyname_ex(hostname)[2]
>>> ip_addresses
['172.17.0.2']

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that as well, it looks like we would need to modify the command to be:

>>> socket.gethostbyname_ex(socket.gethostname()+".local")[2][0]

Thought it is slower to run on my system with that change.

I wonder if hosts or DNS in your system (Windows?) has [HOSTNAME].local but it doesn't work on my env.

Copy link
Contributor Author

@tjruwase tjruwase Sep 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

>>> socket.gethostbyname_ex(socket.gethostname()+".local")[2][0]

This works and gives correct/expected results on my Linux lambda box.

@tohtana, do you mean this does not work for you?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't work on my wsl.

$ python -c 'import socket; socket.gethostbyname_ex(socket.gethostname()+".local")[2][0]'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
socket.gaierror: [Errno -5] No address associated with hostname

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, I'd say lets leave it as hostname -I for now, and we can make another PR to update where we can more strenuously test Windows and other OSs?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree to address this in another PR. This one is urgent and focuses on security.
But I'm not sure it is a good idea to get the IP from the first entry from hostname -I. It is not simple to control it even for the administrator. It is easier to tell users to configure /etc/hosts properly.
After a quick look on the usage, probably it can also be a hostname, not an IP.

@tjruwase
Copy link
Contributor Author

tjruwase commented Sep 5, 2024

@loadams, do you have any idea how to repro these py UT failures?
https://github.com/microsoft/DeepSpeed/actions/runs/10723587021/job/29737199498?pr=6498

@loadams
Copy link
Collaborator

loadams commented Sep 5, 2024

@loadams, do you have any idea how to repro these py UT failures? https://github.com/microsoft/DeepSpeed/actions/runs/10723587021/job/29737199498?pr=6498

Yes, @tjruwase - I believe this should just be taking an environment with torch installed and cloning your branch and running pip install .

Though these look to be caused by the changes in setup.py not being able to find git to get the commit hash to build the wheel with name 0.15.1+commithash.

@tjruwase
Copy link
Contributor Author

tjruwase commented Sep 5, 2024

Yes, @tjruwase - I believe this should just be taking an environment with torch installed and cloning your branch and running pip install .

Unfortunately, I am unable to repro this failure following those steps.

Though these look to be caused by the changes in setup.py not being able to find git to get the commit hash to build the wheel with name 0.15.1+commithash.

Yes, I think that is the problem. This work with shell=True, probably due to some env or PATH settings.

@tjruwase
Copy link
Contributor Author

tjruwase commented Sep 5, 2024

@loadams, do you have any idea how to repro these py UT failures? https://github.com/microsoft/DeepSpeed/actions/runs/10723587021/job/29737199498?pr=6498

Seems to be fixed by prepending with bash -c. Not sure why.

@loadams
Copy link
Collaborator

loadams commented Sep 5, 2024

Yes, @tjruwase - I believe this should just be taking an environment with torch installed and cloning your branch and running pip install .

Unfortunately, I am unable to repro this failure following those steps.

Though these look to be caused by the changes in setup.py not being able to find git to get the commit hash to build the wheel with name 0.15.1+commithash.

Yes, I think that is the problem. This work with shell=True, probably due to some env or PATH settings.

This does run in a docker container and you can run using the same container if needed to test as well, but looks like it is largely resolved as well

@tohtana tohtana requested review from tohtana and removed request for tohtana September 6, 2024 21:55
@tjruwase tjruwase added this pull request to the merge queue Sep 11, 2024
Merged via the queue into master with commit 659f6be Sep 11, 2024
12 checks passed
@jagadish-amd
Copy link
Contributor

We are hitting few issues on ROCm environment (Ubuntu ROCm Pytorch container) due to this PR.
1.
[rank7]: raise child_exception_type(errno_num, err_msg, err_filename)
[rank7]: FileNotFoundError: [Errno 2] No such file or directory: "/opt/rocm/bin/rocminfo | grep -Eo -m1 'Wavefront Size:[[:space:]]+[0-9]+' | grep -Eo '[0-9]+'"
[rank0]: Traceback (most recent call last):

Building extension module transformer_inference...
Using envvar MAX_JOBS (32) as the number of workers...
Using /root/.cache/torch_extensions/py39_cpu as PyTorch extensions root...
Using /root/.cache/torch_extensions/py39_cpu as PyTorch extensions root...
Using /root/.cache/torch_extensions/py39_cpu as PyTorch extensions root...
ninja: error: build.ninja:8: unexpected '='

[rank4]: Traceback (most recent call last):
[rank4]: File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 2107, in _run_ninja_build
[rank4]: subprocess.run(
[rank4]: File "/opt/conda/envs/py_3.9/lib/python3.9/subprocess.py", line 528, in run
[rank4]: raise CalledProcessError(retcode, process.args,
[rank4]: subprocess.CalledProcessError: Command '['ninja', '-v', '-j', '32']' returned non-zero exit status 1.

Please let me know if we need to create new issue to track it.
@loadams @jithunnair-amd @pruthvistony @jeffdaily

@tjruwase
Copy link
Contributor Author

We are hitting few issues on ROCm environment (Ubuntu ROCm Pytorch container) due to this PR. 1. [rank7]: raise child_exception_type(errno_num, err_msg, err_filename) [rank7]: FileNotFoundError: [Errno 2] No such file or directory: "/opt/rocm/bin/rocminfo | grep -Eo -m1 'Wavefront Size:[[:space:]]+[0-9]+' | grep -Eo '[0-9]+'" [rank0]: Traceback (most recent call last):

Building extension module transformer_inference... Using envvar MAX_JOBS (32) as the number of workers... Using /root/.cache/torch_extensions/py39_cpu as PyTorch extensions root... Using /root/.cache/torch_extensions/py39_cpu as PyTorch extensions root... Using /root/.cache/torch_extensions/py39_cpu as PyTorch extensions root... ninja: error: build.ninja:8: unexpected '='

[rank4]: Traceback (most recent call last): [rank4]: File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 2107, in _run_ninja_build [rank4]: subprocess.run( [rank4]: File "/opt/conda/envs/py_3.9/lib/python3.9/subprocess.py", line 528, in run [rank4]: raise CalledProcessError(retcode, process.args, [rank4]: subprocess.CalledProcessError: Command '['ninja', '-v', '-j', '32']' returned non-zero exit status 1.

Please let me know if we need to create new issue to track it. @loadams @jithunnair-amd @pruthvistony @jeffdaily

@jagadish-amd, yes please create a new ticket.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants