Add architecture detection logic to setup.py #13

Closed
honnibal opened this issue Aug 19, 2019 · 7 comments · Fixed by #44

Comments

@honnibal
Member

The setup.py needs a smarter way to pick non-x86_64 architectures automatically, so that folks don't need to use the BLIS_ARCH environment variable as often. This is currently hard for me to test as I don't have access to machines with alternate architectures, so: help wanted!

@Impelon
Contributor

Impelon commented Aug 21, 2019

Here are some suggestions, based on what gives me useful information on my AArch64 machine (so these will only work on Linux and maybe similar systems):

hostnamectl | grep "Architecture:"

This gives me Architecture: arm64 on my system, which is technically correct, but maybe a bit too broad.

lscpu | grep -e "Architecture:" -e "Model name:"

This gives me

Architecture:                    aarch64
Model name:                      Cortex-A53

which is more accurate. This also gives the micro-architecture name used by flame-blis for ARM.
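As a rough sketch of how a setup script could consume this output, the snippet below parses `lscpu`'s "Key: value" lines into a dict. The helper names `parse_lscpu` and `lscpu_info` are made up for illustration and are not part of any existing setup.py; the sample text is the output quoted above.

```python
import shutil
import subprocess


def parse_lscpu(text):
    """Turn `lscpu` output ("Key:   value" lines) into a dict."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that are not of the form "key: value"
            info[key.strip()] = value.strip()
    return info


def lscpu_info():
    """Return parsed lscpu output, or an empty dict if lscpu is unavailable."""
    if shutil.which("lscpu") is None:
        return {}
    try:
        result = subprocess.run(
            ["lscpu"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return {}
    return parse_lscpu(result.stdout)


# Sample taken from the output quoted in this comment.
sample = (
    "Architecture:                    aarch64\n"
    "Model name:                      Cortex-A53\n"
)
parsed = parse_lscpu(sample)
print(parsed["Architecture"], parsed["Model name"])
```

If a key appears more than once in the output, the last value wins in this sketch, so a system that lists several model names would need extra handling.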

One idea would be to rely on these tools, where they are available on Linux, to determine the architecture. Of course, that is not very self-contained anymore.
Let's try something pythonic and system-independent:

platform.machine() from the built-in platform module

This gives me aarch64 on my system, which is correct, but not very specific, again.
(Note: platform.processor() returns an empty string on my system, so does uname -p)
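A minimal sketch of that check using only the standard library. The function name `describe_machine` is hypothetical, and treating an empty `platform.processor()` result as "unknown" is an assumption based on the behaviour described above.

```python
import platform


def describe_machine():
    """Collect the coarse architecture information the platform module offers."""
    machine = platform.machine()      # e.g. "aarch64" or "x86_64"
    processor = platform.processor()  # may be an empty string, as noted above
    return {
        "machine": machine.lower(),
        "processor": processor or None,  # treat "" as unknown
    }


info = describe_machine()
is_aarch64 = info["machine"] == "aarch64"
print(info, is_aarch64)
```

This is enough to tell aarch64 from x86_64, but not enough to tell one ARM core from another.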

The py-cpuinfo module

This module actually gathers information from the Linux utilities I mentioned above and many other sources (Windows registry, etc.). It showed me the following relevant information:

[...]
Vendor ID: ARM
Hardware Raw: 
Brand: Cortex-A53
[...]
Arch: ARM_8
[...]
Raw Arch String: aarch64   # note from myself, this uses platform.machine() internally
[...]
Flags: asimd, cpuid, crc32, evtstrm, fp

The most interesting values are, in my opinion, Arch and Brand (also Raw Arch String, but as mentioned, that is the same as platform.machine()). Looking into the module shows that:

  • Brand is obtained from the CPU with the help of the aforementioned system-dependent tools. It just happens that my Broadcom BCM2837 CPU declares itself as Cortex-A53. It may be the same for many ARM processors, but I don't know. So as far as I can tell, there is no reliable way to get the micro-architecture for ARM, except the declared CPU brand, which may not be standardized. EDIT: in my case the Brand is actually just the Model name from lscpu.
  • Arch is obtained from Raw Arch String by pattern-matching against known architectures. So I do not know whether it is any more useful than Raw Arch String.

I am however not experienced in this field at all, I am just making some suggestions that do deliver good results on my system. I hope I could help by gathering some information.

@honnibal
Member Author

Thanks, that's very helpful!

I think it's reasonable to focus first on Linux support. I don't think non-x86 OSX is even possible currently.

@Impelon
Contributor

Impelon commented Aug 23, 2019

I have actually looked some more into it.
pytorch has its own CPU-detection library, which is also capable of determining the micro-architecture of a system (i.e. Cortex-A53 vs. Cortex-A57, etc.).

Also, after a bit more research, I found that Intel and AMD CPUs dump a hex value identifying their micro-architecture in /proc/cpuinfo as microcode.
ARM systems do something similar. In /proc/cpuinfo one can find CPU part, which identifies the micro-architecture in use as a 3-digit hex value. These values are listed in ARM's official documentation on (CPU) ID registers for each individual micro-architecture. The previous link should vaguely document the Cortex-A57's Main ID Register, which includes the Primary part number, which is exactly the same as the one shown in CPU part. Hence, by matching these hex values one can find out the exact micro-architecture on ARM systems, and I believe this is the method used by lscpu and pytorch/cpuinfo.

I started writing a small script that would detect the micro-architecture on ARM systems with Linux by comparing the CPU part found with known ones for aarch64 systems, as listed in pytorch's library and ARM's CPUID register documentation (for example, the part number is 0xD03 for cortex-a53 and 0xD07 for cortex-a57). But then I discovered that different vendors (Qualcomm, etc.) may use other part numbers for custom, ARM-derived architectures, and I gave up. I do not know how common those derived architectures are.

import platform

# On aarch64 we can check "CPU part" in /proc/cpuinfo to determine the micro-arch.
if platform.machine() == "aarch64":
    with open("/proc/cpuinfo", "r") as f:
        for line in f:
            if not line.strip():  # stop after the first core's block; ignore further (logical and physical) cores
                break
            if "cpu part" in line.lower():
                parsed = line.rpartition("0x")
                if parsed[1]:
                    # found hex-value; strip the trailing newline before printing
                    print("Micro-Arch:", parsed[2].strip())
                    # prints `Micro-Arch: d03` on my aarch64 system
                    # now what? find the corresponding micro-arch! somehow...
                    #microarch_from_id(parsed[2].strip())

This would be a crude but simple, self-contained way to detect aarch64 micro-architectures on Linux. But this would have to be maintained, and I could not find a good source for matching micro-architectures to identifiers, except maybe the file I already linked in pytorch's library.
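One possible shape for the missing `microarch_from_id` step is a plain lookup table. The sketch below is an assumption about how that could look, not existing code: the table `KNOWN_CPU_PARTS` and the helper `read_cpu_part` are hypothetical, and the table contains only the two part numbers quoted in this comment, so any other core (including vendor-specific ones) returns `None`.

```python
# Hypothetical lookup table: only the two part numbers mentioned in this thread.
KNOWN_CPU_PARTS = {
    0xD03: "cortexa53",
    0xD07: "cortexa57",
}


def read_cpu_part(cpuinfo_text):
    """Return the first "CPU part" value as an int, or None if absent."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "cpu part":
            try:
                return int(value.strip(), 16)  # accepts the "0x" prefix
            except ValueError:
                return None
    return None


def microarch_from_id(cpu_part):
    """Map a part number to a micro-architecture name, or None if unknown."""
    return KNOWN_CPU_PARTS.get(cpu_part)


# Abbreviated, made-up excerpt in the style of /proc/cpuinfo on aarch64.
sample = "processor\t: 0\nCPU implementer\t: 0x41\nCPU part\t: 0xd03\n"
print(microarch_from_id(read_cpu_part(sample)))  # prints: cortexa53
```

Like the script above, this only looks at the first core it finds, so a system with mixed core types would need to collect every "CPU part" entry instead.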

I also quickly searched on flame/blis, since the config_registry mentioned they wanted to add support for this, and I found this, which also uses /proc/cpuinfo and probably the method I mentioned above.

All in all, I think it would be best to use a tool such as pytorch/cpuinfo (works on multiple OS) or lscpu (on Linux only) to handle the detection.

@gaiar
Contributor

gaiar commented Aug 27, 2019

My output:

root@scw-cranky-dubinsky:~/developer# lscpu | grep -e "Architecture:" -e "Model name:"
Architecture:        aarch64
Model name:          ThunderX 88XX 

sebpop pushed a commit to sebpop/cython-blis that referenced this issue May 28, 2020
This is a fix for bug explosion#13 : add logic to setup.py to detect aarch64
LITTLE cores for which BLIS has configure flag BLIS_ARCH="cortexa53".  All other
cores will configure with BLIS_ARCH="cortexa57".
honnibal pushed a commit that referenced this issue May 28, 2020
This is a fix for bug #13 : add logic to setup.py to detect aarch64
LITTLE cores for which BLIS has configure flag BLIS_ARCH="cortexa53".  All other
cores will configure with BLIS_ARCH="cortexa57".
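
The rule in that commit message can be sketched roughly as follows. This is not the code from the commit: `choose_blis_arch` and `LITTLE_CORE_PARTS` are hypothetical names, the set holds only the Cortex-A53 part number quoted earlier in this thread, and giving the BLIS_ARCH environment variable priority over detection is an assumption.

```python
import os

# Illustrative only: the Cortex-A53 part number quoted earlier in the thread.
LITTLE_CORE_PARTS = {0xD03}


def choose_blis_arch(machine, cpu_part=None, environ=None):
    """Pick a BLIS_ARCH value; an explicit BLIS_ARCH setting always wins."""
    environ = os.environ if environ is None else environ
    override = environ.get("BLIS_ARCH")
    if override:
        return override
    if machine.lower() == "aarch64":
        # LITTLE cores get cortexa53, every other aarch64 core gets cortexa57.
        return "cortexa53" if cpu_part in LITTLE_CORE_PARTS else "cortexa57"
    return None  # leave other architectures to whatever logic already exists


print(choose_blis_arch("aarch64", 0xD03, {}))  # prints: cortexa53
print(choose_blis_arch("aarch64", 0xD07, {}))  # prints: cortexa57
```

Passing `environ` explicitly keeps the function easy to test without touching the real environment.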
@Impelon
Contributor

Impelon commented Nov 27, 2020

I see #26 has been reverted again. Is there a specific reason why? Maybe because not every aarch64 system has the required binaries.

How about my suggested solution for this using /proc/cpuinfo? That would allow for distinction between all aarch64 CPUs, especially most commonly cortexA53 and cortexA57.
I think that would be a good follow-up for #44

@adrianeboyd
Contributor

I wondered where this had gone, too, and it looks like cortexa53 isn't currently supported for 0.7.x, but #26 is still present in 0.4.x. There might be some reason I'm not aware of, but I think we'd just need to add the relevant build files for 0.7.x to support it again? I don't think the cortexa57 build will break for cortexa53, but I haven't tested it personally and I don't know anything about how large the speed difference is. I presume it's a lot better than generic, but I really don't know for sure?

If you'd like to test it you should be able to run something like this to install from the PR branch:

pip install https://github.com/adrianeboyd/cython-blis/archive/feature/linux-arch-detection.zip

Run the (minimal) tests with:

python -m pytest --pyargs blis

I'd appreciate any testing, especially on platforms I don't have access to!

I would like to release v0.4.2 with python 3.9 wheels, but we need to figure out a few details about what happens for users who are upgrading spacy on platforms that no longer have binary wheels. A lot of existing packages have blis>=0.4.0,<0.5.0, so we might need to call it v0.5.0 and introduce more python version-specific blis requirements, I'm not quite sure yet. (I have spent nearly a month primarily dealing with packaging and need to get back to other tasks at some point, too.)

@Impelon
Contributor

Impelon commented Nov 27, 2020

> I'd appreciate any testing, especially on platforms I don't have access to!

Well, testing blis on the Raspberry Pi 3 (with cortexa53) has been on my to-do list ever since someone reported a successful build in #9.
I never got around to it, and currently I don't have access to my Pi, but I will let you know my results when I do.

Edit: Also just noticed I mistakenly referenced PR 41 instead of 44 above, whoops.
