-
Notifications
You must be signed in to change notification settings - Fork 110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Splitting python bindings and cpp in ttnn library #17957
Conversation
Blackhole Post commit https://github.com/tenstorrent/tt-metal/actions/runs/13397082468 |
33ecac3
to
1989dc2
Compare
${CMAKE_CURRENT_SOURCE_DIR}/reduce_scatter/host/reduce_scatter_worker_builder.cpp | ||
${CMAKE_CURRENT_SOURCE_DIR}/reduce_scatter/host/reduce_scatter_common.cpp | ||
${CMAKE_CURRENT_SOURCE_DIR}/barrier/device/host/barrier_full_worker_grid.cpp | ||
${CMAKE_CURRENT_SOURCE_DIR}/barrier/device/barrier_op.cpp | ||
${CMAKE_CURRENT_SOURCE_DIR}/barrier/barrier.cpp | ||
${CMAKE_CURRENT_SOURCE_DIR}/barrier/barrier_pybind.cpp | ||
CACHE INTERNAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh wow. I'm still working on the CMake files at the Metalium layer. I haven't looked too closely at the TT-NN layer. But this right here is a recipe for sadness and tears.
Here's what's going on with this line: we're setting a cache entry (it'll show up in CMakeCache.txt
in the build tree). Cache entries do cache across CMake runs, so they're particularly useful for expensive values -- like searching the system for where a dependency lives, or results of probing the compiler to determine which flags to use, etc. But they also serve as a means for a dev to override a value. It's fair game (even encouraged) to edit CMakeCache.txt and adjust something without having to modify the upstream code. Like point to your local build of a dependency, or similar.
The consequence of that is that set(... CACHE)
is actually saying "use the cache value; and in case of a cache miss, here's the default value". That means that unless devs regularly blow away their cache, they're going to continue on with their old value. This is really bad when adding a file. It's just as bad when deleting a file. For a list of sources we really really do not want this behaviour.
I suspect the reason this was done was a way to bypass the scoping of variables -- set it globally so the higher scope can see its value.
The quickest band-aid for this is to add FORCE
. Which will always clobber the cache value. But this is still undesired. Slightly better, but still un-good, is to stop putting it in the cache and use PARENT_SCOPE
.
The ideal state (and most idiomatic) is to define sub-libs (object libs) for these sub-groups of files. This has the added benefit of being able to graph the internal dependencies to get a sense of how the larger library is designed. But any time a project is migrated from a global "anything can #include anything" to a more structured dependency tree, there's always fallout from revealing circular dependencies. Not something you want to mix into an unrelated PR.
A middle ground would be to add the sources here via target_sources(...)
instead of hoisting a variable up for another layer to catch and process. The add_subdirectory
must be done AFTER the target is defined. ie: these lines: https://github.com/tenstorrent/tt-metal/blob/main/ttnn/CMakeLists.txt#L680-L683 will need be shifted down somewhere below this line: https://github.com/tenstorrent/tt-metal/blob/main/ttnn/CMakeLists.txt#L819
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes lots of sense, let's do it right, once. So I will refactor this properly as you mentioned (which is absolutely the most scalable way)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Corrected, I moved to the file as other sublibraries were doing the same. We are following the same standard now
ttnn/CMakeLists.txt
Outdated
ADDITIONAL_CLEAN_FILES | ||
"${PROJECT_SOURCE_DIR}/ttnn/ttnn/_ttnn.so;${PROJECT_SOURCE_DIR}/ttnn/ttnn.egg-info" | ||
) | ||
if(BUILD_PYBIND) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CMake consumers won't ever depend on the Pybind target (AFAIK). Because it doesn't export anything for another C++ library to ingest.
So rather than have libA that is either TT-NN C++ OR TT-NN Pybind, and libB be TT-NN C++ when we're doing Pybind, we could instead just have libA always always TT-NN C++, and libB as TT-NN Pybind iff we've build Pybind.
Again, this should be more consistent and reduce complexity in people's heads.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CMake consumers won't ever depend on the Pybind target
I am not sure that’s technically true.
if I am building a framework on top, Andy framework has its bindings, I am happy to rely on ttnn pybinds and will link that lib.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. You're using CMake to build a Python project?
Or you're building another C++ project, also with pybinds, and want to pick up the TT-NN pybinds?
That makes sense. And in that case you'd still want to know For Realz that the target you're linking against is the one containing pybinds, and not the C++-only edition. So the idea of no shifting sands beneath our feet is still sound.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love this split! Excellent change.
The set(... CACHE ..)
is a blocker -- had I seen that earlier I would have fixed that ASAP. We definitely need to fix it now before we start changing those values or many people will be hit in the face by it.
I've only given ttnn/CMakeLists.txt a cursory look since it has so much indirection. If you feel like improving that I can iterate with you to make it easier to digest. Even a small step would be a good step.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a cmake expert and clearly the PR authors know more than me so I am approving to unblock :)
ttnn/CMakeLists.txt
Outdated
set(BUILD_DYNAMIC_LIB ON) | ||
if(WITH_PYTHON_BINDINGS) | ||
#ttnncpp will be static when building python bindings | ||
set(BUILD_DYNAMIC_LIB OFF) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
name BUILD_DYNAMIC_LIB
is a bit confusing. consider something like this or a more explicit name
set(TTNN_CPP_BUILD_TYPE DYNAMIC)
if(WITH_PYTHON_BINDINGS)
set(TTNN_CPP_BUILD_TYPE STATIC)
endif()
DESTINATION ${CMAKE_INSTALL_LIBDIR} | ||
LIBRARY | ||
DESTINATION ${CMAKE_INSTALL_LIBDIR} | ||
COMPONENT dev |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not familiar with COMPONENT
part. what does it mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
afaik, is being used for the installation process, like a tag or a category. With that being said, you caught a problem that I already fixed!
if(ENABLE_COVERAGE AND CMAKE_CXX_COMPILER_ID MATCHES "GNU|Clang") | ||
message(STATUS "Enabling code coverage flags for all ttnn targets") | ||
add_compile_options(--coverage) | ||
add_link_options(--coverage) | ||
endif() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already got it in tt_metal cmake. Can we remove this @blozano-tt ?
b58caba
to
4c30c6f
Compare
tt-train/sources/ttml/CMakeLists.txt
Outdated
set_target_properties( | ||
Metalium::TTNN | ||
Metalium::PYBIND |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this project actually does not rely on pybinds, so its ok for it to stay on the old target
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addressed!
ba6d274
to
cc4ad61
Compare
c449f43
to
1563ec0
Compare
…pp and ttnn, where ttnncpp will work as c++ backend while the ttnn library is the wrapper for python
…ached variables. Now those sublibraries are built in the same way than all the other sublibraries that are part of ttnncpp
…indings, otherwise the backend will be a dynamic library, so the user can choose the best scenario without having another level of indirection in the happy path
…a template instead of a function
Now we will have to libraries, ttnncpp and ttnn, where ttnncpp will work as c++ backend while the tin library is the wrapper for python
Ticket
#16418
Problem description
Our implementation of ttnn was coupled to python, so if anybody wanted to use the api directly from cpp, it could generate some linking issues.
What's changed
With ttnn divided in two, a user could link to the cpp backend without having to link to python, or use the python wrapper, depending on the use case. This gives more flexibility to our users to decide the way they decide to interact with the backend
Checklist