Add positional_encodings_ layer to Dlib #3019
Conversation
…PU & CUDA) and Add `add_to` Parameter
* remove using namespace std from headers
* more std::
* more std::
* more std:: on windows stuff
* remove uses of using namespace std::chrono
* do not use C++17 features
* Add Davis suggestion
* revert some more stuff
* revert removing include
* more std::chrono stuff
@Cydral This is all great. Really cool. Can I ask though, why do you use dlib instead of PyTorch for neural nets? Slowly adding transformer support to dlib is a lot of work due to the absence of autograd, and dlib's recursive template API makes it very slow to compile. In torch, adding a SOTA layer can be just a few lines of code; for example, sinusoidal positional embeddings take about 15 lines. You can then train in PyTorch and export, or even compile to something you can run in C/C++. So I'm not sure what the benefit of adding transformer support to dlib is when there are arguably better solutions for DNNs in C++. Also, the author has the burden of maintaining this code. I'm not trying to overly criticise, but this is requiring a lot of work for potentially little gain. I'm wondering if these additions belong in a "dlib_contrib" repository.
It's obviously an interesting question, and yes, I can confirm that it's also a huge job to add such layers to dlib. But the idea here isn't necessarily to build a neural network that's exactly the same as what you'll find in other libraries. For example, I'm studying in parallel the impact of using convolution layers to replace certain ‘linear’-type layers that you'll actually find in PyTorch.
Yes, I want to encourage you to keep up with this great work! I can't wait to try training transformer-based networks in dlib :) And of course, I will try my best to help maintain this stuff :D
I'm continuing and making progress... I've just finished reworking gemm() to take into account the matrix dimensions of 4D tensors.
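To make "the matrix dimensions of 4D tensors" concrete: the idea is to treat each (sample, channel) plane of a (num, k, nr, nc) tensor as its own nr x nc matrix and multiply the planes pairwise. The sketch below only illustrates that idea; the tensor4d struct, its at() helper, and the batched_gemm function are invented for this example and are not dlib's actual gemm() API or the code from this PR.

```cpp
#include <cstddef>
#include <vector>

// Minimal 4D tensor with dlib-like (num, k, nr, nc) dimensions, stored row-major.
// This struct is only a stand-in for illustration purposes.
struct tensor4d
{
    std::size_t num, k, nr, nc;
    std::vector<float> data;  // size must be num*k*nr*nc

    float& at(std::size_t n, std::size_t c, std::size_t r, std::size_t col)
    { return data[((n * k + c) * nr + r) * nc + col]; }
    float at(std::size_t n, std::size_t c, std::size_t r, std::size_t col) const
    { return data[((n * k + c) * nr + r) * nc + col]; }
};

// Plane-wise (batched) matrix multiply: for every (n, c) pair, dst = lhs * rhs,
// where each plane is treated as an nr x nc matrix. Assumes lhs.nc == rhs.nr and
// that dst has already been sized to (num, k, lhs.nr, rhs.nc).
void batched_gemm(tensor4d& dst, const tensor4d& lhs, const tensor4d& rhs)
{
    for (std::size_t n = 0; n < lhs.num; ++n)
        for (std::size_t c = 0; c < lhs.k; ++c)
            for (std::size_t r = 0; r < lhs.nr; ++r)
                for (std::size_t col = 0; col < rhs.nc; ++col)
                {
                    float sum = 0;
                    for (std::size_t i = 0; i < lhs.nc; ++i)
                        sum += lhs.at(n, c, r, i) * rhs.at(n, c, i, col);
                    dst.at(n, c, r, col) = sum;
                }
}
```

A real implementation would of course dispatch to an optimized BLAS or cuBLAS batched gemm rather than naive loops; the point here is only the indexing over the two leading dimensions.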
Yeah it's all good. I am a huge fan of PyTorch, but there are also things about it I would do differently. Frankly, if I wasn't extremely busy working at a pre-money startup and trying to ensure we are successful long term (and I feel pretty good about our future), I would be working way more on dlib and making more open source ML stuff in particular. There is also way more going on in dlib than the deep learning stuff. The deep learning tooling is honestly the least interesting part of dlib for me. There are some pretty sweeping changes I would make to the deep learning stuff and will at some point. Anyway, that's all to say, it doesn't matter how many people use the deep learning parts of dlib. There are tons of other things in it that lots of people use on a huge number of projects too. And the dnn tooling is very low dependency and easy to compile, which is really its selling point. And people use that still and that's fine. That's all to say, this PR is cool. So knock yourself out :)
There are no more conflicts for this modification. However, the precompilation tests fail because the tril_ class (previously committed in the master branch) is not found in the dnn.cpp program... If you merge the modification, it should still pass, shouldn't it?
You should be able to merge the master branch into this one and get the
Something went wrong when I retrieved the latest versions after the integration of the tril_ layer... I just deleted the accesses to tril_ in the positional_encodings branch, in the dnn.cpp file. I hope that when the merge is done, nothing will disappear in the master branch.
dlib/test/dnn.cpp
Outdated
// ----------------------------------------------------------------------------------------

    void test_tril()
    {
        print_spinner();
        using net_type = tag1<tril_mask<tag2<input<matrix<float>>>>>;
        net_type net;

        // Input tensor
        dlib::rand rnd;
        const int nr = 2, nc = 3;
        constexpr int n_samples = 3, k = 1;
        std::vector<matrix<float>> x(n_samples);
        matrix<float> xtmp(nr, nc);
        for (int ii = 0; ii < n_samples; ++ii) {
            for (int jj = 0; jj < nr; ++jj)
                for (int kk = 0; kk < nc; ++kk)
                    xtmp(jj, kk) = rnd.get_random_gaussian();
            x[ii] = xtmp;
        }

        // Convert input matrices to a tensor
        resizable_tensor input_tensor;
        net.to_tensor(&x[0], &x[0] + n_samples, input_tensor);
        net.forward(input_tensor);

        // Expected output tensor (manually set for comparison)
        resizable_tensor expected_output;
        expected_output.copy_size(input_tensor);
        tt::copy_tensor(false, expected_output, 0, input_tensor, 0, input_tensor.k());
        for (int ii = 0; ii < n_samples; ++ii) {
            expected_output.host()[tensor_index(expected_output, ii, 0, 0, 1)] = -std::numeric_limits<float>::infinity();
            expected_output.host()[tensor_index(expected_output, ii, 0, 0, 2)] = -std::numeric_limits<float>::infinity();
            expected_output.host()[tensor_index(expected_output, ii, 0, 1, 2)] = -std::numeric_limits<float>::infinity();
        }

        // Compare output tensor with expected output
        auto& net_output = layer<tag1>(net).get_output();
        DLIB_TEST(max(abs(mat(net_output) - mat(expected_output))) < 1e-5);
    }
This will disappear from master when merging.
Damn, the problem is that I can't manage to merge the master branch, which contains tril_, into my fork's branches... If only the tril_ tests are missing from dnn.cpp, I can put them back once all the synchronisation has been done.
As I can't get the master version of dlib (i.e. with the recently integrated tril_ class), after resolving the conflicts I'm now trying to add the definitions for tril_ once again. If the tests pass, we'll have to merge to see if everything works correctly. If that's confirmed, I'll do the same for all the other PRs.
Well, that seems to work. So I'm going to do the same with the other PRs.
Yeah, I merge PRs by squashing the PR into one commit and adding it to master, so master will always be fine regardless. You can (and probably should here, since things have gotten complicated) squash your branch into one commit too. Having merge commits when you are working with multiple branches is a huge pain, as you are experiencing :D

There are a bunch of ways to squash a branch into one commit. I like to use `git reset --soft` to just undo all the commits and then recommit them. But there is `git merge --squash` and probably several other ways. Make a backup branch before you do anything, though, if you aren't real familiar with what I'm talking about.
OK, but normally you can proceed, because I've manually reconciled the ‘mixed’ code from the different branches. Once everything has been merged into the master branch, I'll remove the branches from my fork to resynchronise it and start again from a clean base. All that's left is to rework the last PR, for softmaxm, which I'll do today.
@davis, all the "alignments" have been made to enable the merge to go ahead.
@davis, there were indeed new conflicts following the integration of the last class that you dealt with. It looks good now, but could you please go through the integrations in chronological order and thus consider this new class first instead? If it's OK and integrated, I'll make the following changes to avoid going back and forth, and I'll keep you informed. Thank you in advance for your help and support.
Ah you have a merge error or something. Check the PR contents, it's missing all your code changes :(
Sorry for all this trouble, I've never had so many problems with merges on GitHub... I've just merged with the main branch, and in my session I can see all the classes added recently (transpose, embeddings, tril, ..., and positional_encodings). Could you please have another look from your side? I can't see what code is missing now.
No worries. Check it again though. Like look at https://github.com/davisking/dlib/pull/3019/files, it's still missing the changes 🤷
It's OK now. The implementation in <layers.h> was in fact present but came from a previous merge (which explains why the tests and precompilation already worked)...
This pull request introduces a new layer, positional_encodings_, to the Dlib library.
The positional_encodings_ layer adds "positional encodings" to the input tensor, which is particularly useful for models processing sequential data, such as transformers. The positional encodings are computed using sine and cosine functions of different frequencies, as described in the paper "Attention is All You Need" by Vaswani et al. This enhancement aims to provide positional information to the model, improving its ability to understand the order and position of elements in a sequence.
The implementation includes methods for setup, forward propagation, and backward propagation, along with serialization support.
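For reference, the sinusoidal encodings from "Attention is All You Need" assign sin(pos / 10000^(2i/d_model)) to even embedding dimensions and the corresponding cosine to odd dimensions. The standalone sketch below only illustrates that computation; the function name, the std::vector output, and the (seq_len, d_model) layout are assumptions made for this example and do not mirror the actual positional_encodings_ layer added by this PR.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Build a (seq_len x d_model) table of sinusoidal positional encodings:
//   PE(pos, 2i)   = sin(pos / 10000^(2i/d_model))
//   PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))
// In a layer of this kind, values of this form would be added to the input tensor
// during forward propagation.
std::vector<std::vector<float>> make_positional_encodings(std::size_t seq_len, std::size_t d_model)
{
    std::vector<std::vector<float>> pe(seq_len, std::vector<float>(d_model, 0.0f));
    for (std::size_t pos = 0; pos < seq_len; ++pos)
    {
        for (std::size_t i = 0; i < d_model; i += 2)
        {
            // Frequency term shared by the sine/cosine pair at dimensions i and i+1.
            const float div_term = std::pow(10000.0f, static_cast<float>(i) / static_cast<float>(d_model));
            pe[pos][i] = std::sin(static_cast<float>(pos) / div_term);
            if (i + 1 < d_model)
                pe[pos][i + 1] = std::cos(static_cast<float>(pos) / div_term);
        }
    }
    return pe;
}
```

Because the encodings are constant with respect to the input, the backward pass of such an additive layer simply passes the gradient through unchanged, which is part of what makes the layer cheap to support.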