"updating"
rohan-paul committed Jan 11, 2024
1 parent 5b96ef3 commit 207166b
Showing 1 changed file with 10 additions and 0 deletions.
@@ -35,6 +35,10 @@
"\n",
"πŸ‘‰ Rotary Embedding (Rotary Positional Encoding) is a kind of positional encoding, but unlike traditional positional encodings, it tries to encode the relative positions of tokens rather than their absolute positions. This is achieved by applying a rotation operation in the embedding space.\n",
"\n",
"RoPE is unique in that it encodes the absolute position m of tokens through a rotation matrix R_(ΞΈ, m), and also incorporates the explicit relative position dependency in the self-attention formulation. The idea is to embed the position of a token in a sequence by rotating queries and keys, with a different rotation at each position.\n",
"\n",
"The main benefit is, Invariance to Sequence Length: Unlike traditional position embeddings, RoPE does not require a predefined maximum sequence length. It can generate position embeddings on-the-fly for any length of sequences. This makes it much more scalable and adaptable to different tasks.\n",
"\n",
"-------\n",
"\n",
"πŸ“Œ Now looking at the method `apply_rotary_emb()` \n",
@@ -49,10 +53,16 @@
"\n",
"---------\n",
"\n",
"Why `xq` and `xk` are being converted into complex tensors with `view_as_complex` ?\n",
"\n",
"πŸ“Œ Rotary Embeddings involve performing a rotation in the embedding space. This rotation is applied to the queries and keys in the attention mechanism of Transformer models.\n",
"\n",
"πŸ“Œ The rotation in RoPE is fundamentally a complex multiplication. Complex numbers are ideal for representing rotations because multiplying by a complex number corresponds to a rotation in the complex plane.\n",
"\n",
"πŸ“Œ The \"rotation\" in this context is a phase shift in the complex plane for each element in the query and key tensors. This shift varies based on the position in the sequence, thus encoding positional information.\n",
"\n",
"πŸ“Œ The essence of RoPE is that by applying these phase shifts, the dot product (used in the self-attention mechanism) between queries and keys becomes sensitive to their relative positions. It's not merely about measuring \"real angle distance\" but rather about how the phase-shifted dot product correlates with the positional relationships of tokens in the input sequence.\n",
"\n",
"πŸ“Œ By converting `xq` and `xk` into complex tensors, these lines prepare them for such a rotation. The actual rotation is performed later in the code, where these complex tensors (`xq_` and `xk_`) are multiplied by another complex tensor representing the rotation (typically through phase factors, like `freqs_cis` in this code).\n",
"\n",
"---------\n",
