[MVU/VVU] Support for double-pumped DSPs #929

Closed
wants to merge 266 commits into from
266 commits
8616148
[thresholding] pass O_BITS from top module to thresholding.sv core
fionnodonohoe-xlnx Nov 15, 2022
275abad
[thresholding] pass C_BITS from top module to thresholding.sv core
fionnodonohoe-xlnx Nov 15, 2022
8849c02
[thresholding] create & fill in RTL template values using FINN
fionnodonohoe-xlnx Nov 16, 2022
84704ed
[thresholding] add method get_weightstream_width()
fionnodonohoe-xlnx Nov 16, 2022
9aa7ff3
[thresholding] add method get_in/output_width()
fionnodonohoe-xlnx Nov 16, 2022
608b5da
[thresholding] add method body for code_generation_ipi()
fionnodonohoe-xlnx Nov 16, 2022
ca6e7e7
[thresholding] add method get_verilog_top_module_intf_names()
fionnodonohoe-xlnx Nov 16, 2022
7266ee9
[thresholding] retrieve axilite write sequence for runtime weight pro…
fionnodonohoe-xlnx Nov 16, 2022
f88bdbf
[thresholding] add methods for creating weight files for each simulat…
fionnodonohoe-xlnx Nov 16, 2022
560771a
[thresholding] add method generate_params()
fionnodonohoe-xlnx Nov 16, 2022
e763bf8
[thresholding] add method for preparing a Pyverilator object for RTL …
fionnodonohoe-xlnx Nov 16, 2022
84e08f1
[thresholding] add method to run rtlsim on a thresholding binary sear…
fionnodonohoe-xlnx Nov 16, 2022
b0be07a
[thresholding] add stubbed method for ipgen_singlenode_code()
fionnodonohoe-xlnx Nov 16, 2022
30d22f8
[thresholding] update class name to a more consistent naming convention
fionnodonohoe-xlnx Nov 16, 2022
3594edd
[thresholding] add fpgadataflow pytests for thresholding binary searc…
fionnodonohoe-xlnx Nov 17, 2022
0bee70d
[thresholding] add linter fixes
fionnodonohoe-xlnx Nov 17, 2022
0689c6a
[thresholding] add flake8 fixes
fionnodonohoe-xlnx Nov 17, 2022
e9a4a7b
[thresholding] change the pytest markers to omit tests from quicktest
fionnodonohoe-xlnx Nov 17, 2022
41c0b4b
[thresholding] update copyright banners of files I have added/changed
fionnodonohoe-xlnx Nov 25, 2022
71ef39b
Translate byte to parameter word addressing in AXI adapter.
preusser Dec 1, 2022
47a0cf9
Merge branch 'dev' into feature/thresholding_addressing
preusser Dec 1, 2022
bdd100f
Merge branch 'dev' into feature/thresholding
fionnodonohoe-xlnx Dec 13, 2022
d44a66c
[thresholding] remove unused attribute
fionnodonohoe-xlnx Dec 19, 2022
f79b9ec
[thresholding] remove unnecessary HLS bug prevention check
fionnodonohoe-xlnx Dec 19, 2022
7b82de2
[thresholding] align methods with hlscustom class by adding in additi…
fionnodonohoe-xlnx Dec 19, 2022
e2816d3
[thresholding] replace hardcoded tcl commands with node attributes
fionnodonohoe-xlnx Dec 19, 2022
61acc64
Merge branch 'feature/thresholding' into feature/thresholding_addressing
preusser Dec 20, 2022
bda05ae
Fix BIAS parameter specification.
preusser Dec 20, 2022
7388e76
[thresholding] remove unused ram_style attribute
fionnodonohoe-xlnx Dec 20, 2022
be1503a
First changes to custom_op for RTL-based MVAU
mmrahorovic Jan 3, 2023
e965396
[thresholding] skip test for unsupported cppsim configuration and mer…
fionnodonohoe-xlnx Jan 5, 2023
2b8a674
[thresholding] moving find_next_power_of_2() to the util suite
fionnodonohoe-xlnx Jan 5, 2023
45bb19f
[thresholding] remove find_next_power_of_2() from thresholding binary…
fionnodonohoe-xlnx Jan 5, 2023
ca00422
[thresholding] replace math functions with existing functions
fionnodonohoe-xlnx Jan 5, 2023
7f3455f
[thresholding] remove concept of mem_mode for RTL thresholding binary…
fionnodonohoe-xlnx Jan 5, 2023
4bc69f1
[thresholding] add methods needed for convertingToHls transformation
fionnodonohoe-xlnx Jan 5, 2023
3b6a198
[thresholding] add convertingToHls transformation for thresholding bi…
fionnodonohoe-xlnx Jan 5, 2023
b3800cd
[thresholding] add test for convertingToHls transformation for thresh…
fionnodonohoe-xlnx Jan 5, 2023
11464d8
[thresholding] skip tests with unsupported folding factor input
fionnodonohoe-xlnx Jan 5, 2023
e71b1c0
[thresholding] add comments for attributes
fionnodonohoe-xlnx Jan 5, 2023
3be1140
[thresholding] replace min() with signed() function
fionnodonohoe-xlnx Jan 5, 2023
e05effc
[thresholding] fix formatting from pre-commit
fionnodonohoe-xlnx Jan 5, 2023
48c3304
[thresholding] fix more flake8 formatting
fionnodonohoe-xlnx Jan 5, 2023
1e8a36c
[thresholding] remove backslashes for flake8
fionnodonohoe-xlnx Jan 5, 2023
08f1b5f
[thresholding] more flake8 fixes
fionnodonohoe-xlnx Jan 5, 2023
481d773
[thresholding] undo flake8 fixes
fionnodonohoe-xlnx Jan 5, 2023
a51bef4
[thresholding] another flake8 fix
fionnodonohoe-xlnx Jan 5, 2023
2c313ad
[thresholding] remove cppsim test file generation
fionnodonohoe-xlnx Jan 6, 2023
49bdd28
[thresholding] remove unnecessary data generation functions for simul…
fionnodonohoe-xlnx Jan 6, 2023
e663030
[thresholding] remove potentially problematic helper function
fionnodonohoe-xlnx Jan 6, 2023
42dbf23
[thresholding] implement flake8 formatting
fionnodonohoe-xlnx Jan 6, 2023
933d747
[thresholding] remove unused imports
fionnodonohoe-xlnx Jan 6, 2023
5c6dcd9
[thresholding] remove last unused import
fionnodonohoe-xlnx Jan 6, 2023
51acd11
[thresholding] reformat existing import
fionnodonohoe-xlnx Jan 6, 2023
9dd4e67
Merge pull request #715 from Xilinx/feature/thresholding_addressing
auphelia Jan 10, 2023
412de82
Merge branch 'dev' into feature/thresholding
auphelia Jan 18, 2023
b886a5a
[Docs] Add bin search thresholding to docs generation
auphelia Jan 18, 2023
2c3de2a
Corrected address width in Verilog wrapper for thresholding.
preusser Jan 23, 2023
7c9f5d8
[thresholding] remove bug affecting input width in top level wrapper
fionnodonohoe-xlnx Jan 23, 2023
3a0d59d
[thresholding] adjust thresholding binary search tests to use word ad…
fionnodonohoe-xlnx Jan 23, 2023
757e3a1
[thresholding] adjust typo in exception
fionnodonohoe-xlnx Jan 23, 2023
479575b
[thresholding] undo copyright header change - only needed for new files
fionnodonohoe-xlnx Jan 23, 2023
0d99b6c
[thresholding] add docstring for migrated find_next_power_of_2() func…
fionnodonohoe-xlnx Jan 23, 2023
5a77a32
[thresholding] add docstrings for methods not in base class
fionnodonohoe-xlnx Jan 23, 2023
eeed070
[thresholding] remove unused method
fionnodonohoe-xlnx Jan 23, 2023
c270868
[thresholding] remove 'return' at end of function - not needed
fionnodonohoe-xlnx Jan 27, 2023
af22177
[thresholding] remove cppsim exec_mode from test - not exercised
fionnodonohoe-xlnx Jan 27, 2023
fab120b
[thresholding] remove unused attributes
fionnodonohoe-xlnx Jan 27, 2023
5d6c964
[thresholding] adjust i/o port names on thresholding RTL wrapper
fionnodonohoe-xlnx Jan 27, 2023
bdfa6cb
[thresholding] remove duplicated test helper function
fionnodonohoe-xlnx Jan 31, 2023
6809351
[thresholding] assert on finding unsupported memory mode for threshol…
fionnodonohoe-xlnx Jan 31, 2023
4515cf7
[thresholding] precommit fix
fionnodonohoe-xlnx Jan 31, 2023
b51498e
[thresholding] precommit fix 2
fionnodonohoe-xlnx Jan 31, 2023
ff3b201
[thresholding] precommit fix 3
fionnodonohoe-xlnx Jan 31, 2023
e0e263b
Merge branch 'dev' into feature/thresholding
auphelia Jan 31, 2023
fc7e00d
[thresholding] adjust templates so that .sv files are modular and can…
fionnodonohoe-xlnx Mar 23, 2023
f530aba
[thresholding]: remove SIGN template in thresholding RTL and create p…
fionnodonohoe-xlnx Mar 23, 2023
3cd600c
[thresholding]: decouple thresholding core from axi wrapper by removi…
fionnodonohoe-xlnx Mar 23, 2023
54afa63
[thresholding]: patch in PE value to the thresholding AXI module and …
fionnodonohoe-xlnx Mar 28, 2023
29f9e1c
[thresholding]: remove reset that erases the 0th stage threshold value
fionnodonohoe-xlnx Mar 30, 2023
2c4c8e2
[thresholding]: enable PE testing of RTL thresholding binary search node
fionnodonohoe-xlnx Mar 31, 2023
5d07a43
[thresholding]: add comment about why bipolar activations skipped for…
fionnodonohoe-xlnx Mar 31, 2023
33fadc7
Merge branch 'dev' into feature/thresholding
fionnodonohoe-xlnx Mar 31, 2023
fcf579c
fix precommit issues
fionnodonohoe-xlnx Mar 31, 2023
8265985
Merge remote-tracking branch 'upstream/dev' into feature/dsp_packing
mmrahorovic Apr 5, 2023
6c9d1f5
[thresholding] only adjust MSB thresholding addressing bits when chan…
fionnodonohoe-xlnx Apr 5, 2023
b247ffb
[thresholding] update binary search to match qonnx 0.2.0
fionnodonohoe-xlnx Apr 5, 2023
afab9cd
[rtl custom op]: initial implementation of mvu_8sx9
mmrahorovic Apr 6, 2023
a94fc3b
[rtl custom op]: testbench for mvu_8sx9
mmrahorovic Apr 6, 2023
98f9acc
[rtl custom op]: initial implementation of flow control component for…
mmrahorovic Apr 6, 2023
96925a9
[rtl custom op]: implementation of replay buffer for mvu
mmrahorovic Apr 6, 2023
a3d1156
[rtl custom op]: testbench for mvu_8sx9_axi (including axi_wrapper & …
mmrahorovic Apr 6, 2023
2aea664
[rtl custom op]: initial implementation of verilog wrapper for mvu_8s…
mmrahorovic Apr 6, 2023
c92e4e3
Merge remote-tracking branch 'upstream/dev' into feature/dsp_packing
mmrahorovic Apr 6, 2023
8b57849
[rtl mvu]: fix tab indentation
mmrahorovic Apr 11, 2023
5e61f42
[rtl custom op]: fix to indentation
mmrahorovic Apr 12, 2023
cbee193
[rtl custom-op]: minor changes for compiler integration
mmrahorovic Apr 12, 2023
ba5e77b
[rtl custom op]: moved testbenches to separate directory
mmrahorovic Apr 12, 2023
69310b4
[rtl custom op]: fixed output width to ACCU_WIDTH
mmrahorovic Apr 12, 2023
cfcff00
[rtl custom op]: renamed file and added generic to switch between com…
mmrahorovic Apr 12, 2023
72b5196
[rtl custom op]: renamed file and added generic to switch between com…
mmrahorovic Apr 12, 2023
7be5ce4
Defaulting BIAS and SIGNED parameters. Renaming M to K avoiding namin…
preusser Apr 17, 2023
c068bb6
[rtl mvu]: added behavioral model DSP58
mmrahorovic May 8, 2023
18f94e7
[rtl mvu]: extended flow control wrapper with additional compute core…
mmrahorovic May 8, 2023
6d4a0a7
[rtl mvu]: fix to done_len flag when SIMD dimension fully unrolled an…
mmrahorovic May 8, 2023
90c547d
[rtl mvu tb]: updated testbench
mmrahorovic May 8, 2023
0c37f1f
[builder]: added specialize_to_rtl step and changed standalone thresh…
mmrahorovic May 8, 2023
5ccb016
[builder]: added specialize_to_rtl step
mmrahorovic May 8, 2023
f099f4b
[custom op]: added custom op MatrixVectorActivation_rtl
mmrahorovic May 8, 2023
9a3b0fd
[custom op]: added additional attribute to enable conversion to RTL (…
mmrahorovic May 8, 2023
38aa930
[custom op]: modified ip-stitching and code generation
mmrahorovic May 8, 2023
4e44934
[tests]: initial version of unit test for RTL custom op and specializ…
mmrahorovic May 8, 2023
cc361d9
[rtl mvu]: specialized compute core for 4-bit weights and activations…
mmrahorovic May 8, 2023
8eefb53
[rtl mvu]: specialized compute core for > 4-bit weights and activatio…
mmrahorovic May 8, 2023
e7109e7
[fpgadataflow transform]: initial specialize_to_rtl_layers-transform …
mmrahorovic May 8, 2023
d107b4d
Merge remote-tracking branch 'upstream/dev' into feature/dsp_packing
mmrahorovic May 9, 2023
5a868d1
[rtl mvu] fixes for latest memstream + linting
maltanar May 9, 2023
4a9cfa1
[rtl custom_op]: add support for external weights
mmrahorovic May 11, 2023
8a9ac1a
Specify clock and reset associations of bus interfaces.
preusser May 11, 2023
51bbe02
Merge remote-tracking branch 'upstream/dev' into feature/dsp_packing
mmrahorovic May 21, 2023
3d856b7
Merge branch 'dev' into feature/dsp_packing
preusser May 23, 2023
d9b9079
[rtlmvu] More fixes for memstream and param gen
maltanar May 15, 2023
a5f2a83
[Build] apply config to only FIFO nodes in step_set_fifo_depths
maltanar May 11, 2023
08cbdc5
Revised control interface attributes.
preusser May 24, 2023
48f0c5c
Merge branch 'dev' into feature/dsp_packing
preusser May 24, 2023
d058cc2
Mask device primitives from Verilator in favor of using behavioral code.
preusser May 24, 2023
a66f38f
[Deps] update qonnx
maltanar May 11, 2023
8f9bd04
Adding folding hints. Impl selection by case statement.
preusser May 24, 2023
8799707
Merge branch 'feature/verilator_workarounds' into feature/dsp_packing
preusser May 24, 2023
9de5ed6
Fixed behavioral sideband prediction.
preusser May 24, 2023
b6e92bb
Merge remote-tracking branch 'origin/feature/dsp_packing' into featur…
mmrahorovic May 24, 2023
239759a
[rtl mvu]: extension to allow selecting PE values that are not multip…
mmrahorovic May 24, 2023
8d3247c
[rtlmvu] Avoid unintentional verilator metacomments
maltanar May 24, 2023
ffc11d6
Merge remote-tracking branch 'origin/feature/dsp_packing' into featur…
mmrahorovic May 24, 2023
c866350
[rtl mvu]: extension to allow selecting PE values that are not multip…
mmrahorovic May 24, 2023
fd1e038
[rtl mvu axi]: updated comments on folding hints
mmrahorovic May 24, 2023
f60d4c6
[rtl custom op]: minor fixes to codegen
mmrahorovic Jun 2, 2023
a1ad304
[specialize-to-rtl]: add ram_style and rt_writeable_weights support
mmrahorovic Jun 2, 2023
2cbb68f
[rtllib]: change string type to parameter type due to Vivado error
mmrahorovic Jun 2, 2023
92eb0ed
[rtllib]: renamed variable for consistency
mmrahorovic Jun 2, 2023
471a221
Fix improper blocking assignment & linting.
preusser Jun 2, 2023
5c5dc09
[test rtl mvu]: modified/extended test cases
mmrahorovic Jun 2, 2023
b4eb9b6
[rtl mvu]: updated DSP58 >4-bit variant to lift SIMD%3==0 restriction
mmrahorovic Jun 30, 2023
ad63673
[rtl mvu]: bug fix for SIMD=1 init_leave_loads
mmrahorovic Jun 30, 2023
79e8a5e
[mvu rtl]: restrict index i to be less than 3 (within bounds of hi4)
mmrahorovic Jul 13, 2023
7be62b4
Merge remote-tracking branch 'upstream/dev' into feature/dsp_packing
mmrahorovic Jul 17, 2023
e3493c3
Rewrite replay_buffer for input elasticity.
preusser Jun 2, 2023
44fae0c
Merge remote-tracking branch 'upstream/dev' into feature/dsp_packing
mmrahorovic Jul 31, 2023
df51f11
Merge remote-tracking branch 'upstream/dev' into feature/dsp_packing
mmrahorovic Aug 16, 2023
2efba68
[to-rtl]: Infer unique node names after transformation is applied
mmrahorovic Sep 5, 2023
114ea1b
[mvu rtl]: add synthesis directive to handle 'X in simulation
mmrahorovic Sep 18, 2023
79fafdb
[replay buffer rtl]: minor fix to when LEN=1 (= AWIDTH=0)
mmrahorovic Sep 18, 2023
619d9db
[mvu lut]: LUT-based MVU compute core
mmrahorovic Sep 18, 2023
090f2ac
[custom op]: add preferred_backend attribute
mmrahorovic Sep 19, 2023
ac5e82d
Ensure a minimum of two buffer slots even for length-1 sequences.
preusser Sep 21, 2023
d5ff2a2
Merge pull request #1 from Xilinx/bugfix/replay_len1
mmrahorovic Sep 21, 2023
bb94092
Merge remote-tracking branch 'origin/feature/dsp_packing' into featur…
mmrahorovic Sep 21, 2023
8515693
[rtl mvu wrapper]: support for vvu layer and rename
mmrahorovic Sep 21, 2023
cf28d78
[mvu vvu tb]: modified testbench to also support testing VVU on DSP58
mmrahorovic Sep 21, 2023
2617c39
[axi wrapper]: minor modification to comment description
mmrahorovic Sep 21, 2023
8ca5fe7
[mvu axi]: add support for VVU on DSP58
mmrahorovic Sep 21, 2023
32d6338
[mvu vvu axi]: renamed file for consistency purposes
mmrahorovic Sep 21, 2023
031406d
[mvu 8sx9]: added support for VVU on DSP58, resolved PyVerilator-caus…
mmrahorovic Sep 21, 2023
e2c1f15
[mvu vvu 8sx9]: renamed compute core for consistency
mmrahorovic Sep 21, 2023
adb5869
[axi wrapper]: changed parameter to localparam
mmrahorovic Sep 21, 2023
f54d438
[axi]: added support for LUT-based VVU
mmrahorovic Sep 21, 2023
a4e2ac7
[mvu vvu 8sx9]: minor change to list of generics
mmrahorovic Sep 21, 2023
40ad0b4
[mvu lut]: added support for VVU
mmrahorovic Sep 21, 2023
30fcb5b
[mvu vvu lut]: renamed file for consistency
mmrahorovic Sep 21, 2023
cb43438
Revert to proper address truncation without generation bit.
preusser Sep 21, 2023
b4b69f3
remove deleted/renamed files
mmrahorovic Sep 21, 2023
14c5fa9
[mvu vvu 8sx9]: renamed for consistency
mmrahorovic Sep 21, 2023
3a37588
[mvu vvu axi]: changes for renamed module
mmrahorovic Sep 21, 2023
afe36ba
[mvu vvu wrapper]: convert localparam to param
mmrahorovic Sep 25, 2023
e4f2f9e
[mvau-rtl custom-op]: bugfix to instantiate memstreamer, modified ren…
mmrahorovic Sep 25, 2023
b49b79a
[specialize to rtl]: fix to changed attribute name and added support …
mmrahorovic Sep 25, 2023
9bdba03
Adding core for DSP48 backport.
preusser Sep 19, 2023
2cf1ef7
[mvu rtl core]: added support for signed activations for DSP48-based …
mmrahorovic Sep 25, 2023
ab8d4a8
[rtl mvu custom-op]: add upper bound to SEGMENTLEN equal to number of…
mmrahorovic Sep 25, 2023
74eb42b
Starting on pumped DSP compute.
preusser Sep 29, 2023
d9e2fc6
Flag TODO.
preusser Sep 29, 2023
5a429fc
[mvu_vvu dsp58]: change weight input to 2D instead of 3D array
mmrahorovic Oct 13, 2023
a4a18bb
[mvu_vvu axi]: re-wire weights appropriately for VVU DSP58
mmrahorovic Oct 13, 2023
cc0737b
[mvu_vvu axi wrapper]: fix to IS_MVU parameter
mmrahorovic Oct 13, 2023
c0eff0b
[mvu_vvu tb]: WIP -- changes to self-checker and shape of input data
mmrahorovic Oct 13, 2023
d5ae2d2
Merge remote-tracking branch 'origin/feature/dsp_packing' into featur…
mmrahorovic Oct 13, 2023
4591bb8
[vvu_hls]: add flag to specify preferred backend
mmrahorovic Oct 13, 2023
ef1cbbe
[vvu rtl]: added new custom-op VVU_RTL
mmrahorovic Oct 13, 2023
62cec50
[dwc pw]: added new custom-op SDWC operating on SWG with parallel win…
mmrahorovic Oct 13, 2023
511f835
[transformation]: extended InsertDWC transformation to instantiate a …
mmrahorovic Oct 13, 2023
4d949d6
[custom op]: added 2 new custom-ops
mmrahorovic Oct 13, 2023
05751c4
[VVU_RTL test]: added test for RTL-based VVU, which includes testing …
mmrahorovic Oct 13, 2023
6d4ee08
[mvu vvu axi]: minor bugfixes to enable VVU
mmrahorovic Nov 1, 2023
39dc27a
[mvu tb]: created separate vvu testbench and renamed mvu_vvu_axi tb
mmrahorovic Nov 1, 2023
87b25f9
[rtl-vvu custom-op]: flipped weights per SIMD-chunk to match pattern …
mmrahorovic Nov 1, 2023
1476927
[rtl vvu test]: extended testbench
mmrahorovic Nov 1, 2023
c2acd59
Merge commit '7be5ce412e5747f17fe0062769cd2cc476b5bfa4' into feature/…
mmrahorovic Nov 15, 2023
7fc173b
[RTLThres] compute obits in Python and use placeholder in template
maltanar Nov 14, 2023
9bd0744
Merge remote-tracking branch 'origin/feature/dsp_packing' into featur…
mmrahorovic Nov 15, 2023
0bb0a43
Merge remote-tracking branch 'origin/feature/vvu_dsp_packing' into fe…
mmrahorovic Nov 16, 2023
a62911c
[mvu vvu axi]: minor fix -- define mvauin_weight_t
mmrahorovic Nov 20, 2023
4d4c61b
[specialize_to_rtl step]: add transformation to infer RTL-VVU
mmrahorovic Nov 20, 2023
612ed8f
[rtl vvu custom op]: clean-up of unused functions
mmrahorovic Nov 20, 2023
0b31a88
[folding]: first attempt to extend folding transformation to parallel…
mmrahorovic Nov 20, 2023
92bc515
[to-rtl transformation]: extended with additional checker to ensure t…
mmrahorovic Nov 20, 2023
31914b1
[build steps]: move specialize_to_rtl step to be applied after conver…
mmrahorovic Nov 27, 2023
fe97dae
Merge remote-tracking branch 'origin/feature/vvu_dsp_packing' into fe…
mmrahorovic Nov 27, 2023
fa1d116
[Test] fix data layout for golden/ret comparison in RTL MVU test
maltanar Nov 24, 2023
becaac7
[RTLCustomOp] IP packaging fixes for pDWC+VVU, fix linting too
maltanar Nov 24, 2023
e0c6c26
Merge remote-tracking branch 'upstream/feature/dwc' into feature/vvu_…
mmrahorovic Nov 27, 2023
cf7f494
[mvu vvu axi]: minor bugfixes to enable VVU
mmrahorovic Nov 1, 2023
5ffc221
[mvu vvu axi]: minor fix -- define mvauin_weight_t
mmrahorovic Nov 20, 2023
d573043
Merge remote-tracking branch 'upstream/dev' into feature/dsp_packing
mmrahorovic Nov 27, 2023
40d652c
[rtl mvu op]: minor fix to chain length estimation and enabled behavi…
mmrahorovic Nov 29, 2023
9e0e333
Merge remote-tracking branch 'origin/feature/dsp_packing' into featur…
mmrahorovic Nov 29, 2023
977ce9b
Merge remote-tracking branch 'origin/feature/vvu_dsp_packing' into fe…
mmrahorovic Nov 29, 2023
3a1d9d2
[mvu vvu axi]: minor changes to enable double-pumped DSPs for uneven …
mmrahorovic Nov 29, 2023
493bcfe
[axi wrapper]: add port for double-clock
mmrahorovic Nov 29, 2023
58f191e
[builder]: add flag for enabling pumped compute
mmrahorovic Dec 1, 2023
f435aed
[hls custom op]: add clk2x interface
mmrahorovic Dec 1, 2023
4a8ff59
[mvu rtl]: add pumped compute attribute and fill out template accordi…
mmrahorovic Dec 1, 2023
f38fd6b
[stitched ip]: wire up clk2x interface
mmrahorovic Dec 1, 2023
078888a
[mvu vvu axi]: removed SIMD%2 constraint for double-pumped DSP58
mmrahorovic Dec 1, 2023
bbcbb5a
[builder]: minor fix to attribute naming
mmrahorovic Dec 1, 2023
b72d00d
[stitched-ip]: minor fixes to creating valid stitched-ip with ap_clk2…
mmrahorovic Dec 3, 2023
04f5863
[rtl-vvu]: add stitching support for pumped compute, minor fix to seg…
mmrahorovic Dec 3, 2023
9b80ac1
Prevent output register slice from operating in unnecessary ping-pong…
preusser Dec 3, 2023
60f483a
[mvu vvu axi]: verilator BLKLOOPINIT-error workaround
mmrahorovic Dec 7, 2023
23fb64f
[mvu vvu axi]: sign extend output tdata (byte-aligned)
mmrahorovic Dec 8, 2023
fdca45b
[mvu-rtl]: default seglen to 1 for now
mmrahorovic Dec 11, 2023
45074d9
update test config
mmrahorovic Dec 11, 2023
0ed3681
updated test config
mmrahorovic Dec 12, 2023
c396425
[rtlsim]: use pyverilator util functions
mmrahorovic Dec 13, 2023
538852d
[mvu vvu axi]: fix multiple driver error
mmrahorovic Dec 13, 2023
7e5306c
Mitigate hold time issues on feed from fast clock net.
preusser Dec 18, 2023
256931f
toggle P and Vld only when no backpressure is applied
mmrahorovic Dec 18, 2023
020c4e0
change naming
mmrahorovic Dec 18, 2023
7e12ae4
Reworking pumped DSP integration with simplified enable computation.
preusser Dec 19, 2023
6e98bac
[rtlsim]: use pyverilator util functions
mmrahorovic Dec 13, 2023
5dd74ad
[mvu vvu axi]: sign extend output tdata (byte-aligned)
mmrahorovic Dec 8, 2023
b20410b
[mvu core]: dsp48 convert unpacked array to packed array to work arou…
mmrahorovic Jan 8, 2024
1c2cc0c
[mvu axi]: update list of deduced parameters
mmrahorovic Jan 8, 2024
eeb3cea
[mvu custom-op]: remove lut-based implementation and update compute c…
mmrahorovic Jan 8, 2024
0813d14
[mvu axi]: remove LUT-based compute core
mmrahorovic Jan 8, 2024
4892d66
[hls custom-op]: enable reset in sim
mmrahorovic Jan 11, 2024
44f6e0f
[test mvu rtl]: updated test flow (DSP58 only)
mmrahorovic Jan 11, 2024
9b2cceb
[mvu vvu axi]: reworked flow control and backpressure handling by tpr…
mmrahorovic Jan 11, 2024
ee9f027
Adding DSP48E1 support for 8-bit compute. Todo: finer core differenti…
preusser Jan 31, 2024
3ab8296
Adding DSP48E1 support for 4-bit compute. Todo: finer core differenti…
preusser Jan 31, 2024
d5cd44c
Merge remote-tracking branch 'origin/feature/dsp_packing' into featur…
mmrahorovic Feb 6, 2024
8 changes: 8 additions & 0 deletions docs/finn/source_code/finn.custom_op.fpgadataflow.rst
@@ -203,6 +203,14 @@ finn.custom\_op.fpgadataflow.thresholding\_batch
   :undoc-members:
   :show-inheritance:

finn.custom\_op.fpgadataflow.thresholding\_binary\_search
-----------------------------------------------------------

.. automodule:: finn.custom_op.fpgadataflow.thresholding_binary_search
   :members:
   :undoc-members:
   :show-inheritance:


finn.custom\_op.fpgadataflow.tlastmarker
-----------------------------------------------
494 changes: 494 additions & 0 deletions finn-rtllib/mvu/mvu_4sx4u.sv

Large diffs are not rendered by default.

492 changes: 492 additions & 0 deletions finn-rtllib/mvu/mvu_8sx8u_dsp48.sv

Large diffs are not rendered by default.

430 changes: 430 additions & 0 deletions finn-rtllib/mvu/mvu_vvu_8sx9_dsp58.sv

Large diffs are not rendered by default.

383 changes: 383 additions & 0 deletions finn-rtllib/mvu/mvu_vvu_axi.sv
@@ -0,0 +1,383 @@
/******************************************************************************
* Copyright (C) 2022, Advanced Micro Devices, Inc.
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
*
* 1. Redistributions of source code must retain the above copyright notice,
* this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* 3. Neither the name of the copyright holder nor the names of its
* contributors may be used to endorse or promote products derived from
* this software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
* AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
* THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
* PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
* CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
* EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
* PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
* OR BUSINESS INTERRUPTION). HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
* WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
* OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
* ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*
* @brief Matrix Vector Unit (MVU) & Vector Vector Unit (VVU) AXI-lite interface wrapper.
* @details
* The following compute cores are supported:
* - 4-bit MVU on DSP48 & DSP58 achieving 4 MACs/DSP,
* - (4,8]-bit MVU on DSP48 achieving 2 MACs/DSP,
* - [4,9]-bit MVU and VVU on DSP58 achieving 3 MACs/DSP,
* - 'unconstrained' LUT-based MVU and VVU.
* Folding hints:
* - PE scaling should divide MH.
* - SIMD scaling should divide MW.
* - Otherwise, keep SIMD and PE somewhat balanced. SIMD scaling tends to
* impact critical paths more than PE scaling. PE scaling implies a
* bigger fanout on the input activations.
* - Full unfolding along MH (PE=MH) results in no replay buffer instantiated
*****************************************************************************/

module mvu_vvu_axi #(
bit IS_MVU,
parameter COMPUTE_CORE,
int unsigned MW,
int unsigned MH,
int unsigned PE,
int unsigned SIMD,
int unsigned SEGMENTLEN = 0,

int unsigned ACTIVATION_WIDTH,
int unsigned WEIGHT_WIDTH,
int unsigned ACCU_WIDTH,
bit SIGNED_ACTIVATIONS = 0,

bit PUMPED_COMPUTE = 0, // an odd SIMD is zero-padded to the next even value
bit FORCE_BEHAVIORAL = 0,
bit M_REG_LUT = 1,

// Safely deducible parameters
localparam int unsigned WEIGHT_STREAM_WIDTH = PE * SIMD * WEIGHT_WIDTH,
localparam int unsigned WEIGHT_STREAM_WIDTH_BA = (WEIGHT_STREAM_WIDTH + 7)/8 * 8,
localparam int unsigned INPUT_STREAM_WIDTH = (IS_MVU ? 1 : PE) * SIMD * ACTIVATION_WIDTH,
localparam int unsigned INPUT_STREAM_WIDTH_BA = (INPUT_STREAM_WIDTH + 7)/8 * 8,
localparam int unsigned OUTPUT_STREAM_WIDTH = PE*ACCU_WIDTH,
localparam int unsigned OUTPUT_STREAM_WIDTH_BA = (OUTPUT_STREAM_WIDTH + 7)/8 * 8,
localparam bit SIMD_UNEVEN = SIMD % 2
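// Worked example for the deduced widths above. The values are illustrative
// assumptions, not taken from this PR: PE=1, SIMD=3, WEIGHT_WIDTH=4,
// ACTIVATION_WIDTH=4, ACCU_WIDTH=13, IS_MVU=1.
//   WEIGHT_STREAM_WIDTH = 1*3*4 = 12  ->  WEIGHT_STREAM_WIDTH_BA = (12+7)/8*8 = 16
//   INPUT_STREAM_WIDTH  = 1*3*4 = 12  ->  INPUT_STREAM_WIDTH_BA  = (12+7)/8*8 = 16
//   OUTPUT_STREAM_WIDTH = 1*13  = 13  ->  OUTPUT_STREAM_WIDTH_BA = (13+7)/8*8 = 16
//   SIMD_UNEVEN         = 3 % 2 = 1
// The integer division rounds each stream width up to the next whole byte.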
)(
// Global Control
input logic ap_clk,
input logic ap_clk2x, // synchronous, double-speed clock; only used for PUMPED_COMPUTE
input logic ap_rst_n,

// Weight Stream
input logic [WEIGHT_STREAM_WIDTH_BA-1:0] s_axis_weights_tdata,
input logic s_axis_weights_tvalid,
output logic s_axis_weights_tready,

// Input Stream
input logic [INPUT_STREAM_WIDTH_BA-1:0] s_axis_input_tdata,
input logic s_axis_input_tvalid,
output logic s_axis_input_tready,

// Output Stream
output logic [OUTPUT_STREAM_WIDTH_BA-1:0] m_axis_output_tdata,
output logic m_axis_output_tvalid,
input logic m_axis_output_tready
);

//-------------------- Parameter sanity checks --------------------\\
initial begin
if (MW % SIMD != 0) begin
$error("Matrix width (%0d) is not a multiple of SIMD (%0d).", MW, SIMD);
$finish;
end
if (MH % PE != 0) begin
$error("Matrix height (%0d) is not a multiple of PE (%0d).", MH, PE);
$finish;
end
if (WEIGHT_WIDTH > 8) begin
$error("Weight width of %0d-bits exceeds maximum of 8-bits", WEIGHT_WIDTH);
$finish;
end
if (ACTIVATION_WIDTH > 8) begin
if (!(SIGNED_ACTIVATIONS == 1 && ACTIVATION_WIDTH == 9 && COMPUTE_CORE == "mvu_vvu_8sx9_dsp58")) begin
$error("Activation width of %0d bits exceeds the maximum of 8 bits (9 bits for signed activations on DSP58).", ACTIVATION_WIDTH);
$finish;
end
end
if (COMPUTE_CORE == "mvu_vvu_8sx9_dsp58") begin
if (SEGMENTLEN == 0) begin
$warning("Segment length of %0d defaults to chain length of %0d", SEGMENTLEN, (SIMD+2)/3);
end
if (SEGMENTLEN > (SIMD+2)/3) begin
$error("Segment length of %0d exceeds chain length of %0d", SEGMENTLEN, (SIMD+2)/3);
$finish;
end
end
if (!IS_MVU) begin
if (COMPUTE_CORE != "mvu_vvu_8sx9_dsp58" && COMPUTE_CORE != "mvu_vvu_lut") begin
$error("VVU only supported on DSP58 or LUT-based implementation");
$finish;
end
end

// //- Pumping Constraints ---------
// if(PUMPED_COMPUTE) begin
// if(SIMD % 2 != 0) begin
// $error("Odd SIMD=%0d is incompatible with pumped compute.", SIMD);
// $finish;
// end
// end
end

uwire clk = ap_clk;
uwire clk2x = ap_clk2x;
uwire rst = !ap_rst_n;

//- Replay to Accommodate Neuron Fold -----------------------------------
typedef logic [(IS_MVU? 1:PE)*SIMD-1:0][ACTIVATION_WIDTH-1:0] mvu_flatin_t;
uwire mvu_flatin_t amvau;
uwire alast;
uwire afin;
uwire avld;
uwire ardy;

localparam int unsigned SF = MW/SIMD;
localparam int unsigned NF = MH/PE;
replay_buffer #(.LEN(SF), .REP(IS_MVU ? NF : 1), .W($bits(mvu_flatin_t))) activation_replay (
.clk, .rst,
.ivld(s_axis_input_tvalid), .irdy(s_axis_input_tready), .idat(mvu_flatin_t'(s_axis_input_tdata)),
.ovld(avld), .ordy(ardy), .odat(amvau), .olast(alast), .ofin(afin)
);
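// Worked example for the fold factors above (illustrative values, not from this PR):
// MW=12, MH=8, SIMD=4, PE=2 gives
//   SF = MW/SIMD = 3   and   NF = MH/PE = 4,
// so an MVU instantiates replay_buffer with LEN=3 and REP=4, whereas a VVU
// (IS_MVU=0) uses LEN=3 and REP=1. With PE=MH=8, NF is 1 and REP=1 for the MVU too.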

//- Unflatten inputs into structured matrices ---------------------------
localparam int unsigned ACT_PE = IS_MVU? 1 : PE;
typedef logic [PE -1:0][SIMD-1:0][WEIGHT_WIDTH -1:0] mvu_w_t;
typedef logic [ACT_PE-1:0][SIMD-1:0][ACTIVATION_WIDTH-1:0] mvu_a_t;

uwire mvu_w_t mvu_w = s_axis_weights_tdata;

//- Conditional Activations Layout Adjustment for VVU
uwire mvu_a_t amvau_i;
if (IS_MVU || (PE == 1)) begin : genMVUInput
assign amvau_i = amvau;
end : genMVUInput
else begin : genVVUInput
// The input stream will have the channels interleaved for VVU when PE>1
// Hence, we need to 'untangle' the input stream, i.e. [..][SIMD*PE][..] --> [..][PE][SIMD][..]
// Note that for each 'SIMD' (S) and 'PE' (P) element, we have something like:
// (S_0, P_0), ..., (S_0, P_i), (S_1, P_0), ..., (S_1, P_i), ..., (S_i, P_i) which we need to 'untangle' to
// (S_0, P_0), ..., (S_i, P_0), (S_0, P_1), ..., (S_i, P_1), ..., (S_i, P_i)
for(genvar pe = 0; pe < ACT_PE; pe++) begin
for(genvar simd = 0; simd < SIMD; simd++) begin
assign amvau_i[pe][simd] = amvau[simd*ACT_PE+pe];
end
end
end : genVVUInput
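// Worked example of the index mapping above (illustrative values): for a VVU
// with PE=2 and SIMD=3, amvau_i[pe][simd] = amvau[simd*ACT_PE+pe] resolves to
//   amvau_i[0][0] = amvau[0]   amvau_i[0][1] = amvau[2]   amvau_i[0][2] = amvau[4]
//   amvau_i[1][0] = amvau[1]   amvau_i[1][1] = amvau[3]   amvau_i[1][2] = amvau[5]
// i.e. each PE row collects every ACT_PE-th element of the interleaved stream.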

//- Flow Control Bracket around Compute Core ----------------------------
uwire en;
uwire istb = avld && s_axis_weights_tvalid;
assign ardy = en && s_axis_weights_tvalid;
assign s_axis_weights_tready = en && avld;

//- Conditionally Pumped DSP Compute ------------------------------------
typedef logic [PE-1:0][ACCU_WIDTH-1:0] dsp_p_t;
uwire ovld;
uwire dsp_p_t odat;
if(1) begin : blkDsp
localparam int unsigned EFFECTIVE_SIMD = SIMD_UNEVEN && PUMPED_COMPUTE ? SIMD+1 : SIMD;
localparam int unsigned DSP_SIMD = EFFECTIVE_SIMD/(PUMPED_COMPUTE+1);
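// Illustrative sketch, not part of the original change: an elaboration-time
// sanity check on the SIMD split computed above. It assumes that SIMD_UNEVEN
// is meant to flag an odd SIMD; if it were left unset for an odd SIMD with
// PUMPED_COMPUTE enabled, 2*DSP_SIMD would fall short of SIMD and the
// zero-padded vectors built for the two fast cycles could not hold all lanes.
// The block label genCheckPumpedSimd is hypothetical.
if(PUMPED_COMPUTE && (2*DSP_SIMD < SIMD)) begin : genCheckPumpedSimd
initial begin
$error("Pumped compute cannot cover SIMD=%0d with DSP_SIMD=%0d per fast cycle.", SIMD, DSP_SIMD);
$finish;
end
end : genCheckPumpedSimd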
typedef logic [PE -1:0][DSP_SIMD-1:0][WEIGHT_WIDTH -1:0] dsp_w_t;
typedef logic [ACT_PE-1:0][DSP_SIMD-1:0][ACTIVATION_WIDTH-1:0] dsp_a_t;

uwire dsp_clk;
uwire dsp_en;

uwire dsp_last;
uwire dsp_zero;
uwire dsp_w_t dsp_w;
uwire dsp_a_t dsp_a;

uwire dsp_vld;
uwire dsp_p_t dsp_p;

if(!PUMPED_COMPUTE) begin : genUnpumpedCompute
assign dsp_clk = clk;
assign dsp_en = en;

assign dsp_last = alast && avld;
assign dsp_zero = !istb;
assign dsp_w = mvu_w;
assign dsp_a = amvau_i;

assign ovld = dsp_vld;
assign odat = dsp_p;
end : genUnpumpedCompute
else begin : genPumpedCompute
assign dsp_clk = clk2x;

// Identify second fast cycle just before active slow clock edge
logic Active = 0;
if(1) begin : blkActive
uwire clk_lut[2]; // Put some LUT delay on the input from the fast clock net
(* DONT_TOUCH = "TRUE", HLUTNM = "CLK_LUT" *) LUT1 #(.INIT(2'b10)) lut0(.O(clk_lut[0]), .I0(clk));
(* DONT_TOUCH = "TRUE", HLUTNM = "CLK_LUT" *) LUT1 #(.INIT(2'b10)) lut1(.O(clk_lut[1]), .I0(clk_lut[0]));
always_ff @(posedge clk2x) Active <= clk_lut[1];
end : blkActive

// The input for a slow cycle is split across two fast cycles along the SIMD dimension.
// - Both fast cycles are controlled by the same enable state.
// - A zero cycle is duplicated across both fast cycles.
// - The last flag must be restricted to the second fast cycle.

dsp_w_t W = 'x;
for(genvar pe = 0; pe < PE; pe++) begin : genPERegW

uwire [2*DSP_SIMD-1:0][WEIGHT_WIDTH-1:0] w;
for(genvar i = 0; i < SIMD; i++) assign w[i] = mvu_w[pe][i];
for(genvar i = SIMD; i < 2*DSP_SIMD; i++) assign w[i] = 0;

always_ff @(posedge clk2x) begin
if(rst) W[pe] <= 'x;
else if(en) W[pe] <= w[(Active? DSP_SIMD : 0) +: DSP_SIMD];
end

end : genPERegW

dsp_a_t A = 'x;
for(genvar pe = 0; pe < ACT_PE; pe++) begin : genPERegA

uwire [2*DSP_SIMD-1:0][ACTIVATION_WIDTH-1:0] a;
for(genvar i = 0; i < SIMD; i++) assign a[i] = amvau_i[pe][i];
for(genvar i = SIMD; i < 2*DSP_SIMD; i++) assign a[i] = 0;

always_ff @(posedge clk2x) begin
if(rst) A[pe] <= 'x;
else if(en) A[pe] <= a[(Active? DSP_SIMD : 0) +: DSP_SIMD];
end

end : genPERegA

logic Zero = 1;
logic Last = 0;
always_ff @(posedge clk2x) begin
if(rst) begin
Zero <= 1;
Last <= 0;
end
else if(en) begin
Zero <= !istb;
Last <= alast && avld && Active;
end
end

assign dsp_en = en;
assign dsp_last = Last;
assign dsp_zero = Zero;
assign dsp_w = W;
assign dsp_a = A;

// Since no two consecutive last cycles will ever be asserted on the input,
// valid outputs will also always be spaced by, at least, one other cycle.
// We can always hold a captured output for two cycles to allow the slow
// clock to pick it up.
logic Vld = 0;
dsp_p_t P = 'x;
always_ff @(posedge clk2x) begin
if(rst) begin
Vld <= 0;
P <= 'x;
end
else if(en) begin
if(dsp_vld) P <= dsp_p;
Vld <= dsp_vld || (Vld && !Active);
end
end
assign ovld = Vld;
assign odat = P;

end : genPumpedCompute

case(COMPUTE_CORE)
"mvu_vvu_8sx9_dsp58":
mvu_vvu_8sx9_dsp58 #(.IS_MVU(IS_MVU), .PE(PE), .SIMD(DSP_SIMD), .ACTIVATION_WIDTH(ACTIVATION_WIDTH), .WEIGHT_WIDTH(WEIGHT_WIDTH),
.ACCU_WIDTH(ACCU_WIDTH), .SIGNED_ACTIVATIONS(SIGNED_ACTIVATIONS), .SEGMENTLEN(SEGMENTLEN),
.FORCE_BEHAVIORAL(FORCE_BEHAVIORAL)) core (
.clk(dsp_clk), .rst, .en(dsp_en),
.last(dsp_last), .zero(dsp_zero), .w(dsp_w), .a(dsp_a),
.vld(dsp_vld), .p(dsp_p)
);
"mvu_4sx4u":
mvu_4sx4u #(.PE(PE), .SIMD(DSP_SIMD), .ACCU_WIDTH(ACCU_WIDTH), .SIGNED_ACTIVATIONS(SIGNED_ACTIVATIONS), .FORCE_BEHAVIORAL(FORCE_BEHAVIORAL)) core (
.clk(dsp_clk), .rst, .en(dsp_en),
.last(dsp_last), .zero(dsp_zero), .w(dsp_w), .a(dsp_a),
.vld(dsp_vld), .p(dsp_p)
);
"mvu_8sx8u_dsp48":
mvu_8sx8u_dsp48 #(.PE(PE), .SIMD(DSP_SIMD), .ACCU_WIDTH(ACCU_WIDTH), .ACTIVATION_WIDTH(ACTIVATION_WIDTH), .WEIGHT_WIDTH(WEIGHT_WIDTH),
.SIGNED_ACTIVATIONS(SIGNED_ACTIVATIONS), .FORCE_BEHAVIORAL(FORCE_BEHAVIORAL)) core (
.clk(dsp_clk), .rst, .en(dsp_en),
.last(dsp_last), .zero(dsp_zero), .w(dsp_w), .a(dsp_a),
.vld(dsp_vld), .p(dsp_p)
);
"mvu_vvu_lut":
mvu_vvu_lut #(.IS_MVU(IS_MVU), .PE(PE), .SIMD(DSP_SIMD), .ACCU_WIDTH(ACCU_WIDTH), .ACTIVATION_WIDTH(ACTIVATION_WIDTH),
.WEIGHT_WIDTH(WEIGHT_WIDTH), .SIGNED_ACTIVATIONS(SIGNED_ACTIVATIONS), .M_REG(M_REG_LUT)) core (
.clk(dsp_clk), .rst, .en(dsp_en),
.last(dsp_last), .zero(dsp_zero), .w(dsp_w), .a(dsp_a),
.vld(dsp_vld), .p(dsp_p)
);
default: initial begin
$error("Unrecognized COMPUTE_CORE '%s'", COMPUTE_CORE);
$finish;
end
endcase

end : blkDsp

//- Output Register Slice -----------------------------------------------
// Make the computation of `en` independent from external inputs.
// Drive all outputs from registers.
struct packed {
logic rdy;
logic [PE-1:0][ACCU_WIDTH-1:0] dat;
} A = '{ rdy: 1, default: 'x }; // side-step register used when encountering backpressure
struct packed {
logic vld;
logic [PE-1:0][ACCU_WIDTH-1:0] dat;
} B = '{ vld: 0, default: 'x }; // ultimate output register

assign en = A.rdy;
uwire b_load = !B.vld || m_axis_output_tready;

always_ff @(posedge clk) begin
if(rst) begin
A <= '{ rdy: 1, default: 'x };
B <= '{ vld: 0, default: 'x };
end
else begin
if(A.rdy) A.dat <= odat;
A.rdy <= (A.rdy && !ovld) || b_load;

if(b_load) begin
B <= '{
vld: ovld || !A.rdy,
dat: A.rdy? odat : A.dat
};
end
end
end
assign m_axis_output_tvalid = B.vld;
// Why would we need a sign extension here potentially creating a higher signal load into the next FIFO?
// These extra bits should never be used. Why not 'x them out?
assign m_axis_output_tdata = { {(OUTPUT_STREAM_WIDTH_BA-OUTPUT_STREAM_WIDTH){B.dat[PE-1][ACCU_WIDTH-1]}}, B.dat};
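
// Illustrative sketch, not part of the original change: a simulation-only
// check of the usual AXI-Stream hold rule on the output. Because the output
// register only reloads when it is empty or m_axis_output_tready is high, a
// stalled transfer is expected to keep tvalid asserted and tdata unchanged.
// Whether concurrent assertions are accepted depends on the simulator and
// synthesis flow in use, so this may need guarding or removal there.
// The label assertOutputHold is hypothetical.
assertOutputHold: assert property (
@(posedge clk) disable iff (rst)
m_axis_output_tvalid && !m_axis_output_tready |=> m_axis_output_tvalid && $stable(m_axis_output_tdata)
) else $error("Output changed while stalled by backpressure.");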

endmodule : mvu_vvu_axi