diff --git a/README.md b/README.md
index 66b5a01..20f8c1b 100644
--- a/README.md
+++ b/README.md
@@ -1,5 +1,111 @@
-# ns3-rdma
-NS3 simulator for RDMA over Converged Ethernet v2 (RoCEv2), including the implementation of DCQCN, TIMELY, PFC, ECN and shared buffer switch
+# NS-3 simulator for RDMA
+This is an NS-3 simulator for RDMA over Converged Ethernet v2 (RoCEv2). It includes implementations of DCQCN, TIMELY, PFC, ECN and a Broadcom shared buffer switch.
 
-# Note
-TIMELY module has not been merged into this yet. We are working on merging it. We will also add descriptions for this project soon.
+It is based on NS-3 version 3.17 and has been ported to the Visual Studio environment, as explained [here](https://www.nsnam.org/wiki/Ns-3_on_Visual_Studio_2012).
+
+## Note
+The TIMELY module has not been merged into this repository yet. We are working on merging it.
+
+## Quick Start
+
+### Build
+To compile it out of the box, you need Visual Studio.
+People have successfully built it with the *free* version,
+which can be downloaded [here](https://www.microsoft.com/en-us/download/details.aspx?id=48146).
+Open windows/ns-3-dev/ns-3-dev.sln and build the whole solution.
+
+You may try building it with the original Makefile, etc. We did this a while back, but you will probably need to edit a few things to make it work now.
+
+### Run
+The binary will be generated at windows/ns-3-dev/x64/Release/main.exe.
+We include a sample configuration file at windows/ns-3-dev/x64/Release/mix/config.txt.
+Execute main.exe in windows/ns-3-dev/x64/Release/:
+```
+cd windows\ns-3-dev\x64\Release\
+main.exe mix\config.txt
+```
+
+It runs a 2:1 incast at 40Gbps for 1 second of simulated time. Please allow a few minutes for it to finish.
+The trace will be generated at mix/mix.tr, as set by TRACE_OUTPUT_FILE in mix/config.txt.
+
+There are quite a few options in mix/config.txt. We will gradually add documentation.
+In the meantime, you can check the code in
+project "main" -- source files -- "third.cc" to see how these options are parsed.
+You can also raise issues if you have any questions.
+
+## What did we add exactly?
+
+**point-to-point/model/qbb-net-device.cc** and all other qbb-* files:
+
+DCQCN and PFC implementation.
+It also includes go-back-to-N and go-back-to-0, which handle packet drops due to corruption.
+
+In 2013, we obtained a very basic NS-3 PFC implementation and built on it.
+We can no longer find the original repository.
+
+**network/model/broadcom-node.cc** and **.h**:
+
+This implements a Broadcom ASIC switch model, which
+mostly performs buffer threshold-related operations. These include deciding
+whether PFC should be triggered, whether ECN should be marked, whether the buffer is so full that packets should
+be dropped, etc. It supports both static and dynamic thresholds for PFC.
+
+*Disclaimer: this module is based purely on the authors' personal understanding of Broadcom ASICs. It does not reflect any official confirmation from either Microsoft or Broadcom.*
+
+**network/utils/broadcom-egress-queue.cc** and **.h**:
+
+This is the actual MMU that buffers packets.
+It also includes the switch scheduler, i.e., when the upper layer asks for a packet to send, it
+decides which queue to dequeue from. Strategies such as strict priority and round robin are supported.
+
+**applications/model/udp-echo-client.cc**:
+
+We implement the RDMA client here, which aligns
+with the fact that RoCEv2 includes a UDP header. In particular, the original UDP client has trouble
+when PFC pauses the link: it keeps sending packets at line rate, soon
+builds up a huge queue, and memory runs out. Here we throttle the sending rate if it gets
+pushed back by PFC.
+
+**internet/model/seq-ts-header.cc** and **.h**:
+
+We didn't implement the full InfiniBand
+header. Instead, what we really need is just the sequence number (for detecting corruption
+drops, and also to help us understand the throughput) and the timestamp (required by TIMELY).
+This is where we encode this information into packets.
+
+**main/third.cc**:
+
+The main() function.
+
+There may be other edits here and there; in particular, trace generation is scattered
+among various network stacks. But the files above are the major ones.
+
+## Q&A
+
+**Q: Why did you port it to Windows?**
+
+A: This is a Microsoft project. Visual Studio, including the free version, works well.
+
+**Q: Fine. What if I want to run it on Linux, and do not want to spend time changing the build process?**
+
+A: You can build it using Visual Studio and run the .exe using WINE. We have tested WINE 1.6.2 and it works well.
+
+**Q: I don't understand ... (some part of the code or configuration)**
+
+A: Raise issues on GitHub, so that your questions can also help others. If you really do
+not want others to know you are working on this, you can email yibzh@microsoft.com.
+
+**Q: What papers should I cite, if I also publish?**
+
+A: Below are the ones you should definitely check. They are ranked from most to
+least relevant, although all of them are quite relevant:
+
+*ECN or Delay: Lessons Learnt from Analysis of DCQCN and TIMELY*, CoNEXT'16 (this project is released with this paper; if you use this code, we ask you to at least cite this paper).
+
+*Congestion Control for Large-scale RDMA Deployments*, SIGCOMM'15 (DCQCN)
+
+*TIMELY: RTT-based Congestion Control for the Datacenter*, SIGCOMM'15 (TIMELY)
+
+*RDMA over Commodity Ethernet at Scale*, SIGCOMM'16 (discusses go-back-to-N)
+
+*Deadlocks in Datacenter Networks: Why Do They Form, and How to Avoid Them*, HotNets'16 (PFC deadlock analysis; uses this simulator directly)
\ No newline at end of file
diff --git a/windows/ns-3-dev/mix/flow_tcp.txt b/windows/ns-3-dev/mix/flow_tcp.txt
deleted file mode 100644
index 0e5d646..0000000
--- a/windows/ns-3-dev/mix/flow_tcp.txt
+++ /dev/null
@@ -1,24 +0,0 @@
-2
-2 1 1 10000000 2.0 9.5
-3 1 1 10000000 2.0 9.5
-4 1 1 10000000 2.0 9.5
-5 1 1 10000000 2.0 9.5
-6 1 1 10000000 2.0 9.5
-7 1 1 10000000 2.0 9.5
-8 1 1 10000000 2.0 9.5
-9 1 1 10000000 2.0 9.5
-10 1 1 10000000 2.0 9.5
-11 1 1 10000000 2.0 9.5
-12 1 1 10000000 2.0 9.5
-13 1 1 10000000 2.0 9.5
-14 1 1 10000000 2.0 9.5
-15 1 1 10000000 2.0 9.5
-16 1 1 10000000 2.0 9.5
-17 1 1 10000000 2.0 9.5
-18 1 1 10000000 2.0 9.5
-19 1 1 10000000 2.0 9.5
-20 1 1 10000000 2.0 9.5
-21 1 1 10000000 2.0 9.5
-
-First line: flow #
-src dst pg packet#
diff --git a/windows/ns-3-dev/mix/topology.txt b/windows/ns-3-dev/mix/topology.txt
deleted file mode 100644
index 16f496d..0000000
--- a/windows/ns-3-dev/mix/topology.txt
+++ /dev/null
@@ -1,29 +0,0 @@
-22 1 21
-0
-0 1 40Gbps 0.0001ms
-0 2 40Gbps 0.0001ms
-0 3 40Gbps 0.0001ms
-0 4 40Gbps 0.0001ms
-0 5 40Gbps 0.0001ms
-0 6 40Gbps 0.0001ms
-0 7 40Gbps 0.0001ms
-0 8 40Gbps 0.0001ms
-0 9 40Gbps 0.0001ms
-0 10 40Gbps 0.0001ms
-0 11 40Gbps 0.0001ms
-0 12 40Gbps 0.0001ms
-0 13 40Gbps 0.0001ms
-0 14 40Gbps 0.0001ms
-0 15 40Gbps 0.0001ms
-0 16 40Gbps 0.0001ms
-0 17 40Gbps 0.0001ms
-0 18 40Gbps 0.0001ms
-0 19 40Gbps 0.0001ms
-0 20 40Gbps 0.0001ms
-0 21 40Gbps 0.0001ms
-
-First line: total node #, switch node #, link #
-Second line: switch node IDs...
-src0 dst0 rate delay
-src1 dst1 rate delay
-...
\ No newline at end of file
diff --git a/windows/ns-3-dev/mix/config.txt b/windows/ns-3-dev/x64/Release/mix/config.txt
similarity index 52%
rename from windows/ns-3-dev/mix/config.txt
rename to windows/ns-3-dev/x64/Release/mix/config.txt
index 13396b4..4d0647b 100644
--- a/windows/ns-3-dev/mix/config.txt
+++ b/windows/ns-3-dev/x64/Release/mix/config.txt
@@ -1,21 +1,21 @@
 ENABLE_QCN 1
 USE_DYNAMIC_PFC_THRESHOLD 1
 PACKET_LEVEL_ECMP 0
-FLOW_LEVEL_ECMP 0
+FLOW_LEVEL_ECMP 1
 
 PAUSE_TIME 5
 PACKET_PAYLOAD_SIZE 1000
 
-TOPOLOGY_FILE C:\ns-3-win2\windows\ns-3-dev\mix\topology.txt
-FLOW_FILE C:\ns-3-win2\windows\ns-3-dev\mix\flow.txt
-TCP_FLOW_FILE C:\ns-3-win2\windows\ns-3-dev\mix\flow_tcp.txt
-TRACE_FILE C:\ns-3-win2\windows\ns-3-dev\mix\trace.txt
-TRACE_OUTPUT_FILE Z:\mix.tr
+TOPOLOGY_FILE mix/topology.txt
+FLOW_FILE mix/flow.txt
+TCP_FLOW_FILE mix/flow_tcp_0.txt
+TRACE_FILE mix/trace.txt
+TRACE_OUTPUT_FILE mix/mix.tr
 
 SEND_IN_CHUNKS 0
 APP_START_TIME 1.0
 APP_STOP_TIME 10.0
-SIMULATOR_STOP_TIME 2.05
+SIMULATOR_STOP_TIME 3.01
 
 CNP_INTERVAL 50
 ALPHA_RESUME_INTERVAL 55
@@ -23,11 +23,17 @@ NP_SAMPLING_INTERVAL 0
 CLAMP_TARGET_RATE 1
 CLAMP_TARGET_RATE_AFTER_TIMER 0
 RP_TIMER 60
-BYTE_COUNTER 300000
+BYTE_COUNTER 300000000
 DCTCP_GAIN 0.00390625
-KMAX 1000
-KMIN 40
-PMAX 1.0
+KMAX 1000
+KMIN 40
+PMAX 1.0
 FAST_RECOVERY_TIMES 5
 RATE_AI 40Mb/s
 RATE_HAI 200Mb/s
+
+ERROR_RATE_PER_LINK 0.0000
+L2_CHUNK_SIZE 4000
+L2_WAIT_FOR_ACK 0
+L2_ACK_INTERVAL 256
+L2_BACK_TO_ZERO 0
\ No newline at end of file
diff --git a/windows/ns-3-dev/mix/flow.txt b/windows/ns-3-dev/x64/Release/mix/flow.txt
similarity index 50%
rename from windows/ns-3-dev/mix/flow.txt
rename to windows/ns-3-dev/x64/Release/mix/flow.txt
index d827d34..b24a52f 100644
--- a/windows/ns-3-dev/mix/flow.txt
+++ b/windows/ns-3-dev/x64/Release/mix/flow.txt
@@ -1,9 +1,7 @@
-4
+2
 2 1 3 10000000 2.0 9.5
 3 1 3 10000000 2.0 9.5
-4 1 3 10000000 2.0 9.5
-5 1 3 10000000 2.0 9.5
 
 First line: flow #
-src dst pg packet#
+src dst priority packet#
 start_time end_time
diff --git a/windows/ns-3-dev/mix/flow_tcp_0.txt b/windows/ns-3-dev/x64/Release/mix/flow_tcp_0.txt
similarity index 100%
rename from windows/ns-3-dev/mix/flow_tcp_0.txt
rename to windows/ns-3-dev/x64/Release/mix/flow_tcp_0.txt
diff --git a/windows/ns-3-dev/x64/Release/mix/topology.txt b/windows/ns-3-dev/x64/Release/mix/topology.txt
new file mode 100644
index 0000000..29b4b3a
--- /dev/null
+++ b/windows/ns-3-dev/x64/Release/mix/topology.txt
@@ -0,0 +1,11 @@
+4 1 3
+0
+0 1 40Gbps 0.001ms 0
+0 2 40Gbps 0.001ms 0
+0 3 40Gbps 0.001ms 0
+
+First line: total node #, switch node #, link #
+Second line: switch node IDs...
+src0 dst0 rate delay error_rate
+src1 dst1 rate delay error_rate
+...
diff --git a/windows/ns-3-dev/mix/trace.txt b/windows/ns-3-dev/x64/Release/mix/trace.txt
similarity index 100%
rename from windows/ns-3-dev/mix/trace.txt
rename to windows/ns-3-dev/x64/Release/mix/trace.txt