This five-day workshop by VSD (https://www.vlsisystemdesign.com/) gives an insight into RTL design and synthesis through theory and labs. It briefly explains the following steps of chip design:
- Digital design.
- Validation of the design through functional simulation with testbenches.
- Logical synthesis of the validated design.
- Gate-level simulation of the synthesized net-list.
The workshop uses the SkyWater open-source Process Design Kit (PDK). The SkyWater PDK (https://github.com/google/skywater-pdk) is a free and open-source (FOSS) PDK that came out of a collaboration between Google and SkyWater Technology. It targets the SKY130 process node (a 180nm-130nm hybrid technology).
This repo records the labs performed and briefly summarises what each video of the workshop covers. Each entry in the table of contents below is a top-level heading that gathers the gist of one or more videos.
- Introduction to open-source Verilog simulator iverilog
- Labs using iverilog and gtkwave
- Introduction to Yosys and Logic Synthesis
- Labs using Yosys and SKY130 PDKs
- Introduction to timing.libs
- Hierarchical vs Flat Synthesis
- Various flop coding styles and Optimisation
- Introduction to optimisations
- Combinational logic optimisations
- Sequential logic optimisations
- Sequential optimisations for unused outputs
- GLS, Synthesis-Simulation mismatch, Blocking and Non-blocking statements
- Labs on Synthesis and Simulation mismatch
- Labs on Synthesis and Simulation mismatch for blocking statements
- If case constructs
- Labs on Incomplete If case
- Labs on Incomplete Overlapping If case
- For loop and For generate
- Labs on For loop and For generate
This is an introductory video to iverilog, Design, and testbenches. This video is best briefed in question-answer format.
Note: iverilog (Icarus Verilog, http://iverilog.icarus.com/) is a free, open-source Verilog simulation and synthesis tool released under the GNU GPL.
Question- What is a simulator?
Answer- A design is written to meet a given set of specifications. A simulator is the tool used to check whether the final RTL design behaves according to those specifications.
Question- What is a testbench?
Answer- A testbench (TB) is code that applies stimulus to the RTL design in order to verify its functionality. The stimulus is applied as a set of test_vectors.
Question- How does a simulator work?
Answer- A simulator keeps watching for changes on the inputs and re-evaluates the outputs only when an input changes. No change in the input => no change in the output.
Question- Explain the testbench set-up using a block diagram.
Note: The design may have primary input(s) and primary output(s), but a testbench doesn't.
Question- What is gtkwave?
Answer- The iverilog simulator takes the design file and the testbench file as inputs and produces a vcd file (VCD: Value Change Dump, a file that records every change in signal value over simulation time). To view the vcd file, we use a tool called gtkwave. It is a GTK+ based waveform viewer (https://github.com/gtkwave/gtkwave).
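To make the design/testbench/VCD relationship concrete, here is a minimal sketch. The module names mux2_sketch and tb_mux2_sketch, the toggle periods, and the dump file name are assumptions made for illustration. They are not the workshop's good_mux.v or tb_good_mux.v.

```verilog
`timescale 1ns / 1ps

// Hypothetical 2:1 mux used only for illustration.
module mux2_sketch (input i0, input i1, input sel, output reg y);
  always @(*) begin
    if (sel) y = i1;
    else     y = i0;
  end
endmodule

// Testbench: it has no ports of its own. It drives the design
// and dumps every value change into a VCD file.
module tb_mux2_sketch;
  reg  i0, i1, sel;   // stimulus driven by the testbench
  wire y;             // response observed from the design

  mux2_sketch uut (.i0(i0), .i1(i1), .sel(sel), .y(y));

  initial begin
    $dumpfile("tb_mux2_sketch.vcd");   // file that gtkwave opens later
    $dumpvars(0, tb_mux2_sketch);      // record all signals below the testbench
    i0 = 1'b0; i1 = 1'b0; sel = 1'b0;  // initial input values
    #300 $finish;                      // end the simulation after 300 ns
  end

  // Toggle the inputs at different rates to generate test vectors.
  always #75 sel = ~sel;
  always #10 i0  = ~i0;
  always #55 i1  = ~i1;
endmodule
```

If both modules are saved in one file, compiling that file with iverilog and running the resulting a.out should leave tb_mux2_sketch.vcd in the working directory.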
There are three videos under this heading.
Video-1:
This is an introductory video to the labs (Lab-1). It explains the tool flow and the file set-up required for the other labs. The following steps are followed:
- Open RemoteSpark and open the Linux terminal.
- Paste this command in the terminal:
git clone https://github.com/kunalg123/vsdflow.git
//This creates a clone of vsdflow. You should see the folder on your desktop//
The below sequence of commands shows the files that are present in the vsdflow folder:
- In the vsdflow directory, clone another repository by using:
git clone https://github.com/kunalg123/sky130RTLDesignAndSynthesisWorkshop.git
By using the ls command, we should be able to see the following folder in the vsdflow folder:
- Check for the standard SKY130 library using the following sequence of commands:
- Come back to the sky130RTLDesignAndSynthesisWorkshop directory and get into the verilog_files folder.
note: use cd .. to come one step out of the present directory.
note: use cd foldername to enter the foldername directory.
Now use the ls command to view all the verilog source files and testbench files required for the workshop labs. A screenshot is provided for reference:
Video-2:
This is part-1 of the actual lab videos (Lab-2, part-1). It covers how to use iverilog and gtkwave. A 2x1 MUX is implemented (loaded).
- A Verilog file called good_mux.v (from the verilog_files directory) is compiled along with its TB file tb_good_mux.v. The command used for this purpose is iverilog good_mux.v tb_good_mux.v. A screenshot from the terminal is attached for reference:
- Once the above command is run, a new file called a.out appears in the verilog_files directory (refer below screenshot):
- The following command is used to execute this a.out file.
./a.out
This is going to create a dumpfile (refer below screenshot):
- This VCD file is then opened using the gtkwave command as shown below:
- After the above step, a new window pops up (GTKWave Analyzer). It is shown below:
- If we observe the timescale in the above pic, each time step is 1ps (1ps = 10^-12s), but the simulation time is 300000ps (refer step4 pic last line). Because of this, we can't observe the whole waveform until we zoom out. For this, click on this button (shown below):
Then the waveforms should appear as shown:
Check out the below figure to know what various other buttons do:
Video-3:
This is part-2 of the actual lab videos (Lab-3, part-2). In this video the file structure is analysed.
The following command is used to open the .v files:
gvim tb_good_mux.v -o good_mux.v
note: Check the directory before typing the above command. It should be ~/Desktop/vsdflow/sky130RTLDesignAndSynthesisWorkshop/verilog_files$
The following window then opens:
The following is the TB (design code is completely seen in the above pic. Note that the design follows behavioural style):
Note:
Firstly, if iverilog can perform both simulation and synthesis (below pic taken from the iverilog website http://iverilog.icarus.com/), why use Yosys?
A co-workshopian (is this word even there?) asked the same question and the TA replied:
This section also has three videos.
Video-1:
This video explains how to synthesize the good_mux design using Yosys.
Actually, the right video sequence for this heading is video2 -> video3 -> video1, so I'll document it in this sequence:
Video-2:
This video explains what logic synthesis is and how we go about synthesizing logic.
Question: What is RTL Design?
Answer: It is the behavioural representation of the given specification.
So, we have code (RTL) and we want a physical implementation of it (using NAND gates etc.). This mapping is done by synthesis.
Design (input) and frontend library (input) -> Decide which gates to use -> Make connections between these gates -> Create a netlist based on these connections (output)
Question: What is a library file (extension .lib)?
Answer: It is a 'bucket' of logic gates. It has different variations of the same gate (slow, medium, fast, 2-input, 3-input etc.).
Question: Why do we need different flavours of gates?
Video-3:
Answer: To meet the timing requirements (primarily... power and area are other criteria).
Note: Faster gates switch quickly but use wider transistors, so they occupy more area and draw more power. They might also lead to hold-time violations.
Just a quick recap: set-up time is the time for which the signal must already be stable before the clock edge arrives (I'm the signal waiting at the railway station for the clock train). Hold time is the time for which the signal must stay stable after the clock edge. The guidance offered to the synthesizer regarding set-up time and hold-time requirements is called 'constraints'.
Video-1:
A bit about Yosys from the abstract of its manual:
- The design is given as input to Yosys using the read_verilog command.
- The .lib is given as input using the read_liberty command.
- The output netlist is generated using the write_verilog command.
Note: The Liberty format is an industry standard for describing the library cells of a particular technology (https://vlsiuniverse.blogspot.com/2016/12/liberty-format-introduction.html). Its manual can be found at https://people.eecs.berkeley.edu/~alanmi/publications/other/liberty07_03.pdf.
Question: How to verify the synthesis?
Answer: Earlier we verified the design by giving the design file and the TB as inputs to iverilog and then observing the output (VCD file) using gtkwave. Now, we give the netlist (instead of the design file) and the TB as inputs to iverilog and observe the result in gtkwave. The waveforms should be exactly the same as those obtained with the design file.
Whatever was discussed regarding synthesis is implemented under this heading in three videos.
Video-1
- Yosys is invoked using the yosys command. NOTE: Check the directory before invoking yosys. It SHOULD be the verilog_files directory, as the paths to the my_lib files used below are relative to it.
- Use the read_liberty -lib command to read the library.
- Use the read_verilog command to read the design file.
Note: Need to explore the AST and RTLIL representations later. Kunal's reply when one of the workshopians raised this query:
It's an intermediate representation (like proto RTL)
You can find all documentation here
http://eddiehung.github.io/yosys/d1/d01/structRTLIL_1_1Design.html
- Use synth -top. More about this in a later part of the document.
Note: Later, explore all the data that pops up after this command.
Small bit of all that data is snapped and put below. It shows the final result:
- Use abc -liberty. Here, we need to specify the path to the library and be wary of the double underscore before 'tt' in sky130_fd_sc_hd__tt_025C_1v80.lib.
Note: This command basically converts the RTL into gates and decides which library gates it has to map to. Here too, explore the data. Result snippet below:
What happens if we execute the above command without specifying a particular library (in our case sky130)?
ABC loads its built-in library and executes the command.
courtesy: Sam Kambadur
- Use the show command to see the graphical version (a graphviz file with .dot extension) of the logic it has synthesized.
Note: The instructor's graphical version (shown below) is different from mine. Probably the library got updated with a mux cell.
Video-2
The instructor makes sense of the graphical version that he got.
Video-3
- Use write_verilog to get the netlist.
Note: We can give any name, but good_mux_netlist makes more sense.
- The netlist can be viewed using the !gvim good_mux_netlist.v command.
- We may choose to use write_verilog -noattr good_mux_netlist.v to get a more readable version of the netlist. The below pic shows the code after executing the !gvim command.
This section has three videos
Video-1
This video covers what exactly .lib file contains.
Use gvim ../my_lib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib to open the .lib file.
PVT: Process, Voltage, Temperature -> These are critical to the working of a design.
tt: Typical process corner (as in slow, fast, typical)
025C: Temperature (25 degrees Celsius)
1v80: Supply voltage (1.80 V)
Video-2
From the highlighted line onwards, as we go down we can observe various parameters mentioned.
Also, as discussed earlier, the .lib file contains information about variations of the same cell type. That can be observed below (a311o_2 and a311o_4 for example):
Note: in the gvim window, type the following command to get the .v file of the a2111o gate:
:sp ../my_lib/lib/sky130_fd_sc_hd__a2111o.behavioural.v
Instead, if we wish to see the .v file with power ports included, type
:sp ../my_lib/lib/sky130_fd_sc_hd__a2111o.behavioural.pp.v
The below snapshot is taken from the lecture (as I didn't want to tamper with the .lib file). It is observed that parameters such as power are different for different varieties of cells, even though they are all 2-input AND gates.
It was also observed by one of the workshopians that the attribute cell_footprint is the same for all the variants. Below is the summary of the discussion I had with him.
The cell_footprint attribute is given the same to all cells during the abc phase.
After PnR, whichever footprint class needs to be assigned (as per layout specs) would be assigned to this cell_footprint attribute
(by swapping with some other footprint class or not swapping) using the in_place_swap_mode attribute.
This section has two videos.
Video-1
Question: What does synth -top do?
- Use the command gvim multiple_modules.v
- Open yosys using the yosys command.
- Use read_liberty -lib ../my_lib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
- Use read_verilog multiple_modules.v.
Note: By mistake, I first executed the read_verilog command and then the read_liberty command. After that I used read_verilog again and the following error showed:
Later, I exited the yosys environment and re-entered the commands in the right sequence: read_liberty and then read_verilog. Ponder why the wrong sequence won't work.
- Use synth -top multiple_modules.
- Use abc -liberty ../my_lib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib.
Note: Using the show command on its own generates an error here, as the multiple_modules file has two sub-modules. So we need to specify exactly which module to show.
- Use show multiple_modules
Note: This is called the hierarchical design.
- Use write_verilog -noattr multiple_modules_hier.v and then use !gvim multiple_modules_hier.v.
Note: The instructor's library (probably an old one) implements the OR gate as below:
Check that the input literals are negated and then given as inputs to a NAND gate to realise an OR gate. This could also have been done with a NOR gate followed by an inverter. So, why did the tool 'choose' to use an extra inverter?
Note: My library used a different gate:
The details of this can be found at https://bit.ly/3F8tOUu
Video-2:
- Use flatten.
- Use write_verilog -noattr multiple_modules_flat.v.
Note: We don't see any hierarchies in this one (such as sub-module1 or sub-module2 etc.,). It's a single netlist.
3. Use exit.
4. Now follow this:
1. yosys
2. read_liberty -lib ../my_lib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
3. read_verilog multiple_modules.v
4. write_verilog -noattr multiple_modules.v
5. synth -top multiple_modules
6. flatten
7. show
Note that there are no sub-modules in this one, unlike the hierarchical case.
Given multiple modules, what is the way to synthesize at sub-module level?
1. exit
2. yosys
3. read_liberty -lib ../my_lib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
4. read_verilog multiple_modules.v
5. synth -top sub_module1
6. abc -liberty ../my_lib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
7. show
Question: Why sub-module level synthesis?
Answer:
- Let's say we have a sub-module that is instantiated multiple times in a module. Instead of synthesizing the same sub-module multiple times, it saves time to synthesize it once and replicate the result.
- If we give a very large module to the tool in one go, the tool might get overloaded (and the netlist becomes difficult for the designer to debug too). So we divide the whole module into various sub-modules and synthesize these sub-modules separately.
There are six videos under this heading. It explains how to code a flip-flop (FF), different types of FF available, and different coding styles possible. All the FF are available under verilog_files. Open these files using gvim.
Video-1
Question: Why Flip-Flops?
Theory: Glitch by propagation delay
It is evident from the above explanation that combinational circuits suffer from glitches. A cascade of combinational circuits is even more prone to them.
So we want an element that stores the value, making the output more stable and less prone to glitches. That is done by a FF.
In set-up1, the glitch gets propagated to the output. This doesn't happen in set-up2 because of the presence of the D-FF (it passes its input to the output only at posedge clk).
We need to initialise these FFs because, without initialisation, the combinational circuit fed by them would evaluate to a garbage value. Set/Reset are used to initialise a FF. These signals could be either synchronous (with clock) or asynchronous (independent of clock).
Video-2
From the dff_asyncres_syncres.v file:
- Observe that the synchronous reset is not part of the sensitivity list of the always block; only the clock and the asynchronous reset trigger it.
- If we wish to have asynchronous (or synchronous) Set instead of reset, just assign the output a value of 1 instead of 0 in the if/else conditions.
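The two observations above can be sketched as follows. This is a minimal illustration under the assumption of active-high resets. The module name dff_resets_sketch is hypothetical and the code is not copied from the workshop file.

```verilog
// Sketch of a D flip-flop with both an asynchronous and a synchronous reset.
module dff_resets_sketch (input clk, input async_reset, input sync_reset,
                          input d, output reg q);
  // async_reset appears in the sensitivity list, sync_reset does not.
  always @(posedge clk or posedge async_reset) begin
    if (async_reset)
      q <= 1'b0;        // takes effect immediately, independent of clk
    else if (sync_reset)
      q <= 1'b0;        // takes effect only at the next posedge clk
    else
      q <= d;           // normal operation: capture d on posedge clk
  end
endmodule
```

Replacing 1'b0 with 1'b1 in either branch would turn the corresponding reset into a set.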
Video-3:
We first analyse the D FF with asynchronous reset. Use the following commands:
1. iverilog dff_asyncres.v tb_dff_asyncres.v
2. ./a.out
3. gtkwave tb_dff_asyncres.vcd
Note: By mistake, I gave the extension in line 3 as .v instead of .vcd and the tool asked me to check whether it is a VCD file!
The right command gives the following waveform:
It may be observed that the output goes low as soon as async_reset goes high independent of the clock signal or D.
Next we analyse the D FF with synchronous and asynchronous reset. Use the following commands:
1. iverilog dff_asyncres_syncres.v tb_dff_asyncres_syncres.v
2. ./a.out
3. gtkwave tb_dff_asyncres_syncres.vcd
In the above snapshot, observe that when the clk is low the output is still high even if the sync_reset is high. As soon as the posedge clk is detected, the output falls to zero since the sync_reset is high.
That is not so for async_reset. As can be seen in the above pic, once async_reset is high, the output falls to zero irrespective of clk or D.
Video-4:
In this video, the above circuits are synthesized using yosys.
First is the asynchronous reset DFF. Follow the below commands:
1. read_liberty -lib ../my_lib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
2. read_verilog dff_asyncres.v
3. synth -top dff_asyncres
4. dfflibmap -liberty ../my_lib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib /*Many times we'll have a separate library for FFs, but here it's all in my_lib only*/
5. abc -liberty ../my_lib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
6. show
Next is the synchronous/asynchronous reset DFF. Follow the same commands as above, replacing dff_asyncres with dff_asyncres_syncres.
Video-5
Question: What happens when we try multiplying a 3-bit number by 2?
Use the following commands in the yosys environment:
1. read_liberty -lib ../my_lib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
2. read_verilog mult_2.v
3. synth -top mul_2
4. abc -liberty ../my_lib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
5. show
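As a rough sketch of what to expect: multiplying by 2 is a one-bit left shift, so the two hypothetical modules below describe the same function, and the second one is nothing but wiring. The module names are assumptions for illustration and are not the contents of mult_2.v.

```verilog
// Multiplication written with the * operator.
module mul2_sketch (input [2:0] a, output [3:0] y);
  assign y = a * 2;          // largest result is 7 * 2 = 14, which fits in 4 bits
endmodule

// The same function written as wiring: append a zero at the LSB.
module mul2_shift_sketch (input [2:0] a, output [3:0] y);
  assign y = {a, 1'b0};      // y[3:1] = a, y[0] = 0
endmodule
```

Since no logic is needed, one would expect the synthesized design to contain no cells at all.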
Video-6
Multiplying a 3-bit number by 9.
Check out the below theory:
Use following commands
1. read_liberty -lib ../my_lib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
2. read_verilog mult_8.v
3. synth -top mult8
4. abc -liberty ../my_lib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
5. show
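The arithmetic behind this case can be sketched as below. For a 3-bit a, a * 9 = a * 8 + a. Here a * 8 is a shifted left by three places, so adding a only fills the three empty low bits and never produces a carry. The module names are hypothetical and the code is an illustration, not the contents of mult_8.v.

```verilog
// Multiplication written with the * operator.
module mul9_sketch (input [2:0] a, output [5:0] y);
  assign y = a * 9;          // largest result is 7 * 9 = 63, which fits in 6 bits
endmodule

// The same function as wiring: a*8 + a = {a, 3'b000} + a = {a, a}.
module mul9_concat_sketch (input [2:0] a, output [5:0] y);
  assign y = {a, a};         // y[5:3] = a and y[2:0] = a
endmodule
```

For example, a = 3'b101 (5) gives {a, a} = 6'b101101 = 45 = 5 * 9.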
This section has three videos. The first video briefs about combinational optimisation techniques while the next two videos brief about sequential optimisations.
Video-1:
This is called Boolean Logic Optimisation.
Video-2:
Sequential logic optimisation techniques could be broadly classified as basic techniques (constant propagation) and advanced techniques. The following are the advanced techniques:
- State optimisation
- Retiming
- Sequential Logic Cloning (Floor Plan Aware synthesis)
Note: If the same set-up as above were built with a FF having a Set input instead of Reset, we'd be tempted to say that the output simply follows the Set signal. But upon drawing the timing diagram, we'd realise that Set can make the output high asynchronously, but can't make it low asynchronously!
Video-3:
- State Optimisation: Minimizing the number of states to reduce the number of FF used.
- Cloning: Observe the following figure
But how is this an optimisation?
- Retiming: Observe the following figure
This heading has two videos under it.
Video-1:
We use the 'opt' files. These files can be listed using the ls *opt* command.
The following snippet shows what's in the opt_check.v:
The following snippet shows what's in the opt_check2.v:
So basically, opt_check is a mux that acts as an AND gate, while opt_check2 is a mux that acts as an OR gate.
Upon synthesis, we see that the tool actually uses AND gate and OR gate instead of MUX based implementation. Refer below pics.
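A minimal sketch of why a mux with a constant input collapses to a single gate is given below. The module names and the exact ternary expressions are assumptions for illustration. They are one way of writing such muxes and are not claimed to be the contents of opt_check.v and opt_check2.v.

```verilog
// Mux with a constant 0 on the "select = 0" input.
module mux_as_and_sketch (input a, input b, output y);
  assign y = a ? b : 1'b0;   // a = 1 -> y = b, a = 0 -> y = 0, i.e. y = a & b
endmodule

// Mux with a constant 1 on the "select = 1" input.
module mux_as_or_sketch (input a, input b, output y);
  assign y = a ? 1'b1 : b;   // a = 1 -> y = 1, a = 0 -> y = b, i.e. y = a | b
endmodule
```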
Note: The instructor uses the command opt_clean -purge before running abc, to apply all the optimisations. Interestingly, I haven't used that command and still got an optimised output! This is probably because yosys automatically does some basic optimisation even if we don't ask it to.
Video-2:
Testing the optimisation for opt_check3.v, opt_check4.v, and multiple_module_opt.v. Note that the designs having multiple modules should be flattened before optimisation (but why?).
Refer below snippets:
opt_check3.v below:
opt_check4.v below:
multiple_module_opt.v below (flattened the hierarchy):
Note: Synthesizing the above design, without flattening it resulted in the same graphviz too. So, why exactly should we flatten before optimizing?
Implemented the multiple_module_opt without flattening and without opt and the graphviz file is as below:
Video-1:
The files dff_const1.v and dff_const2.v are used. Opening these files using gvim dff_const1.v -o dff_const2.v opens both of them in the same window.
Video-2:
The below snippet is for dff_const3.v:
Video-3:
After synthesizing, this is what is found:
The below snippet shows dff_const4.v and dff_const5.v opened in gvim:
The below snippet shows the synthesized dff_const4.v and dff_const5.v:
Let's say we are getting a 3-bit output from a module, while we use only one of these bits as a primary output. The remaining bits which are not being used would be optimised. For example, consider:
But the synthesizer actually gives the below output:
(The output is taken and inverted and fed as input to the same FF)
This is called SEQUENTIAL OPTIMISATION FOR UNUSED OUTPUTS
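A minimal sketch of such a design is a 3-bit counter of which only the LSB is used. The module name counter_lsb_sketch and the active-high asynchronous reset are assumptions made for illustration.

```verilog
// 3-bit counter where only bit 0 reaches a primary output.
module counter_lsb_sketch (input clk, input reset, output q);
  reg [2:0] count;
  assign q = count[0];        // count[2:1] never reach an output

  always @(posedge clk or posedge reset) begin
    if (reset)
      count <= 3'b000;
    else
      count <= count + 1;     // count[0] simply toggles on every clock
  end
endmodule
```

Because count[0] toggles on every clock edge and does not depend on count[2:1], a single flop with its inverted output fed back to its input is enough to produce q.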
Note: Need to explore it further by changing the source file a little bit (changing output bit length, changing the initial state etc.,)
Question: What is GLS?
Answer: Originally we ran simulations with the RTL code as the unit under test. Now we'll run the simulation with the netlist as the unit under test. This is called Gate Level Simulation (or GLS in short).
Question: Why GLS?
Answer: We wish to verify whether the logic holds even after synthesis (reason why it might be different is answered next). Also, GLS considers the timing as we run it with Delay annotation (https://www.linkedin.com/pulse/gate-level-simulation-comprehensive-overview-jerry-mcgoveran/).
Question: Why does Synthesis-Simulation Mismatch happen?
Answer: Mostly three reasons:
- Missing sensitivity list: For example, for a mux, let's say we make the output respond only to a change in select (but not to a change in the inputs).
- Blocking/Non-blocking mishap: Let's say we have two FFs connected back to back, with the output of the first feeding the input of the second. With blocking assignments, if the first FF is assigned before the second, the second FF sees the already-updated value and the two FFs collapse into one. Use non-blocking assignments for sequential circuits.
- Non-standard Verilog coding
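The first reason can be sketched as follows. Both modules below describe a 2:1 mux, but the first one reacts only to sel in simulation, while a synthesizer would infer an ordinary mux from either. The module names are hypothetical.

```verilog
// Incomplete sensitivity list: in simulation, y is re-evaluated only when
// sel changes, so changes on i0 or i1 are missed until sel toggles.
module mux_bad_sens_sketch (input i0, input i1, input sel, output reg y);
  always @(sel) begin
    if (sel) y = i1;
    else     y = i0;
  end
endmodule

// Complete sensitivity list: y is re-evaluated on any input change,
// so simulation matches the synthesized mux.
module mux_good_sens_sketch (input i0, input i1, input sel, output reg y);
  always @(*) begin
    if (sel) y = i1;
    else     y = i0;
  end
endmodule
```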
Another instance of simulation-synthesis mismatch:
\snippet of full code\
always @(*)
begin
y = m&c;
m = a|b;
end
Observe that here m is used before its value gets updated, so the previous value of m is used. In simulation this behaves as if m were registered, but synthesis gives y = c.(a+b). This is a simulation-synthesis mismatch. (Doubt: Simulation without optimisation and synthesis with optimisation.. would that be a sim-synth mismatch?)
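One way to remove this mismatch is to order the blocking assignments so that m is computed before it is used. The sketch below assumes a, b and c are inputs and y is the output. The module name is hypothetical.

```verilog
// Blocking assignments in dependency order: m is updated first, then used.
module blocking_order_sketch (input a, input b, input c, output reg y);
  reg m;                      // intermediate value, not a flop
  always @(*) begin
    m = a | b;                // compute m first
    y = m & c;                // y now uses the current m: y = c.(a+b)
  end
endmodule
```

With this ordering the simulation evaluates y = c.(a+b) in the same time step, which is what the synthesized logic implements.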
The files ternary_operator_mux.v and its TB are given to iverilog and then the waveforms are viewed in gtkwave analyser:
The above waveform shows the behaviour of 2x1 mux.
Now we synthesize it and check:
Use the following commands:
1. iverilog ../my_lib/verilog_model/primitives.v ../my_lib/verilog_model/sky130_fd_sc_hd.v ternary_operator_mux.v tb_ternary_operator_mux.v
2. ./a.out
3. gtkwave tb_ternary_operator_mux.vcd
Blocking caveat implemented before GLS is shown below:
The GLS is shown below:
Cross verifying with RTL gtkwave, we observe that the waveforms won't match.
If statement
- If-else statements are implemented as a series of muxes.
- Bad coding style leads to inferred latches (caused by incomplete if statements).
- Inferred latches are acceptable in sequential circuit design but should not be present in combinational circuit design.
Case statement
- Case statement is also going to infer a mux
- Incomplete case statement will also lead to inferred latches (for the undefined cases, the variables will latch onto output).
- Case statements with default case will avoid inferred latches.
- Partial assignment is another caveat that creates inferred latches. (Assign all the outputs in all the segments of the case.)
Note that if-else is evaluated in priority order: once a condition is satisfied, its branch executes and the remaining branches are skipped. A case statement is meant for mutually exclusive choices: the items are compared one by one against the selector, so if two items can match the same value, simulation and synthesis may disagree. That is why we should not have overlapping case items.
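The latch inference described above can be sketched with two small modules. The names and port lists are assumptions for illustration and are not the workshop's incomp_if.v.

```verilog
// Incomplete if: when i0 is 0 nothing is assigned, so y must hold its
// previous value. A latch enabled by i0 is inferred.
module incomp_if_sketch (input i0, input i1, output reg y);
  always @(*) begin
    if (i0)
      y = i1;
  end
endmodule

// Complete if-else: y is assigned on every path, so the result is
// purely combinational (here y = i0 & i1).
module comp_if_sketch (input i0, input i1, output reg y);
  always @(*) begin
    if (i0)
      y = i1;
    else
      y = 1'b0;
  end
endmodule
```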
The file incomp_if.v is analysed:
Input the following commands:
1. gvim *incomp* -o /this opens all the files that have the string 'incomp' in their file name/
2. iverilog incomp_if.v tb_incomp_if.v
3. ./a.out
4. gtkwave tb_incomp_if.vcd
Upon synthesis, it yields following graphviz:
The file incomp_if2.v is analysed:
Upon simulation:
Upon synthesis:
Use gvim comp_case.v -o incomp_case.v -o partial_case_assign.v -o bad_case.v to open the source of these files in gvim.
The following are the observations for incomp_case.v:
Upon simulation:
Upon synthesis:
The following are the observations for comp_case.v:
Upon simulation:
Upon synthesis:
The following is a partial assign case:
Consider the code snippet below with i0, i1, and i2 as inputs, and x and y as outputs:
case(sel)
    2'b00 : begin
        y = i0;
        x = i2;
    end
    2'b01 : y = i1;
    default : begin
        x = i1;
        y = i2;
    end
endcase
In case 01, x is not defined. So, even if we have a default case, when the case is 01, x will infer a latch.
Observe that there are no latches in the path of y. There is one latch in x's path.
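One common way to avoid the latch on x is to give every output a default value before the case statement. The sketch below is an assumption about how the snippet could be wrapped and fixed. The module name is hypothetical and the choice of i1 as the default for x is arbitrary.

```verilog
module partial_case_fixed_sketch (input i0, input i1, input i2,
                                  input [1:0] sel,
                                  output reg y, output reg x);
  always @(*) begin
    // Defaults assigned up front so every path drives both x and y.
    x = i1;
    y = i2;
    case (sel)
      2'b00 : begin
        y = i0;
        x = i2;
      end
      2'b01 : y = i1;   // x keeps its default instead of inferring a latch
      default : ;       // defaults above already cover the remaining values
    endcase
  end
endmodule
```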
The following is a bad case:
Consider the code snippet below with i0, i1, and i2 as inputs, and y as output:
case(sel)
2'b00 : y = i0;
2'b01 : y = i1;
2'b10 : y = i2;
2'b1? : y = i1;
endcase
This is a bad case because the items 2'b10 and 2'b1? overlap: when the MSB of the select line is high, it is ambiguous whether y should be assigned i2 or i1.
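A non-overlapping version lists each select value exactly once. The sketch below assumes the intent was for sel = 2'b11 to select i1. That intent and the module name are assumptions.

```verilog
// Every value of sel matches exactly one item, so there is no ambiguity.
module good_case_sketch (input i0, input i1, input i2,
                         input [1:0] sel, output reg y);
  always @(*) begin
    case (sel)
      2'b00 : y = i0;
      2'b01 : y = i1;
      2'b10 : y = i2;
      2'b11 : y = i1;   // assumed intent of the original 2'b1? item
    endcase
  end
endmodule
```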
Loop constructs are of two types: for loop (evaluating expressions) and generate for loop (instantiating hardware multiple times)
For loops are used in 'always' block, while the Generate for loops are used outside of the always blocks.
- For loop comes in handy in implementing very wide mux/de-mux
- We can also use if-generate block, but for-generate and if-generate should be used only outside the always block.
One example where for generate comes in handy would be a ripple carry adder where a full adder needs to be instantiated multiple times.
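A minimal sketch of that ripple carry adder idea is shown below. The module names fa_sketch and rca_sketch, the parameter N, and the port names are all assumptions for illustration. They are not the workshop's files.

```verilog
// 1-bit full adder.
module fa_sketch (input a, input b, input cin, output sum, output cout);
  assign {cout, sum} = a + b + cin;
endmodule

// N-bit ripple carry adder built by instantiating the full adder N times.
module rca_sketch #(parameter N = 8)
                   (input [N-1:0] a, input [N-1:0] b, output [N:0] sum);
  wire [N:0] carry;
  assign carry[0] = 1'b0;              // no carry into the first stage

  genvar i;
  generate
    // for-generate sits outside any always block and replicates hardware.
    for (i = 0; i < N; i = i + 1) begin : fa_stage
      fa_sketch u_fa (.a(a[i]), .b(b[i]), .cin(carry[i]),
                      .sum(sum[i]), .cout(carry[i+1]));
    end
  endgenerate

  assign sum[N] = carry[N];            // final carry becomes the MSB of the sum
endmodule
```

Changing N scales the adder without touching the body of the module.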
Use gvim mux_generate.v to open the .v file.
The following is the code snippet:
for (k = 0; k < 4; k = k+1)
begin
    if (k == sel)
        y = i_int[k];
end
This generates a 4x1 mux. Just by changing the max value of k, we can scale up the design!
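For context, the loop can be placed in a complete module along these lines. The port list, the packing of the inputs into i_int, and the default assignment to y are assumptions for illustration. The module name is hypothetical.

```verilog
// 4:1 mux described with a for loop inside an always block.
module mux_for_sketch (input i0, input i1, input i2, input i3,
                       input [1:0] sel, output reg y);
  wire [3:0] i_int;
  assign i_int = {i3, i2, i1, i0};   // pack the inputs so they can be indexed

  integer k;
  always @(*) begin
    y = 1'b0;                        // default so y is assigned on every path
    for (k = 0; k < 4; k = k + 1) begin
      if (k == sel)
        y = i_int[k];                // only the iteration matching sel assigns y
    end
  end
endmodule
```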
After simulation:
Note: Download tools and explore other files too. There is still ripple carry adder to be done.
Thanks to VSD for an insightful workshop. I'll keep updating this repo as I explore more of sky130.