RARP sometimes doesn't work after power-up or reboot of FPGA #227

Closed
mkrivda opened this issue Dec 13, 2023 · 25 comments

Comments

@mkrivda

mkrivda commented Dec 13, 2023

Hello.

After power-up or reboot of the FPGA we have an issue with RARP.
Sometimes it does not work, and it is not clear why.
When we check the FPGA via JTAG, we always see that it was programmed correctly.
We are using ipbus fw v1.13.

Marian

@dpcsankey
Collaborator

Tell me more.
Is it one particular module, or a random module across a whole set (how large)?
Do you see the RARP requests going out on the network?
Does the module think that it's got an IP address?

@mkrivda
Author

mkrivda commented Dec 13, 2023

A random module from a set of ~20 modules.
We don't see a RARP request from this module.

How can I check whether the module thinks it has got an IP address?

@dpcsankey
Collaborator

dpcsankey commented Dec 13, 2023

There's a port on ipbus_ctrl: Got_IP_addr: OUT std_logic;
In our designs I use this to control the 1 Hz blink LED, so I double-blink it whilst waiting for IP address.
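
A minimal sketch of that double-blink idea, not the code used in the ipbus-firmware designs. The entity name ip_status_led, the CLK_HZ generic and its default value, and the eight-slot timing are all assumptions made for illustration; only the Got_IP_addr port comes from ipbus_ctrl. The status input is used without a synchroniser, which is assumed to be acceptable for an LED.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical helper: plain 1 Hz blink once an IP address is held,
-- two short flashes per second while still waiting for one.
entity ip_status_led is
  generic (
    CLK_HZ : positive := 31_250_000  -- assumed clock frequency, must be >= 8
  );
  port (
    clk         : in  std_logic;
    got_ip_addr : in  std_logic;  -- connect to Got_IP_addr of ipbus_ctrl
    led         : out std_logic
  );
end entity;

architecture rtl of ip_status_led is
  constant SLOT_LEN : positive := CLK_HZ / 8;  -- one second is split into 8 slots
  signal tick : natural range 0 to SLOT_LEN - 1 := 0;
  signal slot : natural range 0 to 7 := 0;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- Advance the slot counter once every SLOT_LEN clock cycles.
      if tick = SLOT_LEN - 1 then
        tick <= 0;
        if slot = 7 then
          slot <= 0;
        else
          slot <= slot + 1;
        end if;
      else
        tick <= tick + 1;
      end if;

      if got_ip_addr = '1' then
        -- Address obtained: on for half a second, off for half a second.
        if slot < 4 then
          led <= '1';
        else
          led <= '0';
        end if;
      else
        -- Still waiting: flashes in slots 0 and 2, dark for the rest.
        if slot = 0 or slot = 2 then
          led <= '1';
        else
          led <= '0';
        end if;
      end if;
    end if;
  end process;
end architecture;
```

With an LED driven this way, "dark" means the clock is not running, a steady 1 Hz blink means an address is held, and a double blink means RARP or DHCP has not completed yet.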

@mkrivda
Author

mkrivda commented Dec 13, 2023

The 1 Hz blink LED is off.
When I access the FPGA via JTAG (VIVADO hw manager), I see that the FPGA is programmed.
After "refresh fpga" there is no change.
After "boot fpga from memory device" it starts to work.

@mkrivda
Author

mkrivda commented Dec 14, 2023

I have power-cycled the board that fails most frequently 10 times.
8 times the RARP request did not come.
2 times the RARP request came.

We did not see this behavior before.

@dpcsankey
Collaborator

The default way of driving the '1 Hz' LED is to AND the '1 Hz' signal coming out of the 'clocks' entity with the locked signals from the MMCM(s) for the IPBus clock and the Ethernet clock (the details depend on which PHY you are using).
So no blink would suggest no lock?

@mkrivda
Author

mkrivda commented Dec 14, 2023

The clocks from the Si5345 are present.
I don't understand why the MMCM is not able to lock.

Another hint:
Was there any change to the IP cores (MAC and PHY) for ver. 1.13?
I have only upgraded the old versions of them using VIVADO 2020.2.

@dpcsankey
Collaborator

Which PHY are you using?
What version had you been running previously?
git diff suggests that the release notes for the various releases are true and I see that Alessandro played with the clock constraints in v1.10, #107

@mkrivda
Author

mkrivda commented Dec 14, 2023

I am using PHY ver. 16.2.
Before it was 16.1.

I don't understand why ipbus_clk should be constrained separately if it is a generated clock.
In my constraints I have only sysclk.


set_clock_groups -asynchronous -group [get_clocks -include_generated_clocks sysclk] \
-group [get_clocks -include_generated_clocks eth_refclk] \
-group [get_clocks -include_generated_clocks {ddr4_0_inst0_c0_sys_clk_p ddr4_1_inst0_c0_sys_clk_p sys_clk_p_i}] \
-group [get_clocks -include_generated_clocks onu_clk_rxref240] \
-group [get_clocks -include_generated_clocks {CLKBC40 gth_ref_clk}] \
-group [get_clocks -include_generated_clocks rxoutclk_out[0]_1] \
-group [get_clocks -include_generated_clocks rxoutclk_out[0]_2] \
-group [get_clocks -include_generated_clocks rxoutclk_out[0]_3]

@dpcsankey
Collaborator

These constraints wouldn't affect the MMCM tho'.

My take so far, I don't think it's me! If you've got the standard 1 Hz blink gated with lock and you see no blink then this says no lock. So this points to the instantiation of the MMCMs? Also there haven't been changes in the default IP since release v1.5 (gig_eth_pcs_pma_gmii_to_sgmii_bridge)

@mkrivda
Author

mkrivda commented Dec 14, 2023

It doesn't get mmcm_locked from gig_ethernet_pcs_pma_basex_156_25.
The 156.25 MHz clock is present.
I will try to re-generate the IP core.

@mkrivda
Author

mkrivda commented Mar 4, 2024

I have re-generated the IP core gig_ethernet_pcs_pma_basex_156_25.
I have enabled DHCP instead of RARP.
I still see the same problem.
Do you know what else I can check?

@dpcsankey
Collaborator

If I look at the ports on ipbus_ctrl it sounds like rst_macclk is never asserted.
Looking at the ports on your clocks entity (clocks_usp_serdes?) this corresponds to rsto_125 never being asserted.
On the old designs (say clocks_7s_extphy) this was forced by the rctr logic, but with clocks_usp_serdes that logic is only in the clk_ipb_b clock domain. Could this be a race condition with dcm_locked being too quick?

@mkrivda
Author

mkrivda commented Mar 5, 2024

I have followed the signal from the ipbus LED.

Step 1:
locked <= clk_locked and eth_locked;
eth_locked -> "0"
clk_locked -> "1"

Step 2:
eth_locked <= resetdone and mmcm_locked;
resetdone -> "0"
MMCM_locked -> "1"

resetdone is an output of gig_ethernet_pcs_pma_basex_156_25.
The only reset input of gig_ethernet_pcs_pma_basex_156_25 is "rsti":
rsti => rst_eth
rsto_eth <= rst; -- ethernet startup reset (required!)
rst <= nuke_d2 or not dcm_locked;

It seems that the rst_eth reset is not performed.
dcm_locked -> "1" (it was checked in step 1)

@mkrivda
Author

mkrivda commented Mar 5, 2024

I have checked 2 signals (please see attached pictures):

  • rst_eth (yellow line)
  • rst125 (red line)

When IPbus is not working after reboot, rst125 always stays at "1".
[Image: IPbus_dead_after_reboot]
[Image: IPbus_ok_after_reboot]

@mkrivda
Author

mkrivda commented Mar 5, 2024

rst_eth is sent to gig_ethernet_pcs_pma_basex_156_25, but resetdone is "0" and eth_locked is "0".
eth_done <= (eth_done or eth_locked) then keeps the signal rst125 at "1" forever.
The question is:
Why does gig_ethernet_pcs_pma_basex_156_25 sometimes not accept rst_eth?
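
The chain traced in the comments above can be condensed into a small sketch. The expressions for eth_locked and eth_done are the ones quoted in this thread; the entity name eth_done_latch, the port list, and in particular the line deriving rst125 from eth_done are assumptions made for illustration and are not copied from clocks_usp_serdes.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical condensed model of the reset chain discussed in this thread.
entity eth_done_latch is
  port (
    clk         : in  std_logic;
    rst         : in  std_logic;  -- nuke_d2 or not dcm_locked
    resetdone   : in  std_logic;  -- from gig_ethernet_pcs_pma_basex_156_25
    mmcm_locked : in  std_logic;  -- from gig_ethernet_pcs_pma_basex_156_25
    rst125      : out std_logic
  );
end entity;

architecture rtl of eth_done_latch is
  signal eth_locked : std_logic;
  signal eth_done   : std_logic := '0';
begin
  -- Quoted in the thread (step 2).
  eth_locked <= resetdone and mmcm_locked;

  process (clk)
  begin
    if rising_edge(clk) then
      -- Quoted in the thread: a set-only latch on eth_locked.
      eth_done <= eth_done or eth_locked;
    end if;
  end process;

  -- ASSUMPTION: rst125 is held until eth_done has been set.
  rst125 <= rst or not eth_done;
end architecture;
```

Under these assumptions, if resetdone never rises then eth_locked stays at '0', eth_done is never set, and rst125 stays at '1' regardless of mmcm_locked. That matches the stuck trace described above.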

@dpcsankey
Collaborator

Can you remind me which chip you are targeting? Poking around it looks like the PHY is either failing to lock its MMCM or it's failing to complete its reset, so we fail to see locked come out of it, but we need to poke at that now.

@mkrivda
Author

mkrivda commented Mar 5, 2024

I use Kintex Ultrascale, XCKU040...2E and XCKU060...2E.

@mkrivda
Author

mkrivda commented Mar 6, 2024

mmcm_locked_out from gig_ethernet_pcs_pma_basex_156_25 was checked in Step 2 (MMCM_locked -> "1"), so I guess it is the reset that is failing.

@mkrivda
Author

mkrivda commented Mar 12, 2024

Is there anything else related to the PHY reset that should be checked?

@mkrivda
Author

mkrivda commented Apr 5, 2024

Is there any progress on this issue?

@mkrivda
Author

mkrivda commented Jun 4, 2024

I have implemented ipbus_icap_us_usp and ipbus_iprog_us_usp.
Both use ICAPE3.
Rebooting the FPGA via IPROG gives the same result as described above.

@dpcsankey
Collaborator

I was wondering if I was seeing something similar with eFEX in Point 1, where we reboot OK doing DHCP negotiation but have some garbage on the network which results in alarms for the NetAdmins.
We did packet sniffing with CERN IT this week and my problem looks different to yours.
On mine it looks like the initial reset is missing on reload (MMCM stays locked???), but toggling the enable signal once I've determined the MAC address starts the DHCP negotiation, so as I said in #238 this is exactly what the enable port is for.
On yours it looks like the gig_ethernet_pcs_pma_basex_156_25 doesn't come out of reset. All I can really suggest is to compare the implementation to https://docs.amd.com/r/en-US/pg047-gig-eth-pcs-pma, possibly generating the equivalent example design, and/or to add a state machine that kicks rst_eth again if it sticks.
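
A minimal sketch of such a "kick the reset again" state machine, under stated assumptions. The entity name eth_reset_watchdog, the generics and their default values are hypothetical. The clock is assumed to be free-running and independent of the PHY, and rst_in is assumed to be synchronous to it. This is not taken from ipbus-firmware or from the PG047 example design.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical watchdog: pulse the PHY reset, wait for lock, pulse again on timeout.
entity eth_reset_watchdog is
  generic (
    TIMEOUT_CYCLES : positive := 50_000_000;  -- assumed wait before retrying
    PULSE_CYCLES   : positive := 1_000        -- assumed reset pulse length
  );
  port (
    clk        : in  std_logic;  -- free-running clock, not sourced from the PHY
    rst_in     : in  std_logic;  -- original reset, e.g. rsto_eth
    eth_locked : in  std_logic;  -- resetdone and mmcm_locked from the PHY
    rst_out    : out std_logic   -- drives the PHY reset input (rsti)
  );
end entity;

architecture rtl of eth_reset_watchdog is
  function max_of(a, b : positive) return positive is
  begin
    if a > b then
      return a;
    end if;
    return b;
  end function;

  type state_t is (PULSE, WAIT_LOCK, DONE);
  signal state : state_t := PULSE;  -- start with a pulse after configuration
  signal count : natural range 0 to max_of(TIMEOUT_CYCLES, PULSE_CYCLES) - 1 := 0;
  signal locked_meta, locked_sync : std_logic := '0';
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- Two-stage synchroniser for the status coming from the PHY clock domain.
      locked_meta <= eth_locked;
      locked_sync <= locked_meta;

      if rst_in = '1' then
        state <= PULSE;
        count <= 0;
      else
        case state is
          when PULSE =>
            -- Hold the reset for PULSE_CYCLES, then release it and start waiting.
            if count = PULSE_CYCLES - 1 then
              count <= 0;
              state <= WAIT_LOCK;
            else
              count <= count + 1;
            end if;
          when WAIT_LOCK =>
            if locked_sync = '1' then
              state <= DONE;
            elsif count = TIMEOUT_CYCLES - 1 then
              -- The PHY did not come out of reset in time: kick it again.
              count <= 0;
              state <= PULSE;
            else
              count <= count + 1;
            end if;
          when DONE =>
            -- If lock is lost later, go back to waiting and retry on timeout.
            if locked_sync = '0' then
              count <= 0;
              state <= WAIT_LOCK;
            end if;
        end case;
      end if;
    end if;
  end process;

  rst_out <= '1' when (rst_in = '1' or state = PULSE) else '0';
end architecture;
```

In a design like the one discussed here, rst_out would replace the direct rst_eth connection to rsti. Suitable values for the two generics depend on the PHY configuration and would need to be checked against PG047.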

@mkrivda
Author

mkrivda commented Jun 18, 2024

I have recompiled the test_logic fw using VIVADO 2023.2 and the problem has disappeared.
So I need to use the new version of VIVADO for the production fw as well.

@mkrivda mkrivda closed this as completed Jun 28, 2024
@mkrivda
Author

mkrivda commented Jun 28, 2024

All types of fw are ok after recompilation with VIVADO 2023.2.
