Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vif-plug-representor: fix support for plugging in PF ports #14

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
77 changes: 56 additions & 21 deletions lib/vif-plug-providers/representor/vif-plug-representor.c
Original file line number Diff line number Diff line change
Expand Up @@ -732,40 +732,75 @@ vif_plug_representor_port_prepare(const struct vif_plug_port_ctx_in *ctx_in,
if (ctx_in->op_type == PLUG_OP_REMOVE) {
return true;
}

const char *opt_flavor = smap_get(&ctx_in->lport_options,
"vif-plug:representor:flavor");
const char *opt_pf_mac = smap_get(&ctx_in->lport_options,
"vif-plug:representor:pf-mac");
const char *opt_vf_num = smap_get(&ctx_in->lport_options,
"vif-plug:representor:vf-num");
if (!opt_pf_mac || !opt_vf_num) {
return false;
}

const char *opt_bus_name = smap_get(&ctx_in->lport_options,
Copy link
Member

@dshcherb dshcherb Jul 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My suggestion would be to reuse the PF MAC for this: when a vf-num is not specified, we know that this is a request to plug a PF.

The issue with using the PCI info for plugging is that the main (hypervisor) host and the DPU host see two different PCIe topologies and PCI addresses don't match. Whereas the burned-in MAC can be seen from both the hypervisor host and the DPU host via devlink.

Requiring DPU-side information to be passed in vif-plug:representor fields makes an implicit requirement that the requester has privileged access to look that information up. However, the original design decision when writing this was to infer the DPU-side information based on the host-side information (e.g. both hosts can see the PF MAC, board serial, VF numbers). This allows you to have privilege separation between:

  1. the hypervisor service that picks a free PF/VF and is the source for the information to do the lookup of a representor
  2. the CMS that sets values for vif-plug:representor:<field>
  3. the DPU-side service (ovn-controller).

The hypervisor-facing MAC is possible to get via devlink from the DPU:

# Run from the DPU host to get a MAC of a PF via its representor: hw_addr is the MAC address of a PF facing the main host.

ubuntu@bf3:~$ devlink -j port show pci/0000:03:00.0/65536 | jq
{
  "port": {
    "pci/0000:03:00.0/65536": {
      "type": "eth",
      "netdev": "pf0hpf",
      "flavour": "pcipf",
      "controller": 1,
      "pfnum": 0,
      "external": true,
      "splittable": false,
      "function": {
        "hw_addr": "a0:88:c2:42:42:42"
      }
    }
  }
}
# Run from the main (hypervisor) host to get a PF MAC which matches the above hw_addr
ubuntu@hypervisor:~$ ip -j link show enp1s0np0 | jq .[].address
"a0:88:c2:42:42:42"

# If the main host-side MAC for a netdev has been changed programmatically you can still retrieve it via the permaddr (IFLA_PERM_ADDRESS). But this only matters if you collect this info at the main host.
# https://github.com/iproute2/iproute2/commit/2b8e699

In summary: I would add and initialize a separate index to the port table to do a lookup of a PF based on a PF MAC address passed in and in the flavor == DEVLINK_PORT_FLAVOUR_PCI_PF branch that you added use a function to look up a PF by a PF MAC from vif-plug:representor:pf-mac.

/* Port table.
*
* This data structure contains three indexes:
*
* mac_vf_table - port_node by PF MAC and VF number.
* ifindex_table - port_node by netdev ifindex.
* bus_dev_table - port_node by bus/dev name (only contains PHYSICAL and
* PCI_PF ports).
*
* There is a small number of PHYSICAL and PF flavoured ports per device. We
* will need to refer to these for every update we get to a VF in order to
* maintain the PF MAC+VF number index.
*
* Note that there is not really any association between PHYSICAL and PF
* representor ports from the devlink data structure point of view. However
* for systems running a kernel that does not provide the host facing MAC
* through devlink on the PF representor there is a compatibility interface in
* sysfs which is relative to a PHYSICAL ports netdev name (see the
* compat_get_host_pf_mac function).
*/
struct port_table {
struct hmap mac_vf_table; /* Hash table for lookups by mac+vf_num */
uint32_t mac_seed; /* We reuse the OVS mac+vlan hash functions for the
* PF MAC+VF number, and they require a uint32_t seed */
struct hmap ifindex_table; /* Hash table for lookups by ifindex */
struct hmap bus_dev_table; /* Hash table for lookup of PHYSICAL and PF
* ports by their bus_name/dev_name string.
* While there is a large number of VFs or SFs
* they will be associated with a small number
* of PFs */
};
static struct port_table *port_table;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking the time to review @dshcherb much appreciated!

@iulhaq I tend to agree with the point @dshcherb puts forward about the two sides of the PCIe complex being enumerated independently. I have seen with my own eyes systems that expose a behavior where the PCI address for the device do not match.

You addition of a pf-num option is useful though as we do actually currently have a bug for systems that expose two PFs to the host and where the host put them into a bond. The reason we did not catch that is that we used the vendor recommended approach which was to bond at the DPU side and expose only one PF to the host (Example from NVIDIA BlueField documentation).

What would you think of plugging the PF representor on the basis of lack of vf-num option provided, avoiding the addition of the PCI and device name information?

"vif-plug:representor:bus-name");
const char *opt_dev_name = smap_get(&ctx_in->lport_options,
"vif-plug:representor:dev-name");
const char *opt_pf_num = smap_get(&ctx_in->lport_options,
"vif-plug:representor:pf-num");
/* Ensure lookup tables are up to date */
vif_plug_representor_run(NULL);

struct eth_addr pf_mac;
if (!eth_addr_from_string(opt_pf_mac, &pf_mac)) {
VLOG_WARN("Unable to parse option as Ethernet address for lport: %s "
"pf-mac: '%s' vf-num: '%s'",
ctx_in->lport_name, opt_pf_mac, opt_vf_num);
return false;
}
struct port_node *pn;

char *cp = NULL;
uint16_t vf_num = strtol(opt_vf_num, &cp, 10);
if (cp && cp != opt_vf_num && *cp != '\0') {
VLOG_WARN("Unable to parse option as VF number for lport: %s "
"pf-mac: '%s' vf-num: '%s'",
ctx_in->lport_name, opt_pf_mac, opt_vf_num);
char *cp_flavor = NULL;
uint16_t flavor = strtol(opt_flavor, &cp_flavor, 10);
if (cp_flavor && cp_flavor != opt_flavor && *cp_flavor != '\0') {
VLOG_WARN("Unable to parse option as flavor for lport: %s "
"flavor: '%s'",
ctx_in->lport_name, opt_flavor);
}

struct port_node *pn;
pn = port_table_lookup_pf_mac_vf(port_table, pf_mac, vf_num);
if (flavor==DEVLINK_PORT_FLAVOUR_PCI_VF) {
if (!opt_pf_mac || !opt_vf_num) {
return false;
}
struct eth_addr pf_mac;
if (!eth_addr_from_string(opt_pf_mac, &pf_mac)) {
VLOG_WARN("Unable to parse option as Ethernet address for lport: %s "
"pf-mac: '%s' vf-num: '%s'",
ctx_in->lport_name, opt_pf_mac, opt_vf_num);
return false;
}

char *cp = NULL;
uint16_t vf_num = strtol(opt_vf_num, &cp, 10);
if (cp && cp != opt_vf_num && *cp != '\0') {
VLOG_WARN("Unable to parse option as VF number for lport: %s "
"pf-mac: '%s' vf-num: '%s'",
ctx_in->lport_name, opt_pf_mac, opt_vf_num);
}
pn = port_table_lookup_pf_mac_vf(port_table, pf_mac, vf_num);
}
else if(flavor==DEVLINK_PORT_FLAVOUR_PCI_PF) {

char *cp = NULL;
uint16_t pf_num = strtol(opt_pf_num, &cp, 10);
if (cp && cp != opt_pf_num && *cp != '\0') {
VLOG_WARN("Unable to parse option as PF number for lport: %s "
"pf-num: '%s'",
ctx_in->lport_name, opt_pf_num);
}
pn = port_table_lookup_phy_bus_dev(port_table, opt_bus_name,
opt_dev_name,
flavor, pf_num);
}
else {
VLOG_INFO("Flavor not supported '%d 'for lport: %s",
flavor, ctx_in->lport_name);
return false;
}

if (!pn || !pn->netdev_name) {
VLOG_INFO("No representor port found for "
"lport: %s pf-mac: '%s' vf-num: '%s'",
ctx_in->lport_name, opt_pf_mac, opt_vf_num);
"lport: %s", ctx_in->lport_name);
return false;
} else if (port_node_rename_expected(pn)) {
VLOG_INFO("Lookup of representor port successful, but we anticipate "
Expand Down
Loading