Commit 7bab5d84 authored by Alessandro Rubini

Merge branch 'pfilter-cleanup'

parents 0a74c14f e35610a6
......@@ -83,17 +83,6 @@ config CMD_CONFIG
reports the current configuration. This adds half a kilobyte
to the binary size (100b for the code plus the .config file).
config NIC_PFILTER
depends on WR_NODE
depends on ETHERBONE
bool "Add packet filter rules for wr-nic"
help
When using wr-nic (7S), the host must receive frames that are not
ptp nor etherbone ones. This adds the needed filter rules
to that effect. Such rules are not needed when no Etherbone
is there, because in that case all non-ptp frames reach the
host.
#
# This is a set of configuration options that should not be changed by
# normal users. If the "developer" menu is used, the binary is tainted.
......
......@@ -60,9 +60,7 @@ LDFLAGS_PLATFORM = -mmultiply-enabled -mbarrel-shift-enabled \
-nostdlib -T $(LDS-y)
# packet-filter rules depend on configuration; default is rules-plain
pfilter-y := rules-plain.bin
pfilter-$(CONFIG_IP) := rules-ebone.bin
pfilter-$(CONFIG_NIC_PFILTER) := rules-e+nic.bin
pfilter-y := rules-default.bin
export pfilter-y
all:
......
......@@ -10,7 +10,6 @@ CONFIG_PPSI=y
CONFIG_UART=y
CONFIG_W1=y
CONFIG_IP=y
CONFIG_NIC_PFILTER=y
# CONFIG_CMD_CONFIG is not set
#
......
......@@ -11,100 +11,167 @@
/* Endpoint Packet Filter/Classifier driver
A little explanation: The WR core needs to classify the incoming packets into
two (or more categories):
- PTP, ARP, DHCP packets, which should go to the WRCore CPU packet queue (mini-nic)
- Other packets matching user's provided pattern, which shall go to the external fabric
port - for example to Etherbone, host network controller, etc.
- packets to be dropped (used neither by the WR Core or the user application)
A little explanation: The WR core needs to classify the incoming
packets into two (or more) categories. Current HDL (as of 2015)
uses two actual classes even if the filter supports 8. Class bits
0-3 go to the CPU (through the mini-nic) and class bits 4-7 go to
the internal fabric. The frame, in the fabric, is prefixed with
a status word that includes the class bits.
The CPU is expected to receive PTP, ICMP, ARP and DHCP replies (so
local "bootpc" port).
The fabric should receive Etherbone (i.e. UDP port 0xebd0), the
"streamer" protocol used by some CERN installations (ethtype 0xdbff)
and everything else if the "NIC pfilter" feature by 7Solutions is used.
The logic cells connected to the fabric do their own check on the
frames, so it's not a problem if extra frames reach the fabric. Thus,
even if earlier code bases had a build-time choice of filter rules,
we now have a catch-all rule set. Please note that the CPU is
not expected to receive a lot of undesired traffic, because it has
very limited processing power.
One special bit means "drop", and we'll use it when implementing vlans
(it currently is not used, because the fabric can receive extra stuff,
but it was activated by previous rule-sets).
0. Introduction
------------------------------------------
WR Endpoint (WR MAC) inside the WR Core therefore contains a simple microprogrammable
packet filter/classifier. The classifier processes the incoming packet, and assigns it
to one of 8 classes (an 8-bit word, where each bit corresponds to a particular class) or
eventually drops it. Hardware implementation of the unit is a simple VLIW processor with
32 single-bit registers (0 - 31). The registers are organized as follows:
WR Endpoint (WR MAC) inside the WR Core therefore contains a simple
microprogrammable packet filter/classifier. The classifier processes
the incoming packet, and assigns it to one of 8 classes (an 8-bit
word, where each bit corresponds to a particular class) or
eventually drops it. Hardware implementation of the unit is a simple
VLIW processor with 32 single-bit registers (0 - 31). The registers
are organized as follows:
- 0: don't touch (always 0)
- 1 - 22: general purpose registers
- 23: drop packet flag: if 1 at the end of the packet processing, the packet will be dropped.
- 24..31: packet class (class 0 = reg 24, class 7 = reg 31). -- see 2. below for "routing" rules.
- 1 - 22: general purpose registers (but comparison can only write to 1-14).
- 23: drop packet flag. It takes precedence over class assignment
- 24..31: packet class (class 0 = reg 24, class 7 = reg 31). See "2." below.
Program memory has 64 36-bit words, but you should use only 32,
because the pfilter is synchronous with frame ingress, and frames
can be as short as 64 bytes long.
The packet filtering program is restarted every time a new packet comes.
The operations, <oper> and <oper2> below, are one of:
AND, NAND, OR, NOR, XOR, XNOR, MOV, NOT
Program memory has 64 36-bit words. Packet filtering program is restarted every time a new packet comes.
There are 5 possible instructions.
1. Instructions
------------------------------------------
1.1 CMP offset, value, mask, oper, Rd:
There are 5 possible instructions.
1.1 CMP offset, value, mask, <oper>, Rd:
------------------------------------------
* Rd = Rd oper ((((uint16_t *)packet) [offset] & mask) == value)
* Rd = Rd <oper> ((((uint16_t *)packet) [offset] & mask) == value)
Examples:
* CMP 3, 0xcafe, 0xffff, MOV, Rd
will compare the 3rd word of the packet (bytes 6, 7) against 0xcafe and if the words are equal,
1 will be written to Rd register.
* CMP 4, 0xbabe, 0xffff, AND, Rd
will do the same with the 4th word and write to Rd its previous value ANDed with the result
of the comparison. Effectively, Rd now will be 1 only if bytes [6..9] of the payload contain word
0xcafebabe.
* CMP 3, 0xcafe, 0xffff, MOV, Rd
will compare the 4th word of the packet (offset 3: bytes 6, 7) against
0xcafe and if the words are equal, 1 will be written to Rd register.
* CMP 4, 0xbabe, 0xffff, AND, Rd
Note that the mask value is nibble-granular. That means you can choose a particular
set of nibbles within a word to be compared, but not an arbitrary set of bits (e.g. 0xf00f, 0xff00
and 0xf0f0 masks are ok, but 0x8001 is wrong.
will do the same with the 5th word (offset 4: bytes 8, 9) and write
to Rd its previous value ANDed with the result of the comparison.
Effectively, Rd now will be 1 only if bytes [6..9] of the frame
contain word 0xcafebabe.
1.2. BTST offset, bit_number, oper, Rd
Note that the mask value is nibble-granular. That means you can
choose a particular set of nibbles within a word to be compared, but
not an arbitrary set of bits (e.g. 0xf00f, 0xff00 and 0xf0f0 masks
are ok, but 0x8001 is wrong).
The target of a comparison can be register 1-15 alone.
1.2. BTST offset, bit_number, <oper>, Rd
------------------------------------------
* Rd = Rd oper (((uint16_t *)packet) [offset] & (1<<bit_number) ? 1 : 0)
* Rd = Rd <oper> (((uint16_t *)packet) [offset] & (1<<bit_number) ? 1 : 0)
Examples:
* BTST 3, 10, MOV, 11
will write 1 to reg 11 if the 10th bit in the 3rd word of the packet is set (and 0 if it's clear)
* BTST 3, 10, MOV, 11
1.3. Logic opearations:
will write 1 to reg 11 if bit 10 in the word at offset 3 of the
packet is set (and 0 if it's clear)
1.3. LOGIC2 Rd, Ra, <oper>, Rb
-----------------------------------------
* LOGIC2 Rd, Ra, OPER Rb - 2 argument logic (Rd = Ra OPER Rb). If the operation is MOV or NOT, Ra is
taken as the source register.
* LOGIC3 Rd, Ra, OPER Rb, OPER2, Rc - 3 argument logic Rd = (Ra OPER Rb) OPER2 Rc.
* Rd = Ra <oper> Rb
If the operation is MOV or NOT, Ra is taken as the source register
and Rb is ignored.
1.4. LOGIC3 Rd, Ra, <oper>, Rb, <oper2> Rc
-----------------------------------------
* Rd = (Ra <oper> Rb) <oper2> Rc
1.5. Misc
-----------------------------------------
FIN instruction terminates the program.
NOP executes a dummy instruction (LOGIC2 0, 0, AND, 0)
IMPORTANT:
- the program counter is advanved each time a 16-bit words of the packet arrives.
- the CPU doesn't have any interlocks to simplify the HW, so you can't compare the
10th word when PC = 2. Max comparison offset is always equal to the address of the instruction.
- Code may contain up to 64 operations, but it must classify shorter packets faster than in
32 instructions (there's no flow throttling)
- the program counter is advanced each time a 16-bit word of the
packet arrives.
- the CPU doesn't have any interlocks to simplify the HW, so you
can't compare the 10th word when PC = 2. Max comparison offset is
always equal to the address of the instruction.
- Code may contain up to 64 operations, but it must classify shorter
packets faster than in 32 instructions (there's no flow
throttling)
2. How the frame is routed after the pfilter
-----------------------------------------
After the input pipeline is over, the endpoint looks at the DROP and CLASS bits
set by the packet filter. There are two possible output ports: one associated with
classes 0..3 (to the cpu) and one associated to class 4..7 (external fabric).
These are the rules, in strict priority order.
After the input pipeline is over, the endpoint looks at the DROP bit.
If the frame is not dropped, it is passed over to the MUX (xwrf_mux.vhd),
with the CLASS bits attached (well, pre-pended).
The MUX is configured with two classes in wr_core.vhd:
mux_class(0) <= x"0f";
mux_class(1) <= x"f0";
This is the behaviour:
- If at least one bit is set in the first class (here 0x0f), the frame
goes to CPU and processing stops. Processing is in class order,
not bit order.
- If "DROP" is set, the frame is dropped irrespective of the rest. Done.
- If at least one bit is set in the first set (bits 0..3), the frame goes to CPU. Done.
- If at least one bit is set in the second set (bits 4..7) the frame goes to fabric. Done.
- No class is set, the frame goes to the fabric.
- If at least one bit is set in the second class (here 0xf0) the frame
goes to fabric and processing stops.
If, in the future or other implementations, the same pfilter is used with a differnet
set of bits connected to the output ports (e.g. three ports), ports are processed
from lowest-number to highest-number, and if no bit is set it goes to the last.
- If no class is set, the frame goes to the last class, the fabric.
If, in the future or other implementations, the same pfilter is used
with a different MUX configuration, ports are processed from lowest-number
to highest-number, and if no bit is set it goes to the last.
In the wr-nic gateware, a second MUX is connected, using two classes
and bitmasks 0x20 and 0x80 as I write this note. Such values must
be changed for consistency with the other configurations.
Please note that the "class" is actually a bitmask; it's ok to set more than one bit
in a single nibble, and the downstream user will find both set (for CPU we have the
class in the status register).
*/
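Review note: the CMP and BTST semantics described in the comment above can be modelled in plain C. This is only a sketch for reasoning about rules, not the builder's actual code: the names pf_apply, pf_word, pf_mask_ok, pf_cmp and pf_btst are hypothetical, frame words are assumed to be in network byte order, and the behaviour of NOT on a comparison result (inverting it) is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Operation names follow the comment; the numeric values are arbitrary. */
enum pf_oper { PF_AND, PF_NAND, PF_OR, PF_NOR, PF_XOR, PF_XNOR, PF_MOV, PF_NOT };

/* Combine the previous register value rd with a new single-bit result x. */
static bool pf_apply(enum pf_oper op, bool rd, bool x)
{
	switch (op) {
	case PF_AND:  return rd && x;
	case PF_NAND: return !(rd && x);
	case PF_OR:   return rd || x;
	case PF_NOR:  return !(rd || x);
	case PF_XOR:  return rd != x;
	case PF_XNOR: return rd == x;
	case PF_MOV:  return x;
	case PF_NOT:  return !x;	/* assumption: NOT inverts the new result */
	}
	return false;
}

/* 16-bit word at the given word offset, assuming network byte order. */
static uint16_t pf_word(const uint8_t *frame, size_t offset)
{
	return (uint16_t)((frame[2 * offset] << 8) | frame[2 * offset + 1]);
}

/* Masks are nibble-granular: every nibble must be 0x0 or 0xf. */
static bool pf_mask_ok(uint16_t mask)
{
	for (int shift = 0; shift < 16; shift += 4) {
		unsigned nibble = (mask >> shift) & 0xf;
		if (nibble != 0x0 && nibble != 0xf)
			return false;
	}
	return true;
}

/* CMP: Rd = Rd <oper> ((word[offset] & mask) == value) */
static bool pf_cmp(const uint8_t *frame, size_t offset, uint16_t value,
		   uint16_t mask, enum pf_oper op, bool rd)
{
	assert(pf_mask_ok(mask));
	return pf_apply(op, rd, (pf_word(frame, offset) & mask) == value);
}

/* BTST: Rd = Rd <oper> (word[offset] & (1 << bit) ? 1 : 0) */
static bool pf_btst(const uint8_t *frame, size_t offset, unsigned bit,
		    enum pf_oper op, bool rd)
{
	assert(bit < 16);
	return pf_apply(op, rd, (pf_word(frame, offset) >> bit) & 1);
}
```

With a frame whose bytes 6..9 are ca fe ba be, pf_cmp(frame, 3, 0xcafe, 0xffff, PF_MOV, ...) followed by pf_cmp(frame, 4, 0xbabe, 0xffff, PF_AND, ...) leaves the modelled register set, matching the 0xcafebabe example in the comment.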
#include <stdio.h>
......@@ -144,24 +211,22 @@ enum pf_symbolic_regs {
/* The first set is used for straight comparisons */
FRAME_BROADCAST = R_1,
FRAME_PTP_MCAST,
FRAME_OUR_MAC,
FRAME_TAGGED,
FRAME_TYPE_IPV4,
FRAME_TYPE_PTP2,
FRAME_TYPE_ARP,
FRAME_ICMP,
FRAME_UDP,
FRAME_TYPE_STREAMER, /* An ethtype by Tom, used in gateware */
FRAME_PORT_ETHERBONE,
PORT_UDP_HOST,
PORT_UDP_ETHERBONE,
R_TMP,
/* These are results of logic over the previous bits */
FRAME_IP_UNI,
FRAME_IP_OK, /* unicast or broadcast */
FRAME_PTP_OK,
FRAME_STREAMER_BCAST,
/* A temporary register, and the CPU target */
R_TMP,
FRAME_FOR_CPU, /* must be last */
};
......@@ -297,10 +362,6 @@ static void pfilter_output(char *fname)
fclose(f);
}
/* We generate all supported rule-sets, those that used to be ifdef'd */
#define MODE_ETHERBONE 1
#define MODE_NIC_PFILTER 2
void pfilter_init(int mode, char *fname)
{
......@@ -309,77 +370,66 @@ void pfilter_init(int mode, char *fname)
/*
* Make three sets of comparisons over the destination address.
* After these 9 instructions, the whole Eth header is available.
* After these instructions, the whole Eth header is there
*/
pfilter_cmp(0, 0x1234, 0xffff, MOV, FRAME_OUR_MAC); /* Use fake MAC: 12:34:56:78:9a:bc */
/* Local frame, using fake MAC: 12:34:56:78:9a:bc */
pfilter_cmp(0, 0x1234, 0xffff, MOV, FRAME_OUR_MAC);
pfilter_cmp(1, 0x5678, 0xffff, AND, FRAME_OUR_MAC);
pfilter_cmp(2, 0x9abc, 0xffff, AND, FRAME_OUR_MAC); /* set when our MAC */
pfilter_cmp(2, 0x9abc, 0xffff, AND, FRAME_OUR_MAC);
/* Broadcast frame */
pfilter_cmp(0, 0xffff, 0xffff, MOV, FRAME_BROADCAST);
pfilter_cmp(1, 0xffff, 0xffff, AND, FRAME_BROADCAST);
pfilter_cmp(2, 0xffff, 0xffff, AND, FRAME_BROADCAST); /* set when dst mac is broadcast */
pfilter_cmp(2, 0xffff, 0xffff, AND, FRAME_BROADCAST);
pfilter_cmp(0, 0x011b, 0xffff, MOV, FRAME_PTP_MCAST);
pfilter_cmp(1, 0x1900, 0xffff, AND, FRAME_PTP_MCAST);
pfilter_cmp(2, 0x0000, 0xffff, AND, FRAME_PTP_MCAST); /* set when dst mac is PTP multicast (01:1b:19:00:00:00) */
/* Tagged is dropped. We'll invert the check in the vlan rule-set */
pfilter_cmp(6, 0x8100, 0xffff, MOV, FRAME_TAGGED);
pfilter_logic2(R_DROP, FRAME_TAGGED, MOV, R_ZERO);
/* Identify some Ethertypes used later */
pfilter_cmp(6, 0x0800, 0xffff, MOV, FRAME_TYPE_IPV4);
/* Identify some Ethertypes used later. */
pfilter_cmp(6, 0x88f7, 0xffff, MOV, FRAME_TYPE_PTP2);
pfilter_cmp(6, 0x0800, 0xffff, MOV, FRAME_TYPE_IPV4);
pfilter_cmp(6, 0x0806, 0xffff, MOV, FRAME_TYPE_ARP);
pfilter_cmp(6, 0xdbff, 0xffff, MOV, FRAME_TYPE_STREAMER);
/* Mark bits for ip unicast and ip-valid (unicast or broadcast) */
pfilter_logic2(FRAME_IP_UNI, FRAME_OUR_MAC, AND, FRAME_TYPE_IPV4);
pfilter_logic3(FRAME_IP_OK, FRAME_BROADCAST, OR, FRAME_OUR_MAC, AND, FRAME_TYPE_IPV4);
/* Ethernet = 14 bytes, Offset to type in IP: 8 bytes = 22/2 = 11 */
pfilter_cmp(11, 0x0001, 0x00ff, MOV, FRAME_ICMP);
pfilter_cmp(11, 0x0011, 0x00ff, MOV, FRAME_UDP);
pfilter_logic2(FRAME_UDP, FRAME_UDP, AND, FRAME_IP_OK);
if (mode & MODE_ETHERBONE) {
/* Mark bits for unicast to us, and for unicast-to-us-or-broadcast */
pfilter_logic3(FRAME_IP_UNI, FRAME_OUR_MAC, OR, R_ZERO, AND, FRAME_TYPE_IPV4);
pfilter_logic3(FRAME_IP_OK, FRAME_BROADCAST, OR, FRAME_OUR_MAC, AND, FRAME_TYPE_IPV4);
/* For CPU: arp broadcast or icmp unicast or ptp */
pfilter_logic3(FRAME_FOR_CPU, FRAME_BROADCAST, AND, FRAME_TYPE_ARP, OR, FRAME_TYPE_PTP2);
pfilter_logic3(FRAME_FOR_CPU, FRAME_IP_UNI, AND, FRAME_ICMP, OR, FRAME_FOR_CPU);
/* Make a selection for the CPU, that is later still added-to */
pfilter_logic3(R_TMP, FRAME_BROADCAST, AND, FRAME_TYPE_ARP, OR, FRAME_TYPE_PTP2);
pfilter_logic3(FRAME_FOR_CPU, FRAME_IP_UNI, AND, FRAME_ICMP, OR, R_TMP);
/* Now look in UDP ports: at offset 18 (14 + 20 + 2 = 36 bytes, 36/2 = 18) */
pfilter_cmp(18, 0x0044, 0xffff, MOV, PORT_UDP_HOST); /* bootpc */
pfilter_cmp(18, 0x013f, 0xffff, OR, PORT_UDP_HOST); /* ptp event */
pfilter_cmp(18, 0x0140, 0xffff, OR, PORT_UDP_HOST); /* ptp general */
/* Ethernet = 14 bytes, IPv4 = 20 bytes, offset to dport: 2 = 36/2 = 18 */
pfilter_cmp(18, 0x0044, 0xffff, MOV, R_TMP); /* R_TMP now means dport = BOOTPC */
/* The CPU gets those ports in a proper UDP frame, plus the previous selections */
pfilter_logic3(FRAME_FOR_CPU, FRAME_UDP, AND, PORT_UDP_HOST, OR, FRAME_FOR_CPU);
pfilter_logic3(R_TMP, R_TMP, AND, FRAME_UDP, AND, FRAME_IP_OK); /* BOOTPC and UDP and IP(unicast|broadcast) */
pfilter_logic2(FRAME_FOR_CPU, R_TMP, OR, FRAME_FOR_CPU);
/* Etherbone is UDP at port 0xebd0, let's "or" in the last move */
pfilter_cmp(18, 0xebd0, 0xffff, MOV, PORT_UDP_ETHERBONE);
if (mode & MODE_NIC_PFILTER) {
/* and now copy out the stuff: one cpu class, two fabric classes: 7 etherbone, 6 for anything else */
pfilter_logic2(R_CLASS(0), FRAME_FOR_CPU, MOV, R_ZERO);
pfilter_logic2(R_CLASS(7), FRAME_UDP, AND, PORT_UDP_ETHERBONE);
pfilter_logic2(R_CLASS(6), FRAME_UDP, NAND, PORT_UDP_ETHERBONE);
pfilter_cmp(18,0xebd0,0xffff,MOV, FRAME_PORT_ETHERBONE);
/* Here we had a commented-out check for magic (offset 21, value 0x4e6f) */
pfilter_logic2(R_CLASS(0), FRAME_FOR_CPU, MOV, R_ZERO);
pfilter_logic2(R_CLASS(5), FRAME_PORT_ETHERBONE, OR, R_ZERO); /* class 5: Etherbone packet => Etherbone Core */
pfilter_logic3(R_CLASS(7), FRAME_FOR_CPU, OR, FRAME_PORT_ETHERBONE, NOT, R_ZERO); /* class 7: Rest => NIC Core */
} else {
pfilter_logic3(R_TMP, FRAME_IP_OK, AND, FRAME_UDP, OR, FRAME_FOR_CPU); /* Something we accept: cpu+udp or streamer */
pfilter_logic3(R_DROP, R_TMP, OR, FRAME_TYPE_STREAMER, NOT, R_ZERO); /* None match? drop */
pfilter_logic2(R_CLASS(7), FRAME_IP_OK, AND, FRAME_UDP); /* class 7: UDP/IP(unicast|broadcast) => external fabric */
pfilter_logic2(R_CLASS(6), FRAME_BROADCAST, AND, FRAME_TYPE_STREAMER); /* class 6: streamer broadcasts => external fabric */
pfilter_logic2(R_CLASS(0), FRAME_FOR_CPU, MOV, R_ZERO); /* class 0: all selected for CPU earlier */
}
} else { /* not etherbone */
pfilter_logic3(FRAME_PTP_OK, FRAME_OUR_MAC, OR, FRAME_PTP_MCAST, AND, FRAME_TYPE_PTP2);
pfilter_logic2(FRAME_STREAMER_BCAST, FRAME_BROADCAST, AND, FRAME_TYPE_STREAMER);
pfilter_logic3(R_TMP, FRAME_PTP_OK, OR, FRAME_STREAMER_BCAST, NOT, R_ZERO); /* R_TMP = everything else */
pfilter_logic2(R_CLASS(7), R_TMP, MOV, R_ZERO); /* class 7: all non PTP and non-streamer traffic => external fabric */
pfilter_logic2(R_CLASS(6), FRAME_STREAMER_BCAST, MOV, R_ZERO); /* class 6: streamer broadcasts => external fabric */
pfilter_logic2(R_CLASS(0), FRAME_PTP_OK, MOV, R_ZERO); /* class 0: PTP frames => LM32 */
/*
* Note that earlier we used to be more strict in ptp ethtype (only proper multicast),
* but since we want to accept peer-delay sooner than later, we'd better avoid the checks
*/
}
/*
* Also, please note that "streamer" ethtype 0xdbff and "etherbone" udp port
* 0xebd0 go to the fabric by being part of the "anything else" choice.
*/
pfilter_output(fname);
......@@ -389,8 +439,6 @@ int main(int argc, char **argv) /* no arguments used currently */
{
prgname = argv[0];
pfilter_init(0, "rules-plain.bin");
pfilter_init(MODE_ETHERBONE, "rules-ebone.bin");
pfilter_init(MODE_ETHERBONE | MODE_NIC_PFILTER, "rules-e+nic.bin");
pfilter_init(0, "rules-default.bin");
exit(0);
}
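Review note: the routing described in section 2 of the header comment (DROP first, then MUX classes in class order, falling back to the last class) can be sketched as below. This is a software model under stated assumptions, not the HDL: pf_route and PF_ROUTE_DROPPED are hypothetical names, and the two masks are the ones the comment quotes for the MUX configuration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PF_ROUTE_DROPPED (-1)	/* hypothetical marker for a dropped frame */

/* MUX class masks as quoted in the comment: port 0 = CPU, port 1 = fabric. */
static const uint8_t mux_class[] = { 0x0f, 0xf0 };

/* Return the output port index for a frame, or PF_ROUTE_DROPPED. */
static int pf_route(bool drop, uint8_t class_bits)
{
	size_t nports = sizeof(mux_class) / sizeof(mux_class[0]);

	/* DROP takes precedence over any class assignment. */
	if (drop)
		return PF_ROUTE_DROPPED;

	/* Ports are tried in class order; the first match wins. */
	for (size_t i = 0; i < nports; i++)
		if (class_bits & mux_class[i])
			return (int)i;

	/* No class set: the frame goes to the last port (the fabric). */
	return (int)nports - 1;
}
```

Under this model a frame with both class 0 and class 7 set goes to the CPU (port 0), and a frame with no class bits set goes to the fabric (port 1), which is what lets the catch-all rule set deliver "anything else" to the fabric.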