Commit 9e31590c authored by Alessandro Rubini's avatar Alessandro Rubini

Merge branch 'developers-docs'

This adds two documents of mine (with greg's additions) that are
considered useful for developers.
parents 7e82f062 584c163b
\input texinfo @c -*-texinfo-*-
%
% snmp-pain.in - main file for the documentation
%
%%%%
%------------------------------------------------------------------------------
%
% NOTE FOR THE UNAWARE USER
% =========================
%
% This file is a texinfo source. It isn't the binary file of some strange
% editor of mine. If you want ASCII, you should "make snmp-pain.txt".
%
%------------------------------------------------------------------------------
%
% This is not a conventional info file...
% I use three extra features:
% - The '%' as a comment marker, if at beginning of line ("\%" -> "%")
% - leading blanks are allowed (this is something I cannot live without)
% - braces are automatically escaped when they appear in example blocks
%
@comment %**start of header
@documentlanguage en
@documentencoding ISO-8859-1
@setfilename snmp-pain.info
@settitle snmp-pain
@iftex
@afourpaper
@end iftex
@paragraphindent none
@comment %**end of header
@setchapternewpage off
@set update-month August 2014
@c the release name below is substituted at build time
@set release __RELEASE_GIT_ID__
@finalout
@titlepage
@title SNMP Pain
@subtitle Why (and how) I suffered while adding SNMP to White Rabbit
@subtitle @value{update-month} (@value{release})
@author Alessandro Rubini
@end titlepage
@headings single
@c ##########################################################################
@iftex
@contents
@end iftex
@c ##########################################################################
@node Top
@top Introduction
This summarizes my experience with @sc{snmp} and the WR switch, which
has been a real pain to me.
I know it's mainly me, as it looks like the rest of the world is
happily using both @sc{snmp} as a protocol and @i{net-snmp} as a daemon. I
hope these notes may help other people within BE-CO-HT in finding a
better way through @sc{snmp} and its implementations.
This version of the document reflects the status of my knowledge and
implementation as of @t{wr-switch-sw-v4.0}.
As of Sep 2014 the document is included in the master branch of the
repository because my mates consider is useful as refernce of what
is there and why. I'll try to keep it up to date with the various
MIB object I add or modify.
@c ##########################################################################
@node The SNMP Protocol
@chapter The SNMP Protocol
The protocol itself is simple, as the name implies, but not really
trivial to digest (unlike SMTP or HTTP, for example). The datagrams
are binary data: it makes sense for efficiency and to avoid writing a
text parser in microcontrollers running the protocol, but it prevents
testing using @t{telnet} or @t{wget}.
The ``easiest'' testing tool is @i{snmpwalk}, which is bloated like
the package it is part of. This is unavoidable for ``complete''
implementations; my problem here is the lack of a simpler tool for
an easy start. See @ref{Tools} for more details.
@c =========================================================================
@node Protocol Definition
@section Protocol Definition
@sc{snmp} is defined in the public RFC documents, but the list of them is
too long to even list here -- it is really dozens of such documents.
A good selection is found in the @t{doc/rfc} subdirectory of the
@i{net-snmp} source archive. Most of the relevant RFCs deal with
defining MIB files (see @ref{The MIB Idea}), and the actual protocol
bits (the frame format) is difficult to find.
Actually, unlike most or all other IETF protocols, @sc{snmp} isn't really
defined in any RFC, but relies on the @i{basic encoding rules} (ITU-T
X.690) for the @i{abstract syntax notation} ASN.1 (i.e., again the MIB
stuff). So the huge lot of information out there tastes like a big loop of
nothing, where everyone takes a lot for granted and refers to everyone
else for details. For me it was like learning Chinese relying on
a number of Chinese dictionaries and nothing else.
Eventually, after making sense of the protocol itself (mainly by
sniffing and reverse engineering, while referring to the Chinese
dictionaries any now and then), I finally changed my mind and agreed
that it makes a good choice, and might be considered for the WR node
as well (SPEC or equivalent). This however is not covered here.
@c =========================================================================
@node Basic Concepts
@section Basic Concepts
The network device runs an ``agent'' and the user runs a ``manager''.
@sc{snmp} is based on the concept of management objects, that can be read
or written. So the requests a manager sends to agents are mainly
``get'' and ``set''.
Management objects are laid out in a tree. Thus an ``object
identifier'' (OID) is like a pathname. The textual representation of
an OID uses a dot as a separator; the binary representation is an
array of integers. For example WR owns the subtree
``1.3.6.1.4.1.96.100'', where the ``...96'' is CERN and ``....1'' is
for organizations. Identifiers are allocated by relevant organizations:
CERN gave us our ``100'' and we are responsible for further levels.
The manager can travel the whole tree or a subtree using a special
``get next'' request.
Additionally, an agent can send traps, but I have no experience with
generating custom traps.
@c =========================================================================
@node The MIB Idea
@section The MIB Idea
The main aim of @sc{snmp} is being simple for the agent, while still
remaining flexible. Thus the choice of a binary protocol and a get/set
approach with an extensible tree of objects as described.
When using a binary protocol on the wire and a simple pathname-based
tree with integers as path components, the need for an higher-level
description of the item arises immediately. The solution adopted by
@sc{snmp} designers is relying on a very-flexible and very-abstract notation:
ASN.1, as already noted.
Each and every management object is thus defined by a MIB, short for
@i{management information base}. At least in the golden theory. A MIB
is a text file, with a very boresome structure, usually suffering from
an over-engineering syndrome. I must admit, though, that the thing
was over-engineered enough to survive the decades passing: it allowed
casting of a number of new and unexpected needs into this old and
unpleasant abstract notation; modern and complex management tools can
make sense of unforeseen management objects, thanks to their
description found in the MIB file.
The @sc{snmp} mantra is ``everything is a MIB''. Or ``just write the MIB
and everything will automatically follow''. The theory says that
the MIB text file can be parsed by the manager's application to show
an user-friendly view of any object; it says that it can be parsed by
the command-line tools so to use user-friendly names in the requests;
it says that it can be parsed by the agent to create reply frames; that
it can be parsed by a code-generator to fill he low-level bits.
So @sc{snmp} is mainly boring stuff for bureaucrats. Everything is hidden
behind ASN.1: it either magically works or it magically fails; there
is no clear documentation of the various levels of software or
protocol, because it is claimed to ``just work'', which is not always
true, despite theory. Also, in practice sensible implementations avoid
the suggested MIB-driven path.
In the end, the @i{wr-switch-sw} package includes
@t{WR-SWITCH-MIB.txt}, but clearly it's not true that this file is the
whole of it.
@c ##########################################################################
@node Choosing an Agent
@chapter Choosing an Agent
When adding @sc{snmp} support to a network device, you should always at
least include the standard management objects, the ones that every
manager expects to find on the system. This is the list of network
interfaces, their features, their traffic statistics, and so on.
Thus, writing your own is not an option: it doesn't make sense to have
an incomplete wr-switch @sc{snmp} support, and re-implementing the huge
number of standard management objects in a custom @sc{snmp} implementation
is too big an effort to even consider it.
I therefore looked at available free-software implementations.
@c =========================================================================
@node Requirements for WR
@section Requirements for WR
In order to choose a proper @sc{snmp} engine, we first need to know what
we expect from it. This a quick attempt at summarizing our requirements:
@itemize @bullet
@item We need support for all the basic actions, including traps, even if
version 4 is not yet using them.
@item We need to customize standard tables. This
mainly involves VLAN support, because the WR switch needs a different
backend than the normal kernel-driven Linux VLANs. Again, not yet
implemented in version 4.
@item We need to add a custom subtree, that includes both simple items
and tables. Simple items are things such as the version string and the
current time (the former is static and the latter is dynamic); tables
are things such as the per-port statistic counters or the PTP slave
list, if and when we choose it implement it.
@end itemize
@c =========================================================================
@node net-snmp
@section net-snmp
Everybody, in the Unix world, is using @i{net-snmp}.
This is a big and bloated implementation, using GNU autotools for
configuration, shared libraries and everything else. It includes
``simple'' tools for querying agents using the command line, and the
usual bells and whistles.
It is included in all Linux distributions, both hosted and embedded ones,
and it really looks like the only choice available. So we installed
this one in the WR switch, by selecting the proper packages in our
@i{buildroot} configuration.
Documentation for @i{net-snmp} is mainly online: the promising @t{doc}
subdirectory in the source tree only hosts the RFC documents, but
fortunately @t{man/} includes documentation for the API and basic
daemon use. Tutorials and all the rest are available on the project
site. This means, among other things, that you can't access most
documentation while off-line (like I am while writing this document)
and you can't easily get documentation for the specific version you
are using, excluding the manual pages.
Still, the documentation is quite complete and well-done, though
featuring the @sc{snmp}-wide misfeature of taking a lot for granted, it is
useful for an expert user but not a great way to become one such beast.
@c =========================================================================
@node Other agents
@section Other agents
Simply put, there are no other choices other than @i{net-snmp}. A
number of proprietary implementations exist, but the free world seems
bound to this implementation.
(This section needs an update with the aid of a net search; I did it
back then but since I found nothing I didn't save the results; and
I'm now offline while writing this).
@c =========================================================================
@node Sub-agents
@section Sub-agents
Adding custom tables to an @sc{snmp} agent is a common requirement, so
RFC-2741 defines the @i{AgentX} protocol. The protocol is run on a
local socket interface, and allows registering handlers for specific
subtrees. @i{net-snmp} supports the AgentX protocol, so this looked
like an interesting option.
This is the result of my evaluation:
@itemize @bullet
@item It is definitely not much used. I could not find any mainstream user
of the feature; @t{lldpd} is the only one Debian package that names
AgentX but it looks like the feature is not used in the Debian build.
@item The code base supports both C and Perl bindings, the preferred
one being Perl. Besides any performance concern, we can't easily run Perl
on the WR switch because @i{buildroot} doesn't support it.
@item It looks like support in @i{net-snmp} is incomplete, but I can't
currently find the reference about this while being offline.
@end itemize
Still, AgentX looked promising, so I evaluated the thing. The most
interesting path being the various Python implementations: we already
have Python in the WR Switch, because @i{buildroot} supports it and we
needed it for the production test suite.
There are three Python implementations, as far as I know:
@table @code
@item https://github.com/rayed/pyagentx
A 2013-2014 implementation, BSD license. This only implements
``get'' and ``get-next'' so it doesn't support our post-v4.0
needs.
@item https://pypi.python.org/pypi/agentx/0.7
This is a 2010 GPL implementation. It looks a little too small
and simplified. I feared support dynamic tables herein would not
be easy.
@item https://pypi.python.org/pypi/netsnmpagent/0.5.0
A 2012-2013 implementation, GPL3. This seems serious, and it
paints itself as the result of a lot of frustration using the
available tools, something I sympathize with. It credits
@i{agentx} (preceding item in this list). Unfortunately traps
are still missing.
@end table
The last option seems a viable one, but I eventually ruled it out
because I'm not confident enough with Python to easily master it (for
example, I'm sure I wouldn't be able to add traps in a reasonable time
frame), and our WR management objects require a C language backend, at
times -- which is another area where my Python lacks.
Likely I was also scared by the length of the @t{SIMPLE-MIB.txt} it
includes, even if in the end I wrote my MIB anyways.
After evaluating sub-agents, I chose to stick to the shared-library
mechanism offered by @i{net-snmp}, as described in later chapters.
@c =========================================================================
@node Tools
@section Tools
As said, I didn't find any simple tool to make @sc{snmp} queries. So I stuck
to @t{snmpwalk}. It's worth noting that @t{pysnmpwalk} (part
of the @t{python-pysnmp4-apps} Debian package) works as well, with a
compatible command line and output format.
The reason why @t{snmpwalk} is not simple enough for me, is that it
includes a MIB parser, in order to turn numeric pathnames and data into
user-friendly names and values.
I won't repeat here how to use the tool (see other documentation or
the trivial examples in the @t{wr-switch-sw} manual), but I'll note
that @t{-d} provides a dump of @sc{snmp} frames being sent and received,
which is a good helper to understand what is happening under the hood,
and hopefully avoiding a detailed read of the ITU specification for
basic encoding.
@c ##########################################################################
@node Using the Code Generator
@chapter Using the Code Generator
What @i{net-snmp} suggests is the use of its own code generator,
which outputs C sources built from a specified MIB file and command-line
options. The generator is called @i{mib2c}.
There are a number of drawbacks in using the generator, so I finally
refrained from and chose to write code using the internal API
@itemize @bullet
@item The generated code includes parts that must be filled before it
can build and parts just marked as ``@t{XXX}'' but that otherwise
build. Thus, you really need to review the whole ``generated'' files
to bring them to a working state.
@item Some of the calls to be filled are normal API calls, so you need
to be confident with @i{net-snmp} internal data structures; the
same effort you need to spend before you are able to write code
by yourself.
@item The code is laid out as a number of big @t{switch} stanzas, without
relying on data structures. This makes editing the generated files
a heavy, repeating and failure-proof procedure.
@item The generated code includes a number of repetitions, so the same edits
must be redone at least twice (like filling the same @t{switch}
construct in two different places.
@item There are a number of options for the code generator: @i{mib2c}
uses a number of different templates, and I'm pretty sure not
all of them are used in practice; so I fear making the wrong choice
and finally hit bugs that are not my own.
@item Experimenting several options to compare them is unfeasible, because
every edit to fill specific data structures must be redone each
time. Similarly, you risk making the wrong choice and redo the
edits under a different template at a later time.
@item @i{mib2c} leaks object names found in the MIB file into the source
code it outputs, and it does it everywhere in the output files. Thus,
during development, you'll need to redo your
edits several times to keep the C files in sync with a moving MIB
definition. And the usual trick of re-applying the same patch doesn't
always work, because the MIB names appear all around the generated
file.
@item Documentation claims that the standard @i{net-snmp} modules are
based on @i{mib2c}, but while looking at the actual code I
didn't really found such traces. I admit that grepping for
``@t{auto-generated by mib2c}'' on the source tree finds a
number of matches, but most of them look heavily edited, and
none of them matches my expectations of a ``simple'' file.
@end itemize
I used the generator for the initial trial, @i{wrsScalar}, as
described in detail in @ref{wrsScalar}, but then I gave up. My failed
experiments are still part of the @t{netsnmp-pain} branch, pushed to
@t{ohwr.org}.
For example, commit @t{ed1d654} uses the ``mib for dummies'' option of
@i{mib2c} for the @t{pStats} table included in the local WR MIB file.
This crated 4000 lines of source code, in 12 files, and 1000 lines of
``README'' files. I would take days just to make sense of them.
The final statistics source file, not using the generator, is 230
lines of code.
@c ##########################################################################
@node Writing Real Code
@chapter Writing Real Code
In the end, I managed to make the thing work, using different
approaches for the different objects in the WR subtree. The objects
currently defined are the following ones, all under
@t{.1.3.6.1.4.1.96.100}, as defined in @t{WR-SWITCH-MIB.txt}:
@example
wrsScalar OBJECT IDENTIFIER ::= { wrSwitchMIB 1 }
wrsPstats OBJECT IDENTIFIER ::= { wrSwitchMIB 2 }
wrsPpsi OBJECT IDENTIFIER ::= { wrSwitchMIB 3 }
wrsVersion OBJECT IDENTIFIER ::= { wrSwitchMIB 4 }
wrsDate OBJECT IDENTIFIER ::= { wrSwitchMIB 5 }
@end example
All of the objects are read-only as of @t{wr-switch-sw-v4.0}. The
CamelCase naming convention matches what is found all over
@t{net-snmp}.
The various source files are built as a shared library; the
configuration file of @t{snmpd} instructs it to load the library at
run time.
Documentation claims that the same source files can be compiled to run
as an AgentX process or be directly linked to the @t{snmpd} binary. I
chose the shared-library build because this avoids patching
@t{net-snmp} within @i{buildroot}: maintaining direct source files
rather than patches is way easier. I didn't feel safe in using the
little-practiced AgentX protocol (especially for the upcoming ``set''
queries). Last but not least, the shared library build is what
Integrasys successfully used for version 2 of the WR switch.
@c =========================================================================
@node wrsScalar
@section wrsScalar
This is my first trial in making sense of the thing, and has no
relevance for White Rabbit. The object is an integer that is
incremented each time it is read. The source file comes from @i{mib2c}
using this ``intuitive'' sequence of commands (assuming you
build @t{wr-switch-sw} and @t{WRS_OUTOUT_DIR} is available.
@example
export BUILD_DIR="$WRS_OUTPUT_DIR/build/buildroot-2011.11/output/build"
export MIBDIRS=$BUILD_DIR/netsnmp-5.6.1.1/mibs
export MIBS=./WR-SWITCH-MIB.txt
$BUILD_DIR/netsnmp-5.6.1.1/local/mib2c \
-I $BUILD_DIR/netsnmp-5.6.1.1/local \
-c mib2c.scalar.conf \
wrsScalar
@end example
In practice, you need to set @t{MIBSDIR} and @t{MIBS} in the
environment, so both your own MIB and the ``standard'' ones are
available to the tool. Standard MIBs are needed to get type
definitions, using an ``import'' statement. Then you point your @i{-I}
to @i{mib2c} inside @i{net-snmp} itself and execute the program from
the same place.
The integer itself, like all scalar values, is returned in
item ``0'' of the subtree (thus, @t{1.3.6.1.4.1.96.100.1.0}).
@c =========================================================================
@node wrsPstats
@section wrsPstats
@c -------------------------------------------------------------------------
@node The pStats Table
@subsection The pStats Table
This subtree, @t{wrSwitchMIB.2}, is a table. All tables in @sc{snmp} are
described as being made of ``lines'' and ``columns''. The columns are
hardwired (in the MIB and in the code), and the lines can be dynamic
(this matches how people usually write tables).
The OID scanning, however, is reversed from our habit and the tables
are returned column by column. This happens because columns are
defined in the MIB (each column is a directory, or a subtree, in the
pathname of OIDs) while lines being dynamic can only appear as
trailing items in the scanning.
This implies, among other things, that any piece of code returning a
dynamic table should build an internal data structure representing the
whole table, in order to be able to consistently report the same lines
for each column.
Usually in a network-related table, the predefined columns represent
the counters (tx/rx byte, errors, and so on) and each network
interface is a line. This approach allows the same MIB to work for
every possible configuration. For WR port statistics we chose a
different approach: the counters themselves are somehow dynamic (they
may change across versions, while the gateware develops) while the
interfaces are restricted to be in the set @t{wr0}--@t{wr17}.
So our pStats table is reversed from the common use of @sc{snmp} tables.
As a side effect this allows the WR switch to return the name of each
counter, in column 0. This allows greater flexibility when we'll have
a new set of counters: the user-space tools will know the role of the
new set of counters without any need to change them or the MIB file
(we'll still need to change Switch @sc{snmp} code to match the new gateware.
@c -------------------------------------------------------------------------
@node pStats Code
@subsection pStats Code
After several distressing attempts with @i{mib2c}, still present in
the history of the @i{netsnmp-pain} branch in the @t{wr-switch-sw}
repository, I chose to base my code on the ``tcpTable'' implementation
that is present in the core @i{net-snmp} implementation. The TCP
table does not use any @i{mib2c} template but is rather using directly
the API. To be exceedingly safe in my steps, I started by replicating
the TCP table under the WRS subtree, and then I changed it
step-by-step to support the counters. Each and every small commit is
still in the @i{netsnmp-pain} branch, but the final source file was
separately committed to @i{snmp-for-wrs}, which was later merged to
@i{master}.
As described in @ref{The pStats Table}, @sc{snmp} tables are first filled
in local memory and then returned item-by-item to the network manager.
Table filling is performed by @t{wrsPstats_load()}, registered by
@t{init_wrsPstats()} using @t{netsnmp_inject_handler()}. Unlike what
happens in the TCP table, that allocates memory, I use static storage
to load the counter values. Then @t{wrsPstats_first_entry()}
and @t{wrsPstats_next_entry()} are used to scan the table, building the
indexes, but the actual value is returned by @t{wrsPstats_handler()}.
I suspect the thing is not very efficient overall.
One thing I found especially unpleasant in this implementation is the
use of ``context pointers'' in looping through the table. The API
supports the idea of a @t{loop_context} and @t{data_context}, but
elsewhere the loop context is called @t{iterator_context}. This
mismatch in naming in the tcpTable is now inherited in wrsPstats,
but sooner or later I'll fix it. As a side effect, I now use
the two contexts concurrently in an ambiguous way. No, I'm not
proud of this code.
I don't feel confident with all the data structures as yet, and there
still is some magic in all of this. This is confirmed by a buglet in
the current code, that makes @i{snmpwalk} always return one item after
the end of the table -- most likely I need to fix @t{next_entry()}
to return @t{NULL} earlier.
@c =========================================================================
@node wrsPpsi
@section wrsPpsi
This subtree was written in a hurry, and I feel likely we have some
buglets; for example ``servo updates'' is the number of iterations,
which will never exceed 32 bits, but it is reported using a 64-bit
counter. Moreover, not all management objects are actually filled,
but I chose to nail down the MIB even if the code is not completely
there.
This @t{wrSwitchMIB.3} is split in two subtrees: @t{wrSwitchMIB.3.1}
is an array of scalar values (all of them instantiated as @t{.0});
@t{wrSwitchMIB.3.2} is a table. Most functions in the code
use @t{ppsi_g} for globals and @t{ppsi_p} for the per-port table.
To keep the code compact and extensible, I chose to @t{popen(3)} a
connection to existing tools. The tools report information to
@i{stdout} in a line-oriented tagged-format: ``@t{<key>: <value>\n}''.
Thus, @t{wr_mon} now supports @t{-g} (``@t{SHOW_SNMP_GLOBALS}'') and
@t{-p} (``@t{SHOW_SNMP_PORTS}''). By pre-setting environment variables
it's also possible to override the command names, for testing; see
source code for details.
Parsing is implemented using a @t{pickinfo} structure, where each key
is associated to a data type, a pointer and a size. This is is used
to actually @t{sscanf} the value into a global structure. The same
``pickinfo'' table is later used to feed the binary data to @sc{snmp}.
This parsing trick is concise and completely debugged/tested, so I
plan to use to more widely when cleaning up and extending this WR @sc{snmp}
support.
The table of per-port values is scanned using the same steps as of
@i{wrsPstats} (i.e. the tcpTable way, using the @i{net-snmp} iterator
API).
The global items are registered as a ``scalar group'' using the
@i{net-snmp} API. I used @t{disman/expr/expScalars.c} as a reference
and starting point. The function @t{ppsi_g_group()} refreshes the
values (by calling @t{wr_mon -g}) whenever a request happens more than
1 second later than the previous refresh. I'm aware this is a quick
hack, but it works reliably without learning too many intricated API
calls.
@b{Note:} due to a bug in current @i{snmpwalk} implementations and
64-bit values, 64-bit counters are returned with the two halves
swapped. Also, there is no ``signed'' 64-bit value defined anywhere
in @sc{snmp}, thus the picosecond signed offset will be represented by
tools as a huge number when it actually is a small negative value.
@c =========================================================================
@node wrsVersion
@section wrsVersion
The version is an array of scalars, so I used the ``scalar group''
approach like the global PPSi values described in @ref{wrsPpsi}.
The implementation is easier, because I rely on the fact that versions
never change while the process runs. So I retrieve the version strings
at initialization time, by calling ``@t{wrsw_version -t}'' (tagged)
and parsing its @i{stdout}. Parsing is easier than what we have in
@i{wrsPpsi}, but my plan is having a unified parser overall, and eventually
get rid of this simplified special case.
@c =========================================================================
@node wrsDate
@section wrsDate
This subtree includes two scalars: the TAI seconds as a ``counter64''
value, and the human-readable equivalent string.
The code uses a scalar group, as other subtrees described above already did.
It maps @sc{fpga} memory to get the WR date and return it as scalar @sc{snmp}
values.
@b{Note:} due to a bug in current @i{snmpwalk} implementations and
64-bit values, the two halves of the 64-bit date are returned swapped.
We feel returning a 32-bit value would be a worse choice, not being
2038 safe. When the bug is overall fixed, we'll be able to avoid
word-swapping and be 2038-safe without changing the MIB file.
@bye
@c LocalWords: snmp wrSwitchMIB netsnmp ohwr snmpwalk AgentX buildroot
@c LocalWords: pStats gateware wrsScalar wrsPstats wrsPpsi wrsVersion
@c LocalWords: wrsDate subtree pathname
\input texinfo @c -*-texinfo-*-
%
% wrs-todo.in - main file for the documentation
%
%%%%
%------------------------------------------------------------------------------
%
% NOTE FOR THE UNAWARE USER
% =========================
%
% This file is a texinfo source. It isn't the binary file of some strange
% editor of mine. If you want ASCII, you should "make wrs-todo.txt".
%
%------------------------------------------------------------------------------
%
% This is not a conventional info file...
% I use three extra features:
% - The '%' as a comment marker, if at beginning of line ("\%" -> "%")
% - leading blanks are allowed (this is something I cannot live without)
% - braces are automatically escaped when they appear in example blocks
%
@comment %**start of header
@documentlanguage en
@documentencoding ISO-8859-1
@setfilename wrs-todo.info
@settitle wrs-todo
@iftex
@afourpaper
@end iftex
@paragraphindent none
@comment %**end of header
@setchapternewpage off
@set update-month September 2014
@c the release name below is substituted at build time
@set release __RELEASE_GIT_ID__
@finalout
@titlepage
@title WRS-Todo
@subtitle What is missing or lacking in wr-switch-sw (my own opinion)
@subtitle @value{update-month} (@value{release})
@author Alessandro Rubini
@end titlepage
@headings single
@c ##########################################################################
@iftex
@contents
@end iftex
@c ##########################################################################
@node Top
@top Introduction
This document lists the items I think are suboptimal in the current
@t{wr-switch-sw} package. This is Alessandro's own personal opition,
but it is more or less agreed-with by other developers in the group.
The items here more a wish list, or long-time planning, rather than
a list of pending bugs. For that, please use the @i{issues} feature
of @t{ohwr.org}.
The document is now included in the master branch, because the list,
seen as a review of the project, is considered useful. Besides, we are
going to fix things over time, from September 2014 onwards, and this
document will be shortened accordingly.
@sp 1
Some things are more important and some are less. Each developer has different
judgment metrics, so I won't mark the items by importance; rather, they
are split into topics and randomly listed within each of them.
@c ##########################################################################
@node Documentation
@chapter Documentation
@itemize @bullet
@item @t{wrs-build} grew to not only cover building. It has some of
the internals, which should be split off to somewhere else.
@item We need a document about internals, but it would somehow conflict
with the 7S user manual. We need to agree carefully about what lives
where. I started one, some time ago, which is not published anywhere;
it still needs serious work but everything depends on how we split
the material.
@item The user manual is not part of the software package, and I'm not
sure whether it should be or not -- it covers hardware as well. But I
really think a user manual is mainly about how to use the thing, so
its place is the software repository. Other people's opinion differ.
@item Documentation about the web interface (by 7S) must find a proper
place in the overall design, too.
@end itemize
@c ##########################################################################
@node Buildroot
@chapter Buildroot
@itemize @bullet
@item Buildroot, the embedded distribution we are using, should be updated
any now and then. This is not really needed or urgent at this point,
but updating the distribution allows us to stay current with features
people may expect on ``current'' networking gears. This means porting
and verifying our patches -- but they are few and it is not much work.
@item The @i{net-snmp} configuration should be patched to avoid
needless configuration items, like disk usage, processes and the list of
installed packages. For this we can lift a patch by Integrasys, that
must be cherry-picked and verified on the current code base. As a matter
of fact, the @i{snmpd} process is currently more than 1M in RAM footprint.
@item @i{bash} should be the default interactive shell. The management
of history in @i{barebox} is unacceptable (if you @i{ssh} in twice,
the up-arrow will show interference between shells).
@item We need a way to set a root password during configuration.
@end itemize
@c ##########################################################################
@node Booting
@chapter Booting
@itemize @bullet
@item @i{dataflash} support in our version of @i{barebox} is not working.
We need to either upgrade @i{barebox} or fix the bug. I don't know
whether the bug has been fixed in recent boot loader versions (last time
I checked the @i{git} logs I found nothing meaningful, but it was around
May 2014.
@item We need to save some hardware information items to @i{dataflash}.
Such information should never be modified after the switch is shipped.
The plan is using SDB for it (i.e. SDBFS) but we must define proper
procedures and prevent accidental erasure of such data. This is urgent,
and we need it by September for release 4.1.
@item Installation and run-time boots in two different ways. At installation
we do everything using the @i{sam-ba} tools, while runtime relies on
@i{at91boot}. This means we are currently maintaining two different
and very different code bases; SDRAM and clocks configuration is
done twice, in different ways. Both code bases are horrible-quality
code, so we wasted a lot of time already, and more is expected -- all of
this twice. We should get rid of the @i{sam-ba} tools
and use @i{at91boot} for installation. Not trivial, because @i{at91boot}
loads the boot loader from storage, while at installation it must stop
waiting for the boot loader to be sent to RAM from USB.
@end itemize
@c ##########################################################################
@node Kernel and Drivers
@chapter Kernel and Drivers
@itemize @bullet
@item 2.6.39 is pretty old nowadays. We should move to a recent kernel.
Unfortunately this means writing the ``device tree'' crap for our board,
as everything nowadays is device-tree-based. Which in my opinion has
been designed by our competitors, to kill us slowly and painfully.
In addition, our patches must be ported forward.
@item The @i{wr-nic} driver is very similar to what we run for the SPEC,
which is not surprising because the gateware is basically the same.
We should make the two drivers identical. I did some initial work
on this, available in the @i{wr-nic-130207} branch, but it must be
completed (a similar branch exists in @i{spec-sw}, changing the
@t{wr-nic.ko} to use the same code base as the switch.
@item We should push upstream the generic changes we made to the kernel,
like increasing @t{NR_IRQ} and exporting symbols for externally-loaded
@t{irq_chip} drivers.
@item We'd benefit from having ``pstats'' names in @t{/proc/sys} where
the counters are. This would make the @sc{snmp} code generic, so
no change there would be needed if and when the counters change.
@end itemize
@c ##########################################################################
@node Gateware
@chapter Gateware
@itemize @bullet
@item We should have SDB in the switch gateware, to simplify a number
of things and avoid explicit addresses in so many places.
@item The ``pstats'' counters should export a version (even before SDB
is used). (@b{Update}: it is there, it just not used yet -- thanks Greg).
Thus, the kernel driver will know what the counters are, and
we could run different gateware images with the same filesystem, because
the driver will support the various versions counter sets.
@end itemize
@c ##########################################################################
@node The WR Processes and Libraries
@chapter The WR Processes and Libraries
@b{Note:} I didn't yet review seriously the WR processes, and I'm sure
I'll have more to complain about when I'm done. Please note that my
criticism is only about the code itself and I've nothing against the
authors of the code: I know the history of the project and why things
are how they are.
@itemize @bullet
@item The most serious problem here is that we have no real design around
the code base. The bunch of things is sticking together, more or less, but
it's not clear to me (and to most other people) what is the role of each
library and how the various items communicate.
@item Related to the previous item, we lack proper locking for FPGA access,
and we should really have a single entry point to hardware, with
serialized access. This may be addressed by writing a driver, or
clarifying the library. I don't know the details at this point in time,
but I know everybody's calling @t{mmap} by itself, which makes me feel
very uneasy.
@item As an example for the above issue, we may well have a @i{wr}
kernel object where the WR date is returned as a @i{sysfs} item. This
would simplify a number of tools that all @t{mmap} the hardware and
arrange for the seconds and fractional part to be returned consistently.
Avoiding repetition of code avoids repetition of bugs as well.
Ideally, in the long run, we'll have a real @i{wr} driver, likely driven
by SDB, so this @i{date} attribute will be part of the WR ABI.
As a side effect, a file-based interface allows off-switch simulation
and development (which includes simulating errors to test error paths in
the actual code).
@item Whether or not we port some stuff in kernel space, we need to have
a single @i{libwr}, to merge and replace @i{libptpnetif} and
@i{libswitchhw}. The role of each of them is not really clear to
developers at this point, and using a single library will simplify
building the tools;
@item While merging the libraries, we should seriously audit them.
Some functions do nothing, some are never called, and some are only
used withing the @i{hal} process, so there's no need to have them
public in an exported library. The library code rusted over the years,
and we can't trust it any more without some clean up. I'm personally
scared every time I look in that code base.
@item The @i{mini-ipc} mechanism is showing its limits. We are communicating
too much information, and the polling mechanism used in some processes
introduces significant delays. Most of the communication is just
about reporting status, so this should be brought to a shared-memory
area with a well-defined structure. Again, this allows simulation and
development off-switch, with fault injection.
@item Whenever @i{ipc} is used for actions, rather than just status passing,
we should audit the processes to ensure the are based on @i{select}
rather than a timely poll. Actually, the HAL process is using quite
a lot of cpu time, and it could do much better.
@item The @i{rtud} should be seriously audited and simplified. The
interface with gateware is now simpler than it was, but software still
has remnants of old data structure. Also, the multi-threaded approach
is overkill, and the program could benefit from a simplification.
@item We need to sort out configuration files. I especially plan to
get rid of @i{lua}; it is only used to set internal variables in the @i{lua}
data structures, that are then queried. This means if you set an unused
item (e.g. because of a typo or some old wishful-thinking documents) it won't
have any effect and it will report no error. The best action may be move
all configurations to the PPSi config file -- the delays and all the rest
are PTP-specific anyways, and PPSi supports per-architecture configuration
items.
@item We need a way to change parameters at run time in the @i{softpll}
implementation running on the switch. We may base this on the @i{ipc}
mechanism that is already in place, or design a well-know structure in
the LM32 code, so the main CPU can just change parameters on the fly.
I prefer the latter approach, but I need to talk with LM32 developers
to agree about how to do that, and ensure everybody follows the policy
we define.
@end itemize
@c ##########################################################################
@node PTP
@chapter PTP
@itemize @bullet
@item We should remove @i{ptp-noposix}. Keeping it around brings a number
of issues; mainly it prevents changes in WR libraries. That's because
any change and cleanup must now be verified against both PPSi and
@i{ptp-noposix}. But the latter is not being used routinely, so those
changes require extra effort and remain not very well tested anyways.
@item PPSi support for @t{arch-wrs} has some strange use for the timeouts,
and some of the code is more complex than it should. I need to audit
it and make it shine. This applies to both the WR servo and the
@t{arch-wrs} code. For the latter, I'm already confident with it,
after fixing the lost-frame problem. Still, some time must be devoted
to this.
@item We should drive the WR PLL when being slave to a non-WR master.
The performance of wr-switch as a slave of another PTP host is now
unacceptable, because we only jump time without adjusting frequency.
@item PPSi should be able to rescan its configuration file. This is needed
to allow management actors to change the configuration and see it
effective soon after.
@item PPSi should be able to scan network interfaces, and apply a default
configuration to them. Listing 18 times the same information is definitely
neither beautiful nor useful.
@item We should clarify how VLANs are managed in our PTP implementation,
and by matching the standard if possible. It currently works ``by
chance'' -- but for example we didn't test on a PC, and there we'll
clearly need to rescan network interfaces to send PTP frames on new
VLANs as they are configured.
@item Our BSD mate @t{phk} made a serious review of PPSi code, and he
outlined a number of lacking areas. We should work on his input, to make
the code base stronger and more maintainable.
@item We should support UDP over IPV6, to be interoperable with our
neighbors, and to advertise ourselves as a more ``complete'' PTP
implementation.
@item WR GrandMaster Switch should be holy provided it has an
external reference. Currently if we have a GrandMaster Switch and we
connect a Free-running Master to it's Slave port (wr0) then it becomes
Slave to the Free-running Master and jumps it's WR time. All the
mechanism is in place, this should be trivial to fix.
@item If Slave is in TRACK_PHASE state and a time jump occurs it doesn't
reset its servo. It stays in TRACK_PHASE forever with a huge offset.
That is something we've already observed in AD test system for
Jean-Claude when GrandMaster was unlocking from the reference GPS. Not
following the time jump will be also a problem if we power on in the
same time all switches in the chain with GrandMaster on the top. It
takes some time for GM to lock so the network would start as
free-running to the second switch in the chain, but once GM is
synchronized all slaves will stay in TRACK_PHASE with large offset to
GrandMaster.
@end itemize
@c ##########################################################################
@node Tools
@chapter Tools
@itemize @bullet
@item Tool naming is completely inconsistent. We should move everything
to use unambiguous naming: @t{wr_} for every wr-wide item, and @t{wrsw_}
for every switch-specific thing. (Actually, @t{wrs_} would be better,
as it doesn't look like ``software'' and we use wrs/wrn internally already,
but several tools use @t{wrsw_} as a prefix already.
We fixed the ``shower'' name -- (@t{shw_ver}: switch hardware version),
but more are there. (@b{Update}: Greg says to use @t{wrs_} and do it soon).
@item A number of tools @t{mmap()} FPGA memory, but there is no
locking so it is (remotely) possible for tools to interfere and get
wrong results. We should define a good policy with proper locking, to
avoid mishaps. This problem is listed in @ref{The WR Processes and
Libraries} too.
@item There are too many @i{load} source files in @t{userspace/tools};
some are definitely not used any more. We need to clean up.
@item @t{wr_management} is an incomplete mess; mostly a copy of
@t{wr_mon}, slightly modify in its way toward being something else,
but nobody knows what it is now. We need to make it a serious thing
if it is used (by the web interface, it seems, called by
@t{functions.php}), or kill it outright.
@end itemize
@c ##########################################################################
@node Management
@chapter Management
@itemize @bullet
@item There is no grand design for management as yet. @sc{snmp} support,
the tools and the web interface are all indepenent items. If something
changes we must change it in several places.
@item Current @sc{snmp} code is not beautiful at all (as documented
in the @i{snmp-pain} document) and should be made better. Some stuff
needs to be factorized, some variables should be renamed and the use
of ``context'' items in tables should be fixed.
@item Some items that are already in the WR MIB are not filled, especially
in the PTP/PPSi area.
@item We lack support for the VLAN configuration items. This means
implementing the standardized VLAN MIB. Unfortunately we can't
really have any sensible code using @i{mib2c} (see the @i{snmp-pain}
document, so previous work by Integrasys can be only used as a reference.
also considering the gateware implementation is different from what they
used. Integrasys properly committed the auto-generated code and their
edits separately, so their work can be of help.
@item There is no ``set'' support, we only return values to the
manager by now. While current configuration objects are read-only,
I must learn how to deal with read-write objects before trying to
extend our MIB.
@item There is no support yet for sending traps on WR events.
This is even more important thatn @i{set}, because we want to notify
WR time jumps and lock/unlock events.
@item We need to seriously audit the web interface, and verify
everything is hooked properly to the system. This relates to the
changes in configuration files and everything else that is going to be
modified in the package at large. Ideally, we should agree about
a single interface between the web and the rest of the system.
If we are brave enough, the web interface might make @sc{snmp} calls;
but this requires having strong @sc{snmp} support for everything, first.
@end itemize
@bye
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment