Merge branch 'developers-docs'

This adds two documents of mine (with greg's additions) that are considered useful for developers.

Merge branch 'developers-docs'
This adds two documents of mine (with greg's additions) that are considered useful for developers.
9e31590c · Alessandro Rubini · 7e82f062 · 584c163b · 9e31590c · 9e31590c
Commit 9e31590c authored Sep 03, 2014 by Alessandro Rubini
Show whitespace changes
Inline Side-by-side

Showing with 1095 additions and 0 deletions

snmp-pain.in doc/snmp-pain.in +660 -0

wrs-todo.in doc/wrs-todo.in +435 -0

No files found.
--- a/doc/snmp-pain.in
+++ b/doc/snmp-pain.in
+\input texinfo    @c -*-texinfo-*-
+%
+% snmp-pain.in - main file for the documentation
+%
+%%%%
+
+%------------------------------------------------------------------------------
+%
+%                         NOTE FOR THE UNAWARE USER
+%                         =========================
+%
+%    This file is a texinfo source. It isn't the binary file of some strange
+%    editor of mine. If you want ASCII, you should "make snmp-pain.txt".
+%
+%------------------------------------------------------------------------------
+
+%
+% This is not a conventional info file...
+% I use three extra features:
+%   - The '%' as a comment marker, if at beginning of line ("\%" -> "%")
+%   - leading blanks are allowed (this is something I cannot live without)
+%   - braces are automatically escaped when they appear in example blocks
+%
+
+@comment %**start of header
+@documentlanguage en
+@documentencoding ISO-8859-1
+@setfilename snmp-pain.info
+@settitle snmp-pain
+@iftex
+@afourpaper
+@end iftex
+@paragraphindent none
+@comment %**end of header
+
+@setchapternewpage off
+
+@set update-month August 2014
+@c the release name below is substituted at build time
+@set release __RELEASE_GIT_ID__
+
+@finalout
+
+@titlepage
+@title SNMP Pain
+@subtitle Why (and how) I suffered while adding SNMP to White Rabbit
+@subtitle @value{update-month} (@value{release})
+
+@author Alessandro Rubini
+@end titlepage
+@headings single
+
+@c ##########################################################################
+@iftex
+@contents
+@end iftex
+
+@c ##########################################################################
+@node Top
+@top Introduction
+
+This summarizes my experience with @sc{snmp} and the WR switch, which
+has been a real pain to me.
+
+I know it's mainly me, as it looks like the rest of the world is
+happily using both @sc{snmp} as a protocol and @i{net-snmp} as a daemon.  I
+hope these notes may help other people within BE-CO-HT in finding a
+better way through @sc{snmp} and its implementations.
+
+This version of the document reflects the status of my knowledge and
+implementation as of @t{wr-switch-sw-v4.0}.
+
+As of Sep 2014 the document is included in the master branch of the
+repository because my mates consider is useful as refernce of what
+is there and why.  I'll try to keep it up to date with the various
+MIB object I add or modify.
+
+@c ##########################################################################
+@node The SNMP Protocol
+@chapter The SNMP Protocol
+
+The protocol itself is simple, as the name implies, but not really
+trivial to digest (unlike SMTP or HTTP, for example).  The datagrams
+are binary data: it makes sense for efficiency and to avoid writing a
+text parser in microcontrollers running the protocol, but it prevents
+testing using @t{telnet} or @t{wget}.
+
+The ``easiest'' testing tool is @i{snmpwalk}, which is bloated like
+the package it is part of. This is unavoidable for ``complete''
+implementations; my problem here is the lack of a simpler tool for
+an easy start. See @ref{Tools} for more details.
+
+@c =========================================================================
+@node Protocol Definition
+@section Protocol Definition
+
+@sc{snmp} is defined in the public RFC documents, but the list of them is
+too long to even list here -- it is really dozens of such documents.
+A good selection is found in the @t{doc/rfc} subdirectory of the
+@i{net-snmp} source archive.  Most of the relevant RFCs deal with
+defining MIB files (see @ref{The MIB Idea}), and the actual protocol
+bits (the frame format) is difficult to find.
+
+Actually, unlike most or all other IETF protocols, @sc{snmp} isn't really
+defined in any RFC, but relies on the @i{basic encoding rules} (ITU-T
+X.690) for the @i{abstract syntax notation} ASN.1 (i.e., again the MIB
+stuff).  So the huge lot of information out there tastes like a big loop of
+nothing, where everyone takes a lot for granted and refers to everyone
+else for details.  For me it was like learning Chinese relying on
+a number of Chinese dictionaries and nothing else.
+
+Eventually, after making sense of the protocol itself (mainly by
+sniffing and reverse engineering, while referring to the Chinese
+dictionaries any now and then), I finally changed my mind and agreed
+that it makes a good choice, and might be considered for the WR node
+as well (SPEC or equivalent). This however is not covered here.
+
+@c =========================================================================
+@node Basic Concepts
+@section Basic Concepts
+
+The network device runs an ``agent'' and the user runs a ``manager''.
+
+@sc{snmp} is based on the concept of management objects, that can be read
+or written.  So the requests a manager sends to agents are mainly
+``get'' and ``set''.
+
+Management objects are laid out in a tree. Thus an ``object
+identifier'' (OID) is like a pathname.  The textual representation of
+an OID uses a dot as a separator; the binary representation is an
+array of integers. For example WR owns the subtree
+``1.3.6.1.4.1.96.100'', where the ``...96'' is CERN and ``....1'' is
+for organizations. Identifiers are allocated by relevant organizations:
+CERN gave us our ``100'' and we are responsible for further levels.
+
+The manager can travel the whole tree or a subtree using a special
+``get next'' request.
+
+Additionally, an agent can send traps, but I have no experience with
+generating custom traps.
+
+@c =========================================================================
+@node The MIB Idea
+@section The MIB Idea
+
+The main aim of @sc{snmp} is being simple for the agent, while still
+remaining flexible. Thus the choice of a binary protocol and a get/set
+approach with an extensible tree of objects as described.
+
+When using a binary protocol on the wire and a simple pathname-based
+tree with integers as path components, the need for an higher-level
+description of the item arises immediately.  The solution adopted by
+@sc{snmp} designers is relying on a very-flexible and very-abstract notation:
+ASN.1, as already noted.
+
+Each and every management object is thus defined by a MIB, short for
+@i{management information base}. At least in the golden theory.  A MIB
+is a text file, with a very boresome structure, usually suffering from
+an over-engineering syndrome.  I must admit, though, that the thing
+was over-engineered enough to survive the decades passing: it allowed
+casting of a number of new and unexpected needs into this old and
+unpleasant abstract notation; modern and complex management tools can
+make sense of unforeseen management objects, thanks to their
+description found in the MIB file.
+
+The @sc{snmp} mantra is ``everything is a MIB''. Or ``just write the MIB
+and everything will automatically follow''. The theory says that
+the MIB text file can be parsed by the manager's application to show
+an user-friendly view of any object; it says that it can be parsed by
+the command-line tools so to use user-friendly names in the requests;
+it says that it can be parsed by the agent to create reply frames; that
+it can be parsed by a code-generator to fill he low-level bits. 
+
+So @sc{snmp} is mainly boring stuff for bureaucrats. Everything is hidden
+behind ASN.1: it either magically works or it magically fails; there
+is no clear documentation of the various levels of software or
+protocol, because it is claimed to ``just work'', which is not always
+true, despite theory.  Also, in practice sensible implementations avoid
+the suggested MIB-driven path.
+
+In the end, the @i{wr-switch-sw} package includes
+@t{WR-SWITCH-MIB.txt}, but clearly it's not true that this file is the
+whole of it.
+
+@c ##########################################################################
+@node Choosing an Agent
+@chapter Choosing an Agent
+
+When adding @sc{snmp} support to a network device, you should always at
+least include the standard management objects, the ones that every
+manager expects to find on the system.  This is the list of network
+interfaces, their features, their traffic statistics, and so on.
+
+Thus, writing your own is not an option: it doesn't make sense to have
+an incomplete wr-switch @sc{snmp} support, and re-implementing the huge
+number of standard management objects in a custom @sc{snmp} implementation
+is too big an effort to even consider it.
+
+I therefore looked at available free-software implementations.
+
+@c =========================================================================
+@node Requirements for WR
+@section Requirements for WR
+
+In order to choose a proper @sc{snmp} engine, we first need to know what
+we expect from it.  This a quick attempt at summarizing our requirements:
+
+@itemize @bullet
+
+@item We need support for all the basic actions, including traps, even if
+version 4 is not yet using them.
+
+@item We need to customize standard tables. This
+mainly involves VLAN support, because the WR switch needs a different
+backend than the normal kernel-driven Linux VLANs. Again, not yet
+implemented in version 4.
+
+@item We need to add a custom subtree, that includes both simple items
+and tables. Simple items are things such as the version string and the
+current time (the former is static and the latter is dynamic); tables
+are things such as the per-port statistic counters or the PTP slave
+list, if and when we choose it implement it.
+
+@end itemize
+
+@c =========================================================================
+@node net-snmp
+@section net-snmp
+
+Everybody, in the Unix world, is using @i{net-snmp}.
+
+This is a big and bloated implementation, using GNU autotools for
+configuration, shared libraries and everything else. It includes
+``simple'' tools for querying agents using the command line, and the
+usual bells and whistles.
+
+It is included in all Linux distributions, both hosted and embedded ones,
+and it really looks like the only choice available.  So we installed
+this one in the WR switch, by selecting the proper packages in our
+@i{buildroot} configuration.
+
+Documentation for @i{net-snmp} is mainly online: the promising @t{doc}
+subdirectory in the source tree only hosts the RFC documents, but
+fortunately @t{man/} includes documentation for the API and basic
+daemon use.  Tutorials and all the rest are available on the project
+site.  This means, among other things, that you can't access most
+documentation while off-line (like I am while writing this document)
+and you can't easily get documentation for the specific version you
+are using, excluding the manual pages.
+
+Still, the documentation is quite complete and well-done, though
+featuring the @sc{snmp}-wide misfeature of taking a lot for granted, it is
+useful for an expert user but not a great way to become one such beast.
+
+@c =========================================================================
+@node Other agents
+@section Other agents
+
+Simply put, there are no other choices other than @i{net-snmp}.  A
+number of proprietary implementations exist, but the free world seems
+bound to this implementation.
+
+(This section needs an update with the aid of a net search; I did it
+back then but since I found nothing I didn't save the results; and
+I'm now offline while writing this).
+
+@c =========================================================================
+@node Sub-agents
+@section Sub-agents
+
+Adding custom tables to an @sc{snmp} agent is a common requirement, so
+RFC-2741 defines the @i{AgentX} protocol.  The protocol is run on a
+local socket interface, and allows registering handlers for specific
+subtrees.  @i{net-snmp} supports the AgentX protocol, so this looked
+like an interesting option.
+
+This is the result of my evaluation:
+
+@itemize @bullet
+
+@item It is definitely not much used. I could not find any mainstream user
+of the feature; @t{lldpd} is the only one Debian package that names
+AgentX but it looks like the feature is not used in the Debian build.
+
+@item The code base supports both C and Perl bindings, the preferred
+one being Perl. Besides any performance concern, we can't easily run Perl
+on the WR switch because @i{buildroot} doesn't support it.
+
+@item It looks like support in @i{net-snmp} is incomplete, but I can't
+currently find the reference about this while being offline.
+
+@end itemize
+
+Still, AgentX looked promising, so I evaluated the thing.  The most
+interesting path being the various Python implementations: we already
+have Python in the WR Switch, because @i{buildroot} supports it and we
+needed it for the production test suite.
+
+There are three Python implementations, as far as I know:
+
+@table @code
+
+@item https://github.com/rayed/pyagentx
+
+	A 2013-2014 implementation, BSD license.  This only implements
+        ``get'' and ``get-next'' so it doesn't support our post-v4.0
+        needs.
+
+@item https://pypi.python.org/pypi/agentx/0.7
+
+	This is a 2010 GPL implementation. It looks a little too small
+        and simplified. I feared support dynamic tables herein would not
+        be easy.
+
+@item https://pypi.python.org/pypi/netsnmpagent/0.5.0
+
+	A 2012-2013 implementation, GPL3.  This seems serious, and it
+        paints itself as the result of a lot of frustration using the
+        available tools, something I sympathize with. It credits
+        @i{agentx} (preceding item in this list).  Unfortunately traps
+        are still missing.
+
+@end table
+
+The last option seems a viable one, but I eventually ruled it out
+because I'm not confident enough with Python to easily master it (for
+example, I'm sure I wouldn't be able to add traps in a reasonable time
+frame), and our WR management objects require a C language backend, at
+times -- which is another area where my Python lacks.
+
+Likely I was also scared by the length of the @t{SIMPLE-MIB.txt} it
+includes, even if in the end I wrote my MIB anyways.
+
+After evaluating sub-agents, I chose to stick to the shared-library
+mechanism offered by @i{net-snmp}, as described in later chapters.
+
+@c =========================================================================
+@node Tools
+@section Tools
+
+As said, I didn't find any simple tool to make @sc{snmp} queries. So I stuck
+to @t{snmpwalk}.  It's worth noting that @t{pysnmpwalk} (part
+of the @t{python-pysnmp4-apps} Debian package) works as well, with a
+compatible command line and output format.
+
+The reason why @t{snmpwalk} is not simple enough for me, is that it
+includes a MIB parser, in order to turn numeric pathnames and data into
+user-friendly names and values.
+
+I won't repeat here how to use the tool (see other documentation or
+the trivial examples in the @t{wr-switch-sw} manual), but I'll note
+that @t{-d} provides a dump of @sc{snmp} frames being sent and received,
+which is a good helper to understand what is happening under the hood,
+and hopefully avoiding a detailed read of the ITU specification for
+basic encoding.
+
+@c ##########################################################################
+@node Using the Code Generator
+@chapter Using the Code Generator
+
+What @i{net-snmp} suggests is the use of its own code generator,
+which outputs C sources built from a specified MIB file and command-line
+options. The generator is called @i{mib2c}.
+
+There are a number of drawbacks in using the generator, so I finally
+refrained from and chose to write code using the internal API
+
+@itemize @bullet
+
+@item The generated code includes parts that must be filled before it
+      can build and parts just marked as ``@t{XXX}'' but that otherwise
+      build. Thus, you really need to review the whole ``generated'' files
+      to bring them to a working state.
+
+@item Some of the calls to be filled are normal API calls, so you need
+      to be confident with @i{net-snmp} internal data structures; the
+      same effort you need to spend before you are able to write code
+      by yourself.
+
+@item The code is laid out as a number of big @t{switch} stanzas, without
+      relying on data structures. This makes editing the generated files
+      a heavy, repeating and failure-proof procedure.
+
+@item The generated code includes a number of repetitions, so the same edits
+      must be redone at least twice (like filling the same @t{switch}
+      construct in two different places.
+
+@item There are a number of options for the code generator: @i{mib2c}
+      uses a number of different templates, and I'm pretty sure not
+      all of them are used in practice; so I fear making the wrong choice
+      and finally hit bugs that are not my own.
+
+@item Experimenting several options to compare them is unfeasible, because
+      every edit to fill specific data structures must be redone each
+      time.  Similarly, you risk making the wrong choice and redo the
+      edits under a different template at a later time.
+
+@item @i{mib2c} leaks object names found in the MIB file into the source
+      code it outputs, and it does it everywhere in the output files. Thus,
+      during development, you'll need to redo your
+      edits several times to keep the C files in sync with a moving MIB
+      definition. And the usual trick of re-applying the same patch doesn't
+      always work, because the MIB names appear all around the generated
+      file.
+
+@item Documentation claims that the standard @i{net-snmp} modules are
+      based on @i{mib2c}, but while looking at the actual code I
+      didn't really found such traces. I admit that grepping for
+      ``@t{auto-generated by mib2c}'' on the source tree finds a
+      number of matches, but most of them look heavily edited, and
+      none of them matches my expectations of a ``simple'' file.
+
+@end itemize
+
+I used the generator for the initial trial, @i{wrsScalar}, as
+described in detail in @ref{wrsScalar}, but then I gave up. My failed
+experiments are still part of the @t{netsnmp-pain} branch, pushed to
+@t{ohwr.org}.
+
+For example, commit @t{ed1d654} uses the ``mib for dummies'' option of
+@i{mib2c} for the @t{pStats} table included in the local WR MIB file.
+This crated 4000 lines of source code, in 12 files, and 1000 lines of
+``README'' files. I would take days just to make sense of them.
+
+The final statistics source file, not using the generator, is 230
+lines of code.
+
+@c ##########################################################################
+@node Writing Real Code
+@chapter Writing Real Code
+
+In the end, I managed to make the thing work, using different
+approaches for the different objects in the WR subtree.  The objects
+currently defined are the following ones, all under
+@t{.1.3.6.1.4.1.96.100}, as defined in @t{WR-SWITCH-MIB.txt}:
+
+@example
+   wrsScalar     OBJECT IDENTIFIER ::= { wrSwitchMIB 1 }
+   wrsPstats     OBJECT IDENTIFIER ::= { wrSwitchMIB 2 }
+   wrsPpsi       OBJECT IDENTIFIER ::= { wrSwitchMIB 3 }
+   wrsVersion    OBJECT IDENTIFIER ::= { wrSwitchMIB 4 }
+   wrsDate       OBJECT IDENTIFIER ::= { wrSwitchMIB 5 }
+@end example
+
+All of the objects are read-only as of @t{wr-switch-sw-v4.0}.  The
+CamelCase naming convention matches what is found all over
+@t{net-snmp}.
+
+The various source files are built as a shared library; the
+configuration file of @t{snmpd} instructs it to load the library at
+run time.
+
+Documentation claims that the same source files can be compiled to run
+as an AgentX process or be directly linked to the @t{snmpd} binary.  I
+chose the shared-library build because this avoids patching
+@t{net-snmp} within @i{buildroot}: maintaining direct source files
+rather than patches is way easier.  I didn't feel safe in using the
+little-practiced AgentX protocol (especially for the upcoming ``set''
+queries). Last but not least, the shared library build is what
+Integrasys successfully used for version 2 of the WR switch.
+
+@c =========================================================================
+@node wrsScalar
+@section wrsScalar
+
+This is my first trial in making sense of the thing, and has no
+relevance for White Rabbit. The object is an integer that is
+incremented each time it is read.  The source file comes from @i{mib2c}
+using this ``intuitive'' sequence of commands (assuming you
+build @t{wr-switch-sw} and @t{WRS_OUTOUT_DIR} is available.
+
+@example
+   export BUILD_DIR="$WRS_OUTPUT_DIR/build/buildroot-2011.11/output/build"
+   export MIBDIRS=$BUILD_DIR/netsnmp-5.6.1.1/mibs
+
+   export MIBS=./WR-SWITCH-MIB.txt
+   $BUILD_DIR/netsnmp-5.6.1.1/local/mib2c \
+          -I $BUILD_DIR/netsnmp-5.6.1.1/local \
+          -c mib2c.scalar.conf \
+          wrsScalar
+@end example
+
+In practice, you need to set @t{MIBSDIR} and @t{MIBS} in the
+environment, so both your own MIB and the ``standard'' ones are
+available to the tool.  Standard MIBs are needed to get type
+definitions, using an ``import'' statement. Then you point your @i{-I}
+to @i{mib2c} inside @i{net-snmp} itself and execute the program from
+the same place.
+
+The integer itself, like all scalar values, is returned in
+item ``0'' of the subtree (thus, @t{1.3.6.1.4.1.96.100.1.0}).
+
+@c =========================================================================
+@node wrsPstats
+@section wrsPstats
+
+@c -------------------------------------------------------------------------
+@node The pStats Table
+@subsection The pStats Table
+
+This subtree, @t{wrSwitchMIB.2}, is a table.  All tables in @sc{snmp} are
+described as being made of ``lines'' and ``columns''. The columns are
+hardwired (in the MIB and in the code), and the lines can be dynamic
+(this matches how people usually write tables).
+
+The OID scanning, however, is reversed from our habit and the tables
+are returned column by column.  This happens because columns are
+defined in the MIB (each column is a directory, or a subtree, in the
+pathname of OIDs) while lines being dynamic can only appear as
+trailing items in the scanning.
+
+This implies, among other things, that any piece of code returning a
+dynamic table should build an internal data structure representing the
+whole table, in order to be able to consistently report the same lines
+for each column.
+
+Usually in a network-related table, the predefined columns represent
+the counters (tx/rx byte, errors, and so on) and each network
+interface is a line.  This approach allows the same MIB to work for
+every possible configuration.  For WR port statistics we chose a
+different approach: the counters themselves are somehow dynamic (they
+may change across versions, while the gateware develops) while the
+interfaces are restricted to be in the set @t{wr0}--@t{wr17}.
+
+So our pStats table is reversed from the common use of @sc{snmp} tables.
+As a side effect this allows the WR switch to return the name of each
+counter, in column 0. This allows greater flexibility when we'll have
+a new set of counters: the user-space tools will know the role of the
+new set of counters without any need to change them or the MIB file
+(we'll still need to change Switch @sc{snmp} code to match the new gateware.
+
+@c -------------------------------------------------------------------------
+@node pStats Code
+@subsection pStats Code
+
+After several distressing attempts with @i{mib2c}, still present in
+the history of the @i{netsnmp-pain} branch in the @t{wr-switch-sw}
+repository, I chose to base my code on the ``tcpTable'' implementation
+that is present in the core @i{net-snmp} implementation.  The TCP
+table does not use any @i{mib2c} template but is rather using directly
+the API.  To be exceedingly safe in my steps, I started by replicating
+the TCP table under the WRS subtree, and then I changed it
+step-by-step to support the counters. Each and every small commit is
+still in the @i{netsnmp-pain} branch, but the final source file was
+separately committed to @i{snmp-for-wrs}, which was later merged to
+@i{master}.
+
+As described in @ref{The pStats Table}, @sc{snmp} tables are first filled
+in local memory and then returned item-by-item to the network manager.
+Table filling is performed by @t{wrsPstats_load()}, registered by
+@t{init_wrsPstats()} using @t{netsnmp_inject_handler()}.  Unlike what
+happens in the TCP table, that allocates memory, I use static storage
+to load the counter values.  Then @t{wrsPstats_first_entry()}
+and @t{wrsPstats_next_entry()} are used to scan the table, building the
+indexes, but the actual value is returned by @t{wrsPstats_handler()}.
+I suspect the thing is not very efficient overall.
+
+One thing I found especially unpleasant in this implementation is the
+use of ``context pointers'' in looping through the table. The API
+supports the idea of a @t{loop_context} and @t{data_context}, but
+elsewhere the loop context is called @t{iterator_context}. This
+mismatch in naming in the tcpTable is now inherited in wrsPstats,
+but sooner or later I'll fix it. As a side effect, I now use
+the two contexts concurrently in an ambiguous way.  No, I'm not
+proud of this code.
+
+I don't feel confident with all the data structures as yet, and there
+still is some magic in all of this.  This is confirmed by a buglet in
+the current code, that makes @i{snmpwalk} always return one item after
+the end of the table -- most likely I need to fix @t{next_entry()}
+to return @t{NULL} earlier.
+
+@c =========================================================================
+@node wrsPpsi
+@section wrsPpsi
+
+This subtree was written in a hurry, and I feel likely we have some
+buglets; for example ``servo updates'' is the number of iterations,
+which will never exceed 32 bits, but it is reported using a 64-bit
+counter.  Moreover, not all management objects are actually filled,
+but I chose to nail down the MIB even if the code is not completely
+there.
+
+This @t{wrSwitchMIB.3} is split in two subtrees: @t{wrSwitchMIB.3.1}
+is an array of scalar values (all of them instantiated as @t{.0});
+@t{wrSwitchMIB.3.2} is a table.  Most functions in the code
+use @t{ppsi_g} for globals and @t{ppsi_p} for the per-port table.
+
+To keep the code compact and extensible, I chose to @t{popen(3)} a
+connection to existing tools.  The tools report information to
+@i{stdout} in a line-oriented tagged-format: ``@t{<key>: <value>\n}''.
+Thus, @t{wr_mon} now supports @t{-g} (``@t{SHOW_SNMP_GLOBALS}'') and
+@t{-p} (``@t{SHOW_SNMP_PORTS}'').  By pre-setting environment variables
+it's also possible to override the command names, for testing; see
+source code for details.
+
+Parsing is implemented using a @t{pickinfo} structure, where each key
+is associated to a data type, a pointer and a size.  This is is used
+to actually @t{sscanf} the value into a global structure.  The same
+``pickinfo'' table is later used to feed the binary data to @sc{snmp}.
+
+This parsing trick is concise and completely debugged/tested, so I
+plan to use to more widely when cleaning up and extending this WR @sc{snmp}
+support.
+
+The table of per-port values is scanned using the same steps as of
+@i{wrsPstats} (i.e. the tcpTable way, using the @i{net-snmp} iterator
+API).
+
+The global items are registered as a ``scalar group'' using the
+@i{net-snmp} API. I used @t{disman/expr/expScalars.c} as a reference
+and starting point. The function @t{ppsi_g_group()} refreshes the
+values (by calling @t{wr_mon -g}) whenever a request happens more than
+1 second later than the previous refresh. I'm aware this is a quick
+hack, but it works reliably without learning too many intricated API
+calls.
+
+@b{Note:} due to a bug in current @i{snmpwalk} implementations and
+64-bit values, 64-bit counters are returned with the two halves
+swapped.  Also, there is no ``signed'' 64-bit value defined anywhere
+in @sc{snmp}, thus the picosecond signed offset will be represented by
+tools as a huge number when it actually is a small negative value.
+
+@c =========================================================================
+@node wrsVersion
+@section wrsVersion
+
+The version is an array of scalars, so I used the ``scalar group''
+approach like the global PPSi values described in @ref{wrsPpsi}.
+
+The implementation is easier, because I rely on the fact that versions
+never change while the process runs. So I retrieve the version strings
+at initialization time, by calling ``@t{wrsw_version -t}'' (tagged)
+and parsing its @i{stdout}.  Parsing is easier than what we have in
+@i{wrsPpsi}, but my plan is having a unified parser overall, and eventually
+get rid of this simplified special case.
+
+@c =========================================================================
+@node wrsDate
+@section wrsDate
+
+This subtree includes two scalars: the TAI seconds as a ``counter64''
+value, and the human-readable equivalent string.
+
+The code uses a scalar group, as other subtrees described above already did.
+It maps @sc{fpga}  memory to get the WR date and return it as scalar @sc{snmp}
+values.
+
+@b{Note:} due to a bug in current @i{snmpwalk} implementations and
+64-bit values, the two halves of the 64-bit date are returned swapped.
+We feel returning a 32-bit value would be a worse choice, not being
+2038 safe.  When the bug is overall fixed, we'll be able to avoid
+word-swapping and be 2038-safe without changing the MIB file.
+
+
+@bye
+
+@c  LocalWords:  snmp wrSwitchMIB netsnmp ohwr snmpwalk AgentX buildroot
+@c  LocalWords:  pStats gateware wrsScalar wrsPstats wrsPpsi wrsVersion
+@c  LocalWords:  wrsDate subtree pathname
--- a/doc/wrs-todo.in
+++ b/doc/wrs-todo.in
+\input texinfo    @c -*-texinfo-*-
+%
+% wrs-todo.in - main file for the documentation
+%
+%%%%
+
+%------------------------------------------------------------------------------
+%
+%                         NOTE FOR THE UNAWARE USER
+%                         =========================
+%
+%    This file is a texinfo source. It isn't the binary file of some strange
+%    editor of mine. If you want ASCII, you should "make wrs-todo.txt".
+%
+%------------------------------------------------------------------------------
+
+%
+% This is not a conventional info file...
+% I use three extra features:
+%   - The '%' as a comment marker, if at beginning of line ("\%" -> "%")
+%   - leading blanks are allowed (this is something I cannot live without)
+%   - braces are automatically escaped when they appear in example blocks
+%
+
+@comment %**start of header
+@documentlanguage en
+@documentencoding ISO-8859-1
+@setfilename wrs-todo.info
+@settitle wrs-todo
+@iftex
+@afourpaper
+@end iftex
+@paragraphindent none
+@comment %**end of header
+
+@setchapternewpage off
+
+@set update-month September 2014
+@c the release name below is substituted at build time
+@set release __RELEASE_GIT_ID__
+
+@finalout
+
+@titlepage
+@title WRS-Todo
+@subtitle What is missing or lacking in wr-switch-sw (my own opinion)
+@subtitle @value{update-month} (@value{release})
+
+@author Alessandro Rubini
+@end titlepage
+@headings single
+
+@c ##########################################################################
+@iftex
+@contents
+@end iftex
+
+@c ##########################################################################
+@node Top
+@top Introduction
+
+This document lists the items I think are suboptimal in the current
+@t{wr-switch-sw} package.  This is Alessandro's own personal opition,
+but it is more or less agreed-with by other developers in the group.
+The items here more a wish list, or long-time planning, rather than
+a list of pending bugs. For that, please use the @i{issues} feature
+of @t{ohwr.org}.
+
+The document is now included in the master branch, because the list,
+seen as a review of the project, is considered useful. Besides, we are
+going to fix things over time, from September 2014 onwards, and this
+document will be shortened accordingly.
+
+@sp 1
+
+Some things are more important and some are less. Each developer has different
+judgment metrics, so I won't mark the items by importance; rather, they
+are split into topics and randomly listed within each of them.
+
+@c ##########################################################################
+@node Documentation
+@chapter Documentation
+
+@itemize @bullet
+
+@item @t{wrs-build} grew to not only cover building. It has some of
+the internals, which should be split off to somewhere else.
+
+@item We need a document about internals, but it would somehow conflict
+with the 7S user manual. We need to agree carefully about what lives
+where.  I started one, some time ago, which is not published anywhere;
+it still needs serious work but everything depends on how we split
+the material.
+
+@item The user manual is not part of the software package, and I'm not
+sure whether it should be or not -- it covers hardware as well.  But I
+really think a user manual is mainly about how to use the thing, so
+its place is the software repository. Other people's opinion differ.
+
+@item Documentation about the web interface (by 7S) must find a proper
+place in the overall design, too.
+
+@end itemize
+
+
+@c ##########################################################################
+@node Buildroot
+@chapter Buildroot
+
+@itemize @bullet
+
+@item Buildroot, the embedded distribution we are using, should be updated
+any now and then. This is not really needed or urgent at this point,
+but updating the distribution allows us to stay current with features
+people may expect on ``current'' networking gears.  This means porting
+and verifying our patches -- but they are few and it is not much work.
+
+@item The @i{net-snmp} configuration should be patched to avoid
+needless configuration items, like disk usage, processes and the list of
+installed packages.  For this we can lift a patch by Integrasys, that
+must be cherry-picked and verified on the current code base. As a matter
+of fact, the @i{snmpd} process is currently more than 1M in RAM footprint.
+
+@item @i{bash} should be the default interactive shell. The management
+of history in @i{barebox} is unacceptable (if you @i{ssh} in twice,
+the up-arrow will show interference between shells).
+
+@item We need a way to set a root password during configuration.
+
+@end itemize
+
+@c ##########################################################################
+@node Booting 
+@chapter Booting
+
+@itemize @bullet
+
+@item @i{dataflash} support in our version of @i{barebox} is not working.
+We need to either upgrade @i{barebox} or fix the bug.  I don't know
+whether the bug has been fixed in recent boot loader versions (last time
+I checked the @i{git} logs I found nothing meaningful, but it was around
+May 2014.
+
+@item We need to save some hardware information items to @i{dataflash}.
+Such information should never be modified after the switch is shipped.
+The plan is using SDB for it (i.e. SDBFS) but we must define proper
+procedures and prevent accidental erasure of such data. This is urgent,
+and we need it by September for release 4.1.
+
+@item Installation and run-time boots in two different ways. At installation
+we do everything using the @i{sam-ba} tools, while runtime relies on
+@i{at91boot}.  This means we are currently maintaining two different
+and very different code bases; SDRAM and clocks configuration is
+done twice, in different ways.  Both code bases are horrible-quality
+code, so we wasted a lot of time already, and more is expected -- all of
+this twice. We should get rid of the @i{sam-ba} tools
+and use @i{at91boot} for installation.  Not trivial, because @i{at91boot}
+loads the boot loader from storage, while at installation it must stop
+waiting for the boot loader to be sent to RAM from USB.
+
+@end itemize
+
+@c ##########################################################################
+@node Kernel and Drivers
+@chapter Kernel and Drivers
+
+@itemize @bullet
+
+@item 2.6.39 is pretty old nowadays. We should move to a recent kernel.
+Unfortunately this means writing the ``device tree'' crap for our board,
+as everything nowadays is device-tree-based.  Which in my opinion has
+been designed by our competitors, to kill us slowly and painfully.
+In addition, our patches must be ported forward.
+
+@item The @i{wr-nic} driver is very similar to what we run for the SPEC,
+which is not surprising because the gateware is basically the same.
+We should make the two drivers identical.  I did some initial work
+on this, available in the @i{wr-nic-130207} branch, but it must be
+completed (a similar branch exists in @i{spec-sw}, changing the
+@t{wr-nic.ko} to use the same code base as the switch.
+
+@item We should push upstream the generic changes we made to the kernel,
+like increasing @t{NR_IRQ} and exporting symbols for externally-loaded
+@t{irq_chip} drivers.
+
+@item We'd benefit from having ``pstats'' names in @t{/proc/sys} where
+the counters are. This would make the @sc{snmp} code generic, so
+no change there would be needed if and when the counters change.
+
+@end itemize
+
+@c ##########################################################################
+@node Gateware
+@chapter Gateware
+
+@itemize @bullet
+
+@item We should have SDB in the switch gateware, to simplify a number
+of things and avoid explicit addresses in so many places.
+
+@item The ``pstats'' counters should export a version (even before SDB
+is used). (@b{Update}: it is there, it just not used yet -- thanks Greg).
+Thus, the kernel driver will know what the counters are, and
+we could run different gateware images with the same filesystem, because
+the driver will support the various versions counter sets.
+
+@end itemize
+
+@c ##########################################################################
+@node The WR Processes and Libraries
+@chapter The WR Processes and Libraries
+
+@b{Note:} I didn't yet review seriously the WR processes, and I'm sure
+I'll have more to complain about when I'm done. Please note that my
+criticism is only about the code itself and I've nothing against the
+authors of the code: I know the history of the project and why things
+are how they are.
+
+@itemize @bullet
+
+@item The most serious problem here is that we have no real design around
+the code base.  The bunch of things is sticking together, more or less, but
+it's not clear to me (and to most other people) what is the role of each
+library and how the various items communicate.
+
+@item Related to the previous item, we lack proper locking for FPGA access,
+and we should really have a single entry point to hardware, with
+serialized access.  This may be addressed by writing a driver, or
+clarifying the library. I don't know the details at this point in time,
+but I know everybody's calling @t{mmap} by itself, which makes me feel
+very uneasy.
+
+@item As an example for the above issue, we may well have a @i{wr}
+kernel object where the WR date is returned as a @i{sysfs} item. This
+would simplify a number of tools that all @t{mmap} the hardware and
+arrange for the seconds and fractional part to be returned consistently.
+Avoiding repetition of code avoids repetition of bugs as well.
+Ideally, in the long run, we'll have a real @i{wr} driver, likely driven
+by SDB, so this @i{date} attribute will be part of the WR ABI.
+As a side effect, a file-based interface allows off-switch simulation
+and development (which includes simulating errors to test error paths in
+the actual code).
+
+@item Whether or not we port some stuff in kernel space, we need to have
+a single @i{libwr}, to merge and replace @i{libptpnetif} and
+@i{libswitchhw}.  The role of each of them is not really clear to
+developers at this point, and using a single library will simplify
+building the tools;
+
+@item While merging the libraries, we should seriously audit them.
+Some functions do nothing, some are never called, and some are only
+used withing the @i{hal} process, so there's no need to have them
+public in an exported library.  The library code rusted over the years,
+and we can't trust it any more without some clean up. I'm personally
+scared every time I look in that code base.
+
+@item The @i{mini-ipc} mechanism is showing its limits. We are communicating
+too much information, and the polling mechanism used in some processes
+introduces significant delays.  Most of the communication is just
+about reporting status, so this should be brought to a shared-memory
+area with a well-defined structure. Again, this allows simulation and
+development off-switch, with fault injection.
+
+@item Whenever @i{ipc} is used for actions, rather than just status passing,
+we should audit the processes to ensure the are based on @i{select}
+rather than a timely poll.  Actually, the HAL process is using quite
+a lot of cpu time, and it could do much better.
+
+@item The @i{rtud} should be seriously audited and simplified.  The
+interface with gateware is now simpler than it was, but software still
+has remnants of old data structure.  Also, the multi-threaded approach
+is overkill, and the program could benefit from a simplification.
+
+@item We need to sort out configuration files. I especially plan to
+get rid of @i{lua}; it is only used to set internal variables in the @i{lua}
+data structures, that are then queried. This means if you set an unused
+item (e.g. because of a typo or some old wishful-thinking documents) it won't
+have any effect and it will report no error.  The best action may be move
+all configurations to the PPSi config file -- the delays and all the rest
+are PTP-specific anyways, and PPSi supports per-architecture configuration
+items.
+
+@item  We need a way to change parameters at run time in the @i{softpll}
+implementation running on the switch.  We may base this on the @i{ipc}
+mechanism that is already in place, or design a well-know structure in
+the LM32 code, so the main CPU can just change parameters on the fly.
+I prefer the latter approach, but I need to talk with LM32 developers
+to agree about how to do that, and ensure everybody follows the policy
+we define.
+
+@end itemize
+
+@c ##########################################################################
+@node PTP
+@chapter PTP
+
+@itemize @bullet
+
+@item We should remove @i{ptp-noposix}.  Keeping it around brings a number
+of issues; mainly it prevents changes in WR libraries.  That's because
+any change and cleanup must now be verified against both PPSi and
+@i{ptp-noposix}.  But the latter is not being used routinely, so those
+changes require extra effort and remain not very well tested anyways.
+
+@item PPSi support for @t{arch-wrs} has some strange use for the timeouts,
+and some of the code is more complex than it should.  I need to audit
+it and make it shine.  This applies to both the WR servo and the
+@t{arch-wrs} code. For the latter, I'm already confident with it,
+after fixing the lost-frame problem. Still, some time must be devoted
+to this.
+
+@item We should drive the WR PLL when being slave to a non-WR master.
+The performance of wr-switch as a slave of another PTP host is now
+unacceptable, because we only jump time without adjusting frequency.
+
+@item PPSi should be able to rescan its configuration file. This is needed
+to allow management actors to change the configuration and see it
+effective soon after.
+
+@item PPSi should be able to scan network interfaces, and apply a default
+configuration to them.  Listing 18 times the same information is definitely
+neither beautiful nor useful.
+
+@item We should clarify how VLANs are managed in our PTP implementation,
+and by matching the standard if possible.  It currently works ``by
+chance'' -- but for example we didn't test on a PC, and there we'll
+clearly need to rescan network interfaces to send PTP frames on new
+VLANs as they are configured.
+
+@item Our BSD mate @t{phk} made a serious review of PPSi code, and he
+outlined a number of lacking areas. We should work on his input, to make
+the code base stronger and more maintainable.
+
+@item We should support UDP over IPV6, to be interoperable with our
+neighbors, and to advertise ourselves as a more ``complete'' PTP
+implementation.
+
+@item WR GrandMaster Switch should be holy provided it has an
+external reference. Currently if we have a GrandMaster Switch and we
+connect a Free-running Master to it's Slave port (wr0) then it becomes
+Slave to the Free-running Master and jumps it's WR time.  All the
+mechanism is in place, this should be trivial to fix.
+
+@item If Slave is in TRACK_PHASE state and a time jump occurs it doesn't
+reset its servo. It stays in TRACK_PHASE forever with a huge offset.
+That is something we've already observed in AD test system for
+Jean-Claude when GrandMaster was unlocking from the reference GPS. Not
+following the time jump will be also a problem if we power on in the
+same time all switches in the chain with GrandMaster on the top. It
+takes some time for GM to lock so the network would start as
+free-running to the second switch in the chain, but once GM is
+synchronized all slaves will stay in TRACK_PHASE with large offset to
+GrandMaster.
+
+
+@end itemize
+
+@c ##########################################################################
+@node Tools
+@chapter Tools
+
+@itemize @bullet
+
+@item Tool naming is completely inconsistent. We should move everything
+to use unambiguous naming: @t{wr_} for every wr-wide item, and @t{wrsw_}
+for every switch-specific thing.  (Actually, @t{wrs_} would be better,
+as it doesn't look like ``software'' and we use wrs/wrn internally already,
+but several tools use @t{wrsw_} as a prefix already. 
+We fixed the ``shower'' name -- (@t{shw_ver}: switch hardware version),
+but more are there. (@b{Update}: Greg says to use @t{wrs_} and do it soon).
+
+@item A number of tools @t{mmap()} FPGA memory, but there is no
+locking so it is (remotely) possible for tools to interfere and get
+wrong results.  We should define a good policy with proper locking, to
+avoid mishaps. This problem is listed in @ref{The WR Processes and
+Libraries} too.
+
+@item There are too many @i{load} source files in @t{userspace/tools};
+some are definitely not used any more. We need to clean up.
+
+@item @t{wr_management} is an incomplete mess; mostly a copy of
+@t{wr_mon}, slightly modify in its way toward being something else,
+but nobody knows what it is now.  We need to make it a serious thing
+if it is used (by the web interface, it seems, called by
+@t{functions.php}), or kill it outright.
+
+@end itemize
+
+@c ##########################################################################
+@node Management
+@chapter Management
+
+@itemize @bullet
+
+@item There is no grand design for management as yet. @sc{snmp} support,
+the tools and the web interface are all indepenent items. If something
+changes we must change it in several places.
+
+@item Current @sc{snmp} code  is not beautiful at all (as documented
+in the @i{snmp-pain} document) and should be made better. Some stuff
+needs to be factorized, some variables should be renamed and the use
+of ``context'' items in tables should be fixed.
+
+@item Some items that are already in the WR MIB are not filled, especially
+in the PTP/PPSi area.
+
+@item We lack support for the VLAN configuration items. This means
+implementing the standardized VLAN MIB.  Unfortunately we can't
+really have any sensible code using @i{mib2c} (see the @i{snmp-pain}
+document, so previous work by Integrasys can be only used as a reference.
+also considering the gateware implementation is different from what they
+used.  Integrasys properly committed the auto-generated code and their
+edits separately, so their work can be of help.
+
+@item There is no ``set'' support, we only return values to the
+manager by now.  While current configuration objects are read-only,
+I must learn how to deal with read-write objects before trying to
+extend our MIB.
+
+@item There is no support yet for sending traps on WR events.
+This is even more important thatn @i{set}, because we want to notify
+WR time jumps and lock/unlock events.
+
+@item We need to seriously audit the web interface, and verify
+everything is hooked properly to the system.  This relates to the
+changes in configuration files and everything else that is going to be
+modified in the package at large.  Ideally, we should agree about
+a single interface between the web and the rest of the system.
+If we are brave enough, the web interface might make @sc{snmp} calls;
+but this requires having strong @sc{snmp} support for everything, first.
+
+
+@end itemize
+
+@bye