\input texinfo    @c -*-texinfo-*-
%
% snmp-pain.in - main file for the documentation
%
%%%%
%------------------------------------------------------------------------------
%
%                         NOTE FOR THE UNAWARE USER
%                         =========================
%
%    This file is a texinfo source. It isn't the binary file of some strange
%    editor of mine. If you want ASCII, you should "make snmp-pain.txt".
%
%------------------------------------------------------------------------------
%
% This is not a conventional info file...
% I use three extra features:
%   - The '%' as a comment marker, if at beginning of line ("\%" -> "%")
%   - leading blanks are allowed (this is something I cannot live without)
%   - braces are automatically escaped when they appear in example blocks
%

@comment %**start of header
@documentlanguage en
@documentencoding ISO-8859-1
@setfilename snmp-pain.info
@settitle snmp-pain
@iftex
@afourpaper
@end iftex
@paragraphindent none
@comment %**end of header

@setchapternewpage off

@set update-month August 2014
@c the release name below is substituted at build time
@set release __RELEASE_GIT_ID__

@finalout

@titlepage
@title SNMP Pain
@subtitle Why (and how) I suffered while adding SNMP to White Rabbit
@subtitle @value{update-month} (@value{release})
@author Alessandro Rubini
@end titlepage
@headings single

@c ##########################################################################
@iftex
@contents
@end iftex

@c ##########################################################################
@node Top
@top Introduction

This summarizes my experience with @sc{snmp} and the WR switch, which
has been a real pain to me.  I know it's mainly me, as it looks like
the rest of the world is happily using both @sc{snmp} as a protocol
and @i{net-snmp} as a daemon.

I hope these notes may help other people within BE-CO-HT in finding
a better way through @sc{snmp} and its implementations.

This version of the document reflects the status of my knowledge and
implementation as of @t{wr-switch-sw-v4.0}.
As of Sep 2014 the document is included in the master branch of the
repository because my mates consider it useful as a reference of what
is there and why. I'll try to keep it up to date with the various MIB
objects I add or modify.

@c ##########################################################################
@node The SNMP Protocol
@chapter The SNMP Protocol

The protocol itself is simple, as the name implies, but not really
trivial to digest (unlike SMTP or HTTP, for example).  The datagrams
are binary data: it makes sense for efficiency and to avoid writing a
text parser in microcontrollers running the protocol, but it prevents
testing using @t{telnet} or @t{wget}.

The ``easiest'' testing tool is @i{snmpwalk}, which is bloated like
the package it is part of. This is unavoidable for ``complete''
implementations; my problem here is the lack of a simpler tool for an
easy start.  See @ref{Tools} for more details.

@c =========================================================================
@node Protocol Definition
@section Protocol Definition

@sc{snmp} is defined in the public RFC documents, but there are too
many of them to list here -- really dozens of such documents.  A good
selection is found in the @t{doc/rfc} subdirectory of the
@i{net-snmp} source archive.

Most of the relevant RFCs deal with defining MIB files (see @ref{The
MIB Idea}), and the actual protocol bits (the frame format) are
difficult to find.  Actually, unlike most or all other IETF protocols,
@sc{snmp} isn't really defined in any RFC, but relies on the @i{basic
encoding rules} (ITU-T X.690) for the @i{abstract syntax notation}
ASN.1 (i.e., again the MIB stuff).

So the huge lot of information out there tastes like a big loop of
nothing, where everyone takes a lot for granted and refers to everyone
else for details. For me it was like learning Chinese relying on a
number of Chinese dictionaries and nothing else.
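
As a taste of what the basic encoding rules produce on the wire, here
is a minimal Python sketch (Python only for brevity; it is not part of
the WR code) that encodes an object identifier as a BER
tag-length-value triplet.  The names @t{encode_arc} and
@t{ber_encode_oid} are made up for this illustration, and the sketch
assumes short-form lengths only (content shorter than 128 bytes).

```python
def encode_arc(value):
    """Encode one OID component in base 128, most significant group first."""
    groups = [value & 0x7F]
    value >>= 7
    while value:
        # All groups except the last one carry the continuation bit 0x80.
        groups.append((value & 0x7F) | 0x80)
        value >>= 7
    return bytes(reversed(groups))


def ber_encode_oid(dotted):
    """Return the BER encoding (tag, length, value) of a dotted OID string."""
    arcs = [int(part) for part in dotted.strip(".").split(".")]
    if len(arcs) < 2:
        raise ValueError("an OID needs at least two components")
    # X.690: the first two components share one subidentifier (40 * first + second).
    body = encode_arc(40 * arcs[0] + arcs[1])
    for arc in arcs[2:]:
        body += encode_arc(arc)
    if len(body) > 127:
        raise ValueError("long-form lengths are not handled in this sketch")
    return bytes([0x06, len(body)]) + body   # 0x06 is the OBJECT IDENTIFIER tag


if __name__ == "__main__":
    print(ber_encode_oid("1.3.6.1.4.1.96.100").hex(" "))
```

For the OID @t{1.3.6.1.4.1.96.100}, used later in this document, the
sketch prints @t{06 07 2b 06 01 04 01 60 64}: one byte of tag, one byte
of length and seven bytes of content.  This is the kind of byte
sequence that shows up in a frame dump.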
Eventually, after making sense of the protocol itself (mainly by
sniffing and reverse engineering, while referring to the Chinese
dictionaries every now and then), I finally changed my mind and agreed
that it makes a good choice, and might be considered for the WR node
as well (SPEC or equivalent). This however is not covered here.

@c =========================================================================
@node Basic Concepts
@section Basic Concepts

The network device runs an ``agent'' and the user runs a ``manager''.

@sc{snmp} is based on the concept of management objects, that can be
read or written. So the requests a manager sends to agents are mainly
``get'' and ``set''.

Management objects are laid out in a tree. Thus an ``object
identifier'' (OID) is like a pathname. The textual representation of
an OID uses a dot as a separator; the binary representation is an
array of integers.  For example WR owns the subtree
``1.3.6.1.4.1.96.100'', where the ``...96'' is CERN and ``....1'' is
for organizations.  Identifiers are allocated by relevant
organizations: CERN gave us our ``100'' and we are responsible for
further levels.

The manager can traverse the whole tree or a subtree using a special
``get next'' request.

Additionally, an agent can send traps, but I have no experience with
generating custom traps.

@c =========================================================================
@node The MIB Idea
@section The MIB Idea

The main aim of @sc{snmp} is being simple for the agent, while still
remaining flexible. Thus the choice of a binary protocol and a
get/set approach with an extensible tree of objects as described.

When using a binary protocol on the wire and a simple pathname-based
tree with integers as path components, the need for a higher-level
description of the item arises immediately.  The solution adopted by
@sc{snmp} designers is relying on a very-flexible and very-abstract
notation: ASN.1, as already noted.
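
To make the need concrete: on the wire an object is only a list of
integers, and some external table must say what it means.  The Python
sketch below (illustration only, not WR code) turns a numeric OID into
a more readable form by longest-prefix lookup.  The @t{NAMES} table is
hand-written here, while real tools obtain it from the description
files; ``enterprises'' is the standard name of @t{1.3.6.1.4.1},
@t{wrSwitchMIB} is the name used for the WR subtree later in this
document, and the ``cern'' label is simply chosen for this sketch.

```python
# Hand-written name table; a real manager would build this from MIB files.
NAMES = [
    ((1, 3, 6, 1, 4, 1), "enterprises"),
    ((1, 3, 6, 1, 4, 1, 96), "cern"),            # label chosen for this sketch
    ((1, 3, 6, 1, 4, 1, 96, 100), "wrSwitchMIB"),
]


def parse_oid(dotted):
    """Turn '1.3.6.1' (or '.1.3.6.1') into a tuple of integers."""
    return tuple(int(part) for part in dotted.strip(".").split("."))


def describe(dotted):
    """Replace the longest known prefix of an OID with its symbolic name."""
    oid = parse_oid(dotted)
    best_prefix, best_name = (), ""
    for prefix, name in NAMES:
        if oid[:len(prefix)] == prefix and len(prefix) > len(best_prefix):
            best_prefix, best_name = prefix, name
    if not best_prefix:
        return dotted.strip(".")          # nothing known: numbers only
    rest = [str(arc) for arc in oid[len(best_prefix):]]
    return ".".join([best_name] + rest)


if __name__ == "__main__":
    print(describe("1.3.6.1.4.1.96.100.1.0"))   # wrSwitchMIB.1.0
    print(describe("1.3.6.1.2.1.1.1.0"))        # unknown prefix, stays numeric
```

Without the table, the second OID stays a bare list of numbers; this
is exactly the situation a manager is in when it lacks the description
of an object.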
Each and every management object is thus defined by a MIB, short for
@i{management information base}. At least in the golden theory.

A MIB is a text file, with a very tedious structure, usually suffering
from an over-engineering syndrome. I must admit, though, that the
thing was over-engineered enough to survive the decades passing: it
allowed casting of a number of new and unexpected needs into this old
and unpleasant abstract notation; modern and complex management tools
can make sense of unforeseen management objects, thanks to their
description found in the MIB file.

The @sc{snmp} mantra is ``everything is a MIB''. Or ``just write the
MIB and everything will automatically follow''.  The theory says that
the MIB text file can be parsed by the manager's application to show a
user-friendly view of any object; it says that it can be parsed by the
command-line tools so as to use user-friendly names in the requests;
it says that it can be parsed by the agent to create reply frames;
that it can be parsed by a code-generator to fill the low-level bits.

So @sc{snmp} is mainly boring stuff for bureaucrats. Everything is
hidden behind ASN.1: it either magically works or it magically fails;
there is no clear documentation of the various levels of software or
protocol, because it is claimed to ``just work'', which is not always
true, despite theory.  Also, in practice sensible implementations
avoid the suggested MIB-driven path.

In the end, the @i{wr-switch-sw} package includes
@t{WR-SWITCH-MIB.txt}, but clearly it's not true that this file is the
whole of it.

@c ##########################################################################
@node Choosing an Agent
@chapter Choosing an Agent

When adding @sc{snmp} support to a network device, you should always
at least include the standard management objects, the ones that every
manager expects to find on the system.  These include the list of
network interfaces, their features, their traffic statistics, and so
on.
Thus, writing your own is not an option: it doesn't make sense to have
an incomplete wr-switch @sc{snmp} support, and re-implementing the
huge number of standard management objects in a custom @sc{snmp}
implementation is too big an effort to even consider.

I therefore looked at available free-software implementations.

@c =========================================================================
@node Requirements for WR
@section Requirements for WR

In order to choose a proper @sc{snmp} engine, we first need to know
what we expect from it. This is a quick attempt at summarizing our
requirements:

@itemize @bullet

@item We need support for all the basic actions, including traps, even
if version 4 is not yet using them.

@item We need to customize standard tables. This mainly involves VLAN
support, because the WR switch needs a different backend than the
normal kernel-driven Linux VLANs. Again, not yet implemented in
version 4.

@item We need to add a custom subtree, that includes both simple items
and tables. Simple items are things such as the version string and the
current time (the former is static and the latter is dynamic); tables
are things such as the per-port statistic counters or the PTP slave
list, if and when we choose to implement it.

@end itemize

@c =========================================================================
@node net-snmp
@section net-snmp

Everybody, in the Unix world, is using @i{net-snmp}. This is a big and
bloated implementation, using GNU autotools for configuration, shared
libraries and everything else.  It includes ``simple'' tools for
querying agents using the command line, and the usual bells and
whistles.

It is included in all Linux distributions, both hosted and embedded
ones, and it really looks like the only choice available.  So we
installed this one in the WR switch, by selecting the proper packages
in our @i{buildroot} configuration.
Documentation for @i{net-snmp} is mainly online: the promising @t{doc}
subdirectory in the source tree only hosts the RFC documents, but
fortunately @t{man/} includes documentation for the API and basic
daemon use.  Tutorials and all the rest are available on the project
site.

This means, among other things, that you can't access most
documentation while off-line (like I am while writing this document)
and you can't easily get documentation for the specific version you
are using, excluding the manual pages.

Still, the documentation is quite complete and well-done, though it
shares the @sc{snmp}-wide misfeature of taking a lot for granted: it
is useful for an expert user but not a great way to become one such
beast.

@c =========================================================================
@node Other agents
@section Other agents

Simply put, there are no choices other than @i{net-snmp}.  A number of
proprietary implementations exist, but the free world seems bound to
this implementation.

(This section needs an update with the aid of a net search; I did it
back then but since I found nothing I didn't save the results; and I'm
now offline while writing this).

@c =========================================================================
@node Sub-agents
@section Sub-agents

Adding custom tables to an @sc{snmp} agent is a common requirement, so
RFC-2741 defines the @i{AgentX} protocol.  The protocol is run on a
local socket interface, and allows registering handlers for specific
subtrees.

@i{net-snmp} supports the AgentX protocol, so this looked like an
interesting option. This is the result of my evaluation:

@itemize @bullet

@item It is definitely not much used. I could not find any mainstream
user of the feature; @t{lldpd} is the only Debian package that names
AgentX but it looks like the feature is not used in the Debian build.

@item The code base supports both C and Perl bindings, the preferred
one being Perl.
Besides any performance concern, we can't easily run Perl on the WR
switch because @i{buildroot} doesn't support it.

@item It looks like support in @i{net-snmp} is incomplete, but I can't
currently find the reference about this while being offline.

@end itemize

Still, AgentX looked promising, so I evaluated the thing. The most
interesting path was the various Python implementations: we already
have Python in the WR Switch, because @i{buildroot} supports it and we
needed it for the production test suite.

There are three Python implementations, as far as I know:

@table @code

@item https://github.com/rayed/pyagentx

        A 2013-2014 implementation, BSD license. This only implements
        ``get'' and ``get-next'' so it doesn't support our post-v4.0
        needs.

@item https://pypi.python.org/pypi/agentx/0.7

        This is a 2010 GPL implementation. It looks a little too small
        and simplified. I feared that supporting dynamic tables with
        it would not be easy.

@item https://pypi.python.org/pypi/netsnmpagent/0.5.0

        A 2012-2013 implementation, GPL3. This seems serious, and it
        paints itself as the result of a lot of frustration using the
        available tools, something I sympathize with. It credits
        @i{agentx} (preceding item in this list).  Unfortunately traps
        are still missing.

@end table

The last option seems a viable one, but I eventually ruled it out
because I'm not confident enough with Python to easily master it (for
example, I'm sure I wouldn't be able to add traps in a reasonable time
frame), and our WR management objects require a C language backend, at
times -- which is another area where my Python lacks.  Likely I was
also scared by the length of the @t{SIMPLE-MIB.txt} it includes, even
if in the end I wrote my MIB anyway.

After evaluating sub-agents, I chose to stick to the shared-library
mechanism offered by @i{net-snmp}, as described in later chapters.
@c =========================================================================
@node Tools
@section Tools

As said, I didn't find any simple tool to make @sc{snmp} queries.  So
I stuck to @t{snmpwalk}. It's worth noting that @t{pysnmpwalk} (part
of the @t{python-pysnmp4-apps} Debian package) works as well, with a
compatible command line and output format.

The reason why @t{snmpwalk} is not simple enough for me is that it
includes a MIB parser, in order to turn numeric pathnames and data
into user-friendly names and values.

I won't repeat here how to use the tool (see other documentation or
the trivial examples in the @t{wr-switch-sw} manual), but I'll note
that @t{-d} provides a dump of @sc{snmp} frames being sent and
received, which is a good helper to understand what is happening under
the hood, and hopefully avoid a detailed read of the ITU specification
for basic encoding.

@c ##########################################################################
@node Using the Code Generator
@chapter Using the Code Generator

What @i{net-snmp} suggests is the use of its own code generator, which
outputs C sources built from a specified MIB file and command-line
options.  The generator is called @i{mib2c}.

There are a number of drawbacks in using the generator, so I finally
refrained from it and chose to write code using the internal API
directly:

@itemize @bullet

@item The generated code includes parts that must be filled before it
can build and parts just marked as ``@t{XXX}'' but that otherwise
build.  Thus, you really need to review the whole ``generated'' files
to bring them to a working state.

@item Some of the calls to be filled are normal API calls, so you need
to be confident with @i{net-snmp} internal data structures; this is
the same effort you need to spend before you are able to write code by
yourself.

@item The code is laid out as a number of big @t{switch} stanzas,
without relying on data structures.
This makes editing the generated files a heavy, repetitive and
failure-prone procedure.

@item The generated code includes a number of repetitions, so the same
edits must be redone at least twice (like filling the same @t{switch}
construct in two different places).

@item There are a number of options for the code generator: @i{mib2c}
uses a number of different templates, and I'm pretty sure not all of
them are used in practice; so I fear making the wrong choice and
finally hitting bugs that are not my own.

@item Experimenting with several options to compare them is
unfeasible, because every edit to fill specific data structures must
be redone each time. Similarly, you risk making the wrong choice and
having to redo the edits under a different template at a later time.

@item @i{mib2c} leaks object names found in the MIB file into the
source code it outputs, and it does it everywhere in the output files.
Thus, during development, you'll need to redo your edits several times
to keep the C files in sync with a moving MIB definition. And the
usual trick of re-applying the same patch doesn't always work, because
the MIB names appear all around the generated file.

@item Documentation claims that the standard @i{net-snmp} modules are
based on @i{mib2c}, but while looking at the actual code I didn't
really find such traces.  I admit that grepping for
``@t{auto-generated by mib2c}'' on the source tree finds a number of
matches, but most of them look heavily edited, and none of them
matches my expectations of a ``simple'' file.

@end itemize

I used the generator for the initial trial, @i{wrsScalar}, as
described in detail in @ref{wrsScalar}, but then I gave up.

My failed experiments are still part of the @t{netsnmp-pain} branch,
pushed to @t{ohwr.org}.  For example, commit @t{ed1d654} uses the
``mib for dummies'' option of @i{mib2c} for the @t{pStats} table
included in the local WR MIB file.  This created 4000 lines of source
code, in 12 files, and 1000 lines of ``README'' files.
It would take me days just to make sense of them.  The final
statistics source file, not using the generator, is 230 lines of code.

@c ##########################################################################
@node Writing Real Code
@chapter Writing Real Code

In the end, I managed to make the thing work, using different
approaches for the different objects in the WR subtree.

The objects currently defined are the following ones, all under
@t{.1.3.6.1.4.1.96.100}, as defined in @t{WR-SWITCH-MIB.txt}:

@example
   wrsScalar     OBJECT IDENTIFIER ::= { wrSwitchMIB 1 }
   wrsPstats     OBJECT IDENTIFIER ::= { wrSwitchMIB 2 }
   wrsPpsi       OBJECT IDENTIFIER ::= { wrSwitchMIB 3 }
   wrsVersion    OBJECT IDENTIFIER ::= { wrSwitchMIB 4 }
   wrsDate       OBJECT IDENTIFIER ::= { wrSwitchMIB 5 }
@end example

All of the objects are read-only as of @t{wr-switch-sw-v4.0}.  The
CamelCase naming convention matches what is found all over
@t{net-snmp}.

The various source files are built as a shared library; the
configuration file of @t{snmpd} instructs it to load the library at
run time.  Documentation claims that the same source files can be
compiled to run as an AgentX process or be directly linked to the
@t{snmpd} binary.

I chose the shared-library build because this avoids patching
@t{net-snmp} within @i{buildroot}: maintaining direct source files
rather than patches is way easier. I didn't feel safe in using the
little-practiced AgentX protocol (especially for the upcoming ``set''
queries).  Last but not least, the shared library build is what
Integrasys successfully used for version 2 of the WR switch.

@c =========================================================================
@node wrsScalar
@section wrsScalar

This is my first trial in making sense of the thing, and has no
relevance for White Rabbit.  The object is an integer that is
incremented each time it is read.
The source file comes from @i{mib2c} using this ``intuitive'' sequence
of commands (assuming you build @t{wr-switch-sw} and
@t{WRS_OUTPUT_DIR} is available).

@example
   export BUILD_DIR="$WRS_OUTPUT_DIR/build/buildroot-2011.11/output/build"
   export MIBDIRS=$BUILD_DIR/netsnmp-5.6.1.1/mibs
   export MIBS=./WR-SWITCH-MIB.txt
   $BUILD_DIR/netsnmp-5.6.1.1/local/mib2c \
        -I $BUILD_DIR/netsnmp-5.6.1.1/local \
        -c mib2c.scalar.conf \
        wrsScalar
@end example

In practice, you need to set @t{MIBDIRS} and @t{MIBS} in the
environment, so both your own MIB and the ``standard'' ones are
available to the tool.  Standard MIBs are needed to get type
definitions, using an ``import'' statement.  Then you point @t{-I} to
the directory of @i{mib2c} inside @i{net-snmp} itself and execute the
program from the same place.

The integer itself, like all scalar values, is returned in item ``0''
of the subtree (thus, @t{1.3.6.1.4.1.96.100.1.0}).

@c =========================================================================
@node wrsPstats
@section wrsPstats

@c -------------------------------------------------------------------------
@node The pStats Table
@subsection The pStats Table

This subtree, @t{wrSwitchMIB.2}, is a table.  All tables in @sc{snmp}
are described as being made of ``lines'' and ``columns''.  The columns
are hardwired (in the MIB and in the code), and the lines can be
dynamic (this matches how people usually write tables).

The OID scanning, however, is reversed from our habit and the tables
are returned column by column.  This happens because columns are
defined in the MIB (each column is a directory, or a subtree, in the
pathname of OIDs) while lines, being dynamic, can only appear as
trailing items in the scanning.

This implies, among other things, that any piece of code returning a
dynamic table should build an internal data structure representing the
whole table, in order to be able to consistently report the same lines
for each column.
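
The column-by-column order follows directly from comparing OIDs
component by component.  The Python sketch below (illustration only,
not the agent code) takes a snapshot of a small table and walks it
with repeated ``get next'' steps.  The function names, the column
numbers and the line indexes are made up, and the extra ``entry''
level that real tables place between the table and its columns is
omitted to keep the sketch short.

```python
import bisect

TABLE = (1, 3, 6, 1, 4, 1, 96, 100, 2)   # base OID, for illustration only


def snapshot(columns, rows):
    """Build the sorted list of instance OIDs: base + (column, row index)."""
    oids = [TABLE + (col, row) for row in rows for col in columns]
    # Tuples of integers compare component by component, like OIDs do.
    return sorted(oids)


def get_next(oids, current):
    """Return the first OID strictly after 'current', or None at the end."""
    pos = bisect.bisect_right(oids, current)
    return oids[pos] if pos < len(oids) else None


def walk(oids, start):
    """Collect the (column, row) pairs visited by repeated get-next calls."""
    visited = []
    oid = get_next(oids, start)
    while oid is not None and oid[:len(start)] == start:
        visited.append(oid[-2:])
        oid = get_next(oids, oid)
    return visited


if __name__ == "__main__":
    table = snapshot(columns=[1, 2], rows=[1, 2, 3])
    print(walk(table, TABLE))
```

With two columns and three lines the walk visits all of column 1
first, then all of column 2.  Since the walk spans several requests,
the snapshot is what guarantees that every column reports the same set
of lines.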
Usually in a network-related table, the predefined columns represent
the counters (tx/rx bytes, errors, and so on) and each network
interface is a line.  This approach allows the same MIB to work for
every possible configuration.

For WR port statistics we chose a different approach: the counters
themselves are somehow dynamic (they may change across versions, while
the gateware develops) while the interfaces are restricted to be in
the set @t{wr0}--@t{wr17}.  So our pStats table is reversed from the
common use of @sc{snmp} tables.

As a side effect this allows the WR switch to return the name of each
counter, in column 0.  This allows greater flexibility when we'll have
a new set of counters: the user-space tools will know the role of the
new set of counters without any need to change them or the MIB file
(we'll still need to change Switch @sc{snmp} code to match the new
gateware).

@c -------------------------------------------------------------------------
@node pStats Code
@subsection pStats Code

After several distressing attempts with @i{mib2c}, still present in
the history of the @i{netsnmp-pain} branch in the @t{wr-switch-sw}
repository, I chose to base my code on the ``tcpTable'' implementation
that is present in the core @i{net-snmp} implementation. The TCP table
does not use any @i{mib2c} template but rather uses the API directly.

To be exceedingly safe in my steps, I started by replicating the TCP
table under the WRS subtree, and then I changed it step-by-step to
support the counters.  Each and every small commit is still in the
@i{netsnmp-pain} branch, but the final source file was separately
committed to @i{snmp-for-wrs}, which was later merged to @i{master}.

As described in @ref{The pStats Table}, @sc{snmp} tables are first
filled in local memory and then returned item-by-item to the network
manager.  Table filling is performed by @t{wrsPstats_load()},
registered by @t{init_wrsPstats()} using
@t{netsnmp_inject_handler()}.
Unlike what happens in the TCP table, that allocates memory, I use
static storage to load the counter values.  Then
@t{wrsPstats_first_entry()} and @t{wrsPstats_next_entry()} are used to
scan the table, building the indexes, but the actual value is returned
by @t{wrsPstats_handler()}.  I suspect the thing is not very efficient
overall.

One thing I found especially unpleasant in this implementation is the
use of ``context pointers'' in looping through the table.  The API
supports the idea of a @t{loop_context} and @t{data_context}, but
elsewhere the loop context is called @t{iterator_context}. This
mismatch in naming in the tcpTable is now inherited in wrsPstats, but
sooner or later I'll fix it.  As a side effect, I now use the two
contexts concurrently in an ambiguous way.

No, I'm not proud of this code. I don't feel confident with all the
data structures yet, and there still is some magic in all of this.
This is confirmed by a buglet in the current code, that makes
@i{snmpwalk} always return one item after the end of the table -- most
likely I need to fix @t{next_entry()} to return @t{NULL} earlier.

@c =========================================================================
@node wrsPpsi
@section wrsPpsi

This subtree was written in a hurry, and it likely has some buglets;
for example ``servo updates'' is the number of iterations, which will
never exceed 32 bits, but it is reported using a 64-bit counter.
Moreover, not all management objects are actually filled, but I chose
to nail down the MIB even if the code is not completely there.

This @t{wrSwitchMIB.3} is split in two subtrees: @t{wrSwitchMIB.3.1}
is an array of scalar values (all of them instantiated as @t{.0});
@t{wrSwitchMIB.3.2} is a table.  Most functions in the code use
@t{ppsi_g} for globals and @t{ppsi_p} for the per-port table.

To keep the code compact and extensible, I chose to @t{popen(3)} a
connection to existing tools.
The tools report information to @i{stdout} in a line-oriented tagged
format: ``@t{<key>: <value>\n}''.  Thus, @t{wr_mon} now supports
@t{-g} (``@t{SHOW_SNMP_GLOBALS}'') and @t{-p}
(``@t{SHOW_SNMP_PORTS}'').  By pre-setting environment variables it's
also possible to override the command names, for testing; see source
code for details.

Parsing is implemented using a @t{pickinfo} structure, where each key
is associated to a data type, a pointer and a size. This is used to
actually @t{sscanf} the value into a global structure.  The same
``pickinfo'' table is later used to feed the binary data to
@sc{snmp}.  This parsing trick is concise and completely
debugged/tested, so I plan to use it more widely when cleaning up and
extending this WR @sc{snmp} support.

The table of per-port values is scanned using the same steps as
@i{wrsPstats} (i.e. the tcpTable way, using the @i{net-snmp} iterator
API).

The global items are registered as a ``scalar group'' using the
@i{net-snmp} API. I used @t{disman/expr/expScalars.c} as a reference
and starting point.  The function @t{ppsi_g_group()} refreshes the
values (by calling @t{wr_mon -g}) whenever a request happens more than
1 second later than the previous refresh.  I'm aware this is a quick
hack, but it works reliably without learning too many intricate API
calls.

@b{Note:} due to a bug in current @i{snmpwalk} implementations and
64-bit values, 64-bit counters are returned with the two halves
swapped.  Also, there is no ``signed'' 64-bit value defined anywhere
in @sc{snmp}, thus the picosecond signed offset will be represented by
tools as a huge number when it actually is a small negative value.

@c =========================================================================
@node wrsVersion
@section wrsVersion

The version is an array of scalars, so I used the ``scalar group''
approach like the global PPSi values described in @ref{wrsPpsi}.
The implementation is easier, because I rely on the fact that versions
never change while the process runs.  So I retrieve the version
strings at initialization time, by calling ``@t{wrsw_version -t}''
(tagged) and parsing its @i{stdout}.

Parsing is easier than what we have in @i{wrsPpsi}, but my plan is to
have a unified parser overall, and eventually get rid of this
simplified special case.

@c =========================================================================
@node wrsDate
@section wrsDate

This subtree includes two scalars: the TAI seconds as a ``counter64''
value, and the human-readable equivalent string.

The code uses a scalar group, as other subtrees described above
already did. It maps @sc{fpga} memory to get the WR date and return it
as scalar @sc{snmp} values.

@b{Note:} due to a bug in current @i{snmpwalk} implementations and
64-bit values, the two halves of the 64-bit date are returned swapped.
We feel returning a 32-bit value would be a worse choice, not being
2038 safe.  When the bug is fixed everywhere, we'll be able to avoid
word-swapping and be 2038-safe without changing the MIB file.

@bye

@c  LocalWords:  snmp wrSwitchMIB netsnmp ohwr snmpwalk AgentX buildroot
@c  LocalWords:  pStats gateware wrsScalar wrsPstats wrsPpsi wrsVersion
@c  LocalWords:  wrsDate subtree pathname