Newer
Older
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
\input texinfo @c -*-texinfo-*-
%
% wrs-todo.in - main file for the documentation
%
%%%%
%------------------------------------------------------------------------------
%
% NOTE FOR THE UNAWARE USER
% =========================
%
% This file is a texinfo source. It isn't the binary file of some strange
% editor of mine. If you want ASCII, you should "make wrs-todo.txt".
%
%------------------------------------------------------------------------------
%
% This is not a conventional info file...
% I use three extra features:
% - The '%' as a comment marker, if at beginning of line ("\%" -> "%")
% - leading blanks are allowed (this is something I cannot live without)
% - braces are automatically escaped when they appear in example blocks
%
@comment %**start of header
@documentlanguage en
@documentencoding ISO-8859-1
@setfilename wrs-todo.info
@settitle wrs-todo
@iftex
@afourpaper
@end iftex
@paragraphindent none
@comment %**end of header
@setchapternewpage off
@set update-month January 2015
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
@c the release name below is substituted at build time
@set release __RELEASE_GIT_ID__
@finalout
@titlepage
@title WRS-Todo
@subtitle What is missing or lacking in wr-switch-sw (my own opinion)
@subtitle @value{update-month} (@value{release})
@author Alessandro Rubini
@end titlepage
@headings single
@c ##########################################################################
@iftex
@contents
@end iftex
@c ##########################################################################
@node Top
@top Introduction
This document lists the items I think are suboptimal in the current
@t{wr-switch-sw} package. This is Alessandro's own personal opition,
but it is more or less agreed-with by other developers in the group.
The items here more a wish list, or long-time planning, rather than
a list of pending bugs. For that, please use the @i{issues} feature
of @t{ohwr.org}.
The document is now included in the master branch, because the list,
seen as a review of the project, is considered useful. Besides, we are
going to fix things over time, from September 2014 onwards, and this
document will be shortened accordingly.
@sp 1
Some things are more important and some are less. Each developer has different
judgment metrics, so I won't mark the items by importance; rather, they
are split into topics and randomly listed within each of them.
@c ##########################################################################
@node Documentation
@chapter Documentation
Documentation is mostly in place. Some information is still missing
and some things are to be moved around, but the layout is there, as
of the ``doc-shakeup'' merge commit (that doesn't include this wrs-todo
update.
@c ##########################################################################
@node Buildroot
@chapter Buildroot
@itemize @bullet
@item Buildroot, the embedded distribution we are using, should be updated
any now and then. This is not really needed or urgent at this point,
but updating the distribution allows us to stay current with features
people may expect on ``current'' networking gears. This means porting
and verifying our patches -- but they are few and it is not much work.
@item The @i{net-snmp} configuration should be patched to avoid
needless configuration items, like disk usage, processes and the list of
installed packages. For this we can lift a patch by Integrasys, that
must be cherry-picked and verified on the current code base. As a matter
of fact, the @i{snmpd} process is currently more than 1M in RAM footprint.
@item @i{bash} should be the default interactive shell. The management
of history in @i{barebox} is unacceptable (if you @i{ssh} in twice,
the up-arrow will show interference between shells).
@item We may consider moving this package to a real buildroot thing,
and submit upstream.
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
@item We need a way to set a root password during configuration.
@end itemize
@c ##########################################################################
@node Booting
@chapter Booting
@itemize @bullet
@item Installation and run-time boots in two different ways. At installation
we do everything using the @i{sam-ba} tools, while runtime relies on
@i{at91boot}. This means we are currently maintaining two different
and very different code bases; SDRAM and clocks configuration is
done twice, in different ways. Both code bases are horrible-quality
code, so we wasted a lot of time already, and more is expected -- all of
this twice. We should get rid of the @i{sam-ba} tools
and use @i{at91boot} for installation. Not trivial, because @i{at91boot}
loads the boot loader from storage, while at installation it must stop
waiting for the boot loader to be sent to RAM from USB.
@end itemize
@c ##########################################################################
@node Kernel and Drivers
@chapter Kernel and Drivers
@itemize @bullet
@item The @i{wr-nic} driver is very similar to what we run for the SPEC,
we went forward in merging it with the one in @i{spec-sw} but a few
lines are still different. Unifying will allow using a driver
for the gateware block, likely based on SDB.
@item We should push upstream the generic changes we made to the kernel,
like increasing @t{NR_IRQ} (likey not needed, as we have sparse interrupts
in more recent kernel trees) and exporting symbols for externally-loaded
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
@t{irq_chip} drivers.
@end itemize
@c ##########################################################################
@node Gateware
@chapter Gateware
@itemize @bullet
@item We should have SDB in the switch gateware, to simplify a number
of things and avoid explicit addresses in so many places.
@end itemize
@c ##########################################################################
@node The WR Processes and Libraries
@chapter The WR Processes and Libraries
@b{Note:} I didn't yet review seriously the WR processes, and I'm sure
I'll have more to complain about when I'm done. Please note that my
criticism is only about the code itself and I've nothing against the
authors of the code: I know the history of the project and why things
are how they are.
@itemize @bullet
@item The most serious problem here is that we have no real design around
the code base. The bunch of things is sticking together, more or less, but
it's not clear to me (and to most other people) what is the role of each
library and how the various items communicate.
@item Related to the previous item, we lack proper locking for FPGA access,
and we should really have a single entry point to hardware, with
serialized access. This may be addressed by writing a driver, or
clarifying the library. I don't know the details at this point in time,
but I know everybody's calling @t{mmap} by itself, which makes me feel
very uneasy.
@item As an example for the above issue, we may well have a @i{wr}
kernel object where the WR date is returned as a @i{sysfs} item. This
would simplify a number of tools that all @t{mmap} the hardware and
arrange for the seconds and fractional part to be returned consistently.
Avoiding repetition of code avoids repetition of bugs as well.
Ideally, in the long run, we'll have a real @i{wr} driver, likely driven
by SDB, so this @i{date} attribute will be part of the WR ABI.
As a side effect, a file-based interface allows off-switch simulation
and development (which includes simulating errors to test error paths in
the actual code).
@item We started auditing and cleaning up the libraries
(now one library only: @i{libwr}) but there's much work to be done here.
The current situation works, but we want to make the library
understandable and maintainable: the library code rusted over the years,
and we can't trust it any more without some clean up.
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
@item The @i{rtud} should be seriously audited and simplified. The
interface with gateware is now simpler than it was, but software still
has remnants of old data structure. Also, the multi-threaded approach
is overkill, and the program could benefit from a simplification.
@end itemize
@c ##########################################################################
@node PTP
@chapter PTP
@itemize @bullet
@item PPSi support for @t{arch-wrs} has some strange use for the timeouts,
and some of the code is more complex than it should. I need to audit
it and make it shine. This applies to both the WR servo and the
@t{arch-wrs} code. For the latter, I'm already confident with it,
after fixing the lost-frame problem. Still, some time must be devoted
to this.
@item PPSi should be able to rescan its configuration file. This is needed
to allow management actors to change the configuration and see it
effective soon after.
@item PPSi should be able to scan network interfaces, and apply a default
configuration to them. Listing 18 times the same information is definitely
neither beautiful nor useful.
@item We should clarify how VLANs are managed in our PTP implementation,
and by matching the standard if possible. It currently works ``by
chance'' -- but for example we didn't test on a PC, and there we'll
clearly need to rescan network interfaces to send PTP frames on new
VLANs as they are configured.
@item Our BSD mate @t{phk} made a serious review of PPSi code, and he
outlined a number of lacking areas. We should work on his input, to make
the code base stronger and more maintainable.
@item We should support UDP over IPV6, to be interoperable with our
neighbors, and to advertise ourselves as a more ``complete'' PTP
implementation.
@item WR GrandMaster Switch should be holy provided it has an
external reference. Currently if we have a GrandMaster Switch and we
connect a Free-running Master to it's Slave port (wri1) then it becomes
Slave to the Free-running Master and jumps it's WR time. All the
mechanism is in place, this should be trivial to fix.
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
@end itemize
@c ##########################################################################
@node Tools
@chapter Tools
@itemize @bullet
@item A number of tools @t{mmap()} FPGA memory, but there is no
locking so it is (remotely) possible for tools to interfere and get
wrong results. We should define a good policy with proper locking, to
avoid mishaps. This problem is listed in @ref{The WR Processes and
Libraries} too.
@end itemize
@c ##########################################################################
@node Management
@chapter Management
@itemize @bullet
@item There is no grand design for management as yet. @sc{snmp} support,
the tools and the web interface are all indepenent items. If something
changes we must change it in several places.
@item Current @sc{snmp} code is not beautiful at all (as documented
in the @i{snmp-pain} document) and should be made better. Some stuff
needs to be factorized, some variables should be renamed and the use
of ``context'' items in tables should be fixed.
@item We lack support for the VLAN configuration items. This means
implementing the standardized VLAN MIB. Unfortunately we can't
really have any sensible code using @i{mib2c} (see the @i{snmp-pain}
document, so previous work by Integrasys can be only used as a reference.
also considering the gateware implementation is different from what they
used. Integrasys properly committed the auto-generated code and their
edits separately, so their work can be of help.
@item There is no ``set'' support, we only return values to the
manager by now. While current configuration objects are read-only,
I must learn how to deal with read-write objects before trying to
extend our MIB.
@item There is no support yet for sending traps on WR events.
This is even more important that @i{set}, because we want to notify
WR time jumps and lock/unlock events.
@item We need to seriously audit the web interface, and verify
everything is hooked properly to the system. This relates to the
changes in configuration files and everything else that is going to be
modified in the package at large. Ideally, we should agree about
a single interface between the web and the rest of the system.
If we are brave enough, the web interface might make @sc{snmp} calls;
but this requires having strong @sc{snmp} support for everything, first.
@item We need a solution to know total of hours/days that switch is up.
It might be worth to store average temperature or histogram of temperatures
over switch lifetime. During implementation wear-out of flash shall be taken
into consideration. This data shall be stored locally and be exported via SNMP.
Such information may be useful for switch manufacturers to estimate MTTF.