Copyright and Licensing Information
Snap is (c) Jonathan T. Moore, 1999-2002 and licensed under the
GNU General Public License (GPL).
All other parts of Splash are (c) Willem de Bruijn, 2002-2003 and
licensed under the BSD Open Source License.
All source code is made publicly available.
Acknowledgment
Splash and the Splash website are hosted
by SourceForge.net
SNMP Plus a Lightweight API for SNAP Handling
Applying High Speed Active Networking to Network Management
A Master's Thesis
Willem de Bruijn
Universiteit Leiden
2003
Abstract
We describe a practical solution to network management issues
based on combining traditional infrastructure with
active networking technology.
A software platform, Splash, is presented that unites
the widely used net-snmp package with SNAP, an active network
optimized for safe, predictable execution. Special care is taken
to ensure high processing speed. For this purpose
the active network interface is
compared to standard SNMP on a quantitative basis. Performance is
shown to be comparable for low-level operations, while under more
elaborate scenarios Splash is shown to outperform SNMP in terms of
response time, network bandwidth utilization and general flexibility.
By combining both interfaces in a single process,
Splash can be used as a drop-in replacement for net-snmp,
allowing network operators to use the
monitoring paradigm (polling with SNMP, or an agent-based approach)
most appropriate for a given situation.
Preface
This thesis marks the end of my six-year enrollment at the
Universiteit Leiden. During these years I've gotten to know and
appreciate student life in every aspect. While I was far from academically
inclined when I first arrived, I hope to have made up for that later on.
I'll let this work speak for itself on that subject.
During the course of my stay in Leiden I've increasingly become interested in
the field of Computer Science. Only after completing the
preliminary courses, most notably those dealing with mathematics, did the
subject start to come alive. Gratitude goes to Walter Kosters for pushing me when
necessary and to Igor Grubisic for showing me that mathematics can be
quite entertaining after all.
In the latter years I had the chance to work on some interesting novel projects.
This was made possible mainly by the open attitude to student research among the
academic staff at the Leiden Institute of Advanced Computer Science. I
would hereby like to thank all of the people I've gotten to know at LIACS
for their help. A personal thanks goes to Michael Lew for helping me get
my first accepted conference paper.
For the last year I've been working solely on the Splash project detailed here. Luckily
I could fall back on excellent assistance from Herbert Bos here at LIACS and
Jonathan Moore at the University of Pennsylvania. At first I hardly knew what the
abbreviation SNMP stood for, let alone what active networking meant, but they guided me
through the process at all times. The most enticing reason for taking up this research was the
opportunity to actually contribute to the scientific community. Even if this proves to
be my last exercise in this domain it has been a great stimulant for finishing what
I started and undoubtedly a learning experience I can benefit from for times to come. I
would therefore like to thank Jonathan for this opportunity and especially Herbert for
all the help he's given me throughout the last year. A big thanks also goes to Doug
DeGroot, who was kind enough to take the time to read through and comment on my thesis
in an extremely short timespan.
Academics aside, I wouldn't have been where I am without my flat mates' friendship.
You can't always be on your toes, and the relaxed attitude at Flanor 4c made
sure I never got, and will not soon get, too overstressed. It was a lot of fun, guys!
I'm sorry for the times I did get a bit worked up, such as after my failed first attempt at
obtaining a driver's license and during the last weeks before handing in my thesis.
Naturally, the people I have depended on the most are the last to mention. Without my
family, especially my parents, I wouldn't even have had the chance to enroll,
let alone obtain a degree. Thank you for supporting me and letting me make my own choices in
these years.
Part 1 Background
The purpose of this study is to show that so-called active networks (AN)
can be applied practically to network management (NM). For this purpose we
will compare an AN-based toolkit with a reference SNMP implementation.
An often-heard argument against the use of active networks is that they increase
processing overhead considerably.
Experiments will show that active networks can extend NM functionality
without incurring a performance penalty on basic operations.
We introduce a network management toolkit that combines SNMP
and AN interpreters in a single executable. The resulting program can
serve as a drop-in replacement for a standard SNMP daemon.
We all rely on the proper functioning of computer networks for our everyday tasks,
be it directly, by accessing the Internet through PCs and mobile phones, or
indirectly, by withdrawing money from cash machines
and booking flights at our travel agencies. We take the existence of a backbone network
for granted and only notice our dependence on it when it fails. Unfortunately, cables do
sometimes break, power will fail once in a while and software isn't infallible, as
recent problems such as the California power shortage, Y2K and the
crash of the Ariane 5 rocket demonstrated.
Small glitches in an environment can have widespread consequences. That is why critical
applications are equipped with backup systems:
power stations run below their maximum output,
banks have multiple overlapping transaction systems and jets have ejection seats. Depending on
the severity of a problem a suitable action must be taken.
Increasing power output by a few percent
is a gradual solution; abandoning an airplane is a more definitive one.
Increasing network availability is not merely a case of laying more
cables in the ground and buying more hardware. It is equally necessary
to maintain the existing resources in working order. Tasks scale from
fixing loose network connections to governing the quality of service of a national telephone
system in an emergency situation. The ad hoc fixing of problems can be called network maintenance.
Naturally, a countermeasure must be proportional to the problem it addresses.
Selecting an appropriate action constitutes the difference between maintenance and management.
The real task of keeping networked systems running smoothly
therefore is not confined to taking care of instantly emerging issues, but deals with
governing the entire decision making process.
This area is generally referred to as network management.
With the increase in on-line activity in the last few decades
the task of managing networks has grown in complexity. So has our dependence on it being
executed correctly.
New applications and environments are implemented every day, for instance
online banking, wireless Ethernet and 3G mobile services. To serve both
present and future needs, network management tools must be secure, reliable, efficient and flexible.
However, most frequently used network management tools date back at least a decade
and the gap between what they offer and what we need has been widening
since [,1.3, 3.0].
Research into network management has progressed through the years.
Solutions have been found to deal with new problems on a regular basis.
Yet, we still rely on relatively old tools. The alternative solutions put forward so
far must therefore suffer from some weakness other than pure ineffectiveness.
Considering the investment in the
existing infrastructure it is impractical to introduce a completely new system and
expect quick global acceptance. This has often been attempted, however, and has most
probably been the main weakness of many challengers.
Instead of replacing proven tools in the network
administrator's toolkit with untested alternatives
a new system should be able to interoperate with the features
already in place.
A specific branch of network management research deals with the use of so-called
active networks. Active networks are based on the notion that programmable
network packets, as opposed to the static data packets used in traditional networking,
can solve many issues by moving computation to the most useful location.
A large body of work on AN-based systems exists and their advantages in
terms of flexibility have been demonstrated many times. Programmable packets do
have a downside, however. So far, all attempts to adapt an active network to network
management have failed due to the increased processing cost they incur and the
perceived drop in performance. This performance argument has also often been
used for general criticism of active networks.
It is therefore an important barrier to cross if
active networks are to be used in practical situations.
It is our thesis that a new, more resource efficient
class of active networks can, contrary to popular views, help to increase
network management functionality
without incurring a performance penalty on basic operations.
In the following chapters we will discuss
the present state of network management and active networking (chapter ),
and identify their current weaknesses (chapter ).
Then, in part , a software design will be introduced that is
based on the insights gathered
previously. Also, specific test scenarios will be discussed that can show the
advantages of the new platform over an existing reference system. The suggested
design has been implemented. Details of this product are discussed in part .
The augmented product will be compared to a standard reference system in
part . Finally, we will
draw conclusions from these experiments and suggest follow-on research
in part .
The research discussed in the following chapters is
based on network management and active networking
technologies. These are specialized
fields of science, therefore
we cannot assume a working knowledge of the domain from
the reader. This chapter should give any reader acquainted
with general computer science terminology enough background
information to be able to follow the argumentation detailed
in the following chapters.
2.1 Network Management
The term Network Management (NM) refers to the task of maintaining
computer networks in working order. It consists of both monitoring
and configuration tasks and encompasses many areas, of which the
most commonly mentioned are:
- Fault Management
- Configuration Management
- Accounting
- Performance Management
- Security Management
Together these areas are referred to as FCAPS
in the Open Systems Interconnection (OSI) model [].
Computer networks are composed of a mix of general purpose
computer systems and specialized network connection elements, e.g.
routers, switches and firewalls. Traditionally, network
management is concerned with maintaining the network links and
therefore is primarily concerned with the networking components.
This said, network management tools can also be used
for the monitoring and configuring of general purpose computer systems.
When viewing a network as an abstract structure we can define the
connections between two components as edges and the
components themselves as nodes. Workstations and PCs
mostly have only a single edge connecting them to the rest of the network.
We define this type of component as an end node, while
components with more than one connection will be referred to as
routers.
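This degree-based distinction can be made mechanical. A minimal sketch (node and link names are invented for illustration):

```python
# Classify components of a network graph as end nodes (a single edge)
# or routers (more than one edge), given the list of links.
from collections import defaultdict

def classify_nodes(edges):
    """edges: iterable of (node_a, node_b) pairs; returns {node: role}."""
    degree = defaultdict(int)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return {node: ("end node" if d == 1 else "router")
            for node, d in degree.items()}

# A PC and a server hanging off two interconnected routers:
links = [("pc1", "r1"), ("r1", "r2"), ("r2", "server1")]
roles = classify_nodes(links)
# roles["pc1"] == "end node", roles["r1"] == "router"
```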
Managing a network can in essence be performed by manually
controlling each machine. However,
since we are dealing with a communication medium it is possible
to relieve ourselves of this burden by accessing nodes through
the network. This is especially useful in networks containing
many nodes or those that cover a large area. When talking about
network management we imply the use of remote administration tools.
The global increase in computers and the growth in system
complexity in the last few decades have
greatly increased the demand for network management tools. Since the advent of
the Internet, computer networks have grown quickly, and so has the
task of managing all connected devices. Although the task became
increasingly difficult and time consuming, networks also
made it possible to automate and reduce some of the work.
In the early days
of networking remote shell access to most systems was made possible using
tools such as Telnet[] and Rlogin[]. Since
shell login allows access to all parts of the system this has long been a
preferred tool for Network Management. Even today shell access is actively
in use, most probably also for network management tasks.
During the 1980's many suppliers of networking components began adding
proprietary management tools to their hardware, such as
Netview by IBM, Accumaster by AT&T and DMA by Digital Equipment
[]. Failing to agree on a common communication method for the tools
meant different hardware systems could not cooperate easily within
the same network. Naturally, this gave way to interoperability problems.
Furthermore, staff had to
be trained specifically for each system, making network management
a costly service.
Recognizing the need for a standardized communication framework, the Internet
Architecture Board initiated a process of discussion[] in 1988 that
led to the Simple Network Management Protocol[]. SNMP has been around
since 1989, when the first version was defined
as a soon-to-be-replaced protocol. As is often the case, the system
became so widespread that it is still in use today and will probably
be so for quite a while.
As a replacement the Common Management Interface Protocol (CMIP) and the Common
Management Interface Services (CMIS) standards were developed.
CMIP/CMIS addressed many of the
problems of SNMP and offered a more elaborate system of data structures and
communication primitives. Despite the benefits CMIP/CMIS brought in terms of scalability and
flexibility it has never really caught
on, probably partly due to the increase in processing overhead and system
complexity its use entailed.
In the past decade no competing technology has been able to succeed SNMP as the
primary NM interface. Supplementary tools have helped in overcoming some
of its shortcomings. Among the most popular
are Remote Monitoring (RMON) [] for transferring network statistics
and the SNMP Agent eXtensibility Framework (AgentX)[] for connecting
SNMP with other pieces of software.
Since SNMP is the current standard in the field,
one cannot talk about network management without discussing it.
SNMP is a framework for network management standardized in Request For
Comments (RFC) documents.
It consists of definitions for [,1.10.01]:
- an overall architecture
- the structure and identification of management information
- communication protocols
- operations for accessing management information
- a set of fundamental applications
Management Information Base
In SNMP, data is structured in a hierarchy. All objects in this hierarchy
have a globally unique identifier called an Object Identifier (OID). We
call the complete data structure the Management Information Base or MIB.
However, we should note that, in NM literature,
subtrees of the global structure are
often also referred to as MIBs. To avoid confusion we will only
use the term subtree when discussing specific parts of the hierarchy.
An OID consists of a sequence of numbers. The complete sequence denotes
a unique element in the MIB hierarchy. For example, the system name
of a node is encoded in the OID .1.3.6.1.2.1.1.5.
These numbers correspond to subtrees in the MIB. For the system
name example this gives
.iso(1).org(3).dod(6).internet(1).mgmt(2).mib-2(1).system(1).sysName(5).
To simplify the process of retrieving information we can also use a
human-readable form, giving RFC1213-MIB:system::SysName or
even sysName. Serving as an example, the system subtree is
shown in figure . The listing was obtained from
the net-snmp tutorial[,tutorial: SNMPtranslate]
+--system(1)
|
+-- -R-- String sysDescr(1)
| Textual Convention: DisplayString
+-- -R-- ObjID sysObjectID(2)
+-- -R-- TimeTicks sysUpTime(3)
+-- -RW- String sysContact(4)
| Textual Convention: DisplayString
+-- -RW- String sysName(5)
| Textual Convention: DisplayString
+-- -RW- String sysLocation(6)
| Textual Convention: DisplayString
+-- -R-- Integer sysServices(7)
+-- -R-- TimeTicks sysORLastChange(8)
| Textual Convention: TimeStamp
|
+--sysORtable(9)
|
+--sysOREntry(1)
|
+-- ---- Integer sysORIndex(1)
+-- -R-- ObjID sysORID(2)
+-- -R-- String sysORDescr(3)
| Textual Convention: DisplayString
+-- -R-- TimeTicks sysORUpTime(4)
Textual Convention: TimeStamp
Figure 2.1: MIB example: the system subtree
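The arc-by-arc translation from a numeric OID to its symbolic form can be sketched as follows. Real tools such as net-snmp's snmptranslate parse MIB definition files; the hard-coded table below covers only the path to sysName and is purely illustrative:

```python
# Toy translation table for the path to sysName in the MIB hierarchy.
# Each key is an OID prefix (a tuple of arcs); the value is the
# symbolic name of the subtree that prefix identifies.
MIB_NAMES = {
    (1,): "iso",
    (1, 3): "org",
    (1, 3, 6): "dod",
    (1, 3, 6, 1): "internet",
    (1, 3, 6, 1, 2): "mgmt",
    (1, 3, 6, 1, 2, 1): "mib-2",
    (1, 3, 6, 1, 2, 1, 1): "system",
    (1, 3, 6, 1, 2, 1, 1, 5): "sysName",
}

def oid_to_name(oid_string):
    """Translate a numeric OID such as '.1.3.6.1.2.1.1.5' into its
    dotted symbolic form by walking the hierarchy one arc at a time."""
    numbers = tuple(int(n) for n in oid_string.strip(".").split("."))
    parts = [MIB_NAMES.get(numbers[:i + 1], str(numbers[i]))
             for i in range(len(numbers))]
    return "." + ".".join(parts)

name = oid_to_name(".1.3.6.1.2.1.1.5")
# name == ".iso.org.dod.internet.mgmt.mib-2.system.sysName"
```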
A subset of the MIB has been standardized by the IAB.
Examples
of this subset are access to Internet Protocol statistics, the
route table and the system MIB mentioned above. Full RFC references
can be found on the web [].
For other entries, mostly platform dependent trees, unique
OIDs have to be requested from the Internet Assigned Numbers Authority
(IANA)[]. Trees have been registered for Cisco Routers,
Oracle databases and other proprietary environments.
Communication Protocols
The SNMP protocol is sometimes confused with the overall
framework because they share the same name. However,
it is important to distinguish the protocol from the framework.
The protocol has been revised a number
of times. When we refer solely to the protocol we will mention the
name and the version. In many cases this is abbreviated to SNMPvX,
where X stands for the version number.
The original protocol, still in use, has only
basic functionality. It is possible to retrieve
system information and set variables remotely by directly
addressing them through their OID. The protocol consists of basically two
messages: a GET and a SET request. Requests are sent over UDP
to the server in a packet called a Protocol Data Unit (PDU).
A MIB can be traversed without naming every variable by issuing a GETNEXT
request instead of a full GET: GETNEXT returns the variable that
lexicographically follows the given OID. Multiple requests can be bundled in
a single PDU, but with SNMPv1, that is as far as data aggregation goes.
Unfortunately, the
simple GET/SET structure means that for each referenced variable or
set of variables a
request and a response packet have to be sent, creating a lot of
network traffic. This problem has partly been dealt with in the second
version of the protocol.
SNMPv2 allows single requests for sets of related
data by issuing a bulk request. GETBULK requests
can be used to receive complete subtrees of the MIB. For instance,
the example tree segment displayed in figure 2.1 can be obtained by issuing one
GETBULK system request.
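The agent-side lookup behind a GETNEXT walk can be sketched over an in-memory table of OID/value pairs. The variable values are invented, and a real agent exchanges BER-encoded PDUs over UDP rather than calling Python functions; this only illustrates the traversal logic:

```python
# Simplified agent: a sorted table of (OID tuple, value) pairs for
# three variables of the system subtree (values are made up).
AGENT_MIB = sorted({
    (1, 3, 6, 1, 2, 1, 1, 1): "example system description",  # sysDescr
    (1, 3, 6, 1, 2, 1, 1, 3): 123456,                        # sysUpTime
    (1, 3, 6, 1, 2, 1, 1, 5): "example-host",                # sysName
}.items())

def getnext(oid):
    """Return the first (oid, value) lexicographically after `oid`."""
    for entry in AGENT_MIB:
        if entry[0] > oid:
            return entry
    return None  # end of MIB

def walk(root):
    """SNMPv1-style traversal: one GETNEXT request/response round trip
    per variable, stopping when we leave the requested subtree."""
    results, oid = [], root
    while True:
        entry = getnext(oid)
        if entry is None or entry[0][:len(root)] != root:
            return results
        results.append(entry)
        oid = entry[0]

system = (1, 3, 6, 1, 2, 1, 1)
rows = walk(system)  # three variables, hence three round trips;
# an SNMPv2 GETBULK request obtains the same rows in a single exchange.
```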
Network bandwidth consumption is further reduced by
creating a hierarchy of SNMP servers called proxies. This
replaces the one-to-many topology of the NM infrastructure
with a more flexible layout that can mimic the topology of the
underlying network. Obviously, only hierarchical networks
benefit from this feature.
During the standardization process of version 2
partly compliant implementations spread to bridge the gap with SNMPv1.
Protocols such as
SNMPv2p, SNMPv2u, SNMPv2* and SNMPv1.5, can all be considered
pseudo standards. One
derived version deserves extra attention, namely SNMPv2c. Version 2c is the most
widespread implementation of the v2 series, consisting of all
SNMPv2 features excluding the advanced security infrastructure.
Instead, it still relies on the
community-string based authentication found in v1, hence the `c'.
The latest version of the protocol, v3, added much requested
authentication and encryption. Although using these features comes at a
performance penalty, securing access to configuration options can
be necessary.
2.1.4 Practical Situations
Having discussed the tasks of network management and how SNMP
carries these out we should stop and ask a second question. How
important is SNMP to the average network administrator? Is it
really the central tool for network management? Some tasks listed in
the FCAPS model are extremely hard to perform using SNMP,
for instance accounting. How do administrators deal with these
issues, if SNMP cannot help them with this?
Many distributed administration tasks, for instance user accounting,
cannot be handled practically by SNMP. Distinct tools have been created
for individual tasks. Microsoft's Active Directory,
a tool for domain-wide user and resource control, for instance, was hailed as a
holy grail when it was first introduced. We suspect that
SNMP is steadily losing ground to these proprietary applications.
The worst case scenario is that a standardized framework
is rendered irrelevant because of its sheer incompetence in dealing
with administrators' greatest headaches. Judging from the number
of non-SNMP based NM solutions available, SNMP is moving down
the queue. A real concern is that command line scripts and
proprietary tools will again be utilized for tasks where standard toolkits
could equally well be used.
It is hard to say how urgent this concern is in the field of
network management. Indications that SNMP cannot handle the tasks that
it should are abundant, however.
2.2 Active Networking
2.2.1 Introduction
To be able to explain the term active networking (AN) we must first
talk about everyday networking practices. The telephone system
can be seen as the first large scale remote communications network.
Originally, each telephone connection had to be
manually set up by an operator. Establishing a call between two
people literally meant connecting the two wires to each other.
These types of networks are called circuit switched networks.
Digital data oriented networks have used circuit switched networks
as a primary connection until the American Defense Advanced Research
Projects Agency (DARPA)[] began constructing a more
fault tolerant form of communication in the late 1970's.
The first so called packet switched network, DARPAnet,
allowed communication to take multiple
routes from the source to the destination node. Instead of
setting up dedicated links between communicating nodes,
communication streams were cut up into small packets
that could each be sent across a different network fiber.
Linking
networks together and breaking up communication into small
packets increases network redundancy, since the failure of
a single link doesn't necessarily entail a break in connectivity.
Packets can be rerouted around a failed link and reach their
destination safely.
The Internet Protocol (IP) is the predominant example of a packet switched
network. Our present Internet actually originated from the DARPA initiative.
Cutting up communications into small packets, as is done in
packet switched networks, permits traveling
across multiple edges in the network. To facilitate this
all edges are - in theory - kept continually open.
Directing a packet through a network consisting of multiple
connections entails making routing decisions at
intermediate locations. Routing, as this is called, is generally executed using
extra logic at the intermediate nodes. In the networks we use today,
such as the Internet, this logic is pre-configured into the individual nodes.
Based on information attached to each data packet a node chooses
the outgoing edge on which it will forward the packet.
Active networking, by contrast, places the necessary logic inside the
packet itself. In the extreme case, the software running on the nodes only executes
programs embedded in the data packets and exposes its information
to those packets. This behaviour can
increase network flexibility by changing policies based
on the nature of the data.
2.2.2 Approaches in Active Networking
How program code is fetched can differ between AN implementations.
In the previous section
we mentioned the case where executable code is embedded directly into
a network packet. This method is generally called the
revolutionary approach. A related but distinct approach uses
indirect access to program code. Instead of carrying the code itself,
packets then contain links to foreign code. When a packet enters
a node the linked-to code is downloaded from a remote location.
The transitional approach, as this is called, has the advantage of
reduced packet size. It can increase efficiency especially when foreign
code is cached on a node for repeated use.
The revolutionary approach is, we believe, the more intuitive one.
Self contained capsules truly follow the idea of programmable autonomous packets.
Also, if executed programs change frequently, caching code will
be a useless exercise. For practical reasons a transitional approach
may be warranted in certain circumstances.
The two approaches have more commonalities than differences and
most of the time implementation details will be largely invisible.
For the remainder of this document we will follow the revolutionary approach.
In spite of this, many remarks, if not all,
could equally well correspond to the transitional approach.
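The caching benefit of the transitional approach can be made concrete with a node-side sketch. The packet format, the digest scheme, and the repository stand-in are all invented for illustration:

```python
# Transitional-style node: packets carry a digest of their program
# rather than the program itself; the node downloads code on a cache
# miss and reuses it for every later packet with the same digest.
import hashlib

CODE_REPOSITORY = {}   # stands in for a remote code server
code_cache = {}        # node-local cache, keyed by digest
fetches = 0            # count of (expensive) remote downloads

def publish(source):
    """Register a program remotely and return its digest."""
    digest = hashlib.sha1(source.encode()).hexdigest()
    CODE_REPOSITORY[digest] = source
    return digest

def execute_packet(packet):
    """packet = {'code_ref': digest, 'payload': ...}"""
    global fetches
    digest = packet["code_ref"]
    if digest not in code_cache:      # cache miss: download the code
        fetches += 1
        code_cache[digest] = CODE_REPOSITORY[digest]
    program = code_cache[digest]
    # A real node would now interpret `program` against the payload;
    # here we only return it to show the cache behaviour.
    return program

ref = publish("forward_if_ttl_positive")
for _ in range(1000):                 # 1000 packets, same program
    execute_packet({"code_ref": ref, "payload": b""})
# Only the first packet triggers a download, so fetches == 1; with
# frequently changing programs the cache would miss every time.
```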
2.2.3 Applicability
Active networks change the way we think about network processing. Moving
control from the nodes to the packets is a far reaching paradigm
shift. The topic has spurred much controversy among scientists and
developers, and the practice has many critics. We will
briefly go into the arguments most often heard and try to
give counterarguments.
The discussion is not just a nuisance to active network proponents,
although some of them might think it is. In reality, discussion
helps to clearly
mark the boundaries of applicability for the field. We do not,
contrary to some proponents, believe active networking to be a
revolutionary successor to passive networking. Instead, we will try
to show where the critics have made a substantial contribution to
the discussion and where their statements can be countered. The
result of this should be a justification of research into active networks
as tools for certain tasks, not as a holy grail of computing. We will
begin by giving an example task for which active networking might make
a good tool. This intuitive justification will then be verified by
a close inspection of active networks' inherent characteristics.
Example Use
As an example from general-purpose networking,
active networking can be used to let packets make their
own routing decisions. For instance, realtime communication streams need
low latency links, while software downloads prefer high bandwidth
over latency. When these two streams could choose between a narrow,
yet short distance modem connection and a wide, but unresponsive
satellite link they would probably choose differently. In the current
setup, where the intermediate nodes govern the decision making process,
discriminating between these types of streams, while possible, requires
cooperation from all participants in the process. Chances are that at
least one intermediate router uses a one-size-fits-all strategy, in which
case one of the streams will have a suboptimal connection. Active networking,
instead, would allow the end users to control their stream, a more natural
approach. By moving data processing options from the nodes
to the data itself we can not only alter the way data is routed
through a network. Other possible uses include on-the-fly data compression,
conversion, aggregation and multiplexing. These features can have uses
for fields such as realtime multimedia delivery, multicasting
and network management, among potentially many others.
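The two-stream scenario above can be sketched in the revolutionary style: each capsule carries a small decision function, and the node merely exposes its link table and runs that function. The link names, metrics, and capsule format are invented for illustration:

```python
# Node state exposed to capsules: outgoing links with rough metrics.
LINKS = {
    "modem":     {"latency_ms": 30,  "bandwidth_kbps": 56},
    "satellite": {"latency_ms": 600, "bandwidth_kbps": 2048},
}

def route(capsule):
    """Capsule-style routing: the node runs the packet's own program
    against its link table instead of applying a fixed policy."""
    return capsule["choose_link"](LINKS)

# A realtime stream minimizes latency; a download maximizes bandwidth.
voice_capsule = {
    "payload": b"audio-frame",
    "choose_link": lambda links: min(links, key=lambda l: links[l]["latency_ms"]),
}
download_capsule = {
    "payload": b"file-chunk",
    "choose_link": lambda links: max(links, key=lambda l: links[l]["bandwidth_kbps"]),
}

assert route(voice_capsule) == "modem"         # low latency wins
assert route(download_capsule) == "satellite"  # high bandwidth wins
```

A fixed one-size-fits-all node would send both streams over the same link; here the policy travels with the data.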
Discussion
Active networking has not yet caught on as an
established technology. This can be expected when
we look at the arguments speaking against AN deployment.
Out of possibly many,
we will here discuss what we believe are the
main contributing factors to the
slow adoption of active networking in every day practices.
Resource Consumption
Firstly, AN systems consume more resources, both in terms of network
bandwidth and processing time, than regular packet switching. In
the past, the trade-off between flexibility and efficiency has always
led to optimizing the latter. This trade-off has changed in the last
few years. Increased processing power has enabled
us to start focusing more on flexibility.
Researchers working in the AN field obviously believe this
process will continue in the coming decades, but this reliance on
increased processing power is a gamble; even they must admit as much.
Relying on an increase in processing power alone is therefore
not enough to justify AN-based software systems,
since the relative performance penalty
remains the same. Concurrently with the drop in processing costs,
new research has led to more
efficient AN implementations. If both processes continue
the current technological barrier will cease to be an issue at some
point in the future. Exactly at which point in time ANs can compete
with traditional systems depends on the rules of the game. Each
field of application has its own demands. Especially those fields where
network latency is a lesser issue are likely candidates.
Traditionally, processing power has been an expensive resource. To
maximize efficiency, network resources have been
designed to minimize per packet processing overhead, which has led to inflexible
machines running one-size-fits-all policies.
Since the price of processing power has dropped considerably in the
last decades, and most probably will continue to do so in the near future,
it is now becoming possible to devote more processing to the network. This does
not mean that we can increase per-packet processing across the board: although
processing power has increased considerably, network throughput has grown even
faster, and the time available per packet has actually decreased.
Despite this, it has become possible to allow more complex
processing on the remote node for a subset of tasks that could benefit greatly.
Dataplane vs. Controlplane
We already suggested that active networking
might not be suited to all networking tasks. To further investigate to
which tasks it might be applicable we will introduce a distinction
frequently made between two classes of networking tasks:
dataplane tasks and controlplane tasks.
The dataplane is the domain of general purpose networking. All
processing that occurs inside the network is generally regarded as
part of the dataplane, with the exception of those messages needed
for administering the dataflow. The controlplane is the logical domain
of these `meta' messages. Examples of controlplane tasks
are therefore connection setup and destruction, synchronization and
network management.
One of the practical outcomes of this distinction is that dataplane
processing, forwarding of packets, has to be carried out at top
speed to allow for high performance tasks. Control
plane processing, on the other hand, is less time critical in
general. This suggests employing active networking
first in the controlplane. We should only try to incorporate active
networking into
dataplane processing when it has been shown to handle the less demanding
controlplane operations.
We should point out that the distinction between dataplane and
controlplane is becoming blurred with the increase in complex
processing used in the dataplane, e.g. QoS negotiation and task-specific
routing, and the growth in single frequency networks, where both
types of communication are handled through the same circuit.
Active networks have so far been suggested mostly for
tasks traditionally found on the controlplane, we believe with good reason. Again,
it is not our intent to propose active networking as a general
replacement for traditional practices.
Young Technology
Another reason for the limited deployment of active networks
worth mentioning is the
fact that it is still a relatively young research domain. In
recent years efficiency has increased, as we will show in section
. Other design issues,
most notably safety concerns, have still not been resolved satisfactorily.
Research into these issues is underway, but until this is dealt with
many applications cannot be replaced with AN systems.
Situations in which AN systems can already excel are being
explored. In [] Ramanujan and Thurber investigate
the use of active networks for dynamically scaling multicast video streams.
Using active networks for management tasks has been
researched in [], [],
[], [], []
and [], among many others.
More successful case studies will
be necessary if we are to convince network administrators that they should replace
proven technology with AN alternatives.
Interoperability
Simultaneously with the drop in processing power costs, the application domain
for computer networks has expanded from basic tasks such as emailing
to fields including realtime
conferencing and streaming media broadcasting.
These new applications place extra
demands on the underlying network infrastructure, e.g. realtime delivery
constraints and increased effective data throughput.
Specialized technologies for quality of service negotiation and multicasting can
be implemented in network nodes to allow for such services, but each
of these only fixes a single problem. From past experience we can learn that
agreeing on a common standard can take a long time. Therefore it is
preferable to limit the number of agreements necessary. This is
where AN comes into play. By transferring the logic from the network
devices to the data packets we only have to define a single framework,
once, to address both present and future problems.
Increasing interoperability between rivaling implementations
is another issue for which a proper solution must be found.
Tennenhouse and Wetherall make the case for a flexible approach, which
leaves the possibility of adding newer designs open:
[...] we are not suggesting that a single model be immediately
standardized. The tensions between available programming models and
implementation technologies can sort themselves out in the research marketplace
as diverse experimental systems are developed, fielded, and accepted or
rejected by users. For example, if the marketplace identifies two or three
encodings as viable, then nodes that concurrently support all of them will
emerge. As systems evolve to incorporate the best features of their competitors,
we expect that a few schemes will become dominant.[...] []
The Active Network Encapsulation Protocol (ANEP)
is one example of ongoing research into an encapsulation
layer that further reduces transition problems.
The End-to-End Argument
A notorious problem holding back the deployment of AN is
the extra processing it incurs in highly connected nodes. Increasing
processing in the Wide Area Network (WAN) or backbone
can decrease overall performance.
If only a subset of all packets incurs this overhead, a skewed situation
arises in which all packets suffer a cost that is only acceptable for some.
It is therefore considered bad practice to move services to the
core of the network that are not beneficial to all packets.
This counterargument to processing in the network's core
is generally referred to as the end-to-end argument [].
One service often rejected following the end-to-end argument
is the interpreting of programmable packets: active networking.
However, in the case of active networks this argument has been dealt with in part.
In [], Bhattacharjee et al.
demonstrate that by localizing data processing,
overall resource consumption can actually decrease. Their counterargument
is that central nodes in the physical network are themselves often endnodes
with respect to specific tasks. Moving processing to these nodes is therefore not
only legitimate, but even preferable. From this discussion it becomes clear
that not all networked tasks are equally suited to being ported to
active networking environments. Network management is, as we will see
in chapter . The experiments will
show resource consumption savings similar
to those in [].
Security Concerns
A major concern with all remote invocation frameworks is or at least
should be security. Many holes have been found for instance in the
Microsoft Windows operating systems relating to automatic execution
of unsafe ActiveX components or VBScript routines.
Active networks have so far been proposed mostly as research projects,
and security has often been neglected because of that. While not a problem
for a research project, this limits adoption of existing frameworks
for critical tasks. A production environment will need
tools for authentication and encryption. One simple short-term
solution would be to use IPSec [] to secure the underlying transport
layer. In the case of network management one could use the SNMPv3
security framework. Script MIB is a remote invocation scheme that relies
on SNMP's infrastructure for safety. In any case, securing an active
network, even one that is itself insecure, should pose no technical obstacles.
Overview
Some arguments against active network deployment can be
refuted; the impact of others can be reduced. However, we believe active
networks should be deployed only in those situations where we can expect
them to be of direct value. Since AN research is a relatively
young field we should first focus on tasks in the controlplane.
The next section briefly discusses the current state of AN
research through its best known implementations. The following
chapter then deals with applying active networks to network management
and tries to answer these questions: is it doable?
Is it preferable? How is it to be accomplished?
2.2.4 Available Frameworks
Many active networking frameworks have been
introduced over the last five years. Giving a complete listing of all these
packages is not our goal at the moment. The following will only introduce those
packages that we deem important contributions to the research domain. Many of the
following have been made possible through funding by DARPA.
We will follow the thorough overview written by Jon Moore in [].
In this work he identifies the most actively developed initiatives and
discusses them based on their safety, efficiency and flexibility.
For a more detailed overview of the field we kindly direct the reader to
the references section or the previously cited document ([]).
At the Massachusetts Institute of Technology research is underway on
two related environments. The Active Network Transport System,
or ANTS [] for short, is one of the oldest ANs. ANTS is set apart
from the rest by its code-referencing interface, i.e. packets do not carry the
actual code, but links to this code. The need for increased efficiency led to
the Practical Active Network (PAN) []. PAN started as a
follow-on to ANTS to research practicality in terms of computation overhead.
Its special characteristics are therefore efficiency and speed optimizations,
such as in-kernel execution and code caching.
Another early entry has been and is still being developed at the University
of Pennsylvania:
the Packet Language for Active Networks []. PLAN is
based on the notion that safety can be ensured - at least partly - by
reducing the expressiveness of the language. Continuing on this
work, the Safe and Nimble Active Packets (SNAP) environment was introduced by
members of the PLAN team. SNAP will be discussed in greater detail in chapter
. Its distinguishing feature is the controlled execution environment
created by restricting the allowable language constructs.
Efficient execution was another key point of the design. Both
PLAN and SNAP are part of the so-called Switchware project [],
a program for researching different approaches to active networking.
One of its other subprojects is the Secure Active Network
Environment, SANE []. As the name implies, it mostly
researches security in connection to active networks. For this purpose
it utilizes public/private key cryptography.
A non-academic player, BBN Technologies, created the
Smart Packets environment []. Smart Packets were
especially tailored to network management tasks. Since these tasks often have to
take place in partly failing networks, robustness was a key concern.
Another approach is being taken with the MØ mobile agent
environment []. MØ uses agents to create all the functionality
in a network for which synchronization is necessary. It is based on
the MØ language and is being developed at the University of Uppsala
in cooperation with the University of Geneva.
The SafetyNet [] initiative being developed by Wakeman et
al. at the University of London is another example of a network where
safety is being enforced by language constructs.
Finally, StreamCode [,],
developed by NEC in cooperation with ETH-Zürich, is a high performance
AN system. The discerning feature of the system is the hardware
implementation they use to execute code at the speed at which it is read from
the network. It seems that research into StreamCode has stopped, however.
The environments mentioned are but a subset of the research domain.
Research is also underway in fields that, while using a different
terminology, share a lot of features with active
networks. The boundaries between these can sometimes be very thin. Mobile
code, for example, can be seen as an active network without the
attention to intermediate node execution. Extensible kernels are
also a research domain where foreign code is allowed to execute on a networked
node.
On the networking side research is taking place into ad-hoc and
mesh networks, systems designed to dynamically reconfigure
themselves. These fields encounter some of the same problems as active
network research, specifically the performance and security issues.
Most closely related to active networks is the notion
of mobile agents. Agents are defined as small programs running
in the background. Mobile agents carry out requests by traversing
a network and executing at cooperating nodes. In contrast with
active networks, agents are mostly geared toward application-level
processing. The term agent has, however, been used in connection with
active networks.
With the information provided in this chapter one should be able to follow
the line of thought set out in the rest of this work. The next chapter
starts with the inspection of problems related to traditional network
management.
We will also make the case for using active networking
technology to overcome these problems. Finally, our main thesis will be
discussed in detail.
Chapter 3 Issues and Contribution
Network management saw a transition from ad hoc fixes to
standardized methods in the 1980s. Since then advances in NM have
nearly stood still; changes in application networks have not. The gap between
what is needed from network management tools and what
they can deliver has necessarily grown wider.
We will explore some of the problems currently facing NM in this chapter.
Previous solutions to these problems will be analyzed. Finally,
we will discuss a novel approach combining the useful features of the
alternatives with the performance and stability of SNMP.
3.1 Network Management Issues
Since regular users aren't concerned with network management, one can call it
a necessary evil. In an ideal world, all of the network's
bandwidth would be available for application communication. Minimizing the
footprint of network management on resources is one issue that will always
exist. It is questionable, however, whether this is a priority for
administrators. The following are a number of network management concerns;
some have to do with optimization, others pinpoint specific
shortcomings of the SNMP infrastructure. This overview is based in part
on a previous inspection by German Goldszmidt [,1.3, 3.0].
Performance
SNMP has been optimized for performance. Since it only allows relatively
basic operations, however, these are local optimizations when viewed against the
entire NM process. Depending on the complexity of the task at hand
further optimizations can be envisioned. Alternatives have proved to
outperform SNMP in specific tasks, some of which
use active networking as a basis. References are given
in the Related Work section ().
Bandwidth Consumption
While performance is of value, it is not the only feature worth optimizing.
The communication methods of SNMP are intentionally low-level, so that
the system can be used as a basis for
building complex operations. A consequence of the limited communication options
is bloated data transfer. The rigid structure necessarily placed on a network
infrastructure when using SNMP increases the amount of traffic generated
per request. The SNMPv2 protocol revision eased the communication
restrictions somewhat by providing the option of creating hierarchies of
management nodes. Channeling data in this fashion can reduce bandwidth
needs considerably. Still, many other topologies that could offer further
savings are not catered for.
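The bandwidth effect of such hierarchies can be illustrated with a back-of-the-envelope sketch. The function names and the simple two-messages-per-variable accounting below are our own illustration, not SNMP semantics or measurements from this thesis:

```python
# Illustrative arithmetic only: messages crossing the management station's
# own link when polling N nodes for V variables, directly versus via proxies.

def centralized_messages(nodes: int, variables: int) -> int:
    """Station polls every node directly: one GET and one response
    per variable per node cross its link."""
    return 2 * nodes * variables

def hierarchical_uplink_messages(proxies: int, variables: int) -> int:
    """With proxies aggregating their subtrees, the station exchanges
    only one summarized request/response pair per variable per proxy."""
    return 2 * proxies * variables

# Polling 100 nodes for 5 variables: 1000 messages on the station's link
# directly, versus 40 when 4 proxies summarize their subtrees first.
assert centralized_messages(100, 5) == 1000
assert hierarchical_uplink_messages(4, 5) == 40
```

The proxies of course still poll their own subtrees, but that traffic stays off the management station's link, which is the bottleneck the hierarchy addresses.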
A note worth making about bandwidth consumption is that optimizing it
only has value in specific situations. First and foremost, optimizations
in network management throughput will only benefit end-user applications when
the controlplane and the dataplane use the same network circuit. In
many Internet connections this is the case, e.g. Ethernet and telephone modems.
However, when protocols
allow for multiple independent circuits, as is the case in ISDN
or frequency division (FD) multiplexed fiber-optic networks, reducing
control channel bandwidth consumption will do nothing for the other channels.
When the control channel is not maximally utilized there is no need whatsoever
to reduce its bandwidth consumption.
Scalability
Predefined topologies can be suboptimal in certain situations, which can render
them impractical to use. The centralized topology of SNMPv1 in particular
has the problem that bandwidth consumption scales linearly with the number
of nodes. A centralized topology is - from a scalability point of view -
the single worst option. Not only do bandwidth needs
increase dramatically when centralized networks scale;
by overloading the management station with information, response times may
increase as well. Software complexity can also grow considerably. To a lesser extent
the same argumentation holds for hierarchical approaches.
Proponents of
existing infrastructure often counter that practical examples
show current infrastructures can handle the load. Countering this
claim is simple: the question isn't whether it is doable, but
whether it is preferable. The obvious answer is
no, it is not.
Functionality
Partly due to its standardized nature, SNMP also imposes a rigid structure
on the data format and on data access methods. The functionality of the system
is clearly defined, but for most modern use cases simply not in an appropriate fashion.
A very simple example is the fact that SNMP cannot easily show us
information regarding relative data error rates. It is even harder to request
concise information about a complete network. With SNMP we must resort to sending
low-level GET requests for each individual value to each individual node. For
devices that do not have much computing power this situation is
preferable. However, many networked devices have ample computational power
that could be used for network management. SNMP simply does not allow differentiation
between devices, while almost all networks are heterogeneous in nature.
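To make the per-value GET overhead concrete, the following sketch contrasts fetching raw counters one OID at a time with computing the derived value on the node itself. The MIB snapshot, the values and the round-trip accounting are invented for illustration; the OID names merely mirror real MIB-II counters:

```python
# Toy per-node MIB snapshot (values invented for illustration).
MIB = {
    "ifInErrors.1": 12,
    "ifInOctets.1": 48000,
}

def snmp_style(mib):
    """Manager fetches each raw counter in a separate GET, then divides
    the results locally: two round trips for one derived value."""
    round_trips = 2  # one GET/response pair per counter
    rate = mib["ifInErrors.1"] / mib["ifInOctets.1"]
    return rate, round_trips

def active_packet_style(mib):
    """A programmable packet computes the derived value on the node and
    returns only the result: a single round trip."""
    return mib["ifInErrors.1"] / mib["ifInOctets.1"], 1

rate_a, trips_a = snmp_style(MIB)
rate_b, trips_b = active_packet_style(MIB)
assert rate_a == rate_b      # same answer either way
assert trips_b < trips_a     # but fewer messages on the wire
```

The gap widens with every additional counter involved in the derived value, and with every node queried.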
The crux of the matter lies in the fact that, since all processing must
take place at the management station, overly complex code is necessary at
that site to control the entire process. It follows from the low-level
communication mechanisms central to SNMP that management stations must
engage in tedious micro management tasks. In []
Bhattacharjee et al. argue the case for implementing code inside the network
nodes "because certain functions can be most effectively implemented with
information that is only available inside the network".
We should not misidentify these functionality concerns as performance or
bandwidth issues. Although the concerns often coincide, the functionality
argument also holds for individual scenarios. From a performance perspective
it is unimportant whether requests have to be sent to multiple nodes, since obtaining
the complete response takes only as long as the slowest link when the requests are
executed in parallel. Likewise, bandwidth occupation can remain unchanged.
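The parallel-execution point can be stated in one line; the per-node response times below are invented:

```python
# Invented per-node response times in milliseconds. Queried in parallel,
# the manager waits only for the slowest response; queried one by one,
# the waits add up.
latencies_ms = [12, 45, 30, 8]
parallel_wait = max(latencies_ms)    # set by the slowest link
sequential_wait = sum(latencies_ms)  # what naive serial polling costs
assert parallel_wait == 45
assert sequential_wait == 95
```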
Reliability
Reliability concerns, too, point to introducing new methods for managing
network systems. A problem inherent to using a centralized scheme is
that such a method introduces a single point of failure, namely the
management station. The same holds for extensions of the centralized
topology. Hierarchical approaches also suffer from the single point
of failure problem and introduce additional intermediate `choke points'.
Even if the availability of the management station is so high that the previous problem
does not crop up, overall network reliability can be crippled by imposing
a centralized or hierarchical management topology on the network. Apart
from placing a greater responsibility on individual nodes, the significance of
a small number of edges in the network is also increased. Especially
on occasions where a speedy reaction is important, e.g. under high
bandwidth consumption, network redundancy will pay off. However, if the
management topology itself has no access to redundant resources it will
fail to resolve certain issues. For instance, errors in the route tables
can remove access to parts of the network, even when there are backup
physical connections. Being able to route around error spots, or right
into them, can be necessary to fix such issues.
Security
Remote configuration of devices needs a trustworthy communication protocol.
Malicious attacks are carried out on networked devices all the time.
It is therefore highly necessary to secure the management infrastructure
tightly. That this is a responsibility of a NM toolkit has
been recognized recently, with the addition of encryption and
authentication to SNMP, leading to SNMPv3.
According to a survey conducted in December 2000, most administrators still
use the extremely insecure versions 1 or 2c [].
Why would people keep using the older versions when
a secure alternative exists? From the survey we learn that lack of SNMPv3
support in existing devices is hampering adoption. We can therefore
expect the problem to go away for SNMP in time. An interesting
short term solution can be found in using a secure layer below SNMP.
Secure IP has been made possible by the IPSec infrastructure,
a standardization initiative managed by a taskforce of
the IETF [].
A study on
how SNMPv2 with IPSec compares to SNMPv3 has been carried out [].
While perhaps of concern in the short term, in the long term security is not
a main concern in the SNMP world. We mention it nonetheless,
since securing a network management infrastructure is of vital importance.
Applications rivaling SNMP must have security features in place
before they can be seen as serious contenders. Recent experiences in
using IPSec to add security to insecure SNMP show promising results. The
same solution can perhaps be applied to other insecure NM toolkits.
Concluding Remarks
The weaknesses of the SNMP system are well known, and its status as the predominant
NM tool has therefore regularly been under attack. Yet for various reasons none
of the challenging technologies has been able to surpass it.
Failure due to lack of industry support is not the main issue in most cases.
However, in a situation where a single entity has as strong a presence
as SNMP, we should take this practical consideration into account.
To overcome the problems inherent to SNMP, one could suggest designing a
new system from scratch. Considering the enormous initial costs of having
to replace existing software and even hardware, widespread adoption of such
a system will be very unlikely in the near future.
Therefore a more gradual transition has to be made.
First and foremost, a new network management
infrastructure should be able to work together with SNMP.
Any new features can then be built upon the infrastructure already in place,
mixing SNMP queries where necessary with improved communication where possible.
Finally, a unified interface is necessary, seamlessly integrating results obtained
the old-fashioned way with those obtained with the augmented system.
3.2 Related Work
Since the inception of SNMP alternative technologies have
tried to overcome some of its shortcomings. Already
mentioned are the CMIP/CMIS infrastructure, RMON and AgentX extensions
and, of course, the revisions of the platform itself.
A lesson we can learn from these technologies is that extensions
to SNMP are much more easily adopted than
a complete overhaul of the existing infrastructure. Research initiatives
carried out in the last decade point to the same heuristic.
Hierarchical Topologies
Various types of contenders have been in the
spotlight during the 90s. First off were the hierarchical
networks, of which CMIP/CMIS was the prime contender. Most of these
have died a silent death. Instead
of overthrowing SNMP, the idea of hierarchy was simply adopted with
the launch of version 2 of the protocol in 1993.
An interesting paper concerning hierarchical layouts that we'd like to point out
does not place the management
objects, but the policies in a hierarchy []. Doing
so can increase automation of network management tasks, something still under
investigation today.
Two-Tier Software Suites
The next big thing in network management research has been
the creation of two-tier networks. Based on the insight that
micro management can complicate high level tasks, alternative
solutions have been sought in adding higher level languages
on top of the low level SNMP interface. Generally, these middleware
tools can be seen as an extension of the hierarchical approach,
a point made very clear by the title of one paper discussing
a two-tier approach:
"Hierarchical Network Management: a concept and its prototype in SNMPv2" [].
A number of publications
introduce an object-oriented middleware layer to hide the
lower level details from the manager [,].
At this point we can safely say that two-tier networks have largely
failed. While management stations often have sophisticated software
packages for handling SNMP micro management, no toolkit has seen
wide adoption in legacy devices.
Web-Based Management
Perhaps connected to the Internet boom of the late nineties is
the idea of moving management tools from specialized software packages to the
World Wide Web. Web-based approaches to network management, too,
have been proposed []. Often, the web-based
approaches are no more than a specific instance of the two-tier
examples discussed previously. Web-based management is still being considered
for various tasks, for instance directory services. The Web-Based Enterprise
Management initiative [] by the Distributed Management Taskforce (DMTF)
seems to be widely supported. However, this initiative is mainly concerned
with accounting and web technology support, not with the tasks that are handled
by SNMP.
Remote Invocation
Moving processing from the management station to the remote nodes has been
tried many times. Remote code loading can result in an unsafe execution
environment; therefore most implementations use domain
specific scripting languages, for instance IDEAL [].
Standardized in RFC 2593 by
the Distributed Management Workgroup of the IETF,
script MIB [] is the de facto standard for building
scripting environments around SNMP. As the name suggests, script MIB
adds a new subtree to the MIB that contains known scripts and
allows remote invocation of these scripts. Once more, SNMP has
not been overthrown by rivaling technology. Instead, we have seen an
incorporation of scripting technology into the base
framework.
Overview
The historical progression of network management tools in
the last ten years shows two strong heuristics for future research.
First, there seems to be a continuing growth in flexibility of
suggested paradigms. Starting from the centralized approach,
we've seen hierarchical layouts, weak distribution of tasks and
strongly distributed remote processing. A number of papers discuss
this progression in detail and can thus serve as a thorough
background introduction into the
field [,]. In
short, topology growth has seen the following trend:
unconnected → centralized → hierarchical → multi-tier → topology independent
The second rule of thumb we can distill from the previous overview
is that environments coupled to the existing SNMP infrastructure
always seem to be preferred over their autonomous counterparts.
Taking financial costs of hardware replacement
into account this is to be expected.
Before we continue with an overview of active network based
solutions it is necessary to name some key terms used often
in the referenced literature. Terms such as Management
by Delegation, Distributed Management and Decentralized Management
have been used throughout the decade to denote different
network management extensions. Used to describe
hierarchical solutions in the early nineties, they are still often
cited to point to remote invocation schemes or intelligent agent
based designs. As such, these terms give an indication of the
general approach taken to network management research.
The last group of research projects we need to discuss are
those that are completely topology independent. Extending the
idea of remote method invocation is that of roaming processes
or mobile code. In this setup, scripts are executed remotely
not by direct orders from a central authority, but algorithmically.
Scripts can roam more or less freely through the network and
exhibit some intelligent behaviour. Again blurred by the
use of independent yet overlapping terms, such as Mobile Code,
Mobile Agents, Intelligent Agents and Active Networks,
this field encompasses a wide range of projects. We will
refer to all such solutions, contrary perhaps to their developers'
original definitions, as active networks.
3.2.2 Using Active Networks
Active networking allows for far greater flexibility than predefined data
processing. This can, if implemented right, reduce the amount of
network bandwidth occupied by management related traffic. Pre-processing data on
the remote hosts will increase demands for processing power, but can result in
less transferred data, since aggregated results can be sent to the remote
administrator. With more remote processing power becoming available active
networks are increasingly becoming a viable alternative to existing techniques.
An introductory overview of agent based management
paradigms can be found in [].
Programmable packets can decrease response time by
reacting directly at the remote node, as discussed
in []. Using simple algebraic operations,
we can compute derived results at the node. These results can directly
feed decision making algorithms that respond locally. Especially
under high network load, direct response will be of use; but even under
normal circumstances it can help decrease NM overhead.
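A local decision rule of this kind can be sketched in a few lines. The threshold, counter names and action strings below are invented for illustration and do not correspond to actual SNAP code:

```python
def decide_locally(err_count: int, octet_count: int,
                   threshold: float = 0.01) -> str:
    """Compute a derived error rate on the node and react there,
    without a round trip to the management station."""
    rate = err_count / octet_count if octet_count else 0.0
    if rate > threshold:
        return "raise-local-alarm"   # immediate, local reaction
    return "ok"                      # nothing to report upstream

assert decide_locally(5, 100_000) == "ok"
assert decide_locally(5_000, 100_000) == "raise-local-alarm"
```

The point of the sketch is that both the derived metric and the reaction live on the node; the management station is only involved when the local rule decides it should be.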
ANs allow a greater degree of cooperation between nodes. Since
packets can traverse the network as they see fit we can let a packet
flow through a subnet, gather global instead of local results and
react accordingly, all without intervention from a management
station.
Unfortunately, most AN systems in place today cannot exploit the features
described above. The processing cost of AN-based tools can be orders of magnitude
larger than that of pure packet processing: SNMP code runs as a local
executable, whereas active packets need to be interpreted. Systems based
on high level interpreted languages, e.g. Java, simply cannot offer
the performance needed in the general case to profit from these additional
features. A second class of systems has dropped security constraints
in favour of performance. These cannot offer the stability needed for
a network management application.
Employing active networks for network management has been frequently
proposed [,,,,,,,,,],
yet up until now the community has only
succeeded in surpassing SNMP performance in specific scenarios or for
large networks [,].
3.3 Definitions
Some of the abstract terms we will use in the remainder of this work
can have an ambiguous meaning.
Therefore we will discuss them in detail and present
concise definitions.
3.3.1 Functionality
According to the Oxford English Dictionary, functionality can be described as
"of a functional character", where function is
"the mode of action by which it fulfils its purpose". Increasing a system's
functionality in this sense means either (1) increasing the degree to which it fulfils
its purpose or (2) expanding on the purposes it is designed for.
We will strive to do both. The first, increasing the efficiency of
a network management environment, relates to such issues as decreasing
response time and bandwidth utilization. The second goal is less well
defined; we will refer to it as flexibility, for reasons discussed below.
These goals cannot be seen independently of one another, since they are
not orthogonal values. Increasing a system's efficiency will naturally lead
to opening up new uses for it. Similarly, extending the reach of a network
management system will not only allow it to handle more tasks, but can
- as we will see - give it more options for solving existing tasks,
thereby possibly increasing the efficiency of the system.
Efficiency
To increase the degree to which a system fulfils its purpose is to
improve upon the existing situation. SNMP's weaknesses have been
mentioned previously. Especially in complex scenarios, SNMP usage
can entail unnecessary micro management at the monitoring station
and correspondingly poor system-wide
response times. This Achilles heel reduces overall functionality, since
it makes certain tasks impractical to carry out.
Flexibility
Flexibility can be defined as the degree to which a system can
adapt to its surroundings. A flexible system, therefore, can more easily
adapt to a new environment than a rigid one. As such,
it can expand beyond the environments, and thereby beyond the purposes,
for which it was originally designed.
In our case this boils down to (1) how well a
NM environment can adopt the logical topology of the underlying network and (2)
how well a solution can be tailored to a problem at hand within a certain framework.
Topology adoption in SNMP is restricted to either a basic client/server
structure or a hierarchy of proxy servers. Improving on this using
an active network is elementary considering the programmability of
network traversal on the nodes.
Adapting to a specific task will most likely also be straightforward thanks
to the programmability of the individual solutions, i.e. the packets.
In short, it is precisely the flexibility argument that points us in the
direction of active networks. Flexibility in itself is not a virtue,
however. We should exploit the flexibility offered by
active networks to execute network management tasks that
cannot be executed using SNMP.
3.3.2 Performance
Performance is the speed at which tasks are executed on average.
As similar attempts have been frustrated by a severe performance penalty on
basic tasks, it is imperative that we optimize for low-level responses as well
as for complex scenarios.
A twofold solution to this problem has been chosen. Firstly, we rely on an
active networking environment that has been shown to handle simple tasks
as well as traditional software does. Secondly, and in line with the
interoperability argument discussed in 3.2.1, we have no
intention of replacing SNMP, but mean to cooperate with the system.
3.4 Contribution
In theory, active networks can surpass SNMP both in terms of performance and
functionality. Increased processing cost has so far held back the successful
introduction of an AN-based system.
Recent advances in active networking technology have reduced the overhead
they incur dramatically, while maintaining the necessary safety measures.
Since this new class of active networking environments will not
necessarily suffer from the performance penalty inherent to earlier systems
it should now be possible to provide a general alternative to SNMP.
It is our thesis that the new class of active networks
can, contrary to popular views, help to increase
network management functionality
without incurring a performance penalty on basic operations.
To improve on the existing functionality an alternative must decrease system
response time, decrease network bandwidth consumption and allow far greater flexibility than
is achievable using SNMP. At the same time, basic operations may not suffer
a severe performance penalty, which was the case with previous attempts.
We will implement an AN-based network management application
by combining a traditional SNMP
interpreter and an active network interface. Experiments will be carried out to show
that the AN
interface is superior to SNMP in terms of functionality and equivalent regarding
performance. By combining the existing infrastructure with an augmented interface
interoperability concerns are also tackled.
3.5 Reliability and Security Concerns
In the introduction we stated that a network management tool should be secure, reliable, efficient and
flexible. The latter two terms have been discussed and will
recur often. By focusing specifically on increasing functionality we have chosen to
disregard reliability and security concerns. These concerns, for instance
safety of execution, robustness, stability and authentication
will be mentioned, but are not our primary concerns.
Where appropriate, these terms are used as indications of valuable
production-ready features, not as elements of our research.
The terms introduced above can serve as testing guidelines, but are too vague
to serve as a direct basis for comparing network management environments.
In the following part we will introduce a software design that - in theory -
could help us attain our goals. Also, test scenarios that translate the
discussed terms into quantifiable values will be introduced.
Part 2 Design
The goals set forward can lead to conflicting requirements. The design of
the software should be such that an acceptable trade-off between opposite
values is made. We will outline the design below. Decisions will be discussed
in relation to the goals that must be satisfied.
4.1 Dual Interface
Interoperability with SNMP is one of our primary concerns. Performance is
another. Creating a system that fulfils both goals at the same time
poses some problems. Walter Eaves of University College London has
created an interface between an AN and SNMP using Inter Process
Communication [].
Doing so improves interoperability, but IPC adds considerable latency.
Instead, we opt for combining the active networking and SNMP interfaces
into the
same executable. This should give the same benefits, but without the
performance penalty. The result is shown in figure .
Figure 4.1: A framework design: a common MIB back-end services requests from
both SNMP and mobile agent front-ends.
The resulting application will be able to act as a drop-in replacement
for the traditional SNMP server. Additional constraints are necessary to
ensure that AN behaviour does not interfere with SNMP handling.
It is essential that the SNMP interface can listen for connections concurrently
with the AN handler. This can be accomplished by either implementing the
software using multiple threads of execution or by merging the connection
handling routine. Practical limitations have led to the latter; details
will be discussed in section .
Merging the two interfaces into a single application must not alter their
original behaviour in any way. The original systems will continue to change over time.
It is therefore important to separate the two systems as much as possible.
Practically, this boils down to only sharing the connection handler between
the two platforms. All other code must stay independent from one another.
Selecting which connection handler to keep
is a practical issue and therefore implementation dependent.
4.2 Common Back-end
Because both interfaces need access to the same information it is
only logical to share the data repository. As discussed, SNMP relies on the
standardized Management Information Base. Instead of creating additional
access methods for the active networking environment we opt for combining
the back-end access methods similarly to how we combine the front-end
connection handler.
Giving both interfaces access to the exact same information repository
has some additional benefits. By doing so we can compare the two
systems solely on communication metrics. Therefore our claims are
necessarily data independent. Furthermore, configuration changes
made with one system are automatically and instantaneously visible to the other.
This is highly beneficial for interoperation with existing systems.
4.3 Modules
The separation of concerns discussed in the previous section naturally
leads to a modular system design approach. By separating the functional parts
from each other, updates to the individual parts will most likely
be less disruptive to the whole. Inspection of the task list suggests
dividing the system into the following modules:
- a connection handler
- an SNMP request handler
- an active network packet handler
- a connection between SNMP and the MIB
- a connection between the AN and the MIB
- the Management Information Base software
Depending on the software we select as our reference SNMP system some
of these may already be implemented. The necessary extensions will have
to be inserted somewhere into the available package. Precisely two
connection points are of concern, namely the integration of the handlers
and the integration of the MIB access methods. How this is to be accomplished
is not a design issue but an implementational one, and will be discussed in
section .
4.4 Client Application
On the monitoring station a last piece of software will be needed.
A client application has to send data to either an SNMP aware server
or to a node in the active network. For this purpose two separate applications
can be used whose behaviour is identical, obscuring the communication
taking place in the background. Merging the two into a single application can be a
next step for productivity, but is not strictly necessary for our research.
For the SNMP client a standard application can be selected. SNMP requests are
normally sent over UDP directly to the server. In essence, an AN client
can be constructed to follow the same principle. Doing so would ensure
the shortest delay times. Instead, we will
only use UDP communication between the client and a local server. From
then on server to server communication relying on active packets takes
over. Results are sent back to the client using the reverse procedure.
From
a performance point of view it might be wiser to use direct communication between
client and server. However, in doing so we feel we would be simplifying the
test environment to such an extent that it might not even be classified as an active
network. For the claims set forward it is enough to prove that performance of an AN-based tool
is equal to that of SNMP. We will see in the experimental chapters
that even with the selected, suboptimal solution comparable results can be obtained.
In line with the quality concerns outlined in section
both client applications will have to be able to process multiple identical
requests and report extensive benchmark results.
The identification of a client application wraps up our design of a
network management system. In the following chapter we will present
test cases that can be used to compare an implementation of this design
with a reference SNMP application. Thereafter one such implementation will
be introduced.
In this chapter situations will be outlined that can serve as benchmarks
for proving the claims set forward in section 3.4.
Applicability of each situation in relation to the goals will be discussed.
Furthermore, a suitable network topology for testing will be displayed.
5.1 Quantifying our Claims
In section 3.4 the two goals underlying this thesis and
methods for achieving them have been mentioned. To obtain quantifiable
results it is imperative that for each method an accompanying metric is
found.
Functionality
Recall that we split the functionality argument in two: efficiency
and flexibility. We will now have to find case studies for both
values. Starting off with efficiency, metrics have to be found for
efficient processing. Two measures for processing overhead can be found:
consumption of processing resources and
consumption of network bandwidth. The first can be quantified by counting
used processor cycles for abstract cases or by simply keeping time in a
more real world scenario. For the second argument bandwidth calculations can be
carried out, using only pen and paper if the traversal of the data packets
is known a priori or by reading the traffic from the network.
In both cases we will opt for an experimental approach,
selecting round-trip time in milliseconds and actual traffic in bytes.
In the next section we will make the case for
a set of scenarios that should mimic real world practices as closely
as possible. If the model we select for benchmarking resembles actual
use close enough, performance and bandwidth measures should suffice to
make the case for efficiency.
Flexibility, the degree to which a system can adapt to its surroundings,
is difficult to translate into a metric. Instead,
reasoned case studies are given to show that: (1) increasing
flexibility has a positive result on efficiency and (2)
active network based systems are inherently more flexible than SNMP.
Examples are given of recurring network management tasks that cannot
be handled by SNMP due to its rigid structure.
Performance
Compared to traditional software, AN-based tools will always incur
additional processing.
Also, tasks that can be handled by issuing a single SNMP request do not
benefit from the increase in functionality active networks can offer.
From these two observations
follows that an AN-based management tool will naturally be less efficient
for simple tasks. Our goal
is not to outperform SNMP for these low-level requests, but to minimize the
penalty. For this, benchmark timing results are necessary, comparing SNMP with an augmented
system under simple low-level request handling scenarios. Comparisons will be
made on the round-trip time in milliseconds.
5.2 Test Selection
5.2.1 From Claim to Scenario
The thesis defines externally observable goals. Testing performance can be carried out
in a `clean room' environment, but doing so makes no sense when
analyzing functionality. In the experiments
the systems should therefore be viewed as black boxes. Real world
behaviour is our primary concern.
In order to measure the relative functionality of a network management
tool we must
identify the various classes of tasks it must be able to perform.
Subsequently, experiments are to be carried out that compare SNMP functionality
with the functionality of the active network based environment.
5.2.2 Functionality Tests
At this point it should be mentioned that every class enumeration can be debated.
The FCAPS model introduced in 2.1.1 could, for example, be used to identify
different classes of scenarios based on execution domain. Instead, complexity
will be used to partition the use space. This decision stems from the
observation that, since each variable is stored and retrieved in the same
manner under SNMP, partitioning into execution-domain-based classes does not
necessarily divide the method space. Executing the same request for multiple types
of variables will not show more interesting results than running the request
once. Selecting test cases based on their complexity will allow us to
compare multiple methods of communication and, if designed well, span the
entire field of network management tasks.
Tasks can be divided into levels of complexity based on:
- the need for postprocessing, i.e. computation of derived results
- the number of actors involved; are we dealing with a single client/server
request, a number of distinct servers or a distributed problem?
- the type of actions performed based on obtained results
From this list a set of scenarios can be derived that demand increasingly
more complex operations for successful execution. This gives:
- requesting directly available information from a single node; no response
- requesting derived information from a single node; no response
- requesting directly available information from multiple nodes; no response
- requesting derived information from multiple nodes; no response
- requesting information derived from data spread across multiple nodes; no response
- requesting information derived from data spread across multiple nodes;
executing additional actions based on these results
Since scenarios 3 and 4 consist of no more than resending the requests from
scenarios 1 and 2, their metrics can be computed from those scenarios.
Performance can be reduced to the performance of the slowest connection by
executing requests concurrently, while network bandwidth will simply be
the sum of all individual connections. Therefore these scenarios will not
be explicitly carried out.
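As a small illustration of this derivation, the scenario 3 and 4 metrics follow directly from hypothetical scenario 1/2 results; the node names and values below are made up for the sketch.

```python
# Hypothetical scenario 1/2 results: per-node round-trip time (ms)
# and traffic (bytes); node names and values are illustrative.
rtt_ms = {"node_a": 4.1, "node_b": 6.3, "node_c": 5.2}
traffic_bytes = {"node_a": 120, "node_b": 120, "node_c": 120}

# With concurrent requests, response time reduces to that of the
# slowest connection; bandwidth is the sum over all connections.
derived_rtt = max(rtt_ms.values())             # slowest connection
derived_traffic = sum(traffic_bytes.values())  # total bandwidth
```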
5.2.3 Performance Tests
The previous set of tests cannot fully expose the performance
penalty under basic operations. Only scenario 1 deals with low-level requests.
Since SNMP has multiple types of requests and can bundle multiple requests
in a single PDU we have to expand the set of basic tests. For clarity these
tests will be separated from the others. The previous set of tests deal
mostly with the functionality of the tools while this set focuses on
performance. The set of experiments is subdivided into these two categories as well.
Naturally, scenario 1 will not be reproduced in the functionality tests
since it already features extensively in the performance tests. To preserve
the increasing order of complexity, this also fixes the order in which the
tests are carried out.
SNMP requests mostly deal with retrieving information
using either the GET, GETNEXT or GETBULK requests. Furthermore
variables can be set using SET requests and traps can be created,
allowing a node to send a result only when a predefined threshold
has been reached. An application must be able to handle all GET and
SET requests and, indirectly, also GETNEXT and GETBULK requests.
Since our interpreter will be a stateless device we will not handle
traps.
Ideally, a comparison could be made on a weighted set of operations
mimicking dynamic SNMP workload. Unfortunately, no figures are available
on the relative frequency of request type uses. In vague terms we
already made the statement that GET requests are most probably more
abundant than SET requests. This is intuitively clear, since SETs only
occur when conditions change, while GETs are usually executed at fixed intervals
to identify the current conditions. Most of the time no changes will occur,
therefore most of the time a number of GETs will be sent before a single SET is executed.
Since no figures on relative usage can be found we will
make no statements regarding combined
performance gains. Instead, we will discuss each request type as an
individual test case. To obtain a wide range of statistics
the following requests have been selected.
- a single GET request (GET1)
- a single SET request (SET1)
- a request containing five GETs (GET5)
- a request containing one SET and one GET (GETSET)
The GET1 and SET1 queries have been chosen for obtaining
standard benchmark results. Also executing GETSET and GET5
instructions can display trends not visible from a single
request, e.g. performance increase by bundling requests. We
have selected these and not GETBULK or GETNEXT instructions
because they can be more naturally compared to the single GET1 and
SET1 instructions.
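To make the four request shapes concrete, the following sketch runs them against a toy in-memory MIB. The dictionary layout and handler are illustrative only; they are not the net-snmp implementation.

```python
# A toy in-memory MIB; variable names follow SNMP convention, values
# are made up for the sketch.
mib = {"sysName.0": "node_a", "sysUpTime.0": 1234}

def handle_pdu(ops):
    """Process a bundle of (kind, oid[, value]) operations carried in
    a single PDU, returning one (oid, value) result per operation."""
    results = []
    for op in ops:
        if op[0] == "get":
            results.append((op[1], mib.get(op[1])))
        elif op[0] == "set":
            mib[op[1]] = op[2]
            results.append((op[1], op[2]))
    return results

# GET1: a single GET request in one PDU
print(handle_pdu([("get", "sysName.0")]))
# GETSET: one SET and one GET bundled in a single PDU
print(handle_pdu([("set", "sysName.0", "node_b"), ("get", "sysName.0")]))
```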
5.3 Performance Breakdown
Issuing multiple requests increases the possibility for trend analysis
based on subtasks in the communication process. Knowledge of
the performance of these subtasks is of vital importance for optimizing
system throughput and latency. As an extension to the previous tests
we will discuss a breakdown of a single request transfer for various
network distances. Since we are only concerned with optimizing AN
performance the SNMP case will not be dealt with.
Performance statistics for the individual subtasks
will be displayed and analyzed. The knowledge we obtain here
can be put to good use directly when optimizing the functionality tests.
The communication process design was discussed in section 4.4.
Based on this, a single request procedure can be broken down into the following
subtasks:
in the client application:
- preprocessing: data structure preparation and connection build-up
- delay: the waiting time between sending the request and retrieving the response
- postprocessing: response processing and connection tear-down
on the intermediate node:
- preprocessing: translation from UDP request to AN packets
- delay: the waiting time between sending the request and retrieving the response
- postprocessing: translation from AN packets to UDP datagram
on the server node:
- preprocessing: execution of pre-SNMP code and translation into PDU
- delay: waiting for the MIB access handler
- postprocessing: execution of post SNMP code
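On the client side, such a breakdown can be measured by timing each phase separately. The sketch below wraps placeholder callables for the three phases; the function and its arguments are illustrative, not part of our implementation.

```python
import time

def timed_request(preprocess, exchange, postprocess):
    # Time the three subtasks separately; the callables stand in for
    # the real connection build-up, request/response wait and tear-down.
    t0 = time.perf_counter()
    preprocess()
    t1 = time.perf_counter()
    reply = exchange()
    t2 = time.perf_counter()
    postprocess(reply)
    t3 = time.perf_counter()
    return {"pre": t1 - t0, "delay": t2 - t1, "post": t3 - t2}

timings = timed_request(lambda: None, lambda: "reply", lambda r: None)
```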
5.4 Network model
As we are concerned with testing flexibility it is imperative
that the testing takes place in a network on which multiple
topologies can be visualized. This rules out simple layouts
consisting of solely linear or hierarchical connections.
Instead we selected a honeycomb-like structure. The precise
layout is depicted in figure .
Figure 5.1: Network topology used for experiments.
During the various stages of the test process the network
will be used to model different types of connections.
For the performance tests only direct client
to server connections are of interest to us. In order to
distinguish network latency from per node processing time
all tests will be carried out on servers at varying
distances from the client. To be precise, a linear list of
nodes will be selected in the network to which all requests
will be sent. The final results are closely related to the number of
intermediate connections. From now on this will be referred to as
the `hopcount'.
When testing functionality network complexity scales with the
increase in scenario complexity. Linear networks are no longer
sufficient in this case. Especially the distributed problems
have to be carried out on networks containing multiple routes from
one node to another. A honeycomb-like network configuration allows
packets to choose a route through the network from a large set of
applicable routes. More specifically, there are no single points of failure,
or choke points, in the center of such a network. Choke points limit the
option space, since all packets have to route through them, and thereby
reduce the quality of the tests.
5.5 Quality of Results
Due to fluctuations in network latency, processor scheduling and other
unmanageable factors, the outcome of a single test can be
skewed. To rule out incidental effects as much as possible,
an average can be taken over a number of identical runs.
Computing an average can be accomplished in various ways.
The mean is computed by taking the
sum of all values and dividing this by the number of values. An
objection to taking the mean is that it doesn't really remove
erroneous results from the input set. Another averaging method
employed regularly in scientific work is taking the median. The
median is defined as the middle element of the input set. It
can be thought of as the discrete counterpart of the peak
of a normal distribution function. Stochastic distributions are defined
not only by the location of their extreme, but equally by their
spread. In terms of median calculation we will give an estimate
of the spread by calculating the accompanying `quartiles'.
Quartiles (Q1 and Q3) are defined as the medians of the sets that are created
by splitting the input at the original median (Q2). As such they give an indication
of the spread of the distribution.
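The median and quartile definitions above can be sketched as follows for an odd-length sample such as our 101 runs; the timing values are made up.

```python
def median(xs):
    # Middle element of the sorted sample (sample size assumed odd).
    s = sorted(xs)
    return s[len(s) // 2]

def quartiles(xs):
    # Q2 is the median; Q1 and Q3 are the medians of the lower and
    # upper halves created by splitting the input at Q2.
    s = sorted(xs)
    mid = len(s) // 2
    return median(s[:mid]), s[mid], median(s[mid + 1:])

# 101 hypothetical round-trip times in ms
runs = [(i % 11) for i in range(101)]
q1, q2, q3 = quartiles(runs)
print(q1, q2, q3)  # 2 5 8
```

The offset of Q1 and Q3 from Q2 then gives the spread estimate used as our stability criterion.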
When running performance tests the median will be used as the
value to be minimized. All results displayed will be, unless otherwise
specified, the median of 101 runs. The stability of the test environment
can be read from the offset between the quartiles and the median.
Therefore the quartile offset for our test application must not grow
beyond that of its reference SNMP counterpart.
The presented tests can be used to compare network management tools.
The next part discusses
how the AN-based test application was implemented.
How well the specific implementation compares with a reference system
will be handled thereafter.
Part 3 Implementation
The previous chapter introduced a number of hardware and software features
necessary for implementing the active network management environment. We
intentionally delayed discussing implementational issues in-depth
until after the design.
In the following three sections we will discuss the
details of the software packages. To start off, we will identify all
prerequisites and select hard- and software based on these. The
next sections will then delve deeper into the selected packages'
implementation and explain our alterations to them.
6.1 Software Selection
Building a complete environment that fulfills all our goals from the
ground up is a cumbersome and time-consuming process. Due largely to the
abundance of freely available open source software packages it is
most probably also an unnecessary one. While there does not currently
exist a system capable of performing all the tasks we defined
there are packages that can relieve us of part of the work.
We also need to select an appropriate reference SNMP system to
compare our results with.
6.1.1 The net-snmp Package
To obtain useful experimental results it will be imperative that we select
as reference system an SNMP package that is widely in use. For reasons discussed
in section 4.2, this same system should also serve as the SNMP back-end
of our active network. Therefore an open source solution is to
be preferred.
The predominant SNMP package on UNIX systems, especially open source platforms such as
FreeBSD and Linux, is the net-snmp[] package originally developed at the
University of California at Davis. Previous versions of this package were
named ucd-SNMP for obvious reasons and are still in use today. Aside from being
widely supported and actively developed, the net-snmp package's source code is
distributed under a BSD-like open source license. This allows us to fully inspect and
alter the code where necessary.
In line with the argument given in 4.1 we have minimized code
sharing between this package and our extensions. No more than thirty lines of code had to be
inserted into the main SNMP agent codebase to make it accept additional packets. Separating
the concerns in this way has already proven its use, since during the development phase multiple
updates to the SNMP package have been incorporated into the source tree. We
have stopped incorporating updates when the first tests were run to ensure that
all results can be compared. The final version of net-snmp
we incorporated was version 5.0.6.
6.1.2 The SNAP Package
With the SNMP basis in place all we need now is an active networking environment that
complies with our demands. It must be able to process requests at
high speeds, since one of our goals is to handle network management tasks in
the same time as can be achieved with SNMP. Section 2.2.4
outlined a number of active networks. Most of these can be directly eliminated from our
selection list based on the performance requirement. Of those left most have not
been designed to enforce
safe execution, another feature highly important for a network management application.
The active network we have used suffers neither from performance nor from safety issues. Safe
and Nimble Active Packets, SNAP for short, has proven to be able to handle general
networking tasks in roughly the same time as standard packages. Basic networking
has been compared by implementing a ping request []. An introductory
test of network management was undertaken by comparing the latency of ping, SNAP and
SNMP in a Distributed Denial of Service use case []. Put together,
the results of these two tests suggest that we can accomplish our goals with the
help of SNAP.
Our work has been based on version 1.1 of the SNAP interpreter. At the time of writing
this package was not freely available on the Internet. However, an earlier version
can be found on the SNAP website []. Contrary to our approach with net-snmp
we did alter much of the codebase of the SNAP package. Combining the two applications in
a single executable meant we had to reimplement one as a library. Being less actively
maintained, SNAP was selected to be repackaged. The final layout of the
modules is depicted in figure .
Figure 6.1: Splash module layout
So far we have made a number of claims about the safety and performance of SNAP.
From its design follow characteristics not found,
at least not together,
in other active networking environments. Understanding how SNAP implements the
interpreter and how programs are encoded into packets is necessary if we want to build
a speed optimized NM package on top of it. These and other implementational issues related
to SNAP will be discussed in detail in section .
The original interpreter was in some aspects too limited to meet our demands. Therefore
we created a new codebranch and altered several important aspects, e.g. the
service library interface and client interface. These changes to the original
design we will discuss in section .
6.2 Hardware Selection
Finally, an appropriate network had to be selected. For the tests we obtained
access to a cluster of twenty 200 MHz Pentium Pros running Red Hat Linux 7.1 with kernel
version 2.4.7. The nodes are interconnected in a multiply redundant honeycomb-like structure
of which the layout is shown in figure 5.1. Communication with the outside
world was limited to secure shell login. This should suppress most superfluous bandwidth
usage. On the local nodes, routed, a dynamic route configurator, was the only
active bandwidth-occupying service.
All nodes have been configured identically. Each node accepts both
vanilla SNMP requests and SNAP packets.
Both handlers can be started as independent daemons. The SNMP daemon,
snmpd, can also be configured at runtime to accept SNAP packets.
The brief overview of our framework given here hides many of its intricacies.
SNMP has been discussed in detail in section 2.1.1.
To complement this, the next section will feature the SNAP active networking
environment. Design goals and implementational features
will be discussed and an example program will be shown.
Chapter 7 An in-depth look
at the SNAP Active Networking Environment
Safe and Nimble Active Packets (SNAP) is an active network designed around
three goals: safety of execution, efficiency in resource utilization
and flexibility of the platform. These three goals make it a perfect
candidate for basing our research on. Before we employ the system
in the field of network management we first need to understand its inner
workings. In this section we will show how SNAP handles packets and
what its design implies in the current context. To
acquaint the reader with the field, a simple example is given.
7.1 Language
In section 2.2 active networks have been introduced. As any
AN, SNAP consists of a language specification and a software package capable
of interpreting programs adhering to the language. Implementation of the
software will be discussed later on. First we will give an overview
of the SNAP language.
Like many other interpreted languages, e.g. Java bytecode, the SNAP language consists of
simple assembly-like instructions. The main advantage of having a low level
language is the relatively high processing speed obtainable. Any drawbacks, most
notably development difficulties, can be overcome by using a higher level language
and compiling its code to SNAP bytecode. A PLAN to SNAP compiler []
has been created for this purpose.
Language Constructs
Although SNAP code resembles machine level assembly language, it is developed
especially for interpreting network packets.
Therefore it has a number of distinguishing features.
Most prominent is the lack of backward jumps. As mentioned previously, SNAP has been
designed with safety of execution in mind. One of the methods used to reach this goal
was to create so-called linear execution time. Since SNAP does not allow jumping to
previously executed code - and thus lacks constructs for expressing unlimited loops
- execution of a SNAP packet will take at most an amount of time
that scales linearly with the number of instructions in the packet. Through limiting
the expressiveness of the language in this way SNAP can guarantee safe execution without
the need for dynamic runtime checks or other CPU intensive overhead. However, one
can easily see that by resending a packet to the same computer we can allow backward jumps, since
execution would start all over again. This and other side effects of network programming
are dealt with by using a special construct: the resource bound.
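The forward-only restriction can be checked statically. The sketch below validates a hypothetical instruction list, rejecting any backward jump; note that it uses absolute branch targets for simplicity, whereas SNAP itself encodes targets relative to the program counter.

```python
def is_linear(program):
    # True if every branch target lies strictly ahead of the current
    # instruction, so execution time is bounded by program length.
    for pc, instr in enumerate(program):
        op, args = instr[0], instr[1:]
        if op in ("bne", "jmp") and args[0] <= pc:
            return False  # backward or self jump: could loop forever
    return True

print(is_linear([("push", 0), ("bne", 3), ("forw",), ("demux",)]))  # True
print(is_linear([("push", 0), ("bne", 0)]))                         # False
```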
Resource Bound
A resource bound, or limit on the amount of resources a packet may consume, is implemented
as a simple counter inside the packet. All instructions that are not intrinsically
execution safe consume an amount of this resource bound. Once
a packet has eaten up all of its resource bound it is dropped by the interpreter. Currently
resource consumption is limited to packet sending, but the backward jump example above
shows that other language extending constructs could also be implemented in this fashion.
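A minimal sketch of this accounting follows; the field names and the single `send` cost are assumptions for illustration, not SNAP's actual packet layout.

```python
def execute(packet, network):
    # Forward the packet along its remaining hops; every send consumes
    # one unit of resource bound, and an exhausted packet is dropped.
    while packet["hops"]:
        if packet["rb"] <= 0:
            return "dropped"        # resource bound eaten up
        packet["rb"] -= 1           # sending consumes one unit
        network.append(packet["hops"].pop(0))
    return "delivered"

net = []
print(execute({"rb": 2, "hops": ["n1", "n2", "n3"]}, net))  # dropped
print(execute({"rb": 5, "hops": ["n1", "n2"]}, net))        # delivered
```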
Instruction Set
By having low level instructions a relatively small set of instructions can be used to
perform a wide array of operations. SNAP consists of a base instruction set containing
operators for control flow, stack manipulation, heap manipulation, relational comparison,
basic arithmetic, network processing and packet inspection. A complete overview of the
base instruction set in SNAP v1.0 can be found in Appendix A of [], an
updated reference is located on the project website [].
Contributing to the flexibility
of the environment, another key point, is the inclusion of a service infrastructure. Operations
not part of the core language can be added to the system as services. Our SNMP connection
is a prime example of such an extending service. Services can also increase processing speed,
since often used high-level operations can be compiled into machine code and called as
a service. This can reduce the necessary number of SNAP instructions, thus decreasing packet size
and increasing processing speed.
Data Access
Accessing data from network packets poses new problems. Since packets are
designed to travel from node to node, direct memory access cannot be used. Instead,
datastructures are encapsulated inside the packet itself. Doing so limits the maximum size of the
data, since packets may not exceed the network's Maximum Transmission Unit (MTU) size.
The main data access mechanism is a simple stack. Contrary to machine dependent stacks,
however, the SNAP stack allows elements of different datatypes and lengths to be processed
identically. The SNAP core language has support for integer, floating point,
IP address and character string basetypes.
Mimicking machine hardware, SNAP packets also contain a heap-like structure. However,
this heap cannot be addressed directly. The heap merely serves as a storage medium through the
use of pointers on the stack.
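The idea of a uniformly handled, tagged stack can be sketched in C as follows. This is a hypothetical illustration, not the actual SNAP datastructure: a type tag lets values of different types and lengths be pushed and popped through one interface.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tagged stack element; the real SNAP structures differ,
 * but the principle is the same: a type tag makes elements of
 * different types and lengths interchangeable on the stack. */
typedef enum { T_INT, T_FLOAT, T_ADDR, T_STR } snap_type;

typedef struct {
    snap_type type;
    union {
        int32_t  i;
        float    f;
        uint32_t addr;                      /* IPv4 address */
        struct { uint16_t len, off; } str;  /* length + heap offset */
    } v;
} snap_elem;

#define STACK_MAX 64
typedef struct { snap_elem e[STACK_MAX]; int top; } snap_stack;

static int push_int(snap_stack *s, int32_t i) {
    if (s->top >= STACK_MAX) return -1;    /* stack overflow */
    s->e[s->top].type = T_INT;
    s->e[s->top].v.i = i;
    s->top++;
    return 0;
}

static int pop(snap_stack *s, snap_elem *out) {
    if (s->top == 0) return -1;            /* stack underflow */
    *out = s->e[--s->top];
    return 0;
}
```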
In the core implementation, SNAP packets cannot use any storage medium on the executing
nodes. Besides the CPU exhaustion mentioned earlier, memory overflow is another
potential cause of system instability.
Limiting a packet's access to the executing machine's memory therefore increases SNAP's
innate safety.
7.2 Example program
We will now discuss a simple SNAP coded example program. The program, shown in figure
, is used to retrieve an SNMP value from a remote location.
A high-level overview of the program operations identifies three tasks: network travel,
request handling and result delivery.
; part 1: travel
forw
bne athome-pc
; part 2: call SNMP
push "sysName.0"
calls "snmp_getsingle"
; part 3: reverse direction
getsrc
ishere
bne athome-pc
push 1
getsrc
forwto
; part 4: return results
athome:
push 7777
demux
#data 0
Figure 7.1: Example SNAP Packet : an SNMP GET request
Network Travel
Firstly, the packet has to travel from the source, the monitoring agent, to
the destination. Secondly, after completion of its remote task it must
travel back to the source to deliver
the results. In this elementary example travel through the network is expressed
using the two instructions in part 1. The forw instruction tells the
interpreter at the executing node to compare the current location with the
destination and resend the packet on an applicable network interface when they
do not match. If the two do match the next instruction is executed.
The second instruction,
bne athome-pc executes a branch on not equal operation on the top stack
element. This is part of the elementary control flow we implemented in the code.
The packet has to carry out a return trip. To distinguish between the two
paths the top stack element is used. On its initial route to the destination
the top element carries a zero. This value is initialized at compile-time by
the last line in the program: #data 0. At the destination the element
has to be swapped for a 1 to express that the packet is going home. The bne
instruction will therefore branch when the value is 1, i.e. when it is back
home. The number of instructions it will jump over is athome-pc,
or an offset calculated from the distance between label athome: and
the current instruction (pc is an abbreviation of program counter).
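As an illustration of these branch semantics, a minimal C sketch of how an interpreter might implement bne follows. This is a hypothetical fragment, not the actual SNAP-ee source; it assumes the operand is a pc-relative offset and that bne consumes the tested top stack element, as the text describes.

```c
#include <assert.h>

/* Branch on not equal: pop the top stack element; if it is nonzero,
 * jump by the pc-relative offset, otherwise fall through to the next
 * instruction.  Returns the new program counter. */
static int exec_bne(int *stack, int *top, int pc, int offset) {
    int v = stack[--(*top)];   /* bne removes the control flow value */
    return (v != 0) ? pc + offset : pc + 1;
}
```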
The swapping of zero to one is handled in part 3. The first three
lines retrieve the source field from the packet and compare it with
the current location. If they are the same, the destination and
source are necessarily identical and we can jump directly to part 4.
Otherwise a one is pushed onto the stack, the source field is
retrieved and the packet is forwarded to the source.
Request Handling
Part 2 contains the request handling.
In this example, request handling is limited to a single system call for
brevity and clarity.
Note, however, that any arbitrarily complex operation can be executed here.
The push operation pushes a string onto the stack. This string is then read by
the subsequent operation. The calls instruction searches for a
service, in this case snmp_getsingle, and executes the accompanying
compiled function. Before the call it converts stack elements into arguments;
after the call returns it converts the
return values back into stack elements. After executing this call
the top stack element should contain a string representation of the sysName.0 object.
Result Delivery
Finally, when the packet is back at the source node the SNMP result has to
be transferred to the client. A demux, short for demultiplexer, instruction
is executed for this. The instruction takes as arguments the top two stack values,
which must contain the receiving port number and the return value, respectively. The preceding
instruction, push 7777, adds the port number to satisfy the precondition. Notice that
we do not explicitly remove the control flow value, 1, from the stack.
This has been taken care of
by the bne instruction.
After a successful run the client should have received the destination node's sysName.0 value.
Advanced Example
To show the strength of using an active network we will now look at a more interesting
scenario. Suppose we want the names of all systems on the way to the destination:
a traceroute. Using
SNMP this would entail sending the same request to all hosts. Even standard ICMP based
traceroute sends multiple requests. Instead, using SNAP we
can modify the previous example packet slightly to obtain the same result in a single
go.
Recall that the forw instruction stops execution and resends a packet to
its destination when it is not already there. If we move this instruction to
below the service call, the three instructions preceding it are in effect executed at
all intermediate hops.
The only problem is that the branch instruction removes the zero at each run. Therefore
we need to insert a push 0 between the service call and the forwarding command.
To deal with the same problem on the way home we must implement a similar extra check.
Inserting
getdest
ishere
bne athome2
push 1
forw
athome2:
after the athome: label should do the trick. We simply add a new label and check if the current location
is equal to the destination. If it isn't, we forward the packet. Otherwise the result delivery
process is handled as previously.
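Piecing the modifications described above together, the full traceroute packet might read as follows. This listing is a sketch: the offsets follow the label-pc convention of figure 7.1, and the exact placement of the inserted fragments is our interpretation of the text.

```
; executed at every hop on the way out
bne athome-pc
push "sysName.0"
calls "snmp_getsingle"
push 0                  ; restore the flag consumed by bne
forw
; reverse direction at the destination
getsrc
ishere
bne athome-pc
push 1
getsrc
forwto
; executed at every hop on the way home
athome:
getdest
ishere
bne athome2-pc
push 1
forw
; deliver results at the source
athome2:
push 7777
demux
#data 0
```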
From these two examples it may seem as if control flow takes up most of the packet code.
However, this is true only because these examples perform little actual work,
while their control flow happens to be relatively hard to code. The examples
discussed in section  will contradict this impression.
7.3 Practicality Framework
Having handled the basics of SNAP, it is time to assess its distinctive qualities.
SNAP is optimized for safety, flexibility and
efficiency. We discuss each in turn.
Safety
Safety should be a key concern to any networked application. SNAP has been developed
with the following design goal concerning safety:
SNAP packets should not be able to subvert or crash a node (robustness);
SNAP packets should not be able to directly interfere with other packets without
permission (isolation); and SNAP packets' resource usage should be predictable,
both for individual packet executions as well as globally across multiple nodes
(resource predictability) [].
A detailed overview of these features can be found in the cited work. Since safety is
especially of concern when dealing with network management we will briefly go into
each of the mentioned subgoals.
Robustness of the system is assured
by removing operations that are a potential
hazard to the stability of the system. Accessing the actual computer system underlying
a running SNAP interpreter is not possible using the core instruction set. Memory is
shielded from the packets since they can only alter values inside the packet. Denying
access to the CPU in a similar fashion would present an unworkable situation; instead of removing
access, it is closely guarded. A worst
case estimate of CPU cost can be calculated a priori by reviewing packet size. Packets
that are deemed too large can be dropped by the
interpreter. Limiting
access to system hardware in this manner guarantees safety of packet execution, regardless
of the actual code it carries.
Isolation is guaranteed because packets cannot communicate
without the help of additional services. By incorporating applicable
services communication can be allowed, but for security reasons this is not part of the
base package.
A strict maximum on resource consumption and the lack of backward jumps
enforce resource predictability. Upon arrival of a packet a decision can be made
to execute or drop the packet based on guaranteed behaviour.
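The predictability argument can be made concrete with a small sketch. Assuming, as argued above, that the absence of backward jumps means each instruction runs at most once per hop, a node could apply an admission check along these lines; the structure fields and the notion of a CPU budget are hypothetical, not part of the actual SNAP-ee.

```c
#include <assert.h>

/* Hypothetical per-hop admission check.  Without backward jumps every
 * instruction executes at most once per hop, so the instruction count
 * is itself a worst-case bound on local work. */
struct pkt { int n_instr; int resource_bound; };

static int worst_case_cost(const struct pkt *p) {
    return p->n_instr;              /* each instruction runs <= 1 time */
}

/* Drop packets whose guaranteed bound exceeds this node's budget,
 * or whose resource bound is already exhausted. */
static int admit(const struct pkt *p, int cpu_budget) {
    return p->resource_bound > 0 && worst_case_cost(p) <= cpu_budget;
}
```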
Flexibility
An active network is only useful if it can be applied quickly and easily to specific
use cases. SNAP offers two mechanisms for handling operations. Firstly,
the core set of instructions is extremely low level. This allows complex procedures to
be built on top of the system. Due to its confined nature it cannot, however,
innately express all known types of algorithms, i.e. the language is not Turing complete.
We will not give a formal proof of this here, but the claim becomes plausible when we compare
our language with a pair of other languages. In [], Douglas R. Hofstadter
identifies the fine line between a Turing complete and an incomplete language (Floop and Bloop).
In essence, SNAP can be made to accept Bloop programs, but not Floop programs.
The difference between Bloop and Floop is the
addition of unbounded loops. SNAP could only accept Floop programs if it accepted
unbounded loops, but for this a SNAP program would need
an infinite resource bound. Restricting the resource bound thus directly restricts the
language's capabilities to handling fixed length algorithms. An extended version of this
argument can be found in [].
The flexibility of SNAP is not only limited by the fact that it cannot handle all mathematical
problems, it is equally restricted by the fact that little interaction with the execution
platform is possible. However, these shortcomings are necessary for guaranteeing safety.
If a user is willing to sacrifice some degree of safety these problems can easily be overcome.
The service infrastructure allows arbitrarily complex programs written in
general purpose programming languages to be accessed from the SNAP interpreter.
Hypothetically, one can even
use SNAP solely as a tunnel to another environment, thus allowing the same level of
expressibility as is possible under any other existing execution environment. An interesting
approach has been taken by Kind et al. [], who rewrote the resource bound
implementation to incorporate safe execution of loops.
The degree to which flexibility is traded off for safety is an issue that
can be dealt with on a case by case basis. Also, many active networking applications
do not need language constructs that go beyond what is possible in SNAP. The program shown
in figure 7.1 can serve as an example of this statement. More
examples are given in [].
Efficiency
Efficiency in the case of active networking boils down to execution overhead. Active
networks necessarily introduce more overhead than traditional networks. To
be practical, however, they should be able to handle Internet Protocol like functionality at
IP-like performance. Proof that SNAP is relatively efficient has been given in [],
where SNAP was compared to ICMP ping. Efficiency of SNAP in the network management domain
will be discussed in section .
7.4 Interpreter
The software package used to handle SNAP packets is essentially an interpreter.
Currently, there exists only
an implementation for the Linux operating system. To distinguish between SNAP programs
and the software package on which they execute we will refer to the latter as the execution
environment, or SNAP-ee. The SNAP-ee is built around the interpreter, a large switch
statement that carries out the operations defined in the SNAP core instruction set on the
packet's datastructures. To
help fulfill its tasks additional code is necessary. Most notably, the SNAP-ee contains interfaces to
the SNAP network, the client and the services.
Networking
The SNAP network interface serves as the access point to other SNAP daemons. It is not
used for communication with client applications. For that purpose
IPC is used instead. The specifics will be explained in detail in the next section.
Inter-server communication is carried out over
the Internet Protocol. Just as UDP and TCP, the SNAP network protocol
(SNAP-np) is positioned directly on top
of what is called raw IP, the level in the TCP/IP stack that handles wide area networking,
but lacks support for reliable connection oriented networking. Similarly to UDP, SNAP-np
adds only a thin layer on top of raw networking. Support for connections,
transport safety and receive order preservation are not part of the protocol. Implementing these
features is not necessary since SNAP-np is only used for sending individual packets. SNAP-np
packets can be distinguished from other types of data by the protocol number encoded in their
IP header and by the use of the IPv4 router alert
option. Packets with the router alert bit set can be taken out of the general queue by the kernel for
additional processing in a user level application. Several different methods
have been tried for filtering out SNAP-np packets. The pure SNAP implementation reads all incoming packets in
the user space application and filters out appropriate packets at that location. This incurs a
large performance penalty, most likely due to the large number of
userspace/kernelspace context switches necessary and the relatively small number of SNAP packets
in a normal network datastream. For our tests we have altered this setup considerably. A
detailed description of the alterations can be found in section .
The SNAP-np packet, stripped of the IP header, consists of a twelve byte header containing
information about the packet's version, currently unused flags, the program counter, source port and the
lengths of the three main datastructures, i.e. code, heap and stack. The all-important resource bound
value is computed on the fly from the IP header's Time To Live field.
The rest of the datagram is filled with
code, heap data and stack data. This layout permits nearly instantaneous execution upon
arrival at a node. Only a handful of integrity checks are performed prior to execution.
Most importantly, no copying of data is necessary. Removing the need for so-called
unmarshalling in this fashion was one of the means used for increasing efficiency.
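One possible C rendering of this header is sketched below. The thesis specifies only the field list and the total size of twelve bytes; the individual field widths are our assumption, and the actual on-wire layout may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout of the twelve byte SNAP-np header.  The field list
 * (version, unused flags, program counter, source port and the three
 * structure lengths) follows the text; the widths are hypothetical. */
struct snapnp_hdr {
    uint8_t  version;
    uint8_t  flags;        /* currently unused */
    uint16_t pc;           /* program counter */
    uint16_t source_port;
    uint16_t code_len;     /* lengths of the three main structures */
    uint16_t heap_len;
    uint16_t stack_len;
};
/* The resource bound is not stored here: it is derived from the IP
 * header's Time To Live field, as described above. */
```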
Since SNAP-np does not contain support for various features found in reliable protocols
(e.g. TCP), special precautions have to be taken regarding packet size. The total
size of a SNAP packet, including heap and stack, must always stay below the Maximum
Transfer Unit of a node to avoid fragmentation. This means that the packet programmer
must either estimate the worst case size of a packet or implement an auto return-home
feature when a packet is becoming too large. For instance, in the case of the advanced
example given earlier, the maximum number of SNAP-aware nodes traversed must be known upon
injection of the packet into the network stream, or else the packet might be dropped
or destroyed along the way.
Client Communication
Injecting a request into the network is taken care of by sending a packet to a known
SNAP server in the specified format. A client must have access to the
SNAP-np send instruction for this. Presently, this function is simply copied into
the client application. The revised Splash implementation, on the other hand, uses
a more elegant solution.
In most cases it is sufficient to send a packet to a local SNAP server.
From there on the packet is forwarded according to the control flow implemented in the
packet itself.
This way the strengths of the active network can be maximally exploited. While this was not
possible under the original configuration, in our augmented environment a packet
can also be sent directly to the destination or to a designated intermediate
hop.
Specifics are discussed in section . Doing so limits expressiveness
but increases responsiveness, since it reverts general-case active network
processing back to special-case passive packet handling at the intermediate hops. Being able
to finely select the transport mechanism in this way increases the flexibility of the system.
Transferring data from a SNAP packet to a client is handled by the demux
statement. This statement creates a UDP connection from the daemon to a client application.
Again, the augmented application uses a
different mechanism for daemon to client communication.
Services
For any extensible system the service infrastructure is an essential element.
By enabling access to all parts of the local computer, services can
expand upon the base instruction set. A service infrastructure introduces potential
security hazards and should therefore be implemented with due care.
In the case of SNAP these security concerns are extremely important, since some of the
system's characteristics
are based on the notions of resource predictability and safety of
execution. Service calls operate outside of the SNAP-ee and therefore do not fall under
its restrictions.
The original service infrastructure was hard-coded into the application. Moreover,
the calling mechanism placed certain restrictions on the accepted services, which
proved unworkable for our research.
Because of this a new and more flexible service infrastructure has been implemented.
The old infrastructure will not be discussed here in much detail.
We should note that since services originally had to be coded into the application
there existed no actual difference between a service call and a SNAP instruction. A
consequence of this fact
is that services had to be written in the same language as the
interpreter, C. In the next section we will discuss the new
infrastructure and the advantages it brings to our research.
Chapter 8 Splash: Combining SNMP with SNAP
Although our primary task was to combine the SNAP and SNMP daemons
into a single executable, multiple issues arose along the way that made
us decide to change key aspects of the execution environment. In this
section we will discuss the integration of the two software packages
as well as the alterations made to the original systems. The combined
efforts led to the application used in the experiments: Splash.
8.1 Connecting SNMP and SNAP
Having selected two applications that meet our initial demands, we now have to
connect them. This task must be carried out in a way that minimizes extra
processing overhead while
keeping in mind the other design goals set forth in chapter 4.
The resulting program is supposed to carry out both original applications'
tasks and expand upon these. Therefore we have given it a distinct
name: Splash. Since SNAP itself has been revised to be part of Splash we have to
differentiate between the vanilla implementation and our revised version. For this
purpose we will use the name snap-wjdb when we talk about our revised implementation.
Creating a single executable from two distinct software packages
can be smoothly carried out by adopting a library oriented approach.
Acknowledging this rule of thumb, the
net-snmp developers have split their package into a daemon application
and a set of function
libraries.
The SNAP implementation, on the other hand, collects all functionality
in a single executable. One of the tasks at hand was therefore to split SNAP into
one or more libraries and a thin wrapper daemon.
Repackaging SNAP as a wrapper executable and library can be carried out relatively
easily by exporting the old main(...) function from the library. We have chosen a
somewhat more complex interface, mostly for performance reasons. The SNMP library
is used by both a SNAP service and by the original SNMP daemon. This gave rise to a new
problem: in a single executable
it is not possible to create multiple runtime MIBs, nor is multi-threading supported by
the SNMP library. To overcome this issue we
have merged the network select and receive loops of the two systems. The new
SNAP library exports functions similar to standard libc FD_ISSET,
FD_SET and recv(..)
routines. This makes it possible to add a SNAP capable handler to any other network daemon.
Because it expects an initialized MIB exported by the SNMP daemon,
the current Splash implementation will not work from a vanilla SNAP daemon.
Stand-alone operation could be implemented, but essentially the same situation
can already be reached by disabling the SNMP network access of our
augmented snmpd from the command line.
8.2 Interfaces
The packet format and core instruction set of SNAP weren't altered for
the SNAP-wjdb implementation. Aside from the repackaging of code mentioned above,
the core of the interpreter could therefore remain largely unchanged. Connecting
the interpreter with the other elements of Splash did give rise to some inconveniences.
The various SNAP interfaces have been revised to fit our needs. We will now again discuss
each of the interfaces first mentioned in section 7.4.
8.2.1 SNAP-np interface
SNAP-np, the interserver networking protocol, performed extremely poorly in the initial
tests. The main cause of this proved to be the way active packets were being filtered
out of the general network queue.
Since SNAP runs right on top of raw IP it has no access to the socket
infrastructure commonly used for differentiating
packet destinations. Instead, an alternate selection
based on bits in the packet header is used. Originally, this was envisioned to be taken care
of by the kernel. Using for instance the netfilter package in
Linux 2.4 []
the goal of high processing speed combined with fine-grained control can be reached. However,
in practice SNAP dealt with this in another way. By opening up a low-level ETH_SOCK ethernet interface,
all packets were read off the interface by SNAP itself. Since vanilla SNAP could run in kernelspace
this was not necessarily problematic.
When running in userspace, however, performance was considerably below expectations. Because local
access to SNMP is a design goal the application had to be able to run in userspace at
higher speeds.
In SNAP-wjdb, packet filtering is carried out differently.
While filtering based on port numbers is impossible at the level at which SNAP operates,
filtering based on protocol number is. Protocol based filtering is executed in kernelspace, so
SNAP packets can be accepted in a userspace application
without having to read all other packets. A disadvantage of this
method is that each intermediate node must accept SNAP packets, otherwise
they get dropped.
To circumvent this problem network instructions have been added that do not
resend packets on a hop-by-hop basis (as forw, forwto and send do),
but instead directly send a packet to its destination using passive IP packet handling
en route. dforw, dforwto and dsend instructions can be used to
skip past conventional nodes in the network. A side effect is that response times can be
decreased when intermediate packet handling is not needed. The simple example program in
figure 7.1 would, for instance, work equally well if we
replaced the forw operator with dforw. The advanced version, on the other hand, would not.
8.2.2 Client interface
The client interface in SNAP also uses a network tunnel to transfer results from the
SNAP daemon to the client application. During testing we noticed that delivering data
through the original interface could incur a performance penalty on the system. To
minimize this penalty the infrastructure has been rewritten from the ground up. SNAP
instructions were not altered, however, to ensure compatibility with older packets.
The codebase for the new daemon/client interface can be shared between clients and
the SNAP daemon. Transport protocols are largely abstracted from
the developer by an API that exports simple send(..) and recv(..)
functions similar to other
network interfaces. Underlying transport protocols can be toggled at runtime by setting
an environment variable. Currently, the interface supports
UDP, UNIX and raw IP protocols as transport layers, but others can be added later.
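A minimal sketch of such runtime transport selection is given below. The variable name SPLASH_TRANSPORT and the default choice are assumptions; the thesis only states that an environment variable toggles between UDP, UNIX domain sockets and raw IP.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical transport toggle.  The environment variable name and
 * the value strings are illustrative, not taken from Splash. */
enum transport { TR_UDP, TR_UNIX, TR_RAWIP };

static enum transport select_transport(void) {
    const char *t = getenv("SPLASH_TRANSPORT");
    if (t == NULL) return TR_UDP;          /* assumed default */
    if (strcmp(t, "unix") == 0)  return TR_UNIX;
    if (strcmp(t, "rawip") == 0) return TR_RAWIP;
    return TR_UDP;                         /* unknown value: fall back */
}
```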
8.2.3 Networking similarities
During subsequent code revisions the SNAP-np interface and the client interface
have grown toward each other. Although the version of SNAP-wjdb used for our experiments
still consisted of separate codebases, it is our intention to eventually merge the two.
There are clearly use cases conceivable under which SNAP-np would benefit from running
over UDP or TCP. Due to the reliance on a protocol based selection mechanism this is
currently not possible. However, by merging the interfaces we could clean up the SNAP-wjdb
codebase and further extend SNAP's modi operandi.
8.2.4 Service interface
The third and last of SNAP's interfaces concerns access to services. Criticism of the old
interface has already been given in section 7.4. We wanted to be able to update the SNMP
service more frequently than the SNAP library, since most of our development time went into
creating this service. Where possible, increasing the flexibility of SNAP by means of
extending its reach on the underlying system was a second concern.
Plug-in architecture
The service interface in SNAP-wjdb is based on a plug-in architecture. Plug-ins
are prevalent in systems where extensibility is a concern. Prime examples are the
Windows multimedia system sublayer that accepts additional encoders and decoders
and the Mozilla/Netscape plug-in architecture for web-enabled applications,
such as Macromedia's Flash, Sun's Java and Microsoft's aforementioned multimedia system.
In any implementation plug-in libraries must adhere to certain rules.
For instance, there must be an agreement on how the calling application accesses
the routines in the library. In SNAP-wjdb services can be written
as normal Linux ELF libraries, but must export certain additional functions. Note that
the standard library format would also enable us to link to the services at compile-time
instead of using the
plug-in architecture. However, since the plug-in architecture uses fundamentally the
same low-level interface as shared libraries we saw no reason
to do so. In any case, had we encountered serious drops in performance
we could have reverted to the previous situation.
init | initializes static datastructures |
getnextfunc | loops through the exported service functions |
getlastresult | optionally returns a structured dataset |
free_local_returnstruct | frees the dataset |
fini | cleans up leftover datastructures |
Table 8.1: A service's minimal set of exported functions
Aside from its actual service functions a SNAP-wjdb service must export the functions
outlined in table 8.1 to be recognized by the main application.
These various functions are needed by the service interface to handle service initialization
and destruction as well as structure copying. Through the use of conversion functions
the service infrastructure is capable of translating stack values directly into function
arguments. Similarly, return values are automatically placed on top of the stack. However,
when multiple return values can be expected, for instance by executing an SNMP request, an
indirect variable passing method can be applied. The getlastresult and free_local_returnstruct
functions are used to retrieve and optionally destroy extended sets of data.
Initialization
A plug-in based architecture can, if not implemented correctly, impose a performance penalty
on the system. To overcome this problem we have moved all plug-in specific code to the
initialization stage of the SNAP-wjdb package. Three functions are
used for handling services. In its default behaviour, SNAP-wjdb executes an init call
prior to accepting packets. When called, the function searches through a number
of standard library directories for files that match the following pattern:
snap_svc_[name].so. Matching libraries are then loaded and scanned for the
obligatory routines.
Service handlers can be found by calling a library's mandatory getnextfunc function.
This function is expected to iterate
through all service calls in the library and return the name of the SNAP call together with
a pointer to the actual handler, the number of arguments N and the type of the return value R.
This information has to be programmed in by the library developer. Although this slightly
increases development complexity there are numerous advantages. The most important one and
the reason for implementing this scheme is that
nearly all existing functions can be called directly, i.e. without the need for wrapper
functions. With the help of the service infrastructure we can tap into the extensive array
of libraries designed for Linux quickly and easily. Only functions that take no
arguments cannot, presently, be called directly.
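The discovery step described above can be sketched as follows. The helper name is hypothetical; on Linux the actual loading would use dlopen(3) and dlsym(3) to open matching libraries and resolve the mandatory entry points, as indicated in the comment.

```c
#include <assert.h>
#include <string.h>

/* Match candidate filenames against the snap_svc_[name].so pattern
 * before attempting to load them.  Helper name is hypothetical. */
static int is_service_lib(const char *fname) {
    const char *prefix = "snap_svc_";
    size_t n = strlen(fname), p = strlen(prefix);
    return n > p + 3 &&                      /* room for a name + ".so" */
           strncmp(fname, prefix, p) == 0 &&
           strcmp(fname + n - 3, ".so") == 0;
}

/* Loading would then proceed roughly as:
 *   void *h = dlopen(path, RTLD_NOW);
 *   int (*init)(void) = dlsym(h, "init");
 *   ...reject and dlclose(h) if any mandatory symbol is missing...
 */
```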
After initialization all accepted services have been linked to and can be
used as if they had been linked at compile time. Because a vtable is used for
function lookup, performance will inevitably
be somewhat worse than with statically linked libraries. There is, however,
no reason to expect that a plug-in based architecture performs
worse than a conventionally loaded shared library.
Execution
When a SNAP packet executes a calls instruction the service interface looks up the
handler in a hashtable, taking the SNAP argument as service name. If a service with that name
exists, it then converts the top N stack values into arguments and
calls the appropriate handler with them. NB: the infrastructure
does not check whether these values match the types of the function's arguments.
After a call completes, the return value is automatically converted by looking up
the handler's return type R. The resulting variable is placed on the stack. Lastly, the getlastresult function is called to see if
extended datastructures were prepared. If so, these values are also
converted and placed on the stack.
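A stripped-down sketch of this dispatch step is shown below. The real interface uses a hashtable and supports several argument counts and return types; for brevity this hypothetical version uses a linear lookup and handlers taking a single integer.

```c
#include <assert.h>
#include <string.h>

/* Toy dispatch: look up a service by name, pop its argument from the
 * stack, call the handler and push the return value. */
typedef int (*svc_fn)(int);

struct svc { const char *name; svc_fn fn; };

static int dispatch(const struct svc *tbl, int ntbl,
                    const char *name, int *stack, int *top) {
    for (int i = 0; i < ntbl; i++) {
        if (strcmp(tbl[i].name, name) == 0) {
            int arg = stack[--(*top)];        /* pop the argument */
            stack[(*top)++] = tbl[i].fn(arg); /* push the result */
            return 0;
        }
    }
    return -1;                                /* unknown service */
}

/* Example handler and table (hypothetical service names). */
static int twice(int x) { return 2 * x; }
static const struct svc table[] = { { "twice", twice } };
```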
One can see that this process is not particularly type-safe. Successfully calling a function
relies on a proper implementation of getnextfunc by the library developer
and on correct value
placement on the stack, a responsibility of the packet developer. Behaviour
under error conditions is largely undefined, a cause of possible problems. Increasing
the robustness of the service interface is necessary prior to using it in critical applications.
A standardized exception catching framework should be decided upon for this purpose.
Library development
The service infrastructure is designed for rapid functionality development. While services
can be programmed from the ground up in any language there already exist support routines and
examples for C and C++ developers. The snap_svc.c file implements most of the functions
in table 8.1. Developers of new services need
only supply a new file that contains
the getnextfunc function tailored to the exported services. If these services are functions of
other libraries they can be directly accessed by linking to these libraries, otherwise
new functions can be added in the same file. A template has been created for
rapid development purposes. Additionally, a
basic example can be found in the snap_svc_test source and header files.
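A hypothetical getnextfunc for a toy service might look as follows. The actual SNAP-wjdb prototype, type codes and template differ; this sketch only illustrates the iteration protocol described above, assuming each call returns the next exported service and NULL when the table is exhausted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed descriptor shape: name used by calls, handler pointer,
 * argument count N and return type code R. */
typedef int (*svc_handler)(int);

struct svc_desc {
    const char *name;
    svc_handler fn;
    int         nargs;   /* N */
    int         rtype;   /* R; 0 assumed to mean integer here */
};

static int add_one(int x) { return x + 1; }

static const struct svc_desc exported[] = {
    { "test_addone", add_one, 1, 0 },
};

/* Iterate through the exported services; NULL marks the end and
 * resets the cursor so the table can be rescanned. */
const struct svc_desc *getnextfunc(void) {
    static size_t next = 0;
    if (next >= sizeof exported / sizeof exported[0]) {
        next = 0;
        return NULL;
    }
    return &exported[next++];
}
```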
8.3 Services
For the experiments we had to implement a number of services. The most important is
the SNMP connection library, but we will also discuss the others here. The mechanisms
underlying the services have been discussed previously and will not be repeated here.
SNMP service
The SNMP connection library has been separated from the main SNAP-wjdb codebase since we expected
that we would frequently need to update it. The library serves as a wrapper around the
necessary initialization, PDU creation, execution and teardown parts of the SNMP process.
Executing an SNMP request can be a tedious exercise because of the many datastructures
that have to be set up. We don't want to have to place this logic inside the SNAP packets
themselves, therefore most of the work must be abstracted from the user. On the other
hand, too much abstraction can limit the usefulness of the connection library itself.
Therefore we chose to export many functions, including low-level ones dealing with
general SNMP functionality. Reducing SNAP package size is made possible by exporting a
second set of functions that automate much of the low-level processes, in essence handling
a special case. For instance, retrieving a single variable can be accomplished by
executing one service call. However, retrieving multiple variables can be sped up
by resorting to lower level service calls, thus removing unnecessary duplicate instructions.
Where possible, checks were implemented in the library to deal with
low-level functions that have not been executed. For instance, the initialization routines will be
executed automatically if the necessary structures haven't been initialized prior to adding a variable
to the PDU. Exploiting these safety checks can reduce
a packet's size. One should be thoroughly aware of the existence of inserted safety checks before
skipping instructions, otherwise undefined program behaviour may be encountered.
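The lazy initialization behaviour described above can be sketched as follows. This is a minimal illustration in Python; the real library is written in C around net-snmp, and the class and attribute names here are purely hypothetical.

```python
# Sketch of the safety-check pattern: each call initializes whatever
# earlier step the packet skipped. All names are hypothetical.

class SnmpServiceLibrary:
    def __init__(self):
        self.initialized = False
        self.pdu = None

    def snmp_init(self):
        # Set up the session structures once.
        self.initialized = True

    def snmp_pdu_init(self):
        # Safety check: initialize the library first if the packet
        # skipped the explicit snmp_init call.
        if not self.initialized:
            self.snmp_init()
        self.pdu = []

    def snmp_pdu_addvar_null(self, oid):
        # Safety check: create an empty PDU if none exists yet.
        if self.pdu is None:
            self.snmp_pdu_init()
        self.pdu.append((oid, None))

lib = SnmpServiceLibrary()
lib.snmp_pdu_addvar_null("sysName.0")  # both init steps run implicitly
print(lib.initialized)                 # -> True
print(lib.pdu)                         # -> [('sysName.0', None)]
```

A packet exploiting these checks can thus drop its explicit init instructions, at the cost of relying on the implicit behaviour.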
service name              | functionality
snmp_init                 | initializes the library
snmp_init_ip              | idem, but uses IP to connect to the SNMP server
snmp_pdu_init             | initializes an empty protocol data unit
snmp_pdu_addvar_null      | adds a value to be retrieved
snmp_pdu_addvar_withvalue | adds a value to be updated
snmp_send                 | sends the prepared PDU to the server
snmp_close                | closes an open connection

Table 8.2: Low-level SNMP service calls
The list of exported low-level functions is shown in table 8.2.
From this list it can be seen that the connection between SNAP and SNMP is a pure
client/server one. However, one of our initial demands was that the two could coexist
in the same execution space. The first function, snmp_init, is set up to handle
this. It creates a connection to an SNMP server not through the use of a
regular transport protocol, but by simply passing pointers between the two systems.
The PDU created earlier is therefore not marshalled, transferred and unmarshalled, but
directly referenced from the SNMP library. Naturally, this greatly reduces
processing overhead.
The other initialization routine does use a standard UDP connection to an SNMP
server and should therefore not be used to connect to the local executable. Its
intended use is the querying of other nearby SNMP servers not running Splash.
A Splash daemon can thus serve as a proxy for SNMP servers, another bandwidth-saving feature.
From these low-level services more advanced ones can be constructed. Obviously, doing so
does not increase flexibility, but it can help limit packet size and improve
code reusability. The two most basic examples we created were snmp_getsingle and
snmp_setsingle. The first was used in the example shown in figure 7.1.
Both functions call, in consecutive order, the init, pdu_init, addvar, send and close
functions. Since we know that only one variable has to be retrieved or set, we can reduce
the number of necessary calls from five to one, taking as arguments the combined arguments
of the low-level functions.
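The composition just described can be sketched as below. This is an illustrative Python rendering with stubbed-out low-level calls; the real services are C functions invoked from SNAP bytecode, and the session representation here is invented.

```python
# Hypothetical sketch: a high-level service composed of the five
# low-level calls, replacing five packet instructions with one.

def snmp_init(session): session["open"] = True
def snmp_pdu_init(session): session["pdu"] = []
def snmp_pdu_addvar_null(session, oid): session["pdu"].append(oid)
def snmp_send(session): return ["value-of-" + oid for oid in session["pdu"]]
def snmp_close(session): session["open"] = False

def snmp_getsingle(oid):
    # One service call performs init, PDU setup, variable add,
    # send and teardown in consecutive order.
    session = {}
    snmp_init(session)
    snmp_pdu_init(session)
    snmp_pdu_addvar_null(session, oid)
    result = snmp_send(session)
    snmp_close(session)
    return result[0]

print(snmp_getsingle("sysUpTime.0"))  # -> value-of-sysUpTime.0
```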
A more advanced use of SNMP calls can be seen in another set of services, in which two control
flow problems are dealt with by requesting information from SNMP. First, to
travel across the network without prior knowledge of the underlying topology and without
the use of data structures, a round-robin interface selection scheme has been implemented.
Calling snmp_getnexthop with the IP address of the incoming interface as argument generates
a next hop for the packet from SNMP information. Second, snmp_getneighbours
places all local IP addresses, except 127.0.0.1 and the incoming address, on the stack for a flood
send. Both of these functions can also be implemented without SNMP by directly
querying the kernel, but in some instances SNMP can serve as a useful abstraction over
the raw kernel data. Note that, in the end, we
decided to use kernel calls for these tasks in the experiments. The SNMP requests proved very fragile
with regard to exception handling. One of the underlying reasons is that one request's
response may be needed to construct the following OIDs, an undesirable situation:
if the first request fails, we have no input for the next. With kernel calls many steps
can be removed from the process, resulting in a cleaner and sleeker implementation. Kernel
call services will be discussed in the next section, along with a few others.
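The two control flow services can be sketched as follows. The sketch is in Python with the interface list passed in as a plain list; in Splash the data comes from SNMP (or, as noted, kernel calls), and these function bodies are an assumption for illustration only.

```python
# Sketch of the two control flow services: round-robin next-hop
# selection and neighbour listing for a flood send.

def getnexthop(incoming_ip, interfaces):
    """Pick the interface following the incoming one, round-robin."""
    i = interfaces.index(incoming_ip)
    return interfaces[(i + 1) % len(interfaces)]

def getneighbours(incoming_ip, interfaces):
    """All local addresses except the loopback and the incoming one."""
    return [ip for ip in interfaces if ip not in ("127.0.0.1", incoming_ip)]

ifs = ["127.0.0.1", "10.0.0.1", "10.0.1.1", "10.0.2.1"]
print(getnexthop("10.0.1.1", ifs))       # -> 10.0.2.1
print(getneighbours("10.0.1.1", ifs))    # -> ['10.0.0.1', '10.0.2.1']
```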
Other implemented services
During the development process, issues arose that were hard to solve using only SNMP
access. We
could have blindly created SNAP instructions for each problem, but this would surely
have led to code bloat in SNAP-wjdb. Instead, we created a number of services, each
dealing with a specific type of problem.
Configuration of nodes can in theory be
performed through SNMP calls. The net-snmp package, however, does not export many
variables that can be SET. In those cases where we needed to alter behaviour,
communication with the subsystem was therefore implemented
through kernel calls. More specifically,
services have been created for read/write access to the route-table, the interface table
and the /proc interface. We chose to implement this functionality in Splash services instead
of an SNMP MIB for two reasons: (1) to reduce development time and (2) to experiment
with the service library layout.
A different kind of extension is the data dictionary. Following the security guidelines of
the SNAP language we did not want to add shared memory directly to the interpreter.
However, there are tasks in which a communication mechanism between packets
is needed; examples are given in chapter . Therefore
we implemented a simple mechanism that can be used to store SNAP
stack values on a node. A hashtable wrapper is used that exports basic GET, SET and DEL
functionality. No resource control constructs exist, so this would certainly not be
an ideal implementation for critical systems. In a production environment, some form of timed
automatic data release, a maximum on the amount of stored data, and a cost function
that consumes resource bound should be added. For our research purposes, however, the simple
library proved acceptable.
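The data dictionary just described can be sketched as a thin hashtable wrapper. This is a minimal Python illustration with hypothetical method names; as noted above, it deliberately has no resource control.

```python
# Sketch of the data dictionary: a hashtable wrapper exporting GET,
# SET and DEL so packets can leave stack values behind on a node.

class DataDictionary:
    def __init__(self):
        self._table = {}

    def set(self, key, value):
        self._table[key] = value

    def get(self, key, default=0):
        # Missing keys yield a default rather than an error, so a
        # packet's first visit needs no special-case logic.
        return self._table.get(key, default)

    def delete(self, key):
        self._table.pop(key, None)

d = DataDictionary()
d.set("intrafficdx", 1500)
print(d.get("intrafficdx"))   # -> 1500
d.delete("intrafficdx")
print(d.get("intrafficdx"))   # -> 0
```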
Lastly, we have created kernel call based versions of the
control flow services mentioned in the SNMP service library.
Internally relying on the route and proc services mentioned above,
the if_getnexthop and if_getneighbours services
replace their SNMP based counterparts in our experiments.
This last chapter on Splash completes the overview of our test system's implementation.
In the following chapters we will discuss the experiments carried
out to support our thesis. In the
first, low-level Splash request performance is compared with that of SNMP,
while in the second the functionality claims are reviewed.
Part 4 Research
Chapter 9 Experiments: Performance
Having discussed the theoretically obtainable NM improvements and
the framework with which they are to be accomplished, what remains
is to prove the stated claims. In this and the
following chapter we will experimentally establish the comparative
qualities of SNMP and the active network. This first chapter compares
processing speed, while the next deals with the
functionality argument.
9.1 Theory Recap
The experimental results displayed hereafter are based on the test cases
introduced in section 5.2.3. Contrary to the functionality
tests discussed in section 5.2.2,
the following results are not intended to back our claim of superiority
over traditional SNMP. Recall that, to improve upon SNMP,
we first have to establish that a rivalling environment is comparable
to SNMP in terms of its main utility: low-level request processing speed.
While the implemented tool, Splash, contains both an SNMP and an AN interface,
we will use the name Splash in the experiments to denote the active network
interface, as opposed to its SNMP counterpart.
We will first make the case for Splash in general terms by comparing
the round trip time results of identical requests executed under
SNMP and Splash. Then, a single request is examined more closely to
distill the relative cost of the subprocesses involved. With this
information we can select situations in which Splash will excel
as well as those
for which the system is not suitable. The insights gathered
will help explain the comparative quality of Splash in the more
elaborate functionality tests discussed in the next chapter.
9.2 Considering Pre- and Postprocessing
The results shown in figure give the overhead of
SNAP and SNMP for the various scenarios without `preprocessing',
i.e. without the time it takes to establish an SNMP session.
The SNMP platform suffers from a disproportionately long preprocessing time.
During preprocessing several datastructures have to be set up,
for instance the Protocol Data Unit used
for sending the requests. To accomplish this, the net-snmp application has
to open the extensive SNMP library, which in turn must initialize several
data structures.
A lightweight package such as SNAP has fewer internal data
structures than SNMP and can therefore respond more quickly. For simple requests,
preprocessing can have a large impact on performance. For a single SNMP GET
request the preprocessing overhead is approximately 500 times the duration of the
actual request; SNAP needs only around 30 request durations.
With this preprocessing taken into account, Splash
outperforms SNMP by an order of magnitude. Caching of structures can,
however, largely hide this weakness. Therefore we chose to keep it out of the
direct comparison.
9.3 Round trip Results
As discussed previously, we selected a set of 4 requests to
obtain general purpose performance statistics. These requests are:
a single GET, a single SET, a combination of GET and SET, and a combination
of 5 distinct GETs. All of these were sent to 7 servers at varying
distances from the monitoring host. As explained in 5.4,
the 7 queried nodes are linked in a linear fashion. The network
delay for a node n is therefore the accumulated delay for
hopcount n-1 plus the delay incurred by traveling from node n-1 to node n.
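This delay recurrence can be written out directly. The sketch below assumes, purely for illustration, a constant per-link delay, under which the recurrence collapses to (n - 1) times the link delay.

```python
# The linear topology's delay model: delay(n) = delay(n-1) + link delay,
# with hopcount 1 executing on the monitoring host itself (no network).

def delay_to(hopcount, link_delay):
    if hopcount == 1:
        return 0.0
    return delay_to(hopcount - 1, link_delay) + link_delay

print(delay_to(1, 0.5))  # -> 0.0
print(delay_to(7, 0.5))  # -> 3.0
```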
While hopcounts 2 to 7 include
actual network links, the hopcount 1 case executes a request at the
monitoring client's node. Results for this case are different from
the others' because all processing takes place on the
same host, as we will see later on.
Figure 9.1: Low-level Requests Round trip Results
Although net-snmp supports SNMPv3, we used only version SNMPv2c
for our comparative tests. Version 3 only adds authentication and encryption
features not present in SNMPv2. Using these technologies has a severe impact
on performance. Since SNAP currently lacks such features comparing it with
version 3 would be inappropriate.
For clarity, the results in figure 9.1
are displayed in two separate
graphs. One figure displays the trend in performance
improvement when combining multiple similar requests, while
the other shows the differences in speed between
retrieving and setting values. All results were obtained by taking
the median of 101 test runs. For all tests,
the upper and lower quartiles were within 2.57% of the median.
9.3.1 Retrieving One or More Values
Retrieving values from a remote node is arguably the most frequently
used feature of SNMP. It is important, therefore, to be able to
execute these requests in particular at comparable speeds.
From figure
we can see that executing a
single request using Splash takes only fractionally longer than
executing it using SNMP. When sending the request to a remote
host the penalty of using Splash lies around 30% for the
worst case, i.e. when the network delay plays no significant role.
When network delay increases both platforms scale at what appears to be
the same rate. Therefore the comparative penalty drops with each
hop. After 7 hops the penalty is reduced to approximately 10%.
In any case, performance results for this important case fall
in the same order of magnitude for the two systems, although
SNMP still outperforms Splash by a percentage depending on the
network delay.
When executing a request at the local host the outcome changes
dramatically. In this case Splash actually outperforms SNMP.
In practice, this is not important, since querying of local
data can be handled more easily through normal shell tools or
the /proc file system. Nevertheless, from this observation
it must follow that SNMP consumes more computing resources: if all
external factors remain the same, the program that performs
more slowly under time-constrained circumstances must do so because it
consumes more resources. It is
uncertain whether this has to do with CPU time or I/O, but it
is clear that the heavyweight net-snmp daemon slows a system
down more than the lightweight SNAP interpreter, even
though both share the same back-end.
Both environments allow for the bundling of multiple similar
requests into a single packet, thus saving bandwidth by
combining the packet headers. SNMP does this by allowing multiple
OIDs with the same request type to be added to an initialized PDU.
By having procedural packets Splash can innately express any
number of combined requests, without the need for predefined
structures. The 5GET case makes it directly clear that the
specialized framework used by SNMP outperforms Splash on all levels.
Splash performs worse than SNMP even at the closest remote hop. This
could be expected, since the remote workload is higher for an AN
based system. What is striking, however, is the fact that the
two environments scale differently with hopcount, again at the
disadvantage of Splash. The underlying reason does not become
directly apparent. We would expect Splash to respond fractionally
slower on the remote host, but since intermediate handling remains
the same for all distances the function's angle cannot be explained
by differences in the methods of request handling. It appears that
an external factor is influencing the outcome of our tests.
In the following section the subprocesses will be discussed more
closely. To be able to explain the slope anomaly occurring
in the 5GET case we will briefly touch on one of the factors here.
Contrary to initial expectations, packet size, even in the limited
scope of our research, plays a significant role in network delay.
The specialized data structure used by SNMP (the PDU) enables the
system to minimize packet size. Splash, on the other hand, has to
send complete programs in bytecode format, resulting in a discrepancy
in packet size. For the specific 5GET request we executed, SNMP
packets were 101 bytes long on the way to the server and 267 on the
return trip. Splash packets weighed in at 444 and 652 bytes respectively.
A weighted ping test revealed that delays are indeed approximately
a factor of 2 longer for packets of size 545 than for packets of size
180. These packet sizes are estimates of the average sizes of the
round trip packets.
Acknowledging the influence packet size has on network delay and
the fact that Splash packets will always be larger than their SNMP
equivalents, we must conclude that Splash requests will always scale worse than
their SNMP counterparts. This is not obvious in the 1GET case, since
other factors are of more importance for such small packets. However,
the larger the number of additional instructions, the larger the impact of this
size related delay. Optimizing Splash packets for these instances can
reduce this weakness. We have not looked into the matter when obtaining
these results, however.
Network delay plays no significant role when processing requests locally.
As with the single GET case, Splash can therefore still outperform SNMP
in the 1 hop case.
9.3.2 Setting vs. Retrieving Values
While the retrieval of values from a remote host is executed most often,
the setting of variables also plays an important role in determining
comparative processing speed. Figure
depicts the tests concerning SET requests. We can immediately see
that altering variables is considerably slower than retrieving
them, in the current case approximately a factor of 3.
This holds for both environments.
Comparing SNMP with Splash in the SET case follows the same line of
reasoning as in the GET case. While Splash can outperform SNMP on
the local host it suffers a small penalty remotely. Both request
types scale with network delay, thereby decreasing Splash's disadvantage
when the distance increases.
Combining SET and GET requests presents a different picture altogether.
Since SNMP cannot combine SET and GET requests into a single PDU,
two separate packets have to be sent to the remote node. Using
parallelization it is possible to send these two simultaneously,
thus reverting the test case to that of the slowest subtask, in
this case the SET request. However, in the test case the
variable that is to be retrieved is the same variable that is being
set. This means that the two have to be carried out in consecutive
order. Splash can combine these requests in a single packet,
while SNMP will have to wait for the SET to finish before it can
issue the GET. Splash easily outperforms SNMP.
The fact that SNMP has to send two requests explains the different
slope of its curve: it grows twice as fast,
since it encounters twice as much delay. Splash GETSET, on the other hand,
scales with single network delay. More importantly, the combination
of GET and SET into a single packet improves considerably over sending
separate packets, regardless of the packet distance.
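The timing argument above can be captured in a toy model. The numbers and function names below are purely illustrative; the point is only that the SNMP total contains the round trip time twice, while the Splash total contains it once.

```python
# Toy model of the dependent GETSET case: SNMP must wait for the SET
# round trip before issuing the GET; a Splash packet does both in one.

def snmp_getset_time(t_rtt, t_set, t_get):
    # Two sequential round trips: the GET depends on the SET's result.
    return (t_rtt + t_set) + (t_rtt + t_get)

def splash_getset_time(t_rtt, t_set, t_get):
    # One combined packet executes both requests on the remote node.
    return t_rtt + t_set + t_get

print(snmp_getset_time(10.0, 3.0, 1.0))    # -> 24.0
print(splash_getset_time(10.0, 3.0, 1.0))  # -> 14.0
```

Differentiating with respect to the round trip time shows the SNMP curve growing twice as fast, as observed in the measurements.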
Even though the Splash packet is significantly larger than the SNMP
PDU, just as in the 5GET case, it can outperform SNMP in all instances.
The penalty of having to send two packets apparently
outweighs the size-based delay completely. This
is the most elementary example of how the flexibility of the
SNAP language allows Splash to outperform SNMP by combining multiple
interdependent requests into a single packet. The next chapter will
follow through on this line of thought, expanding the gap between
Splash and SNMP performance.
9.4 Subprocesses
The previous results do not reflect the performance of SNAP
and SNMP handling alone. As noted earlier, part of the overhead can be ascribed
to network transfer time.
Decomposing the overhead reveals that it is composed of
overhead due to `pure' SNMP and SNAP handling time, and overhead due to the
factors that are common to both implementations, i.e. network transfer,
internal processing in the MIB and data conversion.
An important subtask is remote MIB processing.
We noticed early in the experiments
that the internal processing time for a request
in the MIB is directly dependent on the variable that is being
requested. Response times for requests of identical type but for different
variables (e.g. GET sysName.0 and GET ifNumber.0) may vary
widely. The determining factor is the location of the variable.
Requesting kernel values is a more computationally intensive task than
copying in-memory variables. This effect does not show in our results
directly. Instead, we have tried to select requests with minimal internal
processing time by requesting values that reside in memory.
Naturally, the same values have been requested using SNAP and SNMP.
Figure shows us how the different subprocesses that come into play
affect the overall results. For this breakdown into subprocesses
we selected the 1GET request from the previous tests. Since it executes a minimal
SNMP request the relative overhead is maximized.
Contrary to the previous tests, these results were obtained from a single
test run.
We use them solely to discuss the comparative processing times of individual subtasks.
The request was sent to five servers at different distances from the client. From this we can
distill the scalability with regard to network delay.
Total processing time cannot be compared
with previous results because the system had to export additional debugging information to obtain
these results.
However, the relative results are still applicable to the previous runs.
From figure we can see
that the redirection and client steps include a large amount of idle time,
where the program is waiting for another task to complete.
For instance, the
waiting stage of the redirection server takes as long as the time within
which the packet is sent through the network combined with the time needed to
process the packet at the remote host. Similarly, the waiting stage of the client
is equal to the total processing time of the redirection server. Due to an overlap
in processing during the communication phases, the figures do not depict this behaviour
exactly. However, one can distill this fact when taking the timing overlap
into account.
Figure 9.2: Single GET Subprocess Results
Apparently, back-end MIB data retrieval plays only a minor
part in the entire process. Because back-end processing is especially limited for a
single in-memory GET request, the current
situation can be taken as a worst-case estimate of processing overhead.
From previous experiments we can deduce that the same holds for SNMP. In case of Splash
it is relatively easy to improve upon these results. For inter-server communication,
a local redirection server is used.
However, there is no technical reason for doing so. If we were to use Splash
solely as a remote interpreter we could do away with the redirection server, reverting
all processing back to the 1 hop case, where no intermediate server comes into play.
In section 4.4 we discussed why we chose the current solution,
which is less than optimal from a performance point of view. Combining multiple
SNMP requests and dealing with them on site, as we will do in the next chapter,
is another means of reducing the relative cost of the overhead.
9.4.2 Individual Tasks
Figure 9.2 displays a somewhat coarse-grained view of the subprocesses.
To better clarify the process we will briefly discuss each of the intermediate
steps.
Remote Server
In the general case, hops 2 to 5, processing on the remote server takes approximately 1 millisecond.
At most half of this time is used by the SNMP daemon. The other half is used for creating the
necessary structures, retrieving results and executing the packet's travel logic.
Naturally, the hopcount plays no role in this part of the request.
Redirection Server
All requests are sent through a local server. This redirection server
executes the special stages of the packet program used for initialization
and finalization. During the latter stage, it unpacks the return values and
hands them to the client through a local UDP connection. The rationale
behind this was discussed in section 4.4.
Figure
shows that this setup imposes a severe penalty on the processing speed. In
particular, postprocessing (unpacking and redirecting of the return values)
takes up a large amount of time. We can also see that the difference in
processing time between 2 and 5 hops is approximately equal to the
difference in ping times. This has been discussed previously, but again it
shows that total processing time scales linearly with network delay.
Client
The client, finally, adds another 100 ms to the final result.
This is mostly due to printing the performance statistics and the result to the
screen, a penalty that does not exist under normal operation.
Special Case: Local Processing
For hopcount 1 we sent the request to the local server.
Statistics for this case are quite different from
the general case. No redirection is necessary, which decreases response time.
However, remote computation increases since postprocessing is now being executed
on the remote server.
9.5 Performance Overview
A number of conclusions can be drawn from the performance tests.
Advantageous to our case is the observation that
SNMP appears to consume more processing power than Splash. Also,
SNMP suffers from much longer preprocessing overhead.
Putting Splash at a disadvantage, SNMP still outperforms Splash by
a percentage in the general case.
Compared to previous results obtained with AN systems,
where performance was several factors or even orders of magnitude worse than the
reference system, this relatively small penalty is a great improvement.
Another drawback is the increased packet size inherent in using
programmable packets, and especially the effect this has on network delay.
Under identical network delay Splash will always perform marginally slower
than SNMP, but since the significance of remote host processing diminishes
with growing network delay, a small extra overhead on the remote host matters
less as intermediate delays grow.
However, since the system does in fact scale worse, this may not hold:
in the experiments, comparative processing speed sometimes
actually dropped when network distance increased.
Since we have not tried to optimize packet size we expect that improvements
can be made in this area.
From examining the subprocesses it becomes clear that time spent on actually
retrieving the single SNMP value from the MIB can be small compared to the overall
process. It is essential to decrease the relative
overhead if we want to increase Splash's effectiveness.
We will do so in the next chapter by aggregating response data and
reducing network travel.
The most important outcome of these tests is that Splash can execute those requests
to which SNMP is especially tailored with only a minor performance penalty.
As the SNAP and SNMP results are in the same order of magnitude, Splash could
conceivably be used as a replacement for SNMP. When optimizing the codebase, as
has probably been done in the long lifespan of net-snmp, even better results are
surely obtainable. As we stated earlier, however, there
is no reason not to use the SNMP "half" of Splash.
The results here merely show that SNAP is not far behind
SNMP for any type of request, even those in which SNMP obviously excels.
Following on these experiments, the next chapter gives a functional comparison
of the two platforms as discussed in the functionality
test design of section 5.2.2.
Chapter 10 Experiments: Functionality
Most of SNMP's drawbacks have to do with the
rigid structure it imposes on the network topology and the
inefficiency in data transfer and system response time
this brings with it. Discussed here are
several techniques that can reduce this
overhead. For each of these we will present an implementation
in Splash.
10.1 Introduction
In the previous chapter it was shown that Splash performance
is near that of SNMP, but not completely on par. However, the tests
run to obtain these results were tailored especially toward SNMP,
consisting solely of the low-level atomic operations in which it
excels.
While SNMP may outperform more flexible solutions in quick retrieval
requests, the limitations of the framework quickly become apparent
when implementing more elaborate NM solutions. In
section 5.2.2 NM tasks have been
presented that cannot
be handled practically by issuing these atomic requests. The list is by no
means exhaustive, but it contains enough everyday use-cases that are
difficult to carry out using SNMP.
For the presented use-cases we will disregard the issue of performance
altogether. The reason for this is simple: using SNMP, it is
possible to retrieve large amounts of data in parallel, execute
the necessary computations on the monitoring host and update the
necessary values in a single step. By utilizing large-scale
parallelization, the time the entire process takes is reduced to that of
retrieving and subsequently setting the slowest value. Nevertheless, the
amount of data that has to be transported may be of such proportions that
this is deemed an inapplicable solution to many problems.
Although it is hard to prove such a
claim, we suspect that SNMP's excessive network bandwidth utilization
in high-level tasks limits the exploration of practical
network management solutions. With the anticipated growth of network-enabled
appliances the need for more flexible and resource friendly management tools
shall almost certainly increase in the coming years. The
introduction of new dynamic networking paradigms (e.g. ad hoc
networking) especially calls for more intelligent communication [].
The techniques that have been selected can significantly reduce network traffic
and improve network-wide system responsiveness. For each of the test cases an
example has been implemented in Splash. To save space
we refrain from presenting the complete programs here. Instead, we
show instructive pseudo-code snippets. The complete programs, together with other
prototype packets and background information can be found on the Splash
website at http://splash-snap.sourceforge.net/papers/dsom2003/.
A detailed description of SNAP's semantics can be found
in [] and on the project website [].
10.2 Serverside Processing
Reducing data transfer and response time can be accomplished by removing
superfluous communication between the nodes. In the
first set of use-cases we will only look at single client to server communication.
10.2.1 Serverside Data Aggregation
In [], a SNAP-based program is introduced that travels to
a predefined list of hosts. This application, called
a surveyor, has the advantage over simple polling that it minimizes the
distance traveled through the network and thereby reduces overall management
traffic. Figure shows an example of a simplified
surveyor packet created for visiting only a single host. A more general solution
called a list-based surveyor is presented in figure .
1010
; continue to dest
forw
; last host on list?
bne atdest
; operational code
push "ifInErrors.2"
snmp_getsingle
push "ifInErrors.3"
snmp_getsingle
add
; move to next host
pull 1
forwto
; return data
atdest:
demux
Figure 10.1: A simple surveyor packet
1010
; test against hard coded threshold value
gti 30000
bez normalrun-pc
push 0
calls "proc_setipforward"
normalrun:
; continue normal processing here
Figure 10.2: Serverside reacting
|
The simple surveyor program travels to a remote host, executes a number of
instructions and returns home. Figure 7.1 shows the
same packet tailored to a single SNMP get request. One of the advantages
of the programmability of AN packets is that it is possible to
retrieve a number of requests in this fashion, compute a derived result based
on those intermediate values
and then return to the host with only the derived value.
Figure 10.1 gives an example of this functionality by
computing the aggregate of the incoming error counts (ifInErrors) of two interfaces. This
specific example is kept very straightforward for the sake of clarity.
The application of this technique can, of course, be generalized to situations
where more elaborate calculations are
needed to obtain useful statistical data. With the basic arithmetic in place
any number of calculations can be carried out in this fashion using Splash.
Considering that many monitored statistics are in fact derived values,
aggregating data on the server can limit occupied
network bandwidth greatly. Splash packets are larger than their
SNMP counterparts, as could be seen in the 5GET request discussed
in 9.3. By bundling data results prior
to transmitting them this problem can be overcome.
Theoretically, the bandwidth reduction obtainable by serverside data aggregation
is, for m raw input values and a single requested derived value,
100 - (100/m)%. If we define the packet size in bytes as P(packettype),
choosing
Splash over SNMP can be considered a viable option when
m - 1 >= P(Splash) - P(SNMP).
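The reduction formula can be written out directly. The sketch below only restates the 100 - 100/m relation from the text; the function name is invented for illustration.

```python
# Serverside aggregation saving: m raw values reduced to one derived
# value keeps 1/m of the payload, saving 100 - 100/m percent of it.

def aggregation_saving_percent(m):
    return 100.0 - 100.0 / m

print(aggregation_saving_percent(2))   # -> 50.0
print(aggregation_saving_percent(10))  # -> 90.0
```

The saving thus grows quickly for small m and approaches 100% as more raw values are folded into a single derived result.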
10.2.2 Serverside Reacting
In the previous example we exploited only Splash's ability to compute
derived results and initiate a return trip to the monitoring client. For
use-cases in which reactions are predefined,
returning home before reacting is unnecessary. Especially in situations where
a speedy reaction is vital, we would much rather take immediate action on
the spot.
1010
; calculate new value
push "ifInUcastPkts.2"
calls "snmp_getsingle"
push "sysUpTime.0"
calls "snmp_getsingle"
div
; get previous value
push "intrafficdx"
calls "memmap_getint"
; push new value
push "intrafficdx"
pull 1
calls "memmap_addint"
; compare new and old values
multi 2
gt
bez isok-pc
push "error"
Figure 10.3: Serverside Trend Analysis
1010
; forward to destination
dforw
pullstack
bez atsource-pc
; execute requests here
; forward to next dest
pullstack
dsend
exit
; return data
atsource:
push 7777
demux
#data 1
#data 10.0.0.34
#data 1
#data 127.0.0.1
#data 0
#data 0
#data 0
Figure 10.4: A list based surveyor packet
Building on serverside data aggregation, we will now extend the
simple surveyor packet to perform direct actions based on
the computed values. We added access to the underlying operating system through
back-end services that carry out basic network maintenance tasks.
Examples are functions to set interface status (UP or DOWN),
forwarding rules (TRUE or FALSE), /proc file system values
and route table entries.
As net-snmp does not directly
support this, the corresponding functions were encapsulated in SNAP
services. These functions can be called from the surveyor packets,
for instance to shut down
interfaces on nodes from which too much traffic originates or to disable
forwarding in nodes where error rates are unacceptably high.
A common test case is the following: network throughput
is considered too high if a predefined threshold is reached. In the
surveyor packet we can test against this threshold and execute a special
code section. The snippet in the figure shows
this example of serverside reacting: we disable
IP forwarding if the total amount of incoming traffic exceeds a predefined
threshold.
Serverside reacting can play an important role in increasing system
responsiveness. By taking action on location we remove a complete
round trip through the network, which can take a disproportionate amount
of time, as became clear from the performance tests. Serverside reacting
pays off especially when traveling through the network is difficult due to
an already high network load. Automating
tasks in this manner is only possible, however, if actions are known a priori.
The savings related to employing this technique can be
written down as a saving of one return trip time t_net per action test.
Defining the remote calculation time as t_calc and
the client side calculation time as t_local then gives the
following threshold test for using Splash: t_net + t_local > t_calc.
From the performance results we can see that network traversal,
including processing, can take up a disproportionate amount of time,
both for Splash and for SNMP. Therefore remote calculation can be preferable
under many instances. It will be most advantageous in complex decision
making situations, however, since each action test adds a new
saving in time.
Each remote action test also removes a
return trip to the monitoring client. Network bandwidth
consumption is therefore reduced by 2 * P * i, where P is the packet size
and i is the number of action test branches resulting in serverside reaction.
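The two thresholds above can be captured directly in code. The following Python sketch uses hypothetical timing and size values; in practice t_net, t_local and t_calc would have to be measured per deployment.

```python
def remote_action_pays_off(t_net: float, t_local: float, t_calc: float) -> bool:
    # Serverside reacting wins when the saved return trip plus the
    # client side calculation outweigh the remote calculation time.
    return t_net + t_local > t_calc

def bandwidth_saved(p: int, i: int) -> int:
    # Each of the i remotely handled action tests removes one round
    # trip of two packets of size p bytes.
    return 2 * p * i
```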
10.2.3 Serverside Trend Analysis
An often recurring NM task is trend analysis.
RMON was created as an extension to SNMP for this purpose.
We will now display an element of Splash that mimics such behaviour by
being able to store historical data on location.
For this next example we added a data dictionary service to
demonstrate the application of serverside trend analysis. The data dictionary
consists of simple set, get and delete instructions for manipulating
named variables. Keeping track of previously calculated information
on the remote nodes reduces the
need for copying intermediate statistics to the monitoring agent.
Consequently, it makes calculating derived results based on time possible
on location.
Figure 10.3 displays the code for comparing
an error rate over time: a straightforward trend analysis example.
The depicted snippet could be inserted in any one
of the packets discussed earlier.
Savings are in this case twofold: network savings scale according to the
serverside data aggregation example, while response time behaves identically to the
serverside action taking example.
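The logic of figure 10.3 can be paraphrased in Python as follows. This is a sketch: the counter values and dictionary contents are made up, and the two helpers merely stand in for the snmp_getsingle and memmap services named in the figure.

```python
counters = {"ifInUcastPkts.2": 5000, "sysUpTime.0": 100}  # fake SNMP data
dictionary = {"intrafficdx": 40.0}  # data dictionary holding a stored rate

def snmp_getsingle(oid: str) -> float:
    # Stand-in for the snmp_getsingle service.
    return counters[oid]

def trend_check() -> str:
    # calculate the new rate: unicast packets per uptime tick
    new = snmp_getsingle("ifInUcastPkts.2") / snmp_getsingle("sysUpTime.0")
    old = dictionary.get("intrafficdx", 0.0)  # get the previous value
    dictionary["intrafficdx"] = new           # store the new value
    return "error" if new > old else "ok"     # compare new and old values
```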
10.3 Network Wide Processing
The previous examples showed some advantages of choosing an alternative over SNMP.
Nevertheless, the displayed tasks could just as well be carried out using on-demand
code loading or mobile code environments.
Active networks have the added ability to
traverse a network autonomously. The following techniques exploit this
capability of AN based systems to make decisions on location,
functionality that sets them apart from the other environments.
10.3.1 Network wide Data Aggregation
Statistical data frequently consists of raw values that are
scattered throughout the network. In this case serverside processing
will not reduce traffic significantly, since each server has to be addressed
individually. With network wide aggregation, however, distributed derived values
can be compared with a threshold without
having to access all hosts: comparing a global value
against a threshold can stop as soon as the threshold is reached. The
benefit of this technique is most apparent when the threshold is easily
reached, for instance when a single boolean value has to be tested
on each machine.
In this case a hop-by-hop approach, where totals are computed from the
totals over the visited nodes, is preferable to a centralized approach.
Again, let us exemplify this idea by altering the simple surveyor packet.
The original packet searches only for aggregates. By using the list surveyor,
depicted in figure 10.4,
we can compute an aggregate over the visited hops. Testing the aggregate
against a threshold at each hop quickly reveals a possible error condition.
Not only is network bandwidth minimized, the problem itself is also localized,
since we know at least one section of the network that triggers a response.
If necessary, immediate action can be performed on location.
Retrieving a value from n hosts will always take 2n single
network traversals (n round trips) using simple polling.
The number of messages sent drops to at most n + 1 when
using hop-by-hop traversal. Implementing action tests along the way can further
reduce the data bandwidth, depending on the chance of triggering a global action
test. Bandwidth savings therefore lie between
(100 - 100(n + 1)/(2n))% and (100 - 100/(2n))%.
Response times may increase when using hop-by-hop traversal, since
no parallelization is possible. Whether employing this technique is sensible
in time constrained situations depends on two factors: (1) whether
remote reaction is expected and (2) network delay. In NM situations
the monitoring client is often relatively far away from the
remote nodes, while the nodes themselves are heavily interconnected;
this holds, for instance, when
scanning workstation subnets. If the link between the MC and the
nodes incurs a delay of t_main, while the node links incur, on average,
a delay of t_inter, response time will not decrease until
t_main * 2 * n > t_main * 2 + t_inter * (n - 1) for a network of n remote nodes.
Applicability therefore depends on the number of participating nodes and
the relative cost of network traversal.
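The message counts and bounds above can be made concrete with a small Python sketch. The functions simply restate the formulas; the timing values in the test are illustrative.

```python
def polling_msgs(n: int) -> int:
    # Simple polling: n round trips of two single traversals each.
    return 2 * n

def hop_by_hop_msgs(n: int) -> int:
    # Hop-by-hop: visit every node once, then one message home.
    return n + 1

def saving_bounds_pct(n: int):
    # Lower bound: all n + 1 hop-by-hop messages are sent.
    low = 100 - 100 * (n + 1) / (2 * n)
    # Upper bound: an action test fires at the very first hop.
    high = 100 - 100 / (2 * n)
    return low, high

def hop_by_hop_faster(t_main: float, t_inter: float, n: int) -> bool:
    # Sequential traversal only beats n parallel round trips when
    # the MC link delay dominates the inter-node delay.
    return t_main * 2 * n > t_main * 2 + t_inter * (n - 1)
```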
10.3.2 Network wide Reacting
One of the problems considered hard to perform using SNMP is
that of resolving distributed problems. In this section, we consider the
example of a Distributed Denial of Services (DDoS) attack. Discovering the
originating network nodes and taking action on these nodes is essential to
stop the network from becoming flooded. Using SNMP, the only way to find out
where a problem occurs is by sending messages to a large number (possibly
all) of the nodes in the network.
This increases the load considerably in an
already overloaded network and, as a side-effect, transmits a lot of data,
much of which is useless.
Instead, using Splash, we altered the original surveyor program to react
locally when a problem is spotted. At each hop, the network load is compared
to a predefined threshold value. If this value is exceeded, normal execution
is halted. The packet requests the IP addresses of all neighbouring nodes
and forwards itself to these nodes. It then immediately returns to the
management station where it delivers an error message. The resulting
program, shown in figure 10.5, is inserted directly after
the operational code of figure 10.1. Since SNAP does
not allow unlimited loops, we have to resend a packet to the current host
to execute the special case instruction for an a priori unknown
number of neighbouring hosts.
By using this algorithm the DDoS test is recursively copied throughout the
errorzone and dies out only at nodes that operate under normal load.
The management station therefore only receives error reports from those
areas that need extra attention. Optionally, error reporting could also
be replaced with serverside reacting to even further alleviate network stress.
Though successful in specific situations, this technique can incur even more
bandwidth utilization than simple polling if used incorrectly. The crux
lies in the use of a so called flood-fill send. For a network of
n remote nodes, having on average s network connections each, n * s packets
will be sent in the worst case. This is more than simple polling requires when s > 2.
However, since the process starts only when a possible error is detected
and spreads only in the errorzone plus one additional hop, savings can occur in practice.
For an errorzone of r neighbouring nodes, r * s packets are sent
inside the network, plus an additional 1 + r between the MC and the
network if serverside reaction is not used. In the worst case this results
in a total of r * (s + 1) + 1 messages, which is more efficient than n * s only
if r + r/s + 1/s < n. Depending on the number of interconnections
this threshold will lie somewhere between r = (n - 1)/2 for s = 1 and r = n
for s = ∞. Naturally, the upper boundary is always satisfied; we can
therefore say that expected bandwidth savings increase with the
interconnectedness of the network and decrease with the size of the errorzone.
The size of a possible errorzone depends on the topology of the network
and the number of external connections. Finding a clear threshold is therefore
not possible in the general case. As a heuristic we can say that
in sparsely connected networks other solutions should probably be considered.
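The flood-fill cost model can be sketched in Python; the second function is algebraically equivalent to comparing r * (s + 1) + 1 against n * s, as a quick check of the threshold derivation.

```python
def flood_fill_msgs(r: int, s: int) -> int:
    # Worst case: every errorzone node resends to its s neighbours
    # (r * s inside the network) plus 1 + r between MC and network.
    return r * (s + 1) + 1

def flood_fill_wins(r: int, s: int, n: int) -> bool:
    # Equivalent to r * (s + 1) + 1 < n * s after dividing through by s.
    return r + r / s + 1 / s < n
```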
Response time savings for this technique depend on the type of action
that should be undertaken. For a basic action, e.g. shutting down an
interface, response time will decrease by a single return trip for each
affected machine. This pays off particularly in circumstances where the
network load is already extremely high, as is the case with DDoS attacks.
The response time calculation for the previous example still holds,
with the additional
factor that delays will be greater than normal in these circumstances.
Secondly, using serverside reacting, each affected node can be repaired
before communication takes place, thereby reducing overall bandwidth
utilization incrementally. This bonus will recursively copy through the
network, possibly speeding up the entire recovery process exponentially.
The precise speed-up depends on the order in which connections are overloaded.
A linear chain of connections will necessarily become available at a linear
rate; this can be considered a worst case situation, however.
10.4 Self organizing Networks
Computing totals and executing existing functions on location have direct
applications in the current network management domain. Nevertheless,
these applications can hardly be called revolutionary. With the ever increasing
abundance of networked devices and the correspondingly increasing cost of
maintaining these devices, research is currently underway to automate
NM tasks as much as possible. Automated networks are popularly referred to
as self-organizing networks. The following examples serve as a primer
on how active networks can assist in exploring the new NM algorithms
needed for automating much of the network administrator's tasks.
The next examples make more use of the remote interpreters' functionality,
but employ no bandwidth or response time saving techniques beyond the previous ones.
Savings estimates will therefore not be given.
10.4.1 Autonomous Network Roaming
We now present a more intelligent descendant of the surveyor family of
network programs: the autonomous surveyor.
The autonomous surveyor can travel through a network without need for
prior knowledge of the underlying topology. Selecting a next hop
based on local data is in essence a simple extension of the serverside
reacting case. However, it allows for new classes of algorithms. The
distributed surveyor already showed how locality can be exploited to
reduce global problem solving complexity. Autonomous
processing generalizes on this idea.
1010
; threshold test
gti 1000
bez normalrun-pc
; send to neighbour
specialcase:
getdst
calls "if_getallneighbours"
push 0
send
; resend to this host (loop)
push specialcase
getdst
forwto
Figure 10.5: Distributed processing
1010
forw
; go home if resources
; are used up
getrb
lti 2
bne gohome-pc
; get and goto a next host
getdst
calls "if_getnexthop"
forwto
; go to client
gohome:
push athome
getsrc
send
; return info
athome:
demux
Figure 10.6: An autonomous surveyor packet
The example program shown in figure 10.6 uses
autonomous processing to select a next hop based on local data. Sent
into the network without knowledge of the topology, the autonomous
surveyor has to design its own route through the network.
The algorithm used for selecting the next destination directly impacts which
nodes can be accessed. We chose a simple heuristic: select as the outgoing interface
the entry listed directly after the incoming interface in the iFace table.
A round robin scheme connects the list's outer elements. This
only allows us to traverse the outer edges of a densely connected network;
in the honeycomb-like structure we selected for testing, a number of nodes
will therefore not be traversed. We will provide a more robust solution
in the next example.
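The heuristic itself fits in a few lines of Python. The interface names are illustrative; in Splash the selection is performed by the if_getnexthop service against the node's interface table.

```python
def next_interface(if_table: list, incoming: str) -> str:
    # Select the entry listed directly after the incoming interface,
    # wrapping around (round robin) at the end of the table.
    i = if_table.index(incoming)
    return if_table[(i + 1) % len(if_table)]
```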
By placing the interface selection algorithm in a service the resulting SNAP
packet stays very simple, as shown in figure 10.6.
The most apparent use of an autonomous agent is network discovery. This
in itself has a number of applications, mainly in highly dynamic and self-reorganizing
networks, such as ad hoc and peer-to-peer nets.
Hop-by-hop destination selection has yet another practical benefit over polling.
Because this application selects its next destination from the set of
neighbouring nodes, disabling forwarding has no impact on its ability to
move through the network. Similarly, as long as we do not disable all
interfaces, access to a node is retained, even when the routing tables do
not reveal this. In such situations hop-by-hop traversal is
necessary to fix problems that cannot be solved with SNMP.
10.4.2 Stateful Network
So far, we have discussed approaches to network management that act directly
on the available data. When no additional factors influence the execution of a
program, we call the program stateless, i.e. it is not dependent on an internal state.
The applications discussed are not completely stateless, as execution depends
on external data, namely SNMP variables, and on the packet's internal stack.
However, so far we have not used these values to actively guide a program's
execution beyond taking immediate counteractions based on a threshold value.
A stateless environment has inherent limitations.
One shortcoming became apparent
in the previous example. Since the autonomous surveyor had no knowledge of its
current location, it used a simple heuristic to select the next hop. The result
was that the surveyor could not reach some of the internal nodes of our test network.
Guaranteeing access to all nodes can be accomplished by tracking previous
behaviour. A single variable containing the last exited interface on a node
can be used to select another outgoing interface each time we visit
this node. Using the internal stack of a program to remember all visited hops is
practically impossible due to the maximum size of the packet and the processing
overhead it would entail. More importantly, each time a packet
is destroyed the state of the network is wiped completely.
Instead, we will use the previously mentioned data dictionary to
add state to the network.
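A minimal sketch of this idea, assuming a per-node data dictionary and illustrative interface names (the key name "last_if" is hypothetical):

```python
def next_interface_stateful(if_table: list, dictionary: dict) -> str:
    # Read the last exited interface from the node's data dictionary
    # and leave through the next one, so repeated visits cycle through
    # all interfaces instead of always taking the same exit.
    last = dictionary.get("last_if", -1)
    choice = (last + 1) % len(if_table)
    dictionary["last_if"] = choice
    return if_table[choice]
```

Because the state lives on the node rather than in the packet, it survives the packet's destruction, which is exactly what the stack-based approach cannot offer.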
To demonstrate the advantage of using a stateful network we added new
NM functionality to our SNMP interface. The augmented library can access
a remote SNMP daemon as well as the local one. In accordance with the
interoperability argument, it would be useful if not all nodes in a network
needed local Splash daemons. By using remote SNMP access, Splash can serve
as a proxy server for devices that for some reason are unable to run Splash
themselves. However, to do so one needs to know the location of these
systems.
This example uses the previously mentioned techniques to create a
localized display of the NM topology. A discovery packet based on the
distributed surveyor searches for all Splash-enabled neighbouring nodes.
Before jumping to a new node it writes its
current knowledge of the network to the local data dictionary.
When a node has no Splash daemon running, the packet gets lost and cannot
update the dictionary. Otherwise, the intermediate results are overwritten
by the packet on its return trip. A lingering intermediate result in a
dictionary will therefore refer to a
system running either stand-alone SNMP or no NM tools whatsoever.
A second packet can distinguish between the two by sending a test SNMP request to
the server.
After the discovery phase, an autonomous agent can be sent to retrieve data
from any node N in the network.
Traveling through the network based on information found in the dictionaries
along the way, the packet knows which nodes are running Splash. When the agent
lands on the node closest to N that runs Splash, it sends the SNMP
request and continues its internal program as if it were being executed on
node N itself.
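The proxy selection can be sketched as follows. This is a simplification: the node names and the precomputed route list are made up, and in Splash the agent would instead consult each hop's data dictionary as it travels.

```python
def nearest_splash_proxy(route: list, splash_nodes: set):
    # Follow the route toward the target and remember the last
    # consecutive hop that still runs Splash; that node issues the
    # SNMP request on the target's behalf.
    proxy = None
    for hop in route:
        if hop in splash_nodes:
            proxy = hop
        else:
            break  # beyond here lie SNMP-only or unmanaged systems
    return proxy
```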
One can think of more advanced scenarios in which this two-tier topology can be
used to bridge the functionality gap between flexible active
networks and inflexible existing infrastructure.
In any case, adding
state to the network can increase the option space of a roaming application.
Since SNMP relies on a pre-configured
network, it is not suited to highly dynamic networks. The previous example
has shown that active networks can be applied naturally to these
environments and that an implementation can be created in Splash.
At this point we should make clear that Splash was not originally devised
to incorporate inter agent communication. The implemented data dictionary
can be used as a method of indirect communication, but other agent
platforms may include more graceful solutions.
10.5 Functionality Overview
Comparing the functionality of two systems is harder than comparing performance
due to the lack of hard quantifiable data. However, we have supplied a number of
use-cases where SNMP usage either entails too much overhead or cannot be applied
at all. Table 10.1 gives an overview of the discussed techniques
and the improvements each one brings over simple polling. We have also demonstrated that
these techniques can be implemented with Splash. Considering
that Splash can also carry out all of SNMP's functions aside from traps, we state
that it is superior in terms of functionality. Adding traps to the system
has not been undertaken, but it should be apparent that doing so brings
no new technical hurdles. The lack of traps is therefore, in our view,
not a convincing counterargument to this claim.
serverside data aggregation | reduces network bandwidth utilization
serverside reacting | reduces response time
serverside trend analysis | improves on both metrics
network wide data aggregation | reduces network bandwidth utilization
network wide reacting | reduces response time
autonomous network roaming | expands the application space
stateful network | expands the application space
Table 10.1: Overview of functional techniques
Each of the techniques mentioned offers an independent advantage for which we
tried to select an insightful example. The presented use-cases are kept
brief for the sake of clarity.
Many of them can also be expressed in other networked environments, such as
mobile code systems. The real power lies in combining
the extended functionality with the flexibility
sported by active network environments. AN based systems allow the user
to combine useful techniques into specialized problem solvers on a case by case
basis.
One
should note that the algorithms discussed here do not necessarily exhaust the
option space given by our system. Other methods for improving network
management currently implemented in legacy applications, for instance those
discussed in [] and [], can also be
ported to Splash. The obvious benefit is selection freedom:
an administrator can
trade-off functionality against performance for each individual use-case,
scaling from simple polling to the latest advances in network management
research. As such, AN systems can, contrary to SNMP, be
used to experiment with new algorithms. Furthermore,
it is unnecessary to add new software packages for each new use-case.
These tests conclude the experimental part of our research. What remains is
to point to issues that have not been dealt with and draw final conclusions.
In the
following chapter we will pinpoint shortcomings of Splash's current implementation
and introduce follow-on research projects.
Part 5 Inference
Splash was designed first and foremost as an experimental
testing platform. In its current state the system can
handle all requests described in this document and possibly
many more. Splash can therefore serve as a
platform for various interesting academic pursuits.
In this chapter we will discuss a number of issues that
might make up interesting follow-up research and
software development projects.
11.1 Research
In our research we have mainly compared performance
figures between SNMP and Splash. Building on previous
related work, we have tried to lay a foundation for a
new round of research into active networking by showing
that high performance processing is possible using AN technology.
Considering the current status of the Splash platform
and the directions in which academic research is heading,
we will now suggest research topics in two fields
that we believe can benefit from active networking.
Two further pointers are
given concerning research into active networking itself,
naturally using Splash as a starting point.
11.1.1 Zero Configuration Network Management
Advances in network management technology have lagged behind
primarily because of the undisputed position of SNMP. In the
last few years various research projects, including this one,
have shown experimental results indicating that the main
reason for using SNMP, its unparalleled performance, no longer
holds.
At the same time the research community has broken new
ground in network technology by shifting its attention from
the well known static networks to more dynamic systems. Ad-hoc,
mesh and peer-to-peer networks are but a subset of the new
networking paradigms under investigation. These fluid networks
demand features from the management infrastructure not envisioned
years ago and therefore not catered for in today's NM systems.
One of the more promising directions in network management
is called
zero configuration. Suited especially to environments
where connections are highly volatile, zeroconf tools handle
network management tasks without human intervention. Research
into this field is also followed closely by traditional players
in the field because of the huge savings in personnel
costs it can bring to an organization. Zero configuration
IP networking is being investigated by the IETF zeroconf working
group [,]. A notable example of
zero configuration in practice is Apple's Rendezvous [].
A more general framework for on demand networking, including
automation tools for higher level management tasks such as resource handling
and negotiation,
is the Open Grid Services
Architecture [].
We've tried to cater for
expansion into this field. The back-end functionality
is already largely in place. Splash's access to
the complete OS subsystem surpasses the functionality of
traditional SNMP. The service infrastructure allows rapid
inclusion of additional required tool sets. Finally, Splash
is in the unique position of being able to work on different
levels of the network, bypassing traditional tools such as
the route table if necessary.
Research into zero configuration using Splash will deal primarily
with writing the
packets that carry out resource negotiation and
network discovery, not with modifying the underlying software
platform. All in all it should be possible to show substantial
improvements in this field in a relatively short timespan
using Splash. The fact that zero configuration encompasses
many small issues makes it a candidate for work ranging from
a few weeks up to at least a full semester.
11.1.2 Technology Integration
Another subject that might be of interest is the adaptation of
Splash to other domains. We have thus far only looked at the field
of network management. Other AN research groups have taken up
specialized fields such as realtime multimedia delivery.
With the help of, among others, SNAP it is now safe to say
that performance is not an obstacle to everyday active network
deployment. It is therefore possible to start exploring uses
of AN systems outside the narrow scope they've been confined
to so far.
Distributed applications are currently being developed using
a so called web services infrastructure. The
open grid service architecture [] has been defined to allow
for standardized web services in the near future.
Web services can be seen as a special kind of remote execution.
Keeping in line with the end-to-end argument it is perfectly
understandable that the accent in this field lies on
remote execution at the end nodes. However, active networks
can be used to ease the development of such initiatives by
allowing greater flexibility in the underlying environment.
We believe Splash makes a great candidate for this kind
of task by virtue of its extensibility. It already allows
for on demand loading and execution of legacy code, runtime
alteration of the environment and migration of agents.
Missing features are mostly practical in nature: services
or instructions should be added to actually download
and execute legacy code. Once this is added it should not
be hard to show that Splash can underpin a grid
architecture. The next step would be to show the advantages of using
an active network as opposed to passive IP and
custom built tools to handle this functionality.
Naturally, instead of embedding Splash in a multi-tier
grid environment, one could also search for applications
that can directly be added to the Splash service architecture.
The workload for this type of research therefore cannot
be estimated at this point.
11.1.3 Flexible Agents
The design of SNAP packets allows them to be handled very
quickly. However, it also holds back the deployment of Splash
in numerous situations. While we do not have the intention of
overthrowing the current Splash implementation, we
observe that the platform would benefit greatly from
a more flexible agent infrastructure.
Naturally, any extension to the core SNAP-ee should at all
times remain backward compatible to stay relevant in the field
of high performance active networking. Many flexible, yet
relatively slow, alternatives already exist. Merging the
advantages of both kinds of system could, however, make an
interesting research project.
This is not the right place to go into details on how
such a task should
be accomplished, but we will give a few pointers. During our
research we observed that task expressibility in Splash was limited
by the lack of agent cooperation. Missing
language constructs are another problem that frustrates the
programming of Splash. To overcome these issues we suggest
the inclusion of various features that can be considered
perilous with respect to resource consumption. Inter agent communication
and extended language constructs could be governed similarly
to network utilization. An interesting solution
to this problem has been implemented by Kind et al [].
We believe these two examples are
merely a subset of the useful constructs already employed in
higher level mobile agent approaches. Taken from another,
yet related, field is the concept of remote code loading.
Allowing the execution of machine dependent platform code
through the use of on demand services can speed up processing
and open up a vast array of useful tools to Splash developers.
SNAP has found a niche in the AN world by providing high
performance and secure operation. Yet, completely locking out
certain advanced features has limited its scope somewhat.
There are good indications that at least a number of
these features can be added to the system without
sidestepping the original goals. The precise how and what
could make up an interesting research project, consisting of a good
deal of reading into advanced mobile agent approaches combined
with the implementation of an, at least partly, governed version
of these constructs. No such mix of robust, fast and high level
constructs currently exists. Research into this field
will probably result in a truly original contribution
to the scientific community. One should therefore also
expect this to take up at least a few months of work,
possibly up to a full semester.
The preceding examples hopefully showed some interesting new
directions for Splash. During their discussion we conveniently
disregarded security issues. However, security is a major concern
for a networked platform and Splash currently contains no
such features. It can be expected that authentication and
encryption are features that will be demanded for everyday
deployment of active networks. The SafetyNet []
initiative was started as an acknowledgment of this fact.
For Splash or a production ready
relative to be accepted as a viable solution to
networking problems it is imperative that the security issue is
dealt with in a suitable way. Especially the emphasis on
high performance calls for an implementation that differs
from earlier attempts at securing AN platforms.
This study is largely open ended. Powerful cryptography
must necessarily be a part of any security system. Some kind
of scaling from unsafe yet fast to secure and relatively slow
execution will be necessary to stay true to Splash's original
intent. For a single person we expect this work to
take at least three months. However, the workload can be distributed
among a number of assignments, both academic and engineering. For
instance, implementing a security infrastructure while disregarding
its implications for performance shouldn't take more than a few weeks.
11.2 Software Platform
While Splash is mainly a research platform, there are a number of
practical issues, ranging from sloppy coding to missing features,
that could make up a nice software project. Setting aside the
point of originality, the following proposals deal mostly with
polishing the available platform. The primary goal here is to
strengthen Splash's position as a research and development platform.
Performance issues were of primary importance in the development of
both SNAP and Splash. Nevertheless, further optimizations are most
definitely within reach. The current installment of Splash has been
optimized locally where strictly necessary.
A more general solution can be found that
removes certain subtasks altogether. Upon close inspection one
can see that the current code base is far from lean. For example,
data conversion needed between various subtasks reduces
overall processing speed considerably. Creating a global data handling
standard would be beneficial on its own. Other problem areas include
startup time due to service loading and client communication waiting times.
As discussed in section 8.2.3,
the inter server and client communication
protocols overlap in theory. Yet in the current implementation the two are strictly
separated. One programming task would therefore be to merge the two and possibly
place them in a separate library decoupled from the daemon code. A preliminary
implementation has been made for the more elementary client communication.
The new infrastructure, SNAP_demux_handler, separates the Splash
communication layer from the underlying transport protocol and can currently handle raw IP,
UDP and UNIX pipe protocols. Extending it to encompass TCP, IPv6 or lower level
protocols shouldn't pose any problems.
Apart from extending the handler's protocols, we would also want the handler
library to serve as a basis for inter-server communication. This can be accomplished
easily by moving the marshalling and unmarshalling functions, which are extremely
simple, into the library. Furthermore, the separate recv(..) loops
that currently exist for inter-server, client-to-server and server-to-client communication
should be merged as well. The networking code is the most duplicated code in the existing distribution
and should therefore, for maintainability, be rewritten as soon as possible.
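The kind of "extremely simple" (un)marshalling meant here could, for instance, be a length-prefixed string encoding. This is an illustrative sketch, not Splash's actual wire format, and the function names are made up:

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* htonl / ntohl */

/* A string goes on the wire as a 4-byte big-endian length plus its bytes. */
size_t marshal_str(char *buf, const char *s)
{
    uint32_t len = (uint32_t)strlen(s);
    uint32_t be  = htonl(len);
    memcpy(buf, &be, 4);
    memcpy(buf + 4, s, len);
    return 4 + len;              /* bytes written */
}

size_t unmarshal_str(const char *buf, char *out, size_t outlen)
{
    uint32_t be;
    memcpy(&be, buf, 4);
    uint32_t len = ntohl(be);
    if ((size_t)len + 1 > outlen)
        return 0;                /* result would not fit */
    memcpy(out, buf + 4, len);
    out[len] = '\0';
    return 4 + len;              /* bytes consumed */
}
```

Once such a pair lives in the shared library, both the inter-server and the client paths can call it, which is what makes merging the recv(..) loops feasible.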
A subject concerning both this and the previous task is data
formatting. Ideally, an identical data structure would be used throughout the
entire system, including the networking code.
The preliminary networking framework was created for handling
demux statements and as such can only handle raw text strings. A
more advanced framework should replace this, for instance the
one already in use in the service interface.
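As a sketch of such a shared representation, a single tagged value type, passed unchanged between the networking code and the service interface, might look like the following. The type and all names are hypothetical, not taken from the Splash sources:

```c
#include <string.h>

/* One tagged value type shared by all subtasks would remove most of the
 * ad-hoc conversions between them. */
typedef enum { VAL_INT, VAL_STR } val_kind;

typedef struct {
    val_kind kind;
    union {
        long i;
        char s[64];   /* fixed buffer keeps the sketch allocation-free */
    } u;
} splash_val;

splash_val val_from_int(long i)
{
    splash_val v;
    v.kind = VAL_INT;
    v.u.i  = i;
    return v;
}

splash_val val_from_str(const char *s)
{
    splash_val v;
    v.kind = VAL_STR;
    strncpy(v.u.s, s, sizeof v.u.s - 1);
    v.u.s[sizeof v.u.s - 1] = '\0';
    return v;
}
```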
11.2.3 Small Issues
The issues discussed above are in our opinion the most problematic in
the present distribution of Splash. That said, many small problems
exist that users may need to work around for the time being. Both as a
guide for future work and as an introduction into the possible practical
problems one might encounter when using Splash we will quickly deal with
a number of smaller issues here.
Instructions and Services
There exists a clear yet unnecessary
distinction between SNAP instructions and services. The experiments
showed that Splash packets can be relatively large due to their service-calling
code. One way to avoid this would be to create bytecode instructions
based on the service name. A hashtable is used as a lookup structure,
but its keys are encoded inefficiently as raw text strings.
Instructions, on the other hand,
are encoded into bytecode by the assembler. These two solutions could
perhaps be merged to create a consistent bytecode implementation: a
one-way function could translate service names into bytecode.
With such a function in place, the gap between instruction and service
handling could be bridged, since all bytecode instructions could then also
be referenced through the hashtable, instead of through a separate
switch statement, as is now the case.
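The proposed one-way translation could, for example, use a standard string hash such as FNV-1a, with services and native instructions dispatched through one table. The names below are invented, and a real implementation would have to handle hash collisions, which this sketch ignores:

```c
#include <stdint.h>

/* FNV-1a: a simple, well-known one-way string hash, folding a service
 * name into a fixed-width opcode. */
uint32_t service_opcode(const char *name)
{
    uint32_t h = 2166136261u;           /* FNV offset basis */
    while (*name) {
        h ^= (uint8_t)*name++;
        h *= 16777619u;                 /* FNV prime */
    }
    return h;
}

typedef int (*op_handler)(void);

#define TABLE_SIZE 64
static op_handler op_table[TABLE_SIZE];

/* Register a service under its hashed opcode (collisions ignored here). */
void register_service(const char *name, op_handler h)
{
    op_table[service_opcode(name) % TABLE_SIZE] = h;
}

/* Dispatch any opcode, instruction or service, through the same table. */
int dispatch(uint32_t opcode)
{
    op_handler h = op_table[opcode % TABLE_SIZE];
    return h ? h() : -1;   /* -1: unknown opcode */
}

static int svc_getmem(void) { return 42; }  /* stand-in service handler */
```

Since the hash is computed once at assembly time, packets carry the compact opcode instead of the full service name, addressing the packet-size problem noted above.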
Unifying Front-End
This work has dealt with comparing Splash with SNMP. On multiple
occasions we've stressed the importance of interoperability between
the two systems. While possible in theory, no framework for interoperation
exists so far. A unifying front-end capable of handling both Splash and
SNMP results is therefore desirable for practical deployment. Similarly,
which of the two platforms is preferable for a specific request
can be determined automatically. Both the preprocessing and
the postprocessing involved could thus be abstracted from the user through
a special-purpose front-end user interface.
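Such an automatic choice could start out as simple as the following heuristic. This is a hypothetical sketch; a real front-end would also weigh packet size, expected traffic and the cost of agent compilation:

```c
/* Plain scalar reads from a single node go over SNMP (lowest per-request
 * overhead); multi-node or derived queries go out as SNAP agents, where
 * in-network processing pays off. */
typedef enum { USE_SNMP, USE_SNAP } backend_t;

backend_t choose_backend(int num_nodes, int needs_computation)
{
    if (num_nodes <= 1 && !needs_computation)
        return USE_SNMP;
    return USE_SNAP;
}
```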
Code Cleanup and Documentation
Having been pushed through the development cycle a number of times, the Splash code base
looks far from consistent at the moment. A serious tidying up
would be useful. Many stale macros and duplicated functions can easily be
removed, for instance one of the two hash tables used by the daemon.
A more rigorous cleanup is the removal of a large amount of stale code
related to:
- a previous kernelspace implementation. This behaviour is broken in the -wjdb
build and all existing references should be removed.
- alternative packet formats. A previous implementation had support for
multiple packet formats. For practical reasons we only use the most efficient
format. Some handlers still exist for the alternate layouts, as well as their
complete definitions. However, the new interfaces have not been designed
with multiple packet formats in mind. Therefore the code base should either be
purged of references to these formats or the handlers should be extended
to the new interfaces. For efficiency as well as readability reasons we suggest
the former.
Related is the issue of documentation.
As of now, the Splash internals have been documented
by an automated tool, doxygen. The output of this tool
can be found on the Splash website []. However, we only
started using this tool after completion of the project, so
instructive commentary is sparse. Tidying up the code base
should go hand in hand with the addition of useful commentary, preferably in
a syntax recognized by doxygen.
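As an illustration, doxygen-style commentary on a lookup routine might look as follows. The function itself is a hypothetical example, not Splash's real lookup code; it only serves to show the `@brief`/`@param`/`@return` commentary style:

```c
#include <string.h>

/**
 * @brief Look up a service by name in a fixed table.
 *
 * Hypothetical example function for demonstrating doxygen markup;
 * Splash's real lookup lives in its hashtable code.
 *
 * @param name NUL-terminated service name as carried in a SNAP packet.
 * @return Zero-based index of the service, or -1 if it is unknown.
 */
int find_service(const char *name)
{
    static const char *services[] = { "getmem", "getload", "route" };
    for (int i = 0; i < 3; i++)
        if (strcmp(services[i], name) == 0)
            return i;
    return -1;
}
```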
The projects discussed here identified some shortcomings in the current implementation
of Splash. From these observations and the outcomes of the experiments,
conclusions will be drawn in the next and final chapter.
12 Conclusion
The research carried out in this work should have demonstrated that
an active networking environment can be used as a network management
tool. Our goal was twofold. First, to show that such a system can surpass
the current de facto standard tool, SNMP, in terms of functionality.
Second, that it would not incur a large performance penalty.
In chapter 2, a number of active networks were surveyed,
most of which were deemed inadequate for network management tasks
due to the relatively high processing overhead they incurred.
After presenting, in chapter 3, some weaknesses of SNMP
and how an active network might be used to overcome them,
we laid out a blueprint for such an AN-based application
in chapter 4 and, in chapter 5, identified a number of test scenarios
with which network management toolkits can be compared.
In part 5.5 an
implementation of the aforementioned design, Splash, was presented.
Splash implements an architecture that combines mobile agent support with
standard SNMP. This architecture minimizes the overlap of the code bases for
the underlying SNMP daemon and active network while providing the
functionality of both within a single process. Splash
combines the widely deployed net-snmp package with
a user-space SNAP active packet environment. The hybrid nature of Splash
allows it to circumvent a main drawback of other SNMP alternatives, i.e. lack
of interoperability.
The experiments discussed in part 8.3 serve to verify
the claims put forward in our thesis. To properly do so they must show that
(1) Splash can, contrary to
previous AN-based management tools, execute low-level requests at roughly the
same speed as SNMP, while (2) at the same time reducing management traffic
or improving overall responsiveness compared to SNMP in use cases where derived or
distributed results come into
play. The first task is tackled in chapter 9, the second in
chapter 10.
Performance experiments showed
that Splash's performance is indeed only fractionally lower than SNMP's
for simple requests. Furthermore, it follows from the design of our system
that it can interoperate with the reference system. If the incurred slowdown is
too high a burden, one can always choose to use SNMP requests for specific low-level
tasks and only use SNAP agents for complicated scenarios.
Functionality tests showed that with Splash it is possible to deploy algorithms
that can (1) reduce network traffic, (2) decrease response times and (3) solve
problems SNMP is incapable of handling. The techniques used, discussed in chapter 10,
are independent of one another and each applicable in specific situations. The
flexible active network framework allows one to combine these and other
techniques on a case by case basis.
Finally, shortcomings of our implementation were discussed in
chapter 11. They showed that Splash cannot yet
replace existing systems. The most notable weakness is its lack of security.
While the underlying SNAP interpreter ensures safety with regard to resource
consumption, security and robustness are qualities not yet found in Splash,
though they could easily be added.
One can see that Splash is not a production-ready
management application as it stands.
However, we never set out to create such a system. Splash's main
contribution is that it shows that active networks can be applied
practically to network management. Proving our thesis, Splash is able to surpass SNMP
in terms of functionality while at the same time performing nearly as well as SNMP in
traditional tasks. The underlying active network and extensible service infrastructure
will allow it to adapt to unforeseen surroundings, both inside and outside the network
management domain. By doing so, Splash creates a
safe environment for the adoption of new networking practices. A
trial project of the technology in a metropolitan area wireless network, demonstrating
the practical value of active networking, is currently
underway. To further encourage research in this field we have released Splash
into the public domain [].