<!--#include virtual="/doctype.inc" -->
<head>
  <title>
    CMS Features
  </title>
<!--#include virtual="/style.inc" -->
</head>
<body>
<div id="container">
<div id="main">
<!--#include virtual="/header.inc" -->
<div id="contents">
<h1 class="top">
  CMS Features
</h1>

<h2>
  Problem Specification
</h2>

<h3>
  Original Problem
</h3>

<p>
  This is the original specification given to us when we
  started the project. The i-scream central monitoring system
  meets this specification, and aims to extend it further.
  This is, however, where it all began.
</p>

<h3>
  Centralised Machine Monitoring
</h3>
<p>
  The Computer Science department has a number of different
  machines running a variety of operating systems. One of the
  tasks of the systems administrators is to make sure that the
  machines don't run out of resources. This involves watching
  processor loads, available disk space, swap space, etc.
</p>
<p>
  It isn't practical to monitor a large number of machines by
  logging on and running commands such as 'uptime' on the unix
  machines, or by using Performance Monitor on the NT servers.
  This project is therefore to write monitoring software for
  each supported platform which reports resource usage back to
  one centralised location. Systems administrators would then
  be able to monitor all machines from this central location.
</p>
<p>
  Once this basic functionality is implemented, it could
  usefully be expanded to include logging of resource usage to
  identify long-term trends and problems, and alerter services
  which can directly contact sysadmins (or even the general
  public) to bring attention to problem areas. Ideally it
  should be possible to run multiple instances of the
  reporting tool (with all instances being updated in real
  time), and to run the reporting tool both as a stand-alone
  application and embedded in a web page.
</p>
<p>
  This project will require you to write code for the unix and
  Win32 APIs using C, and will call on knowledge of how the
  underlying operating systems manage resources. It will also
  require some network/distributed systems code and a GUI
  front end for the reporting tool. It is important for
  students undertaking this project to understand the
  importance of writing small, efficient code, as the end
  product will be most useful when machines start to run out
  of processing power, memory, or disk.
</p>
<p>
  John Cinnamond (email jc), whose idea this is, will provide
  technical support for the project.
</p>
<h2>
  Features
</h2>

<h3>
  Key Features of the System
</h3>

<ul>
  <li>A centrally stored, dynamically reloaded, system-wide
    configuration system.
  </li>
  <li>A fully extensible monitoring system: nothing except the
    Hosts (which generate the data) and the Clients (which
    view it) knows any details about the data being sent,
    allowing the data to be modified without changes to the
    server architecture.
  </li>
  <li>A central server and reporting tools that are all Java
    based, for multi-platform portability.
  </li>
  <li>Distribution of the core server components over CORBA,
    allowing the appropriate components to run independently
    and new components to be written to conform to the CORBA
    interfaces.
  </li>
  <li>Use of CORBA to create a hierarchical set of data entry
    points to the system, allowing it to handle event storms
    and remote office locations.
  </li>
  <li>One location for all system messages, even though the
    system is distributed.
  </li>
  <li>An XML data protocol which makes data processing and
    analysis easily extensible (a hypothetical example is
    sketched after this list).
  </li>
  <li>A stateless server which can be moved and restarted at
    will, while Hosts, Clients, and reporting tools are
    unaffected and simply reconnect when the server is
    available again.
  </li>
  <li>Simple and open protocols at both ends, allowing easy
    extension and porting of Hosts and Clients to other
    platforms.
  </li>
  <li>Self-monitoring: all data queues within the system can
    be monitored and can raise alerts to warn of event storms
    and impending failures (should any occur).
  </li>
  <li>A variety of web-based information displays, built on
    Java/SQL reporting and on-the-fly PHP page generation, to
    show the latest alerts and data.
  </li>
  <li>Large overhead, helpdesk-style monitor displays for the
    latest alerting information.
  </li>
</ul>
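<p>
  As a purely hypothetical illustration of the kind of packet
  such an XML protocol might carry (the element and attribute
  names here are invented, not the real i-scream schema), a
  data packet could look something like this:
</p>

<pre>
&lt;packet machine_name="raptor" type="data" seq="1432"&gt;
  &lt;load&gt;0.42&lt;/load&gt;
  &lt;disk free="1024" total="8192"/&gt;
  &lt;swap free="256" total="512"/&gt;
&lt;/packet&gt;
</pre>

<p>
  Because a Client only needs to understand the fields it
  cares about, new fields can be added to such packets without
  touching the server at all.
</p>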
<h3>
  An Overview of the i-scream Central Monitoring System
</h3>

<p>
  The i-scream system monitors status and performance
  information obtained from the machines that feed data into
  it, and displays this information in a variety of ways.
</p>
<p>
  This data is obtained by running small applications on the
  reporting machines. These applications are known as "Hosts".
  The i-scream system provides a range of hosts which are
  designed to be small and lightweight in their configuration
  and operation; see the website and appropriate documentation
  to locate the currently available Host applications. These
  hosts are simply told where to contact the server, at which
  point they are totally autonomous. They are able to obtain
  configuration from the server, detect changes in their
  configuration, send data packets (via UDP) containing
  monitoring information, and periodically send so-called
  "Heartbeat" packets (via TCP) to indicate to the server that
  they are still alive.
</p>
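<p>
  The following sketch shows roughly what that traffic could
  look like from a host's point of view. It is not the real
  i-scream host code: the server name, the port numbers and
  the packet contents are all invented for the example.
</p>

<pre>
import java.io.OutputStream;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.Socket;

public class ToyHost {
    public static void main(String[] args) throws Exception {
        InetAddress server = InetAddress.getByName("iscream.example.org");

        // Data packet with monitoring information, sent via UDP.
        byte[] data = "&lt;packet machine_name=\"raptor\" type=\"data\"/&gt;"
                          .getBytes();
        try (DatagramSocket udp = new DatagramSocket()) {
            udp.send(new DatagramPacket(data, data.length, server, 4589));
        }

        // "Heartbeat" packet sent via TCP, telling the server that
        // this host is still alive.
        try (Socket tcp = new Socket(server, 4590);
             OutputStream out = tcp.getOutputStream()) {
            out.write("&lt;packet type=\"heartbeat\"/&gt;\n".getBytes());
        }
    }
}
</pre>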
<p>
  This data is then fed into the i-scream server, which splits
  it two ways. First, it places the data in a database system,
  typically MySQL based, for later extraction and processing
  by the i-scream report generation tools. Second, it passes
  the data on to real-time "Clients", which handle it as it
  enters the system. The system itself has an internal
  real-time client, called the "Local Client", which runs a
  series of Monitors that can analyse the data. One of these
  Monitors also feeds the data off to a file repository, which
  is updated as new data comes in for each machine; this data
  is then read and displayed by the i-scream web services to
  provide a web interface to the data. The system also allows
  TCP connections by non-local clients (such as the
  i-scream-supplied Conient); these applications provide a
  real-time view of the data as it flows through the system.
</p>
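<p>
  The relationship between the Local Client and its Monitors
  can be pictured as a simple interface: every Monitor sees
  every packet and decides for itself what to do with it. The
  interface and class names below are invented for
  illustration only.
</p>

<pre>
// A minimal stand-in for a parsed data packet.
class HostData {
    String machineName;
    String xml;
}

// Each Monitor receives every packet the Local Client sees.
interface Monitor {
    void analyse(HostData packet);
}

// A hypothetical Monitor that mirrors the newest data for each
// machine into a file repository for the web services to read.
class FileRepositoryMonitor implements Monitor {
    public void analyse(HostData packet) {
        try (java.io.FileWriter out =
                 new java.io.FileWriter("latest-" + packet.machineName + ".xml")) {
            out.write(packet.xml);
        } catch (java.io.IOException e) {
            // A real Monitor would raise an internal alert here.
        }
    }
}
</pre>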
<p>
  The final section of the system links the Local Client
  Monitors to an alerting system. These Monitors can be
  configured to detect when the data crosses threshold levels.
  When a threshold is breached, an alert is raised; as the
  alert persists, it is escalated through four levels: NOTICE,
  WARNING, CAUTION and CRITICAL. The alerting system watches
  the level, and when a given level is reached, the alerting
  mechanisms configured for that level fire through whatever
  medium they are set up to use.
</p>
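<p>
  The escalation logic can be sketched in a few lines: an
  alert that keeps persisting steps up one level at a time,
  and each step fires whatever mechanisms are configured for
  the new level. Only the four level names come from the
  system itself; everything else in this sketch is invented.
</p>

<pre>
public class Escalator {
    // The four levels an alert passes through as it persists.
    enum Level { NOTICE, WARNING, CAUTION, CRITICAL }

    private Level level = Level.NOTICE;

    // Called each time a re-check finds the threshold still
    // breached; the alert climbs one level, topping out at
    // CRITICAL.
    public void stillBreached() {
        if (level.ordinal() &lt; Level.CRITICAL.ordinal()) {
            level = Level.values()[level.ordinal() + 1];
        }
        fireMechanismsFor(level);
    }

    private void fireMechanismsFor(Level l) {
        // e.g. email on WARNING, a pager on CRITICAL --
        // whatever media are configured for this level.
        System.out.println("alert now at " + l);
    }
}
</pre>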
</div>
<!--#include virtual="/footer.inc" -->
</div>
<!--#include virtual="/menu.inc" -->
</div>
</body>
</html>