<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<html>

<head>
<title>CMS Features</title>
<!--#include virtual="/style.inc" -->
</head>

<body>

<div id="container">

<div id="main">

<!--#include virtual="/header.inc" -->

<div id="contents">

<h1 class="top">CMS Features</h1>

<h2>Problem Specification</h2>

<h3>Original Problem</h3>

<p>
This is the original specification given to us when we
started the project. The i-scream central monitoring
system meets this specification, and aims to extend it
further. This is, however, where it all began.
</p>

<h3>Centralised Machine Monitoring</h3>

<p>
The Computer Science department has a number of different machines
running a variety of different operating systems. One of the tasks
of the systems administrators is to make sure that the machines
don't run out of resources. This involves watching processor loads,
available disk space, swap space, etc.
</p>

<p>
It isn't practical to monitor a large number of machines by logging
on and running commands such as 'uptime' on the Unix machines, or
by using Performance Monitor on NT servers. This project is therefore
to write monitoring software for each supported platform which
reports resource usage back to one centralised location. System
administrators would then be able to monitor all machines from this
centralised location.
</p>

<p>
Once this basic functionality is implemented it could usefully be
expanded to include logging of resource usage to identify long-term
trends and problems, and alerter services which can directly contact
sysadmins (or even the general public) to bring attention to problem
areas. Ideally it should be possible to run multiple instances of
the reporting tool (with all instances being updated in real time)
and to run the reporting tool both as a stand-alone
application and embedded in a web page.
</p>

<p>
This project will require you to write code for the Unix and Win32
APIs using C, and to understand how the underlying operating systems
manage resources. It will also require some network/distributed
systems code and a GUI front end for the reporting tool. It is
important for students undertaking this project to understand the
importance of writing efficient and small code, as the end product
will be most useful when machines start to run out of processing
power, memory or disk.
</p>

<p>
John Cinnamond (email jc), whose idea this is, will provide technical
support for the project.
</p>

<h2>Features</h2>

<h3>Key Features of The System</h3>

<ul>
<li>A centrally stored, dynamically reloaded, system-wide configuration system.</li>
<li>A fully extensible monitoring system: nothing except the Hosts (which
generate the data) and the Clients (which view it) knows any details about
the data being sent, allowing the data to be modified without changes to the
server architecture.</li>
<li>Central server and reporting tools all Java-based for multi-platform portability.</li>
<li>Distribution of core server components over CORBA to allow appropriate components
to run independently and to allow new components to be written to conform with the
CORBA interfaces.</li>
<li>Use of CORBA to create a hierarchical set of data entry points to the system,
allowing the system to handle event storms and remote office locations.</li>
<li>One location for all system messages, despite the system being distributed.</li>
<li>An XML data protocol, used to make data processing and analysis easily extensible.</li>
<li>A stateless server which can be moved and restarted at will, while Hosts,
Clients, and reporting tools are unaffected and simply reconnect when the
server is available again.</li>
<li>Simple and open end protocols to allow easy extension and platform porting of Hosts
and Clients.</li>
<li>Self-monitoring, as all data queues within the system can be monitored and raise
alerts to warn of event storms and impending failures (should any occur).</li>
<li>A variety of web-based information displays based on Java/SQL reporting and
PHP on-the-fly page generation to show the latest alerts and data.</li>
<li>Helpdesk-style displays on large overhead monitors for the latest alerting information.</li>
</ul>

<h3>An Overview of the i-scream Central Monitoring System</h3>

<p>
The i-scream system monitors status and performance information
obtained from machines feeding data into it and then displays
this information in a variety of ways.
</p>

<p>
This data is obtained by running small applications
on the reporting machines. These applications are known as
"Hosts". The i-scream system provides a range of Hosts which are
designed to be small and lightweight in their configuration and
operation. See the website and appropriate documentation to
locate currently available Host applications. These Hosts are
simply told where to contact the server, at which point they are
totally autonomous. They are able to obtain configuration from
the server, detect changes in their configuration, send data
packets (via UDP) containing monitoring information, and
periodically send so-called "Heartbeat" packets (via TCP) to indicate
to the server that they are still alive.
</p>
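
<p>
As a rough illustration of the Host behaviour described above, the
following Python sketch builds one XML data packet and sends it as a
single UDP datagram. It is not the i-scream wire format: the element
and attribute names ("packet", "machine_name", "date", "load",
"disk_free_kb") and the function names are assumptions made for this
example only.
</p>

```python
import socket
import time
import xml.etree.ElementTree as ET


def build_packet(machine_name, load, disk_free_kb, timestamp=None):
    """Serialise one monitoring sample as a small XML document.

    The element and attribute names are hypothetical, chosen only
    for this sketch.
    """
    if timestamp is None:
        timestamp = int(time.time())
    root = ET.Element("packet", {"machine_name": machine_name,
                                 "date": str(timestamp)})
    ET.SubElement(root, "load").text = f"{load:.2f}"
    ET.SubElement(root, "disk_free_kb").text = str(disk_free_kb)
    # With the default encoding, tostring() returns ASCII bytes.
    return ET.tostring(root)


def send_packet(payload, host, port):
    """Send one datagram; UDP gives no delivery guarantee."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
```

<p>
A Host would call send_packet(build_packet(...), host, port) on a
timer, with host and port taken from its configuration. The TCP
Heartbeat is a separate, periodic connection and is not shown here.
</p>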

<p>
The data is then fed into the i-scream server. The server splits
the data two ways. First it places the data in a database system,
typically MySQL based, for later extraction and processing by the
i-scream report generation tools. It then passes the data on to
real-time "Clients" which handle the data as it enters the system.
The system itself has an internal real-time client called the "Local
Client", which runs a series of Monitors that can analyse the
data. One of these Monitors also feeds the data off to a file
repository, which is updated as new data comes in for each machine.
This data is then read and displayed by the i-scream web services
to provide a web interface to the data. The system also allows TCP
connections by non-local clients (such as the i-scream supplied
Conient). These applications provide a real-time view of the data
as it flows through the system.
</p>
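
<p>
The two-way split described above can be pictured with a small
Python sketch. It is an illustration only, not the i-scream server
code: the Dispatcher class and its method names are hypothetical, and
the database and the clients are reduced to plain callables.
</p>

```python
class Dispatcher:
    """Fans each incoming packet out to a store and to live clients."""

    def __init__(self, store):
        self.store = store    # callable that persists a packet
        self.clients = []     # callables for real-time consumers

    def add_client(self, client):
        self.clients.append(client)

    def handle(self, packet):
        # First path: persist the packet for later report generation.
        self.store(packet)
        # Second path: pass the packet on to every real-time client.
        for client in self.clients:
            client(packet)
```

<p>
In this picture the Local Client would simply be the first callable
registered with add_client, and each remote TCP client would be
another.
</p>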

<p>
The final section of the system links the Local Client Monitors to
an alerting system. These Monitors can be configured to detect
changes in the data past threshold levels. When a threshold is
breached an alert is raised. As the alert persists, it is escalated
through four live levels: NOTICE, WARNING, CAUTION
and CRITICAL. The alerting system keeps an eye on the level and,
when a certain level is reached, certain alerting mechanisms fire
through whatever medium they are configured to use.
</p>
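
<p>
The escalation described above might be sketched as follows. The
four live level names come from the text; everything else is an
assumption for illustration, including the OK level for "no alert",
the ThresholdMonitor class, and the rule that an alert moves up one
level after a fixed number of consecutive breaching samples.
</p>

```python
from enum import IntEnum


class Level(IntEnum):
    OK = 0        # assumption: not one of the four live levels
    NOTICE = 1
    WARNING = 2
    CAUTION = 3
    CRITICAL = 4


class ThresholdMonitor:
    """Escalates an alert while a value stays above its threshold."""

    def __init__(self, threshold, escalate_after=3):
        self.threshold = threshold
        # Hypothetical rule: samples spent at each level before escalating.
        self.escalate_after = escalate_after
        self.breaches = 0

    def update(self, value):
        if value > self.threshold:
            self.breaches += 1
        else:
            # The value recovered, so the alert is cleared.
            self.breaches = 0
            return Level.OK
        steps = (self.breaches - 1) // self.escalate_after
        return Level(min(Level.NOTICE + steps, Level.CRITICAL))
```

<p>
An alerting mechanism would then compare the level returned by
update() with the level at which it is configured to fire.
</p>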
</div>

<!--#include virtual="/footer.inc" -->

</div>

<!--#include virtual="/menu.inc" -->

</div>

</body>
</html>