<!--#include virtual="/doctype.inc" -->

<head>
<title>CMS Features</title>
<!--#include virtual="/style.inc" -->
</head>

<body>

<div id="container">

<div id="main">

<!--#include virtual="/header.inc" -->

<div id="contents">

<h1 class="top">CMS Features</h1>

<h2>Problem Specification</h2>

<h3>Original Problem</h3>

<p>
This is the original specification given to us when we
started the project. The i-scream central monitoring
system meets this specification and aims to extend it
further, but this is where it all began.
</p>

<h3>Centralised Machine Monitoring</h3>

<p>
The Computer Science department has a number of different machines
running a variety of different operating systems. One of the tasks
of the systems administrators is to make sure that the machines
don't run out of resources. This involves watching processor loads,
available disk space, swap space, etc.
</p>

<p>
It isn't practical to monitor a large number of machines by logging
on and running commands such as 'uptime' on the Unix machines, or
by using Performance Monitor on NT servers. Thus this project is
to write monitoring software for each supported platform which
reports resource usage back to one centralised location. System
administrators would then be able to monitor all machines from this
centralised location.
</p>

<p>
Once this basic functionality is implemented it could usefully be
expanded to include logging of resource usage to identify long-term
trends and problems, and alerter services which can directly contact
sysadmins (or even the general public) to bring attention to problem
areas. Ideally it should be possible to run multiple instances of
the reporting tool (with all instances being updated in real time)
and to be able to run the reporting tool both as a standalone
application and embedded in a web page.
</p>

<p>
This project will require you to write code for the Unix and Win32
APIs using C, and knowledge of how the underlying operating systems
manage resources. It will also require some network/distributed
systems code and a GUI front end for the reporting tool. It is
important for students undertaking this project to understand the
importance of writing efficient and small code, as the end product
will be most useful when machines start to run out of processing
power, memory or disk.
</p>

<p>
John Cinnamond (email jc), whose idea this is, will provide technical
support for the project.
</p>

<h2>Features</h2>

<h3>Key Features of the System</h3>

<ul>
<li>A centrally stored, dynamically reloaded, system-wide configuration system.</li>
<li>A fully extensible monitoring system: nothing except the Host (which
generates the data) and the Clients (which view it) knows any details about
the data being sent, so the data can be modified without changes to the
server architecture.</li>
<li>Central server and reporting tools are all Java-based for multi-platform portability.</li>
<li>Distribution of core server components over CORBA, allowing appropriate components
to run independently and new components to be written to conform to the
CORBA interfaces.</li>
<li>Use of CORBA to create a hierarchical set of data entry points to the system,
allowing the system to handle event storms and remote office locations.</li>
<li>One location for all system messages, even though the system is distributed.</li>
<li>An XML data protocol, used to make data processing and analysis easily extensible.</li>
<li>A stateless server which can be moved and restarted at will, while Hosts,
Clients, and reporting tools are unaffected and simply reconnect when the
server is available again.</li>
<li>Simple and open end protocols to allow easy extension and platform porting of Hosts
and Clients.</li>
<li>Self-monitoring: all data queues within the system can be monitored and can raise
alerts to warn of event storms and impending failures (should any occur).</li>
<li>A variety of web-based information displays, based on Java/SQL reporting and
PHP on-the-fly page generation, to show the latest alerts and data.</li>
<li>Large overhead-monitor, Helpdesk-style displays for the latest alerting information.</li>
</ul>
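
<p>
The "simply reconnect when the server is available again" behaviour in the
list above can be sketched as a retry loop. This is only an illustration in
Python, not the i-scream implementation: the function name
<code>connect_with_retry</code>, the doubling delay and all default values are
assumptions made for the example.
</p>

```python
import time


def connect_with_retry(connect, max_attempts=5, base_delay=1.0,
                       max_delay=30.0, sleep=time.sleep):
    """Call connect() until it succeeds, doubling the wait after each failure.

    connect is any zero-argument callable that raises OSError while the
    server is unreachable (hypothetical; a real Host or Client would open
    its own socket here).
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except OSError:
            if attempt == max_attempts:
                raise  # give up and let the caller decide what to do
            sleep(delay)
            delay = min(delay * 2, max_delay)
```

<p>
Because the server is described as stateless, a sketch like this needs no
session recovery after reconnecting; the caller simply carries on sending or
receiving data.
</p>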

<h3>An Overview of the i-scream Central Monitoring System</h3>

<p>
The i-scream system monitors status and performance information
obtained from machines feeding data into it and then displays
this information in a variety of ways.
</p>

<p>
This data is obtained by running small applications
on the reporting machines. These applications are known as
"Hosts". The i-scream system provides a range of Hosts which are
designed to be small and lightweight in their configuration and
operation. See the website and appropriate documentation to
locate currently available Host applications. These Hosts are
simply told where to contact the server, at which point they are
totally autonomous. They are able to obtain configuration from
the server, detect changes in their configuration, send data
packets (via UDP) containing monitoring information, and
periodically send so-called "Heartbeat" packets (via TCP) to indicate
to the server that they are still alive.
</p>
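
<p>
As a rough illustration of the data path described above, the following
Python sketch builds an XML packet and sends it as a single UDP datagram.
It is not the i-scream wire format: the element and attribute names
(<code>packet</code>, <code>machine_name</code>, <code>seq_no</code>,
<code>metric</code>) and both function names are invented for the example.
</p>

```python
import socket
import time
import xml.etree.ElementTree as ET


def build_packet(hostname, metrics, seq_no):
    """Build one XML data packet as bytes.

    All element and attribute names here are hypothetical; the real
    protocol is defined in the i-scream documentation.
    """
    root = ET.Element("packet", {
        "machine_name": hostname,
        "seq_no": str(seq_no),
        "date": str(int(time.time())),
    })
    for name, value in metrics.items():
        ET.SubElement(root, "metric", {"name": name}).text = str(value)
    return ET.tostring(root, encoding="utf-8")


def send_packet(payload, server_addr):
    """Send one datagram to (host, port); UDP gives no delivery guarantee."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, server_addr)
```

<p>
A Host could call <code>send_packet(build_packet(...), (server, port))</code>
on a timer, with the server address and port taken from its configuration.
Since UDP can drop packets silently, the separate TCP Heartbeat is what
lets the server tell a quiet Host from a dead one.
</p>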

<p>
The data is then fed into the i-scream server. The server splits
the data two ways. First, it places the data in a database system,
typically MySQL-based, for later extraction and processing by the
i-scream report generation tools. Second, it passes the data on to
real-time "Clients" which handle the data as it enters the system.
The system itself has an internal real-time client called the "Local
Client", which runs a series of Monitors that can analyse the
data. One of these Monitors also feeds the data off to a file
repository, which is updated as new data comes in for each machine;
this data is then read and displayed by the i-scream web services
to provide a web interface to the data. The system also allows TCP
connections by non-local clients (such as the i-scream supplied
Conient); these applications provide a real-time view of the data
as it flows through the system.
</p>
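
<p>
The two-way split described above can be pictured as a small fan-out
dispatcher. The Python sketch below is only a model of the idea: the
<code>Dispatcher</code> class, its method names and the use of a plain list
in place of the database are assumptions, and the real server is Java and
CORBA based.
</p>

```python
class Dispatcher:
    """Fan each incoming packet out to a store and to live clients."""

    def __init__(self, store):
        # store is anything with append(); it stands in for the database
        self.store = store
        # each client is a callable that receives one packet
        self.clients = []

    def register(self, client):
        self.clients.append(client)

    def handle(self, packet):
        # Path 1: persist the packet for later report generation.
        self.store.append(packet)
        # Path 2: pass the packet on to every real-time client.
        for client in list(self.clients):
            try:
                client(packet)
            except Exception:
                # A failing client is dropped so it cannot stall the others.
                self.clients.remove(client)


def latest_per_machine():
    """Return (table, client); the client keeps the newest packet per machine."""
    latest = {}

    def client(packet):
        latest[packet["machine"]] = packet

    return latest, client
```

<p>
The <code>latest_per_machine</code> client plays a role loosely similar to
the Monitor that keeps a per-machine file repository up to date, although
here the "repository" is just an in-memory dictionary.
</p>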

<p>
The final section of the system links the Local Client Monitors to
an alerting system. These Monitors can be configured to detect
changes in the data past threshold levels. When a threshold is
breached, an alert is raised. As the alert persists, it is escalated
through four live levels: NOTICE, WARNING, CAUTION
and CRITICAL. The alerting system keeps an eye on the level and,
when a certain level is reached, the corresponding alerting mechanisms fire
through whatever medium they are configured to use.
</p>
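
<p>
The escalation described above can be sketched as a small state machine.
In this Python example the four level names come from the text, but
everything else is an assumption: the <code>ThresholdMonitor</code> class,
measuring persistence as a count of consecutive breached checks, and the
<code>checks_per_level</code> parameter are all invented for illustration.
</p>

```python
LEVELS = ["NOTICE", "WARNING", "CAUTION", "CRITICAL"]


class ThresholdMonitor:
    """Escalate an alert one level for every N consecutive breaches."""

    def __init__(self, threshold, checks_per_level=3):
        self.threshold = threshold
        self.checks_per_level = checks_per_level  # hypothetical tuning knob
        self.breaches = 0

    def check(self, value):
        """Return the current level name, or None if under the threshold."""
        if value <= self.threshold:
            self.breaches = 0  # the alert clears as soon as the value recovers
            return None
        self.breaches += 1
        index = (self.breaches - 1) // self.checks_per_level
        return LEVELS[min(index, len(LEVELS) - 1)]
```

<p>
An alerting mechanism would then compare the returned level against the
level it is configured for and fire only once that level is reached.
</p>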
</div>

<!--#include virtual="/footer.inc" -->

</div>

<!--#include virtual="/menu.inc" -->

</div>

</body>
</html>