1 |
|
<!--#include virtual="/doctype.inc" --> |
2 |
< |
|
3 |
< |
<head> |
4 |
< |
<title>CMS Features</title> |
2 |
> |
<head> |
3 |
> |
<title> |
4 |
> |
CMS Features |
5 |
> |
</title> |
6 |
|
<!--#include virtual="/style.inc" --> |
7 |
< |
</head> |
8 |
< |
|
9 |
< |
<body> |
10 |
< |
|
10 |
< |
<div id="container"> |
11 |
< |
|
12 |
< |
<div id="main"> |
13 |
< |
|
7 |
> |
</head> |
8 |
> |
<body> |
9 |
> |
<div id="container"> |
10 |
> |
<div id="main"> |
11 |
|
<!--#include virtual="/header.inc" --> |
12 |
< |
|
13 |
< |
<div id="contents"> |
14 |
< |
|
15 |
< |
<h1 class="top">CMS Features</h1> |
16 |
< |
|
17 |
< |
<h2>Problem Specification</h2> |
18 |
< |
|
19 |
< |
<h3>Original Problem</h3> |
20 |
< |
|
21 |
< |
<p> |
22 |
< |
This is the original specification given to us when we |
23 |
< |
started the project. The i-scream central monitoring |
24 |
< |
system meets this specification, and aims to extend it |
25 |
< |
further. This is, however, where it all began. |
26 |
< |
</p> |
27 |
< |
|
28 |
< |
<h3>Centralised Machine Monitoring</h3> |
29 |
< |
|
30 |
< |
<p> |
31 |
< |
The Computer Science department has a number of different machines |
32 |
< |
running a variety of different operating systems. One of the tasks |
33 |
< |
of the systems administrators is to make sure that the machines |
34 |
< |
don't run out of resources. This involves watching processor loads, |
35 |
< |
available disk space, swap space, etc. |
36 |
< |
</p> |
37 |
< |
|
38 |
< |
<p> |
39 |
< |
It isn't practical to monitor a large number of machines by logging |
40 |
< |
on and running commands such as 'uptime' on the unix machines, or |
41 |
< |
by using performance monitor for NT servers. Thus this project is |
42 |
< |
to write monitoring software for each platform supported which |
43 |
< |
reports resource usage back to one centralised location. System |
44 |
< |
Administrators would then be able to monitor all machines from this |
45 |
< |
centralised location. |
46 |
< |
</p> |
47 |
< |
|
48 |
< |
<p> |
49 |
< |
Once this basic functionality is implemented it could usefully be |
50 |
< |
expanded to include logging of resource usage to identify long-term |
51 |
< |
trends/problems, alerter services which can directly contact |
52 |
< |
sysadmins (or even the general public) to bring attention to problem |
53 |
< |
areas. Ideally it should be possible to run multiple instances of |
54 |
< |
the reporting tool (with all instances being updated in real time) |
55 |
< |
and to be able to run the reporting tool both as a stand-alone |
56 |
< |
application and embedded in a web page. |
57 |
< |
</p> |
58 |
< |
|
59 |
< |
<p> |
60 |
< |
This project will require you to write code for the unix and Win32 |
61 |
< |
APIs using C and knowledge of how the underlying operating systems |
62 |
< |
manage resources. It will also require some network/distributed |
63 |
< |
systems code and a GUI front end for the reporting tool. It is |
64 |
< |
important for students undertaking this project to understand the |
65 |
< |
importance of writing efficient and small code as the end product |
66 |
< |
will really be most useful when machines start to run out of processing |
67 |
< |
power/memory/disk. |
68 |
< |
</p> |
69 |
< |
|
70 |
< |
<p> |
71 |
< |
John Cinnamond (email jc), whose idea this is, will provide technical |
72 |
< |
support for the project. |
73 |
< |
</p> |
74 |
< |
|
75 |
< |
<h2>Features</h2> |
76 |
< |
|
77 |
< |
<h3>Key Features of The System</h3> |
78 |
< |
|
79 |
< |
<ul> |
80 |
< |
<li>A centrally stored, dynamically reloaded, system wide configuration system</li> |
81 |
< |
<li>A totally extendable monitoring system: nothing except the Host (which |
82 |
< |
generates the data) and the Clients (which view it) know any details about |
83 |
< |
the data being sent, allowing data to be modified without changes to the |
84 |
< |
server architecture.</li> |
85 |
< |
<li>Central server and reporting tools all Java based for multi-platform portability</li> |
86 |
< |
<li>Distribution of core server components over CORBA to allow appropriate components |
87 |
< |
to run independently and to allow new components to be written to conform with the |
88 |
< |
CORBA interfaces.</li> |
89 |
< |
<li>Use of CORBA to create a hierarchical set of data entry points to the system |
90 |
< |
allowing the system to handle event storms and remote office locations.</li> |
91 |
< |
<li>One location for all system messages, despite being distributed.</li> |
92 |
< |
<li>XML data protocol used to make data processing and analysing easily extendable</li> |
93 |
< |
<li>A stateless server which can be moved and restarted at will, while Hosts, |
94 |
< |
Clients, and reporting tools are unaffected and simply reconnect when the |
95 |
< |
server is available again.</li> |
96 |
< |
<li>Simple and open end protocols to allow easy extension and platform porting of Hosts |
97 |
< |
and Clients.</li> |
98 |
< |
<li>Self monitoring, as all data queues within the system can be monitored and raise |
99 |
< |
alerts to warn of event storms and impending failures (should any occur).</li> |
100 |
< |
<li>A variety of web based information displays based on Java/SQL reporting and |
101 |
< |
PHP on-the-fly page generation to show the latest alerts and data</li> |
102 |
< |
<li>Large overhead monitor Helpdesk style displays for latest Alerting information</li> |
103 |
< |
</ul> |
104 |
< |
|
105 |
< |
<h3>An Overview of the i-scream Central Monitoring System</h3> |
106 |
< |
|
107 |
< |
<p> |
108 |
< |
The i-scream system monitors status and performance information |
109 |
< |
obtained from machines feeding data into it and then displays |
110 |
< |
this information in a variety of ways. |
111 |
< |
</p> |
112 |
< |
|
113 |
< |
<p> |
114 |
< |
This data is obtained through the running of small applications |
115 |
< |
on the reporting machines. These applications are known as |
116 |
< |
"Hosts". The i-scream system provides a range of hosts which are |
117 |
< |
designed to be small and lightweight in their configuration and |
118 |
< |
operation. See the website and appropriate documentation to |
119 |
< |
locate currently available Host applications. These hosts are |
120 |
< |
simply told where to contact the server, at which point they are |
121 |
< |
totally autonomous. They are able to obtain configuration from |
122 |
< |
the server, detect changes in their configuration, send data |
123 |
< |
packets (via UDP) containing monitoring information, and send |
124 |
< |
so-called "Heartbeat" packets (via TCP) periodically to indicate |
125 |
< |
to the server that they are still alive. |
126 |
< |
</p> |
127 |
< |
|
128 |
< |
<p> |
129 |
< |
The data is then fed into the i-scream server. The server then splits |
130 |
< |
the data two ways. First it places the data in a database system, |
131 |
< |
typically MySQL based, for later extraction and processing by the |
132 |
< |
i-scream report generation tools. Second, it passes the data on to |
133 |
< |
real-time "Clients" which handle the data as it enters the system. |
134 |
< |
The system itself has an internal real-time client called the "Local |
135 |
< |
Client" which has a series of Monitors running which can analyse the |
136 |
< |
data. One of these Monitors also feeds the data off to a file |
137 |
< |
repository, which is updated as new data comes in for each machine; |
138 |
< |
this data is then read and displayed by the i-scream web services |
139 |
< |
to provide a web interface to the data. The system also allows TCP |
140 |
< |
connections by non-local clients (such as the i-scream supplied |
141 |
< |
Conient); these applications provide a real-time view of the data |
142 |
< |
as it flows through the system. |
143 |
< |
</p> |
144 |
< |
|
145 |
< |
<p> |
146 |
< |
The final section of the system links the Local Client Monitors to |
147 |
< |
an alerting system. These Monitors can be configured to detect |
148 |
< |
changes in the data past threshold levels. When a threshold is |
149 |
< |
breached, an alert is raised. This alert is then escalated as the |
150 |
< |
alert persists through four live levels: NOTICE, WARNING, CAUTION |
151 |
< |
and CRITICAL. The alerting system keeps an eye on the level and |
152 |
< |
when a certain level is reached, certain alerting mechanisms fire |
153 |
< |
through whatever medium they are configured to send. |
154 |
< |
</p> |
155 |
< |
</div> |
156 |
< |
|
12 |
> |
<div id="contents"> |
13 |
> |
<h1 class="top"> |
14 |
> |
CMS Features |
15 |
> |
</h1> |
16 |
> |
<h2> |
17 |
> |
Problem Specification |
18 |
> |
</h2> |
19 |
> |
<h3> |
20 |
> |
Original Problem |
21 |
> |
</h3> |
22 |
> |
<p> |
23 |
> |
This is the original specification given to us when we |
24 |
> |
started the project. The i-scream central monitoring system |
25 |
> |
meets this specification, and aims to extend it further. |
26 |
> |
This is, however, where it all began. |
27 |
> |
</p> |
28 |
> |
<h3> |
29 |
> |
Centralised Machine Monitoring |
30 |
> |
</h3> |
31 |
> |
<p> |
32 |
> |
The Computer Science department has a number of different |
33 |
> |
machines running a variety of different operating systems. |
34 |
> |
One of the tasks of the systems administrators is to make |
35 |
> |
sure that the machines don't run out of resources. This |
36 |
> |
involves watching processor loads, available disk space, |
37 |
> |
swap space, etc. |
38 |
> |
</p> |
39 |
> |
<p> |
40 |
> |
It isn't practical to monitor a large number of machines by |
41 |
> |
logging on and running commands such as 'uptime' on the |
42 |
> |
unix machines, or by using performance monitor for NT |
43 |
> |
servers. Thus this project is to write monitoring software |
44 |
> |
for each platform supported which reports resource usage |
45 |
> |
back to one centralised location. System Administrators |
46 |
> |
would then be able to monitor all machines from this |
47 |
> |
centralised location. |
48 |
> |
</p> |
49 |
> |
<p> |
50 |
> |
Once this basic functionality is implemented it could |
51 |
> |
usefully be expanded to include logging of resource usage |
52 |
> |
to identify long-term trends/problems, alerter services |
53 |
> |
which can directly contact sysadmins (or even the general |
54 |
> |
public) to bring attention to problem areas. Ideally it |
55 |
> |
should be possible to run multiple instances of the |
56 |
> |
reporting tool (with all instances being updated in |
57 |
> |
real time) and to be able to run the reporting tool |
58 |
> |
both as a stand-alone application and embedded in a web page. |
59 |
> |
</p> |
60 |
> |
<p> |
61 |
> |
This project will require you to write code for the unix |
62 |
> |
and Win32 APIs using C and knowledge of how the underlying |
63 |
> |
operating systems manage resources. It will also require |
64 |
> |
some network/distributed systems code and a GUI front end |
65 |
> |
for the reporting tool. It is important for students |
66 |
> |
undertaking this project to understand the importance of |
67 |
> |
writing efficient and small code as the end product will |
68 |
> |
really be most useful when machines start to run out of |
69 |
> |
processing power/memory/disk. |
70 |
> |
</p> |
71 |
> |
<p> |
72 |
> |
John Cinnamond (email jc), whose idea this is, will provide |
73 |
> |
technical support for the project. |
74 |
> |
</p> |
75 |
> |
<h2> |
76 |
> |
Features |
77 |
> |
</h2> |
78 |
> |
<h3> |
79 |
> |
Key Features of The System |
80 |
> |
</h3> |
81 |
> |
<ul> |
82 |
> |
<li>A centrally stored, dynamically reloaded, system wide |
83 |
> |
configuration system |
84 |
> |
</li> |
85 |
> |
<li>A totally extendable monitoring system: nothing except |
86 |
> |
the Host (which generates the data) and the Clients (which |
87 |
> |
view it) know any details about the data being sent, |
88 |
> |
allowing data to be modified without changes to the server |
89 |
> |
architecture. |
90 |
> |
</li> |
91 |
> |
<li>Central server and reporting tools all Java based for |
92 |
> |
multi-platform portability |
93 |
> |
</li> |
94 |
> |
<li>Distribution of core server components over CORBA to |
95 |
> |
allow appropriate components to run independently and to |
96 |
> |
allow new components to be written to conform with the |
97 |
> |
CORBA interfaces. |
98 |
> |
</li> |
99 |
> |
<li>Use of CORBA to create a hierarchical set of data entry |
100 |
> |
points to the system allowing the system to handle event |
101 |
> |
storms and remote office locations. |
102 |
> |
</li> |
103 |
> |
<li>One location for all system messages, despite being |
104 |
> |
distributed. |
105 |
> |
</li> |
106 |
> |
<li>XML data protocol used to make data processing and |
107 |
> |
analysing easily extendable |
108 |
> |
</li> |
109 |
> |
<li>A stateless server which can be moved and restarted at |
110 |
> |
will, while Hosts, Clients, and reporting tools are |
111 |
> |
unaffected and simply reconnect when the server is |
112 |
> |
available again. |
113 |
> |
</li> |
114 |
> |
<li>Simple and open end protocols to allow easy extension |
115 |
> |
and platform porting of Hosts and Clients. |
116 |
> |
</li> |
117 |
> |
<li>Self monitoring, as all data queues within the system |
118 |
> |
can be monitored and raise alerts to warn of event storms |
119 |
> |
and impending failures (should any occur). |
120 |
> |
</li> |
121 |
> |
<li>A variety of web based information displays based on |
122 |
> |
Java/SQL reporting and PHP on-the-fly page generation to |
123 |
> |
show the latest alerts and data |
124 |
> |
</li> |
125 |
> |
<li>Large overhead monitor Helpdesk style displays for |
126 |
> |
latest Alerting information |
127 |
> |
</li> |
128 |
> |
</ul> |
129 |
> |
<h3> |
130 |
> |
An Overview of the i-scream Central Monitoring System |
131 |
> |
</h3> |
132 |
> |
<p> |
133 |
> |
The i-scream system monitors status and performance |
134 |
> |
information obtained from machines feeding data into it and |
135 |
> |
then displays this information in a variety of ways. |
136 |
> |
</p> |
137 |
> |
<p> |
138 |
> |
This data is obtained through the running of small |
139 |
> |
applications on the reporting machines. These applications |
140 |
> |
are known as "Hosts". The i-scream system provides a range |
141 |
> |
of hosts which are designed to be small and lightweight in |
142 |
> |
their configuration and operation. See the website and |
143 |
> |
appropriate documentation to locate currently available |
144 |
> |
Host applications. These hosts are simply told where to |
145 |
> |
contact the server, at which point they are totally |
146 |
> |
autonomous. They are able to obtain configuration from the |
147 |
> |
server, detect changes in their configuration, send data |
148 |
> |
packets (via UDP) containing monitoring information, and |
149 |
> |
send so-called "Heartbeat" packets (via TCP) periodically |
150 |
> |
to indicate to the server that they are still alive. |
151 |
> |
</p> |
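<p>
As an illustration only, the Host behaviour described above can be sketched
in a few lines of Python. This is a minimal sketch, not the i-scream wire
format: the <code>packet</code> element, the <code>machine_name</code>
attribute, the reading names and the function names are all hypothetical,
chosen purely to show an XML payload sent as a single UDP datagram.
</p>

```python
import socket
import xml.etree.ElementTree as ET


def build_packet(machine_name, readings):
    """Serialise readings as a small XML document.

    The element and attribute names are hypothetical, not the real protocol.
    """
    root = ET.Element("packet", {"machine_name": machine_name})
    for name, value in readings.items():
        child = ET.SubElement(root, name)
        child.text = str(value)
    return ET.tostring(root, encoding="utf-8")


def send_packet(payload, server_addr):
    """Send one datagram; UDP is fire-and-forget, so no reply is awaited."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, server_addr)


if __name__ == "__main__":
    data = build_packet("host1", {"load1": 0.42, "disk_free_mb": 1024})
    # Assumed address and port; a real Host would learn these from its configuration.
    send_packet(data, ("127.0.0.1", 4589))
```

<p>
Because the payload is self-describing XML, a new reading can be added on
the Host without the server needing to know what it means, which is the
extensibility property listed under the key features.
</p>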
152 |
> |
<p> |
153 |
> |
The data is then fed into the i-scream server. The server then |
154 |
> |
splits the data two ways. First it places the data in a |
155 |
> |
database system, typically MySQL based, for later |
156 |
> |
extraction and processing by the i-scream report generation |
157 |
> |
tools. Second, it passes the data on to real-time "Clients" which |
158 |
> |
handle the data as it enters the system. The system itself |
159 |
> |
has an internal real-time client called the "Local Client" |
160 |
> |
which has a series of Monitors running which can analyse |
161 |
> |
the data. One of these Monitors also feeds the data off to |
162 |
> |
a file repository, which is updated as new data comes in |
163 |
> |
for each machine; this data is then read and displayed by |
164 |
> |
the i-scream web services to provide a web interface to the |
165 |
> |
data. The system also allows TCP connections by non-local |
166 |
> |
clients (such as the i-scream supplied Conient); these |
167 |
> |
applications provide a real-time view of the data as it |
168 |
> |
flows through the system. |
169 |
> |
</p> |
170 |
> |
<p> |
171 |
> |
The final section of the system links the Local Client |
172 |
> |
Monitors to an alerting system. These Monitors can be |
173 |
> |
configured to detect changes in the data past threshold |
174 |
> |
levels. When a threshold is breached, an alert is raised. |
175 |
> |
This alert is then escalated as the alert persists through |
176 |
> |
four live levels: NOTICE, WARNING, CAUTION and CRITICAL. |
177 |
> |
The alerting system keeps an eye on the level and when a |
178 |
> |
certain level is reached, certain alerting mechanisms fire |
179 |
> |
through whatever medium they are configured to send. |
180 |
> |
</p> |
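<p>
The escalation described above might look roughly like the following
Python sketch. It rests on assumptions that are not in the specification:
persistence is counted in consecutive breaching readings rather than
elapsed time, the <code>readings_per_level</code> parameter and the
<code>ThresholdMonitor</code> class are hypothetical, and an extra
<code>OK</code> state is added for readings below the threshold.
</p>

```python
# "OK" is an added non-alert state; the four live levels follow it in order.
LEVELS = ["OK", "NOTICE", "WARNING", "CAUTION", "CRITICAL"]


class ThresholdMonitor:
    """Escalates an alert the longer a value stays above its threshold."""

    def __init__(self, threshold, readings_per_level=3):
        self.threshold = threshold
        self.readings_per_level = readings_per_level  # hypothetical tuning knob
        self.breaches = 0  # consecutive readings above the threshold

    def update(self, value):
        """Record one reading and return the current alert level."""
        if value > self.threshold:
            self.breaches += 1
        else:
            self.breaches = 0  # the alert clears as soon as the value recovers
        if self.breaches == 0:
            return "OK"
        # The first breach raises NOTICE; every further run of
        # readings_per_level breaches moves up one level, capped at CRITICAL.
        step = 1 + (self.breaches - 1) // self.readings_per_level
        return LEVELS[min(step, len(LEVELS) - 1)]


if __name__ == "__main__":
    monitor = ThresholdMonitor(threshold=90, readings_per_level=2)
    for reading in [95, 96, 97, 98, 99, 50]:
        print(reading, monitor.update(reading))
```

<p>
An alerting mechanism would then watch the returned level and fire once
its configured level is reached.
</p>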
181 |
> |
</div> |
182 |
|
<!--#include virtual="/footer.inc" --> |
183 |
< |
|
162 |
< |
</div> |
163 |
< |
|
183 |
> |
</div> |
184 |
|
<!--#include virtual="/menu.inc" --> |
185 |
< |
|
186 |
< |
</div> |
167 |
< |
|
168 |
< |
</body> |
185 |
> |
</div> |
186 |
> |
</body> |
187 |
|
</html> |