root/i-scream/web/www/cms/features.xhtml
Revision: 1.5
Committed: Tue Mar 23 23:43:26 2004 UTC (20 years, 7 months ago) by tdb
Branch: MAIN
Changes since 1.4: +181 -163 lines
Log Message:
Another biggish commit.

All pages are now XHTML 1.1 compliant. I've also tidied (with the help of
the tidy tool) all the pages, so they're neater.

There are still parts of the site that won't validate - such as the CGI
scripts, and the CVS stuff - but I'll get to them tomorrow.

File Contents

# User Rev Content
1 tdb 1.4 <!--#include virtual="/doctype.inc" -->
2 tdb 1.5 <head>
3     <title>
4     CMS Features
5     </title>
6 tdb 1.3 <!--#include virtual="/style.inc" -->
7 tdb 1.5 </head>
8     <body>
9     <div id="container">
10     <div id="main">
11 tdb 1.3 <!--#include virtual="/header.inc" -->
12 tdb 1.5 <div id="contents">
13     <h1 class="top">
14     CMS Features
15     </h1>
16     <h2>
17     Problem Specification
18     </h2>
19     <h3>
20     Original Problem
21     </h3>
22     <p>
23     This is the original specification given to us when we
24     started the project. The i-scream central monitoring system
25     meets this specification, and aims to extend it further.
26     This is, however, where it all began.
27     </p>
28     <h3>
29     Centralised Machine Monitoring
30     </h3>
31     <p>
32     The Computer Science department has a number of different
33     machines running a variety of different operating systems.
34     One of the tasks of the systems administrators is to make
35     sure that the machines don't run out of resources. This
36     involves watching processor loads, available disk space,
37     swap space, etc.
38     </p>
39     <p>
40     It isn't practical to monitor a large number of machines by
41     logging on and running commands such as 'uptime' on the
42     unix machines, or by using performance monitor for NT
43     servers. Thus this project is to write monitoring software
44     for each platform supported which reports resource usage
45     back to one centralised location. System Administrators
46     would then be able to monitor all machines from this
47     centralised location.
48     </p>
49     <p>
50     Once this basic functionality is implemented it could
51     usefully be expanded to include logging of resource usage
52     to identify long-term trends/problems, and alerter services
53     which can directly contact sysadmins (or even the general
54     public) to bring attention to problem areas. Ideally it
55     should be possible to run multiple instances of the
56     reporting tool (with all instances being updated in
57     real-time) and to be able to run the reporting tool
58     both as a stand-alone application and embedded in a web page.
59     </p>
60     <p>
61     This project will require you to write code for the unix
62     and Win32 APIs using C, and knowledge of how the underlying
63     operating systems manage resources. It will also require
64     some network/distributed systems code and a GUI front end
65     for the reporting tool. It is important for students
66     undertaking this project to understand the importance of
67     writing efficient and small code as the end product will
68     really be most useful when machines start to run out of
69     processing power/memory/disk.
70     </p>
71     <p>
72     John Cinnamond (email jc), whose idea this is, will provide
73     technical support for the project.
74     </p>
75     <h2>
76     Features
77     </h2>
78     <h3>
79     Key Features of The System
80     </h3>
81     <ul>
82     <li>A centrally stored, dynamically reloaded, system-wide
83     configuration system
84     </li>
85     <li>A totally extendable monitoring system: nothing except
86     the Host (which generates the data) and the Clients (which
87     view it) knows any details about the data being sent,
88     allowing data to be modified without changes to the server
89     architecture.
90     </li>
91     <li>Central server and reporting tools all Java based for
92     multi-platform portability
93     </li>
94     <li>Distribution of core server components over CORBA to
95     allow appropriate components to run independently and to
96     allow new components to be written to conform with the
97     CORBA interfaces.
98     </li>
99     <li>Use of CORBA to create a hierarchical set of data entry
100     points to the system, allowing it to handle event
101     storms and remote office locations.
102     </li>
103     <li>One location for all system messages, despite being
104     distributed.
105     </li>
106     <li>An XML data protocol used to make data processing and
107     analysis easily extendable
108     </li>
109     <li>A stateless server which can be moved and restarted at
110     will, while Hosts, Clients, and reporting tools are
111     unaffected and simply reconnect when the server is
112     available again.
113     </li>
114     <li>Simple and open protocols to allow easy extension
115     and platform porting of Hosts and Clients.
116     </li>
117     <li>Self-monitoring, as all data queues within the system
118     can be monitored and raise alerts to warn of event storms
119     and impending failures (should any occur).
120     </li>
121     <li>A variety of web-based information displays based on
122     Java/SQL reporting and PHP on-the-fly page generation to
123     show the latest alerts and data
124     </li>
125     <li>Large overhead-monitor, Helpdesk-style displays for
126     the latest alerting information
127     </li>
128     </ul>
129     <h3>
130     An Overview of the i-scream Central Monitoring System
131     </h3>
132     <p>
133     The i-scream system monitors status and performance
134     information obtained from machines feeding data into it and
135     then displays this information in a variety of ways.
136     </p>
137     <p>
138     This data is obtained through the running of small
139     applications on the reporting machines. These applications
140     are known as "Hosts". The i-scream system provides a range
141     of hosts which are designed to be small and lightweight in
142     their configuration and operation. See the website and
143     appropriate documentation to locate currently available
144     Host applications. These hosts are simply told where to
145     contact the server, at which point they are totally
146     autonomous. They are able to obtain configuration from the
147     server, detect changes in their configuration, send data
148     packets (via UDP) containing monitoring information, and
149     send so-called "Heartbeat" packets (via TCP) periodically
150     to indicate to the server that they are still alive.
151     </p>
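The Host reporting scheme described above (UDP data packets plus periodic TCP heartbeats) could be sketched roughly as below; the packet layout and the machine name are illustrative assumptions, not the real i-scream Host protocol.

```java
public class HostSketch {
    // Build a data packet payload; the system sends XML-formatted data,
    // but this particular element layout is an assumption for illustration.
    static String dataPacket(String machine, double load) {
        return "<packet machine_name=\"" + machine + "\" type=\"data\">"
                + "<load>" + load + "</load></packet>";
    }

    // Heartbeats just tell the server the host is still alive.
    static String heartbeat(String machine) {
        return "HEARTBEAT " + machine;
    }

    public static void main(String[] args) {
        // In a real Host the payload would go out over UDP
        // (java.net.DatagramSocket) and the heartbeat over TCP
        // (java.net.Socket); sockets are omitted to keep the sketch short.
        System.out.println(dataPacket("raptor", 0.42));
        System.out.println(heartbeat("raptor"));
    }
}
```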
152     <p>
153     This data is then fed into the i-scream server. The server
154     splits the data two ways. First it places the data in a
155     database system, typically MySQL based, for later
156     extraction and processing by the i-scream report generation
157     tools. It then passes it on to real-time "Clients" which
158     handle the data as it enters the system. The system itself
159     has an internal real-time client called the "Local Client"
160     which has a series of Monitors running which can analyse
161     the data. One of these Monitors also feeds the data off to
162     a file repository, which is updated as new data comes in
163     for each machine; this data is then read and displayed by
164     the i-scream web services to provide a web interface to the
165     data. The system also allows TCP connections by non-local
166     clients (such as the i-scream supplied Conient); these
167     applications provide a real-time view of the data as it
168     flows through the system.
169     </p>
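The two-way split described above amounts to a simple fan-out; in this sketch the class and method names are illustrative assumptions rather than the real i-scream server API, and an in-memory list stands in for the MySQL database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class ServerSketch {
    final List<String> database = new ArrayList<>(); // stand-in for MySQL
    final List<Consumer<String>> clients = new ArrayList<>();

    // Real-time clients (the Local Client, Conient, ...) register to
    // receive each packet as it arrives.
    void register(Consumer<String> client) {
        clients.add(client);
    }

    // Each incoming packet is both stored for the report generation
    // tools and pushed straight out to every connected client.
    void receive(String packet) {
        database.add(packet);
        for (Consumer<String> client : clients) {
            client.accept(packet);
        }
    }

    public static void main(String[] args) {
        ServerSketch server = new ServerSketch();
        List<String> seen = new ArrayList<>();
        server.register(seen::add); // a Local Client style monitor
        server.receive("<packet machine_name=\"raptor\" type=\"data\"/>");
        System.out.println(server.database.size() + " stored, "
                + seen.size() + " delivered");
    }
}
```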
170     <p>
171     The final section of the system links the Local Client
172     Monitors to an alerting system. These Monitors can be
173     configured to detect changes in the data past threshold
174     levels. When a threshold is breached an alert is raised.
175     This alert is then escalated as the alert persists through
176     four alert levels: NOTICE, WARNING, CAUTION and CRITICAL.
177     The alerting system watches the level and, when a given
178     level is reached, the alerting mechanisms configured for
179     that level fire through whatever medium they are set up to use.
180     </p>
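The four-level escalation might be modelled with an ordered enum, as in this sketch; the rule that an alert steps up exactly one level each time its threshold check fails again is an assumption, since the precise timing and configuration rules are not described here.

```java
public class AlertSketch {
    // The four alert levels named above, in escalation order.
    enum Level { NOTICE, WARNING, CAUTION, CRITICAL }

    // Escalate one step, capping at CRITICAL.
    static Level escalate(Level current) {
        int next = Math.min(current.ordinal() + 1, Level.CRITICAL.ordinal());
        return Level.values()[next];
    }

    public static void main(String[] args) {
        Level level = Level.NOTICE; // threshold first breached
        for (int i = 0; i < 5; i++) {
            System.out.println(level);
            level = escalate(level); // the breach persists, so step up
        }
    }
}
```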
181     </div>
182 tdb 1.3 <!--#include virtual="/footer.inc" -->
183 tdb 1.5 </div>
184 tdb 1.3 <!--#include virtual="/menu.inc" -->
185 tdb 1.5 </div>
186     </body>
187 tdb 1.1 </html>