root/i-scream/web/www/cms/features.shtml
Revision: 1.3
Committed: Sun Mar 21 23:58:13 2004 UTC (20 years, 9 months ago) by tdb
Branch: MAIN
Changes since 1.2: +89 -40 lines
Log Message:
Commit new website. The old site is tagged, so this won't change the live
site... but it does move HEAD on to the new site.

Too many changes to list really. General points are:

- Moved to a XHTML CSS compliant site.
- Reorganised the site into a more multi-project based look.
- Removed a lot of cruft.

Still to do:

- Fix all the zillions of bugs stopping the whole site from validating :-)
- Tidy up the HTML in terms of layout and indentation.

Thanks to AJ for his help this weekend in doing this.

File Contents

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<html>

<head>
<title>CMS Features</title>
<!--#include virtual="/style.inc" -->
</head>

<body>

<div id="container">

<div id="main">

<!--#include virtual="/header.inc" -->

<div id="contents">

<h1 class="top">CMS Features</h1>

<h2>Problem Specification</h2>

<h3>Original Problem</h3>

<p>
This is the original specification given to us when we
started the project. The i-scream central monitoring
system meets this specification, and aims to extend it
further. This is, however, where it all began.
</p>

<h3>Centralised Machine Monitoring</h3>

<p>
The Computer Science department has a number of different machines
running a variety of different operating systems. One of the tasks
of the systems administrators is to make sure that the machines
don't run out of resources. This involves watching processor loads,
available disk space, swap space, etc.
</p>

<p>
It isn't practical to monitor a large number of machines by logging
on and running commands such as 'uptime' on the unix machines, or
by using Performance Monitor for NT servers. Thus this project is
to write monitoring software for each supported platform which
reports resource usage back to one centralised location. System
Administrators would then be able to monitor all machines from this
centralised location.
</p>

<p>
Once this basic functionality is implemented it could usefully be
expanded to include logging of resource usage to identify long-term
trends/problems, and alerter services which can directly contact
sysadmins (or even the general public) to bring attention to problem
areas. Ideally it should be possible to run multiple instances of
the reporting tool (with all instances being updated in real-time)
and to be able to run the reporting tool both as a stand-alone
application and embedded in a web page.
</p>

<p>
This project will require you to write code for the unix and Win32
APIs using C, and knowledge of how the underlying operating systems
manage resources. It will also require some network/distributed
systems code and a GUI front end for the reporting tool. It is
important for students undertaking this project to understand the
importance of writing efficient and small code, as the end product
will really be most useful when machines start to run out of processing
power/memory/disk.
</p>

<p>
John Cinnamond (email jc), whose idea this is, will provide technical
support for the project.
</p>

<h2>Features</h2>

<h3>Key Features of The System</h3>

<ul>
<li>A centrally stored, dynamically reloaded, system-wide configuration system.</li>
<li>A totally extendable monitoring system: nothing except the Host (which
generates the data) and the Clients (which view it) knows any details about
the data being sent, allowing data to be modified without changes to the
server architecture.</li>
<li>Central server and reporting tools all Java based for multi-platform portability.</li>
<li>Distribution of core server components over CORBA to allow appropriate components
to run independently, and to allow new components to be written to conform with the
CORBA interfaces.</li>
<li>Use of CORBA to create a hierarchical set of data entry points to the system,
allowing the system to handle event storms and remote office locations.</li>
<li>One location for all system messages, despite being distributed.</li>
<li>An XML data protocol used to make data processing and analysis easily extendable.</li>
<li>A stateless server which can be moved and restarted at will, while Hosts,
Clients, and reporting tools are unaffected and simply reconnect when the
server is available again.</li>
<li>Simple and open protocols at each end to allow easy extension and platform porting of Hosts
and Clients.</li>
<li>Self monitoring: all data queues within the system can be monitored and raise
alerts to warn of event storms and impending failures (should any occur).</li>
<li>A variety of web-based information displays based on Java/SQL reporting and
PHP on-the-fly page generation to show the latest alerts and data.</li>
<li>Large overhead-monitor, Helpdesk-style displays for the latest alerting information.</li>
</ul>

<h3>An Overview of the i-scream Central Monitoring System</h3>

<p>
The i-scream system monitors status and performance information
obtained from machines feeding data into it and then displays
this information in a variety of ways.
</p>

<p>
This data is obtained through the running of small applications
on the reporting machines. These applications are known as
"Hosts". The i-scream system provides a range of hosts which are
designed to be small and lightweight in their configuration and
operation. See the website and appropriate documentation to
locate currently available Host applications. These hosts are
simply told where to contact the server, at which point they are
totally autonomous. They are able to obtain configuration from
the server, detect changes in their configuration, send data
packets (via UDP) containing monitoring information, and send
so-called "Heartbeat" packets (via TCP) periodically to indicate
to the server that they are still alive.
</p>

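<p>
As a rough illustration of the host behaviour described above, the sketch
below builds a hypothetical XML data packet and decides when a heartbeat is
due. The class, method and element names here are invented for this example
and are not the actual i-scream host code or wire format; a real host would
send the data packet over UDP and the heartbeat over TCP.
</p>

```java
// Illustrative sketch only: names and the XML layout are hypothetical,
// not the real i-scream host implementation or wire format.
public class HostSketch {

    // Build an example XML data packet of the kind a host might send
    // (via UDP) to the server.
    static String buildPacket(String hostname, double load, long freeDiskKb) {
        return "<packet machine_name=\"" + hostname + "\">"
             + "<load>" + load + "</load>"
             + "<disk_free>" + freeDiskKb + "</disk_free>"
             + "</packet>";
    }

    // A heartbeat (sent via TCP) is due once the configured interval,
    // obtained from the server's configuration, has elapsed.
    static boolean heartbeatDue(long lastBeatMillis, long nowMillis, long intervalMillis) {
        return nowMillis - lastBeatMillis >= intervalMillis;
    }

    public static void main(String[] args) {
        System.out.println(buildPacket("raptor", 0.42, 1048576));
        System.out.println(heartbeatDue(0, 60000, 60000)); // prints true
    }
}
```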
<p>
This data is then fed into the i-scream server. The server splits
the data two ways. First, it places the data in a database system,
typically MySQL based, for later extraction and processing by the
i-scream report generation tools. It then passes it on to
real-time "Clients" which handle the data as it enters the system.
The system itself has an internal real-time client called the "Local
Client" which runs a series of Monitors that can analyse the
data. One of these Monitors also feeds the data off to a file
repository, which is updated as new data comes in for each machine;
this data is then read and displayed by the i-scream web services
to provide a web interface to the data. The system also allows TCP
connections by non-local clients (such as the i-scream supplied
Conient), which provide a real-time view of the data
as it flows through the system.
</p>

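<p>
The two-way split described above can be pictured as a simple fan-out:
every packet entering the server is handed to each registered destination,
such as a database writer and the real-time client feed. The following is a
minimal sketch with invented names, not the actual i-scream server code,
which distributes its components over CORBA.
</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Minimal sketch of the server's two-way split (invented names; the real
// server distributes data between CORBA-connected components).
public class FanOut {
    private final List<Consumer<String>> destinations = new ArrayList<>();

    void register(Consumer<String> destination) {
        destinations.add(destination);
    }

    // Hand one incoming packet to every registered destination.
    void dispatch(String packet) {
        for (Consumer<String> destination : destinations) {
            destination.accept(packet);
        }
    }

    public static void main(String[] args) {
        FanOut server = new FanOut();
        List<String> database = new ArrayList<>(); // stands in for the MySQL writer
        List<String> realtime = new ArrayList<>(); // stands in for real-time Clients
        server.register(database::add);
        server.register(realtime::add);
        server.dispatch("<packet machine_name=\"raptor\"/>");
        System.out.println(database.size() + " " + realtime.size()); // prints 1 1
    }
}
```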
<p>
The final section of the system links the Local Client Monitors to
an alerting system. These Monitors can be configured to detect
changes in the data past threshold levels. When a threshold is
breached an alert is raised. As the alert persists, it is escalated
through four alert levels: NOTICE, WARNING, CAUTION
and CRITICAL. The alerting system keeps an eye on the level, and
when a certain level is reached the configured alerting mechanisms
fire through whatever medium they are set up to use.
</p>
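<p>
The escalation ladder above can be sketched as follows; the level names come
from the text, while the single-step escalation logic is an invented
illustration rather than the actual alerting code.
</p>

```java
// Sketch of the alert escalation ladder: level names are from the text,
// the escalation logic itself is a hypothetical illustration.
public class AlertEscalation {
    enum Level { NOTICE, WARNING, CAUTION, CRITICAL }

    // While an alert persists it climbs one level at a time,
    // then stays at CRITICAL.
    static Level escalate(Level current) {
        Level[] levels = Level.values();
        int next = Math.min(current.ordinal() + 1, levels.length - 1);
        return levels[next];
    }

    public static void main(String[] args) {
        Level level = Level.NOTICE;
        for (int i = 0; i < 4; i++) {
            System.out.println(level);
            level = escalate(level);
        }
        // prints NOTICE, WARNING, CAUTION, CRITICAL
    }
}
```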
</div>

<!--#include virtual="/footer.inc" -->

</div>

<!--#include virtual="/menu.inc" -->

</div>

</body>
</html>