<!--#include virtual="/doctype.inc" -->
<head>
  <title>
    CMS Features
  </title>
<!--#include virtual="/style.inc" -->
</head>
<body>
<div id="container">
<div id="main">
<!--#include virtual="/header.inc" -->
<div id="contents">
<h1 class="top">
  CMS Features
</h1>

<h2>
  Problem Specification
</h2>

<h3>
  Original Problem
</h3>

<p>
  This is the original specification given to us when we
  started the project. The i-scream central monitoring system
  meets this specification, and aims to extend it further.
  This is, however, where it all began.
</p>

<h3>
  Centralised Machine Monitoring
</h3>
<p>
  The Computer Science department has a number of different
  machines running a variety of operating systems. One of the
  tasks of the systems administrators is to make sure that the
  machines don't run out of resources. This involves watching
  processor loads, available disk space, swap space, etc.
</p>
<p>
  It isn't practical to monitor a large number of machines by
  logging on and running commands such as 'uptime' on the unix
  machines, or by using Performance Monitor on the NT servers.
  This project is therefore to write monitoring software for
  each supported platform which reports resource usage back to
  one centralised location. Systems administrators would then
  be able to monitor all machines from this central location.
</p>
<p>
  Once this basic functionality is implemented, it could
  usefully be expanded to include logging of resource usage to
  identify long-term trends and problems, and alerter services
  which can directly contact sysadmins (or even the general
  public) to bring attention to problem areas. Ideally it
  should be possible to run multiple instances of the
  reporting tool (with all instances being updated in real
  time), and to run the reporting tool both as a stand-alone
  application and embedded in a web page.
</p>
<p>
  This project will require you to write code for the unix and
  Win32 APIs using C, and will call on knowledge of how the
  underlying operating systems manage resources. It will also
  require some network/distributed systems code and a GUI
  front end for the reporting tool. It is important for
  students undertaking this project to understand the
  importance of writing small, efficient code, as the end
  product will be most useful when machines start to run out
  of processing power, memory, or disk.
</p>
<p>
  John Cinnamond (email jc), whose idea this is, will provide
  technical support for the project.
</p>
<h2>
  Features
</h2>

<h3>
  Key Features of the System
</h3>

<ul>
  <li>A centrally stored, dynamically reloaded, system-wide
    configuration system.
  </li>
  <li>A fully extensible monitoring system: nothing except the
    Hosts (which generate the data) and the Clients (which
    view it) knows any details about the data being sent,
    allowing the data to be modified without changes to the
    server architecture.
  </li>
  <li>A central server and reporting tools that are all Java
    based, for multi-platform portability.
  </li>
  <li>Distribution of the core server components over CORBA,
    allowing the appropriate components to run independently
    and new components to be written to conform to the CORBA
    interfaces.
  </li>
  <li>Use of CORBA to create a hierarchical set of data entry
    points to the system, allowing it to handle event storms
    and remote office locations.
  </li>
  <li>One location for all system messages, even though the
    system is distributed.
  </li>
  <li>An XML data protocol which makes data processing and
    analysis easily extensible (a hypothetical example is
    sketched after this list).
  </li>
  <li>A stateless server which can be moved and restarted at
    will, while Hosts, Clients, and reporting tools are
    unaffected and simply reconnect when the server is
    available again.
  </li>
  <li>Simple and open protocols at both ends, allowing easy
    extension and porting of Hosts and Clients to other
    platforms.
  </li>
  <li>Self-monitoring: all data queues within the system can
    be monitored and can raise alerts to warn of event storms
    and impending failures (should any occur).
  </li>
  <li>A variety of web-based information displays, built on
    Java/SQL reporting and on-the-fly PHP page generation, to
    show the latest alerts and data.
  </li>
  <li>Large overhead, helpdesk-style monitor displays for the
    latest alerting information.
  </li>
</ul>
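<p>
  As a purely hypothetical illustration of the kind of packet
  such an XML protocol might carry (the element and attribute
  names here are invented, not the real i-scream schema), a
  data packet could look something like this:
</p>

<pre>
&lt;packet machine_name="raptor" type="data" seq="1432"&gt;
  &lt;load&gt;0.42&lt;/load&gt;
  &lt;disk free="1024" total="8192"/&gt;
  &lt;swap free="256" total="512"/&gt;
&lt;/packet&gt;
</pre>

<p>
  Because a Client only needs to understand the fields it
  cares about, new fields can be added to such packets without
  touching the server at all.
</p>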
<h3>
  An Overview of the i-scream Central Monitoring System
</h3>

<p>
  The i-scream system monitors status and performance
  information obtained from the machines that feed data into
  it, and displays this information in a variety of ways.
</p>
<p>
  This data is obtained by running small applications on the
  reporting machines. These applications are known as "Hosts".
  The i-scream system provides a range of hosts which are
  designed to be small and lightweight in their configuration
  and operation; see the website and appropriate documentation
  to locate the currently available Host applications. These
  hosts are simply told where to contact the server, at which
  point they are totally autonomous. They are able to obtain
  configuration from the server, detect changes in their
  configuration, send data packets (via UDP) containing
  monitoring information, and periodically send so-called
  "Heartbeat" packets (via TCP) to indicate to the server that
  they are still alive.
</p>
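<p>
  The following sketch shows roughly what that traffic could
  look like from a host's point of view. It is not the real
  i-scream host code: the server name, the port numbers and
  the packet contents are all invented for the example.
</p>

<pre>
import java.io.OutputStream;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.Socket;

public class ToyHost {
    public static void main(String[] args) throws Exception {
        InetAddress server = InetAddress.getByName("iscream.example.org");

        // Data packet with monitoring information, sent via UDP.
        byte[] data = "&lt;packet machine_name=\"raptor\" type=\"data\"/&gt;"
                          .getBytes();
        try (DatagramSocket udp = new DatagramSocket()) {
            udp.send(new DatagramPacket(data, data.length, server, 4589));
        }

        // "Heartbeat" packet sent via TCP, telling the server that
        // this host is still alive.
        try (Socket tcp = new Socket(server, 4590);
             OutputStream out = tcp.getOutputStream()) {
            out.write("&lt;packet type=\"heartbeat\"/&gt;\n".getBytes());
        }
    }
}
</pre>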
<p>
  This data is then fed into the i-scream server, which splits
  it two ways. First, it places the data in a database system,
  typically MySQL based, for later extraction and processing
  by the i-scream report generation tools. Second, it passes
  the data on to real-time "Clients", which handle it as it
  enters the system. The system itself has an internal
  real-time client, called the "Local Client", which runs a
  series of Monitors that can analyse the data. One of these
  Monitors also feeds the data off to a file repository, which
  is updated as new data comes in for each machine; this data
  is then read and displayed by the i-scream web services to
  provide a web interface to the data. The system also allows
  TCP connections by non-local clients (such as the
  i-scream-supplied Conient); these applications provide a
  real-time view of the data as it flows through the system.
</p>
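<p>
  The relationship between the Local Client and its Monitors
  can be pictured as a simple interface: every Monitor sees
  every packet and decides for itself what to do with it. The
  interface and class names below are invented for
  illustration only.
</p>

<pre>
// A minimal stand-in for a parsed data packet.
class HostData {
    String machineName;
    String xml;
}

// Each Monitor receives every packet the Local Client sees.
interface Monitor {
    void analyse(HostData packet);
}

// A hypothetical Monitor that mirrors the newest data for each
// machine into a file repository for the web services to read.
class FileRepositoryMonitor implements Monitor {
    public void analyse(HostData packet) {
        try (java.io.FileWriter out =
                 new java.io.FileWriter("latest-" + packet.machineName + ".xml")) {
            out.write(packet.xml);
        } catch (java.io.IOException e) {
            // A real Monitor would raise an internal alert here.
        }
    }
}
</pre>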
<p>
  The final section of the system links the Local Client
  Monitors to an alerting system. These Monitors can be
  configured to detect when the data crosses threshold levels.
  When a threshold is breached, an alert is raised; as the
  alert persists, it is escalated through four levels: NOTICE,
  WARNING, CAUTION and CRITICAL. The alerting system watches
  the level, and when a given level is reached, the alerting
  mechanisms configured for that level fire through whatever
  medium they are set up to use.
</p>
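<p>
  The escalation logic can be sketched in a few lines: an
  alert that keeps persisting steps up one level at a time,
  and each step fires whatever mechanisms are configured for
  the new level. Only the four level names come from the
  system itself; everything else in this sketch is invented.
</p>

<pre>
public class Escalator {
    // The four levels an alert passes through as it persists.
    enum Level { NOTICE, WARNING, CAUTION, CRITICAL }

    private Level level = Level.NOTICE;

    // Called each time a re-check finds the threshold still
    // breached; the alert climbs one level, topping out at
    // CRITICAL.
    public void stillBreached() {
        if (level.ordinal() &lt; Level.CRITICAL.ordinal()) {
            level = Level.values()[level.ordinal() + 1];
        }
        fireMechanismsFor(level);
    }

    private void fireMechanismsFor(Level l) {
        // e.g. email on WARNING, a pager on CRITICAL --
        // whatever media are configured for this level.
        System.out.println("alert now at " + l);
    }
}
</pre>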
</div>
<!--#include virtual="/footer.inc" -->
</div>
<!--#include virtual="/menu.inc" -->
</div>
</body>
</html>