| 1 | tdb | 1.4 | <!--#include virtual="/doctype.inc" --> | 
 
 
 
 
 
 
 
 | 2 | tdb | 1.1 |  | 
 
 
 
 
 | 3 |  |  | <head> | 
 
 
 
 
 
 
 
 | 4 | tdb | 1.3 | <title>CMS Features</title> | 
 
 
 
 
 | 5 |  |  | <!--#include virtual="/style.inc" --> | 
 
 
 
 
 
 
 
 | 6 | tdb | 1.1 | </head> | 
 
 
 
 
 | 7 |  |  |  | 
 
 
 
 
 
 
 
 | 8 | tdb | 1.3 | <body> | 
 
 
 
 
 | 9 |  |  |  | 
 
 
 
 
 | 10 |  |  | <div id="container"> | 
 
 
 
 
 | 11 |  |  |  | 
 
 
 
 
 | 12 |  |  | <div id="main"> | 
 
 
 
 
 | 13 |  |  |  | 
 
 
 
 
 | 14 |  |  | <!--#include virtual="/header.inc" --> | 
 
 
 
 
 | 15 |  |  |  | 
 
 
 
 
 | 16 |  |  | <div id="contents"> | 
 
 
 
 
 | 17 |  |  |  | 
 
 
 
 
 | 18 |  |  | <h1 class="top">CMS Features</h1> | 
 
 
 
 
 
 
 
 | 19 | tdb | 1.1 |  | 
 
 
 
 
 
 
 
 | 20 | tdb | 1.3 | <h2>Problem Specification</h2> | 
 
 
 
 
 | 21 |  |  |  | 
 
 
 
 
 | 22 |  |  | <h3>Original Problem</h3> | 
 
 
 
 
 | 23 |  |  |  | 
 
 
 
 
 | 24 |  |  | <p> | 
 
 
 
 
 | 25 |  |  | This is the original specification given to us when we | 
 
 
 
 
 | 26 |  |  | started the project. The i-scream central monitoring | 
 
 
 
 
 | 27 |  |  | system meets this specification, and aims to extend it | 
 
 
 
 
 | 28 |  |  | further. This is, however, where it all began. | 
 
 
 
 
 | 29 |  |  | </p> | 
 
 
 
 
 
 
 
 | 30 | tdb | 1.1 |  | 
 
 
 
 
 
 
 
 | 31 | tdb | 1.3 | <h3>Centralised Machine Monitoring</h3> | 
 
 
 
 
 | 32 |  |  |  | 
 
 
 
 
 | 33 |  |  | <p> | 
 
 
 
 
 | 34 |  |  | The Computer Science department has a number of different machines | 
 
 
 
 
 | 35 |  |  | running a variety of different operating systems. One of the tasks | 
 
 
 
 
 | 36 |  |  | of the systems administrators is to make sure that the machines | 
 
 
 
 
 | 37 |  |  | don't run out of resources. This involves watching processor loads, | 
 
 
 
 
 | 38 |  |  | available disk space, swap space, etc. | 
 
 
 
 
 | 39 |  |  | </p> | 
 
 
 
 
 | 40 |  |  |  | 
 
 
 
 
 | 41 |  |  | <p> | 
 
 
 
 
 | 42 |  |  | It isn't practical to monitor a large number of machines by logging | 
 
 
 
 
 | 43 |  |  | on and running commands such as 'uptime' on the unix machines, or | 
 
 
 
 
 | 44 |  |  | by using performance monitor for NT servers. Thus this project is | 
 
 
 
 
 | 45 |  |  | to write monitoring software for each platform supported which | 
 
 
 
 
 | 46 |  |  | reports resource usage back to one centralised location. System | 
 
 
 
 
 | 47 |  |  | Administrators would then be able to monitor all machines from this | 
 
 
 
 
 | 48 |  |  | centralised location. | 
 
 
 
 
 | 49 |  |  | </p> | 
 
 
 
 
 | 50 |  |  |  | 
 
 
 
 
 | 51 |  |  | <p> | 
 
 
 
 
 | 52 |  |  | Once this basic functionality is implemented it could usefully be | 
 
 
 
 
 | 53 |  |  | expanded to include logging of resource usage to identify long-term | 
 
 
 
 
 | 54 |  |  | trends/problems, alerter services which can directly contact | 
 
 
 
 
 | 55 |  |  | sysadmins (or even the general public) to bring attention to problem | 
 
 
 
 
 | 56 |  |  | areas. Ideally it should be possible to run multiple instances of | 
 
 
 
 
 | 57 |  |  | the reporting tool (with all instances being updated in realtime) | 
 
 
 
 
 | 58 |  |  | and to be able to run the reporting tool both as a stand-alone | 
 
 
 
 
 | 59 |  |  | application and embedded in a web page. | 
 
 
 
 
 | 60 |  |  | </p> | 
 
 
 
 
 | 61 |  |  |  | 
 
 
 
 
 | 62 |  |  | <p> | 
 
 
 
 
 | 63 |  |  | This project will require you to write code for the unix and Win32 | 
 
 
 
 
 | 64 |  |  | APIs using C and knowledge of how the underlying operating systems | 
 
 
 
 
 | 65 |  |  | manage resources. It will also require some network/distributed | 
 
 
 
 
 | 66 |  |  | systems code and a GUI front end for the reporting tool. It is | 
 
 
 
 
 | 67 |  |  | important for students undertaking this project to understand the | 
 
 
 
 
 | 68 |  |  | importance of writing efficient and small code as the end product | 
 
 
 
 
 | 69 |  |  | will really be most useful when machines start to run out of processing | 
 
 
 
 
 | 70 |  |  | power/memory/disk. | 
 
 
 
 
 | 71 |  |  | </p> | 
 
 
 
 
 | 72 |  |  |  | 
 
 
 
 
 | 73 |  |  | <p> | 
 
 
 
 
 | 74 |  |  | John Cinnamond (email jc), whose idea this is, will provide technical | 
 
 
 
 
 | 75 |  |  | support for the project. | 
 
 
 
 
 | 76 |  |  | </p> | 
 
 
 
 
 | 77 |  |  |  | 
 
 
 
 
 | 78 |  |  | <h2>Features</h2> | 
 
 
 
 
 | 79 |  |  |  | 
 
 
 
 
 | 80 |  |  | <h3>Key Features of The System</h3> | 
 
 
 
 
 
 
 
 | 81 | tdb | 1.1 |  | 
 
 
 
 
 | 82 |  |  | <ul> | 
 
 
 
 
 | 83 |  |  | <li>A centrally stored, dynamically reloaded, system-wide configuration system</li> | 
 
 
 
 
 | 84 |  |  | <li>A totally extendable monitoring system: nothing except the Host (which | 
 
 
 
 
 | 85 |  |  | generates the data) and the Clients (which view it) know any details about | 
 
 
 
 
 | 86 |  |  | the data being sent, allowing data to be modified without changes to the | 
 
 
 
 
 | 87 |  |  | server architecture.</li> | 
 
 
 
 
 | 88 |  |  | <li>Central server and reporting tools all Java based for multi-platform portability</li> | 
 
 
 
 
 | 89 |  |  | <li>Distribution of core server components over CORBA to allow appropriate components | 
 
 
 
 
 | 90 |  |  | to run independently and to allow new components to be written to conform with the | 
 
 
 
 
 | 91 |  |  | CORBA interfaces.</li> | 
 
 
 
 
 | 92 |  |  | <li>Use of CORBA to create a hierarchical set of data entry points to the system | 
 
 
 
 
 | 93 |  |  | allowing the system to handle event storms and remote office locations.</li> | 
 
 
 
 
 | 94 |  |  | <li>One location for all system messages, despite being distributed.</li> | 
 
 
 
 
 | 95 |  |  | <li>XML data protocol used to make data processing and analysing easily extendable</li> | 
 
 
 
 
 | 96 |  |  | <li>A stateless server which can be moved and restarted at will, while Hosts, | 
 
 
 
 
 | 97 |  |  | Clients, and reporting tools are unaffected and simply reconnect when the | 
 
 
 
 
 | 98 |  |  | server is available again.</li> | 
 
 
 
 
 | 99 |  |  | <li>Simple and open end protocols to allow easy extension and platform porting of Hosts | 
 
 
 
 
 | 100 |  |  | and Clients.</li> | 
 
 
 
 
 | 101 |  |  | <li>Self monitoring, as all data queues within the system can be monitored and raise | 
 
 
 
 
 | 102 |  |  | alerts to warn of event storms and impending failures (should any occur).</li> | 
 
 
 
 
 | 103 |  |  | <li>A variety of web based information displays based on Java/SQL reporting and | 
 
 
 
 
 | 104 |  |  | PHP on-the-fly page generation to show the latest alerts and data</li> | 
 
 
 
 
 | 105 |  |  | <li>Large overhead monitor Helpdesk style displays for latest Alerting information</li> | 
 
 
 
 
 | 106 |  |  | </ul> | 
 
 
 
 
 | 107 |  |  |  | 
 
 
 
 
 
 
 
 | 108 | tdb | 1.3 | <h3>An Overview of the i-scream Central Monitoring System</h3> | 
 
 
 
 
 
 
 
 | 109 | tdb | 1.1 |  | 
 
 
 
 
 
 
 
 | 110 | tdb | 1.3 | <p> | 
 
 
 
 
 
 
 
 | 111 | tdb | 1.1 | The i-scream system monitors status and performance information | 
 
 
 
 
 | 112 |  |  | obtained from machines feeding data into it and then displays | 
 
 
 
 
 | 113 |  |  | this information in a variety of ways. | 
 
 
 
 
 | 114 |  |  | </p> | 
 
 
 
 
 | 115 |  |  |  | 
 
 
 
 
 
 
 
 | 116 | tdb | 1.3 | <p> | 
 
 
 
 
 
 
 
 | 117 | tdb | 1.1 | This data is obtained through the running of small applications | 
 
 
 
 
 | 118 |  |  | on the reporting machines.  These applications are known as | 
 
 
 
 
 | 119 |  |  | "Hosts".  The i-scream system provides a range of hosts which are | 
 
 
 
 
 | 120 |  |  | designed to be small and lightweight in their configuration and | 
 
 
 
 
 | 121 |  |  | operation.  See the website and appropriate documentation to | 
 
 
 
 
 | 122 |  |  | locate currently available Host applications.  These hosts are | 
 
 
 
 
 | 123 |  |  | simply told where to contact the server at which point they are | 
 
 
 
 
 | 124 |  |  | totally autonomous.  They are able to obtain configuration from | 
 
 
 
 
 | 125 |  |  | the server, detect changes in their configuration, send data | 
 
 
 
 
 | 126 |  |  | packets (via UDP) containing monitoring information, and send | 
 
 
 
 
 | 127 |  |  | so called "Heartbeat" packets (via TCP) periodically to indicate | 
 
 
 
 
 | 128 |  |  | to the server that they are still alive. | 
 
 
 
 
 | 129 |  |  | </p> | 
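<p>
As a rough illustration of the Host behaviour described above, the sketch
below builds a small XML monitoring packet and sends it over UDP. It is not
the i-scream wire format: the element names, the attribute names, and the
functions <code>build_packet</code> and <code>send_packet</code> are all
hypothetical, chosen only to show the shape of the idea.
</p>

```python
import socket
import time
import xml.etree.ElementTree as ET


def build_packet(machine_name, readings):
    """Build one XML monitoring packet as bytes.

    The "packet" element and its attributes are hypothetical names,
    not the real i-scream protocol.
    """
    root = ET.Element(
        "packet",
        {"machine_name": machine_name, "date": str(int(time.time()))},
    )
    # One child element per reading, e.g. <load1>0.5</load1>.
    for name, value in readings.items():
        child = ET.SubElement(root, name)
        child.text = str(value)
    return ET.tostring(root, encoding="utf-8")


def send_packet(payload, server_host, server_port):
    """Send one packet over UDP. Delivery is not guaranteed, which is
    why a separate periodic heartbeat is useful for liveness."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (server_host, server_port))
```

<p>
A Host built along these lines would call <code>build_packet</code> on each
polling interval and pass the result to <code>send_packet</code>, using the
server address it was told at startup.
</p>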
 
 
 
 
 | 130 |  |  |  | 
 
 
 
 
 
 
 
 | 131 | tdb | 1.3 | <p> | 
 
 
 
 
 
 
 
 | 132 | tdb | 1.1 | The data is then fed into the i-scream server.  The server then splits | 
 
 
 
 
 | 133 |  |  | the data two ways.  First it places the data in a database system, | 
 
 
 
 
 | 134 |  |  | typically MySQL based, for later extraction and processing by the | 
 
 
 
 
 | 135 |  |  | i-scream report generation tools.  It then passes it on to | 
 
 
 
 
 | 136 |  |  | real-time "Clients" which handle the data as it enters the system. | 
 
 
 
 
 | 137 |  |  | The system itself has an internal real-time client called the "Local | 
 
 
 
 
 | 138 |  |  | Client" which has a series of Monitors running which can analyse the | 
 
 
 
 
 | 139 |  |  | data.  One of these Monitors also feeds the data off to a file | 
 
 
 
 
 | 140 |  |  | repository, which is updated as new data comes in for each machine; | 
 
 
 
 
 | 141 |  |  | this data is then read and displayed by the i-scream web services | 
 
 
 
 
 | 142 |  |  | to provide a web interface to the data.  The system also allows TCP | 
 
 
 
 
 | 143 |  |  | connections by non-local clients (such as the i-scream supplied | 
 
 
 
 
 | 144 |  |  | Conient); these applications provide a real-time view of the data | 
 
 
 
 
 | 145 |  |  | as it flows through the system. | 
 
 
 
 
 | 146 |  |  | </p> | 
 
 
 
 
 | 147 |  |  |  | 
 
 
 
 
 
 
 
 | 148 | tdb | 1.3 | <p> | 
 
 
 
 
 
 
 
 | 149 | tdb | 1.1 | The final section of the system links the Local Client Monitors to | 
 
 
 
 
 | 150 |  |  | an alerting system.  These Monitors can be configured to detect | 
 
 
 
 
 | 151 |  |  | changes in the data past threshold levels.  When a threshold is | 
 
 
 
 
 | 152 |  |  | breached an alert is raised.  This alert is then escalated as the | 
 
 
 
 
 | 153 |  |  | alert persists through four live levels: NOTICE, WARNING, CAUTION | 
 
 
 
 
 | 154 |  |  | and CRITICAL.  The alerting system keeps an eye on the level and | 
 
 
 
 
 | 155 |  |  | when a certain level is reached, certain alerting mechanisms fire | 
 
 
 
 
 | 156 |  |  | through whatever medium they are configured to send. | 
 
 
 
 
 | 157 |  |  | </p> | 
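<p>
The escalation described above can be sketched as a small state machine. This
is a minimal illustration under stated assumptions, not the i-scream
implementation: the class <code>ThresholdMonitor</code>, the
<code>polls_per_level</code> parameter, and the rule that an alert moves up
one level after a fixed number of consecutive breached readings are all
hypothetical. Only the four level names and their order come from the text.
</p>

```python
# Live alert levels, in the escalation order given in the text.
LEVELS = ["NOTICE", "WARNING", "CAUTION", "CRITICAL"]


class ThresholdMonitor:
    """Hypothetical monitor that escalates while a breach persists."""

    def __init__(self, threshold, polls_per_level=3):
        self.threshold = threshold
        # Assumption: escalate one level every `polls_per_level` breaches.
        self.polls_per_level = polls_per_level
        self.breached_polls = 0

    def update(self, value):
        """Feed one reading; return the current level, or None if OK."""
        if value <= self.threshold:
            # Back under the threshold: clear the alert.
            self.breached_polls = 0
            return None
        self.breached_polls += 1
        index = (self.breached_polls - 1) // self.polls_per_level
        # Stay at the highest level once it has been reached.
        return LEVELS[min(index, len(LEVELS) - 1)]
```

<p>
An alerting mechanism would then compare the returned level against the level
it is configured to fire at, and send through its configured medium once that
level is reached.
</p>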
 
 
 
 
 
 
 
 | 158 | tdb | 1.3 | </div> | 
 
 
 
 
 | 159 |  |  |  | 
 
 
 
 
 | 160 |  |  | <!--#include virtual="/footer.inc" --> | 
 
 
 
 
 | 161 |  |  |  | 
 
 
 
 
 | 162 |  |  | </div> | 
 
 
 
 
 | 163 |  |  |  | 
 
 
 
 
 | 164 |  |  | <!--#include virtual="/menu.inc" --> | 
 
 
 
 
 | 165 |  |  |  | 
 
 
 
 
 | 166 |  |  | </div> | 
 
 
 
 
 
 
 
 | 167 | tdb | 1.1 |  | 
 
 
 
 
 | 168 |  |  | </body> | 
 
 
 
 
 | 169 |  |  | </html> |