<!--#include virtual="/doctype.inc" -->
  <head>
    <title>
      CMS Features
    </title>
<!--#include virtual="/style.inc" -->
 
 
 
 
 
 
 
 
 
 
 
  </head>
  <body>
    <div id="container">
      <div id="main">
<!--#include virtual="/header.inc" -->
 
 
 
 
 
 
 
 
 
 
 
        <div id="contents">
          <h1 class="top">
            CMS Features
          </h1>
          <h2>
            Problem Specification
          </h2>
          <h3>
            Original Problem
          </h3>
 
 
 
 
 
          <p>
            This is the original specification given to us when we
            started the project. The i-scream central monitoring system
            meets this specification, and aims to extend it further.
            This is, however, where it all began.
          </p>
 
 
 
 
 
          <h3>
            Centralised Machine Monitoring
          </h3>
          <p>
            The Computer Science department has a number of different
            machines running a variety of different operating systems.
            One of the tasks of the systems administrators is to make
            sure that the machines don't run out of resources. This
            involves watching processor loads, available disk space,
            swap space, etc.
          </p>
 
 
 
 
 
          <p>
            It isn't practical to monitor a large number of machines by
            logging on and running commands such as 'uptime' on the
            Unix machines, or by using Performance Monitor on NT
            servers. Thus this project is to write monitoring software
            for each supported platform which reports resource usage
            back to one centralised location. System administrators
            would then be able to monitor all machines from this
            centralised location.
          </p>
 
 
 
 
 
          <p>
            Once this basic functionality is implemented, it could
            usefully be expanded to include logging of resource usage
            to identify long-term trends and problems, and alerter
            services which can directly contact sysadmins (or even the
            general public) to bring attention to problem areas.
            Ideally it should be possible to run multiple instances of
            the reporting tool (with all instances being updated in
            real time), and to be able to run the reporting tool both
            as a stand-alone application and embedded in a web page.
          </p>
 
 
 
 
 
          <p>
            This project will require you to write code for the Unix
            and Win32 APIs using C, and knowledge of how the underlying
            operating systems manage resources. It will also require
            some network/distributed systems code and a GUI front end
            for the reporting tool. It is important for students
            undertaking this project to understand the importance of
            writing small and efficient code, as the end product will
            really be most useful when machines start to run out of
            processing power, memory, or disk.
          </p>
 
 
 
 
 
          <p>
            John Cinnamond (email jc), whose idea this is, will provide
            technical support for the project.
          </p>
 
 
 
 
 
          <h2>
            Features
          </h2>
          <h3>
            Key Features of the System
          </h3>
 
 
 
 
 
          <ul>
            <li>A centrally stored, dynamically reloaded, system-wide
            configuration system.
            </li>
            <li>A totally extendable monitoring system: nothing except
            the Hosts (which generate the data) and the Clients (which
            view it) knows any details about the data being sent,
            allowing the data to be modified without changes to the
            server architecture.
            </li>
            <li>A central server and reporting tools that are all Java
            based, for multi-platform portability.
            </li>
            <li>Distribution of core server components over CORBA, to
            allow appropriate components to run independently and to
            allow new components to be written to conform with the
            CORBA interfaces.
            </li>
            <li>Use of CORBA to create a hierarchical set of data entry
            points to the system, allowing the system to handle event
            storms and remote office locations.
            </li>
            <li>One location for all system messages, despite the
            system being distributed.
            </li>
            <li>An XML data protocol, used to make data processing and
            analysis easily extendable.
            </li>
            <li>A stateless server which can be moved and restarted at
            will, while Hosts, Clients, and reporting tools are
            unaffected and simply reconnect when the server is
            available again.
            </li>
            <li>Simple and open protocols at each end, to allow easy
            extension of Hosts and Clients and their porting to new
            platforms.
            </li>
            <li>Self monitoring: all data queues within the system can
            be monitored and raise alerts to warn of event storms and
            impending failures (should any occur).
            </li>
            <li>A variety of web-based information displays, based on
            Java/SQL reporting and PHP on-the-fly page generation, to
            show the latest alerts and data.
            </li>
            <li>Large overhead-monitor, Helpdesk-style displays for the
            latest alerting information.
            </li>
          </ul>
 
 
 
 
 
          <h3>
            An Overview of the i-scream Central Monitoring System
          </h3>
          <p>
            The i-scream system monitors status and performance
            information obtained from the machines that feed data into
            it, and then displays this information in a variety of
            ways.
          </p>
 
 
 
 
 
          <p>
            This data is obtained by running small applications on the
            reporting machines. These applications are known as
            "Hosts". The i-scream system provides a range of Hosts
            which are designed to be small and lightweight in their
            configuration and operation; see the website and the
            appropriate documentation to locate the currently available
            Host applications. These Hosts are simply told where to
            contact the server, at which point they are totally
            autonomous. They are able to obtain configuration from the
            server, detect changes in their configuration, send data
            packets (via UDP) containing monitoring information, and
            periodically send so-called "Heartbeat" packets (via TCP)
            to indicate to the server that they are still alive.
          </p>
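As a rough illustration of the Host behaviour described above — monitoring data sent as fire-and-forget UDP datagrams, heartbeats sent over TCP so the Host learns whether the server is reachable — here is a minimal Python sketch. It is not the real Host code (the actual Hosts are written in C against the native platform APIs), and the "HEARTBEAT" payload is an assumption for illustration only.

```python
import socket

def send_data(addr, payload: bytes) -> None:
    # Monitoring data goes out as a single UDP datagram: fire-and-forget,
    # mirroring the Hosts' UDP data packets described in the text.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.sendto(payload, addr)

def send_heartbeat(addr) -> bool:
    # The heartbeat uses TCP, so the Host can tell whether the server
    # actually accepted the connection; returns False when it is down.
    try:
        with socket.create_connection(addr, timeout=5) as s:
            s.sendall(b"HEARTBEAT\n")  # payload format is an assumption
        return True
    except OSError:
        return False
```

Because the data channel is UDP, a lost packet simply means one missing sample; the TCP heartbeat is what tells the server a Host is alive.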
 
 
 
 
 
          <p>
            This data is then fed into the i-scream server. The server
            splits the data two ways. First, it places the data in a
            database system, typically MySQL based, for later
            extraction and processing by the i-scream report generation
            tools. It then passes the data on to real-time "Clients",
            which handle it as it enters the system. The system itself
            has an internal real-time client called the "Local Client",
            which runs a series of Monitors that can analyse the data.
            One of these Monitors also feeds the data off to a file
            repository, which is updated as new data comes in for each
            machine; this data is then read and displayed by the
            i-scream web services to provide a web interface to the
            data. The system also allows TCP connections by non-local
            clients (such as the i-scream supplied Conient); these
            applications provide a real-time view of the data as it
            flows through the system.
          </p>
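The two-way split described above can be sketched in a few lines. This is only an illustration of the flow (hypothetical names, not the actual Java server code): one path archives each packet for later report generation, the other fans it out to every connected real-time client.

```python
def dispatch(packet: dict, archive: list, clients: list) -> None:
    # First way: keep the packet for later extraction and reporting
    # (stands in for the SQL database insert).
    archive.append(packet)
    # Second way: hand the packet to every real-time client as it
    # enters the system (the Local Client would be one of these).
    for handle in clients:
        handle(packet)
```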
 
 
 
 
 
          <p>
            The final section of the system links the Local Client
            Monitors to an alerting system. These Monitors can be
            configured to detect changes in the data past threshold
            levels. When a threshold is breached, an alert is raised.
            This alert is then escalated, as it persists, through four
            live levels: NOTICE, WARNING, CAUTION and CRITICAL. The
            alerting system keeps an eye on the level, and when a
            certain level is reached, the alerting mechanisms
            configured for that level fire through whatever medium
            they are set up to send.
          </p>
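The escalation behaviour can be illustrated with a small sketch: an alert starts at NOTICE and climbs one level each time it persists for another interval, capping at CRITICAL. The mapping from persistence time to level, and the 60-second interval, are illustrative assumptions rather than the real i-scream configuration.

```python
# The four live alert levels, in escalation order.
LEVELS = ["NOTICE", "WARNING", "CAUTION", "CRITICAL"]

def alert_level(seconds_active: float, interval: float = 60.0) -> str:
    # Climb one level per elapsed interval, never past CRITICAL.
    step = min(int(seconds_active // interval), len(LEVELS) - 1)
    return LEVELS[step]
```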
 
 
 
 
 
        </div>
 
 
 
 
 
 
 
 
 
 
 
<!--#include virtual="/footer.inc" -->
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
      </div>
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
<!--#include virtual="/menu.inc" -->
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
    </div>
  </body>
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
</html>