<!--#include virtual="/doctype.inc" -->
  <head>
    <title>
      CMS Features
    </title>
<!--#include virtual="/style.inc" -->
  </head>
  <body>
    <div id="container">
      <div id="main">
<!--#include virtual="/header.inc" -->
        <div id="contents">
          <h1 class="top">
            CMS Features
          </h1>
          <h2>
            Problem Specification
          </h2>
          <h3>
            Original Problem
          </h3>
          <p>
            This is the original specification given to us when we
            started the project. The i-scream central monitoring system
            meets this specification, and aims to extend it further.
            This is, however, where it all began.
          </p>
          <h3>
            Centralised Machine Monitoring
          </h3>
          <p>
            The Computer Science department has a number of different
            machines running a variety of different operating systems.
            One of the tasks of the systems administrators is to make
            sure that the machines don't run out of resources. This
            involves watching processor loads, available disk space,
            swap space, etc.
          </p>
          <p>
            It isn't practical to monitor a large number of machines by
            logging on and running commands such as 'uptime' on the
            unix machines, or by using Performance Monitor on NT
            servers. This project is therefore to write monitoring
            software for each supported platform which reports resource
            usage back to one centralised location, from which systems
            administrators can then monitor all of the machines.
          </p>
          <p>
            Once this basic functionality is implemented, it could
            usefully be expanded to include logging of resource usage
            to identify long-term trends and problems, and alerter
            services which can directly contact sysadmins (or even the
            general public) to bring attention to problem areas.
            Ideally it should be possible to run multiple instances of
            the reporting tool (with all instances being updated in
            real time), and to run the reporting tool both as a
            stand-alone application and embedded in a web page.
          </p>
          <p>
            This project will require you to write code for the unix
            and Win32 APIs using C, and knowledge of how the underlying
            operating systems manage resources. It will also require
            some network/distributed systems code and a GUI front end
            for the reporting tool. It is important for students
            undertaking this project to understand the importance of
            writing small, efficient code, as the end product will be
            most useful precisely when machines start to run out of
            processing power, memory, or disk.
          </p>
          <p>
            John Cinnamond (email jc), whose idea this is, will provide
            technical support for the project.
          </p>
          <h2>
            Features
          </h2>
          <h3>
            Key Features of the System
          </h3>
          <ul>
            <li>A centrally stored, dynamically reloaded, system-wide
            configuration system.
            </li>
            <li>A totally extensible monitoring system: nothing except
            the Hosts (which generate the data) and the Clients (which
            view it) knows any details about the data being sent,
            allowing the data to be modified without changes to the
            server architecture.
            </li>
            <li>A central server and reporting tools that are all Java
            based, for multi-platform portability.
            </li>
            <li>Distribution of core server components over CORBA, to
            allow appropriate components to run independently and to
            allow new components to be written to conform with the
            CORBA interfaces.
            </li>
            <li>Use of CORBA to create a hierarchical set of data entry
            points to the system, allowing the system to handle event
            storms and remote office locations.
            </li>
            <li>One location for all system messages, despite the
            system being distributed.
            </li>
            <li>An XML data protocol, making data processing and
            analysis easily extensible.
            </li>
            <li>A stateless server which can be moved and restarted at
            will, while Hosts, Clients, and reporting tools are
            unaffected and simply reconnect when the server is
            available again.
            </li>
            <li>Simple and open end protocols, to allow easy extension
            and platform porting of Hosts and Clients.
            </li>
            <li>Self monitoring: all data queues within the system can
            be monitored and raise alerts to warn of event storms and
            impending failures (should any occur).
            </li>
            <li>A variety of web-based information displays, based on
            Java/SQL reporting and PHP on-the-fly page generation, to
            show the latest alerts and data.
            </li>
            <li>Large overhead-monitor, helpdesk-style displays for the
            latest alerting information.
            </li>
          </ul>
          <h3>
            An Overview of the i-scream Central Monitoring System
          </h3>
          <p>
            The i-scream system monitors status and performance
            information obtained from the machines feeding data into
            it, and then displays this information in a variety of
            ways.
          </p>
          <p>
            This data is obtained by running small applications on the
            reporting machines. These applications are known as
            "Hosts". The i-scream system provides a range of Hosts
            which are designed to be small and lightweight in their
            configuration and operation. See the website and
            appropriate documentation to locate the currently available
            Host applications. These Hosts are simply told where to
            contact the server, at which point they are totally
            autonomous. They are able to obtain configuration from the
            server, detect changes in their configuration, send data
            packets (via UDP) containing monitoring information, and
            periodically send so-called "Heartbeat" packets (via TCP)
            to indicate to the server that they are still alive.
          </p>
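          <p>
            As a rough illustration of the Host reporting channel
            described above, the sketch below sends one UDP data packet
            to a stand-in server socket. The field names and JSON
            encoding are invented for this example and are not the real
            i-scream wire format.
          </p>

```python
import json
import socket
import time

# Hypothetical sketch of a Host's UDP data channel. The packet fields and
# JSON encoding are invented for illustration; real i-scream Hosts use
# their own packet format, documented separately.

def make_packet(hostname):
    # A minimal set of the resources the specification mentions.
    return json.dumps({
        "host": hostname,
        "time": int(time.time()),
        "load": 0.42,           # processor load (placeholder value)
        "disk_free_mb": 1024,   # available disk space (placeholder)
        "swap_free_mb": 256,    # swap space (placeholder)
    }).encode()

# A local UDP socket stands in for the i-scream server's data entry point.
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))   # ephemeral port
server.settimeout(5)
server_addr = server.getsockname()

# The Host fires its data packet at the server and does not wait for a
# reply; "Heartbeat" packets would travel over a separate TCP connection.
host = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
host.sendto(make_packet("examplehost"), server_addr)

data, _ = server.recvfrom(4096)
packet = json.loads(data)
print(packet["host"])   # → examplehost
```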
 
 
 
 
 
          <p>
            This data is then fed into the i-scream server. The server
            splits the data two ways. First, it places the data in a
            database system, typically MySQL based, for later
            extraction and processing by the i-scream report generation
            tools. It then passes the data on to real-time "Clients",
            which handle the data as it enters the system. The system
            itself has an internal real-time client called the "Local
            Client", which runs a series of Monitors that can analyse
            the data. One of these Monitors also feeds the data off to
            a file repository, which is updated as new data comes in
            for each machine; this data is then read and displayed by
            the i-scream web services to provide a web interface to the
            data. The system also allows TCP connections by non-local
            clients (such as the i-scream supplied Conient); these
            applications provide a real-time view of the data as it
            flows through the system.
          </p>
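          <p>
            The two-way split described above can be sketched in a few
            lines. The class and method names here are invented for
            illustration, and a plain list stands in for the MySQL
            store.
          </p>

```python
# Illustrative sketch of the server's two-way data split. Names are
# invented; a list stands in for the MySQL store, and any callable can
# act as a real-time client (such as the Local Client).

class Server:
    def __init__(self):
        self.database = []   # path 1: storage for later report generation
        self.clients = []    # path 2: connected real-time clients

    def attach(self, client):
        self.clients.append(client)

    def receive(self, packet):
        self.database.append(packet)   # keep for historical reporting
        for client in self.clients:    # deliver to everyone in real time
            client(packet)

seen = []
server = Server()
server.attach(seen.append)   # a trivial stand-in for the Local Client
server.receive({"host": "examplehost", "load": 0.42})
print(len(server.database), len(seen))   # → 1 1
```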
 
 
 
 
 
          <p>
            The final section of the system links the Local Client
            Monitors to an alerting system. These Monitors can be
            configured to detect when the data passes threshold
            levels. When a threshold is breached, an alert is raised.
            As the alert persists it is escalated through four levels:
            NOTICE, WARNING, CAUTION and CRITICAL. The alerting system
            keeps an eye on the level, and when a certain level is
            reached the relevant alerting mechanisms fire, through
            whatever medium they are configured to send.
          </p>
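          <p>
            The escalation behaviour described above amounts to a small
            state machine. The sketch below is a minimal, invented
            illustration of that idea, not the real i-scream Monitor
            code or its configuration.
          </p>

```python
# Invented sketch of threshold escalation: a breach raises NOTICE, and a
# persisting breach climbs one level per check, saturating at CRITICAL.

LEVELS = ["NOTICE", "WARNING", "CAUTION", "CRITICAL"]

def escalate(level):
    # Move one level up the scale, stopping at CRITICAL.
    i = LEVELS.index(level)
    return LEVELS[min(i + 1, len(LEVELS) - 1)]

class Monitor:
    def __init__(self, threshold):
        self.threshold = threshold
        self.level = None   # no active alert

    def feed(self, value):
        if value > self.threshold:
            # first breach raises NOTICE; a persisting breach escalates
            self.level = "NOTICE" if self.level is None else escalate(self.level)
        else:
            self.level = None   # back below threshold: alert clears
        return self.level

m = Monitor(threshold=90)   # e.g. percent disk usage
print([m.feed(v) for v in [95, 96, 97, 98, 99, 50]])
# → ['NOTICE', 'WARNING', 'CAUTION', 'CRITICAL', 'CRITICAL', None]
```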
 
 
 
 
 
          </div>
<!--#include virtual="/footer.inc" -->
        </div>
<!--#include virtual="/menu.inc" -->
      </div>
  </body>
</html>