Thursday, August 17, 2006

XML database retrieval and capture.

Original Brief (as per Proposed Project Specifications document:
3. Design and implementation of general purpose Lift/Escalator information database.
- Design and implement fully normalised data structures
- Specify complete data dictionary to support all analysis and presentation requirements
- Design and implementation of XML interface to accept data and add it to datastore
- Design and implementation of efficient queries (including necessary indexing)
- Investigate storage technology options (eg object database)
- Investigate query technologies (eg SQL, X-Query)
- Investigate integration with third party applications (eg Word, Excel)

This brief is open to interpretation and should be treated as a list of possible areas to cover. There is no restriction on further work being added or items being omitted where desirable.

Three aspects of database design:
Structure (tables, columns and indexes),
Content (storing real data) and
Reporting (retrieval and analysis of data).

Standard method for relational database design is "Normalisation of Data" – ensures that data is in accessible form (ie it is well structured, not duplicated and can be searched effectively using the necessary criteria). This should be performed on the known data attributes to produce a well structured database.
Database selection:
Suggest using MySQL database, which is freely available, runs on multiple platforms (including Windows), and is in common use. Using XML with MySQL

There are alternatives (including: and ) - research required to identify these and compare benefits.
Xindice - uses XPath and XUpdate. Good for sending and retrieving partial data sets. Not suitable for large files. Not widely used and hard to find available scripts. Not well developed.

Native XML access to a database (ie you don't have to write software to convert to/from XML format in order to store/retreive data from the database) can be achieved via XPath and Xupdate.
There is no requirement for the database to support large numbers of concurrent users not to hold vast quantities of data or very large data items, so all these aspects may be dealt with summarily.
Data Capture:
Data will arrive in small discrete "events" encoded in XML (eg after decoding means that Lift "1", Group "Annex - High Rise", Building "Northwest Bank Building", changed Status to – "Out-of-Service" at 15:33 on 24-02-2006). This is then stored in the database table for Lift Status and indexed against the appropriate lift, group and building). Decoding of incoming XML could be via standard (procedural) programming language (eg Java, PHP, many others) or could be via XSL transform (which simply transforms XML into a different format using transformation rules) or could be directly to an XML interface of the database if such is available (eg XUpdate).

Data Retrieval:
Could be via SQL (the standard db query language) where pre-formatted queries which take parameters can themselves be stored in the database (stored procedures). Could be via XML queries in the XPath language.

Queries will use the indexes of the relevant database tables to select the required data and to filter out irrelevant data (eg Select * from LiftStatus where Bld=45 AND Grp=2 AND date>=01/01/2005 AND date<01/01/06). NB in this query the textual name of the Grp and Building have been replaced by enumerations (ie 2 & 45) which may be necessary for efficient querying (instead of having textual indexes but adds a layer of complication because each building has to be assigned a numeric ID. An alternative, easliy available in Java is to generate a "hash value" from the text and use that as the indexable ID value.

Analysis to be supported:
Section 2 of the original CCMS Architecture document identifies three possible types of presentation which the stored data and subsequent analysis must be able to support.

- Status report of %age In-service, Maintenance, Independent and Out-of-Service (eg as pie charts etc) for a selected lift, group or building
- Traffic analysis in terms of calls registered and response times (eg as a histogram showing numbers of calls in 5-minute samples over a period). This would be analysed for a group or a building but may be isolated for specific floors or groups of floors and may also specific call types (eg down calls, car calls, etc). A variation of this, which is much more valuable, is to identify separately "incoming traffic", "outgoing traffic" and "inter-floor traffic" (derived from the PhD thesis of Dr. Marja-Liisa Siikonen, now working for Kone), but this is very difficult given that we only have call data and not individual journey/passenger data (as would be available from a simulation).
- Dynamic graphical display of current status and operation of a group of lifts. This data is dynamic and is not really suited to storage in a database – it is just a current "snap-shot" – so may not include this. Use CIBSE Guide D and/or the Standard Elevator Information Schema to identify the data types that are relevant.

A good source of possible analysis presentations and techniques would be Richard Peters' Elevate simulator though again with the proviso that the data provided by the simulator is passenger-oriented and contains much more detail than can be expected from a live lift system.

Project write-up and analysis
Should include discussion of efficiency of data storage and subsequent retrieval of queries.
Should discuss adherence to standards.
Flexibility for future extension/enhancement and what those enhancements might be.

How the database should be set-up/configured and subsequently managed ie which buildings, how many groups in each building and how many lifts in each group.

Control of access to specific types of data by specified groups of users and what those groups might be (ie what is this used for and by whom).

No comments: