
1. Introduction

1.1. Abstract

This document outlines the current design ideas and considerations for the Photizo project. This includes the Data Importer; Authentication Framework; Station/Sensor Profile Editing; Data Store; Data Processing; Data Display: Current Conditions and Last 24 Hours; Station and Sensor Tests; Test Result Storage; and Diagnostic Pages and Notifications.

1.2. Design Summary

Photizo is composed of several discrete components of varying size and complexity. The most visible will be the Current Conditions display, but it will also be relatively simple, since it involves little more than data extraction and display. The most complex, and least visible, parts will be the Station/Sensor Profile Editing and the Station and Sensor Tests. The current plan is for the data store to be a MySQL database, but an ORM will be used, so ultimately the backend database will not matter. There will be further abstraction on top of the ORM to make storing and retrieving readings for stations simple and (hopefully) efficient.

1.3. Component Diagram

This diagram shows (conceptually) how the various components in Photizo will be related.

1.4. A note about software reuse

We will, whenever practical, reuse existing code and software. Trying to reinvent the wheel--especially a mature, well designed wheel--consumes inordinate amounts of time, and often results in a poorly implemented wheel.

Some packages we will probably be using:

  • PyTables for storing data. The hierarchical format will work well with station data, and the CSTables sister project will be used for client/server communication.
  • SymPy Computer Algebra System will allow arbitrary equations for converting and processing incoming readings.
  • Python Dateutil for manipulating and comparing times.
  • Thrift: "Thrift is a software framework for scalable cross-language services development." It allows defining a service once, and auto-generating code for access to that service in multiple languages. More information is in the white paper.
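As a concrete illustration of the SymPy point above, a conversion equation could be stored as a string in a sensor profile and applied to incoming raw readings. The equation and names below are made up for the example:

```python
import sympy

# Hypothetical conversion stored in a sensor profile: a linear
# voltage-to-Celsius equation kept as a plain string.
expr = sympy.sympify("25 * v - 40")
v = sympy.Symbol("v")

def convert(reading):
    # Substitute the raw reading for the symbol and evaluate to a float.
    return float(expr.subs(v, reading))

print(convert(2.0))  # 25 * 2.0 - 40 = 10.0
```

Because the equation is parsed at run time, stations can define arbitrary per-sensor conversions without code changes.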

2. Design

2.1. Data Import Modules

2.1.1. Discussion

Currently, data is being imported from files in the TAO5 CSV-like format. There are four header lines (with the second line being the column names), followed by data in a CSV-like format. However, other formats, such as the data logger "Array" format, which has no column headers, may also be imported. An import module will implement an interface which can be used to obtain data from the data format it has been designed to read and process.

Data importer classes should inherit from a common base class. Not only will this make object typing easier, it will also avoid duplicate code by centralizing such things as exception definitions, as well as a "universal file opener" which will enable a class to open just about any URL (file, http, ftp, etc.) by default.

2.1.2. Methods

Loading readings

  • The importer needs to be told where to find the data.
  • It needs a list of columns it is expected to find, and it must raise exceptions if it does not find these columns, or finds more columns than it is expecting.
  • If given a date/time, it will only pull in data at and after that date/time. The module must convert the date/time to the format used for storing date/time in the logger data.

Returning the readings after a load

  • The module must return the readings the importer just read in.
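The requirements above might be sketched as a base class like the following. All class and method names are illustrative, not a final API; the "universal file opener" here simply leans on urllib:

```python
import urllib.request

class ImporterError(Exception):
    """Base exception shared by all importers."""

class ColumnMismatchError(ImporterError):
    """Raised when expected columns are missing or extra columns appear."""

class BaseImporter:
    """Hypothetical common base class for data import modules."""

    def __init__(self, url, expected_columns):
        self.url = url
        self.expected_columns = list(expected_columns)
        self.readings = []

    def open_source(self):
        # "Universal file opener": urllib handles file://, http://, ftp://.
        return urllib.request.urlopen(self.url)

    def check_columns(self, found):
        # Must fail on missing columns and on unexpected extras.
        if set(found) != set(self.expected_columns):
            raise ColumnMismatchError(
                "expected %s, found %s" % (self.expected_columns, found))

    def load(self, since=None):
        """Read data (optionally only rows at/after 'since').
        Format-specific subclasses (e.g. TAO5, Array) implement this."""
        raise NotImplementedError

    def get_readings(self):
        """Return the readings from the most recent load()."""
        return self.readings
```

A TAO5 importer subclass would implement load() to skip the four header lines, verify the column-name line via check_columns(), and parse the remaining CSV rows.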

2.2. Data Store Structure and Layout

2.2.1. Discussion

On the surface, the storage of data is easy: read it in, put it in a table, and read it out later. Unfortunately, it's not that simple. Every table in a data logger might have a different number of fields. There is no universal structure between tables, nor a universal structure between data loggers for a given table (e.g. HourlyAvg, HrlyDiag, etc.). As discussed in Data Storage and Retrieval, there are two possible ways to go about this. One is an Entity-Attribute-Value model, and the other is one data table per logger table. While neither is a very elegant solution, it seems one data table per logger table will be the best solution in the long run.


Advantages:

  • Much faster data access, as there is no need to do the row/column transforms every time data is inserted or retrieved.
  • When the tables are opened with viewing tools, the data is in a "human readable" format.
  • If data is accessed from other languages, the routines to do the row/column transforms do not have to be rewritten.

Disadvantages:

  • A new table must be created every time the column headers in a logger table change. This is not as large a problem as it may seem, since the profile for the station will have to be updated anyway; the new table can be created when the profile is updated.
  • Logic must be written to search across all data tables for a particular logger table. A scheme, probably date based, needs to be devised to determine when a particular profile/data table pair was in use.

2.2.2. Methodology

Every time a profile affecting a logger table is changed, the associated storage table must be checked:

  • If the change is nothing more than a reordering of columns, then no changes are needed.
  • If the number of columns changes, or the names of columns change, then a new table must be created.
  • However, if the change is nothing more than renaming columns (only the name changes, not the actual function of the column, or what it measures), then it is possible that only the structure of the storage table would be modified.
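The decision logic above might be sketched as a small helper. The action names and the rename heuristic are assumptions for illustration:

```python
def table_change_action(old_columns, new_columns):
    """Decide what to do with a storage table when a logger-table
    profile changes (illustrative names, not a final API)."""
    if list(new_columns) == list(old_columns):
        return "no-op"
    if sorted(new_columns) == sorted(old_columns):
        # Pure reordering of columns: no storage change needed.
        return "no-op"
    if len(new_columns) != len(old_columns):
        # Column count changed: archive old table, create a new one.
        return "new-table"
    # Same count, different names: possibly a rename-in-place,
    # pending a human decision on whether the column's function changed.
    return "maybe-rename"

print(table_change_action(["AirTC", "RH"], ["RH", "AirTC"]))        # no-op
print(table_change_action(["AirTC", "RH"], ["AirTC", "RH", "BP"]))  # new-table
```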

See more discussion in the section about profile editing.

2.3. Station and Sensor Profiles

2.3.1. Discussion

At the station (logger) level, Photizo must know what sensors are hooked up to the station, what the operating parameters are for each sensor, and what columns in the logger tables are fed by each sensor.

2.3.2. Sensor Profiles

  • The operating parameters, such as high and low values, as well as what a failure code might be, must be in the data store so they can be compared against incoming data.
  • The maximum effective range must also be known. For example, some temperature sensors only give valid readings down to -40 Celsius. Once a sensor reaches this value, its readings must be held suspect, and/or another sensor, which produces valid readings at temperatures colder than this, must be used.
  • A sensor should have a type. For example, any sensor that measures air temperature would have a type such as "AirTemp." Which AirTemp sensor to use at any moment would then be determined by the sensor's priority (see next point) and by checks that the sensor is operating in range, as well as any other tests defined on the sensor.
  • A sensor should have a priority. For example, if three identical sensors are installed for redundancy, it should be possible to specify a primary, secondary, and tertiary sensor. The primary sensor would be used unless and until something indicated its values should no longer be trusted; then it would "fail over" to the secondary sensor, and so on.
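The type/priority/failover rules could be sketched as follows. The dict-based sensor records and the in_range flag are stand-ins for the real profile and test machinery:

```python
def select_sensor(sensors, sensor_type):
    """Pick the highest-priority in-range sensor of a given type.
    Returns None if no sensor of that type is currently usable."""
    candidates = [s for s in sensors
                  if s["type"] == sensor_type and s["in_range"]]
    if not candidates:
        return None
    # Lower number = higher priority (1 = primary, 2 = secondary, ...).
    return min(candidates, key=lambda s: s["priority"])

sensors = [
    {"name": "T107_a", "type": "AirTemp", "priority": 1, "in_range": False},
    {"name": "T107_b", "type": "AirTemp", "priority": 2, "in_range": True},
]
# The primary sensor is out of range, so we fail over to the secondary.
print(select_sensor(sensors, "AirTemp")["name"])  # T107_b
```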

2.3.3. Station Profiles

  • The full complement of sensors installed at a station must be defined.
  • The data tables being produced by the station must be defined.
  • The names of the columns in the tables, and the sensors to which they refer, must be defined.
  • Each sensor defined will need (yet to be defined) metadata such as location, mounting height, mounting depth, etc.

2.3.4. Meta Sensors

Meta sensors relate closely to sensor priorities. A meta sensor will define a virtual sensor which pulls its data from two or more other sensors. The value reported will be based on rules such as priority and failover, or on mathematical rules such as taking an average.
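A rough sketch of how a meta sensor might resolve its value under the priority and averaging rules; the data representation is invented for the example:

```python
def meta_sensor_value(readings, rule="priority"):
    """Resolve a meta sensor from its member sensors' readings.
    readings: list of (priority, value) pairs; value None means the
    sensor failed a test or is out of range. Illustrative only."""
    good = [(p, v) for p, v in readings if v is not None]
    if not good:
        return None
    if rule == "priority":
        # Value from the highest-priority (lowest-numbered) good sensor.
        return min(good)[1]
    if rule == "average":
        return sum(v for _, v in good) / len(good)
    raise ValueError("unknown rule: %r" % rule)

print(meta_sensor_value([(1, None), (2, -12.5)]))              # -12.5
print(meta_sensor_value([(1, -12.0), (2, -13.0)], "average"))  # -12.5
```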

2.4. Station and Sensor Profile/Metadata Editing

2.4.1. Discussion

Having metadata at all is good, but unless the metadata is kept up-to-date and accurate, it becomes useless in a hurry. Thus, we must have facilities to quickly and easily edit the profiles and metadata. A web interface will be used. See below for discussion and design of the Auth/Auth capabilities.

2.4.2. Sensor Profiles/Metadata

Each sensor must store the information about itself as detailed at RequiredSensorMetadata. Some of that information will be "global" for the sensor, and other metadata will only apply to a specific installation (such as captions and expected readings range).

2.4.3. Station Profiles/Metadata

Each station must store information about itself detailed at RequiredStationMetadata. As mentioned above, the current table configuration must be stored, as well as past table configurations. To simplify things, much of the required station, table, and column metadata will be stored in the table itself since PyTables supports metadata on tables and columns.
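Assuming PyTables is used as planned, attaching station-, table-, and column-level metadata to the table itself might look like this (attribute names and values are illustrative):

```python
import tables

class Reading(tables.IsDescription):
    timestamp = tables.Time64Col()
    air_temp = tables.Float32Col()

with tables.open_file("station.h5", mode="w") as h5:
    table = h5.create_table("/", "HourlyAvg", Reading)
    # Metadata is stored as attributes directly on the table.
    table.attrs.station_name = "ExampleStation"
    table.attrs.valid_from = "2007-04-01"       # date-coverage tracking
    table.attrs.air_temp_caption = "Air Temperature (C)"

with tables.open_file("station.h5", mode="r") as h5:
    print(h5.root.HourlyAvg.attrs.station_name)
```

Keeping the metadata inside the HDF5 file means a data file remains self-describing even when copied away from the main system.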

2.4.4. Editing the Metadata

Upon login, the web interface will present a list of networks (e.g. North Slope Lakes, CCHRC, Fairbanks Mesonet, etc.), as well as a link to the list of defined sensors.

The user will be able to click on one of these networks and be presented with a list of stations in the network. Clicking on a station will present some information about the station, including, but not limited to, the current station metadata and the tables defined on the station. Clicking on a "Sensors" link would lead to a list of sensors, each editable (at least the "station level" settings). Clicking on a table name would give the table's list of columns, each editable by clicking on the column.

Editing the Network

  • Each network has a list of stations, possibly broken down into sub-networks (i.e. groups) to align along funding or other lines.
  • The order in which stations are to be displayed must be stored too.

Editing a Station

  • Metadata for a station will be editable by those with appropriate credentials.

Editing Defined Sensors

  • It must be possible to delete sensors existing on a station (although this will probably not be a common occurrence), as well as add sensors from the currently defined global list of sensors.
  • Editing existing sensors (station-level metadata) will also be possible.
  • Adding, editing, and deleting global sensors must also be possible.
  • Referential integrity must be assured at all times.

Editing Station Tables

  • It must be possible to add, delete, and modify tables on a station.
  • If a table definition changes, the old table must be archived, and a new table created. The procedure for archiving has not been determined yet (beyond renaming the table).
  • The date coverage of a table must be tracked, either in a separate file, or in the metadata of the table itself.
  • Tests will be defined at the table level, since a test outcome depends on the range of the input.

2.5. Retrieving Data

2.5.1. Discussion

Now that we have the data imported and stored, and optionally tested, we need to be able to retrieve the data for display, graphing, or other analysis. Right now, the planned interfaces are direct access (opening the database files directly), web forms (for downloading extracts of data), and some kind of RPC mechanism (probably using Thrift; see the "Software Reuse" section). We want the data to be easily accessible from a variety of platforms and interfaces, and to allow others to obtain the data they want with a minimum of help from us. Web forms and RPC mechanisms will assist in this.

2.5.2. Retrieving Metadata

  • All data accesses will be checked against the Auth/Auth system.
  • Once logged in, you will be able to retrieve the networks, stations, and tables available for download/introspection.
  • There will be an "anonymous" user level that can retrieve all public data.
  • Most metadata shall be available, but information such as the IPs of Moxas and PakBus IDs should not be made public (i.e. not available via anonymous access).

2.5.3. Retrieving Data

  • Data will be retrieved by specifying the network, station, table, date range, and (optionally) the columns desired.
  • Optionally, it will be possible to specify that a column name should be replaced by its "friendly name," if a friendly name is defined for that column.
  • All stations will have a metasensors table that will retrieve data based on the rules defined for that station's metasensors.
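A toy sketch of the retrieval call described above, using an in-memory stand-in for the data store; the function signature and friendly-name mapping are assumptions, not a settled API:

```python
from datetime import datetime

# In-memory stand-in for the real backend, keyed by network/station/table.
STORE = {
    ("Mesonet", "StationA", "HourlyAvg"): [
        {"ts": datetime(2007, 4, 1, 0), "AirTC": -5.0, "RH": 80.0},
        {"ts": datetime(2007, 4, 1, 1), "AirTC": -4.5, "RH": 78.0},
    ],
}
FRIENDLY = {"AirTC": "Air Temperature", "RH": "Relative Humidity"}

def get_data(network, station, table, start, end,
             columns=None, friendly_names=False):
    """Retrieve rows by network/station/table and date range, with an
    optional column list and optional friendly-name substitution."""
    rows = [r for r in STORE[(network, station, table)]
            if start <= r["ts"] <= end]
    cols = columns or [c for c in rows[0] if c != "ts"]
    def name(c):
        return FRIENDLY.get(c, c) if friendly_names else c
    return [{name(c): r[c] for c in cols} for r in rows]

print(get_data("Mesonet", "StationA", "HourlyAvg",
               datetime(2007, 4, 1, 0), datetime(2007, 4, 1, 1),
               columns=["AirTC"], friendly_names=True))
# [{'Air Temperature': -5.0}, {'Air Temperature': -4.5}]
```

The same parameter set maps naturally onto a web form or a Thrift service method.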

2.6. Reporting

2.6.1. Discussion

The whole point of this project (or at least most of it) is to make data extraction and reporting of anomalies easy. Whether it be the hourly reporting or graphing, or ad-hoc reports and graphs for special projects, it will all (hopefully) be simplified by having everything in a good framework.

2.6.2. Public Reporting

  • Will be generated by Kid or Genshi templates.
  • Will use the public interfaces for retrieving data.
  • Will report a common set of meteorological conditions. Currently, those are:
      • Wind speed, direction, and gust
      • Battery voltage and panel temperature
      • Air temperature, relative humidity, dew point, precipitation, and barometric pressure
      • Others to be determined
  • All fields will be conditional in case some sensors aren't available on a station.
  • All meteorological sensors will also be present in a 24 hour history table (most recent first).
  • Graphs of desired sensors will be generated.

2.6.3. Private Reporting

  • Will report information including (but not limited to):
      • Station name
      • Last data download
      • Battery voltage
      • Panel temperature
      • Any active alerts for the station
  • Will allow "drilling down," e.g. clicking on the station to see the list of errors, and which sensors are reporting errors.

2.6.4. Out-of-band Reporting

  • Users can subscribe to alerts and define the method by which the alert will reach them.
  • Alert methods will use a plugin architecture so that new methods can be defined. Common requirements for an alert plugin:
      • Address
      • Address format (a format against which it can check the address for validity)
      • User/pass, if needed
      • The plugin will know how to send to the address it is given
  • A user will subscribe to alerts:
      • Can subscribe at the network or station level
      • Selects how to receive alerts on subscription
  • When an alert is raised, any subscribed user will receive it via the selected medium.
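The plugin requirements above might translate into a small base class like this; the class names and the email example are hypothetical:

```python
import re

class AlertPlugin:
    """Hypothetical base class for alert delivery plugins."""
    address_format = r".*"   # regex the address must match to be valid

    def validate(self, address):
        return re.fullmatch(self.address_format, address) is not None

    def send(self, address, message):
        # Each plugin knows how to deliver to its own address type.
        raise NotImplementedError

class EmailAlert(AlertPlugin):
    address_format = r"[^@\s]+@[^@\s]+"

    def send(self, address, message):
        # A real implementation would use smtplib; print for the sketch.
        print("to %s: %s" % (address, message))

plugin = EmailAlert()
print(plugin.validate("ops@example.org"))   # True
print(plugin.validate("not-an-address"))    # False
plugin.send("ops@example.org", "Battery low at StationA")
```

New delivery methods (SMS, pager, etc.) would subclass AlertPlugin without touching the subscription logic.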

2.7. Auth/Auth

2.7.1. Discussion

While most of the data in the Photizo system will be public, there will be areas that will require a user to be authenticated and authorized to view and edit data. This is most clearly seen in the editing of metadata for the stations and sensors, as well as editing the station profiles. Viewing possibly sensitive information such as Moxa IPs and PakBus IDs will require proper authorization. The permissions system does not have to be complicated, as there is not much to modify from a web interface.

2.7.2. Users

Information about users will include:

  • User name
  • Password (hashed, preferably with something like SHA256)
  • Possible IP restrictions
  • Authorized actions (actions to be defined)

2.7.3. Authentication

  • Will use an RPC back end, similar to the data store.
  • No passwords shall be transmitted over the wire; a challenge/response system will be used.
  • Will authenticate on user/password pairs.
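One common way to realize the challenge/response requirement is an HMAC over a server-issued nonce, so the password itself never crosses the wire. This is a sketch of that idea, not the chosen protocol:

```python
import hashlib
import hmac
import os

def make_challenge():
    # Server issues a fresh random nonce per login attempt.
    return os.urandom(16)

def client_response(password, challenge):
    # Client proves knowledge of the password without sending it.
    key = hashlib.sha256(password.encode()).digest()
    return hmac.new(key, challenge, hashlib.sha256).hexdigest()

def server_verify(stored_key, challenge, response):
    expected = hmac.new(stored_key, challenge, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, response)

pw = "s3cret"
stored = hashlib.sha256(pw.encode()).digest()  # server stores hash, not pw
c = make_challenge()
print(server_verify(stored, c, client_response(pw, c)))  # True
```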

2.7.4. Authorization

  • Will authorize actions at the global, network, station, or table level.
  • Can apply "negative" permissions below a positive permission.
  • Possible permissions: Read, Modify, others?
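The level hierarchy with "negative" permissions could resolve like this; the grant representation is invented for illustration:

```python
def allowed(grants, action, path):
    """Resolve a permission along global -> network -> station -> table.
    grants: {level_tuple: {action: True/False}}. The most specific entry
    wins, so a False below a True acts as a 'negative' permission."""
    decision = False
    for i in range(len(path) + 1):
        level = tuple(path[:i])      # () is the global level
        if level in grants and action in grants[level]:
            decision = grants[level][action]
    return decision

grants = {
    (): {"read": True},                          # global read for this user
    ("Mesonet", "StationA"): {"read": False},    # negative permission
}
print(allowed(grants, "read", ["Mesonet", "StationB"]))  # True
print(allowed(grants, "read", ["Mesonet", "StationA"]))  # False
```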

2.8. Tests

2.8.1. Discussion

Probably the most useful, but most complex, part of Photizo will be the testing framework. The idea, and goal, is to create a system flexible enough that a test can be defined in a module, imported, and the needed values from a sensor run through it. Designing a system flexible enough to cover all (foreseen) cases could be fun.

2.8.2. Design/Ideas

  • All tests will be implemented as classes.
  • A module will only contain one test.
  • All tests will inherit from the PhotizoTestBase class.
  • All tests must define:
      • Test name
      • Required parameters (parameter glossary TBD)
      • The executed test
  • All tests will need:
      • A value (or values) to test
      • A dict of ranges defining normal/warning/error operating parameters for a station
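Putting the above together, a sketch of what PhotizoTestBase and a concrete range test might look like; all names and the layout of the ranges dict are assumptions:

```python
class PhotizoTestBase:
    """Assumed shape of the common test base class."""
    name = "base"
    required_parameters = ()

    def __init__(self, **params):
        missing = [p for p in self.required_parameters if p not in params]
        if missing:
            raise ValueError("missing parameters: %s" % missing)
        self.params = params

    def run(self, value, ranges):
        """Return 'normal', 'warning', or 'error' for a reading.
        ranges: {'normal': (lo, hi), 'warning': (lo, hi)} per station."""
        raise NotImplementedError

class RangeTest(PhotizoTestBase):
    """One test per module; this one just checks operating ranges."""
    name = "range"

    def run(self, value, ranges):
        for level in ("normal", "warning"):
            lo, hi = ranges[level]
            if lo <= value <= hi:
                return level
        return "error"

test = RangeTest()
ranges = {"normal": (-40.0, 50.0), "warning": (-60.0, 60.0)}
print(test.run(20.0, ranges))   # normal
print(test.run(-45.0, ranges))  # warning
print(test.run(-80.0, ranges))  # error
```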


Page last modified on April 05, 2007, at 02:04 AM