This section will provide an outline of the purpose of this document, the scope of the product, and an overview of the rest of the Project Requirements.
The Project Requirements will outline the functional and non-functional requirements of the Photizo project. It will also outline the user and system requirements. The intended audience for this document includes the Photizo development team; the project stakeholders (EE Internet and GW Scientific); as well as the members of my graduate committee: Dr. Knoke, Dr. Nance, and Dr. Roth.
GW Scientific currently collects sensor data from a wide range of data collection networks. This data is collected and displayed (in both textual and graphical form) on a variety of web pages, and is distributed to a variety of clients. Quality assurance (QA) and quality control (QC) on this data, however, when done at all, must be done by manual inspection, which is a labor-intensive process. The goal of the Photizo project is to create a framework that will provide data processing, QA/QC, and data visualization, all in an automated fashion.
Section 2 will give an overall description of the software, including the product perspective, product functions, user characteristics, constraints, and assumptions. Section 3 details the software’s interface requirements. Section 4 gives detailed descriptions of each of the product’s functional requirements and features. Section 5 describes the product’s nonfunctional requirements, including performance, maintainability, quality, documentation, security, and re-usability.
This section will give an overview of the product being specified and the environment in which it will be used, the anticipated users of the product, and the known constraints, assumptions and dependencies.
This project is being developed as an original piece of software designed to run on a modern computer. Although it is an original piece of software, it will most likely use many Open Source components in its construction. As such, it will be released under an Open Source compatible license. After the initial development of the software, its maintenance will be the responsibility of EEI/GWS, probably by the primary author.
The software will maintain a collection of metadata on the stations and sensors, and will include facilities for editing this metadata.
The software will import sensor data into the database.
The software will do required post-collection processing using the metadata collection.
The software will do quality analysis/control on the collected data using the metadata collection as well as testing modules. This includes activating alerts if certain tests fail.
The software will produce web pages and graphs to visualize the data that has been imported.
There will be two primary classes of users: the operators, who will maintain the reporting templates and graphs as well as the sensor and station metadata; and the end users of the data, who will obtain the data either by reading generated reports or through custom exports of the collected data.
The software will intentionally be designed to be cross-platform. Our current choice of programming language is Python, so the software should run on any platform with a Python implementation. In places where speed is a concern, such as numerical processing, optimized libraries (such as NumPy) will be used; the goal is that all speed limitations will be on the CPU and database back end.
The data coming from the data loggers can be in a variety of formats. The format of the file itself can vary (such as how many header lines there are, if any), as well as the meanings of the various columns of data. Accommodation must be made for the currently used formats, as well as having the flexibility to be adapted to future formats.
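One way to accommodate varying file layouts is to describe each layout as data rather than code. The following is a minimal sketch under that assumption; the `LoggerFormat` class, the `read_rows` function, the column names, and the sample data are all hypothetical and are not part of any existing Photizo design.

```python
import csv
from dataclasses import dataclass


@dataclass(frozen=True)
class LoggerFormat:
    """Hypothetical description of one data logger file layout."""
    header_lines: int      # number of lines to skip before the data starts
    columns: tuple         # meaning of each column, in file order
    delimiter: str = ","


def read_rows(text, fmt):
    """Yield one dict per data row, keyed by the column meanings in fmt."""
    lines = text.splitlines()[fmt.header_lines:]
    for row in csv.reader(lines, delimiter=fmt.delimiter):
        if not row:
            continue  # skip blank lines
        yield dict(zip(fmt.columns, row))


# Two made-up layouts: one with two header lines, one with none
# and a different column order and delimiter.
FORMAT_A = LoggerFormat(header_lines=2,
                        columns=("timestamp", "air_temp", "battery_v"))
FORMAT_B = LoggerFormat(header_lines=0,
                        columns=("timestamp", "battery_v", "air_temp"),
                        delimiter=";")

sample_a = "Station 12\ntime,temp,batt\n2008-01-01 00:00,-12.5,12.9\n"
sample_b = "2008-01-01 00:00;12.9;-12.5\n"

rows_a = list(read_rows(sample_a, FORMAT_A))
rows_b = list(read_rows(sample_b, FORMAT_B))
```

Under this approach, supporting a future format would mean adding a new `LoggerFormat` description rather than changing the import code.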
The data coming in must be run through a variety of checks. The software must have a way of defining and executing these tests outside of modifying the core code of the software (e.g. test plugins).
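As an illustration of what a test plugin mechanism might look like, the sketch below uses a simple registry populated by a decorator. The names `CHECKS`, `register_check`, and `run_checks`, and the two example checks, are assumptions made for this example only; in practice each check would likely live in its own module outside the core code.

```python
import math

CHECKS = {}  # hypothetical registry: check name -> check function


def register_check(name):
    """Decorator that adds a check function to the registry."""
    def decorator(func):
        CHECKS[name] = func
        return func
    return decorator


@register_check("not_missing")
def not_missing(value):
    return value is not None


@register_check("is_finite")
def is_finite(value):
    return value is not None and math.isfinite(value)


def run_checks(value, names):
    """Return the names of the checks that the value failed."""
    return [name for name in names if not CHECKS[name](value)]
```

The core code only calls `run_checks`; new checks can be added by registering further functions, without modifying the core.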
The software must process the incoming data and produce the desired output in an acceptable amount of time. Since data usually comes in from a station once per hour, it is assumed that a worst-case scenario would be the software taking 59 minutes to process the data. For purposes of requirements, however, an “outside threshold” will be considered to be 15 minutes. A time much lower than this is anticipated.
The imported data will most likely be stored in a database; thus, a general facility must be provided to map incoming sensor data onto a database schema that is both efficient (for the database) and easy to understand (for the programmer/user).
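The following sketch shows one possible mapping, assuming a single narrow table with one record per station, sensor, and timestamp. The table name, column names, sensor identifiers, and the use of SQLite are assumptions for illustration; the actual schema and database back end have not been decided.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
conn.execute("""
    CREATE TABLE reading (
        station     TEXT NOT NULL,
        sensor      TEXT NOT NULL,
        observed_at TEXT NOT NULL,
        value       REAL,
        PRIMARY KEY (station, sensor, observed_at)
    )
""")


def store_row(conn, station, row, sensor_columns):
    """Store one parsed row as one reading record per sensor column.

    sensor_columns maps a column name in the incoming file to a sensor id.
    """
    records = [(station, sensor, row["timestamp"], float(row[column]))
               for column, sensor in sensor_columns.items()]
    with conn:  # one transaction for the whole row
        conn.executemany("INSERT INTO reading VALUES (?, ?, ?, ?)", records)
    return len(records)


# Hypothetical mapping from file columns to sensor ids.
mapping = {"air_temp": "AT-01", "battery_v": "BV-01"}
row = {"timestamp": "2008-01-01 00:00", "air_temp": "-12.5", "battery_v": "12.9"}
stored = store_row(conn, "station-12", row, mapping)
```

A narrow table of this kind keeps the schema unchanged when sensors are added, at the cost of more records per import.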
It is assumed the software will run on any reasonable platform. Since Python has been ported to most Unixes, Windows, Mac, OS/2, and even the Nokia Series 60 Cell Phone, we do not foresee the destination platform being a problem.
It is assumed the location in which the software is installed will have the connectivity required to utilize the needed back-end database.
This section will describe how the software will connect to external components.
The data will be output, most likely to web pages, data files, and graphs. These will display in any compatible browser.
There will be an interface for editing the station and sensor metadata. It is as yet undecided whether this will be a GUI (local) application or a web-based application, although preference is currently toward a web-based application, mainly because metadata can then be edited from anywhere.
The software will not directly use any hardware interfaces, as it is much higher up the communications stack. Actual data collection will be done by the data loggers, and the data will be accessed by reading the data files, not by any direct query of stations or sensors.
The software will connect to a back-end database via a database library, and will utilize other libraries for the processing of data, and the generation of data graphs. A browser will be used for editing metadata and viewing the output of the software.
This section contains the system features required in the software program, and includes detailed descriptions of each feature.
The software will be processing data from a variety of stations and sensors. The software must “know” certain information about these data points in order to make educated decisions regarding the processing and testing of the data. Thus, the software must have facilities for entering and editing the metadata about the stations and sensors.
High. The sensors are central to what the whole project is about, so this requirement actually must be implemented before any of the other implemented requirements can be put to full use.
The software must obtain data to process. Thus, there must be facilities to import a variety of data formats, probably mostly CSV-like formats, but that is not guaranteed. This data must be stored in a format from which it can easily be retrieved later.
High. Processing data is another central item of the project, so many other aspects depend on it.
Data that comes in may be in a form that is not “human readable.” The usual example is that of temperature sensors reading out in resistance values. This is especially seen in older data loggers as processing these values to human readable values at the data logger would take too much processing power. These values must then be converted once the readings arrive in the data files. In addition, the functions/equations needed to convert the values must be defined in the sensor metadata.
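As an illustration of a conversion defined in sensor metadata, the sketch below uses the Steinhart-Hart equation, a commonly used relation between thermistor resistance and temperature: 1/T = A + B ln(R) + C (ln R)^3, with T in kelvin. The choice of this equation, the metadata layout, the sensor id, and the coefficients are assumptions for this example; real coefficients depend on the specific sensor.

```python
import math


def steinhart_hart(resistance_ohms, a, b, c):
    """Convert thermistor resistance (ohms) to temperature (degrees Celsius)."""
    ln_r = math.log(resistance_ohms)
    kelvin = 1.0 / (a + b * ln_r + c * ln_r ** 3)
    return kelvin - 273.15


# Hypothetical lookup of conversion functions by name.
CONVERSIONS = {"steinhart_hart": steinhart_hart}

# Hypothetical metadata entry; the coefficients are illustrative values
# for a 10 kilohm thermistor, not taken from any Photizo sensor.
SENSOR_METADATA = {
    "soil_temp_1": {
        "conversion": "steinhart_hart",
        "parameters": {"a": 1.129148e-3, "b": 2.34125e-4, "c": 8.76741e-8},
        "units": "degC",
    },
}


def convert(sensor_id, raw_value):
    """Apply the conversion named in the sensor's metadata to a raw reading."""
    meta = SENSOR_METADATA[sensor_id]
    func = CONVERSIONS[meta["conversion"]]
    return func(raw_value, **meta["parameters"])
```

With these illustrative coefficients, a raw reading of 10000 ohms converts to roughly 25 degrees Celsius.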
High. Checking the sensor readings is central to the software.
Data coming in must go through a series of tests to make sure that it is current; that it is within operational parameters for the sensor; that it makes sense in context (e.g. seasonal temperatures); that it makes sense over a given time period (e.g. a one-hour temperature swing of 40 degrees Celsius, while possible, is not likely); and that it meets other criteria.
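Three of these tests (currency, operational range, and change over a time period) can be sketched as follows. The function name, the limits dictionary, and every threshold value are assumptions for illustration; the contextual (seasonal) test is omitted because it would need reference data that this document does not define.

```python
from datetime import datetime, timedelta


def check_reading(value, observed_at, previous_value, limits, now):
    """Return a list of failed test names for one reading (empty if all pass)."""
    failures = []
    if now - observed_at > limits["max_age"]:
        failures.append("stale")
    if not limits["low"] <= value <= limits["high"]:
        failures.append("out_of_range")
    if (previous_value is not None
            and abs(value - previous_value) >= limits["max_hourly_change"]):
        failures.append("implausible_change")
    return failures


# Hypothetical limits for an air temperature sensor (degrees Celsius).
AIR_TEMP_LIMITS = {
    "max_age": timedelta(hours=2),
    "low": -60.0,
    "high": 50.0,
    "max_hourly_change": 40.0,
}

now = datetime(2008, 1, 1, 12, 0)
ok = check_reading(-10.0, now - timedelta(hours=1), -12.0, AIR_TEMP_LIMITS, now)
swing = check_reading(35.0, now - timedelta(hours=1), -10.0, AIR_TEMP_LIMITS, now)
old = check_reading(80.0, now - timedelta(hours=5), None, AIR_TEMP_LIMITS, now)
```

In this sketch the limits would come from the sensor metadata, so that operators can adjust them without code changes.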
High. This relates directly to the reason for this project: quality assurance on the data.
In some cases, readings for the same information (e.g. temperature) may come from more than one sensor. This may be due to redundancy requirements (three sensors buried in concrete), or sensor range requirements (one of the sensors’ readings is only valid in a certain range). Thus, there must be a way to define which reading takes precedence.
High. While this will be a feature that is needed once we go into production, the absence of this feature can be worked around. However, the “hooks” for this feature must be built into the software from the ground up, as not doing so will require a major rewrite of parts of the core when the feature is needed.
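One simple way to express precedence is an ordered list of sensors, each with the range in which its reading is considered valid; the first sensor with a valid reading wins. The sketch below assumes that approach. The function name, sensor ids, and ranges are hypothetical.

```python
def select_reading(readings, precedence):
    """Return (sensor_id, value) for the first sensor with a valid reading.

    readings:   dict of sensor id -> value (None if missing)
    precedence: list of (sensor_id, low, high) in priority order
    Returns None if no sensor has a reading within its valid range.
    """
    for sensor_id, low, high in precedence:
        value = readings.get(sensor_id)
        if value is not None and low <= value <= high:
            return sensor_id, value
    return None


# Hypothetical setup: a precise sensor valid only above -40, with a
# wide-range sensor as the fallback.
PRECEDENCE = [("temp_precise", -40.0, 60.0), ("temp_wide", -80.0, 80.0)]

mild = select_reading({"temp_precise": -5.2, "temp_wide": -5.0}, PRECEDENCE)
cold = select_reading({"temp_precise": -47.0, "temp_wide": -51.3}, PRECEDENCE)
none = select_reading({"temp_precise": None}, PRECEDENCE)
```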
Web pages with sensor readings and graphed sensor data will be produced and placed on public-facing web sites.
Medium. These will need to be produced, but there are many pieces that will have to be in place first before these pages can be produced. Thus, the “medium” priority is more an indication of needed dependencies.
Web pages with diagnostic data, such as last report, panel temperatures, battery voltages, sensor-out-of-range warnings, as well as other pertinent data must be generated and placed on private-facing web sites.
High. Diagnostics are an important part of the reason why this project is in place.
In addition to the web-based reporting, the software must also support error reporting by e-mail and/or SMS.
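A minimal sketch of e-mail alerting using the Python standard library follows. The subject format, addresses, and SMTP host are assumptions for illustration. SMS delivery would require a gateway or provider and is not shown.

```python
import smtplib
from email.message import EmailMessage


def build_alert(station, failures, sender, recipients):
    """Build an alert e-mail listing the failed checks for one station."""
    msg = EmailMessage()
    msg["Subject"] = f"[Photizo] {station}: {len(failures)} check(s) failed"
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg.set_content("\n".join(failures))
    return msg


def send_alert(msg, host="localhost"):
    """Send the message through an SMTP server (the host is an assumption)."""
    with smtplib.SMTP(host) as server:
        server.send_message(msg)


# Building a message does not require a mail server; sending does.
alert = build_alert("station-12", ["battery_v: out_of_range"],
                    "photizo@example.org", ["ops@example.org"])
```

Keeping message construction separate from delivery allows the alert content to be tested without a mail server.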
High. Alerts of aberrant operation are desired.
The parts of this application that edit important data (station metadata, sensor metadata, test definitions, and the like) must be protected by a login facility to restrict these actions to known individuals.
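If the login facility stores its own credentials, passwords should be stored as salted hashes rather than in plain text. The sketch below assumes that case and uses PBKDF2 from the Python standard library. The function names and the iteration count are assumptions; if a web framework is chosen, its own authentication facilities would likely be used instead.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # illustrative work factor, not a project decision


def hash_password(password, salt=None):
    """Return (salt, digest) for a password, generating a salt if needed."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                                 salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Check a password against a stored salt and digest."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)
```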
High. This is an integral part of the overall system, as the integrity of the data is very important to the project.
This section details the other, nonfunctional requirements of the software. (Some of these are on the list thanks to Wikipedia.)
The software must be available to process data whenever new data is received.
The software must be documented. This includes thorough comments on all non-obvious code, as well as a guide for the end users who will be maintaining station and sensor metadata. The usage of the output pages and graphs should be intuitive: anyone who knows how to use a browser and is familiar with the data being reported should need no additional documentation to use the web site.
The software must be written in such a way that when an error is encountered, the error is properly reported, and all transactions/operations are cancelled cleanly.
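The sketch below illustrates this requirement for a database import, assuming SQLite as a stand-in back end. An error part-way through a batch is logged and the whole batch is rolled back. The table, function, and logger names are hypothetical.

```python
import logging
import sqlite3

log = logging.getLogger("photizo.importer")  # hypothetical logger name

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reading (observed_at TEXT, value REAL)")


def import_batch(conn, rows):
    """Insert all rows or none; return True on success, False on error."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            for observed_at, raw_value in rows:
                conn.execute(
                    "INSERT INTO reading (observed_at, value) VALUES (?, ?)",
                    (observed_at, float(raw_value)),
                )
    except (ValueError, sqlite3.Error) as exc:
        log.error("import aborted, no rows stored: %s", exc)
        return False
    return True


# The second row cannot be converted to a number, so the first row
# must not remain in the table either.
bad = import_batch(conn, [("2008-01-01 00:00", "-12.5"),
                          ("2008-01-01 01:00", "not-a-number")])
count_after_bad = conn.execute("SELECT COUNT(*) FROM reading").fetchone()[0]

good = import_batch(conn, [("2008-01-01 00:00", "-12.5"),
                           ("2008-01-01 01:00", "-12.9")])
count_after_good = conn.execute("SELECT COUNT(*) FROM reading").fetchone()[0]
```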
The software will be written in such a way that continuing maintenance will be straightforward. This relates to the above point regarding documentation.
The software will maintain a high enough level of abstraction, where feasible and practical, such that subsequent changes to the behavior of the software will require the minimum amount of disruption to the core code.
Various functionality of the software will be divided into various modules, with well defined interfaces. Plugin architectures will be implemented in areas where underlying rules or mechanisms may change.
As alluded to earlier, performance is an important part of this software, but not a critical part. Operations must be kept within a “reasonable time” in order to support timely processing of incoming data. Ideally, data will be available for public viewing within five to ten minutes after arriving from the data loggers.
The software must produce correct results, ideally, 100% of the time. This will require the metadata and constructed tests to be correct.
While the data will be available to anyone that comes to the web site to view the various pages produced, the editing interface to the metadata must be protected in such a way that only authorized users will be able to make changes to the metadata.
The code should be written in a way that is easily understandable, without the use of coding “tricks” or shortcuts. For example: if there is a choice between a shorter (or even better performing) piece of code that is more opaque and a longer, easier to understand piece of code, the longer code would be used, unless there is a significant (order of magnitude or more) performance penalty.
As the phrase goes, “Who tests the testers?” The software will be written using Test Driven Development (TDD). All modules will have tests written against them (preferably even before the modules themselves are written). These tests will make sure the modules behave correctly under a range of input values, both valid and invalid.
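As a small illustration of this approach, the sketch below shows a test case written with the standard `unittest` module for a hypothetical `in_range` helper, covering both valid and invalid input. The helper and its behavior are assumptions for this example; under TDD the test case would be written first.

```python
import unittest


def in_range(value, low, high):
    """Return True if low <= value <= high; reject inverted bounds."""
    if low > high:
        raise ValueError("low must not exceed high")
    return low <= value <= high


class InRangeTest(unittest.TestCase):
    def test_value_inside_range(self):
        self.assertTrue(in_range(5.0, 0.0, 10.0))

    def test_bounds_are_inclusive(self):
        self.assertTrue(in_range(0.0, 0.0, 10.0))
        self.assertTrue(in_range(10.0, 0.0, 10.0))

    def test_value_outside_range(self):
        self.assertFalse(in_range(-0.1, 0.0, 10.0))

    def test_inverted_bounds_are_rejected(self):
        with self.assertRaises(ValueError):
            in_range(5.0, 10.0, 0.0)


def run_tests():
    """Run the test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(InRangeTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```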
