Photizo Importer

Imported modules, engines and objects:

+ optparse: parses the command line and sets attributes on the options object returned by parse_args() from the user-supplied values. For example, when adding a new station the importer parses a command such as '/usr/local/bin/photizo_importer.py --initial-run --station="DUS2" /home/eenet/photizo/dot.ini'. You can also use -vvv for debugging purposes.

+ os: portable access to operating-system functionality (such as file paths), for better portability between platforms

+ sys: provides access to variables and functions that interact with the interpreter, for example sys.argv

+ urllib2: defines functions and classes which help in opening URLs (mostly HTTP), including authentication and cookies

+ configobj: reads the photizo ini file

+ photizo.DRE: loads the Data Reader Engine appropriate for the file type being imported

+ photizo.DSE: loads the requested data storage engine (DSE) to store the raw data into a local database

+ from photizo.common.util import url_join: accesses the url_join function from '/photizo/trunk/photizo/common/util.py' to join two or more url components
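
As an illustration of the optparse item above, here is a minimal sketch of a parser that accepts the example command. The option names come from that command; the dest names, help strings and defaults are assumptions made for this example, not the importer's actual code.

```python
from optparse import OptionParser


def build_parser():
    """Build a parser for the options shown in the example command.

    The dest names and defaults below are assumptions for illustration.
    """
    parser = OptionParser(usage="usage: %prog [options] config.ini")
    # -v may be repeated; each occurrence increments the counter (-vvv -> 3).
    parser.add_option("-v", "--verbose", action="count", dest="verbose",
                      default=0, help="increase verbosity (repeatable)")
    parser.add_option("--initial-run", action="store_true",
                      dest="initial_run", default=False,
                      help="treat this as the first import for a station")
    parser.add_option("--station", dest="station", default=None,
                      help="process only the named station")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is testable.
options, args = build_parser().parse_args(
    ["--initial-run", "--station=DUS2", "-vvv", "/home/eenet/photizo/dot.ini"])
print(options.initial_run, options.station, options.verbose, args)
# prints: True DUS2 3 ['/home/eenet/photizo/dot.ini']
```

The remaining positional argument (the path to the ini file) ends up in args, which is where a script like this would pick up the config file name.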

Instantiate classes to print warnings when errors occur (missing data file or attributes, malformed attributes)

def files_to_attempt(initial, base_url, source_dir, prefix, table) constructs a list of files to try to load, depending on whether the run is an initial or an incremental one. The source_dir, prefix and table are defined in the ini file; the base_url is read from the path to the raw data file from the file_exists(url) function.

+ instantiate the file_list and call file_exists(url), which uses the URL library (urllib2) to determine whether the path exists and raises an exception if it does not

+ read the path to the raw data file from the prefix specified in the ini file and the table if listed as a filename in the ini file

+ numhigh = 201: steps through the .dat and, for initial runs, the .dat.backup files to import the data for a maximum of 201 files per station

+ for each file that is found in the expected path/url, a current file is appended to the file_list.
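
The steps above can be sketched roughly as follows. This is not the importer's code: the backup naming scheme ("<name>.dat.backup", then "<name>.dat.<n>.backup"), the join_url helper and the file_exists parameter are all assumptions for illustration. The existence check is passed in as a function returning True or False so the sketch runs without network access, whereas the real file_exists(url) is described as raising an exception.

```python
def join_url(*parts):
    """Hypothetical helper: join URL components with single slashes."""
    return "/".join(p.strip("/") for p in parts if p)


def files_to_attempt(initial, base_url, source_dir, prefix, table,
                     file_exists, numhigh=201):
    """Return the URLs of candidate data files that actually exist.

    file_exists is an assumed callable(url) -> bool; numhigh caps the
    number of candidate names tried per table.
    """
    stem = prefix + table
    names = [stem + ".dat"]  # the current file is always tried
    if initial:
        # ASSUMPTION: the naming of backup files is not given in the text.
        names.append(stem + ".dat.backup")
        names.extend("%s.dat.%d.backup" % (stem, n)
                     for n in range(1, numhigh - 1))
    file_list = []
    for name in names:
        url = join_url(base_url, source_dir, name)
        if file_exists(url):
            file_list.append(url)
    return file_list


# Usage with a fake set of existing files (example.org is a placeholder).
existing = {
    "http://example.org/data/DUS2/CR1000_Table1.dat",
    "http://example.org/data/DUS2/CR1000_Table1.dat.backup",
}
found = files_to_attempt(True, "http://example.org/data/", "DUS2",
                         "CR1000_", "Table1", existing.__contains__)
print(len(found))  # prints: 2
```

An incremental run only ever considers the current .dat file, so the list has at most one entry in that case.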

def import_table() is called with the arguments dataStoreBaseURL, sourceFileBaseURL, source_dir, network, station, prefix, table, column_map and column_offset, all of which are traceable to the config file. Two defaults apply: the run is not an initial run, and the table name corresponds to the raw data file name.

+ reads in the argument values and writes them to the same table name unless another name is specified

+ instantiate the path to the table as a list, accessible via a query string, and append the column offset (as a string) to the list

+ instantiate the variable that represents the path to a table, with the arguments for network, station and table name passed as strings; if it’s executed as an initial run, create the table, otherwise write to the table

+ instantiate the ‘store’ variable, which uses the DSE to read the destination path and any flags that apply.

+ if it’s an initial run, instantiate a dummy timestamp, otherwise read the latest timestamp from the file

+ the arguments needed to find the files/tables are the sourceFileBaseURL, source_dir, prefix and table, as read from the config file

+ instantiate the data_source_uri, which joins the data file base path to the subdirectory for the station (if it exists) and to the file

+ instantiate the query string, which concatenates the data_source_uri and the source uri query string - one is assigned to each file to be imported

+ instantiate the source - use the Data Reader Engine to open the data source uri, get the column names and assign them as attributes

+ get the row version or catch an exception. This is the format (order and name) of the columns in the file.

+ begin storage transaction, increment counter for each record

+ catch the DataFileNotFound exception if we can’t find the named data file.

+ catch an OS Error 2 or an HTTP Error 404 if the file we’re trying to open doesn’t exist (which means the file disappeared during processing); catch any other exception, roll back the transaction and re-raise the exception.

+ close the store table and return the total count of records added
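
The transaction and error-handling steps at the end of this list can be sketched as below. MemoryStore is a hypothetical stand-in for a data storage engine, and import_files is an invented name; only the DataFileNotFound name comes from the text above. Skipping a missing file and carrying on is an assumption, since the page does not say what happens after the exception is caught.

```python
import errno


class DataFileNotFound(Exception):
    """Raised when a named data file cannot be located."""


class MemoryStore(object):
    """Hypothetical in-memory stand-in for a data storage engine."""

    def __init__(self):
        self.rows = []
        self._pending = None

    def begin(self):
        self._pending = []

    def add(self, row):
        self._pending.append(row)

    def commit(self):
        self.rows.extend(self._pending)
        self._pending = None

    def rollback(self):
        self._pending = None


def import_files(store, sources):
    """Store rows from each source inside one transaction; return the count.

    sources is a list of zero-argument callables that return an iterable
    of rows or raise (an assumed interface for this sketch).
    """
    count = 0
    store.begin()
    try:
        for open_source in sources:
            try:
                rows = open_source()
            except DataFileNotFound:
                continue  # ASSUMPTION: skip files that cannot be found
            except OSError as err:
                # errno 2 (ENOENT) or HTTP 404: the file disappeared.
                gone = (err.errno == errno.ENOENT
                        or getattr(err, "code", None) == 404)
                if not gone:
                    raise
                continue
            for row in rows:
                store.add(row)
                count += 1
    except Exception:
        store.rollback()  # any other failure discards the pending rows
        raise
    store.commit()
    return count
```

The sketch is written for Python 3, where HTTP errors from urllib are subclasses of OSError; under Python 2 and urllib2 the except clauses would need to name the exception types separately.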

def create_row_def(column_names, column_map) builds a row definition described as an array of dictionaries, each with the following elements: Name (column name in data file), Common Name (a friendly name), Type (data type), Comment (such as for graphing legends)

+ instantiate row_def and assign it as an attribute of the data storage engine RowDefinition function; skip the column if it is the timestamp column.

+ read the other columns based on the column map in the ini file, establishing the data type (float, integer, string) - the default is float

+ add columns to row_def and return an object (row_def) that will be passed to add_row_def() of the storage engine used.
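
A minimal sketch of the idea, using a plain list of dictionaries in place of the storage engine's RowDefinition object. The timestamp column name "TIMESTAMP" and the attribute keys (common_name, type, comment) are assumptions for this example; the float default and the three data types are taken from the description above.

```python
VALID_TYPES = ("float", "integer", "string")


def create_row_def(column_names, column_map, timestamp_column="TIMESTAMP"):
    """Describe every non-timestamp column as a dictionary.

    column_map maps a column name to a dict of attributes; columns that
    are missing from the map fall back to the defaults below.
    """
    row_def = []
    for name in column_names:
        if name == timestamp_column:
            continue  # the timestamp column is handled separately
        attrs = column_map.get(name, {})
        col_type = attrs.get("type", "float")  # float is the default type
        if col_type not in VALID_TYPES:
            raise ValueError("unknown type %r for column %r"
                             % (col_type, name))
        row_def.append({
            "Name": name,
            "Common Name": attrs.get("common_name", name),
            "Type": col_type,
            "Comment": attrs.get("comment", ""),
        })
    return row_def


row_def = create_row_def(
    ["TIMESTAMP", "AirTemp", "Status"],
    {"Status": {"type": "string", "comment": "logger state"}})
print([col["Name"] for col in row_def])  # prints: ['AirTemp', 'Status']
```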

def get_column_map(ini_column_map, verbose=0) reads a dict of arrays of colon-delimited entries from the column maps in the ini file(s) and returns a column map

+ instantiate the dictionary, parse each map element into column/attribute pairs and create/add the column name to the map dictionary

+ parse the attributes into key/value pairs, add them to the column dictionary or raise an error if blank
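
The parsing described above might look like the sketch below. The exact entry syntax is not given on this page, so the format "Column:key=value:key=value" is an assumption, as are the example keys. The sketch follows the description otherwise: entries are split into a column and its attributes, attributes into key/value pairs, and a blank attribute raises an error.

```python
def get_column_map(ini_column_map, verbose=0):
    """Turn a dict of lists of colon-delimited entries into a column map.

    ASSUMED entry format: "Column:key=value:key=value".
    """
    column_map = {}
    for entries in ini_column_map.values():
        for entry in entries:
            # Split off the column name; the rest holds the attributes.
            column, _, attributes = entry.partition(":")
            column = column.strip()
            column_dict = column_map.setdefault(column, {})
            for attribute in attributes.split(":"):
                key, sep, value = attribute.partition("=")
                key, value = key.strip(), value.strip()
                if not sep or not key or not value:
                    raise ValueError("blank attribute in entry %r" % entry)
                column_dict[key] = value
            if verbose:
                print("mapped column %s: %r" % (column, column_dict))
    return column_map


column_map = get_column_map({
    "map1": ["AirTemp:type=float:comment=Air temperature"],
    "map2": ["Status:type=string"],
})
print(column_map["AirTemp"]["comment"])  # prints: Air temperature
```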

def main() declares/defines the function and the version number

+ initialize the command-line argument parser

+ add command line options to the parser to specify verbosity, station, files, initial run or save-to-table options

+ daemon option (not yet implemented)

+ parser defaults (no options); parse with command-line arguments, and output errors if arguments are missing

+ open the config (ini) file and get information from the [Main] section of the config file

+ if the command line option specifies a station, make sure it exists and process only that station, or get the list of stations and sort them

+ if the ‘save-to-table’ option is called, do it with the arguments that are required (station, data filename base, options, column map, data file directory)

+ if the ‘files’ option is called (specifying the files to be imported), parse the option - if that fails, read in the station name and data file suffixes (the last part of the file name in the raw data files) from the config file; in either case, instantiate the variable ‘tables’.

+ iterate through the tables and if the table exists, read the column offset (see footnote 1) or use the default; if the verbose option is used, print the table name that is being imported to stdout.

+ read the column map from the config file unless the source table has a custom column map (in which case, read directly from file); print the column offset if the option is verbose

+ read rows from tables with applicable command-line options (initial_run, save_to_table), options read from the config file (data_store_base_url, source_file_base_url, data_file_directory, network, station, column_map, column_offset) and options defined in the program itself (prefix, table)
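
Two of the decisions made in main(), station selection and the column offset default, can be sketched as follows. The config is modelled as a plain nested dictionary rather than a ConfigObj instance, and its layout (one section per station next to [Main], with per-table subsections) is an assumption for this example, as are the function names.

```python
DEFAULT_COLUMN_OFFSET = 1  # default named in footnote 1


def select_stations(config, requested=None):
    """Return the stations to process.

    ASSUMPTION: every top-level section other than "Main" is a station.
    """
    stations = [name for name in config if name != "Main"]
    if requested is not None:
        if requested not in stations:
            raise KeyError("station %r is not in the config file" % requested)
        return [requested]
    return sorted(stations)


def column_offset_for(table_config):
    """Read the table's column offset, falling back to the default."""
    return int(table_config.get("column_offset", DEFAULT_COLUMN_OFFSET))


config = {
    "Main": {},
    "DUS2": {"Table1": {"column_offset": "3"}, "Table2": {}},
    "ABC1": {},
}
print(select_stations(config))                      # prints: ['ABC1', 'DUS2']
print(column_offset_for(config["DUS2"]["Table1"]))  # prints: 3
print(column_offset_for(config["DUS2"]["Table2"]))  # prints: 1
```

Values read from an ini file arrive as strings, which is why the offset is converted with int() before use.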

Finally, call the main function

1 Column offset: normally '3' to account for the timestamp, record number and station ID; the default is '1' to omit the timestamp, which is retrieved manually.

Page last modified on April 01, 2010, at 12:00 AM