Data Sources

Information comes from many sources and in many formats. Prajna provides readers for a number of different data sources, and it also supports applications that format data in a variety of ways.

Characteristics of Data Sources

The behavior and characteristics of all applications depend on the characteristics of the underlying data. A source of data can be characterized in several ways.

Size: Is the body of data a small dataset that can easily be held in memory? Is it of moderate size, where a significant fraction of the data provides a representative sample? Or is it a very large repository with millions or billions of records, which requires more sophisticated filtering and navigation?
Mutability: How frequently does the data change? Is the body of data static, changing only occasionally? Does it receive frequent updates that an application needs to check for periodically? Or is the data a continuous stream that requires continuous monitoring?
Atomic Objects: What are the atomic objects represented by the data? What does the data represent? A particular body of data may represent multiple objects depending on the context, but a particular application will need to differentiate these objects.
Object Structure: Are the objects structured, such as from a relational database? Or are they totally unstructured, such as text documents? If the object is structured, what are the fields and their data types?
Implied Knowledge: Are all of the fields of an object important to understanding it? Which fields are useful for comprehension, and which are present simply for developer convenience? Are the auxiliary data elements, such as the file location or timestamp of a particular data object, important? Do the objects include implicit knowledge? For instance, what are the units of measure for any measurements?
Data Structures: Does the data imply or define any data structures or relationships between the individual records, such as a graph, tree, or grid?
Format: What format is the data stored in? Is it stored in an SQL database? A collection of XML data files? Something else? How does the data need to be accessed?
Data Fusion: Is there only a single data source, or are multiple data sources referenced? If there is more than one data source, how do the records from each data source relate to one another? Do the records need to be fused? Are there common identifiers between the data sources?
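
The Data Fusion question above can be made concrete with a small example. The following is a minimal sketch, not Prajna code: RecordFusion and its fuse method are hypothetical names introduced only for illustration. It assumes that each record is a map of field names to values and that both sources key their records by a common identifier. Where both sources define the same field, the sketch keeps the value from the primary source; a real application might resolve such conflicts differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; it is not part of the Prajna API.
final class RecordFusion {

    private RecordFusion() {
    }

    /**
     * Fuses two data sources whose records share a common identifier.
     * Each source maps an identifier to a record (field name -> value).
     * Assumption: when both sources define a field, the primary value wins.
     */
    static Map<String, Map<String, Object>> fuse(
            Map<String, Map<String, Object>> primary,
            Map<String, Map<String, Object>> secondary) {
        Map<String, Map<String, Object>> fused = new LinkedHashMap<>();

        // Copy every primary record so the input maps are never modified.
        for (Map.Entry<String, Map<String, Object>> entry : primary.entrySet()) {
            fused.put(entry.getKey(), new LinkedHashMap<>(entry.getValue()));
        }

        // Add secondary fields; records found only in the secondary source are kept too.
        for (Map.Entry<String, Map<String, Object>> entry : secondary.entrySet()) {
            Map<String, Object> target =
                    fused.computeIfAbsent(entry.getKey(), id -> new LinkedHashMap<>());
            // putIfAbsent leaves an existing primary value untouched.
            entry.getValue().forEach(target::putIfAbsent);
        }
        return fused;
    }

    public static void main(String[] args) {
        // Map.of requires Java 9 or later.
        Map<String, Object> fromDatabase = Map.of("name", "Ada", "city", "London");
        Map<String, Object> fromXmlFile = Map.of("city", "Paris", "age", 36);

        Map<String, Map<String, Object>> fused =
                fuse(Map.of("p1", fromDatabase), Map.of("p1", fromXmlFile));

        // The fused record for "p1" has name, city (from the primary source), and age.
        System.out.println(fused);
    }
}
```

If the sources have no common identifier, a key-based merge like this does not apply, and the records would first have to be matched on other fields.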

Prajna provides a number of utilities for accessing data from a variety of sources. Depending on the characteristics of the data source, a developer may wish to use a DataAccessor to extract various data structures, a DocCorpus to access documents in an unstructured corpus, or a FormatReader to read data.