Time-oriented data are of great importance because they are found in almost every database, be it as a record of working hours or as a detailed list of sales statistics in an online shop. However, as with any other data, these records tend to contain errors, and correcting them manually requires a lot of effort and time and thus incurs high costs. Some estimates go so far as to say that up to 40% of data contain errors. Many methods and tools focus on cleansing 'dirty' data, but they rarely focus on time-oriented data. Some tools may help with a few time-oriented data problems, but time is hardly ever the main target. Those that do set out to deal with 'dirty' time-oriented data mostly focus on visual representation to make error detection easier for the user. This led us to implement a research prototype that provides (semi-)automatic operations to address many possible time-oriented data quality problems. Most of these operations do not require any further knowledge of the methods applied and are hence ready to be used by a large audience. We evaluated the prototype in a usability study and derived suggestions for possible improvements.
Time-Oriented Data
The visualization of time-oriented data is an essential part of analysis and, in today’s digital age, is carried out by software applications. Preparing time-oriented data graphically is a difficult task because time is a complex variable. Visualization techniques present time-oriented data graphically so that specific patterns and structures can be identified. Developing such techniques requires sample data for testing and demonstration. Data can be obtained from various sources, but real data is not always available, e.g. due to legal issues. In such cases synthetic data, which is produced by data generators, can be used. Several generators exist, but most are not specialized in time-oriented data, and even fewer are able to visualize the data they generate. To close this gap, this master thesis presents a software design that generates time-oriented data visually: data is no longer generated first and then visualized, but is generated directly from the visualization. An additional aim is to provide a design that is suitable for both experts and non-experts, so that both can work with a corresponding implementation. The design is realized as a prototype, which is also used for the evaluation. One expert and two users who are not experts on time-oriented data use the prototype to generate specific data sets.