Web Data Extraction

The information dispersed over the Internet is an enormous but elusive treasure. While tools for wrapping a priori known sites exist, data extraction from hitherto unknown sites remains a challenge. In the » DIADEM project [» PVLDB'14 | » WWW'12 | » DEXA'12 | » ICWE'11], we develop an approach for extracting data from arbitrary sites of a given domain, such as the real estate or rare book market. Thus, with DIADEM, it will be possible to obtain and continously maintain a database covering all offers of the respective market. Within DIADEM, I mainly work on three subprojects:

colorful whitebox

Test Case Generation

Although central to software engineering, there appears to be no adequate language to specify test suites concisely, precisely, and workable for corresponding tools. For example, most certification standards for dependable software require test suites satisfying some structural coverage criteria, such as statement or decision coverage. But in most cases, the precise meaning of the required criteria remains ambiguous.

As a possible solution to this problem, the FShell Query Language (FQL) extends regular expressions to match test suites rather than individual executions [» ASE'10]. Its associated test input generator » FShell [» VMCAI'09 | » CAV'08] automatically generates test suites for ANSI C programs which satisfy FQL coverage specifications. Since FQL supports not only common coverage criteria but also specifically tailored variants thereof, FQL allows to use coverage criteria in tasks which are traditionally driven by ad-hoc testing, such as program exploration. We address some of the thereoretical issues arising in the design of the FQL language in [» FSTTCS'13].

warning lights

Runtime Verification

In runtime verification (see [» JLAP'09] for an overview), properties of the system under scrutiny are continuously monitored during runtime---either as part of a testing procedure or as foundation for introspective systems which monitor their own behaviour, for example to achieve some degree of fault-tolerance. To this end, we need suitable specification languages to express the properties to be monitored and efficient methods to turn such specifications into runnable monitoring procedures.

In our approach, we developed for Linear Time Logic (LTL) both, a semantics for runtime verification and a corresponding monitoring approach. This semantics is impartial, i.e., it does not yield premature verdicts, and anticipatory, i.e., it provides a definitive verdict as early as possible. While monitor generation for this semantics is intractable in general, it proved to be efficient in practical cases. We lifted this approach to Timed LTL (TLTL) for real-time properties [» TOSEM'11] and generalized the approach towards other LTL variants [» ATVA'08]. We also compared our and other LTL semantics dealing with finite systems traces in [» JLC'10] for their usefulness in runtime verification.