Keynote Speakers
Amol Deshpande (University of Maryland, homepage)
will be speaking about "INCREASING REPRESENTATIONAL POWER AND SCALING REASONING IN PROBABILISTIC DATABASES".
ABSTRACT: Increasing numbers of real-world application domains are generating data that is inherently noisy, incomplete, and probabilistic in nature. Statistical analysis and probabilistic inference, widely used in those domains, often introduce additional layers of uncertainty. Examples include sensor data analysis, data integration and information extraction on the Web, social network analysis, and scientific and biomedical data management. Managing and querying such data requires us to combine the tools and the techniques from a variety of disciplines including databases, first-order logic, and probabilistic reasoning. There has been much work at the intersection of these research areas in recent years. The work on probabilistic databases has made great advances in efficiently executing SQL and inference queries over large-scale uncertain datasets. The research in first-order probabilistic models like probabilistic relational models, Markov logic networks etc. (see Getoor and Taskar for a comprehensive overview), and the work on lifted inference has resulted in several techniques for efficiently integrating first-order logic and probabilistic reasoning.
In this talk, I will present some of the foundations of large-scale probabilistic data management, and the challenges in scaling the representational power and the reasoning capabilities of probabilistic databases. I will use the PrDB probabilistic data management system being developed at the University of Maryland as a case study for this purpose. Unlike the other recent work on probabilistic databases, PrDB is designed to represent uncertain data with rich correlation structures, and it uses probabilistic graphical models as the basic representation model. I will discuss how PrDB supports compact specification of uncertainties at different abstraction levels, from "schema-level" uncertainties that apply to entire relations to "tuple-specific" uncertainties that apply to a specific tuple or a specific set of tuples; I will also discuss how this relates to the work on first-order probabilistic models. Query evaluation in PrDB can be formulated as inference in appropriately constructed graphical models, and I will briefly present some of the key novel techniques that we have developed for efficient query evaluation, and their relationship to recent work on efficient lifted inference. I will conclude with a discussion of some of the open research challenges moving forward.Pierre Fraigniaud (CNRS and University of Paris 7, homepage)
will be speaking about "INFORMATIVE LABELING SCHEMES".
ABSTRACT: Abstract: Network representations play an important role in many domains of computer science, ranging from data structures and graph algorithms, to parallel and distributed computing, and communication networks. Traditional network representations are usually global in nature. That is, in order to retrieve useful information, one must access a global data structure representing the entire network, even if the desired information is solely local, pertaining to only a few nodes. In contrast, the notion of informative labeling schemes suggests the use of a local representation of the network. The principle is to associate a label with each node, selected in a way that enables to infer information about any two nodes directly from their labels, without using any additional sources of information. Hence in essence, this method bases the entire representation on the set of labels alone. Obviously, labels of unrestricted size can be used to encode any desired information, including in particular the entire graph structure. The focus is thus on informative labeling schemes which use labels as short as possible.
This talk will introduce the notion of informative labeling scheme to the audience, and will survey some of the important results achieved in this context. In particular, we will focus on the design of compact adjacency-, ancestry-, and distance-labeling schemes for trees. These schemes find applications in various contexts, including the design of small universal graphs, and the design of small universal posets. We will actually specifically emphasis the importance of ancestry- labeling scheme for the design of compact such schemes finds applications in XML search engines. In this context, even small improvements in the label size are important, and we will survey the most recent results in this domain.
This is a joint work with Amos Korman (CNRS and University Paris Diderot)Martin Grohe (University of Berlin, homepage)
will be speaaking about "FROM POLYNOMIAL TIME QUERIES TO GRAPH STRUCTURE THEORY".
ABSTRACT: In a fundamental article on query languages for relational databases, Chandra and Harel asked in 1982 whether there is a language that expresses precisely those queries which can be answered in polynomial time. Gurevich later rephrased the question in the language of finite model theory, asking whether there is a logic that captures polynomial time. Despite serious efforts in the late 1980s and 1990s, the question is still wide open. It is considered to be one of the main open problems in database theory and finite model theory. Recently, there has been a renewed interest in the question. New languages have been proposed and old ones reconsidered, and a number of partial results stating that certain languages capture polynomial time on large and natural classes of structures have been obtained.
This talk will be a survey of the state of the art in the "quest for a logic capturing polynomial time." The focus will be on positive results for restricted classes of structures. This will lead us on an excursion to modern graph structure theory, and in particular to Robertson and Seymour's Graph Minor Theory.Ian Horrocks (Oxford University, homepage)
will be speaaking about "SCALABLE ONTOLOGY-BASED INFORMATION SYSTEMS"
ABSTRACT: Ontologies and ontology based systems are becoming increasingly important in meeting the demand for more powerful and flexible information systems. Requirements for such systems include the need to deal with incomplete and semi-structured information, to integrate information from heterogeneous sources, to employ richer and more flexible schemas, and for query answers to reflect both knowledge and data. Provision of such enhanced capabilities must, however, be in addition to, and not instead of, the well-established features of existing database systems, in particular their robust scalability. Achieving this is, of course, extremely challenging. In this talk I will present some recent research efforts that tackle this problem, including investigations of tractable fragments, new algorithmic techniques, new optimisations and the exploitation of relational database technology.
Val Tannen (University of Pennsylvania, homepage)
will be speaking about PROVENANCE FOR DATABASE TRANSFORMATIONS
ABSTRACT: Database transformations (queries, views, mappings) take apart, filter, and recombine source data in order to populate warehouses, materialize views, and provide inputs to analysis tools. As they do so, applications often need to track the relationship between parts and pieces of the sources and parts and pieces of the transformations' output. This relationship is what we call database provenance.
This talk presents an approach to database provenance that is based on two observations. First, provenance is a kind of annotation, and we can develop a general approach to annotation propagation that also covers other applications, for example to uncertainty and access control. In fact, provenance turns out to be the most general kind of such annotation, in a precise and practically useful sense. Second, the propagation of annotation through a broad class of transformations relies on just two operations: one when annotations are jointly used and one when they are used alternatively. This leads to annotations forming a specific algebraic structure, a commutative semiring.
The semiring approach works for annotating tuples, field values and attributes in standard relations, in nested relations (complex values), and for annotating nodes in (unordered) XML. It works for transformations expressed in the positive fragment of relational algebra, nested relational calculus, unordered XQuery, as well as for Datalog, GLAV schema mappings, and tgd constraints. Specific semirings correspond to earlier approaches to provenance, while others correspond to forms of uncertainty, trust, cost, and access control.
This is joint work with J.N. Foster, T.J. Green, Z. Ives, and G. Karvounarakis, done in part within the frameworks of the Orchestra and pPOD projects.