“Does greenhouse gas emission cause global warming?”, “Does gas price depend on supply and demand?”, “Does stress cause depression?”, “Does productivity depend on competition?”: these are a few of the many questions we face in our daily lives that naturally expose our minds to the concepts of dependence and causation, in contexts such as politics, sports, finance, education, health and technology. But what do these two words imply? When exposed to the word dependence, an average person almost always thinks of a relation; i.e., if two variables are dependent, then a change in one would result in a change in the other. The same intuitive understanding also applies to the concept of causation. Thus, these concepts are rather transparent in our minds. What remains a challenging problem to this day, and often a center of debate, is how to quantify these concepts while preserving their intuitive nature. Quantifying dependence and causation can be viewed as constructing shelters. Any building is a shelter, and it is built with bricks, the basic building block. But buildings differ in their architecture, and may possess different characteristics to serve different purposes. Similarly, dependence and causation can be quantified in many different ways, where the building blocks are observations of signals, and the objective is to arrange them in the most robust and cost-effective way to capture a certain attribute.
This dissertation explores the concepts of dependence and causation from an engineering perspective, in particular that of machine learning and data mining. The contributions of this dissertation are: first, to unify available quadratic measures of independence to develop a computationally simpler independent component analysis algorithm; second, to establish a robust kernel-based measure of conditional independence to detect Granger non-causality beyond linear interaction; third, to develop a novel understanding of dependence between arbitrary random variables from the perspective of realizations, and to construct parameter-free estimators for practical problems such as variable selection in gene expression data; and fourth, to extend this approach to develop scalable estimators of conditional dependence to quantify causal flow in EEG data.