Tidy Data, and Facts & Rules

Managing Relational Data in R

Neal Fultz

UCLA Statistics

Tidy Data
Messy Data
Tidy Tools
Analysis consists of a pipeline:

Many tools are (should be?) tidy-in/tidy-out.
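A minimal sketch of tidy-in/tidy-out, assuming base R and the built-in `sleep` data set. Each step takes a data frame and returns a data frame, so steps chain without glue code. The helper names `keep_positive` and `mean_by_group` are hypothetical and not from any package.

```r
# Each step takes a data frame and returns a data frame.
keep_positive <- function(df) df[df$extra > 0, , drop = FALSE]

mean_by_group <- function(df) {
  aggregate(extra ~ group, data = df, FUN = mean)
}

# Because input and output have the same shape, the steps compose directly.
result <- mean_by_group(keep_positive(sleep))
result
```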

None of that is new.

Each attribute must represent a fact about the key, the whole key, and nothing but the key, so help me Codd.

Normalize till it hurts, denormalize till it works.
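To make the two quotes concrete, here is a hedged sketch in base R with made-up data (the `orders_wide` table and its columns are hypothetical). The customer's city is a fact about the customer, not about the order, so it moves to its own table. `merge()` denormalizes again when an analysis needs the wide form.

```r
# Denormalized: city is repeated on every order by the same customer.
orders_wide <- data.frame(
  order_id = 1:4,
  customer = c("ann", "bob", "ann", "cat"),
  city     = c("LA", "SF", "LA", "NY"),
  amount   = c(10, 25, 5, 40),
  stringsAsFactors = FALSE
)

# Normalize: one table per kind of fact.
customers <- unique(orders_wide[, c("customer", "city")])
orders    <- orders_wide[, c("order_id", "customer", "amount")]

# Denormalize on demand with a join, then restore row and column order.
rebuilt <- merge(orders, customers, by = "customer")
rebuilt <- rebuilt[order(rebuilt$order_id), names(orders_wide)]
```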

Database people knew this stuff in the 80s.
Facts & Rules for R
Example
require(FactsRules)
## Loading required package: FactsRules
data(sleep)
. <- NA
fact(sleep)
Fact Example
sleep(., ., .)
##    extra group ID
## 1    0.7     1  1
## 2   -1.6     1  2
## 3   -0.2     1  3
## 4   -1.2     1  4
## 5   -0.1     1  5
## 6    3.4     1  6
## 7    3.7     1  7
## 8    0.8     1  8
## 9    0.0     1  9
## 10   2.0     1 10
## 11   1.9     2  1
## 12   0.8     2  2
## 13   1.1     2  3
## 14   0.1     2  4
## 15  -0.1     2  5
## 16   4.4     2  6
## 17   5.5     2  7
## 18   1.6     2  8
## 19   4.6     2  9
## 20   3.4     2 10
Fact Example
sleep(ID = 2, extra)
##    extra
## 2   -1.6
## 12   0.8
sleep(ID = 2, .)
##    extra
## 2   -1.6
## 12   0.8
sleep(ID = id, extra = e, group = g)
##       e g id
## 1   0.7 1  1
## 2  -1.6 1  2
## 3  -0.2 1  3
## 4  -1.2 1  4
## 5  -0.1 1  5
## 6   3.4 1  6
## 7   3.7 1  7
## 8   0.8 1  8
## 9   0.0 1  9
## 10  2.0 1 10
## 11  1.9 2  1
## 12  0.8 2  2
## 13  1.1 2  3
## 14  0.1 2  4
## 15 -0.1 2  5
## 16  4.4 2  6
## 17  5.5 2  7
## 18  1.6 2  8
## 19  4.6 2  9
## 20  3.4 2 10
id <- 2
sleep(ID = id, extra = e, group = g)
##       e g
## 2  -1.6 1
## 12  0.8 2
sleep(ID = id + 1, extra = e, group = g)
##       e g
## 3  -0.2 1
## 13  1.1 2
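For readers without the package, the queries above map onto ordinary data-frame subsetting. The sketch below is an assumed base R equivalent, not how `FactsRules` is implemented: an argument bound to a value becomes a row filter, and an unbound name becomes a selected (and possibly renamed) column.

```r
data(sleep)
id <- 2

# sleep(ID = 2, extra): filter on ID, keep the extra column.
q1 <- sleep[sleep$ID == as.character(id), "extra", drop = FALSE]

# sleep(ID = id, extra = e, group = g): same filter, columns renamed.
q2 <- sleep[sleep$ID == as.character(id), c("extra", "group")]
names(q2) <- c("e", "g")

# sleep(ID = id + 1, extra = e, group = g): the bound value is an expression.
q3 <- sleep[sleep$ID == as.character(id + 1), c("extra", "group")]
names(q3) <- c("e", "g")
```

`ID` is a factor in `sleep`, so the comparison is made against the level label as a string.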
Fact Example
vsleep <- Vectorize(sleep)
vsleep(., ., ID = 2:4)
##       [,1]      [,2]      [,3]     
## extra Numeric,2 Numeric,2 Numeric,2
## group factor,2  factor,2  factor,2
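The matrix-of-lists output above comes from `Vectorize()`, which wraps `mapply()` and simplifies equal-length results into an array. A sketch with a hypothetical plain function `by_id` standing in for the fact `sleep`:

```r
# Hypothetical stand-in for the fact: rows of sleep for one ID.
by_id <- function(ID) {
  sleep[sleep$ID == as.character(ID), c("extra", "group")]
}

v_by_id <- Vectorize(by_id)
res <- v_by_id(2:4)

dim(res)           # rows are the data-frame columns, one column per ID
res[["extra", 1]]  # the extra values for ID 2

# lapply() keeps each result as a data frame, which is often easier to use.
res_list <- lapply(2:4, by_id)
```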
Rule example
# fact(parent)
# A parent's parent is a grandparent.
rule(grandparent(g, x) ~ {
    parent(g, y)
    parent(y, x)
})
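Read relationally, this rule is a self-join: the shared variable `y` is the join key. The sketch below shows that reading in base R with made-up `parent` facts. It is an assumed equivalent for illustration, not the package's implementation.

```r
# Hypothetical parent facts: each row says `parent` is a parent of `child`.
parent <- data.frame(
  parent = c("alice", "alice", "bob", "carol"),
  child  = c("bob", "carol", "dave", "erin"),
  stringsAsFactors = FALSE
)

# grandparent(g, x) holds when parent(g, y) and parent(y, x) both hold.
step1 <- setNames(parent, c("g", "y"))  # parent(g, y)
step2 <- setNames(parent, c("y", "x"))  # parent(y, x)
grandparent <- merge(step1, step2, by = "y")[, c("g", "x")]
grandparent
```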
Rule example
# From example(merge):
#   (m1 <- merge(authors, books, by.x = 'surname', by.y = 'name'))
# fact(authors); fact(books);
rule(bookAuthorDeceased(title, name, deceased) ~ {
    authors(name, deceased = deceased)
    books(name, title)
})
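The same rule written as an explicit join, as a sketch. The two data frames below are a few illustrative rows modeled on the `authors` and `books` tables from `example(merge)`, not the full data, and the column selection is an assumption about what the rule returns.

```r
authors <- data.frame(
  surname  = c("Tukey", "Venables", "Ripley"),
  deceased = c("yes", "no", "no"),
  stringsAsFactors = FALSE
)
books <- data.frame(
  name  = c("Tukey", "Venables", "Ripley", "Ripley"),
  title = c("Exploratory Data Analysis", "Modern Applied Statistics ...",
            "Spatial Statistics", "Stochastic Simulation"),
  stringsAsFactors = FALSE
)

# The rule's shared variable `name` is the join key on both sides.
m <- merge(authors, books, by.x = "surname", by.y = "name")
bookAuthorDeceased <- setNames(m[, c("title", "surname", "deceased")],
                               c("title", "name", "deceased"))
bookAuthorDeceased
```

With `merge()` the key columns have to be spelled out in `by.x` and `by.y`; in the rule, reusing the variable `name` in both clauses does the same job.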
Rule example
# Good old Fibonacci example
rule(Fib(x = 0, y = 0) ~ {
    MATCH
}, Fib(x = 1, y = 1) ~ {
    MATCH
}, Fib(x, y) ~ {
    test(x > 1)
    data.frame(A = x - 1, B = x - 2)
    Fib(A, y1)
    Fib(B, y2)
    data.frame(y = y1 + y2)
})
Fib(x = 10, .)
##    y
## 1 55
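For comparison, a plain recursive R function that mirrors the three clauses of the rule. This is an ordinary function written for illustration (the name `fib` is hypothetical), not something the package generates.

```r
# Plain recursive function mirroring the three clauses of the rule.
fib <- function(x) {
  if (x == 0) return(0)    # Fib(x = 0, y = 0)
  if (x == 1) return(1)    # Fib(x = 1, y = 1)
  stopifnot(x > 1)         # test(x > 1)
  fib(x - 1) + fib(x - 2)  # y = y1 + y2
}

fib(10)
```

The function only runs "forward" from `x` to `y`, whereas the rule is stated as a relation between `x` and `y`.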
References