Introduction
A track expression allows one to retrieve numerical data recorded in the tracks. Track expressions are widely used in various functions (emr_screen, emr_extract, emr_dist, …).
A track expression is a character string that closely resembles a valid R/Python expression. Just like any other R/Python expression, it may include conditions, function calls and previously defined variables: "1 > 2", "mean(1:10)" and "myvar < 17" are all valid track expressions. Unlike regular R/Python expressions, however, a track expression may also contain track names and/or virtual track names.
To understand how a track expression accesses the tracks, we must explain how it is evaluated.
Every track expression is accompanied by an iterator that produces a set of id-time points of the form (id, time, ref). The track expression is evaluated for each iterator point. The value of the track expression "mean(1:10)" is constant regardless of the iterator point. However, the track expression might contain a track name such as mytrack, as in "mytrack * 3". Naryn then recognizes that mytrack is not a regular R/Python variable but a track name, and adds a new run-time track variable named mytrack to the R environment (or to the Python module's local dictionary). For each iterator point, this variable is assigned the value of the track that matches (id, time, ref), or NaN if no matching value exists in the track. Once mytrack has been assigned the corresponding value, the track expression is evaluated in R/Python.
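The evaluation loop described above can be modeled in a few lines of Python. This is a simplified sketch, not Naryn's implementation: the track is a hypothetical dictionary keyed by (id, time, ref), the iterator is a hypothetical list of points, and each point is evaluated separately with a scalar value (Naryn actually uses vectors, as explained in the next section).

```python
import math

# Hypothetical in-memory track: (id, time, ref) -> value
mytrack_data = {(1, 100, 0): 4.0, (1, 200, 0): 6.0, (2, 100, 0): 1.5}

# Hypothetical iterator points, each of the form (id, time, ref)
iterator_points = [(1, 100, 0), (1, 150, 0), (2, 100, 0)]

results = []
for point in iterator_points:
    # Assign the matching track value to the track variable, or NaN if none exists
    value = mytrack_data.get(point, math.nan)
    # Evaluate the track expression with "mytrack" bound to that value
    results.append(eval("mytrack * 3", {}, {"mytrack": value}))

print(results)  # [12.0, nan, 4.5]
```

The second iterator point has no matching track value, so mytrack is NaN there and so is the result.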
Run-time Track Variable is a Vector
To boost the performance of track expression evaluation, run-time track variables are actually defined in R as vectors rather than scalars. The result of the evaluation is likewise expected to be a vector of the same length. Always keep this vector notation in mind and write track expressions accordingly.
For example, at first glance the track expression "min(mytrack, 10)" seems perfectly fine. However, evaluating this expression always produces a scalar, i.e. a single number, even though mytrack is actually a vector. To make this expression work on vectors, use the pmin function instead of min: "pmin(mytrack, 10)".
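The same pitfall exists in NumPy, which makes it easy to reproduce in Python: np.min aggregates its input into a single number, like R's min, whereas np.minimum works element-wise, like R's pmin. The values below are made up for illustration.

```python
import numpy as np

# Made-up values of a track variable for four iterator points
mytrack = np.array([3.0, 12.0, 7.0, 25.0])

# Aggregating minimum (like R's min(mytrack, 10)): one number for all points
aggregated = np.min(np.append(mytrack, 10))

# Element-wise minimum (like R's pmin(mytrack, 10)): one result per point
elementwise = np.minimum(mytrack, 10)

print(aggregated)            # 3.0
print(elementwise.tolist())  # [3.0, 10.0, 7.0, 10.0]
```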
Python
As in R, a track variable in Python is not a scalar but an instance of numpy.ndarray. The evaluation of a track expression must therefore produce a numpy.ndarray as well.
Many operations on numpy arrays work the same way as on scalars; logical operations, however, require a different syntax. For instance:
screen("mytrack1 > 1 and mytrack2 < 2", iterator = "mytrack1")
produces an error, because mytrack1 and mytrack2 are numpy arrays and the Python and keyword cannot combine arrays element-wise. The correct way to write the expression is:
screen("(mytrack1 > 1) & (mytrack2 < 2)", iterator="mytrack1")
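The difference can be reproduced with plain NumPy arrays standing in for the track variables (the values are made up). The and keyword asks for the truth value of each whole array, which NumPy refuses for arrays with more than one element, while & combines the boolean arrays element-wise. The parentheses are required because & binds more tightly than comparison operators such as > and <.

```python
import numpy as np

# Made-up track variable values for three iterator points
mytrack1 = np.array([0.5, 1.5, 2.5])
mytrack2 = np.array([1.0, 1.0, 3.0])

try:
    (mytrack1 > 1) and (mytrack2 < 2)
    and_failed = False
except ValueError:
    # "The truth value of an array with more than one element is ambiguous"
    and_failed = True

# Element-wise logical AND: one boolean per iterator point
mask = (mytrack1 > 1) & (mytrack2 < 2)
print(and_failed, mask.tolist())  # True [False, True, False]
```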
One may coerce the track variable to behave like a scalar by setting the emr_eval.buf.size option to 1 (see Appendix for more details). Beware, though, that this can take a heavy toll on run time.
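A toy model can show why a buffer size of 1 lets a scalar-producing expression behave correctly, at a cost. The function evaluate_in_buffers below is hypothetical: it only mimics the idea of evaluating an expression on buffers of at most buf_size values and requiring one result per value. It does not reproduce Naryn's internals.

```python
import numpy as np

def evaluate_in_buffers(expr, values, buf_size):
    """Hypothetical model: evaluate expr on consecutive buffers of at most buf_size values."""
    results = []
    for start in range(0, len(values), buf_size):
        buffer = values[start:start + buf_size]
        out = np.atleast_1d(np.asarray(expr(buffer), dtype=float))
        # Require one result per value in the buffer
        if out.shape != buffer.shape:
            raise ValueError(f"expected {buffer.size} results, got {out.size}")
        results.append(out)
    return np.concatenate(results)

values = np.array([3.0, 12.0, 7.0, 25.0])

def scalar_min(v):
    # Analogue of R's min(mytrack, 10): always returns a single number
    return np.min(np.append(v, 10))

# buf_size = 1: correct per-point results, but expr is called once per value
print(evaluate_in_buffers(scalar_min, values, buf_size=1).tolist())  # [3.0, 10.0, 7.0, 10.0]

# buf_size = 4: one number for four points, so the size check fails
try:
    evaluate_in_buffers(scalar_min, values, buf_size=4)
except ValueError as err:
    print(err)  # expected 4 results, got 1
```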
Matching Reference in the Track Expression
If the track expression contains a track (or virtual track) name, the values of the track are fetched one by one into the identically named R variable, based on the id, time and ref of the iterator point. If, however, the ref of the iterator point equals -1, it is treated as a “wildcard”: only id and time then need to match.
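The matching rule can be written as a small predicate. This is a sketch with hypothetical names; points are plain (id, time, ref) tuples.

```python
def ref_matches(iterator_point, track_point):
    """Return True if a track point matches an iterator point (hypothetical helper)."""
    it_id, it_time, it_ref = iterator_point
    tr_id, tr_time, tr_ref = track_point
    if (it_id, it_time) != (tr_id, tr_time):
        return False
    # ref == -1 in the iterator point acts as a wildcard
    return it_ref == -1 or it_ref == tr_ref

track_points = [(7, 500, 0), (7, 500, 1), (7, 600, 0)]

exact = [p for p in track_points if ref_matches((7, 500, 1), p)]
wildcard = [p for p in track_points if ref_matches((7, 500, -1), p)]
print(exact)     # [(7, 500, 1)]
print(wildcard)  # [(7, 500, 0), (7, 500, 1)]
```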
A “wildcard” reference in the iterator creates a new issue: more than one track value might then match a single iterator point. In this case, the value placed in the track variable (e.g. mytrack) depends on the type of the track: if the track is categorical, the track variable is set to -1; otherwise it is set to the average of all matching values.
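The resulting rule for the track variable can be sketched as follows. resolve_value is a hypothetical helper that receives the values already matched for one iterator point and a flag saying whether the track is categorical.

```python
import math

def resolve_value(matched_values, categorical):
    """Value placed in the track variable for one iterator point (hypothetical helper)."""
    if not matched_values:
        return math.nan  # no matching value in the track
    if len(matched_values) == 1:
        return matched_values[0]
    # Several matches: -1 for categorical tracks, the average otherwise
    if categorical:
        return -1
    return sum(matched_values) / len(matched_values)

print(resolve_value([2.0, 4.0], categorical=False))  # 3.0
print(resolve_value([5, 9], categorical=True))       # -1
```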
Virtual Tracks
So far we have shown that in some situations the mytrack variable can be set to the average of the matching track values. But what if we do not want to average the values but rather pick the maximal, minimal or median value? What if we want to use the percentile of a track value rather than the value itself? And maybe we even want to alter the time of the iterator point, shifting it or expanding it to a time window, and thereby look at a different set of track values? For instance, given an iterator point, we might want to know the maximal glucose level during the year that preceded the time of the point.
This is where virtual tracks come into play.
A virtual track is a named set of rules that describes how the track should be processed and how the time of the iterator point should be modified. Virtual tracks are created by the emr_vtrack.create function:
emr_vtrack.create("annual_glucose",
    src = "glucose_track", func = "quantile",
    param = 0.5, time.shift = c(-year(), 0)
)
This call creates a new virtual track named annual_glucose based on the underlying physical source track glucose_track. For each iterator point with time T, we look at the values of glucose_track in the time window [T-365*24, T], i.e. the year preceding T. We then calculate the median over these values (func = "quantile", param = 0.5).
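The computation performed by annual_glucose for a single iterator point can be sketched in Python under a few assumptions: the track is a made-up list of (id, time, value) records with time in hours (so that a year is 365*24, as in the window above), the window is closed at both ends, and statistics.median stands in for func = "quantile" with param = 0.5. Naryn's quantile definition may differ from statistics.median for windows with an even number of values.

```python
import math
import statistics

YEAR = 365 * 24  # one year in hours, matching the window [T-365*24, T]

def annual_median(track, patient_id, T):
    """Median of the patient's values in [T - YEAR, T], or NaN if there are none."""
    window = [v for pid, t, v in track
              if pid == patient_id and T - YEAR <= t <= T]
    return statistics.median(window) if window else math.nan

# Made-up glucose_track records: (id, time in hours, value)
glucose_track = [(1, 1000, 90.0), (1, 5000, 110.0), (1, 9000, 130.0), (1, 9500, 100.0)]

print(annual_median(glucose_track, patient_id=1, T=10000))  # 110.0
```

The reading at time 1000 falls outside [1240, 10000], so only the remaining three values enter the median.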
Besides “quantile”, there is a rich set of functions that can be applied to the track values. Some of these functions can be used only with categorical tracks, others only with quantitative tracks, and some can be applied to both types of tracks. Please refer to the documentation of emr_vtrack.create for details.
Once a virtual track is created, it can be used in a track expression:
emr_extract("annual_glucose", iterator = list(year(), "patients.dob"))
This gives the median annual glucose level in one-year steps starting from the patient’s birthday. (This example makes use of an Extended Beat Iterator, which is explained later.)
Let’s extend our example further and exclude from our calculations the glucose readings made within a week after steroids were prescribed. We can use the additional filter parameter to do that.
emr_filter.create("steroids_filter", "steroids_track", time.shift = c(-week(), 0))
emr_vtrack.create("annual_glucose",
    src = "glucose_track", func = "quantile",
    param = 0.5, time.shift = c(-year(), 0), filter = "!steroids_filter"
)
emr_extract("annual_glucose", iterator = list(year(), "date_of_birth_track"))
The filter is applied to the id-time points of the source track (glucose_track in our example). The virtual track function (quantile, …) is then applied only to the points that pass the filter. The concept of filters is explained extensively in a separate chapter.
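The effect of filter = "!steroids_filter" can be sketched as dropping every glucose reading that has a steroids record in the week up to and including the reading time (the time.shift = c(-week(), 0) window), before the median is taken. The records are made up, times are assumed to be in hours, and this models the behavior described above rather than Naryn's code.

```python
import math
import statistics

WEEK = 7 * 24    # hours
YEAR = 365 * 24  # hours

def filtered_annual_median(glucose, steroids, patient_id, T):
    def passes_filter(t):
        # "!steroids_filter": reject t if steroids were given within [t - WEEK, t]
        return not any(pid == patient_id and t - WEEK <= s <= t for pid, s in steroids)

    window = [v for pid, t, v in glucose
              if pid == patient_id and T - YEAR <= t <= T and passes_filter(t)]
    return statistics.median(window) if window else math.nan

glucose_track = [(1, 5000, 110.0), (1, 7000, 250.0), (1, 9000, 130.0), (1, 9500, 100.0)]
steroids_track = [(1, 6950)]  # prescription 50 hours before the 250.0 reading

print(filtered_annual_median(glucose_track, steroids_track, patient_id=1, T=10000))  # 110.0
```

Without the filter, the 250.0 reading would stay in the window and the median would rise to 120.0.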
Virtual tracks also allow remapping patient ids. This is done via the id.map parameter, which accepts a data frame that defines the id mapping. Remapping ids can be useful when exploring family ties: for example, instead of the patient's own glucose level, we might be interested in the glucose level of one of their family members.
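Conceptually, id remapping redirects the lookup for one id to another id before the track is read. The sketch below models the mapping as a plain Python dict from a patient id to a family member's id. In Naryn the mapping is passed as a data frame to id.map, and its exact column layout is documented in emr_vtrack.create. Treating unmapped ids as having no value (NaN) is an assumption of this sketch.

```python
import math

# Hypothetical mapping: patient id -> id of a family member
id_map = {1: 2, 3: 4}

# Made-up glucose_track records: (id, time in hours, value)
glucose_track = [(1, 100, 95.0), (2, 100, 140.0), (4, 100, 88.0)]

def remapped_value(track, id_map, patient_id, time):
    """Value of the mapped id at the given time; NaN if unmapped or missing (assumption)."""
    target = id_map.get(patient_id)
    if target is None:
        return math.nan
    values = [v for pid, t, v in track if pid == target and t == time]
    # Average several matches, mirroring the rule for quantitative tracks
    return sum(values) / len(values) if values else math.nan

print(remapped_value(glucose_track, id_map, patient_id=1, time=100))  # 140.0
```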