Tutorial#
Heedy’s timeseries transform capabilities are simple, but powerful. This tutorial will guide you from the very basics of transforming data to creating sophisticated transform pipelines.
Basics#
Each datapoint in a timeseries has the following format:
{
"t": floating point timestamp (unix time in seconds),
"dt": floating point duration in seconds,
"d": the datapoint's data content.
}
Unless explicitly stated, transforms focus on the datapoint’s data content. To start out, we will use the following tiny timeseries (which has no duration):
[
{ "t": 123, "d": 2 },
{ "t": 124, "d": 3 },
{ "t": 125, "d": 0.1 },
{ "t": 126, "d": -50 }
]
Comparisons#
Let’s check which datapoints have their data >= 1.
d >= 1
If you are familiar with programming, this is just a simple comparison statement. In PipeScript, d represents the “current datapoint”. The transform is then run on each consecutive datapoint in the timeseries:
[
{ "t": 123, "d": true },
{ "t": 124, "d": true },
{ "t": 125, "d": false },
{ "t": 126, "d": false }
]
You can also use and
, or
, and not
to create logic of arbitrary complexity:
d < 0 or not d < 1
[
{ "t": 123, "d": true },
{ "t": 124, "d": true },
{ "t": 125, "d": false },
{ "t": 126, "d": true }
]
Algebra#
PipeScript also supports basic algebra. In particular, +-/*%^
are all built into the language, with x^y
meaning pow(x,y)
.
(d+5)/2
gives:
[
{ "t": 123, "d": 3.5 },
{ "t": 124, "d": 4 },
{ "t": 125, "d": 2.55 },
{ "t": 126, "d": -22.5 }
]
Aggregating Data#
Not all transforms return an answer for each datapoint. Some aggregate your data:
sum
[{ "t": 123, "dt": 3, "d": -44.9 }]
As expected, the sum transform returns a single datapoint, which has in its data portion the sum over the entire timeseries. Examples of other available aggregators are count
, mean
, min
, and max
.
Filtering Data#
Transforms can take arguments as input. For example, the where
transform removes all datapoints that don’t satisfy the condition given in its first argument:
where(d>=2)
[
{ "t": 123, "d": 2 },
{ "t": 124, "d": 3 }
]
Note that the parentheses are optional here. The above transform is equivalent to:
where d>=2
The where transform is similar to SELECT WHERE
in SQL.
Chaining Transforms#
Oftentimes, you want to combine multiple transforms. PipeScript allows you to do this using pipes:
a | b | c | d
The output of a
is used as the input to b
, and so forth.
For this section, we will use the following dataset from a fitness tracker:
[
{
"t": 1,
"d": {
"steps": 14,
"activity": "walking"
}
},
{
"t": 2,
"d": {
"steps": 10,
"activity": "running"
}
},
{
"t": 3,
"d": {
"steps": 12,
"activity": "walking"
}
},
{
"t": 4,
"d": {
"steps": 5,
"activity": "running"
}
}
]
Suppose we want to get the total number of steps we took while running. We cannot do this with one transform, but by chaining together a couple of simple transforms, we can get there!
First off, let’s filter the datapoints so that we have just those where we were running:
where d("activity")=="running"
Notice that the d
accepts an argument - it allows you to return a sub-object of the datapoint. Our result is:
[
{
"t": 2,
"d": {
"steps": 10,
"activity": "running"
}
},
{
"t": 4,
"d": {
"steps": 5,
"activity": "running"
}
}
]
We can now add a |
after the first part of our statement, and we can perform further transforms on the result of the previous operation (shown above).
After extracting only the datapoints that have their activity as “running”, we return only the “steps” portion of the datapoint:
where d("activity")=="running" | d("steps")
[
{
"t": 2,
"d": 10
},
{
"t": 4,
"d": 5
}
]
Finally, we want to sum the datapoints to get the total number of steps while running:
where d("activity")=="running" | d("steps") | sum
[
{
"t": 2,
"dt": 2,
"d": 15
}
]
Advanced Pipes#
All arguments to each transform are actually transform pipelines. For example, one can go multiple levels into a nested object within an argument to the where transform:
where( (d("level1") | d("level2")) == 4 )
For convenience, PipeScript also includes :
as a pipe symbol with high precedence (the pipe will be taken before algebra is done), allowing simplification of the script a bit by dropping the internal parentheses:
where( d("level1"):d("level2") == 4 )
In order for the parent (where
) to always get SOME result in its argument, sub-transforms cannot include transforms that are not One-To-One (for each datapoint that they get as input, they output one datapoint). This means that you cannot nest where
transforms.
Pipe Args#
Some advanced transforms have an argument listed as a pipe
type. Sub-transforms are linked to previous data, whereas pipes are treated as scripts, which are managed by the transform.
As an example, the map
transform splits the timeseries along unique values of its first argument, and runs the pipe given in its second argument on each resulting stream.
We will once again use the timeseries from the previous example:
[
{
"t": 1,
"d": {
"steps": 14,
"activity": "walking"
}
},
{
"t": 2,
"d": {
"steps": 10,
"activity": "running"
}
},
{
"t": 3,
"d": {
"steps": 12,
"activity": "walking"
}
},
{
"t": 4,
"d": {
"steps": 5,
"activity": "running"
}
}
]
Remember that previously, we found the total number of steps while running with the transform where d("activity")=="running" | d("steps") | sum
.
We will now extend that to find the number of steps for each activity, using the map
transform:
map( d("activity") , d("steps"):sum )
[
{
"t": 4,
"dt": 3,
"d": {
"walking": 26,
"running": 15
}
}
]
What happened here?
When calling map(arg1,arg2)
, the map
transform uses arg2
as a Pipe. It then makes copies of arg2
for each value of arg1
, returning the output of the pipe for each instantiation/
To clarify, we will see exactly what happened in the above call:
The
map
transform saw the first datapoint. The value ofarg1
, (d("activity")
) waswalking
. It created a new instance ofarg2
,d("steps"):sum
, and sent the datapoint through this transform, giving a total of14
so far forwalking
.The next datapoint had as its activity
running
. Another new instance ofd("steps"):sum
was created, and the datapoint was sent through it. The sum forrunning
starts at10
The third datapoint is
walking
.map
already has a pipeline started for this value, so it passes the new datapoint through the first pipe, giving a sum of26
(14+12)The fourth datapoint is
running
. Passing it to the the corresponding pipe, we get15
.There are no more datapoints. The
map
transform returns an object with the last value of each pipe as a result.
The ability of transforms to be passed pipes as arguments allows them to do extremely powerful aggregations.
Object-Valued Transforms#
You might want to perform multiple calculations at once in pipescript, or perhaps simply return an object. For this reason, PipeScript supports JSON-like values:
{"sum": sum, "total": count}
This transform will return both the sum of all of the datapoints’ values, and the number of datapoints at the same time. This object support also enables you to save values for later use in the pipeline.
Finally, since transforms can get fairly complex with objects, PipeScript does accept multiline scripts. That is, the following is a valid script format:
where d("activity")!="still"
| {
"total": d("steps"):sum,
"some_random_stuff": ( d("steps") | something | something else )
}