FIP: Dataflow topology builder and analytics

### **Motivation:**
Understanding millions of related events and their relationship in a dataflow is impossible. Events span serverless functions, stream processors and microservices. Building that story and extracting instrumentation knowledge is virtually impossible. It should be possible to build a queryable dataflow model that supports instrumentation. The result would be the ability to scan millions of events (historcally and realtime), break them down by different attributes, identify anomolies, drill into the set of interest, and then down to a singular dataflow with relevant contextual information.

For example:
* millions of IoT events -> those devices generating errors or performing slowly
* millions of payments -> system latency degradation over time, identifying the types of payments of attributes affecting performance.

### **Proposed change**

The ability to build a data flow model that support analytics such as number of flows, latency between steps. Break down millions of data flows against different attributes. Support filtering by analytics, attributes, latency and others. In someways it is equivalent to distributed tracing except it 
* also includes related logline against a correlation-Id (provides context). 
* doesn’t require special infrastructure (json data stored on cloud-storage)
* extensible. (I.e. can be updated to support cloud events in future)
* low impact to existing systems


### **V1 Mechanics:**

**Note  limitations:**
1. supports a single-correlation-id (no propagation fan-out etc), 
2. rewriting correlation data (duplication)

**Stage 1. Rewrite dataflows by time-correlation**
Format: {timestamp}-corr-id.log -> records[timestamp:data]

This makes each stage of the dataflow an append-only log of that stage of the correlations dataflow. Timestamps will distinguish between different states of the flow. 

For example.

```
time-1: corr-a data-1
time-2: corr-b data-2
time-3: corr-a data-3
```
Yield 2 data files
```
time-1-corr-a.log -> records for corr-a
time-2-corr-b.log -> records for corr-b
```
**Stage 2. calculate dataflow level analysis.index**

For each dataflow generate span-information into a time-n-corr-m.index file

**Stage 3. build per-day aggregate view of correlation data**

1. Scan all index files to extract histogram analytics including: 
* number of dataflows
* percentile breakdown of min,max,avg, 95th percentile, 99th percentile data
2. The output histogram will be stored using the timerange of the query to prevent recaulcation requirements.


#### **Datamodel:**

1.  RPC call

I.e. Client->function(request)->client(result)

**Client-app1.log** 
{ ts:’1234010’, corid:’aaaaa’}
{ ts:’1234011’, corid:’aaaaa’ stage:’request-calc’}

**ServerlessFunction.log**
{ ts:’1234021’, corid:’aaaaa’ stage:’bootstrap’ serverlessContextId:’1234:name:version’}
{ ts:’1234022, corid:’aaaaa’ stage:’starting’ serverlessContextId:’1234’}
{ ts:’1234023’, corid:’aaaaa’ msg:’processing some stuff for things’}
{ ts:’1234024’, corid:’aaaaa’ stage:’finished’ serverlessContextId:’1234’}
Lambda finished processing: memory:1234 time:2345 <todo>

Client-app1.log 
{ ts:’1234025, corid:’aaaaa’ stage:’calc-finished’}

**Single Call attributes collected: Raw**
Request start-time (corrid, timestamp, stage)
Fun load-time, (fnId, timestamp)
Fun handler start (corrid, fnId, timestamp, stage)
Fun handler end (corrId, fnId, timestamp , stage, fn-stats)
Response received-time (corrid, timestamp, stage)

**Analytics**

**Call chain analytics**
Stage1->stage2 elapsed (states are labels on the log line)
Stage2->stage3 elapsed
...etc...

**_Simple timeseries analysis:_**
Represented as candle-chart?
1. Individual stage-analytics using percentile/max/avg
 stageA-stageB elapsed percentiles (label-stageA->stageB 
  Expression: `analytics.dataflow.stageStats(dataset-name?)`

1. End-to-end stage analytics: 
Stage1->Stage4 elapsed using percentiles
  Expression: `analytics.dataflow.endToEndStats(dataset-name?)`

1. Filter stage-stage latency by value: 
Stage1->Stage4 elapsed > 100ms
  Expression: _`analytics.dataflow.stageFilter(stage1 - stage4 < 100)`_ `analytics.dataflow.endToEndStats(dataset-name?)`


2. Branch RPC
Client-app1.log 
{ ts:’1234010’, corid:’aaaaa’}
{ ts:’1234011’, corid:’aaaaa’ stage:’request-calc-1’}
{ ts:’1234011’, corid:’aaaaa’ stage:’request-calc-2’}
{ ts:’1234011’, corid:’aaaaa’ stage:’request-calc-3’}

ServerFunction.logs (single source file)
{ ts:’1234021’, corid:’aaaaa’ stage:’bootstrap’ serverlessContextId:’1111:name:version’}
{ ts:’1234022, corid:’aaaaa’ stage:’starting’ serverlessContextId:’1111’}
{ ts:’1234024’, corid:’aaaaa’ stage:’finished’ serverlessContextId:’1111’}
{ ts:’1234021’, corid:’aaaaa’ stage:’bootstrap’ serverlessContextId:’2222:name:version’}
{ ts:’1234022, corid:’aaaaa’ stage:’starting’ serverlessContextId:’2222’}
{ ts:’1234024’, corid:’aaaaa’ stage:’finished’ serverlessContextId:’2222’}
{ ts:’1234021’, corid:’aaaaa’ stage:’bootstrap’ serverlessContextId:’3333:name:version’}
{ ts:’1234022, corid:’aaaaa’ stage:’starting’ serverlessContextId:’3333’}
{ ts:’1234024’, corid:’aaaaa’ stage:’finished’ serverlessContextId:’3333’}

Client-app1.log 
{ ts:’1234025, corid:’aaaaa’ stage:’calc-finished’}



Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

FIP: Dataflow topology builder and analytics #50

Motivation:

Proposed change

V1 Mechanics:

Datamodel:

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

FIP: Dataflow topology builder and analytics #50

Description

Motivation:

Proposed change

V1 Mechanics:

Datamodel:

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions