ETL tool

Extract: from sheets; transform: group by; load: Looker Studio ext
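A minimal sketch of the three ETL steps in plain Python, to make the extract / group by / load split concrete. The sheet contents, the column names (`region`, `amount`) and the function names are assumptions for illustration only; the actual tool does this with visual nodes, not code.

```python
import csv
import io
from collections import defaultdict

# Hypothetical CSV export of a sheet; the columns are made up for illustration.
SHEET_CSV = """region,amount
north,10
south,5
north,7
south,3
"""

def extract(text):
    """Extract: read rows from a CSV export of a sheet."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: group by region and sum the amounts."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += float(row["amount"])
    return dict(totals)

def load(totals):
    """Load: write the grouped result as CSV text for a reporting tool to pick up."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["region", "total"])
    for region, total in sorted(totals.items()):
        writer.writerow([region, total])
    return out.getvalue()

result = load(transform(extract(SHEET_CSV)))
print(result)
```

Each function maps to one node type, so the pipeline reads as a chain of small steps.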

Small automation steps on a no-code platform

visual

Nodes:

  • Extract (read): orange
  • Transform: yellow
  • Load: red
  • brown

row filters

Analytical nodes: custom logic, e.g. a rule engine node
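A rough sketch of what a rule engine node followed by a row filter could look like in code. The rules, the `amount` column and the `tier` labels are hypothetical; the point is only that rules are checked in order and the first match wins.

```python
# Each rule is (condition, label); rules are evaluated top to bottom.
# The thresholds and labels below are assumptions for illustration.
RULES = [
    (lambda row: row["amount"] >= 1000, "high"),
    (lambda row: row["amount"] >= 100, "medium"),
]

def apply_rules(row, rules=RULES, default="low"):
    """Return the label of the first rule whose condition matches the row."""
    for condition, label in rules:
        if condition(row):
            return label
    return default

def row_filter(rows, predicate):
    """Keep only the rows for which the predicate is true."""
    return [row for row in rows if predicate(row)]

rows = [
    {"id": 1, "amount": 50},
    {"id": 2, "amount": 250},
    {"id": 3, "amount": 5000},
]

# Rule engine step: add a new column with the rule outcome.
labelled = [{**row, "tier": apply_rules(row)} for row in rows]

# Row filter step: drop the rows labelled "low".
kept = row_filter(labelled, lambda row: row["tier"] != "low")
print(kept)
```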

Workflow annotations: boxes and labels that group nodes for readability

Extensions?

automation?

An open source tool for data exploration.

Related:

Visual workflows are useful for agentic systems

Use workflows to ensure the language in text documents meets a given standard.

Types of workflows

Terminology updater
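One possible shape for a terminology updater step, sketched with the standard `re` module. The term map is a made-up example, and `update_terminology` is a hypothetical helper name, not part of any tool.

```python
import re

# Hypothetical term map: outdated term -> preferred term.
TERMS = {"whitelist": "allowlist", "blacklist": "denylist"}

def update_terminology(text, terms=TERMS):
    """Replace whole-word occurrences of outdated terms, keeping initial capitals."""
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, terms)) + r")\b",
        re.IGNORECASE,
    )

    def replace(match):
        word = match.group(0)
        new = terms[word.lower()]
        # Preserve a leading capital so sentence starts still read correctly.
        return new.capitalize() if word[0].isupper() else new

    return pattern.sub(replace, text)

print(update_terminology("Whitelist the host, then check the blacklist."))
```

The word boundaries (`\b`) mean that longer words such as "whitelisted" are left alone; whether that is the wanted behaviour depends on the standard being enforced.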

Handling Imbalanced Data

E.g. identifying fraud: it happens rarely

Uneven mix of data

Class separation

High accuracy

What do the evaluation metrics say?

High accuracy in prediction for majority but poor for minority class.

Patterns appear most heavily in the majority class.
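A small worked example of why high accuracy can mislead here. The numbers (990 legitimate, 10 fraud) and the "always predict legitimate" model are assumptions chosen to make the effect obvious.

```python
# Hypothetical data: 990 legitimate transactions (0) and 10 fraud cases (1).
y_true = [0] * 990 + [1] * 10

# A model that ignores the minority class and always predicts "legitimate".
y_pred = [0] * 1000

def accuracy(y_true, y_pred):
    """Fraction of all predictions that are correct."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive):
    """Fraction of the rows of class `positive` that were predicted as such."""
    relevant = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in relevant) / len(relevant)

print("accuracy:", accuracy(y_true, y_pred))
print("recall, majority class:", recall(y_true, y_pred, positive=0))
print("recall, minority class:", recall(y_true, y_pred, positive=1))
```

Accuracy comes out at 0.99 while recall on the fraud class is 0.0, so per-class metrics are the ones to look at.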

Class-Imbalance Problem:

 Aim: ensure that the prediction algorithm pays equal attention to the patterns presented by all classes, majority and minority.

 Possible solution: change the distribution in the training data.

Data Sampling Methods:

  • Oversampling: increase the number of minority-class samples
    • SMOTE
  • Undersampling
  • Bootstrapping
  • Combinations
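The sampling methods above can be sketched in plain Python. This is an illustration under assumptions, not a reference implementation: the data points are invented, and `smote_like` is a simplified stand-in that interpolates towards the single nearest minority neighbour, whereas SMOTE proper picks one of the k nearest neighbours at random.

```python
import random

def random_oversample(minority, target_size, rng):
    """Oversampling: duplicate minority rows (with replacement) up to target_size."""
    extra = [rng.choice(minority) for _ in range(target_size - len(minority))]
    return minority + extra

def random_undersample(majority, target_size, rng):
    """Undersampling: keep a random subset of majority rows, without replacement."""
    return rng.sample(majority, target_size)

def smote_like(minority, n_new, rng):
    """Simplified SMOTE-style step: new points on the line between a minority
    point and its nearest minority neighbour. Needs at least two minority points."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbour = min(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic

rng = random.Random(42)

# Hypothetical two-feature data: 3 minority points, 30 majority points.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1)]
majority = [(float(i), float(i % 7)) for i in range(5, 35)]

oversampled = random_oversample(minority, len(majority), rng)
undersampled = random_undersample(majority, len(minority), rng)
synthetic = smote_like(minority, n_new=5, rng=rng)

print(len(oversampled), len(undersampled), len(synthetic))
```

A combination would, for example, undersample the majority part of the way and fill the remaining gap with synthetic minority points.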

Cost Sensitive Methods: focus on the cost of making mistakes, i.e. the cost of misclassifications.

  • unequal costing, threshold-based
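A minimal sketch of unequal costing combined with a threshold search. The cost values (a missed fraud costs ten times a false alarm), the scores and the candidate thresholds are all assumptions for illustration.

```python
# Hypothetical costs: a missed fraud (false negative) is 10x worse than a false alarm.
COST_FN = 10.0
COST_FP = 1.0

def total_cost(y_true, scores, threshold):
    """Sum the misclassification costs when predicting 1 for scores >= threshold."""
    cost = 0.0
    for truth, score in zip(y_true, scores):
        pred = 1 if score >= threshold else 0
        if truth == 1 and pred == 0:
            cost += COST_FN
        elif truth == 0 and pred == 1:
            cost += COST_FP
    return cost

def best_threshold(y_true, scores, candidates):
    """Pick the candidate threshold with the lowest total cost."""
    return min(candidates, key=lambda th: total_cost(y_true, scores, th))

# Made-up labels and model scores (1 = fraud).
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.90]
candidates = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]

print("cost at 0.5:", total_cost(y_true, scores, 0.5))
print("best threshold:", best_threshold(y_true, scores, candidates))
```

With these numbers the default 0.5 cut-off misses one fraud case and costs 11.0, while lowering the threshold to 0.4 catches it at a cost of 1.0.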

Algorithmic Methods: SVM, decision trees, clustering, one-class based

Ensemble Methods: bagging, boosting, active learning
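As one illustration of the ensemble idea, here is a sketch of bagging where every bag is balanced before training. The base learner (nearest class centroid on one feature), the synthetic Gaussian data and all function names are assumptions; boosting and active learning are not shown.

```python
import random
from statistics import mean

def train_centroid(sample):
    """Toy base learner: predict the class whose mean feature value is closer."""
    c0 = mean(x for x, y in sample if y == 0)
    c1 = mean(x for x, y in sample if y == 1)
    return lambda x: 1 if abs(x - c1) < abs(x - c0) else 0

def balanced_bagging(data, n_models, rng):
    """Train each model on a bootstrap bag with equal counts of both classes."""
    minority = [row for row in data if row[1] == 1]
    majority = [row for row in data if row[1] == 0]
    models = []
    for _ in range(n_models):
        bag = (
            [rng.choice(minority) for _ in minority]    # bootstrap the minority class
            + [rng.choice(majority) for _ in minority]  # same-sized draw from the majority
        )
        models.append(train_centroid(bag))
    return models

def predict(models, x):
    """Majority vote over the ensemble."""
    votes = sum(model(x) for model in models)
    return 1 if votes * 2 > len(models) else 0

rng = random.Random(0)

# Hypothetical one-feature data: 200 majority rows around 0, 10 minority rows around 4.
data = [(rng.gauss(0.0, 1.0), 0) for _ in range(200)]
data += [(rng.gauss(4.0, 1.0), 1) for _ in range(10)]

models = balanced_bagging(data, n_models=11, rng=rng)
print(predict(models, 5.0), predict(models, -1.0))
```

Because each bag holds as many minority as majority rows, no single model is dominated by the majority class, and the vote smooths out the variance from the small bags.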