Data Archive
Terminal commands
jupyter nbconvert K-Means_VideoGames_Raw.ipynb --to python --no-prompt
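
The command above converts a single notebook to a .py script, and --no-prompt should leave out the "# In[ ]:" cell prompt comments from the generated file. To repeat it for every notebook in a folder, a minimal Python sketch might look like the following. The helper names nbconvert_command and convert_notebooks are hypothetical (they are not part of nbconvert), and the sketch assumes the jupyter executable is on the PATH when dry_run is turned off.

```python
import shlex
import subprocess
from pathlib import Path


def nbconvert_command(notebook: str, no_prompt: bool = True) -> list[str]:
    """Build the argument list for converting one notebook to a .py script.

    Hypothetical helper: it only assembles the same flags used in the note.
    """
    args = ["jupyter", "nbconvert", "--to", "python"]
    if no_prompt:
        # Two ASCII hyphens; a typographic dash is not recognised as a flag.
        args.append("--no-prompt")
    args.append(notebook)
    return args


def convert_notebooks(folder: str, dry_run: bool = True) -> list[list[str]]:
    """Build one command per .ipynb file in `folder`, and run it unless dry_run."""
    notebooks = sorted(Path(folder).glob("*.ipynb"))
    commands = [nbconvert_command(str(path)) for path in notebooks]
    for cmd in commands:
        print(shlex.join(cmd))  # show the exact command line
        if not dry_run:
            # Assumes `jupyter` is installed and available on the PATH.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Dry run: only prints the commands for notebooks in the current folder.
    convert_notebooks(".", dry_run=True)
```

With dry_run=True nothing is executed, so the printed commands can be checked before running them for real.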