Data Archive

      • pages
        • Data Archive
        • DE_Tools
        • ML_Tools
        • Queries
        • Quotes
      • standardised
        • 1-on-1 Template
        • AB testing
        • Accessing Gen AI generated content
        • Accuracy
        • ACID Transaction
        • Activation atlases
        • Activation Function
        • Active Learning
        • Ada boosting
        • Adam Optimizer
        • Adaptive Learning Rates
        • Adding a database to PostgreSQL
        • Addressing Multicollinearity
        • Addressing_Multicollinearity.py
        • Adjusted R squared
        • Agent-based modelling
        • Agentic Solutions
        • Aggregation
        • AI Engineer
        • AI governance
        • Algorithms
        • Alternatives to Batch Processing
        • Amazon S3
        • Anomaly Detection
        • Anomaly Detection in Time Series
        • Anomaly Detection with Clustering
        • Anomaly Detection with Statistical Methods
        • Apache Kafka
        • API
        • API Driven Microservices
        • ARIMA
        • Asking questions
        • Attack mitigation
        • Attack types
        • Attention Is All You Need
        • Attention mechanism
        • AUC
        • Automated Feature Creation
        • AWS Lambda
        • Azure
        • B-tree
        • Backpropagation in Neural Networks
        • Bag of words
        • Bag_of_Words.py
        • Bagging
        • Bandit example output
        • Bandit_Example_Fixed.py
        • Bandit_Example_Nonfixed.py
        • Bash
        • Batch Normalisation
        • Batch Processing
        • Bellman Equations
        • Benefits of Data Transformation
        • Bernoulli
        • BERT
        • BERT Pretraining of Deep Bidirectional Transformers for Language Understanding
        • BERTScore
        • Bias and variance
        • Big Data
        • BigQuery
        • binary classification
        • Binder
        • Boosting
        • Bootstrap
        • Boxplot
        • Business observability
        • Business value of anomaly detection
        • Career Interest
        • Casual Inference
        • CatBoost
        • Central Limit Theorem
        • Chain of thought
        • Change Management
        • Checksum
        • Chi-Squared Test
        • Choosing a Threshold
        • Choosing the Number of Clusters
        • CI-CD
        • Class Separability
        • Classification
        • Classification Report
        • Claude
        • cleaning terminal path
        • Click_Implementation.py
        • Clustering
        • Clustering_Dashboard.py
        • Clustermap
        • Code Diagrams
        • Columnar Storage
        • Command line
        • Command Prompt
        • Common Table Expression
        • Communication principles
        • Communication Techniques
        • Comparing LLM
        • Comparing_Ensembles.py
        • Components of the database
        • Computer Science
        • Concatenate
        • conceptual data model
        • Conceptual Model
        • Concurrency
        • Confidence Interval
        • Confusion Matrix
        • Continuous Delivery - Deployment
        • Continuous Integration
        • Converting categorical variables to a dummy indicators
        • Convolutional Neural Networks
        • Correlation
        • Correlation vs Causation
        • Cosine Similarity
        • Cost Function
        • Cost-Sensitive Analysis
        • Covariance
        • Covariance Structures
        • Covariance vs Correlation
        • Covering Index
        • Cron jobs
        • Cross Entropy
        • Cross validation
        • Cross_Entropy_Single.py
        • Cross_Entropy.py
        • Crosstab
        • CRUD
        • Cryptography
        • Current challenges within the energy sector
        • Cypher
        • Dash
        • Dashboarding
        • Data AI Education at Work
        • Data Analysis
        • Data Analysis Portal
        • Data Analyst
        • Data Architect
        • Data Archive Graph Analysis
        • data asset
        • Data Cleansing
        • Data Collection
        • Data Contract
        • Data Distribution
        • Data Drift
        • Data Engineer
        • Data Engineering
        • Data Engineering Portal
        • Data Engineering Tools
        • Data Ingestion
        • Data Integrity
        • Data Leakage
        • Data Lifecycle Management
        • Data Management
        • Data Mining - CRISP
        • Data Modelling
        • Data Orchestration
        • Data Pipeline
        • Data Pipeline to Data Products
        • Data Principles
        • Data Reduction
        • Data Roles
        • Data Science
        • Data Scientist
        • Data Security
        • Data Selection
        • Data Selection in ML
        • Data Steward
        • Data storage
        • Data Streaming
        • Data Terms
        • Data transformation in Data Engineering
        • Data transformation in Machine Learning
        • Data Transformation with Pandas
        • Data Validation
        • data virtualization
        • Data Visualisation
        • Database
        • Database Index
        • Database Management System (DBMS)
        • Database schema
        • Database Techniques
        • Databricks
        • Databricks vs Snowflake
        • Datasets
        • DBScan
        • dbt
        • Debugging
        • Debugging ipynb
        • Debugging.py
        • Decision Tree
        • Deep Learning Frameworks
        • Deep Learning Overview
        • Deep Q-Learning
        • DeepSeek
        • Deleting rows or filling them with the mean is not always best
        • Demand forecasting
        • Dendrograms
        • dependency manager
        • Design Thinking Questions
        • Determining Threshold Values
        • Difference between Databricks vs. Snowflake
        • Difference between snowflake to hadoop
        • Differentation
        • Digital Transformation
        • Digital twin
        • Dimension Table
        • Dimensional Modelling
        • Dimensionality Reduction
        • dimensions
        • Directed Acyclic Graph (DAG)
        • Directory Structure
        • Distillation
        • Distributed Computing
        • Distribution_Analysis.py
        • Distributions
        • Docker
        • Docker Image
        • Documentation & Meetings
        • Dropout
        • DS & ML Portal
        • duckdb
        • DuckDB in python
        • DuckDB vs SQLite
        • Dummy variable trap
        • EDA
        • EDA_Pandas.py
        • Edge Machine Learning Models
        • Education and Training
        • Elastic Net
        • ELT
        • Embedded Methods
        • embeddings for OOV words
        • emergent behavior
        • Encoding Categorical Variables
        • Energy
        • Energy ABM
        • Energy Storage
        • Environment Variables
        • Epoch
        • Epub
        • ER Diagrams
        • Estimator
        • ETL Pipeline example
        • ETL vs. ELT
        • etlt
        • Evaluating Language Models
        • Evaluation Metrics
        • Event Driven
        • Event Driven Events
        • Event Driven Microservices
        • Event-Driven Architecture
        • Everything
        • Excel & Sheets
        • Explain different gradient descent algorithms, their advantages, and limitations.
        • Explain the curse of dimensionality
        • Exploration
        • Exploration vs. Exploitation
        • F1 Score
        • Fabric
        • fact table
        • Factor Analysis
        • Factor_Analysis.py
        • facts
        • FAISS
        • FastAPI
        • FastAPI_Example.py
        • Feature Engineering
        • Feature Evaluation
        • Feature Extraction
        • Feature Importance
        • Feature Scaling
        • Feature Selection
        • Feature selection and creation
        • Feature Selection vs Feature Importance
        • Feature_Distribution.py
        • Feed Forward Neural Network
        • Feedback Template
        • Filter method
        • filter methods
        • Firebase
        • Fishbone diagram
        • Fitting weights and biases of a neural network
        • Flask
        • Folder Tree Diagram
        • Forecasting_AutoArima.py
        • Forecasting_Baseline.py
        • Forecasting_Exponential_Smoothing.py
        • Foreign Key
        • Forward Propagation in Neural Networks
        • Fuzzywuzzy
        • Gartner Hype Cycle
        • Gaussian Distribution
        • Gaussian Mixture Models
        • Gaussian Model
        • gaussian_mixture_model_implementation.py
        • General Linear Regression
        • Generative Adversarial Networks
        • Generative AI
        • Generative AI From Theory to Practice
        • Get data
        • Gini Impurity
        • Gini Impurity vs Cross Entropy
        • GIS
        • Git
        • Gitlab
        • gitlab-ci.yml
        • Google Cloud Platform
        • Google My Maps Data Extraction
        • Gradient Boosting
        • Gradient Boosting Regressor
        • Gradient Descent
        • Gradio
        • Grain
        • Grammar method
        • Graph Analysis Plugin
        • Graph Neural Network
        • Graph Theory
        • Graph Theory Community
        • GraphRAG
        • Grep
        • GridSeachCv
        • Groupby
        • Groupby vs Crosstab
        • Grouped plots
        • GRU
        • GSheets
        • Guardrails
        • Hadoop
        • Handling Different Distributions
        • Handling Missing Data
        • Handling_Missing_Data_Basic.ipynb
        • Handling_Missing_Data.ipynb
        • Handwritten Digit Classification
        • Hash
        • Heatmap
        • Heatmaps_Dendrograms.py
        • heterogeneous features
        • Hierarchical Clustering
        • High cross validation accuracy is not directly proportional to performance on unseen test data
        • Honkit
        • Hosting
        • How businesses use Gen AI
        • How do we evaluate of LLM Outputs
        • how do you do the data selection
        • How is reinforcement learning being combined with deep learning
        • How is schema evolution done in practice with SQL
        • How LLMs store facts
        • How to do git commit messages properly
        • How to model to improve demand forecasting
        • How to normalise a merged table
        • How to reduce the need for Gen AI responses
        • How to search within a graph
        • How to use Sklearn Pipeline
        • How would you decide between using TF-IDF and Word2Vec for text vectorization
        • Hugging Face
        • Hyperparameter
        • Hyperparameter Tuning
        • Hypothesis testing
        • Imbalanced Datasets
        • Imbalanced_Datasets_SMOTE.py
        • Immutable vs mutable
        • Impact of multicollinearity on model parameters
        • Implementing Database Schema
        • In NER how would you handle ambiguous entities
        • incremental synchronization
        • Industries of interest
        • inference
        • inference versus prediction
        • information theory
        • Input is Not Properly Sanitized
        • Interoperability
        • interoperable
        • Interpretability
        • Interpreting logistic regression model parameters
        • Interquartile Range (IQR) Detection
        • interview notepad
        • ipynb
        • Isolation Forest and Its Use in Anomaly Detection
        • Java vs JavaScript
        • JavaScript
        • Jobs to be done
        • Johnson–Lindenstrauss lemma
        • Joining Datasets
        • Json
        • Json to Yaml
        • Junction Tables
        • Justfile
        • K_Means.py
        • K-means
        • K-nearest neighbours
        • Kaggle Abalone regression example
        • Kernel Density Estimation
        • Kernelling
        • Key Differences of Web Feature Server (WFS) and Web Feature Server (WFS)
        • Kmeans vs GMM
        • Knowledge Graph
        • Knowledge graph vs RAG setup
        • Knowledge Graphs with Obsidian
        • Knowledge Work
        • Label encoding
        • Labelling data
        • Langchain
        • Language Model Output Optimisation
        • Language Models
        • Language Models Large (LLMs) vs Small (SLMs)
        • Lasso
        • Latency
        • Latent Dirichlet Allocation
        • LBFGS
        • learning rate
        • Learning Styles
        • lemmatization
        • LightGBM
        • LightGBM vs XGBoost vs CatBoost
        • Linear Discriminant Analysis
        • Linear Regression
        • Linked List
        • LLM
        • LLM Evaluation Metrics
        • Load Balancing
        • Local Interpretable Model-agnostic Explanations
        • Logical Model
        • Logistic Regression
        • Logistic Regression does not predict probabilities
        • Logistic regression in sklearn & Gradient Descent
        • Logistic Regression Statsmodel Summary table
        • Looker Studio
        • loss function
        • Loss versus Cost function
        • LSTM
        • Machine Learning Algorithms
        • Machine Learning Operations
        • maintainability
        • Maintainable Code
        • Makefile
        • Manifold learning
        • Many-to-Many Relationships
        • Markov chain
        • Markov Decision Processes
        • Master Observability Datadog
        • Mathematical Reasoning in Transformers
        • Mathematics
        • Maximum Likelihood Estimation
        • mean absolute error
        • Mean Squared Error
        • mean vs median
        • melt
        • Memory
        • Memory Caching
        • Merge
        • Mermaid
        • Metadata Handling
        • Methods for Handling Outliers
        • Microsoft Access
        • Mini-batch gradient descent
        • Mixture of Experts
        • ML Engineer
        • MNIST
        • Model Building
        • Model Cascading
        • Model Deployment
        • Model Ensemble
        • Model Evaluation
        • Model Evaluation vs Model Optimisation
        • Model Interpretability
        • Model Observability
        • Model Optimisation
        • Model Parameters
        • Model Parameters Tuning
        • Model parameters vs hyperparameters
        • Model preparation
        • Model Selection
        • Model Validation
        • Momentum
        • Momentum.py
        • MongoDB
        • Monolith Architecture
        • Monte Carlo Simulation
        • Multi-Agent Reinforcement Learning
        • Multi-head attention
        • Multi-level index
        • Multicollinearity
        • Multinomial Naive bayes
        • MySql
        • Naive Bayes
        • Natural Language Processing
        • nbconvert
        • neo4j
        • neomodel
        • Network Design
        • Neural network
        • Neural Network Classification
        • Neural network in Practice
        • Neural Scaling Laws
        • Ngrams
        • nltk
        • Node.JS
        • Non-parametric tests
        • Normalisation
        • Normalisation of data
        • Normalisation of Text
        • Normalisation vs Standardisation
        • NoSQL
        • NotebookLM
        • npy Files A NumPy Array storage
        • Object Relational Mapper
        • OLAP
        • OLTP
        • oltp (online transactional processing)
        • One Pager Template
        • One_hot_encoding.py
        • One-hot encoding
        • Optimisation function
        • Optimisation techniques
        • Optimising a Logistic Regression Model
        • Optimising Neural Networks
        • Optuna
        • Ordinary Least Squares
        • Orthogonalization
        • Outliers
        • Over parameterised models
        • Overfitting in Machine Learning
        • p values
        • p-values in linear regression in sklearn
        • Page Rank
        • Pandas
        • Pandas Dataframe Agent
        • Pandas join vs merge
        • Pandas Pivot Table
        • Pandas Stack
        • Pandas_Common.py
        • Pandas_Stack.py
        • Parametric tests
        • parametric vs non-parametric models
        • parametric vs non-parametric tests
        • Parquet
        • parsimonious
        • Part of speech tagging
        • PCA Explained Variance Ratio
        • PCA Principal Components
        • PCA_Analysis.ipynb
        • PCA_Based_Anomaly_Detection.py
        • PCA-Based Anomaly Detection
        • pd.Grouper
        • PDF++
        • pdoc
        • PDP and ICE
        • Percentile Detection
        • Performance Dimensions
        • Performance Drift in Machine Learning
        • Physical Model
        • Pickle
        • Plotly
        • pmdarima
        • Poetry
        • Positional Encoding
        • PostgreSQL
        • PowerBI
        • Powerquery
        • PowerShell
        • Powershell versus cmd
        • Powershell vs Bash
        • Precision
        • Precision or Recall
        • Precision-Recall Curve
        • Prediction Intervals
        • Preprocessing
        • Prevention Is Better Than the Cure
        • Primary Key
        • Principal Component Analysis
        • Probability in other fields
        • Problem Definition
        • programming languages
        • Prompt engineering
        • Prompt Extracting information from blog posts
        • Prompting
        • Proportion Test
        • Publish and Subscribe
        • Pull Request Template
        • PyCaret
        • Pycaret_Anomaly.ipynb
        • Pycaret_Example.py
        • Pydantic
        • Pydantic_More.py
        • Pydantic.py
        • PyGraphviz
        • PyOD
        • Pyright
        • Pyright vs Pydantic
        • PySpark
        • Pytest
        • Python
        • Python Click
        • PyTorch
        • Pytorch vs Tensorflow
        • Q-Learning
        • Q-Q Plot
        • Quartz
        • QUERY GSheets
        • Query Optimisation
        • Query Plan
        • Querying
        • QuickSort
        • R
        • R squared
        • R-squared metric not always a good indicator of model performance in regression
        • Race Conditions
        • RAG
        • Random Forest Regression
        • Random Forests
        • React
        • Reasoning tokens
        • Recall
        • Recommender systems
        • Recurrent Neural Networks
        • Recursive Algorithm
        • Regression Analysis and its Applications
        • Regression metrics
        • Regression_Logistic_Metrics.ipynb
        • Regularisation of Tree based models
        • Regularisation.py
        • Regularization in Machine Learning
        • Reinforcement learning
        • Relating Tables Together
        • Relational Database
        • Relationships in memory
        • requirements.txt
        • REST API
        • Reward Function
        • Ridge
        • ROC (Receiver Operating Characteristic)
        • ROC_Curve.py
        • rollup
        • Row-based Storage
        • Sarsa
        • Scala
        • Scalability
        • Scaling Agentic Systems
        • Scaling Server
        • Scheduled Tasks
        • Scientific Method
        • Seaborn
        • Search
        • Security mitigation
        • Security Researcher
        • Security Vulnerabilities
        • semantic layer
        • Semantic Relationships
        • Sentence Similarity
        • shapefile
        • SHapley Additive exPlanations
        • Sharepoint
        • Silhouette Analysis
        • Similarity Search
        • Single source of truth
        • Sklearn
        • sklearn datasets
        • Sklearn Pipiline
        • Small Language Models
        • Smart Grids
        • SMOTE (Synthetic Minority Over-sampling Technique)
        • SMSS
        • Snowflake
        • Snowflake Schema
        • Software Design Patterns
        • Software Development Life Cycle
        • Software Development Portal
        • spaCy
        • SparseCategorialCrossentropy or CategoricalCrossEntropy
        • Specificity
        • Spreadsheets vs Databases
        • SQL Groupby
        • SQL Injection
        • SQL Joins
        • SQL vs NoSQL
        • SQL Window functions
        • SQLAlchemy
        • SQLAlchemy vs. sqlite3
        • SQLite
        • SQLite Studio
        • Stacking
        • Standard deviation
        • Standardisation
        • Star Schema
        • Statistical Assumptions
        • Statistical Tests
        • Statistics
        • Stemming
        • Stochastic Gradient Descent
        • Stored Procedures
        • Strongly vs Weakly typed language
        • Structuring and organizing data
        • Summarisation
        • Supervised Learning
        • Support Vector Classifier (SVC)
        • Support Vector Machines
        • Support Vector Regression
        • SVM_Example.py
        • Symbolic computation
        • Sympy
        • syntactic relationships
        • t-SNE
        • T-test
        • Tableau
        • Tags
        • Technical Analysis of Named Entity Recognition
        • Technical Debt
        • Technical Design Doc Template
        • Telecommunications
        • Tensorflow
        • Terminal commands
        • Test Loss When Evaluating Models
        • Testing
        • Testing_Pytest.py
        • Testing_unittest.py
        • Text2Cypher
        • TF-IDF
        • The Data Hierarchy of Needs
        • Thinking Systems
        • Time Series
        • Time Series Forecasting
        • Time Series Identify Trends and Patterns
        • Tokenisation
        • TOML
        • tool.bandit
        • tool.ruff
        • tool.uv
        • topic modeling
        • Train-Dev-Test Sets
        • Transaction
        • Transfer Learning
        • transfer_learning.py
        • Transformed Target Regressor
        • Transformer
        • Transformers vs RNNs
        • TS_Anomaly_Detection
        • TS_Anomaly_Detection.py
        • Turning a flat file into a database
        • Types of Computational Bugs
        • Types of Database Schema
        • Types of Neural Networks
        • TypeScript
        • Typical Output Formats in Neural Networks
        • Ubuntu
        • UML
        • unittest
        • univariate vs multivariate
        • unstructured data
        • Unsupervised learning
        • Untitled
        • Untitled 1
        • Untitled 2
        • Untitled 3
        • Use Cases for a Simple Neural Network Like
        • Use of RNNs in energy sector
        • Utilities
        • Vacuum
        • vanishing and exploding gradients problem
        • variance
        • Vector Database
        • Vector Embedding
        • Vector_Embedding.py
        • Vectorisation
        • Vectorized Engine
        • Vercel
        • View Use Case
        • Views
        • Violin plot
        • Virtual environments
        • WCSS and elbow method
        • Weak Learners
        • Web Feature Server (WFS)
        • Web Map Tile Service (WMTS)
        • Webpages relevant
        • What algorithms or models are used within the energy sector
        • What algorithms or models are used within the telecommunication sector
        • What are Data Processing Techniques (row-based, columnar, vectorized)?
        • What are the best practices for evaluating the effectiveness of different prompts
        • What are the top Cloud Providers?
        • What can ABM solve within the energy sector
        • What is a Data Lake?
        • What is a Data Lakehouse?
        • What is a Data Product?
        • What is a Data Warehouse?
        • What is a Jinja Template?
        • What is a Lambda Architecture?
        • What is a Metric?
        • What is a policy in RL
        • What is a Push-Down?
        • What is a Soft Delete?
        • What is a Storage Layer / Object Store?
        • What is an In-Memory Format?
        • What is Apache Airflow?
        • What is Apache Spark?
        • What is Business Intelligence
        • What is Dagster?
        • What is Data Governance?
        • What is Data Integration?
        • What is Data Lineage?
        • What is Data Literacy?
        • What is Data Observability?
        • What is Data Quality?
        • What is data transformation?
        • What is declarative?
        • What is DevOps?
        • What is ETL?
        • What is Functional Programming?
        • What is Granularity
        • What is imperative?
        • What is Kubernetes?
        • What is Machine Learning?
        • What is MapReduce?
        • What is Master Data Management (MDM)?
        • What is Normalization?
        • What is OLAP (Online Analytical Processing)?
        • What is Reverse ETL?
        • What is Schema Evolution?
        • What is semi-structured data?
        • What is Slowly Changing Dimension?
        • What is SQL?
        • What is structured data?
        • What is the Big-O Notation?
        • What is the difference between odds and probability
        • What is the role of gradient-based optimization in training deep learning models.
        • What is YAML?
        • When and why not to us regularisation
        • Why and when is feature scaling necessary
        • Why does increasing the number of models in a ensemble not necessarily improve the accuracy
        • Why does label encoding give different predictions from one-hot encoding
        • Why does the Adam Optimizer converge
        • Why is named entity recognition (NER) a challenging task
        • Why is the Central Limit Theorem important when working with small sample sizes
        • Why JSON is Better than Pickle for Untrusted Data
        • Why Removing Outliers May Improve Regression but Harm Classification
        • Why Type 1 and Type 2 matter
        • Why use ER diagrams
        • Wikipedia_API.py
        • Windows Subsystem for Linux
        • Word2vec
        • Word2Vec.py
        • WordNet
        • Wrapper Methods
        • XGBoost
        • Z-Normalisation
        • Z-Score
        • Z-Scores vs Prediction Intervals
        • Z-Test

    spaCy


    Backlinks

    • Vector Embedding
    • embeddings for OOV words
        • pages
          • Data Archive
          • DE_Tools
          • ML_Tools
          • Queries
          • Quotes
        • standardised
          • 1-on-1 Template
          • AB testing
          • Accessing Gen AI generated content
          • Accuracy
          • ACID Transaction
          • Activation atlases
          • Activation Function
          • Active Learning
          • Ada boosting
          • Adam Optimizer
          • Adaptive Learning Rates
          • Adding a database to PostgreSQL
          • Addressing Multicollinearity
          • Addressing_Multicollinearity.py
          • Adjusted R squared
          • Agent-based modelling
          • Agentic Solutions
          • Aggregation
          • AI Engineer
          • AI governance
          • Algorithms
          • Alternatives to Batch Processing
          • Amazon S3
          • Anomaly Detection
          • Anomaly Detection in Time Series
          • Anomaly Detection with Clustering
          • Anomaly Detection with Statistical Methods
          • Apache Kafka
          • API
          • API Driven Microservices
          • ARIMA
          • Asking questions
          • Attack mitigation
          • Attack types
          • Attention Is All You Need
          • Attention mechanism
          • AUC
          • Automated Feature Creation
          • AWS Lambda
          • Azure
          • B-tree
          • Backpropagation in Neural Networks
          • Bag of words
          • Bag_of_Words.py
          • Bagging
          • Bandit example output
          • Bandit_Example_Fixed.py
          • Bandit_Example_Nonfixed.py
          • Bash
          • Batch Normalisation
          • Batch Processing
          • Bellman Equations
          • Benefits of Data Transformation
          • Bernoulli
          • BERT
          • BERT Pretraining of Deep Bidirectional Transformers for Language Understanding
          • BERTScore
          • Bias and variance
          • Big Data
          • BigQuery
          • binary classification
          • Binder
          • Boosting
          • Bootstrap
          • Boxplot
          • Business observability
          • Business value of anomaly detection
          • Career Interest
          • Casual Inference
          • CatBoost
          • Central Limit Theorem
          • Chain of thought
          • Change Management
          • Checksum
          • Chi-Squared Test
          • Choosing a Threshold
          • Choosing the Number of Clusters
          • CI-CD
          • Class Separability
          • Classification
          • Classification Report
          • Claude
          • cleaning terminal path
          • Click_Implementation.py
          • Clustering
          • Clustering_Dashboard.py
          • Clustermap
          • Code Diagrams
          • Columnar Storage
          • Command line
          • Command Prompt
          • Common Table Expression
          • Communication principles
          • Communication Techniques
          • Comparing LLM
          • Comparing_Ensembles.py
          • Components of the database
          • Computer Science
          • Concatenate
          • conceptual data model
          • Conceptual Model
          • Concurrency
          • Confidence Interval
          • Confusion Matrix
          • Continuous Delivery - Deployment
          • Continuous Integration
          • Converting categorical variables to a dummy indicators
          • Convolutional Neural Networks
          • Correlation
          • Correlation vs Causation
          • Cosine Similarity
          • Cost Function
          • Cost-Sensitive Analysis
          • Covariance
          • Covariance Structures
          • Covariance vs Correlation
          • Covering Index
          • Cron jobs
          • Cross Entropy
          • Cross validation
          • Cross_Entropy_Single.py
          • Cross_Entropy.py
          • Crosstab
          • CRUD
          • Cryptography
          • Current challenges within the energy sector
          • Cypher
          • Dash
          • Dashboarding
          • Data AI Education at Work
          • Data Analysis
          • Data Analysis Portal
          • Data Analyst
          • Data Architect
          • Data Archive Graph Analysis
          • data asset
          • Data Cleansing
          • Data Collection
          • Data Contract
          • Data Distribution
          • Data Drift
          • Data Engineer
          • Data Engineering
          • Data Engineering Portal
          • Data Engineering Tools
          • Data Ingestion
          • Data Integrity
          • Data Leakage
          • Data Lifecycle Management
          • Data Management
          • Data Mining - CRISP
          • Data Modelling
          • Data Orchestration
          • Data Pipeline
          • Data Pipeline to Data Products
          • Data Principles
          • Data Reduction
          • Data Roles
          • Data Science
          • Data Scientist
          • Data Security
          • Data Selection
          • Data Selection in ML
          • Data Steward
          • Data storage
          • Data Streaming
          • Data Terms
          • Data transformation in Data Engineering
          • Data transformation in Machine Learning
          • Data Transformation with Pandas
          • Data Validation
          • data virtualization
          • Data Visualisation
          • Database
          • Database Index
          • Database Management System (DBMS)
          • Database schema
          • Database Techniques
          • Databricks
          • Databricks vs Snowflake
          • Datasets
          • DBScan
          • dbt
          • Debugging
          • Debugging ipynb
          • Debugging.py
          • Decision Tree
          • Deep Learning Frameworks
          • Deep Learning Overview
          • Deep Q-Learning
          • DeepSeek
          • Deleting rows or filling them with the mean is not always best
          • Demand forecasting
          • Dendrograms
          • dependency manager
          • Design Thinking Questions
          • Determining Threshold Values
          • Difference between Databricks vs. Snowflake
          • Difference between Snowflake and Hadoop
          • Differentiation
          • Digital Transformation
          • Digital twin
          • Dimension Table
          • Dimensional Modelling
          • Dimensionality Reduction
          • dimensions
          • Directed Acyclic Graph (DAG)
          • Directory Structure
          • Distillation
          • Distributed Computing
          • Distribution_Analysis.py
          • Distributions
          • Docker
          • Docker Image
          • Documentation & Meetings
          • Dropout
          • DS & ML Portal
          • duckdb
          • DuckDB in python
          • DuckDB vs SQLite
          • Dummy variable trap
          • EDA
          • EDA_Pandas.py
          • Edge Machine Learning Models
          • Education and Training
          • Elastic Net
          • ELT
          • Embedded Methods
          • embeddings for OOV words
          • emergent behavior
          • Encoding Categorical Variables
          • Energy
          • Energy ABM
          • Energy Storage
          • Environment Variables
          • Epoch
          • Epub
          • ER Diagrams
          • Estimator
          • ETL Pipeline example
          • ETL vs. ELT
          • etlt
          • Evaluating Language Models
          • Evaluation Metrics
          • Event Driven
          • Event Driven Events
          • Event Driven Microservices
          • Event-Driven Architecture
          • Everything
          • Excel & Sheets
          • Explain different gradient descent algorithms, their advantages, and limitations.
          • Explain the curse of dimensionality
          • Exploration
          • Exploration vs. Exploitation
          • F1 Score
          • Fabric
          • fact table
          • Factor Analysis
          • Factor_Analysis.py
          • facts
          • FAISS
          • FastAPI
          • FastAPI_Example.py
          • Feature Engineering
          • Feature Evaluation
          • Feature Extraction
          • Feature Importance
          • Feature Scaling
          • Feature Selection
          • Feature selection and creation
          • Feature Selection vs Feature Importance
          • Feature_Distribution.py
          • Feed Forward Neural Network
          • Feedback Template
          • Filter method
          • filter methods
          • Firebase
          • Fishbone diagram
          • Fitting weights and biases of a neural network
          • Flask
          • Folder Tree Diagram
          • Forecasting_AutoArima.py
          • Forecasting_Baseline.py
          • Forecasting_Exponential_Smoothing.py
          • Foreign Key
          • Forward Propagation in Neural Networks
          • Fuzzywuzzy
          • Gartner Hype Cycle
          • Gaussian Distribution
          • Gaussian Mixture Models
          • Gaussian Model
          • gaussian_mixture_model_implementation.py
          • General Linear Regression
          • Generative Adversarial Networks
          • Generative AI
          • Generative AI From Theory to Practice
          • Get data
          • Gini Impurity
          • Gini Impurity vs Cross Entropy
          • GIS
          • Git
          • Gitlab
          • gitlab-ci.yml
          • Google Cloud Platform
          • Google My Maps Data Extraction
          • Gradient Boosting
          • Gradient Boosting Regressor
          • Gradient Descent
          • Gradio
          • Grain
          • Grammar method
          • Graph Analysis Plugin
          • Graph Neural Network
          • Graph Theory
          • Graph Theory Community
          • GraphRAG
          • Grep
          • GridSearchCV
          • Groupby
          • Groupby vs Crosstab
          • Grouped plots
          • GRU
          • GSheets
          • Guardrails
          • Hadoop
          • Handling Different Distributions
          • Handling Missing Data
          • Handling_Missing_Data_Basic.ipynb
          • Handling_Missing_Data.ipynb
          • Handwritten Digit Classification
          • Hash
          • Heatmap
          • Heatmaps_Dendrograms.py
          • heterogeneous features
          • Hierarchical Clustering
          • High cross validation accuracy is not directly proportional to performance on unseen test data
          • Honkit
          • Hosting
          • How businesses use Gen AI
          • How do we evaluate LLM Outputs
          • how do you do the data selection
          • How is reinforcement learning being combined with deep learning
          • How is schema evolution done in practice with SQL
          • How LLMs store facts
          • How to do git commit messages properly
          • How to model to improve demand forecasting
          • How to normalise a merged table
          • How to reduce the need for Gen AI responses
          • How to search within a graph
          • How to use Sklearn Pipeline
          • How would you decide between using TF-IDF and Word2Vec for text vectorization
          • Hugging Face
          • Hyperparameter
          • Hyperparameter Tuning
          • Hypothesis testing
          • Imbalanced Datasets
          • Imbalanced_Datasets_SMOTE.py
          • Immutable vs mutable
          • Impact of multicollinearity on model parameters
          • Implementing Database Schema
          • In NER how would you handle ambiguous entities
          • incremental synchronization
          • Industries of interest
          • inference
          • inference versus prediction
          • information theory
          • Input is Not Properly Sanitized
          • Interoperability
          • interoperable
          • Interpretability
          • Interpreting logistic regression model parameters
          • Interquartile Range (IQR) Detection
          • interview notepad
          • ipynb
          • Isolation Forest and Its Use in Anomaly Detection
          • Java vs JavaScript
          • JavaScript
          • Jobs to be done
          • Johnson–Lindenstrauss lemma
          • Joining Datasets
          • Json
          • Json to Yaml
          • Junction Tables
          • Justfile
          • K_Means.py
          • K-means
          • K-nearest neighbours
          • Kaggle Abalone regression example
          • Kernel Density Estimation
          • Kernelling
          • Key Differences of Web Feature Server (WFS) and Web Map Tile Service (WMTS)
          • Kmeans vs GMM
          • Knowledge Graph
          • Knowledge graph vs RAG setup
          • Knowledge Graphs with Obsidian
          • Knowledge Work
          • Label encoding
          • Labelling data
          • Langchain
          • Language Model Output Optimisation
          • Language Models
          • Language Models Large (LLMs) vs Small (SLMs)
          • Lasso
          • Latency
          • Latent Dirichlet Allocation
          • LBFGS
          • learning rate
          • Learning Styles
          • lemmatization
          • LightGBM
          • LightGBM vs XGBoost vs CatBoost
          • Linear Discriminant Analysis
          • Linear Regression
          • Linked List
          • LLM
          • LLM Evaluation Metrics
          • Load Balancing
          • Local Interpretable Model-agnostic Explanations
          • Logical Model
          • Logistic Regression
          • Logistic Regression does not predict probabilities
          • Logistic regression in sklearn & Gradient Descent
          • Logistic Regression Statsmodel Summary table
          • Looker Studio
          • loss function
          • Loss versus Cost function
          • LSTM
          • Machine Learning Algorithms
          • Machine Learning Operations
          • maintainability
          • Maintainable Code
          • Makefile
          • Manifold learning
          • Many-to-Many Relationships
          • Markov chain
          • Markov Decision Processes
          • Master Observability Datadog
          • Mathematical Reasoning in Transformers
          • Mathematics
          • Maximum Likelihood Estimation
          • mean absolute error
          • Mean Squared Error
          • mean vs median
          • melt
          • Memory
          • Memory Caching
          • Merge
          • Mermaid
          • Metadata Handling
          • Methods for Handling Outliers
          • Microsoft Access
          • Mini-batch gradient descent
          • Mixture of Experts
          • ML Engineer
          • MNIST
          • Model Building
          • Model Cascading
          • Model Deployment
          • Model Ensemble
          • Model Evaluation
          • Model Evaluation vs Model Optimisation
          • Model Interpretability
          • Model Observability
          • Model Optimisation
          • Model Parameters
          • Model Parameters Tuning
          • Model parameters vs hyperparameters
          • Model preparation
          • Model Selection
          • Model Validation
          • Momentum
          • Momentum.py
          • MongoDB
          • Monolith Architecture
          • Monte Carlo Simulation
          • Multi-Agent Reinforcement Learning
          • Multi-head attention
          • Multi-level index
          • Multicollinearity
          • Multinomial Naive bayes
          • MySql
          • Naive Bayes
          • Natural Language Processing
          • nbconvert
          • neo4j
          • neomodel
          • Network Design
          • Neural network
          • Neural Network Classification
          • Neural network in Practice
          • Neural Scaling Laws
          • Ngrams
          • nltk
          • Node.JS
          • Non-parametric tests
          • Normalisation
          • Normalisation of data
          • Normalisation of Text
          • Normalisation vs Standardisation
          • NoSQL
          • NotebookLM
          • npy Files A NumPy Array storage
          • Object Relational Mapper
          • OLAP
          • OLTP
          • oltp (online transactional processing)
          • One Pager Template
          • One_hot_encoding.py
          • One-hot encoding
          • Optimisation function
          • Optimisation techniques
          • Optimising a Logistic Regression Model
          • Optimising Neural Networks
          • Optuna
          • Ordinary Least Squares
          • Orthogonalization
          • Outliers
          • Over parameterised models
          • Overfitting in Machine Learning
          • p values
          • p-values in linear regression in sklearn
          • Page Rank
          • Pandas
          • Pandas Dataframe Agent
          • Pandas join vs merge
          • Pandas Pivot Table
          • Pandas Stack
          • Pandas_Common.py
          • Pandas_Stack.py
          • Parametric tests
          • parametric vs non-parametric models
          • parametric vs non-parametric tests
          • Parquet
          • parsimonious
          • Part of speech tagging
          • PCA Explained Variance Ratio
          • PCA Principal Components
          • PCA_Analysis.ipynb
          • PCA_Based_Anomaly_Detection.py
          • PCA-Based Anomaly Detection
          • pd.Grouper
          • PDF++
          • pdoc
          • PDP and ICE
          • Percentile Detection
          • Performance Dimensions
          • Performance Drift in Machine Learning
          • Physical Model
          • Pickle
          • Plotly
          • pmdarima
          • Poetry
          • Positional Encoding
          • PostgreSQL
          • PowerBI
          • Powerquery
          • PowerShell
          • Powershell versus cmd
          • Powershell vs Bash
          • Precision
          • Precision or Recall
          • Precision-Recall Curve
          • Prediction Intervals
          • Preprocessing
          • Prevention Is Better Than the Cure
          • Primary Key
          • Principal Component Analysis
          • Probability in other fields
          • Problem Definition
          • programming languages
          • Prompt engineering
          • Prompt Extracting information from blog posts
          • Prompting
          • Proportion Test
          • Publish and Subscribe
          • Pull Request Template
          • PyCaret
          • Pycaret_Anomaly.ipynb
          • Pycaret_Example.py
          • Pydantic
          • Pydantic_More.py
          • Pydantic.py
          • PyGraphviz
          • PyOD
          • Pyright
          • Pyright vs Pydantic
          • PySpark
          • Pytest
          • Python
          • Python Click
          • PyTorch
          • Pytorch vs Tensorflow
          • Q-Learning
          • Q-Q Plot
          • Quartz
          • QUERY GSheets
          • Query Optimisation
          • Query Plan
          • Querying
          • QuickSort
          • R
          • R squared
          • R-squared metric not always a good indicator of model performance in regression
          • Race Conditions
          • RAG
          • Random Forest Regression
          • Random Forests
          • React
          • Reasoning tokens
          • Recall
          • Recommender systems
          • Recurrent Neural Networks
          • Recursive Algorithm
          • Regression Analysis and its Applications
          • Regression metrics
          • Regression_Logistic_Metrics.ipynb
          • Regularisation of Tree based models
          • Regularisation.py
          • Regularization in Machine Learning
          • Reinforcement learning
          • Relating Tables Together
          • Relational Database
          • Relationships in memory
          • requirements.txt
          • REST API
          • Reward Function
          • Ridge
          • ROC (Receiver Operating Characteristic)
          • ROC_Curve.py
          • rollup
          • Row-based Storage
          • Sarsa
          • Scala
          • Scalability
          • Scaling Agentic Systems
          • Scaling Server
          • Scheduled Tasks
          • Scientific Method
          • Seaborn
          • Search
          • Security mitigation
          • Security Researcher
          • Security Vulnerabilities
          • semantic layer
          • Semantic Relationships
          • Sentence Similarity
          • shapefile
          • SHapley Additive exPlanations
          • Sharepoint
          • Silhouette Analysis
          • Similarity Search
          • Single source of truth
          • Sklearn
          • sklearn datasets
          • Sklearn Pipeline
          • Small Language Models
          • Smart Grids
          • SMOTE (Synthetic Minority Over-sampling Technique)
          • SMSS
          • Snowflake
          • Snowflake Schema
          • Software Design Patterns
          • Software Development Life Cycle
          • Software Development Portal
          • spaCy
          • SparseCategoricalCrossentropy or CategoricalCrossentropy
          • Specificity
          • Spreadsheets vs Databases
          • SQL Groupby
          • SQL Injection
          • SQL Joins
          • SQL vs NoSQL
          • SQL Window functions
          • SQLAlchemy
          • SQLAlchemy vs. sqlite3
          • SQLite
          • SQLite Studio
          • Stacking
          • Standard deviation
          • Standardisation
          • Star Schema
          • Statistical Assumptions
          • Statistical Tests
          • Statistics
          • Stemming
          • Stochastic Gradient Descent
          • Stored Procedures
          • Strongly vs Weakly typed language
          • Structuring and organizing data
          • Summarisation
          • Supervised Learning
          • Support Vector Classifier (SVC)
          • Support Vector Machines
          • Support Vector Regression
          • SVM_Example.py
          • Symbolic computation
          • Sympy
          • syntactic relationships
          • t-SNE
          • T-test
          • Tableau
          • Tags
          • Technical Analysis of Named Entity Recognition
          • Technical Debt
          • Technical Design Doc Template
          • Telecommunications
          • Tensorflow
          • Terminal commands
          • Test Loss When Evaluating Models
          • Testing
          • Testing_Pytest.py
          • Testing_unittest.py
          • Text2Cypher
          • TF-IDF
          • The Data Hierarchy of Needs
          • Thinking Systems
          • Time Series
          • Time Series Forecasting
          • Time Series Identify Trends and Patterns
          • Tokenisation
          • TOML
          • tool.bandit
          • tool.ruff
          • tool.uv
          • topic modeling
          • Train-Dev-Test Sets
          • Transaction
          • Transfer Learning
          • transfer_learning.py
          • Transformed Target Regressor
          • Transformer
          • Transformers vs RNNs
          • TS_Anomaly_Detection
          • TS_Anomaly_Detection.py
          • Turning a flat file into a database
          • Types of Computational Bugs
          • Types of Database Schema
          • Types of Neural Networks
          • TypeScript
          • Typical Output Formats in Neural Networks
          • Ubuntu
          • UML
          • unittest
          • univariate vs multivariate
          • unstructured data
          • Unsupervised learning
          • Use Cases for a Simple Neural Network Like
          • Use of RNNs in energy sector
          • Utilities
          • Vacuum
          • vanishing and exploding gradients problem
          • variance
          • Vector Database
          • Vector Embedding
          • Vector_Embedding.py
          • Vectorisation
          • Vectorized Engine
          • Vercel
          • View Use Case
          • Views
          • Violin plot
          • Virtual environments
          • WCSS and elbow method
          • Weak Learners
          • Web Feature Server (WFS)
          • Web Map Tile Service (WMTS)
          • Webpages relevant
          • What algorithms or models are used within the energy sector
          • What algorithms or models are used within the telecommunication sector
          • What are Data Processing Techniques (row-based, columnar, vectorized)?
          • What are the best practices for evaluating the effectiveness of different prompts
          • What are the top Cloud Providers?
          • What can ABM solve within the energy sector
          • What is a Data Lake?
          • What is a Data Lakehouse?
          • What is a Data Product?
          • What is a Data Warehouse?
          • What is a Jinja Template?
          • What is a Lambda Architecture?
          • What is a Metric?
          • What is a policy in RL
          • What is a Push-Down?
          • What is a Soft Delete?
          • What is a Storage Layer / Object Store?
          • What is an In-Memory Format?
          • What is Apache Airflow?
          • What is Apache Spark?
          • What is Business Intelligence
          • What is Dagster?
          • What is Data Governance?
          • What is Data Integration?
          • What is Data Lineage?
          • What is Data Literacy?
          • What is Data Observability?
          • What is Data Quality?
          • What is data transformation?
          • What is declarative?
          • What is DevOps?
          • What is ETL?
          • What is Functional Programming?
          • What is Granularity
          • What is imperative?
          • What is Kubernetes?
          • What is Machine Learning?
          • What is MapReduce?
          • What is Master Data Management (MDM)?
          • What is Normalization?
          • What is OLAP (Online Analytical Processing)?
          • What is Reverse ETL?
          • What is Schema Evolution?
          • What is semi-structured data?
          • What is Slowly Changing Dimension?
          • What is SQL?
          • What is structured data?
          • What is the Big-O Notation?
          • What is the difference between odds and probability
          • What is the role of gradient-based optimization in training deep learning models.
          • What is YAML?
          • When and why not to use regularisation
          • Why and when is feature scaling necessary
          • Why does increasing the number of models in an ensemble not necessarily improve the accuracy
          • Why does label encoding give different predictions from one-hot encoding
          • Why does the Adam Optimizer converge
          • Why is named entity recognition (NER) a challenging task
          • Why is the Central Limit Theorem important when working with small sample sizes
          • Why JSON is Better than Pickle for Untrusted Data
          • Why Removing Outliers May Improve Regression but Harm Classification
          • Why Type 1 and Type 2 matter
          • Why use ER diagrams
          • Wikipedia_API.py
          • Windows Subsystem for Linux
          • Word2vec
          • Word2Vec.py
          • WordNet
          • Wrapper Methods
          • XGBoost
          • Z-Normalisation
          • Z-Score
          • Z-Scores vs Prediction Intervals
          • Z-Test

      Created with Quartz v4.3.1 © 2025