Data Archive

    How LLMs store facts

    How might LLMs store facts

    This is not a solved problem.

    How do Multilayer Perceptrons store facts?

    Different directions encode information in Vector Embedding space.
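
As a rough illustration of directions encoding information, the sketch below treats a feature as a unit vector and reads its strength out of an embedding with a dot product. The 4-dimensional space and the names `plural_dir` and `royal_dir` are hypothetical, chosen only for this example; real models learn their directions and they are rarely this clean.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalise(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


# Hypothetical feature directions in a toy 4-dimensional embedding space.
# They are orthogonal by construction.
plural_dir = normalise([1.0, 1.0, 0.0, 0.0])
royal_dir = normalise([0.0, 0.0, 1.0, -1.0])

# An embedding that carries a lot of the first feature and none of the second.
embedding = [3.0 * p + 0.0 * r for p, r in zip(plural_dir, royal_dir)]


def feature_strength(vector, direction):
    """How far the vector extends along a unit-length feature direction."""
    return dot(vector, direction)


print(feature_strength(embedding, plural_dir))
print(feature_strength(embedding, royal_dir))
```

Projecting onto a direction recovers how much of that feature was written into the vector, which is the sense in which a direction can "mean" something.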

    MLPs are blocks of vectors; these are acted on by the context matrix.

    Johnson–Lindenstrauss lemma
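
The lemma is usually stated in terms of random projections preserving pairwise distances. A closely related fact, and the one relevant here, is that a high-dimensional space has room for many more nearly orthogonal directions than it has dimensions. The sketch below is a small experiment under assumed settings (50 random unit vectors, dimensions 3 and 1000, a fixed seed); it compares the worst-case overlap between any two vectors in each case.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Draw a random direction by normalising a Gaussian sample."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def max_abs_cosine(n_vectors, dim, seed=0):
    """Largest |cosine similarity| over all pairs of random unit vectors."""
    rng = random.Random(seed)
    vectors = [random_unit_vector(dim, rng) for _ in range(n_vectors)]
    worst = 0.0
    for i in range(n_vectors):
        for j in range(i + 1, n_vectors):
            cos = sum(a * b for a, b in zip(vectors[i], vectors[j]))
            worst = max(worst, abs(cos))
    return worst


low_dim = max_abs_cosine(n_vectors=50, dim=3)
high_dim = max_abs_cosine(n_vectors=50, dim=1000)
print(f"dim=3:    worst overlap {low_dim:.3f}")
print(f"dim=1000: worst overlap {high_dim:.3f}")
```

In 3 dimensions some pair of the 50 vectors is almost parallel, while in 1000 dimensions every pair stays close to perpendicular. This is one way to see how a model could store more features than it has dimensions, at the cost of small interference between them.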

    Sparse Autoencoder - used in interpretability of LLM responses
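
A minimal sketch of the idea, assuming the common formulation of a ReLU encoder, a linear decoder, and a loss that adds an L1 sparsity penalty to the reconstruction error. The weights, the input, and the `l1_coeff` value below are made up for illustration and there is no training loop; this is not the setup of any particular paper.

```python
def relu(x):
    return max(0.0, x)


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def sae_forward(x, w_enc, b_enc, w_dec, b_dec):
    """Encode x into non-negative features, then decode back to x's space."""
    features = [relu(z + b) for z, b in zip(matvec(w_enc, x), b_enc)]
    reconstruction = [r + b for r, b in zip(matvec(w_dec, features), b_dec)]
    return features, reconstruction


def sae_loss(x, features, reconstruction, l1_coeff):
    """Mean squared reconstruction error plus an L1 penalty on the features."""
    mse = sum((a - b) ** 2 for a, b in zip(x, reconstruction)) / len(x)
    sparsity = sum(abs(f) for f in features)
    return mse + l1_coeff * sparsity


# Toy example: a 2-dimensional activation expanded into 3 candidate features.
x = [1.0, 0.0]
w_enc = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
b_enc = [0.0, 0.0, 0.0]
w_dec = [[1.0, 0.0, -1.0], [0.0, 1.0, 0.0]]
b_dec = [0.0, 0.0]

features, reconstruction = sae_forward(x, w_enc, b_enc, w_dec, b_dec)
loss = sae_loss(x, features, reconstruction, l1_coeff=0.1)
print(features, reconstruction, loss)
```

Only one of the three features fires for this input. The L1 term rewards that kind of sparsity, which is what makes the individual features easier to inspect one at a time.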

    See the Anthropic posts:

    • https://transformer-circuits.pub/2022/toy_model/index.html#adversarial
    • https://transformer-circuits.pub/2023/monosemantic-features

    Backlinks

    • No backlinks found
        • pages
          • Data Archive
          • DE_Tools
          • ML_Tools
          • Queries
          • Quotes
        • standardised
          • 1-on-1 Template
          • AB testing
          • Accessing Gen AI generated content
          • Accuracy
          • ACID Transaction
          • Activation atlases
          • Activation Function
          • Active Learning
          • Ada boosting
          • Adam Optimizer
          • Adaptive Learning Rates
          • Adding a database to PostgreSQL
          • Addressing Multicollinearity
          • Addressing_Multicollinearity.py
          • Adjusted R squared
          • Agent-based modelling
          • Agentic Solutions
          • Aggregation
          • AI Engineer
          • AI governance
          • Algorithms
          • Alternatives to Batch Processing
          • Amazon S3
          • Anomaly Detection
          • Anomaly Detection in Time Series
          • Anomaly Detection with Clustering
          • Anomaly Detection with Statistical Methods
          • Apache Kafka
          • API
          • API Driven Microservices
          • ARIMA
          • Asking questions
          • Attack mitigation
          • Attack types
          • Attention Is All You Need
          • Attention mechanism
          • AUC
          • Automated Feature Creation
          • AWS Lambda
          • Azure
          • B-tree
          • Backpropagation in Neural Networks
          • Bag of words
          • Bag_of_Words.py
          • Bagging
          • Bandit example output
          • Bandit_Example_Fixed.py
          • Bandit_Example_Nonfixed.py
          • Bash
          • Batch Normalisation
          • Batch Processing
          • Bellman Equations
          • Benefits of Data Transformation
          • Bernoulli
          • BERT
          • BERT Pretraining of Deep Bidirectional Transformers for Language Understanding
          • BERTScore
          • Bias and variance
          • Big Data
          • BigQuery
          • binary classification
          • Binder
          • Boosting
          • Bootstrap
          • Boxplot
          • Business observability
          • Business value of anomaly detection
          • Career Interest
          • Casual Inference
          • CatBoost
          • Central Limit Theorem
          • Chain of thought
          • Change Management
          • Checksum
          • Chi-Squared Test
          • Choosing a Threshold
          • Choosing the Number of Clusters
          • CI-CD
          • Class Separability
          • Classification
          • Classification Report
          • Claude
          • cleaning terminal path
          • Click_Implementation.py
          • Clustering
          • Clustering_Dashboard.py
          • Clustermap
          • Code Diagrams
          • Columnar Storage
          • Command line
          • Command Prompt
          • Common Table Expression
          • Communication principles
          • Communication Techniques
          • Comparing LLM
          • Comparing_Ensembles.py
          • Components of the database
          • Computer Science
          • Concatenate
          • conceptual data model
          • Conceptual Model
          • Concurrency
          • Confidence Interval
          • Confusion Matrix
          • Continuous Delivery - Deployment
          • Continuous Integration
          • Converting categorical variables to a dummy indicators
          • Convolutional Neural Networks
          • Correlation
          • Correlation vs Causation
          • Cosine Similarity
          • Cost Function
          • Cost-Sensitive Analysis
          • Covariance
          • Covariance Structures
          • Covariance vs Correlation
          • Covering Index
          • Cron jobs
          • Cross Entropy
          • Cross validation
          • Cross_Entropy_Single.py
          • Cross_Entropy.py
          • Crosstab
          • CRUD
          • Cryptography
          • Current challenges within the energy sector
          • Cypher
          • Dash
          • Dashboarding
          • Data AI Education at Work
          • Data Analysis
          • Data Analysis Portal
          • Data Analyst
          • Data Architect
          • Data Archive Graph Analysis
          • data asset
          • Data Cleansing
          • Data Collection
          • Data Contract
          • Data Distribution
          • Data Drift
          • Data Engineer
          • Data Engineering
          • Data Engineering Portal
          • Data Engineering Tools
          • Data Ingestion
          • Data Integrity
          • Data Leakage
          • Data Lifecycle Management
          • Data Management
          • Data Mining - CRISP
          • Data Modelling
          • Data Orchestration
          • Data Pipeline
          • Data Pipeline to Data Products
          • Data Principles
          • Data Reduction
          • Data Roles
          • Data Science
          • Data Scientist
          • Data Security
          • Data Selection
          • Data Selection in ML
          • Data Steward
          • Data storage
          • Data Streaming
          • Data Terms
          • Data transformation in Data Engineering
          • Data transformation in Machine Learning
          • Data Transformation with Pandas
          • Data Validation
          • data virtualization
          • Data Visualisation
          • Database
          • Database Index
          • Database Management System (DBMS)
          • Database schema
          • Database Techniques
          • Databricks
          • Databricks vs Snowflake
          • Datasets
          • DBScan
          • dbt
          • Debugging
          • Debugging ipynb
          • Debugging.py
          • Decision Tree
          • Deep Learning Frameworks
          • Deep Learning Overview
          • Deep Q-Learning
          • DeepSeek
          • Deleting rows or filling them with the mean is not always best
          • Demand forecasting
          • Dendrograms
          • dependency manager
          • Design Thinking Questions
          • Determining Threshold Values
          • Difference between Databricks vs. Snowflake
          • Difference between snowflake to hadoop
          • Differentiation
          • Digital Transformation
          • Digital twin
          • Dimension Table
          • Dimensional Modelling
          • Dimensionality Reduction
          • dimensions
          • Directed Acyclic Graph (DAG)
          • Directory Structure
          • Distillation
          • Distributed Computing
          • Distribution_Analysis.py
          • Distributions
          • Docker
          • Docker Image
          • Documentation & Meetings
          • Dropout
          • DS & ML Portal
          • duckdb
          • DuckDB in python
          • DuckDB vs SQLite
          • Dummy variable trap
          • EDA
          • EDA_Pandas.py
          • Edge Machine Learning Models
          • Education and Training
          • Elastic Net
          • ELT
          • Embedded Methods
          • embeddings for OOV words
          • emergent behavior
          • Encoding Categorical Variables
          • Energy
          • Energy ABM
          • Energy Storage
          • Environment Variables
          • Epoch
          • Epub
          • ER Diagrams
          • Estimator
          • ETL Pipeline example
          • ETL vs. ELT
          • etlt
          • Evaluating Language Models
          • Evaluation Metrics
          • Event Driven
          • Event Driven Events
          • Event Driven Microservices
          • Event-Driven Architecture
          • Everything
          • Excel & Sheets
          • Explain different gradient descent algorithms, their advantages, and limitations.
          • Explain the curse of dimensionality
          • Exploration
          • Exploration vs. Exploitation
          • F1 Score
          • Fabric
          • fact table
          • Factor Analysis
          • Factor_Analysis.py
          • facts
          • FAISS
          • FastAPI
          • FastAPI_Example.py
          • Feature Engineering
          • Feature Evaluation
          • Feature Extraction
          • Feature Importance
          • Feature Scaling
          • Feature Selection
          • Feature selection and creation
          • Feature Selection vs Feature Importance
          • Feature_Distribution.py
          • Feed Forward Neural Network
          • Feedback Template
          • Filter method
          • filter methods
          • Firebase
          • Fishbone diagram
          • Fitting weights and biases of a neural network
          • Flask
          • Folder Tree Diagram
          • Forecasting_AutoArima.py
          • Forecasting_Baseline.py
          • Forecasting_Exponential_Smoothing.py
          • Foreign Key
          • Forward Propagation in Neural Networks
          • Fuzzywuzzy
          • Gartner Hype Cycle
          • Gaussian Distribution
          • Gaussian Mixture Models
          • Gaussian Model
          • gaussian_mixture_model_implementation.py
          • General Linear Regression
          • Generative Adversarial Networks
          • Generative AI
          • Generative AI From Theory to Practice
          • Get data
          • Gini Impurity
          • Gini Impurity vs Cross Entropy
          • GIS
          • Git
          • Gitlab
          • gitlab-ci.yml
          • Google Cloud Platform
          • Google My Maps Data Extraction
          • Gradient Boosting
          • Gradient Boosting Regressor
          • Gradient Descent
          • Gradio
          • Grain
          • Grammar method
          • Graph Analysis Plugin
          • Graph Neural Network
          • Graph Theory
          • Graph Theory Community
          • GraphRAG
          • Grep
          • GridSearchCV
          • Groupby
          • Groupby vs Crosstab
          • Grouped plots
          • GRU
          • GSheets
          • Guardrails
          • Hadoop
          • Handling Different Distributions
          • Handling Missing Data
          • Handling_Missing_Data_Basic.ipynb
          • Handling_Missing_Data.ipynb
          • Handwritten Digit Classification
          • Hash
          • Heatmap
          • Heatmaps_Dendrograms.py
          • heterogeneous features
          • Hierarchical Clustering
          • High cross validation accuracy is not directly proportional to performance on unseen test data
          • Honkit
          • Hosting
          • How businesses use Gen AI
          • How do we evaluate LLM Outputs
          • how do you do the data selection
          • How is reinforcement learning being combined with deep learning
          • How is schema evolution done in practice with SQL
          • How LLMs store facts
          • How to do git commit messages properly
          • How to model to improve demand forecasting
          • How to normalise a merged table
          • How to reduce the need for Gen AI responses
          • How to search within a graph
          • How to use Sklearn Pipeline
          • How would you decide between using TF-IDF and Word2Vec for text vectorization
          • Hugging Face
          • Hyperparameter
          • Hyperparameter Tuning
          • Hypothesis testing
          • Imbalanced Datasets
          • Imbalanced_Datasets_SMOTE.py
          • Immutable vs mutable
          • Impact of multicollinearity on model parameters
          • Implementing Database Schema
          • In NER how would you handle ambiguous entities
          • incremental synchronization
          • Industries of interest
          • inference
          • inference versus prediction
          • information theory
          • Input is Not Properly Sanitized
          • Interoperability
          • interoperable
          • Interpretability
          • Interpreting logistic regression model parameters
          • Interquartile Range (IQR) Detection
          • interview notepad
          • ipynb
          • Isolation Forest and Its Use in Anomaly Detection
          • Java vs JavaScript
          • JavaScript
          • Jobs to be done
          • Johnson–Lindenstrauss lemma
          • Joining Datasets
          • Json
          • Json to Yaml
          • Junction Tables
          • Justfile
          • K_Means.py
          • K-means
          • K-nearest neighbours
          • Kaggle Abalone regression example
          • Kernel Density Estimation
          • Kernelling
          • Key Differences of Web Feature Server (WFS) and Web Feature Server (WFS)
          • Kmeans vs GMM
          • Knowledge Graph
          • Knowledge graph vs RAG setup
          • Knowledge Graphs with Obsidian
          • Knowledge Work
          • Label encoding
          • Labelling data
          • Langchain
          • Language Model Output Optimisation
          • Language Models
          • Language Models Large (LLMs) vs Small (SLMs)
          • Lasso
          • Latency
          • Latent Dirichlet Allocation
          • LBFGS
          • learning rate
          • Learning Styles
          • lemmatization
          • LightGBM
          • LightGBM vs XGBoost vs CatBoost
          • Linear Discriminant Analysis
          • Linear Regression
          • Linked List
          • LLM
          • LLM Evaluation Metrics
          • Load Balancing
          • Local Interpretable Model-agnostic Explanations
          • Logical Model
          • Logistic Regression
          • Logistic Regression does not predict probabilities
          • Logistic regression in sklearn & Gradient Descent
          • Logistic Regression Statsmodel Summary table
          • Looker Studio
          • loss function
          • Loss versus Cost function
          • LSTM
          • Machine Learning Algorithms
          • Machine Learning Operations
          • maintainability
          • Maintainable Code
          • Makefile
          • Manifold learning
          • Many-to-Many Relationships
          • Markov chain
          • Markov Decision Processes
          • Master Observability Datadog
          • Mathematical Reasoning in Transformers
          • Mathematics
          • Maximum Likelihood Estimation
          • mean absolute error
          • Mean Squared Error
          • mean vs median
          • melt
          • Memory
          • Memory Caching
          • Merge
          • Mermaid
          • Metadata Handling
          • Methods for Handling Outliers
          • Microsoft Access
          • Mini-batch gradient descent
          • Mixture of Experts
          • ML Engineer
          • MNIST
          • Model Building
          • Model Cascading
          • Model Deployment
          • Model Ensemble
          • Model Evaluation
          • Model Evaluation vs Model Optimisation
          • Model Interpretability
          • Model Observability
          • Model Optimisation
          • Model Parameters
          • Model Parameters Tuning
          • Model parameters vs hyperparameters
          • Model preparation
          • Model Selection
          • Model Validation
          • Momentum
          • Momentum.py
          • MongoDB
          • Monolith Architecture
          • Monte Carlo Simulation
          • Multi-Agent Reinforcement Learning
          • Multi-head attention
          • Multi-level index
          • Multicollinearity
          • Multinomial Naive bayes
          • MySql
          • Naive Bayes
          • Natural Language Processing
          • nbconvert
          • neo4j
          • neomodel
          • Network Design
          • Neural network
          • Neural Network Classification
          • Neural network in Practice
          • Neural Scaling Laws
          • Ngrams
          • nltk
          • Node.JS
          • Non-parametric tests
          • Normalisation
          • Normalisation of data
          • Normalisation of Text
          • Normalisation vs Standardisation
          • NoSQL
          • NotebookLM
          • npy Files A NumPy Array storage
          • Object Relational Mapper
          • OLAP
          • OLTP
          • oltp (online transactional processing)
          • One Pager Template
          • One_hot_encoding.py
          • One-hot encoding
          • Optimisation function
          • Optimisation techniques
          • Optimising a Logistic Regression Model
          • Optimising Neural Networks
          • Optuna
          • Ordinary Least Squares
          • Orthogonalization
          • Outliers
          • Over parameterised models
          • Overfitting in Machine Learning
          • p values
          • p-values in linear regression in sklearn
          • Page Rank
          • Pandas
          • Pandas Dataframe Agent
          • Pandas join vs merge
          • Pandas Pivot Table
          • Pandas Stack
          • Pandas_Common.py
          • Pandas_Stack.py
          • Parametric tests
          • parametric vs non-parametric models
          • parametric vs non-parametric tests
          • Parquet
          • parsimonious
          • Part of speech tagging
          • PCA Explained Variance Ratio
          • PCA Principal Components
          • PCA_Analysis.ipynb
          • PCA_Based_Anomaly_Detection.py
          • PCA-Based Anomaly Detection
          • pd.Grouper
          • PDF++
          • pdoc
          • PDP and ICE
          • Percentile Detection
          • Performance Dimensions
          • Performance Drift in Machine Learning
          • Physical Model
          • Pickle
          • Plotly
          • pmdarima
          • Poetry
          • Positional Encoding
          • PostgreSQL
          • PowerBI
          • Powerquery
          • PowerShell
          • Powershell versus cmd
          • Powershell vs Bash
          • Precision
          • Precision or Recall
          • Precision-Recall Curve
          • Prediction Intervals
          • Preprocessing
          • Prevention Is Better Than the Cure
          • Primary Key
          • Principal Component Analysis
          • Probability in other fields
          • Problem Definition
          • programming languages
          • Prompt engineering
          • Prompt Extracting information from blog posts
          • Prompting
          • Proportion Test
          • Publish and Subscribe
          • Pull Request Template
          • PyCaret
          • Pycaret_Anomaly.ipynb
          • Pycaret_Example.py
          • Pydantic
          • Pydantic_More.py
          • Pydantic.py
          • PyGraphviz
          • PyOD
          • Pyright
          • Pyright vs Pydantic
          • PySpark
          • Pytest
          • Python
          • Python Click
          • PyTorch
          • Pytorch vs Tensorflow
          • Q-Learning
          • Q-Q Plot
          • Quartz
          • QUERY GSheets
          • Query Optimisation
          • Query Plan
          • Querying
          • QuickSort
          • R
          • R squared
          • R-squared metric not always a good indicator of model performance in regression
          • Race Conditions
          • RAG
          • Random Forest Regression
          • Random Forests
          • React
          • Reasoning tokens
          • Recall
          • Recommender systems
          • Recurrent Neural Networks
          • Recursive Algorithm
          • Regression Analysis and its Applications
          • Regression metrics
          • Regression_Logistic_Metrics.ipynb
          • Regularisation of Tree based models
          • Regularisation.py
          • Regularization in Machine Learning
          • Reinforcement learning
          • Relating Tables Together
          • Relational Database
          • Relationships in memory
          • requirements.txt
          • REST API
          • Reward Function
          • Ridge
          • ROC (Receiver Operating Characteristic)
          • ROC_Curve.py
          • rollup
          • Row-based Storage
          • Sarsa
          • Scala
          • Scalability
          • Scaling Agentic Systems
          • Scaling Server
          • Scheduled Tasks
          • Scientific Method
          • Seaborn
          • Search
          • Security mitigation
          • Security Researcher
          • Security Vulnerabilities
          • semantic layer
          • Semantic Relationships
          • Sentence Similarity
          • shapefile
          • SHapley Additive exPlanations
          • Sharepoint
          • Silhouette Analysis
          • Similarity Search
          • Single source of truth
          • Sklearn
          • sklearn datasets
          • Sklearn Pipeline
          • Small Language Models
          • Smart Grids
          • SMOTE (Synthetic Minority Over-sampling Technique)
          • SMSS
          • Snowflake
          • Snowflake Schema
          • Software Design Patterns
          • Software Development Life Cycle
          • Software Development Portal
          • spaCy
          • SparseCategoricalCrossentropy or CategoricalCrossentropy
          • Specificity
          • Spreadsheets vs Databases
          • SQL Groupby
          • SQL Injection
          • SQL Joins
          • SQL vs NoSQL
          • SQL Window functions
          • SQLAlchemy
          • SQLAlchemy vs. sqlite3
          • SQLite
          • SQLite Studio
          • Stacking
          • Standard deviation
          • Standardisation
          • Star Schema
          • Statistical Assumptions
          • Statistical Tests
          • Statistics
          • Stemming
          • Stochastic Gradient Descent
          • Stored Procedures
          • Strongly vs Weakly typed language
          • Structuring and organizing data
          • Summarisation
          • Supervised Learning
          • Support Vector Classifier (SVC)
          • Support Vector Machines
          • Support Vector Regression
          • SVM_Example.py
          • Symbolic computation
          • Sympy
          • syntactic relationships
          • t-SNE
          • T-test
          • Tableau
          • Tags
          • Technical Analysis of Named Entity Recognition
          • Technical Debt
          • Technical Design Doc Template
          • Telecommunications
          • Tensorflow
          • Terminal commands
          • Test Loss When Evaluating Models
          • Testing
          • Testing_Pytest.py
          • Testing_unittest.py
          • Text2Cypher
          • TF-IDF
          • The Data Hierarchy of Needs
          • Thinking Systems
          • Time Series
          • Time Series Forecasting
          • Time Series Identify Trends and Patterns
          • Tokenisation
          • TOML
          • tool.bandit
          • tool.ruff
          • tool.uv
          • topic modeling
          • Train-Dev-Test Sets
          • Transaction
          • Transfer Learning
          • transfer_learning.py
          • Transformed Target Regressor
          • Transformer
          • Transformers vs RNNs
          • TS_Anomaly_Detection
          • TS_Anomaly_Detection.py
          • Turning a flat file into a database
          • Types of Computational Bugs
          • Types of Database Schema
          • Types of Neural Networks
          • TypeScript
          • Typical Output Formats in Neural Networks
          • Ubuntu
          • UML
          • unittest
          • univariate vs multivariate
          • unstructured data
          • Unsupervised learning
          • Untitled
          • Untitled 1
          • Untitled 2
          • Untitled 3
          • Use Cases for a Simple Neural Network Like
          • Use of RNNs in energy sector
          • Utilities
          • Vacuum
          • vanishing and exploding gradients problem
          • variance
          • Vector Database
          • Vector Embedding
          • Vector_Embedding.py
          • Vectorisation
          • Vectorized Engine
          • Vercel
          • View Use Case
          • Views
          • Violin plot
          • Virtual environments
          • WCSS and elbow method
          • Weak Learners
          • Web Feature Server (WFS)
          • Web Map Tile Service (WMTS)
          • Webpages relevant
          • What algorithms or models are used within the energy sector
          • What algorithms or models are used within the telecommunication sector
          • What are Data Processing Techniques (row-based, columnar, vectorized)?
          • What are the best practices for evaluating the effectiveness of different prompts
          • What are the top Cloud Providers?
          • What can ABM solve within the energy sector
          • What is a Data Lake?
          • What is a Data Lakehouse?
          • What is a Data Product?
          • What is a Data Warehouse?
          • What is a Jinja Template?
          • What is a Lambda Architecture?
          • What is a Metric?
          • What is a policy in RL
          • What is a Push-Down?
          • What is a Soft Delete?
          • What is a Storage Layer / Object Store?
          • What is an In-Memory Format?
          • What is Apache Airflow?
          • What is Apache Spark?
          • What is Business Intelligence
          • What is Dagster?
          • What is Data Governance?
          • What is Data Integration?
          • What is Data Lineage?
          • What is Data Literacy?
          • What is Data Observability?
          • What is Data Quality?
          • What is data transformation?
          • What is declarative?
          • What is DevOps?
          • What is ETL?
          • What is Functional Programming?
          • What is Granularity
          • What is imperative?
          • What is Kubernetes?
          • What is Machine Learning?
          • What is MapReduce?
          • What is Master Data Management (MDM)?
          • What is Normalization?
          • What is OLAP (Online Analytical Processing)?
          • What is Reverse ETL?
          • What is Schema Evolution?
          • What is semi-structured data?
          • What is Slowly Changing Dimension?
          • What is SQL?
          • What is structured data?
          • What is the Big-O Notation?
          • What is the difference between odds and probability
          • What is the role of gradient-based optimization in training deep learning models.
          • What is YAML?
          • When and why not to use regularisation
          • Why and when is feature scaling necessary
          • Why does increasing the number of models in an ensemble not necessarily improve the accuracy
          • Why does label encoding give different predictions from one-hot encoding
          • Why does the Adam Optimizer converge
          • Why is named entity recognition (NER) a challenging task
          • Why is the Central Limit Theorem important when working with small sample sizes
          • Why JSON is Better than Pickle for Untrusted Data
          • Why Removing Outliers May Improve Regression but Harm Classification
          • Why Type 1 and Type 2 matter
          • Why use ER diagrams
          • Wikipedia_API.py
          • Windows Subsystem for Linux
          • Word2vec
          • Word2Vec.py
          • WordNet
          • Wrapper Methods
          • XGBoost
          • Z-Normalisation
          • Z-Score
          • Z-Scores vs Prediction Intervals
          • Z-Test

      Created with Quartz v4.3.1 © 2025
