Data Archive

    01082025-files without tags

    • Mean Absolute Error.md
    • Maintainability.md
    • Assumption of Normality.md
    • Word2Vec.py.md
    • Why is named entity recognition (NER) a challenging task.md
    • Wikipedia_API.py.md
    • Why Type 1 and Type 2 matter.md
    • When and why not to us regularisation.md
    • Webpages relevant.md
    • Why does increasing the number of models in a ensemble not necessarily improve the accuracy.md
    • Web Map Tile Service (WMTS).md
    • Web Feature Server (WFS).md
    • Weak Learners.md
    • Vectorized Engine.md
    • Vector_Embedding.py.md
    • View Use Case.md
    • Variance.md
    • Variability in linear models.md
    • Vacuum.md
    • Use Cases for a Simple Neural Network Like.md
    • Unix.md
    • univariate vs multivariate.md
    • unittest.md
    • Ubuntu.md
    • Typical Output Formats in Neural Networks.md
    • UMAP.md
    • Types of Computational Bugs.md
    • Types of Database Schema.md
    • TS_Anomaly_Detection.py.md
    • Type 1 error and Power.md
    • Transformed Target Regressor.md
    • transfer_learning.py.md
    • Transaction.md
    • Train-Dev-Test Sets.md
    • topic modeling.md
    • tool.ruff.md
    • tool.bandit.md
    • TOML.md
    • Time Series Identify Trends and Patterns.md
    • Time Series Forecasting.md
    • TF-IDF Implementation.md
    • Text2Cypher.md
    • Testing_Pytest.py.md
    • Testing_unittest.py.md
    • Test Loss When Evaluating Models.md
    • Testing.md
    • Telecommunications.md
    • Technical Design Doc Template.md
    • syntactic relationships.md
    • T-test.md
    • SVM_Example.py.md
    • Support Vector Regression.md
    • Support Vector Classifier (SVC).md
    • Structuring and organizing data.md
    • Strongly vs Weakly typed language.md
    • Stored Procedures.md
    • Stochastic Gradient Descent.md
    • Stemming.md
    • Star Schema.md
    • Statistical theorems.md
    • Statistical Tests.md
    • Standard deviation.md
    • Statistical Assumptions.md
    • stack memory.md
    • Stacking.md
    • SQLAlchemy.md
    • SQLite Studio.md
    • SQLAlchemy vs. sqlite3.md
    • SQL Injection.md
    • SQL Joins.md
    • SparseCategorialCrossentropy or CategoricalCrossEntropy.md
    • Snowflake Schema.md
    • SMSS.md
    • SMOTE (Synthetic Minority Over-sampling Technique).md
    • Similarity Search.md
    • sklearn datasets.md
    • Silhouette Analysis.md
    • SHapley Additive exPlanations.md
    • Sentence Transformer Workflow.md
    • Self-Attention.md
    • Self attention vs multi-head attention.md
    • Self Attention.md
    • Security Researcher.md
    • Security mitigation.md
    • search.md
    • Scaling Server.md
    • Scatter Plots.md
    • Sarsa.md
    • Scaling Agentic Systems.md
    • Root Mean Squared Error.md
    • Row-based Storage.md
    • ROC_Curve.py.md
    • Reward Function.md
    • REST API.md
    • retriever.md
    • Relu.md
    • Relational Database.md
    • Regression_Logistic_Metrics.ipynb.md
    • Regularisation.py.md
    • Recursive Algorithm.md
    • React.md
    • Random Forest Regression.md
    • Random Access Memory.md
    • R-squared metric not always a good indicator of model performance in regression.md
    • Q-Q Plot.md
    • Race Conditions.md
    • R.md
    • QuickSort.md
    • Pytorch vs Tensorflow.md
    • Pytest.md
    • Pyright vs Pydantic.md
    • PyOD.md
    • PyGraphviz.md
    • Pydantic_More.py.md
    • Pydantic.py.md
    • Pydantic.md
    • Pycaret_Example.py.md
    • PyCaret.md
    • Proportion Test.md
    • Pycaret_Anomaly.ipynb.md
    • prompt retrievers.md
    • Pull Request Template.md
    • programming languages.md
    • Probability.md
    • Prevention Is Better Than The Cure.md
    • Problem Definition.md
    • Precision-Recall Curve.md
    • Primary Key.md
    • Postman.md
    • Powershell vs Bash.md
    • Poetry.md
    • Polynomial Regression.md
    • pmdarima.md
    • Pickle.md
    • Physical Model.md
    • Percentile Detection.md
    • PDP and ICE.md
    • pdoc.md
    • pd.Grouper.md
    • PCA_Based_Anomaly_Detection.py.md
    • PCA_Analysis.ipynb.md
    • PCA-Based Anomaly Detection.md
    • PCA Principal Components.md
    • parsimonious.md
    • PCA Explained Variance Ratio.md
    • Part of speech tagging.md
    • Parametric tests.md
    • parametric vs non-parametric tests.md
    • Pandas_Stack.py.md
    • Pandas_Common.py.md
    • Pandas Pivot Table.md
    • Pandas Dataframe Agent.md
    • Over parameterised models.md
    • Ordinary Least Squares.md
    • Optuna.md
    • Optimising Neural Networks.md
    • Orthogonalization.md
    • Optimising a Logistic Regression Model.md
    • Optimisation techniques.md
    • One_hot_encoding.py.md
    • OLTP.md
    • One Pager Template.md
    • OLAP.md
    • One-hot encoding.md
    • Object Relational Mapper.md
    • Odds.md
    • Numpy.md
    • Normalisation vs Standardisation.md
    • Normalisation of data.md
    • NoSQL.md
    • non-parametric.md
    • Non-parametric tests.md
    • Node.JS.md
    • Ngrams.md
    • Neural Network Classification.md
    • Neural network in Practice.md
    • NET.md
    • neomodel.md
    • nbconvert slideshows.md
    • Multithreading.md
    • Multiprocessing vs Multithreading.md
    • Multi-Agent Reinforcement Learning.md
    • Multinomial Naive bayes.md
    • MongoDB.md
    • Momentum.py.md
    • Monte Carlo Simulation.md
    • model-agnostic feature importance.md
    • Model Validation.md
    • Model Parameters.md
    • Model parameters vs hyperparameters.md
    • Model Evaluation vs Model Optimisation.md
    • Model Interpretability.md
    • Model Cascading.md
    • MNIST.md
    • Mixture of Experts.md
    • Model Building.md
    • Mini-batch gradient descent.md
    • ML Engineer.md
    • Microsoft.md
    • Methods for Handling Outliers.md
    • Metadata Handling.md
    • Merge.md
    • Memory.md
    • Memory Caching.md
    • Mean Squared Error.md
    • Maximum Likelihood Estimation.md
    • Many-to-Many Relationships.md
    • Maintainable Code.md
    • Looker Studio.md
    • Logistic regression in sklearn & Gradient Descent.md
    • Logistic Regression Statsmodel Summary table.md
    • Logistic Regression does not predict probabilities.md
    • Local Interpretable Model-agnostic Explanations.md
    • Load Balancing.md
    • Logical Model.md
    • LLM Evaluation Metrics.md
    • Linear Discriminant Analysis.md
    • LightGBM vs XGBoost vs CatBoost.md
    • LBFGS.md
    • Latency.md
    • Language Models Large (LLMs) vs Small (SLMs).md
    • Language Model Output Optimisation.md
    • Label encoding.md
    • Label encoding vs One-hot encoding.md
    • Labelling data.md
    • K_Means.py.md
    • Knowledge Graph.md
    • Kmeans vs GMM.md
    • Key Differences of Web Feature Server (WFS) and Web Feature Server (WFS).md
    • Kernelling.md
    • Key Components of Attention and Formula.md
    • Kernel Density Estimation.md
    • Keras.md
    • Justfile.md
    • Json.md
    • Junction Tables.md
    • Json to SQLite.md
    • Joining Datasets.md
    • JavaScript.md
    • Java.md
    • Interquartile Range (IQR) Detection.md
    • interoperable.md
    • Interpreting logistic regression model parameters.md
    • Interoperability.md
    • Input is Not Properly Sanitized.md
    • inference.md
    • inference versus prediction.md
    • Inertia K Means Cost Function.md
    • Imputation Techniques.md
    • incremental synchronization.md
    • In NER how would you handle ambiguous entities.md
    • Implementing Database Schema.md
    • Impact of multicollinearity on model parameters.md
    • Imbalanced_Datasets_SMOTE.py.md
    • Immutable vs mutable.md
    • Hyperparameter.md
    • How would you decide between using TF-IDF and Word2Vec for text vectorization.md
    • How to search within a graph.md
    • How to normalise a merged table.md
    • How LLMs store facts.md
    • How is reinforcement learning being combined with deep learning.md
    • How do we evaluate of LLM Outputs.md
    • High cross validation accuracy is not directly proportional to performance on unseen test data.md
    • Hierarchical Clustering.md
    • Heatmaps_Dendrograms.py.md
    • Heap Memory.md
    • Hash.md
    • Handwritten Digit Classification.md
    • Handling_Missing_Data_Basic.ipynb.md
    • Handling_Missing_Data.ipynb.md
    • Handling Different Distributions.md
    • GRU.md
    • GridSeachCv.md
    • Groupby vs Crosstab.md
    • Grouped plots.md
    • GraphRAG The Marriage of Knowledge Graphs and RAG.md
    • Graph Query Language.md
    • Gradio.md
    • Grain.md
    • Grammar method.md
    • Gradient descent in linear regression.md
    • Google My Maps Data Extraction.md
    • GPT.md
    • Google Colab.md
    • Google Sheet Pivots Table.md
    • Global Interpreter Lock.md
    • Google Collab.md
    • Gitlab.md
    • Gini Impurity.md
    • GIS.md
    • Gini Impurity vs Cross Entropy.md
    • Generative AI From Theory to Practice.md
    • Gaussian Model.md
    • Gaussian_Mixture_Model_Implementation.py.md
    • Gaussian Distribution.md
    • garbage collector.md
    • Forecasting_Exponential_Smoothing.py.md
    • Foreign Key.md
    • frontend.md
    • Forecasting_Baseline.py.md
    • Forecasting_AutoArima.py.md
    • Firebase.md
    • Filter method.md
    • Fitting weights and biases of a neural network.md
    • Feedback Template.md
    • Feature Selection vs Feature Importance.md
    • Feature_Distribution.py.md
    • FastAPI.md
    • Facts.md
    • Factor_Analysis.py.md
    • FastAPI_Example.py.md
    • F1 Score.md
    • Factor Analysis.md
    • Exploration vs. Exploitation.md
    • Experiment Plan Template.md
    • Excel pivot table.md
    • Excel vs Google Sheets.md
    • Event-Driven Architecture.md
    • Event Driven Microservices.md
    • Event Driven Events.md
    • Estimator.md
    • Epoch.md
    • Environment Variables.md
    • Energy ABM.md
    • emergent behavior.md
    • Dummy variable trap.md
    • Embedded Methods.md
    • DuckDB.md
    • DuckDB vs SQLite.md
    • Education and Training.md
    • DS & ML Portal.md
    • documentation.md
    • DuckDB in python.md
    • Docker.md
    • Docker Image.md
    • Distribution_Analysis.py.md
    • Distillation.md
    • Determining Threshold Values.md
    • dependency manager.md
    • Deleting rows or filling them with the mean is not always best.md
    • Dendrograms.md
    • Deep Q-Learning.md
    • Debugging.py.md
    • Deep Learning Frameworks.md
    • Datasets.md
    • Data Validation.md
    • Data Virtualization.md
    • Data transformation in Data Engineering.md
    • Data Steward.md
    • Data transformation in Machine Learning.md
    • Data Scientist.md
    • Data Roles.md
    • Data Leakage.md
    • Data Drift.md
    • Data Distribution.md
    • Data Contract.md
    • Data Collection.md
    • Data Assessment.md
    • Data Architect.md
    • Dashboarding.md
    • Dash.md
    • Cross_Entropy_Single.py.md
    • CRUD.md
    • Cross_Entropy.py.md
    • Crosstab.md
    • Covariance vs Correlation.md
    • Covering Index.md
    • Covariance Structures.md
    • Cost Function.md
    • Cosine Similarity.md
    • Continuous Integration.md
    • Continuous Delivery - Deployment.md
    • Correlation vs Causation.md
    • Concurrency.md
    • conceptual data model.md
    • Computer Science.md
    • Concatenate.md
    • Comparing_Ensembles.py.md
    • Components of the database.md
    • Columnar Storage.md
    • Clustermap.md
    • Click_Implementation.py.md
    • Classification Report.md
    • Choosing the Number of Clusters.md
    • Chi-Squared Test.md
    • CI-CD.md
    • Choosing a Threshold.md
    • ChatGPT.md
    • Chain of thought.md
    • Checksum.md
    • Casual Inference.md
    • Business value of anomaly detection.md
    • CatBoost.md
    • Central Limit Theorem & Small Sample Sizes.md
    • Bootstrap.md
    • Binder.md
    • BERTScore.md
    • BigQuery.md
    • BERT Pretraining of Deep Bidirectional Transformers for Language Understanding.md
    • Bernoulli.md
    • Benefits of Data Transformation.md
    • Batch Normalisation.md
    • Bash.md

        • pages
          • Data Archive
          • DE_Tools
          • ML_Tools
          • Quotes
          • Research Questions
          • Reviews
        • standardised
          • 1-on-1 Template
          • 1-to-1's with a Line Manager
          • AB testing
          • Accessing Gen AI generated content
          • Accuracy
          • ACID Transaction
          • Activation atlases
          • Activation Function
          • Active Learning
          • Ada boosting
          • Adam Optimizer
          • Adaptive Learning Rates
          • Adding a database to PostgreSQL
          • Addressing Multicollinearity
          • Addressing_Multicollinearity.py
          • Adjusted R squared
          • Agent Exploration
          • Agent-based modelling
          • Agentic Solutions
          • Aggregation
          • AI Agents Memory
          • AI Engineer
          • AI governance
          • Algorithms
          • Altair
          • altair versus seaborn
          • Alternatives to Batch Processing
          • Amazon S3
          • Anomaly Detection
          • Anomaly Detection in Time Series
          • Anomaly Detection with Clustering
          • Anomaly Detection with Statistical Methods
          • ANOVA
          • Apache Iceberg
          • Apache Kafka
          • API
          • API Driven Microservices
          • ARIMA
          • Asking questions
          • Assumption of Normality
          • Attack mitigation
          • Attack types
          • Attention Is All You Need
          • Attention mechanism
          • AUC
          • Automated Feature Creation
          • AWS Lambda
          • Azure
          • B-tree
          • Backpropagation in Neural Networks
          • Bag of words
          • Bag_of_Words.py
          • Bagging
          • Bandit example output
          • Bandit_Example_Fixed.py
          • Bandit_Example_Nonfixed.py
          • Bash
          • bat
          • Batch Normalisation
          • Batch Processing
          • Bellman Equations
          • Benefits of Data Transformation
          • Bernoulli
          • BERT
          • BERT Pretraining of Deep Bidirectional Transformers for Language Understanding
          • BERTScore
          • Bias and variance
          • Big Data
          • BigQuery
          • binary classification
          • Binder
          • Boosting
          • Bootstrap
          • Boxplot
          • Business observability
          • Business value of anomaly detection
          • Casual Inference
          • CatBoost
          • Central Limit Theorem
          • Central Limit Theorem & Small Sample Sizes
          • Chain of thought
          • Change Management
          • ChatGPT
          • Checksum
          • Chi-Squared Test
          • Choosing a Threshold
          • Choosing the Number of Clusters
          • CI-CD
          • Class Separability
          • Classification
          • Classification Report
          • Claude
          • Click_Implementation.py
          • Cluster Density
          • Cluster Seperation
          • Clustering
          • Clustering_Dashboard.py
          • Clustermap
          • Code Diagrams
          • Columnar Storage
          • Command line
          • Command Prompt
          • Common Table Expression
          • Communication principles
          • Communication Techniques
          • Comparing LLMs
          • Comparing_Ensembles.py
          • Components of the database
          • Computer Science
          • Concatenate
          • conceptual data model
          • Conceptual Model
          • Concurrency
          • Confidence Interval
          • Confusion Matrix
          • Continuous Delivery - Deployment
          • Continuous Integration
          • Convolutional Neural Networks
          • Correlation
          • Correlation vs Causation
          • Cosine Similarity
          • Cost Function
          • Cost-Sensitive Analysis
          • Covariance
          • Covariance Structures
          • Covariance vs Correlation
          • Covering Index
          • Cron jobs
          • Cross Entropy
          • Cross validation
          • Cross_Entropy_Single.py
          • Cross_Entropy.py
          • Crosstab
          • CRUD
          • Cryptography
          • csv module
          • CUDA
          • Curse of dimensionality
          • Cypher
          • Dash
          • Dashboarding
          • Data AI Education at Work
          • Data Analysis
          • Data Analysis Portal
          • Data Analyst
          • Data Architect
          • Data Assessment
          • Data Cleansing
          • Data Collection
          • Data Contract
          • Data Distribution
          • Data Drift
          • Data Engineer
          • Data Engineering
          • Data Engineering Portal
          • Data Engineering Tools
          • Data Ingestion
          • Data Integrity
          • Data Leakage
          • Data Lifecycle Management
          • Data Management
          • Data Mining - CRISP
          • Data Modelling
          • Data Orchestration
          • Data Pipeline
          • Data Pipeline to Data Products
          • Data Principles
          • Data Reduction
          • Data Roles
          • Data Science
          • Data Scientist
          • Data Security
          • Data Selection
          • Data Selection in ML
          • Data Steward
          • Data storage
          • Data Streaming
          • Data transformation in Data Engineering
          • Data transformation in Machine Learning
          • Data Transformation with Pandas
          • Data Validation
          • data virtualization
          • Data Visualisation
          • Database
          • Database Index
          • Database Management System (DBMS)
          • Database schema
          • Database Techniques
          • Databricks
          • Databricks vs Snowflake
          • Datasets
          • DBScan
          • dbt
          • Debugging
          • Debugging ipynb
          • Debugging.py
          • Decision Tree
          • Deep Learning Frameworks
          • Deep Learning Overview
          • Deep Q-Learning
          • Deleting rows or filling them with the mean is not always best
          • Demand forecasting
          • Dendrograms
          • dependency manager
          • Design Thinking Questions
          • Determining Threshold Values
          • Differentation
          • Digital Transformation
          • Digital twin
          • Dimension Table
          • Dimensional Modelling
          • Dimensionality Reduction
          • dimensions
          • Directed Acyclic Graph (DAG)
          • Distillation
          • Distributed Computing
          • Distribution_Analysis.py
          • Distributions
          • Docker
          • Docker Image
          • documentation
          • Documentation & Meetings
          • Dropout
          • DS & ML Portal
          • duckdb
          • DuckDB in python
          • DuckDB vs SQLite
          • Dummy variable trap
          • EDA
          • Edge ML
          • Education and Training
          • Elastic Net
          • ElasticSearch
          • ELT
          • Embedded Methods
          • embeddings for OOV words
          • emergent behavior
          • Encoding Categorical Variables
          • Energy
          • Energy ABM
          • Energy Storage
          • Environment Variables
          • Epoch
          • Epub
          • ER Diagrams
          • Estimator
          • ETL Pipeline example
          • ETL vs. ELT
          • etlt
          • Evaluate Embedding Methods
          • Evaluating Language Models
          • Evaluating the effectiveness of prompts
          • Evaluation Metrics
          • Event Driven
          • Event Driven Events
          • Event Driven Microservices
          • Event-Driven Architecture
          • Everything
          • Excel
          • Excel pivot table
          • Excel vs Google Sheets
          • Experiment Plan Template
          • Exploration vs. Exploitation
          • f-regression
          • F-statistic
          • F1 Score
          • Fabric
          • fact table
          • Factor Analysis
          • Factor_Analysis.py
          • facts
          • FAISS
          • FastAPI
          • FastAPI_Example.py
          • Feature Engineering
          • Feature Evaluation
          • Feature Extraction
          • Feature Importance
          • Feature Scaling
          • Feature Selection
          • Feature Selection vs Feature Importance
          • Feature_Distribution.py
          • Feed Forward Neural Network
          • Feedback Template
          • File Management
          • Filter method
          • filter methods
          • Firebase
          • Fishbone diagram
          • Fitting weights and biases of a neural network
          • Flask
          • Folder Tree Diagram
          • Forecasting_AutoArima.py
          • Forecasting_Baseline.py
          • Forecasting_Exponential_Smoothing.py
          • Foreign Key
          • Forward Propagation in Neural Networks
          • frontend
          • Fuzzywuzzy
          • garbage collector
          • Gartner Hype Cycle
          • Gaussian Distribution
          • Gaussian Mixture Models
          • Gaussian Model
          • gaussian_mixture_model_implementation.py
          • General Linear Regression
          • Generative Adversarial Networks
          • Generative AI
          • Generative AI From Theory to Practice
          • Generators in Python
          • Gini Impurity
          • Gini Impurity vs Cross Entropy
          • GIS
          • Git
          • Gitlab
          • gitlab-ci.yml
          • Global Interpreter Lock
          • Google Cloud Platform
          • Google Colab
          • Google Collab
          • Google My Maps Data Extraction
          • Google OR Tools
          • Google Sheet Pivots Table
          • Google Sheets
          • GPT
          • Gradient Boosting
          • Gradient Boosting Regressor
          • Gradient Descent
          • Gradient descent in linear regression
          • Gradio
          • Grain
          • Grammar method
          • Graph Neural Network
          • Graph Query Language
          • Graph Theory
          • Graph Theory Community
          • GraphRAG
          • GraphRAG The Marriage of Knowledge Graphs and RAG
          • Grep
          • GridSeachCv
          • Groupby
          • Groupby vs Crosstab
          • Grouped plots
          • GRU
          • Guardrails
          • Hadoop
          • Handling Different Distributions
          • Handling Missing Data
          • Handling_Missing_Data_Basic.ipynb
          • Handling_Missing_Data.ipynb
          • Handwritten Digit Classification
          • Hash
          • Heap Data Structure
          • Heap Memory
          • Heatmap
          • Heatmaps_Dendrograms.py
          • heterogeneous features
          • Hierarchical Clustering
          • High cross validation accuracy is not directly proportional to performance on unseen test data
          • Honkit
          • Hosting
          • How businesses use Gen AI
          • How do we evaluate of LLM Outputs
          • how do you do the data selection
          • How is reinforcement learning being combined with deep learning
          • How is schema evolution done in practice with SQL
          • How LLMs store facts
          • How to do git commit messages properly
          • How to normalise a merged table
          • How to reduce the need for Gen AI responses
          • How to search within a graph
          • How to use Sklearn Pipeline
          • How would you decide between using TF-IDF and Word2Vec for text vectorization
          • html
          • Hugging Face
          • Hyperparameter
          • Hyperparameter Tuning
          • Hypothesis testing
          • Imbalanced Datasets
          • Imbalanced_Datasets_SMOTE.py
          • Immutable vs mutable
          • Impact of multicollinearity on model parameters
          • Implementing Database Schema
          • Imputation Techniques
          • In NER how would you handle ambiguous entities
          • incremental synchronization
          • Indexing in cypher
          • Industries of interest
          • Inertia K Means Cost Function
          • inference
          • inference versus prediction
          • information theory
          • initialization methods
          • Input is Not Properly Sanitized
          • Interoperability
          • interoperable
          • Interpretability
          • Interpreting logistic regression model parameters
          • Interquartile Range (IQR) Detection
          • ipynb
          • Isolation Forest and Its Use in Anomaly Detection
          • Java
          • Java vs JavaScript
          • JavaScript
          • Jobs to be done
          • Johnson–Lindenstrauss lemma
          • Joining Datasets
          • Json
          • Json to SQLite
          • Junction Tables
          • Jupyter Book
          • jupytext
          • Justfile
          • K_Means.py
          • K-means
          • K-nearest neighbours
          • Keras
          • Kernel Density Estimation
          • Kernelling
          • Key Components of Attention and Formula
          • Key Differences of Web Feature Server (WFS) and Web Feature Server (WFS)
          • Kmeans vs GMM
          • KNIME
          • Knowledge Graph
          • Knowledge graph vs RAG setup
          • Knowledge Work
          • Label encoding
          • Label encoding vs One-hot encoding
          • Labelling data
          • Langchain
          • Language Model Output Optimisation
          • Language Models
          • Language Models Large (LLMs) vs Small (SLMs)
          • Lasso
          • Latency
          • Latent Dirichlet Allocation
          • LBFGS
          • Learning Curve
          • learning rate
          • Learning Styles
          • lemmatization
          • LightGBM
          • LightGBM vs XGBoost vs CatBoost
          • Linear Discriminant Analysis
          • Linear Regression
          • Linked List
          • LLM
          • LLM Evaluation Metrics
          • LLM Memory
          • Load Balancing
          • Local Interpretable Model-agnostic Explanations
          • Local Outlier Factor (LOF)
          • Log transformation
          • Logical Model
          • Logistic Regression
          • Logistic Regression does not predict probabilities
          • Logistic regression in sklearn & Gradient Descent
          • Logistic Regression Statsmodel Summary table
          • Looker Studio
          • loss function
          • Loss versus Cost function
          • LSTM
          • Machine Learning Algorithms
          • Machine Learning Operations
          • maintainability
          • Maintainable Code
          • Makefile
          • Manifold learning
          • Many-to-Many Relationships
          • Markov chain
          • Markov Decision Processes
          • Master Observability Datadog
          • Mathematical Reasoning in Transformers
          • Mathematics
          • Maximum Likelihood Estimation
          • mean absolute error
          • Mean Squared Error
          • mean vs median
          • melt
          • Memory
          • Memory Caching
          • Merge
          • Mermaid
          • Metadata Handling
          • Methods for Handling Outliers
          • Microsoft
          • Microsoft Access
          • Mini-batch gradient descent
          • Mixture of Experts
          • ML Engineer
          • MNIST
          • Model Building
          • Model Cascading
          • Model Deployment
          • Model Ensemble
          • Model Evaluation
          • Model Evaluation vs Model Optimisation
          • Model Interpretability
          • Model Observability
          • Model Optimisation
          • Model Parameters
          • Model Parameters Tuning
          • Model parameters vs hyperparameters
          • Model Selection
          • Model Validation
          • model-agnostic feature importance
          • Momentum
          • Momentum.py
          • MongoDB
          • Monolith Architecture
          • Monte Carlo Simulation
          • Multi-Agent Reinforcement Learning
          • Multi-head attention
          • Multi-level index
          • Multicollinearity
          • Multinomial Naive bayes
          • Multiprocessing
          • Multiprocessing vs Multithreading
          • Multithreading
          • MySql
          • Naive Bayes
          • Natural Language Processing
          • nbconvert
          • nbconvert slideshows
          • neo4j
          • neomodel
          • NET
          • Network Design
          • Neural network
          • Neural Network Classification
          • Neural network in Practice
          • Neural Scaling Laws
          • Ngrams
          • nltk
          • Node.JS
          • non-parametric
          • Non-parametric tests
          • Normalisation
          • Normalisation of data
          • Normalisation of Text
          • Normalisation vs Standardisation
          • NoSQL
          • NotebookLM
          • npy Files A NumPy Array storage
          • Numpy
          • Object Relational Mapper
          • Odds
          • Odds vs Probability
          • OLAP
          • OLTP
          • One Pager Template
          • One_hot_encoding.py
          • One-hot encoding
          • OOV words
          • Operational Resilience for Growth and Adaptability
          • Optimisation function
          • Optimisation techniques
          • Optimising a Logistic Regression Model
          • Optimising Neural Networks
          • Optuna
          • Ordinary Least Squares
          • Orthogonalization
          • Outliers
          • Over parameterised models
          • Overfitting in Machine Learning
          • p values
          • Page Rank
          • Pandas
          • Pandas Dataframe Agent
          • Pandas join vs merge
          • Pandas Pivot Table
          • Pandas Stack
          • Pandas_Common.py
          • Pandas_Stack.py
          • Pandoc
          • Parametric tests
          • parametric vs non-parametric models
          • parametric vs non-parametric tests
          • Parquet
          • parsimonious
          • Part of speech tagging
          • PCA Explained Variance Ratio
          • PCA Principal Components
          • PCA_Analysis.ipynb
          • PCA_Based_Anomaly_Detection.py
          • PCA-Based Anomaly Detection
          • pd.Grouper
          • pdoc
          • PDP and ICE
          • Percentile Detection
          • Performance Dimensions
          • Performance Drift in Machine Learning
          • Physical Model
          • Pickle
          • Plotly
          • pmdarima
          • Poetry
          • Polynomial Regression
          • Positional Encoding
          • PostgreSQL
          • Postman
          • PowerBI
          • Powerquery
          • PowerShell
          • Powershell scripts
          • Powershell versus Command Prompt
          • Powershell vs Bash
          • Precision
          • Precision or Recall
          • Precision-Recall Curve
          • Prediction Intervals
          • Preprocessing
          • Prevention Is Better Than the Cure
          • Primary Key
          • Principal Component Analysis
          • Probability
          • Problem Definition
          • Process Based Parallelism
          • Processes vs Threads
          • programming languages
          • Project Management Portal
          • Prompt engineering
          • prompt retrievers
          • Prompting
          • Proportion Test
          • Publish and Subscribe
          • Pull Request Template
          • PyCaret
          • Pycaret_Anomaly.ipynb
          • Pycaret_Example.py
          • Pydantic
          • Pydantic_More.py
          • Pydantic.py
          • PyGraphviz
          • PyOD
          • Pyright
          • Pyright vs Pydantic
          • PySpark
          • Pytest
          • Python
          • Python Click
          • PyTorch
          • Pytorch vs Tensorflow
          • Q-Learning
          • Q-Q Plot
          • Quartz
          • Query Optimisation
          • Querying
          • QuickSort
          • R
          • R squared
          • R-squared metric not always a good indicator of model performance in regression
          • Race Conditions
          • RAG
          • Random Access Memory
          • Random Forest Regression
          • Random Forests
          • React
          • Reasoning tokens
          • Recall
          • Recommender systems
          • Recurrent Neural Networks
          • Recursive Algorithm
          • Registering a Scheduled Task
          • Regression
          • Regression metrics
          • Regression_Logistic_Metrics.ipynb
          • Regularisation of Tree based models
          • Regularisation.py
          • Regularization in Machine Learning
          • Reinforcement learning
          • Relating Tables Together
          • Relational Database
          • Relationships in memory
          • Relu
          • REST API
          • retriever
          • Reveal.js
          • Reward Function
          • Ridge
          • ROC (Receiver Operating Characteristic)
          • ROC_Curve.py
          • rollup
          • Root Mean Squared Error
          • Row-based Storage
          • Sarsa
          • Scala
          • Scalability
          • Scaling Agentic Systems
          • Scaling Data Science Capability
          • Scaling Server
          • Scatter Plots
          • Scientific Method
          • Scikit-Learn
          • Scipy
          • Seaborn
          • search
          • Security mitigation
          • Security Researcher
          • Security Vulnerabilities
          • Self Attention
          • Self attention vs multi-head attention
          • Self-Attention
          • semantic layer
          • Semantic Relationships
          • Semantic search
          • Sentence Similarity
          • Sentence Transformer Workflow
          • Sentence Transformers
          • shapefile
          • SHapley Additive exPlanations
          • Sharepoint
          • Silhouette Analysis
          • Similarity Search
          • Single source of truth
          • sklearn datasets
          • Sklearn Pipiline
          • Small Language Models
          • Smart Grids
          • SMOTE (Synthetic Minority Over-sampling Technique)
          • SMSS
          • Snowflake
          • Snowflake Schema
          • Snowflake vs Hadoop
          • Software Design Patterns
          • Software Development Life Cycle
          • Software Development Portal
          • spaCy
          • SparseCategorialCrossentropy or CategoricalCrossEntropy
          • Spearman vs Pearson Correlation
          • Specificity
          • Spreadsheets vs Databases
          • SQL Groupby
          • SQL Injection
          • SQL Joins
          • SQL vs NoSQL
          • SQL Window functions
          • SQLAlchemy
          • SQLAlchemy vs. sqlite3
          • SQLite
          • SQLite Studio
          • stack memory
          • Stacking
          • Standard deviation
          • Standardisation
          • Star Schema
          • Statistical Assumptions
          • Statistical Tests
          • Statistical theorems
          • Statistics
          • Stemming
          • Stochastic Gradient Descent
          • Stored Procedures
          • Streamlit
          • Strongly vs Weakly typed language
          • Structuring and organizing data
          • Summarisation
          • Supervised Learning
          • Support Vector Classifier (SVC)
          • Support Vector Machines
          • Support Vector Regression
          • SVM_Example.py
          • Symbolic computation
          • Sympy
          • syntactic relationships
          • t-SNE
          • T-test
          • Tableau
          • Technical Analysis of Named Entity Recognition
          • Technical Debt
          • Technical Design Doc Template
          • Telecommunications
          • Tensorflow
          • Terminal commands
          • Test Loss When Evaluating Models
          • Testing
          • Testing_Pytest.py
          • Testing_unittest.py
          • Text2Cypher
          • TF-IDF
          • TF-IDF Implementation
          • The Data Hierarchy of Needs
          • Thinking Systems
          • Time Series
          • Time Series Forecasting
          • Time Series Identify Trends and Patterns
          • Tokenisation
          • TOML
          • tool.bandit
          • tool.ruff
          • tool.uv
          • topic modeling
          • Train-Dev-Test Sets
          • Transaction
          • Transfer Learning
          • transfer_learning.py
          • Transformed Target Regressor
          • Transformer
          • Transformers vs RNNs
          • TS_Anomaly_Detection.py
          • Turning a flat file into a database
          • Type 1 error and Power
          • Types of Computational Bugs
          • Types of Database Schema
          • Types of Neural Networks
          • TypeScript
          • Typical Output Formats in Neural Networks
          • Ubuntu
          • UMAP
          • UML
          • unittest
          • univariate vs multivariate
          • Unix
          • unstructured data
          • Unsupervised learning
          • Use Cases for a Simple Neural Network Like
          • Use of RNNs in energy sector
          • Vacuum
          • vanishing and exploding gradients problem
          • Variability in linear models
          • variance
          • Vector Database
          • Vector Embedding
          • Vector_Embedding.py
          • Vectorisation
          • Vectorized Engine
          • Vercel
          • View Use Case
          • Views
          • Violin plot
          • Virtual environments
          • WCSS and elbow method
          • Weak Learners
          • Web Feature Service (WFS)
          • Web Map Tile Service (WMTS)
          • Webpages relevant
          • What are Data Processing Techniques (row-based, columnar, vectorized)?
          • What are the top Cloud Providers?
          • What is a Data Lake?
          • What is a Data Lakehouse?
          • What is a Data Product?
          • What is a Data Warehouse?
          • What is a Jinja Template?
          • What is a Lambda Architecture?
          • What is a Metric?
          • What is a policy in RL
          • What is a Push-Down?
          • What is a Soft Delete?
          • What is a Storage Layer / Object Store?
          • What is an In-Memory Format?
          • What is Apache Airflow?
          • What is Apache Spark?
          • What is Business Intelligence
          • What is Dagster?
          • What is Data Governance?
          • What is Data Integration?
          • What is Data Lineage?
          • What is Data Literacy?
          • What is Data Observability?
          • What is Data Quality?
          • What is data transformation?
          • What is declarative?
          • What is DevOps?
          • What is ETL?
          • What is Functional Programming?
          • What is Granularity
          • What is imperative?
          • What is Kubernetes?
          • What is Machine Learning?
          • What is MapReduce?
          • What is Master Data Management (MDM)?
          • What is Normalization?
          • What is OLAP (Online Analytical Processing)?
          • What is Reverse ETL?
          • What is Schema Evolution?
          • What is semi-structured data?
          • What is Slowly Changing Dimension?
          • What is SQL?
          • What is structured data?
          • What is the Big-O Notation?
          • What is YAML?
          • When and why not to use regularisation
          • Why does increasing the number of models in an ensemble not necessarily improve the accuracy
          • Why does the Adam Optimizer converge
          • Why is named entity recognition (NER) a challenging task
          • Why JSON is Better than Pickle for Untrusted Data
          • Why Removing Outliers May Improve Regression but Harm Classification
          • Why standardise features
          • Why Type 1 and Type 2 matter
          • Why use ER diagrams
          • Wikipedia_API.py
          • Windows Scheduled Tasks
          • Windows Subsystem for Linux
          • Word2vec
          • Word2Vec.py
          • WordNet
          • Wrapper Methods
          • Xavier
          • XGBoost
          • Z-Normalisation
          • Z-Score
          • Z-Scores vs Prediction Intervals
          • Z-Test
        • 01082025-files without tags

      Created with Quartz v4.3.1 © 2025
