Data Archive

    • categories
      • computer-science
        • Algorithms
        • Big O Notation
        • BM25 (Best Match 25)
        • Checksum
        • Computer Science
        • Concurrency
        • Convex Optimisation
        • csv module
        • Directed Acyclic Graph (DAG)
        • Flask
        • garbage collector
        • Generators in Python
        • Hash
        • Heap Data Structure
        • Heap Memory
        • How to search within a graph
        • Immutable vs mutable
        • Java
        • Java vs JavaScript
        • JavaScript
        • Knowledge Graph
        • Langchain
        • Machine Learning Algorithms
        • Monte Carlo Simulation
        • Multiprocessing vs Multithreading
        • Multithreading
        • neomodel
        • Node.JS
        • Numpy
        • Processes vs Threads
        • programming languages
        • PyGraphviz
        • QuickSort
        • Ranking models
        • Recursive Algorithm
        • Strongly vs Weakly typed language
        • Times Series Python Packages
      • data-analysis
        • Altair
        • altair versus seaborn
        • Binder
        • Boxplot
        • Dash
        • Dashboarding
        • Dashboards
        • Data Analysis
        • Data Analysis Portal
        • Data Analyst
        • Data Distribution
        • Data Mining
        • Data Product
        • Data Reduction
        • Data Visualisation
        • DuckDB
        • EDA
        • ER Diagrams
        • Heatmap
        • Label encoding
        • Linear Discriminant Analysis
        • Log transformation
        • Looker Studio
        • MariaDB vs MySQL
        • Melt
        • Multiple Correspondence Analysis
        • Multivariate Analysis
        • OLAP
        • Page Rank
        • Parquet
        • Plotly
        • PowerBI
        • Preprocessing
        • Preprocessing Text Classification
        • Seaborn
        • SQL Window functions
        • t-SNE
        • Tableau
      • data-engineering
        • ACID Transaction
        • Ada boosting
        • Adding a database to PostgreSQL
        • Aggregation
        • Apache Iceberg
        • Attack mitigation
        • Attack types
        • AWS Lambda
        • Azure
        • Bagging
        • Benefits of Data Transformation
        • Big Data
        • BigQuery
        • Cassandra
        • Cloud Providers
        • Coaching & Mentoring
        • Columnar Storage
        • Command Prompt
        • Common Table Expression
        • Components of the database
        • Covering Index
        • Crosstab
        • CRUD
        • CUDA
        • Curse of dimensionality
        • Cypher
        • Data Architect
        • Data Architecture
        • Data Cleansing
        • Data Contract
        • Data Deployment
        • Data Dictionary
        • Data Drift
        • Data Engineering
        • Data Engineering Portal
        • Data Engineering Tools
        • Data Evaluation
        • Data Hierarchy of Needs
        • Data Integration
        • Data Integrity
        • Data Lake
        • Data Lakehouse
        • Data Leakage
        • Data Lifecycle Management
        • data lineage
        • Data Management
        • Data Modeling
        • Data Observability
        • Data Principles
        • Data Quality
        • Data Security
        • Data Selection
        • Data Sources
        • Data Storage
        • Data Transformation
        • Data Transformation in Data Engineering
        • Data Transformation with Pandas
        • Data Validation
        • Data Virtualization
        • Data Warehouse
        • Database
        • Database Index
        • Database Management System (DBMS)
        • Database Schema
        • Database Storage
        • Database Techniques
        • Databricks 1
        • DataOps
        • dbt 1
        • design pattern
        • Digital twin
        • Distributed Computing
        • DuckDB in python
        • DuckDB vs SQLite
        • Durability
        • ELT
        • Estimator
        • ETL
        • ETL 1
        • ETL Pipeline Example
        • ETL vs ELT
        • EtLT
        • Event Driven Microservices
        • Event-Driven Architecture
        • Fabric
        • Faker
        • File Management
        • Folder Tree Diagram
        • Foreign Key
        • Github Actions
        • Google Sheet Pivots Table
        • Grain
        • Graph Query Language
        • Groupby
        • Groupby vs Crosstab
        • heterogeneous features
        • Honkit
        • Hosting
        • How is schema evolution done in practice with SQL
        • How to normalise a merged table
        • Implementing Database Schema
        • Imputation Techniques
        • in-memory format
        • incremental synchronization
        • Indexing in cypher
        • Input is Not Properly Sanitized
        • Joining Datasets
        • Junction Tables
        • KNIME
        • Logical Model
        • Many-to-Many Relationships
        • map reduce
        • MariaDB
        • master data management
        • Merge
        • Microsoft Access
        • Missing Data
        • Model Deployment
        • Monolith Architecture
        • Multi-level index
        • Multiprocessing
        • MySql
        • neo4j
        • Normalised Schema
        • NoSQL
        • Object Relational Mapper
        • OLTP
        • Overfitting
        • Pandas
        • Pandas join vs merge
        • Pandas Pivot Table
        • Pandas Stack
        • pd.Grouper
        • pgAdmin
        • Pgadmin Permissions on Windows
        • Physical Model
        • Pickle
        • Poetry
        • Polars
        • PostgreSQL
        • Postman
        • PowerShell
        • Prevention Is Better Than The Cure
        • Primary Key
        • Push-Down
        • Pydantic
        • Pyright vs Pydantic
        • Query Optimisation
        • Querying
        • Querying Time Series
        • Race Conditions
        • Relating Tables Together
        • Relational Database
        • reverse etl
        • rollup
        • Row parameters in SQL
        • Row-based Storage
        • Scalability
        • Scaling Server
        • Schema Evolution
        • Search
        • Security mitigation
        • Security Researcher
        • semantic layer
        • Single Source of Truth
        • Sklearn Pipiline
        • Slowly Changing Dimension
        • SMSS
        • Snowflake Schema
        • Soft Deletion
        • Software Design Patterns
        • Spreadsheets vs Databases
        • SQL
        • SQL Groupby
        • SQL Injection
        • SQL Joins
        • SQLAlchemy
        • SQLAlchemy vs. sqlite3
        • SQLite
        • SQLite Studio
        • Star Schema
        • storage layer object store
        • Stored Procedures
        • structured data
        • Structuring and organizing data
        • Transaction
        • Turning a flat file into a database
        • Types of Database Schema
        • Unix
        • unstructured data
        • Usability
        • Vacuum
        • Vector Database
        • Vectorized Engine
        • View Use Case
        • Views
        • Windows Subsystem for Linux
      • data-science
        • ACF Plots
        • Additive vs Multiplicative Models Time Series
        • ADF Test
        • Agent Exploration
        • Agentic Solutions
        • AI
        • ARIMA
        • ARIMA vs Random Forest in Time Series
        • Autocorrelation
        • Autocorrelation vs Autoregression
        • Autoregression
        • Baseline Forecast
        • Basics of Time Series
        • Batch gradient descent
        • Bellman Equations
        • Bias-Variance Trade Off
        • Capability
        • Choosing a Threshold
        • Choosing the Number of Clusters
        • Clustermap
        • Covariance Structures
        • Cross Validation
        • Data Assessment
        • Data Collection
        • Data Mining - CRISP
        • Data Preparation
        • Data Science
        • Data Scientist
        • Data Understanding
        • Datasets
        • Decomposition in Time Series
        • Differencing in Time Series
        • DS & ML Portal
        • Evaluating Time Series Forecasts
        • Evolving Seasonality
        • F-statistic
        • Feature Engineering
        • Feature Scaling
        • Feature Selection vs Feature Importance
        • Forecasting using Lags
        • Forward Propagation
        • Gaussian Mixture Models
        • Gitlab
        • Gompertz Model
        • Good Enough Principle in Data Projects
        • GraphRAG
        • Handling Missing Data
        • Holt-Winters (Exponential Smoothing)
        • Holt-Winters vs ARIMA
        • Holt’s Linear Trend Model (Double Exponential Smoothing)
        • how do you do the data selection
        • Imbalanced Datasets
        • Interpolation
        • Intervention Analysis
        • Joining Time Series
        • Kernel Machines
        • KPSS Test
        • Latency
        • Logistic Model Curve
        • LSTM in Time Series
        • Mean Absolute Percentage Error
        • MNIST
        • Normalisation
        • Out-of-sample rolling forecast evaluation
        • PACF Plots
        • Performance Dimensions
        • pmdarima
        • Properties of Time Series Models
        • Random Forest Regression
        • Residuals in Time Series
        • Scatter Plots
        • Scientific Method
        • Scipy
        • Seasonal Naive Forecast
        • Seasonality in Time Series
        • SHapley Additive exPlanations
        • Shot Learning
        • Silhouette Analysis
        • Simple Exponential Smoothing (SES)
        • sklearn datasets
        • SMOTE (Synthetic Minority Over-sampling Technique)
        • SparseCategorialCrossentropy or CategoricalCrossEntropy
        • stack memory
        • Stacking
        • Stationary Time Series
        • STL Decomposition
        • Time Series
        • Time Series Forecasting
        • Time Series Forecasts in Business
        • Time Series Learning Resources
        • Time Series Shocks
        • Trends in Time Series
      • deep-learning
        • Convolutional Neural Networks
        • Deep Learning
        • How is reinforcement learning being combined with deep learning
        • LSTM
        • Multi-Agent Reinforcement Learning
        • Policy
        • Relu
        • Sarsa
      • devops
        • AB testing
        • Alternatives to Batch Processing
        • Amazon S3
        • Apache Airflow
        • Apache Kafka
        • Apache Spark
        • API
        • API Driven Microservices
        • Bash
        • bat
        • Batch Processing
        • Batch vs PowerShell scripts
        • CI-CD
        • Clustering_Dashboard.py
        • Code Diagrams
        • Command Line
        • Continuous Delivery - Deployment
        • Continuous Integration
        • Cron jobs
        • dagster
        • Data Ingestion
        • Data Orchestration
        • Data Pipeline
        • Data Pipeline to Data Products
        • Data Streaming
        • Databricks
        • Databricks vs Snowflake
        • dbt
        • Debugging
        • Declarative Data Pipeline
        • dependency manager
        • DevOps
        • Devops Portal
        • Digital Transformation
        • Docker
        • Docker Image
        • Elastic Net
        • Environment Variables
        • Epub
        • Event Driven
        • Event Driven Events
        • Everything
        • Excel
        • Excel pivot table
        • Excel vs Google Sheets
        • FastAPI
        • Firebase
        • frontend
        • functional programming
        • GIS
        • Git
        • Github Gists
        • gitlab-ci.yml
        • Global Interpreter Lock
        • Google Cloud Platform
        • Google Colab
        • Google My Maps Data Extraction
        • Google Sheets
        • GPT
        • Gradio
        • Grep
        • Hadoop
        • Hugging Face
        • imperative
        • ipynb
        • jinja template
        • Json
        • Json to SQLite
        • jupytext
        • Justfile
        • kubernetes
        • Load Balancing
        • Maintainability
        • Maintainable Code
        • Makefile
        • Master Observability Datadog
        • Memory
        • Memory Caching
        • Microsoft
        • MongoDB
        • nbconvert
        • NET
        • Normalisation of Text
        • Pandas Series vs DataFrame
        • Pandoc
        • PMML
        • Powerquery
        • Powershell scripts
        • Powershell versus Command Prompt
        • Powershell vs Bash
        • Publish and Subscribe
        • PySpark
        • Pytest
        • Python
        • Python Click
        • Quartz
        • Random Access Memory
        • React
        • Registering a Scheduled Task
        • REST API
        • Scala
        • Security Vulnerabilities
        • shapefile
        • Sharepoint
        • Snowflake
        • Snowflake vs Hadoop
        • Software Development Life Cycle
        • SQL vs NoSQL
        • Streamlit
        • Technical Design Doc Template
        • Terminal commands
        • Testing
        • TOML
        • tool.bandit
        • tool.ruff
        • tool.uv
        • Types of Computational Bugs
        • TypeScript
        • Ubuntu
        • unittest
        • Vercel
        • Virtual environments
        • Web Feature Server (WFS)
        • Web Map Tile Service (WMTS)
        • Why JSON is Better than Pickle for Untrusted Data
        • Windows
        • Windows Scheduled Tasks
        • yaml
      • industry
        • AI Engineer
        • AI governance
        • Analytics Engineer
        • business intelligence
        • Business observability
        • Business Understanding
        • Business Values
        • Data AI Education at Work
        • Data Engineer
        • Data Governance
        • data literacy
        • Data Roles
        • Data Steward
        • Design Thinking Questions
        • Documentation & Meetings
        • Energy
        • Energy ABM
        • Energy Demand Forecasting
        • Energy Storage
        • Facts
        • Gartner Hype Cycle
        • Industries of interest
        • Knowledge Work
        • Managing People
        • ML Engineer
        • Network Design
        • Operational Resilience for Growth and Adaptability
        • Reporting
        • Scaling Data Science Capability
        • Smart Grids
        • Telecommunications
        • Thinking Systems
        • Use of RNNs in energy sector
        • Working with SMEs
      • machine-learning
        • Accuracy
        • Activation atlases
        • Activation Function
        • Active Learning
        • Adam Optimizer
        • Adaptive Learning Rates
        • Adjusted R squared
        • Agent-Based Modelling
        • AIC in Model Evaluation
        • Anomaly Detection
        • Anomaly Detection in Time Series
        • Anomaly Detection with Clustering
        • Anomaly Detection with Statistical Methods
        • Assessing Gen AI generated content
        • AUC
        • Automated Feature Creation
        • AutoML
        • Backpropagation
        • Batch Normalisation
        • Bias in ML
        • Binary Classification
        • Boosting
        • Business value of anomaly detection
        • CART
        • CatBoost
        • Challenges to Model Deployment
        • Class Separability
        • Classification
        • Classification Report
        • Cluster Density
        • Cluster Seperation
        • Clustering
        • Collaborative Filtering
        • conceptual data model
        • Confusion Matrix
        • Cost Function
        • Cost-Sensitive Analysis
        • Cross Entropy
        • Customer Growth Modeling
        • Data Selection in ML
        • Data Transformation in Machine Learning
        • DBSCAN
        • Decision Theory
        • Decision Tree
        • Decision Trees are Fragile
        • Deep Learning Frameworks
        • Deep Q-Learning
        • Dendrograms
        • Determining Threshold Values
        • Dimension Table
        • Dimensional Modelling
        • Dimensionality Reduction
        • Dimensions
        • Distributions in Decision Tree Leaves
        • Dropout
        • Dummy variable trap
        • Edge ML
        • emergent behavior
        • Encoding Categorical Variables
        • Epoch
        • Evaluating Language Models
        • Evaluating Logistic Regression
        • Evaluating the effectiveness of prompts
        • Evaluation Metrics
        • Exploration vs Exploitation
        • Exponential Smoothing
        • f-regression
        • F1 Score
        • Fact Table
        • FAISS
        • Feature Engineering for Time Series
        • Feature Evaluation
        • Feature Extraction
        • Feature Importance
        • Feature Selection
        • Feature Transformations
        • Feed Forward Neural Network
        • Filter Methods
        • Fitting weights and biases of a neural network
        • Framework for models
        • Gaussian Model
        • General Linear Regression
        • Generalisation
        • Generative Adversarial Networks
        • Gini Impurity
        • Gini Impurity vs Cross Entropy
        • Gradient Boosted Trees
        • Gradient Boosting
        • Gradient Boosting Regressor
        • Gradient Descent
        • Gradient descent in linear regression
        • granularity
        • Graph Neural Network
        • Graph Theory Community
        • GridSeachCv
        • Growth Models in Time Series
        • GRU
        • Hierarchical Clustering
        • High cross validation accuracy is not directly proportional to performance on unseen test data
        • Histogram
        • How do we evaluate of LLM Outputs
        • How to use Sklearn Pipeline
        • Hyperparameter
        • Hyperparameter Tuning
        • Impact of multicollinearity on model parameters
        • Inertia K Means Cost Function
        • inference
        • inference versus prediction
        • initialization methods
        • Interoperability
        • interoperable
        • Interpretability
        • Interpreting logistic regression model parameters
        • Isolated Forest
        • Jaccard Coefficient
        • K-means
        • K-nearest neighbours
        • Keras
        • Kernel Density Estimation
        • Kernelling
        • Kmeans vs GMM
        • L1 Regularisation
        • Label encoding vs One-hot encoding
        • Labelling data
        • Lagrange multipliers in optimisation
        • lambda architecture
        • Latent Dirichlet Allocation
        • Latent Semantic Indexing
        • LBFGS
        • Learning Curve
        • Learning Rate
        • Learning Styles
        • LightGBM
        • LightGBM vs XGBoost vs CatBoost
        • Linear Regression
        • LLM Evaluation Metrics
        • Local Interpretable Model-agnostic Explainations
        • Local Outlier Factor (LOF)
        • Logistic Regression
        • Logistic Regression does not predict probabilities
        • Logistic regression in sklearn & Gradient Descent
        • Logistic Regression Statsmodel Summary table
        • Loss function
        • Loss versus Cost function
        • Machine Learning
        • Machine Learning Operations
        • Manifold Learning
        • Markov Decision Processes
        • Maximum Likelihood Estimation
        • Median Absolute Error
        • Mermaid
        • Metadata Handling
        • Methods for Handling Outliers
        • Metric
        • Mini-batch gradient descent
        • MLOPS for Time Series
        • Model Building
        • Model Deployment using PyCaret
        • Model Ensemble
        • Model Evaluation
        • Model Evaluation vs Model Optimisation
        • Model Interpretability
        • Model Observability
        • Model Optimisation
        • Model Parameters
        • Model Parameters Tuning
        • Model parameters vs hyperparameters
        • Model Selection
        • Model Validation
        • model-agnostic feature importance
        • Momentum
        • Moving Average Forecast
        • Multinomial Naive bayes
        • Multiple Linear Regression
        • Naive Bayes Classifier
        • Naive Forecast
        • Neural network
        • Neural Network Classification
        • Neural network in Practice
        • Neural Scaling Laws
        • Non-negative matrix factorization in ML
        • Non-parametric tests
        • Normalisation of data
        • Normalisation vs Standardisation
        • objective function
        • One-hot encoding
        • Optimisation function
        • Optimisation techniques
        • Optimising a Logistic Regression Model
        • Optimising Neural Networks
        • Optuna
        • Ordinary Least Squares
        • Orthogonalization
        • Outliers
        • Over parameterised models
        • PCA Explained Variance Ratio
        • PCA Principal Components
        • PCA-Based Anomaly Detection
        • PDP and ICE
        • Percentile Detection
        • Performance Drift
        • Polynomial Regression
        • Positional Encoding
        • Precision
        • Precision or Recall
        • Precision-Recall Curve
        • Prediction Intervals vs Confidence Interval
        • Principal Component Analysis
        • PyCaret
        • PyOD
        • PyTorch
        • Pytorch vs Tensorflow
        • Q-Learning
        • Random Forest
        • Random Forest for Time Series
        • Recall
        • Recommender systems
        • Recurrent Neural Networks
        • Regression
        • Regression Metrics
        • Regularisation
        • Regularisation of Tree based models
        • Reinforcement learning
        • Relationships in memory
        • Reward Function
        • Ridge
        • ROC (Receiver Operating Characteristic)
        • Sammon’s Mapping
        • SARIMA
        • Scikit-Learn
        • Secretary Problem
        • semi-structured data
        • Sentence Transformers
        • Sklearn Pipeline
        • Specificity
        • Spectral Clustering
        • Supervised Learning
        • Support Vector Classifier
        • Support Vector Machines
        • Support Vector Regression
        • Tensorflow
        • Test Loss When Evaluating Models
        • Text Classification
        • Time Series Python Packages
        • Train-Dev-Test Sets
        • Transfer Learning
        • Transformed Target Regressor
        • Transformer
        • Transformers vs RNNs
        • Type I Error (False Positive)
        • Type II Error (False Negative)
        • Types of Neural Networks
        • Typical Output Formats in Neural Networks
        • UMAP
        • Unsupervised Learning
        • Use Cases for a Simple Neural Network Like
        • vanishing and exploding gradients problem
        • Variability in linear models
        • Variance in ML
        • Vector Embedding
        • WCSS and elbow method
        • Weak Learners
        • When and why not to us regularisation
        • Why does increasing the number of models in a ensemble not necessarily improve the accuracy
        • Why does the Adam Optimizer converge
        • Why Removing Outliers May Improve Regression but Harm Classification
        • Why standardise features
        • Why Type 1 and Type 2 matter
        • Wrapper Methods
        • Xaiver
        • XGBoost
      • natural-language
        • AI Agents Memory
        • Attention mechanism
        • Bag of words
        • BERT
        • BERTScore
        • Chain of thought
        • ChatGPT
        • Claude
        • Comparing LLMs
        • Distillation
        • ElasticSearch
        • Embedded Methods
        • embeddings for OOV words
        • Evaluate Embedding Methods
        • Fuzzywuzzy
        • Generative AI
        • Generative AI From Theory to Practice
        • Grammar method
        • Guardrails
        • How businesses use Gen AI
        • How LLMs store facts
        • How to reduce the need for Gen AI responses
        • How would you decide between using TF-IDF and Word2Vec for text vectorization
        • In NER how would you handle ambiguous entities
        • Key Components of Attention and Formula
        • Knowledge graph vs RAG setup
        • Language Model Output Optimisation
        • Language Models
        • Language Models Large (LLMs) vs Small (SLMs)
        • lemmatization
        • LLM
        • LLM Memory
        • Local LLM use cases
        • Mathematical Reasoning in Transformers
        • Mixture of Experts
        • Model Cascading
        • Multi-head attention
        • Named Entity Recognition
        • NER Implementation
        • Ngrams
        • NLP
        • nltk
        • Non-negative Matrix Factorization
        • NotebookLM
        • OOV words
        • Pandas Dataframe Agent
        • Part of speech tagging
        • Prompt Engineering
        • prompt retrievers
        • Prompts
        • Pyright
        • RAG
        • Scaling Agentic Systems
        • Self attention vs multi-head attention
        • Self-Attention
        • Semantic Relationships
        • Semantic search
        • Sentence Similarity
        • Sentence Transformer Workflow
        • Similarity Search
        • Small Language Models
        • spaCy
        • Stemming
        • stopwords
        • Summarisation
        • syntactic relationships
        • Text2Cypher
        • TF-IDF
        • TF-IDF Implementation
        • Tokenisation
        • topic modeling
        • Vectorisation
        • Why is named entity recognition (NER) a challenging task
        • Word2vec
        • WordNet
      • OTHER
        • Addressing_Multicollinearity.py
        • Bag_of_Words.py
        • Bandit example output
        • Bandit_Example_Fixed.py
        • Click_Implementation.py
        • Comparing_Ensembles.py
        • Cross_Entropy_Single.py
        • Cross_Entropy.py
        • Debugging.py
        • Distribution_Analysis.py
        • Factor_Analysis.py
        • FastAPI_Example.py
        • Feature_Distribution.py
        • Forecasting_AutoArima.py
        • Forecasting_Baseline.py
        • Forecasting_Exponential_Smoothing.py
        • Gaussian_Mixture_Model_Implementation.py
        • Handling_Missing_Data_Basic.ipynb
        • Handling_Missing_Data.ipynb
        • Heatmaps_Dendrograms.py
        • Imbalanced_Datasets_SMOTE.py
        • K_Means.py
        • Momentum.py
        • One_hot_encoding.py
        • Pandas_Common.py
        • Pandas_Stack.py
        • PCA_Analysis.ipynb
        • PCA_Based_Anomaly_Detection.py
        • Pycaret_Anomaly.ipynb
        • Pycaret_Example.py
        • Pydantic_More.py
        • Pydantic.py
        • Regression_Logistic_Metrics.ipynb
        • Regularisation.py
        • ROC_Curve.py
        • SVM_Example.py
        • Testing_Pytest.py
        • Testing_unittest.py
        • transfer_learning.py
        • TS_Anomaly_Detection.py
        • Vector_Embedding.py
        • Wikipedia_API.py
        • Word2Vec.py
      • PAPER
        • Attention Is All You Need
        • BERT Pretraining of Deep Bidirectional Transformers for Language Understanding
      • project-management
        • 1-on-1 Template
        • 1-to-1's with a Line Manager
        • Asking questions
        • Change Management
        • Communication principles
        • Communication Techniques
        • Communication with Stakeholders
        • Conceptual Model
        • Documentation
        • Education and Training
        • Experiment Plan Template
        • Feedback Template
        • Fishbone diagram
        • How to do git commit messages properly
        • html
        • Jobs to be done
        • Jupyter Book
        • Managing Data Science Teams
        • Modern data team
        • nbconvert slideshows
        • One Pager Template
        • pdoc
        • Problem Definition
        • Process for prototyping
        • project management
        • Project Management Portal
        • Pull Request Template
        • RACI
        • Remaining useful life models
        • Return of Experience Form
        • Reveal.js
        • Technical Debt
        • UML
        • Why use ER diagrams
      • statistics
        • Addressing Multicollinearity
        • ANOVA
        • Assumption of Normality
        • Bernoulli
        • Bootstrap Sampling
        • Casual Inference
        • Central Limit Theorem
        • Central Limit Theorem & Small Sample Sizes
        • Chi-Squared Test
        • Confidence Interval
        • Correlation
        • Correlation vs Causation
        • Cosine Similarity
        • Covariance
        • Covariance vs Correlation
        • Cryptography
        • Differentation
        • Distributions
        • EM Algorithm
        • Factor Analysis
        • Gaussian Distribution
        • Graph Theory
        • Grouped plots
        • Handling Different Distributions
        • Hypothesis testing
        • information theory
        • Interquartile Range (IQR) Detection
        • Johnson–Lindenstrauss lemma
        • Markov chain
        • Mathematics
        • Mean Absolute Error
        • Mean Squared Error
        • mean vs median
        • Multicollinearity
        • non-parametric
        • Odds
        • Odds vs Probability
        • p values
        • Parametric tests
        • parametric vs non-parametric models
        • parametric vs non-parametric tests
        • parsimonious
        • Prediction Intervals
        • Probability
        • Proportion Test
        • Q-Q Plot
        • R
        • R squared
        • R-squared metric not always a good indicator of model performance in regression
        • Reasoning tokens
        • Root Mean Squared Error
        • Sampling
        • Spearman vs Pearson Correlation
        • Standard deviation
        • Standardisation
        • Statistical Assumptions
        • Statistical Tests
        • Statistical theorems
        • Statistics
        • statsmodels
        • Stochastic Gradient Descent
        • Symbolic computation
        • Sympy
        • T-test
        • univariate vs multivariate
        • Variance
        • Violin plot
        • Z-Normalisation
        • Z-Score
        • Z-Scores vs Prediction Intervals
        • Z-Test
      • uncategorised
        • Investigate pyodbc
        • NLP Portal
        • Science Portal
      • pages
        • Data Archive
        • DE_Tools
        • ML_Tools
        • Quotes
        • Research Questions
        • Reviews

    Folder: categories/deep-learning

    8 items under this folder.

    • 29 Sept 2025

      Convolutional Neural Networks

      • deep_learning
    • 29 Sept 2025

      Deep Learning

      • deep_learning
    • 29 Sept 2025

      How is reinforcement learning being combined with deep learning

      • deep_learning
    • 29 Sept 2025

      LSTM

      • deep_learning
      • time_series
    • 29 Sept 2025

      Multi-Agent Reinforcement Learning

      • agents
      • deep_learning
    • 29 Sept 2025

      Policy

    • 29 Sept 2025

      Relu

      • deep_learning
    • 29 Sept 2025

      Sarsa

      • deep_learning

            • Cypher
            • Data Architect
            • Data Architecture
            • Data Cleansing
            • Data Contract
            • Data Deployment
            • Data Dictionary
            • Data Drift
            • Data Engineering
            • Data Engineering Portal
            • Data Engineering Tools
            • Data Evaluation
            • Data Hierarchy of Needs
            • Data Integration
            • Data Integrity
            • Data Lake
            • Data Lakehouse
            • Data Leakage
            • Data Lifecycle Management
            • data lineage
            • Data Management
            • Data Modeling
            • Data Observability
            • Data Principles
            • Data Quality
            • Data Security
            • Data Selection
            • Data Sources
            • Data Storage
            • Data Transformation
            • Data Transformation in Data Engineering
            • Data Transformation with Pandas
            • Data Validation
            • Data Virtualization
            • Data Warehouse
            • Database
            • Database Index
            • Database Management System (DBMS)
            • Database Schema
            • Database Storage
            • Database Techniques
            • Databricks 1
            • DataOps
            • dbt 1
            • design pattern
            • Digital twin
            • Distributed Computing
            • DuckDB in python
            • DuckDB vs SQLite
            • Durability
            • ELT
            • Estimator
            • ETL
            • ETL 1
            • ETL Pipeline Example
            • ETL vs ELT
            • EtLT
            • Event Driven Microservices
            • Event-Driven Architecture
            • Fabric
            • Faker
            • File Management
            • Folder Tree Diagram
            • Foreign Key
            • Github Actions
            • Google Sheet Pivots Table
            • Grain
            • Graph Query Language
            • Groupby
            • Groupby vs Crosstab
            • heterogeneous features
            • Honkit
            • Hosting
            • How is schema evolution done in practice with SQL
            • How to normalise a merged table
            • Implementing Database Schema
            • Imputation Techniques
            • in-memory format
            • incremental synchronization
            • Indexing in cypher
            • Input is Not Properly Sanitized
            • Joining Datasets
            • Junction Tables
            • KNIME
            • Logical Model
            • Many-to-Many Relationships
            • map reduce
            • MariaDB
            • master data management
            • Merge
            • Microsoft Access
            • Missing Data
            • Model Deployment
            • Monolith Architecture
            • Multi-level index
            • Multiprocessing
            • MySql
            • neo4j
            • Normalised Schema
            • NoSQL
            • Object Relational Mapper
            • OLTP
            • Overfitting
            • Pandas
            • Pandas join vs merge
            • Pandas Pivot Table
            • Pandas Stack
            • pd.Grouper
            • pgAdmin
            • Pgadmin Permissions on Windows
            • Physical Model
            • Pickle
            • Poetry
            • Polars
            • PostgreSQL
            • Postman
            • PowerShell
            • Prevention Is Better Than The Cure
            • Primary Key
            • Push-Down
            • Pydantic
            • Pyright vs Pydantic
            • Query Optimisation
            • Querying
            • Querying Time Series
            • Race Conditions
            • Relating Tables Together
            • Relational Database
            • reverse etl
            • rollup
            • Row parameters in SQL
            • Row-based Storage
            • Scalability
            • Scaling Server
            • Schema Evolution
            • Search
            • Security mitigation
            • Security Researcher
            • semantic layer
            • Single Source of Truth
            • Sklearn Pipiline
            • Slowly Changing Dimension
            • SMSS
            • Snowflake Schema
            • Soft Deletion
            • Software Design Patterns
            • Spreadsheets vs Databases
            • SQL
            • SQL Groupby
            • SQL Injection
            • SQL Joins
            • SQLAlchemy
            • SQLAlchemy vs. sqlite3
            • SQLite
            • SQLite Studio
            • Star Schema
            • storage layer object store
            • Stored Procedures
            • structured data
            • Structuring and organizing data
            • Transaction
            • Turning a flat file into a database
            • Types of Database Schema
            • Unix
            • unstructured data
            • Usability
            • Vacuum
            • Vector Database
            • Vectorized Engine
            • View Use Case
            • Views
            • Windows Subsystem for Linux
          • data-science
            • ACF Plots
            • Additive vs Multiplicative Models Time Series
            • ADF Test
            • Agent Exploration
            • Agentic Solutions
            • AI
            • ARIMA
            • ARIMA vs Random Forest in Time Series
            • Autocorrelation
            • Autocorrelation vs Autoregression
            • Autoregression
            • Baseline Forecast
            • Basics of Time Series
            • Batch gradient descent
            • Bellman Equations
            • Bias-Variance Trade Off
            • Capability
            • Choosing a Threshold
            • Choosing the Number of Clusters
            • Clustermap
            • Covariance Structures
            • Cross Validation
            • Data Assessment
            • Data Collection
            • Data Mining - CRISP
            • Data Preparation
            • Data Science
            • Data Scientist
            • Data Understanding
            • Datasets
            • Decomposition in Time Series
            • Differencing in Time Series
            • DS & ML Portal
            • Evaluating Time Series Forecasts
            • Evolving Seasonality
            • F-statistic
            • Feature Engineering
            • Feature Scaling
            • Feature Selection vs Feature Importance
            • Forecasting using Lags
            • Forward Propagation
            • Gaussian Mixture Models
            • Gitlab
            • Gompertz Model
            • Good Enough Principle in Data Projects
            • GraphRAG
            • Handling Missing Data
            • Holt-Winters (Exponential Smoothing)
            • Holt-Winters vs ARIMA
            • Holt’s Linear Trend Model (Double Exponential Smoothing)
            • how do you do the data selection
            • Imbalanced Datasets
            • Interpolation
            • Intervention Analysis
            • Joining Time Series
            • Kernel Machines
            • KPSS Test
            • Latency
            • Logistic Model Curve
            • LSTM in Time Series
            • Mean Absolute Percentage Error
            • MNIST
            • Normalisation
            • Out-of-sample rolling forecast evaluation
            • PACF Plots
            • Performance Dimensions
            • pmdarima
            • Properties of Time Series Models
            • Random Forest Regression
            • Residuals in Time Series
            • Scatter Plots
            • Scientific Method
            • Scipy
            • Seasonal Naive Forecast
            • Seasonality in Time Series
            • SHapley Additive exPlanations
            • Shot Learning
            • Silhouette Analysis
            • Simple Exponential Smoothing (SES)
            • sklearn datasets
            • SMOTE (Synthetic Minority Over-sampling Technique)
            • SparseCategorialCrossentropy or CategoricalCrossEntropy
            • stack memory
            • Stacking
            • Stationary Time Series
            • STL Decomposition
            • Time Series
            • Time Series Forecasting
            • Time Series Forecasts in Business
            • Time Series Learning Resources
            • Time Series Shocks
            • Trends in Time Series
          • deep-learning
            • Convolutional Neural Networks
            • Deep Learning
            • How is reinforcement learning being combined with deep learning
            • LSTM
            • Multi-Agent Reinforcement Learning
            • Policy
            • Relu
            • Sarsa
          • devops
            • AB testing
            • Alternatives to Batch Processing
            • Amazon S3
            • Apache Airflow
            • Apache Kafka
            • Apache Spark
            • API
            • API Driven Microservices
            • Bash
            • bat
            • Batch Processing
            • Batch vs PowerShell scripts
            • CI-CD
            • Clustering_Dashboard.py
            • Code Diagrams
            • Command Line
            • Continuous Delivery - Deployment
            • Continuous Integration
            • Cron jobs
            • dagster
            • Data Ingestion
            • Data Orchestration
            • Data Pipeline
            • Data Pipeline to Data Products
            • Data Streaming
            • Databricks
            • Databricks vs Snowflake
            • dbt
            • Debugging
            • Declarative Data Pipeline
            • dependency manager
            • DevOps
            • Devops Portal
            • Digital Transformation
            • Docker
            • Docker Image
            • Elastic Net
            • Environment Variables
            • Epub
            • Event Driven
            • Event Driven Events
            • Everything
            • Excel
            • Excel pivot table
            • Excel vs Google Sheets
            • FastAPI
            • Firebase
            • frontend
            • functional programming
            • GIS
            • Git
            • Github Gists
            • gitlab-ci.yml
            • Global Interpreter Lock
            • Google Cloud Platform
            • Google Colab
            • Google My Maps Data Extraction
            • Google Sheets
            • GPT
            • Gradio
            • Grep
            • Hadoop
            • Hugging Face
            • imperative
            • ipynb
            • jinja template
            • Json
            • Json to SQLite
            • jupytext
            • Justfile
            • kubernetes
            • Load Balancing
            • Maintainability
            • Maintainable Code
            • Makefile
            • Master Observability Datadog
            • Memory
            • Memory Caching
            • Microsoft
            • MongoDB
            • nbconvert
            • NET
            • Normalisation of Text
            • Pandas Series vs DataFrame
            • Pandoc
            • PMML
            • Powerquery
            • Powershell scripts
            • Powershell versus Command Prompt
            • Powershell vs Bash
            • Publish and Subscribe
            • PySpark
            • Pytest
            • Python
            • Python Click
            • Quartz
            • Random Access Memory
            • React
            • Registering a Scheduled Task
            • REST API
            • Scala
            • Security Vulnerabilities
            • shapefile
            • Sharepoint
            • Snowflake
            • Snowflake vs Hadoop
            • Software Development Life Cycle
            • SQL vs NoSQL
            • Streamlit
            • Technical Design Doc Template
            • Terminal commands
            • Testing
            • TOML
            • tool.bandit
            • tool.ruff
            • tool.uv
            • Types of Computational Bugs
            • TypeScript
            • Ubuntu
            • unittest
            • Vercel
            • Virtual environments
            • Web Feature Server (WFS)
            • Web Map Tile Service (WMTS)
            • Why JSON is Better than Pickle for Untrusted Data
            • Windows
            • Windows Scheduled Tasks
            • yaml
          • industry
            • AI Engineer
            • AI governance
            • Analytics Engineer
            • business intelligence
            • Business observability
            • Business Understanding
            • Business Values
            • Data AI Education at Work
            • Data Engineer
            • Data Governance
            • data literacy
            • Data Roles
            • Data Steward
            • Design Thinking Questions
            • Documentation & Meetings
            • Energy
            • Energy ABM
            • Energy Demand Forecasting
            • Energy Storage
            • Facts
            • Gartner Hype Cycle
            • Industries of interest
            • Knowledge Work
            • Managing People
            • ML Engineer
            • Network Design
            • Operational Resilience for Growth and Adaptability
            • Reporting
            • Scaling Data Science Capability
            • Smart Grids
            • Telecommunications
            • Thinking Systems
            • Use of RNNs in energy sector
            • Working with SMEs
          • machine-learning
            • Accuracy
            • Activation atlases
            • Activation Function
            • Active Learning
            • Adam Optimizer
            • Adaptive Learning Rates
            • Adjusted R squared
            • Agent-Based Modelling
            • AIC in Model Evaluation
            • Anomaly Detection
            • Anomaly Detection in Time Series
            • Anomaly Detection with Clustering
            • Anomaly Detection with Statistical Methods
            • Assessing Gen AI generated content
            • AUC
            • Automated Feature Creation
            • AutoML
            • Backpropagation
            • Batch Normalisation
            • Bias in ML
            • Binary Classification
            • Boosting
            • Business value of anomaly detection
            • CART
            • CatBoost
            • Challenges to Model Deployment
            • Class Separability
            • Classification
            • Classification Report
            • Cluster Density
            • Cluster Seperation
            • Clustering
            • Collaborative Filtering
            • conceptual data model
            • Confusion Matrix
            • Cost Function
            • Cost-Sensitive Analysis
            • Cross Entropy
            • Customer Growth Modeling
            • Data Selection in ML
            • Data Transformation in Machine Learning
            • DBSCAN
            • Decision Theory
            • Decision Tree
            • Decision Trees are Fragile
            • Deep Learning Frameworks
            • Deep Q-Learning
            • Dendrograms
            • Determining Threshold Values
            • Dimension Table
            • Dimensional Modelling
            • Dimensionality Reduction
            • Dimensions
            • Distributions in Decision Tree Leaves
            • Dropout
            • Dummy variable trap
            • Edge ML
            • emergent behavior
            • Encoding Categorical Variables
            • Epoch
            • Evaluating Language Models
            • Evaluating Logistic Regression
            • Evaluating the effectiveness of prompts
            • Evaluation Metrics
            • Exploration vs Exploitation
            • Exponential Smoothing
            • f-regression
            • F1 Score
            • Fact Table
            • FAISS
            • Feature Engineering for Time Series
            • Feature Evaluation
            • Feature Extraction
            • Feature Importance
            • Feature Selection
            • Feature Transformations
            • Feed Forward Neural Network
            • Filter Methods
            • Fitting weights and biases of a neural network
            • Framework for models
            • Gaussian Model
            • General Linear Regression
            • Generalisation
            • Generative Adversarial Networks
            • Gini Impurity
            • Gini Impurity vs Cross Entropy
            • Gradient Boosted Trees
            • Gradient Boosting
            • Gradient Boosting Regressor
            • Gradient Descent
            • Gradient descent in linear regression
            • granularity
            • Graph Neural Network
            • Graph Theory Community
            • GridSeachCv
            • Growth Models in Time Series
            • GRU
            • Hierarchical Clustering
            • High cross validation accuracy is not directly proportional to performance on unseen test data
            • Histogram
            • How do we evaluate of LLM Outputs
            • How to use Sklearn Pipeline
            • Hyperparameter
            • Hyperparameter Tuning
            • Impact of multicollinearity on model parameters
            • Inertia K Means Cost Function
            • inference
            • inference versus prediction
            • initialization methods
            • Interoperability
            • interoperable
            • Interpretability
            • Interpreting logistic regression model parameters
            • Isolated Forest
            • Jaccard Coefficient
            • K-means
            • K-nearest neighbours
            • Keras
            • Kernel Density Estimation
            • Kernelling
            • Kmeans vs GMM
            • L1 Regularisation
            • Label encoding vs One-hot encoding
            • Labelling data
            • Lagrange multipliers in optimisation
            • lambda architecture
            • Latent Dirichlet Allocation
            • Latent Semantic Indexing
            • LBFGS
            • Learning Curve
            • Learning Rate
            • Learning Styles
            • LightGBM
            • LightGBM vs XGBoost vs CatBoost
            • Linear Regression
            • LLM Evaluation Metrics
            • Local Interpretable Model-agnostic Explainations
            • Local Outlier Factor (LOF)
            • Logistic Regression
            • Logistic Regression does not predict probabilities
            • Logistic regression in sklearn & Gradient Descent
            • Logistic Regression Statsmodel Summary table
            • Loss function
            • Loss versus Cost function
            • Machine Learning
            • Machine Learning Operations
            • Manifold Learning
            • Markov Decision Processes
            • Maximum Likelihood Estimation
            • Median Absolute Error
            • Mermaid
            • Metadata Handling
            • Methods for Handling Outliers
            • Metric
            • Mini-batch gradient descent
            • MLOPS for Time Series
            • Model Building
            • Model Deployment using PyCaret
            • Model Ensemble
            • Model Evaluation
            • Model Evaluation vs Model Optimisation
            • Model Interpretability
            • Model Observability
            • Model Optimisation
            • Model Parameters
            • Model Parameters Tuning
            • Model parameters vs hyperparameters
            • Model Selection
            • Model Validation
            • model-agnostic feature importance
            • Momentum
            • Moving Average Forecast
            • Multinomial Naive bayes
            • Multiple Linear Regression
            • Naive Bayes Classifier
            • Naive Forecast
            • Neural network
            • Neural Network Classification
            • Neural network in Practice
            • Neural Scaling Laws
            • Non-negative matrix factorization in ML
            • Non-parametric tests
            • Normalisation of data
            • Normalisation vs Standardisation
            • objective function
            • One-hot encoding
            • Optimisation function
            • Optimisation techniques
            • Optimising a Logistic Regression Model
            • Optimising Neural Networks
            • Optuna
            • Ordinary Least Squares
            • Orthogonalization
            • Outliers
            • Over parameterised models
            • PCA Explained Variance Ratio
            • PCA Principal Components
            • PCA-Based Anomaly Detection
            • PDP and ICE
            • Percentile Detection
            • Performance Drift
            • Polynomial Regression
            • Positional Encoding
            • Precision
            • Precision or Recall
            • Precision-Recall Curve
            • Prediction Intervals vs Confidence Interval
            • Principal Component Analysis
            • PyCaret
            • PyOD
            • PyTorch
            • Pytorch vs Tensorflow
            • Q-Learning
            • Random Forest
            • Random Forest for Time Series
            • Recall
            • Recommender systems
            • Recurrent Neural Networks
            • Regression
            • Regression Metrics
            • Regularisation
            • Regularisation of Tree based models
            • Reinforcement learning
            • Relationships in memory
            • Reward Function
            • Ridge
            • ROC (Receiver Operating Characteristic)
            • Sammon’s Mapping
            • SARIMA
            • Scikit-Learn
            • Secretary Problem
            • semi-structured data
            • Sentence Transformers
            • Sklearn Pipeline
            • Specificity
            • Spectral Clustering
            • Supervised Learning
            • Support Vector Classifier
            • Support Vector Machines
            • Support Vector Regression
            • Tensorflow
            • Test Loss When Evaluating Models
            • Text Classification
            • Time Series Python Packages
            • Train-Dev-Test Sets
            • Transfer Learning
            • Transformed Target Regressor
            • Transformer
            • Transformers vs RNNs
            • Type I Error (False Positive)
            • Type II Error (False Negative)
            • Types of Neural Networks
            • Typical Output Formats in Neural Networks
            • UMAP
            • Unsupervised Learning
            • Use Cases for a Simple Neural Network Like
            • vanishing and exploding gradients problem
            • Variability in linear models
            • Variance in ML
            • Vector Embedding
            • WCSS and elbow method
            • Weak Learners
            • When and why not to us regularisation
            • Why does increasing the number of models in a ensemble not necessarily improve the accuracy
            • Why does the Adam Optimizer converge
            • Why Removing Outliers May Improve Regression but Harm Classification
            • Why standardise features
            • Why Type 1 and Type 2 matter
            • Wrapper Methods
            • Xaiver
            • XGBoost
          • natural-language
            • AI Agents Memory
            • Attention mechanism
            • Bag of words
            • BERT
            • BERTScore
            • Chain of thought
            • ChatGPT
            • Claude
            • Comparing LLMs
            • Distillation
            • ElasticSearch
            • Embedded Methods
            • embeddings for OOV words
            • Evaluate Embedding Methods
            • Fuzzywuzzy
            • Generative AI
            • Generative AI From Theory to Practice
            • Grammar method
            • Guardrails
            • How businesses use Gen AI
            • How LLMs store facts
            • How to reduce the need for Gen AI responses
            • How would you decide between using TF-IDF and Word2Vec for text vectorization
            • In NER how would you handle ambiguous entities
            • Key Components of Attention and Formula
            • Knowledge graph vs RAG setup
            • Language Model Output Optimisation
            • Language Models
            • Language Models Large (LLMs) vs Small (SLMs)
            • lemmatization
            • LLM
            • LLM Memory
            • Local LLM use cases
            • Mathematical Reasoning in Transformers
            • Mixture of Experts
            • Model Cascading
            • Multi-head attention
            • Named Entity Recognition
            • NER Implementation
            • Ngrams
            • NLP
            • nltk
            • Non-negative Matrix Factorization
            • NotebookLM
            • OOV words
            • Pandas Dataframe Agent
            • Part of speech tagging
            • Prompt Engineering
            • prompt retrievers
            • Prompts
            • Pyright
            • RAG
            • Scaling Agentic Systems
            • Self attention vs multi-head attention
            • Self-Attention
            • Semantic Relationships
            • Semantic search
            • Sentence Similarity
            • Sentence Transformer Workflow
            • Similarity Search
            • Small Language Models
            • spaCy
            • Stemming
            • stopwords
            • Summarisation
            • syntactic relationships
            • Text2Cypher
            • TF-IDF
            • TF-IDF Implementation
            • Tokenisation
            • topic modeling
            • Vectorisation
            • Why is named entity recognition (NER) a challenging task
            • Word2vec
            • WordNet
          • OTHER
            • Addressing_Multicollinearity.py
            • Bag_of_Words.py
            • Bandit example output
            • Bandit_Example_Fixed.py
            • Click_Implementation.py
            • Comparing_Ensembles.py
            • Cross_Entropy_Single.py
            • Cross_Entropy.py
            • Debugging.py
            • Distribution_Analysis.py
            • Factor_Analysis.py
            • FastAPI_Example.py
            • Feature_Distribution.py
            • Forecasting_AutoArima.py
            • Forecasting_Baseline.py
            • Forecasting_Exponential_Smoothing.py
            • Gaussian_Mixture_Model_Implementation.py
            • Handling_Missing_Data_Basic.ipynb
            • Handling_Missing_Data.ipynb
            • Heatmaps_Dendrograms.py
            • Imbalanced_Datasets_SMOTE.py
            • K_Means.py
            • Momentum.py
            • One_hot_encoding.py
            • Pandas_Common.py
            • Pandas_Stack.py
            • PCA_Analysis.ipynb
            • PCA_Based_Anomaly_Detection.py
            • Pycaret_Anomaly.ipynb
            • Pycaret_Example.py
            • Pydantic_More.py
            • Pydantic.py
            • Regression_Logistic_Metrics.ipynb
            • Regularisation.py
            • ROC_Curve.py
            • SVM_Example.py
            • Testing_Pytest.py
            • Testing_unittest.py
            • transfer_learning.py
            • TS_Anomaly_Detection.py
            • Vector_Embedding.py
            • Wikipedia_API.py
            • Word2Vec.py
          • PAPER
            • Attention Is All You Need
            • BERT Pretraining of Deep Bidirectional Transformers for Language Understanding
          • project-management
            • 1-on-1 Template
            • 1-to-1's with a Line Manager
            • Asking questions
            • Change Management
            • Communication principles
            • Communication Techniques
            • Communication with Stakeholders
            • Conceptual Model
            • Documentation
            • Education and Training
            • Experiment Plan Template
            • Feedback Template
            • Fishbone diagram
            • How to do git commit messages properly
            • html
            • Jobs to be done
            • Jupyter Book
            • Managing Data Science Teams
            • Modern data team
            • nbconvert slideshows
            • One Pager Template
            • pdoc
            • Problem Definition
            • Process for prototyping
            • project management
            • Project Management Portal
            • Pull Request Template
            • RACI
            • Remaining useful life models
            • Return of Experience Form
            • Reveal.js
            • Technical Debt
            • UML
            • Why use ER diagrams
          • statistics
            • Addressing Multicollinearity
            • ANOVA
            • Assumption of Normality
            • Bernoulli
            • Bootstrap Sampling
            • Casual Inference
            • Central Limit Theorem
            • Central Limit Theorem & Small Sample Sizes
            • Chi-Squared Test
            • Confidence Interval
            • Correlation
            • Correlation vs Causation
            • Cosine Similarity
            • Covariance
            • Covariance vs Correlation
            • Cryptography
            • Differentation
            • Distributions
            • EM Algorithm
            • Factor Analysis
            • Gaussian Distribution
            • Graph Theory
            • Grouped plots
            • Handling Different Distributions
            • Hypothesis testing
            • information theory
            • Interquartile Range (IQR) Detection
            • Johnson–Lindenstrauss lemma
            • Markov chain
            • Mathematics
            • Mean Absolute Error
            • Mean Squared Error
            • mean vs median
            • Multicollinearity
            • non-parametric
            • Odds
            • Odds vs Probability
            • p values
            • Parametric tests
            • parametric vs non-parametric models
            • parametric vs non-parametric tests
            • parsimonious
            • Prediction Intervals
            • Probability
            • Proportion Test
            • Q-Q Plot
            • R
            • R squared
            • R-squared metric not always a good indicator of model performance in regression
            • Reasoning tokens
            • Root Mean Squared Error
            • Sampling
            • Spearman vs Pearson Correlation
            • Standard deviation
            • Standardisation
            • Statistical Assumptions
            • Statistical Tests
            • Statistical theorems
            • Statistics
            • statsmodels
            • Stochastic Gradient Descent
            • Symbolic computation
            • Sympy
            • T-test
            • univariate vs multivariate
            • Variance
            • Violin plot
            • Z-Normalisation
            • Z-Score
            • Z-Scores vs Prediction Intervals
            • Z-Test
          • uncategorised
            • Investigate pyodbc
            • NLP Portal
            • Science Portal
          • pages
            • Data Archive
            • DE_Tools
            • ML_Tools
            • Quotes
            • Research Questions
            • Reviews
          • categories
            • computer-science
              • Algorithms
              • Big O Notation
              • BM25 (Best Match 25)
              • Checksum
              • Computer Science
              • Concurrency
              • Convex Optimisation
              • csv module
              • Directed Acyclic Graph (DAG)
              • Flask
              • garbage collector
              • Generators in Python
              • Hash
              • Heap Data Structure
              • Heap Memory
              • How to search within a graph
              • Immutable vs mutable
              • Java
              • Java vs JavaScript
              • JavaScript
              • Knowledge Graph
              • Langchain
              • Machine Learning Algorithms
              • Monte Carlo Simulation
              • Multiprocessing vs Multithreading
              • Multithreading
              • neomodel
              • Node.JS
              • Numpy
              • Processes vs Threads
              • programming languages
              • PyGraphviz
              • QuickSort
              • Ranking models
              • Recursive Algorithm
              • Strongly vs Weakly typed language
              • Times Series Python Packages
            • data-analysis
              • Altair
              • altair versus seaborn
              • Binder
              • Boxplot
              • Dash
              • Dashboarding
              • Dashboards
              • Data Analysis
              • Data Analysis Portal
              • Data Analyst
              • Data Distribution
              • Data Mining
              • Data Product
              • Data Reduction
              • Data Visualisation
              • DuckDB
              • EDA
              • ER Diagrams
              • Heatmap
              • Label encoding
              • Linear Discriminant Analysis
              • Log transformation
              • Looker Studio
              • MariaDB vs MySQL
              • Melt
              • Multiple Correspondence Analysis
              • Multivariate Analysis
              • OLAP
              • Page Rank
              • Parquet
              • Plotly
              • PowerBI
              • Preprocessing
              • Preprocessing Text Classification
              • Seaborn
              • SQL Window functions
              • t-SNE
              • Tableau
            • data-engineering
              • ACID Transaction
              • Ada boosting
              • Adding a database to PostgreSQL
              • Aggregation
              • Apache Iceberg
              • Attack mitigation
              • Attack types
              • AWS Lambda
              • Azure
              • Bagging
              • Benefits of Data Transformation
              • Big Data
              • BigQuery
              • Cassandra
              • Cloud Providers
              • Coaching & Mentoring
              • Columnar Storage
              • Command Prompt
              • Common Table Expression
              • Components of the database
              • Covering Index
              • Crosstab
              • CRUD
              • CUDA
              • Curse of dimensionality
              • Cypher
              • Data Architect
              • Data Architecture
              • Data Cleansing
              • Data Contract
              • Data Deployment
              • Data Dictionary
              • Data Drift
              • Data Engineering
              • Data Engineering Portal
              • Data Engineering Tools
              • Data Evaluation
              • Data Hierarchy of Needs
              • Data Integration
              • Data Integrity
              • Data Lake
              • Data Lakehouse
              • Data Leakage
              • Data Lifecycle Management
              • data lineage
              • Data Management
              • Data Modeling
              • Data Observability
              • Data Principles
              • Data Quality
              • Data Security
              • Data Selection
              • Data Sources
              • Data Storage
              • Data Transformation
              • Data Transformation in Data Engineering
              • Data Transformation with Pandas
              • Data Validation
              • Data Virtualization
              • Data Warehouse
              • Database
              • Database Index
              • Database Management System (DBMS)
              • Database Schema
              • Database Storage
              • Database Techniques
              • Databricks 1
              • DataOps
              • dbt 1
              • design pattern
              • Digital twin
              • Distributed Computing
              • DuckDB in python
              • DuckDB vs SQLite
              • Durability
              • ELT
              • Estimator
              • ETL
              • ETL 1
              • ETL Pipeline Example
              • ETL vs ELT
              • EtLT
              • Event Driven Microservices
              • Event-Driven Architecture
              • Fabric
              • Faker
              • File Management
              • Folder Tree Diagram
              • Foreign Key
              • Github Actions
              • Google Sheet Pivots Table
              • Grain
              • Graph Query Language
              • Groupby
              • Groupby vs Crosstab
              • heterogeneous features
              • Honkit
              • Hosting
              • How is schema evolution done in practice with SQL
              • How to normalise a merged table
              • Implementing Database Schema
              • Imputation Techniques
              • in-memory format
              • incremental synchronization
              • Indexing in cypher
              • Input is Not Properly Sanitized
              • Joining Datasets
              • Junction Tables
              • KNIME
              • Logical Model
              • Many-to-Many Relationships
              • map reduce
              • MariaDB
              • master data management
              • Merge
              • Microsoft Access
              • Missing Data
              • Model Deployment
              • Monolith Architecture
              • Multi-level index
              • Multiprocessing
              • MySql
              • neo4j
              • Normalised Schema
              • NoSQL
              • Object Relational Mapper
              • OLTP
              • Overfitting
              • Pandas
              • Pandas join vs merge
              • Pandas Pivot Table
              • Pandas Stack
              • pd.Grouper
              • pgAdmin
              • Pgadmin Permissions on Windows
              • Physical Model
              • Pickle
              • Poetry
              • Polars
              • PostgreSQL
              • Postman
              • PowerShell
              • Prevention Is Better Than The Cure
              • Primary Key
              • Push-Down
              • Pydantic
              • Pyright vs Pydantic
              • Query Optimisation
              • Querying
              • Querying Time Series
              • Race Conditions
              • Relating Tables Together
              • Relational Database
              • reverse etl
              • rollup
              • Row parameters in SQL
              • Row-based Storage
              • Scalability
              • Scaling Server
              • Schema Evolution
              • Search
              • Security mitigation
              • Security Researcher
              • semantic layer
              • Single Source of Truth
              • Sklearn Pipiline
              • Slowly Changing Dimension
              • SMSS
              • Snowflake Schema
              • Soft Deletion
              • Software Design Patterns
              • Spreadsheets vs Databases
              • SQL
              • SQL Groupby
              • SQL Injection
              • SQL Joins
              • SQLAlchemy
              • SQLAlchemy vs. sqlite3
              • SQLite
              • SQLite Studio
              • Star Schema
              • storage layer object store
              • Stored Procedures
              • structured data
              • Structuring and organizing data
              • Transaction
              • Turning a flat file into a database
              • Types of Database Schema
              • Unix
              • unstructured data
              • Usability
              • Vacuum
              • Vector Database
              • Vectorized Engine
              • View Use Case
              • Views
              • Windows Subsystem for Linux
            • data-science
              • ACF Plots
              • Additive vs Multiplicative Models Time Series
              • ADF Test
              • Agent Exploration
              • Agentic Solutions
              • AI
              • ARIMA
              • ARIMA vs Random Forest in Time Series
              • Autocorrelation
              • Autocorrelation vs Autoregression
              • Autoregression
              • Baseline Forecast
              • Basics of Time Series
              • Batch gradient descent
              • Bellman Equations
              • Bias-Variance Trade Off
              • Capability
              • Choosing a Threshold
              • Choosing the Number of Clusters
              • Clustermap
              • Covariance Structures
              • Cross Validation
              • Data Assessment
              • Data Collection
              • Data Mining - CRISP
              • Data Preparation
              • Data Science
              • Data Scientist
              • Data Understanding
              • Datasets
              • Decomposition in Time Series
              • Differencing in Time Series
              • DS & ML Portal
              • Evaluating Time Series Forecasts
              • Evolving Seasonality
              • F-statistic
              • Feature Engineering
              • Feature Scaling
              • Feature Selection vs Feature Importance
              • Forecasting using Lags
              • Forward Propagation
              • Gaussian Mixture Models
              • Gitlab
              • Gompertz Model
              • Good Enough Principle in Data Projects
              • GraphRAG
              • Handling Missing Data
              • Holt-Winters (Exponential Smoothing)
              • Holt-Winters vs ARIMA
              • Holt’s Linear Trend Model (Double Exponential Smoothing)
              • how do you do the data selection
              • Imbalanced Datasets
              • Interpolation
              • Intervention Analysis
              • Joining Time Series
              • Kernel Machines
              • KPSS Test
              • Latency
              • Logistic Model Curve
              • LSTM in Time Series
              • Mean Absolute Percentage Error
              • MNIST
              • Normalisation
              • Out-of-sample rolling forecast evaluation
              • PACF Plots
              • Performance Dimensions
              • pmdarima
              • Properties of Time Series Models
              • Random Forest Regression
              • Residuals in Time Series
              • Scatter Plots
              • Scientific Method
              • Scipy
              • Seasonal Naive Forecast
              • Seasonality in Time Series
              • SHapley Additive exPlanations
              • Shot Learning
              • Silhouette Analysis
              • Simple Exponential Smoothing (SES)
              • sklearn datasets
              • SMOTE (Synthetic Minority Over-sampling Technique)
              • SparseCategorialCrossentropy or CategoricalCrossEntropy
              • stack memory
              • Stacking
              • Stationary Time Series
              • STL Decomposition
              • Time Series
              • Time Series Forecasting
              • Time Series Forecasts in Business
              • Time Series Learning Resources
              • Time Series Shocks
              • Trends in Time Series
            • deep-learning
              • Convolutional Neural Networks
              • Deep Learning
              • How is reinforcement learning being combined with deep learning
              • LSTM
              • Multi-Agent Reinforcement Learning
              • Policy
              • Relu
              • Sarsa
            • devops
              • AB testing
              • Alternatives to Batch Processing
              • Amazon S3
              • Apache Airflow
              • Apache Kafka
              • Apache Spark
              • API
              • API Driven Microservices
              • Bash
              • bat
              • Batch Processing
              • Batch vs PowerShell scripts
              • CI-CD
              • Clustering_Dashboard.py
              • Code Diagrams
              • Command Line
              • Continuous Delivery - Deployment
              • Continuous Integration
              • Cron jobs
              • dagster
              • Data Ingestion
              • Data Orchestration
              • Data Pipeline
              • Data Pipeline to Data Products
              • Data Streaming
              • Databricks
              • Databricks vs Snowflake
              • dbt
              • Debugging
              • Declarative Data Pipeline
              • dependency manager
              • DevOps
              • Devops Portal
              • Digital Transformation
              • Docker
              • Docker Image
              • Elastic Net
              • Environment Variables
              • Epub
              • Event Driven
              • Event Driven Events
              • Everything
              • Excel
              • Excel pivot table
              • Excel vs Google Sheets
              • FastAPI
              • Firebase
              • frontend
              • functional programming
              • GIS
              • Git
              • Github Gists
              • gitlab-ci.yml
              • Global Interpreter Lock
              • Google Cloud Platform
              • Google Colab
              • Google My Maps Data Extraction
              • Google Sheets
              • GPT
              • Gradio
              • Grep
              • Hadoop
              • Hugging Face
              • imperative
              • ipynb
              • jinja template
              • Json
              • Json to SQLite
              • jupytext
              • Justfile
              • kubernetes
              • Load Balancing
              • Maintainability
              • Maintainable Code
              • Makefile
              • Master Observability Datadog
              • Memory
              • Memory Caching
              • Microsoft
              • MongoDB
              • nbconvert
              • NET
              • Normalisation of Text
              • Pandas Series vs DataFrame
              • Pandoc
              • PMML
              • Powerquery
              • Powershell scripts
              • Powershell versus Command Prompt
              • Powershell vs Bash
              • Publish and Subscribe
              • PySpark
              • Pytest
              • Python
              • Python Click
              • Quartz
              • Random Access Memory
              • React
              • Registering a Scheduled Task
              • REST API
              • Scala
              • Security Vulnerabilities
              • shapefile
              • Sharepoint
              • Snowflake
              • Snowflake vs Hadoop
              • Software Development Life Cycle
              • SQL vs NoSQL
              • Streamlit
              • Technical Design Doc Template
              • Terminal commands
              • Testing
              • TOML
              • tool.bandit
              • tool.ruff
              • tool.uv
              • Types of Computational Bugs
              • TypeScript
              • Ubuntu
              • unittest
              • Vercel
              • Virtual environments
              • Web Feature Server (WFS)
              • Web Map Tile Service (WMTS)
              • Why JSON is Better than Pickle for Untrusted Data
              • Windows
              • Windows Scheduled Tasks
              • yaml
            • industry
              • AI Engineer
              • AI governance
              • Analytics Engineer
              • business intelligence
              • Business observability
              • Business Understanding
              • Business Values
              • Data AI Education at Work
              • Data Engineer
              • Data Governance
              • data literacy
              • Data Roles
              • Data Steward
              • Design Thinking Questions
              • Documentation & Meetings
              • Energy
              • Energy ABM
              • Energy Demand Forecasting
              • Energy Storage
              • Facts
              • Gartner Hype Cycle
              • Industries of interest
              • Knowledge Work
              • Managing People
              • ML Engineer
              • Network Design
              • Operational Resilience for Growth and Adaptability
              • Reporting
              • Scaling Data Science Capability
              • Smart Grids
              • Telecommunications
              • Thinking Systems
              • Use of RNNs in energy sector
              • Working with SMEs
            • machine-learning
              • Accuracy
              • Activation atlases
              • Activation Function
              • Active Learning
              • Adam Optimizer
              • Adaptive Learning Rates
              • Adjusted R squared
              • Agent-Based Modelling
              • AIC in Model Evaluation
              • Anomaly Detection
              • Anomaly Detection in Time Series
              • Anomaly Detection with Clustering
              • Anomaly Detection with Statistical Methods
              • Assessing Gen AI generated content
              • AUC
              • Automated Feature Creation
              • AutoML
              • Backpropagation
              • Batch Normalisation
              • Bias in ML
              • Binary Classification
              • Boosting
              • Business value of anomaly detection
              • CART
              • CatBoost
              • Challenges to Model Deployment
              • Class Separability
              • Classification
              • Classification Report
              • Cluster Density
              • Cluster Seperation
              • Clustering
              • Collaborative Filtering
              • conceptual data model
              • Confusion Matrix
              • Cost Function
              • Cost-Sensitive Analysis
              • Cross Entropy
              • Customer Growth Modeling
              • Data Selection in ML
              • Data Transformation in Machine Learning
              • DBSCAN
              • Decision Theory
              • Decision Tree
              • Decision Trees are Fragile
              • Deep Learning Frameworks
              • Deep Q-Learning
              • Dendrograms
              • Determining Threshold Values
              • Dimension Table
              • Dimensional Modelling
              • Dimensionality Reduction
              • Dimensions
              • Distributions in Decision Tree Leaves
              • Dropout
              • Dummy variable trap
              • Edge ML
              • emergent behavior
              • Encoding Categorical Variables
              • Epoch
              • Evaluating Language Models
              • Evaluating Logistic Regression
              • Evaluating the effectiveness of prompts
              • Evaluation Metrics
              • Exploration vs Exploitation
              • Exponential Smoothing
              • f-regression
              • F1 Score
              • Fact Table
              • FAISS
              • Feature Engineering for Time Series
              • Feature Evaluation
              • Feature Extraction
              • Feature Importance
              • Feature Selection
              • Feature Transformations
              • Feed Forward Neural Network
              • Filter Methods
              • Fitting weights and biases of a neural network
              • Framework for models
              • Gaussian Model
              • General Linear Regression
              • Generalisation
              • Generative Adversarial Networks
              • Gini Impurity
              • Gini Impurity vs Cross Entropy
              • Gradient Boosted Trees
              • Gradient Boosting
              • Gradient Boosting Regressor
              • Gradient Descent
              • Gradient descent in linear regression
              • granularity
              • Graph Neural Network
              • Graph Theory Community
              • GridSeachCv
              • Growth Models in Time Series
              • GRU
              • Hierarchical Clustering
              • High cross validation accuracy is not directly proportional to performance on unseen test data
              • Histogram
              • How do we evaluate of LLM Outputs
              • How to use Sklearn Pipeline
              • Hyperparameter
              • Hyperparameter Tuning
              • Impact of multicollinearity on model parameters
              • Inertia K Means Cost Function
              • inference
              • inference versus prediction
              • initialization methods
              • Interoperability
              • interoperable
              • Interpretability
              • Interpreting logistic regression model parameters
              • Isolated Forest
              • Jaccard Coefficient
              • K-means
              • K-nearest neighbours
              • Keras
              • Kernel Density Estimation
              • Kernelling
              • Kmeans vs GMM
              • L1 Regularisation
              • Label encoding vs One-hot encoding
              • Labelling data
              • Lagrange multipliers in optimisation
              • lambda architecture
              • Latent Dirichlet Allocation
              • Latent Semantic Indexing
              • LBFGS
              • Learning Curve
              • Learning Rate
              • Learning Styles
              • LightGBM
              • LightGBM vs XGBoost vs CatBoost
              • Linear Regression
              • LLM Evaluation Metrics
              • Local Interpretable Model-agnostic Explainations
              • Local Outlier Factor (LOF)
              • Logistic Regression
              • Logistic Regression does not predict probabilities
              • Logistic regression in sklearn & Gradient Descent
              • Logistic Regression Statsmodel Summary table
              • Loss function
              • Loss versus Cost function
              • Machine Learning
              • Machine Learning Operations
              • Manifold Learning
              • Markov Decision Processes
              • Maximum Likelihood Estimation
              • Median Absolute Error
              • Mermaid
              • Metadata Handling
              • Methods for Handling Outliers
              • Metric
              • Mini-batch gradient descent
              • MLOPS for Time Series
              • Model Building
              • Model Deployment using PyCaret
              • Model Ensemble
              • Model Evaluation
              • Model Evaluation vs Model Optimisation
              • Model Interpretability
              • Model Observability
              • Model Optimisation
              • Model Parameters
              • Model Parameters Tuning
              • Model parameters vs hyperparameters
              • Model Selection
              • Model Validation
              • model-agnostic feature importance
              • Momentum
              • Moving Average Forecast
              • Multinomial Naive bayes
              • Multiple Linear Regression
              • Naive Bayes Classifier
              • Naive Forecast
              • Neural network
              • Neural Network Classification
              • Neural network in Practice
              • Neural Scaling Laws
              • Non-negative matrix factorization in ML
              • Non-parametric tests
              • Normalisation of data
              • Normalisation vs Standardisation
              • objective function
              • One-hot encoding
              • Optimisation function
              • Optimisation techniques
              • Optimising a Logistic Regression Model
              • Optimising Neural Networks
              • Optuna
              • Ordinary Least Squares
              • Orthogonalization
              • Outliers
              • Over parameterised models
              • PCA Explained Variance Ratio
              • PCA Principal Components
              • PCA-Based Anomaly Detection
              • PDP and ICE
              • Percentile Detection
              • Performance Drift
              • Polynomial Regression
              • Positional Encoding
              • Precision
              • Precision or Recall
              • Precision-Recall Curve
              • Prediction Intervals vs Confidence Interval
              • Principal Component Analysis
              • PyCaret
              • PyOD
              • PyTorch
              • Pytorch vs Tensorflow
              • Q-Learning
              • Random Forest
              • Random Forest for Time Series
              • Recall
              • Recommender systems
              • Recurrent Neural Networks
              • Regression
              • Regression Metrics
              • Regularisation
              • Regularisation of Tree based models
              • Reinforcement learning
              • Relationships in memory
              • Reward Function
              • Ridge
              • ROC (Receiver Operating Characteristic)
              • Sammon’s Mapping
              • SARIMA
              • Scikit-Learn
              • Secretary Problem
              • semi-structured data
              • Sentence Transformers
              • Sklearn Pipeline
              • Specificity
              • Spectral Clustering
              • Supervised Learning
              • Support Vector Classifier
              • Support Vector Machines
              • Support Vector Regression
              • Tensorflow
              • Test Loss When Evaluating Models
              • Text Classification
              • Time Series Python Packages
              • Train-Dev-Test Sets
              • Transfer Learning
              • Transformed Target Regressor
              • Transformer
              • Transformers vs RNNs
              • Type I Error (False Positive)
              • Type II Error (False Negative)
              • Types of Neural Networks
              • Typical Output Formats in Neural Networks
              • UMAP
              • Unsupervised Learning
              • Use Cases for a Simple Neural Network Like
              • vanishing and exploding gradients problem
              • Variability in linear models
              • Variance in ML
              • Vector Embedding
              • WCSS and elbow method
              • Weak Learners
              • When and why not to us regularisation
              • Why does increasing the number of models in a ensemble not necessarily improve the accuracy
              • Why does the Adam Optimizer converge
              • Why Removing Outliers May Improve Regression but Harm Classification
              • Why standardise features
              • Why Type 1 and Type 2 matter
              • Wrapper Methods
              • Xaiver
              • XGBoost
            • natural-language
              • AI Agents Memory
              • Attention mechanism
              • Bag of words
              • BERT
              • BERTScore
              • Chain of thought
              • ChatGPT
              • Claude
              • Comparing LLMs
              • Distillation
              • ElasticSearch
              • Embedded Methods
              • embeddings for OOV words
              • Evaluate Embedding Methods
              • Fuzzywuzzy
              • Generative AI
              • Generative AI From Theory to Practice
              • Grammar method
              • Guardrails
              • How businesses use Gen AI
              • How LLMs store facts
              • How to reduce the need for Gen AI responses
              • How would you decide between using TF-IDF and Word2Vec for text vectorization
              • In NER how would you handle ambiguous entities
              • Key Components of Attention and Formula
              • Knowledge graph vs RAG setup
              • Language Model Output Optimisation
              • Language Models
              • Language Models Large (LLMs) vs Small (SLMs)
              • lemmatization
              • LLM
              • LLM Memory
              • Local LLM use cases
              • Mathematical Reasoning in Transformers
              • Mixture of Experts
              • Model Cascading
              • Multi-head attention
              • Named Entity Recognition
              • NER Implementation
              • Ngrams
              • NLP
              • nltk
              • Non-negative Matrix Factorization
              • NotebookLM
              • OOV words
              • Pandas Dataframe Agent
              • Part of speech tagging
              • Prompt Engineering
              • prompt retrievers
              • Prompts
              • Pyright
              • RAG
              • Scaling Agentic Systems
              • Self attention vs multi-head attention
              • Self-Attention
              • Semantic Relationships
              • Semantic search
              • Sentence Similarity
              • Sentence Transformer Workflow
              • Similarity Search
              • Small Language Models
              • spaCy
              • Stemming
              • stopwords
              • Summarisation
              • syntactic relationships
              • Text2Cypher
              • TF-IDF
              • TF-IDF Implementation
              • Tokenisation
              • topic modeling
              • Vectorisation
              • Why is named entity recognition (NER) a challenging task
              • Word2vec
              • WordNet
            • OTHER
              • Addressing_Multicollinearity.py
              • Bag_of_Words.py
              • Bandit example output
              • Bandit_Example_Fixed.py
              • Click_Implementation.py
              • Comparing_Ensembles.py
              • Cross_Entropy_Single.py
              • Cross_Entropy.py
              • Debugging.py
              • Distribution_Analysis.py
              • Factor_Analysis.py
              • FastAPI_Example.py
              • Feature_Distribution.py
              • Forecasting_AutoArima.py
              • Forecasting_Baseline.py
              • Forecasting_Exponential_Smoothing.py
              • Gaussian_Mixture_Model_Implementation.py
              • Handling_Missing_Data_Basic.ipynb
              • Handling_Missing_Data.ipynb
              • Heatmaps_Dendrograms.py
              • Imbalanced_Datasets_SMOTE.py
              • K_Means.py
              • Momentum.py
              • One_hot_encoding.py
              • Pandas_Common.py
              • Pandas_Stack.py
              • PCA_Analysis.ipynb
              • PCA_Based_Anomaly_Detection.py
              • Pycaret_Anomaly.ipynb
              • Pycaret_Example.py
              • Pydantic_More.py
              • Pydantic.py
              • Regression_Logistic_Metrics.ipynb
              • Regularisation.py
              • ROC_Curve.py
              • SVM_Example.py
              • Testing_Pytest.py
              • Testing_unittest.py
              • transfer_learning.py
              • TS_Anomaly_Detection.py
              • Vector_Embedding.py
              • Wikipedia_API.py
              • Word2Vec.py
            • PAPER
              • Attention Is All You Need
              • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
            • project-management
              • 1-on-1 Template
              • 1-to-1's with a Line Manager
              • Asking questions
              • Change Management
              • Communication principles
              • Communication Techniques
              • Communication with Stakeholders
              • Conceptual Model
              • Documentation
              • Education and Training
              • Experiment Plan Template
              • Feedback Template
              • Fishbone diagram
              • How to do git commit messages properly
              • html
              • Jobs to be done
              • Jupyter Book
              • Managing Data Science Teams
              • Modern data team
              • nbconvert slideshows
              • One Pager Template
              • pdoc
              • Problem Definition
              • Process for prototyping
              • project management
              • Project Management Portal
              • Pull Request Template
              • RACI
              • Remaining useful life models
              • Return of Experience Form
              • Reveal.js
              • Technical Debt
              • UML
              • Why use ER diagrams
            • statistics
              • Addressing Multicollinearity
              • ANOVA
              • Assumption of Normality
              • Bernoulli
              • Bootstrap Sampling
              • Causal Inference
              • Central Limit Theorem
              • Central Limit Theorem & Small Sample Sizes
              • Chi-Squared Test
              • Confidence Interval
              • Correlation
              • Correlation vs Causation
              • Cosine Similarity
              • Covariance
              • Covariance vs Correlation
              • Cryptography
              • Differentiation
              • Distributions
              • EM Algorithm
              • Factor Analysis
              • Gaussian Distribution
              • Graph Theory
              • Grouped plots
              • Handling Different Distributions
              • Hypothesis testing
              • information theory
              • Interquartile Range (IQR) Detection
              • Johnson–Lindenstrauss lemma
              • Markov chain
              • Mathematics
              • Mean Absolute Error
              • Mean Squared Error
              • mean vs median
              • Multicollinearity
              • non-parametric
              • Odds
              • Odds vs Probability
              • p values
              • Parametric tests
              • parametric vs non-parametric models
              • parametric vs non-parametric tests
              • parsimonious
              • Prediction Intervals
              • Probability
              • Proportion Test
              • Q-Q Plot
              • R
              • R squared
              • R-squared metric not always a good indicator of model performance in regression
              • Reasoning tokens
              • Root Mean Squared Error
              • Sampling
              • Spearman vs Pearson Correlation
              • Standard deviation
              • Standardisation
              • Statistical Assumptions
              • Statistical Tests
              • Statistical theorems
              • Statistics
              • statsmodels
              • Stochastic Gradient Descent
              • Symbolic computation
              • Sympy
              • T-test
              • univariate vs multivariate
              • Variance
              • Violin plot
              • Z-Normalisation
              • Z-Score
              • Z-Scores vs Prediction Intervals
              • Z-Test
            • uncategorised
              • Investigate pyodbc
              • NLP Portal
              • Science Portal
            • pages
              • Data Archive
              • DE_Tools
              • ML_Tools
              • Quotes
              • Research Questions
              • Reviews

          Created with Quartz v4.3.1 © 2025
