Data Archive

    • categories
      • computer-science
        • Algorithms
        • Big O Notation
        • BM25 (Best Match 25)
        • Checksum
        • Computer Science
        • Concurrency
        • Convex Optimisation
        • csv module
        • Directed Acyclic Graph (DAG)
        • Flask
        • garbage collector
        • Generators in Python
        • Hash
        • Heap Data Structure
        • Heap Memory
        • How to search within a graph
        • Immutable vs mutable
        • Java
        • Java vs JavaScript
        • JavaScript
        • Knowledge Graph
        • Langchain
        • Machine Learning Algorithms
        • Monte Carlo Simulation
        • Multiprocessing vs Multithreading
        • Multithreading
        • neomodel
        • Node.JS
        • Numpy
        • Processes vs Threads
        • programming languages
        • PyGraphviz
        • QuickSort
        • Ranking models
        • Recursive Algorithm
        • Strongly vs Weakly typed language
        • Times Series Python Packages
      • data-analysis
        • Altair
        • altair versus seaborn
        • Binder
        • Boxplot
        • Dash
        • Dashboarding
        • Dashboards
        • Data Analysis
        • Data Analysis Portal
        • Data Analyst
        • Data Distribution
        • Data Mining
        • Data Product
        • Data Reduction
        • Data Visualisation
        • DuckDB
        • EDA
        • ER Diagrams
        • Heatmap
        • Label encoding
        • Linear Discriminant Analysis
        • Log transformation
        • Looker Studio
        • MariaDB vs MySQL
        • Melt
        • Multiple Correspondence Analysis
        • Multivariate Analysis
        • OLAP
        • Page Rank
        • Parquet
        • Plotly
        • PowerBI
        • Preprocessing
        • Preprocessing Text Classification
        • Seaborn
        • SQL Window functions
        • t-SNE
        • Tableau
      • data-engineering
        • ACID Transaction
        • Ada boosting
        • Adding a database to PostgreSQL
        • Aggregation
        • Apache Iceberg
        • Attack mitigation
        • Attack types
        • AWS Lambda
        • Azure
        • Bagging
        • Benefits of Data Transformation
        • Big Data
        • BigQuery
        • Cassandra
        • Cloud Providers
        • Coaching & Mentoring
        • Columnar Storage
        • Command Prompt
        • Common Table Expression
        • Components of the database
        • Covering Index
        • Crosstab
        • CRUD
        • CUDA
        • Curse of dimensionality
        • Cypher
        • Data Architect
        • Data Architecture
        • Data Cleansing
        • Data Contract
        • Data Deployment
        • Data Dictionary
        • Data Drift
        • Data Engineering
        • Data Engineering Portal
        • Data Engineering Tools
        • Data Evaluation
        • Data Hierarchy of Needs
        • Data Integration
        • Data Integrity
        • Data Lake
        • Data Lakehouse
        • Data Leakage
        • Data Lifecycle Management
        • data lineage
        • Data Management
        • Data Modeling
        • Data Observability
        • Data Principles
        • Data Quality
        • Data Security
        • Data Selection
        • Data Sources
        • Data Storage
        • Data Transformation
        • Data Transformation in Data Engineering
        • Data Transformation with Pandas
        • Data Validation
        • Data Virtualization
        • Data Warehouse
        • Database
        • Database Index
        • Database Management System (DBMS)
        • Database Schema
        • Database Storage
        • Database Techniques
        • Databricks 1
        • DataOps
        • dbt 1
        • design pattern
        • Digital twin
        • Distributed Computing
        • DuckDB in python
        • DuckDB vs SQLite
        • Durability
        • ELT
        • Estimator
        • ETL
        • ETL 1
        • ETL Pipeline Example
        • ETL vs ELT
        • EtLT
        • Event Driven Microservices
        • Event-Driven Architecture
        • Fabric
        • Faker
        • File Management
        • Folder Tree Diagram
        • Foreign Key
        • Github Actions
        • Google Sheet Pivots Table
        • Grain
        • Graph Query Language
        • Groupby
        • Groupby vs Crosstab
        • heterogeneous features
        • Honkit
        • Hosting
        • How is schema evolution done in practice with SQL
        • How to normalise a merged table
        • Implementing Database Schema
        • Imputation Techniques
        • in-memory format
        • incremental synchronization
        • Indexing in cypher
        • Input is Not Properly Sanitized
        • Joining Datasets
        • Junction Tables
        • KNIME
        • Logical Model
        • Many-to-Many Relationships
        • map reduce
        • MariaDB
        • master data management
        • Merge
        • Microsoft Access
        • Missing Data
        • Model Deployment
        • Monolith Architecture
        • Multi-level index
        • Multiprocessing
        • MySql
        • neo4j
        • Normalised Schema
        • NoSQL
        • Object Relational Mapper
        • OLTP
        • Overfitting
        • Pandas
        • Pandas join vs merge
        • Pandas Pivot Table
        • Pandas Stack
        • pd.Grouper
        • pgAdmin
        • Pgadmin Permissions on Windows
        • Physical Model
        • Pickle
        • Poetry
        • Polars
        • PostgreSQL
        • Postman
        • PowerShell
        • Prevention Is Better Than The Cure
        • Primary Key
        • Push-Down
        • Pydantic
        • Pyright vs Pydantic
        • Query Optimisation
        • Querying
        • Querying Time Series
        • Race Conditions
        • Relating Tables Together
        • Relational Database
        • reverse etl
        • rollup
        • Row parameters in SQL
        • Row-based Storage
        • Scalability
        • Scaling Server
        • Schema Evolution
        • Search
        • Security mitigation
        • Security Researcher
        • semantic layer
        • Single Source of Truth
        • Sklearn Pipiline
        • Slowly Changing Dimension
        • SMSS
        • Snowflake Schema
        • Soft Deletion
        • Software Design Patterns
        • Spreadsheets vs Databases
        • SQL
        • SQL Groupby
        • SQL Injection
        • SQL Joins
        • SQLAlchemy
        • SQLAlchemy vs. sqlite3
        • SQLite
        • SQLite Studio
        • Star Schema
        • storage layer object store
        • Stored Procedures
        • structured data
        • Structuring and organizing data
        • Transaction
        • Turning a flat file into a database
        • Types of Database Schema
        • Unix
        • unstructured data
        • Usability
        • Vacuum
        • Vector Database
        • Vectorized Engine
        • View Use Case
        • Views
        • Windows Subsystem for Linux
      • data-science
        • ACF Plots
        • Additive vs Multiplicative Models Time Series
        • ADF Test
        • Agent Exploration
        • Agentic Solutions
        • AI
        • ARIMA
        • ARIMA vs Random Forest in Time Series
        • Autocorrelation
        • Autocorrelation vs Autoregression
        • Autoregression
        • Baseline Forecast
        • Basics of Time Series
        • Batch gradient descent
        • Bellman Equations
        • Bias-Variance Trade Off
        • Capability
        • Choosing a Threshold
        • Choosing the Number of Clusters
        • Clustermap
        • Covariance Structures
        • Cross Validation
        • Data Assessment
        • Data Collection
        • Data Mining - CRISP
        • Data Preparation
        • Data Science
        • Data Scientist
        • Data Understanding
        • Datasets
        • Decomposition in Time Series
        • Differencing in Time Series
        • DS & ML Portal
        • Evaluating Time Series Forecasts
        • Evolving Seasonality
        • F-statistic
        • Feature Engineering
        • Feature Scaling
        • Feature Selection vs Feature Importance
        • Forecasting using Lags
        • Forward Propagation
        • Gaussian Mixture Models
        • Gitlab
        • Gompertz Model
        • Good Enough Principle in Data Projects
        • GraphRAG
        • Handling Missing Data
        • Holt-Winters (Exponential Smoothing)
        • Holt-Winters vs ARIMA
        • Holt’s Linear Trend Model (Double Exponential Smoothing)
        • how do you do the data selection
        • Imbalanced Datasets
        • Interpolation
        • Intervention Analysis
        • Joining Time Series
        • Kernel Machines
        • KPSS Test
        • Latency
        • Logistic Model Curve
        • LSTM in Time Series
        • Mean Absolute Percentage Error
        • MNIST
        • Normalisation
        • Out-of-sample rolling forecast evaluation
        • PACF Plots
        • Performance Dimensions
        • pmdarima
        • Properties of Time Series Models
        • Random Forest Regression
        • Residuals in Time Series
        • Scatter Plots
        • Scientific Method
        • Scipy
        • Seasonal Naive Forecast
        • Seasonality in Time Series
        • SHapley Additive exPlanations
        • Shot Learning
        • Silhouette Analysis
        • Simple Exponential Smoothing (SES)
        • sklearn datasets
        • SMOTE (Synthetic Minority Over-sampling Technique)
        • SparseCategorialCrossentropy or CategoricalCrossEntropy
        • stack memory
        • Stacking
        • Stationary Time Series
        • STL Decomposition
        • Time Series
        • Time Series Forecasting
        • Time Series Forecasts in Business
        • Time Series Learning Resources
        • Time Series Shocks
        • Trends in Time Series
      • deep-learning
        • Convolutional Neural Networks
        • Deep Learning
        • How is reinforcement learning being combined with deep learning
        • LSTM
        • Multi-Agent Reinforcement Learning
        • Policy
        • Relu
        • Sarsa
      • devops
        • AB testing
        • Alternatives to Batch Processing
        • Amazon S3
        • Apache Airflow
        • Apache Kafka
        • Apache Spark
        • API
        • API Driven Microservices
        • Bash
        • bat
        • Batch Processing
        • Batch vs PowerShell scripts
        • CI-CD
        • Clustering_Dashboard.py
        • Code Diagrams
        • Command Line
        • Continuous Delivery - Deployment
        • Continuous Integration
        • Cron jobs
        • dagster
        • Data Ingestion
        • Data Orchestration
        • Data Pipeline
        • Data Pipeline to Data Products
        • Data Streaming
        • Databricks
        • Databricks vs Snowflake
        • dbt
        • Debugging
        • Declarative Data Pipeline
        • dependency manager
        • DevOps
        • Devops Portal
        • Digital Transformation
        • Docker
        • Docker Image
        • Elastic Net
        • Environment Variables
        • Epub
        • Event Driven
        • Event Driven Events
        • Everything
        • Excel
        • Excel pivot table
        • Excel vs Google Sheets
        • FastAPI
        • Firebase
        • frontend
        • functional programming
        • GIS
        • Git
        • Github Gists
        • gitlab-ci.yml
        • Global Interpreter Lock
        • Google Cloud Platform
        • Google Colab
        • Google My Maps Data Extraction
        • Google Sheets
        • GPT
        • Gradio
        • Grep
        • Hadoop
        • Hugging Face
        • imperative
        • ipynb
        • jinja template
        • Json
        • Json to SQLite
        • jupytext
        • Justfile
        • kubernetes
        • Load Balancing
        • Maintainability
        • Maintainable Code
        • Makefile
        • Master Observability Datadog
        • Memory
        • Memory Caching
        • Microsoft
        • MongoDB
        • nbconvert
        • NET
        • Normalisation of Text
        • Pandas Series vs DataFrame
        • Pandoc
        • PMML
        • Powerquery
        • Powershell scripts
        • Powershell versus Command Prompt
        • Powershell vs Bash
        • Publish and Subscribe
        • PySpark
        • Pytest
        • Python
        • Python Click
        • Quartz
        • Random Access Memory
        • React
        • Registering a Scheduled Task
        • REST API
        • Scala
        • Security Vulnerabilities
        • shapefile
        • Sharepoint
        • Snowflake
        • Snowflake vs Hadoop
        • Software Development Life Cycle
        • SQL vs NoSQL
        • Streamlit
        • Technical Design Doc Template
        • Terminal commands
        • Testing
        • TOML
        • tool.bandit
        • tool.ruff
        • tool.uv
        • Types of Computational Bugs
        • TypeScript
        • Ubuntu
        • unittest
        • Vercel
        • Virtual environments
        • Web Feature Server (WFS)
        • Web Map Tile Service (WMTS)
        • Why JSON is Better than Pickle for Untrusted Data
        • Windows
        • Windows Scheduled Tasks
        • yaml
      • industry
        • AI Engineer
        • AI governance
        • Analytics Engineer
        • business intelligence
        • Business observability
        • Business Understanding
        • Business Values
        • Data AI Education at Work
        • Data Engineer
        • Data Governance
        • data literacy
        • Data Roles
        • Data Steward
        • Design Thinking Questions
        • Documentation & Meetings
        • Energy
        • Energy ABM
        • Energy Demand Forecasting
        • Energy Storage
        • Facts
        • Gartner Hype Cycle
        • Industries of interest
        • Knowledge Work
        • Managing People
        • ML Engineer
        • Network Design
        • Operational Resilience for Growth and Adaptability
        • Reporting
        • Scaling Data Science Capability
        • Smart Grids
        • Telecommunications
        • Thinking Systems
        • Use of RNNs in energy sector
        • Working with SMEs
      • machine-learning
        • Accuracy
        • Activation atlases
        • Activation Function
        • Active Learning
        • Adam Optimizer
        • Adaptive Learning Rates
        • Adjusted R squared
        • Agent-Based Modelling
        • AIC in Model Evaluation
        • Anomaly Detection
        • Anomaly Detection in Time Series
        • Anomaly Detection with Clustering
        • Anomaly Detection with Statistical Methods
        • Assessing Gen AI generated content
        • AUC
        • Automated Feature Creation
        • AutoML
        • Backpropagation
        • Batch Normalisation
        • Bias in ML
        • Binary Classification
        • Boosting
        • Business value of anomaly detection
        • CART
        • CatBoost
        • Challenges to Model Deployment
        • Class Separability
        • Classification
        • Classification Report
        • Cluster Density
        • Cluster Seperation
        • Clustering
        • Collaborative Filtering
        • conceptual data model
        • Confusion Matrix
        • Cost Function
        • Cost-Sensitive Analysis
        • Cross Entropy
        • Customer Growth Modeling
        • Data Selection in ML
        • Data Transformation in Machine Learning
        • DBSCAN
        • Decision Theory
        • Decision Tree
        • Decision Trees are Fragile
        • Deep Learning Frameworks
        • Deep Q-Learning
        • Dendrograms
        • Determining Threshold Values
        • Dimension Table
        • Dimensional Modelling
        • Dimensionality Reduction
        • Dimensions
        • Distributions in Decision Tree Leaves
        • Dropout
        • Dummy variable trap
        • Edge ML
        • emergent behavior
        • Encoding Categorical Variables
        • Epoch
        • Evaluating Language Models
        • Evaluating Logistic Regression
        • Evaluating the effectiveness of prompts
        • Evaluation Metrics
        • Exploration vs Exploitation
        • Exponential Smoothing
        • f-regression
        • F1 Score
        • Fact Table
        • FAISS
        • Feature Engineering for Time Series
        • Feature Evaluation
        • Feature Extraction
        • Feature Importance
        • Feature Selection
        • Feature Transformations
        • Feed Forward Neural Network
        • Filter Methods
        • Fitting weights and biases of a neural network
        • Framework for models
        • Gaussian Model
        • General Linear Regression
        • Generalisation
        • Generative Adversarial Networks
        • Gini Impurity
        • Gini Impurity vs Cross Entropy
        • Gradient Boosted Trees
        • Gradient Boosting
        • Gradient Boosting Regressor
        • Gradient Descent
        • Gradient descent in linear regression
        • granularity
        • Graph Neural Network
        • Graph Theory Community
        • GridSeachCv
        • Growth Models in Time Series
        • GRU
        • Hierarchical Clustering
        • High cross validation accuracy is not directly proportional to performance on unseen test data
        • Histogram
        • How do we evaluate of LLM Outputs
        • How to use Sklearn Pipeline
        • Hyperparameter
        • Hyperparameter Tuning
        • Impact of multicollinearity on model parameters
        • Inertia K Means Cost Function
        • inference
        • inference versus prediction
        • initialization methods
        • Interoperability
        • interoperable
        • Interpretability
        • Interpreting logistic regression model parameters
        • Isolated Forest
        • Jaccard Coefficient
        • K-means
        • K-nearest neighbours
        • Keras
        • Kernel Density Estimation
        • Kernelling
        • Kmeans vs GMM
        • L1 Regularisation
        • Label encoding vs One-hot encoding
        • Labelling data
        • Lagrange multipliers in optimisation
        • lambda architecture
        • Latent Dirichlet Allocation
        • Latent Semantic Indexing
        • LBFGS
        • Learning Curve
        • Learning Rate
        • Learning Styles
        • LightGBM
        • LightGBM vs XGBoost vs CatBoost
        • Linear Regression
        • LLM Evaluation Metrics
        • Local Interpretable Model-agnostic Explainations
        • Local Outlier Factor (LOF)
        • Logistic Regression
        • Logistic Regression does not predict probabilities
        • Logistic regression in sklearn & Gradient Descent
        • Logistic Regression Statsmodel Summary table
        • Loss function
        • Loss versus Cost function
        • Machine Learning
        • Machine Learning Operations
        • Manifold Learning
        • Markov Decision Processes
        • Maximum Likelihood Estimation
        • Median Absolute Error
        • Mermaid
        • Metadata Handling
        • Methods for Handling Outliers
        • Metric
        • Mini-batch gradient descent
        • MLOPS for Time Series
        • Model Building
        • Model Deployment using PyCaret
        • Model Ensemble
        • Model Evaluation
        • Model Evaluation vs Model Optimisation
        • Model Interpretability
        • Model Observability
        • Model Optimisation
        • Model Parameters
        • Model Parameters Tuning
        • Model parameters vs hyperparameters
        • Model Selection
        • Model Validation
        • model-agnostic feature importance
        • Momentum
        • Moving Average Forecast
        • Multinomial Naive bayes
        • Multiple Linear Regression
        • Naive Bayes Classifier
        • Naive Forecast
        • Neural network
        • Neural Network Classification
        • Neural network in Practice
        • Neural Scaling Laws
        • Non-negative matrix factorization in ML
        • Non-parametric tests
        • Normalisation of data
        • Normalisation vs Standardisation
        • objective function
        • One-hot encoding
        • Optimisation function
        • Optimisation techniques
        • Optimising a Logistic Regression Model
        • Optimising Neural Networks
        • Optuna
        • Ordinary Least Squares
        • Orthogonalization
        • Outliers
        • Over parameterised models
        • PCA Explained Variance Ratio
        • PCA Principal Components
        • PCA-Based Anomaly Detection
        • PDP and ICE
        • Percentile Detection
        • Performance Drift
        • Polynomial Regression
        • Positional Encoding
        • Precision
        • Precision or Recall
        • Precision-Recall Curve
        • Prediction Intervals vs Confidence Interval
        • Principal Component Analysis
        • PyCaret
        • PyOD
        • PyTorch
        • Pytorch vs Tensorflow
        • Q-Learning
        • Random Forest
        • Random Forest for Time Series
        • Recall
        • Recommender systems
        • Recurrent Neural Networks
        • Regression
        • Regression Metrics
        • Regularisation
        • Regularisation of Tree based models
        • Reinforcement learning
        • Relationships in memory
        • Reward Function
        • Ridge
        • ROC (Receiver Operating Characteristic)
        • Sammon’s Mapping
        • SARIMA
        • Scikit-Learn
        • Secretary Problem
        • semi-structured data
        • Sentence Transformers
        • Sklearn Pipeline
        • Specificity
        • Spectral Clustering
        • Supervised Learning
        • Support Vector Classifier
        • Support Vector Machines
        • Support Vector Regression
        • Tensorflow
        • Test Loss When Evaluating Models
        • Text Classification
        • Time Series Python Packages
        • Train-Dev-Test Sets
        • Transfer Learning
        • Transformed Target Regressor
        • Transformer
        • Transformers vs RNNs
        • Type I Error (False Positive)
        • Type II Error (False Negative)
        • Types of Neural Networks
        • Typical Output Formats in Neural Networks
        • UMAP
        • Unsupervised Learning
        • Use Cases for a Simple Neural Network Like
        • vanishing and exploding gradients problem
        • Variability in linear models
        • Variance in ML
        • Vector Embedding
        • WCSS and elbow method
        • Weak Learners
        • When and why not to us regularisation
        • Why does increasing the number of models in a ensemble not necessarily improve the accuracy
        • Why does the Adam Optimizer converge
        • Why Removing Outliers May Improve Regression but Harm Classification
        • Why standardise features
        • Why Type 1 and Type 2 matter
        • Wrapper Methods
        • Xaiver
        • XGBoost
      • natural-language
        • AI Agents Memory
        • Attention mechanism
        • Bag of words
        • BERT
        • BERTScore
        • Chain of thought
        • ChatGPT
        • Claude
        • Comparing LLMs
        • Distillation
        • ElasticSearch
        • Embedded Methods
        • embeddings for OOV words
        • Evaluate Embedding Methods
        • Fuzzywuzzy
        • Generative AI
        • Generative AI From Theory to Practice
        • Grammar method
        • Guardrails
        • How businesses use Gen AI
        • How LLMs store facts
        • How to reduce the need for Gen AI responses
        • How would you decide between using TF-IDF and Word2Vec for text vectorization
        • In NER how would you handle ambiguous entities
        • Key Components of Attention and Formula
        • Knowledge graph vs RAG setup
        • Language Model Output Optimisation
        • Language Models
        • Language Models Large (LLMs) vs Small (SLMs)
        • lemmatization
        • LLM
        • LLM Memory
        • Local LLM use cases
        • Mathematical Reasoning in Transformers
        • Mixture of Experts
        • Model Cascading
        • Multi-head attention
        • Named Entity Recognition
        • NER Implementation
        • Ngrams
        • NLP
        • nltk
        • Non-negative Matrix Factorization
        • NotebookLM
        • OOV words
        • Pandas Dataframe Agent
        • Part of speech tagging
        • Prompt Engineering
        • prompt retrievers
        • Prompts
        • Pyright
        • RAG
        • Scaling Agentic Systems
        • Self attention vs multi-head attention
        • Self-Attention
        • Semantic Relationships
        • Semantic search
        • Sentence Similarity
        • Sentence Transformer Workflow
        • Similarity Search
        • Small Language Models
        • spaCy
        • Stemming
        • stopwords
        • Summarisation
        • syntactic relationships
        • Text2Cypher
        • TF-IDF
        • TF-IDF Implementation
        • Tokenisation
        • topic modeling
        • Vectorisation
        • Why is named entity recognition (NER) a challenging task
        • Word2vec
        • WordNet
      • OTHER
        • Addressing_Multicollinearity.py
        • Bag_of_Words.py
        • Bandit example output
        • Bandit_Example_Fixed.py
        • Click_Implementation.py
        • Comparing_Ensembles.py
        • Cross_Entropy_Single.py
        • Cross_Entropy.py
        • Debugging.py
        • Distribution_Analysis.py
        • Factor_Analysis.py
        • FastAPI_Example.py
        • Feature_Distribution.py
        • Forecasting_AutoArima.py
        • Forecasting_Baseline.py
        • Forecasting_Exponential_Smoothing.py
        • Gaussian_Mixture_Model_Implementation.py
        • Handling_Missing_Data_Basic.ipynb
        • Handling_Missing_Data.ipynb
        • Heatmaps_Dendrograms.py
        • Imbalanced_Datasets_SMOTE.py
        • K_Means.py
        • Momentum.py
        • One_hot_encoding.py
        • Pandas_Common.py
        • Pandas_Stack.py
        • PCA_Analysis.ipynb
        • PCA_Based_Anomaly_Detection.py
        • Pycaret_Anomaly.ipynb
        • Pycaret_Example.py
        • Pydantic_More.py
        • Pydantic.py
        • Regression_Logistic_Metrics.ipynb
        • Regularisation.py
        • ROC_Curve.py
        • SVM_Example.py
        • Testing_Pytest.py
        • Testing_unittest.py
        • transfer_learning.py
        • TS_Anomaly_Detection.py
        • Vector_Embedding.py
        • Wikipedia_API.py
        • Word2Vec.py
      • PAPER
        • Attention Is All You Need
        • BERT Pretraining of Deep Bidirectional Transformers for Language Understanding
      • project-management
        • 1-on-1 Template
        • 1-to-1's with a Line Manager
        • Asking questions
        • Change Management
        • Communication principles
        • Communication Techniques
        • Communication with Stakeholders
        • Conceptual Model
        • Documentation
        • Education and Training
        • Experiment Plan Template
        • Feedback Template
        • Fishbone diagram
        • How to do git commit messages properly
        • html
        • Jobs to be done
        • Jupyter Book
        • Managing Data Science Teams
        • Modern data team
        • nbconvert slideshows
        • One Pager Template
        • pdoc
        • Problem Definition
        • Process for prototyping
        • project management
        • Project Management Portal
        • Pull Request Template
        • RACI
        • Remaining useful life models
        • Return of Experience Form
        • Reveal.js
        • Technical Debt
        • UML
        • Why use ER diagrams
      • statistics
        • Addressing Multicollinearity
        • ANOVA
        • Assumption of Normality
        • Bernoulli
        • Bootstrap Sampling
        • Casual Inference
        • Central Limit Theorem
        • Central Limit Theorem & Small Sample Sizes
        • Chi-Squared Test
        • Confidence Interval
        • Correlation
        • Correlation vs Causation
        • Cosine Similarity
        • Covariance
        • Covariance vs Correlation
        • Cryptography
        • Differentation
        • Distributions
        • EM Algorithm
        • Factor Analysis
        • Gaussian Distribution
        • Graph Theory
        • Grouped plots
        • Handling Different Distributions
        • Hypothesis testing
        • information theory
        • Interquartile Range (IQR) Detection
        • Johnson–Lindenstrauss lemma
        • Markov chain
        • Mathematics
        • Mean Absolute Error
        • Mean Squared Error
        • mean vs median
        • Multicollinearity
        • non-parametric
        • Odds
        • Odds vs Probability
        • p values
        • Parametric tests
        • parametric vs non-parametric models
        • parametric vs non-parametric tests
        • parsimonious
        • Prediction Intervals
        • Probability
        • Proportion Test
        • Q-Q Plot
        • R
        • R squared
        • R-squared metric not always a good indicator of model performance in regression
        • Reasoning tokens
        • Root Mean Squared Error
        • Sampling
        • Spearman vs Pearson Correlation
        • Standard deviation
        • Standardisation
        • Statistical Assumptions
        • Statistical Tests
        • Statistical theorems
        • Statistics
        • statsmodels
        • Stochastic Gradient Descent
        • Symbolic computation
        • Sympy
        • T-test
        • univariate vs multivariate
        • Variance
        • Violin plot
        • Z-Normalisation
        • Z-Score
        • Z-Scores vs Prediction Intervals
        • Z-Test
      • uncategorised
        • Investigate pyodbc
        • NLP Portal
        • Science Portal
      • pages
        • Data Archive
        • DE_Tools
        • ML_Tools
        • Quotes
        • Research Questions
        • Reviews

    transfer_learning.py

    https://github.com/rhyslwells/ML_Tools/blob/main/Explorations/Build/Neural_Network/transfer_learning.py

    In deep learning, transfer learning starts from a pretrained network, removes its last few layers, and replaces them with new layers suited to the target task. The pretrained layers can then be frozen so that only the weights of the new final layers are trained.

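    The number of pretrained layers to remove and replace depends on how similar the source and target tasks are: the more similar the tasks, the fewer layers need replacing. Earlier layers tend to learn generic, low-level features, while later layers recognise more complex, task-specific components.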
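
The idea of freezing the pretrained layers and training only a replacement head can be sketched in plain Python. This is a minimal illustration, not the contents of the linked script: the random `base_w` weights merely stand in for layers that would normally be loaded from a pretrained model, and the names `features`, `head_w`, and the toy task are assumptions made for this example.

```python
import math
import random

random.seed(0)
N_HIDDEN = 8

# Stand-in for pretrained layers (assumption: real weights would be loaded
# from a model trained on a source task; random values are used here).
base_w = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(N_HIDDEN)]
base_b = [random.uniform(-1, 1) for _ in range(N_HIDDEN)]


def features(x):
    """Frozen feature extractor: the layers kept from the pretrained network."""
    return [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(base_w, base_b)
    ]


# New head that replaces the removed output layer: one logistic unit.
head_w = [0.0] * N_HIDDEN
head_b = 0.0


def predict(f):
    z = sum(w * fi for w, fi in zip(head_w, f)) + head_b
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical target task: label is 1 when x0 + x1 > 0.
data = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(200)]
labels = [1.0 if x[0] + x[1] > 0 else 0.0 for x in data]

# The base is frozen, so its features can be computed once and reused.
feats = [features(x) for x in data]


def mean_loss():
    """Mean binary cross-entropy of the current head on the toy data."""
    eps = 1e-12
    total = 0.0
    for f, y in zip(feats, labels):
        p = predict(f)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(feats)


base_snapshot = [row[:] for row in base_w]
loss_before = mean_loss()

# Full-batch gradient descent that updates the head only.
lr = 0.5
for _ in range(300):
    grad_w = [0.0] * N_HIDDEN
    grad_b = 0.0
    for f, y in zip(feats, labels):
        err = predict(f) - y
        for j in range(N_HIDDEN):
            grad_w[j] += err * f[j]
        grad_b += err
    for j in range(N_HIDDEN):
        head_w[j] -= lr * grad_w[j] / len(feats)
    head_b -= lr * grad_b / len(feats)

loss_after = mean_loss()
print(f"loss before: {loss_before:.3f}, after: {loss_after:.3f}")
print("base weights unchanged:", base_w == base_snapshot)
```

In a deep learning framework the same pattern usually amounts to marking the pretrained layers as non-trainable and attaching a new output layer. How many layers to leave trainable is the choice discussed above.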


    Backlinks

    • Transfer Learning

      Created with Quartz v4.3.1 © 2025
