Data Archive

    • categories
      • computer-science
        • Algorithms
        • Big O Notation
        • BM25 (Best Match 25)
        • Checksum
        • Computer Science
        • Concurrency
        • Convex Optimisation
        • csv module
        • Directed Acyclic Graph (DAG)
        • Flask
        • garbage collector
        • Generators in Python
        • Hash
        • Heap Data Structure
        • Heap Memory
        • How to search within a graph
        • Immutable vs mutable
        • Java
        • Java vs JavaScript
        • JavaScript
        • Knowledge Graph
        • Langchain
        • Machine Learning Algorithms
        • Monte Carlo Simulation
        • Multiprocessing vs Multithreading
        • Multithreading
        • neomodel
        • Node.JS
        • Numpy
        • Processes vs Threads
        • programming languages
        • PyGraphviz
        • QuickSort
        • Ranking models
        • Recursive Algorithm
        • Strongly vs Weakly typed language
        • Times Series Python Packages
      • data-analysis
        • Altair
        • altair versus seaborn
        • Binder
        • Boxplot
        • Dash
        • Dashboarding
        • Dashboards
        • Data Analysis
        • Data Analysis Portal
        • Data Analyst
        • Data Distribution
        • Data Mining
        • Data Product
        • Data Reduction
        • Data Visualisation
        • DuckDB
        • EDA
        • ER Diagrams
        • Heatmap
        • Label encoding
        • Linear Discriminant Analysis
        • Log transformation
        • Looker Studio
        • MariaDB vs MySQL
        • Melt
        • Multiple Correspondence Analysis
        • Multivariate Analysis
        • OLAP
        • Page Rank
        • Parquet
        • Plotly
        • PowerBI
        • Preprocessing
        • Preprocessing Text Classification
        • Seaborn
        • SQL Window functions
        • t-SNE
        • Tableau
      • data-engineering
        • ACID Transaction
        • Ada boosting
        • Adding a database to PostgreSQL
        • Aggregation
        • Apache Iceberg
        • Attack mitigation
        • Attack types
        • AWS Lambda
        • Azure
        • Benefits of Data Transformation
        • Big Data
        • BigQuery
        • Cassandra
        • Cloud Providers
        • Coaching & Mentoring
        • Columnar Storage
        • Command Prompt
        • Common Table Expression
        • Components of the database
        • Covering Index
        • Crosstab
        • CRUD
        • CUDA
        • Curse of dimensionality
        • Cypher
        • Data Architect
        • Data Architecture
        • Data Cleansing
        • Data Contract
        • Data Deployment
        • Data Dictionary
        • Data Drift
        • Data Engineering
        • Data Engineering Portal
        • Data Engineering Tools
        • Data Evaluation
        • Data Hierarchy of Needs
        • Data Integration
        • Data Integrity
        • Data Lake
        • Data Lakehouse
        • Data Leakage
        • Data Lifecycle Management
        • data lineage
        • Data Management
        • Data Modeling
        • Data Observability
        • Data Principles
        • Data Quality
        • Data Security
        • Data Selection
        • Data Sources
        • Data Storage
        • Data Transformation
        • Data Transformation in Data Engineering
        • Data Transformation with Pandas
        • Data Validation
        • Data Virtualization
        • Data Warehouse
        • Database
        • Database Index
        • Database Management System (DBMS)
        • Database Schema
        • Database Storage
        • Database Techniques
        • DataOps
        • dbt 1
        • design pattern
        • Digital twin
        • Distributed Computing
        • DuckDB in python
        • DuckDB vs SQLite
        • Durability
        • ELT
        • Estimator
        • ETL
        • ETL 1
        • ETL Pipeline Example
        • ETL vs ELT
        • EtLT
        • Event Driven Microservices
        • Event-Driven Architecture
        • Fabric
        • Faker
        • File Management
        • Folder Tree Diagram
        • Foreign Key
        • Github Actions
        • Google Sheet Pivots Table
        • Grain
        • Graph Query Language
        • Groupby
        • Groupby vs Crosstab
        • heterogeneous features
        • Honkit
        • Hosting
        • How is schema evolution done in practice with SQL
        • How to normalise a merged table
        • Implementing Database Schema
        • Imputation Techniques
        • in-memory format
        • incremental synchronization
        • Indexing in cypher
        • Input is Not Properly Sanitized
        • Joining Datasets
        • Junction Tables
        • KNIME
        • Logical Model
        • Many-to-Many Relationships
        • map reduce
        • MariaDB
        • master data management
        • Merge
        • Microsoft Access
        • Missing Data
        • Model Deployment
        • Monolith Architecture
        • Multi-level index
        • Multiprocessing
        • MySql
        • neo4j
        • Normalised Schema
        • NoSQL
        • Object Relational Mapper
        • OLTP
        • Overfitting
        • Pandas
        • Pandas join vs merge
        • Pandas Pivot Table
        • Pandas Stack
        • pd.Grouper
        • pgAdmin
        • Pgadmin Permissions on Windows
        • Physical Model
        • Pickle
        • Poetry
        • Polars
        • PostgreSQL
        • Postman
        • PowerShell
        • Prevention Is Better Than The Cure
        • Primary Key
        • Push-Down
        • Pydantic
        • Pyright vs Pydantic
        • Query Optimisation
        • Querying
        • Querying Time Series
        • Race Conditions
        • Relating Tables Together
        • Relational Database
        • reverse etl
        • rollup
        • Row parameters in SQL
        • Row-based Storage
        • Scalability
        • Scaling Server
        • Schema Evolution
        • Search
        • Security mitigation
        • Security Researcher
        • semantic layer
        • Single Source of Truth
        • Sklearn Pipiline
        • Slowly Changing Dimension
        • SMSS
        • Snowflake Schema
        • Soft Deletion
        • Software Design Patterns
        • Spreadsheets vs Databases
        • SQL
        • SQL Groupby
        • SQL Injection
        • SQL Joins
        • SQLAlchemy
        • SQLAlchemy vs. sqlite3
        • SQLite
        • SQLite Studio
        • Star Schema
        • storage layer object store
        • Stored Procedures
        • structured data
        • Structuring and organizing data
        • Transaction
        • Turning a flat file into a database
        • Types of Database Schema
        • Unix
        • unstructured data
        • Usability
        • Vacuum
        • Vector Database
        • Vectorized Engine
        • View Use Case
        • Views
        • Windows Subsystem for Linux
      • data-science
        • ACF Plots
        • Additive vs Multiplicative Models Time Series
        • ADF Test
        • Agent Exploration
        • Agentic Solutions
        • AI
        • ARIMA
        • ARIMA vs Random Forest in Time Series
        • Autocorrelation
        • Autocorrelation vs Autoregression
        • Autoregression
        • Baseline Forecast
        • Basics of Time Series
        • Batch gradient descent
        • Bellman Equations
        • Bias-Variance Trade Off
        • Capability
        • Choosing a Threshold
        • Choosing the Number of Clusters
        • Clustermap
        • Covariance Structures
        • Cross Validation
        • Data Assessment
        • Data Collection
        • Data Mining - CRISP
        • Data Preparation
        • Data Science
        • Data Scientist
        • Data Understanding
        • Datasets
        • Decomposition in Time Series
        • Differencing in Time Series
        • DS & ML Portal
        • Dynamic Time Warping
        • Evaluating Time Series Forecasts
        • Evolving Seasonality
        • F-statistic
        • Feature Engineering
        • Feature Scaling
        • Feature Selection vs Feature Importance
        • Forecasting using Lags
        • Forecasting with Autoregressive (AR) Models
        • Forward Propagation
        • Gaussian Mixture Models
        • Gitlab
        • Gompertz Model
        • Good Enough Principle in Data Projects
        • GraphRAG
        • Handling Missing Data
        • Holt-Winters (Exponential Smoothing)
        • Holt-Winters vs ARIMA
        • Holt’s Linear Trend Model (Double Exponential Smoothing)
        • how do you do the data selection
        • Imbalanced Datasets
        • Interpolation
        • Intervention Analysis
        • Joining Time Series
        • Kernel Machines
        • KPSS Test
        • Latency
        • Logistic Model Curve
        • LSTM in Time Series
        • Mean Absolute Percentage Error
        • MNIST
        • Normalisation
        • Out-of-sample rolling forecast evaluation
        • PACF Plots
        • Performance Dimensions
        • pmdarima
        • Properties of Time Series Models
        • Prophet
        • Random Forest Regression
        • Residuals in Time Series
        • Scatter Plots
        • Scientific Method
        • Scipy
        • Seasonal Naive Forecast
        • Seasonality in Time Series
        • SHapley Additive exPlanations
        • Shot Learning
        • Silhouette Analysis
        • Simple Exponential Smoothing (SES)
        • sklearn datasets
        • SMOTE (Synthetic Minority Over-sampling Technique)
        • SparseCategorialCrossentropy or CategoricalCrossEntropy
        • stack memory
        • Stacking
        • Stationary Time Series
        • STL Decomposition
        • Time Series
        • Time Series Forecasting
        • Time Series Forecasts in Business
        • Time Series Learning Resources
        • Time Series Shocks
        • Trends in Time Series
      • deep-learning
        • Convolutional Neural Networks
        • Deep Learning
        • How is reinforcement learning being combined with deep learning
        • LSTM
        • Multi-Agent Reinforcement Learning
        • Policy
        • Relu
        • Sarsa
      • devops
        • AB testing
        • Alternatives to Batch Processing
        • Amazon S3
        • Apache Airflow
        • Apache Kafka
        • Apache Spark
        • API
        • API Driven Microservices
        • Bash
        • bat
        • Batch Processing
        • Batch vs PowerShell scripts
        • Catalogs, Schemas, and Tables in Databricks
        • CI-CD
        • Click
        • Clustering_Dashboard.py
        • Code Diagrams
        • Command Line
        • Continuous Delivery - Deployment
        • Continuous Integration
        • Cron jobs
        • dagster
        • Data Ingestion
        • Data Orchestration
        • Data Pipeline
        • Data Pipeline to Data Products
        • Data Streaming
        • Databricks
        • Databricks vs Snowflake
        • dbt
        • Debugging
        • Declarative Data Pipeline
        • Delta Tables in Databricks
        • dependency manager
        • DevOps
        • Devops Portal
        • Digital Transformation
        • Docker
        • Docker Image
        • Elastic Net
        • Environment Variables
        • Epub
        • Event Driven
        • Event Driven Events
        • Everything
        • Excel
        • Excel pivot table
        • Excel vs Google Sheets
        • FastAPI
        • Firebase
        • frontend
        • functional programming
        • GIS
        • Git
        • Github Gists
        • gitlab-ci.yml
        • Global Interpreter Lock
        • Google Cloud Platform
        • Google Colab
        • Google My Maps Data Extraction
        • Google Sheets
        • GPT
        • Gradio
        • Grep
        • Hadoop
        • Hugging Face
        • imperative
        • ipynb
        • jinja template
        • Json
        • Json to SQLite
        • jupytext
        • Justfile
        • kubernetes
        • Load Balancing
        • Loading Google Sheets into Databricks
        • Maintainability
        • Maintainable Code
        • Makefile
        • Master Observability Datadog
        • Memory
        • Memory Caching
        • Microsoft
        • MongoDB
        • nbconvert
        • NET
        • Normalisation of Text
        • Overwriting and Refreshing Tables in Databricks
        • Pandas Series vs DataFrame
        • Pandoc
        • PMML
        • Powerquery
        • Powershell scripts
        • Powershell versus Command Prompt
        • Powershell vs Bash
        • Publish and Subscribe
        • PySpark
        • Pytest
        • Python
        • Quartz
        • Random Access Memory
        • React
        • Registering a Scheduled Task
        • REST API
        • Scala
        • Security Vulnerabilities
        • shapefile
        • Sharepoint
        • Snowflake
        • Snowflake vs Hadoop
        • Software Development Life Cycle
        • Spark DataFrames in Databricks
        • SQL vs NoSQL
        • Streamlit
        • Technical Design Doc Template
        • Terminal commands
        • Testing
        • TOML
        • tool.bandit
        • tool.ruff
        • tool.uv
        • Types of Computational Bugs
        • TypeScript
        • Ubuntu
        • unittest
        • Using requirements or env.yml
        • Vercel
        • Virtual environments
        • Web Feature Server (WFS)
        • Web Map Tile Service (WMTS)
        • Why JSON is Better than Pickle for Untrusted Data
        • Windows
        • Windows Scheduled Tasks
        • yaml
      • industry
        • AI Engineer
        • AI governance
        • Analytics Engineer
        • business intelligence
        • Business observability
        • Business Understanding
        • Business Values
        • Data AI Education at Work
        • Data Engineer
        • Data Governance
        • data literacy
        • Data Roles
        • Data Steward
        • Design Thinking Questions
        • Documentation & Meetings
        • Energy
        • Energy ABM
        • Energy Demand Forecasting
        • Energy Storage
        • Facts
        • Gartner Hype Cycle
        • Industries of interest
        • Knowledge Work
        • Managing People
        • ML Engineer
        • Network Design
        • Operational Resilience for Growth and Adaptability
        • Reporting
        • Scaling Data Science Capability
        • Smart Grids
        • Telecommunications
        • Thinking Systems
        • Use of RNNs in energy sector
        • Working with SMEs
      • machine-learning
        • Accuracy
        • Activation atlases
        • Activation Function
        • Active Learning
        • Adam Optimizer
        • Adaptive Learning Rates
        • Adjusted R squared
        • Agent-Based Modelling
        • AIC in Model Evaluation
        • Anomaly Detection
        • Anomaly Detection in Time Series
        • Anomaly Detection with Clustering
        • Anomaly Detection with Statistical Methods
        • Assessing Gen AI generated content
        • AUC
        • Automated Feature Creation
        • AutoML
        • Backpropagation
        • Bagging
        • Batch Normalisation
        • Bias in ML
        • Binary Classification
        • Boosting
        • Business value of anomaly detection
        • CART
        • CatBoost
        • Challenges to Model Deployment
        • Class Separability
        • Classification
        • Classification Report
        • Cluster Density
        • Cluster Seperation
        • Clustering
        • Collaborative Filtering
        • conceptual data model
        • Confusion Matrix
        • Cost Function
        • Cost-Sensitive Analysis
        • Cross Entropy
        • Customer Growth Modeling
        • Data Selection in ML
        • Data Transformation in Machine Learning
        • DBSCAN
        • Decision Theory
        • Decision Tree
        • Decision Trees are Fragile
        • Deep Learning Frameworks
        • Deep Q-Learning
        • Dendrograms
        • Determining Threshold Values
        • Dimension Table
        • Dimensional Modelling
        • Dimensionality Reduction
        • Dimensions
        • Distributions in Decision Tree Leaves
        • Dropout
        • Dummy variable trap
        • Edge ML
        • emergent behavior
        • Encoding Categorical Variables
        • Epoch
        • Evaluating Language Models
        • Evaluating Logistic Regression
        • Evaluating the effectiveness of prompts
        • Evaluation Metrics
        • Exploration vs Exploitation
        • Exponential Smoothing
        • f-regression
        • F1 Score
        • Fact Table
        • FAISS
        • Feature Engineering for Time Series
        • Feature Evaluation
        • Feature Extraction
        • Feature Importance
        • Feature Selection
        • Feature Transformations
        • Feed Forward Neural Network
        • Filter Methods
        • Fitting weights and biases of a neural network
        • Framework for models
        • Gaussian Model
        • General Linear Regression
        • Generalisation
        • Generative Adversarial Networks
        • Gini Impurity
        • Gini Impurity vs Cross Entropy
        • Gradient Boosted Trees
        • Gradient Boosting
        • Gradient Boosting Regressor
        • Gradient Descent
        • Gradient descent in linear regression
        • granularity
        • Graph Neural Network
        • Graph Theory Community
        • GridSeachCv
        • Growth Models in Time Series
        • GRU
        • Hierarchical Clustering
        • High cross validation accuracy is not directly proportional to performance on unseen test data
        • Histogram
        • How do we evaluate of LLM Outputs
        • How to use Sklearn Pipeline
        • Hyperparameter
        • Hyperparameter Tuning
        • Impact of multicollinearity on model parameters
        • Inertia K Means Cost Function
        • inference
        • inference versus prediction
        • initialization methods
        • Interoperability
        • interoperable
        • Interpretability
        • Interpreting logistic regression model parameters
        • Isolated Forest
        • Jaccard Coefficient
        • K-means
        • K-nearest neighbours
        • Keras
        • Kernel Density Estimation
        • Kernelling
        • Kmeans vs GMM
        • L1 Regularisation
        • Label encoding vs One-hot encoding
        • Labelling data
        • Lagrange multipliers in optimisation
        • lambda architecture
        • Latent Dirichlet Allocation
        • Latent Semantic Indexing
        • LBFGS
        • Learning Curve
        • Learning Rate
        • Learning Styles
        • LightGBM
        • LightGBM vs XGBoost vs CatBoost
        • Linear Regression
        • LLM Evaluation Metrics
        • Local Interpretable Model-agnostic Explainations
        • Local Outlier Factor (LOF)
        • Logistic Regression
        • Logistic Regression does not predict probabilities
        • Logistic regression in sklearn & Gradient Descent
        • Logistic Regression Statsmodel Summary table
        • Loss function
        • Loss versus Cost function
        • Machine Learning
        • Machine Learning Operations
        • Manifold Learning
        • Markov Decision Processes
        • Maximum Likelihood Estimation
        • Median Absolute Error
        • Mermaid
        • Metadata Handling
        • Methods for Handling Outliers
        • Metric
        • Mini-batch gradient descent
        • Model Building
        • Model Deployment using PyCaret
        • Model Ensemble
        • Model Evaluation
        • Model Evaluation vs Model Optimisation
        • Model Interpretability
        • Model Observability
        • Model Optimisation
        • Model Parameters
        • Model Parameters Tuning
        • Model parameters vs hyperparameters
        • Model Selection
        • Model Validation
        • model-agnostic feature importance
        • Momentum
        • Moving Average Forecast
        • Multinomial Naive bayes
        • Multiple Linear Regression
        • Naive Bayes Classifier
        • Naive Forecast
        • Neural network
        • Neural Network Classification
        • Neural network in Practice
        • Neural Scaling Laws
        • Non-negative matrix factorization in ML
        • Non-parametric tests
        • Normalisation of data
        • Normalisation vs Standardisation
        • objective function
        • One-hot encoding
        • Optimisation function
        • Optimisation techniques
        • Optimising a Logistic Regression Model
        • Optimising Neural Networks
        • Optuna
        • Order matters in Boosting
        • Ordinary Least Squares
        • Orthogonalization
        • Outliers
        • Over parameterised models
        • PCA Explained Variance Ratio
        • PCA Principal Components
        • PCA-Based Anomaly Detection
        • PDP and ICE
        • Percentile Detection
        • Performance Drift
        • Polynomial Regression
        • Positional Encoding
        • Precision
        • Precision or Recall
        • Precision-Recall Curve
        • Prediction Intervals vs Confidence Interval
        • Principal Component Analysis
        • PyCaret
        • PyOD
        • PyTorch
        • Pytorch vs Tensorflow
        • Q-Learning
        • Random Forest
        • Random Forest for Time Series
        • Recall
        • Recommender systems
        • Recurrent Neural Networks
        • Regression
        • Regression Metrics
        • Regularisation
        • Regularisation of Tree based models
        • Reinforcement learning
        • Relationships in memory
        • Reward Function
        • Ridge
        • ROC (Receiver Operating Characteristic)
        • Sammon’s Mapping
        • SARIMA
        • Scikit-Learn
        • Secretary Problem
        • semi-structured data
        • Sentence Transformers
        • Sklearn Pipeline
        • Specificity
        • Spectral Clustering
        • Supervised Learning
        • Support Vector Classifier
        • Support Vector Machines
        • Support Vector Regression
        • Tensorflow
        • Test Loss When Evaluating Models
        • Text Classification
        • Time Series Python Packages
        • Train-Dev-Test Sets
        • Transfer Learning
        • Transformed Target Regressor
        • Transformer
        • Transformers vs RNNs
        • Type I Error (False Positive)
        • Type II Error (False Negative)
        • Types of Neural Networks
        • Typical Output Formats in Neural Networks
        • UMAP
        • Unsupervised Learning
        • Use Cases for a Simple Neural Network Like
        • vanishing and exploding gradients problem
        • Variability in linear models
        • Variance in ML
        • Vector Embedding
        • WCSS and elbow method
        • Weak Learners
        • When and why not to us regularisation
        • Why does increasing the number of models in a ensemble not necessarily improve the accuracy
        • Why does the Adam Optimizer converge
        • Why Removing Outliers May Improve Regression but Harm Classification
        • Why standardise features
        • Why Type 1 and Type 2 matter
        • Wrapper Methods
        • Xaiver
        • XGBoost
      • natural-language
        • AI Agents Memory
        • Attention mechanism
        • Bag of words
        • BERT
        • BERTScore
        • Chain of thought
        • ChatGPT
        • Claude
        • Comparing LLMs
        • Distillation
        • ElasticSearch
        • Embedded Methods
        • embeddings for OOV words
        • Evaluate Embedding Methods
        • Fuzzywuzzy
        • Generative AI
        • Generative AI From Theory to Practice
        • Grammar method
        • Guardrails
        • How businesses use Gen AI
        • How LLMs store facts
        • How to reduce the need for Gen AI responses
        • How would you decide between using TF-IDF and Word2Vec for text vectorization
        • In NER how would you handle ambiguous entities
        • Key Components of Attention and Formula
        • Knowledge graph vs RAG setup
        • Language Model Output Optimisation
        • Language Models
        • Language Models Large (LLMs) vs Small (SLMs)
        • lemmatization
        • LLM
        • LLM Memory
        • Local LLM use cases
        • Mathematical Reasoning in Transformers
        • Mixture of Experts
        • Model Cascading
        • Multi-head attention
        • Named Entity Recognition
        • NER Implementation
        • Ngrams
        • NLP
        • nltk
        • Non-negative Matrix Factorization
        • NotebookLM
        • OOV words
        • Pandas Dataframe Agent
        • Part of speech tagging
        • Prompt Engineering
        • prompt retrievers
        • Prompts
        • Pyright
        • RAG
        • Scaling Agentic Systems
        • Self attention vs multi-head attention
        • Self-Attention
        • Semantic Relationships
        • Semantic search
        • Sentence Similarity
        • Sentence Transformer Workflow
        • Similarity Search
        • Small Language Models
        • spaCy
        • Stemming
        • stopwords
        • Summarisation
        • syntactic relationships
        • Text2Cypher
        • TF-IDF
        • TF-IDF Implementation
        • Tokenisation
        • topic modeling
        • Vectorisation
        • Why is named entity recognition (NER) a challenging task
        • Word2vec
        • WordNet
      • OTHER
        • Addressing_Multicollinearity.py
        • Bag_of_Words.py
        • Bandit example output
        • Bandit_Example_Fixed.py
        • Click_Implementation.py
        • Comparing_Ensembles.py
        • Cross_Entropy_Single.py
        • Cross_Entropy.py
        • Debugging.py
        • Distribution_Analysis.py
        • Factor_Analysis.py
        • FastAPI_Example.py
        • Forecasting_AutoArima.py
        • Forecasting_Baseline.py
        • Forecasting_Exponential_Smoothing.py
        • Gaussian_Mixture_Model_Implementation.py
        • Handling_Missing_Data_Basic.ipynb
        • Handling_Missing_Data.ipynb
        • Imbalanced_Datasets_SMOTE.py
        • K_Means.py
        • Momentum.py
        • One_hot_encoding.py
        • Pandas_Common.py
        • Pandas_Stack.py
        • PCA_Analysis.ipynb
        • PCA_Based_Anomaly_Detection.py
        • Pycaret_Anomaly.ipynb
        • Pycaret_Example.py
        • Pydantic_More.py
        • Pydantic.py
        • Regression_Logistic_Metrics.ipynb
        • ROC_Curve.py
        • SVM_Example.py
        • Testing_Pytest.py
        • Testing_unittest.py
        • transfer_learning.py
        • TS_Anomaly_Detection.py
        • Vector_Embedding.py
        • Wikipedia_API.py
        • Word2Vec.py
      • PAPER
        • Attention Is All You Need
        • BERT Pretraining of Deep Bidirectional Transformers for Language Understanding
      • project-management
        • 1-on-1 Template
        • 1-to-1's with a Line Manager
        • Asking questions
        • Change Management
        • Communication principles
        • Communication Techniques
        • Communication with Stakeholders
        • Conceptual Model
        • Documentation
        • Education and Training
        • Experiment Plan Template
        • Feedback Template
        • Fishbone diagram
        • How to do git commit messages properly
        • html
        • Jobs to be done
        • Jupyter Book
        • Managing Data Science Teams
        • Modern data team
        • nbconvert slideshows
        • One Pager Template
        • pdoc
        • Problem Definition
        • Process for prototyping
        • project management
        • Project Management Portal
        • Pull Request Template
        • RACI
        • Remaining useful life models
        • Return of Experience Form
        • Reveal.js
        • Technical Debt
        • UML
        • Why use ER diagrams
      • statistics
        • Addressing Multicollinearity
        • ANOVA
        • Assumption of Normality
        • Bernoulli
        • Bootstrap Sampling
        • Casual Inference
        • Central Limit Theorem
        • Central Limit Theorem & Small Sample Sizes
        • Chi-Squared Test
        • Confidence Interval
        • Correlation
        • Correlation vs Causation
        • Cosine Similarity
        • Covariance
        • Covariance vs Correlation
        • Cryptography
        • Differentation
        • Distributions
        • EM Algorithm
        • Factor Analysis
        • Gaussian Distribution
        • Graph Theory
        • Grouped plots
        • Handling Different Distributions
        • Hypothesis testing
        • information theory
        • Interquartile Range (IQR) Detection
        • Johnson–Lindenstrauss lemma
        • Markov chain
        • Mathematics
        • Mean Absolute Error
        • Mean Squared Error
        • mean vs median
        • Multicollinearity
        • non-parametric
        • Odds
        • Odds vs Probability
        • p values
        • Parametric tests
        • parametric vs non-parametric models
        • parametric vs non-parametric tests
        • parsimonious
        • Prediction Intervals
        • Probability
        • Proportion Test
        • Q-Q Plot
        • R
        • R squared
        • R-squared metric not always a good indicator of model performance in regression
        • Reasoning tokens
        • Resampling
        • Root Mean Squared Error
        • Spearman vs Pearson Correlation
        • Standard deviation
        • Standardisation
        • Statistical Assumptions
        • Statistical Tests
        • Statistical theorems
        • Statistics
        • statsmodels
        • Stochastic Gradient Descent
        • Symbolic computation
        • Sympy
        • T-test
        • univariate vs multivariate
        • Variance
        • Violin plot
        • Z-Normalisation
        • Z-Score
        • Z-Scores vs Prediction Intervals
        • Z-Test
      • uncategorised
        • Bagging vs Boosting
        • Correlated Time Series
        • Databricks & dbt
        • Granger Causality Test
        • Investigate pyodbc
        • Mean reverting
        • NLP Portal
        • rolling mean vs cumulative mean
        • Science Portal
        • Time sampling
        • Untitled
        • Untitled 1
        • Untitled 2
        • Untitled 2axx
        • Why Use PySpark in Databricks
      • pages
        • Data Archive
        • DE_Tools
        • ML_Tools
        • Quotes
        • Research Questions
        • Reviews

    Folder: categories/uncategorised

    15 items under this folder.

    • 01 Nov 2025

      Bagging vs Boosting

      • boosting
      • clustering
      • ml
      • ml_process
    • 01 Nov 2025

      Correlated Time Series

      • analysis
      • ml_process
      • statistics
      • time_series
    • 01 Nov 2025

      Databricks & dbt

      • data_integration
      • data_modeling
      • data_pipeline
      • governance
    • 01 Nov 2025

      Granger Causality Test

    • 01 Nov 2025

      Mean reverting

    • 01 Nov 2025

      NLP Portal

      • portal
    • 01 Nov 2025

      Science Portal

    • 01 Nov 2025

      Time sampling

    • 01 Nov 2025

      Untitled 1

    • 01 Nov 2025

      Untitled 2

    • 01 Nov 2025

      Untitled 2axx

    • 01 Nov 2025

      Untitled

    • 01 Nov 2025

      Why Use PySpark in Databricks

    • 01 Nov 2025

      Investigate pyodbc

      • SQL
      • python
      • database
      • collection
    • 01 Nov 2025

      rolling mean vs cumulative mean

      • analysis
      • statistics
      • time_series

                        • categories
                          • computer-science
                            • Algorithms
                            • Big O Notation
                            • BM25 (Best Match 25)
                            • Checksum
                            • Computer Science
                            • Concurrency
                            • Convex Optimisation
                            • csv module
                            • Directed Acyclic Graph (DAG)
                            • Flask
                            • garbage collector
                            • Generators in Python
                            • Hash
                            • Heap Data Structure
                            • Heap Memory
                            • How to search within a graph
                            • Immutable vs mutable
                            • Java
                            • Java vs JavaScript
                            • JavaScript
                            • Knowledge Graph
                            • Langchain
                            • Machine Learning Algorithms
                            • Monte Carlo Simulation
                            • Multiprocessing vs Multithreading
                            • Multithreading
                            • neomodel
                            • Node.JS
                            • Numpy
                            • Processes vs Threads
                            • programming languages
                            • PyGraphviz
                            • QuickSort
                            • Ranking models
                            • Recursive Algorithm
                            • Strongly vs Weakly typed language
                            • Times Series Python Packages
                          • data-analysis
                            • Altair
                            • altair versus seaborn
                            • Binder
                            • Boxplot
                            • Dash
                            • Dashboarding
                            • Dashboards
                            • Data Analysis
                            • Data Analysis Portal
                            • Data Analyst
                            • Data Distribution
                            • Data Mining
                            • Data Product
                            • Data Reduction
                            • Data Visualisation
                            • DuckDB
                            • EDA
                            • ER Diagrams
                            • Heatmap
                            • Label encoding
                            • Linear Discriminant Analysis
                            • Log transformation
                            • Looker Studio
                            • MariaDB vs MySQL
                            • Melt
                            • Multiple Correspondence Analysis
                            • Multivariate Analysis
                            • OLAP
                            • Page Rank
                            • Parquet
                            • Plotly
                            • PowerBI
                            • Preprocessing
                            • Preprocessing Text Classification
                            • Seaborn
                            • SQL Window functions
                            • t-SNE
                            • Tableau
                          • data-engineering
                            • ACID Transaction
                            • Ada boosting
                            • Adding a database to PostgreSQL
                            • Aggregation
                            • Apache Iceberg
                            • Attack mitigation
                            • Attack types
                            • AWS Lambda
                            • Azure
                            • Benefits of Data Transformation
                            • Big Data
                            • BigQuery
                            • Cassandra
                            • Cloud Providers
                            • Coaching & Mentoring
                            • Columnar Storage
                            • Command Prompt
                            • Common Table Expression
                            • Components of the database
                            • Covering Index
                            • Crosstab
                            • CRUD
                            • CUDA
                            • Curse of dimensionality
                            • Cypher
                            • Data Architect
                            • Data Architecture
                            • Data Cleansing
                            • Data Contract
                            • Data Deployment
                            • Data Dictionary
                            • Data Drift
                            • Data Engineering
                            • Data Engineering Portal
                            • Data Engineering Tools
                            • Data Evaluation
                            • Data Hierarchy of Needs
                            • Data Integration
                            • Data Integrity
                            • Data Lake
                            • Data Lakehouse
                            • Data Leakage
                            • Data Lifecycle Management
                            • data lineage
                            • Data Management
                            • Data Modeling
                            • Data Observability
                            • Data Principles
                            • Data Quality
                            • Data Security
                            • Data Selection
                            • Data Sources
                            • Data Storage
                            • Data Transformation
                            • Data Transformation in Data Engineering
                            • Data Transformation with Pandas
                            • Data Validation
                            • Data Virtualization
                            • Data Warehouse
                            • Database
                            • Database Index
                            • Database Management System (DBMS)
                            • Database Schema
                            • Database Storage
                            • Database Techniques
                            • DataOps
                            • dbt 1
                            • design pattern
                            • Digital twin
                            • Distributed Computing
                            • DuckDB in python
                            • DuckDB vs SQLite
                            • Durability
                            • ELT
                            • Estimator
                            • ETL
                            • ETL 1
                            • ETL Pipeline Example
                            • ETL vs ELT
                            • EtLT
                            • Event Driven Microservices
                            • Event-Driven Architecture
                            • Fabric
                            • Faker
                            • File Management
                            • Folder Tree Diagram
                            • Foreign Key
                            • Github Actions
                            • Google Sheet Pivots Table
                            • Grain
                            • Graph Query Language
                            • Groupby
                            • Groupby vs Crosstab
                            • heterogeneous features
                            • Honkit
                            • Hosting
                            • How is schema evolution done in practice with SQL
                            • How to normalise a merged table
                            • Implementing Database Schema
                            • Imputation Techniques
                            • in-memory format
                            • incremental synchronization
                            • Indexing in cypher
                            • Input is Not Properly Sanitized
                            • Joining Datasets
                            • Junction Tables
                            • KNIME
                            • Logical Model
                            • Many-to-Many Relationships
                            • map reduce
                            • MariaDB
                            • master data management
                            • Merge
                            • Microsoft Access
                            • Missing Data
                            • Model Deployment
                            • Monolith Architecture
                            • Multi-level index
                            • Multiprocessing
                            • MySql
                            • neo4j
                            • Normalised Schema
                            • NoSQL
                            • Object Relational Mapper
                            • OLTP
                            • Overfitting
                            • Pandas
                            • Pandas join vs merge
                            • Pandas Pivot Table
                            • Pandas Stack
                            • pd.Grouper
                            • pgAdmin
                            • Pgadmin Permissions on Windows
                            • Physical Model
                            • Pickle
                            • Poetry
                            • Polars
                            • PostgreSQL
                            • Postman
                            • PowerShell
                            • Prevention Is Better Than The Cure
                            • Primary Key
                            • Push-Down
                            • Pydantic
                            • Pyright vs Pydantic
                            • Query Optimisation
                            • Querying
                            • Querying Time Series
                            • Race Conditions
                            • Relating Tables Together
                            • Relational Database
                            • reverse etl
                            • rollup
                            • Row parameters in SQL
                            • Row-based Storage
                            • Scalability
                            • Scaling Server
                            • Schema Evolution
                            • Search
                            • Security mitigation
                            • Security Researcher
                            • semantic layer
                            • Single Source of Truth
                            • Sklearn Pipiline
                            • Slowly Changing Dimension
                            • SMSS
                            • Snowflake Schema
                            • Soft Deletion
                            • Software Design Patterns
                            • Spreadsheets vs Databases
                            • SQL
                            • SQL Groupby
                            • SQL Injection
                            • SQL Joins
                            • SQLAlchemy
                            • SQLAlchemy vs. sqlite3
                            • SQLite
                            • SQLite Studio
                            • Star Schema
                            • storage layer object store
                            • Stored Procedures
                            • structured data
                            • Structuring and organizing data
                            • Transaction
                            • Turning a flat file into a database
                            • Types of Database Schema
                            • Unix
                            • unstructured data
                            • Usability
                            • Vacuum
                            • Vector Database
                            • Vectorized Engine
                            • View Use Case
                            • Views
                            • Windows Subsystem for Linux
                          • data-science
                            • ACF Plots
                            • Additive vs Multiplicative Models Time Series
                            • ADF Test
                            • Agent Exploration
                            • Agentic Solutions
                            • AI
                            • ARIMA
                            • ARIMA vs Random Forest in Time Series
                            • Autocorrelation
                            • Autocorrelation vs Autoregression
                            • Autoregression
                            • Baseline Forecast
                            • Basics of Time Series
                            • Batch gradient descent
                            • Bellman Equations
                            • Bias-Variance Trade Off
                            • Capability
                            • Choosing a Threshold
                            • Choosing the Number of Clusters
                            • Clustermap
                            • Covariance Structures
                            • Cross Validation
                            • Data Assessment
                            • Data Collection
                            • Data Mining - CRISP
                            • Data Preparation
                            • Data Science
                            • Data Scientist
                            • Data Understanding
                            • Datasets
                            • Decomposition in Time Series
                            • Differencing in Time Series
                            • DS & ML Portal
                            • Dynamic Time Warping
                            • Evaluating Time Series Forecasts
                            • Evolving Seasonality
                            • F-statistic
                            • Feature Engineering
                            • Feature Scaling
                            • Feature Selection vs Feature Importance
                            • Forecasting using Lags
                            • Forecasting with Autoregressive (AR) Models
                            • Forward Propagation
                            • Gaussian Mixture Models
                            • Gitlab
                            • Gompertz Model
                            • Good Enough Principle in Data Projects
                            • GraphRAG
                            • Handling Missing Data
                            • Holt-Winters (Exponential Smoothing)
                            • Holt-Winters vs ARIMA
                            • Holt’s Linear Trend Model (Double Exponential Smoothing)
                            • how do you do the data selection
                            • Imbalanced Datasets
                            • Interpolation
                            • Intervention Analysis
                            • Joining Time Series
                            • Kernel Machines
                            • KPSS Test
                            • Latency
                            • Logistic Model Curve
                            • LSTM in Time Series
                            • Mean Absolute Percentage Error
                            • MNIST
                            • Normalisation
                            • Out-of-sample rolling forecast evaluation
                            • PACF Plots
                            • Performance Dimensions
                            • pmdarima
                            • Properties of Time Series Models
                            • Prophet
                            • Random Forest Regression
                            • Residuals in Time Series
                            • Scatter Plots
                            • Scientific Method
                            • Scipy
                            • Seasonal Naive Forecast
                            • Seasonality in Time Series
                            • SHapley Additive exPlanations
                            • Shot Learning
                            • Silhouette Analysis
                            • Simple Exponential Smoothing (SES)
                            • sklearn datasets
                            • SMOTE (Synthetic Minority Over-sampling Technique)
                            • SparseCategorialCrossentropy or CategoricalCrossEntropy
                            • stack memory
                            • Stacking
                            • Stationary Time Series
                            • STL Decomposition
                            • Time Series
                            • Time Series Forecasting
                            • Time Series Forecasts in Business
                            • Time Series Learning Resources
                            • Time Series Shocks
                            • Trends in Time Series
                          • deep-learning
                            • Convolutional Neural Networks
                            • Deep Learning
                            • How is reinforcement learning being combined with deep learning
                            • LSTM
                            • Multi-Agent Reinforcement Learning
                            • Policy
                            • Relu
                            • Sarsa
                          • devops
                            • AB testing
                            • Alternatives to Batch Processing
                            • Amazon S3
                            • Apache Airflow
                            • Apache Kafka
                            • Apache Spark
                            • API
                            • API Driven Microservices
                            • Bash
                            • bat
                            • Batch Processing
                            • Batch vs PowerShell scripts
                            • Catalogs, Schemas, and Tables in Databricks
                            • CI-CD
                            • Click
                            • Clustering_Dashboard.py
                            • Code Diagrams
                            • Command Line
                            • Continuous Delivery - Deployment
                            • Continuous Integration
                            • Cron jobs
                            • dagster
                            • Data Ingestion
                            • Data Orchestration
                            • Data Pipeline
                            • Data Pipeline to Data Products
                            • Data Streaming
                            • Databricks
                            • Databricks vs Snowflake
                            • dbt
                            • Debugging
                            • Declarative Data Pipeline
                            • Delta Tables in Databricks
                            • dependency manager
                            • DevOps
                            • Devops Portal
                            • Digital Transformation
                            • Docker
                            • Docker Image
                            • Elastic Net
                            • Environment Variables
                            • Epub
                            • Event Driven
                            • Event Driven Events
                            • Everything
                            • Excel
                            • Excel pivot table
                            • Excel vs Google Sheets
                            • FastAPI
                            • Firebase
                            • frontend
                            • functional programming
                            • GIS
                            • Git
                            • Github Gists
                            • gitlab-ci.yml
                            • Global Interpreter Lock
                            • Google Cloud Platform
                            • Google Colab
                            • Google My Maps Data Extraction
                            • Google Sheets
                            • GPT
                            • Gradio
                            • Grep
                            • Hadoop
                            • Hugging Face
                            • imperative
                            • ipynb
                            • jinja template
                            • Json
                            • Json to SQLite
                            • jupytext
                            • Justfile
                            • kubernetes
                            • Load Balancing
                            • Loading Google Sheets into Databricks
                            • Maintainability
                            • Maintainable Code
                            • Makefile
                            • Master Observability Datadog
                            • Memory
                            • Memory Caching
                            • Microsoft
                            • MongoDB
                            • nbconvert
                            • NET
                            • Normalisation of Text
                            • Overwriting and Refreshing Tables in Databricks
                            • Pandas Series vs DataFrame
                            • Pandoc
                            • PMML
                            • Powerquery
                            • Powershell scripts
                            • Powershell versus Command Prompt
                            • Powershell vs Bash
                            • Publish and Subscribe
                            • PySpark
                            • Pytest
                            • Python
                            • Quartz
                            • Random Access Memory
                            • React
                            • Registering a Scheduled Task
                            • REST API
                            • Scala
                            • Security Vulnerabilities
                            • shapefile
                            • Sharepoint
                            • Snowflake
                            • Snowflake vs Hadoop
                            • Software Development Life Cycle
                            • Spark DataFrames in Databricks
                            • SQL vs NoSQL
                            • Streamlit
                            • Technical Design Doc Template
                            • Terminal commands
                            • Testing
                            • TOML
                            • tool.bandit
                            • tool.ruff
                            • tool.uv
                            • Types of Computational Bugs
                            • TypeScript
                            • Ubuntu
                            • unittest
                            • Using requirements or env.yml
                            • Vercel
                            • Virtual environments
                            • Web Feature Server (WFS)
                            • Web Map Tile Service (WMTS)
                            • Why JSON is Better than Pickle for Untrusted Data
                            • Windows
                            • Windows Scheduled Tasks
                            • yaml
                          • industry
                            • AI Engineer
                            • AI governance
                            • Analytics Engineer
                            • business intelligence
                            • Business observability
                            • Business Understanding
                            • Business Values
                            • Data AI Education at Work
                            • Data Engineer
                            • Data Governance
                            • data literacy
                            • Data Roles
                            • Data Steward
                            • Design Thinking Questions
                            • Documentation & Meetings
                            • Energy
                            • Energy ABM
                            • Energy Demand Forecasting
                            • Energy Storage
                            • Facts
                            • Gartner Hype Cycle
                            • Industries of interest
                            • Knowledge Work
                            • Managing People
                            • ML Engineer
                            • Network Design
                            • Operational Resilience for Growth and Adaptability
                            • Reporting
                            • Scaling Data Science Capability
                            • Smart Grids
                            • Telecommunications
                            • Thinking Systems
                            • Use of RNNs in energy sector
                            • Working with SMEs
                          • machine-learning
                            • Accuracy
                            • Activation atlases
                            • Activation Function
                            • Active Learning
                            • Adam Optimizer
                            • Adaptive Learning Rates
                            • Adjusted R squared
                            • Agent-Based Modelling
                            • AIC in Model Evaluation
                            • Anomaly Detection
                            • Anomaly Detection in Time Series
                            • Anomaly Detection with Clustering
                            • Anomaly Detection with Statistical Methods
                            • Assessing Gen AI generated content
                            • AUC
                            • Automated Feature Creation
                            • AutoML
                            • Backpropagation
                            • Bagging
                            • Batch Normalisation
                            • Bias in ML
                            • Binary Classification
                            • Boosting
                            • Business value of anomaly detection
                            • CART
                            • CatBoost
                            • Challenges to Model Deployment
                            • Class Separability
                            • Classification
                            • Classification Report
                            • Cluster Density
                            • Cluster Seperation
                            • Clustering
                            • Collaborative Filtering
                            • conceptual data model
                            • Confusion Matrix
                            • Cost Function
                            • Cost-Sensitive Analysis
                            • Cross Entropy
                            • Customer Growth Modeling
                            • Data Selection in ML
                            • Data Transformation in Machine Learning
                            • DBSCAN
                            • Decision Theory
                            • Decision Tree
                            • Decision Trees are Fragile
                            • Deep Learning Frameworks
                            • Deep Q-Learning
                            • Dendrograms
                            • Determining Threshold Values
                            • Dimension Table
                            • Dimensional Modelling
                            • Dimensionality Reduction
                            • Dimensions
                            • Distributions in Decision Tree Leaves
                            • Dropout
                            • Dummy variable trap
                            • Edge ML
                            • emergent behavior
                            • Encoding Categorical Variables
                            • Epoch
                            • Evaluating Language Models
                            • Evaluating Logistic Regression
                            • Evaluating the effectiveness of prompts
                            • Evaluation Metrics
                            • Exploration vs Exploitation
                            • Exponential Smoothing
                            • f-regression
                            • F1 Score
                            • Fact Table
                            • FAISS
                            • Feature Engineering for Time Series
                            • Feature Evaluation
                            • Feature Extraction
                            • Feature Importance
                            • Feature Selection
                            • Feature Transformations
                            • Feed Forward Neural Network
                            • Filter Methods
                            • Fitting weights and biases of a neural network
                            • Framework for models
                            • Gaussian Model
                            • General Linear Regression
                            • Generalisation
                            • Generative Adversarial Networks
                            • Gini Impurity
                            • Gini Impurity vs Cross Entropy
                            • Gradient Boosted Trees
                            • Gradient Boosting
                            • Gradient Boosting Regressor
                            • Gradient Descent
                            • Gradient descent in linear regression
                            • granularity
                            • Graph Neural Network
                            • Graph Theory Community
                            • GridSeachCv
                            • Growth Models in Time Series
                            • GRU
                            • Hierarchical Clustering
                            • High cross validation accuracy is not directly proportional to performance on unseen test data
                            • Histogram
                            • How do we evaluate of LLM Outputs
                            • How to use Sklearn Pipeline
                            • Hyperparameter
                            • Hyperparameter Tuning
                            • Impact of multicollinearity on model parameters
                            • Inertia K Means Cost Function
                            • inference
                            • inference versus prediction
                            • initialization methods
                            • Interoperability
                            • interoperable
                            • Interpretability
                            • Interpreting logistic regression model parameters
                            • Isolation Forest
                            • Jaccard Coefficient
                            • K-means
                            • K-nearest neighbours
                            • Keras
                            • Kernel Density Estimation
                            • Kernelling
                            • Kmeans vs GMM
                            • L1 Regularisation
                            • Label encoding vs One-hot encoding
                            • Labelling data
                            • Lagrange multipliers in optimisation
                            • lambda architecture
                            • Latent Dirichlet Allocation
                            • Latent Semantic Indexing
                            • LBFGS
                            • Learning Curve
                            • Learning Rate
                            • Learning Styles
                            • LightGBM
                            • LightGBM vs XGBoost vs CatBoost
                            • Linear Regression
                            • LLM Evaluation Metrics
                            • Local Interpretable Model-agnostic Explanations
                            • Local Outlier Factor (LOF)
                            • Logistic Regression
                            • Logistic Regression does not predict probabilities
                            • Logistic regression in sklearn & Gradient Descent
                            • Logistic Regression Statsmodel Summary table
                            • Loss function
                            • Loss versus Cost function
                            • Machine Learning
                            • Machine Learning Operations
                            • Manifold Learning
                            • Markov Decision Processes
                            • Maximum Likelihood Estimation
                            • Median Absolute Error
                            • Mermaid
                            • Metadata Handling
                            • Methods for Handling Outliers
                            • Metric
                            • Mini-batch gradient descent
                            • Model Building
                            • Model Deployment using PyCaret
                            • Model Ensemble
                            • Model Evaluation
                            • Model Evaluation vs Model Optimisation
                            • Model Interpretability
                            • Model Observability
                            • Model Optimisation
                            • Model Parameters
                            • Model Parameters Tuning
                            • Model parameters vs hyperparameters
                            • Model Selection
                            • Model Validation
                            • model-agnostic feature importance
                            • Momentum
                            • Moving Average Forecast
                            • Multinomial Naive bayes
                            • Multiple Linear Regression
                            • Naive Bayes Classifier
                            • Naive Forecast
                            • Neural network
                            • Neural Network Classification
                            • Neural network in Practice
                            • Neural Scaling Laws
                            • Non-negative matrix factorization in ML
                            • Non-parametric tests
                            • Normalisation of data
                            • Normalisation vs Standardisation
                            • objective function
                            • One-hot encoding
                            • Optimisation function
                            • Optimisation techniques
                            • Optimising a Logistic Regression Model
                            • Optimising Neural Networks
                            • Optuna
                            • Order matters in Boosting
                            • Ordinary Least Squares
                            • Orthogonalization
                            • Outliers
                            • Over parameterised models
                            • PCA Explained Variance Ratio
                            • PCA Principal Components
                            • PCA-Based Anomaly Detection
                            • PDP and ICE
                            • Percentile Detection
                            • Performance Drift
                            • Polynomial Regression
                            • Positional Encoding
                            • Precision
                            • Precision or Recall
                            • Precision-Recall Curve
                            • Prediction Intervals vs Confidence Interval
                            • Principal Component Analysis
                            • PyCaret
                            • PyOD
                            • PyTorch
                            • Pytorch vs Tensorflow
                            • Q-Learning
                            • Random Forest
                            • Random Forest for Time Series
                            • Recall
                            • Recommender systems
                            • Recurrent Neural Networks
                            • Regression
                            • Regression Metrics
                            • Regularisation
                            • Regularisation of Tree based models
                            • Reinforcement learning
                            • Relationships in memory
                            • Reward Function
                            • Ridge
                            • ROC (Receiver Operating Characteristic)
                            • Sammon’s Mapping
                            • SARIMA
                            • Scikit-Learn
                            • Secretary Problem
                            • semi-structured data
                            • Sentence Transformers
                            • Sklearn Pipeline
                            • Specificity
                            • Spectral Clustering
                            • Supervised Learning
                            • Support Vector Classifier
                            • Support Vector Machines
                            • Support Vector Regression
                            • Tensorflow
                            • Test Loss When Evaluating Models
                            • Text Classification
                            • Time Series Python Packages
                            • Train-Dev-Test Sets
                            • Transfer Learning
                            • Transformed Target Regressor
                            • Transformer
                            • Transformers vs RNNs
                            • Type I Error (False Positive)
                            • Type II Error (False Negative)
                            • Types of Neural Networks
                            • Typical Output Formats in Neural Networks
                            • UMAP
                            • Unsupervised Learning
                            • Use Cases for a Simple Neural Network Like
                            • vanishing and exploding gradients problem
                            • Variability in linear models
                            • Variance in ML
                            • Vector Embedding
                            • WCSS and elbow method
                            • Weak Learners
                            • When and why not to use regularisation
                            • Why does increasing the number of models in a ensemble not necessarily improve the accuracy
                            • Why does the Adam Optimizer converge
                            • Why Removing Outliers May Improve Regression but Harm Classification
                            • Why standardise features
                            • Why Type 1 and Type 2 matter
                            • Wrapper Methods
                            • Xavier
                            • XGBoost
                          • natural-language
                            • AI Agents Memory
                            • Attention mechanism
                            • Bag of words
                            • BERT
                            • BERTScore
                            • Chain of thought
                            • ChatGPT
                            • Claude
                            • Comparing LLMs
                            • Distillation
                            • ElasticSearch
                            • Embedded Methods
                            • embeddings for OOV words
                            • Evaluate Embedding Methods
                            • Fuzzywuzzy
                            • Generative AI
                            • Generative AI From Theory to Practice
                            • Grammar method
                            • Guardrails
                            • How businesses use Gen AI
                            • How LLMs store facts
                            • How to reduce the need for Gen AI responses
                            • How would you decide between using TF-IDF and Word2Vec for text vectorization
                            • In NER how would you handle ambiguous entities
                            • Key Components of Attention and Formula
                            • Knowledge graph vs RAG setup
                            • Language Model Output Optimisation
                            • Language Models
                            • Language Models Large (LLMs) vs Small (SLMs)
                            • lemmatization
                            • LLM
                            • LLM Memory
                            • Local LLM use cases
                            • Mathematical Reasoning in Transformers
                            • Mixture of Experts
                            • Model Cascading
                            • Multi-head attention
                            • Named Entity Recognition
                            • NER Implementation
                            • Ngrams
                            • NLP
                            • nltk
                            • Non-negative Matrix Factorization
                            • NotebookLM
                            • OOV words
                            • Pandas Dataframe Agent
                            • Part of speech tagging
                            • Prompt Engineering
                            • prompt retrievers
                            • Prompts
                            • Pyright
                            • RAG
                            • Scaling Agentic Systems
                            • Self attention vs multi-head attention
                            • Self-Attention
                            • Semantic Relationships
                            • Semantic search
                            • Sentence Similarity
                            • Sentence Transformer Workflow
                            • Similarity Search
                            • Small Language Models
                            • spaCy
                            • Stemming
                            • stopwords
                            • Summarisation
                            • syntactic relationships
                            • Text2Cypher
                            • TF-IDF
                            • TF-IDF Implementation
                            • Tokenisation
                            • topic modeling
                            • Vectorisation
                            • Why is named entity recognition (NER) a challenging task
                            • Word2vec
                            • WordNet
                          • OTHER
                            • Addressing_Multicollinearity.py
                            • Bag_of_Words.py
                            • Bandit example output
                            • Bandit_Example_Fixed.py
                            • Click_Implementation.py
                            • Comparing_Ensembles.py
                            • Cross_Entropy_Single.py
                            • Cross_Entropy.py
                            • Debugging.py
                            • Distribution_Analysis.py
                            • Factor_Analysis.py
                            • FastAPI_Example.py
                            • Forecasting_AutoArima.py
                            • Forecasting_Baseline.py
                            • Forecasting_Exponential_Smoothing.py
                            • Gaussian_Mixture_Model_Implementation.py
                            • Handling_Missing_Data_Basic.ipynb
                            • Handling_Missing_Data.ipynb
                            • Imbalanced_Datasets_SMOTE.py
                            • K_Means.py
                            • Momentum.py
                            • One_hot_encoding.py
                            • Pandas_Common.py
                            • Pandas_Stack.py
                            • PCA_Analysis.ipynb
                            • PCA_Based_Anomaly_Detection.py
                            • Pycaret_Anomaly.ipynb
                            • Pycaret_Example.py
                            • Pydantic_More.py
                            • Pydantic.py
                            • Regression_Logistic_Metrics.ipynb
                            • ROC_Curve.py
                            • SVM_Example.py
                            • Testing_Pytest.py
                            • Testing_unittest.py
                            • transfer_learning.py
                            • TS_Anomaly_Detection.py
                            • Vector_Embedding.py
                            • Wikipedia_API.py
                            • Word2Vec.py
                          • PAPER
                            • Attention Is All You Need
                            • BERT Pretraining of Deep Bidirectional Transformers for Language Understanding
                          • project-management
                            • 1-on-1 Template
                            • 1-to-1's with a Line Manager
                            • Asking questions
                            • Change Management
                            • Communication principles
                            • Communication Techniques
                            • Communication with Stakeholders
                            • Conceptual Model
                            • Documentation
                            • Education and Training
                            • Experiment Plan Template
                            • Feedback Template
                            • Fishbone diagram
                            • How to do git commit messages properly
                            • html
                            • Jobs to be done
                            • Jupyter Book
                            • Managing Data Science Teams
                            • Modern data team
                            • nbconvert slideshows
                            • One Pager Template
                            • pdoc
                            • Problem Definition
                            • Process for prototyping
                            • project management
                            • Project Management Portal
                            • Pull Request Template
                            • RACI
                            • Remaining useful life models
                            • Return of Experience Form
                            • Reveal.js
                            • Technical Debt
                            • UML
                            • Why use ER diagrams
                          • statistics
                            • Addressing Multicollinearity
                            • ANOVA
                            • Assumption of Normality
                            • Bernoulli
                            • Bootstrap Sampling
                            • Causal Inference
                            • Central Limit Theorem
                            • Central Limit Theorem & Small Sample Sizes
                            • Chi-Squared Test
                            • Confidence Interval
                            • Correlation
                            • Correlation vs Causation
                            • Cosine Similarity
                            • Covariance
                            • Covariance vs Correlation
                            • Cryptography
                            • Differentiation
                            • Distributions
                            • EM Algorithm
                            • Factor Analysis
                            • Gaussian Distribution
                            • Graph Theory
                            • Grouped plots
                            • Handling Different Distributions
                            • Hypothesis testing
                            • information theory
                            • Interquartile Range (IQR) Detection
                            • Johnson–Lindenstrauss lemma
                            • Markov chain
                            • Mathematics
                            • Mean Absolute Error
                            • Mean Squared Error
                            • mean vs median
                            • Multicollinearity
                            • non-parametric
                            • Odds
                            • Odds vs Probability
                            • p values
                            • Parametric tests
                            • parametric vs non-parametric models
                            • parametric vs non-parametric tests
                            • parsimonious
                            • Prediction Intervals
                            • Probability
                            • Proportion Test
                            • Q-Q Plot
                            • R
                            • R squared
                            • R-squared metric not always a good indicator of model performance in regression
                            • Reasoning tokens
                            • Resampling
                            • Root Mean Squared Error
                            • Spearman vs Pearson Correlation
                            • Standard deviation
                            • Standardisation
                            • Statistical Assumptions
                            • Statistical Tests
                            • Statistical theorems
                            • Statistics
                            • statsmodels
                            • Stochastic Gradient Descent
                            • Symbolic computation
                            • Sympy
                            • T-test
                            • univariate vs multivariate
                            • Variance
                            • Violin plot
                            • Z-Normalisation
                            • Z-Score
                            • Z-Scores vs Prediction Intervals
                            • Z-Test
                          • uncategorised
                            • Bagging vs Boosting
                            • Correlated Time Series
                            • Databricks & dbt
                            • Granger Causality Test
                            • Investigate pyodbc
                            • Mean reverting
                            • NLP Portal
                            • rolling mean vs cumulative mean
                            • Science Portal
                            • Time sampling
                            • Untitled
                            • Untitled 1
                            • Untitled 2
                            • Untitled 2axx
                            • Why Use PySpark in Databricks
                          • pages
                            • Data Archive
                            • DE_Tools
                            • ML_Tools
                            • Quotes
                            • Research Questions
                            • Reviews

