Data Archive

    • categories
      • computer-science
        • Algorithms
        • Big O Notation
        • BM25 (Best Match 25)
        • Checksum
        • Computer Science
        • Concurrency
        • Convex Optimisation
        • csv module
        • Directed Acyclic Graph (DAG)
        • Flask
        • garbage collector
        • Generators in Python
        • Hash
        • Heap Data Structure
        • Heap Memory
        • How to search within a graph
        • Immutable vs mutable
        • Java
        • Java vs JavaScript
        • JavaScript
        • Knowledge Graph
        • Langchain
        • Machine Learning Algorithms
        • Monte Carlo Simulation
        • Multiprocessing vs Multithreading
        • Multithreading
        • neomodel
        • Node.JS
        • Numpy
        • Processes vs Threads
        • programming languages
        • PyGraphviz
        • QuickSort
        • Ranking models
        • Recursive Algorithm
        • Strongly vs Weakly typed language
        • Times Series Python Packages
      • data-analysis
        • Altair
        • altair versus seaborn
        • Binder
        • Boxplot
        • Dash
        • Dashboarding
        • Dashboards
        • Data Analysis
        • Data Analysis Portal
        • Data Analyst
        • Data Distribution
        • Data Mining
        • Data Product
        • Data Reduction
        • Data Visualisation
        • DuckDB
        • EDA
        • ER Diagrams
        • Heatmap
        • Label encoding
        • Linear Discriminant Analysis
        • Log transformation
        • Looker Studio
        • MariaDB vs MySQL
        • Melt
        • Multiple Correspondence Analysis
        • Multivariate Analysis
        • OLAP
        • Page Rank
        • Parquet
        • Plotly
        • PowerBI
        • Preprocessing
        • Preprocessing Text Classification
        • Seaborn
        • SQL Window functions
        • t-SNE
        • Tableau
      • data-engineering
        • ACID Transaction
        • Ada boosting
        • Adding a database to PostgreSQL
        • Aggregation
        • Apache Iceberg
        • Attack mitigation
        • Attack types
        • AWS Lambda
        • Azure
        • Bagging
        • Benefits of Data Transformation
        • Big Data
        • BigQuery
        • Cassandra
        • Cloud Providers
        • Coaching & Mentoring
        • Columnar Storage
        • Command Prompt
        • Common Table Expression
        • Components of the database
        • Covering Index
        • Crosstab
        • CRUD
        • CUDA
        • Curse of dimensionality
        • Cypher
        • Data Architect
        • Data Architecture
        • Data Cleansing
        • Data Contract
        • Data Deployment
        • Data Dictionary
        • Data Drift
        • Data Engineering
        • Data Engineering Portal
        • Data Engineering Tools
        • Data Evaluation
        • Data Hierarchy of Needs
        • Data Integration
        • Data Integrity
        • Data Lake
        • Data Lakehouse
        • Data Leakage
        • Data Lifecycle Management
        • data lineage
        • Data Management
        • Data Modeling
        • Data Observability
        • Data Principles
        • Data Quality
        • Data Security
        • Data Selection
        • Data Sources
        • Data Storage
        • Data Transformation
        • Data Transformation in Data Engineering
        • Data Transformation with Pandas
        • Data Validation
        • Data Virtualization
        • Data Warehouse
        • Database
        • Database Index
        • Database Management System (DBMS)
        • Database Schema
        • Database Storage
        • Database Techniques
        • Databricks 1
        • DataOps
        • dbt 1
        • design pattern
        • Digital twin
        • Distributed Computing
        • DuckDB in python
        • DuckDB vs SQLite
        • Durability
        • ELT
        • Estimator
        • ETL
        • ETL 1
        • ETL Pipeline Example
        • ETL vs ELT
        • EtLT
        • Event Driven Microservices
        • Event-Driven Architecture
        • Fabric
        • Faker
        • File Management
        • Folder Tree Diagram
        • Foreign Key
        • Github Actions
        • Google Sheet Pivots Table
        • Grain
        • Graph Query Language
        • Groupby
        • Groupby vs Crosstab
        • heterogeneous features
        • Honkit
        • Hosting
        • How is schema evolution done in practice with SQL
        • How to normalise a merged table
        • Implementing Database Schema
        • Imputation Techniques
        • in-memory format
        • incremental synchronization
        • Indexing in cypher
        • Input is Not Properly Sanitized
        • Joining Datasets
        • Junction Tables
        • KNIME
        • Logical Model
        • Many-to-Many Relationships
        • map reduce
        • MariaDB
        • master data management
        • Merge
        • Microsoft Access
        • Missing Data
        • Model Deployment
        • Monolith Architecture
        • Multi-level index
        • Multiprocessing
        • MySql
        • neo4j
        • Normalised Schema
        • NoSQL
        • Object Relational Mapper
        • OLTP
        • Overfitting
        • Pandas
        • Pandas join vs merge
        • Pandas Pivot Table
        • Pandas Stack
        • pd.Grouper
        • pgAdmin
        • Pgadmin Permissions on Windows
        • Physical Model
        • Pickle
        • Poetry
        • Polars
        • PostgreSQL
        • Postman
        • PowerShell
        • Prevention Is Better Than The Cure
        • Primary Key
        • Push-Down
        • Pydantic
        • Pyright vs Pydantic
        • Query Optimisation
        • Querying
        • Querying Time Series
        • Race Conditions
        • Relating Tables Together
        • Relational Database
        • reverse etl
        • rollup
        • Row parameters in SQL
        • Row-based Storage
        • Scalability
        • Scaling Server
        • Schema Evolution
        • Search
        • Security mitigation
        • Security Researcher
        • semantic layer
        • Single Source of Truth
        • Sklearn Pipiline
        • Slowly Changing Dimension
        • SMSS
        • Snowflake Schema
        • Soft Deletion
        • Software Design Patterns
        • Spreadsheets vs Databases
        • SQL
        • SQL Groupby
        • SQL Injection
        • SQL Joins
        • SQLAlchemy
        • SQLAlchemy vs. sqlite3
        • SQLite
        • SQLite Studio
        • Star Schema
        • storage layer object store
        • Stored Procedures
        • structured data
        • Structuring and organizing data
        • Transaction
        • Turning a flat file into a database
        • Types of Database Schema
        • Unix
        • unstructured data
        • Usability
        • Vacuum
        • Vector Database
        • Vectorized Engine
        • View Use Case
        • Views
        • Windows Subsystem for Linux
      • data-science
        • ACF Plots
        • Additive vs Multiplicative Models Time Series
        • ADF Test
        • Agent Exploration
        • Agentic Solutions
        • AI
        • ARIMA
        • ARIMA vs Random Forest in Time Series
        • Autocorrelation
        • Autocorrelation vs Autoregression
        • Autoregression
        • Baseline Forecast
        • Basics of Time Series
        • Batch gradient descent
        • Bellman Equations
        • Bias-Variance Trade Off
        • Capability
        • Choosing a Threshold
        • Choosing the Number of Clusters
        • Clustermap
        • Covariance Structures
        • Cross Validation
        • Data Assessment
        • Data Collection
        • Data Mining - CRISP
        • Data Preparation
        • Data Science
        • Data Scientist
        • Data Understanding
        • Datasets
        • Decomposition in Time Series
        • Differencing in Time Series
        • DS & ML Portal
        • Evaluating Time Series Forecasts
        • Evolving Seasonality
        • F-statistic
        • Feature Engineering
        • Feature Scaling
        • Feature Selection vs Feature Importance
        • Forecasting using Lags
        • Forward Propagation
        • Gaussian Mixture Models
        • Gitlab
        • Gompertz Model
        • Good Enough Principle in Data Projects
        • GraphRAG
        • Handling Missing Data
        • Holt-Winters (Exponential Smoothing)
        • Holt-Winters vs ARIMA
        • Holt’s Linear Trend Model (Double Exponential Smoothing)
        • how do you do the data selection
        • Imbalanced Datasets
        • Interpolation
        • Intervention Analysis
        • Joining Time Series
        • Kernel Machines
        • KPSS Test
        • Latency
        • Logistic Model Curve
        • LSTM in Time Series
        • Mean Absolute Percentage Error
        • MNIST
        • Normalisation
        • Out-of-sample rolling forecast evaluation
        • PACF Plots
        • Performance Dimensions
        • pmdarima
        • Properties of Time Series Models
        • Random Forest Regression
        • Residuals in Time Series
        • Scatter Plots
        • Scientific Method
        • Scipy
        • Seasonal Naive Forecast
        • Seasonality in Time Series
        • SHapley Additive exPlanations
        • Shot Learning
        • Silhouette Analysis
        • Simple Exponential Smoothing (SES)
        • sklearn datasets
        • SMOTE (Synthetic Minority Over-sampling Technique)
        • SparseCategorialCrossentropy or CategoricalCrossEntropy
        • stack memory
        • Stacking
        • Stationary Time Series
        • STL Decomposition
        • Time Series
        • Time Series Forecasting
        • Time Series Forecasts in Business
        • Time Series Learning Resources
        • Time Series Shocks
        • Trends in Time Series
      • deep-learning
        • Convolutional Neural Networks
        • Deep Learning
        • How is reinforcement learning being combined with deep learning
        • LSTM
        • Multi-Agent Reinforcement Learning
        • Policy
        • Relu
        • Sarsa
      • devops
        • AB testing
        • Alternatives to Batch Processing
        • Amazon S3
        • Apache Airflow
        • Apache Kafka
        • Apache Spark
        • API
        • API Driven Microservices
        • Bash
        • bat
        • Batch Processing
        • Batch vs PowerShell scripts
        • CI-CD
        • Clustering_Dashboard.py
        • Code Diagrams
        • Command Line
        • Continuous Delivery - Deployment
        • Continuous Integration
        • Cron jobs
        • dagster
        • Data Ingestion
        • Data Orchestration
        • Data Pipeline
        • Data Pipeline to Data Products
        • Data Streaming
        • Databricks
        • Databricks vs Snowflake
        • dbt
        • Debugging
        • Declarative Data Pipeline
        • dependency manager
        • DevOps
        • Devops Portal
        • Digital Transformation
        • Docker
        • Docker Image
        • Elastic Net
        • Environment Variables
        • Epub
        • Event Driven
        • Event Driven Events
        • Everything
        • Excel
        • Excel pivot table
        • Excel vs Google Sheets
        • FastAPI
        • Firebase
        • frontend
        • functional programming
        • GIS
        • Git
        • Github Gists
        • gitlab-ci.yml
        • Global Interpreter Lock
        • Google Cloud Platform
        • Google Colab
        • Google My Maps Data Extraction
        • Google Sheets
        • GPT
        • Gradio
        • Grep
        • Hadoop
        • Hugging Face
        • imperative
        • ipynb
        • jinja template
        • Json
        • Json to SQLite
        • jupytext
        • Justfile
        • kubernetes
        • Load Balancing
        • Maintainability
        • Maintainable Code
        • Makefile
        • Master Observability Datadog
        • Memory
        • Memory Caching
        • Microsoft
        • MongoDB
        • nbconvert
        • NET
        • Normalisation of Text
        • Pandas Series vs DataFrame
        • Pandoc
        • PMML
        • Powerquery
        • Powershell scripts
        • Powershell versus Command Prompt
        • Powershell vs Bash
        • Publish and Subscribe
        • PySpark
        • Pytest
        • Python
        • Python Click
        • Quartz
        • Random Access Memory
        • React
        • Registering a Scheduled Task
        • REST API
        • Scala
        • Security Vulnerabilities
        • shapefile
        • Sharepoint
        • Snowflake
        • Snowflake vs Hadoop
        • Software Development Life Cycle
        • SQL vs NoSQL
        • Streamlit
        • Technical Design Doc Template
        • Terminal commands
        • Testing
        • TOML
        • tool.bandit
        • tool.ruff
        • tool.uv
        • Types of Computational Bugs
        • TypeScript
        • Ubuntu
        • unittest
        • Vercel
        • Virtual environments
        • Web Feature Server (WFS)
        • Web Map Tile Service (WMTS)
        • Why JSON is Better than Pickle for Untrusted Data
        • Windows
        • Windows Scheduled Tasks
        • yaml
      • industry
        • AI Engineer
        • AI governance
        • Analytics Engineer
        • business intelligence
        • Business observability
        • Business Understanding
        • Business Values
        • Data AI Education at Work
        • Data Engineer
        • Data Governance
        • data literacy
        • Data Roles
        • Data Steward
        • Design Thinking Questions
        • Documentation & Meetings
        • Energy
        • Energy ABM
        • Energy Demand Forecasting
        • Energy Storage
        • Facts
        • Gartner Hype Cycle
        • Industries of interest
        • Knowledge Work
        • Managing People
        • ML Engineer
        • Network Design
        • Operational Resilience for Growth and Adaptability
        • Reporting
        • Scaling Data Science Capability
        • Smart Grids
        • Telecommunications
        • Thinking Systems
        • Use of RNNs in energy sector
        • Working with SMEs
      • machine-learning
        • Accuracy
        • Activation atlases
        • Activation Function
        • Active Learning
        • Adam Optimizer
        • Adaptive Learning Rates
        • Adjusted R squared
        • Agent-Based Modelling
        • AIC in Model Evaluation
        • Anomaly Detection
        • Anomaly Detection in Time Series
        • Anomaly Detection with Clustering
        • Anomaly Detection with Statistical Methods
        • Assessing Gen AI generated content
        • AUC
        • Automated Feature Creation
        • AutoML
        • Backpropagation
        • Batch Normalisation
        • Bias in ML
        • Binary Classification
        • Boosting
        • Business value of anomaly detection
        • CART
        • CatBoost
        • Challenges to Model Deployment
        • Class Separability
        • Classification
        • Classification Report
        • Cluster Density
        • Cluster Seperation
        • Clustering
        • Collaborative Filtering
        • conceptual data model
        • Confusion Matrix
        • Cost Function
        • Cost-Sensitive Analysis
        • Cross Entropy
        • Customer Growth Modeling
        • Data Selection in ML
        • Data Transformation in Machine Learning
        • DBSCAN
        • Decision Theory
        • Decision Tree
        • Decision Trees are Fragile
        • Deep Learning Frameworks
        • Deep Q-Learning
        • Dendrograms
        • Determining Threshold Values
        • Dimension Table
        • Dimensional Modelling
        • Dimensionality Reduction
        • Dimensions
        • Distributions in Decision Tree Leaves
        • Dropout
        • Dummy variable trap
        • Edge ML
        • emergent behavior
        • Encoding Categorical Variables
        • Epoch
        • Evaluating Language Models
        • Evaluating Logistic Regression
        • Evaluating the effectiveness of prompts
        • Evaluation Metrics
        • Exploration vs Exploitation
        • Exponential Smoothing
        • f-regression
        • F1 Score
        • Fact Table
        • FAISS
        • Feature Engineering for Time Series
        • Feature Evaluation
        • Feature Extraction
        • Feature Importance
        • Feature Selection
        • Feature Transformations
        • Feed Forward Neural Network
        • Filter Methods
        • Fitting weights and biases of a neural network
        • Framework for models
        • Gaussian Model
        • General Linear Regression
        • Generalisation
        • Generative Adversarial Networks
        • Gini Impurity
        • Gini Impurity vs Cross Entropy
        • Gradient Boosted Trees
        • Gradient Boosting
        • Gradient Boosting Regressor
        • Gradient Descent
        • Gradient descent in linear regression
        • granularity
        • Graph Neural Network
        • Graph Theory Community
        • GridSeachCv
        • Growth Models in Time Series
        • GRU
        • Hierarchical Clustering
        • High cross validation accuracy is not directly proportional to performance on unseen test data
        • Histogram
        • How do we evaluate of LLM Outputs
        • How to use Sklearn Pipeline
        • Hyperparameter
        • Hyperparameter Tuning
        • Impact of multicollinearity on model parameters
        • Inertia K Means Cost Function
        • inference
        • inference versus prediction
        • initialization methods
        • Interoperability
        • interoperable
        • Interpretability
        • Interpreting logistic regression model parameters
        • Isolated Forest
        • Jaccard Coefficient
        • K-means
        • K-nearest neighbours
        • Keras
        • Kernel Density Estimation
        • Kernelling
        • Kmeans vs GMM
        • L1 Regularisation
        • Label encoding vs One-hot encoding
        • Labelling data
        • Lagrange multipliers in optimisation
        • lambda architecture
        • Latent Dirichlet Allocation
        • Latent Semantic Indexing
        • LBFGS
        • Learning Curve
        • Learning Rate
        • Learning Styles
        • LightGBM
        • LightGBM vs XGBoost vs CatBoost
        • Linear Regression
        • LLM Evaluation Metrics
        • Local Interpretable Model-agnostic Explainations
        • Local Outlier Factor (LOF)
        • Logistic Regression
        • Logistic Regression does not predict probabilities
        • Logistic regression in sklearn & Gradient Descent
        • Logistic Regression Statsmodel Summary table
        • Loss function
        • Loss versus Cost function
        • Machine Learning
        • Machine Learning Operations
        • Manifold Learning
        • Markov Decision Processes
        • Maximum Likelihood Estimation
        • Median Absolute Error
        • Mermaid
        • Metadata Handling
        • Methods for Handling Outliers
        • Metric
        • Mini-batch gradient descent
        • MLOPS for Time Series
        • Model Building
        • Model Deployment using PyCaret
        • Model Ensemble
        • Model Evaluation
        • Model Evaluation vs Model Optimisation
        • Model Interpretability
        • Model Observability
        • Model Optimisation
        • Model Parameters
        • Model Parameters Tuning
        • Model parameters vs hyperparameters
        • Model Selection
        • Model Validation
        • model-agnostic feature importance
        • Momentum
        • Moving Average Forecast
        • Multinomial Naive bayes
        • Multiple Linear Regression
        • Naive Bayes Classifier
        • Naive Forecast
        • Neural network
        • Neural Network Classification
        • Neural network in Practice
        • Neural Scaling Laws
        • Non-negative matrix factorization in ML
        • Non-parametric tests
        • Normalisation of data
        • Normalisation vs Standardisation
        • objective function
        • One-hot encoding
        • Optimisation function
        • Optimisation techniques
        • Optimising a Logistic Regression Model
        • Optimising Neural Networks
        • Optuna
        • Ordinary Least Squares
        • Orthogonalization
        • Outliers
        • Over parameterised models
        • PCA Explained Variance Ratio
        • PCA Principal Components
        • PCA-Based Anomaly Detection
        • PDP and ICE
        • Percentile Detection
        • Performance Drift
        • Polynomial Regression
        • Positional Encoding
        • Precision
        • Precision or Recall
        • Precision-Recall Curve
        • Prediction Intervals vs Confidence Interval
        • Principal Component Analysis
        • PyCaret
        • PyOD
        • PyTorch
        • Pytorch vs Tensorflow
        • Q-Learning
        • Random Forest
        • Random Forest for Time Series
        • Recall
        • Recommender systems
        • Recurrent Neural Networks
        • Regression
        • Regression Metrics
        • Regularisation
        • Regularisation of Tree based models
        • Reinforcement learning
        • Relationships in memory
        • Reward Function
        • Ridge
        • ROC (Receiver Operating Characteristic)
        • Sammon’s Mapping
        • SARIMA
        • Scikit-Learn
        • Secretary Problem
        • semi-structured data
        • Sentence Transformers
        • Sklearn Pipeline
        • Specificity
        • Spectral Clustering
        • Supervised Learning
        • Support Vector Classifier
        • Support Vector Machines
        • Support Vector Regression
        • Tensorflow
        • Test Loss When Evaluating Models
        • Text Classification
        • Time Series Python Packages
        • Train-Dev-Test Sets
        • Transfer Learning
        • Transformed Target Regressor
        • Transformer
        • Transformers vs RNNs
        • Type I Error (False Positive)
        • Type II Error (False Negative)
        • Types of Neural Networks
        • Typical Output Formats in Neural Networks
        • UMAP
        • Unsupervised Learning
        • Use Cases for a Simple Neural Network Like
        • vanishing and exploding gradients problem
        • Variability in linear models
        • Variance in ML
        • Vector Embedding
        • WCSS and elbow method
        • Weak Learners
        • When and why not to us regularisation
        • Why does increasing the number of models in a ensemble not necessarily improve the accuracy
        • Why does the Adam Optimizer converge
        • Why Removing Outliers May Improve Regression but Harm Classification
        • Why standardise features
        • Why Type 1 and Type 2 matter
        • Wrapper Methods
        • Xaiver
        • XGBoost
      • natural-language
        • AI Agents Memory
        • Attention mechanism
        • Bag of words
        • BERT
        • BERTScore
        • Chain of thought
        • ChatGPT
        • Claude
        • Comparing LLMs
        • Distillation
        • ElasticSearch
        • Embedded Methods
        • embeddings for OOV words
        • Evaluate Embedding Methods
        • Fuzzywuzzy
        • Generative AI
        • Generative AI From Theory to Practice
        • Grammar method
        • Guardrails
        • How businesses use Gen AI
        • How LLMs store facts
        • How to reduce the need for Gen AI responses
        • How would you decide between using TF-IDF and Word2Vec for text vectorization
        • In NER how would you handle ambiguous entities
        • Key Components of Attention and Formula
        • Knowledge graph vs RAG setup
        • Language Model Output Optimisation
        • Language Models
        • Language Models Large (LLMs) vs Small (SLMs)
        • lemmatization
        • LLM
        • LLM Memory
        • Local LLM use cases
        • Mathematical Reasoning in Transformers
        • Mixture of Experts
        • Model Cascading
        • Multi-head attention
        • Named Entity Recognition
        • NER Implementation
        • Ngrams
        • NLP
        • nltk
        • Non-negative Matrix Factorization
        • NotebookLM
        • OOV words
        • Pandas Dataframe Agent
        • Part of speech tagging
        • Prompt Engineering
        • prompt retrievers
        • Prompts
        • Pyright
        • RAG
        • Scaling Agentic Systems
        • Self attention vs multi-head attention
        • Self-Attention
        • Semantic Relationships
        • Semantic search
        • Sentence Similarity
        • Sentence Transformer Workflow
        • Similarity Search
        • Small Language Models
        • spaCy
        • Stemming
        • stopwords
        • Summarisation
        • syntactic relationships
        • Text2Cypher
        • TF-IDF
        • TF-IDF Implementation
        • Tokenisation
        • topic modeling
        • Vectorisation
        • Why is named entity recognition (NER) a challenging task
        • Word2vec
        • WordNet
      • OTHER
        • Addressing_Multicollinearity.py
        • Bag_of_Words.py
        • Bandit example output
        • Bandit_Example_Fixed.py
        • Click_Implementation.py
        • Comparing_Ensembles.py
        • Cross_Entropy_Single.py
        • Cross_Entropy.py
        • Debugging.py
        • Distribution_Analysis.py
        • Factor_Analysis.py
        • FastAPI_Example.py
        • Feature_Distribution.py
        • Forecasting_AutoArima.py
        • Forecasting_Baseline.py
        • Forecasting_Exponential_Smoothing.py
        • Gaussian_Mixture_Model_Implementation.py
        • Handling_Missing_Data_Basic.ipynb
        • Handling_Missing_Data.ipynb
        • Heatmaps_Dendrograms.py
        • Imbalanced_Datasets_SMOTE.py
        • K_Means.py
        • Momentum.py
        • One_hot_encoding.py
        • Pandas_Common.py
        • Pandas_Stack.py
        • PCA_Analysis.ipynb
        • PCA_Based_Anomaly_Detection.py
        • Pycaret_Anomaly.ipynb
        • Pycaret_Example.py
        • Pydantic_More.py
        • Pydantic.py
        • Regression_Logistic_Metrics.ipynb
        • Regularisation.py
        • ROC_Curve.py
        • SVM_Example.py
        • Testing_Pytest.py
        • Testing_unittest.py
        • transfer_learning.py
        • TS_Anomaly_Detection.py
        • Vector_Embedding.py
        • Wikipedia_API.py
        • Word2Vec.py
      • PAPER
        • Attention Is All You Need
        • BERT Pretraining of Deep Bidirectional Transformers for Language Understanding
      • project-management
        • 1-on-1 Template
        • 1-to-1's with a Line Manager
        • Asking questions
        • Change Management
        • Communication principles
        • Communication Techniques
        • Communication with Stakeholders
        • Conceptual Model
        • Documentation
        • Education and Training
        • Experiment Plan Template
        • Feedback Template
        • Fishbone diagram
        • How to do git commit messages properly
        • html
        • Jobs to be done
        • Jupyter Book
        • Managing Data Science Teams
        • Modern data team
        • nbconvert slideshows
        • One Pager Template
        • pdoc
        • Problem Definition
        • Process for prototyping
        • project management
        • Project Management Portal
        • Pull Request Template
        • RACI
        • Remaining useful life models
        • Return of Experience Form
        • Reveal.js
        • Technical Debt
        • UML
        • Why use ER diagrams
      • statistics
        • Addressing Multicollinearity
        • ANOVA
        • Assumption of Normality
        • Bernoulli
        • Bootstrap Sampling
        • Casual Inference
        • Central Limit Theorem
        • Central Limit Theorem & Small Sample Sizes
        • Chi-Squared Test
        • Confidence Interval
        • Correlation
        • Correlation vs Causation
        • Cosine Similarity
        • Covariance
        • Covariance vs Correlation
        • Cryptography
        • Differentation
        • Distributions
        • EM Algorithm
        • Factor Analysis
        • Gaussian Distribution
        • Graph Theory
        • Grouped plots
        • Handling Different Distributions
        • Hypothesis testing
        • information theory
        • Interquartile Range (IQR) Detection
        • Johnson–Lindenstrauss lemma
        • Markov chain
        • Mathematics
        • Mean Absolute Error
        • Mean Squared Error
        • mean vs median
        • Multicollinearity
        • non-parametric
        • Odds
        • Odds vs Probability
        • p values
        • Parametric tests
        • parametric vs non-parametric models
        • parametric vs non-parametric tests
        • parsimonious
        • Prediction Intervals
        • Probability
        • Proportion Test
        • Q-Q Plot
        • R
        • R squared
        • R-squared metric not always a good indicator of model performance in regression
        • Reasoning tokens
        • Root Mean Squared Error
        • Sampling
        • Spearman vs Pearson Correlation
        • Standard deviation
        • Standardisation
        • Statistical Assumptions
        • Statistical Tests
        • Statistical theorems
        • Statistics
        • statsmodels
        • Stochastic Gradient Descent
        • Symbolic computation
        • Sympy
        • T-test
        • univariate vs multivariate
        • Variance
        • Violin plot
        • Z-Normalisation
        • Z-Score
        • Z-Scores vs Prediction Intervals
        • Z-Test
      • uncategorised
        • Investigate pyodbc
        • NLP Portal
        • Science Portal
      • pages
        • Data Archive
        • DE_Tools
        • ML_Tools
        • Quotes
        • Research Questions
        • Reviews

    Folder: categories/computer-science

    37 items under this folder.

    • 29 Sept 2025

      Processes vs Threads

      • software
    • 29 Sept 2025

      PyGraphviz

      • graph
    • 29 Sept 2025

      QuickSort

      • algorithm
    • 29 Sept 2025

      Ranking models

    • 29 Sept 2025

      Recursive Algorithm

      • algorithm
    • 29 Sept 2025

      Strongly vs Weakly typed language

      • software
    • 29 Sept 2025

      Times Series Python Packages

      • python
    • 29 Sept 2025

      csv module

      • python
    • 29 Sept 2025

      garbage collector

      • system
    • 29 Sept 2025

      neomodel

      • graph
      • python
    • 29 Sept 2025

      programming languages

      • programming
    • 29 Sept 2025

      Algorithms

      • algorithm
    • 29 Sept 2025

      BM25 (Best Match 25)

    • 29 Sept 2025

      Big O Notation

      • math
    • 29 Sept 2025

      Checksum

      • algorithm
      • security
    • 29 Sept 2025

      Computer Science

      • field
    • 29 Sept 2025

      Concurrency

      • system
    • 29 Sept 2025

      Convex Optimisation

    • 29 Sept 2025

      Directed Acyclic Graph (DAG)

      • math
      • orchestration
    • 29 Sept 2025

      Flask

      • python
    • 29 Sept 2025

      Generators in Python

      • data_structure
      • python
    • 29 Sept 2025

      Hash

      • data_structure
    • 29 Sept 2025

      Heap Data Structure

      • data_structure
    • 29 Sept 2025

      Heap Memory

      • memory_management
    • 29 Sept 2025

      How to search within a graph

      • graph
      • querying
    • 29 Sept 2025

      Immutable vs mutable

      • data_structure
      • python
    • 29 Sept 2025

      Java vs JavaScript

      • software
    • 29 Sept 2025

      Java

      • programming
    • 29 Sept 2025

      JavaScript

      • programming
    • 29 Sept 2025

      Knowledge Graph

      • graph
      • NLP
    • 29 Sept 2025

      Langchain

      • GenAI
      • python
    • 29 Sept 2025

      Machine Learning Algorithms

      • algorithm
      • modeling
    • 29 Sept 2025

      Monte Carlo Simulation

      • algorithm
      • statistics
    • 29 Sept 2025

      Multiprocessing vs Multithreading

      • programming
    • 29 Sept 2025

      Multithreading

      • programming
      • system
    • 29 Sept 2025

      Node.JS

      • programming
    • 29 Sept 2025

      Numpy

      • data_structure
      • python

                • Bag_of_Words.py
                • Bandit example output
                • Bandit_Example_Fixed.py
                • Click_Implementation.py
                • Comparing_Ensembles.py
                • Cross_Entropy_Single.py
                • Cross_Entropy.py
                • Debugging.py
                • Distribution_Analysis.py
                • Factor_Analysis.py
                • FastAPI_Example.py
                • Feature_Distribution.py
                • Forecasting_AutoArima.py
                • Forecasting_Baseline.py
                • Forecasting_Exponential_Smoothing.py
                • Gaussian_Mixture_Model_Implementation.py
                • Handling_Missing_Data_Basic.ipynb
                • Handling_Missing_Data.ipynb
                • Heatmaps_Dendrograms.py
                • Imbalanced_Datasets_SMOTE.py
                • K_Means.py
                • Momentum.py
                • One_hot_encoding.py
                • Pandas_Common.py
                • Pandas_Stack.py
                • PCA_Analysis.ipynb
                • PCA_Based_Anomaly_Detection.py
                • Pycaret_Anomaly.ipynb
                • Pycaret_Example.py
                • Pydantic_More.py
                • Pydantic.py
                • Regression_Logistic_Metrics.ipynb
                • Regularisation.py
                • ROC_Curve.py
                • SVM_Example.py
                • Testing_Pytest.py
                • Testing_unittest.py
                • transfer_learning.py
                • TS_Anomaly_Detection.py
                • Vector_Embedding.py
                • Wikipedia_API.py
                • Word2Vec.py
              • PAPER
                • Attention Is All You Need
                • BERT Pretraining of Deep Bidirectional Transformers for Language Understanding
              • project-management
                • 1-on-1 Template
                • 1-to-1s with a Line Manager
                • Asking questions
                • Change Management
                • Communication principles
                • Communication Techniques
                • Communication with Stakeholders
                • Conceptual Model
                • Documentation
                • Education and Training
                • Experiment Plan Template
                • Feedback Template
                • Fishbone diagram
                • How to do git commit messages properly
                • html
                • Jobs to be done
                • Jupyter Book
                • Managing Data Science Teams
                • Modern data team
                • nbconvert slideshows
                • One Pager Template
                • pdoc
                • Problem Definition
                • Process for prototyping
                • project management
                • Project Management Portal
                • Pull Request Template
                • RACI
                • Remaining useful life models
                • Return of Experience Form
                • Reveal.js
                • Technical Debt
                • UML
                • Why use ER diagrams
              • statistics
                • Addressing Multicollinearity
                • ANOVA
                • Assumption of Normality
                • Bernoulli
                • Bootstrap Sampling
                • Causal Inference
                • Central Limit Theorem
                • Central Limit Theorem & Small Sample Sizes
                • Chi-Squared Test
                • Confidence Interval
                • Correlation
                • Correlation vs Causation
                • Cosine Similarity
                • Covariance
                • Covariance vs Correlation
                • Cryptography
                • Differentiation
                • Distributions
                • EM Algorithm
                • Factor Analysis
                • Gaussian Distribution
                • Graph Theory
                • Grouped plots
                • Handling Different Distributions
                • Hypothesis testing
                • information theory
                • Interquartile Range (IQR) Detection
                • Johnson–Lindenstrauss lemma
                • Markov chain
                • Mathematics
                • Mean Absolute Error
                • Mean Squared Error
                • mean vs median
                • Multicollinearity
                • non-parametric
                • Odds
                • Odds vs Probability
                • p values
                • Parametric tests
                • parametric vs non-parametric models
                • parametric vs non-parametric tests
                • parsimonious
                • Prediction Intervals
                • Probability
                • Proportion Test
                • Q-Q Plot
                • R
                • R squared
                • R-squared metric not always a good indicator of model performance in regression
                • Reasoning tokens
                • Root Mean Squared Error
                • Sampling
                • Spearman vs Pearson Correlation
                • Standard deviation
                • Standardisation
                • Statistical Assumptions
                • Statistical Tests
                • Statistical theorems
                • Statistics
                • statsmodels
                • Stochastic Gradient Descent
                • Symbolic computation
                • Sympy
                • T-test
                • univariate vs multivariate
                • Variance
                • Violin plot
                • Z-Normalisation
                • Z-Score
                • Z-Scores vs Prediction Intervals
                • Z-Test
              • uncategorised
                • Investigate pyodbc
                • NLP Portal
                • Science Portal
              • pages
                • Data Archive
                • DE_Tools
                • ML_Tools
                • Quotes
                • Research Questions
                • Reviews
              • categories
                • computer-science
                  • Algorithms
                  • Big O Notation
                  • BM25 (Best Match 25)
                  • Checksum
                  • Computer Science
                  • Concurrency
                  • Convex Optimisation
                  • csv module
                  • Directed Acyclic Graph (DAG)
                  • Flask
                  • garbage collector
                  • Generators in Python
                  • Hash
                  • Heap Data Structure
                  • Heap Memory
                  • How to search within a graph
                  • Immutable vs mutable
                  • Java
                  • Java vs JavaScript
                  • JavaScript
                  • Knowledge Graph
                  • Langchain
                  • Machine Learning Algorithms
                  • Monte Carlo Simulation
                  • Multiprocessing vs Multithreading
                  • Multithreading
                  • neomodel
                  • Node.JS
                  • Numpy
                  • Processes vs Threads
                  • programming languages
                  • PyGraphviz
                  • QuickSort
                  • Ranking models
                  • Recursive Algorithm
                  • Strongly vs Weakly typed language
                  • Times Series Python Packages
                • data-analysis
                  • Altair
                  • altair versus seaborn
                  • Binder
                  • Boxplot
                  • Dash
                  • Dashboarding
                  • Dashboards
                  • Data Analysis
                  • Data Analysis Portal
                  • Data Analyst
                  • Data Distribution
                  • Data Mining
                  • Data Product
                  • Data Reduction
                  • Data Visualisation
                  • DuckDB
                  • EDA
                  • ER Diagrams
                  • Heatmap
                  • Label encoding
                  • Linear Discriminant Analysis
                  • Log transformation
                  • Looker Studio
                  • MariaDB vs MySQL
                  • Melt
                  • Multiple Correspondence Analysis
                  • Multivariate Analysis
                  • OLAP
                  • Page Rank
                  • Parquet
                  • Plotly
                  • PowerBI
                  • Preprocessing
                  • Preprocessing Text Classification
                  • Seaborn
                  • SQL Window functions
                  • t-SNE
                  • Tableau
                • data-engineering
                  • ACID Transaction
                  • Ada boosting
                  • Adding a database to PostgreSQL
                  • Aggregation
                  • Apache Iceberg
                  • Attack mitigation
                  • Attack types
                  • AWS Lambda
                  • Azure
                  • Bagging
                  • Benefits of Data Transformation
                  • Big Data
                  • BigQuery
                  • Cassandra
                  • Cloud Providers
                  • Coaching & Mentoring
                  • Columnar Storage
                  • Command Prompt
                  • Common Table Expression
                  • Components of the database
                  • Covering Index
                  • Crosstab
                  • CRUD
                  • CUDA
                  • Curse of dimensionality
                  • Cypher
                  • Data Architect
                  • Data Architecture
                  • Data Cleansing
                  • Data Contract
                  • Data Deployment
                  • Data Dictionary
                  • Data Drift
                  • Data Engineering
                  • Data Engineering Portal
                  • Data Engineering Tools
                  • Data Evaluation
                  • Data Hierarchy of Needs
                  • Data Integration
                  • Data Integrity
                  • Data Lake
                  • Data Lakehouse
                  • Data Leakage
                  • Data Lifecycle Management
                  • data lineage
                  • Data Management
                  • Data Modeling
                  • Data Observability
                  • Data Principles
                  • Data Quality
                  • Data Security
                  • Data Selection
                  • Data Sources
                  • Data Storage
                  • Data Transformation
                  • Data Transformation in Data Engineering
                  • Data Transformation with Pandas
                  • Data Validation
                  • Data Virtualization
                  • Data Warehouse
                  • Database
                  • Database Index
                  • Database Management System (DBMS)
                  • Database Schema
                  • Database Storage
                  • Database Techniques
                  • Databricks 1
                  • DataOps
                  • dbt 1
                  • design pattern
                  • Digital twin
                  • Distributed Computing
                  • DuckDB in python
                  • DuckDB vs SQLite
                  • Durability
                  • ELT
                  • Estimator
                  • ETL
                  • ETL 1
                  • ETL Pipeline Example
                  • ETL vs ELT
                  • EtLT
                  • Event Driven Microservices
                  • Event-Driven Architecture
                  • Fabric
                  • Faker
                  • File Management
                  • Folder Tree Diagram
                  • Foreign Key
                  • Github Actions
                  • Google Sheet Pivots Table
                  • Grain
                  • Graph Query Language
                  • Groupby
                  • Groupby vs Crosstab
                  • heterogeneous features
                  • Honkit
                  • Hosting
                  • How is schema evolution done in practice with SQL
                  • How to normalise a merged table
                  • Implementing Database Schema
                  • Imputation Techniques
                  • in-memory format
                  • incremental synchronization
                  • Indexing in cypher
                  • Input is Not Properly Sanitized
                  • Joining Datasets
                  • Junction Tables
                  • KNIME
                  • Logical Model
                  • Many-to-Many Relationships
                  • map reduce
                  • MariaDB
                  • master data management
                  • Merge
                  • Microsoft Access
                  • Missing Data
                  • Model Deployment
                  • Monolith Architecture
                  • Multi-level index
                  • Multiprocessing
                  • MySql
                  • neo4j
                  • Normalised Schema
                  • NoSQL
                  • Object Relational Mapper
                  • OLTP
                  • Overfitting
                  • Pandas
                  • Pandas join vs merge
                  • Pandas Pivot Table
                  • Pandas Stack
                  • pd.Grouper
                  • pgAdmin
                  • Pgadmin Permissions on Windows
                  • Physical Model
                  • Pickle
                  • Poetry
                  • Polars
                  • PostgreSQL
                  • Postman
                  • PowerShell
                  • Prevention Is Better Than The Cure
                  • Primary Key
                  • Push-Down
                  • Pydantic
                  • Pyright vs Pydantic
                  • Query Optimisation
                  • Querying
                  • Querying Time Series
                  • Race Conditions
                  • Relating Tables Together
                  • Relational Database
                  • reverse etl
                  • rollup
                  • Row parameters in SQL
                  • Row-based Storage
                  • Scalability
                  • Scaling Server
                  • Schema Evolution
                  • Search
                  • Security mitigation
                  • Security Researcher
                  • semantic layer
                  • Single Source of Truth
                  • Sklearn Pipiline
                  • Slowly Changing Dimension
                  • SMSS
                  • Snowflake Schema
                  • Soft Deletion
                  • Software Design Patterns
                  • Spreadsheets vs Databases
                  • SQL
                  • SQL Groupby
                  • SQL Injection
                  • SQL Joins
                  • SQLAlchemy
                  • SQLAlchemy vs. sqlite3
                  • SQLite
                  • SQLite Studio
                  • Star Schema
                  • storage layer object store
                  • Stored Procedures
                  • structured data
                  • Structuring and organizing data
                  • Transaction
                  • Turning a flat file into a database
                  • Types of Database Schema
                  • Unix
                  • unstructured data
                  • Usability
                  • Vacuum
                  • Vector Database
                  • Vectorized Engine
                  • View Use Case
                  • Views
                  • Windows Subsystem for Linux
                • data-science
                  • ACF Plots
                  • Additive vs Multiplicative Models Time Series
                  • ADF Test
                  • Agent Exploration
                  • Agentic Solutions
                  • AI
                  • ARIMA
                  • ARIMA vs Random Forest in Time Series
                  • Autocorrelation
                  • Autocorrelation vs Autoregression
                  • Autoregression
                  • Baseline Forecast
                  • Basics of Time Series
                  • Batch gradient descent
                  • Bellman Equations
                  • Bias-Variance Trade Off
                  • Capability
                  • Choosing a Threshold
                  • Choosing the Number of Clusters
                  • Clustermap
                  • Covariance Structures
                  • Cross Validation
                  • Data Assessment
                  • Data Collection
                  • Data Mining - CRISP
                  • Data Preparation
                  • Data Science
                  • Data Scientist
                  • Data Understanding
                  • Datasets
                  • Decomposition in Time Series
                  • Differencing in Time Series
                  • DS & ML Portal
                  • Evaluating Time Series Forecasts
                  • Evolving Seasonality
                  • F-statistic
                  • Feature Engineering
                  • Feature Scaling
                  • Feature Selection vs Feature Importance
                  • Forecasting using Lags
                  • Forward Propagation
                  • Gaussian Mixture Models
                  • Gitlab
                  • Gompertz Model
                  • Good Enough Principle in Data Projects
                  • GraphRAG
                  • Handling Missing Data
                  • Holt-Winters (Exponential Smoothing)
                  • Holt-Winters vs ARIMA
                  • Holt’s Linear Trend Model (Double Exponential Smoothing)
                  • how do you do the data selection
                  • Imbalanced Datasets
                  • Interpolation
                  • Intervention Analysis
                  • Joining Time Series
                  • Kernel Machines
                  • KPSS Test
                  • Latency
                  • Logistic Model Curve
                  • LSTM in Time Series
                  • Mean Absolute Percentage Error
                  • MNIST
                  • Normalisation
                  • Out-of-sample rolling forecast evaluation
                  • PACF Plots
                  • Performance Dimensions
                  • pmdarima
                  • Properties of Time Series Models
                  • Random Forest Regression
                  • Residuals in Time Series
                  • Scatter Plots
                  • Scientific Method
                  • Scipy
                  • Seasonal Naive Forecast
                  • Seasonality in Time Series
                  • SHapley Additive exPlanations
                  • Shot Learning
                  • Silhouette Analysis
                  • Simple Exponential Smoothing (SES)
                  • sklearn datasets
                  • SMOTE (Synthetic Minority Over-sampling Technique)
                  • SparseCategorialCrossentropy or CategoricalCrossEntropy
                  • stack memory
                  • Stacking
                  • Stationary Time Series
                  • STL Decomposition
                  • Time Series
                  • Time Series Forecasting
                  • Time Series Forecasts in Business
                  • Time Series Learning Resources
                  • Time Series Shocks
                  • Trends in Time Series
                • deep-learning
                  • Convolutional Neural Networks
                  • Deep Learning
                  • How is reinforcement learning being combined with deep learning
                  • LSTM
                  • Multi-Agent Reinforcement Learning
                  • Policy
                  • Relu
                  • Sarsa
                • devops
                  • AB testing
                  • Alternatives to Batch Processing
                  • Amazon S3
                  • Apache Airflow
                  • Apache Kafka
                  • Apache Spark
                  • API
                  • API Driven Microservices
                  • Bash
                  • bat
                  • Batch Processing
                  • Batch vs PowerShell scripts
                  • CI-CD
                  • Clustering_Dashboard.py
                  • Code Diagrams
                  • Command Line
                  • Continuous Delivery - Deployment
                  • Continuous Integration
                  • Cron jobs
                  • dagster
                  • Data Ingestion
                  • Data Orchestration
                  • Data Pipeline
                  • Data Pipeline to Data Products
                  • Data Streaming
                  • Databricks
                  • Databricks vs Snowflake
                  • dbt
                  • Debugging
                  • Declarative Data Pipeline
                  • dependency manager
                  • DevOps
                  • Devops Portal
                  • Digital Transformation
                  • Docker
                  • Docker Image
                  • Elastic Net
                  • Environment Variables
                  • Epub
                  • Event Driven
                  • Event Driven Events
                  • Everything
                  • Excel
                  • Excel pivot table
                  • Excel vs Google Sheets
                  • FastAPI
                  • Firebase
                  • frontend
                  • functional programming
                  • GIS
                  • Git
                  • Github Gists
                  • gitlab-ci.yml
                  • Global Interpreter Lock
                  • Google Cloud Platform
                  • Google Colab
                  • Google My Maps Data Extraction
                  • Google Sheets
                  • GPT
                  • Gradio
                  • Grep
                  • Hadoop
                  • Hugging Face
                  • imperative
                  • ipynb
                  • jinja template
                  • Json
                  • Json to SQLite
                  • jupytext
                  • Justfile
                  • kubernetes
                  • Load Balancing
                  • Maintainability
                  • Maintainable Code
                  • Makefile
                  • Master Observability Datadog
                  • Memory
                  • Memory Caching
                  • Microsoft
                  • MongoDB
                  • nbconvert
                  • NET
                  • Normalisation of Text
                  • Pandas Series vs DataFrame
                  • Pandoc
                  • PMML
                  • Powerquery
                  • Powershell scripts
                  • Powershell versus Command Prompt
                  • Powershell vs Bash
                  • Publish and Subscribe
                  • PySpark
                  • Pytest
                  • Python
                  • Python Click
                  • Quartz
                  • Random Access Memory
                  • React
                  • Registering a Scheduled Task
                  • REST API
                  • Scala
                  • Security Vulnerabilities
                  • shapefile
                  • Sharepoint
                  • Snowflake
                  • Snowflake vs Hadoop
                  • Software Development Life Cycle
                  • SQL vs NoSQL
                  • Streamlit
                  • Technical Design Doc Template
                  • Terminal commands
                  • Testing
                  • TOML
                  • tool.bandit
                  • tool.ruff
                  • tool.uv
                  • Types of Computational Bugs
                  • TypeScript
                  • Ubuntu
                  • unittest
                  • Vercel
                  • Virtual environments
                  • Web Feature Server (WFS)
                  • Web Map Tile Service (WMTS)
                  • Why JSON is Better than Pickle for Untrusted Data
                  • Windows
                  • Windows Scheduled Tasks
                  • yaml
                • industry
                  • AI Engineer
                  • AI governance
                  • Analytics Engineer
                  • business intelligence
                  • Business observability
                  • Business Understanding
                  • Business Values
                  • Data AI Education at Work
                  • Data Engineer
                  • Data Governance
                  • data literacy
                  • Data Roles
                  • Data Steward
                  • Design Thinking Questions
                  • Documentation & Meetings
                  • Energy
                  • Energy ABM
                  • Energy Demand Forecasting
                  • Energy Storage
                  • Facts
                  • Gartner Hype Cycle
                  • Industries of interest
                  • Knowledge Work
                  • Managing People
                  • ML Engineer
                  • Network Design
                  • Operational Resilience for Growth and Adaptability
                  • Reporting
                  • Scaling Data Science Capability
                  • Smart Grids
                  • Telecommunications
                  • Thinking Systems
                  • Use of RNNs in energy sector
                  • Working with SMEs
                • machine-learning
                  • Accuracy
                  • Activation atlases
                  • Activation Function
                  • Active Learning
                  • Adam Optimizer
                  • Adaptive Learning Rates
                  • Adjusted R squared
                  • Agent-Based Modelling
                  • AIC in Model Evaluation
                  • Anomaly Detection
                  • Anomaly Detection in Time Series
                  • Anomaly Detection with Clustering
                  • Anomaly Detection with Statistical Methods
                  • Assessing Gen AI generated content
                  • AUC
                  • Automated Feature Creation
                  • AutoML
                  • Backpropagation
                  • Batch Normalisation
                  • Bias in ML
                  • Binary Classification
                  • Boosting
                  • Business value of anomaly detection
                  • CART
                  • CatBoost
                  • Challenges to Model Deployment
                  • Class Separability
                  • Classification
                  • Classification Report
                  • Cluster Density
                  • Cluster Seperation
                  • Clustering
                  • Collaborative Filtering
                  • conceptual data model
                  • Confusion Matrix
                  • Cost Function
                  • Cost-Sensitive Analysis
                  • Cross Entropy
                  • Customer Growth Modeling
                  • Data Selection in ML
                  • Data Transformation in Machine Learning
                  • DBSCAN
                  • Decision Theory
                  • Decision Tree
                  • Decision Trees are Fragile
                  • Deep Learning Frameworks
                  • Deep Q-Learning
                  • Dendrograms
                  • Determining Threshold Values
                  • Dimension Table
                  • Dimensional Modelling
                  • Dimensionality Reduction
                  • Dimensions
                  • Distributions in Decision Tree Leaves
                  • Dropout
                  • Dummy variable trap
                  • Edge ML
                  • emergent behavior
                  • Encoding Categorical Variables
                  • Epoch
                  • Evaluating Language Models
                  • Evaluating Logistic Regression
                  • Evaluating the effectiveness of prompts
                  • Evaluation Metrics
                  • Exploration vs Exploitation
                  • Exponential Smoothing
                  • f-regression
                  • F1 Score
                  • Fact Table
                  • FAISS
                  • Feature Engineering for Time Series
                  • Feature Evaluation
                  • Feature Extraction
                  • Feature Importance
                  • Feature Selection
                  • Feature Transformations
                  • Feed Forward Neural Network
                  • Filter Methods
                  • Fitting weights and biases of a neural network
                  • Framework for models
                  • Gaussian Model
                  • General Linear Regression
                  • Generalisation
                  • Generative Adversarial Networks
                  • Gini Impurity
                  • Gini Impurity vs Cross Entropy
                  • Gradient Boosted Trees
                  • Gradient Boosting
                  • Gradient Boosting Regressor
                  • Gradient Descent
                  • Gradient descent in linear regression
                  • granularity
                  • Graph Neural Network
                  • Graph Theory Community
                  • GridSeachCv
                  • Growth Models in Time Series
                  • GRU
                  • Hierarchical Clustering
                  • High cross validation accuracy is not directly proportional to performance on unseen test data
                  • Histogram
                  • How do we evaluate LLM Outputs
                  • How to use Sklearn Pipeline
                  • Hyperparameter
                  • Hyperparameter Tuning
                  • Impact of multicollinearity on model parameters
                  • Inertia K Means Cost Function
                  • inference
                  • inference versus prediction
                  • initialization methods
                  • Interoperability
                  • interoperable
                  • Interpretability
                  • Interpreting logistic regression model parameters
                  • Isolated Forest
                  • Jaccard Coefficient
                  • K-means
                  • K-nearest neighbours
                  • Keras
                  • Kernel Density Estimation
                  • Kernelling
                  • Kmeans vs GMM
                  • L1 Regularisation
                  • Label encoding vs One-hot encoding
                  • Labelling data
                  • Lagrange multipliers in optimisation
                  • lambda architecture
                  • Latent Dirichlet Allocation
                  • Latent Semantic Indexing
                  • LBFGS
                  • Learning Curve
                  • Learning Rate
                  • Learning Styles
                  • LightGBM
                  • LightGBM vs XGBoost vs CatBoost
                  • Linear Regression
                  • LLM Evaluation Metrics
                  • Local Interpretable Model-agnostic Explanations
                  • Local Outlier Factor (LOF)
                  • Logistic Regression
                  • Logistic Regression does not predict probabilities
                  • Logistic regression in sklearn & Gradient Descent
                  • Logistic Regression Statsmodel Summary table
                  • Loss function
                  • Loss versus Cost function
                  • Machine Learning
                  • Machine Learning Operations
                  • Manifold Learning
                  • Markov Decision Processes
                  • Maximum Likelihood Estimation
                  • Median Absolute Error
                  • Mermaid
                  • Metadata Handling
                  • Methods for Handling Outliers
                  • Metric
                  • Mini-batch gradient descent
                  • MLOPS for Time Series
                  • Model Building
                  • Model Deployment using PyCaret
                  • Model Ensemble
                  • Model Evaluation
                  • Model Evaluation vs Model Optimisation
                  • Model Interpretability
                  • Model Observability
                  • Model Optimisation
                  • Model Parameters
                  • Model Parameters Tuning
                  • Model parameters vs hyperparameters
                  • Model Selection
                  • Model Validation
                  • model-agnostic feature importance
                  • Momentum
                  • Moving Average Forecast
                  • Multinomial Naive bayes
                  • Multiple Linear Regression
                  • Naive Bayes Classifier
                  • Naive Forecast
                  • Neural network
                  • Neural Network Classification
                  • Neural network in Practice
                  • Neural Scaling Laws
                  • Non-negative matrix factorization in ML
                  • Non-parametric tests
                  • Normalisation of data
                  • Normalisation vs Standardisation
                  • objective function
                  • One-hot encoding
                  • Optimisation function
                  • Optimisation techniques
                  • Optimising a Logistic Regression Model
                  • Optimising Neural Networks
                  • Optuna
                  • Ordinary Least Squares
                  • Orthogonalization
                  • Outliers
                  • Over parameterised models
                  • PCA Explained Variance Ratio
                  • PCA Principal Components
                  • PCA-Based Anomaly Detection
                  • PDP and ICE
                  • Percentile Detection
                  • Performance Drift
                  • Polynomial Regression
                  • Positional Encoding
                  • Precision
                  • Precision or Recall
                  • Precision-Recall Curve
                  • Prediction Intervals vs Confidence Interval
                  • Principal Component Analysis
                  • PyCaret
                  • PyOD
                  • PyTorch
                  • Pytorch vs Tensorflow
                  • Q-Learning
                  • Random Forest
                  • Random Forest for Time Series
                  • Recall
                  • Recommender systems
                  • Recurrent Neural Networks
                  • Regression
                  • Regression Metrics
                  • Regularisation
                  • Regularisation of Tree based models
                  • Reinforcement learning
                  • Relationships in memory
                  • Reward Function
                  • Ridge
                  • ROC (Receiver Operating Characteristic)
                  • Sammon’s Mapping
                  • SARIMA
                  • Scikit-Learn
                  • Secretary Problem
                  • semi-structured data
                  • Sentence Transformers
                  • Sklearn Pipeline
                  • Specificity
                  • Spectral Clustering
                  • Supervised Learning
                  • Support Vector Classifier
                  • Support Vector Machines
                  • Support Vector Regression
                  • Tensorflow
                  • Test Loss When Evaluating Models
                  • Text Classification
                  • Time Series Python Packages
                  • Train-Dev-Test Sets
                  • Transfer Learning
                  • Transformed Target Regressor
                  • Transformer
                  • Transformers vs RNNs
                  • Type I Error (False Positive)
                  • Type II Error (False Negative)
                  • Types of Neural Networks
                  • Typical Output Formats in Neural Networks
                  • UMAP
                  • Unsupervised Learning
                  • Use Cases for a Simple Neural Network Like
                  • vanishing and exploding gradients problem
                  • Variability in linear models
                  • Variance in ML
                  • Vector Embedding
                  • WCSS and elbow method
                  • Weak Learners
                  • When and why not to use regularisation
                  • Why does increasing the number of models in an ensemble not necessarily improve the accuracy
                  • Why does the Adam Optimizer converge
                  • Why Removing Outliers May Improve Regression but Harm Classification
                  • Why standardise features
                  • Why Type 1 and Type 2 matter
                  • Wrapper Methods
                  • Xavier
                  • XGBoost
                • natural-language
                  • AI Agents Memory
                  • Attention mechanism
                  • Bag of words
                  • BERT
                  • BERTScore
                  • Chain of thought
                  • ChatGPT
                  • Claude
                  • Comparing LLMs
                  • Distillation
                  • ElasticSearch
                  • Embedded Methods
                  • embeddings for OOV words
                  • Evaluate Embedding Methods
                  • Fuzzywuzzy
                  • Generative AI
                  • Generative AI From Theory to Practice
                  • Grammar method
                  • Guardrails
                  • How businesses use Gen AI
                  • How LLMs store facts
                  • How to reduce the need for Gen AI responses
                  • How would you decide between using TF-IDF and Word2Vec for text vectorization
                  • In NER how would you handle ambiguous entities
                  • Key Components of Attention and Formula
                  • Knowledge graph vs RAG setup
                  • Language Model Output Optimisation
                  • Language Models
                  • Language Models Large (LLMs) vs Small (SLMs)
                  • lemmatization
                  • LLM
                  • LLM Memory
                  • Local LLM use cases
                  • Mathematical Reasoning in Transformers
                  • Mixture of Experts
                  • Model Cascading
                  • Multi-head attention
                  • Named Entity Recognition
                  • NER Implementation
                  • Ngrams
                  • NLP
                  • nltk
                  • Non-negative Matrix Factorization
                  • NotebookLM
                  • OOV words
                  • Pandas Dataframe Agent
                  • Part of speech tagging
                  • Prompt Engineering
                  • prompt retrievers
                  • Prompts
                  • Pyright
                  • RAG
                  • Scaling Agentic Systems
                  • Self attention vs multi-head attention
                  • Self-Attention
                  • Semantic Relationships
                  • Semantic search
                  • Sentence Similarity
                  • Sentence Transformer Workflow
                  • Similarity Search
                  • Small Language Models
                  • spaCy
                  • Stemming
                  • stopwords
                  • Summarisation
                  • syntactic relationships
                  • Text2Cypher
                  • TF-IDF
                  • TF-IDF Implementation
                  • Tokenisation
                  • topic modeling
                  • Vectorisation
                  • Why is named entity recognition (NER) a challenging task
                  • Word2vec
                  • WordNet
                • OTHER
                  • Addressing_Multicollinearity.py
                  • Bag_of_Words.py
                  • Bandit example output
                  • Bandit_Example_Fixed.py
                  • Click_Implementation.py
                  • Comparing_Ensembles.py
                  • Cross_Entropy_Single.py
                  • Cross_Entropy.py
                  • Debugging.py
                  • Distribution_Analysis.py
                  • Factor_Analysis.py
                  • FastAPI_Example.py
                  • Feature_Distribution.py
                  • Forecasting_AutoArima.py
                  • Forecasting_Baseline.py
                  • Forecasting_Exponential_Smoothing.py
                  • Gaussian_Mixture_Model_Implementation.py
                  • Handling_Missing_Data_Basic.ipynb
                  • Handling_Missing_Data.ipynb
                  • Heatmaps_Dendrograms.py
                  • Imbalanced_Datasets_SMOTE.py
                  • K_Means.py
                  • Momentum.py
                  • One_hot_encoding.py
                  • Pandas_Common.py
                  • Pandas_Stack.py
                  • PCA_Analysis.ipynb
                  • PCA_Based_Anomaly_Detection.py
                  • Pycaret_Anomaly.ipynb
                  • Pycaret_Example.py
                  • Pydantic_More.py
                  • Pydantic.py
                  • Regression_Logistic_Metrics.ipynb
                  • Regularisation.py
                  • ROC_Curve.py
                  • SVM_Example.py
                  • Testing_Pytest.py
                  • Testing_unittest.py
                  • transfer_learning.py
                  • TS_Anomaly_Detection.py
                  • Vector_Embedding.py
                  • Wikipedia_API.py
                  • Word2Vec.py
                • PAPER
                  • Attention Is All You Need
                  • BERT Pretraining of Deep Bidirectional Transformers for Language Understanding
                • project-management
                  • 1-on-1 Template
                  • 1-to-1's with a Line Manager
                  • Asking questions
                  • Change Management
                  • Communication principles
                  • Communication Techniques
                  • Communication with Stakeholders
                  • Conceptual Model
                  • Documentation
                  • Education and Training
                  • Experiment Plan Template
                  • Feedback Template
                  • Fishbone diagram
                  • How to do git commit messages properly
                  • html
                  • Jobs to be done
                  • Jupyter Book
                  • Managing Data Science Teams
                  • Modern data team
                  • nbconvert slideshows
                  • One Pager Template
                  • pdoc
                  • Problem Definition
                  • Process for prototyping
                  • project management
                  • Project Management Portal
                  • Pull Request Template
                  • RACI
                  • Remaining useful life models
                  • Return of Experience Form
                  • Reveal.js
                  • Technical Debt
                  • UML
                  • Why use ER diagrams
                • statistics
                  • Addressing Multicollinearity
                  • ANOVA
                  • Assumption of Normality
                  • Bernoulli
                  • Bootstrap Sampling
                  • Causal Inference
                  • Central Limit Theorem
                  • Central Limit Theorem & Small Sample Sizes
                  • Chi-Squared Test
                  • Confidence Interval
                  • Correlation
                  • Correlation vs Causation
                  • Cosine Similarity
                  • Covariance
                  • Covariance vs Correlation
                  • Cryptography
                  • Differentiation
                  • Distributions
                  • EM Algorithm
                  • Factor Analysis
                  • Gaussian Distribution
                  • Graph Theory
                  • Grouped plots
                  • Handling Different Distributions
                  • Hypothesis testing
                  • information theory
                  • Interquartile Range (IQR) Detection
                  • Johnson–Lindenstrauss lemma
                  • Markov chain
                  • Mathematics
                  • Mean Absolute Error
                  • Mean Squared Error
                  • mean vs median
                  • Multicollinearity
                  • non-parametric
                  • Odds
                  • Odds vs Probability
                  • p values
                  • Parametric tests
                  • parametric vs non-parametric models
                  • parametric vs non-parametric tests
                  • parsimonious
                  • Prediction Intervals
                  • Probability
                  • Proportion Test
                  • Q-Q Plot
                  • R
                  • R squared
                  • R-squared metric not always a good indicator of model performance in regression
                  • Reasoning tokens
                  • Root Mean Squared Error
                  • Sampling
                  • Spearman vs Pearson Correlation
                  • Standard deviation
                  • Standardisation
                  • Statistical Assumptions
                  • Statistical Tests
                  • Statistical theorems
                  • Statistics
                  • statsmodels
                  • Stochastic Gradient Descent
                  • Symbolic computation
                  • Sympy
                  • T-test
                  • univariate vs multivariate
                  • Variance
                  • Violin plot
                  • Z-Normalisation
                  • Z-Score
                  • Z-Scores vs Prediction Intervals
                  • Z-Test
                • uncategorised
                  • Investigate pyodbc
                  • NLP Portal
                  • Science Portal
                • pages
                  • Data Archive
                  • DE_Tools
                  • ML_Tools
                  • Quotes
                  • Research Questions
                  • Reviews
