Data Archive

    • categories
      • computer-science
        • Algorithms
        • Big O Notation
        • BM25 (Best Match 25)
        • Checksum
        • Computer Science
        • Concurrency
        • Convex Optimisation
        • csv module
        • Directed Acyclic Graph (DAG)
        • Flask
        • garbage collector
        • Generators in Python
        • Hash
        • Heap Data Structure
        • Heap Memory
        • How to search within a graph
        • Immutable vs mutable
        • Java
        • Java vs JavaScript
        • JavaScript
        • Knowledge Graph
        • Langchain
        • Machine Learning Algorithms
        • Monte Carlo Simulation
        • Multiprocessing vs Multithreading
        • Multithreading
        • neomodel
        • Node.JS
        • Numpy
        • Processes vs Threads
        • programming languages
        • PyGraphviz
        • QuickSort
        • Ranking models
        • Recursive Algorithm
        • Strongly vs Weakly typed language
        • Times Series Python Packages
      • data-analysis
        • Altair
        • altair versus seaborn
        • Binder
        • Boxplot
        • Dash
        • Dashboarding
        • Dashboards
        • Data Analysis
        • Data Analysis Portal
        • Data Analyst
        • Data Distribution
        • Data Mining
        • Data Product
        • Data Reduction
        • Data Visualisation
        • DuckDB
        • EDA
        • ER Diagrams
        • Heatmap
        • Label encoding
        • Linear Discriminant Analysis
        • Log transformation
        • Looker Studio
        • MariaDB vs MySQL
        • Melt
        • Multiple Correspondence Analysis
        • Multivariate Analysis
        • OLAP
        • Page Rank
        • Parquet
        • Plotly
        • PowerBI
        • Preprocessing
        • Preprocessing Text Classification
        • Seaborn
        • SQL Window functions
        • t-SNE
        • Tableau
      • data-engineering
        • ACID Transaction
        • Ada boosting
        • Adding a database to PostgreSQL
        • Aggregation
        • Apache Iceberg
        • Attack mitigation
        • Attack types
        • AWS Lambda
        • Azure
        • Bagging
        • Benefits of Data Transformation
        • Big Data
        • BigQuery
        • Cassandra
        • Cloud Providers
        • Coaching & Mentoring
        • Columnar Storage
        • Command Prompt
        • Common Table Expression
        • Components of the database
        • Covering Index
        • Crosstab
        • CRUD
        • CUDA
        • Curse of dimensionality
        • Cypher
        • Data Architect
        • Data Architecture
        • Data Cleansing
        • Data Contract
        • Data Deployment
        • Data Dictionary
        • Data Drift
        • Data Engineering
        • Data Engineering Portal
        • Data Engineering Tools
        • Data Evaluation
        • Data Hierarchy of Needs
        • Data Integration
        • Data Integrity
        • Data Lake
        • Data Lakehouse
        • Data Leakage
        • Data Lifecycle Management
        • data lineage
        • Data Management
        • Data Modeling
        • Data Observability
        • Data Principles
        • Data Quality
        • Data Security
        • Data Selection
        • Data Sources
        • Data Storage
        • Data Transformation
        • Data Transformation in Data Engineering
        • Data Transformation with Pandas
        • Data Validation
        • Data Virtualization
        • Data Warehouse
        • Database
        • Database Index
        • Database Management System (DBMS)
        • Database Schema
        • Database Storage
        • Database Techniques
        • Databricks 1
        • DataOps
        • dbt 1
        • design pattern
        • Digital twin
        • Distributed Computing
        • DuckDB in python
        • DuckDB vs SQLite
        • Durability
        • ELT
        • Estimator
        • ETL
        • ETL 1
        • ETL Pipeline Example
        • ETL vs ELT
        • EtLT
        • Event Driven Microservices
        • Event-Driven Architecture
        • Fabric
        • Faker
        • File Management
        • Folder Tree Diagram
        • Foreign Key
        • Github Actions
        • Google Sheet Pivots Table
        • Grain
        • Graph Query Language
        • Groupby
        • Groupby vs Crosstab
        • heterogeneous features
        • Honkit
        • Hosting
        • How is schema evolution done in practice with SQL
        • How to normalise a merged table
        • Implementing Database Schema
        • Imputation Techniques
        • in-memory format
        • incremental synchronization
        • Indexing in cypher
        • Input is Not Properly Sanitized
        • Joining Datasets
        • Junction Tables
        • KNIME
        • Logical Model
        • Many-to-Many Relationships
        • map reduce
        • MariaDB
        • master data management
        • Merge
        • Microsoft Access
        • Missing Data
        • Model Deployment
        • Monolith Architecture
        • Multi-level index
        • Multiprocessing
        • MySql
        • neo4j
        • Normalised Schema
        • NoSQL
        • Object Relational Mapper
        • OLTP
        • Overfitting
        • Pandas
        • Pandas join vs merge
        • Pandas Pivot Table
        • Pandas Stack
        • pd.Grouper
        • pgAdmin
        • Pgadmin Permissions on Windows
        • Physical Model
        • Pickle
        • Poetry
        • Polars
        • PostgreSQL
        • Postman
        • PowerShell
        • Prevention Is Better Than The Cure
        • Primary Key
        • Push-Down
        • Pydantic
        • Pyright vs Pydantic
        • Query Optimisation
        • Querying
        • Querying Time Series
        • Race Conditions
        • Relating Tables Together
        • Relational Database
        • reverse etl
        • rollup
        • Row parameters in SQL
        • Row-based Storage
        • Scalability
        • Scaling Server
        • Schema Evolution
        • Search
        • Security mitigation
        • Security Researcher
        • semantic layer
        • Single Source of Truth
        • Sklearn Pipiline
        • Slowly Changing Dimension
        • SMSS
        • Snowflake Schema
        • Soft Deletion
        • Software Design Patterns
        • Spreadsheets vs Databases
        • SQL
        • SQL Groupby
        • SQL Injection
        • SQL Joins
        • SQLAlchemy
        • SQLAlchemy vs. sqlite3
        • SQLite
        • SQLite Studio
        • Star Schema
        • storage layer object store
        • Stored Procedures
        • structured data
        • Structuring and organizing data
        • Transaction
        • Turning a flat file into a database
        • Types of Database Schema
        • Unix
        • unstructured data
        • Usability
        • Vacuum
        • Vector Database
        • Vectorized Engine
        • View Use Case
        • Views
        • Windows Subsystem for Linux
      • data-science
        • ACF Plots
        • Additive vs Multiplicative Models Time Series
        • ADF Test
        • Agent Exploration
        • Agentic Solutions
        • AI
        • ARIMA
        • ARIMA vs Random Forest in Time Series
        • Autocorrelation
        • Autocorrelation vs Autoregression
        • Autoregression
        • Baseline Forecast
        • Basics of Time Series
        • Batch gradient descent
        • Bellman Equations
        • Bias-Variance Trade Off
        • Capability
        • Choosing a Threshold
        • Choosing the Number of Clusters
        • Clustermap
        • Covariance Structures
        • Cross Validation
        • Data Assessment
        • Data Collection
        • Data Mining - CRISP
        • Data Preparation
        • Data Science
        • Data Scientist
        • Data Understanding
        • Datasets
        • Decomposition in Time Series
        • Differencing in Time Series
        • DS & ML Portal
        • Evaluating Time Series Forecasts
        • Evolving Seasonality
        • F-statistic
        • Feature Engineering
        • Feature Scaling
        • Feature Selection vs Feature Importance
        • Forecasting using Lags
        • Forward Propagation
        • Gaussian Mixture Models
        • Gitlab
        • Gompertz Model
        • Good Enough Principle in Data Projects
        • GraphRAG
        • Handling Missing Data
        • Holt-Winters (Exponential Smoothing)
        • Holt-Winters vs ARIMA
        • Holt’s Linear Trend Model (Double Exponential Smoothing)
        • how do you do the data selection
        • Imbalanced Datasets
        • Interpolation
        • Intervention Analysis
        • Joining Time Series
        • Kernel Machines
        • KPSS Test
        • Latency
        • Logistic Model Curve
        • LSTM in Time Series
        • Mean Absolute Percentage Error
        • MNIST
        • Normalisation
        • Out-of-sample rolling forecast evaluation
        • PACF Plots
        • Performance Dimensions
        • pmdarima
        • Properties of Time Series Models
        • Random Forest Regression
        • Residuals in Time Series
        • Scatter Plots
        • Scientific Method
        • Scipy
        • Seasonal Naive Forecast
        • Seasonality in Time Series
        • SHapley Additive exPlanations
        • Shot Learning
        • Silhouette Analysis
        • Simple Exponential Smoothing (SES)
        • sklearn datasets
        • SMOTE (Synthetic Minority Over-sampling Technique)
        • SparseCategorialCrossentropy or CategoricalCrossEntropy
        • stack memory
        • Stacking
        • Stationary Time Series
        • STL Decomposition
        • Time Series
        • Time Series Forecasting
        • Time Series Forecasts in Business
        • Time Series Learning Resources
        • Time Series Shocks
        • Trends in Time Series
      • deep-learning
        • Convolutional Neural Networks
        • Deep Learning
        • How is reinforcement learning being combined with deep learning
        • LSTM
        • Multi-Agent Reinforcement Learning
        • Policy
        • Relu
        • Sarsa
      • devops
        • AB testing
        • Alternatives to Batch Processing
        • Amazon S3
        • Apache Airflow
        • Apache Kafka
        • Apache Spark
        • API
        • API Driven Microservices
        • Bash
        • bat
        • Batch Processing
        • Batch vs PowerShell scripts
        • CI-CD
        • Clustering_Dashboard.py
        • Code Diagrams
        • Command Line
        • Continuous Delivery - Deployment
        • Continuous Integration
        • Cron jobs
        • dagster
        • Data Ingestion
        • Data Orchestration
        • Data Pipeline
        • Data Pipeline to Data Products
        • Data Streaming
        • Databricks
        • Databricks vs Snowflake
        • dbt
        • Debugging
        • Declarative Data Pipeline
        • dependency manager
        • DevOps
        • Devops Portal
        • Digital Transformation
        • Docker
        • Docker Image
        • Elastic Net
        • Environment Variables
        • Epub
        • Event Driven
        • Event Driven Events
        • Everything
        • Excel
        • Excel pivot table
        • Excel vs Google Sheets
        • FastAPI
        • Firebase
        • frontend
        • functional programming
        • GIS
        • Git
        • Github Gists
        • gitlab-ci.yml
        • Global Interpreter Lock
        • Google Cloud Platform
        • Google Colab
        • Google My Maps Data Extraction
        • Google Sheets
        • GPT
        • Gradio
        • Grep
        • Hadoop
        • Hugging Face
        • imperative
        • ipynb
        • jinja template
        • Json
        • Json to SQLite
        • jupytext
        • Justfile
        • kubernetes
        • Load Balancing
        • Maintainability
        • Maintainable Code
        • Makefile
        • Master Observability Datadog
        • Memory
        • Memory Caching
        • Microsoft
        • MongoDB
        • nbconvert
        • NET
        • Normalisation of Text
        • Pandas Series vs DataFrame
        • Pandoc
        • PMML
        • Powerquery
        • Powershell scripts
        • Powershell versus Command Prompt
        • Powershell vs Bash
        • Publish and Subscribe
        • PySpark
        • Pytest
        • Python
        • Python Click
        • Quartz
        • Random Access Memory
        • React
        • Registering a Scheduled Task
        • REST API
        • Scala
        • Security Vulnerabilities
        • shapefile
        • Sharepoint
        • Snowflake
        • Snowflake vs Hadoop
        • Software Development Life Cycle
        • SQL vs NoSQL
        • Streamlit
        • Technical Design Doc Template
        • Terminal commands
        • Testing
        • TOML
        • tool.bandit
        • tool.ruff
        • tool.uv
        • Types of Computational Bugs
        • TypeScript
        • Ubuntu
        • unittest
        • Vercel
        • Virtual environments
        • Web Feature Server (WFS)
        • Web Map Tile Service (WMTS)
        • Why JSON is Better than Pickle for Untrusted Data
        • Windows
        • Windows Scheduled Tasks
        • yaml
      • industry
        • AI Engineer
        • AI governance
        • Analytics Engineer
        • business intelligence
        • Business observability
        • Business Understanding
        • Business Values
        • Data AI Education at Work
        • Data Engineer
        • Data Governance
        • data literacy
        • Data Roles
        • Data Steward
        • Design Thinking Questions
        • Documentation & Meetings
        • Energy
        • Energy ABM
        • Energy Demand Forecasting
        • Energy Storage
        • Facts
        • Gartner Hype Cycle
        • Industries of interest
        • Knowledge Work
        • Managing People
        • ML Engineer
        • Network Design
        • Operational Resilience for Growth and Adaptability
        • Reporting
        • Scaling Data Science Capability
        • Smart Grids
        • Telecommunications
        • Thinking Systems
        • Use of RNNs in energy sector
        • Working with SMEs
      • machine-learning
        • Accuracy
        • Activation atlases
        • Activation Function
        • Active Learning
        • Adam Optimizer
        • Adaptive Learning Rates
        • Adjusted R squared
        • Agent-Based Modelling
        • AIC in Model Evaluation
        • Anomaly Detection
        • Anomaly Detection in Time Series
        • Anomaly Detection with Clustering
        • Anomaly Detection with Statistical Methods
        • Assessing Gen AI generated content
        • AUC
        • Automated Feature Creation
        • AutoML
        • Backpropagation
        • Batch Normalisation
        • Bias in ML
        • Binary Classification
        • Boosting
        • Business value of anomaly detection
        • CART
        • CatBoost
        • Challenges to Model Deployment
        • Class Separability
        • Classification
        • Classification Report
        • Cluster Density
        • Cluster Seperation
        • Clustering
        • Collaborative Filtering
        • conceptual data model
        • Confusion Matrix
        • Cost Function
        • Cost-Sensitive Analysis
        • Cross Entropy
        • Customer Growth Modeling
        • Data Selection in ML
        • Data Transformation in Machine Learning
        • DBSCAN
        • Decision Theory
        • Decision Tree
        • Decision Trees are Fragile
        • Deep Learning Frameworks
        • Deep Q-Learning
        • Dendrograms
        • Determining Threshold Values
        • Dimension Table
        • Dimensional Modelling
        • Dimensionality Reduction
        • Dimensions
        • Distributions in Decision Tree Leaves
        • Dropout
        • Dummy variable trap
        • Edge ML
        • emergent behavior
        • Encoding Categorical Variables
        • Epoch
        • Evaluating Language Models
        • Evaluating Logistic Regression
        • Evaluating the effectiveness of prompts
        • Evaluation Metrics
        • Exploration vs Exploitation
        • Exponential Smoothing
        • f-regression
        • F1 Score
        • Fact Table
        • FAISS
        • Feature Engineering for Time Series
        • Feature Evaluation
        • Feature Extraction
        • Feature Importance
        • Feature Selection
        • Feature Transformations
        • Feed Forward Neural Network
        • Filter Methods
        • Fitting weights and biases of a neural network
        • Framework for models
        • Gaussian Model
        • General Linear Regression
        • Generalisation
        • Generative Adversarial Networks
        • Gini Impurity
        • Gini Impurity vs Cross Entropy
        • Gradient Boosted Trees
        • Gradient Boosting
        • Gradient Boosting Regressor
        • Gradient Descent
        • Gradient descent in linear regression
        • granularity
        • Graph Neural Network
        • Graph Theory Community
        • GridSeachCv
        • Growth Models in Time Series
        • GRU
        • Hierarchical Clustering
        • High cross validation accuracy is not directly proportional to performance on unseen test data
        • Histogram
        • How do we evaluate of LLM Outputs
        • How to use Sklearn Pipeline
        • Hyperparameter
        • Hyperparameter Tuning
        • Impact of multicollinearity on model parameters
        • Inertia K Means Cost Function
        • inference
        • inference versus prediction
        • initialization methods
        • Interoperability
        • interoperable
        • Interpretability
        • Interpreting logistic regression model parameters
        • Isolated Forest
        • Jaccard Coefficient
        • K-means
        • K-nearest neighbours
        • Keras
        • Kernel Density Estimation
        • Kernelling
        • Kmeans vs GMM
        • L1 Regularisation
        • Label encoding vs One-hot encoding
        • Labelling data
        • Lagrange multipliers in optimisation
        • lambda architecture
        • Latent Dirichlet Allocation
        • Latent Semantic Indexing
        • LBFGS
        • Learning Curve
        • Learning Rate
        • Learning Styles
        • LightGBM
        • LightGBM vs XGBoost vs CatBoost
        • Linear Regression
        • LLM Evaluation Metrics
        • Local Interpretable Model-agnostic Explainations
        • Local Outlier Factor (LOF)
        • Logistic Regression
        • Logistic Regression does not predict probabilities
        • Logistic regression in sklearn & Gradient Descent
        • Logistic Regression Statsmodel Summary table
        • Loss function
        • Loss versus Cost function
        • Machine Learning
        • Machine Learning Operations
        • Manifold Learning
        • Markov Decision Processes
        • Maximum Likelihood Estimation
        • Median Absolute Error
        • Mermaid
        • Metadata Handling
        • Methods for Handling Outliers
        • Metric
        • Mini-batch gradient descent
        • MLOPS for Time Series
        • Model Building
        • Model Deployment using PyCaret
        • Model Ensemble
        • Model Evaluation
        • Model Evaluation vs Model Optimisation
        • Model Interpretability
        • Model Observability
        • Model Optimisation
        • Model Parameters
        • Model Parameters Tuning
        • Model parameters vs hyperparameters
        • Model Selection
        • Model Validation
        • model-agnostic feature importance
        • Momentum
        • Moving Average Forecast
        • Multinomial Naive bayes
        • Multiple Linear Regression
        • Naive Bayes Classifier
        • Naive Forecast
        • Neural network
        • Neural Network Classification
        • Neural network in Practice
        • Neural Scaling Laws
        • Non-negative matrix factorization in ML
        • Non-parametric tests
        • Normalisation of data
        • Normalisation vs Standardisation
        • objective function
        • One-hot encoding
        • Optimisation function
        • Optimisation techniques
        • Optimising a Logistic Regression Model
        • Optimising Neural Networks
        • Optuna
        • Ordinary Least Squares
        • Orthogonalization
        • Outliers
        • Over parameterised models
        • PCA Explained Variance Ratio
        • PCA Principal Components
        • PCA-Based Anomaly Detection
        • PDP and ICE
        • Percentile Detection
        • Performance Drift
        • Polynomial Regression
        • Positional Encoding
        • Precision
        • Precision or Recall
        • Precision-Recall Curve
        • Prediction Intervals vs Confidence Interval
        • Principal Component Analysis
        • PyCaret
        • PyOD
        • PyTorch
        • Pytorch vs Tensorflow
        • Q-Learning
        • Random Forest
        • Random Forest for Time Series
        • Recall
        • Recommender systems
        • Recurrent Neural Networks
        • Regression
        • Regression Metrics
        • Regularisation
        • Regularisation of Tree based models
        • Reinforcement learning
        • Relationships in memory
        • Reward Function
        • Ridge
        • ROC (Receiver Operating Characteristic)
        • Sammon’s Mapping
        • SARIMA
        • Scikit-Learn
        • Secretary Problem
        • semi-structured data
        • Sentence Transformers
        • Sklearn Pipeline
        • Specificity
        • Spectral Clustering
        • Supervised Learning
        • Support Vector Classifier
        • Support Vector Machines
        • Support Vector Regression
        • Tensorflow
        • Test Loss When Evaluating Models
        • Text Classification
        • Time Series Python Packages
        • Train-Dev-Test Sets
        • Transfer Learning
        • Transformed Target Regressor
        • Transformer
        • Transformers vs RNNs
        • Type I Error (False Positive)
        • Type II Error (False Negative)
        • Types of Neural Networks
        • Typical Output Formats in Neural Networks
        • UMAP
        • Unsupervised Learning
        • Use Cases for a Simple Neural Network Like
        • vanishing and exploding gradients problem
        • Variability in linear models
        • Variance in ML
        • Vector Embedding
        • WCSS and elbow method
        • Weak Learners
        • When and why not to us regularisation
        • Why does increasing the number of models in a ensemble not necessarily improve the accuracy
        • Why does the Adam Optimizer converge
        • Why Removing Outliers May Improve Regression but Harm Classification
        • Why standardise features
        • Why Type 1 and Type 2 matter
        • Wrapper Methods
        • Xaiver
        • XGBoost
      • natural-language
        • AI Agents Memory
        • Attention mechanism
        • Bag of words
        • BERT
        • BERTScore
        • Chain of thought
        • ChatGPT
        • Claude
        • Comparing LLMs
        • Distillation
        • ElasticSearch
        • Embedded Methods
        • embeddings for OOV words
        • Evaluate Embedding Methods
        • Fuzzywuzzy
        • Generative AI
        • Generative AI From Theory to Practice
        • Grammar method
        • Guardrails
        • How businesses use Gen AI
        • How LLMs store facts
        • How to reduce the need for Gen AI responses
        • How would you decide between using TF-IDF and Word2Vec for text vectorization
        • In NER how would you handle ambiguous entities
        • Key Components of Attention and Formula
        • Knowledge graph vs RAG setup
        • Language Model Output Optimisation
        • Language Models
        • Language Models Large (LLMs) vs Small (SLMs)
        • lemmatization
        • LLM
        • LLM Memory
        • Local LLM use cases
        • Mathematical Reasoning in Transformers
        • Mixture of Experts
        • Model Cascading
        • Multi-head attention
        • Named Entity Recognition
        • NER Implementation
        • Ngrams
        • NLP
        • nltk
        • Non-negative Matrix Factorization
        • NotebookLM
        • OOV words
        • Pandas Dataframe Agent
        • Part of speech tagging
        • Prompt Engineering
        • prompt retrievers
        • Prompts
        • Pyright
        • RAG
        • Scaling Agentic Systems
        • Self attention vs multi-head attention
        • Self-Attention
        • Semantic Relationships
        • Semantic search
        • Sentence Similarity
        • Sentence Transformer Workflow
        • Similarity Search
        • Small Language Models
        • spaCy
        • Stemming
        • stopwords
        • Summarisation
        • syntactic relationships
        • Text2Cypher
        • TF-IDF
        • TF-IDF Implementation
        • Tokenisation
        • topic modeling
        • Vectorisation
        • Why is named entity recognition (NER) a challenging task
        • Word2vec
        • WordNet
      • OTHER
        • Addressing_Multicollinearity.py
        • Bag_of_Words.py
        • Bandit example output
        • Bandit_Example_Fixed.py
        • Click_Implementation.py
        • Comparing_Ensembles.py
        • Cross_Entropy_Single.py
        • Cross_Entropy.py
        • Debugging.py
        • Distribution_Analysis.py
        • Factor_Analysis.py
        • FastAPI_Example.py
        • Feature_Distribution.py
        • Forecasting_AutoArima.py
        • Forecasting_Baseline.py
        • Forecasting_Exponential_Smoothing.py
        • Gaussian_Mixture_Model_Implementation.py
        • Handling_Missing_Data_Basic.ipynb
        • Handling_Missing_Data.ipynb
        • Heatmaps_Dendrograms.py
        • Imbalanced_Datasets_SMOTE.py
        • K_Means.py
        • Momentum.py
        • One_hot_encoding.py
        • Pandas_Common.py
        • Pandas_Stack.py
        • PCA_Analysis.ipynb
        • PCA_Based_Anomaly_Detection.py
        • Pycaret_Anomaly.ipynb
        • Pycaret_Example.py
        • Pydantic_More.py
        • Pydantic.py
        • Regression_Logistic_Metrics.ipynb
        • Regularisation.py
        • ROC_Curve.py
        • SVM_Example.py
        • Testing_Pytest.py
        • Testing_unittest.py
        • transfer_learning.py
        • TS_Anomaly_Detection.py
        • Vector_Embedding.py
        • Wikipedia_API.py
        • Word2Vec.py
      • PAPER
        • Attention Is All You Need
        • BERT Pretraining of Deep Bidirectional Transformers for Language Understanding
      • project-management
        • 1-on-1 Template
        • 1-to-1's with a Line Manager
        • Asking questions
        • Change Management
        • Communication principles
        • Communication Techniques
        • Communication with Stakeholders
        • Conceptual Model
        • Documentation
        • Education and Training
        • Experiment Plan Template
        • Feedback Template
        • Fishbone diagram
        • How to do git commit messages properly
        • html
        • Jobs to be done
        • Jupyter Book
        • Managing Data Science Teams
        • Modern data team
        • nbconvert slideshows
        • One Pager Template
        • pdoc
        • Problem Definition
        • Process for prototyping
        • project management
        • Project Management Portal
        • Pull Request Template
        • RACI
        • Remaining useful life models
        • Return of Experience Form
        • Reveal.js
        • Technical Debt
        • UML
        • Why use ER diagrams
      • statistics
        • Addressing Multicollinearity
        • ANOVA
        • Assumption of Normality
        • Bernoulli
        • Bootstrap Sampling
        • Casual Inference
        • Central Limit Theorem
        • Central Limit Theorem & Small Sample Sizes
        • Chi-Squared Test
        • Confidence Interval
        • Correlation
        • Correlation vs Causation
        • Cosine Similarity
        • Covariance
        • Covariance vs Correlation
        • Cryptography
        • Differentation
        • Distributions
        • EM Algorithm
        • Factor Analysis
        • Gaussian Distribution
        • Graph Theory
        • Grouped plots
        • Handling Different Distributions
        • Hypothesis testing
        • information theory
        • Interquartile Range (IQR) Detection
        • Johnson–Lindenstrauss lemma
        • Markov chain
        • Mathematics
        • Mean Absolute Error
        • Mean Squared Error
        • mean vs median
        • Multicollinearity
        • non-parametric
        • Odds
        • Odds vs Probability
        • p values
        • Parametric tests
        • parametric vs non-parametric models
        • parametric vs non-parametric tests
        • parsimonious
        • Prediction Intervals
        • Probability
        • Proportion Test
        • Q-Q Plot
        • R
        • R squared
        • R-squared metric not always a good indicator of model performance in regression
        • Reasoning tokens
        • Root Mean Squared Error
        • Sampling
        • Spearman vs Pearson Correlation
        • Standard deviation
        • Standardisation
        • Statistical Assumptions
        • Statistical Tests
        • Statistical theorems
        • Statistics
        • statsmodels
        • Stochastic Gradient Descent
        • Symbolic computation
        • Sympy
        • T-test
        • univariate vs multivariate
        • Variance
        • Violin plot
        • Z-Normalisation
        • Z-Score
        • Z-Scores vs Prediction Intervals
        • Z-Test
      • uncategorised
        • Investigate pyodbc
        • NLP Portal
        • Science Portal
      • pages
        • Data Archive
        • DE_Tools
        • ML_Tools
        • Quotes
        • Research Questions
        • Reviews

    Devops Portal

    Tools:

    • tool.uv
    • tool.ruff

    File types:

    • Justfile
    • TOML
    • Makefile
    • Json

    Practices:

    • Testing
    • Documentation & Meetings

    Related to:

    • DevOps

    TABLE file.name AS "Note", length(file.inlinks) AS "Backlinks"
    FROM ""
    SORT length(file.inlinks) DESC
    LIMIT 100
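
The query above appears to rank every note by how many other notes link to it and keep the top 100. A minimal Python sketch of that ranking follows, assuming the vault is available as a mapping from each note name to the notes it links to. The `outlinks` data, the `backlink_table` helper, and the rule of counting each linking note once are illustrative assumptions, not part of the original note or of Dataview itself.

```python
from collections import Counter


def backlink_table(outlinks, limit=100):
    """Return (note, backlink_count) rows, highest count first.

    outlinks: dict mapping a note name to the list of note names it links to.
    Only notes that exist as keys are reported, mirroring FROM "" (all pages).
    """
    counts = Counter({note: 0 for note in outlinks})
    for source, targets in outlinks.items():
        # Count each linking note once per target, even if it links twice.
        for target in set(targets):
            if target in counts:
                counts[target] += 1
    # sorted() is stable, so ties keep the original note order.
    rows = sorted(counts.items(), key=lambda row: -row[1])
    return rows[:limit]


# Hypothetical miniature vault, for illustration only.
outlinks = {
    "Devops Portal": ["tool.uv", "tool.ruff", "Testing", "DevOps"],
    "DevOps": ["Devops Portal", "Testing"],
    "Testing": ["Pytest"],
    "tool.uv": [],
    "tool.ruff": [],
    "Pytest": ["Testing"],
}

for note, backlinks in backlink_table(outlinks, limit=3):
    print(f"{note}: {backlinks}")
```

In this sample, "Testing" ranks first because three different notes link to it. How the real Dataview engine breaks ties or treats self-links is not covered by this sketch.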

    Backlinks

    • DevOps
    • Security Vulnerabilities
    • Home
    • Research Questions

      Created with Quartz v4.3.1 © 2025
