Data Archive
Over parameterised models
Neural network
Universal approximation theory