Modular, Practical Reference

Over the past year, I’ve been developing two structured repositories: ML_Tools and DE_Tools. These are designed to collect, distill, and organise machine learning and data engineering concepts into concise, reusable notebooks.

The goal is to reduce time spent revisiting the same problems across projects by providing well-scoped, implementation-neutral reference points for recurring tasks. Each repository is modular by design, organised by topic (e.g., Classifiers, Regression), and built around lightweight, adaptable notebooks that can be easily referenced, reused, or extended. Unlike tool-specific libraries such as scikit-learn, these notebooks focus on practical application rather than on any single package, making them broadly applicable across tools and frameworks.

These repositories emerged from practical need, as I often found myself searching across multiple codebases for the same techniques, examples, and workflows. I wanted a centralised, reusable resource that would help me find starting points quickly and adapt them, without the overhead of navigating large or domain-specific projects. The structure also supports a community-driven approach, allowing other developers, researchers, and analysts to fork and customise the materials to fit their own workflows.

Both repositories are linked to the Data Archive, a longer-form companion site that adds context, links related concepts, and places each notebook within a broader technical picture.

If you find these resources useful, feel free to fork them or contribute. The goal is to build a community-driven collection that helps others get a starting point for their own projects.
