Challenges with Machine Learning Projects

Introduction

Machine Learning Projects are hard to make. Maintaining them is even much harder. Code becomes complex and end of month or year, it becomes impossible to understand it.

Why do projects become spaghetti code?

Lack of project structure

Projects that lack structure initially later become a mess. Usually, this occurs when all code is dumped into a src folder. This folder gets highly polluted and later it becomes hard to understand the codebase. It is wise to divide the codebase into folders and subfolders, which make it easier to understand code.

Not documenting the code or work

Code that looks obvious right now becomes very hard to understand later. Remember that code written should be reusable and accessible to all people. Easy methods such as type hinting and docstrings are sufficient and make code clear.

For example, consider a function that takes a tuple denoting a rectangle in (x, y, w, h) format:

def foo(l):
  a = l[2] * l[3]
  return a

The above code is functional, but without documentation, it would be hard to understand. Let’s improve it:

def area_rect(rect: Tuple):
  """
  Arguments:
    rect (Tuple): A tuple denoting a rectangle in (x, y, w, h)
    format. Where x, y are coordinates and w, h are width
    and height.
  Returns (Int):
    Area of the rectangle.
  """
  area = rect[2] * rect[3]
  return area

How simple yet so powerful documenting can be!

Missing requirements, not specifying how to use the project

Small additions, such as having a requirements.txt file, can help people replicate your computer packages. This ensures consistency with your work.

Having a proper README that describes how the project is structured and how it should be run is really helpful. It helps people replicate your code and try it for their work.

Not using functions

Many people work without writing functions. This does not make code modular and harder to comprehend. Having functions with docstrings and using them keeps content clear and easier to understand. Writing functions at the top of the file and calling them in main is easier. For example:

if __name__ == "__main__":
  rect = [0, 0, 10, 20]
  res = area_rect(rect)

Overengineering

While the above examples are cases of underengineering or not following simple practices, overengineering is another issue. Sometimes, people try to do too much and reinvent the wheel. There are multiple well-supported and documented libraries available. Try to use those and their methods; most functions are already present and work really well.

That’s all for this blog! Hopefully, your next project doesn’t become spaghetti code!