
As I became familiar with the Agile Manifesto a few months ago – its values and principles – the most interesting thing for me was to think about their application within Data Science projects. The idea is very simple; however, I am deeply convinced that each of the values comes with its own disclaimer – applicable in one context but not in another, for this or that reason. The Agile Manifesto, as a step forward from traditional methodologies such as the so-called “waterfall” approach, should make it possible to reduce the communication gap that exists between the client and the vendor, while at the same time responding to the increasingly rapid development of new technologies and to changes in initial requirements driven by work dynamics, digitalization and a competitive market.
Let us recall agile values through the prism of Data Science.
Individuals and interactions over processes and tools. In Data Science, interactions are key to understanding and defining business problems, that is, to extracting maximum value from the analysis. It is very important to keep a critical mindset and point out irregularities. Some insights lead to a change of direction and to addressing issues that were not defined in the scope, which sometimes means stepping outside the defined process. On the other hand, limiting the analysis to certain tools and technologies can result in truncated analysis and unusable insights.
Working software over comprehensive documentation. This is perhaps the value that can be most debated through the prism of Data Science. Having working software and a delivery is very important, but detailed documentation and explanations of how the data was sampled and prepared, which models were used and why, what is behind those models, how to interpret their output, and what performance to expect – all of this needs to be elaborated in detail to ensure the value of the delivered solution.
Customer collaboration over contract negotiation. As in any software development project, good collaboration with the client is a prerequisite for everything. In Data Science this is especially important, both for understanding the domain through interaction with the client and for the interpretation and testing of the solution that is delivered to the client. Since it is such a specific area, it is very important to establish cooperation with the client that involves joint efforts to create a solution of real value – most often, the value of the solution depends directly on the knowledge of domain experts.
Responding to change over following a plan. This is where Agile and Data Science coincide the most. It is very often the case that the analysis gives rise to new ideas about future steps, improvements or adjustments to the existing plan, and it is therefore necessary to be agile rather than adhere blindly to the defined plan in order to respond to such requirements successfully. The plan is very important, but it becomes obsolete and unusable as the goals are redefined and changed.
To make it clear – the fact that Agile and Data Science, as I characterized them in the title, are a “perfect combination” does not mean that they always lead to a perfect outcome in practice. What I wanted to emphasize is precisely that Agile allows Data Scientists to be Data Scientists. This means that they can dedicate themselves to research, change direction and redefine goals depending on the course of the analysis and the insights gained, work closely with clients in the attempt to find a solution, and so on.
Further, if we are talking about the agile principles (the famous twelve), there is a good chance that every Data Scientist or developer will agree with each of them at first. That is the beauty of the agile principles – they are defined so that they can be successfully applied to any project. When you think about it more thoroughly, though, some principles are debatable – for example, the principle that says the best architectures, requirements and designs emerge from self-organizing teams. I believe in this. But one very important precondition is the way these teams are put together. If the team does not include people who bring innovation, a “growth mindset”, autonomy and responsibility, it is very likely that the idea will fall apart. In practice, teams are often assembled as circumstances dictate, and sometimes it is evident that a team lacks a leader who will supervise and lead it – which is not how self-organizing teams work. I could do the same for each principle individually, but I will stop at this one and let you think about the pros and cons of each (or about situations in which a principle could be challenged).

However, there are several (more serious) problems that can occur as a result of this combination:
- poor and pruned (or even missing) documentation of the research process, because the focus is on insights and results – which can be a problem if someone else needs to be brought into the process
- very frequent changes in requirements can take the analysis in a completely different direction, which makes it difficult to define “acceptance” criteria and deadlines – sometimes developing a single module unnecessarily takes several months
- clients do not always understand the modest results of predictive models, which then affects communication and the quality of collaboration
- also, clients often believe that Data Science is a magical weapon that will solve all their business problems – which in turn affects communication, the quality of collaboration and the practical use of the solution
- Data Scientists often feel a lot of pressure – their solution is difficult to materialize, and once it does materialize, it is critically dependent on input data that they cannot influence
- daily communication can be depressing, since it often happens that some Data Science tasks show no significant progress for several days in a row, and the idea of frequent, incremental progress is lost
Since Data Science is so diverse, the ability to apply agile methods will vary from project to project, depending on what the project includes. If you are working on product development, Data Science becomes a niche of software engineering in that sense, and the application of agile methodologies and Scrum proves very useful. On the other hand, if you are working on one-off projects or solutions, the application can be much looser and meaningful only in certain phases. The most important thing is to recognize which good parts you can use to improve your way of working and achieve the best possible results.