
How to Manage your Data Science Projects

The five P's of Data Science Projects

Photo by Tim Swaan on Unsplash

In this short article I want to explain what the 5 P's are and how you can benefit from applying them in your Data Science projects.

Purpose

Analogous to classic project management, a goal or purpose should always be formulated. In the world of Big Data, possible goals can be:

  • Better business insights
  • Fraud prevention/detection
  • Prediction
  • Maximization problems, etc.

But you should never start a Big Data or Data Science project without a clear goal. Don’t just do something because everyone else is doing it!

People

The people who are normally involved in the development team have roles like:

  • Developers
  • Testers
  • Data Scientists
  • Domain Experts
Agile Data Science Team - Image by Author

Besides the development team, there are stakeholders and project sponsors, who are kept informed about the progress of the project, and the project manager or product owner, who mediates between both sides and is responsible for setting up an appropriate team, organizing deadlines, and creating the project plan or stories. Read more here about setting up a team.

Processes

You have to take two different types of processes into consideration. On the one hand, there are organizational processes and topics like:

  • Project organization: classic vs. Agile
  • Project progress reports and project marketing (e.g. how do I inform my stakeholders?)
  • Change process (How do I involve the people affected and make them collaborators?)

On the other hand you will deal with technical processes:

  • What is the business process I am trying to support with IT and data science?
  • How should the data integration process work, and which approach fits best (e.g. ETL vs. ELT)?
  • Which data analysis and data science process do I want to follow (e.g. CRISP-DM)?
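
To make the ETL vs. ELT question a bit more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the sample records, the table names and the helper functions `clean`, `run_etl` and `run_elt` are invented for this example, and an in-memory SQLite database stands in for a real data warehouse. With ETL the data is cleaned in application code before it is loaded; with ELT the raw data is loaded first and transformed with SQL inside the database.

```python
import sqlite3

# Hypothetical raw records, invented for illustration only.
RAW_ORDERS = [
    ("A-1", " 19.99 ", "eur"),
    ("A-2", "5.00", "EUR"),
    ("A-3", "n/a", "eur"),  # amount cannot be parsed
]


def clean(row):
    """Return a cleaned (order_id, amount, currency) tuple, or None if the amount is invalid."""
    order_id, amount, currency = row
    try:
        return order_id, float(amount.strip()), currency.upper()
    except ValueError:
        return None


def run_etl(conn, rows):
    """ETL: transform in application code, then load only the cleaned rows."""
    conn.execute("CREATE TABLE orders_etl (order_id TEXT, amount REAL, currency TEXT)")
    cleaned = [r for r in map(clean, rows) if r is not None]
    conn.executemany("INSERT INTO orders_etl VALUES (?, ?, ?)", cleaned)
    return conn.execute(
        "SELECT order_id, amount, currency FROM orders_etl ORDER BY order_id"
    ).fetchall()


def run_elt(conn, rows):
    """ELT: load the raw rows unchanged, then transform with SQL inside the database."""
    conn.execute("CREATE TABLE orders_raw (order_id TEXT, amount TEXT, currency TEXT)")
    conn.executemany("INSERT INTO orders_raw VALUES (?, ?, ?)", rows)
    conn.execute(
        """
        CREATE TABLE orders_elt AS
        SELECT order_id,
               CAST(TRIM(amount) AS REAL) AS amount,
               UPPER(currency) AS currency
        FROM orders_raw
        WHERE TRIM(amount) GLOB '[0-9]*'  -- keep rows whose amount starts with a digit
        """
    )
    return conn.execute(
        "SELECT order_id, amount, currency FROM orders_elt ORDER BY order_id"
    ).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print("ETL:", run_etl(connection, RAW_ORDERS))
    print("ELT:", run_elt(connection, RAW_ORDERS))
```

Both variants end up with the same cleaned rows. The difference is where the transformation runs and whether the raw data stays available in the target system, which is why the choice usually depends on the platform and governance questions discussed in the next section.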

Platforms

Fundamental and strategic questions that influence which platforms you will use for your analytics and products are:

  • What does my IT governance specify?
  • Which (IT) strategies do I pursue?
  • What does my existing (IT) architecture look like?
  • One-speed or two-speed IT?
  • What does my compliance/security dictate?

This leads to technical questions like:

  • How should the data integration be realized? (Manually via Java vs. tools like Talend, Dataflow, etc.)
  • Which cloud should be used? (AWS vs. Google vs. Azure and public vs. private vs. hybrid)
  • What are my technical requirements?
  • Which SLAs (contracts between a service provider and a customer) apply?

How to build up your data analytics platform is described in this article.

Programmability

Which tools and programming languages do I use? This point is of course also determined and influenced by the IT governance and strategy and the answers to the questions above.

Tools and programming languages could be:

  • Programming languages like SQL, Python, R
  • Big Data tools like Hadoop, Google Cloud Storage and BigQuery, Amazon Redshift and S3
  • Streaming software like Kafka, Spark, Talend
  • BI tools like Tableau, Qlik, Google Data Studio

Conclusion

When you are working on a Data Science project, or even managing one, the 5 P's give you the five main topics you should think about. Furthermore, they provide you with questions you can ask yourself during a project. For further information you can follow the links below.

Sources and Further Readings

How to set up an flexible and scalable Data Analytics Platform

What is Data Science ?? The 5 P’s !! | Data Science and Machine Learning

