Are you interested in Data Science but unsure whether you can succeed without any programming experience? Or are you not sure how much programming is enough to get started?
Do not worry! In this article, I am going to explain the basic Python programming concepts that you need to start your journey of learning Data Science. Even if you choose a language other than Python to learn Data Science, the topics remain the same.
Setting up the Environment
The first step is to set up the environment. For learning Python, it is best to have Anaconda installed. Some of the benefits of using Anaconda are:
- It comes with Jupyter Notebook and Spyder IDEs
- You have the option to access the IDEs through Command Prompt or using Navigator (UI based)
- It has more than 1,500 data science packages
- Extremely easy and fast to set up and get started
- Easy to create multiple environments. This feature is especially useful when you are working on many projects with conflicting library dependencies
Install Anaconda from [here](https://conda.io/projects/conda/en/latest/user-guide/getting-started.html). You can get started with Anaconda using the link here, which covers all the basic commands and features. If you are more of a video person, I have made a video on getting started with Anaconda; check it out below:
Before getting started with any topic, try to familiarize yourself with the IDE that you are going to work in: simple things like starting the IDE, shutting it down, clearing the memory, executing scripts, and so on.
Variable and String Data Type
After successfully installing the tool, the next step is to learn how to declare a variable, and then to learn about the string data type and the operations that can be performed on it.
It is important to understand how the string data type works in Python. A string in Python behaves like an array of characters, and the character(s) of the string can be accessed using square brackets by specifying the index.
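As a minimal sketch of declaring a variable and indexing into a string (the variable name `greeting` and its value are made up for illustration):

```python
# Declare a variable holding a string (name and value are illustrative).
greeting = "Data Science"

# Indexing starts at 0; negative indexes count from the end of the string.
first_char = greeting[0]     # 'D'
last_char = greeting[-1]     # 'e'

# A slice returns a substring: the start index is included, the end index is not.
first_word = greeting[0:4]   # 'Data'

print(first_char, last_char, first_word)
```

Note that no type declaration is needed; Python infers the type from the assigned value.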
Some of the string operations that you should get familiar with are:
- replace() – Used to replace a specific value in the string with a new value
- split() – Used to split a string into 2 or more parts based on a delimiter
- Concatenation – The plus operator (+) is used to concatenate a number of strings
- strip() – Used to remove the extra spaces before and after a string (the Python equivalent of Trim() in some other languages)
- lower() and upper() – Used to change the string case, either from upper case to lower case or from lower case to upper case
- Use of an index (or slice) to extract a subset of a string
- Use of ‘in’ and ‘not in’ to check for the presence/absence of a phrase in the string
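The operations listed above could be tried out roughly as follows. This is only a sketch with a made-up sample string; note that in Python the method names are lowercase, trimming is done with `strip()`, and the membership keywords are `in` and `not in`.

```python
# Sample text with extra spaces on both sides (the value is illustrative).
text = "  Learning Python for Data Science  "

trimmed = text.strip()                      # remove leading/trailing spaces
replaced = trimmed.replace("Python", "R")   # swap one value for another
words = trimmed.split(" ")                  # split on a space delimiter
joined = "Data" + " " + "Science"           # concatenate with the plus operator
lowered = trimmed.lower()                   # all lower case
uppered = trimmed.upper()                   # all upper case
subset = trimmed[9:15]                      # slice out characters 9 to 14
has_python = "Python" in trimmed            # presence check
no_java = "Java" not in trimmed             # absence check

print(replaced)
print(words)
print(subset, has_python, no_java)
```

Strings are immutable, so each of these operations returns a new string and leaves `text` unchanged.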
Refer to the Python tutorial here to learn about these concepts and features of the Python programming language.
Numeric, Boolean, and Operators
Learn about the three numeric data types in Python: integer, float, and complex. Also learn how to convert data from one numeric type to another and how the conversion affects the value. For example, define an integer, convert it into a float, and check what happens to the value.
A Boolean represents either True or False. Booleans are generally used to hold the result of evaluating an expression or condition, such as: is A > B?
Operators are used to perform operations on variables, values, or a combination of both. Some of the operators available in Python are:
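The conversion exercise suggested above might look like this. The variable names and values are illustrative only.

```python
count = 7                  # an integer
price = float(count)       # int -> float gives 7.0
ratio = 3.99               # a float
truncated = int(ratio)     # float -> int drops the decimal part (no rounding)
z = complex(count)         # int -> complex gives (7+0j)

# Print each value next to the name of its type.
for value in (count, price, truncated, z):
    print(value, type(value).__name__)
```

The point to notice is that converting a float to an integer truncates toward zero rather than rounding, so information can be lost.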
- Assignment Operators
- Comparison Operators
- Arithmetic Operators
- Logical Operators
- Bitwise Operators
- Identity and Membership Operators
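Here is a short sketch touching each operator family in the list above, using two made-up integers `a` and `b`. The comparison and logical examples also show where Boolean values come from.

```python
a, b = 10, 3

# Assignment operators
total = a
total += b                    # same as total = total + b

# Arithmetic operators
quotient = a / b              # true division, always returns a float
floor_q = a // b              # floor division
remainder = a % b             # modulo
power = a ** b                # exponentiation

# Comparison operators return a Boolean
is_greater = a > b

# Logical operators combine Booleans
both = (a > 5) and (b > 5)
either = (a > 5) or (b > 5)

# Bitwise operators work on the binary representation (10 is 1010, 3 is 0011)
bit_and = a & b
bit_or = a | b

# Identity and membership operators
nothing = None
is_none = nothing is None
is_member = b in [1, 2, 3]

print(total, floor_q, remainder, power, is_greater, both, either)
```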
Learn about all the Python operators from the tutorial here and practice by trying them out.
Collection Data Types (List, Tuple, Sets, and Dictionary)
There are four collection data types in Python. To a beginner they might look very similar to each other, but each one has unique features that differentiate it from the others and make it suited to particular use-cases. Some of their unique characteristics are:
List
- Declared using ‘[‘ and ‘]’ brackets
- Elements can be accessed using Index
- They are mutable, which means that they can be altered/changed
- They can be sorted
- Elements of the list can be of any data type
- Use-cases: This is the most popular collection data type in Python because it is the most flexible.
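A small sketch of the list characteristics above, with an invented `scores` list:

```python
# A list can mix data types (the values here are illustrative).
scores = [72, 95, "n/a", 88.5]

first = scores[0]      # access by index
scores[2] = 60         # mutable: replace an element in place
scores.append(81)      # mutable: add an element at the end
scores.sort()          # sort in place (works now that all elements are numbers)

print(first, scores)
```

Sorting only works when the elements can be compared with each other, which is why the string is replaced with a number before calling `sort()`.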
Tuple
- Declared using ‘(‘ and ‘)’ brackets
- A tuple is immutable, which means that it cannot be altered once defined
- They are ordered and can be accessed using an index
- Tuples are generally lighter than lists: they use less memory and are a little faster to create
- Elements of Tuple can be of any data type
- Use-cases: This should be used in scenarios where the list of elements can’t or shouldn’t be changed
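To see the immutability described above in action, here is a minimal sketch with a made-up `point` tuple:

```python
# A tuple can also mix data types (the values here are illustrative).
point = (3.5, "north", 42)

second = point[1]          # access by index works just like a list

# Trying to change an element raises a TypeError because tuples are immutable.
try:
    point[0] = 1.0
    changed = True
except TypeError:
    changed = False

print(second, changed, point)
```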
Sets
- Declared using ‘{‘ and ‘}’ brackets (an empty set is created with set(), because empty brackets create a dictionary)
- They are mutable
- They are unordered and have no index to access a specific element
- They can’t have any duplicates; adding a value that is already present has no effect
- Though sets themselves are mutable, their elements must be immutable; for example, a list can’t be an element in a set
- The specialty of sets is that they allow operations like union and intersection
- Use-cases: When there is a requirement to compare several lists of values, such as identifying the number of common elements, it is best to define them as sets
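The set behaviour above can be sketched with two invented teams:

```python
# The duplicate "bo" is silently dropped (names are illustrative).
team_a = {"ana", "bo", "cy", "bo"}
team_b = {"bo", "cy", "di"}

common = team_a & team_b       # intersection: members of both teams
everyone = team_a | team_b     # union: members of either team
only_a = team_a - team_b       # difference: members of team_a only

# A list is mutable (unhashable), so it cannot be an element of a set.
try:
    bad = {[1, 2]}
    list_allowed = True
except TypeError:
    list_allowed = False

print(len(common), sorted(everyone), only_a, list_allowed)
```

`sorted()` is used when printing because a set has no guaranteed order.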
Dictionary
- Declared using ‘{‘ and ‘}’ brackets
- The elements are stored as key-value pairs, similar to the JSON format
- They are mutable
- Elements are accessed using the key rather than a positional index; since Python 3.7, dictionaries also preserve insertion order
- Use-cases: When you would like to have a mapping between a key and a value, such as the contact number associated with a customer, it is better to declare it as a dictionary. The dictionary data type can also be used to store more complex, nested data structures
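Following the customer-contact use-case above, a dictionary could be used like this. All names and numbers are invented for illustration.

```python
# Map each customer name (key) to a contact number (value).
contacts = {"Asha": "555-0101", "Ben": "555-0102"}

asha_number = contacts["Asha"]              # look up a value by its key
contacts["Cara"] = "555-0103"               # add a new key-value pair
contacts["Ben"] = "555-0199"                # update an existing value
missing = contacts.get("Dev", "unknown")    # get() avoids a KeyError for absent keys

# Values can themselves be lists or dictionaries, giving nested structures.
customer = {"name": "Asha", "orders": [101, 102], "address": {"city": "Pune"}}
city = customer["address"]["city"]

print(asha_number, missing, city, len(contacts))
```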
Check the tutorial here to better understand the above collection data types and the operations that can be performed on them.
Conditional (If-Then-Else) and Control Flow Statements (For and While Loop)
Implementing these is a must-know in Python. Any Data Science project will have a use-case that requires us to loop through a list of items or a data frame, for which we need to implement a loop. Similarly, there will always be a requirement to check for a condition. Hence, learn how to implement the following in Python:
- If-Then-Else (written with ‘if’, ‘elif’, and ‘else’; Python has no ‘then’ keyword)
- For Loop
- While Loop
For people who are very new to programming, the basic difference between a ‘For Loop’ and a ‘While Loop’ is this: a ‘For Loop’ iterates through a specific list of elements, and the loop variable is initialized, checked, and advanced automatically. A ‘While Loop’ keeps iterating as long as its condition remains true, and the initialization and increment need to be written explicitly. It is preferable to use a ‘For Loop’ when we are sure about the number of iterations. With a ‘While Loop’, if you forget the increment statement inside the loop, the condition never becomes false and it becomes an infinite loop.
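The difference can be sketched as follows, with made-up values and thresholds. The first loop also shows a condition check with `if`, `elif`, and `else`.

```python
values = [4, 11, 7, 20]
labels = []

# For loop: iterates over each element; no manual counter is needed.
for v in values:
    if v > 10:
        labels.append("high")
    elif v > 5:
        labels.append("medium")
    else:
        labels.append("low")

# While loop: the counter is initialized and updated explicitly.
countdown = 3
steps = 0
while countdown > 0:
    steps += 1
    countdown -= 1    # without this line the condition stays true forever

print(labels, steps)
```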
Function and Lambda Functions

Functions are used to avoid repetition of code, to reduce complexity, and to improve readability. A function in Python is defined using ‘def’ and generally ends with a ‘return’ statement. When a set of statements needs to be executed multiple times in different parts of a project, it is better to define them as a function and call it wherever required. A function can take any number of arguments as inputs. When calling it, the number of arguments must match the definition: if the function expects 2 arguments, we need to pass exactly two, no more and no less (unless the definition provides default values or accepts a variable number of arguments).
A Lambda Function is similar to a regular function: it can take any number of arguments, but it can have only one expression. Unlike regular functions, lambda functions can remain anonymous. A lambda function is written with the keyword ‘lambda’, and the syntax is ‘lambda arguments: expression’.
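As an illustration, here is a hypothetical `percent_change` function (the name, parameters, and formula are chosen only for this example). It takes two required arguments, plus one optional argument with a default value.

```python
def percent_change(old, new, digits=2):
    """Return the percentage change from old to new, rounded to `digits` places."""
    return round((new - old) / old * 100, digits)

# Call with the two required arguments; `digits` falls back to its default.
growth = percent_change(50, 75)

# Leaving out a required argument raises a TypeError.
try:
    percent_change(50)
    call_ok = True
except TypeError:
    call_ok = False

print(growth, call_ok)
```

Once defined, the function can be reused anywhere in the project instead of repeating the formula.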
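A brief sketch of the lambda syntax follows. The names `square` and `pairs` are invented for the example. A lambda can be bound to a name, but it is most often passed directly to another function while staying anonymous.

```python
# A lambda bound to a name: one argument, one expression.
square = lambda x: x * x
nine = square(3)

# An anonymous lambda used as a sort key: order the pairs by their second item.
pairs = [("b", 2), ("a", 3), ("c", 1)]
by_count = sorted(pairs, key=lambda pair: pair[1])

print(nine, by_count)
```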
These basic concepts are just enough to get started with learning Data Science. If you like tutorial videos, check out the video series I have made on the Python basics required for Data Science. It has 7 modules of about 20–30 minutes each, with exercises to try at the end of each module, and the total length of the tutorial is just short of 3 hours. Below is the link to the Python basics tutorial series; please subscribe for more content related to Data Science.
Final Statement
This is just the beginning of your programming journey. These concepts will be very useful for getting started with Data Science and could also help you break through your learning barrier. Just remember to keep learning.
"Intellectual growth should commence at birth and cease only at death"
-Albert Einstein