Sql is a programming language that is used by most relational database management systems (RDBMS) to manage data stored in tabular form (i.e. tables).
SQL is a fundamental skill expected from a data scientist. You might argue that it is the job of a data engineer but the data scientist roles are inclined to being full-stack. Besides, as a data scientist, you wouldn’t want to depend on a data engineer to retrieve data from a database.
In this article, we will go over two important data manipulation statements of SQL: UPDATE and INSERT.
Although the SQL syntax is mostly the same for all relational database management systems, there might be small differences. We will be using MySQL in this article.
Update statement modifies rows in a table by updating values. I have the following item table in a sales database.
mysql> select * from item;
+---------+----------------+-------+----------+
| item_id | description | price | store_id |
+---------+----------------+-------+----------+
| 1 | apple | 2.45 | 1 |
| 2 | banana | 3.45 | 1 |
| 3 | cereal | 4.20 | 2 |
| 4 | milk 1 liter | 3.80 | 2 |
| 5 | cheddar cheese | 4.50 | 2 |
| 6 | icecream | 6.10 | 2 |
| 7 | water 2 liters | 1.10 | 3 |
| 8 | tomato | 0.95 | 1 |
| 9 | egg 15 | 4.40 | 3 |
| 10 | sprite 1 liter | 1.60 | 3 |
+---------+----------------+-------+----------+
I want to make a copy of it and do the examples on this copy. There are multiple ways to create a copy of a table. One way is to create a new table and select rows from the existing table.
mysql> create table item_copy
-> select * from item limit 5;
The new table contains the first 5 rows of the existing table.
mysql> select * from item_copy;
+---------+----------------+-------+----------+
| item_id | description | price | store_id |
+---------+----------------+-------+----------+
| 1 | apple | 2.45 | 1 |
| 2 | banana | 3.45 | 1 |
| 3 | cereal | 4.20 | 2 |
| 4 | milk 1 liter | 3.80 | 2 |
| 5 | cheddar cheese | 4.50 | 2 |
+---------+----------------+-------+----------+
In order to properly use the update statement, we need to specify both the value to be updated and its new value. For instance, we can update the price of apple using the following statement.
mysql> update item_copy set price = 2.95
-> where description = 'apple';
mysql> select * from item_copy;
+---------+----------------+-------+----------+
| item_id | description | price | store_id |
+---------+----------------+-------+----------+
| 1 | apple | 2.95 | 1 |
| 2 | banana | 3.45 | 1 |
| 3 | cereal | 4.20 | 2 |
| 4 | milk 1 liter | 3.80 | 2 |
| 5 | cheddar cheese | 4.50 | 2 |
+---------+----------------+-------+----------+
We first write the table name after the update keyword. The set keyword specifies the new value. The where is optional and used to update based on some condition.
If we do not specify a condition, all rows are updated. For instance, the following statement will update all the rows of the store id column.
mysql> update item_copy set store_id = 99;
mysql> select * from item_copy;
+---------+----------------+-------+----------+
| item_id | description | price | store_id |
+---------+----------------+-------+----------+
| 1 | apple | 2.95 | 99 |
| 2 | banana | 3.45 | 99 |
| 3 | cereal | 4.20 | 99 |
| 4 | milk 1 liter | 3.80 | 99 |
| 5 | cheddar cheese | 4.50 | 99 |
+---------+----------------+-------+----------+
We can do multiple updates in one statement. Let’s update both the description and the price of apple.
mysql> update item_copy
-> set description = 'green apple', price = 3.20
-> where description = 'apple';
mysql> select * from item_copy limit 1;
+---------+-------------+-------+----------+
| item_id | description | price | store_id |
+---------+-------------+-------+----------+
| 1 | green apple | 3.20 | 99 |
+---------+-------------+-------+----------+
The insert statement is used to add rows to an existing table.
mysql> insert into item_copy
-> values (6, 'bread', 1.80, 99);
The statement above adds a new row to the item_copy table. We first specify the name of the table and then write the values.
mysql> select * from item_copy;
+---------+----------------+-------+----------+
| item_id | description | price | store_id |
+---------+----------------+-------+----------+
| 1 | green apple | 3.20 | 99 |
| 2 | banana | 3.45 | 99 |
| 3 | cereal | 4.20 | 99 |
| 4 | milk 1 liter | 3.80 | 99 |
| 5 | cheddar cheese | 4.50 | 99 |
| 6 | bread | 1.80 | 99 |
+---------+----------------+-------+----------+
We can insert multiple rows in one statement.
mysql> insert into item_copy
-> values (7, 'water', 1.25, 2),
-> (8, 'coffee', 2.20, 2);
mysql> select count(*) from item_copy;
+----------+
| count(*) |
+----------+
| 8 |
+----------+
We have added two more rows and the number of rows in the table is 8 now.
There are some constraints to keep in mind when using the insert statement. The data types must be compatible. For instance, the following statement will return an error.
mysql> insert into item_copy values (6, 'bread', 1.80, 'new');
ERROR 1366 (HY000): Incorrect integer value: 'new' for column 'store_id' at row 1
The store id column only accepts integer values because it is specified data type when creating the table. If we pass a value with a different data type, we will not be able to insert the row.
Another constraint is that we cannot have duplicate values in the primary key which is the column that uniquely identifies each row. It is like the index of a pandas dataframe.
I did not set a primary key in the copied table but the original item table has it.
mysql> describe item;
+-------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+--------------+------+-----+---------+-------+
| item_id | int(11) | NO | PRI | NULL | |
| description | varchar(20) | YES | | NULL | |
| price | decimal(6,2) | NO | | NULL | |
| store_id | int(11) | YES | MUL | NULL | |
+-------------+--------------+------+-----+---------+-------+
The item id column is the primary key so it cannot have any duplicate values. If we try to insert a row with an item id that already exists in the table, we will get an error.
mysql> insert into item
-> values (5, 'orange', 2.40, 2);
ERROR 1062 (23000): Duplicate entry '5' for key 'PRIMARY'
If we need to insert a row with a primary key value that already exists, we can use "on duplicate key update" clause. In that case, we will not get an error and the existing rows will be updated based on the information we specify.
mysql> insert into item values (5, 'orange', 2.40, 2)
-> on duplicate key update description = 'orange';
The description in the row with item id equals 5 is updated as ‘orange’.
Conclusion
We have covered two fundamental data manipulation statements of SQL which are "update" and "insert".
We have done some basic level examples that explain the syntax and structure of these statements. After you are comfortable working with the basics, you can work on more advanced statements.
Thank you for reading. Please let me know if you have any feedback.