How To Split Sql Insert Statement
“Don’t let yesterday take up too much of today.” - Will Rogers
Case study
Assume you have a SQL INSERT statement like this:
INSERT INTO table_name VALUES (value1, value2, value3, ...);
If you want to split the values into a list, you can use the following code snippet:
import re
split_values = values.split(",")
If work in some case until for some values contains comma (,) inside single quotes (‘), like this:
INSERT INTO table_name VALUES (value1, value2, 'value3, value3.1', ...);
In this case, the split() method will not work as expected. To handle this case, you can use the following code snippet:
split_values = re.split(r",(?=(?:[^']*'[^']*')*[^']*$)", values)
Elaboration
Let’s break down the code snippet re.split(r",(?=(?:[^']*'[^']*')*[^']*$)", values)
:
The re.split() function is used to split a string based on a specified pattern.
In this case, the pattern is ",(?=(?:[^']*'[^']*')*[^']*$)"
.
Now, let’s analyze the pattern:
-
, - This matches a comma (,) character, which serves as the delimiter for splitting the string.
-
(?=(?:[^’]’[^’]’)[^’]$) - This is a positive lookahead assertion. It ensures that the comma (,) is followed by a specific pattern without actually consuming it during the split.
-
(?:[^’]’[^’]’)* - This non-capturing group (?: … ) matches zero or more occurrences of the following pattern: any character except a single quote ([^’]), followed by a single quote (‘), followed by any character except a single quote ([^’]), and another single quote (‘). Essentially, it matches pairs of single quotes and any characters between them.
-
[^’]$ - This matches any character except a single quote ([^’]) until the end of the string ($). It ensures that there are no unmatched single quotes after the comma.
Therefore, the overall pattern ",(?=(?:[^']*'[^']*')*[^']*$)"
matches a comma (,) only if it is followed by an even number of single quotes (indicating it is not inside a pair of quotes).
The re.split() function splits the values string based on this pattern, effectively separating the values for each column while ignoring commas inside quotes. The resulting list, split_values, contains the extracted values.
Note that the strip(“’”) method is applied to each value in the for loop to remove the surrounding single quotes.
–HTH–