In the last step, we decided to postpone cleaning the &amp; character because Excel was giving a weird error about it. Now that we have finished Step three – import the data into MySQL in a single table, we can very easily clean the data using an UPDATE statement and the replace() string function. Here is the SQL query needed to take all instances of &amp; and replace them with &:
UPDATE sentiment140 SET tweet_text = replace(tweet_text, '&amp;', '&');
The replace() function works just like find and replace in Excel or in a text editor. We can see that tweet ID 594, which used to say #at&amp;t is complete fail, now reads #at&t is complete fail.
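To confirm the cleanup, or to preview a similar change before committing to an UPDATE, a read-only query can be handy. The following is a minimal sketch rather than a required step: it assumes the tweet ID is stored in a column named id, which is a hypothetical name, so adjust it to match your table.

```sql
-- Preview: list rows that contain the entity, next to what replace() would produce.
-- The column name `id` is an assumption; sentiment140 and tweet_text come from the query above.
SELECT id,
       tweet_text,
       REPLACE(tweet_text, '&amp;', '&') AS cleaned_text
FROM sentiment140
WHERE tweet_text LIKE '%&amp;%'
LIMIT 10;

-- Count the rows that still contain the entity.
SELECT COUNT(*) AS remaining
FROM sentiment140
WHERE tweet_text LIKE '%&amp;%';
```

Run before the UPDATE, the first query shows what would change. Run afterward, a remaining count of 0 suggests that nothing was missed. A nonzero count after the update could point to doubly encoded text such as &amp;amp;, which a single pass of replace() turns into &amp;.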