Scrub data in SQLite-database with regexp?

Any trouble you encounter with the Pedias, here's the place to ask for help.
Post Reply
User avatar
The Swede
Bruji Friend
Bruji Friend
Posts: 10
Joined: Sat Aug 26, 2006 8:19 am
Location: Sweden

Scrub data in SQLite-database with regexp?

Post by The Swede »

I need to scrub the data coming in from LOC and Libris, KB, Sweden. (Se example data at the bottom)

I use Navicat to execute SQL queries direct to the database. I can scrub away the unwanted colons, " s", dots and commas. But I'm stuck with filtering away the brackets and text in between. Normally I'd use a regular expression and something like this:
UPDATE ZENTRY
SET ZPAGES=REPLACE(ZPAGES,'%[.?%]','')
WHERE ZPAGES NOTNULL
but it doesn't work due to some regexp restrictions in SQLite. Therefor I have temporarily solved the issue like this:
UPDATE ZENTRY
SET ZPAGES = REPLACE (ZPAGES, '[1', '[');

UPDATE ZENTRY
SET ZPAGES = REPLACE (ZPAGES, '[2', '[');

UPDATE ZENTRY
SET ZPAGES = REPLACE (ZPAGES, '[3', '[');
.
.and so on...
.
UPDATE ZENTRY
SET ZPAGES = REPLACE (ZPAGES, '[]', '');

It works but it is extremely slow. I scrub several fields like this and the code for the query has grown to some gasping 1300 lines and takes 40 seconds to run on a database with 1000 posts. Then consider that I need to run the whole query at least 20-50 times since the number of characters between the brackets can vary. It's gruesome...

My question is; can this be done faster/smarter/better/nicer/cooler?



Exampledata
ZPAGES
218, [1] s
397, [1] s
249, [1] s
408, [8] s
[4] s., s. 15-432, [2] s., s. 5-430, [1] s
172p
332, [1] s
326, [3] s
262 s
491 s
221, [1] s
126 s
[1], 356, [1] s
340, [1] s
323 s
[2], 342, 319, [3] s
232 s
258 S
257, [1] s

ZTITLE
Vad är pengar?: [allt du velat veta om världsekonomin men inte vågat fråga om]
Krukträdgården: [skapa en unik trädgård med hjälp av krukor]
Upptäck tron: [har livet mening?] : [vad påstår Jesus?] : [är tron till för alla?]
Prismas stora blombok för hemmet: [med över 1000 vackra färgbilder]
User avatar
Conor
Top Dog
Posts: 5343
Joined: Sat Jul 03, 2004 12:58 pm
Contact:

Re: Scrub data in SQLite-database with regexp?

Post by Conor »

SQLite3 is limited in it's string handling functions. Do remember some ltrim work around for some solutions. And there are ways to add functions to MYSQL so that you can work with regular expressions. But if I had to solve this problem I would dump the field and the z_pk field then I would work the text in my favorite regular expression editor and at the end use a regular expression to change it back to an import command for SQLite3.

Dumping the information:

Code: Select all

select Z_PK, zpages from zentry;
218, [1] s|1

You can update the separator to something more unique with ".separator" command.

Final output after the regular expressions:

Code: Select all

UPDATE zentry SET zPages = 218  WHERE z_pk = 1);
It's even possible to roll the whole thing into a script if you want to go with one click update.
Post Reply