Dumping the database
For my purposes I use sqliteman to dump the database to a file. I have not tested the script simply using the sqlite3 command but I imagine it would work.
the script
The script below lacks some (a lot) of features and has hard coded file names, etc but it does the job. I used python because I might want to actually use some sqlite connections at some point.
The script works by attempting to interpret each row in the dump file as Unicode. If it fails the associated line and the related sql command is dropped.
#!/usr/bin/env python import re import string import binascii file = open("kobo-corrupt-nick.sql", "rb") err_file = open("error.log", "w") out_file = open("out.log", "w") new_stmt = False prev_row = "" err_line = "" corrupt = False prev_corrupt = False firstrow = True while 1: line = file.readline() if not line: break try: err_line = line if re.match("[I][N][S][E][R][T] *", line): new_stmt = True row = line elif re.match("[D][R][O][P] *", line): new_stmt = True row = line elif re.match("[C][R][E][A][T][E] *", line): new_stmt = True row = line elif re.match("[P][R][A][G][M][A] *", line): new_stmt = True row = line elif re.match("[B][E][G][I][N] *", line): new_stmt = True row = line elif re.match("[C][O][M][M][I][T] *", line): new_stmt = True row = line else: new_stmt = False row = row + line if new_stmt == True: if prev_corrupt == False: if prev_row != "": out_file.write(prev_row)else:if prev_row != "": err_file.write(prev_row) prev_row = row unicode(line, 'utf-8') prev_corrupt = corrupt except UnicodeDecodeError: prev_corrupt = True prev_row = row print line continue except: raise if new_stmt == True: if prev_corrupt == False: if prev_row != "": out_file.write(prev_row) else: if prev_row != "": err_file.write(prev_row) file.close() err_file.close() out_file.close()
Importing the file
Importing the file is straight forward, simply run:
sqlite3 KoboReader.sqlite < out.log
Validating the data
This is the kicker. A lack of understanding of the data relationships may prevent being able to sort out the data. I have started on some scripts that may help. For instance:
select
c.title,
vs.volumeid, c.numshortcovers - count(vs.volumeid) diff
from
volume_shortcovers vs inner join content c on
c.contentid = vs.volumeid
where
c.bookid is Null
group by
c.title,vs.volumeid, c.numshortcovers
having diff <> 0
This script looks for issues where the number of rows in volume_shortcovers for each book is not the number specified by numshortcovers in the book record in the content table.
Does it mean anything? Who knows as my Kobo that has no issues also as differences.
Another sql statement attempts to rebuild those missing rows:
insert into volume_shortcovers (volumeid, shortcoverid, VolumeIndex)
select
c.bookid, c.contentid, c.volumeindex
from content c left outer join volume_shortcovers vs on
c.bookid = vs.volumeid and
c.contentid = vs.shortcoverid and
c.volumeindex = vs.volumeindex
where
c.bookid is not Null and
vs.volumeid is Null
order by c.bookid, c.volumeindex
Is that safe? No idea but I am working on it... Running it on a database that had issues reduced the number of rows returned by the first sql query.
As I learn more I will post it here.
No comments:
Post a Comment