Tim's Random Interests: March 2011

This is likely to be one of many posts on the subject. I have an interest since the malformed database often rears its ugly head when using Calibre. Since I wrote the Kobo driver for Calibre and I do not believe Calibre causes the issue I have an interest in knowing more about the cause and potential fixes.

Dumping the database

For my purposes I use sqliteman to dump the database to a file. I have not tested the script simply using the sqlite3 command but I imagine it would work.

the script

The script below lacks some (a lot) of features and has hard coded file names, etc but it does the job. I used python because I might want to actually use some sqlite connections at some point.

The script works by attempting to interpret each row in the dump file as Unicode. If it fails the associated line and the related sql command is dropped.

#!/usr/bin/env python

import re
import string
import binascii

file = open("kobo-corrupt-nick.sql", "rb")
err_file = open("error.log", "w")
out_file = open("out.log", "w")

new_stmt = False
prev_row = ""
err_line = ""
corrupt = False
prev_corrupt = False
firstrow = True

while 1:
    line = file.readline()

    if not line:
        break
    try:
        err_line = line

        if re.match("[I][N][S][E][R][T] *", line):
            new_stmt = True
            row = line
        elif re.match("[D][R][O][P] *", line):
            new_stmt = True
            row = line
        elif re.match("[C][R][E][A][T][E] *", line):
            new_stmt = True
            row = line
        elif re.match("[P][R][A][G][M][A] *", line):
            new_stmt = True
            row = line
        elif re.match("[B][E][G][I][N] *", line):
            new_stmt = True
            row = line
        elif re.match("[C][O][M][M][I][T] *", line):
            new_stmt = True
            row = line
        else:
            new_stmt = False
            row = row + line

        if new_stmt == True:
            if prev_corrupt == False:
                if prev_row != "":
                    out_file.write(prev_row)
           else:
                if prev_row != "":
                    err_file.write(prev_row)
        prev_row = row

        unicode(line, 'utf-8')
        prev_corrupt = corrupt

    except UnicodeDecodeError:
        prev_corrupt = True
        prev_row = row
        print line
        continue
    except:
        raise

if new_stmt == True:
    if prev_corrupt == False:
        if prev_row != "":
            out_file.write(prev_row)
    else:
        if prev_row != "":
            err_file.write(prev_row)

file.close()
err_file.close()
out_file.close()

Importing the file

Importing the file is straight forward, simply run:

sqlite3 KoboReader.sqlite < out.log

Validating the data

This is the kicker. A lack of understanding of the data relationships may prevent being able to sort out the data. I have started on some scripts that may help. For instance:

select

c.title,

vs.volumeid, c.numshortcovers - count(vs.volumeid) diff

from

volume_shortcovers vs inner join content c on

c.contentid = vs.volumeid

where

c.bookid is Null

group by

c.title,vs.volumeid, c.numshortcovers

having diff <> 0

This script looks for issues where the number of rows in volume_shortcovers for each book is not the number specified by numshortcovers in the book record in the content table.

Does it mean anything? Who knows as my Kobo that has no issues also as differences.

Another sql statement attempts to rebuild those missing rows:

insert into volume_shortcovers (volumeid, shortcoverid, VolumeIndex)

select

c.bookid, c.contentid, c.volumeindex

from content c left outer join volume_shortcovers vs on

c.bookid = vs.volumeid and

c.contentid = vs.shortcoverid and

c.volumeindex = vs.volumeindex

where

c.bookid is not Null and

vs.volumeid is Null

order by c.bookid, c.volumeindex

Is that safe? No idea but I am working on it... Running it on a database that had issues reduced the number of rows returned by the first sql query.

As I learn more I will post it here.

Tim's Random Interests

Thursday, March 03, 2011

Fixing KOBO's malformed database

timlegge's shared items

Blog Archive

About Me