Importing HTML files to Bookpedia?

Any trouble you encounter with the Pedias, here's the place to ask for help.
phule92
Addicted to Bruji
Posts: 34
Joined: Mon Oct 26, 2009 11:17 am

Importing HTML files to Bookpedia?

Post by phule92 »

Is it possible to import a list of books from an HTML file into Bookpedia?
Conor
Top Dog
Posts: 5344
Joined: Sat Jul 03, 2004 12:58 pm

Re: Importing HTML files to Bookpedia?

Post by Conor »

Because HTML is so variable, there is no direct import from HTML. The best technique is to transform the HTML via regular expressions into a tab-delimited file, which can then be read by the import function in Bookpedia. However, since regular expressions that pull out several details for each book can get complicated, you can instead concentrate on pulling out a single value and creating a list of ISBNs or titles that can be copied and pasted into the add multiple window.
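
For anyone who prefers a script, here is a minimal Python sketch of the HTML to tab-delimited step. The markup it matches (a `title` span followed by an `isbn` span) is a made-up template for illustration only, and the `Title`/`ISBN` header row is an assumption, so adjust both to your own export and to how you map columns during import.

```python
import csv
import io
import re

# Hypothetical template: <span class="title">...</span> <span class="isbn">...</span>
# Replace this pattern with one that matches your own HTML export.
BOOK_RE = re.compile(
    r'<span class="title">(.*?)</span>\s*<span class="isbn">(.*?)</span>',
    re.DOTALL,
)

def html_to_tsv(html_text):
    """Return tab-delimited text with one row per book found in html_text."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["Title", "ISBN"])  # header row (assumed, not required)
    for title, isbn in BOOK_RE.findall(html_text):
        writer.writerow([title.strip(), isbn.strip()])
    return out.getvalue()
```

Writing the returned text to a `.txt` file gives you something the import function can read; the `csv` module takes care of quoting if a title happens to contain a tab.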

A program like BBEdit or TextWrangler can run a regular expression search over a number of files, which helps if the books are not all in a single HTML file. Run a multi-file find using a regular expression such as <title>(.*)</title> to match everything between the title tags. You can then copy the results list into a new document, resulting in something like this:

Code:

/Users/me/anExport/page1.html:6:  <title>Adaptation</title>
/Users/me/anExport/page2.html:6:    <title>The Lies of Locke Lamora</title>
/Users/me/anExport/page3.html:6:  <title>Little House on the Prairie</title>

This can be cleaned up so that only the title remains, using the following find and replace:
find: .*<title>(.*)</title>
replace: \1
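
If you would rather script it than use a text editor, the same find and replace can be sketched in Python. This is only an illustration of the pattern above; the function names are made up for the example, and it assumes each title sits on a single line as in the results list.

```python
import re

# Non-greedy version of <title>(.*)</title> for scanning a whole HTML file.
TITLE_RE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)

def extract_titles(html_text):
    """Return the text between each pair of title tags, trimmed."""
    return [title.strip() for title in TITLE_RE.findall(html_text)]

def titles_from_results(results_text):
    """Apply the find/replace above to a pasted search results list."""
    return [
        re.sub(r".*<title>(.*)</title>", r"\1", line)
        for line in results_text.splitlines()
        if "<title>" in line
    ]
```

Joining the returned list with newlines gives one title per line, ready to paste.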

This is just an example of how to pull out the title from one particular template. I would recommend pulling out the ISBN if possible, as it will give you exact results. The list can be copied into the add multiple window: even though the field is only one line high, it will take a long list and separate the entries at the newline character.
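
As a rough sketch of the ISBN route, the Python below scans HTML for 10- or 13-digit ISBN candidates and keeps only those whose check digit is valid. The pattern and function names are assumptions made for this example; it assumes ISBNs appear as plain digits, optionally separated by hyphens or spaces, and it may need adjusting for a particular template.

```python
import re

# Candidate ISBN-10 or ISBN-13, with optional hyphens or spaces between digits.
ISBN_RE = re.compile(r"\b(?:97[89][- ]?)?(?:\d[- ]?){9}[\dXx]\b")

def is_valid_isbn(candidate):
    """Check the ISBN-10 or ISBN-13 check digit of a separator-free string."""
    if len(candidate) == 10 and candidate[:9].isdigit() and candidate[9] in "0123456789X":
        values = [10 if c == "X" else int(c) for c in candidate]
        return sum((10 - i) * v for i, v in enumerate(values)) % 11 == 0
    if len(candidate) == 13 and candidate.isdigit():
        return sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(candidate)) % 10 == 0
    return False

def extract_isbns(html_text):
    """Return unique, checksum-valid ISBNs in order of first appearance."""
    found = []
    for match in ISBN_RE.finditer(html_text):
        candidate = match.group().replace("-", "").replace(" ", "").upper()
        if is_valid_isbn(candidate) and candidate not in found:
            found.append(candidate)
    return found
```

Printing `"\n".join(extract_isbns(text))` produces one ISBN per line. The checksum test filters out most stray 10-digit numbers, though an unrelated number can still pass by chance.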
phule92
Addicted to Bruji
Posts: 34
Joined: Mon Oct 26, 2009 11:17 am

Re: Importing HTML files to Bookpedia?

Post by phule92 »

Conor wrote: Because HTML is so variable there is no direct import from HTML. The best technique is to transform the HTML via regular expressions into a tab delimited file. The file can then be read by the import function in Bookpedia. However, since regular expressions can be complicated to pull out a number of details for each book you can concentrate on pulling out a single value and creating a list of ISBNs or titles that can then be copied and pasted into the add multiple window.
[SNIP]

Since importing an HTML file is so complicated, I'll stick with a tab-delimited file instead. And here I thought HTML would be simpler. Oh well. Thanks for clearing things up.