Kindle Forum banner
1 - 12 of 12 Posts

Registered · 5,610 Posts
Discussion Starter · #1
I apologize if this question has already been asked, but here it is.

I sometimes use Calibre software to convert and send PDF files to my Kindle 3. The question is... is it possible to strip the headers and footers from the file during the conversion? I found a page online that apparently showed how to do this, but it was for an old version of Calibre, and I can't find similar checkboxes in the current version.
 

Registered · 1,209 Posts
If you can't find the answer on the Calibre website or in an existing post on mobileread.com, ask in the Calibre forum there. The author(s) are on that forum all the time and can help with most things.
 

Registered · 1,205 Posts
Hmmm, I was worried this would be the case but wanted to check first...

If you want to remove anything more than the simplest of headers and footers, you'll need to learn something called regular expressions (regex or regexp). These are a way of specifying search patterns. A starter tutorial is here: http://manual.calibre-ebook.com/regexp.html

The old menus for removing headers and footers used regexes to search for what they thought might be a header or footer and remove it. I suspect the author found that a lot of the time these didn't work (because all headers and footers are different), so he replaced them with a generic Search and Replace tab.

Let's take a simple example, suppose your book has a header of "War and Peace" at the top of every page.

On the Search and Replace tab in the convert options, you could simply put "War and Peace" in the first search box, and leave the replace box blank, and it would remove all the headers. Simple! However...

If "War and Peace" appeared anywhere else in the book (in the body of the text, for instance), it would remove that as well. :(
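To make the problem concrete, here is a small Python sketch. Python's string and `re` tools are close to, but not guaranteed identical to, what Calibre does internally, and the sample text is made up. A plain text replacement with an empty replacement string strips the body mention along with the headers:

```python
# Made-up sample: a running header on each "page" plus one mention
# of the title in the body text.
pages = (
    "War and Peace\n"
    "Chapter 1 begins here.\n"
    "War and Peace\n"
    "This sentence mentions War and Peace in the body.\n"
)

# Plain (non-regex) search with an empty replacement: every occurrence
# goes, including the one in the body text.
stripped = pages.replace("War and Peace", "")
print(stripped)
```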

So your search has to be a little more complex than that, and this is where regexes come in. Here you'd rely on the fact that there is a line break after the header, and search for "War and Peace\n", where \n means a line break.
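The same idea as a Python sketch using the `re` module, on made-up sample text. Calibre actually searches its intermediate HTML, so in a real book the line break may show up as markup such as `<br>` or `</p>` rather than `\n`; the wizard preview will show you which.

```python
import re

# Made-up sample: header lines plus one mention of the title in the body.
pages = (
    "War and Peace\n"
    "Chapter 1 begins here.\n"
    "War and Peace\n"
    "This sentence mentions War and Peace in the body.\n"
)

# Only match the title when a line break follows it, i.e. when it sits
# at the end of a line the way a running header does.
header = re.compile(r"War and Peace\n")
stripped = header.sub("", pages)
print(stripped)
```

A body line that happens to end with the title would still match. Putting `^` at the start of the pattern and compiling with `re.MULTILINE` restricts it to lines that consist of the title alone.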

For another simple example, if your book had Page 1, Page 2, Page 3 etc. at the bottom of the page, you could search for "Page \d" which means Page, then a space, then a single digit.
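Trying `Page \d` on some made-up footers in Python shows the catch: it matches "Page 1" inside "Page 10" and leaves a stray `0` behind.

```python
import re

# Made-up footers, one per line.
footers = "Page 1\nPage 2\nPage 9\nPage 10\n"

# "Page \d": the word Page, a space, then exactly one digit.
single_digit = re.compile(r"Page \d")
result = single_digit.sub("", footers)
print(repr(result))
```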

But what happens on Page 10? To cover this, you'd use "Page \d+", where the + modifies the \d to mean "one or more of these".
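With the `+` quantifier the whole page number is consumed, however many digits it has (again a Python sketch on made-up footers):

```python
import re

# Made-up footers, including two- and three-digit page numbers.
footers = "Page 1\nPage 2\nPage 9\nPage 10\nPage 245\n"

# "\d+" means "one or more digits", so multi-digit numbers match whole.
page_number = re.compile(r"Page \d+")
result = page_number.sub("", footers)
print(repr(result))
```

In practice you'd probably want `Page \d+\n` so that the emptied line disappears as well.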

Sounds simple-ish, but I'm skipping over a lot of complications here (my examples wouldn't work very well, if at all!), and they do start getting complicated very quickly. You need a lot of trial and error to get them working, although once you do, they are fantastically powerful. For instance, I have regexes to turn "Smith, Arthur" into "Arthur Smith" and things like that.
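A sketch of one way to do that kind of name swap, using capture groups and backreferences in Python. This is not necessarily the regex mentioned above, and while Calibre's replace box should accept `\1`-style references too, that's worth checking in the wizard. `swap_names` is a hypothetical helper for illustration.

```python
import re

# Two capture groups: surname, then first name, separated by ", ".
# \w+ matches one run of letters/digits/underscores, so this simple
# sketch will not handle names like "O'Brien" or "van Gogh".
name_swap = re.compile(r"(\w+), (\w+)")

def swap_names(text: str) -> str:
    # In the replacement, \2 is the second group (first name) and
    # \1 the first group (surname).
    return name_swap.sub(r"\2 \1", text)

print(swap_names("Smith, Arthur"))  # Arthur Smith
```

It also shows how quickly things get complicated: the same pattern turns an ordinary phrase like "Yes, please" into "please Yes", so in a real book you'd need to tie it to wherever the names actually appear.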

You might find that a web search for something like "Calibre header footer remove regex" gives you some ideas. I found these threads, for instance: http://www.mobileread.com/forums/showthread.php?t=97738 and http://www.mobileread.com/forums/showthread.php?t=75594

Unfortunately, there's no such thing as a universal regex to remove headers and footers, which I suspect is why Kovid stopped trying to put them in Calibre.
 

Registered · 1,205 Posts
{I've tidied up my postings; I meant to edit the post to change it slightly and quoted myself instead. Doh! :-[}

Jim, I've just looked at this page http://wiki.mobileread.com/wiki/Calibre:Regular_expressions,_Calibre_and_you-_an_introduction#I_think_I.27m_beginning_to_understand_these_regular_expressions_now..._how_do_I_use_them_in_Calibre.3F.

This is what it suggests (with some additions from me...)

Pick the book you want to convert. Select Convert Books, then select Search and replace, and click the wizard button at the right. Then...

...you get a preview of what Calibre "sees" during the conversion process. Scroll down to the header or footer you want to remove, select and copy it, paste it into the regexp field on top of the window. ... Hit the button labeled "Test" and Calibre highlights the parts it would remove were you to use the regexp. Once you're satisfied, hit OK and convert.
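Outside Calibre, the same test-before-you-replace habit can be sketched in Python: list everything a pattern would match before running the substitution. `preview_matches` is a hypothetical helper for illustration, not part of Calibre, and the sample text is made up.

```python
import re

def preview_matches(pattern: str, text: str) -> list[str]:
    """Return every piece of text the pattern would remove, so it can be
    checked before anything is actually replaced (hypothetical helper)."""
    return [m.group(0) for m in re.finditer(pattern, text)]

# Made-up sample page with a header and a footer.
sample = "War and Peace\nChapter 1 begins here.\nPage 1\n"
for hit in preview_matches(r"Page \d+\n", sample):
    print(repr(hit))
```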

It's possible that this alone will give you the desired results. Stranger things have happened! ;D Otherwise, you'll have to try and work out what's going wrong, for example:

If there are variable parts, such as page numbers, use sets and quantifiers to cover them, and while you're at it, remember to escape any special characters.
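A Python sketch of both points, using a made-up footer. Typed in raw, the parentheses and dot are treated as regex syntax and the footer isn't found; escaping the fixed part with `re.escape` and using a set plus a quantifier for the page number fixes that.

```python
import re

# A made-up footer as it might be copied from the wizard's preview.
copied = "War and Peace (Vol. 1) - Page 17"

# Typed in raw, "(" and ")" form a group and "." matches any character,
# so this pattern does NOT match the literal footer text.
naive = re.compile(r"War and Peace (Vol. 1) - Page [0-9]+")
print(naive.search(copied))  # None

# Escape the fixed part so its special characters are taken literally,
# then cover the varying page number with a set ([0-9]) and a quantifier (+).
footer = re.compile(re.escape("War and Peace (Vol. 1) - Page ") + r"[0-9]+")
print(footer.search(copied) is not None)  # True

text = "...end of a page.\nWar and Peace (Vol. 1) - Page 17\nNext page starts.\n"
print(repr(footer.sub("", text)))
```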

The examples on the rest of the page and the tutorial I linked to previously might then give you the clues you need.
 

Registered · 1,205 Posts
Just thought, there is an alternative approach...

If you're familiar with Microsoft Word, you could try a convert... edit... convert-again approach. Convert the PDF to RTF, open the RTF in Word and edit out the headers and footers there, then convert the RTF to MOBI.

It's a messy approach, and it would have to be a book you really care about to be worth the trouble, but you might get results that way.
 