Comic Book Plus Forum

News, Rules And Introductions => Basic Site Rules => Topic started by: EddieTea on June 11, 2013, 01:36:59 PM

Title: Text files
Post by: EddieTea on June 11, 2013, 01:36:59 PM
Hey everyone, firstly well done on creating an amazing collection of important literature.

I'm a post-grad linguistics researcher at the University of Wales, Swansea, specialising in children's literature. I'm currently building a corpus of 'classic' children's stories, and using Project Gutenberg to download texts (e.g. Treasure Island, Peter Pan, Little Women, Adventures of Mark Twain). As the period of publication is around 1860-1920, there are no copyright issues in downloading the texts and using them for linguistic analysis.

I'm particularly interested in collections such as the 'Half Dime Library' you have on your site, published during the same period.

Is there a way of extracting the text, or downloading rich text files (RTF), without the illustrations?

Title: Re: Text files
Post by: mr_goldenage on June 11, 2013, 03:49:37 PM
Is there an RTF extractor like there is for PDF files that are extracted without the text? Just a thought.

RB @ Work
Title: Re: Text files
Post by: SuperScrounge on June 12, 2013, 07:38:50 AM
Aren't those image files? (Either gif, jpg or png.)

If so I don't believe you could extract text as there is no real text, just pixels creating the illusion of text.

What you would need is an Optical Character Recognition (OCR) program, which could read the image and create a text document.

Or ask the scanner to rescan the pages you want with OCR.
Title: Re: Text files
Post by: dcburtonjr on June 27, 2013, 04:23:22 AM
Hi,
I'm new here and I also want to be able to pick out a particular panel and be able to use it on my website and/or blog. How can I do that? I'm using Windows 8 which isn't compatible with some programs. Any suggestions?
Thanks.
Title: Re: Text files
Post by: SuperScrounge on June 27, 2013, 05:48:08 AM
Once you've downloaded the file use an image editing program to open the page you want and Copy the panel(s) you want.

Or you could create a duplicate of the page and use Crop. (Don't use crop on the original if you wish to reread the rest of the page.)
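
A minimal sketch of the crop-a-copy idea above, for anyone who would rather script it. It is only an illustration: plain nested lists stand in for the pixels of a page so it runs without installing anything, and the names `crop_panel`, `page` and `panel` are made up for this example. The box is given as (left, upper, right, lower), which is the same convention Pillow's `Image.crop` uses if you later switch to a real image library.

```python
def crop_panel(page, box):
    """Return a new grid holding only the pixels inside box.

    box is (left, upper, right, lower); right and lower are exclusive.
    """
    left, upper, right, lower = box
    # Slicing builds new lists, so the original page is never modified.
    return [row[left:right] for row in page[upper:lower]]


# Hypothetical 6-wide, 4-tall "page": each number stands in for one pixel.
page = [[10 * y + x for x in range(6)] for y in range(4)]

# Cut out a 3x2 "panel" starting one pixel in from the top-left corner.
panel = crop_panel(page, (1, 1, 4, 3))
print(panel)
```

Because the function returns a new grid, the full page is still there to reread afterwards, which is the point of cropping a duplicate rather than the original.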
Title: Re: Text files
Post by: Yoc on June 27, 2013, 03:22:35 PM
I'm not sure if Windows 8 comes with image editing software.
If not you might grab a free one like Irfanview which can crop and save most image formats.

Don't forget to credit the original scanner and the site when you repost.

Good luck,
-Yoc
Title: Re: Text files
Post by: jimmm kelly on June 27, 2013, 03:39:53 PM
WordPress allows you to crop any images, a fact I didn't realize when I first started blogging on it. Saves me some time, if I haven't already cropped the image. My usual practice is to crop images in Paint, including some images I've saved from this site. I don't know how to sharpen the images. I guess you need Photoshop for that, which I don't have.
Title: Re: Text files
Post by: narfstar on June 27, 2013, 04:39:32 PM
Irfanview allows sharpening, as does the free Paint.NET.
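
For the curious, here is a rough sketch of what a sharpen filter does under the hood. It uses a common textbook 3x3 sharpening kernel on a grayscale grid of 0-255 values; this is an assumption for illustration only and is not claimed to be what Irfanview or Paint.NET actually run. The names `SHARPEN`, `sharpen` and `scan` are made up for the example.

```python
# Textbook sharpening kernel: boost the centre pixel, subtract its 4 neighbours.
SHARPEN = [[0, -1, 0],
           [-1, 5, -1],
           [0, -1, 0]]


def sharpen(gray):
    """Return a sharpened copy of a grayscale grid (list of rows, values 0-255)."""
    height, width = len(gray), len(gray[0])
    out = [row[:] for row in gray]  # border pixels are copied unchanged
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            total = sum(
                SHARPEN[j][i] * gray[y + j - 1][x + i - 1]
                for j in range(3)
                for i in range(3)
            )
            out[y][x] = max(0, min(255, total))  # clamp to the valid range
    return out


# Hypothetical scan with a soft vertical edge between 100 and 120.
scan = [[100, 100, 120, 120] for _ in range(3)]
result = sharpen(scan)
print(result[1])
```

In the middle row the darker side of the edge gets darker and the lighter side gets lighter, which is why sharpened scans look crisper but can also show halos if overdone.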
Title: Re: Text files
Post by: Yoc on June 27, 2013, 10:59:01 PM
If you download Irfanview and the available plug-ins from them, you can use lossless JPG editing, which might cut down on the blurring associated with editing JPGs. Irfanview is even able to use some Photoshop plug-ins out there.

-Yoc
Yes, I use Irfanview a lot
Title: Re: Text files
Post by: Comix on August 03, 2013, 04:56:12 PM
Hi there,
I
Title: Re: Text files
Post by: SammiD on September 04, 2016, 12:58:28 PM
I use Firefox with the "DownThemAll!" add-on. I haven't had any time-out problems; you might want to try it.