The images belong to the DocumentPage model.
So you need to write a Paperclip Processor and use it in the Document
model. Inside this Processor you need to:
1. Create tmp folders where you will perform all the operations.
2. Convert the PPT to PDF using PyODConverter
(http://www.artofsolving.com/opensource/pyodconverter).
3. Convert the PDF to TIFF images using ImageMagick.
4. Process the TIFF images with
Tesseract (http://code.google.com/p/tesseract-ocr/) to extract
keywords.
5. Convert the TIFF images to PNG.
6. Create DocumentPage models, passing the PNG images and the extracted
keywords as parameters.
7. Once all the DocumentPage models are created, return from the
Processor so that the Document model itself can be created.
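
To make steps 2 to 5 above more concrete, here is a rough sketch of the external commands involved. This is not the code from my gist: the SlidePipeline module, its method names, the 300 dpi density and the naive keyword picker are all made up for illustration. It only builds the command lines; inside a Processor you would run each one (for example with system) from its make method and check the result.

```ruby
# Hypothetical helper, for illustration only: builds the command lines
# for steps 2-5 without running them.
module SlidePipeline
  module_function

  # Step 2: PyODConverter's DocumentConverter.py takes <input> <output>.
  # It needs an OpenOffice instance running in the background.
  def ppt_to_pdf(ppt, dir)
    pdf = File.join(dir, File.basename(ppt, ".*") + ".pdf")
    [["python", "DocumentConverter.py", ppt, pdf], pdf]
  end

  # Step 3: ImageMagick writes one TIFF per PDF page
  # (+adjoin plus a %03d pattern in the output name).
  # The density value is an assumption; tune it for your OCR quality.
  def pdf_to_tiffs(pdf, dir)
    ["convert", "-density", "300", pdf, "+adjoin", File.join(dir, "page-%03d.tif")]
  end

  # Step 4: tesseract <image> <base> writes the recognized text to <base>.txt.
  def ocr(tiff)
    base = tiff.sub(/\.tif\z/, "")
    [["tesseract", tiff, base], base + ".txt"]
  end

  # Step 5: convert a single TIFF page to PNG.
  def tiff_to_png(tiff)
    png = tiff.sub(/\.tif\z/, ".png")
    [["convert", tiff, png], png]
  end

  # Naive keyword picker (an assumption, not Tesseract's job):
  # unique lowercase words of four or more letters.
  def keywords(text)
    text.downcase.scan(/[a-z]{4,}/).uniq
  end
end
```

For step 1, Dir.mktmpdir from Ruby's tmpdir library gives you a working folder that is removed when its block exits, which is a convenient place to pass as dir here.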
Here is the Processor:
https://gist.github.com/723079
It's kinda messy and kinda belongs to my application, but you get the idea.
On Tue, Nov 23, 2010 at 6:31 AM, Andy <andymilk@gmail.com> wrote:
> Does anyone know of any gems or plugins that can take a PowerPoint and
> create images out of every slide and also access the text in each
> slide?
>
> --
> You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
> To post to this group, send email to rubyonrails-talk@googlegroups.com.
> To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.