Ruby on Rails Tuesday, November 30, 2010

Let's say the uploaded PPT belongs to a Document model, and its page
images belong to a DocumentPage model.

So you need to write a Paperclip processor and use it in the Document
model. Inside this processor you need to:
1. Create tmp folders where you will perform all operations
2. Convert PPT to PDF using PyODConverter (http://www.artofsolving.com/opensource/pyodconverter)
3. Convert PDF to TIFF images using ImageMagick.
4. Process TIFF images with Tesseract
(http://code.google.com/p/tesseract-ocr/) to extract keywords.
5. Convert TIFF to PNG
6. Create DocumentPage models, passing the PNG images and extracted
keywords as parameters.
7. If all DocumentPage models are created, just return from the processor to
let the Document model be created.
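
To make steps 1 to 5 a bit more concrete, here is a rough sketch in plain
Ruby. It is not the code from the gist. SlidePipeline and its method names
are made up for illustration, the location of PyODConverter's
DocumentConverter.py is an assumption, and it assumes an OpenOffice service
is already running for PyODConverter to talk to, and that ImageMagick's
convert and tesseract are on the PATH.

```ruby
require "tmpdir"
require "shellwords"

# Hypothetical helper (not from the gist): builds and runs the external
# commands for steps 1-5 of the list above.
module SlidePipeline
  module_function

  # Step 2: PPT -> PDF. Assumes DocumentConverter.py lives at `converter`.
  def ppt_to_pdf(ppt, pdf, converter: "DocumentConverter.py")
    ["python", converter, ppt, pdf]
  end

  # Step 3: PDF -> one TIFF per page; ImageMagick fills in %03d per page.
  def pdf_to_tiffs(pdf, dir, density: 300)
    ["convert", "-density", density.to_s, pdf, File.join(dir, "page-%03d.tif")]
  end

  # Step 4: OCR; tesseract writes the recognized text to "<base>.txt".
  def ocr(tiff)
    ["tesseract", tiff, tiff.sub(/\.tif\z/, "")]
  end

  # Step 5: TIFF -> PNG.
  def tiff_to_png(tiff)
    ["convert", tiff, tiff.sub(/\.tif\z/, ".png")]
  end

  # Run one command without a shell; raise if it fails or is missing.
  def run!(cmd)
    system(*cmd) or raise "command failed: #{cmd.shelljoin}"
  end

  # Yields (png_path, ocr_text) for every slide. The tmp folder (step 1)
  # is deleted when the block returns, so use the files inside the block.
  def each_page(ppt, converter: "DocumentConverter.py")
    Dir.mktmpdir("slides") do |dir|
      pdf = File.join(dir, "slides.pdf")
      run!(ppt_to_pdf(ppt, pdf, converter: converter))
      run!(pdf_to_tiffs(pdf, dir))
      Dir[File.join(dir, "page-*.tif")].sort.each do |tiff|
        run!(ocr(tiff))
        run!(tiff_to_png(tiff))
        base = tiff.sub(/\.tif\z/, "")
        yield "#{base}.png", File.read("#{base}.txt")
      end
    end
  end
end
```

Inside the processor, the block passed to each_page would be the place to
create each DocumentPage (steps 6 and 7), because the temporary folder is
gone once the block returns. How the OCR text is reduced to keywords is left
out here.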

Here is the processor:
https://gist.github.com/723079
It's kinda messy and specific to my application, but you get the idea.

On Tue, Nov 23, 2010 at 6:31 AM, Andy <andymilk@gmail.com> wrote:
> Does anyone know of any gems or plugins that can take a PowerPoint and
> create images out of every slide and also access the text in each
> slide?
>
> --
> You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
> To post to this group, send email to rubyonrails-talk@googlegroups.com.
> To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.
>
>

