Ruby on Rails Saturday, July 7, 2012

On Jul 7, 2012, at 11:11 AM, David M. wrote:

> I've got a web-app currently partially working. The user uploads a .txt,
> .docx or .doc file to the server.
>
> Currently the model handles those files, saves some metadata (the
> extension and original filename), then saves the file to the hard drive.
> Next it converts the doc and docx files to plain text and saves the
> output to a txt file.
>
> My problem is I want to copy the plain text contents of those txt files
> to the :body field in my database, but by the time those files are
> written no more changes can be sent to the database (because all the
> file handling is done in after_save).
>
> Where or how do I sanely get the contents of those TXT files into the
> database?

I built this feature in my first commercial Rails app. I used Paperclip for my file storage, which offers its own callback called 'after_post_process' that worked out perfectly for me.
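The timing is what matters here: Paperclip runs its processors when the file is assigned to the attachment, before the record is saved, so anything set in after_post_process is still part of the pending save. Here is a rough sketch of that ordering in plain Ruby. FakeRecord and its hooks are invented for illustration only; this is not ActiveRecord or Paperclip code, just a toy that mimics "callback, write, callback".

```ruby
# Hypothetical stand-in for a model. NOT ActiveRecord: it only mimics the
# order in which callbacks run around the write to the database.
class FakeRecord
  attr_accessor :body
  attr_reader :persisted

  def initialize(before_save: nil, after_save: nil)
    @before_save = before_save
    @after_save = after_save
    @persisted = {}
  end

  def save
    @before_save.call(self) if @before_save
    @persisted = { :body => body } # the pretend "write to the database"
    @after_save.call(self) if @after_save
    true
  end
end

# Setting the attribute before the write: the value is persisted.
early = FakeRecord.new(before_save: ->(record) { record.body = 'extracted text' })
early.save

# Setting the attribute after the write: the value only lives in memory.
late = FakeRecord.new(after_save: ->(record) { record.body = 'extracted text' })
late.save
```

In this toy, early.persisted[:body] holds the text while late.persisted[:body] stays nil, which is the same situation as doing the file handling in after_save.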

First, I created a Paperclip processor to extract the text version of the uploaded file (mine were all PDF).

    # /lib/paperclip_processors/text.rb

    module Paperclip
      # Handles extracting plain text from PDF file attachments
      class Text < Processor

        attr_accessor :whiny

        # Creates a Text extract from PDF
        def make
          src = @file
          dst = Tempfile.new([@basename, 'txt'].compact.join("."))
          command = <<-end_command
            "#{ File.expand_path(src.path) }"
            "#{ File.expand_path(dst.path) }"
          end_command

          begin
            success = Paperclip.run("/usr/bin/pdftotext -nopgbrk", command.gsub(/\s+/, " "))
            Rails.logger.info "Processing #{src.path} to #{dst.path} in the text processor."
          rescue PaperclipCommandLineError
            raise PaperclipError, "There was an error processing the text for #{@basename}" if @whiny
          end
          dst
        end
      end
    end
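
The heredoc and gsub in the processor only serve to join the two quoted paths into a single argument string. As a minimal standalone sketch of that one step, here is an alternative that escapes the paths with Shellwords from the standard library instead of quoting them by hand. The helper name pdftotext_args is made up for illustration and is not part of Paperclip, and the paths are assumed to be Unix-style.

```ruby
require 'shellwords'

# Hypothetical helper (not part of Paperclip): builds the argument string
# for pdftotext from a source path and a destination path, escaping each
# path so spaces and shell metacharacters are passed through literally.
def pdftotext_args(src_path, dst_path)
  [src_path, dst_path]
    .map { |path| Shellwords.escape(File.expand_path(path)) }
    .join(' ')
end
```

Called as pdftotext_args('/tmp/my report.pdf', '/tmp/out.txt'), this returns a string in which the space in the first path is backslash-escaped, so it could be handed to the command in place of the gsub'd heredoc.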

Then in my document.rb (model for the file attachment), I added the following bits:

    has_attached_file :pdf, :styles => { :text => { :fake => 'variable' } }, :processors => [:text]

    after_post_process :extract_text

    private

    # Copies the text the processor extracted into the plain_text column.
    def extract_text
      plain_text = ""
      # The block form closes the file handle when reading is done.
      File.open(pdf.queued_for_write[:text].path, "r") do |file|
        while (line = file.gets)
          plain_text << Iconv.conv('ASCII//IGNORE', 'UTF8', line)
        end
      end
      self.plain_text = plain_text
    end
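
One caveat on the Iconv call: Iconv was deprecated in Ruby 1.9.3 and removed from the standard library in Ruby 2.0. A minimal sketch of the same "drop anything that is not ASCII" step using String#encode follows. The helper names to_ascii and read_as_ascii are invented for illustration, and the input file is assumed to be UTF-8.

```ruby
# Hypothetical helper: drops every character that cannot be represented
# in ASCII, and any invalid byte sequence, from a UTF-8 string.
def to_ascii(text)
  text.encode('ASCII', :invalid => :replace, :undef => :replace, :replace => '')
end

# Hypothetical helper: reads a UTF-8 text file line by line and returns
# its contents with all non-ASCII characters removed.
def read_as_ascii(path)
  File.foreach(path, :encoding => 'UTF-8').map { |line| to_ascii(line) }.join
end
```

With these, the loop in extract_text could shrink to a single assignment from read_as_ascii(pdf.queued_for_write[:text].path), assuming the extracted text really is UTF-8.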

And that was that.

Walter

>
> See model attached:
>
> Attachments:
> http://www.ruby-forum.com/attachment/7574/doc_file.rb
>
>
> --
> Posted via http://www.ruby-forum.com/.
>
> --
> You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
> To post to this group, send email to rubyonrails-talk@googlegroups.com.
> To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en-US.
>
