Ruby on Rails Saturday, September 12, 2015



On Friday, September 11, 2015 at 7:32:01 PM UTC+1, wbsurfver@yahoo.com wrote:
# I was just at a job interview and was asked to write a function that computes the frequency of each word in a string.
# I wrote something on the board similar to my freq() method below. When I got home I coded this example up as
# an exercise.
#
# I was then asked how I could improve the efficiency of the function. My guess is that it would need to involve multiprocessing.
# I don't think multithreading would help. Offhand I wasn't sure how to answer the question. Multiprocessing via fork is
# not easy because the frequencies have to be combined at the end. That seems to have been the best answer, but I don't know
# how to do it in my simple function.
#

At the moment your inner loop body looks like this:

map[tok] ||= 0
map[tok] += 1

which reads from the hash twice and writes to it once or twice depending on whether the value is already set.

You could instead do:
existing = map[tok] || 0
map[tok] = existing + 1

which always does one hash read and one hash write. You could also use a hash with a default value:

map = Hash.new(0)

and then in your loop, just map[tok] += 1.
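To make the default-value behaviour concrete, here is a minimal sketch (the `counts` name is only for illustration). A plain read of a missing key returns the default without storing anything; only the `+=` assignment creates the entry.

```ruby
counts = Hash.new(0)     # missing keys read as 0

counts["ruby"]           # => 0, and a plain read does not store the key
counts["ruby"] += 1      # expands to counts["ruby"] = counts["ruby"] + 1
counts["ruby"] += 1

counts                   # => {"ruby"=>2}
counts.key?("rails")     # => false, nothing was ever assigned under "rails"
```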

You could also argue that building up an array of words is wasteful: you're allocating a great big array of words, but you don't actually need the array, just one word at a time. You could instead do

@data_str.scan(/\w+/) do |token|
  map[token] += 1
end

In a quick benchmark this is about 15% faster than the original on my machine.
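If you want to measure this yourself, something along these lines should do. It is only a sketch: `freq_array` is my guess at the shape of the original method (which isn't reproduced here), `freq_scan` is the block version, and the input text is made up. Timings will vary with the Ruby version and the input.

```ruby
require "benchmark"

# Hypothetical stand-in for the original: builds the full token array first,
# then counts with the ||= / += pair.
def freq_array(str)
  map = {}
  str.scan(/\w+/).each do |tok|
    map[tok] ||= 0
    map[tok] += 1
  end
  map
end

# Block form of scan with a default-valued hash: no intermediate array.
def freq_scan(str)
  map = Hash.new(0)
  str.scan(/\w+/) { |tok| map[tok] += 1 }
  map
end

# Synthetic input, invented for this sketch.
text = "the quick brown fox jumps over the lazy dog " * 20_000

Benchmark.bm(12) do |bm|
  bm.report("array:")      { freq_array(text) }
  bm.report("scan block:") { freq_scan(text) }
end
```

Both methods return the same counts, so the comparison is like for like.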

Personally I wouldn't consider parallelising this to be something that makes it more efficient, just something that makes it faster: the total amount of work doesn't go down, it is only spread across more cores. Forking should work fine. As you say, you will have to combine the results, but that's not a problem. You might want to look at the map-reduce approach for more examples of this.
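A rough sketch of the fork approach, under a few assumptions: `parallel_freq` and its `workers` parameter are names I've made up, the text is split on line boundaries (so a single very long line gets no parallelism), and fork is available (it isn't on Windows). Each child counts its own slice and sends the hash back over a pipe with Marshal; the parent adds the partial counts together.

```ruby
# Count words in one chunk of text.
def freq(str)
  counts = Hash.new(0)
  str.scan(/\w+/) { |tok| counts[tok] += 1 }
  counts
end

# Hypothetical helper: fork one child per slice of lines, then merge.
def parallel_freq(text, workers = 4)
  lines = text.lines
  return freq(text) if lines.empty?

  slice_size = (lines.size / workers.to_f).ceil

  readers = lines.each_slice(slice_size).map do |slice|
    reader, writer = IO.pipe
    reader.binmode
    writer.binmode

    fork do
      reader.close
      Marshal.dump(freq(slice.join), writer)  # "map" step, in the child
      writer.close
      exit!(0)                                # skip at_exit handlers
    end

    writer.close  # parent keeps only the read end
    reader
  end

  # "reduce" step: add each child's partial counts into one hash.
  totals = Hash.new(0)
  readers.each do |reader|
    partial = Marshal.load(reader.read)
    reader.close
    partial.each { |word, n| totals[word] += n }
  end

  Process.waitall  # reap the children
  totals
end
```

For small inputs the cost of forking and marshalling will likely outweigh any gain, so this is only worth trying on large texts.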

Fred
