Ruby on Rails Thursday, January 26, 2012

On Thu, Jan 26, 2012 at 6:37 PM, Peter Vandenabeele <peter@vandenabeele.com> wrote:
On Thu, Jan 26, 2012 at 6:15 PM, Linus Pettersson <linus.pettersson@gmail.com> wrote:
Hi

I tested to remove the .order(...) part and indeed, the query time goes down to ~100ms. However, it doesn't help to add indices, at least not as I did :)

add_index :categories, :name
add_index :subcategories, :name

Did some more testing and if I keep the .order... but don't join the products table I get a query that runs at about ~55ms. So the bottleneck seems to be the products table.
The query that I'm running looks like this:

Category.eager_load(:subcategories)
             .joins("INNER JOIN products AS p ON resellercategories.id = p.resellercategory_id")
             .order("categories.name ASC, subcategories.name ASC")

(Skipping the gender here...)

What I have is Categories and Subcategories. They are related to each other through a Resellercategories table. Products are related to Resellercategories.
So, the reason that I want to join the products as well is because I only want to show categories and subcategories that actually have some products (there are some empty categories/subcategories still).

So the above query is what we came up with in another thread here in the group.

- Maybe there is a better way to check if a category/subcategory has products without joining the entire products table?


It is possible to add a :counter_cache , but then you need to make sure you
use the proper methods for each product that you add or remove from the
association.

Alternative to the default counter cache (from Rails), you could build your own
logic as in:

* has_male_products
* ...

changing you query to  ...

Category.eager_load(:subcategories).
               where(:has_male_products => true).
               order(...)

Then you would need to set the cache on the appropriate categories in
an after_save on the product you are creating/updating/deactivating/(deleting ?).

Both ideas would probably be faster for querying, but certainly more
complex for making sure that cache is always correct.

Sorry to reply to my own post.


TL;DR  Is there pagination? Then a smaller set may return much faster.


I was thinking over my reply and may have forgotten a fundamental
aspect ... If you say 2200 categories, 8000 products.

How many entries does you query return ?
(replace .add with .count at the end).

How many do you need ?

What happens when you add .limit(20) to your query ?

By which "primary object" do you want to sort and paginate ?
(I will assume 'Product' in the discussion below).

With the includes that are currently implemented, you may
have to redo the whole query into 2 qeuries ... 

1) for fetching the "primary objects"
(e.g. exactly 20 Products, no additional "has_many" data, because
that would increase the number of returned rows for 1 product and
make proper pagination in the database impossible; including
"belongs_to" here is no problem)

2) a second query for fetching eventual "has_many" data
on those 20 "primary products" (is that "Reification" ?)

If the performance problem could be solved by taking the
pagination into account, that would be a _much_ better
solution that building cache columns in this early phase
of your project.

HTH,

Peter

--
You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
To post to this group, send email to rubyonrails-talk@googlegroups.com.
To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.

No comments:

Post a Comment