On Jul 22, 2015, at 9:10 PM, Mohit Sindhwani <mo_mail@onghu.com> wrote:
> We have tried this and the query is quite a bit slower. Filtering to the last 900k records before doing the recorded_on part helped speed it up.
I don't understand how that could possibly be the case if there's an index on recorded_on.
> Your email got me going back to look at all the parts again since obviously the query should be using the index and it was still slow. Further search last night made me realize that it's not the indexes that are a problem. The problem is the count(distinct group_id) part which seems to be quite slow in PostgreSQL. This is a lot faster:
> select count(*) from
> (select distinct group_id from
> data_store_v2 where recorded_on >= '2015-06-06') td;
> than:
> select count(distinct group_id) from
> data_store_v2 where recorded_on >= '2015-06-06';
>
> as explained here: https://www.periscope.io/blog/use-subqueries-to-count-distinct-50x-faster.html
>
> So, I guess the real problem was being masked by something else and an incorrect assumption on my part :)
I would expect the select count(distinct...) to be a major contributor to the time taken by the query, just given the amount of work it must do. The select count(*) from (select distinct...) alternative is a nice tip :)
>
> Thanks for the analysis :D
> I do understand SQL and I thought I'm not throwing things together... for my understanding, which column was unnecessary? I thought we needed all:
> > group_id for counting the distinct group_id
> > recorded_on for the subsequent query on it
> > id only for getting the most recent records
I apologize--I misread the query structure. I got it into my head as:
select ... from (select ... from ... where recorded_on ... order by ... limit ...)
I think you can see how THAT query would have better fit my description of being poorly constructed.
I'm glad that my somewhat off-base pontification still managed to point you in a useful direction!
--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
https://www.linkedin.com/in/scottribe/
(303) 722-0567 voice
--
You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rubyonrails-talk+unsubscribe@googlegroups.com.
To post to this group, send email to rubyonrails-talk@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/rubyonrails-talk/1DD7FF4B-12E3-479B-977F-048C512FF5C1%40elevated-dev.com.
For more options, visit https://groups.google.com/d/optout.
No comments:
Post a Comment