
Hello guys!!
I have a database in which I store public tweets.
The table has lots of columns, including a timestamp with time zone column.
For example:

timestamp                                       |      tweet_id
"2013-07-30 11:14:07.404+03"                        6598dsf4466a
"2013-07-30 11:14:08.028+03"                        87f9dss99s97
...
"2013-08-04 17:36:37.924+03"                        646s4f64a64
"2013-08-04 17:36:38.891+03"                        654asd9f654a

So, I want to build time clusters from these rows.
For example, hourly clusters would split the results into one-hour intervals.

The result that I want is

ids                                          |              hour
{123,654,468,4698,654,..}                                    12
{6s4f6a,ghj,ghjth,654,....}                                  13

or with minute clustering

ids                                                                |              minute
{123,654,468,4698,654,..}                                                            12
{6s4f6a,ghj,ghjth,654,....}                                                          13

What I am doing now is running this query:

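public function getHourClusters ( $specDay )
{
    // Assumes $this->db is a PDO connection. $specDay is bound as a
    // parameter instead of being interpolated into the SQL string.
    $sth = $this->db->prepare("SELECT time, date_part('hour', time) AS hour, text,
        id_str FROM tweet WHERE time >= :specDay
        GROUP BY time, id_str, text
        ORDER BY time");
    $sth->execute(array(':specDay' => $specDay));
    $data = $sth->fetchAll();
    return $data;
}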

And then I manage or "build" the clusters outside the query like this:

// date_part() returns a number, so hour zero arrives as '0', not '00'.
if ( strcmp ( $value['hour'] , '0' ) == 0 )
{
    $hour_0[] = $value['id_str'];
}
if ( strcmp ( $value['hour'] , '1' ) == 0 )
{
    $hour_1[] = $value['id_str'];
}
if ( strcmp ( $value['hour'] , '2' ) == 0 )
{
    $hour_2[] = $value['id_str'];
}
...

And this is so awful, but I can't think of anything better.
I also tried this:

SELECT id_str, time FROM tweet
WHERE int4(date_part('hour', time)) % 2 = 0
AND time >= '2013-07-06'
GROUP BY time, id_str ORDER BY time

But again, this isn't what I want.
Anyway, I hope someone can help me.
Regards
Theodore, Greece :)

Last Post by LastMitch

I have a database and I store public tweets inside.
I have lots of columns, including timestamp with timezone column.

@theodore.xrysohoidis

Have you tried this before clustering the data?

http://www.postgresql.org/docs/9.1/static/sql-analyze.html

The queries you provided won't work. After running ANALYZE, I think the query will work.
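
For the grouping itself, one possible approach is to let PostgreSQL build the clusters instead of doing it in PHP. This is only a sketch: it reuses the table and column names from the question (tweet, time, id_str) and the example date from the second query, and it has not been run against the actual schema. array_agg collects the ids of each group into one array, which matches the "ids | hour" layout asked for.

```sql
-- Hourly clusters: one row per hour of day, with all matching ids in an array.
SELECT date_part('hour', time) AS hour,
       array_agg(id_str ORDER BY time) AS ids
FROM tweet
WHERE time >= '2013-07-06'
GROUP BY date_part('hour', time)
ORDER BY hour;

-- Minute clusters: same idea, grouped by minute of the hour.
SELECT date_part('minute', time) AS minute,
       array_agg(id_str ORDER BY time) AS ids
FROM tweet
WHERE time >= '2013-07-06'
GROUP BY date_part('minute', time)
ORDER BY minute;
```

Note that date_part('hour', time) puts the same hour of different days into one cluster. If each cluster should cover one specific hour of one specific day, grouping by date_trunc('hour', time) is probably closer to what is needed. The hour is computed in the session's time zone, since the column is a timestamp with time zone.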
