4 Contributors · 7 Replies · 68 Views · Discussion Span: 7 Months · Last Post by pty
2

How do Facebook, Twitter, and other social media apps store users' chat history or chat logs?

You don't want to know, and it isn't all public anyway (billions of dollars rely on that technology), so you can't replicate it. Facebook, for instance, uses a customized combination of HBase, Hadoop and ZooKeeper. Companies of that size are far too large to be considered an example.

I see you tagged this thread with mysql, which might be an option, but also consider NoSQL. HBase is a column-oriented store, as is Cassandra. If you want to store user relations or other types of relations (for instance, user X liked post Y, or user X added user Y as a friend), you can use a graph database like Neo4J.

As it so happens, you could also integrate with the Dazah network (the network you signed up with for DaniWeb). It powers DaniWeb as well, and was also made by Dani. She will no doubt help you out with that if you have any questions on how to get that set up.

2

Dazah is a chat API and I would like nothing more than for you to give it a whirl. You can check it out here: https://www.dazah.com/developers

I originally created Dazah with MySQL because it's what I knew, and I felt it would be the quickest thing I could get up and running with the skills I already had. However, as I attempted to scale it, I became increasingly worried it wasn't the right tool for the job.

My biggest challenge was adapting MySQL to the concept of user nodes, metadata attached to those nodes, relationships between user nodes to determine who is in a conversation with whom, etc.
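For readers trying to picture that schema, here is a minimal sketch of user nodes, per-node metadata, and conversation membership as relational tables. It is not Dazah's actual schema: every table and column name below is hypothetical, and SQLite (via Python's standard library) stands in for MySQL so the example runs without a server.

```python
import sqlite3

def build_schema(conn):
    """Create hypothetical tables for users, their metadata, and conversations."""
    conn.executescript("""
        CREATE TABLE users (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        -- Arbitrary key/value metadata attached to a user "node".
        CREATE TABLE user_meta (
            user_id    INTEGER NOT NULL REFERENCES users(id),
            meta_key   TEXT NOT NULL,
            meta_value TEXT,
            PRIMARY KEY (user_id, meta_key)
        );
        CREATE TABLE conversations (
            id INTEGER PRIMARY KEY
        );
        -- Who is in which conversation (the "relationship" table).
        CREATE TABLE participants (
            conversation_id INTEGER NOT NULL REFERENCES conversations(id),
            user_id         INTEGER NOT NULL REFERENCES users(id),
            PRIMARY KEY (conversation_id, user_id)
        );
        CREATE INDEX participants_by_user ON participants (user_id);
    """)

def conversation_partners(conn, user_id):
    """Return the names of everyone sharing at least one conversation with user_id."""
    rows = conn.execute("""
        SELECT DISTINCT u.name
        FROM participants AS mine
        JOIN participants AS theirs
          ON theirs.conversation_id = mine.conversation_id
         AND theirs.user_id <> mine.user_id
        JOIN users AS u ON u.id = theirs.user_id
        WHERE mine.user_id = ?
        ORDER BY u.name
    """, (user_id,)).fetchall()
    return [name for (name,) in rows]

# Made-up sample data, purely for illustration.
conn = sqlite3.connect(":memory:")
build_schema(conn)
conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)",
                 [(1, "alice"), (2, "bob"), (3, "carol"), (4, "dave")])
conn.execute("INSERT INTO user_meta VALUES (1, 'timezone', 'UTC')")
conn.executemany("INSERT INTO conversations (id) VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO participants VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 1), (2, 3), (3, 2), (3, 4)])

print(conversation_partners(conn, 1))
```

The self-join on the membership table is the relational counterpart of following a relationship between two nodes. It stays cheap as long as both sides of the join are indexed.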

My friend recommended Neo4j to me because, as a graph-based database, where you can create nodes, properties for nodes, and relationships between nodes, it was the super ideal fit for my schema. However, after working for an ENTIRE VERY LONG DAY with a friend of a friend (who has now graduated to just friend status) who is very experienced with Neo4j in the world of big data (isn't it awesome living in Silicon Valley?!), we together determined that MySQL was the best fit after all.

What it boiled down to was that I was doing so many performance hacks in MySQL that it felt like the wrong tool for the job, because my queries were getting increasingly "hacky" IMHO. It constantly felt like I was trying to fit a square peg into a round hole. Couple that with the fact that I've only ever coded for myself for DaniWeb, and I am always self-conscious that I'm not doing things the "right" way. It took someone experienced with enterprise big data to look at my queries and basically tell me that Facebook doesn't do anything different from the way I'm doing it. Needless to say, I was shocked. We took some of the most "hacky" queries I had and converted them to Neo4j, and my MySQL queries were finishing in less time than it takes just to establish a Neo4j connection!! Apparently I was unaware of the super significant overhead that Neo4j requires, making it not ideal for scaling realtime applications such as chat.

0

First of all, @jeffersonalomia, you probably don't want to use Neo4J. I still strongly suggest trying out Dazah.

@Dani,

I already mentioned the Facebook bit to you so I won't repeat that. Regarding Neo4J, however, there is something I'd like to add.

not ideal for scaling realtime applications such as chat

It's highly scalable for real time applications. Walmart uses it for real time product recommendations, as does Adidas. Global 500 uses it for real time routing, as does eBay.

Recently this Stack Overflow user posted his conclusion after a speed comparison between MySQL and Neo4J. The final verdict: given 100K nodes / 10M relations, a recursive query going down 4 levels of relationship took 40s in Neo4J and 24s in MySQL. Sounds like a solid experiment, and if you look around the Neo4J Google Group you'll discover it was actually written as part of a master's thesis and that it wasn't quite the result he was expecting.

Luckily for his hypothesis, a good reply came in shortly after. With some query optimization and the use of Roaring bitmaps the Neo4J performance had slightly improved: 2.7s, one laptop, one thread.

Twice as slow became ten times faster.
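For anyone unsure what "a recursive query going down 4 levels of relationship" means on the relational side, here is a rough sketch using a recursive common table expression. It is not the query from that benchmark: the `edges` table and the `reachable_within` helper are made up for illustration, and SQLite stands in for MySQL (MySQL 8.0 and later also accept `WITH RECURSIVE`).

```python
import sqlite3

def reachable_within(conn, start, max_depth):
    """Map each node reachable from start in at most max_depth hops to its hop count."""
    rows = conn.execute("""
        WITH RECURSIVE walk(node, depth) AS (
            SELECT ?, 0
            UNION                       -- UNION (not UNION ALL) drops repeated rows
            SELECT e.dst, walk.depth + 1
            FROM walk
            JOIN edges AS e ON e.src = walk.node
            WHERE walk.depth < ?        -- the depth cap also guarantees termination on cycles
        )
        SELECT node, MIN(depth)
        FROM walk
        WHERE node <> ?
        GROUP BY node
        ORDER BY node
    """, (start, max_depth, start)).fetchall()
    return dict(rows)

# A tiny made-up directed graph: a chain 1..7, a shortcut 1->3, and a cycle 3->1.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src INTEGER NOT NULL, dst INTEGER NOT NULL)")
conn.execute("CREATE INDEX edges_by_src ON edges (src)")
conn.executemany("INSERT INTO edges VALUES (?, ?)",
                 [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (1, 3), (3, 1)])

print(reachable_within(conn, 1, 4))
```

Each extra level multiplies the rows the database has to join, which is why this kind of traversal is a common way to compare the two systems.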

This doesn't mean Neo4J is automatically suited for a chatting application. It does, however, handle graph data better than a relational DBMS; after all, that's what it's designed for. To put it in perspective, MySQL predates Neo4J only by a decade.

Long story short, if MySQL is quicker than Neo4J in your case, keep using it. But just because you can't connect to it as fast as you can to MySQL doesn't mean others have that same problem, or that it's slower than MySQL and unsuited for real time applications.

*takes a chill pill*

I am always self-conscious that I'm not doing things the "right" way (...) Couple that with the fact that I've only ever coded for myself

If you weren't, it would mean you stopped learning. As long as that feeling stays you'll keep improving. And as far as coding for yourself goes... you wouldn't be the first one to do a better job than the "professionals".

0

Long story short, if MySQL is quicker than Neo4J in your case, keep using it. But just because you can't connect to it as fast as you can to MySQL doesn't mean others have that same problem, or that it's slower than MySQL and unsuited for real time applications.

Yes, my experience with Neo4j is limited to about a week. However, the amount of time it took me to connect to Neo4j was just under the amount of time it took me to connect to and execute one of my more complicated MySQL queries. So I decided not to pursue that route because, although conceptually it made more sense for the schema, it also came down to the fact that I would still need to be using MySQL for the realtime chat, as it wasn't performant to have all of the messages be individual nodes, and messages made more sense in a relational db. It was really the matching stuff and figuring out degrees of separation that I wanted to do with Neo4j.

Also, I don't mean to imply Neo4j isn't suited to all realtime applications. It just didn't make sense for chat in my case.
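To make the "messages as rows rather than nodes" point concrete, here is a minimal sketch of a chat-history table with an index that serves the typical "latest N messages in this conversation" lookup. The table layout and the `post_message` / `recent_history` helpers are assumptions for illustration, not Dazah's implementation, and SQLite again stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE messages (
        id              INTEGER PRIMARY KEY AUTOINCREMENT,  -- doubles as send order
        conversation_id INTEGER NOT NULL,
        sender_id       INTEGER NOT NULL,
        body            TEXT NOT NULL
    );
    -- Lets the history query walk one conversation in id order without a full scan.
    CREATE INDEX messages_by_conversation ON messages (conversation_id, id);
""")

def post_message(conn, conversation_id, sender_id, body):
    """Append one message and return its id."""
    cur = conn.execute(
        "INSERT INTO messages (conversation_id, sender_id, body) VALUES (?, ?, ?)",
        (conversation_id, sender_id, body),
    )
    return cur.lastrowid

def recent_history(conn, conversation_id, limit=50):
    """Return up to `limit` newest messages of a conversation, oldest first."""
    rows = conn.execute("""
        SELECT sender_id, body
        FROM messages
        WHERE conversation_id = ?
        ORDER BY id DESC
        LIMIT ?
    """, (conversation_id, limit)).fetchall()
    return rows[::-1]

post_message(conn, 1, 1, "hello")
post_message(conn, 1, 2, "hi")
post_message(conn, 2, 3, "unrelated conversation")
post_message(conn, 1, 1, "how are you?")

print(recent_history(conn, 1, limit=2))
```

A message is just an append to one table here, so the write path never touches whatever structure holds the social graph.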

1

A graph database alone isn't the right choice for a chat application. A message would be represented as an edge between two nodes (people) and too many edges will hurt performance.

Storing the actual social network portion of a chat app in one makes a lot of sense, though. It's the perfect use case for a graph database.

Something like OrientDB might make more sense. There's even a chat program in the use cases section of the documentation.

1

Storing the actual social network portion of a chat app in one makes a lot of sense, though. It's the perfect use case for a graph database.

Yes, I realize that storing messages in a graph database would not be ideal. For my use case, I was trying to store who has an existing conversation with whom, who is included in which audience segment, and which audience segments are included in the current Dazah API access token's "app bubble". Therefore, a single Dazah app would be able to find all of the conversations I'm in, the people I know who I'm not yet in conversations with, and degrees of separation, with other users accessible within the current application's scope.

0

Yes, that sounds like a good use case and it's quite easy to set up a reasonable demo in Neo4J.

CREATE
    // Members; Claire and Dolly also carry the Admin label
    (Frank:Member {name: 'Frank'}),
    (Julian:Member {name: 'Julian'}),
    (Amanda:Member {name: 'Amanda'}),
    (Claire:Member:Admin {name: 'Claire'}),
    (Susie:Member {name: 'Susie'}),
    (Dolly:Member:Admin {name: 'Dolly'}),

    // Relationships between members
    (Frank)-[:HAS_CHATTED_TO]->(Amanda),
    (Claire)-[:HAS_CHATTED_TO]->(Frank),
    (Amanda)-[:HAS_CHATTED_TO]->(Susie),
    (Susie)-[:HAS_NOT_CHATTED_TO]->(Dolly),
    (Dolly)-[:HAS_NOT_CHATTED_TO]->(Frank),
    (Frank)-[:IGNORES]->(Susie),
    (Dolly)-[:IGNORES]->(Julian),
    (Julian)-[:HAS_CHATTED_TO]->(Amanda)

Which results in this graph:

[Image: neo4j graph]

Simple queries are straightforward. I'm no expert, though, and combining multiple conditions baffles me.

match (n:Member)-[r:HAS_CHATTED_TO]-(m) where n.name='Frank' return n as Frank,r,m

Results:

+---------------------------------------------------------------------------+
| Frank                  | r                      | m                       |
+---------------------------------------------------------------------------+
| Node[25]{name:"Frank"} | :HAS_CHATTED_TO[24] {} | Node[27]{name:"Amanda"} |
| Node[25]{name:"Frank"} | :HAS_CHATTED_TO[25] {} | Node[28]{name:"Claire"} |
+---------------------------------------------------------------------------+
2 rows
17 ms

Compiler CYPHER 3.1

Planner COST

Runtime INTERPRETED

+------------------+----------------+------+---------+------------------+----------------------------+
| Operator         | Estimated Rows | Rows | DB Hits | Variables        | Other                      |
+------------------+----------------+------+---------+------------------+----------------------------+
| +ProduceResults  |              1 |    2 |       0 | Frank, m, r      | Frank, r, m                |
| |                +----------------+------+---------+------------------+----------------------------+
| +Projection      |              1 |    2 |       0 | Frank -- m, n, r | {Frank : n, r : r, m : m}  |
| |                +----------------+------+---------+------------------+----------------------------+
| +Expand(All)     |              1 |    2 |       3 | m, r -- n        | (n)-[r:HAS_CHATTED_TO]-(m) |
| |                +----------------+------+---------+------------------+----------------------------+
| +Filter          |              1 |    1 |       6 | n                | n.name == {  AUTOSTRING0}  |
| |                +----------------+------+---------+------------------+----------------------------+
| +NodeByLabelScan |              6 |    6 |       7 | n                | :Member                    |
+------------------+----------------+------+---------+------------------+----------------------------+

Total database accesses: 16
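As a cross-check that needs no database, the same HAS_CHATTED_TO relationships can be held in a plain Python adjacency map, with degrees of separation computed by a breadth-first search. This is only a sketch of the idea: the helper names are made up, the edges are treated as undirected (as in the MATCH above), and an in-memory dictionary is obviously not a substitute for a graph database at scale.

```python
from collections import deque

# The HAS_CHATTED_TO relationships from the demo data above.
CHATTED = [
    ("Frank", "Amanda"),
    ("Claire", "Frank"),
    ("Amanda", "Susie"),
    ("Julian", "Amanda"),
]

def build_adjacency(pairs):
    """Build an undirected adjacency map from (a, b) pairs."""
    adjacency = {}
    for a, b in pairs:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency

def degrees_of_separation(adjacency, start):
    """Breadth-first search: map every reachable member to its hop count from start."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in adjacency.get(current, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[current] + 1
                queue.append(neighbour)
    del distance[start]
    return distance

adjacency = build_adjacency(CHATTED)
print(sorted(adjacency["Frank"]))   # ['Amanda', 'Claire'], the same two rows as the query result
print(degrees_of_separation(adjacency, "Frank"))
```

Dolly never appears in the result because she has no HAS_CHATTED_TO relationship with anyone, so she is unreachable from Frank along those edges.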
