Discussion:
Apache Cassandra - Question about data model
Lior Menashe
2015-12-31 13:36:12 UTC
Hi,

I got your address from the #cassandra channel on the web chat, because I
couldn't get an answer there.

I have a question and would be glad if you could help me or point me in the
right direction.

I have an activity feed like the one on Instagram. When a user
(let's say UserA) opens his page, he can see all the activities that are
related to him,
for example: user B liked your post, user C commented on your post, etc.

The Cassandra data model that I thought about is:

userID UUID (partition key)
datetimeadded timestamp (clustering column DESC)
userID_Name text
userID_Picture_URL text
userID_From UUID (this is userB from the example)
userID_From_Name text
userID_From_Picture_URL text

With this structure I can get the different activities for a user, and it
works just fine. My problem is that userID_From can change his name and his
picture, and I need this data to be updated everywhere it is copied,
because I want to show the current values.

The problem is that the update requires a table scan, which is not efficient.
Should I store only the ID and, every time I select a slice of the data
and get several IDs, run another query to fetch
the users' names and picture paths? Should I do something else?

Best regards,
Lior
Matthias Eichstaedt
2015-12-31 15:58:35 UTC
Hi Lior,
How about something like this, where you move the user fields into a
separate USER_TABLE:

FEED_TABLE
userID UUID (partition key)
datetimeadded timestamp (clustering column DESC)
userID_From UUID (this is userB from the example)

USER_TABLE
userID UUID (partition key)
userID_Name text
userID_Picture_URL text

This costs you an extra query, but you can change the name and picture in
one place.
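
A minimal sketch of that two-step read, assuming plain Python dicts as
stand-ins for FEED_TABLE and USER_TABLE (no Cassandra client is involved, and
the function name read_feed and the sample data are made up for illustration).
It only shows the shape of the idea: slice the feed first, then resolve each
distinct userID_From against the user table.

```python
from datetime import datetime
from uuid import uuid4

# In-memory stand-ins for the two tables (hypothetical data, not a real driver):
#   feed_table: userID -> list of (datetimeadded, userID_From), newest first
#   user_table: userID -> {"userID_Name": ..., "userID_Picture_URL": ...}


def read_feed(feed_table, user_table, user_id, limit=30):
    """Return the newest `limit` activities with the actors' current profiles."""
    # Query 1: one slice of the feed partition for this user.
    rows = feed_table.get(user_id, [])[:limit]
    # Query 2: look up every distinct actor once in the user table.
    actor_ids = {actor_id for _, actor_id in rows}
    profiles = {actor_id: user_table[actor_id] for actor_id in actor_ids}
    return [
        {"datetimeadded": ts, "userID_From": actor_id, **profiles[actor_id]}
        for ts, actor_id in rows
    ]


user_a, user_b = uuid4(), uuid4()
user_table = {
    user_b: {"userID_Name": "B", "userID_Picture_URL": "http://example.com/b.png"}
}
feed_table = {user_a: [(datetime(2015, 12, 31, 13, 0), user_b)]}

# Renaming user B touches one user_table entry; the feed rows are not rewritten.
user_table[user_b]["userID_Name"] = "B renamed"
feed = read_feed(feed_table, user_table, user_a)
```

After the rename, the feed entry for user A carries the new name even though
nothing in feed_table was changed.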

Matthias
Lior Menashe
2015-12-31 18:20:22 UTC
Hi Matthias,

Thanks for your answer.
According to what you've written, if I select the first 30 rows from
the feed table for a user,
I'll need to perform up to 30 more queries against the user table in order
to get the users' data.
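
To make the query count concrete, here is a rough sketch that builds the
user-table lookups for one page of feed rows. The helper name
profile_queries, the %s placeholder style, and the sample IDs are assumptions
for illustration; the table and column names are the ones from the proposal.
It produces one statement per distinct userID_From (so 30 is the worst case),
plus a single statement using IN on the partition key, which CQL accepts.
Whether the IN form is actually cheaper than separate single-partition reads
depends on the cluster and is not something this sketch measures.

```python
SELECT_USER = (
    "SELECT userID, userID_Name, userID_Picture_URL "
    "FROM USER_TABLE WHERE userID = %s"
)


def profile_queries(feed_rows):
    """Build the USER_TABLE lookups needed for one page of feed rows."""
    # Drop repeated userID_From values while keeping first-seen order.
    distinct_ids = list(dict.fromkeys(row["userID_From"] for row in feed_rows))
    # Option 1: one single-partition query per distinct user.
    per_user = [(SELECT_USER, (uid,)) for uid in distinct_ids]
    # Option 2: one query with IN on the partition key (None for an empty page).
    combined = None
    if distinct_ids:
        placeholders = ", ".join(["%s"] * len(distinct_ids))
        combined = (
            "SELECT userID, userID_Name, userID_Picture_URL "
            f"FROM USER_TABLE WHERE userID IN ({placeholders})",
            tuple(distinct_ids),
        )
    return per_user, combined


# Hypothetical page: 30 activities that come from only 3 different users.
page = [{"userID_From": f"user-{i % 3}"} for i in range(30)]
per_user, combined = profile_queries(page)
```

For this sample page only 3 lookups are needed, not 30, because the same
users appear repeatedly in the feed.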

Wouldn't it be better to use Cassandra for the feed and some SQL server to
get the users' data in one query?

BR,
Lior



Jack Krupansky
2015-12-31 17:15:45 UTC
It's best to ask usage and data modeling questions on the user email list -
this list is the dev list, for development of Cassandra itself, not for
development of applications.

See:
http://cassandra.apache.org/


-- Jack Krupansky