Can anyone give me hint how to develop the suffix array part ? i know the concept, LCP array design but i am not gettinng how to implement in C ? can anyone please help. I know the uses, algo of suffix array as i have read a lot on it. i want the implemetaion hint of the part in which i have to sort the suffixes of an string.

for ex: if a string is given banana, then

Data structure should be like this:($ -> nmoenic)


then, after keeping i need to sort it. means lowest substring should be at the top most point. So how to do that ? string can be of large length. How to do this thing ? can you please give hint or link ? i have tried and now thinking from your help. thanks in advance.

please somebosy reply here. atleasy one must reply.

atleasy one must reply.

Why? Replying to threads is strictly voluntary. Just because you really want an answer doesn't obligate anyone to provide one.

@decpetikon sorry sir! but what to do ? Nobody was replying and i was eager to get a reply. Can you please tell a basic simple thing in this question ? i just need to ask how can i save the string in C ? i will be thankful to you if u can help me. thanks

but what to do ? Nobody was replying and i was eager to get a reply.

My first instinct is to say don't put all your eggs in one basket. It's unreasonable to expect that Daniweb will have all of your answers, so while waiting for a reply you should be doing as much research as possible in an attempt to answer your own question.

i just need to ask how can i save the string in C ?

It looks to me like you were asking how to write a suffix array based sort, which isn't exactly trivial. Might I suggest looking at existing programs that do what you want? I have no doubt you can find at least one on google.

commented: quite well answered! +0

means lowest substring should be at the top most point. So how to do that ? string can be of large length. How to do this thing ? can you please give hint or link ?

So you want to know how to sort String in the way to use in Suffix array?

After looking at wikipedia, string is compared by its char position. The string inside the array should follow rules as follows:

1)A string whose character at the same position comes first will be at a lower index (i.e. "ana" must come before "nana" because the character at position 0 (0-index) of "ana" which is "a" comes before "n" from "nana")
2)If both strings starts with the same characters, the shorter one will have a lower index (i.e. "ana" must come before "anana" even though they both start with "ana" but the "ana" is shorter than "anana")

Because it is a suffix array, each element will never be the same length, thus there will never be equal value in comparison.

@taywin yes, i have read the link very nicely before your post. i have read around 10 articles on it. but i wana ask that should i use a 2-D array for storing this types of different strings which i think is not the solution. Can you suggest me any tricky way to do this ? like something related to pointer? (because many strings problems solves with pointer, but in this case i am not getting how!) thanks

Hmm... Not sure why you would use 2d array to solve the sorting? Could you simply use two of 1d arrays (one for original and the other for sorted) and apply any sort algorithm to the sorted one? What you would need to customize is the string comparison part. Unless, of course, you want to optimize the way it work, then you may want a new data structure which could be pointers. However, I would implement a tree-liked structure instead to handle the data.

can u elborate more ? i am not going to use 2-D array here

Before answering you, I need to know what would your first thought about using 2d array be? Are you going to use one dimension as index value and the other as value of the index (as in the wikipedia suffix & i columns)? I want to ensure about how your understanding to the suffix array.

Does the suffix array need to handle only one word? Is the amount of word to represent by the suffix tree not bound? Your example seems trivial, meaning that you can simply just use std::string to hold a string data and print out the suffix tree by for loops. How about a general problem statement or the actual problem statement

no, like in the string banana, i am thinking that i will make a 2-D array and in each row i will store it's one sub-string and then sort. got it ? but i am thinking that if i do that, then 2-D array will go out of bounds. so was asking for your help that how are you thinking this problem ? and which Ds you are using to store the strings before sorting them ? thanks

@first person string is of max length:1,00,000. problem statement is that i need to find the different sub-string in a string. like banana here. and i am using suffix array to do that. and suffix array need sorting of its sub-strings which i have prvided in the first post. got it ? now, my question is that how to store those strings and then i need to sort then also. i so coding in pure C. can you help a little bit ? thanks

If I correctly understood what you wanted, your purpose is to be able to quickly find a given substring in a very long string if found.

This is quite a trivia problem, but would help you understand how suffix array works. Anyway, you do not need 2d array for the long string at all. What you need is one 1d String array. Now, you could either use a size of n or n+1 where n is the length of the string. The reason is that the value of index can be used in the position finding in suffix array. If you use size of n, when you deal with the position, return the value of the index you found plus 1. If you use n+1, you simply ignore the item at index 0 of the array and work your way starting from index 1.

You need to create the array by chopping each string in a loop...

i <- 0
array <- an array size of n  // use array size of n
while i less than the length of the string
  array[i] <- substring from 0 to i+1
end while

You now use the array to sort in any kind of sorting algorithm... Below is an example using comb sort algorithm.

array   // the array created with all substrings of the desired string
i <- 0
gap <- array_size
swapped <- false;

while gap is greater than 1 || swapped
  if gap greater than 1
    gap <- integer value of (gap/1.247330950103979)
  end if

  swapped = false;

  for i<-0; i+gap < array_size; i=i+1
    if compareString(array[i], array[i+gap]) > 0
      swap values of array[i] and array[i+gap]
      swapped <- true;
    end if
  end for
end while

compareString(string1, string2)
  // follow the rule I mention above.
  // do not use the short cut of comparing incoming string sizes
  // but rather compare each charater from the head to tail
  // and return a proper value (1, 0, or -1 but 0 won't ever be returned though)

no, How exactly you are dealing with the pronlem i have asked you ? I mean Can't i sort the strings just by using qsort() function ? if yes, then what type of array will i send in the function ? and if no, then how exaclty you are storing the string so as to sort them and then make LCP array. I hope you are corerctly getting what i am trying to ask you. are you getting what exactly i am trying to ask ? just read the first post once, and let me tell u what exactly i want.

I input a string. okay ? then i have to make a LCP array(which i know) and suffix array(it is a part in making full siffix array algorithm) okay ? now, when i input a string, then i need to have a sorted version of all suffices of a string which i input as given in the first post of banana. okay ?

My approach: i wana use qsort() function defined in C already so as the sort all the sub-strings which i have got after cutting the big string like i have shown in the first post. okay ? now, my question is how to store and then sort the sub-string which i have got above after cutting, and after sorting i will make a LCP array which i know how to make that.

Basic question: this question is that find all the possible different sub-string in the given substring like banana. thanks.

Excuse me, who are here asking for help? If you want to be stubborn, I guess I can't help you, Okey?

Basic question: this question is that find all the possible different sub-string in the given substring like banana. thanks.

That's what I understand. However, you stick to built-in sort function and don't want to customize a sort function yourself. As a result, you cannot get it to work because traditional string sort does not work. Let's see... How about look at this c qsort() example for a change. What you need to modify is the int cstring_cmp(const void *a, const void *b) function. You need to customize it in the way that it should compare 2 strings with customized rules as I mention already. The current implementation in the code is a tranditional string comparison using strcmp() function which won't work in your case. That's what I am trying to tell you -- customization. Any kind of sorting is fine. Even though any sort algorithm works, I used comb sort because it is easy to implement and won't use more memory to store data while sorting because you said that your original string could be very long.

@nitin1: Be kind, we're taking our time to help you. We could be going on with other matters, but instead we realize that we were also like you at one point, and thus (for me at least) the feeling of empathy and wanting to help is the reason I am help you. I can choose not to and so can others. So before anything, get you attitude in order. 

Now instead of writing the code, I'm going to give you an idea on how to start, say we want to find all suffix of the string "bananna", well one way to achieve that is to find all suffix from position 0, the go on and find all suffix from position 1, and so on. Here is an instance example,

startIndex = 0:

b   -> (0,1)   #starting and ending position in the string 'bananna'
ba  -> (0,2)   #starting and ending position in the string 'bananna'
ban -> (0,3)   #starting and ending position in the string 'bananna'
bananna -> (0,7)   #starting and ending position in the string 'bananna'

startIndex = 1:

a -> (1,2)   #starting and ending position in the string 'bananna'
an -> (1,3)   #starting and ending position in the string 'bananna'
ana -> (1,4)   #starting and ending position in the string 'bananna'
ananna -> (1,7)   #starting and ending position in the string 'bananna'

and so on, you get the drift. After you have that, you can worry about sorting later.