Adak 419 Nearly a Posting Virtuoso

I can't tell by looking at your code, what's wrong. My suggestion is you alter your code temporarily, and have it put out EXACTLY the same pic.

Now you can use utilities like "fc /b filename filename" (but check it with "fc /?", because I haven't used it in a long time), to do a binary comparison of the files and see where the differences are - and thus find your error. (Linux has similar tools).

Looking at an altered image, it's very hard to see the exact pixels that are wrong. Looking at two versions of the SAME image, when magnified with something like Windows Paint, or any image editing program, it becomes a LOT easier.

Adak 419 Nearly a Posting Virtuoso

No, the code is not correct, and it's not a Windows problem. Listen!

You need to add the ampersand for your scanf()'s where the variable is not a pointer (or the name of an array).

printf("\n\t\tEnter SSS contribution: P");
scanf(" %d",sss);             //add an ampersand: &sss
printf("\n\t\tEnter PAG IBIG contribution: P");
scanf(" %f",pagibig);        //add an ampersand: &pagibig
printf("\n\t\tEnter Philhealth contribution: P");
scanf(" %f",phl);            //add an ampersand: &phl

I did not check all your code for problems, or for missing ampersands - you need to do that.

Adak 419 Nearly a Posting Virtuoso

Why not read through the subjects of other projects, and see if that doesn't spark a bit of interest in you. Imo, it has to be something you're interested in.

Also, read through the science news (Google it), and again, see what catches your eye.

Driverless cars? A few are licensed and on our roads, in the US.

Navigation ("GPS") in space using pulsars is being studied by the Max Planck Institute in Germany. Seems we currently can't position our satellites very well, once they get away from Earth.

Argon-Argon Dating is in the news on the BBC website. Best date yet for the extinction of the dinosaurs, and the first placental mammal (non-egg laying).

A plethora of distributed computing projects AND illegal botnets (Symantec and Microsoft just took down a HUGE botnet). The good and the bad they do.

Ex-President Bush just had his email hacked into - interested in security?

You'll find lots of stuff if you just search on-line. I don't think anyone can direct you to a subject however - it has to be something you're interested in, curious about, etc.

Good luck! ;)

Adak 419 Nearly a Posting Virtuoso

The only thing the program is timing is the run time of the issorted(), being called over and over.

And we have no code for issorted() here.

You might be able to optimize the sorting function, but not without seeing the data being sorted. Qsort is not the fastest sorter, but it's no slouch either. Its claim to fame is that it will sort ANYTHING since it works with void pointers, without any changes to the underlying code.

qsort can be beat - but probably not worth it if you want a sorting function that will handle void pointers.

Adak 419 Nearly a Posting Virtuoso

Interesting. Stanford and MIT (among several others), also have computer classes/lessons on-line.

There are several other "problem/challenge" computer programming sites also:

These are the two problems I've helped with:
Codechef: http://www.codechef.com/BTCD2012/problems/DOORS
SPOJ: http://www.spoj.com/problems/TDKPRIME/

And the funniest of them all:
http://www.ioccc.org/

hilarious! ;) Park your C code sanity, at the door!

Adak 419 Nearly a Posting Virtuoso

You could start here:
http://www.daniweb.com/software-development/computer-science/threads/13488/time-complexity-of-algorithm

Wikipedia has a couple pages on it. Start here, and follow the links to Big O Notation:
http://en.wikipedia.org/wiki/Analysis_of_algorithms

Eratosthenes was an interesting Greek - brilliant and well balanced in several sciences, he became known as "second" when he entered several math/science contests and came in second place. He was the head librarian in the famous Alexandria library, when it had the world's greatest knowledge resource.

He got the (very) last laugh it seems, since his Sieve for finding primes can be optimized easily, and runs in first place (fastest time), for all the ints you can fit into an array, without causing swapping to HD.
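For anyone who hasn't seen the Sieve in code, here is a minimal sketch of the plain (not optimized) version. The function name count_primes_below is made up for this example, and it assumes the whole array fits comfortably in memory, as mentioned above.

```c
#include <stdlib.h>

/* Count the primes below `limit` with a plain Sieve of Eratosthenes.
   Returns -1 if the working array cannot be allocated. */
long count_primes_below(size_t limit)
{
    if (limit < 3)
        return 0;                       /* there are no primes below 2 */

    unsigned char *composite = calloc(limit, 1);   /* 0 = still a candidate */
    if (!composite)
        return -1;

    long count = 0;
    for (size_t i = 2; i < limit; i++) {
        if (composite[i])
            continue;                   /* crossed off earlier, not prime */
        count++;
        /* Start crossing off at i*i; smaller multiples of i were already
           handled by smaller primes. The test avoids overflow in i*i. */
        if (i <= limit / i) {
            for (size_t j = i * i; j < limit; j += i)
                composite[j] = 1;
        }
    }
    free(composite);
    return count;
}
```

Calling count_primes_below(100) counts the primes from 2 through 97.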

Google up "Project Euler" for a series of computer/math challenges.

Adak 419 Nearly a Posting Virtuoso

There are WAY too many algorithms to remember even the most important ones. Because depending on what the program is doing, LOTS of them are very important.

I'm a hobbyist, and never took anything beyond the first semester of C, but I've got a FEW years of hobby experience with it, so:

I try to match the speed (complexity) of the algorithm, with the job. No sense trying to optimize a telephone directory sort by name, when it's your own personal directory, with 100 names or less. On the other hand, if it's the county/district directory, with 5 Million names - yeah, that's worth a very fast algorithm.

Fast algorithms I like:
Insertion sort - for almost sorted arrays, and all small ones (less than 1,000, say). Oddly, it's faster than anything else for sorting small and almost sorted arrays.

Except of course, for the King of Fast:
Counting sort - limited use, but BLAZING UNMATCHED speed, if it can be used. (which is rare)

Quicksort - with Insertion sort on small sub-arrays to optimize it, if it REALLY needs the best speed. Pick first + last value of each sub-array/2 for the pivot. Uses minimal amount of extra memory. You hear bad mouthing about it - well, not if you pick the pivot this way.

Merge sort - for large external data that can't all fit in memory.

Binary Search - great general purpose searcher of sorted data. Don't use it for tiny jobs, but ALL the …
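To show why Counting sort is so quick when it can be used, here is a minimal sketch. It assumes every value is a small non-negative int in the range 0..max_value, which is exactly the limitation that makes it rare. The name counting_sort and its parameters are made up for the example.

```c
#include <stdlib.h>

/* Sort n ints in place. Every value is assumed to lie in 0..max_value.
   Returns 0 on success, -1 on allocation failure or an out-of-range value. */
int counting_sort(int *a, size_t n, int max_value)
{
    if (max_value < 0)
        return -1;

    size_t *count = calloc((size_t)max_value + 1, sizeof *count);
    if (!count)
        return -1;

    /* Pass 1: tally how many times each value occurs. */
    for (size_t i = 0; i < n; i++) {
        if (a[i] < 0 || a[i] > max_value) {
            free(count);
            return -1;
        }
        count[a[i]]++;
    }

    /* Pass 2: write each value back as many times as it was seen. */
    size_t out = 0;
    for (size_t v = 0; v <= (size_t)max_value; v++) {
        for (size_t c = count[v]; c > 0; c--)
            a[out++] = (int)v;
    }

    free(count);
    return 0;
}
```

There are no comparisons between elements at all, just two passes, which is where the speed comes from.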

tux4life commented: Nice list ;) +13

Adak 419 Nearly a Posting Virtuoso

Thinking more about your post, I'm not sure whether you wanted a more optimal program, or a simpler - clearer - program. If it's simpler and clearer, or for a small assignment/job, then I would use strstr(), (part of the string.h include file definitions), in a loop:

#include <stdio.h>
#include <string.h>

int main(void) {
   int n,len;
   char text[273]={"I am the eagle, I live in high country, in rocky\n"
   "cathedrals that reach to the sky. I am the hawk,\n"
   "and there's blood on my feathers, but time is still turning,\n"
   "they soon will be dry. All those who see me, and all\n"
   "who believe in me, share in the freedom I feel when I fly."};

   char target[273];
   char *pch=text;
   printf("\nOur text is: \n%s\n\nEnter the string you want to search for: ",text);
   fflush(stdout);
   if(scanf("%272[^\n]",target)!=1) //take all input up to a newline, 272 chars at most
      return 1;                     //nothing was entered
   printf("\nOur target to search for is: %s\n\n",target);
   len=strlen(target);

   n=0;
   while(len>0 && (pch=strstr(pch, target))!=NULL) { //stop when there is no further match
      printf("%s\n\n",pch);
      ++n;
      pch+=len;   //step past this match before searching again
   }

   printf("\nTarget was found: %d times\n",n);
   return 0;
}

But this program has a problem. Try searching for:
in rocky cathedrals that reach to the sky.

And it can't find it. The problem is the newline - the target doesn't have one, right after "rocky", and the text phrase, does have one.

So you'd have to search for:
in rocky
cathedrals that reach to the sky.

Which would require replacing [^\n] in the scanf(), with maybe[^~], or other …
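Another way around the newline problem, offered only as a sketch and not as the approach described above, is to search a copy of the text in which every newline has been turned into a space. The helper names flatten_newlines and count_matches are made up for this example.

```c
#include <stddef.h>
#include <string.h>

/* Copy src into dst, turning every newline into a space.
   dst must have room for strlen(src) + 1 chars. */
void flatten_newlines(char *dst, const char *src)
{
    for (; *src; src++, dst++)
        *dst = (*src == '\n') ? ' ' : *src;
    *dst = '\0';
}

/* Count the non-overlapping occurrences of target inside text. */
int count_matches(const char *text, const char *target)
{
    size_t len = strlen(target);
    int n = 0;

    if (len == 0)
        return 0;               /* an empty target matches nothing useful */

    for (const char *p = strstr(text, target); p; p = strstr(p + len, target))
        n++;
    return n;
}
```

With the text flattened this way, a target typed on one line can match a phrase that wraps onto the next line of the original text.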

Adak 419 Nearly a Posting Virtuoso

Code is useless at this point. Not your code, but ANY code. First, before ANYTHING ELSE, get the right algorithm. And you don't have it.

No amount of optimizing later will overcome the problems of a poor choice in the algorithm.

And this is the algorithm you should be studying:
http://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_search_algorithm

THAT is the most widely used string searching algorithm, and at that link, at the bottom of the page, is a link to a C example showing it.

What's wrong with your algorithm? You are stepping through the string by merely 1 char at a time.

ONE CHAR AT A TIME

Think about it. How could that be improved?

The thing is, you need to work it through with paper and pen several times, by hand. Do YOU check for a string inside a larger string, by checking EACH AND EVERY CHAR?

Of course not!

What do YOU do? Work it through several times, S-L-O-W-L-Y, and notice the pattern that your eye and pen follow doing this. The pattern will reveal itself if you repeat it enough.

And do read up on Boyer-Moore, for all kinds of hints. You don't need to use all their tricks, but you should use some of them.

Save this code, and later, put a timer in it and compare its time with Boyer-Moore's program. The difference is astounding if the text being searched is large. The larger the sub-string, the quicker BM's algorithm gets.
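To give a taste of the "skip ahead" idea, here is a minimal sketch of the Horspool simplification of Boyer-Moore. It is NOT the full Boyer-Moore algorithm (it uses only a bad-character style shift table), and the name horspool_search is made up for the example.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Return a pointer to the first occurrence of pat in text, or NULL.
   Horspool variant: after each attempt, shift the pattern by an amount
   that depends on the text char lined up with the pattern's last char. */
const char *horspool_search(const char *text, const char *pat)
{
    size_t n = strlen(text), m = strlen(pat);
    size_t shift[UCHAR_MAX + 1];

    if (m == 0)
        return text;
    if (m > n)
        return NULL;

    /* A char that is not in the pattern lets us jump the full length m. */
    for (size_t c = 0; c <= UCHAR_MAX; c++)
        shift[c] = m;
    /* Otherwise, jump so its last occurrence (excluding the final
       pattern char) lines up with the text. */
    for (size_t i = 0; i + 1 < m; i++)
        shift[(unsigned char)pat[i]] = m - 1 - i;

    size_t pos = 0;
    while (pos <= n - m) {
        if (memcmp(text + pos, pat, m) == 0)
            return text + pos;
        pos += shift[(unsigned char)text[pos + m - 1]];
    }
    return NULL;
}
```

Notice that with a long pattern, most failed attempts skip several chars at once instead of moving one char at a time.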

Adak 419 Nearly a Posting Virtuoso

You have a stop variable. Every number that is single is given a "stop" equal to the first number read, meaning no other numbers will be printed beyond the first one. Otherwise, stop is given the value of the number after the hyphen (the second number).

So I picture this as a read, followed by a for or while loop: e.g.

int number, stop;
read the number, assign value to stop.
for(i=number;i<=stop;i++)
      print i

How verbose was your code? (How many lines?) I don't believe this is highly optimized, but it seems straightforward, clear and acceptably concise.
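The read step in the outline above could be sketched like this. I'm assuming each item arrives as a string such as "7" or "3-6" holding non-negative numbers, since the exact input format isn't shown here. The function names parse_range and print_range are made up for the example.

```c
#include <stdio.h>

/* Parse an item that is either "N" or "N-M".
   A single number gets a stop equal to itself.
   Returns 0 on success, -1 if no number could be read. */
int parse_range(const char *item, int *number, int *stop)
{
    int fields = sscanf(item, "%d-%d", number, stop);

    if (fields < 1)
        return -1;              /* not even the first number was there */
    if (fields == 1)
        *stop = *number;        /* single number: print just that one */
    return 0;
}

/* Print every number from the first value through stop, one per line. */
void print_range(const char *item)
{
    int number, stop;

    if (parse_range(item, &number, &stop) != 0)
        return;
    for (int i = number; i <= stop; i++)
        printf("%d\n", i);
}
```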

Adak 419 Nearly a Posting Virtuoso

You have two constraints and one goal. The goal is easy - maximum volume of the box.
(length x width x height)

The two constraints are:

1) The length of wire to run around the edges of the box (there are 4 horizontal edges on the bottom square of the box, 4 on the top square of the box, and 4 vertical edges).

See this picture:
http://chemistry.about.com/od/chemistry101/ss/3dformulas_3.htm

2) The surface area of the box, which must be smaller than the number of square cm of paper that is given in the problem.

Study the above drawing in the link, and note especially the formula for the surface area of a box. That formula tells you when you have reached the greatest possible surface area for the given problem.

Questions:

Is it true that there is a direct correlation between a greater wire length used in making the box, and the volume of the box?

Is it true that there is a direct correlation between a greater surface area of a box, and a greater volume of the box?

You need to dig deep to solve these problems, particularly if you haven't seen them before.

Adak 419 Nearly a Posting Virtuoso

When you finish moving menu(), etc., outside of the other functions, then post your new code, if it still has problems.

That way we'll be able to see the new code, and be on the same pew.

Adak 419 Nearly a Posting Virtuoso

main() is always the first function the program will execute - regardless of what order your functions are listed in. (At least, of the functions you can see. There are other functions that boot the program and prepare its execution, but you never see them.)

You MUST have a main() function, explicitly declared, nowadays. (Before this change, some compilers would create a main() for you if you forgot it. No more.)

Your program should have ONE main() function, and your program should NEVER call main(). main() should not call itself, nor should main be called from any other function. (That you can see).

Adak 419 Nearly a Posting Virtuoso

scanf() requires the address of the variable it will change. printf() does not.

& is the address-of operator.
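A small sketch of the rule, using sscanf() on a string so it can be tried without typing anything in. The function name parse_contributions is made up for the example. Inside the function, sss and pagibig are already pointers, so they are passed as-is; a caller holding plain variables has to add the ampersand.

```c
#include <stdio.h>

/* Read an int and a float from line.
   sss and pagibig are pointers already, so no & is needed here.
   Returns 0 if both values were read, -1 otherwise. */
int parse_contributions(const char *line, int *sss, float *pagibig)
{
    return (sscanf(line, "%d %f", sss, pagibig) == 2) ? 0 : -1;
}

/* The caller has plain variables, so it passes their addresses with &.
   printf() only needs the values, so it gets no &. */
void show_contributions(const char *line)
{
    int sss;
    float pagibig;

    if (parse_contributions(line, &sss, &pagibig) == 0)
        printf("SSS: %d  PAG IBIG: %.2f\n", sss, pagibig);
    else
        printf("Could not read both numbers\n");
}
```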

Adak 419 Nearly a Posting Virtuoso

In C, a semi-colon marks the end of a statement. On these and other lines, your semi-colon has cut the statement off from the block that should follow it - and you need to remove them.

for ( i=0 ; i<=8 ; i++ )**;**
{
for (j=i+1;j<=9;j++)**;**
{
int temp;
if(array[i]>array[j])**;**
{
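With the three stray semi-colons removed, the loops might look like the sketch below. The swap inside the if is my assumption, since the posted fragment stops at the opening brace, and sort10 is just a name made up for the example.

```c
/* Sort a 10 element array into ascending order by comparing each
   element with every element after it, swapping when out of order. */
void sort10(int array[10])
{
    for (int i = 0; i <= 8; i++) {          /* no semi-colon after the ) */
        for (int j = i + 1; j <= 9; j++) {
            if (array[i] > array[j]) {
                int temp = array[i];        /* assumed swap */
                array[i] = array[j];
                array[j] = temp;
            }
        }
    }
}
```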

Adak 419 Nearly a Posting Virtuoso

I'm still yakking with my ISP about a handle that I want. :(

If you want to run your own program to gather and make statistical reports on 300 million people - it's going to need some "tedious" code work. You can't fit 300 million records into memory, and still work with it, all at the same time.

Since you want to avoid that coding, then clearly a database is the way to go. I can't help you with database questions, however. Haven't used one in 20 years, and they bore me to tears. ;)

Go to the forum that specializes in your database. There, you'll find the most experienced users to answer your questions.

Adak 419 Nearly a Posting Virtuoso

That's what database software (and the operating system itself), will do, behind the scenes.

Even more, actually. They may not do it as fast, depending on what you want, but they are very flexible, and generally have decades of programmer-years behind them, in making it right.

If you have a program and a large file of strings for instance, just enter this at a command prompt (in a console window):

sort <nameOfInputFile >nameOfOutputFile

In the case of Windows, that will launch multiple threads or processes, and give you a guaranteed amount of resources, also. Unless you know how to use those extra resources, even the best sorting algorithm can't beat Windows sort command.

You never have to use multi-tier merging, but it is elegant, and fast. Not as difficult as it looks - you keep the number of the highest file at each tier, in a simple int array or variable.

Good luck with your project.

Adak 419 Nearly a Posting Virtuoso

I'm no math whiz, but don't you have to find the range of possible answers for a,b,c,d, separately, in every equation, and then look at the range where all 4 variables' ranges overlap?

Emphasis on the highest and lowest values for each variable, of course. Everything in between should be golden. (good)

Adak 419 Nearly a Posting Virtuoso

Use arrays, of course. The arrays can be either global (declared outside a function, usually above main()), or preferably, created with malloc(). Be sure to include stdlib.h for malloc().

Although I have preferred one style of Quicksort, I am starting to really warm up to qsort() for these sorting jobs. The only problem with qsort is the darn syntax for the parameters. It requires void pointers, which means casts are needed for both the call to qsort, and for the parameters to the compare function you make.

Just look it up, and write it down - one for strings, and another syntax example for integers, should get you started.

qsort(A, MAX, sizeof(A[0]), compare_int); //calling qsort

where A is the array name, MAX is the number of elements to be sorted (technically a size_t), sizeof(A[0]) is the size of the first element of A, and compare_int is the name of the compare function that you want qsort() to use.

Strings:

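//this form works when A is a 2D char array (each row holds one string);
//for an array of char pointers, compare *(const char * const *)a with *(const char * const *)b
int compare_str(const void *a, const void *b) {
     return(strcmp(a,b));
}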

The compare functions are typically tiny, as above.

Integers:

int compare_int(const void *a,  const void *b) {
   int x = *(const int*)a, y = *(const int*)b;
   return (x > y) - (x < y);   //avoids the overflow that a plain x - y can cause
}

For multi-key comparisons, the compare function looks like this:

int compare ( const void *a, const void *b ) {
  const struct user *pa = (const struct user *)a;
  const struct user *pb = (const struct user *)b;
  for ( int i = 0 ; i < LEN ; i++ ) …
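Since the function above is cut off, here is one way such a multi-key compare could be finished. This is only a sketch: the fields of struct user (id and a tags array), the value of LEN, and the names compare_user and sort_users are all my assumptions, not the original code.

```c
#include <stdlib.h>

#define LEN 4                   /* assumed number of keys per record */

struct user {                   /* assumed layout, for illustration only */
    int id;
    int tags[LEN];
};

/* Compare two records key by key: the first key that differs decides. */
static int compare_user(const void *a, const void *b)
{
    const struct user *pa = (const struct user *)a;
    const struct user *pb = (const struct user *)b;

    for (int i = 0; i < LEN; i++) {
        if (pa->tags[i] != pb->tags[i])
            return (pa->tags[i] > pb->tags[i]) - (pa->tags[i] < pb->tags[i]);
    }
    return 0;                   /* every key matched */
}

/* Sort count records by tags[0], then tags[1], and so on. */
void sort_users(struct user *users, size_t count)
{
    qsort(users, count, sizeof users[0], compare_user);
}
```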

Adak 419 Nearly a Posting Virtuoso

Looking at the bigger picture, it would be best to use database software, to handle this quantity of data, and provide you with the info from that data, in the report format you need. For an academic situation, maybe you are required to do this with your own program(s).

Moving on, then:

1) If you judge hashtag (tags), matching to be a part of similarity between users, then you need to have a way of gathering the tag data. You need a way to gather up the raw data quickly and put it into your servers.

Each tag will go into a file, with no sorting, because its placement is fixed. New tags are appended to the end of the tag file. Each tag is given a number, matching its row number, in the list of tags.

Each user you record will have their name in the user file, and in their own file, they will have the list of all the tags they have used - but not tags, just tag row numbers.
(The indices if you prefer). These numbers will be counted up for frequency of use, and the most frequent 50 tags or so, will be recorded, in a user's data file. Minimum data, just an ID number, name/handle, and the 50 or so most frequent tags they've used, by index number.

You must have this before any steps below, can work.

2) Now open this data file just made at the end of step 1), …

Adak 419 Nearly a Posting Virtuoso

Big speed up from multi-key sorting, etc. 2 seconds for a million records.

Uses a bit more memory than I'd like, but this is with 52 hashtag indices per record.

Adak 419 Nearly a Posting Virtuoso

A struct in C, corresponds to an object (roughly), in OOP programming. Each record would then be an "object", and other record fields can be added as needed.

If you make the hashtags in the record, a string, instead of a number (index) to a master hashtag list (all strings), then "deeper" searches will be much slower. You'll be making a simple number comparison in the first case, but several actual string comparisons will be needed, in the second case, since you will not have the index of the string anymore.

There would need to be a master list of hashtags kept, which would enumerate all the strings.

You're welcome. I'm working on a permutation program tonight, but tomorrow I'll get back into refining the sorting on those hashtags.

Adak 419 Nearly a Posting Virtuoso

Good news! This "column wise" record sorting, which I've been struggling with, is well known as multi-key sorting.

So the sort times will be able to be cut a GREAT deal. I should have seen that, but missed it.

Adak 419 Nearly a Posting Virtuoso

It's more difficult than that. There are lots of different image formats, and every one of them has strictly different requirements.

Google for an image library for the language of your choice. Several are free. We don't take code requests - we help YOU with YOUR code that you bring.

Adak 419 Nearly a Posting Virtuoso

I've been wrestling with the "Quicksort is not a stable sort", Gorilla - for a holiday diversion. ;)

The Quicksort gorilla was bad enough, but the Radix gorilla was REALLY touchy. Get him in a good arm bar, and he wants to bite, or throw malformed integers at you!

The initial hashtag column of one million records, is sorted by Quicksort in about 1/2 a second, but there the fun ends. All the other 51 columns of hashtag indices, have to be gone through, and have the out of order ones corrected. You can't have Quicksort do it, unless you really slow it down.

So in run times, we're back to about where we were last week, with 17.1 seconds for 1 million records with 52 hashtags per record. That is ONLY for sorting, btw.

The advantage is that the hashtag values remain as digits, and don't have to be translated into strings, and then put back to integers, so they can be used.

This is an example of the first 100 rows of hashtags:

 0:  0   1 474 753 478 945 456 255   1 198 847 945 551  24 396 986 242  85 154 
 1:  0   3 299 465 382 737 460 803 301 771 179  50 801 262 483 330 815 905 349 
 2:  0   3 345  44 796  23 873  36 883 551 419 744 744 659 336 470 891 699  16 
 3:  0   3 849 827 747 316 243 938 517 197 597 700 …

Adak 419 Nearly a Posting Virtuoso

I will, but I'd like to keep this in the public forum, to the extent it's possible without divulging proprietary info. You started it here, it is an interesting topic, and imo it's not well covered by search engines. There's a ton of info on sorting, but I couldn't find anything half-decent on sorting records by columns of an array.

There WERE a few with Bubblesort, and clumsy logic, though! < ROFL!! >

Bubblesort: the Pain in the Butt, that just keeps on giving.

The big problems I see with any project like this are:

1) Acquiring the raw data

2) Classifying the raw data, into usable data

3) Processing the usable data

You must have #1 planned or handled already, or there would be no point to work with #2 or #3.

I understood from the previous pic that the format of the data would be:

ID_number: a unique integer
handle (name): a string, may not be unique
tags (some variable number of them)

Although there might be hundreds of tags a user might have used, we can't work with every tag. We need to select from those tags, the top 50 (fewer is faster), or so. That's where the binary search will be very useful. If a user has tags of:

art
beaches
cats
caves

Then those will be searched from the list of all tags (yes, millions are fine here). Each tag "hit" is counted, (by incrementing the index to that matching …

Adak 419 Nearly a Posting Virtuoso

We've all been there, Tadas. That's half the motivation to get better with our programming, to be honest. ;)

Welcome to the forum!

Adak 419 Nearly a Posting Virtuoso

Eureka!

Records sorted: 1000000 with 52 columns each. Time:  1.467 seconds -- new

Adak 419 Nearly a Posting Virtuoso

One = is for assignment ONLY. Two ='s are needed for comparison: c==0. Since this is already true, the loop never starts.

Use

for(i=0;i<10;i++)
   printf("%d\n",i);

a is zero, so adding it to i will make no difference.

Adak 419 Nearly a Posting Virtuoso

I'm getting a new email addy. No one can remember the old one.

Sort times with the hashtags as strings were disappointing, at 16 seconds for 100K records with 52 tags for each user. I'm re-configuring the program to use numbers, instead of strings. Also, recoding the sorter function to work with records with multiple columns.

It will be much faster.

< Happy new year! >

Adak 419 Nearly a Posting Virtuoso

It's now using the data for 100,000 records, with 52 hashtags for every user, from a set of 300,000 hashtags. Run time is long because I have an extra write out of all the data, (in one format), so it can be read in, in a better one. This will be removed.

Please post how you sorted (or were going to sort) these records. I'm unsure if it's better to:

*sort the tags indices as a single string (what I'm doing now)

or

*sort the tags as an array of 52 ints per user (in effect, making it a very large 2D int array with one user's hashtag indices per row).

Once they see some code from you (in accordance with the forum's policies), I'm hoping a few regulars will reply and give us further ideas.

Also, I'd like to not propose things that were already not approved. No sense in that.

I'll have some preliminary sorting times late today.

Adak 419 Nearly a Posting Virtuoso

Thanks for the best wishes, and same to you and yours, in 2013.

The frequency of the hashtags would be for one individual only - at least one version of it needs to be - otherwise the similarity grouping goes out the window, as far as speed, because a sort on everyone's hash tags, won't mean much at all. We need the left most hashtag to be the most frequent hashtag from that ONE user.

I've done this kind of thing before, but usually it's on the CBoard forum, so there's lots of discussion and ideas floating around. With your permission I'd like to make a thread over there, just on the hashtag sorting issue, and see what they think. I am hoping someone like Deceptikon will also pitch in with ideas, on this forum.

I can't say too much about counting the hashtags for frequency by an individual, because I don't know what the raw data will be like. I agree the hashtag sorting is the biggest problem to the program. How could it not be?

I'm starting out with an optimized Quicksort I have. It's my champion integer sorter for large quantities (up to 10 million only though). I also have a Radix MSB function I thought I'd try, and I'm reading up on a couple other excellent string sorters.

The hashtags storage I'm suggesting, will have a file (and an array when needed), of the strings. The hashtags may be sorted, or just indexed (which is just …

Adak 419 Nearly a Posting Virtuoso

I've increased the tags to 32 per person, minimum. I have to increase the number of users. I need to see more changes in the run times for the sorting. (which I'm experimenting with atm).

Adak 419 Nearly a Posting Virtuoso

I'm sure there was a bit of preloading into the drive's buffer, but I had to give a satisfied smile.

Time to load 3K records, and build the two arrays, including writing out all the 5K tags with 10 tags per user to the files:

0.015 seconds.

;)

Adak 419 Nearly a Posting Virtuoso

E-mail will have to wait until Monday.

In the meantime, delete the bool array idea I mentioned above. This is my latest idea:

We get all the tags for an individual user, and sort them by their frequency, in descending order (important). We also have a file with all the most frequent tags, from all the users. Say the top 500 million. There will be lots more tags, in total, but these will be the most frequent tags, so we're getting the meat of the data, since there will be tens of thousands of repeats, in the set of all tags from every user.

In the user's data file, the abbreviated record might look something like this:

12,Addison,kinetic happier ejectors gales sated madam neurones garters lapsed 

Where 12 is Addison's ID, and each word after her name, is a tag she's frequently used, which is sorted in descending order - so "kinetic" is her most common tag, for this example, and "happier" is her next most common tag, etc.

But we don't want to work through these words, because nothing is slower than doing a lot of string work, on a computer. So we use the tag array and binary search, to replace those tags, with the index numbers matching those tags.

Now Addison's abbreviated record looks like this:

12,Addison,2181 1614 1043 1294 3649 2480 2844 1405 2402 0 

Much nicer!
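The tag-to-index step can be sketched with the standard bsearch() function. This assumes the master tag array is held in memory as an array of char pointers, already sorted in strcmp() order. The names tag_index and compare_tag are made up for the example.

```c
#include <stdlib.h>
#include <string.h>

/* bsearch() passes the key first, then a pointer to an array element. */
static int compare_tag(const void *key, const void *elem)
{
    return strcmp((const char *)key, *(const char *const *)elem);
}

/* Return the index of tag in the sorted master list, or -1 if absent. */
long tag_index(const char *tag, const char *const *master, size_t count)
{
    const char *const *hit =
        bsearch(tag, master, count, sizeof master[0], compare_tag);

    return hit ? (long)(hit - master) : -1;
}
```

Each tag word in a user's record would be run through tag_index() once, and only the returned number kept.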

To make comparisons, we put those numbers into a string, including the spaces:

2181 …

Adak 419 Nearly a Posting Virtuoso

While waiting to sort out an issue with my email, I'm putting together a data file with 3 thousand names, and giving each name, from 0 to 99 tags.

Get a better feel for what works well.

More on this, as it progresses.

Adak 419 Nearly a Posting Virtuoso

That helps! The cosine similarity article looks good.

Adak 419 Nearly a Posting Virtuoso

Thanks for the kind words. I'll get you an email address. Not meaning to poke fun at a "god", but throw out the previous idea of mine to measure similarity - won't work for what you need. New idea is to create a boolean type string, which has a 1 value if it has the hashtag associated with that index of the string, (which would be in a table), or a 0 if the user did not have that particular hashtag.

I'm thinking structs, and array of structs, and every user has a similarity "string" in their struct. A simple example:

User has hashtags of: art, beaches, cats, cows,

And the hashtags list is: aardvarks,art,bats,baseball,basketball,beaches,caps,cats,comics,cows.

So this user's hashtag string would be:

    0100010101
    0 for aardvarks
     1 for art
      0 - bats
       0 - baseball
        0 - basketball
         1 - beaches
          0 - caps
           1 - cats
            0 -comics
             1 - cows
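Building that string is simple enough to sketch. The function name tag_bitstring and its parameters are made up for the example, and it uses a plain linear scan of the user's tags, which is fine for a handful of tags per user.

```c
#include <string.h>

/* Fill out[] with '1' where master[i] is one of the user's tags and
   '0' where it is not. out must have room for master_count + 1 chars. */
void tag_bitstring(const char *const *master, size_t master_count,
                   const char *const *user_tags, size_t user_count,
                   char *out)
{
    for (size_t i = 0; i < master_count; i++) {
        out[i] = '0';
        for (size_t j = 0; j < user_count; j++) {
            if (strcmp(master[i], user_tags[j]) == 0) {
                out[i] = '1';
                break;          /* found it, no need to look further */
            }
        }
    }
    out[master_count] = '\0';
}
```

Fed the ten tag list and the four user tags from the example above, it produces the same 0100010101 string.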

Now we can make an indexed table of these hashtag strings, which would be created as the data was input, by using either an int index array or an array of pointers. In effect, it gives us a way of searching through the hashtag strings AS IF they were all in sorted order, even if they aren't sorted at all by hashtag strings.

Using binary search, we can also find the K closest hashtag strings, for any user, very quickly, BUT The downside to the above scheme is that in any such sorting, the 1's …

Adak 419 Nearly a Posting Virtuoso

It's best to state your question in a new thread for two reasons:

1) it's not the same as the original problem, and

2) it may not get the attention it deserves, here.

You can't use an array to hold the numbers? What CAN you use? No sense speculating about it, when you must have been told what to use, or at least given a hint.

I'll find you in your new thread - or here, as you prefer. Show or tell us specifically just WHAT you can use. And do you have any attempts in code to show us? It's difficult to know what to advise, without a concrete "something" in code, from you.

So give it a try, and post back. You might try working it through by paper and pen, and see what kinds of patterns you see in your own solutions to it, apart from the computer.
That could serve as the backbone of the program's logic.

Adak 419 Nearly a Posting Virtuoso

Just a little program to show the extreme usefulness of Binary Searching (aka "binsearch").

#include <stdio.h>
#include <string.h>
#include <time.h>

#define SIZE 250000

typedef struct {
   char name[30];
   char state[40];
}user;

user users[SIZE];    //static storage, not the local function stack; one record per user

int binarySearch(char *target, char *states[56]);

int main(void) {
   clock_t timer;
   int i,count,ok;
   char target[40];
   char *states[56]={"Alabama","Alaska","American Samoa","Arizona","Arkansas","California",
   "Colorado","Connecticut","Delaware","District of Columbia", "Florida","Georgia","Guam",
   "Hawaii","Idaho","Illinois","Indiana","Iowa","Kansas","Kentucky","Louisiana","Maine",
   "Maryland","Massachusetts","Michigan","Minnesota","Mississippi","Missouri","Montana","Nebraska",
   "Nevada","New Hampshire","New Jersey","New Mexico","New York","North Carolina","North Dakota",
   "Northern Marianas Islands","Ohio","Oklahoma","Oregon","Pennsylvania","Puerto Rico",
   "Rhode Island","South Carolina","South Dakota","Tennessee","Texas","Utah","Vermont","Virgin Islands",
   "Virginia","Washington","West Virginia","Wisconsin","Wyoming"};

   FILE *fpIn;
   fpIn=fopen("NamesStates.txt","r"); //250,000 names with states
   if(!fpIn) {
      printf("Error opening file!\n");
      return 1;
   }
   timer=clock();
   for(i=0;i<SIZE;i++) {
      //field widths keep long input from overflowing name[30] and state[40]
      if(fscanf(fpIn, "%29s %39[^\n]",users[i].name,users[i].state)!=2) {
         printf("Error reading record %d\n",i);
         fclose(fpIn);
         return 1;
      }
   }

   fclose(fpIn);
   //match all the states from the users array,
   //with the states in the states array, using the binary search.
   for(i=0,count=0;i<SIZE;i++) {
      strcpy(target,users[i].state);
      ok=binarySearch(target, states);
      if(ok>-1)
         count++;
      else {
         printf("target: %s users[i].state: %s \n",target, users[i].state);
         getchar();
      }
   }

   timer=clock()-timer;
   printf("Matches found: %d   Elapsed time: %.3f seconds\n",count,(double)timer/CLOCKS_PER_SEC);

   printf("\n");
   return 0;
}

int binarySearch(char *target, char *states[56]) {
   int lo, hi, mid;
   lo=0;
   hi = 56-1;

   while(lo <= hi) {
      mid=lo + (hi-lo)/2;
      if((strcmp(states[mid], target)) <0) {
         lo=mid+1;
      }
      else if((strcmp(states[mid], target)) >0) {
         hi=mid-1;
      }
      else
         return mid;
   }
   return -1;
}

The above is not an optimized program - run of the mill. But it finds a quarter of a million matches of 56 states and territories, in less than 4/10ths of a second, on my PC. Pretty impressive, this algorithm! ;)

This is the data file, which I simply copied …

Adak 419 Nearly a Posting Virtuoso

All these are just ideas. You will find ways to refine them, going forward, before you decide "OK, this is set". Using the strings of hashes, for instance. Might find something better, but it's an idea that can work quickly, and should be considered.

Binary search is the kind of logic that you want your fast program to use. In the world of speed-ups we have things like:

1) Do nothing - that's ultimately fast. If you can do nothing further that gives you a gain from lots of records, early on (before a lot of other processing), that's what you want to do.

2) Well designed programs that logically use the data to its fullest, and don't have to re-load, re-write, re-anything -- if possible. You'd be amazed how often programs have computers doing work, inefficiently.

3) Brilliant algorithms. Among them:

Binary search. With just a handful of comparisons (about 20), you can complete a search for a value out of a million sorted values! I'm making up a program to demonstrate it. I'll post it up later today.
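
To put a number on "a handful": each pass of a binary search throws away half of what's left, so a million sorted values take at most 20 passes. Here is a minimal sketch that counts them. The helper name countProbes() and the use of an int array are my assumptions for illustration only, not part of the demo program.

```c
/* Hypothetical helper: binary search over a sorted int array that
   also reports how many loop passes (probes) it needed.
   Returns the index of target, or -1 if it is not present. */
int countProbes(const int *a, int n, int target, int *probes) {
   int lo = 0, hi = n - 1;
   *probes = 0;

   while (lo <= hi) {
      int mid = lo + (hi - lo) / 2;   /* avoids overflow of lo+hi */
      (*probes)++;
      if (a[mid] < target)
         lo = mid + 1;
      else if (a[mid] > target)
         hi = mid - 1;
      else
         return mid;
   }
   return -1;
}
```

Call it on a sorted array of 1,000,000 values and check *probes afterward: it never goes above 20, whether the value is found or not.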

Lots of similarly brilliant algorithms - from Dancing Links to Dijkstra's shortest path, to optimized Quicksort/Mergesort. They blow the slower algorithms into the weeds.

4) I shouldn't mention bit fiddling tricks, but they are quick. I almost never use them however.

Re: hashtags. strstr() is OK, but the Boyer-Moore string searching algorithm may be worth using. strstr() is the standard C function for finding a word (a substring of any kind) inside a larger string. It …
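
A minimal sketch of strstr() at work on a hashtag string. The helper name hasTag() and the comma-separated layout are assumptions for illustration. One thing to watch: plain strstr() will also find "art" inside "cartoons", so the sketch checks that a hit starts and ends on a comma (or on the ends of the string).

```c
#include <string.h>

/* Hypothetical helper: returns 1 if tag appears as a whole
   comma-separated item in tags, 0 otherwise. */
int hasTag(const char *tags, const char *tag) {
   size_t len = strlen(tag);
   const char *p = tags;

   if (len == 0)
      return 0;

   while ((p = strstr(p, tag)) != NULL) {
      int startOk = (p == tags) || (p[-1] == ',');
      int endOk = (p[len] == '\0') || (p[len] == ',');
      if (startOk && endOk)
         return 1;
      p++;   /* partial hit only, keep looking further along */
   }
   return 0;
}
```

So hasTag("art,bats,cows", "bats") returns 1, while hasTag("cartoons,bats", "art") returns 0.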

Adak 419 Nearly a Posting Virtuoso

First, I'm not a professional - just a hobbyist, but have a keen interest in high speed processing in C. And three times your age. ;)

Say we had some records - a record is a name and its hashtags:

name     hashtags
====================
alfa     art,bats,cows
bravo    art,cows
charlie  baseball,cows
delta    basketball,dogs,ducks
delta    basketball,dogs,ducks
echo     basketball,dogs,ducks
...
...
zulu     art,bats,cows

Here, our NATO phonetic users are sorted by name - but that's not what we want, or zulu will be very slow to be seen as similar to alfa. What we want is to sort each user's hashtags, and put them first in a string:

art bats cows,alfa
art bats cows,zulu
art cows,bravo
baseball cows,charlie
basketball dogs ducks,delta
basketball dogs ducks,delta
basketball dogs ducks,echo

No linked lists needed (or wanted). Linked lists are slow, use them only when you can't use an array.

We've sorted each user's hashtags into alphabetic order (maybe they're already listed that way?), and because the names were added to the strings being sorted, duplicate records for the same user are right next to each other, and quickly and easily removed if you prefer.
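
A rough sketch of how those strings could be built and sorted, using nothing but arrays and qsort(). The names makeKey(), sortKeys(), MAXTAGS and KEYLEN are all hypothetical, and the fixed sizes are assumptions you'd tune to your real data.

```c
#include <stdlib.h>
#include <string.h>

#define MAXTAGS 16    /* assumed limit on tags per user */
#define KEYLEN  128   /* assumed limit on one key string */

/* qsort() comparator for an array of char pointers */
static int cmpTagPtr(const void *a, const void *b) {
   return strcmp(*(char * const *)a, *(char * const *)b);
}

/* qsort() comparator for an array of fixed-size key strings */
static int cmpKey(const void *a, const void *b) {
   return strcmp((const char *)a, (const char *)b);
}

/* Build "tag1 tag2 tag3,name" in key, with the tags sorted.
   tags is a comma-separated list; a copy is tokenized, so the
   caller's string is left alone. Returns 0 on success, -1 if the
   input is too long or has too many tags. */
int makeKey(const char *name, const char *tags, char key[KEYLEN]) {
   char buf[KEYLEN];
   char *tag[MAXTAGS];
   char *tok;
   int i, n = 0;

   /* tags + ',' + name + '\0' must fit in the key */
   if (strlen(tags) + 1 + strlen(name) + 1 > KEYLEN)
      return -1;
   strcpy(buf, tags);

   /* split at commas and spaces, remembering where each tag starts */
   for (tok = strtok(buf, ", "); tok != NULL; tok = strtok(NULL, ", ")) {
      if (n == MAXTAGS)
         return -1;
      tag[n++] = tok;
   }
   qsort(tag, n, sizeof tag[0], cmpTagPtr);

   /* join the sorted tags with spaces, then append ",name" */
   key[0] = '\0';
   for (i = 0; i < n; i++) {
      if (i > 0)
         strcat(key, " ");
      strcat(key, tag[i]);
   }
   strcat(key, ",");
   strcat(key, name);
   return 0;
}

/* Sort the whole listing so similar users end up side by side */
void sortKeys(char keys[][KEYLEN], int n) {
   qsort(keys, n, KEYLEN, cmpKey);
}
```

Feed it ("zulu", "cows, art, bats") and you get "art bats cows,zulu". After sortKeys(), that string sits right next to alfa's.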

Also, all the art lovers are easily compared, and their similarities scored. Maybe 10 points for the first name, and 20-30 points for 2 matching hashtags, 70 points for 3 matching hashtags, etc. The score could be a flat percentage of all the matching hashtags, or it could take a more exponential curve upward, as the matching …

Adak 419 Nearly a Posting Virtuoso

I'd write a very short utility to get further info from the data, first. I'd want to know how good the hashtags are for differentiating one user from the other. Specifically, what percent of a large sampling (all 6 million), are unique? (100% - repeat%).

I'm not a Twitter user, so I'd want to know how good the hashtags are in measuring similarity - sounds like it could be the most important key you have?

If that hashtag similarity association is valid, I'd want to create a string, based on: hashtags first, then their name/handle.

Anything else that needs to be thrown into the ID stew we're about to cook? Shorter is better, but these strings must be able to tell us, upon quick analysis, the similarity ranking we're looking for.

I'm thinking then I'd use a two-pass sort of these strings, so everyone with "cat" in their hashtags would be grouped together (so every word in the hashtags would be parsed out and sorted first, then the whole listing would be sorted). Now give a value for every matching word in the hashtags of the two users you're comparing. Since the closest ones should be sorted near each other now, it's easy, and when the first word changes, you don't make any more comparisons with that first person.
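
Counting the matching words for two users is cheap once each user's words are sorted: walk both lists together, like the merge step of a mergesort. This is only a sketch. The name countShared() is hypothetical, and how many points each match is worth is left up to you.

```c
#include <string.h>

/* Count the tags two users share. Both arrays must already be
   sorted in strcmp() order and hold no duplicates. */
int countShared(const char *a[], int na, const char *b[], int nb) {
   int i = 0, j = 0, shared = 0;

   while (i < na && j < nb) {
      int cmp = strcmp(a[i], b[j]);
      if (cmp < 0)
         i++;          /* a[i] can't be in b any more, skip it */
      else if (cmp > 0)
         j++;          /* b[j] can't be in a any more, skip it */
      else {
         shared++;     /* same word in both lists */
         i++;
         j++;
      }
   }
   return shared;
}
```

For {"art","bats","cows"} against {"art","cows"} it returns 2, and each word is looked at only once.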

So everyone with the word "cat" is compared with the first person with "cat", then the second person with "cat", then the third, etc. When you run out of …

Adak 419 Nearly a Posting Virtuoso

Why are you including the header file "stdafx.h"? I'm not familiar with this in C. What about "stddef.h"? Why do you need it?

Why are you writing out your words in binary mode, into the file, but reading in file data in text mode? (lines 62 and 77)

To debug this:

1) Check that your file data is being put into the file correctly. Be consistent with the file mode you use, always. Any editor can be used, but don't use your program for this part of the debugging.

2) Are you able to read in the words correctly, from the text file? Before you do anything to find the 4 letter words, you have to be sure you're getting all the words OK, from the raw text in the file.

3) If #2 is OK, then concentrate on how you're finding the 4 letter words. Checking for a space first is incorrect - %s will always start with a char, never whitespace.
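
For steps 2 and 3, here's a minimal sketch of reading words with %s and testing their length. The function name countFourLetterWords() and the 64 char buffer are my assumptions, not taken from your code. Note that punctuation stuck to a word counts as part of it ("word," is 5 chars), and a word longer than 63 chars would be read in pieces.

```c
#include <stdio.h>
#include <string.h>

/* Count the words of exactly 4 characters in an open text stream.
   %63s skips leading whitespace and never overflows word[64]. */
int countFourLetterWords(FILE *fp) {
   char word[64];
   int count = 0;

   while (fscanf(fp, "%63s", word) == 1) {
      if (strlen(word) == 4)
         count++;
   }
   return count;
}
```

Open the file in text mode ("r"), pass the FILE pointer in, and print each word inside the loop while you're debugging, so you can see exactly what fscanf() is handing you.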

Work with that and post back. You're close.

Adak 419 Nearly a Posting Virtuoso

Two ways come to mind:

1) use fscanf(). With %s it naturally breaks on whitespace, so if your record fields have just a single word in them, followed by a comma and a space before the next field, it's good. The comma will have to be removed from the last char of the word, however (easy enough).

2) use strtok(). Set the delimiters to ',' and '\n' (the string ",\n"). Works on multi-word fields, where #1 would not work as easily. Requires string.h in your list of include files.
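
A minimal sketch of #2, assuming each record is one line of comma-separated fields. The name splitRecord() is hypothetical. Keep in mind that strtok() treats a run of delimiters as one, so an empty field ("a,,b") is silently skipped.

```c
#include <string.h>

/* Split one record line into fields at commas and newlines.
   strtok() writes '\0' over the delimiters, so line is modified.
   Leading spaces on each field are skipped.
   Returns the number of fields stored in fields[]. */
int splitRecord(char *line, char *fields[], int maxFields) {
   int n = 0;
   char *tok = strtok(line, ",\n");

   while (tok != NULL && n < maxFields) {
      while (*tok == ' ')
         tok++;
      fields[n++] = tok;
      tok = strtok(NULL, ",\n");
   }
   return n;
}
```

Read each line with fgets() into a char array first, then hand that array to splitRecord(). A line like "John Smith, New York, 42" comes back as three fields.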

Post up some code to get started, if you need some help.

Adak 419 Nearly a Posting Virtuoso

Sure, but that won't help you. How would YOU solve this - forget the computer for now. Solve it by pen and paper a few times, very slowly.

How did you do that?

Write down the steps you took to solve it, and use small steps. BECOME the tip of the pen, as you are solving it.

Do that a few times, and eventually you will begin to notice definite patterns emerging. Those patterns, will form the backbone of the logic for your program.

This is too good and simple to be giving you ANY hints -- it is VERBOTEN! ;)

Adak 419 Nearly a Posting Virtuoso

If a number can't be repeated, then it's not random or pseudo random - just to be clear.

If a number can't be repeated, even during several runs of the program, then clearly you need a data file to be read, to tell the program what numbers either can or cannot be used at that time.

This is not a C code "request" forum. We help you overcome problems with programs you are working on.

Adak 419 Nearly a Posting Virtuoso

Yes, you can.

I have barely worked with this, years ago, but IIRC it was done like this:

1) print the letters out, as per regular

2) Technique #1:
a) use extended ascii chars like 176, 177 and 178 (░, ▒, ▓) to "dim" the letters when you print over them. Then immediately reprint that letter, using the bright white (15) color instead of the dull white normal color (7). Most of the basic colors for the console have a bright version as well: dull red - bright red, dull blue - bright blue, etc.

It's great to fine tune these techniques by using the millisecond timers.

Technique #2:
a) use a variety of extended ascii chars like 220 through 223 to "blank out" a portion (top, bottom, left or right) of the letter (but very fast!). If you do it fast enough, the "blank out" will hardly be noticed - the letter just seems to pulsate or twinkle.

Technique #3:
a) if you want to see "wings" on the letters, you need to overstrike the letter space next to it with one of the extended ascii chars (charts available for free download from many sites) that has a horizontal extension on it. Print the extension (and you can use more than just hyphens!) where the asterisks * are now.

Space the letters one more space apart, and you can give "wings" to every letter, on both sides, and make them "flap". (a little like you're doing …

nullptr commented: Im glad that someone understands what twinkling is :) +4

Adak 419 Nearly a Posting Virtuoso

Use %p for pointers.
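
A minimal sketch, with a hypothetical helper name formatPointer(). %p expects a void *, so cast other pointer types before passing them. What the printed address looks like (hex digits, leading 0x, etc.) depends on your compiler.

```c
#include <stdio.h>

/* Format the address held in ptr into buf using %p.
   %p expects a void *, so the int * is cast first.
   Returns what snprintf() returns. */
int formatPointer(char *buf, size_t size, int *ptr) {
   return snprintf(buf, size, "%p", (void *)ptr);
}
```

The same cast works directly in printf(): printf("%p\n", (void *)ptr);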

Adak 419 Nearly a Posting Virtuoso

It's not hard once you understand the problem. Work it through several times yourself, by hand. Notice particularly HOW you are using logical patterns to solve it.

Repeat until you understand those patterns - and that will form the skeleton of your algorithm for the program.

BUT - it's up to you to get a thorough understanding of the problem, and make a start with it - it's your assignment or problem, and we can't help hot air, so make that initial work. If you get stuck, post your code and describe the problem.