Dictionaries
Hey guys, I'm creating a code analyzer that enforces the CamelCase convention that Java uses. For example, thisIsAWellConstructedJavaVariable follows the convention and thisisnotawellconstructedjavavariable does not.
So obviously I need to allow my program to identify English words. What I plan to do is search a dictionary (a database, if one exists) with a growing prefix until there are no results, and then assume that a new word has begun. So suppose I had areaoftriangle as a variable; then I'd search a... ar... are... area... areao
areao would not be found, so I'd assume I'm starting a new word. Thus an alphabetical list of words in some highly accessible form would be perfect! I've searched and found dictionaries such as WordWeb, WordNet, ASpell, etc. But does anyone have a recommendation for me?
Thanks in advance!
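A minimal sketch of the prefix search described above, in Java, assuming the word list fits in memory as a sorted set. The class name `PrefixSplitter` and the tiny word list in the usage note are illustrative assumptions, not an existing library. It extends the prefix while some dictionary word still starts with it, remembers the longest prefix that is itself a complete word, and cuts there once the search comes back empty.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

/** Hypothetical helper: greedy prefix search over a sorted word list. */
class PrefixSplitter {
    private final TreeSet<String> words;

    PrefixSplitter(List<String> dictionary) {
        this.words = new TreeSet<>(dictionary);
    }

    /** True if at least one dictionary word starts with the given prefix. */
    boolean anyWordStartsWith(String prefix) {
        // ceiling() returns the smallest word >= prefix; if any word has
        // this prefix, that smallest word must have it too.
        String candidate = words.ceiling(prefix);
        return candidate != null && candidate.startsWith(prefix);
    }

    /** Splits an all-lower-case identifier into dictionary words, greedily. */
    List<String> split(String identifier) {
        String s = identifier.toLowerCase(Locale.ROOT);
        List<String> parts = new ArrayList<>();
        int start = 0;
        while (start < s.length()) {
            int lastWordEnd = -1; // end of the longest complete word seen so far
            for (int end = start + 1; end <= s.length(); end++) {
                String prefix = s.substring(start, end);
                if (!anyWordStartsWith(prefix)) {
                    break; // "no results": stop extending the prefix
                }
                if (words.contains(prefix)) {
                    lastWordEnd = end;
                }
            }
            if (lastWordEnd < 0) {
                // Nothing matched: keep the remainder as one unknown fragment.
                parts.add(s.substring(start));
                break;
            }
            parts.add(s.substring(start, lastWordEnd));
            start = lastWordEnd;
        }
        return parts;
    }
}
```

With the words a, are, area, of, often and triangle loaded, `split("areaoftriangle")` yields area, of, triangle. Being greedy, it never backtracks, so it can mis-split when a longer word swallows the start of the next one, and anything it cannot match comes back as a single unknown fragment.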
>So obviously I need to allow my program to identify English words.
Obviously? Indeed, how would you enforce camel case in this identifier:
KfxReturnValue kfxRetVal;
Identifiers are pretty close to free form in Java, so you're in for a rough ride with this program if you want it to be remotely useful. First, you need to pick out identifiers (you can do this most easily by identifying declarations rather than parsing every token in the source for identifiers). Then you need to grab the identifier and determine breakpoints (the place where a programmer might put an underscore) and match it against a mask using camel case. If it matches, move on. If it doesn't, offer the mask as a suggested change.
That's all pretty easy except for determining the breakpoints. I can guarantee that matching English words will either fail miserably or be of limited use. You might be better off writing this part as a plug-in where client code can supply logic that matches their naming conventions. If you want to do this for the general case, you have to account for English words as well as common and uncommon abbreviations across a wide range of project domains.
And finally, you said that this program enforces camel case. If your design suggests changes then that's fine, but if it actually makes changes or requires them to be made, that's not fine. What if the program is wrong? Nobody will use it, plain and simple. It's extremely difficult to write this program to be always right, so you need to make a compromise and suggest rather than enforce.
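One way the plug-in idea and the mask comparison could fit together is sketched below in Java. Everything named here (`WordSplitter`, `CamelCaseAdvisor`, `suggest`) is hypothetical and only illustrates the shape: client code supplies the breakpoint logic, the advisor builds the camel case mask from the resulting words, and it only ever returns a suggestion, never a change.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;

/** Hypothetical plug-in point: client code decides where the words break. */
interface WordSplitter {
    List<String> split(String identifier);
}

/** Hypothetical advisor that suggests a camel case spelling. */
class CamelCaseAdvisor {
    private final WordSplitter splitter;

    CamelCaseAdvisor(WordSplitter splitter) {
        this.splitter = splitter;
    }

    /** Builds the lower camel case mask: first word lower, later words capitalised. */
    static String toCamelCase(List<String> words) {
        StringBuilder mask = new StringBuilder();
        for (String word : words) {
            if (word.isEmpty()) {
                continue;
            }
            String lower = word.toLowerCase(Locale.ROOT);
            if (mask.length() == 0) {
                mask.append(lower);
            } else {
                mask.append(Character.toUpperCase(lower.charAt(0)))
                    .append(lower.substring(1));
            }
        }
        return mask.toString();
    }

    /** Returns a suggestion only when the identifier does not match the mask. */
    Optional<String> suggest(String identifier) {
        String mask = toCamelCase(splitter.split(identifier));
        // Only changes of letter case are ever suggested; if the splitter's
        // words do not reassemble into the identifier, stay silent.
        if (!mask.equalsIgnoreCase(identifier)) {
            return Optional.empty();
        }
        return mask.equals(identifier) ? Optional.empty() : Optional.of(mask);
    }
}
```

For instance, a splitter that a project configures to break kfxretval into kfx, ret and val (an assumed split, chosen to match the identifier above) would make `suggest("kfxretval")` offer kfxRetVal, while `suggest("kfxRetVal")` returns nothing because the identifier already matches the mask.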
I'm here to prove you wrong.
Instead, I'd recommend going through all the appropriate identifiers and parsing their camelcasedness.
In other words, go through all the identifiers and make a two-way mapping of camelcase fragments and where they appear. So if you found identifiers "abcDefGhi", "abcKoopa", "caterpillar", "snowCat" and "abcbomb", you'd get in your dictionary "abc", "def", "ghi", "koopa", "abcbomb", "snow", "cat", and "caterpillar", with pointers (in the abstract sense) to the places in source code where those fragments appeared.
Then use some magic algorithm that searches for fragments that are concatenations of others or prefixes of others, and if they don't form some ordinary English word, then they're bad. For example, "abc" is a prefix of abcbomb. Maybe abcBomb was meant? But while cat is a prefix of caterpillar, caterpillar's in the dictionary.
Of course, that's dumb. If you want to enforce the camelcase rule, just tell people to do it and threaten to fire them, or if your company's in North Korea, threaten to imprison them, if they don't comply. If you disallow underscores from the names, that'll be enough to compel them to use camelcase. Right? Then again, if people can get through three years of CS thinking a std::vector's implemented with a linked list, maybe it isn't. Sigh.
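For what it's worth, the fragment mapping and the prefix check can be sketched in a few lines of Java. This is only an illustration under assumptions: `FragmentIndex` and `suspects` are made-up names, the "pointers" are simply the identifier strings, the split happens wherever a lower-case letter or digit is followed by an upper-case letter, and the "magic algorithm" is reduced to a plain prefix test against a caller-supplied set of known words.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical index of camelcase fragments and the identifiers containing them. */
class FragmentIndex {
    // fragment (lower-cased) -> identifiers in which it appears
    final Map<String, Set<String>> occurrences = new TreeMap<>();

    /** Splits at lower-to-upper case boundaries and records each fragment. */
    void add(String identifier) {
        for (String part : identifier.split("(?<=[a-z0-9])(?=[A-Z])")) {
            occurrences
                .computeIfAbsent(part.toLowerCase(Locale.ROOT), k -> new TreeSet<>())
                .add(identifier);
        }
    }

    /**
     * Fragments that start with another, shorter fragment and are not known
     * words. Maps each suspect to its longest such prefix, e.g. abcbomb -> abc.
     */
    Map<String, String> suspects(Set<String> knownWords) {
        Map<String, String> result = new TreeMap<>();
        for (String longer : occurrences.keySet()) {
            if (knownWords.contains(longer)) {
                continue; // e.g. caterpillar is an ordinary word, so leave it alone
            }
            // Keys are sorted, so a longer prefix overwrites a shorter one.
            for (String shorter : occurrences.keySet()) {
                if (shorter.length() < longer.length() && longer.startsWith(shorter)) {
                    result.put(longer, shorter);
                }
            }
        }
        return result;
    }
}
```

Fed the five identifiers from the example and a known-word set containing caterpillar, snow and cat, `suspects` reports only abcbomb (with prefix abc), which is the case where abcBomb may have been meant.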
Hehe, well, perhaps ‘obviously’ was a bit presumptuous of me? But really it seemed like the only course of action. I’m not entirely sure how one would identify breakpoints. The analyzer is aimed specifically at novice users and works by providing suggestions (so by ‘enforce’ I actually meant ‘tries to enforce’ or something like that).
The only way I could think of finding the breakpoints was by using a dictionary. Conceptually it is the only solution my mind can perceive. I thought that the dictionary would be allowed to grow so that abbreviations would eventually be understood in future analyses. Of course that means explaining the concept of the CamelCase convention, which is also a part of my analyzer; that is, it’s a learning tool. Yeah, so all of this is part of my Honours project.
‘My best code is written with the delete key,’ I like that!
Interesting approach, Rashakil (how do you pronounce your name? Cool name though!). It was also suggested to me that I gather all the identifiers for comparison, because perhaps an identifier’s case was mistyped, so that finding both ‘variableOne’ and ‘variableone’ would allow me to suggest that ‘variableOne’ was meant. That is, compare identifiers regardless of case and then suggest the identifier that has a capital letter in it. I will also consider building up a dictionary as you suggest, but because this is a small part of what I’m trying to achieve and time is limited, I might not implement it. Also, because it is aimed at novice users, it is likely that they will tend not to use the CamelCase convention, and so my built-up dictionary would probably just consist of large compound words. But it really is an interesting take! Thank you!
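The case-insensitive comparison could look roughly like the Java sketch below. The class and method names (`CaseVariantFinder`, `suggestions`) are made up for illustration, and the tie-break is an assumption: when several spellings contain a capital letter, the first one in sorted order is preferred, which a real tool would probably want to handle more carefully.

```java
import java.util.Collection;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical helper: finds identifiers that differ only in letter case. */
class CaseVariantFinder {
    /** Maps each spelling to the capitalised spelling that was probably meant. */
    static Map<String, String> suggestions(Collection<String> identifiers) {
        // lower-cased form -> distinct spellings seen in the source
        Map<String, Set<String>> groups = new TreeMap<>();
        for (String id : identifiers) {
            groups.computeIfAbsent(id.toLowerCase(Locale.ROOT), k -> new TreeSet<>())
                  .add(id);
        }
        Map<String, String> result = new TreeMap<>();
        for (Set<String> spellings : groups.values()) {
            // Prefer a spelling that contains at least one capital letter.
            String preferred = null;
            for (String s : spellings) {
                if (!s.equals(s.toLowerCase(Locale.ROOT))) {
                    preferred = s;
                    break;
                }
            }
            if (preferred == null) {
                continue; // only all-lower-case spellings: nothing to suggest
            }
            for (String s : spellings) {
                if (!s.equals(preferred)) {
                    result.put(s, preferred);
                }
            }
        }
        return result;
    }
}
```

Given variableOne, variableone and count, this returns a single entry that maps variableone to variableOne. An identifier that only ever appears in lower case produces no suggestion, which is the gap the dictionary search would have to fill.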
I would still like to use my dictionary search so if anyone has a suggestion of a good alphabetic dictionary database kind of thingy, then please holla! Oh and easier methods would be welcomed too!
Thanks for responding!
Power to the people.






