Member Avatar for diafol

Hi all. Bit of a problem - hope somebody can help.

I am trying to return lists sorted alphabetically in my native language - Welsh.

The alphabet:

a,b,c,ch,d,dd,e,f,ff,g,ng,h,i,j,l,ll,m,n,o,p,ph,r,rh,s,t,th,u,w,y (29 letters)

Note that there are 'double characters' - these are considered as single discrete letters.

Roman letters (k,q,v,x,z) can be added to produce this 'composite alphabet':

a,b,c,ch,d,dd,e,f,ff,g,ng,h,i,j,k,l,ll,m,n,o,p,ph,q,r,rh,s,t,th,u,v,w,x,y,z (34 letters)

In addition, vowels (a,e,i,o,u,w,y) can be accented with grave, acute, umlaut and circumflex marks. These should be treated as equivalent to the unaccented letters, i.e. â, à etc. sort as 'a'.
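A minimal sketch of the two rules above (double characters count as one letter, accented vowels count as their base letter). The function names `foldAccents` and `splitLetters` are hypothetical, and the code assumes UTF-8 source text in precomposed form plus the mbstring extension. It also assumes every matching pair such as 'ng' is the single letter, which a greedy match cannot verify.

```php
<?php
// Hypothetical helper: lowercase a word and fold accented vowels to the base letter.
function foldAccents(string $word): string
{
    static $map = [
        'â' => 'a', 'à' => 'a', 'á' => 'a', 'ä' => 'a',
        'ê' => 'e', 'è' => 'e', 'é' => 'e', 'ë' => 'e',
        'î' => 'i', 'ì' => 'i', 'í' => 'i', 'ï' => 'i',
        'ô' => 'o', 'ò' => 'o', 'ó' => 'o', 'ö' => 'o',
        'û' => 'u', 'ù' => 'u', 'ú' => 'u', 'ü' => 'u',
        'ŵ' => 'w', 'ẁ' => 'w', 'ẃ' => 'w', 'ẅ' => 'w',
        'ŷ' => 'y', 'ỳ' => 'y', 'ý' => 'y', 'ÿ' => 'y',
    ];
    return strtr(mb_strtolower($word, 'UTF-8'), $map);
}

// Hypothetical helper: split a word into Welsh letters.
// The double characters come first in the pattern so they win over single letters.
function splitLetters(string $word): array
{
    preg_match_all('/ch|dd|ff|ng|ll|ph|rh|th|./u', foldAccents($word), $matches);
    return $matches[0];
}

$letters = splitLetters('chwarae'); // 'ch' stays together as one letter
$folded  = foldAccents('Tŷ');       // accent removed, lowercased
```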


This means that search results should be sorted as follows:
agor
angel
amaeth
anablu

Due to 'ng' coming after 'g' and before 'm'.

caru
clwtyn
curo
chwarae

Due to 'ch' coming after 'c'.
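One way the ordering above could be produced is to compare words letter by letter using each letter's position in the composite alphabet. This is only a sketch: `WELSH_ALPHABET`, `welshRanks` and `welshCompare` are made-up names, it needs PHP 7.4+ with mbstring, accent folding is left out for brevity, and sending unknown characters to the end is an arbitrary choice.

```php
<?php
// Composite alphabet from the post, in sort order.
const WELSH_ALPHABET = [
    'a', 'b', 'c', 'ch', 'd', 'dd', 'e', 'f', 'ff', 'g', 'ng', 'h', 'i', 'j', 'k', 'l', 'll',
    'm', 'n', 'o', 'p', 'ph', 'q', 'r', 'rh', 's', 't', 'th', 'u', 'v', 'w', 'x', 'y', 'z',
];

// Hypothetical helper: turn a word into a list of alphabet positions.
function welshRanks(string $word): array
{
    static $rank = null;
    $rank ??= array_flip(WELSH_ALPHABET); // 'a' => 0, 'b' => 1, 'c' => 2, 'ch' => 3, ...
    // Double characters are listed first so they are matched before single letters.
    preg_match_all('/ch|dd|ff|ng|ll|ph|rh|th|./u', mb_strtolower($word, 'UTF-8'), $m);
    // Anything outside the alphabet is ranked after 'z' (an assumption).
    return array_map(fn($letter) => $rank[$letter] ?? count($rank), $m[0]);
}

// Compare two words position by position; a word sorts before its own extensions.
function welshCompare(string $a, string $b): int
{
    $ra = welshRanks($a);
    $rb = welshRanks($b);
    $n  = min(count($ra), count($rb));
    for ($i = 0; $i < $n; $i++) {
        if ($ra[$i] !== $rb[$i]) {
            return $ra[$i] <=> $rb[$i];
        }
    }
    return count($ra) <=> count($rb);
}

$words = ['anablu', 'chwarae', 'amaeth', 'curo', 'angel', 'caru', 'agor', 'clwtyn'];
usort($words, 'welshCompare');
// $words is now: agor, angel, amaeth, anablu, caru, clwtyn, curo, chwarae
```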

This has given me a real headache. I've tried placing a 'symbol' in front of 'double characters', but ended up making a right pig's ear of things. I would be grateful for any pointers. Note - this is not for an assignment/commercial purposes - just trying to implement corrections to search results.

Member Avatar for diafol

Hmm - played with using a symbol in PHP. So far:

Here's the string substitution for double character letters:

$s = array("ž","Ž"); //substitute letters (lower and uppercase)
$gw = array("ch","dd","ff","ng","ll","ph","rh","th","Ch","Dd","Ff","Ng","Ll","Ph","Rh","Th","CH","DD","FF","NG","LL","PH","RH","TH");
$sw = array("c$s[0]","d$s[0]","f$s[0]","g$s[0]","l$s[0]","p$s[0]","r$s[0]","t$s[0]","C$s[0]","D$s[0]","F$s[0]","G$s[0]","L$s[0]","P$s[0]","R$s[0]","T$s[0]","C$s[1]","D$s[1]","F$s[1]","G$s[1]","L$s[1]","P$s[1]","R$s[1]","T$s[1]");

Here's the sorted word list producer:

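function doList($input){
	// Drop empty entries before sorting.
	foreach($input as $key => $value) { 
		if($value == "") { 
			unset($input[$key]); 
		}
	} 
	// Replace each double-character letter with its placeholder form.
	foreach($input as &$item){
		$item = stripslashes(swapChar($item));
	}
	unset($item); // break the reference left over from the loop
	// Sort on a lowercased copy; mb_strtolower also lowercases the multibyte Ž.
	$array_lowercase = array_map('mb_strtolower', $input);
	array_multisort($array_lowercase, SORT_ASC, SORT_STRING, $input);
	// Restore the original double-character letters.
	foreach($input as &$item){
		if($item != ""){
			$item = swapBack($item);
		}
	}
	unset($item);
	return $input;
}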

//$mylist = doList($some_random_words_array);
//print_r($mylist);

The swap character functions:

function swapChar($input){
	global $gw, $sw;
	return str_replace($gw,$sw,$input);
}

function swapBack($input){
	global $gw, $sw;
	return str_replace($sw,$gw,$input);
}

This uses the ž and Ž characters to force certain doubles to appear beyond the last possible entry for the previous letter. But it ain't pretty. As 'ž' doesn't appear in the Welsh language, it should be reasonably safe, but if a 'borrowed' Eastern European word appears in a list - oops!
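A small sketch of why the placeholder works and where it can bite, assuming the script is saved as UTF-8 and compared byte by byte. The sample word 'držati' is just an illustration of a string containing 'rž'; the arrays repeat only the lowercase pairs from the code above.

```php
<?php
// 'ž' is two bytes in UTF-8 (0xC5 0xBE), both larger than any ASCII letter,
// so 'c' followed by 'ž' lands after 'cz' but still before 'd'.
$afterCz = strcmp('cž', 'cz') > 0;
$beforeD = strcmp('cž', 'd') < 0;

// The weak spot: a genuine 'ž' that follows c, d, f, g, l, p, r or t
// is mistaken for a placeholder when swapping back.
$gw = ['ch', 'dd', 'ff', 'ng', 'll', 'ph', 'rh', 'th'];
$sw = ['cž', 'dž', 'fž', 'gž', 'lž', 'pž', 'rž', 'tž'];

$borrowed  = 'držati'; // illustrative word containing 'rž'
$roundTrip = str_replace($sw, $gw, str_replace($gw, $sw, $borrowed));
// $roundTrip no longer equals $borrowed: the 'rž' came back as 'rh'.
```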

I AM still looking for help w.r.t. MySQL - if the sorting could be done there so that lists do not have to be messed with in PHP - that'd be great.
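One possible direction, offered as an untested idea rather than a known MySQL feature: precompute a sort key in PHP when a word is saved, store it in an extra column (hypothetically named `sort_key`), and let the query use `ORDER BY sort_key`. The sketch below builds such a key from two-digit alphabet positions, so plain string comparison gives the Welsh order. The function name `welshSortKey` is made up, accent folding is omitted, and PHP 7.4+ with mbstring is assumed.

```php
<?php
// Hypothetical helper: build a key that sorts correctly as an ordinary string.
function welshSortKey(string $word): string
{
    static $rank = null;
    $rank ??= array_flip([
        'a', 'b', 'c', 'ch', 'd', 'dd', 'e', 'f', 'ff', 'g', 'ng', 'h', 'i', 'j', 'k', 'l', 'll',
        'm', 'n', 'o', 'p', 'ph', 'q', 'r', 'rh', 's', 't', 'th', 'u', 'v', 'w', 'x', 'y', 'z',
    ]);
    // Double characters are listed first so they are matched before single letters.
    preg_match_all('/ch|dd|ff|ng|ll|ph|rh|th|./u', mb_strtolower($word, 'UTF-8'), $m);

    $key = '';
    foreach ($m[0] as $letter) {
        // Fixed width of two digits keeps positions aligned; 99 marks unknown characters.
        $key .= sprintf('%02d', $rank[$letter] ?? 99);
    }
    return $key;
}

$key = welshSortKey('angel'); // a=00, ng=10, e=06, l=15
```

The key only has to be computed once per word, and the stored words themselves stay untouched.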
