I have a function that I intended to be able to remove duplicate rows in an excel worksheet. I pass it $objPHPExcel which has gone through the following function

    function checkExtension($inputFileName) {
            if (pathinfo($inputFileName,PATHINFO_EXTENSION) == "csv") {
                    $inputFileType = 'CSV';
                    $objReader = PHPExcel_IOFactory::createReader($inputFileType);
                    return $objPHPExcel = $objReader->load($inputFileName);
            } else {
                    return $objPHPExcel = PHPExcel_IOFactory::load($inputFileName);
            }
    }

The removeDuplicates function is

    function removeDuplicates($objPHPExcel) {
                $worksheet = $objPHPExcel->getActiveSheet();
                foreach ($worksheet->getRowIterator() as $row) {
                    $rowIndex = $row->getRowIndex(); 
                    foreach ($worksheet->getRowIterator($rowIndex + 1) as $testRow) {
                        if ($testRow == $row) {
                                $worksheet->removeRow($rowIndex);
                        }
                    }
                }

                $newFileName = 'imports/canx.csv';
                $objWriter = PHPExcel_IOFactory::createWriter($objPHPExcel, 'CSV');
                $objWriter->save($newFileName);

                return $objPHPExcel = checkExtension($newFileName);
    }

The problem is that canx.csv is the same as the orignal file. i.e no rows are getting removed (I've purposefully put in the duplicate rows into the original).

What am I doing wrong here?

Thanks for any assistance given.

Member Avatar

The problem is that canx.csv is the same as the orignal file. i.e no rows are getting removed (I've purposefully put in the duplicate rows into the original).

I don't know what you are trying to do? Did you post this question the in the PHPExcel forum?

Take a look at these examples:

http://phpexcel.codeplex.com/discussions/80845

http://phpexcel.codeplex.com/discussions/76379

You might need this (a preg_match function to match the duplicate and remove it):

$duplicate[] = NULL;
$cvs = 'csv';

if (preg_match($worksheet,$newFileName)){
$inputFileType=$cvs;
$valve = $sheet->getCellByColumnAndRow($z, $row)->getValue();
}

if ($z == $inputFileType && preg_match($worksheet,$valve)){
$invaild[]=$row;
}
elseif ($z == $inputFileType && preg_match($worksheet,$valve)){{
$duplicate[]=$valve;
$duplicate[]=$row;
}

It's not tested you might have to debug some errors to make it work.

Hopefully you will get an idea how it works.

Thanks guys. The links helped.

I've altered my approach.

        function removeDuplicates($inputFileName, $objPHPExcel) {
                $worksheet = $objPHPExcel->getActiveSheet();
                $urn = array();

                foreach ($worksheet->getRowIterator() as $row) {
                    $rowIndex = $row->getRowIndex();
                    $cellValue = $worksheet->getCell('A'.$rowIndex)->getValue();
                    array_push($urn, $cellValue);       
                }

                $numberOfURNs = count($urn);

                foreach ($worksheet->getRowIterator() as $row) {
                    $rowIndex = $row->getRowIndex();
                    $cellValue = $worksheet->getCell('A'.$rowIndex)->getValue();
                    for ($i = $rowIndex; $i != $numberOfURNs; $i++) {
                        if ($urn[$i] == $cellValue) {
                            $worksheet->removeRow($rowIndex);
                            $objWriter = PHPExcel_IOFactory::createWriter($objPHPExcel, 'CSV');
                            $objWriter->save($inputFileName);
                            global $row;
                            $row = $worksheet->getRowIterator($rowIndex);
                            break;  
                        }
                    }
                }

                return $objPHPExcel = checkExtension($inputFileName);
    }

It removes the rows fine but once I remove a row I assume it is shifting the other rows up meaning the rowIterator is then in the wrong place. How do I set the iterator right? I've tried

$row = $worksheet->getRowIterator($rowIndex);

but it isn't working.

I suggest getting the total number of rows (if possible), and then work your way backwards through the rows. Anything that will shift will not affect you.

Thanks Pritaeas. I'll have to invest in some of that common sense :)

    function removeDuplicates($inputFileName, $objPHPExcel) {
                $worksheet = $objPHPExcel->getActiveSheet();
                $urn = array();

                foreach ($worksheet->getRowIterator() as $row) {
                    $rowIndex = $row->getRowIndex();
                    $cellValue = $worksheet->getCell('A'.$rowIndex)->getValue();
                    array_push($urn, $cellValue);       
                }

                $numberOfURNs = count($urn);

                for ($rowIndex = $numberOfURNs; $rowIndex != 1; $rowIndex--) {
                    $cellValue = $worksheet->getCell('A'.$rowIndex)->getValue();
                    for ($i = $rowIndex - 2; $i != 0; $i--) {
                        if ($urn[$i] == $cellValue) {
                            $worksheet->removeRow($rowIndex);
                            $objWriter = PHPExcel_IOFactory::createWriter($objPHPExcel, 'CSV');
                            $objWriter->save($inputFileName);
                            break;  
                        }
                    }
                }

                return $objPHPExcel = checkExtension($inputFileName);
    }

works!