采用部分快速排序算法實現(xiàn)數(shù)組的部分排序

jijo 2009-08-28

展開全文

eaglet的公告
文章分類
.Net 技術
Hubble.net
Lucene
辦公軟件使用技巧
操作系統(tǒng)
數(shù)據(jù)庫
數(shù)據(jù)挖掘
搜索引擎
算法
網(wǎng)絡技術
.Net 技術
BLOG風格設置
C#中如何快速將SQL_SERVER數(shù)據(jù)庫中導出的數(shù)據(jù)導入到EXCEL里面
WebBrowser腳本錯誤的完美解決方案
Windows Workflow Beta2 HOL學習筆記（一）：創(chuàng)建一個簡單的WF項目
Windows Workflow Beta2 HOL學習筆記（三）：使用IfElse Activity，聲明條件和自定義活動
Windows Workflow Beta2 HOL學習筆記（二）：向工作流中傳遞參數(shù)
其他
存檔
2009年05月(14)
2009年03月(2)
2008年11月(3)
2008年10月(5)
2008年09月(1)
2008年08月(1)
2008年07月(3)
2008年06月(2)
2008年05月(9)
2008年04月(4)
2007年09月(1)
2007年08月(5)
2007年07月(6)
2007年06月(1)
2007年05月(2)
2007年04月(12)
2003年10月(1)
2003年05月(2)
采用部分快速排序算法實現(xiàn)數(shù)組的部分排序收藏
采用部分快速排序算法實現(xiàn)數(shù)組的部分排序

Author: Eaglet

快速排序算法，網(wǎng)上相關文章已經(jīng)介紹的很多了，數(shù)據(jù)結(jié)構(gòu)教材中也有很詳細的介紹。本文需要闡述的不是全排序快速排序算法，而是部分快速排序算法。所謂部分快速排序算法是指通過排序獲取一個數(shù)列中最大的若干條有序記錄。比如我們需要從一個有1百萬記錄的數(shù)組中獲取前100條有序記錄，并按從大到小順序顯示給用戶，這種應用在搜索引擎中經(jīng)常被使用，很少會有人有耐心將100萬條搜索出來的記錄都閱讀一遍，一般閱讀前幾百條紀錄就可以得到自己滿意的答案。其實這種算法很像SQLSERVER 中的 TOP n 的實現(xiàn)，不過數(shù)據(jù)庫是預先已經(jīng)將記錄通過B+樹索引的方式進行了組織，實現(xiàn)方法完全不同。本文需要闡述的是不通過數(shù)據(jù)庫，如何高效完成Top n 這種部分排序的功能。

在開始之前，我們先來分析一下2種其他的排序方法，如果進行部分排序。

1. 選擇排序

選擇排序方法實現(xiàn)部分排序非常簡單，如果你需要獲取長度為M的數(shù)組中前N條排序記錄，你只需要對M條記錄進行N次掃描，每次掃描都找出剩余序列中的最大（或最小值）就可以完成。當N 遠小于 M 時，選擇排序的時間復雜度大概為O(M*N)。

2. 堆排序

對排序可以說是實現(xiàn)部分排序時最常被用到的算法，包括著名的搜索組件Lucene在內(nèi)都采用了堆排序來實現(xiàn)部分排序。具體算法我在網(wǎng)上找了一個，請參見部分排序問題。根據(jù)這篇文章所述，堆排序?qū)崿F(xiàn)部分排序的實現(xiàn)復雜度是 O（MlogN)

現(xiàn)在讓我們開始部分快速排序算法。

快速排序的一趟排序，可以根據(jù)一個基準值將數(shù)組分割成兩個部分，基準前的所有數(shù)都比基準小，基準后的所有數(shù)都比基準大。如上圖所示，5 就是這個基準值。全排序的快速排序算法，在實現(xiàn)了一趟排序后，通過遞歸分別對基準前后的數(shù)列再進行相同的排序直到所有數(shù)據(jù)都有序。

我們可以巧妙的利用這種基準分割的方法來實現(xiàn)部分排序。如果要實現(xiàn)部分排序，我們不需要對所有的數(shù)據(jù)都進行排序，其實每次只需要對基準的一邊進行一趟排序就可以了，其原理很像二分查找。如上圖，如果我們希望得到最小的前2個排序數(shù)據(jù)，我們只需要在第二次排序時對5之前的數(shù)列重新做一次一趟排序，而不需要對 5之后的數(shù)據(jù)做，因為5在一趟排序后被排在了第6位，其之后的數(shù)據(jù)肯定不可能出現(xiàn)在前2位。假設第二次我們選取5之前的數(shù)列的中間位置的值2作為基準，那么第二次一趟排序的過程如下

3 4 2 1 5 5 9 8 7

_ 4 3 1 5 5 9 8 7 (2)

1 4 3 _ 5 5 9 8 7 (2)

1 _ 3 4 5 5 9 8 7 (2)

1 2 3 4 5 5 9 8 7

兩次一趟排序后，前2條排序記錄已經(jīng)算出，但從結(jié)果可以看出，這時整個數(shù)列并沒有被完全排序，因為我們不需要完整的排序數(shù)列。第二輪的演化過程請參考嚴蔚敏的數(shù)據(jù)結(jié)構(gòu)教材，采用的是相同的算法。

下面來分析一下部分快速排序的時間復雜度

理想情況

我們假設理想情況下，每次基準都選擇在數(shù)列的中間位置，那么其掃描的趟數(shù)是

1 + 1/2 + 1/4 + 1/8 ... 1/ K (K = log(M/N)) + NlogN

這個等比級數(shù)，如果K趨近為無窮的時候值為2，為什么是2，參考高等數(shù)學，這里不多闡述。

那么現(xiàn)在我們可以看出，在這種情況下時間復雜度 < O(2M + NlogN), 當N遠小于M時，時間復雜度近似為小于 O(2M)

這個時間復雜度在N>4時要比部分堆排序的 O（MlogN)要小，而且隨著N 的增大，部分快速排序的優(yōu)勢越來越明顯。

最佳情況

最佳情況并不是每次基準都出現(xiàn)的中間位置，而是第一趟排序選擇的基準正好位于第N個位置。這時時間復雜度為O(M)

最壞情況

每次基準出現(xiàn)在M-i 的位置，這時時間復雜度為 O (M*M)

下面是測試結(jié)果：

測試1 數(shù)組中為隨機數(shù)據(jù)，數(shù)組長度M從2的16次方到2的22次方，N=100，每組長度下測試100次計算平均用時

平均用時(單位毫秒)/
數(shù)組長度 65536 131072 262144 524288 1024857 2097152 41944304
部分快速排序 0.55 1.61 3.13 7.46 13.04 30.1 62.96
完全快速排序 10.45 21.85 45.55 93.97 197.3 405.54 841.75

測試2 數(shù)組中為以排序好的數(shù)據(jù)，數(shù)組長度M從2的16次方到2的22次方，N=100，每組長度下測試100次計算平均用時

平均用時(單位毫秒)/
數(shù)組長度 65536 131072 262144 524288 1024857 2097152 41944304
部分快速排序 0.03 0 0.11 1.12 3.04 6.66 12.87
完全快速排序 2.12 4.9 10.49 21.6 44.29 91 185.77

從測試結(jié)果看采用部分快速排序獲取前100條有序記錄要比采用完全快速排序要快10到100倍。

下面給出代碼，由于.Net的Array.Sort就是完全快速排序，所以直接使用，沒有自己實現(xiàn)完全快速排序。

QuickSort 類

using System;
using System.Collections.Generic;
using System.Text;

namespace Sort
   {
       /**/ ///   <summary>
      /// Quick Sort
      ///   </summary>
      ///   <typeparam name="T"></typeparam>
      public   class QuickSort < T >
        {
          // Partition for int
          private   static   int PartitionInt( int [] array, int low, int high, int pivotIndex)
            {
              int pivotValue = array[pivotIndex];
             array[pivotIndex] = array[low];
             array[low] = pivotValue;

              while (low < high)
                {
                  while (array[high] >= pivotValue && high > low)
                    {
                      -- high;
                 }

                  if (high > low)
                    {
                     array[low] = array[high];
                 }

                  while (array[low] <= pivotValue && high > low)
                    {
                      ++ low;
                 }

                  if (high > low)
                    {
                     array[high] = array[low];
                 }

             }

             array[low] = pivotValue;

              return low;
         }

          // Partition for long
          private   static   int PartitionLong( long [] array, int low, int high, int pivotIndex)
            {
              long pivotValue = array[pivotIndex];
             array[pivotIndex] = array[low];
             array[low] = pivotValue;

              while (low < high)
                {
                  while (array[high] >= pivotValue && high > low)
                    {
                      -- high;
                 }

                  if (high > low)
                    {
                     array[low] = array[high];
                 }

                  while (array[low] <= pivotValue && high > low)
                    {
                      ++ low;
                 }

                  if (high > low)
                    {
                     array[high] = array[low];
                 }

             }

             array[low] = pivotValue;

              return low;
         }

          // Normal Partition
          private   static   int Partition(T[] array, int low, int high, int pivotIndex, IComparer < T > comparer)
            {
              if (comparer ==   null )
                {
                 Array arr = array;

                  if ( typeof (T) ==   typeof ( int ))
                    {
                      return PartitionInt(( int [])arr, low, high, pivotIndex);
                 }
                  else   if ( typeof (T) ==   typeof ( long ))
                    {
                      return PartitionLong(( long [])arr, low, high, pivotIndex);
                 }
             }

             T pivotValue = array[pivotIndex];
             T pLow = array[low];

              while (low < high)
                {
                  while (comparer.Compare(array[high], pivotValue) >=   0   && high > low)
                    {
                      -- high;
                 }

                  if (high > low)
                    {
                     array[low] = array[high];
                 }

                  while (comparer.Compare(array[low], pivotValue) <=   0   && high > low)
                    {
                      ++ low;
                 }

                  if (high > low)
                    {
                     array[high] = array[low];
                 }

             }

             array[low] = pLow;

              return low;
         }

          public   static   void TopSort(T[] array, int top)
            {
             TopSort(array, top, null );
         }

          public   static   void TopSort(T[] array, int top, IComparer < T > comparer)
            {
              // If comparer is null
              if (comparer ==   null )
                {
                 Array arr = array;

                  if ( typeof (T) !=   typeof ( int ) &&
                      typeof (T) !=   typeof ( long ))
                    {
                     Array.Sort(array);
                      return ;
                 }
             }

              // Judge input
              if (array.Length <=   2   || top >= array.Length /   2 )
                {
                 Array.Sort(array, comparer);
                  return ;
             }

              // One time partition
              int pivot = Partition(array, 0 , array.Length -   1 , array.Length /   2 , comparer);
              int lastPivot = pivot;

              // Run until pivot near the top
              while (( ! (lastPivot >= top && pivot <= top)))
                {
                 lastPivot = pivot;

                  if (pivot > top)
                    {
                     pivot = Partition(array, 0 , pivot, pivot / 2 , comparer);

                      if (pivot == lastPivot)
                        {
                         pivot -- ;
                     }
                 }
                  else
                    {
                      if (pivot >= array.Length -   1 )
                        {
                         lastPivot = array.Length -   1 ;
                          break ;
                     }

                     pivot = Partition(array, pivot +   1 , array.Length - 1 , (array.Length - pivot) /   2 , comparer);
                 }
             }

              // Finally sort
              if (lastPivot < array.Length)
                {
                 Array.Sort(array, 0 , lastPivot +   1 , comparer);
             }
              else
                {
                 Array.Sort(array, 0 , lastPivot, comparer);
             }
         }
     }
}

測試代碼

          static   void Main( string [] args)
            {
              int [] testArr =   null ;
              int [] testArr1 =   null ;
             Stopwatch sw =   new Stopwatch();
             List < int > testValues =   new List < int > ();

             Random rand =   new Random();

              int pow =   16 ;

              int count = ( int )Math.Pow( 2 , pow);
              int top =   100 ;

              while (pow <   23 )
                {
                 Console.WriteLine( string .Format( " Test count={0} " , count));

                  double topSortElapsedMilliseconds =   0 ;
                  double fullSortElapsedMilliseconds =   0 ;

                  for ( int j =   0 ; j <   100 ; j ++ )
                    {

                     testValues.Clear();

                      for ( int i =   0 ; i < count; i ++ )
                        {
                         testValues.Add(rand.Next());
                          // testValues.Add(i);
                     }

                     testArr =   new   int [testValues.Count];
                     testArr1 =   new   int [testValues.Count];

                     testValues.CopyTo(testArr);
                     testValues.CopyTo(testArr1);

                     sw.Reset();
                     sw.Start();
                     Sort.QuickSort < int > .TopSort(testArr, top);

                     sw.Stop();

                     topSortElapsedMilliseconds += sw.ElapsedMilliseconds;

                     sw.Reset();
                     sw.Start();
                     Array.Sort(testArr1);

                     sw.Stop();

                     fullSortElapsedMilliseconds += sw.ElapsedMilliseconds;

                      // Compare result
                      for ( int i =   0 ; i < top; i ++ )
                        {
                          if (testArr[i] != testArr1[i])
                            {
                             Console.WriteLine( string .Format( " Wrong at {0}! {1} {2} " ,
                                 i, testArr[i], testArr1[i]));

                             Console.ReadKey();

                              // For test
                              while ( true )
                                {
                                 testValues.CopyTo(testArr);
                                 Sort.QuickSort < int > .TopSort(testArr, top);
                             }
                         }
                     }

                 }

                 Console.WriteLine( string .Format( " Top sort elapses {0} ms " ,
                     topSortElapsedMilliseconds / 100 ));
                 Console.WriteLine( string .Format( " Full sort elapses {0} ms " ,
                     fullSortElapsedMilliseconds / 100 ));

                 count *=   2 ;
                 pow ++ ;
             }

             Console.ReadKey();

         }
發(fā)表于 @ 2009年05月25日　09:34:00

本站是提供個人知識管理的網(wǎng)絡存儲空間，所有內(nèi)容均由用戶發(fā)布，不代表本站觀點。請注意甄別內(nèi)容中的聯(lián)系方式、誘導購買等信息，謹防詐騙。如發(fā)現(xiàn)有害或侵權(quán)內(nèi)容，請點擊一鍵舉報。

小男孩‘自慰网亚洲一区二区,亚洲一级在线播放毛片,亚洲中文字幕av每天更新,黄aⅴ永久免费无码,91成人午夜在线精品,色网站免费在线观看,亚洲欧洲wwwww在线观看

采用部分快速排序算法實現(xiàn)數(shù)組的部分排序