why-is-it-faster-to-process-a-sorted-array-than-an-unsorted-array


SUBMITTED BY: menamagice

DATE: Aug. 9, 2017, 2:11 a.m.

FORMAT: Text only

SIZE: 2.3 kB

HITS: 279

  1. Here is a piece of C++ code that seems very peculiar. For some strange reason, sorting the data miraculously makes the code almost six times faster.
  2. #include <algorithm>
  3. #include <ctime>
  4. #include <iostream>
  5. int main()
  6. {
  7. // Generate data
  8. const unsigned arraySize = 32768;
  9. int data[arraySize];
  10. for (unsigned c = 0; c < arraySize; ++c)
  11. data[c] = std::rand() % 256;
  12. // !!! With this, the next loop runs faster
  13. std::sort(data, data + arraySize);
  14. // Test
  15. clock_t start = clock();
  16. long long sum = 0;
  17. for (unsigned i = 0; i < 100000; ++i)
  18. {
  19. // Primary loop
  20. for (unsigned c = 0; c < arraySize; ++c)
  21. {
  22. if (data[c] >= 128)
  23. sum += data[c];
  24. }
  25. }
  26. double elapsedTime = static_cast<double>(clock() - start) / CLOCKS_PER_SEC;
  27. std::cout << elapsedTime << std::endl;
  28. std::cout << "sum = " << sum << std::endl;
  29. }
  30. Without std::sort(data, data + arraySize);, the code runs in 11.54 seconds.
  31. With the sorted data, the code runs in 1.93 seconds.
  32. Initially, I thought this might be just a language or compiler anomaly. So I tried it in Java.
  33. import java.util.Arrays;
  34. import java.util.Random;
  35. public class Main
  36. {
  37. public static void main(String[] args)
  38. {
  39. // Generate data
  40. int arraySize = 32768;
  41. int data[] = new int[arraySize];
  42. Random rnd = new Random(0);
  43. for (int c = 0; c < arraySize; ++c)
  44. data[c] = rnd.nextInt() % 256;
  45. // !!! With this, the next loop runs faster
  46. Arrays.sort(data);
  47. // Test
  48. long start = System.nanoTime();
  49. long sum = 0;
  50. for (int i = 0; i < 100000; ++i)
  51. {
  52. // Primary loop
  53. for (int c = 0; c < arraySize; ++c)
  54. {
  55. if (data[c] >= 128)
  56. sum += data[c];
  57. }
  58. }
  59. System.out.println((System.nanoTime() - start) / 1000000000.0);
  60. System.out.println("sum = " + sum);
  61. }
  62. }
  63. With a somewhat similar but less extreme result.
  64. My first thought was that sorting brings the data into the cache, but then I thought how silly that is because the array was just generated.

comments powered by Disqus