Parallel Stream In Java is a powerful feature introduced in Java 8 as part of the Stream API. It is used to process large sets or collections of data concurrently. It splits the work across multiple threads and executes tasks in parallel, which can improve performance when processing data.
Sequential processing runs on a single thread, so it can be time-consuming and poorly suited to large data sets. In Java, we can switch from sequential to parallel processing by changing a small piece of code.
However, improper use of parallel streams can sometimes cause inefficiencies or even incorrect results.
When we call the parallelStream() method on a collection, the stream divides the work into smaller chunks and processes them concurrently on multiple threads, which by default come from the common ForkJoinPool.
Key Features Of Parallel Stream In Java:
- Concurrency: Multiple tasks are processed at the same time.
- Performance boost: Can speed up CPU-intensive operations on large data sets.
- Ease of use: The syntax remains the same as a regular stream.
Best Practices To Create a Parallel Stream in Java
Below are a few best practices for creating and using parallel streams in Java. Following them helps us get the best performance and avoid inefficiencies.
How To Create a Parallel Stream in Java
In Java, we can create a parallel stream with the parallelStream() or parallel() methods. An example of the parallelStream() method:
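```java
import java.util.Arrays;
import java.util.List;

List<String> watchesList = Arrays.asList("Patek Philippe", "Tag Heuer", "Casio", "Rolex");
// Elements are printed in no guaranteed order
watchesList.parallelStream().forEach(System.out::println);
```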
An example of the parallel() method
```java
watchesList.stream()      // Creating a regular stream
        .parallel()       // Converting it to a parallel stream
        .forEach(System.out::println);
```
Here is another example of a parallel stream for large calculations.
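```java
import java.util.List;
import java.util.stream.IntStream;

// Stream.toList() requires Java 16 or later
List<Integer> valueRange = IntStream.range(1, 10000).boxed().toList();
// Using a parallel stream to calculate the sum (1 + 2 + ... + 9999) as total
int total = valueRange.parallelStream().mapToInt(Integer::intValue).sum();
System.out.println("Sum: " + total);
```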
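Both methods only switch a flag on the pipeline: the whole pipeline runs either sequentially or in parallel, and the last call to parallel() or sequential() before the terminal operation decides which. The short sketch below checks this with isParallel(); StreamModeDemo and runsInParallel are illustrative names, not part of any library.

```java
import java.util.List;
import java.util.stream.Stream;

public class StreamModeDemo {

    // Reports whether the given stream pipeline is set to run in parallel.
    static boolean runsInParallel(Stream<?> stream) {
        return stream.isParallel();
    }

    public static void main(String[] args) {
        List<String> watches = List.of("Patek Philippe", "Tag Heuer", "Casio", "Rolex");

        System.out.println(runsInParallel(watches.stream()));                         // false
        System.out.println(runsInParallel(watches.parallelStream()));                 // true
        // The last call wins: this whole pipeline runs sequentially
        System.out.println(runsInParallel(watches.stream().parallel().sequential())); // false
        // Intermediate operations keep the mode of the stream they are called on
        System.out.println(runsInParallel(watches.stream().parallel().map(String::length))); // true
    }
}
```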
Best Practices
Choose the Right Collection
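Parallel streams are ideal for large data sets. For small collections, a normal (sequential) stream is usually better, because the cost of splitting the work and coordinating multiple threads can outweigh the gains. Therefore, it's best to use parallel streams on collections with a large number of elements. The type of source matters too: ArrayList, arrays and IntStream.range() split evenly and cheaply, while LinkedList and Stream.iterate() split poorly and benefit much less from parallelism.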
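To see why the collection type matters, we can ask each list's Spliterator (the component parallel streams use to divide the data) to split once. In current OpenJDK versions an ArrayList splits into two equal halves, while a LinkedList only peels off a small batch from the front, because it has to walk its nodes to split. This is an implementation detail, so treat the exact sizes as illustrative; SplitDemo and splitOnce are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Spliterator;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class SplitDemo {

    // Splits the list's spliterator once and returns {sizeOfSplitOffPart, sizeOfRemainder}.
    // Returns {0, size} if the spliterator refuses to split.
    static long[] splitOnce(List<Integer> list) {
        Spliterator<Integer> rest = list.spliterator();
        Spliterator<Integer> firstPart = rest.trySplit();
        long first = (firstPart == null) ? 0 : firstPart.estimateSize();
        return new long[] {first, rest.estimateSize()};
    }

    public static void main(String[] args) {
        List<Integer> arrayList = IntStream.range(0, 10_000).boxed()
                .collect(Collectors.toCollection(ArrayList::new));
        List<Integer> linkedList = new LinkedList<>(arrayList);

        long[] a = splitOnce(arrayList);
        long[] l = splitOnce(linkedList);
        System.out.println("ArrayList split:  " + a[0] + " / " + a[1]);
        System.out.println("LinkedList split: " + l[0] + " / " + l[1]);
    }
}
```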
Avoid Using Parallel Streams on Small Collections
For small collections, using parallelStream() can result in performance overhead due to the cost of managing multiple threads. Stick to sequential streams for small datasets to avoid unnecessary complexity.
```java
import java.util.Arrays;
import java.util.List;

List<String> watchesList = Arrays.asList("Patek Philippe", "Tag Heuer", "Casio", "Rolex");
watchesList.forEach(watch -> System.out.println("watch name : " + watch));
// OR
watchesList.stream().forEach(System.out::println);
```
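One way to apply this rule is to decide at runtime based on the collection size. The helper below is a hypothetical sketch: streamFor and PARALLEL_THRESHOLD are illustrative names, and the threshold of 10,000 elements is an assumption, not a recommended value. The right cut-off depends on the per-element work and the hardware, so it should be measured.

```java
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class StreamChooser {

    // Hypothetical cut-off; measure with your own data before relying on it.
    static final int PARALLEL_THRESHOLD = 10_000;

    // Returns a sequential stream for small collections and a parallel one for large ones.
    static <T> Stream<T> streamFor(Collection<T> items) {
        return items.size() >= PARALLEL_THRESHOLD ? items.parallelStream() : items.stream();
    }

    public static void main(String[] args) {
        List<String> watches = List.of("Patek Philippe", "Tag Heuer", "Casio", "Rolex");
        List<Integer> numbers = IntStream.range(0, 100_000).boxed().collect(Collectors.toList());

        System.out.println("Small list parallel? " + streamFor(watches).isParallel()); // false
        System.out.println("Large list parallel? " + streamFor(numbers).isParallel()); // true
    }
}
```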
Use Parallel Streams for Independent Operations
Parallel streams work best when each operation is independent. If your operations depend on the previous result, using a parallel stream can lead to unexpected behavior, such as race conditions.
```java
import java.util.Arrays;
import java.util.List;

List<String> productList = Arrays.asList("Patek Philippe", "Rolex", "Casio", "Tag Heuer");
// Using a parallel stream for independent operations
productList.parallelStream()
        .forEach(product -> {
            // Independent operation 1: convert the product name to uppercase
            System.out.println("Uppercase: " + product.toUpperCase());
            // Independent operation 2: print the length of the product name
            System.out.println("Length: " + product.length());
        });
```
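A running total is a typical example of a dependent operation: each value needs the one before it, so updating a shared accumulator from parallelStream().forEach() would give wrong results. For this specific case the JDK offers Arrays.parallelPrefix, which computes the prefix in parallel as long as the operator is associative. A minimal sketch (RunningTotalDemo and runningTotals are illustrative names):

```java
import java.util.Arrays;

public class RunningTotalDemo {

    // Returns the running totals of values, e.g. {1, 2, 3, 4} -> {1, 3, 6, 10}.
    static int[] runningTotals(int[] values) {
        int[] totals = Arrays.copyOf(values, values.length); // leave the input untouched
        Arrays.parallelPrefix(totals, Integer::sum);         // addition is associative, so this is safe in parallel
        return totals;
    }

    public static void main(String[] args) {
        int[] dailySales = {1, 2, 3, 4};
        System.out.println(Arrays.toString(runningTotals(dailySales))); // [1, 3, 6, 10]
    }
}
```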
Handling Large Datasets with Order Sensitivity
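For larger datasets, if the order of processing is critical, use forEachOrdered() instead of forEach(). forEachOrdered() processes elements in the stream's encounter order (for a List, the list order) even in a parallel stream. Sorting the data beforehand is not enough on its own, because forEach() on a parallel stream may still handle the sorted elements in any order.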
```java
import java.util.Arrays;
import java.util.List;

List<String> productList = Arrays.asList("Patek Philippe", "Rolex", "Casio", "Tag Heuer",
        "Apple", "Samsung", "Nikon");
// sorted() + forEachOrdered() prints the items in alphabetical order
productList.parallelStream()
        .sorted()
        .forEachOrdered(System.out::println);
// Without sorted(), forEachOrdered() keeps the original list order
productList.parallelStream()
        .forEachOrdered(product -> System.out.println(product));
```
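The difference between forEach() and forEachOrdered() is easy to check by recording the order in which elements are visited. In this sketch (OrderDemo, visitOrdered and visitUnordered are illustrative names), forEachOrdered() always reproduces the list order, forEach() may not, and collect() also preserves the encounter order of a List even when the stream is parallel. A synchronized list is used because forEach() may call add() from several threads at once.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

public class OrderDemo {

    // Consumes a parallel stream with forEachOrdered and records the order seen.
    static List<String> visitOrdered(List<String> items) {
        List<String> seen = Collections.synchronizedList(new ArrayList<>());
        items.parallelStream().forEachOrdered(seen::add);
        return seen;
    }

    // Same with forEach: the recorded order may differ from run to run.
    static List<String> visitUnordered(List<String> items) {
        List<String> seen = Collections.synchronizedList(new ArrayList<>());
        items.parallelStream().forEach(seen::add);
        return seen;
    }

    public static void main(String[] args) {
        List<String> products = List.of("Patek Philippe", "Rolex", "Casio", "Tag Heuer",
                "Apple", "Samsung", "Nikon");

        System.out.println("forEachOrdered: " + visitOrdered(products));   // always the list order
        System.out.println("forEach:        " + visitUnordered(products)); // order not guaranteed
        // collect() keeps the encounter order of an ordered source such as a List
        System.out.println("collect:        " + products.parallelStream()
                .map(String::toUpperCase)
                .collect(Collectors.toList()));
    }
}
```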
Use Parallel Streams with Stream Pipelines
You can combine parallel streams with other stream operations, such as filter(), map(), reduce(), and more. These operations work efficiently when the tasks are independent and can be divided among multiple threads.
```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

List<String> productList = Arrays.asList("Patek Philippe", "Rolex", "Casio", "Tag Heuer", "Citizen");

// Using a parallel stream with a stream pipeline
List<String> filteredProducts = productList.parallelStream()
        .filter(product -> product.startsWith("C")) // Keep products starting with 'C'
        .map(String::toUpperCase)                   // Convert product names to uppercase
        .collect(Collectors.toList());              // Collect the results into a list
// Print the filtered and processed list
filteredProducts.forEach(System.out::println);

// Filter, sort and count
long count = productList.parallelStream()
        .filter(product -> product.length() > 5) // Keep names longer than 5 characters
        .sorted()                                // Sort the filtered products alphabetically
        .peek(System.out::println)               // Print each product (order not guaranteed in parallel)
        .count();                                // Count the filtered products
System.out.println("Total count: " + count);
```
Explanation:
- filter(): Selects only products with names longer than 5 characters.
- sorted(): Sorts the filtered products alphabetically.
- peek(): Lets you print each element as it passes through the pipeline, for debugging or logging. In a parallel stream, the printed order is not guaranteed.
- count(): Counts the number of elements that pass through the pipeline.
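reduce() deserves special care in parallel, because each thread reduces its own chunk and the partial results are then combined. That only gives the right answer when the identity value is a true identity (for example 0 for addition) and the operator is associative. A small sketch contrasting correct and incorrect use (ReduceDemo and sumWithReduce are illustrative names):

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ReduceDemo {

    // Correct: 0 is the identity for addition, and addition is associative.
    static int sumWithReduce(List<Integer> numbers) {
        return numbers.parallelStream().reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        List<Integer> numbers = IntStream.rangeClosed(1, 100).boxed().collect(Collectors.toList());

        System.out.println("Sum: " + sumWithReduce(numbers)); // 5050

        // Wrong identity: 10 can be added once per chunk, so the parallel
        // result can be larger than the sequential result of 5060.
        int badIdentity = numbers.parallelStream().reduce(10, Integer::sum);
        // Non-associative operator: subtraction gives split-dependent results.
        int badOperator = numbers.parallelStream().reduce(0, (a, b) -> a - b);
        System.out.println("Wrong identity: " + badIdentity + ", subtraction: " + badOperator);
    }
}
```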
Monitor the Performance
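The performance of parallel streams can vary depending on factors like dataset size, hardware, and the type of operations performed. It's crucial to monitor and measure their performance to ensure they meet your application's requirements.
Here is a simple way to get a rough measurement of processing time. A single run like this is affected by JIT warm-up and garbage collection, so treat the numbers as indicative; a benchmarking harness such as JMH gives more reliable results.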
```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

List<Integer> numbers = IntStream.rangeClosed(1, 10_000_000).boxed().collect(Collectors.toList());

// Measure time for the sequential stream (nanoTime() is meant for elapsed time)
long startSequential = System.nanoTime();
long sumSequential = numbers.stream().mapToLong(Integer::longValue).sum();
long sequentialMs = (System.nanoTime() - startSequential) / 1_000_000;
System.out.println("Sequential Stream Time: " + sequentialMs + " ms");

// Measure time for the parallel stream
long startParallel = System.nanoTime();
long sumParallel = numbers.parallelStream().mapToLong(Integer::longValue).sum();
long parallelMs = (System.nanoTime() - startParallel) / 1_000_000;
System.out.println("Parallel Stream Time: " + parallelMs + " ms");

System.out.println("Sum Sequential: " + sumSequential + ", Sum Parallel: " + sumParallel);
```
You can also monitor the performance of a parallel stream in Java with built-in tools like Java Management Extensions (JMX), JVisualVM and Java Flight Recorder (JFR).
Be Careful with Side Effects
Side effects occur when:
- An operation alters the state of a shared variable.
- A function used in the stream is not stateless or thread-safe.
- Operations depend on an external mutable state, like a collection or a variable outside the stream.
Common Issues Caused by Side Effects
- Race Conditions: Multiple threads compete to modify the same resource, leading to unpredictable behavior.
- Data Corruption: Inconsistent or incomplete modifications to shared data.
- Performance Degradation: Synchronization overhead can negate the benefits of parallelism.
- Incorrect Results: The final output may not match the expected result.
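The sketch below shows a race condition in practice: incrementing a shared counter from a parallel forEach() can lose updates, because ++ is a read-modify-write that is not atomic. LongAdder (from java.util.concurrent.atomic) fixes the correctness problem, but the cleanest fix is to avoid the shared variable and let the stream count. SharedCounterDemo and its methods are illustrative names.

```java
import java.util.concurrent.atomic.LongAdder;
import java.util.stream.IntStream;

public class SharedCounterDemo {

    static final int N = 100_000;

    // Unsafe: ++ on a shared array slot is not atomic, so concurrent updates can be lost.
    static int unsafeCount() {
        int[] counter = {0};
        IntStream.range(0, N).parallel().forEach(i -> counter[0]++);
        return counter[0];
    }

    // Thread-safe, but every element still updates shared state.
    static long adderCount() {
        LongAdder counter = new LongAdder();
        IntStream.range(0, N).parallel().forEach(i -> counter.increment());
        return counter.sum();
    }

    // Best: no shared state at all; the stream does the counting.
    static long streamCount() {
        return IntStream.range(0, N).parallel().count();
    }

    public static void main(String[] args) {
        System.out.println("Unsafe counter: " + unsafeCount() + " (may be less than " + N + ")");
        System.out.println("LongAdder:      " + adderCount());
        System.out.println("count():        " + streamCount());
    }
}
```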
Avoid Shared Mutable State:
- Use thread-safe data structures like ConcurrentHashMap, or immutable collections.
- Avoid external variables in lambda expressions.
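When several threads really must update one map, ConcurrentHashMap.merge() is a safe choice because each call is performed atomically. A minimal sketch counting orders per brand (BrandCountDemo and countBrands are illustrative names):

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class BrandCountDemo {

    // Counts how often each brand appears, updating one map from multiple threads.
    static Map<String, Integer> countBrands(List<String> orders) {
        Map<String, Integer> counts = new ConcurrentHashMap<>();
        // merge() is atomic on ConcurrentHashMap, so concurrent updates are not lost
        orders.parallelStream().forEach(brand -> counts.merge(brand, 1, Integer::sum));
        return counts;
    }

    public static void main(String[] args) {
        List<String> orders = List.of("Rolex", "Casio", "Rolex", "Tag Heuer", "Casio", "Rolex");
        // Rolex=3, Casio=2, Tag Heuer=1, in no particular order
        System.out.println(countBrands(orders));
    }
}
```

A side-effect-free alternative is to let a collector build the map, for example orders.parallelStream().collect(Collectors.groupingByConcurrent(b -> b, Collectors.counting())), which returns the counts as Long values.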
Use Stateless Operations:
- Ensure functions used in map(), filter(), or reduce() are stateless and side-effect-free.
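Here is a small contrast between a stateful and a stateless way to remove duplicates. The stateful filter remembers what it has seen in a shared set, so which occurrence survives (and therefore the result order) depends on thread timing. distinct() is handled by the stream framework and, for an ordered source, keeps the first occurrence. StatelessDemo and its methods are illustrative names.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class StatelessDemo {

    // Stateful (avoid): the predicate depends on a shared set that changes as the stream runs.
    static List<String> dedupeStateful(List<String> items) {
        Set<String> seen = Collections.synchronizedSet(new HashSet<>());
        return items.parallelStream()
                .filter(seen::add) // add() returns false for an element already present
                .collect(Collectors.toList());
    }

    // Stateless (prefer): distinct() keeps the first occurrence for an ordered source.
    static List<String> dedupeStateless(List<String> items) {
        return items.parallelStream()
                .distinct()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> brands = List.of("Rolex", "Casio", "Rolex", "Citizen", "Casio");
        System.out.println("Stateful:  " + dedupeStateful(brands));  // order may vary between runs
        System.out.println("Stateless: " + dedupeStateless(brands)); // [Rolex, Casio, Citizen]
    }
}
```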
Collect Results Properly:
- Use built-in collectors like Collectors.toList() or Collectors.toMap(), which are designed to aggregate results safely in parallel streams.
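A few built-in collectors used with a parallel stream, as a sketch (CollectorsDemo and its methods are illustrative names). Each thread collects into its own container and the framework merges them, so no shared mutable state is needed. Note that toMap() throws an IllegalStateException on duplicate keys unless a merge function is supplied.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class CollectorsDemo {

    static final List<String> PRODUCTS =
            List.of("Patek Philippe", "Rolex", "Casio", "Tag Heuer", "Citizen");

    // Maps each product to the length of its name.
    static Map<String, Integer> nameLengths(List<String> products) {
        return products.parallelStream()
                .collect(Collectors.toMap(p -> p, String::length));
    }

    // Groups products by their first letter; each list keeps the source order.
    static Map<Character, List<String>> byFirstLetter(List<String> products) {
        return products.parallelStream()
                .collect(Collectors.groupingBy(p -> p.charAt(0)));
    }

    // Joins the names into one string, in the list's original order.
    static String joined(List<String> products) {
        return products.parallelStream().collect(Collectors.joining(", "));
    }

    public static void main(String[] args) {
        System.out.println(nameLengths(PRODUCTS));   // map order not guaranteed
        System.out.println(byFirstLetter(PRODUCTS)); // includes C=[Casio, Citizen]
        System.out.println(joined(PRODUCTS));        // Patek Philippe, Rolex, Casio, Tag Heuer, Citizen
    }
}
```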
Debug Carefully:
- Log the current thread name (Thread.currentThread().getName()) to verify how elements are distributed across threads during parallel execution.
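A minimal logging sketch: each element is printed together with the name of the thread that handled it. With the default setup the names are typically "main" (the calling thread also does work) and "ForkJoinPool.commonPool-worker-N". ThreadLoggingDemo and logThreads are illustrative names, and writing to a shared concurrent set is acceptable here only because it is debug output.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class ThreadLoggingDemo {

    // Logs which thread handles each element and returns the set of thread names seen.
    static Set<String> logThreads(List<String> items) {
        Set<String> threadNames = ConcurrentHashMap.newKeySet();
        items.parallelStream().forEach(item -> {
            String name = Thread.currentThread().getName();
            threadNames.add(name);
            System.out.println(name + " processed " + item);
        });
        return threadNames;
    }

    public static void main(String[] args) {
        List<String> products = List.of("Patek Philippe", "Rolex", "Casio", "Tag Heuer",
                "Apple", "Samsung", "Nikon", "Citizen");
        System.out.println("Threads used: " + logThreads(products));
    }
}
```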
Consider Sequential Streams:
- Use sequential streams if side effects are unavoidable or parallelism adds complexity without significant performance gains.
How Does Parallel Stream in Java Work Internally?
- Source Splitting: The source data (e.g., List, Set, Array) is split into smaller chunks using the Spliterator interface, recursively dividing until chunks are small enough for processing.
- ForkJoinPool Allocation: Parallel streams use ForkJoinPool.commonPool() by default. Its parallelism defaults to one less than the number of available CPU cores, and the thread that starts the stream also takes part in the work.
- Task Submission: Each chunk of data is submitted as a separate task to the ForkJoinPool, and tasks are distributed across worker threads.
- Task Execution in Parallel: Tasks are executed concurrently by multiple threads, ensuring efficient use of available CPU cores.
- Combining Partial Results: Results from individual threads are merged back together, typically using reduce or collect operations.
- Result Aggregation: The merged results are aggregated into a final form. This is correct as long as the operations are stateless and the combining functions are associative.
- Final Output: The aggregated result is returned as the output of the parallel stream operation.
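The steps above can be approximated with the Fork/Join framework directly. The sketch below is a simplified analogue, not the actual stream implementation: SumTask, its THRESHOLD of 1,000 elements and parallelSum are hypothetical names and values. It splits an array in half recursively, forks one half to the common pool, computes the other half on the current thread, and combines the partial sums.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.stream.LongStream;

public class ForkJoinSumDemo {

    // Simplified analogue of a parallel stream: split, process in parallel, combine.
    static class SumTask extends RecursiveTask<Long> {
        private static final int THRESHOLD = 1_000; // hypothetical leaf size
        private final long[] values;
        private final int from;
        private final int to; // exclusive

        SumTask(long[] values, int from, int to) {
            this.values = values;
            this.from = from;
            this.to = to;
        }

        @Override
        protected Long compute() {
            if (to - from <= THRESHOLD) {        // small enough: process sequentially
                long sum = 0;
                for (int i = from; i < to; i++) {
                    sum += values[i];
                }
                return sum;
            }
            int mid = (from + to) >>> 1;         // source splitting
            SumTask left = new SumTask(values, from, mid);
            SumTask right = new SumTask(values, mid, to);
            left.fork();                         // task submission: left half runs asynchronously
            long rightSum = right.compute();     // current thread handles the right half
            return left.join() + rightSum;       // combining partial results
        }
    }

    static long parallelSum(long[] values) {
        return ForkJoinPool.commonPool().invoke(new SumTask(values, 0, values.length));
    }

    public static void main(String[] args) {
        long[] values = LongStream.rangeClosed(1, 1_000_000).toArray();
        System.out.println("Common pool parallelism: " + ForkJoinPool.getCommonPoolParallelism());
        System.out.println("Fork/Join sum:           " + parallelSum(values));
        System.out.println("Parallel stream sum:     " + LongStream.of(values).parallel().sum());
    }
}
```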
What Are The Advantages of Parallel Stream?
- Processes Faster for Large Data: When dealing with big datasets, parallel streams break the data into parts and process them together, making things faster.
- Makes Use of Multi-Core Processors: Modern CPUs have multiple cores. Parallel streams can use all of them to improve performance without extra effort from you.
- No Need to Handle Threads Manually: You don’t have to write complex code to manage threads or worry about synchronization. Parallel streams do it for you.
- Built-in Thread Management: It uses the ForkJoinPool, which handles threads automatically, so you don’t have to track or manage them.
- Scalable with Your System: Whether you’re running it on a dual-core or an octa-core processor, parallel streams scale up to use available cores efficiently.
- Cleaner Code: The syntax of parallel streams is straightforward and easy to read, making your code simple and clean.
- Great for Stateless Tasks: Operations like filtering, mapping, or reducing, which don’t depend on external variables, run very smoothly with parallel streams.
- Works Well with Collectors: You can easily combine parallel processing with collectors like toList(), toSet(), or even joining() to get the output in the desired format.
- Saves Time: It's a great way to save development time and effort, especially for data-heavy operations where parallelism shines.
What Are The Disadvantages of Parallel Stream?
- Overhead for Small Data: Parallel streams work best with large datasets. For small datasets, the overhead of splitting and combining results can slow things down.
- Not Always Faster: Sometimes, parallel streams may not be faster than sequential ones, especially when the task is not computationally intensive or when there are a lot of IO operations involved.
- Thread Management Complexity: While the ForkJoinPool manages threads, its common pool is shared by all parallel streams in the JVM. If your program already uses many threads elsewhere, the extra contention can reduce performance.
- Order of Execution Issues: Since parallel streams don’t guarantee the order of execution, you might get inconsistent results or face challenges when order matters.
- Difficult to Debug: Debugging parallel streams can be tricky due to concurrent execution. Thread-related issues, like race conditions, might be hard to spot.
- Not Suitable for All Operations: Parallel streams work best with stateless operations. Operations that have side effects, depend on mutable state or require synchronization can lead to bugs or unpredictable behavior.
- Memory Consumption: Because parallel streams split tasks into multiple threads, they can sometimes consume more memory than sequential streams, especially when dealing with large collections.
- Limited Control Over Thread Pool: The default thread pool used by parallel streams (ForkJoinPool.commonPool()) is shared across the whole JVM and may not always be optimal for specific tasks. The Stream API gives you no direct way to choose how many threads are used or how they are managed.
- Increased Complexity: While parallel streams simplify parallelization in many cases, they add complexity when it comes to managing shared resources, synchronization, and ensuring thread safety.
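If the common pool is not a good fit, there are two commonly used workarounds. The documented one is the system property java.util.concurrent.ForkJoinPool.common.parallelism, set at JVM startup (for example -Djava.util.concurrent.ForkJoinPool.common.parallelism=4), which changes the size of the common pool for the whole JVM. The other is to start the parallel stream from inside your own ForkJoinPool; in OpenJDK the stream's tasks then run in that pool, but this relies on implementation behavior rather than a documented Stream API guarantee. A sketch of the second approach (CustomPoolDemo and sumInPool are hypothetical names):

```java
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class CustomPoolDemo {

    // Runs a parallel stream from inside a dedicated pool with the given parallelism.
    // Assumption: the stream's subtasks are forked into the pool running this task
    // (observed OpenJDK behavior, not a documented Stream API guarantee).
    static long sumInPool(List<Integer> numbers, int parallelism) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            return pool.submit(() -> numbers.parallelStream()
                            .mapToLong(Integer::longValue)
                            .sum())
                    .join(); // join() waits for the result without checked exceptions
        } finally {
            pool.shutdown(); // release the pool's threads
        }
    }

    public static void main(String[] args) {
        List<Integer> numbers = IntStream.rangeClosed(1, 1_000).boxed().collect(Collectors.toList());
        System.out.println("Sum in a 2-thread pool: " + sumInPool(numbers, 2)); // 500500
    }
}
```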
Conclusion
We learned what parallel streams in Java are, how they work internally, and the best practices for using them. If you would like any code or topic added to the article, let me know in the comments.