import { Component } from '@angular/core'
import { AppModeService } from '../../app-mode.service'

@Component({
    template: `
    <div id="main-content">
    <table class="mission-statement-table">
        <tr>
            <td class="mission-statement-column1">
            </td>  
        </tr>
        <td class="mission-statement-column2">
                <h2><strong>Structured Streaming</strong></h2>
                <div class="author-date-item">Jagan Lakshmipathy</div>
                <div class="author-date-item">October 19, 2019</div>
                <p></p>
                <h3>What is Structured Streaming?</h3>
                <p>Batch processing has long been a popular workflow model in the big data world. It is inherently fault-tolerant and reasonably accurate, but it is slow compared to stream processing. Traditional streaming was built to address real-time or near real-time processing; Kafka, Samza, and Flink are some popular streaming systems. While streaming provided good throughput and speed, it was lacking in fault tolerance and accuracy. Nathan Marz, in his popular blog, introduced the Lambda Architecture, which merged the best of both worlds (batch processing and stream processing).</p>
                <p>In the Lambda Architecture, data flows into both the batch and the streaming modules, is processed at different intervals, and the outputs are merged to provide near real-time, reasonably accurate results. However, the redundancy in the Lambda Architecture did not stand the test of time, and newer, improved architectures were proposed. Spark 2.x Structured Streaming is an advancement in that direction. It treats batch as a subset of streaming, thereby merging the APIs into one unified interface. For the basic concepts, see the <a href="https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#basic-concepts">Spark Structured Streaming programming guide</a>. In essence, Structured Streaming provides fast, scalable, fault-tolerant, end-to-end exactly-once stream processing without the user having to reason about streaming.</p>
                <p>Data in batch processing can be viewed as a static relation or table, while a stream can be viewed as data that is continuously appended. Given a schema, this continuous flow of data can be viewed as a growing relation, or an unbounded table. A typical layout of the code looks like the following, shown here in its batch form:</p>
<pre>/**
* Input setup logic
* (assumes an existing SparkSession named spark)
*/
import spark.implicits._ // enables the $"column" syntax

val df = spark.read
    .format("json")
    .load("/from/here")

/**
* Data manipulation logic
*/
val output = df
    .select($"name", $"age")
    .where($"age" > 21)

/**
* Output saving logic
*/
output.write
    .format("parquet")
    .save("/to/here")</pre>
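                <p>For comparison, the same pipeline can be sketched in streaming form. This is a minimal sketch under stated assumptions, not a definitive implementation: the application name, the schema fields (<code>name</code>, <code>age</code>) and the checkpoint path <code>/checkpoint/here</code> are hypothetical placeholders. Streaming file sources generally need an explicit schema, and the file sink needs a checkpoint location.</p>
<pre>import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

// Hypothetical application name, for illustration only
val spark = SparkSession.builder
    .appName("StructuredStreamingSketch")
    .getOrCreate()
import spark.implicits._ // enables the $"column" syntax

// Assumed schema matching the columns selected below
val schema = new StructType()
    .add("name", StringType)
    .add("age", IntegerType)

// Input setup logic: readStream instead of read
val df = spark.readStream
    .format("json")
    .schema(schema)
    .load("/from/here")

// Data manipulation logic: same as the batch form
val output = df
    .select($"name", $"age")
    .where($"age" > 21)

// Output saving logic: writeStream instead of write
val query = output.writeStream
    .format("parquet")
    .option("checkpointLocation", "/checkpoint/here") // hypothetical path
    .start("/to/here")

// Block until the streaming query stops or fails
query.awaitTermination()</pre>
                <p>The data manipulation step is unchanged; only the read and write calls differ, which is the sense in which batch and streaming share one interface.</p>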

        </td>
    </table>
    </div>

    `
})
export class StructuredStreamingComponent {
   
    constructor(private modeService: AppModeService){
        this.modeService.displaySidebar()
    }
}