SparklingSecurity

The Spark In Your Security Analytics


    Who Am I?

    Hi, I am Sanjay Menon, a data science enthusiast who loves exploring the application of data science principles in security analytics. I have rich experience across large enterprises such as Symantec, Deutsche Bank, J.P. Morgan, HP and Mercedes-Benz.

    I hold a Master's degree in Information Security from Royal Holloway, University of London, a Certificate in Data Science from the University of Washington, and a Certificate in Statistical Inference and Regression Techniques from Duke University. I am a CISSP and also hold the Hortonworks Certified Hadoop Developer certification and a certificate from Elastic for the ELK stack, along with many other technical and security-domain certifications.

    R

    R/NLP: Identifying Incident Types Using NLP

    14-Nov-2017

    The summary description of an incident ticket contains the major information about the incident. The data, however, is mostly unstructured, and using it in analytics requires considerable effort in the ETL process. Once we have the data in a structured for...

    Read More
    Spark

    Spark/Scala/MLlib: Outlier Detection Of Vulnerable Assets Using K-means

    14-Nov-2017

    A major reason for security breaches in enterprises is the existence of vulnerable assets, primarily due to unpatched vulnerabilities.

    The Spark code below uses K-means as the clustering algorithm, using various attributes like criticality of open vulnerab...

    Read More
    Spark

    Spark/Scala/MLlib: Collaborative Filter To Predict Incident Types And Severity Of The Incidents

    12-Nov-2017

    As part of the incident response function, the capability to predict incidents on different assets, based on historical data of incidents and their severity, is a useful tool for proactive incident prevention.

    The Spark code below uses the collaborative filte...

    Read More
    Spark

    Spark/Scala/GraphX: Breadth First Search Algorithm Identifying Connected Components

    12-Nov-2017

    During a security incident, it is highly useful to identify the infrastructure components that are directly or indirectly connected to other infrastructure components, and this should be part of the incident response plan.

    Breadth-first search (BFS) is an algorithm for traversi...
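As a rough illustration of the idea, here is a minimal breadth-first search sketch in plain Python (not the GraphX code from the post). The asset names and the adjacency-list graph are hypothetical, and the graph is assumed to be undirected.

```python
from collections import deque


def connected_component(graph, start):
    """Return the set of nodes reachable from `start` via breadth-first search."""
    visited = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()  # FIFO order gives the level-by-level traversal
        for neighbour in graph.get(node, ()):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)
    return visited


# Hypothetical asset graph: each key lists the assets it is directly connected to.
assets = {
    "web01": ["app01"],
    "app01": ["web01", "db01"],
    "db01": ["app01"],
    "hr-laptop": ["printer"],
    "printer": ["hr-laptop"],
}

# Everything directly or indirectly connected to the (assumed) affected asset.
affected = connected_component(assets, "web01")
```

Starting the search from an affected asset returns every component it can reach, which is the set an incident responder would want to review first.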

    Read More
    ELK

    Logstash/Geoip: Plotting On A World Map

    2-May-2016

    Logstash ships with a geoip filter, which is quite useful for plotting attack-related attributes on a world map. Logstash releases ship with the GeoLiteCity database, made available by MaxMind under a CCA-ShareAlike 3.0 license.

    For example, the code below p...

    Read More
    Spark

    Spark/MLlib/Pipeline: Multiple Algorithms In A Single Pipeline

    2-May-2016

    MLlib provides a Pipeline that can take multiple algorithms as input and execute them as a sequence of stages, where each stage is either a Transformer or an Estimator. These stages are run in order, and the input DataFrame is transformed as it passes through...

    Read More
    Spark

    Spark/MLlib/DecisionTree: A Look At The Impurity Measures Of DecisionTree

    21-Mar-2016

    A major success criterion for a rule in the DecisionTree algorithm is that the target values in the subsets it produces are relatively homogeneous, or pure. Two common measures of impurity used with DecisionTree are Gini impurity and entropy.

    Gini impurity measures the acc...
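For readers who want to see the two measures side by side, the following is a minimal plain-Python sketch (not the Spark code from the post). It uses the standard definitions, Gini = 1 - sum(p_i^2) and entropy = -sum(p_i * log2(p_i)), where p_i is the fraction of labels in class i. The function names and the sample labels are illustrative assumptions.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Entropy in bits: -sum(p * log2(p)) over the class proportions."""
    n = len(labels)
    return sum(-(count / n) * math.log2(count / n)
               for count in Counter(labels).values())


# Hypothetical target values in two candidate subsets of a split.
mixed = ["malicious", "malicious", "benign", "benign"]
pure = ["benign", "benign", "benign", "benign"]

mixed_scores = (gini(mixed), entropy(mixed))
pure_scores = (gini(pure), entropy(pure))
```

Both measures are zero for a perfectly pure subset and grow as the classes become more evenly mixed, so a lower score indicates a better rule.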

    Read More
    R

    R : Insights From Confusion Matrix Of A Classifier

    21-Mar-2016

    When evaluating a classifier, generating a confusion matrix for the model gives an indication of its performance. The confusion matrix provides a statistical view of the accuracy, precision and recall capabilities of the model, considering the True/False...
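As a small illustration, the sketch below derives accuracy, precision and recall from the four cells of a binary confusion matrix. It is plain Python rather than the R code from the post, and the function name and the counts are hypothetical.

```python
def classifier_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision and recall from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        # Share of all predictions that were correct.
        "accuracy": (tp + tn) / total if total else 0.0,
        # Share of predicted positives that were truly positive.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Share of actual positives that the model found.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical counts: 40 true positives, 10 false positives,
# 20 false negatives, 30 true negatives.
scores = classifier_metrics(tp=40, fp=10, fn=20, tn=30)
```

The guards against empty denominators are a design choice for this sketch; returning 0.0 keeps the function usable when a class never appears in the predictions.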

    Read More
    Spark

    Spark/MLlib/Evaluation: Evaluating Performance Of A Classifier

    20-Mar-2016

    One of the major criteria in selecting a model for a classifier is the performance
    capability of the models. A couple of metrics that can assist in making this decision are the PR curve
    and the ROC curve. A brief description of these metrics is given below.

    Precision-Recall (PR)...

    Read More
    R

    R/stats : Outlier Detection Using Hierarchical Clustering

    29-Feb-2016

    K-means is the most popular of the clustering techniques, but it comes with the overhead of selecting an optimum number of clusters for effective output. Hierarchical clustering is a good option in such scenarios since there is no need to provide the cluster informat...

    Read More
