Who Am I?

Hi, I am Sanjay Menon, a data science enthusiast who loves exploring the application of data science principles in security analytics. I have rich experience across large enterprises such as Symantec, Deutsche Bank, J.P. Morgan, HP and Mercedes-Benz.
I hold a Master's degree in Information Security from Royal Holloway, University of London, a Certificate in Data Science from the University of Washington, and a Certificate in Statistical Inference and Regression Techniques from Duke University. I am a CISSP and also hold the Hortonworks Certified Hadoop Developer certification and a certificate from Elastic for the ELK stack, along with many other technical and security domain certifications.
14-Nov-2017
The summary description of an incident ticket contains the major information about the incident. The data, however, is mostly unstructured, and using it in analytics requires considerable effort in the ETL process. Once we have the data in a structured for...
14-Nov-2017
A major reason for security breaches in enterprises is the existence of vulnerable assets, primarily due to unpatched vulnerabilities.
The Spark code below uses K-means as the clustering algorithm, using various attributes like criticality of open vulnerab...
12-Nov-2017
As part of the incident response function, the capability to predict incidents on different assets, based on historical incident data and severity, is a useful tool for proactive incident prevention.
The Spark code below utilizes the collaborative filte...
12-Nov-2017
During a security incident, it is highly useful to identify which infrastructure components are directly or indirectly connected, and this should be part of the incident response plan.
Breadth-first search (BFS) is an algorithm for traversi...
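As a rough illustration of the idea, here is a minimal BFS sketch in plain Python. It is not the code from the post itself. The asset names, the `topology` dictionary and the `reachable_assets` helper are hypothetical, and it assumes connectivity is available as a simple adjacency list.

```python
from collections import deque


def reachable_assets(graph, start):
    """Breadth-first search: map every asset reachable from `start`
    to its hop distance (0 for the starting asset itself)."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in distances:  # visit each asset only once
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    return distances


# Hypothetical topology: each asset maps to the assets it connects to.
topology = {
    "web-01": ["app-01", "lb-01"],
    "app-01": ["web-01", "db-01"],
    "lb-01": ["web-01"],
    "db-01": ["app-01"],
    "hr-01": [],  # isolated asset, not connected to the others
}

reachable = reachable_assets(topology, "web-01")
```

In this sketch, assets at distance 1 are directly connected to the starting asset, and anything further away is indirectly connected. Assets missing from the result are not reachable at all.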
2-May-2016
Logstash ships with a geoip filter, which is quite useful for plotting attack-related attributes on a world map. Logstash releases ship with the GeoLiteCity database made available by MaxMind under a CCA-ShareAlike 3.0 license.
For example, the code below p...
2-May-2016
MLlib provides a pipeline function which can take multiple algorithms as input and execute them as a sequence of stages, where each stage is either a Transformer or an Estimator. These stages are run in order, and the input DataFrame is transformed as it passes through...
21-Mar-2016
A major success criterion for a rule in the DecisionTree algorithm is that the target values are relatively homogeneous, or pure. Two common measures of impurity used with DecisionTree are Gini impurity and entropy.
Gini impurity measures the acc...
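For a concrete feel of the two measures, the sketch below computes them with their standard textbook definitions in plain Python. The function names and the sample labels are illustrative assumptions, not taken from the post, and a non-empty list of class labels is assumed.

```python
from collections import Counter
from math import log2


def gini_impurity(labels):
    """Gini impurity: 1 - sum(p_i^2) over the class proportions p_i."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: -sum(p_i * log2(p_i)) over the class proportions."""
    total = len(labels)
    return -sum(
        (count / total) * log2(count / total) for count in Counter(labels).values()
    )


# Hypothetical target values falling under two candidate rules.
mixed = ["malicious", "malicious", "benign", "benign"]  # evenly split
pure = ["benign", "benign", "benign", "benign"]         # fully homogeneous

mixed_gini, mixed_entropy = gini_impurity(mixed), entropy(mixed)
pure_gini, pure_entropy = gini_impurity(pure), entropy(pure)
```

Both measures are zero for the pure set and reach their two-class maximum for the evenly split set, which is why a lower value indicates a better rule.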
21-Mar-2016
When evaluating a classifier, generating a confusion matrix for the model gives an indication of its performance. The confusion matrix provides a statistical view of the model's accuracy, precision and recall, considering the True/False...
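The relationship between the confusion matrix and these metrics can be sketched in a few lines of plain Python. This is an illustrative example using the standard definitions. The helper names and the sample labels are assumptions, and a binary classifier with `1` as the positive class is assumed.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        else:
            if a == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def summary_metrics(tp, fp, tn, fn):
    """Derive accuracy, precision and recall from the four counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # guard against no positive predictions
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # guard against no actual positives
    return accuracy, precision, recall


# Hypothetical labels: 1 = attack, 0 = normal.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]

counts = confusion_counts(actual, predicted)
accuracy, precision, recall = summary_metrics(*counts)
```

Here one attack is missed (a false negative) and one normal event is flagged (a false positive), so all three metrics come out at 0.75 for this small sample.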
20-Mar-2016
One of the major criteria in selecting a model for a classifier is the performance
capability of the models. A couple of metrics that can assist in making this decision are the PR curve
and the ROC curve. A brief description of these metrics is given below.
Precision-Recall (PR)...
29-Feb-2016
K-means is the most popular of the clustering techniques but comes with the overhead of selecting an optimal number of clusters for effective output. Hierarchical clustering is a good option in such scenarios since there is no need to provide the cluster informat...