Feature engineering in relational data

Feature selection techniques
Automated feature engineering tool & examples: example-1, example-2, here

Read more

Confusion Matrix... confusing?

Read more

Machine Learning Interview | Basics | Quick revision

Here it goes:

Deep learning error analysis (bias/variance, train/train-dev/dev/test)
Why use softmax only in the output layer and not in hidden layers (further explanation)
Tradeoff between batch size and number of iterations to train a neural network
Write your own custom activation function from scratch (see the sketch after this list)
Write your own custom activation function from TensorFlow primitives
What do you mean by 1D, 2D and 3D convolutions in a CNN?
Common causes of NaNs during deep training
Introducing both L2 regularization and dropout into the network: does it make sense?
How do CNNs deal with position differences?
How many images do you need to train a neural network?
What is GEMM in deep learning?
Know more about gradient descent
NLP best practices
Should an activation function always be differentiable?
Know the internals of a neural network & train efficiently - 1
Efficient BackProp by Yann LeCun
tanh activation function vs. sigmoid activation function
How to take the derivative of sigmoid
The Right Way to Oversample in Predictive Modeling
Implement from scratch - neural networks
Implement from scratch - logistic regression
Understanding convolutions
Vanishing gradient problem
Walkthrough of back-propagation
Derivation of back-propagation
Why do we use activation functions?
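As a taste of two items on the list (a custom activation and the sigmoid derivative), here is a minimal NumPy sketch. The swish-like activation and its beta parameter are illustrative assumptions, not taken from the linked posts.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # Derivative of sigmoid: s(x) * (1 - s(x)), by the chain rule.
    s = sigmoid(x)
    return s * (1.0 - s)

def custom_activation(x, beta=1.0):
    # Hypothetical custom activation: x * sigmoid(beta * x), swish-like.
    return x * sigmoid(beta * x)

x = np.linspace(-4, 4, 9)
print(sigmoid_grad(x).max())   # peaks at 0.25, at x = 0
print(custom_activation(x))
```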

Read more

StegoSOC: AI-driven threat detection

This post is about how we are trying to automate threat detection for an enterprise, be it a public, private or hybrid cloud setup. Real-time threat detection is imperative under GDPR and other cyber laws, yet it has always been an expensive solution to implement, requiring a CSO and security analysts. StegoSOC brings down the time to detect threats, at 1/10th of the cost, on public clouds with AI and cloud-native technologies.

Some background

Security is of paramount significance for any organization with production servers deployed in its own enterprise infrastructure or in the cloud. Due to the unceasing evolution of software vulnerabilities, misconfigured devices in the network, and vulnerabilities already present in devices on the network, an enterprise is always at major risk of intrusion, from inside or outside the organization's network. These attackers are highly proficient, have malicious intentions, and can launch various kinds of exploits, attacking critical resources or tampering with the integrity of crucial information held in network assets, which might influence a company's major business decisions.

What have we done?

To summarize: we started off with logs as input, since the application layer is what is most exposed. But as we all know, logs aren't structured; their shape depends on the applications you use and on your logging mechanism. To add to this, there is an ever-increasing list of tools/applications being deployed on servers, with a different structure every time. As thehackernews rightly puts it here:

"The very purpose of IT security is to be proactive, and the above measures make it more difficult for someone who attempts to compromise the network. This might just not be enough, and you need to be able to detect actual breaches as they are being attempted. This is where log data really helps. To expose an attack or identify the damage caused, you need to analyze the log events on your network in real time. By collecting and analyzing logs, you can understand what transpires within your network. Each log file contains many pieces of information that can be invaluable, especially if you know how to read and analyze them. With proper analysis of this actionable data you can identify intrusion attempts, misconfigured equipment, and much more. Also, for managing compliance, especially for PCI DSS, you need to retain logs and review them."

I will now describe, step by step, the AI modules we have deployed for customers. So, our first hurdle: how do we come up with a universal structure for logs?

Log parsing

I will not go into technical details, but the foremost business requirements of the log parsers were:

Parse a log into meaningful attributes.
If a log format is not supported, the log-ingestion pipeline should quickly (roughly a few hours, depending on the traffic of unknown log formats) ingest the new format into the existing supported formats.
Support enterprise-specific logging alongside a global engine.

L1: Rule-based detection

This module should support filtration through handwritten rules. The handwritten rules help filter out common attacks in cybersecurity, and they also let us put the enterprise in the loop to draft their own rulesets. Since every enterprise has its own set of route rules, firewall rules, change-management policies, etc., this was really important. To incorporate all of this, it was essential to have global plus local rulesets.
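To make the log-parsing and L1 ideas concrete, here is a minimal, hypothetical Python sketch: a toy parser that turns an sshd log line into attributes, plus a couple of handwritten rules over those attributes. The format, field names and rules are illustrative assumptions, not StegoSOC's actual engine.

```python
import re

# Toy parser for one supported format (sshd failed logins); a real pipeline
# registers one parser per format and falls back to raw ingestion otherwise.
SSHD = re.compile(
    r"Failed password for (?:invalid user )?(?P<user>\S+) "
    r"from (?P<src_ip>\S+) port (?P<port>\d+)"
)

def parse(line):
    m = SSHD.search(line)
    return m.groupdict() if m else {"raw": line}  # universal dict structure

# L1: handwritten rules over parsed attributes (global + enterprise-local).
def l1_rules(event, blocked_ips=frozenset({"203.0.113.7"})):
    alerts = []
    if event.get("user") == "root":
        alerts.append("root login attempt")
    if event.get("src_ip") in blocked_ips:
        alerts.append("traffic from a blocked IP")
    return alerts

line = "sshd[2094]: Failed password for root from 203.0.113.7 port 52211 ssh2"
event = parse(line)
print(event, l1_rules(event))
```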
L2: Anomaly detection

Whatever goes undetected passes through anomaly detection, which not only alerts the SOC admin every hour about what is happening in their infrastructure, but also tells them who is producing the anomalies in their system.

L3: Attack graph

Logs analysed; now what? To extend this, we started an additional effort to model the interaction between the vulnerabilities present in the system and the network configuration. The information in the National Vulnerability Database (NVD) and the information extracted from machine and network configurations are used as the base inputs for the attack-graph engine. We also try to capture the operating system's behaviour and the interaction of the various components in the network.

Inputs:
Advisories: vulnerabilities that exist on the machine
Host configuration: software and services running on the hosts, and their configurations
Network configuration: configurations of the network routers and firewalls
Principals: legitimate users of the company's network
Interaction: interaction model of the network elements
Policy: permitted policy

AWS attack graph: above you can see a sample run of the attack-graph engine on our AWS account, showing how an attacker can reach each machine, given the vulnerabilities present in the enterprise and the network configuration. I know the graph snapshot is hard to understand, so in the next post I will write up the technical details along with the Neo4j visualization.
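To sketch how such an engine combines the inputs above, here is a minimal, hypothetical reachability computation: an attacker hop counts only if the network configuration allows the flow and the target host has an exploitable advisory. Host names and CVE labels are made up; real engines use far richer interaction and policy models.

```python
from collections import deque

# Hypothetical inputs mirroring the list above (all names are made up).
reachable = {                      # network configuration (allowed flows)
    "internet": {"web-1"},
    "web-1": {"app-1", "db-1"},
    "app-1": {"db-1"},
}
exploitable = {                    # hosts with a known advisory (NVD)
    "web-1": "CVE-XXXX-1 (RCE in web framework)",
    "db-1": "CVE-XXXX-2 (auth bypass)",
}

def attack_paths(start="internet"):
    """BFS over hops that are both network-reachable and exploitable."""
    owned, frontier, edges = {start}, deque([start]), []
    while frontier:
        src = frontier.popleft()
        for dst in reachable.get(src, ()):
            if dst in exploitable and dst not in owned:
                owned.add(dst)
                edges.append((src, dst, exploitable[dst]))
                frontier.append(dst)
    return edges

for src, dst, why in attack_paths():
    print(f"{src} -> {dst}: {why}")
```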
Finally, demonstrating StegoSOC AI results: sample StegoSOC-AI results.

Want to deploy StegoSOC in your enterprise? Visit our website here & sign up, or mail us directly at support@stegosoc.com | Contact us

AI team: Prashant Gupta, Ayush Rai
Security lead: Munish Kumar
Product lead: Mir Adnan
Read more

OCR sample results

Demonstrating OCR results: sample OCR results (images).

Read more

Myntra challenge

I participated in the Myntra Data Science Challenge. The challenge was about predicting the graphic type of a T-shirt, given the image of the T-shirt. From a business point of view, the e-commerce industry carries a lot of dead stock, so this can be used to rotate products in inventory, reducing losses for the company and also enhancing the customer experience by giving customers enough options from the latest trendy T-shirts people are wearing on social media.

Here is a summary of the webinar by Mr. Anoop from Myntra (I happened to catch the webinar on 2nd April '18):

How to deal with a skewed data distribution? Image augmentation.
Accuracy should be looked at per class, i.e. a confusion matrix should be built for each class rather than over all the data, since the overall data is badly skewed, in an extreme case from 0.4% to 40%. (See the sketch after this list.)
A class can be embedded into another class, e.g. Humour-style T-shirts embedded into Geometry-style T-shirts (Tom & Jerry in a triangle shape). In this case, prefer to output the class that is more visible, i.e. Humour here.
There is a general class called Graphic: if you don't know precisely which class to assign, pick Graphic, just like an "Others" class.
Localisation can be used to crop the image down to just the T-shirt, i.e. cut out the faces, because some images had multiple people standing in them; even a boy's or girl's face could introduce bias.
Multi-class labels are not allowed; choose the label with maximum confidence.
No metadata should be used for classification. The input is only an image, and the output should be a label.
Since the data distribution is the same across train/test/valid, we can choose to ignore classes with very few samples, e.g. Horizontal Stripe with 1 sample.
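Since per-class evaluation comes up above, here is a minimal sketch of per-class recall from a confusion matrix with scikit-learn; the class names and predictions are made up for illustration.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up labels for a skewed 3-class problem.
y_true = ["Graphic"] * 8 + ["Humour"] * 3 + ["Geometry"]
y_pred = ["Graphic"] * 7 + ["Humour"] + ["Humour", "Humour", "Graphic"] + ["Geometry"]

labels = ["Graphic", "Humour", "Geometry"]
cm = confusion_matrix(y_true, y_pred, labels=labels)

# Per-class recall: correct predictions for a class / true samples of that class.
# High overall accuracy can hide a rare class that is almost never recovered.
per_class = cm.diagonal() / cm.sum(axis=1)
for name, acc in zip(labels, per_class):
    print(f"{name}: {acc:.2f}")
```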

Read more

Transferring files from my Mac to my MSI Windows laptop (GPU)

Prepare the PC for the files and find the PC's network address

On the Windows 10 PC

If you're migrating from a Mac to a Windows 10 PC, follow these steps to create and share a folder, and then find your PC's IP address. First, create and share a folder on your desktop:

Right-click the Desktop, click New, then Folder, and name the folder something like "From my MAC".
Right-click the new folder on the desktop, click Share with, and select Specific people.
If you see your user name in the File Sharing window, you are ready to receive files: click Share, click Continue if prompted, and then click Done.

To transfer your files over a network, you must connect both the Mac and the PC to the network. You will need a shared folder on the PC to migrate the files to, and you will need the IP address of the PC; see Step 1 in this article to create and share the folder and get the PC's IP address.

On the Mac

To connect the Mac to the network and to the shared folder on the PC, follow these steps:

With Finder open on the Mac, press Command+K, or select Connect to Server from the Go menu.
Type smb:// followed by the network address of the PC that you want to transfer files to. Example: smb://172.16.10.11
Click Connect. You will be prompted to authenticate, and if you have not specified a shared folder, you will be prompted to select one.
Once you're connected to the PC, locate the files to be migrated and drag them to the shared folder on the PC. (If you'd rather script this last step, see the sketch below.)
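If you'd rather script the final copy, here is a minimal Python sketch run on the Mac. It assumes the share is already mounted via Connect to Server, so it appears under /Volumes; the mount-point name and source folder are illustrative.

```python
import shutil
from pathlib import Path

# Assumed mount point after connecting with smb:// in Finder.
dest = Path("/Volumes/From my MAC")
src = Path.home() / "Documents"   # illustrative source folder

# Copy everything under the source folder into the shared folder on the PC.
for item in src.iterdir():
    target = dest / item.name
    if item.is_dir():
        shutil.copytree(item, target, dirs_exist_ok=True)  # Python 3.8+
    else:
        shutil.copy2(item, target)

print("Copied", src, "->", dest)
```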

Read more

Sentiment Analysis

Sample sentence 1. Sample sentence 2.

Since these are really long, let's instead gain some insight by looking at the word clouds formed by combining the positive and negative reviews.

Generate word cloud

A word cloud is a pretty common visualization for textual data, where word sizes are proportional to their occurrences in the data. So it is really handy for visualizing occurrences of words while keeping their sentiments in mind.

(Word clouds shown for: Extremely Negative, Negative, Neutral, Positive, Extremely Positive.)

We can clearly infer the expressed sentiment from the word clouds, from high counts of "bad" in the extremely negative cloud to "best" in the extremely positive cloud.

We will be using the following metrics to benchmark our performance: accuracy and the confusion matrix.

Before we start on anything in NLP, data cleansing is very important. See this post for data cleansing.

Approaches

Simple dictionary lookup

A classical technique for sentiment analysis, dictionary-based lookups have received tons of criticism for being inexhaustive, for ignoring semantic meaning, and more. Yet they were among the first and simplest techniques to be applied. The steps are simple (a sketch follows this list):

Have a dictionary with key-value pairs of word:score, where the score is positive for positive words and negative for negative words.
Start iterating through a given review word by word, with a score counter initialized to 0.
If the word being considered is present in the dictionary, add its score to the score counter.
The final value of the score counter at the end of the review determines the label to be assigned, e.g. Extremely Positive.

For this model we have used the AFINN dictionary. Finally, if you're looking for a list of the best additional dictionaries to experiment with, you can check this link.

Bag of Words approach (BOW)

What is Bag of Words? Read here. Doing sentiment analysis with BOW.
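Here is a minimal sketch of the dictionary lookup above, with a tiny made-up AFINN-style dictionary (the real AFINN list assigns integer scores from -5 to +5); the thresholds for the five labels are assumptions for illustration.

```python
# Tiny AFINN-style dictionary (made-up subset; real AFINN scores are -5..+5).
SCORES = {"bad": -3, "terrible": -4, "good": 2, "great": 3, "best": 4}

def label(total):
    # Illustrative thresholds for the five classes used in the post.
    if total <= -4:
        return "Extremely Negative"
    if total < 0:
        return "Negative"
    if total == 0:
        return "Neutral"
    if total < 4:
        return "Positive"
    return "Extremely Positive"

def sentiment(review):
    # Sum the scores of dictionary words; unknown words contribute 0.
    total = sum(SCORES.get(w, 0) for w in review.lower().split())
    return label(total)

print(sentiment("This is the best phone, great battery"))  # Extremely Positive
```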

Read more

Context-Free Grammar

Parsing: input is a sentence; output is a parse tree. Parsing is a supervised machine learning problem: training can be done on a treebank, which consists of several sentences and their associated parse trees. One example is the Penn WSJ Treebank. The leaf nodes make up the sentence, THEN come the part-of-speech tags, THEN the phrases/constituents. (A toy grammar is sketched below.)

NP - noun phrase
VP - verb phrase
DT - determiner
S - sentence
V - verb
N - noun
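To make these symbols concrete, here is a minimal sketch of a toy context-free grammar parsed with NLTK; the grammar and sentence are made up for illustration.

```python
import nltk

# Toy CFG using the symbols listed above (S, NP, VP, DT, N, V).
grammar = nltk.CFG.fromstring("""
    S  -> NP VP
    NP -> DT N
    VP -> V NP
    DT -> 'the'
    N  -> 'dog' | 'cat'
    V  -> 'chased'
""")

parser = nltk.ChartParser(grammar)
sentence = "the dog chased the cat".split()
for tree in parser.parse(sentence):
    print(tree)  # (S (NP (DT the) (N dog)) (VP (V chased) (NP (DT the) (N cat))))
```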

Read more

Probability Basic Questions

Always remember, in probability questions, to check whether multiple objects are being picked simultaneously or one by one. For practice, try this and this. Circular permutations: (n-1)!. How? Fix one object in place to cancel the n equivalent rotations; the remaining n-1 objects can then be arranged in (n-1)! ways. Solve this. (A brute-force check is sketched below.)
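As a quick sanity check on the (n-1)! formula, here is a small brute-force sketch that counts seatings of n distinct objects around a circle, treating rotations of the same arrangement as identical.

```python
import math
from itertools import permutations

def circular_count(n):
    """Count arrangements of n distinct objects in a circle,
    where rotations of the same arrangement are identical."""
    seen = set()
    for p in permutations(range(n)):
        i = p.index(0)
        seen.add(p[i:] + p[:i])  # canonical form: rotate object 0 to front
    return len(seen)

for n in range(1, 7):
    assert circular_count(n) == math.factorial(n - 1)
print("(n-1)! verified for n = 1..6")
```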

Read more