Here it goes
This post is about how we are trying to automate threat detection for an enterprise, be it a public, private or hybrid cloud setup.
Real-time threat detection is imperative under GDPR and other cyber laws. Threat detection has always been an expensive solution to implement: it requires a CSO and security analysts to put it in place.
I participated in the Myntra Data Science Challenge. The challenge was to predict the graphic type of a T-shirt given an image of the T-shirt. From a business point of view, the e-commerce industry carries a lot of dead stock, so this can be used to rotate products in inventory, reducing losses to the company and improving the customer experience by giving customers enough options among the latest trendy T-shirts that people are wearing on social media.
Prepare the PC for the files and find the PC’s network address
On the Windows 10 PC
If you’re migrating from a Mac to a Windows 10 PC, follow these steps to create and then share a folder, and then find your PC’s IP address:
Input - Sentence
Output - Parsed tree
Parsing is a supervised machine learning problem. Training uses a treebank, which consists of sentences and their associated parse trees. One example is the Penn WSJ Treebank.
The leaf nodes make up the sentence. Above the leaves come the part-of-speech tags, and above those the phrases (constituents).
NP - Noun phrase
VP - Verb phrase
DT - Determiner
S - Sentence
V - Verb
N - Noun
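To make the layers concrete, here is a minimal sketch of a parse tree held as nested tuples. The sentence "the dog barks" and its tree are made up for illustration and do not come from any treebank; the helper names `leaves` and `pos_tags` are hypothetical too.

```python
# A toy parse tree as nested tuples: (label, child, child, ...).
# A child is either another tuple or a plain string (a word at a leaf).
tree = (
    "S",
    ("NP", ("DT", "the"), ("N", "dog")),
    ("VP", ("V", "barks")),
)


def leaves(node):
    """Return the words at the leaf nodes, left to right."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1:]:  # node[0] is the label
        words.extend(leaves(child))
    return words


def pos_tags(node):
    """Return (word, tag) pairs from the nodes directly above the leaves."""
    if isinstance(node, str):
        return []
    if len(node) == 2 and isinstance(node[1], str):
        return [(node[1], node[0])]
    pairs = []
    for child in node[1:]:
        pairs.extend(pos_tags(child))
    return pairs


print(" ".join(leaves(tree)))
print(pos_tags(tree))
```

Reading the leaves left to right gives back the sentence, and the nodes one level above the leaves give the part-of-speech tags.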
How does this setup work?
Flask is managed by uWSGI.
uWSGI talks to nginx.
nginx handles contact with the outside world.
When a client connects to your server trying to reach your Flask app:
nginx opens the connection and proxies it to uWSGI
uWSGI handles the Flask instances you have and connects one to the client
Flask talks to the client happily
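What uWSGI actually calls is a WSGI callable, and a Flask app object is one such callable. The sketch below is a stand-in, not Flask itself: it uses only the standard library so it runs anywhere, and it fakes the request that uWSGI would normally hand over. The names `application` and `fake_request` are my own choices for illustration.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """A bare WSGI callable, standing in for the Flask app object."""
    body = b"Hello from the app\n"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def fake_request(app, path="/"):
    """Call the app the way a WSGI server would, without any network."""
    environ = {}
    setup_testing_defaults(environ)  # fills in a minimal valid WSGI environ
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = headers

    chunks = app(environ, start_response)
    return captured["status"], b"".join(chunks)


status, body = fake_request(application)
print(status, body)
```

In the real setup nginx accepts the connection, uWSGI builds the `environ` and calls the app, and the returned bytes travel back the same way.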
Cross validation is a model evaluation method that is better than residuals. The problem with residual evaluations is that they do not give an indication of how well the learner will do when it is asked to make new predictions for data it has not already seen. One way to overcome this problem is to not use the entire data set when training a learner. Some of the data is removed before training begins. Then when training is done, the data that was removed can be used to test the performance of the learned model on new data. This is the basic idea for a whole class of model evaluation methods called cross validation.
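A minimal sketch of the idea, assuming k-fold splitting and a deliberately trivial "model" that always predicts the mean of its training targets. The data, the function name `k_fold_indices` and the choice of k = 3 are all made up for illustration.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k (train, test) index pairs, no shuffling."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds


ys = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]  # toy targets

fold_errors = []
for train, test in k_fold_indices(len(ys), 3):
    # "Training": the model is just the mean of the training targets.
    prediction = sum(ys[i] for i in train) / len(train)
    # "Testing": mean absolute error on data the model has not seen.
    error = sum(abs(ys[i] - prediction) for i in test) / len(test)
    fold_errors.append(error)

cv_error = sum(fold_errors) / len(fold_errors)
print(fold_errors, cv_error)
```

Every data point is used for testing exactly once and never by a model that was trained on it.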
Know more techniques? Start here
One method of judging the quality of a particular model is by residuals. That means the model is fit using all the data points and the prediction for each data point is compared with its actual output. The absolute value of each error is taken and the mean of those values is computed to arrive at the mean absolute residual error. Models with lower values of this measure are deemed to be better.
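The computation described above fits in a few lines. This is a sketch with made-up numbers; the function name `mean_absolute_residual` is my own.

```python
def mean_absolute_residual(actual, predicted):
    """Mean of |actual - predicted| over all data points."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


actual = [3.0, 5.0, 7.0]      # toy outputs
predicted = [2.5, 5.0, 8.0]   # toy predictions from a fitted model

print(mean_absolute_residual(actual, predicted))  # 0.5
```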
There is always a plethora of metrics in machine learning that can be used to evaluate the performance of an ML model. This is an attempt to draw a metric map, just to keep them all in one place.
Machine learning algorithms at a glance.
Table of contents
- Benchmarking right way - Introduction
- Benchmarking time
- Memory benchmark
- CPU benchmark
Benchmarking an API should mainly be concerned with time, CPU and memory. One should also be concerned about the number of concurrent connections the API can handle in the production environment. This post will help you understand the mechanisms of network slowdowns.
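As a rough sketch of measuring those three things from inside Python, the standard library is enough: `time.perf_counter` for wall-clock time, `time.process_time` for CPU time and `tracemalloc` for peak memory. The `handler` function is a hypothetical stand-in for whatever work an endpoint does; this does not cover concurrency or network effects.

```python
import time
import tracemalloc


def handler(n):
    """Hypothetical stand-in for the work an API endpoint performs."""
    return sum(i * i for i in range(n))


def benchmark(func, *args, repeat=5):
    """Return best wall time, best CPU time and peak traced memory."""
    wall, cpu = [], []
    for _ in range(repeat):
        w0 = time.perf_counter()
        c0 = time.process_time()
        func(*args)
        wall.append(time.perf_counter() - w0)
        cpu.append(time.process_time() - c0)

    # Memory is traced in a separate run because tracing slows the code down.
    tracemalloc.start()
    func(*args)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    return {"wall_s": min(wall), "cpu_s": min(cpu), "peak_bytes": peak}


print(benchmark(handler, 100_000))
```

Taking the minimum over several repeats is one common way to reduce noise from other processes; the mean or a percentile may suit a production report better.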
Reliability…what is that?
“continuing to work correctly, even when things go wrong.” The things that can go wrong are called faults, and systems that anticipate faults and can cope with them are called fault-tolerant or resilient. The former term is slightly misleading: it suggests that we could make a system tolerant of every possible kind of fault, which in reality is not feasible. If the entire planet Earth (and all servers on it) were swallowed by a black hole, tolerance of that fault would require web hosting in space—good luck getting that budget item approved. So it only makes sense to talk about tolerating certain types of faults. Note that a fault is not the same as a failure. A fault is usually defined as one component of the system deviating from its spec, whereas a failure is when the system as a whole stops providing the required service to the user. It is impossible to reduce the probability of a fault to zero; therefore it is usually best to design fault-tolerance mechanisms that prevent faults from causing failures.
Table of contents
- Injection Attacks
- Broken authentication & session management
- Cross site scripting
- Insecure direct object references
- Security Misconfiguration
- Sensitive data exposure
- Missing function level access control
- Cross site request forgery
- Unvalidated redirects & Forwards
Table of contents
- Software Design Theoretical concepts - Introduction
- Class Diagrams
- Sample UML diagram examples
- Object oriented cheat-sheet
To build the pipeline successfully, I had to automate all the "yes" prompts that come up while running the Anaconda installer script.
I did this by invoking the script in batch mode with the -b flag:
bash Anaconda2-5.0.1-Linux-x86_64.sh -b
You have done all your research, prototyped it, optimized it, and now you are ready to ship it. This post focuses not only on shipping machine learning modules but on Python-based codebases in general.
How do you ship?
- Expose an API
- Package your code in a single executable
To start with
feature selection: select a subset of the original feature set.
feature extraction: build a new set of features from the original feature set.
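A small sketch of the difference, using made-up rows. The feature names, the helper names and the choice of BMI as the derived feature are assumptions for illustration only.

```python
rows = [
    {"height_cm": 170.0, "weight_kg": 65.0, "age": 30},
    {"height_cm": 180.0, "weight_kg": 81.0, "age": 45},
]


def select_features(row, keep):
    """Feature selection: keep a subset of the original features unchanged."""
    return {name: row[name] for name in keep}


def extract_features(row):
    """Feature extraction: build new features out of the original ones."""
    height_m = row["height_cm"] / 100
    return {"bmi": row["weight_kg"] / height_m ** 2, "age": row["age"]}


print([select_features(r, ["weight_kg", "age"]) for r in rows])
print([extract_features(r) for r in rows])
```

Selection returns columns that already existed; extraction returns a column (`bmi`) that is computed from them and was not in the original set.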
What is language modelling?
Language modelling, in very simple terms, is the task of assigning a probability to sentences in a language. Besides assigning a probability to each sequence of words, a language model also assigns a probability to a given word (or sequence of words) following a sequence of words.
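One of the simplest ways to do both is a bigram model estimated by counting. The sketch below assumes a three-sentence toy corpus and uses no smoothing, so unseen word pairs get probability zero; real language models handle that case differently. The markers `<s>` and `</s>` for sentence start and end are a convention I picked for the example.

```python
from collections import Counter

corpus = [
    "the cat sat",
    "the cat ran",
    "the dog sat",
]

context_counts = Counter()  # how often each word appears as a context
bigram_counts = Counter()   # how often each (previous, next) pair appears

for sentence in corpus:
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    context_counts.update(tokens[:-1])
    bigram_counts.update(zip(tokens, tokens[1:]))


def next_word_prob(prev, word):
    """P(word | prev), estimated from counts."""
    if context_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / context_counts[prev]


def sentence_prob(sentence):
    """Probability of a whole sentence as a product of bigram probabilities."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        prob *= next_word_prob(prev, word)
    return prob


print(next_word_prob("the", "cat"))
print(sentence_prob("the cat sat"))
print(sentence_prob("the dog ran"))
```

"the dog ran" comes out as zero here only because the pair ("dog", "ran") never occurs in the toy corpus.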
Why is word embedding needed?
The purpose is to create a representation for words that captures their meanings, their semantic relationships and the different types of contexts they are used in.
And all of these are implemented by using Word Embeddings or numerical representations of texts so that computers may handle them.
What is word embedding?
In very simplistic terms, Word Embeddings are the texts converted into numbers and there may be different numerical representations of the same text. But before we dive into the details of Word Embeddings, the following question should be asked – Why do we need Word Embeddings?
As it turns out, many Machine Learning algorithms and almost all Deep Learning architectures are incapable of processing strings or plain text in their raw form. They require numbers as inputs to perform any sort of job, be it classification, regression etc. And with the huge amount of data that is present in text format, it is imperative to extract knowledge out of it and build applications. Some real-world applications of text processing are sentiment analysis of reviews (by Amazon etc.) and document or news classification or clustering (by Google etc.).
Let us now define Word Embeddings formally. A Word Embedding format generally tries to map a word using a dictionary to a vector. Let us break this sentence down into finer details to have a clear view.
Take a look at this example: sentence = "Word Embeddings are Word converted into numbers"
A word in this sentence may be "Embeddings" or "numbers" etc.
A dictionary may be the list of all unique words in the sentence. So, a dictionary may look like: ['Word', 'Embeddings', 'are', 'converted', 'into', 'numbers']
A vector representation of a word may be a one-hot encoded vector, where 1 stands for the position where the word exists and 0 everywhere else. The vector representation of "numbers" in this format according to the above dictionary is [0,0,0,0,0,1], and that of "converted" is [0,0,0,1,0,0].
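The same example in a few lines of Python. This is a minimal sketch; building the dictionary in order of first appearance is my assumption, chosen so the positions match the vectors above.

```python
sentence = "Word Embeddings are Word converted into numbers"

# Dictionary: the unique words, kept in order of first appearance.
dictionary = list(dict.fromkeys(sentence.split()))


def one_hot(word):
    """1 at the word's position in the dictionary, 0 everywhere else."""
    return [1 if entry == word else 0 for entry in dictionary]


print(dictionary)
print(one_hot("numbers"))
print(one_hot("converted"))
```

A word that is not in the dictionary gets an all-zero vector, which is one of the reasons this representation is only a starting point.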
This is just a very simple method to represent a word in the vector form. Let us look at different types of Word Embeddings or Word Vectors and their advantages and disadvantages over the rest.
Different types of Word Embeddings
The different types of word embeddings can be broadly classified into two categories:
- Frequency based Embedding
- Prediction based Embedding
Let us try to understand each of these methods in detail.
2.1 Frequency based Embedding
There are generally three types of vectors that we encounter under this category.
- Count Vector
- TF-IDF Vector
- Co-Occurrence Vector
Let us look into each of these vectorization methods in detail.
To get a better semantic understanding of words, word2vec was published for the NLP community.
Feature Extraction from texts using Bag of words
The bag-of-words model ignores grammar and word order. Take two example sentences: 'All my cats in a row' and 'When my cat sits down, she looks like a Furby toy!'
Break the given sentences down into words and assign each word a unique ID.
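Here is a minimal sketch of that step on the two sentences. Lowercasing, dropping punctuation and handing out IDs in order of first appearance are my assumptions; the excerpt does not say how the post tokenizes.

```python
import re
from collections import Counter

sentences = [
    "All my cats in a row",
    "When my cat sits down, she looks like a Furby toy!",
]


def tokenize(text):
    """Lowercase the text and keep runs of letters only."""
    return re.findall(r"[a-z]+", text.lower())


# Assign each distinct word a unique ID, in order of first appearance.
vocabulary = {}
for sentence in sentences:
    for word in tokenize(sentence):
        vocabulary.setdefault(word, len(vocabulary))


def bag_of_words(text):
    """Count vector over the vocabulary; word order is thrown away."""
    counts = Counter(tokenize(text))
    vector = [0] * len(vocabulary)
    for word, count in counts.items():
        if word in vocabulary:
            vector[vocabulary[word]] = count
    return vector


print(vocabulary)
for sentence in sentences:
    print(bag_of_words(sentence))
```

Note that "cat" and "cats" get different IDs here, since nothing in this sketch does stemming.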
I am really fascinated by this broad research topic. So I decided to play around with it, and this post is a chronological record of my attempts at visual recognition.
What inspired me?
Dr. Fei-Fei Li, with her TED talk.
Data are pieces of information about individuals organized into variables. By an individual, we mean a particular person or object. By a variable, we mean a particular characteristic of the individual.
Variables can be classified into one of two types: categorical or quantitative.
I stepped into competitive programming in college. I started with SPOJ, attempted "Life, the Universe, and Everything" and, voilà, got a compilation error :laughing:
Anyway, that was a learning curve, and I continued with other platforms like Codechef and Codeforces along with SPOJ. Topcoder problems were tough then and still are :wink:
In the spirit of making myself a better developer, I am releasing all the solutions I have submitted to various problems on all platforms. The main reason behind putting all my code in one place ...
Ignore a file/folder while committing
For a file
For a folder
To monitor CPU usage on any Linux distribution, use ...