What is Link / Social Network Analysis?

Posted by Crime Tech Solutions

Computer-based link analysis is a set of techniques for exploring associations among large numbers of objects of different types. These methods have proven crucial in assisting human investigators in comprehending complex webs of evidence and drawing conclusions that are not apparent from any single piece of information. These methods are equally useful for creating variables that can be combined with structured data sources to improve automated decision-making processes. Typically, linkage data is modeled as a graph, with nodes representing entities of interest and links representing relationships or transactions. Links and nodes may have attributes specific to the domain. For example, link attributes might indicate the certainty or strength of a relationship, the dollar value of a transaction, or the probability of an infection.

Some linkage data, such as telephone call detail records, may be simple but voluminous, with uniform node and link types and a great deal of regularity. Other data, such as law enforcement data, may be extremely rich and varied, though sparse, with elements possessing many attributes and confidence values that may change over time.

Various techniques are appropriate for distinct problems. For example, heuristic, localized methods might be appropriate for matching known patterns to a network of financial transactions in a criminal investigation. Efficient global search strategies, on the other hand, might be best for finding centrality or severability in a telephone network.

Link analysis can be broken down into two components: link generation and utilization of the resulting linkage graph.

Link Generation

Link generation is the process of computing the links, link attributes and node attributes. There are several different ways to define links. The different approaches yield very different linkage graphs. A key aspect in defining a link analysis is deciding which representation to use.

Explicit Links

A link may be created between the nodes corresponding to each pair of entities in a transaction. For example, with a call detail record, a link is created between the originating telephone number and the destination telephone number. This is referred to as an explicit link.

Aggregate Links

A single link may be created from multiple transactions. For example, a single link could represent all telephone calls between two parties, and a link attribute might be the number of calls represented. Thus, several explicit links may be collapsed into a single aggregate link.
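As a rough illustration of collapsing explicit links into aggregate links, the sketch below counts calls between each pair of numbers. The phone numbers and the `aggregate_links` helper are hypothetical, and treating a call from A to B the same as a call from B to A is an assumption; a directed graph would keep them separate.

```python
from collections import Counter

# Hypothetical call detail records: (originating number, destination number).
calls = [
    ("555-0101", "555-0202"),
    ("555-0202", "555-0101"),
    ("555-0101", "555-0202"),
    ("555-0101", "555-0303"),
]


def aggregate_links(records):
    """Collapse explicit links into aggregate links with a call-count attribute."""
    counts = Counter()
    for origin, destination in records:
        # Assumption: the link is undirected, so A->B and B->A share one link.
        pair = tuple(sorted((origin, destination)))
        counts[pair] += 1
    return dict(counts)


aggregated = aggregate_links(calls)
for (party_a, party_b), call_count in aggregated.items():
    print(party_a, party_b, call_count)
```

Each key in `aggregated` is one aggregate link, and the count is the link attribute described above.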

Inferred Relationships

Links may also be created between pairs of nodes based on inferred strengths of relationships between them. These are sometimes referred to as soft links, association links, or co-occurrence links. Classes of algorithms for these computations include association rules, Bayesian belief networks and context vectors. For example, a link may be created between any pair of nodes whose context vectors lie within a certain radius of one another. Typically, one attribute of such a link is the strength of the relationship it represents. Time is a key feature that offers an opportunity to uncover linkages that might be missed by more typical data analysis approaches. For example, suppose a temporal analysis of wire transfer records indicates that a transfer from account A to person X at one bank is temporally proximate to a transfer from account B to person Y at another bank. This yields an inferred link between accounts A and B. If other aspects of the accounts or transactions are also suspicious, they may be flagged for additional scrutiny for possible money laundering activity.
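The wire transfer example can be sketched in a few lines. Everything here is illustrative: the records are invented, the ten-minute window is an arbitrary assumption, and the linear strength score is just one simple way to turn temporal proximity into a link attribute.

```python
from datetime import datetime, timedelta
from itertools import combinations

# Hypothetical wire transfers: (source account, recipient, bank, timestamp).
transfers = [
    ("A", "X", "Bank One", datetime(2024, 1, 5, 10, 0)),
    ("B", "Y", "Bank Two", datetime(2024, 1, 5, 10, 4)),
    ("C", "Z", "Bank One", datetime(2024, 1, 9, 16, 30)),
]


def infer_temporal_links(records, window=timedelta(minutes=10)):
    """Return inferred links between accounts whose transfers fall within `window`."""
    inferred = []
    for first, second in combinations(records, 2):
        account_1, _, _, time_1 = first
        account_2, _, _, time_2 = second
        if account_1 == account_2:
            continue  # a link from an account to itself is not informative
        gap = abs(time_1 - time_2)
        if gap <= window:
            # Assumed scoring: closer in time means stronger, scaled to [0, 1].
            strength = 1 - gap / window
            inferred.append((account_1, account_2, round(strength, 2)))
    return inferred


inferred_links = infer_temporal_links(transfers)
print(inferred_links)
```

Comparing every pair of records is fine for a sketch but scales quadratically; a production system would more likely sort by time and slide a window over the records.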

A specific instance of inferred relationships is identifying two nodes that may actually correspond to the same physical entity, such as a person or an account. Link analysis includes mechanisms for collapsing these to a single node. Typically, the analyst creates rules or selects parameters specifying in which instances to merge nodes in this fashion.
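One way to picture rule-based node merging is with a small union-find structure, shown below as a sketch rather than a description of any particular product. The node records, the field names, and the `same_phone` rule are all hypothetical; an analyst would substitute whatever rules and parameters fit the data.

```python
# Hypothetical node records; the field names are illustrative only.
nodes = {
    "n1": {"name": "John Q. Public", "phone": "555-0101"},
    "n2": {"name": "J. Public", "phone": "(555) 0101"},
    "n3": {"name": "Jane Doe", "phone": "555-0999"},
}


def digits_only(phone):
    """Strip formatting so that differently punctuated numbers compare equal."""
    return "".join(ch for ch in phone if ch.isdigit())


def same_phone(a, b):
    """Example analyst rule: two nodes are the same entity if their phones match."""
    return digits_only(a["phone"]) == digits_only(b["phone"])


def merge_nodes(records, same_entity):
    """Group node ids that the rule says refer to the same physical entity."""
    parent = {node_id: node_id for node_id in records}

    def find(node_id):
        # Follow parent pointers to the group's representative, compressing the path.
        while parent[node_id] != node_id:
            parent[node_id] = parent[parent[node_id]]
            node_id = parent[node_id]
        return node_id

    ids = sorted(records)
    for i, left in enumerate(ids):
        for right in ids[i + 1:]:
            if same_entity(records[left], records[right]):
                parent[find(right)] = find(left)

    groups = {}
    for node_id in ids:
        groups.setdefault(find(node_id), []).append(node_id)
    return sorted(groups.values())


merged = merge_nodes(nodes, same_phone)
print(merged)
```

Because merging is transitive, a chain of pairwise matches ends up in one group, which is usually what is wanted but can also over-merge if the rule is too loose.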

Utilization

Once a linkage graph, including the link and node attributes, has been defined, it can be browsed, searched or used to create variables as inputs to a decision system.

Visualization

In visualizing linkage graphs, each node is represented as an icon, and each link is represented as a line or an arrow between two nodes. The node and link attributes may be displayed next to the items or accessed via mouse actions. Different icon types represent different entity types. Similarly, link attributes determine the link representation (line strength, line color, arrowhead, etc.).

Standard graphs include spoke and wheel, peacock, group, hierarchy and mesh. An analytic component of the visualization is the automatic positioning of the nodes on the screen, i.e., the projection of the graph onto a plane. Different algorithms position the nodes based on the strength of the links between nodes or to agglomerate the nodes into groups of the same kind. Once displayed, the user typically has the ability to move nodes, modify node and link attributes, zoom in, collapse, highlight, hide or delete portions of the graph.

Variable Creation

Link analysis can append new fields to existing records or create entirely new data sets for subsequent modeling stages in a decision system. For example, a new variable for a customer might be the total number of email addresses and credit card numbers linked to that customer.
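The customer example might look something like the sketch below. The link rows, the customer records, and the `linked_contact_count` field name are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical links: (customer id, linked entity type, linked entity value).
customer_links = [
    ("cust-1", "email", "a@example.com"),
    ("cust-1", "email", "b@example.com"),
    ("cust-1", "card", "4111-XXXX-0001"),
    ("cust-1", "email", "a@example.com"),  # repeated link, counted once
    ("cust-2", "card", "4111-XXXX-0002"),
]


def linked_entity_counts(link_rows, counted_types=("email", "card")):
    """Count distinct email addresses and card numbers linked to each customer."""
    linked = defaultdict(set)
    for customer, entity_type, value in link_rows:
        if entity_type in counted_types:
            linked[customer].add((entity_type, value))
    return {customer: len(values) for customer, values in linked.items()}


# Append the new variable to existing (hypothetical) customer records.
customer_records = [{"id": "cust-1"}, {"id": "cust-2"}, {"id": "cust-3"}]
counts = linked_entity_counts(customer_links)
for record in customer_records:
    record["linked_contact_count"] = counts.get(record["id"], 0)

print(customer_records)
```

The resulting field can then be fed into a downstream model alongside the customer's other structured attributes.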

Search

Link analysis query mechanisms include retrieving nodes and links matching specified criteria, such as node and link attributes, as well as search by example to find more nodes that are similar to the specified example node.

A more complex task is similarity search, also called clustering. Here, the objective is to find groups of similar nodes. These may actually be multiple instances of the same physical entity, such as a single individual using multiple accounts in a similar fashion.

Network Analysis

Network analysis is the search for parts of the linkage graph that play particular roles. It is used to build more robust communication networks and to combat organized crime. This exploration revolves around questions such as:

  • Which nodes are key or central to the network?
  • Which links can be severed or strengthened to most effectively impede or enhance the operation of the network?
  • Can the existence of undetected links or nodes be inferred from the known data?
  • Are there similarities in the structure of subparts of the network that can indicate an underlying relationship (e.g., modus operandi)?
  • What are the relevant sub-networks within a much larger network?
  • What data model and level of aggregation best reveal certain types of links and sub-networks?
  • What types of structured groups of entities occur in the data set?
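As a small, hedged illustration of the first question, the sketch below computes degree centrality, one of the simplest centrality measures: the fraction of other nodes each node is directly linked to. The edge list is invented, and real analyses often use richer measures such as betweenness.

```python
from collections import defaultdict

# Hypothetical undirected links between people in a small network.
edges = [("Ann", "Bob"), ("Ann", "Cy"), ("Ann", "Dee"), ("Dee", "Eve")]


def degree_centrality(edge_list):
    """Return each node's degree divided by the number of other nodes."""
    neighbors = defaultdict(set)
    for a, b in edge_list:
        neighbors[a].add(b)
        neighbors[b].add(a)
    node_count = len(neighbors)
    if node_count < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adjacent) / (node_count - 1)
            for node, adjacent in neighbors.items()}


centrality = degree_centrality(edges)
most_central = max(centrality, key=centrality.get)
print(most_central, centrality[most_central])
```

A high score only says a node has many direct links; whether that node is actually key to the network's operation is still a judgment for the analyst.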

Applications

Link analysis tools such as those provided by Crime Tech Solutions are increasingly used for law enforcement investigations, terrorist threat detection, fraud detection, money laundering detection, telecommunications network analysis, web page classification, transportation route analysis, pharmaceutical research, epidemiology, nuclear proliferation detection and a host of other specialized applications. For example, in the case of money laundering, the entities might include people, bank accounts and businesses, and the transactions might include wire transfers, checks and cash deposits. Exploring relationships among these different objects helps expose networks of activity, both legal and illegal.

The Name Game Fraud

Posted by Douglas Wood, Editor.

Alright everybody, let’s play a game. The name game!

“Shirley, Shirley bo Birley. Bonana fanna fo Firley. Fee fy mo Mirley. Shirley!” No, not THAT name game. (Admit it… you used to love singing the “Chuck” version, though.)

The name game I’m referring to is slightly more sinister, and relates to the criminal intent to deceive others for gain by slightly misrepresenting attributes in order to circumvent fraud detection techniques. Pretty much anywhere money, goods, or services are dispensed, folks play the name game.

Utilities, Insurance, Medicaid, retail, FEMA. You name it.

Several years ago, I helped a large online insurance provider determine the extent to which they were offering insurance policies to corporations and individuals with whom they specifically did not want to do business. Here’s what the insurer knew:

  1. They had standard application questions designed to both determine the insurance quote AND to ensure that they were not doing business with undesirables. These questions included things such as full name, address, telephone number, date of birth, etc… but also questions related to the insured property. “Do you live within a mile of a fire station?”, “Does your home have smoke detectors?”, and “Is your house made of matchsticks?”
  2. On top of the questions, the insurer had a list of entities with whom they knew they did not want to do business for one reason or another. Perhaps Charlie Cheat had some previously questionable claims… he would have been on their list.

In order to circumvent the fraud prevention techniques, of course, the unscrupulous types figured out how to mislead the insurer just enough so that the policy was approved. Once approved, the car would immediately be stolen. The house would immediately burn down, etc.

The most common way by which the fraudsters misled the insurers was a combination of The Name Game and modifying answers until the screening system was fooled. Through a combination of investigative case management and link analysis software, I went back and looked at several months of historical data and found some amazing techniques used by the criminals. Specifically, I found one customer who made 19 separate online applications – each time changing just one attribute or answer slightly – until the policy was issued. Within a week of the policy issue, a claim was made. You can use your imagination to determine if it was a legitimate claim. 😀

This customer, Charlie Cheat (obviously not his real name), first used his real name, address, telephone number, and date of birth… and answered all of the screening questions honestly. Because he did not meet the criteria AND appeared on an internal watch list for having suspicious previous claims, his application was automatically denied. Then he had his wife, Cheri Cheat, complete the application in hopes that the system would see a different name and approve the policy. Next, he tried variations on his own name, such as Chuck E. Cheat, and so on. Still no go. His address went from 123 Fifth Street to 123-A 5th Street. You get the picture.

Then he began to modify answers to the screening questions. All of a sudden, he DID live within a mile of a fire station… and his house was NOT made of matchsticks… and was NOT located next door to a fireworks factory. After almost two dozen attempts, he was finally issued the policy under a slightly revised name, a tweak in his address, and some less-than-truthful answers on the screening page. By investing in powerful investigative case management software with link analysis and fuzzy matching, this insurer was able to dramatically decrease the number of policies issued to known fraudsters or otherwise ineligible entities.
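For the curious, here is a minimal sketch of the fuzzy matching idea, using Python's standard `difflib.SequenceMatcher`. This is not the insurer's actual software; the watch list entry, the `normalize` and `watch_list_score` helpers, and the 0.7 threshold are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical internal watch list.
watch_list = [{"name": "Charlie Cheat", "address": "123 Fifth Street"}]


def normalize(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return " ".join(cleaned.split())


def similarity(a, b):
    """Similarity in [0, 1] between two strings after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def watch_list_score(applicant, entries):
    """Best average of name and address similarity against any watch list entry."""
    best = 0.0
    for entry in entries:
        name_score = similarity(applicant["name"], entry["name"])
        address_score = similarity(applicant["address"], entry["address"])
        best = max(best, (name_score + address_score) / 2)
    return best


THRESHOLD = 0.7  # arbitrary cutoff for this sketch; real systems tune this

tweaked = {"name": "Chuck E. Cheat", "address": "123-A 5th Street"}
stranger = {"name": "Dana Honest", "address": "77 Elm Road"}

for applicant in (tweaked, stranger):
    flagged = watch_list_score(applicant, watch_list) >= THRESHOLD
    print(applicant["name"], "flagged" if flagged else "clear")
```

An exact-match lookup would miss the tweaked application entirely, which is the whole point of the name game. A similarity score gives the screening system a chance to catch it, at the cost of having to pick a threshold that does not flag too many honest applicants.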

Every time a new policy is applied for, the system analyzes the data against previous responses and internal watch lists in real time. In other words, Charlie and Cheri just found it a lot more difficult to rip this insurer off. These same situations occur in other arenas, costing us millions annually in increased taxes and prices. So, what happened to the Cheats after singing the name game?

Let’s just say that after receiving a letter from the insurer, Charlie and Cheri started singing a different tune altogether.

Part Two: Major Investigation Analytics – Big Data and Smart Data

Posted by Douglas Wood, Editor.

As regular readers of this blog know, I spend a great deal of time writing about the use of technology in the fight against crime – financial and otherwise. In Part One of this series, I overviewed the concept of Major Investigation Analytics and Investigative Case Management.

I also overviewed the major providers of this software technology – Palantir Technologies, Case Closed Software, and Visallo. The latter two recently became strategic partners, in fact.

The major case for major case management (pun intended) was driven home at a recent crime and investigation conference in New York. Full Disclosure: I attended the conference for educational purposes as part of my role at Crime Tech Weekly. Throughout the three-day conference, speaker after speaker talked about making sense of data. I think if I’d heard the term ‘big data’ one more time I’d have gone insane. Nevertheless, that was the topic du jour as you can imagine, and the 3 V’s of big data – volume, variety, and velocity – remain a front and center topic for the vendor community serving the investigation market.

According to one report, 96% of everything we do in life – personal or at work – generates data. That statement probably best sums up how big ‘big data’ is. Unfortunately, there was very little discussion about how big data can help investigate major crimes. There was a lot of talk about analytics, for sure, but there was a noticeable lack of ‘meat on the bone’ when it came to major investigation analytics.

Nobody has ever yelled out “Help, I’ve been attacked. Someone call the big data!” That’s because big data doesn’t, in and of itself, do anything. Once you can move ‘big data’ into ‘smart data’, however, you have an opportunity to investigate and adjudicate crime. To me, smart data (in the context of investigations) is a subset of an investigator’s ability to:

  1. Quickly triage a threat (or case) using only those bits of data that are most immediately relevant
  2. Understand the larger scope of the crime through experience and crime analytics, and
  3. Manage that case through intelligence-led analytics and investigative case management, data sharing, link exploration, text analytics, and so on.

Connecting the dots, as they say. From an investigation perspective, however, connecting dots can be daunting. In the children’s game, there is a defined starting point and a set of rules. We simply need to follow the instructions and the puzzle is solved. Not so in the world of the investigator. The ‘dots’ are not as easy to find. It can be like looking for a needle in a haystack, but the needle is actually broken into pieces and spread across ten haystacks.

Big data brings those haystacks together, but only smart data finds the needles… and therein lies the true value of major investigation analytics.