Tag Archives: law enforcement software

What is Link / Social Network Analysis?

Posted by Crime Tech Solutions

Computer-based link analysis is a set of techniques for exploring associations among large numbers of objects of different types. These methods have proven crucial in assisting human investigators in comprehending complex webs of evidence and drawing conclusions that are not apparent from any single piece of information. These methods are equally useful for creating variables that can be combined with structured data sources to improve automated decision-making processes. Typically, linkage data is modeled as a graph, with nodes representing entities of interest and links representing relationships or transactions. Links and nodes may have attributes specific to the domain. For example, link attributes might indicate the certainty or strength of a relationship, the dollar value of a transaction, or the probability of an infection.
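As a rough illustration, linkage data of this kind can be modeled with plain data structures; the node names, link kind, and attribute values below are illustrative assumptions, not a prescribed schema:

```python
from collections import defaultdict

# Minimal linkage-graph model: nodes and links each carry a dict of
# domain-specific attributes.
nodes = {
    "acct_A": {"type": "account"},
    "person_X": {"type": "person"},
}
links = [
    # (source, target, attributes), e.g. a wire transfer with a dollar value
    ("acct_A", "person_X", {"kind": "wire_transfer", "amount": 5000.0}),
]

# Adjacency index for quick traversal of the graph
adjacency = defaultdict(list)
for src, dst, attrs in links:
    adjacency[src].append((dst, attrs))

print(adjacency["acct_A"])  # [('person_X', {'kind': 'wire_transfer', 'amount': 5000.0})]
```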

Some linkage data, such as telephone call detail records, may be simple but voluminous, with uniform node and link types and a great deal of regularity. Other data, such as law enforcement data, may be extremely rich and varied, though sparse, with elements possessing many attributes and confidence values that may change over time.

Various techniques are appropriate for distinct problems. For example, heuristic, localized methods might be appropriate for matching known patterns to a network of financial transactions in a criminal investigation. Efficient global search strategies, on the other hand, might be best for finding centrality or severability in a telephone network.

Link analysis can be broken down into two components—link generation, and utilization of the resulting linkage graph.

Link Generation

Link generation is the process of computing the links, link attributes and node attributes. There are several different ways to define links. The different approaches yield very different linkage graphs. A key aspect in defining a link analysis is deciding which representation to use.

Explicit Links

A link may be created between the nodes corresponding to each pair of entities in a transaction. For example, with a call detail record, a link is created between the originating telephone number and the destination telephone number. This is referred to as an explicit link.
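A sketch of explicit link generation from hypothetical call detail records (the field names are assumptions):

```python
# One explicit link per call detail record: originating number -> destination.
call_detail_records = [
    {"caller": "555-0101", "callee": "555-0202", "duration_s": 120},
    {"caller": "555-0101", "callee": "555-0303", "duration_s": 45},
]

explicit_links = [
    (cdr["caller"], cdr["callee"], {"duration_s": cdr["duration_s"]})
    for cdr in call_detail_records
]

print(len(explicit_links))  # 2
```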

Aggregate Links

A single link may be created from multiple transactions. For example, a single link could represent all telephone calls between two parties, and a link attribute might be the number of calls represented. Thus, several explicit links may be collapsed into a single aggregate link.
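Collapsing explicit links into aggregate links can be as simple as counting pairs; the call count becomes the link attribute (numbers are illustrative):

```python
from collections import Counter

# Several explicit links between the same pair of parties...
explicit_links = [
    ("555-0101", "555-0202"),
    ("555-0101", "555-0202"),
    ("555-0101", "555-0303"),
]

# ...collapse into one aggregate link per pair, whose attribute is the
# number of calls represented.
aggregate_links = Counter(explicit_links)
print(aggregate_links[("555-0101", "555-0202")])  # 2
```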

Inferred Relationships

Links may also be created between pairs of nodes based on inferred strengths of relationships between them. These are sometimes referred to as soft links, association links, or co-occurrence links. Classes of algorithms for these computations include association rules, Bayesian belief networks and context vectors. For example, a link may be created between any pair of nodes whose context vectors lie within a certain radius of one another. Typically, one attribute of such a link is the strength of the relationship it represents. Time is a key feature that offers an opportunity to uncover linkages that might be missed by more typical data analysis approaches. For example, suppose a temporal analysis of wire transfer records indicates that a transfer from account A to person X at one bank is temporally proximate to a transfer from account B to person Y at another bank. This yields an inferred link between accounts A and B. If other aspects of the accounts or transactions are also suspicious, they may be flagged for additional scrutiny for possible money laundering activity.
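The temporal-proximity idea can be sketched as follows; the accounts, timestamps, 15-minute window, and linear strength formula are all illustrative assumptions:

```python
from datetime import datetime, timedelta

# Hypothetical wire-transfer records: (account, counterparty, timestamp).
transfers = [
    ("acct_A", "person_X", datetime(2024, 3, 1, 10, 0)),
    ("acct_B", "person_Y", datetime(2024, 3, 1, 10, 7)),
]

WINDOW = timedelta(minutes=15)

# Infer a soft link between accounts whose transfers are temporally
# proximate; strength decays linearly with the time gap.
inferred = []
for (a1, _, t1) in transfers:
    for (a2, _, t2) in transfers:
        if a1 < a2 and abs(t1 - t2) <= WINDOW:
            strength = 1 - abs(t1 - t2) / WINDOW
            inferred.append((a1, a2, {"strength": round(strength, 2)}))

print(inferred)  # [('acct_A', 'acct_B', {'strength': 0.53})]
```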

A specific instance of inferred relationships is identifying two nodes that may actually correspond to the same physical entity, such as a person or an account. Link analysis includes mechanisms for collapsing these to a single node. Typically, the analyst creates rules or selects parameters specifying in which instances to merge nodes in this fashion.
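One simple, rule-based way to find candidate nodes to merge is to normalize a few identifying attributes into a key; the normalization rule and records here are hypothetical:

```python
# Rule: two records collapse to one node if name (case/whitespace
# normalized) and date of birth match.
def normalize(name, dob):
    return (name.strip().lower(), dob)

records = [
    {"id": 1, "name": "Charlie Cheat ", "dob": "1970-01-01"},
    {"id": 2, "name": "charlie cheat", "dob": "1970-01-01"},
]

merged = {}
for rec in records:
    merged.setdefault(normalize(rec["name"], rec["dob"]), []).append(rec["id"])

print(merged)  # {('charlie cheat', '1970-01-01'): [1, 2]}
```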


Utilization of the Linkage Graph

Once a linkage graph, including the link and node attributes, has been defined, it can be browsed, searched or used to create variables as inputs to a decision system.


Visualization

In visualizing linkage graphs, each node is represented as an icon, and each link is represented as a line or an arrow between two nodes. The node and link attributes may be displayed next to the items or accessed via mouse actions. Different icon types represent different entity types. Similarly, link attributes determine the link representation (line strength, line color, arrowhead, etc.).

Standard graphs include spoke and wheel, peacock, group, hierarchy and mesh. An analytic component of the visualization is the automatic positioning of the nodes on the screen, i.e., the projection of the graph onto a plane. Different algorithms position the nodes based on the strength of the links between nodes or to agglomerate the nodes into groups of the same kind. Once displayed, the user typically has the ability to move nodes, modify node and link attributes, zoom in, collapse, highlight, hide or delete portions of the graph.

Variable Creation

Link analysis can append new fields to existing records or create entirely new data sets for subsequent modeling stages in a decision system. For example, a new variable for a customer might be the total number of email addresses and credit card numbers linked to that customer.
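A minimal sketch of such variable creation, assuming hypothetical customer-to-identifier links:

```python
from collections import defaultdict

# Links from customers to contact identifiers (illustrative data).
links = [
    ("cust_1", "email", "a@example.com"),
    ("cust_1", "email", "b@example.com"),
    ("cust_1", "card", "4111000011112222"),
]

# Derived variable: total distinct identifiers linked to each customer.
linked_identifiers = defaultdict(set)
for customer, _, identifier in links:
    linked_identifiers[customer].add(identifier)

features = {cust: len(ids) for cust, ids in linked_identifiers.items()}
print(features)  # {'cust_1': 3}
```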


Search

Link analysis query mechanisms include retrieving nodes and links matching specified criteria, such as node and link attributes, as well as search by example to find more nodes that are similar to a specified example node.

A more complex task is similarity search, also called clustering. Here, the objective is to find groups of similar nodes. These may actually be multiple instances of the same physical entity, such as a single individual using multiple accounts in a similar fashion.
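A toy sketch of grouping similar nodes, using a naive threshold pass over assumed per-account behavior vectors; real similarity search would use more robust clustering:

```python
import math

# Hypothetical per-account behavior vectors: (avg txn amount, txns/week).
profiles = {
    "acct_1": (250.0, 12),
    "acct_2": (255.0, 11),
    "acct_3": (40.0, 2),
}

THRESHOLD = 20.0  # illustrative similarity cutoff

# Naive single-pass grouping: an account within THRESHOLD of an existing
# cluster seed joins that cluster; otherwise it seeds a new one.
clusters = []  # list of (seed_vector, [account_ids])
for acct, vec in profiles.items():
    for seed, members in clusters:
        if math.dist(seed, vec) <= THRESHOLD:
            members.append(acct)
            break
    else:
        clusters.append((vec, [acct]))

print([members for _, members in clusters])  # [['acct_1', 'acct_2'], ['acct_3']]
```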

Network Analysis

Network analysis is the search for parts of the linkage graph that play particular roles. It is used to build more robust communication networks and to combat organized crime. This exploration revolves around questions such as:

  • Which nodes are key or central to the network?
  • Which links can be severed or strengthened to most effectively impede or enhance the operation of the network?
  • Can the existence of undetected links or nodes be inferred from the known data?
  • Are there similarities in the structure of subparts of the network that can indicate an underlying relationship (e.g., modus operandi)?
  • What are the relevant sub-networks within a much larger network?
  • What data model and level of aggregation best reveal certain types of links and sub-networks?
  • What types of structured groups of entities occur in the data set?
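As a small example of the first question (which nodes are key or central), a simple degree count over a made-up network identifies the node touching the most links:

```python
from collections import Counter

# Undirected links in a small hypothetical network.
links = [
    ("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"),
]

# Degree centrality: count the links incident to each node.
degree = Counter()
for u, v in links:
    degree[u] += 1
    degree[v] += 1

print(degree.most_common(1))  # [('A', 3)]
```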


Link analysis tools such as those provided by Crime Tech Solutions are increasingly used in law enforcement investigations, terrorist threat detection, fraud detection, anti-money laundering, telecommunications network analysis, web page classification, transportation route analysis, pharmaceuticals research, epidemiology, nuclear proliferation detection and a host of other specialized applications. For example, in the case of money laundering, the entities might include people, bank accounts and businesses, and the transactions might include wire transfers, checks and cash deposits. Exploring relationships among these different objects helps expose networks of activity, both legal and illegal.

The Name Game Fraud

Posted by Douglas Wood, Editor. Alright everybody, let’s play a game. The name game!

“Shirley, Shirley bo Birley. Bonana fanna fo Firley. Fee fy mo Mirley. Shirley!” No, not THAT name game. (Admit it… you used to love singing the “Chuck” version, though.)

The name game I’m referring to is slightly more sinister, and relates to the criminal intent to deceive others for gain by slightly misrepresenting attributes in order to circumvent fraud detection techniques. Pretty much anywhere money, goods, or services are dispensed, folks play the name game.

Utilities, Insurance, Medicaid, retail, FEMA. You name it.

Several years ago, I helped a large online insurance provider determine the extent to which they were offering insurance policies to corporations and individuals with whom they specifically did not want to do business. Here’s what the insurer knew:

  1. They had standard application questions designed both to determine the insurance quote AND to ensure that they were not doing business with undesirables. These questions included things such as full name, address, telephone number, date of birth, etc… but also questions related to the insured property: “Do you live within a mile of a fire station?”, “Does your home have smoke detectors?”, and “Is your house made of matchsticks?”
  2. On top of the questions, the insurer had a list of entities with whom they knew they did not want to do business for one reason or another. Perhaps Charlie Cheat had some previous questionable claims… he would have been on their list.

In order to circumvent the fraud prevention techniques, of course, the unscrupulous types figured out how to mislead the insurer just enough so that the policy was approved. Once approved, the car would immediately be stolen. The house would immediately burn down, etc.

The most common way by which the fraudsters misled the insurers was a combination of The Name Game and modifying answers until the screening system was fooled. Through a combination of investigative case management and link analysis software, I went back and looked at several months of historical data and found some amazing techniques used by the criminals. Specifically, I found one customer who made 19 separate online applications – each time changing just one attribute or answer slightly – until the policy was issued. Within a week of the policy issue, a claim was made. You can use your imagination to determine if it was a legitimate claim. 😀

This customer, Charlie Cheat (obviously not his real name), first used his real name, address, telephone number, and date of birth… and answered all of the screening questions honestly. Because he did not meet the criteria AND appeared on an internal watch list for having suspicious previous claims, his application was automatically denied. Next, he had his wife, Cheri Cheat, complete the application in hopes that the system would see a different name and approve the policy. Then he tried variations of his own name (Chuck E. Cheat, and so on). Still no go. His address went from 123 Fifth Street to 123-A 5th Street. You get the picture.

Then he began to modify answers to the screening questions. All of a sudden, he DID live within a mile of a fire station… and his house was NOT made of matchsticks… and was NOT located next door to a fireworks factory. After almost two dozen attempts, he was finally issued the policy under a slightly revised name, a tweak in his address, and some less-than-truthful answers on the screening page. By investing in powerful investigative case management software with link analysis and fuzzy matching, this insurer was able to dramatically decrease the number of policies issued to known fraudsters or otherwise ineligible entities.
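A minimal sketch of the fuzzy-matching idea, using Python's standard-library difflib to score near-matches against a watch list; the names, addresses, and 0.6 review threshold are illustrative:

```python
from difflib import SequenceMatcher

def similarity(a, b):
    # Character-level similarity in [0, 1], case-insensitive.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Flag a new application whose name and address are near (but not exact)
# matches to a watch-list entry: the "name game" pattern.
watch_list = [("Charlie Cheat", "123 Fifth Street")]
application = ("Chuck E. Cheat", "123-A 5th Street")

for wl_name, wl_addr in watch_list:
    score = (similarity(wl_name, application[0])
             + similarity(wl_addr, application[1])) / 2
    if 0.6 <= score < 1.0:
        print("review:", application, round(score, 2))
```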

Every time a new policy is applied for, the system analyzes the data against previous responses and internal watch lists in real time. In other words, Charlie and Cheri just found it a lot more difficult to rip this insurer off. These same situations occur in other arenas, costing us millions annually in increased taxes and prices. So, what happened to the Cheats after singing the name game?

Let’s just say that after receiving a letter from the insurer, Charlie and Cheri started singing a different tune altogether.

Asking data questions

Posted by Douglas Wood, Editor.  http://www.linkedin.com/in/dougwood.

A brief read and good perspective from my friend Chris Westphal of Raytheon. The article is by Anna Forrester of ExecutiveGov.com.

Federal managers should invest in technology that helps them extract insights from data, and should base their investment decisions on the specific problems they want to solve and the information they want to learn, Federal Times reported Friday.

Rutrell Yasin writes that these managers should follow three steps as they seek to make sense of the high volume of data their agencies encounter in daily tasks and to derive value from it.

According to Shawn Kingsberry, chief information officer for the Recovery Accountability and Transparency Board, federal managers should first determine the questions they need to ask of data then create a profile for the customer or target audience.

Next, they should locate the data and their sources, then correspond with those sources to determine the quality of the data, the report said. “Managers need to know if the data is in a federal system of records that gives the agency terms of use or is it public data,” writes Yasin.

Finally, they should consider the potential impact of the data, the insights and resulting technology investments on the agency.

Yasin reports that the Recovery Accountability and Transparency Board uses data analytics tools from Microsoft, SAP and SAS and link analysis tools from Palantir Technologies.

According to Chris Westphal, director of analytics technology at Raytheon, organizations should invest in a platform that gathers data from separate sources into a single data repository with analytics tools.

Yasin adds that agencies should also appoint a chief data officer and data scientists or architects to assist the CIO and CISO in these areas.