Graph database Neo4j discovers fake reviews on Amazon

Digital Detective

Tracking Down Suspects

Once the data has been loaded onto the Neo4j server, users can enter commands interactively in the Cypher shell to run queries and start analyses. Figure 2 shows a call to the Similarity algorithm [2] from a Neo4j plugin of scientific tools.

Figure 2: The Similarity algorithm has tagged reviewers 1 and 2 as suspicious.

The algorithm finds nodes in the graph that are connected by their relations to as many common neighbors as possible and then evaluates these as similar. It calculates the numerical degree of similarity from the Jaccard index [3] of the candidates.

Figure 2 shows the result: The algorithm has determined that reviewers 1 and 2 both rated products 1 and 2 and therefore assigns a similarity value of 1.0 to the two rascals. Of course, this is not yet hard evidence of unfair practices, but the result at least shows where you could drill down further to reveal more evidence in a suspicious case.

What is interesting in the result is that other reviewers also rated several products, but their choices only partly overlapped with those of other reviewers, so they were given a lower similarity value. For example, reviewer 8 rated products 2 and 5, and reviewer 4 rated products 2 and 4; both received only 0.5 on the similarity scale because their behavior was less suspicious.
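To make the Jaccard index more concrete, the following Go sketch computes it for two reviewers as the number of products both have rated divided by the number of products at least one of them has rated. The function name jaccard, the handling of empty sets, and the third reviewer rX are assumptions made for illustration; the exact figures the plugin reports depend on the full data set and on how the algorithm is configured.

```go
package main

import "fmt"

// jaccard returns |A ∩ B| / |A ∪ B| for two lists of product names.
// Two empty lists yield 0, a choice made for this sketch only.
func jaccard(a, b []string) float64 {
	setA := map[string]bool{}
	for _, p := range a {
		setA[p] = true
	}
	setB := map[string]bool{}
	for _, p := range b {
		setB[p] = true
	}

	// Count the products that appear in both sets.
	common := 0
	for p := range setA {
		if setB[p] {
			common++
		}
	}

	union := len(setA) + len(setB) - common
	if union == 0 {
		return 0
	}
	return float64(common) / float64(union)
}

func main() {
	// Reviewers 1 and 2 both rated products 1 and 2, as in the article.
	r1 := []string{"product1", "product2"}
	r2 := []string{"product1", "product2"}
	// Hypothetical reviewer who shares only one product with reviewer 1.
	rX := []string{"product2", "product3"}

	fmt.Printf("%.2f\n", jaccard(r1, r2)) // 1.00
	fmt.Printf("%.2f\n", jaccard(r1, rX)) // 0.33
}
```

Identical product sets give the maximum value of 1.0, whereas every product that only one of the two reviewers has rated enlarges the union and pushes the value toward 0.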

In the Thick of It

The best way to install a Neo4j instance on your home computer is to use a Docker container, which the command docker run retrieves from the network to launch a Neo4j server (Figure 3). Then, you can jump into the container by typing docker exec and open the interactive Neo4j Cypher shell to send commands to the server.

Figure 3: Docker commands retrieve Neo4j from the network, launch the server in a container, and open the interactive Cypher shell.

To allow browsers and API scripts to access the containerized Neo4j server from outside, the call in Figure 3 publishes ports 7474 and 7687 from the container on the host machine, where the user can then access the Neo4j web server in a browser at http://localhost:7474.

After feeding the data into Neo4j, the browser view in Figure 1 pointing to http://localhost:7474 shows the advanced relationship model. On port 7687, the server in the container listens for commands sent over Bolt, the protocol officially used by Neo4j; scripts can use this port to query the database and feed in new data.

The call to Docker maps the data/, logs/, import/, and plugins/ directories on the host into the container, which allows the host and the container to exchange database files and logs; the user can download new plugins from the network into plugins/ on the host to make them available inside the container.

Automatic Feed

Once the server is running in the container, the Go program can form a series of Neo4j commands from the YAML list of review data to feed the relationships into the database. To do this, it first creates nodes of the Reviewer and Product types and then inserts a relation of type Reviewed between the two (Listing 2); you could also enter these commands manually in the Cypher shell.

Listing 2


01 MERGE (product1:Product {name:'product1'})
02 MERGE (reviewer1:Reviewer {name:'reviewer1'})
03 MERGE (reviewer1)-[:Reviewed {name: 'reviewed'}]-(product1)
04 MERGE (reviewer2:Reviewer {name:'reviewer2'})
05 MERGE (reviewer2)-[:Reviewed {name: 'reviewed'}]-(product1)
06 MERGE (reviewer3:Reviewer {name:'reviewer3'})
07 MERGE (reviewer3)-[:Reviewed {name: 'reviewed'}]-(product1)
08 MERGE (reviewer7:Reviewer {name:'reviewer7'})
09 MERGE (reviewer7)-[:Reviewed {name: 'reviewed'}]-(product1)
10 [...]

The MERGE command creates a new entry, either a node or a relation. A CREATE command could do this as well, but MERGE does not create a duplicate if the entry already exists. Line 1 creates a new node of type Product, sets its name attribute to product1, and stores a reference to it in the product1 variable. The same happens with a Reviewer node in line 2; line 3 then links the previously defined reviewer1 and product1 variables with a relation of type Reviewed, whose name attribute is set to reviewed.

Entering all the data manually would quickly get on a user's nerves, which is why the Go program in Listing 3 automates the task of generating a series of Neo4j commands from the YAML list and sends them over port 7474 to the Neo4j server running in the container.

Listing 3


package main

import (
  "database/sql"
  "fmt"
  _ "" // database/sql driver that registers the "neo4j-cypher" name
  "" // YAML package that provides yaml.Unmarshal
  "io/ioutil"
  "log"
)

// Config mirrors the YAML file: product name -> list of reviewer names.
type Config struct {
  Reviews map[string][]string
}

func main() {
  yamlFile := "reviews.yaml"
  data, err := ioutil.ReadFile(yamlFile)
  if err != nil {
    log.Fatal(err)
  }

  var config Config
  err = yaml.Unmarshal(data, &config)
  if err != nil {
    log.Fatal(err)
  }

  // created tracks nodes that already have a MERGE line.
  created := map[string]bool{}
  cmd := ""

  // nuke all content
  toNeo4j(`MATCH (n) OPTIONAL MATCH
           (n)-[r]-() DELETE n,r;`)

  for prod, reviewers :=
      range config.Reviews {
    for _, rev := range reviewers {
      if _, ok := created[prod]; !ok {
        cmd += fmt.Sprintf(
          "MERGE (%s:Product {name:'%s'})\n",
          prod, prod)
        created[prod] = true
      }
      if _, ok := created[rev]; !ok {
        cmd += fmt.Sprintf(
          "MERGE (%s:Reviewer {name:'%s'})\n",
          rev, rev)
        created[rev] = true
      }
      cmd += fmt.Sprintf(
        "MERGE (%s)-[:Reviewed " +
        "{name: 'reviewed'}]-(%s)\n",
        rev, prod)
    }
  }
  // Send all MERGE lines as one statement.
  cmd += ";"
  toNeo4j(cmd)
}

// toNeo4j opens a connection to the server and runs one Cypher statement.
func toNeo4j(cmd string) {
  db, err := sql.Open("neo4j-cypher",
    "http://neo4j:test@localhost:7474")
  if err != nil {
    log.Fatal(err)
  }
  defer db.Close()

  _, err = db.Exec(cmd)
  if err != nil {
    log.Fatal(err)
  }
}
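If you want to inspect the generated commands without a running server, the command-building loop can be pulled out into a function of its own. The following is only a sketch of that idea, not part of Listing 3: the buildCypher name and the sample data in main are assumptions. Because Go iterates over maps in random order, the sketch sorts the product names first so that the output is reproducible from run to run.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// buildCypher turns a product -> reviewers map into one Cypher statement.
// The names double as Cypher variables, so they must be valid identifiers.
func buildCypher(reviews map[string][]string) string {
	prods := make([]string, 0, len(reviews))
	for prod := range reviews {
		prods = append(prods, prod)
	}
	sort.Strings(prods) // map order is random; sort for stable output

	created := map[string]bool{} // nodes that already have a MERGE line
	var b strings.Builder
	for _, prod := range prods {
		for _, rev := range reviews[prod] {
			if !created[prod] {
				fmt.Fprintf(&b, "MERGE (%s:Product {name:'%s'})\n", prod, prod)
				created[prod] = true
			}
			if !created[rev] {
				fmt.Fprintf(&b, "MERGE (%s:Reviewer {name:'%s'})\n", rev, rev)
				created[rev] = true
			}
			fmt.Fprintf(&b, "MERGE (%s)-[:Reviewed {name: 'reviewed'}]-(%s)\n", rev, prod)
		}
	}
	b.WriteString(";")
	return b.String()
}

func main() {
	// Hypothetical sample data in the same shape as Config.Reviews.
	reviews := map[string][]string{
		"product1": {"reviewer1", "reviewer2"},
		"product2": {"reviewer1"},
	}
	fmt.Println(buildCypher(reviews))
}
```

The returned string could then be handed to toNeo4j() unchanged or simply printed and pasted into the Cypher shell for a trial run.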
