Graph database Neo4j discovers fake reviews on Amazon
Digital Detective
Tracking Down Suspects
Once the data has been bundled onto the Neo4j server, users can type interactive commands in the Cypher shell to make queries and start analyses. Figure 2 shows a call to the Similarity algorithm [2] from a Neo4j plugin of scientific tools.
The algorithm finds nodes in the graph that are connected by their relations to as many common neighbors as possible and then evaluates these as similar. It calculates the numerical degree of similarity from the Jaccard index [3] of the candidates.
The output in Figure 2 shows the result: The algorithm has determined that reviewers 1 and 2 both evaluated products 1 and 2, and it therefore assigns a numerical similarity value of 1.0 to the two rascals. Of course, this is not yet hard evidence of unfair practices, but the result at least shows where you could drill down further to find more evidence in a suspicious case.
What is interesting in the result is that other reviewers also evaluated several products, but not the same products in partnership, and were therefore given a lower similarity value. For example, reviewer 8 rated products 2 and 5, and reviewer 4 rated products 2 and 4, both receiving only 0.5 on the similarity scale because their behavior was less suspicious.
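To make the similarity value more tangible, the following is a minimal Go sketch of the Jaccard index: the number of products two reviewers have in common, divided by the number of products either of them has reviewed. It is not the Neo4j plugin's implementation. The helper names jaccard and set, as well as productA through productD, are made up for illustration; only the case of reviewers 1 and 2 comes from the example above.

```go
package main

import "fmt"

// jaccard returns |A ∩ B| / |A ∪ B| for two sets of product names.
// Assumption for this sketch: two empty sets score 0.
func jaccard(a, b map[string]bool) float64 {
	inter := 0
	for k := range a {
		if b[k] {
			inter++
		}
	}
	union := len(a) + len(b) - inter
	if union == 0 {
		return 0
	}
	return float64(inter) / float64(union)
}

// set builds a set from a list of names (hypothetical helper).
func set(items ...string) map[string]bool {
	s := make(map[string]bool, len(items))
	for _, it := range items {
		s[it] = true
	}
	return s
}

func main() {
	// Reviewers 1 and 2 both rated products 1 and 2.
	r1 := set("product1", "product2")
	r2 := set("product1", "product2")
	fmt.Println(jaccard(r1, r2)) // prints 1

	// Hypothetical pair: two of four distinct products are shared.
	a := set("productA", "productB", "productC")
	b := set("productB", "productC", "productD")
	fmt.Println(jaccard(a, b)) // prints 0.5
}
```

Identical review histories always score 1.0, and every product that only one of the two reviewers rated pushes the value down.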
In the Thick of It
The best way to install a Neo4j instance on your home computer is to use a Docker container, which the command docker run retrieves from the network to launch a Neo4j server (Figure 3). Then, you can jump into the container by typing docker exec and open the interactive Neo4j Cypher shell to send commands to the server.
To allow browsers and API scripts to access the containerized Neo4j server from outside, the call in Figure 3 maps ports 7474 and 7687 from the container to the host machine, where the user can then access the Neo4j web server in a browser at http://localhost:7474.
After feeding the data into Neo4j, the browser view in Figure 1 pointing to http://localhost:7474 shows the advanced relationship model. On port 7687, the server in the container listens for commands sent over Bolt, the binary protocol officially used by Neo4j; scripts can use this port to query the database and feed in new data.
The call to Docker connects the data/, logs/, import/, and plugins/ directories on the host to the container, which allows the host and the container to exchange database files and logs; the user can also download new plugins from the network into plugins/ on the host to make them available inside the container.
Automatic Feed
Once the server is running in the container, the Go program can form a series of Neo4j commands from the YAML list of review data to feed the relationships into the database. To do this, first create nodes of the Reviewer and Product types and then insert a relation of type Reviewed between the two (Listing 2); you could also enter these commands manually in the Cypher shell.
Listing 2
neo4j-commands.txt
01 MERGE (product1:Product {name:'product1'})
02 MERGE (reviewer1:Reviewer {name:'reviewer1'})
03 MERGE (reviewer1)-[:Reviewed {name: 'reviewed'}]-(product1)
04 MERGE (reviewer2:Reviewer {name:'reviewer2'})
05 MERGE (reviewer2)-[:Reviewed {name: 'reviewed'}]-(product1)
06 MERGE (reviewer3:Reviewer {name:'reviewer3'})
07 MERGE (reviewer3)-[:Reviewed {name: 'reviewed'}]-(product1)
08 MERGE (reviewer7:Reviewer {name:'reviewer7'})
09 MERGE (reviewer7)-[:Reviewed {name: 'reviewed'}]-(product1)
10 [...]
The MERGE command creates a new entry, either a node or a relation. A CREATE command would do the same, but MERGE creates the entry only if it does not already exist, so running the commands a second time does not produce duplicates. Line 1 creates a new node of type Product, sets its name attribute to product1, and stores a reference to it in the product1 variable. The same happens with a Reviewer node in line 2; line 3 then links the previously defined reviewer1 and product1 variables with a relation of type Reviewed, whose name attribute is set to reviewed.
Entering all the data manually would quickly get on a user's nerves, which is why the Go program in Listing 3 automates the task of generating a series of Neo4j commands from the YAML list and sends them over port 7474 to the Neo4j server running in the container.
Listing 3
rimport.go
01 package main
02
03 import (
04   "database/sql"
05   "fmt"
06   _ "gopkg.in/cq.v1"
07   "gopkg.in/yaml.v2"
08   "io/ioutil"
09   "log"
10 )
11
12 type Config struct {
13   Reviews map[string][]string
14 }
15
16 func main() {
17   yamlFile := "reviews.yaml"
18   data, err := ioutil.ReadFile(yamlFile)
19   if err != nil {
20     log.Fatal(err)
21   }
22
23   var config Config
24   err = yaml.Unmarshal(data, &config)
25   if err != nil {
26     log.Fatal(err)
27   }
28
29   created := map[string]bool{}
30   cmd := ""
31   // nuke all content
32   toNeo4j(`MATCH (n) OPTIONAL MATCH
33     (n)-[r]-() DELETE n,r;`)
34
35   for prod, reviewers :=
36     range config.Reviews {
37     for _, rev := range reviewers {
38       if _, ok := created[prod]; !ok {
39         cmd += fmt.Sprintf(
40           "MERGE (%s:Product {name:'%s'})\n",
41           prod, prod)
42         created[prod] = true
43       }
44       if _, ok := created[rev]; !ok {
45         cmd += fmt.Sprintf(
46           "MERGE (%s:Reviewer {name:'%s'})\n",
47           rev, rev)
48         created[rev] = true
49       }
50       cmd += fmt.Sprintf(
51         "MERGE (%s)-[:Reviewed " +
52           "{name: 'reviewed'}]-(%s)\n",
53         rev, prod)
54     }
55   }
56   cmd += ";"
57   toNeo4j(cmd)
58 }
59
60 func toNeo4j(cmd string) {
61   db, err := sql.Open("neo4j-cypher",
62     "http://neo4j:test@localhost:7474")
63   if err != nil {
64     log.Fatal(err)
65   }
66   defer db.Close()
67
68   _, err = db.Exec(cmd)
69
70   if err != nil {
71     log.Fatal(err)
72   }
73 }